Florian, I applaud your enthusiasm. I have gone through similar thinking myself … and decided it isn’t worth the effort at this time.
My current view is:
-
I'm not sure whether you mean Rhasspy v2.5 (https://community.rhasspy.org) or Rhasspy v3 (not completed), which became the core of HA Voice Assist (the topic of this thread). Mike stated some time ago that he intends to update Rhasspy when he gets time, which I believe mostly means documentation on how to use it for non-HA purposes.
-
The Rhasspy Raspberry Pi hardware options are overly expensive for limited functionality.
- A Raspberry Pi is a general-purpose computer, and the voice assistant and wakeword detection use only a fraction of its CPU.
- While Speech-to-text and Intent recognition can run on a satellite RasPi, the RasPi is not particularly suited to the compute-intensive techniques used in Digital Signal Processing.
- The driver for the Seeed 2-mic HAT actually uses only 1 microphone and has none of the DSP (Digital Signal Processing) magic we have come to expect from the big-name brands. HinTak has updated it for new OS kernels, but no one is interested in improving the code.
- Conferencing speakerphones reportedly give good audio quality - but at a high price.
- If someone already has a Raspberry Pi sitting around doing nothing, and a decent quality microphone, then it makes sense - but don’t spend money to go this route.
-
Mike and Paulus have talked (briefly) about an ESP32-S3 voice kit hardware device being developed by Nabu Casa; as @vunhtun says, this is the focus currently.
- The ESP32-S3 has a co-processor and additional hardware instructions that make it suitable for AI and the maths required for Digital Signal Processing … without the overheads required to run a full Linux OS.
- They mentioned the hope for this hardware to be released before the end of this year. They also want enough stock ready to ship at release so potential customers aren’t disappointed.
- The big question is price. Inevitably it will be compared directly (on both quality and price) with the current generation of voice assistant devices from huge corporations that have been subsidising production. It's a totally unfair comparison, but there are a huge number of HA users who don’t seem concerned about privacy when it comes to their voice assistants.
-
I expect that this new ESP32-S3 voice kit will instantly become the recommended hardware for new satellites. I personally intend to replace my RasPis running Rhasspy with this new voice kit as soon as I can afford to do so.
-
After that, there will be only a few people looking for RasPi voice satellite instructions. I guess that:
- most of those people will be more experienced, and so can handle the current instructions.
- those people left wanting to use RasPi with Rhasspy (or Wyoming as the new version seems to be called now) will not be using it as a simple voice assistant - but wanting to integrate its modules into other systems (including developing their own voice assistants). This will require a different, much broader, focus for the documentation.
- as part of updating the documentation for using Rhasspy v3 / Wyoming on RasPi, the installation will probably change anyway to incorporate techniques used by the ESP32-S3 voice kit.
One of my Rhasspy 2.11 satellites is running on a Raspberry Pi Zero (not even the Zero 2). With a nice 3D printed case it can look like an Alexa device … but unless the various Digital Signal Processing algorithms are placed into the public domain, we can’t get the same quality.