Raspberry Pi as a voice assistant

Florian, I applaud your enthusiasm. I have gone through similar thinking myself … and decided it isn’t worth the effort at this time.

My current view is:

  • I am not sure whether you mean Rhasspy (https://community.rhasspy.org) v2.5 or Rhasspy v3 (not completed), which became the core of HA Voice Assist (the topic of this thread). Mike stated some time ago that he intends to update Rhasspy when he gets time - which I believe mostly means documentation on how to use it for non-HA purposes.

  • The Rhasspy Raspberry Pi hardware options are overly expensive for limited functionality.

    • The Raspberry Pi is a general-purpose computer, and uses only a fraction of its CPU for the voice assistant and wake word detection.
    • While speech-to-text and intent recognition can run on a satellite RasPi, the Pi is not particularly well suited to the compute-intensive techniques used in digital signal processing.
    • The driver for the Seeed 2-mic HATs actually uses only one microphone and has none of the DSP (digital signal processing) magic we have come to expect from the big-name brands. HinTak has updated it for new OS kernels, but no one is interested in improving the code.
    • Conferencing speakerphones reportedly give good audio quality - but at a high price.
    • If someone already has a Raspberry Pi sitting around doing nothing, and a decent quality microphone, then it makes sense - but don’t spend money to go this route.
  • Mike and Paulus have talked (briefly) about an ESP32-S3 voice kit hardware device being developed by Nabu Casa; as @vunhtun says, this is the focus currently.

    • The ESP32-S3 has a co-processor and additional hardware instructions that make it suitable for AI and the maths required for Digital Signal Processing … without the overheads required to run a full Linux OS.
    • They mentioned the hope for this hardware to be released before the end of this year. They also want enough stock ready to ship at release so potential customers aren’t disappointed.
    • The big question is price. Inevitably it will be compared (on both quality and price) directly with the current generation voice assistant devices from huge corporations who have been subsidising production. Totally unfair comparison, but there are a huge number of HA users who don’t seem concerned about privacy when it comes to their voice assistants.
  • I expect that this new ESP32-S3 voice kit will instantly become the recommended hardware for new satellites. I personally intend to replace my RasPis running Rhasspy with this new voice kit as soon as I can afford to do so.

  • After that, there will be only a few people looking for RasPi voice satellite instructions. I guess that:

    • most of those people will be more experienced, and so can handle the current instructions.
    • those people left wanting to use RasPi with Rhasspy (or Wyoming as the new version seems to be called now) will not be using it as a simple voice assistant - but wanting to integrate its modules into other systems (including developing their own voice assistants). This will require a different, much broader, focus for the documentation.
    • as part of updating the documentation for using Rhasspy v3 / Wyoming on RasPi, the installation will probably change anyway to incorporate techniques used by the ESP32-S3 voice kit.

One of my Rhasspy 2.11 satellites is running on a Raspberry Pi Zero (not even the Zero 2). With a nice 3D printed case it can look like an Alexa device … but unless the various digital signal processing algorithms are placed into the public domain, we can’t get the same quality.

Hi @donburch888, I mean this kind of software: GitHub - rhasspy/wyoming-satellite: Remote voice satellite using Wyoming protocol.

I know that the current focus is the ESP-based hardware. I tested the current M5Stack software and for me it doesn’t perform very well, and I am not sure that a “bigger” ESP32 will do better. I am also missing multiroom audio. If that changes on the 19th I may reconsider my decision, but I think with a Pi as the base for the hardware a lot more is possible.

The Raspberry Pi Zero 2 W costs 17€ and the 2mic_hat costs 9€. For a good speaker and a small amp I paid around 15€. That’s a total of 41€, almost the same price as an Alexa.
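
For anyone who wants to redo the sum with their local prices, here is a minimal sketch of the tally in Python. The three prices are the ones quoted above; the dictionary labels are only illustrative, and anything not listed in the post (power supply, case, SD card) is deliberately left out rather than guessed.

```python
# Prices in euros as quoted in the post above; the labels are illustrative only.
parts_eur = {
    "Raspberry Pi Zero 2 W": 17,
    "2mic_hat": 9,
    "speaker + small amp (approx.)": 15,
}

# Sum only the parts that were actually listed; power supply, case and
# SD card are not included because no prices were given for them.
total_eur = sum(parts_eur.values())

for name, price in parts_eur.items():
    print(f"{name:32s} {price:>3d} €")
print(f"{'total':32s} {total_eur:>3d} €")
```

Swapping in local prices (or adding a line for a power supply and case) makes it easy to compare against whatever a commercial smart speaker costs where you live.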

The speaker is the same as the ones for the Bose Soundlink Mini.

The Pi’s CPU is ~50% idle, which is OK for me. Wake word detection is the only thing running locally on it; all the other stuff is on my server.

I know that the 2mic_hat uses only one mic. I tested both systems and the mic from the 2mic_hat works much better.

I am thinking about creating a fork of the software mentioned above. I have already created a working docker-compose file with Dockerfiles that work. I am currently fine-tuning the settings.

Yep. That is the latest branch … and yes, Mike has previously commented that (a) he thinks the RasPi installation should be improved; and (b) that he intends to come back to Rhasspy when he gets some spare time. So I am sure he will appreciate your efforts.

Down here at the end of the world, in the land of Aus, a Raspberry Pi Zero 2 W + Adafruit Voice Bonnet for Raspberry Pi (reSpeaker HAT look-alike) already costs double the price of a basic Google or Alexa device (without adding power supply, case and speaker) for noticeably inferior performance. Here it makes sense only if someone already has the RasPi lying around unused. I am pleased to hear that is not the case in your part of the world.

I am curious that you seem to be dismissing the upcoming Voice Kit as just a “bigger” Atom M5 stack. I’m no expert on this, but I understand that the ESP32-S3 processor has additional instructions built into the CPU which make it better - better even than a Raspberry Pi for the sort of computations required for digital signal processing and AI.

Nabu Casa are (presumably) addressing the BOX3 limitations … optimising hardware and firmware for voice assist, manufacturing at scale, and providing the ongoing support and development that real-world users need. I am eagerly awaiting the 19th to find out more; especially the price … which I guess will be somewhere between the Google/Alexa subsidised offerings and the cost of a Raspberry Pi solution … with, ironically, worse performance than the big guys but better than the RasPi.