Voice Assistant + TTS + Multi-Room Audio: Local-Only HW and SW Options


I am currently researching a full audio setup to integrate with HA.

Some features/requirements:

  • Totally local (no cloud/internet connection for anything)
  • Voice command input (speech-to-text)
  • TTS notifications
  • Multi-room audio with support for Spotify, AirPlay, and Bluetooth playback (not sure how to achieve this). Ideally synced across rooms, but if that is not possible, each room can be standalone so that the SW doesn’t have to sync it
  • Lowest footprint/cost possible

I know it is a lot to ask, but I am just trying to understand what is possible and what is not. From my investigation I have reached some conclusions:

HW side:

  • Raspberry Pi 4
  • ReSpeaker 4 MIC array (https://respeaker.io/4_mic_array/) (they already have a 6-mic option, but it is way more expensive). There are probably other options, but this was the cheapest I found that could be integrated with the RPi with the lowest footprint and good performance
  • Tribit XSound Go speakers connected to the RPi via a 3.5 mm audio cable. (There are other options here, like buying some ad hoc speakers and using a USB sound card on the RPi, or even a DAC connected directly to the GPIO, but I think they are more cumbersome and probably more costly.) I am also hoping that the RPi can power/charge the speakers via a USB cable.
  • Custom-made 3D-printed case to hold the Pi and, somehow, the microphone array (a ready-to-use version from AliExpress or elsewhere would be better, since I don’t have a 3D printer or the skills to make a custom case)

SW side:
For the TTS/voice control:

  • Almond/Ada - Seems nice and is the official HA SW, but as far as I understand it isn’t fully local for now
  • Mycroft - Same as above. It isn’t fully local
    So that leaves me with two options:
  • Rhasspy or Project Alice (based on Snips). Rhasspy seems to be more developed, and Project Alice is newer. Any opinions on either?

For the multi-room audio:

  • Snapcast with Mopidy or Volumio (any other options?)

The issue is that I can’t find a solution that provides TTS and voice control as well as multi-room audio in a single SW package. I am afraid that I cannot run Rhasspy or Project Alice alongside Snapcast with Mopidy or Volumio and have everything working together.

I am open to more opinions and thoughts on this.

Thanks a lot.

Well, I just ordered the RPi and the ReSpeaker to test for now.

If you are building a new house, or if laying cables is an option, I’d consider running speaker and microphone cables from every room to a central server.

Ty. That is not the case here; it is an existing house. For new houses, agreed, cables are the best option.

Then yes, for a synced multi-room solution Snapcast is probably your best bet. You can also take a look at LMS (Logitech Media Server, or Squeezebox), but in my testing Snapcast does a much better job. My current situation is similar. For now I am running Google Assistant / Alexa alongside Snapcast for multi-room audio, but I am contemplating simply getting rid of the Google and Amazon devices and deploying RPis with Rhasspy and microphone arrays. As far as I am aware, there is no single package available that does everything you described; however, I would expect them to work nicely alongside each other.

Yes, my concern is that, since they are different packages, one of them could lock the mic hardware device on the RPi and leave it unavailable to the other software.
Thanks for the answers btw
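
One rough way to reason about the device-locking worry is to list which audio direction (mic capture vs. speaker playback) each package needs and see where they overlap. The sketch below is only a mental model, not a test of real hardware. The `SERVICES` table is an assumption made for illustration (Rhasspy listening on the mic and playing TTS replies, the Snapcast client only playing audio), and the function name is made up for this example.

```python
from collections import defaultdict

# Assumed usage per service (illustrative only, not taken from either project's docs):
# "capture" = microphone input, "playback" = speaker output.
SERVICES = {
    "rhasspy": {"capture", "playback"},  # listens for commands, plays TTS replies
    "snapclient": {"playback"},          # only plays the synced music stream
}


def shared_directions(services):
    """Return {direction: sorted service names} for directions used by 2+ services."""
    users = defaultdict(list)
    for name, directions in services.items():
        for direction in directions:
            users[direction].append(name)
    return {d: sorted(names) for d, names in users.items() if len(names) > 1}


if __name__ == "__main__":
    # Shows which directions would need to be shared between the packages.
    print(shared_directions(SERVICES))
```

Under these assumptions only the speaker output is shared, not the mic. On a Linux box, sharing playback between two programs is typically handled by a software mixer (ALSA's dmix plugin or PulseAudio) rather than by either package, so that would be the part to check first once the hardware arrives.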

Any update on this? What hardware did you end up using?
I would like to start with a voice assistant (Rhasspy) + multi-room audio (Snapcast), but I have no idea what hardware to buy (ESP32, Raspberry Pi, DAC, amp …).


