Home Assistant Mic/Speaker HiFi Alexa migration

Does anyone have a reference, guide, post, or tutorial for replacing Alexa “Hi-Fi systems”, both cheap ones (like the Amazon Echo Dot 5th gen) and more sophisticated ones (like the Amazon Echo Studio + Echo Sub)? The idea is to use equivalent or better speakers: Sonos Era 100 / Denon Home 150 to replace the Echo Dot 5th gen, and Sonos Era 300 + Sonos Sub 4 / Denon Home 350 + Denon subwoofer to replace the Echo Studio + Echo Sub. These are just reference points, since they are not exact equivalents.

I’m trying to “duplicate” or “copy” the behaviour of a typical Alexa setup (I made the mistake of buying only Amazon devices, and now I’m stuck with Amazon’s AI, which never gets updated and isn’t “native” to a local HA implementation). Anyway, I’m trying to plan a different setup, but since I don’t own one, I don’t know how good or bad a Home Assistant Voice Preview Edition (HAVPE) sounds. Is it worse than an Echo Dot 5th gen? If so, how can I “fix that” while keeping the same functionality and user experience that the Amazon devices have?

As far as I could find, the only way is to use speakers that integrate well with Home Assistant (HA), such as Sonos or Denon for best quality, and use the HAVPE just as a “microphone” (configuring “Alexa” as a custom wake word in HA, so I don’t have to retrain my whole family to wake the devices). But I couldn’t figure out how that REALLY works. With Alexa, for example, you have an integrated speaker: you can talk, and if music is playing, it lowers the volume to hear your instructions, gives you the response, and then returns to the original volume, all without really “interrupting” the music. That behaviour should also happen without calling the assistant. For example, if I’m listening to music and someone sends an “announcement” to my specific speaker, or to all speakers in the house, it should do the same thing: turn down the volume, play the announcement, and turn the volume back up. You can also play audio on all speakers at the same time. The idea is to keep this UX without breaking the Sonos/Denon integrations, like audio groups and the other functionality the speakers have themselves.

Has anyone come across any content that could help me? I really don’t want to make a mistake (again) with this, because it’s not cheap, so I need help from someone with this experience, at least on speaker/mic brands, integration, and implementation (I don’t know if HAVPE mic + Sonos or Denon speakers is the best combination, or if there is something better out there). Thanks for your help.

You can probably get pretty close to Alexa with HA, but it will not be free.
The issue with voice assistants is the computing power needed for speech-to-text. If you want to do that locally, you need powerful hardware, such as a state-of-the-art graphics card or a similar solution, and that is costly.
Alternatively, you can subscribe to Nabu Casa, which provides that functionality in the cloud.

Everything else can be done fairly easily.
Sonos can sync music and there are other solutions like Lyrion or Snapcast.
Volume can be controlled with scripts.
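
As a rough illustration of the volume scripting mentioned above, here is a minimal Python sketch of a "duck, announce, restore" sequence sent through the Home Assistant REST API. The entity names (`media_player.kitchen`, `tts.piper`), the `duck_factor` value, and both helper functions are assumptions made up for this example, not something taken from an existing setup. Whether the music keeps playing underneath the announcement depends on the speaker integration, so this only covers the volume part.

```python
import json
from urllib import request


def build_duck_sequence(entity_id, original_volume, message,
                        duck_factor=0.3, tts_entity="tts.piper"):
    """Return the ordered (service, payload) calls for one ducked announcement.

    entity_id, tts_entity and duck_factor are illustrative assumptions.
    """
    # Clamp the ducked level to the 0.0-1.0 range Home Assistant expects.
    ducked = round(max(0.0, min(1.0, original_volume * duck_factor)), 2)
    return [
        # 1. Lower the volume of the speaker that is playing music.
        ("media_player/volume_set",
         {"entity_id": entity_id, "volume_level": ducked}),
        # 2. Speak the announcement on that speaker.
        ("tts/speak",
         {"entity_id": tts_entity,
          "media_player_entity_id": entity_id,
          "message": message}),
        # 3. Restore the volume captured before the announcement.
        #    A real script would wait for the speech to finish first.
        ("media_player/volume_set",
         {"entity_id": entity_id, "volume_level": original_volume}),
    ]


def call_service(base_url, token, service, payload):
    """POST one service call to the Home Assistant REST API."""
    req = request.Request(
        f"{base_url}/api/services/{service}",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Authorization": f"Bearer {token}",
                 "Content-Type": "application/json"},
        method="POST",
    )
    with request.urlopen(req) as resp:
        return resp.status


if __name__ == "__main__":
    # Dry run: print the calls instead of sending them.
    for service, payload in build_duck_sequence(
            "media_player.kitchen", 0.5, "Dinner is ready"):
        print(service, payload)
```

To send the calls for real, each pair would be passed to `call_service` together with the HA base URL and a long-lived access token. The same three steps could just as well be written as an HA script or automation.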

That being said, local voice assistants are still at an early stage and evolving fast, so be prepared to upgrade soon to stay in the game.

I know, but I don’t want to make another (high-cost) mistake like I did with the Alexa devices.

I know models can be tricky, which is why I’m planning to build a mini PC (not one of the prebuilt ones on the market, because they don’t have a graphics card). I think an RTX 3060 OC (12GB VRAM), a good processor like a Ryzen 7 7000 series, and 32GB RAM should be enough to run all that stuff: STT, TTS, a decent medium-sized (7B) local model, and also image AI for real-time stream analytics on some cameras, like Frigate with 10 streams or fewer (analysis at low quality, 720p 15 fps).

I think that should be enough, or would I need a 5090 (or similar) for that?
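
To make the sizing question concrete, here is a back-of-the-envelope VRAM budget in Python. Only the weight arithmetic (parameters times bits per weight) and the 10 streams at 15 fps are solid; every other figure is a placeholder assumption for illustration, not a measurement, and should be replaced with numbers from real testing.

```python
def weights_gb(params_billion, bits_per_weight):
    """Memory for model weights alone, in GB (1e9 bytes)."""
    return params_billion * bits_per_weight / 8


def vram_budget(card_gb, components):
    """Return (total GB used, GB of headroom left on the card)."""
    total = sum(components.values())
    return total, card_gb - total


components = {
    "llm_7b_4bit_weights": weights_gb(7, 4),  # 3.5 GB, plain arithmetic
    "llm_context_and_runtime": 1.5,           # assumption, not measured
    "speech_to_text": 2.0,                    # assumption, not measured
    "text_to_speech": 0.5,                    # assumption, not measured
    "camera_object_detection": 1.0,           # assumption, not measured
}

total, headroom = vram_budget(12, components)

# Worst case for the cameras: every frame of every stream is analysed.
frames_per_second = 10 * 15

print(f"estimated use: {total} GB, headroom on a 12 GB card: {headroom} GB")
print(f"worst-case camera load: {frames_per_second} frames/s")
```

Under these assumptions the 12GB card has room to spare, but the same 7B model at 16 bits per weight would need `weights_gb(7, 16)`, which is 14 GB for the weights alone, so the quantization level matters more than the choice between GPU tiers.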

The remaining question is how to “replace the Echo Dots”. I don’t know if HAVPE + Sonos is the best combo, or whether it would be better to use the commonly recommended 4-mic array, which I haven’t tried.
I also have questions about Spotify integration with Sonos and Music Assistant. I have sometimes had problems connecting Spotify to my Echo Dots: they just didn’t appear in the app, and if I said “Alexa, play Spotify” or something like that, nothing happened. It only happens occasionally. I have heard that a friend had similar trouble with his Sonos system, but I don’t know which system version he is using. I think all of this was fixed with the new Era devices, but I am not sure, and I don’t know whether it happens with the Music Assistant add-on too.

The behavior I was describing is not just the volume going down, but the simultaneous playback of the music (at low volume) and the notification (or the recording of a voice command) without interrupting the music. I mean giving the TTS response while the music keeps playing at low volume, at the same time.

If you have information about this please let me know.

I’m running them on a 3090 GPU, but based on the model sizes you mentioned, a 3060 should be totally fine. Sure, a 5090 would be better, but it’s also significantly more expensive.

It really comes down to the size of the models you want to run. You could even consider adding a second 3060 later for a dual-GPU setup if needed. But I’d recommend starting with just one 3060, getting familiar with it, and only upgrading if you hit limits.

I built a mini PC with a 3090, Ryzen 7, and 32GB RAM, which is probably overkill for my needs, to be honest. I run Proxmox on it with high availability, and I offload Whisper and Piper to it. I also use it to download OpenAI models as needed, all on the same box.

Like you, I invested heavily in Echo devices back when it made sense. I wanted voice control in every room (still do), but Alexa just never delivered. I’m now trying to move away from Big Tech. I’ve got the HA preview edition—built-in speakers are poor, but you can plug in an external one, which sounds great (though that means two devices instead of one). It’s still early days, so I expect improvements.

I also picked up a couple of Satellite 1 dev kits from Future Proof Homes. You’ll need to 3D print enclosures—they’re not exactly stylish compared to Echos and might not pass the “wife test.” But the community is growing, and I expect better enclosures soon. You can connect any speaker you like to them.

I’ve only got one Sonos setup (Arc + Sub), so I can’t really comment much there. Personally, I feel like going all-in on Sonos is just another ecosystem trap—like Echo. Some folks love it, but I’m not convinced the premium is worth it.

Test your models and setup. It’s not perfect yet; it’s good, but it’s just a start, so you might need to be a little more patient.