I have ordered the Home Assistant Voice Preview Edition, and I want to do local voice recognition. I haven’t committed to specific hardware for that, and I would love to get some ideas of what the tradeoffs are.
There have been some good threads which take a deep dive into specific approaches, but I’m looking more for where to start, and what expectations to have in terms of hardware versus performance.
My goals:
Voice recognition in English for several users. Basic “turn this on” type commands should be very reliable. More complex commands would be fun.
Not too much lag - a few seconds is probably okay.
Modest power usage. My HA Yellow with compute module 5 uses about 4 W when it is mostly idle, while my 32 GB 24-core desktop uses 40-60 Watts even with minimal background tasks and the monitor powered off. I don’t want to eat 40 Watts 24 hours a day to handle a handful of voice events.
I know that these goals are vague, but again I’m looking for general guidance, not exact wattage and performance figures.
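To put rough numbers on the power goal, here is a minimal sketch that converts a constant power draw into energy per day. It assumes the draw is steady around the clock, which is a simplification, and it uses only the wattages quoted above. The `daily_kwh` helper is just an illustrative name.

```python
HOURS_PER_DAY = 24


def daily_kwh(watts: float) -> float:
    """Energy used in one day at a constant power draw, in kWh."""
    return watts * HOURS_PER_DAY / 1000


# Wattages taken from the post above; constant draw is an assumption.
devices = [
    ("HA Yellow with CM5, mostly idle", 4),
    ("Desktop, low end of range", 40),
    ("Desktop, high end of range", 60),
]

for name, watts in devices:
    print(f"{name}: {daily_kwh(watts):.3f} kWh/day")
```

At 40 W the desktop comes to about 0.96 kWh per day, roughly ten times the HA Yellow’s 0.096 kWh.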
What am I willing to put into it?
I’m happy to go through fairly complex setup steps, but I don’t want to become an expert on this.
I’m fine with assembling things, working with circuit boards, Linux command line, and programming in most of the common languages. I just don’t want this to become a huge job.
Hundreds of dollars are okay, not thousands.
In addition to the HA Yellow I have an idle Raspberry Pi 5 and some mediocre laptops sitting around. Also an older graphics card based on the GTX-1070. No idea whether any of this helps.
So, please answer any way you find useful, or take a shot at these:
What can I do with just my HA Yellow with CM5?
Is the Whisper Base model a good target for flexible but reasonable capabilities?
Is it crazy to expect to run a decent model on hardware that doesn’t consume a kWh every day?
Is offloading work to my spare Raspberry Pi 5 of any use?
Actually if you have Home Assistant (HA) running on HA Yellow, and you’ve just bought a Voice PE then you have committed to all the hardware you need to get started with Voice and satisfy your goals.
The Voice PE should be easy to install and set up using the default setting of HA Cloud for the heavy computational bits (Speech-To-Text and Text-To-Speech). That’s plenty to start with and get familiar with voice.
There is of course plenty of tinkering, upgrading and expansion which you can do - and it sounds like you are keen to get stuck in, so that’s great … but I suggest you take it one step at a time because it’s too easy to get overwhelmed and confused with all the options.
I see three main directions for expansion:
Multiple satellite devices, so you can use HA Voice in different rooms. There is a variety of devices and options available here, but many can essentially be considered variations on the Voice PE.
Bring the STT and TTS local, by adding another computer and graphics card. The more powerful the CPU and GPU, the better your result will be, and I suspect a RasPi 5 or GTX-1070 would be considered minimal.
General AI, rather than just controlling HA.
I’m not knowledgeable enough to answer your specific questions, other than repeating that you should start by having a play with what you have now.
Thanks. This is the sort of input I need. I’ll probably just need one satellite, at least at first. As you suggest I’ll start off with the HA cloud, where I already have a subscription. After using that for a while I should have a better idea of what I really need.
I do want to look into moving STT and TTS to a local device before long, so if anyone has suggestions for a lower-energy but effective setup, I’m still listening.
I also ordered a premade voice assistant, only to play with it (arriving today), because everything I can find, whether it’s DIY or off the shelf, is a sh*t device.
Ideally the ultimate voice assistant would look like this:
Pain Points:
1- Voice assistants only act as voice assistants, not as a great music player as well.
2- No AirPlay. In all the builds I’ve only seen one guy come close to a perfect voice assistant, somehow adding AirPlay to it to stream music.
3- Small 3-watt speakers, terrible volume and frequency response.
4- Not discreet enough. Who wants another thing to plug in, when you can have an out-of-the-way voice assistant embedded in the ceiling?
I’ve been researching the matter in my spare time to create just that: a voice assistant that is embedded like a recessed ceiling speaker, with great audio capture and a good enough frequency response to stream music too. But there are other hurdles, as a voice assistant can’t do all of these things on its own without a server to run AirPlay. So this isn’t the end of the road for me, but it is a clear obstacle in the way of my ultimate voice assistant experience. I have designed a custom PCB that can do all of these things as well as power a full-sized driver (at least a 4-inch speaker, car-sized speakers). Then, when all of that is done, I’ll head over to CAD and design a mount/enclosure to hang it in the ceiling.
But I haven’t gotten to finish that project because I’m working on several integrations at once while bug fixing the one I did release, and also renovating my home.
That pretty much comes down to wonderful hi-fi music not being their primary goal, and some objectives conflicting.
If you look through this forum you will see that integrating music sources, and outputting to quality audio systems, are things people are still working on.
I look forward to hearing more about your project, the challenges and compromises, as you progress with it.
I figured lol, but if you’re gonna make an assistant but still have an Amazon Echo or Apple HomePod for music or pretty much anything, then why make an assistant lol. If I followed in those footsteps I’d have to explain to my family why they have to ask Siri to play music in the living room but ask the Home Assistant satellite to do other tasks.