Voice Chapter 8 - Assist in the home today

As you have probably already read, we launched our Home Assistant Voice Preview Edition today. It is the culmination of the past several years of open-source software progress on Home Assistant’s home-grown voice assistant, Assist. A sizable group of dedicated developers has been working together on adding and honing its many features, and if it’s been a while since you tried Assist, you should use this launch as a chance to jump back in and see the progress we’ve made.

Home Assistant Voice Preview Edition has been launched to build on this work, continuing the momentum we’ve already built and accelerating our goal of not only matching the capabilities of existing voice assistants but surpassing them. We had an early production run of Voice Preview Edition (a preview preview 😉), and we tried to get them in the hands of as many of our language leaders and voice developers as possible - and we’re already seeing the fruits of their efforts with language support improving over the past month alone!

I’d like to highlight in this voice chapter all the things you can do with Assist today. I also want to give the state of our development, what the limitations are, and where your support can be best applied.

Assist in the home today

Origins of Assist

Early versions of Assist via chat - things have come a long way

Voice control for Home Assistant goes back further than most people assume, with some of the groundwork we use today being added as far back as 2017. The major turning point came when we refocused our efforts and declared 2023 the Year of the Voice. This was an effort to focus development and find areas where our community could make the most impact. During the Year of the Voice, voice was added to Assist, intents were improved, languages were added, wake words were created, and we established great local and cloud options for running voice. Shortly after the Year of the Voice, many more features were added, including integrated AI, timers, and even better wake words. Year of the Voice got the ball rolling, and Voice Preview Edition will continue its momentum.

Commands

Assist is the underlying technology that allows Home Assistant to turn commands (“turn on the light”) into actions (light.turn_on). Commands, or intents as we call them, allow you to control pretty much every aspect of your smart home, including on, off, play, pause, next, open, close, and more. We also have intents that give you helpful information like the time, weather, temperature, and so on. Lastly, there are a bunch of other useful miscellaneous things, like adding items to a shopping list and setting timers. If you’re interested, there is a full list here.
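
To make the sentence-to-action idea concrete, here is a minimal sketch in Python. It is not how Assist is implemented. The `SENTENCES` table and the `match_intent` function are made up for illustration, and only `light.turn_on` comes from the text above (`light.turn_off` is assumed to follow the same naming pattern).

```python
import re

# Hypothetical sentence templates mapped to actions. Only light.turn_on
# appears in the post; light.turn_off is assumed for illustration.
SENTENCES = [
    (re.compile(r"^turn on (?:the )?(?P<name>.+)$"), "light.turn_on"),
    (re.compile(r"^turn off (?:the )?(?P<name>.+)$"), "light.turn_off"),
]


def match_intent(text):
    """Return (action, target name) for a command, or None if nothing matches."""
    # Normalize case and drop trailing punctuation before matching.
    cleaned = text.lower().strip().rstrip(".!?")
    for pattern, action in SENTENCES:
        found = pattern.match(cleaned)
        if found:
            return action, found.group("name")
    return None
```

Calling `match_intent("Turn on the light")` returns the action together with the spoken device name, while a sentence outside the table returns `None`. A real matcher also has to cope with many phrasings per intent and per language, which this toy version ignores.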

Timers

When we asked our community, timers were a top-requested ability. You can not only set a timer and pause, increase, decrease, or cancel it, but you can also set commands to trigger after a set amount of time, for example, “turn off the TV in 15 minutes”. You can also just say “Stop”, without a wake word, to silence the timer’s alarm. On our Voice Preview Edition, when you set a timer, the LED ring counts down the last seconds and flashes when it’s done.
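
A delayed command such as “turn off the TV in 15 minutes” can be thought of as an ordinary command plus a delay. The sketch below shows one way to split the two. The `DELAY` pattern and the `split_delayed_command` function are hypothetical, English-only, and not taken from Assist.

```python
import re

# Hypothetical pattern for a command followed by a delay, e.g. "... in 15 minutes".
DELAY = re.compile(
    r"^(?P<command>.+?) in (?P<amount>\d+) (?P<unit>second|minute|hour)s?$"
)
SECONDS_PER_UNIT = {"second": 1, "minute": 60, "hour": 3600}


def split_delayed_command(text):
    """Split a sentence into (command, delay in seconds); the delay is 0 if none is given."""
    cleaned = text.lower().strip().rstrip(".!?")
    found = DELAY.match(cleaned)
    if not found:
        return cleaned, 0
    seconds = int(found.group("amount")) * SECONDS_PER_UNIT[found.group("unit")]
    return found.group("command"), seconds
```

Requiring a number after “in” keeps sentences like “turn off the light in the kitchen” from being mistaken for delayed commands.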

Exposing devices and Aliases

This sets us apart from other voice assistants: we allow you to expose and effectively hide devices from your voice assistant. For example, you could choose not to expose a door lock but instead just expose the sensor that knows if the door is closed. It puts you in the driver’s seat on what voice can do in your home. We also introduced aliases to allow you to give devices multiple names, allowing you to speak more naturally with Assist.
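
As a rough illustration of how exposure and aliases interact, the sketch below looks up a spoken name among exposed devices only. The `Device` class, the `find_exposed` function, and the entity IDs in the usage note are hypothetical and do not reflect Home Assistant’s real data model.

```python
from dataclasses import dataclass, field


@dataclass
class Device:
    """Hypothetical device record; not Home Assistant's real data model."""
    entity_id: str
    name: str
    exposed: bool = True
    aliases: list = field(default_factory=list)


def find_exposed(devices, spoken_name):
    """Return the exposed device whose name or alias matches, else None."""
    wanted = spoken_name.lower()
    for device in devices:
        names = [device.name] + device.aliases
        # Unexposed devices are skipped even when the name matches.
        if device.exposed and wanted in (n.lower() for n in names):
            return device
    return None
```

With a hidden `lock.front_door` and an exposed `binary_sensor.front_door` that has the alias “front door”, asking for “front door” finds the sensor, and asking for the lock by name finds nothing.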

Room context

If you tell your Assist hardware what room it is in and ensure other devices are organized by room, you can give commands like “turn off the lights”, and without specifying anything, it will turn off the lights in the room you are in. This feature also works with media players (play/pause/next) and timers.
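
The room-context rule can be sketched as a default: if the command names no room, use the room of the device that heard it. The `resolve_targets` function and the dictionary layout below are assumptions for illustration, not Assist’s actual logic.

```python
def resolve_targets(entities, domain, satellite_area, spoken_area=None):
    """Pick entity IDs of one domain, defaulting to the area of the voice device."""
    # An explicitly spoken area wins; otherwise fall back to where the device is.
    area = spoken_area or satellite_area
    return [
        entity["entity_id"]
        for entity in entities
        if entity["domain"] == domain and entity["area"] == area
    ]
```

So “turn off the lights” heard by a device in the kitchen would resolve to the kitchen lights only, while “turn off the lights in the bedroom” would override that default.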

Wake words

Our community is donating small amounts of time to improve wake words with our tool.

Wake words are the unique phrases that prompt a voice assistant to listen and start processing a command. Wake words originally had to be processed on Home Assistant via an add-on like openWakeWord, meaning the Assist hardware needed to continuously stream audio to Home Assistant. Shortly after the Year of the Voice, microWakeWord was released, which brought wake word processing on-device for faster responses. It is improving fast thanks to our community using our fast and easy tool to donate samples of their voice. There is a growing list of wake words, and the on-device options include “Okay Nabu” (default and most reliable), “Hey Jarvis”, and “Hey Mycroft”. Both of these wake word engines were built by the Home Assistant community and are open source, giving the world two great free and open wake word engines!

Speech Processing

The Assist pipeline in all its glory

Assist can’t understand spoken words and needs something to take that audio and turn it into text - all this together is called an Assist pipeline. This speech processing is really CPU intensive, so it can’t happen on the voice assistant hardware, and sometimes your Home Assistant system can’t even handle it. One important step we made was adding speech-to-text and text-to-speech capabilities to Home Assistant Cloud, which allows low-powered Home Assistant hardware to offload speech processing to the cloud. Home Assistant Cloud doesn’t store this data or use it for training - clouds don’t get any more private than ours. It is also the most accurate and power-efficient way to process speech. We’ve put considerable effort into local speech processing, building the add-ons and a new protocol they use to speak to Home Assistant, but they are very reliant on language support from the community.
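
Conceptually, a pipeline chains speech-to-text, intent handling, and text-to-speech. The sketch below shows only that chaining. `run_pipeline` and the three stand-in stages are hypothetical, and no real speech engine is involved.

```python
def run_pipeline(audio, speech_to_text, handle_intent, text_to_speech):
    """Chain the three stages: audio -> text -> response text -> audio."""
    text = speech_to_text(audio)
    response = handle_intent(text)
    return text_to_speech(response)


# Stand-in stages so the sketch runs without any real speech engine.
def fake_stt(audio):
    return audio.decode("utf-8")


def fake_intent(text):
    if text == "turn on the light":
        return "Turned on the light"
    return "Sorry, I didn't understand"


def fake_tts(text):
    return text.encode("utf-8")


spoken_reply = run_pipeline(b"turn on the light", fake_stt, fake_intent, fake_tts)
```

Because each stage is passed in as a function, a cloud or a local engine could be swapped in for any of them without changing the chain itself.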

Language support

See if your language is supported with our checker.

Assist aims to support more languages than other voice assistants, and this has been a massive undertaking for our community - we need more help. The first step for language support is getting the commands (intents) right, and we have over 25 major languages that are ready to use today. Our wake words are also getting better at understanding different accents thanks to our Wake Word Collective tool.

Text-to-speech

We built our own text-to-speech system, Piper, and it now supports over 30 languages. It’s a fast, local neural network-powered text-to-speech system that sounds great and can run on low-powered hardware (it’s optimized for the Raspberry Pi 4!). It was built with the voices of our community, and if you don’t see your native tongue, add your voice!

Speech-to-text

There is one area that holds back the rest of our language support more than others, and that’s local speech-to-text. Building a full speech-to-text model needs big compute resources and terabytes of samples, which is currently outside our reach. We use Whisper for local speech-to-text processing, an open-source project from OpenAI, and we’re grateful it exists. For some languages, it works great and doesn’t require a lot of system resources to run well, but for others, you need a pretty beefy system to get acceptable results. In our opinion, only about 15 languages are ready to be run locally on reasonable hardware (an Intel N100 or better) - that’s why before you begin dreaming up your perfect all-local setup, we recommend checking language support.

We’re always looking for new solutions for low-powered hardware, and are now building another tool that uses much less complex sentence recognition. This could even run on a Raspberry Pi 4, but it would only be able to identify predefined sentences, so if you go off script you may need to call in an AI to help Assist understand your needs. Our language leaders are hard at work putting together the needed translations, but if you want to learn more, visit Rhasspy Speech.

In general, even when your language is supported, you’ll almost always get better results from Home Assistant Cloud. Use the free trial to see what works best for you. You can also use both: we know someone using an automation to switch the Assist pipeline to an all-local setup when their internet is down.
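
The decision behind such a switch is simple and can be sketched as follows. This is not a Home Assistant automation. The function names, the pipeline labels "cloud" and "local", and the host and port used for the connectivity check are all assumptions.

```python
import socket


def internet_is_up(host="1.1.1.1", port=53, timeout=2.0):
    """Best-effort connectivity check: try to open a TCP connection."""
    # The host and port are assumptions; any reliably reachable service works.
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def choose_pipeline(online, cloud="cloud", local="local"):
    """Prefer the cloud pipeline when online, otherwise fall back to local."""
    return cloud if online else local
```

A periodic job could call `choose_pipeline(internet_is_up())` and apply the result. How the chosen pipeline is actually applied in Home Assistant is left out here.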

AI and Assist

Our default local conversation agent mixed with AI is great for natural language and speed

Another aspect where we beat the competition hands down is the integration of AI into our voice assistant. You can choose from some of the biggest cloud AI providers like ChatGPT, Google Gemini, and Claude (paid accounts required). You can also run it locally via Ollama if you have a modern graphics processor with enough VRAM, allowing you to build the most capable offline voice setup around.

Our intents (Assist’s built-in sentences) are getting better at understanding most commands, but AI processes commands in natural language, meaning if you get the device’s name ever so slightly off, it can still figure things out. It also provides the ability to ask outside the built-in intents. For instance, if you tell it “It’s a bit cold in here”, it may raise the temperature on your thermostat, but it could forgo any home control and just tell you to put on a jacket - results are not yet consistent. More useful is its ability to take multiple sensors and provide context. For instance, you could ask it for an air quality report, and it could review the CO2 levels and tell you to open a window it observes is shut. All this is experimental, and having an AI control your home is not for everyone, but what’s important is that you have the choice.
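
One way to picture mixing the built-in sentences with an AI agent is a fallback: try the fast local match first and only hand unmatched sentences to the AI. The sketch below assumes that ordering. `converse`, `match_builtin`, and `ask_ai` are hypothetical names, not Home Assistant APIs.

```python
def converse(text, match_builtin, ask_ai):
    """Try the built-in sentences first; hand anything unmatched to the AI agent."""
    result = match_builtin(text)
    if result is not None:
        # Matched locally, so no AI call is needed.
        return "local", result
    return "ai", ask_ai(text)
```

With this ordering, “turn on the light” never leaves the local matcher, while “It’s a bit cold in here” ends up with the AI agent, whose answer may vary as described above.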

Conclusion

So many innovations and improvements for Assist have happened in the past couple of months, and this speaks to the power of having good hardware to build our software on. Voice Preview Edition is the best open voice hardware available today, and even with it only in the hands of a couple of hundred people, it’s making a noticeable difference, whether that’s writing code, improving language support, making blueprints, or even just reporting bugs. The momentum we will build having this in the hands of thousands will be game-changing - it’s why we’ve declared that the era of open voice assistants has arrived.

In the comments sections, we always have a couple of people saying, “but I don’t use voice, what about improving (this or that)”. The good news is that improvements to Assist and to Home Assistant’s other features are already happening in tandem (check out our roadmap for the complete picture of our priorities). In the end, only a fraction of our development goes towards voice, and our budget is what Amazon’s voice team probably spends on pizza parties 😆. A great side effect is that the problems we’re solving with voice are benefiting other parts of Home Assistant; for example, our integration of AI was driven by voice.

We really think voice is an integral part of a well-rounded smart home ecosystem. It’s especially important for improving the accessibility of home control to all members of the household. There need to be real options in the space, most importantly ones that give you full control and a real choice on privacy.

Home Assistant Voice Preview is available at retailers today.


This is a companion discussion topic for the original entry at https://www.home-assistant.io/blog/2024/12/19/voice-chapter-8-assist-in-the-home

Hi Mike, amazing work on this! One question: it seems some entities are exposed to Assist by default, but not all. Is it documented what is auto-exposed?

I want object recognition cameras someday, “hey home, where did I put my glasses?”

Thank you all for your amazing products and services!

Amazing work and excited to get the HA Voice Preview! You had mentioned that:

someone using an automation to switch the Assist pipeline to an all local setup when their internet is down

Any chance there is a Blueprint for this automation? 🙂

Is there any roadmap for this hardware? I’m interested in local support for the Polish language.

It’s a great piece of hardware, but it’s big BS how much sellers differ on price…
Resellers now ask 20 - 25 euro more than the price listed in the stream… and that’s without shipping… are you kidding me? And then they just add a shipping fee to it, so basically they earn almost 1.5 times the price…

For example, the YouTuber Everything Smart asks 72… kinda BS.
If I add shipping (it was 20 extra), I will need to pay almost 100 euro, lol…

While in Spain I pay 55 plus 8 euro shipping (sold out).
And in Germany it’s the same price as the YouTuber, also 72 euro, plus 12 euro shipping (still selling).

I mean, look at all the resellers, they differ hugely in price. Just imagine, buying 3 I’d get 1 for free instead, haha.

// edited

Keep up the awesome work on Assist and voice Mike and Kevin!

Please also consider adding more/other wake words too 😉

It is £48 from Everything Smart Home. Lewis runs an honest business. If you want it shipped to the EU and not pay extra then do not buy it from the UK. I mean, you ever heard about Brexit?

If you use words that are too common in natural conversation as wake words you will get a lot of false positive triggers. Home, assist, open and such are way too common. Nabu and Jarvis on the other hand…

Everythingsmart is selling it cheap though?
It’s £48 delivered in the UK (it’s a UK store) which is $60, that’s including our 20% sales tax as well! So it’s actually cheaper than the MSRP US price by a fair bit.

You are typically much better off ordering from a local retailer in your country, especially since Brexit screwed everyone, so costs aren’t the same.

Lewis even mentioned this in another video.

The price includes import tax, something that is often not included in the displayed price but will be added later and comes as a surprise when the item arrives at your door.

And of course, as others have mentioned, the blessing of Brexit (the majority voted for it, so it must be good, right) has brought so many improvements and has made everybody’s lives so much better.
You must be living under a rock if this has not made its way to your part of the world.

I bought from everything smart home 2 days ago. There is no advertised seller for NZ so I looked overseas.

Today the exchange rate tells me USD 59 is NZD 104.50. They charged me $109, plus $25 postage. Given that the currency rate has worsened for NZ in those 2 days, I am OK with that.

Love the new device, even though I don’t use voice or text commands to interact with HA currently.

I’m wondering if this new platform could also be used to recognize certain events like smoke alarms or unusual sounds like glass breaking.

Looks like the Wyoming Rhasspy Speech Docker image has not been updated?

I wanted to try it, but can’t 🙁

That addon installs fine here. What exactly are you experiencing?

I created a compose.yml file and can’t pull the image. I see this error:

not found: manifest unknown: manifest unknown

I assume you are running

docker run -it -p 10300:10300 rhasspy/wyoming-rhasspy-speech

Try

docker run -it -p 10300:10300 rhasspy/wyoming-rhasspy-speech:1.0.0

:1.0.0 works, but the latest tag is 1.4.3.
I have :1.0.0 running, but I can’t add it as a service in HA for some reason.
Maybe v1.0.0 is not compatible with the current version of HA 🤷

Does :1.4.3 work?

No, I can’t pull the image with that tag. Same with the stable and latest tags.