Year of the Voice - Chapter 1: Assist

This year is Home Assistant’s year of the voice. It is our goal for 2023 to let users control Home Assistant in their own language. Today, one month into 2023, we start our first chapter.

At Home Assistant we believe that technology is meant to be played with, and projects should be usable as soon as possible. Together with the community we can then iterate and refine. That’s why today, we’re delivering a basic experience for 22 languages to interact with Home Assistant. Oh, and we are also releasing some fun stuff that we cooked up along the way.

To watch the video presentation of this blog post, including live demos, check out the recording of our live stream.

Intentions

The core of a voice assistant is understanding the intention behind a spoken sentence: what does the user want to do? To extract these intentions, we created our own template sentence matching format and intent recognizer, named Hassil.

This new format is used by our new Home Assistant Intents project. The goal of this project is to collect home automation sentences in every possible language. Since its start a month ago, 112 people have contributed. The project now supports 22 languages, with 14 more in progress.
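
To give a feel for the format: Hassil templates support optional words in square brackets, alternatives in parentheses, and slots in curly braces. A simplified sketch of what a template file might look like (the real files live in the Home Assistant Intents repository):

```yaml
# Simplified sketch of a Hassil sentence template file,
# not copied verbatim from the intents repository.
language: "en"
intents:
  HassTurnOn:
    data:
      - sentences:
          # (turn|switch) offers alternatives, [the] is optional,
          # and {name} is a slot filled from a list of entity names
          - "(turn|switch) on [the] {name}"
          - "(turn|switch) [the] {name} on"
```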

Assist

We have added a new feature to Home Assistant: Assist. It allows users to use natural language to control Home Assistant. It is powered by Hassil and the sentences from the Home Assistant Intents project.

We want Assist to be accessible to as many people as possible. To do this, we made it work without requiring extra hardware – just update to Home Assistant 2023.2 and you can start! Through a combination of smart algorithms and sheer brute force (we are collecting a lot of sentences), we have been able to build a system that works for most common sentences. Support for more powerful, AI-powered intent recognizers might come in the future as an opt-in feature.

Assist is enabled by default in the Home Assistant 2023.2 release. Tap the new Assist icon at the top right of the dashboard to use it.

Assist documentation.

Assist on Android Wear

We want to make it as easy as possible to use Assist. To enable this for Android users, we have added a new tile to the Android Wear app. A simple swipe from the clock face shows the Assist button and allows you to send voice commands.

Assist on Android Wear documentation.

The new tile is currently available in the Android beta and will be part of the next Android release.

Assist via Siri and Apple Shortcuts

For Apple devices we have been able to create a fully hands-free experience by integrating with Siri. This is powered by a new Apple Shortcut action called Assist, which is part of the Home Assistant app. This shortcut action can also be manually triggered from your Mac taskbar, iPhone home screen or Apple Watch complication. We have two ready-made shortcuts that users can import from the documentation with a single tap to unlock these features.

Assist via Siri and Apple Shortcuts documentation.

The Assist shortcut will be available in the Mac and iOS beta channel today and will be part of the next release for iOS and Mac.

Custom Sentences

At Home Assistant we believe that every home is uniquely yours and that technology should adapt to you, not the other way around. That’s why we have architected Home Assistant to let users extensively customize their experience. Our Assist feature is no different.

  • Are you into Game of Thrones and want every response to be “hodor”?
  • Want to turn on lights in rooms by saying “Hocus pocus living room”?
  • Want to trigger your party mode script using a custom sentence?

Assist includes support for custom sentences, responses, and intents, allowing you to achieve all of the above, and more. We’ve designed the custom sentence format so that it can be easily shared with the community.
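
As a rough sketch of how the examples above could be wired up in configuration.yaml (the script entity here is made up; the documentation below has the exact schema):

```yaml
# configuration.yaml — a minimal sketch, assuming a script.party_mode
# script already exists; not the definitive syntax.
conversation:
  intents:
    # Extra sentences for the built-in HassTurnOn intent;
    # {area} is filled with one of your area names
    HassTurnOn:
      - "hocus pocus {area}"
    # A custom intent with its own trigger sentence
    ActivatePartyMode:
      - "initiate party mode"

intent_script:
  ActivatePartyMode:
    action:
      service: script.party_mode
    speech:
      # A custom response, as in the Game of Thrones example
      text: "hodor"
```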

Read the documentation on how to get started.

In a future release we’re planning on adding a user interface to customize and import sentences.

Custom Assist engines

By default, Assist is powered by our own intent recognizer. It is local, but it is limited to controlling devices. Maybe you want to ask a wider range of queries, or you are looking for a conversational AI that will make up responses and present them as the truth. For such cases, Assist supports swapping out the engine that handles all interactions.

The Home Assistant 2023.2 release includes two alternative Assist engines that you can enable: Google Assistant and OpenAI GPT-3.

The Google Assistant Assist engine is able to control your devices if you have linked up your Home Assistant instance to Google Assistant.

All ways to interact with Assist will work, as they are not bound to the Assist engine being used. So if you ever wanted to use Google Assistant on your HomePod, now you can 🤭

The OpenAI GPT-3 Assist engine will process all your interactions using GPT-3, a sibling of the infamous ChatGPT. It is not able to control or help you automate your house, and anything you ask it may or may not be factually correct. But it can be fun!

In a future release we’re planning to make it possible to configure multiple Assist engines to handle interactions.

What’s next

For Year of the Voice - Chapter 1 we focused on building intent recognition into Home Assistant while relying on Google and Apple for the hard parts (speech recognition). This gave us the fastest path to get something into the hands of the community to play with.

We will continue collecting home automation sentences for all languages (anyone can help!). Updates will be included with every major release of Home Assistant.

Our next step is integrating Speech-to-Text and Text-to-Speech with Assist. We don’t have a timeline yet for when that will be ready. Stay tuned!

Credits

A lot of people have worked very hard to make all of the above possible.

Technology: Mike Hansen, Paulus Schoutsen, Daniel Shokouhi, Zac West, Rosemary Orchard, Tronikos

Language Leaders: @AalianKhan, @Ahmed-farag36, @alpdmrel, @arunshekher, @auanasgheps, @benjaminlecouteux, @bluefoxlee, @cibernox, @cvladan, @davefx, @dinhchinh82, @dsimop, @duhow, @easterapps, @ErnestStaug, @fadamsen, @flexy2dd, @gabimarchidan, @haim-b, @halecivo, @HepoH3, @hertzg, @hristo-atanasov, @huusissa, @joaorgoncalves, @larsdunemark, @leranp, @LubosKadasi, @makstech, @mojikosu, @MTrab, @nagyrobi, @schizza, @Scorpoon, @skynetua, @spuljko, @tetele, @TheFes, @Uriziel01, @xraver, @zubir2k

Voice Community: @Alexivia, @Atalonica, @AwesomeGuy000, @BossNeo, @CedricFinance, @Davidsoff, @EmilZackrisson, @FragMenthor, @InfiniteBed, @Kalma-House, @Licmeth, @Marlo461, @N3rdix, @Nismonx, @Robin-St, @TaQuangTien, @ThomDietrich, @TomaszPilch, @Wojciechgc, @alessandroias, @bemble, @berendhaan, @dejan2101, @dependabot[bot], @dobromir-hristov, @frenck, @hugovsky, @iddiek, @jfisbein, @jharrvis, @jorclaret, @kamildoleglo, @kblin, @khymmera, @kroimon, @lellky, @ludeeus, @lukahra, @lunmay, @mardito, @martindybal, @mib1185, @michaelmior, @orrc, @pckahrs, @piitaya, @pmentis, @poltalashka, @rPonuganti, @rechin304, @relust, @rickydg, @rpochot, @rrakso, @rumbu13, @sanyatuning, @tasmin, @thecode, @waltlillyman, @witold-gren, @x15pa3ck15x, @yuvalabou


This is a companion discussion topic for the original entry at https://www.home-assistant.io/blog/2023/01/26/year-of-the-voice-chapter-1/

Amazing stuff as always!
Thank you

I’m on beta 1 now and I must have missed something. Yes, the light is light.dining_table and its name in the GUI is dining table. I guess it is beta.

EDIT: Actually, I checked. It is shown in the GUI as dining table lights (plural), and asking it to turn on the dining table lights works.

When I saw your first announcement of “Year of the Voice” I was thinking I could maybe contribute here, but the fact is I’m Danish and have lived the last half of my life in Sweden, so my spoken language is FU; not even Alexa/Google get it right, and I’m too old to try to improve my speech (yeah, I also mumble a lot). So I wonder: is there any hope that HA would be able to interpret Swedish with a strong Danish dialect?
I tried with 3 languages in Google Assistant: total chaos. I never knew whether it understood/interpreted what I said. Did it think I talked Danish or Swedish? Or was it so confused that I just got an English answer back? You get my “concern”? Well, it’s not a concern, I dropped it. Besides, starting every “command” with “Hey Google” or “Ok Google” makes me feel like I’m repeating myself in a ridiculous way. I think it’s better I stay out of this topic until there comes a voice assistant that learns my voice, only responds to my voice, and interprets commands whether they’re Danish, English or Swedish, and then responds back in e.g. English (or a language of choice).

Great work so far! I have a question from the perspective of an integration developer. Will it be possible for integrations to provide intents? For example, the Mazda integration has a button entity that starts the car engine. Could the integration provide a customized intent that enhances the user experience by allowing users to say “Start my car” rather than a generic (built into HA) sentence that I assume would be something like “Activate/press the ‘start engine’ button”? Or could someone theoretically build a joke-telling integration that responds to “tell me a joke”?

Ah, glad you figured it out! This is a great example of what we’ll be focusing on next: generating different forms of entity names. For English this is usually just singular and plural, but it gets much more complicated for other languages :grinning_face_with_smiling_eyes:

It should be possible to “fine tune” a speech-to-text model for this purpose. This would be local only though, of course.

Yes! It’s already possible for integrations to register intents, but they can’t easily add custom sentences yet in code.
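
In the meantime, the “Start my car” example can already be approximated from the user side with the custom sentence support described in the blog post. A hypothetical sketch (the button entity id here is invented; substitute your own):

```yaml
# configuration.yaml — hypothetical sketch;
# button.mazda_start_engine is an invented entity id.
conversation:
  intents:
    StartCarEngine:
      - "start my car"

intent_script:
  StartCarEngine:
    action:
      # Press the integration's engine-start button entity
      service: button.press
      target:
        entity_id: button.mazda_start_engine
    speech:
      text: "Starting your car"
```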

[Picard walks into room]

Computer, lights!

… and there was Lights!

Thanks, love your work. I’ve been fiddling with Rhasspy on and off for ages but never got much done. There’s an incentive now.


Impressive, great work everyone involved! :clap:t2:

I have one idea that I’m not sure where to put… for Android Wear OS it would be cool if you could start Assist from a complication! You know, those small shortcuts on watch faces.

One swipe less to chat with Assist :slight_smile:


Yeah, that’s a nice idea. Adding support for binding the triggering of Assist to a hardware watch button would be even better. And perhaps also submitting the query after you have stopped speaking, like Google Assistant does; then there would be no need for a confirmation tap.

Is there a way to activate the voice assistant by speaking, and not just by clicking on the Assist icon?


I watched the live stream and I think it’s awesome how Home Assistant is evolving.
I was thinking about voice, however, and everything being local. A few months ago I came across something that might already have local voice control, but it seems to only be available at a premium and marketed towards the high-end smart home market like Savant, Control4, ELAN, etc. The product I found was Josh.ai. If Home Assistant can develop something similar at a much lower price point, I’d be in.

Also, how would HA’s approach be different from Josh.ai’s?

Hey, that’s great news. Do I get it right that this part only covers the interpretation of the spoken words, while the speech recognition itself is left to Google or Apple?

If we could make this work with offline speech recognition (Whisper is very good and open source!), that would be a tremendous advantage over other voice assistants, because people do care about privacy!

Wow, this is just the start of the year and it already looks awesome. I wonder what it will look like at the end of 2023. Keep up the good work.

Really looking forward to playing with this and seeing if it can replace Alexa for a lot of things.

One thing Alexa can now do is a timed action - e.g. “turn on the lights in 5 minutes”. Will this be able to do that?

And perhaps also “turn on the lights for 20 minutes” too.

Works great, though I can’t seem to get the mic working in either Firefox or Edge. Is it me, or is it not supposed to work yet?

It seems like most browsers have effectively blocked microphone access over plain HTTP. It sometimes works on localhost, but there are all sorts of hoops to jump through. It may be broken for a while :frowning:

Yeah, I also figured this was the culprit. That is such a weird decision by the browser makers; it should be a simple “will you allow this, even though it’s over HTTP?” prompt. Forcing HTTPS when the URL is HTTP is absurd.
PS: it’s not only the microphone… it seems it’s also camera, location and notifications.

EDIT: You might need to include this option in the cookie (allow microphone / ask for permission when clicking the mic, and store it in the cookie).

Nice request! We can definitely look into this! Feel free to submit a feature request in the meantime :slight_smile:
