2023: Home Assistant's year of Voice

I find the dashboards just fine.
The autogenerated one might be a mess, but there is no way to make an autogenerated dashboard that shows the wide spread of sensors and active entities that are possible in HA in a way that suits everyone.
HA provides a way to make your own dashboard setup, and themes are an option to spice it up further.
If you are arguing for a better autogenerated dashboard, then you will probably end up disappointed.
I know a new take on it is coming at some point in the future, but it will most likely only be 50% of what the userbase wants, because the needs, wishes and requirements are so diverse from user to user.

2 Likes

I would be interested in local voice recognition. But if the microphone has to be on the Home Assistant host computer, I’m out.

I have an Alexa device in every room of my house including the laundry room, bathroom and basement. I also have Alexa Auto in my cars and Alexa Frames for my prescription glasses. I have some 50 devices that I control through Alexa.

My first requirement before even trying a voice add-on would be that the HA Voice device would have to be stand-alone (the way that ESPHome devices are standalone). This would also make it very easy to know where the voice commands are coming from. For example, in Alexa you have to say where you are when you give a command. I should be able to walk into a room and say “… turn on the light” and the light in that room would turn on.

That said, let us know if there will be a need for beta testing.

It’s hard to see how Rhasspy will be competitive with the accuracy or ecosystem of Google or Alexa. I had tried voice recognition with systems before Google Home (I have about 10 hubs in the house now), but the accuracy was just OK (using HomeSeer). My wife refused to use it: when it missed, she just stopped trying.

With the Google Homes all linked into HA, she uses it all the time now! So do the kids. I could put up with specific phrases and speaking in a way to help it understand, but not so with the rest of the family.

Also, the Google Homes look nice in the house. Some 3D-printed case is not going to be allowed on tables and out in the open where everyone can see it. My wife doesn’t mind meh-looking gadgets as long as they are out of sight. That is not good if you are trying to do VR.

And Google has a large ecosystem of devices at great prices. I love the little $25 Lenovo units that look like clock radios. And the Nest Hub screens are super useful in the kitchen, so it’s worth the $60 for them. I have a hard time seeing how Rhasspy can compete with this!

If HA developers are going to joust at that windmill, it will be a huge waste of time and energy and distract from the HA core mission. However, if HA can improve the general utility of voice even on Alexa and Google Assistant platforms, maybe that would be worth it. I really like the Nabu Casa Google integration, though with a lot of devices the interface is a bit clumsy and could use some work. And being able to natively make announcements to Google Assistant devices and resume playback of media etc… without having to add blueprints and plugins would be great. That should really be part of the Nabu Casa integration, as well as knowing which device initiated the commands and being able to interact in a dialog.

So I think there is a lot of great work to do to make HA truly voice friendly (listening and speaking), but not to try and outdo Google and Amazon here, especially on the device side.

3 Likes

What is the spec per user of these supercomputers? How many requests per second do Amazon or Google get to their voice command servers?

Very good summary, I’m fully sympathizing with it.
Just one comment: Google speakers actually know where they are located, so if you say “turn on the light” it will turn on all the lights in that room only. Same with turning off. And with a few other commands, e.g. adjusting HVAC or media player volume.
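
A minimal sketch of how that room awareness could work, assuming each microphone device is assigned to an area. The device names, area names and entity IDs below are made up for illustration and are not taken from Home Assistant or Google:

```python
# Hypothetical registry data; a real setup would read this from the
# assistant's own device and area configuration.
SATELLITE_AREA = {"kitchen_mic": "kitchen", "bedroom_mic": "bedroom"}
AREA_LIGHTS = {
    "kitchen": ["light.kitchen_ceiling", "light.kitchen_counter"],
    "bedroom": ["light.bedroom_lamp"],
}


def resolve_lights(satellite_id, spoken_area=None):
    """Pick the lights a 'turn on the light' command should affect.

    An area named in the command wins; otherwise fall back to the area
    of the device that heard the command.
    """
    area = spoken_area or SATELLITE_AREA.get(satellite_id)
    return list(AREA_LIGHTS.get(area, []))


# Command heard by the kitchen microphone, no room mentioned.
print(resolve_lights("kitchen_mic"))
```

If the command names a room explicitly, that room is used instead of the one the device sits in.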

Requests per second is really not a measurement that tells you anything.
The issue here is really how deep you can search into an AI-built solution tree.
The provider of this service has to find an acceptable response time, which is usually a few seconds, and then build the solution tree so it can be searched in that time for a decent result.
You can get a 100% match if time is no issue, because you can have all possible responses.
But you really only have a few seconds, and in that time a supercomputer can simply search a bigger solution tree than a Raspi4 or some home computer.
The development is going in the right direction though.
Tensor chips made specifically for AI solutions help, and research into optimizing the code is being done too.
At the same time, several projects have been started to make languages other than English available, like Mozilla’s Common Voice project.
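
A toy sketch of that trade-off, assuming the "budget" is simply the number of candidates that can be scored before a reply is due. This is not how Rhasspy or the cloud assistants work internally; the command list and the `best_match` helper are made up, and plain string similarity from Python's `difflib` stands in for a real model:

```python
from difflib import SequenceMatcher


def best_match(heard, candidates, budget):
    """Return (candidate, score) after scoring at most `budget` candidates."""
    best, best_score = None, 0.0
    for candidate in candidates[:budget]:
        score = SequenceMatcher(None, heard, candidate).ratio()
        if score > best_score:
            best, best_score = candidate, score
    return best, best_score


# Hypothetical list of known commands.
COMMANDS = [
    "turn off the fan",
    "open the garage door",
    "turn on the kitchen light",
]

heard = "turn on the kitchen lights"
small = best_match(heard, COMMANDS, budget=1)  # slow machine: one candidate
large = best_match(heard, COMMANDS, budget=3)  # fast machine: all candidates
print(small, large)
```

With the small budget the search never reaches the right command; with the larger one it does, which is the advantage more compute buys within the same few seconds.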

Thanks for that. I know next to nothing about AI, I had to look up what ChatGPT was the other day…

1 Like

I think that we could use aliases for that.
There were recent PRs to the core that added aliases, for example, this one: Add aliases to area registry items by emontnemery · Pull Request #84294 · home-assistant/core · GitHub
So if the UI supports setting the aliases, then we could find sypialnia and add sypialni as an alias.
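
Roughly what I have in mind, as a sketch only. The `AREAS` structure and the `find_area` helper are made up for illustration and are not the actual area registry API from that PR:

```python
# Hypothetical area entries: a canonical name plus a set of aliases
# covering other grammatical forms of the same word.
AREAS = [
    {"id": "bedroom", "name": "sypialnia", "aliases": {"sypialni"}},
    {"id": "kitchen", "name": "kuchnia", "aliases": {"kuchni"}},
]


def find_area(word):
    """Return the area id whose name or alias matches `word`, else None."""
    needle = word.casefold()
    for area in AREAS:
        names = {area["name"].casefold()}
        names |= {alias.casefold() for alias in area["aliases"]}
        if needle in names:
            return area["id"]
    return None


print(find_area("sypialni"))
```

The inflected form resolves to the same area as the base form, so the sentence matcher does not need to know Polish grammar, only the list of aliases.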

Well, the goal is different. Siri, Google Assistant and Alexa are supposed to be general purpose assistants that should be able to assist you with any request. In the case of HA, a very specific purpose is targeted: recognizing specific, predefined commands to control your home. This should be way easier! I really hope (being Polish and very disappointed at how miserably these big boys failed to deal with my language)…

3 Likes

Well, as I said, I had experience deploying a VR system with HomeSeer that required pretty specific commands. My wife and kids hated it and didn’t use it. I think the VR has to be general enough so that you don’t have to continually explain things to family members about why they aren’t doing it properly. They’ll just stop using it and WAF will take a hit.

You may have lesser goals than others, just like me, but unless you live alone, I think you will be disappointed. And as for language support, getting that right is much harder for a small team than for Google or Amazon. This isn’t about programmer quality, but about access to training data and enough compute to build a good VR model. Google has that in spades because of YouTube and search. Amazon has massive compute as well. It’s hard to do VR right without access to lots of training data and compute hardware, even if you have outstanding programmers.

This is why you see so many sucky VR implementations (cars, IVR, etc…) all over the place. VR is hard to get right without a LOT of training data. Less popular languages are even harder.

Now, Google does make at least some of its VR tech available in the cloud. I suppose Nabu Casa could feed Rhasspy data to GCP and use that engine, but then it’s not local. To do local well you need that training data set and a lot of hardware optimization to run the model. Again, hard to do well.

HA is a terrific platform, but saddling it with an inferior VR system when cheap, much better alternatives are available from Google and Amazon seems like a bad investment of focus. I don’t think that is what HA leadership is trying to do, at least I hope not. But I would be pretty measured in what I would expect out of Rhasspy if it’s standalone.

One good thing is that HA has a lot of statistics on what gets used and what doesn’t. If folks see that Rhasspy voice is not going anywhere, resources will get diverted.

1 Like

I agree, to be honest. My plan over the next year is to only have a floorplan showing where a light or group of lights is on and being able to use voice or the existing light switches to turn them on and off. Plus, there are a lot of things that I would use voice for that I would otherwise have to set up the dashboard to display.

Just read this article about this

Not sure I understand this distinction. HA has access to all the same data (more for some people). What are some tasks you expect Google/Alexa to understand that you wouldn’t expect an HA voice assistant to be able to handle? And why?

Key words in the post you quoted, at least to me, are “any request”. I don’t expect, at least initially, that Home Assistant/Rhasspy will be able to give you the weather or play music on demand, like the big three voice assistants can.

If your Home Assistant has a weather integration and a media player, there is no reason those two intents could not be added.
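
As a rough illustration of what adding two such intents could look like, here is a sketch that maps sentences to an intent name plus slots. The intent names `GetWeather` and `PlayMedia` and the patterns are hypothetical and are not Home Assistant's built-in intents or sentence format:

```python
import re

# Hypothetical intents: each one is a name plus a sentence pattern.
INTENT_PATTERNS = [
    ("GetWeather", re.compile(
        r"^what(?:['’]s| is) the weather(?: like)?(?: in (?P<place>.+))?$")),
    ("PlayMedia", re.compile(
        r"^play (?P<query>.+?)(?: in the (?P<area>.+))?$")),
]


def parse(sentence):
    """Return (intent_name, slots) for a recognized sentence, else None."""
    text = sentence.strip().lower().rstrip("?.!")
    for intent, pattern in INTENT_PATTERNS:
        match = pattern.match(text)
        if match:
            slots = {k: v for k, v in match.groupdict().items() if v is not None}
            return intent, slots
    return None


print(parse("Play some jazz in the kitchen"))
```

The matched intent would then be handed to whatever weather or media player integration is installed; anything that matches no pattern simply returns `None`.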

2 Likes

Right, exactly as Tom said. Every HA instance has weather set up out of the box and there are many music integrations. So why wouldn’t that work?

OK, what about 3rd-party skills, like remote car start? Or just a random question: “what’s the tallest mountain in the world”. Dropping in/intercom with other devices in the house, and announcements to other devices. Yellow ring notifications the device can store and then you can ask “what’s my notification”. Guard functions to detect glass breakage. Calling someone in another location through it by saying “call Mom”.

I don’t see support for all this, or at least not right away. The point is these smart speakers aren’t used for just home automation.

The point is that this is not a smart speaker.

This is voice control for Home Assistant.

Two different things.

2 Likes

Tom, the problem is this is not a brand new product (HA VR) in an empty market. Many many HA homes already have Google or Alexa integration. The stats on the integrations page say 20% of HA users use the Google Assistant integration, and 13.7% use the Alexa integration. So up to about 1/3 of the user base is integrated (fewer if the two groups overlap), and those are probably the most advanced HA users, who would be a target for an HA VR product.
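
A quick check of what those two percentages can and cannot tell us, assuming nothing is known about how many users run both integrations. The helper name is made up; the 20% and 13.7% figures are the ones quoted above:

```python
def combined_share_bounds(a, b):
    """Bounds (in percent) on users who are in at least one of two groups."""
    lower = max(a, b)          # one group lies entirely inside the other
    upper = min(100.0, a + b)  # the two groups do not overlap at all
    return lower, upper


low, high = combined_share_bounds(20.0, 13.7)
print(f"between {low:.1f}% and {high:.1f}%")
```

So the combined share is somewhere between 20% and 33.7%, depending on how many people use both.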

Those users will be unlikely to switch to a HA VR solution unless it provides better functionality and performance, which is explicitly not the target of the effort as far as I can tell.

Then there are the cost and WAF issues of appearance to deal with. It’s hard to beat a $20 smart speaker that is in mass production and looks nice! The closest integration that matches the complexity of hardware setup and config is ESPHome, and it is only used by 20% of HA users. I bet you there is huge overlap of those users with the 33% of existing smart speaker users.

I guess many of us are trying to understand the target market for a Rhasspy implementation. How many existing Google or Alexa users would switch to this if it had only limited functionality (HA core functions), and was less flexible in syntax and understanding? I mentioned how my wife and kids gave up on a VR system that had only limited flexibility in syntax. How many new users would pick this over an existing smart speaker solution that was cheaper and had higher functionality?

Can you elaborate on what the target market for Rhasspy is supposed to be? Have you done some polling to see what the demand signal is from the existing user base for this as opposed to a smart speaker product with much broader functionality?

I don’t think any of these existing users will give up media control, calendar integration, entertainment and other functions for what has been explained as the target product. Certainly my family would kill me if I tried to replace it. But maybe I am not the normal HA user, so a poll might give you valuable data.

It would be good to apply some product management techniques to this decision-making. It will result in a better product.

3 Likes

Maybe people who are interested in voice control, but who do not want a cloud connected smart speaker in their home due to privacy concerns. No idea what part of the current or future user base that would represent. I mean, I really don’t care about VC, but if I did, I would probably fit that audience.

I also think there is some marketing opportunism going on here. Is it a coincidence that this was released just after that Amazon $10b projected loss piece made the news? And then there’s also the hypothetical (but not too farfetched either) situation where Google/Apple/AMZN all shut down their voice assistants because they’re unprofitable. Or make them a subscription model. In that case, local solutions would be the only way to get free VC, and if HA has a working solution, that would attract a lot of new users.

1 Like