2023: Home Assistant's year of Voice

Well, to my anecdote you’ve replied that Ford did not invent the car. And nowhere in Jobs’ quote does it state that he did. I’m not a historian of motorization, so I don’t know what exactly the market wanted when Ford came out with the Model T, and I can’t arbitrarily judge whether Jobs’ quote makes any sense or not. But what the reality was is not the point.

Looking at the WTH and Feature Requests (which in large part are the same requests) I don’t see anything groundbreaking. The majority are actually minor improvements, fixes and beautifications. What I know is that this is typical in many businesses: a lot of effort and money is put into this type of run & support activity, whereas not enough is spent on actual innovation to push the business forward. Moreover, some of the latest innovations are very welcome in the community (e.g. Bluetooth and Matter/Thread).

If you don’t understand if Jobs’ quote “makes any sense or not”, perhaps it’s best not to use it to justify an opinion.

It makes sense to me and to many people as well. You are trying to deny it with statements which are not verifiable and pointing to something that he was not even referring to, hence making an arbitrary assumption that he is not right.
And again, you are putting something in my mouth that I have not said. Read it again. I did not say that I don’t understand what Jobs said. I’m saying that I can’t arbitrarily state whether the market wanted something over 100 years ago. The quote is an anecdote and an explanation of a way of thinking, not a market analysis.

Let me explain it using the simplest terms possible: you picked a quotation that doesn’t support your argument. Why? Because the person you quoted misquoted someone else in order to suit his own purposes.

I’d never heard either quote, by Ford or Jobs, or maybe I’d forgotten. But that article was fascinating. I learned a lot today, thank you!

As for its relevance to this discussion, I have been involved in a lot of development projects over the years, and more than my share of maintenance on existing products, too. I will stand behind my comments about how being responsive to customers makes the difference between success and failure. I can’t say any more without repeating myself.

Thanks, very interesting article. If you read it carefully you would actually see that it does not matter whether that quote was real or just made up to support Jobs’ line of thinking. In the first years it was groundbreaking and, as per the article, enabled “the creation of a new and rapidly growing market”.

Maybe we are at the point in the evolution of home automation and HA that needs this mass market? Maybe VR is something that millions (and not just half a million) of users would buy into? I don’t know for sure, but you won’t get there by fixing the way automations are displayed in the settings menu or by making it easier to change entity_id (two of the top 3 WTH items). That is my point.

Fully agree, but being responsive to customers and creating products they will love does not necessarily mean implementing every feature they ask for. It is much more about investing in the UX, observing their behavior while using the product, identifying actual pain points (not necessarily related to the use of the product itself) and solving those pain points. This might require introducing completely new features which the users did not think of or request.

I’ve heard both quotations and, like so much else about the past, the term ‘apocryphal’ pops up (authenticity is questionable but it’s widely believed to be true). Something sounds exactly like what we would expect the person to say or do except historians can’t find evidence it was actually said or done.

I am looking forward to someone figuring out an easy hack to take old Google Mini speakers, remove the brain and pop in an ESP or Pi to leverage the mic and speakers. Or maybe someone creates a module that we can buy and place inside… but Google will probably sue at that point.

I’m sorry, but I don’t understand - you don’t know why Google hasn’t done what?

Unlike Amazon, Google no longer sends mic audio recordings to the cloud to do voice recognition, both on hubs and on phones. They built models that run in the hardware on their Nest hubs, ostensibly for privacy enhancement reasons, and I suspect also to avoid getting slapped with warrants for captured audio.

If you go into their portal you can look for recordings and notice there aren’t any anymore. Again, I don’t know about Lenovo or other 3rd parties, but this is certainly the case for all the Nest hubs etc. in my house. Now, the downside of this is that the voice recognition doesn’t work as well as Alexa’s, where they ship the raw audio to the cloud, which does a better job than the on-device processing that Google does. I think this confusion may lead people to think they need a local solution for voice recognition when it’s already mostly running locally. Clearly the integration with HA from Google uses the cloud, but with the local API now supported, it would be great if that could all run locally instead.

So I’m not sure how HA-native voice recognition is going to be that much better from a privacy POV than Google’s, except maybe that the record of what you are saying to the assistant won’t be known to Google (though you can set a flag in account settings that deletes all that data automatically if you want).

More languages could be an upside, as I don’t think Google supports languages in countries they don’t sell the devices into. But implementing lots of languages poorly from a voice POV may only be marginally better than not supporting the language in the first place. I do think this may be one of the differences between the HA leadership and some of us in the forum: my experience tells me that doing the voice recognition well for lots of languages is very hard, and they may think it’s a lot easier. Or they may think the quality bar for being useful is a lot lower than my experience suggests. Either way, this will be a big learning experience for the HA devs.

Whether the “Ford Quote” inside the “Jobs Quote” is a misstatement or outright fabrication or technically inaccurate doesn’t matter. The point of the Jobs quote is valid - innovators don’t rely on market surveys. They often see a need others don’t and fill it. They are also often wrong.

But at this point voice response, even local only voice response, is hardly an innovation. It is more inevitable than innovation.

I think anyone that has done much (or any) work with the available speech recognition projects would realize this is VERY doable for a limited vocabulary. It won’t be Alexa or Google Assistant any time soon, but could and should be very functional.

A lot of talk here about language model issues, “complex” grammars, etc, but it all seems conceptually easy to address to me (conceptually easy <> quick or trivial). It doesn’t need to be 100% out of the gate. It doesn’t need support for dozens of languages out of the gate.
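To make the "limited vocabulary" idea concrete, here is a minimal sketch of matching already-transcribed text against a handful of fixed sentence templates. It is only an illustration under my own assumptions: the template list, the intent names and the `parse` helper are all hypothetical and are not how Home Assistant or Rhasspy actually implement it.

```python
import re

# Hypothetical templates: each intent has a few fixed phrasings,
# with {name} standing in for the device name spoken by the user.
TEMPLATES = {
    "turn_on": ["turn on the {name}", "switch on the {name}", "turn the {name} on"],
    "turn_off": ["turn off the {name}", "switch off the {name}", "turn the {name} off"],
}


def compile_template(template):
    """Turn one template into a regex with a named group for the {name} slot."""
    before, after = template.split("{name}")
    return re.compile(rf"^{re.escape(before)}(?P<name>.+?){re.escape(after)}$")


# Compile every template once, keeping the declaration order.
COMPILED = [
    (intent, compile_template(template))
    for intent, templates in TEMPLATES.items()
    for template in templates
]


def normalize(text):
    """Lowercase, drop punctuation and collapse whitespace."""
    cleaned = re.sub(r"[^a-z0-9 ]", "", text.lower())
    return " ".join(cleaned.split())


def parse(text):
    """Return (intent, device_name) for a recognized sentence, else None."""
    sentence = normalize(text)
    for intent, pattern in COMPILED:
        match = pattern.match(sentence)
        if match:
            return intent, match.group("name")
    return None


if __name__ == "__main__":
    print(parse("Turn on the kitchen light!"))
    print(parse("turn the fan off"))
    print(parse("What's the weather?"))
```

Anything outside the templates simply returns `None`, which is the trade-off of a limited vocabulary: easy to get working, but it only understands what you have written sentences for.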

The hardware issue is the biggest problem, IMO, but I’d expect an ESPHome-like product (even if it isn’t an ESP chip at the heart) will spring up from the community or Nabu Casa itself.

There will be a reasonable user base for this if Nabu Casa implements it properly.

Will the new feature bring in the new user numbers to justify the development expense? Questionable.

From a business perspective, it could cannibalize Nabu Casa subscriptions. I have to wonder how many have subs primarily for Google/Alexa integration. How many of those will drop the sub if they can spend $150 on mic/speaker units and be done with it? Still, a shiny new feature can draw in users even if they ultimately don’t use it.

But at the end of the day, the decision’s been made. I suspect Nabu Casa is a benevolent dictatorship and if Paulus wants to pursue this, it is his call. It’s not the first questionable decision, it won’t be the last.

C’est la vie.

+1. Also, it’s important to note that the iPhone Jobs introduced was a brand new experience for almost everyone. People have a hard time asking for things that they have no experience with. The iPhone was very different from everything that came before it - a truly innovative product!

Voice recognition today is anything but that. People are very used to it because it’s a core feature of every smartphone. And their expectations have already been set based on these experiences. People are used to a general purpose voice interface, not just for HA functions but for everything. If HA had implemented a limited voice experience before these assistants had been fielded in the mass market, the expectations would have been a lot different. This is nothing like what Jobs did with the iPhone - people already know how to deal with assistants, and this HA effort is targeted at much less functionality.

Alexa has shown that people are willing to trade privacy for utility. Voice recognition has been around for decades, but it never took off because its utility was too low. Alexa and Google provided high utility, and now all those older systems are basically out of business. Now, I am sure if someone was asked if they would like the same functionality as a Google Home at the same price and form factor, but with complete privacy and running locally, everyone would say “Yes, I want that!”, but that is absolutely not what the devs have said they are aspiring to build. Something that may be better from a privacy POV, but is much more expensive in hardware cost, worse in form factor and appearance, and much lower in utility, has been built before (think Homeseer VR, Dragon NaturallySpeaking, etc.) and is unlikely to be any more successful in the market today than those earlier efforts were.

I would hate to see HA rebuild a system that was known to not be competitive. I am not saying that is what the HA folks are aiming to do, but it’s good to not ignore history of products in this space.

I’m looking forward to trying this out, and watching as it progresses. I’ll work on maybe adding a few sentences based on our usage.

Best of luck to all involved!!

you would actually see that it does not matter whether that quote was real or just made up to support Jobs’ line of thinking.

Made up. Even a ‘visionary’ is fallible.

FWIW, the conversation and intents integrations have been around for a while. Extending them to support dozens of languages plus a library of pre-built intents doesn’t seem like Home Assistant’s ‘iPhone’ moment.

you won’t get there by fixing the way automations are displayed in the settings menu or by making it easier to change entity_id (two of the top 3 WTH items).

Implementing the Top Ten FRs and WTHs would make many existing users happy and improve usability for new users. Appealing to a “mass market” would first require a turn-key solution (because installing Home Assistant is far beyond the average consumer’s comfort level).

Experiments like Home Assistant Blue and Yellow have demonstrated that it’s not easy to develop a successful mass-market turn-key device. In addition, Home Assistant’s frequent release schedule introduces significant (breaking) changes each month. That’s not conducive to mass market adoption; the project is still a long way from being a consumer-friendly appliance and remains a hobbyist’s platform.

If you read the release notes of the 2023.1 release you’ll see some first developments. First of all, you can add aliases in multiple languages to HA entities and they will synchronize with Google Assistant. This is a great improvement, as so far it required YAML changes, a system restart and a sync. So Google Assistant (and probably Alexa as well) will also be easier to set up and use.

Whether the “Ford Quote” inside the “Jobs Quote” is a misstatement or outright fabrication or technically inaccurate doesn’t matter.

It undermines an argument if fabrications are used to justify it. Jobs justified his decision-making based on something Ford never said. Then his misquote was used in this topic to justify a decision made, with all due respect, by management that doesn’t have Jobs’ track record. Overall, it was a poor choice to support an unpopular opinion.

The point of the Jobs quote is valid - innovators don’t rely on market surveys. They often see a need others don’t and fill it. They are also often wrong.

There’s nothing innovative about voice recognition as far as home automation goes. It has already been used in various forms for several years. Making it local is a welcome idea but not a bold new innovative direction for home automation. That’s why comparisons with Jobs’ decisions (even if it were a factually accurate quote) are farcical.

Well, as I pointed out earlier, it does not have to be about innovation (just as Ford invented neither the car nor serial manufacturing) but maybe about making it more accessible to Home Assistant users? E.g. by adding all HA supported languages and making it easy to configure? Today that is not the case. Google or Alexa will only allow you to control devices that are supported by them, plus anything in HA, but only after quite painful configuration and in selected languages only.

FWIW, I’m currently using Home Assistant and Alexa to control devices that are not natively supported by Alexa.

The only ways local voice recognition can make it more convenient for me are:

  1. Simplify the configuration process.
    Odds are it’s likely to succeed in simplifying it.

  2. Support the existing voice recognition hardware I already have because having to issue voice commands via a phone/tablet is not a convenient means of interacting with a voice assistant.
    Chances are this is unlikely to happen anytime soon.

The sticking point for mass adoption of this new functionality is likely to be the same as for the product it will be based on (Rhasspy) and that’s the absence of compact, inexpensive hardware for receiving/relaying voice commands. Home-brew solutions are more expensive than commercial products (likely because the commercial ones are sold at cost or a loss) and don’t share the aesthetics and packaging of their commercial counterparts. A game-changer would be Amazon opening Echos to receive open-source firmware (I wouldn’t bet on it).

Fully agree.
It will take some time before there are good hardware options for the local HA VR. Maybe Nabu Casa’s goal is to come up with something?
In the meantime I hope HA will make the best and easiest possible use of Google speakers. They can communicate locally as well. Btw, it is confirmed that there is no chance of putting custom firmware on Google speakers. Not sure about the Alexa ones.

On a positive note, for those wanting to cosplay as Tony Stark, Jarvis-like voice capabilities would be cool.