It’s still better than the current cloud-based VRs that do exactly the same, but then transmit everything to the cloud for others to examine even more.
The biggest difference is probably that with Rhasspy, someone inside the household would have to sit and listen through everything to extract any info, and that might simply be too resource-heavy to do.
With the cloud-based services, someone will be listening to your recordings, and those companies have the resources to hire people to do it all the time.
No, they don’t. They do on-device wake word detection and only transmit the phrase after they (think they) detected the wake word. This is still a privacy concern (and I will personally never let one of these things anywhere near my home), but saying that they transmit everything to the cloud is simply not true.
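(Roughly, the gating works like the sketch below: audio stays in a local ring buffer and only a short clip is uploaded once an on-device model fires, or misfires. The detector and capture functions here are invented stubs, not any vendor’s actual code.)

```python
# Rough sketch of on-device wake word gating. Audio is kept in a local
# ring buffer; nothing is transmitted until the detector fires (or
# misfires). detect_wake_word and capture_frames are invented stubs.
from collections import deque

PRE_ROLL_FRAMES = 50  # ~1.5 s of 30 ms frames kept locally

def detect_wake_word(frame: bytes) -> bool:
    """Stub standing in for a small on-device neural model."""
    return False

def capture_frames():
    """Stub microphone source: 30 ms of 16 kHz / 16-bit mono silence."""
    while True:
        yield b"\x00" * 960

def transmit(frames: list[bytes]) -> None:
    """Only here does audio leave the device, wake word (supposedly) heard."""

def run() -> None:
    buffer = deque(maxlen=PRE_ROLL_FRAMES)
    for frame in capture_frames():
        buffer.append(frame)        # audio never leaves the device...
        if detect_wake_word(frame):
            transmit(list(buffer))  # ...until the model thinks it heard the word
```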
Or whenever they hear glass breaking. Or whenever they incorrectly believe they heard either of those things. Of course, there’s really no way to confirm those are the only times they are recording; fortunately, Google and Amazon are extremely trustworthy companies.
I love this initiative. I would gladly throw all the voice assistants I have in the trash if there were a local one that worked well. The devices we have now are too convenient to get rid of, but besides the privacy issues, they are quite slow to respond to anything. There’s clearly room for improvement.
Do I expect feature parity or NLP as good as Alexa/Google? Absolutely not. But tbh I have no need of that. I can count on one hand the number of things we use these for:
- triggering specific scenes and scripts that I hardcoded phrases for (bedroom on/off, downstairs on/off, etc.)
- turning on and off the TV
- occasionally changing the light level in a room when the value an automation uses is too bright/dark
- cooking timers
That’s basically it. Occasionally I ask it weather questions; it would be nice if I could still do that, but it’s no big deal if I can’t. I feel like with that small a requirement list, it’s not impossible for something new to meet it.
Oh please. As much as I don’t like these cloud-connected voice assistants in general, and for privacy reasons in particular, this is getting into conspiracy-theory land. It’s not as if these companies wouldn’t be legally liable for unbelievable damages if it turned out they intentionally wiretapped millions of homes without consent. Unlike all those shady Chinese companies, whose cheap devices seem so popular around here, and which could definitely pull something like that off without any legal repercussions whatsoever if they got caught.
I’m all for privacy, but let’s stay reasonable and logical when discussing this. If I were the kid of, or married to, a tinkerer who started putting DIY ESP audio-streaming mics all over the house, that wouldn’t go over very well. Much worse, actually, than a large corporation collecting semi-anonymized data about me for targeted ads.
On-device wake word detection is a must-have feature if you value your and your family’s privacy in any way, even for DIY builds. Maybe if Pi Zero stock becomes available again, that would solve the problem. That’s a big if, though.
A few threads going on in this one post.
For me, I would rather see HA support a set of tablets that can work with Rhasspy, plus a set of speaker/mic devices. Tablets are an established market, and I do not see HA getting into it, so that should be fairly straightforward. For the mic/speaker combo, hopefully someone steps up and provides a “solution.” Most of us are tinkerers, but for the mass market, people want to go somewhere, click buy, and be done. Until then, it is on us to find a solution that works best for everyone. The packaging and form factor can come later.
The other piece is the software, the learning, and the AI. I trust HA way more than those other companies; if you aren’t with me on that, then stop reading. I would like to see HA develop the code to allow opting in or out of using the cloud. Those who want to send their information to help build the system, make it smarter, etc. can opt in; everyone else can opt out. I see that as the only viable long-term way to build the software and the library it requires. This would allow HA to gather, train, and push updates to our instances of Rhasspy. There will be limitations no matter what. If you want to ask to turn the light on, or maybe even ask for the weather or to play a song, those will likely all be possible. If you want to order toilet paper while you do your business, then this may not be the solution for you long term.
Reading all of this so far, VR is not an easy subject and certainly not a race already won. There are many issues and choices to be made, and before putting resources behind it I would certainly run an extensive poll to measure the appetite for VR in HA. The issues to check are plenty:
- functionality
- language
- hardware
- willingness to pay for hardware
- and how much
- would you use VR, yes or no
- is VR a priority in HA
- etc.
Obviously some of this is already a done deal, since a dedicated person has been hired to work on it full time. But if you follow the Month of WTH, VR is not a big deal in HA.
I am much more focused on minimizing interaction with HA by automating as much as possible, so that things happen without intervention. Machine learning and AI are more interesting than VR in that respect.
Anyway, I would like to see a poll to find out whether the resources are justified.
Interesting - so you have more trouble trusting your family than a gigantic anonymous company?
Didn’t you ever hear about the cases where voice recordings from Alexa / Google Home were used in legal cases, like murder trials? I mean, very coincidentally, they apparently just happened to be listening that time around. Right?
Or have you perhaps tried them yourself and watched them wake up in the middle of the night, just because they can?
Or, less related, the recent LastPass leaks?
The biggest issue I have with them is that they are slowly drifting towards worse-of-everything, with Amazon making large cuts to its voice assistant team, for example:
Just think about Google dropping its “don’t be evil” slogan, then add a little dose of Elon Musk’s “free Twitter”, and imagine what those three can do if they sell special services to governments.
All it takes is a few greedy and powerful people.
That’s likely also the reason the HA team is investing in this direction, and the very same reason open hardware like RISC-V is on the horizon.
The longer this goes on and the more pressure there is, the likelier we are to see something on that horizon (or worse: not see it).
The only escape is a local, trusted, open source install that you are responsible for.
Nothing is worse than a closed source cloud device in your network, especially ones with microphones and cameras. That includes Ring, Nest, and all those devices that might be helpful but aren’t stopped by incoming drop rules.
No amount of proprietary engineering and hardware is going to make up for that.
Yes, it is going to be painful to migrate; yes, it’s not perfect; but eventually there’s no choice if you’re going for privacy too.
It’s about time people realize security is something they are also responsible for; as soon as you hand that off to someone else, you’re fucked.
This is good for people with brains, bad for people who like one-click installers and things that just work out of the box.
Yes: the one in your pocket, traveling wherever you go. And please don’t say you own an iPhone so it’s less bad. For one thing, they will have you keep Bluetooth on so they can track other people’s stuff, without you yourself ever owning a tracker or being asked for consent.
I don’t own an iPhone, and I’m very aware of the issues with smartphones per se, but that’s not the topic here.
Thankfully there are also choices, from GrapheneOS or /e/OS through Sailfish to dedicated phones with hardware kill switches (which are very limited in availability).
But yes, the same holds true for smartphones.
There’s “real” voice recognition like Google/Alexa, and there are “passable” voice triggers. As a brand-new user of HA (but a 25-year programmer), I feel “real” voice recognition is a misplaced priority for HA.
As I set up an instance for the first time on a dust-collecting Pi 3B+ recently, what I wish had been smoother wasn’t voice, but “What would you like to set up? You typed ‘Wyze Cam’. Here’s information about that…” Or perhaps “lights,” “door locks,” “garage door” etc.
I’ve been searching forums, Google, YouTube, etc. I haven’t found a single, well-presented, up-to-date source of information for onboarding new users. Please focus there.
As far as voice goes, I’d much rather have a waveform trigger-recognition facility. Let me speak into my phone, or my phone’s ubiquitous BT device, to record an input; transfer it to my HA instance; compare it in the dumbest way possible to pre-recorded trigger prompts that I manually associated with automations; and be done. I don’t need Jarvis to interpret my dreams. I might want the garage lights on, that’s all. I’d be far happier if I didn’t have to speak for that to happen.
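That “dumbest way possible” is roughly template matching, and it is genuinely simple to sketch: MFCC features plus dynamic time warping. The file names, automation IDs, and threshold below are all placeholders, and librosa does the heavy lifting:

```python
# A minimal sketch of "dumb" trigger matching: compare an incoming clip
# against pre-recorded prompts using MFCC features and dynamic time
# warping. File names, automation IDs, and the threshold are made up.
import numpy as np
import librosa

TRIGGERS = {
    "garage_lights_on.wav": "automation.garage_lights_on",
    "garage_lights_off.wav": "automation.garage_lights_off",
}
THRESHOLD = 250.0  # reject anything too far from every prompt; tune empirically

def mfcc(path: str) -> np.ndarray:
    y, sr = librosa.load(path, sr=16000)
    return librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13)

def match(clip_path: str) -> str | None:
    clip = mfcc(clip_path)
    best, best_cost = None, np.inf
    for ref_path, automation in TRIGGERS.items():
        ref = mfcc(ref_path)
        # Accumulated DTW cost between the two MFCC sequences (lower = closer)
        cost_matrix, _ = librosa.sequence.dtw(X=clip, Y=ref)
        cost = cost_matrix[-1, -1] / (clip.shape[1] + ref.shape[1])
        if cost < best_cost:
            best, best_cost = automation, cost
    return best if best_cost < THRESHOLD else None

print(match("incoming_clip.wav"))  # automation id, or None if nothing matches
```

Raw template matching is speaker- and phrasing-specific, but for personal trigger prompts that’s arguably a feature rather than a bug.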
If the aim is truly assistive technology for capability-challenged people, or no-training interpretation of anything I utter into HA actions, I think this platform is a bit early to compete with the experiences delivered by huge multinational corporations’ dev capabilities, which still fall short on reliability. And that’s said with total respect and admiration for what HA is today and could become.
Some people are prepared to put up with the privacy implications of Google/Amazon’s methods. Others aren’t. For those who aren’t, this is not a misplaced venture.
The hired developer has already created a very capable voice control system (Rhasspy) which connects to HA. It is not as if he is proceeding from a standing start.
Thank you for your fresh perspective. I think we can all learn a lot from new users.
Your quoted comment, above, struck a chord with me this morning. I have a 2-year-old Feature Request on this forum which has garnered a decent amount of support among users. This morning someone inadvertently submitted a similar one, and it was closed with a link to mine. The poster made a comment along the lines of this being a “must have” and that they couldn’t imagine why it’s been languishing for two years. I agree.
There are hundreds of FRs. Some are frivolous, but many point out glaring weaknesses in the core functionality of HA. Yet the HA team hires UI designers and voice experts.
I’m not opposed to any volunteer contributor going down their own path. That’s a good thing. But these sidelines shouldn’t be the focus of the “core” of the support team. It seems to me that Nabu Casa’s resources (collectively, the resources of all who pay for that service) should support the core of the product. Keep things working under the hood. These are not the glamorous, exciting tasks that volunteers do for fun. These are the kinds of thankless, invisible but vital tasks you need to hire someone to do.
I agree that adopting existing, robust platforms to gain new features is smart, and I’m also privacy-minded (no Google/Alexa devices here, except “necessary evil” phones). I called it a “misplaced” priority, not “no” priority.
If I just finished a bare metal HA install, I want to DO something. Connect devices, schedule activity, remote control something, automatically respond to conditions, monitor sensors. HA should be asking what I’ve got in mind for it.
Am I just missing some built-in feature that answers “OK, now what?”
At that moment, VR is a smart parrot stuck in a cage, when I want a service dog that can fetch a beer. I see this as working on the parrot. Sure, I’d like to have both. But if I had to choose one… It’s not the bird.
Just my opinion and suggestion for something else that needs attention too. I happen to think it would do more to help new users than voice.
Well, that’s just the thing with Feature Requests.
Unless somebody picks one up, it’s not a Pull Request.
There is no code yet.
Instead of waiting two years, if you deem it important, you could have spent the time learning the codebase and developing it yourself.
Why not start now, if you deem it important?
There is a shit ton of things going on in a large/old codebase, especially quality-wise, to make sure everything goes smoothly and not south: tons of regression bugfixes, code reviews, and likely hundreds to thousands of Pull Requests merged in that timeframe that you never see mentioned because they just happen.
That’s just the nature of things. The best thing about an open source project like Home Assistant is that nobody stops you from coding it yourself, be it as an add-on or as core functionality (though it’s not guaranteed the latter will ever be merged; that’s the downside, but it’s also natural for any project, no matter the size).
There’s also HACS for a reason, and a lot of issues can also be solved with architectural smarts, without modifying any code.
Just saying, in a constructive way.
Enough off-topic.
On-topic: @HA Team: some fresh hardware recommendations, backed by actual testing, would be a good start, especially on how the firmware situation is moving with these devices.
Is ReSpeaker still a good pick? 2017 to 2023 is quite some time passed, so things could have changed, especially firmware-wise.
More work than available workers, and they’re free to work on what they want, including work that’s not in the backlog. Such is the way of open-source software projects, even the ones that have a dedicated, paid team.
It doesn’t seem optimal but it does produce useful results (every month for several years now).
It’s worth keeping in mind that this will have a positive impact on the ‘conversation’ feature through updated intents, which is text-based. This enables better interaction with existing structures without each user having to write their own, to say nothing of the more complex parts. So no matter how you feel about the voice component, this updates something else too. (And for those piping intents/API calls in from other devices, even as pass-through from email, SMS services, or whatever else, that’s a good thing.)
This much is made explicit in the blog post, but unless someone is using Telegram (or Signal, or whatever else), it’s unlikely to be relevant to most people (yet).
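For anyone curious what that pass-through can look like, here is a minimal sketch against HA’s REST conversation endpoint; the URL and token are placeholders for your own instance:

```python
# A minimal sketch of text pass-through: forward a command received via
# SMS, Telegram, etc. to Home Assistant's conversation API.
# HA_URL and TOKEN are placeholders for your own instance.
import requests

HA_URL = "http://homeassistant.local:8123"
TOKEN = "YOUR_LONG_LIVED_ACCESS_TOKEN"  # created under your HA user profile

def send_to_ha(text: str) -> dict:
    resp = requests.post(
        f"{HA_URL}/api/conversation/process",
        headers={"Authorization": f"Bearer {TOKEN}"},
        json={"text": text, "language": "en"},
        timeout=10,
    )
    resp.raise_for_status()
    return resp.json()  # contains the agent's text response

print(send_to_ha("turn on the kitchen light"))
```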
Not a fan, personally, but it would still require a grammar for HA to understand it while remaining flexible, which means that a lot of what’s discussed above is required anyway.
My concern, looking at the repository, is that the YAML format is simplistic (and as a result inappropriate?). Linguistics isn’t a small or simple field, and a lot of this ground has been covered by other open source projects, for example Project Fluent (https://projectfluent.org/). The issue I can see is that the structures in the repository will, in some cases, produce ungrammatical output when ‘resolved’ against the possible forms, and that the system will almost certainly reject acceptable forms because it perceives them to be ungrammatical.
I’m not raising this to be blindly or uselessly critical of the project/aim, but because the limitations need to be designed around, rather than just ‘forced’ (into pseudo-correctness, usually). So instead of having intents like `Turn on the light` accept any article (a/an/the), it should be split into action (`turn on`), subject (the user), and object (`??? light`). Context detection (which microphone was used, if relevant) would then allow article → determiner (this/that), so `Turn on *this* light` would turn on the light in the room/area of the device (even *that* light, if the system had enough sensors to work out which one, or it could prompt back saying it doesn’t). But more importantly, as we move out of English and articles/determiners/deixis shift, the split makes it easier to reconstruct. And, from a development/programming point of view, it takes this back closer to traditional parsers, loaded with dynamic ‘modules’ used to feed the flexible grammar.
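A hypothetical sketch of that split, with the determiner as its own slot (every name and the area-resolution rule below are invented for illustration):

```python
# A hypothetical sketch of the proposed split: parse an utterance into
# action/object slots plus a determiner, then resolve deixis ("this")
# against the microphone's area. All names and rules here are invented.
from dataclasses import dataclass

ACTIONS = {"turn on": "light.turn_on", "turn off": "light.turn_off"}
ARTICLES = {"a", "an", "the"}
DETERMINERS = {"this", "that"}

@dataclass
class Intent:
    action: str             # e.g. "light.turn_on"
    obj: str                # e.g. "light"
    determiner: str | None  # "this"/"that"; None for plain articles

def parse(utterance: str) -> Intent | None:
    text = utterance.lower().strip()
    for phrase, service in ACTIONS.items():
        if text.startswith(phrase):
            rest = text[len(phrase):].split()
            if len(rest) == 2 and rest[0] in ARTICLES | DETERMINERS:
                det = rest[0] if rest[0] in DETERMINERS else None
                return Intent(action=service, obj=rest[1], determiner=det)
    return None  # unparseable: reject rather than guess

def resolve_target(intent: Intent, mic_area: str) -> str:
    # "this light" -> the light in the same area as the microphone;
    # a bare article needs a default, or a prompt back to the user.
    if intent.determiner == "this":
        return f"light.{mic_area}"
    raise ValueError("ambiguous target; prompt the user")

intent = parse("Turn on this light")
print(intent, "->", resolve_target(intent, "garage"))
```

The point is the structured intent, not the toy string matching: once action, object, and determiner are separate slots, other languages can map onto them without rewriting every template.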
Now this I can’t necessarily agree with. Paid employees can be directed into work that needs doing. Of course good leadership is about convincing employees that they want to fix a problem, rather than telling them they have to.
I agree with your statement that paid employees can be directed. However, the backlog of Issues (and FRs) suggests to me that they haven’t been directed to whittle it down, or at least that it hasn’t been their priority. The new features in the latest versions of Home Assistant, and this year’s focus, imply the direction is elsewhere.
And I think that is the source of the “resources” comments in this and other threads.
It is far sexier to be hired to forge new features than to fix legacy code. My son got hired to fix legacy code in a commercial venture. It was soul-destroying and ultimately a dead end. Not to say his programming skills weren’t put to good use, but it’s not what programmers dream of.
Just my personal opinion, but I don’t care much about voice. It’s more of a gimmick than a feature to me. And even though I am not a native English speaker, I don’t care much about localization of voice commands either. I set my interface language to English on most devices if I have the chance. I appreciate that this is an important feature to others but I’d much rather see time and effort spent on other things.