Is it planned, or could you at least please consider introducing, something like voice_actions.enable and voice_actions.disable services, similar to cloud.remote_connect etc.?
That way, voice commands here (and those coming from Google or Alexa) would not be executed while switched off, and users could set up their own rules for enabling or disabling voice control.
I've been missing this for years already. It would avoid situations where someone shouts a command through the window while I'm e.g. not at home, or at other times.
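To illustrate what I mean, here is a hypothetical automation. The service names voice_actions.disable/enable don't exist today (that's the whole request), and the group.family entity is just an example:

```yaml
# Hypothetical: switch voice commands off while nobody is home.
automation:
  - alias: "Disable voice control when away"
    trigger:
      - platform: state
        entity_id: group.family   # example presence group
        to: "not_home"
    action:
      - service: voice_actions.disable   # hypothetical service
  - alias: "Re-enable voice control when home"
    trigger:
      - platform: state
        entity_id: group.family
        to: "home"
    action:
      - service: voice_actions.enable    # hypothetical service
```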
I was just looking into Rhasspy, so add one more really interested HA user!
However, and I know this is hard to do, I would need wirelessly connected devices in several locations; a microphone wired to my HA server wouldn't be enough.
I’m too used to (=spoiled by, I guess) Google Home devices now…
Rhasspy already has the ability to be set up as a satellite: the wake-word detection runs on the satellite, and when the wake word is detected the audio is sent to a central unit for further processing.
The satellites can be something like a Raspi 0W or maybe even an ESP.
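For anyone curious what the satellite side looks like under the hood: Rhasspy streams audio between satellite and base station over MQTT using the Hermes protocol, with each message carrying a small standalone WAV chunk. Here is a minimal sketch of that framing; the topic layout follows the Hermes convention, while the site id and frame size are just illustrative values:

```python
import io
import wave


def audio_frame_topic(site_id: str) -> str:
    """Hermes MQTT topic a satellite publishes its audio frames to.

    The site_id identifies the satellite; the base station subscribes
    to this topic to receive the stream."""
    return f"hermes/audioServer/{site_id}/audioFrame"


def wav_frames(pcm: bytes, sample_rate: int = 16000,
               frame_samples: int = 512) -> list:
    """Split raw 16-bit mono PCM into per-message WAV payloads.

    Each chunk gets its own WAV header, so every MQTT message is a
    self-contained audio snippet."""
    frames = []
    step = frame_samples * 2  # 16-bit samples -> 2 bytes each
    for off in range(0, len(pcm), step):
        chunk = pcm[off:off + step]
        buf = io.BytesIO()
        with wave.open(buf, "wb") as w:
            w.setnchannels(1)
            w.setsampwidth(2)
            w.setframerate(sample_rate)
            w.writeframes(chunk)
        frames.append(buf.getvalue())
    return frames


# Actually publishing would use any MQTT client, e.g. with paho-mqtt:
#   client.publish(audio_frame_topic("kitchen"), frame)
```

This is also why a wired microphone isn't required: anything that can run wake-word detection and speak MQTT can act as a satellite.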
I had some Raspi 0W boards lying around and they work fine as satellites, but they struggled a bit once I started adding e-ink screens, temperature sensors, and speakers, so I had to upgrade to a Raspi 3. A Raspi 0W2 might have been enough, but it was not available at the time.
I read somewhere (don't ask me where!) that an ESP32 does not have enough grunt to do wake-word detection, meaning you have to stream all your audio back to the server.
I think the newer ones do, but head over to the Rhasspy forum and ask.
I know some of the more active members there dabble a lot with alternatives to Raspi satellites.
Well, some features seem to be missing on the Nicla Voice, like far-field pickup, beamforming, automatic echo cancellation, and active noise suppression.
I suggest heading over to the Rhasspy forum and asking your questions there. You might get a response from rolyan_trauts, who has wide knowledge of microphones and has tried lots of alternatives for satellites.
But this only really matters if you are an early adopter. Now that Rhasspy is being developed by Nabu Casa, the potential user base will grow, and with it the incentive for manufacturers to make better products, so future products are really the interesting thing now.
For all of you complaining about them "ignoring the UI issues and hiring a voice guy": they didn't ignore them, because they DID hire a UI guy a few months ago. As for the various long-standing feature requests and suggestions made during the month of WTH, they don't always see those things as being as useful as you might. Ultimately, the best way to get something you want added is to learn to code in Python and contribute it to the project; that's why you see things like HACS — features that people wanted but that were, for whatever reason, not accepted by the HA developers for inclusion.
Sounds nice. But what hardware is required to run this? I'm currently running HA on a Raspberry Pi 3B, which probably won't be capable of running voice recognition. I'm considering upgrading the hardware. Any suggestions on the minimum requirements?
As I have been reading this thread, I cannot help but wish this level of effort could be directed at some more critical fixes/improvements, such as cleaning up the WebSocket API so events can actually be filtered based on which entities the UI is exposing. As it stands, it's just a massive firehose that slows down older tablets and eats mobile data plans, especially on larger deployments.
I'm waiting for HA to reach that level of voice quality. At a premium tier it could use ChatGPT, perhaps with a more expensive subscription for that level. I'm also hoping for more integration with Frigate video, using machine learning to build 3D models of people recognized on camera. It would also be nice to recognize our cars on video and change their color in real time, and to use ChatGPT to generate Lovelace cards and the interface.
There are probably already people working on those topics, but you cannot just throw more people at them.
New people need to be brought up to date with the current progress and may even have to learn new skills.
Many on the current team will probably not be working much, or deeply, with voice recognition, because it is not an easy area to master.
Michael Hansen is already the master in this area, with proof-of-concept code in the form of Rhasspy.
The effort you perceive is based on the focus in the forums and blogs, but that might not reflect how much attention the developer team puts into the different areas.
True, but you shouldn't hire someone just because they became available and then change your priorities to match their skill set. I'm sure voice has been a priority for some, but looking at this thread and the long list of languishing FRs and WTHs, it's clear that many of us would identify different priorities. Perhaps with all the recent tech layoffs, there's a candidate out there with the right skills to address some of those instead.