Does this specifically improve the detection of “Okay Nabu”, or does the processing of the collected recordings allow for an “abstraction” to (potentially) improve the detection of self-chosen wakewords as well? (“This is how an “a”-sound changes when people are not talking directly to a microphone”, “This is how vowels get affected by background noise”, …)
And kind of related: is there a blogpost or a video somewhere to better understand the decision criteria that resulted in “Okay Nabu”?
I haven’t really been keeping up with the voice assistant, but I thought the wake word was going to be configurable. I was hoping to be able to say “ok [house]”. Is that not the case?
Quick feedback though: please let us listen to the recordings before submitting them. (Just one example: Sometimes my webcam mic has issues and I’d like to make sure not to transmit white noise or silence instead of what you guys were asking for.)
How would ‘they’ (abusers) link that voice to the real identity behind hedda?
There are so many voices that can be used from YT vids that are more interesting than these snippets from a virtual person with a nickname.
Sorry, but this wake word isn’t gonna succeed. Nabu isn’t a brand that people want to mention dozens of times every day. I won’t put my 2c into making it one. I want some generic term like “computer” or “house”. Alternatively “Alfred” or “Jarvis” (so I can feel like Batman or Iron Man). I don’t want to explain to my guests what “nabu” means.
Since you seem to have been using HA for some time, you might know that you get a lot of freedom to set up your system the way you want.
Helping with this request will not imply that you have to use that wakeword in the future and I guess they are asking to use the same words since that’s probably the best way to train this.
It is, but as long as it’s not clear whether this is an effort to specifically improve this one wake word, or whether it helps with setting up any kind of wake word as well, I see it as valuable feedback when people give reasons why “okay nabu” might not be the best choice from a “usability” perspective.
@synesthesiam Can I suggest that you take this opportunity to ask the same people to also say “Home Assistant”, “Okay Assist”, and “Okay Jarvis” to help train such models as well, so they could be included as alternative models for optional wake words that any user can change to without too much effort?
Do you think “Alexa” and “Siri” were in response to millions of people already begging to say these names? The brand recognition came from the names being used.
A name/word that will not be used in ordinary conversation is desirable. While talking to “computer” is popular (especially for Trekkers), “Computer” or “House” would create too many false positives in many homes (especially while TV/radio news is in the background). In fact, existing brand recognition would be a problem as it increases the chance of false positives.
I do agree with Hedda that Assist should provide a choice of several well-developed wake words, though I suggest a bit more discussion on which phrases would be best.
The point is that many of us Home Assistant enthusiasts want “Home Assistant” to become well known.
We want “Home Assistant” to become a household name and become a brand people recognise.
We want to use “Home Assistant” as the wake word when others visit our homes so we can show it off.
We do not want to say “Hey Nabu”. Nabu is not the project we love.
Nabu Casa is the company that many Home Assistant developers work for, but “Nabu” does not represent all Home Assistant developers or Home Assistant as a whole. Nabu still only represents the company Nabu Casa.
Just to highlight, there are several microWakeWord options currently available (Okay Nabu, Hey Jarvis, Hey Mycroft), with more coming. We want to use the Wake Word Collective tool to improve other wake words eventually; it just takes time and lots of samples to train good models. Trust us, this is just the start!
We are also interested in allowing language communities to pick wake words that work for them and build models around those. Keep an eye on our socials for when we open the tool for other wake words.
For those interested in making their own wake word, it is possible today with microWakeWord, but it requires a good GPU and expertise to tune it correctly. We think microWakeWord is the best option for fast on-device waking, but if you really want to make your own wake word, it’s possible with openWakeWord (which doesn’t run on-device, but instead runs on your Home Assistant OS). See: Create your own wake word - Home Assistant
Answering questions that have been asked above (paraphrasing):
Why “Okay Nabu” and not “Hey Home Assistant” or something else?
Ideally, a wake word should not be something you expect to hear often in conversation by people in the home or via media (music, TV). “Nabu” is a fairly unique word, as opposed to “home”, “assistant”, and “assist”. The microWakeWord models are performing really well, and we like “Okay Nabu”.
In time, I expect the training process for microWakeWord to be simplified and for a community collection of wake words to start forming like openWakeWord has.
Does this just improve “Okay Nabu” or also future wake words?
For now, mostly just “Okay Nabu”. However, any audio data for one wake word can be used as “negative” samples for a different one, so it helps other wake words in that sense.
Keep in mind too that this data will be made publicly available under a CC0 license, so it will hopefully help in the development of future wake word systems. This is similar to how we’ve published our text-to-speech datasets in the hope that they will aid the development of future text-to-speech systems.
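To illustrate the “negative samples” point, here is a minimal Python sketch. It is not how the microWakeWord training pipeline is actually organized; the `Clip` class, the file names, and `build_training_set` are hypothetical names made up for this example. It only shows that one shared pool of labeled clips can be relabeled for whichever wake word is being trained.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Clip:
    path: str    # hypothetical file name of a recording
    phrase: str  # the phrase the contributor was asked to say


def build_training_set(clips, target_phrase):
    """Label each clip 1 if it is the target wake word, otherwise 0 (a negative)."""
    target = target_phrase.strip().lower()
    return [
        (clip.path, 1 if clip.phrase.strip().lower() == target else 0)
        for clip in clips
    ]


# A shared pool of recordings for different wake words (made-up examples).
clips = [
    Clip("a.wav", "Okay Nabu"),
    Clip("b.wav", "Hey Jarvis"),
    Clip("c.wav", "okay nabu"),
]

# When training "Hey Jarvis", the "Okay Nabu" recordings become negatives.
jarvis_set = build_training_set(clips, "Hey Jarvis")
nabu_set = build_training_set(clips, "Okay Nabu")
```

The same three clips yield two different labeled sets, which is the sense in which “Okay Nabu” recordings can still help a future “Hey Jarvis” model.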
I’ve used the above link to submit samples as I think it’s the least I can do to support a fantastic project.
But, I’d really like to be able to use something other than okay nabu in my production environment. I just can’t see my household members adopting it easily.
The problem I have is that I can’t get microWakeWord to accept anything other than okay nabu when I want to use hey jarvis.
I select hey jarvis in the pipeline and I still have to use okay nabu.
Thanks for contributing! Sorry for the confusion: the wake word selected in the pipeline is only for streaming wake word engines. Since microWakeWord is on-device, we have a different mechanism coming in the next release. We plan to adjust the pipeline dialog accordingly.
… On esp32 devices without a headache for novice users?
Currently, the “OK Nabu” wake word is simple to set up… And annoying.
If I could figure out how to litter my home with esp32 devices that would actually respond to “Computer,” I’d love it.
So it doesn’t seem to be working for me. Even though I get the indicator in the address bar telling me the site is accessing my mic, there is a red error box saying it can’t.
I would love to help out, but unfortunately my Voice Assistant has been borked since the last major ESPHome update and nothing I have tried will bring it back to life (even a clean build fails). Several people have reported the same issue, but it has been closed by the devs.
Yes, I have read through the comments and tried including the following to override what is included within the package but I still get the same error when attempting a clean build.
I have also tried installing the project directly from Ready-Made Projects — ESPHome as suggested in another comment and this also fails.
The issue was closed on 19-Sep and scanning through the comments I count at least 7 people including myself who say that the problem persisted after this.
All the comments point to the same error: people using an old configuration.
[mode] is an invalid option for [speaker.i2s_audio]. Did you mean [i2s_mode]?
mode: mono
Everyone that posted that “they got it working” is using channel: mono. The files contain channel: mono as well.
How are you applying this change? Your error in that issue specifically points to you using the wrong key, mode: mono, and mode: mono is not present in any of the current m5stack-atom-echo files. That means you’re doing something wrong when updating the configuration. So, I urge you to try again.
If you have further questions, please create a new post in ESPHome; this blog is not a place for support. When you do create a post, make sure you show new logs and explain which example you decided to try. And remember, if you get the same error about mode: mono, you’ve done something wrong because those keys are not present in the example files.
My biggest issue with “OK Nabu” is the length. Alexa, Computer, Hey Google, all have three syllables. That’s 25% shorter than 4, and significantly easier to say. I’m surprised more people aren’t bothered by that.
And besides the length: I also find it kind of “bumpy” to pronounce. Not sure what the correct linguistic (?) term for it is, but the “usual suspects” among wake words have a better “flow”. At least for me as a German pronouncing it in English.
Happy to help. Some more suggestions for those who are willing to contribute more time might be beneficial:
preferred recording device(s): If we have access to multiple devices that work with the web app (phone, tablet, laptop, iOS, Android, etc.) is there a preference for which to use? Do you want multiple recordings from the same person from different devices?
alternate microphones: should Bluetooth hands-free type devices be used to cover more microphone types?
background noise: suggestions for what to include/avoid such as fans/AC/HVAC, music, background conversations (TV/Radio or live)
Two things that could be done to improve the web app for mobile:
prevent the device from locking/going to sleep while recording is in progress, if possible (alternatively, give instructions to keep the device awake, e.g. tap the screen between recordings)
make the recording ready indication easier to see or ideally hear from across the room
Preferred recording device(s):
Any mobile device (all that you listed) using the built-in mic. Don’t use desktops or laptops (laptop mics can be really bad). Feel free to contribute from a range of devices you have in your home.
Alternate microphones
I’d prefer if people kept to built-in mics. Bluetooth mics are usually designed to enhance the voice so it sounds clearer and closer to the device, whereas we’re aiming for samples that come from various distances.
Background noise
I spoke to Mike on our team about this. He said “Anything is fine as long as it’s not overwhelming the person speaking. Fans, etc. are fine, and work well for testing the robustness of the model.”
I have created 2 issues in the GitHub repo and will take a look at them when I get a moment.
The only problem I see with the latter one is that if we make an audible noise, we need to ensure it doesn’t get included in the recording via an echo in the room, for example. The first point, however, is easily doable.
I have found the Voice unable to recognise my Australian accent, and have given up trying to integrate it into HA. So anything that improves recognition is a good thing.
I’m using the latest Chrome, on an up-to-date Windows 11, but I got this:
It first asks for permission to use the mic, but once I accept, I get the above error: “There was an issue accessing your microphone. Please check your browser permissions and try again”
I found out that my mic access was turned off in Windows settings. I did not even know of this setting… See screenshot below. Once I turned it on, all worked as expected.
Yes, the CC0 license doesn’t restrict the use of the data. I’m not worried about that personally, as I doubt many companies will find “okay nabu” samples to be commercially useful.
I agree with @tobol. I’d never choose to use “Okay Nabu”. I’d much rather use my own choice of wakeword(s).
MUCH better to ask for effort to improve the whole voice recognition environment across multiple words, languages and accents. That I’d happily support and contribute my time to. But “Okay Nabu” - no thanks.
I also agree with @Hedda that voice contributions, if being made public, should be truly anonymous.
Otherwise you’re going against the HA personal privacy principle in essence, even if you’ve asked and got permission.
Appreciate all the hard work that’s going into this. And I get it, “Okay Nabu” is a distinct enough phrase for detection. But for the love of god, please let’s drop this as the default intent. Especially with the rumored voice devices coming out. It’s truly an awful choice, with all due respect.
And you guys just seem to be going down hard in the rabbit hole here to double down on this as the default choice.
It feels strange to me that the first wakeword to be crowd-sourced is one that is associated with the for-profit entity, rather than the Open Home Foundation or Home Assistant. It feels like the priority is being placed in the wrong place and it sends the wrong message.
Until of course home assistant achieves world domination, and the commercial players want to emulate it. “Buy our product and get all the advantages of home assistant, plus these added features, on a simple monthly subscription”
FWIW: Nabu == Germany’s biggest and oldest NGO. It is continuously present in the media, so I doubt German users want to call their HASS “Nabu”. See: Wir über uns - NABU
+1 for custom wake words. No worries training it myself, as it only has to listen to me and my family as opposed to a trillion different accents and dialects. In fact, I want HASS to only listen to me and my family … just my 2ct
It’s a human thing (and seems to become even worse as we are ‘evolving’): complaining, criticizing, …
In this case also: they probably didn’t read the relevant postings…
Calling your voice assistant with the same name as an environmental association is a boss move.
Both are looking out for the environment - one of them takes care of the nature around you, while the other takes care of the environment by making sure you’re not wasting electricity because you don’t want to get up to flip a switch.
What is the status on adding an all-in-one training notebook for microWakeWord (mWW) model training so that we can more easily train our own microWakeWord models?
PS: For openWakeWord model training there is a fully scripted and automated Google Codelab notebook, but I understand that might not work for microWakeWord (mWW) model training, which takes longer and requires more resources?
FYI, some/most posts that questioned the choice of default wake word on official Home Assistant voice hardware were moved to a new thread to separate that discussion (though it is still very much on-topic here), see:
@synesthesiam, I think I understand it now! So microWakeWord already has “okay nabu”, “hey Jarvis” and “Alexa” models. But from the three, “okay nabu” has the worst false-accept/false-reject rate, so this is to have a dataset big enough to improve the model. Am I right?
If it’s that simple, I can see the benefit and the choiceability here. Thanks!
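For anyone unsure what “false-accept/false-reject rate” means here, a rough Python sketch follows. It is only an illustration: the `error_rates` function and the trial data are made up, and it assumes each test clip is simply marked with whether the wake word was spoken and whether the detector fired. Real evaluations often report false accepts per hour of audio instead of per clip.

```python
def error_rates(results):
    """Compute (false-accept rate, false-reject rate).

    results: iterable of (wake_word_spoken, detector_fired) boolean pairs.
    """
    false_accepts = false_rejects = positives = negatives = 0
    for spoken, fired in results:
        if spoken:
            positives += 1
            if not fired:
                false_rejects += 1  # wake word was said but missed
        else:
            negatives += 1
            if fired:
                false_accepts += 1  # detector woke up on something else
    far = false_accepts / negatives if negatives else 0.0
    frr = false_rejects / positives if positives else 0.0
    return far, frr


# Made-up trials: three clips with the wake word, three without.
trials = [
    (True, True),
    (True, False),
    (False, False),
    (False, True),
    (False, False),
    (True, True),
]
far, frr = error_rates(trials)
```

More and more varied recordings give the model a better chance of lowering both numbers at once, instead of trading one for the other.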
Hm. That web interface for training only seems to work if I’m in good range of my phone. If not, well, it doesn’t recognize me. Seems to me it would be more useful to collect less distinct phrases.