It is not the wake word that is the issue, but the speech-to-text conversion.
It is fairly easy to take the spoken text, compare it against a select few command lines, and allow some deviation in the comparison to get a successful hit.
If you want to replace those select few lines with any possible spoken text, the number of possibilities grows exponentially with each added word.
You would quite quickly need a supercomputer to get a decent response time, and this is the difference between a local voice assistant and the ones provided by Apple/Amazon/Google.
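To make that concrete, here is a minimal Python sketch of the "few commands plus some deviation" idea; the command list and cutoff are purely illustrative, not what HA actually uses internally:

```python
# Illustrative only: fuzzy-match a spoken sentence against a small,
# fixed command list, tolerating some deviation in the wording.
import difflib

COMMANDS = [
    "turn on the lights",
    "turn off the lights",
    "start the vacuum",
]

def match_command(spoken: str, cutoff: float = 0.7) -> str | None:
    # Return the closest known command, or None if nothing is close enough.
    hits = difflib.get_close_matches(spoken.lower(), COMMANDS, n=1, cutoff=cutoff)
    return hits[0] if hits else None

print(match_command("turn of the light"))  # -> "turn off the lights"
```

With three fixed commands this is instant; with arbitrary sentences there is no short list to compare against, which is where the heavy models and hardware come in.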
Hmm, I don’t quite understand yet.
What is the local faster-whisper then, which we can install via the Wyoming Protocol?
Can’t it do exactly that?
I understand that it needs a trigger/wake word and a limited input window to avoid data overload.
But since I can already use the local Whisper speech-to-text via the HA browser, it should also be possible to use it via the M5 Echo, no?
I looked up faster-whisper and it can do roughly what you want.
I do not know if you have the hardware for it.
The GitHub page seems to suggest an NVIDIA Tesla V100 to get faster than real time, or an Intel Xeon Gold 6226R to match real time.
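For reference, basic faster-whisper usage from Python looks roughly like this; the model size, device, and file name are placeholders (a small int8-quantized model is the usual compromise when you don’t have hardware like that):

```python
# Sketch of basic faster-whisper usage; "small" and the file are placeholders.
from faster_whisper import WhisperModel

# int8 on CPU is the common fallback when no CUDA GPU is available.
model = WhisperModel("small", device="cpu", compute_type="int8")

segments, info = model.transcribe("recording.wav")
for segment in segments:
    print(f"[{segment.start:.1f}s -> {segment.end:.1f}s] {segment.text}")
```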
Ok, I see where the confusion is coming from.
There are two ways of setting up a voice assistant in HA.
One is fully local and the other uses cloud services, Nabu Casa being one of them.
Can anyone confirm this, or provide me with some helpful tips, tutorials, etc. to get it done?
It would be so amazing if I could talk to my cloud automations and chatbots via HA…
I still don’t know of any way to capture STT sentences and pass them straight on.
This tutorial by Technithusiast got me closer, but I don’t want to use Telegram. I just want to be totally hands-free: use the M5Stack wake word to start listening, extract the STT text, and pass it on to make.com via HTTP request:
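In other words, the last step I am after boils down to something like this Python sketch (the webhook URL is a placeholder; make.com custom webhooks accept plain JSON POSTs):

```python
# Minimal sketch: forward a recognized STT sentence to a make.com webhook.
import requests

WEBHOOK_URL = "https://hook.eu1.make.com/your-webhook-id"  # placeholder URL

def forward_sentence(sentence: str) -> None:
    # POST the sentence as JSON; fail loudly if make.com rejects the call.
    resp = requests.post(WEBHOOK_URL, json={"sentence": sentence}, timeout=10)
    resp.raise_for_status()

forward_sentence("what is on my calendar today")
```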