Year of the Voice - Chapter 4: Wake words

We still only have 'real world' benchmarks that feed raw, unprocessed audio straight into Porcupine, and when compared against commercial results that is not the same 'real world' as the accuracy Google or Amazon achieve.
Open source sets itself up for a fall here, because bring-your-own-mic creates a far wider range of input conditions than big data faces, since they dictate the hardware.
We also completely lack any decent audio preprocessing: in real use the main sources of noise are other media and dynamic noise, while static noise is actually a rarity.
Big data goes one step further with profile-based, targeted voice extraction, and the difference in use can be huge.
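For illustration, a more realistic benchmark could mix media noise into the clean clips at a controlled SNR before scoring. A minimal sketch (the file names are placeholders; assumes 16 kHz mono WAVs read via the soundfile library):

# Illustrative only: mix "media" noise into a clean wake-word clip at a
# target SNR, so the benchmark audio is closer to real-world conditions.
import numpy as np
import soundfile as sf

def mix_at_snr(speech: np.ndarray, noise: np.ndarray, snr_db: float) -> np.ndarray:
    # Loop/trim the noise to match the speech length.
    reps = int(np.ceil(len(speech) / len(noise)))
    noise = np.tile(noise, reps)[: len(speech)]
    # Scale noise so that 10*log10(P_speech / P_noise) == snr_db.
    p_speech = np.mean(speech ** 2)
    p_noise = np.mean(noise ** 2) + 1e-12
    noise = noise * np.sqrt(p_speech / (p_noise * 10 ** (snr_db / 10)))
    mixed = speech + noise
    # Normalize only if the mix would clip.
    peak = np.max(np.abs(mixed))
    return mixed / peak if peak > 1.0 else mixed

speech, sr = sf.read("hey_jarvis_sample.wav")   # placeholder file
noise, _ = sf.read("tv_background.wav")         # placeholder file
sf.write("benchmark_5db.wav", mix_at_snr(speech, noise, snr_db=5.0), sr)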

It's great that you can train your own wake word and post benchmarks, but real-world use will be compared against commercial wake words that people have actually used. From the initial open-source audio pipeline to the huge, high-quality datasets they hold, so much is missing that the results are very likely to look poor in comparison; the Porcupine benchmarks already fall far short of the accuracy Google and Alexa units achieve.

Even the beamforming on commercial units is targeted: it locks onto the voice for the duration of the command sentence (a toy version is sketched below), and simply using a conference speakerphone with no target is not the same.
The mics we have very rarely support far field, as the expected setup is broadcast style.
In commercial setups it is not just the hardware but the algorithms that are dictated, so the models are trained on the exact signature of the device, and on top of that they use transfer learning on captured usage to increase accuracy.
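To illustrate the beamforming point, here is a toy delay-and-sum beamformer: align each mic channel on the steered direction, then average, so the targeted voice adds coherently while off-axis noise does not. The array geometry and signals below are made up for the sketch:

# Minimal delay-and-sum beamformer sketch (numpy).
import numpy as np

def delay_and_sum(channels: np.ndarray, delays_samples: np.ndarray) -> np.ndarray:
    """channels: (n_mics, n_samples); delays_samples: per-mic steering delays."""
    n_mics, n_samples = channels.shape
    out = np.zeros(n_samples)
    for ch, d in zip(channels, delays_samples):
        out += np.roll(ch, -int(d))  # advance each channel by its steering delay
    return out / n_mics

# Example: 4-mic linear array where the wavefront hits successive mics
# 2 samples apart (an assumed geometry), simulated with a shifted signal.
rng = np.random.default_rng(0)
sig = rng.standard_normal(16000)
mics = np.stack([np.roll(sig, 2 * i) for i in range(4)])  # simulated arrivals
enhanced = delay_and_sum(mics, delays_samples=np.array([0, 2, 4, 6]))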

We still have a hole in the initial audio processing, and bring-your-own-device compounds it, so the results in use are likely to be similar to what we have had before: quite a way off what commercial units achieve.

Some will likely enjoy building and testing, while many who use or have used a commercial system may not, or may not consider it worthwhile.
If the thread fills up with enthusiastic users then maybe, but more likely the current situation will persist with a few hobbyists.

You could do on-device training, using transfer learning to bias a large pretrained model towards a smaller model of captured use, and likely bridge the gap (a rough sketch follows below).
Home Assistant could also, as with its Green/Yellow devices, create a KWS device and release open-hardware documentation, fixing the hardware signature so datasets of real use can be collected, which would close the gap even further.
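A rough sketch of the transfer-learning idea, assuming a Keras pipeline; the pretrained model file and the captured-use dataset are hypothetical placeholders:

# Freeze a large pretrained feature extractor, then fine-tune a small
# head on audio captured from the user's own device/room.
import tensorflow as tf

base = tf.keras.models.load_model("pretrained_kws_embeddings.keras")  # placeholder
base.trainable = False  # keep the general-purpose features frozen

model = tf.keras.Sequential([
    base,
    tf.keras.layers.Dense(32, activation="relu"),
    tf.keras.layers.Dense(1, activation="sigmoid"),  # wake word: yes/no
])
model.compile(optimizer=tf.keras.optimizers.Adam(1e-4),
              loss="binary_crossentropy", metrics=["accuracy"])

# captured_ds would yield (features, label) pairs recorded through the
# actual mic/pipeline in the home -- the device "signature" above.
captured_ds = tf.data.Dataset.load("captured_use_dataset")  # placeholder
model.fit(captured_ds.batch(16), epochs=5)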

The gap in both software and hardware is big, and where open source has a hole it still doesn't seem to have been addressed…

Mass adoption and happy users will be the test, but I have a hunch on how that will go.

1 Like

It's a work in progress, not a finished project, and I am curious what the next chapter will bring.
I’m impressed with what has been achieved in a short time and the fact that you can run it locally.

After some initial challenges I was able to get the wake word to work on the M5Stack Atom; however, this morning when I came back it no longer responded. I tried restarting it, but no luck. I saw there was a beta update for ESPHome, so I installed that, but unfortunately that didn't help either. I tried restarting HA and now my wake-word integration within Wyoming is failing. I only see this in the logs.

Logger: homeassistant.bootstrap
Source: bootstrap.py:508
First occurred: 8:27:42 AM (4 occurrences)
Last logged: 8:30:42 AM

Waiting on integrations to complete setup: wyoming

It never starts up. Any ideas on how to get past this? Do I need to uninstall the integration?

HA finally restarted and now I also see this error

Logger: homeassistant.config_entries
Source: config_entries.py:399
First occurred: 8:31:52 AM (1 occurrences)
Last logged: 8:31:52 AM

Error setting up entry openWakeWord for wyoming
Traceback (most recent call last):
  File "/usr/src/homeassistant/homeassistant/config_entries.py", line 399, in async_setup
    result = await component.async_setup_entry(hass, self)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/src/homeassistant/homeassistant/components/wyoming/__init__.py", line 23, in async_setup_entry
    service = await WyomingService.create(entry.data["host"], entry.data["port"])
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/src/homeassistant/homeassistant/components/wyoming/data.py", line 38, in create
    info = await load_wyoming_info(host, port)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/src/homeassistant/homeassistant/components/wyoming/data.py", line 57, in load_wyoming_info
    async with AsyncTcpClient(host, port) as client, asyncio.timeout(timeout):
  File "/usr/local/lib/python3.11/site-packages/wyoming/client.py", line 35, in __aenter__
    self._reader, self._writer = await asyncio.open_connection(
                                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/asyncio/streams.py", line 48, in open_connection
    transport, _ = await loop.create_connection(
                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/asyncio/base_events.py", line 1069, in create_connection
    sock = await self._connect_sock(
           ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/asyncio/base_events.py", line 973, in _connect_sock
    await self.sock_connect(sock, address)
  File "/usr/local/lib/python3.11/asyncio/selector_events.py", line 628, in sock_connect
    return await fut
           ^^^^^^^^^
asyncio.exceptions.CancelledError: Global task timeout

What a great feature!!!

By the way, would it be possible to stream the answer (if any) to other multimedia devices (such as Sonos)? The reason is that the sound quality and power of a device such as an ESP32 would probably not be the best. It might also spare some ESP32 CPU resources to help detect the wake word locally. On top of that, it's more sustainable to reuse existing speakers than to add new ones.

Using an ESP32 as the wake-word capture device and a Sonos (or other smart speaker) to deliver the answer might be interesting for some users' configurations.
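Until something like this is built in, here is a rough interim sketch of routing a spoken message to a Sonos via Home Assistant's REST API (assumes a long-lived access token and the Google Translate TTS integration providing tts.google_translate_say; the host and entity ID are examples):

import requests

HA_URL = "http://homeassistant.local:8123"
TOKEN = "YOUR_LONG_LIVED_ACCESS_TOKEN"

def say_on_sonos(message: str, entity_id: str = "media_player.living_room_sonos") -> None:
    # Call the TTS service so the answer plays on the Sonos, not the ESP32.
    resp = requests.post(
        f"{HA_URL}/api/services/tts/google_translate_say",
        headers={"Authorization": f"Bearer {TOKEN}"},
        json={"entity_id": entity_id, "message": message},
        timeout=10,
    )
    resp.raise_for_status()

say_on_sonos("The wake word was detected.")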

Thanks.

11 Likes

Came here to say this. It's likely to get buried in this thread; maybe open a separate feature request and I will upvote it!

I like the idea of having simple microphone ESP32s hidden in sneaky places and then having the replies come over specific multimedia/Google Home Mini devices.

Edit: I opened a feature request.

4 Likes

M5Stack does not have customer service to speak of. They don't answer emails, my parcel is lost, and I lose my money.
Where can I get the ATOM Echo Smart Speaker Development Kit in the EU?

I plugged in my USB microphone and installed the Assist Microphone add-on, as per the section "Turn Home Assistant into a voice satellite". In the config it is asking for the microphone name or number; how can I find this? Or is there something more I should be doing to get the microphone to work with HA? It is showing up in the hardware list.

1 Like

Are there plans to add support for generic microphone/camera streams, as well as custom output devices?

It would be nice to have a simple UI where you select the device to listen on (microphone, camera, etc.), then choose the device to output sound on.

2 Likes

Make sure to go to Devices & Services and configure the discovered openWakeWord.

You shouldn't need to mess with that. I will be removing those options in the next update, since there are already audio drop-downs at the bottom that let you pick the default mic/speaker.

Looks like it lost connection to the add-on or Docker container running the wake word service.
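If it happens again, a quick reachability check can tell you whether the Wyoming service is listening at all. A minimal sketch, assuming openWakeWord's default port 10400 (adjust host/port to your setup; requires Python 3.11+ for asyncio.timeout):

import asyncio

async def probe(host: str, port: int, timeout: float = 5.0) -> bool:
    # Try to open a plain TCP connection to the Wyoming service.
    try:
        async with asyncio.timeout(timeout):
            reader, writer = await asyncio.open_connection(host, port)
    except (OSError, asyncio.TimeoutError):
        return False
    writer.close()
    await writer.wait_closed()
    return True

ok = asyncio.run(probe("homeassistant.local", 10400))
print("Wyoming service reachable" if ok else "connection failed: check the add-on/container")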

Thanks, I jumped over to Discord and was able to get some help to resolve this. FYI, it took a restart of the entire system.

1 Like

Tinytronics in NL does have them in stock…

You need the Assist Microphone add-on.

For developers: when I add Piper, it automatically pops up the option to configure it with Wyoming in the devices. If you don't do it at that time (I didn't) and you do it manually later, it asks you for host and port, and entering the HA host IP and port 10200 doesn't work.
Uninstalling Piper and adding it again brings back the option to configure, and doing it straight away works.
It seems like the first config after installing Piper works, but if you don't do it immediately then you can no longer do it manually.
I am on HA OS.

@greggotcher there was a bug in the example Google Colab notebook that was fixed last night. If you try again with the updated version it should work correctly.

2 Likes

Same with Whisper

I saw that in the stream. Where is the Assist Microphone add-on located?

It’s in the first post.

Hello,

First of all, I am impressed by the progress of this Year of the Voice.

I have a question about openWakeWord. As it seems to use TensorFlow Lite, do you plan to support the Coral USB accelerator for finding the wake word in the stream?
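For context, pointing a TFLite model at a Coral Edge TPU generally looks like the sketch below. The model file name is a placeholder, the model would also need to be compiled for the Edge TPU first, and I don't know whether openWakeWord supports this out of the box:

# Load a TFLite model with the Edge TPU delegate from tflite_runtime.
import tflite_runtime.interpreter as tflite

interpreter = tflite.Interpreter(
    model_path="openwakeword_edgetpu.tflite",  # hypothetical compiled model
    experimental_delegates=[tflite.load_delegate("libedgetpu.so.1")],
)
interpreter.allocate_tensors()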

Thanks,

1 Like