Year of the Voice - Chapter 4: Wake words

Hi Pepe59

On your dedicated Home Assistant satellite, just install Raspberry Pi OS and follow the procedure explained here:
synesthesiam/homeassistant-satellite: Streaming audio satellite for Home Assistant (github.com)

On your HA, install the openWakeWord, porcupine1, and/or snowboy add-on.


Thanks, I’ll try it.

So the link I posted above seems to work with the IP Webcam Android app, which I was using for Frigate anyway. Wake word support is pending, however.

The best thing is you can have it respond rudely too, for that particular wake word.

Hi, the processing times for the M5 Echo on my Raspberry Pi 4 are lengthy (like 37 seconds).
I’ve created an issue for this:

https://github.com/home-assistant/core/issues/102461


Hi,
Just tried to train a new wake word, trying to get a “frenchy” pronunciation for Alexa.
The script errored out after 15 minutes, saying something about not being able to find and open a file…
I think I’ll have to wait for things to become a bit more straightforward.
It works great otherwise (using an Atom Echo), but having to pronounce the wake words with an English accent doesn’t feel natural ^^

When using Home Assistant Core, how do we point it to a Docker container running openWakeWord?
It doesn’t look like you can at the moment.

This is how I’m running Piper and Whisper:
Home Assistant, Piper, and Whisper run in separate Docker containers, as Core does not have “add-ons”.
But I can still go to Home Assistant > Settings > Integrations > Add, and it lets me select Piper/Whisper and input the IP:port of the corresponding container.
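
For anyone setting this up the same way, this is roughly what those two services look like in a compose file (a sketch based on the rhasspy/wyoming-piper and rhasspy/wyoming-whisper images; the ports, voice, and model names are the documented defaults, so double-check against each image’s README):

  piper:
    container_name: piper
    image: rhasspy/wyoming-piper
    volumes:
      - ./config/piper:/data
    command: --voice en_US-lessac-medium
    restart: unless-stopped
    ports:
      - 10200:10200   # Wyoming port for TTS

  whisper:
    container_name: whisper
    image: rhasspy/wyoming-whisper
    volumes:
      - ./config/whisper:/data
    command: --model tiny-int8 --language en
    restart: unless-stopped
    ports:
      - 10300:10300   # Wyoming port for STT

Each one then gets added through that same dialog with its own IP:port.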

openwakeword, however, does not show in the Add Integration list…

Go to your ‘Wyoming Protocol’ integration under ‘Devices and Services’ and ‘add entry’ with the details of your openwakeword container…

Cheers.


While testing further, I have another observation. During the livestream, it was mentioned that false positives did not occur during testing.

In my testing, I do get false positives. This is while using the ATOM Echo M5 development kit. This happens both with openWakeWord 1.8.0 and with Porcupine 1.0.0 (both set to “Alexa”). All settings are left at their defaults.

The typical scenario in which the false positives occur is watching TV (Formula 1): two false positives in one hour.

Hello

A French user like me, with a pitiful English accent? :wink:
You can use the new snowboy add-on (thanks Mike!) and train a language-free custom wake word.
hassio-addons/snowboy at master · rhasspy/hassio-addons (github.com)

You can add a 2nd stage verifier that will reduce false positives.
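
A second-stage verifier helps, but you can also make detection itself stricter. A rough sketch for a Docker setup (hedged: the --threshold and --trigger-level flags come from the rhasspy/wyoming-openwakeword image and mirror the add-on’s options; the values here are untested guesses):

  openwakeword:
    image: rhasspy/wyoming-openwakeword
    # Higher threshold plus two required activations = fewer false
    # positives from the TV, at the cost of more missed wake-ups.
    command: --preload-model 'alexa' --threshold 0.7 --trigger-level 2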

As far as I know, they didn’t say you would not get false positives. See GitHub - dscripka/openWakeWord: An open-source audio wake word (or phrase) detection framework with a focus on performance and simplicity.

Basically it’s this TensorFlow Hub model from a Google paper: [2002.01322] Training Keyword Spotters with Limited and Synthesized Speech Data

It’s basically a way to create a KWS when you don’t have good-quality datasets, and it will never compete with commercial KWS such as Alexa/Google, as they do have the good-quality datasets.

It does well against Porcupine, but Porcupine is very much in the same area: an easy custom KWS model that performed well against many obsolete KWS (Snowboy and others), but against commercial KWS its results are far weaker.

The same is also true of Whisper: for a limited set of languages the Large model posted SotA results, but this is not true of how it is often being used, with the Tiny model, where WER skyrockets.

I guess it is what it is…

Oh nice!!
My English accent is not bad, but it is more a matter of WAF ^^
Will try this add-on btw, and many thanks.

If you can find a French-language TTS model, or create a dataset with one, then it would likely be better.
The keyword could likely be the same, but the language model will bring through the accent more.


Thanks for replying. Regarding the (absence of) false positive wake word detection, my expectations were based on this part of the video: https://www.youtube.com/live/YzgYYkOrnhQ?si=LzCIib2LOcvInfLP&t=1446

My experience has been very different.

Yeah, apparently “It’s a technological marvel that is created with 4 goals in mind”, but hey, nothing like blowing your own trumpet (some have ribs removed to achieve it).
From Mycroft to Rhasspy we have some very enthusiastic hobbyists doing some great work, and a lot of it.
When you compare it to big data, though, it is trailing quite a long way behind.

I have been trying to advocate how easy it is to capture quality data, and how dictating the hardware is a huge advantage that big data has.
A project can roll out and dictate a KW, then quickly capture a quality dataset of real use, where users opt in to send collated packages of data with gold-standard metadata.
What we have is more a demonstration of what could be done than what users of commercial voice systems have come to expect.

Open source has been nothing short of a disaster in the discipline of applying quality datasets and metadata, and the CommonVoice initiative was likely the biggest waste of Mozilla funds ever.
It is riddled with wrongly labelled data and much of it is bad, but it also almost totally lacks quality metadata (a few items have it).
Region, age group, gender, native speaker, and mic hardware (and maybe a few others) are all that is needed; we do not need your name or home address, just metadata, to provide specific language and accent models for KWS & ASR.
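
To make that concrete, the per-clip record could be as small as this (a hypothetical schema sketched for illustration, not any existing CommonVoice or MLCommons format):

  # Hypothetical per-clip metadata for an opt-in voice dataset
  clip: kw_0001.wav
  keyword: "hey jarvis"          # example wake word, not a real collated KW
  region: en-GB
  age_group: 30-39
  gender: male
  native_speaker: true
  mic_hardware: "M5 Atom Echo"
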
The same goes for the MLCommons initiative, which merely force-aligned CommonVoice: at a guesstimate it contains more non-native spoken word than native, with near-zero metadata, so you are unable to filter for specifics.
It just didn’t happen and hasn’t happened, and strangely, open source hasn’t started collating usage data via an opt-in.
Custom KWs have also been created, so the herd will never be able to collate a single KW in any quantity.

When an ASR command sentence activates a skill, without an instant stop or the same skill being instantly repeated, you can likely assume the KW & ASR command sentences were good.
Open source could be creating gold-standard datasets through use, closing the gap via an opt-in collation of only KWs and command sentences, with ever-evolving models based on them shipped OTA.

I noticed stutter when using en-GB. en-US was fine.

This is how I did it.

  openwakeword:
    container_name: openwakeword
    image: rhasspy/wyoming-openwakeword
    volumes:
      # /custom is where user-trained wake word models go
      - ./config/openWakeWord/config:/config
      - ./config/openWakeWord/data:/data
      - ./config/openWakeWord/custom:/custom
    environment:
      - TZ=America/Los_Angeles
    restart: unless-stopped
    # preload the default wake word and scan /custom for extra models
    command: --preload-model 'ok_nabu' --custom-model-dir /custom
    ports:
      - 10400:10400       # Wyoming protocol (TCP)
      - 10400:10400/udp
Then, as @Fraddles says, enter the IP address of your Docker container. I couldn’t get it to work with a non-default port, FWIW, but there’s no conflict with 10400 on my host.
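
For the record, the non-default port attempt was just the standard Docker remap, sketched below with a hypothetical host port 10401 (the container side stays on 10400), with the Wyoming entry then pointed at 10401:

    ports:
      - 10401:10400       # host 10401 -> container's default 10400
      - 10401:10400/udp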

Do you mean here?

I have tried changing options there, but I am finding the stutter isn’t consistent, so it’s hard to test. This is why I thought it might be a hardware resource constraint that was coming and going.

Same problem here. Did you find a solution?

INFO ESPHome 2023.10.1
INFO Reading configuration /config/esphome/m5stack-atom-echo-8a1468.yaml...
INFO Updating https://github.com/esphome/esphome.git@pull/5230/head
INFO Generating C++ source...
Traceback (most recent call last):
  File "/usr/local/bin/esphome", line 33, in <module>
    sys.exit(load_entry_point('esphome', 'console_scripts', 'esphome')())
  File "/esphome/esphome/__main__.py", line 1036, in main
    return run_esphome(sys.argv)
  File "/esphome/esphome/__main__.py", line 1023, in run_esphome
    rc = POST_CONFIG_ACTIONS[args.command](args, config)
  File "/esphome/esphome/__main__.py", line 454, in command_run
    exit_code = write_cpp(config)
  File "/esphome/esphome/__main__.py", line 190, in write_cpp
    return write_cpp_file()
  File "/esphome/esphome/__main__.py", line 208, in write_cpp_file
    writer.write_cpp(code_s)
  File "/esphome/esphome/writer.py", line 342, in write_cpp
    copy_src_tree()
  File "/esphome/esphome/writer.py", line 295, in copy_src_tree
    copy_files()
  File "/esphome/esphome/components/esp32/__init__.py", line 593, in copy_files
    repo_dir, _ = git.clone_or_update(
  File "/esphome/esphome/git.py", line 95, in clone_or_update
    old_sha = run_git_command(["git", "rev-parse", "HEAD"], str(repo_dir))
  File "/esphome/esphome/git.py", line 32, in run_git_command
    raise cv.Invalid(err_str)
voluptuous.error.Invalid: fatal: ambiguous argument 'HEAD': unknown revision or path not in the working tree.
Use '--' to separate paths from revisions, like this:
'git <command> [<revision>...] -- [<file>...]'

I did, actually. I just tried again a day or so later and it worked fine. I think there was just an issue with it accessing the GitHub repo.