Home Assistant Voice PE - Custom Wake Words Please!

qJake · February 10, 2025, 2:32pm

Any word from the HA Voice PE team yet on when the ESP32 firmware will support custom wake words?

Some community projects have trained new microWakeWord models, and a few additional wake words have even been trained and uploaded to the official microWakeWord repository.

Even if the implementation is somewhat primitive or limited, I’d still like to see it. For example, the firmware could look for specific files (like /config/voicepe/wake/model.json and model.tflite) and, if they exist, offer them as a fourth “Custom” option in the wake word dropdown.

My family doesn’t like any of the out-of-the-box wake words because they aren’t easy to say or remember, especially for kids. Having more options, or the ability to add custom or third-party-trained models, is really the only major complaint I have about it at this point!

NathanCu · February 10, 2025, 2:55pm

You should be able to take control of the device firmware in ESPHome and drop your file in. Just add it in the same place the other model files are referenced. (I think those lines actually refer to a Git repo where the microWakeWord models are stored.)

Edit the files to point to yours and recompile.
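A minimal sketch of what “point to yours” could look like while keeping the official package reference, assuming a recent ESPHome release whose micro_wake_word component accepts a local path to a model manifest (check the component docs for your version). The folder name, file name and id are hypothetical placeholders. The manifest is the JSON file that names the .tflite model; both files are assumed to sit in the ESPHome config directory.

```yaml
# Added to the adopted device config, below the existing packages: entry.
# Hypothetical files: wake_words/my_wake_word.json (the manifest) and the
# .tflite file it references, both stored under the ESPHome config directory.
micro_wake_word:
  models:
    # Appended to the models already defined by the Voice PE package
    - model: wake_words/my_wake_word.json
      id: my_wake_word   # placeholder; must not reuse an id from the official YAML
```

A URL to a hosted manifest, as in the configs posted later in this thread, works the same way; only the model: value changes.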

I agree. I hate the words except Jarvis, but he’s already overused… So I’m one of those working on building my own microWakeWord model, and that’s what I intend to do.

But I’m waiting to see what Year of the Voice Chapter 9 brings before I invest time and effort.

qJake · February 10, 2025, 3:39pm

Ah, I didn’t know you could do that by taking over the device. Nice!

Here’s my concern though… I really like the “set it and forget it” nature of the non-customized firmware. It “just works” - from the HA integration to the automatic updates.

If I take control, I’m assuming that means I’ll have to use ESPHome to flash the firmware manually whenever there’s an update, and I’m also responsible for the connection / integration with HA? Do I lose anything by taking control?

NathanCu · February 10, 2025, 3:49pm

Essentially you will have to replicate edits in new firmware in the future. Keep track of what you did.

Look, at the end of the day they are still working on making it work right. There’s a reason it’s the PREVIEW EDITION. Making it work in everyone’s fringe setup (like four million wake words) is, I suspect, not a high priority for them right now.

I’d rather they work on getting the timers exposed to HA so we can see the entities, and on making sure it doesn’t crap out playing audio or stop answering the wake word altogether.

I can edit the code to put in my wake word when they update it.

But that’s why we have this and not a Google or Amazon device. You can always throw it back to default if it becomes too cumbersome for you.

qJake · February 10, 2025, 4:05pm

You’re absolutely right, and I wanted to at least make this thread since Google results for HA Voice PE and “custom wake word” didn’t turn up a whole lot of great results.

I also didn’t know you could go back to managed firmware. I think I’ve been burned too many times trying to get older voice hardware to work (looking at you, ESP32-S3-BOX), where the ESPHome config is in someone’s GitHub, but if you want the screen to work you have to pull in parts of someone else’s project too, and the whole thing just becomes a huge mess.

Voice PE is already miles and miles ahead of all of that stuff… I guess I just didn’t want to go back to futzing around with ESPHome config files. But I suppose my choices are that, or wait for them to officially implement it.

In the interest of this thread being a potential Google search result, can anyone share what the modifications to the OOTB ESPHome firmware would look like if you were to augment it with a custom wake word?

qJake · February 11, 2025, 12:23am

Argh… this is what I was afraid of.

When I “took control”, I just had a barebones config file with a packages: reference to Voice PE… but nothing about microWakeWord.

First I added only the micro_wake_word: section to the config and kept packages:, but that didn’t work.

Then I tried replacing my entire config with the contents of this:

https://raw.githubusercontent.com/esphome/home-assistant-voice-pe/refs/heads/dev/home-assistant-voice.yaml

Replacing/adding stuff as needed (e.g. API encryption key, wifi credentials, device name). Plus an experiment to add another wake word:

micro_wake_word:
  id: mww
  models:
    - model: https://github.com/kahrendt/microWakeWord/releases/download/okay_nabu_20241226.3/okay_nabu.json
      id: okay_nabu
    - model: hey_jarvis
      id: hey_jarvis
    - model: hey_mycroft
      id: hey_mycroft
    # CUSTOM
    - model: https://github.com/kahrendt/microWakeWord/releases/download/v2.1_models/alexa.json
      id: alexa
    - model: https://github.com/kahrendt/microWakeWord/releases/download/stop/stop.json
      id: stop
      internal: true
  # CUSTOM
  vad:
    model: https://github.com/kahrendt/microWakeWord/releases/download/v2.1_models/vad.json      
  microphone: comm_mic

# ......... the rest is "stock" / unchanged. omitted for conciseness.

Compiled and OTAed it and… it errors with something I can’t easily debug with my very limited ESP32 knowledge:

Of course that error makes little sense to me, because I did not change the voice assistant configuration at all, I have other Voice PE devices using the same assistant successfully, and if I push the button on the device, the whole prompt/response flow works flawlessly! And line 776, in case you’re wondering, is part of the lambda for this:

      - addressable_lambda:
          name: "Error"
          update_interval: 10ms
          lambda: |-
            // Line 776 is in here... (lambda bodies are C++, so comments use //)

So I’ll reiterate the need for more user-friendly customization of the wake word. Instead of choosing from a predefined selection, allow users the option to upload or reference their own.

wmaker · February 11, 2025, 7:34pm

I’m not an expert, but I can tell that the audio was sent to STT, which figured out your sentence. When HA then ran the sentence through the intent parser, it failed somehow, and HA seems to just be informing the VPE of this failure so that the VPE can shut off its mic, etc.

I’m not sure what the intent error is; I tried this sentence on my system and it worked.
Just to make sure, on your HA go to Developer Tools > Assist, type in “can you tell me the time?” and see what it shows.

qJake · February 12, 2025, 4:33am

That works. Other VPE devices that I have connected (same network, same HA instance) also work. I’m not sure why one specific VPE would have an intent error… quite odd.

pove · February 14, 2025, 4:51pm

substitutions:
  name: home-assistant-voice-xxx
  friendly_name: Home Assistant Voice xxx
packages:
  Nabu Casa.Home Assistant Voice PE: github://esphome/voice-kit/home-assistant-voice.yaml
esphome:
  name: ${name}
  name_add_mac_suffix: false
  friendly_name: ${friendly_name}
api:
  XXX 

wifi:
  XXX

micro_wake_word:
  models:
    - model: https://github.com/.../cassandra.json
      id: cassandra
    - id: !remove hey_mycroft
    - id: !remove hey_jarvis

qJake · February 14, 2025, 5:04pm

I feel like I tried that already and got an ESPHome compilation error, but I will try it again and report back if it works!

pove · February 14, 2025, 5:39pm

It’s working for me. Let me know if it throws any error for you.

NathanCu · February 14, 2025, 5:43pm

@pove just to make sure we’ve got the steps for someone playing at home:

1) Take control of the device.
2) Take note of the sections in the existing YAML where you have the placeholders, and transfer them into a copy of what you have above.
3) Overwrite, save, compile, deploy.
4) …
5) Profit?

(Or are we missing anything, besides the microWakeWord models?)

pove · February 14, 2025, 5:59pm

Actually, when you take control of Voice PE in ESPHome, you’ll get everything in the YAML, with a reference to the official code as a package from GitHub.

If they release a new version, you just need to install your device again using ESPHome. It will fetch the latest code from GitHub, compile it and upload it to the device.

The only thing I added was the micro_wake_word section. I included a new wake word referencing the model on GitHub and gave it a new id (it’s important not to repeat ids that already exist in the official YAML). I used a placeholder name for the new wake word in the code I shared, but we all know what we’re talking about here…

I also removed two official wake words with the “!remove” tag (I’m not going to use them).

NathanCu · February 14, 2025, 6:04pm

That’s what I’d surmised; just making sure we have it for posterity… Otherwise someone WILL ask about 25 posts down the thread.

I’m replacing Mycroft myself. I keep Jarvis as the ‘system’ personality. And something tells me someone wrote code somewhere assuming the default is always there, so I left Nabu alone… for now.

If I’m not mistaken we have to replace at LEAST one because of space limits.

pove · February 14, 2025, 6:22pm

I wanted to remove all the stock wake words, but when I did, the model didn’t load, and you couldn’t change the wake word (there was only one option in the picker).

qJake · February 14, 2025, 8:07pm

This was not my experience… when I took control, I only got (more or less) lines 1-9 in this screenshot. Definitely not the hundreds of lines in the full YAML config.

I think I was a point release behind on Home Assistant (2025.2.2 instead of the latest), not sure if that makes a difference or not.

I’m confused how this works… In the official config, there are other sections that reference id: mww to point to the micro_wake_word: component/section. So if you declare a second micro_wake_word: section with a different ID, how are the other sections supposed to know to use that one instead of the existing one with ID mww?

pove · February 14, 2025, 9:53pm

The hundreds of lines of the full official yaml are here:

You are adding them as a package.

Regarding the “mww” id, you are right, but I’m not replacing that section, I’m just adding another model, with a new id for the model, inside that section.
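To illustrate that point, here is roughly what ESPHome ends up with after merging the package with the micro_wake_word: section from the config above. This is a hedged sketch based on my understanding of ESPHome package merging (dictionaries merged key by key, new list entries appended, and an entry tagged !remove deleting the item with that id). It assumes the package’s stock models are the ones in qJake’s copy (okay_nabu, hey_jarvis, hey_mycroft and the internal stop model) and omits the other stock keys; the cassandra URL stays elided as in the original post.

```yaml
# Approximate merged result (not something you type in yourself).
micro_wake_word:
  id: mww                  # still the package's id, so lambdas that use mww keep working
  microphone: comm_mic
  models:
    - model: https://github.com/kahrendt/microWakeWord/releases/download/okay_nabu_20241226.3/okay_nabu.json
      id: okay_nabu
    - model: https://github.com/kahrendt/microWakeWord/releases/download/stop/stop.json
      id: stop
      internal: true
    # Appended from the local config; hey_jarvis and hey_mycroft were dropped by !remove
    - model: https://github.com/.../cassandra.json
      id: cassandra
```

Because the local section never redefines id:, there is only one micro_wake_word component, and everything that points at mww sees the combined model list.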

bkazm · February 15, 2025, 5:49pm

I have a new wakeword that was trained using instructions here: Wake words for Assist - Home Assistant
That gave me two files (a .tflite and a .onnx), but no json. Can these be used with the PE? If so, how do I reference them (- model: ??) after taking control? Currently they are in the /share/openwakeword folder on my HA device.

pove · February 15, 2025, 8:42pm

Important: the Assist wake word (openWakeWord) is different from microWakeWord.

stuartiannaylor · February 15, 2025, 10:16pm

No.
That is for openWakeWord, whilst PE uses microWakeWord.

github.com/kahrendt/microWakeWord: notebooks/basic_training_notebook.ipynb (commit 1c2d86664), which opens with:

Training a microWakeWord Model

This notebook steps you through training a basic microWakeWord model. It is intended as a starting point for advanced users. You should use Python 3.10.

The model generated will most likely not be usable for everyday use; it may be difficult to trigger or falsely activates too frequently. You will most likely have to experiment with many different settings to obtain a decent model!

In the comment at the start of certain blocks, I note some specific settings to consider modifying.

This runs on Google Colab, but is extremely slow compared to training on a local GPU. If you must use Colab, be sure to Change the runtime type to a GPU. Even then, it still slow!

At the end of this notebook, you will be able to download a tflite file. To use this in ESPHome, you need to write a model manifest JSON file. See the ESPHome documentation (https://esphome.io/components/micro_wake_word) for the details and the model repo (https://github.com/esphome/micro-wake-word-models/tree/main/models/v2) for examples.

But to be honest, it’s pretty stinky in terms of dataset creation: the 1,000 Piper keyword samples come from voices of two genders with very little variation, and then it starts applying reverberation recorded in forests and the like… ?!?
I created a microWakeWord dataset that I have been meaning to train on, excluding reverberation, as I’m not sure how well (or how) the XMOS processing works.

I tend to binge or procrastinate, and I’m doing the latter, but I’m not even sure the training fits the current model settings…