Home Assistant Voice PE - Custom Wake Words Please!

Thanks for the suggestion. I mirrored this almost exactly but still no luck on the wake word.

You dropped both generated files into /config/esphome? If you renamed the files after generating them, you have to update the contents of the JSON file to match.

That was a good suggestion, however it’s still not working. I added the files locally to /config/esphome and referenced them via local paths in both the config and the JSON, instead of the GitHub remote. New config…
yaml:
model: /config/esphome/computer.json
json:
"model": "/config/esphome/computer.tflite",

It took much longer to compile this time and ran into errors I hadn’t seen before. I think it may have actually been compiling the tflite finally, but maybe now I’m learning the tflite is bad?

I don’t know. I’m about to give up and try to force my family to learn ‘ok nabu’. That won’t go well. We’ve been using ‘computer’ for Alexa for years.

New errors (sample):

In file included from components/esp-tflite-micro/tensorflow/lite/micro/kernels/mul_common.cc:19:
components/esp-tflite-micro/tensorflow/lite/kernels/internal/reference/mul.h: In lambda function:
components/esp-tflite-micro/tensorflow/lite/kernels/internal/reference/mul.h:150:34: warning: declaration of 'const tflite::ArithmeticParams& params' shadows a parameter [-Wshadow]
  150 |       [](const ArithmeticParams& params, const uint8_t input1_val,
      |          ~~~~~~~~~~~~~~~~~~~~~~~~^~~~~~
components/esp-tflite-micro/tensorflow/lite/kernels/internal/reference/mul.h:126:56: note: shadowed declaration is here
  126 | inline void BroadcastMul6DSlow(const ArithmeticParams& params,
      |                                ~~~~~~~~~~~~~~~~~~~~~~~~^~~~~~
components/esp-tflite-micro/tensorflow/lite/kernels/internal/reference/mul.h: In lambda function:
components/esp-tflite-micro/tensorflow/lite/kernels/internal/reference/mul.h:209:34: warning: declaration of 'const tflite::ArithmeticParams& params' shadows a parameter [-Wshadow]
  209 |       [](const ArithmeticParams& params, const T input1_val,
      |          ~~~~~~~~~~~~~~~~~~~~~~~~^~~~~~
components/esp-tflite-micro/tensorflow/lite/kernels/internal/reference/mul.h:171:44: note: shadowed declaration is here
  171 | BroadcastMul6DSlow(const ArithmeticParams& params,
      |                    ~~~~~~~~~~~~~~~~~~~~~~~~^~~~~~
components/esp-tflite-micro/tensorflow/lite/kernels/internal/reference/mul.h: In lambda function:
components/esp-tflite-micro/tensorflow/lite/kernels/internal/reference/mul.h:249:34: warning: declaration of 'const tflite::ArithmeticParams& params' shadows a parameter [-Wshadow]
  249 |       [](const ArithmeticParams& params, const std::complex<float> input1_val,
      |          ~~~~~~~~~~~~~~~~~~~~~~~~^~~~~~
components/esp-tflite-micro/tensorflow/lite/kernels/internal/reference/mul.h:221:56: note: shadowed declaration is here
  221 | inline void BroadcastMul6DSlow(const ArithmeticParams& params,
      |                                ~~~~~~~~~~~~~~~~~~~~~~~~^~~~~~

Did it hit actual fatal errors? These all look like warnings.

Can you post your config? No one should have to use Alexa.

This is the last group of log entries the VPE outputs after flashing. Errors are highlighted…

[16:54:44][D][micro_wake_word:261]: Inference task has started, attempting to allocate memory for buffers
[16:54:44][D][micro_wake_word:266]: Inference task is running
[16:54:44][D][micro_wake_word:378]: State changed from STARTING to DETECTING_WAKE_WORD

**[16:54:44]Failed to resize buffer. Requested: 10512, available 7956, missing: 2556**
**[16:54:44][E][micro_wake_word:061][mww]: Failed to allocate tensors for the streaming model**
**[16:54:44][E][micro_wake_word:251]: Encountered an error while performing an inference**

[16:54:44][D][micro_wake_word:273]: Inference task is stopping, deallocating buffers
[16:54:44][D][micro_wake_word:278]: Inference task is finished, freeing task resources
[16:54:44][D][micro_wake_word:378]: State changed from DETECTING_WAKE_WORD to STOPPED
[16:54:54][D][power_supply:050]: Disabling power supply.
[16:55:32][I][safe_mode:042]: Boot seems successful; resetting boot loop counter
[16:55:32][D][esp32.preferences:142]: Writing 1 items: 0 cached, 1 written, 0 failed
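As a side note, the three numbers in the highlighted buffer error are internally consistent: the firmware is just reporting the shortfall between what the streaming model requested and what was actually free. Plain arithmetic, no ESPHome internals assumed:

```python
# Figures copied from the "Failed to resize buffer" log line above.
requested = 10512  # bytes the streaming model asked for
available = 7956   # bytes actually free
missing = requested - available
print(missing)  # -> 2556, matching the "missing" figure in the log
```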

This is the latest build config:

substitutions:
  name: home-assistant-voice-09a836
  friendly_name: Home Assistant Voice 09a836
  # micro_wake_word_model: computer
packages:
  Nabu Casa.Home Assistant Voice PE: github://esphome/home-assistant-voice-pe/home-assistant-voice.yaml
esphome:
  name: ${name}
  name_add_mac_suffix: false
  friendly_name: ${friendly_name}
api:
  encryption:
    key: ***
wifi:
  ssid: !secret wifi_ssid
  password: !secret wifi_password
micro_wake_word:
  id: mww
  models:
    - model: /config/esphome/computer.json
      id: computer
      # probability_cutoff: 0.9
      # sliding_window_size: 5
    # - model: okay_nabu
    #   id: okay_nabu_model
# voice_assistant:
#   noise_suppression_level: 3
#   auto_gain: 20dBFS
#   volume_multiplier: 3

This is the json:

{
  "type": "micro",
  "wake_word": "computer",
  "author": "Amac16",
  "website": "https://github.com/amac16/Custom_V2_MicroWakeWords",
  "model": "computer.tflite",
  "trained_languages": ["en"],
  "version": 2,
  "micro": {
    "probability_cutoff": 0.66,
    "sliding_window_size": 10,
    "feature_step_size": 10,
    "tensor_arena_size": 22860,
    "minimum_esphome_version": "2024.7.0"
  }
}
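For anyone debugging a manifest like this, the basic consistency checks can be scripted. This is only an illustrative sketch: the key names are taken from the JSON above, and `check_manifest` is a hypothetical helper, not part of any official microWakeWord or ESPHome tooling.

```python
import json
from pathlib import Path

# Keys taken from the manifest posted above (assumed required for this sketch).
REQUIRED_TOP = {"type", "wake_word", "model", "version", "micro"}
REQUIRED_MICRO = {"probability_cutoff", "sliding_window_size", "tensor_arena_size"}

def check_manifest(path: str) -> list[str]:
    """Return a list of problems found in a microWakeWord JSON manifest."""
    problems = []
    manifest = json.loads(Path(path).read_text())
    for key in REQUIRED_TOP - manifest.keys():
        problems.append(f"missing top-level key: {key}")
    for key in REQUIRED_MICRO - manifest.get("micro", {}).keys():
        problems.append(f"missing micro key: {key}")
    # The "model" entry should point at a tflite file that actually exists,
    # resolved relative to the manifest unless it is an absolute path.
    model = manifest.get("model", "")
    model_path = Path(model) if model.startswith("/") else Path(path).parent / model
    if not model_path.is_file():
        problems.append(f"model file not found: {model_path}")
    return problems
```

Running it against a manifest whose `model` path doesn't resolve would flag exactly the kind of renamed-file mismatch discussed earlier in the thread.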

Which version of ESPHome Device Builder are you compiling with?

This was on 2025.6.3. I just upgraded to 2025.7.1 and re-compiled. Same error:

[05:54:04][D][ring_buffer:034][mww]: Created ring buffer with size 3840
[05:54:04]Failed to resize buffer. Requested: 10512, available 7956, missing: 2556
[05:54:04][E][micro_wake_word:251]: Encountered an error while performing an inference
[05:54:04][D][micro_wake_word:273]: Inference task is stopping, deallocating buffers
[05:54:04][D][micro_wake_word:278]: Inference task is finished, freeing task resources
[05:54:04][D][micro_wake_word:378]: State changed from DETECTING_WAKE_WORD to STOPPED
[05:54:04][E][micro_wake_word:061][mww]: Failed to allocate tensors for the streaming model
[05:54:14][D][power_supply:050]: Disabling power supply.

Try reducing the window size. I’ve used 7 and 8 without problems.
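For anyone following along: that value lives under the `micro` block of the model's JSON manifest. With the other values copied from the manifest posted above, a window of 7 would look like this (just a sketch of the one-line change, nothing else altered):

```json
"micro": {
  "probability_cutoff": 0.66,
  "sliding_window_size": 7,
  "feature_step_size": 10,
  "tensor_arena_size": 22860,
  "minimum_esphome_version": "2024.7.0"
}
```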

No luck. I tried 8, then I tried 7 in the json. I also did a build-file cleanup and tried again. Same error.

I’m surprised; I thought that was going to work.

So you have a memory allocation issue, and if it’s not the ESPHome config, then it is likely the tflite file itself. I am no expert at this stuff, but my next thought is that the .tflite file is too large. I think the file size needs to be in the ballpark of 100 KB; mine is 60 KB. Did you use the basic training notebook? Did you change any of the settings from the defaults? If so, which ones, and by how much?
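A quick way to compare a model against that ballpark (note the ~100 KB threshold is just the rough figure from this thread, not an official limit, and the path below is hypothetical):

```python
from pathlib import Path

def tflite_size_kb(path: str) -> float:
    """Return the size of a .tflite file in kilobytes."""
    return Path(path).stat().st_size / 1024

# Example with a hypothetical path:
# print(tflite_size_kb("/config/esphome/computer.tflite"))
```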

I did use the basic notebook without modifications. I want the word to be computer and that’s the default under basic.

I’m fresh out of ideas at the moment. If your file size is similar (~60-100 KB) then it should work. My only other thought would be to re-run the training and see if what comes out works; perhaps somewhere in the process it was corrupted.

I’ve trained three models (trying to refine the phonetic spelling of the wake word) and each time it’s worked. For my last model I 4x’d the samples and various settings and it worked. I’ve also re-compiled and flashed over a dozen times refining the sensitivity parameters, still with no errors.

So if the VPE works with Nabu and Jarvis, but not the custom model, and bringing down the window size in your json didn’t work, then I can only assume there is something wrong with the .tflite and you should train a new one.


How long did that take for you, and what does your system look like in terms of CPU and RAM? Curious about “no GPU” setups (as I don’t have one :slight_smile: )

That sounds reasonable. I’ll work with that and create a few different tflites if I can. It’s great knowing someone else does have it working. I greatly appreciate all of the help!

I had not checked. I run an AMD Ryzen 5 7600X and 32 GB of RAM. I haven’t timed it; I would guess somewhere between 1-2 hours? Took a nap and let it run. Nothing to think about. Just give it a try; what can you lose?
The new word works pretty well so far.


I finally got it working! Once we ruled out the ESP build and config, you were correct to look at the tflite again. I figured I’d try a few more runs, but they ran slowly and I didn’t want to attempt too many iterations unless I could speed them up. I have a mid-level RTX 4060, which should be pretty peppy, so I decided to verify the container was fully utilizing the GPU. When I ran it I didn’t see a spike in GPU activity, so I investigated.

In short, I think Docker was using the crappy integrated on-board graphics, and something about that chip couldn’t generate a valid tflite. I fixed the problem by upgrading the NVIDIA drivers, updating my WSL version, and running the container from the Docker Desktop CLI with the --gpus all flag added. I followed these instructions: GPU support | Docker Docs

To enable WSL 2 GPU Paravirtualization, you need:

  • A Windows machine with an NVIDIA GPU
  • Up to date Windows 10 or Windows 11 installation
  • Up to date drivers from NVIDIA supporting WSL 2 GPU Paravirtualization
  • The latest version of the WSL 2 Linux kernel. Use wsl --update on the command line
  • To make sure the WSL 2 backend is turned on in Docker Desktop

Validate GPU support

To confirm GPU access is working inside Docker, run the following:

 docker run --rm -it --gpus=all nvcr.io/nvidia/k8s/cuda-sample:nbody nbody -gpu -benchmark

Then my run command from the Docker Desktop terminal window was:

docker run --gpus all -it -p 8888:8888 <image tag>

Finally I ran the complete notebook, moved over the tflite and json, re-flashed the VPE, and voilà. Thanks again for the help!


So I trained my own micro wake words and I wanted to share them with you all. Add this yaml to your config and it should give you the wake words:
“GLaDOS”
“Hey GLaDOS”
“GLaDOS Please”
This is a reference to the GLaDOS robot character in Portal. I combined it with a trained GLaDOS assistant voice and an LLM with a GLaDOS prefix in the prompt, like the following: GLaDOS now controls my smart home thanks to Home Assistant's Voice Preview, and anyone can do it for free. I’m super happy with the work I’ve done, and “hey glados” is detected really well.

micro_wake_word:
  id: mww
  microphone:
    microphone: i2s_mics
    channels: 1
    gain_factor: 4
  stop_after_detection: false
  models:
    - model: https://github.com/kahrendt/microWakeWord/releases/download/stop/stop.json
      id: stop
      internal: true
    - model: https://raw.githubusercontent.com/Darkmadda/ha-v-pe/refs/heads/main/glados.json
      id: glados
      internal: true
    - model: https://raw.githubusercontent.com/Darkmadda/ha-v-pe/refs/heads/main/glados_please.json
      id: glados_please
      internal: true
    - model: https://raw.githubusercontent.com/Darkmadda/ha-v-pe/refs/heads/main/hey_glados.json
      id: hey_glados
      internal: true


Thank you very much for this one. :slight_smile:
Seems to be very reliably detected for me.

I don’t really like the idea of using this wake word, now that I’m finally switching away from the Amazon Echos. :smile:
But my family is already “trained” on that wake word, and if it works better I guess I have to ignore my feelings.

Are there users here who can share their experience with the alexa wake word regarding false positives?
Currently I have tested it only at my office desk, but I need it to also perform nicely in the living room with music or the TV turned on, without being activated all the time for no reason…


I am facing a similar issue with my trained Hey Jarvis models (the default wake word does not work with my accent, especially when I am >7 feet away from the mic).

I am using this notebook for training, which is a slightly modified version of the original training notebook by kahrendt. I had to modify a few parts (not the training params) to get it working; I have also added a cell to test and visualize inferences. Unfortunately, as soon as I select my wake word post-flash, I see the error:

[23:31:02][E][micro_wake_word:251]: Encountered an error while performing an inference
[23:31:02][D][micro_wake_word:273]: Inference task is stopping, deallocating buffers
[23:31:02][D][micro_wake_word:278]: Inference task is finished, freeing task resources
[23:31:02][D][micro_wake_word:378]: State changed from DETECTING_WAKE_WORD to STOPPED
[23:31:02][E][micro_wake_word:061][mww]: Failed to allocate tensors for the streaming model

And inference stops altogether, until I select another wakeword and reboot voice pe.

It is interesting that @Amac16’s model output was fixed after getting the GPU involved; I run Linux and do not have any CUDA issues.

Does anyone have any ideas?

Thanks!

update: fixed this issue!
The problem was that my trained .tflite was bigger (more tensors) than the default models, so the arena value (22860) was too small.

Since on an ESP32-S3 with PSRAM we can allocate much more (hundreds of KBs), I updated the tensor_arena_size to 100000, and it works well.
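In a manifest like the one posted earlier in the thread, that change is a single value under the `micro` block (only `tensor_arena_size` changed; the other values are copied from the posted JSON):

```json
"micro": {
  "probability_cutoff": 0.66,
  "sliding_window_size": 10,
  "feature_step_size": 10,
  "tensor_arena_size": 100000,
  "minimum_esphome_version": "2024.7.0"
}
```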
