Home Assistant Voice PE device together with openWakeWord?!

Rainer_HN · February 6, 2025, 8:13am

Hello community,

I bought this new Home Assistant Voice Preview Edition hardware and want to use it with a personal self generated WakeWord (openWakeWord).

I am running an up to date HassOS on a Raspi5 (installed via official HassOS image).

I was able to do the following things successfully:
1.) Installation of the Home Assistant Voice was successful. It is working with the standard WakeWords like “ok nabu” or “hey jarvis”.
2.) Generation of my personal WakeWord seems to have worked successfully. I have a self-generated xyz.tflite file and have put it in share/openwakeword/
3.) Installation of openWakeWord seems to have worked successfully.
4.) I created a new Voice Assistant, selected openWakeWord there and also selected the self generated personal WakeWord there.

And now my problem:
Selecting the self-generated personal WakeWord within the Voice Assistant has no effect. The device always uses the standard WakeWord that is configured directly in the Home Assistant Voice device. In this device I can select the Voice Assistant to be used, but it still does not take the WakeWord from the Voice Assistant. It is also not possible to select the self-generated personal WakeWord directly in the Home Assistant Voice device. There I can only select “hey jarvis”, “ok nabu” and “hey mycroft”.

Does anybody have an idea how this can be solved? Is it somehow possible to make the Home Assistant Voice device take the WakeWord from the Voice Assistant?

wmaker · February 6, 2025, 9:40pm

I’m not an expert on this, but to use openWakeWord, the satellite needs to support “streaming wake word”. However, looking at the ESPHome Voice Assistant component, there is this config parameter:

use_wake_word (Optional, boolean): Enable wake word on the assist pipeline. Defaults to false.

and looking at the VPE’s YAML here

use_wake_word: false

so it appears that the VPE is set up to use only the on-board micro-wake-word, and thus has no streaming support.
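
For illustration only, here is a minimal sketch of what turning that option on looks like in an ESPHome config. It assumes a microphone with the hypothetical ID `mic_id` is defined elsewhere, and it is not the VPE's actual firmware config. The stock VPE YAML also wires up the on-device micro wake word, the LEDs and the media player, so this change alone is probably not enough.

```yaml
# Sketch only: not the VPE's real firmware config.
voice_assistant:
  # Hypothetical ID; must match a microphone defined elsewhere in the config.
  microphone: mic_id
  # Stream audio to Home Assistant so that the wake word engine selected in
  # the Assist pipeline (e.g. openWakeWord) does the detection.
  use_wake_word: true
  on_client_connected:
    # Begin listening continuously once Home Assistant has connected.
    - voice_assistant.start_continuous:
```

With `use_wake_word: true` the device has to keep streaming audio to Home Assistant, which is the trade-off compared with on-device detection.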

Searching around a bit, I found this yaml from JLo (HA’s product manager) that seems to make streaming possible.

Others have tried creating their own micro wake words and loading them on the VPE itself, but there is a lot of discussion as to how well this will even perform.

jermn007 · February 19, 2025, 2:43am

Tommy - thanks for the information and suggestion. Do you have any idea how we would go about updating the yaml on the Voice PE device itself? Would this require recompiling the firmware?

wmaker · February 19, 2025, 6:58pm

I have not yet come across a solution from anyone for getting the VPE to use openWakeWord.

RobMeades · February 20, 2025, 1:59pm

I have been able to take the yaml from JLo above and apply it to the most recent version of the home-assistant-voice-pe repo on a fork which can be found here.

If you have the ESP Home Builder add-on installed, you have the option of taking control of your Voice Assistant device with that and loading my fork onto it instead. It is working fine with openWakeWord here; not tested for hours yet, but looking good so far.
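
As a rough sketch of what that looks like after taking control: the device's YAML in ESPHome Builder normally pulls the firmware config in through a `packages:` entry, and that entry is what you would point at a fork. Everything in angle brackets, the `xxxxxx` suffix and the package key `voice_pe_fork` are placeholders, and the file name is assumed to match the upstream repo's `home-assistant-voice.yaml`. Keep whatever `api:` and `wifi:` sections the Builder generated for you.

```yaml
substitutions:
  name: home-assistant-voice-xxxxxx            # placeholder device name
  friendly_name: Home Assistant Voice xxxxxx   # placeholder friendly name

packages:
  # <fork-owner> and <branch> are placeholders for the fork's real location.
  voice_pe_fork: github://<fork-owner>/home-assistant-voice-pe/home-assistant-voice.yaml@<branch>

esphome:
  name: ${name}
  name_add_mac_suffix: false
  friendly_name: ${friendly_name}
```

After saving, an Install from the Builder compiles the fork's config and flashes it to the device.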

malballuk · February 28, 2025, 2:27pm

Looks interesting to me.
I have the ESP Home Builder and have taken control of the VPE. I see where to put the link to your repo, but what I don’t understand is how to use my own wakeword. I already have the file, but where does it go?

sobchek · March 5, 2025, 3:46am

Just flashed and it is working. Wakeword is triggering and the commands are executing. Thank you for this. The only caveat is that after the wakeword has been triggered, if the voice command isn’t started within about a second or so, it fails.

Is there any setting in the config to add a delay, or preferably have a longer duration before the assist times out?

jackal7 · March 5, 2025, 1:36pm

Hi Rob,
To keep this short, I was trying to follow what you did and seem to have bricked my Voice PE. I already posted this in the Hardware section and someone there said I should try posting here as well.

I was able to get to “take-over the device” within ESPHome Builder. At the end of the flashing, it showed an IP but it couldn’t connect to the VPE. I retried several times but no luck. Now it shows the VPE in Home Builder in red. When I try to install now, it asks how I want to connect. When I choose OTA it ends with a “couldn’t find device” message.

It now seems to be bricked. The light ring (white) stays on all the time and it doesn’t connect to the wifi anymore. Power cycling has no effect. At this point I figured I should stop and ask for help. This is the first time I’ve used Home Builder, so I really hope I didn’t screw something up and permanently break my brand new Voice Assist.

No idea what I should do next. Any help would be much appreciated.

sobchek · March 5, 2025, 2:03pm

Do you have a voice pipeline set up? OpenWakeWord in particular? It could just be that it can’t initiate without it.

NathanCu · March 5, 2025, 2:13pm

Sorry to hear.

If you can’t get anything you may need to reinstall the base firmware…

Here you go.

sobchek · March 5, 2025, 2:50pm

[09:45:11][D][voice_assistant:641]: Event Type: 10
[09:45:11][D][voice_assistant:650]: Wake word detected
[09:45:11][D][voice_assistant:641]: Event Type: 3
[09:45:11][D][voice_assistant:655]: STT started
[09:45:11][D][media_player:073]: 'Home Assistant Voice 091f89' - Setting
[09:45:11][D][media_player:077]:   Command: STOP
[09:45:11][D][media_player:086]:  Announcement: yes
[09:45:11][D][light:036]: 'voice_assistant_leds' Setting:
[09:45:11][D][light:047]:   State: ON
[09:45:11][D][light:051]:   Brightness: 66%
[09:45:11][D][light:109]:   Effect: 'Waiting for Command'
[09:45:11][D][ring_buffer:034][ann_read]: Created ring buffer with size 1000000
[09:45:11][D][speaker_media_player:420]: State changed to ANNOUNCING
[09:45:11][D][speaker_media_player.pipeline:114]: Reading FLAC file type
[09:45:11][D][speaker_media_player.pipeline:124]: Decoded audio has 1 channels, 48000 Hz sample rate, and 16 bits per sample
[09:45:11][D][ring_buffer:034]: Created ring buffer with size 9600
[09:45:11][D][speaker_mixer:306]: Starting speaker mixer
[09:45:11][D][speaker_mixer:314]: Started speaker mixer
[09:45:11][D][voice_assistant:641]: Event Type: 11
[09:45:11][D][voice_assistant:804]: Starting STT by VAD
[09:45:11][D][light:036]: 'voice_assistant_leds' Setting:
[09:45:11][D][light:051]:   Brightness: 66%
[09:45:11][D][light:109]:   Effect: 'Listening For Command'
[09:45:12][D][speaker_media_player:420]: State changed to IDLE
[09:45:12][D][speaker_mixer:319]: Stopping speaker mixer
[09:45:12][D][voice_assistant:641]: Event Type: 12
[09:45:12][D][voice_assistant:808]: STT by VAD end
[09:45:12][D][voice_assistant:515]: State changed from STREAMING_MICROPHONE to STOP_MICROPHONE
[09:45:12][D][voice_assistant:522]: Desired state set to AWAITING_RESPONSE
[09:45:12][D][voice_assistant:515]: State changed from STOP_MICROPHONE to STOPPING_MICROPHONE
[09:45:12][D][light:036]: 'voice_assistant_leds' Setting:
[09:45:12][D][light:051]:   Brightness: 66%
[09:45:12][D][light:109]:   Effect: 'Thinking'
[09:45:12][D][voice_assistant:515]: State changed from STOPPING_MICROPHONE to AWAITING_RESPONSE
[09:45:12][D][voice_assistant:515]: State changed from AWAITING_RESPONSE to AWAITING_RESPONSE
[09:45:12][D][power_supply:033]: Enabling power supply.
[09:45:13][D][power_supply:033]: Enabling power supply.
[09:45:13][D][power_supply:033]: Enabling power supply.
[09:45:13][D][power_supply:033]: Enabling power supply.
[09:45:13][D][voice_assistant:641]: Event Type: 0
[09:45:13][E][voice_assistant:776]: Error: stt-no-text-recognized - No text recognized
[09:45:13][D][voice_assistant:634]: Signaling stop...
[09:45:13][D][voice_assistant:515]: State changed from AWAITING_RESPONSE to STOP_MICROPHONE
[09:45:13][D][voice_assistant:522]: Desired state set to IDLE
[09:45:13][D][voice_assistant:515]: State changed from STOP_MICROPHONE to IDLE
[09:45:13][D][light:036]: 'voice_assistant_leds' Setting:
[09:45:13][D][light:051]:   Brightness: 76%
[09:45:13][D][light:058]:   Red: 100%, Green: 0%, Blue: 0%
[09:45:13][D][light:109]:   Effect: 'Error'
[09:45:13][D][voice_assistant:641]: Event Type: 2
[09:45:13][D][voice_assistant:733]: Assist Pipeline ended
[09:45:13][D][voice_assistant:515]: State changed from IDLE to START_MICROPHONE
[09:45:13][D][voice_assistant:522]: Desired state set to START_PIPELINE
[09:45:13][D][voice_assistant:225]: Starting Microphone
[09:45:13][D][voice_assistant:515]: State changed from START_MICROPHONE to STARTING_MICROPHONE
[09:45:13][D][voice_assistant:515]: State changed from STARTING_MICROPHONE to START_PIPELINE
[09:45:13][D][voice_assistant:280]: Requesting start...
[09:45:13][D][voice_assistant:515]: State changed from START_PIPELINE to STARTING_PIPELINE
[09:45:13][D][voice_assistant:537]: Client started, streaming microphone
[09:45:13][D][voice_assistant:515]: State changed from STARTING_PIPELINE to STREAMING_MICROPHONE
[09:45:13][D][voice_assistant:522]: Desired state set to STREAMING_MICROPHONE
[09:45:13][D][voice_assistant:641]: Event Type: 1
[09:45:13][D][voice_assistant:644]: Assist Pipeline running
[09:45:13][D][voice_assistant:641]: Event Type: 9
[09:45:14][D][power_supply:033]: Enabling power supply.
[09:45:14][D][power_supply:033]: Enabling power supply.
[09:45:14][D][power_supply:033]: Enabling power supply.
[09:45:14][D][light:036]: 'voice_assistant_leds' Setting:
[09:45:14][D][light:047]:   State: OFF
[09:45:14][D][light:109]:   Effect: 'None'
[09:45:14][D][light:036]: 'LED Ring' Setting:
[09:46:02][D][esp32.preferences:114]: Saving 1 preferences to flash...
[09:46:02][D][esp32.preferences:142]: Saving 1 preferences to flash: 1 cached, 0 written, 0 failed

It seems as though it is stopping STT before I can even get out a command.

So it works if I basically say “Wake word, perform command” all as one sentence. But even the slightest pause causes it to fail: as you can see, there is about 1 second between “Wake word detected” and “STT by VAD end”.

jackal7 · March 5, 2025, 4:17pm

Hi,
Thanks for the reply. Unfortunately that did not work. It doesn’t see any devices. I wasn’t sure which one I needed, so installed all three of the drivers mentioned on the popup and retried but it still doesn’t see the device.
Edit: This was because the USB cable was not a data cable.

I tried doing a factory reset on the VPE, but no matter how long I hold the button, nothing happens. I stepped away for a while and when I came back it was “twinkling” red. So I’m looking into that.

NathanCu · March 5, 2025, 4:43pm

Twinkle red is “I can’t see HA.” That’s a good sign: it wouldn’t do that if it wasn’t running code.

jackal7 · March 5, 2025, 4:48pm

Did some digging and found this page on the status lights.
https://voice-pe.home-assistant.io/documentation/status_colors/

Based on this info I think I must have a typo in my wifi credentials. So it’s not bricked, it just can’t connect to the wifi because it has the wrong password.

So now the question is how do I clear the settings…

Thanks for helping me work through this.

jackal7 · March 5, 2025, 5:27pm

Fixed it!

The source of the problem was the wifi creds. Changing my wifi to match the typo let the VPE connect. I then fixed the wifi creds in secrets and updated the VPE. Changed my wifi network back and no more problems.
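
For anyone hitting the same thing, this is roughly how the pieces fit together in ESPHome Builder, as a sketch with placeholder values. The secret names `wifi_ssid` and `wifi_password` are the usual convention and are assumed here; your device YAML may reference different names.

```yaml
# --- secrets.yaml (edited via the Secrets button in ESPHome Builder) ---
# Placeholder values; a typo here gets compiled into the firmware.
wifi_ssid: "MyNetwork"
wifi_password: "my-wifi-password"

# --- device YAML: refers to the secrets by name ---
wifi:
  ssid: !secret wifi_ssid
  password: !secret wifi_password
```

The credentials are baked in at compile time, so a corrected secret only takes effect once the device has been re-flashed, which is why it first had to get onto the network with the old (wrong) values.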

Thanks again for everyone’s help.

bvavra1 · March 6, 2025, 11:51am

I’m also running into this same timing issue after applying this mod, and altering the “Finished speaking detection” in the HA device config seems to have no effect.

~~@RobMeades is this something you might be able to fix in your forked repo?~~

EDIT: As sobchek mentions below, setting “Finished speaking detection” to Relaxed does work - I may have just needed to reload the device config after the change because it didn’t initially work for me.

sobchek · March 6, 2025, 2:34pm

If you go to the device settings in the ESPHome integration, and change ‘finished speaking detection’ to ‘relaxed’ it largely resolves the issue.
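
The same setting can also be changed from a script or automation, since it is exposed as a select entity. A minimal sketch, assuming a hypothetical entity ID (check the real one on the device page) and assuming the option value behind “Relaxed” is the lowercase string `relaxed`:

```yaml
# Home Assistant action (script/automation), not ESPHome firmware config.
action: select.select_option
target:
  # Hypothetical entity ID; look up the actual one for your device.
  entity_id: select.home_assistant_voice_xxxxxx_finished_speaking_detection
data:
  # Assumed option value for the "Relaxed" choice shown in the UI.
  option: relaxed
```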

mhukill5 · April 9, 2025, 5:58pm

This is excellent, thank you! I’m planning to try this when I get home from work later. If I make this change, will it cause any issues with getting the updates to the HAVPE device released by the HA dev team?

williamjameshandley · May 7, 2025, 6:38am

Hi there,

Very new to the home assistant community.

At the moment I’m running into difficulty when installing from your fork:

Compiling .pioenvs/home-assistant-voice-09cba7/src/esphome/components/nabu_microphone/nabu_microphone.cpp.o
In file included from src/esphome/components/i2s_audio/i2s_audio.h:5,
                 from src/esphome/components/nabu_microphone/nabu_microphone.h:8,
                 from src/esphome/components/nabu_microphone/nabu_microphone.cpp:1:
/data/cache/platformio/packages/framework-espidf/components/driver/deprecated/driver/i2s.h:27:2: warning: #warning "This set of I2S APIs has been deprecated, please include 'driver/i2s_std.h', 'driver/i2s_pdm.h' or 'driver/i2s_tdm.h' instead. if you want to keep using the old APIs and ignore this warning, you can enable 'Suppress leagcy driver deprecated warning' option under 'I2S Configuration' menu in Kconfig" [-Wcpp]
   27 | #warning "This set of I2S APIs has been deprecated, \
      |  ^~~~~~~
src/esphome/components/mixer/speaker/mixer_speaker.cpp: In static member function 'static void esphome::mixer_speaker::MixerSpeaker::audio_mixer_task(void*)':
src/esphome/components/mixer/speaker/mixer_speaker.cpp:498:50: error: no matching function for call to 'esphome::audio::AudioSinkTransferBuffer::transfer_data_to_sink(TickType_t, bool)'
  498 |     output_transfer_buffer->transfer_data_to_sink(pdMS_TO_TICKS(TASK_DELAY_MS), false);
      |     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
In file included from src/esphome/components/mixer/speaker/mixer_speaker.h:6,
                 from src/esphome/components/mixer/speaker/mixer_speaker.cpp:1:
src/esphome/components/audio/audio_transfer_buffer.h:93:10: note: candidate: 'size_t esphome::audio::AudioSinkTransferBuffer::transfer_data_to_sink(TickType_t)'
   93 |   size_t transfer_data_to_sink(TickType_t ticks_to_wait);
      |          ^~~~~~~~~~~~~~~~~~~~~
src/esphome/components/audio/audio_transfer_buffer.h:93:10: note:   candidate expects 1 argument, 2 provided
*** [.pioenvs/home-assistant-voice-09cba7/src/esphome/components/mixer/speaker/mixer_speaker.cpp.o] Error 1

Does the fork need updating since Mar 6 (May 7th today), or am I missing an installed component?

Pops1 · May 9, 2025, 6:56pm

I got the same errors about a week ago.

This is the first time I have had time to address it.