Voice Chapter 7 - Supercharged wake words and timers

Thank you, I didn’t know that stats page :slight_smile:

Looking just briefly at it, it seems that Europe is the biggest market (I would compare Europe vs. the USA rather than a single European country).

But yes, the US has a big share, and maybe the bloggers are American? Or it is simply because electronics are often listed in USD (also in Asia).

@jenova70 I have the same problem as @tykeal.
Are timers supported in the home-built version of the voice assistant using this YAML: firmware/voice-assistant/esp32-s3-box-3.yaml at a79c9fa96fe0fdd24c17e7571de3ef37755d54a1 · esphome/firmware · GitHub?

Or is it only in the prebuilt firmware (potentially waiting for some update in the next release)?

With 2024.6.6 there is a new image for when the alarm is sounding; does anyone know the substitute name for this in the YAML config? (esp-32-box3) Thanks :slight_smile:

Does anyone know how to get the new micro wake word onto the M5Stack Atom?


Not in this firmware, indeed. I am curious why this seemingly random commit?
Why not simply the main branch here?


If you’re talking about the commit nine hours ago (when this was posted), there is a link to the fix and the changes made in “Disable openWakeWord when using microWakeWord”. That seems odd at first, but you wouldn’t want openWakeWord running if on-device detection failed or for some other reason. At a quick glance, it looks like they removed some of the use_wake_word references from the voice pipeline, set it to false in another place, and made a few other minor changes; nothing major.

I have the opposite issue. I created some very basic timers using an automation and a timer helper. That gives you something you can display, but it doesn’t appear that anything equivalent is created when doing it through ESPHome. I just wanted something where I could set a timer, cancel it, and get a voice announcement and a text when the timer completes, but without the timer helper entity I have zero idea how ESPHome does this.
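For reference, a minimal sketch of the timer-helper approach described above, as a Home Assistant automation (the entity IDs `timer.kitchen`, `tts.piper`, `media_player.kitchen_speaker`, and `notify.mobile_app_phone` are placeholders, not from any post here):

```yaml
# Hypothetical automation: announce and notify when a timer helper finishes.
# All entity IDs below are illustrative and need to be replaced with your own.
automation:
  - alias: "Announce kitchen timer"
    trigger:
      - platform: event
        event_type: timer.finished
        event_data:
          entity_id: timer.kitchen        # the timer helper entity
    action:
      - service: tts.speak
        target:
          entity_id: tts.piper            # your TTS entity
        data:
          media_player_entity_id: media_player.kitchen_speaker
          message: "Your kitchen timer is finished."
      - service: notify.mobile_app_phone  # text notification to a phone
        data:
          message: "Your kitchen timer is finished."
```

Cancelling is then just `timer.cancel` on the same helper entity from another automation or a voice intent.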

As far as a media player goes, it’s probably going to be a while. I got the Espressif Korvo-1 working, and every time it plays audio there is a loud, annoying popping sound when outputting from the 3.5mm jack. This doesn’t seem to be isolated to one device: I was looking at the M5StackCore3, and in the HA code from M5Stack on GitHub, which uses openWakeWord, the number one issue was a popping noise before audio playback. I ended up just sending all audio from voice commands to another speaker that can do announcements.

I have an Onju Voice (an ESP board replacement for a Google Nest Mini) that I just updated, and it’s telling me “timers are not supported on this device”. This is in the logs:

Intent handling error
Traceback (most recent call last):
  File "/usr/src/homeassistant/homeassistant/components/conversation/default_agent.py", line 347, in async_process
    intent_response = await intent.async_handle(
  File "/usr/src/homeassistant/homeassistant/helpers/intent.py", line 140, in async_handle
    result = await handler.async_handle(intent)
  File "/usr/src/homeassistant/homeassistant/components/intent/timers.py", line 836, in async_handle
    raise TimersNotSupportedError(intent_obj.device_id)
homeassistant.components.intent.timers.TimersNotSupportedError: Device does not support timers: device_id=1aac71463ab0105af2266bc3d1166bb5

Are all ESPHome voice satellites supported, or just certain models using a specific fork? It sounds like it should be working according to the blog post.

I see there are new voice assistant sections in the ESPHome docs:


Are these necessary to implement?

Right now a Wyoming satellite is probably your best bet. I have seen videos where someone put one in their roof with a PoE adapter and a Pi Zero W (or whatever the latest model is) with a ReSpeaker HAT, and ran the 3.5mm output to their receiver. openWakeWord runs natively on ARM and has never been ported to ESP32 (or simply can’t be), so the HA server has to constantly listen for the wake word when using openWakeWord, which is resource intensive, especially on something like a Raspberry Pi. The Wyoming satellite can detect wake words itself, since you actually install Wyoming and the other prerequisites onto the Pi.
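As a rough sketch, running a Wyoming satellite with local wake word detection looks something like the below. The flags are from my reading of the wyoming-satellite README and may differ on your version; the name, URIs, audio commands, and the `ok_nabu` model are all illustrative.

```shell
# Run wyoming-satellite on a Pi, pointing it at a locally running
# wyoming-openwakeword service (assumed on port 10400).
script/run \
  --name 'my-satellite' \
  --uri 'tcp://0.0.0.0:10700' \
  --mic-command 'arecord -r 16000 -c 1 -f S16_LE -t raw' \
  --snd-command 'aplay -r 22050 -c 1 -f S16_LE -t raw' \
  --wake-uri 'tcp://127.0.0.1:10400' \
  --wake-word-name 'ok_nabu'
```

With `--wake-uri` set, audio is only streamed to the HA server after the wake word fires locally, instead of constantly.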

I am using an Espressif Korvo-1, which has an ESP32-S3 with PSRAM. PSRAM is needed to run microWakeWord, which can detect wake words on-device, so there’s no constant listening from HA. It has a 3.5mm output, and I get a popping noise every time it plays back voice responses. I ended up just sending the audio to another speaker. You could do this with a Chromecast and just have it output audio via its 3.5mm output. I don’t personally think DSP is high on the list, but I could be wrong about how many people would actually use it.

I’m still a bit confused about people saying microWakeWord works on the M5Stack Atom Echo. It doesn’t have any PSRAM and has to use openWakeWord or the button to trigger it. You can really see how many resources openWakeWord takes using the streaming method on a Raspberry Pi 4.

Regarding anything USB related, I think that should be handled by the Assist Microphone add-on, which works with any USB mic/speaker. Right now I’m using one of those round conference-room speakerphones and it works great. It does have to be plugged into your HA server, though.

I just don’t think the ESP32-S3 is going to be able to do all that, as it’s only able to detect the wake word thanks to having 2 to 8 MB of PSRAM to use as a “buffer”, listening for the wake word every 15 ms or so. Maybe when the ESP32-P4 comes out, but that’s pretty much going to need some type of development board, and it doesn’t have WiFi or BT; early development prototypes used a separate ESP32-C6 or C3 for that. It does have 55 GPIO pins, pins for Ethernet, and other stuff I’m forgetting, which could improve audio/microphone performance. It also has a dual-core 400 MHz CPU, compared to 240 MHz on every other version, some type of new RAM, plus up to 32 MB of PSRAM. I believe it was supposed to be out already but has been delayed. And the ESPHome team has to add support for the new chip, which takes time.

Use the below to output to a different device. Go to Services and choose the TTS service, and the available speakers will show up, like Sonos, Chromecast devices, soundbars, etc. Please note: you have to go to Settings > Devices & Services > ESPHome, click the configure option for your voice assistant, check “Allow the device to make Home Assistant service calls”, and update, or it won’t work. In fact, I just noticed I could move it under the on_tts_stream_start phase, since it’s set to replying, and remove on_tts_end completely. Will try that later.

    - lambda: id(voice_assistant_phase) = ${voice_assist_replying_phase_id};
    - light.turn_on:
        id: led_ring
        blue: 0%
        red: 0%
        green: 100%
        brightness: 100%        
        effect: working
    - script.execute: reset_led 
    - homeassistant.service:
        service: media_player.play_media
        data:
          entity_id: media_player.vlc_telnet
          media_content_id: !lambda 'return x;'
          media_content_type: music
          announce: "true"

Is it technically possible to also stop the beeping of a finished timer by voice command? On an S3 Box you have to click the button. On Alexa and Google it works by voice.

Muting media players on a zone basis would be great!

It could be used by those of us wanting to mute a room while Assist is listening.

Or for other kinds of automations that require muting a zone.


I have created a script that is triggered by the timer-finished-command in a Wyoming Satellite. I use this to send notifications to our phones when a timer is finished.
The script just contains a webhook:

curl -X POST -H "Content-Type: application/json" -d '{ "key": "value" }' <mywebhook>

Is there a way to pass the name of the timer in the data of the webhook?
So if I say “Start an 8 minute timer for Eggs”, I want to send a notification to my phone that says “The timer for Eggs is complete”.
Is this possible?

Devices connected via USB?

Hi guys, not sure if I am right here, I am confused and need answers.

We (our family) do a lot of things with our voices (to HA). We currently use Google Minis. We have a house. There are Google Minis in many rooms and they are connected via wifi. When I look at the video clips and these devices like Atom Echo and S3 Box, I see these devices connected with USB.

In a house or flat with several rooms, we usually do not walk over to where the server (and the device connected via USB) is just to talk to the house.

Am I on the wrong track?

Next, what about the German language (or any language other than English)? We always talk to our smart house in our natural language.

Thank you for shedding some light on this.

Cheers, gl

James, you sound pretty confused, so to summarise, there are 3 different wake word options…

  1. The first release included an openWakeWord implementation which runs on the HA server. The satellite device(s) send all their audio to openWakeWord on the HA server. This is not specific to ARM, since it was running on my x86 HA PC.

  2. A later release added an implementation of openWakeWord that runs locally alongside wyoming-satellite. I’m not sure if it’s generic Linux, but I currently run this on my RasPi satellite. You are correct that this cannot be ported to ESP32.

  3. microWakeWord is a different program which does run locally on ESP32. The big news in the first post in this thread is how much better version 2 of microWakeWord is than version 1 (which you are running).
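For option 3, a minimal sketch of what enabling on-device wake word might look like in an ESPHome config, assuming ESPHome 2024.7-era syntax (the model name and microphone id are illustrative, not taken from this thread):

```yaml
# Sketch: on-device wake word via microWakeWord (names are placeholders).
micro_wake_word:
  model: okay_nabu            # illustrative model name
  on_wake_word_detected:
    - voice_assistant.start:  # hand off to the voice assistant pipeline

voice_assistant:
  microphone: va_mic          # assumes a microphone with id: va_mic is defined
  use_wake_word: false        # the HA server no longer listens for the wake word
```

The key difference from options 1 and 2 is that `use_wake_word: false` on the `voice_assistant` stops streaming audio to the server until the on-device model triggers.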

USB for power, WiFi for connectivity.


IMO, build a Wyoming satellite from a Pi and a 2Mic HAT. You’ll be disappointed/frustrated with either the Atom Echo or the S3 Box, and especially disappointed in the Atom Echo.


Has microWakeWord been released yet?

It will be released with ESPHome 2024.7, which should be released tomorrow (2024-07-17).

I just flashed my Atom Echo with what I believe is the latest firmware available:

Firmware: 2024.7.0 (Jul 17 2024, 08:52:15)

However, I’m not able to see any way to enable wake word detection. The assistant config says: “It looks like you don’t have a wake word engine setup yet. Find out more about wake words.”

When I inspect the entities exposed by the Atom Echo, I don’t see anything seemingly related to a wake word engine (although there’s several wake-word/voice entities there):

  • M5Stack Atom Echo - light.m5stack_atom_echo_m5stack_atom_echo_
  • M5Stack Atom Echo Assist in progress - binary_sensor.m5stack_atom_echo_assist_in_progress
  • M5Stack Atom Echo Assist pipeline - select.m5stack_atom_echo_assist_pipeline
  • M5Stack Atom Echo Button - binary_sensor.m5stack_atom_echo_button
  • M5Stack Atom Echo Factory reset - button.m5stack_atom_echo_factory_reset
  • M5Stack Atom Echo Finished speaking detection - select.m5stack_atom_echo_finished_speaking_detection
  • M5Stack Atom Echo Firmware - update.m5stack_atom_echo_firmware
  • M5Stack Atom Echo Safe Mode Boot - button.m5stack_atom_echo_safe_mode_boot
  • M5Stack Atom Echo Use listen light - switch.m5stack_atom_echo_use_listen_light
  • M5Stack Atom Echo Use wake word - switch.m5stack_atom_echo_23da0c_use_wake_word

I do have the use_wake_word toggle on. Still, I haven’t been able to figure out how to actually enable this for my pipeline.