Voice Chapter 7 - Supercharged wake words and timers

It’s fine if you are using a Wyoming Satellite since it runs on ARM, and openWakeWord is actually installed on the satellite itself, so it listens for the wake word locally. Right now, since microWakeWord v2, I think the Wyoming Satellite and Assist Microphone (which does take up resources, since the HA server is constantly listening) are the only two setups still using openWakeWord. Well, you always have the option to manually force wake word detection on the server, but that seems counterproductive at this point.

No, microWakeWord runs locally on the ESP32, so HA Assist does not start listening until after the wake word has been detected by microWakeWord on the device itself.
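For reference, a minimal sketch of that flow in ESPHome YAML, assuming a recent ESPHome release (the micro_wake_word keys have changed between versions, and the microphone id below is hypothetical), so treat it as a starting point rather than a drop-in config:

# Minimal sketch: the wake word model runs on the ESP32, and only once it
# fires do we start the voice assistant pipeline towards HA.
micro_wake_word:
  microphone: mic_i2s            # hypothetical microphone component id
  models:
    - okay_nabu                  # bundled model; swap in your own trained model
  on_wake_word_detected:
    - voice_assistant.start:

voice_assistant:
  microphone: mic_i2s
  use_wake_word: false           # HA/openWakeWord never does the wake word detection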

I heard this is possible on the ESP32.

Is it possible to only run the wake word, without the rest of the voice assistant part? To create a micro voice assistant.

For example, with wake words like "microft light", "micro music", "microft off", I can have a voice assistant with all the commands for that room. And I do not need the rest of the processing power. And I hope that it can be more accurate, for just the 3 commands.

Is it possible to create these wake words?

This is possible, but the training process for microWakeWord is more involved than openWakeWord. Assuming you were able to train the models, though, it is definitely possible to just run microWakeWord and have it perform ESPHome actions when the different wake words are detected.
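As a rough illustration of that last point, here is a minimal, untested sketch of a wake-word-only "micro assistant", assuming an ESPHome version whose micro_wake_word component supports multiple models and an on_wake_word_detected trigger; the model files, the wake_word variable usage, and the room_light id are assumptions, so check the docs for your release:

micro_wake_word:
  microphone: mic_i2s              # hypothetical microphone component id
  models:
    # hypothetical custom-trained model manifests (you have to train these yourself)
    - model: micro_light.json
    - model: micro_off.json
  on_wake_word_detected:
    # the trigger is assumed to expose the detected phrase as 'wake_word'
    - if:
        condition:
          lambda: 'return wake_word == "micro light";'
        then:
          - light.toggle: room_light
    - if:
        condition:
          lambda: 'return wake_word == "micro off";'
        then:
          - light.turn_off: room_light

No voice_assistant block is needed at all for this; everything stays on the ESP32.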

I am using a DIY voice assistant and I am able to set timers with it. These timers run on the ESP32 device. For now I have a text_sensor in HA, under the ESPHome device, that shows the remaining time of the timers. So far all good. But I can start 2, 3 or more timers at once and I would like to track their progress in HA. The tricky part is that I don’t know how many timers there will be running, so I was thinking to only display a text_sensor in HA for each active timer, or none if there are no timers running. Is there a way of doing this? And if so, could you give me some pointers to look into? Thanks
(trying this question also here in the hope I get an answer)

Yes. But the first device they came out with that could listen for a wake word locally was the Wyoming satellite, which worked without HA listening. But as you can see, it requires a Pi Zero 2 W and a ReSpeaker HAT, either the 2-mic or 4-mic version, and they have a 3.5mm output. It has enough power to run openWakeWord locally. It was never ported to the ESP32 and I don’t believe the ESP32 has enough power to run it if they tried, although I could be mistaken. When voice first hit the ESP32 your HA server had to listen using openWakeWord, but the Wyoming satellite has been around since Rhasspy.

Then, some ESPHome contributor heard the head voice guy from Nabu (Mike I believe, who also wrote Rhasspy, so he obviously is extremely intelligent) on a podcast talking about how they were working on it, saying that it wasn’t an easy task, so the ESPHome contributor, can’t remember his name, saw it as a challenge. So, he’s one of those guys that’s obviously extremely smart also and can make you feel like your life accomplishments are a joke from a one hour livestream (sarcasm/not sarcasm). He originally got microWakeWord working on the ESP32-S3; he wrote and trained it using open source software, Google TensorFlow or something similar. He didn’t start from scratch, but I don’t believe any one person could. At least that’s what I remember from the livestream in early February, or whenever version 1 was announced.

So he was the one that originally got it working, although obviously Nabu/HA was more than happy to help him work out the bugs. I’m 99% sure that he’s a full time employee at Nabu Casa now, working in the voice “department” or however their structure is. Probably pretty laid back. You can still build a Wyoming satellite. I do believe it’s the only device, or one of a handful, that can run openWakeWord natively, but they are all ARM based as no ESP32 device can; the HA server always listening was a “placeholder” for lack of a better term. Wyoming was out and worked using Rhasspy before Nabu released their first voice update, if I’m not mistaken. I remember it started with the companion app on your smartphone, or a web browser via text only at first (I might be mistaken), and grew from there. Probably around the time Nabu hired Mike full time as their main voice guy.

Obviously going forward they will be focusing on microWakeWord and the ESP32 to leverage ESPHome, because they own it. ESPHome does support some Pi variants, but I’ve never looked into it. openWakeWord is already an add-on, so it will continue to be supported and work with a Wyoming satellite; it just makes sense to use microWakeWord going forward. I’m sure the AI stuff might have much different requirements even using the same foundation (Wyoming/Piper/other stuff), but I don’t really need all that personally. I get the appeal, but you need a GPU for it to work. It’s also probably a year away, maybe 6 months at the earliest.

Why not use the built-in timers now that they are out with the July release, if you’re not already doing so. I don’t think a text sensor would have to be manually created. You can name timers to run multiple ones. I believe you just have to use the voice phrases, similar to the S3 box. That’s what I’ve been doing, although I haven’t updated my config for native timers yet on my voice assistant. The first part below is from substitutions and the part after is the first script for native timers.

Now, having them display in the HA app, that I’m not sure about, as I just ask for the time remaining via voice using my janky automations solution for 2 timers at once. This involves creating a timer via a helper, and I had to create a text sensor for remaining time because for some reason the only way I could get that value was to pause the timer, store the value, then resume immediately (I’m sure there is a much better way of doing this; a template sketch for that is further down, after the script), but even that was based on some even older Rhasspy code I had used. You can also do stuff like the below with the new native solution.

Turn off the lights in 5 minutes
Pause TV in 10 minutes
Open the blinds in 5 minutes

  voice_assist_idle_phase_id: "1"
  voice_assist_listening_phase_id: "2"
  voice_assist_thinking_phase_id: "3"
  voice_assist_replying_phase_id: "4"
  voice_assist_not_ready_phase_id: "10"
  voice_assist_error_phase_id: "11"
  voice_assist_muted_phase_id: "12"
  voice_assist_timer_finished_phase_id: "20"

  - id: fetch_first_active_timer
    then:
      - lambda: |
          // Find the active timer with the least time remaining and store it in a
          // global. Assumes at least one timer exists (the caller runs
          // check_if_timers_active first, see the next script).
          const auto timers = id(va).get_timers();
          auto output_timer = timers.begin()->second;
          for (auto &iterable_timer : timers) {
            if (iterable_timer.second.is_active && iterable_timer.second.seconds_left <= output_timer.seconds_left) {
              output_timer = iterable_timer.second;
            }
          }
          id(global_first_active_timer) = output_timer;
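On the “much better way” I mentioned above for reading a helper timer’s remaining time: an active timer helper exposes a finishes_at attribute, so a template sensor can compute the time left without pausing anything. A rough, untested sketch (timer.timer_1 is just a placeholder entity):

template:
  - sensor:
      - name: "Timer 1 seconds remaining"
        unit_of_measurement: "s"
        # finishes_at only exists while the timer is active; templates that use
        # now() re-render roughly once per minute
        state: >-
          {% set f = state_attr('timer.timer_1', 'finishes_at') %}
          {{ ((as_datetime(f) - now()).total_seconds() | int) if f else 0 }}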

Yes indeed, I am using the timers offered by the July release.
The thing is that if I have multiple timers running I forget which ones are running. I could give them a name to start with, but I like to see how much time each one has left and to add/subtract time or even cancel the right one. For that I like to have an overview in HA. I could make 10 text sensors as placeholders, but I was hoping to do that a bit more dynamically, based on the timers.size() call.

The design files for my ESP Assistant, including 3D files and YAML, are here:
https://github.com/AshaiRey/ESP-Assistant

And just in case you still need it, here is an example to update the text_sensor directly:

  - id: active_timer_widget
    then:
      - lambda: |
          // Refresh the text sensor with the name and remaining time of the
          // soonest-finishing active timer, formatted as HH:MM or MM:SS.
          id(check_if_timers_active).execute();
          if (id(global_is_timer_active)) {
            id(fetch_first_active_timer).execute();
            int hours_left = floor(id(global_first_active_timer).seconds_left / 3600);
            int minutes_left = floor((id(global_first_active_timer).seconds_left - hours_left * 3600) / 60);
            int seconds_left = id(global_first_active_timer).seconds_left - hours_left * 3600 - minutes_left * 60;
            // Zero-pad each field to two digits
            auto display_hours = (hours_left < 10 ? "0" : "") + std::to_string(hours_left);
            auto display_minute = (minutes_left < 10 ? "0" : "") + std::to_string(minutes_left);
            auto display_seconds = (seconds_left < 10 ? "0" : "") + std::to_string(seconds_left);

            std::string display_string = "";
            if (hours_left > 0) {
              display_string = id(global_first_active_timer).name + " " + display_hours + ":" + display_minute;
            } else {
              display_string = id(global_first_active_timer).name + " " + display_minute + ":" + display_seconds;
            }
            id(text_timer).publish_state(display_string.c_str());
          }

+2 for this!!! And make timers work with ViewAssist

As far as what’s displayed in HA, this can be done with conditions when creating the HA dashboard. You could create 10 text sensors, then only have the active ones show up in the dashboard. See the below, where he has the dashboard update dynamically to the room he’s in using a condition based on the ESPresense location, and has some other examples, like the garage door status only showing when it’s open, since it’s closed most of the time. If your main goal is not to have 10 text sensors showing up in the dashboard when only one timer is running, this should accomplish that. This way you can show no timers, or just the active ones.
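For what it’s worth, a minimal sketch of such a conditional dashboard card (the sensor entity id is a placeholder; the card only renders while the sensor has a real value):

type: conditional
conditions:
  # hide the card when the timer text sensor has nothing to show
  - condition: state
    entity: sensor.esp_assistant_timer_1
    state_not: unknown
card:
  type: entities
  entities:
    - sensor.esp_assistant_timer_1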

This isn’t needed if the above accomplishes what you need, but I just wanted to point out that you can make HA service calls from within ESPHome. I’m also wondering if you could simply duplicate it in HA using a timer helper (helpers are under Settings > Devices & Services, on the Helpers tab) and make a Home Assistant service call from ESPHome, although since I don’t have native timers set up I can’t test it, and there could be a few issues with this method, like adding time, and having duplicate timers exist, but you could have the helper just for display reasons. I do know helpers or entities can’t be created dynamically in HA; the entity has to exist, so you would have to create helpers named something like timer.timer_1, timer.timer_2, etc. Then use conditions to only show active timers in the HA dashboard. Below is my current solution using a timer helper (and a rough sketch of starting a timer helper from ESPHome is at the end of this post). Probably not ideal, but just a thought.

Regardless, to make HA service calls you have to go to Settings > Devices & Services, then the ESPHome integration (top part), then Configure and check the checkbox. For example, I own a Korvo-1 as my voice assistant and it makes a popping sound on voice replies, so I just ended up sending all audio output to a smart speaker. There may be a better HA service to call from ESPHome also. If you go to Developer Tools in HA, then Services, you will see all the different services you could use. You could also send a notification to your phone when a timer finishes, as another example. I do this, but via automations. Waiting on my M5Stack CoreS3 before switching to native timers.

  on_tts_end:
    # forward the generated TTS audio URL (x) to a smart speaker instead of playing it locally
    - homeassistant.service:
        service: media_player.play_media
        data:
          entity_id: media_player.vlc_telnet
          media_content_id: !lambda 'return x;'
          media_content_type: music
          announce: "true"

In order to do this you must enable your voice assistant to make service calls to HA, as stated above. No one told me this when I first read about it, and I got somewhat "frustrated" the first time I attempted this, until I learned this requirement.
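And for the timer-helper idea above, a minimal sketch of starting a display-only helper timer from ESPHome (the trigger you hook it into, the entity id, and the fixed duration are all assumptions; I haven’t tested this with native voice timers):

# e.g. inside whatever ESPHome trigger fires when a device timer starts
- homeassistant.service:
    service: timer.start
    data:
      entity_id: timer.timer_1   # pre-created helper, used only for display in HA
      duration: "00:05:00"       # mirror the duration of the on-device timer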


Thank you for the detailed reply. I will definitely try your suggestions.
Yesterday I made a few long running timers (10h+) on the ESP. This morning they were not available anymore on the ESP. Probably due to some connection problems, a restart, or something else; I didn’t look into that yet. But to my surprise I noticed that these timers were still running in HA. For that I opened Assist and asked for “Timer Status”. It told me that 3 timers were running and gave details of the one with the shortest time left. In HA I wasn’t able to trace these timers. I used the dev tools for it but they didn’t show up. I may be overlooking something here.

I have updated the yaml to the latest version
ESP Assistant yaml

Based on the documentation they are supposed to run completely on the voice assistant. If they are created in HA anywhere, they would be found under Developer Tools, probably in the timer domain, which is what helpers create. If they are not there then they are not stored in HA, at least not visibly, unless you store them in a text sensor like you are already doing, but there may be a way to do the same thing with a timer. HA usually just doesn’t create things like entities without the user’s knowledge. Some things can only be created through helpers, or are easiest to create that way, like virtual switches, which are just a boolean value. With that said, I haven’t used them yet so I could be mistaken, but Developer Tools would be the place to find them if they do exist.
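One quick way to check is Developer Tools > Template with a snippet like the one below; if nothing comes back, there simply are no timer entities stored in HA:

{# list every entity in the timer domain plus its current state #}
{% for t in states.timer %}
  {{ t.entity_id }}: {{ t.state }}
{% endfor %}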

Timers are supposed to survive reboots, at least when done through an automation using HA, but I’m not sure about native ESPHome voice timers. There’s really not a lot of information out there besides the announcement at the beginning of July and some scattered threads here. I’m sure you could also have ESPHome announce when the timer is done over a smart speaker. Below is what I got for audio output when a timer ends, but through automations. It’s fun to freak people out with sentence/response automations, because if you go to Services and choose the one below it will show you entities, usually smart speakers like Chromecast/Sonos, and you can have it reply however you want, but you can’t send it to a voice assistant, at least not yet. I think there is a local TTS service that works just as well also.

# announce when the timer finished, via Nabu Casa cloud TTS on a media player
service: tts.cloud_say
data:
  entity_id: media_player.vlc_telnet
  message: >-
    The timer finished at {{ now().strftime('%-I') }}
    {{ now().strftime('%M %p') }} on {{ now().strftime('%A') }}
    {{ now().strftime('%B') }} {{ now().strftime('%d') }}
  cache: true
enabled: true

FYI, Seeed has newly announced the ”ReSpeaker Lite" (ReSpeaker Lite board only) and “ReSpeaker Lite Voice Assistant Kit” (ReSpeaker Lite board with onboard ESP32) products.

So that same webpage offers different models, with one being a 2-Mic Array board model that combines ESP32-S3 ESPHome support plus an XMOS XU-316 chip for advanced audio processing, and a second model that is a DIY variant, which is only a 2-Mic Array board with just the XMOS XU-316 chip that you can use with your own compute solution (thus you need to add your own MCU board or an SBC/computer such as a Raspberry Pi) and connect it via I2S or USB.

@synesthesiam The former full-kit solution sounds similar to the hardware specification mentioned for the upcoming voice-kit development platform that Nabu Casa members said they are working on, or? …though it looks like the ReSpeaker Lite Voice Assistant Kit is missing expansion ports, and if so it will not allow for additional hardware add-ons?


Looks interesting, and I am particularly pleased to see that their demonstration video is using Home Assistant with ESPHome and Wyoming :slight_smile: For anyone else interested, the product page lists a price of USD $26.91 for the full kit with an enclosure. More info in the Wiki.

They state “Onboard AI Algorithms” … the kit includes Automatic Speech Recognition algorithms for Interference Cancellation (IC), Acoustic Echo Cancellation, Noise Suppression, Voice-to-Noise Ratio (VNR), and Automatic Gain Control (AGC), enabling high quality voice capture.

I found a firmware file, but so small that I guess most of the algorithms are closed source hardcoded on-chip.

Personally I don’t see Seeed as being interested in supplying hardware and support to end users - and I wouldn’t want a repeat of their “support” for the ReSpeaker 2-Mic HAT for Raspberry Pi.

WOOT! I stumbled on this new “voice-kit” GitHub repository where ESPHome developers are developing new or improved components for I2S audio (XMOS) support and media playback support for FLAC, etc., for the upcoming voice-kit hardware platform from Nabu Casa:

They have already added features, functions, and improvements/enhancements to ESPHome, such as:

  • New: Nabu Media Player - new “nabu” media player from Nabu Casa running natively on ESP32
    • Music Assistant streams work (both mp3 and flac), but since it requires resampling, the audio quality isn’t great
  • New: Added support for FLAC files
  • New: Added a proper WAV decoder (that parses WAV headers with LIST, INFO, etc. chunks.)
  • New: Initial support for playing back local files
  • New: Playback Control for the VoiceKit
  • New: Added an is_paused condition for media players.
  • New: Add Click to Converse to button
  • New: LED animation
  • New: Scripts for controlling LEDs
  • New: Update Button Behaviour for the Voice kit
  • New: Dial Volume Control
  • New: Timer basic implementation
  • New: Added HTTP(s) OTA updates
  • New: Added Buttons for force ota update.
  • New: Software Mute Switch
  • Improvement: A basic resampler adjusts sample rates
  • Improvement: Configurable output sample rate (for experimental 48kHz XMOS firmware)
  • Improvement: The DAC mute state is read on boot
  • Improvement: volume/mute control via the DAC (the wheel works for increasing/decreasing volume)
  • Improvement: Logs what element failed if the pipeline breaks
  • Improvement: Fails gracefully if the incoming stream can’t be processed
  • Improvement: Differentiate between user facing LED Ring and Internal LED ring
  • Point external component to dev branch

They also have many TODO inline comments in the code there, if anyone is interested in helping them:

https://github.com/search?q=repo%3Aesphome%2Fvoice-kit%20todo&type=code

Note! Be aware that there are many comments there noting that most of the new stuff is not yet stable.

Not sure, but they are clearly aiming for ESPHome compatibility. See the separate thread for more discussion here:

I have to agree a little. This is clearly geared towards ESPHome, but they will probably put out one config example and never update it for a voice assistant in ESPHome. Also, after looking at the XMOS datasheet, Seeed doesn’t specify the exact model, and there are 10 or so with different capabilities, some supporting external LPDDR1 RAM. In the summer release video, when they briefly discussed the hardware, one of them said the XMOS chip had 16 cores, so I’m 99% sure it’s this model, but the numbers after make a difference.

If the firmware in the XMOS chip is closed source then it can’t be an ESPHome-ready device like Nabu’s will be. It will be up to the community to tweak any YAML for this thing, although I do expect it to sell well. Also, it looks rushed; the full kit is just the board in an acrylic case sitting on the speaker. It’s essentially a ReSpeaker 2-Mic HAT with an XMOS chip and an ESP32-S3. Not sure why anyone would try to use anything but a XIAO on it.

Seeed is invested in ESPHome/Nabu now though. They are a reseller of Nvidia’s Jetson compute modules, and someone from Seeed started the post to get a local LLM working, and HA devs and Nvidia devs have made a lot of effort to port stuff. They are just interested in selling hardware, but after it’s in your hands don’t expect anything else from them. There is nothing wrong with that approach, but if they just made it open source (and it may be) I would have more faith in long term support. While I expect Nabu’s voice assistant to be more expensive, you know it will just work and continue to be supported for at least 2 years, probably more.

I would like to know the exact model that’s in this and what will be in Nabu’s version, but they obviously haven’t released hardware specifications yet.

They certainly seem to be taking ESPHome and Nabu Casa seriously, which is great to see!

Agree. I believe Seeed and Espressif want to sell chips to companies who will develop products - not to directly support every hobbyist’s one-off project … hence the lack of ongoing support.
Sometimes they have to drum up interest (particularly in newer technologies), and so they make small runs of “development kits” (probably at a loss) to show off the potential to developers.
Once interested, the developer will invest in learning the chip, developing their own product, selling and supporting their product … while buying lots of chips :wink:

Exactly!!! It seems Nabu Casa will use XMOS chips, and Nabu will do the bulk of the work to bring a user-friendly product to market, including providing support and (presumably) open source.

Yes, Nabu Casa’s VoiceKit will cost more than the XMOS development kit - but I trust Nabu Casa to put a fair price on their VoiceKit (though surely plenty will expect it to price-match Google/Amazon and their deep pockets). I hope they allow an option for us to 3D print our own enclosures; and maybe the ReSpeaker Lite is so close to Nabu Casa’s target design that we might also have an option to buy a ReSpeaker Lite clone, assemble it ourselves, and add ESPHome. I can’t wait to see what Nabu Casa comes out with.

The future is looking very bright :star_struck:
Time for me to start saving my pension :wink:

@donburch888 @ginandbacon if you want deeper discussion regarding the ReSpeaker Lite product specifically (things that only apply to it), then I suggest you post to the separate thread instead, see:

https://community.home-assistant.io/t/respeaker-lite-voice-assistant-kit-seeed-studio-voicekit-combining-xmos-xu-316-and-esp32-s3/756944

Yes it is interesting, showing a wider interest in better, cheaper voice assistant devices … but unless Mike says that this is the hardware in Nabu Casa’s VoiceKit, I will just wait for the Nabu Casa VoiceKit.

FYI, FutureProofHomes has now also announced a similar XMOS and ESP32-based two-board voice satellite hardware development kit for Home Assistant that he is calling the ”Satellite1 PCB Dev Kit”:

Satellite1 PCB Dev Kit

The Satellite1 PCB Dev Kit contains the two PCBs necessary to build your own completely private voice assistant & multi-sensor with XMOS advanced audio processing & music playback. Add your own speaker and power supplies.

Satellite1 HAT Board:

This board features 4 PDM microphones, 12 NeoPixel LEDs, humidity/temp/lux sensors, 4 buttons (volume up/down, action button & hardware mute), plus the XMOS audio processing chip and a power DAC for an amplified speaker-out connection or 3.5mm headphone connection. All remaining GPIOs are also exposed.

The Satellite1 Hat connects easily to the Sat1 Core Board but can also be paired with a Raspberry Pi or a PC/Mac via USB! Perfect for all your voice assistant and audio projects!

Satellite1 Core Board:

The Satellite1 Core Board contains the ESP32-S3 n16r8, USB-C Power Delivery and 40-pin connection. This board attaches to the companion Sat1 HAT Board.

It looks like he has posted a future roadmap showing that he is working on a nice enclosure and more:

I noticed that @FutureProofHomes had a preview video on YouTube mentioning this project as “HomeX” 4 months ago (but at that time he had based the prototype on the wyoming-satellite platform running on a Raspberry Pi, instead of on Nabu Casa’s upcoming ESPHome-based voice-kit hardware platform that runs on an ESP32-S3 and uses an XMOS xCORE chip for audio processing):

PS: The new design reminds me of the “Onju Voice” PCB replacement for the Google Nest Mini (2nd gen), which is an open-source hardware project that I hope someone else will pick up and update now: