Voice Chapter 7 - Supercharged wake words and timers

Atom Echo with Microsoft STT works flawlessly here. It's super fast. Local STT just doesn't work for me with Whisper: it's too slow and makes a lot of mistakes.

It may “work”, but in real-life use it's terrible.

You are right, I should have watched the livestream first. It still seems the same to me, but maybe it's the code I am using; looking at the new S3 Box code, there is some stuff missing that may need to be added to my Espressif Korvo-1 config, but I'm not updating at the moment as it works great. It's background noise that ruins it: having the TV on too loud makes it hard to use, so I usually have to grab my phone and long-press the power button on Android, as HA is my default assistant.

So going forward, will OpenWakeWord only be for Wyoming satellites and the Assist Microphone add-on? Honestly, using a round dedicated speakerphone has by far been the easiest route with Assist Microphone and OpenWakeWord. The obvious downside is that it has to be plugged into the HA server via USB, which limits where it can go. I need to update and test some of my Atom Echos, as I had been using the button to activate them to conserve resources, but now that's not needed.

It works amazingly well for me on my Korvo-1. I'm not sure about the M5Stack Echo, but I have a Core 3 being delivered today, so I will update when I get that set up. What doesn't work is when too much noise (TV/music) is introduced. I'm sure they are working on this, and it obviously can't be easy; there's no telling how much cloud processing is involved with Amazon/Google. In fact, using my Android phone works best, but it doesn't work with wake words. I have to long-press the power button, which is understandable: Google would have to allow it, and making it work may not be an easy thing to do. I've got it set as my default assistant on Android.

Yes, background noise is a big issue, as is choosing a microphone appropriate for the room it will be used in. My automation to turn the TV on (in time for the 6pm news) also turns off the microphone in my living room voice satellite to avoid the inevitable false positives. I also have a button programmed to pause the TV, turn on a light and turn on the voice satellite, with a double-tap to reverse these effects and continue watching the TV.
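For anyone wanting to copy the idea, a rough sketch of that kind of automation on the HA side might look like the following; the trigger time and entity ids are made up, and it assumes the satellite exposes a mute switch:

automation:
  - alias: "TV on for the news, mute the voice satellite"
    trigger:
      - platform: time
        at: "17:55:00"                 # hypothetical time, just before the 6pm news
    action:
      - service: media_player.turn_on
        target:
          entity_id: media_player.living_room_tv          # hypothetical TV entity
      - service: switch.turn_on
        target:
          entity_id: switch.living_room_satellite_mute    # hypothetical mute switch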

Honestly I don't know, and Mike currently seems focused on the AI aspects.

It seems to me that our current hardware options are rather limited or expensive (I can’t justify conference speakers in every room), and there are several desirable techniques under the DSP (Digital Signal Processing) topic which have not yet made it to open source.

Maybe Mike feels that he has done pretty much all he can for now with the current hardware and software.

Yeah, and I get it. This is probably one of the hardest parts. Who knows how much Amazon and Google spent on everything from developers to cloud resources to get there, or how much was done in the cloud versus on device (which is still an obviously WAY faster ARM processor on a phone). It's certainly not easy, and while Nabu cloud is faster than local, it's not by much, and mine is just worse on certain words locally. I'm running on x86 on a roughly 3-year-old mini PC, which is overkill anyway. I'm certainly not complaining, especially since it's only a year and a half in.

I swear I had read somewhere that they had chosen XMOS, but I could easily be wrong, and that's just a chip, not the actual DSP, so take that with a huge grain of salt. I bought that speakerphone because my HA server is in my bedroom, and since it has to be plugged in via USB, moving it elsewhere would mean buying signal boosters and running cables, which adds more money and really makes it a bad idea. It will work with any USB microphone. Audio output is optional, but I think it has to be USB also; you would have to read the add-on docs. On top of that, OpenWakeWord takes 3 percent CPU constantly even when idle, and as stated above, this isn't a Raspberry Pi. It has to constantly listen for the wake word. Only device that does

Yeah, I understand that OpenWakeWord is more resource-demanding, so in practice it can only work on more powerful hardware, but there is now also the ”microWakeWord” project for microcontrollers with constrained hardware resources, where the ESP32 series is the first to be supported. microWakeWord by @kahrendt is a powerful on-device wake word engine for microcontrollers, and it will power Nabu Casa's upcoming ESPHome-based open-source voice satellites. See:

Just read in the Open Home Foundation newsletter that the microWakeWord project is, by the way, now part of the Open Home Foundation:

Check out Mike’s posts here


Thanks. Makes sense, and I don't really need AI personally. I had been checking that Nvidia developer thread every couple of weeks and they are making progress, but there are still some posts about lack of RAM on the 8 GB Jetson models, and those are $600 a pop, plus they have to port a bunch of stuff to be GPU-based just to work. So I think the "full" voice assistant is a year or more away, although it's nice to hear they are releasing a voice assistant (hopefully this year, but I know they are busy) and that they are using XMOS for sound and echo cancellation.

It's fine if you are using a Wyoming satellite, since it runs on ARM and OpenWakeWord is actually installed on the satellite, so it listens for the wake word itself. Right now, since microWakeWord v2, I think the Wyoming satellite and Assist Microphone (which does take up resources, since the HA server is constantly listening) are the only two setups using OpenWakeWord. Well, you always have the option to manually force it, but that seems counterproductive at this point.

No, microWakeWord runs locally on the ESP32, so HA Assist does not start listening until after the wake word is triggered by microWakeWord running locally on the ESP32.

I heard this is possible on the ESP32.

Would it be possible to only run the wake word part, without the rest of the voice assistant, to create a micro voice assistant?

For example, with wake words like "microft light", "micro music" and "microft off", I could have a voice assistant with all the commands for that room, and I wouldn't need the rest of the processing power. I also hope it could be more accurate with just those three commands.

Is it possible to create these wake words?

This is possible, but the training process for microWakeWord is more involved than openWakeWord. Assuming you were able to train the models, though, it is definitely possible to just run microWakeWord and have it perform ESPHome actions when the different wake words are detected.
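For what that could look like, here is a rough, untested sketch of micro_wake_word running on its own and branching on which model fired. The microphone id, model names and light id are all hypothetical, custom wake word models would first need to be trained and referenced by their model manifests, and the exact schema (including whether the detected phrase is exposed as wake_word) depends on your ESPHome release, so check the micro_wake_word docs:

  esphome:
    on_boot:
      - micro_wake_word.start:            # start listening once booted

  micro_wake_word:
    microphone: mww_mic                   # hypothetical microphone component id
    models:
      - model: micro_light                # hypothetical custom-trained models
      - model: micro_off
    on_wake_word_detected:
      # the detected phrase is assumed to be available here as wake_word
      - if:
          condition:
            lambda: 'return wake_word == "micro light";'
          then:
            - light.turn_on: room_light   # hypothetical light id
      - if:
          condition:
            lambda: 'return wake_word == "micro off";'
          then:
            - light.turn_off: room_light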

I am using a DIY voice assistant and I am able to set timers with it. These timers run on the ESP32 device. For now I have a text_sensor in HA under the ESPHome device that shows the remaining time of the timers. So far, all good. But I can start 2, 3 or more timers at once, and I would like to track their progress in HA. The tricky part is that I don't know how many timers will be running, so I was thinking to only display a text_sensor in HA for each active timer, or none if there are no timers running. Is there a way of doing this? And if so, could you give me some pointers on what to look into? Thanks.
(trying this question also here in the hope I get an answer)

Yes. But the first device they came out with that could listen for a wake word locally was the Wyoming satellite, which worked without HA listening. But as you can see, it requires a Pi Zero 2 W and a ReSpeaker HAT, either 2-mic or 4-mic, and they have a 3.5mm output. It has enough power to run OpenWakeWord locally. It was never ported to the ESP32, and I don't believe the ESP32 has enough power to run it if they tried, although I could be mistaken. When voice first hit the ESP32, your HA server had to listen using OpenWakeWord, but the Wyoming satellite has been around since Rhasspy.

Then an ESPHome contributor heard the head voice guy from Nabu (Mike, I believe, who also wrote Rhasspy, so he is obviously extremely intelligent) on a podcast talking about how they were working on it and saying that it wasn't an easy task, so the ESPHome contributor (can't remember his name) saw it as a challenge. He's one of those guys who is obviously extremely smart as well and can make you feel like your life accomplishments are a joke from a one-hour livestream (sarcasm/not sarcasm). He originally got microWakeWord working on the ESP32-S3, and he wrote and trained it using open-source software, Google TensorFlow or something similar. He didn't start from scratch, but I don't believe any one person could. At least that's what I remember from the livestream in early February, or whenever version 1 was announced.

So he was the one who originally got it working, although obviously Nabu/HA was more than happy to help him work out the bugs. I'm 99% sure that he's a full-time employee at Nabu Casa now, working in the voice "department" or however their structure is. Probably pretty laid back. You can still build a Wyoming satellite. I believe it's one of only a handful of devices that can run OpenWakeWord natively, and they are all ARM-based, as no ESP32 device can; the HA server always listening was a "placeholder", for lack of a better term. Wyoming was out and worked using Rhasspy before Nabu released their first voice update, if I'm not mistaken. I remember it started with the companion app on your smartphone, or a web browser, via text only at first (I might be mistaken), and grew from there. That was probably around the time Nabu hired Mike full time as their main voice guy.

Obviously going forward they will be focusing on microWakeWord and the ESP32 to leverage ESPHome, because they own it. ESPHome does support some Pi variants, but I've never looked into it. OpenWakeWord is already an add-on, so it will continue to be supported and work with a Wyoming satellite; it just makes sense to use microWakeWord going forward. I'm sure the AI stuff might have much different requirements even using the same foundation (Wyoming/Piper/other stuff), but I don't really need all that personally. I get the appeal, but you need a GPU for it to work. It's also probably a year away, maybe 6 months at the earliest.

Why not use the built-in timers now that they are out with the July release, if you're not already doing so? I don't think a text sensor would have to be manually created. You can name timers to run multiple ones; I believe you just have to use voice phrases similar to the S3 Box. That's what I've been doing, although I haven't updated my config for native timers yet on my voice assistant. The substitutions and the first script for native timers are shown further below.

Now, having them display in the HA app, that I'm not sure about, as I just ask for the time remaining via voice using my janky automations solution for 2 timers at once. This involves creating a timer via a helper, and I had to create a text sensor for remaining time, because for some reason the only way I could get that value was to pause the timer, store the value, then resume immediately (I'm sure there is a much better way of doing this; a rough sketch of that workaround follows after the example phrases below), but even that was based on some even older Rhasspy code I had used. You can also do stuff like the below with the new native solution.

Turn off the lights in 5 minutes
Pause TV in 10 minutes
Open the blinds in 5 minutes
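For completeness, the pause/store/resume workaround mentioned above could look roughly like this on the HA side; this is a sketch only, the helper entity ids are hypothetical, and with native timers it shouldn't be needed (the remaining attribute of a timer helper only gets refreshed while the timer is paused, hence the dance):

script:
  capture_timer_remaining:
    sequence:
      # pause so the 'remaining' attribute is refreshed
      - service: timer.pause
        target:
          entity_id: timer.kitchen_timer                    # hypothetical timer helper
      # copy the remaining time into a text helper for display
      - service: input_text.set_value
        target:
          entity_id: input_text.kitchen_timer_remaining     # hypothetical text helper
        data:
          value: "{{ state_attr('timer.kitchen_timer', 'remaining') }}"
      # resume the paused timer (timer.start without a duration)
      - service: timer.start
        target:
          entity_id: timer.kitchen_timer

Back to the native timers, here are the substitutions and the first script: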

  voice_assist_idle_phase_id: "1"
  voice_assist_listening_phase_id: "2"
  voice_assist_thinking_phase_id: "3"
  voice_assist_replying_phase_id: "4"
  voice_assist_not_ready_phase_id: "10"
  voice_assist_error_phase_id: "11"
  voice_assist_muted_phase_id: "12"
  voice_assist_timer_finished_phase_id: "20"

  # Finds the active timer with the least time remaining and stores it in the
  # global_first_active_timer global (id(va) is the voice_assistant component).
  - id: fetch_first_active_timer
    then:
      - lambda: |
          const auto timers = id(va).get_timers();
          auto output_timer = timers.begin()->second;
          for (auto &iterable_timer : timers) {
            if (iterable_timer.second.is_active && iterable_timer.second.seconds_left <= output_timer.seconds_left) {
              output_timer = iterable_timer.second;
            }
          }
          id(global_first_active_timer) = output_timer;
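For context, the script above assumes a couple of supporting pieces that aren't shown: a global to hold the selected timer, a global flag, and a check_if_timers_active script. The ids below match the ones used above, but the definitions are a sketch along the lines of the published S3 Box-style configs, so double-check them against your own source:

globals:
  - id: global_first_active_timer
    type: voice_assistant::Timer
    restore_value: false
  - id: global_is_timer_active
    type: bool
    restore_value: false
    initial_value: 'false'

script:
  - id: check_if_timers_active
    then:
      - lambda: |
          // set the flag when at least one timer on the voice assistant (id: va) is active
          const auto timers = id(va).get_timers();
          bool output = false;
          for (auto &iterable_timer : timers) {
            if (iterable_timer.second.is_active) {
              output = true;
            }
          }
          id(global_is_timer_active) = output;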

Yes, indeed I am using the timers offered by the July release.
The thing is that if I have multiple timers running, I forget which ones are running. I could give them a name to start with, but I like to see how much time each one has left and to add/subtract time or even cancel the right one. For that I'd like to have an overview in HA. I could make 10 text sensors as placeholders, but I was hoping to do that a bit more dynamically, based on the timers.size() call.

The design files for my ESP Assistant, including the 3D files and YAML, are here:
https://github.com/AshaiRey/ESP-Assistant

And just in case you still need it, here is an example of updating the text_sensor directly:

  # Formats the first active timer as "name HH:MM" (or "name MM:SS" when under
  # an hour) and publishes it to the text_timer text sensor.
  - id: active_timer_widget
    then:
      - lambda: |
          id(check_if_timers_active).execute();
          if (id(global_is_timer_active)) {
            id(fetch_first_active_timer).execute();
            int hours_left = floor(id(global_first_active_timer).seconds_left / 3600);
            int minutes_left = floor((id(global_first_active_timer).seconds_left - hours_left * 3600) / 60);
            int seconds_left = id(global_first_active_timer).seconds_left - hours_left * 3600 - minutes_left * 60;
            auto display_hours = (hours_left < 10 ? "0" : "") + std::to_string(hours_left);
            auto display_minute = (minutes_left < 10 ? "0" : "") + std::to_string(minutes_left);
            auto display_seconds = (seconds_left < 10 ? "0" : "") + std::to_string(seconds_left);

            std::string display_string = "";
            if (hours_left > 0) {
              display_string = id(global_first_active_timer).name + " " + display_hours + ":" + display_minute;
            } else {
              display_string = id(global_first_active_timer).name + " " + display_minute + ":" + display_seconds;
            }
            id(text_timer).publish_state(display_string.c_str());
          }
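The text_timer id that the script publishes to would just be a template text sensor, something like this (the friendly name is arbitrary):

text_sensor:
  - platform: template
    id: text_timer
    name: "Active timer"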

+2 for this!!! And make timers work with ViewAssist

As far as what's displayed in HA, this can be done with conditions when creating the HA dashboard. You could create 10 text sensors and then only have the active ones show up in the dashboard. See the below, where he has the dashboard update dynamically to the room he's in using a condition based on the ESPresense location, and has some other examples, like the garage door status only showing when it's open, since it's closed most of the time. If your main goal is not to have 10 text sensors showing up in the dashboard when only one timer is running, this should accomplish that. This way you can show no timers, or just the active ones.
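As a rough example of that kind of dashboard condition, a conditional card in Lovelace YAML can hide a timer sensor unless it has something to show; the entity id here is hypothetical, and the state to test depends on what your sensor reports when idle:

type: conditional
conditions:
  - condition: state
    entity: sensor.voice_satellite_timer_1    # hypothetical timer text sensor
    state_not: "unknown"                      # or whatever your idle state is
card:
  type: entity
  entity: sensor.voice_satellite_timer_1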

This isn't needed if the above accomplishes what you need, but I just wanted to point out that you can make HA service calls from within ESPHome. I'm also wondering if you could simply duplicate it in HA using a timer helper (helpers are under Devices, then the right icon at the bottom) and make Home Assistant service calls from ESPHome, although since I don't have native timers set up I can't test, and there could be a few issues with this method, like adding time, or duplicate timers existing. But you could have the helper just for display reasons. I do know helpers or entities can't be created dynamically in HA; the entity has to exist, so you would have to create helpers named something like timer.timer_1, timer.timer_2, etc., then use conditions to only show active timers in the HA dashboard. Below is my current solution using a timer helper. Probably not ideal, but just a thought.
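Purely as an untested sketch of that mirroring idea (not an actual config), starting a matching HA timer helper from ESPHome when a device timer starts could look like this, assuming the voice_assistant on_timer_started trigger; the helper entity id and the fixed duration are placeholders, and in practice you would want to pass the real duration from the device:

voice_assistant:
  # ...rest of the voice assistant config...
  on_timer_started:
    - homeassistant.service:
        service: timer.start
        data:
          entity_id: timer.timer_1        # hypothetical pre-created helper
          duration: "00:05:00"            # placeholder; ideally taken from the device timer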

Regardless, to make HA service calls you have to go to Settings, then Devices, then ESPHome (top part), then Configure and check the checkbox. For example, I own a Korvo-1 as my voice assistant and it makes a popping sound on voice replies, so I just ended up sending all audio output to a smart speaker. There may be a better HA service to call from ESPHome as well; if you go to Developer Tools in HA, then Services, you will see all the different services you could use. You could also send a notification to your phone when a timer finishes, as another example. I do this, but via automations. I'm waiting on my M5Stack Core3 before switching to native timers.

  on_tts_end:
    - homeassistant.service:
        service: media_player.play_media
        data:
          entity_id: media_player.vlc_telnet
          media_content_id: !lambda 'return x;'
          media_content_type: music
          announce: "true"

In order to do this you must allow your voice assistant device to make service calls to HA, as stated above. No one told me this when I first read about it, and I got somewhat "frustrated" the first time I attempted it, until I learned about this requirement.
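Along the same lines, the timer-finished phone notification mentioned a few posts up could be done directly from ESPHome, assuming the voice_assistant on_timer_finished trigger and a hypothetical notify service name:

voice_assistant:
  # ...rest of the voice assistant config...
  on_timer_finished:
    - homeassistant.service:
        service: notify.mobile_app_my_phone   # hypothetical notify target
        data:
          message: "A voice assistant timer just finished"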


Thank you for the detailed reply. I will definitely try your suggestions.
Yesterday I made a few long-running timers (10h+) on the ESP. This morning they were not available anymore on the ESP, probably due to some connection problem, a restart or something else; I didn't look into that yet. But to my surprise, I noticed that these timers were still running in HA. To see that, I opened Assist and asked for "Timer Status". It told me that 3 timers were running and gave details of the one with the shortest time left. In HA I wasn't able to trace these timers; I used the dev tools for it, but they didn't show up. I may be overlooking something here.

I have updated the yaml to the latest version
ESP Assistant yaml