ESPHome Voice Assistant

I have tried 4 different INMP441’s on a nodemcu-32, doiT ESP32 DEVKIT V1 and S2 mini. All work with custom Arduino code but just can’t get any combination to work with voice_assistant :confused:

I messed with this today and couldn’t get it to work with the button push. I did however copy over the code for wake word from the atom device and got the wake word working. No idea what the button push problem was.


esphome:
  name: microphone-input
  friendly_name: microphone input
  min_version: 2023.10.0
  on_boot:
    - priority: -100
      then:
        - wait_until: api.connected
        - delay: 1s
        - if:
            condition:
              switch.is_on: use_wake_word
            then:
              - voice_assistant.start_continuous:

esp32:
  board: esp32dev
  framework:
    type: arduino

# Enable logging
logger:

# Enable Home Assistant API
api:
  encryption:
    key: "..."

ota:
  password: "..."


wifi:
  ssid: !secret wifi_ssid
  password: !secret wifi_password

i2s_audio:
  i2s_lrclk_pin: GPIO33  # LRCLK, WS, FS
  i2s_bclk_pin: GPIO19  # BCLK, SCK

microphone:
  - platform: i2s_audio
    id: echo_microphone
    i2s_din_pin: GPIO23  # DIN, SDIN, SD, SDATA, ADCDATA
    adc_type: external
    pdm: false
    bits_per_sample: 32bit

voice_assistant:
  id: va
  microphone: echo_microphone
  noise_suppression_level: 2
  auto_gain: 31dBFS
  volume_multiplier: 2.0
  use_wake_word: True 
  on_listening:
    - light.turn_on: response_light
  on_end:
    - light.turn_off: response_light

output:
  - platform: gpio
    pin: GPIO2  #onboard led
    id: light_output

light:
  - platform: binary
    name: "Wake Word"
    id: response_light
    output: light_output

switch:
  - platform: template
    name: Use wake word
    id: use_wake_word
    optimistic: true
    restore_mode: RESTORE_DEFAULT_ON
    entity_category: config
    on_turn_on:
      - lambda: id(va).set_use_wake_word(true);
      - if:
          condition:
            not:
              - voice_assistant.is_running
          then:
            - voice_assistant.start_continuous

    on_turn_off:
      - voice_assistant.stop
      - lambda: id(va).set_use_wake_word(false);







I mostly just copied and pasted code until it worked. No idea what I’m doing. This is with a normal esp32 dev board.

I have this code running on an ESP32 dev board with a microphone, and it works well with the wake word for some time.

esphome:
  name: microphone
  friendly_name: Microphone
  
  on_boot: 
    then:
      - switch.turn_off: use_wake_word
      - delay: 30sec
      - switch.turn_on: use_wake_word
          

esp32:
  board: esp32dev
  framework:
    type: arduino


web_server:
  port: 80

i2s_audio:
  - id: i2s_in
    i2s_lrclk_pin: GPIO26 #WS
    i2s_bclk_pin: GPIO25 #SCK

microphone:
  - platform: i2s_audio
    adc_type: external
    pdm: false
    id: mic_i2s
    channel: right
    bits_per_sample: 32bit
    i2s_audio_id: i2s_in
    i2s_din_pin: GPIO33 #SD



voice_assistant:
  microphone: mic_i2s
  id: va
  noise_suppression_level: 2
  auto_gain: 31dBFS
  volume_multiplier: 4.0
  on_wake_word_detected: 
    - light.turn_on:
        id: led_light
  on_listening: 
    - light.turn_on:
        id: led_light
        effect: "Scan Effect With Custom Values"
        red: 63%
        green: 13%
        blue: 93%
  on_error: 
    - light.turn_on:
        id: led_light
        effect: "flicker"  
    - switch.turn_off: use_wake_word
    - delay: 1sec
    - switch.turn_on: use_wake_word        
  on_end: 
    - light.turn_off:
        id: led_light      



switch:
  - platform: template
    name: Use wake word
    id: use_wake_word
    optimistic: true
    restore_mode: RESTORE_DEFAULT_OFF
    entity_category: config
    on_turn_on:
      - lambda: id(va).set_use_wake_word(true);
      - if:
          condition:
            not:
              - voice_assistant.is_running
          then:
            - voice_assistant.start_continuous
    
    on_turn_off:
      - voice_assistant.stop
      - lambda: id(va).set_use_wake_word(false);

light:
  - platform: neopixelbus
    id: led_light
    type: grb
    pin: 32
    num_leds: 10
    name: "Light"
    variant: ws2812x
    default_transition_length: 0.5s
      
    effects:
      - addressable_flicker: 
          name: "flicker"
      - addressable_scan:
          name: Scan Effect With Custom Values
          move_interval: 50ms
          scan_width: 3

But after some time the connection with HA seems to drop, and the ESPHome logs then show this output:

[22:52:13][D][voice_assistant:468]: Event Type: 0
[22:52:13][D][voice_assistant:468]: Event Type: 2
[22:52:13][D][voice_assistant:550]: Assist Pipeline ended
[22:52:13][D][voice_assistant:366]: State changed from STREAMING_MICROPHONE to IDLE
[22:52:13][D][voice_assistant:372]: Desired state set to IDLE
WARNING 192.168.0.37: Connection error occurred: [Errno 104] Connection reset by peer
INFO Processing unexpected disconnect from ESPHome API for 192.168.0.37
WARNING Disconnected from API
INFO Successfully connected to 192.168.0.37
[22:57:14][I][ota:117]: Boot seems successful, resetting boot loop counter.
[22:57:14][D][esp32.preferences:114]: Saving 1 preferences to flash...
[22:57:14][D][esp32.preferences:143]: Saving 1 preferences to flash: 0 cached, 1 written, 0 failed

It gets stuck here; these are the last log lines, and the wake word no longer works.

I have to then switch off the “use wake word” button and switch it on again to make it work.

I will try now and replace

esphome:
  name: microphone
  friendly_name: Microphone
  
  on_boot: 
    then:
      - switch.turn_off: use_wake_word
      - delay: 30sec
      - switch.turn_on: use_wake_word

with

  on_boot:
    - priority: -100
      then:
        - wait_until: api.connected
        - delay: 1s
        - if:
            condition:
              switch.is_on: use_wake_word
            then:
              - voice_assistant.start_continuous:

But what could the underlying issue be?
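
One way to avoid toggling the switch by hand after a dropped connection is to let the voice assistant react to Home Assistant connecting and disconnecting. The following is only a sketch, assuming an ESPHome release whose voice_assistant component provides the on_client_connected and on_client_disconnected triggers; the ids (va, mic_i2s, use_wake_word) are taken from the config above, and the triggers would be merged into the existing voice_assistant block rather than added as a second one.

```yaml
# Sketch only: resume wake word listening whenever Home Assistant (re)connects.
# Assumes voice_assistant supports on_client_connected / on_client_disconnected
# in the installed ESPHome version.
voice_assistant:
  id: va
  microphone: mic_i2s
  on_client_connected:
    - if:
        condition:
          switch.is_on: use_wake_word
        then:
          - voice_assistant.start_continuous:
  on_client_disconnected:
    - if:
        condition:
          switch.is_on: use_wake_word
        then:
          - voice_assistant.stop:
```

With this in place the on_boot block would no longer need to start the assistant itself.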

Finally got this working. It seems my INMP441s are wired incorrectly and run on the right channel when connected to GND.
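
For anyone hitting the same thing, the channel can be selected on the ESPHome side instead of rewiring. A minimal sketch, reusing the pins and id from the first config in this thread; the choice of right is an assumption that matches the behaviour described here, so try left if the recording is silent.

```yaml
# Sketch only: same microphone block as above, with the channel set explicitly.
microphone:
  - platform: i2s_audio
    id: echo_microphone
    i2s_din_pin: GPIO23
    adc_type: external
    pdm: false
    bits_per_sample: 32bit
    channel: right  # assumption: the mic outputs on the right channel; use left otherwise
```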

There is however a lot of noise when sample rate is at the default 16000 set in ESPHome.
When testing with custom Arduino code, a sample rate of 8000 gives me clear audio with no noise.

Hi, just came across this as I might be having a similar issue.

I’m using an ESP32 with an INMP441 mic, and it works for a few hours at a time, then randomly I receive the following log error in Home Assistant Core.

Logger: homeassistant.components.assist_pipeline.pipeline
Source: components/assist_pipeline/pipeline.py:653
Integration: Assist pipeline (documentation, issues)
First occurred: 14:38:47 (19 occurrences)
Last logged: 19:00:08

Unexpected error during wake-word-detection
Traceback (most recent call last):
  File "/usr/src/homeassistant/homeassistant/components/assist_pipeline/pipeline.py", line 653, in wake_word_detection
    result = await self.wake_word_entity.async_process_audio_stream(
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/src/homeassistant/homeassistant/components/wake_word/__init__.py", line 112, in async_process_audio_stream
    result = await self._async_process_audio_stream(stream, wake_word_id)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/src/homeassistant/homeassistant/components/wyoming/wake_word.py", line 152, in _async_process_audio_stream
    chunk_info = audio_task.result()
                 ^^^^^^^^^^^^^^^^^^^
  File "/usr/src/homeassistant/homeassistant/components/wyoming/wake_word.py", line 82, in next_chunk
    async for chunk_bytes in stream:
  File "/usr/src/homeassistant/homeassistant/components/assist_pipeline/pipeline.py", line 736, in _wake_word_audio_stream
    async for chunk in audio_stream:
  File "/usr/src/homeassistant/homeassistant/components/assist_pipeline/pipeline.py", line 1147, in process_enhance_audio
    async for dirty_samples in audio_stream:
  File "/usr/src/homeassistant/homeassistant/components/esphome/voice_assistant.py", line 155, in _iterate_packets
    raise RuntimeError("Not running")
RuntimeError: Not running

My ESP32 also has a presence sensor and an LED attached to it. Both of those continue working no problem, it’s just the mic that stops. If I restart the ESP (I have programmed a restart toggle for it), then it works again no problem for a few hours.
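
A restart toggle like the one mentioned can be sketched with ESPHome's restart switch platform. The entity name below is hypothetical.

```yaml
# Sketch only: exposes a switch in Home Assistant that reboots the ESP32.
switch:
  - platform: restart
    name: "Microphone restart"  # hypothetical entity name
```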

You mentioned something about the wiring being wrong on the mic. Could you explain a bit more? Got any pictures of how you’ve wired it up? I’ve wired mine as per this guide (see at bottom).

Since I’m really underwhelmed with the performance of the voice assistant I wanted to listen to the recording of the microphone. I added the following lines to the configuration.yaml but no files are generated in the folder I created.

assist_pipeline:
  debug_recording_dir: /share/assist_pipeline/

I’m running Home Assistant Container version and Whisper also running in a container. Here’s the corresponding part of my compose:

  whisper:
    container_name: whisper
    image: rhasspy/wyoming-whisper
    restart: unless-stopped
    command: --model small --language de
    volumes:
      - /lightning/lightning/apps/portainer/data/whisper:/data
    environment:
      - TZ=Europe/Berlin
    ports:
      - 10300:10300
    networks:
      - default

Any clues why no files are generated?

Edit:
Okay got it myself.
I was expecting the /share folder to be inside of the /config folder just like media for example. Instead it is generated at the same level as the config folder. And since I’m not able to use addons I did not see it until I crawled through the console of my HA docker instance.
So if anyone is running Home Assistant and all the other voice components in Docker containers, you now know where to look for those recordings.
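
To reach the recordings from the host without going through the container console, the /share folder can be mapped to a host directory. This is only a sketch of a compose fragment: the host paths are hypothetical placeholders, and the rest of the service definition (network settings and so on) is left out.

```yaml
# Sketch only: compose fragment for the Home Assistant container.
services:
  homeassistant:
    container_name: homeassistant
    image: ghcr.io/home-assistant/home-assistant:stable
    restart: unless-stopped
    volumes:
      - /path/to/ha/config:/config  # hypothetical host path
      - /path/to/ha/share:/share    # hypothetical host path; debug recordings end up here
```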

D1 mini is just a board style, and the ESP32 comes in a D1 mini form. Using an external DAC isn’t exactly a big deal, and it doesn’t get much simpler.

just trollin’, eh? good work, son!

No girl, I’m not trolling. I’m correcting wrong information. Pay attention and you might learn something, kid.

I’m running an ESP32S, MAX98357A Amplifier for audio output, and INMP441 for Mic. I’ve got it working with OpenWakeWord, and I am getting output from the speaker. The problems I’m now facing are:

  1. The output to the speaker is choppy and within the log it appears that the audio buffer on the ESP32s is full.
  2. When changing the audio output over from the “speaker” component to “media_player”, so as to have a volume control slider within HA, the choppiness of the outputted audio is even more pronounced. (Sounds kinda like it’s speaking through a fan).

Hi Chris

Also struggling to get an Esp Wroom32 working. Which pins are you using? I see your prior post listing pins - were those the successful ones for you?

Tia

Andrew

@philimon121 Here are the pins I used, I’ve put the project on hold for the moment, looking for a suitable “housing”.

After reading the articles I referenced, I moved i2s_bclock onto the RX0 pin.


I’ve been spending quite some time on tinkering and figuring out the pins myself. Therefore, in case someone has the same combo as I am using, here is what works:

Microphone: M5Stack U089 SPM1423
ESP device: NodeMCU-32S ESP32

ESPHome config and pin mapping:

i2s_audio:
  i2s_lrclk_pin: GPIO15

microphone:
  - platform: i2s_audio
    id: mic
    bits_per_sample: 32bit
    adc_type: external
    i2s_din_pin: GPIO13
    channel: right
    pdm: true

Thanks Chris. I’ve really been struggling to get my Ali esp wroom-32 board to work - been trying every combination I come across on the 'tinternet.

I’ll take a gander at it when I get back home… :crossed_fingers:


Chris, you legend!! I got the mic working on HA and I can see from the logs that all is functioning. Thank you so much, I’d almost given up hope!!

Still some work to go as I plan on adding a ld2410 mmw sensor and some decent temp sensor to my config.

@philimon121 I wouldn’t bother with the temperature sensor UNLESS you can somehow thermally isolate the sensor. As a few people have previously mentioned on various threads the LD2410 generates quite a bit of heat.

I tried it when I first got the sensors, and I can agree that the heat really messes up any sensible room temperature reading! I’ve just added a BH1750 to each of my LD2410s, but with the sensor inside an enclosure I’m not getting a sensible value, just a reading that I can use.

Hope that helps.

You can find some of my config in here:

https://github.com/Nerivec/SmartHomeEnhanced/tree/main/VoiceAssist

I used ESP32-S3-WROOM-1-N16R8, with dual INMP441 and PCM5102A for jack output, works pretty well (with the workarounds mentioned in the link above); though I did have to automate periodic restarts of the boards to avoid issues, the voice pipeline is not really stable, nor optimized yet (wake word detection).

Details on the board can be found here (mainly folder #5):

https://github.com/vcc-gnd/YD-ESP32-S3
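
The periodic restarts mentioned above can also be done on the device itself. A rough sketch, assuming the board gets its clock from Home Assistant; the ids, the entity name and the 04:00 schedule are hypothetical choices, not taken from the linked config.

```yaml
# Sketch only: reboot the board once a day at 04:00.
button:
  - platform: restart
    id: restart_button  # hypothetical id
    name: "Restart"

time:
  - platform: homeassistant
    id: ha_time  # hypothetical id
    on_time:
      - seconds: 0
        minutes: 0
        hours: 4
        then:
          - button.press: restart_button
```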

Yes @ChrisThomas, I’ve got a couple of BME280s outputting works of fiction because they are in the same case as the two LD2410 sensors I’ve deployed. I had to install separate Zigbee temp/humidity sensors:

(TMP01 are the zigbee device readings, HPD is the bme280 readings)

Just added a BH1750 to my breadboard too :grin:. Next experiment is to see if the ESP can handle the mic, amp, LD2410, BH1750 and SHT31 temp/humidity sensors all at the same time. The PCB I’m designing has the sensor on the rear of the board, so I’m hoping that will help isolate it.

Thanks again for your help and input - much appreciated!


I had the same problem with flashing my M5 Atom Echo with the smart speaker/mediaplayer yaml.
Playing music works fine, but using the voice assistant feature gives the “stt-no-text-recognized” error.
The solution turned out to be very simple: the button on the front of the M5 Atom Echo needs to be HELD DOWN, not pressed and released immediately.
The speaker only listens to your voice while the button is held down, which is why it does not recognize any text when the button is pressed briefly. I am sure the yaml could be adjusted to change this behavior, but for me holding works fine.
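
The hold-to-talk behaviour described here roughly corresponds to a button definition like the one below. This is only a sketch and not the actual Atom Echo yaml: GPIO39 with inverted logic is an assumption for the Atom Echo button, and the name is hypothetical.

```yaml
# Sketch only: the assistant listens while the button is held and stops on release.
binary_sensor:
  - platform: gpio
    pin:
      number: GPIO39  # assumption: Atom Echo button pin; adjust for other boards
      inverted: true
    name: "Button"  # hypothetical entity name
    on_press:
      - voice_assistant.start:
    on_release:
      - voice_assistant.stop:
```

Because listening is tied to on_press and on_release, a short tap starts and stops the assistant almost immediately, which fits the “stt-no-text-recognized” error.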

After some trial & error, I finally got the INMP441 mic working but the quality is really shit…

I am using 32bit, but to actually hear anything in the recording, and for STT to work, I need to shout so loudly that my neighbours can probably hear me. Additionally, it only works if I’m just a few cm away from the mic.
Even then the recording contains so much noise that my shouting is barely recognizable, just enough for STT to work in 70-80% of cases.

The wires are no longer than 10cm.
The device I’m working on should function as a multisensor containing temp/hum, lux, CO2/AQI, and a mmWave sensor. For prototyping they are located approx. 10-15cm apart from each other but the whole setup is in a room with quite some electronic devices (phones, computers, displays, wifi access points, …). Nothing out of the ordinary though when thinking about real world conditions for a device, especially considering the sensors should also fit in a smaller enclosure later on.

Is there anything I’m missing here?
Listening to the recording, the quality and noise filtering need to improve considerably to be usable.
I’m happy to hear about any suggestions.