ESPHome Voice Assistant

I tried to use an ESP32 with an INMP441 MEMS microphone as a voice assistant,
but I always get the error “Error: stt-no-text-recognized - No text recognized”.

log

[14:51:03][D][binary_sensor:036]: 'Push': Sending state ON
[14:51:03][D][voice_assistant:065]: Requesting start...
[14:51:03][D][voice_assistant:045]: Starting...
[14:51:03][D][voice_assistant:083]: Assist Pipeline running
[14:51:03][D][switch:012]: 'LED RED' Turning ON.
[14:51:03][D][switch:055]: 'LED RED': Sending state ON
[14:51:05][D][binary_sensor:036]: 'Push': Sending state OFF
[14:51:05][D][voice_assistant:073]: Signaling stop...
[14:51:07][D][sensor:110]: 'sensor_wifi_signal': Sending state -54.00000 dBm with 0 decimals of accuracy
[14:51:09][D][switch:016]: 'LED RED' Turning OFF.
[14:51:09][D][switch:055]: 'LED RED': Sending state OFF
[14:51:13][D][sensor:110]: 'ESP32 WiFi Level': Sending state 92.00000 % with 0 decimals of accuracy
[14:51:15][D][internal_temperature:048]: Ignoring invalid temperature (success=0, value=53.3)
[14:51:17][E][voice_assistant:145]: Error: stt-no-text-recognized - No text recognized
[14:51:17][D][switch:012]: 'LED 1' Turning ON.
[14:51:17][D][switch:055]: 'LED 1': Sending state ON

yaml

#--------------------------------------------------------
i2s_audio:
  i2s_lrclk_pin: GPIO15   #WS
  i2s_bclk_pin: GPIO02    #SCK
  

microphone:
  - platform: i2s_audio
    i2s_din_pin: GPIO04   #SD
    id: Mic

voice_assistant:
  microphone: Mic
  id: VA
  on_start:
    - switch.turn_on: RED

  on_end:
    - switch.turn_off: RED

  on_stt_end:
    - switch.turn_on: GREEN
    - delay: 5s
    - switch.turn_off: GREEN

  on_tts_start:
    - switch.turn_on: BLUE
  on_tts_end:
    - switch.turn_off: BLUE

  on_error:
    - switch.turn_on: LED_1   
    - delay: 10s
    - switch.turn_off: LED_1  


binary_sensor:
  - platform: gpio
    pin: "GPIO05"
    name: "Push"
    filters:
      - delayed_on_off: 500ms
    on_press:
      - voice_assistant.start:
    on_release:
      - voice_assistant.stop:

switch:
  - platform: gpio
    pin: GPIO32
    id: RED
    name: "LED RED"

  - platform: gpio
    pin: GPIO33
    id: GREEN
    name: "LED GREEN"

  - platform: gpio
    pin: GPIO25
    id: BLUE
    name: "LED BLUE"

  - platform: gpio
    pin: GPIO26
    id: LED_1
    name: "LED 1"

  - platform: gpio
    pin: GPIO27
    id: LED_2
    name: "LED 2"

  - platform: gpio
    pin: GPIO14
    id: LED_3
    name: "LED 3"

  - platform: gpio
    pin: GPIO12
    id: LED_4
    name: "LED 4"

  - platform: gpio
    pin: GPIO13
    id: LED_5
    name: "LED 5"

Only PDM microphones (like the one in the M5 Atom Echo) are supported in ESPHome 2023.4.
The next version will support the INMP441.
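
For context, here is a minimal sketch of what a non-PDM declaration could look like once that support is available. It reuses the pins from the original post and the adc_type and pdm options that appear in the configurations later in this thread. Whether these options are accepted depends on your ESPHome version, so treat this as an assumption to check against the documentation rather than a confirmed fix:

```yaml
i2s_audio:
  i2s_lrclk_pin: GPIO15   # WS
  i2s_bclk_pin: GPIO02    # SCK

microphone:
  - platform: i2s_audio
    id: Mic
    i2s_din_pin: GPIO04   # SD
    adc_type: external    # external I2S microphone, not the ESP32's internal ADC
    pdm: false            # the INMP441 outputs standard I2S, not PDM
```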

Thanks, @koying.

I’m using an M5 Atom Echo, but I still get the same error:

[18:13:45][D][media_player:059]: 'Office Atom Echo' - Setting
[18:13:45][D][media_player:063]:   Command: TOGGLE
[18:13:48][E][voice_assistant:145]: Error: stt-no-text-recognized - No text recognized

I’m using the config from the example (https://raw.githubusercontent.com/esphome/media-players/main/m5stack-atom-echo.yaml) with only the tweaks needed to connect to my Wi-Fi network.

It looks like support for pdm: false has been added to ESPHome, but I’m getting the same error as the OP.

Did anyone manage to get this working with the INMP441?

I tried INMP441 with multiple ESP32 versions but did NOT succeed. Here is the YAML that I used.

# MEMS microphone INMP441
i2s_audio:
  i2s_lrclk_pin: GPIO25  # LRCLK, WS, FS
  i2s_bclk_pin: GPIO27   # BCLK, SCK
microphone:
  - platform: i2s_audio
    id: mic_i2s
    i2s_din_pin: GPIO32  # DIN, SDIN, SD, SDATA, ADCDATA
    adc_type: external
    pdm: false     # fairly sure the INMP441 is NOT a PDM microphone
    channel: right  # open or GND on L/R input of INMP441 => left, high level => right
voice_assistant:
  microphone: mic_i2s

binary_sensor:    
  - platform: gpio
    pin: 
      number: GPIO26
      inverted: true
      mode:
        input: true
        pullup: true
    name: Talk Switch
    internal: true
    on_press:
      - voice_assistant.start:
    on_release:
      - voice_assistant.stop:

Today I was super happy: the Android app was finally able to use voice assist without a certificate!

But, bummer: not a single word or command is understood by fast-piper. It always says sorry, it does not understand, whereas the same commands work when I type them. Maybe I have a terrible speech impediment in my own language, or… Does anyone have a clue how to solve this?

You basically need a GPU to enable accurate STT processing with those larger models. CPU processing is too slow.

How do I do that?

I’m also trying to get this to work with ESPHome and an INMP441, without success.

I created a super basic ESPHome YAML to test the INMP441, similar to how koying tested his in esphome/esphome pull request #4775, “Add more configuration for microphones - i2s/pdm/adc” by jesserockz. Right now, mine looks like this:

i2s_audio:
  - id: i2s_in
    i2s_lrclk_pin: GPIO26
    i2s_bclk_pin: GPIO25

microphone:
  - platform: i2s_audio
    adc_type: external
    pdm: false
    id: mic_i2s
    channel: left
    bits_per_sample: 16bit
    i2s_audio_id: i2s_in
    i2s_din_pin: GPIO33

output:
  - platform: gpio
    pin: GPIO17
    id: light_output

switch:
  - platform: gpio
    pin: GPIO27
    name: L-R Switch

light:
  - platform: binary
    name: "Response Light"
    id: response_light
    output: light_output

voice_assistant:
  microphone: mic_i2s

binary_sensor:
  - platform: gpio
    pin: 
      number: GPIO00
      inverted: true
      mode:
        input: true
        pullup: true
    name: Boot Switch
    internal: true
    on_press:
      - voice_assistant.start:
      - light.turn_on: response_light
    on_release:
      - voice_assistant.stop:
      - light.turn_off: response_light

The device looks like this:

I also tried koying’s YAML file for just the microphone, with the same pinouts, and it gave the same results. Every time I try to issue a voice command, I see the response: Error: stt-no-text-recognized - No text recognized.

  • I’ve tried swapping L/R to 3.3V vs. GND and changing the channel: left to channel: right.
  • I’ve tried swapping the entire board from a 38-pin ESP32 to a 30-pin ESP32. Both of these were done with koying’s mic-only config.
  • I’ve tried using minimal wires.
  • I’ve tried changing from 16bit to 32bit.
  • I’ve tried using pdm: true.

I’m running out of things to try. :frowning:

This is not really of much help, but I also ran out of ideas many months ago and have not tried since. All I could conclude was that the INMP441 mic was not compatible.

I JUST GOT IT WORKING!!

I bought a set of 5 of these INMP441 microphones. I just built up another (by soldering the pins on it) and swapped it out with the one I had been using for testing and VOILA:

  1. I was able to speak to it
  2. It sent the recording to HA’s voice assistant
  3. The voice assistant decoded it and performed the operation!

Here’s the ESPHome USB-to-serial debug log data:

[17:55:27][D][binary_sensor:036]: 'Boot Switch': Sending state ON
[17:55:27][D][voice_assistant:132]: Requesting start...
[17:55:27][D][light:036]: 'Response Light' Setting:
[17:55:27][D][light:047]:   State: ON
[17:55:27][D][voice_assistant:111]: Starting...
[17:55:27][D][voice_assistant:154]: Assist Pipeline running
[17:55:28][D][sensor:094]: 'Testing Uptime Raw': Sending state 2099.45703 s with 0 decimals of accuracy
[17:55:29][D][binary_sensor:036]: 'Boot Switch': Sending state OFF
[17:55:29][D][voice_assistant:144]: Signaling stop...
[17:55:29][D][light:036]: 'Response Light' Setting:
[17:55:29][D][light:047]:   State: OFF
[17:55:30][D][voice_assistant:168]: Speech recognised as: " Turn on the office light."
[17:55:30][D][voice_assistant:144]: Signaling stop...
[17:55:30][D][voice_assistant:192]: Response: "Turned on light"
[17:55:30][D][voice_assistant:207]: Response URL: "http://[REDACTED]:8123/api/tts_proxy/c9423eae01959b2af87c0b8d21f861b36e9b0fec_en-us_8f9b84fea8_tts.piper.raw"
[17:55:30][D][voice_assistant:218]: Assist Pipeline ended

Success! I actually found out that you can add something like this to your configuration.yaml file to capture the WAV files from your ESPHome microphone:

assist_pipeline:
  debug_recording_dir: /share/assist_pipeline

That worked as well. I now have a recording of me saying “Turn off the office light”. It sounds pretty bad so I need to figure out how to clean it up but at least it was clear enough that HA understood what I wanted!

Make sure you use 32bit mode for the INMP441. The recording is far clearer in 32bit mode than in 16bit mode:

microphone:
  - platform: i2s_audio
    adc_type: external
    pdm: false
    id: mic_i2s
    channel: right
    bits_per_sample: 32bit
    i2s_din_pin: GPIO33

This is how mine looks, for reference.

Hi SpikeyGG!

Could you share the YAML file?

Thanks

Sure… It’s the same as my post from 6 days ago but with 32bit instead of 16bit:

i2s_audio:
  - id: i2s_in
    i2s_lrclk_pin: GPIO26
    i2s_bclk_pin: GPIO25

microphone:
  - platform: i2s_audio
    adc_type: external
    pdm: false
    id: mic_i2s
    channel: left
    bits_per_sample: 32bit
    i2s_audio_id: i2s_in
    i2s_din_pin: GPIO33

output:
  - platform: gpio
    pin: GPIO17
    id: light_output

switch:
  - platform: gpio
    pin: GPIO27
    name: L-R Switch

light:
  - platform: binary
    name: "Response Light"
    id: response_light
    output: light_output

voice_assistant:
  microphone: mic_i2s

binary_sensor:
  - platform: gpio
    pin: 
      number: GPIO00
      inverted: true
      mode:
        input: true
        pullup: true
    name: Boot Switch
    internal: true
    on_press:
      - voice_assistant.start:
      - light.turn_on: response_light
    on_release:
      - voice_assistant.stop:
      - light.turn_off: response_light

You can use the ‘BOOT’ button to start STT. You can use the software switch called ‘L-R Switch’ to change the microphone orientation; use it as a debugging aid if you don’t get any audio. The idea is to line up channel: left with what the microphone is actually sending.
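
The alignment described here can be sketched as a microphone fragment. The mapping in the comments is the one given earlier in this thread (L/R pin open or at GND gives left, L/R pin high gives right), and the pin number comes from the configuration above. It is an illustration of the idea, not a verified recipe for every board:

```yaml
microphone:
  - platform: i2s_audio
    id: mic_i2s
    adc_type: external
    pdm: false
    bits_per_sample: 32bit
    i2s_din_pin: GPIO33
    # Must match the level on the INMP441's L/R pin:
    #   L/R open or tied to GND -> channel: left
    #   L/R tied to 3.3V        -> channel: right
    channel: left
```

If the recordings are silent, change either the L/R level (for example with the ‘L-R Switch’) or the channel setting, but not both at once, since flipping both leaves the mismatch in place.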

Thanks!
I’m waiting for my INMP441 to arrive so I can test :wink:

Have a good day

I’m also waiting for some INMP441s. Stupid question: will this all work with D1 minis? Thanks, take care.

I don’t see why it wouldn’t work. However, I don’t think the D1 mini has any DACs built in so you probably couldn’t do a stereo speaker output easily like with an ESP32, if you wanted the device to use TTS and respond.

I’m trying to debug my voice assist and getting nowhere, in fact I’m in a worse state now than when I started! Anyone got a log output after the latest updates to HA and ESPHome?

Here’s mine, which I think is saying my voice assist pipeline is not receiving anything at all! Assist itself can control my lights as expected; I just can’t get any speech to it.

[14:40:03][D][binary_sensor:036]: 'Talk Button': Sending state ON
[14:40:03][D][voice_assistant:366]: State changed from IDLE to START_PIPELINE
[14:40:03][D][voice_assistant:372]: Desired state set to START_MICROPHONE
[14:40:03][D][light:036]: 'RGB LED' Setting:
[14:40:03][D][light:047]:   State: ON
[14:40:22][D][binary_sensor:036]: 'Talk Button': Sending state OFF
[14:40:22][D][voice_assistant:366]: State changed from START_PIPELINE to STOP_MICROPHONE
[14:40:22][D][voice_assistant:372]: Desired state set to IDLE
[14:40:22][D][light:036]: 'RGB LED' Setting:
[14:40:22][D][light:047]:   State: OFF

I have an ESP32 dev board, an INMP441, a MAX98357A (and a small speaker), a WS2812 and a switch. Whisper is configured and enabled, as are Piper and OpenWakeWord. Voice Assistant is set up for my HA, and all fields have (what I think are) sensible entries.

If I use some Arduino code to test my microphone and speaker, they seem to work. I’ve added the config to save the recordings as suggested by @SpikeyGG above, and folders are being created, but they are empty. I’ve tried changing left/right and 16/32bit but am getting nowhere, and I wish I had bought an M5Stack ATOM Echo when they were available last week!