ESPHome Voice Assistant

I tried to use an ESP32 with an INMP441 MEMS microphone as a voice assistant, but I always get the error "Error: stt-no-text-recognized - No text recognized".

Log:

14:51:03 [D] [binary_sensor:036] 'Push': Sending state ON
14:51:03 [D] [voice_assistant:065] Requesting start...
14:51:03 [D] [voice_assistant:045] Starting...
14:51:03 [D] [voice_assistant:083] Assist Pipeline running
14:51:03 [D] [switch:012] 'LED RED' Turning ON.
14:51:03 [D] [switch:055] 'LED RED': Sending state ON
14:51:05 [D] [binary_sensor:036] 'Push': Sending state OFF
14:51:05 [D] [voice_assistant:073] Signaling stop...
14:51:07 [D] [sensor:110] 'sensor_wifi_signal': Sending state -54.00000 dBm with 0 decimals of accuracy
14:51:09 [D] [switch:016] 'LED RED' Turning OFF.
14:51:09 [D] [switch:055] 'LED RED': Sending state OFF
14:51:13 [D] [sensor:110] 'ESP32 WiFi Level': Sending state 92.00000 % with 0 decimals of accuracy
14:51:15 [D] [internal_temperature:048] Ignoring invalid temperature (success=0, value=53.3)
14:51:17 [E] [voice_assistant:145] Error: stt-no-text-recognized - No text recognized
14:51:17 [D] [switch:012] 'LED 1' Turning ON.
14:51:17 [D] [switch:055] 'LED 1': Sending state ON

YAML:

#--------------------------------------------------------
i2s_audio:
  i2s_lrclk_pin: GPIO15   #WS
  i2s_bclk_pin: GPIO02    #SCK
  

microphone:
  - platform: i2s_audio
    i2s_din_pin: GPIO04   #SD
    id: Mic

voice_assistant:
  microphone: Mic
  id: VA
  on_start:
    - switch.turn_on: RED

  on_end:
    - switch.turn_off: RED

  on_stt_end:
    - switch.turn_on: GREEN
    - delay: 5s
    - switch.turn_off: GREEN

  on_tts_start:
    - switch.turn_on: BLUE
  on_tts_end:
    - switch.turn_off: BLUE

  on_error:
    - switch.turn_on: LED_1
    - delay: 10s
    - switch.turn_off: LED_1


binary_sensor:
  - platform: gpio
    pin: "GPIO05"
    name: "Push"
    filters:
      - delayed_on_off: 500ms
    on_press:
      - voice_assistant.start:
    on_release:
      - voice_assistant.stop:

switch:
  - platform: gpio
    pin: GPIO32
    id: RED
    name: "LED RED"

  - platform: gpio
    pin: GPIO33
    id: GREEN
    name: "LED GREEN"

  - platform: gpio
    pin: GPIO25
    id: BLUE
    name: "LED BLUE"

  - platform: gpio
    pin: GPIO26
    id: LED_1
    name: "LED 1"

  - platform: gpio
    pin: GPIO27
    id: LED_2
    name: "LED 2"

  - platform: gpio
    pin: GPIO14
    id: LED_3
    name: "LED 3"

  - platform: gpio
    pin: GPIO12
    id: LED_4
    name: "LED 4"

  - platform: gpio
    pin: GPIO13
    id: LED_5
    name: "LED 5"
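Once audio does get through, it helps to see what speech-to-text actually heard rather than only whether the pipeline failed. A minimal sketch, assuming these lines are merged into the existing `voice_assistant:` block above: ESPHome passes the recognized text to `on_stt_end` as the variable `x`, and `logger.log` prints it. The format string is arbitrary, and the existing GREEN LED actions can stay alongside it.

```yaml
voice_assistant:
  microphone: Mic
  id: VA
  on_stt_end:
    # x holds the text returned by speech-to-text (std::string)
    - logger.log:
        format: "STT text: '%s'"
        args: ['x.c_str()']
    - switch.turn_on: GREEN
    - delay: 5s
    - switch.turn_off: GREEN
```

If this line never appears and the stt-no-text-recognized error is logged instead, the problem is most likely the audio coming from the microphone, not the speech-to-text engine.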

Only PDM microphones (like the M5 Atom Echo) are supported in ESPHome 2023.4.
The next version will support the INMP441.
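For releases that do support non-PDM I2S microphones, the OP's microphone block would need to declare the INMP441 as an external, non-PDM device. A minimal sketch using the OP's pins and `id`, assuming the installed ESPHome version already accepts the `adc_type` and `pdm` options (older versions reject them):

```yaml
i2s_audio:
  i2s_lrclk_pin: GPIO15   # WS
  i2s_bclk_pin: GPIO02    # SCK

microphone:
  - platform: i2s_audio
    id: Mic
    i2s_din_pin: GPIO04   # SD
    adc_type: external    # separate I2S microphone, not the ESP32's internal ADC
    pdm: false            # the INMP441 outputs standard I2S, not PDM
```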

Thanks, @koying!

I’m using an M5 Atom Echo, but I still get the same error:

[18:13:45][D][media_player:059]: 'Office Atom Echo' - Setting
[18:13:45][D][media_player:063]:   Command: TOGGLE
[18:13:48][E][voice_assistant:145]: Error: stt-no-text-recognized - No text recognized

I'm using the config from the example (https://raw.githubusercontent.com/esphome/media-players/main/m5stack-atom-echo.yaml), changing only the settings needed to connect to my Wi-Fi network.

It looks like support for `pdm: false` has been added in ESPHome, but I'm also getting the same error as the OP.


Did anyone manage to get this working with the INMP441?

I tried the INMP441 with several ESP32 boards but did NOT succeed. Here is the YAML I used:

# MEMS microphone INMP441
i2s_audio:
  i2s_lrclk_pin: GPIO25  # LRCLK, WS, FS
  i2s_bclk_pin: GPIO27   # BCLK, SCK
microphone:
  - platform: i2s_audio
    id: mic_i2s
    i2s_din_pin: GPIO32  # DIN, SDIN, SD, SDATA, ADCDATA
    adc_type: external
    pdm: false     # quite sure the INMP441 is NOT a PDM microphone!
    channel: right  # open or GND on L/R input of INMP441 => left, high level => right
voice_assistant:
  microphone: mic_i2s

binary_sensor:    
  - platform: gpio
    pin: 
      number: GPIO26
      inverted: true
      mode:
        input: true
        pullup: true
    name: Talk Switch
    internal: true
    on_press:
      - voice_assistant.start:
    on_release:
      - voice_assistant.stop:
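One thing worth trying when the INMP441 is wired correctly but nothing is recognized is to change how its samples are read and to raise the input level. A sketch, assuming an ESPHome release that has the `bits_per_sample` microphone option and the `noise_suppression_level`, `auto_gain` and `volume_multiplier` voice_assistant options (none of these appear in the configs above, and older releases will reject them). The values are starting points to tune, not known-good settings; the `i2s_audio` and `binary_sensor` sections stay as they are.

```yaml
microphone:
  - platform: i2s_audio
    id: mic_i2s
    i2s_din_pin: GPIO32
    adc_type: external
    pdm: false
    channel: right            # must match how the L/R pin is wired
    bits_per_sample: 32bit    # the INMP441 sends 24-bit samples in 32-bit slots

voice_assistant:
  microphone: mic_i2s
  noise_suppression_level: 2  # 0 (off) to 4
  auto_gain: 31dBFS           # 0dBFS disables automatic gain
  volume_multiplier: 2.0      # fixed gain applied to the microphone samples
```

If recognition is still empty, flipping `channel` between `left` and `right` is a quick check, since reading the unused channel yields silence.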

Today I was super happy: the Android app was finally able to use Assist without a certificate!

But what a bummer: not a single word or command gets understood by faster-whisper; the reply is always that it does not understand, while the same commands work when I type them. Maybe I have a terrible speech impediment in my own language, or…? Does anyone have a clue how to solve this?


You basically need a GPU to enable accurate STT processing with those larger models. CPU processing is too slow.


How do I do that?