ESPHome Voice Assistant

Today I was super happy: the Android app was finally able to use voice assist without a certificate!

But bummer: not a single word or command gets understood by fast-piper. It always answers "Sorry, I do not understand", while the same commands work when I type them. Maybe I have a terrible speech impediment in my own language, or… Who has a clue how to solve this?


You basically need a GPU to enable accurate STT processing with those larger models. CPU processing is too slow.


How do I do that?

I’m also trying to get this to work with ESPHome and an INMP441, without success.

I created a super basic ESPHome YAML to test the INMP441, similar to how koying tested his in “Add more configuration for microphones - i2s/pdm/adc” by jesserockz (Pull Request #4775, esphome/esphome on GitHub). Right now, mine looks like this:

i2s_audio:
  - id: i2s_in
    i2s_lrclk_pin: GPIO26
    i2s_bclk_pin: GPIO25

microphone:
  - platform: i2s_audio
    adc_type: external
    pdm: false
    id: mic_i2s
    channel: left
    bits_per_sample: 16bit
    i2s_audio_id: i2s_in
    i2s_din_pin: GPIO33

output:
  - platform: gpio
    pin: GPIO17
    id: light_output

switch:
  - platform: gpio
    pin: GPIO27
    name: L-R Switch

light:
  - platform: binary
    name: "Response Light"
    id: response_light
    output: light_output

voice_assistant:
  microphone: mic_i2s

binary_sensor:
  - platform: gpio
    pin: 
      number: GPIO00
      inverted: true
      mode:
        input: true
        pullup: true
    name: Boot Switch
    internal: true
    on_press:
      - voice_assistant.start:
      - light.turn_on: response_light
    on_release:
      - voice_assistant.stop:
      - light.turn_off: response_light

The device looks like this:

I did try koying’s YAML file for just the microphone too with the same pinouts and it gave the same results. Every time, I see the response: Error: stt-no-text-recognized - No text recognized when trying to issue a voice command.

  • I’ve tried swapping L/R to 3.3V vs. GND and changing the channel: left to channel: right.
  • I’ve tried swapping the entire SOC from a 38-pin ESP32 to a 30-pin ESP32 – both of these were done with koying’s mic-only config.
  • I’ve tried using minimal wires.
  • I’ve tried changing from 16bit to 32bit.
  • I’ve tried using pdm: true.

I’m running out of things to try. :frowning:


Not really much help, but I also ran out of ideas many months ago and have not tried since. All I could conclude was that the INMP441 mic was not compatible.

I JUST GOT IT WORKING!!

I bought a set of 5 of these INMP441 microphones. I just built up another (by soldering the pins on it) and swapped it out with the one I had been using for testing and VOILA:

  1. I was able to speak to it
  2. It sent the recording to HA’s voice assistant
  3. The voice assistant decoded it and performed the operation!

Here’s the ESPHome USB-to-serial debug log data:

[17:55:27][D][binary_sensor:036]: 'Boot Switch': Sending state ON
[17:55:27][D][voice_assistant:132]: Requesting start...
[17:55:27][D][light:036]: 'Response Light' Setting:
[17:55:27][D][light:047]:   State: ON
[17:55:27][D][voice_assistant:111]: Starting...
[17:55:27][D][voice_assistant:154]: Assist Pipeline running
[17:55:28][D][sensor:094]: 'Testing Uptime Raw': Sending state 2099.45703 s with 0 decimals of accuracy
[17:55:29][D][binary_sensor:036]: 'Boot Switch': Sending state OFF
[17:55:29][D][voice_assistant:144]: Signaling stop...
[17:55:29][D][light:036]: 'Response Light' Setting:
[17:55:29][D][light:047]:   State: OFF
[17:55:30][D][voice_assistant:168]: Speech recognised as: " Turn on the office light."
[17:55:30][D][voice_assistant:144]: Signaling stop...
[17:55:30][D][voice_assistant:192]: Response: "Turned on light"
[17:55:30][D][voice_assistant:207]: Response URL: "http://[REDACTED]:8123/api/tts_proxy/c9423eae01959b2af87c0b8d21f861b36e9b0fec_en-us_8f9b84fea8_tts.piper.raw"
[17:55:30][D][voice_assistant:218]: Assist Pipeline ended

Success! I actually found out that you can add something like this to your configuration.yaml file to capture the WAV files from your ESPHome microphone:

assist_pipeline:
  debug_recording_dir: /share/assist_pipeline

That worked as well. I now have a recording of me saying “Turn off the office light”. It sounds pretty bad so I need to figure out how to clean it up but at least it was clear enough that HA understood what I wanted!
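To put a number on how good or bad a capture is, the saved recordings can be inspected offline. Here is a minimal sketch, assuming the debug recordings are 16-bit PCM WAV files (worth checking with any audio tool first). The function name `wav_levels` is made up for this example and is not part of Home Assistant or ESPHome.

```python
import array
import math
import sys
import wave


def wav_levels(path):
    """Return (peak, rms) as fractions of full scale for a 16-bit PCM WAV file."""
    with wave.open(path, "rb") as wav:
        if wav.getsampwidth() != 2:
            raise ValueError("this sketch only handles 16-bit PCM")
        frames = wav.readframes(wav.getnframes())

    samples = array.array("h")  # signed 16-bit integers
    samples.frombytes(frames)
    if sys.byteorder == "big":
        samples.byteswap()  # WAV sample data is little-endian
    if not samples:
        return 0.0, 0.0

    peak = max(abs(s) for s in samples) / 32768
    rms = math.sqrt(sum(s * s for s in samples) / len(samples)) / 32768
    return peak, rms
```

A peak and RMS that are both close to zero would point to a microphone that is delivering silence (wrong channel, wrong pin, or a dead part), while a peak near 1.0 would suggest clipping. Running it on one capture per setting gives a rough way to compare them.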

Make sure you use 32bit mode for the INMP441 – the recording is far clearer with 32bit than with 16bit mode:

microphone:
  - platform: i2s_audio
    adc_type: external
    pdm: false
    id: mic_i2s
    channel: right
    bits_per_sample: 32bit
    i2s_din_pin: GPIO33

This is how mine looks, for reference.


Hi SpikeyGG!

Could you share the YAML file?

Thanks

Sure… It’s the same as my post from 6 days ago but with 32bit instead of 16bit:

i2s_audio:
  - id: i2s_in
    i2s_lrclk_pin: GPIO26
    i2s_bclk_pin: GPIO25

microphone:
  - platform: i2s_audio
    adc_type: external
    pdm: false
    id: mic_i2s
    channel: left
    bits_per_sample: 32bit
    i2s_audio_id: i2s_in
    i2s_din_pin: GPIO33

output:
  - platform: gpio
    pin: GPIO17
    id: light_output

switch:
  - platform: gpio
    pin: GPIO27
    name: L-R Switch

light:
  - platform: binary
    name: "Response Light"
    id: response_light
    output: light_output

voice_assistant:
  microphone: mic_i2s

binary_sensor:
  - platform: gpio
    pin: 
      number: GPIO00
      inverted: true
      mode:
        input: true
        pullup: true
    name: Boot Switch
    internal: true
    on_press:
      - voice_assistant.start:
      - light.turn_on: response_light
    on_release:
      - voice_assistant.stop:
      - light.turn_off: response_light

You can use the ‘BOOT’ button to turn on the STT. You can use the software switch called ‘L-R Switch’ to change the microphone orientation; use this for debugging if you don’t get any audio. The idea is that you’re trying to line up the channel: left setting with the channel the microphone is actually sending on.


Thanks!
Waiting for my INMP441 so I can test :wink:

Have a good day

I’m also waiting for some INMP441s. Stupid question: will this all work with D1 minis? Thanks, take care.

I don’t see why it wouldn’t work. However, I don’t think the D1 mini has any DACs built in so you probably couldn’t do a stereo speaker output easily like with an ESP32, if you wanted the device to use TTS and respond.


I’m trying to debug my voice assist and getting nowhere, in fact I’m in a worse state now than when I started! Anyone got a log output after the latest updates to HA and ESPHome?

Here’s mine, which I think is saying my voice assist pipeline is not receiving anything at all! My assist can control my lights as expected, I just can’t get any STT to it.

[14:40:03][D][binary_sensor:036]: 'Talk Button': Sending state ON
[14:40:03][D][voice_assistant:366]: State changed from IDLE to START_PIPELINE
[14:40:03][D][voice_assistant:372]: Desired state set to START_MICROPHONE
[14:40:03][D][light:036]: 'RGB LED' Setting:
[14:40:03][D][light:047]:   State: ON
[14:40:22][D][binary_sensor:036]: 'Talk Button': Sending state OFF
[14:40:22][D][voice_assistant:366]: State changed from START_PIPELINE to STOP_MICROPHONE
[14:40:22][D][voice_assistant:372]: Desired state set to IDLE
[14:40:22][D][light:036]: 'RGB LED' Setting:
[14:40:22][D][light:047]:   State: OFF
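When comparing logs like this one, it can help to pull out only the voice assistant state changes. The following is a rough sketch that assumes the "State changed from X to Y" line format shown in the log above; the helper name `state_transitions` is invented for this example.

```python
import re

# Matches lines such as:
# [14:40:03][D][voice_assistant:366]: State changed from IDLE to START_PIPELINE
STATE_RE = re.compile(
    r"\[(?P<time>\d{2}:\d{2}:\d{2})\]\[D\]\[voice_assistant:\d+\]: "
    r"State changed from (?P<old>\w+) to (?P<new>\w+)"
)


def state_transitions(log_text):
    """Return a list of (time, old_state, new_state) tuples found in the log."""
    return [
        (m.group("time"), m.group("old"), m.group("new"))
        for m in STATE_RE.finditer(log_text)
    ]
```

Applied to the log above, this yields only two transitions: IDLE to START_PIPELINE at 14:40:03 and START_PIPELINE to STOP_MICROPHONE at 14:40:22. No other state is reported in the 19 seconds between them, which fits the suspicion that the pipeline never got going.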

I have an ESP32 Dev board, INMP441, MAX98357A (and small speaker), WS2812 and switch. Whisper is configured and enabled, as is Piper and OpenWakeWord. Voice Assistant is set for my HA and all fields have (what I think) sensible entries.

If I use some Arduino code to test my microphone and speaker, they seem to work. I’ve added the config to save the recordings as suggested by @SpikeyGG above, and folders are being created, but they are empty. I’ve tried changing Left/Right and 16/32bit, but I’m getting nowhere and wishing I had bought an M5Stack ATOM Echo when they were available last week!
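To see quickly which pipeline runs produced any audio at all, the recording directory can be summarised with a few lines of Python. This is only a sketch: it assumes one sub-folder per run under the configured debug_recording_dir (for example /share/assist_pipeline from the config earlier in the thread) and searches each one recursively for .wav files. The function name `recording_summary` is made up for this example.

```python
from pathlib import Path


def recording_summary(root):
    """Map each sub-folder of `root` to the number of .wav files found beneath it."""
    return {
        folder.name: sum(1 for _ in folder.rglob("*.wav"))
        for folder in sorted(Path(root).iterdir())
        if folder.is_dir()
    }
```

Folders that map to 0 are runs where nothing was saved, which would match audio never reaching the pipeline rather than STT failing to decode it.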

I also get no response from my ESP32 dev board with an INMP441, and I tried various settings. My M5Stack Echo, though, is working perfectly now after today’s ESPHome update. It was also working before the update, apart from the no-response issue that a lot of people had.

@Arh Curious, isn’t it? As I said, it appears as though both my microphone and speaker work, but voice assistant doesn’t like them. I wonder if it’s the ESP32 dev board? Unfortunately I don’t have a spare board of any type to test with at the moment. I’ve been holding off buying one whilst I make up my mind what to go for… But that’s for another post!

I know I said “for another post”, BUT what boards are people using in their Voice Assistant builds? That are working!

On my board there’s an ESP32-WROOM-32 module and I’m using:

  i2s_lrclock: GPIO25
  i2s_bclock: GPIO33
  i2s_din: GPIO32
  i2s_dout: GPIO14
  onboard_led: GPIO02
  onboard_button: GPIO00
  rgb_led: GPIO13
  push_to_talk: GPIO23

Finally got it working, well, mostly! The issue with the board I have is that the i2s clock NEEDS to be on certain pins. When I used one of these pins, I had voice! In my case I used GPIO3; all other pins were unchanged.

Not yet perfect, but it works!

Documentation:

https://www.espressif.com/sites/default/files/documentation/esp32-wroom-32_datasheet_en.pdf
https://www.espressif.com/sites/default/files/documentation/esp32_datasheet_en.pdf

Using the following from @SpikeyGG I’m able to hear what was sent from STT

assist_pipeline:
  debug_recording_dir: /share/assist_pipeline

Do you mean the L-R switch GPIO should be connected to the L/R of the microphone so as to allow switching left or right?

That’s what I did, simply for debug.

Where did you get the updated version from? I saw the bugfix on github, but is this now available via the webinstaller, or do I have to have a local beta version of esphome installed for the compiling/flashing? Thanks in advance :slight_smile: