Best self-made hardware/settings for a Home Assistant Assist microphone?

pimp1310 · March 22, 2024, 7:46pm

Hello,

I would like to build a self-made Home Assistant microphone. I have already done a bit of tinkering and ended up with the following hardware:

ESP32 Mini [with ESPHome code]
INMP441 microphone
MAX98357A Audio Amplifier
3-watt speakers (for output)

Software:
faster-whisper with CUDA (on GPU, takes 0.96 sec for a command)

The whole thing actually works quite well, but there are two problems. First, the loudspeaker crackles constantly until I give the first command; then the crackling stops, but the output, i.e. the response from Assist, is constantly interrupted by noise.
The main problem, though, is that the microphone simply misunderstands me, regardless of whether I move closer or further away.

I then tested the whole thing with my cell phone and Whisper instead, and Assist understands me perfectly!
So the problem is probably the microphone I chose.

Here is my schematic

Here is my ESPHome sketch


captive_portal:


i2s_audio:
  - id: i2s_in
    i2s_lrclk_pin: GPIO26   #WS / LRC
    i2s_bclk_pin: GPIO25    #SCK /BCLK

microphone:
  - platform: i2s_audio
    adc_type: external
    pdm: false
    id: mic_i2s
    channel: right
    bits_per_sample: 32bit
    i2s_audio_id: i2s_in
    i2s_din_pin: GPIO33    #SD

speaker:
  - platform: i2s_audio
    id: my_speaker
    dac_type: external
    i2s_dout_pin: GPIO22   #DIN 
    mode: mono
    i2s_audio_id: i2s_in


voice_assistant:
  microphone: mic_i2s
  id: va
  noise_suppression_level: 2
  auto_gain: 31dBFS
  volume_multiplier: 4.0
  use_wake_word: false
  speaker: my_speaker
  
  on_error:
    - if:
        condition:
          switch.is_on: use_wake_word
        then:
          - switch.turn_off: use_wake_word
          - switch.turn_on: use_wake_word

  on_client_connected:
    - if:
        condition:
          switch.is_on: use_wake_word
        then:
          - voice_assistant.start_continuous:

  on_client_disconnected:
    - if:
        condition:
          switch.is_on: use_wake_word
        then:
          - voice_assistant.stop:
  

binary_sensor:
  - platform: status
    name: API Connection
    id: api_connection
    filters:
      - delayed_on: 1s
    on_press:
      - if:
          condition:
            switch.is_on: use_wake_word
          then:
            - voice_assistant.start_continuous:
    on_release:
      - if:
          condition:
            switch.is_on: use_wake_word
          then:
            - voice_assistant.stop:


switch:
  - platform: template
    name: Use wake word
    id: use_wake_word
    optimistic: true
    restore_mode: RESTORE_DEFAULT_ON
    entity_category: config
    on_turn_on:
      - lambda: id(va).set_use_wake_word(true);
      - if:
          condition:
            not:
              - voice_assistant.is_running
          then:
            - voice_assistant.start_continuous
    
    on_turn_off:
      - voice_assistant.stop
      - lambda: id(va).set_use_wake_word(false);

Does anyone have an idea of what hardware to use?
I don’t want a ready-made solution, though. I would like to make it myself because I have already designed a housing that I want to print with the 3D printer.

Sir_Goodenough · March 22, 2024, 8:03pm

I moved your post to the Voice category because I believe you can get help there from someone with microphone/speaker settings in ESPHome.
If you look at the ready-made devices in Share your Projects, there are a lot of microphone and speaker settings that could smooth things out for you.

I’m not an expert, but I thought you would have more focused eyes on your problem here.
(Feel free to move it back to hardware if you really think it needs to be there)

mchk · March 22, 2024, 9:25pm

You can save the spoken phrases and evaluate their quality.

All the popular solutions that you can find work about the same way at the moment. The sensitivity of the microphone (or microphone array) is lower than that of smart speakers from large companies.
But at a distance of up to 3 meters, there should be no noticeable problems.
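
A minimal sketch of how the spoken phrases could be saved for inspection, assuming a Home Assistant version whose `assist_pipeline` integration supports the `debug_recording_dir` option. The directory path is only an example and is not taken from this thread. This goes into Home Assistant's `configuration.yaml`, not into the ESPHome device config:

```yaml
# Home Assistant configuration.yaml (sketch, not the ESPHome YAML)
assist_pipeline:
  # Example path (assumption): any directory Home Assistant can write to
  debug_recording_dir: /share/assist_pipeline
```

After a restart, each voice command should leave audio files in that directory. Listening to them shows what the microphone actually delivered to Whisper, for example whether the signal is too quiet, clipped, or noisy.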

robgough1970 · March 23, 2024, 8:26am

Move the I2S pins to pins other than 25 and 26; this has been known to cure the "crackling issue". GPIO25 and GPIO26 are the internal DAC pins of the ESP32, so it is better to avoid using them.
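
As a rough sketch of that suggestion, the I2S clock pins from the first post could be moved as shown here. GPIO27 and GPIO32 are placeholder choices that are not mentioned in this thread; any free, output-capable pins on your board should do, and the wiring has to be changed to match:

```yaml
i2s_audio:
  - id: i2s_in
    # Placeholder pins (assumption), chosen only because they are not the DAC pins 25/26
    i2s_lrclk_pin: GPIO27   # WS / LRC
    i2s_bclk_pin: GPIO32    # SCK / BCLK
```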

Rich37804 · March 23, 2024, 9:00am

Here is my wiring for the same hardware.

i2s_audio:
  - id: i2s_in
    i2s_lrclk_pin: GPIO25
    i2s_bclk_pin: GPIO26
  - id: i2s_out
    i2s_lrclk_pin: GPIO32
    i2s_bclk_pin: GPIO13

microphone:
  platform: i2s_audio 
  id: external_microphone 
  adc_type: external 
  i2s_audio_id: i2s_in
  i2s_din_pin: GPIO34
  pdm: false
  bits_per_sample: 32bit


speaker:
  platform: i2s_audio 
  id: external_speaker 
  dac_type: external
  i2s_audio_id: i2s_out
  i2s_dout_pin: GPIO33

I have 4 of these that work pretty much flawlessly: 2 with generic ESP32 boards and 2 with ESP32-S3 boards.

pimp1310 · March 23, 2024, 9:03pm

@robgough1970
I tried it. At first it worked well, but then the crackling came back when HA gave an answer… and now the ESP crackles and hangs, and I have to power off the device and reboot.

@Rich37804

What do you have connected to these pins?

id: i2s_out
i2s_lrclk_pin: GPIO32
i2s_bclk_pin: GPIO13

Can you show your complete ESPHome code?
And an electrical schematic?

And do you have the ESP32-WROOM-32?
This is my ESPHome device config, is it wrong?

esp32:
  board: esp32dev
  framework:
    type: arduino

pimp1310 · March 24, 2024, 2:42pm

No one?

I need help here.

Rich37804 · March 24, 2024, 3:13pm

ESP32 (WROOM-32)	MAX98357A (Speaker)	I2S Microphone
GPIO33	DIN	-
GPIO12	LRCLK	-
GPIO13	BCLK	-
GPIO34	-	SD
GPIO25	-	WS
GPIO26	-	SCK
3.3V	VDD	VDD
GND	GND	GND
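
Read literally, the table above would correspond to roughly the following ESPHome pin assignments. This is only an interpretation of the table, using the component IDs from the configs posted in this thread. Those configs assign some speaker pins differently, so treat it as a sketch and adjust it to whatever is actually wired:

```yaml
i2s_audio:
  - id: i2s_in              # microphone bus
    i2s_lrclk_pin: GPIO25   # WS
    i2s_bclk_pin: GPIO26    # SCK
  - id: i2s_out             # MAX98357A bus
    i2s_lrclk_pin: GPIO12   # LRCLK, as listed in the table
    i2s_bclk_pin: GPIO13    # BCLK

microphone:
  - platform: i2s_audio
    id: external_microphone
    adc_type: external
    i2s_audio_id: i2s_in
    i2s_din_pin: GPIO34     # SD
    pdm: false
    bits_per_sample: 32bit

speaker:
  - platform: i2s_audio
    id: external_speaker
    dac_type: external
    i2s_audio_id: i2s_out
    i2s_dout_pin: GPIO33    # DIN
    mode: mono
```

The important point is that every pin named in the YAML must match a physical wire, and that the microphone and the amplifier each get their own I2S bus.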

Rich37804 · March 24, 2024, 3:21pm

substitutions:
  voice_assist_idle_phase_id: '1'
  voice_assist_listening_phase_id: '2'
  voice_assist_thinking_phase_id: '3'
  voice_assist_replying_phase_id: '4'
  voice_assist_not_ready_phase_id: '10'
  voice_assist_error_phase_id: '11'
  voice_assist_muted_phase_id: '12'
esphome:
  name: living-room-voice-assistant
  friendly_name: Living Room Voice Assistant
  
  on_boot:
    priority: 600
    then:
      - script.execute: control_led
      - delay: 30s
      - if:
          condition:
            lambda: return id(init_in_progress);
          then:
            - lambda: id(init_in_progress) = false;
            - script.execute: control_led

esp32:
  board: esp32dev
  framework:
    type: esp-idf


# Enable logging
logger:

# Enable Home Assistant API
api:
  encryption:
    key: "snip"

ota:
  password: "snip"

wifi:
  ssid: !secret wifi_ssid
  password: !secret wifi_password

  # Enable fallback hotspot (captive portal) in case wifi connection fails
  ap:
    ssid: "snip"
    password: "snip"

esp_adf:
external_components:
  - source: github://pr#5230
    components:
    - esp_adf 
    refresh: 0s

captive_portal:

light:
  - platform: esp32_rmt_led_strip
    rgb_order: GRB
    pin: GPIO18
    num_leds: 3
    rmt_channel: 0
    chipset: WS2812
    name: "Status LED"
    id: led
    default_transition_length: 0s
    effects:
      - pulse:
          name: "extra_slow_pulse"
          transition_length: 800ms
          update_interval: 800ms
          min_brightness: 0%
          max_brightness: 30%
      - pulse:
          name: "slow_pulse"
          transition_length: 250ms
          update_interval: 250ms
          min_brightness: 50%
          max_brightness: 100%
      - pulse:
          name: "fast_pulse"
          transition_length: 100ms
          update_interval: 100ms
          min_brightness: 50%
          max_brightness: 100%

i2s_audio:
  - id: i2s_in
    i2s_lrclk_pin: GPIO25
    i2s_bclk_pin: GPIO26
  - id: i2s_out
    i2s_lrclk_pin: GPIO32
    i2s_bclk_pin: GPIO13

microphone:
  platform: i2s_audio 
  id: external_microphone 
  adc_type: external 
  i2s_audio_id: i2s_in
  i2s_din_pin: GPIO34
  pdm: false
  bits_per_sample: 32bit


speaker:
  platform: i2s_audio 
  id: external_speaker 
  dac_type: external
  i2s_audio_id: i2s_out
  i2s_dout_pin: GPIO12
  mode: mono 

voice_assistant:
  id: va
  microphone: external_microphone 
  speaker: external_speaker
  use_wake_word: true
  noise_suppression_level: 2
  auto_gain: 31dBFS
  volume_multiplier: 2.5



  on_listening:
    - lambda: id(voice_assistant_phase) = ${voice_assist_listening_phase_id};
    - script.execute: control_led

  on_stt_vad_end:
    - lambda: id(voice_assistant_phase) = ${voice_assist_thinking_phase_id};
    - script.execute: control_led

  on_tts_stream_start:
    - lambda: id(voice_assistant_phase) = ${voice_assist_replying_phase_id};
    - script.execute: control_led

  on_tts_stream_end:
    - lambda: id(voice_assistant_phase) = ${voice_assist_idle_phase_id};
    - script.execute: control_led

  on_error: 
    - if:
        condition:
          lambda: return !id(init_in_progress);
        then:
          - lambda: id(voice_assistant_phase) = ${voice_assist_error_phase_id};
          - script.execute: control_led
          - delay: 1s
          - if:
              condition:
                switch.is_on: use_wake_word
              then:
                - lambda: id(voice_assistant_phase) = ${voice_assist_idle_phase_id};
              else:
                - lambda: id(voice_assistant_phase) = ${voice_assist_muted_phase_id};
          - script.execute: control_led

  on_client_connected: 
    - if:
        condition:
          switch.is_on: use_wake_word
        then:
          - voice_assistant.start_continuous
          - lambda: id(voice_assistant_phase) = ${voice_assist_idle_phase_id};
        else:
          - lambda: id(voice_assistant_phase) = ${voice_assist_muted_phase_id};
    - script.execute: control_led          

  on_client_disconnected: 
    - lambda: id(voice_assistant_phase) = ${voice_assist_not_ready_phase_id};
    - script.execute: control_led

switch:
  - platform: template
    name: Use Wake Word
    id: use_wake_word
    optimistic: true
    restore_mode: RESTORE_DEFAULT_ON
    on_turn_on:
      - if:
          condition:
            lambda: return !id(init_in_progress);
          then:
            - lambda: id(voice_assistant_phase) = ${voice_assist_idle_phase_id};
            - if:
                condition:
                    not:
                      - voice_assistant.is_running
                then:
                  - voice_assistant.start_continuous
            - script.execute: control_led          
 
    on_turn_off:
      - if:
          condition:
            lambda: return !id(init_in_progress);
          then:
            - voice_assistant.stop
            - lambda: id(voice_assistant_phase) = ${voice_assist_muted_phase_id};
            - script.execute: control_led          

globals:
  - id: init_in_progress
    type: bool
    restore_value: no
    initial_value: 'true'
  - id: voice_assistant_phase
    type: int
    restore_value: no
    initial_value: ${voice_assist_not_ready_phase_id}
  
script:
  - id: control_led
    then:
      - if:
          condition:
            lambda: return !id(init_in_progress);
          then:
            - if:
                condition:
                    wifi.connected:
                then:
                  - if:
                      condition:
                          api.connected:
                      then:
                        - lambda: |
                            switch(id(voice_assistant_phase)) {
                              case ${voice_assist_listening_phase_id}:
                                id(led).turn_on().set_rgb(0, 0, 1).set_brightness(1.0).set_effect("none").perform();
                                break;
                              case ${voice_assist_thinking_phase_id}:
                                id(led).turn_on().set_rgb(0, 1, 0).set_effect("slow_pulse").perform();
                                break;
                              case ${voice_assist_replying_phase_id}:
                                id(led).turn_on().set_rgb(0, 0, 1).set_brightness(1.0).set_effect("fast_pulse").perform();
                                break;
                              case ${voice_assist_error_phase_id}:
                                id(led).turn_on().set_rgb(1, 1, 1).set_brightness(.5).set_effect("none").perform();
                                break;
                              case ${voice_assist_muted_phase_id}:
                                id(led).turn_off().perform();
                                break;
                              case ${voice_assist_not_ready_phase_id}:
                                id(led).turn_on().perform();
                                break;
                              default:
                                id(led).turn_on().set_rgb(1, 0, 0).set_brightness(0.2).set_effect("none").perform();
                                break;
                            }
                      else:
                        - light.turn_off:
                            id: led
                else:
                  - light.turn_off:
                      id: led
          else:
            - light.turn_on:
                id: led
                blue: 50%
                red: 50%
                green: 50%
                effect: "fast_pulse"

pimp1310 · March 24, 2024, 5:05pm

Now I use the same code and the same wiring as you. I have no more crashes, but there is no voice from the speaker anymore: I only get crackling as the answer, no understandable speech.

In your setup, do you have the L/R pin of the INMP441 connected to GND?

pimp1310 · March 24, 2024, 5:14pm

Or must this i2s_lrclk_pin: GPIO32 be GPIO33?
There is no wiring for GPIO32.

Rich37804 · March 24, 2024, 6:36pm

Yes, ground to L/R as well. I don’t see an issue with using GPIO33.
Make sure you have a good ground and 5V supply to the amplifier.

pimp1310 · March 24, 2024, 6:58pm

Okay, I will test connecting L/R to GND.

No, I mean that in your wiring table above GPIO32 is not connected to anything, but it is defined in the ESPHome code.
Is this correct?

EDIT

Connecting L/R to GND changed nothing about the problem with the speaker output.

Rich37804 · March 24, 2024, 7:42pm

I tried different configs on a couple devices. Both worked.

pimp1310 · March 24, 2024, 8:03pm

No, maybe I’m expressing myself badly. You posted the connection diagram above, BUT GPIO32 does NOT appear there.
In your ESPHome code, however, GPIO32 is defined for the speaker, under i2s_out:

id: i2s_out
i2s_lrclk_pin: GPIO32
i2s_bclk_pin: GPIO13

That’s what I mean, is that correct?

I have now changed the amplifier supply from 3.3V back to 5V, but it still crackles. Unfortunately I don’t get any speech in response from the voice assistant, just a crackling sound.

Rich37804 · March 24, 2024, 8:23pm

You can use 32 or 33. Either will work.
On some boards, GPIO12 is not a good choice for i2s_dout_pin, so move that to another pin, maybe 32 or 33, whichever one you have open.
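
For background, GPIO12 (MTDI) is a strapping pin on the ESP32 that is sampled at reset to select the flash voltage, which is probably why it causes trouble on some boards. A sketch of the speaker block with the data pin moved, assuming GPIO33 is free on your board and that the `i2s_out` bus is defined as in the config above:

```yaml
speaker:
  - platform: i2s_audio
    id: external_speaker
    dac_type: external
    i2s_audio_id: i2s_out
    # Moved off GPIO12. GPIO33 is an assumption; use GPIO32 instead if 33 is already taken.
    i2s_dout_pin: GPIO33
    mode: mono
```

The DIN wire of the MAX98357A has to be moved to the same pin.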

pimp1310 · March 25, 2024, 1:51pm

Now it works well.

Here is the correct wiring for other users:

wiring

pery · March 25, 2024, 5:37pm

Hi,

Do you have a guide or a proven way to run faster-whisper on a GPU?

pimp1310 · March 26, 2024, 10:45pm

It depends on the system you use, but you need an Nvidia card and Docker.
I bought a GTX1680 OC 4GB for 80€ and put it in my media server running OMV6, where my Whisper runs.

This guide is for Ubuntu.

If you want to use it on OMV6, send me a PM.
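
As a rough illustration of the Nvidia + Docker approach, a Compose file could reserve the GPU for a Whisper container like this. The image name is hypothetical (a Wyoming faster-whisper image built with CUDA support), and the model, language, and data path are example values. The host is assumed to have the Nvidia driver and the NVIDIA Container Toolkit installed:

```yaml
# docker-compose.yml (sketch under the assumptions above)
services:
  whisper:
    # Hypothetical image name: replace with a CUDA-enabled Wyoming faster-whisper image
    image: example/wyoming-whisper-cuda:latest
    # Assumes the image's entrypoint forwards these flags to wyoming-faster-whisper
    command: --model small --language en --device cuda
    ports:
      - "10300:10300"        # Wyoming protocol port used by Home Assistant
    volumes:
      - ./whisper-data:/data # downloaded models are cached here
    restart: unless-stopped
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: 1
              capabilities: [gpu]
```

Home Assistant would then be pointed at the server's IP address and port 10300 through the Wyoming Protocol integration.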

pimp1310 · March 29, 2024, 3:27pm

Is there a specific reason you use “esp-idf”?

I tried to add the media_player component, but it is not supported under the “esp-idf” framework, only under “arduino”.