pimp1310
(Pimp1310)
March 22, 2024, 7:46pm
1
Hello,
I would like to build a DIY Home Assistant microphone. I have already done a bit of tinkering and ended up with the following hardware:
ESP32 Mini (with ESPHome code)
INMP441 microphone
MAX98357A Audio Amplifier
3-watt speakers (for output)
Software:
faster-whisper with CUDA (on GPU, takes 0.96 sec for a command)
The whole thing actually works quite well. However, the loudspeaker crackles constantly until I give the first command. Then the crackling stops, but the output, i.e. the response from Assist, is constantly interrupted by noise.
But the main problem is that I am simply misunderstood through this microphone, regardless of whether I move closer or further away.
I then tested the whole thing with my cell phone and Whisper through Assist instead, and Assist understands me perfectly!
So it probably depends on the microphone I chose.
Here is my schematic
Here is my ESPHome sketch
captive_portal:
i2s_audio:
  - id: i2s_in
    i2s_lrclk_pin: GPIO26 #WS / LRC
    i2s_bclk_pin: GPIO25 #SCK /BCLK
microphone:
  - platform: i2s_audio
    adc_type: external
    pdm: false
    id: mic_i2s
    channel: right
    bits_per_sample: 32bit
    i2s_audio_id: i2s_in
    i2s_din_pin: GPIO33 #SD
speaker:
  - platform: i2s_audio
    id: my_speaker
    dac_type: external
    i2s_dout_pin: GPIO22 #DIN
    mode: mono
    i2s_audio_id: i2s_in
voice_assistant:
  microphone: mic_i2s
  id: va
  noise_suppression_level: 2
  auto_gain: 31dBFS
  volume_multiplier: 4.0
  use_wake_word: false
  speaker: my_speaker
  on_error:
    - if:
        condition:
          switch.is_on: use_wake_word
        then:
          - switch.turn_off: use_wake_word
          - switch.turn_on: use_wake_word
  on_client_connected:
    - if:
        condition:
          switch.is_on: use_wake_word
        then:
          - voice_assistant.start_continuous:
  on_client_disconnected:
    - if:
        condition:
          switch.is_on: use_wake_word
        then:
          - voice_assistant.stop:
binary_sensor:
  - platform: status
    name: API Connection
    id: api_connection
    filters:
      - delayed_on: 1s
    on_press:
      - if:
          condition:
            switch.is_on: use_wake_word
          then:
            - voice_assistant.start_continuous:
    on_release:
      - if:
          condition:
            switch.is_on: use_wake_word
          then:
            - voice_assistant.stop:
switch:
  - platform: template
    name: Use wake word
    id: use_wake_word
    optimistic: true
    restore_mode: RESTORE_DEFAULT_ON
    entity_category: config
    on_turn_on:
      - lambda: id(va).set_use_wake_word(true);
      - if:
          condition:
            not:
              - voice_assistant.is_running
          then:
            - voice_assistant.start_continuous
    on_turn_off:
      - voice_assistant.stop
      - lambda: id(va).set_use_wake_word(false);
Does anyone have an idea of what hardware to use?
I don’t want a ready-made solution, though. I would like to build it myself because I have already designed a housing that I would print with the 3D printer.
I moved your post to the voice category because I believe someone there can help you with the microphone/speaker settings in ESPHome.
If you look at the devices in the Share Your Projects posts, there are a lot of microphone and speaker settings that could smooth things out for you.
I’m not an expert, but I thought you would have more focused eyes on your problem here.
(Feel free to move it back to hardware if you really think it needs to be there)
mchk
March 22, 2024, 9:25pm
3
You can save the spoken phrases and evaluate their quality.
All the popular solutions that you can find work about the same way at the moment. The sensitivity of the microphone (or microphone array) is lower than that of smart speakers from large companies.
But at a distance of up to 3 meters, there should be no noticeable problems.
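One way to save the spoken phrases is the debug recording option of Home Assistant's assist_pipeline integration. This is a minimal sketch for configuration.yaml, not part of the original posts; the directory path is only an example and must exist and be writable:

```yaml
# configuration.yaml on the Home Assistant side (not the ESPHome device)
assist_pipeline:
  # Example path (an assumption); the audio received from the satellite
  # is written here as WAV files so you can listen to what Whisper gets
  debug_recording_dir: /share/assist_pipeline
```

After a restart, each voice command should leave a recording there. If the recordings already sound quiet or distorted, the problem is on the microphone side rather than in Whisper. Remove the option again afterwards, since the recordings accumulate.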
Move the I2S pins to pins other than 25 and 26; this has been known to cure the crackling issue. Pins 25 and 26 are the internal DAC pins of the ESP32, so it is better to avoid using them.
Here is my wiring for the same hardware.
i2s_audio:
  - id: i2s_in
    i2s_lrclk_pin: GPIO25
    i2s_bclk_pin: GPIO26
  - id: i2s_out
    i2s_lrclk_pin: GPIO32
    i2s_bclk_pin: GPIO13
microphone:
  platform: i2s_audio
  id: external_microphone
  adc_type: external
  i2s_audio_id: i2s_in
  i2s_din_pin: GPIO34
  pdm: false
  bits_per_sample: 32bit
speaker:
  platform: i2s_audio
  id: external_speaker
  dac_type: external
  i2s_audio_id: i2s_out
  i2s_dout_pin: GPIO33
I have four of these that work pretty much flawlessly: two with generic ESP32 boards and two with S3 boards.
2 Likes
pimp1310
(Pimp1310)
March 23, 2024, 9:03pm
6
@robgough1970
I tried it. At first it worked well, but then the crackling came back when HA gave an answer. Now the ESP crackles and hangs, and I have to power off the device and reboot it.
@Rich37804
What do you have connected to these pins?
- id: i2s_out
  i2s_lrclk_pin: GPIO32
  i2s_bclk_pin: GPIO13
Can you show your complete ESPHome code?
And an electrical schematic?
And do you have the ESP32-WROOM-32?
This is my ESPHome device. Is it wrong?
esp32:
  board: esp32dev
  framework:
    type: arduino
substitutions:
  voice_assist_idle_phase_id: '1'
  voice_assist_listening_phase_id: '2'
  voice_assist_thinking_phase_id: '3'
  voice_assist_replying_phase_id: '4'
  voice_assist_not_ready_phase_id: '10'
  voice_assist_error_phase_id: '11'
  voice_assist_muted_phase_id: '12'
esphome:
  name: living-room-voice-assistant
  friendly_name: Living Room Voice Assistant
  on_boot:
    priority: 600
    then:
      - script.execute: control_led
      - delay: 30s
      - if:
          condition:
            lambda: return id(init_in_progress);
          then:
            - lambda: id(init_in_progress) = false;
            - script.execute: control_led
esp32:
  board: esp32dev
  framework:
    type: esp-idf
# Enable logging
logger:
# Enable Home Assistant API
api:
  encryption:
    key: "snip"
ota:
  password: "snip"
wifi:
  ssid: !secret wifi_ssid
  password: !secret wifi_password
  # Enable fallback hotspot (captive portal) in case wifi connection fails
  ap:
    ssid: "snip"
    password: "snip"
esp_adf:
external_components:
  - source: github://pr#5230
    components:
      - esp_adf
    refresh: 0s
captive_portal:
light:
  - platform: esp32_rmt_led_strip
    rgb_order: GRB
    pin: GPIO18
    num_leds: 3
    rmt_channel: 0
    chipset: WS2812
    name: "Status LED"
    id: led
    default_transition_length: 0s
    effects:
      - pulse:
          name: "extra_slow_pulse"
          transition_length: 800ms
          update_interval: 800ms
          min_brightness: 0%
          max_brightness: 30%
      - pulse:
          name: "slow_pulse"
          transition_length: 250ms
          update_interval: 250ms
          min_brightness: 50%
          max_brightness: 100%
      - pulse:
          name: "fast_pulse"
          transition_length: 100ms
          update_interval: 100ms
          min_brightness: 50%
          max_brightness: 100%
i2s_audio:
  - id: i2s_in
    i2s_lrclk_pin: GPIO25
    i2s_bclk_pin: GPIO26
  - id: i2s_out
    i2s_lrclk_pin: GPIO32
    i2s_bclk_pin: GPIO13
microphone:
  platform: i2s_audio
  id: external_microphone
  adc_type: external
  i2s_audio_id: i2s_in
  i2s_din_pin: GPIO34
  pdm: false
  bits_per_sample: 32bit
speaker:
  platform: i2s_audio
  id: external_speaker
  dac_type: external
  i2s_audio_id: i2s_out
  i2s_dout_pin: GPIO12
  mode: mono
voice_assistant:
  id: va
  microphone: external_microphone
  speaker: external_speaker
  use_wake_word: true
  noise_suppression_level: 2
  auto_gain: 31dBFS
  volume_multiplier: 2.5
  on_listening:
    - lambda: id(voice_assistant_phase) = ${voice_assist_listening_phase_id};
    - script.execute: control_led
  on_stt_vad_end:
    - lambda: id(voice_assistant_phase) = ${voice_assist_thinking_phase_id};
    - script.execute: control_led
  on_tts_stream_start:
    - lambda: id(voice_assistant_phase) = ${voice_assist_replying_phase_id};
    - script.execute: control_led
  on_tts_stream_end:
    - lambda: id(voice_assistant_phase) = ${voice_assist_idle_phase_id};
    - script.execute: control_led
  on_error:
    - if:
        condition:
          lambda: return !id(init_in_progress);
        then:
          - lambda: id(voice_assistant_phase) = ${voice_assist_error_phase_id};
          - script.execute: control_led
          - delay: 1s
          - if:
              condition:
                switch.is_on: use_wake_word
              then:
                - lambda: id(voice_assistant_phase) = ${voice_assist_idle_phase_id};
              else:
                - lambda: id(voice_assistant_phase) = ${voice_assist_muted_phase_id};
          - script.execute: control_led
  on_client_connected:
    - if:
        condition:
          switch.is_on: use_wake_word
        then:
          - voice_assistant.start_continuous
          - lambda: id(voice_assistant_phase) = ${voice_assist_idle_phase_id};
        else:
          - lambda: id(voice_assistant_phase) = ${voice_assist_muted_phase_id};
    - script.execute: control_led
  on_client_disconnected:
    - lambda: id(voice_assistant_phase) = ${voice_assist_not_ready_phase_id};
    - script.execute: control_led
switch:
  - platform: template
    name: Use Wake Word
    id: use_wake_word
    optimistic: true
    restore_mode: RESTORE_DEFAULT_ON
    on_turn_on:
      - if:
          condition:
            lambda: return !id(init_in_progress);
          then:
            - lambda: id(voice_assistant_phase) = ${voice_assist_idle_phase_id};
            - if:
                condition:
                  not:
                    - voice_assistant.is_running
                then:
                  - voice_assistant.start_continuous
            - script.execute: control_led
    on_turn_off:
      - if:
          condition:
            lambda: return !id(init_in_progress);
          then:
            - voice_assistant.stop
            - lambda: id(voice_assistant_phase) = ${voice_assist_muted_phase_id};
            - script.execute: control_led
globals:
  - id: init_in_progress
    type: bool
    restore_value: no
    initial_value: 'true'
  - id: voice_assistant_phase
    type: int
    restore_value: no
    initial_value: ${voice_assist_not_ready_phase_id}
script:
  - id: control_led
    then:
      - if:
          condition:
            lambda: return !id(init_in_progress);
          then:
            - if:
                condition:
                  wifi.connected:
                then:
                  - if:
                      condition:
                        api.connected:
                      then:
                        - lambda: |
                            switch(id(voice_assistant_phase)) {
                              case ${voice_assist_listening_phase_id}:
                                id(led).turn_on().set_rgb(0, 0, 1).set_brightness(1.0).set_effect("none").perform();
                                break;
                              case ${voice_assist_thinking_phase_id}:
                                id(led).turn_on().set_rgb(0, 1, 0).set_effect("slow_pulse").perform();
                                break;
                              case ${voice_assist_replying_phase_id}:
                                id(led).turn_on().set_rgb(0, 0, 1).set_brightness(1.0).set_effect("fast_pulse").perform();
                                break;
                              case ${voice_assist_error_phase_id}:
                                id(led).turn_on().set_rgb(1, 1, 1).set_brightness(.5).set_effect("none").perform();
                                break;
                              case ${voice_assist_muted_phase_id}:
                                id(led).turn_off().perform();
                                break;
                              case ${voice_assist_not_ready_phase_id}:
                                id(led).turn_on().perform();
                                break;
                              default:
                                id(led).turn_on().set_rgb(1, 0, 0).set_brightness(0.2).set_effect("none").perform();
                                break;
                            }
                      else:
                        - light.turn_off:
                            id: led
                else:
                  - light.turn_off:
                      id: led
          else:
            - light.turn_on:
                id: led
                blue: 50%
                red: 50%
                green: 50%
                effect: "fast_pulse"
1 Like
pimp1310
(Pimp1310)
March 24, 2024, 5:05pm
11
Now I use the same code and the same wiring as you. I have no more crashes, but there is no voice from the speaker anymore: I only get crackling, and the answer is no longer understandable.
In your setup, do you have the L/R pin of the INMP441 connected to GND?
pimp1310
(Pimp1310)
March 24, 2024, 5:14pm
12
Rich37804:
i2s_lrclk_pin: GPIO32
Or must this i2s_lrclk_pin: GPIO32 be GPIO33?
There is no wiring for GPIO32.
Yes, ground to the L/R pin as well. I don't see an issue with using GPIO33.
Make sure you have a good ground and 5V supply to the amplifier.
pimp1310
(Pimp1310)
March 24, 2024, 6:58pm
14
Okay, I will test connecting L/R to GND.
No, I mean that in your wiring table above GPIO32 is not connected to anything, but it is defined in the ESPHome code.
Is this correct?
EDIT
Connecting L/R to GND changed nothing about the problem with the speaker output.
I tried different configs on a couple devices. Both worked.
pimp1310
(Pimp1310)
March 24, 2024, 8:03pm
16
No, maybe I’m expressing myself badly. You posted the connection diagram above, BUT GPIO32 does NOT appear there.
In your ESPHome code, however, GPIO32 is defined under i2s_out, which the speaker uses:
- id: i2s_out
  i2s_lrclk_pin: GPIO32
  i2s_bclk_pin: GPIO13
That’s what I mean. Is that correct?
I have now changed the amplifier supply from 3.3V to 5V again, but it still crackles. Unfortunately I don’t get any sound in response from the voice assistant, just a crackling sound.
You can use 32 or 33. Either will work.
On some boards GPIO12 is not a good choice for i2s_dout_pin, so move that to another pin. Maybe 32 or 33, whichever one you have open.
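A likely reason GPIO12 causes trouble: on the original ESP32 it is a strapping pin that selects the flash voltage at boot, so whatever is wired to it can upset startup. Below is a sketch of the speaker block from the full config above with only the data pin moved. GPIO33 is an assumption; any free, output-capable pin should do:

```yaml
speaker:
  platform: i2s_audio
  id: external_speaker
  dac_type: external
  i2s_audio_id: i2s_out
  # Moved from GPIO12 (strapping pin). GPIO33 is only an example;
  # GPIO34-39 will not work here because they are input-only.
  i2s_dout_pin: GPIO33
  mode: mono
```

The DIN wire of the MAX98357A has to be moved to the same pin on the board, of course.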
1 Like
pimp1310
(Pimp1310)
March 25, 2024, 1:51pm
18
Now it works well.
Here is the correct wiring for other users:
2 Likes
pery
(pery)
March 25, 2024, 5:37pm
19
Hi,
Do you have any guide or proven way to run faster-whisper on a GPU?
pimp1310
(Pimp1310)
March 26, 2024, 10:45pm
20
It depends on the system you use, but you need an Nvidia card and Docker.
I bought a GTX1680 OC 4GB for 80€ and put it in my media server running OMV6, where my Whisper runs.
This guide is for Ubuntu.
If you want to use it on OMV6, write me a PM.
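For orientation, a Docker Compose setup for this could look roughly like the sketch below. It is not the guide mentioned above. The image name is a placeholder for whichever CUDA-enabled build of wyoming-faster-whisper you use, the command line assumes that image follows the same entrypoint convention as the rhasspy/wyoming-whisper image, and the model, language and data path are only examples:

```yaml
# docker-compose.yml sketch. The host needs the NVIDIA driver and the
# NVIDIA Container Toolkit installed.
services:
  whisper:
    # Placeholder name (assumption): replace with a CUDA-enabled image
    image: your-cuda-enabled-wyoming-whisper-image
    # --device cuda asks faster-whisper to run inference on the GPU
    command: --model small --language en --device cuda
    ports:
      - "10300:10300"   # Wyoming protocol port used by Home Assistant
    volumes:
      - ./whisper-data:/data   # example path for downloaded models
    restart: unless-stopped
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: 1
              capabilities: [gpu]
```

In Home Assistant the service is then added through the Wyoming Protocol integration, using the host's IP address and port 10300.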
pimp1310
(Pimp1310)
March 29, 2024, 3:27pm
21
Is there a specific reason you use “esp-idf”?
I tried to add the media_player component, but it is not supported under the “esp-idf” framework, only under “arduino”.
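For comparison, an Arduino-based variant with a media player could look roughly like this. It is only a sketch under assumptions: the pins are taken from the config above, the name and id are hypothetical, and the esp_adf parts of that config would have to be removed because they depend on esp-idf:

```yaml
esp32:
  board: esp32dev
  framework:
    type: arduino   # the i2s_audio media_player needs Arduino here

i2s_audio:
  - id: i2s_out
    i2s_lrclk_pin: GPIO32
    i2s_bclk_pin: GPIO13

media_player:
  - platform: i2s_audio
    name: Voice Assistant Player   # hypothetical name
    id: external_media_player      # hypothetical id
    dac_type: external
    i2s_audio_id: i2s_out
    i2s_dout_pin: GPIO33
    mode: mono
```

The voice_assistant block would then reference it with media_player: external_media_player instead of the speaker: line, since the speaker and the media player cannot both drive the same amplifier pins.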