Here is my working config for these S3 Dev boards. Audio quality seems better on these boards, although the speaker buffer issue has happened a couple of times. I’m still not convinced that is a hardware issue anyway…:
Now nothing is connected to the pins on the chip. I’m not sure if that could somehow cause the boot issue? I will probably try connecting all the pins to the microphone and amp just to make sure that isn’t causing this loop.
For the one that isn’t working I’m using the M5StampS3. For the one that is working I’m using the ESP32 you indicated you’re using. I just have a few M5StampS3 boards sitting around and was hoping they might provide a more powerful CPU option. I’m thinking I might have to pick up the other S3 mentioned in this thread if I want the more powerful board. I’m assuming you had the hanging issues with the original configuration you provided. Any chance you’ll try the mods I posted?
What mods did you make to clean things up? Prior to trying your approach, I tried the ATOM Echo Smart Speaker and it hangs. I tried the ESP32 S3 Box3, which was a lot better than the ATOM, but it would hang periodically. My initial build based on your approach would hang, much like the ATOM. I’ve been hoping to get something clean. The mods I provided cleaned up the hangs and cleared the stutters up a lot. I still have some minor stutter, but it’s not bad. The stutter typically happens at the start of the response playback. Longer response phrases play pretty clean. You mentioned OpenAI; I’m curious whether OpenAI works together with the HA voice assistant? I’m planning on looking at that next, assuming the HA voice assistant stays alive over 24 hours.
These S3 boards are just not cutting it. They just die after a bit and need to be unplugged, then plugged back in. The ones built on the older boards are pretty reliable.
I’ll be honest, I took a day off from fighting with it yesterday. After integrating the changes supplied by @bkprath and @robgough1970, I would hope for a more stable experience with this board. I think the idea of stopping and restarting the voice_assistant is a good move, so long as it doesn’t introduce long delays.
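For anyone wanting to try the stop/restart idea, here is a minimal sketch of how it could look in ESPHome YAML. It is not the exact change posted by @bkprath or @robgough1970; the IDs `va`, `i2s_mic` and `i2s_speaker` and both delay values are placeholders chosen for illustration, and the microphone and speaker definitions are assumed to exist elsewhere in the config.

```yaml
# Sketch only: cycle the voice assistant after every pipeline run.
# IDs and delay values are assumptions; adjust them to your own config.
voice_assistant:
  id: va
  microphone: i2s_mic      # assumed ID of an existing microphone
  speaker: i2s_speaker     # assumed ID of an existing speaker
  use_wake_word: true
  on_end:
    # Let any response audio finish before touching the pipeline.
    - delay: 100ms
    - wait_until:
        not:
          speaker.is_playing:
    # Stop, pause briefly, then resume continuous listening.
    - voice_assistant.stop:
    - delay: 500ms
    - voice_assistant.start_continuous:
```

The second delay is the part to watch: the longer it is, the longer the device is deaf to the wake word after each response.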
Going to add wires soldered to the D+ and D- signals on that Onn power/USB board, specifically to enable data connections using the external USB-C connector. Ordered a few of these USB-C connectors to use.
There is a delay between the completion of a response and the start of detection of the next wake word, as a lot of times I have to say the wake word twice. However, I do not know if that is the result of my change or just the way it would work anyway. A lot of times I also have to say the wake word twice when initiating the first command. For me stability was a big issue. With the change I don’t have any “lost in space” moments, the ones that required a restart of the board to get things going again. And audio is a lot better.
Starting and stopping the voice assistant was the issue I was having with the S3 boards. They would work at first, then just not respond after a bit of time.
I removed that code from one of them and left it in the other last night. This morning, the one I removed that code from responded and the one that had the code did not.
This doesn’t seem to affect the other boards but I removed the code anyway.
Here is a vid of me showing the 4 I have built right after each other. Got a bit of choppiness in one. I talk a bit about what I did that I think helps that speaker buffer issue.
Nice video. You mention that you turn off wake word detection when you’re out of the room. Turning wake word detection off and on would actually be similar to my cycling the ESP voice assistant code during command processing. Since I’ve had the system hang problems with the other platforms people are using, it makes sense that some network latency could be what triggers the issue within the ESP voice assistant code. I’m going to see if I get better performance simply by putting my one unit closer to my WiFi router. Which ESP32-S3 are you using? I looked into the code, and the M5StampS3 boards I tried to use are not supported by the code base.
Thanks for posting the responses. I’m trying to get LocalAI set up now. It is an impressive setup. I’m also looking to see if shifting traffic in my network has an effect. I pulled my ESP32-S3-BOX back out to refresh myself on why I shelved it. Turns out it has both the stutter issue and the lost in space issue. I gave up before because the lost in space issue makes it a useless device. I just made similar changes to the code for it as I did to the code you posted to see if that helps with the issues. The code is a little different because it has the initial wake word detection onboard the ESP. I much prefer your simple LED feedback to the images they present on the S3-BOX. Thanks again.
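As a rough illustration of what toggling wake word detection does on the device, a template switch along these lines stops the voice assistant when turned off and restarts continuous listening when turned on. This is a sketch, not the config from the video; it assumes a `voice_assistant:` component with the hypothetical ID `va` is defined elsewhere.

```yaml
# Sketch only: a switch that cycles wake word listening.
# Assumes a voice_assistant component with id "va" exists elsewhere.
switch:
  - platform: template
    name: Use Wake Word
    id: use_wake_word
    optimistic: true
    restore_mode: RESTORE_DEFAULT_ON
    on_turn_on:
      - lambda: id(va).set_use_wake_word(true);
      # Only start listening if a pipeline is not already running.
      - if:
          condition:
            not:
              - voice_assistant.is_running:
          then:
            - voice_assistant.start_continuous:
    on_turn_off:
      - voice_assistant.stop:
      - lambda: id(va).set_use_wake_word(false);
```

Flipping this switch from Home Assistant gives the same stop/start cycle as doing it from code, which makes it a convenient way to test whether a restart clears a hang.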
For me the microphone is the issue. I can’t get any input.
Debug log:
INFO ESPHome 2024.2.0
INFO Reading configuration /config/esphome/esp32-va-001.yaml...
INFO Updating https://github.com/esphome/esphome.git@pull/5230/head
WARNING GPIO3 is a strapping PIN and should only be used for I/O with care.
Attaching external pullup/down resistors to strapping pins can cause unexpected failures.
See https://esphome.io/guides/faq.html#why-am-i-getting-a-warning-about-strapping-pins
INFO Starting log output from 192.168.207.241 using esphome API
INFO Successfully connected to esp32-va-001 @ 192.168.207.241 in 0.191s
INFO Successful handshake with esp32-va-001 @ 192.168.207.241 in 0.310s
[16:22:13][I][app:102]: ESPHome version 2024.2.0 compiled on Feb 27 2024, 16:19:54
[16:22:13][C][wifi:577]: WiFi:
[16:22:13][C][wifi:409]: Local MAC: 3C:84:27:14:86:7C
[16:22:13][C][wifi:414]: SSID: [redacted]
[16:22:13][C][wifi:415]: IP Address: 192.168.207.241
[16:22:13][C][wifi:417]: BSSID: [redacted]
[16:22:13][C][wifi:418]: Hostname: 'esp32-va-001'
[16:22:13][C][wifi:420]: Signal strength: -56 dB ▂▄▆█
[16:22:13][C][wifi:424]: Channel: 11
[16:22:13][C][wifi:425]: Subnet: 255.255.255.0
[16:22:13][C][wifi:426]: Gateway: 192.168.207.1
[16:22:13][C][wifi:427]: DNS1: 192.168.207.130
[16:22:13][C][wifi:428]: DNS2: 0.0.0.0
[16:22:13][C][logger:447]: Logger:
[16:22:13][C][logger:448]: Level: DEBUG
[16:22:13][C][logger:449]: Log Baud Rate: 115200
[16:22:13][C][logger:451]: Hardware UART: USB_SERIAL_JTAG
[16:22:13][C][esp32_rmt_led_strip:175]: ESP32 RMT LED Strip:
[16:22:13][C][esp32_rmt_led_strip:176]: Pin: 18
[16:22:13][C][esp32_rmt_led_strip:177]: Channel: 0
[16:22:13][C][esp32_rmt_led_strip:202]: RGB Order: GRB
[16:22:13][C][esp32_rmt_led_strip:203]: Max refresh rate: 0
[16:22:13][C][esp32_rmt_led_strip:204]: Number of LEDs: 8
[16:22:13][C][light:103]: Light 'Status LED'
[16:22:13][C][light:105]: Default Transition Length: 0.0s
[16:22:13][C][light:106]: Gamma Correct: 2.80
[16:22:13][C][template.switch:068]: Template Switch 'Use Wake Word'
[16:22:13][C][template.switch:091]: Restore Mode: restore defaults to ON
[16:22:13][C][template.switch:057]: Optimistic: YES
[16:22:13][C][psram:020]: PSRAM:
[16:22:13][C][psram:021]: Available: NO
[16:22:13][C][mdns:115]: mDNS:
[16:22:13][C][mdns:116]: Hostname: esp32-va-001
[16:22:13][C][ota:096]: Over-The-Air Updates:
[16:22:13][C][ota:097]: Address: esp32-va-001.local:3232
[16:22:13][C][ota:100]: Using Password.
[16:22:13][C][ota:103]: OTA version: 2.
[16:22:13][C][api:139]: API Server:
[16:22:13][C][api:140]: Address: esp32-va-001.local:6053
[16:22:13][C][api:142]: Using noise encryption: YES
[16:22:17][D][voice_assistant:521]: Event Type: 0
[16:22:17][D][voice_assistant:521]: Event Type: 2
[16:22:17][D][voice_assistant:611]: Assist Pipeline ended
[16:22:17][D][voice_assistant:414]: State changed from STREAMING_MICROPHONE to WAIT_FOR_VAD
[16:22:17][D][voice_assistant:420]: Desired state set to WAITING_FOR_VAD
[16:22:17][D][voice_assistant:172]: Waiting for speech...
[16:22:17][D][voice_assistant:414]: State changed from WAIT_FOR_VAD to WAITING_FOR_VAD
[16:22:17][D][voice_assistant:185]: VAD detected speech
[16:22:17][D][voice_assistant:414]: State changed from WAITING_FOR_VAD to START_PIPELINE
[16:22:17][D][voice_assistant:420]: Desired state set to STREAMING_MICROPHONE
[16:22:17][D][voice_assistant:202]: Requesting start...
[16:22:17][D][voice_assistant:414]: State changed from START_PIPELINE to STARTING_PIPELINE
[16:22:17][D][voice_assistant:435]: Client started, streaming microphone
[16:22:17][D][voice_assistant:414]: State changed from STARTING_PIPELINE to STREAMING_MICROPHONE
[16:22:17][D][voice_assistant:420]: Desired state set to STREAMING_MICROPHONE
[16:22:17][D][voice_assistant:521]: Event Type: 1
[16:22:17][D][voice_assistant:524]: Assist Pipeline running
[16:22:18][D][voice_assistant:521]: Event Type: 9
[16:22:22][D][voice_assistant:521]: Event Type: 0
[16:22:22][D][voice_assistant:521]: Event Type: 2
[16:22:22][D][voice_assistant:611]: Assist Pipeline ended
[16:22:22][D][voice_assistant:414]: State changed from STREAMING_MICROPHONE to WAIT_FOR_VAD
[16:22:22][D][voice_assistant:420]: Desired state set to WAITING_FOR_VAD
[16:22:22][D][voice_assistant:172]: Waiting for speech...
[16:22:22][D][voice_assistant:414]: State changed from WAIT_FOR_VAD to WAITING_FOR_VAD
[16:22:22][D][voice_assistant:185]: VAD detected speech
[16:22:22][D][voice_assistant:414]: State changed from WAITING_FOR_VAD to START_PIPELINE
[16:22:22][D][voice_assistant:420]: Desired state set to STREAMING_MICROPHONE
[16:22:22][D][voice_assistant:202]: Requesting start...
[16:22:22][D][voice_assistant:414]: State changed from START_PIPELINE to STARTING_PIPELINE
[16:22:22][D][voice_assistant:435]: Client started, streaming microphone
[16:22:22][D][voice_assistant:414]: State changed from STARTING_PIPELINE to STREAMING_MICROPHONE
[16:22:22][D][voice_assistant:420]: Desired state set to STREAMING_MICROPHONE
[16:22:23][D][voice_assistant:521]: Event Type: 1
[16:22:23][D][voice_assistant:524]: Assist Pipeline running
[16:22:23][D][voice_assistant:521]: Event Type: 9
[16:22:32][D][voice_assistant:521]: Event Type: 0
[16:22:32][D][voice_assistant:521]: Event Type: 2
[16:22:32][D][voice_assistant:611]: Assist Pipeline ended
[16:22:32][D][voice_assistant:414]: State changed from STREAMING_MICROPHONE to WAIT_FOR_VAD
[16:22:32][D][voice_assistant:420]: Desired state set to WAITING_FOR_VAD
[16:22:33][D][voice_assistant:172]: Waiting for speech...
[16:22:33][D][voice_assistant:414]: State changed from WAIT_FOR_VAD to WAITING_FOR_VAD
[16:22:33][D][voice_assistant:185]: VAD detected speech
[16:22:33][D][voice_assistant:414]: State changed from WAITING_FOR_VAD to START_PIPELINE
[16:22:33][D][voice_assistant:420]: Desired state set to STREAMING_MICROPHONE
[16:22:33][D][voice_assistant:202]: Requesting start...
[16:22:33][D][voice_assistant:414]: State changed from START_PIPELINE to STARTING_PIPELINE
[16:22:33][D][voice_assistant:435]: Client started, streaming microphone
[16:22:33][D][voice_assistant:414]: State changed from STARTING_PIPELINE to STREAMING_MICROPHONE
[16:22:33][D][voice_assistant:420]: Desired state set to STREAMING_MICROPHONE
[16:22:33][D][voice_assistant:521]: Event Type: 1
[16:22:33][D][voice_assistant:524]: Assist Pipeline running
[16:22:33][D][voice_assistant:521]: Event Type: 9
[16:22:38][D][voice_assistant:521]: Event Type: 0
[16:22:38][D][voice_assistant:521]: Event Type: 2
[16:22:38][D][voice_assistant:611]: Assist Pipeline ended
[16:22:38][D][voice_assistant:414]: State changed from STREAMING_MICROPHONE to WAIT_FOR_VAD
[16:22:38][D][voice_assistant:420]: Desired state set to WAITING_FOR_VAD
[16:22:38][D][voice_assistant:172]: Waiting for speech...
[16:22:38][D][voice_assistant:414]: State changed from WAIT_FOR_VAD to WAITING_FOR_VAD
[16:22:38][D][voice_assistant:185]: VAD detected speech
[16:22:38][D][voice_assistant:414]: State changed from WAITING_FOR_VAD to START_PIPELINE
[16:22:38][D][voice_assistant:420]: Desired state set to STREAMING_MICROPHONE
[16:22:38][D][voice_assistant:202]: Requesting start...
[16:22:38][D][voice_assistant:414]: State changed from START_PIPELINE to STARTING_PIPELINE
[16:22:38][D][voice_assistant:435]: Client started, streaming microphone
[16:22:38][D][voice_assistant:414]: State changed from STARTING_PIPELINE to STREAMING_MICROPHONE
[16:22:38][D][voice_assistant:420]: Desired state set to STREAMING_MICROPHONE
[16:22:38][D][voice_assistant:521]: Event Type: 1
[16:22:38][D][voice_assistant:524]: Assist Pipeline running
[16:22:38][D][voice_assistant:521]: Event Type: 9
Have you tried a different mic?
I can tell you from experience that these mics can be damaged very easily while soldering.
Out of 10 I bought, I wound up with 6 usable ones.
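Before writing a mic off as dead, it may also be worth double-checking the microphone section of the config. The fragment below is only a sketch for an external I2S mic on an ESP32-S3; every pin number and ID is a made-up placeholder, and the right `channel` and `bits_per_sample` depend on the specific mic and how its select pin is wired.

```yaml
# Sketch only: all pins and IDs are assumptions, not a known-good wiring.
# Pins chosen here avoid the ESP32-S3 strapping pins (GPIO0, 3, 45, 46).
i2s_audio:
  - id: i2s_in
    i2s_lrclk_pin: GPIO4   # word select (WS)
    i2s_bclk_pin: GPIO5    # bit clock (SCK)

microphone:
  - platform: i2s_audio
    id: i2s_mic
    i2s_audio_id: i2s_in
    adc_type: external
    i2s_din_pin: GPIO6     # data out from the mic (SD)
    pdm: false
    channel: left          # try "right" if there is no input at all
    bits_per_sample: 32bit
```

If the mic stays silent with both channel settings and known-good wiring, a damaged part becomes the more likely explanation.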