I’ll be honest, I took a day off from fighting with it yesterday. Integrating the changes supplied by @bkprath and @robgough1970, I would hope for a more stable experience with this board. I think the idea of stopping and restarting the voice_assistant is a good move, so long as it doesn’t introduce long delays.
Going to add wires soldered to the D+ and D- signals on that Onn power/USB board, specifically to enable data connections using the external USB-C connector. Ordered a few of these USB-C connectors to use.
There is a delay between the end of a response and the start of detection for the next wake word, so a lot of the time I have to say the wake word twice. However, I do not know whether that is the result of my change or just the way it works. I also often have to say the wake word twice when initiating the first command. For me stability was the big issue. With the change I no longer get any lost in space moments, the ones that required a restart of the board to get things going again. And audio is a lot better.
Starting and stopping the voice assistant was the issue I was having with the S3 boards. They would work at first, then just not respond after a bit of time.
I removed that code from one of them and left it in the other last night. This morning, the one I removed that code from responded and the one that had the code did not.
This doesn’t seem to affect the other boards but I removed the code anyway.
Here is a vid of me showing the 4 I have built right after each other. Got a bit of choppiness in one. I talk a bit about what I did that I think helps that speaker buffer issue.
Nice video. You mention that you turn off wake word detection when you’re out of the room. Turning wake word detection off and on would actually be similar to my cycling the ESP voice assistant code during command processing. Since I’ve had the system hang problems with the other platforms people are using, it makes sense that some network latency could be what triggers the issue within the ESP voice assistant code. I’m going to see if I get better performance simply by putting my one unit closer to my WiFi router. Which ESP32-S3 are you using? I looked into the code, and the M5StampS3 boards I tried to use are not supported by the code base.
Thanks for posting the responses. I’m trying to get the localai set up now. It is an impressive setup. I’m also looking to see if shifting traffic in my network has an effect. I pulled my ESP32-S3-BOX back out to refresh myself on why I shelved it. Turns out it has both the stutter issue and the lost in space issue. I gave up before because the lost in space issue makes it a useless device. I just made similar changes to the code for it as I did to the code you posted to see if that helps with the issues. The code is a little different because it has the initial wake word detection onboard the ESP. I much prefer your simple LED feedback to the images they present on the S3-BOX. Thanks again.
For me the microphone is the issue. I can’t get any input.
Debug log:
INFO ESPHome 2024.2.0
INFO Reading configuration /config/esphome/esp32-va-001.yaml...
INFO Updating https://github.com/esphome/esphome.git@pull/5230/head
WARNING GPIO3 is a strapping PIN and should only be used for I/O with care.
Attaching external pullup/down resistors to strapping pins can cause unexpected failures.
See https://esphome.io/guides/faq.html#why-am-i-getting-a-warning-about-strapping-pins
INFO Starting log output from 192.168.207.241 using esphome API
INFO Successfully connected to esp32-va-001 @ 192.168.207.241 in 0.191s
INFO Successful handshake with esp32-va-001 @ 192.168.207.241 in 0.310s
[16:22:13][I][app:102]: ESPHome version 2024.2.0 compiled on Feb 27 2024, 16:19:54
[16:22:13][C][wifi:577]: WiFi:
[16:22:13][C][wifi:409]: Local MAC: 3C:84:27:14:86:7C
[16:22:13][C][wifi:414]: SSID: [redacted]
[16:22:13][C][wifi:415]: IP Address: 192.168.207.241
[16:22:13][C][wifi:417]: BSSID: [redacted]
[16:22:13][C][wifi:418]: Hostname: 'esp32-va-001'
[16:22:13][C][wifi:420]: Signal strength: -56 dB ▂▄▆█
[16:22:13][C][wifi:424]: Channel: 11
[16:22:13][C][wifi:425]: Subnet: 255.255.255.0
[16:22:13][C][wifi:426]: Gateway: 192.168.207.1
[16:22:13][C][wifi:427]: DNS1: 192.168.207.130
[16:22:13][C][wifi:428]: DNS2: 0.0.0.0
[16:22:13][C][logger:447]: Logger:
[16:22:13][C][logger:448]: Level: DEBUG
[16:22:13][C][logger:449]: Log Baud Rate: 115200
[16:22:13][C][logger:451]: Hardware UART: USB_SERIAL_JTAG
[16:22:13][C][esp32_rmt_led_strip:175]: ESP32 RMT LED Strip:
[16:22:13][C][esp32_rmt_led_strip:176]: Pin: 18
[16:22:13][C][esp32_rmt_led_strip:177]: Channel: 0
[16:22:13][C][esp32_rmt_led_strip:202]: RGB Order: GRB
[16:22:13][C][esp32_rmt_led_strip:203]: Max refresh rate: 0
[16:22:13][C][esp32_rmt_led_strip:204]: Number of LEDs: 8
[16:22:13][C][light:103]: Light 'Status LED'
[16:22:13][C][light:105]: Default Transition Length: 0.0s
[16:22:13][C][light:106]: Gamma Correct: 2.80
[16:22:13][C][template.switch:068]: Template Switch 'Use Wake Word'
[16:22:13][C][template.switch:091]: Restore Mode: restore defaults to ON
[16:22:13][C][template.switch:057]: Optimistic: YES
[16:22:13][C][psram:020]: PSRAM:
[16:22:13][C][psram:021]: Available: NO
[16:22:13][C][mdns:115]: mDNS:
[16:22:13][C][mdns:116]: Hostname: esp32-va-001
[16:22:13][C][ota:096]: Over-The-Air Updates:
[16:22:13][C][ota:097]: Address: esp32-va-001.local:3232
[16:22:13][C][ota:100]: Using Password.
[16:22:13][C][ota:103]: OTA version: 2.
[16:22:13][C][api:139]: API Server:
[16:22:13][C][api:140]: Address: esp32-va-001.local:6053
[16:22:13][C][api:142]: Using noise encryption: YES
[16:22:17][D][voice_assistant:521]: Event Type: 0
[16:22:17][D][voice_assistant:521]: Event Type: 2
[16:22:17][D][voice_assistant:611]: Assist Pipeline ended
[16:22:17][D][voice_assistant:414]: State changed from STREAMING_MICROPHONE to WAIT_FOR_VAD
[16:22:17][D][voice_assistant:420]: Desired state set to WAITING_FOR_VAD
[16:22:17][D][voice_assistant:172]: Waiting for speech...
[16:22:17][D][voice_assistant:414]: State changed from WAIT_FOR_VAD to WAITING_FOR_VAD
[16:22:17][D][voice_assistant:185]: VAD detected speech
[16:22:17][D][voice_assistant:414]: State changed from WAITING_FOR_VAD to START_PIPELINE
[16:22:17][D][voice_assistant:420]: Desired state set to STREAMING_MICROPHONE
[16:22:17][D][voice_assistant:202]: Requesting start...
[16:22:17][D][voice_assistant:414]: State changed from START_PIPELINE to STARTING_PIPELINE
[16:22:17][D][voice_assistant:435]: Client started, streaming microphone
[16:22:17][D][voice_assistant:414]: State changed from STARTING_PIPELINE to STREAMING_MICROPHONE
[16:22:17][D][voice_assistant:420]: Desired state set to STREAMING_MICROPHONE
[16:22:17][D][voice_assistant:521]: Event Type: 1
[16:22:17][D][voice_assistant:524]: Assist Pipeline running
[16:22:18][D][voice_assistant:521]: Event Type: 9
[16:22:22][D][voice_assistant:521]: Event Type: 0
[16:22:22][D][voice_assistant:521]: Event Type: 2
[16:22:22][D][voice_assistant:611]: Assist Pipeline ended
[16:22:22][D][voice_assistant:414]: State changed from STREAMING_MICROPHONE to WAIT_FOR_VAD
[16:22:22][D][voice_assistant:420]: Desired state set to WAITING_FOR_VAD
[16:22:22][D][voice_assistant:172]: Waiting for speech...
[16:22:22][D][voice_assistant:414]: State changed from WAIT_FOR_VAD to WAITING_FOR_VAD
[16:22:22][D][voice_assistant:185]: VAD detected speech
[16:22:22][D][voice_assistant:414]: State changed from WAITING_FOR_VAD to START_PIPELINE
[16:22:22][D][voice_assistant:420]: Desired state set to STREAMING_MICROPHONE
[16:22:22][D][voice_assistant:202]: Requesting start...
[16:22:22][D][voice_assistant:414]: State changed from START_PIPELINE to STARTING_PIPELINE
[16:22:22][D][voice_assistant:435]: Client started, streaming microphone
[16:22:22][D][voice_assistant:414]: State changed from STARTING_PIPELINE to STREAMING_MICROPHONE
[16:22:22][D][voice_assistant:420]: Desired state set to STREAMING_MICROPHONE
[16:22:23][D][voice_assistant:521]: Event Type: 1
[16:22:23][D][voice_assistant:524]: Assist Pipeline running
[16:22:23][D][voice_assistant:521]: Event Type: 9
[16:22:32][D][voice_assistant:521]: Event Type: 0
[16:22:32][D][voice_assistant:521]: Event Type: 2
[16:22:32][D][voice_assistant:611]: Assist Pipeline ended
[16:22:32][D][voice_assistant:414]: State changed from STREAMING_MICROPHONE to WAIT_FOR_VAD
[16:22:32][D][voice_assistant:420]: Desired state set to WAITING_FOR_VAD
[16:22:33][D][voice_assistant:172]: Waiting for speech...
[16:22:33][D][voice_assistant:414]: State changed from WAIT_FOR_VAD to WAITING_FOR_VAD
[16:22:33][D][voice_assistant:185]: VAD detected speech
[16:22:33][D][voice_assistant:414]: State changed from WAITING_FOR_VAD to START_PIPELINE
[16:22:33][D][voice_assistant:420]: Desired state set to STREAMING_MICROPHONE
[16:22:33][D][voice_assistant:202]: Requesting start...
[16:22:33][D][voice_assistant:414]: State changed from START_PIPELINE to STARTING_PIPELINE
[16:22:33][D][voice_assistant:435]: Client started, streaming microphone
[16:22:33][D][voice_assistant:414]: State changed from STARTING_PIPELINE to STREAMING_MICROPHONE
[16:22:33][D][voice_assistant:420]: Desired state set to STREAMING_MICROPHONE
[16:22:33][D][voice_assistant:521]: Event Type: 1
[16:22:33][D][voice_assistant:524]: Assist Pipeline running
[16:22:33][D][voice_assistant:521]: Event Type: 9
[16:22:38][D][voice_assistant:521]: Event Type: 0
[16:22:38][D][voice_assistant:521]: Event Type: 2
[16:22:38][D][voice_assistant:611]: Assist Pipeline ended
[16:22:38][D][voice_assistant:414]: State changed from STREAMING_MICROPHONE to WAIT_FOR_VAD
[16:22:38][D][voice_assistant:420]: Desired state set to WAITING_FOR_VAD
[16:22:38][D][voice_assistant:172]: Waiting for speech...
[16:22:38][D][voice_assistant:414]: State changed from WAIT_FOR_VAD to WAITING_FOR_VAD
[16:22:38][D][voice_assistant:185]: VAD detected speech
[16:22:38][D][voice_assistant:414]: State changed from WAITING_FOR_VAD to START_PIPELINE
[16:22:38][D][voice_assistant:420]: Desired state set to STREAMING_MICROPHONE
[16:22:38][D][voice_assistant:202]: Requesting start...
[16:22:38][D][voice_assistant:414]: State changed from START_PIPELINE to STARTING_PIPELINE
[16:22:38][D][voice_assistant:435]: Client started, streaming microphone
[16:22:38][D][voice_assistant:414]: State changed from STARTING_PIPELINE to STREAMING_MICROPHONE
[16:22:38][D][voice_assistant:420]: Desired state set to STREAMING_MICROPHONE
[16:22:38][D][voice_assistant:521]: Event Type: 1
[16:22:38][D][voice_assistant:524]: Assist Pipeline running
[16:22:38][D][voice_assistant:521]: Event Type: 9
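One way to read a log like the one above is to measure how often the pipeline ends and restarts. The Python sketch below is only an illustration: it assumes the log has been saved as plain text with the `[HH:MM:SS]` prefix shown above, and the function name `pipeline_end_gaps` is made up for this post.

```python
import re
from datetime import datetime

# Matches lines such as:
# [16:22:17][D][voice_assistant:611]: Assist Pipeline ended
END_RE = re.compile(r"^\[(\d{2}:\d{2}:\d{2})\].*Assist Pipeline ended")


def pipeline_end_gaps(log_text):
    """Return the seconds between consecutive 'Assist Pipeline ended' lines."""
    stamps = []
    for line in log_text.splitlines():
        match = END_RE.match(line.strip())
        if match:
            stamps.append(datetime.strptime(match.group(1), "%H:%M:%S"))
    # Differences between neighbouring timestamps (ignores midnight rollover).
    return [int((b - a).total_seconds()) for a, b in zip(stamps, stamps[1:])]


# A few lines copied from the log above.
sample = """\
[16:22:17][D][voice_assistant:611]: Assist Pipeline ended
[16:22:17][D][voice_assistant:202]: Requesting start...
[16:22:22][D][voice_assistant:611]: Assist Pipeline ended
[16:22:32][D][voice_assistant:611]: Assist Pipeline ended
[16:22:38][D][voice_assistant:611]: Assist Pipeline ended
"""

print(pipeline_end_gaps(sample))  # [5, 10, 6]
```

For the log above, the pipeline ends and restarts every 5 to 10 seconds.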
Have you tried a different mic?
I can tell you from experience, that these mics can be damaged very easily while soldering.
Out of 10 I bought, I wound up with 6 usable ones.
I don’t know how, but after erasing the build files, recompiling, and flashing, it works (somewhat).
I do get many errors:
[16:39:03][D][voice_assistant:349]: Speaker buffer full, trying again next loop
[16:39:03][D][voice_assistant:349]: Speaker buffer full, trying again next loop
[16:39:03][D][voice_assistant:349]: Speaker buffer full, trying again next loop
[16:39:04][D][voice_assistant:349]: Speaker buffer full, trying again next loop
[16:39:04][D][voice_assistant:349]: Speaker buffer full, trying again next loop
[16:39:04][D][voice_assistant:349]: Speaker buffer full, trying again next loop
[16:39:04][D][voice_assistant:349]: Speaker buffer full, trying again next loop
[16:39:04][D][voice_assistant:349]: Speaker buffer full, trying again next loop
[16:39:04][D][voice_assistant:349]: Speaker buffer full, trying again next loop
[16:39:04][D][voice_assistant:349]: Speaker buffer full, trying again next loop
[16:39:04][D][voice_assistant:349]: Speaker buffer full, trying again next loop
[16:39:04][D][voice_assistant:349]: Speaker buffer full, trying again next loop
[16:39:04][D][voice_assistant:349]: Speaker buffer full, trying again next loop
[16:39:04][D][voice_assistant:349]: Speaker buffer full, trying again next loop
[16:39:04][D][voice_assistant:349]: Speaker buffer full, trying again next loop
[16:39:04][D][voice_assistant:349]: Speaker buffer full, trying again next loop
[16:39:04][D][voice_assistant:349]: Speaker buffer full, trying again next loop
[16:39:04][D][voice_assistant:349]: Speaker buffer full, trying again next loop
[16:39:04][D][voice_assistant:349]: Speaker buffer full, trying again next loop
[16:39:04][D][voice_assistant:349]: Speaker buffer full, trying again next loop
[16:39:04][D][voice_assistant:349]: Speaker buffer full, trying again next loop
[16:39:04][D][voice_assistant:349]: Speaker buffer full, trying again next loop
[16:39:04][D][voice_assistant:349]: Speaker buffer full, trying again next loop
[16:39:04][D][voice_assistant:349]: Speaker buffer full, trying again next loop
[16:39:04][D][voice_assistant:349]: Speaker buffer full, trying again next loop
[16:39:04][D][voice_assistant:349]: Speaker buffer full, trying again next loop
[16:39:04][D][voice_assistant:349]: Speaker buffer full, trying again next loop
[16:39:04][D][voice_assistant:349]: Speaker buffer full, trying again next loop
[16:39:04][D][voice_assistant:349]: Speaker buffer full, trying again next loop
[16:39:04][D][voice_assistant:349]: Speaker buffer full, trying again next loop
[16:39:04][D][voice_assistant:521]: Event Type: 99
[16:39:04][D][voice_assistant:670]: TTS stream end
[16:39:04][D][voice_assistant:285]: End of audio stream received
[16:39:04][D][voice_assistant:414]: State changed from STREAMING_RESPONSE to RESPONSE_FINISHED
[16:39:04][D][voice_assistant:420]: Desired state set to RESPONSE_FINISHED
[16:39:04][D][voice_assistant:349]: Speaker buffer full, trying again next loop
[16:39:04][D][voice_assistant:349]: Speaker buffer full, trying again next loop
[16:39:04][D][voice_assistant:349]: Speaker buffer full, trying again next loop
[16:39:04][D][voice_assistant:349]: Speaker buffer full, trying again next loop
[16:39:04][D][voice_assistant:349]: Speaker buffer full, trying again next loop
[16:39:04][D][voice_assistant:349]: Speaker buffer full, trying again next loop
[16:39:04][D][voice_assistant:349]: Speaker buffer full, trying again next loop
[16:39:04][D][voice_assistant:349]: Speaker buffer full, trying again next loop
[16:39:04][D][voice_assistant:349]: Speaker buffer full, trying again next loop
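To compare runs (for example, before and after a config change), it may help to tally the "Speaker buffer full" messages per second instead of eyeballing them. This is a rough Python sketch, assuming the log is saved as plain text in the format shown above; `buffer_full_per_second` is just a name chosen for this example.

```python
import re
from collections import Counter

# Matches lines such as:
# [16:39:04][D][voice_assistant:349]: Speaker buffer full, trying again next loop
FULL_RE = re.compile(r"^\[(\d{2}:\d{2}:\d{2})\].*Speaker buffer full")


def buffer_full_per_second(log_text):
    """Count 'Speaker buffer full' messages for each timestamp in the log."""
    counts = Counter()
    for line in log_text.splitlines():
        match = FULL_RE.match(line.strip())
        if match:
            counts[match.group(1)] += 1
    return dict(counts)


# A shortened sample in the same format as the log above.
sample = """\
[16:39:03][D][voice_assistant:349]: Speaker buffer full, trying again next loop
[16:39:04][D][voice_assistant:349]: Speaker buffer full, trying again next loop
[16:39:04][D][voice_assistant:670]: TTS stream end
[16:39:04][D][voice_assistant:349]: Speaker buffer full, trying again next loop
"""

print(buffer_full_per_second(sample))  # {'16:39:03': 1, '16:39:04': 2}
```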
Sometimes that error will happen with these boards the first couple of times…
What is your HA server on? Hardware wise…
Watch the last video I posted in this thread. I talk about a couple things I think helped with that error.
I’m beginning to wonder about the particular ESP32-S3 board I’m using for my testbed. It seems the Voice Assistant code goes dormant for long periods or is waiting/looping an excessively long time. When the microphone works, it works well. When it doesn’t, it’s usually because the ESP32-S3 isn’t expecting it. The funny thing is, I shifted away from hard-wired/soldered connections to pins and very short jumper wires, specifically to obtain more flexibility/reliability. I have a few more ESP32-S3 boards, and a few more INMP441 microphones. I’ll definitely be trying with them today, and turning on VERY_VERBOSE logging if necessary.
All that being said, my summary opinion is: this Onn speaker makes a GREAT voice assistant presentation. The LEDs appear clearly through the grille mesh, the MIC works well through the grille mesh, and the ESP32 and amplifier boards tuck easily and neatly inside the case. After it’s connected/configured and the ESPHome Voice Assistant code is running reliably, it’s a FANTASTIC project.
What I’m observing after switching to a different microphone board (same manufacturer) is no different than before. It appears Voice_Assistant is executing a loop infinitely. No actual wake word detection is occurring. The following block repeats continuously:
[D][voice_assistant:435]: Client started, streaming microphone
[D][voice_assistant:414]: State changed from STARTING_PIPELINE to STREAMING_MICROPHONE
[D][voice_assistant:420]: Desired state set to STREAMING_MICROPHONE
[D][voice_assistant:521]: Event Type: 1
[D][voice_assistant:524]: Assist Pipeline running
[D][voice_assistant:521]: Event Type: 9
[D][voice_assistant:521]: Event Type: 0
[D][voice_assistant:521]: Event Type: 2
[D][voice_assistant:611]: Assist Pipeline ended
I’m going to double-check my connections and pin assignments, even with a digital meter to ensure connectivity. Then moving to a fresh ESP32-S3 board.
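The bare "Event Type: N" lines in the repeating block are easier to follow with names attached. The Python sketch below decodes them. Only 1, 2 and 99 are confirmed by the log text in this thread ("Assist Pipeline running", "Assist Pipeline ended", "TTS stream end"); the names for 0 and 9 are my assumption about ESPHome's voice assistant event numbering and should be checked against the ESPHome version in use. `decode_events` is a made-up helper name.

```python
import re

# Event numbers as I understand them from ESPHome's voice assistant API.
# 1, 2 and 99 match the log text in this thread; 0 and 9 are an assumption.
EVENT_NAMES = {
    0: "ERROR",
    1: "RUN_START",
    2: "RUN_END",
    9: "WAKE_WORD_START",
    99: "TTS_STREAM_END",
}

EVENT_RE = re.compile(r"Event Type: (\d+)")


def decode_events(log_text):
    """Return a readable name for each 'Event Type: N' line, in order."""
    names = []
    for line in log_text.splitlines():
        match = EVENT_RE.search(line)
        if match:
            number = int(match.group(1))
            names.append(EVENT_NAMES.get(number, f"UNKNOWN({number})"))
    return names


# The repeating block from the post above.
sample = """\
[D][voice_assistant:521]: Event Type: 1
[D][voice_assistant:524]: Assist Pipeline running
[D][voice_assistant:521]: Event Type: 9
[D][voice_assistant:521]: Event Type: 0
[D][voice_assistant:521]: Event Type: 2
[D][voice_assistant:611]: Assist Pipeline ended
"""

print(decode_events(sample))
# ['RUN_START', 'WAKE_WORD_START', 'ERROR', 'RUN_END']
```

If that mapping holds, each loop is: run starts, wake word detection starts, an error arrives, run ends. In that case it would be worth checking the Home Assistant side for the matching error, but treat that as a guess until the numbering is verified.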
I’m dead in the water since switching to a different ESP32-S3 board. It refuses to connect to Wifi, or the Asus ZenWifi guest network is refusing its connections. Either way, it won’t connect.
The blame for the aggravation rests solely with the module/board/ESP32 manufacturer. Two of three ESP32-S3-WROOM-1 boards in a single purchase were BAD. The first would not register correctly with USB, failing to provide a valid Device Descriptor. The second would not connect via WiFi no matter what was attempted. Only the third would actually load and connect. Jarvis is behaving properly for the first time (for me) ever.
Now, if only Voice_Assistant wouldn’t say this when nobody is speaking:
[06:11:35][D][voice_assistant:185]: VAD detected speech
All is working as expected except the speaker just crackles. Occasionally a bit of it is recognisable, but not often. Mostly on the longer responses, the last word is intelligible. The actions are completed as requested 95% of the time. This is using the ESP on-board wake word detection.
This is my code. If anyone has any suggestions for ending the crackling, I would be grateful.