Maybe post the log or describe the problem? Can’t help otherwise
Thanks. That did the trick
The reason the ESP32-S3-BOX-3 code and the code I adapted for the balls do not use the onboard WS2812 is that these devices have displays to indicate the modes. I did, however, update my code with:
- a switch for the LED so it can be used from automations (alerts).
- manual stop/start added to the back button.
- the top text line (the question) removed when replying; it is enough that it shows while thinking.
- some new graphics, linked at the top of the code.
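For the first item in the list, here is a minimal sketch of how a switch could expose the onboard LED to automations. It is not the exact code from the linked yaml. The pin, LED count and colour order are taken from the device log further down the thread (pin 48, 1 LED, GRB); the ids `onboard_led` and `alert_led` and the red alert colour are assumptions made up for illustration.

```yaml
# Sketch only: ids and the alert colour are hypothetical.
light:
  - platform: esp32_rmt_led_strip
    id: onboard_led
    name: "Xiaozhi Ball"
    pin: GPIO48          # pin reported in the device log
    num_leds: 1
    chipset: ws2812
    rgb_order: GRB

switch:
  - platform: template
    id: alert_led
    name: "Alert LED"
    optimistic: true
    # Home Assistant automations can toggle this switch to signal an alert.
    turn_on_action:
      - light.turn_on:
          id: onboard_led
          brightness: 100%
          red: 100%
          green: 0%
          blue: 0%
    turn_off_action:
      - light.turn_off: onboard_led
```

With something like this in place, an automation only has to call `switch.turn_on` on the switch entity to light the LED.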
v1: PasteZen - Share Code Effortlessly
v2: PasteZen - Share Code Effortlessly
The link for the alternative graphics is at the top of each yaml.
To use the alternative graphics, uncomment the one of your choice in the yaml, comment out the original, and copy the files to that path in esphome.
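As a rough illustration of the comment/uncomment step, the swap could look like the sketch below. The substitution name, image id and file paths are hypothetical and will differ from the ones in the actual yaml; only the pattern of commenting out one line and enabling another is the point. The 240x240 size matches the display dimensions reported in the device log.

```yaml
# Sketch only: names and paths are placeholders, not the ones from the real yaml.
substitutions:
  # Original graphic, now commented out:
  # idle_image_file: xiaozhi/original/idle.png
  # Alternative graphic, uncommented; the file must exist at this path
  # relative to the ESPHome config folder:
  idle_image_file: xiaozhi/alternative/idle.png

image:
  - file: ${idle_image_file}
    id: idle_image
    type: RGB565
    resize: 240x240
```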
oooh I like that
NICE Project!
Actually THAT IS EPIC!!
Hi, I have a V1 version and installed this code on it.
It’s working fine with Home Assistant Cloud and the wake word Okay Nabu.
If I change the wake word or the assistant, the ball does not react at all, even after a reboot.
Do you know what’s wrong?
Any logs?
I can’t see anything wrong off the top of my head. I am running V2, but that shouldn’t make a difference.
Hi there,
I think I may have overestimated my abilities with this ‘xiaozhi’.
I have the version without a stand, with “en-5” written underneath. So I tried installing V1 (also gave V2 and some other versions I found a shot), but no luck — the screen just stays black.
Here’s the log at the end of the V1 install:
INFO Successfully compiled program.
INFO Connecting to 192.168.1.128 port 3232...
INFO Connected to 192.168.1.128
INFO Uploading /data/build/esphome-web-61a85c/.pioenvs/esphome-web-61a85c/firmware.bin (3324000 bytes)
ERROR Error binary size: Error: The OTA partition on the ESP is too small. ESPHome needs to resize this partition, please flash over USB.
And the one after reboot:
INFO ESPHome 2025.5.2
INFO Reading configuration /config/esphome/esphome-web-61a85c.yaml...
WARNING GPIO0 is a strapping PIN and should only be used for I/O with care.
Attaching external pullup/down resistors to strapping pins can cause unexpected failures.
See https://esphome.io/guides/faq.html#why-am-i-getting-a-warning-about-strapping-pins
WARNING GPIO3 is a strapping PIN and should only be used for I/O with care.
Attaching external pullup/down resistors to strapping pins can cause unexpected failures.
See https://esphome.io/guides/faq.html#why-am-i-getting-a-warning-about-strapping-pins
INFO Starting log output from 192.168.1.128 using esphome API
INFO Successfully connected to esphome-web-61a85c @ 192.168.1.128 in 0.081s
INFO Successful handshake with esphome-web-61a85c @ 192.168.1.128 in 0.015s
[07:36:50][I][app:100]: ESPHome version 2025.4.0 compiled on Apr 28 2025, 22:41:43
[07:36:50][I][app:102]: Project esphome.web version 25.4.1
[07:36:50][C][wifi:600]: WiFi:
[07:36:50][C][wifi:428]: Local MAC: 98:3D:AE:61:A8:5C
[07:36:50][C][wifi:433]: SSID: [redacted]
[07:36:50][C][wifi:436]: IP Address: 192.168.1.128
[07:36:50][C][wifi:439]: BSSID: [redacted]
[07:36:50][C][wifi:441]: Hostname: 'esphome-web-61a85c'
[07:36:50][C][wifi:443]: Signal strength: -61 dB ▂▄▆█
[07:36:50][C][wifi:447]: Channel: 1
[07:36:50][C][wifi:448]: Subnet: 255.255.255.0
[07:36:50][C][wifi:449]: Gateway: 192.168.1.254
[07:36:50][C][wifi:450]: DNS1: 8.8.8.8
[07:36:50][C][wifi:451]: DNS2: 0.0.0.0
[07:36:50][C][logger:177]: Logger:
[07:36:50][C][logger:178]: Max Level: DEBUG
[07:36:50][C][logger:179]: Initial Level: DEBUG
[07:36:50][C][logger:181]: Log Baud Rate: 115200
[07:36:50][C][logger:182]: Hardware UART: USB_SERIAL_JTAG
[07:36:50][C][esp32_ble:418]: ESP32 BLE:
[07:36:50][C][esp32_ble:419]: MAC address: 98:3D:AE:61:A8:5E
[07:36:50][C][esp32_ble:421]: IO Capability: none
[07:36:50][C][captive_portal:089]: Captive Portal:
[07:37:28][I][safe_mode:041]: Boot seems successful; resetting boot loop counter
[07:37:28][D][esp32.preferences:114]: Saving 1 preferences to flash...
[07:37:28][D][esp32.preferences:142]: Saving 1 preferences to flash: 0 cached, 1 written, 0 failed
Well, after struggling and trying a bunch of different codes, I ended up flashing via USB instead of wirelessly, and it worked.
Hehe, I was going to say: use USB for the first flash.
Hi, I tested:
HA Cloud, Okay Nabu, switch on light → OK
Changed to Hey Jarvis → no reaction
Changed back to Okay Nabu → OK
Changed to OpenAI → no reaction
Changed back to HA Cloud → no reaction
A little addition: changing to Hey Mycroft works, just Hey Jarvis does not.
I would really like to use ChatGPT. The wake word is not important.
INFO ESPHome 2025.5.2
INFO Reading configuration /config/esphome/esphome-web-303c68.yaml…
WARNING GPIO0 is a strapping PIN and should only be used for I/O with care.
Attaching external pullup/down resistors to strapping pins can cause unexpected failures.
See https://esphome.io/guides/faq.html#why-am-i-getting-a-warning-about-strapping-pins
WARNING GPIO3 is a strapping PIN and should only be used for I/O with care.
Attaching external pullup/down resistors to strapping pins can cause unexpected failures.
See https://esphome.io/guides/faq.html#why-am-i-getting-a-warning-about-strapping-pins
INFO Starting log output from 192.168.0.xxx using esphome API
INFO Successfully connected to esphome-web-303c68 @ 192.168.0.xxx in 0.086s
INFO Successful handshake with esphome-web-303c68 @ 192.168.0.xxx in 0.231s
[10:45:13][I][app:115]: ESPHome version 2025.5.2 compiled on Jun 11 2025, 12:27:10
[10:45:13][C][wifi:600]: WiFi:
[10:45:13][C][wifi:428]: Local MAC: 94:A9:90:30:3C:68
[10:45:13][C][wifi:433]: SSID: ‘XXXXXX’[redacted]
[10:45:13][C][wifi:436]: IP Address: 192.168.0.xxx
[10:45:13][C][wifi:439]: BSSID: F6:92:BF:98:96:F5[redacted]
[10:45:13][C][wifi:441]: Hostname: ‘esphome-web-303c68’
[10:45:13][C][wifi:443]: Signal strength: -52 dB ▂▄▆█
[10:45:13][C][wifi:447]: Channel: 11
[10:45:13][C][wifi:448]: Subnet: 255.255.255.0
[10:45:13][C][wifi:449]: Gateway: 192.168.0.xxx
[10:45:13][C][wifi:450]: DNS1: 192.168.0.xxx
[10:45:13][C][wifi:451]: DNS2: 0.0.0.0
[10:45:13][C][logger:224]: Logger:
[10:45:13][C][logger:225]: Max Level: DEBUG
[10:45:13][C][logger:226]: Initial Level: DEBUG
[10:45:13][C][logger:228]: Log Baud Rate: 115200
[10:45:13][C][logger:229]: Hardware UART: USB_SERIAL_JTAG
[10:45:13][C][logger:233]: Task Log Buffer Size: 768
[10:45:13][C][spi:068]: SPI bus:
[10:45:13][C][spi:069]: CLK Pin: GPIO14
[10:45:13][C][spi:070]: SDI Pin:
[10:45:13][C][spi:071]: SDO Pin: GPIO17
[10:45:13][C][spi:076]: Using HW SPI: SPI2_HOST
[10:45:13][C][ledc.output:180]: LEDC Output:
[10:45:13][C][ledc.output:181]: Pin GPIO3
[10:45:13][C][ledc.output:182]: LEDC Channel: 0
[10:45:13][C][ledc.output:183]: PWM Frequency: 1000.0 Hz
[10:45:13][C][ledc.output:184]: Phase angle: 0.0°
[10:45:13][C][ledc.output:185]: Bit depth: 14
[10:45:13][C][esp32_rmt_led_strip:250]: ESP32 RMT LED Strip:
[10:45:13][C][esp32_rmt_led_strip:251]: Pin: 48
[10:45:13][C][esp32_rmt_led_strip:253]: RMT Symbols: 192
[10:45:13][C][esp32_rmt_led_strip:281]: RGB Order: GRB
[10:45:13][C][esp32_rmt_led_strip:282]: Max refresh rate: 0
[10:45:13][C][esp32_rmt_led_strip:283]: Number of LEDs: 1
[10:45:13][C][template.select:065]: Template Select ‘Wake word engine location’
[10:45:13][C][template.select:065]: Icon: ‘mdi:account-voice’
[10:45:13][C][template.select:066]: Update Interval: 60.0s
[10:45:13][C][template.select:069]: Optimistic: YES
[10:45:13][C][template.select:070]: Initial Option: On device
[10:45:13][C][template.select:071]: Restore Value: YES
[10:45:13][C][template.text_sensor:020]: Template Sensor ‘text_request’
[10:45:13][C][template.text_sensor:020]: Template Sensor ‘text_response’
[10:45:13][C][ili9xxx:091]: ili9xxx
[10:45:13][C][ili9xxx:091]: Rotations: 0 °
[10:45:13][C][ili9xxx:091]: Dimensions: 240px x 240px
[10:45:13][C][ili9xxx:092]: Width Offset: 0
[10:45:13][C][ili9xxx:093]: Height Offset: 0
[10:45:13][C][ili9xxx:099]: Color mode: 16bit
[10:45:13][C][ili9xxx:108]: Data rate: 40MHz
[10:45:13][C][ili9xxx:110]: Reset Pin: GPIO18
[10:45:13][C][ili9xxx:111]: CS Pin: GPIO13
[10:45:13][C][ili9xxx:112]: DC Pin: GPIO10
[10:45:13][C][ili9xxx:114]: Color order: BGR
[10:45:13][C][ili9xxx:115]: Swap_xy: NO
[10:45:13][C][ili9xxx:116]: Mirror_x: YES
[10:45:13][C][ili9xxx:117]: Mirror_y: NO
[10:45:13][C][ili9xxx:118]: Invert colors: YES
[10:45:13][C][ili9xxx:123]: Update Interval: never
[10:45:13][C][gpio.binary_sensor:015]: GPIO Binary Sensor ‘left_top_button’
[10:45:13][C][gpio.binary_sensor:016]: Pin: GPIO0
[10:45:13][C][light:092]: Light ‘Screen’
[10:45:13][C][light:094]: Default Transition Length: 0.2s
[10:45:13][C][light:095]: Gamma Correct: 2.80
[10:45:13][C][light:092]: Light ‘Xiaozhi Ball’
[10:45:13][C][light:094]: Default Transition Length: 0.0s
[10:45:13][C][light:095]: Gamma Correct: 2.80
[10:45:13][C][template.switch:068]: Template Switch ‘Mute’
[10:45:13][C][template.switch:070]: Icon: ‘mdi:microphone-off’
[10:45:13][C][template.switch:090]: Restore Mode: restore defaults to OFF
[10:45:13][C][template.switch:057]: Optimistic: YES
[10:45:13][C][template.switch:068]: Template Switch ‘timer_ringing’
[10:45:13][C][template.switch:090]: Restore Mode: always OFF
[10:45:13][C][template.switch:057]: Optimistic: YES
[10:45:13][C][psram:018]: PSRAM:
[10:45:13][C][psram:022]: Available: YES
[10:45:13][C][psram:024]: Size: 8192 KB
[10:45:13][C][factory_reset.button:011]: Factory Reset Button ‘factory_reset_btn’
[10:45:13][C][factory_reset.button:011]: Icon: ‘mdi:restart-alert’
[10:45:13][C][captive_portal:089]: Captive Portal:
[10:45:13][C][web_server:285]: Web Server:
[10:45:13][C][web_server:286]: Address: esphome-web-303c68.local:80
[10:45:13][C][mdns:120]: mDNS:
[10:45:13][C][mdns:121]: Hostname: esphome-web-303c68
[10:45:13][C][esphome.ota:073]: Over-The-Air updates:
[10:45:13][C][esphome.ota:074]: Address: esphome-web-303c68.local:3232
[10:45:13][C][esphome.ota:075]: Version: 2
[10:45:13][C][safe_mode:018]: Safe Mode:
[10:45:13][C][safe_mode:019]: Boot considered successful after 60 seconds
[10:45:13][C][safe_mode:021]: Invoke after 10 boot attempts
[10:45:13][C][safe_mode:022]: Remain in safe mode for 300 seconds
[10:45:13][C][api:170]: API Server:
[10:45:13][C][api:171]: Address: esphome-web-303c68.local:6053
[10:45:13][C][api:178]: Using noise encryption: NO
[10:45:13][C][micro_wake_word:064]: microWakeWord:
[10:45:13][C][micro_wake_word:065]: models:
[10:45:13][C][micro_wake_word:014]: - Wake Word: Okay Nabu
[10:45:13][C][micro_wake_word:015]: Probability cutoff: 0.97
[10:45:13][C][micro_wake_word:016]: Sliding window size: 5
[10:45:13][C][micro_wake_word:014]: - Wake Word: Hey Mycroft
[10:45:13][C][micro_wake_word:015]: Probability cutoff: 0.95
[10:45:13][C][micro_wake_word:016]: Sliding window size: 5
[10:45:13][C][micro_wake_word:014]: - Wake Word: Hey Jarvis
[10:45:13][C][micro_wake_word:015]: Probability cutoff: 0.97
[10:45:13][C][micro_wake_word:016]: Sliding window size: 5
[10:45:30][D][micro_wake_word:325]: Detected ‘Okay Nabu’ with sliding average probability is 0.99 and max probability is 1.00
[10:45:30][D][voice_assistant:456]: State changed from IDLE to START_MICROPHONE
[10:45:30][D][voice_assistant:463]: Desired state set to START_PIPELINE
[10:45:30][D][micro_wake_word:370]: Stopping wake word detection
[10:45:30][D][voice_assistant:186]: Starting Microphone
[10:45:30][D][ring_buffer:034]: Created ring buffer with size 16384
[10:45:30][D][voice_assistant:456]: State changed from START_MICROPHONE to STARTING_MICROPHONE
[10:45:30][D][micro_wake_word:378]: State changed from DETECTING_WAKE_WORD to STOPPING
[10:45:30][D][voice_assistant:456]: State changed from STARTING_MICROPHONE to START_PIPELINE
[10:45:30][D][micro_wake_word:273]: Inference task is stopping, deallocating buffers
[10:45:30][D][micro_wake_word:278]: Inference task is finished, freeing task resources
[10:45:30][D][micro_wake_word:378]: State changed from STOPPING to STOPPED
[10:45:30][D][voice_assistant:207]: Requesting start…
[10:45:30][D][voice_assistant:456]: State changed from START_PIPELINE to STARTING_PIPELINE
[10:45:30][D][voice_assistant:478]: Client started, streaming microphone
[10:45:30][D][voice_assistant:456]: State changed from STARTING_PIPELINE to STREAMING_MICROPHONE
[10:45:30][D][voice_assistant:463]: Desired state set to STREAMING_MICROPHONE
[10:45:30][D][voice_assistant:598]: Event Type: 1
[10:45:30][D][voice_assistant:601]: Assist Pipeline running
[10:45:30][D][voice_assistant:598]: Event Type: 3
[10:45:30][D][voice_assistant:612]: STT started
[10:45:30][D][text_sensor:064]: ‘text_request’: Sending state ‘…’
[10:45:30][D][text_sensor:064]: ‘text_response’: Sending state ‘…’
[10:45:30][D][light:036]: ‘Screen’ Setting:
[10:45:30][D][light:047]: State: ON
[10:45:30][D][light:085]: Transition length: 0.2s
[10:45:35][D][voice_assistant:598]: Event Type: 11
[10:45:35][D][voice_assistant:758]: Starting STT by VAD
[10:45:37][D][voice_assistant:598]: Event Type: 12
[10:45:37][D][voice_assistant:762]: STT by VAD end
[10:45:37][D][voice_assistant:456]: State changed from STREAMING_MICROPHONE to STOP_MICROPHONE
[10:45:37][D][voice_assistant:463]: Desired state set to AWAITING_RESPONSE
[10:45:37][D][voice_assistant:456]: State changed from STOP_MICROPHONE to STOPPING_MICROPHONE
[10:45:37][D][voice_assistant:598]: Event Type: 4
[10:45:37][D][voice_assistant:626]: Speech recognised as: “Schalte Regalbeleuchtung ein.”
[10:45:37][D][voice_assistant:456]: State changed from STOPPING_MICROPHONE to AWAITING_RESPONSE
[10:45:37][D][text_sensor:064]: ‘text_request’: Sending state ‘Schalte Regalbeleuchtung ein.’
[10:45:37][D][i2s_audio.microphone:443]: Task finished, freeing resources and uninstalling I2S driver
[10:45:37][D][voice_assistant:598]: Event Type: 5
[10:45:37][D][voice_assistant:631]: Intent started
[10:45:37][D][voice_assistant:598]: Event Type: 6
[10:45:37][D][voice_assistant:598]: Event Type: 7
[10:45:37][D][voice_assistant:656]: Response: “Regalbeleuchtung eingeschaltet”
[10:45:37][D][text_sensor:064]: ‘text_response’: Sending state ‘Regalbeleuchtung eingeschaltet’
[10:45:37][D][voice_assistant:598]: Event Type: 8
[10:45:37][D][voice_assistant:678]: Response URL: “http://192.168.0.xxx:8123/api/tts_proxy/m1x7OWK813i1kNCEssIPOg.flac”
[10:45:37][D][voice_assistant:456]: State changed from AWAITING_RESPONSE to STREAMING_RESPONSE
[10:45:37][D][voice_assistant:463]: Desired state set to STREAMING_RESPONSE
[10:45:37][D][media_player:074]: ‘Xiaozhi Ball’ - Setting
[10:45:37][D][media_player:081]: Media URL: http://192.168.0.xxx:8123/api/tts_proxy/m1x7OWK813i1kNCEssIPOg.flac
[10:45:37][D][media_player:087]: Announcement: yes
[10:45:37][D][speaker_media_player:408]: State changed to ANNOUNCING
[10:45:37][D][voice_assistant:598]: Event Type: 2
[10:45:37][D][voice_assistant:697]: Assist Pipeline ended
[10:45:38][D][ring_buffer:034][ann_read]: Created ring buffer with size 1000000
[10:45:38][D][speaker_media_player.pipeline:114]: Reading FLAC file type
[10:45:38][D][speaker_media_player.pipeline:124]: Decoded audio has 1 channels, 48000 Hz sample rate, and 16 bits per sample
[10:45:38][D][ring_buffer:034][speaker_task]: Created ring buffer with size 96000
[10:45:38][D][i2s_audio.speaker:117]: Starting Speaker
[10:45:38][D][i2s_audio.speaker:122]: Started Speaker
[10:45:40][D][speaker_media_player:408]: State changed to IDLE
[10:45:40][D][voice_assistant:329]: Announcement finished playing
[10:45:40][D][voice_assistant:456]: State changed from STREAMING_RESPONSE to RESPONSE_FINISHED
[10:45:40][D][voice_assistant:463]: Desired state set to RESPONSE_FINISHED
[10:45:40][D][voice_assistant:456]: State changed from RESPONSE_FINISHED to IDLE
[10:45:40][D][voice_assistant:463]: Desired state set to IDLE
[10:45:41][D][i2s_audio.speaker:129]: Stopping Speaker
[10:45:41][D][i2s_audio.speaker:135]: Stopped Speaker
[10:45:41][D][micro_wake_word:360]: Starting wake word detection
[10:45:42][D][text_sensor:064]: ‘text_request’: Sending state ‘’
[10:45:42][D][text_sensor:064]: ‘text_response’: Sending state ‘’
[10:45:42][D][micro_wake_word:378]: State changed from STOPPED to STARTING
[10:45:42][D][ring_buffer:034][mww]: Created ring buffer with size 3840
[10:45:42][D][micro_wake_word:261]: Inference task has started, attempting to allocate memory for buffers
[10:45:42][D][micro_wake_word:266]: Inference task is running
[10:45:42][D][micro_wake_word:378]: State changed from STARTING to DETECTING_WAKE_WORD
[10:45:42][D][i2s_audio.microphone:431]: Task started, attempting to allocate buffer
[10:45:42][D][i2s_audio.microphone:436]: Task is running and reading data
[10:45:44][D][esp32.preferences:114]: Saving 1 preferences to flash…
[10:45:44][D][esp32.preferences:142]: Saving 1 preferences to flash: 0 cached, 1 written, 0 failed
[10:45:53][D][voice_assistant:885]: Enabled wake word: Hey Jarvis (id=micro_wake_word_wakewordmodel_id_3)
[10:46:14][D][voice_assistant:885]: Enabled wake word: Okay Nabu (id=micro_wake_word_wakewordmodel_id)
[10:46:18][D][micro_wake_word:325]: Detected ‘Okay Nabu’ with sliding average probability is 0.98 and max probability is 1.00
[10:46:18][D][voice_assistant:456]: State changed from IDLE to START_MICROPHONE
[10:46:18][D][voice_assistant:463]: Desired state set to START_PIPELINE
[10:46:18][D][micro_wake_word:370]: Stopping wake word detection
[10:46:18][D][voice_assistant:186]: Starting Microphone
[10:46:18][D][ring_buffer:034]: Created ring buffer with size 16384
[10:46:18][D][voice_assistant:456]: State changed from START_MICROPHONE to STARTING_MICROPHONE
[10:46:18][D][micro_wake_word:378]: State changed from DETECTING_WAKE_WORD to STOPPING
[10:46:18][D][voice_assistant:456]: State changed from STARTING_MICROPHONE to START_PIPELINE
[10:46:18][D][micro_wake_word:273]: Inference task is stopping, deallocating buffers
[10:46:18][D][micro_wake_word:278]: Inference task is finished, freeing task resources
[10:46:18][D][micro_wake_word:378]: State changed from STOPPING to STOPPED
[10:46:18][D][voice_assistant:207]: Requesting start…
[10:46:18][D][voice_assistant:456]: State changed from START_PIPELINE to STARTING_PIPELINE
[10:46:18][D][voice_assistant:478]: Client started, streaming microphone
[10:46:18][D][voice_assistant:456]: State changed from STARTING_PIPELINE to STREAMING_MICROPHONE
[10:46:18][D][voice_assistant:463]: Desired state set to STREAMING_MICROPHONE
[10:46:18][D][voice_assistant:598]: Event Type: 1
[10:46:18][D][voice_assistant:601]: Assist Pipeline running
[10:46:18][D][voice_assistant:598]: Event Type: 3
[10:46:18][D][voice_assistant:612]: STT started
[10:46:18][D][text_sensor:064]: ‘text_request’: Sending state ‘…’
[10:46:18][D][text_sensor:064]: ‘text_response’: Sending state ‘…’
[10:46:19][D][light:036]: ‘Screen’ Setting:
[10:46:19][D][light:085]: Transition length: 0.2s
[10:46:25][D][voice_assistant:598]: Event Type: 11
[10:46:25][D][voice_assistant:758]: Starting STT by VAD
[10:46:26][D][voice_assistant:598]: Event Type: 12
[10:46:26][D][voice_assistant:762]: STT by VAD end
[10:46:26][D][voice_assistant:456]: State changed from STREAMING_MICROPHONE to STOP_MICROPHONE
[10:46:26][D][voice_assistant:463]: Desired state set to AWAITING_RESPONSE
[10:46:26][D][voice_assistant:456]: State changed from STOP_MICROPHONE to STOPPING_MICROPHONE
[10:46:26][D][voice_assistant:598]: Event Type: 0
[10:46:26][E][voice_assistant:730]: Error: stt-no-text-recognized - No text recognized
[10:46:26][D][voice_assistant:580]: Signaling stop…
[10:46:26][D][voice_assistant:456]: State changed from STOPPING_MICROPHONE to STOP_MICROPHONE
[10:46:26][D][voice_assistant:463]: Desired state set to IDLE
[10:46:26][D][voice_assistant:456]: State changed from STOP_MICROPHONE to IDLE
[10:46:26][D][i2s_audio.microphone:443]: Task finished, freeing resources and uninstalling I2S driver
[10:46:26][D][voice_assistant:598]: Event Type: 2
[10:46:26][D][voice_assistant:697]: Assist Pipeline ended
[10:46:27][D][micro_wake_word:360]: Starting wake word detection
[10:46:27][D][text_sensor:064]: ‘text_request’: Sending state ‘’
[10:46:27][D][text_sensor:064]: ‘text_response’: Sending state ‘’
[10:46:27][D][micro_wake_word:378]: State changed from STOPPED to STARTING
[10:46:27][D][ring_buffer:034][mww]: Created ring buffer with size 3840
[10:46:27][D][micro_wake_word:261]: Inference task has started, attempting to allocate memory for buffers
[10:46:27][D][micro_wake_word:266]: Inference task is running
[10:46:27][D][micro_wake_word:378]: State changed from STARTING to DETECTING_WAKE_WORD
[10:46:27][D][i2s_audio.microphone:431]: Task started, attempting to allocate buffer
[10:46:27][D][i2s_audio.microphone:436]: Task is running and reading data
[10:46:28][W][component:257]: Component voice_assistant took a long time for an operation (257 ms).
[10:46:28][W][component:258]: Components should block for at most 30 ms.
[10:46:40][D][micro_wake_word:325]: Detected ‘Okay Nabu’ with sliding average probability is 0.98 and max probability is 1.00
[10:46:40][D][voice_assistant:456]: State changed from IDLE to START_MICROPHONE
[10:46:40][D][voice_assistant:463]: Desired state set to START_PIPELINE
[10:46:40][D][micro_wake_word:370]: Stopping wake word detection
[10:46:40][D][voice_assistant:186]: Starting Microphone
[10:46:40][D][ring_buffer:034]: Created ring buffer with size 16384
[10:46:40][D][voice_assistant:456]: State changed from START_MICROPHONE to STARTING_MICROPHONE
[10:46:40][D][micro_wake_word:378]: State changed from DETECTING_WAKE_WORD to STOPPING
[10:46:40][D][voice_assistant:456]: State changed from STARTING_MICROPHONE to START_PIPELINE
[10:46:40][D][voice_assistant:207]: Requesting start…
[10:46:40][D][voice_assistant:456]: State changed from START_PIPELINE to STARTING_PIPELINE
[10:46:40][D][micro_wake_word:273]: Inference task is stopping, deallocating buffers
[10:46:40][D][micro_wake_word:278]: Inference task is finished, freeing task resources
[10:46:40][D][micro_wake_word:378]: State changed from STOPPING to STOPPED
[10:46:40][D][voice_assistant:478]: Client started, streaming microphone
[10:46:40][D][voice_assistant:456]: State changed from STARTING_PIPELINE to STREAMING_MICROPHONE
[10:46:40][D][voice_assistant:463]: Desired state set to STREAMING_MICROPHONE
[10:46:44][D][esp32.preferences:114]: Saving 4 preferences to flash…
[10:46:44][D][esp32.preferences:142]: Saving 4 preferences to flash: 4 cached, 0 written, 0 failed
@Fireblade900rr how long did you wait before trying the wake word? When I change mine, it can take up to 20 seconds to reach the “Saving 4 preferences to flash” log line; only then can I use the new word. I see you changed yours to “Hey Jarvis” but didn’t wait for the “Saving preferences” output.
Try it again while watching the logs and see what happens. I also use the “On device” wake word engine. Not sure if that makes a difference.
As stated above, the wake word itself is not the problem (I don’t need Hey Jarvis, and there are many users with problems).
I would like to use ChatGPT; when I change to it, the wake word is not recognized at all.
I just used your yaml from github and it worked perfectly for the SpotPear ESP32-S3-Touch-LCD-1.28-BOX-THIN. Your project is linked from its SpotPear product resources page.
Thank you for pursuing this.
Where did you get the STL for the muscle body?
Hi,
I am testing the ball V2 with the yaml from https://github.com/RealDeco/xiaozhi-esphome
In my setup, Home Assistant is in a remote location, as described below.
My place:
- Ball V2
- added WireGuard client settings
Remote location:
- Router with WG server ↔ host / HA in Docker container, network_mode=host
The Ball V2 connects successfully to the WG server.
I added the device in HA using its WG VPN IP address, and it is detected by HA.
My voice commands are recognised and executed, but no sound is played on the speaker.
Log:
[12:48:06][D][voice_assistant:656]: Response: “Il est 12:48:05.”
[12:48:06][V][text_sensor:013]: ‘text_response’: Received new state Il est 12:48:05.
[12:48:06][D][text_sensor:064]: ‘text_response’: Sending state ‘Il est 12:48:05.’
[12:48:06][V][ili9xxx:236]: Start display(xlow:0, ylow:0, xhigh:239, yhigh:239, width:240, height:240, mode=16, 18bit=0, sw_time=28640us, mw_time=114440us)
[12:48:06][V][ili9xxx:244]: Doing single write of 115200 bytes
[12:48:06][V][ili9xxx:295]: Data write took 26ms
[12:48:06][VV][api.service:1108]: on_voice_assistant_event_response: VoiceAssistantEventResponse {
[12:48:06][VV][api.service:1108]: event_type: VOICE_ASSISTANT_TTS_END
[12:48:06][VV][api.service:1108]: data: VoiceAssistantEventData {
[12:48:06][VV][api.service:1108]: name: ‘url’
[12:48:06][VV][api.service:1108]: value: http://192.168.0.9:8123/api/tts_proxy/bpEkqYuw_tJPyYTKAF9HOw.flac
[12:48:06]}
[12:48:06]}
[12:48:06][D][voice_assistant:598]: Event Type: 8
[12:48:06][D][voice_assistant:678]: Response URL: http://192.168.0.9:8123/api/tts_proxy/bpEkqYuw_tJPyYTKAF9HOw.flac
[12:48:06][D][voice_assistant:456]: State changed from AWAITING_RESPONSE to STREAMING_RESPONSE
[12:48:06][D][voice_assistant:463]: Desired state set to STREAMING_RESPONSE
[12:48:06][D][media_player:074]: ‘Xiaozhi Ball V2’ - Setting
[12:48:06][D][media_player:081]: Media URL: http://192.168.0.9:8123/api/tts_proxy/bpEkqYuw_tJPyYTKAF9HOw.flac
[12:48:06][D][media_player:087]: Announcement: yes
[12:48:06][VV][api.service:352]: send_media_player_state_response: MediaPlayerStateResponse {
[12:48:06][VV][api.service:352]: key: 526161276
[12:48:06][VV][api.service:352]: state: MEDIA_PLAYER_STATE_PLAYING
[12:48:06][VV][api.service:352]: volume: 0.5
[12:48:06][VV][api.service:352]: muted: NO
[12:48:06]}
[12:48:06][D][speaker_media_player:408]: State changed to ANNOUNCING
[12:48:06][VV][api.service:1108]: on_voice_assistant_event_response: VoiceAssistantEventResponse {
[12:48:06][VV][api.service:1108]: event_type: VOICE_ASSISTANT_RUN_END
[12:48:06]}
[12:48:06][D][voice_assistant:598]: Event Type: 2
[12:48:06][D][voice_assistant:697]: Assist Pipeline ended
[12:48:10][VV][light.addressable:015]: Addressable Light ‘Xiaozhi Ball V2’ (effect_active=NO)
[12:48:10][VV][light.addressable:018]: [ 0] Color: R= 0 G= 0 B= 0 W= 0
[12:48:10][VV][light.addressable:021]:
[12:48:11][VV][esp-idf:000][ann_read]: E (194030) esp-tls: [sock=60] select() timeout
[12:48:11][VV][esp-idf:000][ann_read]: E (194030) transport_base: Failed to open a new connection: 32774
[12:48:11][VV][esp-idf:000][ann_read]: E (194031) HTTP_CLIENT: Connection failed, sock < 0
[12:48:11][E][speaker_media_player.pipeline:112]: Media reader encountered an error: ESP_ERR_HTTP_CONNECT
[12:48:11][E][speaker_media_player:328]: The announcement pipeline’s file reader encountered an error.
[12:48:14][V][wireguard:080]: enabled=1, connected=1, peer_up=1, handshake: current=1749898071 latest=1749898071 updated=0
[12:48:14][D][wireguard:098]: WireGuard remote peer is online (latest handshake 2025-06-14 12:47:51 CEST)
[12:48:15][VV][light.addressable:015]: Addressable Light ‘Xiaozhi Ball V2’ (effect_active=NO)
[12:48:15][VV][light.addressable:018]: [ 0] Color: R= 0 G= 0 B= 0 W= 0
[12:48:15][VV][light.addressable:021]:
[12:48:16][VV][esp-idf:000][ann_read]: E (199058) esp-tls: [sock=60] select() timeout
[12:48:16][VV][esp-idf:000][ann_read]: E (199058) transport_base: Failed to open a new connection: 32774
[12:48:16][VV][esp-idf:000][ann_read]: E (199058) HTTP_CLIENT: Connection failed, sock < 0
[12:48:16][E][speaker_media_player.pipeline:112]: Media reader encountered an error: ESP_ERR_HTTP_CONNECT
[12:48:16][E][speaker_media_player:328]: The announcement pipeline’s file reader encountered an error.
[12:48:20][VV][light.addressable:015]: Addressable Light ‘Xiaozhi Ball V2’ (effect_active=NO)
[12:48:20][VV][light.addressable:018]: [ 0] Color: R= 0 G= 0 B= 0 W= 0
[12:48:20][VV][light.addressable:021]:
[12:48:21][VV][esp-idf:000][ann_read]: E (204088) esp-tls: [sock=60] select() timeout
[12:48:21][VV][esp-idf:000][ann_read]: E (204088) transport_base: Failed to open a new connection: 32774
[12:48:21][VV][esp-idf:000][ann_read]: E (204088) HTTP_CLIENT: Connection failed, sock < 0
[12:48:21][E][speaker_media_player.pipeline:112]: Media reader encountered an error: ESP_ERR_HTTP_CONNECT
[12:48:21][E][speaker_media_player:328]: The announcement pipeline’s file reader encountered an error.
[12:48:24][V][wireguard:080]: enabled=1, connected=1, peer_up=1, handshake: current=1749898101 latest=1749898071 updated=1
[12:48:24][D][wireguard:098]: WireGuard remote peer is online (latest handshake 2025-06-14 12:48:21 CEST)
[12:48:25][VV][light.addressable:015]: Addressable Light ‘Xiaozhi Ball V2’ (effect_active=NO)
[12:48:25][VV][light.addressable:018]: [ 0] Color: R= 0 G= 0 B= 0 W= 0
[12:48:25][VV][light.addressable:021]:
[12:48:26][VV][esp-idf:000][ann_read]: E (209117) esp-tls: [sock=60] select() timeout
[12:48:26][VV][esp-idf:000][ann_read]: E (209117) transport_base: Failed to open a new connection: 32774
[12:48:26][VV][esp-idf:000][ann_read]: E (209117) HTTP_CLIENT: Connection failed, sock < 0
[12:48:26][E][speaker_media_player.pipeline:112]: Media reader encountered an error: ESP_ERR_HTTP_CONNECT
[12:48:26][E][speaker_media_player:328]: The announcement pipeline’s file reader encountered an error.
[12:48:30][VV][light.addressable:015]: Addressable Light ‘Xiaozhi Ball V2’ (effect_active=NO)
[12:48:30][VV][light.addressable:018]: [ 0] Color: R= 0 G= 0 B= 0 W= 0
[12:48:30][VV][light.addressable:021]:
[12:48:31][VV][esp-idf:000][ann_read]: E (214150) esp-tls: [sock=60] select() timeout
[12:48:31][VV][esp-idf:000][ann_read]: E (214150) transport_base: Failed to open a new connection: 32774
[12:48:31][VV][esp-idf:000][ann_read]: E (214150) HTTP_CLIENT: Connection failed, sock < 0
[12:48:31][E][speaker_media_player.pipeline:112]: Media reader encountered an error: ESP_ERR_HTTP_CONNECT
[12:48:31][E][speaker_media_player:328]: The announcement pipeline’s file reader encountered an error.
[12:48:34][V][wireguard:080]: enabled=1, connected=1, peer_up=1, handshake: current=1749898101 latest=1749898101 updated=0
[12:48:34][D][wireguard:098]: WireGuard remote peer is online (latest handshake 2025-06-14 12:48:21 CEST)
[12:48:35][VV][light.addressable:015]: Addressable Light ‘Xiaozhi Ball V2’ (effect_active=NO)
[12:48:35][VV][light.addressable:018]: [ 0] Color: R= 0 G= 0 B= 0 W= 0
[12:48:35][VV][light.addressable:021]:
[12:48:36][VV][esp-idf:000][ann_read]: E (219189) esp-tls: [sock=60] select() timeout
[12:48:36][VV][esp-idf:000][ann_read]: E (219189) transport_base: Failed to open a new connection: 32774
[12:48:36][VV][esp-idf:000][ann_read]: E (219189) HTTP_CLIENT: Connection failed, sock < 0
[12:48:36][E][speaker_media_player.pipeline:112]: Media reader encountered an error: ESP_ERR_HTTP_CONNECT
[12:48:36][E][speaker_media_player:328]: The announcement pipeline’s file reader encountered an error.
I also tried using Developer Tools to run “Text-to-speech (TTS): Say a TTS message with google_translate” with the speaker of the Ball V2 as the output, and I get the same errors.
Problem solved: the netmask was not defined in the WireGuard settings!
The Ball V2 yaml works!
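For anyone hitting the same `ESP_ERR_HTTP_CONNECT` errors, here is a minimal sketch of an ESPHome `wireguard:` block with the netmask set explicitly. The post does not say which value was used, so every address, hostname, key and the netmask below is a placeholder. As far as I know, the component falls back to 255.255.255.255 when `netmask` is omitted, which would fit the symptom of the device being unable to open a connection to the TTS URL on 192.168.0.9, but treat that as an assumption and check the component documentation.

```yaml
# Sketch only: all values are placeholders, not the poster's settings.
time:
  - platform: sntp              # the wireguard component needs a time source

wireguard:
  address: 10.0.0.2             # hypothetical VPN address of the ball
  netmask: 255.255.255.0        # placeholder; must be wide enough to cover the
                                # addresses that have to go through the tunnel
  private_key: !secret wg_private_key
  peer_endpoint: wg.example.com # hypothetical WG server hostname
  peer_public_key: !secret wg_peer_public_key
  peer_allowed_ips:
    - 10.0.0.0/24               # hypothetical VPN subnet
    - 192.168.0.0/24            # remote LAN; the log shows HA at 192.168.0.9
```

After changing these settings, the `[wireguard]` log lines and a test TTS announcement are a quick way to confirm that the media URL is reachable.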