My Voice PE has stopped playing anything. It still plays the start sound when I use the wake word, but it does not announce any answers. The logs in HA all seem normal: my voice gets recognized and a response is sent to the Voice PE, but the Voice PE isn’t playing it. Here is the log I see in the ESPHome console:
[08:58:16][D][esp32.preferences:142]: Saving 4 preferences to flash: 3 cached, 1 written, 0 failed
[08:58:19][I][safe_mode:041]: Boot seems successful; resetting boot loop counter
[08:58:19][D][esp32.preferences:114]: Saving 1 preferences to flash...
[08:58:20][D][esp32.preferences:142]: Saving 1 preferences to flash: 0 cached, 1 written, 0 failed
[08:58:29][D][media_player:074]: 'Media Player' - Setting
[08:58:29][D][media_player:081]: Media URL: http://172.30.1.200:8123/api/esphome/ffmpeg_proxy/64ec6ad9ecdedbb26f81c74bc8f976be/dCYlN57_N8bAUCSzRuoYVQ.flac
[08:58:29][D][media_player:087]: Announcement: yes
[08:58:29][D][speaker_media_player:426]: State changed to ANNOUNCING
[08:58:34][D][esp-idf:000][ann_read]: W (75859) HTTP_CLIENT: Connection timed out before data was ready!
[08:58:34]
[08:58:34][E][speaker_media_player.pipeline:112]: Media reader encountered an error: ESP_FAIL
[08:58:29][D][media_player:081]: Media URL: http://172.30.1.200:8123/api/esphome/ffmpeg_proxy/64ec6ad9ecdedbb26f81c74bc8f976be/dCYlN57_N8bAUCSzRuoYVQ.flac
---
[08:58:34][D][esp-idf:000][ann_read]: W (75859) HTTP_CLIENT: Connection timed out before data was ready!
The TTS engine probably does not have time to generate the response within 5 seconds (a Voice PE software limitation).
What speech generation service are you using?
As a temporary workaround, use a cloud TTS service.
Streaming response generation is still in development.
For responses longer than a few words, Piper currently performs well only on a GPU.
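One way to check the 5-second hypothesis is to time how long the server takes to deliver the first byte of an audio URL. The following is only a minimal sketch: the 5-second limit is read off the gap between the `ANNOUNCING` and `Connection timed out` lines in the log above, and the function names are made up for illustration. The proxy URL from the log may no longer be valid when tested later, so substitute a URL that is known to be live, and run the check from a machine on the same network as the Voice PE.

```python
import time
import urllib.request

# Assumption: limit inferred from the ~5 s gap in the log above.
VPE_TIMEOUT_S = 5.0


def time_to_first_byte(url: str, timeout: float = 30.0) -> float:
    """Return the seconds elapsed until the first body byte arrives."""
    start = time.monotonic()
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        resp.read(1)  # block until at least one byte of audio is available
    return time.monotonic() - start


def would_time_out(elapsed: float, limit: float = VPE_TIMEOUT_S) -> bool:
    """True if a response this slow would hit the assumed device limit."""
    return elapsed >= limit


if __name__ == "__main__":
    # Hypothetical placeholder; replace with a currently valid media URL.
    url = "http://homeassistant.local:8123/path/to/audio.flac"
    elapsed = time_to_first_byte(url)
    print(f"first byte after {elapsed:.2f} s, too slow: {would_time_out(elapsed)}")
```

If the request itself fails (for example with `urllib.error.URLError`), the server is not reachable from that machine at all, which points to a network problem and not slow speech generation.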
Speech announcements and music use different audio pipelines on the ESP32.
To understand the cause of the problem, you need to provide more logs and information about the actions used.
This is the log on the Voice PE when using Media → Text to speech → Piper to send an announcement:
[16:14:57][I][safe_mode:041]: Boot seems successful; resetting boot loop counter
[16:14:57][D][esp32.preferences:114]: Saving 1 preferences to flash...
[16:14:57][D][esp32.preferences:142]: Saving 1 preferences to flash: 0 cached, 1 written, 0 failed
[16:15:02][D][media_player:074]: 'Media Player' - Setting
[16:15:02][D][media_player:081]: Media URL: http://172.30.1.200:8123/api/esphome/ffmpeg_proxy/64ec6ad9ecdedbb26f81c74bc8f976be/4Qx4oyHA_nKszIU_dFZxWA.flac
[16:15:02][D][media_player:087]: Announcement: yes
[16:15:02][D][speaker_media_player:426]: State changed to ANNOUNCING
[16:15:07][D][esp-idf:000][ann_read]: W (71640) HTTP_CLIENT: Connection timed out before data was ready!
[16:15:08]
[16:15:08][E][speaker_media_player.pipeline:112]: Media reader encountered an error: ESP_FAIL
[16:15:08][D][speaker_media_player:426]: State changed to IDLE
Here is the log on the Voice PE when using Media → My media → a media file:
[16:16:38][D][media_player:074]: 'Media Player' - Setting
[16:16:38][D][media_player:081]: Media URL: http://172.30.1.200:8123/api/esphome/ffmpeg_proxy/64ec6ad9ecdedbb26f81c74bc8f976be/hAYVHvV4U_RzKkUnvXJqWA.flac
[16:16:38][D][speaker_media_player:426]: State changed to PLAYING
[16:16:43][D][esp-idf:000][med_read]: W (167502) HTTP_CLIENT: Connection timed out before data was ready!
[16:16:43]
[16:16:43][E][speaker_media_player.pipeline:112]: Media reader encountered an error: ESP_FAIL
[16:16:43][D][speaker_media_player:426]: State changed to IDLE
Both look pretty much identical in the Voice PE logs.
I get these logs by opening https://web.esphome.io/ and connecting to my Voice PE via USB.
Occasionally I also see this error on the HA side:
Logger: aioesphomeapi.connection
Source: runner.py:154
First occurred: 4:14:15 PM (1 occurrences)
Last logged: 4:14:15 PM
home-assistant-voice-09df2c @ 192.168.1.39: Connection error occurred: [Errno 104] Connection reset by peer
[ann_read] - announcement pipeline
[med_read] - media pipeline
In that case, my first assumption is wrong. There is a problem with receiving any type of media.
If a full Voice PE reset doesn’t solve the problem, it’s better to ask on Discord or open an issue on GitHub.
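The comparison of the two pipelines can be made mechanical before filing an issue. This is a rough sketch, not an ESPHome tool: it scans pasted log text for a `State changed to ANNOUNCING` or `PLAYING` line, then reports how many seconds pass before the next `[ann_read]` or `[med_read]` timeout. The function name and regular expressions are illustrative assumptions, and timestamps that wrap past midnight are not handled.

```python
import re
from datetime import datetime

PIPELINES = {"ann_read": "announcement", "med_read": "media"}
TS = re.compile(r"^\[(\d{2}:\d{2}:\d{2})\]")
START = re.compile(r"State changed to (ANNOUNCING|PLAYING)")
TIMEOUT = re.compile(r"\[(ann_read|med_read)\].*Connection timed out")


def timeout_delays(log_text):
    """Return (pipeline, seconds) for each timeout that follows a playback start."""
    results = []
    started = None
    for line in log_text.splitlines():
        m = TS.match(line)
        if not m:
            continue  # not a timestamped log line
        ts = datetime.strptime(m.group(1), "%H:%M:%S")
        if START.search(line):
            started = ts
            continue
        hit = TIMEOUT.search(line)
        if hit and started is not None:
            delay = (ts - started).total_seconds()
            results.append((PIPELINES[hit.group(1)], delay))
            started = None
    return results


SAMPLE = """\
[16:15:02][D][speaker_media_player:426]: State changed to ANNOUNCING
[16:15:07][D][esp-idf:000][ann_read]: W (71640) HTTP_CLIENT: Connection timed out before data was ready!
[16:16:38][D][speaker_media_player:426]: State changed to PLAYING
[16:16:43][D][esp-idf:000][med_read]: W (167502) HTTP_CLIENT: Connection timed out before data was ready!
"""

if __name__ == "__main__":
    for pipeline, delay in timeout_delays(SAMPLE):
        print(f"{pipeline}: timed out after {delay:.0f} s")
```

On the lines quoted in this thread, both pipelines time out exactly 5 seconds after playback starts, which fits the conclusion that the failure is not specific to TTS.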
Do we have any details on timescales for access to the streaming work? I’m interested in getting involved with the testing, as I’m working with a local Ollama instance and an F5TTS instance running with a Wyoming wrapper.
There is no information on timing, but we do know that work is underway, because there are PRs for it in the main repository.
Keep in mind that third-party integrations will have to add support separately. For that to work, the engine will probably need to be able to deliver its audio in chunks.
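To illustrate what "delivering audio in chunks" could look like on the engine side, here is a purely conceptual sketch. It is not the Wyoming protocol or any real integration API: `stream_tts` and `fake_synthesize` are hypothetical names, and the "audio" is just the sentence text encoded as bytes. The point is only that a generator can hand over each sentence as soon as it is ready, so playback could begin before the full response has been synthesized.

```python
import re
from typing import Callable, Iterator


def stream_tts(text: str, synthesize: Callable[[str], bytes]) -> Iterator[bytes]:
    """Yield audio sentence by sentence without waiting for the whole text."""
    for sentence in re.split(r"(?<=[.!?])\s+", text.strip()):
        if sentence:
            yield synthesize(sentence)


def fake_synthesize(sentence: str) -> bytes:
    # Hypothetical stand-in for a real TTS call; returns the text as bytes.
    return sentence.encode("utf-8")


if __name__ == "__main__":
    for chunk in stream_tts("Hello there. How are you?", fake_synthesize):
        print(len(chunk), "bytes ready")
```

An engine that can only return one finished file would have to be wrapped or extended before it could take advantage of streaming on the device.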