Voice PE stopped playing any media

dnwk · April 22, 2025, 4:01pm

My Voice PE has stopped playing anything. It plays the start sound when I use the wake word, but it does not announce any answers. The logs in HA all seem normal: my voice gets recognized and a response is sent to the Voice PE, but the Voice PE isn’t playing it. Here is the log I see in the ESPHome console:


[08:58:16][D][esp32.preferences:142]: Saving 4 preferences to flash: 3 cached, 1 written, 0 failed
[08:58:19][I][safe_mode:041]: Boot seems successful; resetting boot loop counter
[08:58:19][D][esp32.preferences:114]: Saving 1 preferences to flash...
[08:58:20][D][esp32.preferences:142]: Saving 1 preferences to flash: 0 cached, 1 written, 0 failed
[08:58:29][D][media_player:074]: 'Media Player' - Setting
[08:58:29][D][media_player:081]:   Media URL: http://172.30.1.200:8123/api/esphome/ffmpeg_proxy/64ec6ad9ecdedbb26f81c74bc8f976be/dCYlN57_N8bAUCSzRuoYVQ.flac
[08:58:29][D][media_player:087]:  Announcement: yes
[08:58:29][D][speaker_media_player:426]: State changed to ANNOUNCING
[08:58:34][D][esp-idf:000][ann_read]: W (75859) HTTP_CLIENT: Connection timed out before data was ready!
[08:58:34]
[08:58:34][E][speaker_media_player.pipeline:112]: Media reader encountered an error: ESP_FAIL

Is it a hardware failure?

mchk · April 22, 2025, 5:32pm

[08:58:29][D][media_player:081]:   Media URL: http://172.30.1.200:8123/api/esphome/ffmpeg_proxy/64ec6ad9ecdedbb26f81c74bc8f976be/dCYlN57_N8bAUCSzRuoYVQ.flac
---
[08:58:34][D][esp-idf:000][ann_read]: W (75859) HTTP_CLIENT: Connection timed out before data was ready!

Probably the TTS engine does not have time to generate a response within 5 seconds (a VPE software limitation).
What speech generation service are you using?
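
The 5-second gap can be read straight off the timestamps in the log. A minimal sketch in Python, assuming every line starts with an `[HH:MM:SS]` stamp as in the ESPHome output above (the helper name `seconds_between` is made up for illustration, and it does not handle a log that crosses midnight):

```python
from datetime import datetime


def seconds_between(start_line: str, end_line: str) -> float:
    """Return the seconds between the [HH:MM:SS] stamps that open two log lines."""
    fmt = "%H:%M:%S"
    # Characters 1..8 hold the time, e.g. "08:58:29" in "[08:58:29][D]...".
    start = datetime.strptime(start_line[1:9], fmt)
    end = datetime.strptime(end_line[1:9], fmt)
    return (end - start).total_seconds()


announcing = "[08:58:29][D][speaker_media_player:426]: State changed to ANNOUNCING"
timed_out = (
    "[08:58:34][D][esp-idf:000][ann_read]: W (75859) HTTP_CLIENT: "
    "Connection timed out before data was ready!"
)

print(seconds_between(announcing, timed_out))  # 5.0
```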

dnwk · April 22, 2025, 5:44pm

I am using Piper.

mchk · April 22, 2025, 5:53pm

Temporarily use cloud solutions.
Streaming response generation is still in development.
For responses longer than a few words, good performance with Piper is currently achieved only on a GPU.

dnwk · April 22, 2025, 6:01pm

I tried Piper and Google Translate. Both have the same issue. It also happens when I use the media player function to send a specific text or an mp3.

mchk · April 22, 2025, 6:20pm

Text and music use different audio pipelines on the ESP.
To understand the cause of the problem, you need to provide more logs and information about the actions used.

dnwk · April 22, 2025, 11:17pm

This is the log on the Voice PE when using Media → Text to speech → Piper to send an announcement:

[16:14:57][I][safe_mode:041]: Boot seems successful; resetting boot loop counter
[16:14:57][D][esp32.preferences:114]: Saving 1 preferences to flash...
[16:14:57][D][esp32.preferences:142]: Saving 1 preferences to flash: 0 cached, 1 written, 0 failed
[16:15:02][D][media_player:074]: 'Media Player' - Setting
[16:15:02][D][media_player:081]:   Media URL: http://172.30.1.200:8123/api/esphome/ffmpeg_proxy/64ec6ad9ecdedbb26f81c74bc8f976be/4Qx4oyHA_nKszIU_dFZxWA.flac
[16:15:02][D][media_player:087]:  Announcement: yes
[16:15:02][D][speaker_media_player:426]: State changed to ANNOUNCING
[16:15:07][D][esp-idf:000][ann_read]: W (71640) HTTP_CLIENT: Connection timed out before data was ready!
[16:15:08]
[16:15:08][E][speaker_media_player.pipeline:112]: Media reader encountered an error: ESP_FAIL
[16:15:08][D][speaker_media_player:426]: State changed to IDLE

Here is the log on the Voice PE when using Media → My media → media file:

[16:16:38][D][media_player:074]: 'Media Player' - Setting
[16:16:38][D][media_player:081]:   Media URL: http://172.30.1.200:8123/api/esphome/ffmpeg_proxy/64ec6ad9ecdedbb26f81c74bc8f976be/hAYVHvV4U_RzKkUnvXJqWA.flac
[16:16:38][D][speaker_media_player:426]: State changed to PLAYING
[16:16:43][D][esp-idf:000][med_read]: W (167502) HTTP_CLIENT: Connection timed out before data was ready!
[16:16:43]
[16:16:43][E][speaker_media_player.pipeline:112]: Media reader encountered an error: ESP_FAIL
[16:16:43][D][speaker_media_player:426]: State changed to IDLE

Both look pretty much identical in the Voice PE logs.
I get these logs via https://web.esphome.io/, connecting to my Voice PE via USB.

Occasionally I also see this error on the HA side:

Logger: aioesphomeapi.connection
Source: runner.py:154
First occurred: 4:14:15 PM (1 occurrences)
Last logged: 4:14:15 PM

home-assistant-voice-09df2c @ 192.168.1.39: Connection error occurred: [Errno 104] Connection reset by peer

But it doesn’t always happen.
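
Since both requests give up after about five seconds, one way to narrow things down is to fetch the same media URL from another machine on the same network as the Voice PE and see how long the first byte takes. The following is only a rough sketch using the Python standard library: `time_to_first_byte` and its `opener` parameter are names made up here, and the `timeout` of `urlopen` applies to each socket operation rather than to the whole request, so it only approximates what the device does.

```python
import time
import urllib.request


def time_to_first_byte(url, timeout=5.0, opener=urllib.request.urlopen):
    """Return seconds until the first body byte arrives, or None on failure.

    `opener` defaults to urllib.request.urlopen; it is a parameter only so
    the function can be exercised without a real network connection.
    """
    start = time.monotonic()
    try:
        with opener(url, timeout=timeout) as response:
            if not response.read(1):
                return None  # connection opened, but no data arrived
    except OSError:
        # Covers URLError, refused/reset connections, and socket timeouts.
        return None
    return time.monotonic() - start


# Usage (paste the "Media URL" value from the Voice PE log):
# print(time_to_first_byte("<Media URL from the log>"))
```

A result of `None`, or a value close to 5 seconds, would suggest the URL is slow or unreachable from that part of the network; a quick answer would point back at the device itself.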

mchk · April 22, 2025, 11:29pm

[ann_read] - announcement pipeline
[med_read] - media pipeline

In that case, my first assumption was wrong. There is a problem with receiving any type of media.
If a full VPE reset doesn’t solve the problem, it’s better to go to Discord or create an issue on GitHub.
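
For longer logs, the two tags can be picked out automatically. A small sketch in Python; the function name `pipeline_of` is made up, and only the two tags that appear in the logs above are assumed to exist:

```python
import re
from typing import Optional

# Tags taken from the Voice PE logs in this thread.
PIPELINES = {"ann_read": "announcement", "med_read": "media"}
TAG = re.compile(r"\[(ann_read|med_read)\]")


def pipeline_of(line: str) -> Optional[str]:
    """Return which pipeline a log line belongs to, or None if it has no tag."""
    match = TAG.search(line)
    return PIPELINES[match.group(1)] if match else None


line = (
    "[16:16:43][D][esp-idf:000][med_read]: W (167502) HTTP_CLIENT: "
    "Connection timed out before data was ready!"
)
print(pipeline_of(line))  # media
```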

dnwk · April 22, 2025, 11:41pm

OK. I did a reinstall and a bootloader reinstall earlier today, before I posted this. I will try to open a GitHub issue.

andy.dennis · April 25, 2025, 4:01pm

Do we have any details on timescales for access to the streaming work? I’m interested in getting involved with the testing, as I’m working with a local Ollama instance and an F5TTS instance running with a Wyoming wrapper.

mchk · April 25, 2025, 4:22pm

There is no information on timing, but we do know that work is underway, judging by the PRs in the main repository.

It is worth bearing in mind that third-party integrations will have to add support separately. For that, the engine will probably need to be able to deliver data in chunks.
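
To illustrate what "data in chunks" could mean on the engine side, here is a toy sketch in Python. It is not how Home Assistant, Piper, or Wyoming implement streaming; `stream_speech` and the `synthesize` callback are hypothetical names, and splitting on sentence boundaries is just one possible chunking strategy.

```python
import re
from typing import Callable, Iterator


def stream_speech(text: str, synthesize: Callable[[str], bytes]) -> Iterator[bytes]:
    """Yield audio one sentence at a time instead of waiting for the full text.

    `synthesize` is a stand-in for whatever call turns text into audio bytes.
    """
    # Split after sentence-ending punctuation that is followed by whitespace.
    for sentence in re.split(r"(?<=[.!?])\s+", text.strip()):
        if sentence:
            yield synthesize(sentence)


# Fake engine for demonstration: "audio" is just the encoded text.
def fake_synthesize(sentence: str) -> bytes:
    return sentence.encode("utf-8")


for chunk in stream_speech("The light is on. Anything else?", fake_synthesize):
    print(chunk)
```

The point of the generator is that the first chunk can be sent to the player as soon as the first sentence is ready, rather than after the whole response has been generated.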

andy.dennis · April 25, 2025, 6:54pm

That’s cool. I think Ollama was mentioned as being implemented, and I’ll work on integrating F5TTS if need be.