Automation speak TTS not using local AI

Hi all, I'm wondering if anyone else has run into this with HA PE. I have Kokoro running as my TTS for Assist. I have an automation that says “someone is at the door” when my sensor detects motion. Oddly, if I use the Home Assistant Cloud entity, it works great. But if I choose my Kokoro entity, it doesn't make any sound. Is this some kind of bug where Kokoro's response time is too slow and the speaker has already given up? Or is it something else? Has anyone else had this happen? If so, were you able to get a non-Home Assistant Cloud TTS to speak in automations?

To put it simply: if the TTS engine does not deliver the file within 5 seconds, the action is interrupted (on satellites running ESPHome).
But, to be fair, a properly configured TTS engine should be able to generate the audio within that time (for local use, that usually requires a server with a GPU).
We are also still waiting for the HA team to add streaming support for TTS, which would be at least a partial solution.
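To make the timing argument concrete, here is a toy model in Python. It assumes, as the reply says, a fixed 5-second budget between the satellite asking for audio and the audio arriving; `fake_tts` and `announce` are hypothetical names, and this is not ESPHome's actual implementation. The demo scales the timings down so it runs quickly.

```python
import asyncio


async def fake_tts(synth_seconds: float) -> bytes:
    # Hypothetical stand-in for the TTS backend: it just waits
    # `synth_seconds` before returning some fake audio bytes.
    await asyncio.sleep(synth_seconds)
    return b"fake-mp3-bytes"


async def announce(synth_seconds: float, budget: float = 5.0) -> str:
    """Toy model of the satellite side: give the file reader `budget`
    seconds to receive audio, otherwise abort the announcement."""
    try:
        audio = await asyncio.wait_for(fake_tts(synth_seconds), timeout=budget)
    except asyncio.TimeoutError:
        return "file reader error: no audio within budget"
    return f"playing {len(audio)} bytes"


if __name__ == "__main__":
    # Scaled-down timings: a 0.05 s budget instead of 5 s.
    print(asyncio.run(announce(0.01, budget=0.05)))  # fast engine plays
    print(asyncio.run(announce(0.20, budget=0.05)))  # slow engine is aborted
```

The point of the model is that the engine's total time to produce the file is what matters, not whether it works at all: the same engine can succeed in the media browser and fail on the satellite.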

Yeah, I'm not sure whether it's a lag issue or a delivery-timing issue. It's just super weird that I can use Home Assistant Cloud TTS but can't use my Kokoro TTS, when it runs everything else fine…

This is my trace:
Result:
params:
  domain: tts
  service: speak
  service_data:
    cache: true
    media_player_entity_id: media_player.home_assistant_voice_09560c_media_player
    message: Hi, someone is at the door
    entity_id:
      - tts.openai
  target:
    entity_id:
      - tts.openai
running_script: false
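If it helps to reproduce the same call outside the automation (for example, while watching the ESPHome log), the service data from the trace can be sent through Home Assistant's REST API (`POST /api/services/<domain>/<service>` with a long-lived access token). This is a minimal sketch that only builds the request; the host is taken from the logs further down, and the token is a placeholder.

```python
import json
import urllib.request


def build_speak_request(ha_url: str, token: str) -> urllib.request.Request:
    """Build (but do not send) a POST that calls tts.speak with the same
    service data as the trace above."""
    payload = {
        "entity_id": "tts.openai",
        "media_player_entity_id": "media_player.home_assistant_voice_09560c_media_player",
        "message": "Hi, someone is at the door",
        "cache": True,
    }
    return urllib.request.Request(
        url=f"{ha_url.rstrip('/')}/api/services/tts/speak",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


if __name__ == "__main__":
    # Placeholder token: create a long-lived access token in your HA profile.
    req = build_speak_request("http://192.168.1.214:8123", "YOUR_LONG_LIVED_TOKEN")
    print(req.get_method(), req.full_url)
    # To actually send it:
    # with urllib.request.urlopen(req, timeout=30) as resp:
    #     print(resp.status)
```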

Any ideas what I'm doing wrong?

Is it perhaps that the speak service doesn't know how to call my local Kokoro deployment via the Wyoming protocol? I can't seem to make that link in the service options, though…

Open the media browser from the side menu, select the TTS section, find your TTS service, and test it with the browser as the audio output.

I found this error in the raw logs:

2025-04-25 12:03:47.778 ERROR (MainThread) [homeassistant.components.esphome.manager] Home Assistant Voice 09560c: [E][speaker_media_player:339]: The announcement pipeline’s file reader encountered an error.

Doing a Google search, people have said it is an ffmpeg path error or a firewall error. But for those people, the issue came up while setting up HAVPE. I have it working, just not for this automation, so those errors seem unlikely to be relevant to me…

And yes, I did the media browser test and it works fine.

The easiest way is to look at the log in ESPHome.
The error looks like this:

[19:25:13][D][media_player:081]:   Media URL: http://10.144.1.2:8123/api/esphome/ffmpeg_proxy/ec1f4ac4b37bb759807baf4be710ff1e/Mnb105rGftSg-473zWS2fw.flac
[19:25:13][D][media_player:087]:  Announcement: yes
---
[19:25:13][D][ring_buffer:034][ann_read]: Created ring buffer with size 1000000
[19:25:13][D][speaker_media_player.pipeline:114]: Reading FLAC file type
[19:25:18][E][speaker_media_player:339]: The announcement pipeline's file reader encountered an error.
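The ESPHome log only has whole-second timestamps, but that is enough to see the gap being pointed at. A small Python check on the two lines above (the helper name `esphome_time` is hypothetical):

```python
from datetime import datetime


def esphome_time(line: str) -> datetime:
    # ESPHome log lines start with a "[HH:MM:SS]" prefix; slice out HH:MM:SS.
    return datetime.strptime(line[1:9], "%H:%M:%S")


reading = "[19:25:13][D][speaker_media_player.pipeline:114]: Reading FLAC file type"
failed = ("[19:25:18][E][speaker_media_player:339]: "
          "The announcement pipeline's file reader encountered an error.")

gap = (esphome_time(failed) - esphome_time(reading)).total_seconds()
print(f"~{gap:.0f} s between reading the FLAC stream and the error")  # ~5 s ...
```

With one-second resolution this can only say "about 5 s", which is why the debug log from Home Assistant (with milliseconds) is more useful.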

Sorry, how can I find the ESPHome log?

Never mind, I turned on debug logging and looked at the raw system log as well as the log file. I'll add it here.

OK, I think I found the relevant log sections. I ran it once with Home Assistant Cloud, and then again with the local Kokoro pipeline.

With home assistant cloud:

2025-04-25 13:30:55.938 DEBUG (MainThread) [aioesphomeapi.connection] home-assistant-voice-09560c @ 192.168.1.177: Got message of type SubscribeLogsResponse: level: LOG_LEVEL_DEBUG
message: "\033[0;36m[D][media_player:074]: 'Media Player' - Setting\033[0m"

2025-04-25 13:30:55.939 DEBUG (MainThread) [homeassistant.components.esphome.manager] Home Assistant Voice 09560c: [D][media_player:074]: 'Media Player' - Setting
2025-04-25 13:30:55.939 DEBUG (MainThread) [aioesphomeapi.connection] home-assistant-voice-09560c @ 192.168.1.177: Got message of type SubscribeLogsResponse: level: LOG_LEVEL_DEBUG
message: "\033[0;36m[D][media_player:081]:   Media URL: http://192.168.1.214:8123/api/esphome/ffmpeg_proxy/fd9045c30e9e0283aaa284e04e56dce7/CEuQvByWv6wn2QJ9Q5NFGw.flac\033[0m"

2025-04-25 13:30:55.939 DEBUG (MainThread) [homeassistant.components.esphome.manager] Home Assistant Voice 09560c: [D][media_player:081]:   Media URL: http://192.168.1.214:8123/api/esphome/ffmpeg_proxy/fd9045c30e9e0283aaa284e04e56dce7/CEuQvByWv6wn2QJ9Q5NFGw.flac
2025-04-25 13:30:55.944 DEBUG (MainThread) [aioesphomeapi.connection] home-assistant-voice-09560c @ 192.168.1.177: Got message of type SubscribeLogsResponse: level: LOG_LEVEL_DEBUG
message: "\033[0;36m[D][media_player:087]:  Announcement: yes\033[0m"

2025-04-25 13:30:55.945 DEBUG (MainThread) [homeassistant.components.esphome.manager] Home Assistant Voice 09560c: [D][media_player:087]:  Announcement: yes
2025-04-25 13:30:55.949 DEBUG (MainThread) [aioesphomeapi.connection] home-assistant-voice-09560c @ 192.168.1.177: Got message of type SubscribeLogsResponse: level: LOG_LEVEL_DEBUG
message: "\033[0;36m[D][power_supply:033]: Enabling power supply.\033[0m"

2025-04-25 13:30:55.949 DEBUG (MainThread) [homeassistant.components.esphome.manager] Home Assistant Voice 09560c: [D][power_supply:033]: Enabling power supply.
2025-04-25 13:30:55.979 DEBUG (MainThread) [aioesphomeapi.connection] home-assistant-voice-09560c @ 192.168.1.177: Got message of type MediaPlayerStateResponse: key: 2232357057
state: MEDIA_PLAYER_STATE_PLAYING
volume: 1

2025-04-25 13:30:55.982 DEBUG (MainThread) [aioesphomeapi.connection] home-assistant-voice-09560c @ 192.168.1.177: Got message of type SubscribeLogsResponse: level: LOG_LEVEL_DEBUG
message: "\033[0;36m[D][speaker_media_player:426]: State changed to ANNOUNCING\033[0m"

2025-04-25 13:30:55.983 DEBUG (MainThread) [homeassistant.components.esphome.manager] Home Assistant Voice 09560c: [D][speaker_media_player:426]: State changed to ANNOUNCING
2025-04-25 13:30:55.998 DEBUG (MainThread) [homeassistant.components.esphome.ffmpeg_proxy] ffmpeg -i http://192.168.1.214:8123/api/tts_proxy/fqApRSKXtKN-XhjbX-WGow.mp3 -f flac -ar 48000 -ac 1 -sample_fmt s16 -map_metadata -1 -vn -nostats pipe:
2025-04-25 13:30:56.051 DEBUG (MainThread) [aioesphomeapi.connection] home-assistant-voice-09560c @ 192.168.1.177: Got message of type SubscribeLogsResponse: level: LOG_LEVEL_DEBUG
message: "\033[0;36m[D][ring_buffer:034]\033[1;31m[ann_read]\033[0;36m: Created ring buffer with size 1000000\033[0m"

2025-04-25 13:30:56.051 DEBUG (MainThread) [homeassistant.components.esphome.manager] Home Assistant Voice 09560c: [D][ring_buffer:034][ann_read]: Created ring buffer with size 1000000
2025-04-25 13:30:56.051 DEBUG (MainThread) [aioesphomeapi.connection] home-assistant-voice-09560c @ 192.168.1.177: Got message of type SubscribeLogsResponse: level: LOG_LEVEL_DEBUG
message: "\033[0;36m[D][speaker_media_player.pipeline:114]: Reading FLAC file type\033[0m"

2025-04-25 13:30:56.051 DEBUG (MainThread) [homeassistant.components.esphome.manager] Home Assistant Voice 09560c: [D][speaker_media_player.pipeline:114]: Reading FLAC file type
2025-04-25 13:30:56.095 DEBUG (MainThread) [homeassistant.components.esphome.ffmpeg_proxy] ffmpeg[555] output: ffmpeg version 6.1.2 Copyright (c) 2000-2024 the FFmpeg developers
2025-04-25 13:30:56.095 DEBUG (MainThread) [homeassistant.components.esphome.ffmpeg_proxy] ffmpeg[555] output:   built with gcc 14.2.0 (Alpine 14.2.0)
2025-04-25 13:30:56.095 DEBUG (MainThread) [homeassistant.components.esphome.ffmpeg_proxy] ffmpeg[555] output:   configuration: --prefix=/usr --disable-librtmp --disable-lzma --disable-static --disable-stripping --enable-avfilter --enable-gpl --enable-ladspa --enable-libaom --enable-libass --enable-libbluray --enable-libdav1d --enable-libdrm --enable-libfontconfig --enable-libfreetype --enable-libfribidi --enable-libharfbuzz --enable-libmp3lame --enable-libopenmpt --enable-libopus --enable-libplacebo --enable-libpulse --enable-librav1e --enable-librist --enable-libsoxr --enable-libsrt --enable-libssh --enable-libtheora --enable-libv4l2 --enable-libvidstab --enable-libvorbis --enable-libvpx --enable-libwebp --enable-libx264 --enable-libx265 --enable-libxcb --enable-libxml2 --enable-libxvid --enable-libzimg --enable-libzmq --enable-lto=auto --enable-lv2 --enable-openssl --enable-pic --enable-postproc --enable-pthreads --enable-shared --enable-vaapi --enable-vdpau --enable-version3 --enable-vulkan --optflags=-O3 --enable-libjxl --enable-libsvtav1
2025-04-25 13:30:56.095 DEBUG (MainThread) [homeassistant.components.esphome.ffmpeg_proxy] ffmpeg[555] output:   libavutil      58. 29.100 / 58. 29.100
2025-04-25 13:30:56.095 DEBUG (MainThread) [homeassistant.components.esphome.ffmpeg_proxy] ffmpeg[555] output:   libavcodec     60. 31.102 / 60. 31.102
2025-04-25 13:30:56.096 DEBUG (MainThread) [homeassistant.components.esphome.ffmpeg_proxy] ffmpeg[555] output:   libavformat    60. 16.100 / 60. 16.100
2025-04-25 13:30:56.096 DEBUG (MainThread) [homeassistant.components.esphome.ffmpeg_proxy] ffmpeg[555] output:   libavdevice    60.  3.100 / 60.  3.100
2025-04-25 13:30:56.096 DEBUG (MainThread) [homeassistant.components.esphome.ffmpeg_proxy] ffmpeg[555] output:   libavfilter     9. 12.100 /  9. 12.100
2025-04-25 13:30:56.096 DEBUG (MainThread) [homeassistant.components.esphome.ffmpeg_proxy] ffmpeg[555] output:   libswscale      7.  5.100 /  7.  5.100
2025-04-25 13:30:56.096 DEBUG (MainThread) [homeassistant.components.esphome.ffmpeg_proxy] ffmpeg[555] output:   libswresample   4. 12.100 /  4. 12.100
2025-04-25 13:30:56.096 DEBUG (MainThread) [homeassistant.components.esphome.ffmpeg_proxy] ffmpeg[555] output:   libpostproc    57.  3.100 / 57.  3.100
2025-04-25 13:30:56.111 DEBUG (MainThread) [homeassistant.components.esphome.ffmpeg_proxy] ffmpeg[555] output: Input #0, mp3, from 'http://192.168.1.214:8123/api/tts_proxy/fqApRSKXtKN-XhjbX-WGow.mp3':
2025-04-25 13:30:56.112 DEBUG (MainThread) [homeassistant.components.esphome.ffmpeg_proxy] ffmpeg[555] output:   Metadata:
2025-04-25 13:30:56.112 DEBUG (MainThread) [homeassistant.components.esphome.ffmpeg_proxy] ffmpeg[555] output:     Text            : en-US
2025-04-25 13:30:56.112 DEBUG (MainThread) [homeassistant.components.esphome.ffmpeg_proxy] ffmpeg[555] output:   Duration: N/A, start: 0.000000, bitrate: 48 kb/s
2025-04-25 13:30:56.112 DEBUG (MainThread) [homeassistant.components.esphome.ffmpeg_proxy] ffmpeg[555] output:   Stream #0:0: Audio: mp3, 24000 Hz, mono, fltp, 48 kb/s
2025-04-25 13:30:56.116 DEBUG (MainThread) [homeassistant.components.esphome.ffmpeg_proxy] ffmpeg[555] output: Stream mapping:
2025-04-25 13:30:56.116 DEBUG (MainThread) [homeassistant.components.esphome.ffmpeg_proxy] ffmpeg[555] output:   Stream #0:0 -> #0:0 (mp3 (mp3float) -> flac (native))
2025-04-25 13:30:56.116 DEBUG (MainThread) [homeassistant.components.esphome.ffmpeg_proxy] ffmpeg[555] output: Press [q] to stop, [?] for help
2025-04-25 13:30:56.120 DEBUG (MainThread) [homeassistant.components.esphome.ffmpeg_proxy] ffmpeg[555] output: Output #0, flac, to 'pipe:':
2025-04-25 13:30:56.120 DEBUG (MainThread) [homeassistant.components.esphome.ffmpeg_proxy] ffmpeg[555] output:   Metadata:
2025-04-25 13:30:56.120 DEBUG (MainThread) [homeassistant.components.esphome.ffmpeg_proxy] ffmpeg[555] output:     encoder         : Lavf60.16.100
2025-04-25 13:30:56.120 DEBUG (MainThread) [homeassistant.components.esphome.ffmpeg_proxy] ffmpeg[555] output:   Stream #0:0: Audio: flac, 48000 Hz, mono, s16, 128 kb/s
2025-04-25 13:30:56.121 DEBUG (MainThread) [homeassistant.components.esphome.ffmpeg_proxy] ffmpeg[555] output:     Metadata:
2025-04-25 13:30:56.121 DEBUG (MainThread) [homeassistant.components.esphome.ffmpeg_proxy] ffmpeg[555] output:       encoder         : Lavc60.31.102 flac
2025-04-25 13:30:56.169 DEBUG (MainThread) [homeassistant.components.esphome.ffmpeg_proxy] ffmpeg[555] output: [flac @ 0x7fa48f0280] unable to rewrite FLAC header.
2025-04-25 13:30:56.170 DEBUG (MainThread) [homeassistant.components.esphome.ffmpeg_proxy] ffmpeg[555] output: [out#0/flac @ 0x7fa4970300] video:0kB audio:58kB subtitle:0kB other streams:0kB global headers:0kB muxing overhead: 13.871594%
2025-04-25 13:30:56.170 DEBUG (MainThread) [homeassistant.components.esphome.ffmpeg_proxy] ffmpeg[555] output: size=      66kB time=00:00:02.68 bitrate= 202.5kbits/s speed=50.5x
2025-04-25 13:30:56.267 DEBUG (MainThread) [aioesphomeapi.connection] home-assistant-voice-09560c @ 192.168.1.177: Got message of type SubscribeLogsResponse: level: LOG_LEVEL_DEBUG
message: "\033[0;36m[D][speaker_media_player.pipeline:124]: Decoded audio has 1 channels, 48000 Hz sample rate, and 16 bits per sample\033[0m"

2025-04-25 13:30:56.268 DEBUG (MainThread) [homeassistant.components.esphome.manager] Home Assistant Voice 09560c: [D][speaker_media_player.pipeline:124]: Decoded audio has 1 channels, 48000 Hz sample rate, and 16 bits per sample
2025-04-25 13:30:56.345 DEBUG (MainThread) [aioesphomeapi.connection] home-assistant-voice-09560c @ 192.168.1.177: Got message of type SubscribeLogsResponse: level: LOG_LEVEL_DEBUG
message: "\033[0;36m[D][ring_buffer:034]: Created ring buffer with size 9600\033[0m"

2025-04-25 13:30:56.345 DEBUG (MainThread) [homeassistant.components.esphome.manager] Home Assistant Voice 09560c: [D][ring_buffer:034]: Created ring buffer with size 9600
2025-04-25 13:30:56.353 DEBUG (MainThread) [aioesphomeapi.connection] home-assistant-voice-09560c @ 192.168.1.177: Got message of type SubscribeLogsResponse: level: LOG_LEVEL_DEBUG
message: "\033[0;36m[D][speaker_mixer:310]: Starting speaker mixer\033[0m"

2025-04-25 13:30:56.353 DEBUG (MainThread) [homeassistant.components.esphome.manager] Home Assistant Voice 09560c: [D][speaker_mixer:310]: Starting speaker mixer
2025-04-25 13:30:56.354 DEBUG (MainThread) [aioesphomeapi.connection] home-assistant-voice-09560c @ 192.168.1.177: Got message of type SubscribeLogsResponse: level: LOG_LEVEL_DEBUG
message: "\033[0;36m[D][speaker_mixer:318]: Started speaker mixer\033[0m"

2025-04-25 13:30:56.354 DEBUG (MainThread) [homeassistant.components.esphome.manager] Home Assistant Voice 09560c: [D][speaker_mixer:318]: Started speaker mixer
2025-04-25 13:30:57.949 DEBUG (MainThread) [aioesphomeapi.connection] home-assistant-voice-09560c @ 192.168.1.177: Got message of type LightStateResponse: key: 1459351809
brightness: 0.66
red: 0.099051632
green: 0.772392
blue: 1
white: 1
color_brightness: 1
color_mode: 35
cold_white: 1
warm_white: 1

2025-04-25 13:30:58.760 DEBUG (MainThread) [aioesphomeapi.connection] home-assistant-voice-09560c @ 192.168.1.177: Got message of type MediaPlayerStateResponse: key: 2232357057
state: MEDIA_PLAYER_STATE_IDLE
volume: 1

2025-04-25 13:30:58.763 DEBUG (MainThread) [aioesphomeapi.connection] home-assistant-voice-09560c @ 192.168.1.177: Got message of type SubscribeLogsResponse: level: LOG_LEVEL_DEBUG
message: "\033[0;36m[D][speaker_media_player:426]: State changed to IDLE\033[0m"

2025-04-25 13:30:58.764 DEBUG (MainThread) [homeassistant.components.esphome.manager] Home Assistant Voice 09560c: [D][speaker_media_player:426]: State changed to IDLE
2025-04-25 13:30:58.896 DEBUG (MainThread) [aioesphomeapi.connection] home-assistant-voice-09560c @ 192.168.1.177: Got message of type SubscribeLogsResponse: level: LOG_LEVEL_DEBUG
message: "\033[0;36m[D][speaker_mixer:323]: Stopping speaker mixer\033[0m"

2025-04-25 13:30:58.896 DEBUG (MainThread) [homeassistant.components.esphome.manager] Home Assistant Voice 09560c: [D][speaker_mixer:323]: Stopping speaker mixer
2025-04-25 13:31:08.064 DEBUG (MainThread) [aioesphomeapi.connection] home-assistant-voice-09560c @ 192.168.1.177: Got message of type SubscribeLogsResponse: level: LOG_LEVEL_DEBUG
message: "\033[0;36m[D][power_supply:048]: Disabling power supply.\033[0m"

2025-04-25 13:31:08.064 DEBUG (MainThread) [homeassistant.components.esphome.manager] Home Assistant Voice 09560c: [D][power_supply:048]: Disabling power supply.

With the local Kokoro pipeline:

2025-04-25 13:23:44.028 DEBUG (MainThread) [aioesphomeapi.connection] home-assistant-voice-09560c @ 192.168.1.177: Got message of type SubscribeLogsResponse: level: LOG_LEVEL_DEBUG
message: "\033[0;36m[D][media_player:074]: 'Media Player' - Setting\033[0m"

2025-04-25 13:23:44.028 DEBUG (MainThread) [homeassistant.components.esphome.manager] Home Assistant Voice 09560c: [D][media_player:074]: 'Media Player' - Setting
2025-04-25 13:23:44.030 DEBUG (MainThread) [aioesphomeapi.connection] home-assistant-voice-09560c @ 192.168.1.177: Got message of type SubscribeLogsResponse: level: LOG_LEVEL_DEBUG
message: "\033[0;36m[D][media_player:081]:   Media URL: http://192.168.1.214:8123/api/esphome/ffmpeg_proxy/fd9045c30e9e0283aaa284e04e56dce7/_ZAKGRaRheWW8MIsGZ4VLw.flac\033[0m"

2025-04-25 13:23:44.031 DEBUG (MainThread) [homeassistant.components.esphome.manager] Home Assistant Voice 09560c: [D][media_player:081]:   Media URL: http://192.168.1.214:8123/api/esphome/ffmpeg_proxy/fd9045c30e9e0283aaa284e04e56dce7/_ZAKGRaRheWW8MIsGZ4VLw.flac
2025-04-25 13:23:44.031 DEBUG (MainThread) [aioesphomeapi.connection] home-assistant-voice-09560c @ 192.168.1.177: Got message of type SubscribeLogsResponse: level: LOG_LEVEL_DEBUG
message: "\033[0;36m[D][media_player:087]:  Announcement: yes\033[0m"

2025-04-25 13:23:44.031 DEBUG (MainThread) [homeassistant.components.esphome.manager] Home Assistant Voice 09560c: [D][media_player:087]:  Announcement: yes
2025-04-25 13:23:44.040 DEBUG (MainThread) [aioesphomeapi.connection] home-assistant-voice-09560c @ 192.168.1.177: Got message of type SubscribeLogsResponse: level: LOG_LEVEL_DEBUG
message: "\033[0;36m[D][power_supply:033]: Enabling power supply.\033[0m"

2025-04-25 13:23:44.040 DEBUG (MainThread) [homeassistant.components.esphome.manager] Home Assistant Voice 09560c: [D][power_supply:033]: Enabling power supply.
2025-04-25 13:23:44.067 DEBUG (MainThread) [aioesphomeapi.connection] home-assistant-voice-09560c @ 192.168.1.177: Got message of type MediaPlayerStateResponse: key: 2232357057
state: MEDIA_PLAYER_STATE_PLAYING
volume: 1

2025-04-25 13:23:44.071 DEBUG (MainThread) [aioesphomeapi.connection] home-assistant-voice-09560c @ 192.168.1.177: Got message of type SubscribeLogsResponse: level: LOG_LEVEL_DEBUG
message: "\033[0;36m[D][speaker_media_player:426]: State changed to ANNOUNCING\033[0m"

2025-04-25 13:23:44.072 DEBUG (MainThread) [homeassistant.components.esphome.manager] Home Assistant Voice 09560c: [D][speaker_media_player:426]: State changed to ANNOUNCING
2025-04-25 13:23:44.086 DEBUG (MainThread) [homeassistant.components.esphome.ffmpeg_proxy] ffmpeg -i http://192.168.1.214:8123/api/tts_proxy/4nQrZzgQxB_3FwQETovLrg.mp3 -f flac -ar 48000 -ac 1 -sample_fmt s16 -map_metadata -1 -vn -nostats pipe:
2025-04-25 13:23:44.098 DEBUG (MainThread) [aioesphomeapi.connection] home-assistant-voice-09560c @ 192.168.1.177: Got message of type SubscribeLogsResponse: level: LOG_LEVEL_DEBUG
message: "\033[0;36m[D][ring_buffer:034]\033[1;31m[ann_read]\033[0;36m: Created ring buffer with size 1000000\033[0m"

2025-04-25 13:23:44.098 DEBUG (MainThread) [homeassistant.components.esphome.manager] Home Assistant Voice 09560c: [D][ring_buffer:034][ann_read]: Created ring buffer with size 1000000
2025-04-25 13:23:44.115 DEBUG (MainThread) [aioesphomeapi.connection] home-assistant-voice-09560c @ 192.168.1.177: Got message of type SubscribeLogsResponse: level: LOG_LEVEL_DEBUG
message: "\033[0;36m[D][speaker_media_player.pipeline:114]: Reading FLAC file type\033[0m"

2025-04-25 13:23:44.115 DEBUG (MainThread) [homeassistant.components.esphome.manager] Home Assistant Voice 09560c: [D][speaker_media_player.pipeline:114]: Reading FLAC file type
2025-04-25 13:23:44.181 DEBUG (MainThread) [homeassistant.components.esphome.ffmpeg_proxy] ffmpeg[553] output: ffmpeg version 6.1.2 Copyright (c) 2000-2024 the FFmpeg developers
2025-04-25 13:23:44.182 DEBUG (MainThread) [homeassistant.components.esphome.ffmpeg_proxy] ffmpeg[553] output:   built with gcc 14.2.0 (Alpine 14.2.0)
2025-04-25 13:23:44.182 DEBUG (MainThread) [homeassistant.components.esphome.ffmpeg_proxy] ffmpeg[553] output:   configuration: --prefix=/usr --disable-librtmp --disable-lzma --disable-static --disable-stripping --enable-avfilter --enable-gpl --enable-ladspa --enable-libaom --enable-libass --enable-libbluray --enable-libdav1d --enable-libdrm --enable-libfontconfig --enable-libfreetype --enable-libfribidi --enable-libharfbuzz --enable-libmp3lame --enable-libopenmpt --enable-libopus --enable-libplacebo --enable-libpulse --enable-librav1e --enable-librist --enable-libsoxr --enable-libsrt --enable-libssh --enable-libtheora --enable-libv4l2 --enable-libvidstab --enable-libvorbis --enable-libvpx --enable-libwebp --enable-libx264 --enable-libx265 --enable-libxcb --enable-libxml2 --enable-libxvid --enable-libzimg --enable-libzmq --enable-lto=auto --enable-lv2 --enable-openssl --enable-pic --enable-postproc --enable-pthreads --enable-shared --enable-vaapi --enable-vdpau --enable-version3 --enable-vulkan --optflags=-O3 --enable-libjxl --enable-libsvtav1
2025-04-25 13:23:44.182 DEBUG (MainThread) [homeassistant.components.esphome.ffmpeg_proxy] ffmpeg[553] output:   libavutil      58. 29.100 / 58. 29.100
2025-04-25 13:23:44.182 DEBUG (MainThread) [homeassistant.components.esphome.ffmpeg_proxy] ffmpeg[553] output:   libavcodec     60. 31.102 / 60. 31.102
2025-04-25 13:23:44.182 DEBUG (MainThread) [homeassistant.components.esphome.ffmpeg_proxy] ffmpeg[553] output:   libavformat    60. 16.100 / 60. 16.100
2025-04-25 13:23:44.182 DEBUG (MainThread) [homeassistant.components.esphome.ffmpeg_proxy] ffmpeg[553] output:   libavdevice    60.  3.100 / 60.  3.100
2025-04-25 13:23:44.182 DEBUG (MainThread) [homeassistant.components.esphome.ffmpeg_proxy] ffmpeg[553] output:   libavfilter     9. 12.100 /  9. 12.100
2025-04-25 13:23:44.182 DEBUG (MainThread) [homeassistant.components.esphome.ffmpeg_proxy] ffmpeg[553] output:   libswscale      7.  5.100 /  7.  5.100
2025-04-25 13:23:44.182 DEBUG (MainThread) [homeassistant.components.esphome.ffmpeg_proxy] ffmpeg[553] output:   libswresample   4. 12.100 /  4. 12.100
2025-04-25 13:23:44.182 DEBUG (MainThread) [homeassistant.components.esphome.ffmpeg_proxy] ffmpeg[553] output:   libpostproc    57.  3.100 / 57.  3.100
2025-04-25 13:23:46.023 DEBUG (MainThread) [aioesphomeapi.connection] home-assistant-voice-09560c @ 192.168.1.177: Got message of type LightStateResponse: key: 1459351809
brightness: 0.66
red: 0.099051632
green: 0.772392
blue: 1
white: 1
color_brightness: 1
color_mode: 35
cold_white: 1
warm_white: 1

2025-04-25 13:23:49.105 DEBUG (MainThread) [homeassistant.components.esphome.ffmpeg_proxy] ffmpeg transcoding cancelled
2025-04-25 13:23:49.121 DEBUG (MainThread) [aioesphomeapi.connection] home-assistant-voice-09560c @ 192.168.1.177: Got message of type SubscribeLogsResponse: level: LOG_LEVEL_ERROR
message: "\033[1;31m[E][speaker_media_player:339]: The announcement pipeline's file reader encountered an error.\033[0m"

2025-04-25 13:23:49.121 ERROR (MainThread) [homeassistant.components.esphome.manager] Home Assistant Voice 09560c: [E][speaker_media_player:339]: The announcement pipeline's file reader encountered an error.
2025-04-25 13:23:49.157 DEBUG (MainThread) [aioesphomeapi.connection] home-assistant-voice-09560c @ 192.168.1.177: Got message of type MediaPlayerStateResponse: key: 2232357057
state: MEDIA_PLAYER_STATE_IDLE
volume: 1

2025-04-25 13:23:49.158 DEBUG (MainThread) [aioesphomeapi.connection] home-assistant-voice-09560c @ 192.168.1.177: Got message of type SubscribeLogsResponse: level: LOG_LEVEL_DEBUG
message: "\033[0;36m[D][speaker_media_player:426]: State changed to IDLE\033[0m"

2025-04-25 13:23:49.159 DEBUG (MainThread) [homeassistant.components.esphome.manager] Home Assistant Voice 09560c: [D][speaker_media_player:426]: State changed to IDLE
2025-04-25 13:23:51.174 DEBUG (MainThread) [aioesphomeapi.connection] home-assistant-voice-091773 @ 192.168.1.138: Sending PingRequest: 
2025-04-25 13:23:51.174 DEBUG (MainThread) [aioesphomeapi._frame_helper.base] home-assistant-voice-091773 @ 192.168.1.138: Sending frame: [000007]
2025-04-25 13:23:51.248 DEBUG (MainThread) [aioesphomeapi.connection] home-assistant-voice-091773 @ 192.168.1.138: Got message of type PingResponse: 
2025-04-25 13:23:56.048 DEBUG (MainThread) [aioesphomeapi.connection] home-assistant-voice-09560c @ 192.168.1.177: Got message of type SubscribeLogsResponse: level: LOG_LEVEL_DEBUG
message: "\033[0;36m[D][power_supply:048]: Disabling power supply.\033[0m"

2025-04-25 13:23:56.049 DEBUG (MainThread) [homeassistant.components.esphome.manager] Home Assistant Voice 09560c: [D][power_supply:048]: Disabling power supply.
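Putting the two logs next to each other, the difference is in the ffmpeg proxy on the HA side: in the Cloud run, ffmpeg reports `Input #0` about 0.1 s after it is launched, while in the Kokoro run it never receives any input and is cancelled about 5 s after launch. The sketch below extracts that from the log lines (shortened here with `...`); the string markers it searches for are copied from these logs, and the helper names are hypothetical.

```python
from datetime import datetime


def ts(line: str) -> datetime:
    # Home Assistant log lines start with "YYYY-MM-DD HH:MM:SS.mmm" (23 chars).
    return datetime.strptime(line[:23], "%Y-%m-%d %H:%M:%S.%f")


def ffmpeg_outcome(lines: list[str]) -> tuple[str, float]:
    """Return (outcome, seconds since the ffmpeg_proxy launched ffmpeg)."""
    start = None
    for line in lines:
        if start is None:
            if "ffmpeg_proxy] ffmpeg -i " in line:
                start = ts(line)
        elif "output: Input #0" in line:
            return "got input", (ts(line) - start).total_seconds()
        elif "transcoding cancelled" in line:
            return "cancelled", (ts(line) - start).total_seconds()
    return "no outcome found", float("nan")


cloud = [
    "2025-04-25 13:30:55.998 DEBUG (MainThread) [homeassistant.components.esphome.ffmpeg_proxy] ffmpeg -i http://192.168.1.214:8123/api/tts_proxy/fqApRSKXtKN-XhjbX-WGow.mp3 ...",
    "2025-04-25 13:30:56.111 DEBUG (MainThread) [homeassistant.components.esphome.ffmpeg_proxy] ffmpeg[555] output: Input #0, mp3, from '...':",
]
kokoro = [
    "2025-04-25 13:23:44.086 DEBUG (MainThread) [homeassistant.components.esphome.ffmpeg_proxy] ffmpeg -i http://192.168.1.214:8123/api/tts_proxy/4nQrZzgQxB_3FwQETovLrg.mp3 ...",
    "2025-04-25 13:23:49.105 DEBUG (MainThread) [homeassistant.components.esphome.ffmpeg_proxy] ffmpeg transcoding cancelled",
]

print(ffmpeg_outcome(cloud))   # ('got input', 0.113)
print(ffmpeg_outcome(kokoro))  # ('cancelled', 5.019)
```

In other words, ffmpeg is waiting on the `tts_proxy` MP3 in both cases; with Kokoro that MP3 simply never starts arriving before the satellite gives up.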

To me it seems that the issue is that the Home Assistant speak service isn't calling Kokoro via the Wyoming protocol…

It looks like in both cases HAVPE calls my Raspberry Pi (IP ending in 214) for the HA-side ffmpeg function.

I would expect the call to go through the Wyoming protocol to my Kokoro Docker container to generate the FLAC/MP3 file, but I see no message going out to my main server's IP.

2025-04-25 13:23:44.072 DEBUG (MainThread) [homeassistant.components.esphome.manager] Home Assistant Voice 09560c: [D][speaker_media_player:426]: State changed to ANNOUNCING
2025-04-25 13:23:44.086 DEBUG (MainThread) [homeassistant.components.esphome.ffmpeg_proxy] ffmpeg -i http://192.168.1.214:8123/api/tts_proxy/4nQrZzgQxB_3FwQETovLrg.mp3 -f flac -ar 48000 -ac 1 -sample_fmt s16 -map_metadata -1 -vn -nostats pipe:
2025-04-25 13:23:44.098 DEBUG (MainThread) [aioesphomeapi.connection] home-assistant-voice-09560c @ 192.168.1.177: Got message of type SubscribeLogsResponse: level: LOG_LEVEL_DEBUG
message: "\033[0;36m[D][ring_buffer:034]\033[1;31m[ann_read]\033[0;36m: Created ring buffer with size 1000000\033[0m"

This call is the same with Cloud and with Kokoro… but I don't really understand this code very well. Is HA doing something in the background that doesn't show up in the log?
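One way to read the quoted line: both URLs point at the HA host itself (.214), so no hop in these log lines goes to the Kokoro machine directly. My assumption, not confirmed in the thread, is that HA contacts the TTS backend itself while serving the `tts_proxy` URL, and that request is simply not logged by the `esphome`/`aioesphomeapi` loggers shown here. A quick way to split the URLs (the `hop` helper is hypothetical):

```python
from urllib.parse import urlparse


def hop(url: str) -> tuple[str, int, str]:
    """Return (host, port, API area) for a Home Assistant URL like /api/<area>/..."""
    parts = urlparse(url)
    area = parts.path.split("/")[2]  # "/api/tts_proxy/x.mp3" -> "tts_proxy"
    return parts.hostname, parts.port, area


# URLs from the Kokoro run in the log above.
satellite_url = ("http://192.168.1.214:8123/api/esphome/ffmpeg_proxy/"
                 "fd9045c30e9e0283aaa284e04e56dce7/_ZAKGRaRheWW8MIsGZ4VLw.flac")
ffmpeg_input = "http://192.168.1.214:8123/api/tts_proxy/4nQrZzgQxB_3FwQETovLrg.mp3"

print("satellite fetches:", hop(satellite_url))  # ('192.168.1.214', 8123, 'esphome')
print("HA's ffmpeg reads:", hop(ffmpeg_input))   # ('192.168.1.214', 8123, 'tts_proxy')
```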

The only time I see a call to my main server is a ping well after the pipeline error; see the ping to 138.

2025-04-25 13:23:49.159 DEBUG (MainThread) [homeassistant.components.esphome.manager] Home Assistant Voice 09560c: [D][speaker_media_player:426]: State changed to IDLE
2025-04-25 13:23:51.174 DEBUG (MainThread) [aioesphomeapi.connection] home-assistant-voice-091773 @ 192.168.1.138: Sending PingRequest:
2025-04-25 13:23:51.174 DEBUG (MainThread) [aioesphomeapi._frame_helper.base] home-assistant-voice-091773 @ 192.168.1.138: Sending frame: [000007]
2025-04-25 13:23:51.248 DEBUG (MainThread) [aioesphomeapi.connection] home-assistant-voice-091773 @ 192.168.1.138: Got message of type PingResponse:
2025-04-25 13:23:56.048 DEBUG (MainThread) [aioesphomeapi.connection] home-assistant-voice-09560c @ 192.168.1.177: Got message of type SubscribeLogsResponse: level: LOG_LEVEL_DEBUG
message: "\033[0;36m[D][power_supply:048]: Disabling power supply.\033[0m"

2025-04-25 13:23:56.049 DEBUG (MainThread) [homeassistant.components.esphome.manager] Home Assistant Voice 09560c: [D][power_supply:048]: Disabling power supply.
2025-04-25 13:23:44.115 DEBUG (MainThread) [aioesphomeapi.connection] home-assistant-voice-09560c @ 192.168.1.177: Got message of type SubscribeLogsResponse: level: LOG_LEVEL_DEBUG
message: "\033[0;36m[D][speaker_media_player.pipeline:114]: Reading FLAC file type\033[0m"
---
2025-04-25 13:23:49.105 DEBUG (MainThread) [homeassistant.components.esphome.ffmpeg_proxy] ffmpeg transcoding cancelled
2025-04-25 13:23:49.121 DEBUG (MainThread) [aioesphomeapi.connection] home-assistant-voice-09560c @ 192.168.1.177: Got message of type SubscribeLogsResponse: level: LOG_LEVEL_ERROR
message: "\033[1;31m[E][speaker_media_player:339]: The announcement pipeline's file reader encountered an error.\033[0m"

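This is the case I mentioned at the very beginning.
Look at the timings: reading the FLAC stream starts at 13:23:44.115, ffmpeg is cancelled at 13:23:49.105, and the error follows at 13:23:49.121, about 5 seconds later.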

Try playing the phrase from the media browser first, with the browser as the output, and then on the satellite.
In that case the request uses the audio from the cache, so the audio should play on the satellite.
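The cache trick works because the media browser has no 5-second limit: the slow engine finishes, the result is cached, and the satellite's later request is served from the cache. A toy model of that, assuming `cache: true` and a cache keyed only on engine and message (the real key may include more, such as language and options); the function names and the 6-second synthesis time are hypothetical.

```python
Key = tuple[str, str]  # (TTS engine, message): a simplified cache key


def warm_in_media_browser(cache: dict[Key, bytes], engine: str, message: str) -> None:
    # No 5 s limit in the browser: the engine eventually finishes and,
    # with caching enabled, the audio is stored for later requests.
    cache[(engine, message)] = b"fake-audio"


def play_on_satellite(cache: dict[Key, bytes], engine: str, message: str,
                      synth_seconds: float, budget: float = 5.0) -> str:
    if (engine, message) in cache:
        return "played from cache"
    if synth_seconds > budget:
        return "file reader error"  # satellite gives up before audio arrives
    cache[(engine, message)] = b"fake-audio"
    return "played after synthesis"


cache: dict[Key, bytes] = {}
msg = "Hi, someone is at the door"
print(play_on_satellite(cache, "tts.openai", msg, synth_seconds=6.0))  # file reader error
warm_in_media_browser(cache, "tts.openai", msg)
print(play_on_satellite(cache, "tts.openai", msg, synth_seconds=6.0))  # played from cache
```

A consequence of this design: warming only helps for fixed phrases like "someone is at the door". Any message whose text changes per run would miss the cache.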

I asked ChatGPT, and it suggested it was a file type issue, lol: that HA was expecting FLAC instead of MP3… When I reviewed the ESPHome documentation, that seems to be wrong…

Are you saying that my system responds with the Kokoro sound file more than 5 seconds after the call, so the pipeline fails? I'm running HAVPE, HA on a Raspberry Pi 4, and Kokoro in a Docker container on my desktop with a 4090. I have sometimes noticed substantial lag when I make a call to Assist, as it loads everything into memory. But even when I do this after “warming” everything up, it still fails.

  • If it is a timing issue, is there a recommended setup that would decrease the latency?
  • I'm not sure I understand how to set up the automation so that it runs via media browser > satellite. Could you send me a link to more information I can work from?
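Before changing the setup, it may be worth measuring how long Kokoro takes to return a complete file for this phrase when called directly rather than through HA. This sketch assumes the container exposes an OpenAI-compatible `/v1/audio/speech` endpoint on port 8880 (common for Kokoro server images, but check yours); the host, model, and voice names are placeholders, and `speech_request`/`time_calls` are hypothetical helpers.

```python
import json
import statistics
import time
import urllib.request
from typing import Callable


def speech_request(base_url: str, text: str) -> urllib.request.Request:
    # ASSUMPTION: OpenAI-compatible speech endpoint; model/voice are placeholders.
    body = {"model": "kokoro", "voice": "af_bella", "input": text,
            "response_format": "mp3"}
    return urllib.request.Request(
        f"{base_url.rstrip('/')}/v1/audio/speech",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def time_calls(call: Callable[[], object], runs: int = 5) -> dict[str, float]:
    """Run `call` several times and report wall-clock latency in seconds."""
    samples = []
    for _ in range(runs):
        t0 = time.perf_counter()
        call()
        samples.append(time.perf_counter() - t0)
    return {"min": min(samples), "median": statistics.median(samples),
            "max": max(samples)}


def fetcher(req: urllib.request.Request) -> Callable[[], bytes]:
    # Reads the whole response, i.e. the full audio file, as HA would need it.
    def fetch() -> bytes:
        with urllib.request.urlopen(req, timeout=30) as resp:
            return resp.read()
    return fetch


if __name__ == "__main__":
    req = speech_request("http://KOKORO_HOST:8880", "Hi, someone is at the door")
    print(req.full_url)
    # Against a real server: print(time_calls(fetcher(req)))
```

If the max stays well under 5 s when called directly but the announcement still fails, the delay is more likely in the integration between HA and Kokoro than in the GPU.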

Thanks for your help so far!

It turns out it may be the Wyoming protocol middleware I'm using. The developer seems to have figured out the issue. I'm using announce instead of speak for now, but I'll work out how to update the Wyoming protocol middleware, which will hopefully also fix speak.

I didn't even consider an integration error 🙃. Usually the main functionality is tested before release, but this happens when you use custom integrations. I hope the author sorts out the problem.
The source selection is located here

Also, you don't need to deal with Wyoming; just update the integration and maybe the container. Stay tuned for updates from the developer.