There's definitely a problem, possibly due to the new VAD or other changes. Looking at the ESPHome logs, there's a significant delay between the speech recognition stage and the start of the intent stage. However, the Assist debug information shows that the recognition itself is very fast (a few hundredths of a second, just like before the update). Now we need to figure out where this additional delay comes from (up to 5 seconds in my case).
Here's an example of a log with a 2.5 second delay in aggressive mode. I spoke the phrase between 16:57:37 and 16:57:39, and the VAD events only arrived after that.
[16:57:36.988][D][voice_assistant:646]: STT started
[16:57:37.285][D][i2s_audio.speaker:111]: Stopping //beep end
[16:57:37.286][D][i2s_audio.speaker:116]: Stopped
[16:57:39.311][D][voice_assistant:624]: Event Type: 11
[16:57:39.312][D][voice_assistant:827]: Starting STT by VAD
[16:57:42.771][D][voice_assistant:624]: Event Type: 12
[16:57:42.771][D][voice_assistant:831]: STT by VAD end
[16:57:42.771][D][voice_assistant:478]: State changed from STREAMING_MICROPHONE to STOP_MICROPHONE
[16:57:42.772][D][voice_assistant:485]: Desired state set to AWAITING_RESPONSE
[16:57:42.783][D][voice_assistant:478]: State changed from STOP_MICROPHONE to STOPPING_MICROPHONE
[16:57:42.790][D][voice_assistant:478]: State changed from STOPPING_MICROPHONE to AWAITING_RESPONSE
[16:57:43.131][D][voice_assistant:624]: Event Type: 4
[16:57:43.131][D][voice_assistant:663]: Speech recognised as: "test phrase"
[16:57:43.136][D][voice_assistant:624]: Event Type: 5
[16:57:43.139][D][voice_assistant:668]: Intent started
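To make the gaps in a log like the one above easier to see, here is a minimal Python sketch that parses the timestamps and prints the time elapsed before each line. The function names (`parse_log`, `stage_gaps`) and the regular expression are my own for illustration, not part of ESPHome, and the sample is a subset of the lines above. It assumes all lines fall on the same day.

```python
import re
from datetime import datetime

# A subset of the log lines from the post above.
LOG = """\
[16:57:36.988][D][voice_assistant:646]: STT started
[16:57:39.311][D][voice_assistant:827]: Starting STT by VAD
[16:57:42.771][D][voice_assistant:831]: STT by VAD end
[16:57:43.131][D][voice_assistant:663]: Speech recognised as: "test phrase"
[16:57:43.139][D][voice_assistant:668]: Intent started
"""

# Matches "[HH:MM:SS.mmm][level][component:line]: message".
LINE_RE = re.compile(r"^\[(\d{2}:\d{2}:\d{2}\.\d{3})\]\[\w\]\[[^\]]+\]: (.*)$")


def parse_log(text):
    """Return a list of (timestamp, message) for every line that matches."""
    events = []
    for line in text.splitlines():
        match = LINE_RE.match(line.strip())
        if match:
            stamp = datetime.strptime(match.group(1), "%H:%M:%S.%f")
            events.append((stamp, match.group(2)))
    return events


def stage_gaps(events):
    """Return (message, seconds since the previous line) for each line after the first."""
    return [
        (later[1], round((later[0] - earlier[0]).total_seconds(), 3))
        for earlier, later in zip(events, events[1:])
    ]


if __name__ == "__main__":
    for message, seconds in stage_gaps(parse_log(LOG)):
        print(f"{seconds:7.3f} s  before  {message}")
```

Run against the sample, the two largest gaps are the ones before the VAD start and VAD end events, while the step from recognised speech to the intent start is only a few milliseconds.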
I was wondering if anybody else was experiencing this.
I am having the same issue. I tried on my Android phone and it is working. Then I tried on another computer (that I don't typically use with HA) and it worked.
I went back to the original computer and restarted the browser: it didn't work. Cleared site data: still didn't work. Restarted the computer: still not working. Nothing in the logs.
I am wondering what I can do for this issue as well.
I would like to report that 2026.1 does not display properly on my Shelly Display XL. For example, switches appear "greyed out" and don't seem to turn yellow when on.
Oh no, I love the work you guys do but this update has completely broken my dashboards. It looks like adjusting the row height of a card is now disabled for some cards. I use the gauge card extensively and made fine adjustments to the height to make them align perfectly; now they're all over the place.
I can't see any reason to have changed this, so please, please, please revert this change, or apply a fix in the next release.
I hope I didn't miss this error in the posts above, but 2026… doesn't load properly via my external address. If I enter the local IP, all is OK, but if I enter my external DNS name it keeps loading… and ends with no connection. If I retry a few times, I eventually get into HA and then it works OK. I just spent the entire day tinkering with my router settings, reverse proxy… to no avail. I have had a reverse proxy set up on my Synology for years and it always worked. My HA is installed in a VM on Proxmox. What changed in 2026.1?
Then I went back to 2025.12.4 and everything works perfectly again.
Voice assistants that use the local Home Assistant are all broken. Switching to Home Assistant Cloud works.
EDIT: Rolling back to 2025.12.5 did NOT fix the issue!
[D][media_player:096]: 'Media Player' - Setting
[D][media_player:103]: Media URL: http://ha.local:8123/api/tts_proxy/02ScObZGHjl8s-ZmITqTiA.flac
[D][media_player:109]: Announcement: yes
[D][voice_assistant:624]: Event Type: 2
[D][voice_assistant:766]: Assist Pipeline ended
[D][http_media_source:244]: Started read and decode tasks for pipeline 1
[D][http_media_source:253]: Pipeline 1 starting
[D][voice_assistant:478]: State changed from STREAMING_RESPONSE to RESPONSE_FINISHED
[D][voice_assistant:485]: Desired state set to RESPONSE_FINISHED
[D][voice_assistant:478]: State changed from RESPONSE_FINISHED to IDLE
[D][voice_assistant:485]: Desired state set to IDLE
[D][light:090]: 'voice_assistant_leds' Setting:
[D][light:103]: State: OFF
[D][light:164]: Effect: 'None'
[D][http_media_source:258]: Pipeline 1 running
[D][speaker_source_media_player:367]: State changed to ANNOUNCING
[E][http_media_source:264]: Pipeline 1 error occurred during playback
[D][http_media_source:271]: Pipeline 1 both tasks finished
[E][http_request.idf:247][HTTPRead_1]: HTTP Request failed; URL: http://ha.local:8123/api/tts_proxy/02ScObZGHjl8s-ZmITqTiA.flac; Code: -1
[E][component:362][HTTPRead_1]: http_request set Error flag: unspecified
[E][http_media_source:400][HTTPRead_1]: Pipeline 1: HTTP request failed with status -1
[D][speaker_source_media_player:367]: State changed to IDLE
[E][component:379]: http_request cleared Error flag
It could be the new VAD, which is a version of silero-vad ported from whisper.cpp. On everything I've tested it on, it performs much better. But I don't have any Apple hardware, or a virtualized setup to test with.
I know with virtualized setups in general, people often don't pass through certain CPU capabilities (like AVX2, etc.). I wonder if that could be contributing here?
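For anyone who wants to check whether their machine or VM guest actually exposes AVX2, here is a minimal Python sketch. It only reads the `flags` line of `/proc/cpuinfo`, so it assumes x86 Linux. The helper names (`parse_cpu_flags`, `has_avx2`) are made up for this example. It says nothing about whether the new VAD really depends on AVX2, which has not been confirmed in this thread.

```python
from pathlib import Path


def parse_cpu_flags(cpuinfo_text):
    """Return the set of CPU feature flags from /proc/cpuinfo-style text."""
    for line in cpuinfo_text.splitlines():
        key, _, value = line.partition(":")
        # On x86 Linux every core has a "flags" line; the first one is enough.
        if key.strip() == "flags":
            return set(value.split())
    return set()


def has_avx2(cpuinfo_path="/proc/cpuinfo"):
    """Return True/False if the flag list can be read, or None if the file is missing."""
    path = Path(cpuinfo_path)
    if not path.exists():
        return None  # not Linux, or /proc is not available
    return "avx2" in parse_cpu_flags(path.read_text())


if __name__ == "__main__":
    result = has_avx2()
    if result is None:
        print("Could not read /proc/cpuinfo on this system")
    else:
        print("AVX2 exposed to this system:", result)
```

Run it inside the VM or container where Home Assistant lives, not only on the host: the guest sees only the CPU features that the hypervisor passes through.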
Well then everything is much simpler, my J4205 does not support AVX2.
I really don't want to replace my energy-efficient machine just because of one library. This same machine handles real-time streaming ASR (sherpa-onnx) and streaming output from Piper without trouble. Compared to those, Silero VAD seems too resource-intensive.
Maybe you could keep support for microVAD? Or will I have to stick with 2025.12?