I got a Home Assistant Voice Preview Edition a few weeks ago and I’ve been having a hard time using it properly. It has gone through several updates but it still does not work great.
I have two main issues:
1. Audio responses get cut off pretty often. I’ll ask it to do something and it will reply with “Sorry, I…” and stop. This happens mostly with failed requests, which leads me to the second issue.
2. Requests made through this device fail when the same requests made directly in the web app do not.
Example: “is the office climate on?”
I am making this request to an assistant called “LLM”, which uses Google Generative AI configured with Gemini Flash Lite (2.0 today, but I had similar issues with 1.5). On the web, I get a proper response.
If I make the same request to the Voice PE, I just get “Sorry, I c…”. Voice PE is configured to use the same LLM agent.
I don’t know how to debug this to figure out what is happening, but I would expect the same results from using Assist on the web/app as when using it from an ESP32 device like Voice PE.
will35
April 10, 2025, 3:26pm
2
Hi
What is the STT engine used in the pipeline?
Just check in the voice pipeline debug tool whether the spoken sentence is recognized correctly.
Troubleshooting Assist - Home Assistant
Using Home Assistant Cloud. Here’s the config of the “LLM” Assist entry
The debug tool shows very similar STT output for Voice PE and the HA web app.
From Web:
run:
  pipeline: 01j8828984zn35h0k025svgwkx
  language: en
  conversation_id: 01JRG3MW2W565PPEB929K92FVG
  runner_data:
    stt_binary_handler_id: 4
    timeout: 300
  tts_output:
    token: X4S-5JDVOlO4PeF3uKPRxQ.mp3
    url: /api/tts_proxy/X4S-5JDVOlO4PeF3uKPRxQ.mp3
    mime_type: audio/mpeg
events:
  - type: run-start
    data:
      pipeline: 01j8828984zn35h0k025svgwkx
      language: en
      conversation_id: 01JRG3MW2W565PPEB929K92FVG
      runner_data:
        stt_binary_handler_id: 4
        timeout: 300
      tts_output:
        token: X4S-5JDVOlO4PeF3uKPRxQ.mp3
        url: /api/tts_proxy/X4S-5JDVOlO4PeF3uKPRxQ.mp3
        mime_type: audio/mpeg
    timestamp: "2025-04-10T15:03:42.940655+00:00"
  - type: stt-start
    data:
      engine: stt.home_assistant_cloud
      metadata:
        language: en-US
        format: wav
        codec: pcm
        bit_rate: 16
        sample_rate: 16000
        channel: 1
    timestamp: "2025-04-10T15:03:42.940721+00:00"
  - type: stt-vad-start
    data:
      timestamp: 1490
    timestamp: "2025-04-10T15:03:44.460247+00:00"
  - type: stt-vad-end
    data:
      timestamp: 4040
    timestamp: "2025-04-10T15:03:47.010341+00:00"
  - type: stt-end
    data:
      stt_output:
        text: Is the office climate on?
    timestamp: "2025-04-10T15:03:47.031625+00:00"
  - type: intent-start
    data:
      engine: conversation.google_generative_ai
      language: en-US
      intent_input: Is the office climate on?
      conversation_id: 01JRG3MW2W565PPEB929K92FVG
      device_id: null
      prefer_local_intents: true
    timestamp: "2025-04-10T15:03:47.031699+00:00"
  - type: intent-end
    data:
      processed_locally: false
      intent_output:
        response:
          speech:
            plain:
              speech: >-
                Yes, the office climate is on and set to cool. The current
                temperature is 20 and the target temperature is 22.
              extra_data: null
          card: {}
          language: en-US
          response_type: action_done
          data:
            targets: []
            success: []
            failed: []
        conversation_id: 01JRG3MW2W565PPEB929K92FVG
        continue_conversation: false
    timestamp: "2025-04-10T15:03:48.175331+00:00"
  - type: tts-start
    data:
      engine: tts.home_assistant_cloud
      language: en-US
      voice: ChristopherNeural
      tts_input: >-
        Yes, the office climate is on and set to cool. The current temperature
        is 20 and the target temperature is 22.
    timestamp: "2025-04-10T15:03:48.175380+00:00"
  - type: tts-end
    data:
      tts_output:
        media_id: >-
          media-source://tts/tts.home_assistant_cloud?message=Yes,+the+office+climate+is+on+and+set+to+cool.+The+current+temperature+is+20+and+the+target+temperature+is+22.&language=en-US&tts_options=%7B%22audio_output%22:%22mp3%22,%22voice%22:%22ChristopherNeural%22%7D
        token: X4S-5JDVOlO4PeF3uKPRxQ.mp3
        url: /api/tts_proxy/X4S-5JDVOlO4PeF3uKPRxQ.mp3
        mime_type: audio/mpeg
    timestamp: "2025-04-10T15:03:48.176662+00:00"
  - type: run-end
    data: null
    timestamp: "2025-04-10T15:03:48.176700+00:00"
stt:
  engine: stt.home_assistant_cloud
  metadata:
    language: en-US
    format: wav
    codec: pcm
    bit_rate: 16
    sample_rate: 16000
    channel: 1
  done: true
  stt_output:
    text: Is the office climate on?
intent:
  engine: conversation.google_generative_ai
  language: en-US
  intent_input: Is the office climate on?
  conversation_id: 01JRG3MW2W565PPEB929K92FVG
  device_id: null
  prefer_local_intents: true
  done: true
  processed_locally: false
  intent_output:
    response:
      speech:
        plain:
          speech: >-
            Yes, the office climate is on and set to cool. The current
            temperature is 20 and the target temperature is 22.
          extra_data: null
      card: {}
      language: en-US
      response_type: action_done
      data:
        targets: []
        success: []
        failed: []
    conversation_id: 01JRG3MW2W565PPEB929K92FVG
    continue_conversation: false
tts:
  engine: tts.home_assistant_cloud
  language: en-US
  voice: ChristopherNeural
  tts_input: >-
    Yes, the office climate is on and set to cool. The current temperature is 20
    and the target temperature is 22.
  done: true
  tts_output:
    media_id: >-
      media-source://tts/tts.home_assistant_cloud?message=Yes,+the+office+climate+is+on+and+set+to+cool.+The+current+temperature+is+20+and+the+target+temperature+is+22.&language=en-US&tts_options=%7B%22audio_output%22:%22mp3%22,%22voice%22:%22ChristopherNeural%22%7D
    token: X4S-5JDVOlO4PeF3uKPRxQ.mp3
    url: /api/tts_proxy/X4S-5JDVOlO4PeF3uKPRxQ.mp3
    mime_type: audio/mpeg
Using Voice PE
run:
  pipeline: 01j8828984zn35h0k025svgwkx
  language: en
  conversation_id: 01JRG2TEZQ0QB7RD0PZPPMTY5W
  tts_output:
    token: ZITDXQEypz-0pfO1Er8tiw.flac
    url: /api/tts_proxy/ZITDXQEypz-0pfO1Er8tiw.flac
    mime_type: audio/flac
events:
  - type: run-start
    data:
      pipeline: 01j8828984zn35h0k025svgwkx
      language: en
      conversation_id: 01JRG2TEZQ0QB7RD0PZPPMTY5W
      tts_output:
        token: ZITDXQEypz-0pfO1Er8tiw.flac
        url: /api/tts_proxy/ZITDXQEypz-0pfO1Er8tiw.flac
        mime_type: audio/flac
    timestamp: "2025-04-10T15:04:08.808652+00:00"
  - type: stt-start
    data:
      engine: stt.home_assistant_cloud
      metadata:
        language: en-US
        format: wav
        codec: pcm
        bit_rate: 16
        sample_rate: 16000
        channel: 1
    timestamp: "2025-04-10T15:04:08.808792+00:00"
  - type: stt-vad-start
    data:
      timestamp: 1340
    timestamp: "2025-04-10T15:04:10.139294+00:00"
  - type: stt-vad-end
    data:
      timestamp: 3830
    timestamp: "2025-04-10T15:04:12.594475+00:00"
  - type: stt-end
    data:
      stt_output:
        text: Is the office climate on?
    timestamp: "2025-04-10T15:04:12.626966+00:00"
  - type: intent-start
    data:
      engine: conversation.google_generative_ai
      language: en-US
      intent_input: Is the office climate on?
      conversation_id: 01JRG2TEZQ0QB7RD0PZPPMTY5W
      device_id: b8146cd22fd5962936417cda25274730
      prefer_local_intents: true
    timestamp: "2025-04-10T15:04:12.627107+00:00"
  - type: intent-end
    data:
      processed_locally: false
      intent_output:
        response:
          speech:
            plain:
              speech: Sorry, I couldn't understand that
              extra_data: null
          card: {}
          language: en-US
          response_type: action_done
          data:
            targets: []
            success: []
            failed: []
        conversation_id: 01JRG2TEZQ0QB7RD0PZPPMTY5W
        continue_conversation: false
    timestamp: "2025-04-10T15:04:13.247191+00:00"
  - type: tts-start
    data:
      engine: tts.home_assistant_cloud
      language: en-US
      voice: ChristopherNeural
      tts_input: Sorry, I couldn't understand that
    timestamp: "2025-04-10T15:04:13.247348+00:00"
  - type: tts-end
    data:
      tts_output:
        media_id: >-
          media-source://tts/tts.home_assistant_cloud?message=Sorry,+I+couldn't+understand+that&language=en-US&tts_options=%7B%22audio_output%22:%22mp3%22,%22voice%22:%22ChristopherNeural%22,%22preferred_format%22:%22flac%22,%22preferred_sample_rate%22:48000,%22preferred_sample_channels%22:1,%22preferred_sample_bytes%22:2%7D
        token: ZITDXQEypz-0pfO1Er8tiw.flac
        url: /api/tts_proxy/ZITDXQEypz-0pfO1Er8tiw.flac
        mime_type: audio/flac
    timestamp: "2025-04-10T15:04:13.248262+00:00"
  - type: run-end
    data: null
    timestamp: "2025-04-10T15:04:13.248500+00:00"
stt:
  engine: stt.home_assistant_cloud
  metadata:
    language: en-US
    format: wav
    codec: pcm
    bit_rate: 16
    sample_rate: 16000
    channel: 1
  done: true
  stt_output:
    text: Is the office climate on?
intent:
  engine: conversation.google_generative_ai
  language: en-US
  intent_input: Is the office climate on?
  conversation_id: 01JRG2TEZQ0QB7RD0PZPPMTY5W
  device_id: b8146cd22fd5962936417cda25274730
  prefer_local_intents: true
  done: true
  processed_locally: false
  intent_output:
    response:
      speech:
        plain:
          speech: Sorry, I couldn't understand that
          extra_data: null
      card: {}
      language: en-US
      response_type: action_done
      data:
        targets: []
        success: []
        failed: []
    conversation_id: 01JRG2TEZQ0QB7RD0PZPPMTY5W
    continue_conversation: false
tts:
  engine: tts.home_assistant_cloud
  language: en-US
  voice: ChristopherNeural
  tts_input: Sorry, I couldn't understand that
  done: true
  tts_output:
    media_id: >-
      media-source://tts/tts.home_assistant_cloud?message=Sorry,+I+couldn't+understand+that&language=en-US&tts_options=%7B%22audio_output%22:%22mp3%22,%22voice%22:%22ChristopherNeural%22,%22preferred_format%22:%22flac%22,%22preferred_sample_rate%22:48000,%22preferred_sample_channels%22:1,%22preferred_sample_bytes%22:2%7D
    token: ZITDXQEypz-0pfO1Er8tiw.flac
    url: /api/tts_proxy/ZITDXQEypz-0pfO1Er8tiw.flac
    mime_type: audio/flac
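One thing the two runs do allow is a comparison of how long the intent stage took in each case. A minimal Python sketch, assuming the timestamps are copied by hand from the `intent-start` and `intent-end` events of each run (the `stage_seconds` helper is hypothetical and not part of Home Assistant):

```python
from datetime import datetime


def stage_seconds(start: str, end: str) -> float:
    """Return the elapsed seconds between two ISO 8601 event timestamps."""
    delta = datetime.fromisoformat(end) - datetime.fromisoformat(start)
    return delta.total_seconds()


# Timestamps copied from the intent-start / intent-end events of each run.
web_intent = stage_seconds(
    "2025-04-10T15:03:47.031699+00:00",
    "2025-04-10T15:03:48.175331+00:00",
)
voice_pe_intent = stage_seconds(
    "2025-04-10T15:04:12.627107+00:00",
    "2025-04-10T15:04:13.247191+00:00",
)

print(f"web: {web_intent:.2f}s, Voice PE: {voice_pe_intent:.2f}s")
# prints "web: 1.14s, Voice PE: 0.62s"
```

The failing Voice PE run finished its intent stage in roughly half the time of the successful web run. That might hint that the request failed early rather than the model producing a full answer, but that is only a guess from the timing; the debug output itself does not confirm it.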
will35
April 10, 2025, 3:54pm
4
Very strange, same pipeline, different results.
Do you have the latest firmware on your HAVPE? 25.3.4 (ESPHome 2025.3.3)
And with built-in intents?
I do
This has been happening for a while and there have been a few updates that have not resolved it. It’s very odd indeed.
will35
April 10, 2025, 3:58pm
6
And with built-in intents?
will35
April 10, 2025, 4:04pm
8
Like “turn on kitchen light”, “close cover”, etc. (handling commands locally)
Ah! Yes, those work fine using Home Assistant Cloud.
Assist settings for this one
will35
April 10, 2025, 4:10pm
10
Do this test with your LLM pipeline (the default one?)
The same command works, but it’s being processed locally:
Of course now that this worked, the original command from my first post is working correctly.
This does remind me that it has been intermittent in the past: sometimes it works, but then it breaks and stays broken for a long time, so I stop using it.
will35
April 10, 2025, 4:25pm
12
I use Gemini Flash 2.0 with the Pay-as-you-go billing API plan and I don’t have this problem.
Is your billing plan the free one?
No, I’m using the paid plan. The “LLM” Assist entry uses the same Gemini Flash settings, but it works through the web and not from Voice PE, as shown in the first few posts; that’s what is odd. If it were just failing every time with Gemini, that would make more sense.
The debug doesn’t seem to show a difference I can pinpoint to explain why it works with one and not the other.
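For what it is worth, the inputs handed to the conversation agent can be compared field by field. A small Python sketch, assuming the `intent-start` data from both runs is copied in by hand (the `diff_fields` helper is hypothetical, and `conversation_id` is left out because it is unique to every run):

```python
def diff_fields(a: dict, b: dict) -> dict:
    """Return {key: (a_value, b_value)} for every key whose values differ."""
    return {
        key: (a.get(key), b.get(key))
        for key in sorted(a.keys() | b.keys())
        if a.get(key) != b.get(key)
    }


# intent-start data copied from the web run.
web = {
    "engine": "conversation.google_generative_ai",
    "language": "en-US",
    "intent_input": "Is the office climate on?",
    "device_id": None,
    "prefer_local_intents": True,
}
# The Voice PE run has the same fields except for device_id.
voice_pe = {**web, "device_id": "b8146cd22fd5962936417cda25274730"}

print(diff_fields(web, voice_pe))
# prints {'device_id': (None, 'b8146cd22fd5962936417cda25274730')}
```

Apart from the per-run `conversation_id`, `device_id` is the only agent input that differs between the two runs (the TTS format, mp3 versus flac, also differs, but only after the intent stage). Whether `device_id` actually explains the failure is not established in this thread.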
These are my Google AI settings in Home Assistant:
My Google Gemini account (or whatever it is called these days) is set to use the paid version.
will35
April 10, 2025, 4:32pm
14
Sorry I can’t help you more. I hope you find an answer on this forum.
I appreciate the time spent helping me diagnose.