Voice PE not working the same as the assistant on the app/web page

I got a Home Assistant Voice Preview Edition a few weeks ago and I’ve been having a hard time using it properly. It has gone through several updates, but it still does not work well.

I have two main issues:

  1. Audio responses get cut off pretty often. I’ll ask it to do something and it will reply with “Sorry, I…” and stop. This happens mostly with failed requests, which leads me to the second issue.

  2. Requests made through this device fail when requests made directly on the Web app do not.

Example: “is the office climate on?”

I am making this request to an assistant called “LLM”, which uses the Google Generative AI integration configured for Gemini Flash Lite (2.0 today, but I had similar issues with 1.5). On the web, I get a proper response.

If I make the same request to the Voice PE, I just get “Sorry, I c…”. The Voice PE is configured to use the same LLM agent.

I don’t know how to debug this to figure out what is happening, but I would expect the same results from using Assist on the web/app as when using it from an ESP32 device like Voice PE.

Hi

What is the STT engine used in the pipeline?

Use the voice pipeline debug tool to check whether the spoken sentence is recognized correctly:
Troubleshooting Assist - Home Assistant

Using Home Assistant Cloud. Here’s the config of the “LLM” Assist entry

The debug output shows very similar STT results for the Voice PE and the HA web interface.

From Web:

Raw output

run:
  pipeline: 01j8828984zn35h0k025svgwkx
  language: en
  conversation_id: 01JRG3MW2W565PPEB929K92FVG
  runner_data:
    stt_binary_handler_id: 4
    timeout: 300
  tts_output:
    token: X4S-5JDVOlO4PeF3uKPRxQ.mp3
    url: /api/tts_proxy/X4S-5JDVOlO4PeF3uKPRxQ.mp3
    mime_type: audio/mpeg
events:
  - type: run-start
    data:
      pipeline: 01j8828984zn35h0k025svgwkx
      language: en
      conversation_id: 01JRG3MW2W565PPEB929K92FVG
      runner_data:
        stt_binary_handler_id: 4
        timeout: 300
      tts_output:
        token: X4S-5JDVOlO4PeF3uKPRxQ.mp3
        url: /api/tts_proxy/X4S-5JDVOlO4PeF3uKPRxQ.mp3
        mime_type: audio/mpeg
    timestamp: "2025-04-10T15:03:42.940655+00:00"
  - type: stt-start
    data:
      engine: stt.home_assistant_cloud
      metadata:
        language: en-US
        format: wav
        codec: pcm
        bit_rate: 16
        sample_rate: 16000
        channel: 1
    timestamp: "2025-04-10T15:03:42.940721+00:00"
  - type: stt-vad-start
    data:
      timestamp: 1490
    timestamp: "2025-04-10T15:03:44.460247+00:00"
  - type: stt-vad-end
    data:
      timestamp: 4040
    timestamp: "2025-04-10T15:03:47.010341+00:00"
  - type: stt-end
    data:
      stt_output:
        text: Is the office climate on?
    timestamp: "2025-04-10T15:03:47.031625+00:00"
  - type: intent-start
    data:
      engine: conversation.google_generative_ai
      language: en-US
      intent_input: Is the office climate on?
      conversation_id: 01JRG3MW2W565PPEB929K92FVG
      device_id: null
      prefer_local_intents: true
    timestamp: "2025-04-10T15:03:47.031699+00:00"
  - type: intent-end
    data:
      processed_locally: false
      intent_output:
        response:
          speech:
            plain:
              speech: >-
                Yes, the office climate is on and set to cool. The current
                temperature is 20 and the target temperature is 22.
              extra_data: null
          card: {}
          language: en-US
          response_type: action_done
          data:
            targets: []
            success: []
            failed: []
        conversation_id: 01JRG3MW2W565PPEB929K92FVG
        continue_conversation: false
    timestamp: "2025-04-10T15:03:48.175331+00:00"
  - type: tts-start
    data:
      engine: tts.home_assistant_cloud
      language: en-US
      voice: ChristopherNeural
      tts_input: >-
        Yes, the office climate is on and set to cool. The current temperature
        is 20 and the target temperature is 22.
    timestamp: "2025-04-10T15:03:48.175380+00:00"
  - type: tts-end
    data:
      tts_output:
        media_id: >-
          media-source://tts/tts.home_assistant_cloud?message=Yes,+the+office+climate+is+on+and+set+to+cool.+The+current+temperature+is+20+and+the+target+temperature+is+22.&language=en-US&tts_options=%7B%22audio_output%22:%22mp3%22,%22voice%22:%22ChristopherNeural%22%7D
        token: X4S-5JDVOlO4PeF3uKPRxQ.mp3
        url: /api/tts_proxy/X4S-5JDVOlO4PeF3uKPRxQ.mp3
        mime_type: audio/mpeg
    timestamp: "2025-04-10T15:03:48.176662+00:00"
  - type: run-end
    data: null
    timestamp: "2025-04-10T15:03:48.176700+00:00"
stt:
  engine: stt.home_assistant_cloud
  metadata:
    language: en-US
    format: wav
    codec: pcm
    bit_rate: 16
    sample_rate: 16000
    channel: 1
  done: true
  stt_output:
    text: Is the office climate on?
intent:
  engine: conversation.google_generative_ai
  language: en-US
  intent_input: Is the office climate on?
  conversation_id: 01JRG3MW2W565PPEB929K92FVG
  device_id: null
  prefer_local_intents: true
  done: true
  processed_locally: false
  intent_output:
    response:
      speech:
        plain:
          speech: >-
            Yes, the office climate is on and set to cool. The current
            temperature is 20 and the target temperature is 22.
          extra_data: null
      card: {}
      language: en-US
      response_type: action_done
      data:
        targets: []
        success: []
        failed: []
    conversation_id: 01JRG3MW2W565PPEB929K92FVG
    continue_conversation: false
tts:
  engine: tts.home_assistant_cloud
  language: en-US
  voice: ChristopherNeural
  tts_input: >-
    Yes, the office climate is on and set to cool. The current temperature is 20
    and the target temperature is 22.
  done: true
  tts_output:
    media_id: >-
      media-source://tts/tts.home_assistant_cloud?message=Yes,+the+office+climate+is+on+and+set+to+cool.+The+current+temperature+is+20+and+the+target+temperature+is+22.&language=en-US&tts_options=%7B%22audio_output%22:%22mp3%22,%22voice%22:%22ChristopherNeural%22%7D
    token: X4S-5JDVOlO4PeF3uKPRxQ.mp3
    url: /api/tts_proxy/X4S-5JDVOlO4PeF3uKPRxQ.mp3
    mime_type: audio/mpeg

Using Voice PE:

run:
  pipeline: 01j8828984zn35h0k025svgwkx
  language: en
  conversation_id: 01JRG2TEZQ0QB7RD0PZPPMTY5W
  tts_output:
    token: ZITDXQEypz-0pfO1Er8tiw.flac
    url: /api/tts_proxy/ZITDXQEypz-0pfO1Er8tiw.flac
    mime_type: audio/flac
events:
  - type: run-start
    data:
      pipeline: 01j8828984zn35h0k025svgwkx
      language: en
      conversation_id: 01JRG2TEZQ0QB7RD0PZPPMTY5W
      tts_output:
        token: ZITDXQEypz-0pfO1Er8tiw.flac
        url: /api/tts_proxy/ZITDXQEypz-0pfO1Er8tiw.flac
        mime_type: audio/flac
    timestamp: "2025-04-10T15:04:08.808652+00:00"
  - type: stt-start
    data:
      engine: stt.home_assistant_cloud
      metadata:
        language: en-US
        format: wav
        codec: pcm
        bit_rate: 16
        sample_rate: 16000
        channel: 1
    timestamp: "2025-04-10T15:04:08.808792+00:00"
  - type: stt-vad-start
    data:
      timestamp: 1340
    timestamp: "2025-04-10T15:04:10.139294+00:00"
  - type: stt-vad-end
    data:
      timestamp: 3830
    timestamp: "2025-04-10T15:04:12.594475+00:00"
  - type: stt-end
    data:
      stt_output:
        text: Is the office climate on?
    timestamp: "2025-04-10T15:04:12.626966+00:00"
  - type: intent-start
    data:
      engine: conversation.google_generative_ai
      language: en-US
      intent_input: Is the office climate on?
      conversation_id: 01JRG2TEZQ0QB7RD0PZPPMTY5W
      device_id: b8146cd22fd5962936417cda25274730
      prefer_local_intents: true
    timestamp: "2025-04-10T15:04:12.627107+00:00"
  - type: intent-end
    data:
      processed_locally: false
      intent_output:
        response:
          speech:
            plain:
              speech: Sorry, I couldn't understand that
              extra_data: null
          card: {}
          language: en-US
          response_type: action_done
          data:
            targets: []
            success: []
            failed: []
        conversation_id: 01JRG2TEZQ0QB7RD0PZPPMTY5W
        continue_conversation: false
    timestamp: "2025-04-10T15:04:13.247191+00:00"
  - type: tts-start
    data:
      engine: tts.home_assistant_cloud
      language: en-US
      voice: ChristopherNeural
      tts_input: Sorry, I couldn't understand that
    timestamp: "2025-04-10T15:04:13.247348+00:00"
  - type: tts-end
    data:
      tts_output:
        media_id: >-
          media-source://tts/tts.home_assistant_cloud?message=Sorry,+I+couldn't+understand+that&language=en-US&tts_options=%7B%22audio_output%22:%22mp3%22,%22voice%22:%22ChristopherNeural%22,%22preferred_format%22:%22flac%22,%22preferred_sample_rate%22:48000,%22preferred_sample_channels%22:1,%22preferred_sample_bytes%22:2%7D
        token: ZITDXQEypz-0pfO1Er8tiw.flac
        url: /api/tts_proxy/ZITDXQEypz-0pfO1Er8tiw.flac
        mime_type: audio/flac
    timestamp: "2025-04-10T15:04:13.248262+00:00"
  - type: run-end
    data: null
    timestamp: "2025-04-10T15:04:13.248500+00:00"
stt:
  engine: stt.home_assistant_cloud
  metadata:
    language: en-US
    format: wav
    codec: pcm
    bit_rate: 16
    sample_rate: 16000
    channel: 1
  done: true
  stt_output:
    text: Is the office climate on?
intent:
  engine: conversation.google_generative_ai
  language: en-US
  intent_input: Is the office climate on?
  conversation_id: 01JRG2TEZQ0QB7RD0PZPPMTY5W
  device_id: b8146cd22fd5962936417cda25274730
  prefer_local_intents: true
  done: true
  processed_locally: false
  intent_output:
    response:
      speech:
        plain:
          speech: Sorry, I couldn't understand that
          extra_data: null
      card: {}
      language: en-US
      response_type: action_done
      data:
        targets: []
        success: []
        failed: []
    conversation_id: 01JRG2TEZQ0QB7RD0PZPPMTY5W
    continue_conversation: false
tts:
  engine: tts.home_assistant_cloud
  language: en-US
  voice: ChristopherNeural
  tts_input: Sorry, I couldn't understand that
  done: true
  tts_output:
    media_id: >-
      media-source://tts/tts.home_assistant_cloud?message=Sorry,+I+couldn't+understand+that&language=en-US&tts_options=%7B%22audio_output%22:%22mp3%22,%22voice%22:%22ChristopherNeural%22,%22preferred_format%22:%22flac%22,%22preferred_sample_rate%22:48000,%22preferred_sample_channels%22:1,%22preferred_sample_bytes%22:2%7D
    token: ZITDXQEypz-0pfO1Er8tiw.flac
    url: /api/tts_proxy/ZITDXQEypz-0pfO1Er8tiw.flac
    mime_type: audio/flac
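
One thing the two dumps allow is a quick timing comparison of the intent stage. The sketch below is only an illustration: it takes the `intent-start` and `intent-end` timestamps copied from the raw output above and computes how long each run spent in the conversation agent. The helper name `stage_seconds` is made up for this example. A shorter stage on the Voice PE run does not prove anything by itself, but it may hint that the failing request returned early.

```python
from datetime import datetime


def stage_seconds(start_iso: str, end_iso: str) -> float:
    """Return the elapsed time in seconds between two ISO 8601 timestamps."""
    start = datetime.fromisoformat(start_iso)
    end = datetime.fromisoformat(end_iso)
    return (end - start).total_seconds()


# Timestamps copied from the intent-start / intent-end events in the dumps.
web = stage_seconds(
    "2025-04-10T15:03:47.031699+00:00",
    "2025-04-10T15:03:48.175331+00:00",
)
voice_pe = stage_seconds(
    "2025-04-10T15:04:12.627107+00:00",
    "2025-04-10T15:04:13.247191+00:00",
)

print(f"web intent stage:      {web:.3f} s")       # 1.144 s
print(f"Voice PE intent stage: {voice_pe:.3f} s")  # 0.620 s
```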

Very strange, same pipeline, different results.
Do you have the latest firmware on your HAVPE? 25.3.4 (ESPHome 2025.3.3)

And with built-in intents?

I do

This has been happening for a while and there have been a few updates that have not resolved it. It’s very odd indeed.

And with built-in intents?

What do you mean?

Like “turn on kitchen light”, “close cover”, etc. (commands that are handled locally).

Ah! Yes, those work fine using Home Assistant Cloud

Assist settings for this one

Do this test with your LLM pipeline (is it the default one?)

The same command works, but it’s being processed locally:

Of course now that this worked, the original command from my first post is working correctly.

This does remind me that it has been intermittent in the past. Sometimes it works, but then it breaks and stays broken for a long time, so I stop using it.

I use Gemini Flash 2.0 with the pay-as-you-go API billing plan and I don’t have this problem.
Is your billing plan the free one?

No, I’m using the paid plan. The “LLM” Assist entry uses the same Gemini Flash settings, but it works through the web and not from the Voice PE, as shown in the first few posts. That’s what is odd. If it was just failing every time with Gemini, that would make more sense.

The debug doesn’t seem to show a difference I can pinpoint to explain why it works with one and not the other.
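
One way to look for differences more systematically is to flatten both raw outputs and diff them while ignoring fields that change on every run. The sketch below is a minimal illustration, not an official tool: `flatten`, `diff_runs`, and the `NOISE_KEYS` list are names invented for this example, and the two dictionaries are the `intent-start` data copied from the dumps above. On that data the only remaining difference is `device_id`, which is null for the web run and set for the Voice PE run. Whether that field actually affects the LLM request is an assumption that would still need to be verified.

```python
from typing import Any

# Keys that change on every run and say nothing about the failure
# (an assumed list; adjust as needed).
NOISE_KEYS = {"conversation_id", "timestamp", "token", "url"}
MISSING = "<missing>"  # marks a key that is present in only one run


def flatten(node: Any, prefix: str = "") -> dict[str, Any]:
    """Turn nested dicts into {"a.b.c": value}; lists and scalars are leaves."""
    if not isinstance(node, dict):
        return {prefix: node}
    flat: dict[str, Any] = {}
    for key, value in node.items():
        if key in NOISE_KEYS:
            continue
        path = f"{prefix}.{key}" if prefix else key
        flat.update(flatten(value, path))
    return flat


def diff_runs(a: dict, b: dict) -> dict[str, tuple[Any, Any]]:
    """Return {path: (value_in_a, value_in_b)} for every path that differs."""
    flat_a, flat_b = flatten(a), flatten(b)
    return {
        key: (flat_a.get(key, MISSING), flat_b.get(key, MISSING))
        for key in sorted(flat_a.keys() | flat_b.keys())
        if flat_a.get(key, MISSING) != flat_b.get(key, MISSING)
    }


# intent-start data copied from the two debug dumps in this thread.
web_intent = {
    "engine": "conversation.google_generative_ai",
    "language": "en-US",
    "intent_input": "Is the office climate on?",
    "conversation_id": "01JRG3MW2W565PPEB929K92FVG",
    "device_id": None,
    "prefer_local_intents": True,
}
voice_pe_intent = {
    "engine": "conversation.google_generative_ai",
    "language": "en-US",
    "intent_input": "Is the office climate on?",
    "conversation_id": "01JRG2TEZQ0QB7RD0PZPPMTY5W",
    "device_id": "b8146cd22fd5962936417cda25274730",
    "prefer_local_intents": True,
}

for key, (old, new) in diff_runs(web_intent, voice_pe_intent).items():
    print(f"{key}: {old!r} -> {new!r}")
# device_id: None -> 'b8146cd22fd5962936417cda25274730'
```

To run this on the complete dumps, each one could be saved to a file and loaded with a YAML parser such as PyYAML's `yaml.safe_load` (a third-party package) before being passed to `diff_runs`. On the full output it would also flag the response text, the TTS audio format (`audio/mpeg` for the web run versus `audio/flac` for the Voice PE run), and `runner_data`, which appears only in the web run.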

These are my Google AI settings in Home Assistant:


My Google Gemini account (or whatever it is called these days) is set to use the paid version.

Sorry I can’t help you more. I hope you find an answer on this forum.


I appreciate the time spent helping me diagnose.
