Voice assistant: first request in "a while" results in audio response not being played

Hey all, I’ve run into a bug and I’m looking to see if anyone else has seen something similar.

The symptoms:
When making the first request in “a while” (this happens at least every few hours if no commands have been issued lately):

  • The wake word is detected (the acknowledgement sound plays to confirm a command has been heard)
  • The command is heard clearly and STT works properly, per the debugger in the HA UI
  • According to the Assist Satellite add-on logs, it also thinks it played the audio response without error
  • No audio response is actually played, other than the ding acknowledging the command
  • The requested actions do happen in the background (lights on and off, etc.)
  • Finally, any subsequent command works fine, without any issues: full audio response, and the logs look the same, to my eye anyway

The setup:
I’m running the Assist Microphone add-on on the latest HA version, with Gemini as my LLM and openWakeWord for wake word detection, using a Jabra USB speakerphone. Assist Satellite is just a local-audio version of Wyoming Satellite running as an add-on.

Have others experienced this?

Also some logs:

Assist Satellite
INFO:root:Connected to server
INFO:root:Streaming audio
DEBUG:root:Event(type='run-pipeline', data={'start_stage': 'wake', 'end_stage': 'tts', 'restart_on_end': True, 'snd_format': {'rate': 16000, 'width': 2, 'channels': 1}}, payload=None)
DEBUG:root:Ping enabled
DEBUG:root:Wake word detected
DEBUG:root:Event(type='transcript', data={'text': 'What time is it?'}, payload=None)
INFO:root:Streaming audio
DEBUG:root:Connected to snd service
DEBUG:root:Event(type='synthesize', data={'text': 'The time is 07:11 AM', 'voice': {'name': 'AvaNeural'}}, payload=None)
Playing raw data 'stdin' : Signed 16 bit Little Endian, Rate 16000 Hz, Mono

HA Assist pipeline of a failed run
stage: done
run:
  pipeline: 01hzttk4v9w4374h8ct6qncmey
  language: en
  conversation_id: 01JYFCK65QD0ZFJ7A4PSKC7JEF
  tts_output:
    token: gtl8LmXy3vqv-nYKttvnYg.wav
    url: /api/tts_proxy/gtl8LmXy3vqv-nYKttvnYg.wav
    mime_type: audio/x-wav
    stream_response: false
events:
  - type: run-start
    data:
      pipeline: 01hzttk4v9w4374h8ct6qncmey
      language: en
      conversation_id: 01JYFCK65QD0ZFJ7A4PSKC7JEF
      tts_output:
        token: gtl8LmXy3vqv-nYKttvnYg.wav
        url: /api/tts_proxy/gtl8LmXy3vqv-nYKttvnYg.wav
        mime_type: audio/x-wav
        stream_response: false
    timestamp: "2025-07-01T04:55:24.660946+00:00"
  - type: wake_word-start
    data:
      entity_id: wake_word.openwakeword
      metadata:
        format: wav
        codec: pcm
        bit_rate: 16
        sample_rate: 16000
        channel: 1
      timeout: 0
    timestamp: "2025-07-01T04:55:24.661212+00:00"
  - type: wake_word-end
    data:
      wake_word_output:
        wake_word_id: ok_nabu_v0.1
        wake_word_phrase: ok nabu
        timestamp: 33366780
    timestamp: "2025-07-01T14:11:44.432968+00:00"
  - type: stt-start
    data:
      engine: stt.home_assistant_cloud
      metadata:
        language: en-US
        format: wav
        codec: pcm
        bit_rate: 16
        sample_rate: 16000
        channel: 1
    timestamp: "2025-07-01T14:11:44.435498+00:00"
  - type: stt-vad-start
    data:
      timestamp: 33367310
    timestamp: "2025-07-01T14:11:45.092377+00:00"
  - type: stt-vad-end
    data:
      timestamp: 33368730
    timestamp: "2025-07-01T14:11:46.235985+00:00"
  - type: stt-end
    data:
      stt_output:
        text: What time is it?
    timestamp: "2025-07-01T14:11:46.410689+00:00"
  - type: intent-start
    data:
      engine: conversation.google_generative_ai_conversation
      language: en-US
      intent_input: What time is it?
      conversation_id: 01JYFCK65QD0ZFJ7A4PSKC7JEF
      device_id: aa360a57a7160ab5499b364b59a65b7e
      prefer_local_intents: true
    timestamp: "2025-07-01T14:11:46.411540+00:00"
  - type: intent-end
    data:
      processed_locally: false
      intent_output:
        response:
          speech:
            plain:
              speech: The time is 07:11 AM
              extra_data: null
          card: {}
          language: "*"
          response_type: action_done
          data:
            targets: []
            success: []
            failed: []
        conversation_id: 01JYFCK65QD0ZFJ7A4PSKC7JEF
        continue_conversation: false
    timestamp: "2025-07-01T14:11:46.424303+00:00"
  - type: tts-start
    data:
      engine: tts.home_assistant_cloud
      language: en-US
      voice: AvaNeural
      tts_input: The time is 07:11 AM
    timestamp: "2025-07-01T14:11:46.424554+00:00"
  - type: tts-end
    data:
      tts_output:
        media_id: media-source://tts/-stream-/gtl8LmXy3vqv-nYKttvnYg.wav
        token: gtl8LmXy3vqv-nYKttvnYg.wav
        url: /api/tts_proxy/gtl8LmXy3vqv-nYKttvnYg.wav
        mime_type: audio/x-wav
    timestamp: "2025-07-01T14:11:46.431173+00:00"
  - type: run-end
    data: null
    timestamp: "2025-07-01T14:11:46.431278+00:00"
wake_word:
  entity_id: wake_word.openwakeword
  metadata:
    format: wav
    codec: pcm
    bit_rate: 16
    sample_rate: 16000
    channel: 1
  timeout: 0
  done: true
  wake_word_output:
    wake_word_id: ok_nabu_v0.1
    wake_word_phrase: ok nabu
    timestamp: 33366780
stt:
  engine: stt.home_assistant_cloud
  metadata:
    language: en-US
    format: wav
    codec: pcm
    bit_rate: 16
    sample_rate: 16000
    channel: 1
  done: true
  stt_output:
    text: What time is it?
intent:
  engine: conversation.google_generative_ai_conversation
  language: en-US
  intent_input: What time is it?
  conversation_id: 01JYFCK65QD0ZFJ7A4PSKC7JEF
  device_id: aa360a57a7160ab5499b364b59a65b7e
  prefer_local_intents: true
  done: true
  processed_locally: false
  intent_output:
    response:
      speech:
        plain:
          speech: The time is 07:11 AM
          extra_data: null
      card: {}
      language: "*"
      response_type: action_done
      data:
        targets: []
        success: []
        failed: []
    conversation_id: 01JYFCK65QD0ZFJ7A4PSKC7JEF
    continue_conversation: false
tts:
  engine: tts.home_assistant_cloud
  language: en-US
  voice: AvaNeural
  tts_input: The time is 07:11 AM
  done: true
  tts_output:
    media_id: media-source://tts/-stream-/gtl8LmXy3vqv-nYKttvnYg.wav
    token: gtl8LmXy3vqv-nYKttvnYg.wav
    url: /api/tts_proxy/gtl8LmXy3vqv-nYKttvnYg.wav
    mime_type: audio/x-wav

Anecdotally, my brother has had the same problem (haven’t confirmed with logs yet, but identical behavior) using a Voice PE, so my suspicion is that the cause lies outside the satellite, but first I wanted to see if others have seen this.


I have the same problem. It’s been happening for a few weeks now and I have no idea what’s going on. I tried debugging, but with no luck. I also have the mic add-on, but this also happens with the satellite running on a Raspberry Pi 4 and with the other satellite on an Atom Echo. I thought it could be something related to the beta versions I keep updating. I even bought a new SD card to install a fresh HA on the Raspberry Pi and see if this happens with a similar setup (mic add-on), or if it happens when I restore my backup. I think I’ll have some time to do these tests this weekend.
Don’t get me wrong, but it’s good to know that someone else is facing the same problem. :joy::joy:

Good to know I’m not alone at least! A couple other symptoms that might be diagnostic clues:

  1. If I go to the voice assistant debug UI and try to Play Audio, I get an “Error playing audio” on the pipeline run that failed to produce audio output from the satellite/microphone add-on, but no such error (it just plays) when I do so on the pipeline run where I re-asked the same question immediately afterwards. However, after some time has passed, both say “Error playing audio.”
  2. In checking the TTS cache after the fact, I confirmed both files exist and are perfectly playable audio (identifiable by timestamp and the specific wording of the response)
  3. The first pipeline, the one that failed to produce audio output, was also triggered hours after it started listening, in my case, because I’m using openWakeWord. My most recent example had a wake word timestamp of 22409 seconds, which, while not necessarily relevant, is a substantial enough gap that there could be some kind of pipeline timeout I’m hitting?
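For what it’s worth, the same kind of gap is visible in the failed run’s own timestamps in the trace I posted (run-start at 04:55 UTC, wake_word-end at 14:11 UTC). A quick sketch of the math, using those two timestamps verbatim:

```python
from datetime import datetime

# Timestamps copied verbatim from the failed pipeline trace above
run_start = datetime.fromisoformat("2025-07-01T04:55:24.660946+00:00")
wake_end = datetime.fromisoformat("2025-07-01T14:11:44.432968+00:00")

gap = wake_end - run_start
print(gap)  # the run sat idle for over nine hours before the wake word fired
```

So the pipeline run that produced no audio had been sitting open for roughly nine and a quarter hours, which is at least consistent with some kind of idle timeout theory.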

In my own case, I’m using the Home Assistant Cloud Ava voice.

Various theories I haven’t looked into yet, as a stream of consciousness: Are the audio files in the TTS cache identical in format/metadata? Possibly VLC can handle both but the satellite can’t? Could it be a streaming issue, since HA fairly recently added more live streaming for voice assistants, with HA trying to stream content that doesn’t actually exist yet, or that hasn’t reached the host due to a cold connection to HA Cloud? Might it be because the version of satellite in use is old? I don’t think Assist Microphone’s satellite version has been kept up to date with new releases. Where are the TTS audio files for the debugger stored vs. the ones in the cache? If it’s the same cache, why is only the working one initially playable through the UI?

I just set up my first ever home assistant rig today and have this exact same problem. I have a Beelink S13 Pro running Home Assistant through a VM on Proxmox. Add-ons include Assist Microphone, openWakeWord, Piper, Speech-to-Phrase, and Whisper.

When I haven’t spoken to my voice assistant for a while, it will do tasks but not give me any audible response. For example, if I ask it to turn on the TV, it will, but it won’t say “Done” after. If I ask it to tell me the time, it won’t say anything in return.
