Voice assistant: first request in a while results in audio response not being played

Hey all, I’ve got a bug I’m running into and looking to see if anyone else has seen something similar.

The symptoms:
When making the first request in a while (this happens at least every few hours if no commands have been issued lately):

  • The wake word is detected (the sound is issued that a command has been heard)
  • The command is heard clearly and STT worked properly, per the debugger in the HA UI
  • According to the Assist Satellite add-on logs, it also thinks it played the audio response without error
  • No audio response was played other than the ding that a command was received
  • The actions do also happen in the background (lights on and off etc)
  • Finally, any subsequent command works fine, without any issues: full audio response, and the logs look the same, to my eye anyway

The setup:
I’m running the Assist Microphone add-on, the latest HA version, Gemini as my LLM, and openWakeWord for wake word detection, with a Jabra USB speakerphone. Assist Satellite here is just a local-audio version of Wyoming Satellite running as an add-on.

Have others experienced this?

Also some logs:

Assist Satellite logs:
INFO:root:Connected to server
INFO:root:Streaming audio
DEBUG:root:Event(type='run-pipeline', data={'start_stage': 'wake', 'end_stage': 'tts', 'restart_on_end': True, 'snd_format': {'rate': 16000, 'width': 2, 'channels': 1}}, payload=None)
DEBUG:root:Ping enabled
DEBUG:root:Wake word detected
DEBUG:root:Event(type='transcript', data={'text': 'What time is it?'}, payload=None)
INFO:root:Streaming audio
DEBUG:root:Connected to snd service
DEBUG:root:Event(type='synthesize', data={'text': 'The time is 07:11 AM', 'voice': {'name': 'AvaNeural'}}, payload=None)
Playing raw data 'stdin' : Signed 16 bit Little Endian, Rate 16000 Hz, Mono
HA Assist pipeline trace of a failed run:
stage: done
run:
  pipeline: 01hzttk4v9w4374h8ct6qncmey
  language: en
  conversation_id: 01JYFCK65QD0ZFJ7A4PSKC7JEF
  tts_output:
    token: gtl8LmXy3vqv-nYKttvnYg.wav
    url: /api/tts_proxy/gtl8LmXy3vqv-nYKttvnYg.wav
    mime_type: audio/x-wav
    stream_response: false
events:
  - type: run-start
    data:
      pipeline: 01hzttk4v9w4374h8ct6qncmey
      language: en
      conversation_id: 01JYFCK65QD0ZFJ7A4PSKC7JEF
      tts_output:
        token: gtl8LmXy3vqv-nYKttvnYg.wav
        url: /api/tts_proxy/gtl8LmXy3vqv-nYKttvnYg.wav
        mime_type: audio/x-wav
        stream_response: false
    timestamp: "2025-07-01T04:55:24.660946+00:00"
  - type: wake_word-start
    data:
      entity_id: wake_word.openwakeword
      metadata:
        format: wav
        codec: pcm
        bit_rate: 16
        sample_rate: 16000
        channel: 1
      timeout: 0
    timestamp: "2025-07-01T04:55:24.661212+00:00"
  - type: wake_word-end
    data:
      wake_word_output:
        wake_word_id: ok_nabu_v0.1
        wake_word_phrase: ok nabu
        timestamp: 33366780
    timestamp: "2025-07-01T14:11:44.432968+00:00"
  - type: stt-start
    data:
      engine: stt.home_assistant_cloud
      metadata:
        language: en-US
        format: wav
        codec: pcm
        bit_rate: 16
        sample_rate: 16000
        channel: 1
    timestamp: "2025-07-01T14:11:44.435498+00:00"
  - type: stt-vad-start
    data:
      timestamp: 33367310
    timestamp: "2025-07-01T14:11:45.092377+00:00"
  - type: stt-vad-end
    data:
      timestamp: 33368730
    timestamp: "2025-07-01T14:11:46.235985+00:00"
  - type: stt-end
    data:
      stt_output:
        text: What time is it?
    timestamp: "2025-07-01T14:11:46.410689+00:00"
  - type: intent-start
    data:
      engine: conversation.google_generative_ai_conversation
      language: en-US
      intent_input: What time is it?
      conversation_id: 01JYFCK65QD0ZFJ7A4PSKC7JEF
      device_id: aa360a57a7160ab5499b364b59a65b7e
      prefer_local_intents: true
    timestamp: "2025-07-01T14:11:46.411540+00:00"
  - type: intent-end
    data:
      processed_locally: false
      intent_output:
        response:
          speech:
            plain:
              speech: The time is 07:11 AM
              extra_data: null
          card: {}
          language: "*"
          response_type: action_done
          data:
            targets: []
            success: []
            failed: []
        conversation_id: 01JYFCK65QD0ZFJ7A4PSKC7JEF
        continue_conversation: false
    timestamp: "2025-07-01T14:11:46.424303+00:00"
  - type: tts-start
    data:
      engine: tts.home_assistant_cloud
      language: en-US
      voice: AvaNeural
      tts_input: The time is 07:11 AM
    timestamp: "2025-07-01T14:11:46.424554+00:00"
  - type: tts-end
    data:
      tts_output:
        media_id: media-source://tts/-stream-/gtl8LmXy3vqv-nYKttvnYg.wav
        token: gtl8LmXy3vqv-nYKttvnYg.wav
        url: /api/tts_proxy/gtl8LmXy3vqv-nYKttvnYg.wav
        mime_type: audio/x-wav
    timestamp: "2025-07-01T14:11:46.431173+00:00"
  - type: run-end
    data: null
    timestamp: "2025-07-01T14:11:46.431278+00:00"
wake_word:
  entity_id: wake_word.openwakeword
  metadata:
    format: wav
    codec: pcm
    bit_rate: 16
    sample_rate: 16000
    channel: 1
  timeout: 0
  done: true
  wake_word_output:
    wake_word_id: ok_nabu_v0.1
    wake_word_phrase: ok nabu
    timestamp: 33366780
stt:
  engine: stt.home_assistant_cloud
  metadata:
    language: en-US
    format: wav
    codec: pcm
    bit_rate: 16
    sample_rate: 16000
    channel: 1
  done: true
  stt_output:
    text: What time is it?
intent:
  engine: conversation.google_generative_ai_conversation
  language: en-US
  intent_input: What time is it?
  conversation_id: 01JYFCK65QD0ZFJ7A4PSKC7JEF
  device_id: aa360a57a7160ab5499b364b59a65b7e
  prefer_local_intents: true
  done: true
  processed_locally: false
  intent_output:
    response:
      speech:
        plain:
          speech: The time is 07:11 AM
          extra_data: null
      card: {}
      language: "*"
      response_type: action_done
      data:
        targets: []
        success: []
        failed: []
    conversation_id: 01JYFCK65QD0ZFJ7A4PSKC7JEF
    continue_conversation: false
tts:
  engine: tts.home_assistant_cloud
  language: en-US
  voice: AvaNeural
  tts_input: The time is 07:11 AM
  done: true
  tts_output:
    media_id: media-source://tts/-stream-/gtl8LmXy3vqv-nYKttvnYg.wav
    token: gtl8LmXy3vqv-nYKttvnYg.wav
    url: /api/tts_proxy/gtl8LmXy3vqv-nYKttvnYg.wav
    mime_type: audio/x-wav

Anecdotally, my brother has had the same problem (haven’t confirmed with logs yet, but identical behavior) using a Voice PE, so my suspicion is that the cause lies outside the satellite, but first I wanted to see if others have seen this.


I have the same problem. It’s been happening for a few weeks now and I have no idea what’s going on. I tried debugging, but with no luck. I also have the mic add-on, but this also happens with the satellite running on a Raspberry Pi 4 and with another satellite on an Atom Echo. I thought it could be something related to the beta versions I keep updating to. I even bought a new SD card to install a fresh HA on the Raspberry Pi and see whether this happens with a similar setup (mic add-on), or whether it happens when I restore my backup. I think I’ll have some time to do these tests this weekend.
Don’t get me wrong, but it’s good to know that someone else is facing the same problem. :joy::joy:


Good to know I’m not alone at least! A couple of other symptoms that might be diagnostic clues:

  1. If I go to the voice assistant debug UI and try to Play Audio, I get an “Error playing audio” on the pipeline run that failed to produce audio output from the satellite/microphone add-on, but no such error (it just plays) on the pipeline run where I re-asked the same question immediately afterwards. After enough time has passed, though, both say “Error playing audio”.
  2. Checking the TTS cache after the fact, I confirmed that both files exist and are perfectly playable audio (identifiable by timestamp and the specific wording of the response).
  3. The first pipeline, the one that failed to produce audio output, was also triggered hours after the previous one in my case, because I’m using openWakeWord. My most recent example had a wake word timestamp of 22409 seconds into the stream, which, while not necessarily meaningful, is a large enough gap that some kind of pipeline timeout could be at play.
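For what it’s worth, the idle gap in the failed trace above can be computed directly from its run-start and wake_word-end timestamps (values copied from that trace):

```python
from datetime import datetime

# Timestamps copied from the failed pipeline trace: run-start is when
# this pipeline run began listening, wake_word-end is when the wake
# word was finally detected hours later.
run_start = datetime.fromisoformat("2025-07-01T04:55:24.660946+00:00")
wake_end = datetime.fromisoformat("2025-07-01T14:11:44.432968+00:00")

idle = (wake_end - run_start).total_seconds()
print(f"idle gap: {idle:.0f} s (~{idle / 3600:.1f} h)")  # → idle gap: 33380 s (~9.3 h)
```

That ~9.3 h gap also roughly matches the 33366780 ms timestamp in the wake_word-end event, consistent with the pipeline having streamed mic audio continuously since run-start.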

In my own case I’m using Home Assistant Cloud Ava voice.

Various theories I haven’t looked into yet, as a stream of consciousness: Are the audio files in the TTS cache identical in format/metadata? Perhaps VLC can handle both but the satellite can’t. Could it be a streaming issue, since HA fairly recently added more live streaming for voice assistants, with HA trying to stream content that doesn’t actually exist yet or hasn’t reached the host due to an unwarmed connection to HA Cloud? Might it be because the version of the satellite in use is old? I don’t think Assist Microphone’s satellite version has been kept up to date with new releases. And where are the TTS audio files for the debugger stored versus the ones in the cache? If it’s the same cache, why is only the working one initially playable through the UI?
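On the format/metadata theory: Python’s stdlib `wave` module is enough to dump the header fields of the two cached files side by side. The file names below are hypothetical placeholders for the actual TTS cache entries you identified by timestamp:

```python
import os
import wave

def wav_info(path):
    """Return (sample_rate_hz, bits_per_sample, channels, duration_s)."""
    with wave.open(path, "rb") as w:
        return (
            w.getframerate(),
            w.getsampwidth() * 8,
            w.getnchannels(),
            round(w.getnframes() / w.getframerate(), 2),
        )

# Hypothetical names -- substitute the two cached TTS responses
# (failed run vs. immediate retry).
for path in ("failed_response.wav", "working_response.wav"):
    if os.path.exists(path):
        print(path, "->", wav_info(path))
```

If both report, say, 16000 Hz / 16-bit / mono but only one played, the format theory is probably out and attention shifts to the streaming/connection side.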


I just set up my first-ever Home Assistant rig today and have this exact same problem. I have a Beelink S13 Pro running Home Assistant in a VM on Proxmox. Add-ons include Assist Microphone, openWakeWord, Piper, Speech-to-Phrase, and Whisper.

When I haven’t spoken to my voice assistant for a while, it will do tasks but not give me any audible response. For example, if I ask it to turn on the TV, it will, but it won’t say “Done” afterwards. If I ask it to tell me the time, it won’t say anything in return.


I also encountered the same problem, on HA Core 2025.9.4 with Assist Microphone 1.3.0.

Is there any progress or workaround?

Same here with 2025.10.4 and Assist Microphone 1.3.

Hi,

I am seeing a repeatable issue with the Home Assistant voice assistant using the “OK Nabu” wake word.

Symptoms:

  • After a longer idle period (several minutes without using the assistant), I say “OK Nabu”, hear the wake sound and then ask a question that requires an internet/LLM answer (e.g. weather, time, general knowledge).
  • The pipeline clearly starts (wake sound, lights, etc.), but I do not hear any spoken response.
  • If I immediately say “OK Nabu” again and ask another internet question, this time the answer is played correctly.
  • Subsequent questions within a short time window keep working fine – the issue only affects the first request after a longer idle period.
  • Local home automation commands (turn on/off lights, etc.) work on the first try, even after a long idle time – the problem seems to affect only questions that should produce a spoken answer via ChatGPT/LLM or similar.
Additional live test confirming this behaviour:

  • After a longer idle period I say “OK Nabu – [internet question]” → no audio response.
  • Without changing any configuration I immediately say “OK Nabu – [internet question]” again → the audio response is played correctly.
  • I can reproduce this pattern every time after another long idle period.

This looks very similar to reports where the first response after being idle for a while is not played, but subsequent responses are fine (ESP32 voice satellites / Voice Assistant).

Questions:

  • Is this a known issue in the Assist / TTS / playback path after some idle time (e.g. a connection going to sleep, a timeout, or TTS cache state)?
  • Are there any recommended workarounds other than “warming up” the assistant with a short question (“What time is it?”) before the actual internet question?
  • Do you need any specific logs (pipeline, Conversation, TTS) that I can attach to help diagnose this?

Environment (short description):

  • Home Assistant Core: up‑to‑date stable release (around 2026.xx.x).
  • Voice assistant with “OK Nabu” wake word (Assist / Nabu + microphone directly in HA).
  • Audio hardware: a simple USB microphone and a speaker connected directly to the Home Assistant host (no external Google Home / Alexa devices involved).

Thanks in advance for any hints.
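On the warm-up workaround: until the root cause is found, one option is to poke the Assist pipeline periodically so the first “real” request never follows a long idle gap. A minimal stdlib sketch, assuming a long-lived access token and HA’s documented REST endpoint `/api/conversation/process` (the host and token below are placeholders):

```python
import json
import urllib.request

# Placeholders -- substitute your HA URL and a long-lived access token
# (created under your user profile's Security tab).
HA_URL = "http://homeassistant.local:8123"
TOKEN = "YOUR_LONG_LIVED_ACCESS_TOKEN"

def build_warmup_request(text="what time is it"):
    """Build a POST to HA's /api/conversation/process REST endpoint."""
    return urllib.request.Request(
        f"{HA_URL}/api/conversation/process",
        data=json.dumps({"text": text}).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {TOKEN}",
            "Content-Type": "application/json",
        },
    )

# To actually fire it, e.g. from cron every hour:
#     urllib.request.urlopen(build_warmup_request())
```

Note this exercises the conversation/TTS generation path but not the satellite’s audio playback, so if the failure lives in the snd service it may not help — though even a negative result would narrow down where the problem is.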