S3 Box 3 does not trigger Whisper STT / streaming not working

After setting up an Assist pipeline and installing the latest voice assistant firmware on an S3 Box 3, local wake word detection works and opens a microphone stream, but there is no response in Whisper. Home Assistant shows Assist as working, and I can see Whisper being triggered in the Assist debug view, but no text comes back and it seems to just hang.

The ESP32 screen also just shows the white logo (not the intent logo) and hangs until I mute it from Home Assistant.

I tried switching to wake word detection in the Assist pipeline via openWakeWord, but that doesn't work either (no logs, no wake word detected).

The S3 Box 3 shows as "Assist in progress" in HA even when wake word detection is on-device and the device is in the detecting-wake-word state, which seems incorrect to me.


Here are the logs from the S3 Box:

[D][micro_wake_word:170]: State changed from START_MICROPHONE to STARTING_MICROPHONE
[D][esp-idf:000]: I (941588) I2S: DMA Malloc info, datalen=blocksize=512, dma_buf_count=8
[D][esp-idf:000]: I (941592) I2S: I2S0, MCLK output by GPIO2
[D][esp-idf:000]: I (941596) AUDIO_PIPELINE: link el->rb, el:0x3d05c5f0, tag:i2s, rb:0x3d05ca04
[D][esp-idf:000]: I (941598) AUDIO_PIPELINE: link el->rb, el:0x3d05c764, tag:filter, rb:0x3d05ea44
[D][esp-idf:000]: I (941603) AUDIO_ELEMENT: [i2s-0x3d05c5f0] Element task created
[D][esp-idf:000]: I (941605) AUDIO_THREAD: The filter task allocate stack on external memory
[D][esp-idf:000]: I (941608) AUDIO_ELEMENT: [filter-0x3d05c764] Element task created
[D][esp-idf:000]: I (941610) AUDIO_ELEMENT: [raw-0x3d05c894] Element task created
[D][esp-idf:000]: I (941614) AUDIO_ELEMENT: [i2s] AEL_MSG_CMD_RESUME,state:1
[D][esp-idf:000]: I (941617) AUDIO_ELEMENT: [filter] AEL_MSG_CMD_RESUME,state:1
[D][esp-idf:000]: I (941620) RSP_FILTER: sample rate of source data : 16000, channel of source data : 2, sample rate of destination data : 16000, channel of destination data : 1
[D][esp-idf:000]: I (941624) AUDIO_PIPELINE: Pipeline started
[D][esp_adf.microphone:273]: Microphone started
[D][micro_wake_word:170]: State changed from STARTING_MICROPHONE to DETECTING_WAKE_WORD
[D][esp32.preferences:114]: Saving 1 preferences to flash...
[D][esp32.preferences:143]: Saving 1 preferences to flash: 1 cached, 0 written, 0 failed
[D][micro_wake_word:121]: Wake Word Detected
[D][micro_wake_word:170]: State changed from DETECTING_WAKE_WORD to STOP_MICROPHONE
[D][micro_wake_word:127]: Stopping Microphone
[D][esp_adf.microphone:234]: Stopping microphone
[D][micro_wake_word:170]: State changed from STOP_MICROPHONE to STOPPING_MICROPHONE
[D][esp-idf:000]: W (1094412) AUDIO_PIPELINE: There are no listener registered
[D][esp-idf:000]: I (1094414) AUDIO_PIPELINE: audio_pipeline_unlinked
[D][esp-idf:000]: W (1094414) AUDIO_ELEMENT: [i2s] Element has not create when AUDIO_ELEMENT_TERMINATE
[D][esp-idf:000]: I (1094416) I2S: DMA queue destroyed
[D][esp-idf:000]: W (1094418) AUDIO_ELEMENT: [filter] Element has not create when AUDIO_ELEMENT_TERMINATE
[D][esp-idf:000]: W (1094420) AUDIO_ELEMENT: [raw] Element has not create when AUDIO_ELEMENT_TERMINATE
[D][esp_adf.microphone:285]: Microphone stopped
[D][micro_wake_word:170]: State changed from STOPPING_MICROPHONE to IDLE
[D][voice_assistant:416]: State changed from IDLE to START_PIPELINE
[D][voice_assistant:422]: Desired state set to START_MICROPHONE
[D][voice_assistant:118]: microphone not running
[D][voice_assistant:202]: Requesting start...
[D][voice_assistant:416]: State changed from START_PIPELINE to STARTING_PIPELINE
[D][voice_assistant:437]: Client started, streaming microphone
[D][voice_assistant:416]: State changed from STARTING_PIPELINE to START_MICROPHONE
[D][voice_assistant:422]: Desired state set to STREAMING_MICROPHONE
[D][voice_assistant:155]: Starting Microphone
[D][voice_assistant:416]: State changed from START_MICROPHONE to STARTING_MICROPHONE
[D][voice_assistant:523]: Event Type: 1
[D][voice_assistant:526]: Assist Pipeline running
[D][voice_assistant:523]: Event Type: 3
[D][voice_assistant:537]: STT started
[D][esp-idf:000]: I (1094475) AUDIO_PIPELINE: link el->rb, el:0x3d05c5f0, tag:i2s, rb:0x3d05ca04
[D][esp-idf:000]: I (1094477) AUDIO_PIPELINE: link el->rb, el:0x3d05c764, tag:filter, rb:0x3d05ea44
[D][esp-idf:000]: I (1094481) AUDIO_ELEMENT: [i2s-0x3d05c5f0] Element task created
[D][esp-idf:000]: I (1094481) AUDIO_THREAD: The filter task allocate stack on external memory
[D][esp-idf:000]: I (1094484) AUDIO_ELEMENT: [filter-0x3d05c764] Element task created
[D][esp-idf:000]: I (1094484) AUDIO_ELEMENT: [raw-0x3d05c894] Element task created
[D][esp-idf:000]: I (1094488) AUDIO_ELEMENT: [i2s] AEL_MSG_CMD_RESUME,state:1
[D][esp-idf:000]: I (1094490) AUDIO_ELEMENT: [filter] AEL_MSG_CMD_RESUME,state:1
[D][esp-idf:000]: I (1094493) RSP_FILTER: sample rate of source data : 16000, channel of source data : 2, sample rate of destination data : 16000, channel of destination data : 1
[D][esp-idf:000]: I (1094495) AUDIO_PIPELINE: Pipeline started
[W][component:214]: Component voice_assistant took a long time for an operation (0.22 s).
[W][component:215]: Components should block for at most 20-30ms.
[D][esp_adf.microphone:273]: Microphone started
[D][voice_assistant:416]: State changed from STARTING_MICROPHONE to STREAMING_MICROPHONE

There are no logs in Whisper until I mute the box, which I assume kills the microphone stream.

The debug info from Assist debug:

stage: stt
run:
  pipeline: 01h3aqwm2apt1dftgvbzyfb4sw
  language: en
events:
  - type: run-start
    data:
      pipeline: 01h3aqwm2apt1dftgvbzyfb4sw
      language: en
    timestamp: "2024-03-19T19:25:40.454130+00:00"
  - type: stt-start
    data:
      engine: stt.faster_whisper
      metadata:
        language: en
        format: wav
        codec: pcm
        bit_rate: 16
        sample_rate: 16000
        channel: 1
    timestamp: "2024-03-19T19:25:40.454451+00:00"
stt:
  engine: stt.faster_whisper
  metadata:
    language: en
    format: wav
    codec: pcm
    bit_rate: 16
    sample_rate: 16000
    channel: 1
  done: false

I thought it could just be slow to process, so I left it for a few minutes, but nothing.

My Assist pipeline is pretty basic, using faster-whisper with the tiny model; I've tried different models with no difference. Piper is standard and works fine when testing in Assist debug.

Also, to fully verify the Assist pipeline, I used a microphone on my PC in debug mode, which works perfectly fine:

init_options:
  start_stage: wake_word
  end_stage: tts
  input:
    sample_rate: 44100
  pipeline: 01h3aqwm2apt1dftgvbzyfb4sw
  conversation_id: null
stage: done
run:
  pipeline: 01h3aqwm2apt1dftgvbzyfb4sw
  language: en
  runner_data:
    stt_binary_handler_id: 4
    timeout: 300
events:
  - type: run-start
    data:
      pipeline: 01h3aqwm2apt1dftgvbzyfb4sw
      language: en
      runner_data:
        stt_binary_handler_id: 4
        timeout: 300
    timestamp: "2024-03-19T19:30:39.246644+00:00"
  - type: wake_word-start
    data:
      entity_id: wake_word.openwakeword
      metadata:
        format: wav
        codec: pcm
        bit_rate: 16
        sample_rate: 16000
        channel: 1
      timeout: 3
    timestamp: "2024-03-19T19:30:39.246888+00:00"
  - type: wake_word-end
    data:
      wake_word_output:
        wake_word_id: ok_nabu_v0.1
        wake_word_phrase: ok nabu
        timestamp: 1990
    timestamp: "2024-03-19T19:30:43.479709+00:00"
  - type: stt-start
    data:
      engine: stt.faster_whisper
      metadata:
        language: en
        format: wav
        codec: pcm
        bit_rate: 16
        sample_rate: 16000
        channel: 1
    timestamp: "2024-03-19T19:30:43.479975+00:00"
  - type: stt-vad-start
    data:
      timestamp: 2490
    timestamp: "2024-03-19T19:30:44.445430+00:00"
  - type: stt-vad-end
    data:
      timestamp: 3340
    timestamp: "2024-03-19T19:30:46.149604+00:00"
  - type: stt-end
    data:
      stt_output:
        text: " Turn on Bedroom Light."
    timestamp: "2024-03-19T19:30:46.694521+00:00"
  - type: intent-start
    data:
      engine: homeassistant
      language: en
      intent_input: " Turn on Bedroom Light."
      conversation_id: null
      device_id: null
    timestamp: "2024-03-19T19:30:46.694586+00:00"
  - type: intent-end
    data:
      intent_output:
        response:
          speech:
            plain:
              speech: Turned on the light
              extra_data: null
          card: {}
          language: en
          response_type: action_done
          data:
            targets: []
            success:
              - name: Bedroom Light
                type: entity
                id: light.bedroom_light
            failed: []
        conversation_id: null
    timestamp: "2024-03-19T19:30:46.719617+00:00"
  - type: tts-start
    data:
      engine: tts.piper
      language: en_GB
      voice: en_GB-vctk-medium
      tts_input: Turned on the light
    timestamp: "2024-03-19T19:30:46.719676+00:00"
  - type: tts-end
    data:
      tts_output:
        media_id: >-
          media-source://tts/tts.piper?message=Turned+on+the+light&language=en_GB&voice=en_GB-vctk-medium
        url: >-
          /api/tts_proxy/104c89b5f9053e4751d03002aab527c96124bd77_en-gb_d3b473ba1f_tts.piper.mp3
        mime_type: audio/mpeg
    timestamp: "2024-03-19T19:30:46.719913+00:00"
  - type: run-end
    data: null
    timestamp: "2024-03-19T19:30:46.719940+00:00"
wake_word:
  entity_id: wake_word.openwakeword
  metadata:
    format: wav
    codec: pcm
    bit_rate: 16
    sample_rate: 16000
    channel: 1
  timeout: 3
  done: true
  wake_word_output:
    wake_word_id: ok_nabu_v0.1
    wake_word_phrase: ok nabu
    timestamp: 1990
stt:
  engine: stt.faster_whisper
  metadata:
    language: en
    format: wav
    codec: pcm
    bit_rate: 16
    sample_rate: 16000
    channel: 1
  done: true
  stt_output:
    text: " Turn on Bedroom Light."
intent:
  engine: homeassistant
  language: en
  intent_input: " Turn on Bedroom Light."
  conversation_id: null
  device_id: null
  done: true
  intent_output:
    response:
      speech:
        plain:
          speech: Turned on the light
          extra_data: null
      card: {}
      language: en
      response_type: action_done
      data:
        targets: []
        success:
          - name: Bedroom Light
            type: entity
            id: light.bedroom_light
        failed: []
    conversation_id: null
tts:
  engine: tts.piper
  language: en_GB
  voice: en_GB-vctk-medium
  tts_input: Turned on the light
  done: true
  tts_output:
    media_id: >-
      media-source://tts/tts.piper?message=Turned+on+the+light&language=en_GB&voice=en_GB-vctk-medium
    url: >-
      /api/tts_proxy/104c89b5f9053e4751d03002aab527c96124bd77_en-gb_d3b473ba1f_tts.piper.mp3
    mime_type: audio/mpeg

So it must be something between the voice assistant on the S3 Box and the Assist pipeline?

Where should I look for more debugging, or has anyone come across this before?

Also, some other questions I've been trying to figure out that might help:

  • How does the pipeline know when to stop processing audio, i.e. when should the audio clip be trimmed?
  • What sits between the audio being streamed from the S3 Box and Whisper, i.e. what is Home Assistant doing?

Any help would be greatly appreciated as it’s been driving me nuts trying to figure out what it could be.

Some version details:
HA: 2023.3.1
Whisper: wyoming - 1.5.3, faster-whisper - 1.0.1 via rhasspy/wyoming-whisper:latest
ESP32 S3 Box 3: esphome.voice-assistant version 2.0 / ESPHome version 2024.2.2

Hi,
How are you running HA?
Have you had the S3 Box 3 working previously?
Are the S3 Box and HA on the same VLAN?
Cheers

HA runs in a Kubernetes cluster, with an nginx load balancer for routing.

The S3 Box has only ever worked to the point that it responds to the local wake word and can be controlled from HA (mute, backlight toggle, etc.); I've not tried it with any other firmware (only voice assistant).

Yes, they are both on the same flat VLAN.

HA requires all UDP ports to be open, as a random port is generated with each audio stream, so that is the first thing to check.
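Since the stream port is picked at random, one quick sanity check is whether UDP datagrams can reach a given port at all. The sketch below is a generic diagnostic, not anything specific to HA or ESPHome: it binds a listener on a port, fires one datagram at it, and reports whether it arrived (the port number in the usage line is an arbitrary example, not a port HA actually uses).

```python
import socket

def udp_reachable(host: str, port: int, payload: bytes = b"ping",
                  timeout: float = 2.0) -> bool:
    """Bind a UDP listener on host:port, send one datagram to it,
    and report whether the datagram arrived within the timeout."""
    recv = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    send = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        recv.bind((host, port))
        recv.settimeout(timeout)
        send.sendto(payload, (host, port))
        data, _addr = recv.recvfrom(1024)
        return data == payload
    except socket.timeout:
        return False
    finally:
        recv.close()
        send.close()

# Example (loopback only; 16055 is an arbitrary test port):
# udp_reachable("127.0.0.1", 16055)
```

To test across the actual network path, you would run the listener half on the HA host and the sender half on another machine on the VLAN, so a firewall or missing port mapping in between shows up as a timeout.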

Ah, interesting. From a quick search it seems the port range is not defined / is random.

I guess I'll need to try one of the workarounds.

Just to close this out, here's the workaround I ended up with:

I applied the changes in this PR to voice-assistant.py in the esphome module.

I added the modified file as a ConfigMap to my Home Assistant k8s deployment.

I updated the deployment to expose the set of UDP ports as NodePorts. This worked because the HA instance is exposed through an ingress, so the S3 Box tries to reach the host IP anyway.
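For anyone hitting the same thing, the NodePort exposure looked roughly like the sketch below. The port numbers, Service name, and labels here are all hypothetical; the real set of ports depends on whatever range the modified voice-assistant.py pins to. Note that a Service cannot express a port *range*, so each pinned UDP port needs its own entry, and NodePorts must fall in the cluster's allowed range (30000-32767 by default).

```yaml
# Sketch only: expose a pinned set of voice-assistant UDP ports as NodePorts.
# Names, labels, and port numbers are placeholders.
apiVersion: v1
kind: Service
metadata:
  name: home-assistant-voice-udp
spec:
  type: NodePort
  selector:
    app: home-assistant        # must match the HA deployment's pod labels
  ports:
    - name: voice-udp-0
      protocol: UDP
      port: 30100              # cluster-internal port
      targetPort: 30100        # port the HA container listens on
      nodePort: 30100          # port exposed on the node itself
    - name: voice-udp-1
      protocol: UDP
      port: 30101
      targetPort: 30101
      nodePort: 30101
```

Keeping `port`, `targetPort`, and `nodePort` identical avoids any translation between what HA binds and what the S3 Box sends to on the host IP.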

Also, in the PR there's mention of ongoing work to change the ESPHome voice assistant to receive audio over the ESPHome API instead :pray: