ESP32-S3-Box-3 Voice Assistance broke and I cannot get it to work again

daletbet · August 29, 2024, 9:45am

Hi, I had the ESP32 Box set up and working perfectly with HA. I then went on vacation, and it seems some updates were installed that broke it. I get “speech-to-text failed (stt-stream-failed)” when I debug the pipeline in HA. I read that this issue is related to the UDP ports being randomly selected on the ESP and HA, but it seems this is no longer the case (I also tried changing the router settings, to no avail). I have HA deployed in Docker on a Synology NAS, and I’m running openWakeWord in Docker as well. If I manually type into the HA voice assistant, it all works fine. I tried different YAMLs for the ESP device, but none work now (I’m on the default one now). Debug log and openWakeWord setup below, please help:

stage: done
run:
  pipeline: 01hz2ea0c9jt204t7rg0sae4yg
  language: en
events:
  - type: run-start
    data:
      pipeline: 01hz2ea0c9jt204t7rg0sae4yg
      language: en
    timestamp: "2024-08-29T09:34:52.051669+00:00"
  - type: wake_word-start
    data:
      entity_id: wake_word.openwakeword
      metadata:
        format: wav
        codec: pcm
        bit_rate: 16
        sample_rate: 16000
        channel: 1
      timeout: 5
    timestamp: "2024-08-29T09:34:52.051798+00:00"
  - type: wake_word-end
    data:
      wake_word_output:
        wake_word_id: ok_nabu_v0.1
        wake_word_phrase: ok nabu
        timestamp: 15500
    timestamp: "2024-08-29T09:35:07.091295+00:00"
  - type: stt-start
    data:
      engine: stt.home_assistant_cloud
      metadata:
        language: en-US
        format: wav
        codec: pcm
        bit_rate: 16
        sample_rate: 16000
        channel: 1
    timestamp: "2024-08-29T09:35:07.091518+00:00"
  - type: error
    data:
      code: stt-stream-failed
      message: speech-to-text failed
    timestamp: "2024-08-29T09:35:07.295904+00:00"
  - type: run-end
    data: null
    timestamp: "2024-08-29T09:35:07.296713+00:00"
wake_word:
  entity_id: wake_word.openwakeword
  metadata:
    format: wav
    codec: pcm
    bit_rate: 16
    sample_rate: 16000
    channel: 1
  timeout: 5
  done: true
  wake_word_output:
    wake_word_id: ok_nabu_v0.1
    wake_word_phrase: ok nabu
    timestamp: 15500
stt:
  engine: stt.home_assistant_cloud
  metadata:
    language: en-US
    format: wav
    codec: pcm
    bit_rate: 16
    sample_rate: 16000
    channel: 1
  done: false
error:
  code: stt-stream-failed
  message: speech-to-text failed
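
One thing that can be read off the debug output above is how quickly the STT stage fails after it starts. The following is a minimal sketch that computes the gap between the `stt-start` event and the `error` event, using the two timestamps copied from the log; the helper name `seconds_between` is made up for illustration. A gap of a fraction of a second would suggest the stream was rejected almost immediately rather than timing out, though that is only an inference and not a diagnosis.

```python
from datetime import datetime


def seconds_between(start_iso: str, end_iso: str) -> float:
    """Return the elapsed seconds between two ISO 8601 timestamps."""
    start = datetime.fromisoformat(start_iso)
    end = datetime.fromisoformat(end_iso)
    return (end - start).total_seconds()


# Timestamps copied from the pipeline debug output above.
STT_START = "2024-08-29T09:35:07.091518+00:00"  # stt-start event
STT_ERROR = "2024-08-29T09:35:07.295904+00:00"  # error event (stt-stream-failed)

elapsed = seconds_between(STT_START, STT_ERROR)
print(f"STT failed {elapsed:.3f} s after it started")
```

The same helper can be pointed at any other pair of event timestamps from the debug view, for example `wake_word-start` and `wake_word-end`.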

{
   "CapAdd" : null,
   "CapDrop" : null,
   "cmd" : "--preload-model ok_nabu",
   "cpu_priority" : 50,
   "enable_publish_all_ports" : false,
   "enable_restart_policy" : false,
   "enabled" : true,
   "env_variables" : [
      {
         "key" : "PATH",
         "value" : "/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin"
      }
   ],
   "exporting" : false,
   "id" : "246065864a30f2604a88d48c1e690b7828a11813d0c9b16f08c67ccaade765c9",
   "image" : "rhasspy/wyoming-openwakeword",
   "is_ddsm" : false,
   "is_package" : false,
   "labels" : {},
   "links" : [],
   "memory_limit" : 0,
   "name" : "eloquent_dhawan",
   "network" : [
      {
         "driver" : "bridge",
         "name" : "bridge"
      }
   ],
   "network_mode" : "default",
   "port_bindings" : [
      {
         "container_port" : 10400,
         "host_port" : 10400,
         "type" : "tcp"
      },
      {
         "container_port" : 10400,
         "host_port" : 10400,
         "type" : "udp"
      }
   ],
   "privileged" : false,
   "shortcut" : {
      "enable_shortcut" : false,
      "enable_status_page" : false,
      "enable_web_page" : false,
      "web_page_url" : ""
   },
   "use_host_network" : false,
   "version" : 2,
   "volume_bindings" : []
}
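
To rule out the container port mapping shown above, a basic reachability check can be run from another machine on the same network. This is a minimal sketch, assuming the openWakeWord service accepts TCP connections on the published port 10400; the function name `tcp_port_open` and the example IP address are placeholders, not values from this setup. The debug output already shows the wake word being detected, so this is mostly a sanity check and says nothing about the cloud STT step that actually fails.

```python
import socket


def tcp_port_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        # create_connection resolves the host and attempts a TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False


# Example usage (hypothetical address, replace with the NAS's LAN IP):
#   print(tcp_port_open("192.168.1.10", 10400))
```

A result of `False` would point at the Docker port binding or a firewall on the NAS; `True` only confirms that something is listening on that port.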

robgough1970 · August 29, 2024, 11:18am

Check the logs for the device from the ESPHome dashboard. Say the wake word and watch the logs to see whether the device is responding to your voice.

daletbet · August 29, 2024, 12:03pm

Yes, I forgot to mention that I did that as well, but I get the same STT error (the wake word is triggering the pipeline, but the transmission of the data fails?):

[14:01:13][D][voice_assistant:258]: VAD detected speech
[14:01:13][D][voice_assistant:504]: State changed from WAITING_FOR_VAD to START_PIPELINE
[14:01:13][D][voice_assistant:510]: Desired state set to STREAMING_MICROPHONE
[14:01:13][D][voice_assistant:275]: Requesting start...
[14:01:13][D][voice_assistant:504]: State changed from START_PIPELINE to STARTING_PIPELINE
[14:01:13][D][voice_assistant:525]: Client started, streaming microphone
[14:01:13][D][voice_assistant:504]: State changed from STARTING_PIPELINE to STREAMING_MICROPHONE
[14:01:13][D][voice_assistant:510]: Desired state set to STREAMING_MICROPHONE
[14:01:13][D][voice_assistant:627]: Event Type: 1
[14:01:13][D][voice_assistant:630]: Assist Pipeline running
[14:01:13][D][voice_assistant:627]: Event Type: 9
[14:01:14][D][voice_assistant:627]: Event Type: 10
[14:01:14][D][voice_assistant:636]: Wake word detected
[14:01:14][D][voice_assistant:627]: Event Type: 3
[14:01:14][D][voice_assistant:641]: STT started
[14:01:14][D][text_sensor:064]: 'text_request': Sending state '...'
[14:01:14][D][text_sensor:064]: 'text_response': Sending state '...'
[14:01:14][W][component:237]: Component voice_assistant took a long time for an operation (229 ms).
[14:01:14][W][component:238]: Components should block for at most 30 ms.
[14:01:14][D][light:036]: 'LCD Backlight' Setting:
[14:01:14][D][light:085]:   Transition length: 0.2s
[14:01:14][D][voice_assistant:627]: Event Type: 0
[14:01:14][E][voice_assistant:759]: Error: stt-stream-failed - speech-to-text failed

Rofo · August 29, 2024, 2:54pm

Have you made any network changes? Is the box on a different VLAN?

daletbet · August 29, 2024, 8:58pm

No, unless something to do with the network got updated… I don’t think I have different VLANs… The Box and the NAS are on the same network (the Box on Wi-Fi, the NAS on wired LAN, but that has always been the case). I am not super confident with network stuff, but HA/Docker and the Box are on the same internal domain… The Box is also successfully connected to HA.