ESP32-S3-BOX-3 has no audio with

Hi All,

I have an ESP32-S3-BOX-3 “sort of” working with my Home Assistant. I never hear any audio come out of it, but I do see the text displayed on the screen. While debugging, I can play the audio in my browser or by calling the URL directly. I’m running a containerized (podman) version of Home Assistant. I had to allow specific UDP ports in the container definition, and I also updated voice_assistant.py to match the UDP port (8124). That seemed to work great; now the only issue is no audio. I can hear it kind of “click”, but I have never heard any actual audio. Am I missing an obvious configuration here? Here are some debug logs from my latest attempt:

2024-05-20 18:16:35.719 DEBUG (MainThread) [aioesphomeapi.connection] esp32-s3-box-3-05af40 @ 192.168.1.14: Got message of type VoiceAssistantRequest: start: true
conversation_id: "01HYC7Z6P4PP1S0J9MBBT8S94A"
flags: 1
audio_settings {
  noise_suppression_level: 2
  auto_gain: 31
  volume_multiplier: 2
}
wake_word_phrase: "okay nabu"

2024-05-20 18:16:35.729 DEBUG (MainThread) [homeassistant.components.esphome.voice_assistant] Starting pipeline
2024-05-20 18:16:35.730 DEBUG (MainThread) [aioesphomeapi.connection] esp32-s3-box-3-05af40 @ 192.168.1.14: Sending VoiceAssistantResponse: port: 8124

2024-05-20 18:16:35.731 DEBUG (MainThread) [aioesphomeapi._frame_helper.base] esp32-s3-box-3-05af40 @ 192.168.1.14: Sending frame: [00035b08bc3f]
2024-05-20 18:16:35.736 DEBUG (MainThread) [aioesphomeapi.connection] esp32-s3-box-3-05af40 @ 192.168.1.14: Sending VoiceAssistantEventResponse: event_type: VOICE_ASSISTANT_RUN_START

2024-05-20 18:16:35.736 DEBUG (MainThread) [aioesphomeapi._frame_helper.base] esp32-s3-box-3-05af40 @ 192.168.1.14: Sending frame: [00025c0801]
2024-05-20 18:16:35.738 DEBUG (MainThread) [aioesphomeapi.connection] esp32-s3-box-3-05af40 @ 192.168.1.14: Sending VoiceAssistantEventResponse: event_type: VOICE_ASSISTANT_STT_START

2024-05-20 18:16:35.739 DEBUG (MainThread) [aioesphomeapi._frame_helper.base] esp32-s3-box-3-05af40 @ 192.168.1.14: Sending frame: [00025c0803]
2024-05-20 18:16:36.590 DEBUG (MainThread) [aioesphomeapi.connection] esp32-s3-box-3-05af40 @ 192.168.1.14: Sending VoiceAssistantEventResponse: event_type: VOICE_ASSISTANT_STT_VAD_START

2024-05-20 18:16:36.590 DEBUG (MainThread) [aioesphomeapi._frame_helper.base] esp32-s3-box-3-05af40 @ 192.168.1.14: Sending frame: [00025c080b]
2024-05-20 18:16:37.606 DEBUG (MainThread) [aioesphomeapi.connection] esp32-s3-box-3-05af40 @ 192.168.1.14: Sending VoiceAssistantEventResponse: event_type: VOICE_ASSISTANT_STT_VAD_END

2024-05-20 18:16:37.606 DEBUG (MainThread) [aioesphomeapi._frame_helper.base] esp32-s3-box-3-05af40 @ 192.168.1.14: Sending frame: [00025c080c]
2024-05-20 18:16:37.728 DEBUG (MainThread) [aioesphomeapi.connection] esp32-s3-box-3-05af40 @ 192.168.1.14: Sending VoiceAssistantEventResponse: event_type: VOICE_ASSISTANT_STT_END
data {
  name: "text"
  value: "What time is it?"
}

2024-05-20 18:16:37.728 DEBUG (MainThread) [aioesphomeapi._frame_helper.base] esp32-s3-box-3-05af40 @ 192.168.1.14: Sending frame: [001c5c080412180a04746578741210576861742074696d652069732069743f]
2024-05-20 18:16:37.729 DEBUG (MainThread) [aioesphomeapi.connection] esp32-s3-box-3-05af40 @ 192.168.1.14: Sending VoiceAssistantEventResponse: event_type: VOICE_ASSISTANT_INTENT_START

2024-05-20 18:16:37.729 DEBUG (MainThread) [aioesphomeapi._frame_helper.base] esp32-s3-box-3-05af40 @ 192.168.1.14: Sending frame: [00025c0805]
2024-05-20 18:16:38.567 DEBUG (MainThread) [aioesphomeapi.connection] esp32-s3-box-3-05af40 @ 192.168.1.14: Sending VoiceAssistantEventResponse: event_type: VOICE_ASSISTANT_INTENT_END
data {
  name: "conversation_id"
  value: "01HYC7Z6P4PP1S0J9MBBT8S94A"
}

2024-05-20 18:16:38.567 DEBUG (MainThread) [aioesphomeapi._frame_helper.base] esp32-s3-box-3-05af40 @ 192.168.1.14: Sending frame: [00315c0806122d0a0f636f6e766572736174696f6e5f6964121a3031485943375a36503450503153304a394d4242543853393441]
2024-05-20 18:16:38.570 DEBUG (MainThread) [aioesphomeapi.connection] esp32-s3-box-3-05af40 @ 192.168.1.14: Sending VoiceAssistantEventResponse: event_type: VOICE_ASSISTANT_TTS_START
data {
  name: "text"
  value: "The current time is 2024-05-20 18:14:10."
}

2024-05-20 18:16:38.571 DEBUG (MainThread) [aioesphomeapi._frame_helper.base] esp32-s3-box-3-05af40 @ 192.168.1.14: Sending frame: [00345c080712300a047465787412285468652063757272656e742074696d6520697320323032342d30352d32302031383a31343a31302e]
2024-05-20 18:16:38.573 DEBUG (MainThread) [aioesphomeapi.connection] esp32-s3-box-3-05af40 @ 192.168.1.14: Sending VoiceAssistantEventResponse: event_type: VOICE_ASSISTANT_TTS_END
data {
  name: "url"
  value: "http://192.168.1.219:8123/api/tts_proxy/062b908cdde4e4849f58c50e86282a26f2793e8a_en-us_fb508441d3_tts.home_assistant_cloud.wav"
}

2024-05-20 18:16:38.574 DEBUG (MainThread) [aioesphomeapi._frame_helper.base] esp32-s3-box-3-05af40 @ 192.168.1.14: Sending frame: [008a015c08081285010a0375726c127e687474703a2f2f3139322e3136382e312e3231393a383132332f6170692f7474735f70726f78792f303632623930386364646534653438343966353863353065383632383261323666323739336538615f656e2d75735f666235303834343164335f7474732e686f6d655f617373697374616e745f636c6f75642e776176]
2024-05-20 18:16:38.574 DEBUG (MainThread) [aioesphomeapi.connection] esp32-s3-box-3-05af40 @ 192.168.1.14: Sending VoiceAssistantEventResponse: event_type: VOICE_ASSISTANT_RUN_END

2024-05-20 18:16:38.574 DEBUG (MainThread) [aioesphomeapi._frame_helper.base] esp32-s3-box-3-05af40 @ 192.168.1.14: Sending frame: [00025c0802]
2024-05-20 18:16:38.577 DEBUG (MainThread) [aioesphomeapi.connection] esp32-s3-box-3-05af40 @ 192.168.1.14: Sending VoiceAssistantEventResponse: event_type: VOICE_ASSISTANT_TTS_STREAM_START

2024-05-20 18:16:38.577 DEBUG (MainThread) [aioesphomeapi._frame_helper.base] esp32-s3-box-3-05af40 @ 192.168.1.14: Sending frame: [00025c0862]
2024-05-20 18:16:39.158 DEBUG (MainThread) [homeassistant.components.esphome.voice_assistant] Sending 190464 bytes of audio
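
For anyone wanting to sanity-check logs like these, here is a rough Python sketch, not anything taken from Home Assistant or aioesphomeapi. It assumes the logged frames use plaintext framing (a zero byte, a varint payload length, a varint message type, then the protobuf payload), which matches the port 8124 frame above. It also assumes the streamed audio is 16 kHz, 16-bit, mono PCM; that is a guess on my part and the real format may differ. The helper names (read_varint, split_frame, pcm_seconds) are made up for illustration.

```python
def read_varint(buf: bytes, pos: int) -> tuple[int, int]:
    """Decode a base-128 varint starting at pos; return (value, next_pos)."""
    value = 0
    shift = 0
    while True:
        byte = buf[pos]
        pos += 1
        value |= (byte & 0x7F) << shift
        if not byte & 0x80:  # high bit clear: last byte of the varint
            return value, pos
        shift += 7


def split_frame(hex_frame: str) -> tuple[int, bytes]:
    """Split a logged hex frame into (message_type, payload).

    Assumes plaintext framing: 0x00, varint length, varint type, payload.
    """
    raw = bytes.fromhex(hex_frame)
    if raw[0] != 0x00:
        raise ValueError("expected plaintext preamble 0x00")
    length, pos = read_varint(raw, 1)
    msg_type, pos = read_varint(raw, pos)
    payload = raw[pos:pos + length]
    if len(payload) != length:
        raise ValueError("truncated frame")
    return msg_type, payload


def pcm_seconds(num_bytes: int, sample_rate: int = 16000,
                sample_width: int = 2, channels: int = 1) -> float:
    """Playback time of raw PCM; the defaults are an assumption, not a known fact."""
    return num_bytes / (sample_rate * sample_width * channels)


# Frame logged right after "Sending VoiceAssistantResponse: port: 8124".
msg_type, payload = split_frame("00035b08bc3f")
# The payload is tag 0x08 (field 1, varint) followed by the port value.
port, _ = read_varint(payload, 1)
print(msg_type, port)

# Size taken from the "Sending 190464 bytes of audio" log line.
print(pcm_seconds(190464))
```

If those assumptions hold, the frame decodes to the same port the log reports, and 190464 bytes works out to roughly six seconds of audio. That would suggest Home Assistant is sending a full response and the problem is further downstream, either on the network path to the device or on the device itself.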

There have been complaints, but I’m not sure whether it is a problem or not.

It’s an ESPHome update issue. Mine worked until the latest version. See “ESP32 S3 Box 3: Why is this so difficult?!” (post #9 by JeeBee) for more.

Same. I also have the issue where both my M5Stack echo atoms and my two ESP32 boxes stop responding to wake words.