Hi All,
I have an ESP32-S3-BOX-3 “sort of” working with Home Assistant. I never hear any audio come out of it, but the response text does show up on its screen. While debugging, I can play the TTS audio in my browser, or by requesting the URL directly. I'm running Home Assistant in a container (Podman). I had to expose specific UDP ports in the container definition, and I also edited voice_assistant.py so it uses the same UDP port I exposed (8124). That part seems to work well; the only remaining issue is the missing audio. I can hear the box make a kind of “click”, but I have never heard any actual speech. Am I missing an obvious configuration setting? Here are some debug logs from my latest attempt:
2024-05-20 18:16:35.719 DEBUG (MainThread) [aioesphomeapi.connection] esp32-s3-box-3-05af40 @ 192.168.1.14: Got message of type VoiceAssistantRequest: start: true
conversation_id: "01HYC7Z6P4PP1S0J9MBBT8S94A"
flags: 1
audio_settings {
noise_suppression_level: 2
auto_gain: 31
volume_multiplier: 2
}
wake_word_phrase: "okay nabu"
2024-05-20 18:16:35.729 DEBUG (MainThread) [homeassistant.components.esphome.voice_assistant] Starting pipeline
2024-05-20 18:16:35.730 DEBUG (MainThread) [aioesphomeapi.connection] esp32-s3-box-3-05af40 @ 192.168.1.14: Sending VoiceAssistantResponse: port: 8124
2024-05-20 18:16:35.731 DEBUG (MainThread) [aioesphomeapi._frame_helper.base] esp32-s3-box-3-05af40 @ 192.168.1.14: Sending frame: [00035b08bc3f]
2024-05-20 18:16:35.736 DEBUG (MainThread) [aioesphomeapi.connection] esp32-s3-box-3-05af40 @ 192.168.1.14: Sending VoiceAssistantEventResponse: event_type: VOICE_ASSISTANT_RUN_START
2024-05-20 18:16:35.736 DEBUG (MainThread) [aioesphomeapi._frame_helper.base] esp32-s3-box-3-05af40 @ 192.168.1.14: Sending frame: [00025c0801]
2024-05-20 18:16:35.738 DEBUG (MainThread) [aioesphomeapi.connection] esp32-s3-box-3-05af40 @ 192.168.1.14: Sending VoiceAssistantEventResponse: event_type: VOICE_ASSISTANT_STT_START
2024-05-20 18:16:35.739 DEBUG (MainThread) [aioesphomeapi._frame_helper.base] esp32-s3-box-3-05af40 @ 192.168.1.14: Sending frame: [00025c0803]
2024-05-20 18:16:36.590 DEBUG (MainThread) [aioesphomeapi.connection] esp32-s3-box-3-05af40 @ 192.168.1.14: Sending VoiceAssistantEventResponse: event_type: VOICE_ASSISTANT_STT_VAD_START
2024-05-20 18:16:36.590 DEBUG (MainThread) [aioesphomeapi._frame_helper.base] esp32-s3-box-3-05af40 @ 192.168.1.14: Sending frame: [00025c080b]
2024-05-20 18:16:37.606 DEBUG (MainThread) [aioesphomeapi.connection] esp32-s3-box-3-05af40 @ 192.168.1.14: Sending VoiceAssistantEventResponse: event_type: VOICE_ASSISTANT_STT_VAD_END
2024-05-20 18:16:37.606 DEBUG (MainThread) [aioesphomeapi._frame_helper.base] esp32-s3-box-3-05af40 @ 192.168.1.14: Sending frame: [00025c080c]
2024-05-20 18:16:37.728 DEBUG (MainThread) [aioesphomeapi.connection] esp32-s3-box-3-05af40 @ 192.168.1.14: Sending VoiceAssistantEventResponse: event_type: VOICE_ASSISTANT_STT_END
data {
name: "text"
value: "What time is it?"
}
2024-05-20 18:16:37.728 DEBUG (MainThread) [aioesphomeapi._frame_helper.base] esp32-s3-box-3-05af40 @ 192.168.1.14: Sending frame: [001c5c080412180a04746578741210576861742074696d652069732069743f]
2024-05-20 18:16:37.729 DEBUG (MainThread) [aioesphomeapi.connection] esp32-s3-box-3-05af40 @ 192.168.1.14: Sending VoiceAssistantEventResponse: event_type: VOICE_ASSISTANT_INTENT_START
2024-05-20 18:16:37.729 DEBUG (MainThread) [aioesphomeapi._frame_helper.base] esp32-s3-box-3-05af40 @ 192.168.1.14: Sending frame: [00025c0805]
2024-05-20 18:16:38.567 DEBUG (MainThread) [aioesphomeapi.connection] esp32-s3-box-3-05af40 @ 192.168.1.14: Sending VoiceAssistantEventResponse: event_type: VOICE_ASSISTANT_INTENT_END
data {
name: "conversation_id"
value: "01HYC7Z6P4PP1S0J9MBBT8S94A"
}
2024-05-20 18:16:38.567 DEBUG (MainThread) [aioesphomeapi._frame_helper.base] esp32-s3-box-3-05af40 @ 192.168.1.14: Sending frame: [00315c0806122d0a0f636f6e766572736174696f6e5f6964121a3031485943375a36503450503153304a394d4242543853393441]
2024-05-20 18:16:38.570 DEBUG (MainThread) [aioesphomeapi.connection] esp32-s3-box-3-05af40 @ 192.168.1.14: Sending VoiceAssistantEventResponse: event_type: VOICE_ASSISTANT_TTS_START
data {
name: "text"
value: "The current time is 2024-05-20 18:14:10."
}
2024-05-20 18:16:38.571 DEBUG (MainThread) [aioesphomeapi._frame_helper.base] esp32-s3-box-3-05af40 @ 192.168.1.14: Sending frame: [00345c080712300a047465787412285468652063757272656e742074696d6520697320323032342d30352d32302031383a31343a31302e]
2024-05-20 18:16:38.573 DEBUG (MainThread) [aioesphomeapi.connection] esp32-s3-box-3-05af40 @ 192.168.1.14: Sending VoiceAssistantEventResponse: event_type: VOICE_ASSISTANT_TTS_END
data {
name: "url"
value: "http://192.168.1.219:8123/api/tts_proxy/062b908cdde4e4849f58c50e86282a26f2793e8a_en-us_fb508441d3_tts.home_assistant_cloud.wav"
}
2024-05-20 18:16:38.574 DEBUG (MainThread) [aioesphomeapi._frame_helper.base] esp32-s3-box-3-05af40 @ 192.168.1.14: Sending frame: [008a015c08081285010a0375726c127e687474703a2f2f3139322e3136382e312e3231393a383132332f6170692f7474735f70726f78792f303632623930386364646534653438343966353863353065383632383261323666323739336538615f656e2d75735f666235303834343164335f7474732e686f6d655f617373697374616e745f636c6f75642e776176]
2024-05-20 18:16:38.574 DEBUG (MainThread) [aioesphomeapi.connection] esp32-s3-box-3-05af40 @ 192.168.1.14: Sending VoiceAssistantEventResponse: event_type: VOICE_ASSISTANT_RUN_END
2024-05-20 18:16:38.574 DEBUG (MainThread) [aioesphomeapi._frame_helper.base] esp32-s3-box-3-05af40 @ 192.168.1.14: Sending frame: [00025c0802]
2024-05-20 18:16:38.577 DEBUG (MainThread) [aioesphomeapi.connection] esp32-s3-box-3-05af40 @ 192.168.1.14: Sending VoiceAssistantEventResponse: event_type: VOICE_ASSISTANT_TTS_STREAM_START
2024-05-20 18:16:38.577 DEBUG (MainThread) [aioesphomeapi._frame_helper.base] esp32-s3-box-3-05af40 @ 192.168.1.14: Sending frame: [00025c0862]
2024-05-20 18:16:39.158 DEBUG (MainThread) [homeassistant.components.esphome.voice_assistant] Sending 190464 bytes of audio
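To double-check what Home Assistant actually told the box, the hex in the `Sending frame:` lines can be decoded. Below is a minimal Python sketch. It assumes the unencrypted (plaintext) API framing, which consists of a 0x00 preamble, a varint payload length, a varint message type and then a protobuf payload. The message-type IDs (91, 92) and event IDs were inferred by pairing each frame with the log line printed just above it, not taken from the ESPHome source, so treat them as assumptions.

```python
"""Decode the plaintext ESPHome API frames printed in the debug log.

Assumption: unencrypted framing = 0x00 preamble, varint payload length,
varint message type, protobuf payload.
"""

# IDs inferred by matching frames to the log lines above them (assumption).
MESSAGE_TYPES = {91: "VoiceAssistantResponse", 92: "VoiceAssistantEventResponse"}
EVENT_TYPES = {
    1: "VOICE_ASSISTANT_RUN_START",
    2: "VOICE_ASSISTANT_RUN_END",
    3: "VOICE_ASSISTANT_STT_START",
    4: "VOICE_ASSISTANT_STT_END",
    5: "VOICE_ASSISTANT_INTENT_START",
    6: "VOICE_ASSISTANT_INTENT_END",
    7: "VOICE_ASSISTANT_TTS_START",
    8: "VOICE_ASSISTANT_TTS_END",
    11: "VOICE_ASSISTANT_STT_VAD_START",
    12: "VOICE_ASSISTANT_STT_VAD_END",
    98: "VOICE_ASSISTANT_TTS_STREAM_START",
}


def read_varint(buf: bytes, pos: int) -> tuple[int, int]:
    """Read a protobuf base-128 varint; return (value, next position)."""
    value = shift = 0
    while True:
        byte = buf[pos]
        pos += 1
        value |= (byte & 0x7F) << shift
        if not byte & 0x80:  # high bit clear: last byte of the varint
            return value, pos
        shift += 7


def decode_frame(hex_frame: str) -> tuple[str, int]:
    """Return (message name, value of protobuf field 1) for one logged frame."""
    buf = bytes.fromhex(hex_frame)
    if buf[0] != 0x00:
        raise ValueError("not a plaintext frame (missing 0x00 preamble)")
    length, pos = read_varint(buf, 1)
    msg_type, pos = read_varint(buf, pos)
    payload = buf[pos:]
    if len(payload) != length:
        raise ValueError(f"expected {length} payload bytes, got {len(payload)}")
    tag, pos = read_varint(payload, 0)
    if tag != 0x08:  # field 1, wire type 0 (varint)
        raise ValueError(f"unexpected first tag {tag:#x}")
    field1, _ = read_varint(payload, pos)
    return MESSAGE_TYPES.get(msg_type, str(msg_type)), field1


if __name__ == "__main__":
    # Frames copied from the log: the port reply, RUN_START and TTS_STREAM_START.
    for frame in ("00035b08bc3f", "00025c0801", "00025c0862"):
        name, value = decode_frame(frame)
        if name == "VoiceAssistantEventResponse":
            print(name, EVENT_TYPES.get(value, value))
        else:
            print(name, "port =", value)
```

Run as a script, the first line printed is `VoiceAssistantResponse port = 8124`, so the edited voice_assistant.py does appear to be advertising the port exposed in the container.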
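Since the `tts_proxy` URL from the `VOICE_ASSISTANT_TTS_END` event plays fine in a browser, it may also help to check the format of that WAV file. Here is a small Python sketch using only the standard library. The script name and the idea of passing the URL as a command-line argument are my own choices, not anything Home Assistant provides. Note that the `wave` module only reads uncompressed PCM and raises `wave.Error` otherwise.

```python
"""Print basic format info for the TTS WAV served by Home Assistant."""
import io
import sys
import urllib.request
import wave


def describe_wav(data: bytes) -> dict:
    """Return channel count, sample width, rate, duration and PCM size."""
    # wave only handles uncompressed PCM; other encodings raise wave.Error.
    with wave.open(io.BytesIO(data), "rb") as wav:
        frames = wav.getnframes()
        rate = wav.getframerate()
        channels = wav.getnchannels()
        width = wav.getsampwidth()
        return {
            "channels": channels,
            "sample_width_bytes": width,
            "sample_rate_hz": rate,
            "duration_s": frames / rate,
            "pcm_bytes": frames * channels * width,
        }


if __name__ == "__main__" and len(sys.argv) == 2 and sys.argv[1].startswith("http"):
    # Usage (hypothetical script name): python check_tts_wav.py <tts_proxy URL>
    with urllib.request.urlopen(sys.argv[1]) as resp:
        print(describe_wav(resp.read()))
```

If the `Sending 190464 bytes of audio` figure counts raw PCM samples (an assumption), `pcm_bytes` should come out close to it, and the reported rate, width and channel count can be compared with what the box's speaker configuration expects.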