Thanks, I just collected logs from ESPHome (below). It does look like the microphone streams for 15 seconds (21:24:35 to 21:24:50) before the VAD end event stops it:
[21:24:35][D][voice_assistant:439]: State changed from STARTING_MICROPHONE to STREAMING_MICROPHONE
[21:24:50][D][voice_assistant:563]: Event Type: 12
[21:24:50][D][voice_assistant:721]: STT by VAD end
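In case it helps anyone check the same interval in their own logs, here is a rough Python sketch that measures the gap between two log messages. The function name `seconds_between` and the regex are my own and not part of ESPHome. It assumes the `[HH:MM:SS][level][component:line]: message` layout shown here, and since the timestamps only have one-second resolution the result is accurate to about a second.

```python
import re
from datetime import datetime

# Assumed layout: [HH:MM:SS][level][component:line]: message
LOG_LINE = re.compile(r"^\[(\d{2}:\d{2}:\d{2})\]\[\w\]\[([^\]]+)\]: (.*)$")


def seconds_between(lines, start_marker, end_marker):
    """Seconds from the first message containing start_marker to the
    next message containing end_marker, or None if either is missing."""
    start = None
    for line in lines:
        match = LOG_LINE.match(line.strip())
        if not match:
            continue  # skip anything that is not a log line
        stamp = datetime.strptime(match.group(1), "%H:%M:%S")
        message = match.group(3)
        if start is None:
            if start_marker in message:
                start = stamp
        elif end_marker in message:
            # Modulo one day so a capture that crosses midnight still works.
            return (stamp - start).total_seconds() % 86400
    return None


excerpt = [
    "[21:24:35][D][voice_assistant:439]: State changed from STARTING_MICROPHONE to STREAMING_MICROPHONE",
    "[21:24:50][D][voice_assistant:563]: Event Type: 12",
    "[21:24:50][D][voice_assistant:721]: STT by VAD end",
]

print(seconds_between(excerpt, "to STREAMING_MICROPHONE", "STT by VAD end"))  # prints 15.0
```

Running it over the full capture instead of the three-line excerpt should give the same number, since it stops at the first `STT by VAD end` after streaming starts.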
Full logs:
[21:24:34][D][micro_wake_word:362]: Wake word sliding average probability is 0.556 and most recent probability is 0.910
[21:24:34][D][micro_wake_word:128]: Wake Word Detected
[21:24:34][D][micro_wake_word:177]: State changed from DETECTING_WAKE_WORD to STOP_MICROPHONE
[21:24:34][D][micro_wake_word:134]: Stopping Microphone
[21:24:34][D][esp_adf.microphone:234]: Stopping microphone
[21:24:34][D][micro_wake_word:177]: State changed from STOP_MICROPHONE to STOPPING_MICROPHONE
[21:24:34][D][esp-idf:000]: W (226440) AUDIO_ELEMENT: IN-[filter] AEL_IO_ABORT
[21:24:34][D][esp-idf:000]: E (226444) AUDIO_ELEMENT: [filter] Element already stopped
[21:24:34][D][esp-idf:000]: W (226474) AUDIO_PIPELINE: There are no listener registered
[21:24:34][D][esp-idf:000]: I (226479) AUDIO_PIPELINE: audio_pipeline_unlinked
[21:24:34][D][esp-idf:000]: W (226482) AUDIO_ELEMENT: [i2s] Element has not create when AUDIO_ELEMENT_TERMINATE
[21:24:34][D][esp-idf:000]: I (226486) I2S: DMA queue destroyed
[21:24:34][D][esp-idf:000]: W (226493) AUDIO_ELEMENT: [filter] Element has not create when AUDIO_ELEMENT_TERMINATE
[21:24:34][D][esp-idf:000]: W (226497) AUDIO_ELEMENT: [raw] Element has not create when AUDIO_ELEMENT_TERMINATE
[21:24:34][D][esp_adf.microphone:285]: Microphone stopped
[21:24:34][D][micro_wake_word:177]: State changed from STOPPING_MICROPHONE to IDLE
[21:24:34][D][voice_assistant:439]: State changed from IDLE to START_PIPELINE
[21:24:34][D][voice_assistant:445]: Desired state set to START_MICROPHONE
[21:24:34][D][voice_assistant:126]: microphone not running
[21:24:34][D][voice_assistant:210]: Requesting start...
[21:24:34][D][voice_assistant:439]: State changed from START_PIPELINE to STARTING_PIPELINE
[21:24:34][D][voice_assistant:126]: microphone not running
[21:24:34][D][voice_assistant:126]: microphone not running
[21:24:34][D][voice_assistant:126]: microphone not running
[21:24:34][D][voice_assistant:126]: microphone not running
[21:24:34][D][voice_assistant:126]: microphone not running
[21:24:34][D][voice_assistant:126]: microphone not running
[21:24:34][D][voice_assistant:126]: microphone not running
[21:24:34][D][voice_assistant:126]: microphone not running
[21:24:34][D][voice_assistant:126]: microphone not running
[21:24:34][D][voice_assistant:460]: Client started, streaming microphone
[21:24:34][D][voice_assistant:439]: State changed from STARTING_PIPELINE to START_MICROPHONE
[21:24:34][D][voice_assistant:445]: Desired state set to STREAMING_MICROPHONE
[21:24:34][D][voice_assistant:163]: Starting Microphone
[21:24:34][D][voice_assistant:439]: State changed from START_MICROPHONE to STARTING_MICROPHONE
[21:24:34][D][voice_assistant:563]: Event Type: 1
[21:24:34][D][voice_assistant:566]: Assist Pipeline running
[21:24:34][D][esp-idf:000]: I (226638) I2S: DMA Malloc info, datalen=blocksize=512, dma_buf_count=8
[21:24:34][D][esp-idf:000]: I (226648) I2S: I2S0, MCLK output by GPIO2
[21:24:34][D][esp-idf:000]: I (226656) AUDIO_PIPELINE: link el->rb, el:0x3d05c4ac, tag:i2s, rb:0x3d05c8c0
[21:24:34][D][esp-idf:000]: I (226665) AUDIO_PIPELINE: link el->rb, el:0x3d05c620, tag:filter, rb:0x3d05e900
[21:24:34][D][esp-idf:000]: I (226671) AUDIO_ELEMENT: [i2s-0x3d05c4ac] Element task created
[21:24:34][D][esp-idf:000]: I (226675) AUDIO_THREAD: The filter task allocate stack on external memory
[21:24:34][D][esp-idf:000]: I (226680) AUDIO_ELEMENT: [filter-0x3d05c620] Element task created
[21:24:34][D][esp-idf:000]: I (226687) AUDIO_ELEMENT: [raw-0x3d05c750] Element task created
[21:24:34][D][esp-idf:000]: I (226693) AUDIO_PIPELINE: Func:audio_pipeline_run, Line:359, MEM Total:16463891 Bytes, Inter:82244 Bytes, Dram:82244 Bytes
[21:24:34][D][esp-idf:000]: I (226700) AUDIO_ELEMENT: [i2s] AEL_MSG_CMD_RESUME,state:1
[21:24:34][D][esp-idf:000]: I (226704) AUDIO_ELEMENT: [filter] AEL_MSG_CMD_RESUME,state:1
[21:24:34][D][esp-idf:000]: I (226709) RSP_FILTER: sample rate of source data : 16000, channel of source data : 2, sample rate of destination data : 16000, channel of destination data : 1
[21:24:34][D][esp-idf:000]: I (226716) AUDIO_PIPELINE: Pipeline started
[21:24:35][W][component:237]: Component voice_assistant took a long time for an operation (247 ms).
[21:24:35][W][component:238]: Components should block for at most 30 ms.
[21:24:35][D][esp_adf.microphone:273]: Microphone started
[21:24:35][D][voice_assistant:439]: State changed from STARTING_MICROPHONE to STREAMING_MICROPHONE
[21:24:50][D][voice_assistant:563]: Event Type: 12
[21:24:50][D][voice_assistant:721]: STT by VAD end
[21:24:50][D][voice_assistant:439]: State changed from STREAMING_MICROPHONE to STOP_MICROPHONE
[21:24:50][D][voice_assistant:445]: Desired state set to AWAITING_RESPONSE
[21:24:50][D][esp_adf.microphone:234]: Stopping microphone
[21:24:50][D][voice_assistant:439]: State changed from STOP_MICROPHONE to STOPPING_MICROPHONE
[21:24:50][D][esp-idf:000]: E (242363) AUDIO_ELEMENT: [filter] Element already stopped
[21:24:50][D][esp-idf:000]: W (242392) AUDIO_PIPELINE: There are no listener registered
[21:24:50][D][esp-idf:000]: I (242396) AUDIO_PIPELINE: audio_pipeline_unlinked
[21:24:50][D][esp-idf:000]: W (242401) AUDIO_ELEMENT: [i2s] Element has not create when AUDIO_ELEMENT_TERMINATE
[21:24:50][D][esp-idf:000]: I (242409) I2S: DMA queue destroyed
[21:24:50][D][esp-idf:000]: W (242415) AUDIO_ELEMENT: [filter] Element has not create when AUDIO_ELEMENT_TERMINATE
[21:24:50][D][esp-idf:000]: W (242420) AUDIO_ELEMENT: [raw] Element has not create when AUDIO_ELEMENT_TERMINATE
[21:24:50][W][component:237]: Component voice_assistant took a long time for an operation (230 ms).
[21:24:50][W][component:238]: Components should block for at most 30 ms.
[21:24:50][D][esp_adf.microphone:285]: Microphone stopped
[21:24:50][D][voice_assistant:439]: State changed from STOPPING_MICROPHONE to AWAITING_RESPONSE
[21:24:57][D][voice_assistant:563]: Event Type: 4
[21:24:57][D][voice_assistant:591]: Speech recognised as: " Turn on the master bedroom lights."
[21:24:57][D][text_sensor:064]: 'text_request': Sending state ' Turn on the master bedroom lights.'
[21:24:57][W][component:237]: Component voice_assistant took a long time for an operation (223 ms).
[21:24:57][W][component:238]: Components should block for at most 30 ms.
[21:24:57][D][voice_assistant:563]: Event Type: 5
[21:24:57][D][voice_assistant:596]: Intent started
[21:24:57][D][voice_assistant:563]: Event Type: 6
[21:24:57][D][voice_assistant:563]: Event Type: 7
[21:24:57][D][voice_assistant:619]: Response: "Turned on the light"
[21:24:57][D][text_sensor:064]: 'text_response': Sending state 'Turned on the light'
[21:24:57][D][voice_assistant:563]: Event Type: 98
[21:24:57][D][voice_assistant:704]: TTS stream start
[21:24:57][D][esp-idf:000]: I (249651) I2S: DMA Malloc info, datalen=blocksize=2048, dma_buf_count=8
[21:24:57][D][esp-idf:000]: I (249659) I2S: I2S0, MCLK output by GPIO2
[21:24:57][D][esp-idf:000]: I (249664) AUDIO_PIPELINE: link el->rb, el:0x3d05c34c, tag:raw, rb:0x3d05c4bc
[21:24:57][D][esp-idf:000]: I (249670) AUDIO_ELEMENT: [raw-0x3d05c34c] Element task created
[21:24:57][D][esp-idf:000]: I (249678) AUDIO_ELEMENT: [i2s-0x3d05c0a8] Element task created
[21:24:57][D][esp-idf:000]: I (249682) AUDIO_PIPELINE: Func:audio_pipeline_run, Line:359, MEM Total:16463679 Bytes, Inter:74792 Bytes, Dram:74792 Bytes
[21:24:57][D][esp-idf:000]: I (249688) AUDIO_ELEMENT: [i2s] AEL_MSG_CMD_RESUME,state:1
[21:24:57][D][esp-idf:000]: I (249691) I2S_STREAM: AUDIO_STREAM_WRITER
[21:24:58][W][component:237]: Component voice_assistant took a long time for an operation (244 ms).
[21:24:58][W][component:238]: Components should block for at most 30 ms.
[21:24:58][D][voice_assistant:563]: Event Type: 8
[21:24:58][D][voice_assistant:639]: Response URL: "http://192.168.50.135:8123/api/tts_proxy/104c89b5f9053e4751d03002aab527c96124bd77_en-us_4d30e09a66_tts.piper.wav"
[21:24:58][D][voice_assistant:439]: State changed from AWAITING_RESPONSE to STREAMING_RESPONSE
[21:24:58][D][voice_assistant:445]: Desired state set to STREAMING_RESPONSE
[21:24:58][D][voice_assistant:563]: Event Type: 2
[21:24:58][D][voice_assistant:653]: Assist Pipeline ended
[21:24:59][D][voice_assistant:563]: Event Type: 99
[21:24:59][D][voice_assistant:712]: TTS stream end
[21:24:59][D][voice_assistant:310]: End of audio stream received
[21:24:59][D][voice_assistant:439]: State changed from STREAMING_RESPONSE to RESPONSE_FINISHED
[21:24:59][D][voice_assistant:445]: Desired state set to RESPONSE_FINISHED
[21:25:00][D][esp-idf:000]: W (252732) AUDIO_PIPELINE: There are no listener registered
[21:25:00][D][esp-idf:000]: I (252738) AUDIO_PIPELINE: audio_pipeline_unlinked
[21:25:00][D][esp-idf:000]: W (252743) AUDIO_ELEMENT: [i2s] Element has not create when AUDIO_ELEMENT_TERMINATE
[21:25:00][D][esp-idf:000]: I (252750) I2S: DMA queue destroyed
[21:25:00][D][esp-idf:000]: W (252757) AUDIO_ELEMENT: [filter] Element has not create when AUDIO_ELEMENT_TERMINATE
[21:25:00][D][esp-idf:000]: W (252766) AUDIO_ELEMENT: [raw] Element has not create when AUDIO_ELEMENT_TERMINATE
[21:25:00][D][voice_assistant:342]: Speaker has finished outputting all audio
[21:25:01][D][voice_assistant:439]: State changed from RESPONSE_FINISHED to IDLE
[21:25:01][D][voice_assistant:445]: Desired state set to IDLE
[21:25:01][W][component:237]: Component voice_assistant took a long time for an operation (222 ms).
[21:25:01][W][component:238]: Components should block for at most 30 ms.
[21:25:01][D][micro_wake_word:177]: State changed from IDLE to START_MICROPHONE
[21:25:01][D][micro_wake_word:115]: Starting Microphone
[21:25:01][D][micro_wake_word:177]: State changed from START_MICROPHONE to STARTING_MICROPHONE
[21:25:01][D][esp-idf:000]: I (253011) I2S: DMA Malloc info, datalen=blocksize=512, dma_buf_count=8
[21:25:01][D][esp-idf:000]: I (253019) I2S: I2S0, MCLK output by GPIO2
[21:25:01][D][esp-idf:000]: I (253026) AUDIO_PIPELINE: link el->rb, el:0x3d05c4ac, tag:i2s, rb:0x3d05c8c0
[21:25:01][D][esp-idf:000]: I (253034) AUDIO_PIPELINE: link el->rb, el:0x3d05c620, tag:filter, rb:0x3d05e900
[21:25:01][D][esp-idf:000]: I (253044) AUDIO_ELEMENT: [i2s-0x3d05c4ac] Element task created
[21:25:01][D][esp-idf:000]: I (253051) AUDIO_THREAD: The filter task allocate stack on external memory
[21:25:01][D][esp-idf:000]: I (253058) AUDIO_ELEMENT: [filter-0x3d05c620] Element task created
[21:25:01][D][esp-idf:000]: I (253064) AUDIO_ELEMENT: [raw-0x3d05c750] Element task created
[21:25:01][D][esp-idf:000]: I (253070) AUDIO_PIPELINE: Func:audio_pipeline_run, Line:359, MEM Total:16469395 Bytes, Inter:87748 Bytes, Dram:87748 Bytes
[21:25:01][D][esp-idf:000]: I (253074) AUDIO_ELEMENT: [i2s] AEL_MSG_CMD_RESUME,state:1
[21:25:01][D][esp-idf:000]: I (253079) AUDIO_ELEMENT: [filter] AEL_MSG_CMD_RESUME,state:1
[21:25:01][D][esp-idf:000]: I (253085) RSP_FILTER: sample rate of source data : 16000, channel of source data : 2, sample rate of destination data : 16000, channel of destination data : 1
[21:25:01][D][esp-idf:000]: I (253094) AUDIO_PIPELINE: Pipeline started
[21:25:01][D][esp_adf.microphone:273]: Microphone started
[21:25:01][D][micro_wake_word:177]: State changed from STARTING_MICROPHONE to DETECTING_WAKE_WORD