Here’s a fun one.
So, I just finished setting up the ReSpeaker thanks to formatBCE’s GitHub and it’s fantastic. Wake word detection is fantastic (can’t wait to train my own) and everything is running smoothly. Except…
For some reason I don’t get voice responses. Everything else works without issue, but voice does not. I get the little ‘Ping’ sound on activation, so it’s not a speaker issue, and get ready for it… manual TTS via the media player works without a problem.
I saw someone earlier who had local access issues over HTTP, but this isn’t that: I can access it locally over HTTP without any issues. From what I can tell it’s a Home Assistant issue, except I can’t find any mention of it anywhere else, which makes me think it’s a me issue. My HassOS is up to date, ESPHome is the newest version, and XMOS is on 1.0.9.
The media player entity flashes on and then off again like it wants to play something, but it just fails and stays silent. Trying to access the two URLs from the logs below in the browser results in basically the same thing: the esphome/ffmpeg_proxy endpoint returns a file, but the tts_proxy endpoint returns a 404.
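For anyone who wants to repeat the browser check from a script, here is a rough sketch of it. The two URLs are copied from the logs below; the `probe` and `report` helpers are just names I made up for illustration, and this only shows the HTTP status and content type, nothing about why the 404 happens.

```python
from urllib.error import HTTPError
from urllib.request import urlopen

# URLs copied from the device logs in this post.
URLS = [
    "http://192.168.10.80:8123/api/tts_proxy/"
    "dd7f891e90e2eb75ceeda2bd2ab32502b9a12d04_en-gb_4433720218_tts.piper.flac",
    "http://192.168.10.80:8123/api/esphome/ffmpeg_proxy/"
    "34a8f3fb82101368b7bf54d34cc148c9/pjUR7xSahglK-KOErP7G9g.flac",
]


def probe(url, opener=urlopen, timeout=5):
    """Return (status, content_type) for a GET; status is None if unreachable."""
    try:
        with opener(url, timeout=timeout) as resp:
            return resp.status, resp.headers.get("Content-Type")
    except HTTPError as err:
        # 4xx/5xx responses are raised as exceptions but still carry a status.
        return err.code, err.headers.get("Content-Type")
    except OSError:
        # Covers URLError, refused connections, and timeouts.
        return None, None


def report(urls, opener=urlopen):
    """Build one human-readable line per URL."""
    lines = []
    for url in urls:
        status, content_type = probe(url, opener=opener)
        lines.append(f"{status} {content_type} {url}")
    return lines
```

Running `print("\n".join(report(URLS)))` from a machine on the same network should show the same split I see in the browser, assuming both URLs are still valid at that point.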
I’m including the logs because I’ve been messing with this for the last 2 hours and am at a loss. I don’t know enough about mmw or XMOS to know whether it’s a satellite software issue, and before I sink more hours into debugging my hass setup I figured I’d post this here and see if something jumps out at someone.
Full log from wake word detection:
--------------------------- Wake Word ---------------------------
[23:49:39][D][esp-idf:000]: I (138) gpi[D][micro_wake_word:357]: Detected 'Okay Nabu' with sliding average probability is 0.98 and max probability is 1.00
[23:49:39][D][media_player:080]: 'Media Player' - Setting
[23:49:39][D][media_player:084]: Command: STOP
[23:49:39][D][media_player:093]: Announcement: yes
[23:49:39][D][media_player:080]: 'Media Player' - Setting
[23:49:39][D][media_player:093]: Announcement: yes
[23:49:39][D][ring_buffer:034]: Created ring buffer with size 48000
[23:49:39][D][ring_buffer:034]: Created ring buffer with size 48000
[23:49:39][D][ring_buffer:034]: Created ring buffer with size 65536
[23:49:39][D][ring_buffer:034]: Created ring buffer with size 65536
[23:49:39][D][nabu_media_player.pipeline:174]: Reading FLAC file type
[23:49:39][D][nabu_media_player.pipeline:186]: Decoded audio has 1 channels, 48000 Hz sample rate, and 16 bits per sample
[23:49:39][D][nabu_media_player.pipeline:208]: Converting the audio sample rate
[23:49:39][D][nabu_media_player.pipeline:211]: Converting mono channel audio to stereo channel audio
[23:49:39][D][ring_buffer:034][speaker_task]: Created ring buffer with size 16384
[23:49:39][D][esp-idf:000][speaker_task]: I (34985) I2S: DMA Malloc info, datalen=blocksize=2048, dma_buf_count=4
[23:49:39]
[23:49:39][D][i2s_audio.speaker:118]: Starting Speaker
[23:49:39][D][i2s_audio.speaker:123]: Started Speaker
[23:49:40][D][voice_assistant:516]: State changed from IDLE to START_MICROPHONE
[23:49:40][D][voice_assistant:522]: Desired state set to START_PIPELINE
[23:49:40][D][voice_assistant:225]: Starting Microphone
[23:49:40][D][ring_buffer:034]: Created ring buffer with size 16384
[23:49:40][D][voice_assistant:516]: State changed from START_MICROPHONE to STARTING_MICROPHONE
[23:49:40][D][voice_assistant:516]: State changed from STARTING_MICROPHONE to START_PIPELINE
[23:49:40][D][voice_assistant:280]: Requesting start...
[23:49:40][D][voice_assistant:516]: State changed from START_PIPELINE to STARTING_PIPELINE
[23:49:40][D][voice_assistant:537]: Client started, streaming microphone
[23:49:40][D][voice_assistant:516]: State changed from STARTING_PIPELINE to STREAMING_MICROPHONE
[23:49:40][D][voice_assistant:522]: Desired state set to STREAMING_MICROPHONE
[23:49:40][D][voice_assistant:639]: Event Type: 1
[23:49:40][D][voice_assistant:642]: Assist Pipeline running
[23:49:40][D][voice_assistant:639]: Event Type: 3
[23:49:40][D][voice_assistant:653]: STT started
[23:49:40][D][light:036]: 'resre-speak-jenna' Setting:
[23:49:40][D][light:047]: State: ON
[23:49:40][D][light:051]: Brightness: 60%
[23:49:40][D][light:059]: Red: 100%, Green: 20%, Blue: 100%
[23:49:40][D][light:109]: Effect: 'Slow Pulse'
[23:49:41][D][voice_assistant:639]: Event Type: 11
[23:49:41][D][voice_assistant:802]: Starting STT by VAD
[23:49:41][D][esp-idf:000][speaker_task]: I (36356) I2S: DMA queue destroyed
[23:49:41]
[23:49:41][D][i2s_audio.speaker:130]: Stopping Speaker
[23:49:41][D][i2s_audio.speaker:136]: Stopped Speaker
[23:49:43][D][voice_assistant:639]: Event Type: 12
[23:49:43][D][voice_assistant:806]: STT by VAD end
[23:49:43][D][voice_assistant:516]: State changed from STREAMING_MICROPHONE to STOP_MICROPHONE
[23:49:43][D][voice_assistant:522]: Desired state set to AWAITING_RESPONSE
[23:49:43][D][voice_assistant:516]: State changed from STOP_MICROPHONE to STOPPING_MICROPHONE
[23:49:43][D][light:036]: 'resre-speak-jenna' Setting:
[23:49:43][D][light:051]: Brightness: 60%
[23:49:43][D][light:059]: Red: 100%, Green: 20%, Blue: 100%
[23:49:43][D][light:109]: Effect: 'Fast Pulse'
[23:49:43][D][voice_assistant:516]: State changed from STOPPING_MICROPHONE to AWAITING_RESPONSE
[23:49:43][D][voice_assistant:516]: State changed from AWAITING_RESPONSE to AWAITING_RESPONSE
[23:49:47][D][esp32.preferences:114]: Saving 1 preferences to flash...
[23:49:47][D][esp32.preferences:143]: Saving 1 preferences to flash: 1 cached, 0 written, 0 failed
[23:49:47][D][voice_assistant:639]: Event Type: 4
[23:49:47][D][voice_assistant:667]: Speech recognised as: " Turn on the bedroom light."
[23:49:47][D][voice_assistant:639]: Event Type: 5
[23:49:47][D][voice_assistant:672]: Intent started
[23:49:49][D][voice_assistant:639]: Event Type: 6
[23:49:49][D][voice_assistant:639]: Event Type: 7
[23:49:49][D][voice_assistant:695]: Response: "The bedroom light has been turned on."
[23:49:49][D][light:036]: 'resre-speak-jenna' Setting:
[23:49:49][D][light:051]: Brightness: 60%
[23:49:49][D][light:059]: Red: 20%, Green: 100%, Blue: 100%
[23:49:49][D][light:109]: Effect: 'Slow Pulse'
[23:49:49][D][voice_assistant:639]: Event Type: 8
[23:49:49][D][voice_assistant:717]: Response URL: "http://192.168.10.80:8123/api/tts_proxy/dd7f891e90e2eb75ceeda2bd2ab32502b9a12d04_en-gb_4433720218_tts.piper.flac"
[23:49:49][D][voice_assistant:516]: State changed from AWAITING_RESPONSE to STREAMING_RESPONSE
[23:49:49][D][voice_assistant:522]: Desired state set to STREAMING_RESPONSE
[23:49:49][D][media_player:080]: 'Media Player' - Setting
[23:49:49][D][media_player:087]: Media URL: http://192.168.10.80:8123/api/tts_proxy/dd7f891e90e2eb75ceeda2bd2ab32502b9a12d04_en-gb_4433720218_tts.piper.flac
[23:49:49][D][media_player:093]: Announcement: yes
[23:49:49][D][voice_assistant:639]: Event Type: 2
[23:49:49][D][voice_assistant:731]: Assist Pipeline ended
[23:49:49][D][nabu_media_player.pipeline:174]: Reading FLAC file type
[23:49:50][D][voice_assistant:516]: State changed from STREAMING_RESPONSE to IDLE
[23:49:50][D][voice_assistant:522]: Desired state set to IDLE
[23:49:50][D][light:036]: 'resre-speak-jenna' Setting:
[23:49:50][D][light:047]: State: OFF
[23:49:50][D][light:109]: Effect: 'None'
Log for media player:
--------------------------- Media Player ---------------------------
[23:50:53][D][media_player:080]: 'Media Player' - Setting
[23:50:53][D][media_player:087]: Media URL: http://192.168.10.80:8123/api/esphome/ffmpeg_proxy/34a8f3fb82101368b7bf54d34cc148c9/pjUR7xSahglK-KOErP7G9g.flac
[23:50:53][D][media_player:093]: Announcement: yes
[23:50:53][D][nabu_media_player.pipeline:174]: Reading FLAC file type
[23:50:53][D][nabu_media_player.pipeline:186]: Decoded audio has 1 channels, 16000 Hz sample rate, and 16 bits per sample
[23:50:53][D][nabu_media_player.pipeline:211]: Converting mono channel audio to stereo channel audio
[23:50:53][D][ring_buffer:034][speaker_task]: Created ring buffer with size 16384
[23:50:53][D][esp-idf:000][speaker_task]: I (108476) I2S: DMA Malloc info, datalen=blocksize=2048, dma_buf_count=4
[23:50:53]
[23:50:53][D][i2s_audio.speaker:118]: Starting Speaker
[23:50:53][D][i2s_audio.speaker:123]: Started Speaker
[23:50:58][D][esp-idf:000][speaker_task]: I (113862) I2S: DMA queue destroyed
[23:50:58]
[23:50:58][D][i2s_audio.speaker:130]: Stopping Speaker
[23:50:58][D][i2s_audio.speaker:136]: Stopped Speaker
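One difference between the two logs: for the ‘Ping’ and the manual TTS, "Reading FLAC file type" is followed by a "Decoded audio has ..." line, but for the voice response it is not. Below is a small sketch that scans a pasted log for that pattern. The `undecoded_reads` helper and its regexes are my own illustration based only on the line formats shown above, so treat it as a way to spot the symptom, not a diagnosis.

```python
import re

# Patterns based on the log lines shown above (assumed stable across runs).
READ = re.compile(
    r"^\[(\d\d:\d\d:\d\d)\].*nabu_media_player\.pipeline.*Reading (\w+) file type"
)
DECODED = re.compile(r"nabu_media_player\.pipeline.*Decoded audio has")


def undecoded_reads(log_text):
    """Return timestamps of 'Reading ... file type' lines with no decode after them."""
    pending = None  # timestamp of a read that has not been decoded yet
    failed = []
    for line in log_text.splitlines():
        match = READ.search(line)
        if match:
            if pending is not None:
                # A new read started before the previous one was decoded.
                failed.append(pending)
            pending = match.group(1)
        elif DECODED.search(line):
            pending = None
    if pending is not None:
        failed.append(pending)
    return failed
```

Pasting both logs from this post into one string and passing it to `undecoded_reads` flags only the read at 23:49:49, which is the tts_proxy URL.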