"ReSpeaker Lite" - new Seeed Studio Voice Assistant Development Kit hardware combine ESP32 with XMOS XU316 DSP chip for advanced audio processing as a ESPHome-based Home Assistant Assist Satellite voice devkit

Which Discord server is that? If I click on the link, I only get a greyed-out Discord server page.

It’s the ESPHome server: ESPHome

I was making almost exactly the same enclosure, but wasn’t satisfied with the result (back chamber too small for the speaker, hard to print with all those internal supports, pretty hard to assemble…), so I made the one I posted before. For convenience, it’s all on threaded inserts. :slight_smile:

Many thanks. I forgot to look at that server… :slight_smile:

Here’s a fun one.
So, I just finished setting up the ReSpeaker thanks to formatBCE’s GitHub repo, and it’s fantastic. Wake word detection is fantastic (can’t wait to train my own) and everything is running smoothly. Except…

For some reason I don’t get voice responses. Everything else works without issue, but voice does not. I get the little ‘Ping’ sound for activation, so it’s not a speaker issue; and get ready for it… manual TTS via the media player works without a problem.

I saw someone earlier who had local access issues via HTTP, but this isn’t that: I can access local over HTTP without any issues. From what I can tell it’s a Home Assistant issue, except I can’t find any mention of it anywhere else, which makes me think it’s a me issue. My HASSOS is up to date, ESPHome is on the newest version, and XMOS is on 1.0.9.

The media player entity flashes on and then off again like it wants to play something, but it just fails and stays silent. Trying to access those two URLs from the browser results in basically the same thing: the esphome/ffmpeg_proxy endpoint returns a file, but the tts_proxy one returns 404.

I’m including the logs because I’ve been messing with this for the last 2 hours and am at a loss. I don’t know enough about mmw or XMOS to know whether it’s a satellite software issue, and before I sink more hours into debugging my Home Assistant setup I figured I’d post this here and see if something jumps out at someone.

Full log from wake word detection:

--------------------------- Wake Word ---------------------------
[23:49:39][D][esp-idf:000]: I (138) gpi[D][micro_wake_word:357]: Detected 'Okay Nabu' with sliding average probability is 0.98 and max probability is 1.00
[23:49:39][D][media_player:080]: 'Media Player' - Setting
[23:49:39][D][media_player:084]:   Command: STOP
[23:49:39][D][media_player:093]:  Announcement: yes
[23:49:39][D][media_player:080]: 'Media Player' - Setting
[23:49:39][D][media_player:093]:  Announcement: yes
[23:49:39][D][ring_buffer:034]: Created ring buffer with size 48000
[23:49:39][D][ring_buffer:034]: Created ring buffer with size 48000
[23:49:39][D][ring_buffer:034]: Created ring buffer with size 65536
[23:49:39][D][ring_buffer:034]: Created ring buffer with size 65536
[23:49:39][D][nabu_media_player.pipeline:174]: Reading FLAC file type
[23:49:39][D][nabu_media_player.pipeline:186]: Decoded audio has 1 channels, 48000 Hz sample rate, and 16 bits per sample
[23:49:39][D][nabu_media_player.pipeline:208]: Converting the audio sample rate
[23:49:39][D][nabu_media_player.pipeline:211]: Converting mono channel audio to stereo channel audio
[23:49:39][D][ring_buffer:034][speaker_task]: Created ring buffer with size 16384
[23:49:39][D][esp-idf:000][speaker_task]: I (34985) I2S: DMA Malloc info, datalen=blocksize=2048, dma_buf_count=4
[23:49:39]
[23:49:39][D][i2s_audio.speaker:118]: Starting Speaker
[23:49:39][D][i2s_audio.speaker:123]: Started Speaker
[23:49:40][D][voice_assistant:516]: State changed from IDLE to START_MICROPHONE
[23:49:40][D][voice_assistant:522]: Desired state set to START_PIPELINE
[23:49:40][D][voice_assistant:225]: Starting Microphone
[23:49:40][D][ring_buffer:034]: Created ring buffer with size 16384
[23:49:40][D][voice_assistant:516]: State changed from START_MICROPHONE to STARTING_MICROPHONE
[23:49:40][D][voice_assistant:516]: State changed from STARTING_MICROPHONE to START_PIPELINE
[23:49:40][D][voice_assistant:280]: Requesting start...
[23:49:40][D][voice_assistant:516]: State changed from START_PIPELINE to STARTING_PIPELINE
[23:49:40][D][voice_assistant:537]: Client started, streaming microphone
[23:49:40][D][voice_assistant:516]: State changed from STARTING_PIPELINE to STREAMING_MICROPHONE
[23:49:40][D][voice_assistant:522]: Desired state set to STREAMING_MICROPHONE
[23:49:40][D][voice_assistant:639]: Event Type: 1
[23:49:40][D][voice_assistant:642]: Assist Pipeline running
[23:49:40][D][voice_assistant:639]: Event Type: 3
[23:49:40][D][voice_assistant:653]: STT started
[23:49:40][D][light:036]: 'resre-speak-jenna' Setting:
[23:49:40][D][light:047]:   State: ON
[23:49:40][D][light:051]:   Brightness: 60%
[23:49:40][D][light:059]:   Red: 100%, Green: 20%, Blue: 100%
[23:49:40][D][light:109]:   Effect: 'Slow Pulse'
[23:49:41][D][voice_assistant:639]: Event Type: 11
[23:49:41][D][voice_assistant:802]: Starting STT by VAD
[23:49:41][D][esp-idf:000][speaker_task]: I (36356) I2S: DMA queue destroyed
[23:49:41]
[23:49:41][D][i2s_audio.speaker:130]: Stopping Speaker
[23:49:41][D][i2s_audio.speaker:136]: Stopped Speaker
[23:49:43][D][voice_assistant:639]: Event Type: 12
[23:49:43][D][voice_assistant:806]: STT by VAD end
[23:49:43][D][voice_assistant:516]: State changed from STREAMING_MICROPHONE to STOP_MICROPHONE
[23:49:43][D][voice_assistant:522]: Desired state set to AWAITING_RESPONSE
[23:49:43][D][voice_assistant:516]: State changed from STOP_MICROPHONE to STOPPING_MICROPHONE
[23:49:43][D][light:036]: 'resre-speak-jenna' Setting:
[23:49:43][D][light:051]:   Brightness: 60%
[23:49:43][D][light:059]:   Red: 100%, Green: 20%, Blue: 100%
[23:49:43][D][light:109]:   Effect: 'Fast Pulse'
[23:49:43][D][voice_assistant:516]: State changed from STOPPING_MICROPHONE to AWAITING_RESPONSE
[23:49:43][D][voice_assistant:516]: State changed from AWAITING_RESPONSE to AWAITING_RESPONSE
[23:49:47][D][esp32.preferences:114]: Saving 1 preferences to flash...
[23:49:47][D][esp32.preferences:143]: Saving 1 preferences to flash: 1 cached, 0 written, 0 failed
[23:49:47][D][voice_assistant:639]: Event Type: 4
[23:49:47][D][voice_assistant:667]: Speech recognised as: " Turn on the bedroom light."
[23:49:47][D][voice_assistant:639]: Event Type: 5
[23:49:47][D][voice_assistant:672]: Intent started
[23:49:49][D][voice_assistant:639]: Event Type: 6
[23:49:49][D][voice_assistant:639]: Event Type: 7
[23:49:49][D][voice_assistant:695]: Response: "The bedroom light has been turned on."
[23:49:49][D][light:036]: 'resre-speak-jenna' Setting:
[23:49:49][D][light:051]:   Brightness: 60%
[23:49:49][D][light:059]:   Red: 20%, Green: 100%, Blue: 100%
[23:49:49][D][light:109]:   Effect: 'Slow Pulse'
[23:49:49][D][voice_assistant:639]: Event Type: 8
[23:49:49][D][voice_assistant:717]: Response URL: "http://192.168.10.80:8123/api/tts_proxy/dd7f891e90e2eb75ceeda2bd2ab32502b9a12d04_en-gb_4433720218_tts.piper.flac"
[23:49:49][D][voice_assistant:516]: State changed from AWAITING_RESPONSE to STREAMING_RESPONSE
[23:49:49][D][voice_assistant:522]: Desired state set to STREAMING_RESPONSE
[23:49:49][D][media_player:080]: 'Media Player' - Setting
[23:49:49][D][media_player:087]:   Media URL: http://192.168.10.80:8123/api/tts_proxy/dd7f891e90e2eb75ceeda2bd2ab32502b9a12d04_en-gb_4433720218_tts.piper.flac
[23:49:49][D][media_player:093]:  Announcement: yes
[23:49:49][D][voice_assistant:639]: Event Type: 2
[23:49:49][D][voice_assistant:731]: Assist Pipeline ended
[23:49:49][D][nabu_media_player.pipeline:174]: Reading FLAC file type
[23:49:50][D][voice_assistant:516]: State changed from STREAMING_RESPONSE to IDLE
[23:49:50][D][voice_assistant:522]: Desired state set to IDLE
[23:49:50][D][light:036]: 'resre-speak-jenna' Setting:
[23:49:50][D][light:047]:   State: OFF
[23:49:50][D][light:109]:   Effect: 'None'

Log for media player:

--------------------------- Media Player ---------------------------
[23:50:53][D][media_player:080]: 'Media Player' - Setting
[23:50:53][D][media_player:087]:   Media URL: http://192.168.10.80:8123/api/esphome/ffmpeg_proxy/34a8f3fb82101368b7bf54d34cc148c9/pjUR7xSahglK-KOErP7G9g.flac
[23:50:53][D][media_player:093]:  Announcement: yes
[23:50:53][D][nabu_media_player.pipeline:174]: Reading FLAC file type
[23:50:53][D][nabu_media_player.pipeline:186]: Decoded audio has 1 channels, 16000 Hz sample rate, and 16 bits per sample
[23:50:53][D][nabu_media_player.pipeline:211]: Converting mono channel audio to stereo channel audio
[23:50:53][D][ring_buffer:034][speaker_task]: Created ring buffer with size 16384
[23:50:53][D][esp-idf:000][speaker_task]: I (108476) I2S: DMA Malloc info, datalen=blocksize=2048, dma_buf_count=4
[23:50:53]
[23:50:53][D][i2s_audio.speaker:118]: Starting Speaker
[23:50:53][D][i2s_audio.speaker:123]: Started Speaker
[23:50:58][D][esp-idf:000][speaker_task]: I (113862) I2S: DMA queue destroyed
[23:50:58]
[23:50:58][D][i2s_audio.speaker:130]: Stopping Speaker
[23:50:58][D][i2s_audio.speaker:136]: Stopped Speaker

My logs look identical. And I see your speaker is in the started state for 5 seconds, so it is actually responding…
Could it be that the TTS result was generated incorrectly? Can you copy that .flac file URL from the logs right after an interaction, paste it into a browser, and try to play it?
[EDIT]
Oh sorry, I dove into the logs and missed that you already tried it. Well, it looks like a pipeline problem. Try changing your TTS provider to check if that helps. It might be that “manual TTS” is using a different voice?
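
For anyone chasing the same symptom, the check above (pull the audio URL out of the device log, then request it) can be scripted instead of done by hand in a browser. This is only a minimal Python sketch: `extract_urls` and `http_status` are made-up helper names, and the regular expression assumes the `Response URL:` / `Media URL:` log format shown earlier in this thread.

```python
import re
import urllib.error
import urllib.request

# Matches the URL in log lines such as:
#   [..][D][voice_assistant:717]: Response URL: "http://host:8123/api/tts_proxy/x.flac"
#   [..][D][media_player:087]:   Media URL: http://host:8123/api/.../x.flac
URL_LINE = re.compile(r'(?:Response|Media) URL:\s*"?(https?://[^\s"]+)"?')


def extract_urls(log_text):
    """Return the distinct audio URLs found in an ESPHome log, in log order."""
    seen = []
    for match in URL_LINE.finditer(log_text):
        url = match.group(1)
        if url not in seen:
            seen.append(url)
    return seen


def http_status(url, timeout=5.0):
    """Request the URL and return the HTTP status code (for example 200 or 404)."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        # 4xx/5xx answers are raised as exceptions; the code is what matters here.
        return err.code
```

Feeding the saved log text to `extract_urls` and calling `http_status` on each result should show at a glance whether the TTS file is really missing on the Home Assistant side, which is what a 404 would suggest, rather than failing on the satellite.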

It was a pipeline issue, although I’m not sure what exactly. I tried the Cloud TTS and it worked, so for some reason Piper wasn’t working. I tried deleting and setting up a new voice assistant, restoring from a backup, and trying different Piper voices. I ended up re-installing Piper and that seems to have fixed it now :person_shrugging:

As a side note, I saw in your repo that more LEDs are on your to-do list. I’m also messing around with that, so I ordered a small 8-LED WS2812 light strip that I’m going to try to add to the onboard WS2812 LED. I’ll post an update on how it goes once I get the LEDs.

Yeah, I guess using them on the same GPIO as the built-in LED would work. For myself, I decided to pause on that for now: it’s too much customization, and doesn’t fit a simple integration. :slight_smile:

I also decided to pause on multiple buttons. While I did get it working (added 3 buttons with different resistors in parallel, and used GPIO3 for analog reading), it brought a lot of complication, and it’s definitely not something the majority of people would appreciate. It’s easier to make a voice command for volume up/down than to solder 5 resistors and 3 buttons and extend the YAML with a custom-tuned section for a resistance sensor…
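
To illustrate why the analog-button approach needs hand tuning, here is a rough Python sketch of one common way to read several buttons on a single analog pin. It is not the exact circuit described above: the wiring (one pull-up to the supply, each button pulling the pin to ground through its own resistor), the 3.3 V supply, and every resistor value are assumptions picked for the example.

```python
VCC = 3.3            # supply voltage in volts (assumed)
R_PULLUP = 10_000    # pull-up from the analog pin to VCC, in ohms (hypothetical)

# One resistor per button, from the analog pin to ground (hypothetical values).
BUTTON_RESISTORS = {
    "volume_up": 1_000,
    "volume_down": 4_700,
    "mute": 15_000,
}


def expected_voltage(r_button, r_pullup=R_PULLUP, vcc=VCC):
    """Voltage at the analog pin while a single button is held down."""
    return vcc * r_button / (r_pullup + r_button)


def decode_button(voltage, tolerance=0.15):
    """Map a measured voltage to a button name, or None if nothing is close."""
    best_name, best_error = None, tolerance
    for name, r_button in BUTTON_RESISTORS.items():
        error = abs(voltage - expected_voltage(r_button))
        if error <= best_error:
            best_name, best_error = name, error
    return best_name
```

With these made-up values the three buttons sit at roughly 0.30 V, 1.06 V and 1.98 V, and an idle pin reads close to the supply voltage, so nothing matches. Real resistor tolerances and ADC noise shift those numbers, which is where the per-device tuning comes from.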

So I just keep it as simple as possible now. Speaker and enclosure, that’s it. So far very positive feedback from family. :slight_smile:

One thing I will say is that I do not love the onboard lighting (which is why my V1 at least tried to make use of the HAOS logo and do something creative). Someday down the road, if you get around to publishing the YAML for external LEDs with GPIO mapping, I’d love to make use of something that lets us, say, ring an LED tube around a circular speaker cover.

Will finally get to side-by-side sound quality on the new design tonight. Hope the better speaker and higher enclosure volume help!

But you already can. Just use D0 (GPIO1) on the S3 and connect the data line of your LED ring there.

It will work in parallel with the onboard LED.

First of all, a big shout out to @formatBCE for his work on respeaker-lite! I’m using his instructions and YAML and really loving the result!

In a previous life I worked on some 3rd party voice assistants (you know, around 2018 when Alexa was being designed into everything going…!). So I know the importance of sealing the mics, reducing vibrations, not acoustically overloading the mics, and AEC. With all of this in mind, is there a way to listen to the audio that the wake word “hears”? I want to know if the AEC is working well, whether I need more mechanical damping, whether the mics should be further from the speaker, etc.

I’d really appreciate any help, thanks!

Thanks for the kind words! :slight_smile:

Yes, AFAIK you can use UDP to send mic audio:
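
On the receiving side, a few lines of Python are enough to capture such a stream and save it as a WAV file for listening. This is only a sketch under assumptions that are not confirmed anywhere in this thread: that the device sends raw 16-bit little-endian mono PCM at 16 kHz, one chunk per UDP packet, to a port you choose. `write_wav` and `receive_chunks` are hypothetical helper names.

```python
import socket
import time
import wave

SAMPLE_RATE = 16_000   # Hz (assumed)
SAMPLE_WIDTH = 2       # bytes per sample, i.e. 16-bit PCM (assumed)
CHANNELS = 1           # mono (assumed)


def write_wav(target, chunks):
    """Write raw PCM byte chunks to a WAV file (path or file object).

    Returns the number of audio frames written.
    """
    total_bytes = 0
    with wave.open(target, "wb") as wav:
        wav.setnchannels(CHANNELS)
        wav.setsampwidth(SAMPLE_WIDTH)
        wav.setframerate(SAMPLE_RATE)
        for chunk in chunks:
            wav.writeframes(chunk)
            total_bytes += len(chunk)
    return total_bytes // (SAMPLE_WIDTH * CHANNELS)


def receive_chunks(port, seconds, host="0.0.0.0"):
    """Yield UDP payloads arriving on the given port for roughly `seconds` seconds."""
    deadline = time.monotonic() + seconds
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.bind((host, port))
        sock.settimeout(0.5)  # wake up regularly so the deadline is honoured
        while time.monotonic() < deadline:
            try:
                data, _sender = sock.recvfrom(4096)
            except socket.timeout:
                continue
            yield data
```

Something like `write_wav("mic.wav", receive_chunks(port=12345, seconds=10))` would then record ten seconds (the port number is arbitrary). If the result plays back too fast, too slow, or as noise, the assumed sample rate, width, or channel count is probably wrong for your setup.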

BTW, your insights on how to build a good enclosure and reduce mic audio distortion could make a night-and-day difference! If you find useful info, please share :slight_smile:

I have what will likely amount to another stupid question, so apologies in advance if my search failed to turn up an answer when there was one.

Is it possible to have a group of ReSpeaker Lite satellites listen and receive commands, but only have one respond?

It’s done by default. If several satellites report the same wake word to HA simultaneously, only the first one will be connected to the pipeline. The others will turn off the voice assistant.

Sorry, not what I meant. I wanted a single satellite to be the master, in a sense. Like have a nice centralized amped speaker with a ReSpeaker connected, then have several others around listening, but only the central one replies to commands received by the others.

Well, you can redirect TTS responses to another media player (the media player is part of the satellite), so you can do that for every satellite and route TTS to your main instance.
Otherwise I don’t know how to do something like this (or why). :slight_smile:
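
To make the idea concrete from the Home Assistant side: playing a TTS response URL on a central player comes down to a `media_player.play_media` service call. The Python sketch below only builds such a call against the Home Assistant REST API and does not send it. It is not how the satellite firmware does the routing, and the host, token, and entity ID in the usage note are placeholders.

```python
import json
import urllib.request


def build_play_media_request(base_url, token, player_entity_id, media_url):
    """Build (but do not send) a media_player.play_media REST call."""
    payload = {
        "entity_id": player_entity_id,     # the central speaker (placeholder)
        "media_content_id": media_url,     # e.g. the Response URL from the log
        "media_content_type": "music",
    }
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/api/services/media_player/play_media",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",   # long-lived access token
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

Passing the result to `urllib.request.urlopen`, with something like `media_player.central_speaker` as the entity ID, would start playback on that player. Doing the equivalent from each satellite’s own config is what the suggestion above amounts to.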

Hey @AndyCap, did you by chance have a look into the 48kHz stuff? I saw that the speaker works, but the mic definitely doesn’t without a resampler…

Do you have a link to the STL for this enclosure?

I guess I will post it. I thought about revamping it, but it’s good enough.

However, this enclosure requires fabric to be glued on. Probably I’ll try to make some basic face plate instead.
