"ReSpeaker Lite" - new Seeed Studio Voice Assistant Development Kit hardware combines an ESP32 with an XMOS XU316 DSP chip for advanced audio processing, as an ESPHome-based Home Assistant Assist Satellite voice devkit

I’m focused on music. The voice is perfect with the current iteration. And yeah, I know music is limited at 16 kHz, but I also know it can be better (it sounds much better on a Bose Companion through the 3.5 mm jack, for example), so I’m just trying to squeeze out enough of an improvement to shed the Echo solution for good.

If you get around to posting your stl files for the enclosure you made I’d love to print one and try it out.

You are going to be limited by the power output of the dev board and any associated noise it picks up. There is only so much a digital-to-analog stage can do over JST when powered via USB. Even though it’s 16 kHz over the 3.5 mm jack, you’re still sending it to something that is probably doing some extra audio processing, even though it’s already an analogue signal.
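For reference, the 16 kHz figure puts a hard ceiling on what any speaker or amp downstream can reproduce. A minimal sketch of the arithmetic, assuming the 16 kHz and 48 kHz figures in this thread are sample rates (the function name is made up for illustration):

```python
def nyquist_limit_hz(sample_rate_hz: float) -> float:
    """Highest frequency a given sample rate can represent (Nyquist limit)."""
    return sample_rate_hz / 2


# Compare the stock 16 kHz path with the 48 kHz firmware mentioned in the thread.
for rate in (16_000, 48_000):
    print(f"{rate} Hz sample rate -> content up to {nyquist_limit_hz(rate):.0f} Hz")
```

So at 16 kHz nothing above 8 kHz survives, regardless of the speaker, while 48 kHz covers the whole audible range.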

When running 3.5mm to my AMP only (DAC is separate) using a 3.5mm to L/R RCA adapter cable music sounds very good. Not as good as going through my DAC using say, a media player over USB/optical but it’s way better than any JST speaker, at least any speaker I have tried that the Dev board can actually drive.

Ok interesting. What sort of noise was it? A general constant noise floor or changing depending on what the re-speaker was doing?

Here is what I get: noise floor first, then broadband noise, then ‘silence’ with audio corruption and the answer: Dropbox

I can probably help you out with that if you direct me towards the code that needs changing. Is it GitHub - esphome/home-assistant-voice-pe: Home Assistant Voice PE?

As an aside I have started looking at getting HA running on one of these PÚCA DSP — OHMIC which has the advantage/disadvantage of not having the XMOS chip and far better audio quality. If I ever get it going I will let all you guys know!

As another aside, is the XMOS source available anywhere? I have some XMOS development kits, including an AI board, and it would be interesting to see what they are doing and to be able to modify it.

No doubt. All clear on that front. But limiting it to just a speaker and the kit is appealing from a cost, size and simplicity perspective. So I figure a little effort in CAD and a couple speaker trials could net a noticeable difference and that’s worth it to me.

Oh wow. Yeah, nothing like that. I checked last night and it doesn’t happen at all when I eliminate the buck converter. When I power them separately everything is fine, so it has to be a ground loop. I just wish I knew how to keep it from happening without an isolator that destroys gain and quality, because then I could fit them both in a small enclosure and really spice up the quality of music playback. But I am woefully ignorant on this stuff.

Yes indeed, this code. Please take a look, that would be awesome!

There’s a lot of discussion ongoing about it here:

You can read last ~50 messages for context. :slight_smile:

Also, I can put questions to the Seeed dev who made the 48 kHz firmware, if needed.

Super, I will take a look…

Also, do you have a link to that firmware?

Yup, here: ReSpeaker_Lite/xmos_firmwares/respeaker_lite_i2s_dfu_firmware_48k_v1.0.9.bin at d272ef8059b7f39dbadb9d791b67a75ecd4b6dbd · respeaker/ReSpeaker_Lite · GitHub

Nice one, thanks :slight_smile:

Some corrections to make but I’m generally satisfied with fitment. Need to reprint with changes and put it all together for testing but it’s moving along.

Which discord server is that? If I click on the link, I only get a greyed-out discord server page.

It’s the ESPHome server: ESPHome

I was doing almost exactly the same enclosure. But I wasn’t satisfied with the result (back chamber too small for the speaker, hard to print with all those internal supports, pretty hard to assemble…), so I did the thing I posted before. For convenience it’s all on threaded inserts. :slight_smile:

Many thanks. I forgot to look at this server… :slight_smile:

Here’s a fun one.
So, I just finished setting up the re-speaker thanks to formatBCE’s GitHub and it’s fantastic. Wake word detection is fantastic (can’t wait to train my own) and everything is running smoothly. Except…

For some reason I don’t get voice responses. Everything else works without issue, but voice does not. I get the little ‘Ping’ sound for activation, so it’s not a speaker issue, and get ready for it… manual TTS via the media player works without a problem.

I saw someone earlier who had local access issues via HTTP, but this isn’t that. I can access the local instance over HTTP without any issues. From what I can tell it’s a Home Assistant issue, except I can’t find any mention of it anywhere else, which makes me think it’s a me issue. My HASSOS is up to date, ESPHome is the newest version, XMOS is on 1.0.9.

The media player entity flashes on and then off again like it wants to play something, but it just fails and stays silent. Trying to access the two URLs from the logs in the browser results in basically the same thing: the esphome/ffmpeg_proxy endpoint returns a file, but the tts_proxy one returns 404.
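The browser check described above can also be scripted, which makes it easier to repeat right after each interaction. This is a minimal sketch using only the Python standard library; `probe` and `describe` are illustrative names, not part of Home Assistant or ESPHome, and the meaning attached to each status code is an assumption about how to read the result, not documented server behaviour.

```python
import urllib.error
import urllib.request
from typing import Optional, Tuple


def probe(url: str, timeout: float = 5.0) -> Tuple[Optional[int], str]:
    """GET the URL and return (HTTP status, content type or error text)."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status, resp.headers.get("Content-Type", "")
    except urllib.error.HTTPError as err:
        # The server answered, but with a 4xx/5xx status (e.g. the 404 above).
        return err.code, str(err.reason)
    except (urllib.error.URLError, OSError) as err:
        # No HTTP answer at all: DNS failure, refused connection, timeout.
        return None, str(err)


def describe(status: Optional[int]) -> str:
    """Turn a status from probe() into a short human-readable verdict."""
    if status is None:
        return "unreachable"
    if status == 200:
        return "served"
    if status == 404:
        return "not found"
    return f"unexpected status {status}"
```

Calling `probe(url)` with each Media URL copied from the device log and passing the status to `describe` should reproduce the browser result: "served" for the ffmpeg_proxy URL and "not found" for the tts_proxy one.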

I’m including the logs because I’ve been messing with this for the last 2 hours and am at a loss. I don’t know enough about mmw or XMOS to know if it’s a satellite software issue or not, and before I sink more hours into trying to debug my Home Assistant setup I figured I’d post this here and see if something jumps out at someone.

Full log from wake word detection:

--------------------------- Wake Word ---------------------------
[23:49:39][D][esp-idf:000]: I (138) gpi[D][micro_wake_word:357]: Detected 'Okay Nabu' with sliding average probability is 0.98 and max probability is 1.00
[23:49:39][D][media_player:080]: 'Media Player' - Setting
[23:49:39][D][media_player:084]:   Command: STOP
[23:49:39][D][media_player:093]:  Announcement: yes
[23:49:39][D][media_player:080]: 'Media Player' - Setting
[23:49:39][D][media_player:093]:  Announcement: yes
[23:49:39][D][ring_buffer:034]: Created ring buffer with size 48000
[23:49:39][D][ring_buffer:034]: Created ring buffer with size 48000
[23:49:39][D][ring_buffer:034]: Created ring buffer with size 65536
[23:49:39][D][ring_buffer:034]: Created ring buffer with size 65536
[23:49:39][D][nabu_media_player.pipeline:174]: Reading FLAC file type
[23:49:39][D][nabu_media_player.pipeline:186]: Decoded audio has 1 channels, 48000 Hz sample rate, and 16 bits per sample
[23:49:39][D][nabu_media_player.pipeline:208]: Converting the audio sample rate
[23:49:39][D][nabu_media_player.pipeline:211]: Converting mono channel audio to stereo channel audio
[23:49:39][D][ring_buffer:034][speaker_task]: Created ring buffer with size 16384
[23:49:39][D][esp-idf:000][speaker_task]: I (34985) I2S: DMA Malloc info, datalen=blocksize=2048, dma_buf_count=4
[23:49:39]
[23:49:39][D][i2s_audio.speaker:118]: Starting Speaker
[23:49:39][D][i2s_audio.speaker:123]: Started Speaker
[23:49:40][D][voice_assistant:516]: State changed from IDLE to START_MICROPHONE
[23:49:40][D][voice_assistant:522]: Desired state set to START_PIPELINE
[23:49:40][D][voice_assistant:225]: Starting Microphone
[23:49:40][D][ring_buffer:034]: Created ring buffer with size 16384
[23:49:40][D][voice_assistant:516]: State changed from START_MICROPHONE to STARTING_MICROPHONE
[23:49:40][D][voice_assistant:516]: State changed from STARTING_MICROPHONE to START_PIPELINE
[23:49:40][D][voice_assistant:280]: Requesting start...
[23:49:40][D][voice_assistant:516]: State changed from START_PIPELINE to STARTING_PIPELINE
[23:49:40][D][voice_assistant:537]: Client started, streaming microphone
[23:49:40][D][voice_assistant:516]: State changed from STARTING_PIPELINE to STREAMING_MICROPHONE
[23:49:40][D][voice_assistant:522]: Desired state set to STREAMING_MICROPHONE
[23:49:40][D][voice_assistant:639]: Event Type: 1
[23:49:40][D][voice_assistant:642]: Assist Pipeline running
[23:49:40][D][voice_assistant:639]: Event Type: 3
[23:49:40][D][voice_assistant:653]: STT started
[23:49:40][D][light:036]: 'resre-speak-jenna' Setting:
[23:49:40][D][light:047]:   State: ON
[23:49:40][D][light:051]:   Brightness: 60%
[23:49:40][D][light:059]:   Red: 100%, Green: 20%, Blue: 100%
[23:49:40][D][light:109]:   Effect: 'Slow Pulse'
[23:49:41][D][voice_assistant:639]: Event Type: 11
[23:49:41][D][voice_assistant:802]: Starting STT by VAD
[23:49:41][D][esp-idf:000][speaker_task]: I (36356) I2S: DMA queue destroyed
[23:49:41]
[23:49:41][D][i2s_audio.speaker:130]: Stopping Speaker
[23:49:41][D][i2s_audio.speaker:136]: Stopped Speaker
[23:49:43][D][voice_assistant:639]: Event Type: 12
[23:49:43][D][voice_assistant:806]: STT by VAD end
[23:49:43][D][voice_assistant:516]: State changed from STREAMING_MICROPHONE to STOP_MICROPHONE
[23:49:43][D][voice_assistant:522]: Desired state set to AWAITING_RESPONSE
[23:49:43][D][voice_assistant:516]: State changed from STOP_MICROPHONE to STOPPING_MICROPHONE
[23:49:43][D][light:036]: 'resre-speak-jenna' Setting:
[23:49:43][D][light:051]:   Brightness: 60%
[23:49:43][D][light:059]:   Red: 100%, Green: 20%, Blue: 100%
[23:49:43][D][light:109]:   Effect: 'Fast Pulse'
[23:49:43][D][voice_assistant:516]: State changed from STOPPING_MICROPHONE to AWAITING_RESPONSE
[23:49:43][D][voice_assistant:516]: State changed from AWAITING_RESPONSE to AWAITING_RESPONSE
[23:49:47][D][esp32.preferences:114]: Saving 1 preferences to flash...
[23:49:47][D][esp32.preferences:143]: Saving 1 preferences to flash: 1 cached, 0 written, 0 failed
[23:49:47][D][voice_assistant:639]: Event Type: 4
[23:49:47][D][voice_assistant:667]: Speech recognised as: " Turn on the bedroom light."
[23:49:47][D][voice_assistant:639]: Event Type: 5
[23:49:47][D][voice_assistant:672]: Intent started
[23:49:49][D][voice_assistant:639]: Event Type: 6
[23:49:49][D][voice_assistant:639]: Event Type: 7
[23:49:49][D][voice_assistant:695]: Response: "The bedroom light has been turned on."
[23:49:49][D][light:036]: 'resre-speak-jenna' Setting:
[23:49:49][D][light:051]:   Brightness: 60%
[23:49:49][D][light:059]:   Red: 20%, Green: 100%, Blue: 100%
[23:49:49][D][light:109]:   Effect: 'Slow Pulse'
[23:49:49][D][voice_assistant:639]: Event Type: 8
[23:49:49][D][voice_assistant:717]: Response URL: "http://192.168.10.80:8123/api/tts_proxy/dd7f891e90e2eb75ceeda2bd2ab32502b9a12d04_en-gb_4433720218_tts.piper.flac"
[23:49:49][D][voice_assistant:516]: State changed from AWAITING_RESPONSE to STREAMING_RESPONSE
[23:49:49][D][voice_assistant:522]: Desired state set to STREAMING_RESPONSE
[23:49:49][D][media_player:080]: 'Media Player' - Setting
[23:49:49][D][media_player:087]:   Media URL: http://192.168.10.80:8123/api/tts_proxy/dd7f891e90e2eb75ceeda2bd2ab32502b9a12d04_en-gb_4433720218_tts.piper.flac
[23:49:49][D][media_player:093]:  Announcement: yes
[23:49:49][D][voice_assistant:639]: Event Type: 2
[23:49:49][D][voice_assistant:731]: Assist Pipeline ended
[23:49:49][D][nabu_media_player.pipeline:174]: Reading FLAC file type
[23:49:50][D][voice_assistant:516]: State changed from STREAMING_RESPONSE to IDLE
[23:49:50][D][voice_assistant:522]: Desired state set to IDLE
[23:49:50][D][light:036]: 'resre-speak-jenna' Setting:
[23:49:50][D][light:047]:   State: OFF
[23:49:50][D][light:109]:   Effect: 'None'

Log for media player:

--------------------------- Media Player ---------------------------
[23:50:53][D][media_player:080]: 'Media Player' - Setting
[23:50:53][D][media_player:087]:   Media URL: http://192.168.10.80:8123/api/esphome/ffmpeg_proxy/34a8f3fb82101368b7bf54d34cc148c9/pjUR7xSahglK-KOErP7G9g.flac
[23:50:53][D][media_player:093]:  Announcement: yes
[23:50:53][D][nabu_media_player.pipeline:174]: Reading FLAC file type
[23:50:53][D][nabu_media_player.pipeline:186]: Decoded audio has 1 channels, 16000 Hz sample rate, and 16 bits per sample
[23:50:53][D][nabu_media_player.pipeline:211]: Converting mono channel audio to stereo channel audio
[23:50:53][D][ring_buffer:034][speaker_task]: Created ring buffer with size 16384
[23:50:53][D][esp-idf:000][speaker_task]: I (108476) I2S: DMA Malloc info, datalen=blocksize=2048, dma_buf_count=4
[23:50:53]
[23:50:53][D][i2s_audio.speaker:118]: Starting Speaker
[23:50:53][D][i2s_audio.speaker:123]: Started Speaker
[23:50:58][D][esp-idf:000][speaker_task]: I (113862) I2S: DMA queue destroyed
[23:50:58]
[23:50:58][D][i2s_audio.speaker:130]: Stopping Speaker
[23:50:58][D][i2s_audio.speaker:136]: Stopped Speaker
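One difference between the two logs above is easy to miss by eye: the activation sound and the manual TTS each show a "Decoded audio has…" line after "Reading FLAC file type", while the Piper response at 23:49:49 does not. A minimal sketch that flags this, assuming (not confirmed by ESPHome documentation) that a missing "Decoded audio" line means the decoder never got a valid file; the function name and the shortened sample are illustrative:

```python
import re

DECODED = re.compile(r"Decoded audio has (\d+) channels, (\d+) Hz sample rate")


def decode_results(log_text: str) -> list:
    """For each 'Reading FLAC file type' line, report the decoded sample rate
    in Hz, or None if no 'Decoded audio' line follows before the next read."""
    results = []
    for line in log_text.splitlines():
        if "Reading FLAC file type" in line:
            results.append(None)  # a decode attempt started
        else:
            match = DECODED.search(line)
            if match and results and results[-1] is None:
                results[-1] = int(match.group(2))
    return results


# Lines copied from the two logs above, trimmed to the relevant entries.
SAMPLE = """\
[23:49:39][D][nabu_media_player.pipeline:174]: Reading FLAC file type
[23:49:39][D][nabu_media_player.pipeline:186]: Decoded audio has 1 channels, 48000 Hz sample rate, and 16 bits per sample
[23:49:49][D][nabu_media_player.pipeline:174]: Reading FLAC file type
[23:49:50][D][voice_assistant:516]: State changed from STREAMING_RESPONSE to IDLE
[23:50:53][D][nabu_media_player.pipeline:174]: Reading FLAC file type
[23:50:53][D][nabu_media_player.pipeline:186]: Decoded audio has 1 channels, 16000 Hz sample rate, and 16 bits per sample
"""

print(decode_results(SAMPLE))  # → [48000, None, 16000]
```

The `None` in the middle corresponds to the tts_proxy URL, which fits with the 404 seen in the browser.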

My logs look identical. And I see your speaker is in the started state for 5 seconds, so it is actually responding…
Could it be that the TTS result was generated incorrectly? Can you copy that .flac file URL from the logs right after an interaction, paste it into a browser and try to play it?
[EDIT]
Oh sorry, I dove into the logs and missed that you had already tried it. Well, it looks like a pipeline problem. Try changing your TTS provider to check if it helps. It might be that “manual TTS” is using a different voice?

It was a pipeline issue, although I’m not sure what exactly. I tried the Cloud TTS and it worked, so for some reason Piper wasn’t working. I tried deleting and setting up a new voice assistant, restoring from a backup, and trying different Piper voices. I ended up re-installing Piper and that seems to have fixed it now :person_shrugging:

As a side note, I saw in your repo that more LEDs are on your to-do list. I’m also messing around with that, so I ordered a small 8-LED WS2812 strip that I’m going to try to add to the onboard WS2812 LED. I’ll update with how it goes once the LEDs arrive.

Yeah, I guess using them on the same GPIO as the built-in LED would work. For myself I decided to pause on that for now: it’s too much customization, and doesn’t fit a simple integration. :slight_smile:

I also decided to pause on multiple buttons. While I did get that working (added 3 buttons with different resistors in parallel, and used GPIO3 for analog reading), it brought a lot of complication, and it is definitely not something the majority of people would appreciate. It’s easier to make a voice command for volume up/down than to solder 5 resistors and 3 buttons and extend the YAML with a custom-tuned section for a resistance sensor…
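For anyone curious why the multi-button approach needs custom tuning, here is a rough sketch of the idea behind reading several buttons on one analog pin. It is not the circuit described above: the supply voltage, pull-up value, button resistor values, button names and tolerance are all hypothetical, chosen only to show the voltage-divider arithmetic.

```python
from typing import Optional

# Hypothetical values: a 3.3 V supply, one pull-up resistor, and one
# resistor to ground per button. None of these come from the real build.
VCC = 3.3
R_PULLUP = 10_000
BUTTONS = {"volume_up": 2_200, "volume_down": 4_700, "action": 10_000}


def expected_voltage(r_button: float) -> float:
    """Voltage at the ADC pin while a button with this resistor is pressed."""
    return VCC * r_button / (R_PULLUP + r_button)


def classify(voltage: float, tolerance: float = 0.15) -> Optional[str]:
    """Return the button whose expected voltage is closest to the reading,
    or None if nothing is within the tolerance (e.g. no button pressed)."""
    best = min(BUTTONS, key=lambda name: abs(expected_voltage(BUTTONS[name]) - voltage))
    if abs(expected_voltage(BUTTONS[best]) - voltage) <= tolerance:
        return best
    return None
```

Each button gets its own voltage band, and the bands have to be tuned to the actual resistors and ADC noise, which is the "custom-tuned section" cost mentioned above.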

So I just keep it as simple as possible now. Speaker and enclosure, that’s it. So far very positive feedback from family. :slight_smile:

One thing I will say is that I do not love the onboard lighting (which was why my V1 at least tried to make use of the haos logo and do something creative). Someday down the road if you get around to publishing the YAML for external LEDs with GPIO mapping I’d love to make use of something that allows us to, say, ring an LED tube around a circular speaker cover.

Will finally get to side-by-side sound quality on the new design tonight. Hope the better speaker and higher enclosure volume help!

But you already can. Just use D0 (GPIO1) on the S3 and put your DOUT for the LED ring there.

It will work in parallel with the onboard LED.