Home Assistant satellites (client hardware for Home Assistant): my wishlist

Type 1 does not need to be push-to-talk. Snips was using satellite mics with a Pi Zero and ReSpeaker that listened for trigger words and then passed the following audio to the main server to be processed. There’s no need to limit this at all to push-to-talk OR single-phrase activity.

The Raspberry Pi Zero can do wake-word detection with e.g. Porcupine, I agree. The reason I mentioned push-to-talk is that not all cheap hardware will be able to do it, and you usually have to keep these devices rather close anyway unless you use cloud ASR, stick to English for open-source ASR, or massively reduce your vocabulary (for German I can use Vosk with a custom language model of about 1000 words at a range of about 2 m).

“Single phrase activity” might have been a misunderstanding. I was talking about multi-turn conversations which require at least a speaker or display and increase complexity of the client a lot.

The satellite only needs to do wake-word detection.
The rest can be offloaded to a powerful server.
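
To make that split concrete, here is a minimal sketch of what the satellite side could look like. Everything in it is hypothetical: `satellite_loop`, the `is_wake_word` callback and the fixed `frames_to_forward` count are made up for illustration, and the "detector" is a toy stand-in, not a real wake-word engine. A real satellite would stream the batch over the network instead of yielding it.

```python
from typing import Callable, Iterable, Iterator, List

Frame = bytes  # one chunk of raw audio (toy placeholder for PCM data)


def satellite_loop(
    frames: Iterable[Frame],
    is_wake_word: Callable[[Frame], bool],
    frames_to_forward: int,
) -> Iterator[List[Frame]]:
    """Yield one batch of frames per detected wake word.

    Frames seen while idle are dropped locally; only the frames that
    follow a wake word are handed on (in practice: sent to the server).
    """
    it = iter(frames)
    for frame in it:
        if not is_wake_word(frame):
            continue  # idle: nothing leaves the device
        utterance: List[Frame] = []
        for _ in range(frames_to_forward):
            nxt = next(it, None)
            if nxt is None:  # audio source ended mid-utterance
                break
            utterance.append(nxt)
        yield utterance


# Toy stand-in detector: a frame "is the wake word" if it equals b"WAKE".
audio = [b"noise", b"WAKE", b"turn", b"on", b"lamp", b"noise", b"WAKE", b"stop"]
batches = list(satellite_loop(audio, lambda f: f == b"WAKE", frames_to_forward=3))
print(batches)
```

The point of the design is that the only work done on the cheap device is the per-frame detector call; ASR, intent handling and TTS all stay on the server.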

I think there are 3 main questions that everybody needs to ask themselves when thinking about satellites:

  • At what distance do I want to use it? (1m, 2m, more?)
  • Do I want to play music on the device?
  • How much am I willing to spend? ($20, $100, more?)

I found something that could work: the ReSpeaker Core v2.0.
https://www.pbtech.com/product/SBCSED0006/Seeed-ReSpeaker-Core-v20-Powered-by-Axol-Core-Modu

From Seeed’s own website: ReSpeaker Core v2.0 | Seeed Studio Wiki
It seems perfect for this type of use.

I don’t think that’s quite correct; I think Music Assistant (MA) supports sync under some conditions (see below). Note that the item below, while it is a planned change, references existing capabilities … but the change itself seems to be in the furthest-out schedule state.

Link: Music Assistant (V2) backlog · GitHub - Support (experimental) sync of different speakers/ecosystems within Universal group

Content:

Item status: Draft. Opened by marcelveldt on Apr 19 (edited).

Description:

The Universal Group provider only syncs speakers of the same ecosystem.
Players that do not support sync at all will not be synced and also speakers of different ecosystems will not sync together.

Add an experimental toggle to allow some basic sync (timestamp based) of players so at least they more or less start playing the same song at the same time and not drift seconds apart.

  • start delay (prepend silence or cut frames from the beginning)
  • sync based on elapsed time (best effort, not accurate)
  • standard drift

It is just a newly added draft with no assigned developers.
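
To illustrate the "start delay" bullet from that draft, here is a rough sketch of my own; it is not Music Assistant code. The function name `align_start` and its parameters are hypothetical, and it assumes raw signed PCM, where zero bytes mean silence.

```python
def align_start(
    pcm: bytes,
    offset_ms: float,
    sample_rate: int = 44100,
    channels: int = 2,
    sample_width: int = 2,  # bytes per sample, 2 = 16-bit
) -> bytes:
    """Shift a PCM buffer so its start lines up with a reference player.

    offset_ms > 0: this player would start too early, so prepend silence.
    offset_ms < 0: this player would start too late, so cut frames
    from the beginning.
    """
    frame_size = channels * sample_width  # bytes per audio frame
    n_frames = round(abs(offset_ms) * sample_rate / 1000)
    n_bytes = n_frames * frame_size  # always a whole number of frames
    if offset_ms > 0:
        return b"\x00" * n_bytes + pcm  # zero bytes = silence (signed PCM)
    return pcm[n_bytes:]


# Tiny example with a 1 kHz mono 16-bit "stream" so the numbers stay small.
early = align_start(b"\x01\x02\x03\x04", 2, sample_rate=1000, channels=1)
late = align_start(b"\x01\x02\x03\x04", -1, sample_rate=1000, channels=1)
print(len(early), len(late))
```

This only fixes the starting point; the "sync based on elapsed time" and drift bullets would need ongoing correction on top of it.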

A valid point, thank you for considering and commenting.

I’ll add this to the topic start.

This looks promising. Too bad about the ddr looks, though.

That’s quite cool.

Nabu Casa is working on a voice satellite.

Heard it on a podcast. It uses ESPHome and has multiple mics.

Tested this one. Quite a capable sensor!

More info: UltimateSensor Mini: The Compact Powerhouse for Smart Home Automation - espthings.io

Not sure if all of you have followed the news about Nabu Casa’s upcoming voice-assistant hardware project, but I just heard Paulus Schoutsen reveal on Home Assistant’s ESPHome Summer Release Party on YouTube that Nabu Casa’s ESPHome developers are working on a new open-source hardware platform for their voice-assistant products. It will be based on the ESP32-S3 in combination with a very powerful XMOS xCORE chip for audio processing.

By the way, I think similar xCORE chips from XMOS are used in Amazon Alexa Voice Service (AVS) Development Kit(s) and Amazon Echo products.

…and while I am not sure, I suspect that the Google Nest / Google Home smart speaker series also contains an XMOS xCORE chip?

Anyway, as for my wishlist:

First of all, I wish there could be a native HiFi-quality music player inside ESPHome (or Home Assistant) that is fully integrated with Music Assistant, so that you could simply set these speakers as a “Player Provider” inside Music Assistant and group them for synchronized multi-room playback.

Thus I am wondering whether the combined ESPHome voice-assistant hardware and firmware platform will also be great for music playback if a high-quality amplifier and speakers are used?

Is any work being done to also make ESPHome-based voice-assistant devices better media player receivers, with native support for features such as multi-room and synchronized Hi-Fi quality playback?

Since Nabu Casa’s designs are said to be open-source hardware, and XMOS integration will probably be added to ESPHome’s Media Player Components (and Microphone Components), I for one am hoping that it could and will be extended to different types of speakerless appliances with AUX-output/audio-output and AUX-input/audio-input ports, and not only to voice assistants.

Personally I would also love to see inexpensive speakerless network-streamer player/receiver hardware, without microphones and with only an AUX-out, that can connect to any of your existing amplifiers or speakers with built-in amplifiers, in order to replace products like Chromecast Audio and Amazon Echo Input / Echo Link Amp (i.e. devices with no on-board speakers that must be connected to external speakers via AUX-output for audio).

That is, I am sure that not everyone only wants “smart speakers” with a voice assistant, and that many would also be happy to have network streamers/players without a microphone whose only purpose is to receive the highest-quality audio possible from Music Assistant and output it to your “dumb” speakers.

I for one still have loads of Chromecast Audio audio-only receivers connected to various models and brands of speaker/receiver systems in each room, used to achieve multi-room music playback on a budget (because I could not afford Sonos speakers in all rooms).

So even though Nabu Casa’s hardware will initially be designed primarily as a “Home Assistant Satellite” (also known as “Wyoming Satellite”) for voice-assistant appliances, such open-source hardware, just like the ESPHome firmware, has a lot of potential for different use cases.

Also on my wishlist is network streamer receiver hardware with an AUX-input and ADC to get music from an analog audio source, as an easy way to achieve a remote AUX input into Music Assistant from an external analog source like a vinyl record player (LP turntable) or cassette player.

What I want to achieve is a solution that is easy to install, maintain and use, and that allows my wife to stream music from a vinyl record player (LP turntable) to any speaker or group of speakers in our home. Her turntable setup has a pre-amp with phono (RCA) output ports for analog stereo audio.

  • Architecture example: Analog audio source with preamp → ADC network appliance → music stream → Music Assistant → Any speakers

I would therefore prefer if we could buy some kind of networked (Wi-Fi) appliance, like a music streamer with a stereo AUX input port, that performs on-the-fly analog-to-digital conversion (ADC) plus encoding for streaming to a Music Provider inside Music Assistant.
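
As a rough sanity check on what such an ADC appliance would have to push over Wi-Fi, here is the standard bitrate arithmetic for uncompressed PCM. The helper name `pcm_bitrate_kbps` is made up, and the two formats below are just example assumptions (CD quality and a common "hi-res" format), not something any of the devices above is known to use.

```python
def pcm_bitrate_kbps(sample_rate_hz: int, bit_depth: int, channels: int) -> float:
    """Raw (uncompressed) PCM bitrate in kilobits per second."""
    return sample_rate_hz * bit_depth * channels / 1000


# Assumed example formats: CD quality and 24-bit / 96 kHz stereo.
cd_quality = pcm_bitrate_kbps(44_100, 16, 2)
hi_res = pcm_bitrate_kbps(96_000, 24, 2)
print(f"CD quality: {cd_quality:.1f} kbit/s, 24/96: {hi_res:.1f} kbit/s")
```

Both figures are before any encoding, so a lossless codec such as FLAC on the appliance would typically bring them down further.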

I do, however, think that both such solutions would need their own non-proprietary audio-only streaming protocol for high-quality music streams?

In the Voice - Chapter 7 livestream Mike also mentioned that Nabu Casa is working on an ESP32-S3 based voice satellite, but stated that they really want to get more of the audio processing software (DSP, Digital Signal Processing, including AEC, noise suppression and beamforming) magic working first … and not to expect it soon.

Good call, I say! I have been using Rhasspy with HA for a couple of years, and now wyoming-satellite. An ESP32-S3 satellite will be cheaper than RasPi + ReSpeaker options, but still way off the price and audio quality people have gotten used to from the Alexa and Google devices.

Nabu Casa can’t match those big companies’ deep pockets to subsidise their hardware, which leaves audio quality and non-cloud operation as the features customers will look at. I hope Amazon and Google start charging a subscription to use their cloud services, which would help counter the price difference.

Being non-cloud is worth something, especially to Home Assistant users, but how much? I doubt it is enough to offset a higher price combined with poorer audio quality. The problem is that the DSP implementations are a proprietary competitive advantage and are not being released as open source.

By the way, I believe that wyoming-satellite is a RasPi (and maybe more generic Linux) app, whereas the ESP32 devices use ESPHome components.

And it does seem obvious to me that a satellite device with a speaker connected should also expose a media_player interface.

FYI, Seeed Studio released the “ReSpeaker Lite” and “ReSpeaker Lite Voice Assistant Kit” products: one 2-mic array board model that combines an XMOS XU-316 with an ESP32-S3 for advanced audio processing with ESPHome support, and a DIY variant, a 2-mic array board that you can use with your own compute solution (another MCU, or an SBC/computer such as a Raspberry Pi) via an I2S or USB connection.

Awesome news:

Looking for something off the shelf that I can put in my living room. I’m curious if anyone has tried the ReSpeaker 4-mic array w/case from Seeed Studio. How well does the mic and speaker work in a solid case like that? That’s an unusual design choice, and atypical of most assistant devices.

Wondering if it’s worth $70 or if I should hold out for the Satellite from Future Proof Homes.

I have had the ReSpeaker USB 4-mic array waiting in a drawer for some time.
Still hoping that some HA satellite device will support it in the future :wink: