The era of open voice assistants has arrived

I am not sure if that is implemented yet. Rhasspy used to handle multiple satellites, but I'm still not sure whether it could in any way detect the best signal or distance.

I don’t like the satellite infrastructure; I think mics should connect to a websockets server that can handle concurrent clients.
That way it’s pretty easy to organise them into zones. A KWS generally gives a hit probability of 0-1, so the KWS that sends the highest hit probability, as long as they all run the same model, should be a good indication of the best mic to use…
You have a small ‘debounce latency’ delay to wait for all the results to come in, then use the highest score and drop the other KWS mic arrays.
.99 means it’s 99% probable the KW is correct, and with the same model and hardware the highest hit should be the best and likely the closest, without the need for complex additional algorithms.
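
To make that concrete, here is a rough sketch of the "highest score within a debounce window" idea in Python. It is only an illustration under my own assumptions, not how Rhasspy, Wyoming, or Home Assistant do it: the `Hit` record, the `pick_winners` function, and the 0.2 s default window are all made-up names and values, and it assumes every mic in a zone runs the same KWS model so the scores are comparable.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hit:
    """One wake-word detection reported by a mic (hypothetical record)."""
    mic_id: str
    zone: str
    score: float        # KWS hit probability, 0..1
    received_at: float  # arrival time on the server, in seconds


def pick_winners(hits, debounce_s=0.2):
    """Return {zone: mic_id} for the highest-scoring hit per zone.

    The debounce window for a zone opens at its first hit; hits that
    arrive more than `debounce_s` seconds later are ignored.
    """
    window_start = {}  # zone -> arrival time of the first hit
    best = {}          # zone -> best Hit seen inside the window
    for hit in sorted(hits, key=lambda h: h.received_at):
        start = window_start.setdefault(hit.zone, hit.received_at)
        if hit.received_at - start > debounce_s:
            continue  # too late, the window for this zone has closed
        current = best.get(hit.zone)
        if current is None or hit.score > current.score:
            best[hit.zone] = hit
    return {zone: hit.mic_id for zone, hit in best.items()}


hits = [
    Hit("kitchen-a", "kitchen", 0.91, 0.00),
    Hit("kitchen-b", "kitchen", 0.99, 0.05),
    Hit("kitchen-c", "kitchen", 0.995, 0.50),  # arrives after the window
    Hit("lounge-a", "lounge", 0.80, 0.01),
]
print(pick_winners(hits))
```

A real server would run the window on a timer per zone as hits arrive over the websocket rather than on a finished list, but the selection logic would be the same.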

I have been banging on about that for a while, where zones are set at the start of the process rather than allocated afterwards, and I will be working on a rough demo over the next couple of days as an alternative to the peer2peer Wyoming…

A lot has been cobbled together. Emerging entrepreneurs creating their own hardware (albeit backed by Nabu Casa repos) and FOMO are what I am assuming caused the rush to get the hardware out and the software decent enough to show off. XMOS hardware is most likely there because it’s proven effective and a shortcut to get all of that working in one package, rather than gluing together and reinventing existing OSS projects.

Tbh, it reminds me of Steve Jobs swapping crashing iPhones during its reveal. Ok, maybe not quite that, but it still begs the question of why so many things seem rushed. This isn’t me talking down, I understand things aren’t as cut and dried as that, but it’s still something to point out.

All in all, a great job. I am travelling down the assist rabbit hole with non PE hardware and just in the couple months I’ve been tinkering, the intent processing has changed drastically. Last week I could ask assist to “turn the temperature to 22” and it would, this week, I get “sorry I couldn’t understand that”, which means things are moving fast.

I get what the end goal is here, but I am also bemused by this iteration of hardware and some of the software choices. It seems like Nabu wants its own ecosystem (Wyoming-backed data transferring), and I think we’re all a little weary of voice assistant ecosystems.

I’ll be optimistically cautious and take the good with the bad. A lot of hard work has gone into getting what we have out and I am grateful for it.

Here’s to putting on sunglasses, the future looks bright :sunglasses:.


I have never had any commercially available smart speakers/ voice assistants and could not care less about them.
Zero need to ask such a thing to tell me a joke or read some crap from the Internet to me.
For smart music streaming I have been running a squeezebox system for more than 15 years, which is perfectly integrated into Home Assistant.
So no need, for me, to have yet another music player.
If this new device has better voice recognition, which it seems to, that is all I am asking for to begin with. I would even consider using rhasspy speech since the intents cover 90% of my use cases.
I am very excited for the new voice assistant and curious how it will perform.
If I support all the great effort through its price this is fine with me. All the other stuff that I am doing in my house on the shoulders of home assistant came for free after all.

I have the feeling that many people seem to live in a reality where they expect everything for free…after all Google is, right :rofl:

If your privacy is worth nothing to you, this might not be the right platform.

Thanks to everyone at HA and Nabu Casa for the great effort that went into this, and everything else that is already there, of course.

Merry X-mas

Merc


Already ordered mine - thank you for this awesome release.
So about 3d files. Would you be releasing STEP files? Or any other solids? STLs are fine for 3d printing and accessories but modifications need solid files not mesh. Yes, we can reverse engineer the STLs but it would be a lot easier if you just released originals. I already got some ideas on how to make those files easier to print. Thanks!

Please also add a switch to mute the speaker(s).

Plug any 3.5mm stereo/mono plug into the ext. speaker socket. That will disconnect the internal speaker.

I think 2 voice-related things could be improved/added:

  1. Timers. I wish we had a Timers item in the side panel, so you could set, pause, and cancel timers from the dashboard too, and choose the timer's beep sound and output speaker.

  2. Alarm clock. Looks like it’s completely missing in HA. I wish we could set an alarm clock by voice or in the dashboard. Like timers, it needs a place in the side panel too.


There could be an indicator on the dashboard if a timer or alarm clock is active.


Does anyone know one way or the other for sure if the device can initiate conversations via automations or is it strictly always only listening for the wake word to initiate some interaction?

Hmmm… Apparently you can no longer buy them through Seeedstudio

+1, however music-playback-specific requests are probably better directed to the Music Assistant project (which does share some of the same developers as Home Assistant). Check out these existing feature request discussions about ESP32 playback:

I posted this directly related feature request there asking for “Matter Casting” (a.k.a. MatterCast):

There is by the way a good summary of the Music Assistant project in this blog post:

Would it be possible to add a “Matter Casting” (a.k.a. MatterCast) audio/music and video player (streaming receiver) to support the new/upcoming video and music cast standard in the future?

Please consider researching and planning for a custom “Matter Casting” (a.k.a. MatterCast) receiver/client for Music Assistant, and later also a “Matter Casting” streaming service for newer and upcoming connected smart speakers and smart displays/televisions (like the latest products from Amazon) that will be able to act as receiver endpoints and audio/music players for “Matter Casting” audio/music and video streaming when those become available. Matter Casting is aimed at democratizing local video and audio casting in a universal way that can be supported by all ecosystems and platforms.

“Matter Casting” is a new open-protocol media streaming standard. The Matter Casting APIs for casting video and audio streams over a local network are only a small part of the currently much-hyped Matter standard suite for IoT, which is being led, promoted and developed by the CSA (Connectivity Standards Alliance) and its very impressive list of member companies:

Yep, ffmpeg is used on the HA side to convert all incoming audio into something ESPHome understands (feature not limited to Voice PE)


Wyoming is an open voice assistant protocol that we created to allow hosting Speech-to-Text etc engines in different processes or hosts (like your beefy server). Voice PE leverages the ESPHome protocol as it’s an ESPHome device. Everything is open and we encourage people and companies to use and integrate them as they wish.

This is not possible yet but planned.

Home Assistant contains wake word deduplication. Only one device will respond.

I know it is out of scope of this device… but I’d love it to have HDMI output to be able to output HA dashboard to connected TV in somehow standard way, directly from HA :slight_smile:

The RK3588 devices are likely to be a really good solution for that, as Collabora is supposedly continuing mainline development. Even though the Intel devices have great HDMI/video support, I think they lack HDMI-CEC control (could be wrong).
The RK3588 does do CEC, and it also has an HDMI input that is ARC compatible, where open-source wireless audio could service all devices.

If you have a CEC device, it could know the channel you are on, switch to an HA dashboard, and on exit switch back to the original channel, or do any other type of control.
Pis have CEC but no ARC, and even the Pi 5, with no hardware encode, struggles with 4K.

Collabora are doing some amazing work, but it's complex and slow, especially as Rockchip is not supporting open source any more. Upstream support for Rockchip's RK3588: Progress and future plans

With Intel you can buy an adapter, and HDMI-CEC - Home Assistant does have an integration, but I don’t see many posts from users using their TV…

Fair enough for this “preview” device, but for the final device release it would be nice not having to add a cinch plug for every single voice device.


This is THE FINAL device from Nabu Casa for now. What will change and evolve is the software.
Other people might pick up the openly available pcb designs and build other devices based on that.

It’s all in the YouTube video…

FYI, FutureProofHomes has now also announced the final hardware design of their much more advanced “Satellite1 Dev Kit” (or rather announced a public beta pre-launch with pre-orders for the USA only), so that development board hardware looks fully ready too, even if it is not available to ship yet.

As you can see in their video, they have taken a very different approach by making it modular, using a two-board design that separates the compute board from the voice board. By making it compatible with the Raspberry Pi Zero standard, it will be both flexible today and upgradable to other compute boards in the future.
