The era of open voice assistants has arrived

I couldn't find a specific answer on how to force my VPE to play all songs directly on the device.
Is there an easy way to force it to play audio on the local device? I have three Voice Preview Edition units and would like the output always forced to the local device, or possibly to a Sonos speaker.
Is there any way?

@ds1707 for anything related to music playback, you will want to look at doing it via Music Assistant:

If you have not heard of MA before, I especially recommend reading all these blog posts first:

So at least try Music Assistant to begin with, and then ask questions in their community forums:

So based on the HW specs, there’s no added AI accelerator, only dedicated audio processing HW, right? If it’s slow now (the “thinking”), it will still be slow with this device, do I understand it correctly? I can’t find this information anywhere else or in this thread.

Yes, that is correct. It is perhaps explained a little better in this other blog post, which in turn has many links of its own → Voice Chapter 11: multilingual assistants are here - Home Assistant

Did the push from the Open Home Foundation to bootstrap the era of open voice assistants fizzle out?

At least I do not see anything specific on the Open Home Foundation roadmap about encouraging third-party Home Assistant voice hardware or making it easier to build, and the only purpose-built third-party Home Assistant voice hardware I have seen so far is the FutureProofHomes Satellite1 Dev Kit and the MiciMike Drop-In Replacement Board for Google Home Mini. I was hoping to see more third-party Home Assistant voice hardware on the market, since all the schematics are fully open source and 18 months have now passed.

What is needed to breathe life into the concept of Home Assistant voice hardware from third parties?

Could the Open Home Foundation maybe help with PCB design advice and reviews or such?

That is, take an active role in helping and encouraging third-party Home Assistant voice hardware?

Anything else the Open Home Foundation or Home Assistant community can do to revive this?

Obviously the Open Home Foundation has a finite number of people and resources, but I feel there must be more that could be done to encourage Home Assistant voice hardware from third parties. Even with limited resources, I would think that the Open Home Foundation could both highlight this Home Assistant voice assistant ecosystem concept and offer a channel for advice and help with circuit board designs and tips on recommended manufacturers, even if it is only best-effort.

FYI: apparently Google will now take this idea and make it their own, as they have just launched a campaign for a very similar partnership program in which they will help partners by providing a turnkey solution with validated reference designs (featuring specific SKUs for SoCs, sensors, and mics), to be designed and built with the help of partners like Amlogic, SEI Robotics, and Apical.

Google is expanding its "Google Home Gemini built in Program" to encourage third-party hardware:

  • "New for 2026 - smart speakers: Our Speaker Reference Design allows you to build high-fidelity speakers that support the full Gemini voice experience, acting as the command center for the home."
    • "This is the most open the Google Home ecosystem has ever been. We are giving you the keys to the full stack, from the app layer to the hardware itself"

Google will also soon release a new official Google Home Audio smart speaker as reference hardware to help empower hardware partners and service providers:

"The Home Speaker marks a significant shift in Google’s smart home strategy. Rather than carrying the Nest branding, the speaker is simply called “Google Home Speaker” and has been designed from the ground up around Gemini."

"It features custom processing for Gemini, 360-degree audio, stereo pairing, multi-room support, and the ability to pair with the Google TV Streamer for a home theater-style setup. It will be available in Porcelain, Hazel, Berry, and Jade color options."

There is a group of items on the roadmap: Voice V2 - Alpha · Issue #84 · OHF-Voice/backlog-issues · GitHub

The problem with any decent voice assistant at the moment is that you need to send data to a cloud service. If you have the hardware, though, you can now run very capable AI locally. Here is my project, which tries to emulate the capabilities of cloud AI voice assistants on a “mid-range” graphics card so that you can run it fully locally.

Fulloch (The Fully Local Home Voice Assistant)

Yeah, but none of the items open there today have hardware in their scope.

I don't think hardware is what is currently letting down the effort. There are multiple hardware options; you mentioned some, but I don't think you included Voice and Music Assistant Dev Edition – ThirdReality, which is another such option.

Besides the low-quality speaker on the VPE, everything is generally a firmware or software problem.

At least to me it seems natural to want the default software experience to catch up to what is possible, otherwise a new hardware launch will be plagued with many of the same problems that the existing solutions have, likely leading to poor reception.

Microphones are the main problem for the entire ecosystem. For ASR models, it makes no difference whether you're speaking or a person in the background. Without beamforming, satellites will never be able to match or even come close to closed systems. For now, the only solutions are expensive speakerphones or a Seeed module for self-assembling a satellite (there are many reviews on YouTube, but not a single comparison of recorded samples in challenging conditions).

Purchasing a satellite with two microphones (VPE, SAT1, ...) offers no technical advantages.
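To make the beamforming point concrete, here is a minimal sketch of the simplest variant, delay-and-sum, in plain Python. It is an idealized toy and not what any shipping satellite or XMOS chip actually runs: the four-microphone array, the 16 kHz sample rate, the pure-tone "talker" and "interferer", the integer sample delays, and all function names are assumptions made for illustration.

```python
import math

RATE_HZ = 16000   # assumed sample rate
N = 1600          # 0.1 s of audio

def tone(freq_hz, n, delay=0):
    """Sine wave sampled at RATE_HZ that arrives `delay` samples late."""
    return [math.sin(2 * math.pi * freq_hz * (t - delay) / RATE_HZ) for t in range(n)]

def delay_and_sum(channels, steer):
    """Advance channel i by steer[i] samples, then average across channels."""
    if len(channels) != len(steer):
        raise ValueError("need one steering delay per channel")
    n = min(len(ch) - d for ch, d in zip(channels, steer))
    return [
        sum(ch[t + d] for ch, d in zip(channels, steer)) / len(channels)
        for t in range(n)
    ]

def power(signal):
    """Mean squared amplitude."""
    return sum(x * x for x in signal) / len(signal)

# Hypothetical geometry: the talker is straight ahead, so the 500 Hz "voice"
# reaches all four mics at the same time. The 2 kHz interferer comes from the
# side and reaches each successive mic 4 samples later.
talker_delays = [0, 0, 0, 0]
noise_delays = [0, 4, 8, 12]
mics = [
    [a + b for a, b in zip(tone(500, N, td), tone(2000, N, nd))]
    for td, nd in zip(talker_delays, noise_delays)
]

# Steer the beam at the talker: align on the talker's delays and average.
beam = delay_and_sum(mics, talker_delays)

single_mic_power = power(mics[0])
beam_power = power(beam)
talker_power = power(tone(500, N))

print("single mic:", single_mic_power)
print("beam:      ", beam_power)
print("talker:    ", talker_power)
```

In this contrived case a single microphone carries about twice the talker's power (talker plus interferer), while the steered beam comes out at about the talker's power alone, because the interferer's copies cancel when averaged. Real rooms have reverberation, broadband noise and moving talkers, which is where adaptive beamformers and the tuning effort come in.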

From what I understand it's the firmware behind the microphones, not the microphones themselves. Some people are using older conference mics with OHF-Voice/linux-voice-assistant (voice satellite for Home Assistant using the ESPHome protocol, on GitHub) and having great results.

My ViewAssist device (Pixel 7a) also has no problem with audio quality and understanding commands.

Most microphones are good enough to work when tuned properly via the firmware backing them.

And the primary speaker / background speaker can also be solved in ASR using speaker diarization.

The Sat1 has 4 microphones, but only 2 are active right now, which is something they're currently working on improving.

"From what I understand it's the firmware behind the microphones, not the microphones themselves."

Of course, I am referring to a microphone array, which is a combination of hardware and software. The microphone modules themselves can be simple, one-dollar MEMS components.

"And the primary speaker / background speaker can also be solved in ASR using speaker diarization."

This is a feasible, albeit quite resource-intensive, solution for integration into the pipeline.

It does not seem to have been officially announced yet, but FYI: apparently @synesthesiam has now published a new official voice app/addon for Home Assistant Operating System called "Assist Satellite", which offers the alternative of running your Home Assistant OS installation as a voice satellite (and media player) by simply connecting a microphone and speaker:

That in turn is based on the Linux Voice Assistant runtime (a.k.a. LVA), which uses the ESPHome protocol (and is a project that you can also use stand-alone without HAOS):

In summary, I guess you could simply describe it as the experimental Linux Voice Assistant (LVA) repackaged as a containerized variant for use inside Home Assistant Operating System as an app/addon.

For those who have not heard of it, "Linux Voice Assistant" (LVA) is an experimental Python program meant to be the successor to the older Wyoming Satellite based solution concept. It is designed to run on more powerful hardware such as an x86-64 or ARM64 computer (e.g. a mini-PC, Home Assistant Green, or Raspberry Pi 3 and later). For more background and information, read:

and

Also note that with this, the older "Assist Microphone" app/addon has been officially deprecated, and its GitHub repository now refers to the newer "Assist Satellite" app/addon instead.

Firstly, I think that so many choices are confusing for new users: both the off-the-shelf options and especially all the DIY combinations of processor, microphone and software version. I'm not saying there are too many options, just that they are not clearly laid out in one place. Consider also that many new users are not familiar with GitHub and Discord, so they miss out on support.

Secondly I don't think that Voice Assist has fizzled out. There was recently a burst of enthusiasm driven by the XMOS XU316 chip and several versions (including HA Voice PE) with similar hardware hit the market at the same time. XVF3800 seems to be the next step, but I get the impression that improvements don't justify double the XU316's price.

I see three areas for ongoing improvement, all of them essential:

  1. The actual microphone hardware, and optimising the speaker and microphone sound paths in the cases.
  2. The Digital Signal Processing "magic" including AEC (Acoustic Echo Cancellation), de-reverberation, noise suppression, beamforming, and AGC (Automatic Gain Control). I believe a lot of this is locked into proprietary algorithms and chips. Hobbyists not covered by NDAs can eventually come up independently with similar results - but it will take time and the people with the skills are constantly employed by the big players.
  3. More AI-style processing cores, both at the satellite for DSP use, and at the voice server for interpreting the language which is detected. This is just Moore's Law and will happen anyway.

Personally I see the DSP as the key element. At the moment we are using XMOS chips which have "black box" DSP built-in. Not ideal, but as long as the chips don't require an internet connection we can use them.

And in the meantime the big players just dig into their deep pockets to leverage their economy of scale and pull further ahead. If they start charging users for back-end processing to offset reported financial losses, the market might change enough to give new entrants and FOSS a better chance.

In summary, I am impressed with where Open Source Voice Assistants have got to so far, and look forward to further improvements. Unfortunately the biggest problem is that new users expect Alexa/Google Home performance out of the box; and judge based on that unfair comparison.

I'm looking forward to ordering one. However, the XVF3800 does not seem to be available yet in Europe if you want to have it with an XIAO ESP32-S3 and a case, is that right?

For now, it seems you have to choose between either the XIAO ESP32-S3 or having a case. In the US, you can have the combo.

that’s cool

Someone compared the HA Voice Assist PE with the XVF3800 running ESPHome. The conclusion is that the PE has better and more optimized software, and therefore its user experience currently wins over the XVF3800 solution. This may change with improved software for the XVF3800, but right now it seems to lag behind.
Voice PE war gestern? ReSpeaker XVF3800 als Home Assistant Sprachassistent - YouTube (in German)
