The era of open voice assistants has arrived

This 100% open-source platform, designed to be heavily extended, is really what we needed to get to the next level. Hardware and code possibilities with ESPHome are so wide that we really lacked a common starting point. Lossless audio streaming really is the cherry on top of the cake. Great vision and outstanding achievement!

OK Nabu. Take my money, send me the PE and just leave and take a well-earned Christmas holiday now!!!

Build in a 3.5 mm audio jack and a streaming player, otherwise it’s not interesting for me.

Good pairing with Music Assistant.

It has one.

Can someone please make a design as replacement PCBs for all Google Nest / Google Home speakers?

That is, now that the production-ready IC components are finalized and the PCB design/schematics are open source, all of this can serve as the reference hardware design. I hope some people in this community are skilled electrical engineers who are interested in making replacement PCB designs, with open-source schematics, for existing smart speaker products, so we can retrofit/convert them into ESP32 + XMOS hardware running ESPHome firmware.

I mean, I would love the option to just swap out the circuit board internals, if that could repurpose most existing Google Nest / Google Home and Amazon Echo smart speaker hardware, as many of those already have nice enclosures and good-enough built-in speakers to play music for multi-room audio (at least if you are not too picky about your Hi-Fi audio quality).

This would be similar to the Onju Voice project, which previously released open-source PCB schematics for a Google Nest Mini / Google Home Mini PCB replacement, with updated PCB designs requested once it became clear that XMOS was going to be used (in combination with ESP32-S3):

Really, open source should be interoperable with already existing open-source software.
There are 2 great pieces of open-source wireless audio software: Squeezelite, which runs on an ESP32 (see RASPIAUDIO on GitHub), and Snapcast (GitHub: badaix/snapcast, a synchronous multiroom audio player), which runs on a Pi.

Squeezelite is more limited than Snapcast, as Snapcast is a full-blown open-source Sonos challenger for wireless multichannel audio.

You place your speakers in the best place for speakers, which is usually a stereo pair on a facing wall giving room coverage.
This allows your microphone to be optimally placed, close to you and away from your speakers. By not cloning the commercial smart speakers you also get far more choice: not only are they your smart speakers, they can be cast to by any device you set up with open-source casting software.
You can pick whatever amplifier you wish, and each speaker can be active and wireless, or a receiver may drive several speakers.

My setup is Snapcast with a Pi that needs no enclosure, as it’s stuck on the back of a subwoofer I got from eBay for £20. There is a whole load of very cheap but amazing-quality kit around, as class D amp boards have improved so much.

If you want a little more quality, then Wondom (Sure Electronics) make some great audio boards.
I have 2x bookshelf speakers, which again were second-hand eBay buys, as some great bargains can be had.

Not embedding a speaker creates choice and opens up to other devices that can cast to them so those speakers can be the output for all room media not just a ‘smart speaker’ …
It also makes enclosure design much easier, as the engineering that goes into the Google devices and the rest is actually immense. You can check out a Nest Audio and its rigid cast-metal body, which stops resonance in the casing and helps isolate the speaker from the microphone array: https://www.youtube.com/watch?v=4-3VodA-Nlo

Separating the microphone just makes the software and engineering needs so much easier. Enclosures… stick your amps to the back of your speakers on hex pillars and feed them from a 24 V brick PSU… :slight_smile:

I might have misunderstood it, but how do the timers work? Do they run on the device itself? Because that is the biggest problem I have with timers in Alexa: they run on that device. Is it possible to say “set tea timer for 2 minutes” and get it to start an actual timer entity in HA, so I can display it on a dashboard and have it announced in whatever room I’m currently in, instead of on the original device?
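For reference, starting a timer entity in HA from outside is possible through the REST API's service endpoint, whatever the voice pipeline itself does. A minimal sketch, assuming a timer helper called `timer.tea` already exists; the host name, token and entity ID here are placeholders, not anything confirmed in this thread, and the request is only built, not sent:

```python
import json
import urllib.request


def build_timer_start_request(base_url, token, entity_id, seconds):
    """Build (but do not send) a call to Home Assistant's timer.start service."""
    hours, rem = divmod(seconds, 3600)
    minutes, secs = divmod(rem, 60)
    payload = {
        "entity_id": entity_id,  # hypothetical helper, e.g. timer.tea
        "duration": f"{hours:02d}:{minutes:02d}:{secs:02d}",
    }
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/api/services/timer/start",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",  # long-lived access token
            "Content-Type": "application/json",
        },
        method="POST",
    )


if __name__ == "__main__":
    # Placeholder host and token; replace with your own before sending.
    req = build_timer_start_request(
        "http://homeassistant.local:8123", "YOUR_LONG_LIVED_TOKEN", "timer.tea", 120
    )
    print(req.full_url)
    print(req.data.decode("utf-8"))
    # urllib.request.urlopen(req) would actually send the request.
```

Because the timer then lives in HA as an entity, it can be shown on a dashboard and its finish event can drive an announcement automation in any room. Whether the voice device's own "set timer" intent can be routed this way is a separate question that this sketch does not answer.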

Congrats on the product release! I’ve ordered one.

Question:

  • I’d like to use it locally controlled only
  • the only thing I want to do is have the assistant execute at most 10 scripts
  • no interest at all in LLMs and all the fancy stuff
  • English language is fine

Is it reasonable to do this on an Intel i5-10500, with 2 cores exposed to Home Assistant OS, without an (i)GPU?

They added more files overnight; this page is pretty extensive by now: Downloads – Home Assistant Voice Preview Edition

Bummer they must be out of stock now!

It’s sort of strange, as the audio out is being returned from the upstream ASR, when really the upstream ASR and intent response could just stream to wireless audio.

A Pi with a ReSpeaker 2-mic can do the same, but I dislike how the driver software lags behind each new version of RaspiOS, and I prefer using stereo USB soundcards such as the Plugable USB Audio Adapter (Plugable Technologies) or the ADA-15 USB HQ MINI audio (Axagon).

I am not really a fan of how Mike sets out the voice infrastructure, as audio out is central and likely should not need to be in a microphone enclosure.
The Python Wyoming ‘open standard’ is freshly created, whilst Linux already has a huge array of high-performance C libs for audio, so it doesn’t make a lot of sense, for me at least, when we have ALSA, Pulse and the newer PipeWire. You can just pipe to a network socket if you wish, which again uses existing high-performance Linux libs rather than Python creating unnecessary load for embedded…
But when you have great open-source wireless audio such as Squeezelite or Snapcast, it makes even less sense to me.

I have been following Rhasspy and Mycroft from the early days and have a repo on GitHub (StuartIanNaylor), but I am thinking of starting again with LinuxVoiceContainer on GitHub, just to create some tutorials on how to DIY and use some of the first-class, high-performance audio libs Linux already has to offer.
That includes building a beamforming microphone array on a Pi Zero 2 or Radxa, as I think I can do better with open source than opting for closed-source hardware such as the XMOS…
Over the next couple of days I will be making some videos and tutorials on LinuxVoiceContainer as an alternative to the HA offering, as the implementation often has me bemused.
Only little things, but they add up: with stereo beamforming you generally have a front-facing device, where the enclosure itself acts to attenuate sound from the rear.
With the mics facing up, as on the HA unit with 2 mics on top, the beamforming is only on the X axis, as three mics in a triangular config is the minimum needed to include the Y planar axis as well…
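The two-mic limitation can be illustrated with plain geometry. This is a rough sketch of the time-difference-of-arrival (TDOA) idea that beam steering relies on, not a beamformer and not how the XMOS chip does it; the mic spacing and source positions are made-up numbers:

```python
import math

SPEED_OF_SOUND = 343.0  # m/s, approximate value at room temperature


def tdoa(mic_a, mic_b, source):
    """Arrival-time difference (seconds) between two mics for a point source."""
    return (math.dist(mic_a, source) - math.dist(mic_b, source)) / SPEED_OF_SOUND


# Hypothetical layout in metres: two mics on the X axis, optional third mic off-axis.
left, right = (-0.03, 0.0), (0.03, 0.0)
third = (0.0, 0.05)

front = (1.0, 2.0)   # talker in front of the array
back = (1.0, -2.0)   # mirror image of the same talker behind the array

# With only the X-axis pair, front and back give the same delay: they are indistinguishable.
pair_front = tdoa(left, right, front)
pair_back = tdoa(left, right, back)

# Adding a third mic off the X axis yields a delay that differs between front and back.
tri_front = tdoa(left, third, front)
tri_back = tdoa(left, third, back)

if __name__ == "__main__":
    print("2 mics, front vs back:", pair_front, pair_back)
    print("3rd mic, front vs back:", tri_front, tri_back)
```

The mirrored source is exactly as far from each X-axis mic as the original, so the pair alone cannot tell which side of the axis the sound came from; the off-axis mic breaks that symmetry.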
Guess you could use the HA unit on its side, but the wheel and button don’t lend themselves to that in the manner it’s been constructed.
In fact, why have a wheel and button for a voice input… and again, bemused.
Also, why use Whisper, as it’s huge and not that great for command sentences, and why are we waiting for HA ASR when so much existing ASR is already production-proven?
HA is a great piece of opensource automation control software for near all home control devices and protocols.
I am confused why, like Google and the rest, they seem to be making their own embedded brand of everything from ASR and TTS to wireless audio, when so much already exists in the open-source arena.
My current favorite for ASR is WeNet (GitHub: wenet-e2e/wenet, a “Production First and Production Ready End-to-End Speech Recognition Toolkit”), as it’s massively lighter than Whisper and can run on much lighter hardware, or be a central ASR on a multi-client system where recognition latency gets very small the more hardware you throw at it.
It’s all been really frustrating, as open source does have competing software but has nowhere near the level of discipline in datasets that big data has for training, and this is still true.
I am not sure why there isn’t more focus on creating truly high-quality large datasets and new language models for existing software, rather than refactoring and creating own-brand modules…
But hey…

Excellent! It looks really cool, and I like that I can plug it into another speaker for music. I imagine including a high-quality speaker at this point would have bumped up the price too much. Excited for the RGB ring light too.

I think it needs a more friendly name though, something like Harvey - H(ome)A(ssistant)rV(oice)ey?

Great! But please don’t forget us in New Zealand as well. :crossed_fingers::wink:

Is there a way to have an intercom feature between 2 of these devices?

I know it’s a pipe dream, but it would make me purchase 10 of these: can they sync music???

Praying it’s a yes.

This would be cool

Don’t think so, and it’s a shame it doesn’t use existing open-source wireless audio software and just act as a client to one of those.
Not sure how many resources are left on the ESP32-S3, but Squeezelite has been ported to the ESP32, as in https://raspiaudio.com/, whilst Snapcast, the full-blown open-source Sonos alternative, has much tighter sync that feathers the time sync with zero glitches; for that a Pi Zero is the minimum, with a Pi Zero 2 probably being a better bet as it is a huge step up for only £5 more.
Squeezelite and Snapcast are great pieces of open-source wireless audio software: one for lighter hardware (Squeezelite), and the other arguably even better than Sonos, with tighter sync and up to 96 kHz multichannel if your hardware can cope. It’s all written in high-performing C and supposedly still runs on the original Zero.

Is there a way to improve response speed of the voice assistant by adding some hardware accelerator like Hailo AI?

MUTE only means one thing - to stop something from producing sound. Maybe someone would use the word to imply silencing a mic, but that’s incorrect usage of the word.

One of the annoying things to me on the S3-Box-3 was that when timers triggered and the sound played, it could only be stopped by physically pressing the button on the box.

Is it possible to stop the timer alert sound via voice now?

For example, with Google Assistant you set your timer, it triggers and the bell rings, and then you can just say “stop” to have it stop ringing.

I’d love this sort of functionality here.