The reason I ask is that I think the computational load would likely increase the need for larger servers. Hopefully that’s already taken into account in the current subscription fee.
This 100% open-source platform, designed to be heavily extended, is really what we needed to get to the next level. The hardware and code possibilities with ESPHome are so wide that we really lacked a common starting point. Lossless audio streaming really is the cherry on top of the cake. Great vision and an outstanding achievement!
OK Nabu. Take my money, send me the PE, and just go take a well-earned Christmas holiday now!!!
Can someone please design replacement PCBs for all the Google Nest / Google Home speakers?
That is, now that the production-ready IC components are finalized and the PCB design/schematics are open source, I hope some skilled electrical engineers in this community will be interested in using all this as a reference hardware design and making replacement PCB designs, with open-source schematics, for existing smart-speaker products, so we can retrofit/convert them into ESP32 + XMOS hardware running ESPHome firmware.
I mean, I would love the option to just swap out the circuit-board internals: we could repurpose most existing Google Nest / Google Home and Amazon Echo smart-speaker hardware, as many of those already have nice enclosures and good-enough built-in speakers for multi-room audio (at least if you are not too picky about Hi-Fi audio quality).
Squeezelite is more limited than Snapcast, as Snapcast is a full-blown open-source Sonos challenger for wireless multi-room audio.
You place your speakers in the best position for speakers, which is usually a stereo pair on a facing wall, giving room coverage.
This allows your microphone to be optimally placed, close to you and away from your speakers. You also get far more choice: not only are they your smart speakers, they can also be cast to by any device you set up with open-source casting software.
You can pick whichever amplifier you wish, and each speaker can be active and wireless, or a receiver may drive several speakers.
My setup is Snapcast with a Pi that needs no enclosure, as it’s stuck on the back of a subwoofer I got from eBay for £20. There is a whole load of very cheap but amazing-quality boards out there, as class-D amps have improved so much.
Not embedding a speaker creates choice and opens things up to other devices that can cast to them, so those speakers can be the output for all room media, not just a ‘smart speaker’ …
It also makes enclosure design much easier, as the engineering that goes into the Google devices and the rest is actually immense; check out a Nest Audio and its rigid cast-metal body, there to stop resonance in the casing and help isolate the speakers from the microphone array. https://www.youtube.com/watch?v=4-3VodA-Nlo
Separating the microphone just makes the software and engineering needs so much easier, enclosures included… stick your amps to the back of your speakers on hex pillars and feed them from a 24 V brick PSU…
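For anyone wanting to replicate that Pi-on-the-back-of-a-sub setup, a minimal Snapcast server config looks roughly like this. A sketch from memory, not gospel: key names have varied between Snapcast versions, and `/tmp/snapfifo` is just the conventional default pipe path.

```
# /etc/snapserver.conf — minimal pipe source (assumed layout)
# Anything written to /tmp/snapfifo (e.g. by MPD or librespot)
# is distributed in sync to every connected snapclient.
[stream]
source = pipe:///tmp/snapfifo?name=default&sampleformat=48000:16:2
```

Each room’s Pi or amp board then just runs `snapclient -h <server-ip>` and plays in sync with the rest.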
I might have misunderstood it, but how do the timers work? Do they run on the device itself? Because that is the biggest problem I have with timers in Alexa: they run on that device. Is it possible to say “set tea timer for 2 minutes” and have it start an actual timer entity in HA, so I can display it on a dashboard and have it announced in whatever room I’m currently in, instead of on the original device?
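For what it’s worth, HA does expose a `timer.start` service over its REST API, so a timer *entity* (visible on any dashboard) can be started from anywhere. A hedged sketch, not the official way the voice pipeline does it: `timer.tea`, the URL, and the token below are placeholders for your own setup.

```python
# Sketch: build the REST call that starts a timer entity in Home Assistant
# itself, rather than a timer living on one device. Placeholders throughout.
import json
import urllib.request

HA_URL = "http://homeassistant.local:8123"   # assumed HA address
TOKEN = "YOUR_LONG_LIVED_ACCESS_TOKEN"       # placeholder token

def build_timer_start(entity_id: str, duration: str) -> urllib.request.Request:
    """Prepare a POST to HA's timer.start service for the given timer entity."""
    payload = json.dumps({"entity_id": entity_id, "duration": duration}).encode()
    return urllib.request.Request(
        f"{HA_URL}/api/services/timer/start",
        data=payload,
        headers={
            "Authorization": f"Bearer {TOKEN}",
            "Content-Type": "application/json",
        },
        method="POST",
    )

req = build_timer_start("timer.tea", "00:02:00")
print(req.get_full_url())
# urllib.request.urlopen(req)  # only uncomment against a real HA instance
```

Once the timer entity is running in HA, announcing it in whichever room you are in is then an automation problem, not a device problem.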
It’s sort of strange, as the audio out is being returned from the upstream ASR, when really the upstream ASR and intent response could just stream to wireless audio.
I am not really a fan of how Mike sets out the voice infrastructure, as yeah, audio out is central and likely should not need to be in a microphone enclosure.
The Python Wyoming ‘open standard’ is freshly created, whilst Linux has a huge array of high-performance C libs for audio. It doesn’t make a lot of sense, for me at least, when we have ALSA, PulseAudio, and the newer PipeWire. You can just pipe to a network socket if you wished, which again uses high-performance existing Linux libs rather than having Python create unnecessary load on embedded hardware…
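To make the “just pipe to a network socket” point concrete, here’s a minimal sketch: a receiver collects raw PCM chunks over TCP while a sender streams them. In practice you would feed the sender from ALSA (e.g. `arecord -f S16_LE -r 16000 | ...`); here synthetic bytes are looped back so the sketch is self-contained.

```python
# Minimal TCP loopback for raw PCM: one receiver thread, one sender.
import socket
import threading

CHUNK = 4096  # bytes per write, a typical capture buffer size

def serve_once(host="127.0.0.1", port=0):
    """Listen on an ephemeral port; accept one connection and collect its bytes."""
    srv = socket.socket()
    srv.bind((host, port))
    srv.listen(1)
    received = bytearray()

    def run():
        conn, _ = srv.accept()
        with conn:
            while True:
                data = conn.recv(CHUNK)
                if not data:          # sender closed: stream finished
                    break
                received.extend(data)

    t = threading.Thread(target=run)
    t.start()
    return srv.getsockname()[1], t, received, srv

def stream(pcm: bytes, port: int, host="127.0.0.1"):
    """Send raw PCM to the receiver in CHUNK-sized writes."""
    with socket.create_connection((host, port)) as sock:
        for i in range(0, len(pcm), CHUNK):
            sock.sendall(pcm[i:i + CHUNK])

if __name__ == "__main__":
    fake_pcm = bytes(range(256)) * 64   # stand-in for mic capture
    port, t, received, srv = serve_once()
    stream(fake_pcm, port)
    t.join()
    srv.close()
    print(f"looped back {len(received)} bytes")
```

The same pattern works with zero Python on the embedded end: `arecord | nc <host> <port>` does the sender’s job with existing Linux tools.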
But when you have great open-source wireless audio such as Squeezelite or Snapcast, it makes even less sense to me.
I have been following Rhasspy and Mycroft from the early days and have a repo at StuartIanNaylor · GitHub, but I’m thinking of starting again with LinuxVoiceContainer · GitHub, just to create some tutorials on how to DIY and use some of the first-class, high-performance audio libs Linux already has to offer.
I’m building a beamforming microphone array on a Pi Zero 2 or Radxa, as I think I can do better with open source than opting for closed-source hardware such as the XMOS…
Over the next couple of days I will be making some vids and tutorials on LinuxVoiceContainer · GitHub as an alternative to the HA offering, as the implementation often has me bemused.
Only little things, but they add up. With stereo beamforming you generally have a front-facing device, where the enclosure itself acts to attenuate sound from the rear.
Top-facing is different: with two mics on top, as on the HA unit, the beamforming is only on the x axis; three mics in a triangular config is the minimum needed to also cover the y planar axis…
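A tiny far-field geometry sketch of that claim (my own illustration with assumed mic spacings, not the actual PE layout): a two-mic pair on the x axis produces identical time-differences-of-arrival for a source mirrored about that axis, so it cannot tell front-left from rear-left; a third mic off-axis breaks the ambiguity.

```python
# Far-field time-difference-of-arrival (TDOA) between two mics,
# for a source at a given azimuth in the horizontal plane.
import math

C = 343.0  # speed of sound in air, m/s

def tdoa(mic_a, mic_b, azimuth_deg):
    """TDOA (seconds) between mic_a and mic_b for a distant source."""
    ux = math.cos(math.radians(azimuth_deg))
    uy = math.sin(math.radians(azimuth_deg))
    proj = lambda m: m[0] * ux + m[1] * uy   # projection onto source direction
    return (proj(mic_a) - proj(mic_b)) / C

# Two mics 40 mm apart on the x axis (an assumed top-mic layout):
pair = [(-0.02, 0.0), (0.02, 0.0)]
front = tdoa(*pair, azimuth_deg=60)    # source front-left
mirror = tdoa(*pair, azimuth_deg=-60)  # same source mirrored about the x axis
print(abs(front - mirror) < 1e-12)     # identical delays: pair is blind to y

# Add a third mic to form a triangle and the mirrored angles now differ:
third = (0.0, 0.035)
front3 = tdoa(pair[0], third, azimuth_deg=60)
mirror3 = tdoa(pair[0], third, azimuth_deg=-60)
print(abs(front3 - mirror3) > 1e-6)    # y axis becomes observable
```

Both prints come out `True`: the pair alone yields the same delay for ±60°, while the triangular layout separates them.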
I guess you could use the HA unit on its side, but the wheel and button don’t lend themselves to that, given the manner in which it’s been constructed.
In fact, why have a wheel and a button on a voice-input device at all… again, bemused.
Also, why use Whisper, as it’s huge and not that great for command sentences? And why are we waiting for an HA ASR when so much existing ASR is already production-proven?
HA is a great piece of open-source automation-control software for nearly all home-control devices and protocols.
I am confused why, like Google and the rest, they seem to be making their own embedded brand of everything, from ASR and TTS to wireless audio, when so much already exists in the open-source arena.
My current favorite for ASR is GitHub - wenet-e2e/wenet: Production First and Production Ready End-to-End Speech Recognition Toolkit, as it’s massively lighter than Whisper and can run on much lighter hardware, or act as a central ASR in a multi-client system where recognition latency gets very small the more hardware you throw at it.
It’s all been really frustrating, as open source does have competing software, but it has nowhere near the discipline in datasets that big data has for training its software, and this is still true.
I am not sure why more effort isn’t focused on creating truly high-quality, large datasets, and on training new language models for existing toolkits, rather than refactoring and creating own-brand modules…
But hey…
Excellent! It looks really cool, I like that I can plug it into another speaker for music. I imagine including a high quality speaker at this point bumped up the price too much. Excited for the RGB ring light too.
I think it needs a more friendly name though, something like Harvey - H(ome)A(ssistant)rV(oice)ey?