It's sort of strange, as the audio out is being returned from the upstream ASR, when really the upstream ASR and intent response could just stream to wireless audio.
I am not really a fan of how Mike sets out the voice infrastructure, as audio out is central and likely should not need to be in a microphone enclosure.
The Python Wyoming 'open standard' is freshly created, whilst Linux has a huge array of high-performance C audio libs. It doesn't make a lot of sense, for me at least, when we have ALSA through PulseAudio to the newer PipeWire. You can just pipe to a network socket if you wished, which again uses high-performance existing Linux libs rather than Python creating unnecessary load on embedded hardware…
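On the "just pipe to a network socket" point, here's a minimal sketch of the idea, written in Python purely for illustration (in practice something like `arecord -f S16_LE -r 16000 -t raw | nc host port` does the same with existing C tools and zero code). The host and port below are placeholders, not a real service:

```python
# Minimal sketch: forward raw PCM bytes from any file-like source to a
# TCP socket, which is all a networked "audio out" transport really is.
# In real use the source would be the mic capture pipeline.
import socket


def stream_pcm(source, host, port, chunk=4096):
    """Send raw PCM bytes from a file-like source to host:port."""
    with socket.create_connection((host, port)) as sock:
        while True:
            data = source.read(chunk)
            if not data:
                break
            sock.sendall(data)
```

The receiving end can be anything that reads a socket, which is why the author's point stands: no bespoke protocol layer is strictly required to move audio around.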
But when you have great open-source wireless audio such as Squeezelite or Snapcast, it makes even less sense to me.
I have been following Rhasspy and Mycroft from the early days and have a repo at StuartIanNaylor · GitHub, but I'm thinking of starting again with LinuxVoiceContainer · GitHub, just to create some tutorials on how to DIY and use some of the first-class, high-performance audio libs Linux already has to offer.
I'm building a beamforming microphone array on a Pi Zero 2 or Radxa, as I think I can do better with open source than opting for closed-source hardware such as the XMOS…
Over the next couple of days I will be making some vids and tutorials on LinuxVoiceContainer · GitHub as an alternative to the HA offering, as the implementation often has me bemused.
Only little things, but they add up: with stereo beamforming you generally have a front-facing device, where the enclosure itself acts to attenuate sound from the rear.
Mics-up as with HA, with 2 mics on top, the beamforming is only on the x axis; three mics in a triangular config is the minimum needed to also cover the y axis…
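To illustrate the geometry point: the far-field delay between a mic pair depends only on the source direction's component along the mic-to-mic axis, so a 2-mic pair on one axis can't tell mirrored directions apart, while a third mic off that axis can. A quick sketch (spacings and angles are illustrative, not the HA unit's actual layout):

```python
# Far-field time difference of arrival (TDOA) between two mics.
# The delay is the dot product of the source direction with the
# mic-to-mic axis divided by the speed of sound, so any direction
# component perpendicular to that axis contributes nothing.
import math

SPEED_OF_SOUND = 343.0  # m/s at room temperature


def pair_delay(mic_a, mic_b, azimuth_deg, elevation_deg=0.0):
    """Far-field TDOA in seconds between two mics for a given direction."""
    az = math.radians(azimuth_deg)
    el = math.radians(elevation_deg)
    # Unit vector pointing toward the source.
    d = (math.cos(el) * math.cos(az), math.cos(el) * math.sin(az), math.sin(el))
    axis = tuple(b - a for a, b in zip(mic_a, mic_b))
    return sum(di * xi for di, xi in zip(d, axis)) / SPEED_OF_SOUND


# Two mics 5 cm apart along the x axis: sources at +30° and -30° azimuth
# produce identical delays, so the pair is blind on the y axis.
a, b = (0.0, 0.0, 0.0), (0.05, 0.0, 0.0)
```

Adding a third mic at, say, `(0.025, 0.043, 0.0)` breaks that symmetry, which is the triangular-config argument in numbers.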
I guess you could use the HA unit on its side, but the wheel and button don't lend themselves to that, given the manner in which it's been constructed.
In fact, why have a wheel and button on a voice input device at all… again, bemused.
Also, why use Whisper when it's huge and not that great for command sentences, and why are we waiting for HA's own ASR when so much existing ASR is already production-proven?
HA is a great piece of open-source automation control software for nearly all home control devices and protocols.
I am confused why, like Google and the rest, they seem to be making their own embedded brand of everything from ASR and TTS to wireless audio, when so much already exists in the open-source arena.
My current favourite for ASR is GitHub - wenet-e2e/wenet: Production First and Production Ready End-to-End Speech Recognition Toolkit, as it's massively lighter than Whisper and can run on much lighter hardware, or act as a central ASR on a multi-client system where recognition latency gets very small the more hardware you throw at it.
It's all been really frustrating, as open source does have competing software, but it has nowhere near the levels of discipline in the datasets that big data has to train with, and this is still true.
I am not sure why more focus isn't put on creating truly high-quality large datasets and training new language models for existing toolkits, rather than refactoring and creating own-brand modules…
But hey…
Excellent! It looks really cool, I like that I can plug it into another speaker for music. I imagine including a high quality speaker at this point bumped up the price too much. Excited for the RGB ring light too.
I think it needs a more friendly name though, something like Harvey - H(ome)A(ssistant)rV(oice)ey?
I don't think so, and it's a shame it doesn't use existing open-source wireless audio software and just act as a client to one of those.
Not sure how much resource is left on the ESP32-S3, but Squeezelite has been ported to the ESP32, as in https://raspiaudio.com/, whilst the full-blown open-source Sonos alternative Snapcast has much tighter sync that feathers the time sync with zero glitches. A Pi Zero is the minimum for that, with a Pi Zero 2 probably being a better bet, as it's a huge step up for only £5 more.
Squeezelite and Snapcast are great pieces of wireless audio open source: one for lighter hardware (Squeezelite), and the other you could argue is even better than Sonos, with tighter sync and up to 96 kHz multichannel if your hardware can cope. It's all written in high-performing C and supposedly still runs on the original Zero.
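Roughly how that feathered time sync works, as I understand it: rather than dropping or inserting whole chunks when a client's clock drifts (which is audible), the playback rate is nudged by a tiny capped amount until the measured offset decays. A toy sketch with illustrative constants, not Snapcast's actual algorithm:

```python
# Toy sketch of "feathered" sync correction: map a measured clock offset
# to a gentle playback-rate multiplier instead of a hard skip/insert.
# The 0.5% cap is illustrative; it keeps any pitch shift inaudible.
def feathered_rate(offset_ms, max_correction=0.005):
    """Return a playback-rate multiplier that gently corrects clock offset.

    offset_ms > 0 means this client is behind the server, so it plays
    slightly faster; offset_ms < 0 means it is ahead, so slightly slower.
    """
    correction = offset_ms / 1000.0
    correction = max(-max_correction, min(max_correction, correction))
    return 1.0 + correction
```

Applied every sync interval, small offsets produce proportionally small nudges and large offsets are corrected at the capped rate over several seconds, which is why there are no glitches.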
MUTE only means one thing - to stop something from producing sound. Maybe someone would use the word to imply silencing a mic, but that’s incorrect usage of the word.
One of the annoying things to me on the S3-Box-3 was that when timers triggered and the sound played, it could only be stopped by physically pressing the button on the box.
Is it possible to stop the timer alert sound via voice now?
For example, with Google Assistant you set your timer, it triggers and the bell rings, and then you can just say "stop" to have it stop ringing.
Well, “mute” is also quite an industry standard when it comes to microphones, so that in itself is fine. The tricky part is that in this device you have both directions, sound-wise.
Does it play well enough with an HA Green? I just bought one a month ago, and now I am hearing that the voice assistant that just launched might need better hardware to process the requests, which would be a pity and a waste of money on my already-made purchase.
That Rockchip SoC isn't going to be fast enough to run LLMs locally, but otherwise you should be able to run the voice model locally with decent performance (guesstimate 3-5 s response time for basic on/off commands).
Your best bet is going to be to just use Home Assistant Cloud, though, and not do it 100% locally.
To be fair: the product (documentation) page was updated quite a bit in the last 24 hours, so things that were missing before were only added later.
Also: this device is marketed as an "out of the box" experience without needing to DIY. A livestream recording for the hardcore fans is nice to have, but it shouldn't be needed when the documentation is up to date. Which it now (mostly) is.