I’ve tried a lot already…
I had a pretty bad experience with llama-3.2-3b as an HA controller.
The qwen2.5:7b model works better in my testing.
The weak point is faster-whisper; sometimes it invents long sentences I never said.
I am using Home Assistant cloud for STT; faster-whisper was less reliable even with simple intents.
I want to try Rhasspy Speech instead of faster-whisper, but the latest Docker container is not available.
Really enjoying this device so far, well done! Shipped from Seeed Studio in about a week.
Question: How do I get Assist to respond on a different (Sonos) media_player? I took over the PE in ESPHome but I can’t find much documentation on the config. Thanks!
You should be able to play the announcement and the response on another media player that you have imported from Home Assistant within the ESPHome configuration. However, I have no idea what effect that has on the built-in echo cancellation of the XMOS chip.
Interested in this question too, since the internal speaker in the PE is a disaster and not usable.
It’s not bad for a voice assistant. Most non-premium assistant products on the market have similar quality.
If you want to rock tunes with it, yeah, you must hook up to a decent powered speaker. For example, my living room assistant is hooked up to a 5.1 JBL LSR + Rythmik + Marantz combo. But a decent boombox or set of computer speakers with line in should be more than enough for you.
Dunno. In terms of open source, voice-engine/ec (the Echo Canceller that is part of the Voice Engine project), which is based on xiph/speexdsp (the Speex audio processing library; the GitHub repo is a mirror, development happens at https://gitlab.xiph.org/xiph/speexdsp), is massively finicky about clock drift and requires the mic and ref signal to be on the same audio hardware.
The XMOS AEC is called ‘adaptive’, but it doesn’t seem to be stated whether it’s non-linear or not. I think linear AEC is a bit like the above, in that the ref signal and mic need to be tightly synced; experimentation will tell, though.
The adaptive bit is the initial delay setting for the distance from speaker to mic, and it has a long tail, so if it isn’t like the above it could work, but my guesstimate is that it’s linear.
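To make the ‘linear’ part concrete, here is a minimal sketch of a linear adaptive echo canceller (NLMS) in plain Python. It is the textbook idea only, not the XMOS or speexdsp algorithm; the function names, tap count and step size are illustrative assumptions, and the signals are simulated.

```python
import random

def nlms_cancel(ref, mic, taps=32, mu=0.5, eps=1e-6):
    """Subtract a linearly filtered copy of ref from mic (NLMS sketch)."""
    w = [0.0] * taps              # adaptive estimate of the echo path
    history = [0.0] * taps        # most recent ref samples, newest first
    residual = []
    for r, m in zip(ref, mic):
        history = [r] + history[:-1]
        echo_estimate = sum(wk * xk for wk, xk in zip(w, history))
        e = m - echo_estimate     # what is left after cancellation
        norm = sum(xk * xk for xk in history) + eps
        # Normalised LMS update: step scaled by the ref signal energy.
        w = [wk + mu * e * xk / norm for wk, xk in zip(w, history)]
        residual.append(e)
    return residual

def simulate_echo(length=4000, delay=5, gain=0.6, seed=1):
    """Simulated setup: the mic hears only a delayed, attenuated copy of ref."""
    rng = random.Random(seed)
    ref = [rng.uniform(-1.0, 1.0) for _ in range(length)]
    mic = [0.0] * delay + [gain * s for s in ref[:length - delay]]
    return ref, mic

def power(samples):
    """Mean squared amplitude of a list of samples."""
    return sum(s * s for s in samples) / len(samples)

if __name__ == "__main__":
    ref, mic = simulate_echo()
    residual = nlms_cancel(ref, mic)
    tail = len(mic) // 4
    print("mic power:     ", power(mic[-tail:]))
    print("residual power:", power(residual[-tail:]))
```

This only settles because ref and mic share one sample clock in the simulation. If the two clocks drift, the echo path keeps moving from the filter’s point of view, which is the clock-drift problem described above.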
With the Voice Engine AEC there is a Python script to get the delay/latency, which you then hard-set as a parameter, whilst XMOS does it on the fly.
Also, when you don’t have your mic sat on top of your speaker, AEC becomes less of an issue, particularly when the mic is in much closer proximity and especially when using beamforming or targeted voice extraction.
All the open source seems to be Arm/x86 based. WebRTC also had an AEC method, but it didn’t seem to work well on Arm, which may have something to do with the scheduler/branch prediction on Arm.
Maybe it’s something PREEMPT_RT may fix and it’s just lower-end SoC latency…
A big part of why linear AEC is important is that you are going to clone commercial all-in-one ‘smart-speaker’ setups.
I have wondered if the tight timing sync of snapcast clients could be used with AEC, where one client is the mic ref signal on the mic hardware whilst another client is the input to the audio player hardware. I think it requires a non-linear algorithm, and I haven’t found a suitable one apart from ML models that are quite computationally heavy (a Pi 5/RK3588/N100 would likely have no problem, though, where a Pi 4 starts to struggle). DTLN AEC is probably one.
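This is not the actual script from that project, but a minimal sketch of how such a delay is commonly measured: slide the mic recording against the reference signal and pick the lag with the highest cross-correlation. The function names and the simulated signals are assumptions for illustration.

```python
import random

def estimate_delay(ref, mic, max_lag):
    """Return the lag (in samples) at which mic best matches ref."""
    best_lag, best_score = 0, float("-inf")
    for lag in range(max_lag + 1):
        # Correlate ref[i] with mic[i + lag] over the overlapping part.
        n = min(len(ref), len(mic) - lag)
        score = sum(ref[i] * mic[i + lag] for i in range(n))
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag

def simulate(delay, length=2000, seed=0):
    """Simulated mic signal: attenuated, delayed ref plus a little noise."""
    rng = random.Random(seed)
    ref = [rng.uniform(-1.0, 1.0) for _ in range(length)]
    mic = [0.0] * delay + [0.6 * s for s in ref]
    mic = [m + rng.gauss(0.0, 0.05) for m in mic]
    return ref, mic

if __name__ == "__main__":
    ref, mic = simulate(delay=37)
    lag = estimate_delay(ref, mic, max_lag=200)
    print("estimated delay in samples:", lag)
```

The result is in samples, so dividing by the sample rate gives the latency in seconds. A hard-set value like this only stays valid while the playback and capture paths keep the same buffering.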
Thanks. I’m not planning to use it for music, but considering the price, the speaker quality should be better. While the microphone performs well, the speaker is a major letdown that hurts my ears. Adding more accessories via the 3.5 mm jack to improve the sound quality is not the right option, so I’m looking for ways to enhance it via software with the existing hardware. If that’s not possible, it will sadly end up as electronic waste.
I think you missed the point that this has very clearly been released as first-generation “preview edition” hardware and is meant more to serve as a reference platform and the start of a new development platform, so that other hardware developers and companies can use it as a base to build their own custom hardware variants with extended functions and/or additional features.
This “preview edition” is more a fully open-source platform to allow experimentation and help new ideas grow, and as such it is not really a product designed for end users who expect a completely finished product. So if you want something that works out of the box, you just have to wait until others come out with derivative products based on this platform.
This reference platform will also continue to grow software-wise, and I am sure a second generation will come later to use as a base, as well as variants with a large display/screen.
As for the relatively high cost, it is due to economy of scale: Nabu Casa cannot compete with the likes of Amazon and Google in mass manufacturing to keep costs down. On top of that, the hardware is not subsidized by advertising money, as no one working on this gets paid to make it a way to deliver commercials or to serve as a sales platform like the Amazon Echo and Google Nest/Home products.
Anyway, the onboard speaker of this “preview edition” is tiny, so I do not think you can make it sound much better with software optimization; that is just the physical limitation of having such a tiny speaker. The solution is either to use an external speaker or to replace the onboard one. At the size it is, there is no way to improve it without fully replacing it.
Perhaps I did miss something, but my understanding, shared by others here, is that ‘preview’ refers to the current software and upcoming updates. From what I’ve seen in the HA release notes, there’s no indication of new hardware being planned anytime soon. Also, it doesn’t make sense to label hardware as ‘preview’, as there’s nothing to develop except installing a decent speaker from the start instead of this subpar one, and then trying to sell it with the excuse of being a ‘preview edition’ while charging full price.
Regarding fixing it via software, I meant redirecting audio output to another device. It’s clear you can’t improve bad hardware in any other way, but I believe such redirection should be technically possible.
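A minimal sketch of that kind of redirection, done from outside the device: Home Assistant’s REST API can call the `tts.speak` action and target any media player. Note that this only speaks a given text on another player; it does not change where the PE’s Assist pipeline sends its replies. The base URL, token and entity IDs below are placeholders (assumptions), not values from this thread.

```python
import json
import urllib.request

def build_tts_request(base_url, token, tts_entity, player_entity, message):
    """Build (but do not send) a POST to Home Assistant's tts.speak action."""
    payload = {
        "entity_id": tts_entity,                   # TTS engine entity
        "media_player_entity_id": player_entity,   # where the audio plays
        "message": message,
    }
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/api/services/tts/speak",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )

if __name__ == "__main__":
    req = build_tts_request(
        "http://homeassistant.local:8123",   # placeholder URL
        "YOUR_LONG_LIVED_TOKEN",             # placeholder token
        "tts.home_assistant_cloud",          # hypothetical TTS entity
        "media_player.living_room_sonos",    # hypothetical Sonos entity
        "The front door is open.",
    )
    print(req.get_method(), req.full_url)
    # To actually send it: urllib.request.urlopen(req)
```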
No, you have definitely misunderstood the main points of this “preview edition” release. Yes, it is correct that the software is not mature yet, but it is just as important to understand that this “preview edition” hardware is not meant to be “the final design”. The point is instead that others should step up and build their own hardware variants, so I recommend that you read the whole blog post again now that you know this, see:
Especially re-read these sections:
- Bringing choice to voice
- Fully open and customizable
- Community-driven
- Conclusion
- See what voice can do today
For the record, many other third-party companies as well as independent community developers are already working on their own hardware variants based on this reference design, and I am sure some of those will come with a speaker or several speakers out of the box (while some other variants could potentially come without a speaker at all and require you to always add your own).
PS: Paulus has made all this even clearer in many interviews he has done with the press and podcasters in the last year.
Clear on what exactly? Any specific citation?
Again, I really appreciate everything else about this product, from the packaging and minimalist footprint to the sleek design, well-functioning microphone, and the promise of future software updates. However, the decision to cut corners on the speaker quality undermines all these positives for this product and its utility for me personally.
Because it is a PREVIEW device… That’s what hedda is saying. It’s what the team has been very careful to say… They’re asking for feedback. There’s already at least one project out there doing something similar with a much higher quality DAC and, I assume, speaker (not to dig it up, but the way they’re handling comms on the VPE is light years better than how they handled comms on backup).
I do agree they’ve been very above board about that in every public appearance and document.
But…
I will also agree that a tiny fraction of actual users read those documents, a smaller fraction still watch the videos, and I’m probably the only nerd watching Paulus on the Vergecast… (Patel, if you ever read this, DM me.)
That said, I had to order some chokes to isolate and get rid of that horrid whine on the 3.5mm. Seems like the current design is susceptible to ground loops. I agree the onboard speaker is absolute trash. But I’m hooking it into a Denon AVR, so I really don’t care.
And this is my personal feedback on the commercial product made by NC that I purchased with real money, not ‘preview’ money. My initial comment, and still my focus, is on finding a software solution. However, the discussion seems to have shifted toward the usual ‘preview’ aka ‘beta’ narrative that’s commonly brought up on this forum.
You can add constraints to the prompt, for example telling the model to keep its answers short and not ramble.
Come on, I am not going to relisten to podcasts and transcribe them for you, so you can either go listen to them yourself or not. If you really care and are not just here to whine, then search for “paulus schoutsen podcast” for the last few months, as there is more than just one. → Google Search
Did you even read the later sections of the full announcement, where this is explained again, if you understand everything they wrote there? → The era of open voice assistants has arrived - Home Assistant
Anyway, the gist is that this project follows the open-design movement, and to ignite that they more or less released this as a fully open design with a reference model under an open-source hardware license and open-source firmware, because they want others to improve on it. See → Open-design movement - Wikipedia
It is the same concept as with RepRap 3D printers (which, as you probably know, has blown up because of this idea of being fully open-source and community-driven) → RepRap - Wikipedia
I don’t have OCD to collect junk and electronic waste. Hopefully, you can read the screenshot, if you understand what’s written, as well as read the transcript from the release video yourself.
I prefer being realistic over blind cheering and expect to get what was promised for my money.
That says today (implied: under the applied constraints).
Very obviously, one could build better hardware for a higher price even today.