Would love to also hear more about the voice hardware roadmap, including future reference hardware plans for both ESPHome on ESP32 and Linux Voice Assistant on ARM64 satellites from the Open Home Foundation and partners. Will you, for example, be making an official product for Linux Voice Assistant that matches the Home Assistant Voice Preview Edition as reference hardware?
Would it be a good idea to have voice satellites with a fixed microphone board and a modular "compute" (a.k.a. "core") board like they have in the FutureProofHomes Satellite1 modular design concept?
That is, make two swappable SoM (System-on-Module) "compute" boards, with one SoM board based on ESP32 and one SoM board based on a powerful ARM64 SoC (similar to Raspberry Pi Zero 2 W)?
Maybe base the ARM64 SoM compute board on a SoC with a built-in NPU that is powerful enough to off-load Speech-to-Text and free up resources for other tasks.
Perhaps even make the ESP32 SoM compute board with multiple ESP32 chips on the same board to off-load the communication tasks and also make it work as a Thread Border Router (if it could combine an ESP32-S3 with an ESP32-C6).
The smallest GPU (yes, you need one in LLM land, it's not optional... VRAM is your limiting factor) you can hope to use and have a decent experience is something better than an Nvidia 3xxx with at LEAST 8GB VRAM. Preferably 16 or more.
That puts you at a MINIMUM of $800 USD. Probably in the low $1000s.
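For a rough feel of why VRAM is the limiting factor, here is a back-of-the-envelope sketch. It only counts the model weights (parameters times bits per weight) plus a flat overhead for context cache and runtime buffers. The function names and the 1.5 GB overhead are my own assumptions for illustration, not figures from any particular runtime, so treat the output as a ballpark only.

```python
def estimate_vram_gb(params_billion: float, bits_per_weight: float,
                     overhead_gb: float = 1.5) -> float:
    """Very rough VRAM estimate in (decimal) GB.

    weights: params * bits / 8 bytes; with params given in billions
    this comes out directly in GB.
    overhead_gb: ASSUMED flat allowance for context cache and buffers;
    real usage depends on context length and the runtime.
    """
    weight_gb = params_billion * bits_per_weight / 8
    return weight_gb + overhead_gb


def fits_in(params_billion: float, bits_per_weight: float,
            card_vram_gb: float) -> bool:
    """True if the rough estimate fits on a card with the given VRAM."""
    return estimate_vram_gb(params_billion, bits_per_weight) <= card_vram_gb


if __name__ == "__main__":
    # Hypothetical 8-billion-parameter model at two precisions.
    for bits in (4, 16):
        need = estimate_vram_gb(8, bits)
        print(f"8B model @ {bits}-bit: ~{need:.1f} GB, "
              f"fits in 8 GB: {fits_in(8, bits, 8)}, "
              f"fits in 16 GB: {fits_in(8, bits, 16)}")
```

Under these assumptions a small quantized model squeezes onto an 8GB card, while anything larger or less aggressively quantized pushes you toward 16GB or more.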
I would not expect them to. They have three (four if you include the stop) perfectly good (even if we don't like them) copyright-cleared terms already and have documented how to make your own.
If they build a new one and run afoul of copyright or service marks or trademark or... they get sued, because they're a business entity.
If you do it... it's your install... have fun. Maybe the copyright owner comes to tell you to stop, but HA isn't sued out of existence just for shipping something.
Given those choices, if I'm your Dev PM I won't LET you make more. Three is fine; we have bigger fish to fry. You have satisfied the build requirements and prevented scope creep and a lawsuit. I call it a win. Now, am I in that room, and do I KNOW they made that decision? No. But coming from the perspective of people who've had to make decisions like that, I wouldn't hold my breath... (probably not)
Finally, I can see a route to moving away from Alexa/Plex. Lighting and music are my primary use cases (with heating, at least logging, being the next most relevant). LMS has replaced Plex now that I have some stand-alone wireless speakers (and I can use about 3 more squeezelite players).
I'm not too worried about the performance of local acceleration; I expect models and hardware to converge soon enough as the next ASIC iterations start coming to market.
Immediate priorities for me are phrase/context accuracy, and music library handling - which both seem to be in hand.