Preface:
Several months ago, around the same time, I finally found an adequate replacement for my existing PoE doorbell (Ring Video Doorbell Elite, replaced by a Reolink) and learned about Home Assistant’s voice assistant pipeline. I’d been itching to get rid of Amazon and Google smart devices for a while, and Ring’s closed system had bothered me for years while the rest of my cameras chugged along on Frigate. My ratgdo controllers have been working flawlessly, eliminating my need for MyQ. I finally had all the pieces I needed to remove almost all cloud-dependent smart home products and services.
The Project:
Transition to a completely self hosted smart home which would allow me to keep all the data local to my home.
This means I needed to replace every piece of Amazon and Google hardware in my home while preserving as much of the existing functionality as possible. That hardware includes:
- 1x Amazon Echo 1st Gen
  - Placed in a large great room open floorplan with dining room and kitchen
- 1x Amazon Echo Dot 1st gen
  - In a basement kitchenette, but largely unused
- 1x Amazon Echo Show 8 (2nd Gen)
  - Originally used to show Ring Doorbell stream upon doorbell being rung. Placed in bedroom with a TV
- 2x Echo Dot (3rd Gen) with Clock
  - Placed in bedrooms and used for alarms
- 1x Echo Input
  - Placed in utility room connected to amps that run speakers built into the ceilings. Used for chime for Ring doorbell
- 1x 10" Lenovo Smart Display from 2018
  - Placed in kitchen primarily used for timers
- 1x Google Nest 7" (2nd Gen)
  - Placed in Kitchenette next to Echo Dot, also used primarily for timers. Hardly used.
- 1x Chromecast Audio
  - Same as Echo Input, connected to whole house audio system.
Since I was already at it, and LLM functionality would help fill the gap between Home Assistant’s voice assistant and the larger tech companies’ offerings, I also decided to integrate a self-hosted LLM to allow smarter integrations and less rigid voice commands.
Importantly, I need to provide minimal friction to the other occupants of the house, who are less patient than I am.
How things are going
If you read through my list of devices that I need to transition, you can probably guess that I have a fairly large system. I have around 190 devices, including around 90 zwave devices, which are mostly in-wall light switches.
I quickly found that no direct replacements really exist for many of the voice assistant devices. No problem, though; I’m an embedded engineer. Let’s get started.
Voice Assistant Hardware
- Replaced Ring Video Doorbell Elite with Reolink PoE doorbell. Frigate+'s detection is way better than Reolink’s. Still haven’t gotten two-way talk working, but that’s low priority.
- Building smart screen with reSpeaker Lite - USB without ESP32 + Pi 4 B+ + RPI 7" Touch Screen to replace Echo Show in bedroom. Unfortunately the USB firmware for the XMOS chip lacks the integration with the USR button and LED on the board. I’m not going to hold my breath for Seeed to implement these any time soon. Mostly, their engineers seem spread pretty thin and they pump out hardware faster than their firmware engineers can keep up.
- Building small reSpeaker Lite Hat v2 + Pi Zero 2 W + some sort of Adafruit LCD/LED display to replace the Echo Dots with Clock in bedrooms
- 1x FutureProofHomes Satellite 1 1st Gen - to replace various Echos and Nest hubs. Working on designing custom enclosures with presence detection. It’s currently in my great room and hardly works from more than 5ft away. That is probably related to my own custom 3D printed enclosure, which I’m working on improving.
- 2x FutureProofHomes Satellite 1 2nd Gen - to replace various echos and nest hubs. Just got these. WIP
- 1x Home Assistant Voice Preview Edition - to replace various echos and nest hubs. I tried integrating with cheap USB speakers to improve audio output. It worked, and the audio is much louder and clearer, but the final product needs to be repackaged to be a little bit cleaner.
- WiiM Pro to replace the Chromecast Audio and Echo Input. So far it is working well, but the amps take several seconds to wake up, so I need to add a delay after waking them before making announcements on the whole house audio, even though the WiiM supports a trigger output for amps. Oh well. I was hoping the trigger output would fix this.
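A rough sketch of the amp delay workaround from the WiiM Pro item, in Python purely for illustration. This is not the actual Home Assistant automation; `wake_amp`, `play`, and the 4 second figure are hypothetical placeholders, and the real wake-up time has to be measured on your own amps.

```python
import time
from typing import Callable

# Hypothetical value: measure how long your own amps take to wake up.
AMP_WAKE_SECONDS = 4.0


def announce_with_warmup(
    wake_amp: Callable[[], None],
    play: Callable[[str], None],
    message: str,
    amp_is_awake: bool,
    wake_seconds: float = AMP_WAKE_SECONDS,
    sleep: Callable[[float], None] = time.sleep,
) -> float:
    """Play a message, waiting for the amp first if it was asleep.

    Returns the number of seconds spent waiting.
    """
    waited = 0.0
    if not amp_is_awake:
        wake_amp()           # e.g. start playback so the trigger output fires
        sleep(wake_seconds)  # give the amp time to come out of standby
        waited = wake_seconds
    play(message)
    return waited


if __name__ == "__main__":
    # Stand-in callbacks so the sketch runs without any hardware.
    announce_with_warmup(
        wake_amp=lambda: print("waking amp"),
        play=lambda msg: print(f"announcing: {msg}"),
        message="Someone is at the front door",
        amp_is_awake=False,
        wake_seconds=0.1,
    )
```

Skipping the wait when the amp is already awake keeps announcements snappy while music is playing, which is the case where a fixed delay would be most annoying.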
Other Hardware
- Seeed TRMNL OG DIY ePaper display to display general house and system status, connected to Terminus server. Getting this set up took a bit more effort than I was prepared for. It works, but needs some fine tuning to get my dashboard looking better on the display.
- Beelink Ryzen 9 AI HX 370 with 64gb of RAM running Whisper and Piper. Has backup Ollama running the unsloth/Qwen3-4B-Instruct-2507-GGUF model, which is sufficient but not quite as robust as larger models with the amount of context I feed to it.
- 2021 Macbook Pro 14" M1 Max with 64gb of RAM running llama.cpp using the gpt-oss 20b model. Currently broken because the Home Assistant base upgraded the python version and it broke the Custom Conversation app. Will be moving Whisper and Piper to it when I get a chance.
- Custom Ryzen 1700X server running OpenMediaVault, 2x Google Coral TPUs, Nvidia GTX 1060. This machine is currently running Frigate and my Terminus server for my TRMNL device. Currently streaming 9 cameras on it with detection and it’s running swimmingly.
I’d say that I’m about 75% of the way to a functional solution. I haven’t pulled out some of the Echo/Google devices because the replacements aren’t properly tuned and/or assembled yet. Beyond that, there are some cloud-dependent services that I can’t get rid of yet:
- Bond - For control of some allen + roth shades. The bridge connects to HA on the local network, so if I’m not mistaken, cloud is only needed for configuring devices. I’ll have to see whether it still works if I block it from outside connections.
- August - I previously had Kwikset ZWave locks, but my ZWave was too crowded and spread out for them to behave reliably especially when modifying user codes or secure inclusion. One of them is the 1st gen that had integrated zwave. Using it via zwave was awful.
- Nabu Casa - For remote access. At some point I might remove this and use Cloudflare tunnels or something else to retain remote access. It’s also nice to have to swap out parts of the voice pipeline as needed while working on the system.
- I guess I should mention Frigate+. The premium detectors are considerably better.
The state of Voice Assistant
- Setup and configuration can be incredibly simple and painless. This part actually was a pleasant surprise. Getting my ESP based voice assistants connected was dead simple. Even my Pi based systems were simple to get connected using the linux-voice-assistant project. Turns out my Beelink has a mic array and works well with the linux-voice-assistant software.
- Running a local LLM was harder than I expected, and still needs a lot of refinement. I have no regrets diving into this and have learned a significant amount about LLMs that I was previously unaware of.
On a smaller system with fewer devices, this might have been easier, but I still had to shrink the number of exposed entities down to the low 80s and crank up context size on some of these models. That meant combining most lights into groups and losing fine control on a lot of devices.
I have numerous computers lying around, including some modern Apple and AMD devices, one of which even has ‘AI’ in the name. My background is in Computer Engineering, and I was a bit shocked at what it takes to successfully run an LLM that ties into your smart home. Responsiveness matters A LOT when interacting by voice. Accuracy also matters a lot when interacting with a smart home. Turning on the wrong light, or just spinning for 30s and then failing just, well, sucks. The larger models give more accurate results and allow more complex behavior, while the smaller models process significantly faster and have much lighter hardware requirements.
To get speed AND accuracy with complex instructions, you will probably need at least a $2000 fairly modern PC with a decent GPU dedicated to the purpose. More likely, an ideal system would set you back more than $3500. Power users can certainly squeeze better performance out of more approachable hardware, but there’s a reason there’s a major hardware shortage. The more mainstream your hardware, the easier it is to get set up, too. Macs and Nvidia GPUs seem to be the easiest. I ran into lacking support for AMD’s NPU, both in Linux and in the LLM tools, support that would have given my mini PC a decent edge.
Some of these complications explain why Google and Amazon are only now rolling out their AI powered smart home products. For now, I wouldn’t be surprised if you, yourself, can get better results with a local LLM because of how much customization it seems to require. Oh, and the software for managing smart home devices on both Google and Amazon is awful.
- None of the Home Assistant compatible voice assistant devices that I’ve evaluated come close to any of the Google or Amazon products in terms of voice recognition and noise cancellation. This point has been particularly painful. All the 2 mic products, which include the Home Assistant Voice Preview Edition and reSpeaker Lite (both USB and Hat editions), have limited range and limited noise reduction abilities. 2 mic array solutions seem to work best in quiet locations where there won’t be other speech, which rules out any room with a TV or any room where a group might assemble and converse. It seems, as well, that the 4 mic solutions don’t quite hit the mark yet in this area. The latest XVF3800 allegedly has proper 4 mic processing, but word on the street is that it offers only minor improvements over the previous generations.
After a lot of digging, it turns out that almost all of the devices on the market today use the XMOS XU316 chip. Without a team of audio engineers backing these products, it seems that most of them use the basic reference design and code. Nobody has put a huge amount of effort into the audio processing software for the chip. This isn’t a dig at anybody or any product, but it is just a result of lack of resources for all these smart assistant products when compared to Amazon or Google who can afford to optimize these things.
I have great hopes for the team at FutureProofHomes and their Satellite 1 product (I now have 3 of them), but their focus so far has been the quality of the sound output and not the advanced features of a mic array. Both Seeed and FutureProofHomes allegedly have things in the pipeline, but as far as I can tell nothing is out yet. Additionally, even if they succeed at releasing them, their processing won’t be nearly as advanced as that of the 1st gen Google or Amazon products.
Based on this one issue alone, no Home Assistant smart assistant device can come close to matching any of my Amazon or Google devices in about 50% of my use cases where they are used in noisy or large areas. I will probably spend some more time designing better cases for my devices and trying better STT models, but this gap in functionality came as a surprise to me.
- That brings me to my next thought: options for good output sound quality are limited. The Home Assistant Voice Preview Edition is barely loud enough to be heard at about 10ft, and at 100% volume there’s significant distortion from its tiny speaker. There is a reSpeaker kit, but at best it might match the audio quality of a 1st gen Echo Dot. The Satellite 1 has just come out with a fairly professional and tuned speaker kit, including crossover, but they went with a more directional speaker rather than an omni-directional speaker like most of the Google and Amazon devices. That works in a couple of my spaces, but not when your device is in the middle of a large room. Since most of these devices work best under 3m, a room that is 6m across becomes a bit of a challenge. First world problems, but there has to be a reason why most of the other devices seem to be less directional.
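Going back to the exposed entity problem from the local LLM notes above: the idea of collapsing individual lights into one group per room can be sketched in a few lines of Python. This only illustrates the counting, not how Home Assistant exposes entities; the `area` field, the `light.<area>_lights` naming, and the sample entities are all made up for the example.

```python
from collections import defaultdict


def group_lights_by_area(entities: list[dict]) -> list[str]:
    """Collapse light entities into one group per area; pass everything else through."""
    lights_by_area = defaultdict(list)
    exposed = []
    for ent in entities:
        if ent["entity_id"].startswith("light.") and ent.get("area"):
            lights_by_area[ent["area"]].append(ent["entity_id"])
        else:
            exposed.append(ent["entity_id"])
    # One hypothetical group entity per area replaces all of its member lights.
    for area in sorted(lights_by_area):
        exposed.append(f"light.{area}_lights")
    return exposed


# Made-up sample data, not a real device list.
sample_entities = [
    {"entity_id": "light.kitchen_island", "area": "kitchen"},
    {"entity_id": "light.kitchen_sink", "area": "kitchen"},
    {"entity_id": "light.bedroom_lamp", "area": "bedroom"},
    {"entity_id": "lock.front_door", "area": "entry"},
]

if __name__ == "__main__":
    exposed = group_lights_by_area(sample_entities)
    print(f"{len(sample_entities)} entities reduced to {len(exposed)} exposed")
    print(exposed)
```

The tradeoff is the one described above: fewer names in the prompt means less context for the model to chew through, but you can no longer ask for an individual light by voice.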
I’ll be continuing to plug away and am more than willing to share any of my configs that I find better than what you can find elsewhere (which is very little right now). I haven’t really spoken to the cost of running all this, and I live in one of the cheapest energy markets in the world. Overall, I assume it’s cheaper than any subscription for me, which may not be the case for others, especially those trying to run a local LLM.
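For anyone who wants to check the running cost against a subscription for their own setup, the arithmetic is simple enough to sketch. Every number below is a hypothetical placeholder, not a measurement from this system; plug in your own average power draw and electricity rate.

```python
def monthly_energy_cost(avg_watts: float, price_per_kwh: float, hours: float = 24 * 30) -> float:
    """Energy cost of running a device continuously for one month (30 days by default)."""
    kwh = avg_watts / 1000 * hours
    return kwh * price_per_kwh


if __name__ == "__main__":
    # Hypothetical inputs: a box averaging 60 W at 0.10 per kWh,
    # compared against a hypothetical 10.00 per month subscription.
    cost = monthly_energy_cost(avg_watts=60, price_per_kwh=0.10)
    subscription = 10.00
    print(f"energy: {cost:.2f} per month, subscription: {subscription:.2f} per month")
    print("self-hosting is cheaper" if cost < subscription else "subscription is cheaper")
```

A machine with a GPU that sits at a few hundred watts, or a much higher electricity rate, can flip the result, which is why the answer may be different for others.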