Hey everyone! ![]()
I wanted to share my setup for a completely local voice assistant using Home Assistant with a Mac Mini as the AI inference server. This setup uses zero cloud services - everything runs on your own hardware!
Hardware
- Mac Mini M2 - Runs all AI services
- Home Assistant - On separate hardware
- Home Assistant Voice Preview Edition - ESP32-S3 based voice satellite## The Stack
On Mac Mini (All using Apple Silicon acceleration):
-
Whisper.cpp (Port 8910)
- STT using
ggml-large-v3-turbomodel - Launched via LaunchAgent with
whisper-server
- STT using
-
Wyoming-Whisper-API-Client (Port 10300)
- Bridges whisper.cpp to Wyoming protocol
- Installed via Homebrew
-
Ollama (Port 11434)
- Local LLM using
llama3.2-vision - Installed via Homebrew as a service
- Local LLM using
-
Wyoming-Piper (Port 10200) -
Important Note- TTS with
en_US-hfc_female-mediumvoice - Using custom fork: https://github.com/jooray/wyoming-piper
- This fork fixes compatibility with
piper-tts==1.3.0 - Installed in Python 3.12 venv (Python 3.13 removed
audioopmodule)
- TTS with
Why the Wyoming-Piper Fork?
The official wyoming-piper is currently broken with the latest piper-tts. This fork provides a temporary fix by using command-line invocation instead of the Python API. Use this fork until the official version is updated.
Gotchas I Encountered
- HTTPS breaks ESP audio - Use HTTP for internal_url in configuration.yaml
- Python 3.13 breaks audio libs - Use Python 3.12 for Wyoming-Piper
- Wyoming-Piper needs the fork - Official version has compatibility issues
- Ollama needs all interfaces - Set
OLLAMA_HOST=0.0.0.0 - Choosing a good model (see below) - tool calling
- Naming my devices - I have a light called bedroom, but also an AC called bedroom. They are both in my bedroom. When I tell it to turn on the bedroom AC, it usually fails, unless I create an Alias for this entity in Assistant configuration (literally “Bedroom AC”, “Bedroom Air Conditioning”).
Model choice
This is where I would like some help. A lot of information out there is out of date. What is the best model?
HA switched to tool calling for ollama, which is great, but not all models support tool calling. You can find those that do here. Out of these, qwen3-based models work best, but the problem is it is hard to turn off thinking (reasoning), I haven’t figured out how to do it. That means, that the model is thinking too much and that greatly increases the latency. I tried all models up to qwen3:4b.
LLaMA-based models - the smaller ones - seem to be worse and often don’t do what I need (and llama3 does not support tool calling). I generally often use gemma, with small gemma 3n it should be great, but these also don’t support tool calling.
The models specifically trained for HA such as fixt/home-3b-v3 don’t work anymore, because they don’t support function calling. So it’s quite hard to find online what people are using these days - past recommendations are often broken (and in AI world, 6 month old recommendation is basically paleolithic anyway).
Venice.ai
I have also tried venice-ai using their API. This is generally much faster and I can afford to run bigger model, but I also don’t know which one to choose, but 70b llama models seem at least usable. I would still prefer running local models.
Future Improvements
Once the official wyoming-piper is fixed, I’ll update to remove the fork dependency. Also considering adding more voice satellites around the house.
Hope this helps someone else achieve a fully local voice assistant! Happy to answer questions about the setup. ![]()
![]()
Note: This is a working setup as of July 2025.