Building the AI-powered local smart home

Home Assistant is guilty! Guilty of pulling me down the “local AI” rabbit hole and of spending far too much money on the dream of a personal, locally‑run Jarvis that can help my family and me with everything.

When the voice integration launched, my journey began with my first investment: a Mac Mini 4 with 32 GB ram. I intended to use it to feed Home Assistant with local LLMs. After some early successes with models like Qwen 2.5, I quickly realized that the Mini simply isn’t powerful enough for a fast, capable LLM‑powered smart home.

Using old PC parts and my first used RTX 3090, I’ve now built a full‑blown LLM rig: 4 × RTX 3090 plus a RTX 5090 (the latter for rapid STT, TTS, and AI‑image generation). I’m finally at a point where I can say the local Jarvis is almost done. Thanks to the hardware, I’ve moved from great models like Gemma 3 and Mistral 3.2 to GPT‑OSS‑120B, which works beautifully with Home Assistant. Since GPT‑OSS lacks vision, I’m also running Gemma 3 to analyze camera images.

The biggest hurdle—still—is the voice side, especially because I need German for my family. Whisper 3 and a German variant of it work most of the time, but they’re not 100 % perfect. Voice PE only has two microphones, so recognition drops when I’m more than two metres away. Still, it’s acceptable overall. The real challenge is a good‑sounding German TTS voice. Piper is fast, but the German voices are… not the most pleasant :smiley:

If you wander the LLM space, you quickly notice how many excellent TTS solutions exist—almost all English‑only. Kokoro, which is linguistically diverse, sadly has no German support. After a week of battling with countless tests and new Linux tricks, I finally succeeded! Thanks to Chatterbox and a few custom integrations, I now have a lovely German voice. It was a gritty fight, but worth it.

What I’d love most right now is an OpenAI‑compatible API integration with a custom endpoint. In the LLM world, the OpenAI API is the de‑facto standard, and I’d love to plug it directly into Home Assistant (please, please). There is a custom integration*, but it doesn’t perform as well as the native Ollama integration.

### Details & Repositories I use for my HA‑Jarvis

LLM part – Ollama

  • GPT‑OSS‑120B – controls HA and handles conversation (runs on 3 × RTX 3090)

  • Gemma 3 – vision (runs on 1 × RTX 3090)

(Here again I’d love a custom OpenAI‑API endpoint to hook in back‑ends like LLAMA.CPP or VLLM. Custom integrations exist in theory, but they’re flaky; I’ve wasted too much time on them. The native Ollama integration works reliably for now.)*

STT (runs on RTX 5090)

TTS (also on the same RTX 5090)

STT + TTS Wyoming integration

To bring the above STT/TTS solutions into HA I use a Wyoming bridge. The excellent project GitHub - roryeckel/wyoming_openai: OpenAI-Compatible Proxy Middleware for the Wyoming Protocol now even streams STT + TTS. It translates any OpenAI‑API‑compatible STT/TTS service into the Wyoming protocol for Home Assistant—absolutely essential!

Additional goodies

  • **Open AI API custom endpoint (Many thanks Michelle )

Thanks to the brilliant work of * @EuleMitKeule *, the HA LLM can be made far smarter. With n8n workflows, HA MCP, the vast world of n8n tools, and MCP servers, you can extend the HA LLM into countless domains. Streaming and file (camera image) handling are already built in!

All in all, I’m not yet at the perfect local Jarvis, but I’m damn close. Any tips, suggestions are more than welcome :slight_smile: I cant wait for upcoming AI/LLM/VOICE updates :slight_smile: Many thanks to the entire HA team for making this possible

Picture off my LLM monster

17 Likes