Hi everyone,
I wanted to share a small experimental project I have been using locally with Home Assistant Voice PE.
The idea came from a limitation I kept feeling in the normal voice assistant flow: the device listens, records the full utterance, sends it, waits for processing, and only then starts responding. It works, but the interaction does not feel as natural as a streamed conversation.
I also built this because I wanted a comfortable Polish-language voice assistant for my home: one that works naturally in Polish, gives me a lot of control over what the assistant can do, and still integrates deeply with Home Assistant. This setup turned out to be a very good fit. In my local setup, it controls my home reliably and feels much closer to the kind of assistant interaction I was looking for.
So I built a first beta / proof of concept that connects Home Assistant Voice PE to Gemini Live through a local Home Assistant add-on.
The goals are simple:
- lower perceived latency
- more natural back-and-forth conversation
- good non-English language support, including Polish
- stream microphone audio instead of waiting for a full recording
- stream response audio back to the device as soon as it is available
- keep Home Assistant service calls in the loop
- keep the assistant behavior configurable through prompts and settings
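To illustrate the streaming goal in the list above, here is a minimal Python sketch of forwarding microphone audio frame by frame instead of buffering a whole utterance. It is not code from either repository. The audio format (16 kHz, 16-bit mono PCM in 20 ms frames) and the names `fake_microphone`, `stream_utterance`, and `send` are assumptions made up for the example.

```python
import asyncio
from typing import AsyncIterator, Awaitable, Callable, List

# Assumption: 16 kHz, 16-bit mono PCM, so 20 ms of audio is 640 bytes.
FRAME_BYTES = 640


async def fake_microphone(pcm: bytes, frame_bytes: int = FRAME_BYTES) -> AsyncIterator[bytes]:
    """Hypothetical stand-in for a microphone: yields fixed-size frames."""
    for start in range(0, len(pcm), frame_bytes):
        await asyncio.sleep(0)  # yield to the event loop, as real audio I/O would
        yield pcm[start:start + frame_bytes]


async def stream_utterance(
    frames: AsyncIterator[bytes],
    send: Callable[[bytes], Awaitable[None]],
) -> int:
    """Forward each frame as soon as it arrives; return the number of frames sent."""
    count = 0
    async for frame in frames:
        await send(frame)  # no waiting for the end of the utterance
        count += 1
    return count


async def demo() -> List[int]:
    sent_sizes: List[int] = []

    async def send(frame: bytes) -> None:
        # A real proxy would write to a network connection here.
        sent_sizes.append(len(frame))

    pcm = bytes(1600)  # 50 ms of silence at the assumed format
    await stream_utterance(fake_microphone(pcm), send)
    return sent_sizes


if __name__ == "__main__":
    print(asyncio.run(demo()))  # [640, 640, 320]
```

The point of the sketch is only the shape of the loop: the first frame leaves the device about 20 ms after speech starts, so the backend can begin processing while the user is still talking.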
Short demo video, in Polish, showing the response speed I am getting locally:
There are two parts:
- Custom ESPHome firmware for Home Assistant Voice PE: marcinnowak79/home-assistant-voice-pe on GitHub
- Home Assistant add-on / proxy for Gemini Live: marcinnowak79/gemini-live-proxy on GitHub
This is not an official Home Assistant, ESPHome, Nabu Casa, or Google project. It is also not something I would call production-ready. It is a first beta that was mostly vibe-coded, tested on my own setup, and published as inspiration.
That said, it works surprisingly well for me locally. The latency is low, the conversation flow feels much more natural, and Home Assistant commands can be executed through the Gemini Live session.
What I am hoping for with this post:
- Maybe someone will point me to an existing, cleaner solution that already solves this properly and that I can just install.
- Or maybe this can inspire someone more experienced with Home Assistant voice internals, ESPHome audio, or add-on packaging to build a more robust version.
- Or perhaps this can start a discussion about whether streaming voice interaction should be easier to support in the Home Assistant Voice PE ecosystem.
Current state:
- works locally in my setup
- requires custom firmware flashing
- requires a Gemini API key
- uses a local Home Assistant add-on as the proxy
- supports streamed microphone audio and streamed response audio
- exposes configurable prompt/language/voice options in the add-on
- should be treated as experimental beta software
I am sharing it mainly because the concept feels useful: a more real-time, streaming voice assistant flow makes the device feel much more responsive and conversational.
Feedback, pointers to existing projects, warnings, and ideas are very welcome.