Experiment: Using a tiny local LLM as a Home Assistant command router (without intents)

I’ve been experimenting with a different approach to natural language control in Home Assistant and wanted to share a quick demo.

Instead of relying on sentence templates or intent definitions, I’m experimenting with an entity-grounded natural language router.

The basic idea is fairly simple:

The system feeds a small language model a structured list of entities in a domain along with their current state and the services that domain supports. When a command comes in, the model selects the correct entity and service directly from that list and returns a deterministic JSON action.

Example prompt input looks something like this:

light.back_mudroom = off
light.back_porch = off
light.dog_house = on
light.kitchen = off

If the user says:

“turn off the dog house light”

the model returns something like:

{"type":"ACTION","domain":"light","service":"turn_off","entity_id":"light.dog_house"}

That JSON is then routed back into Home Assistant to execute the service call.
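To make that concrete, here is roughly what the execution step can look like if you call Home Assistant's REST API directly (in my build the JSON actually travels over MQTT first, covered below). The URL, token, and function name here are placeholders, not the actual build:

# Sketch of executing the router's JSON via HA's REST service-call endpoint.
# HA_URL and HA_TOKEN are placeholders for your instance and a long-lived access token.
import json
import requests

HA_URL = "http://homeassistant.local:8123"
HA_TOKEN = "YOUR_LONG_LIVED_ACCESS_TOKEN"

def execute_action(action_json: str) -> None:
    action = json.loads(action_json)
    if action.get("type") != "ACTION":
        return  # only ACTION results should touch Home Assistant
    # POST /api/services/<domain>/<service> is HA's standard service-call endpoint
    url = f"{HA_URL}/api/services/{action['domain']}/{action['service']}"
    headers = {"Authorization": f"Bearer {HA_TOKEN}"}
    requests.post(url, headers=headers, json={"entity_id": action["entity_id"]}, timeout=5)

execute_action('{"type":"ACTION","domain":"light","service":"turn_off","entity_id":"light.dog_house"}')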


Natural language flexibility

Because the router is reasoning over entity names and states instead of sentence templates, it can handle a variety of natural phrasings without needing predefined intent patterns.

For example, phrases like the following can all resolve to a matching entity and service:

“turn off the dog house light”
“please turn off the dog house”
“hey can you shut that dog house light off”
“can you turn off that bedroom light”
“I don’t want that light on anymore”

The router simply maps the language to the closest matching entity and chooses an appropriate service.


Architecture

Right now the prototype pipeline looks like this:

  1. Assist satellite
  2. Whisper speech recognition
  3. MQTT topic
  4. Python AI router
  5. LLM selects entity + service
  6. MQTT action topic
  7. Home Assistant executes the service

MQTT is simply used as the message bus between the voice pipeline and the router.
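For reference, the MQTT glue on the router side is only a few lines with paho-mqtt. This is a simplified sketch; the topic names and the routing stub are illustrative, not the actual build:

# Simplified sketch of the router's MQTT glue using paho-mqtt.
# Topic names and route_command() are illustrative placeholders.
import json
import paho.mqtt.client as mqtt

CMD_TOPIC = "assist/transcript"   # text arriving from the Whisper pipeline
ACTION_TOPIC = "assist/action"    # JSON actions consumed by Home Assistant

def route_command(text: str) -> dict | None:
    # Placeholder for the LLM routing step (domain filter + prompt + parse).
    return None

def on_message(client, userdata, msg):
    text = msg.payload.decode("utf-8")
    action = route_command(text)
    if action is not None:
        client.publish(ACTION_TOPIC, json.dumps(action))

client = mqtt.Client()
client.on_message = on_message
client.connect("localhost", 1883)
client.subscribe(CMD_TOPIC)
client.loop_forever()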


Models tested

I’ve been testing very small local models for the routing step, including:

  • Qwen3 0.6B
  • Qwen2.5 0.5B

Surprisingly, these tiny models perform very well when the prompt is structured with entity/state information.

Current round-trip voice latency in testing is roughly 2.5–3 seconds end-to-end, fully local.


Important notes

This is very early experimentation and nowhere near production ready.

The goal isn’t to replace Home Assistant’s existing intent system, but to explore whether a lightweight LLM router could provide a more flexible natural language layer without maintaining sentence templates.

Right now I’m mostly experimenting to see how far this approach can be pushed with very small local models.


Demo video:


So. I don’t want to burst your bubble, but I’m about to.

Context window will kill this fast - at only 30 ents you can expect around 1300 toks (more than 1/4 of most small model contexts) and at 100 ents you’re blowing a 5K context window.

Ok yes I know my install is abnormal but…

Dumping an ent list into a context, large or small, isn’t going to be a long-term answer. Work on ways to remove things from context until they become necessary… Just-in-time context pulls will become your friend.

After a year and a half, I’m thoroughly convinced indices and treating HA like a data lake is the answer…

Totally agree with multi-model setups and even the inference-router concept… I have LiteLLM installed and ready for phase 2 of Friday.

That’s a good point. Right now the router isn’t sending the full entity registry to the model. It first detects the domain and filters entities before constructing the prompt, so the model typically only sees entities within a single domain (for example light.*). In my environment that usually means a couple dozen entries rather than the full HA state list.

I completely agree that if this were scaled to thousands of entities a retrieval or indexing layer would probably make sense. For now I’m mostly experimenting to see how far the entity-grounded approach can go with very small models.
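To give a sense of how simple that pre-filter is, here is a sketch of the idea. The keyword map and function names are illustrative; the real one has more coverage:

# Illustrative domain pre-filter: detect the target domain from keywords,
# then only pass that domain's entities into the prompt.
DOMAIN_KEYWORDS = {
    "light": ["light", "lamp", "bright", "dim"],
    "media_player": ["music", "volume", "play", "pause", "tv"],
    "switch": ["switch", "plug", "outlet"],
}

def detect_domain(text: str) -> str | None:
    lowered = text.lower()
    for domain, words in DOMAIN_KEYWORDS.items():
        if any(word in lowered for word in words):
            return domain
    return None

def entities_for_prompt(domain: str, all_states: dict[str, str]) -> list[str]:
    # all_states maps entity_id -> state, e.g. {"light.dog_house": "on"}
    return [f"{eid} = {state}"
            for eid, state in sorted(all_states.items())
            if eid.startswith(domain + ".")]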


Good experiments; just remember: don’t infer deterministic stuff. No need to burn the toks, local or paid… Small models amplify the problem.

I actually experimented with pushing more deterministic logic into Python earlier in the pipeline (for example resolving things like on/off or volume commands before the prompt was built). What I found was that it reduced the model’s ability to interpret more natural phrasing like “that’s too loud”, “kill the light”, etc.

For now I’ve been letting the model infer the service while Python verifies the entity and executes the action. The prompt is small enough that the token cost is minimal, and it keeps the natural language flexibility higher. Still very much experimenting with where that boundary should live though.
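For what it’s worth, the Python-side verification is a small allow-list check before anything executes; something like this sketch (the service table is illustrative):

# Sketch of the Python-side guard: the model proposes an action,
# Python verifies it against known entities and an allow-list of services.
ALLOWED_SERVICES = {
    "light": {"turn_on", "turn_off", "toggle"},
    "media_player": {"volume_up", "volume_down", "media_play", "media_pause"},
}

def verify_action(action: dict, known_entities: set[str]) -> bool:
    domain = action.get("domain")
    service = action.get("service")
    entity_id = action.get("entity_id", "")
    return (
        service in ALLOWED_SERVICES.get(domain, set())
        and entity_id in known_entities
        and entity_id.startswith(domain + ".")  # entity must belong to the claimed domain
    )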


Cool… What I’ve learned is that it’s not huge context but well-prepared context that matters.

Go see what the summarizer is doing in my build and how it preps the context window… It’s easier for you to see than for me to describe. But tiny, rich, dense, meaningful context beats broad thin context every time, so I gave up on the ents and just try to extract meaning and link back to the ents involved with labels.

That said, even then model choice matters. A small model below 8B simply can’t use tools well or act on that context. So that’s where we differ.

Anything below 4B was simply infuriating to work with.

I think we might be solving slightly different problems, which is why the architectures look different.

I’m not trying to build a reasoning layer or a data-lake style assistant for Home Assistant. I’m only experimenting with the NLP layer that sits between speech recognition and the Assist action pipeline.

In other words, the goal isn’t to replace automation logic or build a full AI home brain. It’s simply to translate natural language into a deterministic service call.

The router I’m experimenting with classifies incoming text into three buckets:

ACTION – device control
WEB – external knowledge queries
CHAT – conversational responses

Only the ACTION path interacts with Home Assistant.
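The triage step itself is just another tiny prompt plus a strict parse; roughly this shape (the prompt wording and the llm() callable are placeholders, not the exact ones I use):

# Rough shape of the ACTION/WEB/CHAT triage. The prompt text and the llm()
# callable are placeholders for whatever local inference wrapper is in use.
TRIAGE_PROMPT = (
    "Classify the user's message as exactly one of: ACTION, WEB, CHAT.\n"
    "ACTION = control a device, WEB = external knowledge, CHAT = conversation.\n"
    "Message: {text}\n"
    "Answer with one word."
)

def classify(text: str, llm) -> str:
    reply = llm(TRIAGE_PROMPT.format(text=text)).strip().upper()
    return reply if reply in {"ACTION", "WEB", "CHAT"} else "CHAT"  # safe fallback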

When ACTION is selected, the model receives a very constrained prompt containing:

  • the domain being targeted (light, media_player, etc.)
  • the list of entities in that domain
  • the current state of those entities
  • the services that domain supports

So instead of reasoning over the whole environment, the model is simply selecting a valid combination of:

domain + service + entity
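Concretely, building that constrained prompt is little more than string assembly over the filtered slice; a sketch (the exact wording differs in my build):

# Illustrative assembly of the constrained ACTION prompt: one domain,
# its entities with current states, and the services it supports.
def build_action_prompt(domain: str, states: dict[str, str],
                        services: list[str], text: str) -> str:
    entity_lines = "\n".join(f"{eid} = {state}" for eid, state in sorted(states.items()))
    return (
        f"Domain: {domain}\n"
        f"Services: {', '.join(services)}\n"
        f"Entities:\n{entity_lines}\n"
        f"User: {text}\n"
        'Reply with JSON: {"type":"ACTION","domain":...,"service":...,"entity_id":...}'
    )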

Example prompt context:

light.back_mudroom = off
light.back_porch = off
light.dog_house = on
light.kitchen = off

User says:

turn off the dog house light

The model returns deterministic JSON like:

{"type":"ACTION","domain":"light","service":"turn_off","entity_id":"light.dog_house"}

That JSON is then routed back through MQTT to Home Assistant where the service call executes normally.

So the LLM isn’t handling the deterministic part — Python is still doing that. The model is only resolving natural language ambiguity, which is where the existing intent system struggles.

Because the prompt is tightly constrained and entity-grounded, very small models like Qwen3 0.6B and Qwen2.5 0.5B actually perform well for this routing task.

Current local round-trip latency with Whisper → router → action is roughly 2.5–3 seconds.

I’m definitely not claiming this replaces the intent system or scales to full home reasoning. It’s just an experiment to see how far a lightweight LLM router can go as a flexible NLP layer for Assist.


Again, really cool. I do disagree that a dedicated tiny NLP inference step is required, because ent selection ultimately matters as context in the inference pipe, so you can’t split out the NLP. If you do, you’ve got only a remote control… but… prove me wrong, please. I do see the value in what you’re doing. Will be keeping an eye on it. :wink:

Good luck!