MCP Assist - 95% Token Reduction for Voice Assistants with Local & Cloud LLMs

sparkydave · January 28, 2026, 4:09am

Is that something that can be changed via the UI somewhere?

jose1711 · January 28, 2026, 3:46pm

With the Gemini model I get an “unsupported action” error when trying to set a timer. Is that expected? Thanks

jose1711 · January 29, 2026, 7:37am

It’s hardcoded (the discovery limit defaults to 20):

github.com/mike-nott/mcp-assist, custom_components/mcp_assist/mcp_server.py (commit d7626ac0e):

          """Discover entities based on criteria with progress notifications."""
          # Notify start
          self.publish_progress("tool_start", "Starting entity discovery", tool="discover_entities", args=args)
          
          entities = await self.discovery.discover_entities(
              entity_type=args.get("entity_type"),
              area=args.get("area"),
              domain=args.get("domain"),
              state=args.get("state"),
              name_contains=args.get("name_contains"),
              limit=args.get("limit", 20),
              device_class=args.get("device_class"),
              name_pattern=args.get("name_pattern"),
              inferred_type=args.get("inferred_type"),
          )
          
          # Notify completion
          self.publish_progress(
              "tool_complete",
              f"Discovery complete: found {len(entities)} entities",
              tool="discover_entities",

mikenott · February 1, 2026, 4:23pm

@jose1711 Unfortunately, HA doesn’t have a good timer framework that Assist can hook into yet. There are tools for starting and stopping timers, but not for creating them or managing notifications, etc.

mikenott · February 1, 2026, 5:03pm

@sparkydave @jose1711 I’ve added a setting to adjust the entity limit: Release v0.15.4 - Configurable Entity Discovery Limit (mike-nott/mcp-assist on GitHub)
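For anyone curious how a configurable setting can replace the hardcoded `args.get("limit", 20)` default, here is a minimal sketch. It is not the actual v0.15.4 code: the option key `entity_discovery_limit`, the helper `resolve_discovery_limit` and the clamp bounds are all hypothetical.

```python
from typing import Any, Mapping

# Hypothetical names, not taken from mcp-assist itself.
DEFAULT_LIMIT = 20  # the previously hardcoded default
OPTION_KEY = "entity_discovery_limit"


def _to_int(value: Any, default: int) -> int:
    """Convert value to int, falling back to default for None or junk input."""
    try:
        return int(value)
    except (TypeError, ValueError):
        return default


def resolve_discovery_limit(
    args: Mapping[str, Any],
    options: Mapping[str, Any],
    min_limit: int = 1,
    max_limit: int = 200,
) -> int:
    """Pick the entity limit for one discovery call.

    Priority: explicit tool argument from the LLM, then the user's configured
    option, then the old hardcoded default. The result is clamped so a bad
    value can't request zero entities or thousands of them.
    """
    configured = _to_int(options.get(OPTION_KEY), DEFAULT_LIMIT)
    limit = _to_int(args.get("limit"), configured)
    return max(min_limit, min(limit, max_limit))


# With no setting, behaviour matches the old hardcoded value.
print(resolve_discovery_limit({}, {}))  # 20
# A user-configured value applies when the model doesn't pass one.
print(resolve_discovery_limit({}, {OPTION_KEY: 50}))  # 50
```

In this sketch an explicit `limit` from the model still wins over the user setting; an alternative design would treat the setting as a hard cap instead, so the model can never exceed what the user configured.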

JoaoFernandes02 · March 15, 2026, 8:46pm

@mikenott Hey, I’m not sure why, but it looks like the AI isn’t sending arguments with its tool calls, so it just keeps executing them over and over until it hits the limit.
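One generic way an agent loop can stop this kind of runaway is to track each tool call's name plus arguments and refuse to run it once the model has repeated the identical call too many times in one turn. Below is a minimal sketch; the `RepeatedCallGuard` class and the threshold of 3 are hypothetical and not part of mcp-assist.

```python
import json
from collections import Counter
from typing import Any, Mapping, Optional


class RepeatedCallGuard:
    """Count identical (tool name, arguments) pairs within one conversation turn."""

    def __init__(self, max_repeats: int = 3) -> None:
        self.max_repeats = max_repeats
        self._seen = Counter()

    def check(self, tool: str, args: Optional[Mapping[str, Any]]) -> bool:
        """Return True if the call may run, False once it has repeated too often."""
        # Canonical key: missing args count as {}, and sort_keys makes
        # {"a": 1, "b": 2} and {"b": 2, "a": 1} the same call.
        key = tool + ":" + json.dumps(args or {}, sort_keys=True, default=str)
        self._seen[key] += 1
        return self._seen[key] <= self.max_repeats

    def reset(self) -> None:
        """Clear the counts; call this at the start of each new user turn."""
        self._seen.clear()


guard = RepeatedCallGuard(max_repeats=3)
# A model looping on the same argument-less call is cut off after 3 attempts.
print([guard.check("discover_entities", {}) for _ in range(5)])
```

When `check` returns False, the integration could send an error result back to the model (e.g. asking it to supply arguments) rather than silently stopping, which gives the model a chance to correct itself.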

dzmiller · March 16, 2026, 2:39am

Related: I use a local LLM with a cache and have about 90 entities exposed, a few of them extensive. I just asked about the rain totals and then answered one question: 2040 tokens were pulled from the cache and 112 were generated. Across HA and openclaw, the LLM server runs at a 90-94% cache hit rate. I’m using oMLX on macOS (Qwen3.5 35B for HA and Qwen3.5 122B for openclaw).

I don’t know what my final architecture will be, but openclaw’s looping ability opens up possibilities that HA doesn’t have. For example, “Check the rainfall totals every 30 minutes and turn on the bedroom lights if rainfall exceeds 4 inches. Delete this check at 7am” actually works so far.

bgreet · March 23, 2026, 2:28am

I can’t find where to enter the API key for llama.cpp when adding my model. The initial setup completes fine, but I get an error about the API when I try to use the assistant.