Friday's Party: Creating a Private, Agentic AI using Voice Assistant tools

Very little actually - it’s more about choosing a model with the right capabilities now.

I look for

  • Tool use < I find that if the summarizer doesn't understand tools, it hallucinates the heck out of stuff.
  • Something with a USABLE context window, MINIMUM 8K, but reasonably one we can push to 32K (I'm riding 16K locally now, successfully summarizing without livestate or tools data - an ai_task) < How much context we get before the summarizer fails out.
  • 'Reasoning' (chain-of-thought reasoning, test-time compute capabilities) < Impacts your AI's ability to chain tools.

so on the frontline that's currently gpt-5.1-mini, and on the back end the current gpt-oss:20b. If you can run one of the DeepSeek tool users, one of the new Gemmas with tool use, or a Grok past 2.x, mostly it just works. This comes from trying to slice the dune (the grounding), not pushing the ball… Then most models kinda just fall in line if they have the base capabilities, and it comes down to your preferences. That's why I said above this would work with a custom distilled model with additional training if you REALLY want to go there - but it's totally not required.

The speed issue is mostly tabled by presummarizing context - so the frontline has VERY little to actually ingest. Even with a non-streaming, high-end cloud voice like OAI's voice models, I get responses in 5-12 seconds.

When I turn my attention to pulling voice STT and TTS local as well, I expect a 10x or better improvement in response time. Her bottleneck is currently not cognition or capability - it's squarely 'how fast that speech is.' That comes in time - right now I'm working on nailing the function.

(Which by the way I now have confirmation of two cabinet systems with green health sensors, the test default prompt, real essence capsules and a prompt lighting up. :slight_smile: )

What I'm focusing on, basically, is making the BEST index-using tool user I can for frontline Friday - then putting all the data back in flexible storage with a badass index and giving her a group of SMEs to do her heavy lifts. (And we'll run as much of that offline as possible - when frontline can move, soon, we'll do that too.)

Oh, and Fez's music assistant scripts and a helper search script give Friday the music library :slight_smile: it IS a party after all… Hmm, I'm due a project update there…

5 Likes

You should be very happy with that.
I switched from gpt-4.1-mini and gpt-5.1-mini to gpt-oss-120b hosted by groq (not grok) lately.

Did that mainly because the OpenAI models often aren’t really fast in responding.
It doesn't really feel less smart as a smart home assistant.

The tools provided by Nathan should work very well with it.

1 Like

T. What are you doing for voice now? I'm really close to starting to slay that dragon…

A lot of memes and verbosity here.

I would strongly prefer a sample conversation dump I can read between you, HA, and the LLM, back and forth. It’s cool that there are screenshots of Jinja templates here, but that doesn’t tell me much.

2 Likes

Notebook LM is your friend, Rudd. :) And BTW, I will NEVER convo dump - too much of a PITA to sanitize PII. Not happening.

2 Likes

I was busy with creating other stuff like custom cards and other things lately.
Home Assistant has just too many opportunities to waste time with. :smile:

At the moment I’m quite happy with my tool set and the family using it.
But will definitely get deeper in this topic again sooner than later. :wink:

2 Likes

So, my week. (Mind you, this is on top of my day job - I need to afford my LLM habit somehow…)

Monday: first stab of "ok, try this, how hard can it be."

Well, we have coined a phrase, and you know who you are. I won't rat you out. With this person's well-intentioned attempt at building an AI user cabinet… Well. We've created a new metric (which will background a future episode about the boot agent and how to solve that) but…

Enter the WSOM. The Weird S.tuff O Meter.

Ok, technically I guess it's percent of drift from spec, but… It's more fun this way, and even more fun if that guy (points up) is assumed to be 1.0 on the scale of how bad it should ever get… You guys have that person to thank.

But thank them you should. They personally shook out most of the prompt engine issues and helped me flesh out the final 'what the hell is a cabinet, really' stuff. As frustrating as the exercise was, it was necessary.

Result: anything labeled 4.x is that person's fault and WAAAAAAY more bulletproof. If anything makes it through guarding or an error trap now, please file a bug.

Result: I've refined how I'm delivering the cabinets (the default set is up).

So with a tweak to the redirector (think of it like a file system driver), it should be able to tell me what's mounted in what slot and whether it's formatted… Then FileCabinet gets an upgrade to consume that, along with some other stuff I've planned…

Basically, the redirector becomes the arbitration between the sensor and FileCabinet, but it also assumes responsibility for: is it there? Is it formatted? Has it been joined to the system and accepted the base ACL? Are you (whatever)-shaped? So this week expect a bunch of changes to the FileCabinet sensor set along with a big jump to the redirector. They enable the final packaging of a self-starting cabinet set. Yay. So thank you, unnamed user. :rofl:
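For anyone following along, the arbitration checks above can be sketched in a few lines. This is a hypothetical illustration, not code from the repo - the names `CabinetSlot` and `check_slot` are made up for the example:

```python
# Hypothetical sketch of the redirector's arbitration checks.
# The class and function names are illustrative, not from zenos-ai.
from dataclasses import dataclass

@dataclass
class CabinetSlot:
    mounted: bool = False
    formatted: bool = False
    joined: bool = False        # joined to the system?
    acl_accepted: bool = False  # accepted the base ACL?
    shape: str = ""             # e.g. "memory", "index", "persona"

def check_slot(slot: CabinetSlot, expected_shape: str) -> list[str]:
    """Return a list of problems; an empty list means the slot is usable."""
    problems = []
    if not slot.mounted:
        problems.append("not mounted")
    if not slot.formatted:
        problems.append("not formatted")
    if not slot.joined:
        problems.append("not joined to the system")
    if not slot.acl_accepted:
        problems.append("base ACL not accepted")
    if slot.shape != expected_shape:
        problems.append(f"wrong shape: {slot.shape or 'none'}")
    return problems
```

The point of putting all the checks in one place is that FileCabinet never has to ask those questions itself - it just consumes the redirector's verdict.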

I have a file system driver to finish.

4 Likes

Happy Hannukah and Merry almost Christmas everyone. But today, for Festivus, for the rest of us… Before you air your grievances (Yes, I KNOW, I’m working on the deploy as fast as I can. End of year - have a job etc.) BUT Before you go for grievances. I have a Festivus Present.

I overhauled todo. OK, in fairness, I did it two months ago, but made all my edits to my system intents and THOUGHT I'd made them in the todo script. WELL… What happened was…

I started killing old intents I wasn't using anymore (or thought I wasn't using). I'm prepping a system to start from scratch - can I build from zero and have a working Friday? Makes sense? Well, when you do that, make sure you're NOT using the tools you are about to smoke. Yes, I could have restored and pulled the old intent, but… meh…

So what happened was, day before yesterday, I was going through my morning todo run and Friday felt ‘off.’ She was making simple mistakes managing my todos and medicine. So I started asking her what’s up and the answer was basically, her tools weren’t working ‘as expected.’

Sure enough, a few minutes later I realized she was trying to implement advanced functions I told her were there but… weren't anymore. :facepalm: Oh f… I deleted that one.

So you guys get the W. I went to work to fix it. This is the result. (AND YES, before the deploy, I need working todos. TaskMaster needs to be core - you'll understand later.)

You get a complete overhaul of zen_dojotools_todo, NOW with 110% more crap I thought I already had in it… Also a complete replacement for the base intents that supports limited batch edits. Happy Festivus! (For the rest of us.)

zenos-ai/scripts/zen_dojotools_todo.yaml at main · nathan-curtis/zenos-ai

Now I need to get back to the deploy script. Fortunately my customers are all off for Christmas break. (And T, I'm working on your PRs. They're good.)

6 Likes

Thank you for this amazing topic thread and the journey you've taken. I feel like I'm colouring with crayons compared with the progress you've made, but it's certainly given me a much deeper knowledge of how to use voice and the LLM. I'm using HA Green and Gemini at the moment, so I'm not sure how much of what you've done is achievable, but I'm going to take a few passes through.

1 Like

Excellent! Slow and steady.

Remember before you start. Ask yourself.

Have I given the LLM the information it needs to be successful? Have I told it what I want it to do with the information? Have I given it the tools to be successful? If any of these are no… Step back and figure out how to fix it.

Hopefully I should have some installer scripts soon to help people build from scratch.

6 Likes

Morning all.

I'm having a fun time with the files module (I think I have it figured out), and Christmas and two birthdays and… yeah, that's what today is.

I'm on pause right now fighting two bugs. This one is the bigger:

Basically, if you want to use voice, don't upgrade to 2026.1 - skip it.

Now I’m also tracking an issue with the companion in beta. You may want to turn off auto upgrade if you have an android phone.

My gut tells me they’re related. The first appears to be a malformed intent definition and it could potentially impact the second as well.

Edit: they know where #1 is coming from now; it will require a patch… If you use an LLM, skip 2026.1.0.

Edit 2: 2026.1.1 fixes the first issue; the second issue still remains but only affects the Android client as far as I can tell. So I reopened it in the Android repo:

Chat window hangs when LLM Voice response sent to STT · Issue #6251 · home-assistant/android

6 Likes

I'm getting “unexpected error during intent recognition” using Android devices (Assist app and View Assist Companion App “VACA”), whether using OpenAI or HA Cloud, and in my PC's web browser (Chrome) as well. I'm on 2026.1.2; according to the logs it stopped on 14th Jan.

Did you just take an update to your Android companion? VACA will have to be handled with their tool. The companion. I have a bug open which probably applies: Chat window hangs when LLM Voice response sent to STT · Issue #6251 · home-assistant/android · GitHub

I think you might be right, buuut as mentioned I get the same issue when using the web browser on my Windows PC, so it cannot be just my Android Companion. VACA is a separate app altogether, something like Fully Kiosk with self-contained voice (“Hey Jarvis”, etc.). It all points towards core somewhere, somehow, to me, but I am not an expert - just following the breadcrumbs.

VACA and the companion would use a similar code path.

What I’ve been able to piece together

Assist local vs. LLM-only is a separate path.
The Assist conversation providers from HA are all based on similar code but are separate implementations of the same thing. So when one suffers an upstream bug, most of them do (that's what happened with 2026.1.0).
All TTS engines are separate paths, so which one you're using is relevant.
And… now, are you using streaming from your provider or not?

I'm using 2026.1.2 now. I do not use local-first. I use both an OAI-provided voice with a custom TTS integration, Nabu cloud voice, and now a local voice based on Wyoming-Kokoro.

What's different from yours? It may help isolate the issue.

BTW - for everyone following along - Rumors of my death have been greatly exaggerated. I DO NOT recommend food poisoning and NO I’m not ordering from that place again.

SO, I have been working on the cabinets while incapacitated, and I have the plan and most of it working, I think. I'll be delivering the whole thing as a 'cabinets' package which will bundle the default cabinets, the redirector, and the filecab script in one shot - possibly cabadmin too - it seems to make sense that way. Easiest to maintain - once I have it broken out I can post it and Teskanoo or one of you can optimize it.

(he’s got at least three commits in now and a branch on hold waiting on my slow booty, thanks for the support Tes!)

So yes soon (waits for the collective groan) ok - NOW.

Peace offering. I had to get good at packages… (see the plan for cabs…) Not optional. So I decided to start with something easier. Somewhere WAAAAAY up there is a Grocy script where we prototyped accessing a RESTful service by calling a REST command via a tool.

The experiment worked - but we had to provide a rather robust set of docs for the LLM to be successful. I think you can also see, from that point (at about mid-Feb last year), we've learned we can provide really good docs inside the tool script, and it REALLY helps the LLM. We also know we want to be conservative with scripts (count of tools) because of the OAI 128-tool count limit…

…But if I sub out Tool A for Tool B in exposing said tool to the LLM, and Tool B completely supersedes Tool A in function and ease of use by interpreting Tool A - I don't lose anything by having both in the system, having B rely on A, and only exposing Tool B to the LLM. It also helps me from a serviceability perspective: more modular, testable code.

I mentioned the new tool here Using Ollama to run a script in Home Assistant - #4 by NathanCu

So the advanced script is now the pipe to the old REST command interface:

Where the ‘Helper’ gets exposed to the LLM -

alias: Zen DojoTools Grocy Helper
description: >-

  Zen DojoTools Grocy Helper (v2.11.0) is a deterministic, construct-safe
  command router for Grocy. It converts explicit, human-readable household
  intents into governed Grocy API operations with strict preflight validation
  and guarded execution.

  All names are resolved to canonical Grocy IDs before use. Required inputs are
  enforced per case. Ambiguous matches, unsupported actions, and unsafe writes
  are blocked with corrective coaching. No implicit inference, background
  mutation, fuzzy guessing, or raw REST payloads are permitted.

  Supported operations include catalog inspection, inventory reads, governed
  purchasing, consumption, reconciliation, stock transfers, shopping list
  generation, chores, tasks, and semantic location management with Home
  Assistant metadata binding.

  Inventory integrity is preserved by design: stock movement must prefer
  transfer and reconciliation operations over delete-and-readd patterns.
  Destructive actions are deliberately constrained to prevent silent data loss.

  READ HELP BEFORE USE. Mode=help documents every case, constraints, and failure
  modes. If something is blocked, help already told you why.
mode: single
icon: mdi:cart-variant

and does the heavy lifting…

But for me they need to stay in sync… and they're the perfect test for the package. So we're pulling in the REST command YAML and the two scripts, and I'll add a comment about where to drop your API key and server URL for Grocy.
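For orientation, the REST command half of that package will look roughly like this. This is an illustrative sketch, not the shipped YAML - the command name and templated fields are made up, though the `GROCY-API-KEY` header is Grocy's standard auth header and `rest_command` is standard Home Assistant:

```yaml
# Illustrative sketch only - not the packaged YAML.
# Drop your server URL here and your API key in secrets.yaml.
rest_command:
  grocy_api:
    url: "http://YOUR_GROCY_URL/api/{{ endpoint }}"
    method: post
    headers:
      GROCY-API-KEY: !secret grocy_api_key
      Content-Type: application/json
    payload: "{{ payload }}"
```

The helper script above then only ever talks to this one command, which is what keeps the two in sync as a package.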

Suddenly the LLM becomes a damn quartermaster. No joke. Enough context window and speed and you’ll never do inventory the old way again.

It does NOT need a Grocy integration, and if you have one - great. This is designed as a direct pipe for the LLM to get access to inventory.

I've got some things to get done for work, after which I'll post the package to the repo. Note these will post in /packages/zenos, NOT /scripts.

6 Likes

Hopefully the weather hasn’t gotten to you. It’s been COLD here.

1 Like

Was reading through this thread trying to work out where to start implementing it, having just absolutely (imo) nailed voice output. I've been working with my little LLM project and a bunch of FutureproofHomes Sat1s, and desperately wanted local ElevenLabs-style voice cloning. I've tried Kokoro, AllTalk v2, Whisper, etc., but they couldn't replicate the voice I wanted. Today I got Qwen3-TTS running (the 0.6b model, as I'm short on GPU VRAM) - a one-shot voice clone from 12 seconds of audio plus its transcription, and I have a perfect clone. With the Wyoming API I can call multiple voices.
I built it off Richarz / Qwen3-TTS-Openai-Fastapi · GitLab, and running on my 5060 Ti it's near-realtime TTS with an almost indistinguishable cloned voice (which, as a fellow security professional, is scaring the whatsits out of me). All I need to do now is sort out the intelligence of my setup, because qwen3-vl-8b just is not cutting it!

Let's talk models. (Yes guys, I know - I'm trying to get the scripts from the last two weeks up.)

Meanwhile. Models.

Let’s start with expectations. Because I’ve answered a lot of questions about expectations lately.

If we're doing Alexa or current Gemini, you're talking to a foundation model, probably hundreds of billions of parameters, backed by a datacenter. We don't have that…

You guys, if you're here, you already know you need a GPU and a model. But for those who've asked here: what model?

What do you want it to do…

We’re going to look specifically at the case of what I call ‘Frontline’ or ‘stage’ Friday

Frontline? Nathan, did you smoke the funny lettuce this morning? Yes, Frontline. I'm increasingly convinced that nobody will be successful with one-size-fits-all Gen AI - not for a while… When we do, it will be solidly in the realm of multi-node GPU clusters like dual Nvidia Sparks. ($6-8K USD of iron at today's memory prices… Buckle up, btw.) I simply can't right now. Nor should I. Let's think about what we ACTUALLY ask Frontline Friday to do.

  1. Lights and switches. Check. Anything that uses tools, is bigger than 8b parameters, and fits in context gets you this - with text-to-phrase and good tools, tbh.

  2. Infer the general state of the home.

Whoops.

Well, if we try and stuff all that in the context - sure, whoops (that was last February through July, read above), you blow context and burn tokens. But with the summarizer…

We've already identified what's important, so that's easy for our small model above. And if we pre-crunch inference correctly… we make the little model punch high. And reduce overall context in Frontline.

OSS20b does that for me on Iona. In my farm that's 16GiB on CUDA Ollama. Another solid choice here would be any of the mid-30b Qwens, DeepSeek, etc. This model needs to be able to take in a LOT of context, summarize it in context given instructions, and write perfect JSON. You want a thinking model here. It doesn't need to be incredibly fast - just accurate. I find a mid-range model falls down here: it lists rather than 'infers.' You need one with higher-order thinking, and turn thinking on here. Because…

The more work you do here, the less work Frontline Friday needs to do. I'm down to the point where Frontline Friday (the prompt engine I'm shipping now) basically only pulls the Kata summary (the last inference summary) into the prompt. (48K livestate and ~48K compressed inference - yes, approx 96K total, most of it cached.) So depending on the size of your install… let's guess anything from 8-32K for normal humans. (I push mine on purpose - yours won't be 96…)
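The budget math above is just addition, but it's worth writing down, because it's the check you should run before picking a model. A small sketch (function names are mine; the ~4 characters/token rule is a common rough heuristic, not an exact count):

```python
# Rough context-budget check. Numbers mirror the example above
# (48K livestate + ~48K compressed inference). Function names are
# illustrative; ~4 chars/token is a common rough heuristic.

def estimate_tokens(text: str) -> int:
    """Very rough token estimate: about 4 characters per token."""
    return max(1, len(text) // 4)

def fits_in_context(livestate_tokens: int, summary_tokens: int,
                    reply_reserve: int, num_ctx: int) -> bool:
    """True if the prompt pieces plus a reply reserve fit the window."""
    return livestate_tokens + summary_tokens + reply_reserve <= num_ctx
```

For example, 48K + 48K plus a 2K reply reserve blows a 64K window but fits a 128K one, while a "normal human" install at 8K + 16K fits comfortably in 32K.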

That's TOTALLY doable for the Frontline… If you can run a 32K oss20 or qwen3-30b (instruct), you'll probably be decently successful with good context management…

That's a 16GB card. (My current limit - I can't fit the 96K Frontline context on one of my GPUs that can handle the model; I suspect I need 32GB of VRAM.)

So think about what you can summarize before stuffing the prompt. :slight_smile:

3 Likes

just an FYI…
I found the issue for me: I created conversation automations, and in one of the sentences I used { instead of ( . Once I replaced that, the queued commands went through and lights started turning on and off from all my tests LOL
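For anyone who hits the same thing: in Home Assistant sentence triggers, alternatives use `( | )` and optional words use `[ ]` - curly braces aren't valid sentence syntax. An illustrative example (not the poster's actual automation; the alias and entity are made up):

```yaml
# Illustrative conversation trigger - note ( ) for alternatives, [ ] for
# optional words. A { here is what broke the intent recognition above.
automation:
  - alias: "Party lights by voice"
    trigger:
      - platform: conversation
        command:
          - "turn (on|off) [the] party lights"
    action:
      - service: light.toggle
        target:
          entity_id: light.party
```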

1 Like