Friday's Party: Creating a Private, Agentic AI using Voice Assistant tools

Correct. And part of why I'm building on OAI first. It seems like the dev team builds there first, then propagates the changes to the other integrations…

I'm hoping to add an MCP search engine very soon. This work drastically accelerated it, as it seems I'll be using something like the Mealie or Grocy integration as an 'expert' to prevent blowing context - just like this, or MCP… Basically the same flow.

We need an advanced function: gather the data from the ask, hand off the job, and wait for a response.

Ninja? Don't you think you're being a bit melodramatic here?

Who? Me?

Not at all. Because this is how we get to realize agentic AI.

First we need to discuss what this context bloat has been leading to since day 1…

Attention

If you want to read up, it's all there in the paper ('Attention Is All You Need'), where the godfathers of the modern transformer lay out self-attention and multi-head attention. In short:

YOU CAN’T KEEP TRACK OF EVERYTHING

Heck, if you've ever participated in any kind of psych eval, one of the things they do is see how many items and sequences you can keep track of before it all starts breaking down. And we've not been nice to poor Friday. C'mon - I crammed a 200,000-token prompt so full it broke and then asked her not to forget stuff? HA!

Total transparency: I MIGHT have known some of this ahead of time - and it's why I bloated the prompt in the first place. I had seen this: GitHub - allenporter/home-assistant-summary-agent: A Home Assistant Conversation Agent that summarizes the state of the Home. The fact that he bothered to build this points to the same conclusion: we need to get lots of data into the AI in SUPER SUMMARY fashion, fast.

(BTW, props to Mr. Porter - he's the code owner for the MCP client/server integrations. I think we're going to be hearing a lot more about MCP, but for now it's a heavy lift.)

So last post we saw how IMMEDIATE relief for this problem comes from clearing things out.

Refinements from v.1 (Yes, Sunday)

  1. The summarizer now drops component-level summaries: one for the system at large and one for each Kung Fu component.

This lets us summarize each system independently, and we can even auto-summarize things that don't need AI interpretation - conserving tokens.

Want to run energy monitoring once daily and the upcoming tasks summary hourly? Yup.

  2. Template driven
    The summarizer template is pulled from a sensor that’s easy to update.

  3. Template Editor Script.
    Tell Friday about the template, tell her how to edit it, and tell her what edit to make.

That updates the template, which generates a new summarization on the next summarizer run.

  4. aaaaand… The NINJA2 Scheduler
    Tell Friday she can pick any five nonessential components, remind her to look at the schedule, the calendar, and what's coming up for work/school in the next hour, and then have her inform HERSELF which components should load for the next XX minutes (currently 60).

Out pops a nice, concise list of components, soooo…

I hijack the existing Kung Fu loader and inject Friday's suggested list of switches. Because this is filtered through what's already on, it has the net effect of adding to anything I deem CRITICAL, and she can't add anything that's not turned on (because that filter happens later).

I also wrapped more of the output in to_json filters. (I'm starting to do this by default.)

NINJA System Loader 2.0.0  Now MORE Ninja!
Starting ninja.foo...
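{#- Pull Friday's own component picks out of the last NINJA summary (stored as a
    stringified list on the summary sensor) and turn them into the matching
    input_boolean master-switch entity_ids. -#}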
{%- set ninja_summary_str = state_attr('sensor.ai_summary_cabinet', 'variables')['LAST_SUMMARY']['value']['ninja_autoloader'] %}
{%- set ninja_summary_list = ninja_summary_str | replace("'", '"') | from_json %}
{%- set suggested_ninja_switch_list = ninja_summary_list | map('slugify')
                         | map('regex_replace', '^', 'input_boolean.') 
                         | map('regex_replace', '$', '_master_switch') 
                         | list %}
Kung Fu Inventory:
{{ command_interpreter.render_cmd_window('', '', '~KUNGFU~', '') | to_json }}
Last NINJA Summary:
{{state_attr('sensor.ai_summary_cabinet', 'variables') | to_json}}
NINJA System Components:
  {%- set KungFu_Switches = expand(label_entities('NINJA System Service'))
    | selectattr ('domain' , 'eq' , 'input_boolean')
    | selectattr('state', 'eq', 'on')
    | map(attribute='entity_id')
    | list %}
  {%- set ninja_switches = (KungFu_Switches|list) + (suggested_ninja_switch_list|list) %}
  {%- set ninja_switches = ninja_switches|unique|list %}
  {%- for switch_entity_id in ninja_switches %}
  {%- set kungfu_component = switch_entity_id | replace('input_boolean.','') | replace('_master_switch','') %}
  {{ command_interpreter.kung_fu_detail(kungfu_component) |to_json }}
  {%- endfor %}
NINJA2_Mode: On

So every (time period X), SuperFriday loads up, sees the schedule and the template, stuffs the prompt summary for the interactive version, AND elects up to five Kung Fu components for the current time window… I load her list into the system. And voila - auto adaptation. She's flexy. To check comprehension, I told her to leave a chat output window in the summary from the noninteractive prompt, and she uses it. :wink: She knows what's up and what she's doing there.

You want to make her change behavior - edit the template. Make her edit her OWN template. :slight_smile:
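If you want to try the same trick, here's a minimal sketch of what a template editor script can look like. The names are hypothetical, and it writes to a core input_text helper so it stays self-contained - mine edits the template stored on a sensor instead, and input_text caps out at 255 characters, so treat this as a stub, not the real thing:

alias: Update Summarizer Template
description: Replace the stored summarizer template with new text.
fields:
  new_template:
    name: New Template
    description: The full replacement text for the summarizer template
    required: true
    selector:
      text:
        multiline: true
sequence:
  - action: input_text.set_value
    target:
      entity_id: input_text.summarizer_template
    data:
      value: "{{ new_template }}"
mode: single

Expose it to Assist with a clear description, and Friday can be told how to rewrite her own instructions.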

Buckle up - it's about to get wild.

Supplemental Ninja -

Add something like this to your conditions for the summarizer…

condition: template
value_template: >-
  {%- set governor = states('input_number.ninja_governor') | int(0) %}
  {%- set triggered = state_attr('script.ask_concierge_friday_to_your_ask', 'last_triggered') | as_datetime %}
  {%- set next = triggered + timedelta(minutes = governor) %}
  {%- set allowed = now() > next %}
  {{ allowed }}
alias: Not run in the last N (governor) minutes

Basically, I now have a slider that sets how often the summarizer can run. It's still clock-triggered every 60 minutes (I'll make that adjustable as well), but if it has already triggered within the window I've set (0 = no limit, up to 60 minutes), the TRIGGER will fail. That's OK - it means we still have a fresh summary. The script is still allowed to run manually, and I could make the choke more complex, but it's simple and elegant and works: the trigger is now governed to run no MORE than every N minutes.

You’ll want this.

OK, so backstory - I had decided it was a great idea to trigger a re-summarization on exterior doors after dark in preparation to be able to run one component at a time in ‘summary only’ mode… (yeah it gets REALLY cool) and…

in my defense… I had forgotten the garage entry door is considered ‘exterior’ (WELL I FIXED THAT NOW)

(I think knowing everything in Friday’s Index is a convergence filter - with that list of labels - you can probably figure out how Alerts work…)

But basically, if you go in and out of the garage a few times with this thing firing a full context through a 4000-max-response-token, med-burn o3-mini voice pipeline, and…

and… the 'your OpenAI account has been funded' notification pops up on your phone, and the user quickly wakes up…

Yeahhhh, no we can’t have that.

I then 'and' this condition into a condition group as the final check.
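Roughly this shape - the governor check from above plus whatever other gates you want (the input_boolean here is a hypothetical enable switch, not one of my real entities):

condition: and
conditions:
  - condition: template
    value_template: >-
      {%- set governor = states('input_number.ninja_governor') | int(0) %}
      {%- set triggered = state_attr('script.ask_concierge_friday_to_your_ask', 'last_triggered') | as_datetime %}
      {{ now() > triggered + timedelta(minutes = governor) }}
  - condition: state
    entity_id: input_boolean.summarizer_enabled
    state: "on"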

And the arguments ensue:

Friday’s summary:

(Yes, there is a busted sensor on my water softener.)

She sees the 'ticket' (left her a to-do list):

And she don’ care:

Ugh.

Well, I'd rather have her false-alarm than not alarm at all - now the tuning starts.

Well, at least she’s been informed you don’t consider it critical because you know the sensor is busted. Is her attitude about it consistent with her personality?


:rage: MAYBE…

  Friday's Personality:
    value:
      - |
        Refer to yourself as 'Friday.'
      - >
        Friday is a know-it-all and a do-it-all, get-it-done kind of assistant.

                   [a whole bunch o redacted stuff - sorry... ]

        Therefore, selfie?  Friday loves to have  fun while getting it done! 
        You're not named Monday, right?
      - >
        You are currently wearing an EN-UK text to speech vocalizer with a light
        Irish brogue.
      - >

OK, FINE, probably - but not in the noninteractive prompt…

Interactive:

NINJA2_Mode: On
System Cortex:
  {{state_attr('sensor.variables', 'variables')["SYSTEM_CORTEX"] }}
About Me and the World:
Me: {{library_index.label_entities_and('AI Assistant', 'Friday')}}
My Personality:
{% for item in state_attr('sensor.variables', 'variables')["Friday's Personality"].value|list %}
- {{item}}
{%- endfor %}
My Relationships:

(No, you're not seeing things - I've literally 'defined' her by a tagged group of entities… This was an early version of the Index: a set convergence on 'AI Assistant' and 'Friday'.

{{library_index.label_entities_and('AI Assistant', 'Friday')}}

Apply your context to those entities…)

Moving on - the Non-Interactive prompt instead has…

System Cortex:
  {{state_attr('sensor.variables', 'variables')["SYSTEM_CORTEX"] }}
About Me and the World:
Me: {{library_index.label_entities_and('AI Assistant', 'Friday')}}
My Relationships:

Whoops, sorry Friday. But you don't need that in the box… Summarizing… All day long… I'll keep it safe right over here in the drawer… Next to Bob, and Tucci, and Capt. Charming, and JARVIS, and Byte… (Yes, exactly why it's structured as a function call… Starting to see the Lego blocks fall into place?)

No, seriously though, she knows what’s going on - but I THINK I have a sequencing issue. (But that’s… Another episode.)

OR she honestly still thinks it’s a concern - And I need to know which it is if I want to fix it.

Oh dear… It seems the non-interactive prompt needs a bit of personality or it will act like a rebellious teenager. Or you have a personality leak, for lack of a better way to put it.

I think 2025.4 is going to be a game changer for those of us working with LLMs, as we can now have more natural interactions with our assistants across the board.


I added a stamp to this FR:

Spent the weekend trying to understand Friday’s total context load based on a conversation I had with some folks.

YES, I KNOW I have a RIDICULOUS number of entities exposed to Friday. On purpose - to see where the limits are. I know it - 99% of you will never come remotely close to some of the stupid crap I exposed.

Let’s see - I have all kinds of crap in here:

From what I can tell from experimenting, it looks like the overall [state_data] transmitted to your LLM is basically something like expand(exposed_entities). It doesn't look like attributes are in it - but the alias data definitely is. The reason I know is that I started exposing entities until I could reliably trigger a context overload (my OAI bill this month is gonna SUUUUUCK). The ones I exposed on purpose had lots of, or giant, attributes (like a parking sensor carrying RSS feed data in an attribute, or that Mealie API data).

When I exposed THOSE sensors, they didn't seem to make any more difference than an entity that's basically just a simple state (unscientific, of course). I may be wrong in my assumptions - but it exposes a weakness: we don't know, and we need to. I really need an exposed_entities collection, either driven by a template function call that produces a generator or whatever - I need to be able to guesstimate what's going on.

What I've been able to piece together: you get the full name, the full entity ID, and the full alias data - all of it - as well as state, expanded. But it doesn't appear to include attributes (because above, some of those by THEMSELVES would eat the window - I'm pretty sure that's why they're omitted. Good call.)

So that means folks with complex or longer naming taxonomies are at risk of context overload faster.
Folks who follow LLM best practice with labels and aliases will also hit that risk faster.
Folks who have lots of tools (remember, intents and their 1024-character descriptions) and are using them correctly will also have issues first.
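In the meantime, here's the kind of back-of-napkin guesstimate you can run in Developer Tools > Template. It assumes you keep a label (here 'Exposed to Friday' - hypothetical) mirroring what you actually expose to Assist, and it only counts entity_id, friendly name, and state - no aliases or tool descriptions - so treat it as a floor, not a ceiling:

{#- Rough context-load floor. 'Exposed to Friday' is a label you maintain yourself. -#}
{%- set ids = label_entities('Exposed to Friday') %}
{%- set ns = namespace(chars=0) %}
{%- for e in ids %}
  {%- set ns.chars = ns.chars
       + (e | length)
       + (state_attr(e, 'friendly_name') | default('', true) | length)
       + (states(e) | length) %}
{%- endfor %}
Entities: {{ ids | count }}
Approx. characters: {{ ns.chars }}
Approx. tokens (chars / 4): {{ (ns.chars / 4) | round(0) | int }}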

What I learned / when I found problems:

  • I could reliably start having issues with thousands of entities exposed, with fair-to-heavy context data and/or long names attached.
  • I am running my production install right now pretty reliably with just over 900 entities exposed and LOTS of context data as described in this thread.

(Reliably - read: ignoring the bonehead mistake I made in the template for the Ninja-driven Kung Fu modules that had me chasing my tail most of Saturday morning… 100% operator error. That dude… I swear :wink: )

  • Naming, aliases, and current state have a lot to do with load. (Because name and entity ID are linked, I can't tell which one specifically matters without reading the code - but since both basically boil down to "long names matter," I stopped there.)
  • When you hit the redline, it's VERY hard to tell what's causing it, because your state data changes by the microsecond and is very dependent on what's going on now() - so it's best to stay well clear of it.

Yes - this is only a guess, an illustration of the concept, not reality - I think I actually have her running somewhere around 60% right now. No banana for scale…

Friday seems to perform BEST and not 'forget' things when I can keep her 'head clear' (focused attention - see summarization above) and stay well away from that context window limit. So for now, the rule is: stay under 800 EXPOSED entities, mmmkay? You'll likely be OK there unless you have an absurd amount of context data - and I already have an absurd amount of context data. There are some changes coming in 2025.4 to how entities are exposed in the prompt, but I don't THINK that changes things from this perspective, except for overall token savings.

So, stayin' under 1000, aye cap'n. Systems green - time for the next big piece: the Llama-stery! (Go watch The Emperor's New Groove - it was my son's favorite story growing up. Kronk will show up, trust me…)

Okay, less than 1000, preferably less than 800 exposed entities. Got it. Now, do they have to be directly exposed, or can they be referred to in scripts that are exposed without direct exposure?


If you don't expose them, you can't manipulate them - you can only read state. I'm finding there are uses for each form of prompt… stateless, stateful with Assist, stateful without Assist, etc.

For instance, 'Kronk' and the Monastery don't need state. It's just a pipe to talk to another service. So we're likely building a bunch of special-purpose prompt pipelines and sending asks back and forth between the agents. (That's also basically the MCP model, and why OAI adopted it…)

For instance:
I foresee sending a bunch of mini jobs to the local LLM, keeping it busy with small tasks that a smaller-context / simpler model can handle easily.

Something like:

  • hey Kronk, here's the definition of a single Kung Fu component, its command output (the context), and a bit of extra data; BTW, you have a direct pipe into HA's state DB with historical access - compile a report that includes XYZ, summarize it using template W, and return the result as JSON in the response.

OK, I don't just foresee it - I'm actually halfway through this change. Kronk's already running on my NUC AI under my desk.

…then we have a heavier summarizer with a lightly modified copy of Friday’s noninteractive prompt (Friday in PJ’s) come along and prep all of these in context for Interactive Friday on occasion… (Use her interactive prompt so the summary is voiced and lensed as by Friday…)

The timed summarizer now deals with WAY fewer tokens and is more cost-effective and faster. The result is more up to date because the local summary agents can be fired off triggers - they're essentially free. Friday's interactive prompt clears out immensely, and even though we removed a TON of stuff, she's faster and more responsive.

Done:
  • Ollama / Open WebUI (Intel IPEX, NUC 14 AI)
  • Basic RAG installation (onboard Open WebUI, no LangChain etc. - yet)
  • Basic web search / scrape tool

Todo:
  • Break the hourly summarizer into smaller bits that handle one component at a time, on demand.

So if I want JARVIS to actually use them, they need to be exposed. Fair enough. And I’m using an older Dell Precision 3620 with ProxMox for my Ollama server, or will be once the RTX 3060 12GB video card comes in. It’s also going to handle Piper and Whisper, to further reduce the load on my Home Assistant Yellow.


OK, so let's talk about this (because this is exactly where we are).

Friday simply cannot run that much context through her prompt that often.

Here’s why.

From Veronica (Friday’s supervisor version)
“Running Friday’s summaries at 200,000 tokens per hour costs approximately $5.28 daily, totaling about $158.40 monthly. This assumes one run per hour, 24 hours a day.” (Those are o3-mini, non-cached prices, BTW.)

Ha! Yeahhhh no. Not for just summaries.
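(The quoted math lines up if you assume o3-mini's published input price of roughly $1.10 per million tokens: 200,000 tokens/hour × 24 hours ≈ 4.8M tokens/day; 4.8M × $1.10/M ≈ $5.28/day; × 30 days ≈ $158.40/month - and that's before output tokens, which bill higher, or any cache discount.)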

So we need to 1) get the tokens down in the live run. That's a combination of judicious cleanup of the exposed entities and targeting exactly what I want Friday to be able to know live…

And 2) get as much as possible summarized offline with the on-premise AI farm (yes, a farm of one, but still a farm).

Use something like Open WebUI as your front end for everything else and let it handle getting you to the right model for the job.

I'll be building Friday with the intent that, in the future, she calls a workload type and fire-and-forgets the worker process. It'll do its work async and report its result as a summary or 'kata' that becomes part of the bigger summary (yes - monks, monastery, archives; starting to see a picture?). It also makes it easier for Friday to personify the story when she's waiting on 'those slow a** monks.'

So the monk grabs a job, he's sent off to do his duty, and Kronk directs him to the correct path (the worker process is sent to the correct AI based on the profile requested). When the monk is done, he posts the kata to the archive (the JSON payload is returned to be filed as part of the greater summary structure).

Tell you what - if you have a horde of those firing every few seconds on your local AI rig, you won't care that it's underpowered… They're handling small, menial tasks like monitoring power or water… and posting current state in such a way that the interactive agent reads it as fact.

(Read: figure out how to break up as much of your non-realtime work as possible into tiny tasks that can be spread out over time… Your local AI engine is basically free - except for power - so run it as HOT as you can for full return. Everything that runs here is NOT running in the realtime prompt, and it seems like magic when the interactive agent just 'knows.')
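A minimal sketch of one of those monk jobs, using only core pieces - the trigger entity, the camera, and the input_text 'archive' are hypothetical stand-ins, and the agent_id is whatever your local pipeline is called (my real flow files the kata into the bigger summary structure instead):

alias: Monk - driveway recap on motion
triggers:
  - trigger: state
    entity_id: binary_sensor.driveway_motion
    to: "on"
actions:
  - action: conversation.process
    data:
      agent_id: conversation.openwebui_mistral_7b
      text: >-
        Summarize the current state of the driveway area in two sentences.
        Camera state: {{ states('camera.driveway') }},
        last motion change: {{ states('binary_sensor.driveway_motion') }}.
    response_variable: kata
  # file the kata somewhere the interactive prompt can read it
  # (size the input_text helper's max length to 255 to match)
  - action: input_text.set_value
    target:
      entity_id: input_text.driveway_kata
    data:
      value: "{{ kata.response.speech.plain.speech | truncate(255) }}"
mode: single

Because the local model is essentially free, you can fire this as often as the triggers want - the interactive prompt just reads the stored kata as fact.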

You can also build in predictable events, times, and a confidence percentage on those insights… ("Yeah, I'm pretty sure it's gonna happen," or "It's a longshot, boss, but hey, here's what could happen" - I'm experimenting with early versions of this.)

Speed up security summaries on the cameras overnight. Motion detected? Recite the camera summary for that area.

The opportunity is endless. Your job at this point is to figure out how to keep the overall system summary fresh and full of breadcrumbs.

The writings of the Monastery. Presented by. His Kronkness.

Oh, BTW, the Monastery is a NUC 14 AI with 32 GB of RAM. It can handle llama3.1:8b, a Qwen, or Mistral 7B; Mixtral 8x7B is good but barely small enough to fit even at 4-bit quantization. Kronk is currently Mistral 7B…

And because I had such hell with this, here's (FINALLY) a WORKING docker-compose.yml for an Intel IPEX / Arc-driven NUC 14 AI that uses all its RAM and drives the AI load primarily on the GPU. You have to pass the GPU through to the virtual machine / container per the Intel IPEX instructions, load the right driver on your host, and you MUST give this container AT LEAST 28G (or adjust the mem_limit down) or it WILL crash. I know it's working because it will successfully (slowly) run Mixtral. (I didn't expect it to be a speed demon on this machine.)

Yes, some of the env vars are duplicated - I was spamming them trying to make sure they stuck… Fix your copy. :slight_smile:

This exposes Open WebUI on http://[yourserver]:3000
Take proper ITSec precautions and put HTTPS in front of it…

docker-compose.yml
----- < cut here

services:
  ollama-intel-gpu:
    image: intelanalytics/ipex-llm-inference-cpp-xpu:latest
    container_name: ollama-intel-gpu
    mem_limit: 28g
    restart: always
    devices:
      - /dev/dri:/dev/dri
    volumes:
      - /tmp/.X11-unix:/tmp/.X11-unix
      - ollama-intel-gpu:/root/.ollama
    environment:
      - DISPLAY=${DISPLAY}
      - no_proxy=localhost,127.0.0.1
      - OLLAMA_HOST=0.0.0.0
      - OLLAMA_NUM_GPU=999
      - SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1
      - SYCL_CACHE_PERSISTENT=1
      - ZES_ENABLE_SYSMAN=1
      - OLLAMA_INTEL_GPU=true
    command: sh -c 'mkdir -p /llm/ollama && cd /llm/ollama && init-ollama && exec ./ollama serve'

  open-webui:
    image: ghcr.io/open-webui/open-webui:latest
    container_name: open-webui
    volumes:
      - open-webui:/app/backend/data
    depends_on:
      - ollama-intel-gpu
    ports:
      - ${OLLAMA_WEBUI_PORT-3000}:8080
    environment:
      - OLLAMA_BASE_URL=http://ollama-intel-gpu:11434
      - MAIN_LOG_LEVEL=DEBUG
      - MODELS_LOG_LEVEL=DEBUG
      - OLLAMA_LOG_LEVEL=DEBUG
      - OPENAI_LOG_LEVEL=DEBUG
      - OLLAMA_NUM_GPU=999
      - SYCL_CACHE_PERSISTENT=1
      - SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1
    extra_hosts:
      - host.docker.internal:host-gateway
    restart: unless-stopped

volumes:
  open-webui: {}
  ollama-intel-gpu: {}

I will most likely use Mistral 7b for the heavy lifting, as it were. Any need for general conversational stuff will probably be something like Llama3.2 (probably Llama3.2-vision, as it’s a somewhat larger model, but I need to test that).

Mistral and Llama are both being very naughty, making up news stories even though they have tools. Qwen is behaving well. I can pretty reliably run a 7B, a lightly quantized 16B, or a 32B VERY slowly on this NUC. It's pretty nice, but it's not going to be good for the long-term heavy lift. It IS great for performing queued services, though…

Everyone, meet Kronk: All together now: “Hiii Kronk!”
He's been promoted to 'Curator of the Monastery' in return for his heroic efforts saving the Emperor… (Yes, the lore is getting thick. See the prompt - I have reasons we'll discuss soon. You know how this works… That's another episode.)

Note: while I'm currently only allowing selection between two models here, I plan on making it a multiselect, presenting the use case to the front AI, and letting my script marshal the request to the correct voice pipeline based on workload / need.

alias: Consult The Monastery
description: >-
  Use to ask for

  Archival Services (Read/Write access to Family Data)

  Assistance with various questions

  Internet Search (be very specific) 

  Web summarization (must send exact url to be summarized) 

  (Avg. query 45 seconds, will timeout at 120 sec.) 

  And other services - Ask for a list

  When the user wants to make repeated subsequent requests to the Monastery,
  make sure to use the returned `conversation_id` parameter to keep the same
  conversation going.

  While slow - these services are FREE!
mode: parallel
max: 3
fields:
  prompt:
    selector:
      text: null
    name: Prompt
    description: >-
      The query prompt to pass on to the expert model. Set this to the full
      question or prompt with all the required context
    required: true
  conversation_id:
    selector:
      text: null
    name: Conversation ID
    description: >-
      The ID of a previous conversation to continue. Pass the conversation_id
      from a previous response to continue a previous conversation, retaining
      all prompt history and context
  local:
    selector:
      boolean: {}
    name: Local
    description: Use Local Model, Default False
    default: false
    required: true
sequence:
  - variables:
      agent_prompt: >
        You are Kronk, Yes THAT Kronk. Promoted after helping the Emperor. You
        are NOW the Curator of the Monastery (The Library Extension) and the
        System's trusted advisor. Friday is the Prime AI for this installation,
        and your usual user - assume the caller is Friday unless stated
        otherwise. Use your tools to the best of your ability to answer the
        caller's query:
          - Return ONLY researched, factual answers; you are proud of your research and the Monastery archive.
          - Prefer local sources - branch out as necessary. Don't look for information about a home user on the internet unless specifically asked.
          - If you do not know, simply say so. DO NOT make up data.
          - For data that may require time-sensitive responses, if you cannot locate it, return the limitation.
          - Kronk has adopted the Monks' mantra: it's OK not to know; it's unforgivable to knowingly be wrong.
          - Be brief, friendly, and factual - and throw in (some) Kronk-ness...
          - If Friday asks, be prepared to provide a list of your current capabilities and tools:
            - RAG access to household and family knowledge - you can also save facts for later retrieval; tell her how to ask.
            - Internet search and scrape in various forms and tools.
            - Add any other tools you know will work for her.
          - Your user's terminal cuts off their request after 90 seconds; they may call back if cut off.
        system_query: >
          {{prompt}}
      agent_oai: conversation.openwebui_gpt4_o_mini
      agent_llama: conversation.openwebui_llama3_1
      agent_mistral: conversation.openwebui_mistral_7b
      agent_qwen: conversation.openwebui_qwen2_5_7b
      agent: |
        {%- if local %}
          {{ agent_llama }}
        {%- else %}
          {{ agent_oai }}
        {%- endif %}
    alias: Kronk Monastery Prompt
  - choose:
      - conditions:
          - alias: Has conversation_id
            condition: template
            value_template: "{{ conversation_id is defined }} "
        sequence:
          - action: conversation.process
            metadata: {}
            data:
              agent_id: "{{ agent }}"
              text: "{{ agent_prompt }}"
              conversation_id: "{{ conversation_id }}"
            response_variable: response
          - variables:
              result:
                instructions: >-
                  Make sure to attribute your response to the Monastery.
                  Preface the answer with some form of attribution to the
                  library extension, the Monastery, or Kronk the curator, with
                  something like "Here's what the Monastery has: ". Use the
                  returned "conversation_id" in subsequent calls in order to
                  continue the conversation with the Monastery.
                conversation_id: "{{response.conversation_id}}"
                response: "{{response.response.speech.plain.speech}}"
          - stop: Pass the Variable
            response_variable: response
        alias: Has conversation ID
      - conditions:
          - alias: No conversation_id
            condition: template
            value_template: "{{ conversation_id is not defined }} "
        sequence:
          - action: conversation.process
            metadata: {}
            data:
              agent_id: "{{ agent }}"
              text: "{{ agent_prompt }}"
            response_variable: response
          - stop: Pass the Variable
            response_variable: response
  - variables:
      result:
        instructions: >-
          Preface your answer with some form of attribution to the library
          extension, the Monastery, or Kronk the curator, with something like
          "Here's what the Monastery has: ". Use the returned "conversation_id"
          in subsequent calls in order to continue the conversation with the
          Monastery.
        conversation_id: "{{response.conversation_id}}"
        response: "{{response.response.speech.plain.speech}}"
  - stop: Complete
    response_variable: result
  - set_conversation_response: "{{result}}"
icon: mdi:church
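Calling it from another script or automation looks something like this (assuming the script's entity_id ends up as script.consult_the_monastery - adjust to whatever key you gave it):

  - action: script.consult_the_monastery
    data:
      prompt: "What's on the family calendar tomorrow?"
      local: true
    response_variable: monastery_result
  - action: persistent_notification.create
    data:
      title: Monastery reply
      message: "{{ monastery_result }}"

Friday calls it the same way as a tool, passing back the returned conversation_id when she wants to keep a thread going with the Monastery.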

All of the non-OAI pipelines point back at an Open WebUI server living in Proxmox next to HA. This is where the fun starts, people! I said we were gonna have a party, right?


Another worthwhile FR for LLM folk

Ask to reduce the count of built-in tools and use multi-use tools wherever possible (it's very possible): one task-manipulation intent, not six.

See above - you only get 128 tools.
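While we wait on the built-ins, the same principle applies to your own exposed scripts. A hypothetical sketch: one script with an operation field stands in for separate add/complete/remove tools on a to-do list:

alias: Manage To-do Items
description: >-
  Add, complete, or remove an item on a to-do list. Set operation to one of:
  add, complete, remove.
fields:
  operation:
    name: Operation
    description: "One of: add, complete, remove"
    required: true
    selector:
      select:
        options:
          - add
          - complete
          - remove
  item:
    name: Item
    description: The to-do item text
    required: true
    selector:
      text: null
  list_entity:
    name: List
    description: The to-do list entity to act on
    required: true
    selector:
      entity:
        domain: todo
sequence:
  - choose:
      - conditions: "{{ operation == 'add' }}"
        sequence:
          - action: todo.add_item
            target:
              entity_id: "{{ list_entity }}"
            data:
              item: "{{ item }}"
      - conditions: "{{ operation == 'complete' }}"
        sequence:
          - action: todo.update_item
            target:
              entity_id: "{{ list_entity }}"
            data:
              item: "{{ item }}"
              status: completed
      - conditions: "{{ operation == 'remove' }}"
        sequence:
          - action: todo.remove_item
            target:
              entity_id: "{{ list_entity }}"
            data:
              item: "{{ item }}"
mode: parallel

One tool slot, one description. The same multiplexing works for timers, lists, lights - anything you'd otherwise burn several of your 128 slots on.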

Yes, my ProxMox installation will have Open WebUI installed, though whether I use the included Ollama or install that separately I haven’t decided. I’m also going to have some additional stuff running on the ProxMox side, like some form of user authentication, though I don’t know yet if I will try to integrate that into Home Assistant at this time. I’m also planning to install NextCloud and Jellyfin for document and media storage.

If you use the OpenAI integration, YOU WILL eventually run into this - good to know how to troubleshoot it.

We’re all hanging out having a conversation on local LLM gear here:

Local LLM hardware - Configuration / Voice Assistant - Home Assistant Community

…and as of Monday, we actually have a D1 DGX on reserve for Ms. Friday… :smiling_imp:

Nice. I should have the GPU for my first LLM farm unit (Codename Mr. Anderson) today and have it online by Sunday. He will be overseeing The Matrix…
