Future-proofing HA with local LLMs: Best compact, low-power hardware?

It is necessary to get a network interface to the Mac's LLM, and it is necessary to convert the HA Wyoming protocol to what the Mac's LLM uses.

Why is this part necessary?

You run Ollama on your Mac, which is either on your local network or reachable via something like Tailscale. There's an API you can call; you can use Ollama's own or the OpenAI-compatible one. Someone has already done the work and made an integration to get that information into HA. The hard work is already done. You can use the Year of the Voice work to talk to your own endpoint, or have the data from the endpoint read back to you if you're focused on voice.
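If you're curious what that API call looks like, here is a minimal Python sketch against Ollama's OpenAI-compatible endpoint (the IP address and model name are placeholders for your own setup):

```python
# Minimal sketch: query Ollama's OpenAI-compatible endpoint over the network.
# 192.168.1.50 and "llama3" are placeholders; point them at your own Mac/model.
import requests

resp = requests.post(
    "http://192.168.1.50:11434/v1/chat/completions",
    json={
        "model": "llama3",
        "messages": [
            {"role": "user", "content": "Should I close the blinds tonight?"},
        ],
    },
    timeout=60,
)
print(resp.json()["choices"][0]["message"]["content"])
```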

Or you can write your own scripts and ask for formatted data that you can use for your own entities. I personally use PyScript for all my automations, so it is possible to skip the official integration completely and talk directly to the endpoints if I need information from the LLM that isn't strictly voice assistance.
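To illustrate that direct route, a rough PyScript sketch (the service name, entity id, and host are made up) that calls Ollama's native /api/generate endpoint and writes the reply into a sensor might look like this:

```python
# Hypothetical PyScript service: ask the LLM a question and store the answer
# in an entity, skipping the official Ollama integration entirely.
import requests

@service
def llm_briefing(prompt="Summarize today's weather in one sentence."):
    def ask():
        return requests.post(
            "http://192.168.1.50:11434/api/generate",  # placeholder host
            json={"model": "llama3", "prompt": prompt, "stream": False},
            timeout=60,
        ).json()["response"]

    # requests blocks, so run it in an executor thread
    answer = task.executor(ask)
    state.set("sensor.llm_briefing", answer[:255])  # HA states max 255 chars
```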

1 Like

It is necessary, but that necessary part is what you describe as already available.
As I said, I did not know what was available on the Mac mini.

I hadn’t thought about using two devices at home—one for HA and the other for Ollama—connected by a standardized API. That’s definitely a workable option for adding AI to an existing HA setup. Personally, though, I’d prefer a single, low-power device, as it will be stored in a tight cabinet in my living room. It would be cheaper than running two devices, and the hardware should be powerful enough for both HA and Ollama. Even my 4-year-old MacBook Air M1 with 16GB RAM can handle medium-sized LLMs in Ollama.

I also wish the HA core team would offer clearer guidance on preparing for an AI-enabled HA, like recommended hardware and the roadmap. I’d love to upgrade from my current Pi4, and I’m still hoping we’ll eventually hear that HAOS (and future LLMs) can run natively on Apple Silicon, which I think would be the most stable setup.

It really depends on what you want from the AI.
The more general it should be, the larger the LLM has to be, and the floating-point operations needed increase too.
The problem with LLMs is that when the size increases linearly, the floating-point operations needed typically increase exponentially.

The developer team is working hard on getting AI-enabled devices ready, but the aim is probably not to provide a finished solution, rather a framework where other developers and manufacturers can provide their piece and be sure it can interact with the other pieces.
At the moment it is impossible to recommend hardware or even a roadmap, because the development of AI and voice moves incredibly fast and HA is forging a new road that no one has taken before.

One thing is pretty certain, and that is that HA will never run natively on Apple Silicon.
It is developed in Python because Python is platform independent, which makes it possible to target many hardware platforms at the same time.
Apple Silicon is actually a very small hardware base for HA, because it is too expensive, too locked down, and too little used in general.

1 Like

Looks like it is on the roadmap.

I have signed up because I intend to try the same thing. I have a Proxmox “server” with:
i5 10th-gen CPU
64 GB DDR4 RAM
And I’m thinking of getting an Intel Arc A770 16 GB to put in for the LLM side.

I think (and hope) that for a homelab this is an excellent prospect for an affordable LLM card.

You REALLY REALLY REALLY want something with an NPU OR a big honkin’ GPU for an LLM. It’s more than floating point; it’s tensor math. That stuff lights up the silicon, and it’s why NVIDIA is basically printing cash right now.

CPU and disk will not be the issues (if you have an appropriate NPU); instead, put as much RAM in it as you can freaking afford. And spring for the box with the NPU if you want ‘future proof’ (agreed, there is currently no such animal as future-proof AI).

For my Ollama setup I have a NUC14 AI edition with 64 GB of RAM, and I’m STILL concerned about performance and doubt it will last my usual 3-yr deployment lifetime.

In fact, depending on perf profiling on the NUC14, I may keep my HA instance on my current NUC10.

Also remember, if you do AI you’ll probably also want local speech-to-text/text-to-speech, and probably even an install of Frigate to review videos. All of that adds to the box specs. You literally cannot put too much RAM in (supported configurations, of course) if those are your use cases. RAM and TOPS is the name of the game.

RAM on the motherboard is one thing, but you also need as much RAM as possible on the GPU/tensor card.

1 Like

ALL the RAM! :rofl:

I got an Nvidia Jetson Orin NX 16GB about 2 weeks ago. Luckily, HA had worked with Nvidia to get it all working, including Whisper/Piper on the GPU, plus you can add custom Docker containers alongside the HA container, as they run a special ARM version of Ubuntu. It seemed like it was going to be a huge headache, and I was thinking of returning it (still might).

Then I decided to just add an entry under the Wyoming integration on my current HAOS server, point it at the IP of my Jetson and the ports needed, and it works just as well as when I was running everything in the Docker containers on the Jetson! With that said, the models I am currently using are pretty old; there are some newer ones that are much smaller but don’t seem to work as well. Actions all run locally almost instantly. I will say, HA Cloud is still better at not mixing your words up as much, but local commands are slightly faster than HA Cloud. Right now openWakeWord runs on a Wyoming satellite, so HA isn’t streaming anything, and I am using a USB speakerphone hooked into my Jetson. I just pointed HA at port 10700, and I am running the assist-satellite container on the Jetson.
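If you go the same route, a quick way to confirm the Jetson’s Wyoming services are reachable before adding them in HA is a plain TCP check (the address is a placeholder and the ports shown are the usual Wyoming defaults; adjust to your containers):

```python
# Quick reachability check for Wyoming services before adding them in HA.
import socket

JETSON = "192.168.1.60"  # placeholder address
for name, port in [("whisper", 10300), ("piper", 10200), ("satellite", 10700)]:
    try:
        with socket.create_connection((JETSON, port), timeout=3):
            print(f"{name}: listening on {port}")
    except OSError as err:
        print(f"{name}: no answer on {port} ({err})")
```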

Still working on getting an LLM set up on it and getting HA pointed at it. These things aren’t fun to work with, especially when stuff goes bad. I am not a Linux or container guru, but I know enough to tell that it’s “different”. Out of the box, it just froze on the last step; I spent hours troubleshooting because some key prerequisites just didn’t work. It has a dedicated USB-C port for plugging another computer into, running Ubuntu 22.04… VMs not recommended… One of their utilities, “jtop” (a CLI resource manager that shows how much GPU, CPU, etc. is being used), just doesn’t work anymore, so I can’t tell how much of its resources it is even using. I might just wipe it and start over, especially now that I know I don’t need to move everything to the Jetson: if I can just point Piper and Whisper at the Jetson, then I only need to run those containers. The difference between the 8GB model and the 16GB model is 70 TOPS vs 100 TOPS, so RAM really does matter. The HA containers are listed here.

I will say, the max power mode on this thing being just 25W is impressive; that’s what I set it to (it’s on medium out of the box). With that said, it was not cheap. I think the 8GB model was about 3/4 of the price and came with the OS on an SD card instead of an NVMe drive. Also, they are apparently picky with NVMe drives, which is another reason I went with the 16GB model. Here are the full specs:

1024 NVIDIA® CUDA® cores | 32 Tensor cores | End-to-end lossless compression | Tile Caching | OpenGL® 4.6 | OpenGL ES 3.2 | Vulkan™ 1.1 | CUDA 10

Here are the Docker containers I am using. As you can see, Piper is 17GB and Whisper is 10GB alone; the newer versions are around 728MB each. I was having issues with those, but the ones below work great. It just has issues picking up specific words that HA Cloud doesn’t. Like “attic”: it thinks I am always saying “added” or “addict”. TV noise in the background is particularly an issue. For comparison, running the CPU-based models on a roughly 3-year-old mini PC, which is total overkill for my HA server, takes 3 to 5 seconds for local commands. I still need to compare the two properly, but the difference is obvious.

dustynv/wyoming-openwakeword        latest-r36.2.0               e3f760f9cc65   7 months ago   994MB
dustynv/wyoming-assist-microphone   latest-r36.2.0               0ead157124bc   7 months ago   1.08GB
dustynv/homeassistant-core          latest-r36.2.0               2eb72d233ee8   7 months ago   3.24GB
dustynv/wyoming-whisper             latest-r36.2.0               0869f969c10b   7 months ago   10GB
dustynv/wyoming-piper               master-r36.2.0               619a537fc0bc   7 months ago   17.4GB

2 Likes

Anyone have experience with an AMD 8845HS mini PC? That CPU has some kind of AI support, and I’m thinking of replacing my RPi4 with something beefier that can run other things too in Proxmox.

Honestly, if I’m going for a box with an AI-capable proc/GPU right now in an effort to support AI/LLM local voice, etc. (anything related to Ollama):

Nvidia

EVERYTHING builds for Nvidia first; proc support for everything else in AI right now is an afterthought.

I built a working Intel IPEX-ARC Ollama setup.

It was rough.

I’d go Nvidia.

Every 6 months I come back to this, and it looks like the best option is still to offload this to a service (either HA Cloud or ChatGPT). I put $5 in there a year ago and am not even halfway through it. I would prefer 100% local for privacy, but currently it looks like that’s gonna cost you over $1k, plus the energy cost of having this always up. Hopefully we get better models and optimized hardware in the next year, but I feel like you need to hit under $500 on a device that idles at 15W or less and doesn’t take >10s to respond. Local AIs work great if I want to leave my 4090 running all the time, but I’m not trying to nuke the whales here.

4 Likes

The more I work with my IPEX box (NUC14ProAI), the more I think it’s about the smallest I’d practically go. And you’re not running big-context models: it’s fast for small contexts, and I’m getting creative about workload splitting because of it. For that you’re currently in the $1200 USD space, which will most likely trend downward as GPU/NPU in the mini-PC class becomes more common.

My gut says you get into your sweet spot in about 12 mo.? Ish?

Will HAOS support NPUs? They are bringing out NASes with NPUs now (n5_pro – official store). It would be great if HAOS could take advantage of this; things like Frigate etc. could be using these.

Since HA does not use any NPU directly but (already) integrates with systems that do, it makes much more sense to keep HAOS light. Install the NPU-heavy things on hardware that supports it. That way it runs on almost anything, rather than trying to let users install NPU drivers etc. for all kinds of needs on HAOS, making it complex and bulky.

2 Likes

I don’t think you need HAOS to support an NPU for this case. Honestly, if you’re doing local inference, it’s NOT running as an add-on in HAOS (which is where it would HAVE to live); instead, you’re putting HAOS as a guest OS on the same iron running inference, not the other way around.

Hi guys,
I’m actually trying to do the same thing… making my Home Assistant Assist a real global assistant.
My Home Assistant currently runs on an RPi4, but I’ll migrate it to a mini PC with an Intel N100. I’ve installed Ollama on another computer with an old i7-2600k, and I bought an RTX 3060 Ti (second hand) for a very attractive price.
The best results I’ve had until now are with Qwen3:8b. This model runs well and is really fast, and it includes tool support for Home Assistant. What works and what doesn’t:

  • Activations
  • Requesting sensor values
  • All areas/rooms, etc. are known, and if I ask “where is Steffou” or “is Steffou at home right now”, it can find out quite easily through my smartphone entities (all running Tailscale to see each other).
  • Asking “what are my next calendar entries?” works too.
  • Weather is working too.
  • All requests for date/time and day/night recognition also work well (thanks to the Astro Weather integration and another one for the weather).
  • Todo list handling works well too (adding and requesting).
So I would say that everything related to Home Assistant works well with natural language…

What doesn’t work:

  • Web search
  • Adding new events to my calendars (local or remote)

So I tried to install Open WebUI on the PC with the LLM and created my own web search tool with Google (I also tried DuckDuckGo), but it significantly increases the response time, I mean… a lot! I can probably improve that by writing a good system prompt, but I haven’t tried… yet…
My conclusion for now is that the best and easiest way is probably to add some new integrations to Home Assistant for these tools.

I also plan to try another method when I migrate my Home Assistant to the mini PC: install Open WebUI on the mini PC with a local 4b LLM model and let that pair be the “request router”. Like, if the request is about domotics, handle it; if it’s something else (general purpose), send it to my 8b model; but if the topic is complex (like a coding topic), send it to a cloud LLM (Gemini, Mistral, or another). But I’ll have to try the 4b model first.
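As a rough illustration of that routing idea (the model names and the keyword heuristic are placeholders; a real router would probably let the 4b model do the classification itself):

```python
# Toy request router: decide which model should answer a given request.
def route(request: str) -> str:
    req = request.lower()
    home_words = ("light", "temperature", "cover", "scene", "vacuum")
    complex_words = ("python", "yaml", "code", "function", "regex")
    if any(w in req for w in home_words):
        return "local-4b"   # domotics: handled by the small local model
    if any(w in req for w in complex_words):
        return "cloud"      # complex topics: Gemini, Mistral, etc.
    return "local-8b"       # general chat: the 8b model on the 3060 Ti

print(route("turn on the kitchen lights"))      # -> local-4b
print(route("write a python regex for dates"))  # -> cloud
print(route("tell me a joke"))                  # -> local-8b
```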

What do you guys think about it?

1 Like

This is what my Friday’s Party post is all about.

Qwen3 4b-8b is the most solid I’ve found for local so far. That 3060 Ti should do it nicely.

I finally found an alternative solution by following this example: LLM Agent for Google Search with Assist in Home Assistant - #6 by Musca
So I created a second agent using Gemini, and an automation based on a single phrase to call it. When I need information that I know requires a web search, I say “search on the internet [query]”, and Assist sends my request to this second agent. That way, I split my domotics (still handled by my local LLM) from the web search (handled by Gemini). It’s not perfect, but it works.
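If you script the hand-off instead of using a GUI automation, it boils down to one conversation.process call. A minimal sketch, assuming PyScript (recent enough to support return_response) and a made-up agent id (copy the real one from your Gemini config entry):

```python
# Hypothetical PyScript service: forward a spoken query to the second,
# Gemini-backed conversation agent and keep the answer in a sensor.
@service
def web_search(query):
    result = service.call(
        "conversation", "process",
        return_response=True,                          # get the agent's reply back
        text=query,
        agent_id="conversation.google_generative_ai",  # placeholder id
    )
    answer = result["response"]["speech"]["plain"]["speech"]
    state.set("sensor.web_search_answer", answer[:255])
```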
Another thing that doesn’t work is scheduled tasks. I’m looking to see if something exists for that… basically, the best would be for the LLM to create an automation based on what I ask, for example something like “do [action] at [time]” or “do [action] if [condition]” (like weather or day/night). Does anyone know if someone has already managed to create such a tool?