Future-proofing HA with local LLMs: Best compact, low-power hardware?

It is necessary to get a network interface to the Mac's LLM, and it is necessary to convert the HA Wyoming protocol to what the Mac's LLM uses.

Why is this part necessary?

You run Ollama on your Mac, which is either on your local network or reachable via something like Tailscale. There's an API you can call; you can use Ollama's own or the OpenAI-compatible one. Someone has already done the work and made an integration to get that information into HA. The hard work is already done. You can use the Year of the Voice work to talk to your own endpoint, or have the data from the endpoint read back to you if you're focused on voice.
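If you're curious what that API call looks like, here is a minimal Python sketch against Ollama's OpenAI-compatible endpoint (the IP address and model name are placeholders for your own setup):

```python
# Minimal sketch: query Ollama's OpenAI-compatible endpoint over the network.
# 192.168.1.50 and "llama3" are placeholders; point them at your own Mac/model.
import requests

resp = requests.post(
    "http://192.168.1.50:11434/v1/chat/completions",
    json={
        "model": "llama3",
        "messages": [
            {"role": "user", "content": "Should I close the blinds tonight?"},
        ],
    },
    timeout=60,
)
print(resp.json()["choices"][0]["message"]["content"])
```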

Or you can write your own scripts and ask for formatted data that you can use for your own entities. I personally use PyScript for all my automations, so it is possible to skip the official integration completely and talk directly to the endpoints if I need information from the LLM that isn't strictly voice assistance.
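To illustrate that direct route, a rough PyScript sketch (the service name, entity id, and host are made up) that calls Ollama's native /api/generate endpoint and writes the reply into a sensor might look like this:

```python
# Hypothetical PyScript service: ask the LLM a question and store the answer
# in an entity, skipping the official Ollama integration entirely.
import requests

@service
def llm_briefing(prompt="Summarize today's weather in one sentence."):
    def ask():
        return requests.post(
            "http://192.168.1.50:11434/api/generate",  # placeholder host
            json={"model": "llama3", "prompt": prompt, "stream": False},
            timeout=60,
        ).json()["response"]

    # requests blocks, so run it in an executor thread
    answer = task.executor(ask)
    state.set("sensor.llm_briefing", answer[:255])  # HA states max 255 chars
```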

1 Like

It is necessary, but that necessary part is what you describe as already available.
As I said, I did not know what was available on the Mac mini.

I hadn’t thought about using two devices at home—one for HA and the other for Ollama—connected by a standardized API. That’s definitely a workable option for adding AI to an existing HA setup. Personally, though, I’d prefer a single, low-power device, as it will be stored in a tight cabinet in my living room. It would be cheaper than running two devices, and the hardware should be powerful enough for both HA and Ollama. Even my 4-year-old MacBook Air M1 with 16GB RAM can handle medium-sized LLMs in Ollama.

I also wish the HA core team would offer clearer guidance on preparing for an AI-enabled HA, like recommended hardware and the roadmap. I’d love to upgrade from my current Pi4, and I’m still hoping we’ll eventually hear that HAOS (and future LLMs) can run natively on Apple Silicon, which I think would be the most stable setup.

It really depends on what you want from the AI.
The more general it should be, the larger the LLM has to be, and the floating-point operations needed increase too.
The problem with LLMs is that when the size increases linearly, the floating-point operations needed typically increase exponentially.

The developer team is working hard on getting AI-enabled devices ready, but the aim is probably not to provide a finished solution, rather a framework where other developers and manufacturers can provide their piece and be sure it can interact with the other pieces.
At the moment it is impossible to recommend hardware or even a roadmap, because the development of AI and voice moves incredibly fast and HA is forging a new road that no one has taken before.

One thing is pretty certain, and that is that HA will never run natively on Apple Silicon.
It is developed in Python because Python is platform independent, which makes it possible to target many hardware platforms at the same time.
Apple Silicon is actually a very small hardware base for HA, because it is too expensive, too locked down, and too little used in general.

1 Like

Looks like it is on the roadmap.

I have signed up because I intend to try the same thing. I have a Proxmox “server” with:
i5 10th-gen CPU
64 GB DDR4 RAM
And I’m thinking of getting an Intel Arc A770 16 GB to put in for the LLM side.

I think (and hope) that for a homelab this is an excellent prospect for an affordable LLM card.

You REALLY REALLY REALLY want something with an NPU OR a big honkin’ GPU for an LLM. It’s more than floating point; it’s tensor math. That stuff lights up the silicon, and it’s why NVIDIA is basically printing cash right now.

CPU and disk will not be the issues (if you have an appropriate NPU); instead, put as much RAM in it as you can freaking afford. And spring for the box with the NPU if you want ‘future proof’ (agreed, there is currently no such animal as future-proof AI).

For my Ollama setup I have a NUC14 AI edition with 64 GB of RAM, and I’m STILL concerned about performance and doubt it will last my usual 3-yr deployment lifetime.

In fact, depending on perf profiling on the NUC14, I may keep my HA instance on my current NUC10.

Also remember, if you do AI you’ll probably also want local speech-to-text/text-to-speech, and probably even an install of Frigate to review videos. All of that adds to the box specs. You literally cannot put too much RAM in (supported configurations, of course) if those are your use cases. RAM and TOPS is the name of the game.

RAM on the motherboard is one thing, but you also need as much RAM as possible on the GPU/tensor card.

1 Like

ALL the RAM! :rofl:

I got an Nvidia Jetson Orin NX 16GB about 2 weeks ago. Luckily, HA had worked with Nvidia to get it all working, including Whisper/Piper on the GPU, plus you can add custom Docker containers alongside the HA container, as they run a special ARM version of Ubuntu. It seemed like it was going to be a huge headache, and I was thinking of returning it (still might).

Then I decided to just add an entry under the Wyoming integration on my current HAOS server, point it at the IP of my Jetson and the ports needed, and it works just as well as when I was running everything in the Docker containers on the Jetson! With that said, the models I am currently using are pretty old; there are some newer ones that are much smaller but don’t seem to work as well. Actions all run locally almost instantly. I will say, HA Cloud is still better at not mixing your words up as much, but local commands are slightly faster than HA Cloud. Right now openWakeWord runs on a Wyoming satellite, so HA isn’t streaming anything, and I am using a USB speakerphone hooked into my Jetson. I just pointed HA at port 10700, and I am running the assist-satellite container on the Jetson.
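If you go the same route, a quick way to confirm the Jetson’s Wyoming services are reachable before adding them in HA is a plain TCP check (the address is a placeholder and the ports shown are the usual Wyoming defaults; adjust to your containers):

```python
# Quick reachability check for Wyoming services before adding them in HA.
import socket

JETSON = "192.168.1.60"  # placeholder address
for name, port in [("whisper", 10300), ("piper", 10200), ("satellite", 10700)]:
    try:
        with socket.create_connection((JETSON, port), timeout=3):
            print(f"{name}: listening on {port}")
    except OSError as err:
        print(f"{name}: no answer on {port} ({err})")
```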

Still working on getting an LLM set up on it and getting HA pointed at it. These things aren’t fun to work with, especially when stuff goes bad. I am not a Linux or container guru, but I know enough to tell that it’s “different”. Out of the box, it just froze on the last step; I spent hours troubleshooting because some key prerequisites just didn’t work. It has a dedicated USB-C port for plugging another computer into, running Ubuntu 22.04… VMs not recommended… One of their utilities, “jtop” (a CLI resource manager that shows how much GPU, CPU, etc. is being used), just doesn’t work anymore, so I can’t tell how much of its resources it is even using. I might just wipe it and start over, especially now that I know I don’t need to move everything to the Jetson: if I can just point Piper and Whisper at the Jetson, then I only need to run those containers. The difference between the 8GB model and the 16GB model is 70 TOPS vs 100 TOPS, so RAM really does matter. The HA containers are listed here.

I will say, the max power mode on this thing being just 25W is impressive; that’s what I set it to (it’s on medium out of the box). With that said, it was not cheap. I think the 8GB model was about 3/4 of the price and came with the OS on an SD card instead of an NVMe drive. Also, they are apparently picky with NVMe drives, which is another reason I went with the 16GB model. Here are the full specs:

1024 NVIDIA® CUDA® cores | 32 Tensor cores | End-to-end lossless compression | Tile Caching | OpenGL® 4.6 | OpenGL ES 3.2 | Vulkan™ 1.1 | CUDA 10

Here are the Docker containers I am using. As you can see, Piper is 17GB and Whisper is 10GB alone; the newer versions are around 728MB each. I was having issues with those, but the ones below work great. It just has issues picking up specific words that HA Cloud doesn’t. Like “attic”: it thinks I am always saying “added” or “addict”. TV noise in the background is particularly an issue. For comparison, running the CPU-based models on a roughly 3-year-old mini PC, which is total overkill for my HA server, takes 3 to 5 seconds for local commands. I still need to compare the two properly, but the difference is obvious.

dustynv/wyoming-openwakeword        latest-r36.2.0               e3f760f9cc65   7 months ago   994MB
dustynv/wyoming-assist-microphone   latest-r36.2.0               0ead157124bc   7 months ago   1.08GB
dustynv/homeassistant-core          latest-r36.2.0               2eb72d233ee8   7 months ago   3.24GB
dustynv/wyoming-whisper             latest-r36.2.0               0869f969c10b   7 months ago   10GB
dustynv/wyoming-piper               master-r36.2.0               619a537fc0bc   7 months ago   17.4GB

2 Likes

Anyone have experience with an AMD 8845HS mini PC? That CPU has some kind of AI support, and I’m thinking of replacing my RPi4 with something beefier that can run other things too in Proxmox.

Honestly, if I’m going for a box with an AI-capable proc/GPU right now in an effort to support AI/LLM local voice, etc. (anything related to Ollama):

Nvidia

EVERYTHING builds for Nvidia first; proc support for everything else in AI right now is an afterthought.

I built a working Intel IPEX-ARC Ollama setup.

It was rough.

I’d go Nvidia.

Every 6 months I come back to this, and it looks like the best option is still to offload this to a service (either HA Cloud or ChatGPT). I put $5 in there a year ago and am not even halfway through it. I would prefer 100% local for privacy, but currently it looks like that’s gonna cost you over $1k, plus the energy cost of having this always up. Hopefully we get better models and optimized hardware in the next year, but I feel like you need to hit under $500 on a device that idles at 15W or less and doesn’t take >10s to respond. Local AIs work great if I want to leave my 4090 running all the time, but I’m not trying to nuke the whales here.

4 Likes

The more I work with my IPEX box (NUC14ProAI), the more I think it’s about the smallest I’d practically go. And you’re not running big-context models: it’s fast for small contexts, and I’m getting creative about workload splitting because of it. For that you’re currently in the $1200 USD space, which will most likely trend downward as GPU/NPU in the mini-PC class becomes more common.

My gut says you get into your sweet spot in about 12 mo.? Ish?

Will HAOS support NPUs? They are bringing out NASes with NPUs now (n5_pro – official store). It would be great if HAOS could take advantage of this; things like Frigate etc. could be using these.

Since HA does not use any NPU directly but (already) integrates with systems that do, it makes much more sense to keep HAOS light. Install the NPU-heavy things on hardware that supports it. That way it runs on almost anything, rather than trying to let users install NPU drivers etc. for all kinds of needs on HAOS, making it complex and bulky.

2 Likes

I don’t think you need HAOS to support an NPU for this case. Honestly, if you’re doing local inference, it’s NOT running as an add-on in HAOS (which is where it would HAVE to live); instead, you’re putting HAOS as a guest OS on the same iron running inference, not the other way around.

Hi guys,
I’m actually trying to do the same thing… making my Home Assistant Assist a real global assistant.
My Home Assistant currently runs on an RPi4, but I’ll migrate it to a mini PC with an Intel N100. I’ve installed Ollama on another computer with an old i7-2600k, and I bought an RTX 3060 Ti (second hand) for a very attractive price.
The best results I’ve had until now are with Qwen3:8b. This model runs well and is really fast, and it includes tool support for Home Assistant. What works and what doesn’t:

  • Activations
  • Requesting sensor values
  • All areas/rooms, etc. are known, and if I ask “where is Steffou” or “is Steffou at home right now”, it can find out quite easily through my smartphone entities (all running Tailscale to see each other).
  • Asking “what are my next calendar entries?” works too.
  • Weather is working too.
  • All requests for date/time and day/night recognition also work well (thanks to the Astro Weather integration and another one for the weather).
  • Todo list handling works well too (adding and requesting).
So I would say that everything related to Home Assistant works well with natural language…

What doesn’t work:

  • Web search
  • Adding new events to my calendars (local or remote)

So I tried to install Open WebUI on the PC with the LLM and created my own web search tool with Google (I also tried DuckDuckGo), but it significantly increases the response time, I mean… a lot! I can probably improve that by writing a good system prompt, but I haven’t tried… yet…
My conclusion for now is that the best and easiest way is probably to add some new integrations to Home Assistant for these tools.

I also plan to try another method when I migrate my Home Assistant to the mini PC: install Open WebUI on the mini PC with a local 4b LLM model and let that pair be the “request router”. Like, if the request is about domotics, handle it; if it’s something else (general purpose), send it to my 8b model; but if the topic is complex (like a coding topic), send it to a cloud LLM (Gemini, Mistral, or another). But I’ll have to try the 4b model first.
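As a rough illustration of that routing idea (the model names and the keyword heuristic are placeholders; a real router would probably let the 4b model do the classification itself):

```python
# Toy request router: decide which model should answer a given request.
def route(request: str) -> str:
    req = request.lower()
    home_words = ("light", "temperature", "cover", "scene", "vacuum")
    complex_words = ("python", "yaml", "code", "function", "regex")
    if any(w in req for w in home_words):
        return "local-4b"   # domotics: handled by the small local model
    if any(w in req for w in complex_words):
        return "cloud"      # complex topics: Gemini, Mistral, etc.
    return "local-8b"       # general chat: the 8b model on the 3060 Ti

print(route("turn on the kitchen lights"))      # -> local-4b
print(route("write a python regex for dates"))  # -> cloud
print(route("tell me a joke"))                  # -> local-8b
```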

What do you guys think about it?

1 Like

This is what my Friday’s Party post is all about.

Qwen3 4b-8b is the most solid I’ve found for local so far. That 3060 Ti should do it nicely.

I finally found an alternative solution by following this example: LLM Agent for Google Search with Assist in Home Assistant - #6 by Musca
So I created a second agent using Gemini, and an automation based on a single phrase to call it. When I need information that I know requires a web search, I say “search on the internet [query]”, and Assist sends my request to this second agent. That way, I split my domotics (still handled by my local LLM) from the web search (handled by Gemini). It’s not perfect, but it works.
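If you script the hand-off instead of using a GUI automation, it boils down to one conversation.process call. A minimal sketch, assuming PyScript (recent enough to support return_response) and a made-up agent id (copy the real one from your Gemini config entry):

```python
# Hypothetical PyScript service: forward a spoken query to the second,
# Gemini-backed conversation agent and keep the answer in a sensor.
@service
def web_search(query):
    result = service.call(
        "conversation", "process",
        return_response=True,                          # get the agent's reply back
        text=query,
        agent_id="conversation.google_generative_ai",  # placeholder id
    )
    answer = result["response"]["speech"]["plain"]["speech"]
    state.set("sensor.web_search_answer", answer[:255])
```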
Another thing that doesn’t work is scheduled tasks. I’m looking to see if something exists for that… basically, the best would be for the LLM to create an automation based on what I ask, for example something like “do [action] at [time]” or “do [action] if [condition]” (like weather or day/night). Does anyone know if someone has already managed to create such a tool?