Future-proofing HA with local LLMs: Best compact, low-power hardware?

You need a network interface to the Mac's LLM, and you need to convert the HA Wyoming protocol into whatever the Mac's LLM uses.

Why is this part necessary?

You run Ollama on your Mac, which is either on your local network or reachable via something like Tailscale. Ollama exposes an API; you can use its native one or the OpenAI-compatible one. Someone has already done the work and made an integration to get that information into HA, so the hard work is done. You can use the Year of the Voice work to talk to your own endpoint, or have the data from the endpoint read back to you if you're focused on voice.

Or you can write your own scripts and ask for formatted data that you can use for your own entities. I personally use PyScript for all my automations, so I can skip the official integration completely and talk directly to the endpoints when I need information from the LLM that isn't strictly voice assistance.
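As a rough sketch of what "talk directly to the endpoints" can look like: the snippet below builds a request for Ollama's OpenAI-compatible chat endpoint and asks the model to answer in JSON so the reply can feed an entity or script. The host address, model name, and JSON schema here are hypothetical placeholders, not anything from the integration itself.

```python
import json
import urllib.request

OLLAMA_URL = "http://192.168.1.50:11434"  # hypothetical address of the Mac running Ollama


def build_chat_request(prompt: str, model: str = "llama3.1") -> dict:
    """Build an OpenAI-compatible chat payload for Ollama's /v1/chat/completions."""
    return {
        "model": model,
        "messages": [
            # Ask for machine-readable output instead of free text (schema is illustrative).
            {"role": "system",
             "content": 'Reply with JSON only: {"summary": str, "urgent": bool}'},
            {"role": "user", "content": prompt},
        ],
        "stream": False,  # one complete reply, no token streaming
    }


def ask_llm(prompt: str) -> dict:
    """POST the request and parse the model's JSON reply (needs a running Ollama)."""
    req = urllib.request.Request(
        f"{OLLAMA_URL}/v1/chat/completions",
        data=json.dumps(build_chat_request(prompt)).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    # The assistant's text lives in the first choice; parse it as JSON.
    return json.loads(body["choices"][0]["message"]["content"])
```

From PyScript or any automation, the parsed dict can then be written into sensor attributes or used to branch an automation.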

It is necessary, but that necessary part is exactly what you describe as already available.
As I said, I did not know what was available for the Mac mini.

I hadn’t thought about using two devices at home—one for HA and the other for Ollama—connected by a standardized API. That’s definitely a workable option for adding AI to an existing HA setup. Personally, though, I’d prefer a single, low-power device, as it will be stored in a tight cabinet in my living room. It would be cheaper than running two devices, and the hardware should be powerful enough for both HA and Ollama. Even my four-year-old MacBook Air M1 with 16 GB RAM can handle medium-sized LLMs in Ollama.

I also wish the HA core team would offer clearer guidance on preparing for an AI-enabled HA, like recommended hardware and the roadmap. I’d love to upgrade from my current Pi4, and I’m still hoping we’ll eventually hear that HAOS (and future LLMs) can run natively on Apple Silicon, which I think would be the most stable setup.

It really depends on what you want from the AI.
The more general you want it to be, the larger the LLM has to be, and the compute has to grow with it.
The catch with LLMs is that both the memory footprint and the per-token floating-point work scale with parameter count, so a model twice the size needs roughly twice the RAM and twice the compute per token, while the quality gains per doubling tend to shrink.
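A quick back-of-envelope calculation makes the scaling concrete. This assumes a dense model with FP16 weights (about 2 bytes per parameter) and the common rule of thumb of roughly 2 floating-point operations per parameter per generated token; the function is illustrative, not a benchmark.

```python
def llm_cost(params_billion: float, bytes_per_weight: float = 2.0):
    """Back-of-envelope cost of running a dense LLM locally.

    FP16 weights take ~2 bytes each; generating one token costs roughly
    2 FLOPs per parameter (one multiply + one add per weight).
    """
    params = params_billion * 1e9
    weight_gb = params * bytes_per_weight / 1e9   # memory just for the weights
    flops_per_token = 2 * params                  # compute per generated token
    return weight_gb, flops_per_token


# A 7B model at FP16: ~14 GB of weights, ~14 GFLOPs per generated token.
print(llm_cost(7))
```

That is why a 16 GB machine tops out around 7B-class models at FP16, and why quantization (fewer bytes per weight) is what makes larger models fit.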

The development team is working hard on getting AI-enabled devices ready, but the aim is probably not a finished solution so much as a framework where other developers and manufacturers can provide their piece and be sure it can interact with the other pieces.
At the moment it is impossible to recommend hardware or even a roadmap, because AI and voice development moves incredibly fast and HA is forging a new road that no one has taken before.

One thing is pretty certain: HA will never run natively on Apple Silicon.
It is developed in Python because Python is platform independent, which makes it possible to target many hardware platforms at once.
Apple Silicon is actually a very small hardware base for HA: too expensive, too locked down, and too little used generally.

Looks like it is in the roadmap.

I have signed up because I intend to try the same thing. I have a Proxmox “server” with:
i5 10th-gen CPU
64 GB DDR4 RAM
And I’m thinking of getting an Intel Arc A770 16 GB to put in for the LLM side.

I think (hope) for homelab this is an excellent prospect for an affordable LLM card.

You REALLY REALLY REALLY want something with an NPU or a big honkin’ GPU for an LLM. It’s more than floating point; it’s tensor math, and that stuff lights up the silicon, which is why NVIDIA is basically printing cash right now.

CPU and disk will not be the issues (if you have an appropriate NPU); instead, put in as much RAM as you can freaking afford. And spring for the box with the NPU if you want “future proof” (agreed, there is currently no such animal as future-proof AI).

For my Ollama setup I have a NUC14AI edition with 64 GB of RAM, and I’m STILL concerned about performance and doubt it will live out my usual 3-yr deployment lifetime.

In fact, depending on performance profiling on the NUC14, I may keep my HA instance on my current NUC10.

Also remember: if you do AI, you’ll probably also want local speech-to-text/text-to-speech, and probably even an install of Frigate to review videos. All of that adds to the box specs. You literally cannot put in too much RAM (supported configurations, of course) if those are your use cases. RAM and TOPS is the name of the game.

RAM on the motherboard is one thing, but you also need as much RAM as possible on the GPU/tensor card.
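A rough way to sanity-check whether a card has enough VRAM: the weights of a quantized model take about (parameters × bits-per-weight / 8) bytes, plus some overhead for activations and the KV cache. The function and its default numbers below are a rule of thumb, not a guarantee.

```python
def fits_in_vram(params_billion: float, vram_gb: float,
                 bits_per_weight: int = 4, overhead_gb: float = 1.5) -> bool:
    """Rough check whether a quantized model's weights fit in GPU memory.

    Ignores KV-cache growth with context length, so leave extra headroom
    for long prompts.  Defaults assume a typical 4-bit quantization.
    """
    weights_gb = params_billion * bits_per_weight / 8  # e.g. 13B @ 4-bit = 6.5 GB
    return weights_gb + overhead_gb <= vram_gb


# On a 16 GB card: a 4-bit 13B model fits comfortably; a 4-bit 70B does not.
print(fits_in_vram(13, 16), fits_in_vram(70, 16))
```

This is why a 16 GB card like the A770 is a reasonable floor for homelab LLM use, and why offloading layers to system RAM (at a big speed cost) is the usual fallback.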


ALL the RAM! :rofl:

I got an NVIDIA Jetson Orin NX 16GB about 2 weeks ago. Luckily, HA had worked with NVIDIA to get it all running, including GPU acceleration for whisper/piper, plus you can add custom Docker containers alongside the HA container since they run a special ARM build of Ubuntu. It seemed like it was going to be a huge headache, and I was thinking of returning it (still might).

Then I decided to just add an entry under the Wyoming integration on my current HAOS server, point it to the IP of the Jetson and the ports needed, and it works just as well as when I was running everything in Docker containers on the Jetson! That said, the models I am currently using are pretty old; there are some newer ones that are much smaller but don’t seem to work as well. Local actions are almost instant. I will say HA Cloud is still better at not mixing your words up, but local commands are slightly faster than HA Cloud. Right now openWakeWord runs on a Wyoming satellite, so HA isn’t streaming anything, and I am using a USB speakerphone hooked into my Jetson from HA. I just pointed it to port 10700, and I am running the assist-satellite container on the Jetson.

Still working on getting an LLM set up on it and getting HA pointed to it. These things aren’t fun to work with, especially when stuff goes bad. I am not a Linux or container guru, but I know enough to tell that it’s “different”. Out of the box, it just froze on the last step, and I spent hours troubleshooting because some key prerequisites just didn’t work. It has a dedicated USB-C port for plugging another computer into, running Ubuntu 22.04 (VMs not recommended). One of their utilities, “jtop” (a CLI resource manager that shows how much GPU, CPU, etc. is being used), just doesn’t work anymore, so I can’t tell how many resources it’s even using. I might just wipe it and start over, especially now that I know I don’t need to move everything to the Jetson: if I can just point piper and whisper at the Jetson, then I only need to run those containers. The difference between the 8GB model and the 16GB model is 70 TOPS vs 100 TOPS, so RAM really does matter. HA containers listed here

I will say, the max power mode on this thing being 25 W is impressive, and that is what I set it to; it’s on medium out of the box. Below are the specs. That said, it was not cheap. I think the 8GB model was about 3/4 the price and came with the OS on an SD card instead of an NVMe drive. Also, they are apparently picky about NVMe drives, which is another reason I went with the 16GB model. Here are the full specs:

1024 NVIDIA® CUDA® cores | 32 Tensor cores | End-to-end lossless compression | Tile Caching | OpenGL® 4.6 | OpenGL ES 3.2 | Vulkan™ 1.1 | CUDA 10

Here are the Docker containers I am using. As you can see, Piper is 17 GB and Whisper is 10 GB alone; the newer versions are around 728 MB each. I was having issues with those, but the ones below work great. It just has trouble picking up specific words that HA Cloud doesn’t: “attic”, for instance, it always hears as “added” or “addict”. TV noise in the background is a particular problem. For comparison, running the CPU-based models on a roughly 3-year-old mini PC (which is total overkill for my HA server) takes 3 to 5 seconds for local commands. I still need to compare the two properly, but the difference is obvious.

dustynv/wyoming-openwakeword        latest-r36.2.0               e3f760f9cc65   7 months ago   994MB
dustynv/wyoming-assist-microphone   latest-r36.2.0               0ead157124bc   7 months ago   1.08GB
dustynv/homeassistant-core          latest-r36.2.0               2eb72d233ee8   7 months ago   3.24GB
dustynv/wyoming-whisper             latest-r36.2.0               0869f969c10b   7 months ago   10GB
dustynv/wyoming-piper               master-r36.2.0               619a537fc0bc   7 months ago   17.4GB
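For anyone wanting to reproduce this, a compose file along these lines ties the Wyoming containers above together. This is a hypothetical sketch: the port numbers are the usual Wyoming defaults (whisper 10300, piper 10200, openWakeWord 10400), and the `runtime: nvidia` line assumes the Jetson's NVIDIA container runtime is installed; verify both against the dustynv image documentation before using it.

```yaml
# Hypothetical docker-compose sketch for the dustynv Wyoming containers on a Jetson.
# Ports are the common Wyoming defaults; confirm against each image's docs.
services:
  whisper:
    image: dustynv/wyoming-whisper:latest-r36.2.0
    runtime: nvidia              # expose the Jetson GPU to the container
    ports: ["10300:10300"]       # HA Wyoming integration -> <jetson-ip>:10300
    restart: unless-stopped
  piper:
    image: dustynv/wyoming-piper:master-r36.2.0
    runtime: nvidia
    ports: ["10200:10200"]       # text-to-speech
    restart: unless-stopped
  openwakeword:
    image: dustynv/wyoming-openwakeword:latest-r36.2.0
    runtime: nvidia
    ports: ["10400:10400"]       # wake word detection
    restart: unless-stopped
```

In HA you would then add three Wyoming integration entries pointing at the Jetson's IP with those three ports.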