I found View Assist, which looks like an amazing opportunity to eventually replace the Alexa-based devices in my home.
Ideally, with this setup I can keep everything local and run without any cloud.
However, it is a WIP, so it took a while to get working for me, and I am documenting my experience here in case it helps others. The View Assist setup guide is pretty good, but there were a few places where I got lost (e.g., on the satellite, the easiest setup is to simply install the View Assist Companion App).
Hardware:
a) a desktop with an NVIDIA GPU running Docker Desktop
b) an HA server
c) a cheap Android 14 device running the View Assist Companion App
Setup:
1) Obviously, a running HA system.
2) Set up LocalAI in a docker container using the command below (DEBUG=true so I can watch the queries HA passes in); a quick sanity check for this step is shown after this list:
docker run -ti --name local-ai -e DEBUG=true -p 8080:8080 --gpus all localai/localai:latest-aio-gpu-nvidia-cuda-12
3) Install the Home-LLM integration from HACS and configure access to my LocalAI system.
3.1) In the Home-LLM integration, Add Service pointing at LocalAI, then Add Conversation Agent for the Home-Llama-3.2-3B model.
3.2) Add the model with “Selected LLM API” set to “Assist”, “Enable Legacy Tool Calling” checked, “Refresh System Prompt Every Turn” on, and the rest left at defaults.
4) Set up an assist pipeline with wyoming-piper and wyoming-whisper (I eventually moved these to docker to take advantage of the GPU, but that is not necessary; see the example commands after this list).
5) Configure the View Assist integration (follow the setup guide here).
6) Configure the View Assist Companion App integration.
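To sanity-check step 2: LocalAI exposes an OpenAI-compatible API on the mapped port, so from the docker host you can run:

curl http://localhost:8080/v1/models   # should return a JSON list of the available models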
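For step 4, this is roughly how the wyoming containers can be run in docker (a sketch assuming the rhasspy images on the default Wyoming ports; the model and voice names are just examples, and these are the plain CPU variants - GPU setups need extra flags I won't guess at here):

docker run -d --name wyoming-whisper -p 10300:10300 -v whisper-data:/data rhasspy/wyoming-whisper --model base-int8 --language en   # STT
docker run -d --name wyoming-piper -p 10200:10200 -v piper-data:/data rhasspy/wyoming-piper --voice en_US-lessac-medium   # TTS

Then add each one to HA via the Wyoming integration (host = the docker machine, ports 10300 and 10200) and select them in the assist pipeline.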
Steps 2 and 3 above required a lot of trial and error on my part. I tried a bunch of different models, but the one that worked best for me was the one from Home-LLM's suggested setup - model file: Home-Llama-3.2-3B.
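For reference, one way to pull a quantized copy of the model into LocalAI's models directory (the repo id and GGUF filename here are my best guess from the Home-LLM project; check its docs for the exact file):

huggingface-cli download acon96/Home-Llama-3.2-3B Home-Llama-3.2-3B.q4_k_m.gguf --local-dir /path/to/localai/models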
Since the model is not in LocalAI's models directory by default, I also had to set up a config file, copied from another llama config, and I had to double the context_size to 16384 so that all of the prompt data doesn't choke the LLM.
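A minimal sketch of what that config file can look like (the GGUF filename and GPU options are assumptions to adapt, exact keys may vary by LocalAI version, and I've omitted the template/prompt sections that get copied over from the other llama config):

# Home-Llama-3.2-3B.yaml - placed next to the GGUF in /models
name: Home-Llama-3.2-3B
context_size: 16384        # doubled so the full Assist prompt fits
parameters:
  model: Home-Llama-3.2-3B.q4_k_m.gguf   # assumed filename; use whatever you downloaded
gpu_layers: 99             # offload all layers to the GPU
f16: true
# ...plus the template/prompt sections copied from the other llama config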
Once these files are placed in LocalAI's /models directory, you need to restart the container so it reads them.
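With the container name from step 2, that is just:

docker restart local-ai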
With my RTX 4060 Ti, responses are often faster than what I get from Alexa (under 3 seconds).
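To see how much of that time is the LLM itself, you can time a request against LocalAI's OpenAI-compatible endpoint directly (the model name matches the config above; the query is just an example):

time curl -s http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "Home-Llama-3.2-3B", "messages": [{"role": "user", "content": "Turn off the kitchen light."}]}'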
It is a little buggy, but pretty darn close to production-ready for me.
It does timers, shopping lists, device control, jokes, math, etc.