Getting View Assist working with LocalAI

I found View Assist, which looks like an amazing opportunity to eventually replace the Alexa-based devices in my home.

Ideally, with this setup I can keep everything local and run without any cloud.

However, it is a work in progress, so it took a while to get it working for me, and I am documenting my experience here in case it helps others. The View Assist setup guide is pretty good, but there were a few places where I got lost (e.g., on the satellite side, the easiest route is to simply install the View Assist Companion App).

Hardware:
a) a desktop running Docker Desktop with an NVIDIA GPU
b) a Home Assistant (HA) server
c) a cheap Android 14 device running the View Assist Companion App

Setup:

  1. Obviously a running HA system.

  2. Set up LocalAI in a Docker container using the following command (DEBUG enabled so I can see the queries HA passes in); a quick check that the API is up is shown after this list:
    docker run -d --name local-ai -e DEBUG=true -p 8080:8080 --gpus all localai/localai:latest-aio-gpu-nvidia-cuda-12

  3. Install the Home-LLM integration from HACS and configure access to the LocalAI system:
    3.1) In the Home-LLM integration, add a service pointing at LocalAI, then add a conversation agent for the Home-Llama-3.2-3B model.
    3.2) Add the model with “Selected LLM API” set to “Assist”, “Enable Legacy Tool Calling” checked, “Refresh System Prompt Every Turn” on, and the rest at defaults.

  4. Set up an Assist pipeline with wyoming-piper and wyoming-whisper (I eventually moved these into Docker to take advantage of the GPU, but that's not necessary; a Docker sketch is after this list).

  5. Configure the View Assist integration (follow the setup guide here)

  6. Configure the View Assist Companion App integration
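
A quick way to check that the LocalAI container from step 2 is up and serving its OpenAI-compatible API (adjust the host/port if you mapped them differently):

    curl http://localhost:8080/v1/models

This should return a JSON list of the models LocalAI currently knows about, including the one added via the config described below.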
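
For step 4, this is roughly how the Wyoming services can be run in Docker. A minimal sketch assuming the rhasspy/wyoming-whisper and rhasspy/wyoming-piper images on their default ports; the model, voice, and data paths are just examples:

    docker run -d --name wyoming-whisper -p 10300:10300 \
      -v /srv/whisper-data:/data \
      rhasspy/wyoming-whisper --model small-int8 --language en
    docker run -d --name wyoming-piper -p 10200:10200 \
      -v /srv/piper-data:/data \
      rhasspy/wyoming-piper --voice en_US-lessac-medium

Then point the Assist pipeline's STT at the Wyoming integration on port 10300 and its TTS at port 10200.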

Steps 2 and 3 above required a lot of trial and error on my part. I tried a bunch of different models, but the one that worked best for me was from Home-LLM’s suggested setup - model file: Home-Llama-3.2-3B.
Since the model is not in LocalAI’s directory of known models, I also had to set up a config file (copied from another llama config), and I had to double context_size to 16384 so that all of the prompt data wouldn’t choke the LLM.
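
For reference, here is a minimal sketch of what that LocalAI model config can look like. The filename, GPU settings, and temperature below are illustrative assumptions, not my exact file - adapt them to the GGUF file and prompt template you actually use:

    # /models/home-llama-3.2-3b.yaml (hypothetical filename)
    name: Home-Llama-3.2-3B          # the model name selected in Home-LLM
    context_size: 16384              # doubled so the full Assist prompt fits
    f16: true
    gpu_layers: 99                   # offload layers to the GPU (optional)
    parameters:
      model: Home-Llama-3.2-3B.gguf  # the GGUF file dropped into /models
      temperature: 0.1
    # the chat/completion template copied from the other llama config goes here as well

The key change from the copied config is the larger context_size.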

Once these files are placed in LocalAI’s /models directory, you need to restart the container so it picks them up.
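
Assuming the container is named local-ai as in step 2, restarting and watching the DEBUG output looks like this:

    docker restart local-ai
    docker logs -f local-ai   # follow the log to see the queries HA sends over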

With my RTX 4060 Ti, responses are often faster than what I get from Alexa (under 3 seconds).

It is a little buggy, but pretty darn close to production-ready for me.
It does timers, the shopping list, device control, jokes, math, etc.
