Not my work, but the author reached out to me and I would like to raise awareness of it: machinefi/trio-core on GitHub, a real-time vision intelligence engine for Apple Silicon with YOLO counting, VLM scene understanding, auto-calibration, and a REST API in one command.
The pain point: with HA camera automations you can trigger on “person detected,” but you can’t trigger on what that person is actually doing. The usual workaround is Frigate plus GenAI through Ollama, but that takes 5 to 10 seconds per frame, and only on a snapshot, not the live stream. That’s too slow for automations like “if someone is at the door with a package, unlock the locker” or “if the garage door is open after 10pm, send me an alert.”
Trio Core runs a VLM on live camera feeds at 279 ms per frame, fully local on Apple Silicon.
- Plain English conditions. Ask anything about the scene, no model retraining required
- KV cache reuse across sequential video frames, giving a 1.71x speedup
- 73% visual token compression with under 1% accuracy loss. This is why it’s fast where Ollama isn’t
- REST API built in. Just run `pip install trio-core[mlx] && trio serve` and you get a local API at localhost:8100, ready for HA automations (see the sketch below)
- No Docker, no MQTT, no API keys, no YAML. One pip install and you’re done
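To make the “plain English conditions” idea concrete, here’s a minimal sketch of what querying the local API could look like. I’m assuming a hypothetical `/query` endpoint, payload fields, and response shape here; I haven’t verified these against the repo, so check the project’s README for the real API.

```python
import requests

TRIO_URL = "http://localhost:8100"  # local API started by `trio serve`

# Hypothetical endpoint and payload (not confirmed from the repo docs):
# ask a plain-English question about the current frame of a camera stream.
resp = requests.post(
    f"{TRIO_URL}/query",  # assumed route; the real one may differ
    json={
        "stream": "rtsp://192.168.1.50:554/front_door",  # example camera feed
        "question": "Is someone at the door holding a package?",
    },
    timeout=5,
)
resp.raise_for_status()
answer = resp.json()  # assumed shape: {"answer": "yes", "confidence": 0.91}
print(answer)
```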
The goal is HA automations that understand scenes, not just object labels. Trio Core makes that possible without cloud dependency or multi-second latency.
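One way to close that loop today is a small poller that forwards a positive answer to a Home Assistant webhook trigger. The Trio Core endpoint below is the same assumption as in the sketch above; the webhook URL format is standard HA, and the webhook id is whatever you pick in your automation.

```python
import time
import requests

TRIO_URL = "http://localhost:8100"
# Standard HA webhook trigger URL; "package_at_door" is a webhook id you choose.
HA_WEBHOOK = "http://homeassistant.local:8123/api/webhook/package_at_door"

QUESTION = "Is someone at the door holding a package?"

while True:
    # Same assumed /query endpoint as the sketch above.
    r = requests.post(
        f"{TRIO_URL}/query",
        json={
            "stream": "rtsp://192.168.1.50:554/front_door",
            "question": QUESTION,
        },
        timeout=5,
    )
    if r.ok and r.json().get("answer") == "yes":
        # Fire the HA automation; webhook triggers accept a bare POST.
        requests.post(HA_WEBHOOK, timeout=5)
    time.sleep(1)  # poll interval; leaves headroom over the ~279 ms/frame budget
```

On the HA side, the automation then just needs a webhook trigger with that id and whatever action you want, like unlocking the locker.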