[Custom Integration] HA Video Vision - AI Video Analysis + Facial Recognition

HA Video Vision

AI-powered video analysis and facial recognition for Home Assistant cameras.

Installation

Add as custom repository in HACS:
https://github.com/LosCV29/ha-video-vision

Features

  • 🎥 Real video analysis - sends video clips, not snapshots
  • 👤 Facial recognition - “Who’s at the door?” → “It’s Carlos”
  • 🆓 Free by default - OpenRouter Nemotron model
  • 🏠 Local option - run on your own GPU
  • 📱 Smart notifications - AI descriptions with snapshots

Supported Providers

| Provider | Model | Cost |
|---|---|---|
| OpenRouter | Nemotron 12B VL | FREE |
| Google Gemini | gemini-2.0-flash | Free tier |
| Local vLLM | Qwen-VL, LLaVA | Free |

Example Output

Instead of: “Motion detected on Front Porch”

You get: “A woman in a blue jacket is approaching the front door carrying a package. Identified: Mom (87% confidence)”
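
As a rough illustration of how a message like the one above could be put together, here is a minimal Python sketch. It is not the integration's actual code: the function name `build_notification` and its parameters are hypothetical, and it only assumes that the video model returns a text description and that the facial recognition step returns a name plus a confidence between 0 and 1.

```python
from typing import Optional


def build_notification(
    description: str,
    name: Optional[str] = None,
    confidence: Optional[float] = None,
) -> str:
    """Combine a scene description with an optional face match.

    Hypothetical helper for illustration only; the real integration
    may format its notifications differently.
    """
    message = description.strip()
    # Append the identity only when a face match was actually returned.
    if name is not None and confidence is not None:
        message += f" Identified: {name} ({confidence:.0%} confidence)"
    return message


if __name__ == "__main__":
    print(
        build_notification(
            "A woman in a blue jacket is approaching the front door "
            "carrying a package.",
            name="Mom",
            confidence=0.87,
        )
    )
```

When no face is matched, the helper returns the description on its own, so the notification still says more than a bare motion alert.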

Works With

  • PolyVoice - Voice control: “Check the driveway”
  • Any RTSP camera - Reolink, Hikvision, etc.
  • Frigate - Trigger on events

Links

Feedback welcome!

Looks super promising, looking forward to checking it out. Am I correct in understanding that those of us without a dedicated machine with a graphics card can still use this to send alerts if the camera spots, say, “a man carrying a parcel”? Just not do actual face recognition? Would an Nvidia Jetson Nano be useful for something like this, or is something beefier required?

Any chance of supporting Google Coral in the future like Frigate does? Or does that not help here?

My apologies. Because this is LLM-based, even small models would be too much for a Coral or a Jetson Nano. If you want to try the features, you can definitely do so with OpenRouter’s free models. They work great; I’ve tested them.

Ah I see. So it’s not even possible to point the facial recognition at Gemini, for instance, to do the work?

The facial rec is a separate component, but it’s very light. It can run on a CPU, so no beefy equipment is needed. Just spin up a server on any PC and point the integration at your facial rec server. You can mix the two: cloud for video capture and analysis, local for the facial rec component. Hope that helps.
