Hey everyone. I’ve been working on a project to address an issue in my home, and it finally works well enough that I’m no longer motivated to keep putting much more work into it, so I figured I’d share what I have.
The Problem
There are four cats in my household that all free feed.
When the cats go to the vet, we are unable to answer simple questions about whether each cat is eating, drinking, and using the litter box. The best we can say is “Food and water disappear, and the litter gets soiled, but we can’t say which cats are responsible.”
The Solution
It’d be nice if I could just take a quick look at Home Assistant and have this information available. The system should be able to tell the cats apart, and should handle the food and water bowls being moved around.
I set up cheap WiFi cameras pointing at the cats’ food bowls and litter boxes. I then used images from these cameras to train a YOLOv5 model to recognize the individual cats, the food and water bowls, and the litter boxes. (We use distinct bowls for food versus water.)
I then wrote a program to run this trained model on an RTSP stream from these cameras and to raise events when certain objects overlap. For example, there are rules such as “if Sammy’s detection box overlaps the Food Bowl detection box by 50% for 5 seconds, then raise the event Sammy Eating Food until the boxes have stopped overlapping for 5 seconds.” These events show up as binary_sensors in Home Assistant.
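To make that rule concrete, here’s a rough sketch of the idea. This is not the actual code from the repo; the model path, stream URL, class names, and the exact overlap metric are all stand-ins:

```python
import time
import cv2
import torch

# Illustrative only: weights path, stream URL, class names and thresholds are
# stand-ins, not the values used by the actual project.
model = torch.hub.load("ultralytics/yolov5", "custom", path="best.pt")  # custom-trained weights
cap = cv2.VideoCapture("rtsp://cam-basement.local/stream")

OVERLAP_MIN = 0.5  # boxes must overlap by 50%...
HOLD_SECS = 5.0    # ...for 5 seconds to start (or stop) the event

def overlap_fraction(a, b):
    """Intersection area over the smaller box's area (one way to read 'overlaps by 50%')."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    smaller = min((a[2] - a[0]) * (a[3] - a[1]), (b[2] - b[0]) * (b[3] - b[1]))
    return (ix * iy) / max(smaller, 1e-6)

overlap_since = gone_since = None
event_active = False

while True:
    ok, frame = cap.read()
    if not ok:
        continue
    det = model(frame[..., ::-1]).pandas().xyxy[0]  # columns: xmin, ymin, xmax, ymax, confidence, class, name
    cats = det[det["name"] == "Sammy"][["xmin", "ymin", "xmax", "ymax"]].values
    bowls = det[det["name"] == "FoodBowl"][["xmin", "ymin", "xmax", "ymax"]].values

    overlapping = any(overlap_fraction(c, b) >= OVERLAP_MIN for c in cats for b in bowls)
    now = time.time()

    if overlapping:
        gone_since = None
        overlap_since = overlap_since or now
        if not event_active and now - overlap_since >= HOLD_SECS:
            event_active = True
            print("EVENT START: Sammy Eating Food")  # the real program publishes this over MQTT
    else:
        overlap_since = None
        gone_since = gone_since or now
        if event_active and now - gone_since >= HOLD_SECS:
            event_active = False
            print("EVENT END: Sammy Eating Food")
```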
Video of all interactions is also saved. In Home Assistant’s Media browser I can filter through videos by browsing, e.g., “Videos, CatDrinkingWater, Basement, Kitty” to get all of the videos of Kitty drinking water in the basement.
The object detection can handle the bowls being moved about, and is probably ~90% accurate at telling the cats apart. Luckily, there is video evidence of every interaction, so if it identifies the wrong cat in a video, that video can become training data to do better next time.
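If anyone does something similar, turning a misidentified clip into training data is mostly a matter of dumping frames out of it for relabeling. A minimal sketch (the paths below are made up, not the repo’s layout):

```python
import cv2
from pathlib import Path

# Hypothetical paths: grab every 15th frame of a misidentified clip so it can be relabeled
video = Path("media/CatEatingFood/Basement/Sammy/example_clip.mp4")
out_dir = Path("new_training_images")
out_dir.mkdir(exist_ok=True)

cap = cv2.VideoCapture(str(video))
idx = 0
while True:
    ok, frame = cap.read()
    if not ok:
        break
    if idx % 15 == 0:
        cv2.imwrite(str(out_dir / f"{video.stem}_{idx:05d}.jpg"), frame)
    idx += 1
cap.release()
```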
Sample Output
Once the raw event data is in Home Assistant, I create a few ‘template sensors’ to aggregate it (a rough example is sketched below the screenshot) and use apex-charts to graph it. Here’s a sample page:
(Yes, Black Cat is sick and not eating. Kitty loves her water, and the other cats prefer a water bowl that is not covered by a camera.)
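The post doesn’t include my actual template sensors, but as one hypothetical example (entity and sensor names are made up), a trigger-based template sensor that counts how many times a cat started eating today could look roughly like this in Home Assistant YAML:

```yaml
# Hypothetical example - entity names and structure are stand-ins, not my actual config
template:
  - trigger:
      - platform: state
        entity_id: binary_sensor.sammy_eating_food   # the binary_sensor created via MQTT Discovery
        to: "on"
        id: meal
      - platform: time
        at: "00:00:00"
        id: reset
    sensor:
      - name: "Sammy Meals Today"
        state: "{{ 0 if trigger.id == 'reset' else (this.state | int(0)) + 1 }}"
```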
How it works
There are three ‘parts’ to this program, which interact with each other over MQTT:
- yolo2mqtt.py - Runs image recognition on camera streams and posts object detections to MQTT
- interactionTracker.py - Listens for object detections on MQTT and checks whether any objects overlap in a way that satisfies an interaction rule; when one does, it raises an event that Home Assistant picks up via MQTT Discovery (see the sketch after this list).
- recordingManager.py - Listens for events on MQTT and records video of the interaction to disk
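To give a rough idea of what moves over MQTT (the topic names and payloads here are illustrative only, not the project’s actual schema), the flow looks something like this:

```python
import json
import paho.mqtt.client as mqtt

client = mqtt.Client()  # paho-mqtt 1.x style; 2.x needs mqtt.Client(mqtt.CallbackAPIVersion.VERSION2)
client.connect("homeassistant.local", 1883)  # hypothetical broker address
client.loop_start()

# 1) yolo2mqtt.py publishes the detections it sees on each camera (illustrative topic/payload)
client.publish("yolo2mqtt/detections/basement", json.dumps({
    "objects": [
        {"name": "Sammy",    "confidence": 0.91, "box": [102, 240, 310, 455]},
        {"name": "FoodBowl", "confidence": 0.88, "box": [180, 400, 330, 500]},
    ],
}))

# 2) interactionTracker.py announces the interaction as a binary_sensor via
#    Home Assistant MQTT Discovery, then toggles a state topic while the rule holds
client.publish("homeassistant/binary_sensor/sammy_eating_food/config", json.dumps({
    "name": "Sammy Eating Food",
    "unique_id": "sammy_eating_food",
    "state_topic": "yolo2mqtt/events/sammy_eating_food",
    "payload_on": "ON",
    "payload_off": "OFF",
}), retain=True)

client.publish("yolo2mqtt/events/sammy_eating_food", "ON")   # rule satisfied: overlap held for 5 s
# ... recordingManager.py sees the event and starts saving video ...
client.publish("yolo2mqtt/events/sammy_eating_food", "OFF")  # overlap gone for 5 s
```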
Ideally, the first and third parts (yolo2mqtt.py and recordingManager.py) could be replaced by Frigate. However, when I initially looked into Frigate I couldn’t immediately find information on training a custom model, and the model that ships with Frigate could not reliably tell the difference between a cat and a person, so I wasn’t overly hopeful I could get a model to tell specific cats apart. Instead, I made my own, less polished, less efficient version.
Where to get it
I’ve put the code up on github: https://github.com/cobryan05/Yolo2Mqtt.
I’d consider it somewhere after ‘alpha’ but before ‘beta.’ You should be able to get it to work, and it should be fairly reliable, but don’t expect a super polished experience. It’s also not as well optimized as, say, Frigate. I run three cameras with object detection at 1 fps on the CPU, and it uses about 15% of my Ryzen 7 5700G.
For this to be useful you will really want to train your own YOLO model on your own dataset. There is plenty of easily accessible information on how to do this. You can get basic recognition working with a remarkably small amount of training data (tens of images), though you’ll want a lot more if you need to differentiate between several similar-looking animals (or delivery trucks, or whatever you are using this to detect).
It runs in a Docker container. The container currently runs on the CPU, since my HA box does not have a GPU and I could not test anything else. It should be easy to switch to GPU, but this is untested (change the Dockerfile to use the ‘cuda’ version of PyTorch, and change the yolo ‘device’ to ‘cuda’ in config.yml).