Using YOLO Object Recognition for detecting objects interacting with each other - Track pet feeding habits using RTSP cameras

Hey everyone. I’ve been working on a project to address an issue in my home, and it finally works well enough that I am not motivated to continue putting too much work into it, so figured I’d share what I have :slight_smile:

The Problem

There are four cats in my household that all free feed.

When the cats go to the vet, we are unable to answer simple questions about whether each cat is eating, drinking, and using the litter box. The best we can say is "Food and water disappear, and the litter gets soiled, but we can't say which cats are responsible."

The Solution

It’d be nice if I could just take a quick look at Home Assistant and have this information available. It should be able to tell the difference between the cats, and should be able to handle the food and water bowls being moved about.

I set up cheap WiFi cameras pointing at the cats' food bowls and litter boxes. I then used images from these cameras to train a YOLOv5 model to recognize the individual cats, the food and water bowls, and their litter boxes. (We use distinct bowls for the food versus the water.)

I then wrote a program to run this trained model on an RTSP stream from these cameras, and to raise events if certain objects overlap. For example, there are rules such as “if Sammy’s detection box overlaps the Food Bowl detection box by 50% for 5 seconds then raise event Sammy Eating Food until they no longer overlap for 5 seconds”. These events show up as binary_sensors in Home Assistant.
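The overlap-with-hysteresis rule can be sketched like this (a simplified illustration, not the actual repo code; the box format, names like "Sammy", and the `InteractionTracker` class are examples for this post):

```python
# Sketch of the overlap rule described above. Boxes are (x1, y1, x2, y2).
import time

def overlap_fraction(a, b):
    """Fraction of box a's area that is covered by box b."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    if ix2 <= ix1 or iy2 <= iy1:
        return 0.0
    inter = (ix2 - ix1) * (iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    return inter / area_a

class InteractionTracker:
    """Raise an event once boxes overlap >= threshold for min_secs,
    and clear it only after they stop overlapping for min_secs."""
    def __init__(self, threshold=0.5, min_secs=5.0):
        self.threshold = threshold
        self.min_secs = min_secs
        self.since = None        # when the current overlap streak began
        self.clear_since = None  # when the current non-overlap streak began
        self.active = False

    def update(self, box_a, box_b, now=None):
        now = time.monotonic() if now is None else now
        overlapping = overlap_fraction(box_a, box_b) >= self.threshold
        if overlapping:
            self.clear_since = None
            if self.since is None:
                self.since = now
            if not self.active and now - self.since >= self.min_secs:
                self.active = True   # raise e.g. "Sammy Eating Food"
        else:
            self.since = None
            if self.clear_since is None:
                self.clear_since = now
            if self.active and now - self.clear_since >= self.min_secs:
                self.active = False  # clear the event
        return self.active
```

The hysteresis on both edges is what keeps a cat briefly walking past the bowl, or a one-frame detection glitch, from toggling the binary sensor.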

Video of all interactions is also saved. In Home Assistant's Media browser I can filter through videos by browsing, e.g., "Videos, CatDrinkingWater, Basement, Kitty" to get all of the videos of Kitty drinking water in the basement.

The object detection can handle the bowls being moved about, and is probably ~90% accurate in telling the difference between the cats. Luckily there is video evidence of all interactions, so if it gets the cat wrong in a video then that video can become training data for it to do better next time.

Sample Output

Once the raw event data is in Home Assistant I create a few ‘template sensors’ to aggregate some data, and use apex-charts to graph it. Here’s a sample page:
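For reference, one way to build such an aggregate (entity names here are hypothetical; use whatever MQTT discovery creates for you) is Home Assistant's `history_stats` platform, which can sum the daily "on" time of a binary sensor:

```yaml
# Hypothetical example: total time Sammy spent eating today,
# derived from the discovered binary_sensor.
sensor:
  - platform: history_stats
    name: Sammy Eating Today
    entity_id: binary_sensor.sammy_eating_food
    state: "on"
    type: time
    start: "{{ today_at() }}"
    end: "{{ now() }}"
```

The resulting sensor can then be fed straight into an apex-charts card.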

(Yes, Black Cat is sick and not eating :frowning: . Kitty loves her water, and the other cats prefer a water bowl that is not covered by a camera)

How it works

There are three ‘parts’ to this program, which interact with each other over MQTT:

  1. yolo2mqtt.py - Runs image recognition on camera streams and posts object detections to MQTT
  2. interactionTracker.py - Listens for object detections on MQTT and checks whether any objects overlap in ways that satisfy the conditions to raise an event, which can then be picked up via Home Assistant MQTT Discovery.
  3. recordingManager.py - Listens for events on MQTT and records the video to disk
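The glue between the parts is just small MQTT messages. A rough sketch of what a detection message could look like (topic layout and field names here are illustrative, not necessarily the repo's actual schema):

```python
# Illustrative sketch of detection messages passed between the parts.
# Field names and topic layout are examples, not the repo's real schema.
import json

def detection_payload(camera, label, confidence, box):
    """Serialize one YOLO detection for publishing (e.g. with paho-mqtt)."""
    return json.dumps({
        "camera": camera,          # e.g. "basement"
        "label": label,            # e.g. "Sammy" or "FoodBowl"
        "confidence": confidence,  # detection confidence, 0..1
        "box": box,                # [x1, y1, x2, y2] in pixels
    })

def parse_detection(payload):
    """Inverse of detection_payload, as the tracker side would use it."""
    msg = json.loads(payload)
    return msg["camera"], msg["label"], msg["confidence"], msg["box"]

# A publisher would then do something like:
#   client.publish(f"yolo2mqtt/detections/{camera}",
#                  detection_payload(camera, label, conf, box))
```

Keeping the parts decoupled over MQTT like this is also what lets a tool such as Frigate stand in for the detection and recording ends, in principle.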

Ideally, parts 1 and 3 could be replaced by Frigate. However, when I initially looked into Frigate I couldn’t immediately find information on training a custom model, and the model that comes with Frigate could not reliably tell the difference between a cat and a person, so I wasn’t overly hopeful I could train a model to tell the difference between specific cats… so I made my own, less polished, less efficient version :slight_smile:

Where to get it

I’ve put the code up on github: https://github.com/cobryan05/Yolo2Mqtt.

I’d consider it somewhere after ‘alpha’ but before ‘beta.’ You should be able to get it to work, and it should be fairly reliable, but don’t expect a super polished experience. It’s also not as well optimized as, say, Frigate. I run three cameras with object detection at 1 fps on my CPU, and it uses about 15% of my Ryzen 7 5700G.

For this to be useful you really will want to train your own YOLO model on your own dataset. There is plenty of easily accessible information on how to do this. You can get basic recognition working with a remarkably small amount of training data (tens of images), though you’ll want a lot more if you need to differentiate between several similar-looking animals (or delivery trucks or whatever you are using this to detect).
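If you label your own images, YOLO expects one text file per image with normalized, center-based box coordinates. A tiny helper for converting the pixel boxes most annotation tools give you into that label format (a sketch; the function name is just an example):

```python
# Convert a pixel-space box to a YOLO label line:
# "<class_id> <x_center> <y_center> <width> <height>", all normalized to 0..1.
def to_yolo_label(class_id, x1, y1, x2, y2, img_w, img_h):
    xc = (x1 + x2) / 2 / img_w
    yc = (y1 + y2) / 2 / img_h
    w = (x2 - x1) / img_w
    h = (y2 - y1) / img_h
    return f"{class_id} {xc:.6f} {yc:.6f} {w:.6f} {h:.6f}"
```

Each class (each individual cat, the food bowl, the water bowl, the litter box) gets its own `class_id` in the dataset's class list.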

It runs in a docker container. The container currently runs on the CPU, since my HA box does not have a GPU so I could not test otherwise. It should be easy to switch to a GPU, but it’s untested (change the Dockerfile to use the ‘cuda’ version of PyTorch, and change the yolo ‘device’ to ‘cuda’ in config.yml).


I just created an account here so that I could say: This is awesome! And I love it! Cloning your repo now to check it out! Good job and a very creative solution to check on your cats’ eating/drinking habits :slight_smile:


Although this was 100% made to fulfill a personal need, it was still a little bit disappointing to not get a single comment here after all of the work I put into it. Thank you for the kind words :slight_smile:

Let me know if you try to get it running and need any help!


Can you upgrade YOLOv5 to YOLOv7?
The latest version performs much better.

I found this post on a Google search for “frigate” “yolo” (2nd link down). I’m looking to do something very similar, counting vehicles instead of cats. I can’t pull RTSP from the IP cams directly, because of firewalls.

Instead, I’m cronning the scp of video files to a server with a public ip, then rsyncing them down to a homelab server, where training and testing will take place. (A wise engineer once said: a GPU in the cloud is worth twenty in the home).

I’m using the cheap Yi 1080p indoor cameras, with an open-source firmware that permits easier admin and custom behaviors.

I noticed you’re using YOLOv5. Don’t know if it matters, but from the same search I found a similar project, within this forum, /t/manual-yolo-v7-python-integration-in-ha/477500/3, which utilises YOLOv7.

I’m about to dive into both projects and learn what’s applicable. I appreciate you posting about yours. The accuracy level you attained (~90% cat specificity!) is encouraging, especially given that the image area you’re working with is a small fraction of each frame.

Just out of curiosity, what camera make & model(s) are you using? While the Yi ip cams I have are cheap, they deliver poor image quality.

Cheers,
Kevin

Hey Kevin,
I’m just using three $20 Wyze Cam v2’s with the Dafang Hacks firmware (GitHub - EliasKotlyar/Xiaomi-Dafang-Hacks). Unfortunately they don’t sell these cameras anymore.

I actually had originally planned to do something very similar to you. Saving a video to the camera is far better quality than streaming over RTSP.

The image quality isn’t always amazing, but YOLO was trained on these low-quality images and can at times do an amazing job picking cats out of noisy frames. I do get false detections where it will think some other object in the scene is a cat, but since those false detections rarely overlap with the target bowls/litter it hasn’t been a huge issue for my use case. You can see in my attached graphs below that the spikes in ‘time’ are usually from a false detection overlapping a target box.

YOLOv7 sounds interesting! Unfortunately the cat that I wrote this for is no longer with us, and the system is running well unattended, so I don’t foresee myself doing more development work on this. I’m very happy to help others get this running and would be happy to discuss anything about my experience with it. But since it is working for me at the moment I’m not going to risk breaking it by touching the code myself :slight_smile:

I just wanted to thank you for the work you did on this, and say I’m sorry for the loss of Black Cat.

To use this as a therapy session: my cat ran out in September and refused to come home. He did everything imaginable to NOT be caught. When I finally did catch him, he was too far gone to be saved and passed away in my arms en route to the ER. After insisting on an autopsy, they found he had an underlying bone marrow condition exacerbated by the starvation and dehydration, and I truly think he ran away knowing he was going to die / was very sick.

Riddled with guilt that I missed the signs in my multi-cat household, I’ve been obsessively researching ways to track these exact metrics and stumbled on this.