Manual YOLOv7 (Python) integration in HA

I’ve been tinkering with different object detection integrations (Frigate, Deepstack, Viseron, Doods), but none of them really satisfied me in terms of the quality of nighttime person detection.
When YOLOv7 came out, I found it to be the most accurate of them; besides, it works perfectly with an Nvidia GPU via CUDA.

I’m looking for a way to integrate it seamlessly into HA.
What I need is to take the output of detect.py running against an IP camera’s RTSP address:

python detect.py --weights yolov7-e6e.pt --conf 0.60 --source rtsp://192.168.0.4:554/axis-media/media.amp?streamprofile=mobile

which looks like:

0: 1 car, Done. (117.6ms) Inference, (4.2ms) NMS
0: 1 car, Done. (150.5ms) Inference, (3.6ms) NMS
0: Done. (123.8ms) Inference, (1.0ms) NMS
0: Done. (126.3ms) Inference, (1.0ms) NMS
0: Done. (105.5ms) Inference, (1.0ms) NMS
0: 1 person, Done. (142.7ms) Inference, (5.2ms) NMS
0: 1 person, Done. (93.9ms) Inference, (3.0ms) NMS
0: 1 person, Done. (92.2ms) Inference, (3.0ms) NMS

and create sensors showing ON/OFF (or 1/0) to indicate detection of a particular object, then send a picture of it via Telegram.

Kindly asking the community to help with:

  1. Proper parsing of the detect.py results and converting them into parameters HA can understand. I need to detect only persons & cars (--classes 0 2 for YOLO, since in COCO class 0 is person and class 2 is car); see the parsing sketch after this list.
  2. Advice on how to grab a picture of the detected object and use it in HA.
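
Not a finished solution, just a minimal sketch of how that parsing could look, assuming detect.py is run from the YOLOv7 repo and its per-frame lines go to stdout; the HA side is only indicated in a comment:

import re
import subprocess

# Run detect.py restricted to persons and cars (COCO classes 0 and 2).
cmd = [
    "python", "detect.py",
    "--weights", "yolov7-e6e.pt",
    "--conf", "0.60",
    "--classes", "0", "2",
    "--source", "rtsp://192.168.0.4:554/axis-media/media.amp?streamprofile=mobile",
]

# Per-frame lines look like: "0: 1 car, Done. (117.6ms) Inference, (4.2ms) NMS"
pattern = re.compile(r"(\d+)\s+(person|car)")

proc = subprocess.Popen(cmd, stdout=subprocess.PIPE,
                        stderr=subprocess.STDOUT, text=True)
for line in proc.stdout:
    counts = {"person": 0, "car": 0}
    for num, label in pattern.findall(line):
        counts[label] += int(num)
    # Turn counts into ON/OFF states for HA, e.g. via MQTT binary sensors:
    # mqtt.publish("yolo/person", "ON" if counts["person"] else "OFF")
    print(counts)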

Thanks in advance!

Hi,
Thanks for the update on the new version of YOLO. I played with v5 a couple of years ago, and it was OK (but I expected more). I haven’t done anything with it since then.

May I ask what GPU you use?

I’m trying to think of an answer to your question. A good number of HA people use YOLOv5 via Deepstack, which provides a REST interface.

When running YOLO via Python and getting the results as you have shown, one idea is to create a command line sensor for cars that runs the command as shown, but additionally pipes the results through grep -c car. Do likewise with another sensor for persons. In your example, the output is 2 for car and 3 for person (it would be 0 with none present). A sketch of a helper such a sensor could call is below.
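
To make that concrete, here’s a rough sketch of a small helper (count_objects.py is a hypothetical name) that a command line sensor could invoke. Note that against a live RTSP source detect.py runs indefinitely, so this assumes a single snapshot image as the source:

# count_objects.py -- hypothetical helper for a command line sensor.
# Prints the number of detect.py output lines mentioning the given label,
# mimicking `... | grep -c car`. Usage: python count_objects.py car
import subprocess
import sys

label = sys.argv[1]  # "car" or "person"
result = subprocess.run(
    ["python", "detect.py", "--weights", "yolov7-e6e.pt",
     "--conf", "0.60", "--source", "snapshot.jpg"],
    capture_output=True, text=True,
)
print(sum(label in line for line in result.stdout.splitlines()))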

A more elaborate idea would be to see if anyone has built a TorchServe-based setup for YOLOv7. Here is a TorchServe example I played with for YOLOv5. This provides a REST interface, and as such it can easily be incorporated into HA as a REST sensor. I went ahead and googled around but didn’t find anything like this for YOLOv7. However, I did come across YOLOv7 support for something called a Triton Inference Server, which may do the same thing (I don’t really know).
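
For reference, querying a TorchServe model is just an HTTP POST; this sketch assumes a model registered under the hypothetical name yolov7 on TorchServe’s standard inference port 8080 (the response format depends on the model’s handler):

import requests

# TorchServe's inference API: POST an image to /predictions/<model_name>.
with open("snapshot.jpg", "rb") as f:
    r = requests.post("http://localhost:8080/predictions/yolov7", data=f)
print(r.json())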

Hi Tommy!

I’m using a GeForce GTX 1650 on the machine that runs HA.
I am looking for a rational sequence of steps to reach my final goal of accurate object detection.
I initially thought I could run inference on the RTSP streams, but now that seems to be a pretty tough job for the server (I have 6 cameras, and 6 constant streams would be overkill).

  • I can trigger a command line sensor, as you suggested, by an HTTP GET request coming from the camera’s motion detection. Then I need to process the RTSP stream and check whether there’s an object of interest (let’s say a person) at a target confidence level (--conf 0.60).

  • Then I need to save the image when an object of interest is found. Is there a way to save a separate image per camera, overwriting the JPEGs instead of producing exp2/exp3/exp4 folders and so on? (See the sketch after this list.)

  • If I manage the previous point, I can send the triggered image via Telegram, and that would be at least an initial viable solution.
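
On the second bullet: detect.py (like YOLOv5’s) accepts --project, --name and --exist-ok, and with --exist-ok it reuses the same output folder, so a snapshot with the same filename is overwritten instead of piling up in new exp folders. A sketch, with the snapshot path and camera name as placeholders:

import subprocess

# One fixed output folder per camera; --exist-ok reuses it so saved JPEGs
# are overwritten instead of landing in fresh exp2/exp3/... folders.
def detect_snapshot(camera: str, snapshot: str) -> None:
    subprocess.run([
        "python", "detect.py",
        "--weights", "yolov7-e6e.pt",
        "--conf", "0.60",
        "--classes", "0", "2",        # person, car
        "--source", snapshot,
        "--project", "runs/detect",
        "--name", camera,             # results land in runs/detect/cam1/
        "--exist-ok",
    ], check=True)

detect_snapshot("cam1", "/config/www/cam1_snapshot.jpg")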

Kindly asking you to help with this.
Thanks!

P.S. I think the Triton Inference Server is something different.
P.P.S. Does Deepstack use YOLOv5?

I’m not sure how much more I can help, but I will explain my setup:

  • The camera detects motion, takes a snapshot and sends it as a JPEG image file to a remote directory. It is also set up to ignore further motion for a period of time, so as not to get a rapid-fire sequence of motion events.
  • The new snapshot file is detected in the remote directory, and Deepstack is called to analyze the image and return results (a sketch of this step follows the list).
  • Deepstack’s results are then further filtered, based on, say, what kinds of objects were detected, confidence level, etc., and depending on these results HA can be notified.
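
Not AI Tool’s internals, just a rough sketch of that Deepstack step, assuming Deepstack listens on localhost port 80; the /v1/vision/detection endpoint and the predictions format come from the Deepstack docs:

import requests

# Send a snapshot to Deepstack and keep only the interesting predictions.
def detect_objects(image_path, wanted=("person", "car"), min_conf=0.60):
    with open(image_path, "rb") as image:
        response = requests.post(
            "http://localhost:80/v1/vision/detection",
            files={"image": image},
        ).json()
    return [
        p for p in response.get("predictions", [])
        if p["label"] in wanted and p["confidence"] >= min_conf
    ]

hits = detect_objects("snapshot.jpg")
if hits:
    # e.g. publish an MQTT message that HA listens for
    print("notify HA:", [p["label"] for p in hits])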

I use a Windows-based tool called AI Tool, which detects the new snapshot image file, calls Deepstack with the image, and filters Deepstack’s results. One of its filtering capabilities I like is that I can set up masks so certain areas of the snapshot are ignored for object detection. After filtering, AI Tool sends an MQTT message to HA. Once HA gets the message, I use an automation to notify me when certain objects are detected.

Rob Cole has tried to implement some of the same capabilities as AI Tool using various custom HA integrations (again using Deepstack).

I think your bigger question is: if I were to use YOLOv7, how could I do all of the above? The answer is I really don’t know.

Mostly yes for object detection, but see this link, which says they have done some work to remove some of the dependencies on the YOLOv5 codebase.
I should also mention that Deepstack supports Person detection and custom training.

For nighttime person detection, you could find or create a custom dataset and then train and test YOLOv5/v7 models on it; a rough training sketch is below. I suspect there wouldn’t be much difference in accuracy, but maybe some difference in speed. However, I’d recommend that the best course of action is to acquire better-quality nighttime images by choosing a suitable camera (with a good night mode) and experimenting with the image contrast to find the optimum settings for your model.
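
A rough sketch of what fine-tuning on such a dataset could look like, following the training example in the YOLOv7 README; the dataset YAML, epochs, batch size and run name here are placeholders:

import subprocess

# Fine-tune from pretrained weights on a hypothetical night-time dataset.
subprocess.run([
    "python", "train.py",
    "--weights", "yolov7.pt",
    "--data", "data/night_custom.yaml",   # your custom dataset config
    "--epochs", "100",
    "--batch-size", "16",
    "--img", "640", "640",
    "--name", "yolov7-night",
], check=True)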

@robmarkcole
You mean a custom-trained dataset in Deepstack, right?
I used DeepStack ExDark, but it made no difference compared to the standard dataset.
If Deepstack uses YOLOv5, I think I’ll give custom training a try.
It seems YOLOv5 vs. v7 makes no significant difference in quality; only speed may be higher in the latter.

I’ve tried the latest version of Deepstack, and the first thing I encountered while testing it through the Python script from the developer’s site was:

Traceback (most recent call last):
  File "D:/Python Stuff/deepstack/deepstack-test.py", line 7, in <module>
    for object in response["predictions"]:
KeyError: 'predictions'

It looks like the latest GPU version is buggy. I recall the last time I tested it: nothing has changed. Looks like it’s time to look into a native manual YOLO integration or the CodeProject.AI server. (A defensive check for this error is sketched below.)
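
For what it’s worth, a small guard against that KeyError: per the Deepstack docs, a failed call returns success: false with an error field and no predictions key at all:

import requests

with open("test.jpg", "rb") as image:
    response = requests.post(
        "http://localhost:80/v1/vision/detection",
        files={"image": image},
    ).json()

# Only touch "predictions" when the call actually succeeded.
if not response.get("success"):
    print("Deepstack error:", response.get("error", "unknown"))
else:
    for obj in response["predictions"]:
        print(obj["label"], obj["confidence"])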

@SuperMaximus check this comment KeyError: 'predictions' · Issue #240 · robmarkcole/HASS-Deepstack-object · GitHub

Another option: NVIDIA Jetson (On Device) - Roboflow

Looks like that’s not my case.
I’ve just done a fresh install of Deepstack with a newly deployed API key in a Docker container.

So guys, I finally decided to go with YOLOv7, as it doesn’t need training for my basic needs (person/car detection from IP cameras at nighttime).

Kindly asking for your help on how best to integrate it into HA.
My plan is to fire a motion detection trigger from my VMS (NX Witness) and send an HTTP GET request to HA (possibly via Node-RED).
After that, I need to make sure there is a person or a car before triggering an alarm. If it’s another object, ignore it.

My main concern is how to launch the python detect.py inference. I can keep launching it every 15 seconds for as long as the motion lasts, but then the question is how to avoid constantly sending alarm pictures (I will use notify.telegram) of the same object. A cooldown sketch is below.
I need to arrange the logic so it acts the same way Frigate does.
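
One way to handle the repeat-notification problem is a simple per-label cooldown, sketched here with a hypothetical should_notify helper; the 300-second window is just a placeholder to tune:

import time

COOLDOWN = 300  # seconds without re-notifying about the same label
last_alert = {}

def should_notify(label):
    # Allow a notification only if this label hasn't fired recently.
    now = time.monotonic()
    last = last_alert.get(label)
    if last is not None and now - last < COOLDOWN:
        return False
    last_alert[label] = now
    return True

# In the loop that runs detect.py while motion lasts:
# if person_detected and should_notify("person"):
#     send the snapshot via notify.telegram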

@robmarkcole
Is it difficult for you to refactor this handy API for YOLOv7?
Thanks for your feedback & support!

How do you even get YOLOv3, let alone v5 or v7, working in Frigate?

I think it’s impossible.
Frigate was made to work with TensorFlow.
But Frigate’s interface is beyond comparison; it’s the best.

This is why I like Frigate, but its nighttime performance really sucks.