I’m not sure how much more I can help, but I will explain my setup:
- The camera detects motion, takes a snapshot, and sends the snapshot as a jpeg image file to a remote directory. It is also set up to ignore further motion detection for a period of time so as not to get a rapid-fire sequence of motion events.
- The new snapshot image file is detected in the remote directory and Deepstack is called to analyze the image and provide results.
- Deepstack’s results are then further filtered, e.g. by what kinds of objects were detected, confidence level, etc., and depending on these results, HA can be notified.
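To make the detect-and-filter step above concrete, here is a minimal Python sketch. It is not my actual setup: the host/port and the helper names are my own assumptions, though the `/v1/vision/detection` endpoint and the `predictions` response shape are Deepstack's documented object-detection API.

```python
def detect_objects(image_path, deepstack_url="http://localhost:5000/v1/vision/detection"):
    """Send a snapshot to Deepstack and return its list of predictions.

    Needs the `requests` package; the URL and port are assumptions --
    point this at wherever your Deepstack instance is listening.
    """
    import requests  # imported here so the pure filtering code below works without it
    with open(image_path, "rb") as f:
        resp = requests.post(deepstack_url, files={"image": f}, timeout=10)
    return resp.json().get("predictions", [])


def filter_predictions(predictions, wanted_labels, min_confidence=0.6):
    """Keep only detections of interesting object types above a confidence floor."""
    return [p for p in predictions
            if p["label"] in wanted_labels and p["confidence"] >= min_confidence]


# Hardcoded example in Deepstack's response format, just for illustration:
sample = [
    {"label": "person", "confidence": 0.91, "x_min": 10, "y_min": 20, "x_max": 110, "y_max": 300},
    {"label": "car", "confidence": 0.85, "x_min": 200, "y_min": 50, "x_max": 400, "y_max": 200},
    {"label": "person", "confidence": 0.42, "x_min": 300, "y_min": 10, "x_max": 340, "y_max": 90},
]
kept = filter_predictions(sample, {"person"}, min_confidence=0.6)
print([p["label"] for p in kept])  # only the high-confidence person survives
```

The low-confidence person and the car are dropped; only detections you actually care about would go on to trigger a notification.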
I use a Windows-based tool called AI Tool, which detects the new snapshot image file; it is the one that calls Deepstack with the image and filters Deepstack’s results. One of its filtering capabilities I like is that I can set up masks so objects in certain areas of the snapshot image are ignored. After filtering, AI Tool sends an MQTT message to HA. Once HA gets the message, I use an automation to notify me of certain objects being detected.
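The mask filtering and MQTT handoff could be approximated like this. The zone coordinates, topic name, and payload shape here are all my own inventions (AI Tool's real mask format is different), so treat this as a sketch of the idea, not the tool:

```python
import json


def box_center(p):
    """Center point of a Deepstack-style bounding box."""
    return ((p["x_min"] + p["x_max"]) / 2, (p["y_min"] + p["y_max"]) / 2)


def in_mask(p, masks):
    """True if the detection's center falls inside any ignore rectangle."""
    cx, cy = box_center(p)
    return any(x1 <= cx <= x2 and y1 <= cy <= y2 for (x1, y1, x2, y2) in masks)


def build_mqtt_payload(camera, predictions, masks):
    """Drop masked detections and build a JSON payload for HA to consume."""
    kept = [p for p in predictions if not in_mask(p, masks)]
    return json.dumps({
        "camera": camera,
        "detections": [{"label": p["label"], "confidence": p["confidence"]} for p in kept],
    })


# A mask covering the left edge of the frame (e.g. a busy sidewalk to ignore):
masks = [(0, 0, 150, 480)]
preds = [
    {"label": "car", "confidence": 0.88, "x_min": 20, "y_min": 100, "x_max": 120, "y_max": 200},    # masked out
    {"label": "person", "confidence": 0.93, "x_min": 300, "y_min": 50, "x_max": 360, "y_max": 250},  # kept
]
payload = build_mqtt_payload("driveway", preds, masks)
print(payload)
# Publishing would then be something like paho-mqtt's
# client.publish("ai_tool/driveway", payload), and an HA automation with an
# MQTT trigger on that topic fires the notification.
```

On the HA side the automation just triggers on the MQTT topic and inspects the JSON payload, which is essentially what my setup does.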
Rob Cole has tried to implement some of the same capabilities as AI Tool using various custom HA integrations (again using Deepstack).
I think your bigger question is: if I were to use Yolov7, how could I do all of the above? The answer is I really don’t know.
Mostly yes for object detection, but see this link, which says they have done something to remove some of the dependencies on the Yolov5 codebase.
I should also mention that Deepstack supports person detection and custom model training.