Local realtime person detection for RTSP cameras

Yep, and it works perfectly with my other Unifi cameras, across multiple zones.

Huh. I'm not really sure then. Maybe the Dafang implementation is buggy?

Yeah, I'll have to check that GitHub repo and see if there are any discussions on the frame rate. Either way, hopefully your PR solves the problem :smiley:

I added a Ko-fi link for anyone who wants to contribute to the project because several of you asked for it. Thanks for your support.

@cjackson234 @scstraus @Kyle


I seem to be getting notifications for very small regions when my min_person_area is something absurd like 2500.

Is this conditional backwards? I'm not sure if I'm understanding it correctly, but wouldn't it want to be flipped to only exit the loop and continue if it's not smaller?

Also, is there a chance the detected area could be added to the label here, to help with debugging person sizes?

The conditional looks right. It should continue at that point to avoid reporting a person that is too small. Are you sure you have the minimum set on the correct region?
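
For reference, the intent of the check is roughly the following (a minimal sketch with illustrative names, not the actual Frigate code): skip any detection whose bounding-box area is below the region's minimum, and report the rest.

```python
# Minimal sketch of the intended size filter (illustrative, not the actual
# Frigate source). Detections smaller than the region's minimum area are
# skipped so only sufficiently large people get reported.

def filter_detections(detections, min_person_area=2500):
    """Keep only detections whose bounding-box area meets the minimum."""
    kept = []
    for det in detections:
        area = (det['xmax'] - det['xmin']) * (det['ymax'] - det['ymin'])
        if area < min_person_area:
            continue  # too small: skip it instead of reporting it
        kept.append(det)
    return kept

# A 40x50 box (area 2000) is dropped; a 60x60 box (area 3600) is kept.
print(filter_detections([
    {'xmin': 0, 'ymin': 0, 'xmax': 40, 'ymax': 50},
    {'xmin': 0, 'ymin': 0, 'xmax': 60, 'ymax': 60},
]))
```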

Adding the person area to the label should be simple enough. It will help me keep track of that request if you create a github issue for it.
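
Until that lands, here is a rough idea of what including the area in the drawn label could look like (an OpenCV sketch with made-up variable names, not the project's actual drawing code):

```python
import cv2
import numpy as np

def draw_detection(frame, det, score):
    """Draw a bounding box whose label includes the box area, as a debugging aid."""
    xmin, ymin, xmax, ymax = det['xmin'], det['ymin'], det['xmax'], det['ymax']
    area = (xmax - xmin) * (ymax - ymin)
    label = f"person {score:.0%} area={area}"
    cv2.rectangle(frame, (xmin, ymin), (xmax, ymax), (0, 255, 0), 2)
    cv2.putText(frame, label, (xmin, max(ymin - 5, 10)),
                cv2.FONT_HERSHEY_SIMPLEX, 0.5, (0, 255, 0), 1)
    return frame

# Example on a blank frame.
frame = np.zeros((480, 640, 3), dtype=np.uint8)
draw_detection(frame, {'xmin': 100, 'ymin': 120, 'xmax': 180, 'ymax': 300}, 0.87)
```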

Thanks! I've submitted 2 issues. One for the enhancement, and one for the issue I'm encountering. Sorry for adding so many lately.

It has my config listed, which I believe is correct? https://github.com/blakeblackshear/frigate/issues/43

What is "Unable to grab frame" indicative of? I was able to solve my queue full problem, but now, after about 15-60 seconds, I get a blast of about 25 lines of "Unable to grab frame". Is it an FPS problem or a bitrate problem?

That means OpenCV returned an error code for a single frame. It could be anything, but as long as it clears up on its own, I would just ignore it. Probably just a few corrupt frames.

Hmm, well it usually results in it terminating the capture process and restarting on that RTSP feed. Now I'm wondering if it's just a poor-quality feed coming from the Wyze camera.

That would be my guess. It's doing exactly what it should when the RTSP feed is failing. After a certain number of failed frames, it establishes a completely new capture process.
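
In rough terms, the recovery logic is: count consecutive read failures and rebuild the capture once they pass a threshold. A simplified single-process sketch (the real project restarts a separate capture process, and the threshold and URL here are assumptions):

```python
import time
import cv2

RTSP_URL = "rtsp://camera.local/stream"  # placeholder URL
MAX_FAILED_FRAMES = 25                   # threshold is an assumption

def capture_loop(url=RTSP_URL):
    cap = cv2.VideoCapture(url)
    failed = 0
    while True:
        ret, frame = cap.read()
        if not ret:
            failed += 1
            print("Unable to grab frame")
            if failed >= MAX_FAILED_FRAMES:
                # Too many consecutive failures: tear the capture down
                # and establish a brand new connection.
                cap.release()
                time.sleep(2)
                cap = cv2.VideoCapture(url)
                failed = 0
            continue
        failed = 0
        # ... hand the frame off to detection here ...
```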

I know you responded to this yesterday, but I think in its current form the conditional is behaving as a filter on the largest allowed size.

Just as a tip in case anyone needs it in the future, I was able to get my Dafang Wyze camera to work using the FixedQP video format at 1600x900, a 5000 bitrate, and 15fps, with motion detection and audio turned off. With these settings, the feed has only corrupted/terminated 4 times in 24 hours, which is a massive improvement. For my Unifi cameras, I'm using the 1280 resolution with 5fps. Thanks!


Great project.

@blakeblackshear have you looked at feeding streaming video packets (h264) directly into a TF model? I wonder if the camera's built-in hardware compression to h264 could be leveraged to reduce processing time and cut CPU and memory requirements.

1. If there is little change between frames, there wouldn't be much new data for the TF model to process. Only some of the DNN parameters would have to be recalculated.
2. If the TF model could process the media stream directly, there wouldn't be a need for the intermediate step of taking image snapshots from the video feed. This could save CPU and memory resources.
3. Latency and packet loss. If there are occasional packet delays or losses, the TF model could potentially learn about them and correct for these errors. Denoising is an area of much recent progress in DNNs.
4. Correction for low light (night time or foggy weather). With more compute resources available, it might be possible to apply additional DNN layers to improve detection in low-light images.

I haven't thought of that, but it would require training a completely new model from scratch as the shape of the data would look completely different. The challenge is probably that all the good training data is currently images. Have you seen anyone do this?

Even if it were possible, I'm not sure the ROI would be very high. Detection currently takes <1s, and with hardware-accelerated h264 decoding the CPU usage is much lower as well.

Apparently I can't buy or ship a Coral to Australia. WTF???
Anyone want to buy one for me? I'll pay you back through PayPal.

I have 7 cameras and would ideally like to do 11 regions (or more) in total. I can't seem to get even 5 cameras to be stable without filling the queue. I have them turned down to 4 FPS, and I'm not sure how else I can optimize.

Using Windows (or OSX) computers, with VirtualBox (USB3 passthrough) to share the USB device with the VM… Hmmm.

I would start by trying to see what your inference times are. If you are under 10ms (I am averaging 7-8ms), you should be able to do 100 regions per second. With 7 cameras at 4fps, you should be able to do 3 regions per camera and still stay under 100 per second. If not, there is probably some overhead with the way VirtualBox passes the USB through, or another reason your machine may not be able to max out the USB speed of the Coral.
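
To make that arithmetic explicit, a quick back-of-the-envelope check (the 10ms figure is the worst-case assumption from above):

```python
# Back-of-the-envelope throughput check using the numbers above.
inference_ms = 10                              # assumed worst-case per-region time
budget = 1000 / inference_ms                   # ~100 regions per second

cameras = 7
fps = 4
regions_per_camera = 3
needed = cameras * fps * regions_per_camera    # 7 * 4 * 3 = 84 regions per second

print(f"budget: {budget:.0f} regions/s, needed: {needed} regions/s")
print("fits within budget" if needed <= budget else "over budget")
```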

Thanks. Perhaps a stupid question but how can I check my inference times?

I wrote a quick benchmarking script and included it in the repo here: https://github.com/blakeblackshear/frigate/blob/8218ea569974b83a470b40bf684319a9cac5b05f/benchmark.py

It is not in any of the published Docker images, so you will need to check out the repo and build it yourself. Not sure how technical you are.
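
For anyone who wants a quick sanity check without building the image, the idea behind such a benchmark is simply to time repeated invocations of the model on the Coral. This sketch is not the repo's benchmark.py; it uses the tflite_runtime Edge TPU delegate, and the model filename is an assumption:

```python
import time
import numpy as np
from tflite_runtime.interpreter import Interpreter, load_delegate

MODEL_PATH = "mobilenet_ssd_v2_coco_quant_postprocess_edgetpu.tflite"  # assumed filename

# Load the model onto the Edge TPU via the libedgetpu delegate.
interpreter = Interpreter(
    model_path=MODEL_PATH,
    experimental_delegates=[load_delegate("libedgetpu.so.1")])
interpreter.allocate_tensors()
input_details = interpreter.get_input_details()[0]

# Random input matching the model's expected shape and dtype.
dummy = np.random.randint(0, 256, size=input_details['shape'],
                          dtype=input_details['dtype'])

# Time 100 invocations and report the average.
times = []
for _ in range(100):
    start = time.perf_counter()
    interpreter.set_tensor(input_details['index'], dummy)
    interpreter.invoke()
    times.append(time.perf_counter() - start)

print(f"average inference: {1000 * sum(times) / len(times):.1f} ms")
```

If the average comes back well above 10ms, that points to the USB passthrough (or the host's USB bus) as the bottleneck rather than the number of regions.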