It is the whole video feed, not specific to a region. You could verify by making your entire mask black. It shouldn’t detect any objects.
I’m having trouble with one camera. I can ffplay rtsp://172.16.16.192:554/onvif1 just fine, but Frigate keeps throwing this error:
On connect called
[rtsp @ 0x27e4060] Nonmatching transport in server reply
Traceback (most recent call last):
  File "detect_objects.py", line 134, in <module>
    main()
  File "detect_objects.py", line 79, in main
    cameras[name] = Camera(name, FFMPEG_DEFAULT_CONFIG, GLOBAL_OBJECT_CONFIG, config, prepped_frame_queue, client, MQTT_TOPIC_PREFIX)
  File "/opt/frigate/frigate/video.py", line 132, in __init__
    self.frame_shape = get_frame_shape(self.ffmpeg_input)
  File "/opt/frigate/frigate/video.py", line 55, in get_frame_shape
    frame_shape = frame.shape
AttributeError: 'NoneType' object has no attribute 'shape'
I thought the problem might be this message:
[rtsp @ 0x27e4060] Nonmatching transport in server reply
but I’m not sure how to figure out which transport it is supposed to use, or whether my config is even overriding it correctly.
I have these global ffmpeg arguments:
ffmpeg:
  global_args:
    - -hide_banner
    - -loglevel
    - panic
  hwaccel_args:
    - -hwaccel
    - vaapi
    - -hwaccel_device
    - /dev/dri/renderD128
    - -hwaccel_output_format
    - yuv420p
  input_args:
    - -avoid_negative_ts
    - make_zero
    - -fflags
    - nobuffer
    - -flags
    - low_delay
    - -strict
    - experimental
    - -fflags
    - +genpts+discardcorrupt
    - -vsync
    - drop
    - -rtsp_transport
    - tcp
    - -stimeout
    - '10000000'
    - -use_wallclock_as_timestamps
    - '1'
  output_args:
    - -vf
    - mpdecimate
    - -f
    - rawvideo
    - -pix_fmt
    - rgb24
and then this particular cam (1280x720, 15fps) is supposed to have:
cat_cam:
  ffmpeg:
    input: rtsp://172.16.16.192:554/onvif1
    global_args:
      - -hide_banner
      - -loglevel
      - info
    # hwaccel_args: []
    input_args:
      - -avoid_negative_ts
      - make_zero
      - -fflags
      - nobuffer
      - -flags
      - low_delay
      - -strict
      - experimental
      - -fflags
      - +genpts+discardcorrupt
      - -vsync
      - drop
      - -rtsp_transport
      - udp
      - -stimeout
      - '10000000'
      - -use_wallclock_as_timestamps
      - '1'
    # output_args: []
  take_frame: 5 # 15fps, so bring it down to 3
  regions:
    - size: 400
      x_offset: 280
      y_offset: 275
      # objects:
      #   person:
      #     min_area: 5000
      #     max_area: 1000000
      #     threshold: 0.5
    - size: 400
      x_offset: 680
      y_offset: 275
Any ideas?
I’m assuming you have tried tcp instead of udp, right? That error message usually means you have the wrong one specified.
Yup, tried tcp originally, then tried udp. Running ffmpeg from the command line with -rtsp_transport tcp gave me the same error, but udp works (I was able to stream to a local file).
Is there a chance that my config is wrong and the per-cam settings aren’t actually being passed through? I tried including all the parts (global_args, hwaccel_args, input_args and output_args) but still hit the same issue. I even tried changing the global ffmpeg settings to udp but still get the same error.
Edit: I wonder if it might be this? https://answers.opencv.org/question/120699/can-opencv-310-be-set-to-capture-an-rtsp-stream-over-udp/
OpenCV isn’t being used to capture the stream anymore unless you are still on the CPU version. I switched away from it for exactly these kinds of limitations. The logs will output the exact ffmpeg command for each camera so you can make sure your config is being applied.
OK thanks, good to know OpenCV isn’t part of the issue.
This seems to go wrong before that ffmpeg command is output, though. The ffmpeg commands usually do get printed (and everything works fine for the other two cameras when I run only those two), but in this case it starts up, throws the exception, and never gets to the point where it prints the ffmpeg command. The log I posted earlier is literally the first couple of lines after I start the Docker container, and nothing happens afterwards (I have to stop/restart the container).
I will probably debug a little bit further tonight (back at work) and go through the code to see if I can output the ffmpeg command earlier.
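If it helps, I’m guessing the full command gets assembled from the config pieces roughly like this, so printing it just before the stream is opened should show whether the per-cam input_args are being picked up. This is a hypothetical helper (build_ffmpeg_cmd and the argument order are my assumption, not code from the repo):

# hypothetical helper: rebuild the command frigate should be running for a
# camera so it can be printed and compared against what shows up in the logs
def build_ffmpeg_cmd(global_args, hwaccel_args, input_args, input_url, output_args):
    return (['ffmpeg'] + global_args + hwaccel_args + input_args +
            ['-i', input_url] + output_args + ['pipe:'])

print(' '.join(build_ffmpeg_cmd(
    ['-hide_banner', '-loglevel', 'info'],
    [],
    ['-rtsp_transport', 'udp', '-stimeout', '10000000'],  # trimmed for brevity
    'rtsp://172.16.16.192:554/onvif1',
    ['-f', 'rawvideo', '-pix_fmt', 'rgb24'])))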
You are right. I do use OpenCV to inspect the stream so I know how many bytes to pull for each frame. I wonder if I can do that another way.
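The probing step that fails is essentially this (a simplified sketch, not the exact source): OpenCV opens the stream with its own FFmpeg defaults, so the -rtsp_transport from the config never applies, and when the open fails read() returns None, which is exactly the AttributeError in your traceback.

import cv2

def get_frame_shape(source):
    # sketch of the current probe: OpenCV opens the stream itself, ignoring the
    # configured ffmpeg input_args (such as -rtsp_transport)
    video = cv2.VideoCapture(source)
    ret, frame = video.read()  # frame is None when the stream cannot be opened
    video.release()
    # shape is (height, width, 3); each raw rgb24 frame is height * width * 3 bytes
    return frame.shape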
Ah ok, thanks for checking!
Not the end of the world – it’s not a camera I am planning on using long term, but I guess it could still be bothersome to someone else.
Can you see if this command works and outputs the resolution correctly for you?
ffprobe -v error -select_streams v:0 -show_entries stream=width,height -of json "rtsp://172.16.16.192:554/onvif1"
It looks correct, yes:
$ ffprobe -v error -select_streams v:0 -show_entries stream=width,height -of json "rtsp://172.16.16.192:554/onvif1"
{
    "programs": [
    ],
    "streams": [
        {
            "width": 1280,
            "height": 720
        }
    ]
}
Ok. I will open an issue to use ffprobe instead of OpenCV.
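Something along these lines should work as a drop-in for the probing step (a sketch based on the ffprobe command above, not the final implementation):

import json
import subprocess

def get_frame_shape(source):
    # ask ffprobe for the stream dimensions instead of opening it with OpenCV
    ffprobe_cmd = ['ffprobe', '-v', 'error',
                   '-select_streams', 'v:0',
                   '-show_entries', 'stream=width,height',
                   '-of', 'json', source]
    output = subprocess.check_output(ffprobe_cmd)
    stream = json.loads(output)['streams'][0]
    # same (height, width, channels) shape OpenCV would have reported for rgb24
    return (stream['height'], stream['width'], 3)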
I made good progress, but I wasn’t able to finish the next version yet. Dynamic regions and object tracking are requiring almost a complete rewrite. Here is a preview of what will be in the next version. There are 3 overlapping regions that partially see a bicycle, and the green box is a dynamically created region.
What are the overlapping regions for if the regions are dynamically generated? Does it need them to kickstart the process of finding people or will they eventually go away?
They are two distinct problems.
First, frigate needs to know where to look when no known objects have been seen. That can be done a variety of ways: you can tell it where to look (current implementation of regions), it can try to look everywhere, it could learn where to look based on past detections (one day), etc. I may end up making it optional to specify regions and just divide the image into as few regions as possible by default. However, you will almost always get better accuracy if you specify a region (unless your camera resolution is 300x300). For example, with one large region on the above image, the bike will score ~20%. With a more tightly defined region, it scores consistently above 90%.
Once an object is detected, frigate checks whether or not the region included the entire object. If it was too close to the edge of the region, frigate computes a new “dynamic” region and runs detection again until it is sure it has captured the entire object. With the current version, a partial bike would be detected by all 3 regions with low scores.
I have to have complete bounding boxes in order to identify and track objects as they move across the camera.
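Roughly, the idea is something like this (an illustrative sketch, not the actual implementation): if a detected box touches the edge of the region that produced it, build a new square region centered on the box, padded out and clamped to the frame, and run detection again on that.

def box_touches_edge(box, region, tolerance=5):
    # box is (x1, y1, x2, y2); region is (x, y, size)
    x1, y1, x2, y2 = box
    rx, ry, size = region
    return (x1 - rx < tolerance or y1 - ry < tolerance or
            rx + size - x2 < tolerance or ry + size - y2 < tolerance)

def dynamic_region(box, frame_width, frame_height, padding=1.5):
    # square region centered on the box, padded so the whole object should fit,
    # and clamped so it stays inside the frame
    x1, y1, x2, y2 = box
    size = int(max(x2 - x1, y2 - y1) * padding)
    size = min(size, frame_width, frame_height)
    cx, cy = (x1 + x2) // 2, (y1 + y2) // 2
    x = min(max(cx - size // 2, 0), frame_width - size)
    y = min(max(cy - size // 2, 0), frame_height - size)
    return (x, y, size)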
I had thought about this a bit, and I liked the idea of using your old motion detection code from the CPU version for this. It was very good at determining the size and location of an object as a starting point, so that the tensorflow engine could start with a good bounding box and get an accurate read.
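Something as simple as background subtraction would probably be enough to seed those boxes; a rough illustration of the idea (not the actual CPU-version code):

import cv2

# cheap motion pass: background subtraction to find where something is moving,
# so the detector can start from a reasonable bounding box
subtractor = cv2.createBackgroundSubtractorMOG2(detectShadows=False)

def motion_boxes(frame, min_area=500):
    mask = subtractor.apply(frame)
    mask = cv2.threshold(mask, 200, 255, cv2.THRESH_BINARY)[1]
    mask = cv2.dilate(mask, None, iterations=2)
    # OpenCV 4.x returns (contours, hierarchy)
    contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    return [cv2.boundingRect(c) for c in contours if cv2.contourArea(c) > min_area]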
I had that thought too. It took a good amount of CPU though. I would like to find a more efficient way.
Would be nice to have multiple options. I have an idea how I’d like to do the bounding boxes for an ideal setup, but it involves doing about 6-9 boxes per camera which takes an awful lot of time for 4 cameras… I’d be happy to sacrifice some CPU cycles to get a good starting point until I get off my lazy ass and do a proper job.
Implemented object tracking this morning, and it is working fairly well: https://imgur.com/a/aznQV6p
I started with simple centroid tracking, so sometimes an object gets switched. There are a lot of techniques for more advanced tracking that I may wait until later to implement. For now, I think I am going to get everything cleaned up so I can get a beta release out this weekend.
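For anyone curious, simple centroid tracking boils down to something like this (a minimal sketch, not the exact implementation), which is also why ids can get swapped when two objects pass close to each other:

import math

class CentroidTracker:
    def __init__(self, max_distance=75):
        self.next_id = 0
        self.objects = {}  # object id -> last known centroid (cx, cy)
        self.max_distance = max_distance

    def update(self, boxes):
        # boxes are (x1, y1, x2, y2); match each new centroid to the closest
        # existing object, otherwise register it as a new id
        updated = {}
        for x1, y1, x2, y2 in boxes:
            cx, cy = (x1 + x2) // 2, (y1 + y2) // 2
            best_id, best_dist = None, self.max_distance
            for obj_id, (px, py) in self.objects.items():
                dist = math.hypot(cx - px, cy - py)
                if dist < best_dist and obj_id not in updated:
                    best_id, best_dist = obj_id, dist
            if best_id is None:
                best_id = self.next_id
                self.next_id += 1
            updated[best_id] = (cx, cy)
        self.objects = updated
        return updated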
Looks really solid. So great to have it change the region size as you get further away. My cameras go pretty far back so that’s a big help.
Looks awesome! What kind of use case/automation were you thinking of for something like this?