Local realtime person detection for RTSP cameras


It is really built around the use case of real-time analysis of an RTSP stream. It could be modified to process video files, but it looks like Foscam supports RTSP. Why not use the live RTSP feed?


@blakeblackshear, I originally decided to keep the network traffic to a minimum and let the camera’s motion detection be the trigger: push the file, then send notifications over Telegram (with the MPEG-4 video converted to an animated GIF). Using RTSP is most likely a suitable option, but I reckon it would be a great feature for Frigate if it could also handle offline files? :)


Looking at the OpenCV docs, I think you could just pass in a path to a file as the RTSP_URL and it would process it. However, the entire processing pipeline is built around real-time analysis, so the MQTT messages wouldn’t be very informative for a file. I don’t have a use case myself for processing files, so I am unlikely to add support for it any time soon. Since it’s open source, someone could create something based on this or open a pull request.


@blakeblackshear, thanks for investigating the OpenCV docs. I’ll see what I can work out and get back to you once I get a machine capable of running it. Do you have a clear understanding of the requirements for Frigate and the products it’s using? An i5 1.3 GHz NUC with 16 GB RAM should be adequate, yeah?


I am running it on a lower-powered NUC than that already. You will definitely be able to get it running, but I’m not sure how many regions/cameras it will handle. In the future, I expect something like Google’s new Coral Edge TPU and TensorFlow Lite will make running this on lower-powered hardware much easier. Everyone is working on faster ways to do inference at the edge, so it will only get better.


When setting the regions - is there a limit to how many you can list in the docker run command? In your example here you have 3, but in your GitHub you have 2 - so can you add as many/few as needed?

Also, I just wanted to confirm (as I am taking my first adventure into this): are all the values for the region required?

Last question - if you are running this on multiple cameras will you need to make a container for each camera?



You can list as many regions as you want. All values are required at the moment. You will need a separate container for each camera.


Very interesting component, nicely done! I was attempting something similar myself but it wouldn’t have turned out this nicely. I am trying it out but am getting “Unable to capture video stream”. Here is what I’m given when I go to the RTSP URL of my camera. Do you see anything it would choke on? Is H.264 a problem?

<StreamingChannel xmlns="http://www.hikvision.com/ver10/XMLSchema" version="1.0">


I would try getting an RTSP URL working in VLC first. If that works, you should be able to pass the same URL to the container.


Got an error - but shortly after realized my mistake.
Now I’m happy to report I got this working!
Followed your suggestions above (for .pb and label map).
Set up four zones (thinking this would help distribute the load across multiple cores).
When it does detection, the video feed seems to lag a bit.
Feeding it low res at only 2 fps - so I don’t know if the detection is this slow, or only the output feed.

Either way - just needs some tweaking - but “it’s alive!”

Thanks again!

Quick follow-up: I checked the MQTT messages and it seemed like it was pumping out a massive number of them. Is there a way to adjust the frequency of both the detection and (probably related) the MQTT message sends?


Okay, so I needed to add the port (554) and then it choked on my password which had a ! in it (though VLC managed with it), so I changed my password, and now I’m happy to report it’s working as well!! Here is the URL format for my HikVision camera in case it helps anyone:

rtsp://USERNAME:PASSWORD@IP_ADDRESS:554/Streaming/channels/1
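For passwords with special characters such as the ! mentioned above, one thing that might be worth trying before changing the password is percent-encoding the credentials when building the URL. This is only a minimal Python sketch: build_rtsp_url is a hypothetical helper, the host and credentials are made up, and nothing in this thread confirms that the container's capture backend decodes percent-encoded credentials.

```python
from urllib.parse import quote


def build_rtsp_url(user, password, host, port=554, path="/Streaming/channels/1"):
    """Build an RTSP URL with percent-encoded credentials (hypothetical helper)."""
    # safe="" forces every reserved character, including "!", "@" and ":",
    # to be escaped so it cannot be mistaken for a URL delimiter.
    user_enc = quote(user, safe="")
    pass_enc = quote(password, safe="")
    return f"rtsp://{user_enc}:{pass_enc}@{host}:{port}{path}"


# Made-up credentials and address, for illustration only.
print(build_rtsp_url("admin", "s3cret!", "192.168.1.64"))
```

If the encoded URL plays in VLC but still fails in the container, changing the password (as was done here) remains the known-good fix.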

Very nice and easy way to implement it, I’m very impressed, thanks so much for sharing this, I think it will get a lot of love.

A few questions:

Is there any way to define rectangular regions? I think I’d just prefer having it scan the whole stream all at once, or is that a problem?

If not, can regions overlap?

And finally, is there any way to have it dump a photo to disk of what it saw? I use this with the current tensorflow integration to show a little dashboard of what was detected in case I didn’t get a chance to see it in realtime on the stream. It’s pretty handy.

EDIT: From the FAQ, it looks like that one is probably not supported yet, but I’m open to any creative ideas.

Thanks again for a really nice component, this opens up a lot of new possibilities for me, especially in finally getting off of my overburdened Pi and onto a more robust containerized environment.


I have thought about rate limiting the MQTT messages. I will add it to my list.
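Until something like that exists, one way to picture the idea is a small wrapper that drops repeats of the same payload on a topic unless a minimum interval has passed. This is a sketch of one possible approach, not how the container works: ThrottledPublisher, the topic name, and the 5 second interval are all assumptions, and publish can be any callable that takes (topic, payload).

```python
import time


class ThrottledPublisher:
    """Forward a message only if its payload changed or min_interval elapsed.

    Hypothetical wrapper; `publish` is any callable taking (topic, payload).
    """

    def __init__(self, publish, min_interval=5.0, clock=time.monotonic):
        self._publish = publish
        self._min_interval = min_interval
        self._clock = clock
        self._last = {}  # topic -> (payload, time of last publish)

    def send(self, topic, payload):
        now = self._clock()
        last = self._last.get(topic)
        if last is not None:
            last_payload, last_time = last
            # Suppress identical payloads that arrive too soon after the last one.
            if payload == last_payload and now - last_time < self._min_interval:
                return False
        self._publish(topic, payload)
        self._last[topic] = (payload, now)
        return True


if __name__ == "__main__":
    publisher = ThrottledPublisher(lambda t, p: print(t, p))
    publisher.send("cameras/front/person", "ON")  # published
    publisher.send("cameras/front/person", "ON")  # suppressed as a repeat
```

State changes (ON to OFF) still go out immediately, so only the chatter of identical messages is reduced.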


In theory, yes. The models are mostly trained on square images, so when the detection runs, it will squish a rectangle into the square. I believe this will throw off the aspect ratio and decrease accuracy. Also of note, the fast models are all trained on 300x300 pixel images, and your region is resized to that before processing by tensorflow. I chose my camera resolution so that my smallest region would be as close to that size as possible. Because the images are resized, tensorflow will have a harder time picking up smaller items in your region. For best accuracy, you want to create your regions so that a person will fill as much of the area as possible.
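To put rough numbers on the resizing described above, here is a small arithmetic sketch. The 300x300 model input comes from the post; the helper names and the example sizes are made up for illustration.

```python
MODEL_INPUT = 300  # per the post, the fast models take 300x300 images


def scaled_pixels(region_size, object_pixels, model_input=MODEL_INPUT):
    """Size of an object, in pixels, after a square region is resized to the model input."""
    return object_pixels * model_input / region_size


def squish_ratio(region_width, region_height):
    """Horizontal scale divided by vertical scale when a rectangle is forced
    into a square. 1.0 means no distortion."""
    return region_height / region_width


# A person 200 px tall in a 720x720 region shrinks to about 83 px for the model.
print(scaled_pixels(720, 200))
# Forcing a 1280x720 rectangle into a square compresses widths to 56.25%
# of their size relative to heights.
print(squish_ratio(1280, 720))
```

The same 200 px person in a 300x300 region would stay at 200 px, which is the reason for sizing regions so that a person fills as much of them as possible.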

Regions can overlap, but you will waste some CPU cycles processing the same thing multiple times. Also, you will detect 2 people when one person stands in the overlapping parts.
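If the double detections from overlapping regions are a problem, they could in principle be merged afterwards by comparing the boxes in full-frame coordinates. The sketch below uses intersection over union (IoU) and keeps the highest-scoring box of each overlapping group. It is not something the container does according to this thread; the function names, the (score, box) format, and the 0.5 threshold are all assumptions, and a person cut in half by a region edge may produce boxes that overlap less than that.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ax1, ay1, ax2, ay2 = a
    bx1, by1, bx2, by2 = b
    inter_w = max(0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union > 0 else 0.0


def dedupe(detections, threshold=0.5):
    """Keep the highest-scoring box of each group of overlapping boxes.

    detections: list of (score, box) tuples in full-frame coordinates.
    """
    kept = []
    for score, box in sorted(detections, key=lambda d: d[0], reverse=True):
        # Keep a box only if it does not overlap strongly with a better one.
        if all(iou(box, kept_box) < threshold for _, kept_box in kept):
            kept.append((score, box))
    return kept


if __name__ == "__main__":
    # Two regions report nearly the same person, plus one separate person.
    detections = [
        (0.90, (100, 100, 200, 300)),
        (0.80, (110, 100, 210, 300)),
        (0.70, (500, 100, 600, 300)),
    ]
    print(dedupe(detections))
```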

Setting the DEBUG parameter writes detection images, along with a few other debug images, to disk at /lab/debug. I also have the /best_person.jpg endpoint, which I have integrated as a camera in Home Assistant so I can see the last person I missed. This container is one part of a full Docker-based NVR solution I built from scratch. I haven’t open sourced the rest of it yet, but I am working towards all the same use cases.

Microsoft facial recognition, who is able to make it work?

Okay that all makes sense. Additional accuracy is definitely the goal for me, I’m willing to throw some CPU cycles at it, so far it doesn’t seem too bad.

So I will make some big rectangles in the front and some small ones in the back for when people are further away. I would guess about half a person’s worth of overlap would be optimal, so that it can see a person even if he’s standing between the regions. Otherwise it would be “blind” to anything between the regions, wouldn’t it?
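For planning a layout like this, the left edges of a row of equal square regions with a chosen minimum overlap can be worked out with a few lines of Python. This is just a planning aid under assumed numbers: region_offsets is a hypothetical helper, and the frame widths and overlaps below are examples, not values taken from anyone's camera in this thread.

```python
def region_offsets(frame_width, region_size, overlap):
    """Left edges of square regions that cover frame_width, with adjacent
    regions sharing at least `overlap` pixels (hypothetical planning helper)."""
    if region_size > frame_width:
        raise ValueError("region is wider than the frame")
    if not 0 <= overlap < region_size:
        raise ValueError("overlap must be >= 0 and smaller than the region")
    step = region_size - overlap
    offsets = list(range(0, frame_width - region_size + 1, step))
    # If the last region stops short of the right edge, add one flush with it.
    if offsets[-1] + region_size < frame_width:
        offsets.append(frame_width - region_size)
    return offsets


# Example: a 1280 px wide frame, 720 px regions, 160 px of overlap.
print(region_offsets(1280, 720, 160))
# Example: a 1920 px wide frame, 720 px regions, at least 100 px of overlap.
print(region_offsets(1920, 720, 100))
```

The last region is pushed flush with the right edge, so its overlap with its neighbour can end up larger than requested but never smaller.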

I will try it with debug and see what comes out for viewing later.


So, I made two 720x720 regions and I’m actually pretty impressed with how well it works even when I’m far from the camera. Much better than the one I had running on my Pi. It works at night, which the other one couldn’t do, has a high framerate, and accuracy is much better. I don’t think anyone could make it past it…

Couple more questions…

DEBUG parameter, what are the options? I set it to 1, is that right? I don’t see anything in /lab/debug (absolute path) inside the container. The folder doesn’t exist. I made one but didn’t see anything come there.

best_person.jpg is good enough for now, though… But it is always good to have persistent storage to look back at.

I’ve started trying to set up the MQTT component, but I get
Socket error on client <unknown>, disconnecting.
in the log when it tries to connect… Anything I should know about the client? Does it require SSL or authentication or anything? I’m trying it on a new test instance so quite possible it’s something on my side, but it does seem to connect to hass okay. I might try tomorrow from my production instance which I know works well (but also often cracks under the pressure of chatty MQTT clients).


Yea. 1 is the only option for debug. If the debug folder isn’t there on startup, it might cause some of the subprocesses to fail until you restart the container. I usually mount a volume from my local machine at that location when running the container. It only writes on motion or objects, and you can search the source to see what I’m writing. The MQTT client doesn’t currently work if you have a username and password set. You could test with the mosquitto_pub command-line tool from within the container.


Finally got the MQTT going… The Hass.io MQTT addon really doesn’t like operating without users and security, so support for that would be a good addition someday, along with the ability to change the MQTT subscriber name. But it’s pretty amazing to have something up and running in a Docker container with this much functionality and this little fuss, so great work again.

If anyone wants a setup which works, I used the MQTT with Web Client Hass.io addon with the below config

{
  "log_level": "debug",
  "certfile": "fullchain.pem",
  "keyfile": "privkey.pem",
  "web": {
    "enabled": true,
    "ssl": false
  },
  "broker": {
    "enabled": true,
    "enable_ws": false,
    "enable_mqtt": true,
    "enable_ws_ssl": false,
    "enable_mqtt_ssl": false,
    "allow_anonymous": true
  },
  "mqttusers": []
}

One more question: is there any way to set the detection threshold for /best_person.jpg? It’s been showing my umbrellas occasionally as people with 65% confidence. Real people are always above 80%, usually in the high 90s, so I think 80% is a good cutoff to filter the umbrellas out (they are unfortunately on wheels, so hard to mask).
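To make the request concrete, the kind of cutoff I have in mind looks roughly like the sketch below. It is only an illustration of the idea: best_person, the dict layout with "label" and "score" keys, and the 0.80 default are my own assumptions, not the container's actual code or settings.

```python
def best_person(detections, min_score=0.80):
    """Return the highest-scoring 'person' detection at or above min_score,
    or None if nothing qualifies. Detection dicts are a hypothetical format."""
    candidates = [
        d for d in detections
        if d["label"] == "person" and d["score"] >= min_score
    ]
    return max(candidates, key=lambda d: d["score"], default=None)


if __name__ == "__main__":
    # An umbrella misread as a person at 65% is ignored; a real person is kept.
    detections = [
        {"label": "person", "score": 0.65},
        {"label": "person", "score": 0.97},
    ]
    print(best_person(detections))
```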


Also, 3.5GB of debug data today, very comprehensive ;-), but probably a bit much to use every day. Any way to turn off the motion detection debug stuff? Another nice feature would be to be able to set the interval between captures, but now I’m just being annoying.

Thanks again for the great work.


I actually just added support for MQTT username and password this morning and pushed a new image. You could comment out the lines that write on motion and rebuild the container yourself. Can you create GitHub issues for feature requests? I am losing track of the things people are asking for.


Sorry, absolutely will do. Thanks again.