Object detection for video surveillance

Very impressive new project. I just happen to have added a GPU to do things like this.
So far I have implemented dlib facial recognition and have been optimizing it.
I discovered that the bottleneck with the Home Assistant implementation is in the ffmpeg camera module. An ffmpeg camera component does not appear to keep a stream from the camera open, nor is it capable of taking a single frame from an already-open stream. Instead, every snapshot requires a call to ffmpeg to start a new stream, which can take up to 1s and varies quite a bit.
I haven’t looked into your code yet, but it seems you have overcome this problem, or don’t have it?

Also thoughts about supporting CUDA11/cudnn8?

Awesome. Watching with interest :slight_smile: Great documentation too.

I see you support multi-region detection reporting and multiple accelerators, but I’m curious how else this compares to Frigate?

I glanced at the codebase, and it has a lot of similarities. There are a few different design decisions, and the code is very well organized. It also adds GPU support and a few new features I haven’t gotten to yet. I’m most interested in the differences around multiprocessing and how frames are managed in memory. I ran into a lot of brick walls trying to optimize and use multiple cores efficiently while sharing frames in memory.

Would it be possible to provide a count for each class rather than ‘just’ an on/off value?

I was looking at using it to check if a car is coming up my driveway, but if there is a car parked in the driveway already, the state wouldn’t change from off to on - it would be more useful, though, if it changed from 1 to 2.

Yes, the counting can be done on the HomeAssistant side.
Watsor can transmit the detection details over MQTT, publishing them to the watsor/cameras/outdoorcam/detection/car/details topic in JSON format as follows:

{
    "t": "2020-06-27T17:41:21.681899",
    "d": [{
        "c": 73.7,
        "b": [54, 244, 617, 479]
    }]
}

where attribute d represents the array of all cars detected in the current video frame (b is the bounding box around a car). This opens up a bunch of opportunities to implement custom logic on top of Watsor’s reporting. For example, the length of the array d equals the number of cars detected.

That scenario is implemented in my demo project.
One sensor subscribes to the topic to listen for the detection details. Then a template sensor camera1_car_count converts the length of the array d into the number of cars detected:

{% if is_state('binary_sensor.camera1_car_detected', 'on') %}
  {{ states.sensor.camera1_car_detection_details.attributes["d"] | count }}
{% else %}
  0
{% endif %}
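
For reference, the subscribing sensor can be a plain MQTT sensor that keeps the whole JSON payload as attributes. The snippet below is only a sketch following the topic and entity names above; the actual configuration in the demo project may differ slightly:

sensor:
  - platform: mqtt
    name: camera1_car_detection_details
    state_topic: watsor/cameras/outdoorcam/detection/car/details
    value_template: "{{ value_json.t }}"                              # timestamp as the state
    json_attributes_topic: watsor/cameras/outdoorcam/detection/car/details   # exposes "d" as an attribute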

The check of the binary sensor’s state is necessary to reset the counter to zero when all cars have moved out, since nothing is transmitted over MQTT when nothing is detected.

The reporting of detection details is turned off by default. HomeAssistant enables the reporting as soon as the camera is turned on by sending Watsor the command details = on.
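
For illustration, the command can be sent with an ordinary mqtt.publish service call. The trigger entity and the command topic below are assumptions (the topic is derived from the watsor/cameras/<camera>/... naming above); check the demo project for the exact topic and payload:

automation:
  - alias: Enable detection details reporting
    trigger:
      - platform: state
        entity_id: switch.camera1                 # hypothetical switch that turns the camera on
        to: 'on'
    action:
      - service: mqtt.publish
        data:
          topic: watsor/cameras/camera1/command   # assumed command topic, see the demo project
          payload: 'details = on'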


Once you have sorted out the CPU load on the Raspberry, could you share the metrics available via the link on Watsor’s home page?

I discovered that the bottleneck with the Home Assistant implementation is in the ffmpeg camera module.

I’ve noticed that all HomeAssistant camera integrations use a lot of resources on the machine, therefore I recommend using other ways to watch the live streams if possible.

With regard to the FFmpeg camera, the bottleneck may not be its HomeAssistant implementation, but the fact that the camera transmits a video stream encoded with a codec such as H.264. Opening that stream always has noticeable latency, which is unavoidable due to the nature of the video coding algorithm. Since video compression transports only the changes in the sequence, a decoder (or a player) has to accumulate several dozen frames before it is able to reconstruct the first one, resulting in a delay.

Watsor can produce video streams in two formats: MPEG-TS and Motion JPEG. The former is suitable for broadcasting for the purpose of live watching or long recordings. The latter results in large amounts of data transmitted across the network, but it allows watching in sheer real time with almost zero latency, since the stream is a continuous flow of ready-to-see images.

Considering that an MPEG-TS stream containing H.264 (or similar) leads to latency, to quickly take a snapshot I use the MJPEG IP Camera integration in HomeAssistant instead of the FFmpeg Camera or Camera integration.
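
Something along these lines, assuming Watsor runs on the same host on its default port 8080 (adjust the host and camera name to your setup):

camera:
  - platform: mjpeg
    name: outdoorcam
    mjpeg_url: http://127.0.0.1:8080/video/mjpeg/outdoorcam   # Watsor's Motion JPEG stream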


Also thoughts about supporting CUDA11/cudnn8?

Watsor uses TensorRT to perform the inference on the GPU. The code is not tied to a specific version of TensorRT or the underlying CUDA/cuDNN, so I hope it runs fine on the newest ones. Remembering how troublesome installing the whole GPU infrastructure is, I’m postponing that testing for later.

Thank you for your interest.

I see you support multi-region detection reporting and multiple accelerators, but I’m curious how else this compares to Frigate?

Frigate is great, it’s inspired me a lot! I actually contributed to it a bit by adding GPU support.

At some point I realized I could do more. So besides the features you’ve mentioned, Watsor can broadcast the video with rendered object detections over HTTP, very efficiently in terms of host resources, while also using less network bandwidth (and storage). Not hundreds of clients, of course, but a dozen - no sweat.

Watsor allows HomeAssistant to start/stop the decoder and limit the frame rate, saving computing resources and energy when the system is not armed. The reporting of the bounding boxes of the detected objects over MQTT opens up opportunities to implement additional logic.

Watsor also ships as a Python module targeting embedded systems such as the Jetson Nano, where all the dependencies and the infrastructure to perform inference are installed out of the box, so using Docker there is not reasonable.

The application uses multiple processes and manages computing resources effectively, providing a couple of interesting synchronization primitives and engineering solutions. These take effect when the feed from several cameras well exceeds the throughput of the available detectors. The load is balanced so that none of the cameras is deprived of analysis and the most recent frame of each feed is processed. Fast reaction to a possible threat is highly desirable in surveillance.

Thanks for the response. Very cool :sunglasses:
I like the idea of being able to easily pause the decoder.

Again, thank you for your help!

This is really very impressive - and I can see from your responses to the other posts as well how much thought has gone into this project :star2::star2::star2::star2::star2:

I could not implement any of the suggestions you made to bring the CPU load down (other than reducing the rate to 5fps). When I tried to use e.g. MMAL I received an error message saying that only VAAPI and VDPAU are supported, so I gave up on that one.

And although my Pi4 4GB seems to be just about coping with the load right now, I decided to order a Coral stick so that I can turn up the fps from the current 5fps and connect more than just 1 camera to it.
Sorry, but I’ll probably have to bother you again once I receive the stick so that I can get it set up properly :frowning:

Another issue I could not resolve was lowering the resolution of my camera’s substream to 640x480.
It was easy to do on the main stream, but I don’t want to lose that resolution in general.

And here are some snapshots of the metrics info you asked for:
Snapshot 1:

{
    "cameras": [
        {
            "name": "cam",
            "fps": {
                "decoder": 5.8,
                "sieve": 5.2,
                "visual_effects": 3.5,
                "snapshot": 5.2,
                "mqtt": 5.2
            },
            "buffer_in": 10,
            "buffer_out": 0
        }
    ],
    "detectors": [
        {
            "name": "CPU",
            "fps": 5.2,
            "fps_max": 7,
            "inference_time": 141.9
        }
    ]
}

Snapshot 2:

{
    "cameras": [
        {
            "name": "outdoorcam",
            "fps": {
                "decoder": 6.6,
                "sieve": 5.3,
                "visual_effects": 5.4,
                "snapshot": 5.3,
                "mqtt": 5.3
            },
            "buffer_in": 0,
            "buffer_out": 0
        }
    ],
    "detectors": [
        {
            "name": "CPU",
            "fps": 5.3,
            "fps_max": 7,
            "inference_time": 143.7
        }
    ]
}

Snapshot 3:

{
    "cameras": [
        {
            "name": "outdoorcam",
            "fps": {
                "decoder": 5.2,
                "sieve": 4.9,
                "visual_effects": 4.9,
                "snapshot": 4.9,
                "mqtt": 4.9
            },
            "buffer_in": 10,
            "buffer_out": 0
        }
    ],
    "detectors": [
        {
            "name": "CPU",
            "fps": 4.9,
            "fps_max": 7,
            "inference_time": 143.9
        }
    ]
}

Thanks again for your support and this great project :+1:

BTW:
After understanding the documentation a little better I disabled the encoder, and now the Pi4 runs at about 100% CPU load and around 120°F.

We may be speaking of a different type of latency. I actually finally managed to get rid of it yesterday by using a wrapper with OpenCV to keep a stream open in the background, so that each time a frame is needed a new RTSP stream does not have to be negotiated. The downside, of course, is that the stream is always on and loads about 10% of one CPU thread. Running the ffmpeg capture from the CLI, I actually see the output reporting 0% overhead and a capture time of 0.03s, corresponding to the 20fps… That is only the capture time, though; the actual time from command sent to frame received varied from <1s to well over 3s. Most of the time it is around 1-2s and is due to the I/O exchanged with my NVR to negotiate security and start the stream.

This is awesome. I think I will test it soon. I am using dlib for facial recognition at the moment. I will need this for other object processing.

Hi @asmirnou!

I just received the Coral Device and added the following section back into the config file:

    - -hwaccel
    -  vaapi
    - -hwaccel_device
    -  /dev/dri/renderD128
    - -hwaccel_output_format
    -  yuv420p

I’m getting the following error messages, though:

Watsor    | MainThread       werkzeug                 INFO    : Listening on ('0.0.0.0', 8080)
Watsor    | MainThread       root                     INFO    : Starting Watsor on 1bd657130284 with PID 13
Watsor    | outdoorcam       FFmpegDecoder            INFO    : [AVHWDeviceContext @ 0x55b8848730] No VA display found for device: /dev/dri/renderD128.
Watsor    | outdoorcam       FFmpegDecoder            INFO    : Device creation failed: -22.
Watsor    | outdoorcam       FFmpegDecoder            INFO    : Device setup failed for decoder on input stream #0:0 : Invalid argument
Watsor    | detector1        ObjectDetector           ERROR   : Detection failure
Watsor    | Traceback (most recent call last):
Watsor    |   File "/usr/local/lib/python3.6/dist-packages/watsor/detection/detector.py", line 88, in _run
Watsor    |     with detector_class(*detector_args) as object_detector:
Watsor    |   File "/usr/local/lib/python3.6/dist-packages/watsor/detection/edge_tpu.py", line 15, in __init__
Watsor    |     device_path=device_path)
Watsor    |   File "/usr/lib/python3/dist-packages/edgetpu/detection/engine.py", line 71, in __init__
Watsor    |     super().__init__(model_path, device_path)
Watsor    |   File "/usr/lib/python3/dist-packages/edgetpu/basic/basic_engine.py", line 90, in __init__
Watsor    |     model_path, device_path)
Watsor    | RuntimeError: Error in device opening (/sys/bus/usb/devices/2-2)!

It seems like I need to allow the docker container/user access to the USB Device, correct?

Any hints how I’d go about that?

EDIT:
I also added this line back into the docker-compose.yaml:

    devices:
      - /dev/dri:/dev/dri

Hi @chairstacker,

The hwaccel options are not related to Coral, but to the FFmpeg video decoder. Coral performs the inference on raw video frames, but before they become raw, the stream from the camera has to be decoded. That’s where these options come into play.

I can reproduce the error RuntimeError: Error in device opening when I forget to add the host device to the container. If you use docker-compose, make sure you pass the following option:

devices:
  - /dev/bus/usb:/dev/bus/usb

or, if you run the container from command line:

--device /dev/bus/usb:/dev/bus/usb \

The user in the Docker container belongs to the plugdev group, which should have access to the device.

Besides /dev/bus/usb:/dev/bus/usb, you can add another host device to the container: /dev/dri:/dev/dri. The latter is the set of hardware video decoders on the Raspberry, to which the hwaccel options are applied. Maybe the fact that you hadn’t passed it to the container was the reason they didn’t work?

Again, way over my head - sorry.

Here’s the ffmpeg part of my config.yaml

ffmpeg:
  decoder:
    - -hide_banner
    - -loglevel
    -  error
    - -nostdin
    - -fflags
    -  nobuffer
    - -flags
    -  low_delay
    - -fflags
    -  +genpts+discardcorrupt
    - -i                          # camera input field will follow '-i' ffmpeg argument automatically
    - -filter:v
    -  fps=fps=15
    - -f
    -  rawvideo
    - -pix_fmt
    -  rgb24

And this is my docker-compose.yaml

version: '3'

services:
  watsor:
    container_name: Watsor
    image: smirnou/watsor.pi4:latest
    environment:
      - LOG_LEVEL=info
    volumes:
      - /etc/localtime:/etc/localtime:ro
      - /etc/watsor:/etc/watsor:ro
    devices:
      - /dev/bus/usb:/dev/bus/usb
      - /dev/dri:/dev/dri
    ports:
      - 8080:8080
    shm_size: 512m

Maybe you need to install the Edge TPU runtime on Raspberry Pi OS prior to running Docker?

Still getting error messages:

Watsor    | MainThread       werkzeug                 INFO    : Listening on ('0.0.0.0', 8080)
Watsor    | MainThread       root                     INFO    : Starting Watsor on 31387a180f95 with PID 13
Watsor    | outdoorcam       FFmpegDecoder            INFO    : [h264 @ 0x55ae8828f0] corrupted macroblock 43 15 (total_coeff=-1)
Watsor    | outdoorcam       FFmpegDecoder            INFO    : [h264 @ 0x55ae8828f0] error while decoding MB 43 15
Watsor    | outdoorcam       FFmpegDecoder            INFO    : [h264 @ 0x55ae8828f0] corrupted macroblock 70 0 (total_coeff=-1)
Watsor    | outdoorcam       FFmpegDecoder            INFO    : [h264 @ 0x55ae8828f0] error while decoding MB 70 0
Watsor    | outdoorcam       DetectionSieve           ERROR   : Frame 0 missed
Watsor    | outdoorcam       FrameBuffer              WARNING : Stale frame 0 dated 39 seconds ago is in State.PUBLISH, resetting...

Do I need to install the TensorFlow Lite library as well or is this part of the docker container already?

Try unplugging the Coral, plugging it back in and waiting a couple of seconds before starting the Docker container. Restarting the Raspberry may also help.

The TensorFlow Lite library is part of the Docker image. It makes sense to install it outside of Docker and run a classification test, just to identify whether the problem is in the device’s USB connection, the 64-bit Raspberry Pi OS or the Docker image.

Got it back up & running.

I added a 2nd camera at 2fps and turned up the fps for the 1st one to 10 but this seems to go straight to the CPU usage of the Pi4 itself.

Here’s the output from the metrics link:

{
    "cameras": [
        {
            "name": "outdoorcam",
            "fps": {
                "decoder": 10.3,
                "sieve": 10.3,
                "visual_effects": 10.1,
                "snapshot": 10.3,
                "mqtt": 10.3
            },
            "buffer_in": 10,
            "buffer_out": 0
        },
        {
            "name": "yardcam",
            "fps": {
                "decoder": 2.1,
                "sieve": 2.1,
                "visual_effects": 0.0,
                "snapshot": 2.1,
                "mqtt": 2.1
            },
            "buffer_in": 10,
            "buffer_out": 0
        }
    ],
    "detectors": [
        {
            "name": "Coral",
            "fps": 12.2,
            "fps_max": 58,
            "inference_time": 17.1
        }
    ]
}

I also noticed that when I increase the fps further, the Pi4 (2GB) that runs the ‘Supervised’ HA install gets very sluggish.
It seems like it gets overwhelmed by the MQTT messages from Watsor, with 3 messages per second for e.g. the ‘car’ sensor that constantly keeps updating for the 3 cars it sees.

Got it back up & running.

What was the problem and how did you solve it?

I also noticed that when I increase the fps further, the Pi4 (2GB) that runs the ‘Supervised’ HA install gets very sluggish.
It seems like it gets overwhelmed by the MQTT messages from Watsor, with 3 messages per second for e.g. the ‘car’ sensor that constantly keeps updating for the 3 cars it sees.

The slowness is not caused by MQTT. The messages being sent, even if there are tens of them per second, are small - a few hundred bytes at most. They hardly cause performance problems.

Most likely the HomeAssistant UI leads to the tardiness. Watching the live stream of a Motion JPEG camera (and other camera integrations) in Lovelace uses a lot of resources on the machine. You may notice that closing the browser window results in a decrease in CPU load. I recommend opening the live stream UI, or the direct video stream from Watsor over HTTP, on a machine other than the one where the HomeAssistant backend is running.

A large video resolution results in more CPU load, especially since hardware acceleration is not enabled in FFmpeg. If it cannot be sorted out in the camera settings, reduce the size of the video using FFmpeg scaling.

This! It is actually a combination of strain on the CPU and how the snapshots are being taken. If the stream is not on, the lag and tardiness are due to negotiating and establishing the stream to get the one frame. If the stream is already established in a thread, then it is likely loading the CPU.

Honestly, I’m not sure.
I rebooted, unplugged and re-plugged the Coral device multiple times, and noticed that once I added a 2nd camera, that one was showing an image on the interface. So I rebooted the first camera and it worked there as well.

I have tried to implement the scaling, but yet again, I am out of my depth; I added it to my decoder config after the -i option:

    - -i                          # camera input field will follow '-i' ffmpeg argument automatically
    - -filter:v
    -  fps=fps=15
    - -f
    -  rawvideo
    - -pix_fmt
    -  rgb24
# Test
    - -vf
    -  scale=-1:480

and changed the resolution for the camera to 640x480 as well. But it results in a single snapshot being shown at /video/mjpeg/outdoorcam, no moving picture; same result when I look at it on VLC.

It recognizes objects at startup, but will then freeze while the sensors are still active for a little longer:

{
    "cameras": [
        {
            "name": "outdoorcam",
            "fps": {
                "decoder": 73.5,
                "sieve": 23.4,
                "visual_effects": 0.0,
                "snapshot": 23.1,
                "mqtt": 23.4
            },
            "buffer_in": 20,
            "buffer_out": 0
        }
    ],
    "detectors": [
        {
            "name": "Coral",
            "fps": 29.3,
            "fps_max": 58,
            "inference_time": 17.3
        }
    ]
}

Sorry, I tried to upload a screenshot from HA but there seem to be issues pasting images, so here’s the info from MQTT Explorer:

{"fps_in": 177.7, "fps_out": 28.5, "buffer": 20}

As you can see, the frame rate - even though the limit is set to 15fps - goes through the roof and eventually knocks my Pi4 that runs HA out cold.

I’m pretty sure it’s not the UI because the Pi4 croaks eventually, even when the UI is not open/not in use - just like it happens to me in this case as well:

the frame rate - even though the limit is set to 15fps - goes through the roof

When you defined the second video filter, the first one stopped being honored. Filters in FFmpeg are separated by commas. Instead, define both the fps and scale filters in one line as follows:

    - -filter:v
    -  fps=fps=15,scale=-1:480

The slowness is not caused by MQTT.

I think I know what’s going on. The recorder integration in HomeAssistant constantly saves data. By default it stores everything, from sensor readings to state changes. The data is saved to the SD card of the Raspberry Pi, which is a slow medium, resulting in degradation of the system’s reaction time. I can assure you that this happens on a PC too with a file-based database engine such as SQLite.

Fortunately, HomeAssistant lets you customize what gets written using the include and exclude parameters of the recorder. It turned out to be easier to include what’s needed rather than to try to exclude the unnecessary, because too much is saved by default.

In my demo project I include only a few sensors that are rendered in History. They do not need to be recorded for Watsor to work, only if one wants to observe their measurements.

recorder:
  include:
    entities:
      - alarm_control_panel.home_alarm
      - binary_sensor.camera1_person_detected
      - binary_sensor.camera1_car_detected
      - sensor.detector1_fps
      - sensor.detector1_fps_max
      - sensor.detector1_inference_time
      - sensor.camera1_person_count

Customize the recorder and your Raspberry Pi will come to life.