Object detection for video surveillance

width and height are supposed to reflect the resolution of the actual video feed. They don’t change the size of the video; they are used to configure the frame buffers and other internals Watsor needs to work. If width or height is wrong, the image can’t be displayed correctly.
So the right approach is to change the resolution in the Foscam settings or use its sub stream.
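
A camera entry in config.yaml then looks roughly like the following; the values below are placeholders and the keys are a simplified sketch, so take the exact keys and your camera’s sub-stream URL from the documentation:

cameras:
  - outdoorcam:                # the name used in MQTT topics and URLs
      width: 640               # must match the resolution FFmpeg actually delivers
      height: 480
      input: rtsp://user:pass@192.168.1.10:554/videoSub   # placeholder Foscam sub-stream URL
      detect:
        - car:
        - person: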

300x300 is the size of the images used to train the object detection model. During detection the framework automatically resizes an input image to the size of the model to match its tensors. As the resize happens anyway during detection, feeding a high-resolution stream doesn’t make much sense, unless the output stream Watsor produces (whose size matches the input resolution) is being recorded for later review, where high-resolution video is obviously desired.

A few things can be done to reduce the load on the Raspberry Pi’s CPU:

  • consider configuring hardware acceleration for decoding H.264 video (or whatever Foscam produces) in FFmpeg. Remember the options with the hwaccel prefix. According to the docs, the Raspberry Pi has a dedicated device which can take care of the decoding. This should take some load off the CPU, maybe 10% (a sketch follows after this list).
  • lower the frame rate, either in the Foscam settings or using the fps filter. Ideally, the input frame rate shouldn’t exceed what the Raspberry Pi can handle (~5 FPS), otherwise all available CPU cores will be used for detection.
  • the image can be rebuilt with the NEON and VFPV3 options of OpenCV turned on to enable native support of single- and double-precision floating point on the Raspberry Pi’s ARM Cortex-A72 CPU. Rebuilding is a last resort, and I can’t say how much can be gained, since OpenCV is not used for detection, only to produce Motion JPEG and some video effects. The options can be enabled only when running the build on the Raspberry Pi itself; I used a cross build, where these options are not available.
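
As a sketch of the first point: one alternative to the hwaccel-prefixed options is selecting the Pi’s hardware decoder directly with -c:v. Whether h264_v4l2m2m (or h264_mmal) is present depends on how FFmpeg was built in the image, so treat this as an experiment rather than a known-good configuration:

ffmpeg:
  decoder:
    - -hide_banner
    - -loglevel
    -  error
    # pick the hardware H.264 decoder explicitly (input option, so it goes before -i);
    # this only works if the FFmpeg build in the image includes that decoder
    - -c:v
    -  h264_v4l2m2m
    - -i                          # camera input field will follow '-i' ffmpeg argument automatically
    - -f
    -  rawvideo
    - -pix_fmt
    -  rgb24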

The most reasonable option is buying a hardware accelerator, such as the Coral, as I think the Raspberry Pi itself won’t cope with more than 5 FPS of the lightest detection model. The accelerator takes over almost all the load and opens up the opportunity to use models with higher accuracy.

Thanks for all the explanations but this is all way over my head at the moment.

Need some time to properly read up on this.

One more question, though:
Would it be possible to provide a count for each class rather than ‘just’ an on/off value?

I was looking at using it to check if a car is coming up my driveway, but if there is a car parked in the driveway already, the state wouldn’t change from off to on - it would be more useful, though, if it changed from 1 to 2.

Very impressive new project. I just happened to have added a GPU to do things like these.
I have so far implemented the dlib facial recognition and optimizing through it.
I discovered that the bottleneck with the home assistant implementation is in the ffmpeg camera module. An ffmpeg camera component does not appear to start (and keep) a stream from the camera, and neither is it capable of taking a single frame from an already open stream. Instead, every snapshot requires a call to ffmpeg to start a new stream, which can take up to 1s and varies quite a bit.
I have not looked into your code yet, but it seems you have overcome this problem, or don’t have it?

Also thoughts about supporting CUDA11/cudnn8?

Awesome. Watching with interest :slight_smile: Great documentation too.

I see you support multi-region detection reporting and multiple accelerators, but I’m curious how else this compares to Frigate?

I glanced at the codebase, and it has a lot of similarities. A few different design decisions and the code is very well organized. Also adds GPU support and a few new features I haven’t gotten to yet. I’m most interested in the differences around multiprocessing and how frames are managed in memory. I ran into a lot of brick walls trying to optimize and use multiple cores efficiently while sharing frames in memory.

Would it be possible to provide a count for each class rather than ‘just’ an on/off value?

I was looking at using it to check if a car is coming up my driveway, but if there is a car parked in the driveway already, the state wouldn’t change from off to on - it would be more useful, though, if it changed from 1 to 2.

Yes, the counting can be done on the HomeAssistant side.
Watsor can transmit the detection details over MQTT, publishing them to the watsor/cameras/outdoorcam/detection/car/details topic in JSON format as follows:

{
    "t": "2020-06-27T17:41:21.681899",
    "d": [{
        "c": 73.7,
        "b": [54, 244, 617, 479]
    }]
}

where the attribute d represents the array of all cars detected in the current video frame (b is the bounding box around a car). This opens up a bunch of opportunities to implement custom logic on top of Watsor’s reporting. For example, the length of the array d equals the number of cars detected.

That scenario is implemented in my demo project.
One sensor subscribes to the topic to listen for the detection details. Then another template sensor, camera1_car_count, converts the length of the array d into the number of cars detected:

{% if is_state('binary_sensor.camera1_car_detected', 'on') %}
  {{ states.sensor.camera1_car_detection_details.attributes["d"] | count }}
{% else %}
  0
{% endif %}

The check of the binary sensor’s state is necessary to reset the counter when all cars have moved out, since nothing is transmitted over MQTT when nothing is detected.
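
Roughly, the two sensors look like the following in HomeAssistant YAML. This is a simplified sketch (the binary sensor camera1_car_detected is defined in the demo project), so refer to the demo project for the complete configuration:

sensor:
  # captures the JSON details so the "d" array becomes an attribute
  - platform: mqtt
    name: camera1_car_detection_details
    state_topic: watsor/cameras/outdoorcam/detection/car/details
    value_template: "{{ value_json.t }}"
    json_attributes_topic: watsor/cameras/outdoorcam/detection/car/details

  # converts the length of the "d" array into the number of cars
  - platform: template
    sensors:
      camera1_car_count:
        value_template: >-
          {% if is_state('binary_sensor.camera1_car_detected', 'on') %}
            {{ states.sensor.camera1_car_detection_details.attributes["d"] | count }}
          {% else %}
            0
          {% endif %}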

The reporting of detection details is turned off by default. HomeAssistant enables the reporting as soon as the camera is turned on by sending Watsor the command details = on.
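
Assuming the command travels over MQTT like the detection details do, an automation along these lines could send it. Both the command topic and the switch entity below are my assumptions, so verify them against the demo project:

automation:
  - alias: Enable Watsor detection details
    trigger:
      - platform: state
        entity_id: switch.outdoorcam_camera        # hypothetical switch that turns the camera on
        to: 'on'
    action:
      - service: mqtt.publish
        data:
          topic: watsor/cameras/outdoorcam/command # assumed command topic, check the demo project
          payload: details = on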


Having sorted out the CPU load on the Raspberry Pi, could you share the metrics following the link on Watsor’s home page?

I discovered that the bottleneck with the home assistant implementation is in the ffmpeg camera module.

I noticed that all HomeAssistant camera integrations use a lot of resources on the machine, therefore I recommend using other ways to watch the live streams if possible.

With regard to the FFmpeg camera, the bottleneck may not be its HomeAssistant implementation, but the fact that the camera transmits a video stream encoded as H.264. Opening that stream always has noticeable latency, which is unavoidable due to the nature of the video coding algorithm. Since the compression transports only the changes in the sequence, a decoder (or a player) has to accumulate several dozen frames before it can reconstruct the first one, resulting in a delay.

Watsor can produce video streams in two formats: MPEG-TS and Motion JPEG. The former is suitable for broadcasting for live watching or for long recordings. The latter results in large amounts of data transmitted across the network, but it can provide viewing in sheer real time with zero latency, as the stream is a continuous flow of ready-to-see images.

Considering that an MPEG-TS container with an H.264 (or similar) stream leads to latency, to quickly take a snapshot I use the MJPEG IP Camera in HomeAssistant instead of the FFmpeg Camera or Camera integrations.
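
A sketch of such a camera entry; the URL path below is an assumption on my part, so take the exact MJPEG endpoint from Watsor’s documentation:

camera:
  - platform: mjpeg
    name: Watsor outdoorcam
    mjpeg_url: http://127.0.0.1:8080/video/mjpeg/outdoorcam   # assumed endpoint path, check the docs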


Also thoughts about supporting CUDA11/cudnn8?

Watsor uses TensorRT to perform the inference on a GPU. The code is not tied to a specific version of TensorRT or the underlying CUDA/cuDNN, so I hope that it runs fine on the newest. Remembering how troublesome installing all the GPU infrastructure is, I’m postponing that testing for later.

Thank you for your interest.

I see you support multi-region detection reporting and multiple accelerators, but I’m curious how else this compares to Frigate?

Frigate is great, it’s inspired me a lot! I actually contributed to it a bit having added GPU support.

At some point I realized I could do more. So besides the features you’ve mentioned, Watsor can broadcast the video with rendered object detections over HTTP very efficiently in terms of host resources, which also results in using less network bandwidth (and storage). Not hundreds of clients, of course, but a dozen - no sweat.

Watsor allows HomeAssistant to start/stop the decoder and limit the frame rate, saving computing resources and energy when the system is not armed. The reporting of the bounding boxes of detected objects over MQTT opens up opportunities to implement additional logic.

Watsor ships as a Python module targeting embedded systems such as the Jetson Nano, where all the dependencies and the infrastructure to perform inference are installed out of the box, so using Docker there is not reasonable.

The application uses multiple processes and manages computing resources effectively, providing a couple of interesting synchronization primitives and engineering solutions. These take effect when the feed from several cameras well exceeds the throughput of the available detectors. The load is balanced so that none of the cameras is deprived of analysis and the most recent frame of each feed is processed. Fast reaction to a possible threat is highly desirable in surveillance.

Thanks for the response. Very cool :sunglasses:
I like the idea of being able to easily pause the decoder.

Again, thank you for your help!

This is really very impressive - and I can see from your responses to the other posts as well how much thought has gone into this project :star2::star2::star2::star2::star2:

I could not implement any of the suggestions you made to bring the CPU load down (other than reducing the rate to 5fps). When I tried to use e.g. MMAL I received an error message saying that only VAAPI and VDPAU are supported. So I gave up on that one.

And although my Pi4 4GB seems to be just about coping with the load right now, I decided to order a Coral stick so that I can turn up the fps from the current 5fps and connect more than just 1 camera to it.
Sorry, but I’ll probably have to bother you again once I receive the stick so that I can get it set up properly :frowning:

Another issue I could not resolve was lowering the resolution of my camera’s sub stream to 640x480.
It was easy to do on the main stream, but I don’t want to lose the resolution in general.

And here are some snapshots of the metrics info you asked for:
Snapshot 1:

{
    "cameras": [
        {
            "name": "cam",
            "fps": {
                "decoder": 5.8,
                "sieve": 5.2,
                "visual_effects": 3.5,
                "snapshot": 5.2,
                "mqtt": 5.2
            },
            "buffer_in": 10,
            "buffer_out": 0
        }
    ],
    "detectors": [
        {
            "name": "CPU",
            "fps": 5.2,
            "fps_max": 7,
            "inference_time": 141.9
        }
    ]
}

Snapshot 2:

{
    "cameras": [
        {
            "name": "outdoorcam",
            "fps": {
                "decoder": 6.6,
                "sieve": 5.3,
                "visual_effects": 5.4,
                "snapshot": 5.3,
                "mqtt": 5.3
            },
            "buffer_in": 0,
            "buffer_out": 0
        }
    ],
    "detectors": [
        {
            "name": "CPU",
            "fps": 5.3,
            "fps_max": 7,
            "inference_time": 143.7
        }
    ]
}

Snapshot 3:

{
    "cameras": [
        {
            "name": "outdoorcam",
            "fps": {
                "decoder": 5.2,
                "sieve": 4.9,
                "visual_effects": 4.9,
                "snapshot": 4.9,
                "mqtt": 4.9
            },
            "buffer_in": 10,
            "buffer_out": 0
        }
    ],
    "detectors": [
        {
            "name": "CPU",
            "fps": 4.9,
            "fps_max": 7,
            "inference_time": 143.9
        }
    ]
}

Thanks again for your support and this great project :+1:

BTW:
After understanding the documentation a little better I disabled the encoder and now the Pi4 runs at about 100% CPU load and around 120F.

We may be speaking of different types of latency. I actually finally managed to get rid of it yesterday by using a wrapper with OpenCV to open a stream in the background, so that each time a frame is needed, a new RTSP stream does not need to be negotiated. The downside, of course, is that the stream is always on and loads up 10% of one CPU thread. Running the ffmpeg capture from the CLI, I actually see the response diagnosing 0% overhead and a capture time of 0.03s, corresponding to the 20fps… This is only the capture time, but the actual time from command sent to frame received varied from <1s to well over 3s. Most of the time it is around 1-2s and is due to the I/Os exchanged with my NVR to negotiate security and start the stream.

This is awesome. I think I will test it soon. I am using dlib for facial recognition at the moment. Will need this for other object processing.

Hi @asmirnou!

I just received the Coral Device and added the following section back into the config file:

    - -hwaccel
    -  vaapi
    - -hwaccel_device
    -  /dev/dri/renderD128
    - -hwaccel_output_format
    -  yuv420p

I’m getting the following error messages, though:

Watsor    | MainThread       werkzeug                 INFO    : Listening on ('0.0.0.0', 8080)
Watsor    | MainThread       root                     INFO    : Starting Watsor on 1bd657130284 with PID 13
Watsor    | outdoorcam       FFmpegDecoder            INFO    : [AVHWDeviceContext @ 0x55b8848730] No VA display found for device: /dev/dri/renderD128.
Watsor    | outdoorcam       FFmpegDecoder            INFO    : Device creation failed: -22.
Watsor    | outdoorcam       FFmpegDecoder            INFO    : Device setup failed for decoder on input stream #0:0 : Invalid argument
Watsor    | detector1        ObjectDetector           ERROR   : Detection failure
Watsor    | Traceback (most recent call last):
Watsor    |   File "/usr/local/lib/python3.6/dist-packages/watsor/detection/detector.py", line 88, in _run
Watsor    |     with detector_class(*detector_args) as object_detector:
Watsor    |   File "/usr/local/lib/python3.6/dist-packages/watsor/detection/edge_tpu.py", line 15, in __init__
Watsor    |     device_path=device_path)
Watsor    |   File "/usr/lib/python3/dist-packages/edgetpu/detection/engine.py", line 71, in __init__
Watsor    |     super().__init__(model_path, device_path)
Watsor    |   File "/usr/lib/python3/dist-packages/edgetpu/basic/basic_engine.py", line 90, in __init__
Watsor    |     model_path, device_path)
Watsor    | RuntimeError: Error in device opening (/sys/bus/usb/devices/2-2)!

It seems like I need to allow the docker container/user access to the USB Device, correct?

Any hints how I’d go about that?

EDIT:
I also added these lines back into the docker-compose.yaml:

    devices:
      - /dev/dri:/dev/dri

Hi @chairstacker,

The hwaccel options are not related to the Coral, but to the FFmpeg video decoder. The Coral performs inference on raw video frames, but before they become raw, the stream from the camera has to be decoded. That’s where these options come into play.

I can emulate the RuntimeError: Error in device opening error when I pretend to forget adding the host device to the container. If you use docker-compose, make sure you pass the following option:

    devices:
      - /dev/bus/usb:/dev/bus/usb

or, if you run the container from command line:

--device /dev/bus/usb:/dev/bus/usb \

The user in the docker container belongs to the plugdev group, which should have access to the device.

Besides /dev/bus/usb:/dev/bus/usb you can add another host device to the container - /dev/dri:/dev/dri. The latter is the set of video hardware decoders on the Raspberry Pi, to which the hwaccel options are applied. Maybe the fact that you hadn’t passed it to the container was the reason those options didn’t work?

Again, way over my head - sorry.

Here’s the ffmpeg part of my config.yaml

ffmpeg:
  decoder:
    - -hide_banner
    - -loglevel
    -  error
    - -nostdin
    - -fflags
    -  nobuffer
    - -flags
    -  low_delay
    - -fflags
    -  +genpts+discardcorrupt
    - -i                          # camera input field will follow '-i' ffmpeg argument automatically
    - -filter:v
    -  fps=fps=15
    - -f
    -  rawvideo
    - -pix_fmt
    -  rgb24

And this is my docker-compose.yaml

version: '3'

services:
  watsor:
    container_name: Watsor
    image: smirnou/watsor.pi4:latest
    environment:
      - LOG_LEVEL=info
    volumes:
      - /etc/localtime:/etc/localtime:ro
      - /etc/watsor:/etc/watsor:ro
    devices:
      - /dev/bus/usb:/dev/bus/usb
      - /dev/dri:/dev/dri
    ports:
      - 8080:8080
    shm_size: 512m

Maybe you need to install the Edge TPU runtime on Raspberry Pi OS prior to running Docker?

Still getting error messages:

Watsor    | MainThread       werkzeug                 INFO    : Listening on ('0.0.0.0', 8080)
Watsor    | MainThread       root                     INFO    : Starting Watsor on 31387a180f95 with PID 13
Watsor    | outdoorcam       FFmpegDecoder            INFO    : [h264 @ 0x55ae8828f0] corrupted macroblock 43 15 (total_coeff=-1)
Watsor    | outdoorcam       FFmpegDecoder            INFO    : [h264 @ 0x55ae8828f0] error while decoding MB 43 15
Watsor    | outdoorcam       FFmpegDecoder            INFO    : [h264 @ 0x55ae8828f0] corrupted macroblock 70 0 (total_coeff=-1)
Watsor    | outdoorcam       FFmpegDecoder            INFO    : [h264 @ 0x55ae8828f0] error while decoding MB 70 0
Watsor    | outdoorcam       DetectionSieve           ERROR   : Frame 0 missed
Watsor    | outdoorcam       FrameBuffer              WARNING : Stale frame 0 dated 39 seconds ago is in State.PUBLISH, resetting...

Do I need to install the TensorFlow Lite library as well or is this part of the docker container already?

Try unplugging the Coral, plugging it in again and waiting a couple of seconds before starting the docker container. Restarting the Raspberry Pi may also help.

The TensorFlow Lite library is part of the Docker image. It makes sense to install it outside of Docker and run a classification test just to identify whether the problem is in the USB connection of the device, in the 64-bit Raspberry Pi OS or in the Docker image.

Got it back up & running.

I added a 2nd camera at 2fps and turned up the fps for the 1st one to 10 but this seems to go straight to the CPU usage of the Pi4 itself.

Here’s the output from the metrics link:

{
    "cameras": [
        {
            "name": "outdoorcam",
            "fps": {
                "decoder": 10.3,
                "sieve": 10.3,
                "visual_effects": 10.1,
                "snapshot": 10.3,
                "mqtt": 10.3
            },
            "buffer_in": 10,
            "buffer_out": 0
        },
        {
            "name": "yardcam",
            "fps": {
                "decoder": 2.1,
                "sieve": 2.1,
                "visual_effects": 0.0,
                "snapshot": 2.1,
                "mqtt": 2.1
            },
            "buffer_in": 10,
            "buffer_out": 0
        }
    ],
    "detectors": [
        {
            "name": "Coral",
            "fps": 12.2,
            "fps_max": 58,
            "inference_time": 17.1
        }
    ]
}

I also noticed that when I increase the fps further, the Pi4 (2GB) that runs the ‘Supervised’ HA install gets very sluggish.
It seems like it gets overwhelmed by the MQTT messages from Watsor, with 3 messages per second for e.g. the ‘car’ sensor, which constantly keeps updating for the 3 cars it sees.

Got it back up & running.

What was the problem and how did you solve it?

I also noticed that when I increase the fps further, the Pi4 (2GB) that runs the ‘Supervised’ HA install gets very sluggish.
It seems like it gets overwhelmed by the MQTT messages from Watsor, with 3 messages per second for e.g. the ‘car’ sensor, which constantly keeps updating for the 3 cars it sees.

The slowness has nothing to do with MQTT. The messages being sent, even if there are tens of them per second, are small - several hundred bytes at most. They hardly cause performance problems.

Highly likely the UI of HomeAssistant leads to the tardiness. Watching the live stream of a Motion JPEG camera (and other camera integrations) in Lovelace uses a lot of resources on the machine. You may notice that closing the browser window results in a decrease of CPU load. I recommend opening the live stream UI, or the direct video stream from Watsor’s HTTP server, on a machine other than the one where the HomeAssistant backend is running.

A large video resolution results in more CPU load, especially since hardware acceleration is not enabled in FFmpeg. If it cannot be sorted out in the camera settings, reduce the size of the video using FFmpeg scaling, as in the sketch below.
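
For example, the decoder arguments posted earlier in this thread could combine scaling with the fps filter. The 640:480 here is only an assumption, and the width and height in the camera config must match whatever size you scale to:

ffmpeg:
  decoder:
    # ...the other options as posted above...
    - -filter:v
    -  scale=640:480,fps=fps=10   # downscale and cap the frame rate in one filter chain
    - -f
    -  rawvideo
    - -pix_fmt
    -  rgb24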

This! It is actually a combination of strain on the CPU and how the snapshots are being taken. If the stream is not on, then the lag and tardiness are due to the negotiation and establishment of the stream to get the one frame. If the stream is already established in a thread, then it is likely loading the CPU.