We can continue the discussion here. I find your results very intriguing. I have been working on compiling OpenCL support into my OpenCV Docker image so I can make use of the Transparent API to speed things up. I am trying to see how far I can push the Intel chipset before I decide to get either the Coral stick or a dedicated GPU. I am of the view that OpenCV could be optimised even further once Intel VA can be compiled with it.
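For anyone else experimenting with the Transparent API, this is roughly what it looks like once OpenCV is built with OpenCL: wrap frames in a `cv2.UMat` and supported operations get offloaded automatically. A minimal sketch (the stream URL is just a placeholder):

```python
import cv2

# The T-API kicks in once OpenCL is available and enabled.
if cv2.ocl.haveOpenCL():
    cv2.ocl.setUseOpenCL(True)

cap = cv2.VideoCapture("rtsp://camera/stream")  # placeholder URL
ok, frame = cap.read()
if ok:
    # Wrapping the frame in a UMat lets OpenCV offload supported
    # operations (resize, color conversion, filtering...) to the GPU.
    u = cv2.UMat(frame)
    u = cv2.resize(u, (1280, 720))
    u = cv2.cvtColor(u, cv2.COLOR_BGR2GRAY)
    result = u.get()  # copy back to a regular numpy array when needed
```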
So I can provide some updates on this: I fixed what I thought was an OpenCV bug with a CUDA memory issue, and it turned out to be my own ignorance. I had to run the inference functions within an async wrapper to prevent memory conflicts, and I now have fully GPU-supported stream management and inference with OpenCV. I have also switched to a TensorFlow-framework MobileNetV3 model, which is much easier to code and… tada…
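For reference, the async wrapper is nothing fancy. A minimal sketch of the idea, assuming a blocking `infer(frame)` call (names here are illustrative): serialize all inference through a single-worker executor so concurrent streams can never touch the CUDA context at the same time.

```python
import asyncio
from concurrent.futures import ThreadPoolExecutor

# One worker means inference calls are serialized: no two streams
# can hit the GPU/CUDA context concurrently.
_infer_executor = ThreadPoolExecutor(max_workers=1)

def infer(frame):
    """Blocking OpenCV DNN inference - placeholder for the real call."""
    ...

async def async_infer(frame):
    loop = asyncio.get_running_loop()
    # Hand the blocking call to the dedicated executor so the event
    # loop (and the other streams) keep running while the GPU works.
    return await loop.run_in_executor(_infer_executor, infer, frame)
```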
I am now getting much lower CPU loading: basically only 3.5% per stream and 0% for inference.
On the GPU, the decoder load is 6% per stream and the inference adds an average of 3%.
Much better than watsor_gpu using FFmpeg, which was at 30% (vs. 3.5% now) on the CPU for decoding (still with the GPU) and 20 W per inference stream (vs. 0 W currently) for my 4 streams.
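If it helps anyone, the GPU-side decode path in OpenCV looks roughly like this. It's a sketch assuming an OpenCV build with CUDA and NVIDIA's video codec support (NVCUVID), and the stream URL is a placeholder:

```python
import cv2

# Requires an OpenCV build with CUDA + NVCUVID support.
reader = cv2.cudacodec.createVideoReader("rtsp://camera/stream")

while True:
    ok, gpu_frame = reader.nextFrame()  # frame stays in GPU memory (GpuMat)
    if not ok:
        break
    # Downstream processing can stay on the GPU too, e.g.:
    gray = cv2.cuda.cvtColor(gpu_frame, cv2.COLOR_BGRA2GRAY)
```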
The code is here, and you would have to download the MobileNetV3 model from the OpenCV GitHub site:
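In the meantime, the core of it is loading the model into OpenCV's DNN module with the CUDA backend, roughly like this (the filenames and the 320x320 input size are placeholders; match whatever the model zoo entry ships):

```python
import cv2

# Placeholder filenames: use the actual frozen graph/config from the model zoo.
net = cv2.dnn.readNetFromTensorflow("frozen_inference_graph.pb", "graph.pbtxt")
net.setPreferableBackend(cv2.dnn.DNN_BACKEND_CUDA)
net.setPreferableTarget(cv2.dnn.DNN_TARGET_CUDA)

def infer(frame):
    # Input size is an assumption; match what the model was exported with.
    blob = cv2.dnn.blobFromImage(frame, size=(320, 320), swapRB=True)
    net.setInput(blob)
    return net.forward()
```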
This is pretty amazing stuff, and the difference in efficiency compared to FFmpeg still surprises me. I will study your changes and see how I can integrate them into mine, though I don't have a dedicated GPU, so I'm not able to achieve that level of minimal CPU usage.
So I got to my final setup with all streams running, and I am now running the YOLOv4 model on OpenCV.
Comparing the 4 streams on which I had person detection set up on Watsor to my current setup, keeping in mind also that YOLO is more accurate than the Inception V2 model I previously used:
Watsor/FFmpeg/Inception V2/GPU:
Stream decoding: 15% CPU load per stream when downsized to 720p, 40% CPU load per stream at full size. ~2% GPU decoder load per stream.
Object detection: added ~10% CPU load per stream and 20 W consumption per stream on the GPU. The model uses a 300x300 input.
Home Assistant/OpenCV/YOLOv4/GPU:
Stream decoding: 3.5% CPU load per stream at full size. ~2% GPU decoder load per stream.
Object detection: added 8% CPU load per stream and 0.5 W consumption per stream on the GPU. The model uses a 608x608 input.
My old Watsor/FFmpeg setup was loading my CPU at 60% + 40% = 100% (one full thread) and added 80 W to my GPU power draw.
My new setup with OpenCV loads my CPU at 14% + 32% = 46% with under 2 W of GPU power draw, and again it is more accurate.
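For anyone who wants to reproduce the YOLOv4 side, OpenCV's detection-model wrapper keeps it compact. A minimal sketch, with placeholder paths for the weights and config (the 608x608 input matches the numbers above):

```python
import cv2

net = cv2.dnn.readNetFromDarknet("yolov4.cfg", "yolov4.weights")  # placeholder paths
net.setPreferableBackend(cv2.dnn.DNN_BACKEND_CUDA)
net.setPreferableTarget(cv2.dnn.DNN_TARGET_CUDA)

model = cv2.dnn_DetectionModel(net)
# 608x608 input, pixels scaled to [0,1], BGR->RGB swap as YOLO expects.
model.setInputParams(size=(608, 608), scale=1 / 255.0, swapRB=True)

def detect(frame, conf_threshold=0.4, nms_threshold=0.4):
    # Returns class ids, confidences, and boxes with NMS already applied.
    class_ids, confidences, boxes = model.detect(frame, conf_threshold, nms_threshold)
    return class_ids, confidences, boxes
```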
Well, not so final after all. I made some further updates today:
I added one more camera stream, for a total of 6 streams managed by OpenCV, and am running facial and object recognition, in both cases with my own modified components.
With the additional camera, Home Assistant CPU utilization went up to 76%, and I started getting bothered:
Before, I had a baseline of 8% + 11% (facial recognition stream) + the 46% I described in the previous post = 66%, so the extra stream only added about 10%, which is not inconsistent. But then I looked at the CPU load average from top and noticed it had skyrocketed to 3~3.5… I am only running 2 CPU threads on my VM, which means the CPU was overwhelmed, and that is why my GPU used so little additional power: it was waiting for the CPU to feed it.
Long story short, I looked at the Home Assistant code and found that the image processing component pulls its frames from the camera component in JPEG format, and the camera component streams MJPEG. That all makes sense for displaying in the UI, but it makes no sense for image processing. This is what each image frame goes through:
Camera stream in H.264 or H.265 format -> decoded by FFmpeg or OpenCV on the GPU -> raw -> converted by the CPU to a NumPy array -> encoded by the CPU to JPEG -> converted to bytes by the CPU -> decoded by the CPU back to a NumPy array for processing.
It’s pretty crazy. I therefore modified the camera component, the FFmpeg component, and image processing to add a “get raw image” function, and modified my dlib and OpenCV image processing components to eliminate the extra decoding steps. This is what it looks like now:
Camera stream in H.264 or H.265 -> decoded by the GPU to raw -> converted by the CPU to a NumPy array -> processing.
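To make the difference concrete, here is the round trip being eliminated, in plain OpenCV terms (the processing function and stream URL are stand-ins):

```python
import cv2
import numpy as np

def process(image):
    """Stand-in for the dlib/OpenCV image processing step."""
    ...

cap = cv2.VideoCapture("rtsp://camera/stream")  # placeholder URL
ok, frame = cap.read()

# Old path: raw frame -> JPEG bytes -> back to a NumPy array.
# Both the encode and the decode burn CPU for nothing.
ok, jpeg = cv2.imencode(".jpg", frame)
data = jpeg.tobytes()
decoded = cv2.imdecode(np.frombuffer(data, dtype=np.uint8), cv2.IMREAD_COLOR)
process(decoded)

# New "get raw image" path: hand the decoded frame straight to processing.
process(frame)
```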
These mods dropped my Home Assistant CPU utilization from 76% to 37% and the CPU load average from 3~3.5 to 0.6~1… but now my GPU is being fed a lot more frames, and its power consumption went up by 25 W. Still less than a third of Watsor, and with an additional stream…
Summary:
I will probably post these code change suggestions as GitHub pull requests.
These results are amazing. Unfortunately, I still don’t get this level of utilisation, as I have yet to figure out the GPU stuff on Intel in OpenCV.
But I’m glad you got this working; now I know I have work to do to get the most out of my system. I do use raw frames already, so I don’t need to stress over the JPEG compression.
Thanks for all of this.