Image processing efficiency

This is addressed mainly to the devs at Nabu Casa:

I am trying to see if the team would be open to some major changes in the core around the image processing component.
There is quite a bit of interest in face detection/recognition, object detection, and camera stream processing in general. From what I have seen, the camera components all more or less rely on ffmpeg, while almost all the Python code I see for image processing relies on OpenCV (which itself uses ffmpeg, but within its own code).
The problem is that I have observed a massive amount of inefficiency coming from three areas:

  1. Running ffmpeg at the OS level forces some re-encoding by ffmpeg that I have not been able to pin down. As a result, OpenCV uses less than 50% of the CPU load ffmpeg does for exactly the same stream. I verified this outside of HA in my own Python 3 test.
  2. The camera components in HA are designed to get a snapshot every 10 s to display on Lovelace, either by sending a snapshot URL to the camera or by asking ffmpeg to start a stream, grab one frame, and shut the stream off. The latter can be extremely slow (on the order of 1-2 s), which is not usable for image processing.
  3. The camera component is also designed to process JPEG/MJPEG, forcing an additional encoding step and loading up the CPU with constant decoding/encoding if the stream stays open. For image processing this is unnecessary; the timing sketch after this list illustrates the overhead.
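
To make point 3 concrete, here is a minimal, self-contained timing sketch of the JPEG round trip each frame currently goes through. The frame is synthetic so no camera is needed, and the exact numbers will of course vary with resolution and CPU:

```python
import time

import cv2
import numpy as np

# Synthetic 1080p BGR frame standing in for a decoded camera frame.
frame = np.random.randint(0, 255, (1080, 1920, 3), dtype=np.uint8)

start = time.perf_counter()
for _ in range(100):
    ok, jpeg = cv2.imencode(".jpg", frame)          # camera side encodes...
    decoded = cv2.imdecode(jpeg, cv2.IMREAD_COLOR)  # ...image processing decodes again
elapsed = time.perf_counter() - start

# Pure CPU overhead that handing over the raw NumPy array would avoid.
print(f"JPEG round trip: {elapsed * 10:.1f} ms/frame")
```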

I have implemented and thoroughly tested the following solution, but because it touches some core components, it would mean major breaking changes for a bunch of integrations:

  1. Replaced ffmpeg calls with OpenCV calls for the camera components I use (amcrest and ffmpeg, though I am sure it can apply to other components) and had them keep a constant stream open within a thread. This eliminated the 1-2 s lag for image processing and decreased my CPU load per stream by more than 50% compared to doing the same thing with ffmpeg.
  2. Added a get-raw-image function in the core camera component and in the individual camera integration components, so that frames headed for the image processing components arrive as NumPy arrays instead of JPEG-encoded byte streams, eliminating 2-3 encoding/decoding steps the CPU would otherwise perform for every processed frame. This decreased the CPU load by another 50%. It of course also required modifying the image processing integration components so they no longer decode the image. A sketch of both changes follows this list.
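
To show what I mean, here is a stripped-down sketch of the two changes together. The names (ThreadedCamera, get_raw_image, the RTSP URL) are just for the example, not the actual HA camera API:

```python
import threading

import cv2


class ThreadedCamera:
    """Keeps one OpenCV stream open and hands out raw frames on demand."""

    def __init__(self, stream_url: str):
        # The stream stays open, so there is no per-snapshot ffmpeg startup lag.
        self._capture = cv2.VideoCapture(stream_url)
        self._lock = threading.Lock()
        self._frame = None
        self._running = True
        threading.Thread(target=self._reader, daemon=True).start()

    def _reader(self):
        # Continuously drain the stream so get_raw_image() never blocks on I/O.
        while self._running:
            ok, frame = self._capture.read()
            if ok:
                with self._lock:
                    self._frame = frame

    def get_raw_image(self):
        # Raw BGR NumPy array: no JPEG encode/decode between the camera
        # and the image processing component.
        with self._lock:
            return None if self._frame is None else self._frame.copy()

    def close(self):
        self._running = False
        self._capture.release()


# Usage: feed frames straight into a detection loop.
# cam = ThreadedCamera("rtsp://user:pass@camera.local/stream")  # hypothetical URL
# frame = cam.get_raw_image()  # ndarray, ready for face/object detection
```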

The final result is a 75% reduction in CPU load per processed camera stream… and this is with the GPU decoding the h264/h265 stream and running parts of the image processing.

I will create GitHub pull requests if needed, but of course… this will require modifying other components, most of which I don't use.

I don't think they read it here in the forum.
Maybe better to ask in the Discord devs_core channel.

Thanks for the suggestion! I may actually just do it on GitHub and see what feedback I get.