Object detection using OpenCV and Darknet/YOLO

I took what I learned from my last attempt using Darkflow and redid it, so it's now a modified version of the OpenCV plugin that uses the new Darknet support in OpenCV 3.4.3.

This eliminates the requirement for TensorFlow, and frankly it runs a bit faster and uses fewer resources. It also allows you to use the new and improved YOLOv3 models. I'm still able to run it on a Raspberry Pi 3 without much of a performance hit, although you will probably have to compile OpenCV from source if you want to run it there. Please try it out and let me know what you think.

Weight and cfg files for YOLOv3 can be found at https://pjreddie.com/darknet/yolo/. I recommend the YOLOv3-tiny model for use on the Raspberry Pi.


o/

Thanks for the plugin. I see in the sensor history that it has detected something once or twice (total_matches seems to be the variable used for the history).
How can I see/save the pictures used?
Is there a way to know which "object" it detected (car, dog, person, etc.)?

thanks a lot for your help!

What sort of FPS are you seeing with this on a Pi alone? Do you have anything else running on the Pi, or is it dedicated? How would you go about exporting the output as text through MQTT?

It doesn't really have a way of saving out the image that was analyzed. I tried to keep the output the same as the main OpenCV plugin's in order to maintain compatibility. If you want a history of when certain objects were detected, you can set up a binary sensor that checks for the relevant item. You can then see the history of the binary sensor and know when it detected something.

binary_sensor:
  - platform: template
    sensors:
      dog_at_door:
        friendly_name: "Dog is at door"
        value_template: >-
          {% if states.image_processing.opencv_dogcam.attributes.matches["dog"] is defined -%}True{%- endif %}

The FPS is highly dependent on the size of the input image and what the input image source is (native Raspberry Pi camera or IP network camera). The actual Home Assistant plugin has a time delta that defaults to every 30 seconds (the variable is named SCAN_INTERVAL). The actual inference time for a 640x480 image is less than 200 ms, so you could probably decrease the scan interval to 5 seconds. If you start seeing errors in the log saying it couldn't complete in the interval time, then you need to increase the value.
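If you want to change it without touching the code, I believe the standard scan_interval option on the image_processing platform will override that default; a rough sketch (the platform name and camera entity are placeholders for your own setup):

image_processing:
  - platform: opencv          # placeholder: use the name of the custom Darknet/YOLO component
    scan_interval: 5          # seconds between scans, instead of the 30 second default
    source:
      - entity_id: camera.front_door   # placeholder camera entity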

I've run this on the same Pi that I had Home Assistant running on, but you're not going to be able to handle many streams unless they're really low resolution or the scan interval is high.

I do have another project I've been working on that is designed to take the feed from a Raspberry Pi camera, perform the Darknet analysis, and then serve the image with the detected items highlighted as an MJPEG stream that can be viewed by other devices. It also has the ability to post the results to an MQTT server. It's still a little rough around the edges and a little purpose-built for the project I made it for, but if there's enough interest I can see about uploading it to my GitHub.
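As for getting the results out over MQTT from the Home Assistant side, one option is an automation that republishes the matches attribute with the mqtt.publish service whenever the image processing entity updates. This is only a sketch; the entity ID and topic are placeholders:

automation:
  - alias: "Publish detection results over MQTT"
    trigger:
      - platform: state
        entity_id: image_processing.opencv_dogcam    # placeholder entity
    action:
      - service: mqtt.publish
        data:
          topic: "home/detections/dogcam"            # placeholder topic
          payload_template: >-
            {{ state_attr('image_processing.opencv_dogcam', 'matches') | tojson }}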

@flashoftheblades nice work. Another project of interest is here, which includes the MQTT approach you describe. I think a nice PR to HA would be to add drawing of bounding boxes on images; I may do that soon.
Cheers

That project looks really interesting (both projects are amazing!), but I'm interested in the linked project because of the binary MQTT sensor. That seems to be exactly what I was aiming for in terms of person counting.


Nice catch, I’ll try that.

The issue is I need to know that it detected a "dog". Do you know how I could get "dog" as a variable and use it in a notification?

Thx a lot!

When you say you need to know when it detected a "dog", do you mean you literally need to know that a dog was detected? If that's the case, you can configure an automation to send a notification of some kind whenever the state of the binary sensor I demonstrated in my earlier post changes from off to on. You can create binary sensors for the other relevant things you care about (person, car, etc.); any of the items in your labels file would be valid.
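As a rough sketch, assuming the dog_at_door binary sensor from my earlier post and a notify.telegram service, the automation would look something like this:

automation:
  - alias: "Notify when a dog is detected"
    trigger:
      - platform: state
        entity_id: binary_sensor.dog_at_door
        from: "off"
        to: "on"
    action:
      - service: notify.telegram
        data:
          message: "A dog was detected at the door."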

If that's not what you meant, and you need to know whenever some kind of "object" is detected, and what that object type is, that gets a little more complicated. I'm not immediately aware of an easy way to do it, but someone smarter than me may be able to chime in.

Indeed, I wanted to be notified when any object was detected, and what it is :grinning:

So I managed to get notified this way:

- id: detection_opencv
  alias: "Detection Image opencv"
  trigger:
    - entity_id: image_processing.opencv_dafang3
      platform: numeric_state
      above: 0
    - entity_id: image_processing.opencv_foscam_camera
      platform: numeric_state
      above: 0
  action:
    service: notify.telegram
    data_template:
      message: >
        Dafang3: {{ states.image_processing.opencv_dafang3.attributes.matches }}, {{ states.image_processing.opencv_dafang3.attributes.total_matches }}
        Foscam: {{ states.image_processing.opencv_foscam_camera.attributes.matches }}, {{ states.image_processing.opencv_foscam_camera.attributes.total_matches }}

Would be great to get the picture with a square around the object, but that’s another story :slight_smile:

@flashoftheblades you could add firing of an event for each detected object; see how the facebox integration does this.

Here is an example of how you can retrieve that information into a variable or a sensor
https://github.com/skalavala/smarthome/tree/master/jinja_helpers#23-parsing-imageprocessing-json-data-and-making-sense-for-your-automations

I wrote this code for @arsaboo, who wants to retrieve only those tags that have a box size bigger than 0.5. I have variables for the camera and the tags, so that you can replace them with the actual trigger data, something like the following:

{% set camera = trigger.entity_id.attributes.camera.split('.')[1] %}
{%- set tags = state_attr('image_processing.tensorflow_driveway','matches').keys()|list -%}
{%- for object in tags -%}
{%- set outer_loop = loop %}
{%- for x in state_attr('image_processing.tensorflow_driveway','matches')[object]|list if x.box | max > 0.25 -%}
{%- if outer_loop.first %}{% elif outer_loop.last %}, {% else %},{% endif -%}{{ object }} 
{%- endfor -%}
{% endfor -%}
{{- ' detected in ' ~ camera if tags|count |int > 0 }}

The output would be something like car, bus detected in Driveway, which can be used for text-to-speech and notifications.


@skalavala This is not working. So, here is an example that we can all play with in the template editor. I manually set value_json to state_attr('image_processing.tensorflow_driveway','matches')

{% set value_json = {'person': [{'score': 99.69417452812195, 'box': [0.2749924957752228, 0.3772248923778534, 0.6910392045974731, 0.4704430401325226]}], 'car': [{'score': 99.01034832000732, 'box': [0.34889374375343323, 0.21685060858726501, 0.23301419615745544, 0.3547678291797638]}, {'score': 99.01034832000732, 'box': [0.54889374375343323, 0.21685060858726501, 0.23301419615745544, 0.3547678291797638]}, {'score': 98.71577620506287, 'box': [0.14932020008563995, 0.3567427694797516, 0.22214098274707794, 0.4808700978755951]}]} %}

{% set camera = 'driveway' %}
{%- set tags = value_json.keys()|list -%}
{%- for object in tags -%}
{%- for x in value_json[object]|list if x.box[0] > 0.25 -%}
{%- if loop.first %}{% elif loop.last %}, {% else %},{% endif -%}{{ object }}
{%- endfor -%}
{% endfor -%}
{{- ' detected in ' ~ camera if tags|count |int > 0 }}

returns personcar, car detected in driveway. It should, instead, return person, car detected in driveway. Further, if we try the same code with

{% set value_json = {'car': [{'score': 99.01034832000732, 'box': [0.14889374375343323, 0.21685060858726501, 0.23301419615745544, 0.3547678291797638]}, {'score': 98.71577620506287, 'box': [0.14932020008563995, 0.3567427694797516, 0.22214098274707794, 0.4808700978755951]}]} %}

we get detected in driveway. The second one should be empty.

Yep - Fixed all of them here :slight_smile:

https://github.com/skalavala/smarthome/tree/master/jinja_helpers#23-parsing-imageprocessing-json-data-and-making-sense-for-your-automations

This code is much cleaner now with the macros. It removes any duplicates, honors the box sizes, and does not show any output if nothing is returned. The previous code didn't handle the case where more than one item in the array matches the criteria.
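For anyone who does not want to pull in the macros, here is a rough sketch of the same idea using a Jinja namespace to collect each matching tag only once. This is not the macro from the repo, just an illustration against the value_json examples above, keeping the 0.25 box threshold from the earlier snippet:

{% set camera = 'driveway' %}
{%- set ns = namespace(found=[]) -%}
{%- for object in value_json.keys()|list -%}
  {%- for x in value_json[object]|list if x.box[0] > 0.25 -%}
    {%- if object not in ns.found -%}
      {%- set ns.found = ns.found + [object] -%}
    {%- endif -%}
  {%- endfor -%}
{%- endfor -%}
{{ ns.found | join(', ') ~ ' detected in ' ~ camera if ns.found | count > 0 }}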

@flashoftheblades this is a great plugin. I have a couple of questions.

  1. Is there an easy way to get the coordinates for cropping the image? For instance, I have a camera in my front yard, and I want to watch for a car in the driveway, not a car driving down the street.

  2. Is there a way to limit what it is searching for? For instance, I am only interested in cars and people, but I have noticed it is finding all sorts of things like chairs, etc. I am thinking that by removing things like chairs it will take less time to process, and that to accomplish this I might have to create a custom model.

Thanks again for all your help

I’m glad you’re enjoying the plugin.

  1. If you load a frame from the camera into something like MS Paint, you should be able to see the pixel coordinates.

  2. The only way to limit what is searched for by the model is to retrain it. That's beyond the scope of what I'm going to cover here, but there are tutorials out there for training your own YOLOv3 model.

I am trying to think of a way to reduce the load on my server. I have a lot of cameras that I would like this to process (family farm). This is great, but it is running constantly on images that for the most part are not changing. It would be nice to have this triggered by a sensor that detects motion, and then have it grab the image and process it. I have been fooling around with a Python script that calls Darknet, gets the results, and saves them off to a variable. I was thinking I could make a call to the HA API to update the state attributes on an entity, but this seems clumsy. Any suggestions? On the other hand, it also generates the image with the boxes around the items that everyone wants.

Not sure what you're running for your server, but the Raspberry Pi is never going to be able to effectively process more than one or two cameras at a decent frame rate. Anything more and you really need something with a GPU. You can build OpenCV with CUDA/OpenCL support to accelerate the processing of images. It sounds like you're trying to make some kind of security system using Home Assistant and all the cameras. I ended up using a different method for my home setup (a combination of ZoneMinder and some custom scripts to do the image recognition and triggering of the system).

If you do find a way to generate some kind of alert whenever the cameras sense motion, you can have Home Assistant kick off the scan service for the image processor to get an immediate result instead of waiting for the scan interval.
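A minimal sketch of that automation, assuming a motion binary sensor and an image processing entity named as below (both names are placeholders):

automation:
  - alias: "Scan driveway camera on motion"
    trigger:
      - platform: state
        entity_id: binary_sensor.driveway_motion       # placeholder motion sensor
        to: "on"
    action:
      - service: image_processing.scan
        data:
          entity_id: image_processing.opencv_driveway  # placeholder image processing entity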

If it helps, I have motionEye set up on my Raspberry Pi and have enabled motion detection. Notifications are also enabled, and I have set up webhook notifications to trigger scripts on my HA instance. So I created a couple of scripts, and HA takes snapshots of the camera and saves the files.
I have this set up for a completely different reason than this topic, but I think you can implement it for your purpose.
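For reference, the Home Assistant side of that can look something like the sketch below; the webhook ID, camera entity, and file path are placeholders for whatever motionEye is configured to call:

automation:
  - alias: "Snapshot on motionEye webhook"
    trigger:
      - platform: webhook
        webhook_id: motioneye_front_door              # placeholder webhook ID
    action:
      - service: camera.snapshot
        data:
          entity_id: camera.front_door                # placeholder camera entity
          filename: /config/www/snapshots/front_door_last_motion.jpg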

That is how I am using it: an automation triggers image processing on motion, and only for the camera where motion was detected. Check my repo for the relevant automations.