New Custom Component - Image Processing - Object Detection - DOODS

It’s a real case: for some reason the city council exposes public webcams as static .gif images that get updated every ~10 minutes or so.
I created a generic camera in HA with just the still image URL pointing to those; no authentication.

Then, when I try to use that generic camera as input for DOODS, it breaks.

Any chance of looking into casting the returned value as a numeric state?
This history of values is useless to work with: to calculate trends, to reduce database size through the statistics sensor recording, etc.

It should be just a ‘small’ change: defining the generated sensor’s state_class as a ‘measurement’ entity.
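
Until that lands, one possible interim workaround is to wrap the DOODS count in a template sensor that declares the state class itself. This is only a sketch; the source entity, name and unit below are placeholders for whatever your DOODS entity is called:

template:
  - sensor:
      - name: "Front camera person count"
        # Placeholder source entity; point this at your own DOODS entity
        state: "{{ states('image_processing.doods_front_camera') | int(0) }}"
        state_class: measurement
        unit_of_measurement: "objects"  # arbitrary unit, only there for statistics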

Thanks!


Man this is cool!

I have a bit of a strange use case (probably). I’m starting to get long ads popping up when I watch YouTube videos on our Android TV, and I thought: Wouldn’t it be cool if Home Assistant could recognize and click the “Skip ad” button for me when it pops up? DOODS is working great and recognizing objects in the local camera screen grabs from the media_player entity. However, the “Skip ad” button doesn’t look enough like a skateboard or snowboard to be recognizable… :joy: I think the next step is to import a different model when there are not suitable labels in the default model, or perhaps eventually train my own to recognize this button. Can somebody nudge me in the right direction for changing the model for a HassOS add-on installation running on RPi4, or any other insights on how to proceed? I know this use case is ridiculous, but it sure would be a fun party trick.

Nice idea, entirely over-engineered, as is tradition :smiley:

There is an alternative which might be easier to use: GitHub - yuliskov/SmartTubeNext: Ad free app for watching tube videos on Android TV boxes

Ah looks interesting, but indeed not quite what I’m aiming for. Maybe there is a different/easier/initial-step way to do the same thing with optical character recognition instead of broader object recognition, but I would like to continue using YouTube’s own app with an overengineered Home Assistant solution running in the background. For me at least half the fun is making the function alongside actually using it :nerd_face:

For anybody else as crazy as me, I actually managed to get this working with a custom tflite supervised learning model to recognize the YouTube ‘Skip ad’ button and the ad notification in the top left. There’s now a little AI running in the background when YouTube is up on the TV and exclaiming ‘Ad dismissed, oh yeah!’ on the smart speaker when the button pops up. Below is some info to help you get started with your own model creation.

Note that this doesn’t actually block ads from popping up (that wasn’t my intention), but instead uses a simple model to recognize the button and then press it for you (because… why not? Supervised learning is cool).
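
To give an idea of the glue involved, here is a rough sketch of the kind of automation; it is not my exact configuration, and the entity ids, ADB keycode and TTS service are all placeholders. It fires when the DOODS entity reports a skip_button match, sends a select key to the TV and makes the announcement.

automation:
  - alias: "Press the YouTube skip button when DOODS spots it"
    trigger:
      - platform: state
        entity_id: image_processing.doods_android_tv_screen
    condition:
      - condition: template
        value_template: "{{ trigger.to_state.attributes.summary.get('skip_button', 0) > 0 }}"
    action:
      # Placeholder: select/DPAD_CENTER via the androidtv integration's ADB service
      - service: androidtv.adb_command
        data:
          entity_id: media_player.android_tv
          command: "input keyevent 23"
      - service: tts.google_translate_say
        data:
          entity_id: media_player.smart_speaker
          message: "Ad dismissed, oh yeah!"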

Here is a video that gives an overview of how to create your own tflite model: Train a custom object detection model using your data - YouTube. Below is a chronology of where I went wrong along the way, in case it reduces your pain:

  • I had no end of problems trying to install the python library tflite-model-maker, despite spending ~10hrs and trying it on PC, Mac & even different Python versions on the same machine. PIP insisted on downloading 20GB+ in nightly builds before failing every time. Use this colab notebook directly instead: Colab: Train custom model tutorial. You can upload your own training and validation datasets in the files tab on the left, and use the first part of the colab notebook to generate your own tflite object recognition model.
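
For orientation, the part of the notebook that actually builds the model boils down to a handful of tflite-model-maker calls, roughly like the sketch below. The paths, label names and training parameters here are illustrative, not the notebook’s exact cell:

from tflite_model_maker import object_detector
from tflite_model_maker.object_detector import DataLoader

# Illustrative label map and folder layout for the Pascal VOC export
label_map = {1: 'ad_notice', 2: 'skip_button'}
train_data = DataLoader.from_pascal_voc('images/train', 'annotations/train', label_map)
val_data = DataLoader.from_pascal_voc('images/val', 'annotations/val', label_map)

spec = object_detector.EfficientDetLite0Spec()
model = object_detector.create(train_data, model_spec=spec, epochs=50,
                               validation_data=val_data, train_whole_model=True)
model.export(export_dir='.', tflite_filename='youtubeads.tflite')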

  • I labeled my images using Label Studio, but closed the session before realizing how much troubleshooting I would need to do with the training data in tflite-model-maker. Don’t do that… There’s an overview of how to use Label Studio here.

  • The annotated training set should be exported from Label Studio in the export format containing ‘Pascal VOC’.

  • The XML files should not contain an XML declaration (I had to manually delete the first line in all of my XML files, but perhaps there is some way to configure this in Label Studio). A small script for that cleanup is sketched below.
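
If you have a lot of annotation files, something like the following can do that first-line cleanup for you; this is just a sketch, and the annotations/ path is an assumption for wherever Label Studio exported to:

import glob

# Strip the XML declaration ("<?xml ... ?>") from each Pascal VOC annotation file.
for path in glob.glob('annotations/*.xml'):
    with open(path, 'r', encoding='utf-8') as f:
        text = f.read()
    if text.lstrip().startswith('<?xml'):
        text = text[text.index('?>') + 2:].lstrip('\n')
        with open(path, 'w', encoding='utf-8') as f:
            f.write(text)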

  • I had many errors pop up in the colab notebook about the images not being in JPEG format. This has something to do with the colorspace even when they are JPEGs in RGB, but it was hard to pinpoint. I came across the below script here that got the image set working for me.

# Convert any image saved with a .jpg extension that isn't actually a
# JPEG into a real RGB JPEG, so tflite-model-maker accepts the dataset.
import glob

from PIL import Image

files = glob.glob('images/*')
print(len(files))

for file in files:
    if '.jpg' in file:
        image = Image.open(file)
        if image.format not in ['JPG', 'JPEG']:
            print(file)
            image.convert("RGB").save(file, 'JPEG')

  • To test the model in Google Colab, I uploaded a screenshot to a free online service and pasted the URL ending in ‘.jpg’ into the “Run object detection and show the detection results” section of the colab notebook.
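
If you would rather sanity-check the exported model locally instead of in Colab, a minimal interpreter run looks roughly like this. The file names are placeholders, and it assumes the exported model takes the usual image tensor input:

import numpy as np
import tensorflow as tf
from PIL import Image

# Load the exported model and look up its expected input size and dtype.
interpreter = tf.lite.Interpreter(model_path='youtubeads.tflite')
interpreter.allocate_tensors()
inp = interpreter.get_input_details()[0]
_, height, width, _ = inp['shape']

# Resize a test screenshot to the model's input and run one inference.
img = Image.open('screenshot.jpg').convert('RGB').resize((width, height))
data = np.expand_dims(np.asarray(img, dtype=inp['dtype']), axis=0)
interpreter.set_tensor(inp['index'], data)
interpreter.invoke()

# Print the raw output tensor shapes (boxes/classes/scores/count for most
# detection models) just to confirm the model loads and runs.
for out in interpreter.get_output_details():
    print(out['name'], interpreter.get_tensor(out['index']).shape)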

  • Next step was to copy the model and labelFile to the /share folder in HassOS. Make sure to update your DOODS2 AddOn config.yaml to include the new model with correct locations. For my model & labelFile I also had to include the flag labelsStartFromZero: true. In the end my AddOn config looked like this:

logger:
  level: info
server:
  port: "8080"
  auth_key: ""
doods.detectors:
  - name: default
    type: tflite
    modelFile: /opt/doods/models/coco_ssd_mobilenet_v1_1.0_quant.tflite
    labelFile: /opt/doods/models/coco_labels0.txt
    hwAccel: false
  - name: youtubeads
    type: tflite
    modelFile: /share/doods/youtubeads.tflite
    labelFile: /share/doods/youtubeads_labels.txt
    labelsStartFromZero: true
    hwAccel: false
  - name: tensorflow
    type: tensorflow
    modelFile: /opt/doods/models/faster_rcnn_inception_v2_coco_2018_01_28.pb
    labelFile: /opt/doods/models/coco_labels1.txt
    hwAccel: false

  • My plaintext labelFile looked as follows. Rows must start with an index, and the labels must be in the same order as in the model for objects to be labeled correctly. The labels are written exactly as I had them in Label Studio when preparing the model training set.

0 ad_notice
1 skip_button

  • Thanks to some nice work from snowzach supporting these models, all that’s left is to check that your model is working through the DOODS2 web portal. There you can also play around with shrinking the areas where objects can be detected, and use “show configured detectors” to see how the labels were loaded in.
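
On the Home Assistant side, the image_processing platform then just needs to point at the new detector. This is only a sketch; the host, camera entity, confidence and scan interval are placeholders:

image_processing:
  - platform: doods
    url: "http://<doods-host>:8080"
    detector: youtubeads
    source:
      - entity_id: camera.android_tv_screen
    confidence: 60
    scan_interval: 10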

Damn, and I just wanted to know if it can use my front yard security camera (UniFi) to tell me if the bins are out.

Do you have your YT setup also clicking the skip ad button as soon as it practically can?

It’s workable, but actually pretty slow. The DOODS processing happens quite fast (0.4-0.6 sec), but everything else is slow (3-10 sec). For now it’s enough that whoever’s nearest the remote control is slower to respond :+1:

The YouTube screengrab is large (1920x1080 px) and comes from a generic camera I’ve configured from a local image. Unfortunately, its source entity_picture attribute from the Android TV media player only updates every ~5 seconds, so I have to get lucky with the capture time to have any speed. I also haven’t had any luck getting the remote commands to the TV to go fast via any method I’ve found so far (I have tried generic & learn_sendevent ADB commands and the Philips TV remote), so it also takes 2-3 seconds for Home Assistant to push the skip ads button.
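
For reference, the screengrab camera itself is nothing fancy; it is roughly something like the sketch below, which templates the media player’s entity_picture into a generic camera’s still image URL. The host and entity ids are placeholders:

camera:
  - platform: generic
    name: android_tv_screen
    # Placeholder host; entity_picture is a relative URL that includes an access token
    still_image_url: "http://<ha-host>:8123{{ state_attr('media_player.android_tv', 'entity_picture') }}"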

Perhaps one day I’ll work on getting that faster. In particular, the slow button presses are annoying because that’s a problem for a few other automations, but that’s not as fun to fix as over-the-top stuff. My next ridiculous home automation goal is a vision-driven water gun turret to discourage (humanely) the crows from knocking all our apples off the tree later in the year. :nerd_face:

I am also looking for the same exact thing. Did you manage to find any pretrained model to recognize bins?

It’s not actually that hard to make your own trained model. Maybe give that a shot (see the steps in my post above)? I imagine it could be quite accurate when you use your actual camera view and actual bins. For my YouTube ad button model trained on 100 or so stills I haven’t noticed any false positives. I think the Google colab article/script thing I linked to uses even less training data for its figurine detection example.


Hi Snow, is it possible to have a Docker Compose configuration for a Jetson Xavier NX that can use its GPU?
Thanks a lot!

No, and I have no idea how to train it on the bins.
I might look at using Blue Iris or similar.

Hey, I don’t have one so unfortunately I wouldn’t know how to add support for it… Sorry.

Coming back to this, I realize that DOODS is a core integration in HA and as such this should be reported directly on GitHub.
I have just done that: DOODS state not handled as numerical although being a count · Issue #92506 · home-assistant/core · GitHub. Hopefully it will be picked up and implemented soon enough…

Hey @snowzach, I’m using DOODS with a Docker image. If I use the DOODS2 image, can I still use my Home Assistant / Node-RED configuration or do I have to change something?

Can someone help me investigate why the detection is now really slow? It takes about 10 seconds to scan a static image (100 kB).

It depends on your CPU and model. The tensorflow model on a Raspberry Pi typically takes 10 seconds.

I’m on an Intel NUC with a Celeron CPU and 4 GB RAM. Is that a normal scan time?

It depends on how slow of a CPU it is… That doesn’t seem out of the question if it’s a slow one…

It’s a J3455 Celeron; what do you think?