The motion JPEG decoder is getting a GIF, which must be confusing it. Is this a real use case or just playing around? Any chance you can switch to using JPG? Is this an animated GIF?
It’s a real case. For some reason the city council exposes public webcams as static .gif images that get updated roughly every 10 minutes.
I created a generic camera in HA with only the still image URL defined, pointing at those images, with no authentication.
Trying to use that generic camera as input for DOODS then breaks.
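For reference, the camera config is essentially just this (the URL is a placeholder, not the real endpoint; on newer HA versions the same still-image-only setup is done through the UI instead):

camera:
  - platform: generic
    name: city_webcam
    # still image only - no stream_source and no authentication
    still_image_url: "https://example-city.example/webcams/cam1.gif"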
Any chance of looking at casting the returned value as a numeric state?
As it stands, the history of values is useless to work with: calculating trends, reducing database size through long-term statistics recording, etc.
It should be just a ‘small’ change: defining the generated sensor’s state_class as ‘measurement’.
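In the meantime, a template sensor along these lines (the DOODS entity ID here is hypothetical) should give much the same effect by casting the state to a numeric sensor with state_class: measurement:

template:
  - sensor:
      - name: "DOODS object count"
        # cast the DOODS entity state to a number, defaulting to 0
        state: "{{ states('image_processing.doods_front_door') | int(0) }}"
        state_class: measurement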
I have a bit of a strange use case (probably). I’m starting to get long ads popping up when I watch YouTube videos on our Android TV, and I thought: wouldn’t it be cool if Home Assistant could recognize and click the “Skip ad” button for me when it pops up? DOODS is working great and recognizing objects in the local camera screen grabs from the media_player entity. However, the “Skip ad” button doesn’t look enough like a skateboard or snowboard to be recognizable… I think the next step is to import a different model when there are no suitable labels in the default model, or perhaps eventually train my own to recognize this button. Can somebody nudge me in the right direction for changing the model on a HassOS add-on installation running on an RPi4, or offer any other insights on how to proceed? I know this use case is ridiculous, but it sure would be a fun party trick.
Ah, looks interesting, but indeed not quite what I’m aiming for. Maybe there is a different/easier/initial-step way to do the same thing with optical character recognition instead of broader object recognition, but I would like to continue using YouTube’s own app with an overengineered Home Assistant solution running in the background. For me, at least half the fun is building the thing alongside actually using it.
For anybody else as crazy as me, I actually managed to get this working with a custom tflite supervised learning model to recognize the YouTube ‘Skip ad’ button and the ad notification in the top left. There’s now a little AI running in the background when YouTube is up on the TV and exclaiming ‘Ad dismissed, oh yeah!’ on the smart speaker when the button pops up. Below is some info to help you get started with your own model creation.
Note that this doesn’t actually block ads from popping up (that wasn’t my intention), but instead uses a simple model to recognize the button and then press it for you (because… why not? Supervised learning is cool).
I had no end of problems trying to install the Python library tflite-model-maker, despite spending ~10 hours and trying it on PC, Mac and even different Python versions on the same machine. pip insisted on downloading 20 GB+ of nightly builds before failing every time. Use this Colab notebook directly instead: Colab: Train custom model tutorial. You can upload your own training and validation datasets in the Files tab on the left, and use the first part of the notebook to generate your own tflite object recognition model.
I labeled my images using Label Studio, but closed the session before realizing how much troubleshooting I would need to do with the training data in tflite-model-maker. Don’t do that… There’s an overview of how to use Label Studio here.
The annotated training set should be exported from Label Studio in the ‘Pascal VOC’ format.
The XML files should not contain an XML declaration (I had to manually delete the first line in all of my XML files, but perhaps there is a way to turn it off in Label Studio).
I had many errors pop up in the Colab notebook about the images not being in JPEG format. It seems to have something to do with the colorspace, even when the files are RGB JPEGs, but it was hard to pinpoint. I came across the script below here that got the image set working for me.
import glob
from PIL import Image

# Find every image in the training folder
files = glob.glob('images/*')
print(len(files))

for file in files:
    if '.jpg' in file:
        image = Image.open(file)
        # Re-save anything that only pretends to be a JPEG (wrong format/colorspace)
        if image.format not in ['JPG', 'JPEG']:
            print(file)
            image.convert("RGB").save(file, 'JPEG')
To test the model in Google Colab, I uploaded a screenshot to a free online image host and pasted the URL ending in ‘.jpg’ into the “Run object detection and show the detection results” section of the Colab notebook.
The next step was to copy the model and labelFile to the /share folder in HassOS. Make sure to update your DOODS2 add-on config.yaml to include the new model with the correct locations. For my model and labelFile I also had to include the flag labelsStartFromZero: true. In the end my add-on config looked something like this:
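(The detector name and file paths below are examples rather than my exact entries; point them at wherever you copied your files, and note the nesting may look slightly different in the add-on options UI.)

doods:
  detectors:
    - name: youtube_ads
      type: tflite
      # model and labels copied to /share so the add-on can reach them
      modelFile: /share/doods/skip_ad_model.tflite
      labelFile: /share/doods/skip_ad_labels.txt
      labelsStartFromZero: true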
My plaintext labelFile looked as follows. Rows must start with an index, and the labels must be in the same order as in the model for objects to be labeled correctly. The labels are written out exactly as I had them in Label Studio when preparing the training set.
0 ad_notice
1 skip_button
Thanks to some nice work from snowzach supporting these models, all that’s left is to check that your model is working through the DOODS2 web portal. There you can also play around with shrinking the areas where objects can be detected, and use “show configured detectors” to see how the labels were loaded.
It’s workable, but actually pretty slow. The DOODS processing itself is quite fast (0.4-0.6 s), but everything else is slow (3-10 s). For now it’s enough that whoever is nearest the remote control is slower to respond.
The YouTube screengrab is large (1920x1080 px) and comes from a local-image generic camera I’ve configured. Unfortunately, its source, the entity_picture attribute of the Android TV media player, only updates every ~5 seconds, so I have to get lucky with the capture time to have any speed. I also haven’t had any luck getting the remote commands to the TV to go fast via any method I’ve found so far (I’ve tried generic and learn_sendevent ADB commands, and the Philips TV remote), so it also takes 2-3 seconds for Home Assistant to push the skip ads button.
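For anyone curious about the wiring, it’s roughly the sketch below; the entity IDs, the detector label and the tap coordinates are placeholders rather than my exact automations:

automation:
  - alias: "Scan the TV grab while YouTube is up"
    trigger:
      - platform: time_pattern
        seconds: "/5"
    condition:
      - condition: state
        entity_id: media_player.android_tv
        attribute: app_id
        state: "com.google.android.youtube"
    action:
      - service: image_processing.scan
        target:
          entity_id: image_processing.doods_tv_grab

  - alias: "Press Skip ad when the button is detected"
    trigger:
      - platform: state
        entity_id: image_processing.doods_tv_grab
    condition:
      - condition: template
        value_template: >
          {{ 'skip_button' in (state_attr('image_processing.doods_tv_grab', 'summary') or {}) }}
    action:
      - service: androidtv.adb_command
        target:
          entity_id: media_player.android_tv
        data:
          command: "input tap 1700 980"  # screen position of the Skip ad button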
Perhaps one day I’ll work on getting that faster. In particular the slow button presses are annoying because that’s a problem for a few other automations, but that’s not as fun to fix as over-the-top stuff. Next ridiculous home automation goal is a vision-driven watergun turret to discourage (humanely) the crows from knocking all our apples off the tree later in the year.
It’s not actually that hard to make your own trained model. Maybe give that a shot (see the steps in my post above)? I imagine it could be quite accurate when you use your actual camera view and actual bins. For my YouTube ad button model, trained on 100 or so stills, I haven’t noticed any false positives. I think the Google Colab notebook I linked to uses even less training data for its figurine detection example.
Hey @snowzach, I’m using DOODS with a Docker image. If I use the doods2 image, can I still use my Home Assistant / Node-RED configuration, or do I have to change something?