Camera/video human presence detection API

antongisli · May 26, 2020, 6:23pm

Hi everyone,

I’ve been working on a home project for a couple of years now and have built up an AI-based human detection system integrated with Home Assistant. I’ve seen a few posts from others about this, but not many.

One of the things I set up for myself was a simple API that I could post videos to and get back an answer - whether a human was detected or not. I also built a Raspberry Pi based adapter to use at home that gets motion events from my ONVIF cameras, creates small video clips when they detect motion, and uploads them to this API. I have this stuff running solid 24/7. The API then triggers “stuff” if a human is present, including pinging my Home Assistant to get notifications to my phone.
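To make the last step concrete, here is a rough sketch of how the "ping Home Assistant" part could look, not necessarily how my own setup does it. It uses Home Assistant's REST API (a POST to /api/services/<domain>/<service> with a long-lived access token). The notify/notify service, the message text and the function names are assumptions for illustration only.

```python
import json
import urllib.request


def build_notify_request(base_url, token, message, service="notify/notify"):
    """Build (but do not send) a Home Assistant REST service call.

    The service name "notify/notify" is an assumption; use whichever
    notify service exists in your own Home Assistant instance.
    """
    url = f"{base_url.rstrip('/')}/api/services/{service}"
    body = json.dumps({"message": message}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
    )


def handle_clip_result(human_detected, send, base_url, token):
    """Send a notification only when the detection API reported a human.

    `send` is any callable that takes the prepared request, for example
    urllib.request.urlopen. Returns True if a notification was sent.
    """
    if not human_detected:
        return False
    send(build_notify_request(base_url, token, "Human detected on camera"))
    return True
```

In real use you would pass urllib.request.urlopen as `send`, plus your own Home Assistant URL and token.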

One of the biggest pains was the API service itself - setting it up, maintaining it, etc. Anyway, I got the idea to perhaps build this API into a proper service for other people to use to analyse their own videos. I haven’t done it yet but I am thinking about it, and I wanted to get some thoughts and input from the good folks here. My idea is to make it do one thing, and one thing only, as well and as fast as possible: human detection.

Some things to think about/some ideas I would like to test with you:

What’s different about this compared to using the Google Video API?
Well, it would be much simpler. It doesn’t do anything except detect people and return true or false. It should also be faster: it exits on the first frame in which it sees a human, so you get a result sooner.
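The early-exit idea can be sketched roughly like this. It is only an illustration: `contains_human` stands in for whatever model does the per-frame detection, and both names are made up for this example.

```python
def detect_human(frames, contains_human):
    """Scan frames in order and stop at the first one containing a human.

    `frames` can be any iterable (including a lazy video decoder), and
    `contains_human` is a hypothetical per-frame detector returning a bool.
    Returns (detected, frames_examined).
    """
    examined = 0
    for frame in frames:
        examined += 1
        if contains_human(frame):
            # Early exit: the remaining frames are never decoded or analysed.
            return True, examined
    return False, examined
```

Because the loop returns as soon as one frame matches, a clip where someone walks in during the first second costs far less than one that has to be scanned to the end.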

Also, you could use this to filter videos to find those with people in them, and then push only those to the Google Video API for deeper analysis (it supports lots of far more advanced features). This way you could save your 1000 free minutes with the Google Video API for more advanced things.

Finally, I don’t really trust Google with my own videos - so my thinking here is that we would never keep any videos that were uploaded. They get uploaded and analysed, the result is returned, and then they are deleted. I would even go as far as using some of the more paranoid data center hosting locations.
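As a rough sketch of the "analyse, then delete" idea (nothing is built yet, so all names here are hypothetical), the server-side handling could wrap the analysis so the uploaded file is removed whether the analysis succeeds or fails:

```python
import os
import tempfile


def analyse_and_discard(video_bytes, analyse):
    """Write the upload to a temp file, analyse it, and always delete it.

    `analyse` is a hypothetical callable that takes a file path and
    returns True if a human was detected.
    """
    fd, path = tempfile.mkstemp(suffix=".mp4")
    try:
        with os.fdopen(fd, "wb") as handle:
            handle.write(video_bytes)
        return bool(analyse(path))
    finally:
        # Runs on success and on error, so no clip is left behind.
        if os.path.exists(path):
            os.remove(path)
```

Note that removing a file does not securely wipe the underlying storage, so a real service would need more than this to back up a strong privacy promise.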

What kind of solutions are you using today to filter out weather and lighting events that cause false positives? Have you built a custom setup, or are you just putting up with it?
Would a simple API like this be something that you could easily fit into your existing systems? Would you feel comfortable writing a small script (or copy-pasting examples) to upload the video to the API, wait for the result, and then trigger more actions yourself? Or would you want some kind of webhook function - or a mixture of both?
Would you be willing to try it out while I’m building it (if I build it), and give feedback?
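
To give an idea of what the "wait for the result" part of such a script might involve, here is a minimal polling sketch. The API does not exist yet, so the response fields "done" and "human" and the `fetch_status` callable are pure assumptions; with a webhook, this loop would go away and the service would call you instead.

```python
import time


def wait_for_result(fetch_status, timeout_s=30.0, interval_s=1.0,
                    sleep=time.sleep, clock=time.monotonic):
    """Poll until the (hypothetical) API reports a finished analysis.

    `fetch_status` is any callable returning a dict such as
    {"done": True, "human": False}; those field names are assumptions.
    Returns True if a human was detected, False otherwise.
    Raises TimeoutError if no result arrives within `timeout_s` seconds.
    """
    deadline = clock() + timeout_s
    while True:
        status = fetch_status()
        if status.get("done"):
            return bool(status.get("human"))
        if clock() >= deadline:
            raise TimeoutError("no result before timeout")
        sleep(interval_s)
```

Your own script would then trigger whatever actions you like when this returns True.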

In the interests of total honesty:

Yes - I would like the service to at a minimum pay for itself. We would have to rent or host servers to run it, keep it alive and healthy, etc.
No - I have not started building it yet. I want to see if anyone is actually interested in something like this first - otherwise, I will just keep using it for myself.
What I personally like about this is that I really dislike the grip that the mainstream camera vendors have on the market. They build the hardware, control the cameras, don’t let you run your own software on them, and force you to use their “human detection” features. So you can’t easily, without hacking around, get them to work with your own flow. I want to let anyone with any kind of camera get this particular feature in the easiest possible way.
In my day jobs I have worked on building services like this, and also in telecoms, so yes - I know how to build a reliable system that you could rely on to work.

I really appreciate any feedback, positive or negative. If you guys think this is a good idea, then we will build it. If you think it’s not, then we won’t, basically!