Hi! After switching from Nest to Frigate and HA, I tried to replicate the package delivery notification functionality of the older camera system. This month, with the release of the GPT-4 Vision API, I was able to take the experiment further and get a much higher level of contextual understanding.
Here is a prototype called AmbleGPT (GitHub - mhaowork/amblegpt: Video surveillance footage analyst powered by GPT-4 Vision), which I put together quickly over the past couple of weeks. Feel free to give it a shot!
Feedback and suggestions are welcome!
Summary
AmbleGPT is activated by a Frigate event via MQTT and analyzes the event clip using the OpenAI GPT-4 Vision API. It returns an easy-to-understand, context-rich summary. AmbleGPT then publishes this summary text in an MQTT message. The message can be received by the Home Assistant Frigate Notification Automation and sent to a user via iOS/Android notifications.
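For anyone curious how such a pipeline fits together, here is a minimal sketch of the flow described above. It is not AmbleGPT's actual code. It assumes Frigate's `frigate/events` payload shape (`type`, `after.id`, `after.camera`, `after.has_clip`) and its `/api/events/<id>/clip.mp4` endpoint. The Frigate address, the `SUMMARY_TOPIC` name, the choice to wait for `end` events, and the `summarize`/`publish` callables are all hypothetical stand-ins for the GPT-4 Vision call and the MQTT client.

```python
import json
from typing import Callable, Optional

# Assumed values for illustration only; adjust to your own setup.
FRIGATE_URL = "http://frigate.local:5000"   # hypothetical Frigate address
SUMMARY_TOPIC = "example/event_summary"     # hypothetical output topic


def clip_url(event_id: str) -> str:
    """Build the URL of the recorded clip for a Frigate event."""
    return f"{FRIGATE_URL}/api/events/{event_id}/clip.mp4"


def handle_event(
    payload: bytes,
    summarize: Callable[[str], str],
    publish: Callable[[str, str], None],
) -> Optional[dict]:
    """Process one message from the frigate/events topic.

    summarize: stand-in for the vision-model call; takes a clip URL
               and returns summary text.
    publish:   stand-in for an MQTT client's publish(topic, payload).
    Returns the published message, or None if the event was skipped.
    """
    event = json.loads(payload)
    after = event.get("after") or {}

    # Assumption: only act once the event has ended and a clip exists.
    if event.get("type") != "end" or not after.get("has_clip"):
        return None

    message = {
        "id": after["id"],
        "camera": after.get("camera"),
        "summary": summarize(clip_url(after["id"])),
    }
    publish(SUMMARY_TOPIC, json.dumps(message))
    return message
```

In a real setup, `publish` would be the publish method of your MQTT client, and `summarize` would extract frames from the clip and send them to the vision model. Passing both in as callables keeps the event-handling logic easy to test without a broker or an API key.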
Demo
More video examples:
| Video | GPT Summary |
|---|---|
| | Suspicious: A man appeared, approached a package left outside, picked it up, and walked away. This could indicate a potential package theft, as the person showed no signs of verifying address or ownership before taking the package. |
| | Suspicious: A person wearing a hoodie and a mask is seen approaching, standing by, and then walking away from the front door of a house. The person is carrying a bat and the scene takes place during nighttime, which is suggestive of suspicious or potentially criminal activity. |
| | A delivery man, approximately 35 years old, approached the door and placed a package down. He briefly interacted with a mobile device before leaving the scene. |
| | A female, approximately 30 years old and 1.65 meters tall, is seen approaching and standing at the front door, looking down momentarily and then preparing to interact with the person who might open the door. |
| | A postal worker (in a blue uniform) was seen exiting a delivery vehicle and walking off-screen, presumably to deliver mail or a package. |
| | A male and a female, appearing to be in their 30s, are seen crossing the street from the left to the right. They walk side by side and are visible for a total of 18 seconds. |
More details are in the GitHub repo: mhaowork/amblegpt (Video surveillance footage analyst powered by GPT-4 Vision).