AI Generate Data Task Doorbell Video Analysis and Notifications

:camera_flash: Front Door AI Vision Automation (with Snapshots + Fun Commentary)

Hi everyone,

Tonight is my first time posting here; my first attempt may have gone to the wrong place.

I built an automation that takes multiple snapshots when someone is detected at the front door, sends instant notifications, and then uses AI to provide a short, playful “reality-TV style” commentary about what’s happening (arriving, leaving, pausing, etc).

It works with any camera that can do snapshots and a detection sensor (person or motion). Notifications include deep links to HA’s camera dashboard, and you can also have the AI commentary spoken on your smart speakers.

:hammer_and_wrench: Requirements

Before you paste the YAML, here’s what you’ll need:

  1. A Camera in Home Assistant
  • Must support camera.snapshot.
  • Example: Reolink, Tapo, UniFi Protect, Doorbird, etc.
  2. A Binary Sensor for Person/Motion Detection
  • Example: binary_sensor.front_door_person_detection.
  • Some cameras provide person detection. If not, you can use generic motion detection.
  3. AI Task Integration
  • You need the new AI Task integration in Home Assistant 2025.
  • Works with OpenAI (ai_task.openai_ai_task), Google Gemini (ai_task.google_ai_task), or other supported providers.
  4. Notification Service
  • Example: notify.mobile_app_YOUR_PHONE.
  • Needed if you want push notifications with snapshots.
  5. (Optional) TTS Speakers
  • Example: media_player.YOUR_SPEAKER.
  • Any media_player entity that supports TTS will work.

:rocket: Setup Steps

  1. Copy the YAML code from this post into your automations.
  • In HA: Settings → Automations & Scenes → Add Automation → Edit in YAML → Paste.
  2. Adjust entity IDs
  • Replace camera.YOUR_FRONT_DOOR_CAMERA with your camera.
  • Replace binary_sensor.YOUR_PERSON_OR_MOTION_SENSOR with your detection sensor.
  • Replace notify.mobile_app_YOUR_PHONE with your notification service(s).
  • Replace media_player.YOUR_SPEAKER with your speaker(s).
  3. Check snapshot paths
  • This automation saves images to /media/snapshots/ and a fourth snapshot to /config/www/snapshots/.
  • Make sure both folders exist.
  4. Update AI Task entity if needed
  • Change ai_task.YOUR_AI_TASK_ENTITY to your actual AI Task entity (for example ai_task.openai_ai_task).
  5. Save and test
  • Walk in front of your camera or trigger your person sensor.

  • You should get:

    • Immediate “Person detected” spoken on speakers.
    • Instant notification with snapshot + deep link.
    • A second update with the AI commentary after all frames are analyzed.
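One gotcha with step 3: camera.snapshot only writes to paths Home Assistant allows. If the snapshot steps fail with a "not allowed" error in the logs, you may need to whitelist the two snapshot directories in configuration.yaml (paths below match this automation's defaults):

```yaml
# configuration.yaml: allow camera.snapshot to write into these folders
homeassistant:
  allowlist_external_dirs:
    - /media/snapshots
    - /config/www/snapshots
```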

:sparkles: Features

  • Fast initial alert (first snapshot + instant push).
  • Full analysis (multiple frames captured and sent to AI).
  • Fun commentary (reality-TV style, always 1–2 sentences).
  • iOS deep links to jump directly into HA’s Cameras dashboard.
  • TTS announcements on speakers of your choice.
  • 40s cooldown so you don’t get spammed.

alias: Front Door AI Vision (Reality-TV Style)
mode: single
max_exceeded: silent

triggers:
  - platform: state
    entity_id: binary_sensor.YOUR_PERSON_OR_MOTION_SENSOR
    to: "on"

variables:
  snapshot_dir: /media/snapshots
  snapshot_www: /config/www/snapshots
  camera_entity: camera.YOUR_FRONT_DOOR_CAMERA
  notify_target: notify.mobile_app_YOUR_PHONE
  speaker_target: media_player.YOUR_SPEAKER
  ai_entity: ai_task.YOUR_AI_TASK_ENTITY

actions:
  - parallel:
      # 1. Grab snapshots
      - sequence:
          - service: camera.snapshot
            data:
              entity_id: "{{ camera_entity }}"
              filename: "{{ snapshot_dir }}/frontdoor_{{ now().timestamp() }}.jpg"
          - delay: "00:00:02"
          - service: camera.snapshot
            data:
              entity_id: "{{ camera_entity }}"
              filename: "{{ snapshot_dir }}/frontdoor_{{ now().timestamp() }}_2.jpg"
          - delay: "00:00:02"
          - service: camera.snapshot
            data:
              entity_id: "{{ camera_entity }}"
              filename: "{{ snapshot_dir }}/frontdoor_{{ now().timestamp() }}_3.jpg"
          - service: camera.snapshot
            data:
              entity_id: "{{ camera_entity }}"
              filename: "{{ snapshot_www }}/frontdoor_latest.jpg"

      # 2. Send immediate notification
      - sequence:
          - service: "{{ notify_target }}"
            data:
              title: "Front Door"
              message: "Person detected at the front door."
              data:
                image: "/local/snapshots/frontdoor_latest.jpg"
                url: "/lovelace/cameras"

      # 3. AI analysis + commentary
      - sequence:
          - service: ai_task.generate_data
            response_variable: result
            data:
              entity_id: "{{ ai_entity }}"
              task_name: Front door camera analysis
              instructions: >
                You are a live reality-TV commentator narrating the front porch.
                Be playful and dramatic, but stay factual: only describe what is visible.
                - Say if the person is arriving, leaving, pausing, or turning (if clear).
                - Mention a vehicle ONLY if clearly interacting with it (opening, entering, exiting).
                - If direction/action is unclear, say “unclear” instead of guessing.
                - Output exactly 1–2 short sentences, never a paragraph.
                - Example: “And there he goes—quick pause, turn right—no car cameo tonight.”
              structure:
                analysis:
                  selector:
                    text: {}
              attachments: "{{ ai_attachments }}"

          - variables:
              final_text: >
                {{ result.data.analysis
                   | default(result.response_text)
                   | default(result.response)
                   | default('No visible activity detected.')
                   | replace('*','')
                   | trim }}

          - service: "{{ notify_target }}"
            data:
              title: "Front Door AI"
              message: "{{ final_text }}"

          # tts.speak targets a TTS engine entity; the speaker goes in media_player_entity_id
          - service: tts.speak
            target:
              entity_id: tts.YOUR_TTS_ENTITY
            data:
              media_player_entity_id: "{{ speaker_target }}"
              cache: false
              message: "{{ final_text }}"

          # Keep the automation busy for 40s so repeat detections are ignored (mode: single)
          - delay: "00:00:40"


Hi, thanks for sharing this. I’ve been trying to get something almost identical working myself, but I’m struggling with how to reference the snapshot images (which for me exist in /homeassistant/www/snapshots).

Where are you defining ai_attachments, and what are you setting it to?

Thanks

Try something like this. You can either use a variable, as in the example, or your file name instead of snap1, etc.

ai_attachments: |
    [
      {"media_content_id":"media-source://media_source/local/snapshots/{{ snap1_file }}","media_content_type":"image/jpeg"},
      {"media_content_id":"media-source://media_source/local/snapshots/{{ snap2_file }}","media_content_type":"image/jpeg"},
      {"media_content_id":"media-source://media_source/local/snapshots/{{ snap3_file }}","media_content_type":"image/jpeg"}
    ]
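For reference, here is a sketch of how those snapN_file variables could be defined and reused (variable names and timestamp format are illustrative, not from the original post; script variables are rendered in order, so later ones can reference earlier ones):

```yaml
variables:
  # One timestamp string shared by all three frames (illustrative naming)
  snap_stamp: "{{ now().strftime('%Y%m%d_%H%M%S') }}"
  snap1_file: "frontdoor_{{ snap_stamp }}_1.jpg"
  snap2_file: "frontdoor_{{ snap_stamp }}_2.jpg"
  snap3_file: "frontdoor_{{ snap_stamp }}_3.jpg"
actions:
  - service: camera.snapshot
    data:
      entity_id: camera.YOUR_FRONT_DOOR_CAMERA
      # /media maps to media-source://media_source/local/ in the attachments example
      filename: "/media/snapshots/{{ snap1_file }}"
```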

Thanks, are you using Nabu Casa?

I might be wrong here, but I suspect ai_task.generate_data doesn’t encode the attachments and submit them with the request; it simply passes a URL.

So the URL that is passed is media-source://media_source/..., which isn’t understood by OpenAI.

The only way I’ve been able to get this to work is to call the camera snapshot directly in the AI Task using…

 - media_content_id: media-source://camera/camera.front_door_onvif_mainstream

Which is OK, but then I need a second snapshot to save the image locally. For me there’s a 3 second delay between the two (camera latency plus writing the file), so the local image isn’t exactly the same as what was sent to OpenAI.
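For anyone else trying this, a minimal sketch of that direct-camera approach inside the AI Task call (the camera and AI entity IDs here are examples, not from the original automation):

```yaml
# Attach a live camera frame directly instead of a saved snapshot file
- service: ai_task.generate_data
  response_variable: result
  data:
    entity_id: ai_task.openai_ai_task
    task_name: Front door camera analysis
    instructions: Describe what is visible at the front door in one short sentence.
    attachments:
      - media_content_id: media-source://camera/camera.front_door_onvif_mainstream
        media_content_type: image/jpeg
```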


I solved that by taking the first snapshot and copying it to the media folder, so I can send a clickable notification with a photo that opens the camera dashboard I set up. That dashboard has the live video and a list of previous snapshots. The AI Task then analyzes all the snapshots 30 seconds later and updates the notification on the iPhone automatically.
I do use Nabu Casa.


I feel like I’m so close, but so far away.

  1. The image I get in the first notification is just a generated placeholder saying the image wasn’t found. I’ve verified that the folders you said to create exist and that the images are being created. I noticed that line 33 has a path of /local, so I changed that, but that didn’t work either. The notification also isn’t taking me to cameras specifically. Is that because I have to create a specific dashboard? I don’t understand the /lovelace URL, but I assume it’s correct; I figured all that out from the docs, at least.
  2. I never get the second notification. Looking at traces, I get this error for that step: Error: expected a dictionary @ data['attachments'][0]

Edit: I’m noticing ai_attachments doesn’t seem to be defined anywhere, which would be the source of the error.

After looking at documentation, this is the best I could find for that attachments value, and it works:

attachments:
  media_content_id:  media-source://camera/camera.g4_doorbell_pro_poe_high_resolution_channel
  media_content_type: image/jpeg

If you want another automation to look at, you can import the blueprint that was created as well.

Same issue here: I can’t pass attachments from a folder to the AI (snapshot_dir: /media/snapshots, snapshot_www: /config/www/snapshots).
Attaching the camera stream directly is no good for me, since stream detection is sometimes hit and miss due to the lag between the video and the motion detection. If there’s no solution, I’ll need to explore other options :frowning:

I don’t follow how this code could work. You have 3 actions running in parallel, and action 1 has 6 sequential steps; the first is to take a snapshot.
At the same time, action 2 is sending frontdoor_latest.jpg to your phone. But it will be another 4–6 seconds before frontdoor_latest.jpg is even updated, since there is a 2-second pause, then another snapshot, then another 2-second pause, and only then do you actually take the frontdoor_latest snapshot.

So initially you are sending the previous frontdoor_latest.jpg to the phone.

Then later you send the AI description, but that will be describing the new latest images, not the previous one… so you will see the UPS guy while the AI describes the FedEx guy.
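One way to avoid that race (a sketch, not the original code): capture frontdoor_latest.jpg first, notify with it, and only then run the timed burst for the AI. Entity IDs are the same placeholders as in the original post.

```yaml
actions:
  # 1. Fresh hero frame first, so the push shows the current visitor
  - service: camera.snapshot
    data:
      entity_id: camera.YOUR_FRONT_DOOR_CAMERA
      filename: /config/www/snapshots/frontdoor_latest.jpg
  # 2. Notify immediately with that frame
  - service: notify.mobile_app_YOUR_PHONE
    data:
      title: "Front Door"
      message: "Person detected at the front door."
      data:
        image: /local/snapshots/frontdoor_latest.jpg
        url: /lovelace/cameras
  # 3. Only now capture the timed burst that the AI will describe
  - service: camera.snapshot
    data:
      entity_id: camera.YOUR_FRONT_DOOR_CAMERA
      filename: "/media/snapshots/frontdoor_{{ now().timestamp() | int }}_1.jpg"
```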