Hi there, from now on I can see who is at the front door on my Google Home. It goes via an intermediate Docker image which, on an automation trigger, streams a generated video to the Google Home. Once the URL is called, the video is generated at runtime: the service fetches the latest snapshot image from the camera, combines it with a ding-dong sound, creates an MP4 from them using ffmpeg, and sends that file back as the response to the calling service.
Basically it is a ‘generated’ 10-second MP4 of the latest snapshot from the cam with the ding-dong MP3 underneath it. Nothing spectacular, but it helps with the Wife Acceptance Factor.
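For anyone curious about the gist of it, here is a minimal sketch of the approach (not my exact code): a tiny Flask endpoint that fetches the snapshot, muxes it with the ding-dong MP3 using ffmpeg, and returns the 10-second MP4. The snapshot URL, file paths and port are placeholders you would swap for your own setup.

```python
# Minimal sketch, not the actual image source. Assumes ffmpeg is installed in
# the container and that CAMERA_SNAPSHOT_URL / DING_DONG_MP3 point at your own
# camera and sound file.
import subprocess
import tempfile
import urllib.request

from flask import Flask, send_file

app = Flask(__name__)

CAMERA_SNAPSHOT_URL = "http://camera.local/snapshot.jpg"  # placeholder: your cam's snapshot URL
DING_DONG_MP3 = "/app/dingdong.mp3"                       # placeholder: bundled ding-dong sound


@app.route("/doorbell.mp4")
def doorbell():
    # 1. Fetch the latest snapshot from the camera
    snapshot = tempfile.NamedTemporaryFile(suffix=".jpg", delete=False)
    with urllib.request.urlopen(CAMERA_SNAPSHOT_URL) as resp:
        snapshot.write(resp.read())
    snapshot.close()

    # 2. Build a 10-second MP4: the still image looped as video, the MP3 as audio
    output = tempfile.NamedTemporaryFile(suffix=".mp4", delete=False)
    output.close()
    subprocess.run(
        [
            "ffmpeg", "-y",
            "-loop", "1", "-i", snapshot.name,   # video track: the snapshot, looped
            "-i", DING_DONG_MP3,                 # audio track: the ding-dong sound
            "-c:v", "libx264", "-tune", "stillimage",
            "-c:a", "aac", "-pix_fmt", "yuv420p",
            "-t", "10",                          # clip length: 10 seconds
            output.name,
        ],
        check=True,
    )

    # 3. Return the generated clip to whatever called the URL (e.g. the cast service)
    return send_file(output.name, mimetype="video/mp4")


if __name__ == "__main__":
    app.run(host="0.0.0.0", port=8080)
```

The automation then just points the Google Home at that URL whenever the doorbell is pressed.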
Please reply if you want to know more, so I can upload my Docker image (+ source) to GitHub and give a more thorough rundown of how I achieved it!