Actionable notifications without voice assistant

I've seen a few people demo actionable notifications using voice assistants like Alexa or Google, but I am hoping to avoid using either of those. I already have cameras that provide RTSP streams in quite a few locations around my house, and I would like a way to do some small actionable notifications using the speakers and microphones built into these existing cameras.

At a high level, I am looking for something like this:

  1. A trigger event occurs (light left on, user enters room, etc.)
  2. Play a TTS prompt through the camera's speaker
  3. Start waiting for a reply (yes, no)
  4. Process the reply
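The four steps above can be sketched as one small function. This is a rough sketch, not an existing Home Assistant feature: `run_prompt`, `speak`, `listen`, and `act` are hypothetical names. You would wire them to your own TTS, speech-to-text, and light-control code.

```python
from typing import Callable, Optional

def run_prompt(
    question: str,
    speak: Callable[[str], None],         # hypothetical: plays TTS on the camera speaker
    listen: Callable[[], Optional[str]],  # hypothetical: returns the transcribed reply, or None
    act: Callable[[], None],              # hypothetical: e.g. turns the room's lights on
) -> bool:
    """Ask a yes/no question and run `act` only on a clear 'yes'."""
    # Step 2: play the question through the camera's speaker.
    speak(question)
    # Step 3: wait for a reply picked up by the camera's microphone.
    reply = listen()
    # Step 4: process the reply; anything other than "yes" is treated as "no".
    if reply is not None and reply.strip().lower() == "yes":
        act()
        return True
    return False

if __name__ == "__main__":
    # Step 1 (the trigger, e.g. person detected) would call run_prompt.
    # Here every callable is faked so the flow can be tried without hardware.
    lights = []
    answered_yes = run_prompt(
        "Turn on the lights?",
        speak=print,
        listen=lambda: "Yes",
        act=lambda: lights.append("on"),
    )
    print(answered_yes, lights)
```

Keeping the hardware-specific parts behind plain callables means the same flow works whether the audio comes from a camera, Node-RED, or anything else.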

So I would want to be able to walk into a room, have it detect me as a person (which it already does), and then ask me, for instance, whether the lights should turn on. If I say yes, it should turn the lights on.

It sounds like you are trying to turn your camera into Siri or Alexa. There is nothing in the background to interpret/process your responses. How would it know what you were saying?

Why? I have an Alexa device in almost every room of my house (and in my car) and if I want a light on, I just say “Alexa, turn on the kitchen light”. I don’t find that to be a problem.

On the other hand, how do you trigger on a person detected event? I have a few Wyze Cams around the house running RTSP and a couple outside, and sometimes I would like for presence to turn the lights on or off.

Well, I am not looking to be able to say “Hey Alexa” or anything like that. I don't want any hotword trigger; I am just looking for basic actionable replies using the microphone. I already have simple notifications being spoken on the speakers. I would just like a way to then process an RTSP stream's audio and act on a yes/no to trigger something else.
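One way to get at the camera's audio is to let ffmpeg pull the RTSP stream, drop the video, and write raw mono 16 kHz PCM (a format many speech-to-text engines accept) to stdout. This is a rough sketch, assuming ffmpeg is installed and the camera actually sends an audio track in its RTSP stream. `build_audio_cmd` and `record_reply` are hypothetical helper names, and the URL you pass in is whatever your camera exposes.

```python
import subprocess
from typing import List

def build_audio_cmd(rtsp_url: str, seconds: float) -> List[str]:
    """Build an ffmpeg command that records `seconds` of audio from an RTSP stream."""
    return [
        "ffmpeg",
        "-loglevel", "error",
        "-rtsp_transport", "tcp",  # input option: pull RTSP over TCP instead of UDP
        "-i", rtsp_url,
        "-t", str(seconds),        # stop after this many seconds
        "-vn",                     # discard the video stream
        "-ac", "1",                # downmix to mono
        "-ar", "16000",            # resample to 16 kHz
        "-f", "s16le",             # raw signed 16-bit little-endian PCM
        "-",                       # write to stdout
    ]

def record_reply(rtsp_url: str, seconds: float = 4.0) -> bytes:
    """Return raw PCM audio captured from the camera (requires ffmpeg on PATH)."""
    result = subprocess.run(
        build_audio_cmd(rtsp_url, seconds), capture_output=True, check=True
    )
    return result.stdout
```

At 16 kHz, 16-bit mono, a 4-second window is roughly 128,000 bytes of audio that could then be handed to whatever speech-to-text engine you pick.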

I think that speech recognition is going to be a bigger challenge. But, keep us posted.

I have UniFi devices already detecting people, so triggering something isn't a problem. I could already set up a simple automation to turn on the lights in a room when someone is detected. I would just like a way to ask first rather than always do it.

For instance, I don't want to walk past a room and have the lights turn on just because it saw a person.
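One way to handle the walk-by case is to treat silence as "no": after asking, wait a fixed number of seconds for a reply and do nothing if none arrives. This is a minimal sketch, assuming some listener (hypothetical here) puts transcribed replies onto a `queue.Queue`. `wait_for_reply` and the 5-second default are illustrative choices, not anything Home Assistant provides.

```python
import queue

def wait_for_reply(replies: "queue.Queue[str]", timeout_s: float = 5.0) -> bool:
    """Return True only if a 'yes' arrives before the timeout; silence counts as 'no'."""
    try:
        reply = replies.get(timeout=timeout_s)
    except queue.Empty:
        # Nobody answered, e.g. someone just walked past the doorway.
        return False
    return reply.strip().lower() == "yes"
```

Defaulting to "no" means a false person detection costs nothing more than a spoken question.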

Right, but how is it going to understand what you are saying?

Well, Home Assistant claims to have speech-to-text: Speech-to-Text (STT) - Home Assistant
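Whichever speech-to-text engine ends up doing the transcription, what comes back is free text ("yeah, go ahead", "nope"), so a small step is still needed to turn it into yes/no. This is a minimal keyword-matching sketch. The word lists and the name `classify_reply` are assumptions you would tune to your household.

```python
import re
from typing import Optional

YES_WORDS = {"yes", "yeah", "yep", "sure", "ok", "okay"}  # assumed vocabulary
NO_WORDS = {"no", "nope", "nah", "don't", "dont", "stop"}  # assumed vocabulary

def classify_reply(transcript: str) -> Optional[bool]:
    """Map a speech-to-text transcript to True (yes), False (no), or None (unclear)."""
    words = set(re.findall(r"[a-z']+", transcript.lower()))
    said_yes = bool(words & YES_WORDS)
    said_no = bool(words & NO_WORDS)
    if said_yes and not said_no:
        return True
    if said_no and not said_yes:
        return False
    # Both or neither: ambiguous, so let the caller decide (e.g. do nothing).
    return None
```

Returning `None` for unclear answers leaves room to re-ask once or simply fall back to doing nothing.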

I was also looking at using Node-RED with IBM Watson or something similar. I am totally open to using other software for that; I just don't want to start putting Alexas or other devices in every room when the cameras already have speakers and microphones. It just seems kind of pointless to me.

I think you are looking for capabilities that are beyond HA. Maybe you should look into iOS/Android actionable notifications. You’ll need to press a button on your phone rather than just speak, but you’ll be able to accomplish your goal.
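For reference, the phone-notification route can be driven through Home Assistant's REST API by calling a `notify.mobile_app_*` service with an `actions` list. This is a sketch that only builds the request. The base URL, long-lived access token, device name, action IDs, and the helper name `build_actionable_notification` are all placeholders/assumptions.

```python
import json
import urllib.request

def build_actionable_notification(
    base_url: str, token: str, device: str, message: str
) -> urllib.request.Request:
    """Build (but do not send) a Home Assistant REST call for an actionable notification."""
    payload = {
        "message": message,
        "data": {
            "actions": [
                {"action": "LIGHTS_ON", "title": "Yes"},  # hypothetical action IDs
                {"action": "LIGHTS_OFF", "title": "No"},
            ]
        },
    }
    return urllib.request.Request(
        url=f"{base_url}/api/services/notify/mobile_app_{device}",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",  # long-lived access token
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

Sending it would be `urllib.request.urlopen(req)`. Tapping a button on the phone fires a `mobile_app_notification_action` event that an automation can listen for to switch the lights.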

Yeah, I looked at that, and if I can't figure out a way to make this work, that is likely what I will do. It's just not ideal, since it requires notifications going to my phone. So when someone who doesn't have the app installed walks into the room, I'll be getting the notifications and they will have to turn the lights on manually.