Automate listening by assist

I’ve been looking at options to make voice assist start listening automatically, without a wake word. As an example, I walk into a room (motion detected), the ESP32 starts listening, and I can just say “turn on the light” and it is processed. Also, I would like to be able to have Assist ask me a question and wait for a response. Is this currently possible? This is for situations where I don’t always want the light to turn on, say when passing through the bedroom to the ensuite, or in rooms where other people don’t want automated lights.

I’d love to be able to make this work, particularly leveraging ESPresense so HA will only talk to me (I know others in the house will not want this). It would be great for a morning briefing when I first walk into a specific room, but only if I answer “yes”. I’m sure I’m not the only one wanting to do this, and someone will have worked it out or suggested it to the HA team somewhere.

It’s possible…

For my first attempt: when the sun is below the horizon, TTS says “The sun is below the horizon, do you want to close the shutters?”, I say “yes, I want”, and it works.

I added a switch to my ESP32 config to force listening without the wake word:

  switch:
    - platform: template
      name: Listen
      id: Listen
      optimistic: true
      # Runs when the switch is turned on: skip the wake word and listen once
      turn_on_action:
        - switch.turn_off: use_wake_word
        - delay: 1s
        - voice_assistant.start_continuous:
        - switch.turn_on: use_wake_word

Then an automation “voice_close_shutter” with “yes I want” as the sentence trigger.
A second automation “assist_close_shutter_proposal” has the sun going below_horizon as its trigger, and these actions:

  • TTS says “Do you want to close the shutters?”
  • automation.turn_on “voice_close_shutter”
  • turn on Listen (the switch)
  • wait 5 seconds (for the response)
  • turn off Listen
  • automation.turn_off “voice_close_shutter”
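
As a rough sketch, the two automations described above might look something like this in YAML. The entity IDs (`switch.listen`, `cover.shutters`, `media_player.living_room`, `tts.piper`) are placeholders that do not come from the post, and the aliases are assumed to yield the entity `automation.voice_close_shutter`; adjust all of them to your own setup.

```yaml
automation:
  # Handles the spoken answer. Starts disabled so "yes I want" only
  # matches while a proposal is pending.
  - alias: voice_close_shutter
    initial_state: false
    trigger:
      - platform: conversation
        command:
          - "yes I want"
    action:
      - service: cover.close_cover
        target:
          entity_id: cover.shutters            # placeholder

  # Asks the question at sunset, then opens a short listening window.
  - alias: assist_close_shutter_proposal
    trigger:
      - platform: state
        entity_id: sun.sun
        to: "below_horizon"
    action:
      - service: tts.speak
        target:
          entity_id: tts.piper                 # placeholder TTS entity
        data:
          media_player_entity_id: media_player.living_room   # placeholder
          message: "Do you want to close the shutters?"
      - service: automation.turn_on
        target:
          entity_id: automation.voice_close_shutter
      - service: switch.turn_on
        target:
          entity_id: switch.listen             # the ESPHome "Listen" switch
      - delay: "00:00:05"                      # time allowed for the response
      - service: switch.turn_off
        target:
          entity_id: switch.listen
      - service: automation.turn_off
        target:
          entity_id: automation.voice_close_shutter
```

Because the answer automation is only enabled during the 5-second window, the same phrase can be reused by other question/answer pairs without them interfering with each other.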

By turning the automation on and off, I can reuse the same response “yes I want” for other automations.

It’s just for testing, but it works fine. I’m working on a better solution.

Thanks Will. That is a great solution. I will give it a try.

I came up with an all-in-one script to do something like this…

  • TTS asks a question
  • HA waits until TTS is done playing
  • ESPHome stops wake word detection
  • ESPHome triggers listening to response
  • STT waits for yes/no/timeout
    • if yes, run given script
  • ESPHome reverts to wake word detection

It is limited to yes/no/nothing, but could be duplicated for other needs.
I just started using it, so there might still be some bugs, and there are definitely some quirks (for example, the media_player entity is inconsistent in its playing > idle state changes, which delays the listening trigger; use an LED if you want to know exactly when you can reply). You will want to change the delays/timeouts according to your configuration (Whisper model used, overall STT performance, etc.).
Let me know if you find ways to improve upon it.
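
For readers who want a starting point, the steps listed above could be sketched roughly as follows. This is not the script referred to in this post, only one possible way to arrange the same flow. It assumes the ESPHome “Listen” switch from earlier in the thread is exposed as `switch.listen`, and it introduces a hypothetical helper `input_boolean.assist_answer_yes`, a hypothetical `tts.piper` entity, and hypothetical script fields (`question`, `player`, `on_yes`).

```yaml
input_boolean:
  assist_answer_yes:            # hypothetical helper that records a "yes"
    name: Assist answered yes

automation:
  # Records a spoken "yes" in the helper.
  - alias: assist_answer_yes
    trigger:
      - platform: conversation
        command:
          - "yes"
    action:
      - service: input_boolean.turn_on
        target:
          entity_id: input_boolean.assist_answer_yes

script:
  ask_yes_no:
    fields:
      question:
        description: Text to speak
      player:
        description: media_player entity used for TTS output
      on_yes:
        description: Script entity to run if the answer is yes
    sequence:
      # Clear any stale answer before asking.
      - service: input_boolean.turn_off
        target:
          entity_id: input_boolean.assist_answer_yes
      - service: tts.speak
        target:
          entity_id: tts.piper  # hypothetical TTS entity
        data:
          media_player_entity_id: "{{ player }}"
          message: "{{ question }}"
      # Wait until TTS has started and then finished playing.
      - wait_template: "{{ is_state(player, 'playing') }}"
        timeout: "00:00:30"
        continue_on_timeout: true
      - wait_template: "{{ is_state(player, 'idle') }}"
        timeout: "00:00:30"
        continue_on_timeout: true
      # Start listening without the wake word.
      - service: switch.turn_on
        target:
          entity_id: switch.listen
      # Wait for "yes"; "no" or silence simply runs into the timeout.
      - wait_template: "{{ is_state('input_boolean.assist_answer_yes', 'on') }}"
        timeout: "00:00:10"
        continue_on_timeout: true
      - service: switch.turn_off
        target:
          entity_id: switch.listen
      - if:
          - condition: state
            entity_id: input_boolean.assist_answer_yes
            state: "on"
        then:
          - service: script.turn_on
            target:
              entity_id: "{{ on_yes }}"
```

The timeouts here are guesses and would need tuning in the same way as described above.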

I have the same problem with the variable TTS delay and have not found a workaround :face_with_raised_eyebrow:
A similar approach works with Assist in the Companion app, using notify instead of the ESP32:

  service: notify.mobile_app_xxx
  data:
    message: "command_activity"
    data:
      intent_package_name: ""
      intent_action: "android.intent.action.ASSIST"

Actually, it’s not the TTS that’s the problem in my case, it’s the inconsistent “return to idle” of the media_player after it’s done playing. Sometimes, for no apparent reason, it takes a few extra seconds before it changes back to idle, which delays the whole thing.

Otherwise, this part takes care of “waiting until TTS is done” if you place it right after the TTS call (and use the same target in the TTS call):

  - wait_template: "{{ is_state(target, 'playing') }}"
    continue_on_timeout: true
    timeout: "00:00:30"
  - wait_template: "{{ is_state(target, 'idle') }}"
    continue_on_timeout: true
    timeout: "00:00:30"

Of course, this only works well if the media_player in question is only used for TTS output. If you are playing music at the same time on the same media_player, it will be forced to wait for the timeout instead (you could pause the music before asking the question, though).
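
A minimal sketch of that “pause first” idea, meant to sit before the TTS call in the same script and reusing the `target` variable from the snippet above. Whether the player returns to `idle` or to `paused` after the TTS clip depends on the media_player integration, so the second wait_template above may need to accept both states; treat this as an untested assumption.

```yaml
  # If something is already playing, pause it before asking the question.
  - if:
      - condition: template
        value_template: "{{ is_state(target, 'playing') }}"
    then:
      - service: media_player.media_pause
        target:
          entity_id: "{{ target }}"
      # Give the player a moment to leave the 'playing' state.
      - wait_template: "{{ not is_state(target, 'playing') }}"
        timeout: "00:00:05"
        continue_on_timeout: true
```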

I managed to partially do it using the Stream Assist integration from AlexxIT, which allows you to use cameras to talk to Assist.
With that integration we have the option to start the voice pipeline at whatever stage we want, so I simply created a script that starts it at speech-to-text; this way it skips the wake word. When I activate the script, it starts listening immediately.
What I have not yet managed to do is have it ask a question.

This is how I run the service:

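  # Service name and the "assist:" parent key are assumed here;
  # check them against the Stream Assist README.
  service: stream_assist.run
  data:
    camera_entity_id: camera.reolink_camera_sub
    player_entity_id: media_player.tablet
    assist:
      start_stage: stt   # start at speech-to-text, skipping the wake word
      end_stage: tts
    stt_start_media: media-source://media_source/local/beep.mp3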