Continuous conversation workaround with code

I’ve been trying to work on and improve a possible workaround for continuing conversations. The current design does not make ANY sense: an open reply conversation is tied ONLY to whether the AI responds with a question. That isn’t how natural language works, and it goes against uses like D&D, roleplay, or just regularly speaking with the AI.

I’ve been using a toggle that swaps the AI into a new session each time it replies without a question. This allows an easy way to vocally chain commands, but you ALWAYS start off with fresh context, so it’s like speaking to someone with Alzheimer’s. How do we expose the existing context for use here? What would be better is a built-in long conversation mode, but if that’s a problem, merely allowing a direct handoff from the old conversation into the new one would be best.

It is not ideal to keep saying “hey jarvis” just because the AI didn’t reply with a direct question when having a discussion.

Please see the code below:

- alias: Voice assistant follow-up
  description: Keeps the conversation going if Follow Up Mode is enabled
  trigger:
  - platform: state
    entity_id: assist_satellite.home_assistant_voice
    from: responding
    to: idle
  - platform: state
    entity_id: assist_satellite.home_assistant_voice
    from: responding
    to: idle
  condition:
  - condition: state
    entity_id: input_boolean.follow_up_mode
    state: 'on'
  action:
  - service: assist_satellite.start_conversation
    data:
      start_message: ''
      preannounce: false
      continue_conversation: true  # <- I need something like this.
    target:
      entity_id: '{{ trigger.entity_id }}'
  mode: single

Hello RC,

Thanks for coming here and asking a question.
Would you be so kind as to adjust the format of your code so that we can read it properly and check the YAML spacing, etc.? Editing your original post is the preferred way. It is very hard for us to tell what is what when the text formatter jumbles everything like that.
You can use the </> button like this… How to format your code in forum posts
OR… Here is an example of how to fix it from the site FAQ Page.
How to help us help you - or How to ask a good question.


It’s not possible to fully implement this with automation, as it would require constantly managing the mode switch—otherwise, the device will enter an infinite loop.

You need to modify the configuration file in ESPHome, as global flags are required for operation, and it’s also necessary to implement an exit from the dialog loop (e.g., if VAD isn’t activated within n seconds). However, I wouldn’t recommend altering the VPE code; for experiments, it’s better to use an ESP32S3 board.


Here are further details, along with proper formatting and some added thoughts after watching the firmware logs across multiple tests (and also diving into the firmware itself, with no luck).

Continuous Conversation Mode for Voice PE: Firmware Limitations

Current Workaround (Limited Success)

I have a “quasi” continued conversation mode with this automation:

- alias: Voice assistant follow-up
  description: Keeps the conversation going if Follow Up Mode is enabled
  trigger:
  - platform: state
    entity_id: assist_satellite.home_assistant_voice_yourid
    from: responding
    to: idle
  - platform: state
    entity_id: assist_satellite.home_assistant_voice_yourid
    from: responding
    to: idle
  condition:
  - condition: state
    entity_id: input_boolean.follow_up_mode
    state: 'on'
  action:
  - service: assist_satellite.start_conversation
    data:
      start_message: ''
      preannounce: false
    target:
      entity_id: '{{ trigger.entity_id }}'
  mode: single
  id: 702e8dd4a3c748e8a33f779162978101
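
For completeness, the follow-up toggle used in the condition is a standard Home Assistant helper. It can be created in the UI (Settings > Devices & Services > Helpers) or declared in configuration.yaml; the name and icon below are just suggestions:

input_boolean:
  follow_up_mode:
    name: Follow Up Mode
    icon: mdi:chat-processing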

And these automations to toggle follow-up mode by voice:

- alias: 'Voice: Stop Follow Up Mode'
  trigger:
  - platform: conversation
    command:
    - stop follow up
    - stop follow up mode
    - stop listening
    - stop talking
    - end conversation
    - conversation over
    - disregard
    - nevermind
    - cancel conversation
    - enough
  action:
  - service: input_boolean.turn_off
    target:
      entity_id: input_boolean.follow_up_mode
  - service: persistent_notification.create
    data:
      title: Follow Up Mode
      message: Follow up mode has been disabled.
  mode: single
  id: a6f2309be431465084e8e81ec687e65a
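
A matching automation to turn follow-up mode on by voice works the same way in reverse; this is a sketch along the lines of the stop automation above, and the trigger phrases are just examples:

- alias: 'Voice: Start Follow Up Mode'
  trigger:
  - platform: conversation
    command:
    - start follow up
    - start follow up mode
    - follow up mode on
  action:
  - service: input_boolean.turn_on
    target:
      entity_id: input_boolean.follow_up_mode
  - service: persistent_notification.create
    data:
      title: Follow Up Mode
      message: Follow up mode has been enabled.
  mode: single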

This approach is great for commands like “turn off the light. Done. Turn on the TV. Done. Play some music” - all without constantly repeating the wake word. It also lets you immediately cut off the assistant while it’s speaking.

The Downside: This does NOT carry over context. Every single call is like a new conversation, so you’re basically speaking to an LLM with the memory of the dude from Memento.

Home Assistant Voice PE Analysis: Question Detection Behavior

Based on detailed log analysis, I’ve found evidence of how Voice PE decides when to keep listening:

State Transition Patterns

When analyzing device logs, two distinct patterns emerge:

Pattern 1: Responses WITH questions

[20:51:07] Response: "What makes you think of purple today?"
[20:51:12] State changed from RESPONSE_FINISHED to START_MICROPHONE

Pattern 2: Responses WITHOUT questions

[20:51:19] Response: "Purple is now a key data point in my brain!"
[20:51:23] State changed from RESPONSE_FINISHED to IDLE

Technical Analysis

The firmware appears to:

1. Perform linguistic analysis on the response text
2. Identify question patterns (question marks, interrogative phrases)
3. Force different state transitions based solely on this detection
4. Make this decision extremely quickly (within milliseconds)

Proposed Enhancement

A simple firmware modification could add a configuration option:

voice_assistant:
  # Existing configuration...
  continued_conversation:
    enabled: true
    mode: "always"  # Options: "questions_only", "always", "disabled"
    timeout: 5s

This would enable users to choose their preferred conversation style without complex workarounds.

Conclusion

I have literally tried everything, from carrying over conversation IDs across states to flashing the firmware with custom code. I cannot overcome this design limitation, and it’s REALLY holding back the device from being a true conversational AI home assistant - for real D&D games and proper long-form discussions that aren’t constantly interrupted by “HEY JARVIS”.

I hope you can look into where it needs to be changed, because when it gets into firmware design, I can’t crack it. The issue appears to be that the state transitions happen TOO quickly at the firmware level for Home Assistant automations to reliably intercept - multiple state transitions within milliseconds.

I love your product but tying continuous conversations to questions doesn’t make sense for natural spoken word conversation.

Thank you for all you do!

Thank you for explaining the technical side. This confirms what my experiments revealed - that implementing proper continuous conversations requires firmware-level changes that can’t be achieved safely through automations alone.

Since modifying the VPE code directly isn’t recommended, I hope this can be prioritized as an official feature. The current question-detection approach is clever but doesn’t match how natural conversations flow.

A simple configuration toggle in the firmware would allow users to choose their preferred conversation style.

This would make the Voice PE a true conversational assistant rather than just a command-response system.

The latest ESPHome 2025.5 update brought big changes: start_conversation and continued conversation are now available in the release branch. I had to radically rework the code for my ESP32-S3.
But I think you can try to apply this scheme even to the VPE if you are interested in experimenting.
A new conversation round is started only for neutral sentences; sentences with a question are handled by the component itself.
You can add an additional switch as a condition to turn the function on and off.
There may be additional delays or waits needed at the on_end stage, but it’s hard to say which.

Here are the additional elements that were required for implementation.

script:
  - id: listening_timeout
    mode: restart 
    then:
      - delay: 4s
      - if:
          condition:
            lambda: |-
              return id(voice_assistant_phase) == 2;
          then:
            - voice_assistant.stop:

globals:
  # Variable for tracking TTS triggering 
  - id: is_tts_active
    type: bool
    restore_value: no
    initial_value: 'false'
  # Variable for tracking built-in continued conversations 
  - id: question_flag
    type: bool
    restore_value: no
    initial_value: 'false'


voice_assistant:
  ...
  on_listening:
    # Reset flags
    - lambda: |-
        id(voice_assistant_phase) = ${voice_assist_listening_phase_id};
        id(is_tts_active) = false;
        id(question_flag) = false;
    # Waiting for speech for 4 seconds, otherwise exit
    - script.execute: listening_timeout

  on_stt_vad_start:
    # Turn off the script if speech is detected
    - script.stop: listening_timeout
    
  on_tts_start: 
    ...
    # Finding a question mark at the end of a sentence.
    - lambda: |-
        bool is_question = false;
        if (!x.empty() && x.back() == '?') {
          is_question = true;
        }
        id(question_flag) = is_question;

  on_tts_end:
    # Set the flag when the stage is reached
    - lambda: |-
        id(is_tts_active) = true;

  on_end:
    ...
    - if:
        condition:
          and:
            - lambda: 'return !id(question_flag);'
            - lambda: 'return id(is_tts_active);'
        then:
          - voice_assistant.start:
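
The on/off switch mentioned above can be a simple ESPHome template switch. This is a sketch, assuming the id follow_up_enabled (a name not taken from the original config):

switch:
  - platform: template
    name: "Follow Up Mode"
    id: follow_up_enabled
    optimistic: true
    restore_mode: RESTORE_DEFAULT_OFF

The on_end condition then gains one more check, so the loop only restarts while the switch is on:

  on_end:
    ...
    - if:
        condition:
          and:
            - lambda: 'return !id(question_flag);'
            - lambda: 'return id(is_tts_active);'
            - lambda: 'return id(follow_up_enabled).state;'
        then:
          - voice_assistant.start: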

Thanks for this! I’ll try it out. I actually managed to get an internal cooldown going as well, and also long- and short-term memory!

https://www.reddit.com/r/LocalLLaMA/s/urRMWaGf0m

I’ll hopefully have a separate write-up. It was all done with very clever automations. I can now officially daisy-chain commands and also have internal memory for the unit that survives restarts.