Retrieving text output from ESPHome voice assistant

I’m working on a couple of voice assistant devices that use alternative outputs (e.g. rtttl through buzzer, serial lcd display, e-paper display, etc), rather than an amp/dac and speaker. The hurdle I’m facing is retrieving the text from the responses to feed to the displays. I’ve read through any docs I thought might be useful, but I couldn’t find anything. Maybe I’m not looking in the right place or I just missed it.

I have noticed that the text of the responses shows up in the logs, preceded by [voice_assistant:192]. If there’s a way to read from the logs, maybe I can filter them and place the appropriate string into a text sensor? I’m not really sure. I don’t have a lot of experience with this sort of thing.

Anyway, if anyone can help me get voice assistant’s responses in text form, I would really appreciate it. Thanks

[22:43:53][D][voice_assistant:168]: Speech recognised as: " turn on the hallway lights."
[22:43:53][D][voice_assistant:144]: Signaling stop...
[22:43:53][D][voice_assistant:192]: Response: "Turned on lights"

You could use the Logger’s on_message to get that text

This is good advice, I am going to try this too.
Keep in mind it will also require the mqtt component, as I don’t think that is included by default on the M5 Atom, for instance.

MQTT is not needed. It is only shown in the example but you can put whatever you want under then.
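
As a rough sketch of that idea, the config below uses the logger's on_message trigger with a lambda instead of MQTT and copies the quoted response into a template text sensor. The sensor id va_response and its name are made up for illustration, and the "Response: " marker is taken from the log excerpt above, so it may differ between ESPHome versions. It also assumes the logger runs at DEBUG level, since that is the level the response line is logged at.

```yaml
logger:
  level: DEBUG
  on_message:
    level: DEBUG
    then:
      - lambda: |-
          // tag and message are provided by the on_message trigger.
          // Only react to lines logged by the voice assistant component.
          if (strstr(tag, "voice_assistant") == nullptr) return;
          // Marker copied from the log excerpt; assumed, may vary by version.
          const char *marker = "Response: \"";
          const char *start = strstr(message, marker);
          if (start == nullptr) return;
          std::string text(start + strlen(marker));
          // Drop the closing quote and anything after it.
          size_t end = text.rfind('"');
          if (end != std::string::npos) text.erase(end);
          id(va_response).publish_state(text);

text_sensor:
  - platform: template
    id: va_response          # hypothetical id
    name: "Voice assistant response"
```

This depends on the exact wording of the log line, so it is likely to break if the component changes its log messages.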

This looks like a pretty good direction to move toward. I looked over the logger component page so many times, I’m not sure how I missed this. I saw that there was a write function and thought it was strange that there wasn’t a read. Anyway, I’ll try to figure this out and post the solution or another question. Thank you.

Have you found a way to print the voice assistant response to the display? I am putting the assistant on an M5StickC.

As of yet, I have not. Other projects, along with life, have pushed this to the back burner for a bit for me. I plan to revisit it sooner rather than later, but I’ll have to play it by ear. If you end up getting it to work in the meantime, I would appreciate a solution.

MQTT works for both STT and TTS in the m5.
This is what I have learned.

  1. If you add MQTT to on_stt_end: you will receive the text of what you said.
  on_stt_end:
    - mqtt.publish:
        topic: ha/tts
        payload: !lambda return x;
  2. If you add MQTT to on_tts_start: you will receive the text of what the M5 will respond with.
  on_tts_start:
    - light.turn_on:
        id: led
        blue: 0%
        red: 0%
        green: 100%
        brightness: 100%
        effect: none
    - mqtt.publish:
        topic: ha/tts
        payload: !lambda return x;
  3. If you add MQTT to on_tts_end: you will receive the URL (path) of the audio file that is created for the response.
  on_tts_end:
    - light.turn_on:
        id: led
        blue: 0%
        red: 0%
        green: 100%
        brightness: 100%
        effect: pulse
    - mqtt.publish:
        topic: ha/tts
        payload: !lambda return x;

I have not done anything with this yet but I will think of some ways to use this.
Here is a screenshot of what I found.

I don’t know if you still need this, but the voice assistant component has the on_tts_start automation, just as pcwii showed. Inside this automation, the response text can be accessed through the variable x, as described in the documentation [Voice Assistant ESPHome Link]. You can use this variable in a lambda however you like.

For example, I use a text sensor to see what the response was in home assistant:

voice_assistant:
  id: va
  ...
  on_tts_start:
    - text_sensor.template.publish:
        id: tts
        state: !lambda 'return x;'
text_sensor:
  - platform: template
    name: "text-to-speech"
    id: tts

In your case I would try to add the response to a global string to be able to use it in a display component. Something like this:

voice_assistant:
  ...
  on_tts_start:
    then:
      - lambda: |-
          // x holds the response text; keep a copy for the display to read
          id(tts_global_string) = x;
globals:
  - id: tts_global_string
    type: std::string
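
To show how a display could then read that global, here is a minimal sketch assuming a 16x2 character LCD on a PCF8574 I2C backpack. The platform, I2C pins, address and dimensions are all assumptions for illustration and need to be swapped for whatever display you actually have; only tts_global_string comes from the config above.

```yaml
i2c:
  sda: GPIO21   # assumed pins, adjust for your board
  scl: GPIO22

display:
  - platform: lcd_pcf8574
    dimensions: 16x2   # assumed size
    address: 0x27      # assumed I2C address
    update_interval: 1s
    lambda: |-
      // Print the most recent response starting at column 0, row 0.
      it.print(0, 0, id(tts_global_string).c_str());
```

The display redraws on every update interval, so a new response appears within about a second of on_tts_start firing. Longer responses will not fit on one row of a small LCD, so you would probably need to split or scroll the text.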