Retrieving text output from esphome voice assistant

listlesslife · September 2, 2023, 6:22am

I’m working on a couple of voice assistant devices that use alternative outputs (e.g. RTTTL through a buzzer, a serial LCD, an e-paper display, etc.) rather than an amp/DAC and speaker. The hurdle I’m facing is retrieving the text of the responses to feed to the displays. I’ve read through all the docs I thought might be useful, but I couldn’t find anything. Maybe I’m not looking in the right place, or I just missed it.

I have noticed that the text of the responses shows up in the logs, preceded by [voice_assistant:192]. If there’s a way to read from the logs, maybe I can filter it and place the appropriate string into a text sensor? I’m not really sure. I don’t have a lot of experience with this sort of thing.

Anyway, if anyone can help me get voice assistant’s responses in text form, I would really appreciate it. Thanks

[22:43:53][D][voice_assistant:168]: Speech recognised as: " turn on the hallway lights."
[22:43:53][D][voice_assistant:144]: Signaling stop...
[22:43:53][D][voice_assistant:192]: Response: "Turned on lights"

ckxsmart · September 3, 2023, 4:31am

You could use the Logger’s on_message to get that text

pcwii · September 3, 2023, 7:23am

This is good advice, I am going to try this too.
Keep in mind it will also require the MQTT component, as I don’t think it is set up by default on the M5 Atom, for instance.

ckxsmart · September 3, 2023, 4:09pm

MQTT is not needed. It only appears in the example; you can put whatever actions you want under then:
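
For reference, a minimal sketch of that idea without MQTT. The logger's on_message trigger passes level, tag and message to its actions, so a lambda can pick out the Response line and publish it to a template text sensor. The sensor id va_response and its name are made up for illustration. The assumptions that the tag is voice_assistant and that the reply sits between the quotes after Response: come only from the log excerpt earlier in the thread, so treat this as a starting point rather than a tested config.

```yaml
logger:
  level: DEBUG
  on_message:
    level: DEBUG
    then:
      - lambda: |-
          // Assumption: replies are logged as  Response: "<text>"  under the
          // voice_assistant tag, as in the log excerpt above.
          std::string msg = message;
          const std::string marker = "Response: \"";
          size_t start = msg.find(marker);
          if (std::string(tag) == "voice_assistant" && start != std::string::npos) {
            start += marker.size();
            size_t end = msg.rfind('"');
            // Only publish when a closing quote follows the marker.
            if (end != std::string::npos && end >= start) {
              id(va_response).publish_state(msg.substr(start, end - start));
            }
          }

text_sensor:
  - platform: template
    id: va_response            # hypothetical id
    name: "Voice assistant response"
```

Searching for the marker anywhere in the message, instead of only at the start, keeps the lambda working whether or not the message includes the log header.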

listlesslife · September 3, 2023, 6:17pm

This looks like a pretty good direction to move toward. I looked over the logger component page so many times, I’m not sure how I missed this. I saw that there was a write function and thought it was strange that there wasn’t a read. Anyway, I’ll try to figure this out and post the solution or another question. Thank you.

mattbruman · September 24, 2023, 2:07am

Have you found a way to print the voice assistant response to the display? I am putting the assistant on an M5StickC.

listlesslife · September 27, 2023, 9:49pm

As of yet, I have not. Other projects, along with life, have pushed this to the back burner for a bit for me. I plan to revisit it sooner rather than later, but I’ll have to play it by ear. If you end up getting it to work in the meantime, I would appreciate a solution.

pcwii · September 28, 2023, 9:39pm

MQTT works for both STT and TTS on the M5.
This is what I have learned so far.

If you add an mqtt.publish action to on_stt_end:, you will receive the text of what you said.

on_stt_end:
  - mqtt.publish:
      topic: ha/tts
      payload: !lambda return x;

If you add an mqtt.publish action to on_tts_start:, you will receive the text of what the M5 will respond with.

  on_tts_start:
    - light.turn_on:
        id: led
        blue: 0%
        red: 0%
        green: 100%
        brightness: 100%
        effect: none
    - mqtt.publish:
        topic: ha/tts
        payload: !lambda return x;

If you add an mqtt.publish action to on_tts_end:, you will receive the URL of the audio file that is generated for the response.

  on_tts_end:
    - light.turn_on:
        id: led
        blue: 0%
        red: 0%
        green: 100%
        brightness: 100%
        effect: pulse
    - mqtt.publish:
        topic: ha/tts
        payload: !lambda return x;

I have not done anything with this yet but I will think of some ways to use this.
Here is a screenshot of what I found.
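
The mqtt.publish actions above only compile when the mqtt component is configured in the same device YAML. A minimal sketch, assuming a broker at a placeholder address and credentials kept in secrets.yaml (the address and the secret names are hypothetical, not taken from this thread):

```yaml
mqtt:
  broker: 192.168.1.10                 # placeholder broker address
  username: !secret mqtt_username      # hypothetical secret name
  password: !secret mqtt_password      # hypothetical secret name
```

With this in place, subscribing to the ha/tts topic on the broker should show the payloads published by the automations above.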

spacestone70 · November 5, 2023, 12:03am

I don’t know if you still need this, but the voice assistant component has the on_tts_start automation, just as pcwii showed. Inside this automation, the response can be accessed through the variable x, as described in the documentation [Voice Assistant ESPHome Link]. You can use this variable in a lambda however you like.

For example, I use a text sensor to see what the response was in Home Assistant:

voice_assistant:
  id: va
  ...
  on_tts_start:
    - text_sensor.template.publish:
        id: tts
        state: !lambda 'return x;'

text_sensor:
  - platform: template
    name: "text-to-speech"
    id: tts

In your case I would try to add the response to a global string to be able to use it in a display component. Something like this:

on_tts_start:
    then:
      - lambda: |-
          id(tts_global_string) = x;

globals:
  - id: tts_global_string
    type: std::string
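
To show how the global could then reach a screen, here is a rough sketch of a display lambda that prints it. The SSD1306 I2C display, the pins, the font and the id response_font are all assumptions chosen for illustration; swap in the platform and pins that match your hardware (for example the built-in display of an M5StickC).

```yaml
i2c:
  sda: GPIO21        # assumed pins, adjust for your board
  scl: GPIO22

font:
  - file: "gfonts://Roboto"
    id: response_font  # hypothetical id
    size: 14

display:
  - platform: ssd1306_i2c
    model: "SSD1306 128x64"
    address: 0x3C
    lambda: |-
      // Print the last TTS response stored by on_tts_start.
      // No word wrapping is done here, so long replies will run off the screen.
      it.print(0, 0, id(response_font), id(tts_global_string).c_str());
```

The display component redraws on its own update interval, so the text changes shortly after on_tts_start writes a new value to the global.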