How to get data spoken into the companion app conversation?

Is there any more documentation of the HA conversation integration beyond the official documentation? I am hoping to get access to the raw sentence that gets interpreted within Home Assistant.

Some background:
With the “2023.1: Happy New Year of the voice!” release I was inspired to spin up a Rhasspy instance. Got it working, and it seems to work mostly fine for what Rhasspy is intended to do.

Beyond just controlling lights and getting the time/weather, I wanted to set it up to send spoken open-ended questions to a remote API, with the response sent to my local speaker. So: STT → remote API → TTS.

I have all of that working; unfortunately, when interpreting voice commands Rhasspy drops every word it doesn’t recognize from its trained sentences before sending the result to Home Assistant (it’s more complicated than that, but that’s the gist).

So if I type “Answer this How many moons does Saturn have” into the Rhasspy console, I will hear “Saturn has at least 82 moons.” from my speaker. However, if I speak that same phrase to Rhasspy via a microphone, the Home Assistant intent only receives “Answer this”; there is no way to do/send wildcards in Rhasspy.

The intent_script (which uses “answer this” as the intent trigger and strips it before sending the rest to the remote API):

askOpenAiQuestion:
  speech: 
    text: Let me check on that
  action:
    - service: input_text.set_value
      data_template:
        entity_id: input_text.last_openai_question
        value: "{{ _intent.rawInput | replace(\"answer this\",\"\") }}"

I started looking for another solution to send STT questions to the remote API and found the built-in HA Conversation integration. With it I can speak the question “How many moons does Saturn have” into my phone (HA companion app) and it transcribes it correctly and prints “How many moons does Saturn have” in the conversation dialog, so I know it gets me at least part of the way there. In hopes of solving this, I have a few questions I hope someone can help with.

Does that STT data get written to a sensor somewhere that HA can access?

Does HA conversations integration support wildcard words/phrases?

Is there more documentation on the conversation integration that I missed when looking?

Is there a way to get the conversation integration to access the physical microphone attached to the HA system the same way Rhasspy does (maybe with an alternative wake word)?

Has anyone got any other STT system working with HA when sending words it wasn’t specifically trained on?

Thanks for reading.

You might need “open transcription” for that.
Not all STT systems support it, but the ones that do are mentioned here:

https://rhasspy.readthedocs.io/en/latest/speech-to-text/

Unfortunately I don’t think any of those speech-to-text models will work while allowing Rhasspy to continue to respond to commands. They all do transcription, but it seems (at least the way I am understanding it now) only on the specific words they have been trained on, and they ignore all other words. Which I now think I understand is why it works when I type into the Rhasspy console but not when speaking into the microphone: typed text is already past that step in the Rhasspy toolchain (or whatever the actual term is, sorry). It seems totally reasonable that the models ignore other words, since they are mostly designed to be used offline and on Pi-like devices.

Quoting and expanding what I posted on a similar question on the Rhasspy forum:

At this point I am looking at two options. The first one is promising but initially limited: use the Home Assistant Conversation integration, since that already does full text transcription, but I have yet to find a way to actually get that text into HA. If I can’t find a way, I’m thinking of putting in a feature request for the HA companion app to write whatever gets transcribed into an HA sensor (sensor.conversation_last_sentence or something) that HA can interact with.
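If such a sensor existed, feeding it into the rest of my setup would only take one more automation. A rough sketch, assuming a hypothetical sensor.conversation_last_sentence (that name is made up; it does not exist today):

# Hypothetical sketch only: sensor.conversation_last_sentence is the sensor
# I am asking the companion app to provide, not something that exists now.
- id: forward_conversation_sentence
  alias: Forward transcribed sentence to the question helper
  trigger:
    - platform: state
      entity_id: sensor.conversation_last_sentence
  condition:
    - condition: template
      value_template: >
        {{ trigger.to_state.state not in ['', 'unknown', 'unavailable'] }}
  action:
    - service: input_text.set_value
      data:
        entity_id: input_text.last_openai_question
        value: "{{ trigger.to_state.state }}"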

The second one is to find a separate STT model that does do full-language transcription, install it on a separate RPi, send the raw audio from Rhasspy to it to be transcribed, and from there send the text along to the remote API.

Other suggestions, pointers, or corrections to any erroneous assumptions you see I may be making are welcome.

The problem is not really Rhasspy then, but the language corpus it is trained on, and that is often the same as for all the other voice recognizers when it comes to conversational corpora.
The bigger corpora also require a more powerful computer, so an RPi might not be adequate at all.

This is pretty correct; I agree it’s not a problem with Rhasspy, which is doing what it was designed to do.

I’m hoping to get the companion app to act as the transcription service. I put a feature request in here to get the transcribed text into an HA sensor. If that gets in, I can do the full STT → TTS answers I am hoping for.

I posted the implementation details on the Rhasspy forum because someone asked, but since this really isn’t a Rhasspy issue I’ll post it here as well in case someone else wants to try it too:

Sure. There’s really not much to it.

In Rhasspy (only works when entering text into the console):

[askOpenAiQuestion]
(answer this) {openaiquestion}

The HA intent_script that updates an input_text helper watched by automations:

askOpenAiQuestion:
  speech:
    text: Let me check on that
  action:
    # Strip the "answer this" trigger phrase and store the rest of the raw
    # transcription for the automation below to pick up.
    - service: input_text.set_value
      data_template:
        entity_id: input_text.last_openai_question
        value: "{{ _intent.rawInput | replace(\"answer this\",\"\") }}"

Then a couple of HA automations to watch for sensors changing:

last_openai_question updates → send value to rest_command
last_openai_answer updates → send value to TTS

- id: request_openai_question
  alias: Request OpenAI Question
  trigger:
  - entity_id: input_text.last_openai_question
    platform: state
  condition:
    condition: and
    conditions:
    - condition: template
      value_template: >
        {{ states("input_text.last_openai_question") != 'empty' }}
    - condition: template
      value_template: >
        {{ states("input_text.last_openai_question") != '' }}
  action:
  - service: rest_command.openai_raw_answer
    data:
      pKey: "{REDACTED}"
      pQuery: >
        {{ states("input_text.last_openai_question") }}
- id: answer_openai_question
  alias: Answer OpenAI Question
  trigger:
  - entity_id: input_text.last_openai_answer
    platform: state
  condition:
    condition: and
    conditions:
    - condition: template
      value_template: >
        {{ states("input_text.last_openai_answer") != 'empty' }}
    - condition: template
      value_template: >
        {{ states("input_text.last_openai_answer") != '' }}
  action:
  - service: tts.cloud_say
    data:
      entity_id: media_player.vlc_telnet
      message: >
        {{ states("input_text.last_openai_answer") }}
      options:
        gender: male
      language: en-GB
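The != 'empty' conditions imply the helpers get set back to a placeholder after use; one way to do that (an assumption on my part, the original setup doesn’t show this step) is an extra action at the end of each automation:

  # Hypothetical reset step (not shown above): setting the helper back to
  # "empty" lets the state trigger fire again even if the next question or
  # answer happens to be identical to the last one.
  - service: input_text.set_value
    data:
      entity_id: input_text.last_openai_question
      value: "empty"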

An HA rest_command to send the question to the remote API (my server with both local and internet access):

  openai_raw_answer:
    url: https://{MY_SERVER_DOMAIN}/queryOpenAI.php
    method: POST
    headers:
      user-agent: 'Home Assistant Rest Command'
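If the pKey/pQuery values passed from the automation don’t show up in $_POST on the server, the rest_command likely needs an explicit payload. A rough sketch, assuming form-encoded fields (the content_type and payload lines here are assumptions, not tested config):

  openai_raw_answer:
    url: https://{MY_SERVER_DOMAIN}/queryOpenAI.php
    method: POST
    headers:
      user-agent: 'Home Assistant Rest Command'
    # Assumed additions: forward the variables passed in the service call
    # as form-encoded fields so PHP can read them via $_POST.
    content_type: 'application/x-www-form-urlencoded'
    payload: 'pKey={{ pKey }}&pQuery={{ pQuery }}'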

Then the REST endpoint is just a few lines of PHP (auth and sanitizing removed) that queries OpenAI and writes the answer to an HA entity, which the above automation listens for and speaks via TTS:

  // Question text forwarded by the HA rest_command
  $theQ = $_POST['pQuery'];

  // Ask OpenAI for a short completion
  exec("curl https://api.openai.com/v1/completions -H \"Content-Type: application/json\" -H \"Authorization: Bearer {OPEN_AI_AUTH_TOKEN}\" -d '{\"model\": \"text-davinci-003\", \"prompt\": \"$theQ\", \"temperature\": 0, \"max_tokens\": 40}'", $output, $retval);

  // Extract the answer text from the API response
  $replyData = json_decode($output[0]);
  $replyText = trim($replyData->choices[0]->text);

  // Write the answer to the HA entity via the REST API; the "answer" automation above picks up the state change and speaks it.
  exec("curl -v -X POST http://homeassist.local:8123/api/states/input_text.last_openai_answer -d '{\"state\": \"$replyText\"}' -H \"Authorization: Bearer {HA_AUTH_TOKEN}\" -H \"Content-Type: application/json\"", $output, $retval);

I’m sure this could be done better by someone who actually knows what they are doing, but to be honest I just spent an afternoon putting this together in an effort to try and impress my daughter.

I created a feature request for the Home Assistant companion app to put the transcribed text into a sensor.

Just a quick update in case anyone might be interested…

The feature request to add the companion app transcription to a Home Assistant sensor was rejected as out of the app’s scope, so that’s another dead end. When closing the request they suggested that I look into Almond, but it looks to be no longer active (https://almond.stanford.edu/) and the HA integration will no longer install.

It looks like maybe it’s in the process of being replaced by Genie (https://genie.stanford.edu/) - does anyone know the status of that switch and whether there is any documentation for the Genie HA add-on yet?

Yeah, Genie was in the “State of the Open Home” a few years back.
No idea on its current state.

@keassol was there any progress on this topic since back then?
I also would like to be able to access the STT text without HA filtering so that I can process that with my own cloud automation services.

No, I pretty much gave up after my last post to this thread. I just created a browser-based interface instead:

[screenshot of the browser-based interface]

If the “speak answer” box is checked, it uses the Home Assistant API to speak the answer through the speakers via TTS. If not checked, it just displays the answer in that browser interface.

I likely could embed that interface in the HA dashboard but we just have it bookmarked and access it from there.

Sorry I couldn’t have been more help. If you do get it working please post how you did it here.