Home Assistant: Assist + Speech-to-Text to Node-RED and Text-to-Speech back to Assist

Dear HASS Community,

After one month of learning how to configure HASS, templates, webhooks, Node-RED, etc., I am eager to share my experience with the “Assist” module in conjunction with Node-RED, OpenAI, and other exciting features that Home Assistant offers.

The primary goal of this project is to:

  • Utilize “Assist” to enable voice interaction with my home, receive intelligent and adaptive responses, control my home entities, and obtain whimsical answers about life while minimizing the use of cloud platforms (with the exception of OpenAI).

Note: I am aware that this is not a novel concept, as many fantastic people online are sharing their innovative work with HASS and OpenAI.
However, I have yet to come across a post that describes a system that:
→ captures every sentence/question from the user via Speech-to-Text,
→ forwards the data to Node-RED,
→ consults OpenAI (or any other API capable of interesting functions),
→ awaits a response,
→ then sends the response back to “Assist” for audio output (and possibly performs certain actions in Node-RED as instructed by the user).

For those interested in exploring cool content that has greatly aided me, check out this video:

And all the chapters of “Year of the Voice” on the HASS Blog:

More cool blogs are coming I’m sure…

I am sharing this work because I aim to assist individuals (as far as I can) with similar goals, and to save them the three-plus weeks it took me to grasp how templates function :stuck_out_tongue:
Additionally, I am open to receiving your feedback on how to better achieve the objective described earlier.

IF YOU WANT TO SKIP THE INTRO, IT STARTS HERE:
IMPORTANT NOTICE
“Please be advised that applying the proposed configuration for data transmission to the OpenAI API (or any other cloud platform) is entirely at your own risk. I will not be held responsible for any consequences or damages that may arise from utilizing this configuration.”

Prerequisites (not detailed):

  • an existing installation of Home Assistant with “Assist” enabled
  • an existing installation of “faster-whisper” or any other Speech-to-Text agent
  • the same for Text-to-Speech (integrated into HASS by default via Google, if I’m not mistaken)
  • HACS installed on your Home Assistant (for Node-RED custom integrations like “Node-RED Companion”)
  • Node-RED installed with the HASS custom integration

Following the diagram in the introduction, the configuration is as follows:

Step 0: “Assist” configuration: “Settings” → “Voice assistants” → “Assist”

Step 1: creation of a custom intent to detect the user’s request/sentence and place it into a “wildcard” variable.
Note: create the file “openAI.yaml” in config/custom_sentences/fr (adapt the folder to your language)

language: "fr" #adapt to your language
intents:
  custom_intents:
    data:
      - sentences:
          - "{user_demand}" #the wildcard variable that we will pass to Node-RED, OpenAI, whatever you want
lists:
  user_demand:
    wildcard: true

Step 2: creation of a custom intent response that answers with “sensor.openai_query_response_from_nodered”, which holds the OpenAI response coming back from Node-RED.
NOTE: create the file “responses.yaml” in config/custom_sentences/fr (adapt the folder to your language)

language: "fr" #adapt to your language
responses:
  intents:
    custom_intents:
      default: "{{states['sensor.openai_query_response_from_nodered'].attributes.full_response}}" #contains the response from Node-RED (trigger-based template sensor)

Step 3: creation of an intent_script to send “user_demand” to the Node-RED API.
NOTE: place this config into “configuration.yaml”

#intent_script to intercept "custom_intents" and call the REST command to Node-RED (which in turn calls the OpenAI API)
intent_script:
  custom_intents:
    action:
      - service: rest_command.send_to_node_red
        data:
          message: "{{ user_demand }}"
      - wait_template: "{{ is_state('binary_sensor.nodered_response_received', 'on') }}"
        # wait for the OpenAI answer before letting Assist speak the response.
        # NOTE: this binary sensor is turned on by Node-RED once the response
        # arrives from OpenAI, and turned back off by Node-RED as well.
    speech:
      text: "{{ states['sensor.openai_query_response_from_nodered'].attributes.full_response }}"

Step 4: creation of the REST command that sends the user payload to Node-RED.
NOTE: place this config into “configuration.yaml”

#REST command to Node-RED (which fronts the GPT API)
rest_command:
  send_to_node_red:
    url: "https://noderedURL:1880/endpoint/gpt" # point this to your Node-RED instance: an "http in" node coupled with a "basicauth" node for basic authentication
    method: POST
    verify_ssl: false # if you are lazy and don't want to generate a certificate for Node-RED…
    headers:
      Authorization: !secret node_auth_header # (optional basic auth) stored in "secrets.yaml" as: node_auth_header: "Basic <base64 of user:password>"
    content_type: "application/json"
    payload: '{"message": "{{ message }}"}' # "message" comes from the intent_script action above
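If you go with basic auth, the header value kept in secrets.yaml is just the base64 of user:password. Here is a quick sketch in Node.js (the myuser / mypassword credentials are placeholders, not values from this guide):

```javascript
// Build the value for the Authorization header used by the rest_command above.
// "myuser" / "mypassword" are placeholder credentials.
function basicAuthHeader(user, password) {
    const token = Buffer.from(`${user}:${password}`).toString('base64');
    return `Basic ${token}`;
}

console.log(basicAuthHeader('myuser', 'mypassword'));
// → "Basic bXl1c2VyOm15cGFzc3dvcmQ="
```

The resulting string is what goes after "node_auth_header:" in secrets.yaml.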

Step 5: trigger-based template sensor that stores the OpenAI response (delivered via the “openai_response_received” event) into an attribute of “sensor.openai_query_response_from_nodered”.
NOTE: place this config into “configuration.yaml”

#trigger-based template sensor storing the last OpenAI response in an attribute
template:
  - trigger:
      - platform: event
        event_type: openai_response_received
    sensor:
      - name: "OpenAI query response from NodeRed"
        state: "{{ now() }}" # the state itself is not used; the attribute carries the response
        attributes:
          full_response: "{{ trigger.event.data.full_response }}"

Step 6: creation of an automation that receives the Node-RED response (coming from OpenAI) and fires the “openai_response_received” event.
NOTE: build this in the automation GUI (switch to the YAML editor to paste the code)

alias: Webhook response OpenAI
description: Webhook response OpenAI
trigger:
  - platform: webhook
    allowed_methods:
      - POST
      - PUT
    local_only: true
    webhook_id: "yourwebhookID here"
action:
  - event: openai_response_received
    event_data:
      full_response: "{{ trigger.json.text }}"
mode: single
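Whatever posts to this webhook has to send a JSON body with a top-level text key, since the automation reads trigger.json.text. A minimal sketch of that body (the response string and URL below are illustrative):

```javascript
// Minimal sketch of the JSON body the webhook automation above expects.
// The response text is illustrative; in the real flow it is the OpenAI answer.
const body = JSON.stringify({ text: 'The living room is at 21 °C.' });

// Posting it, e.g. with Node 18+'s built-in fetch (URL and webhook ID are
// placeholders; the Node-RED flow does this with an "http request" node):
// await fetch('http://homeassistant.local:8123/api/webhook/yourwebhookID', {
//     method: 'POST',
//     headers: { 'Content-Type': 'application/json' },
//     body,
// });
```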

Step 7: Node-RED example (importable flow export)

[{"id":"3cb857c1567c8b69","type":"tab","label":"OpenAI generic","disabled":false,"info":"","env":[]},{"id":"eeb1f3559ad73570","type":"group","z":"3cb857c1567c8b69","name":"Take input message + consult HASS entities","style":{"label":true},"nodes":["fc4fd94bd4abf2b2","e08dfff621631c5a","b0a83ee8ec390c8f","327ce8338edb06f8","76acf240b4bcd6bc","b88722191879064c","1b5bf50f0f6d532e","e942ac44a79d973f"],"x":594,"y":179,"w":892,"h":342},{"id":"3b89d79bb5cfe2e6","type":"group","z":"3cb857c1567c8b69","name":"GPT API In / Out","style":{"label":true},"nodes":["52e0b46444cefa4a","d1777378bded143e","8aeb87c8d435fd32","be9ec0b86c636a24","382dab4f3a4b6547","c72ead54dbba8cf1"],"x":1494,"y":199,"w":972,"h":242},{"id":"e06825bd14661733","type":"group","z":"3cb857c1567c8b69","name":"Used to indicate HASS \"conversation\" is finished","style":{"label":true},"nodes":["2f20ed922d2cc7cd","1aa736f2e097d82b","0d2bdfeb787a3632","232c9abbf38e34f2","f067ff8e5e2f2f97"],"x":2514,"y":299,"w":310,"h":322},{"id":"bbcd608906b7d614","type":"group","z":"3cb857c1567c8b69","name":"OpenAI reply to HASS conversation","style":{"label":true},"nodes":["374b384e5656a3ae","f6cec7473493e474"],"x":2514,"y":159,"w":372,"h":122},{"id":"45d74f37c7e95990","type":"http in","z":"3cb857c1567c8b69","name":"écoute URI \"gpt\" depuis HASS","url":"gpt","method":"post","upload":false,"swaggerDoc":"","x":180,"y":400,"wires":[["e6fd2b6cd00627ab"]]},{"id":"52e0b46444cefa4a","type":"function","z":"3cb857c1567c8b69","g":"3b89d79bb5cfe2e6","name":"payload vers API GPT","func":"let userMessage = msg.payload.usr_msg;\nlet entities = JSON.stringify(msg.payload.entities_msg);\n\nmsg.headers = {\n    'Authorization': 'Bearer yourgptAPItoken', //change your token\n    'Content-Type': 'application/json'\n};\nmsg.payload = {\n    'model': \"gpt-3.5-turbo\",\n    'messages': [\n        {\n            'role': 'system',\n            'content': 'YOUR GPT PROMPT HERE :' + entities\n        },\n        {\n            'role': 'user',\n            'content': userMessage // msg.payload user message from ASSIST\n        },\n    ],\n    \"temperature\": 1.1,\n    \"max_tokens\": 2048,\n    \"presence_penalty\": 1.1,\n    \"frequency_penalty\": 1\n};\nreturn msg;","outputs":1,"timeout":0,"noerr":0,"initialize":"","finalize":"","libs":[],"x":1620,"y":400,"wires":[["d1777378bded143e"]]},{"id":"d1777378bded143e","type":"http request","z":"3cb857c1567c8b69","g":"3b89d79bb5cfe2e6","name":"","method":"POST","ret":"obj","paytoqs":"ignore","url":"https://api.openai.com/v1/chat/completions","tls":"","persist":false,"proxy":"","insecureHTTPParser":false,"authType":"","senderr":false,"headers":[],"x":1830,"y":400,"wires":[["8aeb87c8d435fd32"]]},{"id":"8aeb87c8d435fd32","type":"function","z":"3cb857c1567c8b69","g":"3b89d79bb5cfe2e6","name":"response http","func":"msg.payload = msg.payload.choices[0].message.content;  // GPT response extract\nreturn msg;","outputs":1,"timeout":0,"noerr":0,"initialize":"","finalize":"","libs":[],"x":2060,"y":400,"wires":[["be9ec0b86c636a24","382dab4f3a4b6547"]]},{"id":"be9ec0b86c636a24","type":"http response","z":"3cb857c1567c8b69","g":"3b89d79bb5cfe2e6","name":"reponse API chat GPT","statusCode":"200","headers":{},"x":2340,"y":320,"wires":[]},{"id":"fc4fd94bd4abf2b2","type":"ha-get-entities","z":"3cb857c1567c8b69","g":"eeb1f3559ad73570","name":"","server":"96ae01b7.d902b","version":0,"rules":[],"output_type":"array","output_empty_results":false,"output_location_type":"msg","output_location":"payload","output_results_count":1,"x":750,"y":480,"wires":[["e08dfff621631c5a"]]},{"id":"e08dfff621631c5a","type":"function","z":"3cb857c1567c8b69","g":"eeb1f3559ad73570","name":"filter your entities before send to GPT","func":"var filteredEntities = msg.payload.filter(function (entity) {\n    return entity.entity_id.startsWith('sensor.') && entity.attributes.device_class == 'temperature';\n});\n\nvar filteredStates = filteredEntities.map(entity => {\n    return {\n        entity_id: entity.entity_id,\n        state: entity.state,\n        friendly_name: entity.attributes.friendly_name,\n        last_changed: entity.last_changed,\n        last_updated: entity.last_updated,\n        unit_of_measurement: entity.unit_of_measurement,\n        device_class: entity.device_class,\n        persons: entity.persons,\n        user_id: entity.user_id\n    };\n});\n\nmsg.payload = filteredStates;\nreturn msg;","outputs":1,"timeout":0,"noerr":0,"initialize":"","finalize":"","libs":[],"x":1030,"y":480,"wires":[["76acf240b4bcd6bc"]]},{"id":"b0a83ee8ec390c8f","type":"join","z":"3cb857c1567c8b69","g":"eeb1f3559ad73570","name":"","mode":"custom","build":"object","property":"payload","propertyType":"msg","key":"topic","joiner":"\\n","joinerType":"str","accumulate":true,"timeout":"","count":"2","reduceRight":false,"reduceExp":"","reduceInit":"","reduceInitType":"","reduceFixup":"","x":1400,"y":400,"wires":[["52e0b46444cefa4a"]]},{"id":"327ce8338edb06f8","type":"change","z":"3cb857c1567c8b69","g":"eeb1f3559ad73570","name":"change_topic","rules":[{"t":"set","p":"topic","pt":"msg","to":"usr_msg","tot":"str"}],"action":"","property":"","from":"","to":"","reg":false,"x":950,"y":400,"wires":[["b0a83ee8ec390c8f"]]},{"id":"76acf240b4bcd6bc","type":"change","z":"3cb857c1567c8b69","g":"eeb1f3559ad73570","name":"change_topic","rules":[{"t":"set","p":"topic","pt":"msg","to":"entities_msg","tot":"str"}],"action":"","property":"","from":"","to":"","reg":false,"x":1290,"y":480,"wires":[["b0a83ee8ec390c8f"]]},{"id":"e6fd2b6cd00627ab","type":"node-red-contrib-basicauth","z":"3cb857c1567c8b69","name":"","realm":"","username":"username","password":"password","x":410,"y":400,"wires":[["fc4fd94bd4abf2b2","1b5bf50f0f6d532e","b88722191879064c"],[]]},{"id":"b88722191879064c","type":"function","z":"3cb857c1567c8b69","g":"eeb1f3559ad73570","name":"parse user message","func":"msg.payload = msg.payload.message;\nreturn msg;","outputs":1,"timeout":0,"noerr":0,"initialize":"","finalize":"","libs":[],"x":720,"y":400,"wires":[["327ce8338edb06f8"]]},{"id":"374b384e5656a3ae","type":"http request","z":"3cb857c1567c8b69","g":"bbcd608906b7d614","name":"","method":"POST","ret":"txt","paytoqs":"ignore","url":"http://localhost:8123/api/webhook/yourHASSwebhookIDHERE","tls":"","persist":true,"proxy":"","insecureHTTPParser":false,"authType":"","senderr":false,"headers":[],"x":2630,"y":200,"wires":[[]]},{"id":"382dab4f3a4b6547","type":"function","z":"3cb857c1567c8b69","g":"3b89d79bb5cfe2e6","name":"Réponse OpenAI vers HASS","func":"let api_response = msg.payload;\n\nmsg.headers = {\n    'Content-Type': 'application/json'\n};\nmsg.payload = {\n            'text': api_response // msg.payload = OpenAI response\n};\nreturn msg;","outputs":1,"timeout":0,"noerr":0,"initialize":"","finalize":"","libs":[],"x":2320,"y":400,"wires":[["374b384e5656a3ae","1aa736f2e097d82b"]]},{"id":"1b5bf50f0f6d532e","type":"api-call-service","z":"3cb857c1567c8b69","g":"eeb1f3559ad73570","name":"GPT query text save","server":"96ae01b7.d902b","version":5,"debugenabled":false,"domain":"input_text","service":"set_value","areaId":[],"deviceId":[],"entityId":["input_text.gpt_query"],"data":"{\"entity_id\":\"input_text.gpt_query\",\"value\":\"{{payload.message}}\"}","dataType":"json","mergeContext":"","mustacheAltTags":false,"outputProperties":[],"queue":"none","x":770,"y":300,"wires":[[]],"info":"fonction permettant de retenir la dernière question\r\nposée à OpenAI dans la WebUi HASS"},{"id":"2f20ed922d2cc7cd","type":"ha-binary-sensor","z":"3cb857c1567c8b69","g":"e06825bd14661733","name":"nodered_response_received","entityConfig":"09803e7de818ecdc","version":0,"state":"payload","stateType":"msg","attributes":[],"inputOverride":"allow","outputProperties":[{"property":"state","propertyType":"msg","value":"turn_on","valueType":"str"}],"x":2660,"y":400,"wires":[["0d2bdfeb787a3632"]]},{"id":"1aa736f2e097d82b","type":"change","z":"3cb857c1567c8b69","g":"e06825bd14661733","name":"","rules":[{"t":"set","p":"payload","pt":"msg","to":"true","tot":"str"}],"action":"","property":"","from":"","to":"","reg":false,"x":2650,"y":340,"wires":[["2f20ed922d2cc7cd"]]},{"id":"0d2bdfeb787a3632","type":"delay","z":"3cb857c1567c8b69","g":"e06825bd14661733","name":"","pauseType":"delay","timeout":"5","timeoutUnits":"seconds","rate":"1","nbRateUnits":"1","rateUnits":"second","randomFirst":"1","randomLast":"5","randomUnits":"seconds","drop":false,"allowrate":false,"outputs":1,"x":2640,"y":460,"wires":[["f067ff8e5e2f2f97"]]},{"id":"232c9abbf38e34f2","type":"ha-binary-sensor","z":"3cb857c1567c8b69","g":"e06825bd14661733","name":"nodered_response_received","entityConfig":"09803e7de818ecdc","version":0,"state":"payload","stateType":"msg","attributes":[],"inputOverride":"allow","outputProperties":[{"property":"state","propertyType":"msg","value":"turn_off","valueType":"str"}],"x":2660,"y":580,"wires":[[]]},{"id":"f067ff8e5e2f2f97","type":"change","z":"3cb857c1567c8b69","g":"e06825bd14661733","name":"","rules":[{"t":"set","p":"payload","pt":"msg","to":"false","tot":"str"}],"action":"","property":"","from":"","to":"","reg":false,"x":2650,"y":520,"wires":[["232c9abbf38e34f2"]]},{"id":"e942ac44a79d973f","type":"comment","z":"3cb857c1567c8b69","g":"eeb1f3559ad73570","name":"Adapt to your needs","info":"Adapt to your needs","x":1190,"y":220,"wires":[]},{"id":"c72ead54dbba8cf1","type":"comment","z":"3cb857c1567c8b69","g":"3b89d79bb5cfe2e6","name":"Adapt to your needs","info":"Adapt to your needs","x":1650,"y":240,"wires":[]},{"id":"f6cec7473493e474","type":"comment","z":"3cb857c1567c8b69","g":"bbcd608906b7d614","name":"CHANGE with your HASS webhook token","info":"Adapt to your needs","x":2700,"y":240,"wires":[]},{"id":"96ae01b7.d902b","type":"server","name":"Home Assistant","addon":true,"rejectUnauthorizedCerts":true,"ha_boolean":"","connectionDelay":false,"cacheJson":true,"heartbeat":true,"heartbeatInterval":"10","areaSelector":"friendlyName","deviceSelector":"friendlyName","entitySelector":"friendlyName","statusSeparator":"-","statusYear":"numeric","statusMonth":"numeric","statusDay":"numeric","statusHourCycle":"h23","statusTimeFormat":"h:m:s","enableGlobalContextStore":true},{"id":"09803e7de818ecdc","type":"ha-entity-config","server":"96ae01b7.d902b","deviceConfig":"","name":"nodered_response_received","version":"6","entityType":"binary_sensor","haConfig":[{"property":"name","value":"nodered_response_received"},{"property":"icon","value":""},{"property":"entity_category","value":""},{"property":"entity_picture","value":""},{"property":"device_class","value":""}],"resend":false,"debugEnabled":false}]
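If you’d rather not dig through the export, the “filter your entities before send to GPT” function node boils down to the following standalone sketch (field list trimmed for brevity; adapt the filter to your own entities):

```javascript
// Standalone version of the entity-filter function node from the flow above:
// keep only temperature sensors and strip each entity down to a few fields
// before handing the list to GPT.
function filterEntities(entities) {
    return entities
        .filter(e => e.entity_id.startsWith('sensor.')
                  && e.attributes.device_class === 'temperature')
        .map(e => ({
            entity_id: e.entity_id,
            state: e.state,
            friendly_name: e.attributes.friendly_name,
            last_updated: e.last_updated,
        }));
}
```

In the actual function node, msg.payload comes in as the array from the ha-get-entities node and goes out as the filtered list.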

Let me know your remarks and any improvements that could be made, or if I missed something important for things to work.


@sebj84 Thank you for sharing the results of your hard work. I use NR extensively and have been dipping into OpenAI integration. Have not had time to play with the new HA voice technology yet, so delighted to see your post which gathers it all together. Rest assured I will digest and (hopefully) implement.

Thank you. I hope it’ll work for you (if I missed something, let me know…). Next step on the NR side: using a “vector database” to optimize the requests to OpenAI, and interpreting some OpenAI JSON responses to pilot my entities (this part is already documented by the great YouTuber “Technithusiast”).

Hello @sebj84 and thank you for the wonderful guide!

I have a good background on GPT work but not so much experience in HASS.

The guide was easy to follow and I have a good implementation going.

Have you faced any issues getting the response to show up in Assist? The only issue I have been facing is that the binary sensor does not trigger the assistant to send the response message; only after sending another message to the assistant does the earlier response appear.

After sending a new message, the earlier response shows immediately:

I wonder what my configuration issue is here, due to my limited HASS knowledge. I would assume something goes wrong in the custom_intents wait_template. I can still see the binary sensor changing its value. Going to do further investigation and see what’s wrong.

Found the issue: the nodered_response_received sensor has to stay on until the response is ready. In my case, forming the reply took more than the 5-second delay, which led to the UI not being able to respond in time.

Thank you for your interest in this topic.
Glad you found the issue by yourself :smiley: If you found your solution, that means you found the trick I used to wait for the “Assist” response :slight_smile: I’m not sure it is clean, but it is working (so far).

Hi,

I’m using this to speak with OpenAI, but would it be possible to combine this with HA Assist?

If Assist could execute known commands before calling OpenAI, you could save both the time spent waiting for OpenAI and the API call itself.

Hi bloody,

Sorry for the long delay.
Of course: the goal of this project is precisely to combine HA Assist with Node-RED first. Once you “trigger” the first function in Node-RED, you can do anything you want before going to OpenAI (or any other LLM API). In my current project, I’ve managed to back up my previous requests and LLM responses into a vector database to perform RAG (Retrieval Augmented Generation (RAG) | Prompt Engineering Guide). Objectives: minimise the tokens used on the LLM API, eventually bypass the LLM if the user request is simple, and give the LLM some sort of memory. At this time it is working well, but it would need another long post to explain the complete chain. I hope you found your solution since your last message. Rgds.
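For the curious, the “bypass the LLM if the user request is simple” idea can be sketched roughly like this. The in-memory store, the cosineSim helper, and the 0.9 threshold are all illustrative stand-ins for a real vector database and embedding API:

```javascript
// Toy sketch of the RAG-style cache described above. A real setup would use a
// vector database and an embedding API; here the "store" is a plain array and
// the embeddings are tiny hand-made vectors.
function cosineSim(a, b) {
    let dot = 0, na = 0, nb = 0;
    for (let i = 0; i < a.length; i++) {
        dot += a[i] * b[i];
        na += a[i] * a[i];
        nb += b[i] * b[i];
    }
    return dot / (Math.sqrt(na) * Math.sqrt(nb));
}

// Return a cached answer when a stored request is close enough to the new one,
// otherwise null, meaning "go ask the LLM". The 0.9 threshold is arbitrary.
function lookup(store, queryEmbedding, threshold = 0.9) {
    let best = null;
    let bestSim = -1;
    for (const item of store) {
        const s = cosineSim(item.embedding, queryEmbedding);
        if (s > bestSim) {
            bestSim = s;
            best = item;
        }
    }
    return best && bestSim >= threshold ? best.response : null;
}
```

On a cache hit you answer Assist directly; on a miss you call the LLM and store the new request/response pair.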

Thank you for your initial post, it was very helpful.

“At this time it is working well but would need another long post to explain the complete chain.”

If you ever get round to it, I would be very interested in reading how you implemented RAG.

Thank you for your support. I’ll do my best to make another post on this project…