ChatGPT-generated bedtime story read aloud by ElevenLabs, almost! Need help getting there

I’m trying to insert user-generated text from a dashboard card into a larger ChatGPT prompt. The goal is a unique bedtime story with a theme chosen by the user (my kids), read back either via ElevenLabs TTS or by the user (me, as the parent).

I’m looking for help/recommendations on the best way to combine the base prompt text with the user-generated text, and then POST the full prompt via the rest sensor platform to https://mylocalhass/api/conversation/process (calling the conversation.process service from the UI doesn’t return the response). The challenge is that the payload option of the rest sensor doesn’t support templating.

The user-generated text lives in an input_text, and the whole prompt is stored in a template sensor, using attributes to bypass the 255-character limit on entity states:

    {{ states('input_text.story_prompt') }}
    Result type: string
    The theme of the story is the kids are going on a balloon ride to the ocean.	

And then the template sensor:

template:
  - sensor:
      - name: "GPT Bedtime Story Prompt"
        state: >
          {{ this.attributes.story_prompt | default('null') }}
        attributes:
          full_prompt: >
            I want you to write a bedtime story for my two kids. I want them to be part of the story. Their names are Jill and Jack. Include two lessons from Aesop's fables to be worked into and a part of the story. Do not mention Aesop's fables by name. Instead reference the lessons as a story their parents had told them. {{ states('input_text.story_prompt') }} The story should be as long as it would take 10 minutes for an adult to recite it.
          story_prompt: >
            {{ states('input_text.story_prompt') }}
          gpt_json: >
            '{ "text": "{{ this.attributes.full_prompt }}", "language": "en" }'
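A note on gpt_json: in a block scalar like this, the wrapping single quotes become part of the rendered string, and any double quote inside the story prompt would also break the JSON. A safer sketch (same attribute names as above, and assuming the same `this.attributes` access works in your setup) is to build a dict in the template and let Home Assistant's `to_json` filter handle quoting and escaping:

```yaml
gpt_json: >
  {{ {"text": this.attributes.full_prompt, "language": "en"} | to_json }}
```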

The response is solicited and stored by the following rest sensor:

    sensor:
      - platform: rest
        resource: https://mylocalhass/api/conversation/process
        method: POST
        name: GPT Bedtime Story Response
        timeout: 45
        scan_interval: 31557600
        json_attributes_path: "$.response.speech.plain"
        json_attributes:
          - speech
        headers:
          Authorization: !secret gpt_test_token
          Content-Type: application/json
        value_template: "blah blah"
        payload: >
          {{ state_attr('sensor.gpt_bedtime_story_prompt','gpt_json') }}				
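For context on json_attributes_path: a successful POST to /api/conversation/process returns a nested structure roughly like the sketch below (the values are illustrative; only the keys matter here). The JSONPath `$.response.speech.plain` selects the `plain` object, and `json_attributes` copies its `speech` key onto the sensor:

```python
# Approximate shape of a /api/conversation/process response,
# trimmed to the fields this sensor uses (values are illustrative)
sample = {
    "response": {
        "response_type": "action_done",
        "speech": {
            "plain": {
                "speech": "Once upon a time, Jill and Jack...",
                "extra_data": None,
            }
        },
    },
    "conversation_id": None,
}

# json_attributes_path "$.response.speech.plain" selects this object...
plain = sample["response"]["speech"]["plain"]
# ...and json_attributes ["speech"] copies its "speech" key as an attribute
print(plain["speech"])  # → Once upon a time, Jill and Jack...
```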

Calling homeassistant.update_entity on the above entity produces the following log entries:

    Logger: homeassistant.components.http.data_validator
    Source: components/http/data_validator.py:60
    Integration: HTTP (documentation, issues)
    First occurred: 12:41:07 AM (2 occurrences)
    Last logged: 9:56:04 AM

    Invalid JSON received

    Logger: homeassistant.components.rest.sensor
    Source: components/rest/sensor.py:161
    Integration: RESTful (documentation, issues)
    First occurred: 12:41:07 AM (2 occurrences)
    Last logged: 9:56:04 AM


    JSON result was not a dictionary or list with 0th element a dictionary

All goes well if I replace the payload template with the literal JSON:

'{ "text": "I want you to write a bedtime story for my two kids. I want them to be part of the story. Their names are Jill and Jack. Include two lessons from Aesop's fables to be worked into and a part of the story. Do not mention Aesop's fables by name. Instead reference the lessons as a story their parents had told them. The theme of the story is the kids are going on a balloon ride to the ocean. The story should be as long as it would take 10 minutes for an adult to recite it.", "language": "en" }'
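One quick way to check a rendered payload is to paste it into json.loads. Note that with the gpt_json template above, the single quotes in the block scalar become part of the rendered string, which alone is enough to make the body unparseable (a minimal sketch with a placeholder prompt):

```python
import json

# What the gpt_json attribute renders to: a JSON object wrapped in
# literal single quotes (placeholder text instead of the real prompt)
rendered = "'{ \"text\": \"A bedtime story prompt\", \"language\": \"en\" }'"

try:
    json.loads(rendered)
except json.JSONDecodeError as err:
    # The leading apostrophe is not valid JSON, so parsing fails here
    print(f"invalid JSON: {err}")

# Dropping the wrapping quotes makes it parse cleanly
payload = json.loads(rendered.strip("'"))
print(payload["language"])  # → en
```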

Thoughts/suggestions? What trivial thing am I missing? Thanks in advance!

I’m using the ha_chatgpt custom component (to make use of gpt-3.5-turbo, which the official integration doesn’t support yet) and elevenlabs_tts for the read-back.

Moved it forward using AppDaemon. To get really meta (and because I’m a crap dev), I let GPT-4 do most of the coding. It’s working great, the kiddos love it :slight_smile:

import appdaemon.plugins.hass.hassapi as hass
import json
import requests
import datetime
import time


class AssistHoller(hass.Hass):

    def initialize(self):

      self.story_entity = self.get_entity("sensor.gpt_bedtime_story_prompt")
      self.story_entity.listen_state(self.get_story, attribute = "gpt_json")

      self.story_button = self.get_entity("input_button.story_play")
      self.story_button.listen_state(self.read_story)


    def get_story(self, entity, attribute, old, new, cb_args):

      # A new theme has been saved; POST it to /api/conversation/process to get the full story
      self.payload = self.get_state("sensor.gpt_bedtime_story_prompt", attribute="gpt_json")
      self.headers = {
        'Authorization': self.args["hass_token"],
        'Content-Type': 'application/json'
      }

      # gpt_json already holds a serialized JSON string, so send it as the raw
      # request body; passing it via json= would encode it a second time
      response = requests.post(self.args["hass_url"], data=self.payload, headers=self.headers, timeout=50)

      speech = response.json()['response']['speech']['plain']['speech']
      # Store a timestamp as the state (states are capped at 255 chars)
      # and keep the full story in an attribute
      self.set_state(entity_id="sensor.gpt_bedtime_story_ai_response", state=datetime.datetime.now().isoformat(), attributes={"story": speech})

    def read_story(self, entity, attribute, old, new, cb_args):

      # Button pressed; play the story on the selected speaker
      speaker = self.get_state("input_select.story_speaker")

      story = self.get_state("sensor.gpt_bedtime_story_ai_response", attribute="story")

      # Split the story into chunks on sentence boundaries, per ElevenLabs'
      # 2500-character limit; +2 accounts for the '. ' separator re-added on join
      chunks = []
      chunk = ''
      for sentence in story.split('. '):
          if len(chunk) + len(sentence) + 2 <= 2500:
              if chunk:
                  chunk += '. '
              chunk += sentence
          else:
              chunks.append(chunk)
              chunk = sentence
      if chunk:
          chunks.append(chunk)

      # Play one chunk, wait until it has (probably) finished, then play the next.
      # Note: time.sleep() blocks an AppDaemon worker thread for the duration;
      # a scheduler callback (self.run_in) would be the non-blocking alternative.
      for chunk in chunks:
          self.call_service("tts/elevenlabs_tts_say", entity_id=speaker, message=chunk, options={"voice": "Bella", "stability": "0.5", "similarity": "0.5"})
          time.sleep(160)
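The sentence-chunking step is easy to get subtly wrong (the '. ' separator adds two characters on rejoin), so here is the same idea pulled out as a standalone helper that can be tested outside AppDaemon; the function name and the small limit in the example are just placeholders:

```python
def split_into_chunks(story: str, limit: int = 2500) -> list[str]:
    """Split text into chunks of at most `limit` characters,
    breaking on sentence boundaries ('. ')."""
    chunks = []
    chunk = ''
    for sentence in story.split('. '):
        # +2 accounts for the '. ' separator re-added when joining
        if len(chunk) + len(sentence) + 2 <= limit:
            if chunk:
                chunk += '. '
            chunk += sentence
        else:
            chunks.append(chunk)
            chunk = sentence
    if chunk:
        chunks.append(chunk)
    return chunks


story = "One. Two. Three. Four"
print(split_into_chunks(story, limit=10))  # → ['One. Two', 'Three', 'Four']
```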