I want to use the GPT-3 API to build my own super smart voice assistant. What hardware/software can I use for voice recognition?

I’ve played around with GPT-3 in the OpenAI playground and have been really impressed. I taught it how to return a JSON object where it can call Home Assistant services, say a response, or ask follow-up questions. It’s basically JARVIS from Iron Man, and feels like it could be 1000x smarter than Alexa. I’m excited about the idea of fine-tuning GPT-3 and providing many more examples, so that it knows about all of my entities and all of the HA services. Plus some extra things like integrations with Google, Wikipedia, IMDB, etc.
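The three-way reply format described above (call a service, say a response, or ask a follow-up) can be dispatched with a few lines once the model answers in JSON. A minimal sketch — the field names `service`, `question`, and `say` are my own guesses at a schema, not necessarily what the poster used:

```python
import json

def classify_reply(reply_json):
    """Classify a GPT-3 reply into one of the three behaviours described:
    call a Home Assistant service, ask a follow-up, or just speak."""
    reply = json.loads(reply_json)
    if reply.get("service"):   # e.g. {"service": "light.turn_on", "entity_id": "..."}
        return "call_service"
    if reply.get("question"):  # e.g. {"question": "Which room?"}
        return "ask_followup"
    return "say"               # e.g. {"say": "It is 21 degrees."}
```

A dispatcher like this keeps the GPT-3 prompt and the Home Assistant side loosely coupled, which makes it easier to add the Google/Wikipedia/IMDB integrations later.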

I’m just not sure how to handle the speech-to-text part. I already have Echo devices around my house. Is there a way to use Amazon for the voice recognition, but hack it so that my custom software does all of the processing? Can I root the Echos and install my own OS to use their mic and speaker? Or maybe rip out the Echo internals and replace them with a Raspberry Pi or ESP32?

Or is there a more open device similar to an Echo, with a microphone and speaker? Or something I can 3D print and assemble?

I don’t mind using a cloud service for speech to text, because I want it to be extremely accurate and fast. (It would listen for a wake word before it starts streaming audio to the API, just like Alexa.)

How is the accuracy of Ada/Almond’s speech recognition? Could I use them but plug in my own code that calls the GPT-3 API?

5 Likes

One idea I had was to use Rhasspy for the voice-to-text part of the system (e.g. Raspberry Pi + microphone), capture the input and pass it to a service running a pre-configured GPT-3 instance, then use HA or Rhasspy to respond.
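As a sketch of that capture step: Rhasspy speaks the Hermes MQTT protocol, where recognized sentences appear on the `hermes/asr/textCaptured` topic as JSON with a `text` field — verify the topic against your Rhasspy version, and the `ask_gpt3` hook mentioned in the comment is hypothetical:

```python
# Sketch of the glue between Rhasspy and GPT-3. Topic and payload shape
# follow my reading of the Hermes protocol; treat them as assumptions.
import json

ASR_TOPIC = "hermes/asr/textCaptured"

def extract_text(payload_bytes):
    """Pull the recognized sentence out of a textCaptured payload."""
    payload = json.loads(payload_bytes)
    return payload.get("text", "")

if __name__ == "__main__":
    # Requires `pip install paho-mqtt` (1.x callback API shown here).
    import paho.mqtt.client as mqtt

    def on_message(client, userdata, msg):
        sentence = extract_text(msg.payload)
        if sentence:
            print("Heard:", sentence)  # here you would call ask_gpt3(sentence)

    client = mqtt.Client()
    client.on_message = on_message
    client.connect("localhost", 1883)
    client.subscribe(ASR_TOPIC)
    client.loop_forever()
```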

3 Likes

I’m investigating the same thing.

I’m also looking into Unreal Engine and Omniverse to give the assistant a human-like avatar.

Amazon Polly for voice, but that could change.

1 Like

I’m also investigating this space.
Regarding the voice-to-text: maybe you could have a look at the recently released open-source AI model Whisper: Introducing Whisper

There are probably also some online companies who can host it for you.

1 Like

Someone has released a quantized/TFLite model of whisper tiny.en and an .apk demo

The tiny model may not be as accurate, and it does not have ASR streaming over websockets like existing cloud STT services, e.g. assembly.ai
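For comparison, the full-precision tiny.en model can also be run locally through the openai-whisper Python package. It transcribes complete clips rather than streaming, so this is a batch sketch (the file name is a placeholder):

```python
# Sketch: offline transcription of a recorded clip with the tiny.en model.
# There is no streaming here — whisper works on complete audio files.
def clean_transcript(result):
    """Whisper returns a dict with a "text" key; trim the padding it adds."""
    return result.get("text", "").strip()

if __name__ == "__main__":
    # Requires `pip install openai-whisper` (and ffmpeg on the system).
    import whisper

    model = whisper.load_model("tiny.en")   # small, CPU-friendly model
    result = model.transcribe("command.wav")
    print(clean_transcript(result))
```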

Be sure to prompt GPT-3 with example JSON key/values to avoid parsing errors; I’ve had some experience with that. The new OpenAI davinci model allows for suffix text, which might help there. Good luck
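To make that concrete, here is a sketch of a few-shot prompt that embeds example key/values. The `action`/`entity_id`/`say` schema and the example pairs are illustrative assumptions:

```python
# Sketch of a few-shot prompt that shows GPT-3 the exact JSON shape you
# expect back, so completions parse reliably.
EXAMPLES = [
    ("Turn on the kitchen light",
     '{"action": "light.turn_on", "entity_id": "light.kitchen", '
     '"say": "Kitchen light is on."}'),
    ("What is the temperature in the office?",
     '{"action": "none", "entity_id": "", '
     '"say": "The office is at 21 degrees."}'),
]

def build_prompt(user_text):
    """Prepend worked input/output pairs so the model imitates the format."""
    parts = ["Answer only with a single JSON object.\n"]
    for question, answer in EXAMPLES:
        parts.append("User: " + question)
        parts.append("JSON: " + answer)
    parts.append("User: " + user_text)
    parts.append("JSON:")
    return "\n".join(parts)
```

The completion is then far more likely to parse with `json.loads`, though it still pays to wrap the parse in a try/except.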

1 Like

Here is what I have so far.

Installed the openai library with the command: pip install openai

Also ran this command: pip install openai requests urllib3

Create:
A "gpt_chat" folder inside of the "custom_components" folder.

Create two files:
File one: "__init__.py"
with contents:

import openai

openai.api_key = "YOUR_API_KEY"

def generate_response(input_text):
    prompt = f"{input_text}\n\nType your response here:"
    response = openai.Completion.create(engine="text-davinci-002", prompt=prompt)
    return response["choices"][0]["text"]

File two: "manifest.json"
with contents:

{
  "domain": "gpt_chat",
  "name": "GPT Chat",
  "description": "A chatbot powered by GPT-3",
  "version": "0.1.0",
  "requirements": ["openai"]
}

(Note: Python packages go under "requirements"; "dependencies" is only for other Home Assistant integrations, and custom components need a "version" key.)

Then add gpt_chat: to configuration.yaml.

I have no idea what I am doing, but at this point I get a "configuration invalid" error saying that I don’t have the component gpt_chat.

If anyone has any ideas, I would appreciate the help.

Doing the same thing. Did you make some progress?

Hi everyone. I recently got started doing the same thing. I am going to use a Raspberry Pi with the official 7" touchscreen to try to build a graphical interface which includes an avatar for this.

Regarding programming that accesses the OpenAI API, I got ChatGPT to write the code for me. Ask it something like: "Write a Python program with a graphical user interface which takes a text input from the user and sends it to the OpenAI GPT-3 API. It should display the output text from the OpenAI API to the user via the graphical user interface." This code should do it (ChatGPT wrote it and I tested it on my laptop):

import openai
import tkinter as tk

# Set the OpenAI API key
openai.api_key = "YOUR_API_KEY"

# Create the GUI window
window = tk.Tk()
window.title("OpenAI GPT-3 Demo")

# Create a function to process the user's input and get a response from the OpenAI API
def get_response():
    # Get the user's input
    user_input = input_text.get()
    
    # Use the OpenAI API to generate a response
    response = openai.Completion.create(engine="text-davinci-002", prompt=user_input, max_tokens=1024, temperature=0.5)
    
    # Get the response text from the API
    response_text = response['choices'][0]['text']
    
    # Display the response text in the GUI
    output_label.config(text=response_text)

# Create the input field for the user
input_text = tk.StringVar()
input_entry = tk.Entry(window, textvariable=input_text)
input_entry.pack()

# Create a button to trigger the API request
request_button = tk.Button(window, text="Get Response", command=get_response)
request_button.pack()

# Create a label to display the response text
output_label = tk.Label(window, text="")
output_label.pack()

# Run the GUI
window.mainloop()

It uses the Tkinter library for the GUI, so make sure to install that library as well. The line that actually deals with the OpenAI API is line 17. It specifies which engine to use (text-davinci-002 is the highest-end right now, I think), the prompt string, the maximum number of tokens to generate (1,000 tokens cost $0.02 and yield approximately 750 words), and the temperature (roughly, how creative or unpredictable the output is). Line 20 pulls the actual text response out of the rest of the data sent back. To get your OpenAI API key, go to beta.openai.com, create an account, then add a payment method. You do need to pay to use the API, although it costs a fraction of a cent per reasonably sized call. You can also set hard and soft spending limits on their website, so don’t be afraid to pay.

As for speech to text, I’ve run into a bit of trouble. After prompting ChatGPT to add speech recognition to the above program, I found that the Python SpeechRecognition library did not perform very well once I moved more than a foot away from my laptop’s microphone, and that it also lacks the capability to always listen for a wake word.
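A crude workaround is to keep listening and only act when the transcript contains the wake word. This is a sketch, not a real wake-word engine (it transcribes every utterance, so it is slow, and audio leaves the room if you use a cloud recognizer); the wake word "jarvis" and the helper names are my own assumptions:

```python
# Sketch: poll the microphone, transcribe each phrase, and only forward
# phrases that contain the wake word. Not a substitute for a dedicated
# wake-word engine such as Porcupine or openWakeWord.
WAKE_WORD = "jarvis"  # assumption: pick whatever word you like

def contains_wake_word(transcript, wake_word=WAKE_WORD):
    """True if the (case-insensitive) transcript mentions the wake word."""
    return wake_word.lower() in transcript.lower()

def strip_wake_word(transcript, wake_word=WAKE_WORD):
    """Remove the wake word so only the actual command is sent to GPT-3."""
    idx = transcript.lower().find(wake_word.lower())
    if idx == -1:
        return transcript.strip()
    return (transcript[:idx] + transcript[idx + len(wake_word):]).strip(" ,.")

if __name__ == "__main__":
    # Requires `pip install SpeechRecognition pyaudio`.
    import speech_recognition as sr

    recognizer = sr.Recognizer()
    with sr.Microphone() as source:
        recognizer.adjust_for_ambient_noise(source)
        while True:
            audio = recognizer.listen(source)
            try:
                text = recognizer.recognize_google(audio)
            except sr.UnknownValueError:
                continue  # could not understand the audio; keep listening
            if contains_wake_word(text):
                print("Command:", strip_wake_word(text))
```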

If anybody has any updates on implementing a real time avatar, speech to text, or text to speech, I would love to hear them.

Thanks

3 Likes

Home Assistant should produce a new box (like Home Assistant Blue) with a good microphone for ChatGPT, because it is very hard for HA users to build microphones as good as Alexa’s; making high-quality microphones is hard. So the community should focus on the ChatGPT integration itself. But ChatGPT should also be able to draw new Lovelace cards and create new automations.

Check this out

4 Likes

This is pretty awesome!!! I did notice a lot of cuts, so the API calls and responses probably take some time.
I feel like the first person that integrates ChatGPT with Rhasspy and HA is going to win the internet lol. If it’s made good enough, it could help pull some of the general public to HA as well.

1 Like

Yeah he said somewhere that long inquiries take about 3-5 seconds, but short ones are faster.

I’m actually trying to use Node-RED to integrate them both… no luck as of today, but getting closer

Here’s a little spoiler:

4 Likes

Look at my post!
https://community.home-assistant.io/t/using-gpt3-and-shorcuts-to-talk-with-home-assistant-in-a-very-smart-way/522908

Hi guys, this sounds awesome. I am still a beginner with all of this, but I hope to do the same as the poster described. Can you help me out with where to start and how to integrate ChatGPT (GPT-3) into Home Assistant, and how to do the text-to-speech and speech-to-text? Can I use my Google Home for this and hack it so that only ChatGPT is working, and how do I install all of this in Home Assistant?

Hope you guys can help me out.

Many thanks,

Best regards, Robin

You can try mating Alexa with OpenAI’s GPT-3.

With a little help, this worked for me.

1 Like

Have you guys seen the news? ChatGPT plugins

A plugin would work.

Hi, I found this repository on github: GitHub - qui3xote/OpenAIConversationEnhanced: An implementation of ChatGPT as a conversation agent that can actually control your home.

It is basically a HACS replacement for the default OpenAI chat integration in Home Assistant. The difference is, this one can really do something with your lights and so on.

The prompt sent to ChatGPT by this integration describes a few things. First, it tells ChatGPT to generate only JSON-formatted answers. The answers are described in a way that lets the integration pass ChatGPT’s answer straight to the Home Assistant service handler.
ChatGPT gets an idea of the home and its structure from a description of the rooms and their devices.
With this knowledge it is possible to control the home with nearly natural speech. And you can ask everything else, too.

In my case, the integration didn’t work, because I have too many devices to create an automatic description of my home for ChatGPT.

So I changed some things manually and it worked.
I removed the content of home_info_template in the file const.py, and instead wrote the information about my house and the rooms into the template that you can edit within Home Assistant, where you also set the ChatGPT model and so on.

I only experimented with one room of my house (Arbeitszimmer/Office).

So I added the following description at the end of the template, which explains to ChatGPT what to do:

Properties of the smart home:

Office:
Temperature is {{ states('sensor.temperatur_arbeitszimmer') }}°C
Printer Switch is {{ states('switch.drucker') }}, use "switch.drucker" as entity_id in the JSON.
Light is {{ states('light.0x588e81fffeef3214') }}, use "light.0x588e81fffeef3214" as entity_id in the JSON.

With this information, ChatGPT was able to answer questions about the states of the switch, the light, or the temperature, but also to generate a JSON command that was valid for the service handler from Home Assistant. Within this integration, the service handler is called if the answer from ChatGPT contains a JSON command, and the content of the comment field is returned as the answer. It contains, for example, a short acknowledgement of the things ChatGPT has done, something like "I have turned the switch off" or just "OK". If you ask for the temperature, there is a full answer in there.

The smart thing is that ChatGPT was able to create JSON to change the light color, the light color temperature, and on/off, and the switch too.
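To illustrate, here is a sketch of how such a JSON answer could be validated and turned into a service call plus a spoken comment. The exact schema (`domain`, `service`, `data`, `comment` keys) is my assumption, not necessarily what the repository uses:

```python
import json

REQUIRED_KEYS = {"domain", "service", "data", "comment"}

def parse_command(reply):
    """Parse a ChatGPT reply into (service_call, comment).

    Returns (None, reply) when the reply contains no valid JSON command,
    so plain conversational answers pass through unchanged.
    """
    try:
        payload = json.loads(reply)
    except json.JSONDecodeError:
        return None, reply
    if not REQUIRED_KEYS.issubset(payload):
        return None, reply
    call = {
        "domain": payload["domain"],    # e.g. "light"
        "service": payload["service"],  # e.g. "turn_on"
        "data": payload["data"],        # e.g. {"entity_id": "...", "color_name": "red"}
    }
    return call, payload["comment"]

# With a Home Assistant `hass` object, the call would then be dispatched as:
#   hass.services.call(call["domain"], call["service"], call["data"])
```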

And there was no need to build a specific sentence to tell chatgpt what to do. Just natural speaking, in my case in german, but described the home and so on in english. I gave the rule, to answer in german only in the comment.

The next step would be to transfer the result of a speech-to-text component like Rhasspy to the conversation/process endpoint of Home Assistant, and then convert the result back to speech.

It is really great base work by the creator of the repository, and I hope there is a way to complete the whole workflow of this someday :wink:

Add: I created a fork of the repository with all the changes I needed to get this working with my installation.

4 Likes

A way to handle the speech input: maybe there is a way to send the recognized speech to MQTT and pass it with an MQTT automation to the conversation service inside Home Assistant.
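Alternatively, the recognized text can be POSTed directly to Home Assistant’s REST endpoint `/api/conversation/process` with a long-lived access token. A sketch with stdlib only; the host and token below are placeholders:

```python
# Sketch: forward recognized text to Home Assistant's conversation API
# over REST instead of MQTT.
import json
import urllib.request

HA_URL = "http://homeassistant.local:8123"  # assumption: default HA address
TOKEN = "YOUR_LONG_LIVED_ACCESS_TOKEN"

def build_request(text, ha_url=HA_URL, token=TOKEN):
    """Build the POST request for Home Assistant's /api/conversation/process."""
    return urllib.request.Request(
        ha_url + "/api/conversation/process",
        data=json.dumps({"text": text}).encode("utf-8"),
        headers={
            "Authorization": "Bearer " + token,
            "Content-Type": "application/json",
        },
        method="POST",
    )

if __name__ == "__main__":
    req = build_request("Turn on the office light")
    with urllib.request.urlopen(req) as resp:
        print(json.load(resp))  # HA's reply, including the spoken answer
```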