AI First approach to Voice (Using OpenAI Realtime API)

If you want your HA to be as local as possible, stop reading here :wink:

I would like to experiment with an AI first approach for voice.

I imagine a pipeline where a wake-word connects the voice device to the OpenAI realtime API so all the voice features are just piped to OpenAI and the LLM has access to the same functions and context as the AI-agent we have today.

The LLM does not even have to talk to the raw HA APIs, it can use “assist” and send simpler texts to HA.

If I say “It’s too dark in the kitchen”, the LLM can use assist to get the state of the lights in the kitchen and then decide if it needs to change a dimmer or turn something on.
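A rough sketch of how assist could be exposed to the model as a single function. This is untested against a real setup: the tool name `assist` and the `send_to_assist` callback are made up for illustration, and the event shapes follow my understanding of the Realtime API's function-calling events, so check them against the current docs.

```python
import json
from typing import Callable

# Hypothetical tool definition: one function that forwards plain text to Assist.
ASSIST_TOOL = {
    "type": "function",
    "name": "assist",
    "description": "Send a short plain-text command or question to Home Assistant Assist.",
    "parameters": {
        "type": "object",
        "properties": {
            "text": {"type": "string", "description": "Sentence to pass to Assist."}
        },
        "required": ["text"],
    },
}


def session_update_event() -> dict:
    """Client event that registers the tool for the session (assumed shape)."""
    return {
        "type": "session.update",
        "session": {"tools": [ASSIST_TOOL], "tool_choice": "auto"},
    }


def handle_function_call(event: dict, send_to_assist: Callable[[str], str]) -> list[dict]:
    """Run a completed function call and build the events to send back.

    `event` is assumed to be a finished function call carrying
    `name`, `call_id` and a JSON string in `arguments`.
    """
    if event.get("name") != "assist":
        raise ValueError(f"unknown function: {event.get('name')!r}")
    args = json.loads(event["arguments"])
    reply = send_to_assist(args["text"])  # whatever text Assist answers with
    return [
        {
            "type": "conversation.item.create",
            "item": {
                "type": "function_call_output",
                "call_id": event["call_id"],
                "output": reply,
            },
        },
        # Ask the model to continue speaking now that it has the result.
        {"type": "response.create"},
    ]
```

Here `send_to_assist` could be anything that takes a sentence and returns Assist's answer, for example a small wrapper around HA's conversation endpoint (`/api/conversation/process`, as far as I know). Keeping it as a callback means the bridging logic can be tried out with a fake before touching a real HA instance.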

I’ve tested this in the OpenAI playground (with “fake” assist functions), and the LLM understands what needs to be done very well in my (limited) testing. You can even say things like “a little more please” to turn up the dimmer even more.

Benefits:
Not turn-based, you can even interrupt the assistant and change your mind mid sentence.
Smarter, and evolves as the models get smarter.
No TTS/STT … just voice (and function calls)
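For the interruption part, a minimal sketch of what the client side might do, assuming server-side voice activity detection is on. The `InterruptionTracker` class is hypothetical, and the event names (`input_audio_buffer.speech_started`, `response.cancel`, `conversation.item.truncate`) are the Realtime API ones as I understand them, so treat this as an illustration rather than a reference.

```python
from typing import Optional


class InterruptionTracker:
    """Tracks how much assistant audio was played so it can be cut off."""

    def __init__(self) -> None:
        self.active_item_id: Optional[str] = None
        self.played_ms = 0

    def on_assistant_audio(self, item_id: str, chunk_ms: int) -> None:
        """Call this whenever a chunk of assistant audio has been played."""
        if item_id != self.active_item_id:
            self.active_item_id = item_id
            self.played_ms = 0
        self.played_ms += chunk_ms

    def on_playback_finished(self) -> None:
        """Call this when the device has finished playing the reply."""
        self.active_item_id = None
        self.played_ms = 0

    def on_server_event(self, event: dict) -> list:
        """Return the client events to send when the user starts talking."""
        if event.get("type") != "input_audio_buffer.speech_started":
            return []
        if self.active_item_id is None:
            return []  # nothing is playing, so nothing to interrupt
        events = [
            # Stop generating the rest of the reply.
            {"type": "response.cancel"},
            # Tell the server how much of the reply the user actually heard.
            {
                "type": "conversation.item.truncate",
                "item_id": self.active_item_id,
                "content_index": 0,
                "audio_end_ms": self.played_ms,
            },
        ]
        self.on_playback_finished()
        return events
```

The voice device would also have to stop its local playback at the same moment. This sketch only covers the bookkeeping toward the API.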

It would be quite easy to build this “outside” of HA and just use the HA APIs,
but it would be best as a pipeline in HA so you can mix assistants.

What do you all think? Is this the way to Jarvis?

I’m a bit astonished that this topic hasn’t seen more activity! The realtime preview via the playground is really impressive.

I’m receiving my Voice Preview Edition this afternoon, with the goal of getting this integrated. I’d opt for the approach where the model has access to (part of) the HA API, instead of having it use assist. Bridging the two doesn’t need to be very difficult.
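To give an idea of what “part of the HA API” could mean, here is a small sketch that only lets a whitelisted set of service calls through and turns them into REST requests. The `ALLOWED_SERVICES` set, the `build_service_call` helper and the `light.kitchen` entity are all made up for illustration. The `/api/services/<domain>/<service>` path and bearer token header are the HA REST API as I know it.

```python
import json
import urllib.request

# Hypothetical whitelist: the only services the model is allowed to call.
ALLOWED_SERVICES = {("light", "turn_on"), ("light", "turn_off")}


def build_service_call(
    base_url: str, token: str, domain: str, service: str, data: dict
) -> urllib.request.Request:
    """Build (but do not send) a POST request for a whitelisted HA service."""
    if (domain, service) not in ALLOWED_SERVICES:
        raise ValueError(f"service not exposed to the model: {domain}.{service}")
    url = f"{base_url.rstrip('/')}/api/services/{domain}/{service}"
    return urllib.request.Request(
        url,
        data=json.dumps(data).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",  # long-lived access token
            "Content-Type": "application/json",
        },
        method="POST",
    )


# Example: the model decided the kitchen needs more light.
request = build_service_call(
    "http://homeassistant.local:8123",
    "YOUR_TOKEN",
    "light",
    "turn_on",
    {"entity_id": "light.kitchen", "brightness_pct": 60},
)
# urllib.request.urlopen(request) would actually send it.
```

The function-call handler on the Realtime side would map the model's arguments onto `domain`, `service` and `data`, and anything outside the whitelist is simply refused.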

I agree with this… While local is great, would love an option like this. This would make it more fluid like Alexa and Siri used to be… :slight_smile:

Realtime API is so good. Looking forward to this integration.