If you want your HA to be as local as possible, stop reading here
I would like to experiment with an AI-first approach to voice.
I imagine a pipeline where a wake word connects the voice device to the OpenAI Realtime API, so all the voice handling is piped straight to OpenAI and the LLM has access to the same functions and context as the AI agent we have today.
The LLM does not even have to talk to the raw HA APIs; it can use "assist" and send simple text commands to HA.
If I say "It's too dark in the kitchen", the LLM can use assist to get the state of the lights in the kitchen and then decide whether it needs to change a dimmer or turn something on.
I've tested this in the OpenAI playground (with "fake" assist functions), and in my (limited) testing the LLM understands very well what needs to be done; you can even say things like "a little more please" to turn the dimmer up further.
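To make it concrete, this is roughly what I imagine those "fake" assist functions could look like as tool definitions given to the model. It's only a sketch: the names assist_command/assist_query and the exact schema fields are my own placeholders, so check the current OpenAI function-calling docs for the real format.

```python
# Hypothetical tool definitions (names and fields are placeholders, not an
# existing HA or OpenAI schema): both just forward natural-language text
# to Home Assistant's Assist, one for commands and one for questions.
ASSIST_TOOLS = [
    {
        "type": "function",
        "name": "assist_command",
        "description": "Send a natural-language command to Home Assistant "
                       "Assist, e.g. 'set the kitchen dimmer to 70%'.",
        "parameters": {
            "type": "object",
            "properties": {
                "text": {"type": "string", "description": "Command to send."}
            },
            "required": ["text"],
        },
    },
    {
        "type": "function",
        "name": "assist_query",
        "description": "Ask Home Assistant Assist about the current state "
                       "of the house, e.g. 'which lights are on in the kitchen?'.",
        "parameters": {
            "type": "object",
            "properties": {
                "text": {"type": "string", "description": "Question to ask."}
            },
            "required": ["text"],
        },
    },
]
```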
Benefits:
Not turn-based; you can even interrupt the assistant and change your mind mid-sentence.
Smarter, and evolves as the models get smarter.
No TTS/STT … just voice (and function calls)
It would be quite easy to build this "outside" of HA and just use the HA APIs, but it would be best as a pipeline in HA so you can mix assistants.
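For the "outside of HA" variant, the bridge can stay small: whenever the model calls one of those assist tools, forward the text to HA's documented /api/conversation/process endpoint. A rough sketch below; the URL, token handling, and function name are placeholders.

```python
"""Rough sketch of the bridge running outside HA: forward the model's tool
call text to Home Assistant's conversation API. HA_URL/HA_TOKEN and the
function name are placeholders."""
import os

import requests

HA_URL = os.environ.get("HA_URL", "http://homeassistant.local:8123")
HA_TOKEN = os.environ["HA_TOKEN"]  # long-lived access token from your HA profile


def forward_to_assist(text: str, language: str = "en") -> dict:
    """Send one natural-language sentence to HA Assist and return its JSON
    response, which can be handed back to the model as the tool result."""
    resp = requests.post(
        f"{HA_URL}/api/conversation/process",
        headers={"Authorization": f"Bearer {HA_TOKEN}"},
        json={"text": text, "language": language},
        timeout=10,
    )
    resp.raise_for_status()
    return resp.json()


# Example: if the model decides the kitchen is too dark, it might call
# forward_to_assist("set the kitchen lights to 80%")
```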
I'm a bit astonished that this topic hasn't seen more activity! The realtime preview via the playground is really impressive.
I'm receiving my Voice Preview Edition this afternoon, with the goal of getting this integrated. I'd opt for the approach where the model has access to (part of) the HA API, instead of having it use assist. Bridging the two doesn't need to be very difficult.
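What I mean by bridging: a thin wrapper over a slice of the HA REST API, exposed to the model as tools. The endpoints (/api/states/&lt;entity_id&gt; and /api/services/&lt;domain&gt;/&lt;service&gt;) are the standard documented ones; the wrapper names and example values in this sketch are just placeholders.

```python
"""Sketch of exposing part of the HA REST API to the model as tools instead
of going through Assist. The endpoints are the documented HA ones; the
wrapper names and example entity IDs are placeholders."""
import os

import requests

HA_URL = os.environ.get("HA_URL", "http://homeassistant.local:8123")
HEADERS = {"Authorization": f"Bearer {os.environ['HA_TOKEN']}"}


def get_state(entity_id: str) -> dict:
    """Read one entity's current state, e.g. light.kitchen."""
    resp = requests.get(f"{HA_URL}/api/states/{entity_id}", headers=HEADERS, timeout=10)
    resp.raise_for_status()
    return resp.json()


def call_service(domain: str, service: str, data: dict) -> list:
    """Call a service, e.g. light.turn_on with a brightness_pct."""
    resp = requests.post(
        f"{HA_URL}/api/services/{domain}/{service}",
        headers=HEADERS,
        json=data,
        timeout=10,
    )
    resp.raise_for_status()
    return resp.json()


# "It's too dark in the kitchen" could end up as:
# get_state("light.kitchen")
# call_service("light", "turn_on", {"entity_id": "light.kitchen", "brightness_pct": 80})
```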
Yes please! This would be fantastic, and yet it still seems not too far out of reach. Maybe the focus on streaming TTS coming in the next version is a small step toward this?
Pay attention to the plumbing being put in place in the March release currently in beta… It doesn't get us 100% of the way there yet, but it supports streaming responses, which would be required before this if my understanding of the architecture is sound…