If you want your HA to be as local as possible, stop reading here
I would like to experiment with an AI-first approach for voice.
I imagine a pipeline where a wake word connects the voice device to the OpenAI Realtime API, so all the voice handling is piped straight to OpenAI and the LLM has access to the same functions and context as the AI agent we have today.
The LLM does not even have to talk to the raw HA APIs; it can use "Assist" and just send simple text commands to HA.
If I say "It's too dark in the kitchen", the LLM can use Assist to get the state of the lights in the kitchen and then decide whether it needs to change a dimmer or turn something on.
I've tested this in the OpenAI Playground (with "fake" Assist functions), and the LLM understands what needs to be done very well in my (limited) testing. You can even say things like "a little more please" to turn the dimmer up further.
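To make that concrete, here is roughly the kind of tool definition I mean; the name, description and exact schema shape here are illustrative (the Realtime API's tool format may differ slightly depending on version), but the point is that the model only ever sends plain text and Assist does the entity resolution:

```python
# Illustrative sketch of a single "assist" tool exposed to the model.
# Names and descriptions are my own; adapt to the actual tool schema
# of whatever API version you are using.
ASSIST_TOOLS = [
    {
        "type": "function",
        "name": "assist_command",
        "description": (
            "Send a natural-language command or question to Home Assistant's "
            "Assist, e.g. 'turn on the kitchen lights' or "
            "'what is the kitchen dimmer set to?'. Returns Assist's reply."
        ),
        "parameters": {
            "type": "object",
            "properties": {
                "text": {
                    "type": "string",
                    "description": "Plain-text command or question for Assist.",
                }
            },
            "required": ["text"],
        },
    },
]
```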
Benefits:
Not turn-based: you can even interrupt the assistant and change your mind mid-sentence.
Smarter, and evolves as the models get smarter.
No TTS/STT … just voice (and function calls)
It would be quite easy to build this "outside" of HA and just use the HA APIs (a rough sketch of that glue is below), but it would be best as a pipeline in HA so you can mix assistants.
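A minimal sketch of the "outside of HA" version, assuming the tool above: when the model calls `assist_command`, forward the text to HA's `/api/conversation/process` REST endpoint and hand the reply back as the tool result. `HA_URL` and `HA_TOKEN` are placeholders, and the Realtime event loop and error handling are omitted:

```python
# Glue between a model tool call and Home Assistant's conversation API.
# Placeholders only; not a full implementation of the realtime pipeline.
import requests

HA_URL = "http://homeassistant.local:8123"   # placeholder
HA_TOKEN = "YOUR_LONG_LIVED_ACCESS_TOKEN"    # placeholder

def assist_command(text: str) -> str:
    """Send a plain-text command to HA's Assist and return its spoken reply."""
    resp = requests.post(
        f"{HA_URL}/api/conversation/process",
        headers={"Authorization": f"Bearer {HA_TOKEN}"},
        json={"text": text, "language": "en"},
        timeout=10,
    )
    resp.raise_for_status()
    data = resp.json()
    # The reply text sits under response.speech.plain.speech in the
    # conversation response (as of recent HA versions).
    return data["response"]["speech"]["plain"]["speech"]
```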
What do you all think? Is this the way to Jarvis?