Hi, I was planning to set up a voice assistant in my house, and I saw that the usual pipeline is STT → LLM → TTS. However, some good omni models (any-to-any modality) have been released recently, which got me wondering whether I could use one directly in the voice assistant pipeline.
Do you know if this functionality is already available?