Allow Event Firing in Place of TTS for the Voice Pipeline

Currently, the assistant pipeline runs something like:

Wake word spoken → speech-to-text → AI → text-to-speech → play TTS audio at the source

I am suggesting the pipeline allow us to go straight from AI to our own automations, which may or may not call into TTS. This could be accomplished by having another “TTS” provider, instead of Piper, that simply fires an event carrying the original text, the response text, and the source (i.e. which satellite, or the HA device itself) rather than actually doing text-to-audio processing. An even more fully populated payload could look something like:

{
  "source": "foo",
  "source_area": "foo Room",
  "source_ip": "127.0.0.1",
  "query": "bar",
  "response": "foobar",
  "timestamp": 12345
}
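
Until such a provider exists, the payload shape is easy to prototype: an ordinary script can fire the same event using Home Assistant’s built-in event action. A minimal sketch, where the event type `assist_pipeline_response` and every field value are hypothetical placeholders:

```yaml
# Prototype: fire the proposed event by hand from a script.
# "assist_pipeline_response" is a hypothetical event type, not an existing one.
script:
  fire_assist_response:
    sequence:
      - event: assist_pipeline_response
        event_data:
          source: kitchen_satellite
          source_area: Kitchen
          query: "what's the weather"
          response: "It is sunny and 21 degrees."
```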

This event could then be used for a variety of purposes. For instance:

  1. Feed the text back into the TTS integration, to play the audio out on a device of our choosing rather than on the source satellite. This seems to be a very commonly requested feature (see the sketch after this list)
  2. Allow us to use existing external TTS services - such as calling into the Sonos integration’s TTS functionality. This gives greater compatibility and room for growth without any more work from the developers
  3. Display the text on a dashboard rather than speaking it. Beyond just being cool, this could be a great accessibility feature for deaf users
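
As an example of the first use case, a plain event-trigger automation could replay the response through the tts.speak service on a media player of our choosing. A rough sketch, assuming the hypothetical `assist_pipeline_response` event above plus placeholder entity IDs (`tts.piper`, `media_player.living_room_speaker`):

```yaml
automation:
  - alias: "Replay assist response on the living room speaker"
    trigger:
      - platform: event
        event_type: assist_pipeline_response  # hypothetical event type
    action:
      - service: tts.speak
        target:
          entity_id: tts.piper  # placeholder TTS engine entity
        data:
          media_player_entity_id: media_player.living_room_speaker
          message: "{{ trigger.event.data.response }}"
```

The third use case would be the same trigger with input_text.set_value as the action, so a dashboard card can display the stored text instead.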

Lastly, this should be easy to implement. Since the voice assistant is already a pipeline, this feature would simply involve returning earlier in that pipeline. Aside from a bit of data conversion at the end to expose the payload “nicely”, the work should mostly be done already

Let’s be creative and see what we can do with the text, rather than waiting for the full response!