The conversation integration allows us to converse with Home Assistant. We can either converse by pressing the microphone in the frontend (supported browsers only, no iOS) or by calling the conversation/process service with the transcribed text. I understand how to call the conversation/process service with the transcribed text. My issue is how speech is converted into text when the microphone is used. Can someone explain how Home Assistant converts speech to text? What API or code is used for that?
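For reference, the text path described above can be sketched as a call to the conversation/process service through Home Assistant's REST API (POST /api/services/<domain>/<service>). This is only a minimal sketch: the base URL and the long-lived access token are placeholders, and `buildProcessRequest` and `sendText` are hypothetical helper names, not part of Home Assistant.

```typescript
// Shape of the HTTP request sent to Home Assistant.
interface ProcessRequest {
  url: string;
  init: { method: string; headers: Record<string, string>; body: string };
}

// Build a POST for the conversation/process service.
// baseUrl and token are placeholders supplied by the caller.
function buildProcessRequest(baseUrl: string, token: string, text: string): ProcessRequest {
  return {
    // Strip trailing slashes so the path is not doubled up.
    url: `${baseUrl.replace(/\/+$/, "")}/api/services/conversation/process`,
    init: {
      method: "POST",
      headers: {
        Authorization: `Bearer ${token}`,
        "Content-Type": "application/json",
      },
      // The service takes the already-transcribed sentence as "text".
      body: JSON.stringify({ text }),
    },
  };
}

// Send the text and resolve to the HTTP status code of the response.
async function sendText(baseUrl: string, token: string, text: string): Promise<number> {
  const { url, init } = buildProcessRequest(baseUrl, token, text);
  const response = await fetch(url, init);
  return response.status;
}
```

Something like `sendText("http://homeassistant.local:8123", "<token>", "turn on the lights")` would then hand the sentence to Home Assistant. No speech recognition is involved on this path.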
I think it uses Stanford University’s Genie (formerly known as Ada with Almond).
You might also look at Rhasspy.
Can someone please clarify this?
I need to know exactly where this speech-to-text conversion happens in HA.
STT was introduced in HA 0.102.
Here is the release post: 0.102: Official Android App, Almond, Scene editor - Home Assistant
But it is not using the stt component. I’m sure about this because I added log statements to the stt component and nothing was logged. Maybe it is using a browser API for the conversion, but I’m not sure. Can someone clarify?
The conversation icon in the Home Assistant web UI uses the Web Speech API in the browser (Chrome only, backed by Google Cloud APIs).
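To illustrate what that means, here is a rough sketch of browser-side speech recognition with the Web Speech API. It is a generic example, not the Home Assistant frontend's actual code: `getRecognitionCtor` and `listenOnce` are hypothetical names, and the small interfaces are declared locally so the snippet does not depend on a particular TypeScript DOM library version.

```typescript
// Minimal structural types for the parts of the Web Speech API used here.
interface RecognitionResultEvent {
  results: ArrayLike<ArrayLike<{ transcript: string }>>;
}
interface Recognition {
  lang: string;
  interimResults: boolean;
  onresult: ((event: RecognitionResultEvent) => void) | null;
  onerror: ((event: unknown) => void) | null;
  start(): void;
}
type RecognitionCtor = new () => Recognition;

// Return the recognizer constructor, or undefined if the browser has none.
// Chrome exposes it under the webkit prefix.
function getRecognitionCtor(scope: Record<string, unknown>): RecognitionCtor | undefined {
  const ctor = scope["SpeechRecognition"] ?? scope["webkitSpeechRecognition"];
  return ctor as RecognitionCtor | undefined;
}

// Start listening and pass the first final transcript to onText.
// Returns false when speech recognition is not available.
function listenOnce(
  scope: Record<string, unknown>,
  onText: (text: string) => void,
  lang: string = "en-US",
): boolean {
  const Ctor = getRecognitionCtor(scope);
  if (!Ctor) {
    return false;
  }
  const recognition = new Ctor();
  recognition.lang = lang;
  recognition.interimResults = false; // only report the final result
  recognition.onresult = (event) => onText(event.results[0][0].transcript);
  recognition.start();
  return true;
}
```

In a browser this could be called as `listenOnce(window as unknown as Record<string, unknown>, (text) => console.log(text))`. The transcription happens in the browser, and the resulting text is what would then be passed on to conversation/process, which would explain why nothing is logged in the stt component.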