- I am running Voice Assistant preview fully locally with Whisper and Piper
- On Home Assistant Green
- Both Home Assistant and Voice Assistant are running the latest firmware
What I want to do is use the voice and transcript data from voice-based interactions to help fine-tune a Whisper STT model to my particular voice. I am a woman and have a general Australian accent, which results in mis-transcriptions such as “start a writing session” => “start a rotting session”.
Imagine saying “start a writing session” in an Australian accent and you can easily hear how it’s mis-transcribed as “rotting”, particularly with a smaller model.
Is there any way to do this? I am technical and run Linux as my daily driver.
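
To make the problem measurable, here is a minimal sketch of how I would score a transcription against what I actually said, using word error rate (WER). It is plain Python and does not touch any Whisper or Home Assistant API; `word_error_rate` is just a helper name I made up, and the phrase pair is the example from above. The idea would be to compare the score before and after any fine-tuning.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()

    # Levenshtein distance over words, keeping only the previous row.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # word missing from the hypothesis
                curr[j - 1] + 1,     # extra word in the hypothesis
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr

    # Guard against an empty reference to avoid dividing by zero.
    return prev[-1] / max(len(ref), 1)


said = "start a writing session"
heard = "start a rotting session"
print(word_error_rate(said, heard))  # one substitution out of four words: 0.25
```

Averaging this over a set of recorded commands with corrected transcripts would give a baseline to compare a fine-tuned model against.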