Is there a way to record utterances and transcripts from Voice Assistant for fine-tuning Whisper?

  • I am running Voice Assistant preview fully locally with Whisper and Piper
  • On Home Assistant Green
  • Both Home Assistant and Voice Assistant are running the latest firmware

What I want to do is use the voice and transcript data from voice-based interactions to help fine-tune a Whisper STT model to my particular voice. I am a woman and have a general Australian accent, which results in mis-transcriptions such as “start a writing session” => “start a rotting session”.

Imagine saying “start a writing session” in an Australian accent and you can easily hear how it’s mis-transcribed as “rotting”, particularly with a smaller model.

Is there any way to do this? I am technical and run Linux as my daily driver.

I don’t think you can play back the command you gave, but the debug log on the Settings | Voice Assistants page (in the “three dots” menu) should have a record of what the voice assistant thought you said.

You may be able to find the file the voice assistant played in the TTS cache, but it’s pretty hard to identify a particular one.


If you need voice samples, enable debug recording in configuration.yaml:

assist_pipeline:
   debug_recording_dir: /share/assist_pipeline
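
Once recordings start landing in that directory, something like the rough Python sketch below could collect them into a manifest for fine-tuning. I'm assuming the recordings are saved as .wav files somewhere under the directory; I haven't checked the exact folder layout, so it simply searches recursively. The function name build_manifest, the manifest file name, and the JSON field names are all made up for illustration. The "text" field is left empty so you can fill in the correct transcript yourself, for example by checking what the debug log says was recognised and fixing it where it was wrong.

```python
import json
import wave
from pathlib import Path


def build_manifest(recording_dir, manifest_path):
    """Write one JSON line per WAV file found under recording_dir.

    Hypothetical helper: the directory layout and the field names are
    assumptions, not anything Home Assistant or Whisper defines.
    """
    entries = []
    # Search recursively, since the sub-folder layout is not assumed.
    for wav_path in sorted(Path(recording_dir).rglob("*.wav")):
        with wave.open(str(wav_path), "rb") as wav_file:
            frames = wav_file.getnframes()
            rate = wav_file.getframerate()
        entries.append({
            "audio": str(wav_path),
            "duration_s": round(frames / rate, 2),
            "sample_rate": rate,
            "text": "",  # fill in the corrected transcript by hand
        })
    # JSON Lines: one record per line, easy to edit and to load later.
    with open(manifest_path, "w", encoding="utf-8") as out:
        for entry in entries:
            out.write(json.dumps(entry) + "\n")
    return entries
```

You would call it as build_manifest("/share/assist_pipeline", "manifest.jsonl") and then edit the "text" fields. Most Whisper fine-tuning recipes want audio paired with reference text in some form, so a file like this should be easy to convert to whatever your training script expects.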