I wanted a way to quickly swap settings and compare Whisper’s behavior without a thousand restarts and/or reflashes, while keeping the logic as close to Home Assistant & ESPHome as possible.
I ended up creating a little tool, and I thought others might find it useful.
Thanks to ESPHome & Home Assistant using Python, it only required porting the various functions to get a pretty decent match (I hope; PRs welcome if you find a place where the “matching” can be improved). Porting the audio processor to Windows was another story, but still, done…!
I added a simple UI on top for ease of use (screenshot provided in the link).
This is only a first release (tested by 1!). Expect bugs (hopefully not too many!); I’ll try to fix them in a timely fashion.
I’m sure this will eventually be obsolete and replaced with a nice UI from within HA’s Developer Tools, but until then…
Features
- Models & configs from Home Assistant / ESPHome.
  - “auto” language detection supported.
- Save processed audio (without running transcribe) to identify differences with the original.
  - Lets you tweak the ESPHome config based on what the microphones actually recorded.
- Timing of the various execution steps (with more details in the logs).
  - The timings can’t be compared directly to Home Assistant’s, but they let you evaluate the impact of various configs on a per-machine basis.
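To give an idea of what per-machine timing comparisons look like, here is a minimal sketch. It is not the tool's code: `timed`, `compare_configs`, and the `run` callable are hypothetical names, and `run` stands in for whatever actually processes or transcribes a file with a given config.

```python
import time
from contextlib import contextmanager


@contextmanager
def timed(label, results):
    """Store the wall-clock duration of the enclosed block in results[label]."""
    start = time.perf_counter()
    try:
        yield
    finally:
        results[label] = time.perf_counter() - start


def compare_configs(configs, run):
    """Call run(config) once per named config and return {name: seconds}.

    `configs` maps a display name to an arbitrary config object;
    `run` is a placeholder for the real processing step (an assumption).
    """
    results = {}
    for name, config in configs.items():
        with timed(name, results):
            run(config)
    return results
```

The absolute numbers only mean something on the machine that produced them, so the useful part is the relative difference between two configs run back to back on the same file.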
Requirements
You’ll need to know enough Python to install it and run a couple of package installs (commands given in the link) before you can run this.
And of course, *.wav files to process…
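If you are not sure what your recordings contain, a quick look at the WAV headers can help before feeding them to anything. This is only a sketch using Python's standard `wave` module; `describe_wav` and `describe_folder` are hypothetical helper names, not part of the tool.

```python
import wave
from pathlib import Path


def describe_wav(path):
    """Return basic header info (channels, rate, bit depth, duration) for one WAV file."""
    with wave.open(str(path), "rb") as wav:
        frames = wav.getnframes()
        rate = wav.getframerate()
        return {
            "channels": wav.getnchannels(),
            "sample_rate_hz": rate,
            "bits_per_sample": wav.getsampwidth() * 8,
            "duration_s": frames / rate if rate else 0.0,
        }


def describe_folder(folder):
    """Map each *.wav file name in `folder` to its header info, sorted by name."""
    return {p.name: describe_wav(p) for p in sorted(Path(folder).glob("*.wav"))}
```

Calling `describe_folder("recordings")` (the folder name is just an example) returns one small dict per file, which makes it easy to spot a recording whose sample rate or channel count is not what you expected. Note that `wave` only reads uncompressed PCM files.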
So far this has only been tested on Windows 11. Check below in case other users report results on other platforms.