Hello. I have a Home Assistant Voice Preview Edition and I am trying to fine-tune the TTS output. What's confusing is that there are two places to configure Piper:
1. Settings → Voice Assistants → {{ your_assistant }} → Text-to-speech only lets you pick a TTS engine and a voice.
2. Settings → Add-ons → Piper → Configuration has far more settings. It gives you more voices by letting you choose the speaker (with option 1, if a voice has multiple speakers, you are stuck with the first one), plus length scale and some other fine-tuning parameters that are useful for really dialing in the TTS output.
My problem is that I seem to be able to make edits using option 1, but not option 2. Or maybe I can? I tested by changing the length scale to 2, and it didn't seem to be working, so I changed it to 0.2, but now I wonder whether the length scale of 2 was working after all. Listening to the voice, I couldn't tell, which is why I changed it to 0.2. That doesn't seem to do anything at all, so I'm thinking it never worked and the voice just talks fast sometimes. I've also been using new phrases each time so that cached results don't throw me off.
Is it possible to use option 2, or to get the settings presented by option 2 by some other means? Right now I don't feel like I can get the results I'm looking for with just the controls available from option 1. Thanks.
There are three levels.
1. The physical VPE terminal is not linked to the TTS configuration; it receives data from HA based on the pipeline configuration.
2. Assist and its pipelines in HA interact with external TTS engines. For some of them it is possible to select a voice.
3. The TTS engines themselves are configured either in add-ons (like Wyoming Piper) or through their own integration. This is where the fine-tuning happens.
Thank you for that information. I'm trying to wrap my head around this. So, based on what you are saying, I should be able to configure the Piper add-on that is connected to the Wyoming integration, and the HA Assist pipelines should pick up those changes and pass them along to Voice PE, no? But that's not the behavior I'm seeing.
Hold the phone. The problem was my kryptonite: reading basic directions laid out for me. I thought it wasn't responding to my changes because I was lowering the length scale value and the speech wasn't getting slower. Meanwhile, as you note, directly below the box I was entering those values into was this text:
< 1.0 being faster and > 1.0 being slower
It was responding the whole time, and I can confirm that all of the settings are affecting the output as expected. Thanks for the help, mchk.
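For anyone else who trips over the direction of this setting, here is a small sketch of the rule quoted above. The function names are made up for illustration and are not part of Piper or Home Assistant, and the duration estimate assumes that speech length grows roughly in proportion to the length scale, which is an approximation rather than a documented guarantee.

```python
def describe_length_scale(length_scale: float) -> str:
    """Say which way a length scale value should push the speech speed.

    Follows the hint shown under the add-on option:
    < 1.0 is faster, > 1.0 is slower.
    """
    if length_scale <= 0:
        raise ValueError("length scale must be positive")
    if length_scale < 1.0:
        return "faster"
    if length_scale > 1.0:
        return "slower"
    return "default"


def estimated_duration(base_seconds: float, length_scale: float) -> float:
    """Rough estimate of how long a phrase takes at a given length scale.

    Assumption (hypothetical, for illustration only): duration scales
    linearly with the length scale.
    """
    if length_scale <= 0:
        raise ValueError("length scale must be positive")
    return base_seconds * length_scale


if __name__ == "__main__":
    # The two values tried in this thread, plus the neutral value.
    for value in (0.2, 1.0, 2.0):
        print(value, describe_length_scale(value), estimated_duration(3.0, value))
```

So going from 2 down to 0.2 should make the voice much faster, not slower, which matches what I was hearing.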