HA Voice PE: Add post-processing step between "Conversation Agent" and "Speech to Text" step

The basic problem is that Piper uses espeak-ng, whose dictionaries are not adapted to conversational text. espeak-ng tries to voice everything it is given: all characters, emoticons, numbers (without normalization), and so on.

The dictionary for each language would have needed more customization before the Piper voices were trained. But that's a huge amount of work for a project with few people, and time was running out.
If I were OHF, I would allocate resources to re-train the basic voices with these nuances in mind, assuming they plan to keep using Piper as their primary tool.

For now, there are several possible solutions to the existing problem:

  • Add character replacement to the Piper server code.
  • Create an intermediate proxy on the Wyoming protocol that handles normalization. This is the more unified solution. Here is an example of a similar project for STT.
  • Wait for a system-level solution from the developers.
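
To give an idea of the kind of normalization the first two options would perform, here is a minimal Python sketch using only the standard library. Everything in it is an assumption for illustration: the function name `normalize_for_tts`, the replacement table, and the choice of which characters to drop are hypothetical and are not taken from Piper or any Wyoming project. A real table would have to be maintained per language.

```python
import re
import unicodedata

# Hypothetical replacement table (assumption): spoken forms for a few symbols.
# A real one would be language-specific and much larger.
REPLACEMENTS = {
    "°C": " degrees Celsius",
    "%": " percent",
    "&": " and ",
}

# Markdown markup that conversation agents often emit and that should not be read aloud.
MARKDOWN_CHARS = re.compile(r"[*_#`~]")

# Unicode categories to drop: "So" covers most emoji and pictographs,
# "Sk" covers emoji skin-tone modifiers.
DROP_CATEGORIES = {"So", "Sk"}

# Zero-width joiner and variation selector-16, used inside emoji sequences.
DROP_CHARS = {"\u200d", "\ufe0f"}


def normalize_for_tts(text: str) -> str:
    """Return text with symbols spelled out and unspeakable characters removed."""
    # Replace first: some symbols in the table (such as the degree sign)
    # would otherwise be removed by the category filter below.
    for symbol, spoken in REPLACEMENTS.items():
        text = text.replace(symbol, spoken)
    text = MARKDOWN_CHARS.sub("", text)
    text = "".join(
        ch
        for ch in text
        if ch not in DROP_CHARS and unicodedata.category(ch) not in DROP_CATEGORIES
    )
    # Collapse the whitespace left behind by removed characters.
    return re.sub(r"\s+", " ", text).strip()


print(normalize_for_tts("It is 21°C outside 😀 **now**"))
# prints: It is 21 degrees Celsius outside now
```

In the proxy variant, a function like this would be applied to the text of each synthesis request before it is forwarded to Piper, so the same rules would work for any TTS engine behind the proxy.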

I sketched out a test version of the proxy.
