Despite the instructions in the prompt, many models use “*” symbols in the response (** for bold text, * for list items). These characters are then read aloud by the TTS (some TTS engines do the normalisation themselves, but most can’t, Piper included).
It would be a great idea to add a simple filter (as an option) to the assist pipeline (after the conversation module) that removes the formatting from the text.
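For anyone who wants to see roughly what such a filter could look like, here is a minimal sketch in plain Python. It is not an existing Home Assistant option or API; the function name `strip_markdown_for_speech` and the choice of which markers to remove are my own assumptions, and it only covers the most common Markdown an LLM tends to emit.

```python
import re


def strip_markdown_for_speech(text: str) -> str:
    """Remove common Markdown markers so a TTS engine does not read them aloud.

    Hypothetical helper for illustration; not part of Home Assistant.
    """
    # Fence markers (``` or ```lang): drop the marker, keep the content.
    text = re.sub(r"```[^\n]*\n?", "", text)
    # Links: [label](url) -> label
    text = re.sub(r"\[([^\]]+)\]\([^)]*\)", r"\1", text)
    # List bullets at the start of a line: "* item", "- item", "+ item" -> "item"
    text = re.sub(r"(?m)^[ \t]*[*+-][ \t]+", "", text)
    # Headings: "## Title" -> "Title"
    text = re.sub(r"(?m)^[ \t]*#{1,6}[ \t]+", "", text)
    # Paired emphasis and inline code: **bold**, __bold__, *italic*, `code`
    text = re.sub(r"(\*\*|__|\*|`)(.+?)\1", r"\2", text)
    # Any asterisks still left over would be spoken, so remove them too.
    text = text.replace("*", "")
    return text.strip()


print(strip_markdown_for_speech("**Vehicle:** a red car\n* parked\n* driver visible"))
```

Note that this is deliberately aggressive: a literal asterisk in something like "2 * 3" is removed as well, which is one reason it would make sense as an opt-in setting rather than default behaviour.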
Run OAI with two different TTS engines and force something that will put markup in the response… “Asterisk asterisk” pops out. If you’re running Piper, it recites it faithfully, which is annoying.
All the alternate speech engines I’m using ignore markup. So I’d put this on Piper to ignore markup.
Workaround: an entry, worded VERY sternly and placed very high up in your prompt, that says something like:
You are serving a voice pipeline - do not use markup in your response, use simple ASCII only.
While not 100% effective, it seems to squish most of it for me.
I won’t vote for an FR to strip formatting (seems like a sledgehammer where we need a scalpel), but I would TOTALLY vote for an FR for Piper.
I disagree. It’s the agent’s task to deliver the text both to a text terminal window, where formatting makes sense, and to speech synthesis, where it does not.
No UI for LLMs displays formatting symbols, which is reasonable.
Shifting this functionality to the TTS module is technically incorrect.
Though you’re right that most TTS engines ignore these symbols.
When I ask for certain responses from Preview Edition, it verbally says “asterisk” for every bullet point. There is an example in the short (1 minute) video that I uploaded to YouTube. In this specific example, I have it set up to describe certain alerts from my cameras: I drive up the driveway and park my car. Pretty simple. Basically, I have LLM Vision set up to analyze a video with Gemini (it does the same with ChatGPT), and the prompt is set as follows:
“Briefly describe the vehicle in the video, what it’s doing and its occupant if they are visible. If there is a license plate number available, please detect it and include that in the response.”
Then it announces on Preview Edition.
But it audibly includes the “asterisks” in the description of the video. It’s comically annoying. https://youtu.be/5msnfk7IxfI
I’m using Home Assistant Cloud for my TTS because I found it to be faster, but the “asterisk asterisk asterisk” is extremely annoying.
The cloud TTS is located in the HA container, at homeassistant/components/cloud/tts.py
Most likely, changes can be made there, but they will be reset with each update.
A system component is the worst place to patch code by hand; only proceed if you are confident in your abilities.
It might be better to create an additional issue on GitHub or support existing ones. This issue should be resolved architecturally, at the stage after the conversation handler.
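To make the "after the conversation handler" idea concrete, here is a rough sketch of where such a stage could sit. None of this is Home Assistant's actual pipeline code: `AgentReply`, `with_speech_filter`, and `fake_agent` are hypothetical names I made up, and the sanitizer is passed in as a plain callable so any stripping logic can be plugged in. The point is only that the agent's reply stays untouched for display, while the copy handed to TTS is cleaned.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class AgentReply:
    """Hypothetical container: one reply, two renderings."""

    display_text: str  # shown in a chat UI, formatting kept
    speech_text: str   # handed to the TTS engine, formatting removed


def with_speech_filter(
    agent: Callable[[str], str],
    sanitize: Callable[[str], str],
) -> Callable[[str], AgentReply]:
    """Wrap a conversation agent so its output is sanitized for speech only."""

    def wrapped(user_input: str) -> AgentReply:
        raw = agent(user_input)
        return AgentReply(display_text=raw, speech_text=sanitize(raw))

    return wrapped


# Stand-in for a real conversation agent, for illustration only.
def fake_agent(_: str) -> str:
    return "**Alert:** car in driveway"


pipeline = with_speech_filter(fake_agent, lambda s: s.replace("*", ""))
reply = pipeline("what happened?")
print(reply.display_text)
print(reply.speech_text)
```

Done this way, the TTS engine (Piper, cloud, or anything else) never needs to know about Markdown, and nothing has to be patched inside a system component.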