If it’s the same issue I was having, the workaround I found for now is to disable the wake word when the TTS starts playing and re-enable it once playback ends. The audio goes from unbearable to perfectly clear.
I initially tried to do this directly in ESPHome, but quickly found it was easier to simply bypass it altogether. I built a script in HA that does all the logic, and I call it from within `on_tts_start` in ESPHome `voice_assistant`, from which I also removed the `media_player` parameter (HA handles the TTS output).
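For anyone who wants the general shape before opening the repo, here is a rough sketch of the idea, not my exact config. Every ID and entity name in it (`mic_id`, `script.tts_with_wake_word_paused`, `switch.satellite_wake_word`, `media_player.satellite_speaker`, `tts.piper`) is a placeholder I made up for illustration, and it assumes the device exposes a switch that turns the wake word on and off. On newer ESPHome releases the action is named `homeassistant.action`, and the device has to be allowed to perform Home Assistant actions.

```yaml
# ESPHome side (sketch): hand the TTS text to a Home Assistant script
voice_assistant:
  microphone: mic_id            # placeholder microphone ID
  # no media_player here: Home Assistant plays the TTS audio itself
  on_tts_start:
    - homeassistant.service:
        service: script.tts_with_wake_word_paused   # placeholder script name
        data_template:
          message: "{{ tts_text }}"
        variables:
          tts_text: |-
            return x;           // x holds the text to be spoken
```

```yaml
# Home Assistant side (sketch): pause the wake word, speak, then re-enable it
script:
  tts_with_wake_word_paused:
    fields:
      message:
        description: Text to speak
    sequence:
      - service: switch.turn_off
        target:
          entity_id: switch.satellite_wake_word     # placeholder wake word switch
      - service: tts.speak
        target:
          entity_id: tts.piper                      # placeholder TTS entity
        data:
          media_player_entity_id: media_player.satellite_speaker  # placeholder
          message: "{{ message }}"
      # wait until the player goes back to idle, with a safety timeout
      - wait_for_trigger:
          - platform: state
            entity_id: media_player.satellite_speaker
            to: "idle"
        timeout: "00:01:00"
      - service: switch.turn_on
        target:
          entity_id: switch.satellite_wake_word
```

The timeout is there so the wake word always comes back on, even if the media player never reports `idle`.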
I haven’t noticed any kind of “slowdown” because of this wrapper, and I’ve been using it for several weeks now. It works well for me.
I posted the ESPHome config, as well as the script in question, here on GitHub (more readable than posting large portions of code here). PS: Make sure to switch to the main branch for the latest updates, since the community’s software rewrites GitHub links to point at the revision from the day of the post.