Rhasspy TTS Options & Open Source WaveNet

fishertimj · December 15, 2019, 5:39pm

This is my first post about Rhasspy, having just jumped down the offline voice assistant rabbit hole. I’m only an hour or two in (installed Rhasspy on my NUC via Docker, reading a lot, buying a mic+pi setup, etc.) and color me VERY impressed! @synesthesiam, you have done something really amazing here and I can’t wait to see it continue to develop!

My first of many questions/comments is about the TTS options. The range of choices is much appreciated, but none of the fully offline options “wow” me (i.e. are as good as or better than Google’s - yes, for obvious reasons), though I do think some of the MaryTTS voices are pretty damn good.

When I was digging around for additional engines to suggest that you add, I came across this open source implementation of WaveNet: https://r9y9.github.io/wavenet_vocoder/. It’s a little over my head, certainly after just 5 minutes of reading, but I’m wondering if there’s an opportunity to use this in some capacity? I’m not sure if it’s a true TTS itself or another layer of processing on top of an existing TTS engine but I’d love any thoughts on it.

Thanks!

synesthesiam · December 15, 2019, 9:55pm

Hi @fishertimj! Thanks for the positive feedback and suggestion!

I’ve looked into some newer TTS systems like the one you linked, and had always shied away from them because of the TensorFlow requirement (which can be a problem on Raspberry Pis). However, given the shift Rhasspy is undergoing to split into independent services that may run on different machines, this seems much more possible.

If anyone has a barebones Python example or a Dockerfile for this or other TensorFlow/PyTorch-based TTS systems, that would speed things along!

fishertimj · December 15, 2019, 11:40pm

I wish I was a bit more savvy because I’d be happy to help you out here. Maybe we can agree on a number of coffees to buy you.

Romkabouter · December 16, 2019, 8:27am

I would like to add a couple of points here about Google WaveNet:

You know exactly what is going to the cloud, because you yourself are creating the responses.
Each sentence only goes to the cloud once: every returned wave file is cached and played locally the next time. So if your sentences do not vary much, using Google WaveNet will effectively be 100% offline after a while. Adding randomness means a bit more online traffic, but that implies some more advanced coding.

fishertimj · December 24, 2019, 3:56pm

Very good points, @Romkabouter!

Once cached, if access to the internet is down, does it still play the cached version or is there always a check online first that will stop the processing? I’m assuming the former, but thought I’d ask!

Romkabouter · December 26, 2019, 4:59pm

The first check is for a cached file; if none is found, then the speech API is called.
So yes, it plays when offline.
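
For anyone curious what that cache-first lookup might look like, here is a minimal sketch in Python. It is not Rhasspy's actual code: the function name `cached_tts`, the hash-based file naming, and the `fake_cloud_tts` stand-in for the real speech API are all assumptions made up for illustration.

```python
import hashlib
import tempfile
from pathlib import Path
from typing import Callable


def cached_tts(text: str, cache_dir: Path, synthesize: Callable[[str], bytes]) -> Path:
    """Return the path to a WAV file for `text`, calling `synthesize` only on a cache miss."""
    cache_dir.mkdir(parents=True, exist_ok=True)
    # Assumed naming scheme: one file per distinct sentence, keyed by its SHA-256 hash.
    key = hashlib.sha256(text.encode("utf-8")).hexdigest()
    wav_path = cache_dir / f"{key}.wav"
    if not wav_path.exists():
        # Cache miss: this is the only step that would need the network.
        wav_path.write_bytes(synthesize(text))
    return wav_path


# Hypothetical stand-in for the cloud speech API; it records each call it receives.
calls = []


def fake_cloud_tts(text: str) -> bytes:
    calls.append(text)
    return b"RIFF" + text.encode("utf-8")


with tempfile.TemporaryDirectory() as tmp:
    first = cached_tts("Lights on", Path(tmp), fake_cloud_tts)   # miss: calls the API
    second = cached_tts("Lights on", Path(tmp), fake_cloud_tts)  # hit: served from disk
    cached_audio = second.read_bytes()
```

In this sketch the second request never reaches `fake_cloud_tts`, so a sentence that has been spoken once keeps working with the network down. A sentence that has never been cached would still fail offline, since the exception from the API call is simply allowed to propagate.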