Year of the voice, what about GLaDOS?

DiMMaX · October 18, 2023, 12:25pm

Hi.

This is my first post here and I’ve been looking around to make sure I’m in the right place to talk about this topic. If I made a mistake, please excuse me and let me know where to take this conversation and I’ll gladly move it to the proper place. Thanks in advance!

This being said I’d like to check with this community for possibilities for ‘my project’. I’ve been searching the web and couldn’t really find something that fits my needs/idea so I’m hoping to get some ideas/suggestions/help from within this community.

Before we proceed, I have to mention what kind of user I am, as this will have an impact on possible solutions. ‘Why?’ I hear you think. Well, I’m not a programmer, for starters. So implementing complex code, or writing/compiling anything other than YAML in the related config files, will be an issue. I do not have a developer environment available to me, nor do I have the knowledge or desire to set one up. (I tried this several times in the past for other projects and it’s simply wasted on me, so I don’t want to go that route again.)
I find this important to mention as the suggestions will most likely require this kind of knowledge…

I will provide as much info about my setup and idea/concept as possible. If I have left something out, let me know and I’ll happily provide the missing bits. If this happens, it’s probably because I don’t fully understand the requirements for this project, so sorry in advance…

With all this out of the way, let’s dive into the idea/concept…

As the title of this post suggests, this idea/project is voice related, and more specifically related to text-to-speech (I assume).
In short: my goal is to have a voice assistant that talks/responds as GLaDOS (from the Portal game) and can control my Home Assistant environment.

I’ve been researching and I couldn’t really find an ‘easy’ solution to this. But I figured a lot of the info on the web is a bit outdated, since we’re in the Year of the Voice and Home Assistant might have capabilities which weren’t available before. So I’m hoping those new capabilities might shed some new light on this topic and bring it a bit closer to a noob like myself.

I’ve discovered Nerdaxic’s project on GitHub, but this is way too complex for me and I’m not sure how to even start with it. My head starts to hurt just from reading the (very limited) instructions. As with many GitHub projects, the instructions are meant to be followed by other programmers/developers who know what they’re doing. I’m not one of them…
Link: GitHub - nerdaxic/glados-tts: A GLaDOS TTS, using Forward Tacotron and HiFiGAN. Inference is fast and stable, even on the CPU. A low quality vocoder model is included for mobile use. Rudimentary TTS script included. Works perfectly on Linux, partially on Maybe someone smarter than me can make a GUI.

I’ve also discovered this GLaDOS voice Generator which has a ‘simple’ API (as they claim) but I have no idea how to make use of this API from within Home Assistant.
Link: https://glados.c-net.org

I think the solution might lie in one (or both) of these websites, but again, to me it might just as well be written in Chinese. I have no clue where and how to start.

Furthermore, I have no test environment available, so I don’t want to screw up my operational setup. It would be a nightmare to lose all that and need to reinstall.

As for my setup:

I’m running Home Assistant OS on a Raspberry Pi 4 (Xute) and so far all that goes well.
I also have Nabu Casa enabled for Alexa integration, so I can control my entities by voice through Alexa. I have Echos all over my house.

The way I see this, there are 3 possible paths here:

1. Via Alexa (not sure how).
   I see this as an option since I have the mics available from the Echos and she’s already integrated.
2. Via the integrated voice assistant in HA.
   I’m hoping the link with text-to-speech might bring some solution.
3. Something entirely different which I’m not aware of at this point.

For the HA path (option 2) I’ll need mics to be able to communicate. I’m not sure if the mics in the Echos could be used for that (regardless of Alexa). The reason I’m thinking this is that in the recent Year of the Voice video from HA they mentioned that any mic-enabled device could be used for the voice assistant. Again, I have no idea where and how to even start.
They also mentioned the ATOM Echo Smart Speaker Development Kit, but even though they say it’s available, it’s not.
I only mention it to say that I’m willing to go that route as well if needed.

So in conclusion, is there anyone around who has info that might help with this? Keep in mind you’re talking to a 50-year-old noob…

Your input is much appreciated!

Cheers!

DiMMaX