Wyoming-satellite as tts output?

KennethLavrsen · July 23, 2024, 1:10pm

I was fighting this for hours. My guess is that the problem may be on the client side and not the server side. Rich, can you share how your client is configured? Your client OS, the software, and whether it is Docker or bare metal. Also, do the daemons run as a user or as root? Do they talk directly to ALSA or via Pulse?

Rich37804 · July 23, 2024, 1:22pm

Bare metal machine. 10th gen i7 and 16 GB RAM. PulseAudio.
I just ran a test by sending a TTS to a speaker and triggered the second one before the TTS was finished and the second started immediately after. No delay at all.
What are you using for the TTS? I’m using Chime TTS calling Amazon Polly. Similar results when using Piper.

KennethLavrsen · July 23, 2024, 1:43pm

Rich, thanks for your follow-up.

I just read your post in another thread.

I followed that tutorial and I installed that snapclient version. Maybe that was my main issue.

When I had the problem I was using the tts.cloud_say service. Now I use local Piper with tts.speak and tts.piper. It is a little slow (a few seconds) to generate the messages the first time, but with cache enabled it is not a practical problem because 99% of my TTS messages are “Someone at the front door”, “Diane is arriving”, “Would you like a little beer” kind of messages.

I am not using Chime TTS but considered it. Maybe that layer fixes the issue?

When I was sending TTS via Snapcast, the media_player device showed itself as on and playing for 60 seconds after the message ended. And I could not send the next message until the previous one was reported complete, which took 60 seconds.

Something made it hang. Maybe Chime TTS will not hang but just continue playing on the media_player device that is already on? I had a workaround that worked sort of OK, where I sent a media_player.turn_off message before each TTS message. That worked 90% of the time. And then I had to change all the scripts so they were of the type Restart instead of Queued, which meant that queuing messages meant cutting messages to pieces. The Squeezebox integration behaves as it should. It keeps the media_player in playing mode for the duration of the message + 2 seconds, so I can queue messages correctly.
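To illustrate the "duration of the message + 2 seconds" behaviour, here is a minimal Python sketch. It is not how the Squeezebox integration is implemented; the function names and the `make_silence` helper are hypothetical, and the 2 second padding is simply the figure observed above.

```python
import io
import wave


def wav_duration_seconds(wav_bytes: bytes) -> float:
    """Return the playback length of a WAV clip in seconds."""
    with wave.open(io.BytesIO(wav_bytes), "rb") as wav_file:
        return wav_file.getnframes() / wav_file.getframerate()


def playing_hold_seconds(wav_bytes: bytes, padding: float = 2.0) -> float:
    """How long a player would stay in 'playing': clip length plus padding."""
    return wav_duration_seconds(wav_bytes) + padding


def make_silence(seconds: float, rate: int = 16000) -> bytes:
    """Hypothetical helper: build a silent 16-bit mono WAV clip for testing."""
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav_file:
        wav_file.setnchannels(1)
        wav_file.setsampwidth(2)  # 2 bytes per sample = 16-bit audio
        wav_file.setframerate(rate)
        wav_file.writeframes(b"\x00\x00" * int(seconds * rate))
    return buffer.getvalue()


if __name__ == "__main__":
    clip = make_silence(4.0)
    print(playing_hold_seconds(clip))
```

With a hold time tied to the clip length like this, a Queued script only waits as long as the message actually plays, instead of a fixed 60 seconds.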

Rich37804 · July 23, 2024, 1:50pm

Be aware that there is an issue right now with Music Assistant that causes TTS files to stop early. It’s been reported and I am assuming they are working on it. It doesn’t seem to affect my Chromecast players but I don’t use them much anymore. The issue actually affects all local files from my testing.
The issue you are describing may have been an issue in the past, but it’s the opposite right now. I just never experienced that.
Short files of about 4 seconds complete all the way right now.

KennethLavrsen · July 23, 2024, 1:56pm

Thanks again. I really do not want to run Music Assistant. I do not need it for music, so it is a bit of a silly thing to set up just to transmit TTS to Snapcast clients. The point of Snapcast or LMS is to get the messages in sync. If your satellites are in a large house and far from each other, so the echo is of little concern, then VLC is the simplest way I have found so far.

We really need a lean Snapcast media_player.

Naturally we all want the Wyoming-satellite to expose a media_player in HA, but without sync between the clients it still becomes a lot of echoes.

Rich, a final thing about your client. Did you set up Pulse as a root service? Or does the Snapcast service run as a plain user? That was also something I was fighting. And you read about nothing but warnings against running Pulse as root.

Rich37804 · July 23, 2024, 1:58pm

I believe it’s a plain user. I set it up per the FutureProofHomes tutorial.

KennethLavrsen · July 23, 2024, 2:03pm

The FutureProofHomes tutorial runs PulseAudio as a global service and not as the normal user. That gave me trouble on a Debian machine. Maybe it works better on Raspberry Pi OS.

python · August 21, 2024, 3:26am

It’s coming very slowly, using MPD is not feasible because of the annoying bugs in ban system.

As far as I can understand, the main issue is that TTS output is usually in MP3 format and the Wyoming client is only able to take WAV at the moment. Do you have any plan for where the format conversion will take place, on the Wyoming side or the HA integration side?
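For illustration, wherever the conversion ends up, it could look roughly like the Python sketch below. This is not anything from Wyoming or Home Assistant: it assumes the `ffmpeg` CLI is installed, the function names are hypothetical, and the 22050 Hz mono 16-bit target is only a guess at what a satellite would want.

```python
import subprocess
from pathlib import Path


def build_ffmpeg_command(mp3_path, wav_path, rate=22050, channels=1):
    """Build an ffmpeg command line that decodes MP3 to 16-bit PCM WAV."""
    return [
        "ffmpeg",
        "-y",                    # overwrite the output file if it exists
        "-i", str(mp3_path),     # input file (MP3 from the TTS engine)
        "-ar", str(rate),        # output sample rate in Hz (assumed value)
        "-ac", str(channels),    # output channel count (assumed mono)
        "-c:a", "pcm_s16le",     # 16-bit little-endian PCM
        "-f", "wav",             # force WAV container
        str(wav_path),
    ]


def convert_mp3_to_wav(mp3_path, wav_path, rate=22050, channels=1):
    """Run ffmpeg and return the path of the converted WAV file."""
    command = build_ffmpeg_command(mp3_path, wav_path, rate, channels)
    subprocess.run(command, check=True)  # raises CalledProcessError on failure
    return Path(wav_path)


if __name__ == "__main__":
    print(" ".join(build_ffmpeg_command("door.mp3", "door.wav")))
```

Doing this on the HA side would keep the satellite lean, while doing it on the satellite would mean any MP3 source could be played. The sketch works the same either way.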

brunkj · August 27, 2024, 3:16pm

I am with you. As soon as this is in place I will probably be able to get rid of my Rhasspy satellites, which have been working like rockstars for 2 years. Yes, I understand the irony, since @synesthesiam wrote Rhasspy too.

fanaticDavid · September 7, 2024, 6:27pm

A bit late to the party - or lack thereof - but I’m also hoping for Wyoming satellites to become available as TTS targets inside Home Assistant. But the fact that this thread is nearly 8 months old without any tangible confirmation doesn’t exactly fill me with confidence that it will happen.

I am legitimately not trying to be rude or ungrateful as I understand most developers are working on projects like this during their free time, but I am concerned. Development on the Wyoming satellite solution seems to be slow. E.g. as a Docker user, I still have to build most images myself.
Will this not be the go-to solution for offline, privacy friendly, self hosted voice assistant satellites in the future? Does HA have something else coming down the pipeline that we don’t know about yet? I can’t shake the feeling that I’m betting on the wrong horse.

I wish I could contribute to this project, but I am not a developer. My apologies for going off on a tangent.

ginandbacon · September 8, 2024, 1:48am

There are various options using ESP32-S3 (some lower versions also), with the Espressif S3 Box-S3 being the most popular and supported. Originally, ESP32 devices like the M5Stack Atom Echo used OpenWakeWord. The issue was that OpenWakeWord didn’t work on ESP32 like it did on ARM for the Wyoming Satellite. That’s why both Wyoming Satellite and OpenWakeWord have to be installed manually on the Raspberry Pi satellite. This meant the HA server had to constantly listen for the wake word with ESP32 devices, which was resource intensive, especially on a Raspberry Pi.

Then one of the ESPHome contributors came up with microWakeWord, which runs on the ESP32 and can listen for the wake word, taking the load off the server. There are also newer boards coming out, like the Seeed ReSpeaker Lite, which has an XMOS chip on the board for echo/noise cancellation for the microphones. Nabu Casa has said their dedicated voice assistant will use an XMOS chip also, but no word on a release date yet.

Right now my best satellite is a dedicated USB speakerphone, one of those round ones, plugged directly into my HA server using the Assist Microphone add-on. The obvious downside is having to be connected directly to the HA server, so it is not really a “satellite” per se.

The next is a tie between a Wyoming satellite and the S3 Box-S3 using this. It is a media player so you can send TTS to it, has 24 customizable buttons, and I particularly like the push to talk feature in areas with background noise like a TV. The ESP32 devices use Piper, Whisper, and Wyoming. They use OpenWakeWord if you don’t use on-device wake word detection with microWakeWord.

Depending on your voice pipeline it can be completely local or use Nabu Casa cloud, although you’re aware of this already. Completely local just isn’t up to par with Nabu Cloud IMO, but it also depends on what you’re running HA on, as you can pick different local models to use. I believe Espressif had discontinued this model but now it’s back on their official AliExpress page. 43 USD for the box and stand, 50 for some extra attachments. Still slightly more than a Pi Zero 2 W and 2-mic ReSpeaker, but not by much, and it adds the functionality of a touchscreen, so worth it.

So in short, no, they aren’t abandoning local voice assistants. The development is actually expanding but seems to have shifted to ESP32 instead of ARM, to leverage ESPHome instead of using a solution like the Wyoming satellite, which is still a tie with the S3 Box-S3. No additional add-ons are needed outside possibly ESPHome for these to work, and pretty soon you will be able to add devices like this without the add-on using their new “made for ESPHome” devices. This is a recent addition so there are not any examples out yet that I can think of, but it makes it way more user friendly for non-technical people who just want something that works out of the box. I would wait and see what Nabu announces, as you know it will just work and be supported since they are making the hardware. If you need something now then the S3 Box would work, be a media player, and have a touchscreen. The only downside is its 1 W speaker and no 3.5mm output.

donburch888 · September 9, 2024, 7:36am

In The Current State of Voice - July 2024 thread, Mike recently confirmed that Nabu Casa are actively developing a voice assistant device, which will be based on the ESP32-S3 with an XMOS audio chip. More relevant, he expects the Voice Kit to become the go-to device … but RasPi’s still have a place in the technology matrix.

I agree that RasPi is overkill for a simple satellite device, and (at least here in Australia) far too expensive for dedicating to that single task.

Curiously, a couple of other threads here and here have recently been mentioning upcoming devices with similar hardware, and apparently there is a lot of activity on GitHub around these chips.

As for the Raspberry Pi & Wyoming-satellite …

I have 3 RasPi based Rhasspy voice satellites which I am not yet confident to upgrade to wyoming-satellite - and yes, the development effort definitely has shifted away. Mike stated that he anticipates being busy on VoiceKit for the rest of this year.
However, when the VoiceKit is established in the market I am sure Mike will have plenty of updates to apply to wyoming-satellite. But the waiting is hard.

fanaticDavid · September 9, 2024, 9:38am

Thank you for your insightful feedback. The wait is indeed hard.

I’m using a few Raspberry Pi’s (with an Anker S330 conference speaker connected) around our apartment as Wyoming satellites. That exact setup was one of the ways in which Assist was first demonstrated. The Raspberry Pi’s are also running other services such as Pi-hole so I’m hoping this kind of setup will continue to be supported going forward.