Sorry Rich, I misunderstood. I have now installed the Chime TTS component through HACS, and it seems great. Thanks for your advice.
Any ETA on this? I just set up a RPi4 + MIC+ v2 HAT and it works great, but I want to make “announcements” (TTS) directly to it without using the voice pipeline.
Today I can do this with, for example, Sonos: generate the TTS audio and then use media_player_entity_id
to broadcast the message through the Sonos speaker. It doesn't appear that this can be done with Wyoming Satellite, though?
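For reference, here is a minimal sketch of the Sonos-style call that works today (the speaker entity name is a placeholder from my own setup, not anything official):

```yaml
# Hedged sketch: TTS announcement to an existing media_player (e.g. Sonos).
# "media_player.living_room_sonos" is a placeholder entity name.
service: tts.speak
target:
  entity_id: tts.piper
data:
  media_player_entity_id: media_player.living_room_sonos
  message: "The laundry is done."
```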
Here’s what I want to be able to do that I seemingly can’t:
# Service call YAML:
service: tts.speak
target:
  entity_id: tts.piper
data_template:
  media_player_entity_id:
    - media_player.? # Where is my Wyoming Satellite?
    # Or can we have a "tts.my_wyoming_device"?
  message: "{{ msg }}"
Anyone gotten this to work yet?
I just installed Chime TTS. And I have a Wyoming-Satellite working.
I do not understand how I can send TTS messages to my Wyoming satellite. There is not a word about that in the Chime TTS documentation; it assumes media_player targets, which is exactly what we do not have.
I think the missing part here is that Rich has some things installed in addition to wyoming-satellite (Snapcast and PulseAudio). That is what enables a media_player entity on the Wyoming satellite.
I personally would love to see wyoming-satellite natively expose a media_player device.
@synesthesiam is there a big problem with this, or have you got side-tracked having fun with AI?
I am happy to use a general-purpose media player - my problem is deciding which one to choose.
I have tried MPD, which mostly coexists OK with Voice Assist (until they both try to output at the same time). Squeezelite seems better as a music player - but it totally blocked Voice Assist.
Your instructions use aplay and arecord, so I assumed I should stick with ALSA for the media player. Can ALSA and PulseAudio use the same speaker concurrently? I still have lots to learn.
I tried different solutions for TTS while we wait for the obvious solution, which is to let the satellite software provide it.
I first tried the Music Assistant/Snapserver/Snapclient solution for TTS. That sucks: there is a 60-second timeout from when you transmit a TTS message until you can send another, and the messages queue up. That is no good.
Then I tried MPD. I could not make that work properly either.
I tried VLC. It is surprisingly easy to make it work, and I just let the local VLC talk directly to the ALSA layer. Yes, if you talk to the satellite and then get a message you lose some audio, but that rarely happens. The VLC clients are not in sync, though, so you get nasty echoes when you have 5 clients in the house.
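To illustrate the VLC route: with Home Assistant's VLC Telnet integration, each local VLC instance shows up as a media_player you can target. A rough sketch, assuming the integration created an entity named media_player.vlc_telnet (the name depends on your setup):

```yaml
# Hedged sketch: send a TTS message to a local VLC instance
# exposed via the VLC Telnet integration. Entity names are placeholders.
service: tts.speak
target:
  entity_id: tts.piper
data:
  media_player_entity_id: media_player.vlc_telnet
  message: "Someone is at the front door."
```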
I ended up with Squeezelite, together with a Logitech Media Server (which has a new name now, still starting with L). That works well. The TTS has a 2-second timeout, which is perfect. And if you create a permanent group of all the satellites, they are in sync 90% of the time. That is better than Alexa Media Player manages.
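The permanent group can be set up with the standard media_player grouping service, which the Squeezebox integration supports. A sketch with made-up entity names:

```yaml
# Hedged sketch: keep all satellite Squeezelite players in one sync group
# so announcements play in sync. All entity names are placeholders.
service: media_player.join
target:
  entity_id: media_player.livingroom_satellite  # group leader
data:
  group_members:
    - media_player.kitchen_satellite
    - media_player.bedroom_satellite
```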
ONE THING. I do not use my clients for music. I love music too much to listen to it from a crappy little Jabra mono box, so I keep my Squeezelite client for music in a separate group from the Wyoming satellite boxes. That way I only need to worry about conflicts between Wyoming voice responses and the rare TTS messages. I can live with that.
I’ve not experienced a 60 second delay using music assistant/snapcast for TTS. I can run them back to back without issue.
I was fighting this for hours. My guess is that the problem may be on the client side and not the server side. Rich, can you share how your client is configured? Your client OS, the software, Docker or bare metal. Also, do the daemons run as a user or as root? Do they talk directly to ALSA or via Pulse?
Bare metal machine, 10th-gen i7 with 16 GB RAM. PulseAudio.
I just ran a test by sending a TTS to a speaker and triggered the second one before the TTS was finished and the second started immediately after. No delay at all.
What are you using for the TTS? I'm using Chime TTS calling Amazon Polly. Similar results when using Piper.
Rich, thanks for your follow-up.
I just read your post in another thread.
I followed that tutorial and I installed that snapclient version. Maybe that was my main issue.
When I had the problem I was using the tts.cloud_say service. Now I use local Piper via tts.speak with tts.piper. It is a little slow (a few seconds) to generate a message the first time, but with caching enabled that is no practical problem, because 99% of my TTS messages are of the "Someone at the front door", "Diane is arriving", "Would you like a little beer" kind.
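For what it's worth, the call I use now looks roughly like this (the media_player entity name is a placeholder; the tts entity name may differ on your install):

```yaml
# Hedged sketch: local Piper TTS with caching enabled, so repeated
# stock phrases play instantly after the first generation.
service: tts.speak
target:
  entity_id: tts.piper
data:
  media_player_entity_id: media_player.kitchen_satellite  # placeholder
  message: "Someone at the front door"
  cache: true
```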
I am not using Chime TTS but considered it. Maybe that layer fixes the issue?
When I was sending TTS via Snapcast, the media_player device showed itself as on and playing for 60 seconds after the message ended, and I could not send the next message until the previous one was reported complete, which took 60 seconds.
Something made it hang. Maybe Chime TTS will not hang but just continue playing on the media_player device that is already on? I had a workaround that worked sort of OK, where I sent a media_player.turn_off before each TTS message. That worked 90% of the time. But then I had to change all the scripts to mode Restart instead of Queued, which meant that queuing messages ended up cutting them to pieces. The Squeezebox integration behaves as it should: it keeps the media_player in playing mode for the duration of the message plus 2 seconds, so I can queue messages correctly.
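Roughly what that workaround script looked like (script and entity names are made up for illustration):

```yaml
# Hedged sketch of the workaround: force the Snapcast media_player off
# before each announcement so the stuck "playing" state is cleared.
# All names are placeholders.
announce_front_door:
  alias: "Announce front door"
  mode: restart  # had to be restart instead of queued, which cut queued messages short
  sequence:
    - service: media_player.turn_off
      target:
        entity_id: media_player.snapcast_group
    - service: tts.speak
      target:
        entity_id: tts.piper
      data:
        media_player_entity_id: media_player.snapcast_group
        message: "Someone at the front door"
```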
Be aware that there is an issue right now with Music Assistant that causes TTS files to stop early. It's been reported, and I assume they are working on it. It doesn't seem to affect my Chromecast players, but I don't use them much anymore. From my testing, the issue actually affects all local files.
The issue you are describing may have been a problem in the past, but it's the opposite right now. I just never experienced that.
Short files of about 4 seconds complete all the way right now.
Thanks again. I really do not want to run Music Assistant. I do not need it for music, so it is a bit silly to set it up just to transmit TTS to Snapcast clients. Snapcast or LMS is there to get the messages in sync. If your satellites are in a large house and far from each other, so that echo is of little concern, then VLC is the simplest way I have found so far.
We really need either a lean Snapcast media_player,
or - what we all naturally want - the Wyoming satellite itself exposing a media_player in HA. But without syncing the clients, that still means a lot of echo.
Rich, a final thing about your client: did you set up Pulse as a root service, or does the Snapcast service run as a plain user? That was also something I was fighting, and you read nothing but warnings against running Pulse as root.
I believe it's a plain user. I set it up per the FutureProofHomes tutorial.
The FutureProofHomes tutorial runs PulseAudio as a system-wide service and not as the normal user. That gave me trouble on a Debian machine; maybe it works better on Raspberry Pi OS.
It's coming very slowly; using MPD is not feasible because of the annoying bugs in its ban system.
As far as I can understand, the main issue is that TTS output is usually in MP3 format, and the Wyoming client can only receive WAV at the moment. Do you have a plan for where the format conversion will take place, on the Wyoming side or the HA integration side?
I am with you. As soon as this is in place I will probably be able to get rid of my Rhasspy satellites, which have been working like rockstars for 2 years. Yes, I understand the irony, since @synesthesiam wrote Rhasspy too.
A bit late to the party - or lack thereof - but I’m also hoping for Wyoming satellites to become available as TTS targets inside Home Assistant. But the fact that this thread is nearly 8 months old without any tangible confirmation doesn’t exactly fill me with confidence that it will happen.
I am legitimately not trying to be rude or ungrateful, as I understand most developers work on projects like this in their free time, but I am concerned. Development on the Wyoming satellite solution seems to be slow. For example, as a Docker user, I still have to build most images myself.
Will this not be the go-to solution for offline, privacy friendly, self hosted voice assistant satellites in the future? Does HA have something else coming down the pipeline that we don’t know about yet? I can’t shake the feeling that I’m betting on the wrong horse.
I wish I could contribute to this project, but I am not a developer. My apologies for going off on a tangent.
There are various options using the ESP32-S3 (and some lower versions), with the Espressif ESP32-S3-BOX-3 being the most popular and best supported. Originally, ESP32 devices like the M5Stack ATOM Echo relied on OpenWakeWord. The issue was that OpenWakeWord didn't run on the ESP32 the way it does on ARM for the Wyoming satellite (that's why both Wyoming Satellite and OpenWakeWord have to be installed manually on a Raspberry Pi satellite). This meant the HA server had to constantly listen for the wake word for each ESP32 device, which was resource intensive, especially on a Raspberry Pi.
Then one of the ESPHome contributors came up with microWakeWord, which runs on the ESP32 itself and listens for the wake word, taking the load off the server. There are also newer boards coming out, like the Seeed ReSpeaker Lite, which has an XMOS chip on board for echo and noise cancellation on the microphones. Nabu Casa has said their dedicated voice assistant will use an XMOS chip as well, but there is no word on a release date yet.
Right now my best satellite is a dedicated USB speakerphone, one of those round ones, plugged directly into my HA server using the Assist Microphone add-on. The obvious downside is having to be connected directly to the HA server, so it's not really a "satellite" per se.
The next is a tie between a Wyoming satellite and the ESP32-S3-BOX-3 using this. It is a media player, so you can send TTS to it, it has 24 customizable buttons, and I particularly like the push-to-talk feature in areas with background noise, like a TV. The ESP32 devices use Piper, Whisper, and Wyoming. They use OpenWakeWord if you don't do on-device wake word detection with microWakeWord.
Depending on your voice pipeline it can be completely local or use Nabu Casa cloud, although you're aware of this already. Completely local just isn't up to par with Nabu cloud IMO, but it also depends on what you're running HA on, as you can pick different local models to use. I believe Espressif had discontinued this model, but now it's back on their official AliExpress page: US$43 for the box and stand, US$50 with some extra attachments. Still slightly more than a Pi Zero 2 W and a 2-mic ReSpeaker, but not by much, and the added functionality of a touchscreen makes it worth it.
So in short, no, they aren't abandoning local voice assistants. Development is actually expanding, but it seems to have shifted to the ESP32 instead of ARM to leverage ESPHome, rather than a solution like the Wyoming satellite (which is still a tie with the ESP32-S3-BOX-3). No additional add-ons are needed beyond possibly ESPHome for these to work, and pretty soon you will be able to add devices like this without the add-on using the new "Made for ESPHome" devices. This is a recent addition, so I can't think of any examples out yet, but it makes things far more friendly to non-technical people who just want something that works out of the box. I would wait and see what Nabu announces, as you know it will just work and be supported since they are making the hardware. If you need something now, the ESP32-S3-BOX-3 would work, be a media player, and have a touchscreen. The only downsides are its 1 W speaker and the lack of a 3.5 mm output.
In The Current State of Voice - July 2024 thread, Mike recently confirmed that Nabu Casa is actively developing a voice assistant device, which will be based on the ESP32-S3 with an XMOS audio chip. More relevant, he expects the Voice Kit to become the go-to device … but RasPis still have a place in the technology matrix.
I agree that RasPi is overkill for a simple satellite device, and (at least here in Australia) far too expensive for dedicating to that single task.
Curiously, a couple of other threads here and here have recently been mentioning upcoming devices with similar hardware, and apparently there is a lot of activity on GitHub around these chips.
As for the Raspberry Pi & Wyoming-satellite …
I have 3 RasPi-based Rhasspy voice satellites which I am not yet confident enough to upgrade to wyoming-satellite - and yes, the development effort definitely has shifted away. Mike stated that he anticipates being busy on the Voice Kit for the rest of this year.
However, when the Voice Kit is established in the market, I am sure Mike will have plenty of updates to apply to wyoming-satellite. But the waiting is hard.
Thank you for your insightful feedback. The wait is indeed hard.
I’m using a few Raspberry Pi’s (with an Anker S330 conference speaker connected) around our apartment as Wyoming satellites. That exact setup was one of the ways in which Assist was first demonstrated. The Raspberry Pi’s are also running other services such as Pi-hole so I’m hoping this kind of setup will continue to be supported going forward.