Sending TTS via Snapcast does not play messages

Hello there,

I spent a whole day testing this and have some new findings. Let me share them with you.

  1. VLC. It can already write data to a pipe out of the box (and it can be controlled via telnet, too); they even list it among the player options in their official manual:
VLC
Use --aout afile and --audiofile-file to pipe VLC's audio output to the snapfifo:

vlc --no-video --aout afile --audiofile-file /tmp/snapfifo

But there is something beautiful about VLC in Home Assistant which I would like to share at the end, as a bonus to all of you who spent time helping me out with this situation - maybe you will find it useful in your projects.

  2. Snapcast situation.
    I decided to try MPD instead of Mopidy, to keep the experiment pure and to understand whether the problem is Mopidy, Snapcast, or both. Besides, MPD seems a bit easier to understand in operation. So, I installed MPD on the NUC with the Snapserver. I amended the config as per the directions from:
    "https://github.com/badaix/snapcast/blob/1f98c790734f7b5de01497fa5243149dddd6dc0b/doc/player_setup.md#mpd" - I created a pipe named 'test' in the snapserver.conf file and directed MPD in mpd.conf to write the sound data to that pipe.
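For reference, this is roughly what those two changes look like (a sketch; I use the pipe path /tmp/test to match the 'test' name - adjust to your setup, and note that older Snapcast versions use the key 'stream' instead of 'source'):

In snapserver.conf:

[stream]
source = pipe:///tmp/test?name=test

And in mpd.conf:

audio_output {
    type        "fifo"
    name        "test pipe"
    path        "/tmp/test"
    format      "48000:16:2"
    mixer_type  "software"
}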
    Then I installed an MPD client on my phone and added a radio station to a playlist just like that:
sudo su
echo "http://wdr-1live-live.icecast.wdr.de/wdr/1live/live/mp3/128/stream.mp3" > /var/lib/mpd/playlists/einslive.m3u

And it did appear in the playlist in the client on my phone, and it appeared to start playing, but again no sound was produced… Then I started collecting logs. The logs needed were:

  1. From MPD server on NUC (192.168.1.246)
  2. From Snapcast server on NUC (192.168.1.246)
  3. From Snapcast Client on RPi4 (192.168.1.247)
    When I collected those I found some minor 'exceptions' which seemed insignificant to me, but I decided to tackle them anyway, and discovered that no matter what I did I could not get rid of this one:
 bind to '0.0.0.0:6601' failed (continuing anyway, because binding to '[::]:6601' succeeded): Failed to bind socket: Address already in use
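If you run into the same error, a quick way to see which process is already holding the port (assuming the ss tool from iproute2 is installed):

sudo ss -tlnp | grep 6601

That would at least tell you whether a stale process is still bound to the socket.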

So, the best solution seemed to be reinstalling the whole thing from scratch - for that purpose I came across this:
sudo apt-get install --reinstall mpd
As soon as the text stopped scrolling on the screen, all 3 of my sound sources with snapclients installed (phone, laptop, and RPi4 with speakers) started to play the radio station I had added to the list - that is it - black magic! I guess we will never know the true source of the observed phenomenon, unless someone with experience tells us one day in a comment below. But for those of you who run into something similar - when you seem to be following the manual yet nothing works - consider reinstalling the thing from scratch; maybe some module you are using is outdated, or whatever else… it might help.
The moment it started to play radio, it started to play TTS as well - so once you are done with phase 1, phase 2 completes automatically. Was it Mopidy or some outdated lib - as I said, we may never know, but it has led me to an interesting use of the VLC player as a result of these 3 days of investigation…

  3. Bonus - VLC.
    VLC is nothing special by itself, but in Home Assistant it stands apart from the other platforms in that it is speaker-defined - meaning you can define as many media players of that platform as you have speakers connected to your Pi - and yes, I do mean Bluetooth… You can have dozens of these, I suppose. At the moment of writing I have tested it with 2 sets of speakers, BT + wired, connected to my Pi4, but the same applies to any number. Now, having a dozen media players in one HA instance, all associated with one and the same Pi, is of no great use - unless you have this:
    'https://github.com/custom-components/remote_homeassistant'
    What it allows you to do is have an unlimited number of independent HA instances joined into a single cluster - you can import any entity into your main HA instance, including media players (a YAML config sketch follows this item). And this turns your dozens of local VLC players for individual speakers into dozens of network VLC players for individual speakers. How to use it: you can dynamically determine which output to use during your program's execution, which is impossible with MPD or Mopidy - there the output source is defined before the start and cannot be changed until the end. You can have several BT speakers in your room and play all of them - all players involved - or only some of them… - and these speakers are connected to a single Pi…
    And another bonus - you can now build as many Zigbee networks as you have Pis. I already have 3 ZHA integrations in one Home Assistant, while they say you cannot have more than one - easy: one is in the main instance, and the other two are in my test Pi4 and in the HA running on the bare-metal NUC. This is very useful when you have 3 floors, or have devices which do not want to live in one network. Since every room will have an RPi for the multi-room - yes, right - every room will have its own Zigbee network as well, and they will all be joined in my main HA instance where my general code is running… Hope you find a use for this as well.
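As for Remote Home Assistant, it can also be configured in YAML instead of through the UI. A minimal sketch from memory (the host, token name and filter schema here are assumptions - check the component's README for the exact options):

remote_homeassistant:
  instances:
    - host: 192.168.1.247
      port: 8123
      access_token: !secret remote_ha_token
      include:
        domains:
          - media_player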

Thanks for your help!

Glad you got it to work in the end.

That's not what I'd recommend; separating Zigbee networks should only be done if it is not possible to have one big mesh, or if ZHA and ZLL devices interfere with each other. Otherwise it's better to have one large Zigbee network - the more nodes, the stronger your mesh. I have 60+ Zigbee devices on my mesh and it works flawlessly.

I have 183 Zigbee devices altogether at the moment. What can I say - theoretically they promise you can have hundreds in one network; practically, every week or so your whole network collapses into fragments and there is nothing you can do about it.
Interference is a simple thing to avoid - alternate the channels across space - solved. Example below, where the letters denote channels spaced by a step of at least 2:
x y z
y x y
z y z
Experience: bulbs and end devices will never live in one network - not a chance. You will always have a sensor disappearing from the network one day, and you will have to re-pair it…
Plugs and end devices are doing great…
Why is that - no idea; maybe my bulb manufacturer (Gledopto) did not follow the Zigbee standards exactly… But what can I do about it now?

Could you please share the automation you ended up with?
I'm curious to see what it looks like.

Thanks

I do not fully comprehend the meaning of 'ended with', since I do not use HA in the general way - I need it only for the sake of device drivers, which for some reason are called 'integrations' there, but are basically programs which explain to the system how to deal with devices (drivers, or network interface drivers, if you prefer to call them that), plus a couple of useful add-ons; the rest I develop myself. Thus my 'automations', even when I use Jinja instead of Python, bear very little resemblance to what you are used to, and require some debugging skills to understand what they do. So I will do a simple example for the VLC player, as I understood you were interested in this:

  1. Let's call your main instance of HA 'HAm', and the satellite instance on the RPi 'HAs'. HAs needs to be installed in a virtual environment under Python 3.8, since otherwise VLC won't be granted access to ALSA and no sound will be played. Then you install VLC following the general procedure under Debian (Raspbian). Once done, you can add media players to the configuration.yaml of your HAs (I assume by this time you have already added BT speakers to your Debian installation; the manuals for that are available on the web):
media_player:

  - platform: vlc
    name: mainspeaker_bt
    arguments: "--alsa-audio-device=pod1"

  - platform: vlc
    name: mainspeaker_wire
    arguments: "--alsa-audio-device=hw:0,0"

pod1 is the name of the BT device in the asound.conf file.
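For completeness, the relevant part of asound.conf could look roughly like this (a sketch assuming the Bluetooth PCM is provided by bluez-alsa; the MAC address is a placeholder for your speaker's address):

pcm.pod1 {
    type plug
    slave.pcm {
        type bluealsa
        device "XX:XX:XX:XX:XX:XX"
        profile "a2dp"
    }
}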

After that you install, from HACS, that useful integration - "Remote Home Assistant" - in your HAm. If HAm and HAs are on the same network, it will detect HAs automatically and offer to set it up. You pick the entities you want to import from HAs; do not forget to import 'state_changed' events too, otherwise you will not be able to track the changes of your objects. And that is it: now you will be able to work with media_player.mainspeaker_bt and media_player.mainspeaker_wire as if they were installed and declared in HAm, no difference, and use them in your automations. Let's do a simple example: imagine you have a long corridor and 2 sensors telling you from which direction someone entered it. At one end you have the BT speakers, and at the other end the wired speakers - and remember, they are both connected to the same RPi, which is 2 floors away from your HAm instance. And you want the speakers to pass a greeting message to the person entering: "Hello, dear earthling. What a nice day it is on your planet!". It could be like this:

alias: Greeting to the earthling
description: ''
mode: single
trigger:
  - platform: event
    event_type: state_changed
condition:
  - condition: template
    value_template: |-
      {{
        'binary_sensor.corridor_side_A' in trigger.event.data.entity_id
         or
        'binary_sensor.corridor_side_B' in trigger.event.data.entity_id
      }}
         
action:
  - choose:
      - conditions:
          - condition: template
            value_template: |-
              {{'binary_sensor.corridor_side_A' in trigger.event.data.entity_id }}
        sequence:
          - service: script.pass_msg
            data:
              msg: 'Hello, earthling. What a nice day it is at your planet'
              speaker: media_player.mainspeaker_bt
      - conditions:
          - condition: template
            value_template: |-
              {{'binary_sensor.corridor_side_B' in trigger.event.data.entity_id }}
        sequence:
          - service: script.pass_msg
            data:
              msg: 'Hello, earthling. What a nice day it is at your planet'
              speaker: media_player.mainspeaker_wire                            
    default: []

and the called script would simply play the TTS message on the passed 'speaker' entity_id.
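For completeness, a minimal sketch of such a script, assuming the Google Translate TTS integration - swap in whichever tts service you actually use (older HA versions need data_template instead of data for the templated fields):

script:
  pass_msg:
    alias: Pass a TTS message to the given speaker
    sequence:
      - service: tts.google_translate_say
        data:
          entity_id: "{{ speaker }}"
          message: "{{ msg }}"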
The rest is your imagination. Add 4 BT speakers and make yourself a surround music system. I am not sure how well it would be synchronised, however - Snapcast does this perfectly because it makes all speakers read from a single source, but you never know, maybe something decent will come up… Anyway, Snapcast is capable of working with only ONE speaker connected to your RPi - the one you told it to use in the config - while via VLC you can use all of them, picking dynamically which one to use…

Sorry, I guess that "ended up with" is a localized term. It just means what you have at the end of your journey, after all the changes you had to make… So what I was asking was: what is your automation to call the TTS service, now that you have gotten things working to your satisfaction?

You gave an example of an automation in your first post that didn't seem to work.
Just curious how it changed when you were successful.

I've just gotten Mopidy and Snapcast to work for radio streams… I still have work to do with it.
I would like to be able to use this for TTS in the future though, so I'm interested in what people are doing with it.
Long term, I'd like to have Rhasspy on some RasPi satellites that can also play music and do TTS.
I don't have a lot of time to do all this, so I think it will be a while before I get there.

'>>' You gave an example of an automation in your first post that didn't seem to work.
Just curious how it changed when you were successful.

That automation was a test to understand whether Snapcast and Mopidy were working, since it was not quite clear how to deal with them out of the box when I first installed them. As you understand, it has no resemblance to the project I was working on. Just now, when all the logic is done and tested, it is time to start deploying the actual infrastructure for the real people who will use it - to make it convenient to interact with the voice assistant via a hidden mic and get replies through the normal speakers, with multi-room as a side effect. So basically what you do is make something simple to see how it works, in order to understand how to deploy the real project, which is of a different scale.

I also used Rhasspy in my project and was left pretty pleased with its performance, though I made a feature request to protect their API with token authorization - it is not OK at the moment that anyone on the same local network can get access to your HA via Rhasspy's API just by knowing the port Rhasspy uses for HTTP. Apart from that - a very good piece of software; hope they can make the changes soon. Rhasspy on satellites, which do multicasting along with STT and event recognition - this is exactly what I have done, and it was the last brick in that wall. As I see it, you are taking the right approach. I replaced almost all GUI with the voice assistant in my system doing exactly this. Works pretty well.

As the experience of these 3 days tells me, I would not use Snapcast for TTS. TTS does not require synchronization - so what is the point of using it for TTS? This tool is for multi-room. For TTS I would go with VLC, for the reasons I have explained - if you have more than one speaker in the room, you can send TTS to one of them and keep playing multi-room music through another, lowering the volume during the speech - who can do that via Snapcast? Nobody… Besides, imagine you are casting sound to several rooms and you decide to pass a message to only one of those rooms - what should the others do meanwhile? 'Enjoy the silence'? Definitely, TTS and multi-room are not meant to be together; these are different tools…

Nice, so my memory is pretty bad :smiley:

I don't have as many bulbs as you do; however, for me bulbs and end devices have worked perfectly fine for years. I never had any sensor lose connection, or my mesh crash, or anything similar - it has been rock solid for 4 years now. The only issue I had was a door sensor which lost connection every 10 mins or so; I found out that the device was faulty, replaced it with the same sensor, and have had no connection issues since.

Continue playing the music they were playing? Snapserver makes all sources available to all clients; the clients then decide which source they want to listen to. TTS is a separate source, which clients can switch to and then switch back from once the message has been played.
My wife can happily play music in her bathroom through Snapcast, and I get a TTS message at the same time in the office, also through Snapcast, and it won't stop my wife's music.
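In config terms that is just two sources declared in snapserver.conf, roughly like this (the names and paths are examples; older Snapcast versions use the key 'stream' instead of 'source'):

[stream]
source = pipe:///tmp/snapfifo_music?name=music
source = pipe:///tmp/snapfifo_tts?name=tts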

I'm curious about your VLC solution. Can you play some music, then pause the music and play TTS and then continue with the music, or will the track just continue playing in the background, or will the song be restarted after the TTS message?

'>>>' I never had any sensor lose connection or my mesh crashing

As I said, I believe it all depends on the manufacturer of the device - how strictly they comply with the Zigbee standards… I use Sonoff - no particular reason, just the first option I tried - and once I had them deployed, a lot of "interesting" stuff was noticed going on. 4 years… - I think you should share the details of how to build such reliable Zigbee networks - hardware primarily, because when you start you really do not understand what is going to work well and what is not. You buy the first thing that is available and… In my case there were a lot of hardware issues with devices disappearing from the network. The only way to make it work stably was to fragment it, so sensors would not be connected through the Lonsonho bulbs, which may decide to leave your mesh for a while, taking your sensor for a walk to keep it company…

'>>>' Snapserver makes all sources available to all clients
Yes, in this part you are right - it does have procedures in its API to do that, so the other clients would not suffer; I discovered that after I posted the statement. But another big issue - and I am not the only one to notice it - remains unresolved: you can listen to broadcast music, you can listen to a TTS, but you cannot do both simultaneously with Snapcast - an ALSA issue. If you browse through the forums, you'd discover that many people have tried, but none succeeded… That is what I pointed out. With VLC you have a media player which coexists with Snapcast and takes control of the speakers Snapclient cannot reach. When you get a message, you 'shift' the music to the background by lowering its volume, and move the TTS to the front, playing it at full volume - the effect is very different from the one where you stop the music, play the message, and continue the music… As an illustration: imagine you've taken your wife for a romantic dinner to a restaurant with live music, and every time a waiter came to your table the band would stop playing - how would that feel?
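A sketch of that 'shift to the background' sequence in HA terms (the entity names are from my earlier example; the volume levels and the fixed delay are placeholders - ideally you would wait for the TTS to finish instead):

duck_and_speak:
  sequence:
    - service: media_player.volume_set
      data:
        entity_id: media_player.mainspeaker_wire
        volume_level: 0.2
    - service: tts.google_translate_say
      data:
        entity_id: media_player.mainspeaker_bt
        message: "{{ msg }}"
    - delay: '00:00:05'
    - service: media_player.volume_set
      data:
        entity_id: media_player.mainspeaker_wire
        volume_level: 0.6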


Just when I finished writing, I came up with an alternative solution to this - multiple snapclients on your RPi would solve the problem of controlling all the speakers. I decided to check whether it has been solved or not, and came across this:

5 years ago people were trying to do that, and yet I could not understand whether this feature is there or not - from the description they provide today on their web page, they still have not implemented it… But as a 'hack' solution, you can try using different network interfaces, as one guy suggested… good luck!

It's definitely down to hardware; probably I was just lucky. Nothing special about my setup - I started with Philips Hue devices only and then added Xiaomi, IKEA and OSRAM devices. Approximately one third of the devices are mains-powered, and in my experience the Hue bulbs have been good routers for the Zigbee mesh.

Yes, this might be true; I remember people found some workaround. But it doesn't apply to my setup, as I only have one pair of speakers per room, so it wouldn't even be possible for me to do this hardware-wise.
But to be honest, I barely use TTS, and even less so when we are listening to music (I don't like the voices, and for most messages I'd rather have them added to my to-do list through an automation instead of a TTS message). I mainly use Snapcast for multi-room audio synced across rooms, or for individual music in different rooms.

The help option for snapclient returns this:

root@raspberrypi:~# snapclient --help
Allowed options:
  --help                          produce help message
  -v, --version                   show version number
  -h, --host arg                  server hostname or ip address
  -p, --port arg (=1704)          server port
  -i, --instance arg (=1)         instance id when running multiple instances on the same host
  --hostID arg                    unique host id, default is MAC address
  -l, --list                      list PCM devices
  -s, --soundcard arg (=default)  index or name of the pcm device
  --latency arg (=0)              latency of the PCM device
  --sampleformat arg              resample audio stream to <rate>:<bits>:<channels>
  --player arg (=alsa)            alsa|pulse|file[:<options>|?]
  --mixer arg (=software)         software|hardware|script|none|?[:<options>]
  -e, --mstderr                   send metadata to stderr
  -d, --daemon [=arg(=-3)]        daemonize, optional process priority [-20..19]
  --user arg                      the user[:group] to run snapclient as when daemonized
  --logsink arg                   log sink [null,system,stdout,stderr,file:<filename>]
  --logfilter arg (=*:info)       log filter <tag>:<level>[,<tag>:<level>]* with tag = * or <log tag> and level = [trace,debug,info,notice,warning,error,fatal]

which implies that with the -i option you may pass a different instance ID to the snapserver. I guess they did implement that feature after all, 4 years ago… I will test it to see whether it is possible to control different speakers in one room via Snapcast.


UPD:

  1. Yes, as expected, they did indeed implement multiple snapclient instances on a single device. It is sufficient to create a service for the second instance with the instance number and output device (see the service sketch after this list). Thus you will be able to use those extra speakers for your multi-room as well.

  2. This is not enough, however, if you want full control over which speaker to use for your TTS. I have read the Mopidy config docs and discovered that by indicating:

[audio]
output = alsasink device=device_name
mixer = alsamixer

in your config, you tell the Mopidy instance which device to use. Therefore, theoretically, for Mopidy you can also run one instance per BT speaker to have a virtual media_player for every speaker - equivalent to VLC with HA, no difference. But since, as you know, Mopidy did not start for me when I tried it, I have already removed it and did not check whether this works.
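For item 1, the service for the second snapclient instance could look roughly like this (a sketch; the unit name, server address and soundcard name are placeholders for your own):

# /etc/systemd/system/snapclient2.service
[Unit]
Description=Snapcast client, second instance
After=network-online.target

[Service]
ExecStart=/usr/bin/snapclient -h 192.168.1.246 -i 2 -s pod1
Restart=on-failure

[Install]
WantedBy=multi-user.target

Then enable it with: sudo systemctl enable --now snapclient2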

Just a brief note on why one should NEVER use multi-room (Snapcast) for TTS (it just came to mind while testing my module for that).

According to the general pattern for TTS usage with Snapcast, you are supposed to open a dedicated source for the TTS and write the messages there, to be broadcast by the Snapclients.

If you plan to do something decent with Rhasspy, you will end up with a speech interface (a voice assistant, as it is called nowadays) for your climate, lighting, engineering, surveillance, landscaping and other stuff - otherwise nobody except you will use it. And the odds are pretty high that you and someone else will want to send commands to your voice assistant at the same time. What would you get as the TTS response? Right - you would both be listening to each other's replies. In the best case, if you implement connect/disconnect procedures to prevent others from hearing messages not addressed to them (and it is tricky - a very tiny bit, but still, you have to write a synchronization engine on top of all that), you would end up with silent periods while waiting for your turn to hear the messages addressed to you. Rhasspy passes the satellite id together with the event to HA, so you have sufficient information to send the TTS back directly to a specific media player, without mixing it with other messages.
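As a sketch of that routing (hypothetical: the event key carrying the satellite id depends on how your Rhasspy intents arrive in HA, and the mapping entries are examples):

- service: script.pass_msg
  data:
    msg: 'Hello, earthling. What a nice day it is on your planet'
    speaker: >-
      {{ {'corridor': 'media_player.mainspeaker_bt',
          'kitchen': 'media_player.kitchen_bt'}[trigger.event.data._site_id] }}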

Good luck!