New experimental "Sendspin protocol" (formerly "Resonate") for streaming synchronized multi-room audio and HiFi music playback to multiple media player appliances and smart speakers running ESPHome on ESP32 or Linux on Raspberry Pi (Zero)

Update: Resonate has been renamed to “Sendspin” (i.e. “Sendspin Protocol/Server/Client”):

Just a roadmap heads-up to ESPHome tinkerers. I am not sure if this new experimental “Resonate protocol” will be usable on all ESPHome-based audio output/input devices running on ESP32 (i.e. voice assistants, media players, and smart speaker hardware), but FYI, it sounds like (pun intended) some Open Home Foundation and Music Assistant developers have started working on a new open-source audio streaming protocol for a better multi-room audio and music playback experience on embedded hardware running ESPHome on ESP32 or Linux on Raspberry Pi (Zero):

This new protocol specification is designed from scratch, but at a high level it is similar in concept to Squeezelite and Snapcast as well as Music Player Daemon (MPD). Resonate is not a stand-alone player, but an extension that can turn an existing audio player into a Sonos-like multi-room audio solution, with time synchronized between the clients and the server to play perfectly synced audio. In their own preliminary tests they report a median synchronization error of approximately 0.05 ms (50 microseconds) between two ESP32-S3 devices running ESPHome and connected over WiFi. Think of it as a completely royalty-free and open-source competitor to the proprietary Apple AirPlay (formerly AirTunes) and Google Cast (Chromecast Audio) protocols, but for implementing HiFi multi-room audio solutions.

“Resonate is a multi-room music experience protocol. The goal of the protocol is to orchestrate all devices that make up the music listening experience. This includes outputting audio on multiple speakers simultaneously, screens and lights visualizing the audio or album art, and wall tablets providing media controls.” “Definitions: Server, a Resonate server. Orchestrates all devices. Generates an audio stream, manages all the players, provides metadata etc… Player, a Resonate client that can play audio, visualize audio or album art or provide music controls.”

Resonate Project Board (Backlog and Roadmap):

Btw, the Music Assistant lead developer posted a reply in their discussion section answering the question of why they chose not to just use Snapcast:

For a WIP reference implementation of a server using their aioresonate (async Python) library, see how it is implemented in Music Assistant:

And the ESPHome firmware project will get a new experimental audio client component that adds support for the Resonate Protocol for synchronized music-playback across multiple ESPHome-based devices, i.e. audio synchronizer to enable sound-sync timing for DIY multi-room audio systems:

:warning: Note! This is a development snapshot - NOT ready for production use. This PR is intended for testing and feedback purposes only. :warning:

Implementation Details:

  • Time Synchronization:

    • Employs a Kalman filter to model and compensate for internal clock drift between the client and server
    • Dynamically tracks and adjusts for each device’s clock characteristics to maintain tight synchronization
    • In preliminary testing, achieved a median audio synchronization error of approximately 50 microseconds between two ESP32-S3 devices connected over WiFi.
  • Architecture:

    • The resonate component implements a flexible hub architecture that allows devices to participate in the audio listening experience with different levels of functionality:
      • Text Sensor: Display track metadata without audio playback (e.g., on screens)
      • Media Player: Output synchronized audio
      • Extensible Design: Ready for future integrations like audio visualization
  • mDNS:

    • Implements mDNS advertisement for the component’s WebSocket server
    • Enables automatic discovery by resonate servers
  • External Dependencies:

    • Integrates esp-libopus (WIP IDF component) for Opus audio codec support
  • Testing:

  • Current Status:

    • The August 28-29th updates add several important changes:
      • Now compatible with the Music Assistant server’s resonate branch
      • Fixes issues with rapid stopping and starting (requires using the mixer and resampler speaker platform changes in this PR)
      • Improves memory handling and safety
      • Adds groundwork to support cover art as an image component in ESPHome
    • This PR is intended for testing and feedback purposes only. The implementation contains:
      • Numerous TODO items for simplification and optimization
      • Extensive debug logging for diagnosing synchronization issues
      • Incomplete features pending server-side support (metadata processing, FLAC decoding)
      • Breaking changes expected in future iterations
  • Modified Components:
  • Future Work:

    • Protocol refinement and stabilization
    • Enhanced audio stack to support additional synchronized audio protocols
    • Complete metadata and FLAC support
    • Production-ready implementation
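
For anyone wondering what the "Time Synchronization" bullets above boil down to, here is a minimal Python sketch of the kind of raw measurement such a filter could be fed. It assumes an NTP-style exchange of four timestamps and a symmetric network delay; the PR description does not spell out the actual message format, so `TimeExchange` and `estimate_offset_and_delay` are hypothetical names of mine and not part of the Resonate code.

```python
from dataclasses import dataclass


@dataclass
class TimeExchange:
    """Four timestamps from one request/response round trip (microseconds).

    Hypothetical structure for illustration, not the real wire format.
    """
    client_sent: int      # client clock when the request left
    server_received: int  # server clock when the request arrived
    server_sent: int      # server clock when the reply left
    client_received: int  # client clock when the reply arrived


def estimate_offset_and_delay(x: TimeExchange) -> tuple[float, int]:
    """Return (offset, round_trip_delay) in microseconds.

    offset is how far the server clock is ahead of the client clock,
    assuming the network delay is the same in both directions.
    """
    offset = ((x.server_received - x.client_sent)
              + (x.server_sent - x.client_received)) / 2
    # Time spent on the network only: total round trip minus server processing.
    delay = (x.client_received - x.client_sent) - (x.server_sent - x.server_received)
    return offset, delay


# Example: server clock is 1000 us ahead, 300 us each way, 50 us server processing.
exchange = TimeExchange(client_sent=0, server_received=1300,
                        server_sent=1350, client_received=650)
offset, delay = estimate_offset_and_delay(exchange)
print(offset, delay)
```

A single exchange like this is noisy on WiFi because the two directions are rarely equally fast, which is presumably why the component filters many of them instead of trusting any one measurement.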

There was some discussion and further explanation of the concept in another thread that asked about this new multi-room streaming protocol:

Originally posted by marcelveldt in “New multi-room streaming protocol from MA VS SnapCast” (music-assistant Discussion #3883 on GitHub):

"we decided to create (yes, yet another!) protocol. One that would fulfill all our needs and would work across the internet and suitable for low powered devices like the ESP32. It will support (perfect) synced playback by default but can also switch codecs on the fly, making it suitable to stream to a (remote) browser. Based on websockets ensures it can travel across firewalls, reverse proxies and whatnot. It’s a bit of shame to not use the strength of an existing audio streaming protocol (snapcast) but we ran into so much trouble it became more of a burden than a help. I even considered rewriting Snapcast from scratch in python, so keep the binary protocol the same but just re-implement it with our issues fixed. I’ll make sure we publish the details somewhere soon, currently its still in ideation/POC phase but looking very good." "Be aware that this is nowhere near ready to test something - we are still defining the spec and doing some PoC implementations to test the theory."

If I understand correctly, this new Resonate synchronized audio component will add a flexible hub architecture for audio time synchronization.

This could solve the problem that you normally need a more expensive hardware solution if you want to add multi-room audio to existing speakers. Once this is fully implemented, you could have almost perfect multi-room audio synchronization using something like the Home Assistant Voice Preview Edition connected to external speakers via its stereo output jack:

A more relevant follow-up feature request for Music Assistant, once this is implemented, would be to add support for different types of multi-room audio (with options for single source-single zone, single source-multiple zone, or multiple source-multiple zone playback).

Discussion about this new experimental Resonate audio component is ongoing in this Discord channel:

KevinA:
"Luckily math is here to save the day :laughing: . A Kalman filter is a nice way to combine measurements with different uncertainties, which is exactly what we have! Basically, if the most recent message has a very low delay, then we weigh its value much more than if it had a very high delay. It gives us a much more stable offset than using the median, but it also gives us an estimated error after each update. This means the client can estimate how accurate its own offset is! This would allow us to slow down or increase the pace we send the time messages. The more confident we are in the offset, the less we need to send - and vice versa. This will be especially helpful when we first connect and need to spam a bunch of time messages to quickly build our confidence in the computed offset.

So being accurate is important, of course, but being stable is also very important. The median was problematic, even if its inaccuracy wasn’t audible. With the median filter, the offset would shift by over 200 microseconds, on average, per update, which means the client would have to constantly be tweaking the audio to try to stay in sync. With very big jumps, it would require hard syncing which is audible.

The Kalman filter modeling the drift (of 2 seconds per day) had an average offset change of only ~25 microseconds per update. Keep in mind, if the clock does drift 2 seconds per day, we would expect it change 23 microseconds per update if the updates are 1 second apart. If there is no systematic drift in the client clock, the Kalman filter had an average change of less than 1 microsecond per update."
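
To make the idea in the quote above a bit more concrete, here is a rough Python sketch of a two-state Kalman filter that tracks clock offset and drift and trusts low-delay measurements more than high-delay ones. This is my own illustration and not the actual Resonate/ESPHome code: the class name `OffsetDriftKalman`, the initial variances, and the choice of half the round-trip delay as the measurement uncertainty are all assumptions. The simulation uses noise-free measurements, so it only shows that the filter locks onto a steady drift, not how it behaves on a real network.

```python
# Drift from the quote: 2 seconds per day, expressed in microseconds per second.
DRIFT_US_PER_S = 2_000_000 / 86_400  # about 23 us of drift per second


class OffsetDriftKalman:
    """Two-state Kalman filter: clock offset (us) and drift (us per second)."""

    def __init__(self, offset_var=1e12, drift_var=1e6, drift_process_var=1e-6):
        self.offset = 0.0
        self.drift = 0.0
        # Symmetric 2x2 covariance stored as three numbers.
        self.p00 = offset_var
        self.p01 = 0.0
        self.p11 = drift_var
        self.q = drift_process_var  # how quickly the drift itself may wander

    def update(self, measured_offset, round_trip_delay, dt):
        """Fold in one offset measurement taken dt seconds after the last one.

        Returns (offset estimate, estimated standard error), both in us.
        """
        # Predict: the offset moves by drift * dt, and uncertainty grows.
        self.offset += self.drift * dt
        p00 = self.p00 + 2.0 * dt * self.p01 + dt * dt * self.p11
        p01 = self.p01 + dt * self.p11
        p11 = self.p11 + self.q * dt

        # Assumption: measurement uncertainty scales with the round-trip delay.
        r = (round_trip_delay / 2.0) ** 2

        # Update: the gain is small when r is large (slow, untrustworthy message).
        innovation = measured_offset - self.offset
        s = p00 + r
        k0 = p00 / s
        k1 = p01 / s
        self.offset += k0 * innovation
        self.drift += k1 * innovation
        self.p00 = (1.0 - k0) * p00
        self.p01 = (1.0 - k0) * p01
        self.p11 = p11 - k1 * p01
        return self.offset, self.p00 ** 0.5


def simulate(steps=60):
    """Feed the filter one noise-free measurement per second of a drifting clock."""
    kf = OffsetDriftKalman()
    true_offset = 5000.0
    for _ in range(steps):
        true_offset += DRIFT_US_PER_S  # one second passes
        kf.update(true_offset, round_trip_delay=200.0, dt=1.0)
    return kf, true_offset


kf, true_offset = simulate()
print(f"offset error: {kf.offset - true_offset:.4f} us, drift: {kf.drift:.3f} us/s")
```

The value returned alongside the offset is the filter's own estimate of its error, which is the number the quote suggests could be used to decide how often time messages need to be sent.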

Btw, looking at the commit history, it seems that “Resonate” was previously referred to as “improv-audio” internally by its developers:

PS: Somewhat ironically, the Open Home Foundation recently posted a newsletter blog post titled “” and then chose to use xkcd’s classic “Standards” comic strip, about how standards proliferate, as the thumbnail for the Resonate organization on GitHub. However, with the Open Home Foundation, Home Assistant, Music Assistant, and the Nabu Casa founder backing this, they probably have the influence to at least convince many open-source media player implementations to also add support for this new protocol:

I am mega excited for this!

Nice :slight_smile: Does anyone have an idea what realistic expectations would be for the sound quality of the underlying Opus library in a WiFi/ESP32 setup?

I believe they are at a stage where the core parts of the software pipeline will not be the bottleneck for achieving the highest possible sound quality. Instead, that will depend mostly on the hardware used in the overall audio pipeline, meaning the choice of DAC (digital-to-analog converter) chip and the circuits used on the PCB, etc…

That said, I do not think that the current Home Assistant Voice Preview Edition hardware reference design has the best possible components to achieve top-of-the-line Hi-Fi quality on its analog audio output jack, but I have seen mods that add a digital SPDIF/TOSLINK audio output port (via the Grove port) so that you can connect it to an external DAC instead of using the built-in one. See example:

and

In fact, I do hope that the next generation of Home Assistant Voice Preview Edition hardware will feature an S/PDIF optical audio output port by default as an alternative to the analogue audio output.

If they do not add it by default, then I hope that they consider adding a USB audio output adapter, which may be possible now that ESPHome has added “USB Host” and “USB UART” features:

By the way, also remember that the Home Assistant Voice PE is only meant to act as an initial reference hardware design, and anyone, including other companies, can use this technology and these designs as a base to create alternative hardware solutions:

As the Resonate protocol is open-source software, others can alternatively just add support for it to different hardware solutions, so they do not even have to be based on ESP32 or use ESPHome.

PS: In the end, the final audio quality from your speakers will obviously also depend on both the audio source (hence the saying “crap in, crap out”) and your other external audio equipment. So if you are a true audiophile, you also need to spend time and money on all of the hardware.

FYI, the Music Assistant 2.7.0 BETA / NIGHTLY releases (i.e. Music Assistant Server) have now added an initial (experimental) MA “music provider” for this new Resonate streaming protocol:

Note that no Resonate clients have been released yet, so for now you need to build those yourself if you want to experiment with this:

Note, however, that it does not yet appear to be documented under music providers:

FYI; the implementation of Resonate has now been renamed to “Sendspin” (i.e. “Sendspin Audio”):

It is however not clear if that will be the final name, so the name and specs are still subject to change.

You can already stream audio/music via Sendspin using the beta version of Music Assistant (server):

Development on the Sendspin component for ESPHome has been refactored in this new PR here:

You can now test alpha/pre-release builds of ESPHome with it for the Home Assistant Voice Preview Edition:

Or you can already run a Resonate command-line server and client powered by resonate-go yourself:

If you have any feedback, join the #sendspin-beta-testing channel on the Music Assistant Discord:

PS: It sounds as if the Sendspin component will not be merged into ESPHome mainline until next year.

Quoting Sendspin/Resonate lead developer (maximmaxim345):

The protocol specification is not yet fully finalized, but we are getting very close to that. As for implementations, we will soon have:

  • An almost fully spec compliant python server library (now called aioresonate)
  • A Music Assistant Provider using aioresonate
  • An almost fully spec compliant VPE Alpha Firmware
  • A spec compliant python client (now called aioresonate-cli)
  • A minimal but spec compliant js library for players
  • Experimental Resonate Support in the Music Assistant Frontend using that js library
  • A proof of concept Google Cast implementation of Resonate, including synchronized audio
  • A highly experimental option to use Resonate for Chromecast devices in Music Assistant

FYI, blog + video about Music Assistant 2.7 has more info about the official launch of Sendspin:

Introducing Sendspin

For some time, the Music Assistant team has been looking for the best way to stream audio, album art, and other music visualizations to the devices we have around our homes. There are a couple of projects out there doing cool stuff with streaming audio, but not any that fit our needs. So, when it doesn’t exist, it’s time to start building.

Introducing Sendspin, a new multimedia streaming and synchronizing protocol. It’s fully open source and free to use. Sendspin can stream high-fidelity audio, album art, and visualizer data, automatically adapting to each device’s capabilities. Imagine an e-paper display showcasing the album cover, while multiple speakers play in sync, and smart lights pulse to the rhythm.

The best way to use it right now is either via your browser or a Home Assistant Voice Preview Edition running beta firmware. We’ve built the experimental ability to use Sendspin on Google Cast-capable speakers (we’re also looking to do the same with AirPlay-capable speakers), which will allow Sendspin to work with a lot of different hardware.

A big thanks to Maxim and Kevin at the Open Home Foundation, who have been instrumental in making Sendspin a reality. Even though it can do some impressive stuff today, it’s very much a tech preview, and this announcement is our call to all developers and DIY audio hobbyists: we need your help building and testing this. This is the spec, start building with it!

All the best things in life are meant to be shared, and your music should be as free and open as the software we love. So spin that record :cd:, drop the needle, and send that music across your entire home

Is there a limitation on which devices this protocol works with ATM? Is it only the S3 boards, or are others supported as well? I have an ESPHome audio player that currently works with the ‘speaker’/i2s_audio protocol. I’m experiencing some playback problems when exposing it to Music Assistant, and would like to test if Sendspin works better.
The board is listed in the Yaml as board: mhetesp32minikit

Sendspin for ESPHome is, as far as I know, based on the S3, primarily the VPE, although there is some success with non-S3 boards. I am having some success with the esp32 s n16r8 and MAX98357, although I still get the odd crash and reboot.

You will need to wade through the Music Assistant Discord to find the answers. There is also some discussion here

Just a point, if you are having playback problems with your setup try adding this to your wifi section.

wifi:
  power_save_mode: none

Thanks for the suggestion, I’ll give it a try. The problems I have are short stop-go interruptions, accompanied by a bit of noise. It seems more like a buffering/time sync problem than going completely offline. Is this to be expected as a result from power saving modes?

I have no idea what causes the stuttering and crackling that I was suffering from on the Sendspin media players I am testing. I read a comment suggesting this setting, and it made them far better: not perfect, but usable.

That question is not general but specific to each software/code implementation, so you should really ask it on GitHub in the pull request for the mainstream ESPHome codebase → [sendspin] (WIP) Add Sendspin synced audio protocol support by kahrendt · Pull Request #12284 · esphome/esphome · GitHub

That is, while there can and will be limitations in each software/code implementation, there are no such limitations in the specification, which is why the limitations are implementation-specific.

I would like to extend my most heartfelt thanks to the Open Home team who put together this fantastic innovation called Sendspin.

Now we can have multiple media players in the home synchronized perfectly without using shitty protocols that were often buggy. Synchronization is perfect across even Web players!

Furthermore, it’s never been easier to put together a very cheap and high quality media player thanks to the support for this protocol in ESP32 devices. And of course, the ability for those devices to output sound to optical SPDIF outputs (soon thanks to John).

I can finally ditch the output to AirPlay in my Marantz prepro, which was a catastrophe — slow to react, buggy, and half of the time it would sound bad for some reason I could never find out.

Thanks again!

Thanks for clarifying! I think I will wait until that PR is merged, but it looks promising!

It was worth a try. But didn’t change anything on my device (not Sendspin)

Has anybody tried Sendspin with a P4 CPU? It builds for me, but I get a linking error when ESPHome is creating the image for my board.

Yep, after extensive testing I am not sure it’s making a difference with mine either. But crackling and breaking-up issues are often attributed to a weak WiFi signal, so it may be worth looking into that.

Again, such a question is not general but specific to each software/code implementation, so in this case you should really ask it on GitHub in the pull request for the mainstream ESPHome codebase → [sendspin] (WIP) Add Sendspin synced audio protocol support by kahrendt · Pull Request #12284 · esphome/esphome · GitHub

My ESP is literally 10 cm away from my wireless AP. I am pretty sure this is not the issue.

I don’t see why asking users whether they have tried Sendspin on a particular type of ESPHome device is not relevant in a topic announcing that Sendspin is available for ESP, in a forum about ESPHome.
Sure, there are also ESPHome users on GitHub, but I expect many more of them here. At the very most you could ask the poster to start their own topic in this forum, but referring them to another site, for which they may not have or want an account, seems unwarranted in this case.