Update: Resonate has been renamed to “Sendspin” (i.e. “Sendspin Protocol/Server/Client”):
Just a roadmap heads-up for ESPHome tinkerers: it is not yet clear whether this new experimental “Resonate protocol” will be usable on all ESPHome-based audio output/input devices running on ESP32 (i.e. voice assistants, media players, and smart-speaker hardware), but FYI, it sounds like (pun intended) some Open Home Foundation and Music Assistant developers have started working on a new open-source audio streaming protocol for a better multi-room audio and music playback experience on embedded hardware running ESPHome on ESP32, or Linux on a Raspberry Pi (Zero):
- https://github.com/Resonate-Protocol (Resonate Protocol organization on GitHub including contact information for collaboration)
- GitHub - Sendspin/spec: Specification of the Sendspin protocol (Specification of the Resonate protocol)
- GitHub - Sendspin/aiosendspin: Async Python library implementing the Sendspin Protocol. (Async Python library implementing the Resonate Protocol.)
- GitHub - Sendspin/audio-sdk-js: Proof of concept of how Improv Audio can work (Proof of concept of how Improv Audio can work in TypeScript/JavaScript)
This new protocol specification is designed from scratch, but at a high level it is similar in concept to Squeezelite and Snapcast, as well as Music Player Daemon (MPD). Resonate is not a stand-alone player, but an extension that can turn an existing audio player into a Sonos-like multi-room audio solution, with time synchronized between the clients and the server so they can play perfectly synced audio. In their own tests they claim an average time deviation below 0.05 ms for synced audio (50 microseconds between two ESP32-S3 devices running ESPHome and connected over WiFi). Think of it as a completely royalty-free and open-source alternative to Apple’s AirPlay (formerly AirTunes) and Google Cast (Chromecast Audio) proprietary protocols, but aimed at implementing HiFi multi-room audio solutions.
“Resonate is a multi-room music experience protocol. The goal of the protocol is to orchestrate all devices that make up the music listening experience. This includes outputting audio on multiple speakers simultaneously, screens and lights visualizing the audio or album art, and wall tablets providing media controls.” “Definitions; Server, a Resonate server. Orchestrates all devices. Generates an audio stream, manages all the players, provides metadata etc… Player, a Resonate client that can play audio, visualize audio or album art or provide music controls”.
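To make the server/player split quoted above concrete, here is a minimal, purely illustrative Python sketch of the orchestration idea: one server fans out messages, and each client only reacts to the roles it supports (audio output, metadata display, controls). The message and role names here are my own invention for illustration, not the actual Resonate wire format:

```python
import json
from dataclasses import dataclass, field
from typing import Optional

# Illustrative sketch of the Resonate role model: a server orchestrates many
# clients, and each client advertises which roles it supports. All message
# and role names below are hypothetical, not the real protocol.

@dataclass
class Player:
    name: str
    roles: set  # e.g. {"audio", "metadata", "controls"}
    now_playing: dict = field(default_factory=dict)
    play_at: Optional[int] = None  # scheduled start, in server-clock microseconds

    def handle(self, message: str) -> None:
        msg = json.loads(message)
        if msg["type"] == "metadata" and "metadata" in self.roles:
            self.now_playing = msg["payload"]      # e.g. a wall tablet shows this
        elif msg["type"] == "play" and "audio" in self.roles:
            self.play_at = msg["timestamp_us"]     # start output at a shared clock time

@dataclass
class Server:
    players: list

    def broadcast(self, msg: dict) -> None:
        wire = json.dumps(msg)
        for p in self.players:
            p.handle(wire)  # in reality this would go over a per-client WebSocket

speaker = Player("kitchen-speaker", {"audio"})
tablet = Player("wall-tablet", {"metadata", "controls"})
server = Server([speaker, tablet])

server.broadcast({"type": "metadata", "payload": {"title": "Song A"}})
server.broadcast({"type": "play", "timestamp_us": 1_000_000})

print(speaker.play_at)     # only the audio-role device schedules playback
print(tablet.now_playing)  # only the metadata-role device shows track info
```

In the real protocol the dispatch happens over WebSocket connections, with the server generating the audio stream and timestamping it against a clock shared with all players.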
Resonate Project Board (Backlog and Roadmap):
Btw, the Music Assistant lead developer posted a reply in their discussion section answering the question of why they chose not to just use Snapcast:
For a WIP reference implementation of a server using their aioresonate (Async Python library), see how it is implemented in Music Assistant:
And the ESPHome firmware project will get a new experimental audio client component that adds support for the Resonate Protocol for synchronized music playback across multiple ESPHome-based devices, i.e. an audio synchronizer that enables sound-sync timing for DIY multi-room audio systems:
Note! This is a development snapshot - NOT ready for production use. This PR is intended for testing and feedback purposes only.
Implementation Details:
- Time Synchronization:
  - Employs a Kalman filter to model and compensate for internal clock drift between the client and server
  - Dynamically tracks and adjusts for each device’s clock characteristics to maintain tight synchronization
  - In preliminary testing, achieved a median audio synchronization error of approximately 50 microseconds between two ESP32-S3 devices connected over WiFi
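For context on where the Kalman filter's input comes from (an assumption on my part; the spec defines the actual time messages): each client/server time exchange yields a round-trip, NTP-style offset sample, and samples with a low round-trip delay are more trustworthy than slow ones. A generic sketch of that measurement:

```python
# NTP-style clock-offset measurement: the kind of raw sample a Kalman filter
# would then smooth. This is a generic sketch, not the actual Resonate messages.

def clock_offset_us(t0: int, t1: int, t2: int, t3: int) -> tuple:
    """t0: client send time, t1: server receive time,
    t2: server send time, t3: client receive time (all in microseconds).
    Returns (estimated server-vs-client clock offset, round-trip delay)."""
    offset = ((t1 - t0) + (t2 - t3)) // 2
    delay = (t3 - t0) - (t2 - t1)
    return offset, delay

# Example: server clock exactly 500 us ahead, 1000 us one-way network latency.
offset, delay = clock_offset_us(t0=0, t1=1500, t2=1600, t3=2100)
print(offset, delay)  # 500 2000
```

The delay value is what lets the client weight each sample: a large round trip means the one-way latencies were likely asymmetric, so the offset sample carries more uncertainty.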
- Architecture:
  - The resonate component implements a flexible hub architecture that allows devices to participate in the audio listening experience with different levels of functionality:
    - Text Sensor: Display track metadata without audio playback (e.g., on screens)
    - Media Player: Output synchronized audio
    - Extensible Design: Ready for future integrations like audio visualization
- mDNS:
  - Implements mDNS advertisement for the component’s WebSocket server
  - Enables automatic discovery by Resonate servers
- External Dependencies:
  - Integrates esp-libopus (WIP IDF component) for Opus audio codec support
- Testing:
  - Currently compatible with:
    - Music Assistant Server - feat/improv-audio branch
    - Note: This implementation has known issues but provides basic testing capabilities for synchronized playback
- Example entry for config.yaml:
  - Example Voice Preview Edition firmware is available here: home-assistant-voice-pe/home-assistant-voice.yaml at 66f7fdea2e73ef91156ad579aeeb768ecdfa87e1 · esphome/home-assistant-voice-pe · GitHub (requires building with ESPHome dev branch)
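The actual example configuration lives in the PR and the linked firmware YAML; purely to illustrate the general shape such an entry might take, here is a hypothetical snippet (the component and all option names below are invented placeholders, not the PR's real schema):

```yaml
# Hypothetical illustration only - see the PR for the real config.yaml example.
# Component and option names here are invented placeholders.
resonate:
  id: resonate_client

media_player:
  - platform: resonate   # placeholder platform name
    name: "Living Room Speaker"
```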
Current Status:
- August 28-29th updates add several important changes:
  - Now compatible with the Music Assistant server’s resonate branch
  - Fixes issues with rapid stopping and starting (requires using the mixer and resampler speaker platform changes in this PR)
  - Improves memory handling and safety
  - Adds groundwork to support cover art as an image component in ESPHome
- This PR is intended for testing and feedback purposes only. The implementation contains:
  - Numerous TODO items for simplification and optimization
  - Extensive debug logging for diagnosing synchronization issues
  - Incomplete features pending server-side support (metadata processing, FLAC decoding)
  - Breaking changes expected in future iterations
- Modified Components
Future Work:
- Protocol refinement and stabilization
- Enhanced audio stack to support additional synchronized audio protocols
- Complete metadata and FLAC support
- Production-ready implementation
There were some discussions and further explanations of its concept in another thread that asked about this new multi-room streaming protocol:
Originally posted by marcelveldt in New multi-room streaming protocol from MA VS SnapCast · music-assistant · Discussion #3883 · GitHub
"we decided to create (yes, yet another!) protocol. One that would fulfill all our needs and would work across the internet and suitable for low powered devices like the ESP32. It will support (perfect) synced playback by default but can also switch codecs on the fly, making it suitable to stream to a (remote) browser. Based on websockets ensures it can travel across firewalls, reverse proxies and whatnot. It’s a bit of shame to not use the strength of an existing audio streaming protocol (snapcast) but we ran into so much trouble it became more of a burden than a help. I even considered rewriting Snapcast from scratch in python, so keep the binary protocol the same but just re-implement it with our issues fixed. I’ll make sure we publish the details somewhere soon, currently its still in ideation/POC phase but looking very good." "Be aware that this is nowhere near ready to test something - we are still defining the spec and doing some PoC implementations to test the theory."
If I understand correctly, this new Resonate synchronized-audio component will add a flexible hub architecture for audio time synchronization.
This could solve the problem that you normally need a more expensive hardware solution if you want to add multi-room audio to existing speakers; once this is fully implemented, you could get almost perfect multi-room audio synchronization using something like the Home Assistant Voice Preview Edition connected to external speakers via its stereo output jack:
But a more relevant follow-up feature request for Music Assistant, once this is implemented, would be whether it could add support for different types of multi-room audio (options for single-source/single-zone, single-source/multiple-zone, or multiple-source/multiple-zone playback)?
Ongoing discussion about this new experimental Resonate audio component is happening in this Discord channel:
KevinA:
"Luckily math is here to save the day. A Kalman filter is a nice way to combine measurements with different uncertainties, which is exactly what we have! Basically, if the most recent message has a very low delay, then we weigh its value much more than if it had a very high delay. It gives us a much more stable offset than using the median, but it also gives us an estimated error after each update. This means the client can estimate how accurate its own offset is! This would allow us to slow down or increase the pace we send the time messages. The more confident we are in the offset, the less we need to send - and vice versa. This will be especially helpful when we first connect and need to spam a bunch of time messages to quickly build our confidence in the computed offset.
So being accurate is important, of course, but being stable is also very important. The median was problematic, even if its inaccuracy wasn’t audible. With the median filter, the offset would shift by over 200 microseconds, on average, per update, which means the client would have to constantly be tweaking the audio to try to stay in sync. With very big jumps, it would require hard syncing which is audible.
The Kalman filter modeling the drift (of 2 seconds per day) had an average offset change of only ~25 microseconds per update. Keep in mind, if the clock does drift 2 seconds per day, we would expect it change 23 microseconds per update if the updates are 1 second apart. If there is no systematic drift in the client clock, the Kalman filter had an average change of less than 1 microsecond per update."
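As a rough illustration of the approach KevinA describes (this is my own minimal sketch, not the ESPHome component's actual code), a two-state Kalman filter tracking both clock offset and drift fits in a few dozen lines; the measurement variance parameter stands in for the per-message, delay-based uncertainty he mentions:

```python
# Minimal two-state Kalman filter sketch for the idea described above:
# state = (clock offset, drift rate), measurements = noisy offset samples whose
# variance grows with network delay. Illustrative only, not the real component.

class ClockKalman:
    def __init__(self, process_var: float = 1e-4):
        self.offset = 0.0   # estimated server-vs-client offset (us)
        self.drift = 0.0    # estimated drift (us of offset per us of elapsed time)
        self.p = [[1e12, 0.0], [0.0, 1.0]]  # covariance: start very uncertain
        self.q = process_var

    def update(self, measured_offset: float, meas_var: float, dt: float) -> None:
        # Predict: the offset advances by drift * dt; propagate covariance
        # through the transition F = [[1, dt], [0, 1]].
        self.offset += self.drift * dt
        p = self.p
        p00 = p[0][0] + dt * (p[1][0] + p[0][1]) + dt * dt * p[1][1] + self.q
        p01 = p[0][1] + dt * p[1][1]
        p10 = p[1][0] + dt * p[1][1]
        p11 = p[1][1] + self.q
        # Update: weigh the measurement by its uncertainty - a high-delay
        # sample (large meas_var) gets a small gain, as described above.
        s = p00 + meas_var
        k0, k1 = p00 / s, p10 / s
        resid = measured_offset - self.offset
        self.offset += k0 * resid
        self.drift += k1 * resid
        self.p = [[(1 - k0) * p00, (1 - k0) * p01],
                  [p10 - k1 * p00, p11 - k1 * p01]]

# Feed samples from a clock drifting ~2 s/day (~23 us per 1-second update):
kf = ClockKalman()
true_drift = 2.0 / 86400.0  # us of drift per us of elapsed time
t = 0.0
for _ in range(100):
    t += 1e6  # one second, in microseconds
    true_offset = 1000.0 + true_drift * t
    kf.update(true_offset, meas_var=2500.0, dt=1e6)
print(kf.offset - (1000.0 + true_drift * t))  # residual error, close to zero
```

Because the filter carries a covariance estimate alongside the offset, the client also knows how confident it currently is, which is what enables adaptively slowing down or speeding up the time-message rate as described in the quote.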
Btw, looking at the commit history, it seems like “Resonate” was previously referred to as “improv-audio” internally by its developers:
PS: Somewhat ironically, the Open Home Foundation recently posted a newsletter blog post titled “” and then chose to use xkcd’s classic “Standards” comic strip on how standards proliferate as the thumbnail for the Resonate organization on GitHub. However, with the Open Home Foundation, Home Assistant, Music Assistant, and Nabu Casa founders backing this, they probably have the influence to at least convince many open-source media player implementations to also add support for this new protocol:
