FYI, Nabu Casa’s ESPHome developers are working on such features for an ESP32-based “voice-kit” in an experimental fork that is planned for use in an upcoming official Home Assistant Voice Satellite development kit hardware platform, so you can follow that progress here:
It will use hardware components similar to those in the new “reSpeaker Lite” devkits:
There is a related feature request discussion for Music Assistant here: