I’m posting this as a feature request to “sum up” the myriad of issues posted here and there about the aforementioned combination.
After perusing the codebase involved, I can think of several items that could use a few pairs of eyes… This list is likely non-exhaustive, but I only have two eyes myself…
Please don’t post issues here, to keep this thread about possible improvements.
ESP32-audioI2S library
https://github.com/esphome/ESP32-audioI2S
The library is about a year behind the original in terms of commits (hundreds since).
Of particular interest upstream are a few optimizations to PSRAM usage, which would seem useful with newer boards.
i2s_audio / media_player component
https://esphome.io/components/media_player/i2s_audio
Unavailable with esp-idf framework.
speaker is too “disconnected” from Home Assistant in my opinion, especially for a voice assistant.
Makes esp-adf unusable with media_player; I didn’t dig into what it would bring to the table though.
Other
The voice pipeline setup is a bit messy: start, start_continuous, stop, use_wake_word; it all seems a bit “disconnected”. How about some higher-level functions to take care of stopping any active state and switching to the desired one (listen once without wake word / listen for wake word)? And a use_wake_word switch that automatically calls the proper functions (at the right time), to avoid messing with init progress, on_client_connected & other such mechanisms to avoid errors on boot; and that shuts down the voice pipeline appropriately on its own when deactivated. Ideally for the use_wake_word config, the value could be either true/false, or the ID of a template switch (to link its value directly).
The voice pipeline also throws a lot of errors when it doesn’t seem warranted (like “wake word not detected” on deactivation of use_wake_word). Likely linked to the aforementioned “disconnect” between processes.
Bonus: TTS queueing, although easy enough to implement with a couple of wait templates (but requiring a bit of yaml…), could really be handled better at a lower level.
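For what it's worth, the lower-level version of TTS queueing could be as small as the sketch below. Again, this is only an assumption about how it might be done: TtsQueue, enqueue and on_playback_finished are hypothetical names, and the play callback stands in for whatever the speaker or media player actually exposes.

```cpp
#include <cstddef>
#include <deque>
#include <functional>
#include <string>
#include <utility>

// Hypothetical FIFO for TTS clips; not an existing ESPHome class.
class TtsQueue {
 public:
  // `play` stands in for whatever actually starts playback of one clip.
  explicit TtsQueue(std::function<void(const std::string &)> play)
      : play_(std::move(play)) {}

  // Add a clip; it starts immediately only if nothing is playing.
  void enqueue(const std::string &url) {
    pending_.push_back(url);
    play_next_if_idle();
  }

  // To be called when the player reports that the current clip ended.
  void on_playback_finished() {
    playing_ = false;
    play_next_if_idle();
  }

  std::size_t pending() const { return pending_.size(); }
  bool playing() const { return playing_; }

 private:
  void play_next_if_idle() {
    if (playing_ || pending_.empty()) return;
    std::string next = std::move(pending_.front());
    pending_.pop_front();
    playing_ = true;
    play_(next);
  }

  std::function<void(const std::string &)> play_;
  std::deque<std::string> pending_;
  bool playing_ = false;
};
```

The only hook this needs from the playback side is a reliable "finished" event, which is roughly what the wait templates approximate in YAML today.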