I’ve just set up an ESP32-S3-BOX as well, flashed with Willow. In HA I use the Willow Application Server add-on for speech recognition and the Amazon Polly integration for TTS, with responses played through the media player integration on Sonos speakers (no microphone needed). The results are remarkably good, but cloud-dependent, of course.
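
In case it helps anyone test the TTS half on its own, here is a rough sketch of triggering a spoken response through Home Assistant's REST API. It is not taken from my actual config: the host, token, and entity name are placeholders, and the service name `tts.amazon_polly_say` assumes Polly is set up as a YAML TTS platform (your install may expose a different TTS service).

```python
import json
import urllib.request


def build_tts_request(base_url, token, speaker_entity, message):
    """Build (but do not send) a Home Assistant service call that speaks
    `message` on `speaker_entity`.

    Assumption: Amazon Polly is configured as a TTS platform, so the
    service is tts.amazon_polly_say. Adjust if your setup differs.
    """
    url = f"{base_url.rstrip('/')}/api/services/tts/amazon_polly_say"
    payload = {"entity_id": speaker_entity, "message": message}
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            # Long-lived access token created from your HA user profile.
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


if __name__ == "__main__":
    # All values below are hypothetical placeholders.
    req = build_tts_request(
        "http://homeassistant.local:8123",
        "YOUR_LONG_LIVED_TOKEN",
        "media_player.sonos_kitchen",
        "The kitchen light is now on.",
    )
    with urllib.request.urlopen(req, timeout=10) as resp:
        print(resp.status)
```

Building the request separately from sending it makes it easy to check the URL and payload before anything hits the speakers.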