Same issue here, but I put 2 speakers in my living room.
There needs to be a way to detect which speaker is closest to the voice (possibly by sound intensity) and activate only that one when both pick up the hotword.
My temporary solution is to give the 2 speakers 2 different hotwords. Lol
Yeah, I have the feeling there is no simple solution for this problem. Most probably it would require the devices to communicate with each other, or with Home Assistant, and decide which one to activate based on a Voice Activity Detection value or similar.
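For what it's worth, the arbitration idea could be sketched roughly like this in Python. This is only an illustration of the logic, not how Home Assistant or ESPHome work today: `WakeEvent`, `pick_device`, the intensity score and the 0.5 s window are all made-up names and values, and it assumes each speaker could report how loudly it heard the hotword.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class WakeEvent:
    device_id: str    # hypothetical identifier for one speaker
    timestamp: float  # seconds, when that speaker detected the hotword
    intensity: float  # hypothetical loudness score (higher = louder/closer)


def pick_device(events: List[WakeEvent], window: float = 0.5) -> Optional[str]:
    """Return the id of the speaker that should respond, or None if no events."""
    if not events:
        return None
    first = min(e.timestamp for e in events)
    # Only compare detections close enough in time to be the same utterance.
    candidates = [e for e in events if e.timestamp - first <= window]
    # The loudest detection wins; the other speakers would stay silent.
    return max(candidates, key=lambda e: e.intensity).device_id


events = [
    WakeEvent("kitchen", 10.00, 0.42),
    WakeEvent("living_room", 10.12, 0.81),
    WakeEvent("bedroom", 12.00, 0.99),  # too late, treated as a separate utterance
]
print(pick_device(events))
```

Something central (Home Assistant, presumably) would have to collect the events and wait out the window before answering, so this adds a small delay to every response.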
Triggering the wrong box3 is tricky. As a workaround, you could set a different wake word for each one by setting the wake word detection to ‘on device’ and changing the esphome code for each, but that's less than ideal.
As for the wrong speaker saying the reply, I would suggest using Rob's s3-box firmware, which includes the esphome code for the s3 to actually show up as a media player and therefore be the responding device, using its built-in speaker.
The current dev code also has support for tons of other features like timers and volume control (the default s3 esphome code is set to be very quiet).
And to be honest, I only had a brief look at the pipeline code and don’t have a complete picture yet.
As for the alternative firmware, I agree it is nice and I have been following the thread. But at the moment it is also nice just to get OTA updates without having to run the ESPHome CLI every time.