Detect Someone Talking - Not a Voice Assistant

Good afternoon. First time posting.
I’ve been using HA for about a year or so now. I transitioned from Hubitat. Love HA!

I have a need for a hardware/HA solution. Let me give the use case.
My wife works from home in the she-shed we built as her home office. I have it fully automated inside and outside with lights, heated floor, security, etc. Alexa is also integrated. As an executive she is constantly in meetings, so I'm never quite sure when she is free. I considered having her update a local calendar with her meeting schedule, but I didn't want to add to her daily workload, and she often has ad hoc calls/meetings that won't be on any calendar.

So, I want to build a little circuit, probably with an ESP32 and a microphone, that will detect when there are active voices in the office or on her speakerphone. No wake word or any specific person's voice - just any voice within a rolling window of the previous, say, 15 seconds. Any ideas on what I would need to do to build such a circuit? Once built, I am sure I can use ESPHome to handle the HA interface.
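
To make the rolling-window idea concrete, here is a rough sketch of the logic in Python. It is only an illustration, not ESP32 firmware: the class name, the method names, and the RMS threshold of 500 are all made up for the example, and it assumes the microphone delivers chunks of signed 16-bit samples. It is also plain loudness detection, so any loud enough sound would trigger it, not just voices.

```python
import math


class VoiceActivityWindow:
    """Reports 'active' if a loud chunk was heard within the last window_s seconds.

    Hypothetical helper for illustration; the threshold is a placeholder
    that would need tuning against the real microphone and room.
    """

    def __init__(self, window_s=15.0, threshold_rms=500.0):
        self.window_s = window_s
        self.threshold_rms = threshold_rms
        self.last_active = None  # timestamp of the most recent loud chunk

    @staticmethod
    def rms(samples):
        """Root-mean-square level of one chunk of audio samples."""
        if not samples:
            return 0.0
        return math.sqrt(sum(s * s for s in samples) / len(samples))

    def update(self, samples, now):
        """Feed one chunk with its timestamp (seconds); return the current state."""
        if self.rms(samples) >= self.threshold_rms:
            self.last_active = now
        return self.is_active(now)

    def is_active(self, now):
        """True if the last loud chunk is no older than the rolling window."""
        return (
            self.last_active is not None
            and (now - self.last_active) <= self.window_s
        )
```

Feeding it a loud chunk at t=1 s keeps the state "active" through t=16 s, after which quiet chunks let it drop back to "free". On the device itself, the same behavior could probably be had from a sound-level sensor plus a 15-second off delay rather than custom code.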

Thanks in advance!