"Zoom room"-style automatic starting of video calls?

Hi there,

I am on a quest to get effortless video calling to work for my family, but am really struggling and wondering if I have the right approach. I have a mini Windows 11 PC connected to a TV with a webcam, and I bought an HDMI-CEC adapter from Pulse-Eight in the hopes of achieving something like this:

  1. Computer receives a signal (MQTT, HTTP, whatever)
  2. It turns on the TV and sets it to the right input
  3. It kills all active browser instances
  4. Opens a new browser instance full screen
  5. Navigates to a Google Meet URL
  6. Joins the call

Then I can create an automation in Home Assistant to send the HTTP or MQTT message to start the call.
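
A minimal sketch of the HTTP trigger side, using only the Python standard library. The `/start-call` path, the port, and the `on_trigger` callback are hypothetical names chosen for illustration; the callback would be whatever function runs steps 2 to 6.

```python
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer


def make_server(on_trigger, host="127.0.0.1", port=8765):
    """Build an HTTP server that calls on_trigger() on POST /start-call.

    The path and default port are assumptions, not anything Home Assistant
    requires. Pass host="0.0.0.0" to accept requests from other machines.
    """

    class Handler(BaseHTTPRequestHandler):
        def do_POST(self):
            if self.path != "/start-call":
                self.send_error(404)
                return
            # Run the slow work (TV on, browser launch) in a background
            # thread so the caller gets its reply straight away.
            threading.Thread(target=on_trigger, daemon=True).start()
            self.send_response(202)  # 202 Accepted: work has been queued
            self.send_header("Content-Length", "0")
            self.end_headers()

        def log_message(self, fmt, *args):
            pass  # keep the console quiet

    return HTTPServer((host, port), Handler)


# Typical use (blocks forever), with start_call being your own function:
#   make_server(start_call, host="0.0.0.0").serve_forever()
```

On the Home Assistant side, something like the `rest_command` integration could then POST to this endpoint from an automation.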

From playing around with cec-client, I think the CEC stuff should work, and the web server bit is trivial. I can also kill all MS Edge or Chrome instances easily enough.
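
For concreteness, a rough sketch of the CEC and browser-killing steps driven from Python. It assumes `cec-client` is on the PATH, that the TV is CEC logical address 0, and that the browsers run as `msedge.exe` and `chrome.exe`; the function names are just illustrative.

```python
import subprocess

CEC_CLIENT = "cec-client"  # assumption: libCEC's cec-client is on the PATH


def cec_send(command, run=subprocess.run):
    """Pipe one command to cec-client in single-command mode (-s).

    -d 1 keeps the log output down to errors.
    """
    return run(
        [CEC_CLIENT, "-s", "-d", "1"],
        input=command + "\n",
        text=True,
        capture_output=True,
        timeout=30,
    )


def tv_on_and_select_input(run=subprocess.run):
    """Power on the TV, then announce this adapter as the active source."""
    cec_send("on 0", run=run)  # "on 0": power on logical address 0 (the TV)
    cec_send("as", run=run)    # "as": make the CEC adapter the active source


def kill_browsers(images=("msedge.exe", "chrome.exe"), run=subprocess.run):
    """Force-kill every instance of each browser, including child processes."""
    for image in images:
        # /F = force, /T = include child processes, /IM = match by image name
        run(["taskkill", "/F", "/T", "/IM", image], capture_output=True)
```

The `run` parameter is only there so the functions can be exercised without real hardware. Whether the TV actually switches input on the `as` command depends on the TV's own CEC implementation.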

However, they tend to go through a recovery/restore workflow on the next start, which is frustrating. The main problem, though, is the final step of joining the call. I’m trying to use pyautogui to locate the “Join now” or “Ask to join” buttons and then click them, but it can’t find them, maybe because they take too long to appear. And there are so many reasons why they might not even appear at all, like if the browser isn’t logged into the Google account or loses its camera/microphone permissions.
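
If the issue really is timing, one option is to poll for the button with a deadline rather than looking once. The sketch below assumes two hypothetical screenshot files, `join_now.png` and `ask_to_join.png`, cropped from the actual buttons; `wait_for` and `click_first_button` are illustrative names, not part of pyautogui.

```python
import time


def wait_for(locate, timeout=60.0, interval=1.0,
             sleep=time.sleep, clock=time.monotonic):
    """Call locate() repeatedly until it returns non-None or time runs out."""
    deadline = clock() + timeout
    while True:
        found = locate()
        if found is not None:
            return found
        if clock() >= deadline:
            return None  # gave up; the caller decides what to do next
        sleep(interval)


def click_first_button(images=("join_now.png", "ask_to_join.png"),
                       timeout=60.0):
    """Click whichever button image shows up first; False if none appears."""
    import pyautogui  # imported lazily so wait_for works without a display

    def locate():
        for image in images:
            try:
                point = pyautogui.locateCenterOnScreen(image)
            except pyautogui.ImageNotFoundException:
                point = None  # newer versions raise instead of returning None
            if point is not None:
                return point
        return None

    point = wait_for(locate, timeout=timeout)
    if point is None:
        return False
    pyautogui.click(point[0], point[1])
    return True
```

A `False` return at least gives the automation something to act on (for example, notifying a phone) in the cases where the button never appears, such as a logged-out browser or missing camera permission. Image matching remains sensitive to display scaling and theme changes, so this only addresses the timing part of the problem.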

Any ideas or thoughts on making a more reliable solution for this? I’m aware of a few other solutions:

  • Meta Portal TV, which is a really good product but is limited to WhatsApp and is in any case discontinued.
  • OnScreen seems to be able to do exactly what I want, using Zoom, but would be really expensive ($30/month, plus a Zoom subscription), and I’m not sure I can trigger it outside of their proprietary mobile app.
  • CallGenie is very similar to OnScreen except it requires using Skype, of all things.
  • Zoom Rooms is basically the gold standard and works flawlessly at my workplace, but costs megabucks.

Any thoughts?