SIP Doorbell Client with ESPHome

I have two separate buttons for my doorbell. One sits at the front gate, close to the street, and another one is positioned right at the front door. These are simple momentary switches with wires running into the utility room in the basement, where a transformer steps 230 V down to low voltage. In the past, these buttons triggered an old-fashioned bell.
A couple of years back, however, I installed a Telegärtner TS2-a/b, which is connected to a DECT module communicating with my Fritz!Box on the ground floor. See the schematic picture below:

This is what the current wiring looks like:

Now my two Fritz!Fon handsets start ringing whenever someone rings the bell. Great!
What I cannot see is which of the two buttons was pressed. The DoorLine does have an input for a second button, but it can only be configured to call a different phone number when that button is pushed (i.e. the use case here is multi-flat houses).

Additionally, I have a Netatmo Presence at the front door. The Fritz!Fon is able to access the live feed of the camera, but I cannot trigger it to show the live feed when the doorbell is “calling”. In the Fritz!Box this can only be configured for SIP doorbells.

So my question (and potential project …) would be:
Can I connect the momentary pushbuttons to an ESP, or anything else, that runs a SIP client able to register with the Fritz!Box’s SIP server, so that I can configure the live feed of the Netatmo camera to be shown when someone pushes the doorbell?
Ideally this ESP would also forward the doorbell events to Home Assistant to be further processed.

So far I have come across this little add-on called DSS VoIP Notifier. Having installed it on my Home Assistant instance, I am able to:

  • call my landline phones from Home Assistant
  • bind the camera live feed to the SIP client using the Fritz!Box web UI

So theoretically I could now remove the transformer, the Telegärtner TS2-a/b and the DECT module and replace them with an ESP or Shelly that sends me the events/states of the pushbuttons at the front door and the gate.
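
To make the ESP variant a bit more concrete, here is a minimal ESPHome sketch of how the two pushbuttons could be read and forwarded to Home Assistant. It assumes an ESP32 dev board, that the buttons are rewired as dry contacts between a GPIO and GND (i.e. no longer on the transformer circuit), and that GPIO18/GPIO19 are free. The device name, board, pins and entity names are all placeholders I picked for illustration, not something from my actual setup.

```yaml
esphome:
  name: doorbell            # hypothetical device name

esp32:
  board: esp32dev           # assumption: generic ESP32 dev board

wifi:
  ssid: !secret wifi_ssid
  password: !secret wifi_password

# Native API: exposes the sensors below to Home Assistant
api:

binary_sensor:
  # Button at the front door, wired between GPIO18 and GND (assumed pin)
  - platform: gpio
    name: "Doorbell Front Door"
    pin:
      number: GPIO18
      mode: INPUT_PULLUP
      inverted: true        # pressed pulls the pin low, report it as "on"
    filters:
      - delayed_on: 50ms    # simple debounce for the mechanical contact
      - delayed_off: 50ms

  # Button at the front gate, wired between GPIO19 and GND (assumed pin)
  - platform: gpio
    name: "Doorbell Gate"
    pin:
      number: GPIO19
      mode: INPUT_PULLUP
      inverted: true
    filters:
      - delayed_on: 50ms
      - delayed_off: 50ms
```

With long cable runs from the gate, the internal pull-up may be too weak against interference, so an external pull-up resistor or an optocoupler is probably the safer choice.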

This solution would be very simple and would allow me to:

  • ring the phones when someone rings the doorbell, as before
  • show the camera feed on the phone to see whether that person is at the door or at the gate
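
On the Home Assistant side, the two points above could be sketched as an automation that calls the phones through the add-on when a button is pressed. This is only a sketch under assumptions: the entity ID `binary_sensor.doorbell_gate` is hypothetical, the add-on slug and the `call_sip_uri` / `message_tts` keys are what I remember from the DSS VoIP Notifier README and should be checked against the installed version, and `**9` is assumed to be the Fritz!Box internal group number that rings all handsets.

```yaml
automation:
  - alias: "Ring phones when the gate button is pressed"
    trigger:
      - platform: state
        entity_id: binary_sensor.doorbell_gate   # hypothetical entity ID
        to: "on"
    action:
      # Pass the call request to the add-on via its stdin
      - service: hassio.addon_stdin
        data:
          addon: 89275b70_dss_voip               # assumption: verify the slug of your install
          input:
            call_sip_uri: "sip:**9@fritz.box"    # assumption: internal group call
            message_tts: "Someone is at the gate"
```

A second automation with the front door sensor and a different message would then tell the two buttons apart.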

I would lose the following features:

  • Talking with the person at the gate. The current TS2-a/b has a speaker and microphone connected, but the audio quality has always been miserable (i.e. we could not understand each other), so this function has never been used.
  • Opening the door at the gate remotely via a button press on the phone. (However, that function has also rarely been used, as you can simply reach over the door and open it from the inside.)
You can also add a relay to your ESP and connect it to your electric lock, or wire it in parallel if you already have a switch that opens the lock…
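
A rough idea of what the relay suggestion could look like in ESPHome, as a fragment to add to an existing device config. It assumes a relay module on GPIO23 that switches the lock's own supply, and a lock that only needs a short pulse. The pin, ID, name and the one-second pulse length are placeholders to adapt.

```yaml
switch:
  - platform: gpio
    id: gate_lock_relay         # hypothetical ID
    name: "Gate Lock"
    pin: GPIO23                 # assumed pin driving the relay module
    restore_mode: ALWAYS_OFF    # never energize the lock after a reboot
    on_turn_on:
      # Release the relay again after a short pulse
      - delay: 1s
      - switch.turn_off: gate_lock_relay
```

The switch then shows up in Home Assistant and can be toggled from there, while an existing manual switch wired in parallel keeps working independently of the ESP.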