Converting a Tapo RTSP/ONVIF video stream to an audio-only stream for Voice satellites or an LMS player?

I have a Tapo C210 camera, connected to HA through @JurajNyiri's HACS integration (JurajNyiri/HomeAssistant-Tapo-Control on GitHub: Control for Tapo cameras as a Home Assistant component). This camera is our baby monitor. As the ONVIF notifications aren’t implemented by TP-Link, the only solution we currently have is streaming the live video. Is it also possible to take only the audio and stream it to an HA Voice satellite, an LMS player, or another connected audio playback device? In some situations it would be enough to hear the audio instead of watching the video.

I would also be happy with a direct solution, running as a Python script or just a command-line call. I think it is also possible to run scripts on an ESP32 or Raspberry Pi from Home Assistant, for example as a button for my wife ;). The camera has a fixed IP, and maybe the unofficial pytapo library could help.
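
To make it clearer what I have in mind, here is a rough, untested sketch that only builds an ffmpeg command which drops the video track and re-encodes the audio to MP3. It assumes that ffmpeg is installed, that the camera serves RTSP on port 554 under `/stream1` with the camera account credentials, and that MP3 at 64 kbit/s is acceptable to the player. The function names, the example IP, and the default output `pipe:1` are placeholders of mine, not something taken from the integration or from pytapo.

```python
import subprocess
from urllib.parse import quote


def build_audio_command(host, user, password, stream_path="stream1", output="pipe:1"):
    """Build an ffmpeg command that extracts only the audio from the camera stream.

    Assumption: the camera exposes RTSP at rtsp://<user>:<password>@<host>:554/<stream_path>.
    """
    # Percent-encode the credentials so characters like '@' or ':' do not break the URL.
    url = (
        f"rtsp://{quote(user, safe='')}:{quote(password, safe='')}"
        f"@{host}:554/{stream_path}"
    )
    return [
        "ffmpeg",
        "-rtsp_transport", "tcp",  # use TCP for the RTSP session
        "-i", url,                 # input: the camera stream
        "-vn",                     # discard the video track
        "-c:a", "libmp3lame",      # re-encode the audio as MP3
        "-b:a", "64k",             # assumed bitrate, adjust as needed
        "-f", "mp3",               # output container format
        output,                    # placeholder target, stdout by default
    ]


def stream_audio(host, user, password):
    """Run ffmpeg until it exits or is interrupted (hypothetical helper)."""
    subprocess.run(build_audio_command(host, user, password), check=True)
```

Calling `stream_audio("192.168.1.50", "camuser", "secret")` would write the MP3 data to stdout. What I am missing is the part after that: how to hand this audio to a Voice satellite or an LMS player, for example as a stream URL the player can open.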