Backup stuck, no way to restart?

I just had this happen for the first time since switching over to the new built-in automated backups, but it used to happen to me approximately every other week when triggering them with an automation. I’ve been unable to find any logs or other indication of problems across many months.

I used to solve this by simply rebooting my Yellow. Now it requires dropping into SSH and running ha core restart (the Supervisor refuses to reboot because the system is in the frozen state for the backup). The core restart pushes the backup into the failed state, after which I can reboot.

With both the old automation-based backups and the new built-in ones, the system load average skyrockets from my usual < 1 to above 3 when this problem occurs. I actually have a “high system load” automation set up to alert me, since it is a reliable indicator that this happened overnight (these days I could switch to the backup manager sensors, but I digress).

The high load average persists even after the ha core restart, though the rest of the system appears to function normally. In my brief poking around this morning, I didn’t even notice any apparent slowness despite the high load. This leads me to the following conjecture:

I haven’t been able to prove anything, but long ago I had an issue on my home server which manifested in this same way: high load average until reboot. It ended up being a stuck NFS mount causing processes to become stuck in permanent iowait (uninterruptible sleep, which Linux counts toward the load average even though those processes use no CPU). That would also explain why nothing feels slow despite the high load. I’m wondering if something similar is happening here, with the SMB (or whatever) connection dropping and the backup being left in a sort of filesystem limbo. Or perhaps it’s related to the overlay filesystems used by the add-ons, with a similar result.
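To narrow down which mount could be involved, here is a minimal Python sketch that lists the network filesystems currently mounted. It only reads /proc/mounts and does not touch the mount points themselves, so it should not hang on a dead share. The function name network_mounts and the set of filesystem types are my own choices for illustration, and where this has to run on HA OS to see the relevant mounts is an assumption I haven't verified.

```python
import os

# Filesystem types treated as "network" here; this set is an assumption
# and may need extending for other setups.
NETWORK_FS = {"cifs", "smb3", "nfs", "nfs4"}


def network_mounts(mounts_text):
    """Return (device, mountpoint, fstype, options) for network mounts.

    mounts_text is the content of /proc/mounts: one mount per line with
    whitespace-separated fields (device, mountpoint, fstype, options, ...).
    """
    result = []
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 4:
            continue  # skip blank or malformed lines
        device, mountpoint, fstype, options = fields[:4]
        if fstype in NETWORK_FS:
            result.append((device, mountpoint, fstype, options.split(",")))
    return result


if __name__ == "__main__":
    # Only attempt this where /proc/mounts exists (i.e. on Linux).
    if os.path.exists("/proc/mounts"):
        with open("/proc/mounts") as f:
            for device, mountpoint, fstype, opts in network_mounts(f.read()):
                print(f"{fstype}\t{device} -> {mountpoint}\t{','.join(opts)}")
```

Comparing the output on a good morning and a bad one would at least show whether the share is still listed as mounted when the backup is stuck.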

If anyone has any ideas about how to track down the state of the kernel when this happens, I’d be happy to test some of them out next time I wake up to this. Given how containerized HA is, I don’t have much experience debugging it at that level.
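One thing I could try next time, to test the theory about processes stuck waiting on I/O: a rough Python sketch that scans /proc for tasks in state D (uninterruptible sleep) and prints the kernel function each one is waiting in, as reported by the wchan file. The helper names (parse_stat, read_text, blocked_tasks) are made up for this sketch. Where it runs matters too: from inside an add-on container it may only see that container's own processes, so treat the output as partial unless it runs on the host.

```python
import os


def parse_stat(text):
    """Split a /proc/<pid>/stat line into (pid, comm, state)."""
    # comm is wrapped in parentheses and may itself contain spaces or ')',
    # so locate the last ')' instead of splitting on whitespace.
    open_idx = text.index("(")
    close_idx = text.rindex(")")
    pid = int(text[:open_idx].strip())
    comm = text[open_idx + 1:close_idx]
    state = text[close_idx + 1:].split()[0]
    return pid, comm, state


def read_text(path):
    """Read a small /proc file, returning '' if it vanished or is unreadable."""
    try:
        with open(path, "r") as f:
            return f.read().strip()
    except OSError:
        return ""


def blocked_tasks(proc_root="/proc"):
    """Return (pid, tid, comm, wchan) for every thread in state D."""
    found = []
    if not os.path.isdir(proc_root):
        return found
    for pid_dir in os.listdir(proc_root):
        if not pid_dir.isdigit():
            continue
        task_root = os.path.join(proc_root, pid_dir, "task")
        try:
            tids = os.listdir(task_root)
        except OSError:
            continue  # process exited while we were scanning
        for tid in tids:
            stat = read_text(os.path.join(task_root, tid, "stat"))
            if not stat:
                continue
            _, comm, state = parse_stat(stat)
            if state == "D":
                wchan = read_text(os.path.join(task_root, tid, "wchan"))
                found.append((int(pid_dir), int(tid), comm, wchan or "?"))
    return found


if __name__ == "__main__":
    for pid, tid, comm, wchan in blocked_tasks():
        print(f"pid={pid}\ttid={tid}\t{comm}\twaiting in: {wchan}")
```

If the same few tasks show up in state D across several runs a minute apart, and the count roughly matches the extra load, that would point to blocked I/O rather than real CPU work. A wait function mentioning cifs or nfs would back the network-mount theory, though that is a guess until someone actually sees it.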

Edit: here’s the visual indicator of this happening, I’d be curious if the same happens for any of you: