Diagnosing a recurring issue with HA becoming unresponsive

Hi,

I wonder if anyone could help me or give me some advice on diagnosing an issue with my installation please?

It’s been solid as a rock for three years, but in the last few weeks it has become inaccessible maybe 2-3 times a week, often overnight, so I find it down when I wake up in the morning. I cannot log in via the web or the app; it just sits there trying to load indefinitely.

Some automations seem to still be running as far as I can tell, but most are not.

It’s running as a VM on Proxmox on a little HP EliteDesk 800 G2 Mini server, which generally has plenty of resources (and an SSD).

Restarting the VM, or even the host, doesn’t seem to work, and I literally have to power cycle the host server to get it back up.

As far as I can tell there’s nothing obvious in the logs, and I don’t remember making any particular change around the time it started happening that could be the cause. The only error I sometimes see in the logs around the time it occurs is something to do with my Bluetooth adapter: “bluetooth: hci0: malformed le event 0x02”. I’m not sure if it’s relevant or related.
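
To check whether that Bluetooth message actually lines up with the hangs, one option is to export the log to a text file and pull out every line that mentions it. Below is a minimal sketch of that idea, not a definitive tool. The message text comes from the log line quoted above; the function name `find_matches` and the idea of an exported log file are assumptions made for illustration only.

```python
# Message text taken from the error quoted in the post; matched case-insensitively.
PATTERN = "hci0: malformed le event"


def find_matches(lines, pattern=PATTERN):
    """Return (line_number, text) for every line that contains the pattern."""
    needle = pattern.lower()
    return [
        (number, line.rstrip("\n"))
        for number, line in enumerate(lines, start=1)
        if needle in line.lower()
    ]


# Hypothetical usage, assuming the log was saved as exported_log.txt:
#
#     with open("exported_log.txt", encoding="utf-8", errors="replace") as handle:
#         for number, text in find_matches(handle):
#             print(f"{number}: {text}")
```

If the matching lines cluster just before each hang, the adapter is worth a closer look; if they appear all the time, the message is probably unrelated noise.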

CPU and memory usage seem fine in Proxmox as far as I can tell, but it does feel like something is getting stuck, tying it up and slowing it down so much that it’s unusable.

I’m concerned it’s getting worse, and if I’m not home when it happens I have no way of recovering it until I am. So any help or advice on diagnosing the issue would be much appreciated!

Thanks.

See this topic:

Thanks, I have enabled debug mode and will see if that provides any clues.

However, to make matters worse, since I restarted it ZHA now refuses to start; it just constantly says failed setup. Argh!