For about 6 weeks I have had a strange issue with my HA OS installation, which runs as a virtual machine in Proxmox and had been rock solid for 2 years.
It took me a while to understand what was going on, as I have several containers running on the same Proxmox host.
While troubleshooting I found that the HA virtual machine is the cause of my Proxmox crashes. If I don't start HA, the server runs fine without problems.
The problem is quite strange: when I start HA it can work fine for one or two days, then it simply dies, crashing Proxmox, and the only way I can recover is to power-cycle the server.
After some time I realized that two things in particular make it crash about 90% of the time:
- Running a backup
- Updating Core or a big integration
For example, updating Core to 2025.2.3 took me about 20 attempts.
I start the update, and after 2 minutes HA becomes unavailable and Proxmox dies. I have to power-cycle the server and try again.
Smaller updates work most of the time; for example, updating a simple integration works. But if, after successfully updating an integration, I run another, bigger update, the system crashes, and after I power-cycle, that same small update still has to be performed again. If I reboot immediately after the successful small update, then the update is really done.
Updates are not the only time it crashes, though; sometimes it simply dies without any interaction after a few hours or even a couple of days.
I disabled all integrations except DHCP Server and Zigbee2MQTT.
This unclear problem is driving me crazy, and I am starting to suspect it has something to do with a hardware failure and not really with HA itself. I know I said it only happens with the HA machine, but that is also the 'heaviest' thing I run (maybe with the exception of NextCloud), and it is the only virtual machine; the others are all containers.
Any suggestions?
- Core 2025.2.3
- Supervisor 2025.02.1
- Operating System 14.2
- Frontend 20250210.0