Home Assistant crashes Proxmox

For about six weeks I have had a strange issue with my HA OS installation, which runs as a virtual machine in Proxmox and had been rock solid for 2 years.
It took me a while to understand what was going on, as I have several containers running on the same Proxmox host.
During troubleshooting I found that the HA virtual machine is the cause of my Proxmox crashes. If I don’t start HA, the server runs fine without problems.

The problem is quite strange: when I start HA it can work fine for one or two days, then it simply dies and takes Proxmox down with it. The only way I can recover is to power-cycle the server.

After some time I realized that two things in particular make it crash about 90% of the time:

  • Running a backup
  • Updating Core or a big integration

For example, updating Core to 2025.2.3 took me about 20 attempts.
I start the update, and after 2 minutes HA becomes unavailable and Proxmox dies. I have to power-cycle the server and try again.
Smaller updates work most of the time, for example updating a simple integration. But if I successfully update an integration and then run another, bigger update, the system crashes, and after the power cycle that same small update still has to be performed. If I reboot immediately after the successful small update, the update really is applied.

Updates are not the only time it crashes, though. Sometimes it simply dies without any interaction after a few hours or a couple of days.

I disabled all integrations except DHCP Server and Zigbee2MQTT.

This unclear problem is driving me crazy, and I am starting to suspect it has something to do with a hardware failure and not really with HA itself. I know I said it only happens with the HA machine, but that is also the ‘heaviest’ thing I run (maybe with the exception of NextCloud), and it is the only virtual machine; the others are all containers.

Any suggestions?

  • Core 2025.2.3
  • Supervisor 2025.02.1
  • Operating System 14.2
  • Frontend 20250210.0

What hardware is this running on? It sounds like a possible hardware issue (bad RAM?). Can you remove or swap the memory, or is it soldered onto the board? What does your System Monitor show for memory and processor usage over time? How large is your database/backup file?

Does your Proxmox host have enough CPU/RAM headroom?

RAM is shared among the VMs, so if it were a RAM problem the other VMs would be hit at times too.
This indicates a storage media error in the place where HA is located.

It shouldn’t be possible for a VM to crash the host, so I think the other comments about a hardware issue might be right.

Hello, an update: I restored the Proxmox backup onto another storage and it’s now working great again!
The strange thing is that the storage that was causing the fault was hosting all my other containers without any problem. Anyway, it’s fixed now :slight_smile:


Hi Simone, good you got it solved!

Please take the time to mark the solution as the answer: select the three dots under the post, then select the check box.
That makes the thread useful to other users as well (it comes up as a solution when someone starts a related thread).

Hello, I don’t have the issue that the Proxmox server stops working, but everything else you describe is the same. In my case it’s the VM that stops working, and I have to start the guest up again. Could this also be a hardware problem, or more of a memory issue? While it’s updating it’s not using much memory or CPU.

The same thing happens to me if I try to do a backup while updating: Proxmox dies every time.

Proxmox or the VM?
Proxmox is the hypervisor and should not die; if it does, you have some serious issues.

If the VM dies, then it might be due to too little RAM being assigned.

Proxmox,

Only using 25% memory, system passes memory tests.

It’s 100% stable until a Home Assistant backup.

I run several other VMs and they back up just fine.

Your memory value is just a snapshot of the moment and shows only what is being used at that moment.
You cannot see what is being used a second later, nor what is being requested.
The crash occurs when a request cannot be fulfilled. A request can be problematic if it asks for more than is available, but also if it asks for a contiguous block of memory larger than is available.
The second case is the hardest to deal with, and only an ample surplus of available memory can truly counter it.
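
To get more than a single snapshot, one option is to sample memory repeatedly while a backup or update runs and keep the lowest value seen. Below is a minimal Python sketch, assuming it runs on a Linux machine (the Proxmox host, for example) that exposes `/proc/meminfo`. The helper names `parse_meminfo`, `lowest_available`, and `sample_host`, as well as the 60 samples at one-second intervals, are illustrative choices and not part of Proxmox or Home Assistant.

```python
import time


def parse_meminfo(text):
    """Parse /proc/meminfo-style text into {field: number}.

    Most fields are reported in kB; a few (e.g. HugePages_Total) are counts.
    """
    values = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        parts = rest.split()
        if sep and parts and parts[0].isdigit():
            values[name.strip()] = int(parts[0])
    return values


def lowest_available(samples):
    """Return the smallest MemAvailable (kB) across samples, or None."""
    return min(
        (s["MemAvailable"] for s in samples if "MemAvailable" in s),
        default=None,
    )


def sample_host(count=60, interval=1.0, path="/proc/meminfo"):
    """Read the meminfo file `count` times, `interval` seconds apart."""
    samples = []
    for _ in range(count):
        with open(path) as f:
            samples.append(parse_meminfo(f.read()))
        time.sleep(interval)
    return samples
```

Calling `print(lowest_available(sample_host()))` while the backup runs prints the lowest `MemAvailable` seen in that window. This only narrows the gap: a spike between two samples, or a failed request for one large contiguous block, will still not show up in the number.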

Linux logs also show no OOM errors.

Those should appear if the operating system runs too low on memory to function.

Just thought I’d post back here: after much, much searching through logs, I eventually found the cause.

It looks like there is a bug in the newer Proxmox kernels whereby big network spikes cause an e1000e crash.

Forum post here: e1000e eno1: Detected Hardware Unit Hang: | Page 4 | Proxmox Support Forum

Hope it helps others.
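
For anyone searching their own logs, here is a rough Python sketch that counts kernel log lines matching the hang message from that forum thread title, plus the usual OOM-killer messages discussed earlier. The function name `classify` and the `PATTERNS` table are made up for this example, and the patterns are plain substring-style matches, so treat any hit as a pointer to read the surrounding log and not as a diagnosis.

```python
import re

# Message fragments to look for. "Detected Hardware Unit Hang" comes from
# the e1000e thread title; the OOM strings are the usual kernel wording.
PATTERNS = {
    "nic_hang": re.compile(r"Detected Hardware Unit Hang"),
    "oom": re.compile(r"invoked oom-killer|Out of memory:"),
}


def classify(lines):
    """Group log lines by which pattern they match."""
    hits = {name: [] for name in PATTERNS}
    for line in lines:
        for name, pattern in PATTERNS.items():
            if pattern.search(line):
                hits[name].append(line.rstrip("\n"))
    return hits
```

One way to use it, assuming the journal is persistent so the previous boot is still available: save the block as a script that ends with `import sys; print(classify(sys.stdin))` and pipe `journalctl -k -b -1` into it after a crash.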


Nice to know.
Are you using Proxmox 8 or 9?

I am having that issue too. HA keeps running but the NIC the VM is using crashes to the point the port doesn’t even show up if you run ‘ip link show’. This doesn’t always happen so at times I can use my JetKVM to log in and run ifdown and ifup on the NIC used by HA. Anyhow the logs show the e1000 hang error. So… I moved HA from the 1GbE embedded Intel NIC to an Intel X520-DA2 2 port SFP+ 10GbE NIC but the same thing happens even though it doesn’t rely on the e1000 driver. Frustratingly I’ve had this issue happen before when running Proxmox 8, and had found some way to resolve it (turning off features on the NIC) but that fix doesn’t seem to work now (or I am missing something).