I was previously running HA on Proxmox. I moved my install to a dedicated computer with the config below, and ever since the move I’ve had a lot of stability issues. I resolved the first one by removing the Intel Bluetooth integration, which made the system more stable, but I’m still having problems: Home Assistant still randomly reboots or freezes, and I have to reboot the machine. Below is a video of what it did last night.
Note the output of pwd for later; that’s the path to the file you’re editing.
Then keep hitting the d key (pressing it twice deletes a line) until you get to just before the crash.
Then use the down arrow to move past the lines you want to share, and use the d key to get rid of the rest. Hit the Escape key, type a colon followed by the letters w and q, then hit Enter.
Then, on the machine you’re posting from, open PowerShell. Type pwd, hit Enter, and note the result. That’s where the file will land locally so you can attach it.
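If trimming the log by hand in the editor is fiddly, a small script can do roughly the same cut. This is only a sketch of an alternative, not part of the steps above: the function name `lines_around`, the context sizes and the file paths are all made up for illustration, and the marker is whatever text shows up in your own log right at the crash.

```python
import sys


def lines_around(lines, marker, before=40, after=10):
    """Return the lines surrounding the first line that contains `marker`.

    `before` and `after` are illustrative defaults, not recommended values.
    Returns an empty list if the marker never appears.
    """
    for index, line in enumerate(lines):
        if marker in line:
            start = max(0, index - before)
            return lines[start:index + after + 1]
    return []


# Hypothetical usage: python trim_log.py <input log> <output file>
if __name__ == "__main__" and len(sys.argv) == 3:
    with open(sys.argv[1], errors="replace") as source:
        log_lines = source.readlines()
    # Assumed marker; replace it with the text from your own crash.
    excerpt = lines_around(log_lines, "workqueue leaked lock or atomic")
    with open(sys.argv[2], "w") as target:
        target.writelines(excerpt)
```

The output file then holds only the lines near the crash, which is the part worth attaching.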
This is a Linux issue (if you search for “workqueue leaked lock or atomic” you get a lot of reports from various distributions). Make sure that your distro/kernel is up-to-date.
I’ve just managed to stabilize my HA after moving it to an N100 box (much weaker than yours), funnily enough into a VM instead of out of one.
The key thing is to tune the vm dirty-page sysctls. The defaults were chosen for a time long past. They blow up under certain workloads on fast machines with loads of memory, because the write cache grows too big and then blocks everything while it is flushed.
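Before changing anything, it helps to see what the machine is currently using. A minimal sketch, assuming a Linux system that exposes these settings under /proc/sys/vm (the function name `read_vm_sysctls` is made up here):

```python
from pathlib import Path

# The dirty-page settings discussed in this thread.
DIRTY_KEYS = (
    "dirty_ratio",
    "dirty_background_ratio",
    "dirty_bytes",
    "dirty_background_bytes",
    "dirty_expire_centisecs",
    "dirty_writeback_centisecs",
)


def read_vm_sysctls(keys=DIRTY_KEYS, root="/proc/sys/vm"):
    """Read integer vm.* sysctls; a missing or unreadable key maps to None."""
    values = {}
    for key in keys:
        try:
            values[key] = int((Path(root) / key).read_text().strip())
        except (OSError, ValueError):
            values[key] = None
    return values


if __name__ == "__main__":
    for key, value in read_vm_sysctls().items():
        print(f"vm.{key} = {value}")
```

Of each ratio/bytes pair only one is active at a time, and the inactive one reads back as 0.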
This is my /etc/sysctl.d/01-memory.local.conf:
# do not overcommit - optional
#vm.overcommit_ratio=100
#vm.overcommit_memory=2
# 5 GB of dedicated hugepages for qemu - disable if not used - or increase if using more memory for vms
# vm.nr_hugepages=2560
# use swap, but try not to. do not set to 0!
vm.swappiness = 1
# default 100
vm.vfs_cache_pressure=50
# dirty ratio/bytes: do not oom kill when suddenly lots of pages need to be written to disk
# loads of ram => reduce further
#vm.dirty_ratio=6
#vm.dirty_background_ratio=4
# it's either the above or below - with a very fast disk you can increase this 2x, maybe 5x
vm.dirty_bytes = 1300000000
# ~0.3 * above
vm.dirty_background_bytes = 300000000
# these are still too high probably - experiment with 500/150 or less
vm.dirty_expire_centisecs = 1500
vm.dirty_writeback_centisecs = 300
After these changes (and switching the VM image from qcow2 to raw), my nightly backups barely register on the CPU load time series instead of pushing it to 15+, which on the N100 kills the system.
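To get a feel for the byte limits in that config, here is some back-of-the-envelope arithmetic: how long the disk needs to flush a completely full dirty cache. The throughput figures are assumptions for illustration, not measurements from my box, and real writeback is messier than a single division.

```python
def flush_seconds(dirty_bytes, disk_mb_per_s):
    """Rough time to write out `dirty_bytes` at a sustained rate in MB/s."""
    return dirty_bytes / (disk_mb_per_s * 1_000_000)


DIRTY_BYTES = 1_300_000_000            # vm.dirty_bytes from the config
DIRTY_BACKGROUND_BYTES = 300_000_000   # vm.dirty_background_bytes from the config

if __name__ == "__main__":
    # Assumed sustained write speeds, purely illustrative.
    for label, speed in (("slow disk", 100), ("fast SSD", 1000)):
        seconds = flush_seconds(DIRTY_BYTES, speed)
        print(f"{label} at {speed} MB/s: about {seconds:.1f} s to flush")
    share = DIRTY_BACKGROUND_BYTES / DIRTY_BYTES
    print(f"background threshold is {share:.2f} of the hard limit")
```

The point is that the hard limit should be something the disk can clear in a few seconds; if the estimate comes out much longer than that, lower vm.dirty_bytes.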