HAOS is always locking up

HAOS has been a nightmare and I am planning on moving off it eventually, but I really don’t have time right now and would like to find a temporary fix. It seems a little odd that even the console locks up.

The last logs written before it locks up don’t provide much information.

2024-12-21 21:10:36.938 homeassistant systemd[1]: NetworkManager-dispatcher.service: Deactivated successfully.
2024-12-21 21:26:14.104 homeassistant kernel: audit: type=1334 audit(1734816374.102:336): prog-id=90 op=LOAD
2024-12-21 21:26:14.104 homeassistant kernel: audit: type=1334 audit(1734816374.102:337): prog-id=91 op=LOAD
2024-12-21 21:26:14.104 homeassistant kernel: audit: type=1334 audit(1734816374.102:338): prog-id=92 op=LOAD
2024-12-21 21:26:14.113 homeassistant systemd[1]: Starting Hostname Service...
2024-12-21 21:26:14.456 homeassistant systemd[1]: Started Hostname Service.
2024-12-21 21:26:14.473 homeassistant kernel: audit: type=1334 audit(1734816374.471:339): prog-id=93 op=LOAD
2024-12-21 21:26:14.474 homeassistant kernel: audit: type=1334 audit(1734816374.472:340): prog-id=94 op=LOAD
2024-12-21 21:26:14.474 homeassistant kernel: audit: type=1334 audit(1734816374.472:341): prog-id=95 op=LOAD
2024-12-21 21:26:14.486 homeassistant systemd[1]: Starting Time & Date Service...
2024-12-21 21:26:14.781 homeassistant systemd[1]: Started Time & Date Service.
2024-12-21 21:26:44.469 homeassistant systemd[1]: systemd-hostnamed.service: Deactivated successfully.
2024-12-21 21:26:44.513 homeassistant kernel: audit: type=1334 audit(1734816404.511:342): prog-id=92 op=UNLOAD
2024-12-21 21:26:44.513 homeassistant kernel: audit: type=1334 audit(1734816404.511:343): prog-id=91 op=UNLOAD
2024-12-21 21:26:44.513 homeassistant kernel: audit: type=1334 audit(1734816404.511:344): prog-id=90 op=UNLOAD
2024-12-21 21:26:44.817 homeassistant systemd[1]: systemd-timedated.service: Deactivated successfully.
2024-12-21 21:26:44.849 homeassistant kernel: audit: type=1334 audit(1734816404.847:345): prog-id=95 op=UNLOAD
2024-12-21 21:26:44.849 homeassistant kernel: audit: type=1334 audit(1734816404.847:346): prog-id=94 op=UNLOAD
2024-12-21 21:26:44.849 homeassistant kernel: audit: type=1334 audit(1734816404.847:347): prog-id=93 op=UNLOAD

It would help if you could provide more info… How to help us help you - or How to ask a good question.

See section 10, and say what hardware is being used, etc.

  • Core 2024.12.5
  • Supervisor 2024.12.0
  • Operating System 14.1
  • Frontend 20241127.8

It is a Generic x86-64 system.

I think I have a similar issue. HA OS just crashes randomly: sometimes it runs for a week, this weekend it crashed twice, and once it ran for two weeks. CPU utilization and RAM usage are completely normal until it happens. Sadly I can’t provide more information, because there are simply no logs. After a reboot I checked journalctl with a monitor connected: everything looks normal before HA OS goes down, with no error messages or anything, and then the entries just stop. My logs also look similar to the log posted here, but one time I managed to connect a monitor shortly after it crashed, and it showed these messages on the console:

systemd[1]: systemd-hostnamed-service: Watchdog timeout (limit 3min)!
systemd[1]: Failed to start Journal Service.
systemd[1]: Failed to start Network Time Synchronization.

After the reboot, the logs also just stopped at the time it crashed. But that was the only time it displayed something after going down; usually the screen is just black and I also can’t ping HA anymore. Once the PC also beeped for some reason, not sure if that’s related to the issue.
I am also running a Generic x86-64 system, with OS 14.0, Core 2024.12.1, Supervisor 2024.12.0. I already had the issue with HA OS 13.x though. It originally started when I still ran it on a Pi 4 2GB. I then switched over to this PC, because I thought it crashed because it ran out of memory. Not sure if the crashes on the Pi had the same cause as these.

In general, I would start with the installed integrations; either:

  • remove or disable them all, reboot, add / enable one integration at a time and monitor system stability
  • disable or remove one integration at a time to see if the system becomes stable after an integration is disabled/removed.

Also, booting into Safe Mode, to see if the system is stable, is similar to the first bullet above.

The problem with that approach is that the crashes come anywhere from 2 days to 2 weeks apart, so it could take months to find the culprit that way. I did disable default_config this round, but it is still discovering devices.
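
To put rough numbers on "months": the sketch below compares the worst-case time for testing one integration at a time against disabling half of the remaining suspects each round. The 30 integrations are a made-up figure, the 14-day window is just the longest uptime mentioned in this thread, and the function names are only illustrative. It also assumes a single integration is at fault and that one crash-free window is enough to clear a group, which may not hold for crashes this random.

```python
def days_one_at_a_time(n_integrations: int, window_days: int) -> int:
    """Worst case: every integration gets its own full observation window."""
    return n_integrations * window_days


def days_halving(n_integrations: int, window_days: int) -> int:
    """Worst case when half of the remaining suspects are disabled each round.

    (n - 1).bit_length() equals ceil(log2(n)) for n >= 1, i.e. the number of
    halving rounds needed to narrow n suspects down to one.
    """
    rounds = (n_integrations - 1).bit_length()
    return rounds * window_days


# Hypothetical numbers: 30 integrations, 14 days of observation per round.
n, window = 30, 14
print(days_one_at_a_time(n, window))  # 420
print(days_halving(n, window))        # 70
```

Under those assumptions, halving would still take a couple of months, but that is far less than going through the list one by one.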

Here is an alternative approach: 2024.5+: Tracking down instability issues caused by integrations.