System crashes immediately after boot, any troubleshooting tips?

Might be the wrong forum? Apologies if so.

Running HA 2025.8.3 bare metal on a dedicated NUC. It’s been stable and reliable until mid-morning today when my wife noticed her media on a smart speaker kept getting reset. I pulled up the HA companion app on my phone and found it couldn’t connect; the system would restart itself sporadically. I have HA log some events like startups & shutdowns to a private Discord server. Shutdowns aren’t getting listed because the system is crashing:

Fortunately, HAOS is stable when booted to safe mode. Nothing looks particularly fishy, but there’s no harm in restoring from last night’s backup. This is why I store 35 days of full backups, right?

Restoring the 8/30 @3am backup yields the same behavior above. System is immediately unstable once HA finishes loading everything. Same with the 8/29 backup. And the 8/28, the 8/25, and even the 8/15 from when we were out of town. This behavior never materialized before today.

Maybe it’s a hardware issue. How about we spin up a VM and try restoring each of those backups to the VM? Same issue. This time, I could see the VM running out of memory as it crashed – even after bumping it up to 12GB RAM. Something’s up.

The VM did show this CIFS error while booting, though I’m skeptical that a faulty connection to the NAS would be the culprit.
image

The only hardware or device change I made this month was plugging in the new ZWA-2 when it arrived on 8/26. I haven’t had a chance to set it up yet, or even have HA recognize it yet – just plugged in the USB. My backup restores were all conducted with it unplugged, so it’s unlikely the culprit.

I’m stumped here. I tried disabling integrations like HACS and other third-party additions with no success. My next step tomorrow is to disable every non-first-party integration, reboot, add a couple back in, then continue the cycle in hopes of finding something that will boot properly again. Each full reboot + restart in safe mode takes ~10 minutes, so that’ll be fun. Any troubleshooting ideas beyond that?

This means it is a third party integration or dashboard resource. They are not loaded in safe mode.

Start in safe mode again and look at your config/home-assistant.log.1 file. This is the log from before the last restart.

The .log.1 file helped me narrow my search, thank you. The culprit was either:

  • An automation that regularly queried a sensor for a threshold. That sensor’s output was recently changed without me realizing it.
  • The UniFi Network integration. I subscribe to the regular, stable release channel, and this is a fairly popular integration, so there’s a decent chance this is a false-positive.

Both were disabled in my last reboot, and I don’t dare tempt the gods any further on a long weekend. This ordeal reignited my interest in the High Availability Home Assistant guide.

High Availability would not help in this case. If it was an automation causing the crash, that automation would have been replicated to the other instance.

2 Likes