I have the dreaded home assistant constant crash/restart problem and I'm stumped!

EDIT: Looking like a hardware problem; I’m not going to mark it as resolved until I rule that out.

Hi,

Been using home assistant since 2017.

These days I run it on an Intel NUC under HASSOS.

Not an expert but I am an extensive user.

Lately it has been crashing and restarting, and by the end of last week it was crashing as soon as it started.

I’ve read a stack of posts here and on Reddit about this issue, and none of the diagnoses/fixes have worked or applied to me.

My log is at https://pastebin.com/PNZYF1xW. This is a log where I cleared everything out and rebooted to try to capture one crash/boot cycle only (or as close to it as I could get).

My fault log is at https://pastebin.com/ipiavewT. I wish it were from the same time as the main log file, but for reasons I won’t go into, it is not.

I’m continuing my own diagnostics but I’m desperate. If anyone can help me solve this, I’m all ears.

If restarting in safe mode helps, it is a custom integration causing the issue.

Thanks and sorry to ask what is probably an obvious question, but how do I identify which custom integration is causing the problem?

BTW on the plus side I’ve eradicated a bunch of integrations (custom and otherwise) I don’t use anymore. Hasn’t fixed the problem but has been good housekeeping!

Well, first see if it helps. If it does, then the first one I would disable is Local Tuya, as it is generating quite a few errors in your logs. But if that is not it, you can do a binary search by disabling half of them at a time.

System good → faulty integration in disabled lot.

System bad → faulty integration in un-disabled lot.

Keep halving whichever lot contains the fault until you find it.
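
For what it's worth, here is a rough Python sketch of that halving procedure. It assumes exactly one integration is at fault. The function name `find_faulty`, the callback `stable_with_disabled`, and the integration names in the example are all made up for illustration; in practice the "check" is you disabling that lot, restarting HA, and watching whether it stays up.

```python
from typing import Callable, List, Sequence


def find_faulty(
    integrations: Sequence[str],
    stable_with_disabled: Callable[[List[str]], bool],
) -> str:
    """Narrow a list of integrations down to the one causing crashes.

    Assumption: exactly one integration in the list is faulty.
    `stable_with_disabled(lot)` must return True if the system runs fine
    while every integration in `lot` is disabled.
    """
    if not integrations:
        raise ValueError("need at least one integration to test")

    suspects = list(integrations)
    while len(suspects) > 1:
        # Disable the first half of the remaining suspects.
        disabled = suspects[: len(suspects) // 2]
        if stable_with_disabled(disabled):
            # System good: the faulty integration is in the disabled lot.
            suspects = disabled
        else:
            # System bad: the faulty integration is in the un-disabled lot.
            suspects = suspects[len(disabled):]
    return suspects[0]


# Simulated run with hypothetical names; "frigate" plays the culprit.
installed = ["localtuya", "hacs", "alexa_media", "frigate", "browser_mod"]


def simulated_check(disabled: List[str]) -> bool:
    # Pretend the system is only stable when the culprit is disabled.
    return "frigate" in disabled


print(find_faulty(installed, simulated_check))
```

With N integrations this takes about log2(N) restart-and-observe rounds, instead of up to N if you disable them one at a time.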

If safe mode does not help, it could still be a core integration.

Just recently I disabled one integration at random each day. It took weeks, but I eventually found that the core CPU Speed integration can cause performance issues.

Also how are your CPU and memory resources?

Could also be a runaway loop in an automation or script.

Thanks! I’ll give it a try then.

The first thing is to test the base installation of HA. Does it start and run without errors for 12, 24, 48, 72 hours? If it does, then move on to disabling all integrations and automations, then slowly re-enabling them one at a time and seeing whether the last one added crashes HA.
Now, if it still crashes in safe mode, you may have a hardware issue or a base install issue. I would check your power supply first, then move on to storage (SSDs do get corrupted and die), though you might be able to fix it using a chkdsk-type disk checker.
You have to be methodical.

Cheers. I’ve been looking for a window of time to do this.

I was hoping in the meantime there might’ve been something standing out in the logs that might help… seems not.

You might have solved it for me… it was crashing even in safe mode.

So I’ve ordered a new NUC and will await its delivery.

I’ve had a few lightning strikes lately and lost an Xbox twice and a DSL modem once, plus the surge blew out a hub and some other gear. I hadn’t thought my trusty Intel NUC and its SSD were affected at all.

Yes, I have backups!

Thanks for your advice.

I’ll see how we go when the new hardware is installed.

You might want to look at a UPS for protecting your home network (modem, gateway/router, switches, HA unit, etc.).

Does the NUC have a separate power supply, like a brick? That might be the only bad item. Usually the filter caps take the beating and fail, putting more ripple on the DC than the computer can handle.