HA has been restarting for a few days - how to find the source

phil-schneider · December 23, 2023, 5:47pm

Hello,
my Home Assistant instance on my raspi has been restarting for a few days.
It had been running for over a year on an external SSD, and I never had issues.
But for a few days now I have been getting at least one restart per hour. During the night it is maybe one every four hours.
I did not add any new hardware or anything else. The raspi is not hot, so cooling is not the issue.

How can I start to find the source of the problem? How to look for an issue?

Core 2023.12.3
Supervisor 2023.12.0
Operating System 11.2
Frontend 20231208.2

tmjpugh · December 23, 2023, 6:38pm

First you must determine whether HA is restarting or the host server is restarting. You could use something external, like the DHCP server on your router, and check the DHCP lease or connection time. If the connection time matches the reboot time, then it is a host restart.

Also:
The power supply could be the issue.
The storage could be the issue.
You must verify these are OK as well.
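
The host-versus-HA check can also be done from another machine on the network, which works even with a static IP. Below is a minimal sketch, not a tested tool. It assumes Home Assistant listens on its default port 8123 and that some second service on the same host (here port 22, for example an SSH add-on) stays up while HA Core restarts. The function names and the second port are illustrative assumptions.

```python
import socket
import time
from datetime import datetime


def port_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def classify(host_up, ha_up):
    """Turn the two probe results into a rough diagnosis."""
    if ha_up:
        # HA answers; if the second port is closed, the probe port is simply wrong.
        return "ok" if host_up else "ha-up-probe-port-closed"
    # HA is silent: the second port tells us whether the host is still alive.
    return "ha-down" if host_up else "host-down"


def watch(host, ha_port=8123, host_port=22, interval=10, cycles=None):
    """Print a timestamped line whenever the state changes.

    host_port=22 is an assumption: use any port of a service that
    keeps running while HA Core itself restarts.
    """
    last = None
    n = 0
    while cycles is None or n < cycles:
        state = classify(port_open(host, host_port), port_open(host, ha_port))
        if state != last:
            print(datetime.now().isoformat(timespec="seconds"), state)
            last = state
        n += 1
        time.sleep(interval)
```

Calling `watch("192.168.1.50")` (a placeholder address) and leaving it running overnight gives a list of timestamps. Mostly "ha-down" entries point at Home Assistant itself, while "host-down" entries point at the host, its power supply, its storage, or the network.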

nickrout · December 23, 2023, 8:20pm

One word: logs

phil-schneider · December 24, 2023, 9:44am

Hello,
thanks for the help.
DHCP will not help, because my HA has a static IP.
The logs were not helpful at the moment, because they start after the reboot.
I installed a partial backup which was taken for the 2023.12.0 release, but even the backup has the restart problem. Yet that version did not cause problems when it was originally installed.
Sadly I don't have a spare raspi4 power supply to test this.

nickrout · December 24, 2023, 9:49am

The old log is renamed and remains available.
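
To look at what was logged just before a restart, the renamed log can be filtered for warnings and errors. This is a minimal sketch. The path `/config/home-assistant.log.1` is an assumption about where the previous log ends up on a typical install, and the helper names are made up for illustration, so adjust both as needed.

```python
from pathlib import Path

# Assumed location and name of the renamed (previous) log; adjust to your install.
PREVIOUS_LOG = Path("/config/home-assistant.log.1")


def last_problems(lines, tail=200, levels=("ERROR", "WARNING", "CRITICAL")):
    """Return those of the last `tail` lines that mention one of `levels`."""
    recent = list(lines)[-tail:]
    return [
        ln.rstrip("\n")
        for ln in recent
        if any(f" {lv} " in ln for lv in levels)
    ]


def read_previous_log(path=PREVIOUS_LOG):
    """Read the previous log, or return an empty list if it cannot be read."""
    try:
        return path.read_text(encoding="utf-8", errors="replace").splitlines()
    except OSError:
        return []


if __name__ == "__main__":
    # Show what Home Assistant complained about shortly before it went down.
    for line in last_problems(read_previous_log()):
        print(line)
```

If the last lines before the restart show nothing unusual, that in itself hints at a hard crash or power loss rather than a clean restart.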

phil-schneider · December 24, 2023, 10:06am

Ahh, I didn't know that. I will have a look into that.

What also seems strange is my database file.
In my config I have this:

recorder:
  purge_keep_days: 40
  commit_interval: 30

The size of my file is 28383880.0 KiB (roughly 27 GiB).
So even when I start the purge service with keep days set to zero, the size does not go down.
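
If the file stays large after a purge, one possible explanation is how SQLite (the recorder's default database) handles deletes: removed rows free pages inside the file, but the file itself only shrinks after a `VACUUM`. The recorder's purge service has a `repack` option for this. The sketch below demonstrates the effect on a throwaway database. The table name and row contents are stand-ins, not Home Assistant's real schema.

```python
import os
import sqlite3
import tempfile


def purge_demo(rows=20000):
    """Return the file size in bytes when full, after DELETE, and after VACUUM."""
    sizes = {}
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "demo.db")
        conn = sqlite3.connect(path)
        # Stand-in table, not the recorder's actual schema.
        conn.execute("CREATE TABLE states (id INTEGER PRIMARY KEY, payload TEXT)")
        conn.executemany(
            "INSERT INTO states (payload) VALUES (?)",
            (("x" * 200,) for _ in range(rows)),
        )
        conn.commit()
        sizes["full"] = os.path.getsize(path)

        # The "purge": rows are gone, but their pages stay allocated in the file.
        conn.execute("DELETE FROM states")
        conn.commit()
        sizes["after_delete"] = os.path.getsize(path)

        # The "repack": rebuild the file so the freed pages are released.
        conn.execute("VACUUM")
        sizes["after_vacuum"] = os.path.getsize(path)
        conn.close()
    return sizes


if __name__ == "__main__":
    print(purge_demo())
```

On a database of roughly 27 GiB a repack can take a long time and needs free disk space while it runs, so it is worth checking the available space first.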

phil-schneider · December 29, 2023, 12:17pm

I did not find anything in the logs.
I bought a new power supply. 5 restarts in the night with the new power supply, so that's not the issue.
My new HA Green hardware arrived today, so let's see what I can do with this.

phil-schneider · December 31, 2023, 1:33pm

I found the issue:
After setting up the HA Green from scratch I also got random reboots.
But my old system was suddenly stable for six hours.
The difference: the old system (raspi4) did not have the USB Conbee2 stick for ZHA.
After I removed the Conbee2 stick from the new system, it was stable too.

So it seems the Conbee2 with ZHA was/is the problem.
I have now ordered a SkyConnect stick and a Conbee3 stick.
My Conbee2 stick is/was at least five to seven years old.

danielperna84 · December 31, 2023, 2:37pm

Not sure if it is related, but I’ll report here anyways:
I’m running HAOS as a Proxmox VM on a Beelink Mini PC. It had been running stable the whole time. Then, also a few days ago, the whole system started rebooting / crashing randomly (the first crashes at random, then always pretty quickly after HAOS was fully booted, incl. add-ons).

For me the issue started when I upgraded the Unifi Add-on to the latest version. Since I disabled the Add-on, I have had no more crashes.

I’m not sure how the Add-on is able to crash the whole system. Typically only the HAOS VM should reboot, as that’s one of the benefits of using VMs. And maybe it’s all just coincidence.

I did watch the dmesg output in follow mode to see if there’s anything helpful when the crash happens. But at least over the SSH connection no messages appeared. So maybe a kernel panic with an instant reboot is what’s happening in my case.

Oh, and I don’t have any USB devices attached. So at least for me that’s not the cause of the issue. And as stated before, after disabling the Unifi Add-on I had no more crashes.

EDIT: As it turns out, my Mini-PC was delivered with defective RAM. Hence at first everything seemed to work well enough for the system to run stable, but as soon as a certain amount of RAM was consumed, the OS could not handle the defective RAM blocks anymore. It just happened to be the UniFi Add-on that consumed just enough RAM to trigger those crashes.

phil-schneider · January 17, 2024, 8:30pm

Small update. I think this is/was related to ZHA integration.
I have now deactivated ZHA and I am using Z2M.
I don't have any restarts anymore.