RPi4 running HAOS rebooting randomly in the night - how to troubleshoot?

I’ve noticed some instability in my Home Assistant this past week. The whole Raspberry Pi reboots in the night at odd times: 04:09, 02:02, 03:40, 03:56, 03:36, 03:42, 04:59. It doesn’t happen every day, either.

My only recurring task is a full backup at 00:01 daily.
Temperature is good, always <60 °C.
Memory usage is always at least 50% free.
Power supply is official RPi4 one, been working fine for 3 years.
I have two other RPi4s in the same stack that are fine.
I’m using a 128 GB USB stick as the data disk, rather than microSD card.

I updated to the latest versions, but it still happened.
Home Assistant 2023.5.4
Supervisor 2023.04.1
Operating System 10.1

Looking at home-assistant.log.1, there isn’t anything unusual before the surprise reboot.
Looking at the core log after a surprise reboot, there is a warning that sqlite3 wasn’t shut down cleanly. This makes me suspect it’s a hardware issue rather than something calling the hassio.host_reboot service.

Is there a “host.log.1” equivalent I can view?
Is there anything in HA OS that can trigger a force reboot, without shutting the DB down cleanly first?
What else should I try to get more insight? Turn debugging on for something maybe?
Could it be the USB stick datadisk? How can I test that?

There is a sensor to monitor your power. Things can break, even though it’s not likely the issue.

I did a search, and according to that it could be a device causing it, but if you ask me, it could also be the power delivery itself. So not your adapter or your own hardware, but the source.
Maybe it’s unstable, and since it’s at night you might not notice it in other devices. Are there other sensitive devices, maybe a router or so, that show anything in their logs?

If you have a strong powerbank, you might give that a try as a test.

Just thinking out loud.

Another search that seems to return some useful topics: Database clean job

Thanks Recte for your comments. The “RPi Power status” reports “OK” all the time. I also swapped the power supply with another RPi4’s and the rebooting still occurs: 04:57, 03:33, 03:12, 04:41, 03:56.

It’s annoyingly random, yet always between 2am-5am.

home-assistant.log.1 doesn’t provide much detail - what components should I turn on debugging for?

2023-06-06 01:03:01.566 ERROR (MainThread) [homeassistant.components.xbox] Error requesting xbox data: 504, message='Gateway Time-out', url=URL('https://peoplehub.xboxlive.com/users/me/people/batch/decoration/preferredcolor,detail,multiplayersummary,presencedetail')
2023-06-06 01:52:15.202 WARNING (MainThread) [homeassistant.helpers.entity] Update of binary_sensor.hive_hub_status is taking over 10 seconds
2023-06-06 02:50:00.407 WARNING (MainThread) [homeassistant.helpers.entity] Update of binary_sensor.hive_hub_status is taking over 10 seconds
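
On the debugging question, one option is the logger integration in configuration.yaml. This is only a minimal sketch: picking the recorder (which owns the sqlite3 database) and the hassio integration (which provides hassio.host_reboot) is a guess based on the symptoms above, not something established in this thread.

```yaml
# configuration.yaml
logger:
  default: warning          # keep everything else at the usual level
  logs:
    # Assumed candidates; swap in whichever integrations you suspect
    homeassistant.components.recorder: debug
    homeassistant.components.hassio: debug
```

If the Pi is being reset at the hardware level, the log will still simply stop with no shutdown messages, but that in itself helps rule out a software-triggered reboot.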

Is there anything in HA OS that can trigger a force reboot, without shutting the DB down cleanly first?

It could very well be that the USB flash drive is causing an issue when writing to a bad sector. If you have a spare one, I’d give it a try. If it’s not the drive, I’d say it relates to your Xbox component, assuming that binary_sensor.hive_hub_status relates to it. Have you tried disabling or removing the component as a test?

Do you see the same errors at other time slots?

What you basically have to do is eliminate possible sources. HA should not crash out of the blue.
Any idea of the system load before crashing?
Think of:

  • CPU %
  • 1m load
  • IOwait

The last one can be measured using a custom sensor:

command_line:
  - sensor:
      name: CPU IO Wait
      # Takes the 10th field of the "CPU:" summary line printed by top
      command: top -b -n1 | grep ^CPU | awk '{printf("%.0f", $10)}'
      scan_interval: 3
      unit_of_measurement: "%"
      value_template: '{{ value }}'
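
For the other two items (CPU % and 1m load), a rough sketch using the System Monitor integration might look like the following. It assumes a release where System Monitor is still set up in YAML, as on the 2023.5 version mentioned at the top of the thread; I believe later releases configure it from the UI instead, in which case this block does not apply.

```yaml
# configuration.yaml
sensor:
  - platform: systemmonitor
    resources:
      - type: processor_use          # CPU %
      - type: load_1m                # 1-minute load average
      - type: processor_temperature  # handy to graph next to the load
```

With the history graphs for these sensors you can check what the system was doing in the minutes before each reboot.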

will try this as mine is restarting randomly too

Although the RPi4 has a heatsink, it was only passively cooled, and temperatures were getting above 60 °C. This shouldn’t be a problem for the CPU, but maybe other components were getting hot, like the USB drive?
I now have a 120 mm fan blowing over the whole RPi4 and temperatures are <40 °C and the random reboots have stopped.
It’s more of a correlation, not causation, but for now my issue has gone away.
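
For anyone who wants to catch this earlier, here is a rough sketch of an automation that writes a warning to home-assistant.log when the CPU stays hot. The entity name sensor.processor_temperature and the 5 minute delay are assumptions; the 60 °C threshold is just the figure from this thread.

```yaml
# configuration.yaml (or automations.yaml without the top-level key)
automation:
  - alias: "Log a warning when the Pi runs hot"
    trigger:
      - platform: numeric_state
        entity_id: sensor.processor_temperature  # assumed entity name
        above: 60
        for: "00:05:00"                          # ignore short spikes
    action:
      - service: system_log.write
        data:
          level: warning
          message: "CPU temperature has been above 60 °C for 5 minutes"
```

Because the message goes to the log file, it should still be visible in home-assistant.log.1 after an unexpected reboot.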

Thanks for posting the update - good to know.
As HA becomes more enmeshed in my house, I’ve been vaguely thinking I should have a plan for this kind of thing/hardware issues. Good that this thread has bumped it up my priority list!
Think I’m going to order a spare RPi, which could also help diagnose similar issues.