Scheduled "ha core reboot" and test for hangs?

Ambidexter · November 28, 2021, 5:18am

Hi All,

Thanks for taking a look at my question!

I’ve got a similar “HA on my RPi4 has started hanging, SSL doesn’t work, and it requires a cold start” issue.

I know that powering off my Pi and starting it again works. To troubleshoot, at one point when it hung and I was around to SSH into it locally, I ran “ha core reboot”, and the Lovelace page came back after it restarted.

So I have 3 questions: 1) What’s the best way to schedule a job to recycle it, say every 3 days or so? 2) When I restart, which log or database file(s) should I export a copy of first, if possible? 3) If this restart of the core continues to work, I’d want to replace it with the next “less serious” level of restart. That is, instead of the core, what other process or service can I restart to see if this clears my issue?

Stepping down through restart levels like that would help me narrow down the issue. For example, restarting automations may fix it, though I doubt it, so I’d want the next “less serious” restart after the “ha core reboot” command. Of course, if some restart doesn’t fix the issue, I’d power cycle it and then try another type of restart.

If I wanted watchdog-like functionality, what task, process, memory or CPU utilization, etc. should I test before triggering whichever reboot / restart works? So, something like “if CPU/memory/disk is > 90% for 15 minutes, then restart”. Maybe even “if log entries per minute > 500 for 15 minutes, restart”?
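
To make that rule concrete, here is a minimal Python sketch of the “above 90% for 15 minutes” check. It only covers the decision logic. The class name `SustainedThreshold` and the once-a-minute readings are made up for illustration, nothing here is an HA API, and the actual restart would be whichever command turns out to work.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SustainedThreshold:
    """Trips once a reading has stayed above `limit` for `hold_seconds` without a dip.

    Hypothetical helper for illustration; not part of Home Assistant.
    """

    limit: float = 90.0            # percent (CPU, memory or disk)
    hold_seconds: float = 15 * 60  # 15 minutes
    _above_since: Optional[float] = None

    def update(self, now: float, value: float) -> bool:
        """Feed one reading taken at time `now` (seconds); True means "restart"."""
        if value <= self.limit:
            # Any dip back to or under the limit resets the timer.
            self._above_since = None
            return False
        if self._above_since is None:
            self._above_since = now
        return now - self._above_since >= self.hold_seconds


if __name__ == "__main__":
    watch = SustainedThreshold()
    # Assumed input: one CPU reading per minute, pinned at 95%.
    for minute in range(21):
        if watch.update(now=minute * 60, value=95.0):
            print(f"would restart at minute {minute}")
            break
```

The same class would work for the “log entries per minute > 500” idea by setting `limit=500` and feeding it a per-minute count instead of a percentage.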

Failing something like this, I’d just schedule a reboot at 3 AM every 3 days. (Is there a cron daemon on HA?)
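
For the fallback, the “3 AM every 3 days” arithmetic could be sketched in Python like this. It is only the date math, assuming the time of the last restart is recorded somewhere. The function names are hypothetical, and what would actually trigger it (cron, an HA automation, something else) is exactly the open question.

```python
from datetime import datetime, time, timedelta


def next_restart(last_restart: datetime, interval_days: int = 3, hour: int = 3) -> datetime:
    """Return `hour`:00 on the day that is `interval_days` days after the last restart."""
    target_day = last_restart.date() + timedelta(days=interval_days)
    return datetime.combine(target_day, time(hour=hour))


def restart_due(last_restart: datetime, now: datetime) -> bool:
    """True once `now` has reached the next scheduled restart time."""
    return now >= next_restart(last_restart)


if __name__ == "__main__":
    last = datetime(2021, 11, 28, 5, 18)  # assumed time of the last restart
    print(next_restart(last))
    print(restart_due(last, datetime(2021, 11, 30, 12, 0)))
```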

Thanks,

Ambi