After ending up with a dead remote HA following the OS 15 upgrade, I would like to try something that could be incredibly stupid:
Create a cron job in HA that runs every 30 minutes and does the following:
Check if HA is fully booted
If not, wait 10 minutes (just in case it’s in the middle of a reboot)
Check again if HA is fully booted
If not, switch slots (‘ha os boot-slot other’) and reboot
The intent is that, for those of us who are away from our HA home and have installed an update that leaves HA stuck at the GRUB menu, it would be able to fix itself and reboot.
So I need some assistance with the following:
How to determine if HA is fully booted and running?
Is cron running in a failed boot scenario (at the GRUB prompt)?
Or maybe this has already been done and someone has a script?
Nice idea.
Implementing it is not straightforward.
Here is one way to implement it:
Testing whether the server is up is very simple: just check whether the URL on port 8123 responds.
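A minimal sketch of that check in Python, meant to run on some other machine. The address http://homeassistant.local:8123/ is only an assumption (use whatever address reaches your server), and ha_is_up is a name made up for this example. It treats any HTTP answer below 500 as "alive", which only shows that the web frontend responds, not that every integration has finished loading.

```python
import http.client
import urllib.error
import urllib.request


def ha_is_up(base_url: str, timeout: float = 5.0) -> bool:
    """Return True if something answers HTTP at base_url with a non-5xx status."""
    try:
        with urllib.request.urlopen(base_url, timeout=timeout) as resp:
            return resp.status < 500
    except urllib.error.HTTPError as err:
        # The server answered, just with an error status (e.g. 401 or 404).
        return err.code < 500
    except (urllib.error.URLError, http.client.HTTPException, OSError):
        # Connection refused, DNS failure, timeout, garbled reply: treat as down.
        return False


if __name__ == "__main__":
    # Assumed address; replace with the one your HA server is reachable on.
    print(ha_is_up("http://homeassistant.local:8123/"))
```

Run from cron on the monitoring machine, the result can drive whatever recovery step you choose.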
The server can’t monitor itself, because when the OS doesn’t start, cron is not running. So you have to have another PC or server to monitor the HA server.
Also, since the HA OS is not up (not even for SSH), you can’t send a switch slot command. What you can do is reboot the HA server several times until it decides to switch slots.
You could reboot the HA server by connecting it to a smart plug and turning the plug off and on using a smart hub, or with a direct command to the smart plug.
Of course, if that happens, some kind of notification is needed so the admin can later fix the damaged slot; otherwise, the second time it happens, you’re left with no working slots.
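The whole procedure above (check, wait, check again, power-cycle a few times, notify) could be sketched roughly like this, again running on a separate machine. Everything named here is hypothetical: is_up, power_cycle and notify are callables you would supply yourself (an HTTP check, your smart plug's own API, a mail or push message), and the 10 minute grace period, 5 minute boot wait and 3 power cycles are assumed values, not documented thresholds for when the OS switches slots.

```python
import time
from typing import Callable


def watchdog_pass(
    is_up: Callable[[], bool],
    power_cycle: Callable[[], None],
    notify: Callable[[str], None],
    grace_seconds: float = 600,   # assumed: wait 10 min in case of a normal reboot
    boot_seconds: float = 300,    # assumed: time allowed for one boot attempt
    max_cycles: int = 3,          # assumed: how many power cycles to try
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Run one watchdog pass and return 'up', 'recovered' or 'down'."""
    if is_up():
        return "up"
    # Might just be in the middle of a reboot: wait and check again.
    sleep(grace_seconds)
    if is_up():
        return "up"
    # Still down: power-cycle via the smart plug and hope it falls back
    # to the other boot slot after repeated failed boots.
    for attempt in range(1, max_cycles + 1):
        power_cycle()
        sleep(boot_seconds)
        if is_up():
            notify(f"HA recovered after {attempt} power cycle(s); "
                   "check which boot slot is active and repair the other one.")
            return "recovered"
    notify(f"HA still down after {max_cycles} power cycles.")
    return "down"
```

Passing the three actions in as callables keeps the decision logic separate from any particular plug or notification service, and lets you dry-run it with fake functions before trusting it with a remote server.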
Maybe somebody else could think about a simpler implementation…
Providing you can get your hands on it. I cannot. It is literally thousands of miles away. I’ve phoned-a-friend, but it’ll be another day before he gets there.
This is not ideal, to say the least.
Why wouldn’t I?
It worked locally. Are you saying these updates are so sketchy that we should not install them?
I guess I’ll wait a year between updates from now on.
Sweet.
P.S. The first thing asked when reporting an issue is “Are you on the latest release?”
See the catch-22?
Honestly, yes. The n.0 OS updates have been pretty flaky lately.
At least monitor the forum and wait for a dot release if you are not in a position to attend your server.
PiKVM, BliKVM, JetKVM, NanoKVM, or TinyPilot may be of some use if you travel a lot.
These are small devices that give you remote console access right through POST. I travel for 12 months at a time and am currently looking into fitting a couple of these to my servers. I haven’t decided which one yet. NanoKVM is out of the running (closed source).