Hi, I would like to pick the experts' brains for some strategies to recover from a power failure.
Home Assistant (2023.10.1, Supervisor 2023.11.0, Operating System 10.5) is installed on a Raspberry Pi 4 using the Raspberry 240V plug pack. I did nothing more about the power supply, figuring that if mains power was lost there was nothing HA could do around the house anyway, i.e. there is no battery backup and no UPS.
The story is – we were on holidays 1,000 km from home when the hometown had a total power failure lasting over 12 hours. When the power company notified that power had been restored I tried to connect to HA to see that things were tidy. The reply was one of ‘connection refused’ or “This site can’t be reached. The webpage at …. might be temporarily down or it may have moved permanently to a new web address.”
When we got home 2 days later I tried to access locally with the same results. I also tried the ‘observer’ on port 4357 and got ‘access refused’. Having run out of ideas to see what state it was stuck in I pulled the power plug and reinserted it – whereupon it came up happily.
So, whilst it would be nice to know where it was stuck I am now more interested in a technique to remotely reboot the whole thing should this happen again. This time the plants lived without water for two days but next time we may be away for weeks.
There is a second Raspberry Pi in the overall design, a slave that handles I2C communication with the watering relays and other relay-related tasks such as opening the garage door. In this case the slave came back online properly after power was restored. It also logged that it could not connect to the MQTT server and noticed that 'Heartbeat' messages were missing.
So what I am saying is that there is another component already present that can tell that the main module is not playing properly. Could this device somehow poke the main unit? It can't just send an MQTT message, because that pipe must be broken for it to tell that the main unit is sick. Of course that just raises the chicken-and-egg question of what happens if next time it is the slave that gets stuck.
P.S. The slave has one spare relay available (but I could always get an additional HAT if more are needed).
Correct, you want to set the 'last power state' option to power on, and have all critical gear connected to a dedicated UPS.
A UPS will only keep devices on for a limited time, to allow a safe shutdown once power loss at the socket side is detected, i.e. you want to shut down your NAS and non-critical systems first so that the critical gear can stay on longer.
I have a Zigbee-based, local-only smart plug at this time (I need to swap to one certified for Australian use later) that automatically power-cycles my NBN modem should the WAN connection on the main router be down for 1 min. Everything in my rack is connected to the dedicated UPS I have there, which gives me 90 min of uptime when my gaming PC is not on and 20 min with the gaming PC on.
…with an automation in HA that detects the low-battery condition (mine is 5 min remaining on battery) and tells HA to shut down gracefully to prevent disk issues.
I set mine to always power on at boot, so when the power eventually comes back we're running.
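For anyone wanting to see the low-battery rule spelled out, here is a minimal sketch of the decision only. It assumes (my assumption, the setup above does not say) that the UPS is read through Network UPS Tools, whose `upsc` command prints `key: value` lines including `ups.status` and `battery.runtime` (seconds). The function names and the 300 s default are just illustrative, mirroring the 5 min figure above.

```python
from typing import Dict


def parse_ups_vars(text: str) -> Dict[str, str]:
    """Parse 'key: value' lines (the format assumed here) into a dict."""
    values: Dict[str, str] = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:  # skip lines without a colon
            values[key.strip()] = value.strip()
    return values


def should_shut_down(ups_vars: Dict[str, str], threshold_s: float = 300.0) -> bool:
    """True when the UPS is on battery and the remaining runtime is low.

    Assumes 'ups.status' contains the flag 'OB' while on battery and
    'LB' when the UPS itself reports low battery.
    """
    status = ups_vars.get("ups.status", "").split()
    on_battery = "OB" in status
    try:
        runtime_s = float(ups_vars["battery.runtime"])
    except (KeyError, ValueError):
        # No usable runtime figure: fall back to the UPS's own low-battery flag.
        return on_battery and "LB" in status
    return on_battery and runtime_s <= threshold_s


if __name__ == "__main__":
    sample = "battery.runtime: 280\nups.status: OB DISCHRG\n"
    print(should_shut_down(parse_ups_vars(sample)))
```

The actual shutdown call is deliberately left out, since how you trigger it depends on the install.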
OP is on a Raspberry Pi. So, no BIOS. It always boots on power.
And it did power up, but it got stuck somewhere. Unfortunately, without hooking up a monitor or remoting in while it was stuck to see the messages, there is no way to know why it was stuck. Without that, I'm not sure there is much that can be recommended.
This is one reason why I use a VM-based setup at the moment, running headless in VirtualBox on my NUC. While remoted in to the NUC, it is easier to just restart the VM than to do a full restart of the host machine.
Also, one thing I forgot to mention in the other replies: this is why we test for failure points, so we know what can break, where, and why.
Thanks everybody but a lot of you are missing the point.
No affordable UPS will hold up for 13 hours, and the HA platform is not critical, certainly not in a power failure where it is controlling 240V lights, music etc. No power, no light!
And I tried to remote in but it was too sick to let me.
Another post suggested that bringing up the HA before the modem has done its thing can be a problem.
That poster added a smart switch that only turned on power to the HA infrastructure after a delay.
I will look into this, but I think I still need something to give the RPi a kick up the bum.
@jeffcrum is closest to understanding the problem.
I have discovered that you can add a reset switch to a Raspberry Pi 4, and it's just a matter of adding a pin header and then shorting two connections.
I intend to experiment with this by using the 2nd unit and its spare relay to short those pins if the main unit stops sending heartbeats for a long time.
Let's call it an 'external defibrillator': savage but effective (hopefully) in this worst-case scenario.
Wish me luck.
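A minimal sketch of the watchdog logic I have in mind, in Python. Everything here is hypothetical: the class name, the `pulse_relay` callback (which would briefly close the spare relay across the reset pins) and the timeout values are placeholders, and the MQTT and relay wiring is left out entirely. The idea is that the slave calls `heartbeat()` whenever a heartbeat arrives and `poll()` on a timer.

```python
import time
from typing import Callable, Optional


class HeartbeatWatchdog:
    """Pulses a reset relay when heartbeats have been missing too long."""

    def __init__(
        self,
        pulse_relay: Callable[[], None],  # hypothetical: closes the relay briefly
        timeout_s: float = 600.0,         # silence tolerated before a reset
        cooldown_s: float = 900.0,        # minimum gap between two resets
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._pulse_relay = pulse_relay
        self._timeout_s = timeout_s
        self._cooldown_s = cooldown_s
        self._clock = clock
        # Start the timer now so the main unit gets a full timeout to boot.
        self._last_heartbeat = clock()
        self._last_reset: Optional[float] = None

    def heartbeat(self) -> None:
        """Record that a heartbeat from the main unit was just seen."""
        self._last_heartbeat = self._clock()

    def poll(self) -> bool:
        """Pulse the relay if needed; return True when a reset was fired."""
        now = self._clock()
        if now - self._last_heartbeat < self._timeout_s:
            return False  # main unit still looks alive
        if self._last_reset is not None and now - self._last_reset < self._cooldown_s:
            return False  # recently reset; give it time to come back up
        self._pulse_relay()
        self._last_reset = now
        return True


if __name__ == "__main__":
    wd = HeartbeatWatchdog(lambda: print("pulsing reset relay"), timeout_s=2.0)
    time.sleep(2.1)
    print(wd.poll())
```

The cooldown is there so the slave does not keep resetting the main unit while it is still booting, which would be worse than the original problem.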
JC
@jeffcrum I see, that's news to me. The router allocates a reserved IP to each of the Raspberries and other such devices (mostly because Windows makes such a mess of trying to access them by name). So I will put this in as a static address and see if it makes a difference.
Thanks
JC
Found this on the web: “Normally the RPi will boot when the power comes back on. If the power outage is too short or a ‘brown out’, then it may or may not boot properly.” Perhaps it corrupted the boot process if there were multiple brownouts prior to the complete power failure. That situation is not uncommon.
This may fail, though. If the power comes back on during those 5 minutes, after HA shuts down, the system will not reboot, and thus HA will not restart. I’m not sure how to come up with something that’s resilient to this case. I’d love to have a solution that works unattended.
I have HA running in a virtual machine on a Synology NAS, and it works really well. The power comes from my battery solar power system, so it has huge backup power. Last night the grid went off and for some reason that crashed the Synology and the VM. HA is now stuck on the same thing as the guy who started this thread. How do you get it back? I think that is what he is asking.