My main server died last night, and that is OK (Plan for the worst)

Last night, after lying down in bed, one of my switches caught my eye: it flashes red then green in a loop when it loses its WiFi connection or its connection to my MQTT server.

I opened up my laptop to troubleshoot and discovered that I no longer had WiFi, and that my network in general seemed to be down.

Long story short, my main server, which runs my router (pfSense), Home Assistant, and some other services, is dead as a doornail. I am guessing a power supply failure, as there was no life in it at all. (Autopsy when I have some time.)

You can see details on my setup here:

Guessing that this might happen one day, I have always planned to have duplicate hardware around. Commence ripping out hardware from a spare, swapping in the HDDs and a multiport NIC, and firing it up.

A while back I put together some items I have found to be helpful/good practices:

Static IPs - This helped a ton, as my DHCP server was also down and DNS lookups by host name weren’t going to work. Also, having hardware come up on a fixed IP by itself (not via DHCP) is invaluable when that section of your network isn’t functioning.

Backups - Luckily I didn’t need this one for this issue. However, if the power supply had taken out my drive, I have an offsite backup (the Google Drive backup add-on is awesome) and could quickly rebuild my instance from a snapshot taken that morning.

Plan for things to go wrong - I only have 1 light that depends on HA to function. Running around my house finding hardware would have been a nightmare if I depended on HA for lighting as I would have been in the dark.

This is the biggie: I can pull the plug on my instance (or it could die again) and, with the exception of the one light I mentioned, our home is pretty easy to get back into “manual” mode. My in-wall switches function without HA, and things like HVAC work as normal because HA only adjusts settings instead of actually running the system.

Anyway, I thought I would share. I think the message is getting out about backups, but the same principle goes for your entire system: either be able to restore it easily or be able to live without it.

I’d be interested to hear what others have done to plan for an untimely death or failure when you’re not around.


If you have ever experienced network issues with the UniFi controller down, you will immediately learn that not having static IPs can be a pain. And nmap helps.

And yes, if you can’t operate by hand you will run into problems one day.

Everything is static: MACs, aliases and IPs are stored in a spreadsheet on my NAS in case of router failure (router config backups too, but a failure might be the impetus for an upgrade).
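A spreadsheet like that can also be sanity-checked by a small script. Below is a minimal sketch, assuming the sheet is exported as CSV with the columns `alias`, `mac` and `ip`; those column names, the sample rows and the helper names are hypothetical and only there for illustration. It flags any value that has been assigned twice and looks a device up by alias.

```python
import csv
import io
from collections import Counter

# Hypothetical sample inventory; column names and rows are illustrative only.
SAMPLE_CSV = """alias,mac,ip
nas,AA:BB:CC:00:00:01,192.168.1.10
ha-pi,AA:BB:CC:00:00:02,192.168.1.11
printer,AA:BB:CC:00:00:03,192.168.1.11
"""


def load_inventory(text):
    """Parse CSV text into a list of dicts, lower-casing the MAC addresses."""
    rows = []
    for row in csv.DictReader(io.StringIO(text)):
        rows.append({
            "alias": row["alias"].strip(),
            "mac": row["mac"].strip().lower(),
            "ip": row["ip"].strip(),
        })
    return rows


def duplicates(rows, field):
    """Return the sorted values of `field` that appear more than once."""
    counts = Counter(row[field] for row in rows)
    return sorted(value for value, n in counts.items() if n > 1)


def lookup(rows, alias):
    """Return the IP recorded for an alias, or None if it is not listed."""
    for row in rows:
        if row["alias"] == alias:
            return row["ip"]
    return None


if __name__ == "__main__":
    inventory = load_inventory(SAMPLE_CSV)
    print("duplicate IPs:", duplicates(inventory, "ip"))
    print("nas is at:", lookup(inventory, "nas"))
```

In the sample data the printer and the Pi share an address on purpose, so the duplicate check has something to report.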

Still on a Pi (big Easter project coming up to change that), so I take nightly SD card snapshots that are copied off to my NAS daily. Monthly off-site backup of the whole NAS to portable HDDs. It’s not that onerous; it takes me an hour or so a month to file-sync the changes to 4x 2TB portable drives.

Lights are all Lifx controlled by movement/presence/timers. If the system goes down they can still be controlled by the physical light switches.

I still need to sort out my backup regime for after Easter, when I’m (hopefully) going to be running Hass.io in Docker on a mini PC running Ubuntu Server LTS.

Snapshots to NAS to off-site will still do for HA, and I might make a Clonezilla image of the complete server HDD after everything is set up, then do that again only after major updates/changes. Not sure how useful that would be, as it would require purchasing the same mini PC if something catastrophic happened to it, and technology moves so fast that they probably won’t be available.

I’ll have two Pis running GPIO sensors and switches over MQTT, for which I just need to make images of the SD cards and store them off-site. Again, this probably doesn’t have to be done that often, if at all. I have the MQTT <-> GPIO bridge configs backed up and it’s not that hard to set up the Pis and bridges again from scratch.


This is a good one. Static IPs are great, but if you have no idea what those static addresses are, you are going to be in trouble. I have something similar on Google Drive so I can pull it up on my phone.

Yeah, having an extra Pi is a quick and relatively cheap swap. Having an extra PC/Server, not so much. I lucked out finding a company with a corporate policy of discarding any hardware older than 3 years.

This reminds me of this funny post.

I read a post on another forum about someone finding out their UPS was failing only when the power went out.

This brought to mind another good idea…

Test to confirm your plans work - Software backups, a UPS, and hardware spares are all great things to have in place, but make sure to test your plan once in a while. I do this from time to time with my UPS: I unplug it from the wall and see how it behaves. I’ve tested the backup routine I have and confirmed that I can restore quickly if needed. (This is often done when I’m moving to/from hardware or testing things.)

I have read a number of threads about automated backups working, and now one about a UPS failing when really needed. It made me think that these things need to be tested periodically as good practice.
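One small piece of that periodic testing can be automated: checking that the newest backup file is actually recent. Here is a minimal sketch, assuming the snapshots land in a single directory as `.tar` files; the directory path, the 36-hour limit and the function names are placeholders, not anything from the posts above.

```python
import time
from pathlib import Path


def newest_backup_age_hours(backup_dir, pattern="*.tar", now=None):
    """Return the age in hours of the newest matching file, or None if there are none."""
    now = time.time() if now is None else now
    files = [p for p in Path(backup_dir).glob(pattern) if p.is_file()]
    if not files:
        return None
    newest_mtime = max(p.stat().st_mtime for p in files)
    return (now - newest_mtime) / 3600.0


def backup_is_fresh(backup_dir, max_age_hours=36, pattern="*.tar", now=None):
    """True only if a backup exists and the newest one is within the age limit."""
    age = newest_backup_age_hours(backup_dir, pattern, now)
    return age is not None and age <= max_age_hours


if __name__ == "__main__":
    # Placeholder path; point this at wherever your snapshots are copied.
    print("backup fresh:", backup_is_fresh("/mnt/nas/ha-snapshots"))
```

A fresh file only shows that the backup job ran. It does not replace an occasional test restore.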

There are a few HA integrations for UPSs, like APC.
Not sure what sensor data they expose.

But yes, you need to confirm your plan will work.

I have mine integrated in HA and it reports charge and runtime, but apparently some UPSs aren’t good at reporting as the batteries age. In the case I read about, the UPS didn’t give any indication the batteries were failing.
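Since the reported runtime can't always be trusted, one option is to write down how long the UPS actually lasted during an unplug test and compare that with what it claimed beforehand. This is only a rough sketch: the function name and the 50% tolerance are assumptions for illustration, and both numbers are ones you would note by hand.

```python
def runtime_check(reported_minutes, measured_minutes, tolerance=0.5):
    """Compare a measured unplug-test runtime with the UPS's own estimate.

    Returns (ratio, ok), where ratio is measured / reported and ok is True
    when the UPS delivered at least `tolerance` of what it claimed.
    The 0.5 default is an arbitrary illustrative threshold.
    """
    if reported_minutes <= 0:
        raise ValueError("reported_minutes must be positive")
    ratio = measured_minutes / reported_minutes
    return ratio, ratio >= tolerance


if __name__ == "__main__":
    # Made-up numbers: the UPS claimed 40 minutes and lasted 10.
    ratio, ok = runtime_check(reported_minutes=40, measured_minutes=10)
    print(f"delivered {ratio:.0%} of reported runtime, ok={ok}")
```

A low ratio suggests the batteries are aging faster than the UPS admits, which is the situation described in the post above.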