Disaster / Fallback Strategy when Home Assistant needs Assistance

Dear Community,

What is your fallback strategy for your Home Assistant infrastructure?

Last week the power to my Raspberry Pi was cut, which most likely corrupted the SD card. As a result, lights, devices and even the internet were unusable.
My partner was really pissed; even now she still wants to “quit all that stuff if things like that happen” (which I can kinda understand), and I spent my evening at the desk instead of on the couch.
Fortunately I had a recent backup on my NAS and a spare microSD card as well, so I could set up HA again. But it still took several hours, and a significant amount of sweating and swearing :crazy_face:

To make a long story short:
I really want/need to avoid downtime of Home Assistant, whether due to hardware failure (Raspberry Pi, microSD), bad updates, or errors in front of the display (misconfiguration or misuse).
Not only are 80% of the lights operated via HA, but the AdGuard Home add-on (which is in fact the DNS server for our internet access) relies on a functional system as well.
So even five minutes of not being able to operate the lights or surf the web is really bad.
Since I guess almost everyone uses HA at least for their lights, and probably a lot of you have been in situations like mine lately: what is your fallback strategy to minimize downtime and ensure availability in case something goes wrong?

My setup at the moment:

  • Raspberry Pi 4 with a SanDisk Extreme Pro microSD card
  • Home Assistant OS with (among other add-ons) AdGuard Home configured as DNS server and Zigbee via ConBee/deCONZ (to name the most important ones)
  • NAS pulling the backup folder every night
  • I do have a spare Raspberry Pi 3B and some used SSDs in the drawer

At the moment I am thinking about:
a) Making the live system more durable (against power cuts and wear) by using an SSD as the data disk
b) Using my Raspberry Pi 3B as a fallback system, with regularly (probably automatically?) restored backups from the live system

Did I miss an option? Can you think of better solutions, or do you have advice on how to set up a fallback system?

It was not the first time that I had to take care of the system immediately because e.g. lights didn't work as expected or didn't work at all :smiley: Grabbing the latest backup, restoring it and having everything work again is only the best case, and even that would take half an hour. My last downtime lasted several hours in total, and in my experience it is like that most of the time.
So I want to avoid that hassle, stay relaxed while the fallback system kicks in, and deal with the problems on the live system the next day.

Happy to hear your backup/fallback strategies!

My general strategy is a “fall back to dumb” approach for all critical infrastructure in the home. Anything where a failure of the home automation system would lead to major discomfort or even safety concerns, including but not limited to lighting, heating, locks and fire safety, must be able to fall back to normal manual usage when the brains of the HA system go down. And this fallback must be implicit and automatic. For lights that means normal pre-automation switches and light circuits. For climate, that means a thermostat that also works in dumb mode. For locks it means, well you know, a physical key.

The HA system is there to augment all this, not substitute it. If I turn off my server, the house turns back to a dumb normal house. I admittedly have a few lights that are only controlled electronically without a manual wall switch, but those are all either not that important or purely decorative. Even my security camera system works without HA on its own dedicated off the shelf NVR.

I think everybody here fully realizes that HA and the hardware typically used to run it are far from being high availability systems. Especially HA updates are highly problematic in that respect, due to a lack of a stable long term support release channel. Of course there are ways to make it more stable and reliable, but those can be a lot of upfront work. I rely almost exclusively on MQTT for my HA system and I only use a couple of integrations in HA. That makes breaking changes much less likely. I also use a lot of direct associations linking devices that will keep working if / when HA goes down.

I don’t think that having a physical fallback architecture with a second Pi would be beneficial. In fact, I think it would make it worse as it would increase maintenance and is likely to fail when needed the most. About SSD, keep in mind that modern high endurance SD cards use the same memory chips as SSDs do and do comparable wear leveling. Don’t forget a UPS on your Pi / NAS too, that will solve the corruption problem on power outage.

We had a similar discussion here before I started integrating HA into more areas/topics of our household. In the end it’s now the same approach as @HeyImAlex described. Concrete example: our most important everyday lights are connected to old 433 MHz RF-controlled power outlets, and the HA integration works via an ESP with an RF sender module. But we still have 2 manual remotes lying around which we also use regularly. Same thing for the heating: the button on the thermostat is operated by a SwitchBot, so if that one dies we just press the buttons ourselves. I would have liked a deeper integration into the heating system, but in the end the SwitchBot is a good compromise between “it works with HA” and “everything still works as before”.

:+1:

I completely agree with this.

And my system looks very similar to yours infrastructure-wise, aside from me not using MQTT for everything.

Having another machine capable of running HA at the ready and a decent/timely HA backup strategy is right behind that in priority.

My “thoughts” on this: building a surveillance/automation/heating control system that leaves front doors open, lights off, water running, and the wood stove / solar water “boiler” exploding is a disaster.

I totally agree. If HA goes down, I can still use the wall switches to turn on the lights. However, I have a ‘standby’ HA running that can take over if my main HA instance fails. And my Pi-hole (DHCP/DNS/adblock server), MQTT server and Zigbee2MQTT server run on a separate Pi, independent from HA.

Well - this is definitely a clear direction: if it's important, always keep the possibility to go dumb :smiley:
Happy I haven’t ordered heating/locks yet :smiley:

A couple of short questions first:
Are you aware of a reasonably reliable Zigbee bridge that one could put in the middle, between HA and the switches/motion sensors/lights? Then I could probably still interact via HA, but hand over control in case HA is down.

Regarding a fallback HA system: I think I still want to pursue that idea. Regular backups are already pulled to my NAS automatically.
The next step might be automatically restoring the second-latest backup from the live system to the fallback system. To make it as reliable as possible, I might come up with a dual backup/restore strategy: in addition to the regularly updated backups, a long-term backup that has been tested extensively, probably on a second SD card.
How do you guys keep your fallback system up to date? Totally manually?
Do you have any automatic A/B switching, i.e. is the fallback system always on and kicks in when the live system isn't reachable? Or, again, totally manually?
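
The "second-latest backup" idea could be sketched roughly like this. It is only an illustration: it assumes the NAS folder contains the backups as `*.tar` files and that their modification times reflect their age; `pick_backup` and `skip_newest` are made-up names, and the actual restore onto the fallback Pi is not covered.

```python
from pathlib import Path
from typing import Optional


def pick_backup(backup_dir: Path, skip_newest: int = 1) -> Optional[Path]:
    """Return the backup to restore, skipping the `skip_newest` most recent files.

    Files are ordered by modification time, newest first. Returns None when
    the folder does not contain enough backups.
    """
    backups = sorted(
        backup_dir.glob("*.tar"),  # assumption: backups are stored as .tar files
        key=lambda p: p.stat().st_mtime,
        reverse=True,
    )
    if len(backups) <= skip_newest:
        return None
    return backups[skip_newest]
```

With the default `skip_newest=1` the newest backup is skipped, so a backup taken right after a bad update or misconfiguration would not immediately end up on the fallback system.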

UPS: Are you aware of a kind of power-bank UPS that supports pass-through and delivers enough voltage/current for a Raspberry Pi 4, probably with an SSD connected?
If possible, I’d like to avoid a “big” power supply. I never had problems with the NAS, and having a 3-2-1 backup strategy leaves me comfortable enough.

Now for the thoughts on dumb vs. smart lights:
While I can totally follow you, I’m still a bit unsure whether there is a real and practical hybrid solution for e.g. our lights: we have wall switches for all lights, and after your feedback we will definitely continue using them. But that implies a couple of downsides:

  • If the (smart Zigbee) lights get turned off via the wall switch (which happens quite often out of habit), no automation in the world will turn them on again.
  • If they get turned on again via the wall switch, they return to their last state. So always dimmed to 10% if you want 100%, and vice versa :-p
  • There's always a second wireless remote (that's what my partner demands) next to the wall switch.
  • In case HA really is down, there's no possibility to dim them up again. Only plan C: reset them and pair them with the switches directly.

All together these are quite a few downsides. The only reliable dumb solution I can think of is, well… using dumb bulbs :smiley: As soon as I want to use e.g. motion triggers to turn lights on, or dim them, I (at least partially) give up on fallback strategies.

My conclusion for now regarding the lights:
Yes. The basic/emergency lights will need to stay 80% dumb. I can still use smart lights, but only at 100% brightness, and only turned off by automations. Turning them on relies on the hardwired method.

Uff. Still feeling a bit torn.
On the one hand it is absolutely clear that important parts always need to be operable in dumb mode.
On the other hand the important parts benefit most from being automated.
I would have loved to hear a story like “I have relied on HA for 6 years, have strategies X, Y, Z, and twice a year the live system fails, but it always got covered by plan B/plan C until I fixed the main system” :wink:

Thanks all for your constructive hints!

Can you tell me more? How do you keep it up to date, and does it take over automatically (if so, how)?

Same direction: do you do automatic restores to the fallback system? If so, how?

Interesting! Can you give an example of what you mean?

What do you think about having two versions of the backup system: an automatic one with the second-latest backup rolled out, and a manually and lovingly tested “long-term” system?

No. I just do a manual backup after a round of development.

Which tends to be every couple of days. :laughing:

And I don’t copy anything directly to a “fall-back” machine.

If my production machine fails I can easily have my latest config copied back over to the new machine in a short time.

And in the meantime all of my stuff still works the old-fashioned way until I get the automation system back running so there is no reason at all to go to panic mode.

And don’t forget that any backup system won’t be able to truly “just work” on failure of the main system if you use things like USB-based controllers (Z-Wave, Zigbee, etc.), since those will have to be physically moved to the backup machine.

So there can’t ever really be a true auto fall back system if you use that type of platform.

Like associating a light switch with a light across the room without needing HA as a middleman. Or a motion sensor with some driveway lights. Z-wave, Zigbee and 433MHz devices all support that natively.

I mean backups are obviously important. I do them every time I change something. I also have spare parts around, like a second Pi. If something goes wrong, I’d just copy my last full disk image to a new SD card, pop it into the spare Pi, move the Z-Wave stick and I’m up and running again. And since the house is still functional in dumb mode, there’s no need to rush anything. Pretty much what @finity explained. I don’t have an always-online spare running or anything like that though.

Nonono, not hardwired like that. Hardwired over a wireless relay or dimmer module behind the switch (EU way) or integrated with the switch (US way). This will automatically fall back to normal dumb operation when the network goes down or when the controller is offline. They can also communicate with other switches by themselves without a controller active (direct association).

Eh I just use a normal ‘big’ UPS for the entire rack, including HA, NAS, NVR, modem and PoE routers.

I use an IKEA hub (for non-essential) lights/switches, and can control those lights with HA, the remote controls and the phone app (though I guess it’s limited to IKEA devices).
For temperature sensors, motion sensors and switches I also use an Aqara M2 hub … what both have in common is that they support Wi-Fi and Ethernet (I have both wired: less interference on the Wi-Fi network plus a fast, reliable connection, with the hub close to the devices).
PS: And don’t use any “smart switches” for your freezer (besides for testing power consumption). You might find water running out of your freezer some day :laughing: (Yes, I did :wink:)

My 2 “controllers” are wired through a switch to the router (and keep their devices through at least 4 hours of power loss, as I have experienced so far), and can control the devices via the phone app (some via remote control as well), regardless of HA.

Then those aren’t the type of controllers I was talking about. Those are probably ethernet/wifi based devices.

I was talking about the type of controllers that connect to the HA PC via the local USB port on the machine itself.

Yes, I got that (after reading it again, after I posted :slight_smile:), though that was part of my idea of having a home system that does not depend entirely on one single device (HA with its “hard-connected” devices, of which I have none besides keyboard/mouse :slight_smile:), as not only software failure but also hardware and power failure has to be accounted for when it comes to a security/fallback strategy.

Yes, the hubs (gateways) are capable of connecting to the router either way (so they communicate with HA through Ethernet in my case). The devices on the Aqara hub are Zigbee/BLE/IR/Wi-Fi, those on the IKEA hub are Zigbee.

Wait, what?! I think I totally misunderstood the basic concept - I thought a Zigbee device can only be paired with one controller!
So in my case (most of my lights are Tradfri) I could just pair my lights/remotes with Home Assistant and a Tradfri gateway / Aqara hub? And the remote commands/motion events will get handed over to HA (in case it is alive), and actions from HA will go to the lights via the Tradfri gateway / Aqara hub?!
Is that great news or did I get it wrong again? :stuck_out_tongue_winking_eye:

Ah! Didn’t know they would fall back to dumb behavior in case the controller isn't available. The EU way I don't get, since power off means power off, right? Doesn't matter too much though, there are currently no plans to replace the switches :wink:

This is the route I want to go. I just think that I could minimize the downtime by e.g. already having the last backup rolled out automatically. And/or the tested long-term backup on disk. And/or having the fallback Pi ping the live system periodically. And/or… :smiley:
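
The "fallback Pi pings the live system" part could look roughly like the sketch below. It rests on a few assumptions: the live system is reached on Home Assistant's default web port 8123, three failed checks in a row count as "down", and the function names are made up for illustration. What "taking over" actually means is deliberately left open, since USB sticks (Zigbee, Z-Wave) would still have to be moved by hand.

```python
import socket
import time


def is_reachable(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within `timeout`."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def next_failure_count(failures: int, reachable: bool) -> int:
    """Reset the counter on a successful check, increment it on a failed one."""
    return 0 if reachable else failures + 1


def watch(host: str, port: int = 8123, threshold: int = 3, interval: float = 30.0) -> None:
    """Block until the live system has failed `threshold` checks in a row."""
    failures = 0
    while failures < threshold:
        failures = next_failure_count(failures, is_reachable(host, port))
        if failures < threshold:
            time.sleep(interval)
    # Placeholder: this is where a notification or a takeover step would go.
    print(f"{host}:{port} unreachable {threshold} times in a row")
```

Requiring several consecutive failures keeps a single slow response or a short reboot of the live system from triggering the fallback.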

Lots of great input! Thank you all, I really appreciate it! :slight_smile:

I'm not sure what/how you are thinking, or what you think you got wrong … have you thought about what I wrote?

“For non-essential” refers to the devices connected to the IKEA controller/gateway … and yes, HA handles the automation of these lights/switches. (I have no idea how you paired your devices; I paired them with my phone app through the IKEA controller, NOT with HA, and use the IKEA integration to connect HA to the controller and its devices.) And yes, as you correctly assume, the HA automations and manual clicks in the HA UI control the lights/switches … no magic there.
So I still have the IKEA mobile app, which I used to pair the devices to the controller, and I can use that app to control the devices as well. No magic: the phone app talks via Wi-Fi to the router, then via Ethernet to the controller, then to the devices (that I have paired with the controller). Hope you follow me so far … because I actually also have three 5-button IKEA remote controls (paired the default IKEA way to their respective devices) … So I don’t know if this is great news or just the common procedure … If HA (for some reason) can’t control the devices, I have my IKEA phone app; if I don’t have my phone or it’s dead, I have the IKEA remotes (which I use sometimes when passing by :slight_smile:); if the batteries there are dead too … I light a candle :grin:
I also have the Aqara controller, with 8 temperature sensors, 4 motion sensors (2 of them control IKEA lights through HA) and 3 switches, all paired to the Aqara M2 (I actually also added basic functionality from my IR remotes for TV/receiver through the app, just for fun) … I added everything through my Aqara phone app (so I can control everything through the Aqara app), and added the bunch to the “HomeKit Controller” integration in HA (Aqara IR is not supported in HomeKit Controller) … but the rest of the devices, plus the Aqara arm feature, work great in the HA HomeKit integration, with the automations as well.
PS: Yes, I could set up automations in the phone app as well, but that’s probably taking it a step too far :slight_smile:

So yes, with the IKEA integration in HA, you add the IKEA gateway (Trådfri gateway)
… and “like magic” all your devices are ready to go; your phone app works as usual, and so do your IKEA remotes (it’s a bit tricky with some automations made in HA when they are also triggered by the remote, but you’ll figure that out eventually :wink:)

Not really. The relay or dimmer module sits in the wallbox and electrically replaces the switch. The light fixture is entirely controlled by the module; the wall switch is disconnected from the light circuit. The switch is then directly connected to the module instead. When you hit the switch, it tells the module to turn the light on or off locally. It behaves like a normal switch, but it goes through the module's microcontroller, which will sync the switch commands with the network (if it’s online).

It’s the same with all-in-one switches more commonly used in the US. They just have the module integrated into the switch itself.

This will very significantly limit your options for automatic ‘dumb fallback’. Replacing the switches with smart ones is pretty much the basis of the entire dumb fallback concept. Using smart lightbulbs will limit your options a lot more.

That is the missing link I was unsure about. If there had been a possibility for HA to react to remote events directly, I would have put the gateway in the middle. I mean, it could still be part of a “fallback” strategy anyway!

This I'm going to think about: whether Remote → Light ↔ Gateway ↔ Home Assistant could be a helpful piece :slight_smile: Thanks!

Totally agree. Aside from the cost and work to replace them, politically it is not a good moment right now to come up with “Hey, new switches everywhere!” :wink: :smiley:

So, my conclusion:

  • I will identify critical lights/devices and find an “as dumb as possible” solution
  • Regular, possibly automated backups
  • Regularly check whether restoring them is possible
  • UPS at least for the Raspberry Pi
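
For the "regularly check" point, a cheap first step before any real restore test could be to verify that a backup file can be read at all. The following is a minimal sketch, assuming the backups are tar archives; `archive_is_readable` is a made-up helper, and a readable archive does not prove that a restore will actually work.

```python
import tarfile
import zlib
from pathlib import Path


def archive_is_readable(path: Path) -> bool:
    """Return True if `path` is a tar archive whose file members can all be read."""
    try:
        with tarfile.open(path) as archive:
            for member in archive:
                if member.isfile():
                    extracted = archive.extractfile(member)
                    # Read in chunks so large backups do not fill up memory.
                    while extracted.read(1024 * 1024):
                        pass
        return True
    except (tarfile.TarError, OSError, EOFError, zlib.error):
        # Covers truncated, corrupted or non-tar files.
        return False
```

Run nightly against the newest file on the NAS, a check like this would at least catch a truncated or corrupted copy early; restoring onto the spare Pi from time to time remains the only real test.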

Thank you all for your great input! Feel free to add your ideas, strategies and disaster solutions.
Cheers, and I wish you all a non-corrupted backup at hand when needed :slight_smile: