Reliability and failover

I have started automating my home and am currently testing smart lights and AC control. So far I’m very happy with Home Assistant + zigbee2mqtt. I can see how this could cover all my use cases, BUT I have serious concerns about reliability and failover when I’m away. Professionally I deal with system administration and programming, I’ve had a home server for most of my life, and I know things run fine until they don’t, and one needs to plan for this.

My major concern is the AC: how do I make sure my heating won’t be left running indefinitely if something goes wrong with the Home Assistant server, the Wi-Fi or something else? I’m using Zigbee for my sensors and relay.

As far as I know, an actual failover for HA is not really possible. I design everything to be complementary to existing functions as much as possible.

So as far as the airco/heating goes, its thermostat should be set normally, as it would be in a non-automated home, and it should turn off when no one is there (assuming HA works when you leave; I’d normally notice when I do). I automate it to turn on when I’m due home (that may fail, but that would cost only comfort, not money). I let HA handle the exceptions, such as nobody in the room, a window open, etc.

Lights should work when HA is down, so normal light buttons must be there: either a smart switch with the button hardwired to the bulb, or a direct association between the smart switch/bulb and remotes. In my experience, Z-Wave is more reliable than Zigbee, and Zigbee is more reliable than WLAN.

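If you use ESPs to control something, consider putting the logic inside them if you can, but know that ESPs can fail too. Also remember that ESPs using the HA API reboot every 15 minutes if HA is down, unless you disable that (in ESPHome this is the reboot_timeout option of the api: component; setting it to 0s disables it).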

Also avoid cloud-based devices if you can, but I get the feeling I don’t need to tell you that.

Backups: store them outside HA! Not only HA itself, but also the Z-Wave and Zigbee configurations. The Z-Wave config is stored in the coordinator itself, so if it breaks you’ll be happy you have a backup.

Take a look at Versatile Thermostat in HACS. It has a safety mode for when the temperature sensors stop responding.

Thank you. Apart from all the cool features, does it do anything on the low-level side to handle the case where the HA server is down?

Thanks! What’s ESP in your case? ESPHome or something else?
The lights will be IKEA, so I’ll probably have them paired.

There are some threads on the forum that discuss “high availability”, but I don’t think there is a failover system integrated into HA.
My advice would be similar to some of the others:

  • use hardware that is reliable (for example, avoid an SD card if using an RPi)
  • use the backup function and make sure you can get to the backups if hardware fails.
  • avoid cloud based systems
  • do not overload your wifi
  • be aware of breaking changes when doing updates
  • stay as up to date as possible to avoid becoming overwhelmed by changes as they come
  • use best practices for specific hardware (USB extension cables, avoid duplicate Wi-Fi addresses, use a wired network where you can)
  • learn how to write YAML
  • learn how to format code on the forum
  • expect that HA is a work in progress
  • learn a lot and have fun.

There are likely as many recommendations on this forum as there are people here.
If you find some good answers to your own needs, share them, as others will likely appreciate your knowledge.

I said “consider”. Now you’ve made me admit I didn’t make much use of ESP logic ;) Whether it is ESPHome (which I use 90% of the time) or something else does not matter that much, as long as you have some form of control over what happens.

For instance, one example I saw was people controlling chlorine injection into a pool. In that case, make sure the ESP itself knows when to stop, or that it stops after x time anyway. Do not rely on HA to turn it off; if HA doesn’t, it won’t be safe.
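To make the “stop after x time anyway” idea concrete, here is a minimal Python sketch of the logic that would live on the device itself. The DosingPump class, the 60-second limit and the injectable clock are hypothetical illustrations, not ESPHome code; on a real ESP you would express the same rule in its own firmware or config.

```python
import time


class DosingPump:
    """Hypothetical pump controller that enforces its own maximum run time.

    Whatever the remote controller (e.g. HA) does or fails to do, the pump
    never runs longer than max_on_seconds after a start command.
    """

    def __init__(self, max_on_seconds: float, clock=time.monotonic):
        self.max_on_seconds = max_on_seconds
        self._clock = clock
        self._started_at = None  # None means the pump is off

    def start(self) -> None:
        # A repeated start does NOT extend the run: the deadline is fixed at
        # the first start, so a stuck sender can't keep the pump running.
        if self._started_at is None:
            self._started_at = self._clock()

    def stop(self) -> None:
        self._started_at = None

    def tick(self) -> bool:
        """Call from the device's main loop; returns True while running."""
        if self._started_at is not None and (
            self._clock() - self._started_at >= self.max_on_seconds
        ):
            self._started_at = None  # local hard stop, no HA involved
        return self._started_at is not None
```

The important design choice is that the cutoff is evaluated locally in tick(), so losing HA, Wi-Fi or the “off” message can only ever shorten the run, never lengthen it.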

The things I use it for (a display, controlling blinds, Bluetooth proxies, presence sensors, etc.) do not require much built-in logic.

Thanks. I’m proficient in writing ESP32 firmware and thought about using something like the Sonoff hub (ESP32 + Zigbee) to run some failsafe rules. But that’s effort I won’t have time for soon; it would be cheaper to buy a ready-made solution.

I expected there might be a temperature sensor that fires events at a predefined threshold, so I could pair it directly with the relay and have it work without a hub, like IKEA’s switches and motion sensors.
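The hub-less behaviour described above can be modelled in a few lines of Python: the sensor itself decides, and bound relays simply obey on/off commands. This is a hypothetical simulation (ThresholdSensor, bind and the 19/21 °C thresholds are made up for illustration), not a description of any existing Zigbee device; it uses two thresholds (hysteresis) so the relay doesn’t chatter around a single value.

```python
from typing import Callable, List


class ThresholdSensor:
    """Hypothetical sensor that sends on/off straight to bound devices.

    Heating is switched ON below low_c and OFF above high_c (hysteresis),
    and a command is only sent when the state actually changes.
    """

    def __init__(self, low_c: float, high_c: float):
        if low_c >= high_c:
            raise ValueError("low_c must be below high_c")
        self.low_c = low_c
        self.high_c = high_c
        self.heating = False
        self._bound: List[Callable[[bool], None]] = []

    def bind(self, relay_command: Callable[[bool], None]) -> None:
        # Analogous to a direct binding: no hub sits between sensor and relay.
        self._bound.append(relay_command)

    def report(self, temperature_c: float) -> None:
        if not self.heating and temperature_c < self.low_c:
            self._send(True)
        elif self.heating and temperature_c > self.high_c:
            self._send(False)

    def _send(self, on: bool) -> None:
        self.heating = on
        for command in self._bound:
            command(on)
```

For example, feeding the readings 20.0, 18.5, 19.5, 20.9, 21.5 and 20.0 sends exactly two commands to the relay: on at 18.5 and off at 21.5.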

An alternative would be a super-simple hardware thermostat relay wired into the boiler mains or the control loop.

Wow, it’s amazing!

I use a multizone thermostat and have set up a simple automation: if any of my zones needs heat, my dry-contact relay switch goes on, and that signals the heat pump to heat.

  • the automation runs every 5 minutes and switches the relay on if needed (regardless of whether it is already on or off)
  • my Z-Wave relay has an auto-off timer (15 minutes) defined directly on the relay.

This ensures it will go off if it does not receive the “on” signal for any reason.
I am still screwed if HA is down. I was thinking of using a relay board running ESPHome code, but never really looked into it.
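The 5-minute refresh plus 15-minute auto-off described above is a “dead man’s switch”. A minimal Python sketch, assuming heat is needed the whole time and modelling time in whole seconds (AutoOffRelay and simulate are hypothetical names, not the Z-Wave API), shows why it fails safe: once HA stops re-sending “on”, the relay switches itself off at most 15 minutes after the last refresh.

```python
from typing import Optional


class AutoOffRelay:
    """Relay with an on-device auto-off timer (hypothetical model).

    Every "on" command (re)arms the timer; if no new "on" arrives within
    auto_off_s seconds, the relay switches itself off.
    """

    def __init__(self, auto_off_s: float):
        self.auto_off_s = auto_off_s
        self._off_at: Optional[float] = None  # deadline; None means off

    def turn_on(self, now: float) -> None:
        self._off_at = now + self.auto_off_s

    def turn_off(self) -> None:
        self._off_at = None

    def is_on(self, now: float) -> bool:
        # The relay enforces the deadline itself; no controller needed.
        if self._off_at is not None and now >= self._off_at:
            self._off_at = None
        return self._off_at is not None


def simulate(ha_dies_at: float, end: float, refresh_s: int = 300,
             auto_off_s: float = 900.0) -> Optional[float]:
    """Return the first second at which the relay is off, or None.

    While HA is alive, the automation re-sends "on" every refresh_s
    seconds (heat is assumed to be needed the whole time).
    """
    relay = AutoOffRelay(auto_off_s)
    for t in range(int(end) + 1):
        if t < ha_dies_at and t % refresh_s == 0:
            relay.turn_on(t)
        if not relay.is_on(t):
            return float(t)
    return None
```

With HA dying at t = 1000 s, the last refresh was at 900 s, so simulate(1000.0, 3600.0) returns 1800.0. The 3:1 ratio between auto-off and refresh interval means up to two lost refreshes are tolerated before the heating drops out.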

That might be the limiting factor regarding reliability.

I changed my setup from Zigbee to ESPHome to be as resilient as possible.

My thermostat/hygrostat hardware is based on an ESP, and thanks to ESPHome it can work completely independently of HA.

ESPHome is good because it can contain logic that runs even when HA is down. For that reason alone, Wi-Fi looks like the best wireless option right now. For instance, my heat pumps automatically switch to heat/cool at 20 °C at 22:00. I suppose it’s also possible to turn that into fallback logic for when the connection is lost, instead of a fixed time.
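A rough sketch of that fallback idea in Python, assuming the device notices each HA command or heartbeat and runs the decision locally. FallbackController, the 10-minute grace period and the command format are hypothetical; only the heat/cool 20 °C default comes from the post. While HA is in contact, its last command wins; after the grace period without contact, the device reverts to its safe default.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class HeatPumpCommand:
    mode: str  # e.g. "off", "heat", "cool", "heat_cool"
    target_c: Optional[float] = None


class FallbackController:
    """Hypothetical on-device logic: follow HA while connected, otherwise
    fall back to a local default after grace_s seconds without contact."""

    def __init__(self, grace_s: float, fallback: HeatPumpCommand):
        self.grace_s = grace_s
        self.fallback = fallback
        self._last_contact: Optional[float] = None
        self._last_ha_command: Optional[HeatPumpCommand] = None

    def on_ha_command(self, cmd: HeatPumpCommand, now: float) -> None:
        self._last_contact = now
        self._last_ha_command = cmd

    def on_ha_heartbeat(self, now: float) -> None:
        self._last_contact = now

    def effective_command(self, now: float) -> HeatPumpCommand:
        connected = (self._last_contact is not None
                     and now - self._last_contact < self.grace_s)
        if connected and self._last_ha_command is not None:
            return self._last_ha_command
        # No HA, or HA never told us anything: use the safe local default.
        return self.fallback
```

Usage: FallbackController(grace_s=600.0, fallback=HeatPumpCommand("heat_cool", 20.0)). Compared with a fixed 22:00 switch, this only takes over when HA actually goes quiet.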

If the temperature control of any zone relies on wireless temperature/humidity sensors, there should be at least two of them, because batteries die and connections break. I have template sensors that combine several sensors and smooth the data. That also makes it easy to replace sensors without touching any other config.
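HA template sensors are written as Jinja templates, so the Python below only illustrates the logic such a combined sensor could follow; the function and class names, the 15-minute staleness limit and the smoothing factor are hypothetical. Readings older than the limit are dropped (a dead battery shows up as a stale timestamp), the median of the rest is taken so one faulty sensor can’t drag the value far, and an exponential moving average smooths the result.

```python
import statistics
from typing import Dict, Optional, Tuple


def combined_temperature(
    readings: Dict[str, Tuple[float, float]],  # name -> (value_c, timestamp_s)
    now: float,
    max_age_s: float = 900.0,
) -> Optional[float]:
    """Median of the readings that are fresh enough; None if none are."""
    fresh = [value for value, ts in readings.values() if now - ts <= max_age_s]
    if not fresh:
        return None  # every sensor is stale: report "unknown", not a guess
    return statistics.median(fresh)


class Smoother:
    """Exponential moving average to calm down noisy readings."""

    def __init__(self, alpha: float = 0.3):
        if not 0.0 < alpha <= 1.0:
            raise ValueError("alpha must be in (0, 1]")
        self.alpha = alpha
        self.value: Optional[float] = None

    def update(self, sample: Optional[float]) -> Optional[float]:
        if sample is None:
            return None  # don't hide an outage behind a stale average
        if self.value is None:
            self.value = sample
        else:
            self.value += self.alpha * (sample - self.value)
        return self.value
```

Because automations only reference the combined value, swapping out a physical sensor means editing the readings dictionary and nothing else.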

An old laptop is really good for HA: it’s fast enough, has an SSD, doesn’t hog power, and has a display, keyboard and UPS (the battery), all integrated.

Make sure all things you make smart fall back to some sane functionality in case of loss-of-smartness.

Every time the internet or electricity goes off one learns something new the hard way.

Just a footnote: in the last 4 years my HA server (running on VMware under Debian) has never stalled or frozen…

Some people suggested ESPHome, but it’s still a ‘computer’ and can also break down…
Or the Wi-Fi/switch, or even the AC…
I know some people run a secondary instance of HA, which might be a solution.

But of course, that still runs on a computer, and it can break down; everything can break down…
So in my opinion, there is no 100% safe system…

While ESPHome is, strictly speaking, not an RTOS, there is a big difference between it and a full-fledged machine running a rather complex OS like HA (booting from a dedicated disk with its own controller and firmware, using dedicated network hardware with its own controller and firmware, and so on), with even more complexity when HA runs in a virtual machine.

For me, not one ESPHome device has started to misbehave (or died). My HA server, on the other hand, has regular downtime, for example restarting for an update or for hardware maintenance on the host. I’ve also had network gear fail in the past, so some ESPHome devices couldn’t connect to HA anymore, but they were still 100% controllable locally (and via the fallback access point).

I also used Zigbee in the past, but those devices lack the resilience ESPHome offers: it lets the logic run locally without needing the network or HA (or any other controller or single point of failure).

Thanks for all the suggestions. I see lots of people steering me towards ESPHome (which is a great platform, and I use it), but there are several issues:

  • ESPHome relies on Wi-Fi; nodes can’t autonomously talk to each other the way Zigbee devices can
  • I have started with Zigbee devices, and adding Wi-Fi would require bridging via software such as HA (which I also use)

Overall I see Zigbee as the more resilient (communications) platform, and that’s what I want to use. I see zero technical obstacles to having an edge node bound to sensors and switches and running basic rules. There shouldn’t be any issue with two such redundant devices running in parallel, bound to the same devices. The problem is finding the easiest way to get this.

Such an edge (or router) node with proper bindings running rules beats any other solution that relies on software running on a server. It is independent of the Wi-Fi as well.
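One condition makes two redundant rule nodes safe to run in parallel: they must send absolute states (“set off”) rather than relative ones (“toggle”), so a duplicated command is harmless. The Python sketch below is a hypothetical model (Relay, EdgeNode, deliver and the 24 °C limit are invented for illustration, not a Zigbee API) of two nodes reacting to the same sensor report with a single failsafe rule.

```python
from typing import List


class Relay:
    def __init__(self) -> None:
        self.on = False
        self.commands_received = 0

    def set(self, on: bool) -> None:
        # Level-triggered: "set off" twice is the same as once.
        self.commands_received += 1
        self.on = on

    def toggle(self) -> None:
        # Edge-triggered: two redundant senders would cancel each other out.
        self.on = not self.on


class EdgeNode:
    """Hypothetical rule-running node bound to the same sensor and relay
    as its twin. It only ever sends absolute states, never toggle()."""

    def __init__(self, relay: Relay, max_temp_c: float):
        self.relay = relay
        self.max_temp_c = max_temp_c

    def on_temperature(self, temp_c: float) -> None:
        # Failsafe rule: too warm -> heating off, whatever HA wants.
        if temp_c > self.max_temp_c:
            self.relay.set(False)


def deliver(nodes: List[EdgeNode], temp_c: float) -> None:
    """Simulate one sensor report reaching every bound node."""
    for node in nodes:
        node.on_temperature(temp_c)
```

If both nodes handle a 25 °C report, the relay receives “off” twice and ends up off either way; two toggles on the same report would instead leave the heating exactly where it was.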

So how did you solve

Does your setup rely on a single point of failure, since Zigbee only allows one coordinator?

They can work autonomously because they allow advanced local logic, for example a forced turn-off of your load (AC) when Wi-Fi disconnects or after x time. The Zigbee devices I used to own didn’t allow such “failover” functions at all…

You can use ESP-NOW for direct node-to-node communication.

At the moment it is not yet part of official ESPHome, but it is available as a custom component.

Tasmota also has rules, though not as a YAML file. It also appears one can talk to the ZB module via console commands. I haven’t had time to go deep on this yet.

Indeed. They mostly run on the same hardware, while ESPHome also supports chips that aren’t compatible with Tasmota, like Beken and Realtek (found in a lot of cheap Tuya Wi-Fi products), or the Raspberry Pi Pico (W) MCU.

Obviously this and more should be possible thanks to the great amount of freedom open source software brings (like HA!). A while ago I read that the last part of the ESP32 that isn’t open source is the Wi-Fi stack, but work is ongoing to make that fully open source too!

I was so fed up with my unreliable Zigbee network that I took the time to find real alternatives. After a short try with Tasmota I found ESPHome, and to this very day I haven’t regretted for one second getting rid of all my Zigbee gear in favor of ESPHome. The difference couldn’t be bigger: reliability, and especially resilience, is on another level with local logic on the nodes!

Regarding reliability and failover: this is possible to accomplish with the right setup. If you virtualize or containerize HA and run a network-based Zigbee coordinator, you can configure this. I run my homelab on VMware ESXi (two hosts) and do backups with Veeam. If my main host died, I could spin up HA from the backup, have the virtual machine online within a minute or two, and then move the virtual machine in the background from my backup NAS to my “production” storage. There is also functionality that can do this automatically.
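For the automatic variant, the core decision is usually “how many missed health checks before I fail over?”. The Python sketch below is a generic, hypothetical illustration of that decision (FailoverMonitor and its check/promote callbacks are invented names), not how VMware or Veeam implement it. Requiring several consecutive misses keeps a normal HA restart for an update from triggering a failover; a real setup must also ensure the old primary stays down, otherwise two HA instances would fight over the same network coordinator.

```python
from typing import Callable


class FailoverMonitor:
    """Hypothetical standby-side monitor: declare the primary dead only
    after `threshold` consecutive failed health checks, then call
    `promote` exactly once."""

    def __init__(self, check: Callable[[], bool], promote: Callable[[], None],
                 threshold: int = 3):
        self.check = check
        self.promote = promote
        self.threshold = threshold
        self.misses = 0
        self.promoted = False

    def poll(self) -> None:
        """Call on a fixed interval, e.g. from a scheduler on the standby."""
        if self.promoted:
            return  # never promote twice
        if self.check():
            self.misses = 0  # any success resets the count
            return
        self.misses += 1
        if self.misses >= self.threshold:
            self.promoted = True
            self.promote()
```

With threshold=3 and a poll every minute, a restart shorter than about three minutes never causes a failover, at the cost of up to three minutes of outage when the host really dies.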

Network-wise, there are routers that support failover to 4G/5G, and in theory you could also have a spare network-based coordinator running unconfigured on your network. If you throw enough money at the problem you can make your system quite resilient, but building “redundant everything” at home is rarely a good economic choice. With a VPN connection and a virtualized environment you could fix most problems remotely (and you need a good backup regime, of course).

I programmed the controller for my heat pump to fail to a known state if Home Assistant stops checking in with it (i.e. it turns off if Home Assistant hasn’t requested data for 5 minutes). This covers connectivity and software issues on the HA side.

If the controller itself loses power, it opens the relays that control the actual system, so the system always turns off. There is additional protection configured in the heat pump itself, so even if the controller froze and failed in a closed state, the call would terminate after a set timeout.

Critical systems like this should run in a closed loop and fail to a known state in case something stops responding or starts sending dubious commands.
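The software part of that closed loop (check-in timeout plus rejecting dubious commands) can be sketched in Python as below. Only the 5-minute check-in window comes from the post; HeatPumpController, SAFE_STATE and the 5 to 25 °C accepted range are hypothetical. The relays opening on power loss and the heat pump’s own call timeout are hardware layers this sketch can’t represent.

```python
from typing import Optional

SAFE_STATE = "off"  # the known state everything falls back to


class HeatPumpController:
    """Hypothetical on-device logic: HA must keep polling, and commands
    are sanity-checked before they are obeyed."""

    def __init__(self, checkin_timeout_s: float = 300.0,
                 min_c: float = 5.0, max_c: float = 25.0):
        self.checkin_timeout_s = checkin_timeout_s
        self.min_c = min_c
        self.max_c = max_c
        self.state = SAFE_STATE
        self.target_c: Optional[float] = None
        self._last_poll: Optional[float] = None

    def poll(self, now: float) -> str:
        """HA requesting data counts as a check-in."""
        self._last_poll = now
        return self.state

    def command(self, target_c: float) -> bool:
        """Obey a heat request only if it looks sane; otherwise go safe.
        The range check also rejects NaN, since NaN fails every comparison."""
        if not (self.min_c <= target_c <= self.max_c):
            self._fail_safe()
            return False
        self.state, self.target_c = "heat", target_c
        return True

    def tick(self, now: float) -> None:
        """Runs periodically on the controller itself, not on HA."""
        if self._last_poll is None or now - self._last_poll > self.checkin_timeout_s:
            self._fail_safe()

    def _fail_safe(self) -> None:
        self.state, self.target_c = SAFE_STATE, None
```

Note the asymmetry: HA has to keep proving it is alive for heating to continue, while silence or a nonsensical command always leads to “off”.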

Your best bet is to use a standalone wall thermostat that integrates with HA; that way you only control the setpoint instead of switching the actual heat source on and off.
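The division of responsibility in that suggestion can be sketched in Python: the thermostat owns the on/off decision, and HA can only move the setpoint within limits stored on the device. WallThermostat, its method names and the default limits are hypothetical. If HA disappears, the thermostat simply keeps regulating around the last setpoint it accepted.

```python
class WallThermostat:
    """Hypothetical standalone thermostat: it owns the on/off decision.

    HA can only change the setpoint, and only within limits set on the
    device. If HA disappears, the last setpoint keeps being regulated.
    """

    def __init__(self, setpoint_c: float = 20.0, min_c: float = 7.0,
                 max_c: float = 24.0, hysteresis_c: float = 0.5):
        self.min_c, self.max_c = min_c, max_c
        self.hysteresis_c = hysteresis_c
        self.setpoint_c = setpoint_c
        self.calling_for_heat = False

    def set_setpoint_from_ha(self, value_c: float) -> float:
        # Clamp instead of trusting the caller; return what was applied.
        self.setpoint_c = min(self.max_c, max(self.min_c, value_c))
        return self.setpoint_c

    def regulate(self, room_c: float) -> bool:
        """Local control loop: runs on the thermostat, never on HA."""
        half = self.hysteresis_c / 2
        if room_c < self.setpoint_c - half:
            self.calling_for_heat = True
        elif room_c > self.setpoint_c + half:
            self.calling_for_heat = False
        return self.calling_for_heat
```

A request for 30 °C would be clamped to 24 °C, and the worst HA can do by failing is leave the house at a sane setpoint, instead of leaving the heat source switched on.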
