HA HA not funny Home Assistant in a cluster

I’m looking to rebuild Home Assistant in a new configuration, but I’m not sure how I should split up services. I was thinking about running HAOS in a VM under Proxmox, and things like Mosquitto, Pi-hole, Node-RED, etc. as LXC containers. I currently have everything in its own container using Docker under Alpine Linux on HP thin clients. Reliability has been OK, but not where I want it. I’m sometimes away for long periods, and HA has been great at keeping things going. So far it has only gone down once while I was away, but there have been other outages, and sometimes I didn’t have the time to touch anything, so it would be down for a few days. Since I had moved Mosquitto and Node-RED to a different box, I could just plug in a backup Pi with HA and all was good. But restores were always a problem because of breaking changes across versions. The backup box would be months if not years out of date, and even the main box was often not current.

My goal is to get more resilience to faults, via a cluster or a warm standby with replication. Would this be better with Proxmox or maybe Kubernetes? I’ve been using Home Assistant for many years, starting on a Pi. When Pis became hard to get, I moved to HP thin clients and ran everything under Docker. Having moved yet again, I’m going to rebuild from scratch, so I want to do what I can to make things better. I run over 150 Tasmota devices, and I’m looking at ESPHome for some new items I want to build.

I’ve done some searching and reading here, but haven’t found all that much. It doesn’t help that High Availability and Home Assistant both share the abbreviation HA. This project has been on the back burner ever since I had SD card failures on the Pi platform, many years ago. Just the other day, I had my first M.2 SSD failure, which reminded me that I really want some kind of redundancy for HA. Is anyone running HA in a clustered environment?

If you want a solution that can scale up to tier-4 datacenter SLA and infrastructure, for use at home with HA or any other domotics platform, the only reasonable option is Proxmox. You can start with a single node and keep adding nodes until you have enough to create a cluster.

Backups with Proxmox can also be fully automated, and in terms of service continuity Proxmox is as good as any of the top type-1 hypervisors. Plus, if you opt for a paid license to support the project, the cost is very reasonable for homelab use.

For a dependable domotics/HA setup the only reasonable option is Proxmox IMHO.

But don’t forget a good solution for wireless networks like Zigbee, or for other hardware like a Google Coral stick.

If you use a USB dongle here, it will be very difficult or impossible to migrate automatically in the event of a failure. I would rather use a LAN Zigbee coordinator, but then there is the “risk” that if it fails, your entire Zigbee network fails.

I chose this one: ZigStar LAN Gateway, used with zigbee2mqtt.
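Since a LAN coordinator becomes a single point of failure, it may be worth probing it from whichever node is currently active. The following is only a minimal sketch: it checks that a TCP port accepts connections and nothing more. The address and port in the usage note are placeholders, not values from this setup.

```python
import socket


def coordinator_reachable(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        # create_connection resolves the host and performs the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unreachable hosts.
        return False
```

For example, coordinator_reachable("192.0.2.10", 6638) would probe a hypothetical gateway; use whatever address and port your own coordinator is configured with. A successful connection only shows the gateway is listening, not that the Zigbee network behind it is healthy.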

I use only one VM, for Home Assistant, with the Portainer add-on installed. Each day the VM is backed up to a separate NAS. If I’m going to play with a lot of add-ons, I create a manual backup first. The big benefit is that if I lose the Proxmox node, I can just get a new PC, install Proxmox and upload the backup. The full backup takes just 4 GB, which is really nice, and the downtime for the full backup is under 2 minutes.
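A daily backup only helps if it actually ran, so a small freshness check on the NAS side can catch a silently failing job. This is a rough sketch, assuming the backups land as plain files in one directory. The 26-hour threshold is an arbitrary choice (a daily job plus some slack), and nothing here talks to Proxmox itself.

```python
from datetime import datetime, timedelta
from pathlib import Path
from typing import Optional


def newest_backup_age(backup_dir: str, now: Optional[datetime] = None) -> Optional[timedelta]:
    """Age of the most recently modified file in backup_dir, or None if there are no files."""
    now = now or datetime.now()
    files = [p for p in Path(backup_dir).iterdir() if p.is_file()]
    if not files:
        return None
    newest_mtime = max(p.stat().st_mtime for p in files)
    return now - datetime.fromtimestamp(newest_mtime)


def backup_is_fresh(backup_dir: str, max_age: timedelta = timedelta(hours=26)) -> bool:
    """True if at least one file in backup_dir was modified within max_age."""
    age = newest_backup_age(backup_dir)
    return age is not None and age <= max_age
```

Calling backup_is_fresh on the directory where the NAS share is mounted (the path depends on your setup) from a scheduled job, and sending a notification when it returns False, would flag a stale backup before you need to restore from it.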

Thanks for the replies. It sounds like Proxmox is the way to go. I will do some experiments to see how failover works for that.
I don’t use any Zigbee or Z-Wave stuff as I tend to build whatever I need from ESP boards. Currently my cameras are on a different system. However, I do want to merge cameras into the mix and having Frigate use a Coral TPU would be great. All the nodes will have a mini PCIe slot so if I can pass that through in Proxmox it would work, right? Or I can go USB if my hardware doesn’t support IOMMU?
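One quick way to see whether a node is ready for PCIe passthrough is to look at the IOMMU groups the Linux kernel exposes under /sys/kernel/iommu_groups. The sketch below assumes it runs on the Proxmox host itself. An empty result suggests the IOMMU is not enabled in firmware or on the kernel command line, in which case the USB route would be the fallback.

```python
from pathlib import Path


def iommu_groups(sysfs_root: str = "/sys/kernel/iommu_groups") -> list[str]:
    """Return the IOMMU group names exposed by the kernel, or [] if there are none."""
    root = Path(sysfs_root)
    if not root.is_dir():
        return []
    names = [p.name for p in root.iterdir() if p.is_dir()]
    # Group names are numeric strings; sort them so "10" comes after "2".
    return sorted(names, key=lambda name: (len(name), name))


if __name__ == "__main__":
    groups = iommu_groups()
    if groups:
        print(f"IOMMU looks active: {len(groups)} group(s) found")
    else:
        print("No IOMMU groups found; PCIe passthrough is unlikely to work as-is")
```

Having groups is necessary but not sufficient: the Coral card also needs to sit in a group that can be handed to the VM cleanly, which depends on the board.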

I’m reluctant to add containers to HAOS as add-ons. In the past this was always a problem: every time I had to reboot HA, there was no guarantee that everything would come up in the right order. And HA always wanted reboots for updates or changes. That was one of the things I was hoping to fix with a cluster. Moving everything to separate containers took care of that, and I really do like Portainer for management. Getting the dependencies right made everything “just work”. But with HA in a container, I did lose a lot of optional things. I haven’t had much time with HAOS, but I did spin it up so I can look it over when I get some free time.
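The “right order” problem is essentially a topological sort over service dependencies. As an illustration only, here is a sketch using Python’s standard graphlib module. The dependency map in the usage note is a made-up example, not the actual setup described above.

```python
from graphlib import TopologicalSorter


def start_order(depends_on: dict[str, set[str]]) -> list[str]:
    """Return service names ordered so that each one starts after its dependencies.

    depends_on maps a service to the set of services it needs running first.
    Raises graphlib.CycleError if the dependencies are circular.
    """
    return list(TopologicalSorter(depends_on).static_order())
```

With a hypothetical map such as {"mosquitto": set(), "homeassistant": {"mosquitto"}, "nodered": {"mosquitto", "homeassistant"}}, this yields mosquitto first, then homeassistant, then nodered. It only computes an order; waiting until each service is actually ready before starting the next is a separate step.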

For whatever it’s worth, anything that can be run outside HA should be run outside HA. This is the beauty of using Proxmox, as all communication between what runs inside and outside HA is done locally at the hypervisor level.

Use LXCs when possible (almost any HA add-on, Docker) and KVMs when you must (e.g. Windows, OPNsense, Linux desktop distros).

That was my thinking exactly. Looks like I will need 3 nodes for an HA cluster. And then some kind of HA shared storage. This rabbit hole might get pretty deep.
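The three-node figure follows from majority voting: a Proxmox cluster (via corosync) only stays quorate while more than half of the votes are present. The arithmetic can be sketched as below, assuming one vote per node, which is the default.

```python
def votes_needed(total_nodes: int) -> int:
    """Smallest strict majority of the cluster's votes (one vote per node)."""
    return total_nodes // 2 + 1


def has_quorum(total_nodes: int, online_nodes: int) -> bool:
    """True if the nodes still online hold a strict majority of the votes."""
    return online_nodes >= votes_needed(total_nodes)


def tolerated_failures(total_nodes: int) -> int:
    """How many nodes can fail while the remaining ones keep quorum."""
    return total_nodes - votes_needed(total_nodes)


if __name__ == "__main__":
    for n in (1, 2, 3, 4, 5):
        print(f"{n} node(s): need {votes_needed(n)} vote(s), "
              f"can lose {tolerated_failures(n)}")
```

With two nodes, losing either one leaves 1 of 2 votes, which is not a majority, so nothing can fail over. Three nodes is the smallest size that survives one failure, and a fourth node adds no extra tolerance.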