Apologies in advance if the answer is yes, but did you happen to read the reply I made to Xinil just 10 days ago? In fact, it was right before your initial post to me. The context there is liquid dosing pumps, but the method is easily carried over to lights, sprinkler systems, media centres, coffee pots, deep fryers, etc.
And for what it’s worth, I don’t think pure HA / MQTT is a mistake (although I’m curious why you think so?). In some ways, it makes an Application Layer implementation of High Availability a bit easier to set up, as all instances of HA can be subscribed to the same topic (say, Temp data from a specific probe which controls your home heater/AC). Your default primary HA VM node can be set up to act on that data, while your secondary HA VM node will only act on that Temp data when the primary HA node is offline or fails to function properly within a certain time period.
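To make the primary/secondary rule above concrete, here is a rough sketch in plain Python (not Home Assistant configuration) of the decision a secondary node would make. The function name, its parameters, and the timeout values are all made up for illustration; in a real setup the timestamps would come from whatever sensors you use to track the primary.

```python
from typing import Optional


def secondary_should_act(now: float,
                         trigger_time: float,
                         primary_last_seen: Optional[float],
                         primary_acted_at: Optional[float],
                         heartbeat_timeout: float = 30.0,
                         grace_period: float = 10.0) -> bool:
    """Decide whether a secondary node should act on a piece of data.

    All names and default values here are hypothetical.
    Times are plain seconds from any consistent clock.
    """
    # Case 1: the primary has not been heard from recently, so treat it as offline.
    if primary_last_seen is None or now - primary_last_seen > heartbeat_timeout:
        return True
    # Case 2: the primary is online and has already handled this trigger.
    if primary_acted_at is not None and primary_acted_at >= trigger_time:
        return False
    # Case 3: the primary is online but has not acted yet.
    # Only step in once it has had its grace period.
    return now - trigger_time > grace_period


# Primary is alive and acted one second after the trigger: secondary stays quiet.
print(secondary_should_act(now=100, trigger_time=95,
                           primary_last_seen=98, primary_acted_at=96))
```

The grace period matters: without it, the secondary would race the primary on every single trigger.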
And that latter point is one of the strengths Application Layer High Availability offers over and above VM based High Availability. With VM based High Availability, I can set triggers to reboot a VM if that VM crashes or kernel panics, but it doesn’t give me the ability to sanity check the logic and functioning health of the Application itself. Application Layer High Availability can be set up to constantly sanity check and monitor the functioning health of another node’s application and/or OS VM.
This is similar in spirit to the redundant flight control computers on the Space Shuttle or Crew Dragon. Picture 5 redundant computers all running the same program, one of which is controlling (primary) while the other 4 are monitoring (secondary) nodes, the latter all sanity checking the operation of the controlling node. The moment they see the primary go out of expected functional operation - whether due to a complete system crash or something more subtle, like a dropped CPU instruction due to a low voltage condition (which is common on Raspberry Pis using underpowered or failing USB power supplies) - one of the designated secondaries will jump in, demote the primary, and take control as the new primary.
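As a rough illustration of the "demote the primary and take over" idea, here is a small Python sketch of how each node could work out who should be in control. This is not how flight computers or Home Assistant actually do it; the `NodeReport` fields, the rank scheme, and the 30 second timeout are assumptions I made up for the example.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class NodeReport:
    """What a monitoring node knows about one node (hypothetical fields)."""
    name: str
    rank: int          # lower is preferred; 0 is the default primary
    last_seen: float   # time of the node's last heartbeat, in seconds
    output_ok: bool    # did its last output pass the sanity check?


def is_healthy(node: NodeReport, now: float, timeout: float = 30.0) -> bool:
    # A node must be both recently seen AND producing sane output.
    return node.output_ok and (now - node.last_seen) <= timeout


def elect_primary(nodes: List[NodeReport], now: float,
                  timeout: float = 30.0) -> Optional[str]:
    """Return the name of the healthy node with the lowest rank, or None."""
    healthy = [n for n in nodes if is_healthy(n, now, timeout)]
    if not healthy:
        return None
    return min(healthy, key=lambda n: n.rank).name


nodes = [
    NodeReport("vm-primary", 0, last_seen=10.0, output_ok=True),   # stale heartbeat
    NodeReport("pi-1", 1, last_seen=98.0, output_ok=True),
    NodeReport("pi-2", 2, last_seen=99.0, output_ok=True),
]
print(elect_primary(nodes, now=100.0))  # vm-primary is stale, so pi-1 is chosen
```

Because every node applies the same rule, they should all arrive at the same answer without negotiating, provided they are looking at the same reports.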
And we can replicate this kind of functionality easily and natively inside Home Assistant, because HA gives us the ability to monitor and control another HA node almost as if it were a light bulb or smart switch itself, but one that offers a lot more data to make smart decisions about that HA node’s health and operational state.
The only major difference between my primary and my 4 secondaries is that the automations on my secondary nodes have additional conditions, so that they do not trigger unless a failure scenario has occurred.
BTW, this is another strength of Application Layer High Availability - you can get very granular in deciding what functionality is mission critical and should be offered as High Availability, while excluding what is not. For instance, I like having my dozen security cameras streamed to HA’s interface, and my wife likes the dozen or so Sunset & Beach cams streaming to the HA interface. I have this set up on my VM based instance of Home Assistant, which can easily handle it. But I do not have camera streaming set up on my Raspberry Pi nodes, because it would overload the CPU of the RPis and it is not mission critical. With VM based High Availability, it’s pretty much all or nothing.
However, there is a role for VM High Availability to play here: your MQTT server. This is a perfect scenario for it, because we are not really concerned with MQTT storing data, but we are concerned with MQTT always being available somewhere on the network.
If you want to go down this path, and you haven’t done so yet, I’d go ahead and split your DB servers onto their own VMs on their respective VM hosts, and do the same for the MQTT server. Set up duplicate MQTT servers on both VM hosts, each in its own VM, that are exactly identical down to the same IP address. The only difference between them is that one should be designated the primary MQTT server, which is set to always boot up and run, while the secondary MQTT server never runs unless the primary MQTT server has crashed or gone offline. Unfortunately, I don’t run Proxmox or Unraid, so I’m not exactly sure how to implement that on those platforms, and you might have to research whether it is even possible using the native feature sets of these VM hosts. If it’s not, let me know, and I think I have a workaround for that - it’s how I solved this issue when I first began using HA and everything ran in a pure Raspberry Pi environment (before I incorporated VMs for my primary, primary DB server and MQTT server).
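If the VM host can’t do this natively, the general shape of a watchdog is something like the Python sketch below. This is not a Proxmox or Unraid feature and not necessarily the workaround I mentioned; it only shows a reachability probe plus a "several misses in a row" rule. `FailoverWatchdog` and its threshold are invented for the example. Port 1883 is the standard unencrypted MQTT port, and actually starting the standby VM is platform specific, so it is left out.

```python
import socket


def broker_reachable(host: str, port: int = 1883, timeout: float = 2.0) -> bool:
    """Return True if something accepts TCP connections at host:port.

    This only proves the port is open, not that the broker is healthy.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


class FailoverWatchdog:
    """Counts consecutive failed probes (hypothetical helper)."""

    def __init__(self, failures_needed: int = 3):
        self.failures_needed = failures_needed
        self.consecutive_failures = 0

    def record(self, reachable: bool) -> bool:
        """Feed in one probe result; return True when the standby should start."""
        if reachable:
            self.consecutive_failures = 0  # any success resets the count
        else:
            self.consecutive_failures += 1
        return self.consecutive_failures >= self.failures_needed


watchdog = FailoverWatchdog(failures_needed=3)
for result in (False, False, False):  # three missed probes in a row
    start_standby = watchdog.record(result)
print(start_standby)
```

Requiring several misses in a row avoids starting the standby on a single dropped packet. Since both MQTT VMs share one IP address, you also want to be sure the primary is really down before the secondary comes up, or the two will conflict.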
In the meantime, if you want to copy and paste an example of an automation you’d like to configure as a High Availability function, I can walk you through how to set this up in your automations & sensors.
BTW, I’m in the middle of painting my house at the moment, and am limited to periods of rain for time to post here. If I don’t reply very quickly, that means it’s sunny / not raining here and I’m out painting - I will eventually return, provided I don’t fall off the scaffold and injure or kill myself.