Hey @Xinil I’m going to rope you back over onto the main thread since the High Availability topic is a bit off-topic for the Autodoser thread, if that’s okay with you. I’ll quote your latest comment on the autodoser thread here, just to make it easier to follow.
Thanks so much for the detailed walkthrough of your home system, cowboy! You’ve got an amazing setup and a seriously impressive workflow of failure-triggers, backups and redundancies.
You’re welcome.
FWIW, this setup began life in 2003 as a single 250-litre tank which I adopted from a previous owner who was moving back to Germany. In a way, it was more of a rescue case, as they had let it fall into neglect, it had Old Tank Syndrome after 5 years, and they just wanted to get rid of it all. I nursed it back to health, and started growing it out slowly by just adding more and more tanks to the overall system through the years. At some point, I realised I had so much time and labor invested in it that I really wanted to have the same kind of reliability in the aquarium automation controller that I was familiar with building in my day job working at mobile phone network operators, where network & system availability HAS to be better than 99.97%. And I’d already been scratching my head wondering why there wasn’t any commercial or open source aquarium controller that offered this kind of guarantee. Since there wasn’t any, I resorted to building it myself.
First things first, I need to set up a secondary/backup HA! I don’t know why I haven’t done this yet but I guess I needed to see other people doing it first to know it was possible. I run HA in a VM as well using the qcow2 image. So my plan would be to install HA on a separate machine (I have two Unraid servers each running VMs/dockers) and set up a similar set of ‘secondary node conditions’ so that nothing runs without first confirming the primary is down. Do you have any more details on how you set up those conditions? I use Node-Red/MQTT for 90% of my automations.
Well, if you have two Unraid servers running their own VM/docker containers, you’ve already got a good base to work from. I can’t advise on Unraid as I use XCP-NG (XenCloudPlatform-NextGen) for my VM hosts, largely because 1.) I was already familiar with Citrix XenServer at the GSM mobile phone companies I worked at, 2.) XCP-NG had just launched as an open source project with intentions to reclaim Xen back from Citrix, who was moving at the time to make XenServer more proprietary and closed source, and 3.) if one builds the XenOrchestra Community Edition management tool from the sources (which is pretty easy), then one gets ALL the enterprise management tools like Live Migrations between hosts, the full backup suite and other functionalities.
One thing I forgot to mention in my previous reply is that for my VM install of HA, I actually broke out the Home Assistant DB SQL server & MQTT server into their own dedicated VMs, for a few reasons, and recommend others do the same. My primary HA node, which runs on my VM host, is the one node that runs all critical (i.e. aquarium automations) and non-critical (e.g. streaming beach & sunset cams, etc.) functions when everything is working properly. The reason for the separate DB server is that I want to keep a very long DB history (several months to a year) of water parameters and automation history, which I don’t want to lose if I use a snapshot / backup restore function of the Home Assistant VM itself.
Further, the reason for doing this for the MQTT server is that I want the other RaspberryPi nodes to be able to talk to each other, even if the VM HA host goes down or is being rebooted or upgraded. I’ve found that the Mosquitto MQTT server is very fast and very reliable in practice, but in the extremely rare case that it were to crash, my XCP-NG box is set up to reboot it (which takes just about 15 seconds) or spin it up on my secondary XCP-NG host. (I won’t go into detail here, but there is a way to have a redundant installation of the MQTT server on a RaspberryPi in a cold-standby mode that only fires up if the primary MQTT server were to fail, but I do not use that at this time.)
I also do not use Node-Red at all. By the time I saw more people were starting to do this, I had become quite comfortable doing all my automations natively in YAML on HA and felt that Node-Red would just add more complexity to my solution, which I wanted to avoid. It might be possible to do this using Node-Red, but since I’ve never used it, I can’t advise on that myself.
I also thought Z-Wave was limiting / blocking in its capabilities and reliability. Others have had better experience with Z-Wave, but in my limited trials with an Aeotec Z-Wave dongle and one single Aeotec SmartSocket6 hooked up to our coffee maker, I had too many Z-Wave network issues despite the dongle & smart socket being just a few meters apart with a clear line of sight. Also, since the dongle has to be plugged into a host, if said host were to die in the middle of the night, this could present problems. I found HTTP, MQTT and SNMP much more flexible, reliable and scalable in this regard compared to Z-Wave.
For the High Availability functions, I broke out each function as much as was possible / practical, for more granularity of control, specifically so I can disable specific HA-HA automations should the need arise, without disabling the full suite of HA-HA functions. For instance, maybe I want to turn off the Sound Alert automations for a failure, without disabling the shutdown of the autodosers themselves.
For example, this is the function to safely shut down any/all dosing operations (with extreme prejudice - i.e. I just assume they may be running, rather than testing to see if they are running) if one of my RaspberryPi HA-HA nodes detects that the VM host has gone offline, that the HA VM has gone offline, or that the HA application itself has stopped sending MQTT heartbeat signals.
So as you can see, while the triggers are the same, the automation actions (and automation labels) are different for that greater granularity of control - I can turn off a specific High Availability or Alarm function while leaving the rest in place. This is useful for debugging, testing and even at times, during upgrades (because I tend to upgrade one host at a time).
BTW, in my professional experience, I’ve seen VM host network stacks do a “half-crash” - where the VMs keep running, can keep talking to each other & keep existing connections to external boxes open, but any new connections made from an external box to the VM host get denied / dropped. It’s an extremely rare condition, but having seen it happen in enterprise environments, that is why I rely on pinging the VM host, pinging the VM container, and an MQTT heartbeat from the VM container - together these can reveal the nature of the failure that occurred & how to react.
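As a rough illustration of the three-probe reasoning above, here is a minimal Python sketch. It is not the actual HA YAML configuration; the names `Failure` and `classify` and the category labels are hypothetical, chosen only to show how the combination of host ping, VM ping and MQTT heartbeat could be mapped to a failure type.

```python
from enum import Enum


class Failure(Enum):
    """Hypothetical failure categories, for illustration only."""
    NONE = "everything healthy"
    HA_APP_DOWN = "HA application stopped, VM still reachable"
    VM_DOWN = "HA VM offline, VM host still reachable"
    HOST_DOWN = "VM host offline"
    PARTIAL_NETWORK = "mixed signals, e.g. a half-crashed network stack"


def classify(host_ping_ok: bool, vm_ping_ok: bool, heartbeat_ok: bool) -> Failure:
    """Map the three probe results to a failure category."""
    if host_ping_ok and vm_ping_ok and heartbeat_ok:
        return Failure.NONE
    if not host_ping_ok and not vm_ping_ok and not heartbeat_ok:
        return Failure.HOST_DOWN
    if host_ping_ok and not vm_ping_ok and not heartbeat_ok:
        return Failure.VM_DOWN
    if host_ping_ok and vm_ping_ok and not heartbeat_ok:
        return Failure.HA_APP_DOWN
    # Any other combination means the probes disagree with each other,
    # which is what a "half-crash" of the host network could look like.
    return Failure.PARTIAL_NETWORK
```

A single probe could not tell these cases apart, which is the point of combining all three.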
Like the above examples, I have another automation with the same triggers but a different automation action that toggles an input boolean for the locally implemented (on that RaspberryPi) autodoser automations. These local automations are the very same ones which are already shared on the AutoDoser thread with an extra condition added for that input boolean so they do not run unless the boolean is activated. The trigger also has a longer timeout (10 minutes) so that the dosers are not run from a secondary node in the event I’m just rebooting the VM (30-40 seconds) or VM host (5-6 minutes).
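The takeover delay described above can be sketched in a few lines of Python. This is only an illustration of the idea, not the real automation: the class name `TakeoverGate` and its `update` method are hypothetical, and the only value taken from the post is the 10-minute timeout.

```python
from dataclasses import dataclass
from typing import Optional

TAKEOVER_DELAY_S = 10 * 60  # 10 minutes, as described in the post


@dataclass
class TakeoverGate:
    """Hypothetical helper: enables local dosing only after a sustained outage."""
    delay_s: float = TAKEOVER_DELAY_S
    down_since: Optional[float] = None

    def update(self, primary_up: bool, now: float) -> bool:
        """Return True when the secondary node should run the dosers."""
        if primary_up:
            # Primary is back: forget the outage and keep local dosing off.
            self.down_since = None
            return False
        if self.down_since is None:
            # First probe that saw the primary down: start the clock.
            self.down_since = now
        return now - self.down_since >= self.delay_s
```

With this shape, a 30-40 second VM reboot or a 5-6 minute host reboot never reaches the threshold, so the returned value (standing in for the input boolean) stays off.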
As for dosing…there’s a lot to consider. I’ve already installed a Reef Pi (running on a Pi Zero) and I have a pH probe reporting to my HA system. It works wonderfully.
FWIW, I investigated the ReefPi project, but at the time (end of 2016/early 2017) it appeared the project development had stalled & the documentation was kinda spotty, so I didn’t pursue that. However, I am glad to see it got picked up again and is moving forward once more. I’m familiar with several people on this thread who have integrated the ReefPi with HA, so I know it’s totally doable, but I never got into it myself.
Another contributor ended up writing the code to natively support Seneye devices (which I have 3 of) and I’ve been happy with that path the last few years for pH and NH3 Ammonia monitoring.
Given I can do more with this Reef Pi, my two options are:
A) Add a Reef Pi doser and make it part of that system - independent of HA and open to failures on its own (adding redundancy here seems very difficult and I’m honestly very unsure if the risk is worth the ‘simplicity’ reward.)
or B) get an AC powered doser and trigger it via HA and Z-wave. I need to set up my secondary HA before I’d be comfortable going down this route.
I can’t comment much for option A, but could speculate if you go this route, you have two sub options with this path - run the doser logic from HA through ReefPi or have the doser logic run natively in ReefPi itself.
For option B, there are actually two options - you can buy a Sonoff 4CH that supports just AC switching (my existing doser was AC…so the choice was that) or if you want to use DC pumps, Sonoff makes a variant of the 4CH that is AC/DC switching compatible.
What I can say is that while MQTT is extremely fast, it’s not instantaneous. There is a tiny, tiny bit of network latency. In practice, because my dosing pumps pump rather slowly (just 0.41667ml every 1 second for my MG/CA/Alk/Iron dosers & double that for my Reverse Osmosis and Kalkwasser dosers), this tiny bit of latency doesn’t translate into much in the way of variation. For instance, my Kalkwasser is set to dose 6 Liters over the night period, split up 12 times every hour. At the end of the day, this comes out to maybe 6002ml dosed one day, the next day might be 5997ml, the next day it might be 5999ml, etc. So the deviation is quite small. The smallest amount of daily dosing is the Iron doser, which is set to 22ml a day - the daily deviation is less than 0.2ml (e.g. 21.85ml, 21.92ml, 22.05ml, etc.). These tiny deviations do not make themselves apparent in my chemistry testing of my reef. In fact, my Iron has ridden the last 3+ years extremely stable at my target of 0.1ppm. I would expect they wouldn’t be measurable in your smaller system either; however, I do want to be very honest and transparent about this.
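To put rough numbers on that, here is a small Python sketch of the arithmetic. The flow rates and daily volumes come from the figures above; the function names are hypothetical, and treating the Kalkwasser schedule as 12 equal doses is an assumption made only for this example.

```python
SLOW_PUMP_ML_PER_S = 0.41667                   # MG/CA/Alk/Iron dosers (from the post)
FAST_PUMP_ML_PER_S = 2 * SLOW_PUMP_ML_PER_S    # RO and Kalkwasser dosers (from the post)


def runtime_s(volume_ml: float, rate_ml_per_s: float) -> float:
    """Pump-on time needed to deliver a given volume."""
    return volume_ml / rate_ml_per_s


def volume_error_ml(timing_error_s: float, rate_ml_per_s: float) -> float:
    """Extra (or missing) volume caused by a switching-time error."""
    return timing_error_s * rate_ml_per_s


# ASSUMPTION: 6 L of Kalkwasser split into 12 equal doses of 500 ml each.
kalk_dose_runtime = runtime_s(6000 / 12, FAST_PUMP_ML_PER_S)   # about 600 s per dose

# Total pump-on time per day for the 22 ml Iron dose.
iron_daily_runtime = runtime_s(22, SLOW_PUMP_ML_PER_S)         # about 52.8 s per day

# Cumulative timing error that would explain a 0.2 ml daily deviation on the Iron doser.
iron_timing_error = 0.2 / SLOW_PUMP_ML_PER_S                   # a bit under half a second
```

Read this way, a few millilitres of drift on a 6000ml day or 0.2ml on a 22ml day corresponds to only fractions of a second to a few seconds of accumulated switching delay, which is consistent with the small deviations reported.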
On the flip side, I can imagine some of that latency also comes from the Linux OS itself, as Linux is not a real-time OS, so it’s possible that even with direct hardwired GPIO from a ReefPi, you might still see some tiny bit of latency & variation in what actually gets dosed.
One way around this would be to implement the doser control logic directly on the ESP chip of the Sonoff 4CH itself & simply use HA to push the variable runtime parameters to the ESP chip. I just never got around to doing this myself, because I’ve not come across the need to have that level of accuracy.
But you do have lots of options to choose from for your own implementation here, I think.
I hope that helps in deciding which way you want to go. Again, keep us updated on what you decide and how your implementation goes. I always appreciate hearing about other people’s journeys with this, and if you feel inclined to do so, feel free to share a few pictures of your tank and your HA solution implementation.