Substandard reliability of HA

I have SOME of these issues:

  1. I’ve seen automations stop working. However, triggering via lovelace doesn’t fix them. As best as I can tell, when I have this issue, it’s network related. Either the sensor that triggers the automation can’t reach HA, or the device that is activated in the automation can’t be reached by HA. It doesn’t make a lot of sense, because my wifi setup is overkill for the space that I have. But, a few times, I’ve managed to see a similar issue on non-HA devices (i.e. I’ll notice the issue because an automation didn’t fire, grab my phone to check on HA and see that I can’t pull up the interface either. Then I’ll go into another room to try to pull it up and see that it’s working fine.)

  2. I had an issue like this before. My ZWave device wasn’t reporting its changes all the time (common with many battery powered devices, and even some AC powered devices). I configured those devices to be polled, and it went away.

  3. My Shellies work perfectly. Easily the most reliable pieces of Home Automation gear I have. I use the “shellies_discovery” python_script to get them loaded in HA.

  4. This happens to me all the time, but it’s device specific. I have many Linear WDZ-500 wall dimmer devices, and they are terrible. One of their (many) issues is that they don’t report state regularly. They need to be polled. Adding polling configuration gets me a state update within 10 seconds or so.

  5. MQTT, in general, is the most reliable integration I’ve seen. The only issue I have with it is that entities remain “used” even once the device is removed from the discovery topic. Beyond that, it works perfectly all the time.

  6. Yes. This happens. A lot. It hangs on my desktop as well. On my desktop, reloading the window doesn’t help. And while the window is unresponsive, I can open another tab, go to HA, and that tab works fine, while the other is completely stalled.

Hi Mutt,

I’m refurbishing a 90 year old fishing trawler I live on. The integration is going to be all vessel systems when finally complete.

Currently I’ve built sensors with ESPHome for water tank levels using ultrasonic sensors then using template sensors to convert distance into litre values. Have shelly devices running lights and power sockets for pump control when tanks are empty for example. Had to replace lights so installed switches and lights that can be controlled via HA.

Have multisensors, flood sensors and heating control thermostats on Z-wave with alarming for low temperatures.

NMEA2000 navigation system information via MQTT for location, wind and so on.

I’m currently working on automating a home built reverse osmosis watermaker including flow and pressure sensors, as well as automatic valve control for water direction flow (overboard on start-up and flushing and into tanks once flushed).

Also have some modbus and canbus for battery and power management to monitor power storage, battery usage and state of charge, solar panel power generation, shorepower usage and generator status. The power system automation is currently separate, but I’m working on integration.

Monitoring pump duty cycle for bilge pumps and alerting if running too long/frequently.

Integrated WebOS TV, Sonos and Bluesound originally to get familiar with how HA worked before moving on to more “involved” integration.

I’m also looking at taking engine data from a NMEA2000 system to monitor fuel usage and so on.

Due to the systems needing to run on battery I’ve avoided a dedicated PC for power consumption reasons. The Pi 4 runs at approximately 25% utilisation and only a few watts and can run 24/7 without much load on batteries.

I appreciate I’m pushing HA hard with the number of integrations, however there are numerous standards I’m trying to integrate with, and the systems behind them are expensive to replace, so I’m in a position where collecting the data from many systems is cheaper than replacing them. However, the difficulty seems to go up exponentially with more systems.

I’m finding that there are a lot of niggles with devices dropping off monitoring and HA not handling the recovery of reconnecting gracefully all the time.

Definitely appreciate that I couldn’t have achieved what I have without HA, however it feels like it needs a lot of looking after to maintain.

I’m overall very positive and happy with HA, just opening a discussion on stability in more complicated environments.

Hi Tinkerer,

Thank you for your reply. I’ve opened a few issues with some of the integrations, however I must concede not all of them. I’ll chunk them into smaller problems and open with the relevant teams.

Again, appreciate your time

Hi,

Thank you for replying, just a quick question around polling intensity. Did you find it affected battery life on the devices? I set polling on for a couple of devices, however I noticed battery life was significantly impacted.

It does, absolutely. I think the way sleeping devices work is, they wake up every so often to ask “are there any messages for me”? And, if there are, they have work to do which consumes battery. I’ve set the polling intensity low for battery devices for this reason.

My ZWave thermostat runs on batteries. They need changing every 6 months or so. My front door lock (which admittedly, has a lot more work to do) needs changing every 2 or 3 months. However, waaaaay back in the days when I used Wink (instead of Home Assistant) and didn’t have to manage polling on my own, whatever Wink was doing was better, because the device reported reliably and batteries lasted 6 months, if not more.

First, thanks for sharing your experiences. Sorry to hear you’ve had so many issues.

It’s been my experience (over the past decade) that, regardless of the selected software platform, there’s always something that breaks or misbehaves. However, you have, unfortunately, attracted more than your fair share of problems.

With the exception of automations and Lovelace, the problems all concern integrations. In other words, they are independent of each other and each one will need to be resolved separately, one integration at a time (i.e. it’s not likely that one culprit is responsible for all of this bad behavior).

There is hope because, as you’ve undoubtedly read in other posts, not everyone is seeing the same problems with the integrations you’ve listed. Perhaps they’ll offer potential solutions.

I wish I could be of more assistance but I am not using any of the integrations you are and have not experienced the reported automation issue (“randomly stop working”). I suggest you open a new thread for that issue and post at least one of the troublesome automations.

Wow !

All I can say is Wow !

You are really at the cutting bleeding edge here.

For some of these systems you really need reliability and the rest, maybe not so much.
I’d prioritise your list and work top down.
As swiftly says polling can sometimes be a great boon but I’ve also seen some people bog down their networks with pointless polling.
Thermostats for example … By definition a thermostat switches stuff on and off, do you need that ? The reason I ask is that I have 3 thermostats (well 6 actually, 3 hardware 3 software) and I ONLY use the hardware ones as sensors for the software ones. This allows the sensor to sleep as much as possible (saving battery) and I trust it to wake up and tell me if the temperature has exceeded either a delta T or if it thinks a time horizon update is required. I can then set target values for each software thermostat without forcing the device to wake. The values can be changed through the day according to schedule/circumstances/whim/fancy and it just works. My boiler switch is not z wave plus so it’s the only device I poll, but once every couple of minutes is more than I need. In fact now that I have built trust in the heating I no longer think I need to poll as it always reacts when the system tells it to. (and I no longer use the manual override)
Trust battery sensors to tell you stuff when they have something to say.
Browsers - well people have different browser preferences and I’ve seen issues with all of them, it also depends how fancy you get with your frontend (custom cards, photo backgrounds, photo icons etc.), I don’t give a **** so I have a very plain frontend. Automate what you can and provide minimal status. The rest (settings and diagnostic stuff) gets put on its own page right at the back, try different browsers, there are even apps for apple/android. (I couldn’t personally recommend as my experience is slight)
You seem to be progressing pretty well, so let’s help in nailing down some of those niggles.
You have picked a mammoth application to test yourself and HA against, I wish you luck.

Currently I’ve built sensors with ESPHome for water tank levels using ultrasonic sensors then using template sensors to convert distance into litre values.

I’m interested in doing something similar to monitor my cistern. Could you provide some detail on your implementation?

Thanks for adding that bit of information. The environmental aspects of where all of these systems are operating may influence their behavior. When people are considering solutions to say, zwave glitches, they now know this isn’t in a two-storey wood-framed cottage or studio apartment!

I’m running it on a Pi 3B+. Prior to that, ran it for a few weeks on an Intel-based home server (I don’t remember the exact CPU).

A comment right off the bat: the Pi 3B+ is definitely not a perfect platform. Switching from a memory card to a USB SSD disk dramatically improved the responsiveness of my system. Some installations may also run into problems with power saving. I had to turn off the WiFi power management because it would sometimes make my installation appear to be crashed or frozen. If something on the Pi tried to send out packets, it works fine. But if something tried to contact the Pi, it would take noticeable time before the interface would process the packets, and in the meantime I’d get timeouts on requests. So if I tried to access HA from my phone, I’d have to hit reload a bunch of times. Or I’d have to hit Tasker tasks a bunch of times before a request from Tasker to HA would get through. Turning the WiFi power management off fixed that for good.

All this to say, one has to be careful about determining the cause of a problem. For a period of time, I thought HA was to blame for some of the Pi’s shortcomings. I’ve not used the Pi 4 yet but I’d expect switching to a SSD to also make a huge difference. I don’t know whether it is beneficial to turn off WiFi power saving on a Pi 4.

I’ve not experienced many stability problems that could be attributed to HA itself.

Never experienced this. Whenever I’ve had an automation I expected to run that did not run, I’ve been able to trace the problem to a mistake of mine (wrong expectation, typo, bad logic, an automation gotcha I did not know about, etc), or to a faulty device.

Can’t talk about points 2, 3, 4, 5, 7, 8, 9, 11, as I don’t use those integrations/features/tools.

I’ve encountered that problem twice:

  1. I’ve had that problem with GE devices very early on. It appeared that the GE device was doing something funky with state that HA did not handle well. However, it was unclear whether the problem was with HA itself or Open ZWave. I just returned the devices.

  2. More recently I’ve had that problem with some Inovelli LZW31-SN devices. Here the problem can be squarely attributed to faulty firmware, not HA.

Most of the other devices on my network are Inovelli gen 1 devices, and I have a few Leviton devices. I’ve not had any status issues with any of them.

Never had that problem, and I do use HA by accessing its web interface through my phone regularly.

I don’t know how fishing trawlers are built but perhaps it is a challenging environment for radio? If you have a mass of equipment blocking the way for instance, that’s surely going to impair radio transmission. We have a chimney inside our house which is enough to significantly degrade radio propagation. I’ve had to work around it both for WiFi and for the ZWave network. For WiFi, I have stations on both sides of the chimney. For ZWave I’ve added a couple of plug-in switches that don’t switch anything but only relay messages. Any ZWave device that is constantly powered on 110v would do for this.

Also, metal is problematic. Even if you don’t have a mass of equipment but they’ve used metal everywhere in that boat when they built it, that’s a problem.

This being said, it is definitely possible to run into badly implemented integrations. I was looking at the imap_email_content component yesterday. It is badly implemented. It does not use the IMAP protocol efficiently and it violates the principle of least surprise.

In brief, I use a nodemcu with a waterproof ultrasonic sensor that has the transmitter and receiver in the same unit. There’s also a DHT sensor to monitor the temperature of the room that the tank is in. The ESPHome yaml config is as follows for one of the devices.

esphome:
  name: tanksensorstarboard1
  platform: ESP8266
  board: nodemcuv2

wifi:
  ssid: "ssid"
  password: "pre-shared key"

# Enable logging
logger:

# Enable Home Assistant API
api:

ota:


sensor:
  - platform: ultrasonic
    trigger_pin: D1
    echo_pin: D2
    name: "Starboard Water Tank Level Sensor"
    unit_of_measurement: "L"
    icon: "mdi:water"
    accuracy_decimals: 0
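    filters:
      # x is the measured distance in metres. For a 1 m cube tank,
      # (1 - x) * 1000 is the volume in litres; 55 is a calibration offset.
      - lambda: return (1-x) * 1000.0 - 55;
      # Drop readings where the sensor returned no echo.
      - filter_out: nan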
      
  - platform: dht
    pin: D3
    model: DHT11
    temperature:
      name: "Tank Room Temperature"
    humidity:
      name: "Tank Room Humidity"
    update_interval: 60s
    
  - platform: wifi_signal
    name: "Starboard Water Tank WiFi signal"
    update_interval: 60s

  - platform: uptime
    name: "Starboard Water Tank Sensor uptime"

text_sensor:
  - platform: version
    name: "Starboard Tank ESPHome version"

The ultrasonic sensor is actually 5v, but works ok on 3v if the distance is less than about 2m.

The calculation is for a 1m x 1m x 1m tank. Sensor mounted at the top. Subtract detected distance from 1m (the - 55 is a calibration adjustment). The result is a litre value that is then displayed as a gauge in Lovelace using colours green, amber and red for a quick visual indication of tank level with the value shown as well.
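
For anyone who wants to check the arithmetic outside ESPHome, here is a minimal Python sketch of the same conversion. The function and constant names are made up for this example; only the 1m cube tank, the sensor at the top, the 55 litre calibration offset and the NaN filtering come from the config above.

```python
import math

TANK_HEIGHT_M = 1.0        # sensor sits at the top of a 1 m tall tank
LITRES_PER_METRE = 1000.0  # a 1 m x 1 m footprint holds 1000 L per metre of depth
CALIBRATION_L = 55.0       # calibration offset taken from the ESPHome lambda


def distance_to_litres(distance_m):
    """Convert the measured distance to the water surface into litres.

    Returns None for NaN readings, mirroring the filter_out: nan step.
    """
    if math.isnan(distance_m):
        return None
    water_depth_m = TANK_HEIGHT_M - distance_m
    return water_depth_m * LITRES_PER_METRE - CALIBRATION_L
```

A reading of 0.25m to the surface means 0.75m of water, so the function returns 750 - 55 = 695 litres. A tank with a different shape would need its own depth-to-volume formula.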

I am working on a sensor that discards the highest and lowest value of the last 5 readings and then averages the 3 values left to stop rogue values from the sensor.
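
The idea can be sketched in a few lines of Python before committing it to a device. This is only an illustration of the filter described above, not the finished sensor; the class name and the choice to return None until five readings have arrived are assumptions.

```python
from collections import deque


class TrimmedMeanFilter:
    """Keep the last `window` readings; drop the lowest and highest, average the rest."""

    def __init__(self, window=5):
        # deque with maxlen discards the oldest reading automatically
        self.readings = deque(maxlen=window)

    def update(self, value):
        """Add a reading; return the filtered value, or None until the window is full."""
        self.readings.append(value)
        if len(self.readings) < self.readings.maxlen:
            return None
        ordered = sorted(self.readings)
        kept = ordered[1:-1]  # discard the single lowest and single highest reading
        return sum(kept) / len(kept)
```

Feeding it 500, 502, 498, 900, 501 gives 501.0 on the fifth reading, so the rogue 900 never reaches the gauge.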

Thank you for a very thorough and detailed response. Quick question, how straightforward is the migration from sd card to ssd installation? It’s something I’ve been considering, just haven’t bitten the bullet so to speak yet.

With regard to construction, the vessel is pretty solid and wifi installed with extra access points over ethernet backhaul to the router rather than range extenders. Where possible I’m using wired connectivity for all devices. Z-wave I have a few AC powered devices that repeat the mesh. Z-wave isn’t my preferred option for connectivity, however in general it does seem to be a case of you get what you pay for. The cheaper devices have proved less solid than some of the more expensive units.

Did you have better performance on the PC? I’d consider a low power PC for sure, just wanted to do a proof of concept on a Pi before investing more heavily in HA hardware.

As mentioned before, most of the issues are more annoyances than out and out show stoppers, however the frustration comes when something was working and you’re moving on to the next integration and then have to go back and spend time “fixing” something that was previously working, if that makes sense?

The aim is to replicate the level of automation and insight that normally costs 10’s of thousands of euros for like 10 cents :slight_smile: Marine systems are expensive, for example a device that does the same as a sonoff 4ch pro r2 is in the region of 300 - 400 euros just for the switching unit excluding the control cabling and displays.

I like the way you’ve implemented thermostats, I’d not thought of using the existing thermostat as just a sensor. I’m going to look into this a bit more.

I’m trying to get commercial grade stability without spending much and I guess that’s the challenge :confused:

Thank you. I’ll look into this deeper.

If you’re using HassIO then do a fresh install onto an SSD, and then restore a snapshot.

It does, but that’s the nature of Beta software, which this really still is. It’ll eventually stabilise, some time after 1.0, but for now this is a work in progress that changes every three weeks. Some of those changes are known to not be backwards compatible (like the recent changes to scene) and some come as a surprise to everybody, including the devs (like the full impact of the recent changes to scene).

If stability is important then there are two things you can do:

  1. Run a test instance with all the same integrations configured (obviously that’s impractical sometimes, like with Z-Wave). You can upgrade that first and sanity check all the things that matter.
  2. Don’t rush to upgrade. Watch the chatter here, in the GitHub issues list, and most usefully on the Discord server. Give it at least a day to see what problems are falling out, if any, and upgrade the test instance only once you’re happy that there’s no show-stoppers for you.

That’s exactly what I do, and yes, it often means I’m a week (or more) behind in the upgrades. On the other hand, it also means that I very rarely have any real issues caused by an upgrade.

I have not experienced these exact same issues as you, but my system does have other issues and a few creative workarounds.

The main issue for me is how often HA needs to be restarted when changes are made.

If you do need to restart HA then all automation + timers get reset back to 0. I like @flamingm0e’s workaround of using NodeRED, and am thinking of doing something similar myself.

I agree that stability is more important than features, especially with automation software, stability should be king.

Currently I treat HA as beta software as it’s still v0.x, so wouldn’t want to rely on it for anything critical.

Hopefully once HA reaches v1.x stability will not be an issue, but I don’t have a crystal ball!

We used to do something similar in PLCs. If you have a register (input_number) for the last reading, you can multiply it by, say, 49 (adjust this to your required responsiveness and the noise issues you suffer), add the latest reading and divide by 50, putting this value back into your register.
Alternatively you will have to have 5 registers and rotate them (move 2nd oldest to oldest, 3rd oldest to 2nd, etc.), then discard the lowest and highest and average the rest. It just depends what gives you the best result for your needs. EDIT: Just a word of caution, try to minimise the frequency of this and any other forced sensor update, not just for loading on the sensor but to reduce processor load, just as good practice and common sense. EDIT2: use a time_pattern trigger for this, say seconds: '/10'
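
As a rough sketch of the single-register approach (multiply by 49, add the latest reading, divide by 50), something like the following Python shows the behaviour. The function name and the `weight` parameter are invented for the example; in HA the stored value would live in an input_number rather than a Python variable.

```python
def smooth(previous, latest, weight=49):
    """Blend the latest reading into the stored value.

    Computes (previous * weight + latest) / (weight + 1). A larger weight
    gives a steadier value that is slower to follow real changes.
    """
    return (previous * weight + latest) / (weight + 1)


# Usage: keep one stored value and update it with every new reading.
register = 500.0
for reading in [550.0, 500.0, 500.0]:
    register = smooth(register, reading)
```

With the default weight a single spike from 500 to 550 only moves the stored value to 501, which is the point of the technique. The trade-off is that a genuine change in level also takes many readings to show up fully.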

I think the reason the maritime gear costs so much is that it is generally 99.9999 % reliable whereas HA gear … (with the best will in the world) … isn’t.

I think I’d have spares configured and paired ready to go and if you have an emergency … swap it out, rename the devices and you should be back up in mins rather than hours.

Look into (say) Visual Studio Code (other editors are available !) Mine runs on a separate PC as it can do name changes across EVERY file in your installation for this sort of thing.

OH ! and make backups after every major change (and some minor ones depending on the amount of repeated work involved).

Tink, he says he’s on a Pi 4, so the SSD route would require him to install Raspbian, which will soon be a deprecated route. (Though I initially dismissed it for that reason, I’m going to set up a test system just like this; my SSD arrived today.) Is there anything he (and I) should be taking into account as a consequence?

Quite a few people find the restarts a bind but we are weighing in to try to improve this … (feel free to add your vote :smiley: )

Thanks, by the looks of it I have already voted, I’d vote twice if I could! :smiley:

I also use NodeRED and Domoticz as a part of my setup, and both of these only ever need restarting when plugins/nodes are added or removed, config/flow changes never need the host software to be restarted. I know these are very different pieces of software, but I’d like to think there’s a chance this may be possible in HA also.

I’m not using HassIO so my approach may not apply at all. I did it using dd and then resized the partition and fs. Probably Tinkerer’s advice would be easier to follow.

I was running HA on the PC initially because that was the expedient thing to do but it was problematic because rebooting the PC meant taking HA down too, even if the reboot had nothing to do with it. I wanted to avoid unnecessary reboots because (as you know) restarting HA is disruptive. I want it to restart only if it MUST. Besides, the PC is also a file server. Sometimes the disk checks at boot time take a long time to complete but over 90% of that time is spent with files that don’t pertain to HA. So HA would take a long time to come back for reasons that have nothing to do with HA. It was just not viable.

Making a fair comparison between the PC and the Pi on the basis of my experience is a bit difficult because the PC was not dedicated to running only HA, and my configuration became more complex after I moved to the Pi. One thing I can say is that moving from the sd card to a SSD drive on the Pi was critical. Prior to doing that I was thinking I might have to move to a NUC. The one thing I can say after having installed the SSD on the Pi is that I do not regret having moved HA to the Pi.

If you read the forum you’ll find quite a few people complaining about the Pi and moving their installation to a NUC, and implying that the Pi was just not good enough for HA. (Keep in mind though that some folks are talking about different versions of the Pi, sometimes very old.) I’m sure there are usage scenarios where that’s true. Some people try to have their Pi do too much. Then again, I’m sure there are cases where people just threw hardware at the problem rather than try to figure out how to tweak their setup on the Pi.

Yeah, that’s frustrating.

Absolutely – please do be aware of construction materials when considering placement of radio frequency devices. Large bulkheads or beams of steel that are a centimetre or more thick are likely to block and/or reflect radio waves, and if there is a density of transmitters in a corner then interference could well be affecting the ability for signals to be reliably transceived.

Also cheaper devices are likely to have less reliably tested software and when environmental conditions push tolerances to their limit, there is more in the chain of technology that can stop an intended automation from having its effect in the real world.

I occasionally swear at my HA system, maybe less now it’s on a NUC than when it was on a Pi – but it tends to be the same group of low cost devices that don’t behave as I’d hope. And I can always reassure myself that I would not have been able to string together such a diverse range of Things into a branded solution, and that the whole setup would have cost $000s more if I had bought ‘it just works’ gear.

Enjoy your learning experience and I hope that this helpful community can help you with the individual niggles that you can manage to isolate and describe.