Substandard reliability of HA

I was running HA on a Pi 3B+ and have since changed to a Pi 4 (4GB).

I was having the issues below and decided to upgrade the hardware to eliminate performance bottlenecks as a cause.

I’m running into a lot of stability issues with integrations that run but don’t do what they should do. These are small issues, but for me they demonstrate a general feature stability issue within HA.

  1. automations will work and randomly stop working and not start again until they are triggered in the Lovelace interface using the “trigger” automation button and then work again for a while
  2. The LG webOS integration detects the TV, detects playing and pausing, and can shut down the TV, but if you pause with the remote control it still shows as playing in HA. If you pause the TV in Lovelace, the pause is detected.
  3. Bluesound integration randomly stops connecting to the amp, but starts again after rebooting HASS. Both amp and HA have been on wired and wireless to test network stability.
  4. Z-wave thermostats in climate control change temperature to the new set temperature about 50% of the time.
  5. Shellies integration (community) turns the Shelly on or off, but the Lovelace switch doesn’t always show the state change. The lights will come on, but the switch in Lovelace goes back to showing off while the lights are on. Or the switch shows on in Lovelace, I turn the light off, the light goes off, but the Lovelace switch returns to on.
  6. Z-wave switches show the wrong state in Lovelace when the lights are on or off.
  7. Asuswrt doesn’t show the correct internet download and upload total.
  8. The snapshot feature fails to copy the HA database when the database is large and logs the disk image as corrupt for the database. Luckily, logging to InfluxDB keeps historical settings.
  9. MQTT auto discovery of devices is unreliable and even if the JSON mqtt content is correct the discovery doesn’t always work.
  10. Lovelace causes Chrome to hang on Android when running as the Chrome app: if it is sent to the background and brought back to the foreground, it sporadically becomes unresponsive and I need to force close Chrome to fix it.
  11. UPnP stops polling devices, for example UPnP router sensor for download bit rate polls frequently to get the data and then stops and doesn’t collect data for hours even though nothing has changed and the HA device is ethernet connected directly to the router. After restarting HA it starts polling immediately.

There are other examples, but I thought those above illustrate the stability issue.

I love new features and love that the HA team are constantly adding them, but sometimes it feels that new features are added at the cost of performance and stability.

What are other people’s thoughts and experiences?

I’ve seen none of these issues, but I don’t run all the same things as you.

I have z-wave and shellies and they all work perfectly.

I don’t use hassio, so I can’t comment on snapshots, but that’s not fair to lump in that Hassio ONLY feature as an HA problem. Also, the db is merely state/history, and unimportant for the most part.

My automations are all handled via Node Red, but I also don’t use a pi…a pi is a toy to me.

MQTT Auto Discovery works wonderfully, as does Lovelace.

All in all, I don’t share any of the same problems as you.


How do you have HA installed? Did you use the HassOS image, a generic Linux install plus Hass.io, venv?

Do you have your automation setup with initial_state: true ?
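For reference, a minimal sketch of what that looks like in an automation. The alias and entity IDs are made up for illustration; the only line that matters here is `initial_state: true`, which asks HA to enable the automation at startup rather than restoring whatever on/off state it had before.

```yaml
automation:
  - alias: "Example: light on when motion detected"  # hypothetical name
    # Always enable this automation when HA starts.
    initial_state: true
    trigger:
      - platform: state
        entity_id: binary_sensor.example_motion      # placeholder entity
        to: "on"
    action:
      - service: light.turn_on
        entity_id: light.example_light               # placeholder entity
```

If an automation shows as "off" in Developer tools -> States after a restart, that would point at it being disabled rather than failing to trigger.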

MQTT discovery works flawlessly for me, I have not encountered any issue with it on multiple HA installs across Pis, NUCs and PC based installs.

I use chrome on an Android phone and do not face any hangs or freezes when viewing the Lovelace interface. This has never happened to me.

If including the DB file in your snapshot is important to you, perhaps look at reducing the size of your DB. I have never faced this issue and snapshots save without issue even with large DB files (over 1gb)
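As a rough sketch of how the DB size might be reduced, the recorder can be told to keep fewer days of history and to skip noisy entities. The number of days and the entity ID below are placeholders, not recommendations.

```yaml
recorder:
  # Keep only the last week of state history in the HA database.
  purge_keep_days: 7
  exclude:
    domains:
      - automation                  # example: skip automation state changes
    entities:
      - sensor.example_uptime       # placeholder: a frequently updating sensor
```

Long-term history can still go to an external store such as InfluxDB, so a short `purge_keep_days` need not mean losing the data.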

I no longer use Z-Wave, so cannot comment on its reliability.

I run an old Dell PC circa 2011 with Ubuntu and Hass.io at my business to automate switches and lights via MQTT, all added via discovery, and they all work perfectly and stay online unless I update HA, or there is a power outage long enough that the UPS runs out of battery. Until recently this installation was being run on a Pi3 without issue, with Pihole, Unifi and Google drive backup as well as running FEH to display images on a TV for in-store advertising.

I run a Pi3 at my parents house using Raspbian on an SD card and Hass.io, again, no issues and before I updated it to 0.103.0 a couple of days ago, it had been online and running for almost a month. This Pi also runs Unifi, PiHole, Google Drive backup, chrony etc - no issues whatsoever with stability or reliability.

I have no stability issues at all, across 3 different locations and installations, with a variety of hardware and use cases. Automations work flawlessly, Lovelace is accessible at all times from my phone, PC or laptop.


my 5 cents

put it on a real PC

I’m running mine on an

HP EliteDesk 800 G2 Desktop Mini PC i5

and it has not missed a beat


I’ve been running HA for 3 years now, and I’ve had no real reliability issues. That said:

  1. Never seen this
  2. Sounds like an integration issue - it’s also possible that the TV won’t report the status when paused locally. Have you opened an issue?
  3. I’ve seen similar things with Sonos a couple of times, though that’s usually been down to the Sonos being upgraded. Have you opened an issue?
  4. OZW_Log.txt may identify what’s going on here, possibly a communications issue.
  5. Check Developer tools -> States - that’ll identify whether it’s purely a display issue, or an integration issue
  6. As (5)
  7. Sounds like an integration bug - have you opened an issue?
  8. That’s a HassIO bug - have you opened an issue?
  9. I’ve not experienced this myself, but I’ve seen others talk about it. Not that it looks like anybody has opened an issue about it
  10. Never seen that myself. Have you opened an issue?
  11. Have you opened an issue?

In many cases, unless somebody opens an issue then the odds are the developers are unaware that people are experiencing problems they’re not.

My recommendation is always:

  1. Work with the folks here or on the Discord server to see if the problem is something you’ve done (or not done), or an actual issue
  2. If it’s an actual issue, open an issue so that it can get fixed

I’ve read your issues and the very good advice given in response.
Your install looks to be a ‘bit more exotic’ than a standard build with lots of 3rd party integrations.
Your profile suggests you are a newbie but your issues belong to a very experienced integration, so I can only assume that you are quite technically savvy and have progressed to this stage in isolation.
Kudos to you :+1:
However blaming HA for ‘an issue’ with a 3rd party integration is a bit like blaming Microsoft for a failure in your Nero CD writing software, it’s hardly fair on microsoft nor on the writer of the software if they are unaware of the problem (this is as tink suggests above).
Also you must be aware that HA has not even reached release candidate 1.0 yet, so you know it’s still in development (and probably will be even when it does reach 1.0, as so much is changing in the home automation arena).
Pick one of your issues post it as a lone problem and people can concentrate on that to help you.
Find the threads for your various problem integrations and see what they can advise.
Side comment: You say that HA shows substandard reliability. From my perspective and that of those above, it does not, so may I ask what you are comparing it to?
Good luck with your issues, I hope to be able to help with some of them.


I have SOME of these issues:

  1. I’ve seen automations stop working. However, triggering via lovelace doesn’t fix them. As best as I can tell, when I have this issue, it’s network related. Either the sensor that triggers the automation can’t reach HA, or the device that is activated in the automation can’t be reached by HA. It doesn’t make a lot of sense, because my wifi setup is overkill for the space that I have. But, a few times, I’ve managed to see a similar issue on non-HA devices (i.e. I’ll notice the issue because an automation didn’t fire, grab my phone to check on HA and see that I can’t pull up the interface either. Then I go into another room to try to pull it up and see that it’s working fine.)

  2. I had an issue like this before. My ZWave device wasn’t reporting its changes all the time (common with many battery powered devices, and even some AC powered devices). I configured those devices to be polled, and it went away.

  3. My Shellies work perfectly. Easily the most reliable pieces of Home Automation gear I have. I use the “shellies_discovery” python_script to get them loaded in HA.

  4. This happens to me all the time, but it’s device specific. I have many Linear WDZ-500 wall dimmer devices, and they are terrible. One of their (many) issues is that they don’t report state regularly. They need to be polled. Adding polling configuration gets me a state update within 10 seconds or so.

  5. MQTT, in general, is the most reliable integration I’ve seen. The only issue I have with it is that entities remain “used” even once the device is removed from the discovery topic. Beyond that, it works perfectly all the time.

  6. Yes. This happens. A lot. It hangs on my desktop as well. On my desktop, reloading the window doesn’t help. And while the window is unresponsive, I can open another tab, go to HA, and that tab works fine, while the other is completely stalled.
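For anyone wanting to try the polling mentioned in the Z-Wave items above, a sketch of what it might look like, assuming the OpenZWave-based `zwave` integration configured in YAML. The USB path and entity IDs are placeholders for whatever your own setup uses.

```yaml
zwave:
  usb_path: /dev/ttyACM0            # placeholder: path to your Z-Wave stick
  # How often the network runs a poll cycle, in milliseconds.
  polling_interval: 60000
  device_config:
    light.example_wall_dimmer:      # placeholder entity ID
      # 1 = poll this value on every cycle; 0 = never poll it.
      polling_intensity: 1
    switch.example_wall_switch:     # placeholder entity ID
      polling_intensity: 1
```

Polling adds traffic to the mesh, so it is probably best limited to the few mains-powered devices that don’t report state on their own.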

Hi Mutt,

I’m refurbishing a 90 year old fishing trawler I live on. The integration is going to be all vessel systems when finally complete.

Currently I’ve built sensors with ESPHome for water tank levels using ultrasonic sensors then using template sensors to convert distance into litre values. Have shelly devices running lights and power sockets for pump control when tanks are empty for example. Had to replace lights so installed switches and lights that can be controlled via HA.

Have multisensors, flood sensors and heating control thermostats on Z-wave with alarming for low temperatures.

NMEA2000 navigation system information via MQTT for location, wind and so on.

I’m currently working on automating a home built reverse osmosis watermaker including flow and pressure sensors, as well as automatic valve control for water direction flow (overboard on start-up and flushing and into tanks once flushed).

Also have some modbus and canbus for battery and power management to monitor power storage, battery usage and state of charge, solar panel power generation, shore power usage and generator status. The power system automation is currently separate, but I’m working on integration.

Monitoring pump duty cycle for bilge pumps and alerting if running too long/frequently.

Integrated WebOS TV, Sonos and Bluesound originally to get familiar with how HA worked before moving on to more “involved” integration.

I’m also looking at taking engine data from a NMEA2000 system to monitor fuel usage and so on.

Due to the systems needing to run on battery I’ve avoided a dedicated PC for power consumption reasons. The Pi 4 runs at approximately 25% utilisation and only a few watts and can run 24/7 without much load on batteries.

I appreciate I’m pushing HA hard with the number of integrations, however there are numerous standards I’m trying to integrate with, on systems that are themselves expensive to replace, so I’m in a position where collecting the data from many systems is cheaper than replacing them. However, the difficulty seems to go up exponentially with more systems.

I’m finding that there are a lot of niggles with devices dropping off monitoring and HA not handling the recovery of reconnecting gracefully all the time.

I definitely appreciate that I couldn’t have achieved what I have without HA, however it feels like it needs a lot of looking after to maintain.

I’m overall very positive and happy with HA, just opening a discussion on stability in more complicated environments.

Hi Tinkerer,

Thank you for your reply. I’ve opened a few issues with some of the integrations, however I must concede not all of them. I’ll chunk them into smaller problems and open with the relevant teams.

Again, appreciate your time

Hi,

Thank you for replying, just a quick question around polling intensity. Did you find it affected battery life on the devices? I set polling on for a couple of devices, however noticed battery life significantly impacted.

It does, absolutely. I think the way sleeping devices work is, they wake up every so often to ask “are there any messages for me”? And, if there are, they have work to do which consumes battery. I’ve set the polling intensity low for battery devices for this reason.

My ZWave thermostat runs on batteries. They need changing every 6 months or so. My front door lock (which admittedly, has a lot more work to do) needs changing every 2 or 3 months. However, waaaaay back in the days when I used Wink (instead of Home Assistant) and didn’t have to manage polling on my own, whatever Wink was doing was better, because the device reported reliably and batteries lasted 6 months, if not more.

First, thanks for sharing your experiences. Sorry to hear you’ve had so many issues.

It’s been my experience (over the past decade) that, regardless of the selected software platform, there’s always something that breaks or misbehaves. However, you have, unfortunately, attracted more than your fair share of problems.

With the exception of automations and Lovelace, the problems all concern integrations. In other words, they are independent of one another and each one will need to be resolved separately, one integration at a time (i.e. not likely there is one culprit responsible for all of this bad behavior).

There is hope because, as you’ve undoubtedly read in other posts, not everyone is seeing the same problems with the integrations you’ve listed. Perhaps they’ll offer potential solutions.

I wish I could be of more assistance but I am not using any of the integrations you are and have not experienced the reported automation issue (“randomly stop working”). I suggest you open a new thread for that issue and post at least one of the troublesome automations.


Wow !

All I can say is Wow !

You are really at the cutting bleeding edge here.

For some of these systems you really need reliability and the rest, maybe not so much.
I’d prioritise your list and work top down.
As swiftly says polling can sometimes be a great boon but I’ve also seen some people bog down their networks with pointless polling.
Thermostats for example … By definition a thermostat switches stuff on and off, do you need that ? The reason I ask is that I have 3 thermostats (well 6 actually, 3 hardware 3 software) and I ONLY use the hardware ones as sensors for the software ones. This allows the sensor to sleep as much as possible (saving battery) and I trust it to wake up and tell me if the temperature has exceeded either a delta T or if it thinks a time horizon update is required. I can then set target values for each software thermostat without forcing the device to wake. The values can be changed through the day according to schedule/circumstances/whim/fancy and it just works. My boiler switch is not z wave plus so it’s the only device I poll, but once every couple of minutes is more than I need. In fact now that I have built trust in the heating I no longer think I need to poll as it always reacts when the system tells it to. (and I no longer use the manual override)
Trust battery sensors to tell you stuff when they have something to say.
Browsers - well people have different browser preferences and I’ve seen issues with all of them, it also depends how fancy you get with your frontend (custom cards, photo backgrounds, photo icons etc.), I don’t give a **** so I have a very plain frontend. Automate what you can and provide minimal status. The rest (settings and diagnostic stuff) gets put on its own page right at the back, try different browsers, there are even apps for apple/android. (I couldn’t personally recommend as my experience is slight)
You seem to be progressing pretty well, so let’s help in nailing down some of those niggles.
You have picked a mammoth application to test yourself and HA against, I wish you luck.
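One way the "hardware thermostat as sensor, software thermostat in HA" arrangement described above might look, assuming the `generic_thermostat` climate platform (the post doesn’t say which platform is actually used). The name, entity IDs and tolerances are placeholders.

```yaml
climate:
  - platform: generic_thermostat
    name: Example Cabin                                     # hypothetical name
    # The switch HA turns on and off to call for heat.
    heater: switch.example_boiler                           # placeholder entity
    # Temperature reported by the battery thermostat, used only as a sensor.
    target_sensor: sensor.example_thermostat_temperature    # placeholder entity
    min_temp: 5
    max_temp: 25
    # How far below/above the target before switching on/off.
    cold_tolerance: 0.3
    hot_tolerance: 0.3
```

Setpoints are then changed on the HA climate entity, so the battery device never has to be woken to receive a new target.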

Currently I’ve built sensors with ESPHome for water tank levels using ultrasonic sensors then using template sensors to convert distance into litre values.

I’m interested in doing something similar to monitor my cistern. Could you provide some detail on your implementation?

Thanks for adding that bit of information. The environmental aspects of where all of these systems are operating may influence their behavior. When people are considering solutions to say, zwave glitches, they now know this isn’t in a two-storey wood-framed cottage or studio apartment!

I’m running it on a Pi 3B+. Prior to that, ran it for a few weeks on an Intel-based home server (I don’t remember the exact CPU).

A comment right off the bat: the Pi 3B+ is definitely not a perfect platform. Switching from a memory card to a USB SSD disk dramatically improved the responsiveness of my system. Some installations may also run into problems with power saving. I had to turn off the WiFi power management because it would sometimes make my installation appear to be crashed or frozen. If something on the Pi tried to send out packets, it worked fine. But if something tried to contact the Pi, it would take noticeable time before the interface would process the packets, and in the meantime I’d get timeouts on requests. So if I tried to access HA from my phone, I’d have to hit reload a bunch of times. Or I’d have to hit Tasker tasks a bunch of times before a request from Tasker to HA would get through. Turning the WiFi power management off fixed that for good.

All this to say, one has to be careful about determining the cause of a problem. For a period of time, I thought HA was to blame for some of the Pi’s shortcomings. I’ve not used the Pi 4 yet but I’d expect switching to an SSD to also make a huge difference. I don’t know whether it is beneficial to turn off WiFi power saving on a Pi 4.

I’ve not experienced many stability problems that could be attributed to HA itself.

Never experienced this. Whenever I’ve had an automation I expected to run that did not run, I’ve been able to trace the problem to a mistake of mine (wrong expectation, typo, bad logic, an automation gotcha I did not know about, etc), or to a faulty device.

Can’t talk about points 2, 3, 4, 5, 7, 8, 9, 11, as I don’t use those integrations/features/tools.

I’ve encountered that problem twice:

  1. I’ve had that problem with GE devices very early on. It appeared that the GE device was doing something funky with state that HA did not handle well. However, it was unclear whether the problem was with HA itself or Open ZWave. I just returned the devices.

  2. More recently I’ve had that problem with some Inovelli LZW31-SN devices. Here the problem can be squarely attributed to faulty firmware, not HA.

Most of the other devices on my network are Inovelli gen 1 devices, and I have a few Leviton devices. I’ve not had any status issues with any of them.

Never had that problem, and I do use HA by accessing its web interface through my phone regularly.

I don’t know how fishing trawlers are built but perhaps it is a challenging environment for radio? If you have a mass of equipment blocking the way for instance, that’s surely going to impair radio transmission. We have a chimney inside our house which is enough to significantly degrade radio propagation. I’ve had to work around it both for WiFi and for the ZWave network. For WiFi, I have stations on both sides of the chimney. For ZWave I’ve added a couple of plug-in switches that don’t switch anything but only relay messages. Any ZWave device that is constantly powered on 110v would do for this.

Also, metal is problematic. Even if you don’t have a mass of equipment but they’ve used metal everywhere in that boat when they built it, that’s a problem.

This being said

It is definitely possible to run into badly implemented integrations. I was looking at the imap_email_content component yesterday. It is badly implemented. It does not use the IMAP protocol efficiently and it violates the principle of least surprise.

In brief, I use a nodemcu with a waterproof ultrasonic sensor that has the transmitter and receiver in the same unit. There’s also a DHT sensor to monitor the temperature of the room that the tank is in. The ESPHome yaml config for one of the devices is as follows.

esphome:
  name: tanksensorstarboard1
  platform: ESP8266
  board: nodemcuv2

wifi:
  ssid: "ssid"
  password: "pre-shared key"

# Enable logging
logger:

# Enable Home Assistant API
api:

ota:


sensor:
  - platform: ultrasonic
    trigger_pin: D1
    echo_pin: D2
    name: "Starboard Water Tank Level Sensor"
    unit_of_measurement: "L"
    icon: "mdi:water"
    accuracy_decimals: 0
    filters:
      - lambda: return (1-x) * 1000.0 - 55;
      - filter_out: nan
      
  - platform: dht
    pin: D3
    model: DHT11
    temperature:
      name: "Tank Room Temperature"
    humidity:
      name: "Tank Room Humidity"
    update_interval: 60s
    
  - platform: wifi_signal
    name: "Starboard Water Tank WiFi signal"
    update_interval: 60s

  - platform: uptime
    name: "Starboard Water Tank Sensor uptime"

text_sensor:
  - platform: version
    name: "Starboard Tank ESPHome version"

The ultrasonic sensor is actually 5v, but works ok on 3v if the distance is less than about 2m.

The calculation is for a 1m x 1m x 1m tank. Sensor mounted at the top. Subtract detected distance from 1m (the - 55 is a calibration adjustment). The result is a litre value that is then displayed as a gauge in Lovelace using colours green, amber and red for a quick visual indication of tank level with the value shown as well.
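A sketch of what such a Lovelace gauge card might look like. The entity ID is a guess at what HA would derive from the sensor name in the config above, and the colour thresholds are arbitrary example values rather than the ones actually in use.

```yaml
type: gauge
entity: sensor.starboard_water_tank_level_sensor  # assumed entity ID
name: Starboard Water Tank
unit: L
min: 0
max: 1000        # a 1m x 1m x 1m tank holds 1000 litres
severity:
  # Each value is the level at which that colour starts.
  green: 500     # example threshold
  yellow: 250    # example threshold (amber)
  red: 0
```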

I am working on a sensor that discards the highest and lowest value of the last 5 readings and then averages the 3 values left to stop rogue values from the sensor.
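An untested sketch of how that trimmed average could be done as an ESPHome filter lambda, assuming it replaces the `filters:` section of the ultrasonic sensor in the config above. The variable names are made up, and the NaN filter is moved first here so that failed echoes never enter the window.

```yaml
sensor:
  - platform: ultrasonic
    trigger_pin: D1
    echo_pin: D2
    name: "Starboard Water Tank Level Sensor"
    unit_of_measurement: "L"
    accuracy_decimals: 0
    filters:
      # Drop failed echoes (reported as NaN) before anything else.
      - filter_out: nan
      # Convert distance in metres to litres, same formula as before.
      - lambda: return (1 - x) * 1000.0 - 55;
      # Keep the last 5 readings, discard the highest and lowest,
      # and publish the average of the remaining 3.
      - lambda: |-
          static float window[5];
          static int filled = 0;
          static int next = 0;
          window[next] = x;
          next = (next + 1) % 5;
          if (filled < 5) filled++;
          // Publish nothing until 5 readings have been collected.
          if (filled < 5) return {};
          float lowest = window[0];
          float highest = window[0];
          float sum = 0.0f;
          for (int i = 0; i < 5; i++) {
            if (window[i] < lowest) lowest = window[i];
            if (window[i] > highest) highest = window[i];
            sum += window[i];
          }
          return (sum - lowest - highest) / 3.0f;
```

ESPHome’s built-in `median` filter with `window_size: 5` may be a simpler way to reject rogue values, at the cost of not averaging the middle three.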

Thank you for a very thorough and detailed response. Quick question, how straightforward is the migration from an SD card to an SSD installation? It’s something I’ve been considering, just haven’t bitten the bullet so to speak yet.

With regard to construction, the vessel is pretty solid and wifi installed with extra access points over ethernet backhaul to the router rather than range extenders. Where possible I’m using wired connectivity for all devices. Z-wave I have a few AC powered devices that repeat the mesh. Z-wave isn’t my preferred option for connectivity, however in general it does seem to be a case of you get what you pay for. The cheaper devices have proved less solid than some of the more expensive units.

Did you have better performance on the PC? I’d consider a low power PC for sure, just wanted to do a proof of concept on the Pi before investing more heavily in HA hardware.

As mentioned before, most of the issues are more annoyances than out and out show stoppers, however the frustration comes when something was working and you’re moving on to the next integration and then have to go back and spend time “fixing” something that was previously working, if that makes sense?

The aim is to replicate the level of automation and insight that normally costs tens of thousands of euros for like 10 cents :slight_smile: Marine systems are expensive, for example a device that does the same as a sonoff 4ch pro r2 is in the region of 300 - 400 euros just for the switching unit excluding the control cabling and displays.

I like the way you’ve implemented thermostats, I’d not thought of using the existing thermostat as just a sensor. I’m going to look into this a bit more.

I’m trying to get commercial grade stability without spending much and I guess that’s the challenge :confused:

Thank you. I’ll look into this deeper.