Reliability Update

I’ve been running HA on a Pi 3 for over three years now and have always had issues with long-term reliability. I have a lot of integrations, automations etc. and just wanted to share a few tips for smooth running.

This is the memory usage for the last 3 months. You can see each crash as a gap in the data, which correlates with a dramatic increase in swap usage. My Pi has 1 GB of RAM and a 2 GB swap file.

The CPU graph shows the system is not in any way stressed; it’s just the memory that is under pressure.

The system would suffer from two types of crashes. Firstly, about 8 out of 10 updates would leave the system crashed and non-responsive. Secondly, there were apparently random crashes after several weeks of operation. In both cases only a power cycle would get the system back up.

How to get it stable?

  1. Disable ALL entities that you don’t use. Many of my integrations have entities that I never accessed; disabling these reduced my entity count by over 600.

  2. Don’t record high-frequency entities, such as those that update every few seconds and/or constantly change by small amounts. You can create another sensor for each of these that samples at a lower rate/resolution and record that instead.

  3. My final tip: RESTART the HA core every day. This is the one thing that has prevented all my crashes (so far) and, as you can see from the graph, stabilised the memory usage.

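- id: restart_home_assistant
  alias: Restart Home Assistant
  initial_state: true  # keep the automation enabled after every start
  trigger:
    platform: time
    at: "02:40:00"  # quoted so YAML always reads it as a time string
  action:
  - service: homeassistant.restart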
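
For tip 2, a rough sketch of what I mean is below. It is only an illustration: `sensor.power_meter_raw` is a made-up entity name, and the 5-minute interval, rounding and unit are arbitrary choices you would adjust for your own sensor. The idea is to exclude the noisy entity from the recorder and record a trigger-based template sensor that samples it on a timer instead.

```yaml
# configuration.yaml (sketch; entity names are hypothetical)
recorder:
  exclude:
    entities:
      - sensor.power_meter_raw   # the high-frequency source, no longer recorded

template:
  - trigger:
      - platform: time_pattern
        minutes: "/5"            # take one sample every 5 minutes
    sensor:
      - name: "Power Meter (5 min)"
        unique_id: power_meter_5min
        unit_of_measurement: "W"
        # Rounded to a whole number so tiny fluctuations don't create new states.
        # float(0) falls back to 0 if the source is unavailable.
        state: "{{ states('sensor.power_meter_raw') | float(0) | round(0) }}"
```

The new sensor only changes state when the sampled, rounded value changes, so the database sees far fewer writes than it would from the raw entity.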

It’s a short list, and restarting every 24 hours does seem a little drastic; however, nothing else has helped with the clearly appalling memory management. Tracking down what is responsible for this memory usage could literally take me years of testing on the constantly shifting sands of HA, so it’s not going to happen!

I have also noticed higher memory consumption over the last couple of releases.

In the past I also had memory leaks causing unresponsiveness and crashes like the ones you are observing. Most of the time, once I disabled all add-ons and custom HACS integrations, they stopped happening. Sometimes, by bringing them back one by one, I was able to work out which one was actually causing the problem. The quality of these add-ons and custom integrations is not always that good; it depends heavily on the developer.
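
When re-enabling add-ons and integrations one by one, something like the automation below might make it easier to spot the moment memory starts climbing. This is just a sketch under assumptions: `sensor.memory_use_percent` stands in for whatever memory sensor you have (for example, one provided by the System Monitor integration; the exact entity name may differ), and the 85% threshold and 10-minute duration are arbitrary.

```yaml
- id: memory_usage_alert
  alias: Memory usage alert
  trigger:
    platform: numeric_state
    entity_id: sensor.memory_use_percent   # assumed entity name, adjust to yours
    above: 85                              # arbitrary threshold in percent
    for: "00:10:00"                        # ignore short spikes
  action:
  - service: persistent_notification.create
    data:
      title: High memory usage
      message: "Memory use has been above 85% for 10 minutes."
```

If the alert fires shortly after a particular add-on or custom integration is switched back on, that one is a good first suspect.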