I’ll admit, I have a somewhat unusual Home Assistant configuration: more than 100 loads (Vantage lighting and Z-Wave), plus many integrations. Even so, I’ve always managed to run the whole system just fine on a Raspberry Pi 3, with the recorder sending all data to MariaDB on a Synology. It is a testament to how efficient and robust Home Assistant is. But…
For the past 6 months or so, Home Assistant has been repeatedly disconnecting from my Elk-M1 system (which is connected via a serial-to-USB connection). Each time it disconnected, I had to reboot Home Assistant to get it working again, which was annoying, but fine.
But… over time, the problem got worse. And worse.
So I decided to start debugging it. But I found that the debug statements in the Elk code were not enough for me to figure out what was going on. So I figured out how to log into the HassOS container, and modify the source to add more debug log messages.
And in the process, I noticed that the system was really slow and would frequently hang.
Uh oh. Is my SD card failing? Rather than get a new SD card, I decided it was finally time to upgrade my Pi 3 to something better. So I bought a Pi 5 system.
And it was only once I brought up the new system from a backup and looked at my System Monitor graphs to see how it was doing that I realized what was going on.
The memory use of Home Assistant has apparently gone up over time with new code, and I had finally reached the point where my Pi 3 was constantly swapping. The periods of extreme lag I was encountering (which were also killing my Elk-M1 connections) were when the swap process was going nuts trying to free up pages. Since my Pi 5 has 8x the memory of my Pi 3, I no longer have any issues.
Which brings me to my feature request: add a feature that monitors swap usage and generates a notification when it stays too high for too long. Tell users “hey, you seem to be running low on memory, perhaps you should consider fixing this.”
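To make the request a bit more concrete, here is a rough Python sketch of the kind of check I have in mind. It is not based on any existing Home Assistant code: the names `swap_used_percent` and `SwapWatcher` are made up for illustration, and the 50% threshold and 10-minute window are arbitrary guesses that would need tuning. It assumes a Linux host where swap figures can be read from `/proc/meminfo`.

```python
from typing import Optional


def swap_used_percent(meminfo_text: str) -> float:
    """Return swap usage as a percentage, parsed from /proc/meminfo text."""
    fields = {}
    for line in meminfo_text.splitlines():
        name, _, rest = line.partition(":")
        parts = rest.split()
        if parts and parts[0].isdigit():
            fields[name.strip()] = int(parts[0])  # values are reported in kB
    total = fields.get("SwapTotal", 0)
    free = fields.get("SwapFree", 0)
    if total == 0:
        return 0.0  # no swap configured, so nothing to report
    return 100.0 * (total - free) / total


class SwapWatcher:
    """Hypothetical helper: fires once when swap stays high for a sustained period."""

    def __init__(self, threshold_percent: float = 50.0,
                 sustain_seconds: float = 600.0) -> None:
        # Both defaults are assumptions, not recommended values.
        self.threshold_percent = threshold_percent
        self.sustain_seconds = sustain_seconds
        self._high_since: Optional[float] = None
        self._notified = False

    def update(self, percent: float, now: float) -> bool:
        """Record one sample; return True if a notification should be sent now."""
        if percent < self.threshold_percent:
            # Usage dropped back down: reset so a later episode can notify again.
            self._high_since = None
            self._notified = False
            return False
        if self._high_since is None:
            self._high_since = now
        if not self._notified and now - self._high_since >= self.sustain_seconds:
            self._notified = True
            return True
        return False
```

In practice this would be polled every minute or so, feeding `update()` the contents of `/proc/meminfo` and a timestamp such as `time.monotonic()`, and raising a notification whenever it returns `True`. Requiring the usage to stay high for a while is the important part: a brief spike in swap is normal, while sustained swapping is what was hurting my Pi 3.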
(I guess I could also file a bug report on the Elk-M1 code for not properly reconnecting after it gets unsynced, but it is better to just avoid getting into that state in the first place…)
^ Guess where in this graph I upgraded the hardware?