An update with good news and bad news from this morning.
At around midnight local time, the system once again decided to consume a bucketload of memory.
The good news: with the increased swap space, the RPi itself still lives.
The bad news: the python3 process running homeassistant died.
The “homeassistant” container has crashed.
Looking at the home-assistant log (/mnt/data/supervisor/homeassistant/home-assistant.log from HassOS) the final entry is at midnight local time (15:00 UTC) and this is reflected in the last-modified timestamp of the file.
2020-06-03 00:00:18 WARNING (zeroconf-Engine-1848242208) [zeroconf] Exception occurred:
Traceback (most recent call last):
File "/usr/local/lib/python3.7/site-packages/zeroconf/__init__.py", line 1292, in handle_read
MemoryError
Even though the MemoryError is logged here, and the process doesn’t fully die until an hour later, the high memory consumption and CPU usage start spiking about an hour before that.
Memory and CPU utilization begin to climb at around 13:55:11 UTC.
With some debug logging enabled, I also see that even the “DEBUG” log entries from homeassistant.core cease. The final DEBUG entry is at 22:55:01 local time (13:55:01 UTC), just prior to the start of the climb. The debug entries are only regular sensor update information, “Bus:Handling <Event state_changed[L]> …”.
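For anyone lining up the log's local timestamps against the UTC times below, here is a minimal sketch of the conversion. It assumes local time is a fixed UTC+9 offset, which is only inferred from "midnight local time (15:00 UTC)" above; the function name is made up for illustration, and the date on the 22:55:01 entry is assumed to be the evening before.

```python
from datetime import datetime, timedelta, timezone

# Assumption: local time is a fixed UTC+9 offset, inferred from
# "midnight local time (15:00 UTC)" in the post.
LOCAL_TZ = timezone(timedelta(hours=9))


def log_time_to_utc(stamp: str) -> datetime:
    """Convert a 'YYYY-MM-DD HH:MM:SS' local log timestamp to UTC."""
    local = datetime.strptime(stamp, "%Y-%m-%d %H:%M:%S").replace(tzinfo=LOCAL_TZ)
    return local.astimezone(timezone.utc)


# Timestamp of the MemoryError entry in home-assistant.log
print(log_time_to_utc("2020-06-03 00:00:18"))
# Final DEBUG entry (date assumed to be the evening before)
print(log_time_to_utc("2020-06-02 22:55:01"))
```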
Over the following hour, the CPU for “python3 -m homeassistant --config /config” gets pegged at 100% in bursts. In the first one, memory utilization climbs from a typical mere 335 MB VIRT (200 MB RES) to 1169 MB VIRT (~700 MB RES) in just 5 minutes (by 13:59:52 UTC). CPU utilization then hangs back a bit (possibly because the swap isn’t being hammered).
It kicks up again at 14:05:27 UTC, with 100% CPU and gradually climbing memory usage. It plateaus again at 14:06:33 UTC with 1384 MB VIRT (740 MB RES).
This remains stable until 14:19:12 UTC, going up further to 1652 MB VIRT by 14:20:34 UTC.
Rinse and repeat at 14:37:15 UTC, climbing to 1987 MB VIRT by 14:39:02 UTC.
At 15:00:02 UTC we climb again, to 2029 MB VIRT by 15:00:20 UTC. This is around the time the MemoryError is logged.
The python3 process lives for another hour at the same memory utilization and approximately 15% CPU utilization on average. Then it dies and goes away.
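To put rough numbers on the bursts described above, here is a small sketch that works out how much VIRT grew in each one and how fast. The figures are the ones quoted in this post. Two things are assumptions: that each burst starts from the previous plateau, and that the first burst starts from the "typical" 335 MB at 13:55:11 UTC. The names are just for illustration.

```python
from datetime import datetime

# (burst start, burst end, VIRT MB before, VIRT MB after), times in UTC.
# Assumption: each burst starts at the previous plateau value, and the
# first burst starts from the "typical" 335 MB at 13:55:11 UTC.
BURSTS = [
    ("13:55:11", "13:59:52", 335, 1169),
    ("14:05:27", "14:06:33", 1169, 1384),
    ("14:19:12", "14:20:34", 1384, 1652),
    ("14:37:15", "14:39:02", 1652, 1987),
    ("15:00:02", "15:00:20", 1987, 2029),
]


def seconds_between(start: str, end: str) -> float:
    """Elapsed seconds between two HH:MM:SS times on the same day."""
    fmt = "%H:%M:%S"
    return (datetime.strptime(end, fmt) - datetime.strptime(start, fmt)).total_seconds()


def summarize(bursts):
    """Return (start, MB grown, duration in seconds, MB per second) per burst."""
    rows = []
    for start, end, before, after in bursts:
        grown = after - before
        secs = seconds_between(start, end)
        rows.append((start, grown, secs, grown / secs))
    return rows


for start, grown, secs, rate in summarize(BURSTS):
    print(f"{start} UTC: +{grown} MB VIRT in {secs:.0f} s ({rate:.1f} MB/s)")
print("total growth:", sum(row[1] for row in summarize(BURSTS)), "MB VIRT")
```

The per-burst growth adds up to the full jump from 335 MB to 2029 MB VIRT, so nearly all of the growth happens inside these short bursts rather than during the plateaus in between.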
How in the world am I supposed to troubleshoot this further?!