Hi All,
Since roughly v0.114.x I have had an issue that first showed up as all bulbs connected to my deCONZ becoming unavailable. Digging a little revealed relatively high CPU loads. Since I noticed high memory consumption (80%+) caused by the UniFi addon, I decided to first replace the RPi3+ with an RPi4b-8GB (I missed the fact that at most 4GB is currently supported, so I am running a development build as mentioned here). While doing that, I thought it would be good to use a brand new Samsung EVO 32GB SD card, and I use the official 15W power adapter.
Besides the pleasantly low memory usage, this unfortunately was no solution. The issues, which I think might be related to each other, are basically:
1. Unclear high CPU loads
I first noticed a trend of CPU spikes at a frequency close to once an hour, and sometimes a spike in between. At the moment it is constantly higher, but that might be caused by Glances. I don’t know.
The high CPU load annoys me because HA gets slow at turning on lights in response to motion detectors for example. I am not 100% sure issue two relates, but there is some coincidence at least.
2. Random unavailability of (IKEA) bulbs and sensors of brands like IKEA, Philips Hue and Xiaomi
I started monitoring this by using a trigger on the unavailability of the deCONZ ALL-group, so that I could be sure all bulbs were unavailable. Last evening/night this happened 5 times (23:22, 00:42, 01:53, 02:39, 03:20) and during daytime 4 times until now. The message sent contains this data:
CPU loads: 1.1/0.8/0.81
CPU perc : 5%
RAM perc : 17.3%.
CPU and memory are always about the same, and during the night the load was at most 2.77. Today (during daytime, so with activity in the house) the load was at most 4.77 when stuff became unavailable.
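To put those load numbers next to the CPU percentage: on Linux the load average also counts processes that are blocked waiting on disk I/O, so a load above the number of cores combined with a low CPU percentage can hint at I/O wait rather than real computation. Below is a rough Python sketch of that reading. It assumes the alert format shown above and the four cores of a Pi 4; the function names and the thresholds are made up for illustration and are not part of Home Assistant or deCONZ.

```python
import re

# Assumption: a Raspberry Pi 4 has 4 CPU cores.
PI4_CORES = 4


def parse_alert(message):
    """Extract load averages, CPU % and RAM % from an alert string."""
    loads = re.search(r"CPU loads:\s*([\d.]+)/([\d.]+)/([\d.]+)", message)
    cpu = re.search(r"CPU perc\s*:\s*([\d.]+)%", message)
    ram = re.search(r"RAM perc\s*:\s*([\d.]+)%", message)
    if not (loads and cpu and ram):
        raise ValueError("unrecognised alert format")
    return {
        "load_1m": float(loads.group(1)),
        "load_5m": float(loads.group(2)),
        "load_15m": float(loads.group(3)),
        "cpu_percent": float(cpu.group(1)),
        "ram_percent": float(ram.group(1)),
    }


def load_per_core(load, cores=PI4_CORES):
    """Normalise a load average by the number of cores."""
    return load / cores


def looks_io_bound(alert, cores=PI4_CORES, cpu_threshold=50.0):
    """Hypothetical heuristic: more runnable/blocked tasks than cores,
    while the CPU itself is mostly idle."""
    saturated = load_per_core(alert["load_1m"], cores) > 1.0
    return saturated and alert["cpu_percent"] < cpu_threshold


# The alert quoted above: load is well below one per core.
example = parse_alert("CPU loads: 1.1/0.8/0.81\nCPU perc : 5%\nRAM perc : 17.3%.")
print(load_per_core(example["load_1m"]), looks_io_bound(example))
```

With the daytime peak of 4.77, the same division gives roughly 1.19 per core, which is the case the heuristic would flag if the CPU percentage stayed low.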
So besides reading a ton of topics, this is what I’ve done and where I currently stand:
- Updated the deCONZ firmware (to 2.05.79 / 26580700 at the moment)
- Changed the channel from 15 to 25. I never had an issue with 15, but perhaps the WiFi in the area changed, so I did an analysis and 25 is pretty quiet.
- Double checked and where needed updated the bulb firmware
- Restarted and rebooted everything multiple times, and switched hardware (from Pi 3 to Pi 4)
- Relocated the Pi
- Removed the HA database and set ‘purge_keep_days: 1’, this was only 5 in my case by the way. DB is very small, couple MB.
- Checked the mesh environment, and as far as I understand everything is fine.
- Because I wanted to know whether disk IO was causing the problems, I installed Glances using the addon. As you can see in the image:
– the read/write to the disk isn’t very high when the CPU load is higher;
– critical CPU_IOWAIT alerts feel relatively frequent, for example a very long one at 09:15, when the ConBee nodes became unavailable. My alert said: CPU loads: 4.24/3.05/2.09 | CPU perc : 11% | RAM perc : 17.6%.
That’s why I think there is a relation and decided to write this post. On top of that, the unavailability issue worsened while running Glances (on screen), which of course increased the CPU usage.
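Since Glances itself adds load, one way to check the iowait share without it is to read the kernel's counters in `/proc/stat` directly. The sketch below is only an illustration under that assumption: it takes two snapshots of the aggregate `cpu` line and reports which share of the elapsed CPU time was spent in iowait. The function names and the 5-second default interval are arbitrary choices, not anything provided by Home Assistant.

```python
import time


def parse_cpu_line(line):
    """Parse the aggregate 'cpu' line of /proc/stat into integer counters."""
    fields = line.split()
    if not fields or fields[0] != "cpu":
        raise ValueError("expected the aggregate 'cpu' line")
    return [int(value) for value in fields[1:]]


def iowait_percent(before, after):
    """Share of CPU time spent in iowait between two 'cpu' lines."""
    deltas = [b - a for a, b in zip(parse_cpu_line(before), parse_cpu_line(after))]
    # Field order: user nice system idle iowait irq softirq steal guest guest_nice.
    # guest and guest_nice are already counted in user and nice, so sum only
    # the first eight fields to avoid counting them twice.
    total = sum(deltas[:8])
    if total <= 0:
        return 0.0
    return 100.0 * deltas[4] / total


def read_cpu_line(path="/proc/stat"):
    """Read the first line of /proc/stat, which is the aggregate 'cpu' line."""
    with open(path) as handle:
        return handle.readline()


def sample_iowait(interval=5.0):
    """Take two snapshots `interval` seconds apart and return the iowait share."""
    first = read_cpu_line()
    time.sleep(interval)
    return iowait_percent(first, read_cpu_line())
```

Calling `sample_iowait()` from a shell on the Pi at the moment the bulbs drop out would show whether the iowait share is actually elevated, independent of what the Glances screen does to the load.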
I hope I didn’t forget anything I tried, but right now I have no idea what to do to solve this issue. I am happy to buy an SSD (for example), but I do want to know that it will solve the CPU issue. Looking at Glances, I am not convinced. If issue 2 remains after solving issue 1, then that is a different thing to tackle.
But one thing is sure, I need your help to troubleshoot.