I started with HA a few weeks ago. It runs on a Raspberry Pi 4 with 8 GB RAM. I installed it on a 64 GB SD card, and it ran fine with the KNX integration and a few Tasmota devices communicating over MQTT.
A few days ago, the system told me that the 64 GB card was full. So I deleted a few backups, shut down the machine, made an image of the card with Win32 Disk Imager and restored it onto a new 256 GB SD card.
After that it came up, and I installed Grafana, InfluxDB and Glances. I did not configure them much because I am busy with other projects at the moment.
Two days later I found that the system responds very slowly, if it responds at all. Sometimes the web interface does not even load. The MQTT and KNX data have big gaps, and the switches controlling KNX devices fail most of the time.
I uninstalled Grafana, InfluxDB and Glances, but that did not change anything.
Then I tried connecting it via LAN instead of WLAN. No change.
I jumped into the console, and pings at least show network round-trip times between 1 and 30 ms. But the console also hangs often.
Then I made a backup and restored it onto a fresh installation (flashed with balenaEtcher) on another new SD card.
OK, the system was unstable again that morning. So I migrated the complete system to a VM running on a Hyper-V server. It seems quite fast and responsive, even after installing InfluxDB and Grafana again.
I did the migration by downloading the VM image and restoring the backup into it, nearly the same way as I did with the Raspberry Pi 4 instance.
But that should not be the final state: I want it running on the Raspberry Pi 4 hardware again, so any ideas on how to continue are welcome.
Best regards,
Roger
Maybe it’s related? I also get high CPU usage from time to time. A restart partially resolves it, but it keeps coming back.
@nickrout: I will try to get the log. I just have to restart the machine and wait until it freezes. Where can I find the log? Don’t tell me the GUI, because that will be inaccessible by then. I need the location on the filesystem, because the server still responds to SSH connections.
@Camatobe: I will try to look at it. At least during periods with high lag but some responses, the CPU was below 20%. But I will check it when it freezes again.
OK, now I have the log. On my currently running main system there are 4 GB of logs, and on the faulty one a few hundred MB. What sensitive data do I need to remove from it? I found my static IP and my address and deleted them. Is there any other data I should remove?
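Rather than searching a few hundred MB by hand, the obvious items can be masked with a short script before posting. This is only a minimal sketch, assuming the log is a plain text file copied off the box; the function name `redact`, the script name `redact_log.py` and the placeholder strings are hypothetical. It catches dotted-quad IPv4 addresses plus any exact terms you pass in (such as your street address), so other sensitive content like tokens or hostnames still needs a manual read-through.

```python
import re
import sys
from typing import Sequence

# Matches dotted-quad IPv4 addresses such as 192.168.1.20.
# It does not check that each octet is <= 255, so it may over-redact slightly.
IPV4 = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")


def redact(text: str, extra_terms: Sequence[str] = ()) -> str:
    """Replace user-supplied terms and IPv4 addresses with placeholders."""
    # Replace the exact terms first (e.g. a street address or hostname).
    for term in extra_terms:
        if term:
            text = text.replace(term, "<redacted>")
    # Then mask every IPv4-looking address.
    return IPV4.sub("<ip>", text)


if __name__ == "__main__" and len(sys.argv) > 1:
    # Hypothetical usage: python redact_log.py home-assistant.log "My Street 1" > redacted.log
    path, *terms = sys.argv[1:]
    with open(path, encoding="utf-8", errors="replace") as f:
        sys.stdout.write(redact(f.read(), terms))
```

Writing the result to a new file keeps the original log intact in case something useful was masked by mistake.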
Here you can see the grey lines in the red status line. Each grey line indicates missing data.
I switched the system on about 8 hours ago. It received new data and then showed a flat line for that data, i.e. no new values. CPU was at about 3 to 5%.
Then I rebooted it because I wanted to see data again. CPU is now between 30 and 50%, memory is maxed out, and data is missing as shown (KNX and MQTT data).