Hello Community,
I hate generic questions, but I’m starting to despair here. My Home Assistant installation keeps dying within 24 hours, with random error messages about components or IO errors.
I have already reinstalled HassIO several times and restored snapshots. I had a Pi, changed its power supply, connected it directly to the router instead of via D-LAN, and deactivated all custom components… Now, out of desperation and partly because I suspected the SD card, I bought a completely new NUC. Yesterday evening I restored a snapshot and it ran through the night, but this morning there were again hundreds of errors in the log, and now it’s dead.
Sometimes it’s the Yeelight lamps, sometimes my Bluetooth thermostat, sometimes MQTT, sometimes Wemo, sometimes the network is allegedly gone, sometimes the filesystem is read-only…
Here are some excerpts from my log; sometimes I can’t even save it before the system becomes unreachable: https://pastebin.com/WFVQjKmm
I’m slowly going crazy here! Does anyone have any idea what is going so wrong here?
This is a long shot, but if you happen to have the DuckDNS hass.io add-on enabled, you should turn it off. I once had really strange issues I couldn’t diagnose, and they sound similar to yours: random components slowly started failing, basically because they couldn’t make a network connection. I turned off the add-on and haven’t had the issue for nearly a month now.
SSH is already dead, I can’t restart the add-on, and the system log is empty. I can still use the GUI to interact with my devices, but more and more error messages keep coming: https://pastebin.com/A0ykyvmF
You didn’t mention which version of HA you’re running or what hardware it’s running on. I’ve noticed a bit of instability in 0.91.2, but I see 0.91.3 is available as of today. It might be worth updating.
If you’re using an SD card, it may be going bad. As mentioned above, try deleting the DB and the HA log files too.
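For the DB/log step, here is a minimal sketch that renames the files instead of deleting them, so you still have a copy to look at later. It assumes the default file names `home-assistant_v2.db` and `home-assistant.log` in the config directory (a custom recorder `db_url` would put the DB elsewhere), and that Home Assistant is stopped while it runs. The function name is hypothetical.

```python
import time
from pathlib import Path

# Default recorder DB and log file names in the HA config directory
# (assumption: no custom recorder db_url or log location is configured).
DEFAULT_FILES = ("home-assistant_v2.db", "home-assistant.log")


def set_aside_db_and_log(config_dir: str) -> list[Path]:
    """Rename the DB and log so HA creates fresh ones on the next start.

    Renaming instead of deleting keeps the old files for later inspection.
    Only run this while Home Assistant is stopped.
    """
    stamp = time.strftime("%Y%m%d-%H%M%S")
    moved = []
    for name in DEFAULT_FILES:
        src = Path(config_dir) / name
        if src.exists():
            dst = src.with_name(f"{name}.bak-{stamp}")
            src.rename(dst)
            moved.append(dst)
    return moved
```

On a typical Hass.io install the config directory is `/config` (assumption), so you would call `set_aside_db_and_log("/config")` and check the returned paths.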
I flashed the image onto my SSD with Etcher. Could this be a problem, e.g. that the filesystem didn’t get resized? Or would the sensor then show the non-resized size?
That would explain why it only runs fine for a short time.
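One way to check the resize theory is to compare the size of the filesystem holding your config with the size of the whole disk. This is a rough sketch under several assumptions: Linux with sysfs available, a device name like `sda` (a NUC SSD might instead be `nvme0n1`), and a mount point on the data partition such as `/config`. Note that HassOS keeps its system partitions small on purpose, so only the data partition is expected to fill the disk. The helper names and the 50% threshold are hypothetical.

```python
import shutil
from pathlib import Path

# /sys/block/<dev>/size is reported in 512-byte units regardless of the
# device's real sector size.
SECTOR_BYTES = 512


def disk_bytes(device: str) -> int:
    """Size of a whole block device in bytes, read from sysfs (Linux only)."""
    sectors = int(Path(f"/sys/block/{device}/size").read_text().strip())
    return sectors * SECTOR_BYTES


def looks_unresized(fs_total: int, disk_total: int, threshold: float = 0.5) -> bool:
    """Heuristic: True if the filesystem covers less than `threshold` of the disk."""
    return fs_total < threshold * disk_total
```

Usage would be something like `looks_unresized(shutil.disk_usage("/config").total, disk_bytes("sda"))`, adjusting the device name and path to your setup.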
Honestly, to see fully what’s going on you’d need to check the kernel logs, and for that you’ll need SSH access to the host OS, or you’ll have to mount the disk in another Linux instance somewhere.
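Once you have the kernel log (for example saved `dmesg` output), a quick filter helps spot storage problems among the noise. This is a minimal sketch; the pattern list is an illustrative assumption based on common Linux kernel messages for disk and ext4 errors, not an exhaustive check, and the function name is hypothetical.

```python
import re

# Common kernel message fragments that point to storage trouble
# (illustrative, not exhaustive).
DISK_TROUBLE = re.compile(
    r"I/O error"
    r"|EXT4-fs error"
    r"|Remounting filesystem read-only"
    r"|ata\d+(?:\.\d+)?: exception"
)


def disk_trouble_lines(log_text: str) -> list[str]:
    """Return the kernel log lines that match one of the patterns above."""
    return [line for line in log_text.splitlines() if DISK_TROUBLE.search(line)]
```

If this turns up I/O errors or a read-only remount, that points to the disk or its connection rather than to any particular HA component.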