I’ve been a HASS user for many years, and over the past week some strange things have been happening. I’m currently running Hass.io 0.78.3 on an RPi3.
The most noticeable thing is that some automations just…won’t fire. I’ll open the UI to check whether the motion sensor that’s supposed to trigger them detects motion, and it does…but the automation doesn’t run (and yes, the automation is enabled). Meanwhile, other automations may work perfectly normally. These are all automations and devices that I’ve had in place for years without issue.
The other thing that’s happening is that I lose access to the UI: connection attempts just fail. Yet I still get push notifications, for example, and some automations still run, so I can tell the system isn’t completely dead.
Throughout all this, I keep SSH access with no problem, and I can see:
No serious errors in home-assistant.log (just the usual noise, like Hue sometimes taking too long to update)
CPU and memory usage on the Pi are fine
SD card space usage is fine (around 28%)
I can restart from the command line (hassio ha restart), and for a short time everything runs fine. Then it starts to degrade again, with some automations not firing, some device states getting “stuck”, etc.
What could be going wrong? Could my SD card be failing? If so, is there some way to verify that? It’s the only explanation I can think of, because I can’t find any other meaningful clues. Any advice would be welcome.
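There is no single definitive test, but one rough check is to write a chunk of random data to the card, force it out to disk, read it back, and compare checksums. The sketch below assumes Python 3 is available on whatever machine can reach the card (for example, another Linux box with the card in a reader); `write_and_verify` is a hypothetical helper, not part of Hass.io or any HA tool. It only exercises the space it actually writes to, and dropping the OS read cache is best effort, so a pass does not prove the card is healthy. Dedicated tools such as f3 or badblocks are far more thorough.

```python
import hashlib
import os
import tempfile


def write_and_verify(directory: str, size_mb: int = 64, chunk_size: int = 1024 * 1024) -> bool:
    """Hypothetical helper: write size_mb MiB of random data into `directory`,
    read it back, and return True if the SHA-256 digests match."""
    write_hash = hashlib.sha256()
    # delete=False so the file can be reopened by path for the read-back.
    with tempfile.NamedTemporaryFile(dir=directory, delete=False) as f:
        path = f.name
        for _ in range(size_mb):
            chunk = os.urandom(chunk_size)
            write_hash.update(chunk)
            f.write(chunk)
        f.flush()
        os.fsync(f.fileno())  # push the data out of OS buffers onto the device
    try:
        with open(path, "rb") as f:
            # Best effort: ask the kernel to drop cached pages for this file so
            # the read comes from the card rather than RAM (Unix-only call).
            if hasattr(os, "posix_fadvise"):
                os.posix_fadvise(f.fileno(), 0, 0, os.POSIX_FADV_DONTNEED)
            read_hash = hashlib.sha256()
            for chunk in iter(lambda: f.read(chunk_size), b""):
                read_hash.update(chunk)
    finally:
        os.remove(path)  # always clean up the test file
    return read_hash.digest() == write_hash.digest()
```

For example, `write_and_verify("/path/on/the/card")` with a directory that lives on the SD card. Because the symptoms here are intermittent, running it several times (and with a larger `size_mb`) gives a slightly better chance of catching a problem than a single pass.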
Good idea, but I don’t use the file-based DB for that very reason (I ran into issues a couple of years back when it got big). I have MariaDB running in a Docker container.
Eh, I’d guess the SD card is the more likely culprit, but I’m trying to figure out a way to verify that. I’ve ordered another SD card regardless, but going through the restore process is probably going to suck, so I’m looking for alternatives before I get to that point.
I actually had problems with automations sending IR codes via my GC Flex units last night. Some would send and some wouldn’t. Restarting HA did not help. I hadn’t changed anything in a day or so, and there was nothing relevant in the logs.
As the only thing common to both devices was my remotes.yaml file, I replaced it with the copy from a backup, and all was well again. I have been comparing the two files (the backup and the problem file) line by line and cannot see any difference. I’m hoping my SD card is not on the way out too. I might look into booting from USB.
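When two files look identical line by line, the difference is often something an editor doesn’t show: a tab, a carriage return, a non-breaking space, a zero-width character, or a byte-order mark. The sketch below is one way to check, assuming Python 3 is available; `first_difference` and `invisible_characters` are hypothetical helper names. The first compares the raw bytes and reports where the files diverge; the second lists characters that render as whitespace or nothing at all.

```python
import unicodedata

# Control chars (tab, CR, NUL...), format chars (zero-width space, BOM),
# and space separators (e.g. non-breaking space).
SUSPECT_CATEGORIES = {"Cc", "Cf", "Zs"}


def first_difference(path_a: str, path_b: str):
    """Return None if the files are byte-identical, otherwise
    (byte_offset, line_number, byte_in_a, byte_in_b); a byte is None
    if that file ends first."""
    with open(path_a, "rb") as fa, open(path_b, "rb") as fb:
        a, b = fa.read(), fb.read()
    if a == b:
        return None
    limit = min(len(a), len(b))
    # First offset where the bytes disagree, or where the shorter file ends.
    offset = next((i for i in range(limit) if a[i] != b[i]), limit)
    line = a[:offset].count(b"\n") + 1
    byte_a = a[offset] if offset < len(a) else None
    byte_b = b[offset] if offset < len(b) else None
    return offset, line, byte_a, byte_b


def invisible_characters(path: str):
    """List (line, column, character name) for characters that are easy to
    miss when reading a file by eye."""
    found = []
    # newline="" keeps \r characters instead of translating them away.
    with open(path, encoding="utf-8", errors="replace", newline="") as f:
        for line_no, line in enumerate(f, start=1):
            for col, ch in enumerate(line, start=1):
                if ch in (" ", "\n"):
                    continue  # ordinary space and line feed are expected
                if unicodedata.category(ch) in SUSPECT_CATEGORIES or ch == "\ufffd":
                    # Control chars have no Unicode name, so fall back to U+XXXX.
                    found.append((line_no, col, unicodedata.name(ch, f"U+{ord(ch):04X}")))
    return found
```

Running `first_difference` on the backup and the problem copy of remotes.yaml narrows things down quickly: if it returns None, the two files really are identical on disk, which points away from the file’s contents.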
Yes, or as a quick test, just go back to the file-based DB and put it on the USB drive. That makes it easy to see whether it solves the problem.
Remember that most SD cards do not have a proper controller with wear leveling and spare-block management like a SATA/SAS/NVMe SSD does. They were designed as compact, simple storage for consumer-electronics media, not really for running a system off of.
You can find SLC SD cards that fare a lot better, but they are getting harder to find and are expensive.
For what it’s worth, I was also having very similar issues recently (unfortunately I was out of town when they started). After lots of testing and reverting to backups, I started wondering whether it had something to do with my DB. For the heck of it, I blew away my MariaDB instance (history isn’t critical for me) and reinstalled it. Boom, everything has been flawless since.