My system is dying and I can't figure out why

I’ve been a HASS user for many years, and over the past week I’ve had some strange things happening. I’m currently running 0.78.3 on an RPi3.

The most noticeable thing is that some automations just…won’t fire. I’ll open the UI to check whether the motion sensor that’s supposed to trigger them detects motion, and it does…but the automation doesn’t run (and yes, the automation is enabled). While that’s happening, other automations may work perfectly normally. These are all automations and devices that I’ve had in place for years without issue.

The other thing that’s happening is that I’ll lose access to the UI. Attempting to connect will fail. Yet I’ll still, for example, get push notifications, and some automations will run, so I can tell the system’s not completely dead.

Throughout all this, I can maintain SSH access no problem, and I can see:

  1. No serious errors in home-assistant.log (just normal stuff like Hue taking too long to update sometimes)
  2. CPU/memory usage on the Pi is fine
  3. SD card space usage is fine (around 28%)

I can restart from the command line (hassio ha restart), and for a short time everything runs fine. Then it starts to degrade again, with some automations not firing, some device states getting “stuck”, etc.

What could be going wrong? Could my SD card be failing? If so, is there some way to verify it? That’s the only thing I can think of, because I can’t find any meaningful clues otherwise. Any advice would be welcome.
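One rough way to look for evidence of a failing card is to scan the kernel log for storage errors. The sketch below is only a filter, not a definitive test: the patterns are assumptions about what typical mmc/ext4 error lines look like, `find_storage_errors` is a hypothetical helper name, and the sample log lines are made up for illustration. A clean result does not prove the card is healthy.

```python
import re

# Assumed patterns: fragments that commonly appear in kernel messages when
# an SD card (mmc/mmcblk device) or its filesystem is misbehaving.
SUSPECT = re.compile(
    r"(I/O error|EXT4-fs error|mmc\d+: .*(error|timeout))", re.IGNORECASE
)

def find_storage_errors(log_text):
    """Return the log lines that match any of the suspect patterns."""
    return [line for line in log_text.splitlines() if SUSPECT.search(line)]

# Illustrative sample only; these lines are not from the poster's system.
sample_log = "\n".join([
    "[   12.3] mmc0: timeout waiting for hardware interrupt",
    "[   13.0] usb 1-1: new high-speed USB device",
    "[   14.8] EXT4-fs error (device mmcblk0p2): bad block bitmap",
])

for line in find_storage_errors(sample_log):
    print(line)
```

To use it on a real system, pass in the text of the kernel log (for example the output of `dmesg`, captured over SSH) instead of `sample_log`.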

How big is your home-assistant_v2.db?
I sometimes see similar issues, including stats sensors going nuts, when my database file gets larger than 3GB.

If not the SD card, maybe your power supply is on the way out?
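If you want to check the power-supply theory, the Raspberry Pi firmware reports under-voltage and throttling through `vcgencmd get_throttled`, which prints a bitmask such as `throttled=0x50005`. Here is a minimal sketch for decoding that value. It assumes you can run `vcgencmd` somewhere on the Pi (it may not be available inside a Hass.io container), and `decode_throttled` is just a hypothetical helper name.

```python
# Bit meanings as documented for `vcgencmd get_throttled` on the Raspberry Pi.
FLAGS = {
    0: "under-voltage detected now",
    1: "ARM frequency capped now",
    2: "currently throttled",
    3: "soft temperature limit active",
    16: "under-voltage has occurred since boot",
    17: "ARM frequency capping has occurred since boot",
    18: "throttling has occurred since boot",
    19: "soft temperature limit has occurred since boot",
}

def decode_throttled(output):
    """Decode a string like 'throttled=0x50005' into a list of messages."""
    value = int(output.strip().split("=", 1)[1], 16)
    return [msg for bit, msg in sorted(FLAGS.items()) if value & (1 << bit)]

# Example value only; paste in whatever your own Pi reports.
for message in decode_throttled("throttled=0x50005"):
    print(message)
```

A result of `throttled=0x0` decodes to an empty list, meaning no under-voltage or throttling has been flagged since boot.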

Good idea, but I don’t use the file DB for that very reason (ran into issues a couple years back when it got big). I have MariaDB running in a docker container.

Eh, I’d guess the SD card is more likely, but I’m trying to figure out a way to verify. Regardless, I ordered another SD card anyway, but going through the restore process is probably going to suck so I’m just looking for alternatives before I get to that point.

I actually had problems with automations sending IR codes via my GC Flex units last night. Some would send, some wouldn’t. Restarting HA did not help. I had not changed anything for a day or so. Nothing relevant in the logs.
As the only common link for both devices was my remotes.yaml file, I replaced it with one from a backup and all was well again. I have been scouring the files (backup and problem file) line by line and cannot see any difference. I’m hoping my SD card is not on the way out too. I might look into booting from USB.
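When two files look identical by eye, comparing them byte for byte can surface differences that are invisible in an editor (trailing spaces, tabs, CRLF line endings, stray bytes). The following is a minimal sketch of that idea; the helper names and the sample YAML content are made up for illustration and are not the actual remotes.yaml.

```python
import difflib
import hashlib

def sha256_of(data):
    """Hex digest of the raw bytes; differing digests mean differing files."""
    return hashlib.sha256(data).hexdigest()

def visible_diff(good, bad):
    """Diff two byte strings line by line, showing each line via repr()
    so whitespace and unusual bytes become visible."""
    a = [repr(line) for line in good.splitlines(keepends=True)]
    b = [repr(line) for line in bad.splitlines(keepends=True)]
    return list(difflib.unified_diff(a, b, "backup", "current", lineterm=""))

# Illustrative content: the second version has a trailing space on line 2.
good = b"power:\n  code: 'abc'\n"
bad = b"power:\n  code: 'abc' \n"

print(sha256_of(good) == sha256_of(bad))
for line in visible_diff(good, bad):
    print(line)
```

For real files, read each one with `open(path, "rb").read()` and pass the two results in place of `good` and `bad`.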

This doesn’t solve the issue; in fact, running MariaDB on an SD card is a good way to kill the card :wink:

I’m using rpi-clone to keep a backup of my active RPi3s - worked great last time when I had an issue with the card.

True, although it’s not really a size-related issue like the default file DB. That thing definitely hits performance limits around 2-3GB.

Maybe I should try to hang a USB drive off the RPi and set up the database backend there.

Yes, or as a quick test just go back to the file-based DB and place that on the USB drive. It’s an easy way to see if it solves the problem.

Remember, most SD cards do not have a proper controller with wear leveling and spare block management like a SATA/SAS/NVMe SSD. They were created as compact, simple storage for consumer electronics media, and are not really up to running a system.

You can find SLC versions of SD cards that do a lot better, but they are getting harder to find and expensive.

I had performance issues running my DB on a USB stick plugged into my Pi3.

I purchased a ‘High Endurance Video Monitoring Card’ and moved the DB back onto it.

For what it’s worth, I too was having very similar issues recently (unfortunately I was out of town when they started). After lots of testing and reverting backups, I started wondering if it had something to do with my DB. For the heck of it, I blew away my MariaDB instance (history isn’t critical for me) and reinstalled. Boom, everything has been flawless since.