This is pretty well all my fault and no blame on HA, but more a general comment/advice. With my experience I should have known better…
I have a debian system on x86-64 running supervised.
Too long to describe the full sequence, but I suddenly noticed that my regular backups were not completing, and the only backups I had were the ones the system makes on an addon update. There were often database errors in mariadb. At the same time I noticed errors in dmesg, which alerted me to the fact that my SSD might be dying.
Unable to fix any of that, I became worried that although I could replace the hard drive, install debian then supervised and restore a backup, I didn’t have a backup. I didn’t want to lose 90 days of database, or my automations etc.
I rebooted the machine and debian said the root filesystem was no good and ran fsck, but it wouldn’t complete. Several times. I got a little panicky.
I “fixed” the hard drive by booting from a USB stick with ubuntu on it and running fsck. It took several goes, and a long time. Some of the repeated tries were possibly the result of me impatiently rebooting when the fsck process was taking an inordinate time with no feedback.
Eventually the disk was rendered usable and I was able to start debian and the supervised home assistant. I immediately initiated a backup which finished 1 1/2 hours later. I sent that off to another computer. Then I tried to fix the mariadb errors, but I completely broke mariadb. It would no longer start.
I restored the mariadb section of the backup I had taken when the system started to semi-behave, and now all is sweeetish, but I still have a potentially dying SSD which I will need to replace this weekend.
Lessons, some of which I should have learned years ago:
- Make sure your backups are actually backing up
- Keep an eye on your hard drive; at any sign of error, it is probably dying and needs to be junked
- If your database is repeatedly erroring, suspect the drive (second point) and check your backups (first point).
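
For the first lesson, something like the following could be run from cron to shout when the newest backup is older than expected. This is only a rough sketch: the function names and the 36 hour threshold are made up for illustration, and it assumes backups land as `.tar` files in a single directory (on a supervised install that is usually `/usr/share/hassio/backup`, but check your own system).

```python
import time
from pathlib import Path


def newest_backup_age_hours(backup_dir, now=None):
    """Return the age in hours of the newest *.tar file, or None if there are none."""
    now = time.time() if now is None else now
    mtimes = [
        p.stat().st_mtime
        for p in Path(backup_dir).glob("*.tar")
        if p.is_file()
    ]
    if not mtimes:
        return None
    return (now - max(mtimes)) / 3600.0


def backups_look_stale(backup_dir, max_age_hours=36.0, now=None):
    """True if there is no backup at all, or the newest one is too old."""
    age = newest_backup_age_hours(backup_dir, now)
    return age is None or age > max_age_hours
```

Calling `backups_look_stale("/usr/share/hassio/backup")` once a day and sending yourself a notification when it returns `True` would have caught my silently failing backups much earlier. It only checks that a recent file exists, not that the backup is restorable, so an occasional test restore is still worth doing.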