I have had the exact same issue with two different hardware configurations. What happens is that after some period of time (usually days to weeks), the GUI will stop responding, and on the physical screen, I will see these errors scrolling by:
[timestamp] Buffer I/O error on dev nvme0n1p3, logical block 2, async page read
[timestamp] erofs: (device nvme0n1p3): erofs_read_inode: failed to get inode (nid: 274) page, err =5
This first happened when I was running HAOS on a Raspberry Pi 5 with an SSD connected through an M.2 HAT. Thinking the drive was bad or something, I decided to replace the whole setup with a NUC, which is currently running HAOS v16.2 installed on the internal SSD. This was a fresh install of HAOS with a restored backup of the add-ons and configuration.
Since the errors are exactly the same on the old hardware and the new hardware, I’m now wondering if it’s NOT a hardware issue, but perhaps a software bug.
I’ve tried searching for these error messages, and everything seems to indicate a hardware issue. While not impossible, I would be surprised to have the exact same block fail on 2 different SSDs.
I appreciate any pointers or insights as to how I might be able to troubleshoot this further - even some way to confirm whether it’s a hardware or software issue.
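One way to compare the two machines more precisely is to capture the kernel log from each and tally which device, logical block, and inode number (nid) the errors point at. Below is a minimal sketch in Python, assuming the log has been saved to a text file and that the messages look like the two quoted above; the function name `summarize_errors` and the regular expressions are illustrative, not part of any HAOS tool. Note that an identical logical block on both machines is not proof either way, since block numbers are relative to the partition and the same image may simply be read at the same offset.

```python
import re
from collections import Counter

# Patterns modeled on the two messages quoted in this post (assumption:
# other kernel versions may word these messages differently).
BUFFER_IO = re.compile(
    r"Buffer I/O error on dev (?P<dev>[^,\s]+), logical block (?P<block>\d+)"
)
EROFS_INODE = re.compile(
    r"erofs: \(device (?P<dev>[^)]+)\): erofs_read_inode: "
    r"failed to get inode \(nid: (?P<nid>\d+)\)"
)


def summarize_errors(log_text):
    """Count occurrences of each (kind, device, number) seen in a saved log."""
    counts = Counter()
    for line in log_text.splitlines():
        match = BUFFER_IO.search(line)
        if match:
            counts[("buffer_io", match["dev"], int(match["block"]))] += 1
            continue
        match = EROFS_INODE.search(line)
        if match:
            counts[("erofs_inode", match["dev"], int(match["nid"]))] += 1
    return counts
```

Running this on a log from each machine and comparing the two counters shows whether the failures cluster on the same blocks and inodes or are scattered, which is at least one more data point.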
I had done a fresh install of HAOS, then restored from the last backup I had from my RP5.
I just redid the NUC - fresh HAOS install, and it’s restoring the “last known good” backup now. I’ll see how that goes. If I still get errors, is the next step to do a fresh HAOS install, then restore just a portion of the backup (e.g. NOT all the add-ons) and rebuild everything? How do I tell if the backup has that EROFS partition in it (and to NOT restore it, if so)?
(And my apologies for the delayed reply - I haven’t figured out how to get the system to notify me when someone replies to my topics yet, so I didn’t realize someone had! ;-( )
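On the question of what a backup actually contains: one way to check is to download the backup file and list its contents. Here is a minimal sketch, assuming the downloaded backup is an ordinary (unencrypted) tar archive; the function name `list_backup_members` and the path passed to it are hypothetical.

```python
import tarfile


def list_backup_members(path):
    """Return (name, size_in_bytes) for every entry in a tar archive."""
    # "r:*" lets tarfile auto-detect gzip/bzip2/xz compression, if any.
    with tarfile.open(path, "r:*") as archive:
        return [(member.name, member.size) for member in archive.getmembers()]
```

If the listing only shows configuration and add-on data, then the read-only OS partition is not part of what gets restored, and a partial restore would not change anything on that partition.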
Just chiming in for now to say that I feel your pain @kecepull; I’ve been struggling with the same “failed to get inode” errors. I’ve even gone as far as replacing a bunch of hardware (SSD, power brick, and RAM is in the mail), removing many add-ons/HACS integrations, and more. But the system keeps crashing with these errors every 2 days or so.
Definitely please keep us posted on how things go for you!
Another thing I just thought of: I had a similar backup/restore path, and have restored to different devices/install methods over time, with the current install being on a little Dell OptiPlex micro PC (running Home Assistant OS).
Hi, I am facing the same issue. First, I saw it on HAOS running on an SD card. I thought that it was related to a faulty card. I changed the setup to use a fresh SSD, but I am still facing the same error.
The problem has gone away for me. The SMART drive stats for the NVMe SSD on the new NUC indicated that there were errors with it, so I ended up replacing the whole NUC as it was still under warranty. I installed HAOS and restored the latest backup, and I haven’t seen the error once on the new box! So, it would appear that it was a bad SSD causing the problems (although that doesn’t explain why I saw the errors on TWO completely different sets of hardware, unless it was a genuine coincidence that both SSDs were going bad at the same time?!).
I don’t know if this helps anyone else, but for me, at least, a new set of hardware fixed it, and I suspect it was a bad (or going bad) SSD.
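For anyone who wants to check their own drive the same way, here is a minimal sketch of reading the NVMe health fields from a SMART report. It assumes the report was produced with smartmontools' JSON output (`smartctl --json -a <device>`) on some system where smartmontools is available, and that the health data sits under the `nvme_smart_health_information_log` key; the field names should be checked against an actual report, and the function name `nvme_health_warnings` is made up for this example.

```python
import json


def nvme_health_warnings(report):
    """Return a list of warning strings from a parsed smartctl JSON report."""
    # Assumed layout: NVMe health counters live under this key.
    log = report.get("nvme_smart_health_information_log", {})
    warnings = []

    if log.get("critical_warning", 0) != 0:
        warnings.append(f"critical_warning is set ({log['critical_warning']})")
    if log.get("media_errors", 0) > 0:
        warnings.append(f"media_errors = {log['media_errors']}")
    if log.get("num_err_log_entries", 0) > 0:
        warnings.append(f"num_err_log_entries = {log['num_err_log_entries']}")

    spare = log.get("available_spare")
    threshold = log.get("available_spare_threshold")
    if spare is not None and threshold is not None and spare < threshold:
        warnings.append(f"available_spare {spare}% is below threshold {threshold}%")

    return warnings


def warnings_from_json_text(text):
    """Convenience wrapper for a report saved as JSON text."""
    return nvme_health_warnings(json.loads(text))
```

An empty list does not prove the drive is healthy, but non-zero media errors or a critical warning is a fairly strong hint that the SSD is the problem.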
To my great surprise, the problem seems to have gone away for me as well. This is after replacing/upgrading the RAM in my little Dell micro PC running Home Assistant OS.
Interestingly, what I noticed after upgrading the RAM from a single 8GB stick to a matched pair of 8GB sticks (so 16GB total) is that the system fairly quickly started consuming more than 8GB of RAM and has stayed steady at around 70% usage. I’m still digging into what is causing the high memory usage, but this seems to be the root of where the issue was coming from for me (the system was trying to use more memory than was physically available). Despite the high memory usage, my Home Assistant machine has been running more stably (crash/reboot free) for longer than it has in quite a while.
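If anyone wants to track the same number on their own box, here is a minimal sketch that computes a usage percentage from the contents of `/proc/meminfo` on Linux. It assumes shell access to the host and that "used" means `MemTotal` minus `MemAvailable`, which may not match exactly what the Home Assistant UI reports; the function names are made up for this example.

```python
def parse_meminfo(text):
    """Parse /proc/meminfo-style text into a dict of integer values (kB)."""
    values = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        if not sep:
            continue  # skip lines that are not "Key: value" pairs
        fields = rest.split()
        if fields and fields[0].isdigit():
            values[key.strip()] = int(fields[0])
    return values


def memory_used_percent(text):
    """Percentage of RAM in use, counting MemAvailable as free."""
    info = parse_meminfo(text)
    total = info["MemTotal"]
    available = info["MemAvailable"]
    return 100.0 * (total - available) / total
```

Passing in the text read from `/proc/meminfo` at intervals and logging the result would show whether usage creeps up before a crash or stays flat.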