Apologies in advance for any incorrect terminology; I’ve got a few decades of Windows experience but only 1-2 years with this platform/Linux.
I’ve got a 4GB RPi4 (a Labist kit, including their power supply) running Debian 11 from a 64GB SanDisk MicroSD card. It worked fine for a year or so but no longer boots, and I don’t know enough to even start troubleshooting it. The system has been hard rebooted a number of times due to Bluetooth/Wi-Fi issues. The device lives in a solar cabinet, where it helps me control the inverter/BMS, so I do wonder if heat was an issue, though the Pi has a fan and heatsink kit. I’m wondering whether the MicroSD card is faulty or the file system is damaged.
Here are a few images of the boot process and the card contents (on Imgur). The boot-process pictures are chronological but overlap a bit.
On boot I get the usual rainbow screen, then it proceeds as shown in the images until it eventually just stops progressing. After a while I do see “random: crng init done”, but nothing after that. If I try ALT+CTRL+DEL it fails and says the system is halted. It does respond to a keyboard/mouse being plugged in by loading drivers, but the keyboard only “responds” as far as toggling the Caps Lock light etc.
I’ve tried:
Booting with and without a keyboard/mouse connected, in each of the ports (it was originally running headless).
Examining the card contents on a Windows machine, though I can’t edit anything there. I was going to try altering the cmdline.txt file, which contains:
console=tty0 console=ttyS1,115200 root=/dev/mmcblk1p2 rw fsck.repair=yes net.ifnames=0 rootwait
I did wonder whether the root device was wrong.
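cmdline.txt is a single line of space-separated kernel parameters, each either a bare flag (rootwait) or a key=value pair (root=/dev/mmcblk1p2). A minimal Python sketch for reading it, assuming the firmware partition is mounted and you pass the file path in: it splits each token on the first "=" only (a value such as root=LABEL=... can contain another "=") and keeps every value of a repeated key, since this line has two console= entries. It only shows what the kernel is being told; it cannot tell you whether that root device is actually right for the Pi.

```python
from __future__ import annotations

import sys
from pathlib import Path


def parse_cmdline(text: str) -> tuple[dict[str, list[str]], list[str]]:
    """Split a kernel command line into key=value params and bare flags.

    Repeated keys (e.g. two console= entries) keep every value in order.
    Only the first "=" separates key from value, so root=LABEL=X stays intact.
    Quoted values containing spaces are not handled in this sketch.
    """
    params: dict[str, list[str]] = {}
    flags: list[str] = []
    for token in text.split():
        if "=" in token:
            key, value = token.split("=", 1)
            params.setdefault(key, []).append(value)
        else:
            flags.append(token)
    return params, flags


if __name__ == "__main__" and len(sys.argv) == 2:
    # Argument: path to cmdline.txt on the mounted firmware partition.
    params, flags = parse_cmdline(Path(sys.argv[1]).read_text())
    for key, values in params.items():
        print(f"{key} = {' '.join(values)}")
    print("flags:", " ".join(flags))
```

For the line above it reports root = /dev/mmcblk1p2, both console values, and rw and rootwait as flags.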
Swapping in another (smaller-capacity) MicroSD card with an old, half-done HA install on it, which boots fine but doesn’t give me a useful environment to work from.
Booting with the MicroSD card in a USB reader plugged into a USB 2 port.
How do I actually see what Debian is trying to do and failing on? The SD card only shows an image file and a few other bits and pieces; there are no obvious logs.
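Debian keeps its logs on the ext4 root partition, which Windows can’t read natively, so what is visible from Windows is presumably just the FAT firmware partition. Once the root partition is mounted on a Linux machine, a sketch like the one below (Python; the mount point is whatever path you pass in) lists the most recently modified files under var/log, so you can see what was written last. If persistent journaling was enabled, the binary systemd journal lives under var/log/journal on that partition and can be read with journalctl -D pointed at that directory. If the hang happens before the root file system is mounted read-write, the failed boots may not have logged anything there at all.

```python
from __future__ import annotations

import os
import sys
import time
from pathlib import Path


def recent_logs(rootfs: Path, limit: int = 20) -> list[tuple[float, Path]]:
    """Return (mtime, path) for the newest files under <rootfs>/var/log.

    Unreadable directories and files (e.g. root-only ones when not run
    with sudo) are reported on stderr and skipped rather than aborting.
    """
    found: list[tuple[float, Path]] = []

    def report(exc: OSError) -> None:
        print(f"skipped: {exc}", file=sys.stderr)

    for dirpath, _dirs, files in os.walk(rootfs / "var" / "log", onerror=report):
        for name in files:
            path = Path(dirpath) / name
            try:
                found.append((path.lstat().st_mtime, path))
            except OSError as exc:
                report(exc)
    found.sort(key=lambda item: item[0], reverse=True)
    return found[:limit]


if __name__ == "__main__" and len(sys.argv) == 2:
    # Argument: mount point of the card's root partition on the other machine.
    for mtime, path in recent_logs(Path(sys.argv[1])):
        print(time.strftime("%Y-%m-%d %H:%M:%S", time.localtime(mtime)), path)
```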
Is there some kind of boot loader I should be able to get to for loading recovery environments etc?
There’s a line in the boot output that says “Trying to unpack rootfs image as initramfs”; does this offer any clues?
I’m really stuck. Whilst I have Home Assistant backups, I don’t have a backup of the Debian environment (including my Docker setup, which has other containers in it).
Fact: ALL SD cards will wear out. Fact: HA (especially before the recent changes to the database and history graphs) is VERY HARD on SD cards. It eats them for lunch, usually in about 8 months to a year. They simply weren’t designed for constant writes.
I’d be incredibly surprised if it was anything else. I’m sorry to say you’re probably looking at a rebuild, and I wouldn’t do it on an SD card; I’d use an SSD instead.
Thanks, Nathan, good to know for future builds (which will probably be NUC/SSD).
In the meantime, is there anything I can do to verify whether the SD card is toast, or any other way to read logs and get an error message out of Debian to give me something to go on?
I left it overnight and it’s still sitting at the “random: crng init done” line. It still appears responsive, in that plugging in a mouse is detected and drivers load, but nothing else happens.
I can leave it while I’m at work and check it when I get back, but I’m a bit doubtful.
It doesn’t respond to Enter or any other random keypresses I’ve tried. Normally it would proceed past the “random: crng init done” line all the way to a login prompt.
It does respond to ALT+CTRL+DEL, but then says the reboot failed and the system is halted.
It’s as if it has reached the point of handing over to the proper boot process but never does.
I’m currently installing full Debian with a GUI on an old laptop of mine, in the hope that it will let me mount this file system somehow and see what’s up (no idea how to do that just yet, though).
As for the RPi and its install, is there any way I can pause the boot or see logs of some kind to understand what’s going on? The boot output isn’t nearly verbose enough. Can I scan this MicroSD card in another machine to check its hardware and file system integrity?
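The card can be checked from the Debian laptop without writing to it. A minimal Python wrapper, assuming the root partition is ext4 (as on the Debian Raspberry Pi images), is unmounted, and that you pass its device name; check the name with lsblk first, since the device node on the laptop is not something this thread states. e2fsck -f -n forces a full check but opens the file system read-only and answers “no” to every repair prompt, and badblocks without -w or -n runs its default read-only scan. Both need root, and both are best run after the backup, since a failing card can get worse with any use.

```python
from __future__ import annotations

import subprocess
import sys


def readonly_check_commands(partition: str) -> list[list[str]]:
    """Build non-destructive checks for an *unmounted* ext4 partition.

    e2fsck: -f forces a full check, -n opens the file system read-only
    and answers "no" to every repair question.
    badblocks: with neither -w nor -n it runs its default read-only scan;
    -s shows progress, -v is verbose mode.
    """
    return [
        ["e2fsck", "-f", "-n", partition],
        ["badblocks", "-s", "-v", partition],
    ]


def run_checks(partition: str) -> int:
    """Run each check in turn and return the highest exit status seen."""
    worst = 0
    for cmd in readonly_check_commands(partition):
        print("running:", " ".join(cmd), flush=True)
        worst = max(worst, subprocess.run(cmd).returncode)
    return worst


if __name__ == "__main__" and len(sys.argv) == 2:
    # Argument: the card's root partition as listed by lsblk. Run as root.
    sys.exit(run_checks(sys.argv[1]))
```

For e2fsck, exit status 0 means no errors were found, 4 means errors were found and left uncorrected (expected with -n on a damaged file system), and 8 means an operational error.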
It is quite possible that the SD card is simply dead. SD cards have a limited lifetime, because each sector can only be written a limited number of times.
Try to install on another card.
Is there a way to confirm this? I’d love to recover some things from the file system. I have Debian up on that notebook now, so I can use it for recovery work.
To confirm whether an SD card is dead, you really need to write to a representative number of places on it. To do that it’s best to have a clean card, so back up everything you can from that card first and then test it.
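That write-then-read-back test is what tools such as f3 (f3write/f3read) automate. A rough Python sketch of the same idea, for illustration only: it fills a directory on the mounted card with files of seeded pseudo-random data, fsyncs each one, and later regenerates the same data from the seed and compares. The file names, counts and sizes are arbitrary choices of this sketch. Run the verify step after unmounting and remounting the card, so the data is actually read from the card rather than from the operating system’s page cache, and only once the backup is done, since it writes to the card.

```python
from __future__ import annotations

import os
import random
import sys
from pathlib import Path

MIB = 1024 * 1024


def _expected(index: int, size: int) -> bytes:
    """Pseudo-random payload for file <index>; the same seed always gives the same bytes."""
    return random.Random(index).randbytes(size)


def _name(index: int) -> str:
    return f"fill_{index:05d}.bin"


def write_files(target: Path, count: int, size: int) -> None:
    """Write `count` files of `size` bytes, forcing each to the device with fsync."""
    for i in range(count):
        with open(target / _name(i), "wb") as fh:
            fh.write(_expected(i, size))
            fh.flush()
            os.fsync(fh.fileno())


def verify_files(target: Path, count: int, size: int) -> list[str]:
    """Re-read every file and return a description of each one that is wrong."""
    problems: list[str] = []
    for i in range(count):
        path = target / _name(i)
        try:
            data = path.read_bytes()
        except OSError as exc:
            problems.append(f"{path.name}: {exc.strerror or exc}")
            continue
        if data != _expected(i, size):
            problems.append(f"{path.name}: content mismatch")
    return problems


if __name__ == "__main__" and len(sys.argv) == 5:
    # Arguments: write|verify  <directory on the card>  <file count>  <MiB per file>
    mode, directory = sys.argv[1], Path(sys.argv[2])
    count, size = int(sys.argv[3]), int(sys.argv[4]) * MIB
    if mode == "write":
        write_files(directory, count, size)
    elif mode == "verify":
        bad = verify_files(directory, count, size)
        print("\n".join(bad) if bad else f"all {count} files verified")
```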
The backup can just be a copy of all files, but remember the hidden ones too!
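A minimal Python sketch of such a copy, for illustration: os.walk lists hidden (dot) files and folders like any others, symlinks are recreated as links rather than followed, and each file or folder that can’t be read is recorded and skipped instead of aborting the whole copy, which matters on a failing card. It copies permissions and timestamps (shutil.copy2) but not ownership. Running it as root (e.g. via sudo) lets it read root-only folders without changing anything on the card; for a fuller system copy that also keeps owners, rsync -a run as root is a common choice. The destination is assumed to be a fresh directory on the laptop’s own disk.

```python
from __future__ import annotations

import os
import shutil
import sys
from pathlib import Path


def backup_tree(src: Path, dst: Path) -> list[str]:
    """Copy src into dst, including dotfiles; return a list of errors instead of stopping."""
    errors: list[str] = []

    def on_walk_error(exc: OSError) -> None:
        # Called for directories that cannot be listed (e.g. permission denied, I/O error).
        errors.append(f"{exc.filename}: {exc.strerror}")

    for dirpath, dirnames, filenames in os.walk(src, onerror=on_walk_error):
        here = Path(dirpath)
        out_dir = dst / here.relative_to(src)
        out_dir.mkdir(parents=True, exist_ok=True)
        for name in dirnames + filenames:
            source, target = here / name, out_dir / name
            try:
                if source.is_symlink():
                    # Recreate the link itself; os.walk does not descend into linked dirs.
                    os.symlink(os.readlink(source), target)
                elif source.is_file():
                    shutil.copy2(source, target)
                # Real directories are created when os.walk reaches them;
                # device nodes, FIFOs and sockets are deliberately skipped.
            except OSError as exc:
                errors.append(f"{source}: {exc}")
    return errors


if __name__ == "__main__" and len(sys.argv) == 3:
    # Arguments: <mounted root partition of the card>  <fresh backup directory>
    problems = backup_tree(Path(sys.argv[1]), Path(sys.argv[2]))
    print("\n".join(problems) if problems else "copied without errors")
```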
I’ve just mounted the card on the Debian notebook. Previously I could only see the RASPFIRM partition; now I can also see the ROOT partition, which appears to hold my entire previous file system.
So the card is “readable”.
Is it safe for me to try writing some random files to it, or should I do some diagnostic/recovery work first?
Backup first!!!
If it is dead, then any operation on the card can make it worse.
Get as much out as possible before the card dies completely and before you attempt anything else with it.
It says it can’t enter various folders to copy them, which I assume is a permissions issue. Should I risk selecting everything and taking ownership/modifying permissions to allow a copy?
It is up to you.
If you do, you might not be able to get the current installation running again until you have restored those permissions, but if you are reinstalling on another SD card anyway, then I would do it.
No good, I’m afraid. I tried a repair, but it errored out saying it couldn’t write the superblock.
Further testing showed I couldn’t actually write anything to the card. Caching only made it seem like I could: files I’d tried to create were gone after unmounting and remounting.