Hi,
Do you know how to minimize the risk of SD card damage on an RPi?
How often does your card get damaged?
Hello, I'm not a professional, but what I can give as a tip is:
Please correct me if I'm wrong …
Take a look at this add-on, it makes sense …
https://community.home-assistant.io/t/hass-io-add-on-hass-io-google-drive-backup/107928
How often does your card get damaged?
I've had 2 damaged cards from around 50, so that is a pretty high failure rate: ~4%.
I use 'f3write' and 'f3read' to check whether the card is damaged; often the card can appear to be damaged but is actually fine after a reformat.
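Roughly what I run, assuming the f3 package is installed and the card is mounted at /media/sdcard (adjust the path to wherever yours actually mounts):

```
# Install the f3 tools (Debian/Raspbian package is "f3")
sudo apt install f3

# f3write fills the free space with test files, f3read reads them back and verifies them
f3write /media/sdcard
f3read /media/sdcard
```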
Do you know how to minimize the risk of SD card damage on an RPi?
As above, I use a read-only mode on some Pis, and in this case you can power off the Pi without risk of damage.
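If you just want to try read-only mode before committing to it in /etc/fstab, something like this on a running Pi lets you flip the root filesystem and see what breaks (the remount may refuse if files are open for writing):

```
# Remount root read-only, then back to read-write when you are done testing
sudo mount -o remount,ro /
sudo mount -o remount,rw /
```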
Hello,
I have had two cards corrupted. The second one was even after I put the database on a SQL server running on my file server.
I ended up using Raspbian + hass.io running in Docker, booting from a USB flash drive (this on a Pi 3 B).
As of 8/30/19 Docker does not work properly in Buster, so you will need to use Raspbian Stretch.
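For anyone wanting to do the same, this is roughly the one-time procedure to enable USB boot on a Pi 3 B as I remember it from the official docs - double-check them before running it, since it programs a permanent OTP bit:

```
# Set the USB-boot OTP bit, then reboot
echo "program_usb_boot_mode=1" | sudo tee -a /boot/config.txt
sudo reboot

# After the reboot this should print 17:3020000a if the bit was set;
# the line can then be removed from /boot/config.txt
vcgencmd otp_dump | grep 17:
```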
All of the above, plus:
Optimise the recorder - exclude anything you don't need history for; this will help reduce the number of writes.
Get a nice big quality SD card - the 8 or 16 GB cards shipped with RPi kits are cheap and just don't last. They may be enough for the install and your data, but data is written evenly across an SD card (wear levelling) to avoid constantly rewriting the same blocks, so extra capacity means extra life. I use a SanDisk Ultra 64 GB (about 1 year so far) and it seems to be holding up well for me; however, I do back up regularly and have a spare for the day it does give up.
I find the iostat command is handy to see how many writes there are to the SD card (example at the end of this post).
I find that actually having the database on another drive, be it a USB drive or a NAS, can both prevent card damage and increase speed.
Another option is to disable the recorder/database totally, but I don't think many people would like this option.
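The iostat example mentioned above - it comes from the sysstat package, and mmcblk0 is usually the SD card (adjust the device name if yours differs):

```
sudo apt install sysstat

# Report device statistics for the SD card every 10 seconds;
# watch the kB_wrtn/s column for write volume
iostat -d mmcblk0 10
```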
I have had two cards corrupted. The second one was even after I put the database on a SQL server running on my file server.
When you say 'corrupted' do you mean damaged, as in the card fails f3write/f3read tests or mounts as read-only? Or does 'corrupted' mean the card is OK after formatting?
So Linux, unlike Windows, uses asynchronous file systems by default. Put another way, Linux by default uses 'cached writes' - not just when you write a file, but for any disk operation.
What this means is that when you update a database or append/create a file, the file system driver writes to memory mapped to the disk sector, considers the operation 'done' and moves on. However, that memory is not 'flushed' to disk immediately, or in some setups even frequently; it will remain in memory for a prolonged period of time and not actually be written to disk. This can and does include disk structure data like inodes and file metadata.
If you want to see this in operation, put an SD card into your Linux machine and, assuming your flavour of distro has not enabled synchronous writes as some now do, write a large file - say 2 GB - to the card. The write will appear to complete in a few seconds. Now pull the SD card out without 'safely removing' it and put it into another computer. The SD card will either be corrupted, won't contain the file, or will contain only part of the file.
If you repeat the experiment and write the file, then use the 'Eject', 'Safely remove hardware' or 'umount' options, you will notice that that operation takes quite a long time. The 'Eject' or 'umount' request forces the OS to flush the cache, so any pending writes are written out at that point.
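If you want to try that experiment from a shell, a rough version looks like this, assuming the card is mounted at /media/sdcard:

```
# dd returns almost immediately because the data is sitting in the page cache
dd if=/dev/zero of=/media/sdcard/bigfile bs=1M count=2048

# Flushing the cache to the physical card is the slow part - this is
# essentially what "Eject"/umount do before releasing the device
sync
sudo umount /media/sdcard
```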
If your Pi is forcefully shut down (by cycling the power, forcing a BIOS reset or forcing a CPU halt), any cached writes in memory, or any differences between the physical disk and its memory-mapped mirror, will be lost.
If you are unlucky, one of those cached writes was a partial update to a disk metadata structure such as inodes or FAT tables, and what is actually on the physical disk is now unreadable. Usually on Linux filesystems this happens in confined areas, with directories showing up with broken filenames, sizes and modes, and the disk effectively crashing when you try to read them. Sometimes files end up in 'lost+found' with just numeric names (good to know when recovering data!).
Journalling file systems like ext4 provide some protection at the disk level: writes are transactional, the journal records all operations, and they only take effect when 'closed' or 'committed'. Uncommitted writes can be undone, with lots of complexities and caveats of course, but that does not help with unwritten data still sitting in the cache.
Forcing synchronous writes is another option which will greatly reduce the chances of corruption, but it will also murder performance.
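If you want to see that trade-off for yourself, you can force synchronous writes on a test system, either temporarily or via a 'sync' mount option in /etc/fstab:

```
# Temporarily force synchronous writes on the root filesystem
sudo mount -o remount,sync /

# Or permanently, with an /etc/fstab entry along these lines (illustrative):
#   /dev/mmcblk0p2  /  ext4  defaults,sync,noatime  0  1
```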
There is only one way to remove this threat completely and that is to make the SD card read-only. However, a lot of Linux software will simply not function on a read-only disk. The mechanism used to get round this is to create a RAM disk and move and/or map the areas of the disk that need to be writable into RAM; /var, /etc, /tmp and /home are candidates.
This of course means you lose any state in those folders when you reboot.
You can create a hybrid with most of your SD card read-only (mounted read-only) and small portions for writable stuff in separate partitions. This limits the scope of any damage to a degree.
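A sketch of what that hybrid can look like in /etc/fstab - the partition numbers and sizes here are only illustrative, not a recipe:

```
# Root is read-only, a small separate partition stays writable,
# and the chatty paths live in RAM so they never touch the card
/dev/mmcblk0p2  /         ext4   defaults,ro,noatime   0  1
/dev/mmcblk0p3  /data     ext4   defaults,rw,noatime   0  2
tmpfs           /tmp      tmpfs  defaults,size=64m     0  0
tmpfs           /var/log  tmpfs  defaults,size=32m     0  0
```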
If you have a more reliable storage mechanism on your network you can 'export' a file system with NFS and have a Raspberry Pi mount it over the network at boot. The Pis can all contain the same tiny image which only has enough to boot from the network; the actual filesystems like /usr, /var, /home etc. are remote on a secure NAS with RAID etc. Note, this requires that you have a VERY stable NAS and network, and it's highly recommended that it is wired. NFS shares dropping out and coming back is nearly fatal in this setup, and a reboot of all clients is usually the only practical recovery.
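As an illustration only (the server address, export path and options below are made up for the example), the NAS side exports the filesystem and the Pi mounts it:

```
# On the NAS: add an /etc/exports line, then reload the export table
#   /srv/pi-root  192.168.1.0/24(rw,sync,no_subtree_check,no_root_squash)
sudo exportfs -ra

# On the Pi: mount it manually, or put an equivalent line in /etc/fstab
sudo mkdir -p /mnt/nfsroot
sudo mount -t nfs 192.168.1.10:/srv/pi-root /mnt/nfsroot
```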
There are things you can do for quick wins. Disable swap completely. Create RAM disks for things you don't care about losing, such as /var, from which you will lose logs, journals, spools and caches. Move things you want to keep and cannot otherwise restore quickly to a network drive, cloud storage or GitHub. You can automate this to some degree.
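On Raspbian the swap part is quick, since swap is handled by dphys-swapfile (the RAM disks are the same tmpfs idea shown above):

```
# Turn swap off, remove the swap file and stop it coming back at boot
sudo dphys-swapfile swapoff
sudo dphys-swapfile uninstall
sudo systemctl disable dphys-swapfile

# Verify - the Swap line should show 0B
free -h
```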
According to Pascal (@pvizeli), one of the main devs, the #1 cause of SD-card issues is an inadequate power supply.
My personal choice is to over-specify the maximum current the power supply is capable of, so if the spec is 5 V 2.5 A, I will get a 5 V 3 A or greater. It is also important to use a quality SD card of course; look in the camera forums for ideas or follow the recommendations here. We currently have Samsung EVO 32 GB cards advertised for $6 (Australia).
:( … From the SanDisk web site…
This warranty does not cover use of the Product in connection with the following uses or devices (as determined by SanDisk):
(vii) continuous data logging devices like servers
With the right stuff at the OS level (a read-only FS with something like a JFFS2 overlay for updates, and a ramdisk overlay with a periodic copy back to the JFFS2), there is no reason the SD card can't keep performing for a very long time. I have an OrangePi Zero where the current card has been in place for over 5 years.
However, before this type of adaptation, the card would regularly fail every 3 months.
Are we saying that the hass.io OS for the RPi does not have these kinds of protections in place?
Now I'm a little scared.