Reliable solid state drive

Disclaimer: I have read as many of the postings about what I am about to question as possible.

What is a good quality brand of solid state drive (SSD) for use with HA? I have had two fail now, and I am weary of the reliability issues. I am using an RPi 4 with 4 GB of memory. I used an old desktop prior to this and had few issues with the SSD, but the Pi has not proven to be as reliable. After a week or a month the drives just fail. I tried changing the USB interface, but the same problem occurred.

They should not do that. Maybe the SSD enclosure is to blame?

Yes, they did. Moreover, the enclosures were changed to a model recommended by another poster. I have also made sure the RPi is getting a full 3A VDC, the cables are short, etc. I am able to reformat the SSDs and they work after I lay a new ext4 partition on them, but I do not trust them, except for storage. Another question: Why would the enclosure only work part of the time? Were it bad, wouldn’t it always be bad?

Have you used some tool to verify that the SSDs actually are broken?

GParted. The file system is corrupted. As I wrote earlier, I re-partitioned and reformatted to ext4, and the same thing happens.

Try using a tool from the manufacturer. Corrupt data does not necessarily mean a faulty SSD (or two, in your case). Use something that scans the health of the SSD, not the file system.

What manufacturer tool? I’m using Linux, and there is GParted. In fact that’s what most of the manufacturers use. It can scan for bad blocks.

Isn’t GParted just a partitioning tool? I wouldn’t know what manufacturer made your SSDs; you haven’t said.

No, if you add the entire suite of file system tools, GParted can check and recover too. I have been using it for many years. The drives were branded Crucial, BTW. No tools came with them.

OK, doesn’t say so here

They have some tool that is Windows only: Crucial Storage Executive Tool | Firmware Download

What does your SMART data say? If the drives are broken, it should show some bad numbers.
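For anyone wanting to pull those numbers on Linux, here is a rough Python sketch that reads the JSON output of `smartctl` (from smartmontools 7.0 or later) and lists only the attributes that usually matter. The set of watched attribute IDs is my own assumption, and so is the `-d sat` option, which many USB-to-SATA bridges need but not all support. The function names are made up for illustration.

```python
import json
import subprocess

# Attribute IDs commonly treated as warning signs. Vendors differ in which
# attributes they report, so this set is an assumption, not a standard.
WATCHED_IDS = {5, 187, 197, 198, 199}


def summarize_smart(report: dict) -> dict:
    """Pull the overall verdict and non-zero watched attributes from smartctl JSON."""
    passed = report.get("smart_status", {}).get("passed")
    table = report.get("ata_smart_attributes", {}).get("table", [])
    suspicious = {
        attr["name"]: attr["raw"]["value"]
        for attr in table
        if attr["id"] in WATCHED_IDS and attr["raw"]["value"] != 0
    }
    return {"passed": passed, "suspicious": suspicious}


def read_smart(device: str = "/dev/sda") -> dict:
    """Run smartctl (needs root). '-d sat' is often required behind USB bridges."""
    result = subprocess.run(
        ["smartctl", "-a", "-j", "-d", "sat", device],
        capture_output=True,
        text=True,
        check=False,  # smartctl uses its exit status as a bitmask, not pass/fail
    )
    return json.loads(result.stdout)
```

Run as root, `summarize_smart(read_smart("/dev/sda"))` gives the overall verdict plus any non-zero counters. A non-zero CRC error count (ID 199) tends to point at the cable or enclosure rather than the flash itself.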

There are no bad blocks. I have not used Windows since 1995. My work was 100% *nix. Something is corrupting the data, and the logs are useless after corruption. Fortunately I back up. Just using Mosquitto, ESPhome and FTP addons.

Alright, my point was that it is statistically unlikely that you break two SSDs (Crucial aren’t known to make faulty drives) in a short while and that you should do your research before buying more drives and maybe have the same happen to them, but you seem reluctant to do any troubleshooting outside scanning for bad blocks.

Look at your local hardware supplier / Amazon / whatever offerings, find one from a reputable brand and look at the MTBF, the higher the better.
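To put MTBF figures in perspective, here is a small sketch that converts an MTBF into a failure probability over a given period. It assumes a constant failure rate (exponential model), which is a simplification, and the 1,500,000 hour MTBF is a hypothetical datasheet value, not a figure for any specific drive.

```python
import math

HOURS_PER_YEAR = 8766  # 365.25 days
HOURS_PER_MONTH = 730  # rough average


def failure_probability(mtbf_hours: float, period_hours: float) -> float:
    """P(drive fails within the period) under a constant-failure-rate model."""
    return 1.0 - math.exp(-period_hours / mtbf_hours)


def both_fail(mtbf_hours: float, period_hours: float) -> float:
    """P(two independent drives each fail within a period of this length)."""
    return failure_probability(mtbf_hours, period_hours) ** 2


mtbf = 1_500_000  # hypothetical datasheet figure, in hours
print(f"one drive, one year:   {failure_probability(mtbf, HOURS_PER_YEAR):.4%}")
print(f"one drive, one month:  {failure_probability(mtbf, HOURS_PER_MONTH):.4%}")
print(f"two drives, one month: {both_fail(mtbf, HOURS_PER_MONTH):.2e}")
```

Under these assumptions, two independent drives both dying within a month of use is extremely unlikely, which is why a shared cause (power, enclosure, cabling) is worth ruling out first.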

What’s the failure? What make/model? SMART data?

The most reliable SSDs are from companies that actually produce flash (only 3 or 4 companies worldwide do this). If you choose a brand that doesn’t, or a no-name, they need to source the flash cells on the market and the quality can vary a lot. Also, cheap SSDs mostly aren’t “real” ones, in the sense that real SSDs offer extra long-lasting flash cells that work as a kind of interim storage, so that flash cells can be written in full pages/clusters and write amplification is minimized.

No, I assure you I have checked the drives, what else goes wrong other than bad inodes? Amazon is really not a very good place to do “research” is it?

Thank you. That is a detail I wanted/needed to hear. I suspect I need to go to a higher quality drive and a more powerful SBC - Odroid probably.

So what does the SMART data say? Since you have checked them. Crucial are one of the ones that manufacture memory by the way. Micron Technology - Wikipedia

Well, if I use the SMART in the Webmin interface, it shows the drive is good. If I use the native Linux tool, which is the same tool, it shows the same. But it does not work and I have to reformat. This is a QC problem with the cheap drives.

Then I would check for other causes than failed drives. To me it seems that the data corruption is caused by something other than two failed SSDs, and that your SSDs are likely good (you have data to support this). Is your power supply good enough?
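One way to check the power side on a Raspberry Pi is the firmware's `vcgencmd get_throttled` command, which reports under-voltage events as a bit field. Below is a minimal sketch that decodes it. It assumes `vcgencmd` is available on the system (it ships with Raspberry Pi OS, and may not be reachable from inside a Home Assistant OS install), and the function names are hypothetical.

```python
import subprocess

# Bit meanings as documented for `vcgencmd get_throttled` on Raspberry Pi OS.
FLAGS = {
    0: "under-voltage detected now",
    1: "ARM frequency capped now",
    2: "currently throttled",
    3: "soft temperature limit active",
    16: "under-voltage has occurred since boot",
    17: "ARM frequency capping has occurred since boot",
    18: "throttling has occurred since boot",
    19: "soft temperature limit has occurred since boot",
}


def decode_throttled(output: str) -> list[str]:
    """Turn a line such as 'throttled=0x50005' into readable flag names."""
    value = int(output.strip().split("=", 1)[1], 16)
    return [text for bit, text in FLAGS.items() if value & (1 << bit)]


def read_throttled() -> list[str]:
    """Query the firmware; only works where vcgencmd is installed."""
    raw = subprocess.run(
        ["vcgencmd", "get_throttled"],
        capture_output=True,
        text=True,
        check=True,
    ).stdout
    return decode_throttled(raw)
```

An empty list from `read_throttled()` means no flags are set. Any under-voltage flag would be a strong hint that the supply or cable sags under load, for example when a bus-powered SSD draws current during writes.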

I don’t think so. You should first gather all information so you actually can identify the problem and don’t make quick assumptions.
