Stability problems NVMe on RPI 5

Wow, that’s a lot you did and found out!

I finally surrendered and bought the RPI SSD and the official HAT because I was tired of investing more time.

Your point with the signal cable may be valid! I was/am an embedded SW engineer and my gut feelings told me exactly that. When the problems got worse here I suspected that cable. So I disassembled everything, checked that cable, turned it on again and the problems visible during startup were all gone.

Initially I was happy, but as soon as I had closed the housing, tightened the screws and put the RPI back in its place, it did not even boot anymore. So to me this looked like a signal quality problem that depends on unknown factors, and I strongly believe in your theory.

Another finding: during this time I had a display connected to the housing, so I could check the logs from time to time. I had never connected a display before. But now my RPI suddenly had massive trouble connecting to the WiFi. As soon as I detached the display cable, everything was fine again.

My lesson learned from this whole thing is that I would never use an RPI again for this purpose. We want to have a stable HA, and it is extremely annoying to explain to your wife why the bathroom remains cold.

Taking all the money I invested in new PSUs, HATs, SSDs I could have afforded a most likely more stable Mini PC with even some money left, taking my wife out for dinner, aka saying sorry for the cold bathroom :wink:


Can you have a look at what this returns in the CLI?

The issue here is much, much bigger, as I found out while searching for answers.
There is an incompatibility between many NVMe SSDs and the Raspberry Pi 5, and not only the RPi 5: it also affects other platforms and hardware systems. It has to do with the Phison NVMe controller, which causes the issues.

I was experiencing shutdown crashes and freezes moving from 15.2 to 16.1, up to the point where it drove me crazy. I ended up completely wiping HA off my NVMe and doing a fresh install, only to find out that did not help. Digging further, it turned out that all the NVMe drives I have (SABRENT Rocket plus 1T, Kingston NV3 WD350) have a Phison controller, which causes these issues.
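For anyone who wants to check what their own drive reports, here is a minimal sketch. It assumes a Linux shell with access to sysfs, the usual `/sys/class/nvme/<controller>/` layout, and that `0x1987` is the PCI vendor ID registered to Phison (my assumption, please verify it against the PCI ID database). The function names are made up for illustration. A drive may report its brand's own vendor ID even when the controller inside is made by someone else, so a negative result does not prove much.

```python
from pathlib import Path

# Assumption: 0x1987 is the PCI vendor ID registered to Phison.
PHISON_VENDOR_ID = 0x1987


def list_nvme_controllers(sysfs_root="/sys/class/nvme"):
    """Return (name, model, vendor_id) for each NVMe controller under sysfs_root."""
    found = []
    root = Path(sysfs_root)
    if not root.is_dir():
        return found  # no NVMe controllers visible (or not a Linux system)
    for ctrl in sorted(root.iterdir()):
        model_file = ctrl / "model"
        vendor_file = ctrl / "device" / "vendor"
        if not (model_file.is_file() and vendor_file.is_file()):
            continue
        model = model_file.read_text().strip()
        # The vendor file holds a hex string such as "0x1987".
        vendor_id = int(vendor_file.read_text().strip(), 16)
        found.append((ctrl.name, model, vendor_id))
    return found


def is_phison(vendor_id):
    """True if the reported PCI vendor ID matches the assumed Phison ID."""
    return vendor_id == PHISON_VENDOR_ID


if __name__ == "__main__":
    for name, model, vendor_id in list_nvme_controllers():
        label = "Phison" if is_phison(vendor_id) else "other vendor"
        print(f"{name}: {model} (PCI vendor 0x{vendor_id:04x}, {label})")
```

If nothing is printed, no NVMe controller was visible at that path, which is itself a useful data point when the drive is not being recognized.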

I ended up grabbing a 256GB USB3.1 stick dongle thingy and running HA on it. After a restore I luckily recovered my instance, and it has now been running smoothly for 36 hours.

More to follow.

https://www.google.com/search?q=nvme+ssd+incompatibilty+with+rpi5+phison&client=firefox-b-d&hs=yWIp&sca_esv=89aaa52b873f06b6&biw=1710&bih=989&ei=Y_yEaeOhD5mBi-gPl4-C-Qw&ved=0ahUKEwij34Whl8OSAxWZwAIHHZeHIM8Q4dUDCBM&uact=5&oq=nvme+ssd+incompatibilty+with+rpi5+phison&gs_lp=Egxnd3Mtd2l6LXNlcnAiKG52bWUgc3NkIGluY29tcGF0aWJpbHR5IHdpdGggcnBpNSBwaGlzb24yCBAAGIAEGKIEMggQABiABBiiBDIIEAAYgAQYogQyBRAAGO8FSMAqUNAFWJcdcAF4AZABAJgBhwGgAaEFqgEDNC4zuAEDyAEA-AEBmAIIoALPBcICChAAGEcY1gQYsAPCAgcQIRgKGKABmAMA4gMFEgExIECIBgGQBgiSBwM1LjOgB9YcsgcDNC4zuAfHBcIHBTAuMy41yAcXgAgB&sclient=gws-wiz-serp

@330chauf you are right that the Phison NVMe controller is known to be a problem, but in fact it generally prevents the NVMe SSD from being recognized by the Pi. You can install on it or boot it.

As far as I know, the Patriot does not rely on this controller, so I don’t think it is the issue. At least I have not seen the Patriot on the lists of problematic NVMe SSDs I found, in particular the Geekworm one.

36 hours is a start but it is not a lot… As for me, without any configuration change, my RPI 5 has been running for a week now. The only change I can think of is that I gave up on WiFi (because of the problems during reboot experienced by others and mentioned recently by @dan-shaqfu ) and decided to connect the RPI 5 to the network with an RJ45 cable after the last reboot, once it had spent 1 day off the network because of the WiFi issue. Since then it has been stable, but I think it doesn’t make sense to correlate the 2 things…

I decided to wait a little bit more before doing new checks and rebooting the RPI 5…

Stay tuned! But I am not really satisfied with the situation suddenly becoming stable without any reason…


Could be that it has no effect. But I let it run overnight with Raspberry Pi OS Lite, meaning booted but with no extra processes running, and sure enough in the morning there was no connection anymore.

I’m considering moving my installation to a NUC. Speed is one thing, but unreliability is something totally different.

Apologies. Rereading my previous post, I realized that I wrote the opposite of what I wanted to say at the end of the first paragraph… It should read “you can’t install on it or boot it”…

I also wanted to report that my RPI 5 is still running like a charm and that, thinking again about the possible cause, I am considering more and more seriously an impact of the WiFi on the SSD. It is really the only change: since the last boot, the WiFi interface has been down, without any way to restart it except a reboot that I don’t want to do for the time being (for testing purposes)…

For what it’s worth, I decided to ask ChatGPT and I got an answer saying it was not a completely crazy idea. My question was: on a Raspberry Pi 5, could the WiFi have a negative influence on PCIe signals? Below is an excerpt from the (long) answer:

Why Wi-Fi could affect PCIe

Wi-Fi (2.4 GHz / 5 GHz) and PCIe are very different beasts, but they can still interact indirectly:

EMI (electromagnetic interference)
Wi-Fi radios transmit RF energy. If PCIe traces, cables, or connectors are poorly shielded or routed, that RF noise can couple into the PCIe signals.

High-speed sensitivity
PCIe Gen 2/Gen 3 signals are extremely fast (multi-GHz). They don’t tolerate noise well if:

Signal integrity is already weak

Eye margins are small

There’s impedance mismatch, reflections, or long cables

Ground / power noise
Wi-Fi transmission draws bursty current. If power delivery or grounding is marginal, that noise can show up as jitter on PCIe lanes.

In particular, in my case I use the Geekworm X1001 PCIe HAT, which sits on top of the RPI instead of under it like the official HAT. As a result the FFC cable is longer, meaning it is more exposed to signal perturbations from the environment, and WiFi may be one of the sources of such a disturbance… I have nothing to prove it, it is just an idea…

At the next reboot, I plan to do some tests reactivating the WiFi to see if I can reproduce the unstable situation and stop it by disabling the WiFi again. If that is the case, it is an easy workaround for me, as using an Ethernet cable is not a problem (it was just easier with WiFi).
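To make such a WiFi on/off test easier to compare afterwards, something like the following could record one sample of the WiFi state together with whether the NVMe device node is still there. It is only a sketch under assumptions: the interface name `wlan0`, the device path `/dev/nvme0n1` and the function names are illustrative, and the interface may not appear at all if WiFi is disabled at a lower level.

```python
import time
from pathlib import Path


def wifi_state(iface="wlan0", net_root="/sys/class/net"):
    """Return the interface operstate (e.g. 'up' or 'down'), or 'absent'."""
    state_file = Path(net_root) / iface / "operstate"
    return state_file.read_text().strip() if state_file.is_file() else "absent"


def nvme_present(device="/dev/nvme0n1"):
    """True if the NVMe block device node still exists."""
    return Path(device).exists()


def take_sample(iface="wlan0", device="/dev/nvme0n1", net_root="/sys/class/net"):
    """Collect one timestamped observation of WiFi state and NVMe presence."""
    return {
        "time": time.strftime("%Y-%m-%d %H:%M:%S"),
        "wifi": wifi_state(iface, net_root),
        "nvme_present": nvme_present(device),
    }


def format_sample(sample):
    """Render a sample as a single log line."""
    return (
        f"{sample['time']} wifi={sample['wifi']} "
        f"nvme_present={sample['nvme_present']}"
    )


if __name__ == "__main__":
    # Prints a single line; run it periodically (for example from a timer).
    print(format_sample(take_sample()))
```

If the system boots from the NVMe, the output should go somewhere that survives the drive dropping out (an attached display, a serial console or another machine), otherwise the interesting last lines are lost.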


These settings absolutely saved the day for me with a Raspberry Pi 5, Argon V3 enclosure, Raspberry Pi 5 power supply and a Ranxjiang (yeah, never heard of them either) 256GB NVMe.