Stability problems with NVMe on RPi 5

My Raspberry Pi 5 (16GB) running Home Assistant OS frequently stops working. It happens seemingly at random, every few weeks.

I cannot access the Home Assistant web page from my browser. The Home Assistant app is barely usable, and no useful information, such as log files, is accessible.
I cannot connect via SSH.
All functionality, such as Zigbee and Z-Wave communication, stops working.

When I connect a monitor, the console shows an endless stream of the following messages:

Buffer I/O error on dev nvme0n1p3, logical block 17399, async page read
erofs: (device nvme0n1p3): erofs_read_inode: failed to get inode (nid: 2227159) page, err -5
systemd-journald[130]: Failed to write entry to /var/log/journal/<id>/system.journal (21 items, 685 bytes) despite vacuuming, ignoring: Input/output error

The only way out of this situation is a hard reboot of the Raspberry Pi.

The Home Assistant, host and Supervisor logs, which are accessible after rebooting, show no useful information.

Setup currently in use:

HW:
Raspberry Pi 5, 16 GB
Raspberry Pi 45W USB-C Power Supply
Pi NVMe HAT: 52pi 04-m-2-2280-pcie-to-nvme-top
SSD: WD Blue SN5000 500GB PCIe G4
Sonoff Zigbee 3.0 USB Dongle Plus, TI CC2652P
Zooz ZST39 LR Series 800 Z-Wave USB stick
SW:
Home Assistant OS
Core: 2025.6.1
Supervisor: 2025.05.5
OS: 15.2

Zigbee2MQTT, InfluxDB, Grafana, etc.

However, this issue also occurred with my previous setup, which consisted of an RPi 4 booting from an SD card plus an additional USB SSD. I suspected the USB SSD might be causing these problems, so I switched to the RPi 5 setup with an NVMe SSD. To my great frustration, however, I still have the same problems!

This issue might be related to the following post, although that one didn’t get any responses:

This post might also describe the issue, but it didn’t lead to a solution either:

Does anyone have an idea what might be causing this, or whether it is possible to get more information (logs) to find the cause?


My suggestion would be to try a different NVMe SSD, ideally one of the Raspberry Pi own-brand, tested devices.

Users of several RPi 5 HATs (and the Yellow) report issues with specific SSDs. I don’t have a compatibility list to hand, but ISTR issues with command queueing, overheating, and power brown-outs.

The Pimoroni list calls out a range of WD SSDs:

I certainly had boot issues with the Pimoroni HAT and Samsung 970 EVOs, although other Samsung drives work in the Yellow (probably because it pushes the drive a lot less, running at PCIe 1 or 2 rather than 3.0).


I have to admit I’m desperate enough to try buying another SSD. However, I do wonder why I encounter exactly the same issues with such a different setup: I switched from an RPi 4 to an RPi 5, and from a USB SSD to an NVMe SSD. Might it not be software-related?

I also wonder what hardware and storage configurations other HA users have. I always assumed an RPi running HA is a pretty common setup. If that is indeed the case, I would expect more people to encounter the same problems I have.

Also, I assume it won’t make any difference given the modest hardware of an RPi, but the speed difference between the RPi SSD (50k / 90k IOPS read/write) and the WD Blue (460k / 770k IOPS read/write) makes me cringe a bit.

Before going with the nuclear option, can you get your hands on a powered USB hub for your SSD?

Any cheap USB 2 one will do, as long as it has its own power supply. My money’s on insufficient USB power for your SSD.


The RPi forum goes into some detail about drive features the Linux kernel can use that some manufacturers don’t implement, or don’t implement correctly (ISTR command queueing was one).

Given the cost of an “original” RPi NVMe drive, I decided it wasn’t worth the time to tune mine and just upgraded to a “known good” part. The old one is now in a USB-C enclosure as a portable drive, and it seems to work fine for that use.

Hi,
I’m experiencing exactly the same issue. After running a bunch of tests, I found that the problem only shows up when I’m using HA OS flashed to the NVMe. By contrast, if I run the Supervised version (on Raspberry Pi OS Lite, which is basically a slimmed-down Debian), everything is rock solid. That makes me think it’s all about how the kernel is built: if it’s optimized for the Pi, there are no issues at all.

The bad news is that they’re planning to phase out the Supervised version, so I’m back to square one…

Since I can’t really change how the kernel handles my NVMe, the only thing left to try is a different drive; maybe HA OS will “like” it better. I’m currently using a WD SN530 NVMe 256GB, and I’m planning to test a Micron 512GB next.

Cheers,
George

I’ll start by using a powered USB hub, which seems like a useful improvement anyway. If that doesn’t work, I’ll try the official RPi SSD.
Switching from HA OS to Supervised will be a last resort, especially if they are phasing out that version. I do agree with jokoto777’s suggestion that it is a software issue rather than a hardware issue, since I have already switched hardware.

If, in the meantime, anybody has any other suggestions or solutions, please let me know!

I’m having the EXACT same issues on a Home Assistant Yellow with an NVMe SSD (a 1TB Silicon Power drive sold by Ameridroid in their Home Assistant Yellow bundle).

I was connected via SSH when it died this time, and that gave me my first clue that it was disk-related (I had gone to plug in a USB stick to import my authorized_keys file, and by that point all commands were failing):

It’s a bit annoying that Ameridroid bundles an SSD that causes issues like this…

This thread looks helpful for finding a compatible SSD. I’ve ordered a Crucial 500GB, since people seem to have had good luck with it: HomeAssistant Yellow NVME compatibility

I tried a powered USB hub, but that didn’t solve the problem. Last week I installed a new NVMe HAT and SSD, both Raspberry Pi branded for maximum compatibility. I guess it will take a few weeks to really know whether this solves the issue. Unfortunately, the new SSD and HAT are not compatible with the previous HAT and SSD (due to their physical form factor), so I cannot mix them to pinpoint whether the SSD or the HAT is the culprit (assuming the issue is solved). I’ll keep you updated on the stability of my latest hardware setup!

This has been going on for a while:

Best advice: ditch the Pi, get a cheaper N100 system and never look back…

I’ve been using a Pi 5 (4GB) with a Raspberry Pi 27W power supply, a 256GB Integral M.2 SSD and an Argon NEO 5 M.2 NVMe case for 9 months now, and so far I’ve had no issues.

Hello,

Brandon from ameriDroid here. We have sold hundreds of 1TB SiliconPower NVMe drives along with Home Assistant Yellow units, with both CM4s and CM5s, and this is the first time we have heard of an issue like this. In the past we have seen only one issue, with one of our pre-assembled CM5 units where we shipped the NVMe pre-flashed with HAOS: the customer got a few errors that HAOS fixed automatically. It appeared that one of the files got corrupted when we turned off the device after testing, but once the customer booted the unit, HAOS automatically recreated the files and fixed the issues. I run HAOS on a SiliconPower drive in a Yellow with a CM5 myself, and it has been running for months without any issues. At this time we do not see any pattern that would prompt us to stop recommending the SiliconPower NVMe drives with the Yellow, but we will definitely keep an eye on it in case that becomes necessary.


Any updates? I’m struggling with the same problem, except the failure frequency has reached the point where it fails anywhere from every couple of days to every few hours.

Even rebooting often fails now, with I/O errors starting at boot time. It takes several reboots to get it back up.

It might be that the NVMe drive (Kingston SNV2S/250G) is worn out, but SMART has nothing to report and the drive seems healthy.

The problem started after the July update for me, but I see others have had the same problem for longer than that, so I guess it’s a coincidence.
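If you want to double-check that SMART really shows nothing, here is a minimal Python sketch, assuming you can save the output of `smartctl -a /dev/nvme0` to a text file (for example with the drive in another Linux machine, since smartctl may not be available on the HA OS host). It assumes the field labels smartctl prints for NVMe drives; the 90% wear cut-off is a hypothetical threshold, not an official one.

```python
import re
import sys

# Field labels as printed by `smartctl -a` for NVMe drives (assumed format).
FIELDS = {
    "critical_warning": r"Critical Warning:\s+(0x[0-9a-fA-F]+)",
    "available_spare": r"Available Spare:\s+(\d+)%",
    "available_spare_threshold": r"Available Spare Threshold:\s+(\d+)%",
    "percentage_used": r"Percentage Used:\s+(\d+)%",
    "media_errors": r"Media and Data Integrity Errors:\s+([\d,]+)",
}

# Hypothetical cut-off for "nearly worn out"; adjust to taste.
WEAR_LIMIT_PERCENT = 90


def parse_nvme_health(text):
    """Extract the fields listed in FIELDS from smartctl output text."""
    health = {}
    for key, pattern in FIELDS.items():
        match = re.search(pattern, text)
        if match:
            value = match.group(1)
            # Critical Warning is a hex bit field; the others are decimal numbers.
            if value.startswith("0x"):
                health[key] = int(value, 16)
            else:
                health[key] = int(value.replace(",", ""))
    return health


def warning_signs(health):
    """Return reasons to distrust the drive (an empty list means none found)."""
    signs = []
    if health.get("critical_warning", 0) != 0:
        signs.append("critical warning bits set")
    if health.get("media_errors", 0) > 0:
        signs.append("media/data integrity errors logged")
    if health.get("percentage_used", 0) >= WEAR_LIMIT_PERCENT:
        signs.append("rated endurance nearly used up")
    spare = health.get("available_spare")
    threshold = health.get("available_spare_threshold")
    if spare is not None and threshold is not None and spare < threshold:
        signs.append("available spare below threshold")
    return signs


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage (hypothetical script name): python3 nvme_health.py smart.txt
    with open(sys.argv[1], encoding="utf-8", errors="replace") as f:
        report = parse_nvme_health(f.read())
    print(report)
    print(warning_signs(report) or "no obvious warning signs")
```

A clean result only rules out wear and logged media errors; it says nothing about power or HAT/link problems, which fits the drive "seeming healthy" while still throwing I/O errors.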

This is my setup as well, and it is causing me problems. The only difference is that I’m using a Kingston SNV2S/250G. I’ve also tried a beefier PSU, to no avail.

Just out of curiosity, would you be able to log in to the raw host OS (via port 22222 or by logging in from the console, for example) and run dmesg to see whether you’ve had any I/O errors or similar at all?

Which version are you running, and do you have the latest firmware on the RPi 5?
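To check for these errors without scrolling through the whole log, here is a minimal Python sketch that filters saved kernel log text for the three error signatures quoted in the first post. It assumes you first save the output on the host (e.g. `dmesg > dmesg.txt`) and then run the script on any machine with Python 3; the script name `find_io_errors.py` is hypothetical. It only matches those specific message formats, so other kinds of storage errors won’t be caught unless you add patterns.

```python
import re
import sys
from collections import Counter

# Signatures copied from the console messages in the first post.
ERROR_PATTERNS = [
    re.compile(r"Buffer I/O error on dev \S+?,"),
    re.compile(r"erofs: \(device \S+?\)"),
    re.compile(r"Input/output error"),
]

# Captures the block device name from "Buffer I/O error" lines.
BUFFER_IO = re.compile(r"Buffer I/O error on dev (\S+?), logical block \d+")


def find_storage_errors(lines):
    """Return every line that matches one of the known error signatures."""
    return [line.rstrip("\n") for line in lines
            if any(p.search(line) for p in ERROR_PATTERNS)]


def count_by_device(lines):
    """Count 'Buffer I/O error' lines per block device (e.g. nvme0n1p3)."""
    counts = Counter()
    for line in lines:
        match = BUFFER_IO.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python3 find_io_errors.py dmesg.txt
    with open(sys.argv[1], encoding="utf-8", errors="replace") as f:
        log_lines = f.readlines()
    hits = find_storage_errors(log_lines)
    print("\n".join(hits) if hits else "no matching I/O errors found")
    print(dict(count_by_device(log_lines)))
```

Keep in mind that dmesg only covers the current boot, so errors from before a hard reboot won’t show up in its output.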

I ran dmesg from the terminal and didn’t find any errors.
I’m running HA OS:
Core 2025.8.1
Supervisor 2025.08.1
OS 16.0
Frontend 20250811.0

Since my last post (20 days ago) I haven’t encountered any issues, so it is becoming more likely that an SSD and/or NVMe HAT incompatibility caused my problems. I discovered I can install the new SSD on the old HAT, so if I find the time I’ll try that combination to find out whether the SSD or the HAT is the culprit.

And just to be clear, I’m talking about the specific instability described in the starting post:

Buffer I/O error on dev nvme0n1p3, logical block 17399, async page read
erofs: (device nvme0n1p3): erofs_read_inode: failed to get inode (nid: 2227159) page, err -5
systemd-journald[130]: Failed to write entry to /var/log/journal/<id>/system.journal (21 items, 685 bytes) despite vacuuming, ignoring: Input/output error

Just here to post my experience in case it helps someone.

I was having the same issue/error with a similar setup: an RPi 5 with an SSD connected via M.2, the official Pi power adapter and a Sonoff Zigbee USB dongle. It worked without problems for many months, but then suddenly froze with the same I/O error, every 5-6 days at first and more recently several times a day. After some investigation, I removed the power strip it was connected to and plugged it directly into the wall socket. That did the trick, and it has been running without restarts for a couple of days. If I don’t write here again, it means the issue is gone.

What’s interesting is that the issue started appearing several weeks after I moved the device to the power strip, which is why I never suspected it was the culprit.

So, lesson learned: the RPi is very picky about its power supply.


Hi there,
I’m having exactly the same issues with my Pi 5 (Geekworm X1003 and Transcend 256GB SSD).
I’ve tried all the tips and fixes circulating in various internet forums and blogs, with no success so far.

Hi everyone, I’m having the same issue. I have my Pi 5 16GB in an Argon NEO 5 case. First I used a Crucial P3 Plus, and it gave exactly the same issue. It’s random: sometimes it runs for days, other times only a few hours. I then tried a Crucial P310, and right after restoring the backup everything went up in flames; it would no longer boot, not even to the CLI. It did start a few times before the backup was restored, though.
I went back to an SD card, which is slow as hell, but it booted and the backup was restored.
Right now I don’t know what to do. Could it be the NVMe? Could it be the HAT?
Worth mentioning: both NVMe drives use a Phison controller.