Home Assistant OS 14 breaks NVMe SSD usage on RPi5

Hello,

I use a Raspberry Pi 5 (8 GB) in an Argon NEO 5 M.2 NVME PCIE Case with an NVMe SSD. There is no microSD card in my setup; the RPi boots directly from the SSD. HA OS releases before 14 work well. The last version that worked properly for me was 13.2.

After upgrading to HA OS 14.0, my HA became unresponsive. I tried many restores and reflashed the SSD again and again. Nothing worked. So I attached an HDMI monitor and a keyboard and saw lots of I/O errors.


I/O error, dev nvme0n1, sector xxx op 0x0:(READ) flags 0x80700 phys_seg 1 prio class 2
Buffer I/O error on device nvme0n1p8, logical block xxx
Aborting journal on device nvme0n1p8-8.
JBD2: I/O error when updating journal superblock for nvme0n1p8-8
EXT4-fs error (device nvme0n1p8) in ext4_reserve_inode_write:570: Journal has aborted
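
If you want to check whether your own system shows the same kind of messages, something like the following might help. It is only a rough sketch: the function name filter_storage_errors is made up for this example, and the patterns are simply taken from the messages quoted above, so other storage errors may slip through.

```shell
#!/bin/sh
# Keep only kernel log lines that look like the NVMe / ext4 / journal
# errors quoted above. Reads log text on stdin, so it works with live
# dmesg output as well as with a saved log file.
# The function name is hypothetical, chosen for this example.
filter_storage_errors() {
    grep -E 'I/O error|EXT4-fs error|JBD2|Aborting journal'
}

# Usage on the affected machine:
#   dmesg | filter_storage_errors
```

An empty result only means none of these patterns matched, not that the drive is healthy.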

So I thought my SSD was bad, took another one, flashed it with HA OS 14.0, and got the same errors.

So I downloaded Home Assistant OS 13.2 and flashed it to the SSD. This works without I/O errors until I upgrade from 13.2 to 14.0 or 14.1.

Any ideas how to fix the I/O errors with HA OS 14+?


Hi, I have the same issue. I haven’t tried restoring previous versions, but the issue appeared around the new versions you mentioned. In my case I have already tested the disk and it’s healthy.

I am stuck with the same problem here! I will try it your way: download version 13.2 and restore from backup. Thank you for investigating this.


It is not happening anymore. I had also noticed CMA out-of-memory messages. I followed the last post on: Raspberry Pi 5 with HassOS booted from nvme SSD shows cma_alloc errors in dmesg at startup · Issue #3214 · home-assistant/operating-system · GitHub, booting from an SD card and modifying the config.txt on the mounted NVMe drive.
Since then I have not seen this error message or any journal aborts.
So it may just have been a follow-on problem caused by a misconfiguration of the Raspberry Pi 5 with its SSD, with more memory now being used in the new core.

I have hopefully fixed the issue with the following steps. To keep the downtime short, I did everything at once that I thought could be related.

  1. Prepare a microSD with Raspberry Pi OS Lite (64 Bit) (I used Raspberry Pi Imager) and boot from it
  2. Update everything and reboot
    sudo apt-get update && sudo apt-get upgrade && sudo apt-get -y full-upgrade && sudo reboot
  3. Check the RPi EEPROM with sudo rpi-eeprom-update
    CURRENT: Tue 12 Nov 16:10:44 UTC 2024 (1731427844)
    LATEST:  Wed  8 Jan 17:52:48 UTC 2025 (1736358768)
  4. Update the RPi EEPROM with sudo rpi-eeprom-update -a and reboot
  5. Configure use of the latest bootloader with sudo raspi-config
    5.1. Navigate to Advanced Options > Bootloader Version.
    5.2. Select Latest to update your firmware to the newest version available.
    5.3. Exit the configuration tool and reboot your Raspberry Pi to apply the changes.
  6. Run the scripts from the Argon NEO 5 M.2 NVMe-PCIe Case for RPi 5 Manual
    6.1. curl https://download.argon40.com/argon-eeprom.sh | bash
    6.2. curl https://download.argon40.com/argonneo5.sh | bash
    6.3. reboot
  7. Shut down Raspberry Pi OS, remove the microSD and boot from NVMe
  8. Update HA OS to the latest 14
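
For anyone who wants to script the EEPROM check from the steps above, here is a minimal sketch. It assumes the output of rpi-eeprom-update contains CURRENT: and LATEST: lines ending in an epoch value in parentheses, as in the output shown in the list. The function name eeprom_outdated is made up for this example, and only the first CURRENT/LATEST pair is looked at.

```shell
#!/bin/sh
# Read `rpi-eeprom-update` output on stdin and tell whether the installed
# bootloader is older than the latest available one.
# Exit status: 0 = outdated, 1 = up to date, 2 = output could not be parsed.
# The function name is hypothetical; the parsing relies on the epoch seconds
# printed in parentheses at the end of the CURRENT and LATEST lines.
eeprom_outdated() {
    awk '
        /CURRENT:/ && !seen_current {
            gsub(/[()]/, "", $NF); current = $NF + 0; seen_current = 1
        }
        /LATEST:/ && !seen_latest {
            gsub(/[()]/, "", $NF); latest = $NF + 0; seen_latest = 1
        }
        END {
            if (current == 0 || latest == 0) exit 2
            exit (current < latest) ? 0 : 1
        }'
}

# Usage on Raspberry Pi OS:
#   sudo rpi-eeprom-update | eeprom_outdated && echo "bootloader update available"
```

With the two dates from my output, the function reports the bootloader as outdated, which matches what I saw before running the update.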

This is the first time HA OS 14.x works without any I/O errors related to the NVMe device. I have tested this for a couple of days and am fully satisfied. I have the feeling that the SSD now runs a bit cooler than before this procedure.

Perhaps the developers could consider checking the firmware in the HA OS image and updating it automatically if necessary.


After I got HAOS 14.0 and 14.1 working with the EEPROM update, I experienced I/O errors once more, this time after a few days to a week.


So I reconsidered what else I could do.
First of all I upgraded to HAOS 14.2. Then I thought of temperature issues with the NVMe, because in my Argon NEO 5 M.2 NVME PCIE Case the SSD sits at the bottom with a small heat spreader and very little space between the desk and the heat sink, just a few millimetres. So I turned the case upside down so the heat can rise.

My second idea was to limit the PCIe interface to Gen 1. Normally it’s set to Gen 2. In HAOS, after logging in locally, I entered the following command. Please look up how to use vi yourself.

vi /mnt/boot/config.txt

and added two lines at the bottom:

# Enable the PCIe External connector.
dtparam=pciex1
# Set PCIe to Gen 1 (normally 2)
dtparam=pciex1_gen=1
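
If you would rather not use vi, the same two lines could be appended with a small helper like the one below. This is just a sketch: ensure_line is a made-up name, and it assumes config.txt is writable at the path used above. It only adds a line when that exact line is not already present, so running it twice does not duplicate the setting.

```shell
#!/bin/sh
# Append a line to a file only if that exact line is not already in it.
# The function name is hypothetical, chosen for this example.
ensure_line() {
    file=$1
    line=$2
    # -x matches whole lines only, -F treats the text literally.
    if grep -qxF -- "$line" "$file" 2>/dev/null; then
        return 0
    fi
    # If the file does not end with a newline, add one first so the new
    # setting does not get glued onto the last existing line.
    if [ -s "$file" ] && [ -n "$(tail -c 1 "$file")" ]; then
        printf '\n' >> "$file"
    fi
    printf '%s\n' "$line" >> "$file"
}

# Usage on HAOS, with the path from this post:
#   ensure_line /mnt/boot/config.txt 'dtparam=pciex1'
#   ensure_line /mnt/boot/config.txt 'dtparam=pciex1_gen=1'
```

The comment lines from my config are optional. A reboot is still needed for the change to take effect.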

After restarting with this setting, and with the case upside down, I started to watch the NVMe temperatures.

nvme smart-log /dev/nvme0

Temps range from 27 to 29 °C. That seems fine to me. Time to put the case back in its normal position, with the SSD at the bottom.

Temps are now 31 - 33 °C. That’s ok too.
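
To watch the temperature over a longer time instead of reading it by hand, a small helper like this could be used. It is a sketch under assumptions: it expects nvme smart-log to print a line starting with "temperature" followed by a colon and the value in °C, nvme_temp_c is a made-up name, and the device path in the usage comment is a guess based on the nvme0n1 device in the error messages.

```shell
#!/bin/sh
# Extract the temperature in degrees Celsius from `nvme smart-log` output
# read on stdin. Assumes a line of the form "temperature : 31 ..."; the
# function name is hypothetical, chosen for this example.
nvme_temp_c() {
    # Split on ":" and let awk convert the leading number of the value.
    awk -F: '/^temperature/ { print $2 + 0; exit }'
}

# Usage (device path is an assumption, adjust to your system):
#   while true; do nvme smart-log /dev/nvme0 | nvme_temp_c; sleep 60; done
```

Logging one value per minute should be enough to see whether the I/O errors line up with temperature peaks.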

So I played around a little with the nvme command:

nvme smart-log -H /dev/nvme0

At the top there is more info about the temperature threshold. Here it shows 0. So there was no thermal throttling, and therefore no temperature issue?
critical_warning:0
Temp Threshold[1]: 0

Hi, I was wondering whether you got any further with this topic, since I am experiencing the same problem.

Was it the same for you, that most of HA OS was still running but the I/O errors caused the recorder to stop working?

Greetings