Running HAOS with Core 2025.11.3 and Supervisor 2025.12.2 on a Raspberry Pi 4.
My backups to NFS have been failing for the last few days, and I’m trying to debug the problem. Only partial files are being written to the NFS-mounted drive.
I’ve rebooted my HA server a couple of times over the last day.
I can mount the NFS share from my Mac laptop without any problem. For example, I can copy a full backup file (1.8 GB) from the NFS server to my laptop over Wi-Fi.
Back on the HA machine, I can attach to the Supervisor container and see the contents of the NFS share:
➜ ~ docker exec -it hassio_supervisor bash
842cbd8c22fd:/# mount | grep nfs
macha.local:/home/ha/ha_nfs_backup on /data/mounts/Macha_Backup type nfs4 (rw,relatime,vers=4.2,rsize=1048576,wsize=1048576,namlen=255,soft,proto=tcp,timeo=200,retrans=2,sec=sys,clientaddr=192.168.0.142,local_lock=none,addr=192.168.0.253)
842cbd8c22fd:/# ls /data/mounts/Macha_Backup
Automatic_backup_2025.11.3_2025-11-28_05.31_17004318.tar
...
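The mount line above is dense. A minimal Python sketch like the one below can split it into its options. The function names are hypothetical, and the sketch assumes the `<source> on <target> type <fstype> (<options>)` layout shown. According to the nfs(5) man page, `timeo` is measured in tenths of a second, so `timeo=200` means 20 s. The `soft` option means the client returns an I/O error after `retrans` retransmissions instead of retrying indefinitely, which is what a `hard` mount (the default) does.

```python
# Sketch: split one line of `mount` output into source, target, fstype and options,
# then summarise the NFS timeout-related options.
# Assumption: the "<src> on <dst> type <fstype> (<opts>)" layout shown above.


def parse_mount_line(line: str) -> dict:
    """Return source, target, fstype and an options dict from one `mount` output line."""
    head, _, opts = line.rstrip().rpartition(" (")
    source, _, rest = head.partition(" on ")
    target, _, fstype = rest.partition(" type ")
    options = {}
    for item in opts.rstrip(")").split(","):
        key, sep, value = item.partition("=")
        options[key] = value if sep else True  # bare flags such as "soft" become True
    return {"source": source, "target": target, "fstype": fstype, "options": options}


def describe_timeouts(options: dict) -> str:
    """Summarise hard/soft mode; timeo is in tenths of a second (nfs(5)).

    Defaults used when an option is absent (hard, timeo=600, retrans=2) are the
    nfs(5) defaults for NFS over TCP.
    """
    mode = "soft" if options.get("soft") else "hard"
    timeo_s = int(options.get("timeo", 600)) / 10
    retrans = int(options.get("retrans", 2))
    return f"{mode} mount, timeo={timeo_s:g}s, retrans={retrans}"


if __name__ == "__main__":
    sample = ("macha.local:/home/ha/ha_nfs_backup on /data/mounts/Macha_Backup type nfs4 "
              "(rw,relatime,vers=4.2,rsize=1048576,wsize=1048576,namlen=255,soft,proto=tcp,"
              "timeo=200,retrans=2,sec=sys,clientaddr=192.168.0.142,local_lock=none,"
              "addr=192.168.0.253)")
    info = parse_mount_line(sample)
    print(info["fstype"], describe_timeouts(info["options"]))
```

Run against the line above, the sketch reports a soft `nfs4` mount with a 20 s `timeo` and `retrans=2`.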
Then, if I start a copy to the NFS share, it runs for a while and then hangs:
842cbd8c22fd:/data/backup# cp Automatic_backup_2025.11.3_2025-12-05_05.23_08004568.tar /data/mounts/Macha_Backup
^C
^C
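One way to see exactly where the write stalls is to copy in fixed-size chunks, force each chunk out to the server with `os.fsync()`, and log progress along the way. The sketch below is a hypothetical helper, not an HA tool. The 1 MiB chunk size simply mirrors `wsize=1048576` from the mount options. The sketch assumes `python3` is available in the Supervisor container, which is plausible because the Supervisor is itself a Python application.

```python
# Sketch: copy a file chunk by chunk, fsync()ing each chunk so it is committed to the
# NFS server before the next one, and log progress so a stall shows up as the last line.
# The 1 MiB chunk size is an assumption that mirrors wsize=1048576, not a requirement.
import os
import sys
import time

CHUNK = 1024 * 1024


def copy_with_progress(src: str, dst: str, chunk: int = CHUNK, log=sys.stderr) -> int:
    """Copy src to dst chunk by chunk; return the number of bytes written."""
    written = 0
    start = time.monotonic()
    with open(src, "rb") as fin, open(dst, "wb") as fout:
        while True:
            block = fin.read(chunk)
            if not block:
                break
            fout.write(block)
            fout.flush()              # hand Python's buffer to the kernel
            os.fsync(fout.fileno())   # wait until the kernel has pushed it to the server
            written += len(block)
            elapsed = time.monotonic() - start
            print(f"{written} bytes in {elapsed:.1f}s", file=log, flush=True)
    return written


if __name__ == "__main__":
    if len(sys.argv) == 3:
        copy_with_progress(sys.argv[1], sys.argv[2])
    else:
        print(f"usage: {sys.argv[0]} SRC DST", file=sys.stderr)
```

For example, you could run `python3 copy_progress.py Automatic_backup_2025.11.3_2025-12-05_05.23_08004568.tar /data/mounts/Macha_Backup/test.tar` from /data/backup (the script name and destination file name are placeholders). If the kernel blocks on a write, the script will hang just like `cp`. The last progress line still shows how many bytes were confirmed written before the stall.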
On the NFS server, the copied file is incomplete:
ha@macha:~/ha_nfs_backup$ ls -lt | head -5
total 10084016
-rw-r--r-- 1 nobody nogroup 18874368 Dec 5 12:42 Automatic_backup_2025.11.3_2025-12-05_05.23_08004568.tar
-rw-r--r-- 1 nobody nogroup 872415232 Dec 3 06:22 Automatic_backup_2025.11.3_2025-12-03_05.08_42004201.tar
-rw-r--r-- 1 nobody nogroup 1937653760 Dec 2 05:36 Automatic_backup_2025.11.3_2025-12-02_05.30_37004745.tar
-rw-r--r-- 1 nobody nogroup 1939107840 Dec 1 05:19 Automatic_backup_2025.11.3_2025-12-01_05.14_23005396.tar
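To find out which of these files are actually usable, a small sketch like the one below can walk each archive with Python's `tarfile` module and read every member. A file cut off in the middle of a member raises `tarfile.ReadError`. The sketch assumes the outer backup file is a plain, uncompressed tar, as the `.tar` name suggests, and is meant to be run on the NFS server against the files listed above. One limitation: a file truncated exactly on a member boundary may still pass, so comparing sizes against the originals in /data/backup remains a useful cross-check.

```python
# Sketch: decide whether a backup .tar is complete by reading every member header
# and all member data; truncation part-way through raises tarfile.ReadError.
# Assumption: the outer backup file is a plain (uncompressed) tar archive.
import sys
import tarfile


def tar_is_complete(path: str) -> bool:
    """Return True if every member header and all member data can be read."""
    try:
        with tarfile.open(path, mode="r:") as tf:
            for member in tf:
                if member.isfile():
                    fobj = tf.extractfile(member)
                    while fobj.read(1024 * 1024):  # read and discard the member data
                        pass
    except (tarfile.TarError, EOFError) as exc:
        print(f"{path}: {exc}", file=sys.stderr)
        return False
    return True


if __name__ == "__main__":
    for name in sys.argv[1:]:
        print(name, "OK" if tar_is_complete(name) else "TRUNCATED/CORRUPT")
```

A usage example is `python3 check_tar.py ~/ha_nfs_backup/*.tar`, where the script name is hypothetical.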
If I attach to the Supervisor container again, I can see cp in the process list, but I’m unable to kill it, even with kill -9:
842cbd8c22fd:/# ps | grep cp
318 root 0:06 [cp]
423 root 0:00 grep cp
842cbd8c22fd:/# kill -9 318
842cbd8c22fd:/# ps | grep cp
318 root 0:06 [cp]
425 root 0:00 grep cp
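A `[cp]` entry that survives `kill -9` is consistent with a process in uninterruptible sleep (state `D`, which the dmesg trace in this post also reports). Such a process is blocked inside the kernel and generally will not act on signals, even SIGKILL, until that wait finishes. The minimal sketch below lists such processes by reading the state field of `/proc/<pid>/stat` (see proc(5)). The function names are hypothetical. Inside the Supervisor container you only see that container's PID namespace, so the PID there (318) will likely differ from the host PID that dmesg reports.

```python
# Sketch: list processes in uninterruptible sleep ("D" state) via /proc/<pid>/stat.
import os


def parse_stat(line: str) -> tuple[int, str, str]:
    """Return (pid, comm, state) from the contents of /proc/<pid>/stat."""
    # comm is wrapped in parentheses and may contain spaces or ')',
    # so take everything up to the *last* ") " as the command name.
    pid_part, _, rest = line.partition(" (")
    comm, _, after = rest.rpartition(") ")
    return int(pid_part), comm, after.split()[0]


def d_state_processes(proc: str = "/proc") -> list[tuple[int, str]]:
    """Return (pid, comm) for every process whose state is 'D'."""
    found = []
    for entry in os.listdir(proc):
        if not entry.isdigit():
            continue
        try:
            with open(os.path.join(proc, entry, "stat")) as f:
                pid, comm, state = parse_stat(f.read())
        except OSError:
            continue  # the process exited (or is unreadable) while scanning
        if state == "D":
            found.append((pid, comm))
    return found


if __name__ == "__main__" and os.path.isdir("/proc"):
    for pid, comm in d_state_processes():
        print(pid, comm)
```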
Sadly, there’s nothing in the host or Supervisor logs.
However, dmesg on the HAOS host does show something that may be related:
[ 9546.832738] INFO: task cp:13992 blocked for more than 120 seconds.
[ 9546.832831] Tainted: G C 6.12.47-haos-raspi #1
[ 9546.832876] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 9546.832922] task:cp state:D stack:0 pid:13992 tgid:13992 ppid:13838 flags:0x0000020d
[ 9546.832948] Call trace:
...
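If the hang recurs, it can help to pull every hung-task report out of `dmesg` and match the timestamps against backup attempts. The minimal sketch below assumes the default `[seconds.micros]` timestamp format shown above, not `dmesg -T`. The regex matches only the `INFO: task <comm>:<pid> blocked for more than <N> seconds` line, not the call trace that follows. The function name is hypothetical.

```python
# Sketch: extract hung-task reports from saved dmesg output.
# Assumption: default "[ 9546.832738]" timestamps (seconds since boot).
import re
import sys

HUNG_RE = re.compile(
    r"\[\s*(?P<ts>\d+\.\d+)\]\s+INFO: task (?P<comm>.+?):(?P<pid>\d+) "
    r"blocked for more than (?P<secs>\d+) seconds"
)


def hung_tasks(text: str) -> list[dict]:
    """Return one dict per hung-task line: kernel timestamp, command, pid, threshold."""
    return [
        {"ts": float(m["ts"]), "comm": m["comm"], "pid": int(m["pid"]), "secs": int(m["secs"])}
        for m in HUNG_RE.finditer(text)
    ]


if __name__ == "__main__":
    # Usage: dmesg > dmesg.txt, then: python3 hung_tasks.py dmesg.txt
    for path in sys.argv[1:]:
        with open(path, errors="replace") as f:
            for t in hung_tasks(f.read()):
                print(f"{t['ts']:>12.3f}s  {t['comm']} (pid {t['pid']}) blocked > {t['secs']}s")
```

Run it wherever `python3` is available by copying the saved dmesg output there first. The file and script names are placeholders.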
I’m not seeing any hardware issues in dmesg (e.g. SSD errors).
Any suggestions on how to debug further?
My drive:
# parted -l
Model: SanDisk Extreme SSD (scsi)
Disk /dev/sda: 1000GB
Sector size (logical/physical): 512B/4096B
Partition Table: gpt
Disk Flags: pmbr_boot
Number  Start   End     Size    File system  Name              Flags
 1      1049kB  34.6MB  33.6MB  fat16        hassos-boot       msftres
 2      34.6MB  59.8MB  25.2MB               hassos-kernel0
 3      59.8MB  328MB   268MB                hassos-system0
 4      328MB   353MB   25.2MB               hassos-kernel1
 5      353MB   622MB   268MB                hassos-system1
 6      622MB   630MB   8389kB               hassos-bootstate
 7      630MB   731MB   101MB   ext4         hassos-overlay
 8      731MB   1000GB  999GB   ext4         hassos-data
