HA hangs and becomes unresponsive after OS updates on KVM

I have my HA instance running in KVM, hosted on my local Ubuntu 22.04.3 LTS x86_64 server machine. I’ve set up the instance using virt-install.

After every OS update (IIRC), the HA instance seems to get stuck:

  1. virsh console hass is unresponsive.
  2. The CPU usage of the qemu process for the HA VM stays at around 100% almost constantly.
  3. There doesn’t appear to be any significant I/O activity by the qemu process when checked with iotop.
  4. I checked the log in /var/log/libvirt/qemu/hass.log, but didn’t find anything particularly unusual:
$ sudo tail /var/log/libvirt/qemu/hass.log
-netdev tap,fd=32,id=hostnet0,vhost=on,vhostfd=34 \
-device virtio-net-pci,netdev=hostnet0,id=net0,mac=52:54:00:f2:b4:bc,bus=pci.0,addr=0x2 \
-chardev pty,id=charserial0 \
-device isa-serial,chardev=charserial0,id=serial0 \
-audiodev '{"id":"audio1","driver":"none"}' \
-device usb-host,hostdevice=/dev/bus/usb/001/004,id=hostdev0,bus=usb.0,port=6 \
-device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x5 \
-sandbox on,obsolete=deny,elevateprivileges=deny,spawn=deny,resourcecontrol=deny \
-msg timestamp=on
char device redirected to /dev/pts/1 (label charserial0)
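To put a number on item 2 rather than eyeballing top, one can sample the qemu process's CPU time from /proc/<pid>/stat twice and divide by the elapsed time. This is a minimal sketch, assuming a Linux host and that you already know the qemu PID (on a typical libvirt setup it is written to /run/libvirt/qemu/hass.pid; treat that path as an assumption). The field positions follow the proc(5) man page; the function names are illustrative.

```python
import os
import sys
import time


def parse_cpu_ticks(stat_line: str) -> int:
    """Return utime + stime, in clock ticks, from one /proc/<pid>/stat line."""
    # Field 2 (comm) is wrapped in parentheses and may itself contain spaces
    # or ')', so only split what follows the LAST ')'.
    rest = stat_line[stat_line.rindex(")") + 1:].split()
    # rest[0] is field 3 (state); utime and stime are fields 14 and 15 (proc(5)).
    return int(rest[11]) + int(rest[12])


def cpu_percent(ticks_before: int, ticks_after: int, elapsed_s: float, hz: int) -> float:
    """CPU used over elapsed_s, as a percentage of ONE core (can exceed 100)."""
    return 100.0 * (ticks_after - ticks_before) / (hz * elapsed_s)


def read_ticks(pid: int) -> int:
    with open(f"/proc/{pid}/stat") as f:
        return parse_cpu_ticks(f.read())


def sample_pid(pid: int, interval_s: float = 5.0) -> float:
    """Sample a process's CPU time twice, interval_s apart."""
    t0, before = time.monotonic(), read_ticks(pid)
    time.sleep(interval_s)
    t1, after = time.monotonic(), read_ticks(pid)
    return cpu_percent(before, after, t1 - t0, os.sysconf("SC_CLK_TCK"))


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python3 qemu_cpu.py <qemu-pid>
    print(f"{sample_pid(int(sys.argv[1])):.1f}% of one core")
```

A value pinned near 100 means one host core (typically one vCPU thread) is spinning; it confirms the symptom but says nothing about where the guest is stuck.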

I just updated to OS 11.1 and it got stuck again, just like with the OS 11.0 update and earlier ones (though I don’t remember exactly for every older update).

After a forced reboot (virsh destroy and virsh start), the HA instance starts up normally and the OS version is updated.

Home Assistant 2023.10.5
Supervisor 2023.10.1
Operating System 11.1

We have similar reports in the GitHub issue "Update from 11.0 to 11.1 does nothing" (home-assistant/operating-system #2870).

If this happens for you on every update, I’d suggest manually downgrading and then upgrading again to confirm it is reproducible. You can downgrade using ha os update --version 11.0. It would be interesting to monitor the console while updating, to see exactly where it gets stuck. The serial console is not the primary console, so you won’t see the systemd startup procedure there. You can make the serial console the primary one by changing console=ttyS0 console=tty1 to console=tty1 console=ttyS0 (i.e. reversing the order) in /mnt/boot/cmdline.txt. With that, you should see the systemd boot messages on the serial console as well.

Before doing all this, I’d suggest making a copy of the image just to be safe :sweat_smile:
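Why reversing the order works: when several console= arguments are given, the Linux kernel uses the last one as /dev/console, which is where systemd writes its status lines. The edit is therefore "move console=ttyS0 to the end". Below is a hypothetical Python helper (make_primary_console is not part of HA OS) that expresses that rule, assuming cmdline.txt holds a single line of space-separated kernel arguments. On the HA OS console itself, editing the file by hand as described is simpler; the sketch only makes the rule explicit.

```python
def make_primary_console(cmdline: str, device: str = "ttyS0") -> str:
    """Move console=<device>[,options] to the end of a kernel command line.

    The kernel treats the LAST console= argument as /dev/console, so the
    moved device becomes the primary console. Other arguments keep their order.
    """
    args = cmdline.split()
    target = [a for a in args
              if a == f"console={device}" or a.startswith(f"console={device},")]
    if not target:
        raise ValueError(f"console={device} not found in command line")
    others = [a for a in args if a not in target]
    return " ".join(others + target)
```

For example, make_primary_console("console=ttyS0 console=tty1") returns "console=tty1 console=ttyS0", the same change suggested above.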

I downgraded and updated again, and it got stuck at "reboot: Restarting system" after the OS update. Here are my steps:

  1. Backed up my HA and copied the image.
  2. Connected to the instance (virsh console hass).
  3. Edited /mnt/boot/cmdline.txt to change the console order to console=tty1 console=ttyS0.
  4. Ran ha os update --version 11.0 successfully; the instance then rebooted itself without issue.
  5. Waited for the OS to boot, then checked the OS version (11.0).
  6. Ran ha os update and waited for the shutdown. As far as I can tell, the instance then got stuck at "reboot: Restarting system" after all the Unmounted and Stopped messages:
...
[  OK  ] Reached target Unmount All Filesystems.
[  OK  ] Stopped File System Check …dev/disk/by-label/hassos-data.
[  OK  ] Removed slice Slice /system/systemd-fsck.
[  OK  ] Stopped target Preparation for Local File Systems.
[  OK  ] Stopped Remount Root and Kernel File Systems.
[  OK  ] Stopped Create Static Device Nodes in /dev.
[  OK  ] Reached target System Shutdown.
[  OK  ] Reached target Late Shutdown Services.
[  OK  ] Finished System Reboot.
[  OK  ] Reached target System Reboot.
[  528.507516] reboot: Restarting system

Full log here
  7. Waited a while longer to make sure it was really stuck (around 30 minutes, while writing this post).
  8. Ran virsh destroy hass and virsh start hass; it then booted successfully into OS 11.1. Full log of booting after update here
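The hang signature in the log above is easy to recognise: the serial console ends on the kernel's "reboot: Restarting system" line and nothing follows. Below is a minimal sketch of checking for that and then applying the same virsh destroy / virsh start workaround, assuming the serial console output is already being captured to a text file (for example via a log file configured on the libvirt serial device; the path you pass in is your own assumption). The function names are illustrative, not an existing tool.

```python
import os
import re
import subprocess
import time

# Kernel message printed right before the guest should reset. In the hang
# described above, it is the last line the serial console ever shows.
REBOOT_LINE = re.compile(r"^(?:\[\s*\d+\.\d+\]\s*)?reboot: Restarting system$")


def stuck_at_reboot(console_text: str) -> bool:
    """True if the last non-empty console line is the kernel reboot message."""
    lines = [ln.strip() for ln in console_text.splitlines() if ln.strip()]
    return bool(lines) and REBOOT_LINE.match(lines[-1]) is not None


def log_idle_for(path: str, min_idle_s: float) -> bool:
    """True if the console log file has not been written for min_idle_s seconds."""
    return time.time() - os.path.getmtime(path) >= min_idle_s


def hard_restart(domain: str = "hass") -> None:
    """The manual workaround from this thread: force the domain off, then start it."""
    subprocess.run(["virsh", "destroy", domain], check=True)
    subprocess.run(["virsh", "start", domain], check=True)


def check_and_recover(log_path: str, domain: str = "hass", min_idle_s: float = 600) -> bool:
    """Restart the domain only if it looks stuck at reboot; return True if restarted."""
    with open(log_path, errors="replace") as f:
        text = f.read()
    if stuck_at_reboot(text) and log_idle_for(log_path, min_idle_s):
        hard_restart(domain)
        return True
    return False
```

A normal reboot also prints this line for a moment, which is why the sketch additionally requires the log to have been idle for a while before acting. It only automates the workaround; it does not address why the reset hangs.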

Updated to OS 11.2, still the same problem.

I have the same issue running on KVM: OS 10.5, Core 2023.11.3, Kubuntu 22.04.

I didn’t have this issue before with the exact same version on the exact same machine. It started after I did a clean install of the host system and moved from Ubuntu 20.04 to 22.04. The same image was migrated over, using an identical .xml definition for the KVM machine.

Did you ever get this fixed? I am running in KVM as well, and have had this issue since I started, around 4-5 years ago.
virsh destroy followed by virsh start is the only option.