Proxmox: restored HA VM gets stuck after some time

Hi

I’m trying to move my home automation stuff from an Intel NUC with a J3455 CPU to a Beelink U59 with an N5195 CPU.
I’ve installed Proxmox 7 from scratch. The difference between the two machines is that now Proxmox sits on an M.2 SATA SSD and the VMs and LXCs live on a separate SSD with ZFS, while before I had a single SSD (no ZFS) holding both Proxmox and the VMs/LXCs. On the NUC I made backups of all my LXCs (AdGuard, Z2M, Node-RED and others) and of my only VM, HA, and restored them on the new machine.

All the LXCs are working fine, but the HA VM gets stuck after a few hours (sometimes one, sometimes more) and the UI is no longer reachable.
The IP address is the same and I haven’t changed anything while restoring.

What can I check?

Thanks

PS

This is what appears in the VM console when it stops working:

[  859.926019] invalid opcode: 0000 [#1] SMP PTI
[  859.927713] CPU: 3 PID: 2884 Comm: python3 Not tainted 5.15.74 #1
[  859.929881] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 0.0.0 02/06/2015
[  859.931671] RIP: 0010:offline_pages+0x105/0x954
[  859.932732] Code: ee 48 89 ef e8 dc 13 4b ff 48 89 44 24 08 48 85 c0 74 cf 48 8b 5c 24 08 8b 43 50 48 89 df 89 44 24 24 e8 4e e1 4a ff e8 f9 0a <46> ff b9 03 00 00 00 4c 89 ee 48 89 ef ba 01 00 00 00 e8 04 08 4e
[  859.936180] RSP: 0000:ffffb1bc42063a30 EFLAGS: 00010892
[  859.937061] RAX: ffffb1bc42063a38 RBX: 0000000000000000 RCX: ffffffff90c010e7
[  859.938227] RDX: 0000000000000000 RSI: 0000000000000002 RDI: ffffb1bc42063a38
[  859.939362] RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000000
[  859.940453] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
[  859.941568] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
[  859.942698] FS:  00007fd96d30fab0(0000) GS:ffffa06835d80000(0000) knlGS:0000000000000000
[  859.943910] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  859.944763] CR2: ffffffff83838482 CR3: 0000000110a4c000 CR4: 00000000000006e0
[  859.945864] Call Trace:
[  859.946331]  <TASK>
[  859.946719]  ? asm_exc_page_fault+0x22/0x30
[  859.947631]  ? native_iret+0x7/0x7
[  859.948388]  ? offline_pages+0x106/0x954
[  859.949249]  ? asm_exc_page_fault+0x22/0x30
[  859.950100]  ? native_iret+0x7/0x7
[  859.950768]  ? asm_sysvec_apic_timer_interrupt+0xb/0x20
[  859.951793]  ? offline_pages+0x106/0x954
[  859.952541]  ? sysvec_apic_timer_interrupt+0xb/0x90
[  859.953788]  ? asm_sysvec_apic_timer_interrupt+0x16/0x20
[  859.954675]  ? clear_page_orig+0x15/0x40
[  859.955415]  ? kernel_init_free_pages.part.0+0x41/0x60
[  859.956466]  ? prep_new_page+0x62/0x80
[  859.957190]  ? get_page_from_freelist+0xa4c/0xc30
[  859.962005]  ? obj_cgroup_charge_pages+0xc2/0x100
[  859.963328]  ? release_pages+0x13f/0x440
[  859.965008]  ? __alloc_pages+0x173/0x300
[  859.966293]  ? alloc_pages_vma+0x6d/0x170
[  859.967443]  ? __mod_lruvec_page_state+0x5b/0xa0
[  859.968643]  ? __handle_mm_fault+0x552/0x990
[  859.969956]  ? handle_mm_fault+0xca/0x2a0
[  859.971082]  ? do_user_addr_fault+0x1be/0x640
[  859.972329]  ? exc_page_fault+0x68/0x140
[  859.973516]  ? asm_exc_page_fault+0x22/0x30
[  859.974620]  </TASK>
[  859.975496] Modules linked in: rfcomm xfrm_user bnep cfg80211 btusb btrtl btbcm btintel virtio_console virtio_balloon
[  859.977741] ---[ end trace 1d7729f7d253e5b3 ]---
[  859.978933] RIP: 0010:offline_pages+0x105/0x954
[  859.980108] Code: ee 48 89 ef e8 dc 13 4b ff 48 89 44 24 08 48 85 c0 74 cf 48 8b 5c 24 08 8b 43 50 48 89 df 89 44 24 24 e8 4e e1 4a ff e8 f9 0a <46> ff b9 03 00 00 00 4c 89 ee 48 89 ef ba 01 00 00 00 e8 04 08 4e
[  859.983865] RSP: 0000:ffffb1bc42063a30 EFLAGS: 00010892
[  859.985189] RAX: ffffb1bc42063a38 RBX: 0000000000000000 RCX: ffffffff90c010e7
[  859.986856] RDX: 0000000000000000 RSI: 0000000000000002 RDI: ffffb1bc42063a38
[  859.988463] RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000000
[  859.990231] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
[  859.991888] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
[  859.993535] FS:  00007fd96d30fab0(0000) GS:ffffa06835d80000(0000) knlGS:0000000000000000
[  859.995429] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  859.997977] CR2: ffffffff83838482 CR3: 0000000110a4c000 CR4: 00000000000006e0

Try creating a new VM and importing only the disk. (Hardware changes are sometimes tricky.)
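In case it’s useful, here is a rough sketch of what that looks like on the Proxmox host shell. The VM ID 101, the storage name local-zfs, the disk path and the scsi0 slot are only placeholders I’ve assumed for the example; adjust them to your setup and to the HAOS guide you normally follow.

  # create an empty VM, no data disk attached yet (HAOS boots via UEFI, hence OVMF)
  qm create 101 --name haos-test --memory 4096 --cores 2 --bios ovmf --net0 virtio,bridge=vmbr0 --scsihw virtio-scsi-pci

  # add an EFI vars disk on the ZFS storage
  qm set 101 --efidisk0 local-zfs:0,efitype=4m

  # import the existing HAOS disk image; the command prints the name of the new volume
  qm importdisk 101 /path/to/haos-disk.qcow2 local-zfs

  # attach the imported volume (use the name printed above) and boot from it
  qm set 101 --scsi0 local-zfs:vm-101-disk-1 --boot order=scsi0

Don’t start the new VM until the disk is attached and the boot order is set.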

Thanks @dMopp

Since I usually use this method to install HA in Proxmox (Installing Home Assistant OS using Proxmox 7), how should I do it this time?

I’ve never imported a disk in Proxmox before; what should I do?

Thanks again

BTW, I don’t want to lose my original VM and disk!

The easiest way: create a new VM and import only the existing disk, as suggested above.

This could help if the problem is not inside the disk.

Another way:

  • Start HA (the one that freezes)
  • Create and download a full backup inside HA
  • Create a completely new VM following the procedure you linked
  • Restore from that backup inside the fresh HA installation

Thanks
I would like to try this, but the broken VM is the target one; I have already created a disk image from the origin VM (the good one).
So I should create a new VM without starting it and then follow Dae’s blog to add the existing virtual disk to it.
Is that correct?

Correct.
But keep an untouched copy as a backup.
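If it helps, keeping that untouched copy can be as simple as a vzdump of the original, working VM before you experiment. The VM ID 100, the target VM ID 102, the storage name and the archive filename are placeholders I’ve assumed for the example:

  # full backup of the original VM while it is stopped
  vzdump 100 --storage local --mode stop --compress zstd

  # if needed later, restore the archive to a fresh VM ID
  qmrestore /var/lib/vz/dump/vzdump-qemu-100-<timestamp>.vma.zst 102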