Upgrade path after 2 years of missed updates...?

Life got fairly busy and my instance of Home Assistant just always ‘worked’ with minimal upkeep or tinkering. Now, with my wife and kid out of town, I have some time to tinker and troubleshoot if things go awry. I think I know how to go about this, but I wanted to check back in with the great contributors who got my installation to the relatively maintenance-free place it’s been.

I am currently running HAOS 10.5 in Proxmox 8.1.4, with HA Core 2023.1.2. The original install was via the amazing scripts by the late tteck, along with his upgrade script from Proxmox v7 to v8…

When I had time, I kept track of the changelogs in a personal wiki, noting anything that affected (or that I thought might affect) my instance, along with all of the great new features I thought I’d use.

I currently run Mosquitto, ESPHome, DuckDNS, & Node-RED as HA add-ons, plus a separate Debian LXC on the same Proxmox machine hosting Docker containers for Zigbee2MQTT (1.32.1), Zwave2MQTT (11.4.2), RTLAMR2MQTT, and my network’s PiHole (2024.02.0 or v5.17.3). I keep off-machine backups from both Proxmox and from within HA and have verified that restore works.

Since I cannot run snapshots (probably due to an unused raw drive in Proxmox that I’m too afraid to touch, or don’t know which one to touch - a separate future issue!), I was thinking of the following as an upgrade path:

  1. Create backup in HA and copy it off the local machine
  2. Create backup of HA-VM in Proxmox and copy off local machine
  3. Backup the Debian LXC with docker containers
  4. Upgrade the Docker Containers
  5. Clone the VM with new storage pool in Proxmox and stop the other HA VM
  6. Start up the cloned HA VM and hit the OS upgrade button, then the Core Update button
  7. Review errors and make appropriate edits
  8. Upgrade HA Add-Ons once HAOS is successfully updated and deal with those changes
  9. Reassign the new HA VM’s NIC to the old VM’s MAC address (so it keeps the static IP), reboot the VM, and archive or destroy the old VM

If anything fails I should always have a restore point. Does this make logical sense, or is there an easier way? I was also thinking of installing a new VM via the script and restoring config and add-ons from backup. Possibly the same end result, but maybe that could also fix my snapshot storage issue?

Your Plan

  1. Backups:
  • You’re right to prioritize backups. Ensure both Proxmox and Home Assistant (HA) backups are tested and validated.
  • Tip: For Proxmox, use the vzdump utility to create backups of both the HA VM and the Debian LXC.
  2. Upgrade Docker Containers:
  • Confirm compatibility between the new versions of Home Assistant Core and your Docker containers (e.g., Zigbee2MQTT, ZWaveJS). Their release notes often mention required minimum versions.
  3. Clone VM to New Storage:
  • Cloning the VM to a new storage pool is a great idea if you suspect your current storage is related to the snapshot issue.
  • After cloning, run qm list in Proxmox to verify the cloned VM’s storage configuration. This might reveal inconsistencies causing the snapshot issues.
  4. HA OS and Core Upgrades:
  • Upgrading HA OS first and then HA Core, in that order, is best practice, as some Core updates rely on specific OS versions.
  5. Network Settings:
  • Ensure your new VM retains the same static IP (by reassigning the MAC address) to avoid reconfiguring your ecosystem.
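The backup step above can be sketched as Proxmox shell commands. VM ID 100, CT ID 101, the storage name, and the backup host are all placeholders, not your actual IDs:

```shell
# Back up the HA VM (ID 100 here) and the Debian LXC (ID 101) with vzdump.
# Since snapshots aren't working on this VM, use suspend or stop mode
# rather than snapshot mode for the VM backup.
vzdump 100 --mode suspend --compress zstd --storage local
vzdump 101 --mode snapshot --compress zstd --storage local

# Copy the dump archives off-machine (hostname and paths are examples)
scp /var/lib/vz/dump/vzdump-qemu-100-*.vma.zst backup-host:/backups/
scp /var/lib/vz/dump/vzdump-lxc-101-*.tar.zst backup-host:/backups/
```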

Potential Improvements to Your Plan

  1. Test Snapshot Functionality:
  • Before beginning, investigate why snapshots aren’t working.
  • Run the following commands to identify issues with unused raw drives or storage configurations:
qm config <vmid>
pvs  # Check physical volumes
lvs  # Check logical volumes
  • If snapshots are failing because of a raw disk or incompatible storage type, converting the disk to a compatible format (like qcow2) might resolve this.
  2. Use the New VM + Restore Option:
  • Installing a fresh Home Assistant OS via the script and restoring your configuration from backups is a cleaner alternative that may also resolve long-standing issues (like the snapshot storage problem).
  • Steps:
    1. Create a new VM using tteck’s Proxmox script.
    2. Restore the backup in HA.
    3. Test snapshots on this fresh setup before committing.
  3. Parallel Testing Environment:
  • If you have spare resources, consider setting up the new HA VM in parallel, upgrading it, and testing it against backups.
  • This avoids downtime for your existing installation and lets you validate changes in a sandbox.
  4. Monitor Resource Utilization:
  • Since you are hosting multiple services (e.g., Zigbee2MQTT, Zwave2MQTT, PiHole) in an LXC, confirm Proxmox resource utilization is balanced.
  • Use Proxmox’s built-in monitoring tools (pveperf and the UI graphs) to check for CPU or storage bottlenecks.
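To see why snapshots fail, it also helps to check what each Proxmox storage actually supports: raw disks on plain LVM or plain directory storage cannot snapshot, while LVM-thin, ZFS, and qcow2-on-directory storage can. A quick survey (VM ID 100 is a placeholder):

```shell
# List storages with their types and free space
pvesm status

# Show the full storage configuration, including allowed content types
cat /etc/pve/storage.cfg

# See which disks the VM uses and which storage each one lives on
qm config 100 | grep -E 'scsi|sata|ide|virtio|efidisk'
```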

Addressing Snapshot Issues

  • Identify the Issue: Likely causes include incompatible storage types (raw disks on plain LVM or directory storage vs. qcow2 or LVM-thin), insufficient storage space, or configuration issues.
  • Fix:
    • Convert a raw disk to qcow2 (note: qcow2 requires directory-backed storage; on LVM-thin, snapshots work with raw disks natively):
qm stop <vmid>
qemu-img convert -f raw -O qcow2 /path/to/raw-disk.img /path/to/new-disk.qcow2
qm importdisk <vmid> /path/to/new-disk.qcow2 <storage>
qm set <vmid> --scsi0 <storage>:<imported-volume-name>
qm start <vmid>
  • Verify snapshot functionality with qm snapshot <vmid> <snapshot-name>.
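Once the disk format is sorted out, the full snapshot lifecycle around an upgrade looks like this (VM ID and snapshot name are placeholders):

```shell
qm snapshot 100 pre-upgrade      # create a snapshot named "pre-upgrade"
qm listsnapshot 100              # confirm it exists
qm rollback 100 pre-upgrade      # roll back if the upgrade goes wrong
qm delsnapshot 100 pre-upgrade   # clean up once the upgrade is confirmed good
```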

Key Considerations for Your Setup

  1. Update Add-Ons Last: Upgrade HA add-ons after HA Core and OS are stable. New versions might not work on outdated HA Core.
  2. Monitor Logs: After each step, monitor the logs (supervisor, core, and add-ons) for errors.
  3. Document Changes: Keep track of any manual edits or fixes to replicate if a rollback is necessary.

Alternative Upgrade Flow

  • Minimal Impact Approach:
    1. Create backups of HA, Proxmox VM, and Debian LXC.
    2. Install a new HA VM with tteck’s script and restore your current setup from backups.
    3. Validate functionality (snapshots, add-ons, Docker containers, network).
    4. If successful, switch over by assigning the new VM’s MAC address to match the old one.
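The final switchover step above might look like this on the Proxmox CLI. VM IDs (100 old, 110 new), the bridge name, and the MAC address are all placeholders:

```shell
# Read the old VM's NIC config to find its MAC address
qm config 100 | grep net0

# Stop the old VM first so the MAC is never active twice on the network
qm stop 100

# Give the new VM's net0 the old MAC, keeping the virtio model and vmbr0 bridge
qm set 110 --net0 virtio=AA:BB:CC:DD:EE:FF,bridge=vmbr0
qm reboot 110
```

With the MAC carried over, an existing DHCP reservation or static lease keeps resolving to the same IP without touching the router.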




Thanks for this organized and concise response! I think I will proceed with a parallel new install from the script, and once everything is confirmed working I will reassign the MAC address for the assigned static IP. Hopefully, by purging some of those old storage drives from the old install, I can use snapshots with the new one! Double whammy! I’ll let you know how it goes - will probably tackle it this weekend.

Update on…updates so far-
After re-evaluating my system and the recommendations, I did a hybrid approach with additional measures, and had some lessons learned.

I created all the backups except the Proxmox backup of the HA VM (because I didn’t read my own instructions) - just the ones from within HA itself. When I created the new HA environment from the script, it stopped my existing HA instance, which I didn’t expect. I thought I could run the two in parallel, with separate IP addresses, one as dev and the other as live. But I let the onboarding and updates run; I had to manually reboot several times (after 40+ min), but everything came back alive with new recommendations. Remember to exclude Core and OS from the backup restore, unlike me - it probably saves another hour of unnecessary re-updates!!

My new HA instance is working, with a few errors related to YAML entries that have long been deprecated according to my notes from the changelogs, so everything was expected - and the recommended Repairs are a very welcome feature. I was amazed at the onboarding and how far HA has come since my initial onboarding in ~0.6X-ish.

Z2M and ZwaveJS updated and restarted on both systems without fail. ZwaveJS did take some tinkering with the docker-compose file, but it is up and running and cooperating with both versions after a few restarts. I created new Docker containers for each, with the version in the container name, so I always had the old one to revert to if things went awry.
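Keeping the version in the container name, as described above, can be done with plain docker run. The image tags, device path, and port mappings below are illustrative guesses at a typical Zwave-JS-UI setup, not the exact ones I used:

```shell
# Pull a specific Zwave-JS-UI release and run it under a versioned name,
# so the previous container can simply be restarted if the new one misbehaves
docker pull zwavejs/zwave-js-ui:9.14.0
docker stop zwavejs-8.x || true
docker run -d --name zwavejs-9.14.0 \
  --device=/dev/serial/by-id/your-zwave-stick:/dev/zwave \
  -v zwavejs-data:/usr/src/app/store \
  -p 8091:8091 -p 3000:3000 \
  zwavejs/zwave-js-ui:9.14.0
```

Because both containers share the same named volume for the store, rolling back is just stopping the new container and starting the old one.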

Overall, automations from Node-RED across the two versions are largely unaffected, which is welcome news! I’ll spend the next few nights repairing (or upgrading) the dashboards and troubleshooting the remaining bugs, then the next year tackling the minor ones while hopefully staying more on top of the upgrades. I am also planning on installing Spook to help me find lost relics in my install.

Snapshots are working, but my LVM storage in Proxmox was giving overallocation messages, so I figured out how to move my qcow2 storage to a different, larger drive and all is well.
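The disk move described above can be done live from the Proxmox CLI as well as the UI. VM ID, disk slot, and the target storage name are placeholders; on older Proxmox releases the subcommand is spelled qm move_disk instead:

```shell
# Move scsi0 of VM 100 to a roomier storage, keeping qcow2 format,
# and delete the source copy once the move completes successfully
qm disk move 100 scsi0 big-storage --format qcow2 --delete
```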

I reinstalled, restored, and updated many add-ons, integrations, and HACS items. There are still some errors being thrown, but overall functionality and automations seem to be firing as expected. Spook was very helpful in finding lost entities, and I’m still cleaning those up. I kept a log of all my changes in case something went wrong.

All-in-all, I am pleased and kudos to the developers and contributors for bringing HA a long way in 2 years!!


If anyone can suggest an easy way to run a dual setup - a dev HAOS instance alongside the live one - please let us know.