Help troubleshooting slow startup times for HAOS on a VM?

I have HAOS installed in a Proxmox VM on an x86 Xeon server. It has 4 GB of RAM that can balloon to 8 GB, 6 cores allocated, and storage backed by an SSD ZFS array. It’s an EFI VM, and the install is about two years old.

Over the last two months or so I’ve been having a lot of issues with slow boots, often having to jump into the emergency console. I’ve been gathering diagnostics and logs (and disabling integrations that no longer work), but it’s not improving things.

Currently, boot is getting stuck at “A start job is running for HAOS swap” and I’ve seen that run for over 30 minutes:

Once it gets through that, I hit other long-running start jobs, like “A start job is running for Docker Application Container Engine” and “Cleanup of Temporary Directories”.
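To try to pin down which units are actually eating the time, I’ve been poking around from the host shell (typing “login” at the HA CLI prompt on the console) with roughly the following; I’m not sure systemd-analyze is even included in the HAOS image, so treat this as a sketch of what I’m trying:

```
# From the HAOS host shell (type `login` at the HA CLI prompt on the console).
# Show the slowest units from the current boot; systemd-analyze may not be
# shipped in the HAOS image, but journalctl definitely is.
systemd-analyze blame | head -n 20

# Find the swap-related units (I don't know the exact unit name behind the
# "HAOS swap" start job, so list and grep first), then read their logs.
systemctl list-units --all --type=service,swap | grep -i swap
journalctl -b -u '*swap*' --no-pager
```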

I’ve done some research but haven’t come up with any solutions:
No solution, thread locked, lots of talk about flashing an SD card for a Pi install: "A start job is running for HAOS swap" - takes forever · Issue #3182 · home-assistant/operating-system · GitHub
Same issue here, no specific help for my case:
USB install: HAOS swap job running forever - #8 by david65536

In general, I’m just getting extreme slowness and boot failures, and I don’t think throwing more resources at this is the fix. For example, right now it booted to the console but I have no web UI; if I run “core logs” I see that Home Assistant Core exited, and if I start it, I get this:

I’ve been through lots of threads, but feel like everything I try just leads to more problems. I’ve reverted to a backup several times during this troubleshooting when a change didn’t help. I could always throw more resources at it, but I don’t think that’s the fix; I monitor the VM with Zabbix and don’t see any resource bottlenecks:

Do you see the same behaviour with a fresh (very basic) HAOS VM as well, or only with your existing one?

I have not gone the route of making a new one yet; I’d like to try to fix the old one first.

My suggestion wasn’t to replace your existing VM (yet), but just to temporarily test how an untampered HAOS VM behaves on your hypervisor (minimal disk/RAM, only the most basic configuration, just enough to test a few reboots).
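If it helps, something along these lines on the Proxmox host should give you a throwaway test VM. Treat it as a rough sketch: the VM ID (999), the storage name (local-zfs) and the HAOS release number are just placeholders for your environment.

```
# On the Proxmox host. VM ID 999, storage "local-zfs" and the HAOS release
# number are placeholders -- adjust them for your setup.
wget https://github.com/home-assistant/operating-system/releases/download/13.2/haos_ova-13.2.qcow2.xz
unxz haos_ova-13.2.qcow2.xz

# Minimal EFI VM: 2 GB RAM, 2 cores, OVMF firmware, secure boot keys disabled.
qm create 999 --name haos-test --memory 2048 --cores 2 --bios ovmf \
  --efidisk0 local-zfs:0,efitype=4m,pre-enrolled-keys=0 \
  --net0 virtio,bridge=vmbr0 --scsihw virtio-scsi-pci

# Import the HAOS image and attach it as the boot disk. The imported disk
# name can differ -- check `qm config 999` before setting scsi0.
qm importdisk 999 haos_ova-13.2.qcow2 local-zfs
qm set 999 --scsi0 local-zfs:vm-999-disk-1 --boot order=scsi0
qm start 999
```

A few reboots of that VM should tell you whether the slow start jobs follow the hypervisor/storage or your existing installation.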

What is HAOS swap? Is it truly required? Is it just like Linux swap? Is it something targeted at HAOS running on a Pi, which wouldn’t apply to me in a VM where I can essentially add RAM to avoid swapping?
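In the meantime I’m going to look at what swap is actually configured from the host shell with something like this (just standard Linux views of swap, nothing HAOS-specific, so they should work either way):

```
# From the HAOS host shell: what swap devices/files exist and how big they are.
cat /proc/swaps
grep -i swap /proc/meminfo

# How aggressively the kernel swaps (the usual Linux default is 60; HAOS may
# set its own value).
cat /proc/sys/vm/swappiness
```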

One thing I’m seeing on bootup is a ton of MQTT errors. I had a bunch of Wyze cameras that I’ve since replaced, and all their devices and entities are long deleted, but when I check “ha core logs” at boot I get a bunch of entries like this:

2025-10-25 15:56:58.662 ERROR (MainThread) [homeassistant.components.mqtt.entity] Error 'Single-level wildcard must occupy an entire level of the filter for dictionary value @ data['state_topic']' when processing MQTT discovery message topic: 'homeassistant/switch/sr66-wyze-04/night_mode/config', message: '{'unique_id': '2caa8eee3eec-night-mode', 'device': {'identifiers': '2caa8eee3eec', 'connections': [['mac', '2c:aa:8e:ee:3e:ec']], 'manufacturer': 'Xiaomi', 'model': 'Dafang', 'sw_version': 'null - master - null', 'name': 'sr66-wyze-04'}, 'icon': 'mdi:weather-night', 'state_topic': 'Above+Garage/sr66-wyze-04/night_mode', 'command_topic': 'Above+Garage/sr66-wyze-04/night_mode/set', 'name': 'sr66-wyze-04 Night Mode'}'

I can’t figure out where this is coming from; I’ve searched for “wyze” in entities, devices, MQTT, everything, and they’re totally purged.

This does sound like a hardware failure.
Anything in the Proxmox logs?

My guess is that one of your disks has an error in track zero, which is the track used for the file tables.
The system will then recreate the missing data from the other disks, but this takes time on a system without a dedicated RAID controller.

I’m extremely confident it is not a disk corruption issue. This Proxmox setup is backed by a TrueNAS LUN on a ZFS array that has error checking and scrubbing enabled, and all disks are showing clean.
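For reference, this is roughly what I’m checking on the TrueNAS side (“tank” and the device node are just examples):

```
# On the TrueNAS shell: overall pool health and the result of the last scrub
# ("tank" is an example pool name).
zpool status -v tank

# SMART health of one member disk (adjust the device node per disk).
smartctl -a /dev/sda
```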

Shut down the old VM. Copy the VM in Proxmox. Bring up the new VM. Do you experience the same issues?
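Roughly like this on the Proxmox host; the VM IDs are examples, and I’d use a full clone so the copy doesn’t share the original’s disks:

```
# On the Proxmox host. 100 is the existing HAOS VM, 101 the copy -- both IDs
# are examples.
qm stop 100
qm clone 100 101 --name haos-copy --full
qm start 101
```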

The only thing I can then come up with is if you have selected dynamic disk sizing.

For these errors, it sounds like you could do with looking into MQTT Explorer. Read the documentation and such, but essentially it connects to your broker (presumably the one the integration uses), lists all your MQTT topics and devices, and lets you easily filter and delete old, redundant entries. Those discovery configs are retained messages stored on the broker itself, which is why they keep coming back even though the devices are long gone from Home Assistant; once deleted from the broker they stay deleted, or get repopulated with updated entities/IDs if the device is still active.

Do your research first and watch a few videos of it in action, but if nothing else it should give you a starting point for reducing the MQTT errors you’re getting.
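If you’d rather start from a shell, the mosquitto command-line clients can do the same thing; this is only a sketch, and the broker host and credentials are placeholders:

```
# List everything the broker has retained under the discovery prefix
# (host, user and password are placeholders for your broker).
mosquitto_sub -h broker.local -u mqttuser -P 'secret' \
  -t 'homeassistant/#' -v --retained-only

# Clear one stale discovery topic by publishing an empty retained message;
# repeat for each dead wyze topic found above (this one is from your log).
mosquitto_pub -h broker.local -u mqttuser -P 'secret' \
  -t 'homeassistant/switch/sr66-wyze-04/night_mode/config' -r -n
```

Clearing a retained config topic also tells Home Assistant’s MQTT discovery to drop the corresponding entity, so the errors should stop reappearing on reboot.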

Otherwise, as others have said, try a new VM to see if its behaviour is comparable; if it’s just as slow, it may be a hardware issue.

This was great advice, and I think that cleared out the MQTT artifacts; I had a ton of dead devices in there. Gonna try a reboot, then I’ll be cloning and trying that.

Edit: It just rebooted and was up and running in less than 30 seconds. This made a huge difference, and at this point I consider the issue resolved, though I’m still doing some testing; no matter what, this was a monumental improvement.


Very pleased to have been of some assistance. Thank you for taking the time to report back, and for the kind words!