HAOS crashes at automatic backup time

Hi,
For a few weeks now (since the end of January or beginning of February, I’m not sure), HAOS has occasionally been crashing. It has happened 3-4 times so far, and when it does it’s always around 5:15-5:45. To me, this looks like the window of the new automatic backup system.

Last time it happened was this morning.
When I look at the logs, the last entry before those of the boot process is indeed the start of the backup process (5:18).

There seems to be nothing more in the logs that could show me in more detail what was executed at 5:18.

How can I know for sure what happened?

N.B. This install is two years old. The system has 6 GB of RAM and runs as a VM on Proxmox. At least HAOS now reboots automatically, thanks to a script on the Proxmox host that I set up because of the previous crashes.
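
The watchdog is something along these lines, just to give the idea (this is a sketch with a placeholder VM ID and URL, not my exact script):

#!/bin/bash
# Sketch of a watchdog run from cron on the Proxmox host; VMID and URL are placeholders.
VMID=100
HA_URL="http://haos.local:8123"
# If HA does not answer within 10 s and the VM is no longer running, start it again.
if ! curl -sf -m 10 "$HA_URL" >/dev/null; then
    qm status "$VMID" | grep -q running || qm start "$VMID"
fi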

2025-02-16 05:16:42.533 INFO (MainThread) [hass_nabucasa.google_report_state] Timeout while waiting to receive message
2025-02-16 05:16:51.341 INFO (SyncWorker_11) [pylast] user.getInfo
2025-02-16 05:16:51.462 INFO (SyncWorker_11) [pylast] user.getInfo
2025-02-16 05:16:51.585 INFO (SyncWorker_11) [pylast] user.getRecentTracks
2025-02-16 05:16:51.998 INFO (SyncWorker_11) [pylast] user.getTopTracks
2025-02-16 05:16:52.350 INFO (SyncWorker_11) [pylast] user.getRecentTracks
2025-02-16 05:17:22.340 INFO (SyncWorker_7) [pylast] user.getInfo
2025-02-16 05:17:22.675 INFO (SyncWorker_7) [pylast] user.getInfo
2025-02-16 05:17:22.827 INFO (SyncWorker_7) [pylast] user.getRecentTracks
2025-02-16 05:17:22.982 INFO (SyncWorker_7) [pylast] user.getTopTracks
2025-02-16 05:17:23.244 INFO (SyncWorker_7) [pylast] user.getRecentTracks
2025-02-16 05:17:39.132 INFO (MainThread) [hass_nabucasa.google_report_state] Timeout while waiting to receive message
2025-02-16 05:17:53.340 INFO (SyncWorker_14) [pylast] user.getInfo
2025-02-16 05:17:53.862 INFO (SyncWorker_14) [pylast] user.getInfo
2025-02-16 05:17:53.980 INFO (SyncWorker_14) [pylast] user.getRecentTracks
2025-02-16 05:17:54.972 INFO (SyncWorker_14) [pylast] user.getTopTracks
2025-02-16 05:17:55.197 INFO (SyncWorker_14) [pylast] user.getRecentTracks
2025-02-16 05:18:06.006 INFO (MainThread) [homeassistant.components.recorder.backup] Backup start notification, locking database for writes
s6-rc: info: service s6rc-oneshot-runner: starting
s6-rc: info: service s6rc-oneshot-runner successfully started
s6-rc: info: service fix-attrs: starting
s6-rc: info: service fix-attrs successfully started
s6-rc: info: service legacy-cont-init: starting
s6-rc: info: service legacy-cont-init successfully started
s6-rc: info: service legacy-services: starting
services-up: info: copying legacy longrun home-assistant (no readiness notification)
s6-rc: info: service legacy-services successfully started
2025-02-16 05:20:22.146 WARNING (SyncWorker_0) [homeassistant.loader] We found a custom integration ms365_todo which has not been tested by Home Assistant. This component might cause stability problems, be sure to disable it if you experience issues with Home Assistant
2025-02-16 05:20:22.148 WARNING (SyncWorker_0) [homeassistant.loader] We found a custom integration adaptive_lighting which has not been tested by Home Assistant. This component might cause stability problems, be sure to disable it if you experience issues with Home Assistant
2025-02-16 05:20:22.151 WARNING (SyncWorker_0) [homeassistant.loader] We found a custom integration plant which has not been tested by Home Assistant. This component might cause stability problems, be sure to disable it if you experience issues with Home Assistant
2025-02-16 05:20:22.152 WARNING (SyncWorker_0) [homeassistant.loader] We found a custom integration spook which has not been tested by Home Assistant. This component might cause stability problems, be sure to disable it if you experience issues with Home Assistant
2025-02-16 05:20:22.153 WARNING (SyncWorker_0) [homeassistant.loader] We found a custom integration spook_inverse which has not been tested by Home Assistant. This component might cause stability problems, be sure to disable it if you experience issues with Home Assistant
2025-02-16 05:20:22.154 WARNING (SyncWorker_0) [homeassistant.loader] We found a custom integration v6_quietdrift which has not been tested by Home Assistant. This component might cause stability problems, be sure to disable it if you experience issues with Home Assistant
2025-02-16 05:20:22.156 WARNING (SyncWorker_0) [homeassistant.loader] We found a custom integration battery_notes which has not been tested by Home Assistant. This component might cause stability problems, be sure to disable it if you experience issues with Home Assistant
2025-02-16 05:20:22.157 WARNING (SyncWorker_0) [homeassistant.loader] We found a custom integration growcube which has not been tested by Home Assistant. This component might cause stability problems, be sure to disable it if you experience issues with Home Assistant
2025-02-16 05:20:22.158 WARNING (SyncWorker_0) [homeassistant.loader] We found a custom integration tapo_control which has not been tested by Home Assistant. This component might cause stability problems, be sure to disable it if you experience issues with Home Assistant
2025-02-16 05:20:22.160 WARNING (SyncWorker_0) [homeassistant.loader] We found a custom integration google_home which has not been tested by Home Assistant. This component might cause stability problems, be sure to disable it if you experience issues with Home Assistant
2025-02-16 05:20:22.162 WARNING (SyncWorker_0) [homeassistant.loader] We found a custom integration hacs which has not been tested by Home Assistant. This component might cause stability problems, be sure to disable it if you experience issues with Home Assistant
2025-02-16 05:20:22.163 WARNING (SyncWorker_0) [homeassistant.loader] We found a custom integration openplantbook which has not been tested by Home Assistant. This component might cause stability problems, be sure to disable it if you experience issues with Home Assistant
2025-02-16 05:20:22.164 WARNING (SyncWorker_0) [homeassistant.loader] We found a custom integration smartir which has not been tested by Home Assistant. This component might cause stability problems, be sure to disable it if you experience issues with Home Assistant
2025-02-16 05:20:22.165 WARNING (SyncWorker_0) [homeassistant.loader] We found a custom integration dawarich which has not been tested by Home Assistant. This component might cause stability problems, be sure to disable it if you experience issues with Home Assistant
2025-02-16 05:20:22.167 WARNING (SyncWorker_0) [homeassistant.loader] We found a custom integration webrtc which has not been tested by Home Assistant. This component might cause stability problems, be sure to disable it if you experience issues with Home Assistant
2025-02-16 05:20:22.168 WARNING (SyncWorker_0) [homeassistant.loader] We found a custom integration hilo which has not been tested by Home Assistant. This component might cause stability problems, be sure to disable it if you experience issues with Home Assistant
2025-02-16 05:20:22.169 WARNING (SyncWorker_0) [homeassistant.loader] We found a custom integration lastfm_scrobbler which has not been tested by Home Assistant. This component might cause stability problems, be sure to disable it if you experience issues with Home Assistant
2025-02-16 05:20:23.256 INFO (MainThread) [homeassistant.setup] Setup of domain system_log took 0.17 seconds
2025-02-16 05:20:23.256 INFO (MainThread) [homeassistant.setup] Setup of domain logger took 0.17 seconds
2025-02-16 05:20:23.256 INFO (MainThread) [homeassistant.bootstrap] Setting up frontend: {'backup', 'frontend', 'hassio'}
2025-02-16 05:20:23.256 INFO (MainThread) [homeassistant.setup] Setting up http
2025-02-16 05:20:23.257 INFO (MainThread) [homeassistant.setup] Setting up device_automation
2025-02-16 05:20:23.257 INFO (MainThread) [homeassistant.setup] Setup of domain device_automation took 0.00 seconds
2025-02-16 05:20:23.271 INFO (MainThread) [homeassistant.setup] Setup of domain http took 0.01 seconds
2025-02-16 05:20:23.271 INFO (MainThread) [homeassistant.setup] Setting up websocket_api
2025-02-16 05:20:23.271 INFO (MainThread) [homeassistant.setup] Setup of domain websocket_api took 0.00 seconds
2025-02-16 05:20:23.271 INFO (MainThread) [homeassistant.setup] Setting up api
2025-02-16 05:20:23.272 INFO (MainThread) [homeassistant.setup] Setup of domain api took 0.00 seconds
2025-02-16 05:20:23.272 INFO (MainThread) [homeassistant.setup] Setting up auth
2025-02-16 05:20:23.273 INFO (MainThread) [homeassistant.setup] Setup of domain auth took 0.00 seconds
2025-02-16 05:20:23.273 INFO (MainThread) [homeassistant.setup] Setting up config
2025-02-16 05:20:23.275 INFO (MainThread) [homeassistant.setup] Setup of domain config took 0.00 seconds
2025-02-16 05:20:23.275 INFO (MainThread) [homeassistant.setup] Setting up diagnostics
2025-02-16 05:20:23.275 INFO (MainThread) [homeassistant.setup] Setup of domain diagnostics took 0.00 seconds
2025-02-16 05:20:23.275 INFO (MainThread) [homeassistant.setup] Setting up file_upload
2025-02-16 05:20:23.276 INFO (MainThread) [homeassistant.setup] Setup of domain file_upload took 0.00 seconds
2025-02-16 05:20:23.276 INFO (MainThread) [homeassistant.setup] Setting up image_upload
2025-02-16 05:20:23.276 INFO (MainThread) [homeassistant.setup] Setting up repairs
2025-02-16 05:20:23.276 INFO (MainThread) [homeassistant.setup] Setup of domain repairs took 0.00 seconds
2025-02-16 05:20:23.276 INFO (MainThread) [homeassistant.setup] Setting up search
2025-02-16 05:20:23.277 INFO (MainThread) [homeassistant.setup] Setup of domain search took 0.00 seconds
2025-02-16 05:20:23.277 INFO (MainThread) [homeassistant.setup] Setting up hassio
2025-02-16 05:20:23.281 INFO (MainThread) [homeassistant.setup] Setup of domain image_upload took 0.01 seconds
2025-02-16 05:20:23.282 INFO (MainThread) [homeassistant.setup] Setting up person
2025-02-16 05:20:23.284 INFO (MainThread) [homeassistant.setup] Setup of domain person took 0.00 seconds
2025-02-16 05:20:23.318 INFO (MainThread) [homeassistant.setup] Setup of domain hassio took 0.04 seconds
2025-02-16 05:20:23.347 INFO (MainThread) [homeassistant.setup] Setting up binary_sensor
2025-02-16 05:20:23.347 INFO (MainThread) [homeassistant.setup] Setup of domain binary_sensor took 0.00 seconds
2025-02-16 05:20:23.347 INFO (MainThread) [homeassistant.components.binary_sensor] Setting up hassio.binary_sensor
2025-02-16 05:20:23.348 INFO (MainThread) [homeassistant.setup] Setting up sensor
2025-02-16 05:20:23.348 INFO (MainThread) [homeassistant.setup] Setup of domain sensor took 0.00 seconds
2025-02-16 05:20:23.348 INFO (MainThread) [homeassistant.components.sensor] Setting up hassio.sensor
2025-02-16 05:20:23.352 INFO (MainThread) [homeassistant.setup] Setting up update
2025-02-16 05:20:23.353 INFO (MainThread) [homeassistant.setup] Setup of domain update took 0.00 seconds
2025-02-16 05:20:23.353 INFO (MainThread) [homeassistant.components.update] Setting up hassio.update
2025-02-16 05:20:23.356 INFO (MainThread) [homeassistant.setup] Setting up backup
2025-02-16 05:20:23.357 INFO (MainThread) [homeassistant.setup] Setup of domain backup took 0.00 seconds
2025-02-16 05:20:23.358 INFO (MainThread) [homeassistant.setup] Setting up onboarding
2025-02-16 05:20:23.358 INFO (MainThread) [homeassistant.setup] Setup of domain onboarding took 0.00 seconds
2025-02-16 05:20:23.358 INFO (MainThread) [homeassistant.setup] Setting up lovelace
2025-02-16 05:20:23.360 INFO (MainThread) [homeassistant.setup] Setup of domain lovelace took 0.00 seconds
2025-02-16 05:20:23.360 INFO (MainThread) [homeassistant.setup] Setting up frontend
2025-02-16 05:20:23.365 INFO (MainThread) [homeassistant.setup] Setup of domain frontend took 0.01 seconds
2025-02-16 05:20:23.365 INFO (MainThread) [homeassistant.bootstrap] Setting up recorder: {'recorder'}
2025-02-16 05:20:23.366 INFO (MainThread) [homeassistant.setup] Setting up recorder
2025-02-16 05:20:23.368 INFO (MainThread) [homeassistant.components.http] Now listening on port 8123
2025-02-16 05:20:23.369 INFO (MainThread) [homeassistant.setup] Setup of domain http took 0.00 seconds

The homeassistant.log.1 in the config folder is the log from the previous run, i.e. the one up to the reboot.
At reboot, the old homeassistant.log is renamed with a .1 suffix and a new homeassistant.log is started.

Two additional lines were in that file:

2025-02-16 05:18:14.379 INFO (MainThread) [homeassistant.components.recorder.backup] Backup end notification, releasing write lock
2025-02-16 05:18:14.381 INFO (Recorder) [homeassistant.components.recorder.core] Database queue backlog reached 7 entries during backup

None of the log messages you posted give a hint about the cause.

The backup should start at 4:15 local time, so it might be that it had finished and then needed to be copied somewhere, maybe?

The homeassistant.log.1 should also contain a lot more lines than those two.

Mine always run between 5:15 and 5:45, and that comes from HA’s default ‘preferred’ option, which says backups will happen between ‘04:45 and 05:45’.

BTW, homeassistant.log.1 had the lines I previously posted (up until 5:18) plus the two others I just shared.

With the same timestamps???

2025-02-16 05:16:42.533 INFO (MainThread) [hass_nabucasa.google_report_state] Timeout while waiting to receive message
2025-02-16 05:16:51.341 INFO (SyncWorker_11) [pylast] user.getInfo
2025-02-16 05:16:51.462 INFO (SyncWorker_11) [pylast] user.getInfo
2025-02-16 05:16:51.585 INFO (SyncWorker_11) [pylast] user.getRecentTracks
2025-02-16 05:16:51.998 INFO (SyncWorker_11) [pylast] user.getTopTracks
2025-02-16 05:16:52.350 INFO (SyncWorker_11) [pylast] user.getRecentTracks
2025-02-16 05:17:22.340 INFO (SyncWorker_7) [pylast] user.getInfo
2025-02-16 05:17:22.675 INFO (SyncWorker_7) [pylast] user.getInfo
2025-02-16 05:17:22.827 INFO (SyncWorker_7) [pylast] user.getRecentTracks
2025-02-16 05:17:22.982 INFO (SyncWorker_7) [pylast] user.getTopTracks
2025-02-16 05:17:23.244 INFO (SyncWorker_7) [pylast] user.getRecentTracks
2025-02-16 05:17:39.132 INFO (MainThread) [hass_nabucasa.google_report_state] Timeout while waiting to receive message
2025-02-16 05:17:53.340 INFO (SyncWorker_14) [pylast] user.getInfo
2025-02-16 05:17:53.862 INFO (SyncWorker_14) [pylast] user.getInfo
2025-02-16 05:17:53.980 INFO (SyncWorker_14) [pylast] user.getRecentTracks
2025-02-16 05:17:54.972 INFO (SyncWorker_14) [pylast] user.getTopTracks
2025-02-16 05:17:55.197 INFO (SyncWorker_14) [pylast] user.getRecentTracks
2025-02-16 05:18:06.006 INFO (MainThread) [homeassistant.components.recorder.backup] Backup start notification, locking database for writes
2025-02-16 05:18:14.379 INFO (MainThread) [homeassistant.components.recorder.backup] Backup end notification, releasing write lock
2025-02-16 05:18:14.381 INFO (Recorder) [homeassistant.components.recorder.core] Database queue backlog reached 7 entries during backup

Either you have restarted between the two extracts or you are looking at the wrong file.
Those two files cannot be identical.

Extract #1 is from the HA web UI; extracts #2 and #3 are from reading homeassistant.log.1 on the VM. And yes, they are the right files.

Then you have restarted your system between the first and second extracts, which means you have lost the log from the run leading up to the crash.

You will have to wait for it to happen again then.

I don’t understand what you mean. What is it that you think is missing?

The log in the GUI is the same as the content of homeassistant.log.
The text in homeassistant.log.1 is the content of homeassistant.log from before the last restart of HA.

You need both to diagnose a crash issue, because one tells you what happened up to the crash and the other tells you what happened after the system came up again.

Yeah, obviously. I do have both. Do you also want to see homeassistant.log?
Its content – at least its first few lines – is the same as what you can see in extract #1 from 5:20, which is when the VM rebooted.

The logs I posted in the first post are just an extract of what was recorded around the crash at 5:18 and the reboot at 5:20. I do of course have the thousands of other lines in the respective log files.

BTW, my point with this thread was to show that, except for the last three lines of homeassistant.log.1 at 5:18, there seems to be nothing indicative in the logs, and to ask where/how I could find more info elsewhere.

Try to open both side by side.
They should start with somewhat similar text, but not the same timestamps.

The last entry in homeassistant.log.1 should be from just before the first one in homeassistant.log.
If you did not restart HA Core yourself, then the crash should be in the gap between those two logs.
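
For example, from the config folder:

# last lines of the previous run
tail -n 3 homeassistant.log.1
# first lines of the current run
head -n 3 homeassistant.log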

Pointing you somewhere for help will be to no avail, because everyone will need those two logs, especially homeassistant.log.1, to deduce what steps to take from there.

I just fixed a similar issue yesterday, though not on Proxmox but on an Ubuntu 24.10 N100 host.

I had to convert a disk from qcow2 to raw. First I flattened the snapshot chain into a single qcow2 image:

 qemu-img convert -p -O qcow2 snap1 merged.qcow2

where snap1 is a symlink to the snapshot (because qemu-img didn’t like a colon in the snapshot name…). Then I converted the flattened merged.qcow2 to raw with falloc preallocation:

qemu-img convert -o preallocation=falloc -p -O raw merged.qcow2 debian12-hass-falloc-raw.img

I could probably have done it in one go if I had known this was the fix, but this is the combination that finally worked. The 1-minute load average dropped from 15 to 1 during backups and recorder purges (4-6 am).
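
In hindsight, the one-step equivalent would presumably have been something like this (untested on my side, since I only ever ran the two-step version):

qemu-img convert -p -O raw -o preallocation=falloc snap1 debian12-hass-falloc-raw.img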

I also set the disk cache mode to none instead of the default (IIRC writeback) in the disk settings, and tweaked the vm.dirty_ratio sysctl down to a very low number (try 5 or 7) on both host and guest.
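
For reference, the sysctl change can be applied and persisted like this (the file name under /etc/sysctl.d is arbitrary; 5 is the low end of what I suggested):

# apply immediately
sysctl -w vm.dirty_ratio=5
# persist across reboots
echo "vm.dirty_ratio = 5" > /etc/sysctl.d/99-dirty-ratio.conf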

Hi, and thank you for your response.

My host uses ZFS, so I understand the guest’s drive is more like a mounted partition than a file. Thus, I suppose it’s already closer to some kind of raw format than qcow2.

Cache was already set to none.

I’ll look into the vm.dirty_ratio thing; I’d never heard of that before.


Oh, I have no idea how ZFS behaves here, so it might not be the same issue. I’d start by looking at vmstat 1 in the guest and seeing whether high CPU wait and/or a large number of blocked processes correlates with whatever happens on the host, using vmstat there too, or however you monitor ZFS pool activity.

Well, I don’t ever seem to see high CPU usage, at least in the Beszel monitoring tool I just installed. Even memory usage never seems to peak; or maybe if it does, it’s such a short burst that the monitoring tool doesn’t have time to record it before the VM crashes. I don’t know.

Yet I can see that the Home Assistant Core container’s memory usage slowly and steadily increases over time.

The CPU % monitoring tool I used also didn’t show high usage, but the 1-minute load average was spiking. vmstat 1 was showing wa (I/O wait) in the 90+% range, which basically means processes were blocked waiting for the disk to return data, and that is something Linux really doesn’t deal with well.


@WallyR I think there are no logs in Home Assistant because it’s the Proxmox server that killed the HAOS VM for using too much memory.
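
I guess one way to confirm that would be to search the Proxmox host’s kernel log for OOM-killer messages around the crash time, something like:

# on the Proxmox host
journalctl -k --since "2025-02-16 05:00" --until "2025-02-16 06:00" | grep -iE "oom|out of memory"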

@donbaq Here is the output of running vmstat:

# vmstat --unit M 1 5
procs -----------memory---------- ---swap-- -----io---- -system-- -------cpu-------
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st gu
 1  0      0   1174    158   2969    0    0    31    89  450    3  2  0 98  0  0  0
 0  0      0   1174    158   2969    0    0     0    80  594  914  1  1 98  1  0  0
 0  0      0   1176    158   2969    0    0     0     0  236  328  1  0 99  0  0  0
 0  0      0   1176    158   2969    0    0     0     0  421  432  3  1 96  0  0  0
 0  0      0   1175    158   2969    0    0     0     0  382  438  2  1 97  0  0  0

I’m not a sysadmin, so I don’t know if there is something worrisome in these numbers. In any case, I guess I’ll have to find a way to keep that command running between 5:00 and 6:00 to capture the really useful numbers.
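
Maybe something as simple as starting a time-boxed vmstat just before the window, e.g. (assuming the vmstat here supports -t for timestamps; the log path is just an example):

# started around 4:55; stops by itself after 70 minutes
timeout 4200 vmstat -t 1 >> /var/log/vmstat-backup.log 2>&1 &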

In the meantime, here is a screenshot from Beszel monitoring:


We can clearly see a sudden increase in memory usage around 5:15-5:30.
Memory-wise, there is also a jump around 2:40-3:00, but that one comes from the Music Assistant add-on (the red-orange container below the homeassistant-core container).

The thing I find strange is the disk I/O write activity peaking around 5:40, because I have not set up local backups; they are all saved to remote locations. (Maybe they are in fact packaged locally before being sent out.)
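
I suppose I could also watch which processes are actually writing during that window, e.g. with iotop if I can get it installed:

# show only processes doing I/O, with accumulated totals
iotop -aoP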