HAOS on RPi froze - and systemd ate my logs

Hi all,

I just had my HA instance freeze up completely - couldn’t access the OS or HA via ssh or browser. Pulled the power, and it rebooted fine.

I’m trying to figure out what may have caused it to lock up, but there is an almost 24 hr long chunk of logs simply missing before the reboot:

... lots of logs ...
Jul 14 21:40:12 homeassistant hassio_dns[591]: [INFO] 127.0.0.1:44275 - 14926 "PTR IN 111.6.168.192.in-addr.arpa. udp 55 true 2048" NXDOMAIN qr,rd,ra 44 0.027743732s
-- Boot 5a7ccb997c2f433f8da1969880afbe79 --
Jul 15 20:48:39 homeassistant systemd-timesyncd[581]: System clock time unset or jumped backwards, restored from recorded timestamp: Mon 2024-07-15 20:48:39 UTC
... lots of logs ... 

The really weird thing is that HA continued recording data from sensors until ~ Jul 15 19:15, i.e. the system was frozen for less than 2 hrs.

Is a hard reset expected to result in hrs/days worth of lost logs? Any ideas on how to further debug?

Thanks!

I would start here: How to Troubleshoot Raspberry Pi Crashing.

Thanks. Nothing in there really jumps out as a possible culprit though - the system had been running on the same hardware, with the same peripherals, and the same configuration (except regular updates to the latest HA version) for months without issues, and it’s running without issues now. Other than a ~2 hr gap in sensor data and a ~24 hr gap in system logs, there is no indication of anything ever being wrong :crazy_face:

I had some issues before with the SSD, so it stands to reason that this could be my problem. It just seems very strange that almost a day of logs disappeared while HA was still acting perfectly normal and dutifully recording sensor data.

You’re not missing logfile data. The current logfile is created on reboot. The old logfile is home-assistant.log.1

Pulling the power on a system without a proper shutdown can corrupt the file system - something that may not be immediately noticeable but may cause problems in the future.

Hardware does go bad. Power supplies, SD cards, SSDs etc. It may not be consistent how this presents itself but that does not mean that there is not an issue.

I am talking about the OS level logs pulled via journalctl. There is nothing between Jul 14 21:40:12 and Jul 15 20:48:39. home-assistant.log[.1] has nothing that would shed light on why the system crashed.
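To pin down exactly where the journal goes silent, something like the following might help. It is only a sketch: it assumes the journal has been exported with `journalctl -o json` (one JSON object per line, each carrying the `__REALTIME_TIMESTAMP` field in microseconds since the epoch), and the `find_gaps` name and the one-hour threshold are made up for illustration.

```python
import json
from datetime import datetime, timedelta, timezone


def find_gaps(json_lines, min_gap=timedelta(hours=1)):
    """Return (previous, current) timestamp pairs that are more than min_gap apart.

    json_lines: iterable of lines as produced by `journalctl -o json`.
    """
    gaps = []
    prev = None
    for line in json_lines:
        line = line.strip()
        if not line:
            continue
        entry = json.loads(line)
        # journald exports __REALTIME_TIMESTAMP as a string of microseconds
        # since the Unix epoch.
        micros = int(entry["__REALTIME_TIMESTAMP"])
        ts = datetime.fromtimestamp(micros / 1_000_000, tz=timezone.utc)
        if prev is not None and ts - prev > min_gap:
            gaps.append((prev, ts))
        prev = ts
    return gaps
```

One way to use it would be to pipe the export into a small wrapper that calls `find_gaps(sys.stdin)` and prints each pair. A gap that ends right at a boot marker, like the one above, would at least be consistent with entries never having reached the disk before the power was pulled, though it does not prove it.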

I agree that something is an issue, and it may well be the HW… but without logs it’s hard to figure out what, and all I can do is randomly replace hardware and hope the problem goes away… (though the SSD is probably a good start given past problems, so I may do that).

What RPi model are you using? Is the SSD plugged directly into the RPi?

I’m using a CM4 on a Waveshare PoE UPS Base Board, with an M.2 NVMe connector. As mentioned, the SSD / board combination has not been the most reliable, so it’s definitely a suspect.

Might want to consider this also: 2024.5+: Tracking down instability issues caused by integrations.

Thanks! Didn’t know about that, enabled debug as described.

I am still scratching my head over how a day’s worth of logs could just disappear. It’s apparently not unheard of with systemd. Not great; not losing all the logs is kinda important…

I agree with that.
My HAOS on an RPi 4 freezes roughly once every 15 days. Unplug and rep… do the thing, but the logs are gone.
I’m thinking of a script that copies the logs every so often, so I can back up the files and look at them later…
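
A rough sketch of what such a script could look like is below. It makes several assumptions: `journalctl` is on the PATH and can read the host journal (which may not be the case from inside an add-on container on HAOS), and the destination directory lives somewhere that survives the crash, ideally a different disk or a network share. The function names, the file naming scheme, and the 15-minute window are all hypothetical.

```python
import os
import subprocess
import time
from datetime import datetime, timezone
from pathlib import Path


def export_cmd(minutes):
    """Build a journalctl command that dumps the last `minutes` minutes."""
    return ["journalctl", "--no-pager", "--output=short-iso",
            "--since", f"{minutes} minutes ago"]


def snapshot(dest_dir, minutes=15, run=subprocess.run):
    """Write the most recent journal entries to a timestamped file.

    `run` is injectable so the function can be exercised without journalctl.
    """
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    stamp = datetime.now(timezone.utc).strftime("%Y%m%dT%H%M%SZ")
    out = dest / f"journal-{stamp}.log"
    result = run(export_cmd(minutes), capture_output=True, text=True, check=True)
    with open(out, "w", encoding="utf-8") as fh:
        fh.write(result.stdout)
        fh.flush()
        # Ask the OS to push the data to storage now, so a later hard
        # reset is less likely to lose this copy as well.
        os.fsync(fh.fileno())
    return out


def run_forever(dest_dir, minutes=15):
    """Take a snapshot every `minutes` minutes until interrupted."""
    while True:
        snapshot(dest_dir, minutes)
        time.sleep(minutes * 60)
```

Calling `run_forever("/some/backup/dir")` would leave one file per interval. Consecutive files may overlap or miss a few seconds at the edges, so a slightly larger window than the sleep interval is probably the safer choice. If the SSD itself is the problem, writing the copies to that same SSD will not help much.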