I have a standard HassOS on Raspberry Pi 4 installation which seems to be going offline overnight. When I check it in the morning, it’s not pingable, and the only way to get access to the system is to power-cycle the Raspberry (which I’m aware is not good).
My question is, how should I debug this? I’ve checked all the logs in the supervisor tab, but they only seem to log up to the point of the last reset. Is there any way of checking what happened before the Raspberry was turned off, like a persistent log file?
I had to wait for HA to break in the same way (it worked fine for a few days) to try this. Last night it finally did, so I tried this approach.
The home-assistant.log file shows nothing of interest. I couldn’t read the host logs - I found a bunch of .journal files, but they seem to be binary. They cannot be read without journalctl, am I right?
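For anyone else stuck at this step: the .journal files are indeed a binary format, and journalctl can read them from a copied directory on any Linux machine that has systemd. Below is a minimal sketch that only builds the command; the directory name and the time window are placeholders, not values from this installation.

```python
import shlex


def journalctl_command(journal_dir, since=None, until=None):
    """Build a journalctl command line for journal files copied from another machine."""
    # --directory makes journalctl read the given folder instead of the local journal.
    cmd = ["journalctl", f"--directory={journal_dir}", "--no-pager"]
    if since is not None:
        cmd.append(f"--since={since}")
    if until is not None:
        cmd.append(f"--until={until}")
    return cmd


if __name__ == "__main__":
    # "./journal-from-pi" is a hypothetical folder holding the copied .journal files.
    cmd = journalctl_command("./journal-from-pi", since="2021-04-19 19:00:00")
    print(shlex.join(cmd))
```

The printed command can be pasted into a terminal, or the list can be passed to subprocess.run on a machine where journalctl is installed.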
One thing that has been happening is that, at random times while using the UI, I would get a “read-only file system” error, which seems to be a symptom of a dying microSD card. That might be the cause of the overnight crashes as well. I’m going to replace the card (this one was a random one that I had around, and is probably not best suited for the job) and see if that fixes the issue.
Thank you! Yes, I would say that the card has something to do with it. I will order a better card, and in the meantime, check if the crashes are exactly one week apart.
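If shell access to the host is available, one way to confirm the “read-only file system” symptom is to look for mounts carrying the ro option. This is only a sketch: it assumes /proc/mounts is readable, and some partitions may be read-only by design, so only an unexpected entry (such as the data partition) would point at the card.

```python
from pathlib import Path


def read_only_mounts(mounts_text):
    """Return (device, mount_point) pairs whose mount options include 'ro'."""
    found = []
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 4:
            continue  # skip blank or malformed lines
        device, mount_point, _fstype, options = fields[:4]
        if "ro" in options.split(","):
            found.append((device, mount_point))
    return found


if __name__ == "__main__":
    mounts = Path("/proc/mounts")
    if mounts.exists():
        for device, mount_point in read_only_mounts(mounts.read_text()):
            print(f"{device} on {mount_point} is mounted read-only")
```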
After replacing the SD card with a new one the crashes have stopped (for now?).
Thing is, we still don’t know what process triggers the crash. Something is happening on Sunday night, just after midnight, that somehow pushes the SD card to its limits (and beyond).
Thank you! My issue seems to be slightly different to the one those people are experiencing; my Pi goes completely offline (can’t SSH) and the crashes happen seemingly at random. However, the debugging advice in that thread seems very useful, so I’ll try those methods to get more information that might help diagnose the issue.
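To test the "exactly one week apart" idea, it is enough to note down when each crash was noticed and compare the gaps. A minimal sketch follows; the timestamps in the example are made up for illustration and are not taken from anyone's logs.

```python
from datetime import datetime, timedelta

WEEKDAYS = ["Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday", "Sunday"]


def crash_intervals(timestamps):
    """Return the gaps between consecutive crashes and the weekday of each crash."""
    times = sorted(datetime.fromisoformat(t) for t in timestamps)
    gaps = [later - earlier for earlier, later in zip(times, times[1:])]
    weekdays = [WEEKDAYS[t.weekday()] for t in times]
    return gaps, weekdays


def looks_weekly(gaps, tolerance=timedelta(hours=1)):
    """True if there is at least one gap and every gap is within tolerance of 7 days."""
    week = timedelta(days=7)
    return bool(gaps) and all(abs(gap - week) <= tolerance for gap in gaps)


if __name__ == "__main__":
    # Hypothetical crash times, just after midnight on Sunday night.
    crashes = ["2021-04-05T00:10:00", "2021-04-12T00:05:00", "2021-04-19T00:20:00"]
    gaps, weekdays = crash_intervals(crashes)
    print(weekdays, [str(g) for g in gaps], looks_weekly(gaps))
```

If the gaps do cluster around seven days, a weekly scheduled job is a more likely suspect than random card failure.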
Replacing the SD helped in some way; the crashes seem to happen less frequently, but they didn’t disappear completely.
I increased the log level to debug, and it crashed again tonight.
I read the home-assistant.log from my computer, but it doesn’t shed any light on the issue. The very last entry seems to be a very normal entry about updating a template sensor:
2021-04-19 21:23:10 DEBUG (MainThread) [homeassistant.helpers.event] Template group [TrackTemplate(template=Template("{{ states('binary_sensor.sensor_movimiento_despacho_javier_motion') }}"), variables=None, rate_limit=None), TrackTemplate(template=Template("00:{{ states('input_number.timeout_ocupacion') | int }}:00"), variables=None, rate_limit=None)] listens for {'all': False, 'entities': {'input_number.timeout_ocupacion', 'binary_sensor.sensor_movimiento_despacho_javier_motion'}, 'domains': set(), 'time': False}
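The useful part of that last entry is mostly its timestamp, since it marks when Home Assistant stopped writing. Here is a small sketch that pulls the last well-formed entry out of a log file; the regular expression is inferred only from the line format shown above and may need adjusting for other versions.

```python
import re
import sys
from datetime import datetime

# Format inferred from the entry above: "DATE TIME LEVEL (Thread) [logger] message"
LINE_RE = re.compile(
    r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) (\w+) \(([^)]*)\) \[([^\]]+)\] (.*)$"
)


def parse_line(line):
    """Return (timestamp, level, thread, logger, message), or None if the line does not match."""
    match = LINE_RE.match(line.rstrip("\n"))
    if match is None:
        return None
    stamp, level, thread, logger, message = match.groups()
    return datetime.strptime(stamp, "%Y-%m-%d %H:%M:%S"), level, thread, logger, message


def last_entry(lines):
    """Return the last line that parses as a log entry, skipping traceback continuations."""
    for line in reversed(list(lines)):
        parsed = parse_line(line)
        if parsed is not None:
            return parsed
    return None


if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1], encoding="utf-8", errors="replace") as fh:
        print(last_entry(fh))
```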
I will try to read the journal files once I get access to a Linux system with journalctl.
The line before this one is a very normal one, Bus:Handling <Event state_changed[L]. It has to do with a sensor that updates every second - could that possibly be overloading the event bus?
What follows is about 500 lines of what seems to be a shutdown process - of which some states seem to fail as well… I posted those here. My Linux knowledge is not that deep, so I can’t really understand a lot of it, but at least now I have specific errors that I can search.
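On the question of a once-per-second sensor flooding the event bus: a rough way to check is to count how many log lines share the same second and see whether the volume spikes before the crash. This sketch assumes every entry starts with a "YYYY-MM-DD HH:MM:SS" timestamp, as in the home-assistant.log entry quoted earlier; a high count alone would not prove the bus is overloaded.

```python
import sys
from collections import Counter
from datetime import datetime


def busiest_seconds(lines, top=5):
    """Return the `top` (timestamp, line_count) pairs with the most log lines."""
    counts = Counter()
    for line in lines:
        stamp = line[:19]
        try:
            datetime.strptime(stamp, "%Y-%m-%d %H:%M:%S")
        except ValueError:
            continue  # continuation lines and tracebacks have no timestamp
        counts[stamp] += 1
    return counts.most_common(top)


if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1], encoding="utf-8", errors="replace") as fh:
        for stamp, count in busiest_seconds(fh):
            print(stamp, count)
```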
This doesn’t seem to be the problem here, as I don’t think the Pi enters sleep mode, but it could be related to the system time changing, an NTP lookup, or some such thing.
@rousveiga do you still have log entries from before Apr 19 19:22:27? It would be interesting to see if there is some service that alters the system timestamp.
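One way to look for a changed system clock is to scan the entry timestamps in order and flag any that go backwards or skip ahead by a suspicious amount. This is a minimal sketch; the 10-minute threshold and the sample timestamps are arbitrary assumptions, not values from the posted logs.

```python
from datetime import datetime, timedelta


def find_time_jumps(timestamps, max_gap=timedelta(minutes=10)):
    """Flag consecutive entries whose timestamps go backwards or leave a large gap."""
    jumps = []
    for previous, current in zip(timestamps, timestamps[1:]):
        delta = current - previous
        if delta < timedelta(0):
            jumps.append(("backwards", previous, current))
        elif delta > max_gap:
            jumps.append(("gap", previous, current))
    return jumps


if __name__ == "__main__":
    # Made-up timestamps in log order, for illustration only.
    sample = [
        datetime(2021, 4, 19, 19, 22, 20),
        datetime(2021, 4, 19, 19, 22, 27),
        datetime(2021, 4, 19, 17, 22, 30),  # clock stepped back
        datetime(2021, 4, 19, 18, 0, 0),    # long silence
    ]
    for kind, previous, current in find_time_jumps(sample):
        print(kind, previous, "->", current)
```

A backwards jump suggests the clock was reset, while a long gap may just mean the system was hung or powered off during that period.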
Monday morning, and this has happened to me for the second time: completely offline sometime last night. I am very reluctant to change the storage because I am already using a USB SSD drive. I don’t want to buy a new one or go back to an SD card. Pi 4 with a Supervised install on Raspbian. The problem just started within the last week. There have been a lot of core updates of HA lately. My Google Drive backups have been saving at 10 pm nightly.