Hi everyone. For the past week my Home Assistant has been hanging and cannot be accessed via the web UI or the Android companion app. Automations also stop working. The only way I can get it running again is to unplug the power cable and plug it back in.
When it reboots, there is a gap in the logbook from when it died to when I restarted it. There are also no recordings in Frigate from that period, so it seems like nothing is actually running during that time.
My question at this point is: where do I find the logs I need to investigate this further? I navigate to Settings > System > Logs and click “Load Full Logs”, but the logs only go back as far as when I powered it back on.
I’m wondering if an SSHD I recently installed could be the problem (it’s an older drive I repurposed from my desktop computer), but I’m hoping for some logs to confirm this before I buy a new drive.
I have both CPU usage and memory usage configured through the System Monitor integration. Here are screenshots of both during the most recent crash.
It crashed just before 9am and I restarted it a bit before 4pm. Memory usage is a bit below half and CPU usage is under 40%, so both of these seem OK.
I’ve also found the log file you mentioned. There are only warnings in the hour leading up to the crash; here they are. They don’t seem too serious, and I don’t think they would have caused the crash.
2023-12-24 09:14:42.171 WARNING (MainThread) [zigpy_deconz.api] No response to 'CommandId.aps_data_indication' command with seq id '0x70'
2023-12-24 09:15:43.181 WARNING (MainThread) [zigpy_deconz.api] No response to 'CommandId.aps_data_indication' command with seq id '0x76'
2023-12-24 09:16:37.709 WARNING (MainThread) [zigpy_deconz.zigbee.application] Unexpected transmit confirm for request id 209, Status: TXStatus.SUCCESS
2023-12-24 09:16:45.531 WARNING (MainThread) [zigpy_deconz.api] No response to 'CommandId.aps_data_indication' command with seq id '0x82'
2023-12-24 09:55:38.616 WARNING (MainThread) [homeassistant.helpers.entity] Update of camera.front_gate_doorbell is taking over 10 seconds
2023-12-24 09:55:43.614 WARNING (MainThread) [homeassistant.components.camera] Updating amcrest camera took longer than the scheduled update interval 0:00:15
2023-12-24 09:55:43.950 WARNING (MainThread) [amcrest.http] <Front gate doorbell:Z1792C5DF9E71> Trying again due to error: ReadTimeout('')
2023-12-24 09:55:43.983 WARNING (MainThread) [amcrest.http] <Front gate doorbell:Z1792C5DF9E71> Trying again due to error: ReadTimeout('')
2023-12-24 09:55:43.994 WARNING (MainThread) [amcrest.http] <Front gate doorbell:Z1792C5DF9E71> Trying again due to error: ReadTimeout('')
2023-12-24 09:55:45.610 WARNING (MainThread) [amcrest.http] <Front gate doorbell:Z1792C5DF9E71> Trying again due to error: ReadTimeout('')
2023-12-24 09:56:23.619 WARNING (MainThread) [homeassistant.helpers.entity] Update of camera.front_gate_doorbell is taking over 10 seconds
2023-12-24 09:56:52.146 WARNING (MainThread) [amcrest.http] <Front gate doorbell:Z1792C5DF9E71> Trying again due to error: ReadTimeout('')
2023-12-24 09:56:52.154 WARNING (MainThread) [amcrest.http] <Front gate doorbell:Z1792C5DF9E71> Trying again due to error: ReadTimeout('')
2023-12-24 09:56:53.395 WARNING (MainThread) [homeassistant.components.binary_sensor] Updating amcrest binary_sensor took longer than the scheduled update interval 0:00:05
2023-12-24 09:56:53.620 WARNING (MainThread) [homeassistant.helpers.entity] Update of camera.front_gate_doorbell is taking over 10 seconds
2023-12-24 09:56:58.622 WARNING (MainThread) [homeassistant.components.camera] Updating amcrest camera took longer than the scheduled update interval 0:00:15
It’s been fine for the last 48 hours. Now that I know where the log file is, if it happens again I’ll check it and see if there’s anything in common with the log file from Dec 24th. Also, if it happens again, I’ll go back to my previous SD card in case the problem is related to the hard drive.
If you have any suggestions, that would be great; otherwise I think I’ll just have to wait and see what happens next.
Here are the contents of home-assistant.log.1 from the hour leading up to the crash:
2023-12-30 15:12:10.078 ERROR (MainThread) [custom_components.frigate.api] Timeout error fetching information from http://ccab4aaf-frigate:5000/api/stats:
2023-12-30 15:12:10.079 ERROR (MainThread) [custom_components.frigate] Error fetching frigate data:
2023-12-30 15:12:25.078 ERROR (MainThread) [custom_components.frigate.api] Timeout error fetching information from http://ccab4aaf-frigate:5000/api/stats:
2023-12-30 15:12:40.084 ERROR (MainThread) [custom_components.frigate.api] Timeout error fetching information from http://ccab4aaf-frigate:5000/api/stats:
2023-12-30 15:12:55.078 ERROR (MainThread) [custom_components.frigate.api] Timeout error fetching information from http://ccab4aaf-frigate:5000/api/stats:
2023-12-30 15:56:54.078 ERROR (MainThread) [custom_components.frigate.api] Timeout error fetching information from http://ccab4aaf-frigate:5000/api/stats:
2023-12-30 15:56:54.079 ERROR (MainThread) [custom_components.frigate] Error fetching frigate data:
2023-12-30 15:57:09.079 ERROR (MainThread) [custom_components.frigate.api] Timeout error fetching information from http://ccab4aaf-frigate:5000/api/stats:
2023-12-30 16:50:29.079 ERROR (MainThread) [custom_components.frigate.api] Timeout error fetching information from http://ccab4aaf-frigate:5000/api/stats:
2023-12-30 16:50:29.079 ERROR (MainThread) [custom_components.frigate] Error fetching frigate data:
2023-12-30 16:50:44.079 ERROR (MainThread) [custom_components.frigate.api] Timeout error fetching information from http://ccab4aaf-frigate:5000/api/stats:
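If it happens again, pulling the hour before the crash out of home-assistant.log.1 can be scripted instead of done by hand. Below is a minimal Python sketch, assuming every entry starts with a timestamp and level in the format shown in the excerpts above. It keeps WARNING-and-above lines stamped within one hour before a crash time you supply, plus any untimestamped continuation lines (such as tracebacks) that follow them. The crash time has to be given in the same clock as the log timestamps, and `filter_log.py` is just a hypothetical file name.

```python
import re
from datetime import datetime, timedelta
from typing import Iterable, Iterator

# Assumed line format, matching the excerpts quoted in this thread:
#   2023-12-30 15:12:10.078 ERROR (MainThread) [logger.name] message
LINE_RE = re.compile(r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})\.\d+ ([A-Z]+) ")


def lines_before(
    lines: Iterable[str],
    crash_time: datetime,
    window: timedelta = timedelta(hours=1),
    levels: tuple[str, ...] = ("WARNING", "ERROR", "CRITICAL"),
) -> Iterator[str]:
    """Yield log lines stamped within [crash_time - window, crash_time]
    whose level is in `levels`.

    Lines without a timestamp (e.g. traceback continuation lines) inherit
    the keep/skip decision of the last timestamped line above them.
    """
    start = crash_time - window
    keep = False
    for line in lines:
        match = LINE_RE.match(line)
        if match:
            stamp = datetime.strptime(match.group(1), "%Y-%m-%d %H:%M:%S")
            keep = start <= stamp <= crash_time and match.group(2) in levels
        if keep:
            yield line.rstrip("\n")


if __name__ == "__main__":
    import sys

    # Usage (hypothetical script name):
    #   python filter_log.py home-assistant.log.1 "YYYY-MM-DD HH:MM"
    if len(sys.argv) == 3:
        crash = datetime.strptime(sys.argv[2], "%Y-%m-%d %H:%M")
        with open(sys.argv[1], encoding="utf-8", errors="replace") as log:
            for kept in lines_before(log, crash):
                print(kept)
```

Running it against the log from each crash makes it easier to spot anything the crashes have in common.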
After powering it off and on again, there is a notification with the message quoted below. I’ve tried opening the file in question in Notepad++, but it says the file cannot be found. I can’t copy or rename it either; I think Windows doesn’t like some of the characters in the file name.
I’ve searched, but I’m still not clear on what core.restore_state actually does. I’ve never modified it manually, so I’m not sure why I’m getting this error.
Everything seems to be running fine now. Do I need to restore from a backup?
Tomorrow I’m going to restore a backup to my SD card and run from that to see if it solves the problem (in which case I think the SSD could be faulty).
The core.restore_state storage could not be parsed and has been renamed to /config/.storage/core.restore_state.corrupt.2023-12-30T07:15:49.707847+00:00 to allow Home Assistant to continue.
A default core.restore_state may have been created automatically.
If you made manual edits to the storage file, fix any syntax errors in /config/.storage/core.restore_state.corrupt.2023-12-30T07:15:49.707847+00:00, restore the file to the original path /config/.storage/core.restore_state, and restart Home Assistant. Otherwise, restore the system from a backup.
Click SUBMIT below to confirm you have repaired the file or restored from a backup.
The exact error was: unexpected content after document: line 6411 column 2 (char 201710)
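The “unexpected content after document” error means the parser read one complete JSON document and then found more data after it. If you want to see what that extra data is before deciding whether to restore a backup, here is a Python sketch that parses the first document with `json.JSONDecoder.raw_decode` (which stops at the end of the document instead of failing on extra data) and reports where the leftover content starts. It assumes you have first copied the corrupt file, on the Home Assistant side, to a name without colons, since Windows does not allow `:` in file names (likely why Notepad++ couldn’t open it). `core.restore_state.corrupt.json` and `inspect_restore_state.py` are hypothetical names. Python counts characters rather than bytes, so the position it prints may not exactly match the one in the notification.

```python
import json
import sys

JSON_WHITESPACE = " \t\n\r"  # the only whitespace characters JSON allows


def find_trailing_content(text: str) -> tuple[int, str]:
    """Parse the first JSON document in `text` and return
    (index of the first non-whitespace character after it, trailing text).

    Raises json.JSONDecodeError if even the first document is invalid.
    """
    decoder = json.JSONDecoder()
    # raw_decode does not skip leading whitespace itself.
    start = len(text) - len(text.lstrip(JSON_WHITESPACE))
    _, end = decoder.raw_decode(text, start)
    rest = text[end:]
    first = end + (len(rest) - len(rest.lstrip(JSON_WHITESPACE)))
    return first, text[first:]


def line_col(text: str, index: int) -> tuple[int, int]:
    """Convert a character index to a 1-based (line, column) pair."""
    line = text.count("\n", 0, index) + 1
    col = index - (text.rfind("\n", 0, index) + 1) + 1
    return line, col


def report(path: str) -> None:
    with open(path, encoding="utf-8", errors="replace") as f:
        text = f.read()
    first, trailing = find_trailing_content(text)
    if not trailing:
        print("No content after the first JSON document.")
        return
    line, col = line_col(text, first)
    print(f"Unexpected content starts at char {first} (line {line}, column {col})")
    print(f"{len(trailing)} trailing characters; first 200: {trailing[:200]!r}")


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python inspect_restore_state.py core.restore_state.corrupt.json
    report(sys.argv[1])
```

If the trailing content looks like a fragment of an older copy of the file or random bytes, that doesn’t prove the drive is at fault, but it is worth noting alongside the SD card test.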
From time to time I have the same problem with a completely different setup and hardware. The system suddenly stops and I have no idea why.
Did you make any progress finding the cause?
In the end I replaced my Pi with a mini PC, and everything has been running fine for 6 months now. It may have been related to the SSD I was using (the mini PC has a different SSD in it), but I suspect I was expecting too much of my Pi and needed stronger hardware to run everything.