I know this might not be the right forum for this question, but I need help debugging sporadic crashes on my Raspberry Pi 4 running Home Assistant Supervised. If you can point me to another place to ask, please do!
The Raspberry Pi has been running for years with no issues, but now, maybe once per week, it seems to crash. It then no longer replies to pings on either Ethernet or WiFi. As it is headless, I can’t get much information from it. It runs in an Argon One case with an SSD, at about 45 degrees C, on the official Raspberry Pi 4 power supply.
I have looked in the /var/log/messages log file, but I don’t see any particular error at the time the crash happened. Everything just seems to stop.
How do I debug this? Are there any other log files I can check? Can I check the boot device (the SSD) for errors?
Pi crashes are usually one of two things:
Power problems (they can easily overrun the power supply).
SD/storage failure or disconnect (you’re not on SD, unless you’re running from SD AND have an SSD?).
After that, you start looking at software issues.
First thing: are you using the original Argon One PS, or one guaranteed to deliver a solid 3 A? (I keep an extra. Mine’s in an Argon One.)
Do you have your peripherals on a powered external USB hub?
If you’re using the SD as the boot drive and also have the SSD, you should look at the SD card too. You can’t rule out issues with the SSD; it’s just that with SD a failure is far more likely (SD cards wear out).
Have you run your device with a monitor attached? Mine throws undervolt issues on the console if it’s about to crash.
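If a monitor is awkward, the same under-voltage information can be read over SSH with `vcgencmd get_throttled`, which prints a bit field such as `throttled=0x50005`. Below is a minimal sketch of decoding it, assuming that output format. The names `decode_throttled` and `read_throttled` are made up for illustration. Note that the "has occurred" bits only cover the time since the last boot, so they are lost once the Pi has been restarted after a crash.

```python
import subprocess

# Bit positions as documented for `vcgencmd get_throttled`.
FLAGS = {
    0: "under-voltage detected (now)",
    1: "ARM frequency capped (now)",
    2: "currently throttled",
    3: "soft temperature limit active (now)",
    16: "under-voltage has occurred since boot",
    17: "ARM frequency capping has occurred since boot",
    18: "throttling has occurred since boot",
    19: "soft temperature limit has occurred since boot",
}


def decode_throttled(output: str) -> list[str]:
    """Turn a line like 'throttled=0x50005' into readable flag names."""
    value = int(output.strip().split("=", 1)[1], 16)
    return [name for bit, name in sorted(FLAGS.items()) if value & (1 << bit)]


def read_throttled() -> list[str]:
    """Run vcgencmd on the Pi itself (assumes it is installed and on PATH)."""
    out = subprocess.run(
        ["vcgencmd", "get_throttled"],
        capture_output=True, text=True, check=True,
    ).stdout
    return decode_throttled(out)
```

Calling `read_throttled()` from a cron job every minute or so and appending the result to a file would leave a trail to look at after the next crash, though the last entry may not reach the disk if the Pi hangs hard.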
I am using the official Raspberry Pi 4 power supply rated for 3 amps. I verified the output with an oscilloscope (I am an electrical engineer), and it is rock solid. Previously, I used a supply for an iPad, and I had tons of under-voltage warnings. I did not see any issues with it, though. I have not had any under-voltage warnings since going with the official power supply.
With regard to the storage, I only have the SSD (no SD card boot).
I have an external hard drive attached via USB, but the hard drive enclosure is powered externally. Other than that, I only have a Sonoff Zigbee dongle (with a shielded USB extension cable to avoid disturbance from the external hard drive). After the last crash yesterday, I unplugged the external hard drive from the Pi as part of the debugging.
It is kind of difficult for me to connect a monitor to the Pi, but I am aware that it might be the last resort.
I am no expert in Linux, so I am not sure about the logs. I checked journalctl and also the /var/log/messages files, but I am concerned that I am missing something. Those log files do not show anything unusual. They simply stop at a certain point (usually with an entry from Zigbee2MQTT receiving data from a device), and the next entry in the log is the first boot message from when I reboot the Pi.
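To pin down exactly where the log goes silent, the journal can be exported with `journalctl -o short-iso --no-pager > journal.txt` and scanned for long gaps between entries. This is only a rough sketch, assuming each entry starts with a timestamp like `2024-01-05T12:00:00+0100`. The names `parse_ts` and `find_gaps` and the 300-second threshold are my own choices, not anything standard.

```python
from datetime import datetime


def parse_ts(line: str):
    """Return the leading short-iso timestamp of a journal line, or None."""
    try:
        return datetime.strptime(line.split(" ", 1)[0], "%Y-%m-%dT%H:%M:%S%z")
    except ValueError:
        return None


def find_gaps(lines, min_gap_seconds=300):
    """Yield (last line before, first line after, gap in seconds) per silence."""
    prev_ts, prev_line = None, None
    for line in lines:
        ts = parse_ts(line)
        if ts is None:
            continue  # skip "-- Boot ... --" markers and continuation lines
        if prev_ts is not None:
            gap = (ts - prev_ts).total_seconds()
            if gap >= min_gap_seconds:
                yield prev_line, line, gap
        prev_ts, prev_line = ts, line


if __name__ == "__main__":
    with open("journal.txt", encoding="utf-8", errors="replace") as fh:
        for before, after, gap in find_gaps(fh):
            print(f"{gap:.0f} s of silence\n  last:  {before.strip()}\n  first: {after.strip()}")
```

`journalctl --list-boots` and `journalctl -b -1 -e` (end of the previous boot) show the same boundary directly. If the last lines before each gap never show an error, that may simply mean the Pi froze before anything could be written to disk.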
I have two main questions:
Which log files should I look in?
How do I check the integrity of the SSD (the boot drive)?
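For the second question, one common approach is the SMART data from `smartctl` (package smartmontools), which can print JSON with `-j`. The sketch below is hedged on several assumptions: the SSD shows up as `/dev/sda`, the USB-to-SATA bridge in the case passes SMART through with `-d sat`, and the drive reports the attribute names listed in `WATCHED` (these vary between vendors). The function names are hypothetical.

```python
import json
import subprocess

# Attributes whose non-zero raw value often hints at a failing drive or link.
# Assumption: names vary by vendor, so adjust to what your drive reports.
WATCHED = {
    "Reallocated_Sector_Ct",
    "Current_Pending_Sector",
    "Offline_Uncorrectable",
    "UDMA_CRC_Error_Count",
}


def summarize_smart(report: dict) -> dict:
    """Extract the overall health verdict and any non-zero watched attributes."""
    passed = report.get("smart_status", {}).get("passed")
    table = report.get("ata_smart_attributes", {}).get("table", [])
    suspicious = {
        attr["name"]: attr["raw"]["value"]
        for attr in table
        if attr.get("name") in WATCHED and attr.get("raw", {}).get("value", 0) > 0
    }
    return {"passed": passed, "suspicious": suspicious}


def read_smart(device: str = "/dev/sda") -> dict:
    """Run smartctl (needs root). Its exit code is a bitmask, so no check=True."""
    proc = subprocess.run(
        ["smartctl", "-j", "-H", "-A", "-d", "sat", device],
        capture_output=True, text=True,
    )
    return summarize_smart(json.loads(proc.stdout))
```

A result of `{"passed": True, "suspicious": {}}` does not prove the SSD is healthy, but a failed verdict or growing counters would be a strong hint. Running `smartctl -a -d sat /dev/sda` by hand shows the full report.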