HA Freezing - logs for cause?

I’m running HA on an RPi4 with an SSD for storage.

Twice in the past week I have not been able to open the instance on my PC or iPad. I have pinged the RPi and it is responding - just no dashboard and the message that it is refusing to connect.

The only way I can figure to fix it is to pull the power plug and reboot the RPi. When I look at the various logs everything seems to have been working okay and then nothing… until I have rebooted. No error messages… no apparent faults.

Is there anywhere in particular I should be looking to see if I can identify the cause?

It could be the SSD - it’s an AliExpress KingSpec, but it could just as easily be the RPi or the power supply. I am just hoping there is a way to point me to the cause. Is there a way to check the SSD health as there is on a PC?

Are you running HAOS? If so which version are you on …

Yes HAOS on the Pi.
HA 2022.11

It froze on this one and the first time on 2022.10 (final in that series)

Can you SSH into the RPi? Without a way to check the system logs, you'll only be guessing.

I’d start by looking in /var/log/syslog - the rpi warns about low power events there, usually quite chatty before it does lock up.

If it is the ssd, there’ll likely be a lot of I/O type messages there, or in dmesg.
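As a rough illustration of that triage, here is a minimal Python sketch that sorts saved log lines (for example, `dmesg` output copied to a PC) into "power" and "I/O" buckets. The patterns are assumptions based on typical kernel wording such as "Under-voltage detected!" and "I/O error"; they are not an exhaustive list, and the function names are made up for this example.

```python
import re

# Assumed patterns based on typical kernel messages; extend as needed.
PATTERNS = {
    "power": re.compile(r"under-?voltage", re.IGNORECASE),
    "io": re.compile(
        r"I/O error|uas_eh_|reset (high|super)-?speed usb device",
        re.IGNORECASE,
    ),
}


def classify(line):
    """Return 'power', 'io', or None for a single log line."""
    for label, pattern in PATTERNS.items():
        if pattern.search(line):
            return label
    return None


def scan(lines):
    """Group matching lines by label, keeping their original order."""
    hits = {label: [] for label in PATTERNS}
    for line in lines:
        label = classify(line)
        if label is not None:
            hits[label].append(line.rstrip("\n"))
    return hits


# Illustrative lines only, not taken from the system in this thread.
example = [
    "Under-voltage detected! (0x00050005)",
    "blk_update_request: I/O error, dev sda, sector 2048",
    "systemd[1]: Started Session 1.",
]
for label, found in scan(example).items():
    print(label, len(found))
```

If most hits land in the "power" bucket shortly before a freeze, the supply or lead is the first suspect; a run of "io" hits points more towards the SSD, its adapter, or the cable.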

But from experience, I’d say stick a bigger USB charger on it and replace the lead. Four out of five times, RPi instability is fixed that way. If that doesn’t fix it, then it could be overheating, the SSD, or a lack of memory. I’ve never had HA itself lock up.

I just checked that the SSD was firmly plugged into the USB3 socket and had a freeze - so I am wondering if it is just the cable. Swapped it over to a new one, so I’ll see.

I didn’t try to SSH in, but will next time.

This is what I am getting on Glances:

Warning or critical alerts (last 3 entries)
2022-11-04 10:17:59 (00:01:00) - CRITICAL on CPU_IOWAIT (33.0)
2022-11-04 8:54:59 (00:01:00) - CRITICAL on CPU_IOWAIT (30.3)
2022-11-03 20:38:39 (00:01:39) - WARNING on CPU_USER (75.6)

|DISK I/O| R/s| W/s|
|----|----|----|
|sda|33K|62K|
|sda1|0|0|
|sda2|0|0|
|sda3|0|0|
|sda4|0|0|
|sda5|0|0|
|sda6|0|0|
|sda7|0|0|
|sda8|33K|62K|
|zram0|0|0|
|zram1|0|0|
|zram2|0|0|
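
High CPU_IOWAIT means the CPU was sitting idle waiting for disk I/O to finish, which fits a storage or USB problem. A small Python sketch for tallying pasted Glances alerts by metric follows; the regular expression assumes the exact line layout shown above, and the function names are made up for this example.

```python
import re
from collections import Counter

# Assumes the layout: "<date> <time> (<duration>) - <LEVEL> on <METRIC> (<value>)"
ALERT = re.compile(
    r"^(?P<start>\d{4}-\d{2}-\d{2} \d{1,2}:\d{2}:\d{2}) "
    r"\((?P<duration>\d{2}:\d{2}:\d{2})\) - "
    r"(?P<level>\w+) on (?P<metric>\w+) \((?P<value>[\d.]+)\)$"
)


def parse_alerts(text):
    """Return one dict per alert line that matches the assumed layout."""
    alerts = []
    for line in text.splitlines():
        match = ALERT.match(line.strip())
        if match:
            alert = match.groupdict()
            alert["value"] = float(alert["value"])
            alerts.append(alert)
    return alerts


def count_by_metric(alerts):
    """Count how many alerts were raised for each metric."""
    return Counter(alert["metric"] for alert in alerts)


# The three alert lines quoted above.
sample = """\
2022-11-04 10:17:59 (00:01:00) - CRITICAL on CPU_IOWAIT (33.0)
2022-11-04 8:54:59 (00:01:00) - CRITICAL on CPU_IOWAIT (30.3)
2022-11-03 20:38:39 (00:01:39) - WARNING on CPU_USER (75.6)
"""
alerts = parse_alerts(sample)
print(count_by_metric(alerts))
```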

Did you find a way to check SSD health under HA OS?

I was facing the same issues and behavior. If your system hangs when trying to load the Supervisor logs, it might be that you have either an insufficient power supply or a UAS-incompatible SSD adapter.

For a power supply that is too weak, obviously there is only a hardware solution :wink: For UAS-related problems, see: (Raspberri Pi 4) 9 steps howto get both HASSIO boot & data run over an SSD
There might be more intuitive instructions around.
Basically: confirm that your system hangs when accessing the Supervisor logs.
Then check over SSH whether UAS is being ignored for your drive. If it isn’t, force the kernel to ignore it by adding the drive’s vendor ID and product ID as a quirk in cmdline.txt.
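
To make the quirk step concrete, here is a minimal Python sketch that builds the `usb-storage.quirks=VID:PID:u` kernel parameter (the `u` flag tells the usb-storage driver to ignore UAS) and merges it into the single line that cmdline.txt holds. The IDs `152d:0578` and the sample cmdline content are placeholders, not values from this thread; take the real IDs from `lsusb`. The function names, and the choice to put the parameter at the start of the line, are assumptions for this example.

```python
KEY = "usb-storage.quirks="


def quirk_entry(vendor_id, product_id):
    """Format 'vvvv:pppp:u' from hex IDs given with or without a 0x prefix."""
    return f"{int(vendor_id, 16):04x}:{int(product_id, 16):04x}:u"


def add_quirk(cmdline, vendor_id, product_id):
    """Return the cmdline with the quirk added, leaving other options alone."""
    entry = quirk_entry(vendor_id, product_id)
    tokens = cmdline.split()
    for i, token in enumerate(tokens):
        if token.startswith(KEY):
            # A quirks parameter already exists: append to its comma list.
            existing = [e for e in token[len(KEY):].split(",") if e]
            if entry not in existing:
                existing.append(entry)
            tokens[i] = KEY + ",".join(existing)
            break
    else:
        # No quirks parameter yet: add one at the start of the line.
        tokens.insert(0, KEY + entry)
    # cmdline.txt must stay on a single line.
    return " ".join(tokens)


# Placeholder cmdline content and placeholder IDs, for illustration only.
before = "console=tty1 root=PARTUUID=abcd rootwait"
print(add_quirk(before, "152d", "0578"))
```

Running the function twice with the same IDs leaves the line unchanged, so it is safe to re-apply. Back up the original cmdline.txt before editing it.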

For me, having the same issues, that solved it :slight_smile:

No - I don’t know of any way to check SSD health under HA on an RPi.

It might have been the power supply, given I was running the SSD through the USB 3 port, but even swapping power supplies did not fix the problem.

So - I bit the bullet and bought a 2nd hand thin client (Lenovo M900) and put HA in Proxmox. It’s running without a hitch. I’m now looking for RPi 4 projects!

I found that when using local calendars, adding a “Halloween” event to every day in October, from the first to the last day (so 30 appointments), made HA crash. After removing the local calendar, HA is stable as a rock again.