HA hangs and does not respond

Hi all, I am using Home Assistant (core-2022.3.7, supervisor-2022.05.0) on Docker. It's on a Raspberry Pi 4 with an SD card.

For the last couple of weeks I have been having a strange problem: my HA hangs completely and I cannot access it. The only way out is to reboot it. While it is hung, I can ping the system but cannot SSH into it.

I can't see from the logs what caused this, as the logs seem to be kept for only a short time and are usually erased when the system boots up.

I was wondering how to troubleshoot or diagnose this issue. It usually happens about once every 24 hours.

Any help will be appreciated.

Thanks

Only a guess: maybe the supervisor version doesn’t like the stale HA core version.

Thanks. What do you mean by stale? Do you think this version is unstable and I need to upgrade to the next version?
Happy to try updating to a new version.

It means that your HA version is not the latest available one :slight_smile: We are close to 2022.5.

Cheers, I have updated now; hopefully this fixes it.
I will report back in a week's time.
Thanks

:+1:t2:

Other reasons could be power outages or a corrupted SD card.

Yeah, corrupt SD card was my first guess too.

Good points.
We don't have power outages here.
A corrupted SD card is a possibility.
I did try checking with the following and found no issues. Do let me know if there are better ways to check for corruption.

time badblocks -sv /dev/mmcblk1p1 -o mmcblk1p1.log
Checking blocks 0 to 306175
Checking for bad blocks (read-only test): done
Pass completed, 0 bad blocks found. (0/0/0 errors)

A defective power cable or a loose contact can sometimes be enough.

What is your Python version?

Python 3.7.3

Is it possible to store logs for longer, to see what errors were there before the system halted?

I checked the power cable and it appears to be new.
Secondly, I have placed my hardware in a secure location so the kids cannot touch it.
I even have to use my Kasa smart switch to reboot it, as it is physically complicated to reach :slight_smile:

Don’t you have something like

WARNING (MainThread) [homeassistant.bootstrap] Support for the running Python version 3.7.3 is deprecated and will be removed in the first release after December 7, 2020. Please upgrade Python to 3.8.0 or higher.

in your logs?

What do you mean by storing logs for longer? You can go to your configuration directory → home-assistant.log and home-assistant.log.1 files

No, I don't have anything like this in the logs. Mostly it is ADB-related issues (Nvidia Shield).

Yeah, thanks for the names of the log files, but I don't usually see what happened around the time the system hung. These files usually only contain data from after I reboot. For example, at 1am the system was not responding, so I did a remote reboot.

Anything prior to that is erased.

Hi, I am still having the same issue,
even though I am now updated to the latest version of Home Assistant.
I was wondering if there is a way to scan my SD card for errors.
I tried the fsck command, but it reports that the file system is mounted.
Thanks

You can start by running sudo dmesg to see kernel messages. Typically when errors are starting to occur they will end up there.

There’s also the badblocks tool that you can run, which will run in read-only mode by default so can be used on a mounted file system.
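As a rough sketch of how to narrow the kernel messages down: the helper below keeps only lines that often accompany storage, power or hang problems. The function name `filter_kernel_trouble` and the pattern list are my own illustration (not exhaustive, and not something from this thread), so adjust them to whatever your system prints.

```shell
# Hypothetical helper: keep only kernel log lines that commonly point to
# SD card, power, memory or hung-task trouble. Reads log text on stdin.
filter_kernel_trouble() {
  # -i: ignore case, -E: extended regex; "|| true" so that finding
  # nothing is not treated as a failure by the caller.
  grep -iE 'i/o error|ext4-fs error|mmc[0-9]+:.*(error|timeout)|under-voltage|blocked for more than|out of memory' || true
}
```

Typical use would be `sudo dmesg -T | filter_kernel_trouble`. Keep in mind that dmesg only covers the current boot, so after a forced reboot the messages from just before the hang are no longer in it; `journalctl -k -b -1 | filter_kernel_trouble` can show the previous boot's kernel messages, provided the journal is stored on disk.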

Nothing is erased. Ignore the home-assistant.log files in /config, they are irrelevant. That’s just a file where core replicates logs since the last restart for convenience.

Your real log on a system with supervisor is the system journal. It contains everything from all containers and the system itself, and is persistent across many restarts.

I wrote a guide on how to access it. Take a look and start there, hopefully that can give you some insight into what’s going on.
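For reference, a minimal sketch of what looking at the journal from before a reboot can look like with plain journalctl. The two function names are hypothetical, and this assumes a systemd host where journald uses its default `Storage=auto` setting; in that mode logs survive a reboot only if the directory `/var/log/journal` exists, which is not the case on every Debian-based install.

```shell
# Hypothetical helper: succeed if journald has an on-disk journal directory.
# With Storage=auto, logs are kept across reboots only when it exists.
journal_is_persistent() {
  dir=${1:-/var/log/journal}
  [ -d "$dir" ]
}

# Hypothetical helper: show the last lines of the boot before the current
# one, i.e. whatever was logged just before the hang and forced restart.
# The optional argument is the number of lines (default 200).
show_previous_boot_tail() {
  journalctl --boot=-1 --lines="${1:-200}" --no-pager
}
```

Run `journal_is_persistent && journalctl --list-boots` first. If more than one boot is listed, `show_previous_boot_tail` should print the end of the boot that hung. If the check fails, creating `/var/log/journal` (as root) and restarting `systemd-journald` is the usual way to start keeping logs on disk, though you would then have to wait for the next hang to capture anything.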

Thanks for your reply.
dmesg comes up with warnings only.

badblocks output is fine.

root@HomeAssistant4:/home/baba# fdisk -l
Disk /dev/mmcblk1: 59.6 GiB, 64021856256 bytes, 125042688 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disklabel type: dos
Disk identifier: 0xffd90dc1

Device         Boot  Start       End   Sectors  Size Id Type
/dev/mmcblk1p1        2048    614399    612352  299M  c W95 FAT32 (LBA)
/dev/mmcblk1p2      614400 125042687 124428288 59.3G 83 Linux
root@HomeAssistant4:/home/baba#

root@HomeAssistant4:/home/baba# time badblocks -sv /dev/mmcblk1p1 -o mmcblk1p1.log
Checking blocks 0 to 306175
Checking for bad blocks (read-only test): done
Pass completed, 0 bad blocks found. (0/0/0 errors)

real    0m9.978s
user    0m0.010s
sys     0m0.172s
root@HomeAssistant4:/home/baba# time badblocks -sv /dev/mmcblk1p2 -o mmcblk1p2.log
Checking blocks 0 to 62214143
Checking for bad blocks (read-only test): done
Pass completed, 0 bad blocks found. (0/0/0 errors)

real    35m53.264s
user    0m2.205s
sys     0m36.631s
root@HomeAssistant4:/home/baba#

Thank you,
I did connect over SSH and ran journalctl.
There is a funny thing: I get journal messages from Feb 14 and then from today, May 12, with nothing in between. Do you think this is normal?
Even the messages from today are from after the reboot, which was around 17:24.

Feb 14 10:12:01 HomeAssistant4 kernel: bluetooth hci0: firmware: failed to load brcm/BCM4345C0.hcd (
Feb 14 10:12:01 HomeAssistant4 kernel: bluetooth hci0: firmware: failed to load brcm/BCM.hcd (-2)
Feb 14 10:12:01 HomeAssistant4 kernel: Bluetooth: hci0: BCM: firmware Patch file not found, tried:
Feb 14 10:12:01 HomeAssistant4 kernel: Bluetooth: hci0: BCM: 'brcm/BCM4345C0.hcd'
Feb 14 10:12:01 HomeAssistant4 kernel: Bluetooth: hci0: BCM: 'brcm/BCM.hcd'
Feb 14 10:12:01 HomeAssistant4 systemd[1]: Starting Network Time Synchronization...
Feb 14 10:12:01 HomeAssistant4 systemd[1]: Starting Raise network interfaces...
Feb 14 10:12:01 HomeAssistant4 systemd[1]: Started Update UTMP about System Boot/Shutdown.
Feb 14 10:12:01 HomeAssistant4 systemd-timesyncd[284]: System clock time unset or jumped backwards,
May 12 17:24:17 HomeAssistant4 systemd[1]: Started Network Time Synchronization.
May 12 17:24:17 HomeAssistant4 systemd[1]: Reached target System Initialization.
May 12 17:24:17 HomeAssistant4 systemd[1]: Listening on D-Bus System Message Bus Socket.
May 12 17:24:17 HomeAssistant4 systemd[1]: Starting Home Assistant OS Agent..
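One possible reading of the excerpt above, offered as a guess rather than a diagnosis: the Pi 4 has no battery-backed clock, so early boot messages can carry a stale date until time synchronization corrects it, and the systemd-timesyncd lines sit right at the jump. If so, Feb 14 and May 12 would belong to the same boot. The sketch below (the function name `find_date_jumps` is made up for illustration) prints each place where the date stamp changes between two consecutive lines, so you can see what sits on either side of a jump. It will also flag ordinary midnight rollovers.

```shell
# Hypothetical helper: print each pair of consecutive journal lines whose
# "Mon DD" stamp differs. Reads journalctl default (short) output on stdin.
find_date_jumps() {
  awk '
    /^--/ { next }    # skip journalctl banner lines such as "-- Reboot --"
    seen && ($1 != mon || $2 != day) {
      print prev      # last line before the date changed
      print $0        # first line after the date changed
      print "----"
    }
    { mon = $1; day = $2; prev = $0; seen = 1 }
  '
}
```

For example: `journalctl -b --no-pager | find_date_jumps`. Viewing the same boot with `journalctl -b -o short-monotonic` shows time since boot instead of wall-clock time, which should make it clear whether the two dates are only seconds apart.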



No that is very odd, I’ve never seen anything like that. I have tons of stuff in my journal all the time. Logs from every container as well as kernel, audit and all the system stuff go in there so I don’t see how that could be. You aren’t applying a filter there?

Although I just noticed from one of your earlier screenshots it says you are running an unsupported installation. Is the reason related to the journal by any chance? If not what is it?

No, I am not applying any filters; it's just a bare journalctl command.
I have also reset the journal today (I am assuming it was corrupt).
If this does not work out, I am going to move to a ‘supported install’ on a USB drive (boot disk).
Regards