Core = 2024.5.5
Supervisor = 2024.05.01
OS = 12.3
Hardware = Raspberry Pi 5 8GB, Waveshare PoE HAT
USB = None
I’ve been running HA for a number of years now, alternating between running it in a container and in a VM. As I am moving to a new home shortly, I decided to start fresh on an 8GB Pi 5 with the Waveshare PoE HAT. The original plan was to use a 64GB SanDisk Endurance card that I had, with the DB on a separate system. However, even with a completely fresh install (no add-ons besides the default Matter one, no integrations besides the defaults), HA Core would restart every ‘x’ amount of time (where x was completely random).
Power draw reported by the switch was ~5.1 watts at idle, with some spikes to 7.9 watts. Temperatures were consistently around the mid-40s. And of course, there was no consistent error in the logs to indicate where the problem might lie. Still, I did my due diligence. I removed the HAT and used the proper PSU: same behaviour. I removed the board from the case (for better air flow): same behaviour. I replaced the SD card with a new Samsung: same behaviour. I also enabled “debug” in the configuration.yaml file.
Frankly, I was prepared to throw in the towel, but I decided to try once more: fresh install, no USB devices, running off the SD card using the PoE HAT, and with debug enabled in configuration.yaml. The behaviour returned, but this time I saw two errors repeated in the host log:
homeassistant kernel: get_swap_device: Bad swap offset entry 00100000
homeassistant systemd-journald[11040]: Missed X kernel messages
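For anyone who wants to quantify how often these two messages show up, here is a minimal Python sketch. It assumes the host log has been copied out as plain text lines; the function name `summarize_host_log` and the numeric count in the sample line are hypothetical and only there for illustration.

```python
import re
from collections import Counter

# Patterns modelled on the two host-log messages quoted above.
SWAP_RE = re.compile(r"get_swap_device: Bad swap offset entry (\w+)")
MISSED_RE = re.compile(r"Missed (\d+) kernel messages")


def summarize_host_log(lines):
    """Count swap-offset errors per entry value and sum the missed kernel messages."""
    swap_entries = Counter()
    missed_total = 0
    for line in lines:
        match = SWAP_RE.search(line)
        if match:
            swap_entries[match.group(1)] += 1
            continue
        match = MISSED_RE.search(line)
        if match:
            missed_total += int(match.group(1))
    return swap_entries, missed_total


if __name__ == "__main__":
    # Hypothetical sample; the "12" stands in for the X in the real log line.
    sample = [
        "homeassistant kernel: get_swap_device: Bad swap offset entry 00100000",
        "homeassistant systemd-journald[11040]: Missed 12 kernel messages",
    ]
    entries, missed = summarize_host_log(sample)
    print(dict(entries), missed)
```

Seeing whether the same offset value repeats every time, or whether it varies, may help narrow down where the bad entry comes from.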
Suspecting it might be an issue with swap (despite it not being used), and confirming via Google, I disabled the swap. I will note here that this wasn’t meant to be a “fix”, as I suspected that HA requires a swap to exist, even if not used. This was rather a “nothing else has helped, let’s see what happens”. What happened was surprising. The system has now been up for the past 22 hours and seems to be working just fine. The only thing that doesn’t appear to be working is logging: the only two logs being generated are core and host. Also, the kernel error above is still repeated ad nauseam in the host log.
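One way to confirm that swap really is off is to look at `/proc/swaps`, the standard Linux listing of active swap areas. A minimal Python sketch follows; the helper name `active_swap_devices` is hypothetical, and it assumes the script runs somewhere that can actually see the host's `/proc/swaps`, which may not be the case from inside an add-on container.

```python
def active_swap_devices(proc_swaps_text):
    """Parse /proc/swaps-style text into (name, size_kib, used_kib) tuples."""
    devices = []
    # The first line is the column header: Filename Type Size Used Priority.
    for line in proc_swaps_text.splitlines()[1:]:
        fields = line.split()
        if len(fields) < 5:
            continue  # skip blank or malformed lines
        devices.append((fields[0], int(fields[2]), int(fields[3])))
    return devices


if __name__ == "__main__":
    try:
        with open("/proc/swaps") as handle:
            found = active_swap_devices(handle.read())
    except OSError:
        found = None  # not Linux, or /proc is not readable here

    if found is None:
        print("could not read /proc/swaps")
    elif not found:
        print("no active swap")
    else:
        for name, size_kib, used_kib in found:
            print(f"{name}: {used_kib} KiB used of {size_kib} KiB")
```

An empty list means no swap area is currently active; a non-empty one with zero used would match the "exists but not used" situation described above.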
I installed System Monitor as well as Advanced Terminal to get a better view of what’s happening. htop is showing one core completely pinned and a second core at about 50% usage, but I can’t see any indication of which process is the culprit. If I were to hazard a guess, it’s probably Supervisor trying to restart Core (despite Core functioning just fine)… but I have no idea.
Anyway, I know this is a long post, but has anybody seen this before? Are there any other steps I can take?
Thanks in advance!
EDIT - Rereading this, I want to clarify that the system was always “up”. The stability issues were that Supervisor was restarting core every “x” amount of time. The system itself never unexpectedly rebooted.