I have my Home Assistant OS installed on a Raspberry Pi 4 (8GB). I’m using the official PoE+ hat on the Pi, attached to a 1Gb PoE+ port on my switch.
What happens
I’ve noticed this happen twice now. My devices stop working, and when I try to connect to Home Assistant, it’s basically dead:
Port 8123 isn’t active.
I can’t access it from Nabu Casa’s Home Assistant Cloud.
It shows up in my UniFi controller and is still pulling power over Ethernet (close to 7W, the normal operating wattage).
Recent changes
The only thing I’ve changed recently:
Upgrading the OS and Home Assistant. I do this regularly.
Swapping from a 32GB microSD card to a USB 3.0 NVMe boot drive. I changed the boot loader to use USB rather than microSD. I eventually plan to change to a Raspberry Pi 5 with NVMe; that’s why I’m using this old 500GB Samsung drive I had lying around.
Why does this happen?
How do I figure out what caused the issue? I don’t know how to use Home Assistant logs to find what may have occurred prior to a restart or what locked it up. Is there a way to get notified if there’s a memory leak or the drive becomes inaccessible or something else?
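A hung Home Assistant can't send its own alert, so any check has to run on a different machine. Below is a minimal sketch of such an external watchdog, assuming a second always-on computer on the same network. It only records when port 8123 stops and starts answering, which at least gives a timestamp to line up against switch and controller logs. The hostname in the usage comment and the function names are placeholders, not anything Home Assistant provides.

```python
import socket
import time
from datetime import datetime
from typing import Optional


def port_is_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name-resolution failures.
        return False


def watch(host: str, port: int = 8123, interval: float = 30.0,
          checks: Optional[int] = None) -> None:
    """Print a timestamped line each time the port changes state.

    checks=None polls forever; a number limits the loop (handy for testing).
    """
    last_state = None
    done = 0
    while checks is None or done < checks:
        state = port_is_open(host, port)
        if state != last_state:
            stamp = datetime.now().isoformat(timespec="seconds")
            print(f"{stamp} {host}:{port} is {'UP' if state else 'DOWN'}")
            last_state = state
        done += 1
        if checks is None or done < checks:
            time.sleep(interval)


# Example (hypothetical hostname; replace with the Pi's address):
# watch("homeassistant.local")
```

If the port goes DOWN while the switch still shows the link up and drawing power, that points at the Pi or its storage rather than the network path.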
The only thing you have changed is the NVMe drive, so my guess is that the power supply can’t cope with it. Try a powered USB hub, as the Pi’s USB ports are sometimes not good at powering drives.
… or it may be easier to swap back to the microSD card for a period of time to see if the problems stop occurring. But I agree with Arh’s guess that the device is drawing more power than the Pi can cope with. And note that if you do switch to a Pi 5, it needs a higher-power supply.
I can try that if this happens again. I went ahead and upgraded Home Assistant for this week’s release. Hopefully that issue stops.
I assumed it was the PSU, but the PoE+ Pi Hat can provide 25.5W of power. I’m using less than 8W.
I don’t think it spikes to over 25W; then again, I think USB can pull up to 5A (25W) with the Type A cable, and this is a Type C adapter connected via a Type A port. NVMe drives take between 12 and 18W for high-performance models. This is only a PCIe gen 3 drive.
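As a rough sanity check on these numbers, the arithmetic can be sketched as below. The 25.5W, 8W and 12-18W figures are the ones quoted in this thread. The 1.2A shared limit for the Pi 4's USB ports is an assumption on my part (it is the commonly cited figure, but worth verifying against the Raspberry Pi documentation). The function names are just for illustration.

```python
USB_VOLTS = 5.0
PI4_USB_TOTAL_AMPS = 1.2  # ASSUMPTION: shared current limit across the Pi 4's USB ports


def watts(volts: float, amps: float) -> float:
    """Electrical power in watts."""
    return volts * amps


def fits(budget_w: float, draw_w: float) -> bool:
    """True if the draw stays within the budget."""
    return draw_w <= budget_w


poe_budget_w = 25.5                # PoE+ figure quoted in the thread
observed_draw_w = 8.0              # upper bound of what the switch reports
drive_peak_range_w = (12.0, 18.0)  # thread's range for high-performance NVMe drives

usb_budget_w = watts(USB_VOLTS, PI4_USB_TOTAL_AMPS)

print(f"PoE+ headroom above observed draw: {poe_budget_w - observed_draw_w:.1f} W")
print(f"USB bus-power budget (assumed):    {usb_budget_w:.1f} W")
for peak_w in drive_peak_range_w:
    verdict = "within" if fits(usb_budget_w, peak_w) else "over"
    print(f"Drive peak of {peak_w:.0f} W is {verdict} the USB budget")
```

The point of the sketch is that the total PoE budget and the USB port budget are separate limits. The average draw at the switch can look fine while a short drive spike still exceeds what the USB ports can supply, which is why a powered hub is a cheap thing to try.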
I’m not overclocking either. I wonder if something else is going on or if this PoE+ Hat is defective. I had one of my other PoE+ hats stop working from this same order, so it’s possible.
I had something similar with a Raspberry Pi 4 8GB, totally unrelated to Home Assistant. The symptoms disappeared after switching off the Ethernet power-saving feature (or was it by reducing the link from 1G to 100M? Sorry, I can’t check, as it is located in another country behind a CGNAT).
You can find similar reports on the Raspberry Pi bug tracker, and it seems to be triggered by an incompatibility with the switch it is connected to (mine is connected to a UniFi Dream Machine Pro). Because it doesn’t occur that frequently, it was never really acknowledged as a bug, and as a consequence there is no solution other than this workaround.
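For anyone wanting to try the same workaround, here is a minimal sketch of the two candidates expressed as `ethtool` commands. It assumes the interface is named `eth0`, that `ethtool` is installed, and that you have root shell access to the host (on Home Assistant OS that is not available by default). Whether the Pi's Ethernet driver accepts both settings is also an assumption. The script only prints the commands unless `dry_run` is turned off.

```python
import shlex
import subprocess
from typing import List


def eee_off_cmd(iface: str) -> List[str]:
    """Command to disable Energy-Efficient Ethernet (the power-saving feature)."""
    return ["ethtool", "--set-eee", iface, "eee", "off"]


def limit_100m_cmd(iface: str) -> List[str]:
    """Command to advertise only 100 Mbit/s full duplex during autonegotiation."""
    return ["ethtool", "-s", iface, "speed", "100", "duplex", "full", "autoneg", "on"]


def apply(commands: List[List[str]], dry_run: bool = True) -> None:
    """Print each command; run it only when dry_run is False (needs root)."""
    for cmd in commands:
        print(shlex.join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)


IFACE = "eth0"  # ASSUMPTION: typical name of the Pi's wired interface
apply([eee_off_cmd(IFACE), limit_100m_cmd(IFACE)])
```

Trying one setting at a time makes it clear which of the two actually helps. Neither survives a reboot unless it is made persistent some other way.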
I hope this post helps you - I have spent many months tracking it down!!
The backup is a particularly intensive use of resources, including CPU for the encryption. Can you get any further clues if you connect the Pi to a monitor?
Also, if you can’t back up, can you shut it down cleanly and duplicate the disk outside of the OS (perhaps on a PC)?
I could connect a portable HDMI monitor I have for sure. I haven’t done that yet, but that’s another option.
At the moment, it hasn’t happened since my posting.
Another similar issue
At the same time, I’ve also been losing connection to my Ubiquiti controller which is one of their CloudKey Gen2+ devices. I’m wondering now if something else is afoot. That hasn’t happened in a bit over a week at this point. It literally loses network access even though it has a static IP and is still powered on via PoE with no front-panel errors other than the “no cloud” icon. Same issue as Home Assistant, it’s inaccessible.
PoE Switch?
And it wasn’t the PoE switch. I tried swapping it between different PoE ports on different switches to no avail. I did plug a USB-C power adapter in the back, but when you use the rackmount module, I’m 99% sure that port is inactive.
Two completely unrelated switches
One other thing I changed is the removal of my UniFi Enterprise PoE 8 switches in favor of the new 2.5G PoE model.
Even though those switches are in different parts of the house, it’s relevant because those Enterprise switches have caused issues on my network before. One time, they were flooding the network with fake MAC addresses: thousands. A restart fixed it, but I’ve actually had to manually power cycle them before when devices suddenly disappeared.
Another idea
Part of that, I think, is also related to the VCE-branded (Chinese no-name company) Ethernet terminators I used when I moved in. Those connectors suck, but I haven’t replaced them all yet. Any jacks that had issues in the past have been fine since I changed them out.
I’m waiting
At this point, I’m gonna wait to see if it happens again. I already bought a 128GB long-write microSD card if I have to go back. I hope I don’t have to use it. It looks to me as though automatic backups are working, and they’re including the full backup suite too.