Home Assistant Pi randomly becomes inaccessible

I have Home Assistant OS installed on a Raspberry Pi 4 (8GB). I'm using the official PoE+ HAT on the Pi, attached to a 1Gb PoE+ port on my switch.

What happens

I've noticed this happen twice now. My devices stop working, and when I try to connect to Home Assistant, it's basically dead:

  1. Port 8123 isn’t active.
  2. I can’t access it from Nabu Casa’s Home Assistant Cloud.
  3. It shows up in my UniFi controller and is still pulling power over Ethernet (close to 7W, the normal operating wattage).
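Because a hang like this is only noticed when devices stop working, it can help to record exactly when port 8123 stops answering. Below is a minimal sketch, assuming Python 3 is available on some other always-on machine on the LAN. It only checks that a TCP connection to the port opens; it does not prove Home Assistant itself is healthy. The polling interval is an arbitrary choice.

```python
import socket
import time
from datetime import datetime


def port_is_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False


def watch(host: str, port: int = 8123, interval: float = 60.0) -> None:
    """Poll host:port forever, printing a timestamped line whenever its state changes."""
    last_state = None
    while True:
        state = port_is_open(host, port)
        if state != last_state:
            stamp = datetime.now().isoformat(timespec="seconds")
            print(f"{stamp} {host}:{port} {'UP' if state else 'DOWN'}", flush=True)
            last_state = state
        time.sleep(interval)
```

For example, `watch("192.168.1.50")` (a placeholder address; substitute the Pi's IP) prints one line per UP/DOWN transition, so the DOWN timestamp can be compared with the last entries in the logbook.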

Recent changes

The only thing I’ve changed recently:

  1. Upgrading the OS and Home Assistant. I do this regularly.
  2. Swapping from a 32GB microSD card to a USB 3.0 NVMe boot drive. I changed the boot loader to use USB rather than microSD. I eventually plan to change to a Raspberry Pi 5 with NVMe; that’s why I’m using this old 500GB Samsung drive I had lying around.

Why does this happen?

How do I figure out what caused the issue? I don’t know how to use Home Assistant logs to find what may have occurred prior to a restart or what locked it up. Is there a way to get notified if there’s a memory leak or the drive becomes inaccessible or something else?
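One way to approach the notification question is to check available memory and whether the boot drive is still readable. Home Assistant's System Monitor integration can expose memory and disk sensors that an automation could alert on, which may be the simpler route. The sketch below only illustrates the checks themselves, assuming a Linux host with `/proc/meminfo`; the 10% thresholds are arbitrary. Note that a check running on the Pi cannot alert anyone once the Pi is fully hung, so periodic logging is mainly useful for spotting a trend before the hang.

```python
import shutil
from pathlib import Path
from typing import Dict, List, Optional


def parse_meminfo(text: str) -> Dict[str, int]:
    """Parse /proc/meminfo-style text into {field name: value in kB}."""
    values = {}
    for line in text.splitlines():
        key, _, rest = line.partition(":")
        parts = rest.split()
        if parts:
            values[key.strip()] = int(parts[0])
    return values


def memory_warning(info: Dict[str, int], min_available_pct: float = 10.0) -> Optional[str]:
    """Return a warning if MemAvailable drops below the given share of MemTotal."""
    pct = 100.0 * info["MemAvailable"] / info["MemTotal"]
    if pct < min_available_pct:
        return f"only {pct:.1f}% of RAM available"
    return None


def disk_warning(path: str, min_free_pct: float = 10.0) -> Optional[str]:
    """Return a warning if `path` cannot be read or its filesystem is nearly full."""
    try:
        usage = shutil.disk_usage(path)
    except OSError as exc:
        return f"{path} is not accessible: {exc}"
    free_pct = 100.0 * usage.free / usage.total
    if free_pct < min_free_pct:
        return f"{path} has only {free_pct:.1f}% free"
    return None


def check_host(meminfo_path: str = "/proc/meminfo", disk_path: str = "/") -> List[str]:
    """Collect all current warnings for this (Linux) host."""
    warnings = []
    meminfo = Path(meminfo_path)
    if meminfo.exists():
        warning = memory_warning(parse_meminfo(meminfo.read_text()))
        if warning:
            warnings.append(warning)
    warning = disk_warning(disk_path)
    if warning:
        warnings.append(warning)
    return warnings
```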


I looked at my logbook. I turned Home Assistant back on at 11:30 a.m. The last time I saw events occur was at 8:17 a.m.

The only thing you've changed is the NVMe drive, so my guess is the power supply can't cope with it. Try a powered USB hub; the Pi's USB ports sometimes can't supply enough power for a drive.

…or it may be easier to swap back to the microSD card for a while to see if the problems stop. But I agree with Arh's guess that the device is drawing more power than the Pi can cope with. Also note that if you do switch to a Pi 5, it needs a higher-rated power supply.

The Raspberry Pi Power Supply Checker integration in Home Assistant may also show something.

I can try that if this happens again. I went ahead and upgraded Home Assistant to this week's release. Hopefully that stops the issue.

I assumed it was the PSU, but the PoE+ HAT can provide 25.5 W of power, and I'm using less than 8 W:

[screenshot: PoE power draw]

I don't think it spikes above 25 W. Then again, I think USB can pull up to 5 A (25 W) with a Type-A cable, and this is a Type-C adapter connected to a Type-A port. High-performance NVMe drives draw between 12 and 18 W, but this is only a PCIe Gen 3 drive.

I'm not overclocking either. I wonder if something else is going on or if this PoE+ HAT is defective. Another PoE+ HAT from the same order already stopped working, so it's possible.

It might not be the PoE, but rather the power the Pi's USB ports can supply. It is sometimes not enough to power a drive.


Not sure if this helps, but the r/raspberry_pi thread "POE+ HAT - not able to use full power" indicates that the USB ports share a maximum of 1.2 A across all of them.
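The arithmetic behind this is worth spelling out: the 25.5 W HAT budget and the under-8 W total reading say nothing about how much the USB ports themselves can deliver. Taking the 1.2 A shared figure from that thread and the nominal 5 V USB supply (both assumptions, not measurements), the budget is P = V × I ≈ 6 W for everything on USB combined. The drive's real peak draw is not known here; short spikes may also not show up in a sampled PoE wattage reading.

```python
# Assumed figures: ~1.2 A shared across all USB ports (from the linked thread)
# at the nominal 5 V USB supply voltage.
USB_VOLTAGE_V = 5.0
USB_SHARED_CURRENT_A = 1.2


def usb_budget_watts(voltage_v: float = USB_VOLTAGE_V,
                     current_a: float = USB_SHARED_CURRENT_A) -> float:
    """Power (W) available to all USB devices combined: P = V * I."""
    return voltage_v * current_a


def usb_headroom_watts(drive_peak_w: float) -> float:
    """Watts left on the shared USB budget; negative means the drive can exceed it."""
    return usb_budget_watts() - drive_peak_w
```

With these assumptions `usb_budget_watts()` is about 6 W, so a drive peaking anywhere near the 12 W mentioned above would give negative headroom, regardless of how much the PoE+ HAT can supply to the board.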


Interesting. Not sure yet if that’s the issue. The max power draw is still under 8W.

I haven't had the issue since, so I wonder if it was that particular update. We'll see.

I had something similar with an rPi 4 8GB, totally unrelated to Home Assistant. The symptoms disappeared after I switched off the Ethernet power-saving feature (or was it reducing the link from 1G to 100M? Sorry, I can't check, as the Pi is in another country behind CGNAT).
You can find similar reports on the Raspberry Pi bug tracker, and it seems to be triggered by an incompatibility with the switch the Pi is connected to (mine is connected to a UniFi Dream Machine Pro). Because it doesn't occur very often, it was never really acknowledged as a bug, so there is no fix other than this workaround.
I hope this post helps you - I have spent many months tracking it down!!
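If the "power-saving feature" meant here is Energy-Efficient Ethernet (EEE), which is an assumption since the post doesn't name it, then on a Linux host with a shell and `ethtool` it is usually toggled with `ethtool --set-eee`. The sketch below just builds and optionally runs those commands; the interface name `eth0` is assumed, running it generally needs root, the change typically does not survive a reboot, and Home Assistant OS may not give you a host shell where this is readily possible.

```python
import shlex
import subprocess
from typing import List


def show_eee_cmd(iface: str = "eth0") -> List[str]:
    """ethtool command that prints the current EEE state of `iface`."""
    return ["ethtool", "--show-eee", iface]


def eee_off_cmd(iface: str = "eth0") -> List[str]:
    """ethtool command that disables Energy-Efficient Ethernet on `iface`."""
    return ["ethtool", "--set-eee", iface, "eee", "off"]


def run(cmd: List[str], dry_run: bool = True) -> int:
    """Print the command; only execute it when dry_run is False (usually needs root)."""
    print(shlex.join(cmd))
    if dry_run:
        return 0
    return subprocess.run(cmd, check=False).returncode
```

Running `show_eee_cmd()` first is a cheap way to confirm the interface supports EEE at all before changing anything.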

Thanks for that info. I don’t think it’s the switch because I had it on that switch for the last year no problem until recently.

It stopped working again this morning.

I think the issue has to do with automatic backups. One failed to run just before I lost connection.

I tried doing a full manual backup after power cycling, and it happened again.

Now I’m wondering if the drive could simply be bad. I have other NVMe drives I could test with.

Either way, I’ll grab another microSD card today with more capacity to see if that’ll fix it. But I can’t do a full backup, so that’s a problem.

A backup is a particularly resource-intensive operation, including CPU for the encryption. Do you get any further clues if you connect the Pi to a monitor?

Also, if you can't back up, can you shut it down cleanly and duplicate the disk outside of the OS (perhaps on a PC)?
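Duplicating the disk on a PC amounts to a block-for-block copy, the same job `dd` does. A minimal Python sketch is below, assuming the drive shows up on a Linux PC as a device such as `/dev/sdX` (a placeholder), is not mounted, and that the destination has room for a full image (the whole 500GB, not just the used space). It reads in chunks and hashes as it copies, so the image can be verified afterwards; a read error partway through would itself hint that the drive is failing.

```python
import hashlib


def sha256_of(path: str, chunk_size: int = 4 * 1024 * 1024) -> str:
    """Hash a file or block device in chunks so a large disk never has to fit in RAM."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def clone_to_image(src_path: str, dst_path: str, chunk_size: int = 4 * 1024 * 1024) -> str:
    """Copy src (e.g. an unmounted disk) to an image file; return the SHA-256 of the data read."""
    digest = hashlib.sha256()
    with open(src_path, "rb") as src, open(dst_path, "wb") as dst:
        for chunk in iter(lambda: src.read(chunk_size), b""):
            dst.write(chunk)
            digest.update(chunk)
    return digest.hexdigest()
```

Usage would look like `clone_to_image("/dev/sdX", "ha-nvme.img")` run as root, followed by checking that `sha256_of("ha-nvme.img")` returns the same hash.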

I could connect a portable HDMI monitor I have for sure. I haven’t done that yet, but that’s another option.

At the moment, it hasn’t happened since my posting.

Another similar issue

At the same time, I've also been losing connection to my Ubiquiti controller, which is one of their CloudKey Gen2+ devices. I'm wondering now if something else is afoot. That hasn't happened in a bit over a week at this point. It literally loses network access even though it has a static IP and is still powered on via PoE, with no front-panel errors other than the "no cloud" icon. Same issue as Home Assistant: it's inaccessible.

PoE Switch?

And it wasn't the PoE switch. I tried moving it to different PoE ports on different switches, to no avail. I did plug a USB-C power adapter into the back, but when you use the rackmount module, I'm 99% sure that port is inactive.

Two completely unrelated switches

One other thing I changed is the removal of my UniFi Enterprise PoE 8 switches in favor of the new 2.5G PoE model.

Even though those switches are in different parts of the house, it’s relevant because those Enterprise switches have caused issues on my network before. One time, they were flooding the network with fake MAC addresses: thousands. A restart fixed it, but I’ve actually had to manually power cycle them before when devices suddenly disappeared.
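A MAC flood like that shows up as one access port suddenly learning far more addresses than the devices behind it could account for. The sketch below assumes you can export (port, MAC) pairs from a switch's MAC address table or the controller; that export format and the threshold of 50 are hypothetical. Uplink ports legitimately see many MACs, so they would need to be excluded or given a higher threshold.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def macs_per_port(entries: Iterable[Tuple[str, str]]) -> Dict[str, int]:
    """Count distinct MAC addresses learned on each switch port (case-insensitive)."""
    seen = defaultdict(set)
    for port, mac in entries:
        seen[port].add(mac.lower())
    return {port: len(macs) for port, macs in seen.items()}


def suspicious_ports(entries: Iterable[Tuple[str, str]], max_macs: int = 50) -> List[str]:
    """Ports with more distinct MACs than max_macs, a rough sign of MAC flooding."""
    counts = macs_per_port(entries)
    return sorted(port for port, count in counts.items() if count > max_macs)
```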

Another idea

I think part of that is also related to the VCE-branded (a Chinese no-name company) Ethernet terminators I used when I moved in. Those connectors suck, but I haven't replaced them all yet. Jacks that had issues in the past haven't had any since I replaced them.

I’m waiting

At this point, I'm gonna wait to see if it happens again. I already bought a 128GB long-write microSD card in case I have to go back. I hope I don't have to use it. It looks to me as though automatic backups are working, and they include the full backup suite too.
