HA dies every morning

Since I moved my HA (2021.10.6) install from the SD card to an SSD (by restoring a full backup from the SD onto a fresh SSD install), I’ve had some issues.

The worst so far is that, for whatever reason, HA just dies every morning around 5-ish. The Rpi4 can still be pinged, but Lovelace, HomeKit control, Zigbee buttons, all of that is dead. I have to power-cycle the device. I haven’t tried to SSH into it yet, as I was anxious to get it up and running ASAP. I’ll try that the next time this happens.

The logs restart after I kickstart it back to life (which is weird, isn’t it?), so there’s nothing to go on. This has never happened, not once, with the SD card. It always happens in the wee hours, which would be kinda weird if it was, say, a power issue.

So, questions:

  1. what could this be?
  2. how can I make sure I have logs to go on? I don’t remember having logging problems until now. I don’t have a specific logger: setting in my configuration.yaml, nor do I use default_config:, so maybe that’s why? I’ll start by adding logger: default: warning (correctly formatted!).
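
For reference, a minimal sketch of that logger block as it would appear in configuration.yaml. The entry under logs: is only an illustrative assumption (recorder is picked as an example); the default: warning line alone matches what is described above. Note that this only sets verbosity and does not change how many old log files are kept.

```yaml
# configuration.yaml
logger:
  default: warning
  logs:
    # Assumption: example of raising verbosity for a single integration
    homeassistant.components.recorder: debug
```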

I’ve been on this install since 2019 without major issues until now. Thanks!

EDIT: some more info:

  • using the official 3A Rpi4 power supply; the only other thing plugged in is a CC2531 Zigbee stick
  • SSD enclosure has the recommended ASMedia chipset, SSD is Patriot brand
  • no voltage warnings in logs
  • no CPU/RAM/temp spikes before it dies
  • it’s plugged into a UPS, so the power should be fairly smooth

5-ish or 04:12?

That’s what time the database purges.
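
If you want to rule the purge in or out, one option is to switch off the automatic purge for a few nights and see whether the hang still occurs. A minimal sketch, assuming the recorder is configured in YAML; purge_keep_days: 10 is shown only because it is the documented default, not because it needs changing.

```yaml
# configuration.yaml
recorder:
  # Temporarily disable the nightly purge (it normally runs at 04:12 local time)
  auto_purge: false
  # Documented default, listed here for completeness
  purge_keep_days: 10
```

With auto_purge off the database keeps growing, so turn it back on afterwards or call the recorder.purge service by hand.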

Last logbook entry is 5:07, and then nothing until 7:17 when I manually powercycled the Pi. There are entries from 4 to 5 so that’s probably not it. Thanks!

Have a look at the file home-assistant.log.1 with a text editor. That is the system log from before your last reboot. Any low voltage warnings?

Adding an SSD will require more power.

I’ve rebooted once more since, so that is lost… How can I retain logs for longer? I’m looking at logger documentation but don’t see anything to that effect.

In logs that I do have available, no voltage warning. I use the official Rpi4 power supply.

Only the one previous log is saved. If you want more, it’s going to be a mess of file sensors and file notifications to create a copy of your log file. Not worth the effort. Just check home-assistant.log.1 after rebooting.
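
If you do want to keep more than one old log, a rough alternative to file sensors is a shell_command that copies home-assistant.log.1 to a timestamped file each time HA starts. This is only a sketch under assumptions: the /config/logs folder and the archive_previous_log name are hypothetical (the folder has to exist already), and it relies on cp and date being available where HA runs.

```yaml
# configuration.yaml
shell_command:
  # Hypothetical name and target folder; no templates are used, so the command runs through a shell
  archive_previous_log: 'cp /config/home-assistant.log.1 /config/logs/home-assistant-$(date +%Y%m%d-%H%M%S).log'

automation:
  - alias: "Archive previous log on startup"
    trigger:
      # Fires once HA has started, after the old log has been rotated to .log.1
      - platform: homeassistant
        event: start
    action:
      - service: shell_command.archive_previous_log
```

If you already have an automation: key (for example an include of automations.yaml), add the automation there instead of declaring the key a second time.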

Which is enough for a pi4… but a pi4 and an SSD :man_shrugging:

I moved to SSD about six months ago and got an official power supply for that reason - no problems so far.

but I DO have an official Pi 4 power supply…

Sorry - I was replying to @tom_l 's dubious shrug.

Is there anything else powered by the usb ports? Like a zigbee/zwave?

You can enable system monitor, if you haven’t already. It will record any ramp-up in processor or memory use before the crash, if that is the cause.

sensor:
  - platform: systemmonitor
    resources:
      - type: memory_use_percent
      - type: memory_free
      - type: processor_use
      - type: processor_temperature

I am using the monitoring, yes. No spikes in CPU, RAM or temp before the die-off. I’ll add that to the post. Thanks.

The Pi has the obligatory CC2531 connected, nothing else besides the SSD. No voltage warnings.

Reddit tells me the USB 3 controller may overheat and cause a hang. It doesn’t have a sensor. I’ll try to stick a heatsink on it. Jesus this SSD on RPi stuff is picky :slight_smile:

Are you using one of the known working adapters? Can you disconnect the zigbee dongle for the night? The original pi power supply is the minimum recommended supply for a pi4 with an ssd.

Yes, I am using the recommended ASMedia one. I’ll try unplugging Zigbee as a last resort; I have a lot of important stuff on it. Thanks!

I also have a similar problem: it hangs every other day at 6:11 am. Yesterday I increased the swap size, and today it didn’t stop at the scheduled (!) time; it’s still running. I’ll observe it for a week and see.

Does that use an NVMe drive? IIRC they draw more power under load than a SATA version. If you have a powered USB hub, you could use that to power the Zigbee dongle.

No, SATA SSD.

It happened again, at the same time. Nothing of interest in the log, no apparent spike of any kind. I’m going back to the SD card as I can’t deal with this right now (it’s a bitch to diagnose as it only ever happens at 5AM, and there’s a toddler involved who absolutely requires a working Zigbee device around that time). I’ve ordered (another) powered USB hub and some heatsinks and will give it a go later. If I have to choose between an unreliable solution that’s been working flawlessly for 2+ years (I have backups) and a reliable one that dies for no apparent reason every day, I’ll take the former :slight_smile:

Thanks y’all!

Update:
It’s been running without any hangs for 5 days since I increased my swap from 1GB to 3GB. I did see some spikes in the swap size, and its current value is actually higher than the default value. Perhaps you can give increasing the swap size a try.
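
To watch swap the same way as CPU and RAM, the systemmonitor sensor posted earlier in the thread can be extended with swap resources. A minimal sketch, assuming the YAML-configured systemmonitor platform used above; if you already have that block, just add these types to its resources list.

```yaml
# configuration.yaml
sensor:
  - platform: systemmonitor
    resources:
      # Swap usage as a percentage and as absolute used/free amounts
      - type: swap_use_percent
      - type: swap_use
      - type: swap_free
```

Graphing swap_use_percent over a few nights should show whether swap is filling up before the hang.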

Thank you, I just went back to the SD card and it’s been smooth sailing since. I don’t have the time to troubleshoot hardware for no good reason :slight_smile:

You should use the datadisk option, which is a lot more stable than USB booting.