Frequent reboots without obvious cause

My cabin HA instance is experiencing frequent reboots (like, several times a day).

As the instance is at my cabin (some 200 km away), physically accessing it is not possible at the moment.

Is there a way, from a distance, to figure out what causes the reboots? Power supply hiccups, SD card (it's a relatively new one - a few months), something wrong with the RPi (RPi 4B, 2 GB, version 1.1 - I think), ...?

From the logs, I don't see any obvious reason for the reboots (but I'm not that much of a linux wizard, so there's that), it just seems to reboot out of the blue.

Running HA OS 17.1 (I usually don't update the OS unless I'm physically at the cabin - just in case something goes wrong), Core 2026.5.2 (about to update to 2026.5.3 now, but the reboots have been going for a few months so it's not likely to be the Core version), Supervisor 2026.05.0.

It may, of course, be the SD card, since the reboots have been going on since I put in the new SD card, but the SD card replacement was due to the external SSD (in use at the time) failing, so I'm running just off the SD card now. Maybe an RPi problem that caused the SSD failure in the first place?

I realise there's probably not much for you experts to build on here, but any ideas would be appreciated.

Logs logs logs.

But in a vacuum I'm always going to point at the SD card or power first on a Pi. Then try to use environmental cues like...

I just switched to a new SD because the SSD was acting up...

:smiling_face_with_sunglasses: You already know what to look at. All things are name resolution until proven otherwise... (Unless it's an SD card or a power supply in an overburdened Pi4... And you potentially have both of those...) Do the SD first.

Is 2 GB of RAM enough these days? My setup uses 2.7 GB or thereabouts.

I have seen brand new ones fail...

Yep, maybe, I should perhaps just get something with more RAM.
[Image: screenshot of memory statistics]

Thank you all for good input.

The long term solution is clearly to get a more capable platform, an RPi 4B 8GB with an external SSD is probably the cheapest solution, whereas the best long term solution probably is a fanless mini pc (I use that for my home HA instance, and it's been rock solid for years).

As a band aid, I have stopped a few add-ons that weren't really in use (but were using memory). Free swap space has increased (from nothing, to a few hundred MB most of the time) and free memory has also increased. There have been no more reboots since doing that last night. So, the conclusion seems to be that @Arh came closest to pointing directly at the actual problem: Insufficient RAM.

SD card is a high risk issue, but so far it's not the problem (but let's not jinx it...).

If you are on a budget, thin clients can be purchased reasonably cheap second hand on eBay.

My setup has been running for 4 years or so, on an HP T630, with a bit of extra RAM and a bigger SSD. It runs Proxmox with HAOS, openmediavault, Pi-hole and labdash.

The problem with RPis is that by the time you have purchased one, with a PSU, SSD, USB SSD cable, powered USB hub and a case, you are into the cost of a mini PC. You will then probably want to upgrade pretty soon, especially if you want to run cameras.

Thanks, I am hesitant to get another RPi (for the reasons you are pointing out), so I'm looking for a reasonably priced (possibly used), light-weight (I'm going to hang it on the wall and have not got much real-estate available), fanless (noiseless) thing. A fanless mini-pc, preferably <0.5 kg, would be great.

I have had a Beelink Mini S for a few years now; it would fit the bill. I just looked at the current prices for them, and I was shocked. I bought mine for less than £100 a few years back; now they are twice that. I guess that's AI pushing up the RAM prices. Anyhow, it's only used for running Windows when I need to have Windows, and it's been faultless. I would be searching eBay for thin clients though, if it was me. Then cross referencing with

Just ask if you need advice or opinions.

Just ask if you need advice or opinions.

Thanks, I might just do that :+1:

Can you monitor RAM usage and free space, both RAM and Disk Drive statistics? You may not be able to catch a sudden surge such as the daily database reorganisation just after 4am, but you may be able to spot trends such as a gradual shortage going to lockup and then interrupted by a watchdog timer.
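If you can get a shell on the box, a rough way to log the two numbers that matter here is to read `/proc/meminfo` directly. This is only a minimal sketch: the function names and the sample values are made up for illustration, and it assumes the standard Linux field names (`MemTotal`, `MemAvailable`, `SwapTotal`, `SwapFree`).

```python
def parse_meminfo(text):
    """Parse /proc/meminfo-style text into a dict of integer kB values."""
    values = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        if not sep:
            continue
        fields = rest.split()
        if fields and fields[0].isdigit():
            values[key.strip()] = int(fields[0])
    return values


def free_percentages(info):
    """Return (available RAM %, free swap %); swap is None if no swap exists."""
    ram_pct = 100.0 * info["MemAvailable"] / info["MemTotal"]
    swap_total = info.get("SwapTotal", 0)
    swap_pct = 100.0 * info.get("SwapFree", 0) / swap_total if swap_total else None
    return ram_pct, swap_pct


# Made-up sample values for illustration, not measurements from this thread.
SAMPLE = """MemTotal:        2000000 kB
MemFree:          150000 kB
MemAvailable:     400000 kB
SwapTotal:        600000 kB
SwapFree:          30000 kB
"""

if __name__ == "__main__":
    try:
        with open("/proc/meminfo") as f:  # only present on Linux
            text = f.read()
    except OSError:
        text = SAMPLE  # fall back to the illustrative sample
    ram, swap = free_percentages(parse_meminfo(text))
    print(f"available RAM: {ram:.1f}%")
    print("free swap: n/a" if swap is None else f"free swap: {swap:.1f}%")
```

Run on a schedule and appended to a file, something like this gives you a trend line to compare against the reboot times.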

You may have a rogue app with a resource leak.

If your problem is low RAM, check if all the apps you have loaded are needed. Bloatware such as VSCode has a poor reputation in this respect. If you still find RAM is too low, a hardware upgrade may be your best option.

Low RAM does not always cause a crash. Linux has good memory management and will swap out RAM to the swap file as needed. If page thrashing occurs, where this process gradually brings the system to its knees, wear and tear on your disk, especially if it is an SD card, will become a contributing factor.
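One way to tell ordinary swapping from thrashing is to look at how fast pages move in and out of swap. On Linux the `pswpin` and `pswpout` counters in `/proc/vmstat` count pages swapped in and out since boot, so the difference between two snapshots divided by the interval gives a rate. The sketch below assumes you have saved two snapshots of that file some known number of seconds apart; the helper names and the sample numbers are invented for illustration.

```python
def read_swap_counters(text):
    """Extract the pswpin/pswpout counters from /proc/vmstat-style text."""
    counters = {"pswpin": 0, "pswpout": 0}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 2 and parts[0] in counters:
            counters[parts[0]] = int(parts[1])
    return counters


def swap_rate(before, after, seconds):
    """Pages swapped in and out per second between two snapshots."""
    return {
        name: (after[name] - before[name]) / seconds
        for name in ("pswpin", "pswpout")
    }


# Invented snapshots taken 60 seconds apart, for illustration only.
BEFORE = "pgfault 1000\npswpin 1200\npswpout 3400\n"
AFTER = "pgfault 5000\npswpin 1800\npswpout 9400\n"

if __name__ == "__main__":
    rate = swap_rate(read_swap_counters(BEFORE), read_swap_counters(AFTER), 60)
    print(f"swap-in:  {rate['pswpin']:.1f} pages/s")
    print(f"swap-out: {rate['pswpout']:.1f} pages/s")
```

A rate that stays high for minutes at a time, rather than spiking briefly during a backup, is the pattern to worry about on an SD card.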

The memory stats you snapshotted show your actual RAM still has about 20% headroom. Unless you have fixed your swap file size (not advised) or are running low on disk space, it might not indicate an issue.

Alternatively there may be external factors such as unreliable power you may be able to overcome with a UPS. Drive faults should give read and write warnings in the system logs before failing. Replacing your server in this instance will not remedy the issue.

Find your problem before starting to solve it.

Look for clues. The key is to examine the logs. Look for patterns that may be related to time of day or duration of uptime.
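As a rough illustration of that kind of pattern hunting, the sketch below takes a list of boot times (however you collect them, for example by hand from the log timestamps) and groups them by hour of day and by the time between boots. The timestamps are made up and the function names are just for this example.

```python
from collections import Counter
from datetime import datetime


def reboots_by_hour(boot_times):
    """Count how many boots fall in each hour of the day (0-23)."""
    return Counter(t.hour for t in boot_times)


def uptimes_hours(boot_times):
    """Hours between consecutive boots, an approximation of each uptime."""
    ordered = sorted(boot_times)
    return [(b - a).total_seconds() / 3600 for a, b in zip(ordered, ordered[1:])]


# Made-up boot timestamps for illustration only.
boots = [
    datetime(2026, 5, 20, 5, 2),
    datetime(2026, 5, 20, 15, 10),
    datetime(2026, 5, 20, 19, 5),
    datetime(2026, 5, 21, 5, 1),
]

if __name__ == "__main__":
    for hour, count in sorted(reboots_by_hour(boots).items()):
        print(f"{hour:02d}:00  {count} boot(s)")
    for hours in uptimes_hours(boots):
        print(f"uptime before next boot: {hours:.1f} h")
```

Boots clustering at one hour of the day point towards a scheduled job such as a backup, while roughly constant uptimes point more towards a slow leak.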

Any replacement system should be upgradeable for future capacity issues. RAM (when prices drop), disk drive, etc.

Ask the locals at the remote location if anything happens around the time of the system crash. Earthquake, passing train, aircraft landing, floods, sun angle on open windows, mine blasting, solar energy rate change, rats in the ceiling all are random reasons I have come across as external issues that can affect reliable operation.

I tried to install the latest update today and immediately experienced repeated reboots.

I clicked to apply 2026.5.4. The update page looked normal and the system rebooted. My system console says I still have 2026.5.2.

I had to hard reboot it several times. It seems stable now. I just took a manual backup in case it is a hardware failure. This is a Home Assistant Blue.

Thanks, a lot of good advice here.

Free RAM has always seemed to be around 20 % as you point out, whereas free swap has been close to zero. In the evening of Wednesday («ons.» in Norwegian) this week, I stopped/removed a few add-ons that weren’t used or useful and both free RAM and free swap increased. I haven’t had an unexpected reboot since.

What I see is that during back-up, free swap drops to (close to) zero, while free RAM increases (somewhat).

In the plot below you can actually spot the (unexpected) reboots around 5 am, 3 pm and 7 pm (free RAM and free swap spike) on Wednesday. The last spike around 10 pm is a reboot that I initiated after I had thrown out the add-ons I mentioned. (There are another couple of (expected) reboots towards the end of the plot - today on Saturday («lør.») - after I updated core to 2026.5.4 and OS to 17.3.)

So at this point, I’m relatively sure that the culprit was lack of free «memory» (RAM/swap) in some form.

As mentioned, I’m not a Linux wizard, but I have not been able to find anything in the logs that raises my eyebrows (this may be due to not knowing what to look for, but still) other than sudden «start-up» type log entries without anything suspicious before them. Wouldn’t the lack of any complaints in the log also support the hypothesis that the reboots were caused by the sudden exhaustion of some resource (so sudden that there was no time to log anything)? I do agree that it could have been the power supply, but since the RPi stopped acting up after I removed some add-ons, I’m inclined to think the power supply is ok.


(RPi 4B 2GB RAM, 256 GB SD card)

EDIT: Typo

I upgraded to 2026.5.4 today with no issues, so I’m not sure this is a related issue, but thanks, anyway.

See this comment:

I'll settle for scarce RAM resources running out as your underlying issue. By the very nature of swap files, they grow and proliferate as they are needed and the OS will remove them after they have finished their role, unless there is a crash and they survive the reboot. Hence they should always be close to 100% use, disappearing if not needed any more. The real value you need to track is free RAM, which will drive the memory allocation OS subroutines to start swapping out to disk as required if real RAM becomes scarce. Should this become excessive, your system eventually grinds to a halt as the processes encounter bottlenecks and watchdog timeouts.

Now you've done your cleanup of resource hungry apps, have applied the latest update, and are keeping your fingers and toes crossed and rosary beads warm, start saving for the hardware upgrade and keep a close eye on your backups that they complete without incident so you are able to confidently migrate when the time comes and RAM prices fall back to reasonable figures.

Yes, thanks, I’m doing just that :wink:
I’m spending some time getting to know the market for thin clients and mini pcs and will upgrade from the RPi - hopefully before the SD card dies.

An important positive effect of the cleanup is that nightly back-ups now complete successfully. Before the cleanup, the back-up itself used to be successful, but a reboot usually occurred during the uploading of the back-up to the cloud so a dead SD card would be a real bummer.

I don't need a hardware upgrade. HAOS runs in an Ubuntu VM on an i7-8700.

I have 64 GB of RAM. 23% usage of the 12 GB allocated to HAOS just bugged me.

The update cleared the memory leak or whatever it was, I cleaned nothing out.

The memory page thrash issue may have slowed your system down so much that these processes didn't complete before they timed out. The frantic page thrash is a major contributor to shortened SD Card life, so keep offsite backups and verify that you can restore from them.

Vatican bank motto: Jesus saves, so should you...