HAOS on x86-64 Shuts Down Under Heavy Load

Not really sure if this is a Home Assistant issue or just a random hardware issue on my end. I’m running HAOS directly on an old Asus laptop (Intel i3 with 4 GB of RAM). It took me a while to narrow down the general cause, but I’ve found that any time it’s under heavy load for more than maybe 5 minutes or so, it just shuts down. Specifically, I’ve had it happen a couple of times during my scheduled nightly backups (but very rarely) and then nearly any time I compile ESPHome firmware that’s more complex (or more correctly, takes a while to compile, mainly devices with BT Proxy). Prior to shutdown, the CPU goes to 100% (which makes sense) for the duration of the heavy load, the temp goes to ~180°F, and the memory stays flat and low. The laptop battery also shows 100% the whole time.

Since it’s an old laptop with a battery that was completely shot (and I had to resolder one of the barrel plug connections on the board that broke loose which is why it had originally gotten shelved), I originally thought it might just be a power issue and ultimately bought a replacement battery to rule this out. While the battery works just fine, I’ve still had this happen numerous times. I haven’t ripped it open to verify the solder joint still looks okay, but I REALLY don’t think this is it since even with a fully charged battery it still just shuts down.

Any ideas on where to look? It’s quite annoying, especially the randomness of backups only sometimes seeming to cause this. I’m also running AdGuard as an add-on, and unfortunately my router can’t handle not seeing it and essentially causes the internet to go down (I’ve spent an absurd amount of time trying to find a setting that makes it use an alternate DNS server only if it can’t connect to mine, but regardless of the setting it ends up using the secondary even while mine is up). Luckily with ESPHome, I’ve found that compiling a second time seems to always work (even though it sure looks like it’s re-compiling the whole thing again). When I’m at home, it’s just a nuisance, but we tend to travel a bit and it’s a real problem having it go down (along with the internet) while we’re away.

Could be temperature related. Blow the dust out. Reapply thermal paste.

Most likely a heat issue. In general, most Intel Core i3 processors have a maximum operating temperature (Tjmax) around 100°C (212°F). My Intel NUC i3 specs say 90°C. With dust in the heat pipes and heat sink, you are getting pretty close. Is it shutting down or going into the turtle mode?
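
For a rough sense of the headroom, the ~180°F reading from the first post can be converted to Celsius and compared against the two limits mentioned above. A minimal sketch in Python; the function names are just for illustration, and the 90°C and 100°C figures are the ones quoted in this thread, not values looked up for this particular laptop's CPU:

```python
def fahrenheit_to_celsius(temp_f: float) -> float:
    """Convert a temperature from Fahrenheit to Celsius."""
    return (temp_f - 32.0) * 5.0 / 9.0


def headroom_c(reading_f: float, tjmax_c: float) -> float:
    """Degrees Celsius left between a Fahrenheit reading and a Tjmax limit."""
    return tjmax_c - fahrenheit_to_celsius(reading_f)


reported_f = 180.0  # approximate load temperature from the first post
for tjmax_c in (90.0, 100.0):  # limits quoted in this thread (assumed, not measured)
    margin = headroom_c(reported_f, tjmax_c)
    print(f"Tjmax {tjmax_c:.0f} C: {margin:.1f} C of headroom")
```

The reported reading works out to roughly 82°C, which leaves less than 8°C against a 90°C limit.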

It’s just completely shutting down. Based on y’all’s suggestions, I’m gonna try blowing out the dust and moving it to another location where it should get slightly better airflow, then try compiling some ESPHome devices to see if it helps any. Normally it gets nearly to the very end of compiling before it dies, so I’m hoping that if it’s a thermal issue those small changes will be enough to see a difference. If so, then I’ll definitely look at reapplying thermal paste too (it’s just such a PITA to open this one up).

The fact that the temp never appears to spike more than at other times of high load made me not really consider heat, but if some of them have a lower temp tolerance I could be bumping right up near that (I also just assumed it would unload the CPU if the temp went too high). Admittedly I don’t compile complex ESPHome code all that often, but I never noticed it when I originally switched to this device and have gradually seen it more often since, which would make sense if it’s dust building up and thermal paste degrading.

Thanks for the tips and I’ll report back!

Well, after cleaning it out pretty thoroughly with compressed air (from an air compressor rather than a can) and moving it to a spot that MAY have better airflow (I felt like it had good airflow before, but there was also a slight chance of the air short-circuiting some), it appears to be running cooler than before at idle, even after letting it run for a while: about 10-15°F lower with no change in the CPU load.

I also just compiled a BT Proxy ESPHome device and it didn’t shut down (although the max temp looked the same, it DID seem like it took a little longer to get there). Fingers crossed this might have fixed it? I’ll probably look at reapplying thermal paste at some point in the future as well. I’ve admittedly never done that before, but am pretty comfortable tinkering with things. Anything I should be particularly careful of or concerned about in doing that? Some quick Googling makes it look pretty straightforward.

Thanks again for the help and advice @nickrout and @stevemann !

That could be the aforementioned “turtle mode”. The processor will turn down the clock speed to run cooler.
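
One way to tell throttling apart from a hard shutdown is to compare the current clock against the maximum while the load is running. A rough sketch, assuming a Linux host that exposes the standard cpufreq files under /sys/devices/system/cpu/cpu0/cpufreq/ and that you have shell access to the host (not a given on HAOS); the function names and the 0.8 threshold are made up for illustration:

```python
from pathlib import Path

# Standard Linux cpufreq location; assumed to be present on the host.
CPUFREQ_DIR = Path("/sys/devices/system/cpu/cpu0/cpufreq")


def read_khz(path: Path) -> int:
    """Read a cpufreq file, which holds a single integer in kHz."""
    return int(path.read_text().strip())


def clock_ratio(cpufreq_dir: Path = CPUFREQ_DIR) -> float:
    """Current clock of cpu0 as a fraction of its maximum frequency."""
    current = read_khz(cpufreq_dir / "scaling_cur_freq")
    maximum = read_khz(cpufreq_dir / "cpuinfo_max_freq")
    return current / maximum


def looks_throttled(ratio: float, threshold: float = 0.8) -> bool:
    """Arbitrary rule of thumb: flag a clock well below maximum."""
    return ratio < threshold


if __name__ == "__main__":
    if CPUFREQ_DIR.exists():
        ratio = clock_ratio()
        print(f"cpu0 at {ratio:.0%} of max clock, throttled: {looks_throttled(ratio)}")
    else:
        print("cpufreq files not found on this system")
```

A low ratio only means something while the CPU is pinned at 100%; at idle the clock drops on its own through normal frequency scaling.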

Yes. Too much thermal paste is worse than not enough. Too much paste will become an insulator. However, dried-out paste is pretty rare and, unless the computer is 10-12 years old, highly unlikely.

This is one reason that I don’t recommend running Home Assistant on a laptop. A laptop is a poor choice for a server of any kind.

Good to know. I honestly don’t know the exact vintage, but it’s got to be pushing close to a decade. I probably won’t bother with redoing the thermal paste unless the issue crops up again (or I just happen to be opening it up for some other reason).

Definitely don’t disagree, especially purchasing for said use. But it’s also a great use for something that was collecting dust in a closet and would have been trashed otherwise.

Thanks again for the help!

Back to the original problem LOL

I’ve now compiled ESPHome firmware quite a few times with BT Proxy (among other things) and haven’t had an issue since cleaning it out with compressed air. It didn’t look dirty at all from the exterior, but a reasonable amount of dust came out and the idle temp dropped a good 10°F and has stayed that way.

I honestly didn’t really consider dust, both because I thought the processor would self-regulate and because my daily driver has very visible dust collected on the exterior (despite periodically cleaning it out) and this one looked completely clean. Regardless, if anyone else runs into this problem, give it a go before anything more involved.