Simple(?) Configuration keeps crashing - help!

Ok. It’s lived past the standard 7 day lifespan and we are now into day 9.
I’m not going to solidly call it ‘Fixed’ until 14 days have passed without a lockup.
But so far, the Samsung 64GB EVO Select microSDXC class 3 SD card (which cost more than the Pi) seems to have fixed this issue.
I’ll report back definitively next week.
Thanks!

31+ now :rofl:

I guess we’ll never know for sure now, will we…

My HA is routinely crashing, usually multiple times a day, and I’m on the new Home Assistant Blue. I have no idea why and can’t find anything in the logs. What would cause consistent crashing when not using an SD card?

“Crash” is overly broad.
Please describe the symptoms in detail.

OK, by “crash” I mean I come around the corner and my garage door doesn’t open. I check HA and it will not (or cannot) connect. I try to turn on an emulated Hue switch with my Harmony remote. Nothing happens. I check HA again and it still will not connect. I go to my desktop and try to load my local HA URL. Nothing. It simply stops working, and I have to pull the power plug and reconnect it to get it to reboot.

Do you have the Samba share or SSH set up? Without seeing the logs, there’s not much that can be said.

Yup. I can pull whatever logs you’d like. I can’t see anything in them that helps me but I’m no expert. Which would you like to see?

Home assistant logs

Under Supervisor? I have several options, the most useful being Supervisor and Core. Which one specifically, and from where?

home-assistant.log in your configuration folder.

Not much help, but here’s what it said:
2021-07-26 10:53:17 WARNING (SyncWorker_0) [homeassistant.loader] We found a custom integration alarmdotcom which has not been tested by Home Assistant. This component might cause stability problems, be sure to disable it if you experience issues with Home Assistant
2021-07-26 10:53:17 WARNING (SyncWorker_2) [homeassistant.loader] We found a custom integration hacs which has not been tested by Home Assistant. This component might cause stability problems, be sure to disable it if you experience issues with Home Assistant
2021-07-26 10:53:17 WARNING (SyncWorker_1) [homeassistant.loader] We found a custom integration nodered which has not been tested by Home Assistant. This component might cause stability problems, be sure to disable it if you experience issues with Home Assistant
2021-07-26 10:53:17 WARNING (SyncWorker_4) [homeassistant.loader] We found a custom integration alexa_media which has not been tested by Home Assistant. This component might cause stability problems, be sure to disable it if you experience issues with Home Assistant
2021-07-26 10:53:58 WARNING (MainThread) [slixmpp.stringprep] Using slower stringprep, consider compiling the faster cython/libidn one.
2021-07-26 10:54:00 WARNING (MainThread) [slixmpp.basexmpp] Legacy XMPP 0.9 protocol detected.
2021-07-26 10:54:24 ERROR (Recorder) [homeassistant] Error doing job: Task was destroyed but it is pending!
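
For anyone else staring at a log like this: a rough Python sketch for summarizing home-assistant.log by level and logger, and for finding the last entry written before a lockup. The regular expression only assumes the line layout visible in the excerpt above; the names `LINE_RE` and `summarize` and the path in the usage note are made up for illustration.

```python
import re
from collections import Counter

# Matches the layout seen in the excerpt above, e.g.
# "2021-07-26 10:53:17 WARNING (SyncWorker_0) [homeassistant.loader] message"
# An optional fractional-seconds part after the time is tolerated.
LINE_RE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})(?:\.\d+)? "
    r"(?P<level>[A-Z]+) \((?P<thread>[^)]+)\) \[(?P<logger>[^\]]+)\] (?P<msg>.*)$"
)


def summarize(lines, levels=("WARNING", "ERROR", "CRITICAL")):
    """Count entries per (level, logger) and return the last parsed entry."""
    counts = Counter()
    last = None
    for line in lines:
        match = LINE_RE.match(line.rstrip("\n"))
        if not match:
            continue  # e.g. traceback continuation lines
        if match["level"] in levels:
            counts[(match["level"], match["logger"])] += 1
        last = match.groupdict()
    return counts, last
```

Usage would be something like `with open("/config/home-assistant.log") as fh: counts, last = summarize(fh)` (the path is an assumption). The timestamp in `last` gives a rough idea of when the system stopped writing to the log.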

I’ve traced this to an OOM issue stemming from the Ring integration.

Hello. I am having a very similar, still unsolved issue, described in a post by the same author: Home Assistant just stops working!
@bkr1969, how did you trace this to a particular integration?

Also, was Samba still working after the system crashed? For me it is not: neither Samba nor SSH works.

This is a really old issue. Many people had a memory leak issue with Ring. I believe it is resolved now. No idea about Samba as I had to completely restart my system after each crash.

You’ll notice in all those threads that you’re linking to that there is a common solution: Get a good quality SD card or switch to an SSD.

If all your containers are dying, then something is wrong with your system. If you’re on a Raspberry Pi, 9 times out of 10 it’s the SD card, and the remaining 1 time out of 10 it’s a cheap power supply.

Thanks both @bkr1969 and @petro

The most frustrating part is that I don’t know how to troubleshoot this issue, because when it happens I cannot access the log files.
A power cycle recreates those files.
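
Since an out-of-memory condition was mentioned earlier in the thread, one thing that can help here is a small watcher that appends a memory snapshot to its own file and forces each line to disk, so the last reading is still there after a hard power cycle. This is only a sketch: it assumes a Linux host where /proc/meminfo is readable, and the function names and the log path are invented for the example.

```python
import os
import time


def parse_meminfo(text):
    """Turn the text of /proc/meminfo into {field name: value in kB}."""
    values = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        parts = rest.split()
        if sep and parts and parts[0].isdigit():
            values[name.strip()] = int(parts[0])
    return values


def append_snapshot(log_path, meminfo_text, timestamp=None):
    """Append one timestamped line with available memory and free swap."""
    info = parse_meminfo(meminfo_text)
    stamp = time.strftime("%Y-%m-%d %H:%M:%S", time.localtime(timestamp))
    line = (
        f"{stamp} MemAvailable={info.get('MemAvailable')}kB "
        f"SwapFree={info.get('SwapFree')}kB\n"
    )
    with open(log_path, "a", encoding="utf-8") as fh:
        fh.write(line)
        fh.flush()
        os.fsync(fh.fileno())  # force the line to disk before a possible lockup
    return line


def watch(log_path, interval_seconds=60):
    """Loop forever, writing one snapshot per interval (hypothetical helper)."""
    while True:
        with open("/proc/meminfo", encoding="utf-8") as src:
            append_snapshot(log_path, src.read())
        time.sleep(interval_seconds)
```

Calling `watch("/share/mem_watch.log")` (path is an assumption) and reading the tail of that file after the next lockup would show whether available memory was shrinking steadily beforehand.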

From a hardware perspective: I have an RPi 4 Model B with 4GB RAM (official power supply), with an SSD in a USB enclosure connected via USB, and a first-generation ConBee stick.
I always install the latest stable OS and Core versions.
My add-ons:
AppDaemon 4, Duck DNS, File editor, Mosquitto broker, Plex Media Server, Portainer, SQLite Web, SSH & Web Terminal (it’s not running), Samba share, Terminal & SSH, deCONZ

Like the author of the initial post, I mostly get these issues weekly or a couple of times a week. But it’s not exactly one week after a restart.

Integrations:

When you restart, your old HA logs should still be intact. FWIW, the differences between my system and your system are:

Addons:

  • AppDaemon 4 (however I ran it for years, probably not the issue.)
  • File editor
  • SQLite Web
  • deCONZ
  • Plex Media Server
  • Maybe Portainer (I haven’t updated it because I don’t want to create a login).

Integrations:

  • Daikin AC
  • Daikin
  • deCONZ
  • Meteorologisk
  • NordPool
  • OpenWeatherMap
  • Xiaomi Miio

So I’d start by looking at those. Other than that, it’s going to be hard to track down your issue.

Thanks!

Is it possible that it’s just a Raspberry Pi hardware fault and the RPi needs replacing?

I can also try disabling the ones that differ, one at a time, and checking after each. Is disabling enough, or would I need to remove the add-ons / integrations?
The only one I cannot disable for longer than a few hours is deCONZ, as it’s responsible for all the Zigbee devices and the home heating :confused:

As the “standard” causes of this kind of error are the SD card or the power supply, can you check how much current is being drawn from the power supply? I could imagine that a combination of things could bring the power supply to its knees, or at least near it… :slight_smile:
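
A related check that needs no measuring equipment: on a Raspberry Pi the firmware records under-voltage and throttling events, and `vcgencmd get_throttled` reports them as a hex bitmask such as `throttled=0x50005`. It does not measure current, only whether the voltage has dropped too low. The Python sketch below decodes that output; whether `vcgencmd` is reachable from your particular installation is an assumption, and `decode_throttled` is just an illustrative name.

```python
# Bit meanings of the value reported by `vcgencmd get_throttled`.
FLAGS = {
    0: "under-voltage detected now",
    1: "ARM frequency capped now",
    2: "currently throttled",
    3: "soft temperature limit active",
    16: "under-voltage has occurred since boot",
    17: "ARM frequency capping has occurred since boot",
    18: "throttling has occurred since boot",
    19: "soft temperature limit has occurred since boot",
}


def decode_throttled(output):
    """Decode a string such as 'throttled=0x50005' into readable flags."""
    value = int(output.strip().split("=", 1)[1], 16)
    return [text for bit, text in sorted(FLAGS.items()) if value & (1 << bit)]
```

For example, `decode_throttled("throttled=0x50005")` lists under-voltage and throttling both now and since boot, which would point at the power supply or cable; `throttled=0x0` gives an empty list.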

I wouldn’t start disabling things before ruling out all the other pointers (SD card and power). There are way too many people out there who run a similar combination of things without problems, so I’d suspect something “faulty”. Not faulty in general, but faulty enough to sometimes bow out… :slight_smile:

Just a small story: I once built a PC from parts and it worked great, except that it sometimes restarted without any errors. It took us a while to find out that it always restarted when someone touched the case. Even knowing this, it took us nearly a week. We tested everything, from the OS to the components. In the end we dismantled it completely, only to find that one (1!) of over 20 insulating rings between the mainboard and the case was missing… What I want to say is this: if the majority runs the same combination of hardware without errors, it is more likely that something is faulty with the hardware than with the software. :slight_smile:
