HASS keeps crashing, please help!

Hello everybody and thanks for helping me.
My home is heavily invested in HA and lately my instance started crashing/rebooting very randomly.

I don’t know whether it is a HW problem where the ethernet flaps and breaks the connection (it’s PoE-powered by a Meraki switch) or whether the DB has somehow become corrupted.

I cannot pinpoint where the issue is and it is driving me crazy!!

HASS is running as HAOS on an RPI5 with 8G RAM and a SanDisk ExtremePro SD card, with MariaDB as the database. It is PoE-powered via a Waveshare PoE HAT (F) x Rpi5.

Here are my latest logs:

Homeassistant.log

2024-09-13 11:07:00.074 WARNING (SyncWorker_0) [homeassistant.loader] We found a custom integration hacs which has not been tested by Home Assistant. This component m>
2024-09-13 11:07:00.076 WARNING (SyncWorker_0) [homeassistant.loader] We found a custom integration icloud3 which has not been tested by Home Assistant. This componen>
2024-09-13 11:07:00.078 WARNING (SyncWorker_0) [homeassistant.loader] We found a custom integration miele which has not been tested by Home Assistant. This component >
2024-09-13 11:07:00.080 WARNING (SyncWorker_0) [homeassistant.loader] We found a custom integration meteoswiss which has not been tested by Home Assistant. This compo>
2024-09-13 11:07:02.348 WARNING (Recorder) [homeassistant.components.recorder.util] Ended unfinished session (id=33 from 2024-09-12 14:30:16.010305)
2024-09-13 11:07:17.460 ERROR (SyncWorker_0) [homeassistant.util.json] Could not parse JSON content: /config/.storage/icloud3/restore_state
Traceback (most recent call last):
  File "/usr/src/homeassistant/homeassistant/util/json.py", line 75, in load_json
    return orjson.loads(fdesc.read())  # type: ignore[no-any-return]
           ^^^^^^^^^^^^^^^^^^^^^^^^^^
orjson.JSONDecodeError: unexpected character: line 1 column 1 (char 0)
2024-09-13 11:07:17.461 ERROR (SyncWorker_0) [custom_components.icloud3.helpers.common] Error while loading /config/.storage/icloud3/restore_state: unexpected charact>
Traceback (most recent call last):
  File "/usr/src/homeassistant/homeassistant/util/json.py", line 75, in load_json
    return orjson.loads(fdesc.read())  # type: ignore[no-any-return]
           ^^^^^^^^^^^^^^^^^^^^^^^^^^
orjson.JSONDecodeError: unexpected character: line 1 column 1 (char 0)

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/config/custom_components/icloud3/helpers/common.py", line 311, in load_json_file
    data = json_util.load_json(filename)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/src/homeassistant/homeassistant/util/json.py", line 81, in load_json
    raise HomeAssistantError(f"Error while loading {filename}: {error}") from error
homeassistant.exceptions.HomeAssistantError: Error while loading /config/.storage/icloud3/restore_state: unexpected character: line 1 column 1 (char 0)
2024-09-13 11:07:23.082 ERROR (MainThread) [homeassistant.components.homekit.util] media_player.tv_samsung_6_series_40 does not support any media_player features
2024-09-13 11:07:23.091 ERROR (MainThread) [homeassistant.components.homekit.util] media_player.plex_plex_for_apple_tv_apple_tv_2 does not support any media_player fe>
2024-09-13 11:07:23.091 ERROR (MainThread) [homeassistant.components.homekit.util] media_player.plex_plex_for_apple_tv_apple_tv_3 does not support any media_player fe>
2024-09-13 11:07:23.091 ERROR (MainThread) [homeassistant.components.homekit.util] media_player.plex_marinella_plex_for_apple_tv_apple_tv does not support any media_p>
2024-09-13 11:07:23.092 ERROR (MainThread) [homeassistant.components.homekit.util] media_player.plex_infuse_direct_apple_tv does not support any media_player features
2024-09-13 11:07:23.092 ERROR (MainThread) [homeassistant.components.homekit.util] media_player.plex_family_plex_for_apple_tv_apple_tv does not support any media_play>
2024-09-13 11:07:23.092 ERROR (MainThread) [homeassistant.components.homekit.util] media_player.plex_infuse_library_apple_tv_2 does not support any media_player featu>

Homeassistant.log.1

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/lib/python3.12/site-packages/requests/adapters.py", line 667, in send
    resp = conn.urlopen(
           ^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/urllib3/connectionpool.py", line 802, in urlopen
    retries = retries.increment(
              ^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/urllib3/util/retry.py", line 594, in increment
    raise MaxRetryError(_pool, url, error or ResponseError(cause))
urllib3.exceptions.MaxRetryError: HTTPSConnectionPool(host='app-prod-ws.meteoswiss-app.ch', port=443): Max retries exceeded with url: /v1/forecast?plz=815700&graph_st>

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/config/custom_components/meteoswiss/__init__.py", line 223, in _async_update_data
    data = await self.hass.async_add_executor_job(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/concurrent/futures/thread.py", line 58, in run
    result = self.fn(*self.args, **self.kwargs)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/hamsclientfork/client.py", line 238, in get_typed_data
    data = self.get_data()
           ^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/hamsclientfork/client.py", line 228, in get_data
    self.get_forecast()
  File "/usr/local/lib/python3.12/site-packages/hamsclientfork/client.py", line 281, in get_forecast
    jsonData = s.get(jsonUrl, timeout=10)
               ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/requests/sessions.py", line 602, in get
    return self.request("GET", url, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/requests/sessions.py", line 589, in request
    resp = self.send(prep, **send_kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/requests/sessions.py", line 703, in send
    r = adapter.send(request, **kwargs)
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/requests/adapters.py", line 700, in send
    raise ConnectionError(e, request=request)
requests.exceptions.ConnectionError: HTTPSConnectionPool(host='app-prod-ws.meteoswiss-app.ch', port=443): Max retries exceeded with url: /v1/forecast?plz=815700&graph>
2024-09-13 10:48:59.661 ERROR (MainThread) [custom_components.meteoswiss] Error fetching meteoswiss data: HTTPSConnectionPool(host='app-prod-ws.meteoswiss-app.ch', po>
2024-09-13 10:49:04.125 ERROR (MainThread) [homeassistant.components.ring.coordinator] Error fetching devices data: Timeout communicating with API: Timeout error duri>
2024-09-13 10:49:04.126 ERROR (MainThread) [homeassistant.components.synology_dsm.coordinator] Error fetching 192.168.5.50 SynologyDSMSwitchUpdateCoordinator data: Er>
2024-09-13 10:49:17.125 ERROR (MainThread) [homeassistant.components.synology_dsm.coordinator] Error fetching 192.168.5.50 SynologyDSMCentralUpdateCoordinator data: E>
2024-09-13 10:49:59.222 ERROR (MainThread) [snitun.multiplexer.core] Ping fails, no response from peer
2024-09-13 10:55:41.125 WARNING (MainThread) [hass_nabucasa.iot] Connection closed: Connection error



Is any of this making any sense to you??

What am I missing?!?

Thanks for your help fellow home automators!

What’s this “icloud3” custom component which is causing errors? Remove it and see if it’s any better.

For future reference, or if the solution by my fellow HA user doesn’t help, restart HA in safe mode to exclude any faulty integration/configuration and check whether it’s your HW.

It tracks devices such as iPhones, which are then associated with people. This is used for presence-based automations or for keeping the kids safe (i.e. if they leave the school premises during school hours, we parents get notified on the spot so we can check in on them and make sure they’re safe).

Aha… well, it surely seems that something is wrong with this addon, so I would (temporarily) remove it and see if it helps.

As a first step, to calm down and start bringing the server back online, you need to understand what you have posted.
You have a lot of warnings. The warnings are hints for you, but they will not crash your system, and you can fix them later to optimize your system.
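
To get an overview of which loggers produce warnings and which produce errors, the log can be tallied by level and logger. The following is only a minimal sketch: the regular expression is an assumption based on the line format visible in the logs posted above, and `tally_log` is a hypothetical helper name, not part of Home Assistant.

```python
import re
from collections import Counter

# Matches lines such as:
# 2024-09-13 11:07:00.074 WARNING (SyncWorker_0) [homeassistant.loader] We found ...
# The pattern is an assumption derived from the log excerpt in this thread.
LOG_LINE = re.compile(
    r"^\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d+ "
    r"(?P<level>[A-Z]+) \([^)]*\) \[(?P<logger>[^\]]+)\]"
)


def tally_log(lines):
    """Count log entries per (level, logger); traceback lines are skipped."""
    counts = Counter()
    for line in lines:
        match = LOG_LINE.match(line)
        if match:
            counts[(match["level"], match["logger"])] += 1
    return counts
```

Passing an open log file to `tally_log` and printing `most_common()` shows at a glance whether the errors come from one integration or from many at once.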

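So only one error remains. It says that the file /config/.storage/icloud3/restore_state has content that cannot be parsed as JSON: the parser hits an unexpected character right at the start of the file (line 1 column 1).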
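
A quick way to confirm what is wrong with such a file is to try parsing it outside of HA and to look at its first bytes. This is a rough sketch using only the Python standard library; `check_json_file` is a hypothetical helper, and HA itself uses orjson rather than the `json` module, so the exact error wording will differ.

```python
import json
from pathlib import Path


def check_json_file(path):
    """Return (ok, detail) describing whether the file parses as JSON."""
    raw = Path(path).read_bytes()
    if not raw.strip():
        return False, "file is empty"
    try:
        json.loads(raw.decode("utf-8"))
    except UnicodeDecodeError as err:
        return False, f"not valid UTF-8: {err}"
    except json.JSONDecodeError as err:
        # Show where parsing failed and what the file starts with.
        return False, (
            f"invalid JSON at line {err.lineno} column {err.colno}; "
            f"file starts with {raw[:16]!r}"
        )
    return True, "parses as JSON"
```

If the reported start of the file is binary garbage or NUL bytes, the file was most likely damaged on disk rather than mis-edited by hand.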

If you just loaded the addon without any customization there could be a bug in this addon.

If you changed the config, added a file or something similar, check that first. It could be a device name with an unsupported character in it, as the restore state is described for device_tracker in the iCloud3 repository (“GitHub - gcobb321/icloud3: iCloud3 v3 - iCloud3 is an advanced iDevice tracker that uses Apple iCloud account and HA Companion App data for presence detection and location based automations.”). All other values should be constants generated by the addon, and if those were broken, more people would be reporting the error.

If it works after removing a device, you can fix the problem by renaming the device to an accepted device name.


Your second error may indicate that your server has sent too many requests and that you have reached the (daily) request limit of your (free) account. If so, you should cap your requests at whatever the server API allows (this may be a daily limit or a limit per period, e.g. every 15 minutes).
In that case all further requests are blocked and return no response or an error.
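
Before tuning any request limit, it is worth checking which kind of failure this really is. In the requests library, a `ConnectionError` wrapping urllib3’s `MaxRetryError` (as in the traceback above) is generally raised when the connection itself could not be established, whereas a rate limit normally arrives as a regular HTTP response, commonly with status 429. The sketch below uses only the standard library to tell the cases apart; `classify_failure` and `probe` are hypothetical helpers, not part of HA or the meteoswiss integration.

```python
import socket
import urllib.error
import urllib.request


def classify_failure(exc):
    """Map an exception from urllib.request.urlopen to a rough cause."""
    # HTTPError must be checked first: it is a subclass of URLError.
    if isinstance(exc, urllib.error.HTTPError):
        if exc.code == 429:
            return "rate limited by the server (HTTP 429)"
        return f"server answered with HTTP {exc.code}"
    if isinstance(exc, (socket.timeout, TimeoutError)):
        return "timed out, no answer from the server"
    if isinstance(exc, urllib.error.URLError):
        return f"could not connect: {exc.reason}"
    return f"unexpected error: {exc!r}"


def probe(url, timeout=10):
    """Fetch url once and report what happened."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return f"OK, HTTP {resp.status}"
    except Exception as exc:  # broad on purpose: this is a diagnostic probe
        return classify_failure(exc)
```

If several unrelated services (here meteoswiss, Ring, Synology and the Nabu Casa connection) fail within the same minute with "could not connect" or timeouts, a network or host problem is a more likely explanation than a quota on any single API.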

Coincidentally they’ve just released an update to the iCloud add-on which fixes that unsupported character error.

What puzzles me is, would this be enough to crash/reboot my HASS instance??

Sure can.

This is why it’s important to trust the integration author. Custom integrations become ‘part’ of HA and run in its context, so one going off the rails can absolutely crash you.

The advice given so far is solid.

Of course, that is why we all need a test server and a live server.

Never change a running system. The addon itself will crash nothing, but as seen in your logs, the services around it will throw errors. Maybe in the future, error handling will simply block faulty addons and modules so that you keep running with a limited set.

Hey Devel, I did use to have an RPI4 as LAB (I still have it, just not running) but I decommissioned it as the two instances were fighting for the same automations and control…

How would you run a LAB in parallel with your PROD instance?

For lack of money, most people and companies will not mirror everything.

To handle this, separate your networks so that there is only one HA server per network (subnet). This way you avoid conflicts with DHCP, AdGuard etc. With additional API tokens you can make the same data available on every server.

You can install new addons and edit themes there, so you can rule out nearly every issue with updates and modifications.
Using a VM with a snapshot is a fine thing. After testing, you reload the last snapshot, over and over, which gives you a fresh dev environment and undoes any errors that occurred while testing.

You can migrate some devices to the dev network temporarily, or dedicate some devices to it permanently.

I don’t think the add-on is the culprit here.

I’ve just realized nearly 7.5G of RAM are being used and Swap is maxed at 100% which might cause those sudden random reboots.

I’ve now changed the SWAP size to 2G (see the change in the graph below) and it is now staying at around 60% capacity, however, RAM is still at 7.5G utilization with ~400M free memory left.
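
For reference, RAM and swap usage can also be read directly from `/proc/meminfo` on a Linux host. The sketch below assumes shell access to the host (which on HAOS is not available by default) and uses the `MemAvailable` field rather than the plain "free" figure, since Linux counts reclaimable cache as used; `parse_meminfo` and `usage_percent` are hypothetical helper names.

```python
def parse_meminfo(text):
    """Parse /proc/meminfo content into a dict of integer values (mostly kB)."""
    values = {}
    for line in text.splitlines():
        key, _, rest = line.partition(":")
        fields = rest.split()
        if fields and fields[0].isdigit():
            values[key.strip()] = int(fields[0])
    return values


def usage_percent(info):
    """Return (ram_used_pct, swap_used_pct), each rounded to one decimal."""
    ram = 100 * (1 - info["MemAvailable"] / info["MemTotal"])
    swap_total = info.get("SwapTotal", 0)
    # Avoid dividing by zero on systems without swap.
    swap = 100 * (1 - info["SwapFree"] / swap_total) if swap_total else 0.0
    return round(ram, 1), round(swap, 1)
```

On the host itself this would be called as `usage_percent(parse_meminfo(open("/proc/meminfo").read()))`.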

Any idea why my instance is using all that RAM??

Here are some of my stats:

[screenshots of memory and swap usage graphs]

Yep, that’ll be the culprit.

Go through every addon and check how much RAM it’s using (Settings > Addons > click each addon > Add-on RAM usage). One or more of those will be abnormally high - stop that for a couple of days and see whether the crashes stop.

Found it!
It’s Mosquitto spiking to 84% RAM Usage!!

The problem is I’m heavily dependent on MQTT… I cannot afford not to have an MQTT broker!!

Any ideas on how I can circumvent that?

SOLVED!

I simply uninstalled and re-installed the Mosquitto broker and now it works. Somehow it got corrupted during one of the updates…

That’s absolutely not normal. My Mosquitto broker is only using 1.8% of 4GB, so you should be seeing half that.
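
To put those percentages into absolute numbers, here is the arithmetic as a small sketch. It assumes that the add-on percentage is relative to total system RAM and that "GB" means GiB; both are assumptions, not something confirmed in this thread.

```python
def percent_to_mib(percent, total_gib):
    """Convert a percentage of total RAM into MiB."""
    return percent / 100 * total_gib * 1024


# 1.8% of the 4 GB reference system, in MiB (about 74 MiB).
reference_mib = percent_to_mib(1.8, 4)

# The same absolute amount expressed as a share of 8 GB (about 0.9%).
same_on_8gib = reference_mib / (8 * 1024) * 100

# The reported 84% spike on the 8 GB system, in MiB (roughly 6.7 GiB).
spike_mib = percent_to_mib(84, 8)

print(f"{reference_mib:.1f} MiB, {same_on_8gib:.2f} %, {spike_mib:.0f} MiB")
```

Under these assumptions the spiking broker was holding on the order of ninety times the memory of a healthy one.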

Now that you found the addon causing the issue, it’s time to narrow it down to the actual culprit causing this excessive usage:

  • What integrations/devices are making use of MQTT?
  • Have you checked your MQTT logs (or used MQTT Explorer) to see if there’s an extremely chatty device which could be swamping your broker?
  • Are your MQTT settings correct?
  • Have you tried restarting the MQTT Addon and monitored its RAM usage over time?
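
For the "extremely chatty device" check, one option is to capture broker traffic as text and count messages per topic. The sketch below assumes input lines in the `topic payload` form that `mosquitto_sub -v -t '#'` prints (host and credentials options omitted); `count_topics` is a hypothetical helper and the topic names in the usage note are made up for illustration.

```python
from collections import Counter


def count_topics(lines, depth=2):
    """Count messages per topic prefix from 'topic payload' lines."""
    counts = Counter()
    for line in lines:
        # The topic is everything before the first space.
        topic = line.split(" ", 1)[0].strip()
        if topic:
            prefix = "/".join(topic.split("/")[:depth])
            counts[prefix] += 1
    return counts
```

Feeding it a few minutes of captured output and looking at `count_topics(lines).most_common(5)` would show whether a single prefix, say a hypothetical `zigbee2mqtt/sensor_a`, accounts for most of the traffic.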

EDIT: lol nevermind - you posted the solution as I was typing