HA crashes almost daily after 2021.12 upgrade

matt1131 · December 22, 2021, 8:53pm

Running HA on Rpi 4b+ with a Samsung T7 500GB SSD over USB. Everything is stable on 2021.11, but every time I try to upgrade to 2021.12, HA crashes after 4-24 hours. I can power cycle and it comes back OK. When it crashes, I can’t SSH in to get the log, but on the latest crash I still had it open in my browser and it loaded the formatted log, which I copied out below. I can’t find anything in the breaking changes that could be causing this.

Error while processing event StatisticsTask(start=datetime.datetime(2021, 12, 22, 19, 50, tzinfo=datetime.timezone.utc)): ‘NoneType’ object is not callable
3:15:10 PM – (ERROR) Recorder - message first occurred at 1:30:10 PM and shows up 22 times

Failed to to call /resolution/info -
3:15:06 PM – (ERROR) Home Assistant Supervisor - message first occurred at 10:35:41 AM and shows up 14 times

Client error on /network/info request Cannot connect to host 172.30.32.2:80 ssl:default [Connect call failed (‘172.30.32.2’, 80)]
3:15:06 PM – (ERROR) Home Assistant Supervisor - message first occurred at 1:29:55 PM and shows up 138 times

Unhandled exception
3:15:03 PM – (ERROR) /usr/local/lib/python3.9/site-packages/aiohttp/web_protocol.py - message first occurred at 2:26:34 PM and shows up 11 times

Error handling request
3:15:03 PM – (ERROR) components/http/static.py - message first occurred at 2:26:55 PM and shows up 4 times

Client error on api app/entrypoint.js request Cannot connect to host 172.30.32.2:80 ssl:default [Connect call failed (‘172.30.32.2’, 80)]
3:15:03 PM – (ERROR) Home Assistant Supervisor - message first occurred at 2:26:55 PM and shows up 3 times

Error while processing event <Event time_changed[L]: now=2021-12-22T15:13:29.001989-05:00>: ‘NoneType’ object is not callable
3:14:59 PM – (ERROR) Recorder - message first occurred at 1:29:49 PM and shows up 285 times

SQLAlchemyError error processing event <Event time_changed[L]: now=2021-12-22T15:12:29.001941-05:00>: This Session’s transaction has been rolled back due to a previous exception during flush. To begin a new transaction with this Session, first issue Session.rollback(). Original exception was: ‘NoneType’ object has no attribute ‘dialect’ (Background on this error at: Error Messages — SQLAlchemy 1.4 Documentation)
3:14:59 PM – (ERROR) Recorder - message first occurred at 1:29:49 PM and shows up 146 times

Error doing job: Task exception was never retrieved
3:14:57 PM – (ERROR) helpers/storage.py - message first occurred at 1:39:01 PM and shows up 21 times

Error while processing event <Event time_changed[L]: now=2021-12-22T15:13:44.001686-05:00>: ‘NoneType’ object has no attribute ‘dialect’
3:14:56 PM – (ERROR) Recorder - message first occurred at 1:29:49 PM and shows up 352 times

SQLAlchemyError error processing event <Event time_changed[L]: now=2021-12-22T15:09:56.000329-05:00>: This Session’s transaction has been rolled back due to a previous exception during flush. To begin a new transaction with this Session, first issue Session.rollback(). Original exception was: ‘NoneType’ object has no attribute ‘dialect’ (Background on this error at: Error Messages — SQLAlchemy 1.4 Documentation)
3:14:44 PM – (ERROR) Recorder - message first occurred at 1:29:49 PM and shows up 139 times

Error connecting to ecobee while attempting to get thermostats. Possible connectivity outage. HTTPSConnectionPool(host=‘api.ecobee.com’, port=443): Max retries exceeded with url: /1/thermostat?json=%7B%22selection%22%3A+%7B%22selectionType%22%3A+%22registered%22%2C+%22includeRuntime%22%3A+%22true%22%2C+%22includeSensors%22%3A+%22true%22%2C+%22includeProgram%22%3A+%22true%22%2C+%22includeEquipmentStatus%22%3A+%22true%22%2C+%22includeEvents%22%3A+%22true%22%2C+%22includeWeather%22%3A+%22true%22%2C+%22includeSettings%22%3A+%22true%22%7D%7D (Caused by NewConnectionError(‘<urllib3.connection.HTTPSConnection object at 0x7f910c1b50>: Failed to establish a new connection: [Errno -3] Try again’))
3:13:46 PM – (ERROR) /usr/local/lib/python3.9/site-packages/pyecobee/__init__.py - message first occurred at 1:31:46 PM and shows up 35 times

Connection to ThinQ for device Washer is not available. Connection will be retried
3:13:27 PM – (ERROR) SmartThinQ LGE Sensors (custom integration) - message first occurred at 1:36:02 PM and shows up 8 times

Connection to ThinQ failed. Network connection error
3:13:27 PM – (ERROR) SmartThinQ LGE Sensors (custom integration) - message first occurred at 1:36:02 PM and shows up 4 times

Error doing job: Task exception was never retrieved
3:10:45 PM – (ERROR) loader.py - message first occurred at 2:41:01 PM and shows up 9 times

Can’t read Supervisor data:
3:10:25 PM – (WARNING) Home Assistant Supervisor - message first occurred at 1:29:55 PM and shows up 21 times

Request exception for ‘https://api.github.com/rate_limit’ with - Cannot connect to host api.github.com:443 ssl:default [Try again]
3:09:09 PM – (ERROR) HACS (custom integration) - message first occurred at 2:29:09 PM and shows up 5 times

Access to https://aa015h6buqvih86i1.api.met.no/weatherapi/locationforecast/2.0/complete returned error ‘ClientConnectorError’
2:36:52 PM – (ERROR) /usr/local/lib/python3.9/site-packages/metno/__init__.py - message first occurred at 1:34:47 PM and shows up 2 times

Error doing job: Task exception was never retrieved
2:29:09 PM – (ERROR) util/json.py

JSON file reading failed: /config/.storage/hacs.hacs
2:29:09 PM – (ERROR) util/json.py

Error handling request
2:26:48 PM – (ERROR) components/recorder/util.py - message first occurred at 2:26:44 PM and shows up 2 times

Error fetching myq devices data: Error requesting data from https://devices.myq-cloud.com/api/v5.2/Accounts/672bbe33-0da7-458f-a5c9-55df419c05f2/Devices: Cannot connect to host devices.myq-cloud.com:443 ssl:default [Try again]
1:44:08 PM – (ERROR) MyQ

Disconnected from MQTT server core-mosquitto:1883 (16)
1:31:48 PM – (WARNING) MQTT - message first occurred at 1:31:48 PM and shows up 2 times

Error fetching rinnai-204701826WZD5 data: There was a client connection error while requesting https://s34ox7kri5dsvdr43bfgp6qh6i.appsync-api.us-east-1.amazonaws.com/graphql: Cannot connect to host s34ox7kri5dsvdr43bfgp6qh6i.appsync-api.us-east-1.amazonaws.com:443 ssl:default [Try again]
1:30:55 PM – (ERROR) Rinnai Control-R Water Heater (custom integration)

Error while processing event <Event time_changed[L]: now=2021-12-22T13:28:43.000814-05:00>: [Errno 30] Read-only file system: ‘//config/home-assistant_v2.db’ → ‘//config/home-assistant_v2.db.corrupt.2021-12-22T18:29:49.576467+00:00’
1:29:49 PM – (ERROR) Recorder

The system will rename the corrupt database file //config/home-assistant_v2.db to //config/home-assistant_v2.db.corrupt.2021-12-22T18:29:49.576467+00:00 in order to allow startup to proceed
1:29:49 PM – (ERROR) Recorder

Unrecoverable sqlite3 database corruption detected: (sqlite3.DatabaseError) database disk image is malformed [SQL: INSERT INTO events (event_type, event_data, origin, time_fired, created, context_id, context_user_id, context_parent_id) VALUES (?, ?, ?, ?, ?, ?, ?, ?)] [parameters: (‘state_changed’, ‘{}’, ‘LOCAL’, ‘2021-12-22 18:28:42.956381’, ‘2021-12-22 18:28:42.956381’, ‘935774b0e4b912eeab50a893481b3359’, None, None)] (Background on this error at: Error Messages — SQLAlchemy 1.4 Documentation)
1:29:49 PM – (ERROR) Recorder

SQLAlchemyError error processing event <Event time_changed[L]: now=2021-12-22T13:28:28.000750-05:00>: This connection is on an inactive transaction. Please rollback() fully before proceeding. (Background on this error at: Error Messages — SQLAlchemy 1.4 Documentation)
1:29:49 PM – (ERROR) Recorder - message first occurred at 1:29:40 PM and shows up 4 times

Error in database connectivity during commit: Error executing query: (sqlite3.OperationalError) disk I/O error (Background on this error at: Error Messages — SQLAlchemy 1.4 Documentation). (retrying in 3 seconds)
1:29:46 PM – (ERROR) Recorder - message first occurred at 1:29:37 PM and shows up 4 times

Config entry ‘Z-Wave JS’ for zwave_js integration not ready yet: Cannot connect to host a0d7b954-zwavejs2mqtt:3000 ssl:default [Connect call failed (‘172.30.33.4’, 3000)]; Retrying in background
1:29:37 PM – (WARNING) config_entries.py

Failed to connect: Cannot connect to host a0d7b954-zwavejs2mqtt:3000 ssl:default [Connect call failed (‘172.30.33.4’, 3000)]
1:29:37 PM – (ERROR) Z-Wave JS

Error doing job: Task exception was never retrieved
1:29:37 PM – (ERROR) util/file.py

NCP entered failed state. Requesting APP controller restart
1:29:37 PM – (ERROR) /usr/local/lib/python3.9/site-packages/bellows/ezsp/__init__.py

Error saving current states
1:29:37 PM – (ERROR) util/file.py

Lost serial connection: write failed: [Errno 5] I/O error
1:29:37 PM – (ERROR) /usr/local/lib/python3.9/site-packages/bellows/uart.py

Saving file failed: /config/.storage/zha.storage
1:29:37 PM – (ERROR) util/file.py

Error doing job: Fatal write error on serial transport
1:29:34 PM – (ERROR) /usr/src/homeassistant/homeassistant/runner.py

File replacement cleanup failed for /config/.storage/tmpivky3jat while saving /config/.storage/core.restore_state: [Errno 30] Read-only file system: ‘/config/.storage/tmpivky3jat’
1:29:37 PM – (ERROR) util/file.py

Saving file failed: /config/.storage/core.restore_state
1:29:37 PM – (ERROR) util/file.py
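
The formatted log above is listed newest-first, so the errors that started the cascade sit at the bottom. As a rough aid for reading it, here is a minimal Python sketch that parses entries in this pasted format and orders them by when each message first occurred. The names `parse_log` and `earliest_first` are made up for illustration, and the sketch assumes every entry is a message line followed by a timestamp line, all from the same day.

```python
import re
from datetime import datetime

# Matches the second line of each pasted entry, for example:
# "1:29:46 PM – (ERROR) Recorder - message first occurred at 1:29:37 PM and shows up 4 times"
ENTRY = re.compile(
    r"^(?P<last>\d{1,2}:\d{2}:\d{2} [AP]M) – \((?P<level>\w+)\) (?P<source>.+?)"
    r"(?: - message first occurred at (?P<first>\d{1,2}:\d{2}:\d{2} [AP]M)"
    r" and shows up (?P<count>\d+) times)?$"
)


def to_time(text):
    """Convert a 12-hour clock string such as '1:29:37 PM' to a time object."""
    return datetime.strptime(text, "%I:%M:%S %p").time()


def parse_log(text):
    """Pair each message line with the timestamp line that follows it."""
    entries = []
    message = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        match = ENTRY.match(line)
        if match and message is not None:
            last = to_time(match["last"])
            entries.append({
                "message": message,
                "level": match["level"],
                "source": match["source"],
                "last": last,
                # Entries logged only once have no "first occurred" part.
                "first": to_time(match["first"]) if match["first"] else last,
                "count": int(match["count"]) if match["count"] else 1,
            })
            message = None
        else:
            message = line
    return entries


def earliest_first(entries):
    """Sort entries by first occurrence (assumes no midnight rollover)."""
    return sorted(entries, key=lambda entry: entry["first"])
```

Run against the log in this post, the first entries in the sorted list are the 1:29:34 PM serial write error and the 1:29:37 PM disk I/O and read-only file system errors; the ecobee, MyQ and other network errors only start afterwards.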

BacchusIX · December 23, 2021, 3:55am

I’ve had nothing but issues since I upgraded to 2021.12 a few weeks ago. I think I’m going to have to roll back my system and the Wife Acceptance Factor has pretty much gone terminal. Node Red has just become the latest casualty, which means my entire smart home is down right now.

matt1131 · December 23, 2021, 1:47pm

At least I’m in good company. WAF is at an all-time low here too.

Slinkos · December 23, 2021, 1:52pm

My log is also full of errors since the update. HA doesn’t really crash, but doesn’t work well either.

matt1131 · December 23, 2021, 2:08pm

May have to wait until 2022.2 to see any difference

La-te · December 24, 2021, 2:08pm

I upgraded from the latest 2021.11 to 2021.12.5. Everything seemed to be working fine at first. Then, less than 12 hours later, I noticed that none of my ZigBee devices (Zigbee2mqtt) were working anymore and decided to reboot. Since then I’ve been unable to connect to HA. It responds to ping with some 30-50% packet loss and that’s it. None of the ports are reachable over TCP.

MidnightLink · December 25, 2021, 5:50am

+1. Same here as well.
MQTT devices stop working on a nightly basis. The errors are almost identical to the OP’s.

fogger · December 25, 2021, 8:10am

I agree, with 2021.12 I am power cycling at least twice a week. And the constant updates (up to 12.5 by now) seem to make it worse. Not even the display of blocks in the app’s dashboard works properly anymore. The worst update ever.

moepstar · December 25, 2021, 11:20am

Oh my, seems I’m in good company…

Ever since 2021.12.x, my Homematic integration stops working. The longest it has stayed up is 24 hours, when it used to run for weeks or even months, and nothing else has changed much. I even opened my own thread here and pinged the maintainer of the integration: Homematic - losing connection daily since 2021.12.x

But it seems to be as I expected: Home Assistant itself has gotten more unreliable.

Anyone have an idea how to start debugging this?

La-te · December 28, 2021, 10:19pm

There are date issues described in this thread: Date/Time is broken in the latest supervisor after each reboot. Could that be the cause for you?

I just discovered that my HA must be stuck in a reboot loop, rebooting every minute. I can see from my router that the DHCP lease is reset once per minute, so either the DHCP client or the whole device is rebooting. I suspect the device, as it always stays offline (no ping) for a while. But the rest of you seem to have it online and just misbehaving.

moepstar · December 29, 2021, 8:25am

No, fortunately, that doesn’t seem to be it. My HASS stays up, but seems to lose the network connection every now and then (evident from the log, which shows it can’t reach my Fritz!Box, Pi-Hole, CCU,…).

My HASS has been connected via WiFi for 2+ years, and it has only had these problems since Home Assistant OS 7.0 and Core 2021.12.x…
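
One way to start narrowing down intermittent connection loss like this is to log, with timestamps, when local devices stop being reachable and compare that against the HA log. The following is only a sketch under assumptions: `can_connect` and `watch` are hypothetical helper names, and the hosts and ports to probe are placeholders to fill in for your own network.

```python
import socket
import time
from datetime import datetime


def can_connect(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and DNS lookup failures.
        return False


def watch(targets, interval=30.0, rounds=None, sleep=time.sleep):
    """Probe each (host, port) pair repeatedly and record every failure.

    rounds=None keeps probing until interrupted; a number limits the passes.
    """
    failures = []
    done = 0
    while rounds is None or done < rounds:
        for host, port in targets:
            if not can_connect(host, port):
                stamp = datetime.now().isoformat(timespec="seconds")
                failures.append((stamp, host, port))
                print(f"{stamp} cannot reach {host}:{port}")
        done += 1
        if rounds is None or done < rounds:
            sleep(interval)
    return failures
```

For example, `watch([("192.168.178.1", 80)], interval=60)` would probe a router's web interface once a minute; the address is a placeholder. Running the same script from a second machine helps tell a WiFi problem on the HASS host apart from a problem with the target device.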

matt1131 · January 16, 2022, 8:58pm

Update: I went around and upgraded everything to the latest version as of 8 Jan just to see if things had stabilized. After a week of running, I haven’t had a single crash. I haven’t tried 2021.12.9 yet, but that’s next once I finish installing a few more devices.

michapr · January 25, 2022, 5:39pm

I have the same issue here: HA crashes after a while.
My database is about 1GB, but I think the problem may be related to the number of measurements from some sensors.
I have two MQTT sensors reporting every 5 seconds to a local broker. That was not a problem before I integrated them into HA.
If I use a chart with these sensors in a dashboard, I can be sure (now…) that HA will crash.

Only yesterday I changed the MQTT configuration so that a value is taken just once per minute, but I already have too many values in the database.
(I’m not sure what will happen if I “concentrate” it (delete some of them), because the “old_state_id” in states will be wrong then…)
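
To check whether a couple of sensors really dominate the database, a read-only count per entity can help before deleting anything. This is a minimal sketch, assuming the recorder's SQLite file has a `states` table with an `entity_id` column (an assumption about the schema of this era, worth verifying on your own copy); `busiest_entities` is a made-up helper name.

```python
import sqlite3


def busiest_entities(conn, limit=10):
    """Return (entity_id, row_count) pairs for the entities with the most rows.

    Assumes a `states` table with an `entity_id` column.
    """
    query = (
        "SELECT entity_id, COUNT(*) AS n "
        "FROM states GROUP BY entity_id "
        "ORDER BY n DESC LIMIT ?"
    )
    return conn.execute(query, (limit,)).fetchall()


def open_read_only(db_path):
    """Open a SQLite file read-only so the check cannot modify it."""
    return sqlite3.connect(f"file:{db_path}?mode=ro", uri=True)
```

It is safest to run this against a copy of home-assistant_v2.db rather than the live file, for example `busiest_entities(open_read_only("/path/to/copy.db"))`, where the path is a placeholder.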

Attached is a terminal screenshot showing the free space before the crash.

I think one option for now, to test the issue, could be to increase the swap. But how?..