Continuous crash/reboot after upgrade to 2026.1.3

I have been using HA for a while, but really only for basic stuff. Yesterday I upgraded from 2026.1.1 to 2026.1.3 and now the system restarts continuously. Uptime varies: it seems to be at least 10 minutes, and I have seen as long as 40 minutes (I haven’t monitored it closely yet).

As I’m still a relatively new user, what can I do to identify this?

I would start by looking in the logs.

Do you maybe have Watchman from HACS installed? It crashed my HA this morning.
It’s probable, even very likely, that some integration from HACS is breaking your system because it wasn’t updated for the latest version of HA.

How close to the edge are you on memory utilization? One trick is to disable all add-ons and see if that buys you more uptime. If the system is stable without them, re-enable them one by one until you find the culprit. Unless you know you have plenty of free memory, I’d start there.
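If you want to check memory headroom first, here’s a quick sketch you can run from the SSH add-on or host console (assumes a standard Linux `free` from procps):

```shell
# Print the percentage of memory currently available; a low single-digit
# value would suggest the host is running close to the edge.
avail_pct=$(free | awk '/^Mem:/ { printf "%d", $7/$2*100 }')
echo "Available memory: ${avail_pct}%"
```

If that number is already small at idle, the instability may simply be memory pressure rather than a specific broken integration.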

My system wouldn’t reliably stay up long enough to troubleshoot. I started to wonder whether the recent OS and Core upgrades were related or even the cause. I hadn’t been paying close attention over the preceding several days (the winter storm and all), so I don’t know whether 2026.1.1 was running OK.

My backups were all recent, so I ended up rolling the OS back to 16.3 and Core back to 2025.12.4. Interestingly, the Supervisor still shows 2026.01.1. Everything has been stable since the rollbacks.

At least now I have some time to try to figure this out.

This is the main cause of recent instability:

Yeah, it crashed my HA this morning.


I do not have Watchman installed. Before I move forward I do need to tackle my database; it has gotten huge. But that’s for another topic.

How huge (in GB)?

My backup tar is now over 8GB: homeassistant.tar.gz is almost 2GB, and influxdb.tar.gz is over 7GB.
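For what it’s worth, you can see which pieces dominate a backup by listing the archive contents sorted by size. The backup filename below is hypothetical; substitute your own backup slug:

```shell
# List the five largest members of a backup tarball.
# Column 3 of GNU tar's verbose listing is the member size in bytes.
tar -tvf /backup/a1b2c3d4.tar | sort -k3 -n -r | head -5
```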

Watchman killed mine yesterday. I verified it by restoring: I immediately got the Watchman beta release prompt to upgrade, upgraded, and it killed the instance again.
I restored once more, removed Watchman, and it has been stable since!

I have the same problem after the 2026.1.3 update, and nothing else was installed — I do not have Watchman. I have to kill the power to reset it.

I do see this in the Supervisor log as the last entries before I have to reset the system (I’m including a little before and after). The timestamps do not make sense.

2026-02-03 08:01:12.461 INFO (MainThread) [supervisor.homeassistant.api] Updated Home Assistant API token
2026-02-03 08:31:12.637 INFO (MainThread) [supervisor.homeassistant.api] Updated Home Assistant API token
s6-rc: info: service s6rc-oneshot-runner: starting
s6-rc: info: service s6rc-oneshot-runner successfully started
s6-rc: info: service fix-attrs: starting
s6-rc: info: service fix-attrs successfully started
s6-rc: info: service legacy-cont-init: starting
cont-init: info: running /etc/cont-init.d/udev.sh
[16:27:05] INFO: Using udev information from host
cont-init: info: /etc/cont-init.d/udev.sh exited 0
s6-rc: info: service legacy-cont-init successfully started
s6-rc: info: service legacy-services: starting
services-up: info: copying legacy longrun supervisor (no readiness notification)
services-up: info: copying legacy longrun watchdog (no readiness notification)
[16:27:05] INFO: Starting local supervisor watchdog...
s6-rc: info: service legacy-services successfully started
2026-02-03 16:27:06.602 INFO (MainThread) [__main__] Initializing Supervisor setup
2026-02-03 16:27:06.665 INFO (MainThread) [supervisor.coresys] Setting up coresys for machine: raspberrypi5-64
2026-02-03 10:27:06.669 INFO (MainThread) [supervisor.docker.supervisor] Attaching to Supervisor ghcr.io/home-assistant/aarch64-hassio-supervisor with version 2026.01.1
2026-02-03 10:27:06.682 INFO (MainThread) [supervisor.resolution.evaluate] Starting system evaluation with state initialize
2026-02-03 10:27:06.684 INFO (MainThread) [supervisor.resolution.evaluate] System evaluation complete
2026-02-03 10:27:06.684 INFO (MainThread) [__main__] Setting up Supervisor
  • Installation method: Home Assistant OS
  • Core: 2026.1.3
  • Supervisor: 2026.01.1
  • Operating System: 17.0
  • Frontend: 20260107.2
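On the confusing timestamps: log lines like these often mix UTC entries with host-local ones. The 6-hour jump in the log above (16:27 to 10:27) would be consistent with a UTC-6 zone such as US Central — that zone is an assumption here, not something the log confirms. GNU date can check the arithmetic:

```shell
# Convert the UTC supervisor timestamp to a hypothetical UTC-6 zone;
# if it matches the other entries, the "jump" is just a timezone mix.
TZ=America/Chicago date -d '2026-02-03 16:27:06 UTC' '+%H:%M:%S'
# -> 10:27:06
```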

OK, so after rolling back to OS 16.3 and Core 2025.12.4, everything ran stable for over a week. I decided to try upgrading again, but this time only the OS, to 17.0.

After upgrading to 17.0, the problem returned: instability, with reboots sometimes only a couple of minutes apart.

I’m not sure which logs might reveal what is going on, but I do see the following in the InfluxDB logs on 17.0. On 16.3 I don’t get the corrupt tsm file error or the “too many open files” errors:

s6-rc: info: service s6rc-oneshot-runner: starting
s6-rc: info: service s6rc-oneshot-runner successfully started
s6-rc: info: service base-addon-banner: starting

-----------------------------------------------------------
 Add-on: InfluxDB
 Scalable datastore for metrics, events, and real-time analytics
-----------------------------------------------------------
 Add-on version: 5.0.2
 You are running the latest version of this add-on.
 System: Home Assistant OS 17.0  (aarch64 / raspberrypi4-64)
 Home Assistant Core: 2025.12.4
 Home Assistant Supervisor: 2026.01.1
-----------------------------------------------------------
 Please, share the above information when looking for help
 or support in, e.g., GitHub, forums or the Discord chat.
-----------------------------------------------------------
s6-rc: info: service base-addon-banner successfully started
s6-rc: info: service fix-attrs: starting
s6-rc: info: service base-addon-timezone: starting
s6-rc: info: service base-addon-log-level: starting
s6-rc: info: service fix-attrs successfully started
[13:00:39] INFO: Configuring timezone (America/New_York)...
s6-rc: info: service base-addon-log-level successfully started
s6-rc: info: service base-addon-timezone successfully started
s6-rc: info: service legacy-cont-init: starting
cont-init: info: running /etc/cont-init.d/create-users.sh
cont-init: info: /etc/cont-init.d/create-users.sh exited 0
cont-init: info: running /etc/cont-init.d/influxdb.sh
cont-init: info: /etc/cont-init.d/influxdb.sh exited 0
cont-init: info: running /etc/cont-init.d/kapacitor.sh
cont-init: info: /etc/cont-init.d/kapacitor.sh exited 0
cont-init: info: running /etc/cont-init.d/nginx.sh
cont-init: info: /etc/cont-init.d/nginx.sh exited 0
s6-rc: info: service legacy-cont-init successfully started
s6-rc: info: service legacy-services: starting
services-up: info: copying legacy longrun chronograf (no readiness notification)
services-up: info: copying legacy longrun influxdb (no readiness notification)
services-up: info: copying legacy longrun kapacitor (no readiness notification)
services-up: info: copying legacy longrun nginx (no readiness notification)
s6-rc: info: service legacy-services successfully started
[13:00:46] INFO: Chronograf is waiting until InfluxDB is available...
[13:00:46] INFO: Kapacitor is waiting until InfluxDB is available...
[13:00:47] INFO: Starting the InfluxDB...
ts=2026-02-04T18:05:57.962687Z lvl=error msg="Cannot read corrupt tsm file, renaming" log_id=10rSs2ml000 engine=tsm1 service=filestore path=/data/influxdb/data/homeassistant/autogen/546/000000005-000000002.tsm id=0 error="init: read tombstones: open /data/influxdb/data/homeassistant/autogen/546/000000005-000000002.tombstone: too many open files"
ts=2026-02-04T18:05:58.016077Z lvl=warn msg="error opening fields.idx: open /data/influxdb/data/homeassistant/autogen/738/fields.idx: too many open files.  Rebuilding." log_id=10rSs2ml000 engine=tsm1
2026/02/04 18:08:16 http: Accept error: accept tcp [::]:8086: accept4: too many open files; retrying in 5ms
2026/02/04 18:08:17 http: Accept error: accept tcp [::]:8086: accept4: too many open files; retrying in 10ms
2026/02/04 18:08:17 http: Accept error: accept tcp [::]:8086: accept4: too many open files; retrying in 20ms
2026/02/04 18:08:17 http: Accept error: accept tcp [::]:8086: accept4: too many open files; retrying in 40ms
2026/02/04 18:08:17 http: Accept error: accept tcp [::]:8086: accept4: too many open files; retrying in 80ms
2026/02/04 18:08:17 http: Accept error: accept tcp [::]:8086: accept4: too many open files; retrying in 160ms
2026/02/04 18:08:17 http: Accept error: accept tcp [::]:8086: accept4: too many open files; retrying in 320ms
[13:08:17] INFO: Starting Chronograf...
[13:08:17] INFO: Starting the Kapacitor

'##:::'##::::'###::::'########:::::'###:::::'######::'####:'########::'#######::'########::
 ##::'##::::'## ##::: ##.... ##:::'## ##:::'##... ##:. ##::... ##..::'##.... ##: ##.... ##:
 ##:'##::::'##:. ##:: ##:::: ##::'##:. ##:: ##:::..::: ##::::: ##:::: ##:::: ##: ##:::: ##:
 #####::::'##:::. ##: ########::'##:::. ##: ##:::::::: ##::::: ##:::: ##:::: ##: ########::
 ##. ##::: #########: ##.....::: #########: ##:::::::: ##::::: ##:::: ##:::: ##: ##.. ##:::
 ##:. ##:: ##.... ##: ##:::::::: ##.... ##: ##::: ##:: ##::::: ##:::: ##:::: ##: ##::. ##::
 ##::. ##: ##:::: ##: ##:::::::: ##:::: ##:. ######::'####:::: ##::::. #######:: ##:::. ##:
..::::..::..:::::..::..:::::::::..:::::..:::......:::....:::::..::::::.......:::..:::::..::

2026/02/04 13:08:19 Using configuration at: /etc/kapacitor/kapacitor.conf
2026/02/04 18:08:19 http: Accept error: accept tcp [::]:8086: accept4: too many open files; retrying in 5ms
2026/02/04 18:08:19 http: Accept error: accept tcp [::]:8086: accept4: too many open files; retrying in 10ms
2026/02/04 18:08:19 http: Accept error: accept tcp [::]:8086: accept4: too many open files; retrying in 20ms
2026/02/04 18:08:19 http: Accept error: accept tcp [::]:8086: accept4: too many open files; retrying in 40ms
2026/02/04 18:08:19 http: Accept error: accept tcp [::]:8086: accept4: too many open files; retrying in 80ms
2026/02/04 18:08:19 http: Accept error: accept tcp [::]:8086: accept4: too many open files; retrying in 160ms
2026/02/04 18:08:19 http: Accept error: accept tcp [::]:8086: accept4: too many open files; retrying in 320ms
2026/02/04 18:08:19 http: Accept error: accept tcp [::]:8086: accept4: too many open files; retrying in 640ms
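The “too many open files” errors mean the InfluxDB process is hitting its file-descriptor limit. A sketch for comparing the limit against actual usage, run from a shell inside the add-on container (the process name influxd is an assumption):

```shell
# Compare the soft open-file limit with the number of descriptors
# the InfluxDB process currently holds (via the /proc filesystem).
pid=$(pidof influxd)
awk '/Max open files/ { print "soft limit:", $4 }' "/proc/$pid/limits"
echo "open now: $(ls "/proc/$pid/fd" | wc -l)"
```

If the open count sits right at the soft limit, that points to either a leak or a limit set too low for a database this size, rather than file corruption as the root cause.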

How long it runs before rebooting seems random. Maybe something else is causing this; I just don’t know where else to start looking. I ended up restoring from an earlier backup and now things are stable again.

I’m wondering if I somehow got a corrupted 17.0 image or something. I don’t know how to force a fresh download to overwrite the 17.0 image on the SSD. When I tried to update again earlier today for this test, it seemed to know it already had the image. The CLI (ha os info) shows boot_slot B has 17.0 — maybe that is why?