Home Assistant crashing after clocks changed?

We’ve just moved out of daylight saving time (BST, UTC+1) back to our normal time zone (GMT, UTC). At 2am the clocks went back, so we effectively had 1am twice.

I’ve just been woken up at 2.15am to find the lights in the house turned on (many of our Zigbee and Tasmota bulbs are configured to do this if they lose connection) and all our ESPHome devices’ LEDs flashing. Home Assistant has crashed and is completely unreachable!

I’ve power-cycled the Raspberry Pi and it’s all come back okay. We have some automations triggered with cron-style notation to run every “/10” minutes. Looking in the logbook, I can see these time-triggered automations were running repeatedly (almost every second) around 1.15am.
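
For context, the automations in question use the time_pattern trigger, along these lines (a simplified sketch, not my exact config; the script name is a placeholder):

alias: Example Every-Ten-Minutes Automation
trigger:
  - platform: time_pattern
    minutes: /10                # fires at :00, :10, :20, and so on
action:
  - service: script.turn_on     # placeholder action
    target:
      entity_id: script.periodic_check
mode: single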

I want to go back to sleep, but I’ll do more troubleshooting in the morning. Has anyone else seen similar?

No, the daylight saving change here (UTC+10 → UTC+11) went off without issue.

There were some people on Discord reporting issues. The solution was a forced shutdown and restart.

Yup - same here - there is a bug on GitHub.
I’ve narrowed it down to automations that are triggered with a time pattern. From the MASSIVE 123 MB log file I was left with…

That would be why I was unaffected, then - I avoid that trigger. There is usually a much more efficient state-trigger option.

Well, that, and you jumped forward an hour rather than repeating an hour. I certainly didn’t have any problems when we jumped forward an hour in March.

EDIT:

If you do have a fancy template that replicates running an automation every X minutes, though, I’d be interested (one possible approach is sketched after the example below) - because this is one of the automations that was affected:

alias: Camera Snapshots
description: ''
trigger:
  - platform: time_pattern
    minutes: /5
condition: []
action:
  - service: camera.snapshot
    target:
      entity_id: camera.livingroom_mainstream
    data:
      filename: /config/www/cameras/rtsp_livingroom_main.jpg
  - service: camera.snapshot
    target:
      entity_id: camera.bedroom_mainstream
    data:
      filename: /config/www/cameras/rtsp_bedroom_main.jpg
mode: single
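
One idea I’ve seen suggested, though I haven’t tested it (and I don’t know whether template triggers were also hit by this bug): templates that reference now() are re-rendered at the start of every minute, so a template trigger can emulate an every-five-minutes schedule. A minimal sketch:

alias: Camera Snapshots (template-trigger variant)
trigger:
  - platform: template
    value_template: "{{ now().minute % 5 == 0 }}"   # re-evaluated each minute; true at :00, :05, :10, …
action:
  - service: camera.snapshot
    target:
      entity_id: camera.livingroom_mainstream
    data:
      filename: /config/www/cameras/rtsp_livingroom_main.jpg
mode: single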

I only take a snapshot when movement is detected.

- id: d70f3e5f-8bcc-49bb-8aa4-3fc389f300e8
  alias: 'Drive Camera Motion'
  max_exceeded: 'silent'
  trigger:
    platform: state
    entity_id:
    - binary_sensor.pir_drive
    from: 'off'
    to: 'on'
  action:
  - service: camera.snapshot
    data:
      entity_id: camera.front
      filename: '/config/www/snapshots/drive1.jpg'
  - delay: 00:00:01
  - service: camera.snapshot
    data:
      entity_id: camera.front
      filename: '/config/www/snapshots/drive2.jpg'
  - delay: 00:00:01
  - service: camera.snapshot
    data:
      entity_id: camera.front
      filename: '/config/www/snapshots/drive3.jpg'
  - service: script.turn_on
    entity_id: script.drive_cam_telegram
  - service: notify.gmail
    data:
      title: 'Drive Movement'
      message: 'Drive movement detected'
      data:
        images:
          - /config/www/snapshots/drive1.jpg
          - /config/www/snapshots/drive2.jpg
          - /config/www/snapshots/drive3.jpg
  - delay:
      seconds: 5 # trigger every 5 seconds at most

Yeah, everything with a time_pattern trigger just fires constantly, on all cylinders.

That doesn’t help in this case. The cameras use RTSP, so when you load the Lovelace view you don’t get a picture until the first time the picture card refreshes the feed (every 10 seconds or so). This workaround means there is at least a recent image when you open the camera page. It’s needed on multiple installations, not just mine.

Here is another automation that ensures the boiler state stays synchronised across several ESPHome thermostats, in case they have been rebooted or temporarily lost their Wi-Fi connection.

alias: Sensor - Boiler Running
description: Update State of Thermostats
trigger:
  - entity_id: binary_sensor.boiler_working
    platform: state
  - hours: '*'
    minutes: /5
    platform: time_pattern
    seconds: '0'
condition: []
action:
  - data: {}
    target:
      entity_id:
        - switch.boiler_status_led
        - switch.boiler_status_led_2
        - switch.boiler_status_led_3
    service: >-
      switch.turn_{{ 'on' if is_state('binary_sensor.boiler_working','on') else
      'off' }}

Same here. It crashed HA, and I have an automation with a time pattern.
And of course it happened right when I was heating… such a nice and warm house this morning :smiley: :cry:

My system crashed last night. I haven’t figured out yet what caused it, but we have some time to fix this :wink:

You are putting your CPU to that much use 24/7 just to refresh an image in case you might look at it?

Browser Mod can refresh the image only when Lovelace is actually open (use the device sensor’s visibility attribute to trigger a script that captures your image).
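
Roughly like this (a sketch only, untested: the sensor entity name depends on what Browser Mod registered for your device, and I’m assuming its visibility attribute reports visible/hidden):

alias: Snapshot When Dashboard Opens
trigger:
  - platform: state
    entity_id: sensor.browser_livingroom_tablet   # hypothetical Browser Mod device sensor
    attribute: visibility
    to: visible
action:
  - service: camera.snapshot
    target:
      entity_id: camera.livingroom_mainstream
    data:
      filename: /config/www/cameras/rtsp_livingroom_main.jpg
mode: single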

The other one is easily converted to state based triggers too:

alias: Sensor - Boiler Running
description: Update State of Thermostats
trigger:
  - platform: state
    entity_id: binary_sensor.boiler_working
    to:
      - 'on'
      - 'off'
  - platform: homeassistant
    event: start
  - platform: state
    entity_id: 
      - switch.boiler_status_led
      - switch.boiler_status_led_2
      - switch.boiler_status_led_3
    from: 'unavailable'
condition: []
action:
  - data: {}
    target:
      entity_id:
        - switch.boiler_status_led
        - switch.boiler_status_led_2
        - switch.boiler_status_led_3
    service: >-
      switch.turn_{{ 'on' if is_state('binary_sensor.boiler_working','on') else
      'off' }}

This is getting off topic. Open another topic if you want help making your automations more efficient.

Same issue for me. Last night we moved out of daylight saving time, and at 3AM Home Assistant stopped receiving updates from sensors and got stuck at the last value received. I checked the logs in ESPHome, and they were still reporting the right values. Also, none of my automations ran overnight. After a restart of the virtual machine, everything started to work. I’m running core-2021.10.6 and Home Assistant OS 6.5.

Same for me; my HA crashed exactly at 3AM, and the first error message in the log is indeed linked to a time-pattern-triggered automation. Interestingly though, HA kept working (or trying to) for almost another 7 minutes before finally stopping. In that time it generated almost 600,000 assorted error messages in the log - 79 MB of data :slight_smile:

For me, Home Assistant continued to run (data was still captured from sensors and sent to InfluxDB) but my graphs were “flat”. The relevant error was “The recorder queue reached the maximum size” - but I have no idea why the clocks changing would cause the recorder queue to build up.

My recorder database is the default SQLite one.

System Health
version: core-2021.10.2
installation_type: Home Assistant Core
dev: false
hassio: false
docker: false
user: homeassistant
virtualenv: true
python_version: 3.9.1
os_name: Linux
os_version: 5.10.17-v8+
arch: aarch64
timezone: Europe/London

logged_in: false
can_reach_cert_server: ok
can_reach_cloud_auth: ok
can_reach_cloud: ok

FYI:
I have a notification fire each time HA reboots, and I got about six notifications in a row at 3am.
(Yes, I have automations with the time_pattern trigger.)
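
For reference, the reboot notification is just a homeassistant start trigger (a minimal sketch; the notify service name is a placeholder):

alias: Notify On HA Start
trigger:
  - platform: homeassistant
    event: start
action:
  - service: notify.mobile_app_my_phone   # placeholder notify service
    data:
      message: Home Assistant has (re)started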

For me, HA was still running but there was no history after the clocks went back an hour, with this error -

The recorder queue reached the maximum size of 30000; Events are no longer being recorded

Seems fixed by a restart.

Yep, the “backend” crashed for me as well.
Besides the “recorder” message, ZHA seemed to have lost it too.

2021-10-31 02:02:26 WARNING (MainThread) [bellows.zigbee.application] Watchdog heartbeat timeout: 
2021-10-31 02:04:37 WARNING (MainThread) [bellows.zigbee.application] Watchdog heartbeat timeout: 
2021-10-31 02:05:47 ERROR (MainThread) [homeassistant.components.recorder] The recorder queue reached the maximum size of 30000; Events are no longer being recorded
2021-10-31 02:09:35 WARNING (MainThread) [bellows.zigbee.application] Watchdog heartbeat timeout: 
2021-10-31 02:20:58 WARNING (MainThread) [bellows.zigbee.application] Watchdog heartbeat timeout: 
2021-10-31 02:27:18 WARNING (MainThread) [bellows.zigbee.application] Watchdog heartbeat timeout: 
2021-10-31 02:29:17 WARNING (MainThread) [bellows.zigbee.application] Watchdog heartbeat timeout: 
2021-10-31 02:34:22 WARNING (MainThread) [bellows.zigbee.application] Watchdog heartbeat timeout: 
2021-10-31 02:37:58 WARNING (MainThread) [homeassistant.helpers.entity] Update of sensor.plug_tuya_01_electrical_measurement is taking over 10 seconds
2021-10-31 02:38:00 WARNING (MainThread) [bellows.zigbee.application] Watchdog heartbeat timeout: 
2021-10-31 02:38:16 WARNING (MainThread) [bellows.zigbee.application] Watchdog heartbeat timeout: 
2021-10-31 02:38:31 WARNING (MainThread) [bellows.zigbee.application] Watchdog heartbeat timeout: 
2021-10-31 02:38:46 WARNING (MainThread) [bellows.zigbee.application] Watchdog heartbeat timeout: 
2021-10-31 02:39:01 WARNING (MainThread) [bellows.zigbee.application] Watchdog heartbeat timeout: 
2021-10-31 02:39:50 WARNING (bellows.thread_0) [bellows.uart] Reset future is None
2021-10-31 02:40:04 ERROR (bellows.thread_0) [bellows.uart] Lost serial connection: device reports readiness to read but returned no data (device disconnected or multiple access on port?)
2021-10-31 02:40:04 ERROR (MainThread) [bellows.ezsp] NCP entered failed state. Requesting APP controller restart
2021-10-31 02:40:04 WARNING (bellows.thread_0) [bellows.uart] Reset future is None
2021-10-31 02:40:04 ERROR (bellows.thread_0) [bellows.uart] Lost serial connection: device reports readiness to read but returned no data (device disconnected or multiple access on port?)
2021-10-31 02:40:04 ERROR (bellows.thread_0) [bellows.uart] Lost serial connection: device reports readiness to read but returned no data (device disconnected or multiple access on port?)
2021-10-31 02:40:04 ERROR (MainThread) [bellows.ezsp] NCP entered failed state. Requesting APP controller restart
2021-10-31 02:40:04 ERROR (MainThread) [bellows.ezsp] NCP entered failed state. Requesting APP controller restart
2021-10-31 02:47:22 ERROR (MainThread) [homeassistant] Error doing job: Task was destroyed but it is pending!
2021-10-31 02:47:22 ERROR (MainThread) [homeassistant] Error doing job: Task was destroyed but it is pending!

Might be worth some testing before our western Atlantic friends do their DST jump next week…

My system crashed too :thinking:

I woke up to a crashed instance at exactly 2am too. Used the console to do a core restart which brought it back to life.

I thought something had happened as a motion sensor didn’t trigger lights when I walked past.

First time anything like this has happened.

Yes, it looks like Home Assistant crashed for me as well at 02:00 (when the UK DST change applied).

Of course, logs don’t survive a reboot, so I have absolutely no idea why.

I do wish the system logs were saved and rotated… :frowning:

[edit]
HA Blue
core-2021.10.6
supervisor-2021.10.6
Home Assistant OS 6.5