Home Assistant crashing after clocks changed?

Guys, it’s not caused by the time trigger (at least not the one from automations). I have only a few automations, all of which react to Core or MQTT restart only, but I was affected as well.

Got any template sensors using for: or auto_off: ?

Those are the only automations I have:

- id: shellies_announce
  alias: Shellies Announce
  trigger:
  - event: start
    platform: homeassistant
  action:
  - delay: '0'
  - data:
      payload: announce
      topic: shellies/command
    service: mqtt.publish
  mode: single
- id: 5640ac6192c842779baad69d558fafa1
  alias: Notify Mobile app
  trigger:
  - event: start
    platform: homeassistant
  action:
  - service: notify.mobile_app_maxym
    data:
      message: Home assistant restart!
      data:
        attachment:
          url: https://github.com/home-assistant/home-assistant-assets/blob/master/logo-round-192x192.png?raw=true
          content-type: png
          hide-thumbnail: false
  mode: single
- id: '1617665287968'
  alias: Shellies Announce MQTT reload
  description: ''
  trigger:
  - platform: event
    event_type: event_mqtt_reloaded
  condition: []
  action:
  - delay: '0'
  - data:
      payload: announce
      topic: shellies/command
    service: mqtt.publish
  mode: single
- id: '1634338726513'
  alias: Set Home Assistant theme at startup
  trigger:
    platform: homeassistant
    event: start
  action:
    service: frontend.set_theme
    data:
      name: mxm_theme

I see, but I didn’t ask about automations - I accepted what you originally said, that you only had automations running on Home Assistant start. I asked whether you have any template sensors that use the for or auto_off statements. E.g.:

  - platform: template
    sensors:
      boiler_working:
        friendly_name: "Boiler Running"
        value_template: "{{ is_state('switch.house_boiler','on') and (states('sensor.boiler_water_out')|float > 39.5) }}"
        delay_off:
          minutes: 5
      boiler_fault:
        friendly_name: "Boiler Fault"
        value_template: "{{ is_state('switch.house_boiler','on') and is_state('binary_sensor.boiler_working','off') }}"
        delay_on:
          minutes: 15

or

  - trigger:
      - platform: state
        to: 'on'
        entity_id: binary_sensor.mymotiondetectorrule_cell_motion_detection
    binary_sensor:
      - name: Livingroom Presence
        state: 'on'
        auto_off: "00:30:00"
        device_class: occupancy

No.
The only time-related attributes I found are scan_interval and expire_after.
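For context, those attributes look something like this in a config (the entity names, topics, and values here are just made-up examples):

sensor:
  # MQTT sensor: expire_after marks the state as unavailable after this many
  # seconds without a new message - elapsed time, not wall-clock time
  - platform: mqtt
    name: "Balcony Temperature"
    state_topic: "shellies/shellyht-balcony/sensor/temperature"
    unit_of_measurement: "°C"
    expire_after: 600
  # Polled sensor: scan_interval controls how often the value is refreshed
  - platform: command_line
    name: "CPU Temperature"
    command: "cat /sys/class/thermal/thermal_zone0/temp"
    value_template: "{{ (value | float) / 1000 }}"
    scan_interval: 60

Neither of those triggers on a wall-clock time; they just count elapsed seconds.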


Thanks, just trying to find a common thing that we all share, since there are other people saying they don’t have time pattern triggers either. And I can’t imagine that the timestamps aren’t being stored in the database as UTC, so they should be unaffected by the clocks changing in either direction. It has to be something related to code in Core doing something with an internal timer. It doesn’t make sense that (for me) automations which are clearly supposed to run every 5 or 15 minutes were being triggered 20+ times every second. But there must be more going on that isn’t getting logged, for it to be affecting other people who aren’t using any sort of time-related logic in their configuration.
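For illustration, this is the kind of automation I mean - a minimal sketch (the alias and action are made up) of something that should fire once every 5 minutes, but was being triggered 20+ times a second:

- alias: Poll boiler every 5 minutes
  trigger:
    - platform: time_pattern
      # a leading slash means "every 5 minutes", i.e. whenever minutes % 5 == 0
      minutes: "/5"
  action:
    - service: homeassistant.update_entity
      target:
        entity_id: sensor.boiler_water_out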

Interestingly (?) mine stopped at 1, not 3.

What timezone though?

Mine stopped at 1:12am (GMT), because the clock had gone back from 2am (BST), so it continued for 12 minutes after the clock change before the recorder gave up.


High CPU since 01:00. Not locked up, but slow. A reboot of Core fixed it.


Same here (NL). Took me a while to realize what had happened… :roll_eyes:

Rebooted the host and everything was back as before…

EDIT: FYI, the Android HA app on my mobile phone didn’t respond/update to changes. I rebooted the phone and it is working again. I noticed that, before the reboot, it was only updating when the app was in the foreground; in the background it was not updating.

Since only newer installations seem to be affected, it might be the hourly calculation of the statistics.


I’ve been having a think about it. People using InfluxDB didn’t lose any data in InfluxDB, but there were a lot of warnings that it was dropping “old” events. I’m assuming InfluxDB isn’t being asked to store any statistics data, because that is the job of the recorder and whatever database it is connected to. As far as I know, InfluxDB is only storing state change information. So the suggestion is that whatever caused the recorder to give up after queuing 30,000 events was updating the state of one or more entities hundreds, maybe thousands, of times a second.
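For reference, a basic influxdb setup looks something like this (the host and filters are just examples, not anyone’s actual config); it exports state changes as they happen, which is why a storm of state updates would show up there as a flood of events:

influxdb:
  host: 192.168.1.10
  database: home_assistant
  # only export state changes for these domains
  include:
    domains:
      - sensor
      - binary_sensor
      - switch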

FYI

Source of the issue was identified by OttoWinter here.

It is being addressed by PR 58894. EDIT: As per the post below, it addresses only one aspect of the issue.

The PR was merged into patch release 2021.10.7, which should become available for installation today.

Many thanks to OttoWinter, bdraco, and emontnemery (and everyone else involved) for promptly investigating and resolving this issue.


Source of ONE of the issues. This only addresses part of the problem.

Is this only related to HA OS installs or all of them?

IOW, is it an OS issue or a HA Core issue?

OttoWinter did state:

Fixes (at least part of) #58783

but I couldn’t figure out which part was not fixed. Do you know what that might be?

It was an issue in Core caused by the migration from pytz to python-dateutil in May.


My understanding of the fix is that it fixes finding the next time expression, so it will fix the issue with automations firing 20+ times a second. But there are people who did not have any automations using the time trigger, and there are some people - even in this thread, but also in the bug report on GitHub - that don’t use ANY automations at all inside Home Assistant, instead relying on Node-RED for everything. These people also suffered from the increased resource use and the database connection giving up after 30,000 queued events. Something else, not related to the automations, caused this.

In the bug report there is at least one person who had InfluxDB as well as the recorder. InfluxDB stayed up, but did complain that it was dropping thousands of “old” events to “catch up”. I assume InfluxDB doesn’t store any of the new statistics stuff, and is mainly just storing state changes of entities. So for InfluxDB to be complaining, it suggests that one or more entities must have been updating their state hundreds, perhaps thousands, of times a second.


I guess all North Americans who upgrade to 2021.10.7 will soon discover (this coming weekend) the correction’s effectiveness (seeing that it may not address all aspects of the issue).

FWIW, I’m not certain the affected function is limited to use in time triggers only. I skimmed the code and it seemed to have broader use … but I may have simply misunderstood it.

Anyway, I’ll be upgrading to 2021.10.7 but, as a precaution, I’m shutting down my two servers late Saturday night and starting them Sunday morning. If there’s any ‘bad behavior’, I’ll be present to spot it (as opposed to letting it run amok from 02:00).


Also, I noticed that my backups are now 1 GB larger, probably due to a lot of log files. Is there a way to clean them up?
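(If it’s the recorder database rather than the logs that has grown, I guess a purge might shrink it - something like this service call, where keep_days is just a guess:

service: recorder.purge
data:
  keep_days: 7
  # rewrite the database file to actually reclaim the disk space
  repack: true

though that won’t touch the home-assistant.log files themselves.)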
