Home Assistant crashing after clocks changed?

Guys, it’s not caused by the time trigger (at least not the one from automations). I have only a few automations, all of which react to Core or MQTT restart only, but I was affected as well.

Got any template sensors using for: or auto_off: ?

Those are the only automations I have:

- id: shellies_announce
  alias: Shellies Announce
  trigger:
  - event: start
    platform: homeassistant
  action:
  - delay: '0'
  - data:
      payload: announce
      topic: shellies/command
    service: mqtt.publish
  mode: single
- id: 5640ac6192c842779baad69d558fafa1
  alias: Notify Mobile app
  trigger:
  - event: start
    platform: homeassistant
  action:
  - service: notify.mobile_app_maxym
    data:
      message: Home assistant restart!
      data:
        attachment:
          url: https://github.com/home-assistant/home-assistant-assets/blob/master/logo-round-192x192.png?raw=true
          content-type: png
          hide-thumbnail: false
  mode: single
- id: '1617665287968'
  alias: Shellies Announce MQTT reload
  description: ''
  trigger:
  - platform: event
    event_type: event_mqtt_reloaded
  condition: []
  action:
  - delay: '0'
  - data:
      payload: announce
      topic: shellies/command
    service: mqtt.publish
  mode: single
- id: '1634338726513'
  alias: Set Home Assistant theme at startup
  trigger:
    platform: homeassistant
    event: start
  action:
    service: frontend.set_theme
    data:
      name: mxm_theme

I see, but I didn’t ask about automations - I accepted what you originally said, that you only had automations running on Home Assistant start. I asked whether you have any template sensors that use the for or auto_off statements. E.g.:

  - platform: template
    sensors:
      boiler_working:
        friendly_name: "Boiler Running"
        value_template: "{{ is_state('switch.house_boiler','on') and (states('sensor.boiler_water_out')|float > 39.5) }}"
        delay_off:
          minutes: 5
      boiler_fault:
        friendly_name: "Boiler Fault"
        value_template: "{{ is_state('switch.house_boiler','on') and is_state('binary_sensor.boiler_working','off') }}"
        delay_on:
          minutes: 15

or

  - trigger:
      - platform: state
        to: 'on'
        entity_id: binary_sensor.mymotiondetectorrule_cell_motion_detection
    binary_sensor:
      - name: Livingroom Presence
        state: 'on'
        auto_off: "00:30:00"
        device_class: occupancy

No.
The only time-related attributes I found are scan_interval and expire_after.
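For context, those attributes look something like this in a config (the entity names, topics, and values here are just made-up examples):

sensor:
  # MQTT sensor: expire_after marks the state as unavailable after this many
  # seconds without a new message - elapsed time, not wall-clock time
  - platform: mqtt
    name: "Balcony Temperature"
    state_topic: "shellies/shellyht-balcony/sensor/temperature"
    unit_of_measurement: "°C"
    expire_after: 600
  # Polled sensor: scan_interval controls how often the value is refreshed
  - platform: command_line
    name: "CPU Temperature"
    command: "cat /sys/class/thermal/thermal_zone0/temp"
    value_template: "{{ (value | float) / 1000 }}"
    scan_interval: 60

Neither of those triggers on a wall-clock time; they just count elapsed seconds.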


Thanks, just trying to find a common thing that we all share, since there are other people saying they don’t have time pattern triggers either. And I can’t imagine that the timestamps aren’t being stored in the database as UTC, so they should be unaffected by the clocks changing in either direction. It has to be something related to code in Core doing something with an internal timer. It doesn’t make sense that (for me) automations which are clearly supposed to run every 5 or 15 minutes were being triggered 20+ times every second. But there must be more going on that isn’t getting logged, for it to be affecting other people who aren’t using any sort of time-related logic in their configuration.
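For illustration, this is the kind of automation I mean - a minimal sketch (the alias and action are made up) of something that should fire once every 5 minutes, but was being triggered 20+ times a second:

- alias: Poll boiler every 5 minutes
  trigger:
    - platform: time_pattern
      # a leading slash means "every 5 minutes", i.e. whenever minutes % 5 == 0
      minutes: "/5"
  action:
    - service: homeassistant.update_entity
      target:
        entity_id: sensor.boiler_water_out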

Interestingly (?) mine stopped at 1, not 3.

What timezone though?

Mine stopped at 1:12am (GMT), because the clock had gone back from 2am (BST), so it continued for 12 minutes after the clock change before the recorder gave up.


High CPU since 01:00. Not locked up, but slow. A reboot of Core fixed it.


Same here (NL). Took me a while to realize what had happened… :roll_eyes:

Rebooted the host and everything was back as before…

EDIT: FYI, the Android HA app on my mobile phone didn’t respond/update to changes. I rebooted the phone and it is working again. I noticed that, before the reboot, it was only updating when the app was in the foreground; in the background it was not updating.

Since only newer installations seem to be affected, it might be the hourly calculation of the statistics.


I’ve been having a think about it. People using InfluxDB didn’t lose any data in InfluxDB, but there were a lot of warnings that it was dropping “old” events. I’m assuming InfluxDB isn’t being asked to store any statistics data, because that is the job of the recorder and whatever database it is connected to. As far as I know, InfluxDB is only storing state change information. So the suggestion is that whatever caused the recorder to give up after queuing 30,000 events was updating the state of one or more entities hundreds, maybe thousands, of times a second.
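For reference, a basic influxdb setup looks something like this (the host and filters are just examples, not anyone’s actual config); it exports state changes as they happen, which is why a storm of state updates would show up there as a flood of events:

influxdb:
  host: 192.168.1.10
  database: home_assistant
  # only export state changes for these domains
  include:
    domains:
      - sensor
      - binary_sensor
      - switch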

FYI

Source of the issue was identified by OttoWinter here.

It is being addressed by PR 58894. EDIT: As per the post below, it addresses only one aspect of the issue.

The PR was merged into patch release 2021.10.7, which should become available for installation today.

Many thanks to OttoWinter, bdraco, and emontnemery (and everyone else involved) for promptly investigating and resolving this issue.


Source of ONE of the issues. This only addresses part of the problem.

Is this only related to HA OS installs or all of them?

IOW, is it an OS issue or a HA Core issue?

OttoWinter did state:

Fixes (at least part of) #58783

but I couldn’t figure out which part was not fixed. Do you know what that might be?

It was an issue in Core caused by the migration from pytz to python-dateutil in May.


My understanding of the fix is that it fixes finding the next time expression, so it will fix the issue with automations firing 20+ times a second. But there are people who did not have any automations using the time trigger, and there are some people - even in this thread, but also in the bug report on GitHub - that don’t use ANY automations at all inside Home Assistant, instead relying on Node-RED for everything. These people also suffered from the increased resource use and the database connection giving up after 30,000 queued events. Something else, not related to the automations, caused this.

In the bug report there is at least one person who had InfluxDB as well as the recorder. InfluxDB stayed up, but did complain that it was dropping thousands of “old” events to “catch up”. I assume InfluxDB doesn’t store any of the new statistics stuff, and is mainly just storing state changes of entities. So for InfluxDB to be complaining, it suggests that one or more entities must have been updating their state hundreds, perhaps thousands, of times a second.


I guess all North Americans who upgrade to 2021.10.7 will soon discover (this coming weekend) the correction’s effectiveness (seeing that it may not address all aspects of the issue).

FWIW, I’m not certain the affected function is limited to use in time triggers only. I skimmed the code and it seemed to have broader use … but I may have simply misunderstood it.

Anyway, I’ll be upgrading to 2021.10.7 but, as a precaution, I’m shutting down my two servers late Saturday night and starting them Sunday morning. If there’s any ‘bad behavior’, I’ll be present to spot it (as opposed to letting it run amok from 02:00).


Also, I noticed that my backups are now 1 GB larger, probably due to a lot of log files. Is there a way to clean them up?
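(If it’s the recorder database rather than the logs that has grown, I guess a purge might shrink it - something like this service call, where keep_days is just a guess:

service: recorder.purge
data:
  keep_days: 7
  # rewrite the database file to actually reclaim the disk space
  repack: true

though that won’t touch the home-assistant.log files themselves.)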
