2023.8: Update breaks automations

raman325 · August 17, 2023, 6:21pm

hi all, apologies for potentially hijacking this thread, but for people experiencing this problem with automations involving zwave devices, I’d like to look into whether or not the Z-Wave JS integration or driver is somehow contributing to this problem.

I’ve already reviewed the Z-Wave JS PRs introduced in 2023.7 and I don’t think this is newly introduced behavior. Rather, the automation changes introduced in 2023.7 may have exposed an existing issue with zwave-js that was previously hidden from users (and us devs) because HA would stop waiting for the service call to complete after 10 seconds (it no longer does this).

If you’d like to help, please provide the following:

Automation YAML definition
Automation trace ideally, but if that’s not possible because the automation never finishes, an indication of what step in the definition the automation run is hanging on
Debug level zwave_js integration logs
Debug level zwave-js-server-python library logs
Debug level zwave-js driver logs (this is the addon logs for Z-Wave addon users, the Docker container logs for zwave-js-server or zwave-js-ui for bare Docker users, or zwave-js-server logs for the people running the server on the command line)

While I realize there isn’t much information here, this section of the docs may help you in obtaining the driver logs: Z-Wave - Home Assistant

For the integration and library logs, you can update your HA configuration, or use the services listed here: Logger - Home Assistant
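
As a rough sketch of the configuration route, the fragment below raises the integration and library loggers to debug in `configuration.yaml`. The logger names `homeassistant.components.zwave_js` and `zwave_js_server` are assumptions not stated in this post, so check them against the linked docs before relying on them.

```yaml
# configuration.yaml (sketch; logger names are assumed, verify against the docs)
logger:
  default: info
  logs:
    # Z-Wave JS integration inside Home Assistant
    homeassistant.components.zwave_js: debug
    # zwave-js-server-python library
    zwave_js_server: debug
```

The service route should accept the same name-to-level pairs as service data for `logger.set_level`, which avoids a restart.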

For any additional help in obtaining the logs, please ask in the Discord #zwave channel

If you can’t publish this info here, you can open a GitHub issue, or you can DM me on discord (same username). Thanks!

Richi_Bowzer · August 21, 2023, 6:59am

It’s not just Z-Wave. After the upgrade to any 2023.8 version, various automations stopped working. I have one blueprint for a 3 band Opple switch, as mentioned previously: it does not execute the action for Zigbee lights, blinds, Reolink camera actions, or Hue integration actions (apart from turning Hue devices off), but it still works flawlessly with a Soma Tilt unit, though not with a Soma roller (on the tilt everything works, on the roller only close works)…

Something deeper is afoot here, as it seems to be an issue with the automations rather than a specific integration.

Changing the mode of an automation or adding any code is not going to fix it. Something is broken, and putting a band-aid on it will just mean more problems down the line. People need to post on GitHub to get it fixed.

raman325 · August 22, 2023, 3:20am

What’s going on is that there was a 10-second timeout on service calls that no longer exists. This has exposed the fact that some services would hang indefinitely. Each of these integrations needs to be looked at one by one.

Richi_Bowzer · August 22, 2023, 6:49am

So Hue, MQTT, and Reolink all need to be fixed? (Especially weird, as Hue will turn off the lights it controls, but not on.) It seems unlikely (but I am no expert), because all the items that broke in these automations work fine in 2023.7 and before, and not delayed: they are instant. Even without this timeout, surely they should all still work as they did previously? My issue here is that it’s only affecting maybe 10% or more of users, and not all of them will be savvy enough to add the band-aid fix to their automations.

From web coding I know that band-aiding something now will only lead to further problems down the road.

petro · August 22, 2023, 11:38am

Listen, you’re more than welcome to question bugs, but at this point we’ve narrowed down the issue to exactly what Raman is talking about. You’re talking with a lead Zwave Dev who has been talking with the people who made the change in 2023.7. We are 99% sure what raman described is the cause because we can replicate it.

123 · August 22, 2023, 2:15pm

Here’s my interpretation of the issue (and it may be an oversimplification):

In previous versions, if an integration failed to acknowledge a command, like a service call in an automation, the automation would give up waiting for the integration’s reply after ten seconds.

The advantage of having a timeout is that it avoids waiting indefinitely for a reply that may never be received.
The disadvantage of the timeout is that it masks potential problems in the integration. By simply giving up and moving on to execute the next action (if any) no one is aware that something in your system failed to work normally.

By eliminating the timeout, those failures are now readily apparent (because the automation waits indefinitely for the reply). The focus now is on correcting the integrations that, on occasion, fail to reply promptly.

Richi_Bowzer · August 22, 2023, 7:32pm

As I said, I am no expert in this, but Taras’s line probably says it best: integrations, as in multiple, need fixing, including Hue, Reolink, MQTT… The weird thing, though, is that with the Soma one (which I know is not official but from HACS), all buttons still work for the tilt blind, but for the roller only down works (pretty much like the Hue automation, in that only off works)…

Of course I did band-aid my automations to single as a temporary fix. But I am just suggesting that with so many integrations not responding, with various errors, and some off commands working but not on… could there be another issue someplace else…

again I could be 100% wrong

123 · August 23, 2023, 1:01am

Can you link to the posts in the community forum, or Issues in GitHub, reporting problems with the integrations you mentioned? The majority that I have seen (forum and GitHub) are for Zwave and ZigBee.

Here’s one data point: I haven’t experienced any problems with Hue or MQTT in 2023.7.3 or 2023.8.3.

Tinkerer · August 23, 2023, 6:43am

On 2023.7.3 with Zigbee2MQTT I also haven’t seen this problem.

at9 · August 23, 2023, 1:39pm

I had to downgrade to 2023.7.3 from 2023.8.3 and this fixed my issues with Z2M.

123 · August 23, 2023, 1:48pm

What were all of the issues?

at9 · August 23, 2023, 1:53pm

Most of the automation/blueprints for my zigbee devices were getting stuck like others have mentioned.

123 · August 23, 2023, 2:12pm

That’s interesting. It’s my understanding that the 10-second timeout was removed in 2023.7.0. Yet you’re not experiencing the ‘stuck automation’ problem in that version and neither is Tinkerer. Perhaps I’m mistaken and the timeout was only removed in 2023.8.0.

Alternatively, the timeout was indeed removed from 2023.7.0 and the fact Z2M misbehaves only in 2023.8.X perhaps serves as another clue for the development team.

Just to clarify, I’m using devices communicating via MQTT over Wifi/Ethernet (in 2023.8.3 without the ‘stuck’ automation issue). In contrast, Z2M does ultimately rely on Zigbee to communicate with the physical device; a delay in acknowledging a ZigBee command will be experienced by Home Assistant regardless if it communicates directly with the device via ZigBee or indirectly via MQTT<->ZigBee.

Richi_Bowzer · August 23, 2023, 7:45pm

2023.7 was fine in all releases; this only occurred for me on the 2023.8 release. @Taras, you can band-aid it by going into those blueprints via Studio Code Server and changing them to single instead of restart.

As for comments on the others, I still find it strange. My first thought when booting up the first release was that it was MQTT related, as the switches still worked and showed they were responding.

But then the Hue bulbs turned off but not back on, and none of the items tied via MQTT worked.
But then the Reolink spotlights stopped responding… So I thought maybe the switches were just being funny, but no, they were all showing actions when pressed. Then, to my surprise, the Soma Tilt blind in the office worked fine, open and close via the switch, but the Soma roller in the kitchen would only close.

Before reverting back to 2023.7 the first time, I double-checked that I could call the services via dev tools, and that was fine. I then went and changed all automations and blueprints that used restart to single as a band-aid fix.

Lux4rd0 · August 23, 2023, 8:39pm

I had the timeouts in 2023.7 and 2023.8:

github.com/home-assistant/core

Automations marked as "Still Running" After Upgrade to 2023.7 & 2023.8

opened 09:08PM - 08 Aug 23 UTC

lux4rd0

integration: automation

### The problem

Automations are getting stuck in versions of HA core 2023.7.0 through 2023.8.1 after upgrading from 2023.6.3.

![trace1](https://github.com/home-assistant/core/assets/30187533/37c63524-2de6-41eb-8028-d17122b54fe8)
![trace2](https://github.com/home-assistant/core/assets/30187533/d9909caa-7ded-4a35-9219-1c977ea7a7a5)
![trace3](https://github.com/home-assistant/core/assets/30187533/a03ad64e-479a-4d7a-9312-9ab482bff911)

### What version of Home Assistant Core has the issue?

core-2023.7.0, core-2023.7.1, core-2023.7.2, core-2023.7.3, core-2023.8.0, core-2023.8.1

### What was the last working version of Home Assistant Core?

core-2023.6.3

### What type of installation are you running?

Home Assistant OS

### Integration causing the issue

Z-wave and Zigbee

### Link to integration documentation on our website

_No response_

### Diagnostics information

[home-assistant.log.2023.6.3.zip](https://github.com/home-assistant/core/files/12295856/home-assistant.log.2023.6.3.zip)
[home-assistant.log.2023.7.0.zip](https://github.com/home-assistant/core/files/12295857/home-assistant.log.2023.7.0.zip)

### Example YAML snippet

```yaml
alias: Office Day On
description: ""
trigger:
  - platform: state
    entity_id: binary_sensor.office_motion_motion
    to: "on"
condition:
  - condition: or
    conditions:
      - condition: state
        entity_id: input_select.mode
        state: Home
action:
  - type: turn_on
    device_id: 6a7dd3564b68685d3dfbfbe0d2429237
    entity_id: light.office_floor_lamp
    domain: light
    brightness_pct: 100
  - type: turn_on
    device_id: 6b3ff3b7d4b89d95e2120316881e62ca
    entity_id: light.office_overhead
    domain: light
    brightness_pct: 100
  - service: zwave_js.bulk_set_partial_config_parameters
    data:
      parameter: "16"
      value: 33884694
    target:
      entity_id:
        - light.office_floor_lamp
        - light.office_overhead
    enabled: false
  - service: light.turn_on
    data:
      effect: Twinkle
    target:
      entity_id:
        - light.esp07_light_1
        - light.esp07_light_2
        - light.esp07_light_3
        - light.esp07_light_4
mode: single
```

### Anything in the logs that might be useful for us?

_No response_

### Additional information

Watching several other issues that have been opened and closed:

https://github.com/home-assistant/core/issues/97965
https://github.com/home-assistant/core/issues/97768
https://github.com/home-assistant/core/issues/97721
https://github.com/home-assistant/core/issues/97662
https://github.com/home-assistant/core/issues/97581
https://community.home-assistant.io/t/already-running-new-automation-bug/596654/16

There might be nuances with the “restart” mode, as I only used “single” for all my automations. I’ve moved everything to “restart” because of the madness and haven’t seen it hang yet. Just following along here to see if there are similar fixes to the situation.

123 · August 23, 2023, 8:56pm

The first users to report the ‘stuck’ automation problem employed mode: single. In fact, this makes sense because a ‘stuck’ single-mode automation will trigger once and then never again (because only one execution instance is permitted). Others have reported it also occurs with mode: restart because the ‘stuck’ automation refuses to abort the first execution instance in order to start over.

In other words, neither of the two modes is a guaranteed ‘band aid’ nor, for that matter, is using continue_on_error: true (which was reported to avoid the problem for some users but not others).
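
For reference, a minimal sketch of where those two settings live in an automation. The alias and entity IDs are hypothetical, and, as noted above, neither setting reliably prevents a stuck run.

```yaml
alias: Hallway light on motion  # hypothetical automation
# single: ignore new triggers while a run is active
# restart: try to abort the active run and start over
mode: restart
trigger:
  - platform: state
    entity_id: binary_sensor.hallway_motion  # hypothetical entity
    to: "on"
action:
  - service: light.turn_on
    # Keep running later actions if this one raises an error.
    # It likely cannot help if the call never returns at all.
    continue_on_error: true
    target:
      entity_id: light.hallway  # hypothetical entity
```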

tl;dr
There’s no single, reliable workaround for all integrations experiencing the problem. It will be solved only when the development team mitigates it within the affected integrations.

In case you’re wondering where I am getting the data for my observations, I have read all reports about the problem posted in the community forum and in GitHub Core (and commented in most of them in an attempt to characterize it).

gyordanov · August 23, 2023, 10:08pm

I tried single/restart modes and even added continue_on_error: true … it did help a little, but I still have automations getting stuck …
Funnily enough, one gets stuck on checking if the door is open. In theory that should not involve an integration call, as it should just check the current status (battery-operated Z-Wave sensor)…
But it does get stuck there … or at least the UI shows it like it does.

Jon123 · August 24, 2023, 3:26am

Have you received the data you need here? If not I can try to gather it tomorrow. I have a few automations that keep failing. The issue is they run every x minutes as a single mode, so by the time I catch them usually the traces are no longer available.
I have one now that won’t stop running even after I changed it and saved it

EndUser · August 24, 2023, 3:33am

I follow this with interest. I was among the early ones reporting this issue after upgrading to 2023.8. However, for me the issue disappeared after a few days, and the only change I consciously made was to run an HA restart at 4 am. I am not sure that this makes a difference, and I may have done something else that I don’t remember. Among my well over 100 automations there are only one or two where I changed single to restart.

123 · August 24, 2023, 4:48am

Every time you restart, all parts of Home Assistant are reset. That means any automation that may have become ‘stuck’ is reset.

It’s the equivalent of regularly rebooting a slow computer as a means of fixing a memory leak. In other words, it doesn’t cure the actual problem, it only suppresses the symptoms.
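
The nightly restart mentioned earlier could be sketched as the automation below, assuming the built-in `homeassistant.restart` service and a time trigger; the alias is hypothetical. As the reply above points out, this only clears stuck runs and does not address the cause.

```yaml
alias: Nightly restart  # hypothetical automation
trigger:
  - platform: time
    at: "04:00:00"
action:
  # Restarting resets every automation, including any stuck runs
  - service: homeassistant.restart
mode: single
```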