2023.8: Update breaks automations

+1 for me. It seems things work if I reboot prior to my 10PM automation running.

If it fails, I can open up the automation and run each step, one at a time, and they all work, so I doubt the automation is bad, especially since it was running fine under 2023.7.

Same problem on some automations (Shelly device).

However, in 2023.7.x I noticed that Home Assistant sometimes restarted by itself, so I didn’t notice this problem. And I don’t know if this automatic restart was related…

2023.8.2 : same problem and sometimes HA restarts itself…

I have tried upgrading to every 2023.8 release and still come across the same issue: all my switches (wireless buttons) stop working via a blueprint/automation… and there is no sense behind what is happening. For example:

All lights except the ceiling spots and TV lights in one room are controlled by zigbee2mqtt and MQTT, as are some IKEA blinds connected via MQTT to wireless switches around the house (six 6-button switches in total).

The ceiling spots and TV lights in the living room are part of the Hue integration…

All fail to turn on or off, except that the Hue bulbs will turn off from the blueprint/automation but not back on.

Now the even bigger confusion: as Hue runs on a Zigbee network of its own, I was wondering whether the upgrade has done something that causes the Zigbee networks to have a hissy fit and interfere with each other?

Out of all the buttons, only the Soma Tilt ones still work, which confused me further, since I still thought it was a Zigbee-related thing.

I even have 2 of the outdoor Reolink camera spotlights on 2 of the switches to act as lights. I tried both as a switch toggle and as a helper (change device to light) toggle. These don't work, so now I know it's not a Zigbee thing, and I'm even more confused.

So what to try next? Everything works fine from the UI. Let's try calling the services: OK, these work… nothing in the logs to indicate anything.

Gave up and reverted to 2023.7, where everything still works.


To detect which automations are faulty, I noticed that their current attribute is set to 1:

id: 078f8c74-35ad-4f37-ba02-bc280cd6526b
last_triggered: '2023-08-16T11:14:11.257927+00:00'
mode: restart
current: 1
icon: mdi:window-shutter-cog
friendly_name: VR Cuisine - Position

So I made myself a card to see the faulty automations:

type: custom:auto-entities
card:
  type: entities
  state_color: true
  title: Automation State
filter:
  include:
    - entity_id: '*automation.*'
      options:
        secondary_info: last-changed
      attributes:
        current: '> 0'

It doesn’t solve anything, but it keeps track…
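For anyone who would rather not install a custom card, the same check can probably be done with a template sensor. This is only a rough sketch: the sensor name and the stuck_entities attribute are made up for illustration, and it assumes every automation exposes the current attribute shown above. Keep in mind that current > 0 only means a run is in progress, so automations that are legitimately busy (a delay or a wait, for example) will show up too.

```yaml
# configuration.yaml (sketch, names are hypothetical)
template:
  - sensor:
      - name: "Automations in progress"
        # Count automations whose 'current' attribute is above 0,
        # i.e. at least one run has not finished yet.
        state: >
          {{ states.automation
             | selectattr('attributes.current', 'defined')
             | selectattr('attributes.current', 'gt', 0)
             | list | count }}
        attributes:
          # List the entity IDs so the candidates can be inspected.
          stuck_entities: >
            {{ states.automation
               | selectattr('attributes.current', 'defined')
               | selectattr('attributes.current', 'gt', 0)
               | map(attribute='entity_id')
               | list }}
```

A count that stays above zero long after the automations should have finished is a hint that one of them is waiting on a service call.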

@EndUser GitHub - thomasloven/lovelace-auto-entities: 🔹Automatically populate the entities-list of lovelace cards

FYI

Mentioned 12 days ago in post #77. It also sorts them in chronological order.

Sorry, I hadn’t seen that :wink:

Restarting Home Assistant serves to cancel all in-progress automations and reload all of them. That explains why the problem seems to disappear after a restart.

The current theory is the service call is waiting for a reply (from the entity’s integration) that is never received. Instead of timing out, it waits forever. So there appear to be two problems requiring investigation:

  1. There’s no reply.
  2. There’s no timeout.

Thanks. This may be too complicated for me, and I can’t find Lovelace anywhere.

You can still use the simple technique I mentioned 12 days ago.

Thanks !!!

To the people experiencing the problem, try the following experiment to see if it helps to avoid the problem. Add continue_on_error: true to each service call that causes the automation to wait forever (i.e. get ‘stuck’).

Example

- service: light.turn_on
  continue_on_error: true
  target:
    entity_id: light.kitchen
  data:
    brightness_pct: 80

Reference

Continuing on error


Yes, I’ve been doing that for a few days on a few sensitive automations, but it’s hard to catch them all. I hope a real solution will be found soon…

Are you saying that adding continue_on_error: true to a service call fails to prevent it from waiting forever?

It works, but doing it on all automations is very time-consuming! Especially since I have automations where the devices called do not cause errors…

Glad to hear continue_on_error: true works.

Simply add it to the service calls that experience the problem. If you feel that’s too much work, you’ll have to live with automation failures until the development team identifies the cause and corrects it.
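As a rough sketch of what that looks like in a complete automation (the id, alias, and cover.kitchen_blind entity are hypothetical; light.kitchen is borrowed from the earlier example, and the 10 PM trigger mirrors the automation mentioned at the top of the thread):

```yaml
# automations.yaml (sketch, entity IDs and names are hypothetical)
- id: evening_shutdown_example
  alias: Evening shutdown (example)
  mode: restart
  trigger:
    - platform: time
      at: "22:00:00"
  action:
    # continue_on_error is set per action; only the calls that
    # get stuck need it.
    - service: light.turn_off
      continue_on_error: true
      target:
        entity_id: light.kitchen
    # If the call above fails, the sequence still proceeds to this step.
    - service: cover.close_cover
      continue_on_error: true
      target:
        entity_id: cover.kitchen_blind
```

The option is applied to each action individually, so actions that never get stuck can be left untouched.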


That was a smart idea.

Will you add this to the GitHub issue?

I got the idea from what allenporter wrote here:

I am reading some of the changes and it seems like before what would happen is the service would wait for a timeout then proceed anyway even if the call timed out. I think what we should do instead is timeout explicitly, and fail, and allow use of continue_on_error to continue anyway.

It appears that continue_on_error: true can already be used to abort waiting (endlessly) for a reply to the service call.

To be clear, I consider this to be a workaround because people are reporting problems for automations that have worked properly in previous versions. Something yet to be identified is now causing endless waiting.
