Improving automation reliability

That makes sense, thanks. Can I then use the “on error” condition to abort the automation if it’s already moved onto the next steps?

I would suggest using a timeout:

wait_for_trigger:
  - trigger: state
    entity_id: domain.device_state_1
    to: "on"
timeout: 60  # seconds
continue_on_timeout: false  # stop here if the trigger never fires

With continue_on_timeout: false, the rest of the automation is cancelled if the first state change doesn’t complete within the timeout period (60 seconds here).
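
For context, here is a rough sketch of how that wait might sit between two steps of an action sequence. The entity IDs (switch.example_device, light.example_light) and the actions around the wait are placeholders I made up for illustration, so swap in your own:

```yaml
actions:
  # Step 1: ask the device to turn on (placeholder entity)
  - action: switch.turn_on
    target:
      entity_id: switch.example_device

  # Step 2: wait up to 60 seconds for the device to actually report "on"
  - wait_for_trigger:
      - trigger: state
        entity_id: switch.example_device
        to: "on"
    timeout: 60
    continue_on_timeout: false  # stop the sequence if the wait times out

  # Step 3: only reached if the state change arrived in time (placeholder entity)
  - action: light.turn_on
    target:
      entity_id: light.example_light
```

One thing to watch: wait_for_trigger waits for a new state change, so if the entity is already "on" when the wait starts, nothing fires and the wait runs into the timeout.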