@Mariusthvdb My HA setup currently has 181 automations, 27 scripts and 294 helpers. Many of these things are just for monitoring/alerting (so not an issue) but I also have mechanisms implemented for active control of:
Hot Water - immersion heater and gas boiler.
Individual room heating - via presence based control.
Powerwall export mode - solar or everything.
Powerwall automatic backup reserve adjustment.
Powerwall charging control based on if my EV is charging or not.
Powerwall automatic on/off grid control - to optimise use versus export of solar power based on multiple factors.
Powerwall off-peak charging control.
Each of these involves multiple automations and scripts with different (and in some cases overlapping) triggers. They examine many sensors, a lot of which are shared between different automations and scripts, and they modify the state of multiple entities, many of which can in turn be modified by several different automations and scripts.
This mostly works very well but occasionally I see unusual/unexpected behaviour. Sometimes this is due to bugs in my logic/code but occasionally it is hard to pinpoint an obvious cause. Given my setup, the potential for race conditions is theoretically significant, so I was trying to understand how likely they actually are given HA's architecture and its automation/script execution concurrency model, and hence whether I need to guard against them myself (and if so, how). Sadly the answer seems to be ‘no one can really say’, though it does seem that HA is not immune to race conditions in user code.
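To illustrate the kind of race I have in mind, here is a minimal Python asyncio sketch of a lost update. It is not HA code and makes no claim about how HA actually schedules automations and scripts. The `state` dict, the `reserve` key and `bump_reserve` are hypothetical stand-ins for a shared entity and two scripts that each read it, wait on something, and then write it back.

```python
import asyncio

# Hypothetical stand-in for a shared entity (e.g. a backup reserve percentage).
state = {"reserve": 20}


async def bump_reserve(delta: int) -> None:
    current = state["reserve"]          # read the shared state
    await asyncio.sleep(0)              # yield to the event loop (stands in for a service call or delay)
    state["reserve"] = current + delta  # write back a value computed from a stale read


async def main() -> int:
    state["reserve"] = 20
    # Two "scripts" adjust the same entity at the same time.
    await asyncio.gather(bump_reserve(10), bump_reserve(5))
    return state["reserve"]


result = asyncio.run(main())
print(result)  # prints 25, not 35: the +10 update is lost
```

Nothing here uses threads. The interleaving comes purely from the `await` between the read and the write, which is why a single-threaded event loop does not by itself rule out this class of bug.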
I’ve looked into reducing the number of automations/scripts but the downside is a significant increase in complexity (and hence more scope for errors and code that is harder to maintain).
Based on my 40+ years’ experience as a software engineer working on complex high-concurrency systems, I am only too aware of the havoc that race conditions and incorrect mutual exclusion control can cause and how hard it can be to track them down.
I guess I shall just have to hope that the remaining occasional glitches are bugs in my code and not due to race conditions between ‘concurrently executing’ automations/scripts.
As to what could be done, a great first step would be to provide detailed information, in the documentation, as to what isolation/concurrency controls/guarantees are, or are not, provided by HA in respect of concurrent execution of different automations and scripts (the modes for multiple instances of the same automation or script seem to be fairly well documented).
If that detail reveals that race conditions are possible then a second step would be to provide detailed coding guidelines for best practice to avoid (or at least minimise) the possibility of concurrency issues. Maybe HA could also provide some mechanism to help avoid them, such as mutexes, though I do foresee potential issues with allowing user-level mutexes in the context of HA's asyncio implementation and execution paradigm.
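As a rough idea of what a mutex would buy, here is a minimal sketch using Python's standard `asyncio.Lock`. This is plain asyncio, not an existing HA feature. The `bump_reserve` function and the `reserve` key are hypothetical, and the lock is created inside `main()` so that it belongs to the running event loop.

```python
import asyncio


async def bump_reserve(state: dict, lock: asyncio.Lock, delta: int) -> None:
    # Hold the lock across the whole read-modify-write sequence.
    async with lock:
        current = state["reserve"]          # read the shared state
        await asyncio.sleep(0)              # yield to the event loop while still holding the lock
        state["reserve"] = current + delta  # no other holder can have changed it in between


async def main() -> int:
    state = {"reserve": 20}   # hypothetical shared entity
    lock = asyncio.Lock()     # hypothetical user-level mutex guarding that entity
    await asyncio.gather(
        bump_reserve(state, lock, 10),
        bump_reserve(state, lock, 5),
    )
    return state["reserve"]


locked_result = asyncio.run(main())
print(locked_result)  # prints 35: both updates are applied
```

The obvious risk is a script that holds the lock across a long delay or wait, which would stall everything else that needs the same lock. Two scripts taking two locks in opposite orders could also deadlock.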
My hope is that the HA devs could provide details and explanations to show that race conditions in user code are simply not possible (but I am not overly hopeful).