Thanks - I’ll get back to you with more on the UI. Just to be clear, I think the best UI feature would be configurable icons and icon colors for each alert. I do not think changing the color of the text will look good; there would likely be too much going on visually.
I am having a couple of issues now that I’ve used this for a bit:
Showstopper race conditions on startup with other integrations
Asynchronous startup has race conditions when initializing alerts. Initialization can go wrong if any entity referenced by an alert is not yet set up. The entity may be in the ‘unavailable’ state, or in the ‘unknown’ state when an integration creates entities on startup (e.g. on restart after a configuration change that causes new entities to be created, or for non-persistent entities that are created on each start).
I’m seeing this with generated alerts, but I suspect the same problems can occur for non-generated alerts.
The first is the values of genRaw and genEntityId and the creation of friendly_name. friendly_name either renders as ‘None’ (if the entity exists as an object but has no friendly_name set at the time of initialization; some integrations I’ve found set it on the fly during startup) or throws an exception (if the entity has not yet been created by another integration). In the former case the friendly_name stays wrong from then on, and in the latter the alert is never created (and alert2 errors are thrown). Also, genEntityId will be empty if the entity was ‘unknown’ (since the lookup in the db fails), which causes exceptions wherever it is used, and again the alert is never created.
The second is when the alert is created but values are unavailable when its templates are evaluated (e.g. the alert condition, and possibly threshold and others; I’m only using condition right now). A lot of exceptions are written to the log, which makes it hard to tell whether there is a legitimate problem. I presume these resolve to valid conditions once the entities reach a valid state in the HA state machine, but I’m not certain of that in either case, and it concerns me.
I’ve also seen exceptions thrown saying that genRaw is not set. I haven’t chased down exactly what’s happening yet, but I suspect it’s related.
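To make the first problem concrete, here is a rough sketch of the kind of guard I have in mind. It is not alert2 code: `entities_ready`, `display_name` and the plain dict standing in for the HA state machine are all hypothetical names I made up for illustration. The idea is to refuse to initialize an alert while any referenced entity is missing, ‘unknown’ or ‘unavailable’, and to fall back to the entity id rather than rendering ‘None’ when friendly_name is not set yet.

```python
from typing import Iterable, Mapping, Optional

# States that mean "this entity is not usable yet".
NOT_READY_STATES = {"unknown", "unavailable"}


def entities_ready(states: Mapping[str, Optional[dict]], entity_ids: Iterable[str]) -> bool:
    """Return True only if every referenced entity exists and has a usable state.

    `states` is a hypothetical stand-in for the state machine:
    entity_id -> {"state": str, "attributes": dict}.
    """
    for entity_id in entity_ids:
        entry = states.get(entity_id)
        if entry is None or entry.get("state") in NOT_READY_STATES:
            return False
    return True


def display_name(states: Mapping[str, Optional[dict]], entity_id: str) -> str:
    """Return friendly_name if it is set, otherwise fall back to the entity id."""
    entry = states.get(entity_id) or {}
    name = (entry.get("attributes") or {}).get("friendly_name")
    return name if name else entity_id
```

With something like this, a generator could skip (and later retry) an entity that is not ready instead of creating an alert with a permanently wrong name or failing outright.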
I am uncertain whether initialization of alerts that are not ‘early_start’ can be delayed entirely until all other integrations have finished their setup under the HA framework, but it appears that’s what we need. If not, we could consider adding timed delays to initialization, but I don’t love that as it’s prone to other errors (running too early and still having the above problems, or too late and missing legitimate alerts).
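As a sketch of the deferral idea (again not alert2 code), the snippet below uses a plain `asyncio.Event` as a stand-in for "all other integrations have finished starting". In Home Assistant itself I believe the natural hook would be the `EVENT_HOMEASSISTANT_STARTED` event, but how that would be wired into alert2 is an assumption on my part. `DeferredAlertSetup`, `mark_started` and `setup_alert` are hypothetical names.

```python
import asyncio


class DeferredAlertSetup:
    """Hold back non-early_start alerts until startup is reported complete."""

    def __init__(self) -> None:
        self._started = asyncio.Event()  # stand-in for the "HA has started" signal
        self.initialized: list = []

    def mark_started(self) -> None:
        """Call once when all other integrations have finished setting up."""
        self._started.set()

    async def setup_alert(self, name: str, early_start: bool = False) -> None:
        # early_start alerts initialize immediately; all others wait for startup.
        if not early_start:
            await self._started.wait()
        self.initialized.append(name)


async def demo():
    setup = DeferredAlertSetup()
    tasks = [
        asyncio.create_task(setup.setup_alert("early", early_start=True)),
        asyncio.create_task(setup.setup_alert("normal")),
    ]
    await asyncio.sleep(0)  # let both tasks run up to their first await
    before = list(setup.initialized)
    setup.mark_started()
    await asyncio.gather(*tasks)
    return before, setup.initialized
```

Waiting on an explicit signal avoids both failure modes of a timed delay: it cannot fire too early, and it does not wait longer than startup actually takes.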
Note: to help me debug this, I added the debug log statement below to AlertGenerator.update, right before declaring the condition, so you can see exactly the values of the alert and the passed-in variables. It helped immensely in debugging the above. If you can, please add it to help debug generator initialization:
break
_LOGGER.debug(f' {self.name} generator creating alert: {acfg} with vars {svars}')
ent = self.alertData.declareCondition(acfg, False, genVars=svars)
Alerts that were active and removed from configuration.yaml stay active
If I have an active alert, remove it from configuration.yaml and restart, the alert still appears active in the UI (and cannot be removed). I manually set its state to ‘off’ to fix this, but I am not sure how to also make it ‘acked’, so it’s in a zombie state right now (I’ll just let it run off the UI clock for now, but it really should get cleared out). It may be worth clearing out or deleting old alert2 objects that are no longer in the config, but unlike the startup problem above, that’s not a showstopper.
Alert state writing to db
Lastly, can we make the alert state get written to the database more frequently? Especially while debugging and restarting, I don’t want to lose old alert info, and 15 minutes is a long time. An optional config item would be best, so users can tailor it to their system. I’m fine with far more frequent writes on my system.
** Question on entities going unavailable **
If an entity referenced anywhere in an alert goes unavailable during steady-state operation, I think this will cause the condition to fail and throw an exception, which alert2.error will catch. I want to make sure, though, because it would not be good for that to be swallowed (the alert itself would then never fire again, falsely indicating everything is ok).
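To illustrate the behavior I’m hoping for, here is a small sketch that keeps "the condition is false" separate from "the condition could not be evaluated". It is only an illustration under my own assumptions, not how alert2 works internally; `ConditionResult` and `evaluate_condition` are hypothetical names.

```python
from enum import Enum
from typing import Any, Callable


class ConditionResult(Enum):
    ON = "on"        # condition evaluated and is true
    OFF = "off"      # condition evaluated and is false
    ERROR = "error"  # condition could not be evaluated (e.g. entity unavailable)


def evaluate_condition(condition: Callable[[Any], bool], value: Any) -> ConditionResult:
    """Evaluate a condition without letting a failure look like 'everything is ok'."""
    try:
        return ConditionResult.ON if condition(value) else ConditionResult.OFF
    except (TypeError, ValueError):
        # A state such as 'unavailable' cannot be converted or compared;
        # report it as an error instead of silently treating it as OFF.
        return ConditionResult.ERROR
```

For example, with a condition like `lambda v: float(v) > 30`, a state of "unavailable" comes back as ERROR rather than OFF, so it can be reported instead of being mistaken for a healthy reading.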