Alert2 - a new alerting component

tman98 · December 13, 2024, 2:30pm

I do think there is value in knowing specifically which variable is an entity_id, and there is value in simplifying generator creation template code. Alerts are not useful if they don’t fire due to coding errors, so things that make them easier to write and read are valuable.

#1 is definitely best for me. Is there any way to autodetect you have entity_ids? It seems if you just do a lookup of a string into the entity database and find it you are pretty sure.

I do like getRaw as a fallback and genEntityId and genGroups if the regex is being used.

redstone99 · December 13, 2024, 10:51pm

@tman98, I just released v1.6.1 which changes the variable semantics for generators. Also updated the docs. Let me know your thoughts. It works like #1, so:

For each element of the list returned by a generator, Alert2 checks if it is an entity_id (i.e., has an entry in hass.states). If so, genEntityId is defined to be the element.

Otherwise, genElem is the element.

genRaw is available as a fallback. Lastly, if the element is a dictionary, the dict keys become variables - and entity_regex returns a dict with the keys genEntityId and genGroups.
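The variable rules above can be sketched in a few lines of Python. This is only an illustration of the described semantics, not Alert2's actual code: the helper name `generator_vars` is hypothetical, `known_entity_ids` stands in for the lookup in hass.states, and it assumes genRaw is set for every element.

```python
from typing import Any


def generator_vars(elem: Any, known_entity_ids: set[str]) -> dict[str, Any]:
    """Build the template variables for one generated element (illustrative only)."""
    # Assumption: genRaw is always available as a fallback.
    variables: dict[str, Any] = {"genRaw": elem}
    if isinstance(elem, dict):
        # Dictionary keys become variables directly (e.g. genEntityId, genGroups).
        variables.update(elem)
    elif isinstance(elem, str) and elem in known_entity_ids:
        # The element names an existing entity.
        variables["genEntityId"] = elem
    else:
        # Anything else is exposed as a plain element.
        variables["genElem"] = elem
    return variables


known = {"sensor.dev1_battery_plus"}
print(generator_vars("sensor.dev1_battery_plus", known))
print(generator_vars("dev1", known))
print(generator_vars({"genEntityId": "sensor.dev1_battery_plus", "genGroups": ["dev1"]}, known))
```

Note that the same string can end up in genEntityId or genElem depending on whether the entity exists at the moment the generator runs.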

Examples. They all produce the same alerts for a set of 3 battery_plus entities:

alert2:
  alerts:
    #
    # generating list of entity_ids
    #
    - generator_name: low_bat
      generator: "{{ states.sensor|selectattr('entity_id','match','sensor.*_battery_plus')
                                  |map(attribute='entity_id')
                                  |list }}"
      domain: battery
      name: "{{ genEntityId|regex_replace('sensor.(.*)_battery_plus', '\\\\1') }}_is_low"
      condition: "{{ state_attr(genEntityId, 'battery_low') }}"
    #
    # same using entity_regex()
    #
    - generator_name: low_bat
      generator: "{{ states.sensor|entity_regex('sensor.(.*)_battery_plus')|list }}"
      domain: battery
      name: "{{ genGroups[0] }}_is_low"
      condition: "{{ state_attr(genEntityId, 'battery_low') }}"
    #
    # explicitly listing elements that are not entity_ids
    #
    - generator_name: low_bat
      generator: [ 'dev1', 'dev2', 'dev3' ]
      domain: battery
      name: "{{ genElem }}_is_low"
      condition: "{{ state_attr('sensor.'+genElem+'_battery_plus', 'battery_low') }}"

-J

woodersayer · December 14, 2024, 6:18am

Woah! Great work! I’ll take a deeper look in the next couple of days as I’ve been away for a work trip. Very excited to see this and appreciate you taking this and running with it.

tman98 · December 14, 2024, 7:02pm

Fantastic work Josh! Thank you - I have already set up a group of sensors for my low temperature alert, used expand in the generator statement, and when it was freezing last night the alerts went off! It also appears to work correctly as I add/remove devices from the group, which is awesome: I no longer need restarts for that alert when I make changes, which is frequent!! That is a huge workflow efficiency for me!

Also so glad to see unit tests - too many projects don’t have them and for a mission critical component like alerting, really comforting to see.

I feel like you’re really close to being able to do a configuration only reload given that you have dynamic alert creation/removal (which is awesome).

One small bug, it does not look like friendly_name is resolving a template. In a generated alert, name, message and condition works correctly, but friendly_name looks like the following in the actual alert itself, so ignoring the template:

"{{ state_attr(genEntityId,'friendly_name') }} is low temp"

yaml:

    - domain: temperature
      generator_name: low_temperature
      generator: "{{ expand('sensor.low_temperature_group')|map(attribute='entity_id')|list }}"
      name: "{{ genEntityId }}_is_low"
      friendly_name: "{{ state_attr(genEntityId,'friendly_name') }} is low temp"
      condition: "{{ states(genEntityId)|float < 45 }}"
      delay_on_secs: 600
      message: "{{ state_attr(genEntityId,'friendly_name') }} is low temp of {{ states(genEntityId)|float }}"

Rudd-O · December 15, 2024, 11:03am

Generators are an absolute winner. BOOM! What a banger of an integration.

redstone99 · December 15, 2024, 7:33pm

I just released v1.6.2, which adds support for templates to friendly_name in generators. @tman98, thanks for reporting that. I’d forgotten about friendly_name when doing generators.

And yeah, support for dynamically adding / deleting alert2 entities does bring us closer to supporting dynamic config reload. I haven’t decided what I’ll work on next yet.

Thanks re unit tests. I should say that while I tried to test all basic functionality and most corner cases I could think of, there are gaps. Though I just checked, and the # of lines of test code exceeds that of alert2 code by 2800 to 2400.

-Josh

teachingbirds · December 16, 2024, 9:39am

You are doing such a great job! I have moved almost all my regular alerts to this integration now and it works wonderfully!

tman98 · December 17, 2024, 2:33pm

Hey Josh - everything now looks like it’s working great. I’ve had a few alerts now set up with generators and seem to be working great. What a feature!

Quick question: could you format date/time in a more readable format for the previous firings and snooze time? ISO 8601 is not the most human readable: “2024-12-17T04:00:33.892304-05:00”.

Log file format is a lot better: 2024-12-17 09:29:15.859.
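For reference, converting the ISO 8601 string into the log-file style takes only the Python standard library. This is a minimal sketch; the function name `to_log_style` is made up for illustration, and it keeps the wall-clock time of the timestamp's own UTC offset rather than converting time zones.

```python
from datetime import datetime


def to_log_style(iso_timestamp: str) -> str:
    """Render an ISO 8601 timestamp as 'YYYY-MM-DD HH:MM:SS.mmm'."""
    dt = datetime.fromisoformat(iso_timestamp)
    # Truncate microseconds to milliseconds, as in the log file format.
    millis = dt.microsecond // 1000
    return dt.strftime("%Y-%m-%d %H:%M:%S.") + f"{millis:03d}"


print(to_log_style("2024-12-17T04:00:33.892304-05:00"))
# prints: 2024-12-17 04:00:33.892
```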

If you’re doing features I’d love icons like above - maybe just make the various states (active & unacked, active & acked, triggered and unacked, acked, snoozed and unacked, snoozed and acked I think are the states) have UI config options for icons & color and defaults, that probably gives us users the best optionality.

redstone99 · December 17, 2024, 8:21pm

@tman98 , yeah I can change the time formatting.
Re UI I can add Lovelace YAML config options. I’m not sure how to put the config options in the UI itself (is there a good example of someone doing this?)

I’m up for adding some default coloring. Coloring red the condition alerts that are currently firing seems straightforward.

If a condition alert is not firing but is unacked, I’m not sure what color that should be - if a user generally acks alerts maybe it should be orange, but if the user does not generally ack alerts, one could argue it should be green / uncolored.

I’m also unsure how to color event alerts. E.g., if some event alert fired 20 minutes ago. If the user is acking alerts, seems like it should be the same orange as an unacked quiet condition alert. But if the user is not acking alerts, I’m unsure what color that should be.

I almost think that if we’re going to have default colors, we need some story for event alerts, to avoid a UI where you have an interweaving of colored and uncolored alerts, which would be confusing.

Thoughts?
Josh

tman98 · December 18, 2024, 3:44pm

Thanks - I’ll revert more on the UI - and just to be clear I think the best UI feature is configurable icons and colors for the icons for each alert - I do not think changing the color of the text will look good, I think likely will be too much visually going on.

I am having a couple of issues now that I’ve used this for a bit:

Showstopper race conditions on startup with other integrations

Asynchronous startup has race conditions when initializing alerts. When a given alert is initialized, initialization can hit a few problems if any referenced entities are not yet set up. Such entities may be in an ‘unavailable’ state, or in the ‘unknown’ state when integrations create new entities on startup (e.g. on restart after a configuration change for an integration that causes new entities to be created, or for non-persistent entities that are created on each start).

I’m seeing this in the context of generated alerts but I suspect the same risk exists for non-generated alerts.

The first is the values of genRaw, genEntityId and creation of friendly_name. friendly_name can render as ‘None’ (if the entity exists as an object but does not have a friendly_name set at the time of initialization, which some integrations I’ve found create on the fly during startup) or throws an exception (if the entity has not yet been created by another integration). In the former the friendly_name will now always be wrong and in the latter the alert is never created (and alert2 errors thrown). Also, genEntityId will be empty if the entity was ‘unknown’ (since the lookup in the db fails) and cause exceptions to be thrown where it is used and the alert is never created.

The second is if the alert is created but values are unavailable when templates (e.g. alert condition and possibly threshold and others etc., I’m only using condition right now) are evaluated. A lot of exceptions are thrown into the log (which is not great for figuring out if there is a legitimate problem or not). I presume these resolve at some point to valid conditions when the entities go to a valid state under the HA state machine, but it concerns me and I’m uncertain in both cases.

I’ve also found exceptions being thrown that genRaw is not set, but haven’t chased those down yet as to exactly what’s happening, but I suspect it’s related.

I am uncertain if alerts that are not ‘early_start’ can be entirely delayed for initialization until after all other integrations have finished their initialization under the HA framework, but it appears that’s what we need. If not, we can consider adding timed delays to initialize, but I don’t love that as it’s prone to other errors (running too early and still having the above problems or too late and missing legitimate alerts).

Note: to help me debug this, I added this debug log statement to AlertGenerator.update right before declaring the condition, so you can see exactly the values of the alert and the passed-in variables. It helped immensely in debugging the above; if you can, please add it to help debug generator initialization:

                       break
                _LOGGER.debug(f' {self.name} generator creating alert: {acfg} with vars {svars}')
                ent = self.alertData.declareCondition(acfg, False, genVars=svars)

Alerts that were active and removed from configuration.yaml stay active

If I have an active alert, remove it from configuration.yaml and restart, it appears the alert is still active in the UI (and cannot be removed). I just manually set its state to ‘off’ to fix this, but I am not sure how to also make it ‘acked’ so it’s in a zombie state right now (I’ll just let it run off the UI clock for now but really should get it cleared out). There may want to be some clearing out of or deleting old alert2 objects that are no longer in the config, but that’s not a showstopper unlike the above startup problem.

Alert state writing to db

Lastly, can we make the alert state get written to the database more frequently? Especially while debugging and restarting, I don’t want to lose old alert info, and 15 minutes is a long time. An optional config item would really be best, to tailor this to the user’s system. I’m fine with far more frequent writes in my system.

Question on entities going unavailable

If an entity referenced anywhere in an alert goes unavailable during steady state operation, I think this will cause the condition to fail and throw an exception which alert2.error will catch, but I want to make sure, as it would not be good for that to be swallowed (the alert itself would then never fire again, falsely indicating everything is ok).

redstone99 · December 18, 2024, 7:57pm

@tman98 - tell me if the following summary accurately captures what you’re seeing.

1. A main bug is that, unlike normal Alert2 alerts, generators don’t wait for HA to fully start before starting to generate (oops). It’s as if they all have early_start set. So they’re running into all kinds of init issues.
2. A second bug occurs when a generator returns a list of generated elements that are used to refer to entities that don’t yet exist or aren’t fully initialized. E.g., a generator returns [ "dev1", "dev2" ], but either sensor.dev1_battery_plus doesn’t exist yet or sensor.dev1_battery_plus hasn’t fully initialized yet. This is much more likely to happen before HA has fully started (see above bug), but could happen even after HA has “fully started”.

   If the corresponding sensor entity doesn’t exist yet, then the DB lookup fails and so genEntityId won’t be defined (instead, genElem will be), causing template errors.

   If the corresponding sensor entity exists but isn’t fully initialized, then you may get unexpected values for domain, name or friendly_name, which are never updated.
3. Alert2 UI doesn’t promptly cull alerts that no longer are in the config.
4. Alert2 DB writing isn’t frequent enough.

Fixing #1 is easy. I’ll make generators behave like normal Alert2 alerts and wait for HA to init.
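The general shape of "wait for startup before generating" can be sketched with plain asyncio. This is not Alert2's or Home Assistant's code: `ha_started` is a stand-in event that a real integration would tie to HA's own startup signal, and `run_generator` is a hypothetical name.

```python
import asyncio


async def run_generator(name: str, ha_started: asyncio.Event, log: list[str]) -> None:
    # Block until startup is signalled; only then evaluate the generator.
    await ha_started.wait()
    log.append(f"{name}: generating")


async def main() -> list[str]:
    ha_started = asyncio.Event()
    log: list[str] = []
    task = asyncio.create_task(run_generator("low_bat", ha_started, log))
    await asyncio.sleep(0)  # let the generator task start and block on the event
    log.append("HA started")
    ha_started.set()
    await task
    return log


print(asyncio.run(main()))
# prints: ['HA started', 'low_bat: generating']
```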

#2 is a bit trickier. One option is to instruct users to add filters to their generated list to restrict the list to only elements that are ready for alerts to be created. E.g., filtering out entities that don’t have friendly_name set yet. I could imagine writing some filters to make this syntactically easier.

Another option is to add a config parameter that specifies a secondary test to run on each generated element (eg testing that friendly_name is set).

A third option is to somehow automate waiting for the “right time” to create an alert from a generated element.

A fourth option is to allow dynamic updates to domain, name or friendly_name. This seems risky to me.

For bug #3, I’ll look into it.

For bug #4, what’s going on is that alert2 writes state updates to the HA db promptly. However, on HA restart, it restores state not from the HA db, but using RestoreEntity, which is what old Alert1 does. RestoreEntity writes out state to a separate file every 15 minutes. I imagine one could make Alert2 restore directly from the HA db, but I’m not sure what’s involved in that. I presume there’s a reason that RestoreEntity exists at all - maybe reading the main HA db is too expensive during startup? To put it another way, do you know of other components that restore state directly from the HA db?

-Josh

tman98 · December 18, 2024, 9:24pm

Thanks for the quick response and the summary, I think it’s pretty accurate. A few comments:

1. Correct. I suspect a number of the problems I’m seeing would be resolved by delaying initialization of generation until after HA starts, so looking forward to that.
2. Correct, and I am not 100% certain at what time certain entities are fully created by different integrations, I would hope by the end of startup but with #1 happening for sure I can’t say yet. But to be foolproof, you’re absolutely right, we can just leverage HA’s state machine to solve any lingering entities that aren’t ready yet, I don’t think we need any special user entries. If a new simple filter just didn’t pass an entity_id through from a provided entity_list until the entity was “ready”, where “ready” is defined as: defined, available and optionally has a friendly_name, HA’s state machine should just do everything correctly. You’d want a timeout though, i.e. if an entity is not ready by a certain amount of time, there is a failure to create the alert and an alert2 error reported. So I think the filter needs to directly raise an alert2.error alert after an optional timeout period to notify the user that the entity did not become ready (and indicating what the ready failure was).

Note I included “available” in the “ready” definition. With the timeout I think this is really useful to raise a report that an entity we expected to alert on was never available to start. It’s useful to know during steady state operations, but I think there’s a particular use case on startup as that’s often after you just made a configuration.yaml change and are most susceptible to diagnosing config errors.

Various useful declarations would be to cover the use cases I think you have for entity lists and simplicity:

alert2_ready_filter(entity_object_list, [optional] wait_for_friendly_name, [optional] timeout seconds) - the least amount of work for the user: just do a selectattr or expand (which cover the majority of my programmatic selection criteria) and pass the results in to the generator filter
alert2_ready_filter(entity_id_list, [optional] wait_for_friendly_name, [optional] timeout seconds) - a list of entity_id strings is useful and actually required for one of my use cases that cannot get entity objects as it’s using a template macro to build a list
(approximately) alert2_ready_filter(list_of_dictionaries_or_lists_with_entity_id_as_key_or_item_0, [optional] wait_for_friendly_name, [optional] timeout seconds) - I think this covers one of your use cases where you wanted to pass the entity_id and other data in

(there may be others)

Example use cases I have:

      generator: "{{ alert2_ready_filter(states.sensor|selectattr('entity_id','match','sensor.remote_connection_to_(.*)'), true, 60) }}"

      generator: "{{ alert2_ready_filter(['entity_id_1', 'entity_id_2'], true, 60) }}" # A list which I often am programmatically creating by a template

And the use cases I think you’re running in your examples. I think there is usefulness to naming your filters alert2 so it’s not confusing that these are being used when reading YAML with lots of integrations (i.e. they are not generic HA filters that may be used elsewhere).

Ironically, I think this would be belt and suspenders for bug #1 as I think it actually would delay generation automatically by not passing down entities that aren’t ready yet, but #1 makes sense absolutely as that’s a clear bug and we should start the filter’s timeout period when HA has started anyway.

3. Correct
4. Maybe this is not a bug, I’m actually not sure upon understanding a bit more of your explanation. Let’s come back to that, not urgent for sure.

Thanks!

EDIT: thinking about unavailable entity states, they are a little annoying because they can fail in weird ways in a typically clean written condition (e.g. they can’t be converted into a float or int so a compare fails saying “no default” provided). And some might also want unavailable to come down and not be a failed “ready” state as they may in fact want to check unavailable as a part of their condition. Also an “unavailable” state could become available so it doesn’t actually mean the alert should not necessarily be created, so I don’t in fact love my suggestion above upon consideration. I think “ready” should be defined as defined and optionally friendly_name available.

I dread over argumenting functions, but I would see utility in my world having an optional wait_for_available argument to the filter then instead of forcing that check upon the user.

The other answer is just to add an unavailable check into your condition if you care about it, and not be too clever in the filter, because a good clean condition requires it anyway if an entity goes unavailable during steady state operation.

So in short a patch fix for #1 would be great to get going at least and hopefully remove the dead alerts I’m getting every startup and we can keep thinking out unavailable state handling.
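To make the proposed "ready" filter concrete, here is a rough Python sketch using the revised definition from the EDIT above (defined, and optionally has a friendly_name), plus the timeout. Everything here is hypothetical: `ready_filter`, its parameters, and the `states` dict (entity_id mapped to an attribute dict) are stand-ins, and a real version would report failures through alert2.error rather than return them.

```python
def ready_filter(
    entity_ids: list[str],
    states: dict[str, dict],
    elapsed_secs: float,
    wait_for_friendly_name: bool = False,
    timeout_secs: float = 60.0,
) -> tuple[list[str], dict[str, str]]:
    """Return (ready entity_ids, failures keyed by entity_id with a reason)."""
    ready: list[str] = []
    failures: dict[str, str] = {}
    for entity_id in entity_ids:
        attrs = states.get(entity_id)
        if attrs is None:
            reason = "entity is not defined"
        elif wait_for_friendly_name and not attrs.get("friendly_name"):
            reason = "friendly_name is not set"
        else:
            ready.append(entity_id)
            continue
        # Not ready: stay quiet until the timeout expires, then report why.
        if elapsed_secs >= timeout_secs:
            failures[entity_id] = reason
    return ready, failures


states = {"sensor.a": {"friendly_name": "A"}, "sensor.b": {}}
ids = ["sensor.a", "sensor.b", "sensor.c"]
print(ready_filter(ids, states, elapsed_secs=10, wait_for_friendly_name=True))
print(ready_filter(ids, states, elapsed_secs=61, wait_for_friendly_name=True))
```

Before the timeout, entities that are not ready are simply held back; after it, each one is reported with the specific reason it failed the check.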

redstone99 · December 18, 2024, 10:13pm

Hi @tman98 , thanks for the thoughts.
I just released v1.6.3 which fixes the bug so now generators wait till HA fully starts before generating. It also fixes a similar issue with event alerts. And adds that debug logging statement you suggested.
I wanted to get this out so you could better understand remaining init bugs.
Cheers,
Josh

tman98 · December 18, 2024, 10:21pm

You’re the best! Will immediately start using it!

tman98 · December 19, 2024, 7:36am

Well it all seems to be working so far with initializing alerts!

Question, did you get the gen* variables into the threshold alert’s “value” key? That’s the other alert type I want to start using too now, but uncertain if I can use genEntityId in it after reading the code (and can we make “value” evaluate a template if not!)

EDIT ADD: I’d really like reminder to be shorter than a minute for really emergency alerts. I have smoke alarms at a remote building no one is at. If they go off I want my phone to absolutely blow up! Given the name already has minutes in it, either you’d need to add a reminder_seconds field or support fractional minutes - for simplicity, the latter would frankly be fine with me to keep from bloating the config.

redstone99 · December 19, 2024, 8:02pm

Glad to hear it’s working so far. So to respond to your questions / comments:

1. reminder_frequency_mins accepts a float as small as 0.01 min. Is that what you’re looking for?
2. Yes, the value template is aware of the generator variables.
3. I think I’ll make friendly_name be a dynamically tracked template. I can’t think of a reason why it shouldn’t be allowed to dynamically change if someone wants. And that’d alleviate the specific issue around friendly_name initing late in generators.
4. HA templates are lax around undefined variables. I.e.:
{{ zzz }} produces the empty string
{{ states.sensor.zzz }} produces null
{{ states('sensor.zzz') }} produces unknown
you can request that Jinja throw an error on undefined vars, but HA doesn’t use that option by default. It’s probably possible to change the Alert2 template eval environment to set that option. That may just affect the {{zzz}} example above.
5. Yeah, HA doesn’t I think have clear semantics around when you can expect an entity to exist or be fully initialized. Waiting for HA to “fully start” helps but is no guarantee. This is an issue for templates in general as well as generators. You can work around it by setting defaults in int/float conversions and explicitly checking for unavailable or undefined, but that’s a bit klunky. And if you want to be alerted if some entity isn’t ready, you have to set up a separate alert for that.
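The workaround described in the last point (a default for the float conversion plus an explicit check for entities that are not ready) looks roughly like this when written out in Python. It is only an analogy to what a template condition would do; the function names and the 45-degree threshold (borrowed from the low-temperature example earlier in the thread) are illustrative.

```python
from typing import Optional

# State strings Home Assistant reports for entities that have no usable value.
NOT_READY_STATES = {"unavailable", "unknown"}


def to_float(state: Optional[str], default: float) -> float:
    """Convert a state string to float, falling back to a default on failure."""
    try:
        return float(state)
    except (TypeError, ValueError):
        return default


def is_low_temperature(state: Optional[str], threshold: float = 45.0) -> bool:
    # Explicit readiness check: a missing or unavailable entity is not "low".
    if state is None or state in NOT_READY_STATES:
        return False
    # Defaulting to the threshold means unparsable values never trigger.
    return to_float(state, default=threshold) < threshold


print(is_low_temperature("40.5"), is_low_temperature("unavailable"))
```

As noted in the post, this keeps the condition from throwing, but it also hides the fact that the entity is not ready, so a separate alert is still needed to catch that case.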

I’m not sure what the best answer is. At one point I was toying with adding an option to Alert2 to alert if any entity referred to in an alert is unavailable/undefined for some period of time, so I could get rid of the repetitive checks in my alerts.

Adding a filter to a generator to filter out entities that aren’t ready might help, but it’d be nice to have a solution that more broadly helps the issue. And if I make friendly_name dynamically update, then the need for the filter in generators may be less.

Thoughts?

Josh

tman98 · December 20, 2024, 1:41pm

1. Thank you, that’s perfect and should be suitably annoying at a low number
2. Great! While I’m on the topic, do the other threshold alert fields (i.e. maximum, minimum, hysteresis) support templates and gen* variables? If not let’s make it all consistent, I’m desiring to use an input_number field for example to set those values.
3. Good idea - no reason that shouldn’t dynamically update
4. and 5.: I think there’s a lot of nuance to the undefined/unavailable entities. I have a few thoughts but want to think out the corner cases and potential alerting around them. Agreed that #3 should help

redstone99 · December 20, 2024, 8:25pm

@tman98 : max/min/hysteresis don’t support templates, though they certainly could.

Btw, over at https://github.com/redstone99/hass-alert2/issues/5, a use case arose that might benefit from an alert state of “something to be aware of” that’s somewhere between “off” and “on”. I was thinking of addressing that by adding a new service call reset_alert, that undoes snooze, ack and reminder timing, so, if the condition for an alert is true, it’s as if the alert just started firing. So the idea is you’d snooze the alert to mark it as a “something to keep an eye on”, but if conditions change and you want the alert to really fire again, you’d call reset_alert.
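As a way to pin down the proposed semantics, here is a small Python model of what reset_alert might do. It is a guess at the behavior described above, not Alert2 code; the `AlertState` class and all of its fields are invented for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class AlertState:
    condition_on: bool = False
    acked: bool = False
    snoozed_until: Optional[float] = None  # epoch seconds, None if not snoozed
    reminders_sent: int = 0
    fired_at: Optional[float] = None

    def should_notify(self, now: float) -> bool:
        snoozed = self.snoozed_until is not None and now < self.snoozed_until
        return self.condition_on and not self.acked and not snoozed

    def reset_alert(self, now: float) -> None:
        """Undo snooze, ack and reminder timing."""
        self.acked = False
        self.snoozed_until = None
        self.reminders_sent = 0
        if self.condition_on:
            # Treat the alert as if it had just started firing.
            self.fired_at = now


alert = AlertState(condition_on=True, snoozed_until=2000.0, reminders_sent=4, fired_at=100.0)
print(alert.should_notify(now=1000.0))  # snoozed, so no notification
alert.reset_alert(now=1000.0)
print(alert.should_notify(now=1000.0))  # fires again as if new
```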

I was wondering if you’d have any use for that or thoughts?

-Josh

tman98 · December 20, 2024, 8:51pm

I’d love if you got templates into those fields in a release, not urgent but if you’re around that code…

I just replied on the github thread that actually it aligns fairly close to a discussion I was going to start, repasted here. Not sure best thread to keep this on:

Was going to raise discussing alert escalation, i.e. first notify one notifier(s) then after a certain amount of time if the alert has not resolved, a second (e.g. escalated) notifier(s). If you just didn’t put any notifiers on the first stage, you pretty much have the feature requested!

Given that I think it’s worth thinking out the generic feature set to hit a bit, but there might be a few birds that can be killed with one feature stone. So I’d personally avoid too narrow of an implementation as I think there’s some really useful broader cases (both OP and mine and potentially others) that can be hit with the right (albeit similar) functionality.

EDIT: friendly_name appears to be “None” sufficiently frequently that I think your dynamic template will help a lot

redstone99 · December 20, 2024, 11:57pm

K, will look into friendly_name next.

There are a constellation of features that have been mentioned. I list them together to see if people have thoughts on common primitives that might support them.

1. Alert escalation. If an alert has not been ack’d, change the notification recipient / frequency / message.
2. Alert priority. In the UI have some way to emphasize high-priority firing alerts over lower-priority ones.
3. Alert tagging. Add a piece of metadata to alerts whose purpose is to make it easier to select subsets of alerts in the UI. E.g., one Lovelace card for indoor alerts, one card for outdoor alerts.
4. Reduce duplication of same notification / reminder logic across alerts. As the sophistication of controlling notification behavior increases, it’d be nice to not have to repeat e.g. complicated notifier configs in many alerts.
5. Support a new alert state, “something to be aware of”, in addition to the existing binary state “on” (firing) or “off”.

some possible pieces of solutions

a. I’d toyed with the idea of making reminder_frequency_mins take a template, and exposing a variable reminder_idx that is the ordinal number of the reminder sent for this firing of the alert. You could expose that variable to the notifier and message template as well. Then in principle you could implement any escalation policy you wanted. Downside is it requires writing non-trivial jinja.
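The logic that such a template would have to encode can be written out in Python to show the idea. This is purely illustrative: reminder_idx is only a proposal at this point, the sketch assumes it starts at 0 for the first notification, and the notifier names and intervals are made up.

```python
def escalation_policy(reminder_idx: int) -> dict:
    """Pick notifier and reminder interval from the ordinal of the reminder."""
    if reminder_idx < 0:
        raise ValueError("reminder_idx must be non-negative")
    if reminder_idx < 3:
        # First stage: quiet notifier, relaxed reminder interval.
        return {"notifier": "notify_low", "reminder_frequency_mins": 10.0}
    # Escalated stage: louder notifier, much shorter interval.
    return {"notifier": "notify_escalated", "reminder_frequency_mins": 1.0}


for idx in (0, 2, 3):
    print(idx, escalation_policy(idx))
```

Leaving the first-stage notifier empty would give the "something to be aware of" behavior mentioned earlier in the thread.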

b. One could also imagine creating the notion of “tagged defaults” in the alert config. Then you could express config parameters that multiple alerts share. I’m not sure if overloading “default” is the best way to express grouping, but here’s what it might look like:

alert2:
   defaults low_pri:
     notifier: ...
     reminder_frequency_mins: ...
     ui_tag: low_pri
   defaults hi_pri:
     ....
   alerts:
     - domain: test
       name: foo
       use_default: low_pri

-Josh
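One way to read the "tagged defaults" idea is as a simple merge in which an alert's own settings override the defaults it names. The Python sketch below assumes exactly that; `resolve_alert` is a hypothetical helper, the config keys follow the YAML above, and the notifier value is a placeholder.

```python
def resolve_alert(alert: dict, tagged_defaults: dict[str, dict]) -> dict:
    """Merge the defaults named by use_default under the alert's own settings."""
    tag = alert.get("use_default")
    if tag is None:
        base: dict = {}
    elif tag in tagged_defaults:
        base = tagged_defaults[tag]
    else:
        raise ValueError(f"unknown defaults tag: {tag!r}")
    own = {key: value for key, value in alert.items() if key != "use_default"}
    # Keys set on the alert itself win over the tagged defaults.
    return {**base, **own}


tagged_defaults = {
    "low_pri": {"notifier": "notify_low", "reminder_frequency_mins": 60, "ui_tag": "low_pri"},
}
alert = {"domain": "test", "name": "foo", "use_default": "low_pri", "reminder_frequency_mins": 30}
print(resolve_alert(alert, tagged_defaults))
```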