Alert2 - a new alerting component

@cerebrate I think you’ve nailed the two concepts of staging and “sub-alerts”. Both have a lot of value and are well described. Your skip_first use case also makes sense to me (I’m trying to think whether there’s a broader abstraction there, but haven’t found one yet).

Been thinking about alerting more in relation to everything we’re doing and my own use cases.

Providing the ability to call an arbitrary set of services when a condition/stage occurs, and not just a notifier, is valuable. For example, if I get a leak alert, I want to shut off my water valves. If I get a fire alert, I want to shut down boilers. And I actually think calling the services each reminder interval is fine (to make sure the valves are off, for example); once I no longer care about the alert redoing things, I can ack it. I could also see an option to call services only the first time the alert fires.

Equally, there could be optional services to fire when the alert is over. In a leak condition I might want to keep the valve off until I figure out the problem, but if there was just temporary smoke that cleared I may want to just fire up the boilers automatically because I don’t want the buildings to freeze.

Again, this would be wonderfully simple yet materially powerful to link to alerts because we already have delays to start, hysteresis, etc. And with staging, I could have a first stage notification that a problem exists, then a second stage action to shut down the valves and/or boilers. That is big.

Doing the above with automations alone would actually be pretty hard (notably the staging), and we’d then have action-taking in the system that is disjoint from alerting the user. That means you can’t know for certain whether an action has been taken when you see an alert (you have to know the underlying automation code as well as the alert code). For my use cases, having them exactly aligned would be best. For example, a staged message can be “Smoke has progressed, shutting down boilers”, which reduces the emergency-condition code in my system to one place: alert checks fire notifications and take actions.

EDIT: Also @redstone99 with logging, could you ensure there is a detailed INFO level log statement for each alert creation and deletion, as well as each activation, deactivation and notification (and unified across generator and not)? I’ve found some inconsistencies with what is at debug vs. info, and the problem with the debug level is that every condition and trigger check gets logged, which spams the logs when I’m not trying to debug at that level. I actually think that’s a fine set of info to log at debug, as when you really need to figure out why an alert is not firing, you want that detail. But I want to be able to confidently go into the log when it’s at a normal Info level and check the major events related to alerts, like making sure they were created in the first place, fired, deactivated, etc. Each should have a full set of info to be able to go back and diagnose if something went wrong, given how mission critical the component is. Thanks!!

EDIT2: As I’ve expanded my usage of Alert2, I’ve found more and more use cases. I now have it sending alerts for nice-to-know conditions (e.g. I’ve started to warm up a building, tell me when it’s gotten warm enough) vs. emergencies (smoke alarm or leak). A way to designate severity on an alert and in the UI would be great. I know you are doing UI work and we referenced some form of severity in the past, so I’m just re-raising it now as you’re doing the work.

Hi All, I’m sorta on vacation for the next week so may be flakier than usual responding.

I like singling out “severity” as a common use case.

“sub alerts” feels close, but at least in the door-open use case, there is a severity aspect to it. The door-open-and-it’s-cold case seems more urgent than just door-open. But I agree that there’s not any apparent severity relationship between door-open-cold and door-open-nighttime. Not sure how to best capture that.

Actually to that point, does anyone have an example of “sub alerts” besides the “door open” case? Having some other examples might make it easier to extract common patterns/abstractions.

@cerebrate, roger on snooze. Added to list
@woodsayer, I like the idea of being able to take alert actions from the companion app in a handy way. I think I need to play around with what you’ve described to get a feel for it first.
@tman98 , roger on cleaning up the info/debug so there’s consistent alert lifecycle messages in logs. Great idea. Added to list.

And having alerting support remediation (e.g., turn off water valve) is interesting. Definitely some thinking to be done about how to best support.

From my standpoint my plan roughly is:

  1. get the UI alert editing ability to some sort of alpha release
  2. Do a clean-up pass on Alert2 and pick off some of the easy logging and other simple feature requests.
  3. maybe next support latched alerts. I’m currently thinking maybe we introduce trigger_off and condition_off and manual_off that determine when an alert turns off. So if an alert just specifies trigger, it’s a momentary alert. If it specifies only condition, it’s a traditional condition alert. And if it specifies one of the new fields, it’s a latched one. Something like that.
    Cheers,
    Josh
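
The rough rule in item 3 could be sketched like this. It is only an illustration in Python of the idea as described, not how Alert2 does or will do it: the field names `trigger`, `condition`, `trigger_off`, `condition_off` and `manual_off` come from the post, while the function name `classify_alert` and the "unknown" fallback for configs the post doesn't cover are assumptions.

```python
# Hypothetical sketch: classify an alert config by which fields it specifies.
LATCH_FIELDS = ("trigger_off", "condition_off", "manual_off")


def classify_alert(config: dict) -> str:
    """Return 'latched', 'momentary', 'condition', or 'unknown'."""
    # Any of the proposed new "off" fields makes the alert a latched one.
    if any(name in config for name in LATCH_FIELDS):
        return "latched"
    has_trigger = "trigger" in config
    has_condition = "condition" in config
    if has_trigger and not has_condition:
        return "momentary"   # only a trigger: fires once, no on/off state
    if has_condition and not has_trigger:
        return "condition"   # only a condition: traditional condition alert
    return "unknown"         # combination not covered by the post (assumption)


print(classify_alert({"trigger": "..."}))
print(classify_alert({"condition": "...", "manual_off": True}))
```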

Another variation on the theme that occurred to me is alerts that are relevant either when they run for a long time, or when they’re short but chronic. I use my HA instance for server monitoring, too, so the first examples that spring to mind are memory/CPU usage alerts, where sustained high level use is worth a notification, but short high level bursts aren’t - but it’s still handy to fire the alert, because the alerts/time is statistically useful.
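
To make the "long-running or short-but-chronic" idea concrete, here is a minimal Python sketch. None of it is Alert2 functionality: the function `worth_notifying`, its parameters, and the default thresholds are all made up for illustration.

```python
from datetime import datetime, timedelta


def worth_notifying(firings, now,
                    sustained=timedelta(minutes=10),
                    chronic_count=5,
                    chronic_window=timedelta(hours=1)):
    """firings is a list of (start, end) datetimes for one alert.

    Notify if a recent firing lasted at least `sustained`, or if there
    were at least `chronic_count` firings within `chronic_window`.
    """
    recent = [(start, end) for start, end in firings
              if now - start <= chronic_window]
    # Sustained case: one firing ran long enough on its own.
    if any(end - start >= sustained for start, end in recent):
        return True
    # Chronic case: many short firings in the window.
    return len(recent) >= chronic_count
```

Every firing still gets recorded in `firings`, so the alerts/time statistics stay available even when no notification goes out.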


I’ve been thinking about this, and I’m not sure that it’s something that would need implementation in Alert2 itself, except when it comes to making sure that implementations of staging do the needful to support it. (Unless you specifically want to call the service every reminder-time, at least.)

I say this largely because I’ve already, in some cases, got automations set to trigger when an alert goes off->on and on->off (to avoid having to duplicate the conditions of the alert in the automation, and it also includes the delays to start, hysteresis, etc. of the alert), and I’m not seeing much of a win from just moving the link between the alert and the automation from out of the latter and into the former.

(Where staging is concerned, to make this easy, this would mean implementing staged alerts to not be off->on->off, but rather be off->stage1->stage2->stagen->off, so that the stage would be visible in the state, but that doesn’t break backcompat so shouldn’t be an issue, I think?)
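
As a rough picture of what "the stage is visible in the state" could mean, here is a small Python sketch. The class `StagedAlert`, the `stageN` state strings, and the methods are hypothetical and only illustrate the off->stage1->stage2->...->off idea; they are not Alert2's implementation.

```python
# Hypothetical sketch: an alert whose state string exposes its current stage.
class StagedAlert:
    def __init__(self, num_stages: int):
        if num_stages < 1:
            raise ValueError("need at least one stage")
        self.num_stages = num_stages
        self._stage = 0  # 0 means the alert is off

    @property
    def state(self) -> str:
        return "off" if self._stage == 0 else f"stage{self._stage}"

    def escalate(self) -> str:
        """Move to the next stage; stay put once the last stage is reached."""
        if self._stage < self.num_stages:
            self._stage += 1
        return self.state

    def turn_off(self) -> str:
        self._stage = 0
        return self.state


alert = StagedAlert(num_stages=2)
print(alert.state, alert.escalate(), alert.escalate(), alert.turn_off())
```

With a state like this, an automation could trigger on a specific transition (say, into `stage2`) instead of only on off->on.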

Am I missing something here?


That may just be an unfortunate quirk of my calling it severity, because you’re right. Maybe a better way to look at the distinction is between “non-dependent” and “dependent” cases, where one set, the one we’ve been calling severity, is alerts that are added to by the same conditions, and the other is alerts that are added to by an outside factor.

So non-dependents would be the “X open longer”, “X got hotter”, “X space decreased”, etc., type of alert, and dependents are all the ones best phrased as “This… AND ALSO that.”

(There’s a little subjectivity in here. One could argue reasonably that a second smoke detector going off after the first constitutes “fire got worse” or “This is on fire… AND ALSO that’s on fire now”, but I think that sort of decision might be best left up to the person setting up the alerts and how they want to handle that sort of case.)

Thinking about severity:

Being able to specify a severity or priority on alerts (as just a simple integer, I suggest, higher being more) would be good for UI purposes anyway, for example, being able to sort displayed alerts by their severity.

(The fancy version of this would take from the gauge card UI and let us specify alerts to be displayed in amber, red, etc., as the severity got higher.)

And then that could integrate with both severity/non-dependent/staged alerts and sub-alert/dependent alerts. I would propose that for the former, a different severity can be assigned to each stage, so they can bubble up as they move through their stages.

And for the latter, perhaps a sub-alert going off gives its severity to the parent alert, if that alert’s effective severity is lower? So, as an example, if we declare three alerts, door_open (severity 10), door_open_after_nightfall (severity 30), and door_open_while_cold (severity 50), and door_open is already on, when door_open_while_cold goes on, it promotes door_open to severity 50 and it starts showing up at the top of the panel along with the sub-alert beneath it.

So, basically, the effective severity of an alert is the highest of its own and that of any sub-alerts it has which are currently on.
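
In code, that rule might look like the sketch below. The `Alert` class and its fields are hypothetical (not Alert2's data model), and applying the rule recursively to sub-alerts of sub-alerts is an assumption on my part. The severities are the ones from the door example above.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Alert:
    name: str
    severity: int
    is_on: bool = False
    sub_alerts: List["Alert"] = field(default_factory=list)

    def effective_severity(self) -> int:
        # Highest of the alert's own severity and that of any sub-alerts
        # that are currently on.
        active = [s.effective_severity() for s in self.sub_alerts if s.is_on]
        return max([self.severity, *active])


night = Alert("door_open_after_nightfall", 30)
cold = Alert("door_open_while_cold", 50)
door = Alert("door_open", 10, is_on=True, sub_alerts=[night, cold])

print(door.effective_severity())  # no sub-alert is on yet
cold.is_on = True
print(door.effective_severity())  # promoted by door_open_while_cold
```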

Well, I just implemented the equivalent of “Pool Pump Power Low” plus “Pool Pump Power Low AND ALSO Outside Temperature Below Freezing”. :) There will be some more sub-alerts going onto the former once depth and pressure sensors have been added to the pool automation.

I’m also in the middle of rewriting some purely-advisory logic about inside-outside temperature differences that will look at the state of the house and advise on possible ways to help out the HVAC, which will have “Significantly Cooler Outside” alerts for summer, with sub-alerts like “You Could Open This Window” and “You Could Turn On The Whole House Fan”, and likewise in winter with “Significantly Warmer Outside” and “You Could Close This Window”, etc.

I also also have some loose thoughts on implementing some rather complicated sets of them that are basically my first few steps in debugging network issues hereabouts, but those are still a bit vague to go into details about.


All sounds good to me. Enjoy your vacation!

Added:

While I don’t think it’s all of them, I think "here is a situation (primary alert), and here are potential problems with it (sub-alerts)"¹ and "here is a problem (primary alert), and here are possible mitigations for it (sub-alerts)"² are likely to be two very common sub-alert usage patterns.


  1. The door-open scenario, for example.
  2. The HVAC scenario, for example.

Good stuff @cerebrate I’ll think about all that!

Interesting how you are using automations to trigger off of the alert. I hear you in that if the alert had enough to make triggering an automation simple code to write for each desirable case, that might be sufficient. There is a difference between an indirect association between alerts and actions that might occur when they arise (triggered automations) vs. explicit associations in the form of scripts. I think the latter is easier from a usability standpoint, as the logic is all together in the alert YAML and not subject to automation trigger code correctness. In the end I’m not sure we actually have to force one or the other on users. If you want to use automations, do that, but adding service calls to Alert2 would be trivial (after all, a notification call is just a special-cased service call).

And @redstone99 enjoy your vacation - well deserved.

Hm, yeah, I can see how it would be desirable from a clarity point of view.

Are you thinking about putting the entire script in the alert configuration (which “the logic is all together in the alert YAML” might imply), or just the action to run the script?

(The latter would seem to me to save on adding redundant complexity to Alert2, at the cost of having the script code elsewhere, but would also let you use a single script to handle multiple alerts by passing parameters to it from the alert. Since I think most people are likely to have multiple similar alerts, this seems like a win to me, especially from the maintenance PoV.)

Thanks for asking for the clarification - I definitely was intending the latter: that Alert2 can make generic service calls, of which an (optionally parameterized) script is one possibility. For all the reasons you stated, that is the right abstraction to me. I would not suggest supporting arbitrary YAML action logic in the Alert2 configuration, just some number n of service calls (so it could be a script call or a simple service call to turn a valve off).

EDIT: I have actually started flipping the problem (for now). When I have critical causal relationships I want to both alert on and take action on, I’m creating an automation that both takes action and performs an alert2.report_event call, so at least there aren’t two code paths separate from each other for action and alerting. That works for now, e.g. if a leak is detected I shut off my water valve and fire an alert2.report_event. Given the severity of that condition I also have a normal alert2 condition alert on the leak. But if that happens I really don’t mind my phone blowing up.

Hi all - small update on my end. I have the UI code mostly working to create / edit alerts in the UI. Now trying to write some basic testing, which is a bit complex because it involves interactions between the browser and the backend. I was faking out most of HA in my unittests up until now, but I think I need to switch to a better test framework, probably pytest-homeassistant-custom-component. Figuring out how that stuff works.
Cheers,
Josh

Awesome great to hear! I’m curious how this will work - now that I’ve been doing lots of alerts it’ll be interesting to see how the UI works!

One small note on the UI: months are off by 1 in the new formatting. This is an event from today, so the month should be 1 for Jan, not 0:

image

The second is some alerts seem to disappear even if unacked. It may be related to trigger alerts? I think my condition alerts stay after ‘off’ in the UI until I ack. But this trigger alert did not (and you can see the ack time is before the next fired time):

image

Hi - yeah, I also noticed the off-by-one in the month field of event times. Fix will go out with UI stuff.

Re disappearing alerts, are you describing the following series of events:

  1. an alert fires and shows up in the UI. Let’s say the UI display time window is set to 4 hours.
  2. the alert is disappearing from the UI before 4 hours have passed and the alert would naturally disappear

When you next notice this happening, can you go to “Developer tools” → “States” and tell me what the state field of the alert entity is? Maybe screenshot the whole row, including the state and attributes. I can’t think of a bug off the top of my head, but trigger alerts use the state field differently from condition alerts, so it’s certainly plausible for there to be a bug there.
Thanks,
Josh

Hi @redstone99, this may be me misunderstanding one of the releases. I thought that in 1.5.2 (or around there), when you added the group of “Acked, snoozed or disabled” alerts, unacked alerts would stay in the top group no matter what until acked, i.e. that condition alerts would stay there until acked even if off, and that trigger alerts would stay until acked.

The use case I had raised back then is that if an alert goes off at 2 am, and the slider is set to 4 hours, then I will never possibly know it happened overnight. I want the alerts to stay persistently in the UI until I acknowledge that I have seen them and taken action (which for me is hitting “ack”). If that’s not the feature as implemented, then I’d like to re-raise it as a critical feature (this in fact happened to me last night for an important temperature alert with freezing temperatures in the northeast - I decided to expand the slider and found a few important ones I didn’t know about).

If the behavior I thought we had (unacked alerts never disappear, even if older than the time slider) is not actually a feature, can we please add it as an option in the UI? It’s critical for my uses.

Thanks!

Hi @tman98, at present, alerts that aren’t snoozed or disabled are shown in the UI only during the time window. Yeah, I haven’t changed that yet - sorry if I didn’t communicate that clearly. Probably what should be done is add a UI settings icon to the Alert2 UI card that lets you select to preserve unacked alerts in the display as you wish. I’ll say it’s easier to implement that feature now due to the UI alert editing work because the server now has a way to store UI state. At least it will once I finish this UI bit of work.
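
For what it's worth, the display rule being discussed could be sketched roughly as below. This is just an illustration in Python, not the card's actual code: the function `visible_in_ui` and the `keep_unacked` setting are hypothetical names, and snoozed/disabled handling is left out.

```python
from datetime import datetime, timedelta


def visible_in_ui(last_fired: datetime, acked: bool, now: datetime,
                  window: timedelta, keep_unacked: bool) -> bool:
    """Decide whether an alert should be shown in the alert list."""
    # Current behavior: show anything that fired within the time window.
    if now - last_fired <= window:
        return True
    # Proposed option: keep older alerts around until they are acked.
    return keep_unacked and not acked


fired = datetime(2025, 1, 1, 2, 0)    # alert fired at 2 am
morning = datetime(2025, 1, 1, 8, 0)  # UI viewed at 8 am
window = timedelta(hours=4)
print(visible_in_ui(fired, False, morning, window, keep_unacked=False))
print(visible_in_ui(fired, False, morning, window, keep_unacked=True))
```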

As a hacky work-around, you may be able to create a watcher alert that fires as long as there are any unacked alerts. That watcher alert would continually fire in your 2am example and so be visible the following morning in the UI. I don’t remember off the top of my head if Jinja is powerful enough to write such a spec.

Or if you wanted to implement a temporary YAML config field that’s exposed to the UI and triggers the persistent unacked alert behavior, I’m open to that, at least till we figure out doing UI settings in the UI itself.

J

Thanks @redstone99 no worries, a lot going on! I’d rather not hack things up probably. Let me try a little template noodling and might be able to get something to work for now. I’ll post if something works!

Some additional thoughts: I think the features we’ve been discussing all have value, both for my setup as a real-world example and more generally. In order of priority for me:

  1. A display template for displaying a current value for an active alert in the UI (for me this is the highest priority, as for almost every alert I end up having to find the actual current relevant value in a multitude of tabs, so this is a multiple-times-a-day issue).

  2. UI having the ability to display alerts in custom defined groups with filters/sorting defined per group as well as a header per group (so I might have active “Alarm level” unacked alerts first, followed by other active alerts sorted by severity, followed by active acked alerts, followed by unacked inactive alerts, followed by snoozed within the time slider, and maybe no disabled for a primary working tab [but shown on a secondary maintenance tab])

    a. This also solves my don’t hide unacked alerts problem just discussed.
    b. I don’t know best way to write how groups are filtered or sorted yet in YAML.

  3. Priority/severity level on alert (a template would be best). I think a number makes sense, but I’d suggest also providing a predefined ordered set of priorities linked to the number, like “Alarm”, “Critical”, “Warning”, “Notice”, “Info”, for UI, notifier and log readability - possibly with YAML override of the map?

  4. Alert staging where an alert can progress through increasingly more “critical” scenarios with new messages, notifiers, display messages, priority. I think skip_first can basically be implemented through staging; perhaps there’s a short-circuit YAML flag, but I think at the implementation level it might be the same thing. I have found myself with way too many alerts firing notifications to my phone, but I don’t want to lose the fact that they happened. Often I’d like a stage with a short delay_on time and a notification to a “nothing” group (so it just shows in the UI) and a longer delay to fire a notification to my phone - for example, I’d like to see in the UI that I may have lost connectivity to a sensor, but I don’t need to know on my phone every time, unless it’s so long it’s really a problem.

  5. Ability to call arbitrary services when an alert fires in parallel to notifiers (to give user control of what to do besides just notifying)

  6. The ability to have actionable notifications - I now find myself having a lot of alerts, and especially without staging as described above, I need to snooze some of these. Especially with critical alerts, getting to snooze fast is often important if they are firing too much during a meeting!

  7. Subalerts, feel like the implementation/configuration needs a little thinking still but the feature makes sense and I see how I’d use it.
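
To illustrate item 3 (a numeric severity plus a readable label), here is a rough Python sketch. The label names are the ones suggested above; the numeric thresholds, the function `severity_label`, and the `overrides` argument (standing in for a YAML override of the map) are all assumptions for the sake of the example.

```python
from typing import Dict, Optional

# Label names from the list above; the numbers are made-up thresholds.
DEFAULT_LEVELS = {"Alarm": 50, "Critical": 40, "Warning": 30,
                  "Notice": 20, "Info": 10}


def severity_label(value: int,
                   overrides: Optional[Dict[str, int]] = None) -> str:
    """Map a numeric severity to the highest label whose threshold it meets."""
    levels = {**DEFAULT_LEVELS, **(overrides or {})}
    ordered = sorted(levels.items(), key=lambda kv: kv[1], reverse=True)
    for name, threshold in ordered:
        if value >= threshold:
            return name
    # Below every threshold: fall back to the lowest label.
    return ordered[-1][0]


print(severity_label(35))
print(severity_label(35, overrides={"Warning": 36}))
```

Keeping the number as the source of truth means sorting and staging can work on integers, while the UI, notifiers and logs show the label.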

Lower priority but desired:

  1. Setting 1 or more tags on an alert (and UI supporting filtering on tag)

  2. I still desire icon, icon color and possibly text color setting in the UI. I’m not sure the best implementation yet though (e.g. is icon always a function of severity or user configured possibly templated at the server level so can change, etc.? What about color?) The big risk here is over configurability and management of the configuration vs. flexibility. I’m really torn on best feature here to be the most useful for visibility but not going into too much complexity, so I think this needs thinking.

  3. An availability template for determining if an alert value is unavailable (and not evaluating the condition and not firing in this case - I think writing your own available alert is the best way to deal with unavailable entity alerting if you happen to want to use the availability template). For now my threshold alerts throw exceptions if they go unavailable, which is ok, they just get caught by the alert2 error alert.

EDIT: Tangential to the above but related to ack, I feel like ack is acting inconsistently, but it’s hard for me to tell as it needs to be observed over time (without a deterministic test pattern). My understanding (besides what I thought might be in as a feature above) is that if you ack an alert, it should not send reminder messages until it goes off and on again. Additionally, I would anticipate that if an alert was acked and snoozed, when it un-snoozed it would stay acked. I think the former is happening, but I believe when a snooze period ends an acked alert goes unacked, which I don’t think should happen unless the alert goes off then on?
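
To pin down the behavior I'd expect, here it is as a small Python sketch. This describes my expectation only, not what Alert2 does today; the class `AckState` and its methods are hypothetical.

```python
# Hypothetical sketch of the expected ack/snooze semantics.
class AckState:
    def __init__(self):
        self.is_on = False
        self.acked = False
        self.snoozed = False

    def turn_on(self):
        # Only a fresh off->on transition clears an earlier ack.
        if not self.is_on:
            self.is_on = True
            self.acked = False

    def turn_off(self):
        self.is_on = False

    def ack(self):
        self.acked = True

    def snooze(self):
        self.snoozed = True

    def unsnooze(self):
        # Ending a snooze deliberately leaves the ack untouched.
        self.snoozed = False

    def should_send_reminder(self) -> bool:
        return self.is_on and not self.acked and not self.snoozed


a = AckState()
a.turn_on()
a.ack()
a.snooze()
a.unsnooze()
print(a.acked, a.should_send_reminder())
```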

EDIT2: I can confirm that, in my setup, acked alerts do indeed appear to continue firing reminder notifications even after being acked.

EDIT3: Some edits inline based on my use cases.

Hi All, I’ve finally released Alert2 v1.8 and the corresponding UI v1.8 which supports creating alerts via the UI. Please give it a whirl and let me know what you think.

In more detail, it introduces a new Lovelace card, alert2-manager. From there you can adjust defaults, create & edit alerts, and search over alerts created in the UI. Also, as you edit any config field, the UI shows a “Render result” line, giving you feedback on how the field is interpreted by Alert2. This includes generators. So you can write a template in the generator field and it will show you how many entities match and give a sample of the template variables defined. And the “Render result” for other fields will update to reflect whatever variables result from the generator.

This release also tightens up the validation requirements on condition fields to produce a truthy value (yes/on/true/1, etc).
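
As a rough idea of what this kind of stricter check looks like, here is a minimal Python sketch. It is not the Alert2 validation code: the function name `parse_condition_result` is made up, the truthy strings are the ones listed above, and the set of falsy counterparts is an assumption.

```python
# Truthy strings from the list above; the falsy set is an assumption.
TRUTHY = {"yes", "on", "true", "1"}
FALSY = {"no", "off", "false", "0"}


def parse_condition_result(rendered: str) -> bool:
    """Turn a rendered condition template into a bool, or raise."""
    value = rendered.strip().lower()
    if value in TRUTHY:
        return True
    if value in FALSY:
        return False
    # Anything else is rejected instead of being silently treated as off.
    raise ValueError(f"condition rendered to non-boolean value: {rendered!r}")


print(parse_condition_result(" On "))
print(parse_condition_result("0"))
```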

The UI has a moderate amount of documentation, and you can click on any field name to see some examples of what you can write in the field.

I originally was hoping to make the UI more click-based like when you create an automation, but that looks to be a bit tricky since Alert2 is not part of HA core, so I figured I’d get something text-based out and see what everyone thinks.

EDIT: I should add, this release involves a fair amount of code reworking. I spent a fair amount of time testing and writing tests so hopefully there will be no regressions, but wanted to mention it.

@tman98 - I’ll look into what’s up with acks and notifications.

-Josh

Awesome! I’ll check it out probably over the weekend. Quick question over the manager - how will it interact with YAML created config file alerts? Just want to make sure I fully understand before using it. It appears that defaults will be overridden by UI edits, how about alerts themselves?

It’s awesome if the generator shows you what will happen! I’ve been using the template viewer in Developer Tools, but you have to write some for-loop logic to then test the generated alert template code, so that’s awesome if this helps out with writing the generator alert logic!

Briefly - UI and YAML alerts are in the same namespace, and the internals enforce that an alert is not created if an alert of the same domain+name already exists. And the YAML alerts initialize before the UI ones.
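
Roughly, the effect is like the sketch below. It is a simplified illustration in Python rather than the actual internals: `build_registry` and the dict layout are hypothetical, and it only shows the shared namespace, the domain+name uniqueness check, and YAML being loaded before UI alerts.

```python
def build_registry(yaml_alerts, ui_alerts):
    """Load YAML alerts first, then UI alerts, skipping duplicate domain+name."""
    registry = {}
    skipped = []
    for source, alerts in (("yaml", yaml_alerts), ("ui", ui_alerts)):
        for alert in alerts:
            key = (alert["domain"], alert["name"])
            if key in registry:
                # Same domain+name already exists, so this one is not created.
                skipped.append((source, key))
                continue
            registry[key] = {**alert, "source": source}
    return registry, skipped


registry, skipped = build_registry(
    yaml_alerts=[{"domain": "pool", "name": "pump_power_low"}],
    ui_alerts=[{"domain": "pool", "name": "pump_power_low"},
               {"domain": "house", "name": "door_open"}],
)
print(sorted(registry), skipped)
```

So if a UI alert has the same domain+name as a YAML one, the YAML alert is the one that exists.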

  • J