WTH is 'unavailable' breaking everything

jerobins · October 12, 2022, 7:07pm

Over the past few months, how HA handles entities with the state ‘unavailable’ has changed, and not for the better. When a service like Life360 doesn’t respond to a cloud poll, the device_trackers go unavailable. When an entity used in a Bayesian sensor goes ‘unknown’, the entire sensor (based on probabilities) goes to crap.

When a device_tracker becomes unavailable, the person hasn’t actually left home…or any other zone. While there are certainly ways around this issue, creating a whole 'nother set of template sensors, or even installing a third-party integration, just to “do the right thing” seems ridiculous. In other words, “what the heck”.

Thanks for listening.

atv · October 12, 2022, 8:02pm

I noticed this as well. Currently there seems to be a problem with Life360: it alternates between home and unavailable/unknown, which stops my automations from running because I filter on the previous state (e.g. unknown).

I agree that unavailable and unknown shouldn’t be states to begin with. They are not (usually) usable states from an automation point of view and should be filtered out by default.

jerobins · October 12, 2022, 8:40pm

Life360 is just one example. It affects many others as well, since it is now the default behavior of device_tracker integrations, e.g. Netgear, MikroTik, etc.

FaserF · October 12, 2022, 8:54pm

Yeah, the Fritzbox (fritz) device integration shows the same behavior. I always thought it was an integration issue.

CentralCommand · October 12, 2022, 9:02pm

Idk that the way HA as a whole handles unavailable has changed at all, tbh. It sounds like particular integrations have been updated to use it more, in ways that are causing problems for you.

There have been a few discussions about general improvements to unavailable and unknown this WTH, here and here (and probably other places I’m not aware of). Not sure any is really a duplicate of this one though.

I’m not entirely sure there is a magic bullet here though. Entities can truly be unavailable if whatever they get data from (device, service, etc.) is down. Granted, it seems a bit excessive to me to mark an entity as unavailable after one single missed cloud poll, unless the poll interval is pretty long and every poll is critical; that could be worth a follow-up with the Life360 integration. But at the same time, it absolutely can happen, and there’s no easy way to just handle it.

Take the bayesian sensor, for example. It uses an algorithm that calculates its state from the states of some number of other entities and whether each of them is on or off. But what if one is neither? What is it supposed to do? Should it treat it as off? On? Assume its state didn’t change from what it was previously? Should the probabilities of the others be adjusted to account for the missing information? It must have handling for this. Either it takes a default approach and documents it, or it provides options for handling it. That requires FRs and enhancements.

This one has similar issues to the above. How do you know the device didn’t leave home or any other zone? HA is only told “I have no idea where this device is right now”. If it assumes that means they didn’t move, that would be a reasonable default, but not necessarily correct. Other users might want different handling.

Fortunately, with this one you do have the tools to handle it the way I believe you want, at least in an automation (other places YMMV). This is basically why not_to and not_from were added to triggers, so you can do this:

trigger:
  platform: state
  entity_id: device_tracker.phone
  not_to: ['unavailable', 'unknown']
  not_from: ['unavailable', 'unknown']

Then it will only trigger when the device tracker moves between zones and not_home, but not when it passes briefly through unavailable or unknown. Of course, if you do this you may miss a trigger if the phone is home, becomes unknown/unavailable, and then becomes not_home once it is available again.

atv · October 13, 2022, 5:04am

Hey,
That is useful, thanks.

Quick question:
If the trigger entity is the same as the one I’m conditioning on, is trigger.not_to the same as {{ states('sensor.entity') != ['unavailable','unknown'] }} ?

Or is_state('sensor.entity','unavailable') and is_state('sensor.entity','unknown'), for that matter? (There must be a way of using is_state for both unavailable and unknown, as with states, but I don’t know how.)

Also, when should I use states and when should I use is_state?

CentralCommand · October 13, 2022, 5:18am

I mean, sort of? First of all, that template isn’t really valid. States are always strings, so it’ll never be equal to an array. I think you mean this:

{{ states('sensor.entity') not in ['unavailable', 'unknown'] }}

And this is the same thing just longer:
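The longer version is missing from this copy of the thread. One equivalent longer form, using is_state() as asked about above, is sketched here as an automation condition (sensor.entity is a placeholder):

```yaml
condition:
  - condition: template
    # True only when the state is neither 'unavailable' nor 'unknown';
    # same result as: states('sensor.entity') not in ['unavailable', 'unknown']
    value_template: >
      {{ not is_state('sensor.entity', 'unavailable')
         and not is_state('sensor.entity', 'unknown') }}
```

Note the `not` on each call with `and` between them. Without the `not`s, `is_state(..., 'unavailable') and is_state(..., 'unknown')` can never be true, since an entity has only one state at a time.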

And while I guess a trigger like this:

trigger:
  platform: state
  entity_id: sensor.entity
  not_to: ['unavailable', 'unknown']

is kind of like that template, in that it will fire when the state of sensor.entity changes to something other than unknown or unavailable, but they really aren’t the same thing.

A template is run by something, and when it is run it returns a value. A trigger is listening for something (in this case, state changes of sensor.entity). When it receives an event it’s listening for, it checks the event against its config; if it passes, it starts whatever comes next (automation, template entity, wait for trigger step, etc.)

So there are some similarities in how they evaluate the state of sensor.entity, but a trigger is different than a condition or a template.

I mean, states and is_state aren’t interchangeable: states returns the actual state of something, while is_state returns a boolean. Have you read this?
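To make the difference concrete, here is a minimal sketch with two template entities built on the placeholder sensor.entity (the entity names are made up for illustration). states() yields the state string, which you can display, compare or convert; is_state() yields true/false, which fits conditions.

```yaml
template:
  - sensor:
      - name: "Entity raw state"
        # states() returns the state as a string, e.g. "21.5", "home" or "unavailable"
        state: "{{ states('sensor.entity') }}"
  - binary_sensor:
      - name: "Entity is unavailable"
        # is_state() returns a boolean: true only if the state equals the given string exactly
        state: "{{ is_state('sensor.entity', 'unavailable') }}"
```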

atv · October 13, 2022, 6:17am

CentralCommand:

atv:

Is trigger.not_to the same as {{ states('sensor.entity') != ['unavailable','unknown'] }} ?

or is_state('sensor.entity','unavailable') and is_state('sensor.entity','unknown') for that matter?

I mean sort of? First of all that template isn’t really valid. States are always strings so it’ll never be equal to an array. I think you mean this:
{{ states('sensor.entity') not in ['unavailable', 'unknown'] }}

Yes, although for a single state I have used states('sensor.entity') != 'One state' and that works fine (at least in the developer tools). I think comparing against a single string probably works?

I have not read that yet, no. Starting now; very good stuff in there. I think I skimmed it in the past, but it can look quite daunting.
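Comparing against a single string does work. The list version fails because, as noted above, a state is always a string and a string never equals a list, so `!=` against a list is always true. Both forms side by side as automation conditions, with sensor.entity and 'One state' as placeholders:

```yaml
condition:
  # One state: a plain string comparison is fine
  - condition: template
    value_template: "{{ states('sensor.entity') != 'One state' }}"
  # Several states: use `not in` with a list. Writing
  # states('sensor.entity') != ['unavailable', 'unknown'] is always true,
  # because a string is never equal to a list.
  - condition: template
    value_template: "{{ states('sensor.entity') not in ['unavailable', 'unknown'] }}"
```

Both conditions are shown together only for comparison; listed like this, they are ANDed.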

pnbruckner · October 17, 2022, 3:13pm

Most, if not all, HA entities effectively sample some real-world phenomenon that is constantly changing. Therefore, when the HA entity updates, it is only representing an instantaneous value of that data source. Between samples there’s no way to know the actual value of the data source, so the HA entity only represents an estimate of it during that period. Note that the entity does not flip to “unavailable” between updates, even though the real value is unavailable between samples.

The amount of time between samples / known values is typically based on some sample period and is usually fairly consistent. Even if the frequency of updates is increased, however, most of the time the entity is only representing an estimate of the actual value. I.e., most of the time the real value is unavailable, but again, the state of the entity doesn’t change to “unavailable.”

Now, when there is a problem communicating with the data source (e.g., a server), that is effectively just a missed sample, similar to decreasing the sample frequency. If the sample frequency was reduced, say by a factor of two, should that result in values of “unavailable” between each of the sample points? Of course not. It would just retain the previous sample as its state. So why do that if a sample fails? Why not just keep its last known good value, since in all cases it’s just an estimate anyway? Another way to put that: the lack of new data does not, and should not, immediately invalidate the previous known good value.

Yes, it does make sense to have some indication of failed data retrievals from the actual data source. But that should be an “out of band” signal. It should not replace the sample data. E.g., a binary sensor could be implemented that indicates whether or not there are any issues communicating with the data source. That way users could know when this happens, and automations could be made to react to that situation if they wanted to. And by not mixing the communication status with the stream of data points it would simplify automations that only want to react / use known good values.
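As a rough approximation of such an out-of-band signal using only standard template entities, consider the sketch below. sensor.outdoor_temperature is a hypothetical data entity, and inferring the status from that entity is a simplification: a real integration would report its own communication status directly.

```yaml
template:
  - binary_sensor:
      - name: "Outdoor temperature source online"
        device_class: connectivity
        # Status channel only: on while the (hypothetical) data entity has a real value,
        # off while it is 'unavailable' or 'unknown'
        state: "{{ states('sensor.outdoor_temperature') not in ['unavailable', 'unknown'] }}"
```

Automations that care about outages can trigger on this entity going off. It does not stop the data entity itself from going unavailable; that still has to be handled on the value side.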

FWIW, I tried to do this with the life360 integration, but I wasn’t allowed to.

Ghafla · October 17, 2022, 11:36pm

What about adding another state, stale, together with attributes last_value and stale_since (leaving last_updated for the last known value)?

pnbruckner · October 18, 2022, 1:39pm

That would defeat the whole purpose of not using the state of unavailable; i.e., keeping the status independent of the stream of sampled values.

Ghafla · October 18, 2022, 2:50pm

Could you elaborate?
I started programming 35 years ago so maybe I’m a bit rusty (it’s not my day job).
And I don’t want to incite a holy war on design fundamentals.
Did I at least get across that it’s important, and not only to me, to know which sensors may have problems or give unreliable states?

CentralCommand · October 18, 2022, 3:44pm

You suggested doing so with a new state, stale. The whole reason this issue exists in the first place is because it frustrates people that their entities get into a state they consider “not real”.

Like if they have a numeric sensor, they expect it to always be numeric, but sometimes it is unknown or unavailable, which is obviously not numeric. Or if it’s a binary sensor, they expect it to always be on or off, but sometimes it’s unknown or unavailable, which is neither of those. Basically, entities have a set of states you expect from their definition, but then users must always account for these two other special states, unknown and unavailable.

Your proposal to fix this seems to be to add another special state everyone must account for in every entity. That seems exactly counter to what everyone in this thread is asking for: fewer special states.

petro · October 18, 2022, 3:50pm

TBH, I think we just need a wrapper integration that removes unknown and unavailable from the pool of possible states without needing to make a template entity. This would create a new entity that wouldn’t have unknown or unavailable (and even restores its state on restart).
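For comparison, the per-entity template workaround that such a wrapper would replace looks roughly like this. It is a sketch using a trigger-based template sensor; sensor.outdoor_temperature, the name and the unit are placeholders.

```yaml
template:
  - trigger:
      # Fire on every change to a real value; ignore unavailable/unknown
      - platform: state
        entity_id: sensor.outdoor_temperature
        not_to: ['unavailable', 'unknown']
    sensor:
      - name: "Outdoor temperature (last known good)"
        unit_of_measurement: "°C"
        device_class: temperature
        # Hold the most recent real value until the next one arrives
        state: "{{ trigger.to_state.state }}"
```

Until the source reports its first real value after this sensor is created, the new sensor itself has no value yet. Multiplying this by every entity you care about is exactly the overhead the wrapper idea is meant to avoid.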

pnbruckner · October 18, 2022, 3:57pm

No problem. I think I see what you’re asking for.

As far as the life360 integration (which I’ve worked on) is concerned, it already has a last_seen attribute that tells you how “old” the current data is (at least the location specific data.) It can be directly used to know how “stale” the data is becoming (for whatever reason, and there are several), even more so than the state’s last_updated field.
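For example, a staleness indicator could be built on last_seen roughly like this. It is a sketch assuming a life360 tracker named device_tracker.phone (hypothetical) and that last_seen holds a timestamp that as_timestamp() can parse.

```yaml
template:
  - sensor:
      - name: "Phone location age"
        unit_of_measurement: "min"
        # Only report an age while the last_seen attribute is present
        availability: "{{ state_attr('device_tracker.phone', 'last_seen') is not none }}"
        # Minutes since the location data was last seen; using now()
        # makes the template re-render at the start of every minute
        state: >
          {{ ((as_timestamp(now())
               - as_timestamp(state_attr('device_tracker.phone', 'last_seen'))) / 60)
             | round(1) }}
```

An automation could then react when the age exceeds whatever threshold makes sense for you, instead of relying on the tracker itself going unavailable.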

I’ve already (locally) implemented an “on-line” binary_sensor that will indicate the server communication status. It will be on when the last query fully succeeded, and off if not. And if it is off, there will be an attribute, namely reason_offline, that will provide additional detail as to why (e.g., a network error, or a login error.) I hope to submit a PR before long. Of course, I have no idea if it will be accepted.
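If a binary_sensor like that lands, reacting to outages becomes an ordinary automation. A sketch, assuming a hypothetical entity id binary_sensor.life360_online and the reason_offline attribute described above; the five-minute delay is an arbitrary choice to ride out brief blips:

```yaml
automation:
  - alias: "Life360 server offline notification"
    trigger:
      # Hypothetical entity id for the proposed "on-line" binary_sensor
      - platform: state
        entity_id: binary_sensor.life360_online
        to: "off"
        for: "00:05:00"
    action:
      - service: persistent_notification.create
        data:
          title: "Life360 offline"
          # reason_offline is the attribute described above
          message: "Reason: {{ state_attr('binary_sensor.life360_online', 'reason_offline') }}"
```

Automations that only care about location keep triggering on the device_tracker entities and never need to filter out a special state.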

If/when that is released, there will be even less reason for the main state of the device_tracker entities to have a special unavailable state.

pnbruckner · October 18, 2022, 3:59pm

There already is. My custom “composite” integration, which has been around since before the person integration, does just that (as well as other things for which it exists in the first place.) Well, not the restore state part, and it is still a “legacy” tracker integration, although I hope to change that before long, too.

petro · October 18, 2022, 4:03pm

Yeah, I was leaning more towards an integration using a config flow, where users can just select a keep-alive integration and then select an entity. Make it a helper, with YAML options as well.

Basically to solve the issue of any entity going unavailable and the user not caring and just wanting the last data.

Ghafla · October 18, 2022, 4:18pm

You’re totally right, I kinda deviated from the course. I too am totally annoyed by the unexpected states.
So I officially withdraw my proposal of another (unexpected) state.

@petro That wrapper integration seems like the way to go. But how could that be achieved?

@pnbruckner Do you have a link to your composite integration?

pnbruckner · October 18, 2022, 5:55pm

maxym · October 18, 2022, 5:55pm

I believe most users will choose not to reset the values to unavailable during HA restarts for most (if not all) sensors. This preference has to be taken into account when considering different solutions.

Do you really want to face another “WTH, I need a keep-alive integration for every entity” in a year or two?

It sounds to me like more duct tape, while the core problem we face is mixing sensor values with information about their (potential) invalidity. That is THE problem that should be resolved, rather than spending time on more workarounds.

The idea of providing information about potential invalidity is correct, but the community has experienced more hassle than benefit from it. This information has to be provided anyway, just not the way it is today, and maybe there should be an option to change the behavior depending on the user’s needs.

IMO the state and the invalidity info should both be part of every sensor, separately.
That would provide both consistency in expected values (i.e. not breaking the datatype) and information about possible invalidity (which might be important in some specific cases).

Moreover, the system could provide configuration to change sensor behavior: either keep the last values (still giving info about potential invalidity in a dedicated attribute) or reset them to the ‘unknown’ state. I can imagine such a setting being global, with an option to override it at the device or entity level.

At least in terms of data consistency, it seems better than maintaining the current (inconsistent) solution and adding more parts to work around it.

IMO it would fulfill everyone’s needs while providing the needed backward compatibility at the same time.

Great analogy!