While looking into what is contributing to the size of the recorder DB, I noticed that a Threshold binary_sensor I created (in the UI) is responsible for a row in the states table every 10-20 seconds even though the the actual value of the threshold binary_sensor hasn’t changed in 10+ days.
The entity the threshold sensor is based on is a power consumption sensor so it updates fairly frequently. I wanted to alert/take action if power consumption is above or below the threshold for a period of time.
There is roughly an equivalent number of rows in the states table for both the base power sensor, and the binary threshold sensor. The main change (purpose?) of the new states for the threshold sensor is recording the last_updated date as the base power entity’s last update.
So even though the state of the threshold sensor hasn’t changed, there is a new state for it, because it has been updated to the same value. Is that correct?
If so, I can somewhat understand that, but it certainly seems less than optimal. All of those extra state rows aren’t providing much value – unless I’m trying to debug and figure out the last time the threshold was evaluated.
Note that there really aren’t any clues for the user in the UI – history doesn’t show any changes other than when Home Assistant was restarted.
I suppose there are some ways I might be able to use trigger based sensors or even an automation with triggers to replace this binary sensor and reduce the DB size. But it would require coming up with a way to create an entity whose last update time isn’t tied to the actual power sensor update.
I’m looking for suggestions, but also wondering if this seems reasonable or should be considered a bug or feature request against the threshold helper.
DANGER: Entities that generate a significant amount of state changes can quickly increase the size of the database when the extra_state_attributes also change frequently. Minimize the number of extra_state_attributes for these entities by removing non-critical attributes or creating additional sensor entities.
Thank you for that clue. Now that I’ve looked at the Threshold helper entity in developer tools, I see that threshold is recording the value of the sensor it is based on as the attribute sensor_value.
So it is worse than I thought – for each threshold sensor that gets defined, there will be 2 rows (states and state_attributes) written to recorder DB every time the sensor it is based on updates.
The attributes stored winds up being a long string (251 bytes in this case):
'{"entity_id":"sensor.ice_maker_current_consumption","hysteresis":0.0,"lower":1.5,"position":"above","sensor_value":135.4,"type":"lower","upper":null,"device_class":"problem","icon":"mdi:snowflake-thermometer","friendly_name":"Ice Maker Low Power Use"}'
I don’t see anything relevant I can change about the threshold sensor configuration either through the UI or YAML. This is the config from the UI
Having the sensor_value seems like debug information that probably shouldn’t be in the attributes since it is already available in the state machine, and already recorded in the database.
I find it somewhat discouraging that some of the built in integrations like threshold and history stats have some long standing issues. As users gain more experience they just implement their own template sensors. Fixes things for them, but doesn’t help others avoid the same problem.
The original code owner just suggested working around this by preventing recorder from persisting the entity. Not helpful if the derived entity was intended to provide meaningful data by filtering some noisy sensor.
I feel like there should be a process for deprecating internal integrations that are no longer maintained. If threshold was removed then people would just go directly into searching for posts on how to implement their own using templates.
I think this one boils down to most users won’t be aware of the potential database impact. By the time one knows enough about Home Assistant to understand how threshold is impacting the database, they have the sophistication to implement a workaround.
EDIT:
@frits1980 - Have you dug into this on a somewhat recent version? At some point (last 2 years?) there was some DB schema changes that I believe were intended to avoid the storage of redundant attribute data. Just guessing but I think it would still generate lots of state updates, but not have as much of a size impact if a shared attribute row was being used.
Also before anyone says “just submit a PR”, I think fixing/optimizing this integration needs a decent understanding home assistant internals so this isn’t a good Hacktoberfest/first PR.
@tom_l before you posted this helpful comment did you happen to notice the issue was open from October 24, 2022 until December 24, 2023?
I clearly slacked off since I didn’t notice it was marked stale during the 7 days between Dec 17th and Dec 24th after keeping the issue open for 14 months.
For me personally it’s not so much about the DB impact. I will never have DB issues. But it is more about the fact that I cannot build an automation triggering on state change. Because even if the state is the same at it was before any update it “changes” to the same value.
My workaround is not a template sensor because I hate the templating language. It’s not anybody should have to learn in the world we live in (imho).
I created a boolean helper and an automation that changes it like the threshold sensor should.
Nevertheless I agree with the point of not solving issues in the core integrations. The sensor now is clearly a deviation to the standard so it should be solved to behave the same as other sensor helpers.
Thank you. But I’m not a programmer, nor coder, not scripter. Everything that looks like code I try to avoid. Not because I probably cannot do it, but because it would give me to much of a headache. I’m a GUI guy, or actually I became one
Nice. But I don’t really understand why this behaviour is different then other sensors to begin with. But that is not a question for you of course. Thank you for pointing this out to me!
You can use nothing when writing your automation in YAML. The tilde (~) means the same thing (null) and is required when using the UI. Or at least that is what I thought (I’ve never used the UI for automations). Are you saying you can use nothing when writing automation is the UI?