Recorder Retention Period By Entity

I’d like to be able to set a Recorder keep_days value for each entity.

Background: New and non-technical users are steered toward running HA on a RPi with an SD card. Yet, if they take the defaults, every event and state change is recorded in the SQL database on the SD card, leading to a bloated database file and shortened SD card life.

Yes, I know there are lots of ways to change these defaults, including using other databases and database locations. But for most beginners, it’s a steep learning curve and this isn’t going to be high on their list - until something fails.

Not all new users have experience with database management. Forcing them to install software, download the database, run SQL commands, figure out which events and state changes they want to include or exclude, and create the appropriate config lines is a lot to ask.

I can do all that, but even so I’m frustrated by the fact that I can specify only one retention period. There are things I want 24 hours of log data for, and others I want to keep for 7 days.

The whole recorder process seems very primitive compared to the rest of HA.

To me, the best solution would be to allow the (new) user to establish a keep_days for each entity. This should be easy to do when the entity is first defined, and wherever its properties are displayed for edit.
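
To make the idea concrete, here is a rough Python sketch of what a per-entity purge could look like. Everything in it is an assumption for illustration: the `states` table is a toy stand-in rather than the real Recorder schema, and `KEEP_DAYS_BY_ENTITY`, `DEFAULT_KEEP_DAYS`, `purge` and `make_demo_db` are made-up names, not existing Home Assistant settings or functions.

```python
import sqlite3
from datetime import datetime, timedelta, timezone

# Hypothetical settings: a global default plus per-entity overrides.
DEFAULT_KEEP_DAYS = 10
KEEP_DAYS_BY_ENTITY = {
    "sensor.well_water_level": 365,  # keep long-term history
    "sensor.cpu_load": 1,            # only needed for on-the-spot troubleshooting
}


def make_demo_db(now):
    """Build an in-memory toy database with rows of various ages per entity."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE states (entity_id TEXT, state TEXT, recorded_at REAL)")
    rows = []
    for entity_id in ("sensor.well_water_level", "sensor.cpu_load", "light.kitchen"):
        for age_days in (0.5, 2, 20, 100):
            recorded_at = (now - timedelta(days=age_days)).timestamp()
            rows.append((entity_id, "demo", recorded_at))
    conn.executemany("INSERT INTO states VALUES (?, ?, ?)", rows)
    conn.commit()
    return conn


def purge(conn, now):
    """Delete rows older than each entity's keep_days; return how many were removed."""
    removed = 0
    entities = [row[0] for row in conn.execute("SELECT DISTINCT entity_id FROM states")]
    for entity_id in entities:
        keep_days = KEEP_DAYS_BY_ENTITY.get(entity_id, DEFAULT_KEEP_DAYS)
        cutoff = (now - timedelta(days=keep_days)).timestamp()
        cursor = conn.execute(
            "DELETE FROM states WHERE entity_id = ? AND recorded_at < ?",
            (entity_id, cutoff),
        )
        removed += cursor.rowcount
    conn.commit()
    return removed
```

Entities without an override simply fall back to the default, so a new user would never have to touch the mapping unless they wanted to.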

I see this has come up before, but didn’t get much traction. Anyone else?

Yep. I’ve got a similar use case…

Most things I only care about a couple hours of data for on-the-spot troubleshooting (wait, what just happened)
Some things I want a couple days of data to review/analyze regularly
A few things, as needed, I want a few weeks of data (for longer term troubleshooting)

For me, this is one of the most important features, and it should be implemented as soon as possible.

Exactly this feature is the only thing I miss.
I need to be able to select certain entities for long-term storage and keep the others for only 24 hours, for example.
If it is built directly into the system, the setup will be easy even for beginners like me who do not want to work with multiple databases and do not use the advanced capabilities of other databases such as influxdb.
Now the size of the database is growing unstoppably. Because I want to store the history of temperature and the history of the amount of water in the well in the long run, I have to store everything in the long run.

If this ever gets implemented, let me list a few ideas and questions for consideration. This is mostly a brainstorm (i.e. it needs more thought and refinement).

  1. Limiting to “x days” or “y entries”. In other words, old entries will be deleted if they are older than x days, and/or there are more than y entries already.
    • For most (or all) entities, having a minimum number of entries means there will always be a little bit of history. For entities with frequent updates, that means at least a couple of minutes of history; while for mostly stable entities we can get a better overview of how they change.
    • Having a maximum limit of entries means that no single entity will be able to spam the database and consume a lot of history.
    • Having a time limit is more tangible than number of entries. It also helps to make sure at least a certain amount of time is always visible in the history.
  2. Discarding attributes instead of deleting entries.
    • Whenever a state is saved in the database, all the attributes and some extra metadata are saved together with the state itself. Some of these attributes can be quite large (a very long string), and many are not relevant or useful in the history (even more so because there is almost no UI for viewing or using old attributes).
    • Thus, instead of deleting an entry, it can be stripped down to the bare minimum (entity id, datetime, value, and possibly the user/action that caused the state change), and that by itself can save a lot of space.
  3. Reducing the time-resolution instead of deleting old entries.
  4. This advanced history management logic could be built by the community before being integrated into HA Core.
    • Someone could write an integration (or an add-on, or a script) that would do this advanced history manipulation. This would allow for quick iteration and experimentation before it gets mature enough to be included into the main HA project.
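
As a starting point for that kind of experimentation, idea 1 (a time limit combined with minimum and maximum entry counts) could be sketched in Python roughly like this. The function name `select_expired` and its parameters are hypothetical; nothing here is an existing HA API.

```python
from datetime import datetime, timedelta


def select_expired(timestamps, now, max_age, min_entries, max_entries):
    """Return the timestamps of one entity's entries that should be deleted.

    Rule sketched here:
      - the newest `min_entries` entries are always kept, whatever their age;
      - entries beyond the newest `max_entries` are always deleted;
      - anything in between is deleted only if it is older than `max_age`.
    """
    newest_first = sorted(timestamps, reverse=True)
    expired = []
    for index, timestamp in enumerate(newest_first):
        if index < min_entries:
            continue  # guaranteed minimum history
        if index >= max_entries or now - timestamp > max_age:
            expired.append(timestamp)
    return expired


now = datetime(2021, 1, 1, 12, 0)

# A chatty entity: 100 updates, one per minute. The entry cap does the work.
chatty = [now - timedelta(minutes=m) for m in range(100)]
chatty_expired = select_expired(
    chatty, now, max_age=timedelta(days=1), min_entries=5, max_entries=20
)

# A mostly stable entity: 3 updates over two months. The minimum keeps some history.
stable = [now - timedelta(days=d) for d in (1, 30, 60)]
stable_expired = select_expired(
    stable, now, max_age=timedelta(days=7), min_entries=2, max_entries=20
)
```

In this example the chatty entity loses everything beyond its 20 newest entries, while the stable entity keeps its 30-day-old entry because of the minimum and only loses the 60-day-old one.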

Yes, we need to do some brainstorming! The more minds working on this the better.

Great idea! This alone may have avoided all the DST crashes we had recently, which (from what little I know) seemed related to massive unintentional spamming of the database.

Another great idea! This brings in the overall database design, which is embarrassingly poor for such a great product as HA. Just tossing a long string of attributes out in one record because one of them changed is pretty sad. It’s almost as bad as including the unit of measure with every state change value. But I digress.

Nice. A bit more complex, but establishing a different precision for recording vs. real-time sampling makes perfect sense to me.

Your point 4. goes for all of this. Brainstorm, discuss, build consensus and model the ideas before baking them into the finished product. Not the way things are always done around here, but I’m a strong advocate for this sort of process.

TL;DR: I acknowledge that specialized tools are better suited for more serious big data analysis, and that HA shouldn’t try to reinvent those tools. And I share my thoughts on the current database design, while also mentioning this is becoming off-topic.


Let me add that there are already tools specialized in long-term storage of large amounts of metrics (InfluxDB, Graphite), aiming for both manageable size on disk and fast queries. And although those tools already provide some graphing UI, there is also Grafana, which can connect to many different kinds of storage (including MySQL/MariaDB) and provide a very powerful graphing interface. Whatever HA provides (either now or in the future) will likely be inferior to those specialized tools; and that’s fine, because that’s not the main objective of HA, and because we are trading advanced features of such tools for a simple storage inside HA that is seamlessly integrated with Lovelace.

So, for advanced users who really want the full power of big data in their home, those tools are better suited. Having them as add-ons makes them very convenient to install and use, which is already a great step. (Would be better if their storage path was configurable, though: #13, #21, #120, #179)

That said, this feature request is about improving the out-of-the-box Home Assistant metric recording, while keeping it lean and simple. Maybe the best solution would be to embed one of the existing tools/libraries (such as whisper) into HA, instead of writing lots of new code into HA; but that risks becoming bloat.


You’ve been complaining about the database design, but not providing concrete examples of why it’s bad, nor providing suggestions on how to address the issues. (And I think that’s off-topic for this thread anyway, so just provide links.)

Sure, the database is not totally normalized, but it is simple to understand and simple to use. And it runs even on low-powered devices. (I have no idea if many joins would affect the performance, because that would need many seeks.) And it evolved organically into what it is today. And any changes require doing a database migration on 0.1M installations, so schema changes have to be done carefully. So, while not perfect, it works. Maybe using Cassandra or HBase would be more efficient than any relational database, but SQLite has the advantage of being extremely lightweight and embeddable into other projects.

So, given all the constraints, it’s not obvious to me how bad the current design is, nor is it obvious what a better solution would be. But I digress, as this discussion seems off-topic. Heck, my entire comment here seems off-topic.

I feel exactly like you guys. I agree we are going off-topic, though. Is there a better place for us to discuss the database/data retention policy of HA? I have a few ideas myself, having written my own home automation software from scratch.

Disk/storage space is cheap nowadays (even on SD cards), and I believe keeping more data never hurt anyone (and could provide a good insight or two). For example, I configure almost all analog values to be written to the database only when they change by a certain amount (and force a write every hour if they have not changed enough to be written). This catches all sudden changes beautifully, and wastes almost no disk space if the value stays stable for long periods of time. My second strategy is to use two databases in Derby: the normal one and the archive. The normal one keeps the last 3 months of data; the archive gets everything that is deleted from the normal one. While using the software it is really transparent, you don’t even know it is there; normal “recent days” queries are almost instantaneous, and if you ask specifically for data from the last 6 months it takes a few seconds to load (I have to query both databases), which is (in my opinion) expected by the user.
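
A rough Python sketch of the first strategy (write on a large enough change, otherwise force a write once an hour) might look like the following. It is an illustration only, not the actual Java implementation, and the class name `DeadbandRecorder` and its parameters are made up for this example.

```python
class DeadbandRecorder:
    """Decide whether a new analog sample is worth writing to the database."""

    def __init__(self, min_change, max_interval_s=3600.0):
        self.min_change = min_change          # smallest change that triggers a write
        self.max_interval_s = max_interval_s  # force a write after this many seconds
        self.last_value = None                # last value that was actually written
        self.last_time = None                 # time of that write, in seconds

    def should_record(self, value, timestamp_s):
        first = self.last_value is None
        changed = not first and abs(value - self.last_value) >= self.min_change
        stale = not first and timestamp_s - self.last_time >= self.max_interval_s
        if first or changed or stale:
            # Comparisons are always made against the last *written* value,
            # so slow drift still gets recorded once it adds up to min_change.
            self.last_value = value
            self.last_time = timestamp_s
            return True
        return False


recorder = DeadbandRecorder(min_change=0.5)
samples = [(20.0, 0), (20.1, 60), (20.4, 120), (20.6, 180), (20.6, 3779), (20.6, 3780)]
decisions = [recorder.should_record(value, t) for value, t in samples]
```

With these samples, the first reading is written, the two small wiggles are skipped, the 0.6 jump is written, and the unchanged value is written again only once a full hour has passed since the previous write.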

I believe the standard integrated database in HA could be much improved without asking much (anything?) from the normal user (recognizing the specialized tools @denilsonsa and @CaptTom mentioned should implement all my suggestions and much more). A brainstorm on this would be great!

P.S.: I’m in the process of migrating to HA to stop reinventing the wheel, to have access to the integrations HA offers, and possibly contribute with development for the community, instead of developing for myself and a few friends.

P.P.S.: Since my software is written in Java, I selected Derby/JavaDB as the embedded database, and it has worked wonders for me. I have it running in several 15+ year old installations, with 30GB databases and thousands of entities, never deleting anything. I also have a crude replication feature running with success.

As the OP, I’d say any discussion of improving the HA database architecture and functionality is not too far off topic from this request.

I’d also agree that a higher-level discussion elsewhere about the database would be welcome. I’m not really sure how such a discussion would even get started.

The feature I requested wouldn’t fix all the database woes of HA. But it would offer large returns for very little effort, without impacting the overall architecture. I think that could be implemented a lot sooner than a major re-write of Recorder, although I’d certainly support that, too.