Historical data seems to be lost

I started my HA instance in June 2022 and have added new devices gradually since then. Everything looked fine.
When looking into an entity’s historical data in the logbook today, I noticed this entity doesn’t have any data before 2023-10-10. I first thought there might be a problem with this specific device, but upon checking the logbook, my HA instance doesn’t show any historical data for any entity before 2023-10-10. This is a bit strange, as I always thought I would have access to the historical data whenever I needed it.
Is there a way for me to troubleshoot this? I haven’t made any specific changes to my HA setup recently, other than updating HA and everything else I use with it.
The home-assistant.log events start from 2023-10-15. I have daily full backups for the last month (since 2023-10-1).

There is nothing wrong.

The recorder only stores ten days of state and event data by default.

If you want more you can increase the purge_keep_days setting up to a month or so without creating performance issues.
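As a minimal sketch of that setting, assuming the recorder is configured in configuration.yaml (the value 30 is only an illustration of "a month or so"):

```yaml
# configuration.yaml
recorder:
  # Number of days of state and event history to keep (default: 10).
  # 30 is an illustrative value, not a recommendation for every setup.
  purge_keep_days: 30
```

A restart of Home Assistant is needed for the change to take effect.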

If you want longer, then give your sensors a state_class so they generate long-term statistics (LTS). LTS stores 5-minute max, min and average data up to your purge_keep_days setting. This is then downsampled to hourly max, min and average data, which is kept forever. The 3 × 24 hourly samples per sensor per day do not inflate the database size significantly.

Your other option is to export data you want to keep for years to InfluxDB and graph this with Grafana.
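A rough sketch of what that export could look like, assuming an InfluxDB 1.x server. The host, database name and entity ID below are hypothetical placeholders, and InfluxDB 2.x needs different options (token, organization, bucket), so check the integration documentation for your version:

```yaml
# configuration.yaml
influxdb:
  host: 192.168.1.10        # hypothetical address of the InfluxDB server
  port: 8086
  database: home_assistant  # hypothetical database name
  include:
    entities:
      # Only export the entities you want to keep long term.
      - sensor.living_room_temperature  # hypothetical entity ID
```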

15 minute speed guide:

Ah this is interesting. I always thought every data point was stored indefinitely by default. Thanks for sharing the links and the video. I will read up on the best practices for keeping the data indefinitely.

Is there a guide on setting state_class and generating LTS? Is it enough to update the purge_keep_days to a longer time and then manually check every important sensor to make sure they have a state_class?

Possibly. It depends on what you want to do with the data. If all you want to do is display it, then yes, you can use the core statistics card or third-party cards like History Explorer or ApexCharts.

It is a bit more involved if you want to use the old data in templates as you need to use SQL to retrieve the data.
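As an illustration only, here is a sketch using the SQL integration against the default SQLite recorder database. The entity ID, sensor name and unit are hypothetical. The table and column names (statistics, statistics_meta, start_ts, mean) are what I would expect from recent recorder schema versions, so treat them as assumptions and verify them against your own database:

```yaml
# configuration.yaml
sql:
  # Returns the most recent hourly mean that is at least one year old.
  # The date arithmetic in the query is SQLite-specific.
  - name: "Living room temperature mean one year ago"  # hypothetical name
    query: >
      SELECT s.mean AS value
      FROM statistics AS s
      JOIN statistics_meta AS m ON s.metadata_id = m.id
      WHERE m.statistic_id = 'sensor.living_room_temperature'
        AND s.start_ts <= CAST(strftime('%s', 'now', '-1 year') AS INTEGER)
      ORDER BY s.start_ts DESC
      LIMIT 1;
    column: "value"
    unit_of_measurement: "°C"
```

The resulting sensor can then be used in templates like any other entity.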

There’s no real guide. Check Developer Tools → States. The right-hand column lists the sensor’s attributes. If state_class is not there, what to do next depends on how your entities were created. Check whether the integration supports adding a state class.

You can always do it manually for ones that don’t support it. See: Customizing entities - Home Assistant
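A minimal sketch of the manual approach, assuming a hypothetical temperature sensor. The right state_class value (measurement, total or total_increasing) depends on what the sensor reports, and a numeric sensor also needs a unit of measurement for statistics to be generated:

```yaml
# configuration.yaml
homeassistant:
  customize:
    sensor.living_room_temperature:  # hypothetical entity ID
      # "measurement" suits values that go up and down, such as temperature.
      state_class: measurement
```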

If you want longer, then give your sensors a state_class so they generate long-term statistics (LTS). LTS stores 5-minute max, min and average data up to your purge_keep_days setting. This is then downsampled to hourly max, min and average data, which is kept forever.

What should I set for purge_keep_days to be able to store the data for any sensor with state_class forever? An arbitrarily large number?

You can leave it as is. You will get 5 minute stats for 10 days, then it will be downsampled to hourly stats and kept forever.

So I should see the historical data for the sensors that have already had a state_class for months, right? I checked a few and none of them have any historical data (in the history or logbook of HA). Am I missing something?
Just an example:



Yes. That is the state history. For displaying LTS:
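One way to do this is the core statistics graph card. A sketch of a card configuration, assuming a hypothetical sensor that already has a state_class (the period and day count are illustrative):

```yaml
# Dashboard card (YAML editor)
type: statistics-graph
entities:
  - sensor.living_room_temperature  # hypothetical entity ID
period: day          # aggregate the stored statistics per day
days_to_show: 365    # look back a full year
stat_types:
  - mean
  - min
  - max
chart_type: line
```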

Ah, ok. Now I fully understand. Thanks for responding to my (probably dumb) questions.

@tom_l Are there plans to make this more user-friendly / a feature?

From a user perspective (regardless of performance impact or technicalities), I want to easily review any data I have collected, in the same interface I already use to see my current data and without additional configuration.

Aggregating data over longer periods is super useful. For example, comparing against the same time last year was a use case I was expecting to be able to use HASS for. Reading this, it seems that all the data I have collected since setting up HASS is lost, as I set it up far more than 10 days ago.

No data is lost if your entity has a state class so that it generates LTS (long term statistics). This is down-sampled and kept forever.

Improvements have been made in the eight months since that post was written. LTS can now be viewed in the History panel if the period exceeds your state data records.