Data science and recommended Recorder purge interval

Hi, I’m looking at doing some data science with Home Assistant data and wondering if there is a recommended limit for the purge interval of the Recorder component?

The documentation says the default is 10 days, which seems extremely low if I want to do any meaningful longer-term analysis. I was hoping for something in the region of 2 years. Is there any downside to setting this so high (apart from storage)? I’m currently using PostgreSQL as the backend data store for Recorder.

A separate but related question: I am also using the InfluxDB integration to store HA state. Is there a way to use this as the data source for the HA data detective?

Thanks

Using PostgreSQL or any other database makes a lot of sense from a performance perspective, regardless of the purge interval. There is a linear relationship between the purge interval and database size, so if database size is not an issue you can easily increase the interval.
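Because the relationship is roughly linear, you can estimate the storage cost from a short observation period. A minimal sketch (the MB-per-day figure is a made-up placeholder; measure your own, e.g. with `pg_database_size` in PostgreSQL):

```python
def estimated_db_size_mb(keep_days: int, mb_per_day: float) -> float:
    """Estimate Recorder database size, assuming growth is roughly
    linear in the number of days kept (i.e. a constant recording rate)."""
    return keep_days * mb_per_day

# Example: if 10 days of history occupy ~500 MB (50 MB/day),
# keeping 730 days would need roughly 36.5 GB.
print(estimated_db_size_mb(730, 50.0))
```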

The recorder makes it simple to show history graphs of entity states in Lovelace.

You can include or exclude specific entities and domains in Recorder to control what gets captured.

The same applies to InfluxDB. InfluxDB is more flexible about retention periods (its equivalent of the purge interval), which can be set per data set. You can use tools like Grafana to build nice statistics on top of the data. Integrating those statistics into Lovelace is possible but more complicated.
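For instance, InfluxDB 1.x lets you define retention policies per database via InfluxQL. A small sketch that just builds the statement (the policy and database names here are made up; note InfluxDB duration literals have no year unit, so two years is written as `730d`):

```python
def retention_policy_ql(policy: str, database: str, duration: str,
                        replication: int = 1, default: bool = False) -> str:
    """Build an InfluxQL CREATE RETENTION POLICY statement.

    `duration` uses InfluxDB duration literals such as "90d" or "104w".
    """
    stmt = (f'CREATE RETENTION POLICY "{policy}" ON "{database}" '
            f'DURATION {duration} REPLICATION {replication}')
    if default:
        stmt += " DEFAULT"
    return stmt

print(retention_policy_ql("two_years", "home_assistant", "730d", default=True))
```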

My approach is:

  1. Include in Recorder everything where I want to show simple history graphs in Lovelace.
  2. Include in InfluxDB everything that I want to keep longer and analyze with Grafana.

Thanks for the suggestion.

My question was really about the fact that the HA data detective only supports the Recorder database (PostgreSQL in my case) and hence requires the Recorder component. I use Grafana today with InfluxDB, but it’s more of a visualisation tool and has limitations where I want to manipulate data and potentially apply ML to it.
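For that kind of manipulation you can also go straight at the Recorder database with pandas. A hedged sketch: the column names below follow the older Recorder schema (`entity_id` and `last_updated` directly on the `states` table) and may differ in newer HA versions, and the entity id and connection string are placeholders:

```python
def states_query(entity_id: str, days: int) -> str:
    """SQL to pull one entity's history from the Recorder `states` table
    (PostgreSQL syntax for the time interval)."""
    return (
        "SELECT state, last_updated FROM states "
        f"WHERE entity_id = '{entity_id}' "
        f"AND last_updated > NOW() - INTERVAL '{days} days' "
        "ORDER BY last_updated"
    )

# Against a live PostgreSQL backend you could then do something like:
# import pandas as pd
# from sqlalchemy import create_engine
# engine = create_engine("postgresql://user:pass@localhost/homeassistant")
# df = pd.read_sql(states_query("sensor.living_room_temperature", 730), engine)
print(states_query("sensor.living_room_temperature", 730))
```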

So if I were to increase the purge interval from, say, 10 days to 730 days, I would want there to be no significant effect on HA’s performance when booting, populating the history graphs, or during any other maintenance operation.

Thanks

I thought that’s what InfluxDB was good at; maybe it’s Grafana that’s the limitation.

However, if you want to keep that much data, I’d suggest writing a routine to run instead of purging: something that moves the data to an archive database and applies appropriate indices etc. Then it won’t have any chance of impacting HA performance.
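A minimal sketch of that archive-then-purge idea, demonstrated against an in-memory SQLite database (the real Recorder schema has more columns and foreign keys; the three-column `states` table here is a stand-in, and in production you would archive into a separate database rather than a table):

```python
import sqlite3
from datetime import datetime, timedelta

def archive_old_states(conn: sqlite3.Connection, keep_days: int) -> int:
    """Copy rows older than keep_days into an archive table, then delete
    them from the live table. Returns the number of rows archived."""
    cutoff = (datetime.now() - timedelta(days=keep_days)).isoformat()
    # Create the archive table as an empty copy of `states`.
    conn.execute("CREATE TABLE IF NOT EXISTS states_archive AS "
                 "SELECT * FROM states WHERE 0")
    moved = conn.execute("SELECT COUNT(*) FROM states WHERE last_updated < ?",
                         (cutoff,)).fetchone()[0]
    conn.execute("INSERT INTO states_archive "
                 "SELECT * FROM states WHERE last_updated < ?", (cutoff,))
    conn.execute("DELETE FROM states WHERE last_updated < ?", (cutoff,))
    conn.commit()
    return moved

# Demo with fake data: two old rows and one recent one.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE states (entity_id TEXT, state TEXT, last_updated TEXT)")
old = (datetime.now() - timedelta(days=30)).isoformat()
now = datetime.now().isoformat()
conn.executemany("INSERT INTO states VALUES (?, ?, ?)",
                 [("sensor.temp", "20.1", old),
                  ("sensor.temp", "20.5", old),
                  ("sensor.temp", "21.0", now)])
print(archive_old_states(conn, keep_days=10))  # → 2
```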