Recorder stores data inefficiently, timeseries logging probably better?

koying · June 23, 2024, 8:42am

What are your arguments that having a separate db for purely time series / statistical purposes is subpar, exactly?
IMO, that’s actually more elegant than trying to squeeze different / conflicting requirements into a single DB, not even considering that not everyone is interested in the statistical part.

The discussion about timestamp precision differing per engine goes my way, tbh.

oliv3r · August 2, 2024, 7:35am

What are your arguments that having a separate db for purely time series / statistical purposes is subpar, exactly?

It doesn’t integrate nicely into home-assistant, basically. Everywhere the recorder is used (e.g. a graph of temperature changes on a sensor), this happens natively with the recorder. If I use influxdb, I don’t have that. I have to add extra stuff, and it’s not inherently linked: if I look at a sensor’s history, I don’t automatically get the influxdb graph.

I’m not against having different databases or squeezing in conflicting requirements. But making poor design choices because ‘it doesn’t matter, use something better if you want better’ is not ideal.

The fact remains: we have a database that does store history and that does store statistics. Why would we not make it the best it can be (which doesn’t really change all that much)?

E.g. using a proper timeseries table format. It’s just how we store the data; in the end, it’s all still the same. Using a proper timestamp instead of a float goes my way, tbh. You want better/more precision? Use a proper database.

As for not storing each and every entry, but instead only storing when data has changed, that’s just how you enter/retrieve data. The database is the same. Obviously your graphs, tables and tools need to account for the fact that rather than having ‘one entry per second’ they now get ‘one timestamp per entry’, but for the database things don’t become ‘more difficult’ or ‘only possible in certain scenarios’. Just different.
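To illustrate the ‘tools need to account for it’ part, here is a minimal sketch of how a graphing tool could expand change-only entries back into a regular grid by carrying the last known value forward. The function names and the sample data are made up for the example and are not anything from the recorder itself.

```python
from bisect import bisect_right

def value_at(samples, t):
    """Return the last recorded value at or before time t (None if there is none)."""
    times = [ts for ts, _ in samples]
    i = bisect_right(times, t)
    return samples[i - 1][1] if i else None

def resample(samples, start, stop, step):
    """Expand change-only samples into one value per step (forward-fill)."""
    return [(t, value_at(samples, t)) for t in range(start, stop, step)]

# Hypothetical change-only log, sorted by time: (unix_seconds, temperature)
changes = [(0, 20.0), (7, 20.5), (12, 21.0)]

# One point every 5 seconds, each taking the most recent stored value.
grid = resample(changes, 0, 15, 5)
```

The stored data stays small (three rows here), and the consumer decides what resolution it wants when reading.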

dbaarda · September 16, 2024, 3:44pm

I too have some highly opinionated ideas about how time-series should be implemented, and after some initial attempts to get some power/energy/cost monitoring and consoles working in HA using helpers and automations, I’m of the opinion what HA has right now is barely fit-for-purpose. I suspect most people just don’t realise how bad/wrong it is and assume what it shows is correct.

To be fair, I don’t think there is a single timeseries DB I’ve seen that gets it right since RRD. They all make terrible mistakes in their core assumptions and are very inefficient. I have fragments of a doc scattered around that I should pull together and turn into a “how to do a TSDB right” document.

However HA’s attempt seems particularly bad, and the number of third-party efforts to replace it like Turn HA into ULTIMATE Data Analysis platform shows that other people are not happy with it either.

I might start pumping some effort into helping these efforts…

koying · September 16, 2024, 4:15pm

Still not sure why some people want to fit HA into something it was never meant to be: a data analysis tool or a TSDB.

If they are not satisfied with the arguably little it provides, they can do what all businesses in the world do: export the needed data to a proper tool for their needs.

I work for a software company, and we do exactly that: provide some baseline tools to get simple metrics, and possibilities to integrate with 3rd parties for specific needs.

At the end of the day, we want HA to be the perfect home automation tool, and every effort put into something other than this core objective just delays it.

dbaarda · September 17, 2024, 8:56am

For me, a lot of what I want to see on dashboards and quite a lot of the automation I want requires good historical data, and ways to summarise it accurately. That means good timeseries support.

For example, the current Energy Dash lies to me because I can’t (yet) figure out how to convert the battery AC power input sensor (which goes negative when discharging and only updates every 15m) accurately into the total energy in and total energy out sensors the dash needs. I’ve created helper template sensors and integrators for these and they almost work, but HA has some strange behaviours that are making this not work correctly, and the energy output graph shown is clearly not the integral of the power output graph.
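For reference, the calculation being attempted looks roughly like the sketch below: integrate a signed power reading over time and split it into separate ‘in’ and ‘out’ energy totals. It assumes each reading holds until the next one arrives (a left Riemann sum), which is one choice among several; the function name and sample readings are invented for illustration and this is not how HA’s integration helper is implemented.

```python
def split_energy(samples):
    """Integrate signed power (W) into (energy_in_kwh, energy_out_kwh).

    samples is a time-sorted list of (unix_seconds, watts).
    Assumes each reading holds until the next one (left Riemann sum).
    """
    energy_in = energy_out = 0.0
    for (t0, p0), (t1, _) in zip(samples, samples[1:]):
        kwh = p0 * (t1 - t0) / 3_600_000  # W * s -> kWh
        if kwh >= 0:
            energy_in += kwh    # charging
        else:
            energy_out -= kwh   # discharging, stored as a positive total
    return energy_in, energy_out

# Hypothetical 15-minute readings: charge at 1 kW for 30 min, discharge at 2 kW for 15 min.
readings = [(0, 1000.0), (900, 1000.0), (1800, -2000.0), (2700, 0.0)]
e_in, e_out = split_energy(readings)
```

Under this hold-last-value assumption the repeated 1000 W reading at 900 s does not change the totals, so dropping unchanged updates is harmless only if the integrator still knows how much time has passed.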

Part of the problem seems to be HA’s behaviour of ignoring sensor updates where the value doesn’t change. I’ve attempted to work around that by adding an automation to do “update entity” on all the relevant sensors but that doesn’t seem to work. Using Max sub-interval on the integrators also doesn’t seem to do the right thing. I’ve seen people saying they work around this by making their template sensors add a tiny random offset to the value, but man, does that feel dirty.

Also the energy dash seems to (try to) show energy usage and cost for each day, with the current day only showing the day so far. I’d like to be able to see what it looks like summarised for the past week, or the past 24h.

In the long term I’d like the automation to use historical trends in household power consumption, solar production, and my energy provider’s dynamic demand/supply pricing to optimise when to charge/discharge the battery.

I’ll keep fiddling… maybe I can make it work, but I feel like it shouldn’t be this hard.

koying · September 17, 2024, 5:06pm

If the source data is wrong, no amount of DB optimization will solve that.

If you think some calculations are not done properly in HA, neither. That would be a bug.

You can change the timespan in the top bar

oliv3r · October 25, 2024, 7:34am

@dbaarda, thank you for sharing your frustration

But we have to agree, that there’s some things that could be done to improve the situation yeah? baby steps

My two biggest issues are these. First, we don’t use proper timestamps but floats instead, so converting the data to a TSDB is not possible; that could be fixed easily.

Secondly, duplicate data storage. Avoiding it is somewhat easy with stored database procedures, but I don’t think they work with sqlite, so a read-assess-write cycle might be needed, which of course puts an extra burden on the database. However, I’m not familiar enough with TSDBs yet to know whether one will actually remove duplicates itself or whether this is something the application would have to deal with.
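A rough sketch of what such a read-assess-write cycle could look like with Python’s built-in `sqlite3` module follows. The `states` table, its columns and the `record_if_changed` helper are hypothetical and much simpler than the real recorder schema; the point is only to show the extra read before each write.

```python
import sqlite3

def record_if_changed(conn, entity_id, ts, value):
    """Insert a row only when value differs from the entity's latest stored row."""
    # Read: fetch the most recent value for this entity.
    row = conn.execute(
        "SELECT value FROM states WHERE entity_id = ? ORDER BY ts DESC LIMIT 1",
        (entity_id,),
    ).fetchone()
    # Assess: skip the write when nothing changed.
    if row is not None and row[0] == value:
        return False
    # Write: store the new value.
    conn.execute(
        "INSERT INTO states (entity_id, ts, value) VALUES (?, ?, ?)",
        (entity_id, ts, value),
    )
    return True

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE states (entity_id TEXT, ts INTEGER, value REAL)")
# The index keeps the 'latest row' lookup cheap, which limits the extra burden.
conn.execute("CREATE INDEX idx_states ON states (entity_id, ts)")

results = [
    record_if_changed(conn, "sensor.temp", t, v)
    for t, v in [(1, 20.0), (2, 20.0), (3, 20.5)]
]
row_count = conn.execute("SELECT COUNT(*) FROM states").fetchone()[0]
```

Caching the last value per entity in the application would avoid the extra read entirely, at the cost of keeping that state in memory.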