Custom Integration to import long term statistics from a file like csv or tsv

The most probable cause of the history_stats disappearing after a while, even when they’ve been visible for a short while, is that somehow Home Assistant doesn’t allow/recognize long term statistics in its database that it hasn’t created itself using the history_stats platform. And then it simply removes what it considers to be invalid data. I don’t think this integration is to blame for the disappearance.

However, there is also the possibility that long term statistics from the history_stats platform are somehow stored in a different way than those for regular template sensors. And thus that upon importing from a csv file they need to be stored differently?

Perhaps someone with more intimate knowledge of how history_stats are stored could comment on this?

In an attempt to answer my own question, I queried the database directly, and I looked at the long term statistics of 2 sensors. One is a regular template sensor that records the total amount of energy consumed by a device, the other comes from the history_stats platform and it records the time the device was running on a day by day basis.

There is one row per hour, as is to be expected. Both sensors have a timestamp in the start_ts column (and some other housekeeping columns), but the way the actual stats are stored is different.

The sensor for total amount of energy stores the data in the state and sum columns, but the running-time sensor has the data stored in the mean, min, and max columns.

I can also see that the data I tried to import for the history_stats sensor actually did end up in the database table (which is called “statistics”). However, since I labeled the columns in the CSV file as ‘state’ and ‘sum’, that’s also where they ended up. So the table has null for mean, min, and max, and some values in state and sum.

It seems that Home Assistant ends up deciding to ignore these state and sum values for a history_stats based sensor, since those values aren’t supposed to be there to begin with. So even though the data is in the database, in the end it doesn’t get displayed. Which makes sense, it wasn’t supposed to be there in the first place.

So in conclusion, rather than importing the history_stats data in the state and sum columns, I should have used a mean, min, and max column.

I need to figure out now how to structure those, but more importantly how to get my hands on them. When I get a graph of this sensor that comes from history_stats in the history screen of Home Assistant, and then use the Download feature, it gives me the data as ‘state’ (which might be a bug in Home Assistant actually).

How about importing a log file as json? Would it be big job to add json support to this custom integration?
Background: I have build using python some basic home automations at home. I have been using JSON as logging and now I am trying to start using home assistant. It would be nice to add couple of years worth of log’s from my home in to home assistant.

Great you found that out!

As I said, I have no idea about history_stats sensor, but is it possible to set the state_class for them?

For “normal” sensors, I set the state_class for total_increasing for counters (state, sum) and to measurement for analog values (mean, min, max).

Its also interesting that there is no error when historical values are written which do not fit to the sensor …

Did you try to convert your JSON data to CSV? There are online converters as well, and I suppose you anyhow have to format your data to fit to the specification of the integration.

Apart from that, i am using pandas read_csv. There is also a method to read JSON in pandas, so it should not be that big deal … do you have programming skills? Then you could try it on your own, starting point would be here.

I’m trying to import the last 3 months of electricity from my meter (I have the kwh values ​​of each hour in a csv file). I would like to point out that it cannot be integrated into HA, so I have to create a sensor template from scratch (total_increasing with model state equal to 0).

What steps should I take to avoid incorrects spikes? I’ve made several attempts, but I always get these spikes. See image (today).

Please check Custom Integration to import long term statistics from a file like csv or tsv - #63 by TomMarg, does it solve the issue?

Yes. Thanks.

I was successfully able to import using Spook. I used the following format for the service call:

service: recorder.import_statistics
  has_mean: false
  has_sum: true
  statistic_id: sensor.test_spook_import_energy
  source: recorder
  unit_of_measurement: kWh
    - start: "2024-07-02 00:00:00-05:00"
      state: 1
      sum: 1
    - start: "2024-07-02 01:00:00-05:00"
      state: 1
      sum: 2
    - start: "2024-07-02 02:00:00-05:00"
      state: 2
      sum: 4

I downloaded my intervals from Xcel Energy’s website. Opened in Excel and created a simple concat function to generate the individual hourly starts, states, and sums. Pasted into the Statistics field (or you can just copy the full YAML above).

I was capped at roughly 2,000 entries, but I think it varies depending on the total length of the individual entries.

Do NOT attempt without doing a full backup prior to using it though. Run the import on a test sensor first, make sure it works the way you want, then do an import on the sensor.

I did have an issue with the final hour on the import. Usually it was some negative value. Easy to fix a single entry though. Just look at the outliers in the edit stats window to see if there are any after importing.