Influx DB - which data takes most space?

Team,

Goal: find out which entities I have to manage so the Influx DB does not explode :slight_smile:

Main: Maria DB
Long-term: Influx DB

For the Maria DB I can run a query and see exactly which entity generates ridiculous amounts of data.
Based on that I can fine-tune the configuration to exclude certain data.

HOW?
In PHPMyadmin select the HA database, click the states table, click the query tab and paste the following SQL, then click “GO”.

SELECT
    COUNT(*) AS cnt,
    entity_id
FROM states
GROUP BY entity_id
ORDER BY cnt DESC

This generates a list of all entities and the record count for that entity:

cnt       entity_id
---       ---------
46951     sensor.memory_free
46091     sensor.processor_temperature
etc

How can I do this for the InfluxDB?

I can find:

  1. the number of records for a specific measurement
    (like m3, kWh, MB, etc)
    SELECT COUNT(*) FROM "home_assistant"."autogen"."kWh"
  2. the number of records for a specific entity
    (like measurement: “kWh”, entity: “energy_consumption_tarif_1”)
    SELECT COUNT(*) FROM "home_assistant"."autogen"."kWh" WHERE "entity_id"='energy_consumption_tarif_1'

IS IT POSSIBLE TO RUN 1 QUERY TO GET 2 COLUMNS: count AND entity_id?

This data also gives me an idea of which older (useless) records I may have to remove to make the DB smaller.

1 Like

Using the above method I was able to find out that my “kWh” measurement
generated >7 million records in the last 8 months!

So I’m still looking forward to a query that counts all records, grouped by:

  • measurement
    and
  • entity_id

This would give me great insight into

  • entities that can be removed from long-term recording
  • entities that generate more (or less) than expected records

Anyone?

1 Like

Were you finally successful?

Well, not really. I put the project at the back of my todo list :slight_smile:

Surprising how difficult that information is to get at.