That was done using InfluxDB and the Grafana addons. See: https://youtu.be/m9qIqq104as
Also make sure to use includes or excludes to reduce the size of your InfluxDB database.
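For reference, a minimal sketch of what include/exclude can look like in the influxdb section of configuration.yaml (the host and secret follow the addon defaults used later in this thread; the domains and globs are just placeholders to adapt):

influxdb:
  host: a0d7b954-influxdb
  port: 8086
  database: homeassistant
  username: homeassistant
  password: !secret influxdb_password
  include:
    domains:
      - sensor
      - binary_sensor
  exclude:
    entity_globs:
      - sensor.*_linkquality   # chatty diagnostic entities are a common first thing to drop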
Hi,
You can get the DB size with a sensor.
Use almost the same URL as for the recorder.
sensor:
  - platform: sql
    db_url: !secret dbsizeurl_nas
    queries:
      - name: DB size MB
        query: 'SELECT table_schema "database", Round(Sum(data_length + index_length) / (1024 * 1024), 1) "value" FROM information_schema.tables WHERE table_schema="homeassistant" GROUP BY table_schema;'
        column: "value"
        unit_of_measurement: MB
Note: table_schema must match the database name set in the database section of the MariaDB addon.
From secrets.yaml
For a local DB
dbsizeurl: mysql://<dbuser>:<password>@core-mariadb/homeassistant
i.e.
dbsizeurl: mysql://homeassistant:CorrectHorseBatteryStaple@core-mariadb/homeassistant
For a DB running on another instance
dbsizeurl_nas: mysql://<dbuser>:<password>@<ipaddress>:<port>/homeassistant
i.e.
dbsizeurl_nas: mysql://dbuser:[email protected]:3307/homeassistant
Is there something similarly reliable for InfluxDB?
Is this what you mean?
I have no idea where I got this from:
- platform: influxdb
  host: a0d7b954-influxdb
  port: 8086
  username: homeassistant
  password: !secret influxdb_password
  queries:
    - name: InfluxDB Database Size
      unit_of_measurement: MB
      value_template: '{{ (value | float / 1024 / 1024) | round(1) }}'
      group_function: sum
      measurement: '"monitor"."shard"'
      database: _internal
      where: 'time > now() - 10s'
      field: diskBytes
Looks good, I need to test it. I just remember @tom_l having issues creating a reliable DB size sensor for InfluxDB.
I had the same question as karansoi even though I’ve already had the InfluxDB and Grafana addons installed for a long time. The answer to getting the sensor for MariaDB, at least for me, was at SQL/MariaDB size sensor in MB?
So it looks like this is in my sensors.yaml file
- platform: sql
  db_url: !secret mariadb_url
  queries:
    - name: MariaDB size
      query: 'SELECT table_schema "database", Round(Sum(data_length + index_length) / POWER(1024,2), 1) "value" FROM information_schema.tables WHERE table_schema="homeassistant" GROUP BY table_schema;'
      column: 'value'
      unit_of_measurement: MB
Hi,
I’m coming from ioBroker because I really like some things about Home Assistant, but some things less so. My biggest problem is logging. In ioBroker I can set for each entity when a value should be logged, e.g. only every 30 seconds, or only when the value changes.
Edit: I see that I can maybe configure MQTT so that the value is not updated so often. I’ll test that tomorrow.
Following is an example:
I have a device (e.g. a computer) which is on between 09:00 and 12:00 and usually consumes exactly 20 watts. For a short time the consumption goes up to 100 watts. The device is connected to a power meter, which constantly reports the current value via Zigbee to HA.
The recorder now writes the value "0" into the database every single time. Between 09:00 and 12:00 it is the same with "20" or sometimes "100". This generates thousands of rows of unnecessary information. It would be enough for me if the 0 watts were stored once at 00:00 and the "20" only at 09:00, then "100" at 09:20, "20" again at 09:23, "100" at 10:00, ..., "0" at 12:00, and nothing more until the device is switched on again and consumes more than 0 watts. But exactly this information is very important to me, which is why I don't want to turn off logging completely. If HA wrote to the database only on such value changes, the size would also be significantly smaller.
Since I have a lot of such sensors and would like to log over long periods of time (up to a year), I would end up with several TB of data.
Is there a solution for this at HA?
Maybe the problem is also the attribute “last_seen” which changes every time?
-->
{"current": 0, "energy": 39.29, "last_seen": "2020-12-27T23:29:49+01:00", "linkquality": 75, "power": 0, "state": "ON", "voltage": 228, "unit_of_measurement": "W", "friendly_name": "energy_sensor", "icon": "mdi:flash"}
{"current": 0, "energy": 39.29, "last_seen": "2020-12-27T23:29:59+01:00", "linkquality": 75, "power": 0, "state": "ON", "voltage": 228, "unit_of_measurement": "W", "friendly_name": "energy_sensor", "icon": "mdi:flash"}
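One way to approximate the ioBroker-style "only every 30 seconds" logging, since Recorder itself cannot throttle per entity, is the filter integration: record a throttled copy of the sensor and exclude the raw one from the recorder. A rough sketch, assuming the entity is called sensor.energy_sensor_power (a placeholder):

sensor:
  - platform: filter
    name: "Energy sensor power (throttled)"
    entity_id: sensor.energy_sensor_power   # placeholder, use your real entity
    filters:
      - filter: time_throttle
        window_size: "00:00:30"   # pass at most one update every 30 seconds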
For long-term data I recommend InfluxDB, a time-series database made exactly for this purpose. Coupled with Grafana you can get some nice graphs/dashboards. I wrote a little guide about my config with PostgreSQL for short-term data and InfluxDB for long-term data, integrated into Home Assistant here.
Thanks, I will look at your guide today. With Iobroker I also used InfluxDB. However, through HA I became aware of TimescaleDB ( https://community.home-assistant.io/t/home-assistant-add-on-postgresql-timescaledb/198176 ). Since I wanted to switch from SQLite to MariaDB or Postgres anyway, this way I only need one database and not two. Do you see anything that speaks against TimescaleDB? The arguments for TimescaleDB actually sounded pretty good.
You could use this as well, I’m not familiar with TimescaleDB.
I like my solution because long-term and short-term data is separated and can be managed separately easily. Also the HA DB for the recorder often gets messed up with the only way out being to delete the DB (or knowing enough about databases to fix it yourself), which would be disastrous if everything is stored in one DB.
Well, of course as the author of the TimescaleDB addon I’m slightly biased, but to answer the question: yes, I have both the recorder and the long-term storage on TimescaleDB and it performs great!
I actually created the addon because I did not want to have multiple database systems running like SQLite/MariaDB/Influx… So a one-size-fits-all solution was very welcome.
BTW, you have a separate database for the recorder and LTSS. And (to address the concern about corruption), my SQLite DB got corrupt once in a while, but on Timescale I never had any problems.
Although, you should not want to run ANY database with heavy writes on an SD card. It will get corrupt one day due to wear.
Ok, then that’s great!
Luckily I haven’t used an SD card for a long time now.
Yes, I also use an SSD - the switch was a very good decision. The perceived performance gain was significantly bigger than between individual Pi generations.
Unfortunately, via Mosquitto MQTT or Zigbee2MQTT I could not reduce the refresh rate without losing valuable information (the last_seen of the sensors).
I’ll try LTSS tomorrow. Does TimescaleDB possibly store the data more compactly? Or is the advantage only in the faster generation of graphs?
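On the Zigbee2MQTT side, if the goal is mainly to slow down how often it publishes, there is a per-device debounce option that merges messages arriving within the given number of seconds. A sketch for Zigbee2MQTT's configuration.yaml (the IEEE address and value are placeholders):

devices:
  '0x00158d0001234567':
    friendly_name: energy_sensor
    debounce: 30   # publish a merged payload at most roughly every 30 seconds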
I have spent the last few days intensively looking into TimescaleDB and LTSS. I’m almost convinced too, but I still have a few open questions (maybe you can take a look at Pinned: Additional Documentation for Timescaledb Integration · Issue #1 · Expaso/hassos-addons · GitHub).
Since the topic here is about storage space, I would like to briefly summarize my findings of the past days:
Currently I generate about 150 MB of sensor data per day. Since I have not yet integrated all sensors, it will eventually be more than 500 MB per day, so close to 200 GB per year. That would be way too much. By compressing the data, I should end up with less than 5 GB per year, which would be completely OK.
Regarding performance, the question is, of course, how long a query will take over a period of one year with compressed data. But from what I have read, this is not a problem.
Briefly what I have done:
ALTER TABLE ltss SET (
  timescaledb.compress,
  timescaledb.compress_orderby = 'id, time DESC',
  timescaledb.compress_segmentby = 'entity_id'
);

SELECT compress_chunk('_timescaledb_internal._hyper_1_1_chunk');

SELECT *
FROM timescaledb_information.compressed_chunk_stats;
Result:
hypertable "ltss", chunk "_timescaledb_internal._hyper_1_1_chunk": "Compressed"
uncompressed: 54 MB heap, 53 MB index, 8192 bytes toast, 106 MB total
compressed: 88 kB heap, 32 kB index, 1624 kB toast, 1744 kB total
So the chunk went from 106 MB uncompressed in total down to 1744 kB compressed in total.
However, I do not recommend haphazardly adopting my SQL statements. It is better to take the time and read up a bit at Timescale Documentation | TimescaleDB API reference.
Of course, my sensor data is very compressible because it is very repetitive. But the bottom line is that TimescaleDB can compete with InfluxDB and other time-series DBMS. And I don't need to worry further about avoiding unnecessary data. In addition, TimescaleDB should compress even better in future versions.
I could also thin the data out by deleting rows that are less than about 10 seconds apart.
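Instead of calling compress_chunk by hand, newer TimescaleDB versions can compress old chunks automatically via a policy; a small sketch (TimescaleDB 2.x syntax, on 1.x the function was called add_compress_chunks_policy):

-- compress chunks of the ltss hypertable once they are older than 7 days
SELECT add_compression_policy('ltss', INTERVAL '7 days');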
Hi @bingo5,
I have answered your questions in the mentioned issue.
First off, thanks for taking the time to test-drive the TimescaleDB compression and post your findings! Very useful!
To save storage space, you basically have 3 options:
1) Compress the data. You have just proven that it can even beat Influx.
Pros:
Cons:
2) Remove data you don’t need (like wiping the attributes column to NULL; see the SQL sketch after this list)
Pros:
Cons:
3) Downsample your data, using Continuous Aggregates
Pros:
Cons:
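For option 2, a rough SQL sketch of wiping attributes on old LTSS rows (assuming the default ltss table layout; as noted below, this does not work directly on already compressed chunks):

-- drop the (large) attributes JSON for rows older than 30 days
-- if the column is NOT NULL in your schema, use '{}'::jsonb instead of NULL
UPDATE ltss
SET attributes = NULL
WHERE time < NOW() - INTERVAL '30 days';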
Now, you could even combine all 3 methods in a mix and match style if you like,
depending on your hunger for data.
Maybe start with compression first, and create some continuous aggregates to speed up queries spanning thousands of records.
That way, you have a nice balance between performance, and not losing any data.
With that in place, you could start with dropping old chunks of data to save space if needed.
Clearing the attributes column is more difficult with compression in place, but also
The good news: lots of options, freedom everywhere
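For the chunk-dropping option mentioned above, the call is roughly this (TimescaleDB 2.x syntax; 1.x used drop_chunks(INTERVAL ..., 'ltss') instead):

-- irreversibly delete all ltss data older than 2 years
SELECT drop_chunks('ltss', INTERVAL '2 years');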
First, thanks again for your support (also on GitHub). I really like the mix of self-learning and your support.
My approach was originally to avoid writing unnecessary data to the database in the first place, by only taking states that have a certain difference from the previous state.
But this is not supported by Recorder/LTSS. Because of the good compression and the other approaches you mentioned (very nice overview!) this is not necessary at all. The good performance of HA/LTSS and PostgreSQL/TimescaleDB on the small Pi 4 is also surprising. I’ll see how it performs when more sensors are added. As another idea, I could adapt Zigbee2MQTT to transmit sensor information to HA less often.
Either way - there are quite a few possibilities and therefore I am sure that HA and TimescaleDB are very future-proof for me --> yes, freedom (and fun) everywhere
Next I’ll add the extra column for numeric states and read up on Continuous Aggregates.
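For anyone else reading up on this, a continuous aggregate over LTSS data could look roughly like the sketch below. It assumes the default ltss table and that the selected entities only ever report numeric states (the state column is text, hence the cast); syntax is TimescaleDB 2.x.

CREATE MATERIALIZED VIEW ltss_hourly
WITH (timescaledb.continuous) AS
SELECT
  entity_id,
  time_bucket('1 hour', time) AS bucket,   -- downsample to one row per entity per hour
  avg(state::float) AS avg_state,
  max(state::float) AS max_state
FROM ltss
WHERE state NOT IN ('unavailable', 'unknown')   -- rough guard; non-numeric states would break the cast
GROUP BY entity_id, bucket;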
Many thanks for the guide! Very useful and my db is reduced by half right away
I have just one question though:
Can we combine domains in the exclude section with entity_globs in the include section? When I tried to do it, my entities that were matching the globs were not included in the recorder. My config example:
recorder:
  purge_keep_days: 14
  db_url: !secret db_url
  exclude:
    domains:
      - sensor
    entities:
      - binary_sensor.shelly1l_84cca8ad7bf2_overheating
  include:
    domains:
      - binary_sensor
    entity_globs:
      - sensor.shenzhen_*_power*
    entities:
      - sensor.shenzhen_neo_electronics_co_ltd_power_plug_12a_energy
Everything works except that the entities matching the listed patterns are not recorded. Am I doing something wrong, or is it not possible the way I want?
Thank you all for this guide. I was running with a 3 GB DB with only 3 days of history. This helped me identify the main culprits that were logging every 15 seconds (do I really need every stat from every container every 15 seconds when I already have InfluxDB doing the same???!).
So now that the purge has run and I’m at the end of my 3-day cycle, I’ve gone from 3 GB to 360 MB. I’ll now extend up to 10 days of history.
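In case it helps anyone else hunt down their own culprits, this is roughly the kind of query that shows which entities write the most state rows (for the older database schema where the states table still has an entity_id column):

SELECT entity_id, COUNT(*) AS cnt
FROM states
GROUP BY entity_id
ORDER BY cnt DESC
LIMIT 20;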
It’s in Spanish, but I guess you can follow it: some analysis to optimize database size:
https://www.pacienciadigital.com/base-de-datos-sqlite-home-assistant/
Old post, yeah I know but…
My DB size jumped today and I am not sure why.
I’m usually fairly conscientious when it comes to keeping my recorder entities under control.
I’m using SQLite to do some analysis and my States don’t seem to be a huge problem. However, I can’t get any data using the Events query posted above.
SELECT
  JSON_EXTRACT(event_data, '$.entity_id') AS entity_id,
  COUNT(*) AS cnt
FROM events
WHERE
  JSON_EXTRACT(event_data, '$.entity_id') IS NOT NULL
GROUP BY
  JSON_EXTRACT(event_data, '$.entity_id')
ORDER BY
  COUNT(*) DESC;
Maybe (probably?) the database schema has changed since this guide was written. Does anyone know what the query should look like now?
Thanks.