History/Logbook very slow

Virtually all of the benchmarks in the link you provided are for image processing. That’s not relevant to Home Assistant’s performance.


That might be true; it was just to show that it is generally way more powerful than the Pi (including for Home Assistant, as the Pi’s storage interface is slow too).

L1, L2, and L3 cache is one of the most important specifications for database lookups, and it’s not great on any of these devices. That’s all I’m saying. And the Pi will perform better in this category, which is still not very good.

So prove away, but all you’re doing is lying to yourself. It won’t get better as long as you stay on that device. Even if you move the database to another unit, it will still be slow because your device is still managing that data. Good luck.

Which may mean nothing in real-world applications because they are influenced by operating-system optimizations (or lack thereof) and other architectural constraints.

Watch this video comparing the startup time of GIMP on RPi 3 (not 3+), TinkerBoard and XU4.

If you rewind the video, you’ll see a CPU benchmark test showing the XU4 is ~3 times faster than the RPi 3. Nevertheless, in real-world applications, subject to reading/writing to storage and using OS services, it was slower.

It’s a one-year-old video and many things have undoubtedly changed since then, but the moral of the story is to avoid relying exclusively on CPU benchmarks.


If someone wants to test whether a remote MariaDB makes the situation better, I can create a couple of databases and also give access via phpMyAdmin.

That is true for the XU4 with an SD card, but not for the HC2 with the OS on an HDD connected via SATA.

But whatever, everyone here can believe what they want. I didn’t want to start a platform discussion; I was trying to get constructive ideas on how to improve DB performance, or workarounds, even if that means using a secondary database and keeping only a few days in the hass db.

I’ve gone through this process with my Pi and NUC, and this is what I noticed:

Database: ~2 GB at the time. My database is 5 GB now; I forgot I have it holding a month of data.

  • Pi with MariaDB database on the Pi: ~5 seconds to load a single day; 30+ seconds for a week.
  • Pi with MariaDB database on a Synology server (with not-great specs): ~2 seconds for a single day; 30+ seconds for a week.
  • i5 NUC with MariaDB database on a Synology server (with not-great specs): instant for a single day; 1 second for a week.

Thanks for sharing your stats.
Maybe I should look into optimizing my MySQL setup.
After running some benchmarks, it is clear my setup is really slow. There seems to be too much background load on my device.

I also limit my database to only sensors that I care about. For example, I don’t track the outside weather in my database. There are plenty of phone apps that do that. I only track data like CPU temperature, Network ping times, device_tracker locations, light status, etc. I don’t really track other things because I’ve noticed over the years that I’ve never looked at that data. So why clutter your db with it?
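In recorder terms that looks something like this; a minimal sketch (the entity names below are illustrative, not my exact config):

recorder:
  # Only record the domains/entities you actually look back at
  include:
    domains:
      - light
      - device_tracker
    entities:
      - sensor.cpu_temperature
      - sensor.network_ping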


I think the history page (especially the graphs) is mainly done in JS, and maybe that is why it’s so slow?
Why should MariaDB be so much faster than MySQL?

Same problem here. My system:

  • Synology DS1019+ (Intel Celeron J3455 @ 1.5 GHz, 8 GB RAM)
  • HASS dockerized
  • MariaDB 10 (Synology Package)
  • purge_keep_days: 3
  • All databases have about 500,000 entries, only 800 MB altogether

But still, the log and the history are both painfully slow. I’m not a database expert, but simple SQL queries against the database are quite fast, while the HASS log and history are almost unusable…
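For completeness, the recorder points HASS at the Synology MariaDB roughly like this (user, password, host and port below are placeholders; the Synology MariaDB 10 package usually listens on 3307):

recorder:
  purge_keep_days: 3
  # Placeholder credentials/host; adjust to your Synology setup
  db_url: mysql://homeassistant:PASSWORD@192.168.1.10:3307/homeassistant?charset=utf8mb4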

Any help very much appreciated!

Well, just since it hasn’t been mentioned yet (and I learned it the hard way):

If your database is oddly large, you should address that first, and performance will be back to usual. One common cause is an entity that updates its state VERY frequently, which results in a ton of data being stored in the database every day.

In the Home Assistant UI, head to Developer Tools > Events, enter state_changed in the “Listen to events” field and hit “Start Listening”. You’ll get a live list of every entity that sends a state update. If there are a lot of updates (especially a lot from a single entity), that’s a huge amount of data being generated.
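For a sense of scale, a single state_changed event carries roughly the payload below (trimmed, shown as YAML; the entity and attributes are made up for illustration), and the recorder persists all of it as JSON:

event_type: state_changed
data:
  entity_id: sensor.power_meter      # hypothetical chatty entity
  old_state:
    entity_id: sensor.power_meter
    state: "41.7"
    attributes:
      unit_of_measurement: W
      friendly_name: Power Meter
    last_changed: "2019-06-10T18:00:01+00:00"
    last_updated: "2019-06-10T18:00:01+00:00"
  new_state:
    entity_id: sensor.power_meter
    state: "43.2"
    attributes:
      unit_of_measurement: W
      friendly_name: Power Meter
    last_changed: "2019-06-10T18:00:02+00:00"
    last_updated: "2019-06-10T18:00:02+00:00"

Multiply that by a chatty entity and it adds up fast.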

In my case, I have a Z-wave metered outlet that sends an update “whenever there is a significant change in the consumed power”. This means I can easily have 10 updates in a second (depending on the power consumption variation). This resulted in a database that would easily go over a gigabyte in size and terrible history performance.

My “solution” (you can also call it workaround) was to exclude this data from the recorder with the following configuration entry:

recorder:
  purge_keep_days: 2
  exclude:
    entities:
      - switch.pc_power_33

With “switch.pc_power_33” being the offending entity. This means I don’t get any history for it, but since I’m only using it as a metered outlet (and I can still see the live consumption in the interface), I’m happy with that. It’s better than waiting several seconds for the Logbook or graphs to open.

I know it’s “harsh” to throw away this data (I’m a data hoarder myself), but this is one way to deal with increasingly large databases. Another would be to fine-tune the setup so that the device itself (or the integration, if possible) doesn’t generate/send these updates so often.

If you still need the data but want to reduce the amount that gets recorded, read on (I haven’t tried it myself, but it should work):

It should be possible to use an intermediate template sensor with some fancy update logic to limit the amount of data stored to, say, one measurement per minute or per five minutes. This is mentioned on the following documentation page:

This is the suggested example for a template that only updates every 5 minutes:

sensor:
  - platform: template
    sensors:
      nonsmoker:
        # Days since the quit date, recalculated only on demand
        value_template: "{{ (( as_timestamp(now()) - as_timestamp(strptime('06.07.2018', '%d.%m.%Y')) ) / 86400 ) | round(2) }}"
        # Empty list: no entities to watch, so no automatic updates
        entity_id: []
        friendly_name: 'Not smoking'
        unit_of_measurement: "Days"

automation:
  - alias: 'nonsmoker_update'
    trigger:
      # Force a recalculation every 5 minutes
      - platform: time_pattern
        minutes: '/5'
    action:
      - service: homeassistant.update_entity
        entity_id: sensor.nonsmoker

I hope my input helps some of those with performance issues. For reference, I was previously keeping 5 days of data and that resulted in a 1.2 GB database. I’m now storing 2 days with a 20 MB database, and I’m about to increase that again, because I’ve confirmed that this entity was indeed the culprit and my database won’t grow wildly anymore.

Note: I’m also running HA in a Docker container on a Synology DS-718 using the built-in database.


Usually you can turn that off. I did in my Aeotec HEM device, and just have it report every XX seconds.

The fundamental issue here, which is unsolvable without rewriting the HA database component, is that HA uses a relational database like a flat text file.

For example, the “events” table contains the event log with an ID, a type (i.e. “state_changed”), time_fired, and some other columns. All good, but the real issue here is the “event_data” column, which contains a JSON dump of everything to do with the event: entity_id, old_state and all its attributes, new_state and all its attributes, date stamps, context ID, etc.

Looking at my sensor.hass_uptime entity, there are 756 bytes of encoded JSON data within that field.

This type of record is inserted with every event change in HA unless it has been excluded in the recorder config.

The “states” table is much the same. Any state change is recorded (unless excluded) and all attributes encoded as long JSON strings.

The consequence of this is that querying and displaying the data involves a lot of string/JSON processing, which completely blows away the concept of a relational database. Instead, the engine is used as a data dump and queries are essentially processed in Python.

There should be a separate “attributes” table which contains all the attributes stored as their proper native data types. Let the database engine optimize and decide how to store them. Then use IDs to match attributes to records as one would expect in a relational DB.

Queries would then be processed by the database engine, and the data sent to HA as record objects that the libraries were designed to work with. The database would become an asset to HA and greatly increase its utility as a data recording and display platform, instead of a liability that bloats, slows down the software, and makes recording long-term data impossible.

Wish I had the Python skills to rewrite it.


The only viable option for me was to exclude the whole sensor domain and then specifically include the entities I really need…
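In configuration terms, roughly this (as far as I understand the recorder’s filter rules, included entities win over an excluded domain; the included entity below is only an example):

recorder:
  exclude:
    domains:
      - sensor                      # drop the whole chatty sensor domain...
  include:
    entities:
      - sensor.cpu_temperature      # ...then whitelist only what you really need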

I completely agree with AaronCake that the main problem is a design issue in the database structure of HA’s recorder component.

Unfortunately I also do not have the Python skills to rewrite it…

I have a similar setup to Tom7320, with MariaDB running on a Synology NAS with a dedicated SSD volume. I only excluded 2 particular sensors that were flooding my log with frequent updates (an update every 10 seconds, triggered by a script, that I did not want to see in the log). I can load a full daily log in under 12 seconds. History graphs (which, in contrast to the log, are limited to 15 entities shown as bars and 29 entities shown as graphs) load in under 1 second for a single day, 4 seconds for 3 days, and up to ~16 seconds for 1 week of data. In the recorder I limited history to one full month (31 days) and I’m purging the history every day. My database grew over time to ~25 GB and, once it reached 31 days of history, stays relatively flat (growing only as I add more entities to the configuration). I find this performance very acceptable.
Now, what is the speed of your system? How long does it take to load the history graphs or the log?
In particular, if I were trying to optimize speed, I’d limit or eliminate bar graphs that show frequently changing entities. I found these to make a huge difference: after adding just one bar representing home presence (the combined output of ~20 motion sensors across my house, hundreds of changes per day), I saw a 4x drop in history performance!
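For what it’s worth, the history page accepts the same include/exclude filters as the recorder, so a chatty entity can stay recorded but be kept off the graphs; a sketch with a made-up entity name:

history:
  exclude:
    entities:
      - binary_sensor.home_presence   # hypothetical combined presence entity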

Would be a good issue write-up for the GitHub guys :wink:

After 1 week, and excluding frivolous sensors (e.g. some dark_sky sensors), my DB hovers around 4 GB and I don’t dare attempt to use the Logbook or History menus :frowning:

Below is the device I’m using, and it doesn’t seem possible, unfortunately. Otherwise yeah, that would probably be a much easier approach. Thank you for the input. :slight_smile:

Erhmmm… :wink: I don’t own that device, but in the manual you linked, there are two settings for the time interval. The first two parameters:

Parameter 1: Watt Meter Report Period

If 1 hour is set (target value = 720), the Home Control Smart Metering Plug reports its current power consumption to the Node ID of Group 1 every hour. The maximum reporting period is ~45 hours (32767 × 5 s / 3600 ≈ 45 h).
Size: 2 Byte, Default Value: 720

Setting: 1 – 32767 (report period, in units of 5 s)

Parameter 2: KWh Report Period

If 1 hour is set (target value = 6), the Home Control Smart Metering Plug reports its accumulated power consumption to the Node ID of Group 1 every hour. The maximum reporting period is ~227.55 days (32767 × 10 min / 1440 ≈ 227.55 days).
Size: 2 Byte, Default Value: 6

Setting: 1 – 32767 (report period, in units of 10 min)

The defaults are 1 hour so it shouldn’t be flooding you.
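And if you do want to change the interval from HA rather than through the device, the zwave integration exposes a service for setting such parameters (zwave.set_config_parameter); a sketch with a placeholder node ID, using the 1-hour default from the table above:

# Developer Tools > Services; node_id is a placeholder for your plug
service: zwave.set_config_parameter
data:
  node_id: 12        # placeholder
  parameter: 1       # Watt Meter Report Period
  value: 720         # 720 × 5 s = 1 hour
  size: 2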