How to keep your recorder database size under control

petro · May 2, 2022, 5:22pm

Wow, I had to look this up on discord because I was curious who you talked to and what you said. Do you think this is the best way to start a convo as your first post on discord?

TBH, it seems like you just said that to pick a fight. What do you expect?

TheHolyRoger · May 2, 2022, 5:24pm

We can talk about this in PM if you like, don’t want to derail the thread.

petro · May 2, 2022, 5:27pm

There’s nothing to talk about, I’m just baffled that you think this is positive feedback. Nothing you said in discord was positive, you just started with a sarcastic assumption. Making your previous post a lie just to make the mods look bad. I just don’t get it. Why lie?

TheHolyRoger · May 2, 2022, 5:29pm

Why not share the whole conversation rather than one message taken out of context, and intentionally leaving out the bit where I said I liked everything else in 2022.5?

Anyway, I shall endeavour to not feed the troll, despite the fact it’s a mod. Won’t say anything else about it.

petro · May 2, 2022, 5:37pm

Sure, i’ll glady highlight everything too. Lets see:

You start with a scarastic assumption.

Call beginners plebs…

Have 1 small compliment.

Continue to bash the work…

Yes, The mods here were totally uncalled for asking you to be nicer.

denilsonsa · May 2, 2022, 6:14pm

Hi, I don’t want to take sides, but the last 6 messages (7, including this one) are completely off-topic for this thread (which is already getting quite long). For the sake of keeping things on-topic, maintaining a good sinal-to-noise ratio, and making it easier to read for people in the future, I suggest moving these 7 messages (this one included) to another topic somewhere else. Thanks!

bdraco · May 2, 2022, 6:43pm

Enjoy in 2022.6.x

github.com/home-assistant/core

Avoid recording state_changed events in the events table

home-assistant:dev ← bdraco:drop_state_changed_events_table

opened 06:41PM - 01 May 22 UTC

bdraco

+151 -134

## Breaking change  - All the data needed to fetch a `stated_changed` event is now available in the `states` table (along with `state_attributes` if needed) - Reduces overall database size by ~27% - Refactors logbook to work without the need for the `state_changed` `events` rows (fetched from `states`) - Refactors purge to work without the need for linking the `state_changed` event - Origin is now stored as an int ## Type of change  - [ ] Dependency upgrade - [ ] Bugfix (non-breaking change which fixes an issue) - [ ] New integration (thank you!) - [ ] New feature (which adds functionality to an existing integration) - [x] Breaking change (fix/feature causing existing functionality to break) - [ ] Code quality improvements to existing code or addition of tests ## Additional information  - This PR fixes or closes issue: fixes # - This PR is related to issue: - Link to documentation pull request: https://github.com/home-assistant/data.home-assistant/pull/209 ## Checklist  - [x] The code change is tested and works locally. - [ ] Local tests pass. **Your PR cannot be merged unless tests pass** - [ ] There is no commented out code in this PR. - [ ] I have followed the [development checklist][dev-checklist] - [ ] The code has been formatted using Black (`black --fast homeassistant tests`) - [x] Tests have been added to verify that the new code works. If user exposed functionality or configuration variables are added/changed: - [ ] Documentation added/updated for [www.home-assistant.io][docs-repository] If the code communicates with devices, web services, or third-party tools: - [ ] The [manifest file][manifest-docs] has all fields filled out correctly. Updated and included derived files by running: `python3 -m script.hassfest`. - [ ] New or updated dependencies have been added to `requirements_all.txt`. Updated by running `python3 -m script.gen_requirements_all`. - [ ] For the updated dependencies - a link to the changelog, or at minimum a diff between library versions is added to the PR description. - [ ] Untested files have been added to `.coveragerc`. The integration reached or maintains the following [Integration Quality Scale][quality-scale]:  - [ ] No score or internal - [ ] 🥈 Silver - [ ] 🥇 Gold - [ ] 🏆 Platinum  To help with the load of incoming pull requests: - [ ] I have reviewed two other [open pull requests][prs] in this repository. [prs]: https://github.com/home-assistant/core/pulls?q=is%3Aopen+is%3Apr+-author%3A%40me+-draft%3Atrue+-label%3Awaiting-for-upstream+sort%3Acreated-desc+review%3Anone+-status%3Afailure  [dev-checklist]: https://developers.home-assistant.io/docs/en/development_checklist.html [manifest-docs]: https://developers.home-assistant.io/docs/en/creating_integration_manifest.html [quality-scale]: https://developers.home-assistant.io/docs/en/next/integration_quality_scale_index.html [docs-repository]: https://github.com/home-assistant/home-assistant.io

github.com/home-assistant/core

Speed up and isolate legacy logbook context_id query

home-assistant:dev ← bdraco:logbook_isolate_legacy_query

opened 02:10PM - 02 May 22 UTC

bdraco

+42 -8

## Proposed change  - The query to find the context_id in the old format as been moved to its own function so it can easily be removed later. - The query is now much faster for looking up context ids that were recorded before the data change over Verified ok with legacy database and `pytest -k test_context_filter tests/components/logbook` ## Type of change  - [ ] Dependency upgrade - [ ] Bugfix (non-breaking change which fixes an issue) - [ ] New integration (thank you!) - [ ] New feature (which adds functionality to an existing integration) - [ ] Breaking change (fix/feature causing existing functionality to break) - [x] Code quality improvements to existing code or addition of tests ## Additional information  - This PR fixes or closes issue: fixes # - This PR is related to issue: - Link to documentation pull request: ## Checklist  - [x] The code change is tested and works locally. - [ ] Local tests pass. **Your PR cannot be merged unless tests pass** - [ ] There is no commented out code in this PR. - [ ] I have followed the [development checklist][dev-checklist] - [ ] The code has been formatted using Black (`black --fast homeassistant tests`) - [ ] Tests have been added to verify that the new code works. If user exposed functionality or configuration variables are added/changed: - [ ] Documentation added/updated for [www.home-assistant.io][docs-repository] If the code communicates with devices, web services, or third-party tools: - [ ] The [manifest file][manifest-docs] has all fields filled out correctly. Updated and included derived files by running: `python3 -m script.hassfest`. - [ ] New or updated dependencies have been added to `requirements_all.txt`. Updated by running `python3 -m script.gen_requirements_all`. - [ ] For the updated dependencies - a link to the changelog, or at minimum a diff between library versions is added to the PR description. - [ ] Untested files have been added to `.coveragerc`. The integration reached or maintains the following [Integration Quality Scale][quality-scale]:  - [ ] No score or internal - [ ] 🥈 Silver - [ ] 🥇 Gold - [ ] 🏆 Platinum  To help with the load of incoming pull requests: - [ ] I have reviewed two other [open pull requests][prs] in this repository. [prs]: https://github.com/home-assistant/core/pulls?q=is%3Aopen+is%3Apr+-author%3A%40me+-draft%3Atrue+-label%3Awaiting-for-upstream+sort%3Acreated-desc+review%3Anone+-status%3Afailure  [dev-checklist]: https://developers.home-assistant.io/docs/en/development_checklist.html [manifest-docs]: https://developers.home-assistant.io/docs/en/creating_integration_manifest.html [quality-scale]: https://developers.home-assistant.io/docs/en/next/integration_quality_scale_index.html [docs-repository]: https://github.com/home-assistant/home-assistant.io

AleXSR7001 · May 2, 2022, 7:29pm

My sdcard and I thank you @bdraco ! Would give 2 likes if I could.

Truly looking forward to it

CaptTom · May 2, 2022, 9:36pm

Wow! It’s been a long time since I looked forward to a version update. You’re restoring my faith!

ingrove · May 4, 2022, 8:14pm

I’ve been using the following to keep my recorder information in memory to avoid excess writing to my SD card:

recorder:
  purge_keep_days: 1
  db_url: 'sqlite:///:memory:'

I upgraded to 2022.5.0 today and it looks like the in-memory option is not longer supported. Is there a alternate method to keep the recorder information in memory?

bdraco · May 4, 2022, 8:50pm

Unfortunately the in-memory sqlite does not have proper locking between threads which leads to random failures so its not longer available.

Although it is not technically supported, you can probably work around it by creating a ramdisk instead.

MichielioZ · May 4, 2022, 9:00pm

While I appreciate that the Home Assistant team is (finally) giving some attention to the excessive writing it used to do, I do feel like disabling in-memory sqlite is kind of abrupt.
Maybe it isn’t the best solution, but we are not given an alternative other than letting HA take control over the lifespan of my SSD (or SD card) again. (I’ve sacrificed 2 SD cards and 1 SSD in the past and I didn’t even want all that data)

fretwire · May 5, 2022, 4:37pm

Hi All, I also ran into this with the recent 2022.5.0 update and after playing around a bit trying to mount a ramdisk, I ended up just storing the database in /dev/shm which seems to be working.

Not sure how it’ll work over time but if you were using 'sqlite:///:memory' the following recorder config seems to get it back into memory on the 2022.5.0 release:

recorder:
  db_url: 'sqlite:////dev/shm/ha-recorder-v2.db'
  [ other options ]

I’m new to Home Assistant (fantastic software) so not sure how this will work over time. My setup (PI4/4GB/SSD dedicated to HA) has the data going to the InfluxDB add-on for longer term storage.

Hope that helps a little!

AleXSR7001 · June 2, 2022, 4:52pm

Just to be sure, is any recorder or logger config change required to fully benefit from the newly optimized data writing? I did not see anything that would suggest it in your code changes, but I won’t rule out an oversight on my end.

Also, does this change then allow to somehow purge the events table? I still get an error message when I want to empty the events table via phpmyadmin.

#1701 - Cannot truncate a table referenced in a foreign key constraint (`homeassistant`.`states`, CONSTRAINT `states_ibfk_1` FOREIGN KEY (`event_id`) REFERENCES `homeassistant`.`events` (`event_id`))

Or would I need to force empty it once because of old way of writing and can then safely empty it in the future?

dzkahn · June 2, 2022, 5:51pm

For those who want to have their database in memory, I would like to suggest this add-on, it works great without issue.

rcgbret · July 3, 2022, 11:37am

Ciao Denilson, many thanks for your contribution. I’m not only a newbie in the HA world but, in general, I’ve alos a very low background in IT topics… but I’ve a big passion and I’m curious and willing to learn. Then, as soon as I landed on your guide, I began following it. I’d say I almost succeded in the first part of it but I’m stuck in adding the scan_interval to the file-dimension sensor. I installed “file size” via the integrations user interface, after some tries (the path I was putting was not allowed or invalid) I did it adding the line

allowlist_external_dirs:
  - /config

I didn’t add

sensor:
  - platform: filesize
    file_paths:
      - /config/home-assistant_v2.db

because I input the db path during the integration set-up in the user interface.
Then, my reasoning is: if Denilson added the filesize sensor via configuration.yaml, in the sensors’ section, I’m assuming that my filesize sensor has been “written” somewhere else…so,where do I find it in order to add the scan_interval option?

Thanks for any suggestion anyone of the community will give to me!

denilsonsa · July 4, 2022, 10:29pm

Hi! The reason is simple… I wrote this guide around version 2021.4, but starting on version 2022.4, the File Size integration is now available to set up from the UI. Anything being configured through the UI is saved inside /config/.storage/, and those files are automatically managed by Home Assistant itself.

Additionally, since version 2021.6, you can disable polling updates on any integration through the UI. For such cases, if you want to change the polling interval, you have to disable the built-in automatically polling and manually create an automation to force one or more entities to update.

As you can see, there were plenty of changes, and I need to update/adapt the guide to the latest HA version. I’m just lacking time to do so, as real-life stuff gets priority. To make it worse, any time I end up dedicating to my HA installation has been updating it to the latest version and trying to debug this high CPU usage issue. I still haven’t found any solution for it, and I spent many more hours than I wanted.

rct · October 24, 2022, 8:05pm

Thanks for this guide. Here’s an updated version of the Viewing states usage query that addresses the move of attributes to the another shared table.

SELECT
  COUNT(*) AS cnt,
  COUNT(*) * 100 / (SELECT COUNT(*) FROM states) AS cnt_pct,
  SUM(LENGTH(shared_attrs)) AS bytes,
  SUM((LENGTH(shared_attrs)) * 100) / (SELECT SUM(LENGTH(shared_attrs))  FROM state_attributes) AS bytes_pct,
  entity_id
FROM states, state_attributes
WHERE
  states.attributes_id = state_attributes.attributes_id
GROUP BY entity_id
ORDER BY cnt DESC

I haven’t dug into it, but the percentage from attributes will total to more than 100% as the attributes can be shared across states.

If no one has made a Jupyter notebook available with these queries, I may publish one. It is a lot easier to dig into this data there.

mr2c12 · December 9, 2022, 8:03am

Running @rct query returned the following result:

aqara_plug_fridge is a zigbee smart plug connected to the fridge
em_channel_* are the sensors exposed
processor use and memory use are self explanatory
mijia_plug_salamusica_ng is another smart plug (in a less “important” position)

In my case, it would be great the possibility to limit the data saved “per entity”, since I’m not really interested in 10 days of fridge power consumption but 2-3 days would be enough.
I, still, would keep 10 days of records for the other entities.

Thanks to everyone in this thread for such an interesting topic!

Mosher · January 13, 2023, 10:04am

Hello all,

Thank you @denilsonsa for this guide.
I have read this thread many times but still have problems with my Database.
I had an install of HA with final Database size of 53GB. I have decided to make a fresh install and restored my back up. After reinstall my Database grew 1.1GB in first day. So i followed your guide and identified main causers of this. Afterwards I have applied filter to my configuration.yaml and issued a purge+repack.
Now my Database is growing slower but still like 200MB/day.
After applying Purge&Repack my Database show NULL for all Values. Is there a way to fix it?

States query result

Recorder configuration:

recorder:
  exclude:
    domains:
      - device_tracker
      - media_player
      - uptime
      - time_date
      - worldclock
    entity_globs:
      - sensor.clock*
      - sensor.date*
      - sensor.glances*
      - sensor.load_*m
      - sensor.time*
      - sensor.uptime*
      - sensor.home*
      - number.office*
      - select.office*
    entities:
      - sensor.home_assistant_v2_db
      - weather.openweathermap
      - climate.office_thermostat
      - lock.office_thermostat_child_lock

I would appreciate any help.

Thank you in advance