How to reduce your database size and extend the life of your SD card

Thank you for the clear directions. Before I tackle this, does your directions work for the built in SQLite database?
I have not switch to mariaDB as it is not clear to me why it is needed. I have been waiting for the Pi4 to be able to boot from SSD and then I will reconfigure it as well as learn how to use the mariadb. It is close now as the boot from SSD is in beta.

Yes I should have made that clear. The configuration is the same for whatever database you use.

It gives a noticeable performance boost at the expense of using more RAM. Also it is a lot less prone to corruption.

2 Likes

Now to learn how to configure it. Might as well switch while learning your procedure.:grinning:

Pretty certain there’s nothing you can do about that leading whitespace being removed. If you trace the code in file notifier back into the point at which messages are handled in the core notify component you can see that it eventually renders the template using helper.async_render which does this:

So yea, that’s just going to trim the whitespace off the beginning and the end of your message every time. If you want everything to line up you’d be better off trying to remove the leading whitespace from the others. Or just add a dummy line to the top before the for loop with a non-whitespace character to workaround this.

1 Like

That makes it very difficult to cut and paste the results.

Adding two spaces to the first entity is the lesser chore.

Thanks for looking into it.

1 Like

Separate topic, but if your install is supervised on Pi OS you can move everything except the bootloader to an SSD now. I was able to follow this guide for doing so without issue. Would be nice to eliminate the SD Card completely but really just a bonus at this point, I’m effectively running everything off the SSD. Performance has improved drastically and HA is only interacting with the SSD.

There’s a thread on this and other solutions. If you are on HassOS the author later wrote up a guide for getting to a similar setup on that as well.

1 Like

Thanks for this Tom! It helped me clean up my recorder and make sure I got everything.

Now to remember to stay on top of it…

For what its worth, copying and pasting from the developer tools worked for me.

1 Like

I’m surprised no one has suggested this yet:

logbook:
  include:
    domains:
      - sensor
      - cover
      - climate
      - lock
      - binary_sensor
  exclude: !include exclude.yaml

I’ve always had a few extras for the recorder:

recorder:
  purge_keep_days: 14
  db_url: !secret database_url
  include:
    domains:
      - sensor
      - input_boolean
      - input_select
      - input_number
      - input_text
      - input_datetime
      - automation
      - cover
      - climate
      - lock
      - binary_sensor
  exclude:
    entities: !include exclude.yaml
    event_types:
      - service_removed
      - service_executed
      - platform_discovered
      - homeassistant_start
      - homeassistant_stop
      - feedreader
      - service_registered
      - call_service
      - component_loaded
      - logbook_entry
      - system_log_event
      - automation_triggered
      - script_started
      - timer_out_of_sync

I actually have it like this:

logbook: !include logbook.yaml

That way all my includes and excludes are in the same place.

Same for history and the recorder.

I usually exclude the same devices from log/recorder/history. I’ve also trimmed out some event_types from recorder as recommended long long ago.

The events table is now the largest part of my database. I’ll have to look into the effect of some of those excludes though. For example does excluding automation_triggered prevent you from using the automation’s last triggered property?

last_triggered: '2020-06-15T07:45:49.070321+00:00'
id: '1559915283056'
friendly_name: Auto Kitchen Lights Off

Looks fine to me.

1 Like

Is that basically every event besides state_changed? Did you get that list from investigating your own DB or is there a comprehensive list of all the native events somewhere? Always wondered about that but haven’t managed to find one yet.

Also do you use logbook? Seems like stripping out events like call_service, automation_triggered and script_started would hide all those events from it unless I’m misunderstanding what data logbook is displaying.

Thanks. Boot from ssd is in beta so I am going to wait for it’s release. Should be soon so for me it doesn’t make sense to start using the Bootloader method.

I’m not sure, it was a list from a very old thread, on how to lower the size of your database.

Not really, I should probably disable it at some point, but you are correct those events do not show in the logbook.

1 Like

I got the mariaDB up and running and wanted to try your guide. Where do you create the notification integration? I thought in the configuration.yaml file, but it fails here.
Thanks.

Yes, in the main configuration.yaml file. See:

I keep getting errors when I check my configuration.yaml using your code.
Reading the directions I modified the code to this:

notify:
  - platform: file
    name: entity_log
    filename: /config/www/entity_log.txt
    timestamp: false

Can I assume that this is correct and your code is missing the notify?
Thanks.
By the way, all that responded changing to MariaDB is night and day the difference. The log file is instant in Mariadb. Thanks for this suggestion.
Bill

It’s not a sensor it’s a notification service. This is not correct (or what I posted above):

notify:
  - platform: file
    name: entity_log
    file_path: /config/www/entity_log.txt
    timestamp: false

This is:

notify:
  - platform: file
    name: entity_log
    filename: /config/www/entity_log.txt
    timestamp: false
1 Like

The suggested approach is difficult to sustain as one can get overwhelmed by the number of entities to classify.

Instead of making a decision for each and every entity_id, why not look at the ‘worst offenders’ and exclude those, only. By identifying the most frequently updated entities we can significantly reduce I/O without having to micro-manage the list of exclusions.

To do so, you’d need SQL access to the database, and then use SQL queries similar to the following:

For the states table:

SELECT 
	entity_id,
    COUNT(*) AS cnt
FROM states
GROUP BY
	entity_id
ORDER BY
	COUNT(*) DESC

The events table seems to take quite a bit longer with:

SELECT 
	JSON_VALUE(event_data, '$.entity_id') AS entity_id,
	COUNT(*) AS cnt
FROM events
WHERE
	JSON_VALUE(event_data, '$.entity_id') IS NOT NULL
GROUP BY
	JSON_VALUE(event_data, '$.entity_id')
ORDER BY
	COUNT(*) DESC
5 Likes