Reducing wasted storage and unnecessary SD/SSD wear

Most HA installations run on Raspberry Pis with either an SD card or an SSD as storage. This matters because disk space may be limited, the life expectancy of these devices is reduced by the number of write cycles, and large log files and databases slow down performance because system resources are limited.

One would therefore expect every effort to be made to avoid generating data and committing it to disk, but unfortunately the opposite is true. On installation the default for the HA core and integrations is to log “info” level data, but a better approach would be to initially log errors only. Users can then expand the logging when needed to include additional (debug) info for the specific component where they encounter problems. As things stand, users must search through forums and documentation to find ways to limit data generation, and a large percentage may not even realize that there are workarounds available to mitigate some of the problems.

Another concern is that the HA database is de-normalized, wasting a lot of space by duplicating the same data over and over. Take the states table for example. Currently (HA version 0.118) every record for a sensor reading in the states table contains a large chunk of text with the sensor configuration (e.g. unit of measure, icon, friendly name etc). This type of configuration detail should be stored in a dedicated entity configuration table, so that each record inserted into the states table only contains data about the actual state change that occurred (e.g. entity_id, timestamp, newvalue, …). Something similar also applies to the events table.

That the data volume is recognized as a problem can be seen in the functionality that has been released to get rid of data, e.g. purging the database, keeping only the last x number of records in logs etc. But this seems focused on addressing the symptoms rather than preventing the problem in the first place. By the time the data is cleaned up, it has already been written to disk and has already counted against your DWPD / TBW.

The question is what we as users can do to reduce the pain and help protect our systems.
Any thoughts and advice you can share based on your experience?

Well, given that Home Assistant hasn’t had its v1.0 release, I don’t think it’s unreasonable to use a verbose log level.
We have a lot of inexperienced users and telling them to increase the log level after something failed might not be the best idea since some integrations aren’t super stable.
If you want to reduce the wear on your SD, you can reduce the log level.
Set up configuration for the logger and choose fatal or even critical.
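
A minimal sketch of such a logger block, assuming the standard `logger` integration in configuration.yaml. The per-integration override is only an illustration (MQTT is just an example); substitute whichever component you are actually troubleshooting:

```yaml
# configuration.yaml
logger:
  # Log only the most severe messages by default.
  # As far as I know, "fatal" and "critical" map to the same severity.
  default: critical
  logs:
    # Illustrative override: temporarily raise one integration to debug
    # while investigating a problem, then remove it again.
    homeassistant.components.mqtt: debug
```

This keeps the quiet default while still letting you turn up detail for a single component when something fails.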

However, the logs don’t really consume that much data; it shouldn’t be more than a few kB every day.
It’s mostly the database.
Have a read through the Recorder docs, especially the first Note.

  • Increase the commit_interval
  • Use filters (use a blocklist approach with exclude or an allowlist approach with include)
  • If you wanted to you could offload the database entirely to another machine.

But there’s only so much you can do, and obviously you should still prepare for the worst-case scenario. Take regular snapshots to a backup machine, so that setting up a new SD card is as simple as flashing it once and restoring the snapshot.

Below is a flow in Node Red to purge and repack the HA database.
With its current settings it will run daily, at 1am, and purge the data with a retention of 7 days.
On completion it sends a notification to HA, and also writes an entry in the HA logbook. Remove the nodes for the notifications you don’t want.

[{"id":"63595202.9b5c4c","type":"inject","z":"7f52f4e2.25e814","name":"Do Purge & Repack","props":[{"p":"payload"},{"p":"topic","vt":"str"}],"repeat":"","crontab":"","once":false,"onceDelay":0.1,"topic":"","payload":"","payloadType":"str","x":170,"y":900,"wires":[["1cf76219.6c9a4e"]]},{"id":"1cf76219.6c9a4e","type":"api-call-service","z":"7f52f4e2.25e814","name":"Purge (7 days)","server":"112d9be2.b9d714","version":1,"debugenabled":false,"service_domain":"recorder","service":"purge","entityId":"","data":"{\"keep_days\": \"7\"}","dataType":"json","mergecontext":"","output_location":"payload","output_location_type":"msg","mustacheAltTags":false,"x":420,"y":940,"wires":[["86d6a0f2.8c265","b0f83d37.f6525"]]},{"id":"86d6a0f2.8c265","type":"debug","z":"7f52f4e2.25e814","name":"PURGE","active":false,"tosidebar":true,"console":false,"tostatus":false,"complete":"payload","targetType":"msg","statusVal":"","statusType":"auto","x":620,"y":1000,"wires":[]},{"id":"b0f83d37.f6525","type":"delay","z":"7f52f4e2.25e814","name":"","pauseType":"delay","timeout":"10","timeoutUnits":"seconds","rate":"1","nbRateUnits":"1","rateUnits":"second","randomFirst":"1","randomLast":"5","randomUnits":"seconds","drop":false,"x":580,"y":900,"wires":[["b4e17d86.5d4e1"]]},{"id":"3e60b9e.d348d46","type":"bigtimer","z":"7f52f4e2.25e814","outtopic":"","outpayload1":"on","outpayload2":"","name":"HA Purge - 01:00am","comment":"","lat":"52.3676","lon":"4.9041","starttime":"60","endtime":"75","starttime2":"0","endtime2":"0","startoff":"0","endoff":"0","startoff2":"0","endoff2":"0","offs":0,"outtext1":"","outtext2":"","timeout":1440,"sun":true,"mon":true,"tue":true,"wed":true,"thu":true,"fri":true,"sat":true,"jan":true,"feb":true,"mar":true,"apr":true,"may":true,"jun":true,"jul":true,"aug":true,"sep":true,"oct":true,"nov":true,"dec":true,"day1":0,"month1":0,"day2":0,"month2":0,"day3":0,"month3":0,"day4":0,"month4":0,"day5":0,"month5":0,"day6":0,"month6":0,"day7":0,"month7":0,"day8":0,"month8":0,"day9":0,"month9":0,"day10":0,"month10":0,"day11":0,"month11":0,"day12":0,"month12":0,"d1":0,"w1":0,"d2":0,"w2":0,"d3":0,"w3":0,"d4":0,"w4":0,"d5":0,"w5":0,"d6":0,"w6":0,"xday1":0,"xmonth1":0,"xday2":0,"xmonth2":0,"xday3":0,"xmonth3":0,"xday4":0,"xmonth4":0,"xday5":0,"xmonth5":0,"xday6":0,"xmonth6":0,"xd1":0,"xw1":0,"xd2":0,"xw2":0,"xd3":0,"xw3":0,"xd4":0,"xw4":0,"xd5":0,"xw5":0,"xd6":0,"xw6":0,"suspend":false,"random":false,"randon1":false,"randoff1":false,"randon2":false,"randoff2":false,"repeat":false,"atstart":false,"odd":false,"even":false,"x":160,"y":960,"wires":[["1cf76219.6c9a4e"],[],[]]},{"id":"b4e17d86.5d4e1","type":"api-call-service","z":"7f52f4e2.25e814","name":"Purge - Repack","server":"112d9be2.b9d714","version":1,"debugenabled":false,"service_domain":"recorder","service":"purge","entityId":"","data":"{\"repack\": \"true\"}","dataType":"json","mergecontext":"","output_location":"","output_location_type":"none","mustacheAltTags":false,"x":760,"y":940,"wires":[["98a6e148.976c7","fbedafca.0f9d"]]},{"id":"98a6e148.976c7","type":"function","z":"7f52f4e2.25e814","name":"Prep Notification","func":"var runtime = new Date().toLocaleString();\nvar finalmsg = msg.payload.domain + \".\" + msg.payload.service + \"\\n\\n\" + runtime + \"\\nDatabase purged with a retention period of \" + msg.payload.data.keep_days + \" days.\" \n\nvar newpayload = {\n  \"data\":\n  {\n    \"message\": `${finalmsg}`,\n    \"title\": \"HA Database Purge\"\n  }\n}\n\nmsg.payload = newpayload\nreturn msg","outputs":1,"noerr":0,"initialize":"","finalize":"","x":980,"y":1000,"wires":[["a3fde5e3.c7ad88"]]},{"id":"fbedafca.0f9d","type":"function","z":"7f52f4e2.25e814","name":"Prep Log Message","func":"var topic = msg.topic;\n\nvar payload = { \n    \"data\":\n    {\n        \"domain\": msg.payload.domain,\n        \"name\":   `${topic}`,\n        \"message\": \"Database purged with a retention period of \" + msg.payload.data.keep_days + \" days.\"  \n    }\n};\n\nmsg.payload = payload\nmsg.topic = \"System Backups\"\nreturn msg\n","outputs":1,"noerr":0,"initialize":"","finalize":"","x":990,"y":940,"wires":[["7d867841.cdf508"]]},{"id":"a3fde5e3.c7ad88","type":"api-call-service","z":"7f52f4e2.25e814","name":"Notify HA (Persistant)","server":"112d9be2.b9d714","version":1,"debugenabled":false,"service_domain":"notify","service":"persistent_notification","entityId":"","data":"{}","dataType":"json","mergecontext":"","output_location":"","output_location_type":"none","mustacheAltTags":false,"x":1220,"y":1000,"wires":[[]]},{"id":"7d867841.cdf508","type":"api-call-service","z":"7f52f4e2.25e814","name":"HA Logbook","server":"112d9be2.b9d714","version":1,"debugenabled":false,"service_domain":"logbook","service":"log","entityId":"","data":"{}","dataType":"json","mergecontext":"","output_location":"","output_location_type":"none","mustacheAltTags":false,"x":1190,"y":940,"wires":[[]]},{"id":"fd694d23.a8465","type":"comment","z":"7f52f4e2.25e814","name":"Purge and Repack HA database","info":"- Use a BigTimer as scheduler\n- Scheduled to purge at 1am on Sunday\n- Set to purge 7 days data\n- Set Longitude and Latitude for your location!\n- Sends notification to HA on completion\n","x":150,"y":840,"wires":[]},{"id":"112d9be2.b9d714","type":"server","name":"Home Assistant","legacy":false,"addon":false,"rejectUnauthorizedCerts":true,"ha_boolean":"y|yes|true|on|home|open","connectionDelay":true,"cacheJson":true}]

This can also be done manually from within HA.
Developer Tools -> Services:

  1. Service: recorder.purge
     • Data: {"keep_days": "7"}
  2. Service: recorder.purge
     • Data: {"repack": "true"}
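
For anyone not running Node-RED, the same nightly schedule can probably be expressed as a plain HA automation. This is an untested sketch: it assumes `recorder.purge` accepts `keep_days` and `repack` together in a single call, and the alias is made up for illustration:

```yaml
# configuration.yaml (or automations.yaml without the top-level key)
automation:
  - alias: "Nightly recorder purge and repack"   # hypothetical name
    trigger:
      # Fire once a day at 01:00 local time
      - platform: time
        at: "01:00:00"
    action:
      # Drop everything older than 7 days, then repack the database file
      - service: recorder.purge
        data:
          keep_days: 7
          repack: true
```

Combining both options in one call avoids the fixed 10-second delay between the purge and the repack that the Node-RED flow uses.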

Thanks for that extensive reply.

I already set the logger default to error in configuration.yaml, but it had no effect on e.g. DNS (Supervisor > Logs > Log Provider = DNS). I also set log levels for specific integrations, and it works great, but I don’t know the namespace for DNS.

I’m running HA in an IOTstack Docker configuration using an RPi and SSD, so recovery is more involved than simply flashing an SD card with HASSIO. But the financial and time investment aside, it would be nice to have a reasonably reliable home automation system in place, and not knowingly hammer it to death.

Below is my database growth for the past 24 hours. It is quite small, as I only recently started with HA, but I also only keep a rolling window of 7 days of data, and I make an effort to exclude all kinds of data that has no meaning (to me).

Two things…

  1. I run a purge-and-repack at 1am, and this can be clearly seen in the big vertical drop.
  2. I added a recorder section to configuration.yaml, and its automatic purge runs by default every night at 04:12, as can be seen by the flat line pointed to by the green arrow.

The database size is captured with the following entry in configuration.yaml:

sensor:
  - platform: filesize
    file_paths:
      - /config/home-assistant_v2.db

Unwanted data is excluded with the following:

recorder:
  commit_interval: 30
  purge_keep_days: 7
  purge_interval: 1
  include:
    domains:
      - switch
      - sensor
      - binary_sensor
  exclude:
    domains:
      - homemonitor
      - persistent_notification
      - input_number
      - input_text
      - input_boolean
      - sun
      - person
      - media_player
      - weather
      - updater

Edit: I updated my initial post and changed the commit_interval to 30, to reflect the recommendation made in the HA recorder doc. My own setting is currently 10 due to other reasons, and will most likely change towards 30 over time.

Ah, thanks for sharing the filesize sensor - I’d not thought of that :slight_smile:

As @fedot mentions, the Recorder docs recommend raising commit_interval to 30 seconds, so I’d be interested to know why you’ve picked 10… performance??

Personally, I think it’s a tuning parameter that everyone needs to consider for their own system… a quiet system could write every minute, but a busy system might need to write every 2 seconds… (unless I’m mistaken :slight_smile: )

I also prune out sensors that I don’t care about… binary sensors will only write once per state change (I believe… that’s what most sane historians do :wink: ) so you’d save more SD writes by disabling 1 sensor than 100 binary ones. Do you care about a sensor’s WiFi signal strength (for example)… that’s almost a random number generator… so exclude those sensors too…

So, my settings are: (I have excluded more sensors… you don’t need to read them all…)

recorder:
  purge_keep_days: 14   # Keep 2 weeks in history after purging @ 04:12
  commit_interval: 15   # Write to disk every 15 seconds
  exclude:
    domains:
      - automation
      - weblink
      - updater
      - persistent_notification
    entities:
      - sensor.last_boot
      - sensor.date                 # Don't really need to record the date...
      - sensor.time                 # ... or the time
      - sensor.office_fan_esphome_version
      - sensor.office_fan_uptime
      - sensor.spare_socket_uptime
      - sensor.isp_ping_avg
      - sensor.isp_ping_max
      - sensor.isp_ping_min
      - sensor.hacs     

I see you’ve excluded more domains than I have… I’ll need to take a look at this some more :slight_smile:
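
For noisy readings like the WiFi signal strength mentioned above, listing every entity by hand gets tedious. A possible shortcut, assuming your HA version's recorder filter supports `entity_globs`; the patterns below are hypothetical and depend entirely on how your own entities are named:

```yaml
recorder:
  exclude:
    entity_globs:
      # Hypothetical naming patterns; check your own entity IDs first
      - sensor.*_rssi      # WiFi signal strength readings
      - sensor.*_uptime    # uptime counters that change constantly
```

A glob also covers devices added later, as long as they follow the same naming scheme.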

@AutoT With that 10-second recorder commit I’m carefully feeling my way through it, I will gradually increase the interval if my setup remains stable. Reason for the concern is because I have implemented “log2ram” on my RPi, and I’m not clear on how the sensor readings are buffered in memory before they’re committed to the db and what I need to tune to prevent issues.

I have currently also disabled the WiFi sensors of the Sonoff switches that I flashed with Tasmota. I did use the RSSI readings from a Sonoff Mini for a couple of weeks though when I put it in the garden shed to control garden lights, so that I could monitor the stability of the WiFi signal over time. Interestingly enough, the signal strength improved at night after sunset, while I expected it to deteriorate because of increased activity and interference from neighbors. But nowadays due to the virus perhaps normal behavior patterns are not all that normal anymore.

Something else that can be considered to reduce SD card wear and increase its life expectancy is implementing log2ram where possible.

log2ram is a Linux tool that creates a RAM disk in memory where logs are initially written, and then periodically writes to disk the data that needs to be made persistent at that point. Home Assistant for instance claims to only keep the most recent 50 system logs (configurable), so this may reduce disk transactions significantly under certain conditions.

This site gives a nice explanation of the vulnerability of SD cards, and how log2ram can help to extend their life expectancy.

Andreas Spiess made a great video on how to set up a RPi Docker configuration using IOTstack, and it also covers SD card wear and log2ram (10:10).

Yes, the database is a major problem and will continue to be a ticking time bomb in almost every HA installation until it is set up as a proper relational database, normalized to prevent the repeating data as you mention. Until then, tuning the recorder and all other mitigations are just delaying the inevitable.

My HA database right now sits at 18GB, generated by about a hundred sensors on a 7 day purge cycle.

By contrast, my Zabbix database at the office monitors 100,000 items, receives about 1000 values per second, stores data for 1 year and is about 20GB. And it doesn’t take 7 minutes to display a graph.

I think HA is great, and I commend the developers and those who put their time and resources into the project. But the database is an absolute disaster and will be the single thing that holds the project back forever.

I agree. That’s why I switched to an RPi 4 8GB booting from SSD.
I just have a small question: is it necessary to have db_retry_wait set up, or what is it actually used for?
I didn’t quite understand the translation.

It isn’t necessary to have db_retry_wait and db_max_retries set up, as the defaults of 3 and 10 should work for most setups. I’ve not bothered to set up mine as MySQL and HA live on the same VM, and systemd takes care of starting MySQL before HA does. But if they live on different boxes you need to consider the scenario of a power cycle.

In that case it is probably best to set db_retry_wait to a longer value like 30 seconds, so HA has plenty of time to wait for the database to start up.
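
As a rough sketch of that split-box scenario, the recorder section might look something like this. The host name, user, password and database name in `db_url` are placeholders, not values from this thread:

```yaml
recorder:
  # Placeholder connection string; replace user, password, host and
  # database name with your own
  db_url: mysql://ha_user:ha_password@db-host.example/homeassistant?charset=utf8mb4
  db_max_retries: 10   # number of connection attempts before giving up
  db_retry_wait: 30    # seconds to wait between attempts
```

With these values HA would keep trying for roughly five minutes after a power cycle before giving up on the database.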

Okay, thanks for the clarification. Everything is running on one RPi 4 8GB HassOS machine.