Steps to reduce Write Cycles and extend SD/SSD life expectancy


Intro

SD cards, and to a lesser extent SSDs, have a finite number of write cycles and will fail at some point. Your mileage will vary based on the quality of the storage you’re using, and of course how busy your system is. But one thing can be said, and that is that Home Assistant does very little to help reduce the data being written to disk.

This write-up is about possible actions you can take to reduce the logging and data flows wearing out your storage. Some are easy and others more involved, so choose based on your confidence and skill levels, and also to what extremes you want to go to reduce those write cycles.

Concerns

  • The HA (Home Assistant) Logger filtering currently (March 2021) only applies to the HA core. Other components, especially hassio_supervisor and hassio_audio spam the logs with debug and info level messages all the time, and from within the app there is currently no way to stop this from happening.
  • Although Docker supports STDOUT and STDERR logging streams with resulting log level messages, it seems HA is currently logging all output to STDERR.
    This means we can’t make use of standard filtering in downstream systems to eliminate unwanted data.
  • Because currently all HA messages are logged as errors, it is not possible to run a daily report to “show me all errors that occurred in my system today”, because 99% of those errors are not errors.

There are tickets to correct this behavior for hassio_supervisor and hassio_audio. Unfortunately there is no joy, so I submitted a feature request for hassio_supervisor to filter out its own debug statements.
(You may want to upvote these changes, to show their importance to the user community)

Log Overview

HA runs in Docker containers, using the journald log driver to process messages published on STDERR (and STDOUT), where it is picked up by systemd-journald and put in the system journal. The messages are then passed on to rsyslog, where it is routed to the different log files in e.g. /var/log, based on the rsyslog filtering rules.


Rules of Engagement

When implementing changes to the system, whatever we do, we want to maintain stability and be able to revert to a working state should things go south.
So:

  • Make a backup of every (system) file before you make changes.
  • Apply changes incrementally, and test in between.
  • Don’t compromise your security. E.g. don’t disable Samba logs if anybody else can access your RPi and you want to monitor (failed) connections.
  • KEEP A SYSTEM CHANGE LOG !!
    Write up everything you change, and also the time it was (effectively) implemented.
    • If something does not work (anymore), then you know what you changed, and when.
      All those sudden errors may or may not even be caused by your change, but you can only make the link if you have this information available.
    • In future you may want to undo some changes, to expand logging again to help troubleshoot that new problem.
    • New functionality may render your change useless, and you may want to undo it to reduce load on the system (e.g. rsyslog filtering).


Now with all that out of the way, it’s time to roll up the sleeves and get our hands dirty…

A. Home Assistant

1) Logger

Logger configuration. The first step is to make sure HA outputs the minimum of logging to the downstream systems.
In configuration.yaml, apart from other filtering settings, set the following:

logger:
  default: critical
  logs:
    # log level for HA core
    homeassistant.core: fatal

It seems logger filtering for other components, like hassio_supervisor and hassio_audio is not yet implemented.

2) Database

It is possible to change the default location of the HA database. This means you can e.g. move it to a USB drive if you’re still on SD. Or you can put it in memory (which is volatile, so you will lose its contents when you reboot), and then extract only the reduced data set that you really want to a persistent database.
Some users extract data from the in-memory HA database to an online BigQuery database.
I have no experience with this, only mention it here for the sake of completeness.

# Put the SQLITE db in memory.
recorder:
  db_url: 'sqlite:///:memory:'

3) Recorder

Recorder configuration. Keep the HA database small:

  • Only record what you really need. You can explicitly include, or explicitly exclude, or a combination of both.
  • Purge on a regular basis (the default is daily at 04:12am).
  • Keep a small rolling set of data (e.g. 7 days).
# Capture and log data to the HA database
recorder:
  commit_interval: 30
  purge_keep_days: 7
  #purge_interval: 1        # obsolete, replaced by "auto_purge" (default: true) 
  include:
    domains:
      - sensor
      - ....				# domains to record
    entities:
      - sun.sun				# specific entities to record, where the domain may be excluded 
  exclude:
    domains:
      - homemonitor
      - updater
      - ....				# domains to exclude from recording
    entities:
      - ....  				# specific entities to exclude, while the domain may be included
    entity_globs:
      - sensor.epson*		# groups of sensors by using a wildcard

What can be shown in the History and Logbook views is determined by the entities that are logged in the database, but you can exclude events and entities from these views to remove the noise and only show what is relevant to you. (And it is possible to add custom events to the Logbook using the Logbook service).
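As a sketch of what such view filtering could look like (the entity and domain names below are placeholders, not taken from a real config), the Logbook and History integrations accept the same include/exclude style as the Recorder:

```yaml
# Hypothetical example: entities stay in the database (Recorder),
# but are hidden from the Logbook and History views.
logbook:
  exclude:
    entity_globs:
      - sensor.epson*
history:
  exclude:
    domains:
      - automation
```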

4) HACS

HACS is nice and all that. But if you don’t use any custom components, then (for now) don’t install HACS just for the fun of it. It downloads a lot of stuff from Github, and keeps on updating it every 30 (!) seconds. And while doing this it writes debug statements to journald.


B. Node-Red

If you are running Node-Red in a container (and not as add-on within HA), make sure it is not performing a health check like every 30 seconds.
If yes, then you may want to rebuild your image with a "HEALTHCHECK NONE" parameter.
Use “Docker Events” (see below) to monitor the events raised by running containers.
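A minimal sketch of both options follows; the base image name is an assumption (adjust to whatever image you actually run):

```dockerfile
# Hypothetical rebuild that strips the upstream health check
FROM nodered/node-red:latest
HEALTHCHECK NONE
```

Alternatively, starting the container with `docker run --no-healthcheck ...` disables the health check without rebuilding the image.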


C. Linux Host

1) File System

Make sure “noatime” is set for the file systems defined in /etc/fstab. This is to prevent updating file metadata when a file is (only) accessed for reading.
Requires reboot.
e.g.

 PARTUUID=f3557908-01  /boot           vfat    defaults,noatime  0       2
 PARTUUID=f3557908-02  /               ext4    defaults,noatime  0       1
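To verify which mount options are actually in effect (and, if you want to avoid waiting for the reboot, to apply noatime on the fly), something like the following should work; the remount line assumes the fstab entry above is already in place, so it is left commented out here:

```shell
# Show the mount options currently in effect for the root filesystem
findmnt -no OPTIONS /

# Apply noatime immediately without a reboot (needs root):
# sudo mount -o remount,noatime /
```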

2) Log2Ram

The Log2Ram functionality creates a /var/log mount point in memory, so any logs written to the /var/log folder will not be written to disk immediately, the RAM drive will be flushed to disk regularly/daily. The logic behind it is explained here.

3) Swap Files

When the system starts to run out of usable memory (RAM), it starts to use part of the disk to offload memory data and thus free up memory. Even with enough memory available, Linux will still tend to use swap space after some time to offload seldom used parts of memory to free up memory for other programs.
There are pros and cons to using swap space, but the focus here is to protect our storage, albeit at the cost of additional load on CPU (my RPi 4 is on average running on 2% CPU, so increasing CPU usage marginally is not a concern at the moment).
So, depending on how many containers and add-ons you’re running (and thus loading into memory), how much memory you have available (1/2/4/8GB), and whether you use an SD card (which needs more protection), consider reducing the OS’s affinity, or “swappiness”, to use the swap space.

There is also an argument to make use of ZRAM if you are low on memory and have low CPU usage.

# Check current memory usage and availability, and swap file usage:
free -h
# Check swap file status:
sudo service dphys-swapfile status

# Change affinity to use swap file:
sudo vi /etc/sysctl.conf
# set or add 
vm.swappiness=10

A value of 100 means “at every opportunity”, while 0 means “only if things do not work”. Default is 60.
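The running value can also be checked and changed on the fly; the write needs root, so it is commented out below (the /etc/sysctl.conf entry above makes the change persistent across reboots):

```shell
# Current swappiness of the running kernel (readable without root)
cat /proc/sys/vm/swappiness

# Apply the new value immediately, without a reboot:
# sudo sysctl vm.swappiness=10
```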

4) Connectivity

  • Change the WPA log level from INFO to ERROR.
  • Consider disabling Bluetooth if you will never use it on the RPi.
  • Consider disabling WiFi if your RPi is, and always will be, connected via LAN.
# Change WPA log level:
sudo wpa_cli log_level ERROR

# Disable Bluetooth:
sudo systemctl disable hciuart
# In "/boot/config.txt" add
dtoverlay=disable-bt

# Disable WiFi:
# In "/boot/config.txt" add
dtoverlay=disable-wifi

  • Change the Samba log settings.
# Samba settings in /etc/samba/smb.conf
[global]
  log level = 0
  max log size = 500
  local master = no
  # Only log to /var/log/samba/log.{smbd,nmbd}.
  logging = file
[printers]
  load printers = no
  printing = bsd
  printcap name = /dev/null
  disable spoolss = yes

If you continue to have entries in your Samba log that are meaningless and that you can’t get rid of, consider replacing the Samba log files with links to a null device:

# replace Samba logs with /dev/null links
 sudo rm /var/log/samba/log.nmbd 
 sudo rm /var/log/samba/log.smbd 
 sudo ln -sf /dev/null /var/log/samba/log.nmbd
 sudo ln -sf /dev/null /var/log/samba/log.smbd
  • If you are running X11 VNC, change its settings and log level.
    Default log is ~/.vnc/vncserver-x11.log, but that is an EXT4 file and not TMPFS.
# Add to VNC config ( ~/.vnc/config.d/   or  /etc/vnc/config.d/common.custom )
 AudioEnable=0
 EnableAutoUpdateChecks=0
 EnableChat=0
 EnableRemotePrinting=0
 Log=*:file:1
 LogDir=/var/log
 LogFile=vncserver-x11.log
  • If you are not running headless, and spend some time using a keyboard and mouse connected to the RPi, it generates logging that I don’t know how to disable.
    One workaround is to replace the existing logs with null files:
 sudo ln -sf /dev/null /var/log/Xorg.0.log
 sudo ln -s /dev/null /home/pi/.cache/lxsession/LXDE-pi/run.log

D. journald

journald (or rather systemd-journald) is the entry point of HA Docker messages to the Linux system side of things.
The following changes can be considered:
( in the journald config file /etc/systemd/journald.conf )

  1. where the journal data is stored
    • “Runtime…” parameters put the journal storage in memory (/run/log/journal)
    • “System…” parameters make the storage persistent on disk (/var/log/journal).
 RuntimeMaxUse				: how much space the journal may use  
 RuntimeKeepFree			: how much space systemd-journald will leave free for other uses
 RuntimeMaxFileSize			: how large individual journal files may grow
 RuntimeMaxFiles			: max nr of (archived) journal files

  2. the max log levels that will be processed.
    If these are set to “warning”, then notice, info and debug messages are ignored and discarded.
    The nuclear option is to set the max level to e.g. “crit” to ignore the HA debug errors (but note that this is system-wide!).
 MaxLevelStore=warning
 MaxLevelSyslog=warning
 MaxLevelKMsg=warning
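Putting this together, a hypothetical journald.conf fragment that keeps the journal in RAM and caps its size could look like this (the values are illustrative, not recommendations; Storage=volatile is what forces the in-memory journal under /run/log/journal):

```
[Journal]
# Keep the journal in RAM only (/run/log/journal)
Storage=volatile
# Cap the in-memory journal size
RuntimeMaxUse=64M
RuntimeMaxFileSize=16M
# Drop notice/info/debug messages before they are stored or forwarded
MaxLevelStore=warning
MaxLevelSyslog=warning
```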

Implement journald parameter changes as follows:

 # Make a backup. Edit the config file and make changes. Then restart journald. Check journald status to be "active (running)".
 sudo cp /etc/systemd/journald.conf /etc/systemd/journald.conf.ORIG
 sudo vi /etc/systemd/journald.conf
 sudo systemctl restart systemd-journald
 sudo systemctl status systemd-journald

The journal storage size can be monitored, and reduced (e.g. via cron on a regular basis).

journalctl --disk-usage			: check current disk usage
journalctl --vacuum-size=1GB	: clean up and reduce size to 1GB
journalctl --vacuum-time=1w		: remove all messages prior to the past week
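For example, a root crontab entry (edit with sudo crontab -e) could trim the journal weekly; the timing and retention below are arbitrary examples:

```
# m  h  dom mon dow  command
15   4  *   *   0    /usr/bin/journalctl --vacuum-time=1w
```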


E. rsyslog

rsyslog is responsible for filtering and writing the journal log data to file.
The main rsyslog config file is /etc/rsyslog.conf.
When adding your own filters, it is recommended to create new files in /etc/rsyslog.d/, starting the filename with a numeric value indicating the sequence in which they must be loaded.

Unfortunately, HA logs for e.g. hassio_supervisor and hassio_audio are currently all logged as “error” (priority=3), regardless of whether they are error, warning, info or debug messages.

This makes it impossible to filter based on log level (priority), and forces us to do filtering using regex and parsing the contents of the message. Two issues with that:

  • Impact on performance and increased resource usage.
  • Inaccuracy. Either unwanted messages slip through, or real errors/wanted messages are filtered out wrongly.

Remarks

  • The sequence of filters is important. If a message is “stopped” in an earlier filter, then it will not be processed further in subsequent filters.
  • If /etc/rsyslog.conf contains “$IncludeConfig /etc/rsyslog.d/*.conf”, then all *.conf files in /etc/rsyslog.d/ will be included, ordered by filename.
  • Messages can be filtered based on <facility>.<priority>, message contents, etc.
  • Priority goes from “emerg” (0) down to “debug” (7). Therefore *.err will include all messages from “err” down to “debug”. “*.=err” will only match errors.
  • In the default configuration, there is a LOT(!) of duplication happening: the same message may end up in multiple log files.
    You can choose to eliminate this duplication by changing the default filters. Also discard unwanted messages in the logs by e.g. eliminating logging of debug messages.
    Based on the typical (Debian) default config, partly shown below:
    • ALL messages ( *.* ) will go to /var/log/syslog
    • But all “user” messages ( user.* ) will also go to /var/log/user.log
    • And if that message is a “debug” level message ( *.=debug ), it will also go to /var/log/debug
...
auth,authpriv.*                 /var/log/auth.log
*.*;auth,authpriv.none          -/var/log/syslog
cron.*                          /var/log/cron.log
daemon.*                        -/var/log/daemon.log
kern.*                          -/var/log/kern.log
lpr.*                           -/var/log/lpr.log
mail.*                          -/var/log/mail.log
user.*                          -/var/log/user.log
...
*.=debug;\
        auth,authpriv.none;\
        news.none;mail.none     -/var/log/debug

Change the rules to reduce logging, e.g.

  • Replace the all-level * wildcards (e.g. user.*) with a minimal level ( user.warning)
  • Don’t just dump everything ( *.* ) in /var/log/syslog. Filter according to your needs.
    e.g. just put errors in syslog:
    *.err;auth,authpriv.none -/var/log/syslog
  • Discard all debug messages.
    Also consider this for *.=info (this data is already logged in other files)
*.=debug;\
        auth,authpriv.none;\
        news.none;mail.none     stop

To add new filters, create a new file in /etc/rsyslog.d/ e.g. 01_hassfilters.conf, then add the filter definitions (in correct sequence).
e.g.

# Custom filters to reduce the overhead of Home Assistant on logging.

# Ignore all "pulseaudio" messages that are not errors.
if ($msg contains "pulseaudio") and not ($msg contains " E: ") then { stop }

# Ignore all Supervisor INFO debug messages.
if ($msg contains "supervisor") and ($msg contains "INFO") then { stop }
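The logic of these two rules can be sanity-checked offline before touching rsyslog, e.g. by replaying some sample lines through an equivalent awk expression (the sample messages below are made up for illustration):

```shell
# Emulate the two "stop" rules: drop pulseaudio lines that are not
# errors, and drop supervisor INFO lines; everything else passes.
printf '%s\n' \
  "hassio_audio pulseaudio D: module loaded" \
  "hassio_audio pulseaudio E: connection refused" \
  "hassio_supervisor INFO Updating add-on store" \
  "hassio_supervisor ERROR Update failed" |
awk '!(/pulseaudio/ && !/ E: /) && !(/supervisor/ && /INFO/)'
# Only the pulseaudio "E:" line and the supervisor ERROR line should remain.
```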

And just an example for testing dynamic file naming, should you want to do that. In this case, splitting user messages in different files based on severity.

$template userlogs,"/tmp/log/user-%syslogseverity-text%"
if $syslogfacility-text == ["user"] then -?userlogs

When any changes are made to rsyslog config files, first verify the new configuration is valid, then restart rsyslog, and check the status afterwards.

 rsyslogd -f /etc/rsyslog.conf -N1
 sudo systemctl restart rsyslog
 sudo systemctl status rsyslog


Appendix: General Commands

Some commands and tools you may find helpful to visualize and trace disk usage, and related activities.

* Linux

Command                                            Description
sudo iotop -aod 2                                  “iotop”: show disk writes, accumulated over 2 seconds
lsof -p <PID>                                      “lsof”: show open files for PID (TID from “iotop”)
pstree -spalGhnu <PID>                             “pstree”: show parent processes for PID (change parameters for child processes)
sudo strace -y -e write -p <PID>                   “strace” (not always permitted/possible): trace write file system calls for PID
dmesg -T                                           Show kernel messages (-T translates timestamps)
logger -p user.warn "TEST: user warning message"   “logger”: writes to journald; can be used to test rsyslog filters

Recursively list all files that changed in the past 60 minutes, starting from <path>
find <path> -mmin -60 -type f 2>/dev/null -print0 | xargs -r0 ls -l

* journalctl

journalctl is a very convenient and powerful tool to search through system events that were written to the systemd journal. (Note that messages from Docker containers that use a different log driver, e.g. json-file, will not be in the journal.)

Here are some examples of how to extract data using journalctl, to whet your appetite (filters can be combined):

Command                                                    Description
journalctl -k                                              List all kernel messages (since last boot)
journalctl -p err --since today                            List all errors since midnight
journalctl -F CONTAINER_NAME                               Unique list of all CONTAINER_NAMEs in the journal
journalctl CONTAINER_NAME=hassio_audio -f                  Show hassio_audio messages continuously in realtime
journalctl CONTAINER_NAME=hassio_supervisor PRIORITY=6     All “info” level messages logged by hassio_supervisor
journalctl CONTAINER_NAME=hassio_supervisor -p err         All errors logged by hassio_supervisor
journalctl -u docker -S -1h -o verbose                     All Docker messages logged in the last hour, in more detail
journalctl -r -n10 -o json-pretty                          Latest 10 messages, in reverse order, output as JSON (all data fields)

Notes

  • Any of the fields (from the json-pretty output) can be used for filtering, e.g. PRIORITY or CONTAINER_NAME.
  • Filtering by log level (e.g. “-p err” or “PRIORITY=3”) has no meaning for HA, as ALL messages are currently logged as errors, even info and debug messages.

* Docker

 # Monitor all Docker events (e.g. container stop / start etc.)
 docker events --format '{{.Time}}	Type={{.Type}}	Status={{.Status}}	Container={{.Actor.Attributes.name}}'

 docker logs <container>		# show logs from container ("-f" to show continuous/realtime logs) 
 docker ps						# list all running containers ("-a" to also show stopped and paused containers)

 # Write messages to STDERR and STDOUT from within a Docker container (to test downstream message handling).
 # Connect to a container by logging in to a Bash shell (e.g. hassio_supervisor).
 # Write messages to the STDERR and STDOUT streams of the process with PID 1.
 docker exec -it hassio_supervisor bash
 echo "`date`  TESTING STDERR" >> /proc/1/fd/2
 echo "`date`  TESTING STDOUT" >> /proc/1/fd/1
  
 # Show the log driver used by each of your containers. Should be `journald`, and possibly `json-file` for  non-HA containers you may have in your stack. 
 column -t <<< $(for f in $(docker ps --format '{{.Names}}'); do printf "$f \t"; docker inspect -f '{{.HostConfig.LogConfig.Type}}' $f; done)


Great write-up, well done! This will be very helpful for a lot of people. My own knowledge stops somewhere around C.4… After that it gets too complicated for me :slight_smile:
In your own experience, which component/service out of the above impacts write cycles the most?

Thanks.
I’d think that in a “normal” Home Assistant implementation there are the following sources of disk writes, and depending on how many sensors you have and how frequently they refresh, the order will change:

  1. HA database:
    Capturing the sensor changes and events. These are the only disk writes with real added value to our system, and should ideally be the only ones.
    So you can achieve a lot by just rethinking what you really need, and exclude the rest.
    • If you don’t need to know what time the door closed or the tv was switched off, don’t log it.
    • If you don’t need to see in HA that your phone is charging, exclude that integration.
    • If you don’t need to see the WiFi strength of your MQTT device, disable it (in TasmoAdmin or wherever).
    • Etc.
  2. HA, the app
    • logs generated by internal processes, like hassio_supervisor checks and refreshes. Same goes for hassio_audio, even HACS.
    • automations, scripts, notifications, … are all captured if you don’t exclude them in the logger and other filters.
  3. OS
    Linux is logging stuff all the time. When you do a “sudo” there’s a log created. Samba, VNC, X, cron, … are all quite vocal when it comes to logging.
  4. Other apps (in your container stack)
    e.g. Mosquitto, which in my case logs info like database updates, device connections etc. to an EXT4 log file (meaning it’s not in RAM but on disk).

As you may have picked up, my biggest gripe is with how Home Assistant just generates lots of crap. And even worse, all this debug info is logged as errors. The thing is that it is completely superfluous for the average user, even when you’re troubleshooting a specific problem. My ticket to change the hassio_supervisor logging was, to me, wrongfully rejected, so I’m now trying it as a feature request.

Ok, thanks. I can assume from here https://www.home-assistant.io/integrations/recorder that entities, automations etc. disabled in Recorder will subsequently not be logged in Logbook and History, so above points A.4 and A.5. would be redundant, is that correct? Is there a case scenario/need, in which I would disable something in History/Logbook but have to keep it in Recorder (or vice versa)?

Yes, you are correct. In the context of disk writes, Recorder determines what is stored in the database, and what therefore can be displayed in History, so History filters are not relevant. And that also applies to Logbook, unless you use the Logbook service to add custom Logbook events that “come in from the side” and may bypass the Recorder filters. I will update the text accordingly.

I had one small use case, where I used a Node-Red automation to switch on lights in the house based on sunset, that varies over here from 5pm in winter to 10pm in summer. Turned out that it was too dark inside when the event fired, so I used an offset to switch on the lights earlier, based on whether it was cloudy or not. This only worked if the entity (sun.sun) was logged, but I did not care for how it was displayed in History, so I excluded it from the view. Now this is addressed much better after I added a lux sensor to measure the actual light values.

Ok, thanks again.
I’m now going through your other proposals/ideas. It is actually recommended here https://ikarus.sg/using-zram-to-get-more-out-of-your-raspberry-pi/ to completely uninstall default pi swap file (dphys-swapfile uninstall) if log2ram is used along with zram. I just wanted to add that for anyone who also wants to use zram, without knowing if it’s even a good idea to uninstall default pi swap.
I take it that for a supervised install the most important changes will be those suggested in point D.?

@Tryfos, thanks for actually going through the content and providing feedback. It’s still a work in progress, will update it over time so that people don’t need to read through all the comments.

For journald I’d say the most important part is to make sure it is in memory, by setting the Runtime.. parameters. I actually don’t know what the default is for OOB Debian.

Setting the max log levels would also be an efficient way to immediately prevent all lower-level “noise” from even entering the journal and logs. But, as mentioned, Home Assistant currently logs everything as errors, so that is a missed chance. If we filter out the debug errors, we will also miss the error errors. :roll_eyes:


You could also consider updating the rsyslog rules, to prevent all those duplication of messages to different logs. The risk is small if you make a backup of the /etc/rsyslog.conf file to revert back to.

My changes in /etc/rsyslog.conf
(these are only the changed lines, the file contains more stuff that I did not touch):

#- prevent all logging to syslog, by commenting out the complete entry:
#*.*;auth,authpriv.none         -/var/log/syslog

#- only write errors and warnings to other files:
cron.err                        /var/log/cron.log
daemon.warning                  -/var/log/daemon.log
kern.warning                    -/var/log/kern.log
user.warning                    -/var/log/user.log

#- Ignore debug, info, notice and warning messages. 
#- Relevant messages were already covered in previous rules:
*.=debug;\
        auth,authpriv.none;\
        news.none;mail.none     stop
*.=info;*.=notice;*.=warn;\
        auth,authpriv.none;\
        cron,daemon.none;\
        mail,news.none          stop

Maybe you could only write warnings (and above) to syslog, to start with:
*.warning -/var/log/syslog

And I added a new rules file: /etc/rsyslog.d/01-hassfilters.conf

# Ignore all PulseAudio messages that are not errors.
if ($msg contains "pulseaudio") and not ($msg contains " E: ") then { stop }

# Ignore all HA Supervisor info messages.
if $msg contains "supervisor" and $msg contains "INFO" then { stop }


I noticed the ZRAM option, but was hesitant to mention it. All other settings here I actually implemented myself, and been running stable for a couple of weeks now. This I did not do, and I did not want to get burned by leading the guys down the wrong path.

In point 3. I cannot find a “purge_interval” variable in the official docs: https://www.home-assistant.io/integrations/recorder
Instead it is recommended to create an automation for the purge intervals.
Perhaps it was available in earlier versions? Relevant, with references to setting purge_interval to 0 for disabling default daily purge, is the following thread: Recorder purge and repack
The latter can now be achieved by setting auto_purge to “false” according to the docs.
In any case the default setting of “1” should be good for our purposes.

Yep, the Recorder parameters changed. :astonished: I will update the doc to not confuse the peeps. Thanks…

Regarding your previous comment about disabling swap altogether, I don’t have enough knowledge to say something meaningful about it. Some say swap space is a good thing, others say no. Actually the same guy says no in another article. My take on this is to play it safe: limit the server’s affinity to use swap space (swappiness), so it effectively isn’t used, but keep swap space available as a backdoor for when it’s needed. But this is case specific; a guy running an 8GB RPi 4 with lots of memory available may think differently about it than a 1GB guy.
Would be great if some swap guru could chime in…

Actually I referred to a recommendation to disable swap only if zram is used, not in general. After a lot of research I’ve also come to the conclusion that Linux needs a kind of swap, even if it is ultimately not used.
I would also add, regarding the Database, that the best solution is to put it in memory altogether (after the suggested optimizations in Recorder), and use InfluxDB for long-term data in case there is a system shutdown/restart etc.
Further on, in point C.4. the correct syntax for the WPA log level is sudo wpa_cli log_level error, not sudo wpa_cli log_level = ERROR. The latter did not work for me.
I think, if log2ram is used (which I installed and seems great!), points D. and E. are redundant in my mind, provided there is enough RAM in the system. Since all logs in /var/log are written to RAM with log2ram, there is little to worry about or tinker with. Of course I’ll check the memory more often from now on.
I haven’t installed ZRAM either, since I see that there is a lot of headroom regarding memory. I have the Pi4 4GB version and only 1GB is used for now. I’ll see how things go from here and react accordingly.
Again, many congratulations for the recommendations!