Intro
SD cards, and to a lesser extent SSDs, have a finite number of write cycles and will fail at some point. Your mileage will vary based on the quality of the storage you're using, and of course on how busy your system is. One thing can be said, though: out of the box, Home Assistant does very little to reduce the amount of data being written to disk.
In Release 2022.4 some improvements
were implemented, so we may be moving in the right direction.
This write-up is about possible actions you can take to reduce the logging and data flows wearing out your storage. Some are easy and others more involved, so choose based on your confidence and skill level, and on how far you want to go to reduce those write cycles.
Concerns
- The HA (Home Assistant) Logger filtering currently (June 2021) only applies to the HA core component. Setting the log level for the HA supervisor can be done from the command line (see below). Other components, e.g. hassio_audio, spam the logs with debug and info level messages all the time, and from within the application there is currently no way to stop this from happening.
- Although Docker supports both the STDOUT and STDERR output streams (with corresponding log levels), HA is currently logging all output to STDERR only.
  - This means we can't make use of standard filtering in downstream systems to eliminate unwanted data.
  - It also means that all HA messages are logged as errors. So it is not possible to run a daily report to "show me all errors that occurred on my system today", because most if not all of those errors are not errors at all.

There are tickets to correct this behavior for hassio_supervisor and hassio_audio. Unfortunately there is no joy yet, so I submitted a feature request for hassio_supervisor to filter out its own debug statements.
(You may want to upvote these changes, to show their importance to the user community.)
Rules of Engagement
When implementing changes to the system, whatever we do we want to maintain stability and be able to revert to a working state should things go south.
So:
- Make a backup of every (system) file before you make changes.
- Apply changes incrementally, and test in between.
- Don't compromise your security. E.g. don't disable Samba logs if anybody else can access your RPi and you want to monitor (failed) connections.
- KEEP A SYSTEM CHANGE LOG !!
  Write up everything you change, and also the time it was (effectively) implemented.
  - If something does not work (anymore), then you know what you changed, and when. All those sudden errors may or may not even be caused by your change, but you can only make the link if you have this information available.
  - In future you may want to undo some changes, e.g. to expand logging again to help troubleshoot a new problem.
  - New functionality may render your change useless, and you may want to undo it to reduce load on the system (e.g. rsyslog filtering).
Now with all that out of the way, it’s time to roll up the sleeves and get our hands dirty (and systems clean)…
A. Home Assistant
1) Logs
The first and perhaps easiest step is to make sure HA outputs a minimum of logging to the downstream systems.
a. Logger
HA Logger configuration.
In configuration.yaml, apart from other filtering settings, it is possible to set the logging level of HA components and integrations. Set the default level that will apply to all components (supporting this feature), and override it where needed for specific cases where you want to see more detailed information:
logger:
default: critical
logs:
# log level for HA core
homeassistant.core: fatal
b. CLI
Some HA components do not support the HA Logger
settings and must be configured by passing parameters from the command line using the HA CLI.
Do the following from the command prompt or SSH to change the log level for hassio_supervisor
to e.g. “warning”:
# Change hassio_supervisor log level
ha supervisor options -l warning
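To confirm the new level took effect, you can inspect the supervisor details (a quick check, assuming the standard ha CLI is available):

# The "logging" field in the output should now show the level set above (e.g. warning).
ha supervisor info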
c. Other
It seems log filtering for some HA components like hassio_audio
is not yet implemented.
Reducing the hassio_audio
container logging can be done by using a bit of a dirty workaround to change the Pulse Audio log level inside the container from “debug” to e.g. “error”.
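As a rough sketch of that workaround (assuming the container ships a standard /etc/pulse/daemon.conf; the exact path and mechanism inside hassio_audio may differ, and any change is lost when the container is recreated, e.g. after a Supervisor update):

# Lower the Pulse Audio daemon log level inside the hassio_audio container (assumed config path).
docker exec hassio_audio sh -c "sed -i 's/^;* *log-level *=.*/log-level = error/' /etc/pulse/daemon.conf"
# Restart the container so Pulse Audio picks up the new setting.
docker restart hassio_audio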
2) Database
In release 2022.4 database optimizations
were implemented to improve performance and to reduce the amount of data written to and stored in the database. This may very well reduce the need to tinker with the database setup.
a. Location
It is possible to change the default location of the HA database. This means you can e.g. move it to a USB drive if you're still on SD. Or you can put it in memory (which is volatile, so you will lose the contents of the database when you reboot), and then at regular intervals extract only the reduced data set that you really want into a persistent database.
Some people extract data from the in-memory HA database to an online BigQuery database. I have no experience with this, and only mention it here for the sake of completeness.
# Put the SQLITE db in memory.
recorder:
db_url: 'sqlite:///:memory:'
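Or, as a sketch of the USB option (assuming the drive is mounted at /mnt/usb; adjust the path to your own mount point):

# Put the SQLITE db on an external USB drive instead of the SD card.
recorder:
  db_url: 'sqlite:////mnt/usb/home-assistant_v2.db'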
b. Recorder
Recorder
configuration. Keep the HA
database small:
- Only record what you really need. You can explicitly include or explicitly exclude what is recorded, or use a combination of both.
- Purge on a regular basis (the default is daily at 04:12 am).
- Keep a small rolling set of data (e.g. 7 days).
# Capture and log data to the HA database
recorder:
commit_interval: 30
purge_keep_days: 7
#purge_interval: 1 # obsolete, replaced by "auto_purge" (default: true)
include:
domains:
- sensor
- .... # domains to record
entities:
- sun.sun # specific entities to record, where the domain may be excluded
exclude:
domains:
- homemonitor
- updater
- .... # domains to exclude from recording
entities:
- .... # specific entities to exclude, while the domain may be included
entity_globs:
- sensor.epson* # groups of sensors by using a wildcard
What can be shown in the History
and Logbook
views is determined by the entities that are logged in the database, but you can exclude events and entities from these views to remove the noise and only show what is relevant to you. (And it is possible to add custom events to the Logbook using the Logbook service).
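For example, a minimal sketch of such filtering (the domain and entity names are placeholders; pick your own noisy ones):

# Hide noisy domains and entities from the Logbook view.
logbook:
  exclude:
    domains:
      - automation
    entities:
      - sensor.database_size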
c. Scan Intervals
For certain types of entities it is possible to set the scan interval to reduce database growth and unnecessary load on system resources. You may not need to know every 30 seconds that the database size increased by a couple of KB, for example.
sensor:
- platform: sql
scan_interval: 600
queries:
- name: Database Size
query: "SELECT ROUND(page_count * page_size,1)/1000000 as size FROM pragma_page_count(), pragma_page_size();"
column: "size"
unit_of_measurement: MB
3) HACS
HACS
is nice and all that. But if you don’t use any custom components, then (for now) don’t install HACS
just for the fun of it. It downloads a lot of stuff from GitHub, and keeps updating it every 30 (!) seconds, writing debug statements to journald while doing so.
It seems that in Release 2022.4
an optimization was done to the Github integration by using event subscriptions instead of polling GitHub.
B. Node-Red
If you are running Node-Red in a container that was set up from Docker Hub (actually, HA add-ons also run as containers), make sure it is not performing a health check every 30 seconds or so.
If it is, you may want to rebuild your image with a "HEALTHCHECK NONE" parameter, as sketched below.
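A minimal sketch of such a rebuild (the base image and tag are assumptions; use the image you actually run):

# Dockerfile: inherit your current Node-Red image but disable its health check.
FROM nodered/node-red:latest
HEALTHCHECK NONE

Build it (e.g. docker build -t node-red-nohc .) and run your container from that image instead.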
Use “Docker Events
” (see below) to monitor the events raised by running containers.
C. Linux Host
Log Flow Overview
Below is a cryptic summary of how the messages raised by HA are processed.
The HA core components, as well as any add-ons, run in Docker containers. These containers are configured to use the journald log driver to process messages output on STDERR (and STDOUT). These messages are picked up by the Linux systemd-journald and are put in the system journal. The messages are then passed on to rsyslog, where they are routed to the different log files in e.g. /var/log, based on the rsyslog filtering rules.
1) File System
Make sure “noatime
” is set for the file systems defined in /etc/fstab
. This is to prevent updating file metadata when a file is (only) accessed for reading.
Requires reboot.
e.g.
PARTUUID=f3557908-01 /boot vfat defaults,noatime 0 2
PARTUUID=f3557908-02 / ext4 defaults,noatime 0 1
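After the reboot you can verify that the mounts indeed use noatime:

# Show the active mount options for the root and boot partitions.
findmnt -o TARGET,OPTIONS /
findmnt -o TARGET,OPTIONS /boot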
2) Log2Ram
The Log2Ram functionality creates a /var/log mount point in memory, so any logs written to the /var/log folder will not be written to disk immediately; the RAM drive is flushed to disk regularly/daily. The logic behind it is explained here.
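Once Log2Ram is installed, a quick sanity check (assuming the standard log2ram service name):

# /var/log should now be a RAM-backed mount, and the service should be active.
df -h /var/log
systemctl status log2ram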
3) Swap Files
When the system starts to run out of usable memory (RAM), it starts to use part of the disk to offload memory data and thus free up memory. Even with enough memory available, Linux will still tend to use swap space after some time to offload seldom used parts of memory to free up memory for other programs.
There are pros and cons to using swap space, but the focus here is to protect our storage, albeit at the cost of additional load on CPU (my RPi 4 is on average running on 2% CPU, so increasing CPU usage marginally is not a concern at the moment).
So depending on your situation - how many containers and add-ons you're running (and thus loading into memory), how much memory you have available (1/2/4/8GB), and whether you use an SD card (which needs more protection) - consider reducing the OS's affinity, or swappiness, for using swap space.
There is also an argument for using ZRAM if you are low on memory and have low CPU usage.
# Check current memory usage and availability, and swap file usage:
free -h
# Check swap file status:
sudo service dphys-swapfile status
# Change affinity to use swap file:
sudo vi /etc/sysctl.conf
# set or add
vm.swappiness=10
A value of 100 means “at every opportunity”, while 0 means “only if things do not work”. Default is 60.
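To apply the new swappiness value without a reboot, and verify it:

# Reload the sysctl settings and check the active value.
sudo sysctl -p
cat /proc/sys/vm/swappiness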
4) Connectivity
- Change the WPA log level from INFO to ERROR.
- Consider disabling Bluetooth if you will never use it on the RPi.
- Consider disabling WiFi if your RPi is and always will be connected via LAN.
sudo wpa_cli
# at the interactive prompt:
log_level ERROR
# Disable Bluetooth:
sudo systemctl disable hciuart
# In "/boot/config.txt" add
dtoverlay=disable-bt
# Disable WiFi:
# In "/boot/config.txt" add
dtoverlay=disable-wifi
- Change the Samba log settings.
# Samba settings in /etc/samba/smb.conf
[global]
log level = 0
max log size = 500
local master = no
# Only log to /var/log/samba/log.{smbd,nmbd}.
logging = file
[printers]
load printers = no
printing = bsd
printcap name = /dev/null
disable spoolss = yes
If you continue to have entries in your Samba log that are meaningless and that you can't get rid of, consider replacing the Samba log files with a link to a null device:
# replace Samba logs with /dev/null links
sudo rm /var/log/samba/log.nmbd
sudo rm /var/log/samba/log.smbd
sudo ln -sf /dev/null /var/log/samba/log.nmbd
sudo ln -sf /dev/null /var/log/samba/log.smbd
- If you are running X11 VNC, change its settings and log level.
The default log is ~/.vnc/vncserver-x11.log, but that sits on the EXT4 file system and not on TMPFS.
# Add to VNC config ( ~/.vnc/config.d/ or /etc/vnc/config.d/common.custom )
AudioEnable=0
EnableAutoUpdateChecks=0
EnableChat=0
EnableRemotePrinting=0
Log=*:file:1
LogDir=/var/log
LogFile=vncserver-x11.log
- If you are not running headless, and you spend some time using a keyboard and mouse connected to the RPi, this generates logging that I don't know how to disable.
One workaround is to replace the existing logs with null files:
sudo ln -sf /dev/null /var/log/Xorg.0.log
sudo ln -sf /dev/null /home/pi/.cache/lxsession/LXDE-pi/run.log
D. journald
journald
(or rather systemd-journald
) is the entry point of HA
Docker messages to the Linux system side of things.
The following changes can be considered:
(in the journald config file `/etc/systemd/journald.conf`)
- Where the journal data is stored:
  - "Runtime…" parameters put the journal storage in memory (/run/log/journal)
  - "System…" parameters make the storage persistent on disk (/var/log/journal)
  RuntimeMaxUse : how much space the journal may use
  RuntimeKeepFree : how much space systemd-journald will leave free for other uses
  RuntimeMaxFileSize : how large individual journal files may grow
  RuntimeMaxFiles : max number of (archived) journal files
- The max log levels that will be processed.
  If these are set to "warning", then notice, info and debug messages are ignored and discarded.
  The nuclear option is to set the max level to e.g. "crit" to ignore the HA debug errors (but note that this is system-wide!).
MaxLevelStore=warning
MaxLevelSyslog=warning
MaxLevelKMsg=warning
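Putting the pieces together, a sketch of what a write-reducing journald.conf could look like (Storage=volatile keeps the journal in /run/log/journal only; the size cap is an arbitrary example):

[Journal]
Storage=volatile
RuntimeMaxUse=64M
MaxLevelStore=warning
MaxLevelSyslog=warning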
Implement journald parameter changes as follows:
# Make a backup. Edit the config file and make changes. Then restart journald. Check journald status to be "active (running)".
sudo cp /etc/systemd/journald.conf /etc/systemd/journald.conf.ORIG
sudo vi /etc/systemd/journald.conf
sudo systemctl restart systemd-journald
sudo systemctl status systemd-journald
The journal storage size can be monitored and reduced (e.g. from cron on a regular basis):
journalctl --disk-usage : check current disk usage
journalctl --vacuum-size=1GB : clean up and reduce size to 1GB
journalctl --vacuum-time=1w : remove all messages prior to the past week
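For example, a sketch of a weekly cleanup from root's crontab (sudo crontab -e); the size and schedule are arbitrary, and note that vacuuming only removes archived journal files:

# Every Sunday at 03:00, trim the journal down to 200MB.
0 3 * * 0  journalctl --vacuum-size=200M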
E. rsyslog
rsyslog
is responsible for filtering and writing the journal log data to file.
The main rsyslog
config file is /etc/rsyslog.conf
.
When adding your own filters, it is recommended to create new files in /etc/rsyslog.d/, starting with a numeric value indicating the sequence in which they must be loaded.
Unfortunately, HA logs for e.g. hassio_supervisor and hassio_audio are currently all logged as "error" (priority=3), regardless of whether they are error, warning, info or debug messages.
This makes it impossible to filter based on log level (priority), and forces us to filter using regex parsing of the message contents. Two issues with that:
- Impact on performance and increased resource usage.
- Inaccuracy. Either unwanted messages slip through, or (real) errors/wanted messages are filtered out wrongly.
Remarks
- The sequence of filters is important. If a message is “stopped” in an earlier filter, then it will not be processed further in subsequent filters.
- If /etc/rsyslog.conf contains "$IncludeConfig /etc/rsyslog.d/*.conf", then all *.conf files in /etc/rsyslog.d/ will be included, ordered by filename.
- Messages can be filtered based on <facility>.<priority>, message contents, etc.
- Priority goes from "emerg" (0) down to "debug" (7). A selector matches the given priority and anything more severe: *.err will include all messages from "err" up to "emerg", while "*.=err" will only match errors exactly.
- In the default configuration, there is a LOT(!) of duplication happening; the same message may end up in multiple log files.
  You can choose to eliminate this duplication by changing the default filters. Also discard unwanted messages in the logs by e.g. eliminating logging of debug messages.
Based on the typical (Debian) default config, partly shown below:
- ALL messages (*.*) will go to /var/log/syslog
- But all "user" messages (user.*) will also go to /var/log/user.log
- And if that message is a "debug" level message (*.=debug), it will also go to /var/log/debug
...
auth,authpriv.* /var/log/auth.log
*.*;auth,authpriv.none -/var/log/syslog
cron.* /var/log/cron.log
daemon.* -/var/log/daemon.log
kern.* -/var/log/kern.log
lpr.* -/var/log/lpr.log
mail.* -/var/log/mail.log
user.* -/var/log/user.log
...
*.=debug;\
auth,authpriv.none;\
news.none;mail.none -/var/log/debug
Change the rules to reduce logging, e.g.
- Replace the all-level * wildcards (e.g. user.*) with a minimal level (user.warning).
- Don't just dump everything (*.*) in /var/log/syslog. Filter according to your needs.
  e.g. just put errors in syslog:
  *.error;auth,authpriv.none -/var/log/syslog
- Discard all debug messages. Also consider this for *.=info (this data is already logged in other files).
  *.=debug;\
  auth,authpriv.none;\
  news.none;mail.none stop
To add new filters, create a new file in /etc/rsyslog.d/
e.g. 01_hassfilters.conf
, then add the filter definitions (in correct sequence).
e.g.
# Custom filters to reduce the logging overhead of Home Assistant.
# Ignore all "pulseaudio" messages that are not errors.
if ($msg contains "pulseaudio") and not ($msg contains " E: ") then { stop }
# Ignore all Supervisor INFO debug messages.
if ($msg contains "supervisor") and ($msg contains "INFO") then { stop }
And just an example for testing dynamic file naming, should you want to do that. In this case, splitting user messages in different files based on severity.
$template userlogs,"/tmp/log/user-%syslogseverity-text%"
if $syslogfacility-text == ["user"] then -?userlogs
When any changes are made to the rsyslog config files, first verify that the new configuration is valid, then restart rsyslog, and check the status afterwards.
rsyslogd -f /etc/rsyslog.conf -N1
sudo systemctl restart rsyslog
sudo systemctl status rsyslog
Appendix: General Commands
Some commands and tools you may find helpful to visualize and trace disk usage, and related activities.
* Linux
Command | Description
---|---
`sudo iotop -aod 2` | "iotop". Show disk writes, accumulated over 2 seconds.
`lsof -p <PID>` | "lsof". Show open files for PID (TID from "iotop").
`pstree -spalGhnu <PID>` | "pstree". Show parent processes for PID (change parameters for child processes).
`sudo strace -y -e write -p <PID>` | "strace" (not always permitted/possible). Trace write file system calls for PID.
`dmesg -T` | Show kernel messages (-T translates the timestamp).
`logger -p user.warn "TEST: user warning message"` | "logger". Writes to journald. Can be used to test rsyslog filters.
Recursively list all files that changed in the past 60 minutes, starting from <path>
find <path> -mmin -60 -type f 2>/dev/null -print0 | xargs -r0 ls -l
* journalctl
journalctl
is a very convenient and powerful tool to search through system events that were written to the systemd
journal. (Note that messages from Docker containers that use a different log driver, e.g. json-file, will not be in the journal.)
Here are some examples of how to extract data using journalctl to whet your appetite (a combination of filters is allowed):
Command | Description
---|---
`journalctl -k` | List all kernel messages (since last boot)
`journalctl -p err --since today` | List all errors since midnight
`journalctl -F CONTAINER_NAME` | Unique list of all CONTAINER_NAMEs in the journal
`journalctl CONTAINER_NAME=hassio_audio -f` | Show hassio_audio messages continuously in realtime
`journalctl CONTAINER_NAME=hassio_supervisor PRIORITY=6` | All "info" level messages logged by "hassio_supervisor"
`journalctl CONTAINER_NAME=hassio_supervisor -p err` | All errors logged by "hassio_supervisor"
`journalctl -u docker -S -1h -o verbose` | All Docker messages logged in the last hour, in more detail
`journalctl -r -n10 -o json-pretty` | Latest 10 messages, in reverse order, output in JSON format (all data fields)
Notes
- Any of the fields (from the json-pretty output) can be used for filtering, e.g. PRIORITY or CONTAINER_NAME.
- Filtering by log level (e.g. "-p err" or "PRIORITY=3") has no meaning for HA, as ALL the messages are currently logged as errors, even info and debug messages.
* Docker
# Monitor all Docker events (e.g. container stop / start etc.)
docker events --format '{{.Time}} Type={{.Type}} Status={{.Status}} Container={{.Actor.Attributes.name}}'
docker logs <container> # show logs from container ("-f" to show continuous/realtime logs)
docker ps # list all running containers ("-a" to also show stopped and paused containers)
# Write messages to STDERR and STDOUT from within a Docker container (to test downstream message handling).
# Connect to a container by logging in to a Bash shell (e.g. hassio_supervisor).
# Write messages to the STDERR and STDOUT streams of the process with PID 1.
docker exec -it hassio_supervisor bash
echo "`date` TESTING STDERR" >> /proc/1/fd/2
echo "`date` TESTING STDOUT" >> /proc/1/fd/1
# Show the log driver used by each of your containers. Should be `journald`, and possibly `json-file` for non-HA containers you may have in your stack.
column -t <<< $(for f in $(docker ps --format '{{.Names}}'); do printf "$f \t"; docker inspect -f '{{.HostConfig.LogConfig.Type}}' $f; done)