For those of you using Docker with the standard DB (or even a non-Docker install) who want to back up your config before pulling a new release: you may also want to back up your DB file, even if it is large.
If you need or want compression, here is a comparison of some methods, by speed and size, on a 1GB DB file. Files are read off a SATA SSD, the processor is a 4-core, 4-thread Haswell Xeon, and the archives are written to another SSD. The DB is from the 2021.3 release.
XZ:
Native tar compression (-J), needing no external libs or compressors
Best compression but slow
tar -cJf HABACKUPDB.tar.xz /var/homeassistant/home-assistant_v2.db
5 min 26 sec, 99.4MB
PLZip:
High performance parallel implementation compatible with regular lzip decompression
4 times faster than XZ with almost the same compression ratio (both use LZMA)
tar -c -Iplzip -f HABACKUPDB.tar.lz /var/homeassistant/home-assistant_v2.db
1 min 12 sec, 104.5MB
PBZip2:
High performance parallel implementation compatible with regular bzip2 decompression
Very fast, but only OK compression
tar -c -Ipbzip2 -f HABACKUPDB.tar.bz2 /var/homeassistant/home-assistant_v2.db
29.0 sec, 146.6MB
ZSTD level 10:
Super fast, with compression similar to bzip2
tar -c -I"zstd -10 -T0" -f HABACKUPDB.tar.zstd-10 /var/homeassistant/home-assistant_v2.db
10.2 sec, 150.4MB
ZSTD level 5:
Even faster but compression still not great:
tar -c -I"zstd -5 -T0" -f HABACKUPDB.tar.zstd-5 /var/homeassistant/home-assistant_v2.db
4.7 sec, 165.4MB
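If you want to repeat the comparison on your own DB, the timings above can be collected with a small helper. This is only a minimal sketch, assuming GNU tar, GNU date and awk are available; `bench_one` is a made-up name, and it uses the long form `--use-compress-program`, which is what `-I` stands for in GNU tar. Whatever compressor you pass in has to be installed.

```shell
#!/bin/sh
# bench_one COMPRESSOR_CMD INPUT_FILE OUTPUT_ARCHIVE   (hypothetical helper)
# Archives INPUT_FILE through COMPRESSOR_CMD and prints one tab-separated line:
#   <compressor command>  <elapsed seconds>  <archive size in bytes>
bench_one() {
    local comp="$1" src="$2" out="$3"
    local start end secs bytes

    start=$(date +%s.%N)
    # -C plus the base name stores a relative path, which avoids tar's
    # "Removing leading /" warning; the commands above use the absolute path.
    tar -c --use-compress-program="$comp" -f "$out" \
        -C "$(dirname "$src")" "$(basename "$src")" || return 1
    end=$(date +%s.%N)

    secs=$(awk -v s="$start" -v e="$end" 'BEGIN { printf "%.1f", e - s }')
    bytes=$(( $(wc -c < "$out") ))
    printf '%s\t%s\t%s\n' "$comp" "$secs" "$bytes"
}
```

For example: bench_one "zstd -5 -T0" /var/homeassistant/home-assistant_v2.db /tmp/test.tar.zstd-5. Run each compressor a couple of times, since the first run also pays for reading the file off disk.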
So who wins? It depends on the need. ZSTD-10 finishes in less time than it takes to pull the new Docker image (if done in parallel), but level 5 is obviously the fastest. ZSTD will also have the fastest decompression time. PBZip2 retains compatibility with bz2 files, which means compatibility with pretty much any archiving software, even ancient ones. PLZip has the best combination of file size and speed; it is 4 times faster than XZ because it can use all 4 cores. LZip is also apparently a better format from a data recovery perspective, and it is well supported.
I did not bother with gzip and lzop because of their poor compression ratios. I also tested ZSTD level 15, but for this type of data the compressed size was within 1% of level 10 at half the speed. Level 17 is where the size starts to drop, but at that point it is as slow as PLZip while still nowhere close to its compression ratio.
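To put the size and speed trade-off on one scale, the reported numbers can be turned into "percent of original size" and throughput. A rough sketch only: `summarize` is a made-up helper, and it assumes the "1GB" file is 1024 MB, which the measurements above do not state exactly, so treat the output as approximate.

```shell
#!/bin/sh
# summarize LABEL SECONDS SIZE_MB [ORIGINAL_MB]   (hypothetical helper)
# Prints the compressed size as a percentage of the original and the
# compression throughput in MB/s. ORIGINAL_MB defaults to 1024, which is
# an assumption about the 1GB test file, not a measured value.
summarize() {
    awk -v label="$1" -v secs="$2" -v size="$3" -v orig="${4:-1024}" 'BEGIN {
        printf "%s\t%.1f%% of original\t%.1f MB/s\n", label, size / orig * 100, orig / secs
    }'
}

# Figures taken from the timings listed earlier.
summarize "xz"      326  99.4
summarize "plzip"   72   104.5
summarize "pbzip2"  29.0 146.6
summarize "zstd-10" 10.2 150.4
summarize "zstd-5"  4.7  165.4
```

Seen this way, going from XZ to PLZip costs about half a percentage point of size for roughly 4.5 times the throughput on this machine.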
I would use PLZip if retaining lots of backups is important, or if you have larger databases.
I would use ZSTD-5 if you retain a single backup, have a small db, or want the shortest downtime between upgrades.
Faster systems with ample memory, like this one, make better use of ZSTD-10.
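Since the point of the faster levels is to hide the backup behind the image pull, the two can be started together. A minimal sketch, assuming a POSIX shell; `run_in_parallel` is a made-up helper, and the image name in the usage line is a placeholder you would replace with whatever image you actually run.

```shell
#!/bin/sh
# run_in_parallel BACKUP_CMD PULL_CMD   (hypothetical helper)
# Starts BACKUP_CMD in the background, runs PULL_CMD in the foreground,
# then waits for the backup. Succeeds only if both commands succeeded.
run_in_parallel() {
    local backup_cmd="$1" pull_cmd="$2"
    local backup_pid pull_status=0 backup_status=0

    sh -c "$backup_cmd" &
    backup_pid=$!

    sh -c "$pull_cmd" || pull_status=$?
    wait "$backup_pid" || backup_status=$?

    [ "$pull_status" -eq 0 ] && [ "$backup_status" -eq 0 ]
}
```

Usage would look something like: run_in_parallel 'tar -c -I"zstd -10 -T0" -f HABACKUPDB.tar.zstd-10 /var/homeassistant/home-assistant_v2.db' 'docker pull YOUR_IMAGE' and only carrying on with the upgrade if it returns success.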
As for memory requirements, PLZip and ZSTD-10 can use huge amounts of memory, over a gigabyte in some cases. PBZip2 and ZSTD-5 use far less. ZSTD level 9 will typically use half the compression memory of level 10. Memory-constrained systems should use single-threaded LZip for best compression or ZSTD levels 4 through 9 for best speed. Level 4 is a good chunk faster and uses even less memory, but the compression ratio takes a similar hit.
On systems with 2 or fewer cores, ZSTD-5 is probably the best option. Single-threaded LZip has the best compression and uses far less memory than the multithreaded version, at roughly 1/cores of its speed.
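The recommendations above can be folded into a small chooser. This is just one reading of them, not a rule: `pick_compressor` and its "size"/"speed" goals are made-up names, and the 2-core cut-off is taken from the paragraph above. It does not look at available memory, so on a memory-constrained box you would still want to drop to a lower ZSTD level by hand.

```shell
#!/bin/sh
# pick_compressor CORES [GOAL]   (hypothetical helper)
# GOAL is "size" (keeping many backups, large DB) or "speed" (default).
# Prints a compressor command suitable for tar's -I option.
pick_compressor() {
    local cores="$1" goal="${2:-speed}"

    if [ "$goal" = "size" ]; then
        # Few cores: single-threaded lzip uses far less memory than plzip.
        if [ "$cores" -le 2 ]; then echo "lzip"; else echo "plzip"; fi
    else
        # Few cores: ZSTD-5; more cores (and ample memory): ZSTD-10.
        if [ "$cores" -le 2 ]; then echo "zstd -5 -T0"; else echo "zstd -10 -T0"; fi
    fi
}
```

On a system with GNU coreutils, something like tar -c -I"$(pick_compressor "$(nproc)" size)" -f HABACKUPDB.tar.lz /var/homeassistant/home-assistant_v2.db would then pick lzip or plzip depending on the core count.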
As for the rest of the config dir, it is much more compressible and much smaller unless you have a bunch of media files in there or something:
tar --exclude=home-assistant_v2.db -c -Iplzip -f HABACKUP.tar.lz /var/homeassistant
2.64 sec, 1.7MB
tar --exclude=home-assistant_v2.db -c -I"zstd -10 -T0" -f HABACKUP.tar.zstd-10 /var/homeassistant
0.42 sec, 1.8MB
Going with ZSTD there for sure! At that speed someone might assume the operation failed to complete. I would expect the log file to compress by a percentage in the high 90s, so a large log file would not add much to the archive size; in a previous test, a 30MB log added only 0.3MB to the archive using LZMA-based compressors. Levels higher than 10 do not help much, if at all.
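If a sub-second run makes you doubt the archive, listing it back through the same compressor is a cheap sanity check. A minimal sketch, assuming GNU tar (which adds -d to the compressor command when reading); `verify_archive` is a made-up helper name.

```shell
#!/bin/sh
# verify_archive COMPRESSOR_CMD ARCHIVE   (hypothetical helper)
# Lists ARCHIVE through COMPRESSOR_CMD and prints the number of entries.
# Fails if tar or the decompressor reports an error, or if nothing is listed.
verify_archive() {
    local comp="$1" archive="$2" listing

    listing=$(tar -t --use-compress-program="$comp" -f "$archive") || return 1
    [ -n "$listing" ] || return 1
    printf '%s\n' "$listing" | wc -l
}
```

For example: verify_archive "zstd -10 -T0" HABACKUP.tar.zstd-10 should print the number of files and directories that made it into the archive. This reads the whole archive, so it also catches a truncated file, but it does not compare contents against the originals.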