Hey all,
I recently looked at my Home Assistant backups stored on S3. Even with a simple lifecycle rule deleting objects after 180 days, I had accumulated ~380 GB (≈1.8 GB per daily backup, plus manual backups on updates). That was more S3 cost than I was comfortable with.
The core issue is that S3 lifecycle rules are purely age-based: they can delete or transition objects after N days, but they cannot express policies like “keep the latest backup per day, week, and month.” As a result, you still end up storing a lot of redundant backups.
I took some time to write a small Python script that implements a classic GFS-style retention policy (daily / weekly / monthly pruning), and I wanted to share it in case others are in the same situation (YMMV).
I wrote up the details here and the script itself is here.
I’ve started using it myself, and it immediately reduced my backups to ~40 GB, which I’m much happier about.
How it works (high level)
You can run the script at any time, and it will ensure there is at most one backup per configured day, week, and month window.
For example, if you configure:
- 3 daily
- 2 weekly
- 2 monthly
You’ll end up with up to 7 retained backups (fewer when the periods overlap):
- The latest backup from each of the last 3 days
- The latest backup from each of the last 2 weeks (may overlap with dailies)
- The latest backup from each of the last 2 months (may overlap with daily/weekly)
This collapses redundant backups within each period while still keeping meaningful restore points.
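The selection logic above can be sketched in a few lines of Python. This is a simplified illustration of the GFS idea, not the actual script: for each of the daily/weekly/monthly windows, walk the backups newest-first and keep the latest one per distinct period.

```python
from datetime import datetime

def select_retained(timestamps, daily=3, weekly=2, monthly=2):
    """Pick the newest backup per day/week/month period (GFS-style).

    `timestamps` is a list of datetimes, one per backup. This is a
    simplified sketch of the retention idea, not the real script.
    """
    newest_first = sorted(timestamps, reverse=True)
    keep = set()

    def keep_latest_per(key_fn, count):
        seen = set()
        for ts in newest_first:
            period = key_fn(ts)
            if period in seen:
                continue  # already kept the newest backup of this period
            seen.add(period)
            keep.add(ts)
            if len(seen) == count:
                break

    keep_latest_per(lambda ts: ts.date(), daily)              # one per day
    keep_latest_per(lambda ts: ts.isocalendar()[:2], weekly)  # one per ISO week
    keep_latest_per(lambda ts: (ts.year, ts.month), monthly)  # one per month
    return keep
```

Because `keep` is a set, a backup that is simultaneously the newest of its day, week, and month is only counted once, which is why the final count can be below the 7-item maximum.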
Safety features (important)
The script is intentionally conservative:
- Dry-run is enabled by default (no deletions until you explicitly disable it)
- It only ever operates on objects matching a regex you provide (e.g. HA backup filenames)
- It never deletes the last N backups (default: 5, configurable)
- Objects outside the regex (other data in the same bucket/prefix) are untouched
So even if you share a bucket with other data, it should not be an issue as long as the regex is correct.
Requirements
- AWS IAM credentials with access to the bucket
- The S3 bucket (and optional prefix)
- A filename regex (the blog post includes one that matches Home Assistant backups)
By default, the script just prints what it would delete.
Deployment
I’ve personally deployed it as an AWS Lambda triggered by S3 upload events, so it runs automatically whenever a new backup is uploaded. That way pruning happens continuously and stays entirely within AWS.
This part is probably more relevant to advanced users, but the script itself can be run anywhere (locally, cron, CI, etc.).
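If you go the Lambda route, the handler mostly just has to pull the bucket and key out of the S3 event notification and hand them to the pruning logic. A rough sketch, where `prune_bucket` is a hypothetical stand-in for the script's entry point:

```python
import os

def lambda_handler(event, context):
    """Triggered by an s3:ObjectCreated:* notification; prunes the
    bucket/prefix the new backup landed in. `prune_bucket` is a
    hypothetical stand-in for the script's entry point.
    """
    record = event["Records"][0]
    bucket = record["s3"]["bucket"]["name"]
    key = record["s3"]["object"]["key"]
    prefix = os.path.dirname(key)  # prune alongside the new backup

    # prune_bucket(bucket, prefix,
    #              dry_run=os.environ.get("DRY_RUN", "1") == "1")
    return {"bucket": bucket, "prefix": prefix}
```

One nice property of this setup: pruning only runs when a backup actually arrives, so there is no schedule to maintain and nothing to do on quiet days.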
Happy pruning!
Related discussion: