e-raser
December 23, 2023, 4:48pm
1
I think this is the first topic of this kind I have started over the years: not an issue today, just intense wondering.
I’m running HA OS (10.5) on a Pi 4. Over the last few days I noticed that load- and disk-intensive tasks take significantly less time to complete. The overall performance is stunning; I have never seen that before. E.g.:
running the SAMBA backup (default, nightly): previously 35 minutes, now 17 minutes
running the SAMBA backup for the InfluxDB addon only (weekly): previously 3 hours, now 1.5 hours
edit/update (2023-12-24): I also noticed that restarting HA Core now takes only roughly 3 minutes instead of the former 7 (but that has been the case for a bit longer, since roughly 2023-11-19)
That is a decrease of about 50 %; in other words, those tasks are now roughly twice as fast as before. Even issues such as huge backups not being copied to remote storage (as seen in Backups with size of ~ 4 GB not copied to remote SAMBA share (SAMBA backup fails) · Issue #161 · thomasmauerer/hassio-addons · GitHub) no longer occur (they probably hit some magic limit in terms of time, not size). It’s awesome!
I checked what changed and when, and only found:
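The "50 % less time means twice as fast" arithmetic can be checked with a small sketch. The durations are the ones reported in this post; the function name `speedup` is just an illustrative helper.

```python
def speedup(before_min: float, after_min: float) -> tuple[float, float]:
    """Return (percent of time saved, speed factor) for one task."""
    saved_pct = (before_min - after_min) / before_min * 100
    factor = before_min / after_min
    return saved_pct, factor


# Durations in minutes, as reported above.
tasks = {
    "nightly SAMBA backup": (35, 17),
    "weekly InfluxDB backup": (180, 90),
    "HA Core restart": (7, 3),
}

for name, (before, after) in tasks.items():
    saved, factor = speedup(before, after)
    print(f"{name}: {saved:.0f} % less time, {factor:.1f}x as fast")
```

Note that the two numbers are not interchangeable: halving the time doubles the speed, but the Core restart (7 to 3 minutes) saves a larger share of time and is more than twice as fast.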
it improved on the 19th of December
the only thing that changed at that time was the Supervisor, which updated from 2023.11.x to the current 2023.12.0
Here’s the evidence: below is the Supervisor update entity, above are the Samba backup active durations (cyan), which roughly halved:
I did not upgrade storage or other hardware things, there was no restart of HA Core (last restart: 2023-12-12) or the host (last reboot: 2023-12-10).
Finally the questions:
Is this possible?
Can a Supervisor update boost overall system performance in such a dramatic way?
I checked Release 2023.12.0 · home-assistant/supervisor · GitHub and couldn’t find anything which would allow such a performance boost in the release notes.
WallyR
(Wally)
December 23, 2023, 4:53pm
2
Check your backups by opening them up with a zip program.
Sometimes it fails, and then you end up with a backup that is missing important parts.
e-raser
December 23, 2023, 5:04pm
3
Good idea.
The SAMBA backup addon logs look fine. The Supervisor logs at /config/logs and /hassio/system unfortunately only go back a few hours, so I can’t check the time of backup creation.
The SAMBA backup addon has a mechanism to detect whether a backup finished successfully or failed. Everything is fine there. E. g.
I also checked some backups: file size OK (normal), integrity fine (no error on open), extracted them, checked size of extracted content and number of files - everything fine.
So unless there’s a technical way to check the integrity of snapshots, based on those 3 indicators above I tend to say: backups are fine
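One technical way to go a bit beyond "opens without error" is to read every member of the archive to the end, which forces the whole file to be decoded. The following is a minimal sketch, assuming the backup is an unencrypted tar file; the function name `check_archive` is hypothetical, and it only checks the archive it is given, not archives nested inside it.

```python
import tarfile


def check_archive(path: str) -> tuple[int, int]:
    """Read every regular file in a tar archive to the end.

    Returns (number of regular files, total uncompressed bytes).
    An exception (for example tarfile.ReadError or EOFError) points
    to a truncated or corrupt archive.
    """
    files = 0
    total = 0
    # "r:*" lets tarfile auto-detect plain, gzip, bz2 or xz archives.
    with tarfile.open(path, "r:*") as tar:
        for member in tar:
            if not member.isfile():
                continue
            stream = tar.extractfile(member)
            # Read in chunks so large databases do not fill memory.
            while chunk := stream.read(1024 * 1024):
                total += len(chunk)
            files += 1
    return files, total
```

Calling `check_archive("some_backup.tar")` (a placeholder path) and then running it again on each extracted inner archive gives file counts and sizes that can be compared against an older, known-good backup.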
Unfortunately it’s still unclear what improved that speed that much.
When I read
and specifically
#4843 Significantly speed up creating backups with isal via zlib-fast @bdraco
home-assistant:main
← home-assistant:zlib_fast
opened 11:08PM - 27 Jan 24 UTC
## Proposed change
[`isal`](https://github.com/intel/isa-l) via [`python-isal`](https://github.com/pycompression/python-isal) has a drop-in replacement for zlib, with the caveat that the compression level mappings are different. `zlib-fast` is a tiny compat layer that converts the standard zlib compression levels to `isal` compression levels to allow for drop-in replacement.
I considered a few other approaches here, such as using `pigz` or switching to another format, but this seems like the simplest and most compatible, resulting in only a tiny code change here.
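To illustrate what such a compat layer does, here is a minimal sketch. The level table in `map_level` is an assumption made up for illustration and is not taken from `zlib-fast`; the only facts relied on are that zlib accepts levels -1 and 0 to 9, that ISA-L uses levels 0 to 3, and that `python-isal` exposes a zlib-compatible module `isal_zlib`. The sketch falls back to the standard `zlib` module when `python-isal` is not installed.

```python
import zlib

try:
    # python-isal, if installed; isal_zlib mirrors the zlib module API.
    from isal import isal_zlib
except ImportError:
    isal_zlib = None


def map_level(zlib_level: int) -> int:
    """Map a zlib level (-1 or 0..9) to an ISA-L level (0..3).

    Illustrative mapping only; the real zlib-fast table may differ.
    """
    if zlib_level == -1:  # zlib's "use the default" sentinel
        zlib_level = 6
    if not 0 <= zlib_level <= 9:
        raise ValueError(f"invalid zlib level: {zlib_level}")
    # 0-2 -> 0, 3-5 -> 1, 6-8 -> 2, 9 -> 3
    return min(3, zlib_level // 3)


def compress(data: bytes, level: int = 6) -> bytes:
    """Compress with ISA-L when available, else with plain zlib."""
    if isal_zlib is None:
        return zlib.compress(data, level=level)
    return isal_zlib.compress(data, level=map_level(level))


# Either backend produces a zlib stream that plain zlib can read back.
payload = b"home assistant " * 1000
assert zlib.decompress(compress(payload, 6)) == payload
```

Because the output format stays the same, callers that pass zlib-style levels do not need to change, which is the point of keeping the layer this thin.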
Wheels are built for all supported platforms https://wheels.home-assistant.io/musllinux/
https://github.com/bdraco/zlib-fast/releases/tag/v0.2.0 https://github.com/pycompression/python-isal
Differences
- Compression for backups is ~5x faster than the baseline. My backup time went from 2m2s to 24s with a ~1400MiB backup.
- Restoring backups is ~2.5x faster (but not as noticeable since restores were already ~5.5x faster than compress).
- YMMV on the compression ratio a bit depending on the content. On one system it was 2% better, and on another it was 7% worse. securetar uses compress level 6 by default so it seems like the goal there was to balance speed and compress ratio. We could turn up the compress level later, but I left it at the default for now as it seemed most consistent with the securetar choice.
- CPU usage never hits 100% anymore on either of my primary production systems, as I/O is the limitation now. It now hovers around 80% instead.
- numbers will be far less impressive on 32 bit platforms and systems that are I/O constrained (reading the data off the sd card can be the bottleneck instead of the compression overhead for these).
https://github.com/powturbo/TurboBench/issues/43
Hopefully this means we won't see any more database backup failures where core gives up the lock on the SQLite database because it can't hold any more events in memory when the backup takes too long. I was especially keen on fixing this since, when that happens, the user's only hint that their backup is corrupt may be in the core log, and they find out the hard way and open an issue after it's already too late. https://github.com/home-assistant/core/blob/a793a5445f4a9f33f2e1c334c0d569ec772335fe/homeassistant/components/recorder/core.py#L1015
We can likely supplement the logger warning with a repair issue after this change since most cases should be the result of a hardware or overload problem with the system instead of the compression taking too long. Currently issues like https://github.com/home-assistant/core/issues/105987 end up going stale since there was no viable solution before. I have been waiting to add the [repair issue](https://github.com/home-assistant/core/pull/109020) since the answer of telling them to wait until we can make the backups faster seemed like it wasn't going to go over well and the result was the same with the issue eventually going stale since there was no solution
Testing:
- [x] x86_64 ~1400MB 2m2s -> 24s
- [x] aarch64 validation of backup and restore only (test data was small)
- [x] aarch64 (second machine, still a bit I/O constrained) - ~1100MB 9m29s -> 5m58s
- [x] armv7l (32 bit, I/O constrained due to SD card and not expected to change much. I wanted to make sure it worked on 32 bit, so I tested it anyway) ~500MB 5m16s -> 4m42s
A future improvement for I/O constrained systems could be to use https://docs.python.org/3/library/tarfile.html#tarfile.TarFile.addfile to add each gzipped tar file to the final archive, rewind the stream, rewrite the size over it, and jump back to the end, so we wouldn't have to write all the tgz files to disk and they could be streamed into the final result. This would cut the writes in half (and increase storage lifetime) and would likely make more difference for these systems than this change.
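The "write a header, stream the data, rewind, rewrite the size, jump back" idea can be sketched as follows. This is an illustration under assumptions, not what Supervisor does: it writes the outer archive by hand with `tarfile.TarInfo.tobuf` instead of `TarFile.addfile`, and the helper names `append_streamed_member` and `finish_archive` as well as the member name `inner.tar.gz` are hypothetical.

```python
import io
import tarfile

BLOCK = 512  # tar block size in bytes


def append_streamed_member(out, name, chunks):
    """Append one member of initially unknown size to the tar stream `out`.

    `out` must be a seekable binary file positioned at the end of the
    archive body. Returns the number of data bytes written.
    """
    info = tarfile.TarInfo(name)
    header_pos = out.tell()
    # Placeholder header with size 0. In GNU format a short name always
    # yields exactly one 512-byte block, so it can be overwritten later.
    out.write(info.tobuf(tarfile.GNU_FORMAT))
    size = 0
    for chunk in chunks:
        out.write(chunk)
        size += len(chunk)
    out.write(b"\0" * (-size % BLOCK))  # pad data to a block boundary
    end_pos = out.tell()
    # Rewind, rewrite the header with the real size, jump back to the end.
    info.size = size
    out.seek(header_pos)
    out.write(info.tobuf(tarfile.GNU_FORMAT))
    out.seek(end_pos)
    return size


def finish_archive(out):
    """Write the end-of-archive marker (two zero blocks)."""
    out.write(b"\0" * (2 * BLOCK))


# Usage: the chunks stand in for a gzipped tar produced on the fly.
buf = io.BytesIO()
payload = [b"a" * 700, b"b" * 500]
append_streamed_member(buf, "inner.tar.gz", payload)
finish_archive(buf)
buf.seek(0)
with tarfile.open(fileobj=buf) as tar:
    assert tar.extractfile("inner.tar.gz").read() == b"".join(payload)
```

The design point is that the inner data is written exactly once, directly into the final archive, instead of first to a temporary file and then again into the outer tar.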
## Type of change
- [ ] Dependency upgrade
- [ ] Bugfix (non-breaking change which fixes an issue)
- [ ] New feature (which adds functionality to the supervisor)
- [ ] Breaking change (fix/feature causing existing functionality to break)
- [x] Code quality improvements to existing code or addition of tests
## Additional information
- This PR fixes or closes issue: fixes #
- This PR is related to issue:
- Link to documentation pull request:
- Link to cli pull request:
## Checklist
- [x] The code change is tested and works locally.
- [ ] Local tests pass. **Your PR cannot be merged unless tests pass**
- [ ] There is no commented out code in this PR.
- [ ] I have followed the [development checklist][dev-checklist]
- [ ] The code has been formatted using Black (`black --fast supervisor tests`)
- [ ] Tests have been added to verify that the new code works.
If API endpoints or add-on configuration are added/changed:
- [ ] Documentation added/updated for [developers.home-assistant.io][docs-repository]
[dev-checklist]: https://developers.home-assistant.io/docs/en/development_checklist.html
[docs-repository]: https://github.com/home-assistant/developers.home-assistant
I thought “this could be the reason!”.
But: Supervisor 2024.1.1 was only shipped/installed yesterday, and I had already noticed the performance boost in 12/2023…
e-raser
February 25, 2024, 9:08pm
5
OK, that (Supervisor 2024.2 update / Significantly speed up creating backups with isal via zlib-fast by bdraco · Pull Request #4843 · home-assistant/supervisor · GitHub) once again increased speed and lowered backup creation time:
daily backups: no change to maybe 2 minutes faster (8 to 10 instead of 10 to 12 minutes)
InfluxDB backups: roughly 42 % faster (compared first 2024.2 run with last 2024.1 run)
But the mystery of the initial performance boost is still kind of unsolved… my bet is still on the Supervisor.