Researching I/O Write Improvements

agners · September 1, 2022, 3:09pm

Hello,

After having some discussion with @orange-assistant in the Home Assistant OS 8 release thread I started doing some I/O write improvements research. Let’s discuss the results in this new thread.

The main goal is to find if ext4 commit intervals, tuned VM dirty writeback timeouts (vm.dirty_expire_centisecs) and potential other tunables can have a positive effect on writes. The goal is to essentially cause fewer writes to the storage, which should lessen the write amplification effect typically seen on flash storage.

For testing I use a iomonitor.sh script, which is a envolved version of the armbian monitor command found here. My version only looks at the actual write I/O’s merges and bytes written to the main data disk reported by /proc/diskstats.

My test setup is a virtual machine with a vanilla HAOS 9.0 development build, and a vanilla Home Assistant Core 2022.8.7 configuration with SQLite. No add-ons other than the Terminal add-on is installed. I did some Idle measurements as baseline, but those are really not that useful. Most tests run with a single automation which increment a counter every second. This changes the state of an entity every second, which essentially causes the recorder to record a state every second. So this should cause continuous writes.

Note: HA Core recorder by default pools state changes, so SQLite is not causing a transaction every second.

The test is somewhat crude, and mainly represents a simple HA installation. Add-ons (e.g. different database backend) etc. could generate different results. But I think this tests should still be helpful to make a somewhat informed picture on the tunables.

Test	I/O Count	I/O merge	I/O kbytes
Idle, Commit 5s, VM 30s/5s (vanilla)	90	41	453
Idle, Commit 30s, VM 30s/5s	50	42	274
1s Increment, Commit 5s, VM 30s/5s (vanilla)	572	187	31139
1s Increment, Commit 30s, VM 30s/5s	381	99	27316
1s Increment, Commit 60s, VM 30s/5s	415	88	27702
1s Increment, Commit 60s, VM 60s/30s	483	123	28215
1s Increment, Commit 120s, VM 120s/60s	431	101	26994
1s Increment, Commit 600s, VM 30s/5s	475	108	30257

What puzzles me a bit is that higher Commit caused fewer I/O merges. I guess the merge happens at a lower layer, hence when fewer writes hit that lower layer there are also fewer opportunities for I/O merges… In any case, the main value we try to optimize is I/O Count, which is the effective completed I/O operations on the block device.

I’ve looked into iotop, and all writes are basically from Home Assitant Core itself. I’d assume most of them are generated by the recorder.

So my interpretation of the results are the following: The difference is measurable, but not massive. As I wrote in the HAOS 8 release thread, higher levels do some write pooling already, which should also translate to fewer writes even with a 5s commit interval, and this seems to be largely true.

However, we can definitely save some writes. It seems that we have about ~30% fewer writes when increasing the commit interval to 30s. It seems that after 30s there is really diminishing returns to increase the value further. I’d guess that is because the Recorder commits data around this rate.

Thoughts, other ideas?

Best regards,
Stefan

orange-assistant · September 1, 2022, 3:57pm

That is already something but is it also measurable to which extend the pages are “filled”?

With a mathematical approach one could argue “before” (default 5s commit) pages were only filled 2/3 and now (30s commit) they are “efficiently” filled and fully written. In reality this will be far from true probably and also only half “full” pages will get written after a higher commit interval.

But in general wearing out the flash cells 30% lesswith the same host/filesystem data written sounds already like a easy win.

Maybe it’ also fact that HA writes a lot of data? So as “more” data (or a bigger data “stream”) the host system is writing as “faster” a page (typically sizes could be 8 or 16k - or maybe even more nowadays?) is filled. The question could be how long does ha (for example a “vanilla” system) take to have “enough” data together for a full page (write)?

agners · September 2, 2022, 7:39am

Not really, just reading /proc/diskstats doesn’t allow this. It can be estimated by dividing the amount of bytes writen by the I/O’s. In practice the size of the I/O operations probably varies widely. When running the script, you can see smaller and bigger write bursts (this is from commit=600s):

Thu Sep  1 14:29:08 UTC 2022      61      20        1812 kbytes written after 16 sec
Thu Sep  1 14:29:09 UTC 2022       2       0           4 kbytes written after 1 sec
Thu Sep  1 14:29:13 UTC 2022      10       3          52 kbytes written after 4 sec
Thu Sep  1 14:29:17 UTC 2022       2       1           3 kbytes written after 4 sec
Thu Sep  1 14:29:23 UTC 2022       5       0        1200 kbytes written after 6 sec
Thu Sep  1 14:29:38 UTC 2022       4       0        1480 kbytes written after 15 sec
Thu Sep  1 14:29:45 UTC 2022       7       1         621 kbytes written after 7 sec
Thu Sep  1 14:29:54 UTC 2022      46      14        1212 kbytes written after 9 sec

With a mathematical approach one could argue “before” (default 5s commit) pages were only filled 2/3 and now (30s commit) they are “efficiently” filled and fully written.

I mean, we don’t know the underlying page size of the flash storage. The bytes written seems to be rather constant, and that does make sense as the Core runs SQLite in WAL mode (Write-Ahead Logging, essentially causing data to be sequentially applied to the log. Same concept is used by F2FS as well).

But in general wearing out the flash cells 30% lesswith the same host/filesystem data written sounds already like a easy win.

Sure, and I think we should take it But it doesn’t seem to me that HAOS is completely missconfigured as you claimed in the other thread In the end, the writes are caused by the application layer, and they also control when data is been written out to disk (especially databases).

Maybe it’ also fact that HA writes a lot of data? So as “more” data (or a bigger data “stream”) the host system is writing as “faster” a page (typically sizes could be 8 or 16k - or maybe even more nowadays?) is filled. The question could be how long does ha (for example a “vanilla” system) take to have “enough” data together for a full page (write)?

30MiB in 5 minutes seems like quite some indeed. I’ll have to look into HA Core to see what is exactly done there.

agners · September 2, 2022, 11:39am

Actually, checking the recorder documentation, it seems that the default commit interval is set to 1s. With my 1 second automation, and with an ACID compliant database, this should cause a I/O write every second (the Durability aspect of ACID basically means the database needs to write data and issue fsync to make sure it hits the disk). Yet we don’t see a I/O’s every second, how come?

The recorder does insert state changes to the state table and commits the transaction every second, I’ve checked that. However, the current database configuration sets the SQLite’s synchronous flag to NORMAL (see Pragma statements supported by SQLite). This mode which avoids some syncs (and looses the durability guarantee).

WAL mode is always consistent with synchronous=NORMAL, but WAL mode does lose durability. A transaction committed in WAL mode with synchronous=NORMAL might roll back following a power loss or system crash.

@bdraco did that change earlier this year with this PR, and IMHO it is a sensible choice for WAL.

Let’s see how various commit intervals behave on disk I/O level:

Test	I/O Count	I/O merge	I/O kbytes
1s Increment, recorder commit 1s, ext4 Commit 5s, VM 30s/5s*	572	187	31139
1s Increment, recorder commit 1s, ext4 Commit 30s, VM 30s/5s	381	99	27316
1s Increment, recorder commit 30s, ext4 Commit 30s, VM 30s/5s	192	93	3975

(*) current default configuration

That is a 2x improvement in I/O writes, and a big difference in terms of bytes written!

From what I understand the Write-ahead logging of SQLite does not cause writes to collapse much, as the log is appended after every transaction. Now since we issue a database commit every second, this adds up writes (data + transaction overhead). All these transactions eventually hit the disk.

Test	I/O Count	I/O merge	I/O kbytes
1s Increment, recorder commit 0s, ext4 Commit 5s, VM 30s/5s§	3781	215	55286
1s Increment, recorder commit 1s, ext4 Commit 5s, VM 30s/5s*	572	187	31139
1s Increment, recorder commit 5s, ext4 Commit 5s, VM 30s/5s	373	153	11837
1s Increment, recorder commit 30s, ext4 Commit 5s, VM 30s/5s	203	55	4492
1s Increment, recorder commit 0s, ext4 Commit 30s, VM 30s/5s§	3607	128	54330
1s Increment, recorder commit 1s, ext4 Commit 30s, VM 30s/5s	381	99	27316
1s Increment, recorder commit 5s, ext4 Commit 30s, VM 30s/5s	300	99	11051
1s Increment, recorder commit 30s, ext4 Commit 30s, VM 30s/5s	192	93	3975

(*) current default configuration
(§) 0s causes the recorder to use synchronous=FULL. With an event every second, this causes a file system sync every second, this also nicely shows up during testing:

Fri Sep  2 09:12:20 UTC 2022      10       0         144 kbytes written after 1 sec
Fri Sep  2 09:12:21 UTC 2022      10       0         172 kbytes written after 1 sec
Fri Sep  2 09:12:22 UTC 2022      10       0         144 kbytes written after 1 sec
Fri Sep  2 09:12:23 UTC 2022      10       0         156 kbytes written after 1 sec
Fri Sep  2 09:12:24 UTC 2022      10       0         212 kbytes written after 1 sec

I did some more tests with logging enabled. Setting log level to info logs every automation execution. In my test case this leads to two logs entries every second. All these tests have done with the same 1s increment automation and recorder commit set to 30s.

Test	I/O Count	I/O merge	I/O kbytes
log level `warning`, ext4 Commit 5s, VM 30s/5s	203	55	4492
log level `info`, ext4 Commit 5s, VM 30s/5s	1830	357	12695
log level `info`, ext4 Commit 30s, VM 30s/5s	1842	130	11821

Most of these writes are generated by the system log systemd-journald. The ext4 commit interval does not really change the write I/O’s, but the log shows that writes are done in bigger batches with increased commit interval. If the blocks written happen to be on the same (flash) page, this can have a positive effect.

With all this testing, I come to the following conclusion:

Make sure you system is not logging extensively. info logging already can cause lots of entries to be written.
Set recorder commit_interval to 5 or higher to decrease writes significantly.
The file system tuning have some effect on write I/Os for the database, but not so much for logging. Higher commit interval than 30s seems to have negligible effects.

Even though ext4 commit interval seems not to have a huge effect, bumping it a bit is probably still a good idea. I created a PR to set the commit interval to 30s as this seems to have good returns while still being on a somewhat conservative level.

bdraco · September 2, 2022, 11:55am

Also one thing to note on recorder, if there are no changes, we don’t commit. So if you have no state changes or events in a 1s interval period, we avoid the write

bdraco · January 17, 2023, 8:38pm

To close the loop on this research:

github.com/home-assistant/core

Increase default recorder commit interval to 5 seconds

home-assistant:dev ← bdraco:recorder_commit_interval

opened 08:31PM - 17 Jan 23 UTC

bdraco

+1 -1

needs https://github.com/home-assistant/core/pull/86110 and https://github.com/h…ome-assistant/core/pull/85892 and https://github.com/home-assistant/frontend/pull/15112  ## Breaking change  The recorder's default commit interval has been increased to 5 seconds to reduce disk I/O and increase storage longevity. As the history and logbook integrations both support live streaming, which avoids the need to read the database after the initial sync, there is no remaining core functionality that needs a lower commit interval. The [`commit_interval`](https://www.home-assistant.io/integrations/recorder/#commit_interval) can be adjusted to a lower value in the event an integration needs read the database sooner and cannot be updated to watch for events. ## Proposed change  The number was chosen based on the research by Stefan in https://community.home-assistant.io/t/researching-i-o-write-improvements/456227/4?u=bdraco We just could not do it at the time of the research due to the redesigns needed for the logbook and history apis. ## Type of change  - [ ] Dependency upgrade - [ ] Bugfix (non-breaking change which fixes an issue) - [ ] New integration (thank you!) - [ ] New feature (which adds functionality to an existing integration) - [ ] Deprecation (breaking change to happen in the future) - [x] Breaking change (fix/feature causing existing functionality to break) - [x] Code quality improvements to existing code or addition of tests ## Additional information  - This PR fixes or closes issue: fixes # - This PR is related to issue: - Link to documentation pull request: https://github.com/home-assistant/home-assistant.io/pull/25810 ## Checklist  - [x] The code change is tested and works locally. - [ ] Local tests pass. **Your PR cannot be merged unless tests pass** - [ ] There is no commented out code in this PR. - [ ] I have followed the [development checklist][dev-checklist] - [ ] The code has been formatted using Black (`black --fast homeassistant tests`) - [ ] Tests have been added to verify that the new code works. If user exposed functionality or configuration variables are added/changed: - [x] Documentation added/updated for [www.home-assistant.io][docs-repository] If the code communicates with devices, web services, or third-party tools: - [ ] The [manifest file][manifest-docs] has all fields filled out correctly. Updated and included derived files by running: `python3 -m script.hassfest`. - [ ] New or updated dependencies have been added to `requirements_all.txt`. Updated by running `python3 -m script.gen_requirements_all`. - [ ] For the updated dependencies - a link to the changelog, or at minimum a diff between library versions is added to the PR description. - [ ] Untested files have been added to `.coveragerc`.  To help with the load of incoming pull requests: - [ ] I have reviewed two other [open pull requests][prs] in this repository. [prs]: https://github.com/home-assistant/core/pulls?q=is%3Aopen+is%3Apr+-author%3A%40me+-draft%3Atrue+-label%3Awaiting-for-upstream+sort%3Acreated-desc+review%3Anone+-status%3Afailure  [dev-checklist]: https://developers.home-assistant.io/docs/en/development_checklist.html [manifest-docs]: https://developers.home-assistant.io/docs/en/creating_integration_manifest.html [quality-scale]: https://developers.home-assistant.io/docs/en/next/integration_quality_scale_index.html [docs-repository]: https://github.com/home-assistant/home-assistant.io

lmagyar1973 · March 9, 2023, 4:21pm

FYI

I’ve tried high values for commit_interval like 600, ie. 10 minutes. I’ve logged the table row counts with a script, and made the below Excel chart. Is shows, that the recorder commits in each approx. 3-5 minutes. If this is normal, forget about my comment.

bdraco · March 10, 2023, 7:14pm

Statistics runs need the data committed before they can run which means the every 5 minutes stats generation will always trigger a commit. A commit interval larger than 5 minutes makes no sense with the current design. We should probably cap that value so its apparent what the upper limit actually is.

Also there are quite a few tasks (mostly stats) that will trigger a commit as well

If the task isn’t set to commit_before = False it has to trigger a commit (which only happens if there are pending writes)

github.com

home-assistant/core/blob/d6a223f0e161fbd7a1fe906351b8ffbe75fd1efa/homeassistant/components/recorder/tasks.py#L246

    
      
              def run(self, instance: Recorder) -> None:
                  """Handle the task."""
                  instance.stop_requested = True
          
          

          
@dataclass
          class EventTask(RecorderTask):
              """An event to be processed."""
          
          
    event: Event
              commit_before = False
          
          
    def run(self, instance: Recorder) -> None:
                  """Handle the task."""
                  # pylint: disable-next=[protected-access]
                  instance._process_one_event(self.event)
          
          

          
@dataclass
          class KeepAliveTask(RecorderTask):
              """A keep alive to be sent."""

microtronics · August 17, 2023, 12:09pm

Hi @agners
The link for your monitoring script doesn’t seem to work. Could you please fix it? I’d like to do some similar monitoring for HA and it would help a lot. Thanks in advance!

agners · August 18, 2023, 1:36pm

I’ve updated the URL to point to a Gist!

karlheinzmabroe · August 22, 2023, 12:59pm

Hi,
I just stumbled upon this thread while trying to get Homeassistant in my configuration to only update the database every 3 hours and keep everything in memory in between. So I set commit interval accordingly but find writes every 5 minutes.
Is it related to the statistics being written every 5 minutes and can I change that behavior to reduce disk writes?
background: I want to bring my NAS HDDs into standby, but this is not possible with HA writing every 5 minutes…

agners · August 22, 2023, 7:14pm

That is really not how memory management and commit intervals are meant to be used.

I suggest to store things locally. If you are worried about data, you can take a backup every couple of hours.

karlheinzmabroe · August 25, 2023, 1:30pm

Ah, forgot to mention that HA runs in a docker environment on the NAS, so it does actually run locally Guess that’s still not the recommendet way to run it…