Snapshots are slow and horrible for microSD cards [supervisor FR]

… and the irony is, we need snapshots for when our cards die.

Background

Home Assistant snapshots without temporary files · Issue #421 · sabeechen/hassio-google-drive-backup · GitHub

https://github.com/home-assistant/supervisor/issues/new/choose (directs here)


The basic idea is to stream the snapshot data from the Home Assistant Supervisor into an add-on, which would then implement whatever API that remote storage/cloud/file locker thingie provides.

… and all this without having to write new local snapshot data onto the SD card, which is both a slow and wasteful operation. Today, to make e.g. a 1 GiB snapshot somewhere, you need to write roughly 2 GiB of data onto the card (the temporary files plus the final tar), and eventually that snapshot is just deleted: a whole lot of work for nothing.

I’m positive that a Supervisor change would be needed, probably some kind of change to Core as well, and the add-ons would eventually need to support reading this stream.

I’ll start with the Supervisor change and see how far I can get, and whether I can get a proof of concept working over the next couple of days while I’m on vacation: Home Assistant snapshots without temporary files · Issue #421 · sabeechen/hassio-google-drive-backup · GitHub

Also, I still have some open questions and I’ll definitely need some guidance along the way.

YES!!!

I agree it’s a bad idea to write the snapshot to the OS drive. For one thing, it’s often an SD card, and SD cards are known to fail after a certain number of writes. A bigger issue is that, in the event you need to restore the snapshot, there’s a very good chance the OS drive is corrupt, unavailable, or otherwise inconvenient to read from.

PLEASE READ THIS before commenting: Yes, I know there are add-ons which copy the snapshot to NAS or cloud storage after it’s been written to the local drive. These are great add-ons, and I’ve used a couple of them. But they do not solve the problems above.

I think the simplest solution would be to just allow the user to specify where to create the snapshot in the first place. Maybe on a NAS on the local LAN. Maybe on the device (like a laptop) you’re initiating the snapshot from. Maybe via API to cloud storage.

Anyway, thank you for submitting this feature request!

I think there was some talk of using Nabu Casa for snapshots at some point. Here’s the FR: Nabu Casa - Configuration Backup

And I made this comment back then:

Paulus was just talking about cloud storage of the config via Nabu Casa being a distinct possibility on Frenck’s livefeed. Sounds like he wants it to happen.

I haven’t heard anything recently, though, and a simple implementation where the snapshot is written locally and then uploaded doesn’t address this specific issue.


The /snapshots/<snapshot>/download endpoint (https://developers.home-assistant.io/docs/api/supervisor/endpoints#snapshot) is what the OP is after. That request can be streamed from the Supervisor to any target the add-on wants to write to: a local or remote file, or a remote storage service.

This is what the Google Drive add-on I’m using relies on underneath to fetch the snapshot; unfortunately, it requires the snapshot tar file to already exist.
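For reference, consuming that endpoint from an add-on looks roughly like this. This is only a hedged sketch, not the Google Drive add-on’s actual code; the bearer-token auth via the SUPERVISOR_TOKEN environment variable is the usual add-on API pattern, and the paths are made up:

```python
import os

import requests

SUPERVISOR = "http://supervisor"
TOKEN = os.environ["SUPERVISOR_TOKEN"]  # API token injected into add-ons

def download_snapshot(slug: str, dest_path: str) -> None:
    """Stream an existing snapshot straight to dest_path (e.g. a NAS mount),
    one chunk at a time, without buffering the whole tar in memory."""
    url = f"{SUPERVISOR}/snapshots/{slug}/download"
    with requests.get(url, headers={"Authorization": f"Bearer {TOKEN}"},
                      stream=True, timeout=3600) as resp:
        resp.raise_for_status()
        with open(dest_path, "wb") as dst:
            for chunk in resp.iter_content(chunk_size=1024 * 1024):
                dst.write(chunk)

# e.g. download_snapshot("a1b2c3d4", "/media/nas/backups/a1b2c3d4.tar")
```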

I’d like one to be put together on-the-fly, without being written out beforehand.

1 Like

That would mean storing/constructing it in RAM; with the sizes snapshots can reach, and the fact that one of the most popular devices only has 1 GB, that approach will not be viable.

The entire snapshot doesn’t need to be held in RAM just to be written out over the socket.

The creation of the tar file is the more interesting part. The tar format is "quirky": header of file 1, file 1 data, header of file 2, file 2 data, and so on, with each header containing the size of the file data that follows, so a member’s size must be known before its header can be written. Some archive members can be multiple gigabytes (add-ons with lots of data, or the recorder / history DB), but if those member files are split into, say, 10 MiB chunks, snapshotting becomes doable with just the overhead of that 10 MiB buffer.
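A minimal sketch of that chunking idea in Python (the Supervisor’s language). This is not the Supervisor’s snapshot code; the helper name, the chunk size and the `.partNNNN` naming are all illustrative assumptions:

```python
import io
import os
import tarfile

CHUNK = 10 * 1024 * 1024  # 10 MiB buffer: the only real memory overhead

def stream_tar(paths, out_fileobj):
    """Write a tar stream to out_fileobj (a socket, pipe, ...), splitting large
    files into CHUNK-sized members so each member's size is known up front."""
    with tarfile.open(fileobj=out_fileobj, mode="w|") as tar:  # "w|" = non-seekable stream
        for path in paths:
            size = os.path.getsize(path)
            with open(path, "rb") as src:
                if size <= CHUNK:
                    info = tarfile.TarInfo(name=path)
                    info.size = size
                    tar.addfile(info, src)
                    continue
                # Large file: emit path.part0000, path.part0001, ... so that
                # no header ever needs a size we haven't buffered yet.
                part = 0
                while True:
                    data = src.read(CHUNK)
                    if not data:
                        break
                    info = tarfile.TarInfo(name=f"{path}.part{part:04d}")
                    info.size = len(data)
                    tar.addfile(info, io.BytesIO(data))
                    part += 1
```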

Doing this splitting would mean optionally introducing a non-backwards-compatible change, i.e. snapshots created in this mode might only be readable by newer Supervisor versions because of the reassembly needed; probably not a big deal.

We could also consider changing from .gz to .zstd and/or .xz while we’re at it (different trade-offs of CPU vs. network cost).
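For what it’s worth, stdlib tarfile can already do streaming (non-seekable) output with gzip or xz; for zstd you’d typically wrap a third-party compressor (e.g. the zstandard package) around the output file object. A small hedged sketch of how the choice could be exposed:

```python
import tarfile

def open_streamed_tar(out_fileobj, compression: str = "xz") -> tarfile.TarFile:
    """Open a compressed tar stream over a non-seekable file object.
    "w|xz" burns more CPU for a smaller upload; "w|gz" is the opposite trade."""
    return tarfile.open(fileobj=out_fileobj, mode=f"w|{compression}")

# usage together with the chunked stream_tar sketch above:
# with open("/run/snapshot.fifo", "wb") as pipe, open_streamed_tar(pipe, "gz") as tar:
#     ...  # add members
```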


Oh, so you want the Supervisor to stream the archive over the socket in chunks while it’s constructing the snapshot. I guess that could work, but at that point the Supervisor no longer manages it, so why should it even create it? The add-on can access the same things as the Supervisor and could just do that internally; no need to adjust Core or the Supervisor for that.

Theoretically, yes, but it’d be nice to make the usual/canonical way of taking snapshots better, and the canonical format better, so we keep as much interoperability as possible.

Streaming the file over HTTP is just one option, and to be honest I’m not even sure it’s a good one. I’ve been thinking about maybe:

  1. Passing file descriptors over Unix domain sockets.
    TBH I was expecting the Supervisor to expose its APIs on a regular domain socket when I first started looking at this (I assumed it would be easier to secure), and I was surprised that it’s HTTP.
  2. Creating stub snapshots that don’t contain the actual content, and reading that content from a named pipe.
  3. The add-on could create a named pipe and ask the Supervisor to write a snapshot into it (possibly the easiest).
  4. The add-on could pass some PUT-compatible HTTP endpoint.

All of these could theoretically work; I’m guessing 3 is the easiest.
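Here’s roughly what option 3 could look like from the add-on side. Heavy hedging: the /snapshots/new/full endpoint is real, but the output_path parameter is purely hypothetical (that is exactly the Supervisor change being discussed), and how the pipe path gets shared between the add-on and the Supervisor is an open question:

```python
import os
import shutil
import threading

import requests

SUPERVISOR = "http://supervisor"
TOKEN = os.environ["SUPERVISOR_TOKEN"]
FIFO = "/data/snapshot.fifo"  # illustrative; both sides would need to see this path

def copy_pipe(dest_path: str) -> None:
    # Blocks until the writer (the Supervisor) opens the pipe, then streams the
    # snapshot to the NAS mount 1 MiB at a time; the SD card never sees the tar.
    with open(FIFO, "rb") as src, open(dest_path, "wb") as dst:
        shutil.copyfileobj(src, dst, length=1024 * 1024)

def snapshot_to_nas(dest_path: str) -> None:
    if not os.path.exists(FIFO):
        os.mkfifo(FIFO)                                   # 1. create the named pipe

    reader = threading.Thread(target=copy_pipe, args=(dest_path,))
    reader.start()                                        # 2. start draining it

    # 3. ask the Supervisor to write the snapshot into the pipe
    #    ("output_path" does NOT exist today; it's the proposed change).
    resp = requests.post(
        f"{SUPERVISOR}/snapshots/new/full",
        headers={"Authorization": f"Bearer {TOKEN}"},
        json={"name": "streamed", "output_path": FIFO},
        timeout=3600,
    )
    resp.raise_for_status()
    reader.join()
```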

I’ve been hacking on it this afternoon, and I think I’m about 20% done. It’s going really slowly because I don’t have a proper development environment set up for this. I’m trying to use VS Code on Windows; maybe I need to try WSL.


Sorry to interrupt what sounds like a great brainstorming session with a stupid non-technical question, but…

It sounds like you’re saying creation of the snapshot requires a lot of reads and writes to disk (SD) storage; so many that it can’t be done in memory. Without a full understanding of the process, this sounds to me like creating a snapshot on an SD card is even worse than I imagined. Not only is the whole file being written, but it’s continually being read, manipulated and re-written during the creation process. Or did I misunderstand?

And if I didn’t misunderstand, should this be documented somewhere for beginners, so they don’t trash their SD card before they have a chance to learn all this?

It’s not about the number of operations; fitting an entire 10 GB file into 1 GB of memory is just not doable. But as noted above, there are other alternatives.

With TLC flash you get about 300-1500 write cycles per cell; with QLC, between 100 and 1000.

Assuming you have a somewhat modern card with decent wear leveling, if you keep your SD card close to full all the time, it’ll last you about a year’s worth of daily snapshots (assuming some other part of the card doesn’t wear out faster).
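To put rough numbers on "about a year" (every figure here is an illustrative assumption, not a measurement):

```python
# Back-of-envelope wear estimate for a nearly full card with daily snapshots.
free_gib_on_full_card = 2       # space wear leveling can rotate writes through
write_gib_per_snapshot = 2      # ~1 GiB of temp files + ~1 GiB final tar (see above)
tlc_write_cycles = 400          # toward the low end of the 300-1500 TLC range

cycles_per_day = write_gib_per_snapshot / free_gib_on_full_card   # one full cycle/day
years = tlc_write_cycles / cycles_per_day / 365
print(f"~{years:.1f} years until those free cells wear out")      # ~1.1 years
```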

On the other hand, if you have plenty of free space, wear leveling can do its job and you could be fine for much longer.

The irony is that if you take snapshots more often, thinking you’re being "safer", you’re actually reducing the time to failure. It’s not too bad overall, though.

The more important aspect, from my perspective, is that writes to SD cards are slooooow, much slower than reads, so why do them if you don’t have to?

Thanks, but pardon my ignorance: would it be possible to send that same 10 GB across the network to a NAS instead of to the SD card? I wouldn’t imagine you’d want to create the whole thing in memory, no matter where it’s being stored.

Exactly, yes. You already have "the 10 GB", i.e. your data, stored on local storage (e.g. the card). It should be possible to send it to the NAS somehow, where it’ll be stored in an easily restorable format.

The current preferred process requires creating a snapshot, copying it somewhere (e.g. through an add-on), and then optionally/eventually removing that snapshot.

Creating this snapshot (which is itself a tar file) currently involves preparing its contents in a temporary directory, and then cleaning up / deleting those contents once the snapshot creation is done.

To summarize: to get your snapshot onto a NAS, the current implementation copies your data into temporary tar files, then creates a local snapshot file, which you can then copy onto the NAS.

I would like to avoid both of these copies and just read the data and transform a small piece at a time in memory into the correct format as it’s being written out to your NAS.
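For the curious, "a small piece at a time" in practice could look like the sketch below: the tar is produced into a pipe by a background thread and PUT to the NAS in bounded chunks, so memory use stays around the pipe buffer plus one chunk. Everything here (the URL, the assumption that the NAS accepts an HTTP PUT, e.g. WebDAV) is illustrative, not existing Supervisor or add-on code:

```python
import os
import tarfile
import threading

import requests

def stream_to_nas(paths, url: str, chunk_size: int = 1024 * 1024) -> None:
    """Build the snapshot tar on the fly and PUT it to the NAS; nothing is
    written to the SD card and only ~chunk_size bytes are held in memory."""
    read_fd, write_fd = os.pipe()

    def produce():
        # The tar writer runs in a thread, pushing bytes into the pipe as it goes;
        # the small kernel pipe buffer naturally throttles it to the upload speed.
        with os.fdopen(write_fd, "wb") as sink:
            with tarfile.open(fileobj=sink, mode="w|gz") as tar:
                for path in paths:
                    tar.add(path)

    threading.Thread(target=produce, daemon=True).start()

    def chunks():
        with os.fdopen(read_fd, "rb") as source:
            while True:
                data = source.read(chunk_size)
                if not data:
                    return
                yield data

    # requests switches to chunked transfer encoding when given a generator.
    resp = requests.put(url, data=chunks(), timeout=3600)
    resp.raise_for_status()

# e.g. stream_to_nas(["/config"], "http://nas.local/webdav/ha-snapshot.tar.gz")
```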


Theoretically you could network-boot and run with remote storage and not have an SD card at all, and then all of this copying would happen on your NAS. But that’s not easy; you could perhaps just run on a faster/better machine instead? IMHO it’d be better if you could keep running reliably and quickly on small, compact devices.


Thank you for explaining it much better than I could!

I assumed all this talk about needing memory to process the snapshot involved something like you describe: collecting the data and manipulating it to produce the tar file.

I suspect this would be even worse than just writing the finished file there once. Presumably there are multiple reads and writes of intermediary files before that final copy is (again!) written to SD, all of which slow down the process and reduce the life of the SD card.

Maybe what we need is a total rethink of the process. Personally, I just run a sync program which keeps a local directory structure on my laptop in sync with whatever’s on the SD card. I know it’s not the best solution (what if some files are open when I run the sync?), but even something along those lines would be better than the current approach.

The current approach, where there’s integration with the Supervisor, can in theory work around consistency issues like the one you’re describing, where you read the files linearly but they’re being written to in random order, or where files appear and disappear while you’re archiving them.

This is typically an issue with any kind of database. A patch landed about a week ago to add HOT/COLD snapshotting of add-ons:


supervisor/addon.py at 564e9811d0f594d95152f298514e2a1942dbad67 · home-assistant/supervisor · GitHub

In COLD backup mode the add-on is shut down while being snapshotted, to avoid these modifications, and is restarted afterwards.
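In pseudocode terms, COLD mode boils down to something like this. Hedged: this is a simplified illustration, not the actual code in that addon.py, and the method names are assumptions:

```python
import tarfile

async def backup_addon_cold(addon, tar: tarfile.TarFile) -> None:
    was_running = await addon.is_running()
    if was_running:
        await addon.stop()                        # quiesce: files stop changing mid-read
    try:
        tar.add(addon.path_data, arcname="data")  # now a consistent copy of the add-on data
    finally:
        if was_running:
            await addon.start()                   # bring the add-on back up afterwards
```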

Because this is a common problem with databases, some databases have evolved to not require a shutdown: they just need to be put into a special "mode", which makes the naively taken backup inconsistent but recoverable. On startup from these "dirty" files, the database replays its undo/redo/write-ahead log (depending on the implementation) to get back to a consistent state.
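A concrete example close to home: the recorder’s default database, SQLite, can be copied live through its online backup API, and the copy stays consistent while Home Assistant keeps writing. Illustrative only; this is not how the Supervisor snapshots the recorder today, and the paths are made up:

```python
import sqlite3

def backup_recorder_live(src_path: str, dst_path: str) -> None:
    """Copy the recorder DB while it's in use, using SQLite's online backup API."""
    src = sqlite3.connect(src_path)
    dst = sqlite3.connect(dst_path)
    try:
        src.backup(dst, pages=1024)  # copies in small batches; writers aren't blocked for long
    finally:
        dst.close()
        src.close()

# e.g. backup_recorder_live("/config/home-assistant_v2.db", "/media/nas/recorder-copy.db")
```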
Some databases, like InfluxDB’s TSM engine and RocksDB (which you can use with MySQL as a table engine, for example), store data in LSM trees: write-once key-value files are layered on top of each other, periodically compacted/merged, and the old pre-merge data is deleted as whole files once it’s no longer needed. No data is ever overwritten; all files are append-only. These are more resilient to naive copy backups, as long as you copy in order and the layers aren’t sharded across multiple files or writes interleaved across files, which is easy to do since the last layer is basically a journal.