Home Assistant OS Release 8

Thanks, and yes, it's not my first SD card that's been killed by hassio. It did surprise me a little because it was a relatively new (less than a year old) Samsung Pro Endurance. Booting from something less susceptible to this issue, like my NAS, is on my todo list.

I was running 8.0.4 on a NUC i3 with a SATA M.2 2280 SSD and HA 2022.8.6, and it was running fine until I updated to 8.5 this morning (it was offered in the settings update list). Now the NUC just keeps rebooting after it starts to load the Docker containers. I am watching on a console connected to the NUC, but it goes by too fast to see where it's crashing.
How do I recover? I have a full backup of HA done with HA backup at midnight last night, but not of the OS.
Thanks

(Heavy) write amplification will kill any flash storage much faster if it doesn't have countermeasures like a "pseudo RAM" cache (often SLC flash), which is typical for SSDs - one could argue precisely to cope with "badly" configured OSes.

The problem is that SD cards, USB sticks, eMMC and other relatively cheap flash storage (the kind people usually use together with their relatively cheap SBCs) typically don't feature such "pseudo RAM". These types of storage just need a properly configured filesystem so they don't die like flies :mosquito:.

As a crude visualization of the harm done by the 5 second flush interval that HaOS uses (a relic from the spinning hard drive age), imagine that a flash storage is a big block of paper :newspaper_roll:. Normally one would try to put as much content/data on one page :page_facing_up: as possible so as not to "waste" that valuable resource. And here is what HaOS does - it just starts a new page every 5 seconds, even if there is only one word (or even less) written on it. :roll_of_toilet_paper:

In flash storage one of these pages typically holds something like 8 or 16 KB. When only a few bytes get written in those 5 seconds, all that space is not only wasted - much worse, the flash is worn as if the full 8/16 KB had been written, so it ages much faster and therefore dies much earlier than necessary. This is (very simplified) write amplification in a nutshell :coconut:, and something every OS made for SBCs should know about and take the necessary (easy) countermeasures against. Maybe @agners wants to comment on why HaOS isn't (or doesn't want to be?) flash friendly? :handshake:

You are not the first one mentioning this 5 second flush interval -
it's also mentioned here

And it might be true, but maybe there is a reason for this interval, something we are not aware of.
For now I'm just curious.

What actually did wonders for me was to use the SD card suggested in the installation manual.
I broke 2 SD cards before and none since.
Native Raspbian also runs much faster with those.
What I'm wondering is whether increasing the flush interval would do anything for performance as well.

The only reason is that this is the default setting of a filesystem which was developed more than 15 years ago, at a time when spinning hard disks were still the standard and SSDs hadn't yet hit the consumer market.

It's quite easy: best is to avoid any "re"-branding and only buy your products from brands that are actual flash manufacturers - I think there are only about 3 or 4 such companies left in the whole world.

If you want to run your operating system from an SD card you want high I/O performance, which is not the same as the advertised high read/write speeds, as those are always measured for sequential reads/writes :warning:

So for example the "Samsung Pro Endurance" card which @zagnuts used is the wrong kind for an operating system. It claims "high speeds", but that is only true for continuously writing large chunks (like videos or images); on the other hand the card doesn't even mention its I/O performance. :-1:

The go-to cards for running an OS are either A1 or, even better, A2 rated cards:

Application Performance Class | SD Association

Still - whether the card is rated A1/A2 or not, the write amplification caused by a 5 second commit interval will kill it much faster than necessary :hammer:

Just to give an idea of what is possible with the "right" filesystem settings: I'm using some SBCs with Armbian (which ships with settings that don't kill the flash) and they run on cheap (sub $10) 8/16 GB SD cards (A1 rated) that are up to :six: years old. Yes, running 24/7 on the first card I installed them on - I expect them to continue to run for another decade or two. The SD card might even outlive the SBC (which probably degrades faster because of the heat it produces).

Yes, A1/A2 ratings are "only" about performance, not about life expectancy - and they don't help if the valuable flash cells get killed by flushing a page every 5 seconds :man_facepalming:

No, this setting is not about performance but only about getting the minimum wear (maximum lifetime) out of flash storage. The commit/flush interval is actually the maximum time to wait before writing to the physical storage. So the system essentially waits at most the configured time (default 5 seconds) before it writes to flash (the minimum possible write on flash is one page).

If, instead of the default, an interval of e.g. 600 seconds is chosen, the system will accumulate all data that is "ready" to be written to flash until there is enough for a full page write (the optimum, because it adds no extra wear to the flash) or until the time is up - in that case it writes to flash after 10 minutes even when there is not yet enough data (typically 8 or 16 KB) for a whole page.
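As an illustration, on a generic Linux system the ext4 commit interval can be inspected and changed at runtime with a remount (a sketch only - /mnt/data is an example mount point, and on HaOS the data partition mount is managed by the OS itself):

# show the current mount options of the (example) data partition
findmnt -no OPTIONS /mnt/data

# raise the ext4 commit interval to 600 seconds until the next reboot
mount -o remount,commit=600 /mnt/data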

The reason this is the default is that hard drives can address each and every bit, so they don't suffer from write amplification at all.

But one thing to keep in mind: a power failure can cause data loss of up to the configured interval. So if the system has kept the data that should be written to storage in RAM for 599 seconds and wants to write it in the next second, but the power fails, all the data from the last 10 minutes is lost. :put_litter_in_its_place:

That said, it is a very minor problem - instead of buying an SD card every year or spending a fortune on "industrial grade" SD cards, one can just invest a few bucks in a power bank that can be charged and discharged at the same time :battery: aka a poor man's UPS :zap:

OK, clear,
but then this is an even bigger problem for products like the HA Yellow, because you will need to replace the whole CM module instead of only the SD card.

In my link the f2fs filesystem is suggested instead of ext4. I'm not familiar with that one, but can it solve the power failure issue?

Indeed, if the flash is embedded (like soldered eMMC or other flash modules) that can end up very painful. Depending on the device, some ARM boards have a 2-stage bootloader and depend on working internal flash to even be able to boot from an external drive, so in the worst case it can indeed mean "losing" the whole SBC.

I don't have any practical experience with f2fs and only know that it is made for flash storage and designed for minimum wear. How that is achieved in detail I can't say, but the goal of (ideally) only writing full pages to flash storage stays the same.
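For completeness, this is roughly how f2fs would be used on a plain Linux box (a sketch only - the device name is an example, and HaOS does not offer f2fs for its data partition):

# create an f2fs filesystem on the (example) partition - this erases it!
mkfs.f2fs -l data /dev/sdX1

# mount it; f2fs handles flash-aware allocation and garbage collection itself
mount -t f2fs /dev/sdX1 /mnt/data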

The documentation

and git

both talk about SquashFS and ZRAM; not sure if there are issues, but I can't find any mention of ext4.

On the other hand, it doesn't mention GRUB as the bootloader either.


SquashFS is a read-only filesystem and is used to host HaOS itself, if I'm not mistaken. There are actually even two "slots", so in case an update goes wrong the old OS can still be booted - a very nice setup indeed.

The ext4 partition is the writable one, sometimes referred to as the "data" (disk) or "storage". It is the one that hosts the database, the config files, the backups - essentially all the data that is not static.

This link should show your "storage" partition/disk:

A click on the three dots top right and another on "move datadisk" actually mentions ext4 when a suitable disk (to move the data to) is found, if I remember right.

Also, I remember @agners once mentioning on GitHub (can't find it anymore, back when HaOS 7 was a thing) that there were no plans to move away from the ext4 defaults (that kill our beloved flash so fast) :man_shrugging:

Looks like the docs are not up to date - I think HaOS 8 introduced GRUB.

Lots of good points, thanks. Fully agree it's not ideal for an OS, but I chose the Pro Endurance not for speed but for longevity, as in theory they are designed to cope with "write intensive" applications and the price difference was not huge. In the real world I found the performance perfectly fine, but it still got whacked by too much activity. I'd like to stick with hassio, but I'm contemplating moving back to either a different OS, or even a container under Docker, so the problem will just go away. In the meantime, I'll just keep feeding it cards… :wink:

But those "write intensive" tasks are big chunks or sequential writes (like video or images). This type of writing will probably have a WAF (write amplification factor) close to 1, because it almost always writes full "pages" (the host writes will pretty much equal the flash writes). So this is the perfect scenario for flash storage, as there is virtually no write amplification and you get the maximum lifetime out of it.

If you look at the details for this card: MicroSDXC PRO Endurance Memory Card w Adapter 128GB Memory & Storage - MB-MJ128GA/AM | Samsung US

Continuous recording up to 25x longer than speed-focused cards¹ gives you long-lasting, best-in-class endurance up to 43,800 hours (5 years)². The PRO Endurance secures data with an industry-leading limited warranty up to 5 years³ and captures surveillance footage up to 128GB⁴ of storage space.

You see that the advertisement really only claims to last longer under constant, continuous sequential writes (WAF = 1), aka no write amplification :warning:

But that is quite the opposite of what HA does with your SD card! The workload HA produces consists of very small (random) chunks of data that are written every 5 seconds, in most cases never fill a whole page, and therefore have a very high WAF (many more flash writes than host/filesystem writes).

As an example, think of a car that claims 1000 km on one full tank/charge. This claim is usually theoretical - tested in a scenario with no wind, a completely flat surface, a completely empty car (often even without seats and driver), etc. The real-life range will most likely be lower than the seller claims - still, you might get 80% of it on trips outside the city when driving consciously. In the city, with traffic lights and lots of stops, the range might already degrade to 50% of what the seller claimed. But even this scenario is still not what HaOS does to our SD cards. Think about turning your car on, moving it one meter :warning:, and turning it off again. What mileage can you expect with usage like that? Maybe 10% of what the seller claimed - maybe 5%? Maybe less? Most likely your warranty will be void too, because you used the car totally outside the specs it was meant for… (SD cards also typically have a "limited" warranty :wink: )

And that is roughly how HaOS treats our flash storage. The mileage will vary, but with a better suited commit interval the life expectancy of our SD cards could be greatly improved (with the same usage pattern!)

That's really sad actually. Not so much for you, as you are at least aware of the problem and that it could easily be fixed in software, but rather for all the other users who don't have a clue that their SD cards are actually being killed by a software (mis)configuration in HaOS and that their lifetime could be greatly (many years) longer than it is.

Especially in these times we should be conscious about not generating unnecessary amounts of waste - even more so when one line of code could fix this for (probably) thousands of users…

recorder:
  commit_interval: 10

That should make a difference.


A difference maybe - it could even make things worse on the flash layer in the end, i.e. result in higher write amplification (more flash writes for the same host writes) = earlier-dying flash storage.


Your setting is on the application level (it does influence the DB writes on the filesystem layer), but the "real" problem here is the filesystem (host) level in conjunction with the flash level.

If HaOS (or any program/add-on/etc.) only wants to write 1 byte in a 5 second interval it will cause a full page write.

If we do the calculation and assume a page is 16 KB, then the WAF (write amplification factor) is 16384 divided by 1 = 16384. This means the host (filesystem) wanted to write :one: byte, but on the flash layer it results in :one::six::three::eight::four: bytes being written and worn. The "extra" 16383 bytes will actually cause even more write amplification in the future, because the intended byte needs to be rewritten again to "free" that page for new data. The problem is that the minimum amount a flash storage can erase is a block, which typically consists of 256 pages.
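As a quick back-of-the-envelope version of that calculation (the 16 KiB page size is just the assumed example value from above):

# assumed example geometry: 16 KiB page, host only wanted to write 1 byte
page_bytes=16384
host_bytes=1
echo "WAF = $(( page_bytes / host_bytes ))"   # prints: WAF = 16384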

@mobile.andrew.jones the problem here is really the commit interval of the data/storage filesystem that is "burned into" HaOS, and nothing you or I can change (and certainly not something that is available to configure in YAML!)

Do you have proof or actual measurements for these statements?

ext4 is probably a lot less bad than you think. ext4 introduced the concept of extents, which essentially causes the file system to allocate space in larger, sequential blocks. That, and other optimizations geared towards SSDs (e.g. relatime), makes ext4 not an unsuitable FS for flash storage. After all, it was the default for many distros even on SSD drives. Also, Android used it for a long time (and still does for some configurations, I think? Not sure), and Android essentially exclusively runs on flash storage.

Afaik HaOS uses ext4 with the defaults (flush interval of 5 seconds) which will probably cause heavy write amplification for most of the users. :floppy_disk::hammer:

Note that 5s only applies to data which has reached the (ext4) file system level. A write() first causes a dirty page in the page cache, which gets written out every dirty_expire_centisecs (which is 30s). So in absence of fsync() etc, typically, data is only written after ~30s (see also this post).
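For reference, those writeback defaults can be checked on any Linux box (the values in the comments are the usual kernel defaults, given in centiseconds):

# maximum age of a dirty page before it must be written back (default 3000 = 30 s)
sysctl vm.dirty_expire_centisecs

# how often the flusher threads wake up to look for expired pages (default 500 = 5 s)
sysctl vm.dirty_writeback_centisecs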

Afaict, the problems are on higher layers, where small writes to storage are requested. E.g. a simple INSERT INTO statement sent to a SQLite database (using implicit transaction), will cause an actual write to the flash! SQLite is an ACID compliant database, and the D(urability) in ACID requires it to make sure it hits the disk (think, fsync). That will cause writes on F2FS or ext4, no matter what commit interval.
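To illustrate with the sqlite3 command line tool (a hypothetical events table, purely as a sketch): every statement in autocommit mode is its own transaction and forces its own sync to storage, while batching the statements into one explicit transaction ends in a single sync.

# create an example table (hypothetical schema, just for illustration)
sqlite3 test.db "CREATE TABLE IF NOT EXISTS events (id INTEGER, state TEXT);"

# three separate implicit transactions -> roughly three syncs to storage
sqlite3 test.db "INSERT INTO events VALUES (1, 'on');"
sqlite3 test.db "INSERT INTO events VALUES (2, 'off');"
sqlite3 test.db "INSERT INTO events VALUES (3, 'on');"

# one explicit transaction -> a single sync for all three rows
sqlite3 test.db "BEGIN; INSERT INTO events VALUES (4, 'on'); INSERT INTO events VALUES (5, 'off'); INSERT INTO events VALUES (6, 'on'); COMMIT;"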

The standard configuration of the recorder and its SQLite database backend has received quite some love in the last few years. It doesn't cause flushes to disk as much anymore, by pooling writes and using bigger transactions. However, there are countless other places where things write to disk. Just recently ZHA was found to cause quite some writes (see ZHA database writes (zigbee.db) · Issue #77049 · home-assistant/core · GitHub). But of course there are a lot more (custom) integrations and add-ons which might batter the storage.

In the end the whole Home Assistant platform pushes the limits of cheap storage: we run 24/7 and have a mostly random write workload. From what I can tell, the number of reported broken SD cards went down. But as the platform grows older, and people run their installations for longer, reports of failing flash storage will continue to come in.

I am not against tuning ext4 parameters or even switching to F2FS; however, these things need a bit of thought. We cannot leave existing installations stranded. In general, instead of spreading FUD here, I'd prefer a PR from you so we can discuss the merits of a concrete change :wink:


Measurements (in software) are probably close to impossible with (an unmodified) HaOS on flash storage without SMART (like SD cards). On the other hand, measuring the WAF on SSDs (which feature SMART) doesn't make much sense, as they usually mitigate bad filesystem configurations (like a flush interval of 5 seconds) with a small amount of SLC flash which acts as a buffer before a full page write to the valuable TLC/MLC/… flash happens. That way SSDs can even achieve a WAF < 1 :muscle:
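On SSDs with SMART one can at least compare host writes against NAND writes to estimate the WAF - which attributes are exposed (and what they are called) varies by vendor, so this is only a sketch:

# dump all SMART attributes of the (example) SSD
smartctl -a /dev/sda

# many consumer SSDs expose a host-writes counter (e.g. "Total_LBAs_Written") and a
# vendor-specific NAND-writes counter; WAF is then roughly NAND writes / host writes
smartctl -a /dev/sda | grep -iE 'lbas_written|nand'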

But consulting the technical sheets of a random flash vendor (for example SanDisk/WD) can give some clues:

7.0 SYSTEM OPTIMIZATION TECHNIQUES

7.1 Write Amplification

The higher the write amplification, the faster the device will wear out. Write amplification is directly correlated to the workload. This means pure sequential workload will produce the lowest WA factor and random activity will result in higher WA

HA produces a lot of this random/small workload, which could easily be mitigated by just using a sane commit interval on the filesystem. It's really the only "system wide" and guaranteed-to-work solution, as it's close to impossible to stop random scripts/programs etc. from writing to the filesystem layer (which at some point causes a flash layer write) whenever they want :point_down:

Some actual measurements can be found here, comparing the ext4 default with a custom commit interval of 10 minutes (together with deactivating swap and activating zram). The results:

Before: 8 times within 60 seconds a few bytes were written to the card, now it took 12 minutes for 8 write attempts using larger data chunks

Write Amplification significantly decreased.
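A similar check can be done on HaOS or any other Linux host with nothing more than /proc/diskstats (a sketch - the device name mmcblk0 is just an example for an SD card; use sda or similar for a SATA/USB disk):

# print the total bytes written to the (example) device once a minute
# field 10 of /proc/diskstats is "sectors written"; a sector is 512 bytes
while true; do
  written=$(awk '$3 == "mmcblk0" { print $10 * 512 }' /proc/diskstats)
  echo "$(date +%T)  $written bytes written so far"
  sleep 60
done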

That's a very interesting fact I wasn't aware of at all! Still, it makes me wonder: if this is the default, why (on average) in the linked test scenario was a "dirty" page write performed every 7.5 seconds and not only every 30 seconds? :thinking:

I think we really need to distinguish here. HaOS certainly does push the limits (or maybe goes beyond them), but HA with another install method and "properly" configured filesystem settings does not need to kill (cheap) flash storage fast. My first HA install (for around 2 years) was pip-based on an 8 GB SD card with Armbian on an SBC - that system still runs today (~5 years later) with the same SD card, 24/7 (not with HA anymore, but some other small stuff).

Wearing out flash fast is simply not necessary :warning:

That's very nice to read :+1: - the last time (I think it was on GitHub) I remember you writing that you didn't have any intention of changing this and wanted to stick with the defaults (from the hard disk age).

I would love to be able to deliver that, but all I know is that I can edit /etc/fstab on a Linux install and add commit=600. Besides, until now I expected there would be no chance that a PR like this would be accepted.
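For reference, on a plain Linux install that is a one-line change in /etc/fstab (the UUID and mount point below are placeholders - on HaOS the mount options are generated by the OS image, which is exactly why it needs a PR):

# /etc/fstab - example ext4 data partition with a 10 minute commit interval
UUID=xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx  /mnt/data  ext4  defaults,noatime,commit=600  0  2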

I am not questioning the write amplification problem. It is real, and I am aware.

What I am questioning is whether the commit interval is the culprit.

It can be a factor, for sure! But in the end it's only one variable in the whole system. A commit interval of 10 minutes does nothing if a SQL database calls fsync every 5 seconds.

That is an interesting link indeed; we can probably use the script to monitor writes and see what effect changes have on HAOS.

However, I would be very careful applying their results to HAOS. Multiple changes were made to that system, and swap was also disabled - HAOS has no swap on the storage to begin with. Also, they use an idle system as the benchmark. That is vastly different from a system which actually does something :sweat_smile:

Lastly: a 10 minute commit interval seems really long! I don't think that is acceptable in a production system.

What database system are you using on that installation? With just Core there are fewer opportunities for writes, of course.

I'll prepare a PR, it should be easy enough. I'll probably push it to 30s or 60s; I don't want people to lose too much data. In any case, I'll also make some measurements to see how it affects writes. Just turning knobs without verifying that they have the intended effect is wrong. :no_good_man:

I expect it will have quite some effect on a new/vanilla/non-active installation. However, I expect that my production system (which uses MariaDB and InfluxDB) will not see that much of a difference as these databases likely sync data to disk quite frequently. We’ll see :slight_smile:


I expect it to be one of the most important ones, as it "catches" all the little write attempts on the filesystem layer.

Indeed, if a flush/sync is called, that interval is overridden - but that's actually the idea behind it.

A commit interval of 10 minutes also doesn't mean the system will always wait 10 minutes before it writes. It only means it waits up to a maximum of 10 minutes to write a dirty page - if a whole page can be written after 1 or 2 minutes, the system should flush/sync it, and that should result in a write to the flash storage (from my limited knowledge).

It would indeed be very nice to visualize the harm (hopefully very little) that's caused by the default ext4 commit interval, and it could even show how an increased interval does (hopefully again) less harm/damage to the flash storage by minimizing the wear on the flash cells.

That's as flash friendly as it can get regarding swap and flash storage :+1:
Any idea what is reported as swap on a HaOS system via the systemmonitor integration?


Well, I have had it on all my systems that run from cheap flash (mostly SBCs with SD cards or other eMMC/NAND modules that I don't want to kill) for years and never had any problems with it. The "only" thing to expect is that if there is a sudden power loss, the data lost can (in theory) amount to a maximum of 10 minutes' worth.

Maybe this would be a good setting to actually let the user choose? If someone runs HaOS on, for example, a Raspberry Pi with an SSD, that person might want to keep the 5 seconds, but another user with a USB stick or SD card might rather choose 300, 600 or even 900 seconds and trade a (possible) small data loss on power failure for a greatly (hopefully years) improved lifetime of their flash storage?

Besides, there is almost no cheaper/easier UPS than one for a 5V SBC. With luck, even a random power bank from the drawer can be charged and discharged at the same time and can act as a poor man's UPS :battery: That way the possibility of data loss on power failure can be greatly reduced :ok_hand:

This particular installation is no more (only the SBC and the same SD card are still in use). But I always just used the default DB (SQLite), which - I expect - most users probably do too?

Maybe this could really be a setting? Rather than trying to fit both ends and choosing a value in the middle that doesn't really suit either side…

During onboarding the user could be asked whether "cheap" flash storage like an SD card/USB stick/etc. is used or whether the system is installed on an SSD/HD - based on that, either a high or a low commit interval could be set. Or it could even be set "freely" somewhere in the system/hardware settings if the user has advanced mode enabled.

:muscle:

Do a lot of users use MariaDB? I tried to squeeze it out of the analytics, but by the looks of it it is not listed as an integration. I only found SQL at place 91 (3275 installations or 2.5%) - if that includes MariaDB, then it doesn't look like a frequently used setup (even less so if that were narrowed down to only the HaOS install type).

So while it might make no big difference for you or other people who use more expensive flash storage (like SSDs, which mitigate a low commit interval and also have features like TRIM that SD cards don't), it would be great if an OS that is (also) made for SBCs treated "cheap" flash gracefully :heart_decoration:

We use ZRAM (compressed RAM block device) as a swap “device”. Essentially trading CPU cycles for more memory.
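For anyone curious, that zram swap can be inspected from a host shell (a sketch - exact device names depend on the system):

# list active swap devices - with zram swap this should show a /dev/zram* entry
swapon --show

# show size, compression algorithm and actual memory used by the zram device(s)
zramctl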

The add-on analytics says 17.7%, see Installations | Home Assistant Analytics.

The recorder backend uses SQLAlchemy, and I expect that SQLite and MariaDB cause somewhat similar writes (as we pool writes and transactions the same way). Of course there will be some difference, but it won’t be huge.

I only have this production instance. I can try to generate some data on test instances running SQLite, but it won't be fully representative :slight_smile:


Hi, I recently migrated my installation from an SD card to the internal SSD of a generic x64 machine (an old laptop), and now GRUB will not start the OS automatically; I have to manually hit "enter" to boot it. Does anyone know how to fix this?

Editing the boot options is mostly done from the OS once it is booted. I tried to use the GRUB command line too, but I can't seem to persist any edit I make to the boot menu. So you'll need an SSH connection to the host once it is booted and then edit the options from the OS. You'll need to locate the grub.cfg file, afaik.
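As a rough sketch of what to look for once you have located the grub.cfg that HAOS uses (I haven't verified the exact file layout, so treat these directives only as the generic GRUB way to auto-boot the default entry):

# inside grub.cfg: boot the default menu entry automatically after 5 seconds
set default=0
set timeout=5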
