Raspberry Pi / filesystem stability

Hi there, I love Raspberry Pi, I love Home Assistant. I like everything except the filesystem stability. I run HA on about 4 RPis. Everything works well except those bloody SD cards. I have tried a lot of different scenarios, but almost every time I end up with a totally broken SD card and have to reinstall everything from scratch. I guess the trouble is the SD cards themselves. I have read a lot about similar problems with RPis - the cards are simply not designed for such heavy read/write activity.

Also, I have one RPi running Kodi / RetroPie connected to our TV - this one boots via BerryBoot from an iSCSI target on my Synology server. For Kodi/RetroPie it works well, but it’s not so stable with HA.

Anybody have a working solution? The only thing left I can think of is buying an SSD and connecting it to the RPi, but I don’t like the price/power issues/booting/another box… etc.

Any ideas, solutions?

Thanx a lot
Maxim

You are the second person in a week to make this statement. Can you tell me where you get the data from? It’s just that many people seem to run fine for months without a problem, so I’m not sure if this is unexpected, or if the read/write activity really is a problem.

So, the data comes from my experience. Currently I have about… eeh, 10 RPis, both at home and at work. I use RPis for AirPlay speakers, print servers, home multimedia systems, streamers, etc. But somehow I only get trouble with HA. I guess it has something to do with read/write activity - a print server or home multimedia system doesn’t write nearly as often. And yes, even very small HA setups are fine. But at home I have a lot of sensors and other stuff connected to my RPi, and that’s the one where I have the most problems…

1 Like

Now I’ve found some new Hass.io distributions, so I can try those…

I have a Pi that has been going strong for a few years. It boots the kernel off the SD card and everything else is over NFS.

You could run NFS through the cloud if you can’t do it locally. I haven’t tried it, but I’m guessing it would work, just more slowly.
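
In case it’s useful, the setup is roughly the following; the server address, export path and NFS options below are placeholders for my own, so adjust to taste, and make sure your kernel has NFS root support built in (the stock Raspbian kernel does).

# /boot/cmdline.txt on the SD card (all on one line) - root comes from an NFS export
console=serial0,115200 console=tty1 root=/dev/nfs nfsroot=192.168.1.10:/exports/rpi-root,vers=3 rw ip=dhcp rootwait

# /etc/fstab inside the NFS root - no SD card root entry, just proc and the boot partition
proc            /proc  proc  defaults  0  0
/dev/mmcblk0p1  /boot  vfat  defaults  0  2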

1 Like

I’ve found that while my first-generation RPi has regular corruption issues, none of my RPi 3 units have had any issues. I’ve largely stuck with SanDisk, some Ultra, some basic.

Also, the Pi3 can be configured for network boot (no faffing with BerryBoot) should you wish.
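
For reference, enabling that on a Pi 3B is a one-time OTP change - a rough sketch of the steps is below. Note the bit can never be unset once programmed, so check the official docs before doing it.

$ echo program_usb_boot_mode=1 | sudo tee -a /boot/config.txt
$ sudo reboot
$ vcgencmd otp_dump | grep 17:    # should now show 17:3020000a
# the config.txt line can be removed afterwards; the boot mode is stored permanently in OTP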

1 Like

I have been looking around for tools to measure disk writes. iostat from the sysstat package seems to give I/O rates for the SD card. I get

$ iostat -d -x 60
Linux 4.9.24+ (raspberrypi)     10/07/17        _armv6l_        (1 CPU)

Device:         rrqm/s   wrqm/s     r/s     w/s    rkB/s    wkB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
mmcblk0           0.02     2.98    0.26    3.64     2.20    39.44    21.36     0.20   51.09    4.11   54.43   7.11   2.77

Device:         rrqm/s   wrqm/s     r/s     w/s    rkB/s    wkB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
mmcblk0           0.00     2.98    0.17    3.61     0.68    38.77    20.88     0.07   17.76    1.00   18.54   3.23   1.22

Device:         rrqm/s   wrqm/s     r/s     w/s    rkB/s    wkB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
mmcblk0           0.00     3.16    0.10    3.61     0.41    39.56    21.52     0.05   14.70    0.00   15.12   6.80   2.53

Device:         rrqm/s   wrqm/s     r/s     w/s    rkB/s    wkB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
mmcblk0           0.00     3.37    0.05    3.98     0.20    42.74    21.31     0.04   11.01    3.33   11.11   3.40   1.37

It’s quite surprising that it reports roughly 39 kB/s of writes for my Pi. Can anyone confirm this is normal? It would be interesting to compare a Pi running HA with one that doesn’t.

@gpbenton You can check out (link - two thirds of the way down the page) for a speed comparison of various SD cards. Their benchmarks use “dd” to get sequential read/write speed and clear the buffers between tests. It’s probably best done after halting HA, as that would likely affect the results significantly. I’d be curious whether SD card speed has a material impact on the real-world performance of HA.

Anyway, I get the following on an RPi3 with a SanDisk USB boot drive. The first result is the write speed and the second is the read speed:

pi@hassbian:~ $ sudo systemctl stop home-assistant@homeassistant.service
pi@hassbian:~ $
pi@hassbian:~ $ sync; dd if=/dev/zero of=~/test.tmp bs=500K count=1024
524288000 bytes (524 MB) copied, 18.4087 s, 28.5 MB/s
pi@hassbian:~ $
pi@hassbian:~ $ sync; echo 3 | sudo tee /proc/sys/vm/drop_caches
pi@hassbian:~ $
pi@hassbian:~ $ sync; dd if=~/test.tmp of=/dev/null bs=500K count=1024
524288000 bytes (524 MB) copied, 15.5142 s, 33.8 MB/s
pi@hassbian:~ $
pi@hassbian:~ $ rm test.tmp
pi@hassbian:~ $ sudo systemctl start home-assistant@homeassistant.service

I am more concerned about the number of writes, which cause wear on the SD card, than about their speed.

OK, as a comparison, here is a set of results taken before/after starting up HA. I’m getting 15-18 writes per second consistently in steady state after a big spike during start-up. The steady-state rate is due to a raft of smart switches sending meter readings every 10 seconds. This creates a ton of events, all being logged to the HA database.

pi@hassbian:/home/homeassistant/.homeassistant $ iostat -d -x 60
Linux 4.9.35-v7+ (hassbian)     10/07/17        _armv7l_        (4 CPU)

                rrqm/s   wrqm/s     r/s     w/s    rkB/s    wkB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
HA not running    0.00     0.15    0.00    0.20     0.00     1.40    14.00     0.00    0.00    0.00    0.00   0.00   0.00
HA Start-up       0.00   114.13    0.28  134.05     3.80  1336.13    19.95     0.57    4.26    9.41    4.25   1.27  17.12
HA Steady-state   0.00    14.57    0.00   18.77     0.00   187.33    19.96     0.09    4.94    0.00    4.94   1.21   2.27

Repeat of the above, but with History/Logbook/Recorder/Logger turned off:

                rrqm/s   wrqm/s     r/s     w/s    rkB/s    wkB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
HA not running    0.00     0.15    0.00    0.20     0.00     1.40    14.00     0.00    0.00    0.00    0.00   0.00   0.00
HA Start-up       0.00     1.00    0.00    1.02     0.00    10.07    19.80     0.02   18.20    0.00   18.20   4.59   0.47
HA Steady-state   0.00     0.33    0.00    0.60     0.00     6.33    21.11     0.00    1.39    0.00    1.39   0.83   0.05

So in a normal steady-state system, your machine is writing 187 kBytes/s, whereas mine is 40 kBytes/s. Considering I also run InfluxDB on my Pi, I find this a little strange.

Do you experience any SD card failures?

I suspect we have different message profiles. Currently I have a bunch of devices sending meter readings at a high rate into HA with full event logging. Each reading is around 5-6 messages (current, voltage, power, energy, …) and these are all adding up to a ton of database writes. Whereas you may (?) have a less “chatty” network albeit with two sets of writes (HA and InfluxDb). I have Influx running on a separate server so the RPi isn’t having to carry that load as well. So the numbers could be right.

For me, SD card issues have generally had more to do with unexpected power outages leaving the SD card in a corrupt state than with writes wearing out the card. I had a power outage recently that messed things up and decided to move to an RPi3 with a USB flash drive mounted instead. Not sure it will make a difference, but we’ll see. Best is a sound backup strategy, of course. 8)

1 Like

sound backup? :slight_smile:

You gotta love the global community. I re-read that sentence and did smile at how it could be misinterpreted! Of course, “sound” is Irish for “robust”. In my case, backup strategy is an emergency crate of Guinness to help me mourn the lost data!

2 Likes

I love Guinness and Ireland! By “sound backup” I imagined my old Atari or ZX Spectrum connected to a tape recorder and making backups. Yep, and now it’s time to open my Budweiser Budvar, cheers from Czechia!

I’ve seen similar problems - already two cards burned by Hass in about 1.5 years.
From what I’ve seen, the root cause is not the write speed of the SD card, but rather the use of SQLite as the database. SQLite does a filesystem flush after every change to the database, which not only alters the blocks of the database on disk but also the filesystem metadata of the database file. This prevents the Linux kernel from grouping and optimizing disk writes, but on the other hand it makes sure the on-disk database is always consistent and up-to-date.
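
If you want to see this behaviour for yourself, one rough way (assuming Hass runs as a normal process whose command line contains “hass”, as on a Hassbian install) is to attach strace for a minute and count the flush calls:

$ # count fsync/fdatasync calls made by the Home Assistant process over 60 seconds
$ sudo timeout 60 strace -f -p "$(pgrep -f hass | head -n 1)" -e trace=fsync,fdatasync -c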

Now the weak point of flash media is not write cycles as such, but erase cycles: flash blocks wear out from being erased. Flash media controllers try to even out the wear by shuffling logical blocks around, but there is only so much wear leveling an SD card can do. Given the amount of writes (and syncs) you get with a Hass system handling many events, you can imagine the wear on the SD card goes well beyond the casual use SD card vendors expect.

I have been thinking about how to solve this for quite some time, and I’ve come up with some possible directions for a solution. Here they are, in no particular order:

  1. Create a tmpfs and put the SQLite database on it. Works great, but the database will not be persistent over crashes or reboots.
  2. As 1, but have a cron job copy the database back to disk every hour. At boot, copy the database from disk to the tmpfs, start Hass, and off you go (see the sketch further down).
  3. Put the SQLite database on NFS storage backed by something that isn’t flash, or by a lying NFS server that tells the client “yes, I have synced your blocks to disk” while actually waiting for a good moment to group that bunch of writes and commit them.
  4. Tell Hass to use an in-memory SQLite database. Same as option 1 with less fiddling, but not even persistent across Hass restarts.

Configure like this:

recorder:
  db_url: sqlite://

So far those are the easy options that don’t require any changes to Hass.
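
For option 2, a minimal sketch could look like the following. The paths and sizes are assumptions (on Hassbian the database sits at /home/homeassistant/.homeassistant/home-assistant_v2.db), and it assumes the sqlite3 command-line tool is installed; using its “.backup” command instead of a plain cp gives a consistent snapshot even while Hass has the database open.

# mount a tmpfs for the live database and let the homeassistant user write to it
$ sudo mkdir -p /mnt/hadb
$ echo 'tmpfs /mnt/hadb tmpfs defaults,size=256m 0 0' | sudo tee -a /etc/fstab
$ sudo mount /mnt/hadb && sudo chown homeassistant:homeassistant /mnt/hadb

# point the recorder at the tmpfs copy in configuration.yaml:
#   recorder:
#     db_url: sqlite:////mnt/hadb/home-assistant_v2.db

# at boot, before Hass starts: restore the last snapshot from the SD card, if there is one
$ cp /home/homeassistant/.homeassistant/home-assistant_v2.db /mnt/hadb/ 2>/dev/null || true

# hourly cron entry (crontab -e for the homeassistant user): snapshot the live DB back to the SD card
0 * * * * sqlite3 /mnt/hadb/home-assistant_v2.db ".backup /home/homeassistant/.homeassistant/home-assistant_v2.db"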

A solution that needs coding would be to tell Hass to put the SQLite database in memory (an RPi 3 has sufficient memory for that) and have Hass back the database up to disk once every hour (or on whatever configurable interval). This is very easy to do in C; the SQLite documentation even has example code. Unfortunately I haven’t found a way in SQLAlchemy to get at the underlying SQLite connection, and even if I had, the pysqlite library doesn’t support the backup functions the C library does.

Regards,
Erik

Would switching the database to MySQL reduce the number of erase cycles?

A good way to go might be to just have the DB written to disk when doing a reboot.

Though you’ll have to be careful, because a Hass database can easily grow to hundreds of MB.

I don’t know if switching to MySQL or MariaDB reduces the number of erase cycles. I think it does, because in my experience MySQL can be tuned to reduce the number of disk updates, at the cost of possibly ending up with a corrupted database after a crash. It would be worth trying.

Another possibility would be to use a MySQL database on a remote host.
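
For what it’s worth, pointing the recorder at a remote MySQL/MariaDB server is just a db_url change; the host, credentials and database name below are placeholders, and you need a MySQL driver (e.g. mysqlclient, or pymysql with a mysql+pymysql:// URL) installed in the Hass environment:

recorder:
  db_url: mysql://hass:secret@192.168.1.20/homeassistant?charset=utf8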

Regards,
Erik

After burning out my first SD card within a few weeks of running Hass, I attached a 1.8" USB HDD to my RPi3. Now I use the SD card only for the boot partition - USB boot did not work for me. Running everything from the HDD feels faster than the SD card did.

Today I would just buy a cheap USB SSD.
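
For anyone wanting to replicate the boot-partition-on-SD / root-on-USB split, the rough idea (after copying the existing root filesystem to the USB drive, e.g. with rsync, and assuming the USB root partition turns up as /dev/sda2) is:

# find the PARTUUID of the USB root partition
$ sudo blkid /dev/sda2

# /boot/cmdline.txt on the SD card: point root= at the USB partition (PARTUUID is illustrative)
root=PARTUUID=xxxxxxxx-02 rootfstype=ext4 rootwait

# /etc/fstab on the USB root: make / point at the same PARTUUID
PARTUUID=xxxxxxxx-02  /  ext4  defaults,noatime  0  1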

1 Like