This evening I came home and noticed our covers were not going down. I opened the HA interface and saw a popup dialog stating that some .js file could not be loaded. Then I noticed none of the buttons worked anymore, so I restarted my Raspberry Pi 4.
But now it doesn’t boot anymore. The interface does not come up and none of the automations are working. I can ping the Pi, but SSH is not working.
When I plug in a monitor I see the same log messages repeating (not necessarily in this order):
docker0: port 1 entered blocking state
docker0: port 1 entered disabled state
docker0: port 1 entered forwarding state
IPv6 ADDRCONF(NETDEV_CHANGE): link becomes ready
IPv6 ADDRCONF(NETDEV_CHANGE): link is not ready
and then some audit log entries of type 1700, 1300 and 1327.
Any help please — this is the third installation that has become corrupted without my doing anything special.
It makes no sense to “like” a post about corrupt SD cards, … but yes.
Just to add: if you’re using balenaEtcher, make sure it successfully verifies the image after flashing. Alternatively, on Linux, compare the checksum of the image file with the checksum of the same number of bytes read back from the card via dd.
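In case it helps, that read-back check can be sketched like this. This demo runs against regular files so it’s safe to try anywhere; with a real card you’d skip the copy step (Etcher already did the flashing) and point DEV at the block device, e.g. /dev/sdX, after double-checking which device it is with lsblk:

```shell
# Sketch: verify a flashed image by comparing checksums.
# IMG is your image file; DEV stands in for the SD card's block device.
IMG=demo.img
DEV=demo-card.img

# Stand-in for flashing: copy the image onto the "device".
printf 'pretend this is a disk image' > "$IMG"
cp "$IMG" "$DEV"

# The card is usually larger than the image, so only hash as many
# bytes as the image contains when reading the device back.
SIZE=$(stat -c%s "$IMG")
IMG_SUM=$(sha256sum "$IMG" | cut -d' ' -f1)
DEV_SUM=$(head -c "$SIZE" "$DEV" | sha256sum | cut -d' ' -f1)

if [ "$IMG_SUM" = "$DEV_SUM" ]; then
  echo "verified: card contents match the image"
else
  echo "MISMATCH: reflash the card" >&2
fi
```

On a real device you’d run the read-back half with sudo; `head -c` does the same job as counting out blocks with dd, just with less arithmetic.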
Probably yes, but I don’t get it. A friend of mine is using the same setup (older Pi though) and is using HA without any troubles. So why does this happen to me for the third time…
Yes, I did that the past 3 times.
Is there any option I can use to not burn through SD cards?
After 2 dead cards (2-month and 4-month lifespans) I’ve done the following:
bought a Samsung Pro Endurance card (built with older but more robust MLC flash; there are SLC cards still on the market, but I couldn’t be bothered to spend more time looking)
I changed snapshots to daily instead of hourly.
I disabled the recorder by default and built a short whitelist of entities; I rely fully on InfluxDB for history.
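For what it’s worth, the recorder whitelist is only a few lines in configuration.yaml. A minimal sketch — the entity names here are hypothetical placeholders you’d swap for your own, and the exact days to keep is just my choice:

```yaml
# Record only the listed entities; everything else is skipped.
recorder:
  purge_keep_days: 3   # keep local history short; InfluxDB holds the long-term data
  include:
    entities:
      - sensor.living_room_temperature   # hypothetical entity names
      - binary_sensor.front_door
```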
So far it’s been running fine for a year (I had 1 breakage, but was able to reflash and recover). With the previous broken cards, Etcher would refuse to finish writing the whole install. I took scissors to those cards and threw them out.
Yes, this was before I’d learned how they were implemented, and how heavyweight they are — I’d set it up to save 24 hourly, 1 daily for 7 days, 1 weekly kept forever.
I’d naively expected they were using some kind of device mapper / LVM / … mechanism, and that my Google Drive add-on would do the reading, archiving and uploading, and would eventually discard the snapshot.
Turns out it’s implemented using docker save, and every snapshot actually makes at least 1 full copy of everything. So if you have historical database data that rarely changes, it still gets copied over and over and over again, every time a snapshot runs. My SQLite history db was 6 GB/week, so … the microSD wore out. InfluxDB is much more compact.
Yes, but it seems much more lightweight, both in bytes and in duty cycle. And if I want a temperature diagram for the last 6 months on a graph, I can actually get that in Grafana in less than 10 s, on that same Pi.
I figure this comes down to the data being stored in InfluxDB’s compressed, columnar TSM format, compared to what HA does by default: HA has to fetch SQLite database rows and convert them to Python objects via SQLAlchemy, just so some JSON can be extracted, then parsed, then organized into a list of values for graphing. It’s actually surprising it keeps working.