HA crashes after migration to SSD

Fete · June 26, 2022, 9:57am

Hi everyone!

I migrated HA to an SSD yesterday, and it’s not working very well at all… The issue is that HA feels really slow, it doesn’t start add-ons on reboot, and it crashes, sometimes 5-10 minutes after a restart, though it can also last a few hours.

I followed this guide for the migration.

It went pretty well, except that most of my add-ons were not carried over and I had to do a partial restore on them. I also noticed that the backup I took just before changing to the SSD was much larger than the backup that was created when I set up the Home Assistant Google Drive Backup add-on. The first one was 297.3 MB and the one created after I restored was only 106 MB. To me that indicates that not everything was “carried over”. It might have been the missing add-ons… or something else, which is causing this problem!
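One way to see what accounts for a size difference like this is to compare the file lists of the two backups. The following is only a minimal sketch, assuming both backups have been downloaded to a PC and are plain tar archives that Python’s `tarfile` module can open; the function names and the file names in the usage comment are made up for illustration.

```python
import sys
import tarfile


def list_members(path):
    """Return {member name: size in bytes} for every regular file in a tar archive."""
    with tarfile.open(path) as archive:
        return {m.name: m.size for m in archive.getmembers() if m.isfile()}


def missing_members(old_path, new_path):
    """Names present in the old archive but absent from the new one, sorted."""
    old = list_members(old_path)
    new = list_members(new_path)
    return sorted(set(old) - set(new))


if __name__ == "__main__" and len(sys.argv) == 3:
    # Usage (hypothetical file names): python compare_backups.py before.tar after.tar
    for name in missing_members(sys.argv[1], sys.argv[2]):
        print("only in the first backup:", name)
```

Anything listed as only being in the first backup is a candidate for what was not carried over.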

I have tried to search the logs in HA after restarting it (which has to be done by pulling the power to the RPI), but I can’t find anything relevant, and perhaps I’m looking in the wrong place.

Connecting a monitor and a keyboard to the RPI gave me this:

I performed the “supervisor repair” yesterday and that seemed to do the trick, but apparently not… The RPI is unresponsive and has to be rebooted by cutting power.

When HA is up and running, I can access the RPI through the SSH terminal, if that could help with troubleshooting.

While typing this, HA went down and was unreachable, and the RPI wouldn’t take input from the keyboard. It jumped back up by itself for a while, and is now down again… I can however access deCONZ from VNC Viewer. Sometimes that’s not possible and all my switches, lights, sensors etc. won’t work.

BTW: Is there a way to access the RPI through VNC from a Windows PC? I know it can be done through HA and the terminal add-on, but since HA keeps going down, that method is unreliable…

System setup:
Raspberry Pi 3b+ with ethernet cable connected from router.
Official power supply 5.1V / 2.5A
Conbee 2 (no extension cable)
SSD Kingston SA400S37 120GB (on extension/data cable)
TooQ case for SSD
Home Assistant Core 2022.6.7
Home Assistant Supervisor 2022.05.3
Home Assistant OS 8.2

The RPI power checker in HA shows no problems: a solid green light for the past week.

System health in HA (when it’s up…) gives me this:

|Version|core-2022.6.7|
| --- | --- |
|Installation Type|Home Assistant OS|
|Development|false|
|Supervisor|true|
|Docker|true|
|User|root|
|Virtual Environment|false|
|Python Version|3.9.12|
|Operating System Family|Linux|
|Operating System Version|5.15.32-v8|
|CPU Architecture|aarch64|
|Timezone|Europe/Stockholm|

### Home Assistant Community Store

[MANAGE](http://homeassistant.local:8123/hacs)

|GitHub API|ok|
| --- | --- |
|Github API Calls Remaining|5000|
|Installed Version|1.17.2|
|Stage|running|
|Available Repositories|1062|
|Installed Repositories|6|

### Home Assistant Cloud

[MANAGE](http://homeassistant.local:8123/config/cloud)

|Logged In|true|
| --- | --- |
|Subscription Expiration|January 1, 2018, 01:00|
|Relayer Connected|false|
|Remote Enabled|true|
|Remote Connected|false|
|Alexa Enabled|false|
|Google Enabled|false|
|Remote Server||
|Reach Certificate Server|ok|
|Reach Authentication Server|ok|
|Reach Home Assistant Cloud|ok|

### Home Assistant Supervisor

|Host Operating System|Home Assistant OS 8.2|
| --- | --- |
|Update Channel|stable|
|Supervisor Version|supervisor-2022.05.3|
|Agent Version|1.2.1|
|Docker Version|20.10.14|
|Disk Total|111.1 GB|
|Disk Used|9.8 GB|
|Healthy|true|
|Supported|true|
|Board|rpi3-64|
|Supervisor API|ok|
|Version API|ok|
|Installed Add-ons|deCONZ (6.14.1), Grafana (7.2.0), Home Assistant Google Drive Backup (0.108.2), SSH & Web Terminal (9.0.1), File editor (5.3.3), Check Home Assistant configuration (3.9.0), ESPHome (2022.5.1), InfluxDB (4.2.1), Samba share (9.5.1)|

### Dashboards

[MANAGE](http://homeassistant.local:8123/config/lovelace)

|Dashboards|1|
| --- | --- |
|Resources|1|
|Views|1|
|Mode|storage|

### Recorder

|Oldest Run Start Time|June 17, 2022, 08:17|
| --- | --- |
|Current Run Start Time|June 26, 2022, 10:41|
|Estimated Database Size (MiB)|691.97 MiB|
|Database Engine|sqlite|
|Database Version|3.34.1|

### Core Statistics

Processor usage:

2.6 %

Memory usage:

23.5 %

### Supervisor Statistics

Processor usage

30.6 %

Memory usage

6.6 %

Not sure where to look for errors, so any help would be appreciated!

Fete · June 26, 2022, 10:08am

Since HA managed to kick itself back up, I just found some info in the core log. The supervisor log and most of the others can’t be retrieved: I get “Couldn’t fetch supervisor logs, Unknown error, see supervisor logs” and also “core_check_config logs, 502: Bad Gateway” when trying to get the Check Home Assistant configuration log.

Core log:


Could not fetch stats for core_configurator:

12:05:03 – (WARNING) Home Assistant Supervisor - message first occurred at 11:13:10 and shows up 49 times

Timeout on /supervisor/stats request

12:05:03 – (ERROR) Home Assistant Supervisor - message first occurred at 11:08:13 and shows up 58 times

Failed to to call /addons/a0d7b954_ssh/stats -

12:01:13 – (ERROR) Home Assistant Supervisor - message first occurred at 11:08:13 and shows up 7 times

The 'discovery' option near /config/configuration.yaml:53 is deprecated, please remove it from your configuration

11:43:15 – (WARNING) MQTT - message first occurred at 10:42:32 and shows up 6 times

Invalid config for [automation]: Failed to load blueprint: Unable to find EPMatt/ikea_e1743.yaml (See /config/configuration.yaml, line 11).

11:43:15 – (ERROR) components/blueprint/models.py - message first occurred at 10:42:34 and shows up 29 times

Invalid config for [template]: [platform] is an invalid option for [template]. Check: template->platform. (See /config/configuration.yaml, line 52).

11:43:15 – (ERROR) config.py - message first occurred at 10:48:35 and shows up 21 times

Error fetching hassio data: Error on Supervisor API:

11:18:39 – (ERROR) Home Assistant Supervisor

Catching up, dropped 1 old events.

11:17:14 – (WARNING) InfluxDB

Updating state for sensor.consumption_56_cost (<class 'homeassistant.components.energy.sensor.EnergyCostSensor'>) took 0.524 seconds. Please create a bug report at https://github.com/home-assistant/core/issues?q=is%3Aopen+is%3Aissue+label%3A%22integration%3A+energy%22

11:12:09 – (WARNING) helpers/entity.py

custom-components/templatesensor - Repository is archived.

10:43:23 – (WARNING) HACS (custom integration)

You have custom-components/templatesensor installed with HACS this repository has been removed, please consider removing it. Removal reason (remove)

10:43:21 – (WARNING) HACS (custom integration)

Timeout of 20 reached while waiting for https://api.github.com/repos/hacs/default/contents/plugin

10:43:21 – (ERROR) HACS (custom integration) - message first occurred at 10:43:21 and shows up 3 times

Detected code that uses str for device registry entry_type. This is deprecated and will stop working in Home Assistant 2022.3, it should be updated to use DeviceEntryType instead. Please report this issue.

10:42:52 – (WARNING) helpers/frame.py

Can't connect to ESPHome API for temperaturer-vedpanna @ 192.168.8.190: Error connecting to ('192.168.8.190', 6053): [Errno 113] Connect call failed ('192.168.8.190', 6053)

10:42:51 – (WARNING) /usr/local/lib/python3.9/site-packages/aioesphomeapi/reconnect_logic.py

Config entry '192.168.0.2' for upnp integration not ready yet: Device not discovered: uuid:0DE4D3C6-7564-02B9-5CD2-FC75169DF5ED::urn:schemas-upnp-org:device:InternetGatewayDevice:1; Retrying in background

10:42:47 – (WARNING) config_entries.py

Invalid config for [automation]: Failed to load blueprint: Unable to find EPMatt/ikea_e1743.yaml (See /config/configuration.yaml, line 11).

10:42:33 – (ERROR) components/blueprint/models.py

Setup of scene platform homeassistant is taking over 10 seconds.

10:42:32 – (WARNING) Scene - message first occurred at 10:42:32 and shows up 2 times

Setup of update platform hassio is taking over 10 seconds.

10:42:32 – (WARNING) Update

Setup of application_credentials is taking over 10 seconds.

10:42:32 – (WARNING) /usr/local/lib/python3.9/asyncio/events.py - message first occurred at 10:41:55 and shows up 15 times

Failed intial setup.

10:42:03 – (ERROR) airthings_wave (custom integration)

Disconnected

10:42:03 – (ERROR) airthings_wave (custom integration)

Setup of binary_sensor platform hassio is taking over 10 seconds.

10:41:55 – (WARNING) Binary sensor

Setup of sensor platform airthings_wave is taking over 10 seconds.

10:41:55 – (WARNING) Sensor - message first occurred at 10:41:55 and shows up 2 times

Cannot connect to InfluxDB due to 'HTTPConnectionPool(host='a0d7b954-influxdb', port=8086): Max retries exceeded with url: /write?db=Homeassistant (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7fadd92100>: Failed to establish a new connection: [Errno 111] Connection refused'))'. Please check that the provided connection details (host, port, etc.) are correct and that your InfluxDB server is running and accessible. Retrying in 60 seconds.

10:41:55 – (ERROR) InfluxDB

Ended unfinished session (id=143 from 2022-06-26 04:10:37.930239)

10:41:11 – (WARNING) Recorder

The system could not validate that the sqlite3 database at //config/home-assistant_v2.db was shutdown cleanly

10:41:10 – (WARNING) Recorder

le_top · June 26, 2022, 10:26am

I did not follow any specific guide when upgrading my NUC’s HDD to an SSD.
However, just cloning the HDD did not work well, and I ended up doing a fresh HA install.

There is a method to recover from a backup when using a fresh install but that did not work for me - likely because it was not ready on a USB drive and the setup could not detect it. I did not want to re-initialize the SSD again, so I went for another method.

So as far as I remember, I ended up mounting the two disks in a VirtualBox VM and copying the partition with all the Docker images over to the new disk, replacing the contents of the corresponding partition that was created by the fresh SSD install.
I didn’t plug my Zigbee stick into the new setup until I was sure that it was running (to avoid issues related to internal packet counters being out of sync).

Moving from an SD card to an SSD on a Raspberry Pi (which I suppose you are doing) changes the boot sequence, but I think you should be able to restore from a backup on a fresh install or use a method similar to the one I eventually used.

Fete · June 26, 2022, 10:32am

Thank you for your reply!

I did try to restore from a backup on a fresh install made with Raspberry Pi Imager on a Windows PC.

I was trying to move HA to the SSD to avoid a future problem with a corrupt SD card, which seems to be very common. Mine has been running for about 11 months now, so I figured it could happen any time.

Didn’t expect this new problem to arise though…

I guess I could try to restore from the latest backup again, but I doubt that it will make any difference… I mean, it’s the same info being written to the SSD again…

le_top · June 26, 2022, 10:42am

There is another ‘restore from a backup’ guide here: How to restore a backup.

I just wanted to share my experience and “not in the book” method.

Fete · June 26, 2022, 11:01am

Thanks! I read that and as it says

" 1) Fix the issue that caused the crash.

There is little point restoring if it is just going to happen again."

I really hope that someone will come along here and see what’s causing this.

I try to deal with the warnings and errors put in the core log, but HA just keeps going down… Frustrating!

How is the “not in the book” method you used different from just cloning the HDD?

le_top · June 26, 2022, 1:10pm

Cloning the HDD also clones the MBR, the partition table, and partitions that IMHO are used for startup, possibly supervisor and some other stuff.

In my case that did not work out. Maybe the MBR was not fine, the partition alignment was not OK, or whatever. Cloning resulted in a boot issue on my NUC that is also mentioned somewhere on this forum.

By setting up a fresh system, the MBR, partition table and other partitions are created anew, and booting that system on my NUC worked. By then copying over only the last partition, where the configuration files, data, and add-ons are located (and possibly also the actual HA code), the boot setup/system comes from the fresh install but the data comes from the previous install.

Fete · June 26, 2022, 4:13pm

I see, thank you for elaborating!

In my case I really don’t want to make another install or restore.

The problem continues, HA goes up and down, up and down… Seems like when it’s up, a certain way to make it crash is to ask for logs or check statistics from various entities.

This is really frustrating and the WAF is seriously damaged.

If anyone could tell me where to look or what I might try, I would be really grateful.

Fete · June 27, 2022, 5:49am

Got this, out of memory. Anyone seen that before?

Fete · June 27, 2022, 7:39pm

Now I’m thinking it might be related to the memory swap. Seems to fill up quickly and it’s at 100%.

I will try my best to resolve this issue, and I will post here what fixed it once I get it sorted.

My next move will probably be to try and increase the size of available memory for the swap, or try to disable addons and integrations one by one until I find the culprit.
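To put a number on how full the swap is, the `SwapTotal` and `SwapFree` fields of `/proc/meminfo` can be compared. This is a minimal sketch, assuming it is run somewhere Python 3 is available and the Linux `/proc/meminfo` file is readable; the function names are made up for illustration.

```python
import os


def parse_meminfo(text):
    """Parse /proc/meminfo-style text into {field name: integer value (kB for most fields)}."""
    values = {}
    for line in text.splitlines():
        name, _, rest = line.partition(":")
        parts = rest.split()
        if parts and parts[0].isdigit():
            values[name.strip()] = int(parts[0])
    return values


def swap_used_percent(meminfo):
    """Percentage of swap in use; 0.0 if no swap is configured."""
    total = meminfo.get("SwapTotal", 0)
    free = meminfo.get("SwapFree", 0)
    if total == 0:
        return 0.0
    return 100.0 * (total - free) / total


if __name__ == "__main__" and os.path.exists("/proc/meminfo"):
    with open("/proc/meminfo") as f:
        print(f"swap used: {swap_used_percent(parse_meminfo(f.read())):.1f} %")
```

Running it before and after disabling an add-on gives a rough idea of how much that add-on contributes to the swap pressure.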

Still grateful for any help!

Fete · June 27, 2022, 7:49pm

Also getting this:

stevemann · June 27, 2022, 7:56pm

I would expect this with a marginal power supply.

I have “migrated” my HA twice: the first time from a Pi3 to a Pi4, and the second time to an Intel NUC. In both cases I took a snapshot/backup, did a fresh install of HA on the new system, and then restored from the snapshot.

CO_4X4 · June 27, 2022, 8:29pm

I’ve done a few SD to SSD migrations of Home Assistant and it always works fine. The one thing you have to be aware of is that your database may have gotten corrupted when you restored, because it was backed up while open. You can resolve this by stopping HA via ha core stop and backing up the database; once you have restored everything, hand-copy the DB over and start HA again.
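Whether the SQLite database actually got corrupted can be checked with SQLite’s built-in `PRAGMA integrity_check`. This is a minimal sketch, assuming it is run against a copy of the database file (the core log above places it at /config/home-assistant_v2.db) while HA is stopped; the function name is made up for illustration.

```python
import os
import sqlite3
import sys


def check_database(path):
    """Run SQLite's integrity check and return the reported rows (["ok"] if healthy)."""
    if not os.path.exists(path):
        # sqlite3.connect() would silently create an empty file, so fail early instead.
        raise FileNotFoundError(path)
    conn = sqlite3.connect(path)
    try:
        rows = conn.execute("PRAGMA integrity_check").fetchall()
    finally:
        conn.close()
    return [row[0] for row in rows]


if __name__ == "__main__" and len(sys.argv) == 2:
    # Usage: python check_db.py /path/to/copy/of/home-assistant_v2.db
    for message in check_database(sys.argv[1]):
        print(message)
```

A healthy database reports a single line, `ok`; anything else lists the problems SQLite found.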

For me, this upgrade always is followed by upgrading to MariaDB so that restoring isn’t as much of a potential problem. It could be that you just have some left over junk and a bad DB causing you headaches.

Fete · July 22, 2022, 10:37am

After increasing the swap size as per these instructions: How to increase the swap file size on Home Assistant OS, HA is back up and stable again.

Now the swap usage is about 25-30%, instead of the 99-100% it was at when HA froze and crashed.

Would sincerely recommend anyone with the same problem to try the above method.

Thank you to all the nice ppl who replied in this thread