Homeassistant.local messed up after updating OS from 11.5 to 12.0

kalmanlengyel · March 12, 2024, 8:02am

Hi,

Over the last two days my beloved HA has mucked up my life. Two days ago I clicked to update the OS to 12.0, and then the nightmare began. I lost the connection to the local UI (:8123). I have read tons of topics and learnt a lot, but found no solution.
First, my config: Raspberry Pi 4, Core 2024.2, OS 11.5.

I connected the device to my monitor and tried to find a solution via the CLI.

First I realized there was no IP address, which was my mistake: I had used a faulty Ethernet cable. (I switched on the Wi-Fi and could see an IP address, but port 8123 was not open. Fing is my friend.) Later I changed the cable, and voila, I got an IP address.
I realized the OS update had not finished, so I first tried to upgrade the core. By the end of the day I had succeeded and updated the core to 2024.3, but port 8123 was still not open.
I tried to find the latest working backup. Of the last three (March 10, taken before the core update, Jan 17 and Jan 13), Jan 13 is the only one that works. I used another SD card, because I wanted to preserve the original setup.
The March 10 backup took 3 hours to restore, with the same result: no local UI. With Jan 17 I was able to get a login screen, but during the login process it went dark. Jan 13 is the latest working backup (but it means two months of data loss).
Then I tried to restore the March 10 backup from the working GUI (Jan 13), but it still does not work: port 8123 is not open.
The observer on port 4357 is working, and it shows everything is in order…
|Supervisor:|Connected|
|Supported:|Supported|
|Healthy:|Healthy|
The iOS app shows the following failure under the websocket: websocket network nwerror error 53 software caused connection abort.
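
For anyone who wants to repeat the port checks above without Fing, here is a minimal sketch in Python. It only tests whether a TCP connection can be opened, which is roughly what a port scanner reports. The function name `is_port_open` and the example address are hypothetical, not part of Home Assistant.

```python
import socket


def is_port_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        # create_connection raises OSError on refusal, timeout or no route
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

Calling `is_port_open("192.168.1.50", 8123)` and `is_port_open("192.168.1.50", 4357)` (with the placeholder replaced by the device's real IP address) would show the situation described here: the observer port answers while the UI port does not. An open port only means something is listening; it does not prove the core started cleanly.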

Any ideas before I go back to the Jan 13 backup and suffer two months of data loss?

kalmanlengyel · March 12, 2024, 6:29pm

Follow up…
I think OS 12 and the 2024.3 update don’t like my existing integrations. Eventually I investigated the supervisor and core logs. Red errors everywhere: my DSMR reader, Mosquitto, Aqara, Philips… Step by step I uninstalled them, first the ones with errors in the supervisor logs, then the ones with errors in the core logs. I have a working UI with the Jan 13 backup, and this is my starting position: I am deleting my integrations one by one (the next and hopefully the last is Philips), and in parallel I am editing core.config_entries with vi on the other installation with the non-working backup.
So my plan has changed:

continue with my Jan 13 backup and avoid any updates…
fingers crossed, my deletion process will lead to a positive result…
fresh install, totally clean slate…

stevemann · March 12, 2024, 6:50pm

I won’t rag on you about frequent backups. You’ll do that yourself.
Before you do a restore, save the contents of the config folder. You will be able to reconstruct any automations that you have made since the last backup.

Better yet, use a fresh MicroSD.

WallyR · March 13, 2024, 7:13am

You first need to check that you can connect to your HA installation by its IP address on :8123.
If you can connect, then ping homeassistant.local and see whether it pings the same IP address.
If it is the same, then the issue is not mDNS and homeassistant.local.
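
The comparison above can also be scripted. This is only a sketch: it assumes the operating system's resolver can answer .local names (that depends on mDNS support being installed on the machine running it), and `resolves_to` is a made-up helper name.

```python
import socket


def resolves_to(hostname, expected_ip, resolver=socket.gethostbyname):
    """Resolve hostname and compare it with the IP address you already know.

    Returns (matches, resolved_ip); resolved_ip is None if resolution failed.
    The resolver argument exists only so the lookup can be swapped out.
    """
    try:
        resolved = resolver(hostname)
    except OSError:  # socket.gaierror is a subclass of OSError
        return False, None
    return resolved == expected_ip, resolved
```

For example, `resolves_to("homeassistant.local", "192.168.1.50")` with the placeholder replaced by the address that works on :8123. A mismatch or a failed lookup points at name resolution rather than at the HA installation itself.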

If it is not the same, then no backups will help you here, but it is really not that bad.
The .local domain is controlled by the mDNS service. It is a serverless protocol, so the devices often make up their own list of services based on the packets they hear on the network. There can be mDNS helper services, though, that keep a list and then “help” other devices get quicker answers.

The good part is that there is nothing that really can be lost in the mDNS setup, because it is just recreated over time. The bad part is that you do not really know where any wrong information is stored in your network.
To clear bad entries in the mDNS services you need to take down all the mDNS services at the same time, so that no information exists in your network and no replication can occur from other devices.
That means shutting down all network devices. When everything is down, start up the router again, wait for it to be running, and then start the rest of the network.

Once the devices are up, give the network 30 minutes to build the mDNS lists.

kalmanlengyel · March 14, 2024, 7:02pm

I used Fing to see whether my device had an IP address and whether the port was open. The device had an IP address, but the port was sometimes open and sometimes not. The reason, I think, was the non-working core. The logs (supervisor and core) showed me the core was in an infinite loop. My assumption is that the integrations (dsmr, mqtt, aqara, philips) blocked the core from starting.

So here is my solution:
I took a fresh SD card with OS 12.1 (updated yesterday evening) and uploaded my non-working backup. Then via the CLI I used the vi editor to delete the integrations (dsmr, mqtt, aqara, philips) from core.config_entries. Suddenly I got a working :8123 and UI, but there was an error: Unable to fetch auth providers. I went back to the CLI and started a core rebuild, followed by a core update to 2024.3.0, and voila, I got a login page and a working HA.
Today I managed to set up my integrations again: dsmr, mqtt, philips, but aqara… Aqara had some issue that I could not find a solution for.
All things considered, I got 4 days of data loss, plus some extra fine-tuning work with Aqara and my energy dashboard, because I had to update my old dsmr entities.
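
For reference, the manual vi edit described above could be sketched in Python like this. It rests on assumptions that should be checked against the actual file first: that core.config_entries is JSON, that it has a top-level "data" object with an "entries" list, and that each entry carries a "domain" key. The helper names and the domain strings in the example are illustrative only; the real domain names are whatever appears in the file.

```python
import copy
import json
import shutil
from pathlib import Path


def drop_domains(storage: dict, domains: set) -> dict:
    """Return a copy of the storage dict without entries for the given domains."""
    cleaned = copy.deepcopy(storage)  # leave the caller's data untouched
    entries = cleaned["data"]["entries"]  # assumed layout, verify in your file
    cleaned["data"]["entries"] = [
        entry for entry in entries if entry.get("domain") not in domains
    ]
    return cleaned


def clean_file(path, domains: set) -> None:
    """Rewrite the file in place, keeping a .bak copy of the original next to it."""
    path = Path(path)
    storage = json.loads(path.read_text(encoding="utf-8"))
    shutil.copy2(path, path.with_name(path.name + ".bak"))
    path.write_text(json.dumps(drop_domains(storage, domains), indent=2), encoding="utf-8")
```

Something like `clean_file("core.config_entries", {"dsmr", "mqtt"})` would then remove those entries. As with the vi approach, this is best done while the core is not running, and the removed integrations have to be set up again afterwards.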