Home Assistant constantly losing connection - pls help

I am running hass os on bare metal. As far as I can tell, there is no UFW (at least there is no ufw binary, and nothing about it in ha network).

Also, I can access the web interface at times. Today, it works fine. Yesterday, it didn’t work all day. I didn’t change any settings between last night (when nothing worked) and this morning (can still access it since my first post today, so it is stable at the moment) – yet now I can use the web interface and yesterday I could not.

If it were a firewall thing, it should either work or not work at all, but not work sometimes, right? Unless the firewall settings were to change in between.

Both devices are on the same VLAN, so there is no port restriction issue. This is really strange.

Stupid question, but did you try changing the Ethernet cable?

I assume you are accessing it through Ethernet, which rules out Wi-Fi as a cause! (Don’t use your phone, or Wi-Fi in general, to troubleshoot connection issues, unless that specific device is the one you are troubleshooting.)

The only thing I see in the host log is the journal rotating, which could be due to extensive logging.

UniFi add-on: yes, I see you control a lot through HA. Personally, I prefer using HA for IoT devices and automations.

Network admin such as router/switches/VPN/firewall etc. I don’t load onto my HA system (the less the better, from both a performance and a stability perspective).

So when I eventually “lift” the cameras out, I would e.g. have Frigate on a standalone unit as well.

The 2-second delay for turning on a simple light could be due to a cloud connection.
I have no such experience with my lights/switches/motion detection/door sensors etc. (including the automations that control or assist most of these), and I do find 2 seconds to be at the upper limit of what’s acceptable!
I.e. a light should turn on within a second after motion is detected, and a manual switch toggled through the UI should basically respond in a split second (though in most cases this requires locally controlled devices, with no cloud dependencies for the devices involved).

As you have struggled with this for more than a month, I understand your frustration.
So in such a situation, I would try to “clear my head”: update HA to the latest version, and then reboot the host.
Then check all logs (even on your router) after the fresh reboot, and again when the issues start to appear. Look for timeouts and connection issues, and don’t have e.g. the Glances monitor or the Ping integration running.

I would look at any (or all) of: network issues, CPU/process/memory usage, DB connection usage (I assume you have also checked your MariaDB log files).

If there are no significant delays (2 seconds are significant in this context!) when you ping or connect through SSH (over a wired connection), it sounds like a problem with the network “over HTTP/HTTPS”. I don’t know which you use; if it’s HTTPS (which I don’t use), you might have to expand your troubleshooting area.
Or it could be due to the Core frontend being busy shuffling data, due to inappropriate templates and/or API calls (and you have quite a few add-ons, which I don’t use or load HA with):
Nginx/rest Docker - Cloudflare - UniFi, pfSense (do you also have Nabu Casa?)
Mosquitto broker - ESPHome - Zigbee2MQTT
Frigate

So I guess you have lots of log files to check, and e.g. enable DEBUG level on these add-ons if there are any indications at the standard logging level.

Sorry I can’t be of more help here. It’s already been a month, and I haven’t got more info (besides all the new/added information regarding your complex/stuffed setup/HA installation) than I could have collected within a day on site. Sitting at the other end just makes it frustrating and cumbersome.

Maybe you have built up something that has grown beyond what you can oversee.

BR

Yes. It was actually a crappy one (temporary fix I forgot about). After replacing it with a decent cable, there was no change. Things worked fine, until they didn’t.

Right now, I can access the web ui, but things load really slowly – to the point where I cannot do anything with HA because by the time something loads (for example, creating a helper), minutes have passed. In fact, just now it has lost connection in between me clicking on “add helper” and it trying to load the popup that is supposed to load within a second.

Is this something?

24-03-25 13:02:23 ERROR (MainThread) [supervisor.homeassistant.api] Timeout on call http://172.30.32.1:8123/api/core/state.
24-03-25 13:02:23 WARNING (MainThread) [supervisor.misc.tasks] Watchdog missed an Home Assistant Core API response.
24-03-25 13:05:49 ERROR (MainThread) [supervisor.homeassistant.api] Timeout on call http://172.30.32.1:8123/api/core/state.
24-03-25 13:05:49 INFO (MainThread) [supervisor.auth] Home Assistant not running, checking cache
24-03-25 13:07:16 ERROR (MainThread) [supervisor.homeassistant.api] Timeout on call http://172.30.32.1:8123/api/core/state.
24-03-25 13:07:16 WARNING (MainThread) [supervisor.misc.tasks] Watchdog missed an Home Assistant Core API response.

These are all from the SUPERVISOR logs. The IP address there must be internal only. This is an actual error from right now, while I have no access to the web ui.
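For anyone wanting to see how often the Core API times out, here is a minimal sketch that pulls the "Timeout on call" errors out of Supervisor log lines like the ones above and prints the gaps between them. It assumes the timestamp is year-month-day with a two-digit year, and the function names (`parse_line`, `core_api_timeouts`, `gaps_seconds`) are made up for this example.

```python
import re
from datetime import datetime

# Matches lines like:
# 24-03-25 13:02:23 ERROR (MainThread) [supervisor.homeassistant.api] Timeout on call ...
LINE_RE = re.compile(
    r"^(?P<ts>\d{2}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) "
    r"(?P<level>[A-Z]+) \((?P<thread>[^)]+)\) "
    r"\[(?P<logger>[^\]]+)\] (?P<msg>.*)$"
)

# The log excerpt posted in this thread.
SAMPLE = """
24-03-25 13:02:23 ERROR (MainThread) [supervisor.homeassistant.api] Timeout on call http://172.30.32.1:8123/api/core/state.
24-03-25 13:02:23 WARNING (MainThread) [supervisor.misc.tasks] Watchdog missed an Home Assistant Core API response.
24-03-25 13:05:49 ERROR (MainThread) [supervisor.homeassistant.api] Timeout on call http://172.30.32.1:8123/api/core/state.
24-03-25 13:05:49 INFO (MainThread) [supervisor.auth] Home Assistant not running, checking cache
24-03-25 13:07:16 ERROR (MainThread) [supervisor.homeassistant.api] Timeout on call http://172.30.32.1:8123/api/core/state.
24-03-25 13:07:16 WARNING (MainThread) [supervisor.misc.tasks] Watchdog missed an Home Assistant Core API response.
"""


def parse_line(line):
    """Return (timestamp, level, logger, message), or None if the line does not match."""
    m = LINE_RE.match(line.strip())
    if m is None:
        return None
    # Assumption: the date is yy-mm-dd.
    ts = datetime.strptime(m["ts"], "%y-%m-%d %H:%M:%S")
    return ts, m["level"], m["logger"], m["msg"]


def core_api_timeouts(lines):
    """Collect timestamps of Core API timeout errors."""
    hits = []
    for line in lines:
        parsed = parse_line(line)
        if parsed is None:
            continue
        ts, level, logger, msg = parsed
        if (
            level == "ERROR"
            and logger == "supervisor.homeassistant.api"
            and msg.startswith("Timeout on call")
        ):
            hits.append(ts)
    return hits


def gaps_seconds(timestamps):
    """Seconds between consecutive timestamps."""
    return [int((b - a).total_seconds()) for a, b in zip(timestamps, timestamps[1:])]


print(gaps_seconds(core_api_timeouts(SAMPLE.splitlines())))  # [206, 87]
```

With only three timeouts in the excerpt there is not much to conclude, but run over a longer log it would show whether the timeouts are periodic or random.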

I am connected directly through the local network. hassOS on ethernet, my PC on ethernet, same switch, same VLAN. I use the phone through wifi to debug as well, because if ethernet worked and wifi wouldn’t, I’d know there might be something to investigate. But there is no difference between ethernet and wifi.

Yeah. That’s how it always was. Blazing fast. Lights were zigbee2mqtt, controlled locally. But even when controlling from on the road (via VPN connection, not nabu), things worked decently quick. This all just happened recently.

There doesn’t seem to be any network activity on the host at all.

I tested this by curling a website from that host in a tmux split. Then, there would be small numbers. So while I am currently trying to access HA, there is absolutely no I/O activity. But there should be, correct? It cannot serve the pages without it.

I don’t run pfsense on the hass machine. It is a dedicated firewall. When testing this, I deactivated the addons via ha addons stop <name> through ssh. I deactivated all of them except for ssh (obviously), it did not make a difference.

Just checked the pfsense logs. There is nothing about that host’s IP address in there except for DHCP requests. No blocks, no filters, nothing.

I got rid of addons that I thought I wouldn’t really need - or at least could do without: piper, whisper, openwakeword, vscode.

At the moment, only official addons are installed. This doesn’t have to mean anything, but the way I see it: aren’t the Home Assistant Yellow / Blue devices sold by (and therefore, I assume, recommended by) the Home Assistant team “just” Raspberry Pis with additional hardware? Yet these addons are official… I see this as “these addons should run fine alongside hassos on our recommended hardware”. Maybe that’s wrong.

But then there are two other things: 1) This is an Intel NUC, not a Raspberry Pi. There’s 32GB of RAM. There’s an Intel(R) Core™ i7-8809G CPU @ 3.10GHz with 8 cores (if I read the output from cat /proc/cpuinfo correctly). Shouldn’t this be enough for hassos with a couple of official addons?

And then, perhaps more importantly, 2) this system has been running perfectly fine for years. Yeah, I sometimes add (also remove) integrations and (less frequently) addons – but it has always been working quickly as expected. No lag, no being logged out of the web ui, no losing connection (not even with wifi).

The heavy lifting for frigate is done by a google coral USB stick, but I stopped this addon as well to see whether this would make a difference. It didn’t.

Now, typing all this took me some time. The helper entity from the beginning of the post that I tried to create finally got to the point where I could select what type of helper I’d like to create; then (now that I just checked) it lost connection again.

Btw., everything is already up to date. supervisor 2024.03.0, os 12.1. All addons and the HACS stuff are up to date.

So I’d understand if I had just installed Home Assistant on this device and nothing would work. Then I’d say “it’s too much work for the device”. But again, the device has been working fine like this for ages. I even disabled integrations that have been working without issues just to see if it was going to work as a “bare minimum” (only addons and integrations I really need to use on a daily basis, no fun / experimental / useless stuff). Even that didn’t make any difference. Things are disabled atm., so the system should run faster, if anything. But the issue is still there.

But also, I should mention again that, often, Home Assistant itself works fine. I am not able to access the UI, but walking in the hallway in the evening will result in the lights automatically turning on. While going crazy not finding a solution for this, I will get a signal notification (including multiple camera snapshots) when the front gate opens.

Hopefully, this log I posted above gets us somewhere. I am currently working on a fresh configuration in a dockerized environment on my PC. So if nothing works, my plan is to set up a fresh install on the actual hass machine and have everything relevant ready to copy there. I also have daily backups, but installing them doesn’t seem to make any difference, either.

So while a fresh install might be the best I can do, it is also kinda risky. If there is anything wrong with my current configuration, I cannot use my backups – so I’ll have to start fresh from scratch and do everything (connecting services, create those automations and helpers that aren’t in package yaml files) manually.

It looks to me like you are showing disk usage, not network usage.

More than enough, frankly overkill. I’m sure you’d be better off with 2 (max 4) CPUs and 4 (max 8) GB RAM.

I kinda wonder if you are taking the wrong approach here. Rather than looking at addons and integrations, take a look at your lovelace config instead.

Given that HA itself is up, along with all your other integrations, what do you have in terms of custom frontend cards/scripts?

My thought is that perhaps there is something in the frontend that is blocking or timing out the websocket connection. When this happens, have you looked at the browser console to see what errors might be there? I know the first instinct is usually to check the server, but your issues seem to be client based rather than server based.


Yeah. Damnit. I realized that and already looked into some tools, but those that work (and can be installed on alpine linux, which hass os runs on) and that I’ve tried didn’t give me an option to filter. There are countless connections due to all my local devices, so I could not debug this way.
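If no tool is at hand, a rough alternative is to read the kernel's per-interface counters from /proc/net/dev twice and diff them. This is only a sketch under some assumptions: Python is available wherever you run it (it may not be installed in the SSH add-on by default), and if you run it inside a container you will see that container's interfaces, not necessarily the host's. It does not filter per connection; it only shows whether bytes are moving on an interface while you try to load the UI. The function names are made up for this example.

```python
import time


def parse_net_dev(text):
    """Parse /proc/net/dev content into {interface: (rx_bytes, tx_bytes)}."""
    counters = {}
    for line in text.splitlines()[2:]:  # the first two lines are column headers
        if ":" not in line:
            continue
        name, data = line.split(":", 1)
        fields = data.split()
        if len(fields) < 16:
            continue
        # Field 0 is received bytes, field 8 is transmitted bytes.
        counters[name.strip()] = (int(fields[0]), int(fields[8]))
    return counters


def delta(before, after):
    """Bytes received/sent per interface between two snapshots."""
    return {
        iface: (after[iface][0] - before[iface][0], after[iface][1] - before[iface][1])
        for iface in after
        if iface in before
    }


def sample(path="/proc/net/dev", interval=1.0):
    """Take two snapshots `interval` seconds apart and return the difference."""
    with open(path) as f:
        before = parse_net_dev(f.read())
    time.sleep(interval)
    with open(path) as f:
        after = parse_net_dev(f.read())
    return delta(before, after)
```

Calling `sample(interval=5.0)` while reloading the dashboard from another machine should show non-zero numbers on the LAN interface if requests actually reach the host.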

@code-in-progress ahhh! Thank you so much for this suggestion. I am afraid to celebrate too soon – but I just removed two cards from my “testing” dashboard. One was constantly trying to display a local file as an iframe (which didn’t work), the other also tried to access something on the hassos server. Now, I cannot be sure atm., because these issues come and go without any recognizable pattern… but as for now, everything runs smoothly and just the way I’d expect Home Assistant to work (and as it has).

I’ll report back in case this wasn’t it, but at the moment, it looks like one of these cards, or the combination of both, was causing this issue, maybe by causing too much internal traffic?

I tried to trigger my Home Assistant into freezing by doing what I’ve tried to do for days, finishing some simple automations. Nothing so far. It still works :)


Anytime. Remember that the client side can get just as messed up as the server side can (often times much, much worse). If you see the issue again, drop into dev tools in your browser and look at the Console tab and the Network tab. Those two tabs will usually suss out any problems with lovelace pretty quickly.
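If the Network tab is too noisy to read by eye, most browsers can export it as a HAR file (JSON). Below is a rough sketch that lists failed or slow requests from such an export. It assumes the usual HAR layout (`log.entries[]` with `request.url`, `response.status` and `time` in milliseconds); the 2000 ms threshold and the function names are arbitrary choices for this example.

```python
import json


def load_har(path):
    """Load a HAR file exported from the browser's dev tools."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def suspicious_requests(har, slow_ms=2000.0):
    """Return (url, status, elapsed_ms) for failed or slow requests, slowest first."""
    rows = []
    for entry in har.get("log", {}).get("entries", []):
        url = entry.get("request", {}).get("url", "")
        status = entry.get("response", {}).get("status", 0)
        elapsed = float(entry.get("time", 0) or 0)
        # Status 0 usually means the request never completed (blocked, aborted, timed out).
        failed = status == 0 or status >= 400
        # Status 101 is a WebSocket upgrade; it is long-lived, so its duration says little.
        slow = elapsed >= slow_ms and status != 101
        if failed or slow:
            rows.append((url, status, elapsed))
    rows.sort(key=lambda row: row[2], reverse=True)
    return rows
```

Usage would be something like `for row in suspicious_requests(load_har("ha.har")): print(row)`, where `ha.har` is whatever name you gave the export. A card that keeps requesting a resource that fails should show up here repeatedly.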


tl;dr
Following because I am also running an Intel NUC i3 with native HAOS.

How often does the “Connection Lost” popup present?
Have you tried rebooting your router?
Have you tried clearing the browser cache?
If you have another PC, do both lose connection simultaneously?

@stevemann are you also experiencing this issue? If so, try this solution: post #17 by code-in-progress in this thread (Home Assistant constantly losing connection - pls help).

Constantly. I would open the Home Assistant tab in my browser. Then there would either be my dashboard, but the “Connection Lost” popup would be there, or I would only see the Home Assistant logo with “Trying to reconnect” underneath.

Rebooting the router and/or clearing browser cache did nothing.

I cannot say if connections drop at the very exact same moment, but I can confirm that, when this happens, none of our devices will be able to connect. So let’s say my computer loses connection – if I try my Android, it will not work. My girlfriend’s Android will not work, either.
So while I don’t know if connections get lost simultaneously, I do know that they must all at least drop around the same time.

That sounds more like a network problem.

So Home Assistant is never online?

@stevemann not network related. The issue has been solved and it’s all in this thread. It was caused by the Lovelace configuration; removing the responsible cards fixed it.


Which cards?

One of them was an iframe card, pointing to some file on the hass device. I don’t remember which ones the others were – they were part of my testing dashboard where I test all sorts of cards. I simply removed all that looked like they could be responsible for this traffic.

So let’s say you have a custom card that pulls some sort of data from your local hass server, and a regular entities card… most likely, the regular entities card will not be responsible for the issue.
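To get a quick list of candidates without clicking through every dashboard, a sketch like the following walks a dashboard config and counts the card types, flagging `iframe` cards and anything starting with `custom:`. It assumes the config has been loaded into a Python dict (for example from the JSON that UI-managed dashboards are stored in; where exactly that file lives on your install is something to check, I am not asserting a path here). The function names are made up for this example, and flagged cards are only suspects, not proven culprits.

```python
from collections import Counter


def iter_cards(node, is_card=False):
    """Yield every card dict found under a 'cards' list or a 'card' key, at any depth."""
    if isinstance(node, dict):
        if is_card and isinstance(node.get("type"), str):
            yield node
        for key, value in node.items():
            if key == "cards" and isinstance(value, list):
                for item in value:
                    yield from iter_cards(item, is_card=True)
            elif key == "card":
                # e.g. a wrapper card that holds a single nested card
                yield from iter_cards(value, is_card=True)
            else:
                yield from iter_cards(value)
    elif isinstance(node, list):
        for item in node:
            yield from iter_cards(item)


def card_report(config):
    """Return (Counter of card types, sorted list of types worth a closer look)."""
    counts = Counter(card["type"] for card in iter_cards(config))
    suspects = sorted(
        t for t in counts if t == "iframe" or t.startswith("custom:")
    )
    return counts, suspects
```

Usage: `counts, suspects = card_report(config)` and then print `suspects`. Removing the suspects one at a time on a test dashboard is slower than removing them all, but it tells you which card was actually responsible.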

Which custom cards do you use?

I have had a similar situation. In my case, I believe the cause was a couple of wrongly defined templates in the template section. Meteorological sensors obtained from the internet became unavailable/undefined, and it seems this made the frontend unavailable. The strange thing is that those sensors were defined some time ago and I had never had this issue before. It seems to me that a new release of HA has acted differently?


Just wondering if anyone else is still having this issue. I lose connection every minute or so. I can get in long enough to see my dashboard, but I get dropped before I can update integrations. Even getting to the restart menu has been a race against the clock. Then after I DC, I can get back in about a minute later.

My automations are working, and everything in my house seems to be using the network just fine, I just can’t get in through the app or URL long enough to actually do anything. I’ve power cycled the router, modem, and Pi4 running the hassos just to be safe, but that didn’t fix anything.

I’ve tried deleting all the cards I added around the time the problem started, even though I haven’t been using anything exotic. That doesn’t seem to have fixed anything either…

The what?

Then Home Assistant is running.

Have you tried using a browser on the same network?

Sorry, DC=disconnect. I’m having the same issue on the home network, wifi and ethernet. I’ve narrowed the problem down to the Matter integration. If I remove it, the problem goes away completely. As soon as I put it back in, I start to lose connection again. Hella frustrating, but I need to dig through the Matter section of the forum. Maybe someone has fixed this over there already…

Same here… every minute it keeps reconnecting…

I have somewhat the same issue. After starting up, my HA is reachable for about one minute; after that it goes offline and is not reachable anymore (neither locally nor via Nabu Casa). However, the instance itself keeps running; my automations are still working.

I get the loading page, which keeps loading for eternity.