Home Assistant constantly losing connection - pls help

prankousky · March 1, 2024, 12:17pm

Hi everybody,


Core 2024.2.5
Supervisor 2024.02.1
Operating System 12.0
Frontend 20240207.1

ha host info


agent_version: 1.6.0
apparmor_version: 3.1.2
boot_timestamp: 1709205353026628
broadcast_llmnr: true
broadcast_mdns: true
chassis: embedded
cpe: cpe:2.3:o:home-assistant:haos:12.0:*:production:*:*:*:generic-x86-64:*
deployment: production
disk_free: 798.5
disk_life_time: null
disk_total: 916.2
disk_used: 80.4
dt_synchronized: true
dt_utc: "2024-03-01T10:11:17.476817+00:00"
features:
- reboot
- shutdown
- services
- network
- hostname
- timedate
- os_agent
- haos
- resolved
- journal
- disk
- mount
hostname: homeassistant
kernel: 6.6.16-haos
llmnr_hostname: homeassistant
operating_system: Home Assistant OS 12.0
startup_time: 2.864046
timezone: Etc/UTC
use_ntp: true

I am running HASSos on an Intel NUC 8i7 Mini device; it is connected via ethernet.

Never had any noticeable issues like this, but recently, my Home Assistant instance seems to keep losing connection to the network. This will happen in the browser as well as on the Android app. Suddenly, I will get the little popup on the bottom left of the browser (or the error message on Android) telling me that there is no connection.

Then I’ll wait a couple of seconds, perhaps half a minute - and the connection is back.

So I don’t restart the device, I don’t do anything, really - the issue will fix itself. But eventually, this connection will randomly be lost again. I haven’t been able to find any pattern (for example, this does not happen every x minutes, as far as I can tell). I checked with my router (pfSense) and there is no other device with the same IP address as my Home Assistant machine; I’ve read that something like this might cause this type of error.

According to UniFi, the device has been connected for days. Let’s say I were to shut it down, or to remove the ethernet cable, then this uptime would reset. Since this is not the case, I assume this is some Home Assistant related issue, not caused by my network.

As I mentioned before, this usually didn’t happen before. Unfortunately, I cannot pinpoint what I might have changed (configuration-wise) that might have caused this. The issue has been occurring for a couple of weeks, but I’ve just now had the time to post here.

The log file is huge, and I don’t know what I should search for in order to fix this.

Any ideas and/or suggestions? Is there something specific I should search the log for? Is there some other information I can find and post here in order to get to the bottom of this issue?

Thank you in advance for your help

boheme61 · March 1, 2024, 12:32pm

First you should look at your logs, and post relevant sections here.
People can only guess based on your limited explanation.

You should / could search for e.g. “Error” in your log file, or “Connection / Connect”.

If it’s HA which has problems, there should be entries saying something like “Can’t connect to / Connection timeout” etc.

If it’s actually your devices which can’t connect to HA, you should look in the respective devices’ logs, or monitor your router (if it has capable logs), or your local DNS if you have one (same approach).

That is device-specific (and I don’t recognize this; maybe I use another browser, or Android), and it’s not a “message” from HA (obviously).

prankousky · March 1, 2024, 12:50pm

Thank you.

There is a ton of this

2024-03-01 13:28:00.117 ERROR (MainThread) [coap] Connection loss was not expected.
2024-03-01 13:28:00.117 ERROR (MainThread) [coap] Connection loss was not expected.
2024-03-01 13:33:51.958 ERROR (MainThread) [coap] Connection loss was not expected.
2024-03-01 13:33:51.959 ERROR (MainThread) [coap] Connection loss was not expected.
2024-03-01 13:33:51.959 ERROR (MainThread) [coap] Connection loss was not expected.

Also these

2024-03-01 13:24:56.662 ERROR (MainThread) [hass_nabucasa.remote] Connection problem to snitun server (Challenge/Response error with SniTun server (0 bytes read on a total of 32 expected bytes))
2024-03-01 11:21:31.071 WARNING (MainThread) [zigpy_znp.uart] Lost connection
2024-03-01 13:21:52.454 WARNING (MainThread) [zigpy_znp.uart] Lost connection
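
A quick way to see which integrations dominate a large log is to count entries per logger instead of reading them one by one. Below is a minimal Python sketch, assuming the log line layout shown in the lines above; `summarize`, `LINE_RE` and `SAMPLE` are names made up for this example, and the sample is just the lines quoted in this post.

```python
import re
from collections import Counter

# Assumed layout, taken from the log lines quoted in this thread:
# "2024-03-01 13:28:00.117 ERROR (MainThread) [coap] message text"
LINE_RE = re.compile(
    r"^\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{3} "
    r"(?P<level>[A-Z]+) \([^)]*\) \[(?P<logger>[^\]]+)\] "
)


def summarize(lines, levels=("ERROR", "WARNING")):
    """Count log entries per (level, logger) for the given levels."""
    counts = Counter()
    for line in lines:
        match = LINE_RE.match(line)
        if match and match["level"] in levels:
            counts[(match["level"], match["logger"])] += 1
    return counts


# The lines quoted above, used here as sample input.
SAMPLE = """\
2024-03-01 13:28:00.117 ERROR (MainThread) [coap] Connection loss was not expected.
2024-03-01 13:28:00.117 ERROR (MainThread) [coap] Connection loss was not expected.
2024-03-01 13:33:51.958 ERROR (MainThread) [coap] Connection loss was not expected.
2024-03-01 13:33:51.959 ERROR (MainThread) [coap] Connection loss was not expected.
2024-03-01 13:33:51.959 ERROR (MainThread) [coap] Connection loss was not expected.
2024-03-01 13:24:56.662 ERROR (MainThread) [hass_nabucasa.remote] Connection problem to snitun server (Challenge/Response error with SniTun server (0 bytes read on a total of 32 expected bytes))
2024-03-01 11:21:31.071 WARNING (MainThread) [zigpy_znp.uart] Lost connection
2024-03-01 13:21:52.454 WARNING (MainThread) [zigpy_znp.uart] Lost connection
"""

# Print the loggers with the most entries first.
for (level, logger), n in summarize(SAMPLE.splitlines()).most_common():
    print(f"{n:4d}  {level:7s}  {logger}")
```

On the real file, the same function can be fed an open file object, for example one opened on /homeassistant/home-assistant.log. A logger that towers over the rest is a reasonable place to start looking, though a high count alone does not prove it is the cause.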

So this is device specific… but when I get the error in the browser, I’ll also have no connection on my android app. I also tried not using WiFi, so the companion app would connect through the nabu casa internet URL, but that would also not be available at those times.

I would understand that neither the web browser, nor the android app can access Home Assistant if there is some network issue - but not being able to access the external URL makes this even weirder. Plus, at those times where Home Assistant is down/not reachable, the internet still works.

That one line about nabu casa above would make sense if only the external URL didn’t work, but a) this error was only in the log once (and this connectivity issue has occurred multiple times today so far) and b) if that was the only reason, the internal URL should still work.

I don’t use homeassistant.local, I access via the actual local URL; so http://10.0.0.25:8123 in my case. So likely not dns-related either…?

boheme61 · March 1, 2024, 1:07pm

OK, yes, so it seems HA for some reason can’t connect to the outside (maybe you can’t even “ping” HA during this “absence”?).

That you have extensive logfiles of course indicates problems (it’s busy).

I suggest you first copy home-assistant.log and home-assistant.log.1.

Then reboot your HA and the Intel NUC 8i7 Mini.

Starting with “flushed” logs makes life easier and is a simple first task when an issue is unclear or hard to analyse; it also excludes and clears some connection issues.
(You said “recently”; I don’t know this timespan, but if that is within the past 12 days, it’s another reason to reboot everything.)

After the reboot (of everything), you can start looking at / monitoring your logs (with either “tail” or “more”), or just refresh in HA under Settings / System / Logs; this way you see when it starts, and what seems to cause it to continue until it’s unable to respond.
If it’s unable to respond from startup, you have to dig into its internal network/interface settings, and your NUC’s interface settings (while you’re at it).
That is of course if the logs don’t show obvious signs of the cause.

PS: I assume you already have looked at " ha net info "

prankousky · March 2, 2024, 7:09am

Just tested this. I can ping -and even ssh into- the Home Assistant host when this happens. Browser and mobile won’t work, but I can ssh into the machine.

Instead of just rebooting via the web interface, I shut down the NUC, waited a couple of seconds, and manually turned it back on. At first, everything was super fast (this would usually not be the case when just restarting like I’d usually do).

After rebooting, I checked the logs for “lost”, “connec*” (connected/connection)… nothing that I can make any sense of. These entries usually have to do with coap (philips air purifier HACS integration), zigpy (I guess zha/z2m?), reolink, songpal, fully_kiosk, etc. etc., so basically individual devices. For example, yeah, currently, one of our philips air purifiers is not plugged in, so I’d expect the connection to fail.

But none of these errors have anything to do with Home Assistant itself, the server, or anything that would explain why the connection to it would be lost. Yeah, it loses connection to certain devices, okay, but that shouldn’t be related to this…?

The “ha net info” is below.

docker:
  address: 172.30.32.0/23
  dns: 172.30.32.3
  gateway: 172.30.32.1
  interface: hassio
host_internet: true
interfaces:
- connected: true
  enabled: true
  interface: eno1
  ipv4:
    address:
    - 10.0.0.25/24
    gateway: 10.0.0.2
    method: auto
    nameservers:
    - 10.0.0.2
    ready: true
  ipv6:
    address:
    - ...
    gateway: null
    method: auto
    nameservers: []
    ready: false
  mac: ...
  primary: true
  type: ethernet
  vlan: null
  wifi: null
- connected: false
  enabled: false
  interface: wlp6s0
  ipv4:
    address: []
    gateway: null
    method: disabled
- connected: false
  enabled: false
  interface: enp5s0
  ipv4:
    address: []
    gateway: null
    method: disabled
    nameservers: []
    ready: false
  ipv6:
    address: []
    gateway: null
    method: disabled
    nameservers: []
    ready: false
  mac:...
  primary: false
  type: ethernet
  vlan: null
  wifi: null
supervisor_internet: true

Yesterday, I uninstalled all integrations that I didn’t really need / had installed for testing only. This seemed to have worked fine! I didn’t have any issues all afternoon.

Now, this morning, I experience the lost connection at least once after starting my PC and accessing lovelace.

boheme61 · March 2, 2024, 10:39am

If HA is losing connection to individual devices, and your PC/phone have problems connecting, it could sound like a WiFi / network issue (e.g. coap!)

However, Log-file is AO , And you mention you are using pfSense and UniFi ( very very briefly )

It certainly does! If HA is busy keeping up with extensive attempts to connect to devices, writing to the logfile AND adding tons of states to the DB, it has everything to do with HA.
Building up huge logfiles and writing useless entries to the DB is not only a waste of resources and energy, it also prevents or slows down essential tasks.

prankousky · March 24, 2024, 3:20pm

I thought the issue had been solved, but unfortunately, this was not the case.

At the moment, Home Assistant can only be accessed for a few minutes after restarting. After this, the web interface will time out. On the PC as well as mobile devices.

However, I can still ssh into the machine, and even use the web interface for other services on that very same machine; for example, I use the UniFi addon and I can log into that.

So HASSos is running the unifi addon and lets me connect. It also runs the ssh addon and I can ssh into the machine. But any device on the network trying to connect to the Home Assistant / lovelace will just time out.

I just disabled all custom_components (by moving the /config/custom_components to /config/custom_components_backup), restarted the machine, and the issue persists.

Just now I restarted the machine. While everything was still loading, I was able to access the web interface and saw the little popup about home assistant still starting. The last thing I saw was that home assistant finished startup - and then nothing would work… browser will eventually time out.

A little bit later I was able to see the web interface again. I was even able to switch off the smart plug for my printer - this took almost 10 seconds, while it would usually take about half a second.

htop on the HASSos machine looks fine. There are some spikes every now and then, but it doesn’t seem to be a load issue…?

boheme61 · March 24, 2024, 3:40pm

Check your router logs, and your WiFi, and whatever else involves your network; look for connection timeouts/errors anywhere and everywhere.

PS: Your “snippets” don’t tell much, because you show them selectively (besides, three “whatever that is, CPU?” entries are topping the latest 3 snippets).

That doesn’t disable anything; moving a folder and its contents just turns every configuration pointing to them into another problem.

Somehow you seem to totally avoid logfiles in your troubleshooting efforts.

Settings / System / Logs! That’s the CORE log; in the top-right corner you see CORE-Log with a down arrow. Click it and check the HOST, Supervisor and Add-On logs.

prankousky · March 25, 2024, 7:11am

I don’t really know what exactly I need to look for in my router logs. Because I can access that host. What should they show? Connections are there, they work - just the Home Assistant web interface does not.

While port 8123 keeps timing out on me (or trying to reload), I can visit port 8443 (Unifi Controller) at all times without issues. I can also keep a connection to port 22 (ssh) while all this happens.

To be clear: while I cannot do anything on port 8123, at the very same time I am able to view and edit UniFi settings. Same host, different port.

Perhaps I’m wrong, but I just assumed that, if I can connect to -and work with- the unifi controller run via Home Assistant, as well as ssh into the machine running it (via Home Assistant ssh Addon), I should also be able to connect to Home Assistant itself, unless there is some kind of issue with the system. If there were no connection, shouldn’t I be unable to access any port on that host?
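
The same-host, different-port observation can be checked repeatedly with a small script rather than by clicking around. Below is a minimal Python sketch that attempts a TCP connection to each port and reports whether it succeeded and how long it took; `probe` and `probe_all` are hypothetical helper names, and the 3-second timeout is an arbitrary choice.

```python
import socket
import time


def probe(host, port, timeout=3.0):
    """Try one TCP connection; return (reachable, seconds_elapsed)."""
    start = time.monotonic()
    try:
        # create_connection raises OSError on refusal, timeout or no route.
        with socket.create_connection((host, port), timeout=timeout):
            return True, time.monotonic() - start
    except OSError:
        return False, time.monotonic() - start


def probe_all(host, ports, timeout=3.0):
    """Probe several ports on one host; return {port: (reachable, seconds)}."""
    return {port: probe(host, port, timeout) for port in ports}
```

For the setup described here, the call would be `probe_all("10.0.0.25", [22, 8123, 8443])`. A successful TCP connect only shows that something is accepting connections on that port; it does not prove the web interface answers requests. If 8123 accepts the connection while pages still time out, that would point toward the Home Assistant process being slow to respond rather than the network, though that is an inference and not something confirmed here.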

I’ll post the logs below while I can access them. Usually, I don’t even have time to find these in the web ui because connection dies too quickly. It works at the moment. But perhaps that’s a bad thing, as the errors causing this might not be there. But here they are

HOST

Mar 25 01:30:33 homeassistant kernel: hassio: port 8(veth30de663) entered blocking state
Mar 25 01:30:33 homeassistant kernel: hassio: port 8(veth30de663) entered forwarding state
Mar 25 01:38:08 homeassistant systemd-journald[131]: Data hash table of /var/log/journal/7bbd3df51d484f269f08f43e301f05f7/system.journal has a fill level at 75.0 (85335 of 113777 items, 25165824 file size, 294 bytes per hash table item), suggesting rotation.
Mar 25 01:38:08 homeassistant systemd-journald[131]: /var/log/journal/7bbd3df51d484f269f08f43e301f05f7/system.journal: Journal header limits reached or header out-of-date, rotating.
Mar 25 01:54:30 homeassistant NetworkManager[446]: <info>  [1711331670.5606] dhcp4 (eno1): state changed new lease, address=10.0.0.25
Mar 25 01:55:38 homeassistant systemd-journald[131]: Data hash table of /var/log/journal/7bbd3df51d484f269f08f43e301f05f7/system.journal has a fill level at 75.0 (85336 of 113777 items, 25165824 file size, 294 bytes per hash table item), suggesting rotation.
Mar 25 01:55:38 homeassistant systemd-journald[131]: /var/log/journal/7bbd3df51d484f269f08f43e301f05f7/system.journal: Journal header limits reached or header out-of-date, rotating.
Mar 25 01:59:25 homeassistant kernel: kauditd_printk_skb: 182 callbacks suppressed
Mar 25 01:59:25 homeassistant kernel: audit: type=1334 audit(1711331965.611:831): prog-id=183 op=LOAD
Mar 25 01:59:25 homeassistant kernel: audit: type=1334 audit(1711331965.611:832): prog-id=184 op=LOAD
Mar 25 01:59:25 homeassistant kernel: audit: type=1334 audit(1711331965.611:833): prog-id=185 op=LOAD
Mar 25 01:59:25 homeassistant systemd[1]: Starting Hostname Service...
Mar 25 01:59:25 homeassistant systemd[1]: Started Hostname Service.
Mar 25 01:59:25 homeassistant kernel: audit: type=1334 audit(1711331965.790:834): prog-id=186 op=LOAD
Mar 25 01:59:25 homeassistant kernel: audit: type=1334 audit(1711331965.790:835): prog-id=187 op=LOAD
Mar 25 01:59:25 homeassistant kernel: audit: type=1334 audit(1711331965.790:836): prog-id=188 op=LOAD
Mar 25 01:59:25 homeassistant systemd[1]: Starting Time & Date Service...
Mar 25 01:59:25 homeassistant systemd[1]: Started Time & Date Service.
Mar 25 01:59:55 homeassistant systemd[1]: systemd-hostnamed.service: Deactivated successfully.
Mar 25 01:59:55 homeassistant kernel: audit: type=1334 audit(1711331995.870:837): prog-id=185 op=UNLOAD
Mar 25 01:59:55 homeassistant kernel: audit: type=1334 audit(1711331995.870:838): prog-id=184 op=UNLOAD
Mar 25 01:59:55 homeassistant kernel: audit: type=1334 audit(1711331995.870:839): prog-id=183 op=UNLOAD
Mar 25 01:59:56 homeassistant systemd[1]: systemd-timedated.service: Deactivated successfully.
Mar 25 01:59:56 homeassistant kernel: audit: type=1334 audit(1711331996.009:840): prog-id=188 op=UNLOAD
Mar 25 01:59:56 homeassistant kernel: audit: type=1334 audit(1711331996.009:841): prog-id=187 op=UNLOAD
Mar 25 01:59:56 homeassistant kernel: audit: type=1334 audit(1711331996.009:842): prog-id=186 op=UNLOAD
Mar 25 02:13:13 homeassistant systemd-journald[131]: Data hash table of /var/log/journal/7bbd3df51d484f269f08f43e301f05f7/system.journal has a fill level at 75.0 (85335 of 113777 items, 25165824 file size, 294 bytes per hash table item), suggesting rotation.
Mar 25 02:13:13 homeassistant systemd-journald[131]: /var/log/journal/7bbd3df51d484f269f08f43e301f05f7/system.journal: Journal header limits reached or header out-of-date, rotating.
Mar 25 02:30:55 homeassistant systemd-journald[131]: Data hash table of /var/log/journal/7bbd3df51d484f269f08f43e301f05f7/system.journal has a fill level at 75.0 (85335 of 113777 items, 25165824 file size, 294 bytes per hash table item), suggesting rotation.
Mar 25 02:30:55 homeassistant systemd-journald[131]: /var/log/journal/7bbd3df51d484f269f08f43e301f05f7/system.journal: Journal header limits reached or header out-of-date, rotating.
Mar 25 02:48:11 homeassistant systemd-journald[131]: Data hash table of /var/log/journal/7bbd3df51d484f269f08f43e301f05f7/system.journal has a fill level at 75.0 (85335 of 113777 items, 25165824 file size, 294 bytes per hash table item), suggesting rotation.
Mar 25 02:48:11 homeassistant systemd-journald[131]: /var/log/journal/7bbd3df51d484f269f08f43e301f05f7/system.journal: Journal header limits reached or header out-of-date, rotating.
Mar 25 03:05:32 homeassistant systemd-journald[131]: Data hash table of /var/log/journal/7bbd3df51d484f269f08f43e301f05f7/system.journal has a fill level at 75.0 (85335 of 113777 items, 25165824 file size, 294 bytes per hash table item), suggesting rotation.
Mar 25 03:05:32 homeassistant systemd-journald[131]: /var/log/journal/7bbd3df51d484f269f08f43e301f05f7/system.journal: Journal header limits reached or header out-of-date, rotating.
Mar 25 03:22:45 homeassistant systemd-journald[131]: Data hash table of /var/log/journal/7bbd3df51d484f269f08f43e301f05f7/system.journal has a fill level at 75.0 (85335 of 113777 items, 25165824 file size, 294 bytes per hash table item), suggesting rotation.
Mar 25 03:22:45 homeassistant systemd-journald[131]: /var/log/journal/7bbd3df51d484f269f08f43e301f05f7/system.journal: Journal header limits reached or header out-of-date, rotating.
Mar 25 03:39:30 homeassistant NetworkManager[446]: <info>  [1711337970.5610] dhcp4 (eno1): state changed new lease, address=10.0.0.25
Mar 25 03:40:11 homeassistant systemd-journald[131]: Data hash table of /var/log/journal/7bbd3df51d484f269f08f43e301f05f7/system.journal has a fill level at 75.0 (85334 of 113777 items, 25165824 file size, 294 bytes per hash table item), suggesting rotation.
Mar 25 03:40:11 homeassistant systemd-journald[131]: /var/log/journal/7bbd3df51d484f269f08f43e301f05f7/system.journal: Journal header limits reached or header out-of-date, rotating.
Mar 25 03:57:42 homeassistant systemd-journald[131]: Data hash table of /var/log/journal/7bbd3df51d484f269f08f43e301f05f7/system.journal has a fill level at 75.0 (85333 of 113777 items, 25165824 file size, 294 bytes per hash table item), suggesting rotation.
Mar 25 03:57:42 homeassistant systemd-journald[131]: /var/log/journal/7bbd3df51d484f269f08f43e301f05f7/system.journal: Journal header limits reached or header out-of-date, rotating.
Mar 25 04:06:06 homeassistant kernel: audit: type=1334 audit(1711339566.051:843): prog-id=189 op=LOAD
Mar 25 04:06:06 homeassistant kernel: audit: type=1334 audit(1711339566.051:844): prog-id=190 op=LOAD
Mar 25 04:06:06 homeassistant kernel: audit: type=1334 audit(1711339566.051:845): prog-id=191 op=LOAD
Mar 25 04:06:06 homeassistant systemd[1]: Starting Hostname Service...
Mar 25 04:06:06 homeassistant systemd[1]: Started Hostname Service.
Mar 25 04:06:06 homeassistant kernel: audit: type=1334 audit(1711339566.240:846): prog-id=192 op=LOAD
Mar 25 04:06:06 homeassistant kernel: audit: type=1334 audit(1711339566.240:847): prog-id=193 op=LOAD
Mar 25 04:06:06 homeassistant kernel: audit: type=1334 audit(1711339566.240:848): prog-id=194 op=LOAD
Mar 25 04:06:06 homeassistant systemd[1]: Starting Time & Date Service...
Mar 25 04:06:06 homeassistant systemd[1]: Started Time & Date Service.
Mar 25 04:06:36 homeassistant systemd[1]: systemd-hostnamed.service: Deactivated successfully.
Mar 25 04:06:36 homeassistant kernel: audit: type=1334 audit(1711339596.312:849): prog-id=191 op=UNLOAD
Mar 25 04:06:36 homeassistant kernel: audit: type=1334 audit(1711339596.312:850): prog-id=190 op=UNLOAD
Mar 25 04:06:36 homeassistant kernel: audit: type=1334 audit(1711339596.312:851): prog-id=189 op=UNLOAD
Mar 25 04:06:36 homeassistant systemd[1]: systemd-timedated.service: Deactivated successfully.
Mar 25 04:06:36 homeassistant kernel: audit: type=1334 audit(1711339596.446:852): prog-id=194 op=UNLOAD
Mar 25 04:06:36 homeassistant kernel: audit: type=1334 audit(1711339596.446:853): prog-id=193 op=UNLOAD
Mar 25 04:06:36 homeassistant kernel: audit: type=1334 audit(1711339596.446:854): prog-id=192 op=UNLOAD
Mar 25 04:14:59 homeassistant systemd-journald[131]: Data hash table of /var/log/journal/7bbd3df51d484f269f08f43e301f05f7/system.journal has a fill level at 75.0 (85335 of 113777 items, 25165824 file size, 294 bytes per hash table item), suggesting rotation.
Mar 25 04:14:59 homeassistant systemd-journald[131]: /var/log/journal/7bbd3df51d484f269f08f43e301f05f7/system.journal: Journal header limits reached or header out-of-date, rotating.
Mar 25 04:32:23 homeassistant systemd-journald[131]: Data hash table of /var/log/journal/7bbd3df51d484f269f08f43e301f05f7/system.journal has a fill level at 75.0 (85335 of 113777 items, 25165824 file size, 294 bytes per hash table item), suggesting rotation.
Mar 25 04:32:23 homeassistant systemd-journald[131]: /var/log/journal/7bbd3df51d484f269f08f43e301f05f7/system.journal: Journal header limits reached or header out-of-date, rotating.
Mar 25 04:50:01 homeassistant systemd-journald[131]: Data hash table of /var/log/journal/7bbd3df51d484f269f08f43e301f05f7/system.journal has a fill level at 75.0 (85333 of 113777 items, 25165824 file size, 294 bytes per hash table item), suggesting rotation.
Mar 25 04:50:01 homeassistant systemd-journald[131]: /var/log/journal/7bbd3df51d484f269f08f43e301f05f7/system.journal: Journal header limits reached or header out-of-date, rotating.
Mar 25 05:07:14 homeassistant systemd-journald[131]: Data hash table of /var/log/journal/7bbd3df51d484f269f08f43e301f05f7/system.journal has a fill level at 75.0 (85336 of 113777 items, 25165824 file size, 294 bytes per hash table item), suggesting rotation.
Mar 25 05:07:14 homeassistant systemd-journald[131]: /var/log/journal/7bbd3df51d484f269f08f43e301f05f7/system.journal: Journal header limits reached or header out-of-date, rotating.
Mar 25 05:24:30 homeassistant NetworkManager[446]: <info>  [1711344270.5623] dhcp4 (eno1): state changed new lease, address=10.0.0.25
Mar 25 05:26:13 homeassistant systemd-journald[131]: Data hash table of /var/log/journal/7bbd3df51d484f269f08f43e301f05f7/system.journal has a fill level at 75.0 (85333 of 113777 items, 25165824 file size, 294 bytes per hash table item), suggesting rotation.
Mar 25 05:26:13 homeassistant systemd-journald[131]: /var/log/journal/7bbd3df51d484f269f08f43e301f05f7/system.journal: Journal header limits reached or header out-of-date, rotating.
Mar 25 05:44:43 homeassistant systemd-journald[131]: Data hash table of /var/log/journal/7bbd3df51d484f269f08f43e301f05f7/system.journal has a fill level at 75.0 (85336 of 113777 items, 25165824 file size, 294 bytes per hash table item), suggesting rotation.
Mar 25 05:44:43 homeassistant systemd-journald[131]: /var/log/journal/7bbd3df51d484f269f08f43e301f05f7/system.journal: Journal header limits reached or header out-of-date, rotating.
Mar 25 06:02:33 homeassistant systemd-journald[131]: Data hash table of /var/log/journal/7bbd3df51d484f269f08f43e301f05f7/system.journal has a fill level at 75.0 (85333 of 113777 items, 25165824 file size, 294 bytes per hash table item), suggesting rotation.
Mar 25 06:02:33 homeassistant systemd-journald[131]: /var/log/journal/7bbd3df51d484f269f08f43e301f05f7/system.journal: Journal header limits reached or header out-of-date, rotating.
Mar 25 06:12:46 homeassistant kernel: audit: type=1334 audit(1711347166.498:855): prog-id=195 op=LOAD
Mar 25 06:12:46 homeassistant kernel: audit: type=1334 audit(1711347166.498:856): prog-id=196 op=LOAD
Mar 25 06:12:46 homeassistant kernel: audit: type=1334 audit(1711347166.498:857): prog-id=197 op=LOAD
Mar 25 06:12:46 homeassistant systemd[1]: Starting Hostname Service...
Mar 25 06:12:46 homeassistant systemd[1]: Started Hostname Service.
Mar 25 06:12:46 homeassistant kernel: audit: type=1334 audit(1711347166.670:858): prog-id=198 op=LOAD
Mar 25 06:12:46 homeassistant kernel: audit: type=1334 audit(1711347166.670:859): prog-id=199 op=LOAD
Mar 25 06:12:46 homeassistant kernel: audit: type=1334 audit(1711347166.670:860): prog-id=200 op=LOAD
Mar 25 06:12:46 homeassistant systemd[1]: Starting Time & Date Service...
Mar 25 06:12:46 homeassistant systemd[1]: Started Time & Date Service.
Mar 25 06:13:16 homeassistant systemd[1]: systemd-hostnamed.service: Deactivated successfully.
Mar 25 06:13:16 homeassistant kernel: audit: type=1334 audit(1711347196.745:861): prog-id=197 op=UNLOAD
Mar 25 06:13:16 homeassistant kernel: audit: type=1334 audit(1711347196.745:862): prog-id=196 op=UNLOAD
Mar 25 06:13:16 homeassistant kernel: audit: type=1334 audit(1711347196.745:863): prog-id=195 op=UNLOAD
Mar 25 06:13:16 homeassistant systemd[1]: systemd-timedated.service: Deactivated successfully.
Mar 25 06:13:16 homeassistant kernel: audit: type=1334 audit(1711347196.885:864): prog-id=200 op=UNLOAD
Mar 25 06:13:16 homeassistant kernel: audit: type=1334 audit(1711347196.885:865): prog-id=199 op=UNLOAD
Mar 25 06:13:16 homeassistant kernel: audit: type=1334 audit(1711347196.885:866): prog-id=198 op=UNLOAD
Mar 25 06:20:32 homeassistant systemd-journald[131]: Data hash table of /var/log/journal/7bbd3df51d484f269f08f43e301f05f7/system.journal has a fill level at 75.0 (85334 of 113777 items, 25165824 file size, 294 bytes per hash table item), suggesting rotation.
Mar 25 06:20:32 homeassistant systemd-journald[131]: /var/log/journal/7bbd3df51d484f269f08f43e301f05f7/system.journal: Journal header limits reached or header out-of-date, rotating.
Mar 25 06:38:12 homeassistant systemd-journald[131]: Data hash table of /var/log/journal/7bbd3df51d484f269f08f43e301f05f7/system.journal has a fill level at 75.0 (85335 of 113777 items, 25165824 file size, 294 bytes per hash table item), suggesting rotation.
Mar 25 06:38:12 homeassistant systemd-journald[131]: /var/log/journal/7bbd3df51d484f269f08f43e301f05f7/system.journal: Journal header limits reached or header out-of-date, rotating.
Mar 25 06:46:01 homeassistant kernel: audit: type=1334 audit(1711349161.163:867): prog-id=201 op=LOAD
Mar 25 06:46:01 homeassistant systemd-timesyncd[1155]: Network configuration changed, trying to establish connection.
Mar 25 06:46:01 homeassistant systemd-timesyncd[1155]: Contacted time server 10.0.0.2:123 (10.0.0.2).
Mar 25 06:46:01 homeassistant systemd[1]: Started Journal Gateway Service.
Mar 25 06:46:01 homeassistant systemd-journal-gatewayd[292234]: microhttpd: MHD_OPTION_EXTERNAL_LOGGER is not the first option specified for the daemon. Some messages may be printed by the standard MHD logger.
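
To check whether anything in the HOST log lines up with the outages, it can help to measure how regularly each message repeats. Below is a minimal Python sketch, assuming the syslog-style timestamps shown above; `gaps_seconds` and `SAMPLE` are hypothetical names, the sample is a handful of lines copied from the log above, and the year is an assumption because the stamps omit it.

```python
import re
from datetime import datetime

# Host log lines start with a syslog-style stamp such as "Mar 25 01:54:30".
STAMP_RE = re.compile(r"^([A-Z][a-z]{2} +\d{1,2} \d{2}:\d{2}:\d{2}) ")


def gaps_seconds(lines, needle, year=2024):
    """Return gaps in seconds between consecutive lines containing needle."""
    times = []
    for line in lines:
        match = STAMP_RE.match(line)
        if match and needle in line:
            # The stamp has no year, so one is supplied (assumed: 2024).
            stamp = f"{year} {match.group(1)}"
            times.append(datetime.strptime(stamp, "%Y %b %d %H:%M:%S"))
    return [int((b - a).total_seconds()) for a, b in zip(times, times[1:])]


# A few lines copied from the HOST log above.
SAMPLE = """\
Mar 25 01:38:08 homeassistant systemd-journald[131]: /var/log/journal/7bbd3df51d484f269f08f43e301f05f7/system.journal: Journal header limits reached or header out-of-date, rotating.
Mar 25 01:54:30 homeassistant NetworkManager[446]: <info>  [1711331670.5606] dhcp4 (eno1): state changed new lease, address=10.0.0.25
Mar 25 01:55:38 homeassistant systemd-journald[131]: /var/log/journal/7bbd3df51d484f269f08f43e301f05f7/system.journal: Journal header limits reached or header out-of-date, rotating.
Mar 25 02:13:13 homeassistant systemd-journald[131]: /var/log/journal/7bbd3df51d484f269f08f43e301f05f7/system.journal: Journal header limits reached or header out-of-date, rotating.
Mar 25 03:39:30 homeassistant NetworkManager[446]: <info>  [1711337970.5610] dhcp4 (eno1): state changed new lease, address=10.0.0.25
Mar 25 05:24:30 homeassistant NetworkManager[446]: <info>  [1711344270.5623] dhcp4 (eno1): state changed new lease, address=10.0.0.25
"""

lines = SAMPLE.splitlines()
print("DHCP lease gaps (s):", gaps_seconds(lines, "dhcp4 (eno1)"))
print("Journal rotation gaps (s):", gaps_seconds(lines, "rotating."))
```

In this excerpt the DHCP leases are 6300 seconds (105 minutes) apart and the journal rotations roughly 17.5 minutes apart. Both look periodic, whereas the outages have not shown any pattern so far, so neither is an obvious match; the frequent rotation may still be worth a look as a sign of heavy logging.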

Then for SUPERVISOR, there is nothing relevant. Everything is green, the only errors are unauthorized logins to the mqtt server (which might be caused by a misconfigured mqtt device): 24-03-25 07:44:13 WARNING (MainThread) [supervisor.auth] Unauthorized login for 'tasmota'

All Add-Ons seem fine as well. Either there is nothing at all (same message for each of those about “no errors found”), or there is general debug output (service started etc.) without any errors.

But again, Home Assistant lets me access the web interface at the moment. These errors I am looking for cannot/should not be there at the moment, because things work at the moment.

When I cannot access the web interface, I ssh into the machine and either tail -f /homeassistant/home-assistant.log or ha <host|system|> <info|logs>.

But there isn’t anything there that I can make any sense of. No “connection refused / connection error / http”, nothing about “8123” (that specific port), not anything I’d read and think “yeah, this has to be fixed”.

There are errors, but, for example, about misconfigured mqtt entities; one I keep seeing is some device running an older tasmota firmware version, so it doesn’t report some values formatted in a way that Home Assistant expects; this needs fixing as well, but it cannot be the reason for not reaching the web interface.

Right now, all HACS custom_components are running. All these Add-Ons are running (marked those I disabled)

And also right now everything works fine. I can access the web interface, I can toggle devices like I am supposed to. They are somewhat sluggish, but that’s all I notice. Usually, when I click on my desk lamp, it’ll turn on right away and the icon will change to display its status - at the moment, this takes about 2 seconds each. Click - 2 sec - light turns on - 2 sec - icon changes. Click again - 2 sec - light turns off - 2 sec - icon changes back. While this isn’t ideal, I can live with it for now.

But I don’t understand why suddenly things stop working. Last night, I wasn’t able to turn off a particular light in the bedroom, because I couldn’t access the web interface and hadn’t programmed a physical button for it. Fortunately, the light was part of an automation, so it still turned off eventually - but that’s not the point.

I noticed one more thing, might be irrelevant, but I thought I’d mention it:

board: generic-x86-64
boot: B
data_disk: Samsung-SSD-980-PRO-1TB-S5GXNS0T101226W
update_available: false
version: "12.1"
version_latest: "12.1"

boot: B. Does this mean anything? For example, is there a boot A and boot B, and might there be something wrong with boot A? It is always set to B, after each reboot.

francisp · March 25, 2024, 7:29am

There are 2 partitions with the OS. If you boot from B now and there is an HA OS upgrade, the upgrade will be installed on the other partition, and if no errors happen, boot A will be selected.

pdwonline · March 25, 2024, 7:53am

Sounds like something like a firewall when you can ssh to the machine but not reach the web interface. Are you sure UFW is not running and misconfigured?

prankousky · March 25, 2024, 8:19am

I am running hass os on bare metal. As far as I can tell, there is no UFW. (at least there is no ufw binary and nothing about it in ha network)

Also, I can access the web interface at times. Today, it works fine. Yesterday, it didn’t work all day. I didn’t change any settings between last night (when nothing worked) and this morning (can still access it since my first post today, so it is stable at the moment) – yet now I can use the web interface and yesterday I could not.

If it were a firewall thing, it should either work or not work at all, but not work sometimes, right? Unless the firewall settings were to change in between.

Both devices are on the same VLAN, so there is no port restriction issue. This is really strange.

francisp · March 25, 2024, 8:23am

Stupid question, but did you try to change the Ethernet cable ?

boheme61 · March 25, 2024, 12:05pm

I assume you are accessing through Ethernet, which rules out Wi-Fi as a cause. (Don’t use your phone or Wi-Fi to troubleshoot connection issues, unless that specific device is the one you are troubleshooting.)

The only thing I see in the host log is the journal rotating, which could be due to extensive logging.

UniFi add-on: yes, I see you have a lot that you “control” through HA. Personally, I prefer using HA only for IoT devices and automations.

I don’t load my HA system with network administration (router, switches, VPN, firewall, etc.). The less the better, for both performance and stability.

So when I eventually “lift” the cameras out, I would also put Frigate on a standalone unit.

The 2-second delay when turning on a simple light could be due to a cloud connection.
I see no such delay with my lights, switches, motion detection, door sensors, etc. (including the automations that control or assist most of these), and I find 2 seconds at the upper limit of what’s acceptable!
A light should turn on within a second after motion is detected, and toggling a switch manually through the UI should take a split second (though in most cases this requires local control only, with no cloud dependencies for the devices involved).

As you have struggled with this for more than a month, I understand your frustration.
In such a situation, I would try to clear my head, update HA to the latest version, and then reboot the host.
Then check all logs (even on your router) after the fresh reboot, and again when the issues start to appear. Look for timeouts and connection issues, and don’t have e.g. the Glances monitor or the Ping integration running.

I would look at any or all of: network issues, CPU/process/memory usage, and DB connection usage (I assume you have also checked your MariaDB log files).

If there are no “significant” delays (2 seconds are significant in this context!) when you ping or connect through SSH (over a wired connection), it sounds like the problem is either in the network at the HTTP/HTTPS level (I don’t know which you use; if it’s HTTPS, which I don’t use, you might have to expand your troubleshooting area),
or the Core/frontend is busy shuffling data due to inappropriate templates and/or API calls. (And you have quite a few add-ons, which I don’t load HA with.)
Nginx/rest Docker, Cloudflare, UniFi, pfSense (do you also have Nabu Casa?)
Mosquitto broker, ESPHome, Zigbee2MQTT
Frigate

So I guess you have lots of log files to check, and if the standard logging level shows any indications, enable DEBUG level for those add-ons.

Sorry I can’t be of more help here. It’s already been a month, and I haven’t got more info (besides all the newly added information about your complex, stuffed setup/HA installation) than I could have collected within a day on site. Sitting at the other end just makes it frustrating and cumbersome.

Maybe you have built up something that has grown beyond your oversight.

BR

prankousky · March 25, 2024, 12:43pm

Yes. It was actually a crappy one (temporary fix I forgot about). After replacing it with a decent cable, there was no change. Things worked fine, until they didn’t.

Right now, I can access the web ui, but things load really slowly – to the point where I cannot do anything with HA because by the time something loads (for example, creating a helper), minutes have passed. In fact, just now it lost connection between me clicking on “add helper” and it trying to load the popup that is supposed to load within a second.

Is this something?

24-03-25 13:02:23 ERROR (MainThread) [supervisor.homeassistant.api] Timeout on call http://172.30.32.1:8123/api/core/state.
24-03-25 13:02:23 WARNING (MainThread) [supervisor.misc.tasks] Watchdog missed an Home Assistant Core API response.
24-03-25 13:05:49 ERROR (MainThread) [supervisor.homeassistant.api] Timeout on call http://172.30.32.1:8123/api/core/state.
24-03-25 13:05:49 INFO (MainThread) [supervisor.auth] Home Assistant not running, checking cache
24-03-25 13:07:16 ERROR (MainThread) [supervisor.homeassistant.api] Timeout on call http://172.30.32.1:8123/api/core/state.
24-03-25 13:07:16 WARNING (MainThread) [supervisor.misc.tasks] Watchdog missed an Home Assistant Core API response.

These are all from the SUPERVISOR logs. The IP address there must be internal only. These are actual errors from right now, while I have no access to the web ui.
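A pattern in these timeouts may be easier to spot once the timestamps are pulled out of the log. The following is a minimal Python sketch, assuming the Supervisor log has been saved as text in the format shown above. The function names `timeout_times` and `gaps_seconds` are made up for illustration and are not part of Home Assistant.

```python
import re
from datetime import datetime

# Matches Supervisor log lines of the form seen above, e.g.
# 24-03-25 13:02:23 ERROR (MainThread) [supervisor.homeassistant.api] Timeout on call ...
LINE_RE = re.compile(
    r"^(\d{2}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) (\w+) \(\w+\) \[([\w.]+)\] (.*)$"
)


def timeout_times(log_text):
    """Return the timestamps of all 'Timeout on call' ERROR lines."""
    times = []
    for line in log_text.splitlines():
        match = LINE_RE.match(line.strip())
        if match and match.group(2) == "ERROR" and "Timeout on call" in match.group(4):
            # The log uses two-digit year-month-day followed by the time.
            times.append(datetime.strptime(match.group(1), "%y-%m-%d %H:%M:%S"))
    return times


def gaps_seconds(times):
    """Return the number of seconds between consecutive timestamps."""
    return [int((b - a).total_seconds()) for a, b in zip(times, times[1:])]


# The six lines quoted in this post, used as sample input.
SAMPLE = """\
24-03-25 13:02:23 ERROR (MainThread) [supervisor.homeassistant.api] Timeout on call http://172.30.32.1:8123/api/core/state.
24-03-25 13:02:23 WARNING (MainThread) [supervisor.misc.tasks] Watchdog missed an Home Assistant Core API response.
24-03-25 13:05:49 ERROR (MainThread) [supervisor.homeassistant.api] Timeout on call http://172.30.32.1:8123/api/core/state.
24-03-25 13:05:49 INFO (MainThread) [supervisor.auth] Home Assistant not running, checking cache
24-03-25 13:07:16 ERROR (MainThread) [supervisor.homeassistant.api] Timeout on call http://172.30.32.1:8123/api/core/state.
24-03-25 13:07:16 WARNING (MainThread) [supervisor.misc.tasks] Watchdog missed an Home Assistant Core API response.
"""

if __name__ == "__main__":
    print(gaps_seconds(timeout_times(SAMPLE)))  # prints [206, 87]
```

For the three timeouts quoted here the gaps are 206 and 87 seconds, so this excerpt alone shows no fixed interval. A longer log would be needed to say anything about a pattern.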

I am connected directly through the local network. hassOS on ethernet, my PC on ethernet, same switch, same VLAN. I use the phone through wifi to debug as well, because if ethernet worked and wifi didn’t, I’d know there might be something to investigate. But there is no difference between ethernet and wifi.

Yeah. That’s how it always was. Blazing fast. Lights were zigbee2mqtt, controlled locally. But even when controlling from on the road (via VPN connection, not nabu), things worked decently quick. This all just happened recently.

There doesn’t seem to be any network activity on the host at all.

I tested this by curling a website from that host in a tmux split. Then, there would be small numbers. So while I am currently trying to access HA, there is absolutely no I/O activity. But there should be, correct? It cannot serve the pages without it.

I don’t run pfsense on the hass machine. It is a dedicated firewall. When testing this, I deactivated the addons via ha addons stop <name> through ssh. I deactivated all of them except for ssh (obviously), it did not make a difference.

Just checked the pfsense logs. There is nothing about that host’s IP address in there except for DHCP requests. No blocks, no filters, nothing.

I got rid of addons that I thought I wouldn’t really need - or at least could do without: piper, whisper, openwakeword, vscode.

At the moment, only official addons are installed. This doesn’t have to mean anything, but the way I see it: aren’t the Home Assistant yellow / blue devices sold by (and therefore, I assume, recommended by) the Home Assistant team “just” raspberry pis with additional hardware? Yet, these addons are official… I see this as “these addons should run fine alongside hassos on our recommended hardware”. Maybe that’s wrong.

But then there’s two other things: 1) This is an intel NUC, not a raspberry pi. There’s 32GB of RAM. There’s an Intel(R) Core™ i7-8809G CPU @ 3.10GHz with 8 cores (if I read the output from cat /proc/cpuinfo correctly). Shouldn’t this be enough for hassos with a couple of official addons?

And then, perhaps more importantly, 2) this system has been running perfectly fine for years. Yeah, I sometimes add (also remove) integrations and (less frequently) addons – but it has always been working quickly as expected. No lag, no being logged out of the web ui, no losing connection (not even with wifi).

The heavy lifting for frigate is done by a google coral USB stick, but I stopped this addon as well to see whether this would make a difference. It didn’t.

Now, typing all this took me some time. The helper entity from the beginning of the post that I tried to create finally got to the point where I could select what type of helper I’d like to create; then (now that I just checked) it lost connection again.

Btw., everything is already up to date. supervisor 2024.03.0, os 12.1. All addons and the HACS stuff are up to date.

So I’d understand if I had just installed Home Assistant on this device and nothing would work. Then I’d say “it’s too much work for the device”. But again, the device has been working fine like this for ages. I even disabled integrations that have been working without issues just to see if it was going to work as a “bare minimum” (only addons and integrations I really need to use on a daily basis, no fun / experimental / useless stuff). Even that didn’t make any difference. Things are disabled atm., so the system should run faster, if anything. But the issue is still there.

But also, I should mention again that, often, Home Assistant itself works fine. I am not able to access the UI, but walking in the hallway in the evening will result in the lights automatically turning on. While going crazy not finding a solution for this, I will get a signal notification (including multiple camera snapshots) when the front gate opens.

Hopefully, this log I posted above gets us somewhere. I am currently working on a fresh configuration in a dockerized environment on my PC. So if nothing works, my plan is to set up a fresh install on the actual hass machine and have everything relevant ready to copy there. I also have daily backups, but restoring them doesn’t seem to make any difference, either.

So while a fresh install might be the best I can do, it is also kinda risky. If there is anything wrong with my current configuration, I cannot use my backups – so I’ll have to start fresh from scratch and do everything (connecting services, create those automations and helpers that aren’t in package yaml files) manually.

boheme61 · March 25, 2024, 12:54pm

It looks to me like you are showing disk usage, not network usage.
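One way to look at network rather than disk activity on a Linux host is to read the per-interface byte counters in `/proc/net/dev` twice and compare them. This is a minimal Python sketch, not something taken from this thread: it assumes Python is available where you run it (whether that is true inside the SSH add-on shell is an assumption), and the function names `parse_net_dev` and `rates` are made up for illustration.

```python
import time
from pathlib import Path

PROC_NET_DEV = Path("/proc/net/dev")


def parse_net_dev(text):
    """Return {interface: (rx_bytes, tx_bytes)} from /proc/net/dev content."""
    counters = {}
    for line in text.splitlines()[2:]:  # the first two lines are column headers
        if ":" not in line:
            continue
        name, data = line.split(":", 1)
        fields = data.split()
        # Field 0 is received bytes, field 8 is transmitted bytes.
        counters[name.strip()] = (int(fields[0]), int(fields[8]))
    return counters


def rates(before, after, seconds):
    """Return {interface: (rx_bytes_per_s, tx_bytes_per_s)} between two samples."""
    result = {}
    for name, (rx1, tx1) in after.items():
        if name in before:
            rx0, tx0 = before[name]
            result[name] = ((rx1 - rx0) / seconds, (tx1 - tx0) / seconds)
    return result


if __name__ == "__main__" and PROC_NET_DEV.exists():
    first = parse_net_dev(PROC_NET_DEV.read_text())
    time.sleep(1.0)
    second = parse_net_dev(PROC_NET_DEV.read_text())
    for name, (rx, tx) in sorted(rates(first, second, 1.0).items()):
        print(f"{name:>12}  rx {rx:10.0f} B/s  tx {tx:10.0f} B/s")
```

Running it while the web UI is hanging would show whether the physical interface is moving any bytes at that moment. The counters are totals per interface, so this does not tell you which connection the traffic belongs to.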

More than enough, frankly “overkill”. I’m sure you’d be better off with 2 to max 4 CPUs and 4 to max 8 GB RAM.

code-in-progress · March 25, 2024, 12:57pm

I kinda wonder if you are taking the wrong approach here. Rather than looking at addons and integrations, take a look at your lovelace config instead.

Given that HA itself is up, along with all your other integrations, what do you have in terms of custom frontend cards/scripts?

My thought is that perhaps there is something in the frontend that is blocking or timing out the websocket connection. When this happens, have you looked at the browser console to see what errors might be there? I know the first instinct is usually to check the server, but your issues seem to be client based rather than server based.
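To help separate a client-side problem from a server-side one, a small probe outside the browser can log how long the server takes to answer plain HTTP requests. This is only a sketch under assumptions: the URL passed on the command line (for example `http://homeassistant.local:8123/`) depends on your setup, and the helper names `timed` and `fetch_frontend` are made up for illustration.

```python
import sys
import time
import urllib.error
import urllib.request


def timed(fetch):
    """Run fetch() and return (ok, seconds). Any exception counts as a failure."""
    start = time.monotonic()
    try:
        fetch()
        ok = True
    except Exception:
        ok = False
    return ok, time.monotonic() - start


def fetch_frontend(url, timeout=5.0):
    """Request the URL and read one byte of the response."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            resp.read(1)
    except urllib.error.HTTPError:
        # The server answered, just not with a 2xx status, so it is reachable.
        pass


# Only runs when a URL is given, e.g.:
#   python probe.py http://homeassistant.local:8123/
if __name__ == "__main__" and len(sys.argv) > 1:
    url = sys.argv[1]
    while True:
        ok, seconds = timed(lambda: fetch_frontend(url))
        stamp = time.strftime("%H:%M:%S")
        print(f"{stamp} {'ok  ' if ok else 'FAIL'} {seconds:.2f}s", flush=True)
        time.sleep(5)
```

If the probe keeps answering quickly while the browser shows the connection popup, that would point toward the dashboard or browser. If the probe also stalls or fails at the same moments, the server or network is the more likely place to look.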

prankousky · March 25, 2024, 2:05pm

Yeah. Damnit. I realized that and already looked into some tools, but those that work (and can be installed on alpine linux, which hass os runs on) and that I’ve tried didn’t give me an option to filter. There are countless connections due to all my local devices, so I could not debug this way.

@code-in-progress ahhh! Thank you so much for this suggestion. I am afraid to celebrate too soon – but I just removed two cards from my “testing” dashboard. One was constantly trying to display a local file as an iframe (which didn’t work), the other also tried to access something on the hassos server. Now, I cannot be sure atm., because these issues come and go without any recognizable pattern… but as of now, everything runs smoothly and just the way I’d expect Home Assistant to work (and as it always has).

I’ll report back in case this wasn’t it, but at the moment, it looks like one of these cards, or the combination of both, was causing this issue, maybe by causing too much internal traffic?

I tried to trigger my Home Assistant into freezing by doing what I’ve tried to do for days, finishing some simple automations. Nothing so far. It still works

code-in-progress · March 25, 2024, 2:17pm

Anytime. Remember that the client side can get just as messed up as the server side can (oftentimes much, much worse). If you see the issue again, drop into dev tools in your browser and look at the Console tab and the Network tab. Those two tabs will usually suss out any problems with lovelace pretty quickly.

stevemann · March 25, 2024, 3:04pm

tl;dr
Following because I am also running an Intel NUC i3 with native HAOS.

How often does the “Connection Lost” popup present?
Have you tried rebooting your router?
Have you tried clearing the browser cache?
If you have another PC, do both lose connection simultaneously?