Hass.io on NUC randomly losing network connection (and WAF)

Mathijs · December 10, 2019, 9:04am

I run Home Assistant via the Hass.io Docker install on a NUC (NUC7i5BNK). The operating system is Debian 10, headless. The IP is static.

The problem is that Home Assistant randomly loses its connection about every 10 minutes, for around 2 minutes each time.
Meanwhile the log shows that the Home Assistant container is running fine. Automations with Z-Wave devices also keep working, but automations with Wi-Fi devices (ESPHome) get delayed until it reconnects.

Meanwhile Portainer (not installed via Hass.io) stays connected just fine.

I have removed all custom components, and I am thinking about disabling other (official) integrations until Hass works normally. Is this a smart approach?

I hope you guys have some advice and can help me.

DavidFW1960 · December 10, 2019, 9:16am

I had a similar issue with the same setup a few months ago… Never got to the bottom of it. It was taking 5 mins plus to restart too… Now it’s back to 20 seconds. What add-ons and custom_components do you use? Is everything up to date?

Just wracking my brain for anything else helpful…

Mathijs · December 10, 2019, 9:58am

It really is weird. I spent all evening yesterday trying to figure it out.
Everything is up to date: NUC BIOS, Debian and Home Assistant 0.102.3.

Yesterday I disabled all custom_components.

Addons that I have running:

Backup Hass.io to Google Drive
ESPHome
MariaDB
Samba share
Visual Studio Code
deCONZ
Node-RED (only installed and running, no automations yet)

And a lot of “normal” integrations.

DavidFW1960 · December 10, 2019, 10:38am

I did go back to the default DB from MariaDB - dunno if related.

Mathijs · December 10, 2019, 10:41am

Did that help? The Samba share, which runs through the Hass.io add-on, also keeps working even when Home Assistant itself is not connecting.

DavidFW1960 · December 10, 2019, 8:39pm

There was some other issue I solved by dumping MariaDB, but I don’t think it helped with this one.

jimz011 · December 10, 2019, 9:08pm

Hm I had this issue a while ago when using the upnp component, where my modem would just reset every 30 minutes due to “too many requests”. It is probably unrelated to this, but every experience might help you out.

I would try a clean setup as well to see if it is your actual setup that is the culprit. Might be a faulty NIC, but who knows.

DavidFW1960 · December 10, 2019, 9:17pm

Interesting data point Jim. I have that component as well. Weird the issue just went away. Maybe it was after an update that fixed something.

jimz011 · December 10, 2019, 9:30pm

Actually, I did away with the modem and got myself another one. That fixed the problem for me. It was indeed very, very weird. But the crazy thing is I had problems before with the UPnP component where HA wouldn’t even start (or it would take extremely long, 10 min instead of 10 sec). It does work now, however I can’t say for sure if it was an HA update, a clean install or something else. The modem/router problem was certainly my modem, though. (I have the same model now, which my ISP swapped in.) However, I did put it in bridge mode now with a separate router.

DavidFW1960 · December 10, 2019, 9:50pm

I have had ongoing issues with the UPnP integration for probably as long as it has been there. I opened a few issues about it and Steven Looman did a great job fixing it and it’s been good for a while now.

Mathijs · December 11, 2019, 11:08am

The modem from my ISP is in bridge mode, connected to my router. The router is a TP-Link Archer C3200 with UPnP enabled. I gave the server a static IP address in Debian. Services like SSH, Samba and Portainer on the same server experience no downtime, only Home Assistant does. I can view the Home Assistant container log in Portainer while Hass is not reachable through “serverip:8123”. In the log it is clear that Hass is unable to connect to any network services, ping, etc. Other things in Hass, like automations for non-network-related stuff, keep working fine.

Then 2 minutes later Hass reconnects and everything works fine for like 10 minutes, and then the connection drops again.
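
To pin down when the drops happen and how long they last, the web port can be probed from another machine on the network. Below is a minimal sketch, not a definitive tool: it assumes Home Assistant listens on TCP port 8123 (as in “serverip:8123” above), and the function names `is_port_open`, `monitor` and `summarize_outages` are made up for illustration.

```python
import socket
import time


def is_port_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name resolution failures.
        return False


def summarize_outages(samples):
    """Turn [(timestamp, reachable), ...] into a list of (start, end) outage windows.

    An outage starts at the first failed sample and ends at the next
    successful one. An outage still open when the samples end is reported
    with end=None.
    """
    outages = []
    start = None
    for timestamp, reachable in samples:
        if not reachable and start is None:
            start = timestamp
        elif reachable and start is not None:
            outages.append((start, timestamp))
            start = None
    if start is not None:
        outages.append((start, None))
    return outages


def monitor(host, port, interval=10.0, rounds=6):
    """Probe host:port `rounds` times, `interval` seconds apart, and return the samples."""
    samples = []
    for _ in range(rounds):
        samples.append((time.time(), is_port_open(host, port)))
        time.sleep(interval)
    return samples
```

Running `summarize_outages(monitor("serverip", 8123, interval=10.0, rounds=120))` with the real address in place of the `serverip` placeholder would cover about 20 minutes, which should be enough to catch one or two drops if they really come every 10 minutes. Probing port 22 in the same way would show whether SSH stays reachable during those windows.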

jimz011 · December 11, 2019, 3:28pm

What you could try is to give it a dynamic IP (DHCP) and give it a static IP in your router instead (best practice would be to give it an IP outside of the DHCP range). I doubt it will make a difference though, but it might be worth a shot.

Mathijs · December 16, 2019, 2:00pm

Somehow I fixed it, but I haven’t got a clue how.

All I can do is explain my steps:

1. I completely reinstalled Debian, Hass.io, etc. After restoring the backup I upgraded to 0.103.0, and the connection drop problem persisted.
2. I cleared the MariaDB database and rebooted. This didn’t help either.
3. Disabled MariaDB in the configuration.yaml file by putting a # in front of it. Rebooted, and still connection drops.
4. Disabled about 90% of the integrations in my config file by placing a # in front of them. And for about an hour the connection didn’t drop!
5. After this I re-enabled the integrations step by step, testing for 15 minutes each time, and the connection never dropped.
6. Finally I had all integrations enabled (so the config file was basically the same as in steps 1 and 2) and the connection stayed up. This was yesterday afternoon; it’s now been more than 24 hours without any connection drops. I can see this because an ESPHome Wi-Fi thermometer has no flat lines in its graph. And I also didn’t notice any myself.

I don’t understand at all how I fixed the problem. It took me all Sunday morning and some of the afternoon. But I am glad it’s working now.
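
The “no flat lines in the graph” check can also be done on the raw timestamps of the thermometer readings instead of by eye. A minimal sketch, assuming the readings are available as Python `datetime` objects and that the sensor normally reports more often than a chosen threshold; the name `find_gaps` and the `max_gap` threshold are hypothetical, not something Home Assistant or ESPHome provides.

```python
from datetime import datetime, timedelta


def find_gaps(timestamps, max_gap):
    """Return (previous, next) pairs of consecutive readings further apart than max_gap."""
    ordered = sorted(timestamps)
    return [
        (earlier, later)
        for earlier, later in zip(ordered, ordered[1:])
        if later - earlier > max_gap
    ]
```

An empty result over the last 24 hours would correspond to a graph without flat lines. A gap of roughly 2 minutes recurring about every 10 minutes would match the drops described earlier.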

animoautomation · December 16, 2019, 7:05pm

I had the same issue. My solution…
1.- Uninstall all external custom components.
2.- Test whether Hass.io runs OK.
3.- If OK, then install and activate your custom components one by one and verify which one is causing the failure.
4.- Update or permanently delete the problematic custom component.
5.- Enjoy your Home Assistant without hangs or lost connections.

In my case the custom component from Peterbuga for Sonoff was the cause of my headaches for weeks and of the lost connection every 5 or 10 minutes.
Currently I’m running HA version 0.103.0 or higher.
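
The one-by-one procedure in the steps above can be written down as a small loop. This is only a sketch: Home Assistant has no such API, the names `find_culprit` and `ask_user` are made up, and the stability check is assumed to be a manual observation (for example watching for drops over a 15-minute test window) that you answer yourself.

```python
def find_culprit(components, is_stable):
    """Enable components one at a time and return the first one that breaks stability.

    `is_stable` is called with the list of currently enabled components and
    must return True if the system stayed connected during the test window.
    Returns None when everything is enabled and no failure showed up.
    """
    enabled = []
    for name in components:
        enabled.append(name)
        if not is_stable(list(enabled)):
            return name
    return None


def ask_user(enabled):
    """Manual stability check: restart with `enabled` active, observe, then answer."""
    answer = input(f"Enabled: {', '.join(enabled)}. Stable after the test window? [y/n] ")
    return answer.strip().lower().startswith("y")
```

Calling `find_culprit(my_components, ask_user)` walks through the list in order. A result of `None` means that re-enabling everything never reproduced the problem, so the procedure cannot name a cause in that case.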

Mathijs · December 17, 2019, 7:10am

Yes, this is basically what I did, and I expected to find a problematic component when re-enabling things.
But somehow I re-enabled everything and the problem didn’t reoccur. I am happy but truly clueless.