I am running hass.io in Docker on a NUC running Ubuntu 18.04. The homeassistant container uses the host network. For quite some time now I have been getting connection errors on all integrations that use the internet, and this is starting to get on my nerves. The logs show errors like the ones below.
Unable to retrieve raindata from Buienradar.(Msg: Cannot connect to host gadgets.buienradar.nl:80 ssl:None [Network unreachable], status: None,)
Unable to connect to Dark Sky. HTTPSConnectionPool(host='api.darksky.net', port=443): Max retries exceeded with url: /forecast/REDACTED/52.2,4.8?units=si&lang=en (Caused by NewConnectionError('<urllib3.connection.VerifiedHTTPSConnection object at 0x7f13ae66b610>: Failed to establish a new connection: [Errno -3] Try again'))
HTTPConnectionPool(host='www.REDACTED.nl', port=80): Max retries exceeded with url: /emoncms9/feed/list.json?apikey=REDACTED(Caused by ConnectTimeoutError(<urllib3.connection.HTTPConnection object at 0x7f13ae2d8790>, 'Connection to www.REDACTED.nl timed out. (connect timeout=5)'))
I run a ping integration against buienradar.nl in parallel, and it doesn’t indicate any outages. When I open any of the affected sites manually, they work flawlessly. The problem occurs with both HTTP and HTTPS connections, and it occurs frequently but at random. The rest of the network has never shown any hiccups whatsoever.
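A ping check and an HTTP request do not exercise the same path: the errors above mix what looks like a name-resolution failure ("[Errno -3] Try again"), a routing failure ("Network unreachable") and a plain connect timeout. One way to narrow this down is to probe DNS resolution and the TCP connect as separate steps from the same machine. The following is only a minimal sketch under that assumption; `probe` and `watch` are hypothetical helper names, not part of Home Assistant.

```python
import socket
import time


def probe(host, port, timeout=5.0):
    """Return (stage, detail), where stage is 'ok', 'dns' or 'connect'."""
    # Step 1: name resolution only. A failure here points at DNS.
    try:
        infos = socket.getaddrinfo(host, port, type=socket.SOCK_STREAM)
    except socket.gaierror as exc:
        return ("dns", str(exc))

    # Step 2: TCP connect to the first resolved address.
    family, socktype, proto, _canonname, sockaddr = infos[0]
    try:
        with socket.socket(family, socktype, proto) as sock:
            sock.settimeout(timeout)
            sock.connect(sockaddr)
    except OSError as exc:
        # Covers timeouts, "Network unreachable" and refused connections.
        return ("connect", str(exc))
    return ("ok", sockaddr[0])


def watch(targets, interval=30.0, rounds=10):
    """Probe each (host, port) pair repeatedly and print only the failures."""
    for _ in range(rounds):
        for host, port in targets:
            stage, detail = probe(host, port)
            if stage != "ok":
                print(time.strftime("%H:%M:%S"), host, port, stage, detail)
        time.sleep(interval)
```

Calling something like `watch([("gadgets.buienradar.nl", 80), ("api.darksky.net", 443)])` for a while and comparing the printed timestamps with the Home Assistant log should show whether the failures cluster in the DNS step or in the connect step.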
I decided to test the connections by running homeassistant on a Raspberry Pi (no Docker whatsoever) with more or less the same config, and there too the log fills up with connection errors.
This leads me to the conclusion that something in homeassistant is broken, but I cannot figure out what. I do run quite a lot of sensors that rely on the internet (around 20), but that shouldn’t be too much for a beefy NUC to handle (CPU and memory usage never go above 40%).
It’s hard to tell without knowing which components you are running on Home Assistant. In the end, you have the most intimate knowledge of your instance (including what you changed recently) and have the best shot at figuring out the issue. My general advice is to suspect everything and revert all your recent changes, even those you think couldn’t possibly crash your instance.
I would also recommend disabling all custom components, Lovelace cards and Hass.io add-ons, then re-enabling them one at a time, letting each run for several hours to a day before moving on to the next. I’ve had cases (with error messages similar to yours: vague connection errors across all components) where Home Assistant became unreachable because of the DuckDNS Hass.io add-on, which I never would have suspected if I hadn’t stumbled on a forum post about DuckDNS issues.
I took a deep dive into the router logs and think I have found the source of the problem: every few minutes the following shows up in my router log, and it doesn’t look healthy. I really don’t know what is causing it…
Aug 4 06:16:33 dnsmasq[220]: read /etc/hosts - 5 addresses
Aug 4 06:16:33 dnsmasq[220]: read /etc/hosts.dnsmasq - 20 addresses
Aug 4 06:16:33 dnsmasq[220]: using nameserver 1.0.0.1#53
Aug 4 06:16:33 dnsmasq[220]: using nameserver 1.1.1.1#53
Aug 4 06:16:33 nat: apply nat rules (/tmp/nat_rules_eth0_eth0)
Aug 4 06:16:33 rc_service: wanduck 158:notify_rc stop_ntpd
Aug 4 06:16:33 rc_service: wanduck 158:notify_rc start_ntpd
Aug 4 06:16:33 rc_service: waitting "stop_ntpd" via wanduck ...
Aug 4 06:16:33 ntpd: Stopped ntpd
Aug 4 06:16:34 ntpd: Started ntpd
Aug 4 06:18:32 nat: apply redirect rules
Aug 4 06:18:32 DualWAN: skip single wan wan_led_control - WANRED off
Aug 4 06:18:37 WAN_Connection: WAN was restored.
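To check whether these router events line up with the Home Assistant errors, it may help to pull the timestamps of the "WAN was restored" lines out of the log and compare them with the times in the Home Assistant log. The snippet below is a rough sketch that assumes every log line has the "month day time tag: message" shape seen above; `parse` and `wan_restores` are hypothetical helper names, and the sample lines are copied from the excerpt.

```python
import re

# Assumed line shape: "Aug 4 06:16:33 dnsmasq[220]: message"
LINE_RE = re.compile(
    r"^(?P<stamp>[A-Z][a-z]{2}\s+\d{1,2}\s+\d{2}:\d{2}:\d{2})\s+"
    r"(?P<tag>[^:\[\s]+)(?:\[\d+\])?:\s+(?P<msg>.*)$"
)

SAMPLE = [
    "Aug 4 06:16:33 dnsmasq[220]: using nameserver 1.1.1.1#53",
    "Aug 4 06:16:33 rc_service: wanduck 158:notify_rc stop_ntpd",
    "Aug 4 06:18:32 nat: apply redirect rules",
    "Aug 4 06:18:37 WAN_Connection: WAN was restored.",
]


def parse(lines):
    """Split each matching line into (timestamp, tag, message)."""
    entries = []
    for line in lines:
        match = LINE_RE.match(line.strip())
        if match:
            entries.append((match["stamp"], match["tag"], match["msg"]))
    return entries


def wan_restores(entries):
    """Return the timestamps at which the router reported the WAN as restored."""
    return [
        stamp
        for stamp, tag, msg in entries
        if tag == "WAN_Connection" and "restored" in msg
    ]


if __name__ == "__main__":
    for stamp in wan_restores(parse(SAMPLE)):
        print(stamp)
```

If the restore timestamps fall right after clusters of connection errors in Home Assistant, that would support the idea that the router briefly considers the WAN down, rather than Home Assistant itself being at fault.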