HomeAssistant becomes unresponsive, KNX Interface errors in _SelectorDatagramTransport

jellevervloessem · December 24, 2023, 10:15am

Hi!

For weeks now I have had to restart my HA almost every day. It becomes unresponsive after several hours, and I see a large memory peak when it does.
The first things that fail are KNX entities (which may be a consequence rather than the cause). Afterwards, all communications tend to fail with HTTPS and SSL errors (tuya, Daikin, scraper, dhcp tasks, switchbot).

So I dug into the logs, and at every restart I can identify some expected errors:

TemplateErrors for unknown values at restart/crash
mqtt that has some duplicate IDs
Ancient config that’s still in there
…

The following also appears quite often, but that should not freeze my HA:

WARNING (MainThread) [xknx.log] Error: KNX bus did not respond in time (2.0 secs) to GroupValueRead request for:

But when the memory peak appears I get the following error that I cannot trace back:

ERROR (KNX Interface) [homeassistant] Error doing job: Exception in callback _SelectorDatagramTransport._read_ready()
Traceback (most recent call last):
  File "/usr/local/lib/python3.11/asyncio/events.py", line 80, in _run
    self._context.run(self._callback, *self._args)
  File "/usr/local/lib/python3.11/asyncio/selector_events.py", line 1163, in _read_ready
    self._protocol.datagram_received(data, addr)
  File "/usr/local/lib/python3.11/site-packages/xknx/io/transport/udp_transport.py", line 48, in datagram_received
    self.data_received_callback(data, addr)
  File "/usr/local/lib/python3.11/site-packages/xknx/io/transport/udp_transport.py", line 96, in data_received_callback
    self.handle_knxipframe(knxipframe, HPAI(*source))
  File "/usr/local/lib/python3.11/site-packages/xknx/io/transport/ip_transport.py", line 68, in handle_knxipframe
    callback.callback(knxipframe, source, self)
  File "/usr/local/lib/python3.11/site-packages/xknx/io/tunnel.py", line 302, in _request_received
    self._tunnelling_request_received(knxipframe.body)
  File "/usr/local/lib/python3.11/site-packages/xknx/io/tunnel.py", line 513, in _tunnelling_request_received
    self._send_tunnelling_ack(
  File "/usr/local/lib/python3.11/site-packages/xknx/io/tunnel.py", line 541, in _send_tunnelling_ack
    self.transport.send(
  File "/usr/local/lib/python3.11/site-packages/xknx/io/transport/udp_transport.py", line 160, in send
    raise CommunicationError("Transport not connected")
xknx.exceptions.exception.CommunicationError: Transport not connected

The most recent logfile can be found here: home-assistant_2023-12-24T09-17-37.451Z.log - Google Drive

farmio · December 24, 2023, 10:51am

Hi !

That’s a result of the KNX “connection” timing out while something is still trying to send over it (kind of a race condition).
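
To illustrate the kind of race I mean, here is a heavily simplified sketch. It is not the actual xknx code: `FakeTransport`, `FakeTunnel` and their methods are made-up names, and only the error message is taken from the traceback above. The tunnel tears down its transport on a timeout, but a frame that arrives afterwards still tries to send an ACK.

```python
class CommunicationError(Exception):
    """Stand-in for the exception seen in the traceback (hypothetical copy)."""


class FakeTransport:
    """Hypothetical transport: send() fails once the connection is torn down."""

    def __init__(self):
        self.connected = True

    def stop(self):
        self.connected = False

    def send(self, frame):
        if not self.connected:
            raise CommunicationError("Transport not connected")
        return f"sent {frame}"


class FakeTunnel:
    """Hypothetical tunnel that acknowledges every incoming request."""

    def __init__(self, transport):
        self.transport = transport

    def on_timeout(self):
        # The connection is considered dead, so the transport is stopped.
        self.transport.stop()

    def on_request_received(self, frame):
        # A frame that is delivered late still triggers an ACK attempt.
        return self.transport.send(f"ACK for {frame}")


def demo():
    tunnel = FakeTunnel(FakeTransport())
    results = [tunnel.on_request_received("frame-1")]  # normal case
    tunnel.on_timeout()                                # connection times out
    try:
        tunnel.on_request_received("frame-2")          # late frame arrives
    except CommunicationError as err:
        results.append(f"error: {err}")
    return results


if __name__ == "__main__":
    print(demo())
```

In this sketch the second frame fails with the same message as in your log, which is why I read that traceback as a symptom of the disconnect rather than its cause.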

I’d try to disable custom integrations (and maybe Addons) and see if the problem disappears. There is even some “safe mode” now that may help with that.

Other than that, there is a profiler that may help if you know how to use it.
See Profiler - Home Assistant
Or Instructions to install Py-spy on HAOS

jellevervloessem · January 11, 2024, 11:44am

So, this is what I have done so far:

restarted in Safe mode
- same behavior: stall after a few hours, lots of HTTP errors on the (non-custom) integrations
disabled all custom integrations from HACS, also the Node-RED Companion and HACS itself
removed all obsolete integrations
removed deprecated yaml in my configuration.yaml
cleaned up all yaml, based on the watchman report

None of the above helped: the system was still slow and unresponsive.
(The unresponsiveness showed up in: Ring, esphome, mqtt, zigbee, tuya, android tv)

Then I tried the opposite and disabled the KNX integration. This resulted in a stable HA setup for everything else (although my setup really depends on KNX). So I think I can pin the problem down to the KNX integration.

During my investigation, I saw references to 15.15.255 in the logbook of the KNX Interface Individual address, an address I never configured. So I reconfigured the KNX IP interface according to the manual (it’s a Weinzierl 730 disguised as an eelectron IN00A02IPI).
This is my KNX setup:

Device address: 1.1.255 (address within ETS topology)
Connection 1:    1.1.250 (address within local settings)
Connection 2:    1.1.251 (assigned by learn key)
Connection 3:    1.1.252 (assigned by learn key)
Connection 4:    1.1.253 (assigned by learn key)
Connection 5:    1.1.254 (assigned by learn key)

Currently I see some behavior in the logbook that makes sense:

KNX Interface Individual address changed to 1.1.250
12:16:59 - 13 minutes ago
KNX Interface Individual address became unavailable
12:16:26 - 13 minutes ago
KNX Interface Individual address changed to 1.1.252
12:16:24 - 13 minutes ago
KNX Interface Individual address became unavailable
12:16:20 - 13 minutes ago
KNX Interface Individual address changed to 1.1.251
12:16:18 - 14 minutes ago
KNX Interface Individual address became unavailable
12:16:14 - 14 minutes ago
KNX Interface Individual address changed to 1.1.254
12:16:10 - 14 minutes ago
KNX Interface Individual address became unavailable
12:16:10 - 14 minutes ago
KNX Interface Individual address changed to 1.1.253
12:16:09 - 14 minutes ago
KNX Interface Individual address became unavailable
12:15:58 - 14 minutes ago
KNX Interface Individual address changed to 1.1.251
12:15:50 - 14 minutes ago
KNX Interface Individual address changed to 1.1.250
12:15:50 - 14 minutes ago
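
To put a number on that flapping, a small script along these lines could count the events. This is only a sketch: it assumes the logbook text is pasted in exactly the format shown above (an event line followed by a timestamp line), and `summarize` and `LOGBOOK_EXCERPT` are names made up for this example.

```python
import re
from collections import Counter

# First four entries copied from the logbook above.
LOGBOOK_EXCERPT = """\
KNX Interface Individual address changed to 1.1.250
12:16:59 - 13 minutes ago
KNX Interface Individual address became unavailable
12:16:26 - 13 minutes ago
KNX Interface Individual address changed to 1.1.252
12:16:24 - 13 minutes ago
KNX Interface Individual address became unavailable
12:16:20 - 13 minutes ago
"""

# Matches the individual address at the end of a "changed to" line.
ADDRESS_CHANGE = re.compile(r"changed to (\d+\.\d+\.\d+)$")


def summarize(logbook_text):
    """Return (Counter of assigned addresses, number of 'unavailable' events)."""
    addresses = Counter()
    drops = 0
    for line in logbook_text.splitlines():
        line = line.strip()
        match = ADDRESS_CHANGE.search(line)
        if match:
            addresses[match.group(1)] += 1
        elif line.endswith("became unavailable"):
            drops += 1
    return addresses, drops


if __name__ == "__main__":
    addresses, drops = summarize(LOGBOOK_EXCERPT)
    print(f"{drops} drops, addresses seen: {dict(addresses)}")
```

Run against the full list above, this would show how many of the tunnel addresses get cycled through and how often the interface drops within roughly a minute.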

So now I’m really stuck. (I didn’t get to the profiler so far…)

Thanks for looking into this…

farmio · January 11, 2024, 12:01pm

Aha. So you get constant KNX disconnections. Does your IP interface provide any kind of logs?

Other than that, I guess you can observe the communication with Wireshark. Or check / change the cabling between the interface and HA.

But I can’t think of why this would lead to an unresponsive system …

TDehaene · May 4, 2024, 4:55am

@jellevervloessem I’m currently experiencing similar issues.

Some additional symptoms on my end:

I can restart Home Assistant and then the KNX connection will work for some time (typically an hour or two), after which it fails again
When that happens, I can no longer write to, for example, light group addresses from ETS6
If I shut down the Home Assistant container, the ETS6 setting of group addresses works again

So my current suspicion also points to a problem with the KNX/IP gateway.

Were you able to solve your issue in the end?

jellevervloessem · May 4, 2024, 6:40am

Together with @farmio we discovered a malfunctioning IP interface on my Bmax b3 device.
When I set the wifi interface as the main interface, all my issues were gone.
It had something to do with the interface reconnecting too many times. (I can retrieve the details from Discord.)

But it’s not clear to me whether this can be resolved by a Home Assistant OS update or by a manual firmware update that I would need to find…