What might cause all clients on the built-in Hass.io MQTT broker to disconnect?

daneboom · June 7, 2019, 9:27pm

So after months of the built-in Hass.io MQTT broker working absolutely spot on, today it’s been dropping all clients every 3 hours or so. Any ideas why it might start doing this?!

Thanks!!

nickrout · June 7, 2019, 11:01pm

daneboom · June 8, 2019, 10:43am

Hi Nick

Have amended the title and also am now including a log of some recent connections to the broker.

It’s odd. I’m getting socket errors/timeout exceeded on ALL my MQTT clients all of a sudden.

Swap usage/CPU usage on the Pi doesn’t seem to be excessively high, but just now during a broker outage, when I tried to connect to the front end from my iPhone to check what was going on, the app did show a red “request timed out” message I’ve not seen before.

Just not sure how to debug from here on, or why the problem just started yesterday when it’s been stable for months prior to this.

And when the clients disconnect, they ALL disconnect, not just one.

daneboom · June 8, 2019, 10:55am

Also just found this in my logs from the time of the outage:-

Sat Jun 08 2019 11:03:52 GMT+0100 (British Summer Time)
Error processing webhook 54866698xxxx70f8d475fbedcdxxxx6dbb08813fc970e6e692ef0466xxxba871
Traceback (most recent call last):
  File "/usr/local/lib/python3.7/site-packages/homeassistant/components/webhook/__init__.py", line 83, in async_handle_webhook
    response = await webhook['handler'](hass, webhook_id, request)
  File "/usr/local/lib/python3.7/site-packages/homeassistant/components/mobile_app/webhook.py", line 273, in handle_webhook
    await hass.data[DOMAIN][DATA_STORE].async_save(safe)
  File "/usr/local/lib/python3.7/site-packages/homeassistant/helpers/storage.py", line 119, in async_save
    await self._async_handle_write_data()
  File "/usr/local/lib/python3.7/site-packages/homeassistant/helpers/storage.py", line 183, in _async_handle_write_data
    self._write_data, self.path, data)
concurrent.futures._base.CancelledError

and

Sat Jun 08 2019 11:03:53 GMT+0100 (British Summer Time)
Unhandled exception
Traceback (most recent call last):
  File "/usr/local/lib/python3.7/site-packages/aiohttp/web_protocol.py", line 447, in start
    await resp.prepare(request)
  File "/usr/local/lib/python3.7/site-packages/aiohttp/web_response.py", line 353, in prepare
    return await self._start(request)
  File "/usr/local/lib/python3.7/site-packages/aiohttp/web_response.py", line 667, in _start
    return await super()._start(request)
  File "/usr/local/lib/python3.7/site-packages/aiohttp/web_response.py", line 410, in _start
    await writer.write_headers(status_line, headers)
  File "/usr/local/lib/python3.7/site-packages/aiohttp/http_writer.py", line 112, in write_headers
    self._write(buf)
  File "/usr/local/lib/python3.7/site-packages/aiohttp/http_writer.py", line 67, in _write
    raise ConnectionResetError('Cannot write to closing transport')
ConnectionResetError: Cannot write to closing transport

Not sure if they’re relevant, but it seems they might be?! I’ll check whether the same errors appeared during the other outages yesterday and this morning.

daneboom · June 9, 2019, 7:42am

I’m talking to myself here but that’s fine!

I’m slightly wondering if an ESPurna device is spamming the broker with lots and lots of power usage messages. I’m going to keep the ESPurna device offline for a day or two to see if that fixes the problem. Besides that, I don’t know what else I could possibly try.
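
A rough way to check the spamming theory without unplugging anything would be to count messages per topic over a short window. The Python sketch below covers only the counting part. It assumes something else (an MQTT client callback, for instance) passes in each topic along with a timestamp, and the names `TopicRateCounter`, `record` and `busiest` are made up for illustration rather than taken from any library.

```python
from collections import defaultdict, deque


class TopicRateCounter:
    """Counts messages per topic inside a sliding time window (hypothetical helper)."""

    def __init__(self, window_seconds=60.0):
        self.window_seconds = window_seconds
        self._times = defaultdict(deque)  # topic -> timestamps of recent messages

    def record(self, topic, timestamp):
        """Note one message on `topic` at `timestamp` (seconds, e.g. time.time())."""
        self._times[topic].append(timestamp)

    def busiest(self, now, limit=5):
        """Return up to `limit` (topic, count) pairs, busiest first, for the window ending at `now`."""
        cutoff = now - self.window_seconds
        counts = {}
        for topic, times in self._times.items():
            # Drop timestamps that have fallen out of the window.
            while times and times[0] < cutoff:
                times.popleft()
            if times:
                counts[topic] = len(times)
        # Sort by count (descending), then by topic name so ties are stable.
        return sorted(counts.items(), key=lambda item: (-item[1], item[0]))[:limit]
```

If one topic's count dwarfs the rest just before an outage, that device would be the first suspect. A flat distribution would point back towards the Pi itself.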

nickrout · June 9, 2019, 7:45am

I don’t know what that means, but if you are running a Pi it might be a failing SD card??

daneboom · June 9, 2019, 7:47am

Ah - I wouldn’t be surprised - do you think it might cause all sorts of queer problems such as these? I had to force restart the Pi twice a couple of days ago after a power cut…

nickrout · June 9, 2019, 7:56am

I don’t know, but try running the

dmesg

command and see if you get disk errors. But first, back up all your configs!!
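
If the output is too long to read by eye, something like the sketch below can narrow it down. It is a minimal Python filter, not a diagnostic tool: the marker strings in `SUSPECT_MARKERS` are my own guess at what storage trouble tends to look like, and the function names are hypothetical.

```python
import subprocess

# Assumed markers of storage trouble; adjust for your own kernel messages.
SUSPECT_MARKERS = ("i/o error", "ext4-fs error", "mmc", "read-only")


def suspect_lines(dmesg_text, markers=SUSPECT_MARKERS):
    """Return the lines of `dmesg_text` that contain any marker (case-insensitive)."""
    hits = []
    for line in dmesg_text.splitlines():
        lowered = line.lower()
        if any(marker in lowered for marker in markers):
            hits.append(line)
    return hits


def read_dmesg():
    """Run `dmesg` and return its output as text (may need root on some systems)."""
    result = subprocess.run(["dmesg"], capture_output=True, text=True, check=True)
    return result.stdout
```

Calling `suspect_lines(read_dmesg())` on the Pi gives the shortlist. The `mmc` marker will also match harmless boot messages about the card being detected, so the hits still need reading rather than counting.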

daneboom · June 9, 2019, 11:53am

Interesting…