concurrent.futures._base.TimeoutError

I find myself writing a lot of code like this, because some (mostly external) service in hass is taking a lot of time… Where do we need to fix things? The integration is maybe the most logical place, but that involves tons of developers adding retry code…

Would it also be possible to have this (more readily available) in AppDaemon? Maybe by giving an int for the number of retries?

    MAX_RETRIES = 10

    for attempt in range(MAX_RETRIES):
        if self.is_alive:
            try:
                self.do.a.hass.service.call  # placeholder for the actual service call
            except TimeoutError:
                self.log("WARNING: Got a very unexpected error")
                continue
            else:
                break
        else:
            time.sleep(4)
            self.log(f"bla bla after {attempt} retries")
            time.sleep(4)
    else:
        self.log("All hass failed")
        return None
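Until something like that exists in AppDaemon itself, the boilerplate above could be factored into a decorator. A minimal sketch, assuming a synchronous app method; the names `retry`, `retries`, and `delay` are my own, not an AppDaemon API:

```python
import time
from functools import wraps

def retry(retries=3, delay=4, exceptions=(TimeoutError,)):
    """Retry a function on the given exceptions, sleeping between attempts."""
    def decorator(func):
        @wraps(func)
        def wrapper(*args, **kwargs):
            for attempt in range(retries):
                try:
                    return func(*args, **kwargs)
                except exceptions:
                    if attempt == retries - 1:
                        raise  # out of retries, re-raise the last error
                    time.sleep(delay)
        return wrapper
    return decorator
```

An app method could then be wrapped with `@retry(retries=10)` instead of repeating the loop everywhere.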

I don’t see any issues like this in my AppDaemon to HomeAssistant connection.

To get to the bottom of it, you’re going to want to set HomeAssistant logging to Debug for everything. Then reproduce the error. If, with the extra logging, you’re unable to find the issue, please post your AppDaemon logs, HomeAssistant logs at the time of the AppDaemon error, as well as your AppDaemon App code.

It’s not a problem between Hass and AD; they both run in Docker, on the same host (and network). The issue is with underlying integrations like the tahoma integration. Every now and then it apparently takes a long time to query the API, which results in a time-out. The issue is: if an integration takes too long to answer, AD is going to throw a TimeoutError exception.

This is an example where the pushover service takes too long:

2020-04-16 07:01:10.148756 WARNING covers_tod_weekdays: Unexpected error in worker for App covers_tod_weekdays:
2020-04-16 07:01:10.149881 WARNING covers_tod_weekdays: Worker Ags: {'id': '198a8ecd83834fa1b89003bf4ee8138e', 'name': 'covers_tod_weekdays', 'objectid': '73d5074e69ed48e193f13642e09afa00', 'type': 'scheduler', 'function': <bound method CoversTod.move_covers_at_set_times of <coverstod.CoversTod object at 0x7fdb7e5f4fa0>>, 'pin_app': True, 'pin_thread': 33, 'kwargs': {'interval': 86400, 'movement': 'open_cover', '__thread_id': 'thread-33'}}
2020-04-16 07:01:10.150887 WARNING covers_tod_weekdays: ------------------------------------------------------------
2020-04-16 07:01:10.152585 WARNING covers_tod_weekdays: Traceback (most recent call last):
  File "/usr/local/lib/python3.8/site-packages/appdaemon/threading.py", line 766, in worker
    funcref(self.AD.sched.sanitize_timer_kwargs(app, args["kwargs"]))
  File "/conf/apps/home/coverstod.py", line 88, in move_covers_at_set_times
    self.push_msg(f"{_movement} on set times")
  File "/conf/apps/home/coverstod.py", line 62, in push_msg
    self.notify(_msg, title="Master Bedroom Cover", name="pushover")
  File "/usr/local/lib/python3.8/site-packages/appdaemon/plugins/hass/hassapi.py", line 29, in func_wrapper
    return func(*args, **kwargs)
  File "/usr/local/lib/python3.8/site-packages/appdaemon/utils.py", line 191, in inner_sync_wrapper
    f = run_coroutine_threadsafe(self, coro(self, *args, **kwargs))
  File "/usr/local/lib/python3.8/site-packages/appdaemon/utils.py", line 285, in run_coroutine_threadsafe
    result = future.result(self.AD.internal_function_timeout)
  File "/usr/local/lib/python3.8/concurrent/futures/_base.py", line 441, in result
    raise TimeoutError()
concurrent.futures._base.TimeoutError

And another example:

HA log:

2020-05-23 15:49:40 ERROR (SyncWorker_6) [xiaomi_gateway] Cannot connect to Gateway
2020-05-23 15:49:40 ERROR (SyncWorker_6) [xiaomi_gateway] No data in response from hub None

AppDaemon log:

2020-05-23 15:49:40.796409 WARNING alarm_system: Worker Ags: {'id': 'fd63f26b6f57404eaa686e42af476c89', 'name': 'alarm_system', 'objectid': '3e4874f7e6024c16866fbb4df50ded50', 'type': 'state', 'function': <bound method AlarmSystem.trigger_door_chime_callback of <alarm.AlarmSystem object at 0x7fed3c256100>>, 'attribute': 'state', 'entity': 'binary_sensor.hall_front_door_180', 'new_state': 'on', 'old_state': 'off', 'pin_app': True, 'pin_thread': 29, 'kwargs': {'new': 'on', 'old': 'off', '__thread_id': 'thread-29'}}
2020-05-23 15:49:40.797944 WARNING alarm_system: ------------------------------------------------------------
2020-05-23 15:49:40.800059 WARNING alarm_system: Traceback (most recent call last):
  File "/usr/local/lib/python3.8/site-packages/appdaemon/threading.py", line 777, in worker
    funcref(entity, attr, old_state, new_state,
  File "/conf/apps/alarm_systeem/alarm.py", line 510, in trigger_door_chime_callback
    self.play_xiaomi_ringtone(self.xiaomi_chime_ringtone_id)
  File "/conf/apps/alarm_systeem/alarm.py", line 305, in play_xiaomi_ringtone
    self.call_service("xiaomi_aqara/play_ringtone",
  File "/usr/local/lib/python3.8/site-packages/appdaemon/utils.py", line 191, in inner_sync_wrapper
    f = run_coroutine_threadsafe(self, coro(self, *args, **kwargs))
  File "/usr/local/lib/python3.8/site-packages/appdaemon/utils.py", line 285, in run_coroutine_threadsafe
    result = future.result(self.AD.internal_function_timeout)
  File "/usr/local/lib/python3.8/concurrent/futures/_base.py", line 441, in result
    raise TimeoutError()
concurrent.futures._base.TimeoutError

So my point is… The integration (in HA) is clearly having a hard time; apparently this is somewhat of a known issue with Xiaomi. What I would like is for AppDaemon to retry. Or I would like to catch the timeout and retry myself. Somehow a try/except does not work? Or maybe I’m missing something?
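One likely reason the try/except "does not work": before Python 3.11, `concurrent.futures.TimeoutError` (the class in the tracebacks above) was its own plain `Exception` subclass, not the builtin `TimeoutError`, so `except TimeoutError:` never matched it. A small sketch of a catch that works on any version; `send_with_retry` and its parameters are illustrative names, not AppDaemon API:

```python
import concurrent.futures

def send_with_retry(send, retries=3):
    """Retry a callable that may raise either flavour of TimeoutError."""
    for _ in range(retries):
        try:
            return send()
        # On Python < 3.11 these are two different classes; on 3.11+ the
        # second is an alias of the first, so listing both is always safe.
        except (TimeoutError, concurrent.futures.TimeoutError):
            continue  # timed out, try again
    return None
```

So catching `concurrent.futures.TimeoutError` explicitly (or both names) should make the retry loop from earlier in the thread actually trigger.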

Hi!

I have similar problems, but with Sonos. I have many speakers, and it sometimes times out during a call to the snapshot or group_all service, which results in missed audio notifications.

Is it possible to increase the time-out?
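The tracebacks above call `future.result(self.AD.internal_function_timeout)`, which suggests the limit is AppDaemon's internal function timeout. If your AppDaemon version exposes that as a configuration key (the key name below is inferred from the traceback, so please verify it against the AppDaemon configuration docs before relying on it), it might be raised in appdaemon.yaml roughly like this:

```yaml
appdaemon:
  # assumed key, taken from self.AD.internal_function_timeout in the traceback;
  # value in seconds
  internal_function_timeout: 30
```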

It also seems that the Sonos time-out sometimes causes side-effects on other integrations, which also start timing out when called from an AppDaemon app. Perhaps there is a queue or something that is full/too busy (I don’t know how the AppDaemon hass integration is implemented, but I assume here it’s some form of queue, directly or indirectly, e.g. a socket I/O queue)?

Here’s an example of a side-effect when Sonos has just timed out, which causes a hue light to also time out afterwards (something I haven’t seen it do on its own):

2020-08-06 08:00:10.445310 WARNING bedroom_2_light_logic1: Traceback (most recent call last):
  File "/usr/lib/python3.8/site-packages/appdaemon/threading.py", line 777, in worker
    funcref(entity, attr, old_state, new_state,
  File "/config/appdaemon/apps/logic/light.py", line 123, in _state_changed
    self._table._suggest_evaluate(self)
  File "/config/appdaemon/apps/logic/light.py", line 52, in _suggest_evaluate
    self.reevaluate()
  File "/config/appdaemon/apps/logic/light.py", line 62, in reevaluate
    self._callback(result, old)
  File "/config/appdaemon/apps/logic/light.py", line 270, in _condition_changed
    self._apply()
  File "/config/appdaemon/apps/logic/light.py", line 277, in _apply
    self._light_apply(light)
  File "/config/appdaemon/apps/logic/light.py", line 306, in _light_apply
    self.turn_on(light, brightness=self._target_brightness_calculated)
  File "/usr/lib/python3.8/site-packages/appdaemon/plugins/hass/hassapi.py", line 29, in func_wrapper
    return func(*args, **kwargs)
  File "/usr/lib/python3.8/site-packages/appdaemon/utils.py", line 191, in inner_sync_wrapper
    f = run_coroutine_threadsafe(self, coro(self, *args, **kwargs))
  File "/usr/lib/python3.8/site-packages/appdaemon/utils.py", line 285, in run_coroutine_threadsafe
    result = future.result(self.AD.internal_function_timeout)
  File "/usr/lib/python3.8/concurrent/futures/_base.py", line 441, in result
    raise TimeoutError()
concurrent.futures._base.TimeoutError

I took a quick look at the code and found:

async def call_service(self, namespace, domain, service, data):
    self.logger.debug(
        "call_service: namespace=%s domain=%s service=%s data=%s", namespace, domain, service, data,
    )
    with self.services_lock:
        ....
        if isasync is True:
            # it's a coroutine, just await it
            return await funcref(ns, domain, service, data)
        else:
            # it's not a coroutine, run it in an executor
            return await utils.run_in_executor(self, funcref, ns, domain, service, data)

I am mainly a .NET developer and new to Python, but as I read this code, AppDaemon holds a lock while awaiting/calling the service? If so (if locks in Python work with await the way they do in .NET), that means an app can only have one call_service operation running at a time, i.e. no parallel call_service from an app?

Also is it really best practice to mix async with locks?
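If the reading above is right, the effect is easy to demonstrate in isolation. A minimal sketch (using `asyncio.Lock` as a stand-in; the actual `services_lock` in the snippet looks like a threading lock, but the serializing effect of holding any lock across an `await` is the same): three 0.1-second "service calls" run back to back instead of concurrently.

```python
import asyncio
import time

async def guarded_call(lock, delay):
    # Only one coroutine can hold the lock, so the awaits inside
    # are serialized even though the calls were started concurrently.
    async with lock:
        await asyncio.sleep(delay)

async def main():
    lock = asyncio.Lock()
    t0 = time.perf_counter()
    await asyncio.gather(*(guarded_call(lock, 0.1) for _ in range(3)))
    return time.perf_counter() - t0

elapsed = asyncio.run(main())
print(f"{elapsed:.2f}s")  # roughly 0.3s with the lock, vs roughly 0.1s without it
```

So if one service call (say, Sonos) stalls while holding the lock, every later call queues behind it, which would match the "side-effect" timeouts described above.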


I know this is a pretty old thread, but there’s been no resolution. I still get these timeouts in AD, revolving mostly around zigbee bulbs, but always zigbee. I’m not sure of the overall side-effects when the timeout happens.

Even in .NET we sometimes have to use locks, for example when collections do not support concurrency, or manipulation of an object could possibly result in a race condition.

I have since moved to pyscript and am very happy with it.


Good to know, I will be checking it out; it’s intriguing so far.

I’m still getting this error. Are there any plans to handle this in AppDaemon any time soon? Exceptions are expensive and annoying, and this has zero effect on the operation of the entity. It works as normal, so this seems to be AppDaemon’s issue.