Thread Starvation

I am running into the follow:

2018-09-17 22:11:29.299787 WARNING AppDaemon: Queue size is 10, suspect thread starvation
2018-09-17 22:11:29.300088 INFO AppDaemon: --------------------------------------------------
2018-09-17 22:11:29.300342 INFO AppDaemon: Threads
2018-09-17 22:11:29.300547 INFO AppDaemon: --------------------------------------------------
2018-09-17 22:11:29.300759 INFO AppDaemon: Currently busy threads: 0
2018-09-17 22:11:29.300967 INFO AppDaemon: Most used threads: 3 at 2018-09-17 19:53:45
2018-09-17 22:11:29.301185 INFO AppDaemon: Last activity: 2018-09-17 22:10:11.525268
2018-09-17 22:11:29.301362 INFO AppDaemon: --------------------------------------------------
2018-09-17 22:11:29.302687 INFO AppDaemon: thread-1 - current callback: idle since 2018-09-17 20:03:52, alive: False
2018-09-17 22:11:29.302974 INFO AppDaemon: thread-2 - current callback: idle since 1969-12-31 17:00:00, alive: False
2018-09-17 22:11:29.303194 INFO AppDaemon: thread-3 - current callback: idle since 1969-12-31 17:00:00, alive: False
2018-09-17 22:11:29.303395 INFO AppDaemon: thread-4 - current callback: idle since 1969-12-31 17:00:00, alive: False
2018-09-17 22:11:29.303647 INFO AppDaemon: thread-5 - current callback: idle since 2018-09-17 17:13:11, alive: False
2018-09-17 22:11:29.303862 INFO AppDaemon: thread-6 - current callback: idle since 2018-09-17 17:11:28, alive: False
2018-09-17 22:11:29.304080 INFO AppDaemon: thread-7 - current callback: idle since 2018-09-17 18:41:19, alive: False
2018-09-17 22:11:29.304296 INFO AppDaemon: thread-8 - current callback: idle since 2018-09-17 20:04:55, alive: False
2018-09-17 22:11:29.304511 INFO AppDaemon: thread-9 - current callback: idle since 2018-09-17 22:10:11.525268, alive: False
2018-09-17 22:11:29.304722 INFO AppDaemon: thread-10 - current callback: idle since 2018-09-17 18:42:39, alive: False
2018-09-17 22:11:29.304918 INFO AppDaemon: thread-11 - current callback: idle since 2018-09-17 17:57:05, alive: False
2018-09-17 22:11:29.305163 INFO AppDaemon: thread-12 - current callback: idle since 2018-09-17 20:04:38, alive: False
2018-09-17 22:11:29.305375 INFO AppDaemon: thread-13 - current callback: idle since 2018-09-17 16:56:10, alive: False
2018-09-17 22:11:29.305566 INFO AppDaemon: thread-14 - current callback: idle since 2018-09-17 18:41:57, alive: False
2018-09-17 22:11:29.305815 INFO AppDaemon: thread-15 - current callback: idle since 2018-09-17 20:03:37, alive: False

I can’t seem to determine what is causing it. The busy threads is below the 10 I have established in my appdaemon.yaml file so I’m assuming the worker process is being killed some how.

Is there a good way to run down what is the cause?

I’m going to ping @ReneTode and @aimc since I’m certain they’ll be able to help!

the only thing i can think of is that you somehow have a bad app.
just so i can get a picture.

how many apps do you have?
what do they do?
can you show 1 or more from them, so i can look if you have some basic error?
is there a loop, sleep or wait in 1 of your apps?
is there anything else in your errorlog?
how does your appdaemon.yaml look it this moment?

Thanks, @ReneTode

  • About 16 apps
  • anything from light scheduling, to entry notification, to lights on if motion
  • Here is one of my newer apps:
import appdaemon.plugins.hass.hassapi as hass

class arrive_depart(hass.Hass):
    def initialize(self):
        self.utils = self.get_app('utils')
        self.notify = self.get_app('Notify') 
        self.arrive_handle = None
        self.depart_handle = None

        ## App Args
        self.devices = self.args["devices"]
        self.depart_delay = self.args["depart_delay"]
        self.arrive_delay = self.args["arrive_delay"]
        self.topic_path = self.args["topic_path"]

        self.__ARRIVE__ = "{}/scan/arrive".format(self.topic_path)
        self.__DEPART__ = "{}/scan/depart".format(self.topic_path)

        # self.arrive_handle = self.run_in(self.occupancy_scan, delay, topic = self.__ARRIVE__)
        # self.depart_handle = self.run_in(self.occupancy_scan, delay, topic = self.__DEPART__)

        ### Arrive/Depart Listener(s)
        entities_to_monitor = []
        if "arrive_depart_triggers" in self.args:
            for entity in self.args["arrive_depart_triggers"]:
                ## If Group - initialize all entities from the group
                if "group" in entity:
                    groupitem = self.get_state(entity, attribute="all")
                    entities_to_monitor.extend(groupitem['attributes']['entity_id'])
                else:
                    entities_to_monitor.extend([entity])
        else:
            # This will monitory ALL of the entities in your house
            self.log("No arrive/depart provided in cfg, not doing anything.")

        for entity in entities_to_monitor:
            self.listen_state(self.input_handler, entity)

    def input_handler(self, entity, attribute, old, new, kwargs):
        self.notify.log_entry(calling_app=self.name, message="MQTT Callback Received - {}".format(new), log_level="INFO")

        someone_home = False
        someone_away = False

        for device in self.devices:
            ## If Group - initialize all entities from the group
            if self.get_state(device) == "home":
                someone_home = True
            else:
                someone_away = True

        if not someone_home:
            ##If no one is home and the door/garage is triggered, it will only scan for arrive
            if not self.arrive_handle:         
                self.notify.log_entry(calling_app=self.name, message="Everyone away - Arrive Handle - Delay: {}".format(self.arrive_delay), log_level="INFO")
                self.arrive_handle = self.run_in(self.occupancy_scan, self.arrive_delay, topic = self.__ARRIVE__)

        elif not someone_away:
            ##If everyone is home and the door/garage is triggered, it will only scan for depart            
            if not self.depart_handle:
                self.notify.log_entry(calling_app=self.name, message="Everyone home - Depart Handle - Delay: {}".format(self.depart_delay), log_level="INFO")
                self.depart_handle = self.run_in(self.occupancy_scan, self.depart_delay, topic = self.__DEPART__)

        else:
            ##If some in/out and the door is triggered, it first does an arrive scan, then does a depart (as it wouldn’t know if someone going out or coming in)          
            if not self.arrive_handle:         
                self.notify.log_entry(calling_app=self.name, message="Home and Away - Arrive Handle - Delay: {}".format(self.arrive_delay), log_level="INFO")
                self.arrive_handle = self.run_in(self.occupancy_scan, self.arrive_delay, topic = self.__ARRIVE__)
            
            if not self.depart_handle:
                self.notify.log_entry(calling_app=self.name, message="Home and Away - Depart Handle - Delay: {}".format(self.depart_delay), log_level="INFO")
                self.depart_handle = self.run_in(self.occupancy_scan, self.depart_delay, topic = self.__DEPART__)

    def occupancy_scan(self, kwargs):
        topic = kwargs['topic']

        self.notify.log_entry(calling_app=self.name, message="Starting Occupancy Scan - {}".format(topic), log_level="INFO")
        self.call_service("mqtt/publish", topic=topic, payload="")  

        ### DEGUB
        self.log("Arrive: {}".format(self.arrive_handle))
        self.log("Depart: {}".format(self.depart_handle))

    def terminate(self):
        self.cancel_timer(self.arrive_handle)          
        self.cancel_timer(self.depart_handle)  

        ### DEGUB
        self.log("Arrive: {}".format(self.arrive_handle))
        self.log("Depart: {}".format(self.depart_handle))
                        

.yaml

arrive_depart:
  module: arrive_depart
  class: arrive_depart
  plugin: HASS
  dependencies:
    - Notify
  devices:
    - device_tracker.mqtt_kyler_location
    - device_tracker.mqtt_haley_location
  topic_path: location
  depart_delay: 10
  arrive_delay: 10
  arrive_depart_triggers:
    # - group.binary_arrive_depart_entries
    - cover.double_garage_door
    - binary_sensor.front_door_sensor_contact
    - binary_sensor.mudroom_door_sensor_contact
    # - binary_sensor.back_door_sensor_contact
    # - binary_sensor.garage_side_door_sensor_contact
    # - binary_sensor.basement_hall_multisensor_motion
    - binary_sensor.office_window_sensor_contact
  • Nothing else in the error log that seems to tie to the thread starvation.
  • appdaemon.yaml
secrets: /conf/secrets.yaml
log:
  accessfile: '/conf/logs/access.log'
  errorfile: '/conf/logs/error.log'
  logfile: '/conf/logs/appdaemon.log'
  log_generations: 10
  log_size: 100000
  
appdaemon:
  threads: 15
  api_port: 8124
  production_mode: false
  api_key: !secret api_key
  # utility_delay: 5      # seconds to run utility loop
  # utility_skew: 4       # warning for excessive time, default: 90% of utility_delay
  exclude_dirs:
    - apps_WIP
    - settingsfiles
  plugins:
    HASS:
      type: hass
      ha_key: !secret ha_key
      ha_url: !secret ha_url


For reference, here is the Notify app:

import appdaemon.plugins.hass.hassapi as hass
#
# Centralizes messaging. Among other things, it will determine whether a user is at home and if yes in which room. 
#
# Release Notes
#
### Initialize
#     def initialize(self):   
#        self.notify = self.get_app('Notify')
#
#### apps.yaml
#
#   dependencies:
#     - Notify
#
# USAGE: 
######### self.notify.alert(calling_app=self.name, message=msg, log=True)
#
#

# Version 1.0:
#   Initial Version


class notify(hass.Hass):

    def initialize(self):
        self.__ALL__ = "all"


    def alert(self, calling_app, message, title="", notify_name="notify", target=None, data=None,
            text_message=True, log=False, log_level="INFO", broadcast=False, 
            persistent=False):
       
        if text_message:
            self.text_message(calling_app, message, title, notify_name, target)
        
        if log:
            self.log_entry(calling_app, message, log_level)

        if broadcast:
            self.broadcast(calling_app, message)

        if persistent:
            self.persistent(calling_app, message, title)


    def text_message(self, calling_app, message, title="", notify_name="notify", target=None):
        self.call_service("notify/{}".format(notify_name), message=message, title=title, target=target)
    
    def broadcast(self, calling_app, message):
        self.call_service("notify/googleassistant", message=message)

    def persistent(self, calling_app, message, title=""):
        self.call_service('persistent_notification/create', title=title, message=message, notification_id=calling_app)

    def log_entry(self, calling_app, message, log_level="INFO"):
        message = "{} -- {}".format(calling_app, message)
        self.log(message, log_level)
    
    def terminate(self):
        pass

What’s interesting, is that when I make a change and Terminate is called, I am getting a value for the handle.

2018-09-18 17:15:55.476356 INFO AppDaemon: App 'arrive_depart' changed
2018-09-18 17:15:55.490655 INFO AppDaemon: Terminating arrive_depart
2018-09-18 17:15:55.491108 INFO AppDaemon: Calling terminate() for arrive_depart
2018-09-18 17:15:55.493094 INFO arrive_depart: Arrive: None
2018-09-18 17:15:55.494901 INFO arrive_depart: Depart: a7ceb540-8830-42e6-84a9-409ed986a070
2018-09-18 17:15:55.495641 INFO AppDaemon: Initializing app arrive_depart using class arrive_depart from module arrive_depart

first thing i noticed, you call 2 apps with get_app (utils and notify) but you only did set dependency for notify

the log you give shows there is a problem.
after cancel_timer there is still an active handler.
so that keeps a thread busy.
when you are developing that app and dont restart AD, that thread is kept busy
every time you make a change that happens, so slowly you occupy threads.

the good thing: restarting AD would get rid of your problem.

then the bad part:

in your callback you call a run_in but you never check if its already started.
so the handle can be set after that its already has been set. which can cause your trouble.

i would suggest that you replace every line like:

                self.arrive_handle = self.run_in(self.occupancy_scan, self.arrive_delay, topic = self.__ARRIVE__)

with

         if self.arrive_handle != None:
                self.arrive_handle = self.run_in(self.occupancy_scan, self.arrive_delay, topic = self.__ARRIVE__)
         else:
                self.cancel_timer(self.arrive_handle)
                self.arrive_handle = self.run_in(self.occupancy_scan, self.arrive_delay, topic = self.__ARRIVE__)

and in the init: self.arrive_handle = None

Cancelling the timer doesn’t remove or change the variable you have stored. To do that you must delete the variable, or set it None. But I don’t think that is relevant to thread starvation, as the timer has been cancelled.

hmm, cancel_timer should just do that.
cancel the timer so the handle would be set to None.

but its not needed to cancel the timer in the terminate, because it gets cancelled anyway, unless there is a problem created before (like creating 1 var for several handlers)

AD cancels the timer by removing it from whatever scheduling algorithm it uses.

It cannot change the variable that is passed in cancel_timer() because of Python’s method of passing parameters.

hmm yeah i did say that wrong.
in the code structure i gave the timer will be reset untill its back to None because the timer did run till the end.

and when thats not done and you have a devicetracker that gives several statechanges in a short time, then you get a lot of run_ins running at the same time.