Thread starvation when trying to update sensor every x minutes

Since the Google Maps Travel Time Component does not have custom update intervals I implemented the functionality with appdaemon. I have several sensors showing the travel time between two locations and update them in a fixed interval using self.run_in().
When the script is running for some minutes it is flodding the log files with thread starvation messages. Can someone check whether I made some basic mistakes in trying to run the script every x minutes?

Script can be found under https://github.com/eifinger/appdaemon-scripts/blob/master/google_travel_time.py and directly here:

import appdaemon.plugins.hass.hassapi as hass
import googlemaps
import datetime
import secrets
import messages

#
# App which calculates travel time between two locations and if wanted notifies the user if the travel time is within a normal amount
#
#
# Args:
#
# entities: Entity state to update
# notify_input_boolean: input_boolean determining whether to notify
# entity.from : Location from where to drive
# entity.to : Location to drive to
# example:
# entities:
#  input_number.travel_from_home_to_work:
#    notify_input_boolean: input_boolean.travel_from_home_to_work
#    from: Mainz
#    to: Wiesbaden
#
# Release Notes
#
# Version 1.1:
#   Add notification feature
#
# Version 1.0:
#   Initial Version

class GoogleTravelTime(hass.Hass):

    def initialize(self):
        self.gmaps = googlemaps.googlemaps.Client(secrets.GOOGLE_MAPS_API_TOKEN)
    
        self.handle = None
        self.max_api_calls = 2500
        self.delay = int(round(3600 * 24 / self.max_api_calls * 2))
        self.log("Delay is: {}".format(self.delay))
        if "entities" in self.args:
            self.delay = int(round(self.delay * len(self.args["entities"])))
            self.log("Found {} entities to update. Setting delay to {}".format(str(len(self.args["entities"])), str(self.delay)))
        else:
            self.log("No entities defined", level = "ERROR")
        self.run_in(self.calculate_travel_times, self.delay)            

    
    def calculate_travel_times(self, *kwargs):
        if "entities" in self.args:
            for entity in self.args["entities"]:
                _from = self.args["entities"][entity]["from"]
                if _from.startswith("secret_"):
                    _from = self.get_secret(_from)
                _to = self.args["entities"][entity]["to"]
                if _to.startswith("secret_"):
                    _to = self.get_secret(_to)
                travelTime = self.get_distance_matrix(_from, _to)
                roundedTravelTime = int(round(travelTime["duration_in_traffic"]["value"] / 60))
                self.log("Updating {} to {} minutes".format(entity, str(roundedTravelTime)))
                self.set_state(entity, state = roundedTravelTime)
                #Notify component
                if roundedTravelTime <= travelTime["duration"]["value"] * 1.2 and self.get_state(self.args["entities"][entity]["notify_input_boolean"]) == "on":
                    message = messages.journey_start().format(_to)
                    self.log("Notify user")
                    self.call_service("notify/slack",message=message)
                    self.turn_off(self.args["entities"][entity]["notify_input_boolean"])
        else:
            self.log("No entities defined", level = "ERROR")
        self.run_in(self.calculate_travel_times, self.delay) 

    def get_distance_matrix(self, origin, destination):
        now = datetime.datetime.now()
        matrix = self.gmaps.distance_matrix(origin,
                                            destination,
                                            mode="driving",
                                            departure_time=now,
                                            language="de",
                                            traffic_model = "best_guess")
        distance = matrix['rows'][0]['elements'][0]['distance']
        duration = matrix['rows'][0]['elements'][0]['duration']
        duration_in_traffic = matrix['rows'][0]['elements'][0]['duration_in_traffic']
        return {"distance": distance, "duration": duration, "duration_in_traffic": duration_in_traffic}

    def get_secret(self, key):
        if key in secrets.secret_dict:
            return secrets.secret_dict[key]
        else:
            self.log("Could not find {} in secret_dict".format(key))

There doesn’t look like anything intrinsically wrong here -

A couple of things to check:

  1. What is delay time being calculated as? If it’s less than the time it takes to do all the processing, call to google etc. you will run out of threads eventually
  2. Are the calls to google hanging?

i think andrew has already pointed to the right direction.
you try to get the maximum allowed api calls and set the delay as short as possible.
then you do this:

  • open thread
  • do a lot of stuff (including api calls) for a lot of sensors
  • after delay open thread
  • do it again

so every time you open a thread you keep it busy for a long time.
but that shouldnt be a problem in itselve, because the next thread is only called after the first is finished.
so it can be that this is the last drop and you already have other apps exhausting your threads.

better practice would be to use more threads and keep them busy for a short time.

in stead of running a loop through a list of entities, open an app for every new entity.
then you use a thread for every entity and only keep it busy for a lot shorter time.
extend your delay to an acceptable level.
instead of trying to get the max, set it to a level that you want (for instance once a minute) and compare that with your max per entity, if its lower use the level, if it higher set it back to the max.

it can be that you need to set your threads in appdaemon to a little higher level. but be carefull with that, it could be that you start hiding underlaying problems.
try logging how long 1 time in the loop takes, see if getting the values from the api call isnt taking up way to much time.

Thank you for your input!

The whole loop for updating 9 sensors is taking between 2-3 seconds and I already get warnings for that.

I calculate the fastest update time possible, depending on number of sensors and to be on the save side double that. So for 9 sensors the update cycle is about every 10 minutes. This shouldn’t be a problem with threads overlapping.

I fully agree with your pointed out best practices @ReneTode, but setting the update delay would be a hassle if I split this up in several apps. Have to give this some thoughts.

I will watch the log files again and see if for some reason after some time the API calls fail or start taking too long. That is the only explanation I can think of right now.

1 other thing i noticed.

you dont have a terminate function.
that results in the following effect:

  • you start AD
  • the app gets initialised and the loop starts running
  • you edit something in the loop
  • you save that and the app gets reinitialised, the loop starts running again. but a run_in() that isnt reset in the initialise will keep running

the effect is that every time you edit the app and save it you start another loop, but you dont stop the other loop. so if you develope a lot, you end up with the loop running several times.

you need to save the handle from your run_in by using it in your function like:

self.handle= self.run_in(self.calculate_travel_times, self.delay)

and then create a function like:

def terminate(self):
  self.cancel_timer(self.handle)

or restart AD after you have been developing in this app.

2 Likes

You are amazing!
That is exactly what was happening.
Thank you!!

2 Likes