AppDaemon not receiving events after HA restart

I have the latest HA and Appdaemon versions running in docker and it all works fine, but the issue is that as soon as I restart HA it gets detected by AD - which is also perfectly fine. Afterwards it tells me that it called terminate on all apps and the initialize methods get executed. Only issue is now that AD has no real connection to HA anymore - no events/… is received anymore.

Is there anything I have to do to make AD work correctly after a HA restart?

normally AD should just reconnect to HA when HA is up and running again.
can you show me the logs from the point you restart HA until there is a new connection?

Here is a log of a restart:

seems like a normal restart from a very active AD :wink:
are none of your apps working after restart, or just some?

you say no events. can you be more specific.

All of them. Everything that is not connected seems to work (like scheduling), but events are not getting triggered or calls to get_state return None. As soon as I restart AD everything is back to normal.

when did this start to happen?
did you update HA when it started to happen? or anything else?

i think that this is something that @aimc will ask you some more questions about.

This started with the upgrade from the latest 2.x to 3.x. Only other thing that also changed was that I had 2.0 directly on my machine and with 3 I changed to a docker setup (AD and HA both use host networking).

which version from AD are you running?

Seems a little weird - the terminates should have occurred before the reconnecting messages - can you show me the whole log from before the point where HASS restarted please?

Sure - here is another log (this one also shows the error because of get_state returning None).

https://gist.github.com/TheEggi/dbe6395fd4740a5b7a558d5d670e6625

that shows that the sensor sensor.angela_fahrzeit isnt initialised in HA at the moment that AD is reinitialising.
so probably a race condition.

i suspect from lines like this:

Aug 19 18:48:08 :  2018-08-19 18:48:08.142242 INFO climate_dressing_room: Changed temperature of Heizkörper Ankleidezimmer to 4.5 -> adjust slider Heizkörper Ankleidezimmer

that the app climate_dressing_room does change the value from an entity in HA.
but it doesnt give an error so the entity must exist.

it seems to me that you do all kind of checks in your initialise without errors.
why do you think there is no connection to HA?

and still the question, what version from AD are you running?

Thank you for the analysis - AD version is the latest one published to docker. Guess I will add some code in all my scripts to get some kind of a state value and see which automations are affected. Will post the results when I have them.

dont get me wrong, but in lots of cases where people say they have the latest version, it ends up not being the latest version, thats why i ask again, which version?

you can find out which version by looking at the logs directly after a restart from AD.

it seems like you have a lot going on in AD, so i suspect you also have a lot going on in HA. it can be that some HA components are too slow. but there have been some changes in the last versions from AD that did help against race conditions, thats why i keep asking about the version.

the problem could also be in your mqtt.

but if its just this 1 error you base your question on then i can say that AD is just working as expected.
add a testapp with a listen_state on several devices, set it to priority 1 and let it just log the state, and you know if entities exist and what values they have at the start of the reinitialising.
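
A rough sketch of what such a test app could look like. The names StartupProbe, describe_state and the `entities` app argument are made up for illustration; get_state, listen_state and self.args are the regular app API, and `priority: 1` would go in the app's entry in apps.yaml. The import is guarded only so the helper can be tried without AppDaemon installed.

```python
try:
    import appdaemon.plugins.hass.hassapi as hass
except ImportError:  # allows running the helper outside AppDaemon
    hass = None


def describe_state(entity, state):
    """Format one log line; None means HA has no state for the entity (yet)."""
    if state is None:
        return f"{entity}: MISSING (state is None)"
    return f"{entity}: {state!r}"


if hass is not None:

    class StartupProbe(hass.Hass):
        """Hypothetical diagnostic app: logs what AD sees at (re)initialisation."""

        def initialize(self):
            # 'entities' is an assumed list argument from apps.yaml
            for entity in self.args.get("entities", []):
                self.log(describe_state(entity, self.get_state(entity)))
                self.listen_state(self.on_change, entity)

        def on_change(self, entity, attribute, old, new, kwargs):
            # Fires on every state change, so it shows whether events arrive
            self.log(describe_state(entity, new))
```

If the log shows MISSING right after a HA restart and real values a bit later, the entities were simply not ready yet when the apps were reinitialised.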

Aug 19 18:51:45 : 2018-08-19 18:51:45.261445 INFO AppDaemon Version 3.0.1 starting

That is what I see in the logs. And yes I will probably have to analyze it a bit more and see if I can get any results to improve the situation.

ok, that is indeed the latest version :wink:

Following code seems to have resolved it…

def initialize(self):
    count = 0
    while (not self.check_state()) and count < MAX_RETRY:
        time.sleep(0.3)
        count += 1

def check_state(self):
    state = True
    if not self.get_state('sensor.test'):
        self.error('>> ERROR: SENSOR NOT FOUND!!')
        state = False
    else:
        self.log('>> OK: FOUND!!')

    return state

Interesting thing is that since I added it, it seems to not even go into the “ERROR”-Part … and I already tried it like ten times.

could be that that sensor was slow just once.
but your code is wrong.

in your log you got the error while the state was None.
None would be returned as value in this case also.

also, a sensor could have the state False, and then it would report sensor not found.

a better way to go would be:

def check_state(self):
    state = True
    if not self.entity_exists('sensor.test'):
        self.error('>> ERROR: SENSOR NOT FOUND!!')
        state = False
    else:
        self.log('>> OK: FOUND!!')

    return state

but you would still get the error you got, because it seems that the sensor you used normally gives an INT, but in some cases returns None and you didnt check for that.
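
A minimal sketch of that None check, assuming the sensor normally reports a number as a string. state_to_int is a made-up helper name, not part of AD; it also treats non-numeric states such as 'unknown' or 'unavailable' as "no value".

```python
def state_to_int(raw, default=None):
    """Convert a state value to int, returning default when it is unusable."""
    if raw is None:
        # entity missing or not initialised yet
        return default
    try:
        # go through float so that a state like '4.5' does not raise
        return int(float(raw))
    except (TypeError, ValueError, OverflowError):
        # e.g. 'unknown', 'unavailable', or any other non-numeric state
        return default


# Possible use inside an app (sensor name taken from the log above):
#     minutes = state_to_int(self.get_state('sensor.angela_fahrzeit'))
#     if minutes is None:
#         return  # sensor not ready, skip this run
```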

Thank you for the improvement - will change it.

Hi, I’m also seeing the same issue at the moment for quite some time (~3 weeks?).

Log file:

Version
HA 0.111.3
16.06.2020 07:27:21 INFO AppDaemon: AppDaemon Version 4.0.3 starting
16.06.2020 07:27:21 INFO AppDaemon: Python version is 3.8.2

I’m running HA and AD in docker containers, which are updated via watchtower.
For the last year or so this worked flawlessly. Whenever HA was updated AD reconnected
and sent me a message via pushbullet that one of my sensors timed out (that sensor has been dead for a long time … but that was kind of how I saw that HA was updated :smiley: ).

This stopped about 2-3 weeks ago … instead many automations (AD scripts) stopped working at the same time. I’ve restarted the AD container and all was good. Tonight same behavior. Reading the log made me dig up this very old thread … but it’s the same issue.

I’ll get a lot of errors after the init, looking like this:

pretty much every script goes crazy. about a minute later it all works fine again … well at least all scripts which are still alive and are doing some processing on a timebase will report regular behaviour, meaning that they can read the state of the sensor again …

Is it possible to delay the start of AD once it reconnects without rewriting all scripts? I guess I could add a “try to read state of sun.sun for 5 min in a loop until that works” in each init function … but that’s not very elegant …

any ideas?
Thanks, JKW

HA rearranged its startup.
HA now reports that it is up and running before all integrations are up and running.

so the only way to work with this is delay.
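
One way such a delay could be sketched, so the loop lives in one shared helper instead of in every app. wait_for_entities and its parameters are made-up names for illustration; the only AD call assumed is entity_exists, passed in from the app. Note that it blocks the calling thread while it waits.

```python
import time


def wait_for_entities(entity_exists, entities, timeout=300.0, interval=1.0,
                      sleep=time.sleep, clock=time.monotonic):
    """Poll until every entity exists or the timeout expires.

    entity_exists: callable taking an entity id and returning True/False,
                   e.g. the app's self.entity_exists.
    Returns the list of entities that are still missing (empty on success).
    """
    deadline = clock() + timeout
    missing = [e for e in entities if not entity_exists(e)]
    while missing and clock() < deadline:
        sleep(interval)
        # only re-check the entities that were missing last time
        missing = [e for e in missing if not entity_exists(e)]
    return missing


# Possible use at the top of an app's initialize():
#     missing = wait_for_entities(self.entity_exists, ['sun.sun'])
#     if missing:
#         self.error('still missing after timeout: {}'.format(missing))
```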