Indeed, everything runs purely over MQTT streams. I have Mosquitto running on one of the devices as the broker.
I would expect that the disk IO might be a sticking point - probably worse than a Pi, and a lot of the housekeeping is doing IO.
Is there an easy way to test that? Disk IO in this case will be RAM IO, since I am running without an SD card in the phones.
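One rough way to test this is to time a write and a read of a temporary file from Python on the device itself. This is only a sketch: the function name `time_write_read` and the sizes are made up for illustration, and the read will most likely be served from the OS cache, so treat the numbers as a relative comparison between devices rather than a real benchmark.

```python
import os
import tempfile
import time


def time_write_read(directory=None, size_mb=16, block_kb=64):
    """Write and read back a temporary file, returning (write_s, read_s)."""
    block = os.urandom(block_kb * 1024)
    blocks = (size_mb * 1024) // block_kb

    with tempfile.NamedTemporaryFile(dir=directory, delete=False) as f:
        path = f.name
        start = time.perf_counter()
        for _ in range(blocks):
            f.write(block)
        f.flush()
        os.fsync(f.fileno())  # push the data out of Python and OS buffers
        write_s = time.perf_counter() - start

    try:
        start = time.perf_counter()
        with open(path, "rb") as f:
            while f.read(block_kb * 1024):
                pass
        read_s = time.perf_counter() - start
    finally:
        os.remove(path)  # always clean up the temporary file
    return write_s, read_s


if __name__ == "__main__":
    w, r = time_write_read(size_mb=16)
    print(f"write: {16 / w:.1f} MB/s, read: {16 / r:.1f} MB/s")
```

Pointing `directory` at the location where the logs and config live should give numbers for the storage that the housekeeping actually touches.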
Currently running for 2 days without any issues, skew threshold at 15 seconds. No loss of any of the apps thus far.
I wonder if it would be possible to add a watchdog-like function inside appdaemon: if no activity was seen for X seconds, make appdaemon exit with an error or restart. I would prefer exit with an error, since the user can then catch those and act accordingly. Restart can be done automatically, for instance by using supervisord.
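To make the idea concrete, here is a minimal sketch of such a watchdog in plain Python. It is not part of appdaemon; the class name `ActivityWatchdog` and its `kick()` method are hypothetical. The main loop would call `kick()` whenever it does something, and a background thread exits the process with a non-zero status if no kick arrives within the timeout.

```python
import os
import sys
import threading
import time


class ActivityWatchdog:
    """Call on_timeout if kick() is not called for `timeout` seconds."""

    def __init__(self, timeout, on_timeout=None, poll=0.5):
        self.timeout = timeout
        self.poll = poll
        # Default action: exit the whole process with a non-zero status so a
        # supervisor (for example supervisord) can notice and restart it.
        self.on_timeout = on_timeout or self._exit_with_error
        self._last = time.monotonic()
        self._lock = threading.Lock()
        self._stop = threading.Event()
        self._thread = threading.Thread(target=self._run, daemon=True)

    @staticmethod
    def _exit_with_error():
        sys.stderr.write("watchdog: no activity seen, exiting\n")
        os._exit(1)  # sys.exit() would only end the watchdog thread

    def start(self):
        self._thread.start()

    def stop(self):
        self._stop.set()

    def kick(self):
        """Record activity; call this from the main loop."""
        with self._lock:
            self._last = time.monotonic()

    def _run(self):
        # wait() returns True once stop() has been called, ending the loop.
        while not self._stop.wait(self.poll):
            with self._lock:
                idle = time.monotonic() - self._last
            if idle > self.timeout:
                self.on_timeout()
                return
```

Exiting rather than restarting in-process keeps the policy outside the program, which matches the preference above: the supervisor decides what to do with the error.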
I see you have added the additional info using a warning system, I see for instance:
WARNING Excessive time spent in scheduler loop: 1253.0ms
It doesn't happen very often, it seems, but I wonder what exactly I can learn from this.
This basically means you are trying to do more than the hardware can handle - the scheduler loop executes once every second - if that execution takes more than a second things are going to start going wrong.
I am going to make another change in this area that will give you some additional tuning options that might help: checking for app and config changes less often than once a second, which might take some of the load off. After that, if you still get that warning, it means that callbacks will inevitably be delayed due to a lack of processing power. This is also an explanation for the clock skew - I’ll make that tuneable too.
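As an illustration of why an overrun matters, the sketch below runs a piece of work once per interval and reports any tick that takes longer than the interval. It is not appdaemon's actual scheduler code; `run_ticks` and its parameters are hypothetical names chosen for this example.

```python
import time


def run_ticks(work, ticks, interval=1.0, warn=print):
    """Run work() once per interval and report ticks that overrun it.

    Returns the list of measured durations in milliseconds.
    """
    durations = []
    next_tick = time.monotonic()
    for _ in range(ticks):
        start = time.monotonic()
        work()
        elapsed = time.monotonic() - start
        durations.append(elapsed * 1000.0)
        if elapsed > interval:
            warn(f"Excessive time spent in loop: {elapsed * 1000.0:.1f}ms")
        # Schedule against the ideal timeline. If work() overran, the sleep
        # is skipped, so the next tick starts later than it should.
        next_tick += interval
        time.sleep(max(0.0, next_tick - time.monotonic()))
    return durations
```

With a one second interval, any tick that takes 1253 ms has already eaten into the next one, so everything scheduled for that next second runs late.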
Thanks again, both for implementing this and for the quick reaction.
Some of my apps make calls to external services, and I have the feeling this plays a role. When I connect to an MQTT broker, for instance, this seems to happen sometimes.
I will follow my logs carefully and see if I can deduce a bit more.
Anything in an app uses a separate pool of threads so shouldn’t impact the timer loop …
OK, that means I can't debug what exactly causes it. It's clearly not a regular event: I see it happen, but with different intervals between warnings and also different amounts of “spent in scheduler loop”. Most of the time it's just around 1000ms, but I have seen 5000 and 6000ms also.
It could be literally anything - CPU and I/O are a scarcer resource for you than for most, which means some background task in the OS could be running and causing this perhaps.
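To get a feel for how often and how badly the loop overruns, the warnings can be pulled out of the log and summarised. The sketch below only relies on the message text quoted earlier in the thread; the function names are made up, and anything else about the log format (timestamps, file location) is deliberately not assumed.

```python
import re
import statistics

# Matches the warning text quoted above and captures the duration in ms.
PATTERN = re.compile(
    r"Excessive time spent in scheduler loop: (\d+(?:\.\d+)?)ms"
)


def loop_overruns(lines):
    """Return the reported loop durations (ms) found in the given log lines."""
    durations = []
    for line in lines:
        match = PATTERN.search(line)
        if match:
            durations.append(float(match.group(1)))
    return durations


def summarize(durations):
    """Return count, median and maximum, or None if there were no warnings."""
    if not durations:
        return None
    return {
        "count": len(durations),
        "median_ms": statistics.median(durations),
        "max_ms": max(durations),
    }
```

Passing an open log file to `loop_overruns` works, since a file object iterates over its lines. Comparing the summary before and after a tuning change would show whether the change helped.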
Can you pause and resume a different app from within another app?
What do you mean by “pause and resume”?
stop and start? Pause/resume is a concept from WebCoRE.
I don’t know of any way to do that directly, but you could set something in one app that is used as a constraint in another app, that would prevent the second app running anything.
Yeah I was just thinking about constraints. That would work for me.
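The constraint idea can be sketched in plain Python as follows. This deliberately does not use appdaemon's own constraint mechanism; `SharedFlags` and `constrained` are hypothetical stand-ins that only show the pattern of one app setting a value that gates the callbacks of another.

```python
class SharedFlags:
    """Hypothetical stand-in for state that both apps can read and write."""

    def __init__(self):
        self._flags = {}

    def set(self, name, value):
        self._flags[name] = bool(value)

    def get(self, name, default=True):
        # Flags that were never set count as enabled.
        return self._flags.get(name, default)


def constrained(flags, name):
    """Decorator: run the callback only while the named flag is true."""
    def wrap(callback):
        def guarded(*args, **kwargs):
            if not flags.get(name):
                return None  # "paused": skip the callback
            return callback(*args, **kwargs)
        return guarded
    return wrap


flags = SharedFlags()
handled = []


@constrained(flags, "second_app_enabled")
def on_event(payload):
    """Callback of the second app; runs only while it is not paused."""
    handled.append(payload)
    return payload


on_event("first")                         # runs, flag defaults to enabled
flags.set("second_app_enabled", False)    # the first app "pauses" the second
on_event("second")                        # skipped
flags.set("second_app_enabled", True)     # "resume"
on_event("third")                         # runs again
```

The second app stays loaded the whole time; it just ignores events while the flag is off, which is close to pause and resume without stopping anything.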