logger:
  default: info
  logs:
    homeassistant.components.systemd: debug
I’m seeing an issue where the daemon is restarted by systemd once I restart it via the HA service call, so basically it restarts twice. The message I get from the OnFailure= directive is:
May 27 09:18:39 ha systemd[1]: hass.service: Main process exited, code=killed, status=9/KILL
May 27 09:18:39 ha systemd[1]: hass.service: Unit entered failed state.
May 27 09:18:39 ha systemd[1]: hass.service: Triggering OnFailure= dependencies.
May 27 09:18:39 ha systemd[1]: hass.service: Failed with result 'timeout'.
I tried increasing the watchdog timeout to 7 minutes, but my guess is the issue isn’t there. If I restart via systemd, it doesn’t trigger twice.
Some testing
If I restart via HA and then check the watchdog pat timestamp:
systemctl show --property=WatchdogTimestamp hass.service
WatchdogTimestamp=Mon 2019-05-27 09:50:43 AEST
Then the Telegram message arrives when it triggers the restart, including the systemd status output:
Active: activating (auto-restart) (Result: timeout) since Mon 2019-05-27 09:51:59 AEST; 17ms ago
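The systemctl show check above prints one KEY=VALUE pair per line, so comparing watchdog pats across a restart is easy to script. A minimal sketch in Python; parse_systemctl_show and parse_systemd_timestamp are hypothetical helpers, and the timestamp parser assumes the "Day YYYY-MM-DD HH:MM:SS TZ" layout shown above and simply drops the timezone abbreviation.

```python
from datetime import datetime


def parse_systemctl_show(output: str) -> dict:
    """Split `systemctl show` output (one KEY=VALUE per line) into a dict."""
    props = {}
    for line in output.splitlines():
        if "=" in line:
            # Split only on the first "=" because values may contain "=".
            key, value = line.split("=", 1)
            props[key.strip()] = value.strip()
    return props


def parse_systemd_timestamp(value: str) -> datetime:
    """Parse e.g. 'Mon 2019-05-27 09:50:43 AEST' into a naive datetime.

    Assumption: the weekday and timezone abbreviation can be ignored because
    both timestamps being compared come from the same host and timezone.
    """
    _weekday, date_part, time_part, *_tz = value.split()
    return datetime.strptime(f"{date_part} {time_part}", "%Y-%m-%d %H:%M:%S")


if __name__ == "__main__":
    props = parse_systemctl_show("WatchdogTimestamp=Mon 2019-05-27 09:50:43 AEST")
    print(parse_systemd_timestamp(props["WatchdogTimestamp"]))  # 2019-05-27 09:50:43
```

On the host itself, the input could come from subprocess.run(["systemctl", "show", "--property=WatchdogTimestamp", "hass.service"], capture_output=True, text=True).stdout (Python 3.7+).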
I can see the STATUS notification being updated successfully, but somehow READY=1 doesn’t get through, as systemctl keeps reporting ‘deactivating’.
I mean, it doesn’t fail: I can see the log showing that HA pushes the READY status, but systemd either doesn’t receive it or doesn’t acknowledge it.
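For context, READY=1, STATUS=... and WATCHDOG=1 are plain-text datagrams that the service writes to the Unix socket named in the NOTIFY_SOCKET environment variable; systemd then checks which process sent each one. A minimal sender in pure Python, assuming Linux; sd_notify here is a hypothetical helper modelled on the C function of the same name, not the component’s actual code.

```python
import os
import socket


def sd_notify(message: str) -> bool:
    """Send a notification such as 'READY=1' to systemd's notify socket.

    Returns False when NOTIFY_SOCKET is unset (e.g. not started by systemd
    as a Type=notify service), in which case nothing is sent.
    """
    address = os.environ.get("NOTIFY_SOCKET")
    if not address:
        return False
    # A leading '@' denotes a Linux abstract-namespace socket, which is
    # addressed with a leading NUL byte instead.
    if address.startswith("@"):
        address = "\0" + address[1:]
    with socket.socket(socket.AF_UNIX, socket.SOCK_DGRAM) as sock:
        sock.sendto(message.encode("utf-8"), address)
    return True
```

Several assignments can share one datagram, separated by newlines, e.g. sd_notify("READY=1\nSTATUS=Running").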
First off, thanks for the detailed bug report! (I mean that; I wish other people would put in a fraction of the amount of effort you have to track down a problem before reporting it.)
So, I think I know what’s going on here: I normally never restart HA from within HA. Generally I do it via systemctl restart hass.service, so this issue may have slipped by me. I know it did work at one point; I just completely forgot to test it with later builds.
It looks like systemd is killing HA based on the TimeoutSec= (or TimeoutStartSec=/TimeoutStopSec=) value. This is different from the WatchdogSec= value: it controls how long systemd waits for a READY=1 notification on startup, or for the process to exit once shutdown begins.
This tells me that either:
1. The hass-systemd component isn’t reporting the new PID after restart, therefore systemd is ignoring READY notifications from it. (systemd will only accept messages from process IDs it considers valid for that service.)
2. HA is killing the hass-systemd thread before it has a chance to send a STOPPING notification to systemd, hence the perpetual deactivating status you’re seeing.
Give me 24 hours and I’ll have a new build for you to test.
Not sure about point 1. The PID and the status message are both updated: I can see the log in HA and then the PID change in the systemctl status output.
I’m also not sure about point 2. But maybe we could test it by intercepting the call_service event for homeassistant.restart, instead of sending the notification from the stop event listener.
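Detecting the restart through the service call, as suggested here, could be as small as the predicate below. It assumes the call_service event’s data carries domain and service keys, as Home Assistant’s EVENT_CALL_SERVICE payload did at the time; is_restart_call is a hypothetical name, and registering it with the event bus is left out.

```python
from typing import Any, Mapping


def is_restart_call(event_data: Mapping[str, Any]) -> bool:
    """True if a call_service event is a request for homeassistant.restart.

    Assumes the event payload has 'domain' and 'service' keys.
    """
    return (
        event_data.get("domain") == "homeassistant"
        and event_data.get("service") == "restart"
    )


if __name__ == "__main__":
    print(is_restart_call({"domain": "homeassistant", "service": "restart"}))  # True
    print(is_restart_call({"domain": "homeassistant", "service": "stop"}))  # False
```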
Correct. Notifications sent to systemd’s notify socket include information about the sender (process ID, etc.), and systemd uses this information to make sure it only accepts notifications from what it considers the main PID. [I believe there *is* an option you can set in the service file, NotifyAccess=all, that *will* allow it to accept notifications from all processes of the service and not just the main one.]
I think I’ve figured out an easy way to fix your issue (assuming it’s problem #2), basically listening for the restart event (like you suggested). I’m testing it now.
Sorry for the delay, holiday weekend here in the US and all.
It turns out it’s not problem #2. After some additional testing combined with your experimentation I believe I’ve figured out what’s going on. Essentially, we’re catching the ha.stop event and sending a STOPPING notification to systemd. The problem is, since we’re restarting HA it never actually stops; systemd sits there waiting for it to stop until the timeout is reached. This is confirmed by the fact that disabling the ha.stop listener fixes the problem.
So, I think we need to listen for the ha.restart event and, if detected, set a flag. We’ll have the ha.stop listener function check that flag to determine if it should send the STOPPING notification or not.
Alternatively we can have the ha.restart listener simply disable the ha.stop listener.
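The flag idea above can be sketched without any Home Assistant imports by injecting the function that sends notifications. RestartAwareNotifier and its on_restart/on_stop methods are hypothetical names standing in for the ha.restart and ha.stop listeners; this illustrates the approach and is not the component’s implementation.

```python
from typing import Callable, List


class RestartAwareNotifier:
    """Tracks whether a stop is really part of a restart (hypothetical helper).

    `send` is whatever writes a message to systemd's notify socket; it is
    injected so the decision logic can be exercised without systemd.
    """

    def __init__(self, send: Callable[[str], object]) -> None:
        self._send = send
        self._restarting = False

    def on_restart(self) -> None:
        # Called from the restart listener: the next stop is not a real shutdown.
        self._restarting = True

    def on_stop(self) -> None:
        # Called from the stop listener. During a restart the process never
        # exits, so announcing STOPPING=1 would leave systemd waiting until its
        # stop timeout kills HA; only send it for a genuine shutdown.
        if not self._restarting:
            self._send("STOPPING=1")


if __name__ == "__main__":
    sent: List[str] = []
    notifier = RestartAwareNotifier(sent.append)
    notifier.on_restart()
    notifier.on_stop()
    print(sent)  # []
```

The listener-removal alternative would drop the flag: Home Assistant’s hass.bus.listen returns a callable that removes the listener, so the restart handler could simply call the one returned when the stop listener was registered.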
I was also looking at core.py, among other files. I didn’t see an event called restart; I did, however, see an event fired after stop called EVENT_HOMEASSISTANT_CLOSE, but I couldn’t catch it with the listener. I was thinking this event was related to a full stop.
Yes. Until I get a chance to implement support for HA’s native restart ability, the best way is to simply issue a systemctl restart *ha-service*.service.
For upgrading HA, after you’ve stopped it and performed the upgrade, I’d go into your HA directory and start it by hand and let it load up fully once:
Obviously, change the stuff in asterisks to match your installation. The reason I do this is that after an upgrade HA typically updates various packages during startup, which can extend the startup time and cause systemd to kill and restart it. Alternatively, you could run systemctl edit --full *ha-service*.service and extend the watchdog and startup timers to 5+ minutes. This should give HA enough time to update itself before systemd kills it.
Hey guys, so I’ve got a new version coming up here soon. I’ve added support for HA’s native restart functionality. It’s not super elegant, but it works! It also requires a watchdog timeout value of at least 60 seconds, but that shouldn’t be an issue for most people. (Basically, when we see a restart request come through, we don’t send the STOPPING message to systemd and we immediately pet the watchdog. A 60-second watchdog timeout value should be long enough for HA to stop, start and reactivate our plugin. If you have a ton of components or devices in HA, or have a very slow system, it might require an even longer timeout value.)
Like I said, not the most elegant solution, but it functions.
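The behaviour described above (no STOPPING=1 on restart, an immediate watchdog pet, and a watchdog of at least 60 seconds) might look roughly like this. It assumes WatchdogSec= reaches the process as the WATCHDOG_USEC environment variable (optionally with WATCHDOG_PID), which is how systemd passes it to services; handle_restart_request, watchdog_timeout_seconds and MIN_WATCHDOG_SECONDS are hypothetical names for illustration.

```python
import logging
import os
from typing import Callable, Mapping, Optional

_LOGGER = logging.getLogger(__name__)

# Hypothetical constant mirroring the 60-second minimum from the post.
MIN_WATCHDOG_SECONDS = 60


def watchdog_timeout_seconds(env: Mapping[str, str]) -> Optional[float]:
    """Return the WatchdogSec= value exported as WATCHDOG_USEC, in seconds."""
    usec = env.get("WATCHDOG_USEC")
    pid = env.get("WATCHDOG_PID")
    if not usec or (pid and int(pid) != os.getpid()):
        # No watchdog configured, or the settings target another process.
        return None
    return int(usec) / 1_000_000


def handle_restart_request(
    send: Callable[[str], object], env: Mapping[str, str] = os.environ
) -> bool:
    """React to a restart request; return False if the watchdog looks too short."""
    # Deliberately no STOPPING=1: HA restarts in place, so systemd would
    # otherwise wait for an exit that never happens.
    # Petting the watchdog now resets systemd's countdown, giving HA the
    # full WatchdogSec= window to stop, start and reload the plugin.
    send("WATCHDOG=1")
    timeout = watchdog_timeout_seconds(env)
    if timeout is not None and timeout < MIN_WATCHDOG_SECONDS:
        _LOGGER.warning(
            "WatchdogSec=%gs is below the suggested %ss; the restart may be killed",
            timeout,
            MIN_WATCHDOG_SECONDS,
        )
        return False
    return True
```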
Hi @timothybrown
Maybe I’m completely off topic, but could I use this custom component to start, stop and query the status of other systemd daemons?
I want to have a switch in HA to turn httpd, pure-ftp and anything else with a registered systemd .service on and off.
Any suggestions?
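One possible approach, independent of this component, is to shell out to systemctl from whatever backs the switch (for example Home Assistant’s command_line switch platform, or a small custom component). A minimal sketch with hypothetical helper names; it assumes the user HA runs as is allowed to start and stop the units, which usually means root or a polkit rule.

```python
import subprocess
from typing import List

# Actions this sketch allows; anything else is rejected.
_ALLOWED_ACTIONS = ("start", "stop", "is-active")


def systemctl_command(action: str, unit: str) -> List[str]:
    """Build the argument list for a systemctl call (no shell involved)."""
    if action not in _ALLOWED_ACTIONS:
        raise ValueError(f"unsupported action: {action!r}")
    return ["systemctl", action, unit]


def unit_is_active(unit: str) -> bool:
    """Query state: `systemctl is-active` exits with 0 only for active units."""
    result = subprocess.run(
        systemctl_command("is-active", unit),
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0


def set_unit_state(unit: str, on: bool) -> None:
    """Start or stop a unit; raises CalledProcessError if systemctl fails."""
    subprocess.run(systemctl_command("start" if on else "stop", unit), check=True)
```

A switch would call unit_is_active("httpd.service") for its state and set_unit_state("httpd.service", True) or False when toggled.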