Custom Component: systemd (Run HA as a systemd notify daemon, with watchdog support!)

What is the correct line to debug the component?

logger:
  default: info
  logs:
    homeassistant.components.systemd: debug

I am seeing an issue where the daemon is restarted by systemd after I restart via the HA service call, so it effectively restarts twice. The message I get from the OnFailure directive is:

May 27 09:18:39 ha systemd[1]: hass.service: Main process exited, code=killed, status=9/KILL
May 27 09:18:39 ha systemd[1]: hass.service: Unit entered failed state.
May 27 09:18:39 ha systemd[1]: hass.service: Triggering OnFailure= dependencies.
May 27 09:18:39 ha systemd[1]: hass.service: Failed with result 'timeout'.

I tried increasing the watchdog timeout to 7 minutes, but my guess is that the issue is not there. If I restart via systemd, it doesn’t trigger twice.

Some testing:
If I restart via HA and then check the timestamp of the last watchdog pat:

systemctl show --property=WatchdogTimestamp hass.service
WatchdogTimestamp=Mon 2019-05-27 09:50:43 AEST

Then the Telegram message arrives when it triggers the restart, including the systemd status output:

Active: activating (auto-restart) (Result: timeout) since Mon 2019-05-27 09:51:59 AEST; 17ms ago

That gap is shorter than half the watchdog time, so the watchdog itself should not have fired.
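To make the arithmetic explicit, here is a quick check of the two timestamps quoted above. It assumes the watchdog was already set to 7 minutes when this test was run, which the post does not state outright.

```python
from datetime import datetime, timedelta

FMT = "%Y-%m-%d %H:%M:%S"

# Timestamps copied from the systemctl output above (both AEST, so the
# subtraction needs no timezone handling).
last_pat = datetime.strptime("2019-05-27 09:50:43", FMT)
restart_triggered = datetime.strptime("2019-05-27 09:51:59", FMT)

# Assumption: the 7-minute watchdog setting was in effect for this test.
watchdog = timedelta(minutes=7)

elapsed = restart_triggered - last_pat
print(f"elapsed since last pat: {elapsed.total_seconds():.0f}s")
print(f"half the watchdog time: {(watchdog / 2).total_seconds():.0f}s")
print("watchdog could have expired:", elapsed >= watchdog)
```

The gap is 76 seconds against a 420-second watchdog, which fits the guess that a different timeout is involved.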

I can see the notify STATUS is successfully updated, but somehow it fails to notify READY=1, as systemctl keeps reporting ‘deactivating’.
I mean it doesn’t fail: I can see in the log that HA pushes the READY status, but systemd doesn’t receive it or doesn’t acknowledge it.

Made a video so you can see the whole sequence.

https://www.dropbox.com/s/vcoxv3s0jey1tlf/Peek%202019-05-27%2011-20.mp4?dl=0

I even tried using the alternative systemd module:

from cysystemd.daemon import notify, Notification

then

    notify(Notification.READY)

in the good_dog notify_started function, but it doesn’t make a difference.
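For anyone who wants to rule out the library entirely, the notification can also be sent by hand. This is a minimal sketch, not the component's code: the function name sd_notify is just chosen here, and it relies only on the NOTIFY_SOCKET environment variable that systemd sets for Type=notify units.

```python
import os
import socket


def sd_notify(state: str) -> bool:
    """Send one notification string (e.g. "READY=1") to systemd.

    Returns False when NOTIFY_SOCKET is unset, which usually means the
    process was not started from a Type=notify unit.
    """
    path = os.environ.get("NOTIFY_SOCKET")
    if not path:
        return False
    # A leading "@" denotes a Linux abstract-namespace socket, which is
    # addressed with a leading NUL byte instead.
    if path.startswith("@"):
        addr = b"\0" + path[1:].encode("utf-8")
    else:
        addr = path
    with socket.socket(socket.AF_UNIX, socket.SOCK_DGRAM) as sock:
        sock.sendto(state.encode("utf-8"), addr)
    return True


if __name__ == "__main__":
    print("sent:", sd_notify("READY=1"))
```

A True result only means the datagram was handed to the socket. Whether systemd acts on it is decided on the systemd side, so a successful send with no change in the unit state would match what I am seeing.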

First off, thanks for the detailed bug report! (I mean that; I wish other people would put in a fraction of the amount of effort you have to track down a problem before reporting it.)

So, I think I know what’s going on here: I normally never restart HA from within HA. Generally I do it via systemctl restart hass.service, so this issue may have slipped by me. I know it did work at one point, I just completely forgot to test it with future builds.

It looks like systemd is killing HA based on the TimeoutSec= (or TimeoutStartSec=/TimeoutStopSec=) value. This is different from the WatchdogSec= value, as it controls the time systemd waits for a READY or STOPPING notification on startup or shutdown.

This tells me that either:

  1. The hass-systemd component isn’t reporting the new PID after restart, therefore systemd is ignoring READY notifications from it. (Systemd will only accept messages from process IDs it considers valid for that service.)

  2. HA is killing the hass-systemd thread before it has a chance to send a STOPPING notification to systemd, hence the perpetual deactivating status you’re seeing.

Give me 24 hours and I’ll have a new build for you to test.

There is no way of proxying or MITM-ing the notify socket, right? So I should be watching D-Bus to see the stop message, right?

Not sure about this. The PID and the status message are both updated. I can see the log in HA and the subsequent PID change in the systemctl status.

I’m also not sure about point 2. But maybe we could test it: instead of watching the stop event to send the notification, intercept the call_service event on homeassistant.restart.

Correct. D-bus messages include information about the sender (process ID, cgroups, etc.) and systemd uses this information to make sure it only accepts notifications from what it considers the main PID. [I believe there *is* an option you can enable in the service file that *will* allow it to accept notifications from all child processes and not just the parent.]

Yes, I noticed that too when testing just now.

I think I’ve figured out an easy way to fix your issue (assuming it’s problem #2), basically listening for the restart event (like you suggested). I’m testing it now.

Not sure if the call service intercept will work. I added this:

shell_command:
  send_systemd_notify: /usr/bin/systemd-notify STOPPING=1

Then I called the service in HA, and I can see the status shows:

deactivating (stop-sigterm)

A bit further down:

Status: "Home Assistant is running."

This is correct since I only sent STOPPING. Then I call homeassistant.restart and the same happens again.

For this you have to set up:

NotifyAccess=all

A dirty workaround is to disable the STOPPING notification and just use the STATUS messages.

Sorry for the delay, holiday weekend here in the US and all.

It turns out it’s not problem #2. After some additional testing combined with your experimentation I believe I’ve figured out what’s going on. Essentially, we’re catching the ha.stop event and sending a STOPPING notification to systemd. The problem is, since we’re restarting HA it never actually stops; systemd sits there waiting for it to stop until the timeout is reached. This is confirmed by the fact that disabling the ha.stop listener fixes the problem.

So, I think we need to listen for the ha.restart event and, if detected, set a flag. We’ll have the ha.stop listener function check that flag to determine if it should send the STOPPING notification or not.

Alternatively we can have the ha.restart listener simply disable the ha.stop listener.
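The flag idea could be sketched roughly like this. Everything here is hypothetical: StopNotifier and its method names are made up for illustration, the notify callable stands in for whatever actually talks to systemd, and how a restart request gets detected inside HA is deliberately left out.

```python
from typing import Callable, List


class StopNotifier:
    """Hypothetical helper that decides whether to announce STOPPING."""

    def __init__(self, notify: Callable[[str], None]) -> None:
        self._notify = notify        # stand-in for the real systemd notifier
        self._restarting = False     # set once a restart has been requested

    def on_restart_requested(self) -> None:
        # Would be called by whatever detects an HA-initiated restart.
        self._restarting = True

    def on_stop(self) -> None:
        # Would be called from the stop listener.
        if self._restarting:
            return  # HA is coming straight back, so stay silent
        self._notify("STOPPING=1")


sent: List[str] = []

restarting = StopNotifier(sent.append)
restarting.on_restart_requested()
restarting.on_stop()
print(sent)  # nothing was sent for the restart case

stopping = StopNotifier(sent.append)
stopping.on_stop()
print(sent)  # the plain stop case announces STOPPING=1
```

Keeping the decision in one place means the stop listener stays registered and only the message is suppressed, which is the first of the two options above.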

I’m working on this now.

I was also looking at core.py, among other files. I did not see an event called restart. I did, however, see an event fired after stop called EVENT_HOMEASSISTANT_CLOSE, but I couldn’t catch it with the listener. I was thinking this event was related to a full stop.

Hi thanks very much for this watchdog, I’m going to try it out. I’m currently using the systemd instructions that are in the docs here:

I have a few quick questions about using this component. Is the best way to restart HA to simply run:

  • sudo systemctl restart hass.service

Also when upgrading HA to a different version, I guess this service should be stopped first? Is something like this OK:

sudo systemctl stop hass.service
cd homeassistant
source bin/activate
python3 -m pip install --upgrade homeassistant
deactivate
sudo systemctl start hass.service

Yes, until I get a chance to implement support for HA’s native restart ability the best way is to simply issue a systemctl restart *ha-service*.service.

For upgrading HA, after you’ve stopped it and performed the upgrade, I’d go into your HA directory and start it by hand and let it load up fully once:

sudo systemctl stop *ha-service*.service
cd *ha-dir*
source bin/activate
sudo -u *ha-user* pip3 install --upgrade homeassistant
sudo -u *ha-user* hass -c /srv/hass
[Home Assistant Starts...]
[Home Assistant Finishes Loading...]
<CTRL-C>
[Home Assistant Stops...]
sudo systemctl start *ha-service*.service

Obviously change the stuff in asterisks to match your installation. The reason I do this is because after an upgrade HA typically updates various packages during startup, which can extend the startup time and cause systemd to kill and restart it. Alternatively you could do a systemctl edit --full *ha-service*.service and extend the watchdog and startup timers to 5+ minutes. This should give HA enough time to update itself before systemd kills it.

Let me know if you need more help. :slight_smile:


Thanks that’s great, I think I’ll try the start it by hand method next time I need to do a HA update, they can take quite a while sometimes.

I’ve only been running the watchdog for 1 day, so far so good. :slight_smile:

Hey guys, so I’ve got a new version coming up here soon. I’ve added support for HA’s native restart functionality. It’s not super elegant, but it works! It also requires a watchdog timeout value of at least 60 seconds, but that shouldn’t be an issue for most people. (Basically when we see a restart request come through we don’t send the STOPPING message to systemd and we immediately pet the watchdog. A 60 second watchdog timeout value should be long enough for HA to stop, start and reactivate our plugin. If you have a ton of components or devices in HA, or have a very slow system, it might require an even longer timeout value.)
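Roughly, the restart handling described above could look like the sketch below. It is not the component's actual code: the function names are invented here, notify is a stand-in for the real notifier, and the only systemd facts it leans on are the WATCHDOG_USEC environment variable and the WATCHDOG=1 message.

```python
import os
from typing import Callable, Mapping, Optional


def watchdog_timeout_seconds(env: Mapping[str, str] = os.environ) -> Optional[float]:
    """Return the watchdog timeout systemd exported, or None if unset."""
    raw = env.get("WATCHDOG_USEC")  # microseconds, set when WatchdogSec= is used
    return int(raw) / 1_000_000 if raw else None


def on_restart_request(
    notify: Callable[[str], None],
    env: Mapping[str, str] = os.environ,
    minimum: float = 60.0,
) -> bool:
    """Pet the watchdog once and skip STOPPING.

    Returns True when the configured timeout meets the 60 second minimum
    mentioned above, False when it is shorter or no watchdog is configured.
    """
    notify("WATCHDOG=1")  # buys one full timeout period for the restart
    timeout = watchdog_timeout_seconds(env)
    return timeout is not None and timeout >= minimum


messages = []
ok = on_restart_request(messages.append, {"WATCHDOG_USEC": "300000000"})
print(messages, ok)
```

With a 5 minute timeout the check passes. With anything under 60 seconds it returns False, which a caller could use to log a warning before HA goes down.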

Like I said, not the most elegant solution, but it functions.


When do you think you’re going to submit the component to main HA?

Hi, I have a timeout value set to 5 mins as sometimes it takes a while to load HA when it’s just been updated to a later version.

However usually my HA takes less than 1 min to startup. :slight_smile:

Hi @timothybrown
Maybe I am completely off topic, but could I use this custom component to start, stop, and query the status of other systemd daemons?
I want to have a switch in HA to turn httpd, pure-ftp, and whatever else has a systemctl .service registered on and off.
Any suggestions?

Thanks in advance.

A quick question to anyone using this custom component, is it still working OK for you?

It has recently stopped working for me, this is the error I’m getting in HAC:

Failed config
  homeassistant.packages.custom_systemd.systemd:
    - Package custom_systemd setup failed. Component systemd Integration 'systemd' not found.
    - systemd: None

Yet I have the custom component installed…

It was working fine in the July release of HAC

This seems not to have been updated in several years. I found brianegge/home-assistant-sdnotify (systemd service for Home Assistant) on GitHub first, and that seems to work and be more current.
