Hello_world example addon from developer docs stopped working (s6 overlay issue?)

@CentralCommand - Are you affiliated with the Home Assistant project? Asking because you said “This is the first time we’ve had an issue like this”.

I posted in a different thread, but I'm having the same "can only run as pid 1" error in the add-on I develop. I've already marked my scripts as 755 and set "init": false in my config.json. What am I missing? For reference, my branch is here.

Searching GitHub, I see quite a few other add-ons are having the same error.
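For anyone else hitting this, the two fixes mentioned in this thread (setting "init": false and making the scripts executable) are easy to double-check with a small script. This is only a sketch: it assumes a typical add-on layout with a config.json at the top level (adapt it if you use config.yaml) and scripts somewhere under rootfs/. The function name and layout are my own assumptions, not official tooling.

```python
import json
import stat
from pathlib import Path


def check_addon(addon_dir):
    """Return a list of problems found in an add-on directory.

    Assumed layout (hypothetical): config.json at the top, scripts under rootfs/.
    """
    root = Path(addon_dir)
    problems = []

    # Step 1: Docker's own init must be disabled so s6-overlay can be PID 1.
    # The add-on "init" option defaults to true, so a missing key counts as a problem.
    config = json.loads((root / "config.json").read_text())
    if config.get("init", True) is not False:
        problems.append('config.json: "init" should be false')

    # Step 2: every file under rootfs/ that starts with a shebang should be
    # executable by its owner (e.g. mode 755).
    rootfs = root / "rootfs"
    if rootfs.is_dir():
        for path in sorted(rootfs.rglob("*")):
            if not path.is_file():
                continue
            with path.open("rb") as fh:
                is_script = fh.read(2) == b"#!"
            mode = stat.S_IMODE(path.stat().st_mode)
            if is_script and not mode & stat.S_IXUSR:
                rel = path.relative_to(root)
                problems.append(f"{rel}: not executable (mode {mode:o})")
    return problems
```

Calling check_addon("path/to/your/addon") returns an empty list when both steps are in place; otherwise each entry names what still needs fixing.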

Yes. I’m mdegat01 on github.

So those were the only two steps I had to do in my testing. I can't right now, but I'll try your branch later and see if I can find something.

Lawris above mentioned needing to add hassio_role: default, so I suppose you could try that? I don't know why it would make a difference, since default is supposed to be the default value of that field anyway, but it can't hurt.

@CentralCommand I tried setting "hassio_role": "default" without any luck. Agree with you that it shouldn’t make a difference though…

I found two other add-ons on GitHub that have followed the instructions and are still not working. Wondering if there is something else we're missing…

This one just worked fine for me once I added init: false to the config. I didn't have to do anything else, and it seemed to build, start and run. At least as far as I can tell; I got this:

Which said to me that the addon was doing its thing and had just found an error in my config. That makes sense, since I didn't touch the config: I don't use this addon and have no idea how it works.

So this one was multiple things, one of which had nothing to do with s6.

The addon was not pinning the version of the base image it was using. When the latest version of the addon base images came out, one of the other things we did was make alpine 3.15 the latest for the alpine base image, since it's been out and stable for a while now. Because the addon didn't pin its base image version, it immediately started using alpine 3.15 instead of alpine 3.14. That's a problem because it appears the addon can't be built on 3.15 without some other updates.

This part is not captured in the blog because it's completely unrelated. This is the risk of using latest, and it's why every best-practices guide for developing Docker images says to pin your versions. It would've happened no matter what was in the release. It could've been a documentation change and the addon would still have broken.

I actually put in a PR for this one because I use Z2M and know how it works, so I can test it. This could be something to watch out for elsewhere, though. If you see an addon without a build.yaml or a pinned version in its Dockerfile, it's at risk of issues like this.
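If you want to check an add-on for this, here's a rough sketch. It assumes, per the explanation above, that an add-on with no build.yaml whose Dockerfile only says FROM $BUILD_FROM ends up on the default (latest) base image, so it flags that case too. The tiny build.yaml reader only understands a flat build_from mapping, and the image names in the test are illustrative; use a real YAML parser for anything serious.

```python
import re
from pathlib import Path

# "FROM [--platform=...] image[:tag] [AS name]" lines in a Dockerfile.
FROM_RE = re.compile(r"^\s*FROM\s+(?:--platform=\S+\s+)?(\S+)", re.IGNORECASE | re.MULTILINE)


def is_pinned(image_ref):
    """True if the reference has a digest, or an explicit tag other than 'latest'."""
    if "@" in image_ref:  # pinned by digest, e.g. image@sha256:...
        return True
    last = image_ref.rsplit("/", 1)[-1]  # drop registry/namespace (a registry may have a port)
    if ":" not in last:
        return False  # no tag means Docker uses 'latest'
    return last.split(":", 1)[1] != "latest"


def build_from_images(build_yaml_text):
    """Minimal reader for a flat 'build_from:' mapping (no PyYAML, no inline comments)."""
    images, in_block = [], False
    for line in build_yaml_text.splitlines():
        if not line.strip() or line.lstrip().startswith("#"):
            continue
        if not line[0].isspace():  # a new top-level key
            in_block = line.split(":", 1)[0].strip() == "build_from"
            continue
        if in_block and ":" in line:
            images.append(line.split(":", 1)[1].strip().strip("'\""))
    return images


def unpinned_images(addon_dir):
    """Return the base image references in an add-on that are not pinned."""
    root = Path(addon_dir)
    build_yaml = root / "build.yaml"
    if build_yaml.exists():
        refs = build_from_images(build_yaml.read_text())
    else:
        refs = [m.group(1) for m in FROM_RE.finditer((root / "Dockerfile").read_text())]
    # A bare build arg like $BUILD_FROM with no build.yaml is treated as unpinned.
    return [r for r in refs if r.startswith("$") or not is_pinned(r)]
```

Pinning to something like ghcr.io/home-assistant/amd64-base:3.14 in build.yaml means a new latest never changes what you build against until you bump it yourself.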

For your addon, I think you actually found a bug. I noticed that with protection mode on, I get this as output:

The addon obviously isn't working like this, but it is trying to. It's actually running your cont-init script, which then exits due to a validation error. But when I turn off protection mode, I get this:

Which suggests to me that init: false is not working correctly when protection mode is disabled. Can you make an issue here about this?

EDIT: As an aside, this is a scary add-on.

As such, I have requested all possible permissions.

No kidding… I actually forgot an addon could request to turn off AppArmor; I haven't seen that used before.

@CentralCommand I appreciate you reaching out to that zigbee2mqtt repo and putting in the PR. I added a comment to the other issue confirming your findings.

Thanks for the feedback. I had started it with protection mode enabled and saw that output, but didn’t think that error pointed to a bug. I just thought it was still broken, but in a different way. 🤦‍♂

I opened an issue as you said.

EDIT: As an aside, this is a scary add-on.

Ya, tell me about it. I don’t love it. :sweat_smile:

It’s similar to glances in that it requires access to hardware data (CPU, memory, etc…) about the host it’s running on. I’ve tried leaving protection mode enabled, but I’m not getting all the metrics I need. I could probably investigate dropping some API access (e.g., hassio_api, homeassistant_api, auth_api) in order to get my rating up, but protection mode pretty much needs to be disabled.

Side note: If you could integrate node metrics into the official Prometheus integration, that would be great and would negate the need for my add-on completely :sweat_smile:.

FYI, I replied on your GitHub issue. I figured out that it's really an issue with host_pid: true, which only works when protection mode is off. There's not really a good solution to this, though I put what I have in the comment and added a PR about it.
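To summarize the combination that bites here, a tiny lint over the add-on config could flag it. This is a hypothetical helper of my own, not anything the Supervisor does; it only encodes what this thread found: host_pid: true needs protection mode disabled, and once the container shares the host's PID namespace, an s6-overlay image (the ones running with "init": false) can't get PID 1.

```python
def host_pid_warnings(config):
    """Flag risky host_pid combinations in an add-on config dict (illustrative only)."""
    warnings = []
    if config.get("host_pid"):
        warnings.append("host_pid: true only takes effect with protection mode disabled")
        # "init": false means the image's own init (s6-overlay) is expected to be PID 1.
        if config.get("init", True) is False:
            warnings.append(
                "host_pid: true shares the host PID namespace, so s6-overlay cannot be PID 1"
            )
    return warnings
```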

I mentioned keeping an eye on the Glances addon in the comment, and you should do that. But to be honest, I think the only solution here is going to be not using S6 in images that need this option. I didn't realize it since I've never needed it, but that option is pretty dangerous. Since you forked this addon I'm not sure you're aware, but this bit of code that was copied from the Glances addon is actually critical:

When running with host_pid: true, if you don't have this code then stopping the addon literally shuts off your machine. That's because S6 kills the process with PID 1 and every child process like it normally does, except this time it's doing that to the host OS, not the Docker container.

It’s possible we’ll come up with a way to use S6 here but it clearly was not designed for this.
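To see why this is so dangerous, it helps to look at what a container can see. With its own PID namespace, /proc inside the container lists only the container's processes, and PID 1 is the container's init. With host_pid: true, the same listing shows every process on the host, and PID 1 is the host's init, so any "stop everything" step at shutdown reaches the host. The read-only sketch below just lists what's visible; the function names are hypothetical and this is not s6's actual shutdown code.

```python
from pathlib import Path


def visible_pids(proc_root="/proc"):
    """Return the PIDs visible in this PID namespace, read from /proc (Linux only)."""
    return sorted(int(p.name) for p in Path(proc_root).iterdir() if p.name.isdigit())


def pid1_name(proc_root="/proc"):
    """Return the command name of PID 1 as seen from this PID namespace."""
    return (Path(proc_root) / "1" / "comm").read_text().strip()

# Deliberately read-only: signalling "every process" from a container started
# with host_pid: true would also signal processes on the host OS.
```

Running both functions inside the add-on container with and without host_pid: true should make the difference obvious: a handful of PIDs versus the whole host.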

I'd have to bring it up with the code owner for that integration; I'm not really familiar with it. Although to be honest, this whole host_pid requirement makes me think there are long odds of that happening.

Thanks for the reply! I’m going to take your step 1+2 approach as outlined in the issue.

In the long run, I’m going to probably copy whatever frenck does with glances. Figured he’s more qualified than me :sweat_smile:

When running with host_pid: true, if you don't have this code then stopping the addon literally shuts off your machine. That's because S6 kills the process with PID 1 and every child process like it normally does, except this time it's doing that to the host OS, not the Docker container.

Yes, I noticed that the hard way, haha. Then I found this issue, realized NOOP was the key, and that's where I copied the code from.
