Hello_world example addon from developer docs stopped working (s6 overlay issue?)

CentralCommand · May 13, 2022, 11:41pm

You got it already actually, this appears to be all that is needed:

As for the addon log looking different, that’s to be expected. S6 changed a lot in V3, and it now logs everything it’s doing. V3 is mostly backwards compatible, with a few exceptions that we’ve tried to note; it looks like we may have missed one here.

As an aside you should definitely expect a bunch of addon updates coming up around the ecosystem. Although V3 is mostly backwards compatible they did mark a ton of things as deprecated/legacy that were previously considered best practices in addons. Deprecated stuff doesn’t have to be fixed immediately but does need to be fixed.

EDIT: Put in a PR to adjust the blog post and hello world example. Thanks for catching this @jant90 ! A lot of the HA addons already had init: false so we didn’t catch that this had become mandatory.

davidusb · May 14, 2022, 6:09am

For me it only worked again after adding init: true and granting full permission to the run file with: chmod 777 run

Lawris · May 14, 2022, 4:48pm

Hi, I did everything that was suggested and my addon finally runs, but I can’t get the environment variable SUPERVISOR_TOKEN anymore. It has become undefined, so I can’t communicate with HA. Any idea how to fix that? Thanks.

CentralCommand · May 14, 2022, 6:56pm

You definitely need to grant execute permission to the run file. I’m not sure how init true worked for you. Is the addon in a repo I can look at?
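For anyone following along, here is a minimal sketch of the permission fix. It assumes the service script is a file named run; the scratch directory and the script contents are made up for illustration. Execute permission is the part that matters, so chmod a+x is enough, and chmod 777 additionally makes the file world-writable, which isn’t required.

```shell
#!/usr/bin/env bash
# Sketch: make sure a service script is executable.
# The scratch directory and the script contents are illustrative only.

workdir="$(mktemp -d)"
printf '#!/usr/bin/env bash\necho "Hello world!"\n' > "${workdir}/run"

# Add execute permission for user, group and others
# without also making the file writable by everyone.
chmod a+x "${workdir}/run"

# -x is true when the file exists and the current user may execute it.
if [ -x "${workdir}/run" ]; then
    run_status="executable"
else
    run_status="not executable"
fi
echo "run is ${run_status}"
```

In a real add-on you would run the chmod on the file in your repository and commit the mode change, so the permission survives the image build.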

CentralCommand · May 14, 2022, 6:59pm

I’m not really sure how that could be; nothing about SUPERVISOR_TOKEN was changed. The image has no control over it: supervisor injects that env when it runs the container. If the addon is in a repo I can look, but there’s no recent change in that area I’m aware of to point to.
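If it helps with debugging, a small guard like the one below makes a missing token obvious in the addon log instead of failing later on the first API call. This is only a sketch: check_supervisor_token is a hypothetical helper name, not something provided by the base images.

```shell
#!/usr/bin/env bash
# Sketch: fail early with a clear message when SUPERVISOR_TOKEN is missing.
# check_supervisor_token is a hypothetical helper, not part of any HA tooling.

check_supervisor_token() {
    # ${VAR:-} expands to an empty string when VAR is unset,
    # so the test treats "unset" and "empty" the same way.
    if [ -z "${SUPERVISOR_TOKEN:-}" ]; then
        echo "SUPERVISOR_TOKEN is not set" >&2
        return 1
    fi
    # Print only the length so the token itself never lands in the log.
    echo "SUPERVISOR_TOKEN is set (${#SUPERVISOR_TOKEN} characters)"
}

check_supervisor_token || echo "cannot talk to the Supervisor API without a token"
```

Calling it at the top of the run script shows right away whether the variable was injected into the container at all.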

davidusb · May 14, 2022, 7:35pm

Hi, yes of course you can take a look here: GitHub - davidusb-geek/emhass-add-on: The Home Assistant Add-on for EMHASS: Energy Management Optimization for Home Assistant

davidusb · May 14, 2022, 9:11pm

Oh I’m sorry I meant to say init: false

CentralCommand · May 14, 2022, 10:29pm

Oh that makes sense then, that’s exactly what I found as well. I just modified the blog post to include init: false after this thread

Lawris · May 15, 2022, 9:27am

Thanks for your reply. I just figured out that the problem was that I was missing "hassio_role": "default" in the config file. Weird, because it wasn’t necessary before the update… Thanks again!

jant90 · May 15, 2022, 10:56am

Thanks for that. Also, wouldn’t it make sense to make init: false the default now?

CentralCommand · May 15, 2022, 12:14pm

Hm, that’s a new one to me; I didn’t have to make that change with any addons I tested. If the addon is in a GitHub repo I would still like a link so I can try to figure out why you needed that and whether we need to modify the blog post for others. Glad you got it working though!

CentralCommand · May 15, 2022, 12:20pm

If we were adding that option for the first time today - absolutely. However keep in mind not everyone uses the HA base images or s6 overlay in their addons and those that don’t still need init: true. So that means that would be another breaking change.

It’s not off the table but it would need to be managed. And tbh it’s probably not worth it to risk breaking folks. The breaking change here was worth it because the base image was pretty far behind in its s6 version and it’s important to keep dependencies current. But changing the default wouldn’t update anything, it would just be a more sensible default today.

Also, init: false was actually supposed to be used in s6 v2 as well; it just wasn’t required before (like the execute permission on the run and finish scripts). The example addon and many of the addons in the official addons repo were already using it for that reason. We just missed a couple, and unfortunately one of the ones missed was the hello world one.
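A quick way to check your own addons for this is a grep over the config file. The sketch below assumes the setting sits on its own line in config.yaml or config.json; has_init_false is a hypothetical helper, the sample file contents are illustrative, and this is plain pattern matching rather than a real YAML/JSON parser.

```shell
#!/usr/bin/env bash
# Sketch: check whether an add-on config sets init to false.
# has_init_false is a hypothetical helper; it greps, it does not parse.

has_init_false() {
    # Matches `init: false` (YAML) and `"init": false` (JSON),
    # allowing leading indentation and spaces around the colon.
    grep -Eq '^[[:space:]]*"?init"?[[:space:]]*:[[:space:]]*false' "$1"
}

# Illustrative config written to a scratch directory.
workdir="$(mktemp -d)"
printf 'name: "Hello world"\nslug: "hello_world"\ninit: false\n' > "${workdir}/config.yaml"

if has_init_false "${workdir}/config.yaml"; then
    echo "init: false is set"
else
    echo "init: false is missing"
fi
```

Keep in mind what was said above: addons that don’t use the HA base images or s6 overlay still need init: true, so a missing init: false is only a problem for s6-based addons.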

lmm7425 · May 15, 2022, 5:38pm

@CentralCommand - Are you affiliated with the Home Assistant project? Asking because you said “This is the first time we’ve had an issue like this”.

I posted in a different thread, but I’m getting the same “can only run as pid 1” error in an add-on I develop. I’ve already marked my scripts as 755 and set "init": false in my config.json. What am I missing? For reference, my branch is here.

Searching GitHub, I see quite a few other add-ons are having the same error.

CentralCommand · May 15, 2022, 6:57pm

Yes. I’m mdegat01 on github.

So those were the only two steps I had to do in my testing. I will try your branch and take a look later to see if I can find something, can’t right now.

Lawris above mentioned needing to add hassio_role: default so I suppose you could try that? I don’t know why that would make a difference since default is still supposed to be the default value of that field but I suppose it can’t hurt in that case.

lmm7425 · May 16, 2022, 9:14pm

@CentralCommand I tried setting "hassio_role": "default" without any luck. Agree with you that it shouldn’t make a difference though…

I found two other add-ons on GitHub that have gone through the instructions and it’s still not working for them. Wondering if there is something else we’re missing…

CentralCommand · May 16, 2022, 11:43pm

This one just worked fine for me when I added init: false to the config. I didn’t have to do anything else and it seemed to build, start and run. At least as far as I know, I got this:

Which said to me that the addon was doing its thing and had just found an error in my config. That makes sense, since I did not touch the config; I don’t use this addon and have no idea how it works.

So this one was multiple things, one of which had nothing to do with s6.

The addon was not pinning the version of the base image it was using. When the latest version of the addon base images came out, one of the other things we did was make alpine 3.15 the latest for the alpine base image, since it’s been out and stable for a while now. Since the addon did not pin the version of the base image it was using, it immediately started trying to use alpine 3.15 instead of alpine 3.14. This is a problem because it appears the addon can’t be built on 3.15; some other updates are required.

This part is not captured in the blog because it’s completely unrelated. This is the risk of using latest and why every best practices guide for developing docker images says to pin your versions. It would’ve happened no matter what was in the release. It could’ve been a documentation change and the addon would’ve broken.

I actually put in a PR for this one because I use Z2M and do know how it works so I can test it. This could be something to watch out for elsewhere though I suppose? If you see an addon without a build.yaml or the version pinned in the Dockerfile then they are risking issues like this.
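To make the pinning point concrete, here is a rough sketch of a check you could run against the image references in a build.yaml or Dockerfile. is_pinned is a hypothetical helper and the image names are only examples; the one hard fact it relies on is that Docker treats a reference without a tag as latest.

```shell
#!/usr/bin/env bash
# Sketch: decide whether an image reference pins a version.
# is_pinned is a hypothetical helper; the image names below are examples.

is_pinned() {
    image="$1"
    # Keep only the part after the last slash, so a registry port
    # such as localhost:5000/ is not mistaken for a tag.
    name="${image##*/}"
    case "$name" in
        *:latest) return 1 ;;  # explicit latest moves with every release
        *:*)      return 0 ;;  # any other tag (or a digest) counts as pinned
        *)        return 1 ;;  # no tag at all means latest is implied
    esac
}

for image in \
    "ghcr.io/home-assistant/amd64-base:3.15" \
    "ghcr.io/home-assistant/amd64-base:latest" \
    "ghcr.io/home-assistant/amd64-base"
do
    if is_pinned "$image"; then
        echo "pinned:     $image"
    else
        echo "NOT pinned: $image"
    fi
done
```

Only the first reference in the loop is reported as pinned; the other two would have silently moved from alpine 3.14 to 3.15 in the situation described above.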

CentralCommand · May 16, 2022, 11:57pm

For your addon, I think you found a bug actually. I noticed with your addon that when protection mode is on I get this as output

The addon obviously isn’t working like this but it is trying to. It’s actually running your cont-init script which is then exiting due to a validation error. But then when I turn off protection mode I get this:
[Screenshot: add-on output with protection mode turned off]

Which suggests to me that init: false is not working correctly when protection mode is disabled. Can you make an issue here about this?

EDIT: As an aside, this is a scary add-on.

“As such, I have requested all possible permissions.”

No kidding… Actually forgot an addon could request to turn off apparmor, haven’t seen that used before.

lmm7425 · May 17, 2022, 2:40am

@CentralCommand I appreciate you reaching out to that zigbee2mqtt repo and putting in the PR. I added a comment to the other issue confirming your findings.

Thanks for the feedback. I had started it with protection mode enabled and saw that output, but didn’t think that error pointed to a bug. I just thought it was still broken, but in a different way. 🤦‍♂

I opened an issue as you said.

“EDIT: As an aside, this is a scary add-on.”

Ya, tell me about it. I don’t love it.

It’s similar to glances in that it requires access to hardware data (CPU, memory, etc…) about the host it’s running on. I’ve tried leaving protection mode enabled, but I’m not getting all the metrics I need. I could probably investigate dropping some API access (e.g., hassio_api, homeassistant_api, auth_api) in order to get my rating up, but protection mode pretty much needs to be disabled.

Side note: If you could integrate node metrics into the official Prometheus integration, that would be great and would negate the need for my add-on completely.

CentralCommand · May 17, 2022, 4:56pm

FYI I replied on your github issue. Figured out that it’s really an issue with host_pid: true, which only works when protection mode is off. There’s actually not a good solution to this, though I put what I have in the comment and added a PR about it.

I mentioned keeping an eye on the glances addon in the comment and you should do that. But to be honest, I think the only solution here is going to be to not use S6 in images that need this option. I didn’t realize it since I’ve never needed it, but that option is pretty dangerous. Since you forked this addon I’m not sure you’re aware, but this bit of code that was copied from the Glances addon is actually critical:

github.com/loganmarchione/hassos-addons/blob/28fbd78642d9b7daae82b8e8fd60cf93249e3391/prometheus_node_exporter/rootfs/bin/s6-nuke

#!/usr/bin/env bash
# ==============================================================================
# Home Assistant Community Add-on: Glances
# This file turns s6-nuke into a NOOP to prevent total termination
# of the host system since the add-on runs in the same PID namespace.
# ==============================================================================
echo "S6-NUKE: NOOP"
exit 0

When running with host_pid: true, if you don’t have this code then stopping the addon literally shuts off your machine, because S6 kills the process with PID 1 and every child process like it normally does, except this time it’s doing that to the host OS, not the docker container.

It’s possible we’ll come up with a way to use S6 here but it clearly was not designed for this.

As for the Prometheus integration, you would have to bring it up with the code owner for that integration; I’m not really familiar with it. Although to be honest, this whole host_pid requirement makes me think there are long odds of that happening.

lmm7425 · May 17, 2022, 6:09pm

Thanks for the reply! I’m going to take your step 1+2 approach as outlined in the issue.

In the long run, I’m probably going to copy whatever frenck does with glances. Figured he’s more qualified than me.

“When running with host_pid: true, if you don’t have this code then stopping the addon literally shuts off your machine, because S6 kills the process with PID 1 and every child process like it normally does, except this time it’s doing that to the host OS, not the docker container.”

Yes, I noticed that the hard way, haha. Then I found this issue, realized NOOP was the key, and that’s where I copied the code from.