Kubernetes vs. Supervisor

zaneclaes · April 6, 2020, 6:41pm

Per Franck’s request, I am moving the GitHub discussion here. Before I begin, let me be very clear: I love HA, and I do not mean to criticize. But engineering discussions must be blunt to be effective, and I am simply trying to lay out a clear case because of the confusion in the other thread. Note that I know Supervisor is more than container orchestration; I’ll address that below.

I’ll start with why I would prefer to use K(3|8)s over Supervisor:

Supervisor is basically a black box. Kubernetes has a whole ecosystem of observability and maintenance tools. It is easy to see where a container is scheduled, the resources it is consuming, its logs, its failure states, etc. Supervisor offers only a very limited subset of those features.
When using Supervisor, all containers are scheduled on the same machine. This will not scale past a certain point. My home is already too large of a deployment for Supervisor to be an option.
Supervisor re-invents the wheel. It approaches things like multicasting/DNS resolution in a unique way, incompatible with the well-defined solutions that exist elsewhere.
A home-made container orchestrator seems like it’s just asking for trouble (see: “The Impact of Container Orchestration on Isolation”). Has Supervisor ever been subjected to a pentest?
The future of Docker is uncertain. Kubernetes is moving away from the Docker runtime and supports many other runtimes through the CRI, which makes it much more forward-compatible.
Extensibility. Once you’ve chosen a container orchestrator, you’re basically forced to stick with it. I don’t want to port everything to Supervisor’s home-grown syntax.

The fact that the dev team has to keep explaining to (presumably intelligent) users that “add-ons are not just Docker containers” is indicative of a code smell. It seems like the current Supervisor API design pattern breaks the “Self Containment” design principle (one of the 7 principles of container-based application design, stating that a container may not rely upon the presence of anything outside the container), because add-on component containers are coupled to the presence of the Supervisor API. Generally, containers which need to interact use a pull model instead of a push model (e.g., Prometheus). Instead of add-ons broadcasting to Supervisor, Supervisor would use standard pod labels to discover the resources which are of interest. Thus, a container does not need to be designed for HA: HA can configure itself to support the containers, and the tight coupling is broken.
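To make the pull model concrete, here is a minimal sketch in plain Python, with no Kubernetes client involved. The `Pod` record, the `select_pods` helper and the `home-automation/role` label are all hypothetical names chosen for illustration; the only thing borrowed from Kubernetes is the idea of an equality-based label selector.

```python
from dataclasses import dataclass, field


@dataclass
class Pod:
    """Hypothetical, simplified stand-in for a pod record."""
    name: str
    labels: dict = field(default_factory=dict)


def select_pods(pods, selector):
    """Return pods whose labels contain every key/value pair in `selector`."""
    return [
        pod for pod in pods
        if all(pod.labels.get(key) == value for key, value in selector.items())
    ]


pods = [
    Pod("mosquitto", {"home-automation/role": "add-on"}),
    Pod("node-red", {"home-automation/role": "add-on"}),
    Pod("postgres", {"app": "database"}),
]

# The consumer pulls: it asks for everything carrying the label it cares about.
# The containers themselves never call back into the consumer.
addons = select_pods(pods, {"home-automation/role": "add-on"})
print([pod.name for pod in addons])
```

The point of the sketch is that the labelled containers need no knowledge of whoever is discovering them, which is the decoupling argued for above.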

Now, the statement from Franck has been that those who wish to use Kubernetes may simply do so. While technically true (I do!), I worry about the implications for the future of HA. The reason can be found in the statement that “Supervisor is more than a container orchestrator.” For that to be true, those of us not using Supervisor must be missing features (and we are: the community of add-ons which integrate nicely with HA). Continuing to invest in this design pattern feels like a canonical example of the Sunk Cost Fallacy.

My belief is that Supervisor could be made into a Service, ripping out the orchestration components. While this would certainly incur development effort, it would ultimately limit the scope of HA to doing what HA does best, and actually free up development for those key features. Put differently… why is the team spending time building & maintaining a container orchestrator, when the best that can possibly be hoped for is a very poor substitute for Kubernetes? Moreover, this means that add-on developers must also spend their time building one-off implementations compatible only with HA. Think about how much of everybody’s time it would have saved if the HA add-on system simply installed an already-designed, generic container!

More importantly, think how much time will be saved in the future if the “add-ons” could easily pull from the entire Kubernetes (helm & operator) ecosystem!

Edit: Here’s an example of what I mean. The topic which led me down this wormhole was trying to figure out how to build an add-on that would be exposed to Prometheus. In a Kubernetes world, this is trivial. I’d just label my pod and call it a day. But in a Supervisor world, I need to either rebuild or reconfigure Prometheus to somehow use the Supervisor API for discovery of the add-on. Thus, the Supervisor API design pattern requires changes to both services (container add-ons) instead of just the one which I am responsible for.
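As a rough illustration of why only one side has to change, here is a small Python sketch of how a scraper could turn pod metadata into scrape targets. It assumes the `prometheus.io/scrape` and `prometheus.io/port` annotations, which are a widely used convention in Prometheus relabeling configs rather than anything Kubernetes or Prometheus enforces. The pod data and the `scrape_targets` helper are made up for the example.

```python
def scrape_targets(pods):
    """Build host:port targets for pods that opt in to scraping.

    Assumes the conventional (not mandatory) prometheus.io/* annotations.
    """
    targets = []
    for pod in pods:
        annotations = pod.get("annotations", {})
        if annotations.get("prometheus.io/scrape") != "true":
            continue
        port = annotations.get("prometheus.io/port")
        if port is None:
            continue  # No port declared, so there is nothing to scrape.
        targets.append(f"{pod['ip']}:{port}")
    return targets


# Hypothetical pods: only the first one opts in.
pods = [
    {
        "name": "my-addon",
        "ip": "10.42.0.12",
        "annotations": {
            "prometheus.io/scrape": "true",
            "prometheus.io/port": "9100",
        },
    },
    {"name": "other", "ip": "10.42.0.13", "annotations": {}},
]

print(scrape_targets(pods))
```

The add-on author only declares metadata on their own pod; the scraper's discovery logic stays generic and untouched.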

… but I understand that the team has decided. And I respect the fact that Supervisor is much more complicated than my 1 day of research may indicate. I just worry that this will be an increasing problem as time goes on, and I want to make sure that the other side of the case is as clear as possible.

Mutt · April 6, 2020, 8:23pm

A very well reasoned case with some excellent points.
You will forgive me though if I do a lot more research before jumping ship.
It may help if you write up your installation method.
Does it apply to Pis, NUCs, etc.?

You can see why the generic install is so favoured: write an image to a card, plug it in, and off you go. (And yet we get soooo many questions/issues from newbies.)

So that will/should be maintained, but this ‘could’ be an alternate upgrade path.

Does it cost anything?

zaneclaes · April 6, 2020, 8:41pm

Thanks for the reply, @Mutt.

First, let me say that I am not saying that Supervisor should be removed. The goal would be for the “generic case” to still work exactly the same. The only difference would be to remove the container orchestration from Supervisor and instead delegate the work to Kubernetes. But this would be an implementation detail, hidden from the users.

Deploying Kubernetes is easy these days. For anybody reading this who hasn’t tried in the last ~6 months, I recommend looking again. I have a battle-hardened Ansible repository I could adapt to this purpose. It’s been tested on (Raspberry Pi 3+4, Atomic Pi, and generic Linux servers) with (Raspbian & Ubuntu) and (Jessie & Buster). In fact, I’m running all of those different combinations right now in 3 different deployments around the world, each with an HA instance. And there are lots of other good resources out there as well, like k3s. Assuming you have Docker already installed, you only need about 3 more commands to install Kubernetes: (1) install kubectl/kubeadm, (2) initialize the master node, (3) install a networking plugin, like Flannel. This will give you a working single-node master, and could be incorporated into the generic install script. Personally, I do exactly that, and then I run HA-Core via the Helm repository. My suggestion would be to simply add Supervisor as a side-car to the official Helm repository.

I would need to chat with folks about the Supervisor upgrade path to try to make it as seamless as possible. I roughly sketched it in the GitHub issue. TBH, the hardest part would be backwards compatibility, from what I can see. Continuing to support the Supervisor API & config.json would require a translation layer between Supervisor and Kubernetes. And I would need to make sure I understood all of the features of Supervisor to ensure there were no gotchas.
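To give a feel for what such a translation layer might involve, here is a hedged Python sketch that maps a heavily simplified add-on config onto a Deployment-shaped manifest. The field subset (`slug`, `version`, `image`, `ports`) and the `addon_to_deployment` helper are assumptions for illustration only. A real add-on config.json carries many more options (host mappings, privileges, schema validation and so on), and real slugs may need sanitizing before they are valid Kubernetes names.

```python
import json


def addon_to_deployment(addon):
    """Translate a simplified add-on config into a Deployment-style manifest."""
    labels = {"app": addon["slug"]}
    ports = []
    for container_port in addon.get("ports", {}):
        # Keys look like "8080/tcp"; the protocol part is optional here.
        number, _, protocol = container_port.partition("/")
        ports.append({
            "containerPort": int(number),
            "protocol": (protocol or "tcp").upper(),
        })
    return {
        "apiVersion": "apps/v1",
        "kind": "Deployment",
        "metadata": {"name": addon["slug"], "labels": labels},
        "spec": {
            "replicas": 1,
            "selector": {"matchLabels": labels},
            "template": {
                "metadata": {"labels": labels},
                "spec": {
                    "containers": [{
                        "name": addon["slug"],
                        "image": f"{addon['image']}:{addon['version']}",
                        "ports": ports,
                    }],
                },
            },
        },
    }


# Hypothetical add-on description, not taken from any real add-on.
example_addon = {
    "name": "Example add-on",
    "slug": "example-addon",
    "version": "1.0.0",
    "image": "example/addon",
    "ports": {"8080/tcp": 8080},
}

manifest = addon_to_deployment(example_addon)
print(json.dumps(manifest, indent=2))
```

Everything Supervisor-specific that has no obvious Kubernetes equivalent is exactly where the "gotchas" mentioned above would show up.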

There are some great minds in the GitHub thread, too, so I don’t want to get too prescriptive about the implementation details of changing Supervisor over, as it might lead to a straw-man argument.

frits1980 · April 6, 2020, 9:41pm

Didn’t you just answer your own questions with this?

I do agree on most of your points. But I also think going for Kubernetes is a risk. It’s a dependency that HA might not benefit from in the future. Since you say the future of Docker is uncertain, and since the same company is going to be the manager of that future, I think Kubernetes might just as well have the same uncertainty. But maybe I’m seeing it wrong.

zaneclaes · April 6, 2020, 9:48pm

In a way. I acknowledge that the upgrade path would be a good amount of work. A bit daunting.

Will the long term maintenance of a hand-rolled orchestrator outweigh the cost of switching?

FWIW, I mostly wrote this post to try to clearly articulate the argument from those people on GitHub who felt this way. I’m not a gung-ho Kubernetes zealot here, although I do generally agree with the sentiment enough to have written this. AFAIK, Kubernetes is open source and pretty much community-maintained at this point, not a revenue source or tied to Google’s biz model. Not 100% sure though.

To truly support this idea myself, I’d need to see others from GitHub chime in and want to contribute brain-power toward its development and maintenance. And, of course, approval from the core devs. As it stands, I’m treating this thread as a philosophical conversation and spirited technical discussion.

frits1980 · April 6, 2020, 9:57pm

Fair enough and always good to start discussions and brainstorm about possibilities. To be fair I use docker in the simplest way. And haven’t got a clue of what kubernetes does or can do. And I use supervisor. But like you said the end user wouldn’t know what happens behind the curtains.

Most of the time I’m for using open source big proven software over in house closed source. Just because it’s easier to maintain in case of rapid resource upscaling or when changing supplier. But I also know that there are times open source can bite you in the fingers.

pvizeli · April 6, 2020, 10:39pm

I can’t understand the problem. If you want to run k8s because you have the hardware and the knowledge, there is a Helm chart that includes Home Assistant Core. You can pick that or get it running yourself. And that’s fine.

It could be that we need to switch to runc in the future. More likely, Docker will be bought by Microsoft.

No one said you need to use the Supervisor.

I like infrastructure as code and run some k8s clusters myself with Rancher or AKS. But you can’t do the same here, because the Supervisor manages the host hardware too; the two have a completely different focus. Feel free to invest your time in making a PR.

Having set up some bare-metal k8s clusters, I can say the hardware requirements are big. Yes, you can do it all on a single device, but that is for dev systems and is not recommended for a production cluster. With a load balancer, you need 4 devices: 2 for the load balancer (I always used OPNsense with HAProxy in a failover cluster) and 2 hypervisors to spin up the nodes plus controller. Do you want to give such a system to your parents, or to other people in their homes?

pvizeli · April 6, 2020, 11:07pm

Well, let’s start simple:

What would your k8s cluster look like if it needs to run on:
amd64, i386, aarch64, armv7, armv6

Which network plugin would you recommend that officially supports these architectures?

zaneclaes · April 6, 2020, 11:55pm

Sorry it wasn’t clear. I tried to address this point in my post, but basically, Supervisor also forks the end-user experience. If it weren’t for that basic fact, I’d be fine ignoring Supervisor altogether. This is the root cause of all the naming confusion, etc. that is seen.

I already did. 3 months ago. Set my dad up with his first RPi for Christmas and he installed K8s.

You’re right: several of those architectures will not work on Kubernetes.

But, I have to wonder: are there really so many i386 & armv6 devices that benefit from Supervisor’s container orchestration? Why target the worst hardware with the most sophisticated features? Why isn’t it that use-case which is limited to HA-Core? This would be much easier to explain to users. And as these disappear, so will the forked user experience.

I’m thinking of devices like the Zero W right now. Probably I’m being ignorant and there’s a bunch of i386/armv6 hardware in-use that needs to be supported. But supporting a Zero W doesn’t seem like a good reason to run your own container orchestration to me.

subzero79 · April 6, 2020, 11:57pm

Can you please point me to these repositories?
I always found the k8s deployment tutorials to differ quite a lot from one another. Sometimes reading Ansible makes it much easier.

pvizeli · April 7, 2020, 6:56am

Yeah, our Supervisor is more IoT.

I think I can say that I have been a k8s user from the very first hour. However, it was designed to run IaC or SaaS on top of a PaaS/IaaS. It was never designed to run on just 1 device. I know some people did that because, as a developer, you can’t run a full k8s on your development instance. It is like my Supervised installer: it was designed for developers, and other people now use it to run on top of Linux. Sure, it works, but I don’t recommend it. It is the same with k8s: they highly recommend a full-featured environment of multiple devices, which maybe 1% of the world can run at home. If you run k8s in production on 1 device, you did not understand the design of k8s. It starts with the requirement for etcd, which is not designed for IoT; it works with such limited resources, but it eats up resources that you could use for other stuff…

Yes, there are projects like minikube or k3s which are smaller but, like I said before, they are for development or small nodes. Nodes != 1 device.

Respect to your dad. It is hard to learn k8s and keep your own system up to date. However, my dad can’t learn k8s, and I want to keep the system as simple as possible for everybody.

zaneclaes · April 7, 2020, 1:44pm

This was true at one point, but not any more. If you’ve SSHd into an Ubuntu machine any time in the past year, it literally advertises that you should use microk8s to do exactly this. The login prompt itself says:

 * Kubernetes 1.18 GA is now available! See https://microk8s.io for docs or
   install it with:

     sudo snap install microk8s --channel=1.18 --classic

It says this to everybody who logs in to Ubuntu, which is perhaps the most ubiquitous Linux end-user distro out there. Why would they be broadcasting via MOTD if not to target general users?

Have you actually benchmarked this? The docker engine is the least performant of any CRI with over 3x the time to start a container compared to containerd alone. Quote from slide #40:

cri-o & containerD both perform better than docker

Is kubelet as much overhead as the docker engine? I can’t find benchmarks, but I doubt it.

Again, outdated information. Here’s a quote from the official Ubuntu wiki:

And from the microk8s website:

(emphasis mine).

Karsten · April 16, 2020, 8:42am

I am in favor of running k8s in some form. microk8s seems like a good choice and runs on 42 flavours of Linux.

Using k8s gives you the ability to run a functions framework, which would be very nice to have in home automation, along with all the other k8s apps out there.

There might be some resource consumption problems, but you can “just” add another Pi and let k8s handle the workload scheduling. microk8s also makes clustering very easy with microk8s add-node.

The added benefit is that you do not need to migrate from Pi -> NUC / desktop / server. If you are out of resources, you can just add another Pi and reschedule the containers.
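A toy Python sketch of the "just add another Pi" idea follows. This is not how the Kubernetes scheduler actually works; it is a deliberately naive placement loop, and the workload names, memory figures and `schedule` helper are all invented for illustration.

```python
def schedule(containers, nodes):
    """Place each container on the fitting node with the most free memory.

    containers: dict of name -> memory request in MiB
    nodes: dict of name -> free memory in MiB
    Returns (placements, unschedulable).
    """
    free = dict(nodes)
    placements = {}
    unschedulable = []
    # Place the largest requests first so small ones do not crowd them out.
    for name, request in sorted(containers.items(), key=lambda kv: -kv[1]):
        candidates = [node for node, mem in free.items() if mem >= request]
        if not candidates:
            unschedulable.append(name)
            continue
        target = max(candidates, key=lambda node: free[node])
        free[target] -= request
        placements[name] = target
    return placements, unschedulable


# Hypothetical memory requests in MiB.
workloads = {"home-assistant": 600, "mqtt": 100, "grafana": 300, "influxdb": 500}

# One Pi with roughly 900 MiB free cannot hold everything.
print(schedule(workloads, {"pi-1": 900}))

# Adding a second Pi lets the same workloads fit without moving to new hardware.
print(schedule(workloads, {"pi-1": 900, "pi-2": 900}))
```

With one node, some workloads are left unschedulable; with two, everything is placed, and nothing about the workloads themselves had to change.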

dailyherold · May 13, 2020, 4:11am

Great thread to read. I’m pretty on board with the idea of K8s as the orchestrator, because that’s what it’s made to do. I’ve actually been planning a k3s or microK8s SBC cluster to use for a lot of homelab stuff, since I run K8s at work. I was going to run Prometheus as well for metrics, something for log aggregation, etc., so I find your reason for thinking about all this quite aligned with mine, @zaneclaes. I also found this thread after reading the decommission supervisor blog post (and forum discussion), and would be interested in helping out.

With that said, I do think the overall experience would be a major initial jump in operational complexity for the non-work-k8s-users mainly using docker-compose, if not using the OS image. We could make the install process automated, and scripted for k3s/microk8s with a chapter on multi-node cluster for those interested. Beyond that having k8s knowledge would be kind of a prerequisite in my mind. We all want to debug, tinker, break things, put back together, and with k8s as the orchestrator we’ll need people to dive in a bit with k8s resources and operational usage. Benefit would be that there is a world of k8s resources/advice versus the custom supervisor specific to just this community and the subset who run it in prod.

Would be interesting to discuss a homeassistant controller or operator as well, given some of the things that would probably need to happen within the cluster to maintain core plus normal docker/whatever CRI image addons (essentially the custom templating work supervisor does). So much flexibility, to your point, if addons could be normal containers.

catchmonster · May 14, 2020, 11:45pm

Nope, you are actually right. Both Kubernetes and Docker are going their own paths, and yes, HA will need to adjust, sometimes for no gain.
I don’t like the idea, and I am for a home-grown recipe. That said, it has got to support Debian and the Unix philosophy.
I personally don’t like the idea of a system relying on OS containers for these types of apps. More like HA using Docker or Kubernetes for plugins or something like that.
I would prefer this system to be as close to the kernel as possible.

mtomlin96 · May 28, 2020, 1:49am

The easiest one out there now is https://k3sup.dev

ofirm93 · October 7, 2020, 8:47am

I think that there are good points on both sides, yet whether or not it is a good idea, it is worth at least trying, even as a separate project unrelated to the main Hass project.
Is anyone trying to make that happen? If so, please let me know, as I would like to contribute.

PPCM · January 15, 2021, 8:34pm

It is a pleasure to read this thread.

The technical arguments for k8s are right for me. As a developer, I think a proprietary solution evolves less well than a “standardized” solution. I prefer to focus my energy on added value rather than reinventing the wheel, even if it is a little more complicated at the beginning.
But I would add one argument: usability.

I deployed a cluster of Raspberry Pis with k3s, Rancher and Longhorn (try the package, it is great). I installed lots of stuff on it for development, but for my own use too, like a media center, a mail server, file sharing, …
I tried HA on it, and indeed, add-ons are missing.

I didn’t want to deploy another cluster specifically for HA (it’s time-consuming and it costs money), so I use a simple Raspberry Pi with a classic installation (without Docker).

So the question should be: who needs a specific cluster installation for a home automation application?
If it can’t be integrated into your existing cluster, use a standalone device.

However, I really like HA. Guys, you did a great job, and I’d really love to integrate it into my k3s cluster.

swiftlyfalling · January 15, 2021, 8:58pm

At least in spirit, I agree with this. I’ve never run any of the “supervised” flavors of Home Assistant for this very reason. Sure, I “miss out” on add-ons, but it’s worth it. k3s is SO easy these days.

And yeah, any path the developers take could result in a dead-end road in the future. Docker is dying, despite that being the current chosen path. But, at the very least, k3s does almost everything Home Assistant needs a “supervisor” to do, and it does so in an open, standards-compliant way.

My only uncertainty is whether, now that Supervisor is developed and presumably does what it needs to do, all the effort to change over to a k8s-based architecture is worth it. There are a lot of pieces to the puzzle that have to be just right to land on a stable, easy-to-use platform. Anyone who is capable of making the conversion to k8s likely doesn’t really NEED Supervised Home Assistant or Home Assistant Add Ons anyway, as they are skilled enough to spin it up the “old fashioned” way (as both you and I have done). So the real benefits are to the maintainer of the Supervisor, as well as to the community of potential Add On developers, since they’d have a standard platform to work with instead of Supervisor.

hazio · January 23, 2021, 10:08pm

It would be nice if HA supported a containerd (k3s) install without Docker.