Per Franck’s request, I am moving the GitHub discussion here. Before I begin, let me be very clear: I love HA, and I do not mean to criticize. But engineering discussions must be blunt to be effective, and I am simply trying to lay out a clear case, given the confusion in the other thread. I know that Supervisor is more than container orchestration; I’ll address that below.
I’ll start with why I would prefer to use Kubernetes (K3s or K8s) over Supervisor:
Supervisor is basically a black box. Kubernetes has a whole ecosystem of observability and maintenance tools: it is easy to see where a container is scheduled, the resources it is consuming, its logs, its failure states, etc. In short, Supervisor offers only a very limited subset of what Kubernetes provides.
When using Supervisor, all containers are scheduled on the same machine. This will not scale past a certain point, and my home is already too large a deployment for Supervisor to be an option.
Supervisor reinvents the wheel. It handles things like multicast and DNS resolution in its own way, which is incompatible with the well-established solutions that exist elsewhere.
A home-grown container orchestrator seems like it’s just asking for trouble (see “The Impact of Container Orchestration on Isolation”). Has Supervisor ever been subjected to a penetration test?
The future of Docker is uncertain. Kubernetes has dropped built-in support for the Docker runtime and works with any runtime that implements the Container Runtime Interface (CRI), such as containerd or CRI-O, which makes it much more forward-compatible.
Extensibility. Once you’ve chosen a container orchestrator, you’re basically locked into it. I don’t want to port everything to Supervisor’s home-grown configuration syntax.
The fact that the dev team has to keep explaining to (presumably intelligent) users that “add-ons are not just Docker containers” is indicative of a code smell. The current Supervisor API design pattern seems to break the “Self-Containment” principle (one of the seven principles of container-based application design, which states that a container should not rely on anything outside itself other than the Linux kernel), because add-on containers are coupled to the presence of the Supervisor API. Generally, containers that need to interact use a pull model instead of a push model (e.g., Prometheus scrapes its targets rather than having them push metrics to it). Instead of add-ons announcing themselves to Supervisor, Supervisor would use standard pod labels to discover the resources it is interested in. A container would then not need to be designed for HA; HA could configure itself to support the container, and the tight coupling would be broken.
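To make the pull model concrete, here is a minimal Python sketch. It is not Supervisor or Kubernetes code, just an illustration under stated assumptions: the consumer (HA, or Prometheus) selects the pods it cares about with an equality-based label selector, and the pods carry only labels, never calling into the consumer. The `Pod` type, `discover()` and the `example.org/ha-integrate` label key are all hypothetical. In a real cluster the pod list would come from the Kubernetes API, and Prometheus does the equivalent with `kubernetes_sd_configs` plus a `keep` relabel rule on `__meta_kubernetes_pod_label_<labelname>`.

```python
from dataclasses import dataclass, field


@dataclass
class Pod:
    """Hypothetical stand-in for a pod as returned by a cluster API."""
    name: str
    labels: dict[str, str] = field(default_factory=dict)


def matches(selector: dict[str, str], labels: dict[str, str]) -> bool:
    """Equality-based selector: every key/value pair in the selector must match."""
    return all(labels.get(key) == value for key, value in selector.items())


def discover(pods: list[Pod], selector: dict[str, str]) -> list[str]:
    """Pull model: the consumer picks the pods it cares about by label.

    The pods carry only labels; they never call into the consumer.
    """
    return sorted(pod.name for pod in pods if matches(selector, pod.labels))


if __name__ == "__main__":
    # Hypothetical label key; nothing in HA or Kubernetes defines it.
    HA_LABEL = {"example.org/ha-integrate": "true"}
    pods = [
        Pod("mqtt-broker", {"app": "mosquitto", **HA_LABEL}),
        Pod("zigbee-bridge", {"app": "zigbee2mqtt", **HA_LABEL}),
        Pod("grafana", {"app": "grafana"}),
    ]
    print(discover(pods, HA_LABEL))  # → ['mqtt-broker', 'zigbee-bridge']
```

Integrating another container then only means adding the label to it; neither the consumer's code nor the container itself has to know about the other side.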
Now, Franck’s position has been that those who wish to use Kubernetes may simply do so. While technically true (I do!), I worry about the implications for the future of HA. The reason lies in the statement that “Supervisor is more than a container orchestrator.” If that is true, then those of us not using Supervisor must be missing features (and we are: the community of add-ons that integrate nicely with HA). Continuing to invest in this design pattern feels like a textbook example of the sunk cost fallacy.
My belief is that Supervisor could be made into a service, with the orchestration components ripped out. While this would certainly take development effort, it would ultimately limit the scope of HA to what HA does best and free up development time for those key features. Put differently: why is the team spending time building and maintaining a container orchestrator, when the best that can possibly be hoped for is a poor substitute for Kubernetes? Moreover, add-on developers must also spend their time building one-off implementations that only work with HA. Think about how much of everybody’s time would have been saved if the HA add-on system simply installed an existing, generic container!
More importantly, think how much time will be saved in the future if “add-ons” could easily pull from the entire Kubernetes ecosystem (Helm charts and operators)!
Edit: Here’s an example of what I mean. The topic that led me down this rabbit hole was figuring out how to build an add-on that Prometheus could scrape. In a Kubernetes world, this is trivial: I’d just label my pod and call it a day. But in a Supervisor world, I need to either rebuild or reconfigure Prometheus to somehow use the Supervisor API to discover the add-on. The Supervisor API design pattern therefore requires changes to both services (both add-on containers), instead of just the one I am responsible for.
… but I understand that the team has decided. And I respect the fact that Supervisor is much more complicated than my one day of research may indicate. I just worry that this will become an increasing problem as time goes on, and I want to make sure the other side of the case is as clear as possible.