On the other hand, the end state is often: Dear friend, you have built a LAMP server, using k8s (and Helm, and operators, and cert-manager, and Prometheus, and…).
I hope that this doesn’t come across as too snarky but I’ve just suffered the pain of getting a single-node Ansible AWX server running. Their only supported architecture is k8s and their operator. It took days! I remember back in the old days we used to say “simple should be easy and complicated should be possible.” Now it’s “everyone needs Google scale so that’s all we’re going to do.”
As an example of doing it the right way, the Immich photo management app is comparable to AWX in run-time complexity but it comes with a Compose file that makes it trivial to run and manage on any machine with docker already installed.
FWIW you don’t need Helm for Kubernetes, and you’re going to want something like cert-manager regardless (and if you don’t want cert-manager, just don’t do anything—I wish k8s would include it by default, but it doesn’t).
Similarly, there’s a lot of middle ground between single node and Google scale, and you should have at least one replica running in production in most cases, which means you’re going to need some sort of load balancer and some deployment automation. At this point you may as well start considering Kubernetes.
Not only does Kubernetes provide load balancing and deployment automation, but it also allows you to easily run cloudnativepg, an operator that makes it easy to run a Postgres cluster with replication, backups, and failover. Maybe you don’t need that for your production application, but if you do, it’s a lot easier to run k8s and that operator than it is to use other tools to manage Postgres IMHO.
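To give a sense of what that buys you, a minimal Cluster resource for that operator looks roughly like this (the name and storage size are made up, and this assumes the operator itself is already installed):

    apiVersion: postgresql.cnpg.io/v1
    kind: Cluster
    metadata:
      name: pg-main
    spec:
      instances: 3        # one primary plus two streaming replicas
      storage:
        size: 20Gi

From that one description the operator handles bootstrapping the primary, wiring up the replicas, and failing over automatically; backups are an additional spec.backup section pointing at object storage.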
which means you’re going to need some sort of load balancer and some deployment automation. At this point you may as well start considering Kubernetes.
That’s an astonishing jump. For a handful of machines, a few scripts with ssh and some pretty standard networking tricks would be what I think of long before k8s.
I’m not sure exactly what you’re thinking with your “few scripts and networking tricks”, but it sounds like you’re trying to refute the need for a load balancer by arguing that you can implement your own load balancer.
Moreover, while any given feature Kubernetes provides out of the box could be implemented easily enough by a sufficiently competent, experienced engineer, the point of the article is that the total effort is much greater, and you still end up with a platform for which no documentation, training, or experienced hiring pool exists. Your expensive, competent, experienced professional has spent his time building Kubernetes instead of doing more valuable tasks (or, more likely, your professional only thinks he’s sufficiently competent and experienced, and you wind up with a very broken Kubernetes that causes endless frustration).
it sounds like you’re trying to refute the need for a load balancer by arguing that you can implement your own load balancer.
If you have two machines you don’t need a load balancer. You just need a way to fail over to the second machine. That’s just network configuration, which you have to do anyway, Kubernetes or no Kubernetes.
That’s the problem with this argument. You still have to deploy machines, configure operating systems and networking, install services (whether your services or Kubernetes’s services), etc. unless you’re using a 3rd party managed Kubernetes, which renders the comparison irrelevant anyway. So you can expose Kubernetes as a uniform target to a bunch of teams, but you still have a lot of stuff that is local deployment. If you don’t have a platform team providing a uniform target to lots of other teams, and you’re already doing all this configuration anyway, why do the configuration of Kubernetes as well?
If you have two machines you don’t need a load balancer. You just need a way to fail over to the second machine. That’s just network configuration, which you have to do anyway, Kubernetes or no Kubernetes.
Kubernetes generally takes care of the failover for you. You don’t really need to think about it.
That’s the problem with this argument. You still have to deploy machines, configure operating systems and networking, install services (whether your services or Kubernetes’s services), etc.
There’s much less operating system configuration if you’re using Kubernetes than if you’re rolling your own hosts, and the networking setup is “install MetalLB”. It’s extremely minimal. Similarly, certificate management is “install cert-manager”. DNS is “install external-dns”. Similarly there are standard, high-quality packages for logging, monitoring, alerting, volume management, etc. While I’m sure there are Kubernetes distros that include this stuff out of the box, it’s still less work than doing it yourself and you get the benefits of standardization: abundant, high quality documentation, training resources, and an experienced hiring pool. And moreover, as your requirements scale, you can incrementally opt into additional automation (Kubernetes is a platform for infrastructure automation).
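To make “install MetalLB” concrete: on a small bare-metal cluster, the only configuration it typically needs after installation is an address pool and an L2 advertisement, something like the sketch below (the address range is obviously site-specific):

    apiVersion: metallb.io/v1beta1
    kind: IPAddressPool
    metadata:
      name: default-pool
      namespace: metallb-system
    spec:
      addresses:
        - 192.168.1.240-192.168.1.250
    ---
    apiVersion: metallb.io/v1beta1
    kind: L2Advertisement
    metadata:
      name: default
      namespace: metallb-system
    spec:
      ipAddressPools:
        - default-pool

With that in place, any Service of type LoadBalancer gets an address from the pool.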
unless you’re using a 3rd party managed Kubernetes, which renders the comparison irrelevant anyway.
I’m not sure what you mean here. Why would managed Kubernetes render the comparison obsolete?
So you can expose Kubernetes as a uniform target to a bunch of teams, but you still have a lot of stuff that is local deployment. If you don’t have a platform team providing a uniform target to lots of other teams, and you’re already doing all this configuration anyway, why do the configuration of Kubernetes as well?
I’m not following you here. What do you mean by “uniform target” vs “local deployment”?
No, it’s not. You still have to set up the underlying network that the physical machines are on. That’s the key point. You can install all this stuff within Kubernetes…but it’s all dependent on nodes that are fully configured outside of Kubernetes.
I see what you mean. Yes, you need to have nodes that are on a network, but “fully configured” for running Kubernetes basically means “SSH is configured”, which is dramatically easier than any configuration I’ve seen for running production-grade applications directly on nodes (no need to haul out ansible or anything like it).
Kubernetes generally takes care of the failover for you. You don’t really need to think about it.
By design, sure. In practice though, the biggest failures I’ve had with kubernetes have been nginx ingress pods just randomly losing their ability to connect to upstream and silently causing downtime that wouldn’t have happened if the app was exposed directly, using an AWS ALB, etc. I’m not saying this is everyone’s experience. It might be due to noob misconfiguration mistakes. But that’s sorta the point the article misses. Kubernetes might offer repeatable solutions for these problems (and I appreciate it for that) but it really isn’t a solution on its own. Every team running a kubernetes cluster has their own collection of weird scripts with bespoke workarounds for Kubernetes quirks.
I sympathize with random nginx Ingress failures leading to widespread 503 outages.
Not sure if this is welcome here, but try https://semaphoreui.com/ instead.
Thank you! In this case I was told to stand up an AWX instance, and I didn’t know what I didn’t know, but I’ll take a look at your suggestion. If there’s a “next time” then I’ll be able to advocate for an alternative.
OMG this. At work I tried to fix a broken internal app that I had no role in getting set up. Everyone who had worked on it had departed. So began the slog of figuring out how it worked, where it was deployed, and getting access. Oh cool, this runs on the “internal dev platform”. What is that? Oh, it’s a Kubernetes cluster. Neat that they tried to think ahead and spread some infra costs across projects. I wonder what else is running on this thing… nothing. It is a 2 node kubernetes deployment that runs one application pod. Not per node. Total.
It is a 2 node kubernetes deployment that runs one application pod. Not per node. Total.
I was working in an Oracle shop where devs would spin up an Oracle instance for a web application that had 4 data items. Not databases. Not tables. Items.
My production app ran completely without databases and was several orders of magnitude faster, smaller, and more reliable than its predecessor.
But think. How else could the devs move on in their careers without “designed, built, deployed and maintained a scalable K8 solution” on their CVs?
Ding ding ding! You got it. On a lark I searched Google for large sections of the docs they “wrote” for it and sure enough it was all basically copy/pasted from guides on how to set up RBAC.
This really downplays the complexity of kubernetes and the specialization it inherently requires. Claiming that building something yourself distracts from your core business rings pretty hollow to me when you have to dedicate one or more people to running kubernetes. There are piles and piles of resources and automation one must learn to work with.
If you’re interested in alternatives (as I am), you don’t need to use shell scripts. Real programming languages exist. And real immutable and reproducible Linux distributions do too. Using ansible to automate a Linux install is neither immutable nor reproducible.
There is a time and place for kubernetes, but I reject the notion that it’s appropriate for all or even most companies. I don’t buy the hype, regardless of how superior this post reads.
With respect to “specialization” it’s a lot easier IMHO to learn Kubernetes than it is to learn any given bespoke Kubernetes alternative. And if you run into problems with a bespoke platform, good luck finding documentation online, and if you want to train someone on your bespoke platform, you won’t find training materials online, and if you want to hire someone who already knows your bespoke platform, you’re similarly out of luck.
And programming languages, bash or otherwise, don’t solve the problem that Kubernetes solves, which is reconciliation. Programming languages are imperative: Kubernetes lets you describe the desired state of your infrastructure and reconciles everything accordingly, whereas programs are mostly only useful for creating the infrastructure the first time, not updating it.
There are declarative infrastructure options besides kubernetes. Terraform, nixos and puppet all meet that definition. Kubernetes assumes that everything always needs to be constantly reconciled, but often that’s not the case.
I understand the concern about training and hiring, but I stand by my statement that most companies shouldn’t be running kubernetes at all so it’s a bit moot IMO. We can clearly agree to disagree on the value here. :)
There are declarative infrastructure options besides kubernetes. Terraform, nixos and puppet all meet that definition.
You’re moving the goal posts here. Your initial comparison was between Kubernetes and programming languages. NixOS and Puppet are primarily for managing hosts, not general infrastructure. But yes, Terraform fits the bill.
Kubernetes assumes that everything always needs to be constantly reconciled, but often that’s not the case.
I’m not sure what you mean by “always needs to be constantly reconciled”, but for the most part, Kubernetes only runs the reconciliation when things change. For example, if a pod dies, Kubernetes will restart it automatically, whereas Terraform will not (Terraform typically expects you to configure something like Kubernetes that does automatic reconciliation, like EC2 or some other cloud service).
I understand the concern about training and hiring, but I stand by my statement that most companies shouldn’t be running kubernetes at all so it’s a bit moot IMO. We can clearly agree to disagree on the value here.
Agree to disagree, but even if you don’t pick Kubernetes, the thing you build instead is still going to have documentation, training, and hiring concerns not to mention the technical concerns like load balancing, certificate management, deployment automation, central logging and monitoring, networking, etc. I think the barrier to entry for Kubernetes is pretty low these days, and it’s often a lot easier just to spin up a Kubernetes cluster than it is to do all of the host configuration. Of course, it depends on what you already know.
You’re moving the goal posts here. Your initial comparison was between Kubernetes and programming languages.
Somewhat fair that I did change the subject slightly, but I never once compared Kubernetes and programming languages. The separate paragraphs in my original response were meant to indicate two different responses to the OP article. They brought up crappy bash scripts and I was pointing out that automation can be done in better, more maintainable, languages.
NixOS and Puppet are primarily for managing hosts, not general infrastructure
Why do we run kubernetes? Not only to manage general infrastructure. The value of kubernetes is services, which includes managing those services, along with providing infrastructure automation. So I do include nixos and puppet because they can manage services running on an operating system. Whether a web service or a load balancer, they can provide the same end value for some or much of what kubernetes offers. There are trade-offs, yes!
I only posit you can go quite far with a simple Linux system. ;) If that’s unpopular these days, so be it, but I’m not sold on the complexity k8s requires and the automation surrounding it.
Fuck me if I could figure out how to get my site up and running with that. I’d love to, don’t get me wrong, but snowball’s chance in hell, really.
services.nginx.enable = true;
(Slight jest because there’s no sites enabled on it, but really, while there’s definitely a learning curve, it’s not as sharp as it might look from the outset.)
I’ve played a bunch with nix and bounced off to the point where I need to see some dramatic change/improvement before I’ll touch it again.
Maybe if this Hetzner box locks me out again I’ll get sick of it and then codify the entire config but for now, I’m going to apt-get my way through it.
That’s very fair. Maybe I was lucky in that I bounced off it first back in 2020, and enough had improved by my second attempt last year that I felt like I could stick with it. There’s still plenty of awkward/hard/inexplicable bits and no way past them except through them.
It took me three times but it finally all clicked. I’m now quite sold on the nixos distribution, but also quite open about its own complexity and costs.
Getting the Hetzner thing clicked together again was a nightmare (systemd docs do not exist so you’re copy-pasting bits and pieces of superuser.com together), so if it breaks again I’ll probably go for something redistributable.
Claiming that building something yourself distracts from your core business rings pretty hollow to me…
Me too. And I think the last ten years proves that this is demonstrably false. Lots of (most? all?) high-growth startups that actually scale up rapidly seem to have lots of custom infrastructure components that they wrote themselves for packaging, CI, IaaS orchestration, database and state management, and deployment orchestration. I can’t think of anyone that’s ridden a vanilla, cloud-managed, or even mildly customized K8s distribution (and nothing else) to the top.
Don’t get me wrong. K8s does what it says on the tin. And it might be the most accessible technology that accomplishes such a broad featureset. But I also think it’s pretty complex for what most people need from it. And in all the systems I’ve seen, other non-k8s components are doing a lot of heavy lifting. Examples include:
CI/CD systems, where organizations encode their own definitions of what dev, staging, and prod really mean; k8s clusters are the recipients of these promoted artifacts
Secret management systems, where organizations encode their own definitions of what kind of secrets are accessible where, by whom
IaaS platforms that do actual inter-networking, like actual “real” networking with actual networks where you actually care about actual IP addresses and subnets, running real DNS and not some corny cluster-scoped knockoff
Behind one of those DNS records is probably a database that isn’t running on K8s because you actually care about your data and you don’t really trust k8s’ storage management for that, or maybe you do, but it’s on a dedicated stateful cluster with the right kinds of disks and the whole storage layer is a custom operator, written in Go, that your infra team maintains.
What are most people getting from k8s? Bin-packing, very basic config management, very basic logging (that no one uses), and restarting something that crashes. That’s actually a nice feature set! But I think we’ll find a nicer way to deliver that feature set in the future.
Maybe I’m misunderstanding the argument, but it seems like you’re implying a sort of dichotomy where you either build everything yourself or you ride vanilla, unadorned Kubernetes all the way to the top, and if you customize Kubernetes at all you’re not really benefiting from Kubernetes? I agree that most successful companies who use Kubernetes are using it as an extensible platform for hanging their automation off of, but I think that validates Kubernetes’ value proposition.
The fact that you can start with something as simple as a three node k3s cluster for a single team and, as your requirements increase, continue to add automation all the way up to enterprise without hitting some scaling limit that forces you to completely overhaul your platform is pretty cool in my opinion. And I think the people who say “Kubernetes is too complicated for most companies” are really overestimating the barrier of entry for Kubernetes and especially underestimating the complexity of alternatives, at least beyond a single node scale. For example, if you want Postgres with replication, backups, and failover, with Kubernetes you can just install the CloudNativePG operator, while it seems like the best practice otherwise is to cobble together a variety of tools (I’m sure there’s something out there that addresses this, but it doesn’t appear that there’s any canonical solution apart from cloud-provider managed Postgres).
What are most people getting from k8s? Bin-packing, very basic config management, very basic logging (that no one uses), and restarting something that crashes.
The API itself is pretty useful and powerful, and along with Custom Resource Definitions, it forms the foundation for using Kubernetes as a full platform for highly reliable, scalable, infrastructure automation. You can pretty easily write your own operators for automating infrastructure, and there are off-the-shelf packages for common tasks ranging from cert management and dns all the way up to high-availability Postgres.
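As a rough illustration of how small that step is, a CRD is just another manifest you apply; a hypothetical example (the group and fields are made up):

    apiVersion: apiextensions.k8s.io/v1
    kind: CustomResourceDefinition
    metadata:
      name: backups.example.com
    spec:
      group: example.com
      scope: Namespaced
      names:
        plural: backups
        singular: backup
        kind: Backup
      versions:
        - name: v1
          served: true
          storage: true
          schema:
            openAPIV3Schema:
              type: object
              properties:
                spec:
                  type: object
                  properties:
                    schedule:
                      type: string
                    target:
                      type: string

Once that’s applied, “kubectl get backups” works like any built-in resource; the operator you write is what watches those objects and supplies the actual behavior.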
We have 1 person with their focus on k8s but I wouldn’t say they are working on k8s full time. Not nearly.
six months later, you have a pile of shell scripts that do not work—breaking every time there’s a slight shift in the winds of production.
This screams of projecting a lack of skill onto others. Perhaps your shell scripts don’t work. Where does the conclusion that mine won’t work come from?
This is the same mindset as the common myth of “don’t write your own SQL, Hibernate developers can write much better SQL than you”. Yeah, how did that work out?
Your bash scripts are probably fine, but recognise that rolling your own requires maintenance.
I think it’s a bit lost in the format but for me, the takeaway from this article is that you should be conscious of the point at which maintenance cost starts to outstrip value.
You may well not be there and it’s definitely easy to fall into the trap of adopting technology X too early and eating cost and complexity as a result.
But I have worked at enough places where the world is run by collections of well-meaning and increasingly stretched shell scripts marinating in Historical Context. There’s a trap there as well.
I am a heavy k8s user at work. I can confidently say my bash scripts require less maintenance than k8s.
Your bash scripts are probably fine, but recognise that rolling your own requires maintenance.
Rolling my own what? The reason why I disagree with this post is that it is vague and fails to detail what exactly these shell scripts would entail, and what work it takes to set up deployments on Kubernetes even with an already working cluster. Frankly speaking, I find the amount of painfully complicated YAML it takes to set up a service more involved than a simple deployment script.
it is vague and fails to detail what exactly these shell scripts would entail
It kinda isn’t and doesn’t, though. It even provides a handy summary of the set of things that, if you need them, maybe you should consider k8s, after all:
standard config format, a deployment method, an overlay network, service discovery, immutable nodes, and an API server.
Invariably, I’ve seen collections of shell scripts spring up around maintaining kubernetes. The choice isn’t between bash scripts or kubernetes. The choice is around how you bin-pack the services onto servers and ship the code there. All options come with a large side of scripting to manage them.
Not my shell scripts! They’re perfect. Perfect in every way. And my SQL — well escaped and injection proof! My memory? I bounds-checked it myself with a slide rule; not a single byte will be read out of bounds.
Other people have skill issues but you and me? We’re in a league of our own :)
Oh yeah, the old shortcut to fake humbleness “we all make mistakes, I’m not perfect, neither are you”.
That argumentative position is useless. So we completely relativize any and every bug? Are they all the same? All code has bugs… How many bugs are reasonable? Is it acceptable that a single 10 line shell script has 4 bugs? What about 8 bugs?
And what about Kubernetes manifests? Are they magically bug free because we just say they are?
Can we try to keep the discussion fruitful and somewhat technical?
Yes, I am claiming that the author sounds like they are not too familiar with shell scripts and dismisses them as something that attracts bugs and is difficult to maintain. What is the technical foundation of such claims?
Your example is a good one. SQL injection was a problem from the old PHP era, when lots of people jumped into using relational databases without any prior programming knowledge. It is rather trivially avoided and virtually non-existent nowadays. I think everyone expects it to be a non-problem, and if people go about assembling SQL by string concatenation without proper escaping, that will certainly not fall under the “we all make bugs” category.
Well, my Kubernetes manifests are bug-free. I’m an expert.
OTOH do avoid writing your own database instead of using an off the shelf one. The durability on power loss alone would consume you for weeks. ;)
Yeah, “skill issue” arguments in the domain of software engineering never cease to tickle me.
“You aren’t writing your own database and operating system on bespoke hardware? Skill issue!”
😂
It’s a categorical difference. It requires dramatically more “skill” (I would argue that it becomes functionally impossible to do this at any but the most trivial scales but maybe you’re the rare genius who could be curing cancer but prefers to use bash to address already-solved problems?) to write correct, idempotent shell scripts as opposed to describing your desired state and letting a controller figure out how to update it.
Declarative programming sounds good but the effect is that you have an application whose runtime control flow relevant state is “your entire system”.
Even if you think you are capable of writing immaculate scripts that can do everything you need and maintaining them, can you not conceive of a world where other people have to maintain them when you’re not around? In other words, even if you are perfect, if the baseline skill required to operate a shell-based deployment method is so high, aren’t you basically arguing against it?
Like, there’s plenty of technical arguments against kubernetes, and there’s great alternatives that are less complex. You can even argue about whether some of these things, like rolling deploys, are even required for most people. Skipping all of that and calling someone else a bad programmer because they’d rather use Kubernetes is just mean spirited. Just this week another user was banned for (among other things) saying “skill issue”, but if you rephrase that to “lack of skill” it sits at +22?
This is the same mindset as the common myth of “don’t write your own SQL, Hibernate developers can write much better SQL than you”. Yeah, how did that work out?
Most teams converge on using an ORM. Developers who can’t deal with ORMs and feel the need to constantly break out into SQL are a code smell.
Seriously? The Vietnam of computing?
This is largely untrue and the peak has passed long ago, with ORM libraries that promised to take over the world up to around 2010 being pretty much all dead.
The explosion of popularity of postgrest, supabase, and the like seems unstoppable at this moment.
My experience has been by and large the inverse, with teams bemoaning ORMs systematically because they’d gotten bitten by Weird ORM Bugs more than once. Not saying that raw SQL is more fun, but I derive no fun from ORMs either (nor have I seen many teams having fun with ORMs). Of course, this is also anecdata.
Fair, but at what point in this story is the right time to switch to Kubernetes?
Kubernetes doesn’t advertise itself for deployments on single nodes, but maybe it’s perfectly fine? This is an area I honestly haven’t investigated.
At $work I often have opposite needs: we want to isolate customers, and keep costs down, one server is usually enough. So it’s basically a whole bunch of individual VMs. I’m enthusiastic about NixOS for these types of deployments.
But we also have customers that demand the zero downtime deployments and failover magicks, so we use EKS. And while it’s generally smooth, I find it harder to debug issues, and more difficult to explain to developers. So I’m reluctant about more Kubernetes.
One single-node technique I’ve seen involves an app that has a cloud version and an onprem version. You write the app config once as a Helm chart which is easy to install in our own k8s cluster. Then you give onprem customers a VM image running k3s, which is really stable these days in my experience.
Could you please expand upon “give onprem customers a VM image running k3s”? I roughly know that K3s is a distribution of K8s that is more lightweight, but I’m unsure how what you described plays out in real life.
I’m curious about this too. I tried to theorycraft my way through it for fun, though I haven’t played with k3s in a long time now. The following is untested, mainly just musing while flipping through some docs.
Since Kubernetes requires stable IP addresses for each node, I wonder if the VM image installs k3s, enables the systemd unit but doesn’t start it (env INSTALL_K3S_SKIP_START=true when running the install script) as a way of deferring the cluster init, since the end user may choose to specify their own static IP for the VM, or preconfigure their DHCP server with a static lease.
Then the VM image can also contain Kubernetes manifests[1] or Helm charts[2] at a well-known path (/var/lib/rancher/k3s/server/manifests) such that when the end-user creates a VM from the image, k3s will start, run cluster init, then apply the manifests under that directory. This is where I’d put manifests for the workload I’m trying to ship in the image.
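For example, a bare-bones manifest baked into the image at that path might look like this (reusing the hypothetical coolcompany/workload image from further down; the port is a placeholder):

    # /var/lib/rancher/k3s/server/manifests/workload.yaml
    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: workload
      namespace: default
    spec:
      replicas: 1
      selector:
        matchLabels:
          app: workload
      template:
        metadata:
          labels:
            app: workload
        spec:
          containers:
            - name: workload
              image: coolcompany/workload:v3.1.2
              imagePullPolicy: IfNotPresent   # use the image preloaded in the VM image rather than pulling
              ports:
                - containerPort: 8080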
Probably best to include container images for the deployment in the VM image, that way end users don’t have any additional startup cost due to image pulling when creating a VM from the VM image. I don’t remember off the top of my head if the k3s installer includes these images, so I’d run the airgap instructions[3] for good measure to include them.
Now, with regard to including the container images for our custom workloads under /var/lib/rancher/k3s/server/manifests, they can either be prepared with something like docker image save coolcompany/workload:v3.1.2 > workload_v3.1.2.tar [4] onto the machine we are preparing the VM image on. Then we can copy the resulting tarball into the VM image under /var/lib/rancher/k3s/agent/images/. At least, I think k3s will automatically import OCI images from this directory, from how the airgap instructions seem to work.
Though if that doesn’t work, if we’re running the image in a virtual machine prior to running virt-sysprep to finalize the VM image, I think something like sudo k3s ctr image import -n=k8s.io workload_v3.1.2.tar would have the desired effect.
If the app needs persistent storage, I don’t know off the top of my head if the included local-path-provisioner[5] has a default set of paths where it will provision volumes, so if that was the case, I’d probably err on the side of caution and create a well-known path in the image /var/lib/myapp/data and then include a local-path-provisioner ConfigMap[6] under /var/lib/rancher/k3s/server/manifests pointing to that location.
Then we can include a PersistentVolume[7] and PersistentVolumeClaim[8] that references the aforementioned PV under /var/lib/rancher/k3s/server/manifests.
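A rough sketch of those pieces, assuming the paths and names above (note that with the bundled local-path storage class a PVC alone may be enough, since the provisioner creates the volume dynamically, and k3s may manage the provisioner’s ConfigMap itself, so this needs checking against the local-path-provisioner docs):

    apiVersion: v1
    kind: ConfigMap
    metadata:
      name: local-path-config
      namespace: kube-system     # k3s runs the provisioner in kube-system
    data:
      config.json: |
        {
          "nodePathMap": [
            {
              "node": "DEFAULT_PATH_FOR_NON_LISTED_NODES",
              "paths": ["/var/lib/myapp/data"]
            }
          ]
        }
    ---
    apiVersion: v1
    kind: PersistentVolumeClaim
    metadata:
      name: myapp-data
      namespace: default
    spec:
      accessModes: ["ReadWriteOnce"]
      storageClassName: local-path
      resources:
        requests:
          storage: 10Gi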
Some additional documentation about this for end users might be good, since they can write a CronJob object to /var/lib/rancher/k3s/server/manifests to back up the persistent data.
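For instance, something along these lines (the schedule, paths, and claim name are all placeholders):

    apiVersion: batch/v1
    kind: CronJob
    metadata:
      name: myapp-backup
      namespace: default
    spec:
      schedule: "0 3 * * *"
      jobTemplate:
        spec:
          template:
            spec:
              restartPolicy: OnFailure
              containers:
                - name: backup
                  image: busybox:1.36
                  command: ["/bin/sh", "-c", "tar czf /backup/myapp-$(date +%Y%m%d).tar.gz -C /data ."]
                  volumeMounts:
                    - name: data
                      mountPath: /data
                      readOnly: true
                    - name: backup
                      mountPath: /backup
              volumes:
                - name: data
                  persistentVolumeClaim:
                    claimName: myapp-data
                - name: backup
                  hostPath:
                    path: /var/lib/myapp/backups
                    type: DirectoryOrCreate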
Now for accessing the workload. If it’s something that needs to be accessed over HTTP(S), then it’s good to know that k3s already includes the Traefik ingress controller[9], so we just need to ensure our manifests under /var/lib/rancher/k3s/server/manifests define a service with type: ClusterIP and a corresponding Ingress object. I think some additional thought is needed here for providing end users with a turnkey way of having automatic certificate rotation for TLS.
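Concretely, that pair of objects might look like this (the hostname is a placeholder; in a default k3s install, Traefik should pick the Ingress up as the default ingress class without extra annotations):

    apiVersion: v1
    kind: Service
    metadata:
      name: workload
      namespace: default
    spec:
      type: ClusterIP
      selector:
        app: workload
      ports:
        - port: 80
          targetPort: 8080
    ---
    apiVersion: networking.k8s.io/v1
    kind: Ingress
    metadata:
      name: workload
      namespace: default
    spec:
      rules:
        - host: myapp.example.com
          http:
            paths:
              - path: /
                pathType: Prefix
                backend:
                  service:
                    name: workload
                    port:
                      number: 80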
The Traefik ingress controller will automatically listen on the VM’s ports 80 and 443 thanks to ServiceLB[10] which is also included in k3s.
I think there’s still a way forward if our workload is not a simple layer7 HTTP thing, since we can still define a Service object with type: LoadBalancer thanks to ServiceLB and still have it accessible since ServiceLB will listen on the NodePorts.
At this point, I think the VM image template is done, and we can shut it down, run virt-sysprep and begin distributing this as a release artifact.
I think there’s still a lot of open questions here. For example, this VM image will have a Kubernetes API server listening on a non-localhost interface, so maybe that’s not desirable if Kubernetes is meant to be an invisible implementation detail here and there’s no chance of expanding beyond single node.
I think k3s may manage most of the PKI (all of the various k8s components certificates) for us, but not 100% sure. So if an end user has been using the system beyond their expiry date and they don’t automatically rotate, I can’t imagine good things will happen to their deployment, and since I mainly wrote this from the perspective of the end user not being aware that the VM image runs on Kubernetes to begin with, this may not be ideal.
I mentioned above the lack of automatic certificate rotation for the workload. For example if we deployed this to a client’s business or something, they may have their own certs. If the end user doesn’t mind the kubernetes of it all, they could install cert-manager or manually apply TLS crt,key secrets into the namespace where we deployed the workload that depends on it, then the Ingress object needs to be updated to reference it.
Also, this has no provisions at all for upgrading k3s itself lol.
… and probably many more, but it was a fun thought experiment before I went to lunch. I might try this out in my homelab later.
Nowhere in this post do they need what Kubernetes was actually built to do: run a distributed consensus algorithm non stop to efficiently pack containers onto a cluster of machines. That’s why you see people trying to avoid it.
Needing multiple servers does not mean you automatically need etcd either. We built apps for decades with a shared nothing architecture. You probably don’t want your tasks automatically migrating between machines every time the network drops out. If you need a strong guarantee of only one task running at a time, relying on the orchestrator is probably not a good idea either.
Yes, K8s solves a lot of these other problems too but it is not the solution. There are plenty of valid use cases that do no require scheduling and orchestration across a cluster of machines, including the one in the article.
Scheduling is an important feature, but even more fundamentally Kubernetes is ultimately a reconciliation platform and a collection of controllers for common infrastructure concerns like load balancing, deployment automation, service discovery, networking, etc. Many people use Kubernetes even if they don’t need scheduling specifically because the value proposition is much larger.
Thing is you don’t replace a pile of deployment scripts with Kubernetes. You replace it with another pile of scripts/configuration that runs on top of Kubernetes. There’s no universal answer for which pile is easier to build/maintain.
I get that this is trying to parody ‘dear friend, you have built a compiler’ but what it misses is that the original is about trying to convince you to be honest about the domain you’re operating in and design your system accordingly. It’s not about saying “actually you should never try to build a compiler in the first place, you should just wrangle the problem into something that an existing third-party compiler can handle”.
I have seen at least 4 different K8s based teams.
Not only do they spend more on infrastructure in terms of bills, but the human cost of running a K8s infrastructure is at least 4-8 hours a week of a mid-level engineer.
What’s the alternative?
Google cloud run, AWS fargate, render.com, Railway app and several others.
Unless you are an infrastructure company or have 50+ engineers, K8s is a distraction.
Exactly, can’t agree more… This post reminds me of the pre-k8s era: AWS struggling to deliver ECS, everyone creating tooling to handle containers, every mid-sized company creating their own PaaS… “Let’s use CoreOS… try Deis… damn, wiped the entire environment, Rancher now will solve everything..!” etc. My team at the time was happy using the old drone.io, deploying on hardened EC2 instances, relying on a couple of scripts handling the daemons and health checking: faster deployments, high availability… Most of the other apps were running on Heroku.
Now containers are everywhere, we have infinite tools to manage them, and k8s became the cloud de facto standard; consulting companies are happy and devs sad… There is a huge space between a couple of scripts and k8s where we need to analyze the entire context. IMHO, if you don’t have a budget for an entire SRE/DevOps team, weeks or months for planning/provisioning, and you just run stateless apps, even managed k8s (EKS, GKE, AKS) does not make any sense. As you said, you can achieve high availability using ultra-managed solutions which run over k8s (also App Engine flex, fly.io) or a combination of other things; we also have nice tools for self-hosting like Kamal and Coolify, and other options like NanoVMs or Unikraft.
BTW, I’m a former SRE from a payment gateway/acquirer in Brazil (valued at $2.15B), responsible for hundreds of thousands of users and PoS terminals connected to App Engine clusters compliant with PCI-DSS, managed by a small team. So, yes, I have some idea of what I’m talking about…
I looked at App Runner but it would be hell to have 20-30 ‘services’ talk to each other over that and then we’d be stuck in a dead end. We are moving everything to Kubernetes because with a vanilla EKS setup it just works (surprisingly low amount of headache) and it offers customisability into the far heavens for when we would need it.
Trying to fit the format closely removes a lot of nuance. I promise I’m only half serious, as someone with my own fair share of effective non-Kubernetes setups.
I felt that the original “Dear sir, you have built a compiler” was a lot of fun to read, even though the arguments felt a little weak to me (which can happen if entertainment is prioritized over persuasion). But I was surprised to see so much agreement in the comments.
My rendition is the same way. It’s kind of a just-so story where the challenges and solutions that the reader encounters on their “journey to avoid building a Kubernetes” don’t have strong justifications. But since it offers a simple explanation for a complex phenomenon like the growing adoption of Kubernetes, it can sound more compelling.
The growing adoption of Kubernetes is the result of marketing dollars, devops startups growth hacking, the aging-out of people who cut their teeth on LAMP boxes, the psyop of microservices being the solution to all problems, and developers wanting to larp as Google SREs. I would argue that it isn’t really a complex phenomenon.
So you march on, and add another few sections to your deploy.sh script, certain that this will be the last of what you need to do to maintain this pile of hacks.
Ehm, this sounds a lot like my job. Where we use Kubernetes.
Sometimes I have a problem like wanting to press a key on my computer and have something complicated happen, like my backlight brightness increasing. I could just write a simple snippet of Python to read the brightness, handle the log scaling, and then write a new brightness value. Then I would just need to add it to my documentation for machine setup and put that Python script, with a small install script, into source control.
But then I worry: what if tomorrow I want my brightness script to speak to ChatGPT to perform AI scaling on my brightness depending on my mood, which it gets from a photo of my face taken with my webcam?
And then I wonder, what if at some point I wanted to be able to customize which AI agent I wanted to use, and also run my own local LLM. What if I want to also take microphone input and the temperature of my seat. What if I want my brightness manager to have a web browser. What then?
Since I am a sensible person I decide that, instead of writing some straightforward little script to handle this, two lines of shell, and a sentence of documentation and handle new features as they come, I am just going to start a new company, hire a CTO, get him to hire a team of developers, to write brightness management SAAS which also hosts your linux computer for you and comes with thin client software which runs on custom hardware with a microphone, webcam and seat temperature sensor because by golly if I go down this path of “simple program” insanity I am bound to end up here anyway and I might as well do it properly rather than “organically” and risk it getting out of hand.
(As an aside, this person has not understood “Dear Sir, You Have Built a Compiler” or has intentionally straw manned it.)
Perfect, that describes precisely one of my clients. Their deployment system turned into a complex monster that won’t scale properly and is constantly suffering from pain and outages after reboots, deploys, and sudden spikes in traffic. The other day I found myself duct-taping one of the burning issues and it’s just no fun. We are well on the way to reimplementing Kubernetes there…
I wish they invested half a day to look into Kubernetes and IaC when they had to deploy another site. Now it’s a mess of 25+ deployments and a huge bowl of SaltStack flavored copypasta.
Oh wonderful! First it was the “X Considered Harmful” meme. Then it was the “Falsehoods programmers believe about X”. Now it seems we have “Dear Friend, you have built an X.”
I actually wanted to try my hand at building a barebones Kubernetes with just shell scripts but…
I have a very simple web application to deploy, for which I have a Hetzner box, and doing that manually (finagling the .service file, figuring out how to have GitHub Actions copy the right files to the right place, deployments with downtime) is such a shitshow.
On the other hand, the end state is often: Dear friend, you have built a LAMP server, using k8s (and Helm, and operators, and cert-manager, and Prometheus, and…).
I hope that this doesn’t come across as too snarky but I’ve just suffered the pain of getting a single-node Ansible AWX server running. Their only supported architecture is k8s and their operator. It took days! I remember back in the old days we used to say “simple should be easy and complicated should be possible.” Now it’s “everyone needs Google scale so that’s all we’re going to do.”
As an example of doing it the right way, the Immich photo management app is comparable to AWX in run-time complexity but it comes with a Compose file that makes it trivial to run and manage on any machine with docker already installed.
FWIW you don’t need Helm for Kubernetes, and you’re going to want something like cert-manager regardless (and if you don’t want cert-manager, just don’t do anything—I wish k8s would include it by default, but it doesn’t).
Similarly, there’s a lot of middle ground between single node and Google scale, and you should have at least one replica running in production in most cases, which i means you’re going to need some sort of load balancer and some deployment automation. At this point you may as well start considering Kubernetes.
Not only does Kubernetes provide load balancing and deployment automation, but it also allows you to easily run cloudnativepg, an operator that makes it easy to run a Postgres cluster with replication, backups, and failover. Maybe you don’t need that for your production application, but if you do, it’s a lot easier to run k8s and that operator than it is to use other tools to manage Postgres IMHO.
That’s an astonishing jump. For a handful of machines, a few scripts with ssh and some pretty standard networking tricks would be what I think of long before k8s.
I’m not sure exactly what you’re thinking with your “few scripts and networking tricks”, but it sounds like you’re trying to refute the need for a load balancer by arguing that you can implement your own load balancer.
Moreover, while any given feature Kubernetes provides out of the box could be implemented easily enough by a sufficiently competent, experienced engineer, the point of the article is that the total effort is much greater and you still end up with a platform for which no documentation, or training, or experienced hiring pool exists and your expensive competent, experienced professional has spent his time building Kubernetes instead of doing more valuable tasks (or more likely, your professional only thinks he’s sufficiently competent and experienced, and you wind up with a very broken Kubernetes that causes endless frustration).
If you have two machines you don’t need a load balancer. You just need a way to fail over to the second machine. That’s just network configuration, which you have to do anyway, Kubernetes or no Kubernetes.
That’s the problem with this argument. You still have to deploy machines, configure operating systems and networking, install services (whether your services or Kubernetes’s services), etc. unless you’re using a 3rd party managed Kubernetes, which renders the comparison irrelevant anyway. So you can expose Kubernetes as a uniform target to a bunch of teams, but you still have a lot of stuff that is local deployment. If you don’t have a platform team providing a uniform target to lots of other teams, and you’re already doing all this configuration anyway, why do the configuration of Kubernetes as well?
Kubernetes generally takes care of the failover for you. You don’t really need to think about it.
There’s much less operating system configuration if you’re using Kubernetes than if you’re rolling your own hosts, and the networking setup is “install MetalLB”. It’s extremely minimal. Similarly, certificate management is “install cert-manager”. DNS is “install external-dns”. Similarly there are standard, high-quality packages for logging, monitoring, alerting, volume management, etc. While I’m surprised there are Kubernetes distros that include this stuff out of the box, it’s still less work than doing it yourself and you get the benefits of standardization: abundant, high quality documentation, training resources, and an experienced hiring pool. And moreover, as your requirements scale, you can incrementally opt into additional automation (Kubernetes is a platform for infrastructure automation).
I’m not sure what you mean here. Why would managed Kubernetes render the comparison obsolete?
I’m not following you here. What do you mean by “uniform target” vs “local deployment”.
No, it’s not. You still have to set up the underlying network that the physical machines are on. That’s the key point. You can install all this stuff within Kubernetes…but it’s all dependent on nodes that are fully configured outside of Kubernetes.
I see what you mean. Yes, you need to have nodes that are on a network, but “fully configured” for running Kubernetes basically means “SSH is configured”, which is dramatically easier than any configuration I’ve seen for running production-grade applications directly on nodes (no need to haul out ansible or anything like it).
By design, sure. In practice though the biggest failures I’ve had with kubernetes have been nginx ingress pods just randomly losing their ability to connect to upstream and silently causing downtime that wouldn’t haven’t happened if the app was exposed directly, using an AWS ALB, etc. I’m not saying this is everyone’s experience. It might be due to noob misconfiguration mistakes. But that’s sorta the point the article misses. Kubernetes might offer repeatable solutions for these problems (and I appreciate it for that) but it really isn’t a solution on its own. Every team running a kubernetes cluster has their own collection of weird scripts with bespoke workarounds for kubermetes quirks.
/sympathize with random nginx Ingress failures leading to widespread 503 outages.
not sure if this is welcome here, but try https://semaphoreui.com/ instead
Thank you! In this case I was told to stand up an AWX instance, and I didn’t know what I didn’t know, but I’ll take a look at your suggestion. If there’s a “next time” then I’ll be able to advocate for an alternative.
OMG this. At work I tried to fix a broken internal app that I had no role in getting set up. Everyone who had worked on it had departed. So started the slog of figuring out how it worked, where it was deployed, and getting access. Oh cool, this runs on the “internal dev platform”. What is that? Oh it’s a kunernetes cluster. Neat that they tried to think ahead spread some infra costa across projects. I wonder what else is running on this thing… nothing. It is a 2 node kubernetes deployment that runs one application pod. Not per node. Total.
I was working in an Oracle shop where devs would spin up an Oracle instance for a web application that had 4 data items. Not databases. Not tables. Items.
My production app ran completely without databases and was several orders of magnitude faster, smaller, and more reliable than its predecessor.
But think. How the devs could otherwise move on in their careers without “designed, built, deployed and maintained a scalable K8 solution” on their CVs?
Ding ding ding! You got it. On a lark I searched Google for large sections of the docs they “wrote” for it and sure enough it was all basically copy/pasted from guides on how to set up RBAC.
This really downplays the complexity of kubernetes and the specialization it inherently requires. Claiming that building something yourself distracts from your core business rings pretty hollow to me when you have to dedicate one or more people to running kubernetes. There are piles and piles of resources and automation one must learn to work with.
If you’re interested in alternatives (as I am), you don’t need to use shell scripts. Real programming languages exist. And real immutable and reproducible Linux distributions do too. Using ansible to automate a Linux install is neither immutable nor reproducible.
There is a time and place for kubernetes, but I reject the notion that it’s appropriate for all or even most companies. I don’t buy the hype, regardless of how much superior this post reads.
With respect to “specialization” it’s a lot easier IMHO to learn Kubernetes than it is to learn any given bespoke Kubernetes alternative. And if you run into problems with a bespoke platform, good luck finding documentation online, and if you want to train someone on your bespoke platform, you won’t find training materials online, and if you want to hire someone who already knows your bespoke platform, you’re similarly out of luck.
And programming languages, bash or otherwise, don’t solve the problem that Kubernetes solves, which is reconciliation. Programming languages are imperative, while Kubernetes lets you describe the desired state of your infrastructure and reconciles everything accordingly whereas programs are mostly only useful for creating the infrastructure the first time, not updating it.
There are declarative infrastructure options besides kubernetes. Terraform, nixos and puppet all meet that definition. Kubernetes assumes that everything always needs to be constantly reconciled, but often that’s not the case.
I understand the concern about training and hiring, but I stand by my statement that most companies shouldn’t be running kubernetes at all so it’s a bit moot IMO. We can clearly agree to disagree on the value here. :)
You’re moving the goal posts here. Your initial comparison was between Kubernetes and programming languages. NixOS and Puppet are primarily for managing hosts, not general infrastructure. But yes, Terraform fits the bill.
I’m not sure what you mean by “always needs to be constantly reconciled”, but for the most part, Kubernetes only runs the reconciliation when things change. For example, if a pod dies, Kubernetes will restart it automatically, whereas Terraform will not (Terraform typically expects you to configure something like Kubernetes that does automatic reconciliation, like EC2 or some other cloud service).
Agree to disagree, but even if you don’t pick Kubernetes, the thing you build instead is still going to have documentation, training, and hiring concerns not to mention the technical concerns like load balancing, certificate management, deployment automation, central logging and monitoring, networking, etc. I think the barrier to entry for Kubernetes is pretty low these days, and it’s often a lot easier just to spin up a Kubernetes cluster than it is to do all of the host configuration. Of course, it depends on what you already know.
Somewhat fair that I did change the subject slightly, but I never once compared Kubernetes and programming languages. The separate paragraphs in my original response were meant to indicate two different responses to the OP article. They brought up crappy bash scripts and I was pointing out that automation can be done in better, more maintainable, languages.
Why do we run kubernetes? Not to only manage general infrastructure. The value of kubernetes is services, which includes managing those services, along with providing infrastructure automation. So I do include nixos and puppet because they can manage services running on an operating system. Whether a web service or load balance, they can provide the same end value for some or much of what kubernetes offers. There are trade offs, yes!
I only posit you can go quite far with a simple Linux system. ;) If that’s unpopular these days, so be it, but I’m not sold on the complexity k8s requires and the automation surrounding it.
Fuck me if I could figure out how to get my site up and running with that. I’d love to, don’t get me wrong, but snowball’s chance in hell, really.
services.nginx.enable = true;
.(Slight jest because there’s no sites enabled on it, but really, while there’s definitely a learning curve, it’s not as sharp as it might look from the outset.)
I’ve played a bunch with nix and bounced off to the point where I need to see some dramatic change/improvement before I’ll touch it again.
Maybe if this Hetzner box locks me out again I’ll get sick of it and then codify the entire config but for now, I’m going to
apt-get
my way through it.That’s very fair. Maybe I was lucky in that I bounced off it first back in 2020, and enough had improved by my second attempt last year that I felt like I could stick with it. There’s still plenty of awkward/hard/inexplicable bits and no way past them except through them.
It took me three times but it finally all clicked. I’m now quite sold on the nixos distribution, but also quite open about its own complexity and costs.
Getting the Hetzner thing clicked together again was a nightmare (systemd docs do not exist so you’re copy-pasting bits and pieces of superuser.com together), so if it breaks again I’ll probably go for something redistributable.
Me too. And I think the last ten years proves that this is demonstrably false. Lots (most? all?) high growth startups that actually scale up rapidly seems to have lots of custom infrastructure components that they wrote themselves for packaging, CI, IaaS orchestration, database and state management, deployment orchestration. I can’t think of anyone that’s ridden a vanilla, cloud-managed, or even mildly customized K8s distribution (and nothing else) to the top.
Don’t get me wrong. K8s does what it says on the tin. And it might be the most accessible technology that accomplishes such a broad featureset. But I also think it’s pretty complex for what most people need from it. And in all the systems I’ve seen, other non-k8s components are doing a lot of heavy lifting. Examples include:
What are most people getting from k8s? Bin-packing, very basic config management, very basic logging (that no one uses), and restarting something that crashes. That’s actually a nice feature set! But I think we’ll find a nicer way to deliver that feature set in the future.
Maybe I’m misunderstanding the argument, but it seems like you’re implying a sort of dichotomy where you either build everything yourself or you ride vanilla, unadorned Kubernetes all the way to the top, and if you customize Kubernetes at all you’re not really benefiting from Kubernetes? I agree that most successful companies who use Kubernetes are using it as an extensible platform for hanging their automation off of, but I think that validates Kubernetes’ value proposition.
The fact that you can start with something as simple as a three node k3s cluster for a single team and, as your requirements increase, continue to add automation all the way up to enterprise without hitting some scaling limit that forces you to completely overhaul your platform is pretty cool in my opinion. And I think the people who say “Kubernetes is too complicated for most companies” are really overestimating the barrier of entry for Kubernetes and especially underestimating the complexity of alternatives, at least beyond a single node scale. For example, if you want Postgres with replication, backups, and failover, with Kubernetes you can just install the CloudNativePG operator, while it seems like the best practice otherwise is to cobble together a variety of tools (I’m sure there’s something out there that addresses this, but it doesn’t appear that there’s any canonical solution apart from cloud-provider managed Postgres).
The API itself is pretty useful and powerful, and along with Custom Resource Definitions, it forms the foundation for using Kubernetes as a full platform for highly reliable, scalable, infrastructure automation. You can pretty easily write your own operators for automating infrastructure, and there are off-the-shelf packages for common tasks ranging from cert management and dns all the way up to high-availability Postgres.
We have 1 person with their focus on k8s but I wouldn’t say they are working on k8s full time. Not nearly.
This screams of projecting lack of skill into others. Perhaps your shellscripts don’t work. Where does the conclusion that mine won’t work comes from?
This is the same mindset as the common myth of “don’t write your own SQL, Hibernate developers can write much better SQL than you”. Yeah, how did that work out?
Your bash scripts are probably fine, but recognise that rolling your own requires maintenance.
I think it’s a bit lost in the format but for me, the takeaway from this article is that you should be conscious of the point at which maintenance cost starts to outstrip value.
You may well not be there and it’s definitely easy to fall into the trap of adopting technology X too early and eating cost and complexity as a result.
But I have worked at enough places where the world is run by collections of well-meaning and increasingly stretched shell scripts marinating in Historical Context. There’s a trap there as well.
I am a heavy k8s user at work. I can confidently say my bash scripts require less maintenance than k8s.
Rolling my own what? The reason why I disagree with this post is because it is vague and fails to detail what exactly would this shellscripts entail and what work does it go to set up deployments on Kubernetes even with an already working cluster. Frankly speaking, I find the amount of painfully complicated yaml it takes to set up a service is more evolved than a simple deployment script.
It kinda isn’t and doesn’t, though. It even provides a handy summary of the set of things that, if you need them, maybe you should consider k8s, after all:
Invariably, I’ve seen collections of shell scripts spring up around maintaining kubernetes. The choice isn’t between bash scripts or kubernetes. The choice is around how you bin-pack the services onto servers and ship the code there. All options come with a large side of scripting to manage them.
Not my shell scripts! They’re perfect. Perfect in every way. And my SQL: well escaped and injection proof! My memory? I bounds-checked it myself with a slide rule; not a single byte will be read out of bounds.
Other people have skill issues but you and me? We’re in a league of our own :)
Oh yeah, the old shortcut to fake humbleness “we all make mistakes, I’m not perfect, neither are you”.
That argumentative position is useless. So we completely relativize any and every bug? Are they all the same? All code has bugs… How many bugs are reasonable? Is it acceptable that a single 10 line shell script has 4 bugs? What about 8 bugs?
And what about Kubernetes manifests? Are they magically bug free because we just say they are?
Can we try to keep the discussion fruitful and somewhat technical?
Yes, I am claiming that the author sounds like they are not too familiar with shell scripts and dismisses them as something that attracts bugs and is difficult to maintain. What is the technical foundation of such claims?
Your example is a good one. SQL injection was a problem from the old PHP era, when lots of people jumped into using relational databases without any prior programming knowledge. It is rather trivially avoided and is virtually nonexistent nowadays. I think everyone expects it to be a non-problem, and if people go about assembling SQL by string concatenation without proper escaping, that will certainly not fall under the “we all make mistakes” category.
Well, my Kubernetes manifests are bug-free. I’m an expert.
OTOH do avoid writing your own database instead of using an off the shelf one. The durability on power loss alone would consume you for weeks. ;)
Yeah, “skill issue” arguments in the domain of software engineering never cease to tickle me.
“You aren’t writing your own database and operating system on bespoke hardware? Skill issue!”
😂
It’s a categorical difference. It requires dramatically more “skill” (I would argue that it becomes functionally impossible to do this at any but the most trivial scales but maybe you’re the rare genius who could be curing cancer but prefers to use bash to address already-solved problems?) to write correct, idempotent shell scripts as opposed to describing your desired state and letting a controller figure out how to update it.
Declarative programming sounds good, but the effect is that you have an application whose control-flow-relevant runtime state is “your entire system”.
Even if you think you are capable of writing immaculate scripts that can do everything you need and maintaining them, can you not conceive of a world where other people have to maintain them when you’re not around? In other words, even if you are perfect, if the baseline skill required to operate a shell-based deployment method is so high, aren’t you basically arguing against it?
Like, there’s plenty of technical arguments against kubernetes, and there’s great alternatives that are less complex. You can even argue about whether some of these things, like rolling deploys, are even required for most people. Skipping all of that and calling someone else a bad programmer because they’d rather use Kubernetes is just mean spirited. Just this week another user was banned for (among other things) saying “skill issue”, but if you rephrase that to “lack of skill” it sits at +22?
Most teams converge on using an ORM. Developers who can’t deal with ORMs and feel the need to constantly break out into SQL are a code smell.
Seriously? The Vietnam of computing?
This is largely untrue, and the peak passed long ago; the ORM libraries that promised to take over the world up to around 2010 are pretty much all dead.
The explosion of popularity of postgrest, supabase, and the like seems unstoppable at this moment.
I don’t see it for teams or for anybody who’s hiring/managing teams. Raw SQL is fun if you’re 1-2 developers but the fun wears off quickly.
Also, yes, most ORMs suck (by “suck” I mean they’re not the Django ORM).
My experience has been by and large the inverse, with teams bemoaning ORMs systematically because they’d gotten bitten by Weird ORM Bugs more than once. Not saying that raw SQL is more fun, but I derive no fun from ORMs either (nor have I seen many teams having fun with ORMs). Of course, this is also anecdata.
I love this. Inside but also outside of the context that surrounds this comment.
Fair, but at what point in this story is the right time to switch to Kubernetes?
Kubernetes doesn’t advertise itself for deployments on single nodes, but maybe it’s perfectly fine? This is an area I honestly haven’t investigated.
At $work I often have the opposite needs: we want to isolate customers and keep costs down, and one server is usually enough. So it’s basically a whole bunch of individual VMs. I’m enthusiastic about NixOS for these types of deployments.
But we also have customers that demand the zero downtime deployments and failover magicks, so we use EKS. And while it’s generally smooth, I find it harder to debug issues, and more difficult to explain to developers. So I’m reluctant about more Kubernetes.
One single-node technique I’ve seen involves an app that has a cloud version and an onprem version. You write the app config once as a Helm chart, which is easy to install in your own k8s cluster. Then you give onprem customers a VM image running k3s, which has been really stable in my experience these days.
Outside of that example, choosing k3s initially means expanding to a second server can be done relatively easily (although I’ve never personally tried this): https://docs.k3s.io/datastore/ha-embedded#existing-single-node-clusters
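Roughly, and again untested by me (the exact flags and procedure are in the linked docs), the shape of that expansion would be:

# On the existing node: re-run the install with --cluster-init so the
# datastore migrates from SQLite to embedded etcd.
curl -sfL https://get.k3s.io | sh -s - server --cluster-init

# On each additional server node: join using the first node's address and
# the token from /var/lib/rancher/k3s/server/node-token on that node.
curl -sfL https://get.k3s.io | K3S_TOKEN=<token> \
  sh -s - server --server https://<first-node-ip>:6443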
Could you please expand upon “give onprem customers a VM image running k3s”? I roughly know that K3s is a distribution of K8s that is more lightweight, but I’m unsure how what you described plays out in real life.
I’m curious about this too. I tried to theorycraft my way through it for fun, though I haven’t played with k3s in a long time now. The following is untested, mainly just musing while flipping through some docs.
Since Kubernetes requires stable IP addresses for each node, I wonder if the VM image installs k3s, enables the systemd unit but doesn’t start it (env INSTALL_K3S_SKIP_START=true when running the install script) as a way of deferring the cluster init, since the end user may choose to specify their own static IP for the VM, or preconfigure their DHCP server with a static lease.
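For the image build step, I imagine something like this (untested; INSTALL_K3S_SKIP_START is an option of the k3s install script):

# During VM image build: install k3s and enable the systemd unit,
# but don't start it yet, so cluster init happens on the end user's
# machine once it has its final IP address.
curl -sfL https://get.k3s.io | INSTALL_K3S_SKIP_START=true sh -s - server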
Then the VM image can also contain Kubernetes manifests[1] or Helm charts[2] at a well-known path (/var/lib/rancher/k3s/server/manifests) such that when the end-user creates a VM from the image, k3s will start, run cluster init, then apply the manifests under that directory. This is where I’d put manifests for the workload I’m trying to ship in the image.
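Baking a manifest into the image would then just be dropping a file in that directory at image-build time, something like the following (the workload name and image tag reuse the example from later in this comment and are otherwise made up):

# Anything in this directory is applied automatically when k3s starts.
mkdir -p /var/lib/rancher/k3s/server/manifests
cat > /var/lib/rancher/k3s/server/manifests/workload.yaml <<'EOF'
apiVersion: apps/v1
kind: Deployment
metadata:
  name: workload
spec:
  replicas: 1
  selector:
    matchLabels:
      app: workload
  template:
    metadata:
      labels:
        app: workload
    spec:
      containers:
        - name: workload
          image: coolcompany/workload:v3.1.2
EOF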
Probably best to include the container images for the deployment in the VM image as well; that way end users don’t have any additional startup cost due to image pulling when creating a VM from the VM image. I don’t remember off the top of my head whether the k3s installer includes these images, so I’d run the airgap instructions[3] for good measure to include them.
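From memory, “running the airgap instructions” boils down to dropping the published airgap images tarball into the images directory inside the VM image; the asset name and version placeholder below are from memory, so check the airgap docs:

# Pre-seed the images k3s itself needs, so nothing is pulled on first boot.
mkdir -p /var/lib/rancher/k3s/agent/images/
curl -L -o /var/lib/rancher/k3s/agent/images/k3s-airgap-images-amd64.tar.zst \
  https://github.com/k3s-io/k3s/releases/download/<k3s-version>/k3s-airgap-images-amd64.tar.zst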
Now, with regard to the container images for our custom workloads (the ones referenced by the manifests under /var/lib/rancher/k3s/server/manifests), they can be prepared with something like
docker image save coolcompany/workload:v3.1.2 > workload_v3.1.2.tar
[4] on the machine we are preparing the VM image on. Then we can copy the resulting tarball into the VM image under /var/lib/rancher/k3s/agent/images/. At least, I think k3s will automatically import OCI images from this directory, judging from how the airgap instructions seem to work. If that doesn’t work, and we’re running the image in a virtual machine prior to running virt-sysprep to finalize the VM image, I think something like
sudo k3s ctr image import -n=k8s.io workload_v3.1.2.tar
would have the desired effect. If the app needs persistent storage, I don’t know off the top of my head whether the included local-path-provisioner[5] has a default set of paths where it will provision volumes, so if that were the case, I’d probably err on the side of caution and create a well-known path in the image
/var/lib/myapp/data
and then include a local-path-provisioner ConfigMap[6] under /var/lib/rancher/k3s/server/manifests pointing to that location. Then we can include a PersistentVolume[7] and a PersistentVolumeClaim[8] that references the aforementioned PV, both under /var/lib/rancher/k3s/server/manifests.
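As a sketch of that last part, one way to pin the data location is to sidestep the provisioner entirely with a static hostPath PV (names, sizes, and the “manual” storage class are all invented for illustration):

cat > /var/lib/rancher/k3s/server/manifests/myapp-storage.yaml <<'EOF'
apiVersion: v1
kind: PersistentVolume
metadata:
  name: myapp-data
spec:
  capacity:
    storage: 10Gi
  accessModes:
    - ReadWriteOnce
  storageClassName: manual
  hostPath:
    path: /var/lib/myapp/data   # the well-known path baked into the image
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: myapp-data
  namespace: default
spec:
  accessModes:
    - ReadWriteOnce
  storageClassName: manual
  resources:
    requests:
      storage: 10Gi
EOF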
Some additional documentation about this for end users might be good, since they can write a CronJob object to /var/lib/rancher/k3s/server/manifests to back up the persistent data.
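For example, a nightly tarball of the data directory could look something like this (schedule, image, and paths are illustrative; the PVC name matches the storage sketch above):

cat > /var/lib/rancher/k3s/server/manifests/myapp-backup.yaml <<'EOF'
apiVersion: batch/v1
kind: CronJob
metadata:
  name: myapp-backup
spec:
  schedule: "0 3 * * *"          # 03:00 every night
  jobTemplate:
    spec:
      template:
        spec:
          restartPolicy: OnFailure
          containers:
            - name: backup
              image: alpine:3
              command:
                - /bin/sh
                - -c
                - tar czf /backups/myapp-$(date +%F).tar.gz -C /data .
              volumeMounts:
                - name: data
                  mountPath: /data
                - name: backups
                  mountPath: /backups
          volumes:
            - name: data
              persistentVolumeClaim:
                claimName: myapp-data
            - name: backups
              hostPath:
                path: /var/lib/myapp/backups
                type: DirectoryOrCreate
EOF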
Now for accessing the workload. If it’s something that needs to be accessed over HTTP(S), then it’s good to know that k3s already includes the Traefik ingress controller[9], so we just need to ensure our manifests under /var/lib/rancher/k3s/server/manifests define a service with type: ClusterIP and a corresponding Ingress object. I think some additional thought is needed here for providing end users with a turnkey way of having automatic certificate rotation for TLS.
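That is, roughly the following (host and names are invented; TLS is left out, which is exactly the open question mentioned above):

cat > /var/lib/rancher/k3s/server/manifests/workload-http.yaml <<'EOF'
apiVersion: v1
kind: Service
metadata:
  name: workload
spec:
  type: ClusterIP
  selector:
    app: workload
  ports:
    - port: 80
      targetPort: 8080           # whatever port the app listens on
---
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: workload
spec:
  ingressClassName: traefik      # the controller k3s ships by default
  rules:
    - host: myapp.example.com
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: workload
                port:
                  number: 80
EOF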
The Traefik ingress controller will automatically listen on the VM’s ports 80 and 443 thanks to ServiceLB[10] which is also included in k3s.
I think there’s still a way forward if our workload is not a simple layer7 HTTP thing, since we can still define a Service object with type: LoadBalancer thanks to ServiceLB and still have it accessible since ServiceLB will listen on the NodePorts.
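Something like this, with ServiceLB then exposing it on the VM itself (the port is made up, picked to suggest a non-HTTP protocol):

cat > /var/lib/rancher/k3s/server/manifests/workload-tcp.yaml <<'EOF'
apiVersion: v1
kind: Service
metadata:
  name: workload-tcp
spec:
  type: LoadBalancer             # ServiceLB exposes this outside the cluster
  selector:
    app: workload
  ports:
    - port: 5432
      targetPort: 5432
EOF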
At this point, I think the VM image template is done, and we can shut it down, run
virt-sysprep
and begin distributing this as a release artifact. I think there are still a lot of open questions here. For example, this VM image will have a Kubernetes API server listening on a non-localhost interface, so maybe that’s not desirable if Kubernetes is meant to be an invisible implementation detail here and there’s no chance of expanding beyond a single node.
I think k3s may manage most of the PKI (all of the various k8s components’ certificates) for us, but I’m not 100% sure. So if an end user keeps the system running beyond the certificates’ expiry date and they aren’t automatically rotated, I can’t imagine good things will happen to their deployment, and since I mainly wrote this from the perspective of the end user not being aware that the VM image runs Kubernetes to begin with, this may not be ideal.
I mentioned above the lack of automatic certificate rotation for the workload. For example, if we deployed this at a client’s business or something, they may have their own certs. If the end user doesn’t mind the Kubernetes of it all, they could install cert-manager, or manually apply a TLS secret (crt/key) into the namespace where we deployed the workload that depends on it; the Ingress object then needs to be updated to reference it.
Also, this has no provisions at all for upgrading k3s itself lol.
… and probably many more, but it was a fun thought experiment before I went to lunch. I might try this out in my homelab later.
[1] https://docs.k3s.io/installation/packaged-components#auto-deploying-manifests-addons
[2] https://docs.k3s.io/helm
[3] https://docs.k3s.io/installation/airgap#prepare-the-images-directory-and-airgap-image-tarball
[4] https://docs.docker.com/reference/cli/docker/image/save/
[5] https://docs.k3s.io/storage#setting-up-the-local-storage-provider
[6] https://github.com/rancher/local-path-provisioner/blob/master/README.md#customize-the-configmap
[7] https://kubernetes.io/docs/concepts/storage/persistent-volumes/#persistent-volumes
[8] https://kubernetes.io/docs/concepts/storage/persistent-volumes/#persistentvolumeclaims
[9] https://docs.k3s.io/networking/networking-services?_highlight=ingress#traefik-ingress-controller
[10] https://docs.k3s.io/networking/networking-services?_highlight=service#service-load-balancer
Nowhere in this post do they need what Kubernetes was actually built to do: run a distributed consensus algorithm non-stop to efficiently pack containers onto a cluster of machines. That’s why you see people trying to avoid it.
Needing multiple servers does not mean you automatically need etcd either. We built apps for decades with a shared nothing architecture. You probably don’t want your tasks automatically migrating between machines every time the network drops out. If you need a strong guarantee of only one task running at a time, relying on the orchestrator is probably not a good idea either.
Yes, K8s solves a lot of these other problems too, but it is not the only solution. There are plenty of valid use cases that do not require scheduling and orchestration across a cluster of machines, including the one in the article.
Scheduling is an important feature, but even more fundamentally Kubernetes is ultimately a reconciliation platform and a collection of controllers for common infrastructure concerns like load balancing, deployment automation, service discovery, networking, etc. Many people use Kubernetes even if they don’t need scheduling specifically because the value proposition is much larger.
https://lobste.rs/s/vajg5m/what_is_boring_technology_solution_for
I’m being attacked 😅
Just kidding, this is great thanks for sharing OP :)
Haha you are my inspiration, my muse :)
Thing is you don’t replace a pile of deployment scripts with Kubernetes. You replace it with another pile of scripts/configuration that runs on top of Kubernetes. There’s no universal answer for which pile is easier to build/maintain.
I get that this is trying to parody ‘dear friend, you have built a compiler’ but what it misses is that the original is about trying to convince you to be honest about the domain you’re operating in and design your system accordingly. It’s not about saying “actually you should never try to build a compiler in the first place, you should just wrangle the problem into something that an existing third-party compiler can handle”.
I have seen at least 4 different K8s-based teams. Not only do they spend more on infrastructure in terms of bills, but the human cost of running a K8s infrastructure is at least 4-8 hours a week of a mid-level engineer’s time.
What’s the alternative? Google cloud run, AWS fargate, render.com, Railway app and several others.
Unless you are an infrastructure company or have 50+ engineers, K8s is a distraction.
Exactly, can’t agree more… This post reminds me of the pre-k8s era: AWS struggling to deliver ECS, everyone creating tooling to handle containers, every mid-size company building their own PaaS… “Let’s use CoreOS… try Deis… damn, wiped the entire environment, Rancher will solve everything now…!” etc. My team at the time was happy using the old drone.io deploying onto hardened EC2 instances, relying on a couple of scripts handling the daemons and health checking: faster deployments, high availability… Most of the other apps ran on Heroku.
Now containers are everywhere, we have infinite tools to manage them, and k8s became the cloud de facto standard; consulting companies are happy and devs are sad… There is a huge space between a couple of scripts and k8s, and we need to analyze the entire context. IMHO, if you don’t have the budget for an entire SRE/DevOps team, weeks or months for planning/provisioning, and you just run stateless apps, even managed k8s (EKS, GKE, AKS) does not make any sense. As you said, you can achieve high availability using ultra-managed solutions which run on top of k8s (also appengine flex, fly.io) or a combination of other things; we also have nice tools for self-hosting like kamal and coolify, and other options like nanovms or unikraft.
btw, I’m a former SRE at a payment gateway/acquirer in Brazil (valued at $2.15B), responsible for hundreds of thousands of users and PoS terminals connected to AppEngine clusters compliant with PCI-DSS, managed by a small team. So, yes, I have some idea of what I’m talking about…
Yes, if you’re not pushing more traffic than my M2 laptop could also do on a rainy day, you do not need Kubernetes.
I have served 100qps+ on Google Cloud Run. Not huge by FAANG standards but still more than what most startups need.
I looked at App Runner, but it would be hell to have 20-30 ‘services’ talk to each other over that, and then we’d be stuck in a dead end. We are moving everything to Kubernetes because with a vanilla EKS setup it just works (a surprisingly low amount of headache) and it offers customisability into the far heavens for when we need it.
How will you deal with 20-30 services talking to each other on K8s? Isn’t complexity the same?
Trying to fit the format closely removes a lot of nuance. I promise I’m only half serious, as someone with my own fair share of effective non-Kubernetes setups.
I felt that the original “Dear sir, you have built a compiler” was a lot of fun to read, even though the arguments felt a little weak to me (which can happen if entertainment is prioritized over persuasion). But I was surprised to see so much agreement in the comments.
My rendition is the same way. It’s kind of a just-so story where the challenges and solutions that the reader encounters on their “journey to avoid building a Kubernetes” don’t have strong justifications. But since it offers a simple explanation for a complex phenomenon like the growing adoption of Kubernetes, it can sound more compelling.
The growing adoption of Kubernetes is the result of marketing dollars, devops startups growth hacking, the aging-out of people who cut their teeth on LAMP boxes, the psyop of microservices being the solution to all problems, and developers wanting to larp as Google SREs. I would argue that it isn’t really a complex phenomenon.
Ehm, this sounds a lot like my job. Where we use Kubernetes.
Sometimes I have a problem like I want to be able to press a key on my computer and have something complicated happen like my backlight brightness to increase. I could just write a simple snippet of python to read the brightness, handle doing the log scaling, and then write a new brightness value. Then I would just need to add it to my documentation for machine setup and put that python script with a small install script into source control.
But then I worry, what if tomorrow I want my brightness script to speak to ChatGPT to perform AI scaling on my brightness depending on my mood which it gets from a photo of my face taken with my webcam,
And then I wonder, what if at some point I wanted to be able to customize which AI agent I wanted to use, and also run my own local LLM. What if I want to also take microphone input and the temperature of my seat. What if I want my brightness manager to have a web browser. What then?
Since I am a sensible person I decide that, instead of writing some straightforward little script to handle this, two lines of shell, and a sentence of documentation and handle new features as they come, I am just going to start a new company, hire a CTO, get him to hire a team of developers, to write brightness management SAAS which also hosts your linux computer for you and comes with thin client software which runs on custom hardware with a microphone, webcam and seat temperature sensor because by golly if I go down this path of “simple program” insanity I am bound to end up here anyway and I might as well do it properly rather than “organically” and risk it getting out of hand.
(As an aside, this person has not understood “Dear Sir, You Have Built a Compiler” or has intentionally straw manned it.)
Perfect, that describes precisely one of my clients. Their deployment system turned into a complex monster that won’t scale properly and constantly suffers outages after reboots, deploys, and sudden spikes in traffic. The other day I found myself duct-taping one of the burning issues and it’s just no fun. We are on track to reimplement Kubernetes there…
I wish they had invested half a day to look into Kubernetes and IaC when they had to deploy another site. Now it’s a mess of 25+ deployments and a huge bowl of SaltStack-flavored copypasta.
I love this. Who amongst us has not…?
I would love to extend this to “Dear friend, you have built a PaaS” ;)
Oh wonderful! First it was the “X Considered Harmful” meme. Then it was the “Falsehoods programmers believe about X”. Now it seems we have “Dear Friend, you have built an X.”
“Dear friend, you have built a better Kubernetes” ftfy
I actually wanted to try my hand at building a barebones Kubernetes with just shell scripts but…
I have a very simple web application to deploy, which I have a Hetzner box for, and doing that manually (finagling the .service file, figuring out how to have GitHub Actions copy the right files to the right place, downtime deployments) is such a shitshow. I’ve done both forks. A lot of what is elided in the post, I see with clear eyes.
Use the freaking k8s. Learn the thing. Don’t go on making it hard with complicated operators. Move on to solve real problems.
[Comment removed by author]
Commenting on a story (authored by a Lobster) just to say it trips a pet peeve of yours and you’d rather you’d ignored it?