Kubernetes' Cluster Autoscaler is a prime example of how much managed Kubernetes offerings differ from one another. We'll use it to compare the three major Kubernetes-as-a-Service providers.
I'll limit the comparison between the vendors only to the topics related to Cluster Autoscaling.
GKE is a no-brainer for those who can use Google to host their cluster. It is the most mature and feature-rich platform. They started Google Kubernetes Engine (GKE) long before anyone else. When we combine their head start with the fact that they are the major contributor to Kubernetes and hence have the most experience, it comes as no surprise that their offering is way above the others.
When using GKE, everything is baked into the cluster. That includes Cluster Autoscaler. We do not have to execute any additional commands. It simply works out of the box. Our cluster scales up and down without the need for our involvement, as long as we specify the --enable-autoscaling argument when creating the cluster. On top of that, GKE brings up new nodes and joins them to the cluster faster than the other providers. If there is a need to expand the cluster, new nodes are added within a minute.
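To make the "single argument" point concrete, here is a minimal sketch that assembles (but does not run) a gcloud command for creating a cluster with autoscaling enabled. The cluster name, zone, and node counts are placeholders of mine, not values from the text. The --min-nodes and --max-nodes arguments set the bounds within which the autoscaler is allowed to work.

```python
import shlex


def gke_create_command(name, zone, min_nodes, max_nodes):
    """Build a gcloud command that creates a GKE cluster with autoscaling.

    All argument values are illustrative placeholders.
    """
    if not 0 <= min_nodes <= max_nodes:
        raise ValueError("expected 0 <= min_nodes <= max_nodes")
    return [
        "gcloud", "container", "clusters", "create", name,
        "--zone", zone,
        "--enable-autoscaling",
        "--min-nodes", str(min_nodes),
        "--max-nodes", str(max_nodes),
    ]


if __name__ == "__main__":
    # Print the command instead of executing it; running it would create
    # billable resources in a real Google Cloud project.
    print(shlex.join(gke_create_command("devops25", "us-east1-b", 1, 3)))
```

Printing the command rather than executing it keeps the sketch safe to run anywhere. Copy the output into a terminal only if you do want a cluster.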
There are many other reasons I would recommend GKE, but that's not the subject right now. Still, Cluster Autoscaling alone should be the proof that GKE is the solution others are trying to follow.
Amazon's Elastic Container Service for Kubernetes (EKS) is somewhere in the middle. Cluster Autoscaling works, but it's not baked in. It's as if Amazon did not think that scaling clusters is important and left it as an optional add-on.
EKS installation is too complicated (when compared to GKE and AKS), but thanks to eksctl from the folks at Weaveworks, we have that more or less solved. Still, eksctl leaves a lot to be desired. For example, we cannot use it to upgrade our clusters.
The reason I'm mentioning eksctl in the context of auto-scaling lies in the Cluster Autoscaler setup.
I cannot say that setting up Cluster Autoscaler in EKS is hard. It's not. And yet, it's not as simple as it should be. We have to tag the Autoscaling Group, add privileges to the role, and install Cluster Autoscaler. That's not much. Still, those steps are much more complicated than they should be. We can compare it with GKE. Google understands that auto-scaling Kubernetes clusters is a must, and it provides that with a single argument (or a checkbox if you prefer UIs). AWS, on the other hand, did not deem auto-scaling important enough to give us that much simplicity. On top of the unnecessary setup in EKS, the fact is that AWS added the internal pieces required for scaling only recently. Metrics Server has been usable only since September 2018.
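The first two of those steps can be sketched as plain data. What follows is a minimal illustration, not the setup used in the book. The tag keys are the ones Cluster Autoscaler's auto-discovery commonly matches on, and the IAM actions are a commonly cited core set. Both are assumptions on my part and may be incomplete for a given Cluster Autoscaler version. The cluster name is a placeholder.

```python
import json


def autoscaler_asg_tags(cluster_name):
    """Tags to put on the Autoscaling Group so it can be auto-discovered.

    Auto-discovery matches on tag keys; the values here are conventional.
    """
    return {
        "k8s.io/cluster-autoscaler/enabled": "true",
        f"k8s.io/cluster-autoscaler/{cluster_name}": "owned",
    }


def autoscaler_iam_policy():
    """IAM policy document with the privileges to add to the worker role."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": [
                    "autoscaling:DescribeAutoScalingGroups",
                    "autoscaling:DescribeAutoScalingInstances",
                    "autoscaling:DescribeLaunchConfigurations",
                    "autoscaling:DescribeTags",
                    "autoscaling:SetDesiredCapacity",
                    "autoscaling:TerminateInstanceInAutoScalingGroup",
                ],
                "Resource": "*",
            }
        ],
    }


if __name__ == "__main__":
    # Step 1: tags for the Autoscaling Group ("devops25" is a placeholder).
    print(json.dumps(autoscaler_asg_tags("devops25"), indent=2))
    # Step 2: privileges for the role used by the worker nodes.
    print(json.dumps(autoscaler_iam_policy(), indent=2))
    # Step 3, installing Cluster Autoscaler itself, is left out of this sketch.
```

Even in this reduced form, the contrast with a single argument in GKE is visible: two separate AWS resources need to be touched before Cluster Autoscaler is installed at all.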
My suspicion is that AWS has no interest in making EKS great by itself and that they are saving the improvements for Fargate. If that's the case (we'll find that out soon), I'd characterize it as "sneaky business". Kubernetes has all the tools required for scaling Pods and nodes, and they are designed to be extensible. The choice not to include Cluster Autoscaler as an integral part of their managed Kubernetes service is a big minus.
What can I say about Azure Kubernetes Service (AKS)? I admire the improvements Microsoft made in Azure as well as their contributions to Kubernetes. They do recognize the need for a good managed Kubernetes offering. Yet, Cluster Autoscaler is still in beta. Sometimes it works, more often than not it doesn't. Even when it does work as it should, it is slow. Waiting for a new node to join the cluster is an exercise in patience.
The steps required to install Cluster Autoscaler in AKS are sort of ridiculous. We are required to define a myriad of arguments that should already be available inside the cluster. It should know the name of the cluster, the resource group, and so on and so forth. And yet, it doesn't. At least, that's the case at the time of this writing (October 2018). I hope that both the process and the experience will improve over time. For now, from the perspective of auto-scaling, AKS is at the tail of the pack.
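To give a feel for that "myriad of arguments", here is a small sketch that checks which settings are still missing before Cluster Autoscaler could be installed. The key names are my assumption, based on the settings the Azure flavor of Cluster Autoscaler has typically asked for. They are not taken from this text, and the sample values are placeholders.

```python
# Settings the Azure flavor of Cluster Autoscaler has commonly been given
# explicitly. The key names are an assumption, not taken from the article.
REQUIRED_AKS_SETTINGS = (
    "ClientID",
    "ClientSecret",
    "SubscriptionID",
    "TenantID",
    "ResourceGroup",
    "NodeResourceGroup",
    "ClusterName",
    "VMType",
)


def missing_settings(settings):
    """Return the required keys that are absent or empty, in declared order."""
    return [key for key in REQUIRED_AKS_SETTINGS if not settings.get(key)]


if __name__ == "__main__":
    # Knowing only the cluster name and resource group still leaves
    # most of the list to be looked up and passed in by hand.
    partial = {"ClusterName": "devops25", "ResourceGroup": "devops25-group"}
    print(missing_settings(partial))
```

Most of these values describe the cluster the autoscaler is already running in, which is exactly why having to supply them by hand feels unnecessary.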
You might argue that the complexity of the setup does not really matter. You'd be right. What matters is how reliable Cluster Autoscaling is and how fast it adds new nodes to the cluster. Still, the situation is the same. GKE leads in both reliability and speed. EKS is a close second, while AKS is trailing behind.
The DevOps 2.5 Toolkit: Monitoring, Logging, and Auto-Scaling Kubernetes
The article you just read is an extract from The DevOps 2.5 Toolkit: Monitoring, Logging, and Auto-Scaling Kubernetes.
What do we do in Kubernetes after we master deployments and automate all the processes? We dive into monitoring, logging, auto-scaling, and other topics aimed at making our cluster resilient, self-sufficient, and self-adaptive.
Kubernetes is probably the biggest project we know. It is vast, and yet many think that after a few weeks or months of reading and practice they know all there is to know about it. It's much bigger than that, and it is growing faster than most of us can follow. How far did you get in Kubernetes adoption?
From my experience, there are four main phases in Kubernetes adoption.
In the first phase, we create a cluster and learn the intricacies of the Kube API and different types of resources (e.g., Pods, Ingress, Deployments, StatefulSets, and so on). Once we are comfortable with the way Kubernetes works, we start deploying and managing our applications. By the end of this phase, we can shout "look at me, I have things running in my production Kubernetes cluster, and nothing blew up!" I explained most of this phase in The DevOps 2.3 Toolkit: Kubernetes.
The second phase is often automation. Once we become comfortable with how Kubernetes works and we are running production loads, we can move to automation. We often adopt some form of continuous delivery (CD) or continuous deployment (CDP). We create Pods with the tools we need, we build our software and container images, we run tests, and we deploy to production. When we're finished, most of our processes are automated, and we do not perform manual deployments to Kubernetes anymore. We can say "things are working, and I'm not even touching my keyboard." I did my best to provide some insights into CD and CDP with Kubernetes in The DevOps 2.4 Toolkit: Continuous Deployment To Kubernetes.
The third phase is in many cases related to monitoring, alerting, logging, and scaling. The fact that we can run (almost) anything in Kubernetes and that it will do its best to make it fault tolerant and highly available does not mean that our applications and clusters are bulletproof. We need to monitor the cluster, and we need alerts that will notify us of potential issues. When we do discover that there is a problem, we need to be able to query metrics and logs of the whole system. We can fix an issue only once we know what the root cause is. In highly dynamic distributed systems like Kubernetes, that is not as easy as it looks.
Further on, we need to learn how to scale (and de-scale) everything. The number of Pods of an application should change over time to accommodate fluctuations in traffic and demand. Nodes should scale as well to fulfill the needs of our applications.
Kubernetes already has the tools that provide metrics and visibility into logs. It allows us to create auto-scaling rules. Yet, we might discover that Kubernetes alone is not enough and that we might need to extend our system with additional processes and tools. This phase is the subject of this book. By the time you finish reading it, you'll be able to say that your clusters and applications are truly dynamic and resilient and that they require minimal manual involvement. We'll try to make our system self-adaptive.
I mentioned the fourth phase. That, dear reader, is everything else. The last phase is mostly about keeping up with all the other goodies Kubernetes provides. It's about following its roadmap and adapting our processes to get the benefits of each new release.