Setting Up Cluster Autoscaler In EKS

Unlike GKE, EKS does not come with Cluster Autoscaler, so we’ll have to configure it ourselves. We’ll need to add a few tags to the Autoscaling Group dedicated to worker nodes, add permissions to the IAM role used by those nodes, and install Cluster Autoscaler.

Let’s get going.

The commands that follow assume that you created an EKS cluster using eksctl and that you have the environment variables AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY, and AWS_DEFAULT_REGION defined. If you do not have such a cluster, you can create one using the eks-ca.sh Gist. Alternatively, you can create the cluster without eksctl, but the commands that follow might need to be slightly modified. You’ll also need to clone the vfarcic/k8s-specs repository (e.g., git clone https://github.com/vfarcic/k8s-specs) and enter the cloned folder (e.g., cd k8s-specs). The repository contains the definitions we’ll use in this article. The commands also rely on the aws CLI, jq, helm, and kubectl.
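If you’d like to fail fast when a prerequisite is missing, a small preflight check helps. What follows is a minimal sketch; require_env and require_cmd are hypothetical helper names, not something from the k8s-specs repository, and the tool list in the usage comment assumes the aws CLI, jq, helm, and kubectl used throughout this article.

[code lang=bash]
#!/usr/bin/env bash
# Hypothetical preflight helpers (not part of k8s-specs).

# require_env VAR...: succeed only if every named variable is exported and
# non-empty. printenv sees exported variables only, which are also the
# only ones the AWS CLI can read from its environment.
require_env() {
  local var missing=0
  for var in "$@"; do
    if [ -z "$(printenv "$var")" ]; then
      echo "missing environment variable: $var" >&2
      missing=1
    fi
  done
  return "$missing"
}

# require_cmd CMD...: succeed only if every named command can be found.
require_cmd() {
  local cmd missing=0
  for cmd in "$@"; do
    if ! command -v "$cmd" > /dev/null 2>&1; then
      echo "missing command: $cmd" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Usage:
#   require_env AWS_ACCESS_KEY_ID AWS_SECRET_ACCESS_KEY AWS_DEFAULT_REGION \
#     && require_cmd aws jq helm kubectl
[/code]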

We’ll add a few tags to the Autoscaling Group dedicated to worker nodes. To do that, we need to discover the name of the group. Since we created the cluster using eksctl, names follow a pattern that we can use to filter the results. If, on the other hand, you created your EKS cluster without eksctl, the logic stays the same, even though the commands might differ slightly.

First, we’ll retrieve the list of AWS Autoscaling Groups and filter the result with jq so that only the name of the matching group is returned.

[code lang=bash]
export NAME=devops25

ASG_NAME=$(aws autoscaling \
    describe-auto-scaling-groups \
    | jq -r ".AutoScalingGroups[] \
    | select(.AutoScalingGroupName \
    | startswith(\"eksctl-$NAME-nodegroup\")) \
    .AutoScalingGroupName")

echo $ASG_NAME
[/code]

The output of the last command should be similar to the one that follows.

[code lang=text]
eksctl-devops25-nodegroup-0-NodeGroup-1KWSL5SEH9L1Y
[/code]

We stored the name of the cluster in the environment variable NAME. Next, we retrieved the list of all the groups and filtered the output with jq so that only those with names that start with eksctl-$NAME-nodegroup are returned. The same jq command then extracted the AutoScalingGroupName field, and we stored the result in the environment variable ASG_NAME. The last command outputs the group name so that we can confirm (visually) that it looks correct.
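One thing to watch out for is that startswith matches every group whose name begins with the prefix, so a cluster with more than one eksctl node group (e.g., nodegroup-0 and nodegroup-1) would leave several names, one per line, in ASG_NAME. The sketch below, built around a hypothetical require_single helper that is not part of k8s-specs, is one way to confirm that exactly one name came back before we rely on it.

[code lang=bash]
#!/usr/bin/env bash
# Hypothetical helper (not part of k8s-specs).
# require_single LABEL VALUE: print the name and succeed only if VALUE
# holds exactly one non-empty line; otherwise report the count and fail.
require_single() {
  local label=$1 value=$2 line found="" count=0
  while IFS= read -r line; do
    if [ -n "$line" ]; then
      found=$line
      count=$((count + 1))
    fi
  done <<< "$value"
  if [ "$count" -ne 1 ]; then
    echo "expected exactly one $label, found $count" >&2
    return 1
  fi
  printf '%s\n' "$found"
}

# Usage, right after the lookup above; stop if it prints an error.
# The same check works for IAM_ROLE later on.
#   ASG_NAME=$(require_single "Autoscaling Group" "$ASG_NAME")
[/code]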

Next, we’ll add a few tags to the group. With auto-discovery enabled, Cluster Autoscaler manages the groups that have both the k8s.io/cluster-autoscaler/enabled and kubernetes.io/cluster/[NAME_OF_THE_CLUSTER] tags. So, all we have to do to tell it which group to use is to add those tags.

[code lang=bash]
aws autoscaling \
    create-or-update-tags \
    --tags \
    ResourceId=$ASG_NAME,ResourceType=auto-scaling-group,Key=k8s.io/cluster-autoscaler/enabled,Value=true,PropagateAtLaunch=true \
    ResourceId=$ASG_NAME,ResourceType=auto-scaling-group,Key=kubernetes.io/cluster/$NAME,Value=true,PropagateAtLaunch=true
[/code]
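The two tag specifications above differ only in their Key. If you need to tag more groups, or prefer not to paste long lines, a sketch like the one below builds the same specifications from the group and cluster names. autoscaler_tag_specs is a hypothetical helper, not part of the AWS CLI or k8s-specs.

[code lang=bash]
#!/usr/bin/env bash
# Hypothetical helper (not part of k8s-specs).
# autoscaler_tag_specs ASG_NAME CLUSTER_NAME: print, one per line, the two
# tag specifications passed to create-or-update-tags above.
autoscaler_tag_specs() {
  local asg=$1 cluster=$2 key
  for key in "k8s.io/cluster-autoscaler/enabled" "kubernetes.io/cluster/$cluster"; do
    printf 'ResourceId=%s,ResourceType=auto-scaling-group,Key=%s,Value=true,PropagateAtLaunch=true\n' \
      "$asg" "$key"
  done
}

# Usage: the specs contain no spaces, so unquoted substitution splits them
# into one argument each.
#   aws autoscaling create-or-update-tags \
#     --tags $(autoscaler_tag_specs "$ASG_NAME" "$NAME")
# To confirm the result, list the group's tags:
#   aws autoscaling describe-tags \
#     --filters Name=auto-scaling-group,Values="$ASG_NAME"
[/code]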

The last change we need to make in AWS is to add a few permissions to the role created through eksctl. Just as with the Autoscaling Group, we do not know the name of the role, but we do know the pattern used to create it. Therefore, we’ll retrieve the name of the role before we add a new policy to it.

[code lang=bash]
IAM_ROLE=$(aws iam list-roles \
    | jq -r ".Roles[] \
    | select(.RoleName \
    | startswith(\"eksctl-$NAME-nodegroup-0-NodeInstanceRole\")) \
    .RoleName")

echo $IAM_ROLE
[/code]

The output of the last command should be similar to the one that follows.

[code lang=text]
eksctl-devops25-nodegroup-0-NodeInstanceRole-UU6CKXYESUES
[/code]

We listed all the roles, and we used jq to filter the output so that only the one with the name that starts with eksctl-$NAME-nodegroup-0-NodeInstanceRole is returned. Once we filtered the roles, we retrieved the RoleName and stored it in the environment variable IAM_ROLE.

Next, we need a JSON document that describes the new policy. I already prepared one, so let’s take a quick look at it.

[code lang=bash]
cat scaling/eks-autoscaling-policy.json
[/code]

The output is as follows.

[code lang=text]
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "autoscaling:DescribeAutoScalingGroups",
                "autoscaling:DescribeAutoScalingInstances",
                "autoscaling:DescribeLaunchConfigurations",
                "autoscaling:DescribeTags",
                "autoscaling:SetDesiredCapacity",
                "autoscaling:TerminateInstanceInAutoScalingGroup"
            ],
            "Resource": "*"
        }
    ]
}
[/code]

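If you’re familiar with AWS (I hope you are), that policy should be straightforward. It allows whatever runs on the worker nodes (Cluster Autoscaler, in our case) to describe Autoscaling Groups, their instances, launch configurations, and tags, to change a group’s desired capacity, and to terminate instances in a group.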
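If you did not clone the repository, you can recreate the same policy locally. The sketch below writes the JSON shown above to a file of your choice through a heredoc; write_autoscaling_policy and the file name in the usage comment are hypothetical, not part of k8s-specs.

[code lang=bash]
#!/usr/bin/env bash
# Hypothetical helper (not part of k8s-specs).
# write_autoscaling_policy PATH: write the Cluster Autoscaler policy shown
# above to PATH. The quoted 'EOF' keeps the shell from expanding anything.
write_autoscaling_policy() {
  cat > "$1" <<'EOF'
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "autoscaling:DescribeAutoScalingGroups",
                "autoscaling:DescribeAutoScalingInstances",
                "autoscaling:DescribeLaunchConfigurations",
                "autoscaling:DescribeTags",
                "autoscaling:SetDesiredCapacity",
                "autoscaling:TerminateInstanceInAutoScalingGroup"
            ],
            "Resource": "*"
        }
    ]
}
EOF
}

# Usage (then pass file://eks-autoscaling-policy.json to --policy-document):
#   write_autoscaling_policy eks-autoscaling-policy.json
[/code]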

Finally, we can attach the new policy to the role as an inline policy.

[code lang=bash]
aws iam put-role-policy \
    --role-name $IAM_ROLE \
    --policy-name $NAME-AutoScaling \
    --policy-document file://scaling/eks-autoscaling-policy.json
[/code]

Now that we have added the required tags to the Autoscaling Group and created the additional permissions that allow Kubernetes to interact with the group, we can install the Cluster Autoscaler Helm chart. Note that the command uses Helm 2 syntax, in which the release name is set with --name.

[code lang=bash]
helm install stable/cluster-autoscaler \
    --name aws-cluster-autoscaler \
    --namespace kube-system \
    --set autoDiscovery.clusterName=$NAME \
    --set awsRegion=$AWS_DEFAULT_REGION \
    --set sslCertPath=/etc/kubernetes/pki/ca.crt \
    --set rbac.create=true

kubectl -n kube-system \
    rollout status \
    deployment aws-cluster-autoscaler
[/code]

Once the Deployment is rolled out, the autoscaler should be fully operational. Try it out by deploying more Pods than your cluster can handle; a new node should be added a few minutes later. Similarly, nodes will be removed once the cluster stays underutilized for a while.
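Waiting for the new node can be scripted as well. The sketch below assumes a hypothetical wait_until helper (not part of k8s-specs) that reruns a check until it succeeds or a timeout expires; the node-count check in the usage comment assumes your cluster starts with two nodes, so adjust the number to match yours.

[code lang=bash]
#!/usr/bin/env bash
# Hypothetical helper (not part of k8s-specs).
# wait_until TIMEOUT INTERVAL CMD [ARGS...]: rerun CMD every INTERVAL
# seconds until it succeeds; give up once TIMEOUT seconds have elapsed.
wait_until() {
  local timeout=$1 interval=$2
  shift 2
  # SECONDS is bash's built-in counter of seconds since the shell started.
  local deadline=$((SECONDS + timeout))
  until "$@"; do
    if [ "$SECONDS" -ge "$deadline" ]; then
      echo "timed out after ${timeout}s waiting for: $*" >&2
      return 1
    fi
    sleep "$interval"
  done
}

# Usage (the starting count of 2 nodes is an assumption): after scaling a
# test Deployment past the cluster's capacity, wait up to ten minutes for
# a third node to register.
#   wait_until 600 15 \
#     bash -c '[ "$(kubectl get nodes --no-headers | grep -c .)" -gt 2 ]'
[/code]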

The DevOps 2.5 Toolkit: Monitoring, Logging, and Auto-Scaling Kubernetes

The article you just read is an extract from The DevOps 2.5 Toolkit: Monitoring, Logging, and Auto-Scaling Kubernetes.

What do we do in Kubernetes after we master deployments and automate all the processes? We dive into monitoring, logging, auto-scaling, and other topics aimed at making our cluster resilient, self-sufficient, and self-adaptive.

Kubernetes is probably the biggest project we know. It is vast, and yet many think that after a few weeks or months of reading and practice they know all there is to know about it. It’s much bigger than that, and it is growing faster than most of us can follow. How far did you get in Kubernetes adoption?

In my experience, there are four main phases in Kubernetes adoption.

In the first phase, we create a cluster and learn the intricacies of the Kube API and different types of resources (e.g., Pods, Ingress, Deployments, StatefulSets, and so on). Once we are comfortable with the way Kubernetes works, we start deploying and managing our applications. By the end of this phase, we can shout “look at me, I have things running in my production Kubernetes cluster, and nothing blew up!” I explained most of this phase in The DevOps 2.3 Toolkit: Kubernetes.

The second phase is often automation. Once we become comfortable with how Kubernetes works and we are running production loads, we can move to automation. We often adopt some form of continuous delivery (CD) or continuous deployment (CDP). We create Pods with the tools we need, we build our software and container images, we run tests, and we deploy to production. When we’re finished, most of our processes are automated, and we do not perform manual deployments to Kubernetes anymore. We can say “things are working, and I’m not even touching my keyboard.” I did my best to provide some insights into CD and CDP with Kubernetes in The DevOps 2.4 Toolkit: Continuous Deployment To Kubernetes.

The third phase is in many cases related to monitoring, alerting, logging, and scaling. The fact that we can run (almost) anything in Kubernetes and that it will do its best to make it fault tolerant and highly available does not mean that our applications and clusters are bulletproof. We need to monitor the cluster, and we need alerts that will notify us of potential issues. When we do discover that there is a problem, we need to be able to query metrics and logs of the whole system. We can fix an issue only once we know what the root cause is. In highly dynamic distributed systems like Kubernetes, that is not as easy as it looks.

Further on, we need to learn how to scale (and de-scale) everything. The number of Pods of an application should change over time to accommodate fluctuations in traffic and demand. Nodes should scale as well to fulfill the needs of our applications.

Kubernetes already has the tools that provide metrics and visibility into logs. It allows us to create auto-scaling rules. Yet, we might discover that Kubernetes alone is not enough and that we might need to extend our system with additional processes and tools. This phase is the subject of this book. By the time you finish reading it, you’ll be able to say that your clusters and applications are truly dynamic and resilient and that they require minimal manual involvement. We’ll try to make our system self-adaptive.

I mentioned the fourth phase. That, dear reader, is everything else. The last phase is mostly about keeping up with all the other goodies Kubernetes provides. It’s about following its roadmap and adapting our processes to get the benefits of each new release.

Buy it now from Amazon or Leanpub, or look for it through your favorite bookseller.
