When I finished the last book (The DevOps 2.5 Toolkit: Monitoring, Logging, and Auto-Scaling Kubernetes), I wanted to take a break from writing for a month or two. I thought that would clear my mind and help me decide which subject to tackle next. Those days were horrible. I could not make up my mind. So many cool and useful tech is emerging and being adopted. I was never as undecided as those weeks. Which should be my next step?
I could explore serverless. That's definitely useful, and it might be considered the next big thing. Or I could dive into Istio. It is probably the biggest and the most important project sitting on top of Kubernetes. Or I could tackle some smaller subjects. Kaniko is the missing link in continuous delivery. Building containers might be the only thing we still do on the host level, and Kaniko allows us to move that process inside containers. How about security scanning? It is one of the things that are mandatory in most organizations, and yet I did not include it in "The DevOps 2.4 Toolkit: Continuous Deployment To Kubernetes". Then there is skaffold, prow, KNative, and quite a few other tools that are becoming stable and very useful. Continue reading →
What do we do in Kubernetes after we master deployments and automate all the processes? We dive into monitoring, logging, auto-scaling, and other topics aimed at making our cluster resilient, self-sufficient, and self-adaptive. Continue reading →
There are quite a few candidates for your need for centralized logging. Which one should you choose? Will it be Papertrail, Elasticsearch-Fluentd-Kibana stack (EFK), AWS CloudWatch, GCP Stackdriver, Azure Log Analytics, or something else?
When possible and practical, I prefer a centralized logging solution provided as a service, instead of running it inside my clusters. Many things are easier when others are making sure that everything works. If we use Helm to install EFK, it might seem like an easy setup. However, maintenance is far from trivial. Elasticsearch requires a lot of resources. For smaller clusters, compute required to run Elasticsearch alone is likely higher than the price of Papertrail or similar solutions. If I can get a service managed by others for the same price as running the alternative inside my own cluster, service wins most of the time. But, there are a few exceptions. Continue reading →
Dashboards are useless! They are a waste or time. Get Netflix if you want to watch something. It's cheaper than any other option.
I repeated those words on many public occasions. I think that companies exaggerate the need for dashboards. They spend a lot of effort creating a bunch of graphs and put a lot of people in charge of staring at them. As if that's going to help anyone. The main advantage of dashboards is that they are colorful and full of lines, boxes, and labels. Those properties are always an easy sell to decision makers like CTOs and heads of departments. When a software vendor comes to a meeting with decision makers with authority to write checks, he knows that there is no sale without "pretty colors". It does not matter what that software does, but how it looks like. That's why every software company focuses on dashboards.
Think about it. What good is a dashboard for? Are we going to look at graphs until a bar reaches a red line indicating that a critical threshold is reached? If that's the case, why not create an alert that will trigger under the same conditions and stop wasting time staring at screens and waiting until something happens. Instead, we can be doing something more useful (like staring Netflix). Continue reading →
What to automate? Which parts of the delivery process are good candidates? Which applications will benefit from automation? At first, those sound like silly questions. Automate all your repetitive processes. If you think that you'll do the same thing manually more than once, automate it. Why would you waste your creative potential and knowledge by doing things that are much better done by scripts? If we can create robots that assemble cars, we can surely create a Jenkins job that builds software, runs tests, and deploys containers. Car assembly is undoubtedly harder to accomplish than building software. Yet, an average company does not adhere to that logic. Why is that?
Two main culprits prevent many companies from automating all the repetitive process; silos and legacy code. Continue reading →
Kubernetes HorizontalPodAutoscaler (HPA) and Cluster Autoscaler (CA) provide essential, yet very rudimentary mechanisms to scale our Pods and clusters. While they do scaling decently well, they do not solve our need to be alerted when there's something wrong, nor do they provide enough information required to find the cause of an issue. We'll need to expand our setup with additional tools that will allow us to store and query metrics as well as to receive notifications when there is an issue.
If we focus on tools that we can install and manage ourselves, there is very little doubt about what to use. If we look at the list of Cloud Native Computing Foundation (CNCF) projects, only two graduated so far (October 2018). Those are Kubernetes and Prometheus. Given that we are looking for a tool that will allow us to store and query metrics and that Prometheus fulfills that need, the choice is straightforward. That is not to say that there are no other similar tools worth considering. There are, but they are all service based. We might explore them later but, for now, we're focused on those that we can run inside our cluster. So, we'll add Prometheus to the mix and try to answer a simple question. What is Prometheus? Continue reading →
Unlike GKE, EKS does not come with Cluster Autoscaler. We'll have to configure it ourselves. We'll need to add a few tags to the Autoscaling Group dedicated to worker nodes, to put additional permissions to the Role we're using, and to install Cluster Autoscaler. Continue reading →
Knowing that HorizontalPodAutoscaler (HPA) manages auto-scaling of our applications, the question might arise regarding replicas. Should we define them in our Deployments and StatefulSets, or should we rely solely on HPA to manage them? Instead of answering that question directly, we'll explore different combinations and, based on results, define the strategy.
First, let's see how many Pods we have in our cluster right now.
You might not be able to use the same commands since they assume that go-demo-5 application is already running, that the cluster has HPA enabled, that you cloned the code, and a few other things. I presented the outputs so that you can follow the logic without running the same commands.
The output is as follows.
We can see that there are two replicas of the api Deployment, and three replicas of the db StatefulSets. Continue reading →
Kubernetes is probably the biggest project we know. It is vast, and yet many think that after a few weeks or months of reading and practice they know all there is to know about it. It's much bigger than that, and it is growing faster than most of us can follow. How far did you get in Kubernetes adoption?
From my experience, there are four main phases in Kubernetes adoption.
In the first phase, we create a cluster and learn intricacies of Kube API and different types of resources (e.g., Pods, Ingress, Deployments, StatefulSets, and so on). Once we are comfortable with the way Kubernetes works, we start deploying and managing our applications. By the end of this phase, we can shout "look at me, I have things running in my production Kubernetes cluster, and nothing blew up!" I explained most of this phase in The DevOps 2.3 Toolkit: Kubernetes. Continue reading →