There are quite a few candidates for your need for centralized logging. Which one should you choose? Will it be Papertrail, Elasticsearch-Fluentd-Kibana stack (EFK), AWS CloudWatch, GCP Stackdriver, Azure Log Analytics, or something else?
When possible and practical, I prefer a centralized logging solution provided as a service, instead of running it inside my clusters. Many things are easier when others are making sure that everything works. If we use Helm to install EFK, it might seem like an easy setup. However, maintenance is far from trivial. Elasticsearch requires a lot of resources. For smaller clusters, compute required to run Elasticsearch alone is likely higher than the price of Papertrail or similar solutions. If I can get a service managed by others for the same price as running the alternative inside my own cluster, service wins most of the time. But, there are a few exceptions. Continue reading →
Dashboards are useless! They are a waste or time. Get Netflix if you want to watch something. It's cheaper than any other option.
I repeated those words on many public occasions. I think that companies exaggerate the need for dashboards. They spend a lot of effort creating a bunch of graphs and put a lot of people in charge of staring at them. As if that's going to help anyone. The main advantage of dashboards is that they are colorful and full of lines, boxes, and labels. Those properties are always an easy sell to decision makers like CTOs and heads of departments. When a software vendor comes to a meeting with decision makers with authority to write checks, he knows that there is no sale without "pretty colors". It does not matter what that software does, but how it looks like. That's why every software company focuses on dashboards.
Think about it. What good is a dashboard for? Are we going to look at graphs until a bar reaches a red line indicating that a critical threshold is reached? If that's the case, why not create an alert that will trigger under the same conditions and stop wasting time staring at screens and waiting until something happens. Instead, we can be doing something more useful (like staring Netflix). Continue reading →
What to automate? Which parts of the delivery process are good candidates? Which applications will benefit from automation? At first, those sound like silly questions. Automate all your repetitive processes. If you think that you'll do the same thing manually more than once, automate it. Why would you waste your creative potential and knowledge by doing things that are much better done by scripts? If we can create robots that assemble cars, we can surely create a Jenkins job that builds software, runs tests, and deploys containers. Car assembly is undoubtedly harder to accomplish than building software. Yet, an average company does not adhere to that logic. Why is that?
Two main culprits prevent many companies from automating all the repetitive process; silos and legacy code. Continue reading →
Kubernetes HorizontalPodAutoscaler (HPA) and Cluster Autoscaler (CA) provide essential, yet very rudimentary mechanisms to scale our Pods and clusters. While they do scaling decently well, they do not solve our need to be alerted when there's something wrong, nor do they provide enough information required to find the cause of an issue. We'll need to expand our setup with additional tools that will allow us to store and query metrics as well as to receive notifications when there is an issue.
If we focus on tools that we can install and manage ourselves, there is very little doubt about what to use. If we look at the list of Cloud Native Computing Foundation (CNCF) projects, only two graduated so far (October 2018). Those are Kubernetes and Prometheus. Given that we are looking for a tool that will allow us to store and query metrics and that Prometheus fulfills that need, the choice is straightforward. That is not to say that there are no other similar tools worth considering. There are, but they are all service based. We might explore them later but, for now, we're focused on those that we can run inside our cluster. So, we'll add Prometheus to the mix and try to answer a simple question. What is Prometheus? Continue reading →
Unlike GKE, EKS does not come with Cluster Autoscaler. We'll have to configure it ourselves. We'll need to add a few tags to the Autoscaling Group dedicated to worker nodes, to put additional permissions to the Role we're using, and to install Cluster Autoscaler. Continue reading →
Knowing that HorizontalPodAutoscaler (HPA) manages auto-scaling of our applications, the question might arise regarding replicas. Should we define them in our Deployments and StatefulSets, or should we rely solely on HPA to manage them? Instead of answering that question directly, we'll explore different combinations and, based on results, define the strategy.
First, let's see how many Pods we have in our cluster right now.
You might not be able to use the same commands since they assume that go-demo-5 application is already running, that the cluster has HPA enabled, that you cloned the code, and a few other things. I presented the outputs so that you can follow the logic without running the same commands.
The output is as follows.
We can see that there are two replicas of the api Deployment, and three replicas of the db StatefulSets. Continue reading →
Kubernetes is probably the biggest project we know. It is vast, and yet many think that after a few weeks or months of reading and practice they know all there is to know about it. It's much bigger than that, and it is growing faster than most of us can follow. How far did you get in Kubernetes adoption?
From my experience, there are four main phases in Kubernetes adoption.
In the first phase, we create a cluster and learn intricacies of Kube API and different types of resources (e.g., Pods, Ingress, Deployments, StatefulSets, and so on). Once we are comfortable with the way Kubernetes works, we start deploying and managing our applications. By the end of this phase, we can shout "look at me, I have things running in my production Kubernetes cluster, and nothing blew up!" I explained most of this phase in The DevOps 2.3 Toolkit: Kubernetes. Continue reading →
Continuous Deployment is about making a decision to do things right. It is a clear goal and a proof that the changes across all levels were successful. The primary obstacle is thinking that we can get there without drastic changes in the application's architecture, processes, and culture. Tools are the least of our problems.
I spend a significant chunk of my time helping companies improve their systems. The most challenging part of my job is going back home after an engagement knowing that the next time I visit the same company, I will discover that there was no significant improvement. I cannot say that is not partly my fault. It certainly is. I might not be very good at what I do. Or maybe I am not good at conveying the right message. Maybe my advice was wrong. There can be many reasons for those failures, and I do admit that they are probably mostly my fault. Still, I cannot shake the feeling that my failures are caused by something else. I think that the root cause is in false expectations. Continue reading →
Let's paint a high-level picture of the continuous delivery pipeline. To be more precise, we'll draw a diagram instead of painting anything. But, before we dive into a continuous delivery diagram, we'll refresh our memory with the one we used before for describing continuous deployment.
The continuous deployment pipeline contains all the steps from pushing a commit to deploying and testing a release in production.
Continuous delivery removes one of the stages from the continuous deployment pipeline. We do NOT want to deploy a new release automatically. Instead, we want humans to decide whether a release should be upgraded in production. If it should, we need to decide when will that happen. Those (human) decisions are, in our case, happening as Git operations. We'll comment on them soon. For now, the important note is that the deploy stage is now removed from pipelines residing in application repositories. Continue reading →