Dashboards are useless! They are a waste or time. Get Netflix if you want to watch something. It's cheaper than any other option.
I repeated those words on many public occasions. I think that companies exaggerate the need for dashboards. They spend a lot of effort creating a bunch of graphs and put a lot of people in charge of staring at them. As if that's going to help anyone. The main advantage of dashboards is that they are colorful and full of lines, boxes, and labels. Those properties are always an easy sell to decision makers like CTOs and heads of departments. When a software vendor comes to a meeting with decision makers with authority to write checks, he knows that there is no sale without "pretty colors". It does not matter what that software does, but how it looks like. That's why every software company focuses on dashboards.
Think about it. What good is a dashboard for? Are we going to look at graphs until a bar reaches a red line indicating that a critical threshold is reached? If that's the case, why not create an alert that will trigger under the same conditions and stop wasting time staring at screens and waiting until something happens. Instead, we can be doing something more useful (like staring Netflix). Continue reading →
Kubernetes HorizontalPodAutoscaler (HPA) and Cluster Autoscaler (CA) provide essential, yet very rudimentary mechanisms to scale our Pods and clusters. While they do scaling decently well, they do not solve our need to be alerted when there's something wrong, nor do they provide enough information required to find the cause of an issue. We'll need to expand our setup with additional tools that will allow us to store and query metrics as well as to receive notifications when there is an issue.
If we focus on tools that we can install and manage ourselves, there is very little doubt about what to use. If we look at the list of Cloud Native Computing Foundation (CNCF) projects, only two graduated so far (October 2018). Those are Kubernetes and Prometheus. Given that we are looking for a tool that will allow us to store and query metrics and that Prometheus fulfills that need, the choice is straightforward. That is not to say that there are no other similar tools worth considering. There are, but they are all service based. We might explore them later but, for now, we're focused on those that we can run inside our cluster. So, we'll add Prometheus to the mix and try to answer a simple question. What is Prometheus? Continue reading →
The article that follows is an extract from the last chapter of The DevOps 2.2 Toolkit: Self-Sufficient Docker Clusters book. It provides a good summary into the processes and tools we explored in the quest to build a self-sufficient cluster that can (mostly) operate without humans.
We split the tasks that a self-sufficient system should perform into those related to services and those oriented towards infrastructure. Even though some of the tools are used in both groups, the division between the two allowed us to keep a clean separation between infrastructure and services running on top of it. Continue reading →
If you liked this article, you might be interested in The DevOps 2.2 Toolkit: Self-Sufficient Docker Clusters book. The book goes beyond Docker and schedulers and tries to explore ways for building self-adaptive and self-healing Docker clusters. If you are a Docker user and want to explore advanced techniques for creating clusters and managing services, this book might be just what you're looking for.
Please get a copy from Amazon, LeanPub, or look for it through your favorite book seller.
Give the book a try and let me know what you think.
In the Forwarding Logs From All Containers Running Anywhere Inside A Docker Swarm Cluster article, we managed to add centralized logging to our cluster. Logs from any container running inside any of the nodes are shipped to a central location. They are stored in ElasticSearch and available through Kibana. However, the fact that we have easy access to all the logs does not mean that we have all the information we would need to debug a problem or prevent it from happening in the first place. We need to complement our logs with the rest of the information about the system. We need much more than what logs alone can provide. Continue reading →