A Quick Introduction To Prometheus And Alertmanager

Kubernetes HorizontalPodAutoscaler (HPA) and Cluster Autoscaler (CA) provide essential, yet very rudimentary mechanisms to scale our Pods and clusters. While they do scaling decently well, they do not solve our need to be alerted when there's something wrong, nor do they provide enough information required to find the cause of an issue. We'll need to expand our setup with additional tools that will allow us to store and query metrics as well as to receive notifications when there is an issue.

If we focus on tools that we can install and manage ourselves, there is very little doubt about what to use. If we look at the list of Cloud Native Computing Foundation (CNCF) projects, only two have graduated so far (October 2018). Those are Kubernetes and Prometheus. Given that we are looking for a tool that will allow us to store and query metrics and that Prometheus fulfills that need, the choice is straightforward. That is not to say that there are no other similar tools worth considering. There are, but they are all service-based. We might explore them later but, for now, we're focused on those that we can run inside our cluster. So, we'll add Prometheus to the mix and try to answer a simple question. What is Prometheus?

Prometheus is a database (of sorts) designed to fetch (pull) and store highly dimensional time series data. Time series are identified by a metric name and a set of key-value pairs. Data is stored both in memory and on disk. The former allows fast retrieval of information, while the latter exists for fault tolerance.
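To make the idea of a time series identity more concrete, here is a minimal Python sketch. It is not how Prometheus stores data internally. The TinyStore class, the series_key helper, and the http_requests_total metric with its labels are hypothetical names invented for this illustration. The only point is that the same metric name with a different set of key-value pairs is a different series.

```python
from collections import defaultdict


def series_key(name, labels):
    """Build a hashable identity from a metric name and its labels."""
    # Sorting makes the identity independent of the order of the labels.
    return (name, tuple(sorted(labels.items())))


class TinyStore:
    """Toy in-memory store: one list of (timestamp, value) samples per series."""

    def __init__(self):
        self.series = defaultdict(list)

    def append(self, name, labels, timestamp, value):
        self.series[series_key(name, labels)].append((timestamp, value))


store = TinyStore()
# Hypothetical metric and labels, used only for illustration.
store.append("http_requests_total", {"method": "GET", "code": "200"}, 1, 10.0)
store.append("http_requests_total", {"code": "200", "method": "GET"}, 2, 12.0)
store.append("http_requests_total", {"method": "POST", "code": "200"}, 1, 3.0)
```

The first two samples share a name and a set of labels, so they land in the same series. The third differs in one label value and becomes a series of its own.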

Prometheus' query language allows us to easily find data that can be used both for graphs and, more importantly, for alerting. It does not attempt to provide a "great" visualization experience. For that, it integrates with Grafana.

Unlike most other similar tools, we do not push data to Prometheus. Or, to be more precise, that is not the common way of getting metrics. Instead, Prometheus is a pull-based system that periodically fetches metrics from exporters. There are many third-party exporters we can use. But, in our case, the most crucial exporter is baked into Kubernetes. Prometheus can pull data from an exporter that transforms information from Kube API. Through it, we can fetch (almost) everything we might need. Or, at least, that's where the bulk of the information will be coming from.
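The pull model can be illustrated with a small, self-contained Python sketch. It is not a real exporter and it is not how Prometheus is implemented. MetricsHandler, scrape_once, and the demo_requests_total metric are hypothetical names, and the toy server runs on a random local port only for the duration of a single request. What it shows is the direction of the flow: the application exposes text under /metrics, and the collector is the one that initiates the request.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical metric exposed by the toy exporter.
METRICS_TEXT = (
    "# HELP demo_requests_total Requests handled by the demo application.\n"
    "# TYPE demo_requests_total counter\n"
    "demo_requests_total 42\n"
)


class MetricsHandler(BaseHTTPRequestHandler):
    """Serve the metrics text on /metrics and nothing else."""

    def do_GET(self):
        if self.path != "/metrics":
            self.send_error(404)
            return
        body = METRICS_TEXT.encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; version=0.0.4")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the example quiet


def scrape_once():
    """Start the toy exporter, pull its metrics once, and stop it again."""
    server = HTTPServer(("127.0.0.1", 0), MetricsHandler)  # port 0: any free port
    thread = threading.Thread(target=server.serve_forever, daemon=True)
    thread.start()
    try:
        connection = http.client.HTTPConnection(
            "127.0.0.1", server.server_port, timeout=5
        )
        try:
            # The scraper, not the application, initiates the transfer.
            connection.request("GET", "/metrics")
            return connection.getresponse().read().decode("utf-8")
        finally:
            connection.close()
    finally:
        server.shutdown()
        server.server_close()


if __name__ == "__main__":
    print(scrape_once())
```

A real setup differs in that the scraper repeats the request on a schedule and stores every sample it receives.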

Finally, storing metrics in Prometheus would not be of much use if we are not notified when there's something wrong. Even when we do integrate Prometheus with Grafana, that will only provide us with dashboards. I assume that you have better things to do than to stare at colorful graphs. So, we'll need a way to send alerts from Prometheus to, let's say, Slack. Luckily, Alertmanager allows us just that. It is a separate application maintained by the same community.

We'll see how all those pieces fit together through hands-on exercises. So, let's get going and install Prometheus, Alertmanager, and a few other applications.


For this tutorial, I will assume that you have an operational Kubernetes cluster. Such a cluster should have nginx Ingress, tiller (Helm Server), and Metrics Server (not required for GKE and AKS) up and running. I'll also assume that you have an IP through which you can access that cluster stored in environment variable LB_IP. Normally, that would be the IP of your nginx Ingress Service.

Feel free to use one of the following Gists to meet those requirements if you do not already have a Kubernetes cluster you can use for the exercises that follow.

The Gists are as follows.

  • gke-monitor.sh: GKE with 3 n1-standard-1 worker nodes, nginx Ingress, tiller, and cluster IP stored in environment variable LB_IP.
  • eks-monitor.sh: EKS with 3 t2.small worker nodes, nginx Ingress, tiller, Metrics Server, and cluster IP stored in environment variable LB_IP.
  • aks-monitor.sh: AKS with 3 Standard_B2s worker nodes, nginx Ingress, tiller, and cluster IP stored in environment variable LB_IP.
  • docker-monitor.sh: Docker for Desktop with 2 CPUs, 3 GB RAM, nginx Ingress, tiller, Metrics Server, and cluster IP stored in environment variable LB_IP.
  • minikube-monitor.sh: minikube with 2 CPUs, 3 GB RAM, ingress, storage-provisioner, default-storageclass, and metrics-server addons enabled, tiller, and cluster IP stored in environment variable LB_IP.

Outside of the cluster, you should have kubectl and helm clients installed on your laptop.

Finally, all the examples are using YAML definitions stored in the vfarcic/k8s-specs repository. Please clone it and enter into the k8s-specs directory in your favorite terminal.

A note to Windows users

Git Bash might not be able to use the open command. If that's the case, replace open with echo. As a result, you'll get the full address that should be opened directly in your browser of choice.

We'll install Prometheus and a few other tools using Helm. Prometheus' Helm Chart is maintained as one of the official Charts. You can find more info in the project's README. If you focus on the variables in the Configuration section, you'll notice that there are quite a few things we can tweak. We won't go through all the variables. You can check the official documentation for that. Instead, we'll start with a basic setup, and extend it as our needs increase.

Let's take a look at the variables we'll use as a start.

The output is as follows.

All we're doing for now is defining resources for all five applications we'll install, as well as enabling Ingress with a few annotations that will make sure that we are not redirected to the HTTPS version since we do not have certificates for our ad-hoc domains. We'll dive into the applications that'll be installed later. For now, we'll define the addresses for Prometheus and Alertmanager UIs.

Let's install the Chart.

The command we just executed should be self-explanatory, so we'll jump into the relevant parts of the output.

We can see that the Chart installed one DaemonSet and four Deployments.

The DaemonSet is Node Exporter, and it'll run a Pod on every node of the cluster. It provides node-specific metrics that will be pulled by Prometheus. The second exporter (Kube State Metrics) runs as a single replica Deployment. It fetches data from Kube API and transforms it into a Prometheus-friendly format. The two will provide most of the metrics we'll need. Later on, we might choose to expand them with additional exporters. For now, those two together with metrics fetched directly from Kube API should provide more metrics than we can absorb in a single article.

Further on, we got the Server, which is Prometheus itself. Alertmanager will forward alerts to their destination. Finally, there is Pushgateway that we might explore in one of the following articles.

While waiting for all those apps to become operational, we might explore the flow between them.

Prometheus Server pulls data from exporters. In our case, those are Node Exporter and Kube State Metrics. The job of those exporters is to fetch data from the source and transform it into the Prometheus-friendly format. Node Exporter gets the data from /proc and /sys volumes mounted on the nodes, while Kube State Metrics gets it from Kube API. Metrics are stored internally in Prometheus.

Apart from being able to query that data, we can define alerts. When an alert reaches its threshold, it is forwarded to Alertmanager, which acts as a crossroad. Depending on its internal rules, it can forward those alerts further to various destinations like Slack, email, and HipChat (to name only a few).

The flow of data to and from Prometheus (arrows indicate the direction)

By now, Prometheus Server has probably rolled out. We'll confirm that just in case.

Let's take a look at what is inside the Pod created through the prometheus-server Deployment.

The output, limited to the relevant parts, is as follows.

Besides the container based on the prom/prometheus image, we got another one created from jimmidyson/configmap-reload. The job of the latter is to reload Prometheus whenever we change the configuration stored in a ConfigMap.

Next, we might want to take a look at the prometheus-server ConfigMap, since it stores all the configuration Prometheus needs.

The output, limited to the relevant parts, is as follows.

We can see that the alerts are still empty. We'll change that soon.

Further down is the prometheus.yml config with scrape_configs taking most of the space. We could spend a whole article explaining the current config and the ways we could modify it. We will not do that because the config in front of you is bordering insanity. It's the prime example of how something can be made more complicated than it should be. In most cases, you should keep it as-is. If you do want to fiddle with it, please consult the official documentation.

Next, we'll take a quick look at Prometheus' screens.

The config screen reflects the same information we already saw in the prometheus-server ConfigMap, so we'll move on.

Next, let's take a look at the targets.

That screen contains seven targets, each providing different metrics. Prometheus is periodically pulling data from those targets.

All the outputs and screenshots in this article are taken from AKS. You might see some differences depending on your Kubernetes flavor.

Prometheus' targets screen

A note to AKS users

The kubernetes-apiservers target might be red indicating that Prometheus cannot connect to it. That's OK since we won't use its metrics.

A note to minikube users

The kubernetes-service-endpoints target might have a few sources in red. There's no reason for alarm. Those are not reachable, but that won't affect our exercises.

We cannot find out what each of those targets provides from that screen. We'll try to query the exporters in the same way as Prometheus pulls them. To do that, we'll need to find out the Services through which we can access the exporters.

The output, from AKS, is as follows.

We are interested in prometheus-kube-state-metrics and prometheus-node-exporter since they provide access to data from the exporters we'll use in this article.

Next, we'll create a temporary Pod through which we'll access the data available through the exporters behind those Services.

We created a new Pod based on appropriate/curl. That image serves a single purpose of providing curl. We specified prometheus-node-exporter:9100/metrics as the command, which is equivalent to running curl with that address. As a result, a lot of metrics were output. They are all in the same key/value format with optional labels surrounded by curly braces ({ and }). On top of each metric, there is a HELP entry that explains its function as well as TYPE (e.g., gauge). One of the metrics is as follows.

We can see that it provides Memory information field MemTotal_bytes and that the type is gauge. Below the TYPE is the actual metric with the key (node_memory_MemTotal_bytes) and value 3.878477824e+09.
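To see how little structure that format has, here is a minimal Python sketch that parses such an entry. The input text is reconstructed from the description above rather than copied from a real cluster, and parse_unlabeled is a hypothetical helper that deliberately ignores labels and optional timestamps.

```python
# Reconstructed from the description in the text, not copied from a live exporter.
EXPOSITION = """\
# HELP node_memory_MemTotal_bytes Memory information field MemTotal_bytes.
# TYPE node_memory_MemTotal_bytes gauge
node_memory_MemTotal_bytes 3.878477824e+09
"""


def parse_unlabeled(text):
    """Collect HELP, TYPE, and the value of label-free metrics."""
    metrics = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith("# HELP "):
            name, help_text = line[len("# HELP "):].split(" ", 1)
            metrics.setdefault(name, {})["help"] = help_text
        elif line.startswith("# TYPE "):
            name, metric_type = line[len("# TYPE "):].split(" ", 1)
            metrics.setdefault(name, {})["type"] = metric_type
        elif not line.startswith("#"):
            # A sample line: the key, a space, and the value.
            name, value = line.split(" ", 1)
            metrics.setdefault(name, {})["value"] = float(value)
    return metrics


parsed = parse_unlabeled(EXPOSITION)
```

The value 3.878477824e+09 is simply the number of bytes in scientific notation, which is a bit more than 3.8 GB of total memory on that node.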

Most of the Node Exporter metrics are without labels, so we'll look for an example in the prometheus-kube-state-metrics exporter.

As you can see, the Kube State metrics follow the same pattern as those from the Node Exporter. The major difference is that most of them do have labels. An example is as follows.

That metric represents the time the Deployment prometheus-server was created inside the metrics Namespace.
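A labeled sample can be taken apart in a similar way. The Python sketch below assumes the metric in question is kube_deployment_created with the deployment and namespace labels described above. The numeric value in SAMPLE_LINE is a made-up timestamp, and parse_sample is a hypothetical helper that does not unescape label values or handle optional trailing timestamps.

```python
import re

# Metric name, optional {labels}, whitespace, value.
SAMPLE_RE = re.compile(
    r'^(?P<name>[a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(?P<labels>.*)\})?\s+(?P<value>\S+)$'
)
# A single key="value" pair inside the curly braces.
LABEL_RE = re.compile(r'(?P<key>[a-zA-Z_][a-zA-Z0-9_]*)="(?P<val>(?:[^"\\]|\\.)*)"')

# The labels follow the text; the value is a hypothetical creation time.
SAMPLE_LINE = (
    'kube_deployment_created{deployment="prometheus-server",namespace="metrics"} 1.5e+09'
)


def parse_sample(line):
    """Split a sample line into (name, labels, value)."""
    match = SAMPLE_RE.match(line.strip())
    if match is None:
        raise ValueError(f"not a sample line: {line!r}")
    labels = {
        m.group("key"): m.group("val")
        for m in LABEL_RE.finditer(match.group("labels") or "")
    }
    return match.group("name"), labels, float(match.group("value"))


name, labels, value = parse_sample(SAMPLE_LINE)
```

The labels are what allow us to tell one Deployment from another even though all of them report the same metric name.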

I'll leave it to you to explore those metrics in more detail. We'll use quite a few of them soon.

For now, just remember that with the combination of the metrics coming from the Node Exporter, Kube State Metrics, and those coming from Kubernetes itself, we can cover most of our needs. Or, to be more precise, those provide data required for most of the basic and common use cases.

Next, we'll take a look at the alerts screen.

The screen is empty. Do not despair. We'll get back to that screen quite a few times. The number of alerts will increase as we progress. For now, just remember that's where you can find your alerts.

Finally, we'll open the graph screen.

That is where you'll spend your time debugging issues you'll discover through alerts.

As our first task, we'll try to retrieve information about our nodes. We'll use kube_node_info so let's take a look at its description (help) and its type.

The output, limited to the HELP and TYPE entries, is as follows.

You are likely to see variations between your results and mine. That's normal since our clusters probably have different amounts of resources, my bandwidth might be different, and so on. In some cases, my alerts will fire, and yours won't, or the other way around. I'll do my best to explain my experience and provide screenshots that accompany them. You'll have to compare that with what you see on your screen.

Now, let's try using that metric in Prometheus.

Please type kube_node_info in the expression field.

Click the Execute button to retrieve the values of the kube_node_info metric.

If you check the HELP entry of the kube_node_info, you'll see that it provides information about a cluster node and that it is a gauge. A gauge is a metric that represents a single numerical value that can arbitrarily go up and down. That makes sense for information about nodes since their number can increase or decrease over time.

If we focus on the output, you'll notice that there are as many entries as there are worker nodes in the cluster. The value (1) is useless in this context. Labels, on the other hand, can provide some useful information. For example, in my case, the operating system (os_image) is Ubuntu 16.04.5 LTS. Through that example, we can see that we can use the metrics not only to calculate values (e.g., available memory) but also to get a glimpse into the specifics of our system.

Prometheus' console output of the kube_node_info metric

Let's see if we can get a more meaningful query by combining that metric with one of Prometheus' functions. We'll count the number of worker nodes in our cluster. count is one of Prometheus' aggregation operators.

Please execute the expression count(kube_node_info).

The output should show the total number of worker nodes in your cluster. In my case (AKS) there are 3. At first glance, that might not be very helpful. You might think that you should know without Prometheus how many nodes you have in your cluster. But that might not be true. One of the nodes might have failed and never recovered. That is especially true if you're running your cluster on-prem without scaling groups. Or maybe Cluster Autoscaler increased or decreased the number of nodes. Everything changes over time, either due to failures, through human actions, or through a system that adapts itself. No matter the reasons for volatility, we might want to be notified when something reaches a threshold. We'll use nodes as the first example.
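What the count aggregation does can be mimicked in a few lines of Python. The sketch below is only a model of the idea, not Prometheus code. The node names are made up, the os_image value is the one from my cluster, and count and count_by are hypothetical helpers standing in for the count(...) and count by (...) (...) forms of the operator.

```python
from collections import Counter

# Hypothetical result of the kube_node_info query: one (labels, value) pair per node.
KUBE_NODE_INFO = [
    ({"node": "node-0", "os_image": "Ubuntu 16.04.5 LTS"}, 1.0),
    ({"node": "node-1", "os_image": "Ubuntu 16.04.5 LTS"}, 1.0),
    ({"node": "node-2", "os_image": "Ubuntu 16.04.5 LTS"}, 1.0),
]


def count(vector):
    """Mimic count(...): the number of series, ignoring their values.

    PromQL returns an empty result (not 0) when there are no input series,
    which is modeled here as None.
    """
    return len(vector) if vector else None


def count_by(vector, label):
    """Mimic count by (<label>) (...): the number of series per label value."""
    return dict(Counter(labels.get(label, "") for labels, _ in vector))
```

The values of the individual series (always 1 in this case) play no role. Only the number of series matters, which is why a metric with a "useless" value can still answer a useful question.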

Our mission is to define an alert that will notify us if there are more than three nodes or fewer than one node in the cluster. We'll imagine that those are our limits and that we want to know if the lower or the upper thresholds are reached due to failures or Cluster Autoscaling.

We'll take a look at a new definition of the Prometheus Chart's values. Since the definition is big and it will grow with time, from now on, we'll only look at the differences.

The output is as follows.

We added a new entry serverFiles.alerts. If you check Prometheus' Helm documentation, you'll see that it allows us to define alerts (hence the name). Inside it, we're using the "standard" Prometheus syntax for defining alerts.

Please consult the Alerting Rules documentation for more info about the syntax.

We defined only one group of rules called nodes. Inside it are two rules. The first one (TooManyNodes) will notify us if there are more than 3 nodes for more than 15 minutes. The other (TooFewNodes) will do the opposite. It'll tell us if there are no nodes (fewer than one).

From now on, I won't comment (much) on the need to wait for a while until the next config is propagated. If what you see on the screen does not coincide with what you're expecting, please wait for a while and refresh it.

You should see two alerts.

Both alerts are green since none evaluates to true. Depending on the Kubernetes flavor you choose, you either have only one node (e.g., Docker For Desktop and minikube) or you have three nodes (e.g., GKE, EKS, AKS). Since our alerts are checking whether we have less than one, or more than three nodes, neither of the conditions is met, no matter which Kubernetes flavor you're using.

If your cluster was not created through one of the Gists provided at the beginning of this article, then you might have more than three nodes in your cluster, and the alert will fire. If that's the case, I suggest you modify the mon/prom-values-nodes.yml file to adjust the threshold of the alert.

Prometheus' alerts screen

Seeing inactive alerts is boring, so I want to show you one that fires (becomes red). To do that, we can add more nodes to the cluster (unless you're using a single node cluster like Docker For Desktop and minikube). However, it would be easier to modify the expression of one of the alerts, so that's what we'll do next.

The output is as follows.

The new definition changed the condition of the TooManyNodes alert to fire if there are more than zero nodes. We also changed the for statement so that we do not need to wait for 15 minutes before the alert fires.

Let's upgrade the Chart one more time.

... and we'll go back to the alerts screen.

A few moments later (don't forget to refresh the screen), the alert will switch to the pending state, and the color will change to yellow. That means that the conditions for the alert are met (we do have more than zero nodes) but the for period has not yet expired.

Wait for a minute (duration of the for period) and refresh the screen. The alert's state switched to firing and the color changed to red. Prometheus sent our first alert.
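The transition from inactive (green) through pending (yellow) to firing (red) can be modeled with a short Python sketch. It is a simplification of what Prometheus does, not its implementation. The alert_states function is a hypothetical helper, and the 30-second spacing between evaluations is an assumption made for the example.

```python
def alert_states(samples, threshold, for_seconds):
    """Return (timestamp, state) for every evaluation of a "value > threshold" rule.

    samples is a list of (timestamp_in_seconds, value) pairs, one per evaluation.
    """
    active_since = None  # when the condition first became true
    states = []
    for timestamp, value in samples:
        if value > threshold:
            if active_since is None:
                active_since = timestamp
            # The alert fires only after the condition held for the whole "for" period.
            if timestamp - active_since >= for_seconds:
                state = "firing"
            else:
                state = "pending"
        else:
            # The condition stopped being true, so the clock is reset.
            active_since = None
            state = "inactive"
        states.append((timestamp, state))
    return states


# Three nodes observed at four consecutive evaluations, 30 seconds apart (assumed).
NODE_COUNTS = [(0, 3), (30, 3), (60, 3), (90, 3)]

relaxed = alert_states(NODE_COUNTS, threshold=0, for_seconds=60)
original = alert_states(NODE_COUNTS, threshold=3, for_seconds=900)
```

With the threshold lowered to zero and a one-minute for period, the alert is pending for the first two evaluations and firing from the third one onward. With the original threshold of three and a 15-minute period, the same data never leaves the inactive state.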

Prometheus' alerts screen with one of the alerts firing

Where was the alert sent? The Prometheus Helm Chart deployed Alertmanager and pre-configured Prometheus to send its alerts there. Let's take a look at its UI.

We can see that one alert reached Alertmanager. If we click the + info button next to the TooManyNodes alert, we'll see the annotations (summary and description) as well as the labels (severity).

Alertmanager UI with one of the alerts expanded

We are likely not going to sit in front of the Alertmanager waiting for issues to appear. If that were our goal, we could just as well wait for the alerts in Prometheus.

Displaying alerts is indeed not the reason why we have Alertmanager. It is supposed to receive alerts and dispatch them further. It is not doing anything of that sort simply because we did not yet define the rules it should use to forward alerts. That's our next task.

We'll take a look at yet another update of the Prometheus Chart values.

The output is as follows.

When we apply that definition, we'll add the alertmanager.yml file to Alertmanager. It contains the rules it should use to dispatch alerts. The route section contains general rules that will be applied to all alerts that do not match one of the routes. The group_wait value makes Alertmanager wait for 10 seconds in case additional alerts from the same group arrive. That way, we'll avoid receiving multiple alerts of the same type.

When the first alert of a group is dispatched, it'll use the value of the group_interval field (5m) before sending the next batch of the new alerts from the same group.

The receiver field in the route section defines the default destination of the alerts. Those destinations are defined in the receivers section below. In our case, we're sending the alerts to the slack receiver by default.

The repeat_interval (set to 3h) defines the period after which alerts will be resent if Alertmanager continues receiving them.

The routes section defines specific rules. Only if none of them match, those in the route section above will be used. The routes section inherits properties from above so only those that we define in this section will change. We'll keep sending matching alerts to slack, and the only change is the increase of the repeat_interval from 3h to 5d.

The critical part of the routes is the match section. It defines filters that are used to decide whether an alert is a match or not. In our case, only those with the labels severity: notify and frequency: low will be considered a match.

All in all, the alerts with the severity label set to notify and frequency set to low will be resent every five days. All the other alerts will have a frequency of three hours.
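The way the route and routes sections interact can be sketched in Python. The dictionary below mirrors only the values described above and is not the actual alertmanager.yml. The resolve function is a hypothetical helper that models just two things: a child route overrides only the properties it defines, and the first child whose match labels all agree with the alert wins.

```python
# Values taken from the description above; the structure is a simplified model.
ROOT_ROUTE = {
    "receiver": "slack",
    "group_wait": "10s",
    "group_interval": "5m",
    "repeat_interval": "3h",
    "routes": [
        {
            "match": {"severity": "notify", "frequency": "low"},
            "receiver": "slack",
            "repeat_interval": "5d",
        },
    ],
}


def resolve(alert_labels, route, inherited=None):
    """Return the effective settings for an alert with the given labels."""
    settings = dict(inherited or {})
    # Properties defined on this route override the inherited ones.
    settings.update(
        {key: value for key, value in route.items() if key not in ("routes", "match")}
    )
    for child in route.get("routes", []):
        matchers = child.get("match", {})
        if all(alert_labels.get(key) == value for key, value in matchers.items()):
            return resolve(alert_labels, child, settings)
    # No child matched, so the settings of the current route apply.
    return settings


low_frequency = resolve({"severity": "notify", "frequency": "low"}, ROOT_ROUTE)
everything_else = resolve({"severity": "notify"}, ROOT_ROUTE)
```

An alert that carries only the severity label does not satisfy both matchers, so it falls back to the defaults and is repeated every three hours. An alert with both labels keeps the inherited group_wait and group_interval but gets the five-day repeat_interval.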

The last section of our Alertmanager config is receivers. We have only one receiver named slack. Below the name is slack_config. It contains Slack-specific configuration. We could have used hipchat_config, pagerduty_config, or any other of the supported ones. Even if our destination is not one of those, we could always fall back to webhook_config and send a custom request to the API of our tool of choice.

For the list of all the supported receivers, please consult the Alertmanager Configuration page.

Inside the slack_config section, we have the api_url that contains the Slack address with the token from one of the channels in the devops20 workspace.

For information on how to generate an incoming webhook address for your Slack channel, please visit the Incoming Webhooks page.

Next is the send_resolved flag. When set to true, Alertmanager will send notifications not only when an alert is fired, but also when the issue that caused it is resolved.

We're using the summary annotation as the title of the message, and the description annotation for the text. Both are using Go Templates. Those are the same annotations we defined in the Prometheus alerts.

Finally, the title_link is set to http://my-prometheus.com/alerts. That is indeed not the address of your Prometheus UI but, since I could not know in advance what your domain will be, I put a non-existing one. Feel free to change my-prometheus.com to the value of the environment variable $PROM_ADDR. Or just leave it as-is knowing that if you click the link, it will not take you to your Prometheus UI.

Now that we explored Alertmanager configuration, we can proceed and upgrade the Chart.

A few moments later, Alertmanager will be reconfigured, and the next time it receives the alert from Prometheus, it'll dispatch it to Slack. We can confirm that by visiting the devops20.slack.com workspace. If you did not register already, please go to slack.devops20toolkit.com. Once you are a member, we can visit the devops25-tests channel.

You should see the Cluster increased notification. Don't get confused if you see other messages. You are likely not the only one running the exercises from this article.

Slack with an alert message received from Alertmanager

The DevOps 2.5 Toolkit: Monitoring, Logging, and Auto-Scaling Kubernetes

The article you just read is an extract from The DevOps 2.5 Toolkit: Monitoring, Logging, and Auto-Scaling Kubernetes.

What do we do in Kubernetes after we master deployments and automate all the processes? We dive into monitoring, logging, auto-scaling, and other topics aimed at making our cluster resilient, self-sufficient, and self-adaptive.

Kubernetes is probably the biggest project we know. It is vast, and yet many think that after a few weeks or months of reading and practice they know all there is to know about it. It's much bigger than that, and it is growing faster than most of us can follow. How far did you get in Kubernetes adoption?

From my experience, there are four main phases in Kubernetes adoption.

In the first phase, we create a cluster and learn the intricacies of Kube API and different types of resources (e.g., Pods, Ingress, Deployments, StatefulSets, and so on). Once we are comfortable with the way Kubernetes works, we start deploying and managing our applications. By the end of this phase, we can shout "look at me, I have things running in my production Kubernetes cluster, and nothing blew up!" I explained most of this phase in The DevOps 2.3 Toolkit: Kubernetes.

The second phase is often automation. Once we become comfortable with how Kubernetes works and we are running production loads, we can move to automation. We often adopt some form of continuous delivery (CD) or continuous deployment (CDP). We create Pods with the tools we need, we build our software and container images, we run tests, and we deploy to production. When we're finished, most of our processes are automated, and we do not perform manual deployments to Kubernetes anymore. We can say "things are working and I'm not even touching my keyboard." I did my best to provide some insights into CD and CDP with Kubernetes in The DevOps 2.4 Toolkit: Continuous Deployment To Kubernetes.

The third phase is in many cases related to monitoring, alerting, logging, and scaling. The fact that we can run (almost) anything in Kubernetes and that it will do its best to make it fault tolerant and highly available, does not mean that our applications and clusters are bulletproof. We need to monitor the cluster, and we need alerts that will notify us of potential issues. When we do discover that there is a problem, we need to be able to query metrics and logs of the whole system. We can fix an issue only once we know what the root cause is. In highly dynamic distributed systems like Kubernetes, that is not as easy as it looks.

Further on, we need to learn how to scale (and de-scale) everything. The number of Pods of an application should change over time to accommodate fluctuations in traffic and demand. Nodes should scale as well to fulfill the needs of our applications.

Kubernetes already has the tools that provide metrics and visibility into logs. It allows us to create auto-scaling rules. Yet, we might discover that Kubernetes alone is not enough and that we might need to extend our system with additional processes and tools. This phase is the subject of this book. By the time you finish reading it, you'll be able to say that your clusters and applications are truly dynamic and resilient and that they require minimal manual involvement. We'll try to make our system self-adaptive.

I mentioned the fourth phase. That, dear reader, is everything else. The last phase is mostly about keeping up with all the other goodies Kubernetes provides. It's about following its roadmap and adapting our processes to get the benefits of each new release.

Buy it now from Amazon, LeanPub, or look for it through your favorite book seller.
