Serverless Computing With Knative And Containers As A Service (CaaS)

This text was taken from the book and Udemy course The DevOps Toolkit: Catalog, Patterns, And Blueprints.

All the commands from this article are in the Gist.

Before we dive into the actual usage of Knative, let’s see which components we have and how they interact with each other. We’ll approach the subject by following the flow of a request. It starts with a user.

When we send a request, it goes to the external load balancer. In our case, the load balancer forwards it to the Istio Gateway, which is accessible through a Kubernetes Service created when we installed Istio. If you are using GKE, EKS, or AKS, that is the same Service that created the external load balancer. Minikube and Docker Desktop have no external load balancer, so you should use your imagination.

Requests could also come from inside the cluster but, for simplicity, we’ll focus on users. The differences are trivial.

From the external LB, requests are forwarded to the cluster and picked up by the Istio Gateway. Its job is to forward requests to the destination Service associated with our application. However, we do not yet have the app, so let’s deploy something.

We’ll pretend that this is a deployment of a serverless application to production, so we’ll start by creating a Namespace.

kubectl create namespace production

Since we are using Istio, we might just as well tell it to auto-inject Istio proxy sidecars (Envoy). That is not a requirement. We could use Istio only for Knative’s internal purposes. But since we already have it, why not go all the way and use it for our applications as well?

As you already saw when we installed Knative, all we have to do is add the istio-injection label to the Namespace.

kubectl label namespace production \
    istio-injection=enabled

Now comes the big moment. We are about to deploy our first application using Knative. To simplify the process, we’ll use the kn CLI. Please visit the Installing the Knative CLI page for instructions on how to install it.

Remember that if you are using Windows Subsystem For Linux (WSL), you should follow the Linux instructions.

In the simplest form, all we have to do is execute kn service create and provide info like the Namespace, the container image, and the port of the process inside the container.

kn service create devops-toolkit \
    --namespace production \
    --image vfarcic/devops-toolkit-series \
    --port 80

W> You might receive an error message similar to RevisionFailed: Revision "devops-toolkit-...-1" failed with message: 0/3 nodes are available: 3 Insufficient cpu. If you did, your cluster does not have enough capacity. If you have Cluster Autoscaler, that will correct itself soon. If you created a GKE or AKS cluster using my Gist, you already have it. If you don’t, you might need to increase the capacity by adding more nodes to the cluster or increasing the size of the existing nodes. Please re-run the previous command after increasing the capacity (yourself or through Cluster Autoscaler).

The output is as follows.

Creating service 'devops-toolkit' in namespace 'production':

  0.030s The Configuration is still working to reflect the latest desired specification.
  0.079s The Route is still working to reflect the latest desired specification.
  0.126s Configuration "devops-toolkit" is waiting for a Revision to become ready.
 31.446s ...
 31.507s Ingress has not yet been reconciled.
 31.582s Waiting for load balancer to be ready
 31.791s Ready to serve.

Service 'devops-toolkit' created to latest revision 'devops-toolkit-...-1' is available at URL:

We can see that the Knative service is ready to serve. In my case, it is available through the devops-toolkit.production subdomain of the base domain. The hostname combines the name of the Knative service (devops-toolkit), the Namespace (production), and the base domain.
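The following shell functions are a minimal sketch of how that hostname is put together. They assume Knative's default domain template, {{.Name}}.{{.Namespace}}.{{.Domain}}. A cluster with a customized template would produce different names. The function names and the BASE_DOMAIN variable are hypothetical and exist only for illustration.

```shell
# Hypothetical helpers, assuming Knative's default domain template
# {{.Name}}.{{.Namespace}}.{{.Domain}}.

# Print the hostname Knative assigns to a service.
# $1 = Knative service name, $2 = Namespace, $3 = base domain
knative_host() {
  printf '%s.%s.%s\n' "$1" "$2" "$3"
}

# Print the full URL for the same service.
knative_url() {
  printf 'http://%s\n' "$(knative_host "$1" "$2" "$3")"
}

# Example (BASE_DOMAIN is a hypothetical variable holding your base domain):
#   knative_url devops-toolkit production "$BASE_DOMAIN"
```

When the base domain does not resolve to the cluster, the Host header must carry this same hostname. For example: curl -H "Host: $(knative_host devops-toolkit production "$BASE_DOMAIN")" "http://$INGRESS_HOST".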

If we ever forget which address was assigned to a service, we can retrieve it through the routes.

kubectl --namespace production \
    get routes

The output is as follows.

NAME           URL                                  READY REASON
devops-toolkit http://devops-toolkit.production.... True  

Finally, let’s see whether we can access the application through that URL. The commands will differ depending on which base domain you configured when installing Knative. If the base domain resolves to the address of the Ingress Gateway, we can open the URL in a browser. If it does not, we’ll have to inject the hostname as the Host header of a request, and we can use curl for that. The alternative is to change your hosts file. If you do, you should be able to use the open commands as well.

Please execute the command that follows if you are using Minikube, Docker Desktop, or EKS. It will send a simple HTTP request using curl. The base domain does not resolve to our cluster, so we’ll send the request to the Ingress Gateway address ($INGRESS_HOST). We’ll "fake" the domain by adding the Host header to that request.

curl -H "Host: devops-toolkit.production...." \
    http://$INGRESS_HOST

Please execute the command that follows if you are using GKE or AKS.

W> If you are a Linux or a WSL user, I will assume that you created the alias open and set it to the xdg-open command. If that’s not the case, you will find instructions on how to do that in the Setting Up A Local Development Environment chapter. If you do not have the open command (or the alias), you should replace open with echo and copy and paste the output into your favorite browser.

open http://devops-toolkit.production.$INGRESS_HOST

If you used curl, you should see the HTML of the application as the output in your terminal. On the other hand, if you executed open, the home screen of the Web app we just deployed should have opened in your default browser.

How did that happen? How did we manage to have a fully operational application through a single command?

We know that any application running in Kubernetes needs quite a few types of resources. Since this is a stateless application, there should be, as a minimum, a Deployment, which creates a ReplicaSet, which creates Pods. We also need a HorizontalPodAutoscaler to ensure that the correct number of replicas is running. We need a Service through which other processes can access our applications. Finally, if an application should be accessible from outside the cluster, we would need an Ingress configured to use a specific (sub)domain and associate it with the Service. We might, and often do, need even more than those resources.

Yet, all we did was execute a single kn command with a few arguments. The only explanation could be that the command created all those resources. We’ll explore them later. For now, trust me when I say that a Deployment, a Service, and a Pod Autoscaler were created. On top of that, the Ingress Gateway we already commented on was reconfigured to forward all requests coming from a specific (sub)domain to our application. Knative also created a few other resources, like a Route, a Configuration, an Istio VirtualService, and others. Finally, and potentially most importantly, it enveloped all those resources in a revision. Each new version of our app creates a new revision with all those resources. That way, Knative can employ rolling updates and rollbacks, decide which requests go to which version, and so on.

Creating all the resources we usually need to run an application in Kubernetes is already a considerable advantage. We removed the clutter and were able to focus only on the things that matter. All we specified was the image, the Namespace, and the port. In a "real world" situation, we would likely specify more. Still, the fact is that Knative allows us to skip defining things that Kubernetes needs, and focus on what differentiates one application from another. We’ll explore that aspect of Knative in a bit more detail later. For now, I hope you already saw that simplicity is one of the enormous advantages of Knative, even without diving into the part that makes our applications serverless.

Now that sufficient time has passed, we might want to take a look at the Pods running in the production Namespace.

kubectl --namespace production \
    get pods

The output states that no resources were found in production namespace. If, in your case, there is still a Pod, you are indeed a fast reader, and you did not give Knative sufficient time. Wait for a few moments, and re-run the previous command.
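Instead of re-running the command by hand, we could poll until the Namespace is empty. The sketch below is not part of the original text, and retry_until and no_pods_in are hypothetical helper names. The check relies on kubectl get pods --output name printing nothing to stdout when no Pods exist, because the "No resources found" message goes to stderr.

```shell
# Hypothetical helper: run a command up to $1 times, sleeping $2 seconds
# between attempts, and succeed as soon as the command succeeds.
retry_until() {
  _attempts=$1
  _interval=$2
  shift 2
  while [ "$_attempts" -gt 0 ]; do
    if "$@"; then
      return 0
    fi
    _attempts=$((_attempts - 1))
    if [ "$_attempts" -gt 0 ]; then
      sleep "$_interval"
    fi
  done
  return 1
}

# Hypothetical check: succeed only if kubectl works and lists no Pods in
# the given Namespace (i.e., the application was scaled to zero).
no_pods_in() {
  _pods=$(kubectl --namespace "$1" get pods --output name 2>/dev/null) || return 1
  [ -z "$_pods" ]
}

# Example: check every 10 seconds, for up to 5 minutes.
#   retry_until 30 10 no_pods_in production && echo "Scaled to zero"
```

no_pods_in treats a failing kubectl call as "not yet". This means a broken connection to the cluster is not mistaken for a scaled-down application.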

Knative detected that no one was using our application for a while and decided that it is pointless to keep it running. That would be a massive waste of resources (e.g., memory and CPU). As a result, it scaled the app to zero replicas. Typically, that would mean that our users, when they decide to continue interacting with the application, would start receiving 5XX responses. That’s what would usually happen when none of the replicas are running. But, as you can probably guess, there’s much more to it than scaling to zero replicas and letting our users have a horrible experience. Knative is a solution for serverless workloads, and, as such, it not only scales our application, but it also queues the requests when there are no replicas to handle incoming requests. Let’s confirm that.

Please execute the command that follows if you are using Minikube, Docker Desktop, or EKS.

curl -H "Host: devops-toolkit.production...." \
    http://$INGRESS_HOST

Please execute the command that follows if you are using GKE or AKS.

open http://devops-toolkit.production.$INGRESS_HOST

As you can see, the application is available. From the user’s perspective, it’s as if it was never scaled to zero replicas.

When we sent a request, it was forwarded to the Ingress Gateway. None of the replicas were available, so instead of forwarding the request to the associated Service, the Gateway sent it to the Knative Activator. The Activator, in turn, instructed the Autoscaler to increase the number of replicas of the Deployment. As you probably already know, the Deployment modified the ReplicaSet, which, in turn, created the missing Pod. Once the Pod was operational, the Activator forwarded the queued requests to the Service, and we got the response.

The Autoscaler knew what to do because it was configured by the PodAutoscaler created when we deployed the application.

In our case, only one Pod was created since the amount of traffic was very low. If the traffic increased, the application could have been scaled to two, three, or any other number of replicas. The exact number depends on the volume of concurrent requests.

We’ll explore the components and the scaling abilities in a bit more detail soon. For now, we’ll remove the application we created with Knative CLI since we are about to see a better way to define it.

kn service delete devops-toolkit \
    --namespace production

That’s it. The application is no more. We are back where we started.

Defining Knative Applications As Code

Executing commands like kn service create is great because it’s simple. But it is the wrong approach to deploying any type of application, Knative included. Maintaining a system created through ad-hoc commands is a nightmare. The initial benefits of that approach are often overshadowed by the costs that come later. But you already know that. You already understand the benefits of defining everything as code, storing everything in Git, and reconciling the actual and the desired state. I’m sure that you know the importance of the everything-as-code approach combined with GitOps. I hope you do, since that is not the subject of this chapter.

We’ll move on with the assumption that you want to have a YAML file that defines your application. It could be some other format but, given that almost everything is YAML in the Kubernetes world, I will assume that’s what you need.

So, let’s take a look at how we would define our application. As you can probably guess, I already prepared a sample definition for us to use.

cat devops-toolkit.yaml

The output is as follows.

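apiVersion: serving.knative.dev/v1
kind: Service
metadata:
  name: devops-toolkit
spec:
  template:
    metadata:
      annotations:
        autoscaling.knative.dev/minScale: "0"
        autoscaling.knative.dev/maxScale: "3"
    spec:
      containerConcurrency: 100
      containers:
      - image: vfarcic/devops-toolkit-series
        ports:
        - containerPort: 80
        resources:
          limits:
            memory: 256Mi
            cpu: 100m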

That definition could be shorter. To get the same result as the kn service create command, we would not need the annotations or the resources section. But I wanted to show you that we can be more precise. That’s one of the big advantages of Knative: it can be as simple or as complicated as we need it to be. But we do not have time to go into the details of everything we might (or might not) want to do. Instead, we are trying to gain just enough knowledge to decide whether Knative is worth exploring further. That includes whether to adopt it as a way to define, deploy, and manage some (if not all) of our applications.

You can probably guess what that definition does. The annotations tell Knative that we want to scale to 0 replicas if there is no traffic (minScale) and that there should never be more than 3 replicas (maxScale). We could, instead, choose never to scale below 2 replicas and allow scaling well above 3. That would give us scalability and high availability without making our applications serverless, that is, without scaling down to zero replicas.

The containerConcurrency field is set to 100, meaning that, in a simplified form, there should be one replica for every hundred concurrent requests, while never going above the maxScale value.
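The rule described above can be sketched as a small shell function. This is a deliberate simplification. It assumes one replica per containerConcurrency concurrent requests, clamped between minScale and maxScale. Knative's actual autoscaler also applies a target utilization percentage and averages concurrency over time windows, so real replica counts can differ. The function name is hypothetical. The function also assumes containerConcurrency is greater than zero, because in Knative a value of 0 means no limit.

```shell
# Hypothetical, simplified model of the scaling rule described above.
# $1 = concurrent requests, $2 = containerConcurrency (must be > 0),
# $3 = minScale, $4 = maxScale
desired_replicas() {
  # Ceiling division: one replica per started block of $2 requests.
  _replicas=$(( ($1 + $2 - 1) / $2 ))
  # Never go below minScale...
  if [ "$_replicas" -lt "$3" ]; then
    _replicas=$3
  fi
  # ...and never above maxScale.
  if [ "$_replicas" -gt "$4" ]; then
    _replicas=$4
  fi
  echo "$_replicas"
}

# Example with the values from devops-toolkit.yaml and 150 concurrent requests:
#   desired_replicas 150 100 0 3    # prints 2
```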

The image, ports, and resources fields should be self-explanatory since those are the same ones we would typically use in, let’s say, a Deployment.

There are also some limitations we need to be aware of. The most important one is that we can have only one container for each application managed by Knative. If you try to add additional entries to the containers array, kubectl apply will throw an error. That might change in the future but, for now (August 2020), it is something you should be aware of.

That’s it. Let’s apply that definition and see what we’ll get.

kubectl --namespace production apply \
    --filename devops-toolkit.yaml

We created a single resource. We did not specify a Deployment, nor did we create a Service. We did not define a HorizontalPodAutoscaler. We did not create any of the things we usually do. Still, our application should have all those and quite a few others. It should be fully operational, it should be scalable, and it should be serverless. Knative created all those resources, and it made our application serverless through that single short YAML definition. That is a very different approach from what we typically expect from Kubernetes.

Kubernetes is, in a way, a platform to build platforms. It allows us to create very specialized resources that provide value only when combined together. An application runs in Pods, Pods need ReplicaSets to scale, ReplicaSets need Deployments for applying new revisions. Communication is done through Services. External access is provided through Ingress. And so on and so forth. Usually, we need to create and maintain all those, and quite a few other resources ourselves. So, we end up with many YAML files, a lot of repetition, and with a lot of definitions that are not valuable to end-users, but instead required for Kubernetes’ internal operations. Knative simplifies all that by requiring us to define only the differentiators and only the things that matter to us. It provides a layer on top of Kubernetes that, among other things, aims to simplify the way we define our applications.

We’ll take a closer look at some (not all) of the resources Knative created for us. But, before we do that, let’s confirm that our application is indeed running and accessible.

Please execute the command that follows if you are using Minikube, Docker Desktop, or EKS.

curl -H "Host: devops-toolkit.production...." \
    http://$INGRESS_HOST

Please execute the command that follows if you are using GKE or AKS.

open http://devops-toolkit.production.$INGRESS_HOST

You already saw a similar result before. The major difference is that, this time, we applied a YAML definition instead of relying on kn service create to do the work. As such, we can store that definition in a Git repository. We can apply whichever process we use to make changes to the code, and we can hook it into whichever CI/CD tool we are using.

Now, let’s see which resources were created for us. The right starting point is kservice since that is the only one we created. Whatever else might be running in the production Namespace was created by Knative and not us.

kubectl --namespace production \
    get kservice

The output is as follows.

NAME           URL                      LATESTCREATED      LATESTREADY        READY REASON
devops-toolkit http://devops-toolkit... devops-toolkit-... devops-toolkit-... True    

As I already mentioned, that single resource created quite a few others. For example, we have revisions. But, to get to revisions, we might need to talk about Knative Configuration.

kubectl --namespace production \
    get configuration

The output is as follows.

devops-toolkit devops-toolkit-... devops-toolkit-... True

The Configuration resource contains and maintains the desired state of our application. Whenever we change Knative Service, we are effectively changing the Configuration, which, in turn, creates a new Revision.

kubectl --namespace production \
    get revisions

The output is as follows.

devops-toolkit-k8j9j devops-toolkit devops-toolkit-k8j9j 1          True  

Each time we deploy a new version of our application, a new immutable revision is created. It is a collection of almost all the application-specific resources. Each has a separate Service, a Deployment, a Knative PodAutoscaler, and, potentially, a few other resources. Creating revisions allows Knative to decide which request goes where, how to rollback, and a few other things.

Now that we mentioned Deployments, Services, and other resources, let’s confirm that they were indeed created. Let’s start with Deployments.

kubectl --namespace production \
    get deployments

The output is as follows.

NAME                          READY UP-TO-DATE AVAILABLE AGE
devops-toolkit-...-deployment 0/0   0          0         13m

The Deployment is indeed there. The curious thing is that 0 out of 0 replicas are ready. It’s been a while since we last interacted with the application, so Knative decided that there is no point in running it and scaled it to zero replicas. As you already saw, it will scale back up when we start sending requests to the associated Services. Let’s take a look at them as well.

kubectl --namespace production \
    get services,virtualservices

The output is as follows.

NAME                               TYPE         CLUSTER-IP    EXTERNAL-IP               PORT(S)    AGE
service/devops-toolkit             ExternalName <none>        cluster-local-gateway.... <none>     2m47s
service/devops-toolkit-...         ClusterIP <none>                    80/TCP     3m6s
service/devops-toolkit-...-private ClusterIP  <none>                    80/TCP,... 3m6s

NAME               GATEWAYS              HOSTS               AGE
virtualservice.... [knative-serving/...] [devops-...]        2m48s
virtualservice.... [mesh]                [devops-toolkit...] 2m48s

We can see that Knative created Kubernetes Services, but also Istio VirtualServices. Since we told it that we want to combine it with Istio, it understood that we need not only Kubernetes core resources, but also those specific to Istio. If we chose a different service mesh, it would create whatever makes sense for it.

Further on, we got the PodAutoscaler.

kubectl --namespace production \
    get podautoscalers

The output is as follows.

devops-toolkit-... 0            0           False NoTraffic

PodAutoscaler is, as you can guess by its name, in charge of scaling the Pods to comply with the changes in traffic, or whichever other criteria we might use. By default, it measures the incoming traffic, but it can be extended to use formulas based on queries from, for example, Prometheus.

Finally, we got a Route.

kubectl --namespace production \
    get routes

The output is as follows.

NAME           URL                       READY REASON
devops-toolkit http://devops-toolkit.... True    

Routes map endpoints (e.g., a subdomain) to one or more revisions of the application. They can be configured in quite a few different ways but, in essence, a Route is the entity that routes traffic to our applications.
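Here is a hedged sketch that makes the "one or more revisions" part concrete by splitting traffic between two revisions with kn. It assumes kn's --traffic flag, which takes REVISION=PERCENT pairs, with @latest standing for the latest ready revision. It also assumes Knative's requirement that the percentages add up to 100. The split_is_valid helper is hypothetical. The revision name is the one from the earlier kubectl get revisions output, so yours will differ.

```shell
# Hypothetical helper: succeed only if every argument has the form
# NAME=PERCENT and the percentages add up to exactly 100.
split_is_valid() {
  _total=0
  for _pair in "$@"; do
    case $_pair in
      *=*) ;;
      *) return 1 ;;
    esac
    _pct=${_pair##*=}
    # Reject empty, non-numeric, or zero-padded values (e.g., "090").
    case $_pct in
      ''|*[!0-9]*|0?*) return 1 ;;
    esac
    _total=$((_total + _pct))
  done
  [ "$_total" -eq 100 ]
}

# Example: keep 90% of requests on an existing revision and send 10% to
# the latest one (requires kn and a cluster; the revision name will differ).
#   if split_is_valid devops-toolkit-k8j9j=90 @latest=10; then
#     kn service update devops-toolkit --namespace production \
#       --traffic devops-toolkit-k8j9j=90 \
#       --traffic @latest=10
#   fi
```

Checking the split locally is optional, since Knative validates it anyway. It does, however, catch a typo before a request ever reaches the cluster.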

We are almost finished. There is only one crucial thing left to observe, at least from the perspective of a quick overview of Knative. What happens when many requests are "bombing" our application? We saw that when we do not interact with the app, it is scaled down to zero replicas. We also saw that when we send a request to it, it scales up to one replica. But, what would happen if we start sending five hundred concurrent requests? Take another look at devops-toolkit.yaml and try to guess. It shouldn’t be hard.

Did you guess how many replicas we should have if we start sending five hundred concurrent requests? Let’s assume that you did, and let’s see whether you were right.

We’ll use Siege to send requests to our application. To be more specific, we’ll use it to send a stream of five hundred concurrent requests over sixty seconds. We’ll also retrieve all the Pods from the production Namespace right after siege is finished "bombing" the application.

As before, the commands will differ slightly depending on the Kubernetes platform you’re using.

W> You will NOT be able to use Siege with Docker Desktop. That should not be a big deal since the essential thing is the output, which you can see here.

Please execute the command that follows if you are using Minikube or EKS.

kubectl run siege \
    --image yokogawa/siege \
    -it --rm \
    -- --concurrent 500 --time 60S \
    --header "Host: devops-toolkit.production...." \
    "http://$INGRESS_HOST" \
    && kubectl --namespace production \
    get pods

Please execute the command that follows if you are using GKE or AKS.

kubectl run siege \
    --image yokogawa/siege \
    -it --rm \
    -- --concurrent 500 --time 60S \
    "http://devops-toolkit.production.$INGRESS_HOST" \
    && kubectl --namespace production \
    get pods

The output, in my case, is as follows.

Transactions:             40697 hits
Availability:            100.00 %
Elapsed time:             59.53 secs
Data transferred:         83.72 MB
Response time:             0.22 secs
Transaction rate:        683.64 trans/sec
Throughput:                1.41 MB/sec
Concurrency:             149.94
Successful transactions:  40699
Failed transactions:          0
Longest transaction:       5.30
Shortest transaction:      0.00
NAME                              READY STATUS  RESTARTS AGE
devops-toolkit-...-deployment-... 3/3   Running 0        58s
devops-toolkit-...-deployment-... 3/3   Running 0        60s
devops-toolkit-...-deployment-... 3/3   Running 0        58s

We can see that, in my case, over forty thousand requests were sent, and the availability is 100.00 %. That might not always be the situation, so don’t be alarmed if, in your case, it’s a slightly lower figure. Your cluster might not even have enough capacity to handle the increase in workload and might need to scale up. In such a case, the time required to scale up the cluster might have been too long for all the requests to be processed. You can always wait for a while for all the Pods to terminate and try again with increased cluster capacity.

For now, Knative does not guarantee 100% availability; I was lucky. If you have huge variations in traffic, you can expect something closer to 99.9% availability. But that applies only to a huge jump like the one we just had: our traffic went from zero to a continuous stream of five hundred concurrent requests within milliseconds. For "normal" usage, availability should be closer to 100% (e.g., 99.99%).
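If you repeat the experiment, for example from a CI pipeline, you can turn that expectation into a pass/fail check by parsing the Availability line of Siege's summary. This is a hedged sketch. The helper name and the 99.9 threshold are assumptions for illustration, and the parsing relies on the summary format shown in the output above.

```shell
# Hypothetical helper: read Siege's summary on stdin and succeed only if the
# reported Availability (in percent) is at least $1.
availability_at_least() {
  awk -v min="$1" '
    /^Availability:/ { found = 1; ok = ($2 + 0 >= min + 0) }
    END { exit !(found && ok) }
  '
}

# Example, assuming the Siege output was saved to siege.log:
#   availability_at_least 99.9 < siege.log || echo "Availability below 99.9%"
```

A summary without an Availability line counts as a failure. This way, a Siege run that never completed cannot pass silently.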

What truly matters is that the number of Pods was increased from zero to three. Typically, there should be five Pods since we set the containerConcurrency value to 100, and we were streaming 500 concurrent requests. But we also set the maxScale annotation to 3, so it reached the limit of the allowed number of replicas.
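The arithmetic behind that result, in a simplified form that ignores the autoscaler's target utilization and averaging windows, is:

$$
\text{replicas} = \min\Big(\max\big(\lceil 500 / 100 \rceil,\ \text{minScale}\big),\ \text{maxScale}\Big) = \min\big(\max(5,\ 0),\ 3\big) = 3
$$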

While you’re reading this, Knative probably already started scaling down the application. It probably scaled it to one replica, to keep it warm in case new requests come in. After a while, it should scale down to nothing (zero replicas) as long as traffic keeps being absent.

The vital thing to note is that Knative does not interpret the traffic based only on the current metrics. It will not scale up when the first request that cannot be handled with the existing replicas kicks in. It will also not scale down to zero replicas the moment all requests stop coming in. It changes things gradually, and it uses both current and historical metrics to figure out what to do next.

Assuming that you are not a very fast reader, the number of Pods should have dropped to zero by now. Let’s confirm that.

kubectl --namespace production \
    get pods

The output states that no resources were found in production namespace. In your case, a Pod might still be running, or the status might be terminating. If that’s the case, wait for a while longer and repeat the previous command.
