This text was taken from the book and the Udemy course The DevOps Toolkit: Catalog, Patterns, And Blueprints.
All the commands from this article are in the [04-03-knative.sh](https://gist.github.com/dc4ba562328c1d088047884026371f1f) Gist.
Before we dive into the actual usage of Knative, let’s see which components we got and how they interact with each other. We’ll approach the subject by trying to figure out the flow of a request. It starts with a user.
When we send a request, it goes to the external load balancer, which, in our case, forwards it to the Istio Gateway, accessible through a Kubernetes Service created when we installed Istio. That's the same Service that created the external load balancer if you are using GKE, EKS, or AKS. In the case of Minikube and Docker Desktop, there is no external load balancer, so you should use your imagination.
It could also be internal traffic but, for simplicity reasons, we’ll focus on users. The differences are trivial.
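If you'd like to take a look at the Service behind that load balancer (and, on GKE, EKS, or AKS, the external IP assigned to it), you can retrieve it with kubectl. I'm assuming the typical istio-ingressgateway Service in the istio-system Namespace; the name might differ depending on how Istio was installed in your cluster.
kubectl --namespace istio-system \
    get service istio-ingressgateway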
From the external LB, requests are forwarded to the cluster and picked up by the Istio Gateway. Its job is to forward requests to the destination Service associated with our application. However, we do not yet have the app, so let’s deploy something.
We’ll simulate that this is a deployment of a serverless application to production, so we’ll start by creating a Namespace.
kubectl create namespace production
Since we are using Istio, we might just as well tell it to auto-inject Istio proxy sidecars (Envoy). That is not a requirement. We could just as well use Istio only for Knative internal purposes, but since we already have it, why not go all the way in and use it for our applications?
As you already saw when we installed Knative, all we have to do is add the istio-injection label to the Namespace.
kubectl label namespace production \
istio-injection=enabled
Now comes the big moment. We are about to deploy our first application using Knative. To simplify the process, we'll use the kn CLI. Please visit the Installing the Knative CLI page for instructions on how to install it.
Remember that if you are using Windows Subsystem For Linux (WSL), you should follow the Linux instructions.
In the simplest form, all we have to do is execute kn service create and provide info like the Namespace, the container image, and the port of the process inside the container.
kn service create devops-toolkit \
--namespace production \
--image vfarcic/devops-toolkit-series \
--port 80
W> You might receive an error message similar to RevisionFailed: Revision "devops-toolkit-...-1" failed with message: 0/3 nodes are available: 3 Insufficient cpu. If you did, your cluster does not have enough capacity. If you have Cluster Autoscaler, that will correct itself soon. If you created a GKE or AKS cluster using my Gist, you already have it. If you don't, you might need to increase the capacity by adding more nodes to the cluster or increasing the size of the existing nodes. Please re-run the previous command after increasing the capacity (yourself or through Cluster Autoscaler).
The output is as follows.
Creating service 'devops-toolkit' in namespace 'production':
0.030s The Configuration is still working to reflect the latest desired specification.
0.079s The Route is still working to reflect the latest desired specification.
0.126s Configuration "devops-toolkit" is waiting for a Revision to become ready.
31.446s ...
31.507s Ingress has not yet been reconciled.
31.582s Waiting for load balancer to be ready
31.791s Ready to serve.
Service 'devops-toolkit' created to latest revision 'devops-toolkit-...-1' is available at URL:
http://devops-toolkit.production.34.75.214.7.xip.io
We can see that the Knative service is ready to serve and that, in my case, it is available through the subdomain devops-toolkit.production. It is a combination of the name of the Knative service (devops-toolkit), the Namespace (production), and the base domain (34.75.214.7.xip.io).
If we ever forget which address was assigned to a service, we can retrieve it through the routes.
kubectl --namespace production \
get routes
The output is as follows.
NAME URL READY REASON
devops-toolkit http://devops-toolkit.production.... True
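Alternatively, assuming your version of the kn CLI supports the url output format, we can ask it to print only the address.
kn service describe devops-toolkit \
    --namespace production \
    -o url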
Finally, let's see whether we can access the application through that URL. The commands will differ depending on whether you assigned xip.io as the base domain or kept example.com. If it is xip.io, we can open the address in a browser. On the other hand, if the base domain is set to example.com, we'll have to inject the address as the Host header of the request. We can use curl for that. The alternative is to change your hosts file. If you do, you should be able to use the open commands.
Please execute the command that follows if you are using Minikube, Docker Desktop, or EKS. It will send a simple HTTP request using curl. Since the base domain is set to example.com, but the service through which the app is accessible uses a different host, we'll "fake" the domain by adding the Host header to the request.
curl -H "Host: devops-toolkit.production.example.com" \
http://$INGRESS_HOST
Please execute the command that follows if you are using GKE or AKS.
W> If you are a Linux or a WSL user, I will assume that you created the alias open and set it to the xdg-open command. If that's not the case, you will find instructions on how to do that in the Setting Up A Local Development Environment chapter. If you do not have the open command (or the alias), you should replace open with echo and copy and paste the output into your favorite browser.
open http://devops-toolkit.production.$INGRESS_HOST
If you used curl, you should see the HTML of the application as the output in your terminal. On the other hand, if you executed open, the home screen of the Web app we just deployed should have opened in your default browser.
How did that happen? How did we manage to have a fully operational application through a single command?
We know that any application running in Kubernetes needs quite a few types of resources. Since this is a stateless application, there should be, as a minimum, a Deployment, which creates a ReplicaSet, which creates Pods. We also need a HorizontalPodAutoscaler to ensure that the correct number of replicas is running. We need a Service through which other processes can access our applications. Finally, if an application should be accessible from outside the cluster, we would need an Ingress configured to use a specific (sub)domain and associate it with the Service. We might, and often do, need even more than those resources.
Yet, all we did was execute a single kn command with a few arguments. The only explanation is that the command created all those resources. We'll explore them later. For now, trust me when I say that a Deployment, a Service, and a PodAutoscaler were created. On top of that, the Ingress Gateway we already commented on was reconfigured to forward all requests coming from a specific (sub)domain to our application. The command also created a few other resources like a Route, a Configuration, an Istio VirtualService, and others. Finally, and potentially most importantly, it enveloped all those resources in a Revision. Each new version of our app creates a new Revision with all those resources. That way, Knative can employ rolling updates and rollbacks, control which requests go to which version, and so on.
Creating all the resources we usually need to run an application in Kubernetes is already a considerable advantage. We removed the clutter and were able to focus only on the things that matter. All we specified was the image, the Namespace, and the port. In a "real world" situation, we would likely specify more. Still, the fact is that Knative allows us to skip defining things that Kubernetes needs, and focus on what differentiates one application from another. We’ll explore that aspect of Knative in a bit more detail later. For now, I hope you already saw that simplicity is one of the enormous advantages of Knative, even without diving into the part that makes our applications serverless.
Now that sufficient time has passed, we might want to take a look at the Pods running in the production Namespace.
kubectl --namespace production \
get pods
The output states that no resources were found in production namespace.
If, in your case, there is still a Pod, you are indeed a fast reader, and you did not give Knative sufficient time. Wait for a few moments, and re-run the previous command.
Knative detected that no one was using our application for a while and decided that it is pointless to keep it running. That would be a massive waste of resources (e.g., memory and CPU). As a result, it scaled the app to zero replicas. Typically, that would mean that our users, when they decide to continue interacting with the application, would start receiving 5XX responses. That’s what would usually happen when none of the replicas are running. But, as you can probably guess, there’s much more to it than scaling to zero replicas and letting our users have a horrible experience. Knative is a solution for serverless workloads, and, as such, it not only scales our application, but it also queues the requests when there are no replicas to handle incoming requests. Let’s confirm that.
Please execute the command that follows if you are using Minikube, Docker Desktop, or EKS.
curl -H "Host: devops-toolkit.production.example.com" \
http://$INGRESS_HOST
Please execute the command that follows if you are using GKE or AKS.
open http://devops-toolkit.production.$INGRESS_HOST
As you can see, the application is available. From the user’s perspective, it’s as if it was never scaled to zero replicas.
When we sent a request, it was forwarded to the Ingress Gateway. But, since none of the replicas were available, instead of forwarding it to the associated Service, it sent it to Knative Activator. It, in turn, instructed the Autoscaler to increase the number of replicas of the Deployment. As you probably already know, the Deployment modified the ReplicaSet, which, in turn, created the missing Pod. Once a Pod was operational, it forwarded the queued requests to the Service, and we got the response.
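The Activator and the Autoscaler, along with the rest of the Knative Serving components, run as Pods in the knative-serving Namespace (assuming the default installation), so we can confirm that they are there.
kubectl --namespace knative-serving \
    get pods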
The Autoscaler knew what to do because it was configured by the PodAutoscaler created when we deployed the application.
In our case, only one Pod was created since the amount of traffic was very low. If the traffic increased, it could have been scaled to two, three, or any other number of replicas. The exact number depends on the volume of concurrent requests.
We’ll explore the components and the scaling abilities in a bit more detail soon. For now, we’ll remove the application we created with Knative CLI since we are about to see a better way to define it.
kn service delete devops-toolkit \
--namespace production
That’s it. The application is no more. We are back where we started.
Defining Knative Applications As Code
Executing commands like kn service create is great because it's simple. But it is the wrong approach to deploying any type of application, Knative included. Maintaining a system created through ad-hoc commands is a nightmare. The initial benefits of that approach are often overshadowed by the cost that comes later. But you already know that. You already understand the benefits of defining everything as code, storing everything in Git, and reconciling the actual and the desired state. I'm sure you know the importance of the everything-as-code approach combined with GitOps. I hope you do, since that is not the subject of this chapter.
We’ll move on with the assumption that you want to have a YAML file that defines your application. It could be some other format but, given that almost everything is YAML in the Kubernetes world, I will assume that’s what you need.
So, let’s take a look at how we would define our application. As you can probably guess, I already prepared a sample definition for us to use.
cat devops-toolkit.yaml
The output is as follows.
apiVersion: serving.knative.dev/v1
kind: Service
metadata:
  name: devops-toolkit
spec:
  template:
    metadata:
      annotations:
        autoscaling.knative.dev/minScale: "0"
        autoscaling.knative.dev/maxScale: "3"
    spec:
      containerConcurrency: 100
      containers:
      - image: vfarcic/devops-toolkit-series
        ports:
        - containerPort: 80
        resources:
          limits:
            memory: 256Mi
            cpu: 100m
That definition could be shorter. If we wanted to accomplish the same result as what we had with the kn service create command, we wouldn't need the annotations and the resources sections. But I wanted to show you that we can be more precise. That's one of the big advantages of Knative. It can be as simple or as complicated as we need it to be. But we do not have time to go into the details of everything we might (or might not) want to do. Instead, we are trying to gain just enough knowledge to decide whether Knative is worth exploring in more detail and potentially adopting as a way to define, deploy, and manage some (if not all) of our applications.
You can probably guess what that definition does. The annotations tell Knative that we want to scale to 0 replicas if there is no traffic and that there should never be more than 3 replicas. We could, for example, choose never to scale below 2 replicas and to go way above 3. That would give us scalability and high availability without making our applications serverless, that is, without scaling down to zero replicas.
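As a sketch of that non-serverless variation (we will not use it in this chapter), the definition could look like the one that follows. The file name devops-toolkit-warm.yaml and the scale values 2 and 10 are illustrative and not part of the original walkthrough.
cat > devops-toolkit-warm.yaml <<'EOF'
apiVersion: serving.knative.dev/v1
kind: Service
metadata:
  name: devops-toolkit
spec:
  template:
    metadata:
      annotations:
        # Never scale below two replicas, never above ten (illustrative values)
        autoscaling.knative.dev/minScale: "2"
        autoscaling.knative.dev/maxScale: "10"
    spec:
      containerConcurrency: 100
      containers:
      - image: vfarcic/devops-toolkit-series
        ports:
        - containerPort: 80
EOF
Applying such a definition with kubectl, just like we are about to do with devops-toolkit.yaml, would keep at least two replicas running at all times instead of scaling down to zero.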
The containerConcurrency field is set to 100, meaning that, in a simplified form, there should be one replica for every hundred concurrent requests, while never going above the maxScale value. With those numbers, 250 concurrent requests would translate to three replicas (250 divided by 100, rounded up), while anything above 300 would stay capped at the maxScale of 3.
The image, ports, and resources fields should be self-explanatory since those are the same ones we would typically use in, let's say, a Deployment.
There are also some limitations we might need to be aware of. The most important one is that we can have only one container for each application managed by Knative. If you try to add additional entries to the containers array, you'll see that kubectl apply throws an error. That might change in the future, but, for now (August 2020), it is something you should be aware of.
That’s it. Let’s apply that definition and see what we’ll get.
kubectl --namespace production apply \
--filename devops-toolkit.yaml
We created a single resource. We did not specify a Deployment, nor did we create a Service. We did not define a HorizontalPodAutoscaler. We did not create any of the things we usually do. Still, our application should have all those and quite a few others. It should be fully operational, it should be scalable, and it should be serverless. Knative created all those resources and made our application serverless through that single short YAML definition. That is a very different approach from what we typically expect from Kubernetes.
Kubernetes is, in a way, a platform to build platforms. It allows us to create very specialized resources that provide value only when combined together. An application runs in Pods, Pods need ReplicaSets to scale, ReplicaSets need Deployments for applying new revisions. Communication is done through Services. External access is provided through Ingress. And so on and so forth. Usually, we need to create and maintain all those, and quite a few other resources ourselves. So, we end up with many YAML files, a lot of repetition, and with a lot of definitions that are not valuable to end-users, but instead required for Kubernetes’ internal operations. Knative simplifies all that by requiring us to define only the differentiators and only the things that matter to us. It provides a layer on top of Kubernetes that, among other things, aims to simplify the way we define our applications.
We’ll take a closer look at some (not all) of the resources Knative created for us. But, before we do that, let’s confirm that our application is indeed running and accessible.
Please execute the command that follows if you are using Minikube, Docker Desktop, or EKS.
curl -H "Host: devops-toolkit.production.example.com" \
http://$INGRESS_HOST
Please execute the command that follows if you are using GKE or AKS.
open http://devops-toolkit.production.$INGRESS_HOST
You already saw a similar result before. The major difference is that, this time, we applied a YAML definition instead of relying on kn service create to do the work. As such, we can store that definition in a Git repository. We can apply whichever process we use to make changes to the code, and we can hook it into whichever CI/CD tool we are using.
Now, let's see which resources were created for us. The right starting point is the kservice since that is the only one we created. Whatever else might be running in the production Namespace was created by Knative and not by us.
kubectl --namespace production \
get kservice
The output is as follows.
NAME URL LATESTCREATED LATESTREADY READY REASON
devops-toolkit http://devops-toolkit... devops-toolkit-... devops-toolkit-... True
As I already mentioned, that single resource created quite a few others. For example, we have Revisions. But, before we get to Revisions, we should talk about the Knative Configuration.
kubectl --namespace production \
get configuration
The output is as follows.
NAME LATESTCREATED LATESTREADY READY REASON
devops-toolkit devops-toolkit-... devops-toolkit-... True
The Configuration resource contains and maintains the desired state of our application. Whenever we change the Knative Service, we are effectively changing the Configuration, which, in turn, creates a new Revision.
kubectl --namespace production \
get revisions
The output is as follows.
NAME CONFIG NAME K8S SERVICE NAME GENERATION READY REASON
devops-toolkit-k8j9j devops-toolkit devops-toolkit-k8j9j 1 True
Each time we deploy a new version of our application, a new immutable Revision is created. It is a collection of almost all the application-specific resources. Each Revision has a separate Service, a Deployment, a Knative PodAutoscaler, and, potentially, a few other resources. Creating Revisions allows Knative to decide which request goes where, how to roll back, and a few other things.
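If you'd like to see a second Revision appear, any change to the Service template will do. As an optional sketch (the VERSION variable is purely illustrative, and running it means the live state will no longer match devops-toolkit.yaml), we could update the Service with kn and list the Revisions again.
kn service update devops-toolkit \
    --namespace production \
    --env VERSION=2.0
kubectl --namespace production \
    get revisions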
Now that we mentioned Deployments, Services, and other resources, let’s confirm that they were indeed created. Let’s start with Deployments.
kubectl --namespace production \
get deployments
The output is as follows.
NAME READY UP-TO-DATE AVAILABLE AGE
devops-toolkit-...-deployment 0/0 0 0 13m
The Deployment is indeed there. The curious thing is that 0 out of 0 replicas are ready. Since it's been a while since we interacted with the application, Knative decided that there is no point in keeping it running. So, it scaled it to zero replicas. As you already saw, it will scale back up when we start sending requests to the associated Service. Let's take a look at the Services as well.
kubectl --namespace production \
get services,virtualservices
The output is as follows.
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
service/devops-toolkit ExternalName <none> cluster-local-gateway.... <none> 2m47s
service/devops-toolkit-... ClusterIP 10.23.246.205 <none> 80/TCP 3m6s
service/devops-toolkit-...-private ClusterIP 10.23.242.13 <none> 80/TCP,... 3m6s
NAME GATEWAYS HOSTS AGE
virtualservice.... [knative-serving/...] [devops-...] 2m48s
virtualservice.... [mesh] [devops-toolkit...] 2m48s
We can see that Knative created Kubernetes Services, but also Istio VirtualServices. Since we told it that we want to combine it with Istio, it understood that we need not only Kubernetes core resources, but also those specific to Istio. If we chose a different service mesh, it would create whatever makes sense for it.
Further on, we got the PodAutoscaler.
kubectl --namespace production \
get podautoscalers
The output is as follows.
NAME DESIREDSCALE ACTUALSCALE READY REASON
devops-toolkit-... 0 0 False NoTraffic
PodAutoscaler is, as you can guess by its name, in charge of scaling the Pods to comply with the changes in traffic, or whichever other criteria we might use. By default, it measures the incoming traffic, but it can be extended to use formulas based on queries from, for example, Prometheus.
Finally, we got a Route.
kubectl --namespace production \
get routes
The output is as follows.
NAME URL READY REASON
devops-toolkit http://devops-toolkit.... True
A Route maps an endpoint (e.g., a subdomain) to one or more Revisions of the application. Routes can be configured in quite a few different ways but, in essence, they are the entities that route traffic to our applications.
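If you'd like to see that mapping, you can inspect the Route itself; the traffic section of the output shows which Revision receives which percentage of the requests.
kubectl --namespace production \
    get route devops-toolkit \
    --output yaml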
We are almost finished. There is only one crucial thing left to observe, at least from the perspective of a quick overview of Knative. What happens when many requests are "bombing" our application? We saw that when we do not interact with the app, it is scaled down to zero replicas. We also saw that when we send a request to it, it scales up to one replica. But what would happen if we started sending five hundred concurrent requests? Take another look at devops-toolkit.yaml and try to guess. It shouldn't be hard.
Did you guess how many replicas we should have if we start sending five hundred concurrent requests? Let’s assume that you did, and let’s see whether you were right.
We'll use Siege to send requests to our application. To be more specific, we'll use it to send a stream of five hundred concurrent requests over sixty seconds. We'll also retrieve all the Pods from the production Namespace right after Siege finishes "bombing" the application.
As before, the commands will differ slightly depending on the Kubernetes platform you’re using.
W> You will NOT be able to use Siege with Docker Desktop. That should not be a big deal since the essential thing is the output, which you can see here.
Please execute the command that follows if you are using Minikube or EKS.
kubectl run siege \
--image yokogawa/siege \
--generator run-pod/v1 \
-it --rm \
-- --concurrent 500 --time 60S \
--header "Host: devops-toolkit.production.example.com" \
"http://$INGRESS_HOST" \
&& kubectl --namespace production \
get pods
Please execute the command that follows if you are using GKE or AKS.
kubectl run siege \
--image yokogawa/siege \
--generator run-pod/v1 \
-it --rm \
-- --concurrent 500 --time 60S \
"http://devops-toolkit.production.$INGRESS_HOST" \
&& kubectl --namespace production \
get pods
The output, in my case, is as follows.
...
Transactions: 40697 hits
Availability: 100.00 %
Elapsed time: 59.53 secs
Data transferred: 83.72 MB
Response time: 0.22 secs
Transaction rate: 683.64 trans/sec
Throughput: 1.41 MB/sec
Concurrency: 149.94
Successful transactions: 40699
Failed transactions: 0
Longest transaction: 5.30
Shortest transaction: 0.00
...
NAME READY STATUS RESTARTS AGE
devops-toolkit-...-deployment-... 3/3 Running 0 58s
devops-toolkit-...-deployment-... 3/3 Running 0 60s
devops-toolkit-...-deployment-... 3/3 Running 0 58s
We can see that, in my case, over forty thousand requests were sent and that the availability is 100.00 %. That might not always be the situation, so don't be alarmed if, in your case, the figure is slightly lower. Your cluster might not even have enough capacity to handle the increase in workload and might need to scale up. In such a case, the time required to scale up the cluster might have been too long for all the requests to be processed. You can always wait for a while for all the Pods to terminate and try again with increased cluster capacity.
For now, Knative does not guarantee 100% availability. I was lucky. If you have huge variations in traffic, you can expect something closer to 99.9% availability. But that applies only when there is a huge spike like the one we just produced. Our traffic jumped from zero to a continuous stream of five hundred concurrent requests within milliseconds. For "normal" usage, it should be closer to 100% (e.g., 99.99%) availability.
What truly matters is that the number of Pods increased from zero to three. Typically, there should have been five Pods since we set the containerConcurrency value to 100, and we were streaming 500 concurrent requests. But we also set the maxScale annotation to 3, so it reached the limit of the allowed number of replicas.
While you're reading this, Knative has probably already started scaling down the application. It likely scaled it to one replica first, to keep it warm in case new requests come in. After a while, it should scale down to nothing (zero replicas) as long as no new traffic arrives.
The vital thing to note is that Knative does not interpret the traffic based only on the current metrics. It will not scale up when the first request that cannot be handled with the existing replicas kicks in. It will also not scale down to zero replicas the moment all requests stop coming in. It changes things gradually, and it uses both current and historical metrics to figure out what to do next.
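If you'd like to observe that gradual scale-down, you can watch the Pods in the production Namespace (press ctrl+c to stop watching once they are gone).
kubectl --namespace production \
    get pods --watch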
Assuming that you are not a very fast reader, the number of Pods should have dropped to zero by now. Let’s confirm that.
kubectl --namespace production \
get pods
The output states that no resources were found in production namespace.
In your case, a Pod might still be running, or its status might be terminating. If that's the case, wait for a while longer and repeat the previous command.