Traditionally, we deploy a new release by replacing the current one. The old release is stopped, and the new one is brought up in its place. The problem with this approach is the downtime that occurs from the moment the old release is stopped until the new one is fully operational. No matter how quickly we perform this process, there will be some downtime. It might last only a millisecond, or it can last for minutes or, in extreme situations, even hours. Monolithic applications introduce additional problems, such as the considerable time needed for the application to initialize. People have tried to solve this issue in various ways, and most solutions are some variation of the blue-green deployment process. The idea behind it is simple. At any time, one of the releases should be running, meaning that, during the deployment process, we must deploy the new release in parallel with the old one. The new and the old releases are called blue and green.
We run one color as a current release, bring up the other color as a new release and, once it is fully operational, switch all the traffic from the current to the new release. This switch is often made with a router or a proxy service.
With the blue-green process, not only are we removing the deployment downtime, but we are also reducing the risk the deployment might introduce. No matter how well we tested our software before it reached the production node(s), there is always a chance that something will go wrong. When that happens, we still have the current version to rely on. There is no reason to switch traffic to the new release until it has been tested thoroughly enough to rule out, as far as reasonably possible, a failure caused by the specifics of the production node. That usually means that integration testing is performed after the deployment and before the "switch" is made. Even if those verifications return false negatives and a failure appears after the traffic is redirected, we can quickly switch back to the old release and restore the system to its previous state. We can roll back much faster than if we needed to restore the application from some backup or do another deployment.
If we combine the blue-green process with immutable deployments (through VMs in the past and through Docker containers today), the result is a very powerful, secure, and reliable deployment procedure that can be performed much more often. If the architecture is based on microservices in conjunction with Docker containers, we don't need two nodes to perform the procedure; we can run both releases side by side.
The most significant challenge with this approach is the database. In many cases, we need to upgrade the database schema in a way that supports both releases and then proceed with the deployment. The problems that might arise from such an upgrade are often related to the time that passes between releases. When releases are frequent, changes to the database schema tend to be small, making it easier to maintain compatibility across two releases. If weeks or months pass between releases, database changes can be so big that backward compatibility might be impossible or not worth the effort. If we are aiming for continuous delivery or deployment, the period between two releases should be short or, if it isn't, involve a relatively small number of changes to the code base.
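As a minimal sketch of what a backward-compatible schema change can look like, the hypothetical migration below adds a nullable column, so the old release (which ignores the column) and the new release (which uses it) can run against the same database at the same time. The table and column names are invented for illustration, and SQLite stands in for whatever database the system actually uses.

```python
import sqlite3

# In-memory database standing in for the shared production database.
db = sqlite3.connect(":memory:")

# Schema used by the old (blue) release.
db.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
db.execute("INSERT INTO users (name) VALUES ('alice')")

# Backward-compatible migration: an additive, nullable column.
# The blue release keeps working because its INSERTs and SELECTs
# never mention the new column; the green release can start using it.
db.execute("ALTER TABLE users ADD COLUMN email TEXT")

# The old release still works without knowing about 'email'.
db.execute("INSERT INTO users (name) VALUES ('bob')")

# The new release writes to the new column.
db.execute("UPDATE users SET email = 'alice@example.com' WHERE name = 'alice'")

rows = db.execute("SELECT name, email FROM users ORDER BY id").fetchall()
print(rows)  # [('alice', 'alice@example.com'), ('bob', None)]
```

Renaming or dropping a column, by contrast, would break the release that still expects the old schema, which is why such changes are usually split across several small, additive releases.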
The Blue-Green Deployment Process
The blue-green deployment procedure, when applied to microservices packed as containers, is as follows.
The current release (for example, blue) is running on the server. All traffic to that release is routed through a proxy service. Microservices are immutable and deployed as containers.
When a new release (for example, green) is ready to be deployed, we run it in parallel with the current release. This way, we can test the new release without affecting users, since all traffic continues to be sent to the current release.
Once we are confident that the new release works as expected, we change the proxy service configuration so that traffic is redirected to it. Most proxy services will let existing requests finish their execution with the old configuration, so there is no interruption.
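With nginx as the proxy, for example, the switch can be as small as changing which upstream the site points at and reloading the configuration. The ports and upstream names below are assumptions for illustration only:

```nginx
# Blue (current) and green (new) releases run side by side on different ports.
upstream app_blue  { server 127.0.0.1:8081; }
upstream app_green { server 127.0.0.1:8082; }

server {
    listen 80;
    location / {
        # Switch the deployment by changing this line to app_green
        # and running `nginx -s reload`; nginx reloads gracefully,
        # so in-flight requests finish with the old configuration.
        proxy_pass http://app_blue;
    }
}
```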
When all the requests sent to the old release have received responses, the previous version of the service can be removed or, even better, simply stopped. If the latter option is used, rollback in case of a failure of the new release is almost instantaneous, since all we have to do is bring the old release back up.
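The whole procedure, including the near-instant rollback, can be sketched as a small Python simulation. The `Router` class is a stand-in for the proxy service, and the release names and versions are illustrative; in practice, these steps would be container and proxy-configuration commands.

```python
class Router:
    """Stand-in for the proxy service: routes all traffic to one release."""
    def __init__(self, target):
        self.target = target

    def switch(self, target):
        self.target = target


releases = {"blue": "1.0 (running)", "green": None}
router = Router("blue")

# 1. Deploy the new (green) release in parallel with blue.
releases["green"] = "1.1 (running)"

# 2. Test green directly (traffic still goes to blue), then switch.
assert releases["green"] is not None
router.switch("green")

# 3. Stop blue, but keep it around so a rollback stays cheap.
releases["blue"] = "1.0 (stopped)"

# If green fails, rollback is just bringing blue back up and switching.
releases["blue"] = "1.0 (running)"
router.switch("blue")
print(router.target)  # blue
```

The key property the sketch illustrates is that at every step, exactly one release receives traffic, and the other is either being prepared or kept ready for rollback.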
The DevOps 2.0 Toolkit
This article was the beginning of the Blue-Green Deployment chapter of The DevOps 2.0 Toolkit: Automating the Continuous Deployment Pipeline with Containerized Microservices book.
This book is about different techniques that help us architect software in a better and more efficient way, with microservices packed as immutable containers, tested and deployed continuously to servers that are automatically provisioned with configuration management tools. It's about fast, reliable, and continuous deployments with zero downtime and the ability to roll back. It's about scaling to any number of servers, designing self-healing systems capable of recovering from both hardware and software failures, and about centralized logging and monitoring of the cluster.
In other words, this book envelops the whole microservices development and deployment lifecycle using some of the latest and greatest practices and tools. We'll use Docker, Kubernetes, Ansible, Ubuntu, Docker Swarm and Docker Compose, Consul, etcd, Registrator, confd, Jenkins, and so on. We'll go through many practices and, even more, tools.