Running Fault Tolerant Jenkins Inside A Docker Swarm Cluster

In this article, we’ll discuss a way to create a Jenkins service inside a Docker Swarm cluster and some of the benefits such a service provides.

Environment Setup

We’ll start by creating a Docker Swarm cluster. I will assume you already have at least a basic knowledge how Docker Swarm Mode works. If you don’t, I suggest you read the Docker Swarm Introduction (Tour Around Docker 1.12 Series) article or fetch The DevOps 2.1 Toolkit: Docker Swarm book.

Some of the files will be shared between the host file system and Docker Machines we’ll create soon. Docker Machine makes the whole directory that belongs to the current user available inside the VM. Therefore, please make sure that the code is cloned inside one of the user’s sub-folders.

git clone

cd cloud-provisioning


We cloned the cloud-provisioning repository and executed the scripts/ script that created the production cluster.

Let’s confirm that the cluster was indeed created correctly.

eval $(docker-machine env swarm-1)

docker node ls

The output of the node ls command is as follows (IDs are removed for brevity).

swarm-2   Ready   Active        Reachable
swarm-1   Ready   Active        Leader
swarm-3   Ready   Active        Reachable

Now that the production cluster is up and running, we can create Jenkins service.

Jenkins Service

Traditionally, we would run Jenkins in its own server. Even if we’d choose to share server’s resources with other applications, the deployment would still be static. We’d run a Jenkins instance (without or without Docker) and hope that it never fails. The problem with this approach is in the fact that every application fails sooner or later. Either the process will stop, or the whole node will die. Either way, Jenkins, like any other application, will stop working at some moment.

The problem is that Jenkins become a critical application in many organizations. If we move the execution or, to be more precise, triggering of all automation into Jenkins, we create a strong dependency. If Jenkins is not running, our code is not built, it is not tested, and it is not deployed. Sure, when it fails, you can bring it up again. If the server it is running on stops working, you can deploy it somewhere else. The downtime, assuming that happens during working hours, will not be long. An hour, maybe two, or even more time will pass since the moment it stops working, someone finds out, notifies someone else, that someone restarts the application or provisions a new server. Is that a long time? It depends on the size of your organization. The more people depend on something, the bigger the cost when that something doesn’t work. Even if such a downtime and the cost it produces is not critical, we already have all the knowledge and the tools to avoid it. All we have to do is create another service and let Swarm take care of the rest.

Let’s create a Jenkins service.

mkdir -p docker/jenkins

docker service create --name jenkins \
    -p 8082:8080 \
    -p 50000:50000 \
    -e JENKINS_OPTS="--prefix=/jenkins" \
    --mount "type=bind,source=$PWD/docker/jenkins,target=/var/jenkins_home" \
    --reserve-memory 300m \

docker service ps jenkins

Jenkins stores its state in the file system. Therefore, we started by creating a directory (mkdir) on the host. It will be used as Jenkins home. Since we are inside one of the subdirectories of our host’s user, the docker/jenkins directory is mounted on all the machines we created.

Next, we created the service. It exposes the internal port 8080 as 8082 as well as the port 50000. The first one is used to access Jenkins’ UI and the second for master/agent communication. We also defined the URL prefix as /jenkins and mounted the Jenkins home directory. Finally, we reserved 300m of memory.

Once the image is downloaded, the output of the service ps command is as follows (IDs are removed for brevity).

jenkins.1 jenkins:2.7.4-alpine swarm-1 Running       Running 52 seconds ago
Production cluster with the Jenkins service

Production cluster with the Jenkins service

Jenkins 2 changed the setup process. While the previous versions allowed us to run it without any mandatory configuration, the new Jenkins forces us to go through some steps manually. Unfortunately, at the time of this writing, there is no good API to help us automate the process. While there are some “tricks” we could use, the benefits are not high enough when compared with the additional complexity they introduce. After all, we’ll setup Jenkins only once, so there is no big incentive to automate the process (at least until configuration API is created).

Let’s open the UI.

open http://$(docker-machine ip swarm-1):8082/jenkins

The first thing you will notice is that you are required to introduce the Administrator password. Quite a few enterprise users requested security hardening. As a result, Jenkins cannot be accessed, anymore, without initializing a session. If you are new to Jenkins, or, at least, version 2, you might wonder what the password is. It is output to logs (in our case stdout) as well as to the secrets/initialAdminPassword file which will be removed at the end of the setup process.

Let’s see the content of the secrets/initialAdminPassword file.

cat docker/jenkins/secrets/initialAdminPassword

The output will be a long string that represents the temporary password. Please copy it, go back to the UI, paste it to the Administrator password field, and click the Continue button.

Unlock Jenkins screen

Unlock Jenkins screen

Once you unlock Jenkins, you will be presented with a choice to Install suggested plugins or select those that fit your needs. The recommended plugins fit most commonly used scenarios so we’ll go with that option.

Please click the Install suggested plugins button.

Once the plugins are downloaded and installed, we are presented with a screen that allows us to create the first admin user. Please use admin as both the username and the password. Fill free to fill the rest of the fields with any value. Once you’re done, click the Save and Finish button.

Create First Admin User screen

Create First Admin User screen

Jenkins is ready. All that’s left, for now, is to click the Start using Jenkins button.

Now we can test whether Jenkins failover works.

Jenkins Failover

Let’s stop the service and observe Swarm in action. To do that, we need to find out the node it is running in, point our Docker client to it, and remove the container.

NODE=$(docker service ps -f desired-state=running jenkins | tail +2 | awk '{print $4}')

eval $(docker-machine env $NODE)

docker rm -f $(docker ps -qa -f "ancestor=jenkins:2.7.4-alpine")

We listed Jenkins processes and applied the filter that will return only the one with the desired state running (docker service ps -f desired-state=running jenkins). The output was piped to the tail command that removed the header (tail +2) and, later on, piped again to the awk command that limited the output to the fourth column (awk '{print $4}') that contains the node the process is running in. The final result was stored in the NODE variable.

Later on, we used the eval command to create environment variables that will be used by our Docker client to operate the remote engine. Finally, we removed the container with the combination of the ps and rm commands.

As we already learned in the previous chapters, if a container fails, Swarm will run it again somewhere inside the cluster. When we created the service, we told swarm that the desired state is to have one instance running and Swarm is doing its best to make sure our expectations are fulfilled.

Let us confirm that the service is, indeed, running.

docker service ps jenkins

If Swarm decided to re-run Jenkins on a different node, it might take a few moments until the image is pulled. After a while, the output of the service ps command should be as follows.

NAME           IMAGE           NODE     DESIRED STATE  CURRENT STATE                   ERROR
jenkins.1      jenkins:alpine  swarm-3  Running        Running less than a second ago
 \_ jenkins.1  jenkins:alpine  swarm-1  Shutdown       Failed 5 seconds ago            "task: non-zero exit (137)"

We can do a final confirmation by reopening the the UI.

open http://$(docker-machine ip swarm-1):8082/jenkins

Since Jenkins does not allow unauthenticated users, you’ll have to login. Please use admin as both the User and the Password.

You’ll notice that, this time, we did not have to repeat the setup process. Even though a fresh new Jenkins image is run on a different node, the state is still preserved thanks to the host directory we mounted.

We managed to make Jenkins fault tolerant, but we did not manage to make it run without any downtime. Due to its architecture, Jenkins master cannot be scaled. As a result, when we simulated a failure by removing the container, there was no second instance to absorb the traffic. Even though Swarm re-scheduled it on a different node, there was some downtime. During a short period, the service was not accessible. While that is not a perfect situation, we managed to reduce downtime to a minimum. We made it fault tolerant, but could not make it run without downtime. Considering its architecture, we did the best we could.

The DevOps 2.1 Toolkit: Docker Swarm

The DevOps 2.1 Toolkit: Docker SwarmIf you liked this article, you might be interested in The DevOps 2.1 Toolkit: Docker Swarm book. Unlike the previous title in the series (The DevOps 2.0 Toolkit: Automating the Continuous Deployment Pipeline with Containerized Microservices) that provided a general overlook of some of the latest DevOps practices and tools, this book is dedicated entirely to Docker Swarm and the processes and tools we might need to build, test, deploy, and monitor services running inside a cluster.

You can get a copy from (and the other worldwide sites) or LeanPub. It is also available as The DevOps Toolkit Series bundle.

Give the book a try and let me know what you think.

10 thoughts on “Running Fault Tolerant Jenkins Inside A Docker Swarm Cluster

  1. chandan agarwal

    I used the following but it doesnt work.
    mkdir -p jenkins
    docker service create –name jenkins \
    -p 8090:8080 \
    –mount type=bind,src=$PWD/jenkins,dst=/var/jenkins_home \

    i get this error “invalid mount config for type‚Ķ”

    1. Viktor Farcic Post author

      This can happen for quite a few reasons. I would need to have more details of your setup before answering. Would you mind joining You can ping me there and we can discuss this problem through a chat or share your screen and debug it together. Does that sound like a good option for you?

  2. dserodio

    No volumes are used? How is Jenkins state persisted and restored if the new Jenkins container is started in a different Swarm node?

    1. Viktor Farcic Post author

      The code in this article uses local setup that simulates persistance. In the command that creates the jenkins service you’ll see --mount argument. Since the user’s directory on the host (your laptop) is mounted inside every Docker machine, each will have the same data so when the service is rescheduled to a different node, Jenkins state is persisted. Try it out. Remove a container and Swarm will consider it a failure and reschedule it. With a little bit of luck, that rescheduling will be done on a different node.

      You’ll see that the state is persisted. Mounting of a host (laptop) directory to all machines is a simulation of doing NFS mounts and a simplest way to showcase it locally. In a real-world situation, you would use a driver like REX-Ray that would mount an EFS or EBS volume (in case of AWS), or whatever else is supported by your vendor (including VMWare if on premise).

      A detailed examples that use AWS (same logic applies to any other provider) can be found in or If you get it from LeanPub it is risk-free since there is unconditional return policy. Please give it a try (at least the persistent storage chapter) and let me know what you think.

  3. sankemax

    I also have a problem with binding a host dir. I get the following error:
    invalid mount config for type “bind”: bind source path does not exist

    did any of you who encountered it manage to find a solution?


Leave a Reply

Fill in your details below or click an icon to log in: Logo

You are commenting using your account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s