By Howard Lo
This is part two of our exploration into Jenkins and how we use it at Earnest. Check out part one if you haven’t already.
Any experienced administrator could tell you that Jenkins upgrades are prone to failure. Because Jenkins plugins can change their behavior with little to no notice, admins who skip the fine print may find themselves with the daunting task of tracking down every change on the road to recovery. Some of these behavioral changes merely surprise our end users, but the worst of them can cause Jenkins to crash on launch or even result in data loss.
Earnest has a dedicated team that builds the underlying platform and offers it as a service to other engineering teams, where they can build services that contain business logic on top. As such, we have accrued years of experience with the tools and processes for rolling out platform changes. Ideas such as contract testing, zero-downtime deployment, and automated rollbacks are leveraged heavily to ensure our platform is stable despite constant updates. We can borrow those same ideas and treat our Jenkins upgrades just like how we roll out new features in our platform-as-a-service offerings.
In principle, we need to guarantee that upgrades will not result in any major regressions while minimizing downtime, all without intimate knowledge of the various custom CI/CD pipelines that run on our platform.
To visualize this, we’ve divided our Jenkins upgrade into four stages:
Review and Decide
We only ever run the LTS version of Jenkins because of its relatively high level of stability. In an ideal world, we could trigger an upgrade immediately upon a new LTS release. In practice, an LTS release can bundle changes that may be undesirable. So, to ensure that our end users experience a Jenkins upgrade that is both necessary and of high quality, our team reviews the release notes to fully understand what the release brings. This process starts with a subscription to the Jenkins release RSS feed; a feed alert prompts us to review the Jenkins changelog. When reviewing the changelog, we try to identify key security fixes, notable feature improvements, and any other bug fixes that would make an upgrade worth the small time commitment and risk involved. Once the team has reviewed the changes and decided that the benefits outweigh the costs of an upgrade, we move on to the next stage.
Jenkins provides a pre-built Docker image on Docker Hub, which serves as a good starting point for building our own customized image.
Here is a snippet of our Dockerfile for Jenkins:
Next, we also want to lock in the versions of the plugins that go into our Jenkins master. Fortunately, the official Jenkins image ships with an install script that makes plugin installation a breeze. Our team selected a set of Jenkins plugins that fit our CI/CD workflow based on our customers’ needs, and they are defined in a plugins.txt file. This file is a manifest of all the plugins with their corresponding versions. It is kept in version control and is ultimately copied into the Jenkins image and used by the install script.
Once the RUN statement completes, our image contains the Jenkins plugins at predictable, controlled versions.
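Pinning only helps if every entry in the manifest actually carries a version. As a minimal sketch (not part of our actual tooling), the check below parses a plugins.txt-style manifest in the `plugin-id:version` form and reports entries that are unpinned or set to `latest`. The function names and the sample manifest, including its version numbers, are made up for illustration.

```python
def parse_plugins(text):
    """Return {plugin_id: version or None} from plugins.txt-style text."""
    plugins = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line:
            continue
        plugin_id, _, version = line.partition(":")
        plugins[plugin_id.strip()] = version.strip() or None
    return plugins


def unpinned(plugins):
    """List plugins whose version is missing or floating ('latest')."""
    return sorted(p for p, v in plugins.items() if v is None or v == "latest")


# Hypothetical manifest: plugin versions here are illustrative only.
SAMPLE = """\
# core plugins
git:4.0.0
workflow-aggregator:2.6
slack
blueocean:latest
"""

if __name__ == "__main__":
    offenders = unpinned(parse_plugins(SAMPLE))
    if offenders:
        print("Unpinned plugins:", ", ".join(offenders))
```

A check like this could run in CI before the image build, so a floating version never reaches the image unnoticed.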
Running the Jenkins container
When running a container, there are multiple parameters that need to be configured per environment. For this reason, and to make life easier for ourselves, we generate a `docker-compose.yml` file from an Ansible Jinja template for each runtime environment. This means we can launch Jenkins on any host with a simple `docker-compose up -d` command without remembering all the parameters we need to set. It has the added benefit of making it easy for our system administrators to reconfigure and restart the Jenkins instance on the fly, should the need arise.
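To illustrate the idea of one template rendered per environment, here is a rough sketch that uses Python's standard-library `string.Template` as a stand-in for the Ansible Jinja template we actually use. The image names, host ports, host paths, and `JAVA_OPTS` values are all hypothetical; only the container port 8080 and the `/var/jenkins_home` path come from the official Jenkins image.

```python
from string import Template

# Stand-in for the Jinja template; every ${...} must be supplied per environment.
COMPOSE_TEMPLATE = Template("""\
version: "3"
services:
  jenkins:
    image: ${image}
    ports:
      - "${http_port}:8080"
    volumes:
      - ${jenkins_home}:/var/jenkins_home
    environment:
      - JAVA_OPTS=${java_opts}
""")

# Hypothetical per-environment parameters.
ENVIRONMENTS = {
    "staging": {
        "image": "example/jenkins:staging",
        "http_port": 8081,
        "jenkins_home": "/data/jenkins-staging",
        "java_opts": "-Xmx2g",
    },
    "production": {
        "image": "example/jenkins:production",
        "http_port": 8080,
        "jenkins_home": "/data/jenkins",
        "java_opts": "-Xmx8g",
    },
}


def render_compose(env_name):
    """Render docker-compose.yml content for one environment.

    substitute() raises KeyError if a parameter is missing, so an
    incomplete environment fails at render time, not at launch time.
    """
    return COMPOSE_TEMPLATE.substitute(ENVIRONMENTS[env_name])


if __name__ == "__main__":
    print(render_compose("staging"))
```

The rendered text would be written to `docker-compose.yml` on the target host, after which `docker-compose up -d` needs no extra arguments.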
After launching the Jenkins container successfully in our staging environment, we can proceed to the testing phase. We test this new Jenkins instance with an end-to-end test pipeline that exercises both our core Jenkins library functions and our build pipelines. To create this “pipeline for all pipelines”, we build a matrix that captures every pipeline type and programming language we support, so that we have full test coverage no matter what tools and language a project uses.
Here is a snippet of the test pipeline stage. We can simply invoke tests from any of our existing git repositories, which lets us re-use test code from our existing projects. The approach is also extensible: we can add more test stages as the variety of pipelines increases. And because the tests come from different repositories with no inter-dependency, we can run them in parallel, so the total run time is bounded by the slowest suite rather than by the sum of all suites. The only thing we have to be mindful of is having enough Jenkins build executors to support these parallel tests.
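The matrix-plus-parallelism idea can be sketched outside Jenkins as well. The Python below is not our pipeline code; it only models the shape of it: build every (pipeline type, language) combination, then run the independent suites concurrently with a worker cap that plays the role of the available build executors. The pipeline types, languages, and the `run_suite` stub are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import product

# Hypothetical dimensions of the test matrix.
PIPELINE_TYPES = ["library", "service"]
LANGUAGES = ["node", "scala"]


def build_matrix(pipeline_types, languages):
    """Every combination of pipeline type and language we want covered."""
    return list(product(pipeline_types, languages))


def run_suite(cell):
    """Stand-in for triggering the tests of one repository.

    A real implementation would start a build and wait for its result;
    this stub simply reports success for the given matrix cell.
    """
    return cell, True


def run_matrix(matrix, executors):
    """Run all suites in parallel, capped at `executors` concurrent runs."""
    with ThreadPoolExecutor(max_workers=executors) as pool:
        return dict(pool.map(run_suite, matrix))


if __name__ == "__main__":
    results = run_matrix(build_matrix(PIPELINE_TYPES, LANGUAGES), executors=2)
    for cell, passed in results.items():
        print(cell, "passed" if passed else "FAILED")
```

With fewer workers than matrix cells, the remaining suites queue up, which is why executor capacity is the thing to watch as the matrix grows.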
Promoting Jenkins to production
With our Jenkins instance on staging passing all the tests, we’re finally ready to promote it! We use a technique known as blue-green deployment in this stage. This essentially means that we take the same Jenkins image that we deployed to staging, re-tag it so we know it’s tested and production-ready, deploy it to a new stack identical to the existing one, and finally re-route traffic to the new instance. Since Jenkins uses the underlying file system to store its build data and configuration, we also have to use a file sync tool such as rclone to copy over the data, or use a network file system such as Amazon’s EFS, before we launch the new Jenkins.
Once the new instance is up, the team member responsible for the promotion ensures that the new Jenkins is accessible and that the existing build data and configuration are retained. Our monitoring tool also helps us confirm that the Jenkins instance is healthy on all endpoints. With our shiny new “green” Jenkins instance up and running, we can finally clean up our previous (blue) Jenkins instances.
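The ordering of the promotion steps matters more than any single step, so here is a minimal sketch of that ordering in Python. It is an illustration rather than our real tooling: the step names are hypothetical, each step is an injected callable, and checking health before switching traffic (and leaving blue untouched on failure) is an assumption about how one might guard the cutover.

```python
STEP_NAMES = [
    "retag_image",     # mark the staging-tested image as production-ready
    "sync_data",       # copy build data and configuration to the new stack
    "start_green",     # launch the new (green) Jenkins
    "stop_green",      # tear green down again if it is unhealthy
    "switch_traffic",  # re-route users to green
    "cleanup_blue",    # remove the previous (blue) instance
]


def promote(steps, is_healthy):
    """Run a blue-green promotion and return the stack serving traffic."""
    steps["retag_image"]()
    steps["sync_data"]()
    steps["start_green"]()
    if not is_healthy():
        # Green failed its checks: blue keeps serving and is not cleaned up.
        steps["stop_green"]()
        return "blue"
    steps["switch_traffic"]()
    steps["cleanup_blue"]()
    return "green"


def recording_steps(calls):
    """Build no-op steps that only record the order in which they ran."""
    return {name: (lambda n=name: calls.append(n)) for name in STEP_NAMES}


if __name__ == "__main__":
    calls = []
    live = promote(recording_steps(calls), is_healthy=lambda: True)
    print(live, calls)
```

Because clean-up happens only after traffic has moved, a failed promotion leaves the old instance exactly as it was.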
Here is what our Jenkins clean-up script looks like; of course, we run it as a Jenkins job (wrapped around an Ansible playbook):
And here is our trusty slackbot reporting on a successful clean up:
Although the initial investment in the project required considerable effort, the benefits have compounded in terms of increasing our release (upgrade) velocity and overall confidence when rolling out changes.
Some of the key benefits of this project include:
- The ability to apply monthly security patches without disrupting users
- Happier customers who can take advantage of new pipeline features as soon as they make their way into the latest Jenkins versions
- The ability to perform upgrades or maintenance during office hours with minimal downtime
- The ability to maintain high service uptime during upgrades
This project has also served as a forcing function that paved the way for running Jenkins as a container, so that in the future we could even opt to migrate it to a container management service. For now, both our customers and admins are very happy with the results!
If you found this series interesting and you want to learn more about what we work on, Earnest is hiring and we’d love to talk!