KARL's New Approach to Safely Releasing Updates to Hosted Production Sites
One of my many functions as the KARL Champion at Six Feet Up is to release updates for our KARL customers, such as OSF and Oxfam Great Britain. We maintain a copy of the KARL software for each customer in both a testing and a production environment. During the KARL migration from Xen instances to a combined KARL Hosting Infrastructure, we spent some time addressing concerns with how we deploy new updates.
What follows is a simplified overview of the old process, which typically involved these steps:
- Run svn up on the customer's copy of the KARL buildout to retrieve the updates from the repository
- Run the buildout
- Restart Supervisor
One of the biggest concerns when we ran buildouts was "Will it successfully complete?" Much of this concern is managed through adequate levels of testing. Even with thorough testing and running projects through QA teams, there is always a chance that something will be missed. For KARL, as with other projects at Six Feet Up, developers use one set of configurations for running the product locally so that they can perform development tasks. When deploying a buildout to a staging or production environment, we use a different set of configurations established for the environment on which that buildout is being deployed. These configuration changes, environment differences, or even something that slipped into the code base after QA completed testing can lead to random and unexpected results.
When a buildout is updated, it deletes the old scripts in the bin directory and recreates them. If an unexpected error is encountered, the buildout may not complete, leaving those scripts missing or broken. In a production environment this could be devastating.
This single, critical issue began a conversation and led to the creation of a process to identify how we could improve our release processes for KARL sites. As the exclusive host for KARL sites, we realized that our customers deserve greater certainty, which made this issue a priority. For some of the projects at Six Feet Up, we archive a copy of the buildout before updating. For our KARL deployments, the KARL Development team identified and created a list of proposed changes that might be considered in order to handle this type of issue.
I spent quite a bit of time trying to implement this plan while making sure that we automated as much as possible. Chris Rossi of the KARL Development team and I took some of my automation scripts and converted them into a package of tools to assist in the KARL release management process.
High-level overview of the change
For each customer, we have a separate directory location where we check out the buildouts. We now check each one out to a unique directory, using the tag number as part of the directory name to guarantee uniqueness. For tag 3.0 for customerA, this produces a directory structure like this:
- /all_KARL_Sites/customerA/tag_3.0/
Next, we run the buildout. If, and only if, the buildout is successful, we then make a sym link from the folder tag_3.0 to a directory named "current". This leads to a directory structure such as:
- /all_KARL_Sites/customerA/current (-> points to /all_KARL_Sites/customerA/tag_3.0/)
For our supervisor, webserver, or any other application that needs to use files for customerA, we point them at the "current" path rather than a specific tag directory.
When we do the next release, we start with a clean svn checkout in a new tag directory. We run the buildout and update the "current" sym link to point to the new tagged directory, leaving us with a directory structure of:
- /all_KARL_Sites/customerA/current (-> now points to /all_KARL_Sites/customerA/tag_3.1/)
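The flow above can be sketched as a small shell script. The paths and the checkout/buildout steps here are illustrative stand-ins (stubbed with mkdir so the sketch is self-contained), not the actual KARL tooling:

```shell
#!/bin/sh
# Sketch of the tag-per-directory release flow.
set -e

BASE=$(mktemp -d)/all_KARL_Sites/customerA   # stand-in for the real site root
mkdir -p "$BASE"

release_tag() {
    tag_dir="$BASE/tag_$1"
    mkdir -p "$tag_dir"     # stand-in for: svn checkout of the tag into $tag_dir
    # stand-in for: (cd "$tag_dir" && bin/buildout)
    # Only if the buildout succeeded do we repoint "current" at the new tag:
    ln -sfn "$tag_dir" "$BASE/current"
}

release_tag 3.0    # current -> tag_3.0
release_tag 3.1    # current -> tag_3.1; tag_3.0 is kept as a fallback
```

Because supervisor and the webserver only ever reference the "current" path, swapping the link is the entire cutover.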
What's our gain?
With this approach, we retain the previous buildout. If something is discovered immediately after the release, we can switch the "current" sym link back to the previous install. If the buildout fails to finish, the "current" link is still pointing to the previous install, so the site stays online and there is no "fire" to put out. We can stop the release and take our time finding the problem.
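Rolling back is then just repointing the link. A minimal sketch, with illustrative paths and the service restart stubbed out:

```shell
#!/bin/sh
# Rollback sketch: repoint "current" at the previous tag directory.
set -e
BASE=$(mktemp -d)/all_KARL_Sites/customerA
mkdir -p "$BASE/tag_3.0" "$BASE/tag_3.1"
ln -sfn "$BASE/tag_3.1" "$BASE/current"   # release 3.1 is live

# A problem is found right after the release: switch back to the
# previous install and restart the services against the old code.
ln -sfn "$BASE/tag_3.0" "$BASE/current"
# stand-in for: supervisorctl restart of the affected programs
```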
What are the side effects?
When we began to implement this process, we identified some issues and tackled them one by one.
First, each time we release, we install a fresh copy of the buildout. This means any data, files, or logs created inside the buildout directory must be managed, or they will be lost when the next release occurs. For our KARL sites, the mailout, zodb, blobs, and some other items were stored inside the buildout directory.
We managed some items, like mailout, by moving those folders to the home directory of the user that the KARL application runs as. For other items, like the zodb and blobs, we created the same type of directory structure in a different location on the server. We typically don't have an evolve for each release, so by moving the zodb and blobs outside of the buildout, we only touch them when a release requires an evolve on the data.
Since we were taking so much care in keeping backup copies of code by using fresh copies of the buildout, it made sense to extend this to our data management as well. When we have to perform an evolve, instead of running it live against the current data, we make a copy of the data into a separate directory. If the evolve is successful, we then update the data's "current" link to point to the new data directory, in the same fashion we did above for the code.
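The same copy-then-swap idea applied to the data might look like the following sketch; the directory names, the Data.fs placeholder, and the stubbed evolve step are illustrative assumptions:

```shell
#!/bin/sh
# Copy-then-evolve sketch: never evolve the live data in place.
set -e
DATA=$(mktemp -d)/all_KARL_Data/customerA
mkdir -p "$DATA/data_3.0"
: > "$DATA/data_3.0/Data.fs"              # stand-in for the live zodb file
ln -sfn "$DATA/data_3.0" "$DATA/current"  # data for the current release

# Release 3.1 requires an evolve, so copy the data first...
cp -R "$DATA/data_3.0" "$DATA/data_3.1"
# stand-in for: run the evolve against the copy in data_3.1
# ...and only after a successful evolve repoint "current":
ln -sfn "$DATA/data_3.1" "$DATA/current"
```

If the evolve fails, "current" never moves, and the untouched data_3.0 copy is still what the site serves.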
Evaluating Pros & Cons:
Cons:
- Complexity adds more steps to an update
- Each update is a full, fresh install
- Each update takes longer with this method
- Backup copies of code, data, etc. can use more disk space
Pros:
- Safer installs
- Rollback capability
- Backups of code and data
- A failed install doesn't kill the live site
- Standardized folders and setups make automation easier
By performing this process, we turn every release into the equivalent of installing for the first time. Everything is run fresh:
- fresh svn checkout,
- fresh buildout,
- fresh copy of the zodb and blob data, etc.
This complicates the update process, which makes writing scripts to automate the deployments a must. With those scripts in place, updates are now easier than before: typically, with one command I can deploy an updated tag for a customer. The best benefit of all: when running a buildout, I know a failed buildout isn't going to take down a site.
If you would like a more technical perspective on the exact concerns identified and the changes proposed by the KARL Development Team, you can read more on the KARL Project site.