Transmogrifier: A Pipeline for Easing Migration into Plone

inertia (ɪnˈɜːʃə, -ʃɪə)

The tendency of a body to preserve its state of rest or uniform motion unless acted upon by an external force

Inertia is the tendency to resist change. In the end, it may be the single force that keeps people using the wrong CMS for longer than they should.

When it comes to comparing Content Management Systems on a technical basis, it's fairly straightforward to figure out the features you need, how you want to deploy, what underlying technology you want to be on, and which CMS is the best fit for your business. The question of whether to move usually turns not on the seemingly larger issue of technology, but on the "soft" issue of existing content and how it can be migrated between systems. In so many migrations, moving information from one system to another eats up staff time and IT resources and frustrates end users, all just to get the new system "back to where we were in the old one."

In these instances, content creates inertia behind the existing system, making it more difficult to change directions based on the amount of content you have.

Large sites, loaded with content and standing to profit most from a top-notch CMS, often limp along for years because the task of moving seems so daunting. This is particularly true of homegrown CMS implementations. Others never make the move until a catastrophic system crash, the CMS vendor going out of business or some other disaster forces it. Nothing is quite as painful as a rushed migration, which yields sub-par results and devours time as users work frantically to fix links and reorganize content in what is often a manual and terribly tedious process.

Migrations - the "before" case

When migrations were made in the past, they were mostly technological hacks - one-off scripts, random modules of code from all over the place, strung together to handle a specific migration from point A to point B. And, once you were done with the migration, you never touched that code again. These migrations were expensive to write, hard to replicate consistently, and prone to failure when even the smallest parameter changed. The next time a similar migration came up, you might dig into that code as a resource, pulling out bits that were useful - but for the most part there was very limited or no reusability.

Migrations - the "after" case

I recently gave a presentation at the Plone Symposium East at Penn State on the subject of "Migrating from Drupal to Plone" that showed off Transmogrifier, a Plone tool which offers a different and better way to look at migrations. Jarn originally developed Transmogrifier back in 2008 for a specific migration, then donated it back to the community - something they deserve a lot of appreciation for. The framework has seen a lot of growth in the last year or so as many people doing migrations began to see the advantages of an approach which could be re-used.

Transmogrifier is a framework for doing migrations, built on a set of reusable parts which you can add to and share with the community. That way, the next time someone wants to do a migration which someone else has already worked out, it's simply a matter of downloading the module for that migration (called a Blueprint) and adding it to your pipeline, and you're good to go.

Understanding the terms

A Pipeline is a series of steps written in Python and used to move content from one place to another very much like a physical pipeline does. A Transmogrifier Pipeline is made up of three things - in addition to Blueprints there are also sources and sections.

Sources are Blueprints that know how to bring data into the pipeline - extracting it from a database, pulling it from the filesystem or otherwise getting it into a form where it can be worked with.

Sections are analogous to sections of pipeline, each being a step that is gone through in translating, rearranging and transferring content through the process. Plone users will see Sections as similar to parts in a buildout which can be moved and arranged to provide the desired results.
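One way to picture how sections fit together is as a chain of Python generators, each consuming the items handed on by the one before it. The sketch below is plain Python rather than Transmogrifier code, and the function names and sample items are made up for illustration.

```python
def source():
    """Stand-in for a source: yields raw items into the pipeline."""
    yield {"_path": "/folder1", "_type": "Folder", "title": "first folder"}
    yield {"_path": "/folder1/foo", "_type": "Document", "title": "one foo"}


def titlecase(previous):
    """Stand-in for a section: changes each item, then passes it on."""
    for item in previous:
        item["title"] = item["title"].title()
        yield item


def only_documents(previous):
    """Stand-in for a filtering section: drops items it does not want."""
    for item in previous:
        if item["_type"] == "Document":
            yield item


# Sections are stacked in order, each wrapping the one before it.
pipeline = only_documents(titlecase(source()))
results = list(pipeline)
print(results)
```

Because each step only sees the items coming from the previous one, steps can be reordered or swapped out without touching the others.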

So what does a pipeline look like?

[transmogrifier]
pipeline =
    csv_file
    constructor
    schemaupdater

[csv_file]
blueprint = collective.transmogrifier.sections.csvsource
filename = my.migration.import:my_items.csv

[constructor]
blueprint = collective.transmogrifier.sections.constructor

[schemaupdater]
blueprint = plone.app.transmogrifier.atschemaupdater
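To make the structure of that file concrete, here is a small sketch that reads the same configuration with Python's standard configparser module and lists the sections in pipeline order. Transmogrifier has its own configuration loading, so this is only an illustration of the layout, and the helper name section_order is my own.

```python
import configparser

PIPELINE_CFG = """
[transmogrifier]
pipeline =
    csv_file
    constructor
    schemaupdater

[csv_file]
blueprint = collective.transmogrifier.sections.csvsource
filename = my.migration.import:my_items.csv

[constructor]
blueprint = collective.transmogrifier.sections.constructor

[schemaupdater]
blueprint = plone.app.transmogrifier.atschemaupdater
"""


def section_order(cfg_text):
    """Return (section name, blueprint name) pairs in pipeline order."""
    parser = configparser.ConfigParser()
    parser.read_string(cfg_text)
    # The pipeline option is a whitespace-separated list of section names.
    names = parser["transmogrifier"]["pipeline"].split()
    return [(name, parser[name]["blueprint"]) for name in names]


for name, blueprint in section_order(PIPELINE_CFG):
    print(name, "->", blueprint)
```

The [transmogrifier] part only names the steps and their order; each following part says which blueprint implements that step and passes it any options it needs.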

Once you have created your pipeline, it is time to run things through it. In Transmogrifier, what you run through your pipeline is Items. Items are the content elements being moved through a pipeline, and each is a mapping (in Python terms, a dictionary) describing one piece of content. Most keys in an item are fields, while keys with a leading underscore are controllers; basically attributes which can be applied to the piece of content you are creating.

As an example, let's say you start with items which have been extracted from Drupal in CSV format using a source. A set of items might look like this:

_path,_type,title,description
/folder1,Folder,First Folder,This is folder One
/folder2,Folder,Second Folder,This is folder Two
/folder1/foo,Document,One Foo,A document named foo
/folder2/foo,Document,Two Foo,Another doc named foo
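As a rough illustration of what those rows look like as items, the sketch below reads the same data with Python's standard csv module and splits each item into controller keys (leading underscore) and ordinary fields. It does not use the Transmogrifier CSV source itself, and the variable names are mine.

```python
import csv
import io

CSV_TEXT = """\
_path,_type,title,description
/folder1,Folder,First Folder,This is folder One
/folder2,Folder,Second Folder,This is folder Two
/folder1/foo,Document,One Foo,A document named foo
/folder2/foo,Document,Two Foo,Another doc named foo
"""

# Each row becomes one item: a mapping keyed by the header names.
items = list(csv.DictReader(io.StringIO(CSV_TEXT)))

for item in items:
    controllers = {k: v for k, v in item.items() if k.startswith("_")}
    fields = {k: v for k, v in item.items() if not k.startswith("_")}
    print(controllers, fields)
```

Here _path and _type tell later steps where to create the object and what kind of object it is, while title and description are the values stored on it.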

After being run through a pipeline, these items become Plone objects: two Folders at /folder1 and /folder2, each containing a Document named foo.

The real value in the process is that these blueprints and sources can be built once, and then shared with the community for reuse - so the only parts you need to build are the ones which are unique to your situation, rather than having to reinvent what other people have already figured out.

Migration Strategies

With Transmogrifier in your bag o' tricks, building a migration is relatively straightforward.

Investigate the source program - that is, find out if there are easy ways to get the system you are starting with to dump out its content into a standard format like CSV or JSON. If there are, that might save you having to create a source which extracts it or pulls it off the filesystem. If a system's background is in SQL, I quite often use SQLAlchemy - though I will always look to see if a) the system offers exports and b) the exports contain all the information I need.

Prepare the destination - in the Plone site you are moving content to. This can mean creating or setting up custom content types which are analogous to the types of content coming from the source system.

Find Transmogrifier Blueprints - go to PyPI and search for "transmogrify". You should find some helper blueprints to get you started. This will give you a good idea of what work is already done, and what you will need to do. Our goal for the long term should be filling up PyPI with a variety of Blueprints which address as many contingencies as possible so that the people doing migrations have to do less and less of the coding and heavy lifting.

If there is one thing I would like to point out here, it's that donating good blueprints back to the community is potentially a huge leg up to others who are in situations similar to yours. The more of these Blueprints we have, the more different systems out there can be cleanly and easily migrated into Plone. The documentation on Transmogrifier is excellent, and the system does such a good job that writing a migration from scratch instead of just coding the special piece you need seems a colossal waste of time.

Finally, for what is left, write your own blueprint for the parts of the migration unique to your circumstance.
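To give a feel for that last step, here is a minimal sketch of what a custom section might look like. It is modeled on the constructor arguments described in the Transmogrifier documentation (transmogrifier, name, options, previous), but it is a standalone class: a real blueprint also has to declare the framework's interfaces and be registered so the pipeline can find it by name, which is left out here. The class name and the prefix option are hypothetical.

```python
class PathPrefixSection(object):
    """Hypothetical section that moves every item under a target folder."""

    def __init__(self, transmogrifier, name, options, previous):
        self.previous = previous
        # 'prefix' is an invented option, read from this section's config.
        self.prefix = options.get("prefix", "").rstrip("/")

    def __iter__(self):
        for item in self.previous:
            path = item.get("_path")
            if path is not None:
                item["_path"] = self.prefix + "/" + path.lstrip("/")
            # Items without a path are passed along unchanged.
            yield item


incoming = iter([
    {"_path": "/folder1", "_type": "Folder"},
    {"title": "no path, passed through untouched"},
])
section = PathPrefixSection(None, "prefixer", {"prefix": "/imported"}, incoming)
moved = list(section)
print(moved)
```

The important habit is to always pass items along, even the ones this step does not care about, so that later sections still get to see them.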

Making your migration happen

The preferred way to use Transmogrifier is to make the migration part of your release via GenericSetup. This gives you the chance to package your migration as a part of a profile and run it automatically as part of your release using collective.recipe.plonesite. I will go over this in more detail in a follow-up post.

An example migration - Drupal to Plone

Going back to my original purpose for using Transmogrifier, migrating from Drupal to Plone, here is how that process went:

First, I used the transmogrify.sqlalchemy source to pull the content of the Drupal site out so I could get access to it. From there, I used blueprints to move the content into Plone 4.0.5, and also to organize certain pieces of the content so they could be imported as objects into two Plone custom content types - collective.blog.star for migrating blog entries from the standard Drupal blog type and plone.app.discussion for the regular Drupal discussion items.

I created a package which held the elements that made up the whole migration - which is useful from an organizational standpoint as well as when you need to go back and do something again. I also registered configs because the query for each type was slightly different, and this allowed me to do exactly what I wanted for each of them.
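The idea of one query per content type can be sketched with Python's built-in sqlite3 module standing in for the real database connection. This is not the transmogrify.sqlalchemy API, and the table, columns and target type name below are a simplified, hypothetical stand-in rather than the actual Drupal schema.

```python
import sqlite3


def sql_source(connection, query, portal_type):
    """Yield one item per row, keyed by the column names in the query."""
    cursor = connection.execute(query)
    columns = [col[0] for col in cursor.description]
    for row in cursor:
        item = dict(zip(columns, row))
        item["_type"] = portal_type
        yield item


# Hypothetical, simplified table; a real Drupal database looks different.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE node (nid INTEGER, type TEXT, title TEXT)")
db.executemany(
    "INSERT INTO node VALUES (?, ?, ?)",
    [(1, "blog", "Hello"), (2, "page", "About"), (3, "blog", "Again")],
)

# One query per source type, each mapped to its own target type.
blog_query = (
    "SELECT '/blog/' || nid AS _path, title "
    "FROM node WHERE type = 'blog' ORDER BY nid"
)
blog_items = list(sql_source(db, blog_query, "BlogEntry"))
print(blog_items)
```

Naming the query columns after the keys you want on the item (here _path and title) keeps the later steps of the pipeline free of any knowledge of the source database.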

Probably the most exciting thing here was that I was able to use blueprints which were already in existence for the vast majority of the migration. In fact, I demonstrated at PSE a migration from Drupal to Plone which required no extra Python code in addition to that which already existed in blueprints. The only coding involved was writing TALES expressions - a far cry from what is normally a marathon of hacking, pasting and testing code in order to get something that works for only one specific case.

There are a tremendous number of options available to you that I'm not even touching on here - so instead of trying to make this a comprehensive guide I suggest you watch my presentation from PSE2011 AND visit the Transmogrifier documentation on PyPI to find out what you need. It is a very well documented project, and those docs will get you a long way into creating and running your own pipelines.

The Fine Print

There are limitations to Transmogrifier - or at least places where you need to take care. If someone has customized their Drupal site, for example, you need to watch out for that and take those changes into account in your pipelines.

There are also plenty of cases where no one has so far written a blueprint to handle the migration. Searching PyPI will bring up plenty of examples that are already out there, but there is still an opportunity for people interested in creating a blueprint and donating it to the community.

Pipelines are not difficult to write, but do require some special knowledge and an understanding of how Transmogrifier works. For example, if you are migrating from something with an SQL backend, someone with a PHP background should be pretty comfortable. But if all you know is Plone and Python, you will face a learning curve.

Since Transmogrifier has a dependency on CMFCore, it is not as straightforward as it could be for migrating to sites that are not CMF based. So it is currently very much a tool for getting content into Plone, not a general-purpose migration tool allowing you to move content from A to B to C to D.

So how does this address inertia?

Transmogrifier does for migration what the assembly line did for mass-produced products. It takes advantage of standardization and the ability to reuse standard parts in many different configurations to move content from system to system in a light, fast, predictable way. In the end, moving content out of a CMS that isn't meeting your needs becomes viable at the point where the pain of making the move is less than the pain of continuing with the status quo. Efficient migrations get rid of inertia as a blocker to making choices an organization would otherwise make without delay.

Of course, the flip side of that coin is the question of vendor lock-in: once content is moved into Plone, what protects users when they want to get their content out of Plone and into somewhere else? There are a couple of good examples of how this issue can be addressed. First, quintagroup.transmogrifier includes a facility for exporting site content with compatibility back to Plone 2.1, and another product - Content Mirror (which is no longer being actively maintained) - allows content to be moved from Plone into a relational database. Each of these options stops Plone from being a digital black hole.

I think Transmogrifier is an exciting tool at the disposal of Plone developers which can become a big advantage as more and more Blueprints are developed. As we at Six Feet Up found out in our recent migration of a site from Liferay into Plone, this approach has some real benefits both in terms of quality AND speed of doing migrations. We plan on donating Blueprints to the community as we develop them, and hope that other developers and Plone companies will do the same.

Last Question - Why is it called Transmogrifier?

The Transmogrifier was an invention of Calvin (of Calvin and Hobbes fame) which could turn one thing into another. In the comic, it was an inverted cardboard box with a dial on the side which could be set to whatever the user wanted to become. In this case, the dial would be turned to "Plone."