
High Availability and Horizontal Scaling with Celery

Lighten the Load

When writing a program, long-running processes can block the main program from executing normally. These long-running tasks can instead be put on a Celery job queue, which decouples the work from the main program and runs it asynchronously, preventing the main program from blocking. Workers pick up jobs from the Celery queue, do the work outside of the normal process, then return the result at a later time.

As an example, one of Six Feet Up's clients needs to be notified when there are new blog posts on their third-party blogging platform. The blogging platform sends webhooks to our content management system (CMS) via REST API calls. The CMS then puts the job onto the Celery queue. From there, the workers retrieve the actual blog post contents and insert them into the site, all asynchronously. If there is any kind of failure, the job can simply be put back onto the queue and retried at a later time.
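To make that concrete, here is a minimal sketch of what such a task could look like. This isn't the client's actual code: the broker URL is a placeholder, and fetch_post_from_platform and insert_into_cms are hypothetical helpers standing in for the real retrieval and insertion logic.

from celery import Celery

app = Celery('blog_sync', broker='redis://localhost:6379/0')  # placeholder broker URL

@app.task(bind=True, max_retries=3, default_retry_delay=60)
def import_blog_post(self, post_id):
    """Fetch a post from the blogging platform and insert it into the CMS."""
    try:
        post = fetch_post_from_platform(post_id)  # hypothetical helper
        insert_into_cms(post)                     # hypothetical helper
    except Exception as exc:
        # On failure, put the job back onto the queue for a later retry.
        raise self.retry(exc=exc)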

As a special twist, this client's front-end web servers aren't allowed to talk to the Internet, so the workers run on the backend database servers, which can. This allows the client to receive messages on the front-end and put jobs onto the queue for the backend to complete.

Python-Based

Many different flavors of message queuing systems exist. Celery is written in Python, so it runs well in an environment with a Plone CMS or a Django web application. It's nice to be able to use native tools.

A Pluggable Backend Queuing System

Celery is a pluggable backend queuing system. RabbitMQ, Redis, or MySQL (experimentally, via SQLAlchemy) can be used as the transport and message broker for Celery itself. Depending on the environment, one of these pieces of infrastructure may already be in place for Celery to work with, eliminating the need to install extra moving parts just to get an asynchronous message bus.
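Pointing Celery at a particular broker is mostly a matter of the URL you hand it. The hosts and credentials below are placeholders:

from celery import Celery

# RabbitMQ as the broker...
app = Celery('tasks', broker='amqp://guest:guest@localhost:5672//')

# ...or Redis...
app = Celery('tasks', broker='redis://localhost:6379/0')

# ...or a SQL database via SQLAlchemy (experimental)
app = Celery('tasks', broker='sqla+mysql://user:password@localhost/celerydb')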

Getting Started with Celery

Writing a Celery task is very simple. collective.celery includes helpers and decorators that make it easier to write Plone-based tasks that interact with the ZODB as a Zope client. Celery itself runs outside of Zope, so it isn't aware of Zope; the collective.celery package adds the bits that allow it to talk to the database through a ZEO client, which makes it fully transactional. It works as if it were just a regular Zope instance talking to the backend database.

from celery import task

@task()
def do_some_work(arg1, arg2):
    """Do some long-running work."""
    result = ...  # the actual long-running work goes here
    return result
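Calling the task from the main program is just as simple: .delay() puts the job onto the queue and returns immediately with an AsyncResult handle. Note that fetching the return value later requires a result backend to be configured.

# Enqueue the job; this returns immediately without blocking.
result = do_some_work.delay('arg1', 'arg2')

# Later, optionally block until the worker has finished (requires a result backend).
value = result.get(timeout=30)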

Celery Security

Since Celery relies on a broker to pass messages between clients and workers, it's necessary to secure the broker so that no other processes can write to it. Celery also optionally supports message signing with pyOpenSSL: clients sign their jobs with a private key, and workers verify the signatures with the public certificate. If the broker supports an encrypted transport, Celery can be configured to use it for full end-to-end encryption of the job.
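As a rough sketch, enabling message signing in a Celery 3.x configuration looks something like this; the key and certificate paths are placeholders:

from celery import Celery

app = Celery('tasks', broker='amqp://localhost//')  # placeholder broker URL

app.conf.update(
    CELERY_SECURITY_KEY='/etc/ssl/private/worker.key',        # placeholder paths
    CELERY_SECURITY_CERTIFICATE='/etc/ssl/certs/worker.pem',
    CELERY_SECURITY_CERT_STORE='/etc/ssl/certs/*.pem',
)
app.setup_security()  # switches to the signing 'auth' serializer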

Scaling Up

The big feature of Celery is that it is asynchronous. If tasks are taking too long, more workers can always be added to listen on the queue and do the work in parallel. Celery allows many, many long-running tasks to run at the same time.

For this reason, Celery helps make systems more robust and supports high availability and horizontal scaling. 
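In practice, scaling out is mostly a matter of starting more worker processes against the same broker, on the same machine or on others. For example (the app module, concurrency level, and worker names below are placeholders):

# One worker with 8 concurrent processes...
celery -A proj worker --loglevel=info --concurrency=8 -n worker1@%h

# ...and a second one, on the same box or another, consuming the same queue.
celery -A proj worker --loglevel=info --concurrency=8 -n worker2@%h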

Tips for Newbies

pdb won't work very well with Celery because tasks run asynchronously, so there typically isn't a foreground terminal process to attach to. Celery ships with a contrib package, celery.contrib.rdb, that allows remote Python debugging of a Celery task. It's super handy. To use it, just import it like pdb and set a trace in the code:

from celery import task
from celery.contrib import rdb

@task()
def add(x, y):
    result = x + y
    rdb.set_trace()  # opens a remote debugger session; the port is logged
    return result

When the set_trace() is hit, you will get a log message with a port number to telnet to, where you can run your standard debugger commands.
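By default, rdb binds to the first available port at or above 6899 (the base port can be changed with the CELERY_RDB_PORT environment variable), so a session typically starts with:

telnet localhost 6899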

More Info

For more information, you can go to the Celery website. For using Celery with Plone and Zope, I'd recommend looking at the collective.celery package. And for Django applications, Celery 3.1 now has direct support for Django built in.

