At Six Feet Up, we have used Bacula to back up our servers for years. Bacula is a modern backup system with a great scheduling mechanism that can automatically run both incremental and full backups. It's a bit more sophisticated than other systems, such as Amanda. We especially like how easy it is to perform restores. Additionally, Bacula provides lots of details about when hosts were backed up, how much data was captured, what level of backup was performed, and so on. It is a really nice system.
However, Bacula also has some flaws. One of them is that it does file-level backups, not block-level backups. This means that if only one part of a 10GB file changes, Bacula will back up the whole 10GB file.
Another issue is that Bacula isn't easy to audit. The configuration file has an entry for each host, and each entry is a few lines long, so the configuration ends up being one giant file thousands of lines long. That makes it difficult to confirm that all of the servers are actually being backed up.
Finally, Bacula presents some performance problems. Many of the servers we back up are virtual machines running under a hypervisor on a bigger physical host. Unfortunately, we can't perform full backups of two guests on the same hypervisor at the same time, as that would deplete all of the disk I/O available for any other kind of operation and cause a big performance hit.
To simplify the management of our Bacula-based backups, I wrote a short Python script, roughly 20 lines, using the PyYAML and Jinja2 libraries. The script generates the big Bacula configuration file using Jinja as a templating engine. The Jinja template is just a single host entry with placeholders for information about the host. The Python script fills in the blanks for each host and writes the results into the larger configuration file that Bacula uses.
To address the performance issue, the backup load needed to be distributed appropriately. So the Python script uses the YAML file to make sure that no single hypervisor ever has more than one of its virtual machines being backed up at the same time. Now adding a new host is trivial: we just find the hypervisor in the YAML file, add the VM to its list (there's an example of this below, after the YAML file), rerun the script, and the configuration files are regenerated.
The YAML file also lists the individual hosts that aren't hypervisors. Having everything in one file lets us make decisions about the order of the jobs, so we can ensure that certain full backup jobs run at certain times and don't overlap with other jobs on the same hypervisor.
Here is the script that I created to generate the config files that get included into our Bacula system.
import logging

import yaml
import jinja2
from collections import defaultdict, OrderedDict

# configure some logging
logging.basicConfig(format='%(asctime)s %(message)s')

# bring in a structure of hosts with hypervisors listing their VMs
with open('hosts.yaml') as f:
    hosts = yaml.safe_load(f)

# keep the order of the main hosts alphabetic
hosts = OrderedDict(sorted(hosts.items(), key=lambda t: t[0]))

# There are 28 slots the hosts can fit into
schedules = ["MonthlyCycle" + str(i) for i in range(1, 29)]

hypervisor_types = ['xen', 'bhyve']

# create reverse lookup so we don't put
# 2 hypervisor full backups on the same day
reverse = OrderedDict()
for host, data in hosts.items():
    if data.get('type') in hypervisor_types:
        new_hosts = {x: host for x in data.get('hosts')}
    else:
        new_hosts = {host: None}
    reverse.update(new_hosts)

# track the jobs added to the schedule
jobs = defaultdict(list)

# loop over all hosts and distribute the full backups
for count, host in enumerate(reverse.keys()):
    current_schedule = schedules[count % len(schedules)]
    current_schedule_hypervisors = [reverse[i] for i in jobs[current_schedule]]
    if reverse[host] in current_schedule_hypervisors:
        logging.warning("host going into schedule with another on same day")
    jobs[current_schedule].append(host)

# set up the jinja environment
env = jinja2.Environment(loader=jinja2.FileSystemLoader('.'))
# get the job template
template = env.get_template('job.jinja')

# output the rendered Job/Client entries
for sched in jobs.keys():
    for host in jobs[sched]:
        print(template.render(hostname=host, schedule=sched))
This script expects two external files. The first is the YAML file that contains our host definitions.
hypervisor01.sixfeetup.com:
  type: xen
  hosts:
    - vm01.sixfeetup.com
    - vm02.sixfeetup.com
    - vm03.sixfeetup.com
    - vm04.sixfeetup.com
hypervisor02.sixfeetup.com:
  type: bhyve
  hosts:
    - vm05.sixfeetup.com
    - vm06.sixfeetup.com
    - vm07.sixfeetup.com
    - vm08.sixfeetup.com
    - vm09.sixfeetup.com
server01.sixfeetup.com:
  type: standalone
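As mentioned above, adding a new host is just a one-line edit to this file. For example, putting another guest on hypervisor02 (the tenth VM here is hypothetical) looks like this; rerunning the script then regenerates its Job and Client entries:

hypervisor02.sixfeetup.com:
  type: bhyve
  hosts:
    - vm05.sixfeetup.com
    - vm06.sixfeetup.com
    - vm07.sixfeetup.com
    - vm08.sixfeetup.com
    - vm09.sixfeetup.com
    - vm10.sixfeetup.com  # hypothetical new VM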
The second is our Jinja template, which the script uses to fill in the blanks.
Job {
  Name = {{ hostname }}
  Client = {{ hostname }}
  Schedule = {{ schedule }}
  JobDefs = ClientVM
}
Client {
  Name = {{ hostname }}
  Address = {{ hostname }}
  FDPort = 9102
  Catalog = MyCatalog
  Password = "pgqSDQp8tXZppKxXSqbFR+qzLoEw54zWYRpSQYkfJ07r"
  File Retention = 30 days
  Job Retention = 90 days
  AutoPrune = yes
}
It will now generate output that looks like this:
Job {
  Name = vm04.sixfeetup.com
  Client = vm04.sixfeetup.com
  Schedule = MonthlyCycle1
  JobDefs = ClientVM
}
Client {
  Name = vm04.sixfeetup.com
  Address = vm04.sixfeetup.com
  FDPort = 9102
  Catalog = MyCatalog
  Password = "changeme"
  File Retention = 30 days
  Job Retention = 90 days
  AutoPrune = yes
}
Job {
  Name = vm09.sixfeetup.com
  Client = vm09.sixfeetup.com
  Schedule = MonthlyCycle8
  JobDefs = ClientVM
}
Client {
  Name = vm09.sixfeetup.com
  Address = vm09.sixfeetup.com
  FDPort = 9102
  Catalog = MyCatalog
  Password = "changeme"
  File Retention = 30 days
  Job Retention = 90 days
  AutoPrune = yes
}
...
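Since the script writes everything to standard output, one way to wire it into Bacula, as a rough sketch (the script name and paths below are placeholders, not necessarily our actual layout), is to redirect the output into a file and pull that file into the Director configuration with Bacula's @ include directive:

# regenerate the client/job entries (hypothetical script and path names)
python generate_jobs.py > /etc/bacula/conf.d/clients.conf

And in bacula-dir.conf:

# pull the generated entries into the Director configuration
@/etc/bacula/conf.d/clients.conf

After regenerating the file, reloading the Director picks up the changes.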
Using this technique of generating your configuration files with Python can really take some of the headaches out of managing complex configurations. YAML and Jinja are very easy to use and keep your configuration in nice, easy-to-read files. If you have any questions, please don't hesitate to contact me.