At Six Feet Up, we have used Bacula to back up our servers for years. Bacula is a modern backup system with a great scheduling mechanism that can automatically run both incremental and full backups. It's a bit more sophisticated than other systems, such as Amanda. We especially like how easy it is to perform restores. Additionally, Bacula provides lots of details about when hosts were backed up, how much data was captured, what level of backup was performed, and so on. It is a really nice system.
However, Bacula also has some flaws. One of them is that it does file-level backups, not block-level backups. This means that if only one part of a 10GB file changes, Bacula will back up the whole 10GB file.
Another issue is that Bacula isn't easy to audit. The configuration file has an entry for each host, and each entry is a few lines long, so the configuration ends up being one giant file thousands of lines long. That makes it difficult to confirm that all of the servers are actually being backed up.
Finally, Bacula presents some performance problems. Many of the servers we back up are virtual machines running under a hypervisor on a bigger physical host. Unfortunately, we can't perform full backups of two guests on the same hypervisor at the same time, as that would deplete all of the disk I/O available for any other kind of operation and cause a big performance hit.
To simplify the management of our Bacula-based backups, I wrote a short Python script, roughly 20 lines, using the PyYAML and Jinja2 libraries. The script generates the big Bacula configuration file using Jinja as a templating engine. The Jinja template is just a single host entry with placeholders for information about the host. The Python script fills in the blanks for each host and writes the results into the larger configuration file that Bacula uses.
To address the performance issue, the backup load needed to be distributed appropriately. So the Python script uses the YAML file to make sure that no single hypervisor ever has more than one of its virtual machines being backed up at the same time. Now adding a new host is trivial: we just find the hypervisor in the YAML file, add the VM to its list (there's an example of this below, after the YAML file), rerun the script, and the configuration files are regenerated.
The YAML file also lists the individual hosts that aren't hypervisors. Having everything in one file lets us make decisions about the order of the jobs, so we can ensure that certain full backup jobs run at certain times and don't overlap with other jobs on the same hypervisor.
Here is the script that I created to generate the config files that get included into our Bacula system.
import logging

import yaml
import jinja2
from collections import defaultdict, OrderedDict

# configure some logging
logging.basicConfig(format='%(asctime)s %(message)s')

# bring in a structure of hosts with hypervisors listing their VMs
with open('hosts.yaml') as f:
    hosts = yaml.safe_load(f)

# keep the order of the main hosts alphabetic
hosts = OrderedDict(sorted(hosts.items(), key=lambda t: t[0]))

# There are 28 slots the hosts can fit into
schedules = ["MonthlyCycle" + str(i) for i in range(1, 29)]

hypervisor_types = ['xen', 'bhyve']

# create reverse lookup so we don't put
# 2 hypervisor full backups on the same day
reverse = OrderedDict()
for host, data in hosts.items():
    if data.get('type') in hypervisor_types:
        new_hosts = {x: host for x in data.get('hosts')}
    else:
        new_hosts = {host: None}
    reverse.update(new_hosts)

# track the jobs added to the schedule
jobs = defaultdict(list)

# loop over all hosts and distribute the full backups
for count, host in enumerate(reverse.keys()):
    current_schedule = schedules[count % len(schedules)]
    current_schedule_hypervisors = [reverse[i] for i in jobs[current_schedule]]
    if reverse[host] in current_schedule_hypervisors:
        logging.warning("host going into schedule with another on same day")
    jobs[current_schedule].append(host)

# set up the jinja environment
env = jinja2.Environment(loader=jinja2.FileSystemLoader('.'))
# get the job template
template = env.get_template('job.jinja')

# output the rendered Job/Client entries
for sched in jobs.keys():
    for host in jobs[sched]:
        print(template.render(hostname=host, schedule=sched))
This script expects two external files. The first is the YAML file that contains our host definitions.
hypervisor01.sixfeetup.com:
  type: xen
  hosts:
    - vm01.sixfeetup.com
    - vm02.sixfeetup.com
    - vm03.sixfeetup.com
    - vm04.sixfeetup.com
hypervisor02.sixfeetup.com:
  type: bhyve
  hosts:
    - vm05.sixfeetup.com
    - vm06.sixfeetup.com
    - vm07.sixfeetup.com
    - vm08.sixfeetup.com
    - vm09.sixfeetup.com
server01.sixfeetup.com:
  type: standalone
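As mentioned above, adding a new host is just a one-line edit to this file. For example, putting another guest on hypervisor02 (the tenth VM here is hypothetical) looks like this; rerunning the script then regenerates its Job and Client entries:

hypervisor02.sixfeetup.com:
  type: bhyve
  hosts:
    - vm05.sixfeetup.com
    - vm06.sixfeetup.com
    - vm07.sixfeetup.com
    - vm08.sixfeetup.com
    - vm09.sixfeetup.com
    - vm10.sixfeetup.com  # hypothetical new VM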
The second is our Jinja template, which the script uses to fill in the blanks.
Job {
  Name = {{ hostname }}
  Client = {{ hostname }}
  Schedule = {{ schedule }}
  JobDefs = ClientVM
}
Client {
  Name = {{ hostname }}
  Address = {{ hostname }}
  FDPort = 9102
  Catalog = MyCatalog
  Password = "pgqSDQp8tXZppKxXSqbFR+qzLoEw54zWYRpSQYkfJ07r"
  File Retention = 30 days
  Job Retention = 90 days
  AutoPrune = yes
}
It will now generate output that looks like this:
Job {
  Name = vm04.sixfeetup.com
  Client = vm04.sixfeetup.com
  Schedule = MonthlyCycle1
  JobDefs = ClientVM
}
Client {
  Name = vm04.sixfeetup.com
  Address = vm04.sixfeetup.com
  FDPort = 9102
  Catalog = MyCatalog
  Password = "changeme"
  File Retention = 30 days
  Job Retention = 90 days
  AutoPrune = yes
}
Job {
  Name = vm09.sixfeetup.com
  Client = vm09.sixfeetup.com
  Schedule = MonthlyCycle8
  JobDefs = ClientVM
}
Client {
  Name = vm09.sixfeetup.com
  Address = vm09.sixfeetup.com
  FDPort = 9102
  Catalog = MyCatalog
  Password = "changeme"
  File Retention = 30 days
  Job Retention = 90 days
  AutoPrune = yes
}
...
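Since the script writes everything to standard output, one way to wire it into Bacula, as a rough sketch (the script name and paths below are placeholders, not necessarily our actual layout), is to redirect the output into a file and pull that file into the Director configuration with Bacula's @ include directive:

# regenerate the client/job entries (hypothetical script and path names)
python generate_jobs.py > /etc/bacula/conf.d/clients.conf

And in bacula-dir.conf:

# pull the generated entries into the Director configuration
@/etc/bacula/conf.d/clients.conf

After regenerating the file, reloading the Director picks up the changes.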
Using this technique of generating your configuration files with Python can really take some of the headaches out of managing complex configurations. YAML and Jinja are very easy to use and keep your configuration in nice, easy-to-read files. If you have any questions, please don't hesitate to contact me.