Replication with ZFS and PostgreSQL

When clients have high availability requirements for their Plone site, we recommend using the RelStorage implementation for the ZODB, combined with the PostgreSQL database platform. When we deploy this solution, we use ZFS on FreeBSD. Until recenty, this solution relied on rsync to synchronize the PostgreSQL data directory from the primary server to the secondary server. The problem with this approach is that rsync has to scan the entire data directory to find changed files. We decided to take advantage of the ZFS snapshot replication features to make this step more efficient.

Since ZFS snapshots have the deltas we need regarding changes to the PostgreSQL data directory, we wrote a script that will:

Stop PostgreSQL on the local secondary server
Notify the primary PostgreSQL server that a backup is commencing
Take a new ZFS snapshot
Initiate an incremental ZFS replication stream
Munge the recovery.conf / postgresql.conf files
Notify the primary PostgreSQL server that the backup is finished
Re-start PostgreSQL on the local secondary server

Prerequesites

There are moving parts to this solution, so you'll need to do some leg work before being able to use the script.

Operating System User

In order to take a ZFS snapshot on the remote primary server and initiate replication back to the secondary, you need to have an operating system user setup with a password-less SSH key (on the primary database server):

$ sudo pw groupadd -n zfssync -g 6000
$ sudo pw useradd -n zfssync -u 6000 -g 6000 -m
$ sudo -H -u zfssync ssh-keygen
Generating public/private rsa key pair.
Enter file in which to save the key (/home/zfssync/.ssh/id_rsa):
Enter passphrase (empty for no passphrase):
Enter same passphrase again:
Your identification has been saved in /home/zfssync/.ssh/id_rsa.
Your public key has been saved in /home/zfssync/.ssh/id_rsa.pub.
The key fingerprint is:
48:da:0d:68:2e:66:96:ee:d8:ba:fc:6d:a3:b6:dd:8d davidb@dhmo.local
The key's randomart image is:
+--[ RSA 2048]----+
|                 |
|     .           |
|    o o          |
|   + + +         |
| `* o o S        |
| = .             |
|  .              |
|.+ .oo. o        |
|++=++o.E .       |
+-----------------+
# Copy /home/zfssync/.ssh to your secondary database server
$ sudo zfs allow -u zfssync create,mount,snapshot,send,receive,hold data/pgsql

PostgreSQL User

The calls to pg_start_backup and pg_stop_backup require elevated privileges in PostgreSQL. Here's how to set up a PostgreSQL database user with the replication role:

$ psql -U pgsql postgres
psql (9.3.2)
Type "help" for help.

postgres=# create user replicator with replication;

Then, in pg_hba.conf:

host    all             replicator   10.12.2.0/24  trust

You can use md5 for the METHOD, but you must then setup the .pgpass file for your operating system user.

Initial Snapshot Replication

In order to take advantage of efficient incremental replication, the secondary database server must first transfer an initial snapshot of the ZFS filesystem holding the current PostgreSQL data directory (this builds on the first requisite above, to be run on the secondary server):

$ sudo ssh -i /home/zfssync/.ssh/id_rsa zfssync@<IP of primary> zfs snapshot data/pgsql@init-secondary
$ sudo ssh -i /home/zfssync/.ssh/id_rsa zfssync@<IP of primary> zfs send -Rv data@init-secondary | sudo zfs recv -Fv data/pgsql

Coup de grâce

The reset_secondary.sh script (and ancillary config file info) is on Github: https://gist.github.com/davidblewett/8282108 . It's usage is pretty simple (on secondary):

$ sudo /path/to/reset_secondary.sh <IP of primary>

After the script has run, the secondary server will be running off of an up-to-date ZFS snapshot. If you use the config file info in the gist, it will then continue to use PostgreSQL's built-in streaming replication to keep itself up to date.

It is possible to build on this setup and add automatic failover and promotion of the secondary to primary by using FreeBSD's CARP and ifstated mechanisms.

Blog