I recently had a WD Raptor drive die in a server that hosted our PostgreSQL database. I had a ZFS snapshot strategy setup that sent over ZFS snapshots of the live database to a ZFS mirror for backup purposes. Looked good in theory right? Except, I forgot to do one critical thing, test my backups. Long story short, I had a bunch of snapshots that were useless. Luckily I had offsite nightly PostgreSQL dumps that I did test which were used to seed my development database. So in the end I avoided catastrophic data failure.

With that lesson in mind, I reconfigured our backup system to do it correctly after re-reading the PostgreSQL documentation.

Prerequisite: You must have WAL archiving on and have the archive directory under your database directory. For example if your database is under /rpool/pgdata/db1 configure your archive directory under /rpool/pgdata/db1/archives

Completely optional but I highly suggest you automate your backups; My zbackup ruby script is pretty simple to setup.

This is how my /rpool/pgdata/db1 Looks like:

victori@opensolaris:/# ls /rpool/data/db1
archives      pg_clog	     pg_multixact  pg_twophase	    postmaster.log
backup_label  pg_hba.conf    pg_stat_tmp   PG_VERSION	    postmaster.opts
base	      pg_ident.conf  pg_subtrans   pg_xlog	    postmaster.pid
global	      pg_log	     pg_tblspc	   postgresql.conf

Source for my pgsnap.sh script.

#!/bin/sh

PGPASSWORD="mypass" psql -h fab2 postgres -U myuser -c "select pg_start_backup('nightly',true);"

/usr/bin/ruby /opt/zbackup.rb rpool/pgdata 7

PGPASSWORD="mypass" psql -h fab2 postgres -U myuser -c "select pg_stop_backup();"

rm /rpool/pgdata/db1/archives/*

The process is quite simple. You issue a command to initiate the backup process so PostgreSQL goes into “backup mode.” Second, you do the ZFS snapshot, in this case I am using my zbackup ruby script. Third, you issue another SQL command to PostgrSQL to get out of backup mode. Lastly, since you have the database snapshot you can safely delete your previous WAL archives.

Now, this is all nice and dandy but you should *TEST* your backups, before assuming your backup strategy actually worked.


zfs clone rpool/pgdata@2010-6-17 rpool/pgtest
postgres --singleuser mydb -D /rpool/pgtest/db1

Basically you clone the snapshot and test it by running it under PostgreSQL in single user mode. Once in singleuser mode, test out your backup to make sure it is readable, you can issue a SQL queries to confirm that all is fine with the backup.

ZFS you rock my world 😉