Speedy PostgreSQL Parallel Compression Dumps
I used to backup our database using the following statement;
Once our dataset grew into the gigabytes, it took a very long time to do database dumps. Today, I stumbled upon yet another awesome blog post done by Ted Dzibua mentioning two useful parallel compression utilities. So why not try parallel compression with PostgreSQL dumps?
pbzip2 – Parallel BZIP2: Parallel implementation of BZIP2. BZIP2 is well known for being balls slow, so speed it up using multiple CPUs.
pigz – Parallel GZIP: Parallel implementation of GZIP written by Mark Adler.
Time to try this out with our PostgreSQL dump, here are the result times.
• This was done on a quad core xeon 2.66ghz machine.
real 2m7.332s
user 1m16.414s
sys 0m8.233s
# time pg_dump -U secret -h fab2 somedb | pbzip2 -c > somedb.bz2
real 4m14.253s
user 10m35.879s
sys 0m10.904s
The original database was 1.6gigs. The compressed files came out to….
147M somedb.bz2
194M somedb.gz
And just to make this post complete, to pipe the SQL dump back into PostgreSQL
# createdb somedb
# gzip -d -c somedb.gz | psql somedb



Since you are using opensolaris, why bother with sql dumps ? Leveraging ZFS send/recv with incremental snapshot is imo a much better option. We backup our psql data every minute that way and it works beautifully. And if you really need a SQL dump, then you can at a given time have a dummmy psql postmater on another server which run against a snapshot of your bakcup and dump the in sql your data somewhere.
I need to do SQL dumps for my workstation that runs on OSX. I can’t do work against the live database, people actively use the service throughout the day. Even If I could do work against the live database, it would be a horrible idea and a disaster just waiting to happen.