I used to back up our database using the following command:


pg_dump -h fab2 -Fc -Z9 somedb > somedb.db
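
For the record, -Fc writes PostgreSQL's custom archive format, so that file is restored with pg_restore rather than psql. Something like this, assuming the target database already exists:

pg_restore -h fab2 -d somedb somedb.db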

Once our dataset grew into the gigabytes, database dumps started taking a very long time. Today I stumbled upon yet another awesome blog post by Ted Dziuba mentioning two useful parallel compression utilities. So why not try parallel compression with our PostgreSQL dumps?

pbzip2 – Parallel BZIP2: a parallel implementation of BZIP2. BZIP2 is well known for being balls slow, so pbzip2 speeds it up by using multiple CPUs.

pigz – Parallel GZIP: a parallel implementation of GZIP, written by Mark Adler. A quick usage sketch for both tools follows below.
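
Both tools read from stdin and write to stdout, so they slot into a pipeline exactly like gzip or bzip2. A minimal sketch, with a made-up input file and the thread counts picked by hand (both tools default to using every core they can find):

# cat somefile | pigz -p 4 -c > somefile.gz
# cat somefile | pbzip2 -p4 -c > somefile.bz2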

Time to try this out with our PostgreSQL dump. Here are the resulting times.

• This was done on a quad-core 2.66 GHz Xeon machine.


# time pg_dump -U secret -h fab2 somedb | pigz -c > somedb.gz

real 2m7.332s
user 1m16.414s
sys 0m8.233s

# time pg_dump -U secret -h fab2 somedb | pbzip2 -c > somedb.bz2

real 4m14.253s
user 10m35.879s
sys 0m10.904s
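
pigz roughly halved the wall-clock time compared to pbzip2 here. If you would rather not let the compressor grab every core on the box, both tools take a thread-count flag, so the gzip run could be capped at, say, two CPUs (the count here is arbitrary):

# pg_dump -U secret -h fab2 somedb | pigz -p 2 -c > somedb.gz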

The original database was 1.6 GB. The compressed files came out to:


# du -hs somedb.*
147M somedb.bz2
194M somedb.gz
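
Before dropping anything, it doesn't hurt to verify that the archives are actually readable. The stock tools can test integrity without writing any output, for example:

# gzip -t somedb.gz
# bzip2 -t somedb.bz2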

And just to make this post complete, here is how to pipe the SQL dump back into PostgreSQL:


# dropdb somedb
# createdb somedb
# gzip -d -c somedb.gz | psql somedb
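
The bzip2 dump restores the same way, and pbzip2 can handle the decompression side in parallel too. A sketch, assuming the same drop/create steps as above:

# pbzip2 -d -c somedb.bz2 | psql somedb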