Speedy PostgreSQL Parallel Compression Dumps
I used to back up our database using the following statement:
pg_dump -h fab2 -Fc -Z9 somedb > somedb.db
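That -Fc flag is pg_dump's custom format, so a dump taken that way gets restored with pg_restore rather than psql, something along the lines of:
# pg_restore -d somedb somedb.db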
Once our dataset grew into the gigabytes, database dumps started taking a very long time. Today, I stumbled upon yet another awesome blog post by Ted Dzibua mentioning two useful parallel compression utilities. So why not try parallel compression with PostgreSQL dumps?
pbzip2 – Parallel BZIP2: a parallel implementation of BZIP2. BZIP2 is well known for being balls slow, so it gets a big boost from multiple CPUs.
pigz – Parallel GZIP: a parallel implementation of GZIP, written by Mark Adler.
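Both of these will happily grab every core on the box by default. If you want to leave some CPU free for other work, each takes a flag to cap the number of threads; the counts and the somedb.sql filename here are just examples:
# pigz -p 2 somedb.sql
# pbzip2 -p2 somedb.sql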
Time to try this out with our PostgreSQL dump. Here are the resulting times.
• This was done on a quad-core Xeon 2.66GHz machine.
# time pg_dump -U secret -h fab2 somedb | pigz -c > somedb.gz
real 2m7.332s
user 1m16.414s
sys 0m8.233s
# time pg_dump -U secret -h fab2 somedb | pbzip2 -c > somedb.bz2
real 4m14.253s
user 10m35.879s
sys 0m10.904s
The original database was 1.6 gigs. The compressed files came out to…
# du -hs somedb.*
147M somedb.bz2
194M somedb.gz
And just to make this post complete, here's how to pipe the SQL dump back into PostgreSQL:
# dropdb somedb
# createdb somedb
# gzip -d -c somedb.gz | psql somedb
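The bzip2 dump restores the same way, and pbzip2 can decompress its own output in parallel, so something like this should do it:
# pbzip2 -d -c somedb.bz2 | psql somedb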
Since you are using OpenSolaris, why bother with SQL dumps? Leveraging ZFS send/recv with incremental snapshots is imo a much better option. We back up our psql data every minute that way and it works beautifully. And if you really need a SQL dump, you can at any given time spin up a dummy psql postmaster on another server that runs against a snapshot of your backup and dumps your data as SQL somewhere.
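For anyone curious, the incremental flow being described looks roughly like this; the tank/pgdata dataset, the snapshot names, and backuphost are made up for illustration:
# zfs snapshot tank/pgdata@today
# zfs send -i tank/pgdata@yesterday tank/pgdata@today | ssh backuphost zfs recv backup/pgdata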
I need to do SQL dumps for my workstation, which runs OS X. I can't do work against the live database; people actively use the service throughout the day. Even if I could work against the live database, it would be a horrible idea and a disaster just waiting to happen.