Not all malloc implementations are created equal
I recently blogged about swapping malloc implementations for the JVM to help boost multi-threaded performance. Solaris ships with yet another malloc implementation, bsdmalloc, which is optimized for single-threaded performance. I just switched our Perl interpreter to bsdmalloc and got 33% better performance from our Perlbal proxy.
You can try out different malloc implementations without recompiling anything by setting the LD_PRELOAD environment variable:
LD_PRELOAD="/usr/lib/libbsdmalloc.so" perl somecode.pl
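If you want to compare several implementations against the same workload, a small wrapper keeps the runs consistent. This is only a sketch: with_malloc is a made-up helper name, somecode.pl stands in for your own script, and the libumem path is an assumption, so check where your system installs it.

```shell
# with_malloc LIB CMD [ARGS...]
# Run CMD with LIB preloaded. If LIB is not readable, warn on stderr and
# run CMD with the default malloc instead, so a typo does not go unnoticed.
with_malloc() {
    lib=$1
    shift
    if [ -r "$lib" ]; then
        LD_PRELOAD="$lib" "$@"
    else
        echo "with_malloc: $lib not found, using default malloc" >&2
        "$@"
    fi
}

# Example comparison, in a shell where `time` is a keyword (bash, ksh):
#   time perl somecode.pl
#   time with_malloc /usr/lib/libbsdmalloc.so perl somecode.pl
#   time with_malloc /usr/lib/libumem.so perl somecode.pl
```

Run each variant a few times before drawing conclusions, since a single timing can easily be skewed by caching or other load on the box.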
Here is a rule of thumb for choosing a malloc implementation for your application:
libumem = For multi-threaded applications. umem avoids heap contention between threads.
bsdmalloc = For single-threaded applications. PHP, Perl, Python and Ruby scripts typically fall into this category.
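The rule of thumb above could be captured in a start-up script along these lines. Treat it as a sketch: malloc_for is a hypothetical helper, and the libumem path is an assumption that may differ on your system.

```shell
# malloc_for MODEL
# Print the preload library suggested by the rule of thumb.
# MODEL is "single" or "multi"; anything else is a usage error.
malloc_for() {
    case $1 in
        single) echo /usr/lib/libbsdmalloc.so ;;
        multi)  echo /usr/lib/libumem.so ;;
        *)      echo "usage: malloc_for single|multi" >&2
                return 2 ;;
    esac
}

# Example:
#   LD_PRELOAD="$(malloc_for single)" perl somecode.pl
```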
Matching the malloc implementation to your resource-intensive application can yield a nice performance benefit.