Browsing the 2009 August archive
I am in the process of evaluating which option to choose for a new production deployment of a Sinatra application.
Pros and Cons of the implementations:
JRuby Stack:
Pros:
• Multi-threaded, easy to scale with spiked traffic / shared resources.
Cons:
• Single process is a single point of failure.
MRI Ruby Stack:
Pros:
• Scaled via processes, no single point of failure.
Cons:
• Each process handles one request at a time, no shared resources between processes (possibly using more memory over time).
These tests are run against a real-world application that is soon to be released, not some dummy “hello world” application.
Application Background:
Sinatra / HAML templates (not compiled, rendered per request) / CouchDB / R18N Translation
Server Specifications:
Hardware: 8 GB RAM / quad-core Xeon X5355
MRI Stack:
Ruby 1.8.7 (2008-08-11 patchlevel 72)
Nginx Passenger 2.2.4
Passenger Config: passenger_max_pool_size 8, passenger_use_global_queue on
Java Stack:
JRuby 1.3.1 (ruby 1.8.6p287) (2009-06-15 6586)
Jetty-6.1.15
JDK Flags: -server -Xverify:none -XX:MaxPermSize=96m -XX:+AggressiveOpts -Xss128k -Xms256m -Xmx384m -XX:+UseParallelGC -XX:+UseParallelOldGC
JDK 1.7.0 b67
Here are the results. I have taken the best time out of 10 runs, giving enough time for the JDK to warm up and Passenger to load all of its child processes. The results are clipped for brevity.
Benchmark command:
JRuby Results:
Time per request: 116.316 [ms] (mean)
Time taken for tests: 11.632 seconds
Memory Use After Test: 437M (RSS)
MRI Results:
Time per request: 84.142 [ms] (mean)
Time taken for tests: 8.414 seconds
Memory Use After Test: 264M (RSS)
Conclusions and final thoughts:
It seems MRI Ruby has a performance advantage of roughly 38% over JRuby when executing my application (84.142 ms versus 116.316 ms mean time per request). I am still skeptical whether MRI Ruby would win out in production, where it turns into a long-running process marathon with varied traffic patterns. At the end of the day the JVM currently has the edge over MRI Ruby in garbage collection, so in “theory” JRuby should be the better choice. This is all a hypothetical guesstimate on my part. I will most likely end up trying both variants in production to see which works best.
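The relative difference can be recomputed from the figures reported above. This is only a small sketch; the helper name `percent_diff` is made up for illustration, and it assumes an `awk` that supports `-v` assignments.

```shell
# percent_diff A B: print how much larger A is than B, as a percentage of B.
# (Hypothetical helper, not part of the original benchmark.)
percent_diff() {
    awk -v a="$1" -v b="$2" 'BEGIN { printf "%.1f\n", (a - b) / b * 100 }'
}

percent_diff 116.316 84.142   # JRuby mean time per request relative to MRI -> 38.2
percent_diff 437 264          # JRuby RSS after the test relative to MRI    -> 65.5
```

Put differently, JRuby took about 38% longer per request and ended the run with about 65% more resident memory on this particular workload.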
One of our articles on fabulously40 went viral on tagged.com, one of the largest social networks and a site in the Alexa top 100. The viral effect became quite apparent when bandwidth skyrocketed to 30 Mb/sec of sustained traffic. We were pushing over 60 GB of image data per day! We broke our bandwidth quota in just 3 days. Granted, this was toward the end of the billing cycle for that month.
Facing overage charges for bandwidth, I decided to re-encode the images with ImageMagick. I converted the images to a “lower quality” compression setting for a file size savings of 71%. Swapping in the newer, smaller images dropped the bandwidth by 50%. Awesome.
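A batch re-encode along these lines could look like the sketch below. The post does not say which settings were used, so the quality value of 60, the `-strip` option (which drops embedded metadata), the directory arguments and the `reencode_dir` name are all assumptions for illustration. The output goes to a separate directory so the originals stay untouched.

```shell
#!/bin/sh
# Re-encode every .jpg in a source directory into a destination directory
# at a lower JPEG quality. All names and values here are illustrative.
CONVERT=${CONVERT:-convert}   # ImageMagick's convert, overridable for testing

reencode_dir() {
    src_dir=$1
    dst_dir=$2
    quality=${3:-60}          # assumed default; tune by eye for your images
    mkdir -p "$dst_dir" || return 1
    for src in "$src_dir"/*.jpg; do
        [ -f "$src" ] || continue   # skip when the glob matches nothing
        "$CONVERT" "$src" -strip -quality "$quality" \
            "$dst_dir/$(basename "$src")" || return 1
    done
}
```

For example, `reencode_dir ./images ./images_small 60` followed by `du -sh ./images ./images_small` gives a quick before-and-after size comparison.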
Spot the difference where I swapped out the images?

I have finally nailed down all our issues surrounding Varnish on Solaris, thanks to the help of sky from #varnish. Apparently Varnish uses a wrapper around connect() to drop stale connections and avoid thread pileups if the back-end ever dies. Setting connect_timeout to 0 forces Varnish to use connect() directly. This should eliminate all the 503 back-end issues under Solaris that I mentioned in an earlier blog post.
Here is our startup script for Varnish that works for our needs. Varnish is a 64-bit binary, hence the “-m64” flag in the cc_command passed.
rm /sessions/varnish_cache.bin
newtask -p highfile /opt/extra/sbin/varnishd \
    -f /opt/extra/etc/varnish/default.vcl \
    -a 72.11.142.91:80 \
    -p listen_depth=8192 \
    -p thread_pool_max=2000 \
    -p thread_pool_min=12 \
    -p thread_pools=4 \
    -p cc_command='cc -Kpic -G -m64 -o %o %s' \
    -s file,/sessions/varnish_cache.bin,4G \
    -p sess_timeout=10s \
    -p max_restarts=12 \
    -p session_linger=50s \
    -p connect_timeout=0s \
    -p obj_workspace=16384 \
    -p sess_workspace=32768 \
    -T 0.0.0.0:8086 \
    -u webservd -F
I noticed Varnish had a particular problem with keeping connections around in the CLOSE_WAIT state for a long time, long enough to cause issues. I did some tuning on Solaris’s TCP stack so it is more aggressive in closing sockets after the work has been done.
Here are my aggressive TCP settings to force Solaris to close off connections within a short time, to avoid file descriptor leaks. You can merge the following TCP tweaks with the settings I posted earlier to handle more clients.
# 67.5 seconds, flush connections lingering in FIN_WAIT_2
/usr/sbin/ndd -set /dev/tcp tcp_fin_wait_2_flush_interval 67500
# 30 seconds, aggressively close connections - default 4 minutes on Solaris < 8
/usr/sbin/ndd -set /dev/tcp tcp_time_wait_interval 30000
# 1 minute, poll for dead connection - default 2 hours
/usr/sbin/ndd -set /dev/tcp tcp_keepalive_interval 60000
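To see whether tuning like the above actually reduces the number of sockets stuck in CLOSE_WAIT, it helps to count connections per TCP state before and after. The sketch below assumes the state name is the last column of each `netstat -an` line; the function name `count_tcp_states` is hypothetical, and on Solaris you may need `nawk` or `/usr/xpg4/bin/awk` in place of `awk`.

```shell
# count_tcp_states: read netstat-style lines on stdin and print
# "<STATE> <count>" for every recognised TCP state, sorted by name.
# Lines whose last field is not a known state (headers, UDP) are ignored.
count_tcp_states() {
    awk '
        $NF ~ /^(ESTABLISHED|CLOSE_WAIT|TIME_WAIT|FIN_WAIT_1|FIN_WAIT_2|LAST_ACK|CLOSING|SYN_SENT|SYN_RCVD|SYN_RECEIVED|LISTEN)$/ { n[$NF]++ }
        END { for (s in n) printf "%s %d\n", s, n[s] }
    ' | sort
}
```

A typical invocation would be `netstat -an | count_tcp_states`, run once before applying the ndd settings and again after some traffic has passed.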
Last but not least, I have finally swapped out ActiveMQ for the FUSE message broker, an “enterprise” ActiveMQ distribution. Hopefully it won’t crash once a week like ActiveMQ does for us. The FUSE message broker is based on the ActiveMQ 5.3 sources, which fix various memory leaks found in ActiveMQ 5.2, the current stable release as of this writing.
If the FUSE message broker does not work out, I might have to give Kestrel a try. Hey, if it worked for Twitter, it should work for us… right?
