Are you running JRuby in production? Do you want distributed file storage for your “enterprise” application? Look no further, MogileFS is here.
MogileFS-Client has compatibility issues with JRuby due to its use of the low-level Socket class. JRuby 1.5-dev does not yet support all the Socket methods, so here is a monkey patch to get the Ruby MogileFS client working on JRuby. Yes, it blocks, but who cares? JRuby has native threads.
This is exactly why I love Ruby: monkey patching.
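require 'socket'

class Socket
  # Hand back a plain blocking TCPSocket instead of a non-blocking Socket.
  def self.mogilefs_new(host, port, timeout = 5.0)
    TCPSocket.open(host, port, timeout)
  end
end

class TCPSocket
  attr_accessor :mogilefs_addr, :mogilefs_connected, :mogilefs_size, :mogilefs_tcp_cork

  # The timeout argument is accepted but ignored; the connect blocks.
  def self.open(host, port, timeout = 5.0)
    super(host, port.to_i)
  end

  # Always report the socket as readable; reads simply block.
  def readable?
    true
  end

  # Blocking stand-ins for the non-blocking calls.
  def write_nonblock(data)
    write(data)
  end

  def recv_nonblock(size, arg)
    recv(size, arg)
  end

  def mogilefs_init(host = nil, port = nil)
    true
  end
end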
Here is an example test case on how to get it all to work.
require 'mogilefs'
# jmogilefs.rb is the monkey patch above
# load it after loading mogilefs client.
require 'jmogilefs.rb'
mg = MogileFS::MogileFS.new(:domain => 'testserv', :hosts => ['xxx.xxx.xxx.xxx:6001'])
p mg.get_file_data 'video:100:default.jpg'
p mg.get_paths 'video:100:default.jpg', true
mg.list_keys('video:100')[0].each do |f|
  p f
end
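One way to keep a patch like this away from MRI is to apply it only when running under JRuby. The sketch below is an illustration, not part of mogilefs-client: FakeSocket, BlockingFallbacks, and patch_for_jruby are hypothetical names, and FakeSocket stands in for a real socket so the example runs without a network. The only thing it relies on is the JRUBY_VERSION constant, which JRuby defines.

```ruby
# Hypothetical stand-in for a socket so the sketch runs without a network.
class FakeSocket
  attr_reader :written

  def initialize
    @written = String.new
  end

  # Blocking write: appends the data and returns the number of bytes written.
  def write(data)
    @written << data
    data.bytesize
  end
end

# Blocking fallbacks in the spirit of the patch above (hypothetical module).
module BlockingFallbacks
  def write_nonblock(data)
    write(data)
  end

  def readable?
    true
  end
end

# JRuby defines the JRUBY_VERSION constant; MRI does not.
def jruby?
  !defined?(JRUBY_VERSION).nil?
end

# Mix the fallbacks in only on JRuby. The force flag exists purely so the
# sketch can be tried on MRI as well.
def patch_for_jruby(klass, force: false)
  return false unless force || jruby?
  klass.send(:include, BlockingFallbacks)
  true
end

patch_for_jruby(FakeSocket, force: true)
sock = FakeSocket.new
sock.write_nonblock("ping")
puts sock.written
```

In a real application the equivalent guard would be a one-liner around the require of jmogilefs.rb.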
Squid is a fundamental part of our infrastructure at Fabulously40. It helps us lower response times quite considerably. The problem with Squid is that it is quite “dense” when it comes to configuration: unless you’re willing to do a bit of C hacking on it, it does not offer much flexibility. This can be overcome by using supporting software to help Squid out.
Note: Our configuration would be quite simplified if we used Varnish but it lacks some key features that make Squid a better candidate.
1. Varnish can’t stream cache-misses, it can only buffer. This adds latency to cache-miss requests.
2. Varnish is unable to avoid caching objects based on content-length size.
3. Varnish has an issue with connect_timeout and Solaris socket handling.
The picture below illustrates my point. Varnish’s connection timeout code just does not work all that well.

Until Varnish can handle the three things listed, Squid remains the best choice at the cost of configuration complexity.
Optimize Cache Hits by Normalizing Headers
“Accept-Encoding: gzip” and “Accept-Encoding: gzip, deflate” will be cached separately unless you normalize client headers. Unlike Varnish, Squid has no configuration option to normalize headers. However, you can use Nginx to normalize headers before passing the request to Squid.
Here is the setup: Client -> Nginx -> Squid -> Backend Services
The NGINX Configuration
location / {
    proxy_set_header X-Real-IP $remote_addr;
    proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
    proxy_set_header Host $http_host;
    proxy_hide_header Pragma;

    # Remove client cache-control header
    # to avoid fetching from backend if page is in cache
    proxy_set_header Cache-Control "";

    # Normalize static assets for squid
    if ($request_uri ~* "\.(jpeg|jpg|png|gif|gz|tgz|bz2|tbz|mp3|ogg|ico|swf|mp4|flv|mov|dmg|mkv)") {
        set $normal_encoding "";
        proxy_pass http://squids;
        break;
    }

    # Normalize gzip encoding
    if ($http_accept_encoding ~* gzip) {
        set $normal_encoding "gzip";
        proxy_pass http://squids;
        break;
    }

    # Normalize deflate encoding
    if ($http_accept_encoding ~* deflate) {
        set $normal_encoding "deflate";
        proxy_pass http://squids;
        break;
    }

    # Define the normalize header
    proxy_set_header Accept-Encoding $normal_encoding;

    # default...
    proxy_pass http://squids;
    break;
}
So by the time Squid receives the request, the accept-encoding header is normalized, for efficient cache storage.
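To make the decision order easier to follow, here is the same normalization written as a small Ruby function. It is only a sketch of the logic the Nginx rules are meant to express, not something that runs in the proxy path: normalize_accept_encoding and STATIC_ASSET are hypothetical names, and the extension list is copied from the configuration above.

```ruby
# Extensions treated as static assets, copied from the Nginx regex above.
STATIC_ASSET = /\.(jpeg|jpg|png|gif|gz|tgz|bz2|tbz|mp3|ogg|ico|swf|mp4|flv|mov|dmg|mkv)/i

# Collapse whatever the client sent into one of three values so the cache
# only ever sees "", "gzip" or "deflate".
def normalize_accept_encoding(request_uri, accept_encoding)
  # Static assets: never ask for a compressed variant.
  return "" if request_uri =~ STATIC_ASSET

  header = accept_encoding.to_s
  # gzip wins when the client accepts both.
  return "gzip" if header =~ /gzip/i
  return "deflate" if header =~ /deflate/i

  # Client accepts neither: send an empty Accept-Encoding upstream.
  ""
end

puts normalize_accept_encoding("/index.html", "gzip, deflate").inspect
puts normalize_accept_encoding("/logo.png", "gzip, deflate").inspect
```

Whatever mix of encodings clients advertise, the cache key then varies over at most three values per URL.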
Avoid Caching Error Pages with Squid 2.7
Squid 2.7 has better support for reverse caching and HTTP 1.1 than the Squid 3.x branch. However, it is missing one important ACL that Squid 3.x has: http_status. Squid 2.7 cannot be configured to avoid caching pages based on the status code of the origin’s response. I have written a Perlbal plugin, CacheKill, specifically to address this issue. Perlbal::Plugin::CacheKill sits between the backend services and Squid and rewrites Cache-Control headers based on the response code.
Here is the setup: Client -> Squid -> Perlbal -> Backend Services
If the backend service responds with a 501, 502 or 503 HTTP status code, Perlbal will append a Cache-Control: no-cache header before handing the response back to Squid.
Here is the configuration file for Perlbal::Plugin::CacheKill to deny Squid from caching error pages.
CREATE SERVICE fab40
SET listen = 0.0.0.0:8003
SET role = reverse_proxy
SET pool = backends
cache_kill codes = 404,500,501,503,502
SET plugins = CacheKill
ENABLE fab40
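The rule the plugin applies is simple enough to sketch in a few lines. The plugin itself is a Perl module for Perlbal, so the Ruby below is only an illustration of the idea: cache_kill and KILL_CODES are hypothetical names, the code list is copied from the configuration above, and replacing any existing Cache-Control value (rather than merging with it) is an assumption of this sketch.

```ruby
# Status codes that should never be cached, copied from the config above.
KILL_CODES = [404, 500, 501, 502, 503]

# Return the response headers the cache should see. Responses with a listed
# status code get Cache-Control: no-cache; everything else passes through.
def cache_kill(status, headers, codes = KILL_CODES)
  return headers unless codes.include?(status)
  headers.merge("Cache-Control" => "no-cache")
end

puts cache_kill(503, "Content-Type" => "text/html").inspect
puts cache_kill(200, "Cache-Control" => "max-age=300").inspect
```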
Stitching software together can make Squid just as flexible as Varnish with its VCL configuration.
http://github.com/victori/perlbal-plugin-mogilefs
Key features
- Asynchronous, does not stall the Perlbal event loop.
- Converts URL paths to MogileFS fetch keys.
- Failover to filesystem if key fetch failed.
- Pretty statistics in Perlbal’s Management console.
It’s freaking awesome.
On a side note, I have also updated my other two Perlbal plugins.
http://github.com/victori/perlbal-plugin-stickysessions
- Session affinity via Cookie.
http://github.com/victori/perlbal-plugin-backendheaders
- Appending Backend information on the served response.
*Update* Patches got accepted into MogileFS Trunk
Just go check out trunk, it has all my patches already included.
http://code.sixapart.com/svn/mogilefs/trunk/
The only thing you still need is my mogstored disk patch, which is still pending. The fixes for all the PostgreSQL and Solaris issues are already included in trunk.
I fixed a few issues with MogileFS and Solaris. MogileFS should run wonderfully on Solaris with my patches applied.
Directory for all my patches: http://victori.uploadbooth.com/patches
http://victori.uploadbooth.com/patches/solaris-disk-du.patch
This patch fixes mogstored to work with Solaris’s df utility.
http://victori.uploadbooth.com/patches/store-max-requests.patch
This patch adds a new feature to the MogileFS Tracker – max_requests.
The default is 0, but it is suggested you set max_requests to 1000 to avoid memory leaks.
The tracker will give out the database handle up to the max_requests limit before expiring the connection and opening a new one. This avoids memory leaks with long-running persistent connections. PostgreSQL has issues with long-lived persistent connections: it accumulates a lot of RAM and does not let go until the process/connection is killed off. This patch makes sure that the connection is expired after a set number of dbh handle requests.
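The recycling idea can be sketched independently of MogileFS. The tracker is written in Perl, so the Ruby below is not the patch itself: RecyclingHandle is a hypothetical class, the connector block stands in for whatever opens the database connection, and treating a limit of 0 as “never recycle” is an assumption of this sketch.

```ruby
# Hands out one shared handle and replaces it after max_requests uses.
class RecyclingHandle
  attr_reader :connects

  # max_requests: uses allowed per connection (0 is assumed to mean unlimited).
  # connector: block that opens and returns a fresh connection.
  def initialize(max_requests, &connector)
    @max_requests = max_requests
    @connector = connector
    @handle = nil
    @served = 0
    @connects = 0
  end

  # Return the current handle, reconnecting first if the limit was reached.
  def get
    if @handle.nil? || (@max_requests > 0 && @served >= @max_requests)
      @handle.close if @handle.respond_to?(:close)
      @handle = @connector.call
      @connects += 1
      @served = 0
    end
    @served += 1
    @handle
  end
end

# Plain objects stand in for database connections here.
pool = RecyclingHandle.new(3) { Object.new }
7.times { pool.get }
puts pool.connects
```

The caller keeps asking for “the handle” and never needs to know that the connection underneath was swapped out.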
http://victori.uploadbooth.com/patches/mogilefs-sunos-pg.patch
This patch applies the InactiveDestroy argument to avoid the MogileFS Tracker locking up with the PostgreSQL store on Solaris.
http://victori.uploadbooth.com/patches/solaris-mogilefs-full.patch
This is the full patch for all my fixes.
I am slowly migrating our fab40 static asset data to MogileFS. I have imported >300,000 images, no issues with my patches so far.
/ PLUG go make an account on uploadbooth!
Enjoy
I just received my “Guide to Open-Source Operating Systems” comparing Solaris with Linux from Sun’s marketing department. Here are some of the facts that made me cringe due to blatant lying and half truths. Hey Sun, don’t let the facts get in your way.
Believe it or not but this is actually verbatim from the guide.
• Solaris is supported by more applications.
• Solaris holds performance and price/performance world records that demonstrate its speed and scalability on a variety of systems.
• Solaris is supported by Sun, the company dedicated to UNIX for more than two decades.
1. Let’s see, the first fact is just blatant lying. Last I checked, Linux supported IA-32, MIPS, x86-64, SPARC, DEC Alpha, Itanium, PowerPC, ARM, m68k, PA-RISC, s390, SuperH, M32R and many more platforms, while Solaris only supports SPARC, IA-32 and x86-64. Does anyone in Sun’s marketing department care to fact check?
2. Depends on your definition of “supported.” Marketing is most likely referring to commercial support. I don’t have the facts to back this up, but I doubt this holds true against Linux in 2009; maybe they had a case back in 1999. The majority of open source applications are developed against Linux, and Solaris compatibility is just an afterthought.
3. You win http://www.tpc.org/tpcc/results/tpcc_perf_results.asp
Sun develops some of the best hardware and software on the market, but their marketing department is a disaster. There can only be one Steve Jobs and his reality distortion field.
Once again I have been blindsided by yet another conservative out-of-the-box setting. IPFilter is tuned far too conservatively when it comes to its state table size.
Here is how you can tell if you’re hitting any issues: run ipfstat and check for lost packets.
victori@opensolaris:~# ipfstat | grep lost
fragment state(in): kept 0 lost 0 not fragmented 0
fragment state(out): kept 0 lost 0 not fragmented 0
packet state(in): kept 798 lost 100
packet state(out): kept 612 lost 234
Notice that the in and out lost state lines have a non-zero value. This means IPFilter has been dropping client connections, bummer.
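This check is easy to automate. The sketch below assumes ipfstat output in the format shown in this post; lost_state_counts and dropping_states are hypothetical helpers, and the sample text is copied from the output above rather than captured from a live system.

```ruby
# Sample copied from the ipfstat output in this post.
SAMPLE_IPFSTAT = <<EOS
fragment state(in): kept 0 lost 0 not fragmented 0
fragment state(out): kept 0 lost 0 not fragmented 0
packet state(in): kept 798 lost 100
packet state(out): kept 612 lost 234
EOS

# Map each "<kind> <direction>" state line to its lost counter.
def lost_state_counts(ipfstat_output)
  counts = {}
  ipfstat_output.each_line do |line|
    m = line.match(/^\s*(fragment|packet) state\((in|out)\):.*?\blost (\d+)/)
    next unless m
    counts["#{m[1]} #{m[2]}"] = m[3].to_i
  end
  counts
end

# Keep only the states that have actually lost something.
def dropping_states(ipfstat_output)
  lost_state_counts(ipfstat_output).select { |_, lost| lost > 0 }
end

dropping_states(SAMPLE_IPFSTAT).each do |state, lost|
  puts "#{state}: #{lost} lost"
end
```

Fed from a cron job, a check like this would flag the problem long before users notice dropped connections.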
The default settings are quite conservative.
fr_statemax min 0x1 max 0x7fffffff current 4096
fr_statesize min 0x1 max 0x7fffffff current 5002
You need to shut down IPFilter and apply larger table size limits.
victori@opensolaris:~# /usr/sbin/ipf -T fr_statemax=18963,fr_statesize=27091
Let’s confirm that it works.
fr_statemax min 0x1 max 0x7fffffff current 18963
fr_statesize min 0x1 max 0x7fffffff current 27091
Awesome, now all we need to do is enable IPFilter and no more lost packets.
To make this persistent across reboots, edit ipf.conf.
name="ipf" parent="pseudo" instance=0 fr_statemax=18963 fr_statesize=27091;
Then update the contents
This can be applied to any OS that uses IPFilter.
Update: The following information could be beneficial to some, however my issues actually were with Caviar black drives shipping with TLER disabled. You need to pay Western Digital a premium for their “RAID” drives with TLER enabled. So for anyone reading this, avoid consumer Western Digital drives if you plan on using them for RAID.
zfs_vdev_max_pending
I can’t believe how long I have been tolerating horrible concurrent IO performance on OpenSolaris running ZFS. Whenever IO-intensive writes are happening, the whole system slows to a crawl for any further IO. Running “ls” on an uncached directory is just painful.
victori@opensolaris:/opt# iostat -xnz 1
                    extended device statistics
    r/s    w/s   kr/s   kw/s wait actv wsvc_t asvc_t  %w  %b device
    0.0   87.0    0.0 2878.1  0.0  0.0    0.0    0.4   0 100 c4t0d0
    0.0   83.0    0.0 2878.1  0.0  0.1    0.2    0.7   1  50 c4t1d0
    1.0    0.0   28.0    0.0  0.0  0.0    0.0    5.4   0   1 c4t2d0
Notice c4t0d0 is blocking at 100%. If a disk is blocking at 100%, good luck getting it to do any other operations such as reads.
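Spotting saturated disks by eye gets old quickly, so here is a small sketch that picks them out of iostat -xnz output. It assumes the eleven-column layout shown above, with %b in the tenth column and the device name last; busy_disks is a hypothetical helper and the sample rows are copied from this post.

```ruby
# Sample copied from the iostat output in this post.
SAMPLE_IOSTAT = <<EOS
extended device statistics
r/s w/s kr/s kw/s wait actv wsvc_t asvc_t %w %b device
0.0 87.0 0.0 2878.1 0.0 0.0 0.0 0.4 0 100 c4t0d0
0.0 83.0 0.0 2878.1 0.0 0.1 0.2 0.7 1 50 c4t1d0
1.0 0.0 28.0 0.0 0.0 0.0 0.0 5.4 0 1 c4t2d0
EOS

# Return the devices whose %b (percent busy) is at or above the threshold.
def busy_disks(iostat_output, threshold = 90)
  busy = []
  iostat_output.each_line do |line|
    fields = line.split
    # Data rows have 11 fields with a numeric %b in the tenth position;
    # this also skips the two header lines.
    next unless fields.size == 11 && fields[9] =~ /\A\d+\z/
    busy << fields[10] if fields[9].to_i >= threshold
  end
  busy
end

puts busy_disks(SAMPLE_IOSTAT).inspect
```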
SATA disks do Native Command Queuing while SAS disks do Tagged Command Queuing; this is an important distinction. OpenSolaris/Solaris seems to be optimized for the latter, with a 32-wide command queue set by default. This completely saturates the SATA disks with IO commands, in turn making the system unusable for short periods of time.
Dynamically set the ZFS command queue to 1 to optimize for NCQ.
And add to /etc/system
Enjoy your OpenSolaris server on cheap SATA disks!
Recently a primary boot disk went bad on our server and I got blindsided by a non-bootable secondary mirror disk. All the data was intact but I could not boot from it. This required a slow re-installation and migration process that took a very long time.
• ZFS attach automatically partitions the drive as EFI.
• ZFS send/recv transfers on gzip compressed data-slices are slow.
Here is the correct way of getting both the disks in the ZFS mirror to boot.
Plug the new drive that you want to add to the ZFS mirror into the server. If you’re hot swapping or adding a new drive while the server is still on, you need to use cfgadm to configure it.
Now that the drive is configured and seen by the server you need to repartition it with format so it can be used as a bootable device.
AVAILABLE DISK SELECTIONS:
0. c4t0d0
/pci@0,0/pci8086,346c@1f,2/disk@0,0
1. c4t1d0
/pci@0,0/pci8086,346c@1f,2/disk@1,0
2. c4t2d0
/pci@0,0/pci8086,346c@1f,2/disk@2,0
* select your new drive *
# fdisk
* use fdisk to remove the EFI partition and add a solaris2 partition. *
Select the partition type to create:
1=SOLARIS2 2=UNIX 3=PCIXOS 4=Other
5=DOS12 6=DOS16 7=DOSEXT 8=DOSBIG
9=DOS16LBA A=x86 Boot B=Diagnostic C=FAT32
D=FAT32LBA E=DOSEXTLBA F=EFI 0=Exit?
This step is very important: if you do not repartition your drive, zfs attach will default the drive back to an EFI partition table, which is not bootable.
c4t0d0s2 — primary drive.
c4t1d0s2 — new drive that we are setting up.
You should now be able to attach the secondary drive to your mirror using the identical slice.
Once the mirror is done synchronizing you need to install the bootloader on the drive.
Updating master boot sector destroys existing boot managers (if any).
continue (y/n)?y
stage1 written to partition 0 sector 0 (abs 16065)
stage2 written to partition 0, 267 sectors starting at 50 (abs 16115)
stage1 written to master boot sector
Troubleshooting
raw device must be a root slice (not s2)
You did not re-partition the drive to a solaris2 partition. EFI partitions can’t be made bootable. Use the format tool to reconfigure the drive with a solaris2 partition.
cannot open/stat device /dev/rdsk/c1t0d0s0
You did not copy your label information from your primary to your secondary disk with prtvtoc and fmthard.
I am working on a little Twitter project that uses twitter4r as the client API. Recently Twitter changed their API and broke compatibility.
/opt/local/lib/ruby/gems/1.8/gems/mbbx6spp-twitter4r-0.4.0/lib/twitter/client/base.rb:43:in `raise_rest_error': Not Found (Twitter::RESTError)
    from /opt/local/lib/ruby/gems/1.8/gems/mbbx6spp-twitter4r-0.4.0/lib/twitter/client/base.rb:48:in `handle_rest_response'
    from /opt/local/lib/ruby/gems/1.8/gems/mbbx6spp-twitter4r-0.4.0/lib/twitter/client/base.rb:20:in `http_connect'
    from /opt/local/lib/ruby/1.8/net/http.rb:543:in `start'
    from /opt/local/lib/ruby/gems/1.8/gems/mbbx6spp-twitter4r-0.4.0/lib/twitter/client/base.rb:16:in `http_connect'
    from /opt/local/lib/ruby/gems/1.8/gems/mbbx6spp-twitter4r-0.4.0/lib/twitter/client/user.rb:37:in `user'
    from somebot.rb:5
Curse you twitter!
Luckily Ruby has the concept of monkey patching. Here is the fix to get it all working correctly.
class Twitter::Client
  @@USER_URIS = {
    :info => '/users/show.json',
    :friends => '/statuses/friends.json',
    :followers => '/statuses/followers.json',
  }
end
Shazzam… it works…
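For anyone wondering why reassigning a class variable from outside the gem works, here is the mechanism in isolation. DemoClient is a hypothetical class made up for this sketch, and the “before” path is invented for illustration; it is not a claim about what twitter4r originally shipped with.

```ruby
# Hypothetical library class holding its endpoints in a class variable.
class DemoClient
  @@USER_URIS = { :info => '/users/show' }

  def uri_for(action)
    @@USER_URIS[action]
  end
end

before = DemoClient.new.uri_for(:info)

# The "monkey patch": reopen the class later and reassign the variable.
# Every existing method that reads @@USER_URIS picks up the new value.
class DemoClient
  @@USER_URIS = { :info => '/users/show.json' }
end

after = DemoClient.new.uri_for(:info)
puts "#{before} -> #{after}"
```

Because the library methods look the variable up at call time, nothing else in the gem has to be touched.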
I am in the process of evaluating which option to choose for a new production deployment of a Sinatra application.
Pros and Cons of the implementations:
JRuby Stack:
Pros:
• Multi-threaded, easy to scale with spiked traffic / shared resources.
Cons:
• Single process is a single point of failure.
MRI Ruby Stack:
Pros:
• Scaled via processes, no single point of failure.
Cons:
• Single Process, no shared resources (Possibly using more memory over time).
These tests are run against a real-world application that is soon to be released, not some dummy “hello world” application.
Application Background:
Sinatra / HAML templates (not compiled, rendered per request) / CouchDB / R18N Translation
Server Specifications:
Hardware: 8Gig / Quad Core Xeon x5355
MRI Stack:
Ruby 1.8.7 (2008-08-11 patchlevel 72)
Nginx Passenger 2.2.4
Passenger Config: passenger_max_pool_size 8, passenger_use_global_queue on
Java Stack:
JRuby 1.3.1 (ruby 1.8.6p287) (2009-06-15 6586)
Jetty-6.1.15
JDK Flags: -server -Xverify:none -XX:MaxPermSize=96m -XX:+AggressiveOpts -Xss128k -Xms256m -Xmx384m -XX:+UseParallelGC -XX:+UseParallelOldGC
JDK 1.7.0 b67
Here are the results. I have taken the best time out of 10 runs, giving enough time for the JDK to warm up and Passenger to load all the children. The results are clipped for brevity.
Benchmark command:
JRuby Results:
Time per request: 116.316 [ms] (mean)
Time taken for tests: 11.632 seconds
Memory Use After Test: 437M (RSS)
MRI Results:
Time per request: 84.142 [ms] (mean)
Time taken for tests: 8.414 seconds
Memory Use After Test: 264M (RSS)
Conclusions and final thoughts:
It seems MRI Ruby has roughly a 38% performance advantage over JRuby when executing my application (84.142 ms versus 116.316 ms mean time per request). I am still a bit skeptical whether MRI Ruby would still win out in production, when it turns into a long-running process marathon with varied traffic patterns. At the end of the day the JVM currently has the edge over MRI Ruby in garbage collection, so in “theory” JRuby should be the better choice. This is all a hypothetical guesstimate on my behalf. I will most likely end up trying both variants in production and see which works best.
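The size of the gap depends on which way you measure it, so here is the arithmetic spelled out using the two mean times reported above. The helper names are hypothetical; the only inputs are the benchmark numbers from this post.

```ruby
# Mean time per request from the results above, in milliseconds.
JRUBY_MS = 116.316
MRI_MS = 84.142

# How much less time the faster stack needs per request, in percent.
def percent_less_time(slow_ms, fast_ms)
  (slow_ms - fast_ms) / slow_ms * 100.0
end

# How many more requests the faster stack completes in the same time,
# in percent.
def percent_more_throughput(slow_ms, fast_ms)
  (slow_ms / fast_ms - 1.0) * 100.0
end

puts "MRI needs %.1f%% less time per request" % percent_less_time(JRUBY_MS, MRI_MS)
puts "MRI serves %.1f%% more requests per second" % percent_more_throughput(JRUBY_MS, MRI_MS)
```

The first figure comes out near 27.7 and the second near 38.2, so a “performance advantage” quoted as a single percentage should say which of the two it means.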

