Letsgetdugg

Random tech jargon

Browsing the 2009 November archive

Are you running JRuby in production? Do you want distributed file storage for your “enterprise” application? Look no further, MogileFS is here.

MogileFS-Client has compatibility issues with JRuby due to its use of the low-level Socket class. JRuby 1.5-dev does not yet support all the Socket methods, so here is a monkey patch to get the Ruby MogileFS client working on JRuby. Yes, it blocks, but who cares? JRuby has native threads.

This is exactly why I love Ruby: monkey patching.

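# Route mogilefs-client's socket creation through a plain blocking TCPSocket.
class Socket
  def self.mogilefs_new(host, port, timeout = 5.0)
    TCPSocket.open(host, port, timeout)
  end
end

class TCPSocket
  attr_accessor :mogilefs_addr, :mogilefs_connected, :mogilefs_size, :mogilefs_tcp_cork

  # The timeout is accepted for API compatibility but ignored; the connect blocks.
  def self.open(host, port, timeout = 5.0)
    super(host, port.to_i)
  end

  # Always report the socket as readable instead of polling it.
  def readable?
    true
  end

  # Blocking stand-ins for the non-blocking calls.
  def write_nonblock(data)
    write(data)
  end

  def recv_nonblock(size, arg)
    recv(size, arg)
  end

  # Nothing to set up for a plain blocking socket.
  def mogilefs_init(host = nil, port = nil)
    true
  end
end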
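
The patch above accepts a timeout argument but never uses it. If a connect that hangs forever bothers you, here is a minimal sketch of one way to honor the timeout with Ruby's standard Timeout module. The helper name open_with_timeout is my own invention for illustration and is not part of mogilefs-client.

```ruby
require 'socket'
require 'timeout'

# Hypothetical helper (not part of mogilefs-client): open a blocking TCP
# connection, but raise Timeout::Error if it takes longer than `timeout` seconds.
def open_with_timeout(host, port, timeout = 5.0)
  Timeout.timeout(timeout) { TCPSocket.new(host, port.to_i) }
end
```

Only the connect is guarded here; reads and writes on the returned socket still block.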

Here is an example test case showing how to get it all working.

require 'rubygems'
require 'mogilefs'

# jmogilefs.rb is the monkey patch above;
# load it after loading the mogilefs client.
require 'jmogilefs.rb'

mg = MogileFS::MogileFS.new(:domain => 'testserv', :hosts => ['xxx.xxx.xxx.xxx:6001'])

p mg.get_file_data('video:100:default.jpg')

p mg.get_paths('video:100:default.jpg', true)

mg.list_keys('video:100')[0].each do |f|
  p f
end


Squid is a fundamental part of our infrastructure at Fabulously40. It helps us lower response times quite considerably. The problem with Squid is that it is quite “dense” when it comes to configuration: unless you're willing to do a bit of C hacking on it, it does not offer much flexibility. This can be overcome by using supporting software to help Squid out.

Note: Our configuration would be much simpler if we used Varnish, but it lacks some key features that make Squid a better candidate.

1. Varnish can’t stream cache misses; it can only buffer them. This adds latency to cache-miss requests.
2. Varnish is unable to avoid caching objects based on content-length size.
3. Varnish has an issue with connect_timeout and Solaris socket handling.

Until Varnish can handle the three things listed, Squid remains the best choice at the cost of configuration complexity.

Optimize Cache Hits by Normalizing Headers

“Accept-Encoding: gzip” and “Accept-Encoding: gzip,deflate” will be cached separately unless you normalize client headers. Unlike Varnish, Squid has no configuration option to normalize headers. However, you can use Nginx to normalize headers before passing the request on to Squid.

Here is the setup: Client -> Nginx -> Squid -> Backend Services

The NGINX Configuration

location / {
  proxy_set_header X-Real-IP $remote_addr;
  proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
  proxy_set_header Host $http_host;
  proxy_hide_header Pragma;

  # Remove client cache-control header
  # to avoid fetching from backend if page is in cache
  proxy_set_header Cache-Control "";

  # Normalize static assets for squid
  if ($request_uri ~* "\.(jpeg|jpg|png|gif|gz|tgz|bz2|tbz|mp3|ogg|ico|swf|mp4|flv|mov|dmg|mkv)") {
    set $normal_encoding "";
    proxy_pass http://squids;
    break;
  }

  # Normalize gzip encoding
  if ($http_accept_encoding ~* gzip) {
    set $normal_encoding "gzip";
    proxy_pass http://squids;
    break;
  }

  # Normalize deflate encoding
  if ($http_accept_encoding ~* deflate) {
    set $normal_encoding "deflate";
    proxy_pass http://squids;
    break;
  }

  # Define the normalize header
  proxy_set_header Accept-Encoding $normal_encoding;

  # default...
  proxy_pass http://squids;
  break;
}

So by the time Squid receives the request, the accept-encoding header is normalized, for efficient cache storage.
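
To make the decision order in the Nginx rules easier to follow, here is a rough Ruby sketch of the same logic. The function name normalize_accept_encoding is made up for illustration; it only mirrors the three if blocks above and is not something Nginx or Squid provides.

```ruby
# Same extension list as the Nginx rule for static assets.
STATIC_ASSET = /\.(jpeg|jpg|png|gif|gz|tgz|bz2|tbz|mp3|ogg|ico|swf|mp4|flv|mov|dmg|mkv)/i

# Hypothetical helper: returns the Accept-Encoding value Squid would see.
def normalize_accept_encoding(request_uri, accept_encoding)
  # Already-compressed static assets: never ask for an encoded variant.
  return '' if request_uri =~ STATIC_ASSET

  header = accept_encoding.to_s
  return 'gzip' if header =~ /gzip/i       # gzip wins when both are offered
  return 'deflate' if header =~ /deflate/i
  ''                                       # anything else: no encoding
end
```

Every client request therefore collapses into one of three variants per URL, which is what keeps the cache from storing near-duplicates.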

Avoid Caching Error Pages with Squid 2.7

Squid 2.7 has better support for reverse caching and HTTP 1.1 than the Squid 3.x branch. However, it is missing one important ACL that Squid 3.x has: http_status. Squid 2.7 cannot be configured to avoid caching pages based on the response status code from the origin. I have written a Perlbal plugin, CacheKill, specifically to address this issue. Perlbal::Plugin::CacheKill sits between the backend services and Squid and rewrites cache-control headers based on the response code.

Here is the setup: Client -> Squid -> Perlbal -> Backend Services

If the backend service responds with one of the configured status codes (for example 501, 502 or 503), Perlbal adds a Cache-Control: no-cache header before handing the response back to Squid.

Here is the Perlbal configuration that uses Perlbal::Plugin::CacheKill to prevent Squid from caching error pages.

CREATE SERVICE fab40
  SET listen = 0.0.0.0:8003
  SET role = reverse_proxy
  SET pool = backends
  cache_kill codes = 404,500,501,503,502
  SET plugins = CacheKill
ENABLE fab40
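
The plugin itself is Perl, but the idea fits in a few lines. Here is a minimal Ruby sketch of the header rewrite, assuming the code list from the configuration above. The cache_kill function and NO_CACHE_CODES constant are illustrative names of mine, and this version overwrites any existing Cache-Control value, which may differ from what the real plugin does.

```ruby
# Status codes taken from the cache_kill line in the Perlbal config.
NO_CACHE_CODES = [404, 500, 501, 502, 503].freeze

# Hypothetical helper: returns the headers to pass on to Squid.
def cache_kill(status, headers, codes = NO_CACHE_CODES)
  # Responses with other status codes pass through untouched.
  return headers unless codes.include?(status)

  # Mark error responses as uncacheable (replaces any existing value).
  headers.merge('Cache-Control' => 'no-cache')
end
```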

Stitching software together can make Squid just as flexible as Varnish with its VCL configuration.


http://github.com/victori/perlbal-plugin-mogilefs

Key features

- Asynchronous, does not stall the Perlbal event loop.
- Converts URL paths to MogileFS fetch keys.
- Failover to filesystem if key fetch failed.
- Pretty statistics in Perlbal’s Management console.
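
The exact path-to-key mapping is configured in the plugin and not spelled out here. Purely as an illustration, and assuming colon-separated keys like video:100:default.jpg from the JRuby example, a conversion could look like this in Ruby. The path_to_key function is hypothetical and not the plugin's API.

```ruby
# Hypothetical mapping: "/video/100/default.jpg" becomes "video:100:default.jpg".
def path_to_key(path)
  # Drop any query string, then join the non-empty path segments with colons.
  path.to_s.split('?', 2).first.to_s.split('/').reject(&:empty?).join(':')
end
```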

It's freaking awesome ;-)

On a side note, I have also updated my other two Perlbal plugins.

http://github.com/victori/perlbal-plugin-stickysessions

- Session affinity via Cookie.

http://github.com/victori/perlbal-plugin-backendheaders

- Appending Backend information on the served response.


*Update* Patches got accepted into MogileFS Trunk ;-)

Just go check out trunk, it has all my patches already included.

http://code.sixapart.com/svn/mogilefs/trunk/

The only thing you still need is my mogstored disk patch, which is still pending. All the fixes for the PostgreSQL and Solaris issues have already been included in trunk.


I fixed a few issues with MogileFS and Solaris. MogileFS should run wonderfully on Solaris with my patches applied.

Directory for all my patches: http://victori.uploadbooth.com/patches

http://victori.uploadbooth.com/patches/solaris-disk-du.patch

This patch fixes mogstored to work with Solaris's df utility.

http://victori.uploadbooth.com/patches/store-max-requests.patch

This patch adds a new feature to the MogileFS Tracker – max_requests.

The default is 0, but I suggest setting max_requests to 1000 to avoid memory leaks.

The tracker will give out the same database handle up to the max_requests limit before expiring the connection and opening a new one. This avoids memory leaks with long-running persistent connections. PostgreSQL has issues with long-lived persistent connections: the backend accumulates a lot of RAM and does not let go until the process/connection is killed off. This patch makes sure that the connection is expired after that many dbh handle requests.
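
The real patch is Perl inside the tracker, but the recycling idea can be sketched in Ruby. The RecyclingHandle class below is a made-up name for illustration, and it assumes that a max_requests of 0 means "never recycle", which is my reading of the default rather than something the patch documents.

```ruby
# Hypothetical sketch: hand out one handle until it has been requested
# max_requests times, then replace it with a fresh connection.
class RecyclingHandle
  def initialize(max_requests = 0, &connect)
    @max_requests = max_requests # 0 is assumed to mean "never recycle"
    @connect = connect           # block that opens a new connection
    @handle = nil
    @served = 0
  end

  def get
    if @handle.nil? || (@max_requests > 0 && @served >= @max_requests)
      # Close the old connection (if it can be closed) before reconnecting.
      @handle.close if @handle.respond_to?(:close)
      @handle = @connect.call
      @served = 0
    end
    @served += 1
    @handle
  end
end
```

With max_requests set to 1000, the 1001st call to get would close the old connection and return a new one, so no single PostgreSQL backend lives long enough to bloat.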

http://victori.uploadbooth.com/patches/mogilefs-sunos-pg.patch

This patch applies the InactiveDestroy argument to avoid the MogileFS Tracker locking up with the PostgreSQL store on Solaris.

http://victori.uploadbooth.com/patches/solaris-mogilefs-full.patch

This is the full patch for all my fixes.

I am slowly migrating our fab40 static asset data to MogileFS. I have imported more than 300,000 images with no issues from my patches so far.

/ PLUG go make an account on uploadbooth!

Enjoy ;-)