Letsgetdugg

Random tech jargon

Browsing the tag cache

I have finally nailed out all our issues surrounding Varnish on Solaris, thanks to the help of sky from #varnish. Apparently Varnish uses a wrapper around connect() to drop stale connections to avoid thread pileups if the back-end ever dies. Setting connect_timeout to 0 will force Varnish to use connect() directly. This should eliminate all 503 back-end issues under Solaris that I have mentioned in an earlier blog post.

Here is our startup script for varnish that works for our needs. Varnish is a 64-bit binary hence the “-m64″ cc_command passed.

#!/bin/sh

rm /sessions/varnish_cache.bin

newtask -p highfile /opt/extra/sbin/varnishd -f /opt/extra/etc/varnish/default.vcl -a 72.11.142.91:80 -p listen_depth=8192 -p thread_pool_max=2000 -p thread_pool_min=12 -p thread_pools=4 -p cc_command=’cc -Kpic -G -m64 -o %o %s’ -s file,/sessions/varnish_cache.bin,4G -p sess_timeout=10s -p max_restarts=12 -p session_linger=50s -p connect_timeout=0s -p obj_workspace=16384 -p sess_workspace=32768 -T 0.0.0.0:8086 -u webservd -F

I noticed varnish had particular problem of keeping connections around in CLOSE_WAIT state for a long time, enough to cause issues. I did some tuning on Solaris’s TCP stack so it is more aggressive in closing sockets after the work has been done.

Here are my aggressive TCP settings to force Solaris to close off connections in a short duration of time, to avoid file descriptor leaks. You can merge the following TCP tweaks with the settings I have posted earlier to handle more clients.

# 67 seconds default 675 seconds
/usr/sbin/ndd -set /dev/tcp tcp_fin_wait_2_flush_interval 67500

# 30 seconds, aggressively close connections – default 4 minutes on solaris < 8
/usr/sbin/ndd -set /dev/tcp tcp_time_wait_interval 30000

# 1 minute, poll for dead connection - default 2 hours
/usr/sbin/ndd -set /dev/tcp tcp_keepalive_interval 60000

Last but not least, I have finally swapped out ActiveMQ for the FUSE message broker, an “enterprise” ActiveMQ distribution. Hopefully it won’t crash once a week like ActiveMQ does for us. The FUSE message broker is based off of ActiveMQ 5.3 sources that fix various memory leaks found in the current stable release of ActiveMQ 5.2 as of this writing.

If the FUSE message broker does not work out, I might have to give Kestrel a try. Hey, if it worked for twitter, it should work for us…right?

Update: Just an update, All the issues with Varnish on Solaris have been fixed with the 2.1.4 release. We have been using Varnish on our Solaris production servers since the release with great stability and performance. A big thanks to the Varnish devs and slink for the eventport fixes.

I have dumped varnish as our primary cache due to multiple failures of service. I have tried to make it work but varnish kept insisting on producing 503 XID backend failures on perfectly healthy backends. I have tried doing all types of crazy configuration hacks such as forcing varnish to retry backends via a round-robin director. It did not work out all too well since the round trip added latency when varnish had to re-fetch the document multiple times. The final straw that broke the camel’s back was when varnish configured for a 256mb malloc store grew to an astonishing size of 780mb+ RSS.

I have switched to squid-3 and so far it has been stable and fast. I will later post a matching squid configuration to the one below that does the same thing.

Squid-3 will require this patch for it to compile on Solaris.

Varnish on Solaris is a dud.

List of failures

1. Producing 503 responses for perfectly healthy backends. Backend never even gets contacted.
2. Growing to a crazy size when using the malloc implementation.
3. Segfaulting every hour on the hour with the newest trunk r4080+

Here is the configuration I have used. Feel free to use it if varnish works for you.

#
# This is a basic VCL configuration file for varnish.  See the vcl(7)
# man page for details on VCL syntax and semantics.
#
# $Id: default.vcl 1818 2007-08-09 12:12:12Z des $
#

# Default backend definition.  Set this to point to your content
# server.

 # my wonderful re-try hack, that kinda works.
 director www_dir round-robin {
     { .backend = { .connect_timeout = 2s; .host="127.0.0.1"; .port="8001"; }  }
     { .backend = { .connect_timeout = 2s; .host="127.0.0.1"; .port="8001"; }  }
     { .backend = { .connect_timeout = 2s; .host="127.0.0.1"; .port="8001"; }  }
     { .backend = { .connect_timeout = 2s; .host="127.0.0.1"; .port="8001"; }  }
 }

#backend default { .host = "127.0.0.1"; .port = "8089"; .connect_timeout = 2s; }

sub vcl_recv {
 remove req.http.X-Forwarded-For;
 set req.http.X-Forwarded-For = client.ip;
 set req.grace = 2m;

    if (req.http.Accept-Encoding) {
        if (req.url ~ "\.(jpeg|jpg|png|gif|gz|tgz|bz2|tbz|mp3|ogg|ico|swf|flv|dmg)") {
            # No point in compressing these
            remove req.http.Accept-Encoding;
        } elsif (req.http.Accept-Encoding ~ "gzip") {
            set req.http.Accept-Encoding = "gzip";
        } elsif (req.http.Accept-Encoding ~ "deflate") {
            set req.http.Accept-Encoding = "deflate";
        } else {
            # unkown algorithm
            remove req.http.Accept-Encoding;
        }
    }

# don't trust MSIE6
# if (req.http.user-agent ~ "MSIE [1-6]\.") {
#     remove req.http.Accept-Encoding;
# }

 if (req.http.host == "jira.fabulously40.com") {
   pipe;
 }

 if (req.request == "GET" || req.request == "HEAD") {
	if ( req.url ~ "\.(xml|gif|jpg|swf|css|png|jpeg|tiff|tif|svg|ico|pdf|ico|swf)") {
		remove req.http.cookie;
		lookup;
	}
	# avoid caching jsps
	if ( req.url ~ "\.js([^p]|$)" ) {
		remove req.http.cookie;
		lookup;
	}
 }

 # don't bother caching large files
 if(req.url ~ "\.(mp3|flv|mov|mp4|mpg|mpeg|avi|dmg)") {
     pipe;
 }

 if (req.request != "GET" && req.request != "HEAD") {
     pipe;
 }

 if (req.request == "POST") {
     pipe;
 }

 if (req.http.Expect || req.http.Authorization || req.http.Authenticate || req.http.WWW-Authenticate) {
    pipe;
 }

 # pipe pages with these cookies set
 if (req.http.cookie && req.http.cookie ~ "_.*_session=") {
     pipe;
 }
 if (req.http.cookie && req.http.cookie ~ "JSESSIONID=") {
     pipe;
 }
 if (req.http.cookie && req.http.cookie ~ "PHPSESSID=") {
     pipe;
 }
 if (req.http.cookie && req.http.cookie ~ "wordpress_logged_in") {
     pipe;
 }

 lookup;
}

sub vcl_error {
	# retry on errors
    if (obj.status == 503) {
        if ( req.restarts < 12 ) {
             restart;
         }
     }
}

sub vcl_fetch {

	# don't cache when these cookies are in place
	if(beresp.http.Location || beresp.http.WWW-Authenticate) {
	    pass;
	}
	if(beresp.http.cookie && beresp.http.cookie ~ "JSESSIONID=") {
	    pass;
	}
	if(beresp.http.Set-Cookie && beresp.http.Set-Cookie ~ "JSESSIONID=") {
	    pass;
	}
	if(beresp.http.cookie && beresp.http.cookie ~ "_.*_session=") {
	    pass;
	}
	if(beresp.http.Set-Cookie && beresp.http.Set-Cookie ~ "_.*_session=") {
	    pass;
	}
	if(beresp.http.cookie && beresp.http.cookie ~ "PHPSESSID=") {
	    pass;
	}
	if(beresp.http.Set-Cookie && beresp.http.Set-Cookie ~ "PHPSESSID=") {
	    pass;
	}
	if(beresp.http.cookie && beresp.http.cookie ~ "wordpress_logged_in") {
	    pass;
	}
	if(beresp.http.Set-Cookie && beresp.http.Set-Cookie ~ "wordpress_logged_in") {
	    pass;
	}
	if(beresp.http.Cache-Control && beresp.http.Cache-Control ~ "no-cache") {
	    pass;
	}
	if(beresp.http.Pragma && beresp.http.Pragma ~ "no-cache") {
	    pass;
	}

# avoid defaults since we *want* pages cached with cookies
#	if (!beresp.cacheable) {
#	    pass;
#	}
#	if (beresp.http.Set-Cookie) {
#		pass;
#	}


	#cache for 30 minutes..
	if((beresp.http.Cache-Control !~ "max-age" || beresp.http.Cache-Control !~ "s-maxage") && beresp.ttl < 1800s) {
		set beresp.ttl = 1800s;
	}
	set beresp.grace = 2m;

	# anonymous users get 10 min delay
	if(beresp.http.Content-Type && beresp.http.Content-Type ~ "html" && (beresp.http.Cache-Control !~ "max-age" ||  beresp.http.Cache-Control !~ "s-maxage")) {
	    set beresp.ttl = 600s;
	}

	# remove server affinity cookie from cached pages.
	if(beresp.http.Set-Cookie && beresp.http.Set-Cookie ~ "X-SERVERID=") {
	    remove beresp.http.Set-Cookie;
	}
	if(beresp.http.Set-Cookie && beresp.http.Set-Cookie ~ "SERVERID=") {
	    remove beresp.http.Set-Cookie;
	}
	if(beresp.http.X-Backend) {
	    remove beresp.http.X-Backend;
	}

	deliver;
}




Tagged with , , ,