
Clearing stale cache by domain

You can clear a site’s cache by domain, which is really nifty if you have Varnish in front of multiple sites. Log into Varnish’s administration console via telnet and execute the following purge command to wipe out the undesired cache.

purge req.http.host ~ letsgetdugg.com
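
For example, assuming the admin console is bound to port 8086 with the -T flag, as in the startup commands later in this post, a telnet session looks roughly like this:

telnet localhost 8086
purge req.http.host ~ letsgetdugg.com
quit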

Monitor response codes

Worried that some of your clients might be receiving 503 Varnish response pages? Find out with varnishtop.

varnishtop -i TxStatus

Here is what the output looks like.

list length 7                              web

 4018.65 TxStatus 200
  132.35 TxStatus 304
   44.17 TxStatus 404
   34.63 TxStatus 302
   30.87 TxStatus 301
    9.36 TxStatus 403
    1.39 TxStatus 503
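
If 503s do show up, you can pull the full transactions that produced them with varnishlog. This is a sketch based on the Varnish 2.x varnishlog options (-c for client-side records, -o to group by request and filter on a tag/regex pair):

varnishlog -c -o TxStatus 503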

Update 2010-02-19: It seems other people are also affected by the Varnish LINGER crash on OpenSolaris. The fix below does not address the core problem, but it removes the “fail fast” behavior with no negative side effects.

r4576 has been running reliably with the fix below.

In bin/varnishd/cache_acceptor.c:

if (need_linger)
                setsockopt(sp->fd, SOL_SOCKET, SO_LINGER,
                    &linger, sizeof linger);

Remove the TCP_Assert() wrapper around the setsockopt() call, leaving the bare setsockopt() shown above.

Update 2010-02-17: This might be a random fluke, but Varnish has connection issues when compiled under SunCC; stick to GCC. I have compiled Varnish with GCC 4.3.2 and the build seems to work well. Give r4572 a try; phk committed some Solaris-aware errno code.

Update 2010-02-16: r4567 seems stable. Unlike on other platforms, errno isn’t thread-safe by default on Solaris; you need to pass -pthreads for GCC and -mt for SunCC in both the compile and link flags.

GCC example:

VCC_CC="cc -Kpic -G -m64 -o %o %s" CC=/opt/sfw/bin/gcc CFLAGS="-O3 -L/opt/extra/lib/amd64 -pthreads -m64 -fomit-frame-pointer" LDFLAGS="-lumem -pthreads" ./configure --prefix=/opt/extra

SunCC example:

VCC_CC="cc -Kpic -G -m64 -o %o %s" CC=/opt/SSX0903/bin/cc CFLAGS="-xO3 -fast -xipo -L/opt/extra/lib/amd64 -mt -m64" LDFLAGS="-lumem -mt" ./configure --prefix=/opt/extra

Here are the sources I used to piece it all together: the Sun docs and a Stack Overflow answer.

See what -pthreads defines on GCC:

# gcc -x c /dev/null -E -dM -pthreads | grep REENT
#define _REENTRANT 1

A snippet from Solaris’s /usr/include/errno.h confirms that errno isn’t thread-safe by default:

#if defined(_REENTRANT) || defined(_TS_ERRNO) || _POSIX_C_SOURCE - 0 >= 199506L
extern int *___errno();
#define errno (*(___errno()))
#else
extern int errno;
/* ANSI C++ requires that errno be a macro */
#if __cplusplus >= 199711L
#define errno errno
#endif
#endif /* defined(_REENTRANT) || defined(_TS_ERRNO) */

Update 2010-01-28: r4508 seems stable. No patches are needed aside from removing the AZ() assert in cache_acceptor.c on line 163.

Update 2010-01-21: If you’re using Varnish from trunk past r4445, apply this session cache_waiter_poll patch to avoid stalled connections.

Update 2009-12-21: Still using Varnish in production; the site is working beautifully with the settings below.

Update (new): I think I figured out the last remaining piece of the puzzle. Switching Varnish’s default listener to poll fixed the long connection-accept wait times.

Update: The monitoring charts looked good, but persistent connections kept flaking under production traffic. I was forced to revert to Squid 2.7. *Sigh* I think Squid might be the only viable option on Solaris when it comes to reverse proxy caching. The information below is still useful if you want to try out Varnish on Solaris.

I have finally wrangled Varnish into working reliably on Solaris without any apparent issues. The recent commit to trunk by phk (Varnish’s creator) fixed the last remaining Solaris issue that I am aware of.

There are four requirements to get this working reliably on Solaris.

1. Run from trunk; r4508 is a known stable revision that works well. Remove the AZ() assert in cache_acceptor.c on line 163.

2. Set connect_timeout to 0; this works around a Varnish/Solaris TCP incompatibility in the lib/libvarnish/tcp.c#TCP_connect timeout code (see the admin console example after this list).

3. Switch the default waiter to poll; EventPorts seems buggy on OpenSolaris builds.

4. If you have issues starting Varnish, start it in the foreground via the -F argument.
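
On a running Varnish, the timeout workaround can also be applied on the fly from the admin console. This is a sketch using the standard param.show/param.set CLI commands, assuming the -T admin port of 8086 used in the configuration below; changing the waiter only takes effect once the child process is restarted:

telnet localhost 8086
param.show connect_timeout
param.set connect_timeout 0
param.set waiter poll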

Here is a Pingdom graph of our monitored service. Can you tell when Varnish was swapped in for Squid? Varnish does a better job of keeping content cached due to header normalization and larger cache size.

[Pingdom graph: Varnish latency improvement]

There are a few “gotchas” to look out for to get it all running reliably. Here is the configuration that I used in production. I have annotated each setting with a brief description.

newtask -p highfile /opt/extra/sbin/varnishd
  -f /opt/extra/etc/varnish/default.vcl
  -a 0.0.0.0:82                              # IP/Port to listen on
  -p listen_depth=8192                       # Connections kernel buffers before rejecting
  -p waiter=poll                             # Listener implementation to use
  -p thread_pool_max=2000                    # Max threads per pool
  -p thread_pool_min=50                      # Min threads per pool, crank this high
  -p thread_pools=4                          # Thread pool per CPU
  -p thread_pool_add_delay=2ms               # Thread init delay, not to bomb OS
  -p cc_command='cc -Kpic -G -m64 -o %o %s'  # 64-bit if needed
  -s file,/sessions/varnish_cache.bin,512M   # Define cache size
  -p sess_timeout=10s                        # Keep-alive timeout
  -p max_restarts=12                         # Amount of restart attempts
  -p session_linger=120ms                    # Milliseconds to keep thread around
  -p connect_timeout=0s                      # Important bug workaround for Solaris
  -p lru_interval=20s                        # LRU interval checks
  -p sess_workspace=65536                    # Space for headers
  -T 0.0.0.0:8086                            # Admin console
  -u webservd                                # User to run varnish as

System Configuration Optimizations

Solaris lacks the SO_{SND|RCV}TIMEO BSD socket options, which are used to define TCP timeout values per socket. Every other OS has them (Mac OS X, Linux, FreeBSD, AIX), but not Solaris. This means Varnish cannot use custom per-socket timeout values on Solaris. The next best thing on Solaris is to optimize the TCP timeouts globally.

# Turn off Nagle. Nagle adds latency.
/usr/sbin/ndd -set /dev/tcp tcp_naglim_def 1

# 30 second TIME_WAIT timeout (4 minutes default).
/usr/sbin/ndd -set /dev/tcp tcp_time_wait_interval 30000

# 15 minute keep-alive (2 hour default).
/usr/sbin/ndd -set /dev/tcp tcp_keepalive_interval 900000

# 120 second connect timeout (3 minute default).
ndd -set /dev/tcp tcp_ip_abort_cinterval 120000

# Send ACKs right away - less latency on bursty connections.
ndd -set /dev/tcp tcp_deferred_acks_max 0

# RFC says 1 segment, BSD/Win stacks require 2 segments.
/usr/sbin/ndd -set /dev/tcp tcp_slow_start_initial 2
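
Before changing any of these, you can read the current value back with ndd -get, which makes it easy to record the defaults for your particular Solaris release:

/usr/sbin/ndd -get /dev/tcp tcp_time_wait_interval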

Varnish Settings Dissected

Here are the most important settings to look out for when deploying Varnish in production.

File Descriptors

Run Varnish under a Solaris project that gives the proxy enough file descriptors to handle the concurrency. If Varnish cannot allocate enough file descriptors, it can’t serve requests.

# Paste into /etc/project
highfile:101::*:*:process.max-file-descriptor=(basic,32192,deny)

# Run the application under the project
newtask -p highfile /opt/extra/sbin/varnishd ...
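
To confirm the limit actually took effect, you can ask Solaris what the running process was given. A quick check using the standard prctl and pgrep utilities (nothing Varnish-specific):

prctl -n process.max-file-descriptor -i process `pgrep varnishd`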

Threads

Give Varnish enough idle threads so it does not stall on requests. Thread creation is slow and expensive; idle threads are not. Don’t go cheap on threads: allocate a minimum of 200. Modern browsers open up to 8 concurrent connections by default, meaning Varnish will need 8 threads to handle a single page view.

thread_pool_max=2000   # 2000 max threads per pool
thread_pool_min=50     # 50 min threads per pool
                       # 50 threads x 4 pools = 200 threads
thread_pools=4         # 4 pools, one pool per CPU core
session_linger=120ms   # How long to keep a thread around
                       # to handle further requests
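
You can watch how the worker pools behave under load with varnishstat; the n_wrk counters show threads running, created, queued and dropped. A rough check using the one-shot output filtered with grep:

varnishstat -1 | grep n_wrk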

I have finally nailed out all our issues surrounding Varnish on Solaris, thanks to the help of sky from #varnish. Apparently Varnish uses a wrapper around connect() to drop stale connections to avoid thread pileups if the back-end ever dies. Setting connect_timeout to 0 will force Varnish to use connect() directly. This should eliminate all 503 back-end issues under Solaris that I have mentioned in an earlier blog post.

Here is our startup script for Varnish that works for our needs. Varnish is built as a 64-bit binary, hence the “-m64” in the cc_command passed.

#!/bin/sh

rm /sessions/varnish_cache.bin

newtask -p highfile /opt/extra/sbin/varnishd \
    -f /opt/extra/etc/varnish/default.vcl \
    -a 72.11.142.91:80 \
    -p listen_depth=8192 \
    -p thread_pool_max=2000 \
    -p thread_pool_min=12 \
    -p thread_pools=4 \
    -p cc_command='cc -Kpic -G -m64 -o %o %s' \
    -s file,/sessions/varnish_cache.bin,4G \
    -p sess_timeout=10s \
    -p max_restarts=12 \
    -p session_linger=50s \
    -p connect_timeout=0s \
    -p obj_workspace=16384 \
    -p sess_workspace=32768 \
    -T 0.0.0.0:8086 \
    -u webservd \
    -F
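
If you want to double-check that the build really produced a 64-bit binary, the standard file utility will tell you (the exact output wording varies by Solaris release):

file /opt/extra/sbin/varnishd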

I noticed Varnish had a particular problem of keeping connections around in the CLOSE_WAIT state for a long time, long enough to cause issues. I did some tuning of Solaris’s TCP stack so it is more aggressive about closing sockets once the work has been done.

Here are my aggressive TCP settings to force Solaris to close off connections quickly and avoid file descriptor leaks. You can merge the following TCP tweaks with the settings I posted earlier to handle more clients.

# 67.5 seconds (default is 675 seconds)
/usr/sbin/ndd -set /dev/tcp tcp_fin_wait_2_flush_interval 67500

# 30 seconds, aggressively close connections (default is 4 minutes on Solaris < 8)
/usr/sbin/ndd -set /dev/tcp tcp_time_wait_interval 30000

# 1 minute, poll for dead connections (default is 2 hours)
/usr/sbin/ndd -set /dev/tcp tcp_keepalive_interval 60000
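
To see whether the tuning is actually helping, count the sockets stuck in CLOSE_WAIT before and after applying it (a simple check with netstat, not part of the original tuning set):

netstat -an | grep CLOSE_WAIT | wc -l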

Last but not least, I have finally swapped out ActiveMQ for the FUSE message broker, an “enterprise” ActiveMQ distribution. Hopefully it won’t crash once a week like ActiveMQ does for us. The FUSE message broker is based on the ActiveMQ 5.3 sources, which fix various memory leaks found in ActiveMQ 5.2, the current stable release as of this writing.

If the FUSE message broker does not work out, I might have to give Kestrel a try. Hey, if it worked for Twitter, it should work for us…right?

Update: All the issues with Varnish on Solaris have been fixed with the 2.1.4 release. We have been using Varnish on our Solaris production servers since that release with great stability and performance. A big thanks to the Varnish devs and slink for the eventport fixes.

I have dumped Varnish as our primary cache due to multiple service failures. I tried to make it work, but Varnish kept insisting on producing 503 XID backend failures against perfectly healthy backends. I tried all sorts of crazy configuration hacks, such as forcing Varnish to retry backends via a round-robin director. That did not work out too well, since the extra round trips added latency whenever Varnish had to re-fetch a document multiple times. The final straw was when Varnish, configured with a 256 MB malloc store, grew to an astonishing 780 MB+ of RSS.

I have switched to Squid 3 and so far it has been stable and fast. I will later post a Squid configuration matching the one below that does the same thing.

Squid 3 will require this patch to compile on Solaris.

Varnish on Solaris is a dud.

List of failures

1. Producing 503 responses for perfectly healthy backends. The backend never even gets contacted.
2. Growing to a crazy size when using the malloc implementation.
3. Segfaulting every hour on the hour with the newest trunk r4080+

Here is the configuration I used. Feel free to use it if Varnish works for you.

#
# This is a basic VCL configuration file for varnish.  See the vcl(7)
# man page for details on VCL syntax and semantics.
#
# $Id: default.vcl 1818 2007-08-09 12:12:12Z des $
#

# Default backend definition.  Set this to point to your content
# server.

 # my wonderful retry hack that kinda works.
 director www_dir round-robin {
     { .backend = { .connect_timeout = 2s; .host="127.0.0.1"; .port="8001"; }  }
     { .backend = { .connect_timeout = 2s; .host="127.0.0.1"; .port="8001"; }  }
     { .backend = { .connect_timeout = 2s; .host="127.0.0.1"; .port="8001"; }  }
     { .backend = { .connect_timeout = 2s; .host="127.0.0.1"; .port="8001"; }  }
 }

#backend default { .host = "127.0.0.1"; .port = "8089"; .connect_timeout = 2s; }

sub vcl_recv {
 remove req.http.X-Forwarded-For;
 set req.http.X-Forwarded-For = client.ip;
 set req.grace = 2m;

    if (req.http.Accept-Encoding) {
        if (req.url ~ "\.(jpeg|jpg|png|gif|gz|tgz|bz2|tbz|mp3|ogg|ico|swf|flv|dmg)") {
            # No point in compressing these
            remove req.http.Accept-Encoding;
        } elsif (req.http.Accept-Encoding ~ "gzip") {
            set req.http.Accept-Encoding = "gzip";
        } elsif (req.http.Accept-Encoding ~ "deflate") {
            set req.http.Accept-Encoding = "deflate";
        } else {
            # unknown algorithm
            remove req.http.Accept-Encoding;
        }
    }

# don't trust MSIE6
# if (req.http.user-agent ~ "MSIE [1-6]\.") {
#     remove req.http.Accept-Encoding;
# }

 if (req.http.host == "jira.fabulously40.com") {
   pipe;
 }

 if (req.request == "GET" || req.request == "HEAD") {
	if ( req.url ~ "\.(xml|gif|jpg|swf|css|png|jpeg|tiff|tif|svg|ico|pdf)") {
		remove req.http.cookie;
		lookup;
	}
	# avoid caching jsps
	if ( req.url ~ "\.js([^p]|$)" ) {
		remove req.http.cookie;
		lookup;
	}
 }

 # don't bother caching large files
 if(req.url ~ "\.(mp3|flv|mov|mp4|mpg|mpeg|avi|dmg)") {
     pipe;
 }

 if (req.request != "GET" && req.request != "HEAD") {
     pipe;
 }

 if (req.request == "POST") {
     pipe;
 }

 if (req.http.Expect || req.http.Authorization || req.http.Authenticate || req.http.WWW-Authenticate) {
    pipe;
 }

 # pipe pages with these cookies set
 if (req.http.cookie && req.http.cookie ~ "_.*_session=") {
     pipe;
 }
 if (req.http.cookie && req.http.cookie ~ "JSESSIONID=") {
     pipe;
 }
 if (req.http.cookie && req.http.cookie ~ "PHPSESSID=") {
     pipe;
 }
 if (req.http.cookie && req.http.cookie ~ "wordpress_logged_in") {
     pipe;
 }

 lookup;
}

sub vcl_error {
	# retry on errors
    if (obj.status == 503) {
        if ( req.restarts < 12 ) {
             restart;
         }
     }
}

sub vcl_fetch {

	# don't cache redirects or authenticated responses
	if(beresp.http.Location || beresp.http.WWW-Authenticate) {
	    pass;
	}
	# don't cache when these cookies are in place
	if(beresp.http.cookie && beresp.http.cookie ~ "JSESSIONID=") {
	    pass;
	}
	if(beresp.http.Set-Cookie && beresp.http.Set-Cookie ~ "JSESSIONID=") {
	    pass;
	}
	if(beresp.http.cookie && beresp.http.cookie ~ "_.*_session=") {
	    pass;
	}
	if(beresp.http.Set-Cookie && beresp.http.Set-Cookie ~ "_.*_session=") {
	    pass;
	}
	if(beresp.http.cookie && beresp.http.cookie ~ "PHPSESSID=") {
	    pass;
	}
	if(beresp.http.Set-Cookie && beresp.http.Set-Cookie ~ "PHPSESSID=") {
	    pass;
	}
	if(beresp.http.cookie && beresp.http.cookie ~ "wordpress_logged_in") {
	    pass;
	}
	if(beresp.http.Set-Cookie && beresp.http.Set-Cookie ~ "wordpress_logged_in") {
	    pass;
	}
	if(beresp.http.Cache-Control && beresp.http.Cache-Control ~ "no-cache") {
	    pass;
	}
	if(beresp.http.Pragma && beresp.http.Pragma ~ "no-cache") {
	    pass;
	}

# avoid defaults since we *want* pages cached with cookies
#	if (!beresp.cacheable) {
#	    pass;
#	}
#	if (beresp.http.Set-Cookie) {
#		pass;
#	}


	# cache for 30 minutes
	if((beresp.http.Cache-Control !~ "max-age" || beresp.http.Cache-Control !~ "s-maxage") && beresp.ttl < 1800s) {
		set beresp.ttl = 1800s;
	}
	set beresp.grace = 2m;

	# anonymous users get 10 min delay
	if(beresp.http.Content-Type && beresp.http.Content-Type ~ "html" && (beresp.http.Cache-Control !~ "max-age" ||  beresp.http.Cache-Control !~ "s-maxage")) {
	    set beresp.ttl = 600s;
	}

	# remove server affinity cookie from cached pages.
	if(beresp.http.Set-Cookie && beresp.http.Set-Cookie ~ "X-SERVERID=") {
	    remove beresp.http.Set-Cookie;
	}
	if(beresp.http.Set-Cookie && beresp.http.Set-Cookie ~ "SERVERID=") {
	    remove beresp.http.Set-Cookie;
	}
	if(beresp.http.X-Backend) {
	    remove beresp.http.X-Backend;
	}

	deliver;
}



