Since Varnish did not work out on Solaris yet again, I have decided to bite the bullet and write a header-normalization patch for Squid 2.7. This patch should produce much better cache hit rates with Squid. Efficiency++

What the patch does:

1. Removes Cache-Control request headers, so clients can't bypass the cache once it is primed.
2. Normalizes Accept-Encoding headers for a higher cache hit rate.
3. Clears Accept-Encoding headers for content that should not be compressed, such as images, video, and audio.

and the patch: squid-headers-normalization.patch

Update: Fixed a minor memory leak, all good now.
Update 2: Added an audio exception to the Accept-Encoding stripping.

--- src/client_side.c.og	2010-01-20 12:00:56.000000000 -0800
+++ src/client_side.c	2010-01-19 20:35:31.000000000 -0800
@@ -3983,6 +3983,7 @@
     errorAppendEntry(http->entry, err);
     return -1;
     }
+    /* compile headers */
     /* we should skip request line! */
     if ((http->http_ver.major >= 1) && !httpMsgParseRequestHeader(request, &msg)) {
@@ -3992,10 +3993,59 @@
     err->url = xstrdup(http->uri);
     http->al.http.code = err->http_status;
     http->log_type = LOG_TCP_DENIED;
+    http->entry = clientCreateStoreEntry(http, method, null_request_flags);
     errorAppendEntry(http->entry, err);
     return -1;
     }
+
+    /*
+     * Normalize Request Cache-Control / If-Modified-Since Headers
+     * Don't let client by-pass the cache if there is cached content.
+     */
+    if(httpHeaderHas(&request->header,HDR_CACHE_CONTROL)) {
+        httpHeaderDelByName(&request->header,"cache-control");
+    }
+
+    /*
+     * Un-comment this if you want Squid to always respond with the request
+     * instead of returning back with a 304 if the cache has not changed.
+     */
+    /*
+    if(httpHeaderHas(&request->header,HDR_IF_MODIFIED_SINCE)) {
+        httpHeaderDelByName(&request->header,"if-modified-since");
+    }*/
+
+    /*
+     * Normalize Accept-Encoding Headers sent from client
+     */
+    if(httpHeaderHas(&request->header,HDR_ACCEPT_ENCODING)) {
+        String val = httpHeaderGetByName(&request->header,"accept-encoding");
+        if(val.buf) {
+            if(strstr(val.buf,"gzip") != NULL) {
+                httpHeaderDelByName(&request->header,"accept-encoding");
+                httpHeaderPutStr(&request->header,HDR_ACCEPT_ENCODING,"gzip");
+            } else if(strstr(val.buf,"deflate") != NULL) {
+                httpHeaderDelByName(&request->header,"accept-encoding");
+                httpHeaderPutStr(&request->header,HDR_ACCEPT_ENCODING,"deflate");
+            } else {
+                httpHeaderDelByName(&request->header,"accept-encoding");
+            }
+        }
+        stringClean(&val);
+    }
+
+    /*
+     * Normalize Accept-Encoding Headers for video/image content
+     */
+    char *mime_type = mimeGetContentType(http->uri);
+    if(mime_type) {
+        if(strstr(mime_type,"image") != NULL || strstr(mime_type,"video") != NULL || strstr(mime_type,"audio") != NULL) {
+            httpHeaderDelByName(&request->header,"accept-encoding");
+        }
+    }
+
     /*
      * If we read past the end of this request, move the remaining
      * data to the beginning
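After applying the patch, a quick way to spot-check the Accept-Encoding normalization is to request the same cacheable URL with different client headers and compare the cache result. This is just a sketch: the URL is a placeholder, and it assumes the object varies on Accept-Encoding and that your squid.conf isn't stripping the X-Cache response header.

# Both requests should land on the same cached "gzip" variant, since any
# Accept-Encoding value containing gzip gets rewritten to plain "gzip".
curl -s -D - -o /dev/null -H 'Accept-Encoding: gzip' http://www.example.com/page | grep -i x-cache
curl -s -D - -o /dev/null -H 'Accept-Encoding: gzip,deflate' http://www.example.com/page | grep -i x-cache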

Once again I have been blindsided by yet another conservative out-of-the-box setting. IPFilter is tuned way too conservatively with its state table size.

Here is how you can tell if you're hitting any issues: run ipfstat and check for lost packets.

victori@opensolaris:~# ipfstat | grep lost
fragment state(in):   kept 0    lost 0    not fragmented 0
fragment state(out):  kept 0    lost 0    not fragmented 0
packet state(in):     kept 798  lost 100
packet state(out):    kept 612  lost 234

Notice that the in and out lost state lines have a non-zero value. This means IPFilter has been dropping client connections, bummer.

The default settings are quite conservative.

victori@opensolaris:~# ipf -T list | grep fr_state
fr_statemax min 0x1 max 0x7fffffff current 4096
fr_statesize min 0x1 max 0x7fffffff current 5002

You need to shut down IPFilter and apply larger table size limits.

victori@opensolaris:~# svcadm disable ipfilter
victori@opensolaris:~# /usr/sbin/ipf -T fr_statemax=18963,fr_statesize=27091

Let's confirm that it works.

victori@opensolaris:~# ipf -T list | grep fr_state
fr_statemax min 0x1 max 0x7fffffff current 18963
fr_statesize min 0x1 max 0x7fffffff current 27091

Awesome. Now all we need to do is enable IPFilter, and there will be no more lost packets.

victori@opensolaris:~# svcadm enable ipfilter

To make this persistent across reboots, edit ipf.conf:

victori@opensolaris:~# vi /usr/kernel/drv/ipf.conf
name="ipf" parent="pseudo" instance=0 fr_statemax=18963 fr_statesize=27091;

Then reload the ipf driver configuration:

victori@opensolaris:~# devfsadm -i ipf

This can be applied to any OS that uses IPFilter.

One of our articles on fabulously40 went viral on tagged.com, one of the largest social networks in the Alexa top 100. The viral aspect was quite apparent when the bandwidth skyrocketed to 30 Mb/sec of sustained traffic. We were pushing over 60 GB of image data per day! We broke our bandwidth quota in just 3 days. Granted, this was toward the end of the billing cycle for that month.

Facing overage charges for bandwidth, I decided to re-encode the images with ImageMagick. I converted the images to a lower-quality compression setting for a file-size savings of 71%. Swapping in the newer, smaller images dropped the bandwidth by 50%. Awesome.
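For reference, the re-encode can be done with a one-liner along these lines; the quality value and path are illustrative, not the exact settings used:

# Recompress every JPEG in the image directory in place at a lower quality,
# stripping EXIF/color-profile data as well (mogrify overwrites the originals).
mogrify -strip -quality 75 /path/to/images/*.jpg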

Can you spot where I swapped out the images?


I have finally nailed down all our issues surrounding Varnish on Solaris, thanks to the help of sky from #varnish. Apparently Varnish uses a wrapper around connect() to drop stale connections and avoid thread pileups if the back-end ever dies. Setting connect_timeout to 0 forces Varnish to use connect() directly. This should eliminate all the 503 back-end errors under Solaris that I mentioned in an earlier blog post.

Here is our startup script for Varnish that works for our needs. Varnish is a 64-bit binary, hence the "-m64" in the cc_command passed.

#!/bin/sh

rm /sessions/varnish_cache.bin

newtask -p highfile /opt/extra/sbin/varnishd -f /opt/extra/etc/varnish/default.vcl -a 72.11.142.91:80 -p listen_depth=8192 -p thread_pool_max=2000 -p thread_pool_min=12 -p thread_pools=4 -p cc_command='cc -Kpic -G -m64 -o %o %s' -s file,/sessions/varnish_cache.bin,4G -p sess_timeout=10s -p max_restarts=12 -p session_linger=50s -p connect_timeout=0s -p obj_workspace=16384 -p sess_workspace=32768 -T 0.0.0.0:8086 -u webservd -F

I noticed Varnish had a particular problem of keeping connections around in the CLOSE_WAIT state for a long time, enough to cause issues. I did some tuning on Solaris's TCP stack so it is more aggressive about closing sockets after the work has been done.
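A quick way to gauge how bad the pileup is (just a generic netstat one-liner, nothing Varnish-specific):

# Count sockets currently stuck in CLOSE_WAIT
netstat -an | grep CLOSE_WAIT | wc -l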

Here are my aggressive TCP settings to force Solaris to close off connections quickly and avoid file descriptor leaks. You can merge the following TCP tweaks with the settings I posted earlier to handle more clients.

# 67.5 seconds, default 675 seconds
/usr/sbin/ndd -set /dev/tcp tcp_fin_wait_2_flush_interval 67500

# 30 seconds, aggressively close connections; default 4 minutes on Solaris < 8
/usr/sbin/ndd -set /dev/tcp tcp_time_wait_interval 30000

# 1 minute, poll for dead connections; default 2 hours
/usr/sbin/ndd -set /dev/tcp tcp_keepalive_interval 60000

Last but not least, I have finally swapped out ActiveMQ for the FUSE message broker, an "enterprise" ActiveMQ distribution. Hopefully it won't crash once a week like ActiveMQ does for us. The FUSE message broker is based on the ActiveMQ 5.3 sources, which fix various memory leaks found in ActiveMQ 5.2, the current stable release as of this writing.

If the FUSE message broker does not work out, I might have to give Kestrel a try. Hey, if it worked for Twitter, it should work for us…right?