Letsgetdugg

Random tech jargon


Since Varnish did not work out on Solaris yet again, I have decided to bite the bullet and write a header normalization patch for Squid 2.7. This patch should produce much better cache hit rates with Squid. Efficiency++

What the patch does:

1. Removes Cache-Control request headers, so clients cannot bypass the cache when it is primed.
2. Normalizes Accept-Encoding headers for a higher cache hit rate.
3. Clears Accept-Encoding headers for content that should not be compressed.
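The three steps above can be sketched outside of Squid. This is only a rough Python illustration of the idea, not the patch itself: the function name `normalize_request_headers` is made up, headers are modelled as a plain dict, and the content type is guessed from the URL path with the standard `mimetypes` module rather than Squid's own MIME table. It also lower-cases the header value before matching, which the C patch does not do.

```python
import mimetypes


def normalize_request_headers(url_path, headers):
    """Return a normalized copy of `headers` (hypothetical helper, not Squid code)."""
    # Header names are case-insensitive, so work on lower-cased keys.
    out = {name.lower(): value for name, value in headers.items()}

    # 1. Drop Cache-Control so a client cannot force a cache bypass.
    out.pop("cache-control", None)

    # 2. Collapse Accept-Encoding to one canonical token, preferring gzip.
    accept = out.pop("accept-encoding", None)
    if accept is not None:
        lowered = accept.lower()
        if "gzip" in lowered:
            out["accept-encoding"] = "gzip"
        elif "deflate" in lowered:
            out["accept-encoding"] = "deflate"
        # Any other value: the header simply stays removed.

    # 3. Images and video should not be compressed again, so clear the header.
    mime_type, _ = mimetypes.guess_type(url_path)
    if mime_type and mime_type.startswith(("image/", "video/")):
        out.pop("accept-encoding", None)

    return out


if __name__ == "__main__":
    print(normalize_request_headers(
        "/index.html",
        {"Cache-Control": "no-cache", "Accept-Encoding": "gzip, deflate"},
    ))
```

Like the patch, this is a plain substring match, so a value such as "gzip;q=0" would still be normalized to "gzip". Good enough for a cache key, not a full parser.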

If you have issues patching, here is the patched file. Just replace the default one with it.

squid-2.7/src/client_side.c

and the patch: squid-headers-normalization.patch

Update: Fixed a minor memory leak, all good now.

--- src/client_side.c.og 2010-01-20 12:00:56.000000000 -0800
+++ src/client_side.c 2010-01-19 20:35:31.000000000 -0800
@@ -3983,6 +3983,7 @@
         errorAppendEntry(http->entry, err);
         return -1;
     }
+
     /* compile headers */
     /* we should skip request line! */
     if ((http->http_ver.major >= 1) && !httpMsgParseRequestHeader(request, &msg)) {
@@ -3992,10 +3993,59 @@
         err->url = xstrdup(http->uri);
         http->al.http.code = err->http_status;
         http->log_type = LOG_TCP_DENIED;
+
         http->entry = clientCreateStoreEntry(http, method, null_request_flags);
         errorAppendEntry(http->entry, err);
         return -1;
     }
+
+    /*
+     * Normalize Request Cache-Control / If-Modified-Since Headers
+     * Don't let client by-pass the cache if there is cached content.
+     */
+    if(httpHeaderHas(&request->header,HDR_CACHE_CONTROL)) {
+        httpHeaderDelByName(&request->header,"cache-control");
+    }
+
+    /*
+     * Un-comment this if you want Squid to always respond with the request
+     * instead of returning back with a 304 if the cache has not changed.
+     */
+    /*
+    if(httpHeaderHas(&request->header,HDR_IF_MODIFIED_SINCE)) {
+        httpHeaderDelByName(&request->header,"if-modified-since");
+    }*/
+
+    /*
+     * Normalize Accept-Encoding Headers sent from client
+     */
+    if(httpHeaderHas(&request->header,HDR_ACCEPT_ENCODING)) {
+        String val = httpHeaderGetByName(&request->header,"accept-encoding");
+        if(val.buf) {
+            if(strstr(val.buf,"gzip") != NULL) {
+                httpHeaderDelByName(&request->header,"accept-encoding");
+                httpHeaderPutStr(&request->header,HDR_ACCEPT_ENCODING,"gzip");
+            } else if(strstr(val.buf,"deflate") != NULL) {
+                httpHeaderDelByName(&request->header,"accept-encoding");
+                httpHeaderPutStr(&request->header,HDR_ACCEPT_ENCODING,"deflate");
+            } else {
+                httpHeaderDelByName(&request->header,"accept-encoding");
+            }
+        }
+        stringClean(&val);
+    }
+
+    /*
+     * Normalize Accept-Encoding Headers for video/image content
+     */
+    char *mime_type = mimeGetContentType(http->uri);
+    if(mime_type) {
+        if(strstr(mime_type,"image") != NULL || strstr(mime_type,"video") != NULL) {
+            httpHeaderDelByName(&request->header,"accept-encoding");
+        }
+    }
+
+
     /*
      * If we read past the end of this request, move the remaining
      * data to the beginning

Squid is a fundamental part of our infrastructure at Fabulously40. It lowers our response times considerably. The problem with Squid is that it is quite “dense” when it comes to configuration: unless you're willing to do a bit of C hacking on it, it offers little flexibility. This can be overcome by using supporting software to help Squid out.

Note: Our configuration would be much simpler if we used Varnish, but it lacks some key features, which makes Squid the better candidate.

1. Varnish can’t stream cache misses; it can only buffer them. This adds latency to cache-miss requests.
2. Varnish cannot skip caching an object based on its Content-Length.
3. Varnish has an issue with connect_timeout and Solaris socket handling.

The picture below illustrates my point. Varnish's connection timeout code just does not work all that well.

[Image: varnish wtf]

Until Varnish can handle the three things listed, Squid remains the best choice at the cost of configuration complexity.

Optimize Cache Hits by Normalizing Headers

“Accept-Encoding: gzip” and “Accept-Encoding: gzip, deflate” will be cached separately unless you normalize client headers. Unlike Varnish, Squid has no configuration option to normalize headers. However, you can use Nginx to normalize headers before passing the request to Squid.

Here is the setup: Client -> Nginx -> Squid -> Backend Services

The NGINX Configuration

location / {
    proxy_set_header X-Real-IP $remote_addr;
    proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
    proxy_set_header Host $http_host;
    proxy_hide_header Pragma;

    # Remove client cache-control header
    # to avoid fetching from backend if page is in cache
    proxy_set_header Cache-Control "";

    # Normalize static assets for squid
    if ($request_uri ~* "\.(jpeg|jpg|png|gif|gz|tgz|bz2|tbz|mp3|ogg|ico|swf|mp4|flv|mov|dmg|mkv)") {
        set $normal_encoding "";
        proxy_pass http://squids;
        break;
    }

    # Normalize gzip encoding
    if ($http_accept_encoding ~* gzip) {
        set $normal_encoding "gzip";
        proxy_pass http://squids;
        break;
    }

    # Normalize deflate encoding
    if ($http_accept_encoding ~* deflate) {
        set $normal_encoding "deflate";
        proxy_pass http://squids;
        break;
    }

    # Define the normalize header
    proxy_set_header Accept-Encoding $normal_encoding;

    # default...
    proxy_pass http://squids;
    break;
}

So by the time Squid receives the request, the accept-encoding header is normalized, for efficient cache storage.
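To get a feel for why this matters, here is a small Python sketch that counts how many distinct Accept-Encoding variants a cache would end up storing for one URL, with and without normalization. The sample header values are made up for illustration, and both helper names are hypothetical. It assumes the backend responses vary on Accept-Encoding.

```python
def normalize_accept_encoding(value):
    """Collapse an Accept-Encoding value to "gzip", "deflate" or "" (hypothetical helper)."""
    lowered = (value or "").lower()
    if "gzip" in lowered:
        return "gzip"
    if "deflate" in lowered:
        return "deflate"
    return ""


def count_variants(accept_encoding_values, normalize=None):
    """Count the distinct header values a cache would see for a single URL."""
    seen = set()
    for value in accept_encoding_values:
        seen.add(normalize(value) if normalize else value)
    return len(seen)


# Illustrative values only; real browsers send many more spellings than this.
samples = ["gzip", "gzip, deflate", "gzip,deflate", "deflate, gzip", "deflate", "identity"]

raw_variants = count_variants(samples)
normalized_variants = count_variants(samples, normalize_accept_encoding)

if __name__ == "__main__":
    print("variants without normalization:", raw_variants)
    print("variants with normalization:", normalized_variants)
```

Fewer variants per URL means fewer trips to the backend to fill the cache, which is the whole point of the Nginx layer in front of Squid.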

Avoid Caching Error Pages with Squid 2.7

Squid 2.7 has better support for reverse caching and HTTP 1.1 than the Squid 3.x branch. However, it is missing one important ACL that Squid 3.x has: http_status. Squid 2.7 cannot be configured to avoid caching pages based on the status code of the origin response. I have written a Perlbal plugin, CacheKill, specifically to address this issue. Perlbal::Plugin::CacheKill sits between the backend services and Squid and rewrites Cache-Control headers based on the response code.

Here is the setup: Client -> Squid -> Perlbal -> Backend Services

If the backend service responds with one of the configured error codes, such as 501, 502 or 503, Perlbal adds a Cache-Control: no-cache header before handing the response back to Squid.

Here is the configuration file for Perlbal::Plugin::CacheKill to prevent Squid from caching error pages.

CREATE SERVICE fab40
  SET listen = 0.0.0.0:8003
  SET role = reverse_proxy
  SET pool = backends
  cache_kill codes = 404,500,501,503,502
  SET plugins = CacheKill
ENABLE fab40
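The rewrite rule itself is tiny. Here is a rough Python sketch of the idea, not the plugin's actual Perl code: the function name `kill_cache` is made up, the status set mirrors the codes line in the configuration above, and replacing any existing Cache-Control header outright is my assumption about how the rewrite behaves.

```python
# Status codes taken from the "cache_kill codes" line in the Perlbal config.
NO_CACHE_STATUSES = frozenset({404, 500, 501, 502, 503})


def kill_cache(status, headers, codes=NO_CACHE_STATUSES):
    """Return response headers, forcing no-cache for the listed status codes."""
    if status not in codes:
        # Healthy response: pass the headers through untouched.
        return dict(headers)

    # Drop any existing Cache-Control header, whatever its capitalization.
    out = {name: value for name, value in headers.items()
           if name.lower() != "cache-control"}
    out["Cache-Control"] = "no-cache"
    return out


if __name__ == "__main__":
    print(kill_cache(503, {"Content-Type": "text/html", "Cache-Control": "max-age=600"}))
    print(kill_cache(200, {"Content-Type": "text/html", "Cache-Control": "max-age=600"}))
```

Since Squid already honours Cache-Control from the origin, tagging error responses this way is enough to keep them out of the cache without touching Squid itself.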

Stitching software together can make Squid just as flexible as Varnish with its VCL configuration.


I have dumped Varnish as our primary cache due to multiple service failures. I tried to make it work, but Varnish kept producing 503 XID backend failures on perfectly healthy backends. I tried all kinds of crazy configuration hacks, such as forcing Varnish to retry backends via a round-robin director. That did not work out well, since the round trips added latency whenever Varnish had to re-fetch the document multiple times. The final straw was when Varnish, configured for a 256 MB malloc store, grew to an astonishing 780 MB+ RSS.

I have switched to squid-3 and so far it has been stable and fast. I will later post a matching squid configuration to the one below that does the same thing.

Squid-3 will require this patch for it to compile on Solaris.

Varnish on Solaris is a dud.

List of failures

1. Producing 503 responses for perfectly healthy backends. Backend never even gets contacted.
2. Growing to a crazy size when using the malloc implementation.
3. Segfaulting every hour on the hour with the newest trunk r4080+

Here is the configuration I have used. Feel free to use it if varnish works for you.

#
# This is a basic VCL configuration file for varnish.  See the vcl(7)
# man page for details on VCL syntax and semantics.
#
# $Id: default.vcl 1818 2007-08-09 12:12:12Z des $
#

# Default backend definition.  Set this to point to your content
# server.

 # my wonderful re-try hack, that kinda works.
 director www_dir round-robin {
     { .backend = { .connect_timeout = 2s; .host="127.0.0.1"; .port="8001"; }  }
     { .backend = { .connect_timeout = 2s; .host="127.0.0.1"; .port="8001"; }  }
     { .backend = { .connect_timeout = 2s; .host="127.0.0.1"; .port="8001"; }  }
     { .backend = { .connect_timeout = 2s; .host="127.0.0.1"; .port="8001"; }  }
 }

#backend default { .host = "127.0.0.1"; .port = "8089"; .connect_timeout = 2s; }

sub vcl_recv {
 remove req.http.X-Forwarded-For;
 set req.http.X-Forwarded-For = client.ip;
 set req.grace = 2m;

    if (req.http.Accept-Encoding) {
        if (req.url ~ "\.(jpeg|jpg|png|gif|gz|tgz|bz2|tbz|mp3|ogg|ico|swf|flv|dmg)") {
            # No point in compressing these
            remove req.http.Accept-Encoding;
        } elsif (req.http.Accept-Encoding ~ "gzip") {
            set req.http.Accept-Encoding = "gzip";
        } elsif (req.http.Accept-Encoding ~ "deflate") {
            set req.http.Accept-Encoding = "deflate";
        } else {
            # unknown algorithm
            remove req.http.Accept-Encoding;
        }
    }

# don't trust MSIE6
# if (req.http.user-agent ~ "MSIE [1-6]\.") {
#     remove req.http.Accept-Encoding;
# }

 if (req.http.host == "jira.fabulously40.com") {
   pipe;
 }

 if (req.request == "GET" || req.request == "HEAD") {
	if ( req.url ~ "\.(xml|gif|jpg|swf|css|png|jpeg|tiff|tif|svg|ico|pdf|ico|swf)") {
		remove req.http.cookie;
		lookup;
	}
	# avoid caching jsps
	if ( req.url ~ "\.js([^p]|$)" ) {
		remove req.http.cookie;
		lookup;
	}
 }

 # don't bother caching large files
 if(req.url ~ "\.(mp3|flv|mov|mp4|mpg|mpeg|avi|dmg)") {
     pipe;
 }

 if (req.request != "GET" && req.request != "HEAD") {
     pipe;
 }

 if (req.request == "POST") {
     pipe;
 }

 if (req.http.Expect || req.http.Authorization || req.http.Authenticate || req.http.WWW-Authenticate) {
    pipe;
 }

 # pipe pages with these cookies set
 if (req.http.cookie && req.http.cookie ~ "_.*_session=") {
     pipe;
 }
 if (req.http.cookie && req.http.cookie ~ "JSESSIONID=") {
     pipe;
 }
 if (req.http.cookie && req.http.cookie ~ "PHPSESSID=") {
     pipe;
 }
 if (req.http.cookie && req.http.cookie ~ "wordpress_logged_in") {
     pipe;
 }

 lookup;
}

sub vcl_error {
	# retry on errors
    if (obj.status == 503) {
        if ( req.restarts < 12 ) {
             restart;
         }
     }
}

sub vcl_fetch {

	# don't cache when these cookies are in place
	if(beresp.http.Location || beresp.http.WWW-Authenticate) {
	    pass;
	}
	if(beresp.http.cookie && beresp.http.cookie ~ "JSESSIONID=") {
	    pass;
	}
	if(beresp.http.Set-Cookie && beresp.http.Set-Cookie ~ "JSESSIONID=") {
	    pass;
	}
	if(beresp.http.cookie && beresp.http.cookie ~ "_.*_session=") {
	    pass;
	}
	if(beresp.http.Set-Cookie && beresp.http.Set-Cookie ~ "_.*_session=") {
	    pass;
	}
	if(beresp.http.cookie && beresp.http.cookie ~ "PHPSESSID=") {
	    pass;
	}
	if(beresp.http.Set-Cookie && beresp.http.Set-Cookie ~ "PHPSESSID=") {
	    pass;
	}
	if(beresp.http.cookie && beresp.http.cookie ~ "wordpress_logged_in") {
	    pass;
	}
	if(beresp.http.Set-Cookie && beresp.http.Set-Cookie ~ "wordpress_logged_in") {
	    pass;
	}
	if(beresp.http.Cache-Control && beresp.http.Cache-Control ~ "no-cache") {
	    pass;
	}
	if(beresp.http.Pragma && beresp.http.Pragma ~ "no-cache") {
	    pass;
	}

# avoid defaults since we *want* pages cached with cookies
#	if (!beresp.cacheable) {
#	    pass;
#	}
#	if (beresp.http.Set-Cookie) {
#		pass;
#	}

	#cache for 30 minutes..
	if((beresp.http.Cache-Control !~ "max-age" || beresp.http.Cache-Control !~ "s-maxage") && beresp.ttl < 1800s) {
		set beresp.ttl = 1800s;
	}
	set beresp.grace = 2m;

	# anonymous users get 10 min delay
	if(beresp.http.Content-Type && beresp.http.Content-Type ~ "html" && (beresp.http.Cache-Control !~ "max-age" ||  beresp.http.Cache-Control !~ "s-maxage")) {
	    set beresp.ttl = 600s;
	}

	# remove server affinity cookie from cached pages.
	if(beresp.http.Set-Cookie && beresp.http.Set-Cookie ~ "X-SERVERID=") {
	    remove beresp.http.Set-Cookie;
	}
	if(beresp.http.Set-Cookie && beresp.http.Set-Cookie ~ "SERVERID=") {
	    remove beresp.http.Set-Cookie;
	}
	if(beresp.http.X-Backend) {
	    remove beresp.http.X-Backend;
	}

	deliver;
}
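The TTL rules in vcl_fetch are easy to misread, so here is a small Python sketch of what the two ttl conditions appear to do as written. The function name `fetch_ttl` is made up, and it assumes a missing Cache-Control header behaves like an empty string, so that both "does not match" tests come out true.

```python
def fetch_ttl(cache_control, content_type, backend_ttl):
    """Mimic the two ttl rules in vcl_fetch (hypothetical helper; TTLs in seconds)."""
    cc = cache_control or ""

    # Same shape as the VCL test: true unless BOTH directives are present.
    lacks_directive = ("max-age" not in cc) or ("s-maxage" not in cc)

    ttl = backend_ttl

    # "cache for 30 minutes": raise short TTLs to 1800 seconds.
    if lacks_directive and ttl < 1800:
        ttl = 1800

    # "anonymous users get 10 min delay": HTML is pinned to 600 seconds.
    if content_type and "html" in content_type and lacks_directive:
        ttl = 600

    return ttl


if __name__ == "__main__":
    print(fetch_ttl(None, "image/png", 120))
    print(fetch_ttl(None, "text/html; charset=utf-8", 120))
    print(fetch_ttl("max-age=60", "text/css", 60))
    print(fetch_ttl("max-age=60, s-maxage=60", "text/html", 60))
```

Note that because the two checks are joined with "or", a response carrying only max-age still gets its TTL overridden. The backend's own TTL is only left alone when both max-age and s-maxage are present.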