Seeya Varnish, nice knowin’ you
Update: All the issues with Varnish on Solaris have been fixed in the 2.1.4 release. We have been running Varnish on our Solaris production servers since that release with great stability and performance. A big thanks to the Varnish devs and slink for the eventport fixes.
I have dumped Varnish as our primary cache due to multiple service failures. I tried to make it work, but Varnish kept producing 503 backend-failure XIDs for perfectly healthy backends. I tried all kinds of crazy configuration hacks, such as forcing Varnish to retry backends via a round-robin director. That did not work out too well either, since the round trips added latency whenever Varnish had to re-fetch a document multiple times. The straw that broke the camel’s back was when Varnish, configured with a 256 MB malloc store, grew to an astonishing 780 MB+ RSS.
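For context, that 256 MB cap is set on the varnishd command line. A minimal sketch of that kind of invocation follows; the listen address, admin port, and VCL path are illustrative, only the malloc size is the real setting from this post:

varnishd -a :80 -T 127.0.0.1:6082 \
    -f /etc/varnish/default.vcl \
    -s malloc,256m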
I have switched to Squid-3, and so far it has been stable and fast. I will later post a Squid configuration matching the one below that does the same thing.
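In the meantime, here is a minimal Squid-3 reverse-proxy sketch of the general shape, not the actual configuration I will post; the hostname and ports are placeholders:

# listen on 80 and hand requests to the origin server on 8001
http_port 80 accel defaultsite=www.example.com
cache_peer 127.0.0.1 parent 8001 0 no-query originserver name=origin
acl our_sites dstdomain www.example.com
http_access allow our_sites
cache_peer_access origin allow our_sites
cache_peer_access origin deny all
# keep hot objects in memory
cache_mem 256 MB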
Squid-3 requires this patch to compile on Solaris.
Varnish on Solaris is a dud.
List of failures
1. Producing 503 responses for perfectly healthy backends; the backend never even gets contacted.
2. Growing to a crazy size when using the malloc implementation.
3. Segfaulting every hour on the hour with the newest trunk r4080+
Here is the configuration I used. Feel free to use it if Varnish works for you.
#
# This is a basic VCL configuration file for varnish. See the vcl(7)
# man page for details on VCL syntax and semantics.
#
# $Id: default.vcl 1818 2007-08-09 12:12:12Z des $
#
# Default backend definition. Set this to point to your content
# server.
#

# my wonderful re-try hack, that kinda works.
director www_dir round-robin {
  { .backend = { .connect_timeout = 2s; .host = "127.0.0.1"; .port = "8001"; } }
  { .backend = { .connect_timeout = 2s; .host = "127.0.0.1"; .port = "8001"; } }
  { .backend = { .connect_timeout = 2s; .host = "127.0.0.1"; .port = "8001"; } }
  { .backend = { .connect_timeout = 2s; .host = "127.0.0.1"; .port = "8001"; } }
}

#backend default { .host = "127.0.0.1"; .port = "8089"; .connect_timeout = 2s; }

sub vcl_recv {
  remove req.http.X-Forwarded-For;
  set req.http.X-Forwarded-For = client.ip;
  set req.grace = 2m;

  if (req.http.Accept-Encoding) {
    if (req.url ~ "\.(jpeg|jpg|png|gif|gz|tgz|bz2|tbz|mp3|ogg|ico|swf|flv|dmg)") {
      # No point in compressing these
      remove req.http.Accept-Encoding;
    } elsif (req.http.Accept-Encoding ~ "gzip") {
      set req.http.Accept-Encoding = "gzip";
    } elsif (req.http.Accept-Encoding ~ "deflate") {
      set req.http.Accept-Encoding = "deflate";
    } else {
      # unknown algorithm
      remove req.http.Accept-Encoding;
    }
  }

  # don't trust MSIE6
  #if (req.http.user-agent ~ "MSIE [1-6]\.") {
  #  remove req.http.Accept-Encoding;
  #}

  if (req.http.host == "jira.fabulously40.com") {
    pipe;
  }

  if (req.request == "GET" || req.request == "HEAD") {
    if (req.url ~ "\.(xml|gif|jpg|swf|css|png|jpeg|tiff|tif|svg|ico|pdf)") {
      remove req.http.cookie;
      lookup;
    }
    # avoid caching jsps
    if (req.url ~ "\.js([^p]|$)") {
      remove req.http.cookie;
      lookup;
    }
  }

  # don't bother caching large files
  if (req.url ~ "\.(mp3|flv|mov|mp4|mpg|mpeg|avi|dmg)") {
    pipe;
  }

  if (req.request != "GET" && req.request != "HEAD") {
    pipe;
  }
  if (req.request == "POST") {
    pipe;
  }
  if (req.http.Expect || req.http.Authorization ||
      req.http.Authenticate || req.http.WWW-Authenticate) {
    pipe;
  }

  # pipe pages with these cookies set
  if (req.http.cookie && req.http.cookie ~ "_.*_session=") {
    pipe;
  }
  if (req.http.cookie && req.http.cookie ~ "JSESSIONID=") {
    pipe;
  }
  if (req.http.cookie && req.http.cookie ~ "PHPSESSID=") {
    pipe;
  }
  if (req.http.cookie && req.http.cookie ~ "wordpress_logged_in") {
    pipe;
  }

  lookup;
}

sub vcl_error {
  # retry on errors
  if (obj.status == 503) {
    if (req.restarts < 12) {
      restart;
    }
  }
}

sub vcl_fetch {
  # don't cache when these cookies are in place
  if (beresp.http.Location || beresp.http.WWW-Authenticate) {
    pass;
  }
  if (beresp.http.cookie && beresp.http.cookie ~ "JSESSIONID=") {
    pass;
  }
  if (beresp.http.Set-Cookie && beresp.http.Set-Cookie ~ "JSESSIONID=") {
    pass;
  }
  if (beresp.http.cookie && beresp.http.cookie ~ "_.*_session=") {
    pass;
  }
  if (beresp.http.Set-Cookie && beresp.http.Set-Cookie ~ "_.*_session=") {
    pass;
  }
  if (beresp.http.cookie && beresp.http.cookie ~ "PHPSESSID=") {
    pass;
  }
  if (beresp.http.Set-Cookie && beresp.http.Set-Cookie ~ "PHPSESSID=") {
    pass;
  }
  if (beresp.http.cookie && beresp.http.cookie ~ "wordpress_logged_in") {
    pass;
  }
  if (beresp.http.Set-Cookie && beresp.http.Set-Cookie ~ "wordpress_logged_in") {
    pass;
  }
  if (beresp.http.Cache-Control && beresp.http.Cache-Control ~ "no-cache") {
    pass;
  }
  if (beresp.http.Pragma && beresp.http.Pragma ~ "no-cache") {
    pass;
  }

  # avoid defaults since we *want* pages cached with cookies
  #if (!beresp.cacheable) {
  #  pass;
  #}
  #if (beresp.http.Set-Cookie) {
  #  pass;
  #}

  # cache for 30 minutes..
  if ((beresp.http.Cache-Control !~ "max-age" || beresp.http.Cache-Control !~ "s-maxage")
      && beresp.ttl < 1800s) {
    set beresp.ttl = 1800s;
  }
  set beresp.grace = 2m;

  # anonymous users get 10 min delay
  if (beresp.http.Content-Type && beresp.http.Content-Type ~ "html"
      && (beresp.http.Cache-Control !~ "max-age" || beresp.http.Cache-Control !~ "s-maxage")) {
    set beresp.ttl = 600s;
  }

  # remove server affinity cookie from cached pages.
  if (beresp.http.Set-Cookie && beresp.http.Set-Cookie ~ "X-SERVERID=") {
    remove beresp.http.Set-Cookie;
  }
  if (beresp.http.Set-Cookie && beresp.http.Set-Cookie ~ "SERVERID=") {
    remove beresp.http.Set-Cookie;
  }
  if (beresp.http.X-Backend) {
    remove beresp.http.X-Backend;
  }

  deliver;
}
I’m sorry that Varnish didn’t work out for you. We (Redpill Linpro) really do believe Varnish is a great cache, but acknowledge that there are still issues to be worked out on Solaris.
The truth is that Varnish is developed mainly for GNU/Linux and FreeBSD. We receive patches for Solaris and Mac OS X, but our quality control is focused around what our customers use, and that’s GNU/Linux and FreeBSD at the moment.
The issues you describe are ones we are partly aware of on Solaris, and we hope to get the time needed to improve Varnish's stability there.
I hope your Squid rig works out for you, and that you'll give Varnish a try at a later date when we've had a chance to solve these issues. (Or you could always sponsor the development, and it'll happen a lot faster.)
We had stability issues with 2.0.4, including 503 issues not unlike the ones you mention. However, we’ve had spectacular uptime and stability since upgrading to 2.1.3 — we’re serving about 200 requests per second.
By the way, I think you have a bug in your VCL (I just discovered and fixed the same one in my own).
beresp.http.cookie should never exist: HTTP responses only carry Set-Cookie headers, never Cookie headers. So that should be:
beresp.http.set-cookie
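Applied to the first of those checks in vcl_fetch above, the fix looks like this:

# this branch never fires: responses carry no "cookie" header
#if (beresp.http.cookie && beresp.http.cookie ~ "JSESSIONID=") {
#  pass;
#}
# the Set-Cookie test is the one that actually does the work
if (beresp.http.Set-Cookie && beresp.http.Set-Cookie ~ "JSESSIONID=") {
  pass;
}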
Just an update: all the issues with Varnish on Solaris have been fixed in the 2.1.4 release. We have been running Varnish on our Solaris production servers since that release with great stability and performance.
Are you using malloc or file storage? We use 2.1.5 on Solaris 10 and see behaviour that perfectly fits the
“2. Growing to a crazy size when using the malloc implementation.”
description.
We run varnishd with -s malloc,1G, and after a while the process grew to 3.6 GB RSS.
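For comparison, a file-backed store would be configured along these lines (the VCL path, file path, and size here are just placeholders, not our actual setup):

# file-backed storage instead of malloc
varnishd -f /etc/varnish/default.vcl -s file,/var/varnish/storage.bin,1G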
Thanks,
Dommas