I needed something like zfs-auto-snapshot, written by Tim Foster, but portable so that it works on all systems that support ZFS. I reviewed a few scripts on GitHub and was unhappy with what was out there, so I decided to write my own.
With zbackup.rb you define what to snapshot and how many days of rotation you want to keep.
So say you want a month of snapshots: pass 30 as the rotation argument (./zbackup.rb iraidz/zWork 30).
Simple, no?
# Create snapshots for a 7 day rotation.
# ./zbackup.rb iraidz/zWork 7
#
# Add to crontab
# crontab -e
# 0 2 * * * /usr/bin/zbackup.rb iraidz/zWork 7
pool = ARGV[0]
days_back = ARGV[1].to_i # nil.to_i is 0, so a missing argument fails the check below

if pool.nil? or pool.empty?
  puts "\nDefine the pool you want to snapshot:"
  puts "\tex: zbackup.rb iraidz/zWork 7\n\n"
  exit 1
end

if days_back < 1
  puts "\nDefine how many days for your rotation:"
  puts "\tex: zbackup.rb iraidz/zWork 7\n\n"
  exit 1
end

# response from zfs list (-H drops the header line)
curr_snaps = `zfs list -H -t snapshot -o name`

# days back limit variable
date_back = Time.now - (86400 * days_back)

# escape the pool name so characters such as "." are matched literally
pool_re = Regexp.escape(pool)

curr_snaps.split(/\n/).each do |pline|
  if m = pline.match(/^#{pool_re}@([0-9]+)-([0-9]+)-([0-9]+)$/)
    if date_back >= Time.local(m[1], m[2], m[3])
      # -r also removes the same-named snapshots that "zfs snapshot -r" created on child datasets
      `zfs destroy -r #{pline}`
    end
  end
end

# take snapshot for this run if needed.
now = Time.now
today = "#{now.year}-#{now.month}-#{now.day}"
if curr_snaps !~ /^#{pool_re}@#{today}$/
  `zfs snapshot -r #{pool}@#{today}`
end
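The rotation rule is easier to reason about when it is pulled out of the shell calls. Here is a rough sketch of just the "which snapshots are too old" decision, assuming the same unpadded year-month-day naming the script uses. The name expired_snapshots and its arguments are made up for illustration and are not part of zbackup.rb.

```ruby
# Sketch only: expired_snapshots is a hypothetical helper, not part of zbackup.rb.
# It returns the snapshot names for one pool that are older than the rotation window.
def expired_snapshots(names, pool, days_back, now = Time.now)
  cutoff = now - (86_400 * days_back)
  pattern = /\A#{Regexp.escape(pool)}@(\d+)-(\d+)-(\d+)\z/
  names.select do |name|
    m = pattern.match(name)
    # Snapshots of other datasets do not match and are left alone.
    m && Time.local(m[1].to_i, m[2].to_i, m[3].to_i) <= cutoff
  end
end

names = [
  "iraidz/zWork@2010-1-1",
  "iraidz/zWork@2010-1-9",
  "iraidz/zWork@2010-1-10",
  "iraidz/other@2010-1-1"
]
puts expired_snapshots(names, "iraidz/zWork", 7, Time.local(2010, 1, 10, 2, 0, 0))
```

Passing the current time in as an argument makes it possible to try the rule against a fixed date before letting cron destroy anything.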
Clustering Wicket for fun and profit!
I hate expired sessions, death to all expired sessions. Traditionally, a Java servlet container has a fixed session timeout, and a flood of traffic can cause JVM OOM errors if that timeout is set too high. I wanted a smart session container that holds onto sessions for as long as possible and expires them only when absolutely necessary; a Memcached store would be perfect for this.
Therefore, I recently open sourced jetty-session-store to solve this problem. With jetty-session-store you can save your session state to Ehcache, Memcached or a database. State should not be bound to a single JVM. Viva Shared Session Stores!
So now that jetty-session-store is out in the wild you can technically cluster Wicket using just the HttpSessionStore. However, it isn’t very efficient with the way Memcached allocates data in fixed sized cache buckets.
1. Wicket sessions under the HttpSessionStore can get quite large, well over 1MB in size. A Wicket session stores not only the session state but also the serialized pages the user has previously visited.
2. Serializing and de-serializing a large data structure can get expensive. The HttpSessionStore retains an AccessStackPageMap, which is a list data structure consisting of multiple page map revisions.
So instead of saving one large AccessStackPageMap, I wrote a page store for SecondLevelCacheSessionStore that saves one page map revision per cache entry. This leads to much better cache utilization and a whole lot less serialization on the wire. Not to mention this avoids the whole 1MB Memcached size limit.
Before you go willy nilly with clustering, read the Wicket render strategies page. Wicket requires session affinity for buffered responses with the default rendering strategy.
Clustering Wicket has never been easier.
Here is an example of how to offload page maps to a hybrid EhCache/Memcached cache: Memcached for long-term shared storage, EhCache for short-lived fast cache lookups.
@Override
protected ISessionStore newSessionStore() {
    // localhost:11211 -- memcached server
    // "fabpagestore" -- unique appender to avoid key clashes.
    // 300 -- 5 minute TTL for local ehcache.
    return new SecondLevelCacheSessionStore(this,
        new CachePageStore(Arrays.asList("localhost:11211"), "fabpagestore", 300));
}
Here is an example of how to offload page maps to the database.
@Override
protected ISessionStore newSessionStore() {
    // "fabpagestore" -- unique appender to avoid key clashes.
    return new SecondLevelCacheSessionStore(this, new CachePageStore(
        new DBCache("jdbc:mysql://foo/mydb", "myname", "mypass", "com.driver.Name", "fabpagestore")));
}
Here is my CachePageStore:
import com.base.cache.AsyncMemcache;
import com.base.cache.ICache;
import org.apache.wicket.Page;
import org.apache.wicket.protocol.http.SecondLevelCacheSessionStore.IClusteredPageStore;
import org.apache.wicket.protocol.http.pagestore.AbstractPageStore;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import java.util.List;
public class CachePageStore extends AbstractPageStore implements IClusteredPageStore {
private ICache cache;
private Logger logger = LoggerFactory.getLogger(CachePageStore.class);
public CachePageStore(final List<String> servers, final String poolName, final int ttl) {
this(servers, poolName, true, ttl);
}
public CachePageStore(final List<String> servers, final String poolName, boolean async, final int ttl) {
this(new AsyncMemcache(servers, poolName, async, ttl));
}
public CachePageStore(final ICache cache) {
this.cache = cache;
}
// If pageVersion -1 must return highest page version.
protected String getKey(final String sessId, final String pageMapName, final int pageId, final int pageVersion) {
int pageVer = (pageVersion == -1) ? 0 : pageVersion;
if(pageVersion == -1) {
String[] meta = getMeta(sessId, pageMapName, pageId);
pageVer = Integer.valueOf(meta[0]);
}
return sessId + ":" + pageMapName + ":" + pageId + ":" + pageVer;
}
// If pageVersion -1 must return highest page version.
// If ajaxVersion -1 must return highest version.
public String getKey(final String sessId, final String pageMapName, final int pageId, final int pageVersion, final int ajaxVersion) {
// Default it to 0 initially
int ajaxVer = (ajaxVersion == -1) ? 0 : ajaxVersion;
int pageVer = (pageVersion == -1) ? 0 : pageVersion;
if(pageVersion == -1 || ajaxVersion == -1) {
String[] meta = getMeta(sessId, pageMapName, pageId);
if(pageVersion == -1) {
pageVer = Integer.valueOf(meta[0]);
}
if(ajaxVersion == -1) {
ajaxVer = Integer.valueOf(meta[1]);
}
}
return sessId + ":" + pageMapName + ":" + pageId + ":" + pageVer + ":" + ajaxVer;
}
protected String storeKey(final String sessionId, final Page page) {
return sessionId + ":" + page.getPageMapName() + ":" + page.getId() + ":" + page.getCurrentVersionNumber() + ":" + page.getAjaxVersionNumber();
}
protected String getBaseKey(String sessionId, Page page) {
return sessionId + ":" + page.getPageMapName() + ":" + page.getId();
}
protected String getMetaKey(String sessionId, String pageMap, int id) {
return getBaseKey(sessionId,pageMap,id)+"_meta";
}
protected String getMetaKey(String sessionId, Page page) {
return getBaseKey(sessionId,page)+"_meta";
}
protected String getBaseKey(String sessionId, String pageMap, int id) {
if(id == -1) {
return sessionId + ":" + pageMap;
} else {
return sessionId + ":" + pageMap + ":" + id;
}
}
public boolean containsPage(final String sessionId, final String pageMapName, final int pageId, final int pageVersion) {
String key = getKey(sessionId, pageMapName, pageId, pageVersion, -1);
if (logger.isDebugEnabled()) {
logger.debug("CheckExists: " + key);
}
return cache.keyExists(key);
}
public void destroy() {
}
public <T> Page getPage(final String sessionId, final String pagemap, final int id, final int versionNumber, final int ajaxVersionNumber) {
String key = getKey(sessionId, pagemap, id, versionNumber, ajaxVersionNumber);
if (logger.isDebugEnabled()) {
logger.debug("GetPage: " + key);
}
return (Page) cache.get(key);
}
public void pageAccessed(final String sessionId, final Page page) {
}
// If ID == -1 remove the entire pagemap; getBaseKey() takes care of this.
public void removePage(final String sessionId, final String pagemap, final int id) {
String key = getBaseKey(sessionId, pagemap, id);
if (logger.isDebugEnabled()) {
logger.debug("RemovePage: " + key);
}
cache.remove(getMetaKey(sessionId, pagemap, id));
for (String k : cache.getKeys()) {
if (k.startsWith(key)) {
cache.remove(k);
}
}
}
protected String[] getMeta(final String sessionId, String pageMap, int pageId) {
String metaKey = getMetaKey(sessionId,pageMap,pageId);
Object ret = cache.get(metaKey);
if (logger.isDebugEnabled()) {
logger.debug("GetMeta: " + metaKey);
}
if(ret == null) {
return new String[] {"0","0"};
} else {
return String.valueOf(ret).split(":");
}
}
protected void storeMeta(final String sessionId, final Page page) {
String metaKey = getMetaKey(sessionId, page);
Object ret = cache.get(metaKey);
if (logger.isDebugEnabled()) {
logger.debug("StoreMeta: " + metaKey);
}
if(ret == null) {
cache.put(metaKey,page.getCurrentVersionNumber()+":"+page.getAjaxVersionNumber());
} else {
String[] vals = String.valueOf(ret).split(":");
int currPage = Integer.valueOf(vals[0]);
int currAjax = Integer.valueOf(vals[1]);
if(page.getCurrentVersionNumber() > currPage) {
currPage = page.getCurrentVersionNumber();
}
if(page.getAjaxVersionNumber() > currAjax) {
currAjax = page.getAjaxVersionNumber();
}
cache.put(metaKey,currPage+":"+currAjax);
}
}
public void storePage(final String sessionId, final Page page) {
String sKey = storeKey(sessionId, page);
if (logger.isDebugEnabled()) {
logger.debug("StorePage: " + sKey);
}
cache.put(sKey, page);
storeMeta(sessionId,page);
}
public void unbind(final String sessionId) {
if (logger.isDebugEnabled()) {
logger.debug("Unbind: " + sessionId);
}
for (String key : cache.getKeys()) {
if (key.startsWith(sessionId)) {
cache.remove(key);
}
}
}
}
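To make the key layout easier to follow, here is a rough sketch of the same idea in Ruby: one cache entry per page revision, plus a "_meta" entry that remembers the highest page and ajax versions so that a version of -1 can be resolved to "latest". PageStoreSketch and its method names are hypothetical, and a plain Hash stands in for the ICache backend; this is an illustration of the scheme, not a port of the class above.

```ruby
# Sketch only: PageStoreSketch is a hypothetical stand-in for CachePageStore,
# and a plain Hash stands in for the ICache backend.
class PageStoreSketch
  def initialize(cache = {})
    @cache = cache
  end

  # One cache entry per page revision, plus a "_meta" entry that tracks
  # the highest page version and highest ajax version seen so far.
  def store_page(session_id, page_map, page_id, page_ver, ajax_ver, page)
    @cache["#{session_id}:#{page_map}:#{page_id}:#{page_ver}:#{ajax_ver}"] = page
    meta_key = "#{session_id}:#{page_map}:#{page_id}_meta"
    cur_page, cur_ajax = (@cache[meta_key] || "0:0").split(":").map(&:to_i)
    @cache[meta_key] = "#{[cur_page, page_ver].max}:#{[cur_ajax, ajax_ver].max}"
  end

  # A version of -1 means "latest" and is resolved through the meta entry.
  def get_page(session_id, page_map, page_id, page_ver = -1, ajax_ver = -1)
    meta_key = "#{session_id}:#{page_map}:#{page_id}_meta"
    meta = (@cache[meta_key] || "0:0").split(":").map(&:to_i)
    page_ver = meta[0] if page_ver == -1
    ajax_ver = meta[1] if ajax_ver == -1
    @cache["#{session_id}:#{page_map}:#{page_id}:#{page_ver}:#{ajax_ver}"]
  end
end

store = PageStoreSketch.new
store.store_page("sess1", "default", 3, 0, 0, "page v0")
store.store_page("sess1", "default", 3, 1, 0, "page v1")
puts store.get_page("sess1", "default", 3)       # latest revision
puts store.get_page("sess1", "default", 3, 0, 0) # explicit older revision
```

Because every revision lives under its own small key, a request only deserializes the one revision it needs instead of the whole page map.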
I’ll start this post off with a quote from IRC:
ivaynberg: you cant build good looking sites with wicket
victori: lies
ivaynberg: or public-facing sites
I have to admit that Wicket appeals more to the “backend” programmer than to the front-end design conscious developer. For every good-looking Wicket site out there, there are ten abysmal looking Wicket sites. Just look at the Wicket Wiki, it is littered with some dreadfully designed sites (Sorry Guys, this isn’t personal). You can tell right off the bat that the developers behind the sites care more about OO and clean code rather than clean design. Well to be frank, I don’t even know if the code behind the listed sites is even elegant. However, the fact that the sites are written on Wicket, tells me that the developers care about things such as separation of concerns and object oriented programming.
So, to combat the whole mentality that Wicket can’t scale and that any site done in Wicket must look atrocious, I have decided to compile a list of some awesomely kick ass public-facing / good-looking Wicket sites.
If you don’t see your site and you feel that it should have made the list, feel free to leave a comment with your site’s URL.
High Traffic Wicket Sites
adscale.de
This site has an Alexa traffic rank of 1,700 and runs on a single Tomcat servlet container. No proxy caches, no fancy clustering, just Tomcat.
vegas.com
Next time someone states that no public-facing sites are ever written in Wicket, point them to vegas.com.
Clean Wicket Sites
kontain.com
The design behind this site is quite good and sets the design bar in my book.
meetmoi.com
Ah, I remember when the developer behind meetmoi dropped by #wicket and stated that he is officially working on it full time with a million dollars in venture capital seed money.
songtexte.com
Don’t know much about this site, aside from the fact that it looks clean and the author did the original b-side Wicket site that got replaced with WordPress.
memolio.com
fabulously40.com
Disclaimer: this is the site I developed and I think it looks good
winerevolution.com
islamicdesignhouse.com


Update: I feel like a jackass now. I thought I was running this against the stable haproxy build, but in reality this was against haproxy-1.4dev6. DOH! Well, on the bright side, I am helping the author fix a potentially critical bug. Here is the truss and tcp dump if anyone cares.
Well, yet another Solaris-specific bug/issue to report: HAProxy resets long-running connections, which means users on slow connections are affected. I have sent tcpdumps and logs to the author of HAProxy; hopefully this bug/issue will be resolved. I am writing this as a precautionary warning to other Solaris admins out there.
Here is the way to trigger this, so you can see if your service is affected.
Result:
--2010-01-20 11:19:29--  http://somesite.com/onebigfile.txt
Resolving somesite.com (somesite.com)... 72.11.142.91
Connecting to somesite.com (somesite.com)|72.11.142.91|:84... connected.
HTTP request sent, awaiting response... 200 OK
Length: 3806025 (3.6M)
Saving to: “onebigfile.txt”
7% [====> ] 269,008 20.1K/s in 13s
2010-01-20 11:19:42 (20.1 KB/s) - Read error at byte 269008/3806025 (Connection reset by peer). Retrying.
--2010-01-20 11:19:43--  (try: 2) http://somesite.com/onebigfile.txt
Connecting to somesite.com (somesite.com)|72.11.142.91|:84... connected.
HTTP request sent, awaiting response... 200 OK
Length: 3806025 (3.6M)
Saving to: “onebigfile.txt”
4% [==> ] 186,016 20.0K/s eta
/Raging, why are there so many Solaris TCP issues? First Varnish? now HAProxy? ARGHHHHH!@#!@
Speedy PostgreSQL Parallel Compression Dumps
I used to back up our database using the following statement:
Once our dataset grew into the gigabytes, database dumps took a very long time. Today I stumbled upon yet another awesome blog post by Ted Dziuba mentioning two useful parallel compression utilities. So why not try parallel compression with PostgreSQL dumps?
pbzip2 – Parallel BZIP2: Parallel implementation of BZIP2. BZIP2 is well known for being balls slow, so speed it up using multiple CPUs.
pigz – Parallel GZIP: Parallel implementation of GZIP written by Mark Adler.
Time to try this out with our PostgreSQL dump, here are the result times.
• This was done on a quad core xeon 2.66ghz machine.
real 2m7.332s
user 1m16.414s
sys 0m8.233s
# time pg_dump -U secret -h fab2 somedb | pbzip2 -c > somedb.bz2
real 4m14.253s
user 10m35.879s
sys 0m10.904s
The original database was 1.6 GB. The compressed files came out to:
147M somedb.bz2
194M somedb.gz
And just to make this post complete, here is how to pipe the SQL dump back into PostgreSQL:
# createdb somedb
# gzip -d -c somedb.gz | psql somedb
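If you script these dumps, the two pipelines differ only in the compressor and the file extension. Below is a small sketch that builds the command strings, assuming pigz and pbzip2 are on the PATH and accept the same -c and -d flags as their serial counterparts. The helpers dump_command and restore_command are made-up names, and connection options such as -U and -h are left out for brevity.

```ruby
require "shellwords"

# Sketch only: dump_command and restore_command are hypothetical helpers.
# Maps each parallel compressor to the file extension it produces.
COMPRESSORS = { "pigz" => "gz", "pbzip2" => "bz2" }.freeze

# Builds "pg_dump db | compressor -c > db.ext".
def dump_command(db, compressor)
  file = "#{db}.#{COMPRESSORS.fetch(compressor)}"
  "pg_dump #{db.shellescape} | #{compressor} -c > #{file.shellescape}"
end

# Builds "compressor -d -c db.ext | psql db".
def restore_command(db, compressor)
  file = "#{db}.#{COMPRESSORS.fetch(compressor)}"
  "#{compressor} -d -c #{file.shellescape} | psql #{db.shellescape}"
end

puts dump_command("somedb", "pigz")
puts restore_command("somedb", "pbzip2")
```

The strings can be handed to system() from a cron-driven script; an unknown compressor name raises a KeyError instead of producing a broken pipeline.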
I just pushed up a new version of Satan to GitHub. For the uninformed, Satan is my process reaper for runaway Unix processes. Satan was designed to work with Solaris’ SMF self-healing properties. Basically, Satan kills while SMF revives. The new version contains HTTP health checks, so Satan can now kill processes that do not respond with an HTTP 200 response code.
The motivation behind HTTP health checks: once a month or so at Fabulously40 our ActiveMQ would break down while still accepting connections, and the only way to figure out if it was zombified was to check the HTTP administrator interface. If the ActiveMQ instance had actually keeled over, the administrator interface would come back with an HTTP 500 response code. Hence the birth of HTTP health checks.
Here is our Satan configuration file that we use at Fabulously40.
The “args” property might be a bit confusing: it is a snippet of text that Satan looks for in the arguments passed to your application to identify the running process. For example, if you start your ActiveMQ instance with the arguments “java -jar activemq.jar -Dactivemq=8161 -XXXXX”, placing “8161” in the args property gives Satan a good unique identifier to pick up on.
Satan.watch do |s|
  s.name = "jvm instances"  # name of job
  s.user = "webservd"  # under what user
  s.group = "webservd"  # under what group
  s.deamon = "java"  # deamon binary name to grep for
  s.args = nil  # globally look for specific arguments, optional
  s.debug = true  # if to write out debug information
  s.safe_mode = false  # If in safe mode, satan will not kill ;-(
  s.interval = 10.seconds  # interval to run at to collect statistics
  s.sleep_after_kill = 1.minute  # sleep after killing, satan is tired!
  s.contact = "victori@fabulously40.com"  # admin contact, optional if you want email alerts

  s.kill_if do |process|
    process.condition(:cpu) do |cpu|  # on cpu condition
      cpu.name = "50% CPU limit"  # name for job
      cpu.args = "jetty"  # make sure this is a jetty process, optional
      cpu.above = 48.percent  # if above certain percentage
      cpu.times = 5  # how many times we can hit this condition before killing
    end

    process.condition(:memory) do |memory|  # on memory condition
      memory.name = "850MB limit"  # name for job
      memory.args = "jetty"  # make sure this is a jetty process, optional
      memory.above = 850.megabytes  # limit for memory use
      memory.times = 5  # how many times we can hit this condition before killing
    end

    # ActiveMQ tends to die on us under heavy load so we need the power of satan!
    process.condition(:http) do |http|  # on http condition
      http.name = "HTTP ActiveMQ Check"  # name for job
      http.args = "8161"  # look for specific app arguments
                          # to associate app to URI
      http.uri = "http://localhost:8161/admin/queues.jsp"  # the URI
      http.times = 5  # how many times before kill
    end
  end
end
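For a feel of what an HTTP condition with a "times" threshold boils down to, here is a minimal sketch. It is not Satan's actual implementation: HttpCheck is a hypothetical class, and the choice to reset the counter after a healthy 200 response (so only consecutive failures count) is my assumption. The status fetcher is injectable so the counting logic can be tried without a live server.

```ruby
require "net/http"
require "uri"

# Sketch only: HttpCheck is a hypothetical class, not Satan's implementation.
class HttpCheck
  def initialize(uri, times, fetch_status = nil)
    @uri = URI(uri)
    @times = times
    @failures = 0
    # Default fetcher does a real GET; tests can pass a lambda instead.
    @fetch_status = fetch_status || lambda { |u| Net::HTTP.get_response(u).code.to_i }
  end

  # Returns true once the check has failed @times times in a row.
  def kill?
    status = begin
      @fetch_status.call(@uri)
    rescue StandardError
      nil # connection refused, timeouts and the like count as failures
    end
    # Assumption: a healthy response resets the failure counter.
    @failures = (status == 200) ? 0 : @failures + 1
    @failures >= @times
  end
end

# Stubbed responses: one healthy reply followed by three HTTP 500s.
responses = [200, 500, 500, 500]
check = HttpCheck.new("http://localhost:8161/admin/queues.jsp", 3, lambda { |_uri| responses.shift })
4.times { puts check.kill? }
```

Requiring several failed checks before killing keeps a single slow response from taking down an otherwise healthy process.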
Ted Dziuba beautifully articulated why deadlines go to crap and seemingly straightforward tasks go out the window. You, sir, have done a public service for us all. Thank you.
What I hate is fording endless rivers of horseshit that are in the way of seemingly simple tasks. And I hate it even more when I have to explain to a non-programmer what I am doing, "building LXML against a different version of libiconv because I think it might be the source of a crash". "But all I asked you to do was parse some documents." Good times.
I needed a thread-safe JSMin library for compressing JavaScript on the fly on UploadBooth, so I took an existing Ruby implementation and made it thread safe. I don’t think there was a license defined when I got it, so I am re-releasing it as-is.
require 'monitor' # provides MonitorMixin

class JSMin
EOF = -1
include MonitorMixin
# jsmin — Copy the input to the output, deleting the characters which are
# insignificant to JavaScript. Comments will be removed. Tabs will be
# replaced with spaces. Carriage returns will be replaced with linefeeds.
# Most spaces and linefeeds will be removed.
# thread safe
def minimize(jstext)
synchronize do
@theA = ""
@theB = ""
@current = 0
@output = ""
@text = jstext
@theA = "\n"
action(3)
while (@theA != JSMin::EOF)
case @theA
when " "
if (isAlphanum(@theB))
action(1)
else
action(2)
end
when "\n"
case (@theB)
when "{","[","(","+","-"
action(1)
when " "
action(3)
else
if (isAlphanum(@theB))
action(1)
else
action(2)
end
end
else
case (@theB)
when " "
if (isAlphanum(@theA))
action(1)
else
action(3)
end
when "\n"
case (@theA)
when "}","]",")","+","-","\"","\\", "'", '"'
action(1)
else
if (isAlphanum(@theA))
action(1)
else
action(3)
end
end
else
action(1)
end
end
end
@output
end
end
private
# isAlphanum — return true if the character is a letter, digit, underscore,
# dollar sign, or non-ASCII character
def isAlphanum(c)
return false if !c || c == JSMin::EOF
return ((c >= 'a' && c <= 'z') || (c >= '0' && c <= '9') ||
(c >= 'A' && c <= 'Z') || c == '_' || c == '$' ||
c == '\\' || c[0].ord > 126) # .ord so the comparison also works on Ruby 1.9+, where c[0] is a String
end
# get — return the next character from stdin. Watch out for lookahead. If
# the character is a control character, translate it to a space or linefeed.
# thread safe
def get
return JSMin::EOF if @current>(@text.length-1)
c = @text[@current]
@current += 1
c = c.chr
return c if (c >= " " || c == "\n" || c.unpack("c") == JSMin::EOF)
return "\n" if (c == "\r")
return " "
end
# Get the next character without getting it.
def peek
lookaheadChar = @text[@current]
return lookaheadChar.chr
end
# mynext — get the next character, excluding comments.
# peek() is used to see if a '/' is followed by a '/' or '*'.
def mynext
c = get
if (c == "/")
if(peek == "/")
while(true)
c = get
if (c <= "\n")
return c
end
end
end
if(peek == "*")
get
while(true)
case get
when "*"
if (peek == "/")
get
return " "
end
when JSMin::EOF
raise "Unterminated comment"
end
end
end
end
return c
end
# action — do something! What you do is determined by the argument: 1
# Output A. Copy B to A. Get the next B. 2 Copy B to A. Get the next B.
# (Delete A). 3 Get the next B. (Delete B). action treats a string as a
# single character. Wow! action recognizes a regular expression if it is
# preceded by ( or , or =.
def action(a)
if(a==1)
@output << @theA
end
if(a==1 || a==2)
@theA = @theB
if (@theA == "'" || @theA == "\"")
while (true)
@output << @theA
@theA = get
break if (@theA == @theB)
raise "Unterminated string literal" if (@theA <= "\n")
if (@theA == "\\")
@output << @theA
@theA = get
end
end
end
end
if(a==1 || a==2 || a==3)
@theB = mynext
if (@theB == "/" && (@theA == "(" || @theA == "," || @theA == "=" ||
@theA == ":" || @theA == "[" || @theA == "!" ||
@theA == "&" || @theA == "|" || @theA == "?" ||
@theA == "{" || @theA == "}" || @theA == ";" ||
@theA == "\n"))
@output << @theA
@output << @theB
while (true)
@theA = get
if (@theA == "/")
break
elsif (@theA == "\\")
@output << @theA
@theA = get
elsif (@theA <= "\n")
raise "Unterminated RegExp Literal" + @output
end
@output << @theA
end
@theB = mynext
end
end
end
end
Since Varnish did not work out on Solaris yet again, I have decided to bite the bullet and write a header normalization patch for Squid 2.7. This patch should produce much better cache hit rates with Squid. Efficiency++
What the patch does:
1. Removes Cache-Control request headers, so clients can’t bypass the cache if it is primed.
2. Normalizes Accept-Encoding headers for a higher cache hit rate.
3. Clears Accept-Encoding headers for content that should not be compressed, such as image, video and audio.
and the patch: squid-headers-normalization.patch
Update: Fixed a minor memory leak, all good now.
Update 2: Added audio exception to strip accept-encoding.
--- src/client_side.c.og 2010-01-20 12:00:56.000000000 -0800
+++ src/client_side.c 2010-01-19 20:35:31.000000000 -0800
@@ -3983,6 +3983,7 @@
         errorAppendEntry(http->entry, err);
         return -1;
     }
+
     /* compile headers */
     /* we should skip request line! */
     if ((http->http_ver.major >= 1) && !httpMsgParseRequestHeader(request, &msg)) {
@@ -3992,10 +3993,59 @@
         err->url = xstrdup(http->uri);
         http->al.http.code = err->http_status;
         http->log_type = LOG_TCP_DENIED;
+
         http->entry = clientCreateStoreEntry(http, method, null_request_flags);
         errorAppendEntry(http->entry, err);
         return -1;
     }
+
+    /*
+     * Normalize Request Cache-Control / If-Modified-Since Headers
+     * Don't let client by-pass the cache if there is cached content.
+     */
+    if(httpHeaderHas(&request->header,HDR_CACHE_CONTROL)) {
+        httpHeaderDelByName(&request->header,"cache-control");
+    }
+
+    /*
+     * Un-comment this if you want Squid to always respond with the request
+     * instead of returning back with a 304 if the cache has not changed.
+     */
+    /*
+    if(httpHeaderHas(&request->header,HDR_IF_MODIFIED_SINCE)) {
+        httpHeaderDelByName(&request->header,"if-modified-since");
+    }*/
+
+    /*
+     * Normalize Accept-Encoding Headers sent from client
+     */
+    if(httpHeaderHas(&request->header,HDR_ACCEPT_ENCODING)) {
+        String val = httpHeaderGetByName(&request->header,"accept-encoding");
+        if(val.buf) {
+            if(strstr(val.buf,"gzip") != NULL) {
+                httpHeaderDelByName(&request->header,"accept-encoding");
+                httpHeaderPutStr(&request->header,HDR_ACCEPT_ENCODING,"gzip");
+            } else if(strstr(val.buf,"deflate") != NULL) {
+                httpHeaderDelByName(&request->header,"accept-encoding");
+                httpHeaderPutStr(&request->header,HDR_ACCEPT_ENCODING,"deflate");
+            } else {
+                httpHeaderDelByName(&request->header,"accept-encoding");
+            }
+        }
+        stringClean(&val);
+    }
+
+    /*
+     * Normalize Accept-Encoding Headers for video/image content
+     */
+    char *mime_type = mimeGetContentType(http->uri);
+    if(mime_type) {
+        if(strstr(mime_type,"image") != NULL || strstr(mime_type,"video") != NULL || strstr(mime_type,"audio") != NULL) {
+            httpHeaderDelByName(&request->header,"accept-encoding");
+        }
+    }
+
+
     /*
      * If we read past the end of this request, move the remaining
      * data to the beginning
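The rules in the patch are simple enough to try out on a plain hash before touching Squid. The sketch below mirrors them in Ruby; normalize_request_headers is a hypothetical helper that assumes header names are already lower-cased, and it uses the same substring matching as the strstr calls in the patch.

```ruby
# Sketch only: normalize_request_headers is a hypothetical helper that mirrors
# the patch's rules on a plain Hash of lower-cased header names.
def normalize_request_headers(headers, mime_type = nil)
  out = headers.dup

  # Rule 1: drop Cache-Control so clients cannot bypass a primed cache.
  out.delete("cache-control")

  # Rule 2: collapse Accept-Encoding to "gzip", "deflate" or nothing.
  encoding = out.delete("accept-encoding")
  if encoding
    if encoding.include?("gzip")
      out["accept-encoding"] = "gzip"
    elsif encoding.include?("deflate")
      out["accept-encoding"] = "deflate"
    end
  end

  # Rule 3: never ask for compression on image, video or audio content.
  if mime_type && %w[image video audio].any? { |kind| mime_type.include?(kind) }
    out.delete("accept-encoding")
  end

  out
end

p normalize_request_headers({ "accept-encoding" => "gzip,deflate,sdch", "cache-control" => "no-cache" })
p normalize_request_headers({ "accept-encoding" => "deflate, gzip" }, "image/png")
```

Collapsing the many Accept-Encoding variants browsers send into at most two values is what raises the hit rate: the cache no longer stores a separate copy per variant.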
Clearing stale cache by domain
You can clear a site’s cache by domain, which is really nifty if you have Varnish in front of multiple sites. Log into Varnish’s administration console via telnet and execute the following purge command to wipe out the undesired cache.
purge req.http.host ~ letsgetdugg.com
Monitor Response codes
Worried that some of your clients might be receiving 503 Varnish response pages? Find out with varnishtop.
varnishtop -i TxStatus
Here is what the output looks like.
list length 7 web
4018.65 TxStatus 200
132.35 TxStatus 304
44.17 TxStatus 404
34.63 TxStatus 302
30.87 TxStatus 301
9.36 TxStatus 403
1.39 TxStatus 503










