I was recently considering trying out the new G1 garbage collector to see if it was any better than the current low-pause CMS (Concurrent Mark Sweep) garbage collector. A concurrent, soft real-time garbage collector that can compact? Awesome!
I switched one of my production applications to the new G1 garbage collector and almost instantaneously noticed a spike in CPU usage and diminishing throughput. What gives? I googled around, stumbled upon this blog post, and decided to do my own benchmarking.
I hacked up the following Scala script, based on the blog post, to compare the two garbage collectors. The JDK used was JDK 7u3 on Solaris on a quad-core box.
CMS:
time scala -J-XX:+UseConcMarkSweepGC GC.scala
real 0m12.477s
user 0m12.364s
sys 0m0.491s
G1:
time scala -J-XX:+UseG1GC GC.scala
real 2m26.121s
user 7m33.234s
sys 0m10.888s
Conclusion:
This is just what I saw with my production application: throughput dropped substantially and the CPU cores spiked. G1 took 2m26s of wall-clock time against 12.5s for CMS, and burned 7m33s of user CPU time against 12.4s. I won’t be using the G1 garbage collector any time soon; hopefully Oracle will improve it in subsequent releases.
Today, I am open sourcing my Ruby LocaleTranslator. The translator uses Google’s translation API to translate a primary seed locale into various other languages, which eases the creation of multi-lingual sites. Not only can the LocaleTranslator translate your main seed locale into different languages, it can also recursively merge in differences. This comes in handy if you have hand-optimized your translated locales.
Viva Localization!
My projects that use the LocaleTranslator: UploadBooth, PasteBooth and ShrinkBooth.
LocaleTranslator Examples
en.yml
site:
  hello_world: Hello World!
  home: Home
  statement: Localization should be simple!
Batch Conversion of your English locale.
require 'monkey-patches.rb'
require 'locale_translator.rb'

en_yml = YAML::load(File.open('en.yml'))

# Translate the English seed locale into German and Russian.
[:de, :ru].each do |lang|
  lang_yml = LocaleTranslator.translate(en_yml,
                                        :to   => lang,
                                        :html => true,
                                        :key  => 'GOOGLE API KEY')
  # The block form closes the file even if the write raises.
  File.open("#{lang.to_s.downcase}.yml", "w") do |f|
    f.puts(lang_yml.ya2yaml(:syck_compatible => true))
  end
  p "Translated to #{lang.to_s}"
end
Merge in new locale keys from your English Locale into your already translated Russian locale.
require 'monkey-patches.rb'
require 'locale_translator.rb'

en_yml = YAML::load(File.open('en.yml'))
ru_yml = YAML::load(File.open('ru.yml'))

# Only keys missing from ru.yml are translated; existing entries are kept.
ru_new_yml = LocaleTranslator.translate(en_yml,
                                        :to    => :ru,
                                        :html  => true,
                                        :merge => ru_yml,
                                        :key   => 'GOOGLE API KEY')
puts ru_new_yml.ya2yaml(:syck_compatible => true)
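The merge comes down to two recursive hash operations: find the keys that exist in the seed locale but not in the translation, then fold the newly translated values back in without touching existing entries. Here is a minimal standalone sketch of that idea, assuming nested string-keyed hashes. The function names `missing_keys` and `deep_merge` are illustrative only and are not part of LocaleTranslator, and the `[de]` tagging stands in for the real translation call.

```ruby
# Entries that exist in `seed` but are missing from `translated` (recursively).
def missing_keys(seed, translated)
  seed.each_with_object({}) do |(key, value), out|
    if value.is_a?(Hash) && translated[key].is_a?(Hash)
      nested = missing_keys(value, translated[key])
      out[key] = nested unless nested.empty?
    elsif !translated.key?(key)
      out[key] = value
    end
  end
end

# Recursively merge `extra` into `base`, returning a new hash.
def deep_merge(base, extra)
  base.merge(extra) do |_key, old, new|
    old.is_a?(Hash) && new.is_a?(Hash) ? deep_merge(old, new) : new
  end
end

en = { 'site' => { 'home' => 'Home', 'statement' => 'Localization should be simple!' } }
de = { 'site' => { 'home' => 'Startseite' } } # hand-optimized, must survive the merge

todo = missing_keys(en, de)
# Stand-in for the translation call: tag the string instead of translating it.
fake = { 'site' => { 'statement' => "[de] #{todo['site']['statement']}" } }
merged = deep_merge(de, fake)
```

Only `statement` ends up in `todo`, so the hand-written `home` entry is never sent for translation and is left alone in `merged`.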
The Implementation Code
Support Monkey Patches
monkey-patches.rb
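class Hash
  # Flatten all leaf values into a list, depth first.
  def to_list
    h2l(self)
  end

  # Return the entries of `hash` whose keys are missing from self (recursively).
  def diff(hash)
    hsh = {}
    this = self
    hash.each do |k, v|
      if v.kind_of?(Hash) and this.key?(k)
        tmp = this[k].diff(v)
        hsh[k] = tmp if tmp.size > 0
      else
        hsh[k] = v unless this.key?(k)
      end
    end
    hsh
  end

  # Recursive merge: nested hashes are merged instead of overwritten.
  def merge_r(hash)
    hsh = {}
    this = self
    hash.each do |k, v|
      # Only recurse when self already holds a hash under this key;
      # a brand new nested section is copied over as-is.
      if v.kind_of?(Hash) and this[k].kind_of?(Hash)
        hsh[k] = this[k].merge_r(v)
      else
        hsh[k] = v
      end
    end
    self.merge(hsh)
  end

  private

  def h2l(hash)
    list = []
    hash.each { |k, v| list = v.kind_of?(Hash) ? list.merge_with_dups(h2l(v)) : list << v }
    list
  end
end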
class Array
  # Split the array into p roughly equal pieces.
  def chunk(p = 2)
    return [] if p.zero?
    p_size = (length.to_f / p).ceil
    [first(p_size), *last(length - p_size).chunk(p - 1)]
  end

  # Rebuild a hash shaped like `hash`, filling leaves from this list in order.
  def to_hash(hash)
    l2h(hash, self)
  end

  def merge(arr)
    self | arr
  end

  def merge_with_dups(arr)
    temp = []
    self.each { |a| temp << a }
    arr.each { |a| temp << a }
    temp
  end

  def merge!(arr)
    temp = self.clone
    self.clear
    temp.each { |a| self << a }
    arr.each { |a| self << a unless temp.include?(a) }
    true
  end

  def merge_with_dups!(arr)
    temp = self.clone
    self.clear
    temp.each { |a| self << a }
    arr.each { |a| self << a }
    true
  end

  private

  # Note: consumes `lst` via shift while walking `hash`.
  def l2h(hash, lst)
    hsh = {}
    hash.each { |k, v| hsh[k] = v.kind_of?(Hash) ? l2h(v, lst) : lst.shift }
    hsh
  end
end
The LocaleTranslator Implementation
You need the ya2yaml and easy_translate gems. Ya2YAML can export locales as readable UTF-8, unlike the standard YAML implementation, which dumps non-ASCII strings as binary.
locale-translator.rb
require 'rubygems'
require 'ya2yaml'
require 'yaml'
require 'easy_translate'

class LocaleTranslator
  def self.translate(text, opts)
    opts[:to] = [opts[:to]] if opts[:to] and !opts[:to].kind_of?(Array)

    # Options for the recursive calls. The API key is passed along so that
    # nested calls still reach EasyTranslate with it.
    sub_opts = { :to => opts[:to], :html => true }
    sub_opts[:key] = opts[:key] if opts[:key]

    # Merge mode: translate only what is missing from the existing locale.
    if opts[:merge].kind_of?(Hash) and text.kind_of?(Hash)
      diff = opts[:merge].diff(text)
      diff_hsh = LocaleTranslator.translate(diff, sub_opts)
      return opts[:merge].merge_r(diff_hsh)
    end

    if text.kind_of?(Hash)
      # Flatten the locale to a list of strings, translate, then rebuild the hash.
      t_arr = text.to_list
      t_arr = t_arr.first if t_arr.size == 1
      tout_arr = LocaleTranslator.translate(t_arr, sub_opts)
      tout_arr = [tout_arr] if tout_arr.kind_of?(String)
      tout_arr.to_hash(text)
    elsif text.kind_of?(Array)
      if text.size > 50
        # Large lists are split in two before being sent off.
        out = []
        text.chunk.each { |l| out.merge_with_dups!(EasyTranslate.translate(l, opts).first) }
        out
      else
        text = text.first if text.size == 1
        EasyTranslate.translate(text, opts).first
      end
    else
      EasyTranslate.translate(text, opts).first
    end
  end
end
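One thing to watch in the array branch: `chunk` with its default of 2 only halves the list, so a list of more than 100 strings would still produce pieces larger than 50. Here is a sketch of a stricter alternative using `each_slice`, assuming 50 is the largest batch you want to send per request (that cap is inferred from the size check above, not from the API documentation). `translate_in_batches` is a hypothetical helper and the block stands in for the real EasyTranslate call.

```ruby
BATCH_LIMIT = 50 # assumed per-request cap, taken from the size check above

# Translate `strings` in batches of at most `limit`, preserving order.
# The block receives one batch and must return the translated batch.
def translate_in_batches(strings, limit = BATCH_LIMIT)
  strings.each_slice(limit).flat_map { |batch| yield(batch) }
end

words = (1..120).map { |i| "word#{i}" }
sizes = []
result = translate_in_batches(words) do |batch|
  sizes << batch.size
  batch.map(&:upcase) # stand-in for the translation call
end
```

With 120 input strings the block is called three times, and the results come back as one flat list in the original order.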
I have a very non-standard storage setup at home: a 3x500 GB raidz array on ZFS hosted by OSX. For the longest time I could not get files to copy over Samba on ZFS. The files would stream just fine but would not copy over; the transfer aborted at the 99% point. Well, I have finally found the fix for it: turn off extended attributes!
smb.conf
vfs objects = notify_kqueue,darwinacl
; The darwin_streams module gives us named streams support.
stream support = no
ea support = no
; Enable locking coherency with AFP.
darwin_streams:brlm = no
As Charlton Heston would say, you can have my ZFS when you pry it from my cold dead hands.
Viva ZFS on OSX!
I recently needed to make use of our ActiveMQ message queue service to scale up write performance of CouchDB. However, there seemed to be a bug with JRuby that kills off the STOMP subscriber every 5 seconds. Digging a bit deeper into the STOMP source, I figured out a way to get around the bug by removing the timeout line.
ActiveMQ let me scale CouchDB writes from 10req/sec to 128req/sec. Huge performance win with very little effort.
STOMP Library Monkey Patch:
if defined?(JRUBY_VERSION)
  module Stomp
    class Connection
      def _receive(read_socket)
        @read_semaphore.synchronize do
          line = read_socket.gets
          return nil if line.nil?

          # If the reading hangs for more than 5 seconds, abort the parsing process.
          # This timeout is the line that was removed to work around the JRuby bug.
          #Timeout::timeout(5, Stomp::Error::PacketParsingTimeout) do

          # Reads the beginning of the message until it runs into an empty line
          message_header = ''
          begin
            message_header += line
            begin
              line = read_socket.gets
            rescue
              p read_socket
            end
          end until line =~ /^\s?\n$/

          # Checks if it includes content_length header
          content_length = message_header.match(/content-length\s?:\s?(\d+)\s?\n/)
          message_body = ''

          # If it does, reads the specified amount of bytes
          char = ''
          if content_length
            message_body = read_socket.read content_length[1].to_i
            raise Stomp::Error::InvalidMessageLength unless parse_char(read_socket.getc) == "\0"
          # Else reads the rest of the message until the first \0
          else
            message_body += char while read_socket.ready? && (char = parse_char(read_socket.getc)) != "\0"
          end

          # If the buffer isn't empty, reads the next char and returns it to the buffer
          # unless it's a \n
          if read_socket.ready?
            last_char = read_socket.getc
            read_socket.ungetc(last_char) if parse_char(last_char) != "\n"
          end

          # Adds the excluded \n and \0 and tries to create a new message with it
          Message.new(message_header + "\n" + message_body + "\0")
        end
        #end
      end
    end
  end
end
I spent the better part of the day trying to figure out why the Final Cut Pro audio was out of sync. The audio would be delayed by 3 seconds in playback. I thought some setting in Final Cut Pro had broken, but eventually came to the conclusion that it was the OS and not Final Cut Pro that was causing the audio delay. The specific cause was the VoodooHDA driver: even though it worked perfectly in ordinary applications such as iTunes and Safari, it had a delay issue with Final Cut Pro. The fix? Install SoundFlowerBed and configure your audio output settings in it. This somehow magically fixes the delay issue in the VoodooHDA driver. I thought I should post it here for “internet” record keeping.

Clustering Wicket for fun and profit!
I hate expired sessions; death to all expired sessions. Traditionally a Java servlet container has a fixed session timeout, and a flood of traffic can cause JVM OOM errors if that timeout is set too high. I wanted a smart session container that holds onto sessions for as long as possible and expires them only when absolutely necessary. A Memcached store would be perfect for this.
Therefore I recently open sourced the jetty-session-store to solve this problem. With the jetty-session-store you can save your session state to Ehcache, Memcached or the database. State should not be bound to a single JVM. Viva Shared Session Stores!
So now that jetty-session-store is out in the wild, you can technically cluster Wicket using just the HttpSessionStore. However, it isn’t very efficient given the way Memcached allocates data in fixed-size cache buckets.
1. Wicket sessions under the HttpSessionStore can get quite large, well over 1 MB in size. A Wicket session stores not only the session state but also the previous serialized pages the user has visited.
2. Serializing and de-serializing a large data structure can get expensive. The HttpSessionStore retains an AccessStackPageMap, which is a list data structure consisting of multiple page map revisions.
So instead of saving one large AccessStackPageMap, I wrote a SecondLevelCacheSessionStore that saves one page map revision per cache entry. This leads to much better cache utilization and a whole lot less serialization on the wire. Not to mention it avoids the whole 1 MB Memcached item size limit.
Before you go willy nilly with clustering, read the Wicket render strategies page. Wicket requires session affinity for buffered responses with the default rendering strategy.
Clustering Wicket has never been easier.
Here is an example of how to offload page maps to a hybrid EhCache/Memcached cache: Memcached for long-term shared storage, EhCache for short-lived, fast cache lookups.
@Override
protected ISessionStore newSessionStore() {
    // localhost:11211 - memcached server
    // "fabpagestore" - unique appender to avoid key clashes.
    // 300 - 5 minute TTL for local ehcache.
    return new SecondLevelCacheSessionStore(this,
        new CachePageStore(Arrays.asList("localhost:11211"), "fabpagestore", 300));
}
Here is an example of how to offload page maps to the database.
@Override
protected ISessionStore newSessionStore() {
    // "fabpagestore" - unique appender to avoid key clashes.
    return new SecondLevelCacheSessionStore(this, new CachePageStore(
        new DBCache("jdbc:mysql://foo/mydb", "myname", "mypass", "com.driver.Name", "fabpagestore")));
}
Here is my CachePageStore:
import com.base.cache.AsyncMemcache;
import com.base.cache.ICache;
import org.apache.wicket.Page;
import org.apache.wicket.protocol.http.SecondLevelCacheSessionStore.IClusteredPageStore;
import org.apache.wicket.protocol.http.pagestore.AbstractPageStore;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

import java.util.List;

public class CachePageStore extends AbstractPageStore implements IClusteredPageStore {
    private ICache cache;
    private Logger logger = LoggerFactory.getLogger(CachePageStore.class);

    public CachePageStore(final List<String> servers, final String poolName, final int ttl) {
        this(servers, poolName, true, ttl);
    }

    public CachePageStore(final List<String> servers, final String poolName, boolean async, final int ttl) {
        this(new AsyncMemcache(servers, poolName, async, ttl));
    }

    public CachePageStore(final ICache cache) {
        this.cache = cache;
    }

    // If pageVersion -1 must return highest page version.
    protected String getKey(final String sessId, final String pageMapName, final int pageId, final int pageVersion) {
        int pageVer = (pageVersion == -1) ? 0 : pageVersion;
        if (pageVersion == -1) {
            String[] meta = getMeta(sessId, pageMapName, pageId);
            pageVer = Integer.valueOf(meta[0]);
        }
        return sessId + ":" + pageMapName + ":" + pageId + ":" + pageVer;
    }

    // If pageVersion -1 must return highest page version.
    // If ajaxVersion -1 must return highest version.
    public String getKey(final String sessId, final String pageMapName, final int pageId, final int pageVersion, final int ajaxVersion) {
        // Default it to 0 initially
        int ajaxVer = (ajaxVersion == -1) ? 0 : ajaxVersion;
        int pageVer = (pageVersion == -1) ? 0 : pageVersion;
        if (pageVersion == -1 || ajaxVersion == -1) {
            String[] meta = getMeta(sessId, pageMapName, pageId);
            if (pageVersion == -1) {
                pageVer = Integer.valueOf(meta[0]);
            }
            if (ajaxVersion == -1) {
                ajaxVer = Integer.valueOf(meta[1]);
            }
        }
        return sessId + ":" + pageMapName + ":" + pageId + ":" + pageVer + ":" + ajaxVer;
    }

    protected String storeKey(final String sessionId, final Page page) {
        return sessionId + ":" + page.getPageMapName() + ":" + page.getId() + ":" + page.getCurrentVersionNumber() + ":" + page.getAjaxVersionNumber();
    }

    protected String getBaseKey(String sessionId, Page page) {
        return sessionId + ":" + page.getPageMapName() + ":" + page.getId();
    }

    protected String getMetaKey(String sessionId, String pageMap, int id) {
        return getBaseKey(sessionId, pageMap, id) + "_meta";
    }

    protected String getMetaKey(String sessionId, Page page) {
        return getBaseKey(sessionId, page) + "_meta";
    }

    protected String getBaseKey(String sessionId, String pageMap, int id) {
        if (id == -1) {
            return sessionId + ":" + pageMap;
        } else {
            return sessionId + ":" + pageMap + ":" + id;
        }
    }

    public boolean containsPage(final String sessionId, final String pageMapName, final int pageId, final int pageVersion) {
        String key = getKey(sessionId, pageMapName, pageId, pageVersion, -1);
        if (logger.isDebugEnabled()) {
            logger.debug("CheckExists: " + key);
        }
        return cache.keyExists(key);
    }

    public void destroy() {
    }

    public <T> Page getPage(final String sessionId, final String pagemap, final int id, final int versionNumber, final int ajaxVersionNumber) {
        String key = getKey(sessionId, pagemap, id, versionNumber, ajaxVersionNumber);
        if (logger.isDebugEnabled()) {
            logger.debug("GetPage: " + key);
        }
        return (Page) cache.get(key);
    }

    public void pageAccessed(final String sessionId, final Page page) {
    }

    // If ID == -1 remove the entire pagemap; getBaseKey() takes care of this.
    public void removePage(final String sessionId, final String pagemap, final int id) {
        String key = getBaseKey(sessionId, pagemap, id);
        if (logger.isDebugEnabled()) {
            logger.debug("RemovePage: " + key);
        }
        cache.remove(getMetaKey(sessionId, pagemap, id));
        for (String k : cache.getKeys()) {
            if (k.startsWith(key)) {
                cache.remove(k);
            }
        }
    }

    // Meta entry format: "<highest page version>:<highest ajax version>".
    protected String[] getMeta(final String sessionId, String pageMap, int pageId) {
        String metaKey = getMetaKey(sessionId, pageMap, pageId);
        Object ret = cache.get(metaKey);
        if (logger.isDebugEnabled()) {
            logger.debug("GetMeta: " + metaKey);
        }
        if (ret == null) {
            return new String[] {"0", "0"};
        } else {
            return String.valueOf(ret).split(":");
        }
    }

    protected void storeMeta(final String sessionId, final Page page) {
        String metaKey = getMetaKey(sessionId, page);
        Object ret = cache.get(metaKey);
        if (logger.isDebugEnabled()) {
            logger.debug("StoreMeta: " + metaKey);
        }
        if (ret == null) {
            cache.put(metaKey, page.getCurrentVersionNumber() + ":" + page.getAjaxVersionNumber());
        } else {
            String[] vals = String.valueOf(ret).split(":");
            int currPage = Integer.valueOf(vals[0]);
            int currAjax = Integer.valueOf(vals[1]);
            if (page.getCurrentVersionNumber() > currPage) {
                currPage = page.getCurrentVersionNumber();
            }
            if (page.getAjaxVersionNumber() > currAjax) {
                currAjax = page.getAjaxVersionNumber();
            }
            cache.put(metaKey, currPage + ":" + currAjax);
        }
    }

    public void storePage(final String sessionId, final Page page) {
        String sKey = storeKey(sessionId, page);
        if (logger.isDebugEnabled()) {
            logger.debug("StorePage: " + sKey);
        }
        cache.put(sKey, page);
        storeMeta(sessionId, page);
    }

    public void unbind(final String sessionId) {
        if (logger.isDebugEnabled()) {
            logger.debug("Unbind: " + sessionId);
        }
        for (String key : cache.getKeys()) {
            if (key.startsWith(sessionId)) {
                cache.remove(key);
            }
        }
    }
}
I’ll start this post off with a quote from IRC
ivaynberg: you cant build good looking sites with wicket
victori: lies
ivaynberg: or public-facing sites
I have to admit that Wicket appeals more to the “backend” programmer than to the front-end design conscious developer. For every good-looking Wicket site out there, there are ten abysmal looking Wicket sites. Just look at the Wicket Wiki, it is littered with some dreadfully designed sites (Sorry Guys, this isn’t personal). You can tell right off the bat that the developers behind the sites care more about OO and clean code rather than clean design. Well to be frank, I don’t even know if the code behind the listed sites is even elegant. However, the fact that the sites are written on Wicket, tells me that the developers care about things such as separation of concerns and object oriented programming.
So, to combat the whole mentality that Wicket can’t scale and that any site done in Wicket must look atrocious, I have decided to compile a list of some awesomely kick ass public-facing / good-looking Wicket sites.
If you don’t see your site and you feel that it should have made the list, feel free to leave a comment with your site’s URL.
High Traffic Wicket Sites
adscale.de
This site has an Alexa 1,700 traffic rank and runs on a single Tomcat servlet container. No proxy caches, no fancy clustering just Tomcat.
vegas.com
Next time someone states that no public facing sites are ever written in wicket, point them to vegas.com.
Clean Wicket Sites
kontain.com
The design behind this site is quite good and sets the design bar in my book.
meetmoi.com
Ah, I remember when the developer behind meetmoi dropped by #wicket and stated that he is officially working on it full time with a million dollars in venture capital seed money.
songtexte.com
Don’t know much about this site, aside that it looks clean and the author did the original b-side wicket site that got replaced with wordpress.
memolio.com
fabulously40.com
Disclaimer: this is the site I developed and I think it looks good
winerevolution.com
islamicdesignhouse.com
Speedy PostgreSQL Parallel Compression Dumps
I used to back up our database using the following statement:
Once our dataset grew into the gigabytes, it took a very long time to do database dumps. Today, I stumbled upon yet another awesome blog post by Ted Dziuba mentioning two useful parallel compression utilities. So why not try parallel compression with PostgreSQL dumps?
pbzip2 – Parallel BZIP2: Parallel implementation of BZIP2. BZIP2 is well known for being balls slow, so speed it up using multiple CPUs.
pigz – Parallel GZIP: Parallel implementation of GZIP written by Mark Adler.
Time to try this out with our PostgreSQL dump, here are the result times.
• This was done on a quad core xeon 2.66ghz machine.
real 2m7.332s
user 1m16.414s
sys 0m8.233s
# time pg_dump -U secret -h fab2 somedb | pbzip2 -c > somedb.bz2
real 4m14.253s
user 10m35.879s
sys 0m10.904s
The original database was 1.6 GB. The compressed files came out to:
147M somedb.bz2
194M somedb.gz
And just to make this post complete, to pipe the SQL dump back into PostgreSQL
# createdb somedb
# gzip -d -c somedb.gz | psql somedb
I just pushed up a new version of Satan to GitHub. For the uninformed, Satan is my process reaper for runaway Unix processes. Satan was designed to work with Solaris’ SMF self-healing properties. Basically, Satan kills while SMF revives. The new version contains HTTP health checks, so Satan now has the ability to kill processes that are not responding with an HTTP 200 response code.
The motivation behind HTTP health checks was that once a month or so at Fabulously40 our ActiveMQ would break down while still accepting connections, and the only way to figure out if it was zombified was to check the HTTP administrator interface. If the ActiveMQ instance had actually keeled over, the administrator interface would come back with an HTTP 500 response code; hence the birth of HTTP health checks.
Here is our Satan configuration file that we use at Fabulously40.
The “args” property might be a bit confusing. It is a snippet of text that Satan looks for in the arguments passed to your application to identify the running process. For example, if you start your ActiveMQ instance with the arguments “java -jar activemq.jar -Dactivemq=8161 -XXXXX”, placing “8161” in the args property would give Satan a good unique identifier to pick up on.
Satan.watch do |s|
  s.name = "jvm instances"               # name of job
  s.user = "webservd"                    # under what user
  s.group = "webservd"                   # under what group
  s.deamon = "java"                      # deamon binary name to grep for
  s.args = nil                           # globally look for specific arguments, optional
  s.debug = true                         # if to write out debug information
  s.safe_mode = false                    # If in safe mode, satan will not kill ;-(
  s.interval = 10.seconds                # interval to run at to collect statistics
  s.sleep_after_kill = 1.minute          # sleep after killing, satan is tired!
  s.contact = "victori@fabulously40.com" # admin contact, optional if you want email alerts

  s.kill_if do |process|
    process.condition(:cpu) do |cpu|       # on cpu condition
      cpu.name = "50% CPU limit"           # name for job
      cpu.args = "jetty"                   # make sure this is a jetty process, optional
      cpu.above = 48.percent               # if above certain percentage
      cpu.times = 5                        # how many times we can hit this condition before killing
    end

    process.condition(:memory) do |memory| # on memory condition
      memory.name = "850MB limit"          # name for job
      memory.args = "jetty"                # make sure this is a jetty process, optional
      memory.above = 850.megabytes         # limit for memory use
      memory.times = 5                     # how many times we can hit this condition before killing
    end

    # ActiveMQ tends to die on us under heavy load so we need the power of satan!
    process.condition(:http) do |http|     # on http condition
      http.name = "HTTP ActiveMQ Check"    # name for job
      http.args = "8161"                   # look for specific app arguments
                                           # to associate app to URI
      http.uri = "http://localhost:8161/admin/queues.jsp" # the URI
      http.times = 5                       # how many times before kill
    end
  end
end
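To make the “args” and “times” settings a little more concrete, here is a rough sketch of the logic they describe. This is not Satan’s actual code: `matches_args?` and `StrikeCounter` are hypothetical names, and treating “times” as consecutive failed checks (with a passing check resetting the count) is an assumption on my part.

```ruby
# Hypothetical helper: does this command line belong to the watched app?
# A nil `args` matches every process, mirroring the "optional" setting.
def matches_args?(command_line, args)
  args.nil? || command_line.include?(args)
end

# Hypothetical strike counter: `record` returns true once a condition has
# failed `times` checks in a row. A passing check resets the count
# (the reset behaviour is an assumption, not taken from Satan).
class StrikeCounter
  def initialize(times)
    @times = times
    @strikes = 0
  end

  def record(failed)
    @strikes = failed ? @strikes + 1 : 0
    @strikes >= @times
  end
end

cmd = 'java -jar activemq.jar -Dactivemq=8161 -XXXXX'
watched = matches_args?(cmd, '8161')

counter = StrikeCounter.new(5)
# Simulated HTTP status codes from successive health checks.
statuses = [200, 500, 500, 500, 500, 500]
kill = statuses.map { |code| counter.record(code != 200) }
```

In this run the process is only flagged on the fifth failed check in a row, which is why a single slow response does not get ActiveMQ killed.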
Ted Dziuba beautifully articulated why deadlines go to crap and seemingly straight forward tasks go out the window. You sir have done a public service for us all, thank you.
What I hate is fording endless rivers of horseshit that are in the way of seemingly simple tasks. And I hate it even more when I have to explain to a non-programmer what I am doing, "building LXML against a different version of libiconv because I think it might be the source of a crash". "But all I asked you to do was parse some documents." Good times.










