Relax, Satan is on your side.
I just pushed a new version of Satan to GitHub. For the uninformed, Satan is my process reaper for runaway Unix processes. Satan was designed to work with Solaris’ SMF self-healing properties. Basically, Satan kills while SMF revives. The new version adds HTTP health checks, so Satan can now kill processes that do not respond with an HTTP 200 response code.
The motivation for HTTP health checks is that once a month or so at Fabulously40 our ActiveMQ would break down while still accepting connections. The only way to figure out whether it was zombified was to check the HTTP administrator interface. If the ActiveMQ instance had actually keeled over, the administrator interface would come back with an HTTP 500 response code; hence the birth of HTTP health checks.
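To make the idea concrete, here is a rough sketch of what such a check boils down to. This is not Satan's actual code: the helper names `fetch_status` and `healthy_response?` and the 5 second timeout are made up for illustration, and only Ruby's standard `net/http` library is used.

```ruby
require "net/http"
require "uri"

# Hypothetical helper (not part of Satan): fetch the HTTP status code for a URI.
# Returns the code as an Integer, or nil if the request could not be completed
# (bad URI, connection refused, timeout, ...).
def fetch_status(uri_string, timeout: 5)
  uri = URI.parse(uri_string)
  # Passing nil as the proxy address keeps a local health check from being
  # routed through a proxy configured in the environment.
  http = Net::HTTP.new(uri.host, uri.port, nil)
  http.use_ssl = (uri.scheme == "https")
  http.open_timeout = timeout
  http.read_timeout = timeout
  http.get(uri.request_uri).code.to_i
rescue StandardError
  nil
end

# Hypothetical helper: only a plain 200 counts as healthy.
# A 500 from the admin interface, or no answer at all, counts as a failure.
def healthy_response?(code)
  code.to_i == 200
end
```

A single check against the admin page would then be `healthy_response?(fetch_status("http://localhost:8161/admin/queues.jsp"))`. Treating "no answer" the same as a 500 is an assumption of this sketch.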
Here is the Satan configuration file that we use at Fabulously40.
The “args” property might be a bit confusing. It is a snippet of text that Satan looks for in the arguments passed to your application to identify the running process. For example, if you start your ActiveMQ instance with the arguments “java -jar activemq.jar -Dactivemq=8161 -XXXXX”, placing “8161” in the args property gives Satan a good unique identifier to pick up on.
Satan.watch do |s|
  s.name = "jvm instances" # name of job
  s.user = "webservd" # under what user
  s.group = "webservd" # under what group
  s.deamon = "java" # deamon binary name to grep for
  s.args = nil # globally look for specific arguments, optional
  s.debug = true # if to write out debug information
  s.safe_mode = false # If in safe mode, satan will not kill ;-(
  s.interval = 10.seconds # interval to run at to collect statistics
  s.sleep_after_kill = 1.minute # sleep after killing, satan is tired!
  s.contact = "victori@fabulously40.com" # admin contact, optional if you want email alerts
  s.kill_if do |process|
    process.condition(:cpu) do |cpu| # on cpu condition
      cpu.name = "50% CPU limit" # name for job
      cpu.args = "jetty" # make sure this is a jetty process, optional
      cpu.above = 48.percent # if above certain percentage
      cpu.times = 5 # how many times we can hit this condition before killing
    end
    process.condition(:memory) do |memory| # on memory condition
      memory.name = "850MB limit" # name for job
      memory.args = "jetty" # make sure this is a jetty process, optional
      memory.above = 850.megabytes # limit for memory use
      memory.times = 5 # how many times we can hit this condition before killing
    end
    # ActiveMQ tends to die on us under heavy load so we need the power of satan!
    process.condition(:http) do |http| # on http condition
      http.name = "HTTP ActiveMQ Check" # name for job
      http.args = "8161" # look for specific app arguments
      # to associate app to URI
      http.uri = "http://localhost:8161/admin/queues.jsp" # the URI
      http.times = 5 # how many times before kill
    end
  end
end
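For anyone wondering how the “args” and “times” settings in the configuration above play together, the sketch below shows one way to think about them. It is an illustration only, not Satan's implementation: `matches_args?` and `StrikeCounter` are hypothetical names, and resetting the count after a passing check is an assumption (the configuration comments only say how many times a condition may be hit before killing).

```ruby
# Hypothetical helper (not Satan's code): a process matches when its command
# line contains the configured snippet. A nil snippet matches everything,
# mirroring "s.args = nil" being optional in the configuration.
def matches_args?(command_line, snippet)
  snippet.nil? || command_line.include?(snippet)
end

# Hypothetical counter for the "times" setting.
class StrikeCounter
  def initialize(limit)
    @limit = limit
    @strikes = 0
  end

  # Record one check result. Returns true once the limit is reached.
  # Assumption: a passing check resets the count to zero.
  def record(failed)
    @strikes = failed ? @strikes + 1 : 0
    @strikes >= @limit
  end
end

command_line = "java -jar activemq.jar -Dactivemq=8161 -XXXXX"
counter = StrikeCounter.new(5)

if matches_args?(command_line, "8161")
  # Simulate five failed health checks in a row.
  kill_now = false
  5.times { kill_now = counter.record(true) }
  puts "kill process" if kill_now
end
```

With a limit of 5 and a 10 second interval, as in the configuration, a process would need to fail for roughly 50 seconds before being reaped under these assumptions.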
Nice! Helpful – though as an ActiveMQ committer – worrying!
What version are you using?
FuseMQ – 5.3.0.4
We used to use ActiveMQ 5.2.x, which kept crashing every 2 to 3 weeks. Upon your suggestion we switched over to FuseMQ, which seemed solid, but it still fails from time to time, just a lot less often.
If it helps, here are the JVM opts we use.
ACTIVEMQ_OPTS="-Dactivemq=8161 -Xmx512M -Xss128k -XX:MaxPermSize=96m -XX:MinHeapFreeRatio=20 -XX:MaxHeapFreeRatio=30 -XX:+UseParallelGC -XX:+UseParallelOldGC -Dorg.apache.activemq.UseDedicatedTaskRunner=false"