Relax, Satan is on your side.
I just pushed up a new version of Satan to GitHub. For the
uniformed uninformed Satan is my process reaper for run away unix processes. Satan was designed to work with Solaris’ SMF self-healing properties. Basically, Satan kills while SMF revives. The new version that was pushed up contains HTTP health checks, so Satan now has the ability to kill processes that are not responding back with a HTTP/200 response code.
The motivation behind HTTP health checks was because once a month or so at Fabulously40 our ActiveMQ would break down while still accepting connections, the only way to figure out if it was zombified was to check the HTTP administrator interface. If the ActiveMQ instance was actually knelled over, the administrator interface would come back with a HTTP/500 response code, hence the birth of HTTP health checks.
Here is our Satan configuration file that we use at Fabulously40.
The “args” property might be a bit confusing, it is a snippet of text that Satan looks for in the arguments passed to your application to identify the running process. So for example, if you start your ActiveMQ instance with the following arguments; “java -jar activemq.jar -Dactivemq=8161 -XXXXX” Placing “8161” in args property would be a good unique identifier for Satan to pick up on.
Satan.watch do |s| s.name = "jvm instances" # name of job s.user = "webservd" # under what user s.group = "webservd" # under what group s.deamon = "java" # deamon binary name to grep for s.args = nil # globally look for specific arguments, optional s.debug = true # if to write out debug information s.safe_mode = false # If in safe mode, satan will not kill ;-( s.interval = 10.seconds # interval to run at to collect statistics s.sleep_after_kill = 1.minute # sleep after killing, satan is tired! s.contact = "firstname.lastname@example.org" # admin contact, optional if you want email alerts s.kill_if do |process| process.condition(:cpu) do |cpu| # on cpu condition cpu.name = "50% CPU limit" # name for job cpu.args = "jetty" # make sure this is a jetty process, optional cpu.above = 48.percent # if above certain percentage cpu.times = 5 # how many times we can hit this condition before killing end process.condition(:memory) do |memory| # on memory condition memory.name = "850MB limit" # name for job memory.args = "jetty" # make sure this is a jetty process, optional memory.above = 850.megabytes # limit for memory use memory.times = 5 # how many times we can hit this condition before killing end # ActiveMQ tends to die on us under heavy load so we need the power of satan! process.condition(:http) do |http| # on http condition http.name = "HTTP ActiveMQ Check" # name for job http.args = "8161" # look for specific app arguments # to associate app to URI http.uri = "http://localhost:8161/admin/queues.jsp" # the URI http.times = 5 # how many times before kill end end end