I just pushed up a new version of Satan to GitHub. For the uniformed uninformed Satan is my process reaper for run away unix processes. Satan was designed to work with Solaris’ SMF self-healing properties. Basically, Satan kills while SMF revives. The new version that was pushed up contains HTTP health checks, so Satan now has the ability to kill processes that are not responding back with a HTTP/200 response code.

The motivation behind HTTP health checks was because once a month or so at Fabulously40 our ActiveMQ would break down while still accepting connections, the only way to figure out if it was zombified was to check the HTTP administrator interface. If the ActiveMQ instance was actually knelled over, the administrator interface would come back with a HTTP/500 response code, hence the birth of HTTP health checks.

Here is our Satan configuration file that we use at Fabulously40.

The “args” property might be a bit confusing, it is a snippet of text that Satan looks for in the arguments passed to your application to identify the running process. So for example, if you start your ActiveMQ instance with the following arguments; “java -jar activemq.jar -Dactivemq=8161 -XXXXX” Placing “8161” in args property would be a good unique identifier for Satan to pick up on.


Satan.watch do |s|
  s.name = "jvm instances"                # name of job
  s.user = "webservd"                     # under what user
  s.group = "webservd"                    # under what group
  s.deamon = "java"                       # deamon binary name to grep for
  s.args = nil                            # globally look for specific arguments, optional
  s.debug = true                          # if to write out debug information
  s.safe_mode = false                     # If in safe mode, satan will not kill ;-(
  s.interval = 10.seconds                 # interval to run at to collect statistics
  s.sleep_after_kill = 1.minute           # sleep after killing, satan is tired!
  s.contact = "victori@fabulously40.com"  # admin contact, optional if you want email alerts
  
  s.kill_if do |process|
    process.condition(:cpu) do |cpu|      # on cpu condition
      cpu.name  = "50% CPU limit"         # name for job
      cpu.args  = "jetty"                 # make sure this is a jetty process, optional
      cpu.above = 48.percent              # if above certain percentage
      cpu.times = 5                       # how many times we can hit this condition before killing
    end
    
    process.condition(:memory) do |memory|  # on memory condition
      memory.name  = "850MB limit"          # name for job
      memory.args  = "jetty"                # make sure this is a jetty process, optional
      memory.above = 850.megabytes          # limit for memory use
      memory.times = 5                      # how many times we can hit this condition before killing
    end
    
    # ActiveMQ tends to die on us under heavy load so we need the power of satan!
    process.condition(:http) do |http|                        # on http condition
      http.name   = "HTTP ActiveMQ Check"                     # name for job
      http.args   = "8161"                                    # look for specific app arguments
                                                              # to associate app to URI
      http.uri    = "http://localhost:8161/admin/queues.jsp"  # the URI
      http.times  = 5                                         # how many times before kill
    end
  end
end