Dynamically typed languages such as Perl, Ruby, PHP, and Python free you, the developer, from managing memory by hand. That does not make them a foolproof guard against memory leaks, though. You should know how the garbage collector of your preferred language works so you can compensate for the weaknesses of its algorithm.

Broadly, there are two ways of doing garbage collection: mark and sweep, and reference counting. The Perl interpreter uses the latter. Reference counting is a fairly simple technique. Every value carries a count of the references pointing to it. Each new reference to the value increments the count by one, and each reference that goes away (for example, a variable reaching the end of its scope) decrements it by one. When the count drops to zero, the value is freed immediately; if anything still refers to it, it is kept. The main drawback of reference counting is that it can’t deal with circular references. When two objects point to each other, neither count ever reaches zero, so they never get garbage collected.
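
To watch a reference count move, here is a minimal sketch. It assumes the core B module is an acceptable way to peek at the count (svref_2object and its REFCNT method belong to B; the variable names are made up for illustration). It only compares counts against each other rather than relying on absolute values.

```perl
use strict;
use warnings;
use B qw(svref_2object);

# One hash, referenced by a single variable.
my $data = { name => 'victor' };
my $count_initial = svref_2object($data)->REFCNT;

# A second reference to the same hash bumps the count by one.
my $alias = $data;
my $count_with_alias = svref_2object($data)->REFCNT;

# Dropping that reference brings the count back down.
undef $alias;
my $count_after_undef = svref_2object($data)->REFCNT;

printf "initial=%d with_alias=%d after_undef=%d\n",
    $count_initial, $count_with_alias, $count_after_undef;
```

The count goes up by one while $alias exists and returns to its starting value once $alias is undefined.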

On the other hand, Ruby and Java use mark and sweep garbage collection. I personally have mixed feelings about it, since I don’t know exactly when my objects will be collected. A mark and sweep collector does not free anything for a period of time. When the heap fills up, it runs a collection: it marks every object still reachable from the program and sweeps away the rest. The downside is that you don’t know exactly when this happens, and if there are lots of objects to collect, it leads to “stutters” and unresponsiveness in the application. If you have ever used a Java Swing application you might have noticed these stutters; that is garbage collection taking place.

However, mark and sweep is not as gloomy as I have made it sound. Unlike reference counting, it can handle cyclic references, which is a huge boon. A lot of work has gone into mark and sweep collectors, specifically generational collectors that try to fix the unresponsiveness issue. Java currently uses a generational GC, and Ruby hopes to get one for the Ruby 2.0 interpreter. Ideally, a generational garbage collector would be the preferred GC for a long-running process.
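
To see why cycles are no problem for mark and sweep, here is a toy sketch of the idea. It is not how Ruby or Java actually implement their collectors; the %heap layout, the object names, and the mark and sweep subs are all made up for illustration. The "heap" is just a hash mapping each object name to the names it references. Marking walks everything reachable from the roots, and sweeping deletes whatever was not marked.

```perl
use strict;
use warnings;

# Toy heap: object name => names of the objects it references.
my %heap = (
    root_obj => ['a'],
    a        => ['b'],
    b        => ['a'],    # a <-> b is a cycle, but reachable from the root
    c        => ['d'],
    d        => ['c'],    # c <-> d is a cycle nobody can reach
);
my @roots = ('root_obj');

# Mark phase: follow references from the roots, remembering what we visit.
sub mark {
    my ($heap, $roots) = @_;
    my %marked;
    my @pending = @$roots;
    while (defined(my $name = shift @pending)) {
        next if $marked{$name}++;    # already visited, do not loop forever
        push @pending, @{ $heap->{$name} || [] };
    }
    return \%marked;
}

# Sweep phase: anything left unmarked is garbage.
sub sweep {
    my ($heap, $marked) = @_;
    my @freed = grep { !$marked->{$_} } sort keys %$heap;
    delete @{$heap}{@freed};
    return @freed;
}

my @freed = sweep(\%heap, mark(\%heap, \@roots));
print "freed: @freed\n";    # freed: c d
```

The unreachable c/d cycle is swept even though each object still references the other, while the reachable a/b cycle survives.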

With that bit of garbage collection background out of the way, let’s look at the life cycle of an instance under a reference counting garbage collector.

Here is an example of how reference counting works ideally:

foreach (1..5) {
  my $i = 5;
  $i += 5;
  print $i . "\n";
} # $i is freed here, when it goes out of scope at the end of each iteration.

Unlike with mark and sweep garbage collection, with reference counting you know exactly when your instance gets collected.
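
One way to see that timing for yourself is a destructor. The sketch below uses a hypothetical Tracked class (not part of any library) whose DESTROY method records when the object is freed.

```perl
use strict;
use warnings;

my @events;    # timeline of what happened, in order

{
    package Tracked;    # hypothetical class, only for this illustration
    sub new     { my ($class, $name) = @_; return bless { name => $name }, $class }
    sub DESTROY { my ($self) = @_; push @events, "destroyed $self->{name}" }
}

{
    my $obj = Tracked->new('inner');
    push @events, 'inside block';
}    # the only reference to the object disappears here
push @events, 'after block';

print "$_\n" for @events;
# inside block
# destroyed inner
# after block
```

The object is destroyed right at the closing brace, before the next statement runs.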

Here is a very simple problematic case for reference counting:

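foreach (1..5) {
	# both must hold a hash reference before they are linked,
	# otherwise only an undefined value gets copied and no cycle forms.
	my $left  = {};
	my $right = {};
	$left->{right} = $right;
	$right->{left} = $left;
} # since both hashes point to each other, they will never get collected.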
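
To confirm that a cycle like this really is never cleaned up while the program runs, here is a small sketch using a hypothetical Node class (made up for illustration) whose destructor records each object that gets freed.

```perl
use strict;
use warnings;

my @destroyed;    # names recorded by DESTROY as objects are freed

{
    package Node;    # hypothetical class, only for this illustration
    sub new     { my ($class, $name) = @_; return bless { name => $name }, $class }
    sub DESTROY { my ($self) = @_; push @destroyed, $self->{name} }
}

{
    my $parent = Node->new('parent');
    my $child  = Node->new('child');
    $parent->{child} = $child;
    $child->{parent} = $parent;    # cycle: each keeps the other's count above zero
}

# Both variables are out of scope, yet neither destructor has run.
my $destroyed_after_scope = scalar @destroyed;
print "destroyed after scope: $destroyed_after_scope\n";    # destroyed after scope: 0
```

In a long-running process, every pass through a block like this leaves another pair of objects behind.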

This is a fairly simple case where reference counting falls flat on its face. Usually this isn’t a problem, since most Perl programs are short-lived scripts. However, with frameworks such as Catalyst, which run as long-lived Perl processes, it becomes an issue quickly. Thankfully, with Perl it is extremely easy to nail down memory leaks, more so than with Ruby or Java. Enter Devel::Cycle and Devel::Peek. Devel::Cycle can be installed from CPAN, and Devel::Peek ships with Perl. Together they can help you track down a memory leak in a relatively short time.

use strict;
use warnings;
use Devel::Cycle;
use Devel::Peek;

foreach (1) {
	my $parent = { name => 'victor' };
	my $child  = { name => 'victor jr' };

	$parent->{child} = $child;
	$child->{parent} = $parent;

	# find_cycle comes from Devel::Cycle and prints
	# any circular references it finds to STDOUT.
	find_cycle($parent);

	# Dump comes from Devel::Peek. It is extra verbose and
	# prints the internals, including reference counts, to STDERR.
	Dump($parent);
}

# Sample output
# ibook:~/Desktop victori$ perl blah.pl     
# Cycle (1):  <-- find_cycle tells you literally where the cyclic reference is.
#   $A->{'child'} => %B                           
#   $B->{'parent'} => %A                           
# 
# SV = RV(0x1817898) at 0x1800ec8
#   REFCNT = 1
#   FLAGS = (PADBUSY,PADMY,ROK)
#   RV = 0x18006dc
#   SV = PVHV(0x1830980) at 0x18006dc
#     REFCNT = 2 <-- Notice the reference count of 2: one from $parent, one from the child's {parent} key. That second one is our leak.
#     FLAGS = (SHAREKEYS)
#     IV = 2
#     NV = 0
#     ARRAY = 0x404e60  (0:6, 1:2)
#     hash quality = 125.0%
#     KEYS = 2
#     FILL = 2
#     MAX = 7
#     RITER = -1
#     EITER = 0x0
#     Elt "name" HASH = 0xe6e17f14
#     SV = PV(0x1801460) at 0x1800ea4
#       REFCNT = 1
#       FLAGS = (POK,pPOK)
#       PV = 0x401730 "victor"
#       CUR = 6
#       LEN = 8
#     Elt "child" HASH = 0x33ec6b5
#     SV = RV(0x1817870) at 0x1832ca4
#       REFCNT = 1
#       FLAGS = (ROK)
#       RV = 0x1800484
#       SV = PVHV(0x18309b0) at 0x1800484
#         REFCNT = 2
#         FLAGS = (SHAREKEYS)
#         IV = 2
#         NV = 0
#         ARRAY = 0x404db0  (0:6, 1:2)
#         hash quality = 125.0%
#         KEYS = 2
#         FILL = 2
#         MAX = 7
#         RITER = -1
#         EITER = 0x0
#         Elt "parent" HASH = 0xa99c4651
#         SV = RV(0x18178a0) at 0x1832c44
#           REFCNT = 1
#           FLAGS = (ROK)
#           RV = 0x18006dc


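So how do we fix this? Quite simply: we weaken one of the references in the cycle using weaken() from Scalar::Util. A weak reference does not count toward the reference count of the thing it points to. Here is a proper way of patching up the memory leak we introduced in our program.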

use strict;
use warnings;
use Devel::Cycle;
use Devel::Peek;
use Scalar::Util qw/weaken/;

foreach (1) {
	my $parent = { name => 'victor' };
	my $child  = { name => 'victor jr' };

	# we weaken the reference at the parent and all is well.
	weaken($parent->{child} = $child);

	$child->{parent} = $parent;

	# find_cycle comes from Devel::Cycle. It skips weakened
	# references, so it no longer reports a cycle here.
	find_cycle($parent);

	# Dump comes from Devel::Peek. It is extra verbose and
	# prints the internals, including reference counts, to STDERR.
	Dump($parent);
}

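We weaken the reference at the parent level, so $parent->{child} no longer counts toward the child’s reference count, which drops back to 1. When the block ends, the child is freed, which releases its reference to the parent, and the parent is freed in turn. The memory leak is no more. The trade-off is that $parent->{child} becomes undef as soon as nothing else holds the child, which is why code often weakens the child’s back-reference to the parent instead.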
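
If you want to check the fix without reading a full Dump, here is a minimal sketch. It assumes a hypothetical Node class (made up for illustration) with a destructor that records freed objects, and uses isweak() from Scalar::Util to confirm the reference really was weakened.

```perl
use strict;
use warnings;
use Scalar::Util qw(weaken isweak);

my @destroyed;    # names recorded by DESTROY as objects are freed

{
    package Node;    # hypothetical class, only for this illustration
    sub new     { my ($class, $name) = @_; return bless { name => $name }, $class }
    sub DESTROY { my ($self) = @_; push @destroyed, $self->{name} }
}

my $was_weak;
{
    my $parent = Node->new('parent');
    my $child  = Node->new('child');

    $parent->{child} = $child;
    weaken($parent->{child});      # same effect as weaken($parent->{child} = $child)
    $child->{parent} = $parent;    # still a normal, strong reference

    $was_weak = isweak($parent->{child});
}

# Both objects are gone once the block ends.
my @freed = sort @destroyed;
print "weak: ", ($was_weak ? 'yes' : 'no'), "\n";    # weak: yes
print "freed: @freed\n";                             # freed: child parent
```

With the weakened reference in place, both destructors run as soon as the block ends.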

Hopefully this is a good primer for other Perl coders out there who are facing memory leaks in their long-running Perl scripts.