Thursday, April 10, 2008

Garbage Collector

Don't give me that look... :) I'm talking about the JVM's (Java Virtual Machine's) garbage collector.

Automatic garbage collection is one of the most convenient features of Java. It goes a long way toward preventing memory leaks, unlike C or C++, where the programmer has to manually allocate and de-allocate memory with functions such as malloc, calloc, and free.

Let's learn the inner workings of the GC. Excerpts from an article from Sun:

GC def: The GC is a benevolent sentinel present in every JVM. Its role is to identify and free chunks of memory left unused by the currently running application, which should shed some light on the origin of the name "garbage collector." Despite the unflattering name, the GC is like Dilbert's garbage man who is very smart, even smarter than you sometimes.

During its lifecycle, an application creates a certain number of objects, that is to say a certain amount of data that consumes memory and lasts according to its role in the application. As a matter of fact, an application spawns a large number of short-lived objects during its life span. So you can understand that defining and controlling the lifecycle of every single object generated by an application demands a tremendous amount of work from developers.

A simple example should give you a better idea of the incredible number of objects that are born and then killed. When you open a plain text file in a text editor, in our case Jext, 342,997 objects are created and destroyed.

A memory leak is caused by objects, called undeads or zombies, left unused but still marked alive in memory. The more undeads wandering about, the more memory the application will need. The program eventually runs out of fresh memory and crashes. Good memory management is one of the most difficult issues in low-level languages like C or C++. Some high-level languages, like Java or Python, rely on a GC that cuts down the programmer's workload. So a developer supported by a GC only needs to take care of object creation.

Although extremely convenient, a GC is not a miracle tool and every so often does wicked things.
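
To make the "zombie" idea concrete, here is a minimal sketch in Java. The class and method names (ZombieRegistry, handleRequest, release) are made up for illustration and do not come from the Sun article. It shows that a GC can only reclaim objects that are no longer reachable, so a forgotten reference in a long-lived collection still behaves like a leak.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration: a registry that keeps objects reachable forever.
class ZombieRegistry {
    // A long-lived collection: everything added here stays reachable,
    // so the GC cannot reclaim it even if nobody uses it again.
    private static final List<byte[]> RETAINED = new ArrayList<>();

    static void handleRequest() {
        byte[] buffer = new byte[1024]; // only needed during this call
        RETAINED.add(buffer);           // bug: the reference outlives its use
    }

    static int retainedCount() {
        return RETAINED.size();
    }

    // The fix: drop the references once the objects are no longer needed.
    static void release() {
        RETAINED.clear();
    }

    public static void main(String[] args) {
        for (int i = 0; i < 1000; i++) {
            handleRequest();
        }
        System.out.println("Still reachable: " + retainedCount());
        release();
        System.out.println("After release: " + retainedCount());
    }
}
```

Once release() has run, the buffers are unreachable and become ordinary garbage that the GC is free to collect whenever it next runs.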



Parts of a JVM Heap:

The GC in the HotSpot JVM is also called the "generational garbage collector." As its name suggests, this GC distinguishes between several generations of objects. Within the virtual machine, objects are born, live, and die in a memory area known as the heap.

The heap itself is divided into two parts, each one corresponding to a given generation: the young space and the tenured space. The first hosts recent objects, also called children, while the second one holds objects with a long life span, also called ancestors. Next to the heap, the virtual machine contains another particular memory area, called the perm, in which the binary code of each class loaded by the currently executing program is archived. Although the perm is important to applications dynamically generating a lot of bytecode, like a J2EE server, it's unlikely you'll ever need to tweak it.
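
If you want to see these areas on your own machine, the standard java.lang.management API can list the memory pools of the running JVM. This is only a sketch: the class name MemoryPools is made up, and the pool names you get depend on the JVM version and the collector in use, so they may not match the terms used in this post word for word.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: lists the memory pools reported by the running JVM.
class MemoryPools {
    // Returns one line per pool: its type (heap or non-heap) and its name.
    static List<String> describePools() {
        List<String> lines = new ArrayList<>();
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            // Pool names are chosen by the JVM and differ between collectors.
            lines.add(pool.getType() + ": " + pool.getName());
        }
        return lines;
    }

    public static void main(String[] args) {
        for (String line : describePools()) {
            System.out.println(line);
        }
    }
}
```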


Both the tenured space and the young space contain a virtual space, a zone of memory available to the JVM but free of any data. That means that those spaces might grow and shrink with time.
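
A rough way to watch the heap grow and shrink is to ask java.lang.Runtime for its numbers. The sketch below assumes that "maximum minus currently committed" is a fair approximation of the virtual space described above; it is taken over the whole heap, not per generation, and the class name HeapHeadroom is made up.

```java
// Hypothetical helper: prints how much heap is used, committed, and still reservable.
class HeapHeadroom {
    // Bytes currently occupied by objects (committed heap minus free heap).
    static long usedBytes(Runtime rt) {
        return rt.totalMemory() - rt.freeMemory();
    }

    public static void main(String[] args) {
        Runtime rt = Runtime.getRuntime();
        long committed = rt.totalMemory(); // heap the JVM holds right now
        long max = rt.maxMemory();         // ceiling the heap may grow to
        System.out.println("used      = " + usedBytes(rt));
        System.out.println("committed = " + committed);
        System.out.println("max       = " + max);
        // Approximation of the "virtual" part: available to the JVM, not yet committed.
        System.out.println("headroom  = " + (max - committed));
    }
}
```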

Whenever a new object is allocated on the heap, the JVM puts it in the eden. Besides the eden, the young space holds two survivor spaces, only one of which is occupied at any time; the GC uses the one that remains free as a temporary storage bucket. When the young space gets overcrowded, a minor collection is done. A very simple copy algorithm is used that involves the free survivor space. During a minor collection, the GC runs through every object in both the eden and the occupied survivor space to determine which ones are still alive, in other words which are still referenced from elsewhere. Each one of them is then copied into the empty survivor space.
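
The copy algorithm is easier to picture with a toy model. The sketch below is not how HotSpot is implemented: the class YoungSpaceSketch is made up, the spaces are plain lists, and a boolean flag stands in for "still referenced". It also leaves out details such as moving long-lived objects to the tenured space.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model of a minor collection; all names are hypothetical.
class YoungSpaceSketch {
    static final class Obj {
        final String name;
        boolean live; // stands in for "still referenced by the application"
        Obj(String name, boolean live) {
            this.name = name;
            this.live = live;
        }
    }

    final List<Obj> eden = new ArrayList<>();
    List<Obj> occupiedSurvivor = new ArrayList<>();
    List<Obj> freeSurvivor = new ArrayList<>();

    // New objects always start in the eden.
    void allocate(String name, boolean live) {
        eden.add(new Obj(name, live));
    }

    void minorCollection() {
        // Copy every live object from the eden and the occupied survivor
        // space into the free survivor space.
        for (Obj o : eden) {
            if (o.live) freeSurvivor.add(o);
        }
        for (Obj o : occupiedSurvivor) {
            if (o.live) freeSurvivor.add(o);
        }
        // Whatever was not copied is garbage: both source areas are emptied.
        eden.clear();
        occupiedSurvivor.clear();
        // The two survivor spaces swap roles for the next collection.
        List<Obj> tmp = occupiedSurvivor;
        occupiedSurvivor = freeSurvivor;
        freeSurvivor = tmp;
    }

    public static void main(String[] args) {
        YoungSpaceSketch young = new YoungSpaceSketch();
        young.allocate("a", true);
        young.allocate("b", false);
        young.allocate("c", true);
        young.minorCollection();
        System.out.println("eden size: " + young.eden.size());
        System.out.println("survivors: " + young.occupiedSurvivor.size());
    }
}
```

The point of the design is that dead objects cost nothing: the collector only touches the survivors, and the rest of the space is reused in one go.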


In the tenured space the laws are different. Whenever more memory is needed, a major collection is done with the help of the Mark-Sweep-Compact algorithm. Though it's not complex, it's greedier than the copy algorithm. The GC runs through all the objects in the heap, marks the ones that are still reachable so that the rest can be reclaimed, and runs through the heap again to compact the remaining objects and avoid memory fragmentation. At the end of this cycle, all living objects sit side by side in the tenured space. Major collections are responsible for most of the slowdowns Java applications show from time to time. Unlike a minor collection, which is usually brief, a major collection stops the execution of the whole application for a noticeable time. So a good optimization trick is to reduce the burden of the Mark-Sweep-Compact algorithm.
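
Here is a toy version of the mark, sweep, and compact phases, just to show the idea. It assumes a heap modelled as an array of slots and a list of root references; the names (MarkSweepCompactSketch, Node, collect) are made up, and the real collector is far more involved.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Toy model of a major collection; all names are hypothetical.
class MarkSweepCompactSketch {
    static final class Node {
        final String name;
        final List<Node> refs = new ArrayList<>(); // outgoing references
        boolean marked;
        Node(String name) {
            this.name = name;
        }
    }

    // Mark: flag every object reachable from the roots.
    static void mark(List<Node> roots) {
        Deque<Node> pending = new ArrayDeque<>(roots);
        while (!pending.isEmpty()) {
            Node n = pending.pop();
            if (n.marked) continue;
            n.marked = true;
            pending.addAll(n.refs);
        }
    }

    // Sweep and compact: drop unmarked objects and slide the marked ones
    // to the front of the heap so they end up side by side.
    static int sweepAndCompact(Node[] heap) {
        int next = 0;
        for (int i = 0; i < heap.length; i++) {
            Node n = heap[i];
            if (n != null && n.marked) {
                n.marked = false; // reset the flag for the next cycle
                heap[next++] = n;
            }
        }
        for (int i = next; i < heap.length; i++) {
            heap[i] = null; // everything past the survivors is free space
        }
        return next; // number of live objects
    }

    static int collect(Node[] heap, List<Node> roots) {
        mark(roots);
        return sweepAndCompact(heap);
    }

    public static void main(String[] args) {
        Node a = new Node("a"), b = new Node("b"), c = new Node("c"), d = new Node("d");
        a.refs.add(c); // a keeps c alive; b and d are garbage
        Node[] heap = {a, b, c, d, null};
        List<Node> roots = new ArrayList<>();
        roots.add(a);
        int live = collect(heap, roots);
        System.out.println("live objects: " + live);
    }
}
```

Both passes visit the whole heap, which is why this algorithm costs more than the copy algorithm used in the young space.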


Parameters of JVM Heap:
-Xms{size} specifies the initial (minimum) size of the heap. This option is used to avoid frequent resizing of the heap when your application needs a lot of memory.
-Xmx{size} specifies the maximum size of the heap. This option is used mainly by server-side applications that sometimes need several gigs of memory. So the heap is allowed to grow and shrink between the two values defined by the -Xms and -Xmx flags.
-XX:NewRatio={number} specifies the size ratio between the tenured space and the young space. For instance, -XX:NewRatio=2 would yield a 64MB tenured space and a 32MB young space, together a 96MB heap.
-XX:SurvivorRatio={number} specifies the size ratio between the eden and one survivor space. With a ratio of 2 and a young space of 64MB, the eden will need 32MB of memory whereas each survivor space will use 16MB.
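
The arithmetic behind the two ratio flags can be written down in a few lines. This is a back-of-the-envelope sketch with a made-up class name (HeapSizing); a real JVM rounds and aligns these sizes, so the values it reports may differ slightly.

```java
// Hypothetical helper: works out generation sizes from the ratio flags.
class HeapSizing {
    // NewRatio = tenured / young, so young = heap / (NewRatio + 1).
    static long youngMb(long heapMb, int newRatio) {
        return heapMb / (newRatio + 1);
    }

    static long tenuredMb(long heapMb, int newRatio) {
        return heapMb - youngMb(heapMb, newRatio);
    }

    // SurvivorRatio = eden / one survivor space, and the young space is
    // eden plus two survivor spaces, so survivor = young / (SurvivorRatio + 2).
    static long survivorMb(long youngMb, int survivorRatio) {
        return youngMb / (survivorRatio + 2);
    }

    static long edenMb(long youngMb, int survivorRatio) {
        return youngMb - 2 * survivorMb(youngMb, survivorRatio);
    }

    public static void main(String[] args) {
        // The NewRatio example above: a 96MB heap with -XX:NewRatio=2.
        System.out.println("tenured  = " + tenuredMb(96, 2) + "MB"); // 64MB
        System.out.println("young    = " + youngMb(96, 2) + "MB");   // 32MB
        // The SurvivorRatio example above: a 64MB young space with -XX:SurvivorRatio=2.
        System.out.println("eden     = " + edenMb(64, 2) + "MB");     // 32MB
        System.out.println("survivor = " + survivorMb(64, 2) + "MB"); // 16MB each
    }
}
```

On the command line the flags are combined like this, where YourApp is a placeholder for your own main class: java -Xms96m -Xmx96m -XX:NewRatio=2 -XX:SurvivorRatio=2 YourApp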
