Tuesday, September 23, 2014

Garbage Collection

As we said earlier, objects are stored in the Young generation and the Old generation, so there has to be a way to clean up objects that are no longer needed. This is where the Garbage Collector comes in.

How does the JVM allocate memory?
The JVM allocates memory for new objects using two techniques:
Bump-the-pointer: with bump-the-pointer, objects are created at the top of the Eden space. The JVM keeps track of the last object created. Whenever a request to create a new object arrives, it checks whether the space remaining in Eden after the last object is large enough for the new object. If it is, the object is created there and becomes the new object at the top.

This can be considered a pointer-increment method: the first object allocated has an 'address' (actually an offset into the segment) of zero. When an object is allocated, the memory manager knows how big the object is, allocates that much space within the segment (say, 16 bytes), and then increments its "offset address" by that amount.
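The pointer-increment idea can be sketched in a few lines of Java. This is a minimal model that assumes a fixed-size region addressed by byte offsets. The class name BumpAllocator is hypothetical. The real HotSpot allocator is native code that also handles object headers and alignment.

```java
// A minimal sketch of bump-the-pointer allocation over a fixed-size region.
// BumpAllocator is a hypothetical illustration, not the JVM's allocator.
final class BumpAllocator {
    private final long capacity; // size of the Eden-like region in bytes
    private long top = 0;        // offset of the next free byte

    BumpAllocator(long capacity) {
        this.capacity = capacity;
    }

    /** Returns the offset of the new object, or -1 if the region is full. */
    long allocate(long size) {
        if (size <= 0) {
            throw new IllegalArgumentException("size must be positive");
        }
        if (top + size > capacity) {
            return -1; // not enough room: in a real JVM this would trigger a minor GC
        }
        long address = top; // the new object starts at the current top
        top += size;         // "bump" the pointer past the new object
        return address;
    }

    long used() {
        return top;
    }

    public static void main(String[] args) {
        BumpAllocator eden = new BumpAllocator(64);
        System.out.println(eden.allocate(16)); // 0
        System.out.println(eden.allocate(16)); // 16
        System.out.println(eden.allocate(40)); // -1 (only 32 bytes left)
    }
}
```

Allocation is just a comparison and an addition. That is why it is cheap as long as only one thread uses the pointer.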

This becomes complicated in the multi-threaded case. If several threads create objects in the Eden space, a lock is needed to keep allocation thread-safe, and performance drops because of lock contention. The TLAB method helps in this case.

TLABs (Thread-Local Allocation Buffers): each thread in the JVM gets its own small portion of the Eden space. Because each thread can only access its own TLAB, allocations inside it can be done without a lock.
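Below is a minimal sketch of TLAB-style allocation. It assumes a shared Eden addressed by byte offsets and a fixed TLAB size. Only handing out a new TLAB uses an atomic operation (AtomicLong.getAndAdd). Allocation inside a thread's own TLAB is a plain pointer bump. The class TlabEden and its parameters are hypothetical and do not represent HotSpot's implementation.

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical sketch of thread-local allocation buffers carved out of a shared Eden.
final class TlabEden {
    private final long capacity;
    private final long tlabSize;
    // Shared top of Eden: only touched when a thread needs a new TLAB.
    private final AtomicLong sharedTop = new AtomicLong();

    // Per-thread buffer: the thread's own bump pointer and the end of its TLAB.
    private static final class Tlab {
        long top;
        long end;
    }

    private final ThreadLocal<Tlab> tlab = ThreadLocal.withInitial(Tlab::new);

    TlabEden(long capacity, long tlabSize) {
        this.capacity = capacity;
        this.tlabSize = tlabSize;
    }

    /** Returns the object's offset in Eden, or -1 if Eden is exhausted. */
    long allocate(long size) {
        if (size <= 0 || size > tlabSize) {
            throw new IllegalArgumentException("size must be in (0, tlabSize]");
        }
        Tlab t = tlab.get();
        if (t.top + size > t.end) {
            // Slow path: claim a fresh TLAB with one atomic add (the only shared step).
            long start = sharedTop.getAndAdd(tlabSize);
            if (start + tlabSize > capacity) {
                return -1; // Eden is full: a real JVM would run a minor GC here
            }
            t.top = start;
            t.end = start + tlabSize;
        }
        // Fast path: plain bump-the-pointer with no lock, since no other thread uses this TLAB.
        long address = t.top;
        t.top += size;
        return address;
    }

    public static void main(String[] args) throws InterruptedException {
        TlabEden eden = new TlabEden(4096, 256);
        Runnable worker = () -> {
            for (int i = 0; i < 3; i++) {
                System.out.println(Thread.currentThread().getName() + " -> " + eden.allocate(16));
            }
        };
        Thread a = new Thread(worker, "t1");
        Thread b = new Thread(worker, "t2");
        a.start();
        b.start();
        a.join();
        b.join();
    }
}
```

Each thread gets offsets from its own 256-byte range. The two threads therefore never hand out the same address, even though neither takes a lock on the fast path.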

Garbage Collector
The garbage collection algorithm is another important area for performance tuning, because it determines how the garbage collection process is executed.

A GC can be considered good when it meets two criteria:
High throughput is achieved, and
Short pause times are achieved.

The JVM executes GC in dedicated threads called "GC threads". Whenever GC threads are active, they compete with the application threads for CPU time.

Throughput: the fraction of the total execution time during which the application threads, rather than the GC threads, are running.

For example, a throughput of 99/100 means that, on average, the application threads run for 99 out of every 100 seconds of program execution time, while the GC threads run for only one second during the same time span.

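This can be managed with -XX:GCTimeRatio=<N>, which sets a goal of spending at most 1/(1+N) of the total time in GC. The default of 99 (for the Parallel collector) equates to 1% GC time.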
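The relationship between the flag and the goal is simple arithmetic: GCTimeRatio=N means a GC-time goal of 1/(1+N). The sketch below computes that goal. It also estimates how the current JVM is doing, using the standard java.lang.management beans (GarbageCollectorMXBean.getCollectionTime() and RuntimeMXBean.getUptime()). The class name GcThroughput is hypothetical. The measured value is only an approximation, because the reported collection time may not equal the time the application was actually paused.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

// Hypothetical helper relating -XX:GCTimeRatio to a GC-time goal and measuring throughput.
final class GcThroughput {

    /** GC-time goal implied by -XX:GCTimeRatio=n, as a fraction of total time: 1/(1+n). */
    static double targetGcFraction(int gcTimeRatio) {
        return 1.0 / (1 + gcTimeRatio);
    }

    /** Throughput = 1 - gcTime / totalTime. */
    static double throughput(long gcMillis, long totalMillis) {
        if (totalMillis <= 0) {
            throw new IllegalArgumentException("totalMillis must be positive");
        }
        return 1.0 - (double) gcMillis / totalMillis;
    }

    public static void main(String[] args) {
        System.out.println("GCTimeRatio=99 -> GC goal " + targetGcFraction(99)); // 0.01
        long gcMillis = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            // getCollectionTime() returns -1 if the collector does not report it.
            gcMillis += Math.max(0, gc.getCollectionTime());
        }
        long uptime = Math.max(1, ManagementFactory.getRuntimeMXBean().getUptime());
        System.out.printf("Measured throughput so far: %.4f%n", throughput(gcMillis, uptime));
    }
}
```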

The term “pause time” refers to a time span in which the application threads are paused completely in favor of the GC threads. For example, a pause time of 100 milliseconds during a GC means that no application thread was active during that 100-millisecond interval.

A pause-time goal can be set with -XX:MaxGCPauseMillis=<n>. This is a goal rather than a guarantee: the collector tries to keep pauses at or below n milliseconds.
Garbage collection uses the term "stop-the-world". This means that while the JVM performs a stop-the-world collection, every thread except the threads needed for the GC stops its task. The interrupted tasks resume only after the GC task has completed, so all application threads are kept on hold until the GC is finished.
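One way to observe stop-the-world pauses from inside an application is a small "hiccup" probe. A thread sleeps for a short interval in a loop and records how much longer than requested each sleep took. The sketch below is a hypothetical illustration (PauseDetector is not a JDK class). Large gaps can also come from OS scheduling, so the result is only a hint and does not measure GC pauses alone.

```java
// Hypothetical hiccup probe: long gaps between wake-ups hint at stop-the-world pauses.
final class PauseDetector {
    // Holds the latest allocation so the JIT cannot optimize the garbage away.
    static volatile Object sink;

    /** Runs for about durationMillis and returns the largest extra delay (ms) seen by a sleeping thread. */
    static long maxHiccupMillis(long durationMillis, long sleepMillis) {
        long maxExtra = 0;
        long deadline = System.nanoTime() + durationMillis * 1_000_000L;
        while (System.nanoTime() - deadline < 0) {
            long before = System.nanoTime();
            try {
                Thread.sleep(sleepMillis);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve the interrupt and stop measuring
                break;
            }
            long elapsedMillis = (System.nanoTime() - before) / 1_000_000L;
            maxExtra = Math.max(maxExtra, elapsedMillis - sleepMillis);
        }
        return maxExtra;
    }

    public static void main(String[] args) {
        // Background thread that keeps allocating short-lived objects so that GCs happen.
        Thread allocator = new Thread(() -> {
            while (true) {
                sink = new byte[1024];
            }
        });
        allocator.setDaemon(true);
        allocator.start();
        System.out.println("Largest hiccup: " + maxHiccupMillis(2_000, 1) + " ms");
    }
}
```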

The JVM uses a form of garbage collector called a tracing collector. It works by pausing the world around it, marking all root objects (objects referenced directly by running threads, for example from their stacks) and following their references, marking each object it sees along the way.
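The marking idea, plus the "whatever is left over is garbage" step, can be modeled on a toy object graph. This is a minimal sketch: Obj, mark and garbage are hypothetical names, and a real collector traces raw heap memory rather than Java lists. The unreachable cycle between c and d is still found to be garbage, because tracing only asks whether an object is reachable from the roots.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Deque;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Set;

// Toy model of a tracing collector's mark step (hypothetical names).
final class TracingDemo {

    static final class Obj {
        final String name;
        final List<Obj> refs = new ArrayList<>();
        Obj(String name) { this.name = name; }
    }

    /** Marks every object reachable from the roots (iterative depth-first search). */
    static Set<Obj> mark(List<Obj> roots) {
        Set<Obj> marked = Collections.newSetFromMap(new IdentityHashMap<>());
        Deque<Obj> stack = new ArrayDeque<>(roots);
        while (!stack.isEmpty()) {
            Obj o = stack.pop();
            if (marked.add(o)) {         // first visit: mark it...
                for (Obj r : o.refs) {   // ...and follow its references
                    stack.push(r);
                }
            }
        }
        return marked;
    }

    /** Everything on the heap that was not marked is garbage. */
    static List<Obj> garbage(List<Obj> heap, Set<Obj> marked) {
        List<Obj> result = new ArrayList<>();
        for (Obj o : heap) {
            if (!marked.contains(o)) result.add(o);
        }
        return result;
    }

    public static void main(String[] args) {
        Obj a = new Obj("a"), b = new Obj("b"), c = new Obj("c"), d = new Obj("d");
        a.refs.add(b); // a -> b
        c.refs.add(d); // c -> d
        d.refs.add(c); // d -> c: a cycle, but unreachable from any root
        List<Obj> heap = List.of(a, b, c, d);
        Set<Obj> live = mark(List.of(a)); // only "a" is a root
        for (Obj o : garbage(heap, live)) {
            System.out.println("garbage: " + o.name); // prints c, then d
        }
    }
}
```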

The JVM implements a generational garbage collector based on the generational hypothesis. This hypothesis states that most objects are discarded soon after they are created, and objects that are not collected quickly are likely to be around for a while.

Why do application threads need to be stopped?
A GC requires certain preconditions in order to run safely. For example, it must be guaranteed that application threads do not modify the state of objects while the GC threads are deciding which objects are still referenced and which are not. For this reason, the application threads must be stopped during a GC.

Although GC is necessary, it adds costs for thread scheduling: direct costs through context switches and indirect costs through cache effects.

So GC does not come for free; its costs are not negligible.

Minor and Major GC
GC is divided into two kinds:

Minor GC (Young generation): most objects are created in the Young generation, and most of them become unreachable soon afterwards. When such objects are removed from this area, we say a "minor GC" has occurred.

Major GC (Old generation): objects that survive in the Young generation are copied to the Old generation. As the Old generation fills up, a GC called a Major GC occurs and cleans up the objects there that can be garbage collected.
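Here is a simplified model of promotion. Each surviving object carries an age, which is the number of minor GCs it has survived. Once the age reaches a threshold, the object is copied to the Old generation. HotSpot uses a similar age-based rule, tunable with -XX:MaxTenuringThreshold. PromotionDemo and its fields are hypothetical, and the live flag stands in for the tracing step.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of minor GC with age-based promotion to the old generation.
final class PromotionDemo {

    static final class YoungObj {
        final String name;
        boolean live = true; // stands in for "still reachable" (normally found by tracing)
        int age;             // number of minor GCs survived
        YoungObj(String name) { this.name = name; }
    }

    final List<YoungObj> young = new ArrayList<>();
    final List<String> old = new ArrayList<>();
    final int tenuringThreshold;

    PromotionDemo(int tenuringThreshold) {
        this.tenuringThreshold = tenuringThreshold;
    }

    /** Minor GC: drop dead objects, age the survivors, promote those old enough. */
    void minorGc() {
        List<YoungObj> survivors = new ArrayList<>();
        for (YoungObj o : young) {
            if (!o.live) continue;       // unreachable: its space is simply reused
            o.age++;
            if (o.age >= tenuringThreshold) {
                old.add(o.name);         // promoted (copied) to the old generation
            } else {
                survivors.add(o);        // stays in the young generation
            }
        }
        young.clear();
        young.addAll(survivors);
    }

    public static void main(String[] args) {
        PromotionDemo heap = new PromotionDemo(2);
        YoungObj temp = new YoungObj("temp");
        YoungObj cache = new YoungObj("cache");
        heap.young.add(temp);
        heap.young.add(cache);
        temp.live = false; // "temp" dies young
        heap.minorGc();    // cache survives once (age 1)
        heap.minorGc();    // cache reaches age 2 and is promoted
        System.out.println("old generation: " + heap.old); // [cache]
    }
}
```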

A GC that happens in the Perm space can also be considered a Major GC.

How does the GC find objects that are eligible for garbage collection?
An object is considered garbage when it can no longer be reached through any chain of references from the running application. The GC visits every reachable object; any objects left over are garbage and are collected.

When the Young generation is full, a minor GC is triggered (shorter pauses, more frequent). Its performance impact is small because it cleans only a small memory area.

When the Old generation is full, a major GC is triggered (longer pauses, less frequent). It can have a significant performance impact because it targets the entire heap.

If the major GC fails to free the required memory, the JVM expands the heap to satisfy the allocation. This cycle repeats until the heap reaches the maximum size set for the JVM (-Xmx). After that, a failed allocation results in an OutOfMemoryError.
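The collect-then-grow-then-fail sequence can be sketched as a toy model. The names, the fixed growth step and the byte counts are hypothetical. A real JVM sizes its heap with more elaborate policies, but the order of events matches the description: first a major GC, then heap expansion only if that was not enough, and finally an OutOfMemoryError once the maximum size is reached.

```java
// Hypothetical toy heap: collect first, then grow toward max, then fail.
final class HeapGrowthDemo {
    long committed;        // current heap size in bytes
    final long max;        // maximum heap size, like -Xmx
    final long growthStep; // how much the heap grows each time
    long liveBytes;        // bytes that stay reachable
    long garbageBytes;     // bytes a major GC can reclaim
    int majorGcs;

    HeapGrowthDemo(long initial, long max, long growthStep) {
        this.committed = initial;
        this.max = max;
        this.growthStep = growthStep;
    }

    long used() {
        return liveBytes + garbageBytes;
    }

    void allocate(long size, boolean staysLive) {
        if (used() + size > committed) {
            majorGcs++;
            garbageBytes = 0; // the major GC frees everything that is no longer live
        }
        // Still not enough room: grow the heap step by step, but never past max.
        while (used() + size > committed && committed < max) {
            committed = Math.min(max, committed + growthStep);
        }
        if (used() + size > committed) {
            throw new OutOfMemoryError("Java heap space (max " + max + " bytes reached)");
        }
        if (staysLive) {
            liveBytes += size;
        } else {
            garbageBytes += size;
        }
    }

    public static void main(String[] args) {
        HeapGrowthDemo heap = new HeapGrowthDemo(100, 300, 100);
        try {
            for (int i = 0; ; i++) {
                heap.allocate(40, true); // a leak: every object stays reachable
                System.out.println("after object " + i + ": committed=" + heap.committed);
            }
        } catch (OutOfMemoryError e) {
            System.out.println("OutOfMemoryError after " + heap.majorGcs + " major GCs");
        }
    }
}
```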

What if an object in the old generation needs to reference an object in the young generation?

To handle this case, the Old generation maintains a structure called the 'card table', in which each card covers a 512-byte chunk of the Old generation. If an object in the Old generation refers to an object in the Young generation, this is recorded in the card table. When a minor GC starts, only the card table is searched to determine whether an object is referenced by something from the Old generation. This eliminates the need to check the references of every object in the Old generation to find out whether it points into the Young generation. The card table is maintained with a write barrier, a small piece of code that runs on every reference store and allows minor GC to run faster. Though the barrier adds a bit of overhead, the overall GC time is reduced.
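Below is a minimal card-table sketch that assumes 512-byte cards, as mentioned above. The write barrier marks the card that covers an updated field as dirty, and a minor GC only scans the dirty cards. CardTable, writeBarrier and dirtyRangesAndClean are hypothetical names. In HotSpot, the barrier is generated inline rather than called as a Java method.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical card table over the old generation, one byte per 512-byte card.
final class CardTable {
    static final int CARD_SHIFT = 9; // 2^9 = 512 bytes per card
    static final byte CLEAN = 0, DIRTY = 1;

    private final byte[] cards;

    CardTable(long oldGenBytes) {
        cards = new byte[(int) ((oldGenBytes + (1 << CARD_SHIFT) - 1) >>> CARD_SHIFT)];
    }

    /** Write barrier: called after storing a reference into the field at fieldOffset. */
    void writeBarrier(long fieldOffset) {
        cards[(int) (fieldOffset >>> CARD_SHIFT)] = DIRTY;
    }

    /** Minor GC: returns the old-gen byte ranges [start, end) to scan, then cleans the table. */
    List<long[]> dirtyRangesAndClean() {
        List<long[]> ranges = new ArrayList<>();
        for (int i = 0; i < cards.length; i++) {
            if (cards[i] == DIRTY) {
                long start = (long) i << CARD_SHIFT;
                ranges.add(new long[] {start, start + (1L << CARD_SHIFT)});
                cards[i] = CLEAN;
            }
        }
        return ranges;
    }

    public static void main(String[] args) {
        CardTable table = new CardTable(4096); // 8 cards of 512 bytes
        table.writeBarrier(100);               // card 0
        table.writeBarrier(1100);              // card 2
        table.writeBarrier(1500);              // also card 2
        for (long[] r : table.dirtyRangesAndClean()) {
            System.out.println("scan old gen bytes " + r[0] + ".." + r[1]);
        }
    }
}
```

Three reference stores leave only two 512-byte ranges to scan, instead of the whole 4096-byte old generation.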

GC algorithms
There are five algorithms:
  1. Serial GC
  2. Parallel GC
  3. Parallel Old GC (Parallel Compacting GC)
  4. Concurrent Mark & Sweep GC  (or "CMS")
  5. Garbage First (G1) GC
These can be further divided into:

Collectors operating on young generation

Collectors operating on old generation
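To see which collectors a running JVM actually uses, and which memory pools (young or old generation) each one manages, you can query the standard java.lang.management API. Here is a short sketch. The exact bean names depend on the JVM version and the selected algorithm. For example, the Parallel collector reports "PS Scavenge" and "PS MarkSweep", and G1 reports "G1 Young Generation" and "G1 Old Generation". These names should not be hard-coded. The class name ListCollectors is hypothetical.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Lists the running JVM's collectors and the memory pools each one manages.
final class ListCollectors {

    static List<String> describeCollectors() {
        List<String> lines = new ArrayList<>();
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            lines.add(gc.getName()
                    + " manages " + Arrays.toString(gc.getMemoryPoolNames())
                    + ", collections=" + gc.getCollectionCount()
                    + ", timeMs=" + gc.getCollectionTime());
        }
        return lines;
    }

    public static void main(String[] args) {
        describeCollectors().forEach(System.out::println);
    }
}
```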

Serial Collector (-XX:+UseSerialGC)
This is the default copying collector on client-class machines. It performs garbage collection in a single thread, and during this process it stops all other threads. As a result, Serial GC can noticeably reduce application performance.

It is suited to single-processor systems. Because it uses only one thread, there is no communication overhead between threads. However, it cannot take advantage of multi-processor systems. This GC type was created when desktop computers had only one CPU core.

Parallel Copying GC, ParNew (-XX:+UseParNewGC)
This is similar to the serial collector in that it is a stop-the-world collector. However, it parallelizes the copying collection of the young generation over multiple threads, which is more efficient on multi-CPU machines than the original single-threaded copying collector.
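Parallelizing GC work means letting several GC threads share one stop-the-world task. The sketch below illustrates the idea on marking rather than copying, since copying is more involved. It uses a ForkJoinPool and claims each object with an atomic insert into a concurrent set. That way, only one thread processes an object even if several threads reach it at the same time. ParallelMark and its methods are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveAction;

// Hypothetical sketch: several worker threads mark a toy object graph in parallel.
final class ParallelMark {

    static final class Node {
        final List<Node> refs = new ArrayList<>();
    }

    static final class MarkTask extends RecursiveAction {
        private final Node node;
        private final Set<Node> marked;

        MarkTask(Node node, Set<Node> marked) {
            this.node = node;
            this.marked = marked;
        }

        @Override
        protected void compute() {
            if (!marked.add(node)) {
                return; // another thread already claimed this node
            }
            List<MarkTask> children = new ArrayList<>();
            for (Node child : node.refs) {
                children.add(new MarkTask(child, marked));
            }
            invokeAll(children); // children may be processed by other worker threads
        }
    }

    static Set<Node> markParallel(List<Node> roots, int threads) {
        Set<Node> marked = ConcurrentHashMap.newKeySet();
        ForkJoinPool pool = new ForkJoinPool(threads);
        try {
            for (Node root : roots) {
                pool.invoke(new MarkTask(root, marked));
            }
        } finally {
            pool.shutdown();
        }
        return marked;
    }

    public static void main(String[] args) {
        // Build a binary tree of 1,000 nodes, plus one node nothing points to.
        List<Node> nodes = new ArrayList<>();
        for (int i = 0; i < 1000; i++) nodes.add(new Node());
        for (int i = 0; i < 1000; i++) {
            if (2 * i + 1 < 1000) nodes.get(i).refs.add(nodes.get(2 * i + 1));
            if (2 * i + 2 < 1000) nodes.get(i).refs.add(nodes.get(2 * i + 2));
        }
        Node unreachable = new Node();
        Set<Node> live = markParallel(List.of(nodes.get(0)), 4);
        System.out.println("marked " + live.size() + ", unreachable marked? " + live.contains(unreachable));
    }
}
```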

Parallel GC (-XX:+UseParallelGC) & Parallel Old GC (-XX:+UseParallelOldGC)
Parallel GC uses multiple threads to perform GC, so collections finish faster on multi-core machines. This GC is useful when there is enough memory and a large number of cores. It is also called the "throughput GC".

The parallel collector comes in two forms. The first, the Parallel collector, uses multiple threads to perform minor GC on the young generation and a single thread to perform major GC on the old generation.

The other form, the Parallel Old collector, is enabled by default since JDK 7. It uses multiple threads for both minor and major GC.

This algorithm is considered the best choice on multi-processor systems when the greatest throughput is needed. Examples include batch processing such as printing reports or bills, or running a large number of database queries. The image below shows the differences between Serial and Parallel GC.

Concurrent Mark & Sweep GC (-XX:+UseConcMarkSweepGC)
This is one of the most complex GC methods available. As we learned earlier, the throughput collector pauses the application threads for the whole duration of each GC. In contrast, the CMS collector runs alongside the application threads and causes only a few short pauses. It is sometimes called the Concurrent Low Pause Collector.

A GC cycle of the CMS collector consists of six phases. Four of the phases run concurrently with the application, while the other two need to stop the application threads.

·         Initial Mark: the application threads are paused, and all objects that are directly reachable from the root objects (GC roots) are marked as live.
·         Concurrent Mark: the application threads are restarted. Objects that are reachable from the objects identified in the first phase are also marked as live.
·         Concurrent Pre-clean: in the third phase, the collector checks objects that were updated or promoted during the second phase, as well as any newly allocated objects. This phase can run multiple times, depending on how much of the Eden space is in use. It also checks whether objects are alive or dead.
·         Remark: the application threads are stopped once more so that the collector can find any reference changes that happened during the concurrent phases. Stopping the threads ensures a correct view of referenced objects before the actual cleaning takes place.
·         Concurrent Sweep: all objects that are no longer referenced are removed from the heap.
·         Concurrent Reset: the collector does some housekeeping to ensure a clean state for the next GC run.
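A single-threaded toy model can show why the Remark phase is necessary. During concurrent marking, the application moves the only reference to object w. The reference moves from an object that has not been scanned yet (b) into one that has already been scanned (a). Marking alone misses w. A write barrier records a as modified, and Remark rescans it before the sweep. The class and method names are hypothetical. The real CMS collector tracks such modifications with the card table rather than a set of objects.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Deque;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Set;

// Hypothetical, single-threaded model of CMS marking with a write barrier.
final class CmsCycleDemo {

    static final class Obj {
        final String name;
        final List<Obj> refs = new ArrayList<>();
        Obj(String name) { this.name = name; }
    }

    final Set<Obj> marked = Collections.newSetFromMap(new IdentityHashMap<>());
    final Set<Obj> dirty = Collections.newSetFromMap(new IdentityHashMap<>());
    private final Deque<Obj> work = new ArrayDeque<>();

    /** Application-side reference store; the write barrier remembers the modified object. */
    void storeRef(Obj from, Obj to) {
        from.refs.add(to);
        dirty.add(from);
    }

    /** Initial Mark (stop-the-world): mark the roots. */
    void initialMark(List<Obj> roots) {
        for (Obj root : roots) {
            if (marked.add(root)) work.push(root);
        }
    }

    /** One step of Concurrent Mark; returns false when nothing is left to scan. */
    boolean concurrentMarkStep() {
        if (work.isEmpty()) return false;
        Obj o = work.pop();
        for (Obj child : o.refs) {
            if (marked.add(child)) work.push(child);
        }
        return true;
    }

    /** Remark (stop-the-world): rescan the roots and every object modified during marking. */
    void remark(List<Obj> roots) {
        initialMark(roots);
        for (Obj d : dirty) {
            if (marked.contains(d)) work.push(d);
        }
        dirty.clear();
        while (concurrentMarkStep()) {
            // drain the work stack while the application is stopped
        }
    }

    /** Concurrent Sweep: everything unmarked is freed. */
    List<Obj> sweep(List<Obj> heap) {
        List<Obj> freed = new ArrayList<>();
        for (Obj o : heap) {
            if (!marked.contains(o)) freed.add(o);
        }
        return freed;
    }

    public static void main(String[] args) {
        Obj r = new Obj("root"), a = new Obj("a"), b = new Obj("b"), w = new Obj("w"), g = new Obj("g");
        r.refs.add(b);
        r.refs.add(a);
        b.refs.add(w);
        CmsCycleDemo gc = new CmsCycleDemo();
        gc.initialMark(List.of(r));
        gc.concurrentMarkStep();   // scans root, marks a and b
        gc.concurrentMarkStep();   // scans a
        gc.storeRef(a, w);         // the application moves the only reference to w...
        b.refs.remove(w);          // ...from unscanned b to already-scanned a
        while (gc.concurrentMarkStep()) {
            // finish concurrent marking: w is missed here
        }
        gc.remark(List.of(r));     // rescanning the dirty object a finds w
        for (Obj o : gc.sweep(List.of(r, a, b, w, g))) {
            System.out.println("swept " + o.name); // only g
        }
    }
}
```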

Concurrent Mode Failure
At times, the CMS collector cannot keep up with the needs of the application, and a stop-the-world Full GC is needed. This is called a concurrent mode failure. It occurs when there is not enough space in the tenured (Old) generation to promote objects.
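A toy timeline can show how a concurrent mode failure arises. Minor GCs keep promoting objects while a CMS cycle is still running. If the old generation fills up before the cycle frees space, promotion fails. In the sketch, the cycle starts when occupancy crosses a percentage threshold, a role similar to HotSpot's -XX:CMSInitiatingOccupancyFraction. All numbers and names are hypothetical. Unlike real CMS, the sketch frees all garbage at the end of a cycle, including garbage created while the cycle ran.

```java
// Hypothetical timeline model of CMS cycles racing against promotion.
final class ConcurrentModeFailureDemo {

    /** Returns the tick at which a concurrent mode failure happens, or -1 if none within maxTicks. */
    static int simulate(long oldCapacity, long promotedPerTick, long livePerTick,
                        int initiatingPercent, int cycleTicks, int maxTicks) {
        long live = 0;          // promoted bytes that stay reachable
        long garbage = 0;       // promoted bytes that died and wait for a CMS sweep
        int cycleRemaining = 0; // ticks left in the running CMS cycle (0 = idle)
        for (int tick = 1; tick <= maxTicks; tick++) {
            // A minor GC wants to promote survivors into the old generation.
            if (live + garbage + promotedPerTick > oldCapacity) {
                return tick; // no room: concurrent mode failure, a Full GC is needed
            }
            live += livePerTick;
            garbage += promotedPerTick - livePerTick;
            // Start a concurrent cycle once occupancy crosses the threshold.
            if (cycleRemaining == 0 && (live + garbage) * 100 >= (long) initiatingPercent * oldCapacity) {
                cycleRemaining = cycleTicks;
            }
            // The cycle runs alongside the application; when it ends, the garbage is gone.
            if (cycleRemaining > 0 && --cycleRemaining == 0) {
                garbage = 0;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        System.out.println("start at 50%: failure at tick " + simulate(1000, 100, 10, 50, 3, 20)); // -1
        System.out.println("start at 90%: failure at tick " + simulate(1000, 100, 10, 90, 3, 20)); // 11
    }
}
```

With these numbers, starting the cycle at 50% occupancy avoids a failure within 20 ticks. Starting at 90% leaves too little headroom, and promotion fails at tick 11.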

By default, CMS doesn't collect the PermGen space. The -XX:+CMSClassUnloadingEnabled flag must be enabled for PermGen to be collected.
In most cases, a Full GC is triggered if the flag is not enabled. PermGen can hold references into the normal heap via things like class loaders, so until PermGen is collected, you may be leaking memory in the regular heap.

A separate article will cover the Garbage First (G1) collector.

More to come. Happy learning!
