Thursday, September 25, 2014

Defunct Processes

Recently we faced an issue with one of our Tomcat servers. The server crashed with an Out of Memory error, and no matter what we tried, it would not come back up.

We then checked the process table:

hello:local-ews $ ps ux
root        6274  4.2    0.0         0        0    ?          Zl     Sep18  343:53 [java] <defunct>

We see that process 6274 has been around for some time. This is the process ID of the Tomcat instance. The state of the PID is "Zl", which means it has gone into a zombie state.

A defunct process, or zombie process, is a process that has completed execution but still has an entry in the process table. The process has actually finished its task and is waiting for its parent process to reap it. It will not go away until the parent collects it. At this point our only option was to find the parent process and bounce it. I obtained the parent process ID as follows:

hello:local-IEM-A2 $ ps -l 6274
0 Z  7282  6274     1  4  80   0 -     0 exit   ?        344:57 [java] <defunct>

The parent process ID was "1", which means the defunct process's parent is the init process.

We cannot kill a defunct process, because it is already dead; only its process-table entry remains. The only way to clear a defunct process is to kill or restart its parent process, but in this case the parent was the init process, so we had to bounce the machine to clear it.

A defunct process occupies very few resources: a slot in the process table and the resource (timing) information that the parent can ask for.

So what happens exactly?
UNIX and Linux maintain a parent-child relationship between processes. Whenever a child process dies, the parent receives a notification, and it is the parent's responsibility to acknowledge the child's death by calling the wait() system call. The return value of wait() is the PID of the child, together with its exit status. Shells like bash use this status when running subsequent commands and to set the special $? variable. As long as the parent hasn't called wait(), the system must keep the dead child in the global process list, because that is the only place where its process ID and exit status are stored.

The purpose of a "zombie" is really just for the system to remember the process ID, so that it can report the exit status to the parent process on request. If the parent "forgets" to collect its children, the zombie will stay undead forever.
This is why we need to find the parent process and bounce it to clear these defunct processes.
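The parent's side of this reaping protocol can be sketched in Java, where Process.waitFor() plays the role of the wait() system call (a minimal sketch; it assumes a POSIX sh is available on the PATH):

```java
public class ReapChild {
    public static void main(String[] args) throws Exception {
        // Spawn a child process that exits with status 3 (assumes a POSIX "sh").
        Process child = new ProcessBuilder("sh", "-c", "exit 3").start();
        // waitFor() is the analogue of wait(): it blocks until the child
        // terminates and collects its exit status, so no zombie is left behind.
        int status = child.waitFor();
        System.out.println("child exit status: " + status);
    }
}
```

Here the parent collects the child's exit status, exactly what a shell does before setting $?.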

More to Come, Happy Learning!

Tuesday, September 23, 2014

Garbage Collection

As we said earlier, objects are stored in the Young generation as well as the Old generation, and there must be a way to clean those objects up. This is where the Garbage Collector comes in.

How does a JVM allocate memory?
Memory allocations in the JVM are done using two techniques:
Bump-the-pointer: bump-the-pointer creates objects at the top of the Eden space, keeping track of the last object created. Whenever a request to create a new object comes in, the allocator checks the space above the last object to make sure there is enough room in Eden for the new one. If there is, it creates the object, and the new object becomes the one on top.

This can be thought of as pointer-increment allocation: the first object allocated has an 'address' (actually an offset into the segment) of zero. When you allocate an object, the memory manager knows how big the object is, allocates that much space within the segment (say 16 bytes), and then increments its "offset address" by that amount.
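The pointer-increment idea can be illustrated with a toy allocator (purely illustrative, not HotSpot code; the class and method names are invented):

```java
// Toy sketch of bump-the-pointer allocation: the allocator keeps a single
// offset into an Eden-like segment and "bumps" it forward by the size of
// each new object.
public class BumpAllocator {
    private final int capacity; // size of the segment in bytes
    private int top = 0;        // offset of the next free byte

    public BumpAllocator(int capacity) {
        this.capacity = capacity;
    }

    // Returns the offset of the new object, or -1 if the segment is full
    // (in a real JVM this is the point where a minor GC would be triggered).
    public int allocate(int size) {
        if (top + size > capacity) {
            return -1;
        }
        int address = top;
        top += size; // bump the pointer
        return address;
    }
}
```

Allocation is just one comparison and one addition, which is why this scheme is so fast in the single-threaded case.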

But this becomes complicated in the multi-threaded case: to keep object allocation in Eden thread-safe across multiple threads, a lock is unavoidable, and performance drops due to lock contention. The TLAB mechanism helps in this case.

TLABs (Thread-Local Allocation Buffers): each thread created in the JVM gets its own small portion of the Eden space. Because each thread can only allocate in its own TLAB, memory allocations can be done without a lock.
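TLAB behavior can be influenced with JVM flags if needed; for example (app.jar is a placeholder, and note that TLABs are already enabled by default in HotSpot):

```
java -XX:+UseTLAB -XX:TLABSize=512k -jar app.jar
```

On HotSpot, -XX:+PrintTLAB can additionally be used to log TLAB statistics when tuning these sizes.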

Garbage Collector
The Garbage Collector algorithm is another important place to tune the performance. These determine how the garbage collection process is executed.

A GC is considered good when it meets two criteria:
1. high throughput is achieved, and
2. small pause times are achieved.

The JVM executes GC in dedicated threads called "GC threads". Whenever GC threads are active, they compete with the application threads for CPU time.

Throughput: the throughput is the fraction of total execution time spent doing application work, as opposed to time spent in GC.

For example, a throughput of 99/100 means that on average the application threads are running 99 out of 100 seconds of program execution time, while the GC threads are only running for one second during the same time span.

This can be managed with ‑XX:GCTimeRatio=nnn, where the target fraction of time spent in GC is 1/(1+nnn); the default of 99 equates to 1% GC time.

The term "pause time" refers to a time span in which the application threads are paused completely in favor of the GC threads. For example, a pause time of 100 milliseconds during a GC means that no application thread was active during that 100 millisecond interval.

This can be managed by  ‑XX:MaxGCPauseMillis=<n>.
Garbage collection uses the term "stop-the-world". This means that while the JVM is doing a garbage collection, every thread except those needed for the GC stops its work. The interrupted tasks resume only after the GC has completed. So whenever a GC is being performed, all application threads are kept on hold until the GC is complete.

The JVM uses a form of garbage collector called a tracing collector, which operates by pausing the world around it, marking all root objects (objects referenced directly by running threads), and following their references, marking each object it sees along the way.

Java implements something called a generational garbage collector based upon the generational hypothesis assumption, which states that the majority of objects that are created are quickly discarded, and objects that are not quickly collected are likely to be around for a while.

Why do application threads need to be stopped?
A GC requires certain preconditions in order to run safely. For example, it must be guaranteed that application threads do not modify the state of objects while the GC threads are deciding which objects are still referenced and which are not. For this reason, the application threads must be stopped during a GC.

Even though stopping the threads keeps GC safe, it causes additional thread-scheduling costs: direct costs through context switches and indirect costs because of cache effects.

So a GC is never free; it always comes with some cost.

Minor and Major GC
GC is divided into two parts:

Minor GC (Young generation): most objects are created in the Young generation. These objects are created and soon become unreachable. When objects are cleaned out of this area, we say a "minor GC" has occurred.

Major GC (Old generation): objects that survive the Young generation are copied to the Old generation. As the Old generation fills up, a GC occurs there; this is called a major GC, and it cleans up the objects that can be garbage collected.

A GC in the Perm space is also considered a major GC.

How can the GC find objects that are eligible for garbage collection?
An object is considered garbage when it is no longer reachable through any chain of references from inside the application. A GC iterates over every reachable object, marking it live; whatever is left unmarked is garbage collected.
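Reachability can be observed from Java code with a WeakReference, which does not keep its referent alive (a sketch; note that System.gc() is only a hint, so the outcome is not guaranteed on every JVM):

```java
import java.lang.ref.WeakReference;

// Once the last strong reference to an object is dropped, the object becomes
// eligible for collection. A WeakReference lets us observe this without
// keeping the object alive ourselves.
public class ReachabilityDemo {
    public static void main(String[] args) throws InterruptedException {
        Object obj = new Object();
        WeakReference<Object> ref = new WeakReference<>(obj);
        obj = null; // the object is now unreachable from the application
        for (int i = 0; i < 10 && ref.get() != null; i++) {
            System.gc();     // request (not force) a collection
            Thread.sleep(10);
        }
        System.out.println("collected: " + (ref.get() == null));
    }
}
```

On HotSpot this typically prints "collected: true" after the first GC request, because the only remaining reference is weak.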

When the Young generation is full, a minor GC is triggered (smaller pause, more frequent). This has a small performance impact, since it cleans only a small memory area.

When the Old generation is full, a major GC is triggered (bigger pause, less frequent). This can have a significant performance impact, since it targets a much larger portion of the heap.

If the major GC fails to free the required memory, the JVM grows the heap to satisfy the allocation. This cycle repeats until the heap reaches the maximum memory set for the JVM, and then we see the Out Of Memory error.
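The end state is easy to provoke deliberately: ask for an allocation that can never fit, and the JVM throws OutOfMemoryError once GC and heap growth cannot help (a minimal sketch; the array size here exceeds the VM's array limit, so it fails regardless of -Xmx):

```java
public class OomDemo {
    public static void main(String[] args) {
        try {
            // Integer.MAX_VALUE longs (~16 GB) cannot be satisfied; the JVM
            // gives up and throws OutOfMemoryError instead of allocating.
            long[] huge = new long[Integer.MAX_VALUE];
            System.out.println(huge.length); // never reached
        } catch (OutOfMemoryError e) {
            System.out.println("allocation failed: OutOfMemoryError");
        }
    }
}
```

In a real application the error usually arrives after many smaller allocations have gradually exhausted the heap, rather than from one giant request.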

What if an object in the old generation needs to reference an object in the young generation?

To handle this case, the Old generation maintains something called a "card table", in which each card covers a 512-byte chunk of the old generation. If an object in the Old generation refers to an object in the Young generation, this is recorded in the table. When a GC starts in the Young generation, only the card table is searched to determine whether an object is referenced by something in the Old generation; this eliminates the need to scan every object in the Old generation for references into the Young generation. The card table is maintained by a write barrier. The write barrier is a small piece of code that makes minor GC faster; although it adds a little overhead to every reference store, the overall GC time is reduced.
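The card-table idea can be sketched in a few lines (a toy model with invented names, not the HotSpot implementation):

```java
// Toy model of a card table: the old generation is divided into 512-byte
// "cards"; the write barrier marks the card containing any reference field
// that is updated, so a minor GC only scans marked cards for old-to-young
// references instead of the whole old generation.
public class CardTable {
    static final int CARD_SIZE = 512;
    private final byte[] cards;

    public CardTable(int oldGenBytes) {
        cards = new byte[(oldGenBytes + CARD_SIZE - 1) / CARD_SIZE];
    }

    // Called by the write barrier whenever a reference field at 'offset'
    // in the old generation is stored to.
    public void markDirty(int offset) {
        cards[offset / CARD_SIZE] = 1;
    }

    public boolean isDirty(int offset) {
        return cards[offset / CARD_SIZE] == 1;
    }
}
```

A minor GC then walks only the dirty cards, which is why the extra store in the write barrier pays for itself.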

GC algorithms
There are 5 algorithms
  1. Serial GC
  2. Parallel GC
  3. Parallel Old GC (Parallel Compacting GC)
  4. Concurrent Mark & Sweep GC  (or "CMS")
  5. Garbage First (G1) GC
And these can be further divided into:

collectors operating on the young generation, and

collectors operating on the old generation.

Serial Collector (-XX:+UseSerialGC)
This is the default copying collector. It performs garbage collection in a single thread, stopping all other threads while it runs. Serial GC can gradually degrade application performance.

It is suitable for single-processor systems; since the work is not spread across threads, it cannot take advantage of multi-processor systems. This GC type dates from the days when desktop computers had only one CPU core.

Parallel GC (-XX:+UseParNewGC)
This is similar to the serial collector in that it is a stop-the-world collector, but it parallelizes the copying collection over multiple threads, which is more efficient than the single-threaded copying collector on multi-CPU machines.

Parallel GC (-XX:+UseParallelGC) &  Parallel Old GC  (‑XX:+UseParallelOldGC)
Parallel GC uses multiple threads to perform the GC and is therefore faster. It is useful when there is plenty of memory and a large number of cores. It is also called the "throughput GC".

The parallel collector comes in two forms. The Parallel collector uses multiple threads to perform minor GC on the young generation and a single thread for major GC on the old generation.

The other, the Parallel Old collector, which is the default from JDK 7, uses multiple threads for both minor GC and major GC.

This algorithm is considered the best choice on multi-processor systems where the goal is maximum throughput, for example batch processing such as printing reports or bills, or performing a large number of database queries. The image below shows the differences between Serial and Parallel GC.
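The throughput collectors can be selected and sized explicitly on the command line; for example (app.jar is a placeholder):

```
java -XX:+UseParallelGC -XX:+UseParallelOldGC -XX:ParallelGCThreads=4 -jar app.jar
```

-XX:ParallelGCThreads caps the number of GC worker threads, which is worth setting when several JVMs share one machine.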

Concurrent Mark & Sweep GC (-XX:+UseConcMarkSweepGC)
This GC method is one of the more complex ones available. As we learned earlier, the throughput collectors always pause the application threads during GC. In contrast, the CMS collector runs alongside the application threads and causes only short pauses. It is sometimes called the Concurrent Low Pause Collector.

A GC cycle of the CMS collector consists of six phases. Four of the phases run concurrently with the application, while the other two require the application threads to be stopped.

- Initial Mark: the application threads are paused, and all objects directly reachable from the root objects (GC roots) are marked as live.
- Concurrent Mark: the application threads are restarted. Objects reachable from the objects identified in the first phase are also marked as live.
- Concurrent Preclean: this phase accounts for objects that were updated or promoted during the second phase, as well as any newly allocated objects. It may run multiple times, and it also tries to ensure a certain amount of free space in Eden.
- Remark: the application threads are stopped once more so that any reference changes that happened during the preclean phase can be accounted for. Stopping the threads ensures a correct view of the referenced objects before the actual cleaning takes place.
- Concurrent Sweep: all objects that are no longer referenced are removed from the heap.
- Concurrent Reset: the collector does some housekeeping to leave a clean state for the next GC run.
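CMS can be enabled and told when to start a concurrent cycle; for example (app.jar is a placeholder; 70 here means start a CMS cycle once the old generation is 70% occupied):

```
java -XX:+UseConcMarkSweepGC -XX:CMSInitiatingOccupancyFraction=70 -XX:+UseCMSInitiatingOccupancyOnly -jar app.jar
```

Starting CMS early enough is the main defense against the concurrent mode failure described below.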

Concurrent Mode Failure
At some point the CMS collector may not be able to keep up with the needs of the application, and a full GC is needed. This is called a Concurrent Mode Failure. It occurs when there is not enough space in the tenured generation to promote objects.

CMS does not collect the permgen space by default; it requires the -XX:+CMSClassUnloadingEnabled flag for the perm space to be garbage collected.
In most cases a full GC is triggered if this flag is not enabled. The permgen space can hold references into the normal heap via things like class loaders, which means that until permgen is collected you may be leaking memory in the regular heap.

A separate article will be available on the G1 (Garbage First) collector.

More to Come, Happy Learning!

Analyzing a JVM Crash

A crash, or any fatal error, can cause the JVM to terminate. A bug in the JVM itself, a bug in a third-party library we use, or a bug in a native API can all cause JVM crashes. The JVM can also crash due to external factors like resource exhaustion.

In this article we will see how to analyze a JVM crash. Whenever the JVM crashes, a fatal error log file is generated, named hs_err_pid<pid>.log (where <pid> is the process ID of the process). In most cases this file is created in the working directory; otherwise it may be created in a temporary location: /tmp on Linux, or the directory specified by the TEMP environment variable on Windows.
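The location of the fatal error log can also be set explicitly with the -XX:ErrorFile flag, where %p expands to the process ID (app.jar is a placeholder):

```
java -XX:ErrorFile=/var/log/java/hs_err_%p.log -jar app.jar
```

Pinning the path this way avoids hunting through working and temporary directories after a crash.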

The exact format of the error log file varies between JVM versions and platforms.

The header of the log file summarizes the problem; it is also printed to standard output. An example looks like this:

# An unexpected error has been detected by Java Runtime Environment:
#  EXCEPTION_ACCESS_VIOLATION (0xc0000005) at pc=0x074639a3, pid=1232, tid=9240
# Java VM: Java HotSpot(TM) Client VM (11.0-b16 mixed mode, sharing windows-x86)
# Problematic frame:
# C [jp2iexp.dll+0x39a3]
# If you would like to submit a bug report, please visit:
# The crash happened outside the Java Virtual Machine in native code.
# See problematic frame for where to report the bug.

The first line tells us that the JVM crashed due to an unexpected error. The second line gives more detail:
EXCEPTION_ACCESS_VIOLATION (0xc0000005) at pc=0x074639a3, pid=1232, tid=9240

EXCEPTION_ACCESS_VIOLATION: the JVM crashed with an access violation; this is the exception name.
(0xc0000005): the exception code.
pc=0x074639a3: the instruction pointer at the time of the crash.
pid=1232: the process ID.
tid=9240: the thread ID.

The next line tells us about the JVM
# Java VM: Java HotSpot(TM) Client VM (11.0-b16 mixed mode, sharing windows-x86)
This says that the JVM is the HotSpot Client VM, running in mixed mode with class data sharing enabled.

The problematic frame is the stack frame in which the error occurred. The next line shows it:
C [jp2iexp.dll+0x39a3]
This is given as a library name and offset. It is possible to inspect the instructions that caused the crash without a debugger or core file: just use a disassembler to dump the instructions near the offset.

In the line above, the "C" marks a native C frame. The possible frame types are:
C - native C frame
J - compiled Java frame
j - interpreted Java frame
V - VM frame
v - VM-generated stub frame

The fatal error log also gives us thread details. The thread section describes the thread that crashed. If multiple threads crash at the same time, only one of them is printed.

The first part of the thread section tells us:
Current thread (0x0d0eb000):  JavaThread "main" [_thread_in_native, id=9240, stack (0x044e0000, 0x046e0000)]

Current thread (0x0d0eb000): a pointer to the JVM's internal thread structure.
JavaThread "main": the name of the Java thread that was running; this varies from crash to crash.
_thread_in_native: the thread state.

Some other thread states include,
_thread_uninitialized: the thread is not yet created. This should never happen unless there is memory corruption.
_thread_new: the thread has been created but has not yet started.
_thread_in_native: the thread is running native code; the problem is probably a bug in that native code.
_thread_in_vm: the thread is running VM code.
_thread_in_Java: the thread is running Java code (either interpreted or compiled).
_thread_blocked: the thread is blocked.
..._trans: any of the above states followed by "_trans" means the thread is transitioning to a different state.

id=9240: the thread ID; this is the native thread ID, also seen as "nid" in thread dumps.

The next line is the signal information like
siginfo: ExceptionCode=0xc0000005, reading address 0x00000000

This indicates that the exception code is 0xc0000005 (ACCESS_VIOLATION), and it occurred when the thread attempted to read address 0x00000000.

The next line after this will be the register information like,
EAX=0x00000000, EBX=0x28108828, ECX=0x004583a8, EDX=0x046dc46c
ESP=0x046dc448, EBP=0x046dc458, ESI=0x00000000, EDI=0x0d0eb000
EIP=0x074639a3, EFLAGS=0x00010246

These values give us the contents of the registers at the time of the fatal error; the set of registers shown depends on the underlying architecture.

After the register values, you will see the top of the stack and the instructions (opcodes) near the crashing pc. The error log prints 32 bytes, which can be decoded with a disassembler to see the instructions around the location of the crash:
Top of Stack: (sp=0x046dc448)
0x046dc448:   046dc46c 046dc454 28108828 00000000
0x046dc458:   046dc498 0e869d11 0d0eb114 046dc4a0
0x046dc468:   09eaabf8 00000000 0d0eb9ac 0d0eb9a4
0x046dc478:   046dc478 28108828 046dc4ac 28110a90
0x046dc488:   00000000 28108828 00000000 046dc4a8
0x046dc498:   046dc4d4 0e862f07 281104e0 0e8681a6
0x046dc4a8:   09eaabf8 00000000 2718dd98 046dc4b4
0x046dc4b8:   28108797 046dc4dc 28110a90 00000000

Instructions: (pc=0x074639a3)
0x07463993:   33 f6 85 c0 7c 3a 8b 45 fc 8d 55 14 52 89 75 14
0x074639a3:   8b 08 50 ff 91 a0 00 00 00 85 c0 7c 1a 8b 75 14

The next piece of information is the thread stack trace. This includes the addresses of the base and the top of the stack, the current stack pointer, and the amount of unused stack available to the thread. It is followed by the stack frames; up to 100 frames are printed. For C/C++ frames the library name may also be printed.

Stack: [0x044e0000, 0x046e0000], sp=0x046dc448, free space=2033k
Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=native code)
C [jp2iexp.dll+0x39a3]
j  sun.plugin2.main.server.IExplorerPlugin.getDocumentBase0(J)Ljava/lang/String;+0
j  sun.plugin2.main.server.IExplorerPlugin.getDocumentBase()Ljava/lang/String;+31

Java frames: (J=compiled Java code, j=interpreted, Vv=VM code)
j  sun.plugin2.main.server.IExplorerPlugin.getDocumentBase0(J)Ljava/lang/String;+0
j  sun.plugin2.main.server.IExplorerPlugin.getDocumentBase()Ljava/lang/String;+31
Note that there are two stack traces: a native stack trace and a Java stack trace.

The native stack trace shows all native function calls, leaving out the Java calls. The Java frames section prints the Java frames, including inlined methods, and skips the native frames. Depending on the crash, the log may omit the native frames but still print the Java frames.

Some fatal error logs also include a process section, with information about the process: its threads and its memory usage. It looks like this:

Java Threads: ( => current thread )

Other Threads:
=>0x00007faa08008800 (exited) JavaThread "Unknown thread" [_thread_in_vm, id=12588, stack(0x00007faa0c9d5000,0x00007faa0cad6000)]

Java Threads: ( => current thread )
  0x11726c00 JavaThread "JRE Worker Thread" [_thread_blocked, id=10764, stack(0x123d0000,0x124d0000)]
  0x11726800 JavaThread "JRE Output Reader Thread" [_thread_in_native,

Other Threads:
  0x0d14b800 VMThread [stack: 0x109b0000,0x10ab0000] [id=9664]
  0x0d171c00 WatcherThread [stack: 0x114e0000,0x115e0000] [id=8004]

In the lines above, the thread list includes the threads that the VM is aware of: all Java threads and some VM-internal threads, but not native threads created by the application that have never attached to the VM.

VM state:not at safepoint (normal execution)
The next line gives us the VM state. Some common VM states are:
1.    not at a safepoint - normal execution
2.    at safepoint - all threads are blocked in the VM, waiting for a special VM operation to complete
3.    synchronizing - a special VM operation is required, and the VM is waiting for all threads in the VM to block

VM Mutex/Monitor currently owned by a thread: None
The next line gives the mutex/monitor information. These are not Java object locks but JVM-internal locks. For each lock the log prints its name, its owner, and the addresses of the VM-internal mutex structure and its OS lock. In this case no thread owns any lock.
 def new generation   total 960K, used 773K [0x27100000, 0x27200000, 0x27230000)
  eden space 896K,  86% used [0x27100000, 0x271c15f8, 0x271e0000)
  from space 64K,   0% used [0x271e0000, 0x271e0000, 0x271f0000)
  to   space 64K,   0% used [0x271f0000, 0x271f0000, 0x27200000)
 tenured generation   total 4096K, used 0K [0x27230000, 0x27630000, 0x28100000)
   the space 4096K,   0% used [0x27230000, 0x27230000, 0x27230200, 0x27630000)
 compacting perm gen  total 12288K, used 749K [0x28100000, 0x28d00000, 0x2c100000)
   the space 12288K,   6% used [0x28100000, 0x281bb598, 0x281bb600, 0x28d00000)
    ro space 8192K,  63% used [0x2c100000, 0x2c613ae8, 0x2c613c00, 0x2c900000)
    rw space 12288K,  53% used [0x2c900000, 0x2cf683f8, 0x2cf68400, 0x2d500000)

The next section prints the heap details; their exact form depends on the GC in use and the memory configuration.

After the heap details, the next section describes the memory regions at the time of the crash. The memory map is useful when debugging a crash: it tells you which libraries are actually in use, their locations in memory, and the locations of the heap, stacks, and guard pages. The format of this section differs between operating systems. On Linux, the process memory map (/proc/<pid>/maps) is printed; on Windows, the base and end addresses of each library are printed.

Dynamic libraries:
0x01220000 - 0x012dc000   C:\Program Files (x86)\Internet Explorer\IEXPLORE.EXE
0x777a0000 - 0x77920000   C:\Windows\SysWOW64\ntdll.dll
0x76d70000 - 0x76e80000   C:\Windows\syswow64\kernel32.dll

Or in some cases

Dynamic libraries:
00400000-00401000 r-xp 00000000 fd:04 263750    /software/bin/java
0238e000-023af000 rw-p 00000000 00:00 0                            [heap]
3fae00000-800000000 rw-p 00000000 00:00 0
3be8802000-3be8a02000 ---p 00002000 fd:00 321        /lib64/

A memory map line is read as follows:
00400000-00401000: the memory region (start and end address)
r-xp: the permissions on the region: Read, Write, eXecute, and Private or Shared
00000000: the offset into the file
fd:04: the major and minor IDs of the device on which the file resides
263750: the inode number
/software/bin/java: the file name

Every library has two virtual memory regions: one for code and one for data. The code segment is marked r-xp (readable, executable, private); the data segment is rw-p (readable, writable, private).

The Java heap can be found in the entries listed as heap; these give us the actual memory regions reserved for the heap, with attributes rwxp.

Thread stacks usually show up in the memory map as two back-to-back regions: one with attribute ---p (the guard page) and one with rwxp (the actual stack space). Knowing the guard page size or stack size makes them easier to spot; in this memory map, for example, 3be8802000-3be8a02000 is a stack.
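The field layout described above is regular enough to split mechanically; a small sketch (the class name and approach are my own; it simply splits one maps line into its six fields):

```java
// Sketch: split one /proc/<pid>/maps line into its fields. The format shown
// above is: address-range perms offset dev inode [pathname].
public class MapsLine {
    public static String[] parse(String line) {
        // Split on runs of whitespace; the pathname, if present, is field 6.
        return line.trim().split("\\s+", 6);
    }

    public static void main(String[] args) {
        String line = "00400000-00401000 r-xp 00000000 fd:04 263750    /software/bin/java";
        String[] f = parse(line);
        System.out.println("range=" + f[0] + " perms=" + f[1]
                + " offset=" + f[2] + " dev=" + f[3]
                + " inode=" + f[4] + " path=" + f[5]);
    }
}
```

Anonymous mappings (like the heap and stacks) have no pathname, so a parser must tolerate lines with only five fields.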

After the memory map details, we can see VM arguments passed,

VM Arguments:
jvm_args: -XX:+AggressiveOpts -XX:+UseLargePages -Xmn10g -Xms26g -Xmx26g
java_command: <unknown>
Launcher Type: SUN_STANDARD

The next section lists the environment variables:
Environment Variables:

The last section is about the system; it gives us system statistics at the time of the crash, such as:

OS:Red Hat Enterprise Linux Server release 6.4 (Santiago)

uname:Linux 2.6.32-358.23.2.el6.x86_64 #1 SMP Sat Sep 14 05:32:37 EDT 2013 x86_64
libc:glibc 2.12 NPTL 2.12
rlimit: STACK 10240k, CORE 0k, NPROC 32768, NOFILE 8192, AS infinity
load average:0.00 0.01 0.00

MemTotal:        1922716 kB
MemFree:          177136 kB
Buffers:           67908 kB
Cached:           226672 kB
SwapCached:        11612 kB
Active:          1093288 kB
Inactive:         472848 kB
Active(anon):     950380 kB
Inactive(anon):   324624 kB
Active(file):     142908 kB
Inactive(file):   148224 kB
Unevictable:        4208 kB
Mlocked:            4208 kB
SwapTotal:       4194296 kB
SwapFree:        4131324 kB
Dirty:               112 kB
Writeback:             0 kB
AnonPages:       1267744 kB
Mapped:            21132 kB
Shmem:                 0 kB
Slab:             139492 kB
SReclaimable:     111720 kB
SUnreclaim:        27772 kB
KernelStack:        2768 kB
PageTables:         7384 kB
NFS_Unstable:          0 kB
Bounce:                0 kB
WritebackTmp:          0 kB
CommitLimit:     5155652 kB
Committed_AS:    2018712 kB
VmallocTotal:   34359738367 kB
VmallocUsed:       12060 kB
VmallocChunk:   34359712480 kB
HardwareCorrupted:     0 kB
AnonHugePages:     75776 kB
HugePages_Total:       0
HugePages_Free:        0
HugePages_Rsvd:        0
HugePages_Surp:        0
Hugepagesize:       2048 kB
DirectMap4k:        8180 kB
DirectMap2M:     2088960 kB

CPU:total 1 (1 cores per cpu, 1 threads per core) family 6 model 13 stepping 3, cmov, cx8, fxsr, mmx, sse, sse2, sse3, lzcnt, tsc

processor      : 0
vendor_id      : AuthenticAMD
cpu family     : 6
model           : 13
model name   : QEMU Virtual CPU version (cpu64-rhel6)
stepping        : 3
cpu MHz                  : 2394.026
cache size      : 512 KB
fpu               : yes
fpu_exception : yes
cpuid level     : 4
wp               : yes
flags             : fpu de pse tsc msr pae mce cx8 apic mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 syscall nx lm up unfair_spinlock pni cx16 hypervisor lahf_lm abm sse4a
bogomips       : 4788.05
TLB size        : 1024 4K pages
clflush size     : 64
cache_alignment       : 64
address sizes : 48 bits physical, 48 bits virtual
power management:

Memory: 4k page, physical 1922716k(177136k free), swap 4194296k(4131324k free)

The output is operating system specific but in general includes the operating system version, CPU information, and summary information about the memory configuration.

On Linux and Solaris, the operating system information is obtained by reading /etc/*release.
On Linux, uname is called to get information about the kernel, and the C library and thread library versions are printed:

uname:Linux 2.6.32-358.23.2.el6.x86_64 #1 SMP Sat Sep 14 05:32:37 EDT 2013 x86_64
libc:glibc 2.12 NPTL 2.12

On Linux there are three types of threading libraries: linuxthreads (fixed stack), linuxthreads (floating stack), and NPTL. In our case it is NPTL. Knowing the type is useful when investigating crashes caused by the threading library.

The rlimit line reflects the resource limits also reported by the ulimit command on Linux:
rlimit: STACK 10240k, CORE 0k, NPROC 32768, NOFILE 8192, AS infinity
load average:0.00 0.01 0.00
ulimit reports the limits set for various resources. In the line above:
STACK: stack size (ulimit -s)
CORE: core dump size (ulimit -c)
NPROC: maximum user processes (ulimit -u)
NOFILE: maximum open files (ulimit -n)

CPU:total 1 (1 cores per cpu, 1 threads per core) family 6 model 13 stepping 3, cmov, cx8, fxsr, mmx, sse, sse2, sse3, lzcnt, tsc

The CPU information is also included in the fatal error log, and finally the memory information is given as:
Memory: 4k page, physical 1922716k(177136k free), swap 4194296k(4131324k free)
4k page: the page size
1922716k: total physical memory
177136k: free physical memory
4194296k: total swap space
4131324k: free swap space

If both physical memory and swap space are almost full, there's good reason to suspect the crash was due to running out of memory.

vm_info: Java HotSpot(TM) Client VM (11.0-b16) for windows-x86 JRE (1.6.0_11-b03), built on Nov 10 2008 02:15:12 by "java_re" with MS VC++ 7.1

The final section is the vm_info: a version string embedded in libjvm.so/jvm.dll. Every JVM build has its own unique vm_info string, so when you are in doubt about whether an hs_err*.log was generated by a particular JVM, check this version string.

I will update this article with more details on analyzing the error log file.

More to Come, Happy Learning!