A crash can cause
the JVM to terminate. A Crash or even a fatal error can cause a Crash. A bug in
the API, a bug in third party library that we use or bug in a native api can
cause these JVM crashes. JVM can crash due to the external factors like
resource exhaustion etc.
In this article we
will see how we can analyze a JVM crash. Whenever a JVM is crashed a fatal
error log file is generated. The file is named as hs_err_pid<pid>.log (where
<pid> is the process id of the process). Most cases this file is
created in the same working directory and in some other cases the file may be
created in the temporary location. The file is created in the /tmp location in Linux
and in directory specified by TEMP environment variable in windows.
The format of the
error log file will change accordingly.
Header
The header in the
log file tells us about the problem. This is printed to standard output. An
example log file looks as
#
# An
unexpected error has been detected by Java Runtime Environment:
#
# EXCEPTION_ACCESS_VIOLATION (0xc0000005) at
pc=0x074639a3, pid=1232, tid=9240
#
#
Java VM: Java HotSpot(TM) Client VM (11.0-b16 mixed mode, sharing windows-x86)
#
Problematic frame:
# C [jp2iexp.dll+0x39a3]
#
# If
you would like to submit a bug report, please visit:
#
http://java.sun.com/webapps/bugreport/crash.jsp
# The
crash happened outside the Java Virtual Machine in native code.
# See
problematic frame for where to report the bug.
#
The first line
tells that the JVM crashed due to an Unexpected signal. The second line gives
some more details like,
EXCEPTION_ACCESS_VIOLATION
(0xc0000005) at pc=0x074639a3, pid=1232, tid=9240
EXCEPTION_ACCESS_VIOLATION:
This was crashed with an Access violation exception and this is the signal
name.
(0xc0000005):
Signal Number
0x074639a3:
Instruction pointer
pid=1232: Process
ID
tid=9240: Thread ID
The next line tells
us about the JVM
#
Java VM: Java HotSpot(TM) Client VM (11.0-b16 mixed mode, sharing windows-x86)
This says that the
JVM is running in Client mode, mixed and sharing mode.
The Problematic
frame is the function frame where the error occurred. The next line gives us
like
C [jp2iexp.dll+0x39a3]
This is represented
as a Library name and offset. It's possible to inspect the instructions that
caused the crash without a debugger or core file. Just use a disassembler to dump instructions
near the offset.
In the above line,
the “C” is called a Native C Frame.
We do have some other frames as
C -Native C frame
J –Other frame types including compiled Java frames
j -Interpreted Java frames
V -VM frames
v –VM generated stub frame
Thread
The fatal error log
will also give us the Thread details. The thread section in the log file gives
us information about the thread that crashed. There may be cases where multiple
threads crash, at this point only one thread is printed.
The first part in
the thread section tell us
Current
thread (0x0d0eb000): JavaThread
"main" [_thread_in_native, id=9240, stack (0x044e0000, 0x046e0000)]
Current thread
(0x0d0eb000): The Thread pointer is a pointer to the internal representation to
the JVM internal thread structure.
JavaThread
"main": The Java Thread that is currently running. This can change
accordingly.
_thread_in_native:
The Thread State.
Some other thread
states include,
_thread_uninitialized:
Thread is not created. This should never happen unless there's
memory corruption
_thread_new: Thread
has been created but it has not yet started.
_thread_in_native: Thread
is running native code. Probably a bug in native code.
_thread_in_vm: Thread
is running VM code.
_thread_in_Java: Thread
is running (either interpreted or compiled) Java code.
_thread_blocked: Thread
is blocked.
..._trans: If you
see any of the above states but followed by "_trans", it means
the thread is
changing to a different state.
id=9240: The thread
ID is the native Thread ID also seen as nid in Thread dumps.
The next line is
the signal information like
siginfo:
ExceptionCode=0xc0000005, reading address 0x00000000
This indicates that
the exception code is 0xc0000005 (ACCESS_VIOLATION), and it occurred when the
thread attempted to read address 0x00000000.
The next line after
this will be the register information like,
Registers:
EAX=0x00000000,
EBX=0x28108828, ECX=0x004583a8, EDX=0x046dc46c
ESP=0x046dc448,
EBP=0x046dc458, ESI=0x00000000, EDI=0x0d0eb000
EIP=0x074639a3,
EFLAGS=0x00010246
These register
gives us the context of the register at the time of the fatal error and these
will change depending on the underlying architecture.
After the register
values, you will see the top of stack and the instructions/opcodes near the
crashing pc. The error log prints 32 bytes which can be decoded using a
disassembler to see the instructions around the location of the crash. These
can be seen as
Top
of Stack: (sp=0x046dc448)
0x046dc448: 046dc46c 046dc454 28108828 00000000
0x046dc458: 046dc498 0e869d11 0d0eb114 046dc4a0
0x046dc468: 09eaabf8 00000000 0d0eb9ac 0d0eb9a4
0x046dc478: 046dc478 28108828 046dc4ac 28110a90
0x046dc488: 00000000 28108828 00000000 046dc4a8
0x046dc498: 046dc4d4 0e862f07 281104e0 0e8681a6
0x046dc4a8: 09eaabf8 00000000 2718dd98 046dc4b4
0x046dc4b8: 28108797 046dc4dc 28110a90 00000000
Instructions:
(pc=0x074639a3)
0x07463993: 33 f6 85 c0 7c 3a 8b 45 fc 8d 55 14 52 89 75
14
0x074639a3: 8b 08 50 ff 91 a0 00 00 00 85 c0 7c 1a 8b 75
14
The next line of
information will be the thread stack trace. This includes the addresses of the base and the top of
the stack, the current stack pointer, and the amount of unused stack available
to the thread. This is followed by the stack frames and up to 100 frames are
printed. For C/C++ frames the library name may also be printed.
Stack:
[0x044e0000, 0x046e0000], sp=0x046dc448, free space=2033k
Native
frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=native code)
C [jp2iexp.dll+0x39a3]
j
sun.plugin2.main.server.IExplorerPlugin.getDocumentBase0(J)Ljava/lang/String;+0
j
sun.plugin2.main.server.IExplorerPlugin.getDocumentBase()Ljava/lang/String;+31
……..
……..
Java
frames: (J=compiled Java code, j=interpreted, Vv=VM code)
j
sun.plugin2.main.server.IExplorerPlugin.getDocumentBase0(J)Ljava/lang/String;+0
j
sun.plugin2.main.server.IExplorerPlugin.getDocumentBase()Ljava/lang/String;+31
…….
If we see we have 2
stack traces one was a Native Stack trace and other was a Java Stack Trace.
The Native Stack
trace gives us the native thread information showing all function calls leaving
the Java calls. The second stack is Java frames which prints the
java frames including the in-lined methods. It skips the native frames. Sometimes
according to the crash it will not print the native thread details but can
print the java details.
Process
Some of the fatal
error log file will generate the Process Information too. The process section
gives us information about the process which includes the thread details and
memory usage. We can see the information as,
Java
Threads: ( => current thread )
Other
Threads:
=>0x00007faa08008800
(exited) JavaThread "Unknown thread" [_thread_in_vm, id=12588,
stack(0x00007faa0c9d5000,0x00007faa0cad6000)]
Java
Threads: ( => current thread )
0x11726c00 JavaThread "JRE 1.6.0.11
Worker Thread" [_thread_blocked, id=10764, stack(0x123d0000,0x124d0000)]
0x11726800 JavaThread "JRE 1.6.0.11
Output Reader Thread" [_thread_in_native,
……
Other
Threads:
0x0d14b800 VMThread [stack:
0x109b0000,0x10ab0000] [id=9664]
0x0d171c00 WatcherThread [stack:
0x114e0000,0x115e0000] [id=8004]
In the above line
we can see, The
thread list includes the threads that the VM is aware of. This includes all
Java threads and some VM internal threads, but does not include any native
threads created by the user application that have not attached to the VM
VM
state:not at safepoint (normal execution)
The next line gives
us the VM state. This normally tells us the state of the VM. Some normal VM
states include
1.
not at a safepoint
– normal execution
2.
at safepoint –
indicates that all threads are blocked in the VM waiting for a special vm
operation to complete
3.
synchronizing –
indicates that a special vm operation is required and the VM is waiting for all
threads in the vm to block.
VM
Mutex/Monitor currently owned by a thread: None
The next line gives
us the Mutex/Monitor Information. These are not java Object locks but JVM
internal locks. For each lock it prints the name of the lock, its
owner, and the addresses of a VM internal mutex structure and its OS lock. In
this case the lock is owned by None.
Heap
def new generation total 960K, used 773K [0x27100000,
0x27200000, 0x27230000)
eden space 896K, 86% used [0x27100000, 0x271c15f8, 0x271e0000)
from space 64K, 0% used [0x271e0000, 0x271e0000, 0x271f0000)
to
space 64K, 0% used [0x271f0000,
0x271f0000, 0x27200000)
tenured generation total 4096K, used 0K [0x27230000,
0x27630000, 0x28100000)
the space 4096K, 0% used [0x27230000, 0x27230000, 0x27230200,
0x27630000)
compacting perm gen total 12288K, used 749K [0x28100000,
0x28d00000, 0x2c100000)
the space 12288K, 6% used [0x28100000, 0x281bb598, 0x281bb600,
0x28d00000)
ro space 8192K, 63% used [0x2c100000, 0x2c613ae8, 0x2c613c00,
0x2c900000)
rw space 12288K, 53% used [0x2c900000, 0x2cf683f8, 0x2cf68400,
0x2d500000)
The next section
prints about the heap details and they depend upon the GC and memory used.
After the heap
details, the next section tells us about the memory regions at the time of
crash. A Memory map is also generated which is use-full when debugging some
crashes, as it can tell you what libraries are actually being used, their
location in memory, as well as the location of heap, stack and guard pages.
This memory map section will be different depending on the Operating Systems. On
Linux the process memory map (/proc/<pid>/maps) is printed. On Windows,
the base and end addresses of each library are printed.
Dynamic
libraries:
0x01220000
- 0x012dc000 C:\Program Files
(x86)\Internet Explorer\IEXPLORE.EXE
0x777a0000
- 0x77920000 C:\Windows\SysWOW64\ntdll.dll
0x76d70000
- 0x76e80000 C:\Windows\syswow64\kernel32.dll
……..
Or in some cases
Dynamic
libraries:
00400000-00401000
r-xp 00000000 fd 04 263750 /software/bin/java
0238e000-023af000
rw-p 00000000 00:00 0 [heap]
3fae00000-800000000
rw-p 00000000 00:00 0
3be8802000-3be8a02000
---p 00002000 fd:00 321 /lib64/libdl-2.12.so
…..
The reading of
memory map is as follows
00400000-00401000: memory region
r-xp: Permission on the region , we have Read,Write,eXecute,Private and Share.
Fd: File Off set number
04:Major ID and minor ID of the device from where the
file is located or accessed.
263750: Inode number
/software/bin/java: File Name
Every library has
two virtual memory regions. One for code and one for data. The code
segment is marked
with r-xp (readable, executable, private); data segment is rw-p
(readable,
writable, private).
The Java heap
details can be obtained with the map listed as heap. This gives us the the
actual memory regions reserved for heap with attributes rwxp.
Thread stacks
usually show up in the memory map as two back-to-back regions, one with
attribute ---p (guard
page) and one with rwxp (actual stack space). Of course, knowing the
guard page size or
stack size would make the job a bit easier. For example, in this memory
map, 3be8802000-3be8a02000
is stack.
After the memory
map details, we can see VM arguments passed,
VM
Arguments:
jvm_args:
-XX:+AggressiveOpts -XX:+UseLargePages -Xmn10g -Xms26g -Xmx26g
java_command:
<unknown>
Launcher
Type: SUN_STANDARD
The next section
tells us about the Environment variables passed
Environment
Variables:
JAVA_HOME=/software/java64/current
PATH=/usr/local/bin:/bin:/usr/bin:/usr/local/.............
LD_LIBRARY_PATH=/software/jboss/ews/current/lib:
SHELL=/bin/bash
System
The last section is
about the System which gives us the information about the system statistics
when the crash occurred like
OS:Red
Hat Enterprise Linux Server release 6.4 (Santiago)
uname:Linux
2.6.32-358.23.2.el6.x86_64 #1 SMP Sat Sep 14 05:32:37 EDT 2013 x86_64
libc:glibc
2.12 NPTL 2.12
rlimit:
STACK 10240k, CORE 0k, NPROC 32768, NOFILE 8192, AS infinity
load
average:0.00 0.01 0.00
/proc/meminfo:
MemTotal: 1922716 kB
MemFree: 177136 kB
Buffers: 67908 kB
Cached: 226672 kB
SwapCached: 11612 kB
Active: 1093288 kB
Inactive: 472848 kB
Active(anon): 950380 kB
Inactive(anon): 324624 kB
Active(file): 142908 kB
Inactive(file): 148224 kB
Unevictable: 4208 kB
Mlocked: 4208 kB
SwapTotal: 4194296 kB
SwapFree: 4131324 kB
Dirty: 112 kB
Writeback: 0 kB
AnonPages: 1267744 kB
Mapped: 21132 kB
Shmem: 0 kB
Slab: 139492 kB
SReclaimable: 111720 kB
SUnreclaim: 27772 kB
KernelStack: 2768 kB
PageTables: 7384 kB
NFS_Unstable: 0 kB
Bounce: 0 kB
WritebackTmp: 0 kB
CommitLimit: 5155652 kB
Committed_AS: 2018712 kB
VmallocTotal: 34359738367 kB
VmallocUsed: 12060 kB
VmallocChunk: 34359712480 kB
HardwareCorrupted: 0 kB
AnonHugePages: 75776 kB
HugePages_Total: 0
HugePages_Free: 0
HugePages_Rsvd: 0
HugePages_Surp: 0
Hugepagesize: 2048 kB
DirectMap4k: 8180 kB
DirectMap2M: 2088960 kB
CPU:total
1 (1 cores per cpu, 1 threads per core) family 6 model 13 stepping 3, cmov,
cx8, fxsr, mmx, sse, sse2, sse3, lzcnt, tsc
/proc/cpuinfo:
processor : 0
vendor_id : AuthenticAMD
cpu
family : 6
model : 13
model
name : QEMU Virtual CPU version (cpu64-rhel6)
stepping : 3
cpu
MHz : 2394.026
cache
size : 512 KB
fpu : yes
fpu_exception : yes
cpuid
level : 4
wp : yes
flags : fpu de pse tsc msr pae mce cx8
apic mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 syscall nx lm up
unfair_spinlock pni cx16 hypervisor lahf_lm abm sse4a
bogomips : 4788.05
TLB
size : 1024 4K pages
clflush
size : 64
cache_alignment : 64
address
sizes : 48 bits physical, 48 bits virtual
power
management:
Memory:
4k page, physical 1922716k(177136k free), swap 4194296k(4131324k free)
The output is
operating system specific but in general includes the operating system version,
CPU information, and summary information about the memory configuration.
In Linux or solaris
, the the operating system information obtained by reading /etc/*release.
The uname In linux is called to get
information about the kernel and the Thread library
uname:Linux
2.6.32-358.23.2.el6.x86_64 #1 SMP Sat Sep 14 05:32:37 EDT 2013 x86_64
libc:glibc
2.12 NPTL 2.12
On linux there are
3 types of threading libraries which areinuxthreads (fixed stack), linuxthreads
(floating stack) and NPTL. In our case it’s the NTPL. Knowing this type is use-full
when we see crashes that come due to the threading library.
The rlimit is
actually a wrapper for the ulimit command in linux
rlimit:
STACK 10240k, CORE 0k, NPROC 32768, NOFILE 8192, AS infinity
load
average:0.00 0.01 0.00
The ulimit gives us
the information about the limits that are set for various elements. In the
above line
Stack: Stack Size (ulimit
–s)
CORE: Core dump
size(ulimit –c)
NPROC: Maximum user
process (Ulimit –u)
NOFILE: Maximum
Open files (ulimit –n)
CPU:total
1 (1 cores per cpu, 1 threads per core) family 6 model 13 stepping 3, cmov,
cx8, fxsr, mmx, sse, sse2, sse3, lzcnt, tsc
The CPU information
is also provided with the fatal error log and last the memory information is
provided as
Memory:
4k page, physical 1922716k(177136k free), swap 4194296k(4131324k free)
4K: Page Size
1922716k: Physical memory
177136k: un-used Physical Memory
4194296k: Total Swap memory
4131324k: Un-used Swap Memory
If both physical memory and swap space are almost full,
there's good reason to suspect the crash was due to running out of memory.
vm_info:
Java HotSpot(TM) Client VM (11.0-b16) for windows-x86 JRE (1.6.0_11-b03), built
on Nov 10 2008 02:15:12 by "java_re" with MS VC++ 7.1
The Final Section is the Vm_info.This is a version string
embedded in libjvm.so/jvm.dll. Every JVM has its own unique vm_info string.
When you are in doubt about whether the hs_err*.log was generated by a
particular JVM, check the version string.
I will be updating
this doc with more details on analyzing the error log file.
More to Come, Happy
Learning J