Recently we faced an issue with one of our Tomcat servers. The server crashed with an Out of Memory error. We tried to get the server up and running again, but it would not come up at any point.
We then checked the process table, where we saw:
hello:local-ews $ ps ux
USER  PID %CPU %MEM VSZ RSS TTY STAT START   TIME COMMAND
root 6274  4.2  0.0   0   0 ?   Zl   Sep18 343:53 [java] <defunct>
We can see that process 6274 has been around for some time. This is the process ID of the Tomcat instance. The STAT column shows Zl, which means the process has gone into a zombie state (Z marks a zombie, and l marks a multi-threaded process).
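To spot such entries without reading the whole table, the STAT column can be filtered. This is a minimal sketch, assuming a ps that accepts the pid, ppid, stat and comm output keywords (procps on Linux does); the helper name only_zombies is made up for illustration.

```shell
# Hypothetical helper: keep only rows whose third column (STAT) starts with Z.
only_zombies() {
    awk '$3 ~ /^Z/'
}

# An empty header ("keyword=") suppresses the header line, so every row is
# a process: PID, PPID, STAT, COMMAND.
ps -e -o pid= -o ppid= -o stat= -o comm= | only_zombies
```

On a healthy machine this usually prints nothing.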
A defunct process, or zombie process, is a process that has completed execution but still has an entry in the process table. The process has finished its work and is waiting for its parent process to collect its exit status. It will not go away until the parent collects it or the parent itself goes away. At this point our only option was to find the parent process and bounce it. I obtained the parent process ID as follows:
hello:local-IEM-A2 $ ps -l 6274
F S  UID  PID PPID C PRI NI ADDR SZ WCHAN TTY   TIME CMD
0 Z 7282 6274    1 4  80  0    -  0 exit  ?   344:57 [java] <defunct>
The parent process ID was 1, which means the parent of the defunct process is init, the first process started at boot.
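If only the parent PID is needed, ps can print just that column. A small sketch, assuming a ps that supports -o and -p; the function name parent_of is hypothetical.

```shell
# Hypothetical helper: print the parent PID of the given PID.
parent_of() {
    ps -o ppid= -p "$1" | tr -d ' '
}

# Example: the parent of the current shell.
parent_of $$
```

For the Tomcat process above, the call would be parent_of 6274.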
We cannot kill a defunct process, because it is already dead and only its entry in the process table remains. The only way to clear a defunct process is to kill the parent process, but in this case the parent is the boot process (init), so we had to bounce the machine to clear it.
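This can be tried safely on a test box. The sketch below is a made-up reproduction, not what happened on our server: an inner shell starts a short-lived child and then replaces itself with a sleep that never collects that child, so the child turns into a zombie. It assumes a Linux-style ps with the pid, ppid and stat keywords.

```shell
# The inner shell starts a one-second child, then replaces itself with
# "sleep 30", a parent that never calls wait().
sh -c 'sleep 1 & exec sleep 30' &
parent=$!
sleep 2   # give the child time to exit

# Find the zombie child of $parent.
zombie=$(ps -e -o pid= -o ppid= -o stat= |
    awk -v p="$parent" '$2 == p && $3 ~ /^Z/ { print $1 }')

# Signal the zombie directly. The signal is accepted, but there is nothing
# left to kill.
kill -9 "$zombie"
sleep 1
after=$(ps -o stat= -p "$zombie" | tr -d ' ')
echo "state of $zombie after kill -9: $after"

# Clean up the test parent.
kill "$parent"
wait "$parent" 2>/dev/null || true
```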
A defunct process occupies very few resources: a slot in the process table and the resource (timing) information that the parent can ask for.
So what happens exactly?
UNIX and Linux maintain a parent-child relationship between processes. Whenever a child process dies, the parent process receives a notification (the SIGCHLD signal), and it is then the responsibility of the parent process to acknowledge the child's death by using the wait() system call. The return value of wait() is the PID of the child, and the child's exit status is stored through the pointer argument passed to the call. This is how shells like bash know how to process the following commands and set the special $? variable. As long as the parent hasn't called wait(), the system needs to keep the dead child in the global process list, because that's the only place where the process ID is stored.
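The same handshake can be seen from a shell script. A minimal sketch, assuming a POSIX sh or bash; the child command and the status value 3 are made up for illustration.

```shell
# Start a child in the background that exits with status 3.
sh -c 'exit 3' &
child=$!

# wait blocks until the child has exited and collects it. The child's exit
# status becomes the exit status of wait itself, which is what $? holds.
status=0
wait "$child" || status=$?
echo "child $child exited with status $status"
```

Once wait has returned, the child no longer has an entry in the process table.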
The purpose of the "zombies" is really just for the system to remember the process ID, so that it can inform the parent process about it on request. If the parent "forgets" to collect on its children, then the zombie will stay undead forever.
This is the reason why we need to find the parent process and bounce it to clear these defunct processes.
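The whole cycle can be sketched on a test machine. This is a made-up reproduction under stated assumptions: a Linux-style ps, and an init (PID 1) that collects orphaned children, which is the normal case but is not guaranteed in every container setup.

```shell
# The inner shell starts a one-second child, then replaces itself with
# "sleep 30", a parent that "forgets" to call wait().
sh -c 'sleep 1 & exec sleep 30' &
parent=$!
sleep 2   # give the child time to exit

# The dead child is now a zombie whose PPID is $parent.
zombie=$(ps -e -o pid= -o ppid= -o stat= |
    awk -v p="$parent" '$2 == p && $3 ~ /^Z/ { print $1 }')
echo "zombie $zombie has parent $parent"

# Bounce the parent. The orphaned zombie is handed over to init, which
# normally collects it right away.
kill "$parent"
wait "$parent" 2>/dev/null || true
sleep 1
ps -o pid= -o stat= -p "$zombie" || echo "zombie $zombie is gone"
```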
More to come. Happy learning :)
What happens if the PID is 0? That means no PID exists. In this case, what would be the next step?