Pages

Friday, September 14, 2012

Resource Management : Strace

Strace is a command available in linux which helps in debugging and troubleshooting the execution of a executable ( command or a process).Strace also tells you what is going with a program at this point. It can tell you what system calls the program is using and whether they pass or fail.
Strace monitors the system calls and signal of a Specific program. This command is mainly helpful when we don't have a Source code and would like to debug the execution of a program.
Strace runs the specified command until it exits. It intercepts and records the system calls which are called by a process and the signals which are received by a process
Basic Example
When you run the strace on the ls (executable) we can see
Dev:vx1379:djbs002-~ $ strace ls
execve("/bin/ls", ["ls"], [/* 39 vars */]) = 0
brk(0) = 0x192d7000
****
****
****

Trace a Specific System calls
We can also use strace to find specific system calls that were made like,

Dev:vx1379:djbs002-~ $ strace -e open ls
open("/software/jboss/ews32/1.0/lib/tls/x86_64/librt.so.1", O_RDONLY) = -1 ENOENT (No such file or directory)

Only Open system calls are shown.

Trace Multiple System Calls
if we want to trace multiple system call we can use,

strace -e trace=open,read ls

A few other Calls include
strace -e trace=set
strace -e trace=open
strace -e trace=read
strace -e trace=file
strace -e trace=process
strace -e trace=network
strace -e trace=signal
strace -e trace=ipc
strace -e trace=desc //descriptors
strace -e read=set
We can also use strace to find the input activity like,
To see all input activity on file descriptors 3 and 5 use
-e read=3,5
To see all output activity on file descriptors 3 and 5 use
-e write=3,5

Number of Calls and Time
We can also use strace to find how many system calls were done and how much time they took like,

Dev:vx1379:djbs002-~ $ strace -c ls
% time seconds usecs/call calls errors syscall
------ ----------- ----------- --------- --------- ----------------
nan 0.000000 0 13 read
nan 0.000000 0 6 write
nan 0.000000 0 56 43 open
nan 0.000000 0 14 close
nan 0.000000 0 4 3 stat
nan 0.000000 0 13 fstat
nan 0.000000 0 31 mmap
nan 0.000000 0 13 mprotect
nan 0.000000 0 4 munmap
nan 0.000000 0 3 brk
nan 0.000000 0 2 rt_sigaction
nan 0.000000 0 1 rt_sigprocmask
nan 0.000000 0 2 ioctl
nan 0.000000 0 2 1 access
nan 0.000000 0 1 execve
nan 0.000000 0 1 fcntl
nan 0.000000 0 2 getdents
nan 0.000000 0 1 getrlimit
nan 0.000000 0 1 arch_prctl
nan 0.000000 0 1 futex
nan 0.000000 0 1 set_tid_address
nan 0.000000 0 1 set_robust_list
------ ----------- ----------- --------- --------- ----------------
100.00 0.000000 173 47 total

A Total of 173 calls were done in which were divided according to the type of calls they were made.

Strace with Time of Day
We can use strace to generate the output along with the Time of the Day also like,

Dev:vx1379:djbs002-~ $ strace -t ls
02:41:10 execve("/bin/ls", ["ls"], [/* 39 vars */]) = 0
02:41:10 brk(0) = 0xa457000

Time Spent
We can also use Strace to find how much time every system call took in executing like,

Dev:vx1379:djbs002-~ $ strace -T ls
execve("/bin/ls", ["ls"], [/* 39 vars */]) = 0 <0.001015>
brk(0) = 0x1a33f000 <0.000053>

What Does a Process Doing at This Point
We can also see what a process is doing at this point like what system calls were being made , time they took etc like,

Dev:vx1379:djbs002-~ $ strace -p 4487 -o ./jhas.txt
Process 4487 attached - interrupt to quit
[ Process PID=4487 runs in 32 bit mode. ]

Now you can open another console to tail the jhas.txt file to see what exactly happening. When you see the txt file we can see

futex(0x7f5ee28c49d0, FUTEX_WAIT, 3039, NULL

This is Just a Tomcat Process waiting for the requests. FuTEX is alocking mechanism in the linux kernel.
Some Other Ways Like,
Find out what Calls were being made to the Catalina file

[root@vx111a bin]# strace ./startup.sh 2>&1 | grep catalina
read(255, "EXECUTABLE=catalina.sh\n\n# Check "..., 1961) = 541
stat("./catalina.sh", {st_mode=S_IFREG|0755, st_size=19877, ...}) = 0
access("./catalina.sh", X_OK) = 0
execve("/soa/apache-tomcat-7.0.42/bin/catalina.sh", ["./catalina.sh", "start"], [/* 45 vars */]) = 0
open("/soa/apache-tomcat-7.0.42/bin/catalina.sh", O_RDONLY) = 3
lstat("/soa/apache-tomcat-7.0.42/bin/catalina.sh", {st_mode=S_IFREG|0755, st_size=19877, ...}) = 0
open("/soa/apache-tomcat-7.0.42/logs/catalina.out", O_WRONLY|O_CREAT|O_APPEND, 0666) = 3

catalina.sh was read from the Current location.

Find out the Specific Calls that we made on the Catalina File

[root@vx111a bin]# strace -e open,access ./startup.sh 2>&1 | grep catalina
access("./catalina.sh", X_OK) = 0
open("/soa/apache-tomcat-7.0.42/bin/catalina.sh", O_RDONLY) = 3
open("/soa/apache-tomcat-7.0.42/logs/catalina.out", O_WRONLY|O_CREAT|O_APPEND, 0666) = 3

The above tip can be used when Ever we Run into a case where a file was opened or not and cant find how to handle this thing.

Network Tracing
We can also use the Strace for network tracing like,

strace -o /tmp/ping -s 512 ping www.google.com > /dev/null 2>&1

[root@vx111a bin]# grep connect /tmp/ping
connect(4, {sa_family=AF_FILE, path="/var/run/nscd/socket"}, 110) = -1 ENOENT (No such file or directory)
connect(4, {sa_family=AF_FILE, path="/var/run/nscd/socket"}, 110) = -1 ENOENT (No such file or directory)

To find the Network calls for a Process
strace -e trace=network -p <PID>

Why does this program not open my file?
$ strace -e open,access 2>&1 | grep your-filename

Look for an open() or access() syscall that fails

What is taking time?
strace -c -p 11084

Can't I connect to that server?
strace -e poll,select,connect,recvfrom,sendto nc www.yahoo.com 80

More To Come , Happy learning :-)