I/O analysis in Linux is very important when we see the system performing slowly. The first step is to confirm that the slowness is actually caused by high I/O.
Disk I/O encompasses the input/output operations on a physical disk. If you're reading data from a file on a disk, the processor has to wait for the file to be read. During this process, the access time is very important: this is the total time taken for a request to be serviced, including the processing done on the CPU along with all the other operations involved, such as fetching the data from the disk.
In this article we will see how to analyze disk I/O performance and troubleshoot performance issues.
I/O wait is a very important factor when there is system slowness. I/O wait is the percentage of time your processors spend waiting on the disk.
Finding Whether I/O Is Causing the System Slowness
To make sure that the system slowness is due to high I/O, we can use the "top" command provided by Linux.
When we run the "top" command, we see:

top - 04:00:14 up 26 days, 10:17,  2 users,  load average: 4.91, 2.69, 2.65
Tasks: 546 total,   1 running, 542 sleeping,   2 stopped,   1 zombie
Cpu(s): 11.3%us,  0.5%sy,  9.0%ni, 78.9%id,  0.0%wa,  0.0%hi,  0.2%si,  0.0%st
Mem:  65831008k total, 63023472k used,  2807536k free,  3177444k buffers
Swap:  4194296k total,      208k used,  4194088k free, 10945140k cached
0.0%wa – this is an important value which tells us the percentage of time the CPU is waiting for I/O operations to complete. The higher the value, the more CPU time is being spent waiting on I/O.
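If top is not handy, the same counters behind %wa live in /proc/stat. The following is a rough sketch, not a replacement for top: /proc/stat holds cumulative jiffies, so this gives an average since boot rather than a current rate.

```shell
# /proc/stat "cpu" line: cpu user nice system idle iowait irq softirq steal ...
# Counting the "cpu" label as field 1, field 6 is cumulative iowait jiffies.
awk '/^cpu /{ total = 0; for (i = 2; i <= NF; i++) total += $i;
              printf "iowait since boot: %.1f%%\n", ($6 / total) * 100 }' /proc/stat
```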
Once we find that the I/O wait is high and is causing the system to perform slowly, we then need to find the disk that is actually having issues performing the I/O operations.
Finding Out Which Disk Is Having Issues Performing I/O Operations
Now we need to identify which disk is having issues performing the I/O operations. For this, Linux provides the "iostat" command, which analyzes and reports the I/O statistics.
If we run the iostat command as:

Dev:vx1379:hello-~
$ iostat -x 2 5
avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           0.83    0.12    0.53    3.01    0.14   95.37

Device:  rrqm/s  wrqm/s    r/s    w/s  rsec/s  wsec/s avgrq-sz avgqu-sz   await  svctm  %util
xvda       0.04    2.84   0.12   3.37    4.64   45.91    14.50     0.05   15.05  14.14   4.93
xvda1      0.00    0.00   0.00   0.00    0.01    0.00    37.17     0.00  123.64  59.65   0.00
This displays the current I/O statistics. We can also see the %iowait as 3.01, the same value reported by top. The above iostat command runs every 2 seconds, 5 times.
The first report printed by the above command contains the statistics since the system's last reboot; every report after the first covers only the interval since the previous report.
The important attribute we need to check is the %util column at the end. It tells us how heavily the disk is being utilized; if we see a high percentage value, the disk is struggling to keep up with the I/O operations.
The iostat command gives a large amount of information on how the disk is being used, such as read and write requests merged per second (rrqm/s & wrqm/s), reads and writes per second (r/s & w/s) and plenty more.
The other 3 important columns we need to check are avgqu-sz, await and svctm, which are:

Service time (svctm) is the time it takes to actually send the I/O request to the storage and get an answer back – this is the time the storage system needs to handle the I/O.
Average wait (await) is the time the I/O's wait in the host I/O queue.
Average queue size (avgqu-sz) is the average number of I/O's waiting in the queue.
Assume a storage system handles each I/O in exactly 10 milliseconds and there are always 10 I/O's in the host queue. Then the average wait will be 10 x 10 = 100 milliseconds.
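That example is just service time multiplied by queue depth; a one-liner makes the arithmetic explicit:

```shell
# Average wait ~= service time per I/O x average queue depth.
awk 'BEGIN { svctm_ms = 10; avgqu = 10; printf "await: %d ms\n", svctm_ms * avgqu }'
# prints: await: 100 ms
```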
Now let's analyze the iostat output more deeply:

Device:  rrqm/s  wrqm/s    r/s    w/s  rsec/s  wsec/s avgrq-sz avgqu-sz   await  svctm  %util
xvda       0.04    2.84   0.12   3.37    4.64   45.91    14.50     0.05   15.05  14.14   4.93
The above is the output of the iostat command for a specific device, xvda.

r/s and w/s — the number of read and write requests issued by processes to the device per second.
rsec/s and wsec/s — sectors read/written per second (each sector is 512 bytes).
rkB/s and wkB/s — kilobytes read/written per second.
avgrq-sz — average sectors per request; the formula is (rsec/s + wsec/s) / (r/s + w/s) = (4.64 + 45.91) / (0.12 + 3.37) ≈ 14.50.
If you want it in kilobytes, divide by 2.
If you want it separately for reads and writes — do your own math using rkB/s and wkB/s.
avgqu-sz — average queue length for the device.
await — the time the I/O's wait in the host I/O queue.
%util — percentage of CPU time during which I/O requests were issued to the device. The formula is %util = (r/s + w/s) * svctm / 10 = (0.12 + 3.37) * 14.14 / 10 = 4.93.
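We can sanity-check both formulas against the sample xvda numbers with a couple of awk one-liners. Note that avgrq-sz comes out slightly below iostat's own 14.50, because the inputs printed by iostat are already rounded.

```shell
# avgrq-sz = (rsec/s + wsec/s) / (r/s + w/s), using the rounded values from the output.
awk 'BEGIN { printf "avgrq-sz: %.2f\n", (4.64 + 45.91) / (0.12 + 3.37) }'
# prints: avgrq-sz: 14.48  (iostat itself shows 14.50 from unrounded counters)

# %util = (r/s + w/s) * svctm / 10
awk 'BEGIN { printf "util: %.2f%%\n", (0.12 + 3.37) * 14.14 / 10 }'
# prints: util: 4.93%
```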
Now that we know which disk is having issues, we need to find the process that is causing the high I/O operations.
Finding the Process Causing High I/O
The iotop command provided by Linux tells us exactly which process is causing the high I/O. But the command may not be available on many Linux systems, or may not have been installed by the system admin. If we run the command we see:
Total DISK READ: 0.00 B/s | Total DISK WRITE: 0.00 B/s
  TID  PRIO  USER   DISK READ  DISK WRITE  SWAPIN    IO>    COMMAND
    1  be/4  root    0.00 B/s    0.00 B/s  0.00 %  0.00 %  init
    2  be/4  root    0.00 B/s    0.00 B/s  0.00 %  0.00 %  [kthreadd]
We need to check the "IO>" column, which tells us the I/O percentage being used by each process.
But as I said, the command is not available on most Linux machines. For the same data we can use the "ps" command, which is available on all Linux machines.
Every process in Linux has a state, such as:

D    uninterruptible sleep (usually IO)
R    running or runnable (on run queue)
S    interruptible sleep (waiting for an event to complete)
T    stopped, either by a job control signal or because it is being traced
W    paging (not valid since the 2.6.xx kernel)
X    dead (should never be seen)
Z    defunct ("zombie") process, terminated but not reaped by its parent
So processes in uninterruptible sleep (the "D" state) are said to be waiting for I/O. Just run this loop on the command line:

for x in `seq 1 1 10`; do ps -eo state,pid,cmd | grep "^D"; echo "----"; sleep 5; done
The above loop runs every 5 seconds, 10 times, and prints the processes that are in the "D" state. Once we narrow down the processes in the "D" state, we can get more details about them from the /proc file system.
Take the PID of the process that is in the "D" state and check its details:

Dev:vx1379:hello-~
$ cat /proc/<PID>/io
rchar: 58378821
wchar: 25851061
syscr: 154249
syscw: 13221
read_bytes: 32673792
write_bytes: 3541221376
cancelled_write_bytes: 16384
Here we can see the amount of data being read and written by the process.
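To see what these fields look like on a live process, you can read the counters of the current shell itself. This assumes the kernel was built with task I/O accounting, which is the case on most distributions.

```shell
# rchar/wchar count bytes passed through read()/write() syscalls;
# read_bytes/write_bytes count what actually hit the block layer.
cat /proc/self/io
```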
Finding What Files Are Being Written to Heavily
Once we have obtained the process that is causing the high I/O, we need to find out what the process is doing to perform those operations.
For this we can use the "lsof" command provided by Linux to see what files are currently open and being read/written. We can find the files that are being held by the PID (the process that is in the "D" state).
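If lsof is missing as well, the same information is available from /proc: every entry under /proc/<PID>/fd is a symlink to an open file. As a quick sketch using the current shell's own PID (substitute the PID of the D-state process):

```shell
# Each fd is a symlink pointing at the file, socket or pipe it refers to.
ls -l /proc/$$/fd
```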
Some more snippets for I/O analysis follow.

Get the I/O wait time using iostat:

iostat -c | awk '/^ /{print $4}'
3.01
Grep the top command output for the CPU status:

Dev:vx1379:hello-~
$ top -p 1 -b -d 1 | awk '
> /^top/{a=$0}
> /^Cpu/{
> sub(/top - /,"Time:",a);
> sub(/up.*$/,"",a);
> printf "%s %s\n",a,$0;a=""}'
Time:04:05:43 Cpu(s):  0.8%us,  0.5%sy,  0.1%ni, 95.4%id,  3.0%wa,  0.0%hi,  0.0%si,  0.1%st
Time:04:05:44 Cpu(s):  0.0%us,  0.5%sy,  0.0%ni, 99.0%id,  0.5%wa,  0.0%hi,  0.0%si,  0.0%st
*********

More to come, happy learning!