Wednesday, March 5, 2014

Trouble Shooting High I/O In Linux

I/O analysis in Linux is very important when we see the System is performing slow. First we need to make sure that the system is performing slowly due to the High I/0.

Disk I/O encompasses the input/output operations on a physical disk. If you’re reading data from a file on a disk, the processor needs to wait for the file to be read. During this process, the access time is very important. This is the time taken for a process to be processed on a processor along with all operations done like accessing data from the disk etc.

In this article we will see how we can analyze the Disk I/O performance and trouble shoot the performance Issues.

I/O wait is very important factor when there is a System slowness. I/O Wait is the percentage of time your processors are waiting on the disk.

Finding whether I/O is causing the system Slowness

For making sure that the system slowness is due to the High I/O, we can use “top” command provided by the linux.

When we run the “top” command, we see

top - 04:00:14 up 26 days, 10:17,  2 users,  load average: 4.91, 2.69, 2.65
Tasks: 546 total,   1 running, 542 sleeping,   2 stopped,   1 zombie
Cpu(s): 11.3%us, 0.5%sy, 9.0%ni, 78.9%id, 0.0%wa, 0.0%hi, 0.2%si, 0.0%st
Mem:  65831008k total, 63023472k used,  2807536k free,  3177444k buffers
Swap:  4194296k total,      208k used,  4194088k free, 10945140k cached

0.0%wa – This is an important value which tells about the amount of time the CPU is waiting for the I/O operation to complete. The higher the value , the more CPU resources are waiting for the I/O operations.

Once we find out the the I/O wait is higher which is causing the System to perform slow we then need to find out the disk that is actually having Issues in performing the I/O operations.

Finding out which Disk is Having Issues in performing I/O Operations

Now we need to make sure which disk is having issues in performing the I/O Operations. For this linux provides us “iostat” command which will analyze and provide us the I/O Statistics.

If we run the iostat command as

Dev:vx1379:hello-~ $ iostat -x 2 5
avg-cpu:  %user   %nice %system %iowait  %steal   %idle
                0.83     0.12    0.53       3.01       0.14       95.37

Device:         rrqm/s   wrqm/s   r/s   w/s   rsec/s   wsec/s avgrq-sz avgqu-sz   await  svctm  %util
xvda              0.04     2.84     0.12   3.37   4.64    45.91    14.50      0.05       15.05  14.14   4.93
xvda1            0.00     0.00     0.00   0.00  0.01     0.00      37.17     0.00       123.64  59.65   0.00

This will display the Current I/O Statistics.
We can also see the %iowait as 3.01 which is same in top command. The above iostat command will run for every 2 seconds and for 5 times.

The first report printed by the above command are the statistics from the systems last reboot and after the first report all reports printed are based on the previous statistics.

The important attribute we need to check is the %util column at last. This tells you how the disk is being utilized and if we see a higher percentage value then the Disk has issues performing the I/O operations.

The iostat command gives a large amount of information on how the Disk is being used like  items such as read and write requests per millisecond(rrqm/s & wrqm/s), reads and writes per second (r/s & w/s) and plenty more

The Other 3 Important columns we need to check are avgqu-sz ,await  and svctm  which are

Service time (storage wait) is the time it takes to actually send the I/O request to the storage and get an answer back – this is the time the storage system (EMC in this case) needs to handle the I/O. It varies between 3.8 and 7 ms on average.

Average Wait is the time the I/O’s wait in the host I/O queue.

Average Queue Size is the average amount of I/O’s waiting in the queue.

Assume a storage system handles each I/O in exactly 10 milliseconds and there are always 10 I/O’s in the host queue. Then the Average Wait will be 10 x 10 = 100 milliseconds.

Now lets analyze the iostat command more deeply

Device:         rrqm/s   wrqm/s   r/s   w/s   rsec/s   wsec/s avgrq-sz avgqu-sz   await  svctm  %util
xvda              0.04     2.84      0.12  3.37   4.64    45.91    14.50     0.05       15.05   14.14   4.93

The above is the output the iostat command for a Specific device xvda

r/s and w/s—the number of read and write requests issued by processes to the device
rsec/s and wsec/s — sectors read/written (as each sector is 512 bytes).
rkB/s and wkB/s — kilobytes read/written.
avgrg-sz – average sectors per request and the formula is
(rsec + wsec) / (r + w) = (4.64+45.91)/(0.04+2.84)

If you want it in kilobytes, divide by 2.
If you want it separate for reads and writes — do you own math using rkB/s and wkB/s.

avgqu-sz — average Queue length for the device
await - is the time the I/O’s wait in the host I/O queue.
Util - Percentage of CPU time during which I/O requests were issued to the device. The formula would be

%util = ( r + w ) * svctim /10 = (0.12 + 3.37 ) * 14.14 /10 = 4.93

Now once we are aware of the Disk having issues. We need to find the process that Is causing the High I/O operations.

Find process Causing High I/O

Iotop command provided by linux gives us exactly which process is causing the high I/O. But the command may not be available on many of the linux systems or it may not be installed by the System admin. If we run the command we can see

 Total DISK READ: 0.00 B/s | Total DISK WRITE: 0.00 B/s
  TID PRIO  USER     DISK READ  DISK WRITE  SWAPIN     IO>    COMMAND                                           1   be/4  root        0.00 B/s     0.00  B/s       0.00 %    0.00    % init
   2   be/4  root        0.00 B/s     0.00  B/s       0.00 %    0.00    % [kthreadd]

We need to check the “IO>” which tells us about the IO% being used.

But as I Said , we don’t have the command available on most linux machines. For the same data we can use the “ps” command which will be available on all linux machines.

Every process in linux has a State much like

D uninterruptible sleep (usually IO)
R running or runnable (on run queue)
S interruptible sleep (waiting for an event to complete)
T stopped, either by a job control signal or because it is being traced.
W paging (not valid since the 2.6.xx kernel)
X dead (should never be seen)
Z defunct ("zombie") process, terminated but not reaped by its parent.

So a Process when it is in Uninterruptible sleep ( which is D state ) are said to be waiting for the I/O. Just run the loop on the command line

for x in `seq 1 1 10`; do ps -eo state,pid,cmd | grep "^D"; echo "----"; sleep 5; done

The above loop runs for every 5 seconds for 10 times and will print the process that are in “D” state. Once we narrow out the process which are in “D” state, we can get more details of the process from the /Proc file system

Take the PID of the Process that is in “D” state and check the details like

Dev:vx1379:hello-~ $ cat /proc/<PID>/io
rchar: 58378821
wchar: 25851061
syscr: 154249
syscw: 13221
read_bytes: 32673792
write_bytes: 3541221376
cancelled_write_bytes: 16384

We can see all amounts of the date being read and written.

Finding what files are being written too heavily
Once we obtained the Process that is causing the High I/O , we need to find out what is happening with the Process for performing those oprations.

For this we can use the “lsof” command provided by linux to see what files are currently open and being read/written. We can find out the files that are being held by the PID (process that is in “D” state).

More Details about the I/O is here. Some More Snippets

Get the I/O wait time using
iostat -c|awk '/^ /{print $4}'
3.01

Grep the Top command for the CPU statues
Dev:vx1379:hello-~ $ top -p 1 -b -d 1 | awk '
>     /^top/{a=$0}
>     /^Cpu/{
>         sub(/top - /,"Time:",a);
>         sub(/up.*$/,"",a);
>         printf "%s %s\n",a,$0;a=""}'
Time:04:05:43  Cpu(s):  0.8%us,  0.5%sy,  0.1%ni, 95.4%id,  3.0%wa,  0.0%hi,  0.0%si,  0.1%st
Time:04:05:44  Cpu(s):  0.0%us,  0.5%sy,  0.0%ni, 99.0%id,  0.5%wa,  0.0%hi,  0.0%si,  0.0%st
*********
*********


More to come , Happy learning