Monday, September 2, 2013

Resource Management – IOSTAT (Disk Statistics)

Analyzing disk performance is very important when troubleshooting an issue. Linux has many commands that help in identifying various issues, and iostat is one of them.

iostat reports CPU and I/O statistics. It monitors the load on the server's input/output devices by observing how long the devices are active in relation to their average transfer rates.

Dev:vx1000a:jbs002-~ $ iostat
Linux 2.6.18-348.el5xen (vx1379) 09/01/2013

avg-cpu:    %user   %nice %system %iowait  %steal   %idle
             6.00    0.14    0.24    0.64    0.11   92.86

Device:            tps  Blk_read/s  Blk_wrtn/s    Blk_read    Blk_wrtn
xvda              3.28        8.66       39.81   102492582   471202740
xvda1             0.00        0.02        0.00      253902        4814
xvda2             3.28        8.64       39.81   102238528   471197926
dm-0              0.69        0.52        5.31     6209314    62897560
dm-1              0.08        0.04        0.65      462970     7738000
dm-2              0.95        0.26        1.88     3110040    22252328
dm-3              0.00        0.00        0.00        3736       17498
dm-4              0.01        0.07        0.03      801082      344192
dm-5              1.56        4.32       11.59    51129474   137219904
The first report produced by iostat gives statistics averaged over the time since the last reboot.
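To see what "since the last reboot" means in numbers: each per-second rate in that first report is the cumulative total divided by the seconds since boot, so dividing a total by its rate gives the uptime back. This small worked example uses only the xvda row from the output above.

```python
# Worked arithmetic for the first (since-boot) iostat report:
#   average rate = cumulative total / seconds since boot
# so total / rate recovers the uptime implied by the report.
# Values are the xvda row from the sample output above.

xvda = {
    "Blk_read/s": 8.66,
    "Blk_wrtn/s": 39.81,
    "Blk_read": 102492582,
    "Blk_wrtn": 471202740,
}

uptime_from_reads = xvda["Blk_read"] / xvda["Blk_read/s"]
uptime_from_writes = xvda["Blk_wrtn"] / xvda["Blk_wrtn/s"]

for label, seconds in (("reads", uptime_from_reads), ("writes", uptime_from_writes)):
    print(f"{label:6}: {seconds:,.0f} s = {seconds / 86400:.1f} days")
```

Both columns point to roughly 137 days since boot; they differ slightly only because iostat rounds the rates to two decimals.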

The device section shows the transfer rates of every disk, partition and device-mapper (dm-N) volume configured in the system.

avg-cpu:    %user   %nice %system %iowait  %steal   %idle
             6.00    0.14    0.24    0.64    0.11   92.86

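%user, %nice and %system show the percentage of CPU utilization that occurred while executing at the user level (applications), at the user level with nice priority, and at the system (kernel) level.

%iowait shows the percentage of time the CPU was idle while the system had an outstanding disk I/O request, and %idle shows the percentage of time it was idle without one. %steal matters on virtualized systems such as this Xen-kernel machine: it is the percentage of time the virtual CPU waited involuntarily while the hypervisor was servicing another virtual processor.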
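As a rough sketch of where the avg-cpu numbers come from: Linux exposes cumulative CPU time counters (in jiffies) on the aggregate "cpu" line of /proc/stat, and percentages like these can be computed from the change in those counters between two samples. Folding irq and softirq time into %system is an assumption of this sketch; sysstat's exact grouping can differ between versions.

```python
import os
import time

# Order of the first eight counters on the aggregate "cpu" line of /proc/stat.
FIELDS = ("user", "nice", "system", "idle", "iowait", "irq", "softirq", "steal")


def parse_cpu_line(line):
    """Parse the aggregate 'cpu' line of /proc/stat into a dict of jiffy counters."""
    parts = line.split()
    if not parts or parts[0] != "cpu":
        raise ValueError("expected the aggregate 'cpu' line")
    values = [int(v) for v in parts[1:1 + len(FIELDS)]]
    values += [0] * (len(FIELDS) - len(values))  # older kernels expose fewer columns
    return dict(zip(FIELDS, values))


def avg_cpu(before, after):
    """avg-cpu style percentages for the interval between two samples.

    Assumption of this sketch: irq and softirq time are folded into %system.
    """
    d = {k: after[k] - before[k] for k in FIELDS}
    total = sum(d.values()) or 1  # avoid dividing by zero on an empty interval
    pct = lambda x: round(100.0 * x / total, 2)
    return {
        "%user": pct(d["user"]),
        "%nice": pct(d["nice"]),
        "%system": pct(d["system"] + d["irq"] + d["softirq"]),
        "%iowait": pct(d["iowait"]),
        "%steal": pct(d["steal"]),
        "%idle": pct(d["idle"]),
    }


def read_proc_stat(path="/proc/stat"):
    """Read the first (aggregate) line of /proc/stat."""
    with open(path) as f:
        return parse_cpu_line(f.readline())


if __name__ == "__main__" and os.path.exists("/proc/stat"):
    first = read_proc_stat()
    time.sleep(1)
    print(avg_cpu(first, read_proc_stat()))
```

Like iostat's interval reports, this measures only the time between the two samples, not the time since boot.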

The next lines give the following information:

  • Device: The device name as listed in the /dev directory. Mount points for these devices can be found in /etc/fstab and in the output of the df command, although logical volumes usually appear there under their /dev/mapper names rather than as dm-N.
  • tps: The number of transfers (I/O requests) per second issued to the device. Several logical requests can be combined into a single transfer.
  • Blk_read/s: The number of blocks read from the device per second. A block here is a 512-byte sector.
  • Blk_wrtn/s: The number of blocks written to the device per second.
  • Blk_read: The total number of blocks read.
  • Blk_wrtn: The total number of blocks written.
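Since the block columns count 512-byte sectors, they are easy to convert into more familiar units. Here is a minimal sketch that parses the device section of a default iostat report (a few rows copied from the output above) and converts blocks to KiB. The function names are made up for this example.

```python
# Sketch: parse the device section of a default iostat report and convert
# block counters to KiB, assuming one block = one 512-byte sector.

SECTOR_BYTES = 512

SAMPLE = """\
Device:            tps  Blk_read/s  Blk_wrtn/s    Blk_read    Blk_wrtn
xvda              3.28        8.66       39.81   102492582   471202740
xvda1             0.00        0.02        0.00      253902        4814
dm-0              0.69        0.52        5.31     6209314    62897560
"""


def parse_device_report(text):
    """Return {device: {column: value}} for the lines after the 'Device:' header."""
    rows = {}
    header = None
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue
        if fields[0] == "Device:":
            header = fields[1:]
            continue
        if header and len(fields) == len(header) + 1:
            rows[fields[0]] = dict(zip(header, map(float, fields[1:])))
    return rows


def blocks_to_kib(blocks):
    """Convert 512-byte blocks to KiB."""
    return blocks * SECTOR_BYTES / 1024


if __name__ == "__main__":
    for dev, cols in parse_device_report(SAMPLE).items():
        print(f"{dev:6} read {blocks_to_kib(cols['Blk_read/s']):8.2f} KiB/s, "
              f"written {blocks_to_kib(cols['Blk_wrtn']) / 1024**2:8.2f} GiB total")
```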

This information helps determine which devices are more heavily used than others, which in turn can guide how to redistribute data to balance the workload.

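Because the plain iostat command reports averages since the last reboot, it is often more useful to have iostat print a fresh report every few seconds:

iostat -xtc 5 3

This prints 3 reports at 5-second intervals. -x requests extended statistics, -t prints the time of each report, and -c requests the CPU utilization report. The first report still covers the time since boot; each later report covers only the preceding 5-second interval.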
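Under the hood, an interval report is essentially the difference between two readings of cumulative kernel counters, divided by the length of the interval. A minimal sketch of that idea, assuming a Linux system whose /proc/diskstats uses the 11-counter layout for whole devices (some older 2.6 kernels print only 4 counters for partitions; this sketch skips such lines). The dictionary key names are chosen for this example.

```python
import os
import time

# Counter order in /proc/diskstats after the major, minor and device-name columns.
DISKSTAT_FIELDS = (
    "reads", "reads_merged", "sectors_read", "ms_reading",
    "writes", "writes_merged", "sectors_written", "ms_writing",
    "ios_in_progress", "ms_doing_io", "weighted_ms",
)


def parse_diskstats(text):
    """Return {device: {counter: int}} for lines with at least 11 counters."""
    stats = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) < 3 + len(DISKSTAT_FIELDS):
            continue  # e.g. short partition lines on some older kernels
        counters = map(int, parts[3:3 + len(DISKSTAT_FIELDS)])
        stats[parts[2]] = dict(zip(DISKSTAT_FIELDS, counters))
    return stats


def interval_rates(before, after, seconds):
    """tps, rsec/s and wsec/s for one device over an interval of `seconds`."""
    d = {k: after[k] - before[k] for k in DISKSTAT_FIELDS}
    return {
        "tps": (d["reads"] + d["writes"]) / seconds,
        "rsec/s": d["sectors_read"] / seconds,
        "wsec/s": d["sectors_written"] / seconds,
    }


if __name__ == "__main__" and os.path.exists("/proc/diskstats"):
    INTERVAL = 5  # seconds, like "iostat 5"
    with open("/proc/diskstats") as f:
        first = parse_diskstats(f.read())
    time.sleep(INTERVAL)
    with open("/proc/diskstats") as f:
        second = parse_diskstats(f.read())
    for dev in sorted(first.keys() & second.keys()):
        rates = interval_rates(first[dev], second[dev], INTERVAL)
        print(dev, {k: round(v, 2) for k, v in rates.items()})
```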

We can get information about a particular device like this:

Dev:vx1000a:jbs002-~ $ iostat -p dm-12
Linux 2.6.18-348.el5xen (vx1379) 09/01/2013

avg-cpu:    %user   %nice %system %iowait  %steal   %idle
             6.00    0.14    0.24    0.64    0.11   92.86

Device:            tps  Blk_read/s  Blk_wrtn/s    Blk_read    Blk_wrtn
dm-12             0.01        0.00        0.02        3304      290168

For an extended report, use:

Dev:vx1000a:jbs002-~ $ iostat -x
Linux 2.6.18-348.el5xen (vx1379) 09/01/2013

avg-cpu:    %user   %nice %system %iowait  %steal   %idle
             6.00    0.14    0.24    0.64    0.11   92.86

Device:      rrqm/s   wrqm/s      r/s      w/s   rsec/s   wsec/s avgrq-sz avgqu-sz    await    svctm    %util
xvda           0.07     2.63     0.22     3.06     8.66    39.81    14.77     0.08    23.76     4.41     1.45
xvda1          0.00     0.00     0.00     0.00     0.02     0.00    43.29     0.00    17.18     7.09     0.00
xvda2          0.07     2.63     0.22     3.06     8.64    39.81    14.76     0.08    23.76     4.41     1.45
dm-0           0.00     0.00     0.02     0.66     0.52     5.31     8.48     0.02    33.43     2.88     0.20
dm-1           0.00     0.00     0.00     0.08     0.04     0.65     8.40     0.00    18.97     8.79     0.07
dm-2           0.00     0.00     0.00     0.00     0.00     0.00     2.07     0.00    33.18     8.49     0.00

The columns mean:

rrqm/s : The number of read requests merged per second that were queued to the device

wrqm/s : The number of write requests merged per second that were queued to the device

r/s : The number of read requests issued to the device per second

w/s : The number of write requests issued to the device per second

rsec/s : The number of sectors read from the device per second

wsec/s : The number of sectors written to the device per second

avgrq-sz : The average size (in sectors) of the requests that were issued to the device.

avgqu-sz : The average queue length of the requests that were issued to the device

await : The average time (in milliseconds) for I/O requests issued to the device to be served. This includes the time spent by the requests in queue and the time spent servicing them.

svctm : The average service time (in milliseconds) for I/O requests that were issued to the device

%util : Percentage of CPU time during which I/O requests were issued to the device (bandwidth utilization for the device). Device saturation occurs when this value is close to 100%.
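All of these extended columns can be derived from the same cumulative per-device counters. The minimal sketch below takes the increase of each counter over one interval (key names follow the /proc/diskstats field descriptions and are chosen for this example) and shows how await, svctm, avgqu-sz and %util relate to each other. It illustrates the relationships; it does not reproduce sysstat's exact code.

```python
def extended_stats(delta, seconds):
    """Compute iostat -x style columns from counter increases over `seconds`.

    `delta` maps counter names (names chosen for this sketch) to how much each
    cumulative /proc/diskstats counter grew during the interval.
    """
    ms = seconds * 1000.0
    ios = delta["reads"] + delta["writes"]
    sectors = delta["sectors_read"] + delta["sectors_written"]
    per_io = lambda x: x / ios if ios else 0.0  # average per completed request
    return {
        "rrqm/s": delta["reads_merged"] / seconds,
        "wrqm/s": delta["writes_merged"] / seconds,
        "r/s": delta["reads"] / seconds,
        "w/s": delta["writes"] / seconds,
        "rsec/s": delta["sectors_read"] / seconds,
        "wsec/s": delta["sectors_written"] / seconds,
        "avgrq-sz": per_io(sectors),                                 # sectors per request
        "avgqu-sz": delta["weighted_ms"] / ms,                       # time-weighted queue length
        "await": per_io(delta["ms_reading"] + delta["ms_writing"]),  # ms queued + serviced
        "svctm": per_io(delta["ms_doing_io"]),                       # busy ms per request
        "%util": 100.0 * delta["ms_doing_io"] / ms,                  # busy share of the interval
    }


if __name__ == "__main__":
    # Hypothetical counter increases over a 5-second interval.
    example = {"reads": 40, "writes": 60, "reads_merged": 4, "writes_merged": 6,
               "sectors_read": 320, "sectors_written": 480, "ms_reading": 200,
               "ms_writing": 300, "ms_doing_io": 250, "weighted_ms": 1000}
    for name, value in extended_stats(example, 5).items():
        print(f"{name:>9} {value:8.2f}")
```

Note that await grows with queueing while svctm does not, which is why a large gap between the two points at requests waiting in the queue.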

What to Analyze

  • The average service time (svctm)
  • The percentage of CPU time during which I/O requests were issued to the device (%util)
  • Whether a disk consistently reports high reads/writes per second (r/s and w/s)
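Since the point is values that stay high, a single report is not enough. Here is a minimal sketch that parses iostat -x device lines and flags only the metrics that exceed a limit in every sample, for example in every report of "iostat -x 5 3". The limits are hypothetical placeholders; pick values that fit your own hardware and baseline.

```python
# Hypothetical limits for this sketch; choose values that fit your hardware.
LIMITS = {"svctm": 20.0, "%util": 90.0, "r/s": 300.0, "w/s": 300.0}

HEADER = "Device: rrqm/s wrqm/s r/s w/s rsec/s wsec/s avgrq-sz avgqu-sz await svctm %util"


def parse_extended_row(row, header=HEADER):
    """Split one 'iostat -x' device line into (device, {column: value})."""
    names = header.split()[1:]
    fields = row.split()
    if len(fields) != len(names) + 1:
        raise ValueError(f"expected {len(names) + 1} fields, got {len(fields)}")
    return fields[0], dict(zip(names, map(float, fields[1:])))


def consistently_high(samples, limits=LIMITS):
    """Metrics that exceed their limit in every sample (e.g. every interval report)."""
    if not samples:
        return []
    return sorted(m for m, limit in limits.items()
                  if all(s[m] > limit for s in samples))


if __name__ == "__main__":
    # xvda row from the extended report above: nothing is flagged.
    _, xvda = parse_extended_row(
        "xvda 0.07 2.63 0.22 3.06 8.66 39.81 14.77 0.08 23.76 4.41 1.45")
    print(consistently_high([xvda]))
```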

If any of these values is consistently high, consider one of the following actions:

  • Use faster disks and controllers for the file system
  • Tune the application, kernel, or file system for better disk utilization
  • Use a RAID array to spread the file system across several disks

Dev:vx1000a:jbs002-~ $ iostat -c
Linux 2.6.18-348.el5xen (vx1379) 09/01/2013

avg-cpu:    %user   %nice %system %iowait  %steal   %idle
             6.00    0.14    0.24    0.64    0.11   92.86

The major columns to look at are:

%iowait : The first thing you should look at is %iowait. If the CPU spends a high percentage of its time idle while waiting on disk I/O, that is a good indicator of an I/O bottleneck.
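A small sketch of that first check: parse the avg-cpu section of an iostat report (the sample below is the iostat -c output above) and compare %iowait against a threshold. The 10% threshold is a hypothetical placeholder; what counts as "high" depends on the workload, so compare against the system's normal baseline.

```python
# Hypothetical threshold for this sketch; what counts as "high" depends on the workload.
IOWAIT_LIMIT = 10.0

SAMPLE = """\
Linux 2.6.18-348.el5xen (vx1379) 09/01/2013

avg-cpu:    %user   %nice %system %iowait  %steal   %idle
             6.00    0.14    0.24    0.64    0.11   92.86
"""


def parse_avg_cpu(text):
    """Return {column: value} from the avg-cpu section of an iostat report."""
    lines = [line.split() for line in text.splitlines() if line.strip()]
    for i, fields in enumerate(lines):
        if fields[0] == "avg-cpu:" and i + 1 < len(lines):
            return dict(zip(fields[1:], map(float, lines[i + 1])))
    raise ValueError("no avg-cpu section found")


def io_bound(cpu, limit=IOWAIT_LIMIT):
    """True when %iowait suggests the CPUs are often idle waiting on disk I/O."""
    return cpu["%iowait"] > limit


if __name__ == "__main__":
    cpu = parse_avg_cpu(SAMPLE)
    print(cpu["%iowait"], "-> possible I/O bottleneck" if io_bound(cpu) else "-> looks fine")
```

With 0.64% iowait, the sample system is nowhere near an I/O bottleneck.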

Moving on to the device section, you should be able to see easily how I/O is distributed between disks. Is there a lot of activity on one disk while another sits idle? If so, see whether you can move some of the activity from the busy disk to the idle one. If all of your disks are already busy, or the load cannot be spread evenly among them, you need to either add disks (if you have the capacity) or replace the current disks with ones that have a faster spindle speed, higher throughput, and lower seek times.
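The "one disk busy, another idle" check can be sketched as a comparison of %util values from iostat -x. Partitions (such as xvda1 and xvda2) and dm-N volumes share the capacity of the devices underneath them, so the comparison only makes sense between whole disks. The busy/idle cut-offs and the disk names sda and sdb below are hypothetical.

```python
def find_imbalance(util_by_disk, busy=60.0, idle=10.0):
    """Return (busiest, idlest) when one disk is busy while another is mostly idle.

    util_by_disk maps whole-disk names to %util from 'iostat -x'. The busy/idle
    cut-offs are hypothetical defaults for this sketch.
    """
    if len(util_by_disk) < 2:
        return None
    busiest = max(util_by_disk, key=util_by_disk.get)
    idlest = min(util_by_disk, key=util_by_disk.get)
    if util_by_disk[busiest] >= busy and util_by_disk[idlest] <= idle:
        return busiest, idlest
    return None


if __name__ == "__main__":
    # Hypothetical disks sda and sdb, not taken from the reports above.
    print(find_imbalance({"sda": 85.0, "sdb": 3.0}))   # candidates for rebalancing
    print(find_imbalance({"sda": 40.0, "sdb": 35.0}))  # load already spread
```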

Once you are comfortable with iostat, you can use the -x parameter to get useful information such as the average request size, the average wait time for requests, and the average service time for requests.
With a little work, iostat lets you identify I/O bottlenecks and points you toward potential solutions. The numbers may seem overwhelming at first, but with some patience you'll be able to use iostat productively in no time. Happy hunting.

Display device and NFS statistics with iostat:
# iostat -n


More to come. Happy learning :-)