Tuesday, August 20, 2013

Troubleshooting: Using the sar Command

sar (System Activity Reporter) is a Linux command, provided by the sysstat package, that collects and reports system activity counters. It helps identify performance bottlenecks and is a useful first tool when troubleshooting a performance problem.

Think of the system as made up of three subsystems: CPU, memory, and disk. The goal is to find out which subsystem is causing the issue.

CPU
(! 1005)-> sar -u 1 1
Linux 2.6.18-348.el5xen (vx181d) 08/18/2013

11:56:55 PM CPU %user %nice %system %iowait %steal %idle
11:56:56 PM all 0.00 0.00 0.00 0.00 0.00 100.00
Average: all 0.00 0.00 0.00 0.00 0.00 100.00

The %user and %system columns show the percentage of time the CPU spends in user mode and in system (kernel) mode. The %iowait and %idle columns are of particular interest for performance analysis. %iowait is the percentage of time the CPU was idle while the system had an outstanding disk I/O request, i.e., time spent waiting for I/O to complete. %idle is the percentage of time the CPU was idle with no outstanding I/O request, so it tells us how much spare capacity the CPU has, not how much useful work it is doing.

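A %idle value that stays near zero indicates a CPU bottleneck, while a consistently high %iowait indicates unsatisfactory disk performance. Base the judgment on several samples (for example sar -u 1 5) rather than on the single one-second sample shown above.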

Time spent in %user is expected, since this is where all non-kernel (application) work is accounted for. If a large share of cycles is spent in %system, much of the execution time is going to kernel code, for example system calls. A high %iowait means processes are blocked waiting on disk accesses, i.e., the disk is the bottleneck.
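To make these rules concrete, here is a minimal Python sketch that parses sar -u text and flags suspicious samples. The thresholds IDLE_MIN and IOWAIT_MAX are hypothetical values chosen for illustration (the rules above only say "near zero" and "high"), and the function names are made up. Column names are read from the header line, so the sketch tolerates timestamps with or without AM/PM.

```python
# Sketch: flag CPU samples in `sar -u` output using the rules of thumb above.
# IDLE_MIN and IOWAIT_MAX are hypothetical thresholds, not sar defaults.
IDLE_MIN = 10.0     # %idle below this counts as "near zero" (assumption)
IOWAIT_MAX = 20.0   # %iowait above this counts as "high" (assumption)

# Output copied from the `sar -u 1 1` run above.
SAMPLE_U = """\
11:56:55 PM CPU %user %nice %system %iowait %steal %idle
11:56:56 PM all 0.00 0.00 0.00 0.00 0.00 100.00
Average: all 0.00 0.00 0.00 0.00 0.00 100.00
"""

def parse_sar_u(text):
    """Return one dict per sample row; the Average line is skipped."""
    rows, cols = [], None
    for line in text.splitlines():
        fields = line.split()
        if "%idle" in fields:                  # header row: remember column names
            cols = fields[fields.index("CPU"):]
            continue
        if cols is None or not fields or fields[0] == "Average:":
            continue
        n = len(cols)
        row = {"time": " ".join(fields[:-n])}  # timestamp, with or without AM/PM
        for name, value in zip(cols, fields[-n:]):
            row[name] = value if name == "CPU" else float(value)
        rows.append(row)
    return rows

def diagnose_cpu(rows):
    """Return a human-readable finding for each suspicious sample."""
    findings = []
    for r in rows:
        if r["%idle"] < IDLE_MIN:
            findings.append(f"{r['time']}: %idle {r['%idle']:.2f}, possible CPU bottleneck")
        if r["%iowait"] > IOWAIT_MAX:
            findings.append(f"{r['time']}: %iowait {r['%iowait']:.2f}, possible disk bottleneck")
    return findings

if __name__ == "__main__":
    print(diagnose_cpu(parse_sar_u(SAMPLE_U)) or "no CPU issues in these samples")
```

With the single, fully idle sample above it prints "no CPU issues in these samples". For live data the text could instead come from running sar -u 1 5 through subprocess.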

Load
(! 1005)-> sar -q 1 1
Linux 2.6.18-348.el5xen (vx181d) 08/18/2013

11:59:19 PM runq-sz plist-sz ldavg-1 ldavg-5 ldavg-15
11:59:20 PM 3 482 2.13 1.04 0.03
11:59:21 PM 5 487 2.15 1.04 0.03
11:59:22 PM 6 489 2.16 1.04 0.03

Average: 0 482 0.13 0.04 0.03

sar -q displays the run queue length (runq-sz), the number of tasks in the task list (plist-sz), and the load averages for the last 1, 5, and 15 minutes (ldavg-1, ldavg-5, ldavg-15).

The system appears somewhat busy: between 3 and 6 tasks are waiting in the run queue in each sample, so several processes are competing for CPU time at once.

If the ldavg-1 column stays consistently high, or keeps rising while you watch it, something on the server is probably driving up its usage.
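A rough way to check the "keeps rising" condition is to pull the ldavg-1 column out of the sar -q rows and test whether each sample is at least as high as the one before. This Python sketch uses the three samples shown above; the function names and the three-sample minimum are illustrative assumptions.

```python
# Sketch: check whether ldavg-1 keeps rising across `sar -q` samples.
# Sample rows copied from the sar -q output above (Average line omitted).
SAMPLE_Q = """\
11:59:19 PM runq-sz plist-sz ldavg-1 ldavg-5 ldavg-15
11:59:20 PM 3 482 2.13 1.04 0.03
11:59:21 PM 5 487 2.15 1.04 0.03
11:59:22 PM 6 489 2.16 1.04 0.03
"""

def column_series(text, name):
    """Return the values of one named column from sar sample rows."""
    values, offset = [], None
    for line in text.splitlines():
        fields = line.split()
        if name in fields:                               # header row
            # Count the position from the right so AM/PM does not shift it.
            offset = fields.index(name) - len(fields)
            continue
        if offset is None or not fields or fields[0] == "Average:":
            continue
        values.append(float(fields[offset]))
    return values

def keeps_rising(values, min_samples=3):
    """True if there are enough samples and none is lower than the one before."""
    return len(values) >= min_samples and all(b >= a for a, b in zip(values, values[1:]))

if __name__ == "__main__":
    ld1 = column_series(SAMPLE_Q, "ldavg-1")
    print(ld1, "rising" if keeps_rising(ld1) else "not rising")
```

For the samples above it prints "[2.13, 2.15, 2.16] rising". Three one-second samples are a very short window; in practice you would watch ldavg-1 over several minutes.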

As a rule of thumb, a system's load average should stay at or below about 70% of the number of CPU cores. If the load is consistently above that level, performance may degrade, and if it stays above the number of cores, processes are queuing for CPU time and a noticeable slowdown is likely.
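The 70% rule of thumb can be expressed as a load-per-core ratio. The sketch below classifies a load average as ok, elevated (above 70% of the cores), or overloaded (above the number of cores); the category names are illustrative. On Linux, Python's os.getloadavg() returns the same 1-, 5- and 15-minute load averages that sar -q shows as ldavg-1, ldavg-5 and ldavg-15.

```python
import os

def load_status(ldavg, cores):
    """Classify a load average against the 70%-of-cores rule of thumb."""
    ratio = ldavg / cores
    if ratio > 1.0:
        return "overloaded"   # more runnable work than cores: slowdown likely
    if ratio > 0.7:
        return "elevated"     # above the 70% comfort zone: possible degradation
    return "ok"

if __name__ == "__main__":
    try:
        one_min, _, _ = os.getloadavg()      # same kernel values sar -q reports
    except (AttributeError, OSError):        # not available on this platform
        print("load average not available here")
    else:
        cores = os.cpu_count() or 1
        print(f"{cores} cores, ldavg-1={one_min:.2f}: {load_status(one_min, cores)}")
```

For example, the ldavg-1 of 2.16 seen above would be "ok" on a 4-core machine but "overloaded" on a 2-core machine; the output does not say how many cores vx181d has.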

Memory
omhq19e9:dwls990-~ $ sar -r 1 2
Linux 2.6.18-348.4.1.el5 (omhq19e9) 08/19/2013

04:30:20 AM kbmemfree kbmemused %memused kbbuffers kbcached kbswpfree kbswpused %swpused kbswpcad
04:30:01 62716272 201531148 76.27 15952 1556572 2048236 12 0.00 4
04:40:03 191904 264055516 99.93 2692 28908 0 2048248 100.00 8496
04:50:14 184100 264063320 99.93 1388 10600 0 2048248 100.00 0

Average: 4415719 259831701 98.33 1357749 20307185 1906978 141270 6.90 297

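Taken at 10-minute intervals, the memory samples tell a clear story. On its own, a high %memused is not alarming: Linux uses otherwise free memory for buffers and page cache, so usage close to 99% is normal. The warning signs are elsewhere. At 04:30 memory is 76.27% used and swap is essentially untouched (12 KB). By 04:40 %memused has reached 99.93%, kbbuffers and kbcached have shrunk from 15952 KB and 1556572 KB to 2692 KB and 28908 KB, and swap is 100% used with nothing free. The kernel has reclaimed most of the cache and then exhausted swap, which points to real memory pressure from applications rather than normal caching.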
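The same reasoning can be checked mechanically: flag a sample only when swap is heavily used and the page cache has shrunk compared with the first sample. This Python sketch assumes the older sysstat layout shown above, where the swap columns appear in sar -r; newer sysstat releases may report swap separately under sar -S, in which case the column names would need adjusting. SWAP_LIMIT and the function names are hypothetical.

```python
# Sketch: spot memory pressure in `sar -r` output (older sysstat layout,
# with swap columns in the same report, as in the output above).
SWAP_LIMIT = 50.0   # %swpused above this counts as heavy swapping (assumption)

SAMPLE_R = """\
04:30:20 AM kbmemfree kbmemused %memused kbbuffers kbcached kbswpfree kbswpused %swpused kbswpcad
04:30:01 62716272 201531148 76.27 15952 1556572 2048236 12 0.00 4
04:40:03 191904 264055516 99.93 2692 28908 0 2048248 100.00 8496
04:50:14 184100 264063320 99.93 1388 10600 0 2048248 100.00 0
"""

def parse_sar_r(text):
    """Return one dict per sample row of `sar -r` output."""
    rows, cols = [], None
    for line in text.splitlines():
        fields = line.split()
        if "kbmemfree" in fields:              # header row
            cols = fields[fields.index("kbmemfree"):]
            continue
        if cols is None or not fields or fields[0] == "Average:":
            continue
        n = len(cols)
        row = {"time": " ".join(fields[:-n])}  # timestamp, with or without AM/PM
        row.update({name: float(v) for name, v in zip(cols, fields[-n:])})
        rows.append(row)
    return rows

def memory_pressure(rows):
    """Times where swap is heavily used AND page cache is below its first-sample size."""
    baseline_cache = rows[0]["kbcached"]
    return [r["time"] for r in rows
            if r["%swpused"] > SWAP_LIMIT and r["kbcached"] < baseline_cache]

if __name__ == "__main__":
    print(memory_pressure(parse_sar_r(SAMPLE_R)))
```

Run against the samples above, it prints ['04:40:03', '04:50:14'].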


IO
04:30:00 tps rtps wtps bread/s bwrtn/s
04:30:01 67.95 51.84 16.12 4303.82 4664.14
04:40:03 564.60 227.07 337.52 34338.84 87719.02
04:50:14 51.05 40.25 10.80 1326.32 245.06

Average: 31.12 11.00 20.12 1383.64 3346.65

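These columns come from sar -b: tps is the total number of transfers per second, rtps and wtps are read and write requests per second, and bread/s and bwrtn/s are blocks read and written per second. Absolute disk throughput depends on the underlying hardware, so instead of comparing against fixed numbers, establish what is 'normal' for this system by examining the data over a period of time, and then look for spikes. There is a large spike at 04:40, where reads and writes increase dramatically (tps jumps from 67.95 to 564.60). By 04:50 the rates are back down, so the burst was short-lived. Note, though, that it coincides with the point at which swap filled up in the memory samples, and swap was still 100% used at 04:50.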
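One simple way to turn "look for spikes" into a check is to use the median of the samples as the baseline and flag anything several times larger. This Python sketch does that for the sar -b data above. The median baseline and the 3x SPIKE_FACTOR are assumptions, and with only three samples the baseline is crude; a full day of samples would give a more meaningful picture of "normal".

```python
# Sketch: flag I/O spikes in `sar -b` output against a median baseline.
# SPIKE_FACTOR is a hypothetical multiplier, not a sar setting.
from statistics import median

SPIKE_FACTOR = 3.0   # a sample this many times the median counts as a spike (assumption)

SAMPLE_B = """\
04:30:00 tps rtps wtps bread/s bwrtn/s
04:30:01 67.95 51.84 16.12 4303.82 4664.14
04:40:03 564.60 227.07 337.52 34338.84 87719.02
04:50:14 51.05 40.25 10.80 1326.32 245.06
"""

def parse_sar_b(text):
    """Return one dict per sample row of `sar -b` output."""
    rows, cols = [], None
    for line in text.splitlines():
        fields = line.split()
        if "tps" in fields:                    # header row
            cols = fields[fields.index("tps"):]
            continue
        if cols is None or not fields or fields[0] == "Average:":
            continue
        n = len(cols)
        row = {"time": " ".join(fields[:-n])}
        row.update({name: float(v) for name, v in zip(cols, fields[-n:])})
        rows.append(row)
    return rows

def find_spikes(rows, column="tps"):
    """Times where `column` exceeds SPIKE_FACTOR times its median over all samples."""
    baseline = median([r[column] for r in rows])
    return [r["time"] for r in rows if r[column] > SPIKE_FACTOR * baseline]

if __name__ == "__main__":
    rows = parse_sar_b(SAMPLE_B)
    for col in ("tps", "bread/s", "bwrtn/s"):
        print(col, find_spikes(rows, col))
```

For tps, bread/s and bwrtn/s it reports only the 04:40:03 sample. The median is used rather than the mean so that the spike itself does not inflate the baseline.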

This troubleshooting guide will be updated continuously with more tips.

More to come. Happy learning :-)