
Monday, March 3, 2014

Uptime explained

When we say there is high load on a system, we are talking about the system's load average. In this article we will see how the load average can be used to identify issues. The load average can be obtained by running the uptime command:

omhq199e:dwls999-~ $ uptime
 01:03:24 up 15 days, 14:30,  2 users,  load average: 1.30, 1.32, 0.89

Or we can get the same details from the first line of the top command:

omhq199e:dwls999-~ $ top
top - 01:03:26 up 15 days, 14:30,  2 users,  load average: 1.20, 1.30, 0.88

In both cases the load average appears at the end of the line. In the top output above it reads
load average: 1.20, 1.30, 0.88

These numbers are the system load averages over the last 1, 5 and 15 minutes. They show how well the system is keeping up with the processes that need CPU time: each number is the average number of processes that were either running on a CPU or waiting for one during that period. (On Linux, processes in uninterruptible sleep, typically waiting on disk I/O, are counted as well.)
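As a small sketch of how the three numbers could be pulled out for use in a script, the function below splits an uptime-style line on the text "load average: " and strips the commas. The name parse_load is only illustrative, and the function assumes the Linux output format shown above. On Linux the same three values are also the first three fields of /proc/loadavg.

```shell
#!/bin/sh
# parse_load: print the 1, 5 and 15 minute load averages from an
# uptime-style line, separated by spaces.
# Assumes the "load average: a, b, c" format shown in this article.
parse_load() {
    printf '%s\n' "$1" | awk -F'load average: ' '{ print $2 }' | tr -d ','
}

# Sample line copied from the uptime output above.
line=' 01:03:24 up 15 days, 14:30,  2 users,  load average: 1.30, 1.32, 0.89'
parse_load "$line"    # prints: 1.30 1.32 0.89
```

On a live system the line could come from the command itself, for example parse_load "$(uptime)".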

On a single-core system:
If we have a load of 0, the system is idle.
If we have a load of 1, then on average one process was using or waiting for the CPU, so the CPU is fully occupied.
If we have a load of 2, then on average two processes wanted the CPU at once, so one of them was waiting for CPU time.

In more detail: imagine that I am a bridge operator and I want the cars to move smoothly across the bridge. A load of ‘0’ means there are no cars on the bridge. A load between ‘0’ and ‘1’ is normal: cars cross without waiting. A load of ‘1’ means the bridge is exactly full: the cars are still moving smoothly, but any additional car has to wait, and the load rises.

If I say that I have a load of ‘2’, it means there are twice as many cars as the bridge can carry: one bridge-full is crossing and another is waiting to cross.

The cars here correspond to processes in Linux. The load rises when processes have to wait for CPU time, so on a single-core system the load should ideally stay below 1.0.

So is ‘1.00’ the ideal load? At what load should we start to worry?
The answer differs with the number of CPUs installed and the number of cores each one has.

For a single-core system, a load average of up to 1.0 is considered normal. Similarly, for a dual-core system the load average is normal up to 2.0. This means that a quad-core system with a load average of 4.0 is fully used but still keeping up; if the load average is above 4.0, the system is overloaded and we need to find the reasons.
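The rule of thumb above can be sketched as a pair of small shell functions. The names load_status and load_per_core are hypothetical, and the threshold (load equal to the number of cores) is simply the one stated in this article, not a universal limit.

```shell
#!/bin/sh
# load_status LOAD CORES
# Prints "normal" when LOAD is at or below the core count, "high" otherwise.
load_status() {
    awk -v avg="$1" -v n="$2" 'BEGIN {
        if (avg + 0 > n + 0) { print "high" } else { print "normal" }
    }'
}

# load_per_core LOAD CORES
# Prints the load divided by the core count, to two decimals.
load_per_core() {
    awk -v avg="$1" -v n="$2" 'BEGIN { printf "%.2f\n", avg / n }'
}

load_status 4.0 4      # prints: normal
load_status 4.5 4      # prints: high
load_per_core 1.30 2   # prints: 0.65
```

Dividing by the core count gives a figure that can be compared across machines: a value above 1.00 means there is more work than there are cores.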

Average to Consider?
When we run the uptime command we see load averages for 1, 5 and 15 minutes. Which one should we look at?

We need to concentrate on the 5 and 15 minute averages. A spike in the 1 minute average is acceptable, but if the 5 or 15 minute average stays high we need to treat it as a server issue and find out the reasons.
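One way to express this in a script is to look at the longer averages first and treat a high 1 minute value on its own as a spike. The function name classify_load and its three labels are made up for this sketch, and the core count is again used as the threshold.

```shell
#!/bin/sh
# classify_load ONE FIVE FIFTEEN CORES
# "sustained": the 5 or 15 minute average exceeds the core count
# "spike":     only the 1 minute average exceeds it
# "normal":    none of the three exceeds it
classify_load() {
    awk -v one="$1" -v five="$2" -v fifteen="$3" -v n="$4" 'BEGIN {
        if (five + 0 > n + 0 || fifteen + 0 > n + 0) { print "sustained" }
        else if (one + 0 > n + 0) { print "spike" }
        else { print "normal" }
    }'
}

classify_load 1.30 1.32 0.89 2   # prints: normal
classify_load 3.10 1.40 0.90 2   # prints: spike
classify_load 2.50 2.80 2.20 2   # prints: sustained
```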

How do I Find Out How Many Cores are Available?

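Each logical processor has its own 'model name' entry in /proc/cpuinfo, so counting those lines gives the number of logical processors:

[root@vx148d tmp]$ grep -c 'model name' /proc/cpuinfo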
2

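To see how those processors are spread over sockets and cores, list each unique pair of physical id (the socket) and core id:

[root@vx148d tmp]$ grep -E -e "core id" -e ^physical /proc/cpuinfo | xargs -L 2 echo | sort -u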
physical id : 0 core id : 0
physical id : 1 core id : 0
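
The same idea can be sketched as a function that counts the distinct (physical id, core id) pairs instead of listing them. The name count_physical_cores is hypothetical, and the sketch assumes an x86-style /proc/cpuinfo in which each processor entry lists "physical id" before "core id"; other architectures may not provide these fields at all.

```shell
#!/bin/sh
# count_physical_cores: read cpuinfo-style text on stdin and print the
# number of distinct (physical id, core id) pairs.
count_physical_cores() {
    awk '
        /^physical id/ { socket = $NF }
        /^core id/     { seen[socket ":" $NF] = 1 }
        END            { n = 0; for (k in seen) n++; print n }
    '
}

# Run against the live system only where the file exists.
if [ -r /proc/cpuinfo ]; then
    count_physical_cores < /proc/cpuinfo
fi
```

For the two pairs listed above (socket 0 core 0, socket 1 core 0) the count would be 2. On a machine with hyper-threading the result can be lower than the logical processor count, because two logical processors share one core.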


More to come. Happy learning!