How Do You Know When to Worry About Linux CPU Load

How do you know when to worry about Linux CPU load? You may already know what a Linux load average is: the three numbers reported by the uptime and top commands. Both commands print them after the label “load average:” as three comma-separated values.
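A script can read the same three numbers. The following is a minimal Python sketch, assuming a Unix-like system where os.getloadavg() is available; the helper names read_load_averages and parse_proc_loadavg are hypothetical. The second helper parses the Linux file /proc/loadavg, whose first three fields are the one-, five-, and fifteen-minute load averages.

```python
import os


def read_load_averages():
    """Return the (1-minute, 5-minute, 15-minute) load averages.

    os.getloadavg() exists on Unix-like systems only and raises
    OSError if the values cannot be obtained.
    """
    return os.getloadavg()


def parse_proc_loadavg(text):
    """Parse the first three fields of Linux's /proc/loadavg.

    A line of the form "0.20 0.18 0.12 1/80 11206" holds three load
    averages, then running/total scheduling entities, then the most
    recently assigned PID. Only the first three fields are returned.
    """
    one, five, fifteen = (float(field) for field in text.split()[:3])
    return one, five, fifteen


if __name__ == "__main__":
    one, five, fifteen = read_load_averages()
    print(f"load average: {one:.2f}, {five:.2f}, {fifteen:.2f}")
```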

Most people have some idea of what load averages mean: the three numbers are averages over progressively longer periods (one, five, and fifteen minutes), and lower numbers are better. Higher numbers mean that the machine is very busy or that something is wrong. But where do you draw the line? What counts as a “good” or a “bad” load average? When should a value merely concern you, and when should you fix it as soon as possible?

First, let’s talk about what Linux load average values mean in general. We’ll start with the simplest case: a machine with a single core.

The traffic analogy

A single-core CPU is like a bridge with only one lane. Imagine you’re the bridge operator. Sometimes the bridge is so busy that cars have to wait in line to cross. You want to let drivers know how traffic on your bridge is moving, and a good metric is how many cars are waiting at a given moment. If no cars are waiting, arriving drivers know they can cross right away. If there is a line, they know they will have to wait.

So, Bridge Operator, what numbering system will you use? How about this:

  • 0.00 means there is no traffic on the bridge at all. In fact, any value between 0.00 and 1.00 means there is no backup, and an arriving car can cross immediately.
  • 1.00 means the bridge is exactly at capacity. All is still well, but if traffic gets any heavier, things will slow down.
  • Over 1.00 means cars are waiting. How many? 2.00 means there are two lanes’ worth of cars in total: one lane on the bridge and one lane waiting. 3.00 means three lanes in total: one on the bridge and two waiting. And so on.
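The bridge numbering above can be sketched in a few lines of Python. The helper name bridge_lanes is hypothetical; it simply splits a single-core load value into the part that fits on the one-lane bridge and the part queued behind it.

```python
def bridge_lanes(load):
    """Split a single-core load value into the bridge analogy's two parts.

    Returns (on_bridge, waiting): how much of the single lane is in use,
    and how many lanes' worth of cars are queued behind it.
    """
    if load < 0:
        raise ValueError("load cannot be negative")
    on_bridge = min(load, 1.0)       # the one lane can hold at most 1.00
    waiting = max(load - 1.0, 0.0)   # anything beyond that is queued
    return on_bridge, waiting


if __name__ == "__main__":
    for load in (0.00, 0.50, 1.00, 2.00, 3.00):
        on_bridge, waiting = bridge_lanes(load)
        print(f"{load:.2f}: {on_bridge:.2f} lane on the bridge, {waiting:.2f} lanes waiting")
```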

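This is essentially what CPU load is. “Cars” are processes that are either using the CPU (“crossing the bridge”) or waiting in line for it. Unix calls this the run-queue length: the number of processes currently running plus the number waiting (queued) to run. Linux also counts processes in uninterruptible sleep, typically those waiting on disk I/O, so a high load on Linux does not always mean the CPU itself is saturated.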
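On Linux, an instantaneous view of the run queue is exposed in /proc/stat through the procs_running and procs_blocked counters. The sketch below, with the hypothetical helper name run_queue_snapshot, parses those two lines. Treat the sum only as a rough analogue of a single load-average sample: procs_blocked counts processes blocked waiting for I/O, which overlaps with, but is not identical to, the uninterruptible tasks the kernel folds into the load, and procs_running includes the reading process itself.

```python
def run_queue_snapshot(stat_text):
    """Parse procs_running and procs_blocked from the text of /proc/stat.

    Returns a dict mapping each counter name found to its integer value.
    """
    counts = {}
    for line in stat_text.splitlines():
        fields = line.split()
        if len(fields) == 2 and fields[0] in ("procs_running", "procs_blocked"):
            counts[fields[0]] = int(fields[1])
    return counts


if __name__ == "__main__":
    with open("/proc/stat") as f:  # Linux-only file
        snapshot = run_queue_snapshot(f.read())
    print(snapshot)
```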

Like the bridge operator, you’d like your cars (processes) never to have to wait, so you should try to keep your CPU load below 1.00. And like the bridge operator, you’re still fine with occasional brief spikes above 1.00; it’s when the load stays above 1.00 for a long time that you need to worry.
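Why are brief spikes tolerable while sustained load is not? The Linux kernel samples the run queue roughly every 5 seconds and folds each sample into exponentially damped moving averages; the decay factor for the one-minute figure is exp(-5/60), computed in fixed point. The sketch below is a simplified floating-point model of that idea, not the kernel’s code; the function name damped_average and both traces are hypothetical, made up for illustration.

```python
import math

SAMPLE_SECONDS = 5  # Linux samples the run queue roughly every 5 seconds


def damped_average(samples, window_seconds=60, initial=0.0):
    """Exponentially damped moving average of run-queue samples.

    Samples are assumed to be SAMPLE_SECONDS apart. window_seconds=60
    models the 1-minute figure; 300 and 900 model the 5- and 15-minute
    figures. Returns the running average after each sample.
    """
    decay = math.exp(-SAMPLE_SECONDS / window_seconds)
    load = initial
    history = []
    for runnable in samples:
        load = load * decay + runnable * (1 - decay)
        history.append(load)
    return history


# Hypothetical traces: an idle box hit by a 30-second burst of 4 runnable
# processes, versus 2 runnable processes for a full 15 minutes.
spike = [4] * 6 + [0] * 24
sustained = [2] * 180

if __name__ == "__main__":
    for window in (60, 300, 900):
        print(f"{window // 60:>2}-minute: spike peak {max(damped_average(spike, window)):.2f}, "
              f"sustained end {damped_average(sustained, window)[-1]:.2f}")
```

In this model the short burst pushes only the one-minute figure above 1.00, while the sustained overload eventually drags the fifteen-minute figure above it as well, which is why the longer averages are the better signal for “stays above.”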

So you’re saying that 1.00 is the ideal load?

Nope, not really. At a load of 1.00 you have no headroom. In practice, many system administrators draw the line at 0.70:

  • The “Need to Look into It” rule of thumb: 0.70. If your load average stays above 0.70, investigate what’s going on before things get worse.
  • The “Fix This Now” rule of thumb: 1.00. If your load average stays above 1.00, find the cause and fix it right away. Otherwise you’re likely to get woken up in the middle of the night, and that won’t be fun.
  • The “Argh, It’s 3 AM, What the Heck?” rule of thumb: 5.00. If your load average is above 5.00, you could be in serious trouble: your box is either hanging or slowing to a crawl. And it will happen at the worst possible moment, such as the middle of the night or while you’re presenting at a conference. Don’t let it get there.
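The three thresholds above can be encoded as a small classifier. This is a sketch under stated assumptions: the function name load_status is hypothetical, it applies to a single-core machine, each threshold is treated as “strictly above,” and because the rules are about load that stays high, you would typically feed it the five- or fifteen-minute average rather than the one-minute value.

```python
def load_status(load_average):
    """Map a single-core load average to the rules of thumb.

    Thresholds are treated as strictly greater-than. Intended for a
    sustained figure such as the 5- or 15-minute average.
    """
    if load_average > 5.0:
        return "argh, it's 3 AM"
    if load_average > 1.0:
        return "fix this now"
    if load_average > 0.70:
        return "need to look into it"
    return "ok"


if __name__ == "__main__":
    for value in (0.35, 0.85, 1.40, 6.20):
        print(f"{value:.2f}: {load_status(value)}")
```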

What about more than one processor? My load says 3, but everything is fine.

Got a quad-processor system? With a load of 3.00, it’s still healthy.

On a system with more than one processor, the load should be read relative to the number of processor cores available. The “100% utilization” mark is 1.00 on a single-core system, 2.00 on a dual-core system, 4.00 on a quad-core system, and so on.

Back to the bridge example: “1.00” really means “one lane’s worth of traffic.” On a one-lane bridge, that means the bridge is full. On a two-lane bridge, a load of 1.00 means it’s at 50% capacity: one lane is full, and the other is still free.

The same goes for CPUs: a load of 1.00 means 100% CPU utilization on a single-core box. On a dual-core box, it takes a load of 2.00 for both cores to be fully utilized.
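The per-core reading can be expressed as a simple division of the load average by the number of cores. The function name utilization is hypothetical; a result of 1.0 corresponds to the “100% utilization” mark, and anything above it means processes are queued.

```python
def utilization(load_average, cores):
    """Express a load average as a fraction of total CPU capacity.

    1.0 means every core is busy with nothing waiting; values above
    1.0 mean processes are queued for a core.
    """
    if cores < 1:
        raise ValueError("need at least one core")
    return load_average / cores


if __name__ == "__main__":
    for load, cores in [(1.00, 1), (1.00, 2), (2.00, 2), (3.00, 4)]:
        print(f"load {load:.2f} on {cores} core(s): {utilization(load, cores):.0%} of capacity")
```

This reproduces the examples in the text: 1.00 is full on one core but 50% on two, and the quad-core box at 3.00 is at 75%.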

Multicore vs. multiprocessor

While we’re at it, let’s talk about multicore versus multiprocessor. In terms of performance, is a machine with one dual-core processor basically equivalent to a machine with two single-core processors? Roughly, yes. There are many subtleties, such as the amount of cache and how often processes are handed off from one processor to another. Those details matter, but for interpreting the load average, what counts is the total number of cores, not how many physical processors they are spread across.

So, there are two new rules of thumb:

  • The “Max load = number of cores” rule of thumb: on a multicore system, your load should not exceed the number of cores available.
  • The “Cores are cores” rule of thumb: how the cores are spread across CPUs doesn’t matter. Two quad-core processors equal four dual-core processors, which equal eight single-core processors. For these purposes, each of those setups is simply eight cores.
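Both rules can be sketched together. The helper names total_cores and over_capacity are hypothetical. One assumption to flag: os.cpu_count() reports logical CPUs, so on a machine with hyper-threading it counts more CPUs than there are physical cores, a distinction the rules of thumb above do not address.

```python
import os


def total_cores(cores_per_processor):
    """'Cores are cores': only the total matters, not the packaging."""
    return sum(cores_per_processor)


def over_capacity(load_average, cores):
    """'Max load = number of cores': True when load exceeds the core count."""
    return load_average > cores


if __name__ == "__main__":
    # Two quad-cores, four dual-cores, and eight single-cores all total 8.
    layouts = [[4, 4], [2, 2, 2, 2], [1] * 8]
    print([total_cores(layout) for layout in layouts])

    cores = os.cpu_count() or 1  # logical CPUs; may include hyper-threads
    _, _, fifteen = os.getloadavg()  # Unix-like systems only
    print(f"15-minute load {fifteen:.2f} on {cores} CPUs, "
          f"over capacity: {over_capacity(fifteen, cores)}")
```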

