CodeSport (October 2011)

Let's optimise a bit...

In this month’s column, let us focus our attention on some of the common compiler optimisations, and on how to write code that can be well optimised by the compiler.

In last month’s column, I had posed a set of questions from C/C++, algorithms and data structures for readers to practice and prepare for interviews. I would like to thank those of you who sent in your answers to the different questions. Two of our readers, R Mohan and C K Jha, answered six of the 15 questions correctly. Congratulations to both of them!

However, since there was no one who answered all the questions correctly, I would like to keep the questions open for this month as well. Please do send me your answers quickly, so that we can discuss the solutions to these questions in next month’s column.

One of our student readers had written to ask me about what happens under the hood in a compiler, and how it optimises the code. The best book to learn all about compilers is Compiler Design by Aho, Sethi and Ullman. Though the book seems quite daunting, with considerable focus on the front-end translation aspects of compilation, it is definitely worth a read.

In this month’s column, we take a quick peek at the various optimisations that are typically performed by the compiler.

What happens during compilation? As most of you would already know, compilers typically take as input a program written in a high-level language, and generate the corresponding machine code, which can run on a specified architecture. For example, if you have a file containing C code, and you compiled it using the open source GCC compiler on your x86 machine, the GCC compiler would generate the corresponding x86 machine code, which will then run on your x86 machine.

Typically, the high-level language program is lowered to the machine code by the compiler. The lowering happens in multiple steps, such as translating the high-level language code to an intermediate representation, which is then lowered to machine code. The following diagram shows the typical organisation of a compiler.

Typical compiler organisation

Typical compiler organisation

From a layman’s perspective, we can consider the compiler as consisting of two pieces, the front-end and the back-end. The front-end is language-dependent, and is different for each high-level language like C, C++ or Fortran. Front-ends take source files in a high-level language and create intermediate code. The back-end of the compiler typically consists of an optimiser, which optimises the intermediate code; and the code generator, which generates the machine code.

So it is possible to have different front-ends sharing a common back-end (which is done in GCC). The back-end needs to be different for each of the different computer architectures for which it generates machine code. For instance, when compiling C programs in GCC on the x86 platform and on the Sun SPARC platform, both compilers will use the same front-end (C), but will have different back-ends for platform- specific machine code.

The optimiser is part of the compiler back-end. Recall that the intermediate code is taken as input by the back-end. Optimisations performed in the back-end can be either machine-independent or machine-dependent. Typically, the optimiser consists of two parts: a high-level optimiser, which performs machine-independent optimisations, and a low-level optimiser, which performs the machine-dependent optimisations.

Next, let us look at a few of the common optimisations performed by most compilers.

Where did all my code vanish?

Consider the following code snippet:

int foo(int x) {
      int i = 9;
      x = 0;
      if (x) {
            i = 13;
      }
      return i;
}

Now, when this function is optimised by the compiler, here is the high-level language version of the machine code that would be generated.

int foo(int x) {
      return 9;
}

The transformation of foo from high-level to low-level, where it simply returns the value 9, involves two important optimisations: constant propagation and dead code elimination. In constant propagation, the compiler determines which of the variables involved in a computation are constant, and then uses the constant values to compute the result. If a variable has a constant value assigned to it, then the constant value is propagated to all uses of that variable.

In the above code snippet, x was assigned the constant value 0. The compiler propagates this value to all uses of x. Thus, the if condition becomes if (0), which always evaluates to false. Now, by dead code elimination, the compiler eliminates the code controlled by the if statement, since the condition always evaluates to false (the block will never be entered). Therefore, the only definition of i that reaches the statement return i is the assignment i = 9. That’s why the compiler again performs constant propagation for the variable i, reducing the function to the single statement return 9.

It is important to know the difference between dead code and useless code. Dead code is code that can never be executed. In the above code example, the statement i = 13 is dead code, since it cannot ever be executed. On the other hand, useless code consists of executable statements, but whose results are not used further in the computation, and hence can be eliminated if it does not have any other side effects.

For example, consider the following code snippet:

int foo(int x)
{
      int i = 9;
      int j = 13;
      int k;
      k = i + j;
      k = k - 1;
      return i;
}

Here, the second to fifth statements are unused code, since their results are not used further in the function. Hence, the compiler can eliminate these useless code statements.

Another important optimisation that compilers perform is redundant computation elimination. Redundant computation is when the result value of a computation is already available at the point of computation. In other words, at a given program point P, an expression X op Y is redundant if there is another evaluation of X op Y that definitely reaches this point P along all paths from the program entry, and there is no re-definition of the operands X and Y from the earlier evaluation of the expression along any of the paths to P.

There are two types of redundant computation elimination. One is fully redundant computation elimination, more popularly known as “common sub-expression elimination”. Another is partial redundant elimination (PRE). Let’s look at examples of each of these optimisations.

if (cond1) {
      // some code
      y = x + 4;
} else {
      // other code
      a = x + 4;
}
z = x + 4;

In the above code snippet, the computation of the expression (x+4) in the statement z = x + 4 is fully redundant since this expression is available along all paths reaching this statement. Therefore, the compiler can eliminate this computation. The modified code after common sub-expression elimination is shown below.

if (cond1) {
      // some code
      t = x + 4;
      y = t;
} else {
      // other code
      t = x + 4;
      a = t;
}
z = t;

However, note that we can apply common sub-expression elimination only if the expression is available along all paths reaching the point P.

Now, can you eliminate the computation of (x + 4) in the statement z = x + 4 in the code snippet shown below?

if (cond1) {
      // some code
      y = x + 4;
} else {
      // other code
}
z = x + 4;

You cannot apply common sub-expression elimination here, since the expression (x + 4) is not available along the path containing the else-part of the if statement reaching the statement z = x + 4.

However, partial redundancy elimination (PRE) can be applied to eliminate this redundant computation. All the optimisations we have discussed here are known as scalar optimisations. Compilers also support optimisations targeted at loops in the code. These are known as loop optimisations. Recall that applications spend most of their time waiting for data from the main memory. In fact, a study of enterprise applications shows that more than 60 per cent of application execution time is lost in data cache miss stall cycles. Therefore, the primary aim of loop optimisations is to reduce data cache misses experienced by an application.

Some of the popular loop optimisations are loop interchange, loop unswitching, loop tiling, loop fission, loop fusion, loop unrolling and strip mining. We will discuss some of the loop optimisations in next month’s column.

Takeaway question for this month

Consider the following code snippet from a C program.

for (int j = 0; j < 20; j++) {
      for (int i = 0; i < 10; i++) {
            a[i][j] = i + j; }
}

Remember that arrays are allocated in row-major form in C applications. This means that the contiguous elements of the row are placed together. How do you rewrite the above code snippet to reduce the data cache misses incurred by this code snippet?

My “must-read” article for this month

The recommendation for this month’s “must-read” article comes from our reader Abhinav Kumar. “When algorithms control the world” is an online article by BBC reporter, Jane Wakefield. Thanks to Abhinav for the recommendation. I read this article and found it to be thought-provoking. Though it was a bit one-sided, focusing more on the overt impact of algorithms on everyday life, it was an interesting read.

If you have a favourite programming book/article that you think is a must-read for every programmer, please do send me a note with the book’s name, and a short write-up on why you think it is useful, so I can mention it in this column. This would help many readers who want to improve their coding skills.

If you have any favourite programming puzzles that you would like to discuss on this forum, please send them to me, along with your solutions and feedback, at sandyasm_AT_yahoo_DOT_com. Till we meet again next month, happy programming, and here’s wishing you the very best!

Feature image courtesy: Per Axbom. Reused under terms of CC-BY-SA 2.0 License.

All published articles are released under Creative Commons Attribution-NonCommercial 3.0 Unported License, unless otherwise noted.
Open Source For You is powered by WordPress, which gladly sits on top of a CentOS-based LEMP stack.

Creative Commons License.