In last month’s column, I had presented a coding snippet (shown below) and had asked readers how it could be rewritten to reduce the data cache misses associated with that code region:
for (int j = 0; j < 20; j++) { for (int i = 0; i < 10; i++) { a[i][j] = i + j; } }
Congrats to our readers Babu Prasad, Vishuprakash and Sudhanshu Gupta who emailed me with the correct solution! As they pointed out in their email, the matrix is allocated in row-major form, but the data access pattern is in column-major form. Hence, the code snippet suffers from considerable data cache misses. Therefore, the inner and outer loops need to be inter-changed in order that the data access pattern for the matrix is the same as that of the allocation pattern. The modified code is as follows:
for (int i = 0; i < 10; i++) { for (int j = 0; j < 20; j++) { a[i][j] = i + j; } }
This transformation is performed by compilers automatically, and is known as a loop interchange. However, one needs to remember that it may not always be possible to interchange the inner and outer loops, due to dependencies in the code. Loop interchange is illegal if it changes the dependencies that were present in the original code.
Consider the following piece of FORTRAN code (in FORTRAN, arrays are allocated in column-major form):
do i = 1,n do j = 1,n C(i,j) = C(i+1,j-1) enddo enddo
Assume that in order to improve locality, the compiler performs loop interchange, and generates the following code:
do j = 1,n do i = 1,n C(i,j) = C(i+1,j-1) enddo enddo
Can you figure out why this transformation is illegal in this example? When the value of I
is 1 and that of j
is 2, we have the assignment: C(1,2) = C(2,1)
, where C(2,1)
is the original value from the matrix which has not yet been assigned in the loop.
However, in the interchanged case, we have the iteration corresponding to (I = 2, j = 1)
occurring earlier than the iteration corresponding to (I =1, j = 2)
. In the iteration corresponding to (I = 2, j = 1)
, the matrix element C(2, 1)
is assigned a new value, and in the iteration corresponding to (I = 1, j = 2)
, it is the modified value of C(2, 1)
, which is now assigned to C(1, 2)
.
Hence, the results computed by the original code and the modified code, where the loops were interchanged, would be different — and hence it is illegal to apply this transformation. Therefore, loop interchange can be applied only if all the original dependencies that were present are completely preserved after the transformation.
Now, let us discuss some of the other loop transformations performed by the compiler.
Loop fusion transformation
Consider the following piece of code:
for (int i = 0; i < N; ++i){ a[i] = b[i]; } for (int i = 0; i < N; ++i){ c[i] = b[i] + d[i]; }
These two loops appear immediately after one another in the original source code, and they run over the same values of the loop index. Assume that in our architecture, the cache sizes are small, so that the entire array b
cannot be held in cache; instead, only a few elements of b can be present at a time in the cache. Because of the limited size of the cache, none of the elements b[i]
exhibit cache reuse in the second loop; they need to be re-fetched into cache. This leads to a considerable number of cache misses for this code region.
In order to improve the cache re-use for the array b
, the compiler performs a transformation known as loop fusion, wherein it fuses the two loops into a single loop as shown below:
for (int i = 0; i < N; ++i){ a[i] = b[i]; c[i] = b[i] + d[i]; }
By fusing the two loops, the array lookup b[i]
exhibits cache reuse in the second statement; it reuses values fetched into cache in the first statement.
However, as in the case of loop interchange, performing loop fusion is not always legal. There are certain conditions which must be satisfied by the two loops before they can be fused. For instance, loop fusion needs to preserve all the data dependencies that were present in the original code. Consider the following piece of FORTRAN code. Can you explain why it is not legal to fuse the two loops?
L1: do i=2,n−1 a[i] = ... b[i] = ... end do L2: do i=2,n−1 ... = a[i+1] ... = b[i−1] end do
Loop fission is another loop transformation, which performs just the opposite of loop fusion — it takes a single loop and breaks it into two. Can you think of the situations in which loop fission is beneficial to performance?
Remembering Dennis Ritchie
Last month saw the loss of the programming Guru Dennis Ritchie. Ritchie was instrumental in the development of the “C” language and the UNIX operating system. As a way of expressing our gratitude to him, here are five questions on UNIX for our readers:
- How are pipes implemented in UNIX?
- What is a daemon process in UNIX?
- What is an Inode?
- Was the first version of unix written by Ritchie single threaded or multi-threaded?
- Can you write a shell command to find all files in your file systems mounted on your machine whose setuid bit is set?
Takeaway question for November
Consider the following code snippet from a C program:
int** two_dim_array = (int*) calloc (sizeof (int*) * N); for (int j = 0; j < N; j++) two_dim_array[j] = calloc(sizeof(int) * N); ….. // read in input values for the elements of two_dim_array[i][j] for (int i = 0; i < N; i++) for (int j = 0; j < N; j++) result + = Two_dim_array[i][j];
Remember that arrays are allocated in row-major form in C applications — that is, the contiguous elements of the row are placed together. How do you rewrite the above code snippet to reduce the data cache misses incurred by this code snippet?
My must-read book for November
This month’s must-read book suggestion comes from our reader Tanvir Haveri, who has suggested the book The Art of Unix Programming by Eric Steven Raymond.
As per Tanvir, “This book discusses the philosophy behind the way Unix is designed, and the one which should be followed while we develop applications for Unix. It elaborates the strength of Unix as a development platform, and provides crucial insights on varied topics, right from optimisation to complexity. The book also discusses interesting case studies on real-world applications. It will help the reader absorb the Unix philosophy, and make Unix development a pleasure.”
Thank you, Tanvir, for your suggestion! I bought a copy yesterday, and am planning to go through it over the next few weeks.
Also, one of our readers, Sanjay Chigurupati, had written in, suggesting the website unix.org for more information on Unix specifications. Thank you, Sanjay!
Another sad loss for software folks last month was the death of Steve Jobs of Apple Inc. Steve was the role model for millions of software and hardware developers. He was not only a great designer, but he had an attitude to life that is worth reading about. One of my favourite quotes from Steve Jobs is: “Your work is going to fill a large part of your life, and the only way to be truly satisfied is to do what you believe as great work. And the only way to do great work is to love what you do. If you haven’t found it yet, keep looking. Don’t settle for anything less.”
I would like to reiterate to newbie software developers: “If writing code does not make your heart beat faster, don’t stay in this field for the wrong reasons, such as for money or peer pressure. You cannot always measure the fun you have in life in dollars. And believe me, being cool is doing the best in whatever career you choose — be it a programmer, painter or plumber.”
I would like to suggest our readers take a look at Steve’s 2005 Stanford commencement address. It is definitely worth reading!
If you have a favourite programming book/article that you think is a must-read for every programmer, please do send me a note with the book’s name, and a short write-up on why you think it is useful, so I can mention it in the column. This would help many readers who want to improve their coding skills.
If you have any favourite programming puzzles that you would like to discuss on this forum, please send them to me, along with your solutions and feedback, at sandyasm_AT_yahoo_DOT_com. Till we meet again next month, happy programming, and here’s wishing you the very best!