Welcome to CodeSport! In this column, we provide the solutions to a few of the questions we had featured last month.

Last month’s column featured solutions to a medley of questions on computer architecture, operating systems and algorithms. Thanks to our readers M M Sreejith and L Narasimhan for their solutions and inputs.

Some readers had expressed doubts regarding Question 4, since the code snippet did not cause any signal on their system, and had requested me to explain the solution in detail. I had warned you that it was a trick question, intended to deceive the reader into thinking that the code ought to cause a SIGBUS — when, in actual fact, the code is bug-free, and will not cause any signal. Consider the relevant part of the code snippet:

int* array = (int*) malloc(10); array++; *array = 120000;

The variable array points to an array of integers, and is incremented by the operation array++. Consider the general case array = array + x, where x is an integral value (x is 1, in our example). When a pointer variable is added with an integral value, the integral value (namely, 1) is first converted to an address offset, by multiplying it with the size of the base object to which the pointer points.

The base object being an integer in this case, the size is sizeof(integer) — four bytes. Therefore, the pointer variable is incremented by four bytes when the statement array++ executes, and so points to the next integer in the array. Hence, dereferencing this pointer as an integer does not cause any SIGBUS signal, since the address is obviously word aligned.

So when does a SIGBUS signal get generated? Here is a contrived code snippet to illustrate this:

int main(void) { int array[1000]; int* p = (int*)((char*)(array) + 1); *p = 10; printf("%d\n", *p); }

Now, if you compile and execute this code snippet, you would get a SIGBUS. The reason is that here the variable array, which points to the first element of the array, has been cast to a pointer to a char. When the pointer arithmetic is now performed on this pointer variable, the pointer arithmetic treats the variable as pointing to a char object, and therefore increments the address offset by 1 * sizeof (char) (which is equal to 1).

For example, if the variable array contained the address 1000, it now becomes 1001. Therefore, p is now assigned to an address which is not word-aligned, and dereferencing it will result in a SIGBUS.

## This month’s questions

This month, we discuss a couple of graph-related questions. First, we discuss what a bipartite graph is, and the problem of identifying it. Next, we discuss the graph clustering problem.

### Bipartite graphs

A bipartite graph is one in which the vertices can be partitioned into two sets, X and Y, such that all edges of the graph are from vertices in X to vertices in Y. If we are asked to colour such a bipartite graph, we will be able to colour the graph with just two colours — with one colour for all vertices in partition X, and one colour for all vertices in partition Y.

Now, how do we find out whether a given graph is bipartite or not? Before we answer that, let us look at a simpler question — what are the kinds of graphs that can not be bipartite? Also, let’s see if we can identify some common properties that will allow us to decide whether a graph is bipartite or not.

Consider a simple line with two vertices. It is bipartite, since each vertex is in a different partition, and hence can be coloured with two colours. Next, consider a triangle with three vertices, A, B and C. If we colour A with one colour, and B with another, we do not have a colour left for the third vertex, since it is connected to both A and B by an edge.

Hence, a triangle cannot be coloured with two colours, and therefore is not bipartite. A rectangle with four vertices, A, B, C and D, is colourable with two colours. A pentagon with five vertices can not be coloured with two colours.

As we go on increasing the number of vertices, what is the property that results in certain graphs not being two-colourable? Any graph that contains an odd cycle is not two-colourable, since an odd cycle is not two-colourable. Therefore, any graph that contains an odd cycle is not bipartite.

Hence, can there be graphs that do not contain an odd cycle, and are still not two-colourable? We leave it to you to prove that if a graph does not contain an odd cycle, it is bipartite. We now focus our attention on coming up with an algorithm to identify the presence of an odd cycle in a given graph.

To make it simpler, let’s assume that the given graph is connected and that we have chosen an arbitrary starting vertex, s. Also, let’s assume that we have two colours, black and white. Colour the source vertex s, white. Then colour any immediate neighbours of s, black. After that, process each neighbour of s, one vertex at a time, colouring each of their neighbours with white again. Repeat this until the whole graph is coloured.

If the graph is two-colourable, all edges will have each of their vertices in different colours. Hence, the graph is bipartite. If an edge has both its vertices in the same colour (i.e., it is not colourable with two colours), it is not bipartite.

The algorithm described above is nothing but the breadth-first search (BFS) procedure. The only modification is to add a colouring step to the BFS. We colour all nodes that are at an odd number of edges from the root vertex with black, and all nodes that are at an even number of edges from the root, with white. Therefore, deciding whether a graph is bipartite or not can be done with a slight modification to the breadth-first search algorithm. I leave it to the reader to come up with the code.

## This month’s takeaway question

This month’s takeaway question is about a well-known problem — the “clustering of objects”. We are very often given a set of N objects, and have to partition them into k different groups or clusters, based on some property. For instance, we may be given a set of N images, and told to cluster all similar images together, such that we have k clusters of images. All members of the same cluster have similar properties; members of two different clusters should be as dissimilar to each other as possible.

Typically, the similarity/dissimilarity is defined by a distance function between the objects. In graph notation, the objects are represented as points/vertices. We define the spacing to be the minimum distance between any pair of points present in different clusters. Since we want to group similar objects into the same cluster, and dissimilar objects in different clusters, our objective is to maximise the spacing, i.e., maximise the value of d, where d is the minimum distance between any pair of points (x, y), where x X and y Y; and X and Y are two different clusters.

How would we do this?

The naïve approach is to list all possible k-clusterings, and check which maximises the spacing. However, this approach is exponential in complexity, and hence impractical. The question for this month is to come up with a polynomial-time algorithm to perform the k-clustering. Though at first glance the problem seems difficult, I will give you a clue: recall our discussion of Minimum Spanning Trees. Can you apply MST algorithms and come up with a solution to the problem?

## My ‘must read book’ for this month

This month’s suggestion comes from our reader R Srikanth. He has recommended *C Programming Language* by Kernighan and Ritchie. Srikanth says, “There are books on C which contain 500 pages and above, but teach you nothing. But here is a book that is just around 200 pages, yet covers all the concepts of C. More than that, this book teaches you how to write simple but efficient programs.” I completely agree with Srikanth. This book is a must-read for every programmer.

Also, do send me your answers to this month’s questions. If you have any favourite programming puzzles that you would like to discuss on this forum, please send them to me, along with your solutions and feedback, at sandyasm_AT_yahoo_DOT_com. Till we meet again next month, happy programming and wish you the very best!

Feature image courtesy: Tom Magliery. Reused under the terms of CC-BY-NC-SA 2.0 License.