CodeSport (February 2009)

This month’s column focuses on computational complexity and the lower bounds for algorithms. In particular, we’ll show that any algorithm to find the maximum in an array of N elements has a lower bound of O(N) by using an adversary argument.


Thanks to all the readers who sent in their comments on the problems we discussed in last month’s column, in which the takeaway question was a variant of the 3-sum problem. You were given three sets of numbers A, B and C, each containing N numbers. Readers were asked to come up with an algorithm to determine whether there was a triple a € A, b € B and c € C such that a + b = c? It is quite easy to come up with an O (N^2 log N) algorithm. But you need to come up with an O (N2) algorithm for this.

Instead of a separate algorithm for our takeaway problem, we will first look at how we can reduce the 3-sum problem to this. Given a set S of n integers, do there exist three elements {a, b, c} € S such that a + b + c = 0? Let us denote the takeaway problem as PR2 and the 3-sum problem as PR1. Since we know that the 3-sum problem has a lower bound of O (N2), PR2 also is equally hard. Hence, the best we can hope for to solve PR2 is an O (N2) algorithm. Recall that the reduction from a 3-sum problem to PR2 can be given by A=S, B=S and C=-S. Hence, a € A, b € B, c € C, and a solution of PR2 which satisfies a + b = c, will also satisfy PR1. Now, we would like to look at how we can reduce PR1 to PR2; i.e., given a problem instance of PR1, we want to formulate a problem instance of PR2 that can be reduced to PR1. In that case, we can use the solution of the 3-sum problem to solve PR2.

We create one set S such that whenever three elements in S add up to zero, there are three elements a € A, b € B and c € C, such that a+b = c. For the sake of simplicity, let us assume that all elements are positive. We define a value m, which is equal to twice the maximum element among all members of A, B and C. We now try and construct set S from the elements of A, B and C as follows: For each element a € A, we add an element a` = a + m to S. For each element b € B, we add an element b` = b to S. And for each element c € C, we add an element c` = (-c – m) to S. We can see that if (a+b+c) is equal to zero, then (a` + b` + c`) is also equal to zero. In order to construct a solution of PR2 from PR1, we need to show that whenever there are three elements in S that add up to zero, then the three elements came from each of the three different sets A, B and C.

Based on the way we have created a matching element in S for each element in A, B and C, we can see that a` lies between 0.5m to 1.5m; b` lies between 0 to 0.5m; and c` lies between -1.5m to -0.5m. Now consider three elements x, y, z in S such that x + y + z = 0. Now, at the most, one of the elements in (x, y, z) could have been constructed from an element in A since, otherwise, the sum would be at least 2m – 1.5m = 0.5m. Similarly, only one element could have been constructed from C, since otherwise the sum would be -2m + 1.5m = -0.5m. Also note that, out of (x, y, z), one element must come from C, since elements constructed from A and B are always positive and we need at least one negative element to make the sum zero. This negative element can only come from C. Now, if we consider the possibility that one element in (x,y,z) is constructed from C and the other two are constructed from B, then again the sum of the two positive elements is at most m, whereas the element constructed from C is smaller than m, so the sum cannot become zero. This leaves us with the only possibility that each of the elements in (x,y,z) was constructed from a different set. Hence, we have representatives constructed from all three sets A, B and C in (x,y,z). Therefore, we have shown that if we can solve PR1 on the set S that we have constructed, we have a solution for PR2 also.

Now we can show a O(N2) algorithm for solving PR2 as follows: we sort the sets B and C. Consider an element a € A. Now consider the set B + a (the set of all numbers in B such that each number has a added to it). The set (B + a) can be computed in O(N) time. We now traverse the sets (B + a) and C simultaneously to check if there is a common element between (B + a) and C. This can be done in O(N) time since these sets are sorted.

If we find a common element, we have found the triplet (a, b, c) such that a + b = c. Else we repeat the above procedure with the next element in A. Since we may end up repeating the above procedure for each of the elements in A, we end up with an overall complexity of O(N2) for our solution.
Now let us turn our attention to this month’s topic of computational complexity and lower bounds for algorithms.

Computational complexity

Let us start with a simple children’s game. You and your son are playing a number game. You tell your son that he can think of any positive number between zero and one million. You will ask him 20 questions to which he has to answer either “Yes” or “No” truthfully. You challenge your son that at the end of the 20 questions, you will be able to tell your son what number he had originally thought of. Your son is amazed at your ability to read his mind. Now how do you manage to answer correctly? The answer, of course, lies in the divide and conquer approach. With each question, you reduce the number range where the mystery number can possibly lie. For instance, your first question to your son is: “Is the number between 1 and 5,00,000?” An answer of “Yes” reduces your working range from 1 to 5,00,000, whereas an answer of “No” will reduce the answer range to 5,00,001 to one million. You know that you can always find the answer within your allotted 20 questions because one million is less than 2 ^ 20.

Okay, now that you have enthralled your son with your abilities, he asks you a question: Do you need 20 questions to arrive at the correct answer or can you do it with fewer questions? Now you are stumped. You know that the binary search algorithm that you have used has an upper bound of O (logN), and therefore you knew that 20 questions are sufficient to solve the problem. But your son’s question to you now is whether 20 questions are necessary or can you do this with a fewer number of questions no matter what number he thinks up. This is essentially the realm of the computational complexity of problems. For this problem of finding a number in a given range, we can show that the computational complexity has a lower bound of O (logN). That is, no matter what the algorithm you or anyone can else come up with, it will have a lower bound on the complexity as O(log N). If you had known this fact, you can tell your son that 20 questions are, in fact, necessary, for any arbitrary number he can think of. If you decide to challenge him to a context where only 19 questions are allowed, he can always win. I leave the detailed proof of this statement to interested readers since it would be fun to try this game with a friend/family member a few times first before trying to prove the statement rigorously.

Lower bounds of some well-known computational problems

Complexity analysis tries to prove that for a given problem, any algorithm capable of solving the problem correctly on all of its instances, must necessarily take a time that is greater than or equal to the established lower bounds for that problem. Basically, computational complexity theory establishes that there cannot exist an algorithm that can solve the problem in a time lower than the established lower bound, under the specified model of computation. For instance, it is widely known that any sorting algorithm based on comparison for sorting an array of N numbers has a lower bound of O(NlogN). This means that there can exist no sorting algorithm based on the comparison of elements, which can sort faster than this.

There are a number of ways of establishing the lower bounds for any problem. Two of the well-known methods are the decision tree method and adversary argument. The decision tree method is well known and the interested reader can refer to any algorithm book for details on this. Let us take a look at the adversary argument for establishing lower bounds. Let us consider the well-known and simple problem of finding the maximum, given an array of N numbers.

Given below is the code snippet to find the maximum of a given set of N numbers, using only comparison between the numbers:

int find_maximum(int array[], int N)
{
         int max = A[0];
          for (int i = 0; i < N; i++)
          {
                 if (a[i] > max)
                      max =  a[i];
            }
            return max;
 }

It is evident that the algorithm given above uses exactly (N-1) comparisons to arrive at the answer, since the for loop runs for N-1 iterations. Hence the algorithmic complexity is O(N). The interesting question for us is whether it is possible to find the maximum by doing fewer than (N-1) comparisons. We can show the lower bound by using an adversary argument.

The idea behind an adversary argument is that we start off with an unspecified input and whenever the algorithm probes the input, the adversary gives an answer such that the algorithm is forced to work hard. The adversary must be consistent in its answers in the sense that there must exist at least one set of valid inputs on which the algorithm will exactly see the adversary answers matching with that input. The adversary’s goal is to keep the algorithm uncertain of the answer for as long as possible. If the algorithm declares that it has found the answer without proceeding through the established lower bound number of probes, the adversary can always come up with a sequence of inputs that will invalidate the algorithm’s answer.

For instance, consider a trivial example of a sequence of four numbers, namely {1, 2, 3, X} and the algorithm needs to find the maximum. Assume that the algorithm performed two comparisons only, namely, between 1 and 2, and 2 and 3. It skipped the comparison of the current maximum 3 with the last element X in the array and declared the answer as 3. Since the algorithm has not probed A[3], the adversary can fill it up with any value for X, thereby defeating the algorithm. No matter how the algorithm tries, if it does not probe one of the elements of the array, the adversary can fill up that element with a value that will defeat the answer provided by the algorithm. Thus an adversary argument shows that any algorithm for finding the maximum in an array of elements has a lower bound of O(N) comparisons for arriving at the correct answer.

For this month’s takeaway problem, consider the well-known problem of deciding whether a given graph with N nodes is connected. The only question the algorithm can ask the adversary is of the form, “Does an edge exist between vertex u and vertex v?” What is the best lower bound you can establish for this algorithm using an adversary argument?

If you have any favourite programming puzzles that you would like to discuss on this forum, please send them to me, along with your feedback, to sandyasm_AT_yahoo_DOT_com. Till we meet again next month, happy programming!

All published articles are released under Creative Commons Attribution-NonCommercial 3.0 Unported License, unless otherwise noted.
Open Source For You is powered by WordPress, which gladly sits on top of a CentOS-based LEMP stack.

Creative Commons License.