CodeSport (January 2009)

Thanks to all the readers who sent in their comments about the problems we discussed in the previous issue. Last month’s takeaway question was the birthday problem. As you enter a room with N people, what is the probability that there is someone in the room whose birthday is on the same day as yours? You can assume that there are no leap years and all years have only 365 days. Also assume that it’s equally likely that the birthday is on any day of the year. None of the solutions I received from readers this month had the correct solution. Therefore, we will keep this problem open this month too. Here are some directions to help the reader arrive at the solution.

Let us number the N people in the room as 1,2,3… up to N. We will use the loop index i to iterate over all people in the room and hence i can take values from 1,2,3… up to N. Let ‘bi’ be the day of the year in which the birthday of the person ‘i’ falls. ‘bi’ can take values from 1,2,3,… up to 365. If we assume that a person’s birthday is equally likely to be any day of the year, the probability that a person’s birthday falls on day ‘D’ is given by 1/365. In mathematical terms, we can say that probability of ‘bi’ being equal to ‘D’ is 1/365, for any ‘i’ from 1 to N and for any day from 1 to 365.

We can also assume that the birthdays of two people are independent events, in the sense that selecting ‘D’ as the first person’s birthday does not affect the outcome in the selection of the birthday of the second person. Since the events are independent, the probability that two people ‘i’ and ‘j’ have the same birthday can be calculated by computing these two independent events. Since these are independent, the probability that the birthday of ‘i’ is on day ‘D’ and the birthday of ‘j’ also is on day ‘D’ is nothing but multiplication of these two individual probabilities. This can be written as: Probability of (bi = D) and (bj = D) = 1/365 * 1/365.

Hence the probability that the birthdays of ‘i’ and ‘j’ both falling on any day can be obtained by summing this probability over all possible values for the days. Hence the probability of (bi == bj) = ∑ (1/365) * (1/365) over all days from 1 to 365.

Summing up ((1/365) * (1/365)) over all days from 1 to 365, we find that the probability of (bi == bj) is nothing but 1/365. Let Psame_birthday be equal to the probability that two out of the N people have a matching birthday. We can use the probability we calculated for (bi == bj) to show that √N people need to be present in the room in order for Psame_birthday to be greater than 0.5. I leave it to the reader as an exercise. The hint to the solution is to consider the complementary event E1 of no two people in the room having the same birthday and then finding the probability of (1 – P(E1)) which gives the probability that at least two people in the room have the same birthday.

3-SUM problem—a naive algorithm

Let us get started with this month’s well-known number theoretic problem known as the 3-SUM problem. Given a set of N integers, you are asked to find out whether there are three numbers a, b and c in the set of N numbers whose sum is equal to zero. The simplest algorithm can consider each triple of the numbers at a time, and check whether any triple satisfies the criteria of a+b+c=0. The following is the pseudo-code for this:

for i = 1 to N-2
   for j = i+1 to N-1
      for k = j+1 to N
           if (A[i] + A[j] + A[k] === 0)
                return the triple (A[i], A[j], A[k]);
  if no such triple found,
       return false;

What is the time complexity of the above algorithm? We can see that since there are three for loops running over N, hence the complexity is O(N^3). Can we do better than this? Can we come up with a O(n^2) algorithm for this problem?

O(N^2) solution to the 3-SUM problem

We saw that in the above approach, we picked up triples blindly and hence ended up with O(N^3) complexity. How can we pick up triples more cleverly? Let us first sort the set of input numbers. Now we compare A[1], A[2] and A[N]. If the sum is zero, return the triple. If the sum is greater than zero, we compare A[1], A[2] and A[N-1]. And if the sum is less than zero, we compare A[1], A[3] and A[N]. Basically, since the numbers are sorted in increasing order, if the sum is above zero, we need to reduce the sum by using a smaller operand. If the sum is negative, we need to increase the value of the operand to make the sum zero. Given below is the pseudo code for this problem:

bool  is_3_sum(array A[], N)
{
     for i = 1 to N-2
    {
           j = i +1; k = N
          while (k > j)
         {
                 if (A[i] + A[j] + A[k] == 0)
                      return true;
               else if (A[i] + A[j] + A[k] > 0)
                      k = k – 1;
               else
                      j = j + 1 //case when the sum is less than zero
      }
      return false;
}

What is the complexity of the above algorithm? In the inner loop, during each iteration, we eliminate one element of the array from consideration; hence, the inner loop runs for a maximum of N iterations and the outer loop runs for a maximum of (N-2) iterations. Hence, the overall complexity of the algorithm is O(N ^ 2). Now the interesting question is to determine whether we can come up with a sub-quadratic algorithm for this problem?

Theoretical significance of the 3-SUM problem

Interestingly, all the research so far has not been able to come up with a sub-quadratic algorithm. The best measure of the complexity obtained so far has only been O(N^2). Hence, it is widely believed that there is no sub-quadratic algorithm for this problem. However, this lower bound has not been theoretically established for all general models of computation, so the problem of finding a sub-quadratic solution to the 3-SUM problem still remains an open question in the field of algorithms.

The 3-SUM problem is interesting not just because we have not been able to find a sub-quadratic algorithm, but because many of the computation geometry problems can be reduced to an instance of the 3-SUM problem. There are problems like 3-point collinearity, where, given a set of N points, we need to decide whether any of the three points are collinear. A similar problem is in finding the minimum area triangle formed by three points from a given set of N points.

It can be shown that both these problems can be reduced to a 3-SUM problem in sub-quadratic time. Hence, if a sub-quadratic solution is found for the 3-SUM problem, both 3-point collinearity and the minimum area triangle also can be solved in sub-quadratic time. However, till date, no sub-quadratic solution has been determined for any of these problems.

This month’s takeaway problem

For this month’s takeaway problem, let us consider a variant of the 3-SUM problem. You are given three sets of numbers A, B and C, each containing N numbers. Can you come up with an algorithm to determine whether there is a triple—a € A, b € B and c € C—such that a + b = c. It is quite easy to come up with a O(N^2 log N) algorithm. But you need to come up with a O(N^2) algorithm for this. Here’s a hint: use a variant of the 3-SUM algorithm to solve this problem.

If you have any favourite programming puzzles that you would like to discuss on this forum, please send them to me. Feel free to send your solutions and feedback at sandyasm_AT_yahoo_DOT_com. Till we meet again next month, happy programming!

All published articles are released under Creative Commons Attribution-NonCommercial 3.0 Unported License, unless otherwise noted.
Open Source For You is powered by WordPress, which gladly sits on top of a CentOS-based LEMP stack.

Creative Commons License.