Tuesday, September 13, 2011

Median of 2 sorted arrays of equal size n

Question 

There are 2 sorted arrays A and B. Write an algorithm to find the median of the array obtained after merging the above 2 arrays(i.e. array of length 2n). The complexity should be O(log(n))

Solution

Before going to solution what is the median.

What is Median?

Median: In probability theory and statistics, a median is described as the number separating the higher half of a sample, a population, or a probability distribution, from the lower half.
The median of a finite list of numbers can be found by arranging all the numbers from lowest value to highest value and picking the middle one.


For getting the median of input array { 12, 11, 15, 10, 20 }, first sort the array. We get { 10, 11, 12, 15, 20 } after sorting. Median is the middle element of the sorted array which is 12.
Input Array - {12,11,15,10,20}
Sorted array - { 10, 11, 12, 15, 20 }
Median - 12

There are different conventions to take median of an array with even number of elements, one can take the mean of the two middle values, or first middle value, or second middle value.

Let us see different methods to get the median of two sorted arrays of size n each. Since size of the set for which we are looking for median is even (2n), we are taking average of two middle two numbers in all below solutions.

Method 1 - Simply count while Merging(Brute force)
Use merge procedure of merge sort. Keep track of count while comparing elements of two arrays. If count becomes n(For 2n elements), we have reached the median. Take the average of the elements at indexes n-1 and n in the merged array. See the below implementation.

int getMedian(int arr1[], int arr2[], int n)
{
 int i = 0;  // Current index of array arr1[] 
 int j = 0;  // Current index of array arr2[] 
 int count;
 int m1 = -1, m2 = -1;

 //Since there are 2n elements, median will be average
 // of elements at index n-1 and n in the array obtained after
 // merging ar1 and ar2 
 for(count = 0; count <= n; count++)
 {
  //Below is to handle case where all elements of arr1[] are
  //  smaller than smallest(or first) element of arr2[]
  if(i == n)
  {
   m1 = m2;
   m2 = arr2[0];
   break;
  }

  //Below is to handle case where all elements of arr2[] are
  //  smaller than smallest(or first) element of arr1[]
  else if(j == n)
  {
   m1 = m2;
   m2 = arr1[0];
   break;
  }

  if(arr1[i] < arr2[j])
  {
   m1 = m2;  //Store the prev median
   m2 = arr1[i];
   i++;
  }
  else
  {
   m1 = m2;  //Store the prev median 
   m2 = arr2[j];
   j++;
  }
 }    

 return (m1 + m2)/2;
}    

// Driver program to test above function 
int main()
{
 int arr1[] = {1, 12, 15, 26, 38};
 int arr2[] = {2, 13, 17, 30, 45};

 printf("%d", getMedian(ar1, ar2, 5)) ;

 getchar();
 return 0;
}

Time Complexity - O(n)

Dry running the algorithm
A1 = { 1,12,15,26,38}
A2 = {2,13,17,30,45}

Initially i=0,j=0,m1=m2=-1

So, we reach arr1[i] < a[j] i.e. arr1[0]<arr2[0] which is true.

Method 2 - By comparing the medians of two arrays(Divide and Conquer)
This method works by first getting medians of the two sorted arrays and then comparing them.

Here is an exemple of how it works. Imagine we have these two sorted lists:
l1 = [1, 4, 5, 7, 7, 8, 9, 9] 
l2 = [0, 1, 3, 6, 6, 6, 7, 8]

We calculate the median of both lists:
l1 median = 7
l2 median = 6

l1 median is greater than l2 median so we can safely remove the numbers on the right side of the median of l1 because the final median is somewhere between the median of both. We also have to remove the same amount of numbers at the left side of l2 to keep the stability of the final list. If one of the list is shorter the amount of removed numbers will the minimum between both median index. So we get:
l1 = [1, 4, 5, 7, 7]
l2 = [6, 6, 6, 7, 8]

We continue the process until have a list with 2 numbers (or less).
l1 median = 5
l2 median = 6

We remove 2 from left side of l1, we remove 2 from right side of l2.
l1 = [5, 7, 7]
l2 = [6, 6, 6]

l1 median = 7
l2 median = 6

We remove 1 from left side of l2, we remove 1 from right side of l1.
l1 = [5, 7]
l2 = [6, 6]

Remember that we have 2 list of the same size. In another case one could be an order of magnitude bigger than the other. Finaly we have to deal with some special case:

  1. The 2 remaining numbers have to be inserted on the left side of the median of the other list.
  2. The 2 remaining numbers have to be inserted on the right side of the median of the other list.
  3. One have to be inserted on the left side of the median and the other on the right side.

For each case we have to check if the number could change the median of the resulting list. If we decide to insert l1 into l2, there will be no implication on the initial median value of l2. But if we decide to insert l2 into l1, the median of l1 will be completly changed. In both case we optain this list [5, 6, 6, 7] and calcuate the median of it : 6.
Inserting the two remaining value into the other list is not the fastest thing possible but it's simple and understandable.

Algorithm (Divide and Conquer):

1) Calculate the medians m1 and m2 of the input arrays ar1[]
   and ar2[] respectively.
2) If m1 and m2 both are equal then we are done.
     return m1 (or m2)
3) If m1 is greater than m2, then median is present in one
   of the below two subarrays.
    a)  From first element of ar1 to m1 (ar1[0...|_n/2_|])
    b)  From m2 to last element of ar2  (ar2[|_n/2_|...n-1])
4) If m2 is greater than m1, then median is present in one
   of the below two subarrays.
   a)  From m1 to last element of ar1  (ar1[|_n/2_|...n-1])
   b)  From first element of ar2 to m2 (ar2[0...|_n/2_|])
5) Repeat the above process until size of both the subarrays
   becomes 2.
6) If size of the two arrays is 2 then use below formula to get
  the median.
    Median = (max(ar1[0], ar2[0]) + min(ar1[1], ar2[1]))/2

To sum up, actual median M will lie between m1 and m2. If m1 & gt; m2 => M lies between m2 to m1, so all elements less than m1 but more than m2, hence first half of ar1 and 2nd half of ar2. Likewise for m2>m1.

Time Complexity = O(lg n)

Here is the code:
// This function returns median of ar1[] and ar2[].
// This function returns median of ar1[] and ar2[].
//   Assumptions in this function:
//   Both ar1[] and ar2[] are sorted arrays
 //Both have n elements 
int getMedian(int ar1[], int ar2[], int n)
{
  int m1; // For median of ar1 
  int m2; // For median of ar2 
 
  // return -1  for invalid input 
  if(n <= 0)
    return -1;
 
  if(n == 1)
    return (ar1[0] + ar2[0])/2;
 
  if (n == 2)
    return (max(ar1[0], ar2[0]) + min(ar1[1], ar2[1])) / 2;
 
  m1 = median(ar1, n); // get the median of the first array 
  m2 = median(ar2, n); // get the median of the second array 
 
  // If medians are equal then return either m1 or m2 
  if(m1 == m2)
    return m1;
 
  // if m1 < m2 then median must exist in ar1[m1....] and ar2[....m2] 
  if (m1 < m2)
    return getMedian(ar1 + n/2, ar2, n - n/2);
 
  // if m1 > m2 then median must exist in ar1[....m1] and ar2[m2...] 
  return getMedian(ar2 + n/2, ar1, n - n/2);
}
 
// Driver program to test above function 
int main()
{
  int ar1[] = {1, 12, 15, 26, 38};
  int ar2[] = {2, 13, 17, 30, 45};
  printf("%d", getMedian(ar1, ar2, 5)) ;
 
  getchar();
  return 0;
}
 
// Utility functions 
int max(int x, int y)
{
    return x > y? x : y;
}
 
int min(int x, int y)
{
    return x > y? y : x;
}
 
// Function to get median of a single array 
int median(int arr[], int n)
{
  if(n%2 == 0)
    return (arr[n/2] + arr[n/2-1])/2;
  else
    return arr[n/2];
}

References 

2 comments:

  1. This comment has been removed by a blog administrator.

    ReplyDelete