Sunday, August 4, 2013

Randomized Selection : Selecting the i'th smallest element

Input - array A with n distinct numbers and a number i ∈ {1,2,...,n}

Output : ith order statistic i.e. ith smallest element of the input array

O(n log n) algorithm
1) Apply mergesort/quicksort
2) Return the ith element of sorted array

So, we used reduction via taking the help of sorting algorithm. So this was the naive way as we would be to say, "Hey let's sort it and then we can just pick out the ith one!" We could use MergeSort or even better yet, QuickSort and have it done in O(nlogn). This is called a reduction of our problem: reducing selecting to sorting and then selecting.

Linear time or O(n) solution
Finding the ith smallest element may seem difficult at first, but is it possible to find some O(n) or linear time solution.

It turns out that we can actually accomplish this task by slighting changing QuickSort. Recall that after partitioning, the pivot stays in its "rightful" spot and doesn't get moved in any of recursive calls later on.   So let's say we're looking for the ith element and after the first partition, the pivot is at k.

If k=i, then we know we found the ith element and we're done! If k>i, then recursively partition the left hand side. Otherwise, recursively partition the right side. So, this can be done in 2 ways - randomized as well as deterministic. Deterministic is not that obvious. So, we will take it in the end.

Randomized selection
1) Choose a pivot. We know that the pivot property is like elements less than pivot, lie on left of it and more than it stay on right.


2) We will recurse on the right side of the array. We will

pseudocode
/**A is the input array
i - is the i'th element we have to select
arrayLength= is the length of the array we are assing.
**/
public int RSelect(int[] A, int arrayLength,int i)
{
  if(n==1) return A[1];
  p = choosePivot();
  //partition A around p
  //Let j = new index of p
  if(j==i)   return p;
  if(j>i) return RSelect(1st part of A, j-1,i);
  if(j<i) return RSelect(2nd part of A, n-j,i-j)
}

Running time
What is the running time of this selection algorithm? Well it depends on quality of chosen pivots.
Worst case
If pivots are chosen in the worst possible way every time, then it will be O(n2). This would be if we chose the smallest or largest element every time, reducing the problem by only one element.

What about the best pivot? To get the best pivot, our problem size should decrease significantly. This would be the median. So, problem size will be n/2, and then we do partitioning which takes O(n) time. Using the master method:
T(n) <= T(n/2)+O(n)
which is case 2, and we get that this will run in O(n).

Summary of random-select
Note that randomness is not in data but in code. Also, we find out that selection is moderate problem when compared to sorting, and and hence it took linear time.

Here is the deterministic algorithm for the same.

 Source

0 comments:

Post a Comment