Thursday, May 1, 2014

Deterministic Selection - Select the i th smallest element

We have already seen RSelect or Random Selection here. Now lets focus on DSelect.

Though RSelect is good, and works in linear time, but it deterministically runs in O(n) time, even in worst case. But the downside is that it doesn't runs as fast as RSelect, but its worst case is still better. Like in quicksort, we had it better than merge sort, but worst case of quicksort was worse than mergesort.

How do we get the best pivot - without using randomization, rather deterministic. So, we have to change choose pivot method.

-logically break A into n/5 group of size 5 each,
-sort each group
-copy n/5 medians into new array C
-recursively compute median of C
-return pivot

So, now the deterministic select Pseudocode is:
DSelect(array A, length n, order statistic i)
  1. Break A into groups of 5, sort each group;
  2. C = the n/5 "middle elements";
  3. p = Select(C,n/5,n/10); //Recursively computes median of C;
  4. Partition A around p;
  5. if(j==i) return p;
  6. if (j<i) return select(1st part of A, j-1,i);
  7. if (j>i) return Select(2nd part of A, n-j,i-j);

So, last 3 steps are same as RandomSelect or RSelect. In DSelect we have 2 recursive calls - one to get the pivot p(line 4), and then in last 2 lines(line 7 and 8), based on the situation.

Warning for DSelect algorithm

  • Worse constants in time complexity, though overall time complexity is O(n)
  • not in place algorithm as we need array C.
History of DSelect in linear time dates back in 1973, when 5 authors Blum - Floyd - Pratt - Rivest - Tarjan came together and found this algorithm. 4 authors except Pratt got the turing awards for different contributions.

Time complexity of DSelect
Step 1 takes O(n) time and not O(n/log n). We have break the array into group of 5. So, time complexity of sorting each group of 5 is O(5 log5) which is nearly 5*3 = 15. But we have n/5 arrays of this, so the overall step takes O(n) time.
Step 2 takes O(n) time, as for each sub array of 5 elements, it takes O(1) time.
Step 3 will be a recurrence, hence T(n/5)
Step 4 takes O(n) again, as we know partitioning takes O(n) time.
Step 5,Step6 and Step 7 depend on how good the pivot is. Recall that in quicksort and random select, we were not able to make a recurrence, as the algorithm was randomized and we didn't knew where the pivot is. But in deterministic approach we can come with worst case recurrence. It is 7n/10.(Proof of this will come letter)

T(1) = 1, T(n) = cn+T(n/5)+T(7n/10)
Here as the sub-problem size is different, we can't use master method directly. So, here we can get the time complexity using "hope and check".  We will hope that there is O(n) time, and come up with this time once using induction.



Post a Comment