Order Statistics

Upload: juniper-webster

Post on 01-Jan-2016

Order statistics

• Given an input of n values and an integer i, we wish to find the i'th smallest value, called the i'th order statistic.

• Assuming distinct values, there are exactly i-1 elements smaller than the i'th order statistic.

• The minimum element is the 1st order statistic.

Order statistics

• The maximum element is the n’th order statistic.

• Finding the i'th order statistic when the values are already sorted is trivial: O(1) using direct access. But sorting the elements in advance with a comparison sort takes Θ(n log n) time.

Selection

• Our goal is to find order statistics without sorting the elements, in the hope of improving the running time.

• If we want to find the minimum or maximum element, a linear search will do. However, this idea cannot be easily extended to an arbitrary order statistic.

Tournaments

• In a basketball tournament involving n teams, we form a complete binary tree with n leaves. Each internal node represents an elimination game.

• Each level has half as many nodes as the level below it, so n-1 games are played in total. Assuming the better team always wins its game, the best team wins all its games and can be found as the winner of the last game.

Tournaments

• Tournaments can be used for finding the minimum or maximum. But can they be enhanced to select any order statistic?

• The tournament algorithm:

– Can be run in parallel.

– Is fair (every team reaches each round after the same number of games).

Tournaments

• To select the second-best team in the tournament, we only need to compare the log n teams that lost directly to the best team.

• We can compare these teams recursively using another tournament.

• The number of games is therefore about n + log n.
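The idea can be sketched in Python as follows. This is an illustrative sketch, not the slides' own code: the function name `second_smallest` is hypothetical, the "best" team is taken to be the smallest value, and the losers to the winner are scanned with `min` rather than with a second tournament (the number of comparisons is the same order).

```python
def second_smallest(values):
    """Return the second-smallest value using a knockout tournament."""
    if len(values) < 2:
        raise ValueError("need at least two values")
    # Each contestant carries the list of values it has beaten directly.
    contestants = [(v, []) for v in values]
    while len(contestants) > 1:
        next_round = []
        for i in range(0, len(contestants) - 1, 2):
            (x, x_beaten), (y, y_beaten) = contestants[i], contestants[i + 1]
            if x <= y:
                x_beaten.append(y)
                next_round.append((x, x_beaten))
            else:
                y_beaten.append(x)
                next_round.append((y, y_beaten))
        if len(contestants) % 2 == 1:
            # With an odd number of contestants the last one gets a bye.
            next_round.append(contestants[-1])
        contestants = next_round
    _, beaten_by_winner = contestants[0]
    # The runner-up must have lost directly to the winner,
    # and the winner played only about log n games.
    return min(beaten_by_winner)
```

For example, `second_smallest([7, 3, 9, 1, 4, 8])` plays three rounds and then picks the smallest of the three values that lost to 1.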

HeapSelect

• The tournament algorithm is like a binary heap, and finding the second minimum is like removing the minimal element from a binary heap. For any other k we use:

• int heapSelect(int[] values, int k) {
    Heap heap = buildHeap(values);   // O(n)
    for (int i = 1; i < k; i++)
        heap.removeMin();            // O(log n) each, k-1 times
    return heap.minElement();        // the k'th smallest value

}
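A runnable version of this idea, as a minimal sketch using Python's standard `heapq` module in place of the slides' `Heap` class (the function name `heap_select` and the 1-based `k` are assumptions made here):

```python
import heapq


def heap_select(values, k):
    """Return the k'th smallest value (k is 1-based) using a binary min-heap."""
    if not 1 <= k <= len(values):
        raise ValueError("k must be between 1 and len(values)")
    heap = list(values)      # copy so the caller's list is not modified
    heapq.heapify(heap)      # builds the heap in O(n)
    for _ in range(k - 1):   # remove the k-1 smallest values, O(log n) each
        heapq.heappop(heap)
    return heap[0]           # the minimum of what remains
```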

Heap Select

• The running time is O(n + k log n), which is linear for any k = O(n / log n).

• But this algorithm is not linear for finding the median element, which is of common interest.

Quick Select

• We could use quicksort to first sort the elements and then select the k'th element according to its location in the sorted values.

• int quickSelect(int[] values, int k) {
    quickSort(values);
    return values[k - 1];   // k is 1-based

}

Quick Select

• With the quicksort step written inline, the algorithm looks like this.

• quickSelect(int[] values, int k) {
    pick x in values
    partition values into L1<x, L2=x, L3>x
    quicksort(L1)
    quicksort(L3)
    concatenate L1, L2, L3
    return kth element in concatenation

}

Quick Select

• But if k is at most the length of L1, we will always return some object in L1, and it doesn't matter whether we call quicksort on L3.

• Similarly, if k is greater than the combined lengths of L1 and L2, we will always return some object in L3, and it doesn't matter whether we call quicksort on L1.

• In either case, we can save some time by only making one of the two recursive calls. If we find that the element to be returned is in L2, we can just immediately return x without making either recursive call.

Quick Select

• quickSelect(int[] values, int k) {
    pick x in values
    partition values into L1<x, L2=x, L3>x
    if (k <= length(L1)) {
        quicksort(L1)
        return kth element in L1
    } else if (k > length(L1)+length(L2)) {
        quicksort(L3)
        return (k-length(L1)-length(L2))th element in L3
    } else return x
}

Recursive final version

• quickSelect(int[] values, int k) {

pick x in values

partition values into L1<x, L2=x, L3>x

if (k <= length(L1)) {

return quickSelect(L1,k)

} else if (k > length(L1)+length(L2)) {

return quickSelect(L3, k - length(L1) - length(L2))

} else return x

}
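The recursive version translates almost line by line into Python. In this sketch the pivot is chosen uniformly at random, which is an assumption (the pseudocode only says "pick x in values"), and `k` is 1-based; the name `quick_select` is illustrative.

```python
import random


def quick_select(values, k):
    """Return the k'th smallest value (k is 1-based) without fully sorting."""
    if not 1 <= k <= len(values):
        raise ValueError("k must be between 1 and len(values)")
    x = random.choice(values)            # assumed pivot rule: uniform random
    l1 = [v for v in values if v < x]    # elements smaller than the pivot
    l2 = [v for v in values if v == x]   # copies of the pivot
    l3 = [v for v in values if v > x]    # elements larger than the pivot
    if k <= len(l1):
        return quick_select(l1, k)
    if k > len(l1) + len(l2):
        # Skip the len(l1) + len(l2) elements that precede L3.
        return quick_select(l3, k - len(l1) - len(l2))
    return x                             # the k'th element is the pivot itself
```

Note that the rank passed to the recursive call on L3 subtracts both lengths; adding `length(L2)` instead would ask for a rank that is too large.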

Time analysis

• If the partition always splits the values into two equal subarrays: $T(n) = T(n/2) + O(n) = O(n)$

• Worst case, when the partition always makes a bad split: $T(n) = T(n-1) + O(n) = O(n^2)$

• Average case - ?
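Unrolling the balanced and worst-case recurrences shows where the two bounds come from. In this sketch the O(n) partition cost is written as $an$ for a constant $a$ (a notation assumed here), and the constant cost of the base case is ignored.

```latex
\begin{align*}
T(n) &\le T(n/2) + an \le an + \frac{an}{2} + \frac{an}{4} + \dots \le 2an = O(n) \\
T(n) &\le T(n-1) + an \le a\,\bigl(n + (n-1) + \dots + 1\bigr) = a\,\frac{n(n+1)}{2} = O(n^2)
\end{align*}
```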

Worst case O(n) algorithm

• Divide the input elements into groups of 5 elements each

• Find the median of each group

• Use select recursively to find the median of medians

• Partition the input using the median of medians as the pivot element

• Recurse, as in quickSelect, only into the part that contains the i'th element
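The steps above can be sketched in Python as follows. This is an illustration under stated assumptions, not a tuned implementation: inputs of at most 5 elements are sorted directly, a short last group is allowed, the lower median is used for groups of even size, and the name `select` with a 1-based `k` is chosen here.

```python
def select(values, k):
    """Return the k'th smallest value (k is 1-based) in worst-case linear time."""
    if not 1 <= k <= len(values):
        raise ValueError("k must be between 1 and len(values)")
    if len(values) <= 5:
        return sorted(values)[k - 1]     # small input: sort directly
    # Divide the input into groups of 5 (the last group may be shorter).
    groups = [values[i:i + 5] for i in range(0, len(values), 5)]
    # Find the median of each group.
    medians = [sorted(g)[(len(g) - 1) // 2] for g in groups]
    # Use select recursively to find the median of medians.
    x = select(medians, (len(medians) + 1) // 2)
    # Partition the input around the median of medians.
    l1 = [v for v in values if v < x]
    l2 = [v for v in values if v == x]
    l3 = [v for v in values if v > x]
    # Recurse only into the part that contains the k'th element.
    if k <= len(l1):
        return select(l1, k)
    if k > len(l1) + len(l2):
        return select(l3, k - len(l1) - len(l2))
    return x
```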

Time analysis

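• The number of elements greater than x (the median of medians) is at least

$$3\left(\left\lceil \frac{1}{2}\left\lceil \frac{n}{5} \right\rceil \right\rceil - 2\right) \ge \frac{3n}{10} - 6$$

$$T(n) \le \begin{cases} \Theta(1) & \text{if } n \le 80 \\ T(\lceil n/5 \rceil) + T(7n/10 + 6) + O(n) & \text{if } n > 80 \end{cases}$$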

Time analysis

•

$$\begin{aligned}
T(n) &\le c\lceil n/5 \rceil + c(7n/10 + 6) + O(n) \\
&\le cn/5 + c + 7cn/10 + 6c + O(n) \\
&= 9cn/10 + 7c + O(n) \\
&\le cn
\end{aligned}$$
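The final step, bounding $9cn/10 + 7c + O(n)$ by $cn$, holds only when c is large enough. A short worked check, writing the O(n) term as $an$ for a constant $a$ (a notation assumed here) and taking n ≥ 80:

```latex
\begin{align*}
\frac{9cn}{10} + 7c + an \le cn
  &\iff \frac{cn}{10} - 7c \ge an \\
  &\iff c\,(n - 70) \ge 10an \\
  &\iff c \ge 10a\,\frac{n}{n - 70} \qquad (n > 70)
\end{align*}
% n/(n-70) is decreasing in n, so for n >= 80 it is at most 80/10 = 8;
% therefore any c >= 80a satisfies the inequality for every n >= 80.
```

This is why the recurrence needs a constant-size cutoff above 70: below it the factor $n/(n-70)$ is not bounded by a constant.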

Exercise

• Given an array of n elements, describe an algorithm that efficiently determines whether any number in the array appears more than n/3 times.

An inefficient solution

• Sort the array. Then check for a sequence of size greater than n/3.

• $\Theta(n \log n) + O(n)$

An efficient solution

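• The only elements that can appear more than n/3 times are the order statistics n/3 and 2n/3 (rounded up): in sorted order, a run of more than n/3 equal values must cover one of these two positions.

• Find both of these elements using the select algorithm. Count the instances of each of these elements in the array.

• $\Theta(n)$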

Example

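• n = 12, n/3 = 4, 2n/3 = 8

Input: 10 1 3 3 4 4 2 4 9 4 4 10

Sorted: 1 2 3 3 4 4 4 4 4 9 10 10

The 4th and 8th order statistics are 3 and 4. The value 3 appears twice and 4 appears five times, so only 4 appears more than n/3 = 4 times.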
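The exercise can be sketched in Python as follows. The helper `_select` is a randomized quickselect standing in for the select algorithm (expected linear time; a worst-case linear select could be substituted), the ranks are rounded up, and the function name `frequent_elements` is hypothetical.

```python
import random


def _select(values, k):
    """k'th smallest value (1-based); randomized quickselect stand-in."""
    x = random.choice(values)
    l1 = [v for v in values if v < x]
    l2 = [v for v in values if v == x]
    l3 = [v for v in values if v > x]
    if k <= len(l1):
        return _select(l1, k)
    if k > len(l1) + len(l2):
        return _select(l3, k - len(l1) - len(l2))
    return x


def frequent_elements(values):
    """Return the values that appear more than n/3 times, in sorted order."""
    n = len(values)
    if n == 0:
        return []
    # Candidate ranks: ceil(n/3) and ceil(2n/3).
    ranks = {(n + 2) // 3, (2 * n + 2) // 3}
    candidates = {_select(values, r) for r in ranks}
    # Count each candidate with one linear pass; 3 * count > n avoids fractions.
    return sorted(c for c in candidates if 3 * values.count(c) > n)
```

On the example array this checks the 4th and 8th order statistics (3 and 4) and keeps only 4.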

Exercise

• Given two sorted arrays A and B of size n each, find the median of the 2n elements of the union of A and B.

• The median of an array of even size is the average of the two elements in the middle of the sorted collection.

An inefficient solution

• Using merge, we unify both arrays into a single array, and return the median of the new array.

• $\Theta(n)$

An efficient solution

• Let a be the median of A and let b be the median of B.

• Assuming $a \le b$: recursively call the algorithm with the upper half of A and the lower half of B. (The case $a > b$ is symmetric.)

• Base case: if |A| = 1 and |B| = 1, return (a+b)/2
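A possible Python sketch of this halving scheme is shown below. Several details are assumptions added here to make the outline concrete and are not stated on the slides: the halves are tracked with index ranges instead of copies, the same number of elements is dropped from each array so that both keep equal size, the middle elements are kept rather than dropped, and an extra base case handles arrays of size 2. The name `median_of_union` is illustrative.

```python
def _median(x, lo, hi):
    """Median of the sorted slice x[lo:hi] (hi exclusive)."""
    n = hi - lo
    mid = lo + n // 2
    return x[mid] if n % 2 else (x[mid - 1] + x[mid]) / 2


def median_of_union(a, b):
    """Median of the union of two sorted lists of equal, non-zero length."""
    if len(a) != len(b) or not a:
        raise ValueError("lists must be non-empty and of equal length")
    alo, ahi = 0, len(a)
    blo, bhi = 0, len(b)
    while True:
        n = ahi - alo
        if n == 1:
            return (a[alo] + b[blo]) / 2
        if n == 2:
            # Assumed extra base case: the middle two of the four elements.
            return (max(a[alo], b[blo]) + min(a[alo + 1], b[blo + 1])) / 2
        ma, mb = _median(a, alo, ahi), _median(b, blo, bhi)
        if ma == mb:
            return ma
        drop = (n - 1) // 2        # same count from both arrays
        if ma < mb:
            alo += drop            # keep the upper part of A
            bhi -= drop            # keep the lower part of B
        else:
            ahi -= drop            # keep the lower part of A
            blo += drop            # keep the upper part of B
```

Each iteration does constant work and roughly halves the range, which is where the logarithmic running time comes from.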

Proof

• The median c of the union of A and B satisfies $a \le c \le b$:

• A has exactly n/2 elements smaller than a

• B has at most n/2 elements smaller than a

• In the union there are at most n elements smaller than a

• In the union there are at most n elements greater than b

Proof

• The median of the merge of the upper half of A and the lower half of B is the same as the median of A union B.

[Figure: the sorted union with $a \le c \le b$ marked; labels n, n and <n, <n]

Proof

• Since we removed exactly n elements, and these are the n/2 smallest and the n/2 largest in the union, the median stays in place.

• Time analysis: $T(n) = T(n/2) + \Theta(1) = \Theta(\log n)$