Internal Sorting 1
S. Thiel¹
¹Department of Computer Science & Software Engineering, Concordia University
May 11, 2017
Outline
Sorting
Linear Sorts
nlogn Sorts
Quicksort Analysis
References
Sorting
I Our example: “a series of numbered tennis balls”
I Sorted or unsorted?
I Ways and means by which we sort
I Properties to let us choose when to use which
Sorting
I We mostly see comparison sorts
I “Compare” elements in turn
I We determine the desired order
I smallest to biggest is a good default
I when sorted, elements to the left are ≤ elements to the right
Sorting Terms
I Stable
I In-place or in-situ
I swap
I diversion
I equality (duplication)
Sorting Three Elements
I What’s the best case?
I What’s the worst case?
I What’s the average case?
I Distinct? What if duplicates allowed?
Sorting Three Elements
Figure: Ways to sort three elements.
Linear Sorts
I Not referencing Θ(n)
I In fact, generally Θ(n²)
The Exchange Sorts
I These are often called exchange sorts or linear sorts
I technically, insertion sort isn’t an exchange
I linear sort is not about analysis
I linear sort is about the flow through the list
I . . . technically insertion sort cheats there too
I They are Θ(n²) sorts in the average case
The Sorts
I Bubble Sort
  I Knuth identifies one redeeming use with obscure technology
  I when parallelized, it looks like parallelized sifting
I Selection Sort
I Sifting
  I Book calls this Insertion Sort
I Insertion Sort
Bubble Sort
I This one is bad, but amusing
I Bubbles up the list
I Always has to go up till the end
I Insertion/Sifting does not
Bubble Sort Workings
1. The “end” position is one past the last element
2. The first element is the “biggest”
3. If the next element is the “end” position, stop.
4. Point to the next element
5. Compare “biggest” with newly pointed at
5.1 If the newer item is bigger, it is now the biggest
5.2 If the newer item is smaller, swap with the biggest item
6. Check if the next element is the “end” position
6.1 If it is the “end”, the “end” position moves left
6.2 If it is not the end, Go to 4
7. Go to 2
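The workings above can be sketched in code. This is a minimal illustration (names and the ascending order are assumptions), following the pass structure described on the slide:

```python
def bubble_sort(a):
    """Bubble Sort sketch: each pass carries the current biggest
    element up to the shrinking "end" position."""
    end = len(a)                          # one past the last unsorted element
    while end > 1:
        for i in range(1, end):
            if a[i - 1] > a[i]:           # newer item is smaller: swap
                a[i - 1], a[i] = a[i], a[i - 1]
        end -= 1                          # one more item is now in place
    return a
```

Swapping only on strict `>` is what keeps the sort stable, matching the “(if we do ≥?)” caveat on the properties slide.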
Bubble Sort Analysis
I We know after each pass, one more item is in order
I we compare n-1 times, then n-2 times, etc.
I In the best case, we do no swaps, same number of compares
I In the worst case do we swap after every compare?
I What does the average case imply?
Bubble Sort Properties
I Stable (if we do ≥?)
I In-place
Selection Sort
I Like Bubble Sort in comparisons
I But fewer swaps
I We only swap once for each pass
Selection Sort Workings
I Maybe make me draw this on the board too. . .
1. The “end” position is one past the last element
2. The first element is the “biggest”
3. If the next element is the “end” position, stop.
4. Point to the next element
5. Compare “biggest” with newly pointed at
5.1 If the newer item is bigger, it is now the biggest,
6. Check if the next element is the “end” position
6.1 If it is the “end”
6.1.1 swap biggest with the one pointed at
6.1.2 move the “end” to the left
6.2 If it is not the end, Go to 4
7. Go to 2
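The steps above differ from Bubble Sort only in deferring the swap: remember where the biggest item is, and swap once per pass. A minimal sketch (names assumed):

```python
def selection_sort(a):
    """Selection Sort sketch: same comparisons as Bubble Sort,
    but only one swap per pass."""
    end = len(a)                          # one past the last unsorted element
    while end > 1:
        biggest = 0                       # index of the biggest seen so far
        for i in range(1, end):
            if a[i] > a[biggest]:         # newer item is bigger: remember it
                biggest = i
        # single swap: biggest goes to the end of the unsorted part
        a[biggest], a[end - 1] = a[end - 1], a[biggest]
        end -= 1
    return a
```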
Selection Sort Analysis
I We know after each pass, one more item is in order
I we compare n-1 times, then n-2 times, etc.
I In the best case, we do no swaps, same number of compares
I In the worst case we swap only once for each n
I What does the average case imply? Still half-n swaps?
Selection Sort Properties
I Stable (if we do ≥?)
I In-place
Sifting
I walk up the list
I where you are at in the list is current position
I everything before current position must be in order
I put current position in order
I advance current position
I things put in order by swapping
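Walking up the list and putting the current position in order by swapping can be sketched as follows (a minimal illustration; names assumed):

```python
def sifting_sort(a):
    """Sifting sketch: everything before `cur` is kept in order;
    the current element is swapped backwards into place."""
    for cur in range(1, len(a)):
        j = cur
        while j > 0 and a[j - 1] > a[j]:
            a[j - 1], a[j] = a[j], a[j - 1]   # put current in order by swapping
            j -= 1
    return a
```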
Insertion Sort
I Exactly like Sifting except
I You don’t swap, you slide
I Optimal diversion sort in most cases
I Also fast when list is nearly sorted
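The swap-versus-slide distinction can be made concrete. In this sketch (names assumed) the current item is held aside while larger elements slide right one at a time, saving the redundant writes that swapping would do:

```python
def insertion_sort(a):
    """Insertion Sort sketch: like sifting, but slide instead of swap."""
    for cur in range(1, len(a)):
        item = a[cur]                 # hold the current item aside
        j = cur
        while j > 0 and a[j - 1] > item:
            a[j] = a[j - 1]           # slide the bigger element right
            j -= 1
        a[j] = item                   # drop the held item into its gap
    return a
```

On a nearly sorted list the inner loop exits almost immediately, which is why this runs close to linear time in that case.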
Linear Algorithm Asymptotic Analysis
Figure: p. 231 of Data Structures and Algorithm Analysis
Better than n2
I Can we sort faster than this?
I Definitely. The rest of the course looks at these approaches.
I Today we’ll start with Quicksort, a very popular sort
Quicksort
I Quicksort works by partitioning an input in two, then sorting each half recursively
I The partition is made around a chosen “pivot”
I The left partition only has elements smaller than the “pivot”
I The right partition only has elements bigger than the “pivot”
I Each “partition” step takes Θ (N) operations
I How many “partition” steps are needed?
I Actually, each “partition” step takes gradually fewer operations. . . why?
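The partition-and-recurse idea can be sketched as below. This is a minimal illustration, not the slides’ exact variant: it simply uses the last element as pivot (pivot selection is discussed separately):

```python
def quicksort(a, lo=0, hi=None):
    """Quicksort sketch: partition around a pivot, recurse on each side."""
    if hi is None:
        hi = len(a) - 1
    if lo < hi:
        pivot = a[hi]                     # simple choice: last element
        i = lo                            # boundary of the "smaller" partition
        for j in range(lo, hi):
            if a[j] < pivot:
                a[i], a[j] = a[j], a[i]
                i += 1
        a[i], a[hi] = a[hi], a[i]         # pivot lands in its final position
        quicksort(a, lo, i - 1)           # elements smaller than the pivot
        quicksort(a, i + 1, hi)           # elements not smaller
    return a
```

Each call does one Θ(N)-on-its-subarray partition pass, and the subarrays shrink at every level, which is the “gradually fewer operations” point above.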
Common variants
I Median of Three
I Diversion
I Tail Recursion
I Introsort (Musser)
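As an illustration of the first variant, Median of Three picks the pivot as the median of the first, middle, and last elements, guarding against already-sorted inputs. A minimal sketch (names assumed):

```python
def median_of_three(a, lo, hi):
    """Median-of-three sketch: sort the three sample positions so the
    middle one holds the median, then use it as the pivot."""
    mid = (lo + hi) // 2
    if a[mid] < a[lo]:
        a[lo], a[mid] = a[mid], a[lo]
    if a[hi] < a[lo]:
        a[lo], a[hi] = a[hi], a[lo]
    if a[hi] < a[mid]:
        a[mid], a[hi] = a[hi], a[mid]
    return mid                            # a[mid] is now the median of the three
```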
Quicksort Sort Properties 1
I Quicksort. . .
I is a divide and conquer algorithm
I works best with good pivot selection
I is recursive
I puts a pivot in place every pass
I is in-place
Quicksort Sort Properties 2
I Is it Stable?
I Some neat optimizations to make it fast and stable with many duplicates
I . . . might be a bit slower otherwise
Quicksort Analysis
I Best case Θ (n log n)
I Average case Θ (n log n)
I Worst case Θ(n²)
I Note that the Average and Worst case have different complexity
I What does that mean?
Quicksort With Distinct Keys Average-Case Analysis
I Sedgewick’s 1977 piece “Quicksort with Equal Keys”
I He starts with an introductory analysis of inputs with distinct keys
I We will look at that here
Quicksort With Distinct Keys 1
I Let us look at comparisons with the pivot
I We assume random inputs
I We assume randomness is maintained on partitioning
I We assume sentinel checks give two extra comparisons (this is an optimization)
I Since input of length 1 is sorted, assume 2 ≤ N
I We can then use the recurrence relation as follows (directly from Sedgewick)
I let C_N be the average comparisons given the above
I C_N = N + 1 + (1/N) ∑_{1≤k≤N} (C_{k−1} + C_{N−k})
I This looks a bit bulky, can we trim it down?
Quicksort With Distinct Keys 2
I given C_N = N + 1 + (1/N) ∑_{1≤k≤N} (C_{k−1} + C_{N−k})
I We see that the recurrence is just the left and the right partition
I Since the sum must be the total size, the occurrence of one of them must be the same as the occurrence of its complement
I We can thus reduce to:
I C_N = N + 1 + (2/N) ∑_{1≤k≤N} C_{k−1}
Quicksort With Distinct Keys 3
I C_N = N + 1 + (2/N) ∑_{1≤k≤N} C_{k−1}
I We know that if the input is empty or there is one item, we have no comparisons.
I We know that the N + 1 is just the number of comparisons on the first partitioning pass.
I We know that the last term is the average number of comparisons for each half.
I If we multiply by N we lose the fraction
I N·C_N = N² + N + 2 ∑_{1≤k≤N} C_{k−1}
Quicksort With Distinct Keys 4
I N·C_N = N² + N + 2 ∑_{1≤k≤N} C_{k−1}
I we can further reduce by a process called differencing, that is subtracting the result for N−1 (which means we need an N of at least size 3)
I (N−1)·C_{N−1} = (N−1)² + (N−1) + 2 ∑_{1≤k≤N−1} C_{k−1}
I Before we difference, note that the summation can be made the same by subtracting the last term, C_{N−1}
I (N−1)·C_{N−1} = (N−1)² + (N−1) + 2 ∑_{1≤k≤N} C_{k−1} − 2C_{N−1}
Quicksort With Distinct Keys 5
I Let’s look at the left side first, it’s easy:
I N·C_N − (N−1)·C_{N−1}
I The right side looks more complicated at first:
I N² + N + 2 ∑_{1≤k≤N} C_{k−1} − (N−1)² − (N−1) − 2 ∑_{1≤k≤N} C_{k−1} + 2C_{N−1}
I but we can note right away that because of the last tweak, the summation terms differ only in sign, so we can get rid of them:
I N² + N − (N−1)² − (N−1) + 2C_{N−1}
I We can then expand the terms to get further savings
I N² + N − N² + 2N − 1 − N + 1 + 2C_{N−1}
I Which reduces simply to
I 2N + 2C_{N−1}
Quicksort With Distinct Keys 6
I We can now show both sides easily
I N·C_N − (N−1)·C_{N−1} = 2N + 2C_{N−1}
I Isolating the N·C_N term again:
I N·C_N = 2N + 2C_{N−1} + (N−1)·C_{N−1}
I N·C_N = (N+1)·C_{N−1} + 2N
Quicksort With Distinct Keys 7
I This last step requires some intuition, but we can divide by N(N+1) to see a telescoping pattern
I C_N/(N+1) = C_2/3 + ∑_{3≤k≤N} 2/(k+1)
I Given the similarity to harmonic series, we can simplify knowing that:
I H_N = ∑_{1≤k≤N} 1/k
I Personally, I see it clearer by knowing that we can pull the constant 2 out of the summation, and by adjusting the limits we can get rid of the +1 in the denominator.
I C_N/(N+1) = C_2/3 + 2 ∑_{4≤k≤N+1} 1/k
Quicksort With Distinct Keys 8
I Given H_N = ∑_{1≤k≤N} 1/k
I and C_N/(N+1) = C_2/3 + 2 ∑_{4≤k≤N+1} 1/k
I We can look at our term in terms of the Harmonic number, that is
I 2(H_{N+1} − 1 − 1/2 − 1/3)
I Since C_2 = 3 we can push that constant in with the harmonic (we have to divide it by two to get it inside the term):
I C_N = 2(N+1)(H_{N+1} + 1/2 − 1 − 1/2 − 1/3)
I or more simply C_N = 2(N+1)(H_{N+1} − 4/3)
Quicksort With Distinct Keys 9
I C_N = 2(N+1)(H_{N+1} − 4/3) when N ≥ 2
I Since the rate of growth of the Harmonic Series is Θ(log N), the average case C_N must be Θ(N log N)
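The closed form can be checked numerically against the recurrence it was derived from. A sketch using exact rational arithmetic (function names are illustrative), taking C_0 = C_1 = 0 as the base cases:

```python
from fractions import Fraction

def avg_compares(n):
    """C_N computed directly from the recurrence:
    C_N = N + 1 + (2/N) * sum of C_{k-1} for k = 1..N."""
    C = [Fraction(0), Fraction(0)]        # C_0 = C_1 = 0
    for N in range(2, n + 1):
        C.append(N + 1 + Fraction(2, N) * sum(C[:N]))
    return C[n]

def closed_form(n):
    """C_N from the derived closed form 2(N+1)(H_{N+1} - 4/3)."""
    H = sum(Fraction(1, k) for k in range(1, n + 2))   # H_{N+1}
    return 2 * (n + 1) * (H - Fraction(4, 3))
```

For instance, both give C_2 = 3, matching the constant used when the derivation pushed C_2/3 into the harmonic term.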
References I