15-211 fundamental structures of computer science

15-211 Fundamental Structures of Computer Science February 25, 2003 Ananda Guna Sorting – Part II

15-211 Fundamental Structures of Computer Science

February 25, 2003

Ananda Guna

Sorting – Part II

Announcements

Work on Homework #4, due Monday, March 17, 11:59pm. You should have started by now!

Quiz #2 is Tuesday, Feb. 25. Study the Huffman and LZW algorithms.

The midterm is Tuesday, March 4th.

Review for the midterm is on Thursday.

Master Theorem

THEOREM: The recurrence T(n) = aT(n/b) + cn, T(1) = c,

where a, b, and c are all constants, solves to:

T(n) = Θ(n) if a < b
T(n) = Θ(n log n) if a = b
T(n) = Θ(n^(log_b a)) if a > b
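The three cases can be checked numerically. The sketch below is illustrative only: the function names `master_case` and `T` are made up for this example, and `T` assumes n is an exact power of b so that the integer division is exact.

```python
import math

def master_case(a: float, b: float) -> str:
    """Name the case of the theorem that applies to T(n) = a*T(n/b) + c*n."""
    if a < b:
        return "Theta(n)"
    if a == b:
        return "Theta(n log n)"
    # a > b: the exponent is log base b of a
    return f"Theta(n^{math.log(a, b):.3f})"

def T(n: int, a: int, b: int, c: int = 1) -> int:
    """Evaluate the recurrence directly (assumes n is a power of b)."""
    if n <= 1:
        return c
    return a * T(n // b, a, b, c) + c * n

for a, b in [(1, 2), (2, 2), (4, 2)]:
    print(a, b, master_case(a, b), [T(2 ** i, a, b) for i in range(1, 6)])
```

With c = 1 and b = 2, the printed values follow 2n - 1 for a = 1, n log2(n) + n for a = 2, and 2n^2 - n for a = 4, matching the three cases.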

Recurrences

Divide-and-conquer algorithms often lead to recurrences of the following form:

T(n) = a*T(n/b) + cn
T(1) = c

(Here a, b, and c are constants > 0.)

For merge sort: a = b = 2.

What if a = b = 3 for merge sort? How will that affect c?

Solving General Recurrences

We can solve this by the repeated substitution method:

T(n) = aT(n/b) + cn
     = a(aT(n/b^2) + cn/b) + cn
     = a(a(aT(n/b^3) + cn/b^2) + cn/b) + cn
     = ...
     = a^(k+1) T(n/b^(k+1)) + cn[(a/b)^k + ... + (a/b)^2 + (a/b) + 1]

We will solve this in class.
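One standard way to finish the substitution is sketched below. It is not necessarily the derivation given in class. It assumes n is an exact power of b, say n = b^m, so the expansion stops when k + 1 = m and T(n/b^m) = T(1) = c.

```latex
\begin{align*}
T(n) &= a^{m}\,T(1) + cn\sum_{i=0}^{m-1}\left(\frac{a}{b}\right)^{i} \\
     &= c\,a^{m} + cn\sum_{i=0}^{m-1}\left(\frac{a}{b}\right)^{i} \\
     &= cn\sum_{i=0}^{m}\left(\frac{a}{b}\right)^{i}
        \qquad \text{since } a^{m} = n\left(\frac{a}{b}\right)^{m} \text{ when } n = b^{m}.
\end{align*}

\begin{align*}
a < b &: \quad \sum_{i=0}^{m}\left(\frac{a}{b}\right)^{i} \le \frac{b}{b-a}
        \;\Rightarrow\; T(n) = \Theta(n) \\
a = b &: \quad \sum_{i=0}^{m} 1 = m + 1 = \log_b n + 1
        \;\Rightarrow\; T(n) = \Theta(n \log n) \\
a > b &: \quad \sum_{i=0}^{m}\left(\frac{a}{b}\right)^{i} = \Theta\!\left(\left(\frac{a}{b}\right)^{m}\right)
        \;\Rightarrow\; T(n) = \Theta(a^{m}) = \Theta\!\left(a^{\log_b n}\right) = \Theta\!\left(n^{\log_b a}\right)
\end{align*}
```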

Sorting Recap

Selection sort: always O(n^2).

Insertion sort: the total time is O(n + # inversions). This is O(n^2) in the worst case and the average case, but might be smaller if the file is almost sorted.

Bubble sort: between insertion and selection sort in terms of running time.

These are all easy to code but O(n^2) on average.

We can do better
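The O(n + # inversions) bound for insertion sort can be seen by counting element shifts. The sketch below is an illustration, not course code; the names `insertion_sort` and `count_inversions` are chosen for this example. Each shift in the inner loop removes exactly one inversion, so the shift count equals the number of inversions in the input.

```python
def insertion_sort(items):
    """Sort a copy of items; return (sorted_list, number_of_shifts)."""
    a = list(items)
    shifts = 0
    for i in range(1, len(a)):
        x = a[i]
        j = i - 1
        # Shift larger elements one slot right; each shift fixes one inversion.
        while j >= 0 and a[j] > x:
            a[j + 1] = a[j]
            j -= 1
            shifts += 1
        a[j + 1] = x
    return a, shifts

def count_inversions(items):
    """Brute-force count of pairs i < j with items[i] > items[j]."""
    n = len(items)
    return sum(1 for i in range(n) for j in range(i + 1, n) if items[i] > items[j])

data = [3, 1, 2]
print(insertion_sort(data), count_inversions(data))
```

An already sorted input causes zero shifts, and a reversed input of n items causes n(n-1)/2 of them.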

Sorting Recap (continued)

Better algorithms:

Heapsort: O(n log n) worst-case. Can be done in-place in an array.

Mergesort: O(n log n) worst-case. Simple divide-and-conquer: split into left and right halves, recursively sort both halves, and then merge the results. The running time is described by the recurrence T(n) = 2T(n/2) + cn, which solves to O(n log n).

Quicksort: O(n^2) worst-case but O(n log n) average-case.
• If you always pick the leftmost element as pivot, then this is a lot like inserting into a binary search tree.
• Cost is like the sum of the depths of the nodes.
• How can we avoid the worst case for any data set? Hint: randomize.

Why is quicksort better than mergesort? Hint: faster inner loop.
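The "randomize" hint can be sketched as follows. This is one common formulation, not the course's reference code: a Lomuto-style partition with a uniformly random pivot. The function name and the `rng` parameter are assumptions made for this example. With a random pivot, no fixed input (such as an already sorted array) reliably triggers the O(n^2) case.

```python
import random

def quicksort(a, lo=0, hi=None, rng=random):
    """Sort list a in place, choosing each pivot uniformly at random."""
    if hi is None:
        hi = len(a) - 1
    if lo >= hi:
        return
    # Move a randomly chosen pivot to the end of the range.
    p = rng.randint(lo, hi)
    a[p], a[hi] = a[hi], a[p]
    pivot = a[hi]
    i = lo
    # Inner loop: one comparison and at most one swap per element.
    for j in range(lo, hi):
        if a[j] < pivot:
            a[i], a[j] = a[j], a[i]
            i += 1
    a[i], a[hi] = a[hi], a[i]  # put the pivot in its final position
    quicksort(a, lo, i - 1, rng)
    quicksort(a, i + 1, hi, rng)

data = [5, 3, 5, 1, 3]
quicksort(data)
print(data)
```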

Lower Bound for the Sorting Problem

How fast can we sort?

We have seen several sorting algorithms with O(N log N) running time.

Can we do better than N log N?

In fact, Ω(N log N) is a general lower bound for comparison-based sorting.

A proof appears in Weiss.

Informally we can argue as follows…

Decision tree for sorting

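[Figure: decision tree for sorting three elements a, b, c. Each node lists the orderings still possible; each edge is labeled with a comparison outcome.]

All six orderings: a<b<c, a<c<b, b<a<c, b<c<a, c<a<b, c<b<a
  a<b: a<b<c, a<c<b, c<a<b
    a<c: a<b<c, a<c<b
      b<c: a<b<c
      c<b: a<c<b
    c<a: c<a<b
  b<a: b<a<c, b<c<a, c<b<a
    b<c: b<a<c, b<c<a
      a<c: b<a<c
      c<a: b<c<a
    c<b: c<b<a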

The tree has N! leaves, one for each possible ordering of the input.

So the tree has height at least log(N!).

log(N!) = Θ(N log N).
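The height bound can be computed exactly for small N. The sketch below uses a hypothetical helper name, `min_comparisons`. It returns the smallest h with 2^h >= N!, which is the least possible height of a binary decision tree with N! leaves. It uses integer arithmetic to avoid floating-point rounding.

```python
import math

def min_comparisons(n: int) -> int:
    """Smallest h such that a binary tree of height h can have n! leaves."""
    leaves = math.factorial(n)
    # For an integer x >= 1, (x - 1).bit_length() equals ceil(log2(x)).
    return (leaves - 1).bit_length()

for n in (3, 4, 5, 16):
    print(n, min_comparisons(n), round(n * math.log2(n), 1))
```

For N = 3 the result is 3, the height of a decision tree for three elements with 3! = 6 leaves.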

Our lower bound argument

We make the following observations:

For any two different permutations P1, P2 of the input, the algorithm must at some point make a comparison that causes it to do different things on the two permutations (otherwise, they wouldn't both be sorted).

Each comparison has only two outcomes (it's a YES/NO question).

So there must be some permutation that causes the algorithm to ask at least log(n!) questions.

So we argued that comparison-based sorting is both O(n log n) and Ω(n log n).

So this is Θ(n log n).
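The step from log(n!) to n log n can be justified with two simple bounds. The derivation below is a standard argument added for completeness, not taken from the slides. The lower bound keeps only the largest terms of the sum, of which there are at least n/2.

```latex
\begin{align*}
\log(n!) &= \sum_{i=1}^{n} \log i \;\le\; n \log n
  && \Rightarrow\; \log(n!) = O(n \log n) \\
\log(n!) &\ge \sum_{i=\lceil n/2 \rceil}^{n} \log i \;\ge\; \frac{n}{2}\,\log\frac{n}{2}
  && \Rightarrow\; \log(n!) = \Omega(n \log n)
\end{align*}
```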

Summary on sorting bound

If we are restricted to comparisons on pairs of elements, then the general lower bound for sorting is Ω(N log N).

A decision tree is a representation of the possible comparisons required to solve a problem.

Bucket Sort

Non-comparison-based sorting

If we can use more than just comparisons of pairs of elements, we can sometimes sort more quickly.

A simple example is bucket sort.

In bucket sort, we require the additional knowledge that all elements are non-negative integers less than a specified maximum value.

Bucket sort

Input: 1 3 3 1 2

Buckets: 1, 2, 3

Implementing Bucket Sort

Assume all values are in the range 0..k-1 for some small k.

Make an array of k linked lists.

Insert each item into array[item.value()].

Make one pass over the array and collect all items.

This is an O(N + k) algorithm.
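The steps above can be sketched as follows. This is a minimal illustration under stated assumptions: Python lists stand in for the linked lists, the `key` parameter plays the role of `item.value()`, and every key is assumed to be an integer in 0..k-1 (keys outside that range are not checked).

```python
def bucket_sort(items, k, key=lambda x: x):
    """Stable bucket sort for items whose key is an integer in 0..k-1."""
    buckets = [[] for _ in range(k)]        # one bucket per possible key
    for item in items:                      # O(N): append keeps input order
        buckets[key(item)].append(item)
    # O(N + k): visit the buckets in key order and collect their contents.
    return [item for bucket in buckets for item in bucket]

print(bucket_sort([1, 3, 3, 1, 2], 4))
print(bucket_sort([(1, "a"), (0, "b"), (1, "c")], 2, key=lambda p: p[0]))
```

Because each item is appended to the end of its bucket, items with equal keys come out in their input order.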

Bucket sort characteristics

Runs in O(N + k) time, which is O(N) when k is O(N).

Easy to implement each bucket as a linked list.

Is stable: if two elements (A, B) are equal with respect to sorting, and they appear in the input in order (A, B), then they remain in the same order in the output.

Radix Sort

If your integers are in a larger range, then do a bucket sort on each digit.

Start by sorting on the low-order digit using a STABLE bucket sort.

Then do the next-lowest digit, and so on.

If the items are b digits long (or b bytes long for strings), then the time to sort N items is O(Nb).
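A least-significant-digit radix sort along these lines might look like the sketch below. It assumes non-negative integers; the function name and the `base` parameter are choices made for this example. Each pass is a stable bucket sort on one digit.

```python
def radix_sort(values, base=10):
    """LSD radix sort of non-negative integers: one stable pass per digit."""
    out = list(values)
    if not out:
        return out
    largest = max(out)
    place = 1                                # 1, base, base^2, ...
    while place <= largest:
        buckets = [[] for _ in range(base)]
        for v in out:                        # stable: append keeps current order
            buckets[(v // place) % base].append(v)
        out = [v for bucket in buckets for v in bucket]
        place *= base
    return out

print(radix_sort([32, 224, 16, 15, 31, 169, 123, 252]))
```

The loop runs once per digit of the largest value, so N items of b digits each take O(Nb) bucket insertions.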

Radix sort Example

A sorting algorithm that goes beyond comparison - radix sort.

Input (decimal):   2    0    5    1    7    3    4    6
Input (binary):    010  000  101  001  111  011  100  110
After bit 0 pass:  010  000  100  110  101  001  111  011
After bit 1 pass:  000  100  101  001  010  110  111  011
After bit 2 pass:  000  001  010  011  100  101  110  111
Output (decimal):  0    1    2    3    4    5    6    7

Each sorting step must be stable.
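The example's passes can be reproduced with a short trace. This sketch is illustrative; `stable_pass` and `radix_trace` are names invented here. It processes bit 0 (the low-order bit) first and keeps the relative order within the zeros group and within the ones group.

```python
def stable_pass(values, bit):
    """One stable bucket pass on a single bit: zeros first, then ones."""
    zeros = [v for v in values if not (v >> bit) & 1]
    ones = [v for v in values if (v >> bit) & 1]
    return zeros + ones

def radix_trace(values, bits):
    """Return the list after each pass, from bit 0 up to bit bits-1."""
    trace = []
    current = list(values)
    for bit in range(bits):
        current = stable_pass(current, bit)
        trace.append(current)
    return trace

for step in radix_trace([2, 0, 5, 1, 7, 3, 4, 6], 3):
    print([format(v, "03b") for v in step])
```

If a pass were not stable, the order established by earlier (lower) bits would be lost and the final result could be wrong.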

Radix sort characteristics

Each sorting step can be performed via bucket sort, and is thus O(N).

If the numbers are all b bits long, then there are b sorting steps.

Hence, radix sort is O(bN).

Also, a most-significant-bit-first variant of radix sort (radix exchange sort) can be implemented in-place, partitioning on each bit just like quicksort.

Not just for binary numbers

Radix sort can be used for decimal numbers and alphanumeric strings.

Input:                 032  224  016  015  031  169  123  252
After ones digit:      031  032  252  123  224  015  016  169
After tens digit:      015  016  123  224  031  032  252  169
After hundreds digit:  015  016  031  032  123  169  224  252
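The same idea carries over to strings. The sketch below is an illustration under explicit assumptions: all strings have exactly `width` characters, and every character has a code point below 256, so each pass can use 256 buckets indexed by `ord`. The function name is made up for this example.

```python
def radix_sort_strings(words, width):
    """LSD radix sort of strings that all have exactly `width` characters."""
    out = list(words)
    # Process character positions from last (least significant) to first.
    for pos in range(width - 1, -1, -1):
        buckets = [[] for _ in range(256)]   # assumes code points < 256
        for w in out:
            buckets[ord(w[pos])].append(w)   # stable: append keeps order
        out = [w for bucket in buckets for w in bucket]
    return out

print(radix_sort_strings(["032", "224", "016", "015", "031", "169", "123", "252"], 3))
print(radix_sort_strings(["cab", "abc", "bca"], 3))
```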

Thursday and Next Week

We will do a review for the midterm on Thursday.

The midterm is Tuesday, March 4th.

We will post some old exams on Bb.

We will have online office hours next week.

Work on HW4 and ask questions early.