dco 20105 data structures and algorithms

33
Rossella Lau Lecture 7, DCO20105, Semester A,2005-6 DCO20105 Data structures and algorithms Lecture 7: Big-O analysis Sorting Algorithms Big-O analysis on different ways of multiplication Sorting Algorithms: selection sort, bubble sort, insertion sort, radix sort, partition sort, merge sort Comparison of different sorting algorithms -- By Rossella Lau

Upload: guy-david

Post on 01-Jan-2016

47 views

Category:

Documents


1 download

DESCRIPTION

DCO 20105 Data structures and algorithms. Lecture 7: Big-O analysis Sorting Algorithms Big-O analysis on different ways of multiplication Sorting Algorithms: selection sort, bubble sort, insertion sort, radix sort, partition sort, merge sort - PowerPoint PPT Presentation

TRANSCRIPT

Page 1: DCO 20105  Data structures and algorithms

Rossella Lau Lecture 7, DCO20105, Semester A,2005-6

DCO20105 Data structures and algorithms

Lecture 7: Big-O analysis Sorting Algorithms

Big-O analysis on different ways of multiplication Sorting Algorithms:

selection sort, bubble sort, insertion sort, radix sort, partition sort, merge sort

Comparison of different sorting algorithms

-- By Rossella Lau

Page 2: DCO 20105  Data structures and algorithms

Rossella Lau Lecture 7, DCO20105, Semester A,2005-6

Performance re-visitFor multiplication, we can use (at least) three

different ways: The one we used to use in primary school bFunction(m, n) in slide 9 of Lecture 6 funny(a, b) in slide 10 of Lecture 6

int funny(int a, int b) { if ( a == 1 ) return b; return a & 1 ? funny ( a>>1, b<<1) + b : funny ( a>>1, b<<1); }

int bFunction( int n, int m){ if (!n) return 0; return bFunction (n-1, m) + m; }

Page 3: DCO 20105  Data structures and algorithms

Rossella Lau Lecture 7, DCO20105, Semester A,2005-6

Performance analysis

The traditional way bFunction(n,m) funny(a, b)

Memorizing time table is required (memory)

Memorizing time table is not required

Memorizing is not required for binary system (in a computer)

Number of operations:Multiplications: #(n) * #(m) – memory recallAdditions: #(n)

#(n) is the number of digits of n

Number of operations: number of n additions

Number of operations:Shift operations: 2 * #b(a)Additions: worst case, #b(a)

#b(a) is the number of digits of a in binary

Page 4: DCO 20105  Data structures and algorithms

Rossella Lau Lecture 7, DCO20105, Semester A,2005-6

Execution time vs memory

The traditional multiplication has the least operations but it requires the most memory at O (log10 n)

bFunction() does not require additional memory but it spends a terrible amount of time getting the result : O(n)

funny() does not require additional memory and it has a bit more operations at O (log2 n)

The traditional way may have less operations but hard to say if it really outperforms funny() since memory load may not be faster than shift operation

Page 5: DCO 20105  Data structures and algorithms

Rossella Lau Lecture 7, DCO20105, Semester A,2005-6

Ordering of data

In order to search a record efficiently, records are stored in the order of key values

A key is a field or some fields of a record that can uniquely identify the record in a file

Usually, only the key values are stored in memory and the corresponding record is loaded into the memory only when it is necessary

The key values, therefore, usually are sorted in a special order to allow efficient searching

Page 6: DCO 20105  Data structures and algorithms

Rossella Lau Lecture 7, DCO20105, Semester A,2005-6

Classification of sorting methods

Comparison-Based Methods

Insertion Sorts Selection Sorts Heapsort (tree sorting) – in future lesson Exchange sorts

• Bubble sort• Quick sort

Merge sorts

Distribution Methods: Radix sorting

Page 7: DCO 20105  Data structures and algorithms

Rossella Lau Lecture 7, DCO20105, Semester A,2005-6

Selection sort

Selection: choose the smaller element from a list and place it in the 1st position.

The process is from the first element to the second to last element on a list and for each element to apply the “selection” on the sub-list starting from the element being processed.

Ford’ text book slides 2-9 in Chapter 3

Page 8: DCO 20105  Data structures and algorithms

Rossella Lau Lecture 7, DCO20105, Semester A,2005-6

Bubble sortTo pass through the array n-1 times, where n is the

number of data in the array

For each pass: compare each element in the array with its successor interchange the two elements if they are not in order

The algorithm bubble (int x[], int n) { for (i=0; i<n-1; i++) for (j=0; j<n-1; j++) if (x[j] > x[j+1])

SWAP (x[j], x[j+1])}

Page 9: DCO 20105  Data structures and algorithms

Rossella Lau Lecture 7, DCO20105, Semester A,2005-6

An example trace of bubble sortGiven data sequence:

25 57 48 37 12 92 86 33

The first pass:

25 57 48 37 12 92 86 33

25 57 48 37 12 92 86 33

25 48 57 37 12 92 86 33

25 48 37 57 12 92 86 33

25 48 37 12 57 92 86 33

25 48 37 12 57 92 86 33

25 48 37 12 57 86 92 33

25 48 37 12 57 86 33 92

Subsequent passes:

Pass2: 25 37 12 48 57 33 86 92

Pass3: 25 12 37 48 33 57 86 92

Pass4: 12 25 37 33 48 57 86 92

Pass5: 12 25 33 37 48 57 86 92

Pass6: 12 25 33 37 48 57 86 92

Pass7: 12 25 33 37 48 57 86 92

Page 10: DCO 20105  Data structures and algorithms

Rossella Lau Lecture 7, DCO20105, Semester A,2005-6

Improvement can be madeAt pass i, the last i elements should be in proper

positions since, at the first pass the largest element should be placed at the end of the array. At the second pass, the second large element should be placed before the last element, and so on. The comparison only requires from x[0] to x[n-i-1]

The array has already been sorted at the fifth iteration and the sixth and seventh are redundant

Therefore, once no exchange is required in an iteration, the array is already sorted and the subsequent iterations are redundant

Page 11: DCO 20105  Data structures and algorithms

Rossella Lau Lecture 7, DCO20105, Semester A,2005-6

The improved algorithm for bubble sort

void bubble1 (int x[], int n){ exchange = TRUE; for (i=0; i<n-1 && exchange; i++) { exchange = FALSE; for (j=0; i<n-i-1; j++) if (x[j] > x[j+1]) { exchange = TRUE; SWAP(x[j], x[j+1]); }/*end if */ }/* end for i */}

Page 12: DCO 20105  Data structures and algorithms

Rossella Lau Lecture 7, DCO20105, Semester A,2005-6

Performance considerations of bubble sort

For the first version, it requires (n-1) comparisons in (n-1) passes the total number of comparisons is n2 -2n +1, i.e., O(n2)

For the improved version, it requires (n-1) + (n-2) + ... + (n-k) for k (<n) passes the total number of comparisons is (2kn-k2 -k)/2. However, the average k is O(n) yielding the overall complexity as O(n2) and the overhead (set and check exchange) introduced should also be considered

It only requires little additional space

Page 13: DCO 20105  Data structures and algorithms

Rossella Lau Lecture 7, DCO20105, Semester A,2005-6

Insertion sort

Insert an item into a previous sorted order one by one for each of the data.

It is similar to repeatedly picking up playing cards and inserting them into the proper position in a partial hand of cards

Page 14: DCO 20105  Data structures and algorithms

Rossella Lau Lecture 7, DCO20105, Semester A,2005-6

An example trace of insertion sort

25 57 48 37 12 92 86 33

25 57 48 37 12 92 86 33

25 57 48 37 12 92 86 33

25 57 37 12 92 86 33

25 48 57 37 12 92 86 33

25 48 57 37 12 92 86 33

25 48 57 12 92 86 33

25 48 57 12 92 86 33

25 37 48 57 12 92 86 33

25 37 48 57 12 92 86 33

25 37 48 57 92 86 33

25 37 48 57 92 86 33

25 37 48 57 92 86 33

25 37 48 57 92 86 33

12 25 37 48 57 92 86 33

12 25 37 48 57 92 86 33

12 25 37 48 57 92 86 33

12 25 37 48 57 86 92 33

12 25 37 48 57 86 92 33……

12 25 33 37 48 57 86 92

Page 15: DCO 20105  Data structures and algorithms

Rossella Lau Lecture 7, DCO20105, Semester A,2005-6

The algorithm of insertion sort

void insertsort(x,n) int x[], n){for (k=1; k<n; k++) { y = x[k]; for (i = k-1; i >=0 && y<x[i]; i--) x[i+1] = x[i]; x[i+1] = y; } /* end for k */}

The checking of i>=0 is time consuming. Setting a sentinel in the beginning of the array will prevent y from going beyond the array

void insertsort(int x[], int m)/* m is n+1, data from x[1] */X[0] = MAXNEGINT{for (k=2; k < m; k++) { y = x[k]; for (i = k-1; y<x[i]; i--) x[i+1] = x[i]; x[i+1] = y; } /* end for k */}

Page 16: DCO 20105  Data structures and algorithms

Rossella Lau Lecture 7, DCO20105, Semester A,2005-6

Performance analysis of insertion sort

If the original sequence is already in order, only one comparison is made on each pass ==> O(n)

If the original sequence is in a reversed order, it requires n comparison in each pass ==> O(n2)

The complexity is from O(n) to O(n2)

It requires little additional space

Page 17: DCO 20105  Data structures and algorithms

Rossella Lau Lecture 7, DCO20105, Semester A,2005-6

Quick sortIt is also called partition exchange sort

In each step, the original sequence is partitioned into 3 parts:

a. all the items less than the partitioning element

b. the partitioning element in its final position

c. all the items greater than the partitioning element

The partitioning process continues in the left and right partitions

Page 18: DCO 20105  Data structures and algorithms

Rossella Lau Lecture 7, DCO20105, Semester A,2005-6

The partitioning in each step of quicksort

To pick one of the elements as the partitioning element, p, usually the first element of the sequence

To find the proper position for p while partitioning the sequence into 3 parts

a) it employs two indexes, down and up

b) down goes from left to right to find elements greater than p

c) up goes from right to left to find elements less than p

d) elements found by up and down are exchanged

e) process until up and down are matched or passed each other

f) the position of p should be pointed by up

g) exchange p with the element pointed by up

Page 19: DCO 20105  Data structures and algorithms

Rossella Lau Lecture 7, DCO20105, Semester A,2005-6

An example trace of quicksort 25 57 48 37 12 92 86 33

25 57 48 37 12 92 86 33

25 57 48 37 12 92 86 33

25 57 48 37 12 92 86 33

25 57 48 37 12 92 86 33

25 57 48 37 12 92 86 33

25 12 48 37 57 92 86 33

25 12 48 37 57 92 86 33

25 12 48 37 57 92 86 33

25 12 48 37 57 92 86 33

25 12 48 37 57 92 86 33

(12) 25 (48 37 57 92 86 33)

Subsequent processes:

12 25 (48 37 57 92 86 33)

12 25 (48 37 33 92 86 57)

12 25 (48 37 33 92 86 57)

12 25 (33 37) 48 (92 86 57)

12 25 (33 37) 48 (92 86 57)

12 25()33 (37) 48 (57 86) 92()

12 25 33 37 48 (57 86) 92

12 25 33 37 48()57(86) 92

12 25 33 37 48 57 86 92

_ down, _ up

Page 20: DCO 20105  Data structures and algorithms

Rossella Lau Lecture 7, DCO20105, Semester A,2005-6

The algorithm for quicksortvoid quickSort(int x[], int left, int right){ int down, up, partition; down=left; up=right+1; partition=x[left];

while (down<up) { while (x[++down] <= partition); while (x[--up] > partition); if (down<up) SWAP(x[down], x[up]) } x[left] = x[up]; x[up] = partition;

if (left < up - 1) quickSort(x, left, up-1); if (down < right) quickSort(x, down, right);}

Page 21: DCO 20105  Data structures and algorithms

Rossella Lau Lecture 7, DCO20105, Semester A,2005-6

Performance considerations of quicksort

Quciksort got its name because it quickly puts an element into its proper position by employing two indexes to speed up the partioning process and to minimize the exchange

Each pass reduces the comparisons about a half total number of comparisons is about O(nlog2n)

It requires spaces for the recursive process or stacks for an iterative process, it is about O(log2n)

Page 22: DCO 20105  Data structures and algorithms

Rossella Lau Lecture 7, DCO20105, Semester A,2005-6

MergeMerge means to combine two or more sorted sequences

into another sorted sequence

The merging of two sequences, for example, are as follows32 45 78 90 92 | 25 30 52 88 98 |

32 45 78 90 92 | 25 30 52 88 98 |25

32 45 78 90 92 | 25 30 52 88 98 |25 30

32 45 78 90 92 | 25 30 52 88 98 |25 30 32

32 45 78 90 92 | 25 30 52 88 98 |25 30 32 45

32 45 78 90 92 | 25 30 52 88 98 |25 30 32 45 52

32 45 78 90 92 | 25 30 52 88 98 |25 30 32 45 52 78

32 45 78 90 92 | 25 30 52 88 98 |25 30 32 45 52 78 88

32 45 78 90 92 | 25 30 52 88 98 |25 30 32 45 52 78 88 90

32 45 78 90 92 | 25 30 52 88 98 |25 30 32 45 52 78 88 90 92

32 45 78 90 92_| 25 30 52 88 98_|25 30 32 45 52 78 88 90 92 98

Page 23: DCO 20105  Data structures and algorithms

Rossella Lau Lecture 7, DCO20105, Semester A,2005-6

Merge sortIt employs the merging technique in the following

way:

1. Divide the sequence into n parts

2. Merge adjacent parts yielding the sequence n/2 parts

3. Merge adjacent parts again yielding the sequence n/4 parts

......

Process goes on until the sequence becomes 1 part

Page 24: DCO 20105  Data structures and algorithms

Rossella Lau Lecture 7, DCO20105, Semester A,2005-6

An example of merge sort

8 parts 25 57 48 37 12 92 86 33

merge 25 57 37 48 12 92 33 86

4 parts 25 57 37 48 12 92 33 86

merge 25 37 48 57 12 33 86 92

2 parts 25 37 48 57 12 33 86 92

merge 12 25 33 37 48 57 86 92

Page 25: DCO 20105  Data structures and algorithms

Rossella Lau Lecture 7, DCO20105, Semester A,2005-6

Performance considerations of merge sort

There are only log2n passes yielding a complexity of O(nlogn)

It never requires n* log2n comparison while quicksort may require O(n2) at the worst case

However, it requires about double of assignment statements as quicksort

It also requires more additional spaces, about O(n), than quicksort's O(log2n)

Page 26: DCO 20105  Data structures and algorithms

Rossella Lau Lecture 7, DCO20105, Semester A,2005-6

Radix SortIt is based on the values of the actual digits of its octal

position

Starting from the least significant digit to the most significant digit

define 10 vectors for each digit and number the vectors from v0 to v9 for digit 0 to 9 respectively

scan the data sequence once and add xi into the significant digit's respective vector

new data sequence is as follows: remove elements from each vector from the beginning one by one until it is empty from q0 to q9

After the above actions, the new data sequence is the sorted sequence!

Page 27: DCO 20105  Data structures and algorithms

Rossella Lau Lecture 7, DCO20105, Semester A,2005-6

An example of radix sort

25 57 48 37 12 92 86 33

12 92 33 25 86 57 37 48

25

5748

37

12 92

86

33

12

92

3325

86

57

3748

12 25 33 37 48 57 86 92

Page 28: DCO 20105  Data structures and algorithms

Rossella Lau Lecture 7, DCO20105, Semester A,2005-6

Performance considerations of radix sort

It does not require any comparison between data

It requires number of digits, log10 m, passes

O(n*log10 m) O(n), treating log10 m a constant

It requires 10 times of the memory for numbers

It seems that radix sort has the “best” performance; however, it is not popularly used because

It consumes a terrible amount of memory Log10 m depends on the digit (length) of a key and may

not be treated as a small constant when the key length is long

Page 29: DCO 20105  Data structures and algorithms

Rossella Lau Lecture 7, DCO20105, Semester A,2005-6

The real life sort for vector based data

Although quick sort is known to be the fastest in many cases, the library will not usually directly use quick sort as the sort method

Usually, a carefully designed library will implement its sort method with quick sort and insertion sort

Quick sort divides partitions until a partition is about the size from 8 to 16, insertion is applied to the partition since the partitions usually are near being sorted

Page 30: DCO 20105  Data structures and algorithms

Rossella Lau Lecture 7, DCO20105, Semester A,2005-6

The real life sort for non vector data

Quick sort requires a container with random access

A container such as a linked list does not support random access and cannot apply quick sort

Merge sort is preferred to be applied

Page 31: DCO 20105  Data structures and algorithms

Rossella Lau Lecture 7, DCO20105, Semester A,2005-6

Sample timing of sort methods

Ford’s prg15_2.cpp & d_sort.h

Timing for some sample runs : timeSort.out

Page 32: DCO 20105  Data structures and algorithms

Rossella Lau Lecture 7, DCO20105, Semester A,2005-6

Summary

Bubble sort and insertion sort have complexity of O(n2) but insertion sort is still preferred for short data stream

Partition sort, merge sort have a less complexity at O(n logn)

Radix sort seemed at O(n) complexity but it consumes more memory and may depend on the key length

Many times, the trade off is space

Page 33: DCO 20105  Data structures and algorithms

Rossella Lau Lecture 7, DCO20105, Semester A,2005-6

Reference

Ford: 3.1, 4.4, 8.3 15.1

Data Structures using C and C++ by Yedidyah Langsam, Moshe J. Augenstein & Aaron M. Tenenbaum: Chapter 6

Example programs: Ford: prg15_2.cpp, d_sort.h,

-- END --