Sorting Tecniche di Programmazione – A.A. 2012/2013

Upload: dangliem

Post on 15-Feb-2019

Sorting

Tecniche di Programmazione – A.A. 2012/2013

Summary

1. Problem definition

2. Insertion Sort

3. Selection Sort

4. Counting Sort

5. Merge Sort

6. Quicksort

7. Collections.sort

A.A. 2012/2013 Tecniche di programmazione 2

Iterative: Insertion Sort, Selection Sort

Special: Counting Sort

Recursive: Merge Sort, Quicksort

Problem definition

Sorting

Formal problem definition: Sorting

Input:

A sequence of n numbers <a1, a2, …, an>

Output:

A permutation <a'1, a'2, …, a'n> of the original elements, such that a'1 ≤ a'2 ≤ … ≤ a'n

Types of sorting approaches

Internal sorting

Data to be sorted are all within the main computer memory

(RAM)

Direct access to all element values

External sorting

Data to be sorted may not all be loaded in memory at the

same time

We must work directly on data stored on file

Typically, sequential access to data


Sorting objects

Textbook algorithms always refer to sorting sequences of numbers

In practice, we need to sort the elements of a collection, of some class type

The objects to be sorted must implement the Comparable interface

Comparable


public interface Comparable<T> (java.lang)

Must implement:

int compareTo(T other)

Returns a negative integer, zero, or a positive integer as this

object is less than, equal to, or greater than the specified other

object.

It is strongly recommended, but not strictly required

that (x.compareTo(y)==0) == (x.equals(y))

http://docs.oracle.com/javase/7/docs/api/java/lang/Comparable.html

Sorting Comparable objects


Given a class, usually

A sub-set of the fields is used for sorting

The fields for sorting are called the «key» of the objects

.equals and .compareTo are defined according to the key fields

Other fields are regarded as «additional data»

Different types of keys (and thus ordering criteria) may be defined

The Comparable interface specifies the «natural» ordering

Other orderings may be achieved with the Comparator helper classes

Comparator


public interface Comparator<T> (java.util)

Must implement:

int compare(T obj1, T obj2)

Returns a negative integer, zero, or a positive integer as the

first argument is less than, equal to, or greater than the second.

It is generally the case, but not strictly required

that (compare(x, y)==0) == (x.equals(y))

Comparators can be passed to a sort method

http://docs.oracle.com/javase/7/docs/api/java/util/Comparator.html

Example

public class Studente implements Comparable<Studente> {

    private int matricola ;
    private String cognome ;
    private String nome ;
    private int voto ;

    // «Natural» ordering: by the matricola field
    @Override
    public int compareTo(Studente other) {
        // Integer.compare avoids the overflow that a plain subtraction may cause
        return Integer.compare(this.matricola, other.matricola) ;
    }

    // Since we define compareTo, we should also redefine equals and hashCode,
    // based on the same «key» fields
    @Override
    public boolean equals(Object other) {
        if( !(other instanceof Studente) )
            return false ;
        return this.matricola == ((Studente)other).matricola ;
    }

    @Override
    public int hashCode() {
        return Integer.valueOf(this.matricola).hashCode() ;
    }

    ... getters & setters ...

}

Comparator for sorting by name

import java.util.Comparator ;

public class StudenteByName implements Comparator<Studente> {

    @Override
    public int compare(Studente arg0, Studente arg1) {
        int cmp = arg0.getCognome().compareTo(arg1.getCognome()) ;
        // Check names only if surnames are equal
        if( cmp!=0 )
            return cmp ;
        else
            return arg0.getNome().compareTo(arg1.getNome()) ;
    }
}

Comparator for sorting by voto

import java.util.Comparator ;

public class StudenteByVoto implements Comparator<Studente> {

    @Override
    public int compare(Studente o1, Studente o2) {
        // Integer.compare is used for consistency with compareTo
        return Integer.compare(o1.getVoto(), o2.getVoto()) ;
    }
}

Note: repeated values for the voto field are possible

Stability

A sorting algorithm is said to be stable when, if multiple

elements share the same value of the key, in the sorted

sequence such elements appear in the same relative

order of the original sequence.
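As a minimal sketch of what stability means in practice, the fragment below sorts a small list by grade only, using Collections.sort, which the Java documentation guarantees to be stable. The Exam class, the StabilitySketch class and the sample data are hypothetical, introduced only for this illustration.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

// Hypothetical record used only for this illustration: the key is 'voto'.
class Exam {
    final String cognome;
    final int voto;

    Exam(String cognome, int voto) {
        this.cognome = cognome;
        this.voto = voto;
    }
}

class StabilitySketch {

    // Sorts a copy of the list by voto only and returns the surnames in the resulting order.
    static List<String> sortedSurnames(List<Exam> exams) {
        List<Exam> copy = new ArrayList<Exam>(exams);
        Collections.sort(copy, new Comparator<Exam>() {
            @Override
            public int compare(Exam a, Exam b) {
                return Integer.compare(a.voto, b.voto);
            }
        });
        List<String> result = new ArrayList<String>();
        for (Exam e : copy) {
            result.add(e.cognome);
        }
        return result;
    }

    public static void main(String[] args) {
        List<Exam> exams = new ArrayList<Exam>();
        exams.add(new Exam("Rossi", 28));
        exams.add(new Exam("Bianchi", 25));
        exams.add(new Exam("Verdi", 28));
        exams.add(new Exam("Neri", 25));
        // Stable sort: Bianchi stays before Neri (both 25),
        // and Rossi stays before Verdi (both 28).
        System.out.println(sortedSurnames(exams)); // prints [Bianchi, Neri, Rossi, Verdi]
    }
}
```

With an unstable algorithm, the two students with grade 25 (or the two with grade 28) could come out in either order.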


Algorithms

Various sorting algorithms are known, with differing

complexity:

O(n²): simple, iterative
    Insertion sort, Selection sort, Bubble sort, …

O(n): applicable in special cases only
    Counting sort, Radix sort, Bin (or Bucket) sort, …

O(n log n): more complex, recursive
    Merge sort, Quicksort, Heapsort

Insertion Sort

Sorting

Insertion sort

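2 3 6 12 16 21 | 8        (left of the bar: already ordered; v[j] = 8; the elements after v[j] are not considered yet)

Move right by one cell all elements v[i] for which v[i] > v[j], then store v[j] in the freed cell

2 3 6 8 12 16 21 | 5      (v[j] = 5 is the next element to insert)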
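The step illustrated above can be sketched in Java as follows. This is a minimal sketch, not the code from the original slides: the class name InsertionSortSketch and the choice of a generic array plus a Comparator (mirroring the signature used for Selection Sort in this deck) are assumptions.

```java
import java.util.Arrays;
import java.util.Comparator;

class InsertionSortSketch {

    // Sorts 'vector' in place according to 'comp'.
    static <T> void sort(T[] vector, Comparator<? super T> comp) {
        for (int j = 1; j < vector.length; j++) {
            T key = vector[j];          // element to insert, v[j]
            int i = j - 1;
            // move right by one cell all elements greater than key
            while (i >= 0 && comp.compare(vector[i], key) > 0) {
                vector[i + 1] = vector[i];
                i--;
            }
            vector[i + 1] = key;        // store key in the freed cell
        }
    }

    public static void main(String[] args) {
        Integer[] v = {2, 3, 6, 12, 16, 21, 8, 5};
        sort(v, new Comparator<Integer>() {
            @Override
            public int compare(Integer a, Integer b) {
                return a.compareTo(b);
            }
        });
        System.out.println(Arrays.toString(v)); // prints [2, 3, 5, 6, 8, 12, 16, 21]
    }
}
```

Because the inner loop stops at the first element that is not strictly greater than the key, elements with equal keys are never moved past each other, so this version is stable.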


Complexity

Number of comparisons
    Cmin = n-1
    Cavg = ¼(n²+n-2)
    Cmax = ½(n²+n)-1

Number of data copies
    Mmin = 2(n-1)
    Mavg = ¼(n²+9n-10)
    Mmax = ½(n²+3n-4)

C = O(n²), M = O(n²)

T(n) = O(n²)

Best case: already sorted vector

Worst case: inversely sorted vector

T(n) is not Θ(n²)

Tworst case(n) = Θ(n²)

Selection Sort

Sorting

Selection Sort

At every iteration, find the minimum of the yet-unsorted

part of the vector

Swap the minimum with the current position in the

vector

2 3 6 12 16 21 | 34 81 25 28 41 27 60     (left of the bar: already ordered; right: not ordered; v[j] = 34; minimum = 25)

2 3 6 12 16 21 25 | 81 34 28 41 27 60     (after swapping v[j] with the minimum)

Complexity

The loops don't depend on the data stored in the array: complexity is independent of the values to be sorted

Worst case performance: O(n²)

Best case performance: O(n²)

Average case performance: O(n²)

Implementation

public void sort(T[] vector, Comparator<T> comp) {

    for(int j=0; j<vector.length; j++) {

        // find minimum
        int pos = j ;
        for(int i=j+1; i<vector.length; i++) {
            if( comp.compare(vector[i], vector[pos])<0 )
                pos = i ;
        }

        // swap positions
        if( j!=pos ) {
            T temp = vector[pos] ;
            vector[pos] = vector[j] ;
            vector[j] = temp ;
        }
    }
}

Counting Sort

Sorting

Counting sort

Not applicable in general

Precondition (hypothesis for applicability):

The n elements to be sorted are integer numbers ranging from

1 to k, for some positive integer k

With this hypothesis, if k = O(n), then the algorithm has

complexity O(n), only!


Basic idea

Find, for each element x to be sorted, how many

elements are less than x

This information allows us to directly deposit x into its

final destination position.


Data structures

We need 3 vectors:

Starting vector : A[1..n]

Result vector : B[1..n]

Support vector : C[1..k]

Vector C keeps track of the number of elements in A that

have a certain value:

C[i] = how many elements in A have value i

The sum of the first i elements in C equals the number of

elements in A with value <= i.

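Pseudo-code

COUNTING-SORT(A, B, k)

1 for i ← 1 to k
2     C[i] ← 0
3 for j ← 1 to n
4     C[A[j]] ← C[A[j]] + 1
5 // C[i] now contains the number of elements equal to i
6 for i ← 2 to k
7     C[i] ← C[i] + C[i-1]
8 // C[i] now contains the number of elements <= i
9 for j ← n downto 1
10     B[ C[A[j]] ] ← A[j]
11     C[A[j]] ← C[A[j]] – 1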

Analysis

For each j, C[A[j]] is the number of elements <=A[j], and

also represents the final position of A[j] in B:

B[ C[A[j]] ] = A[j]

The corrective term C[A[j]] ← C[A[j]] – 1 handles the presence of duplicate items
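The data structures and the analysis above can be sketched in Java as follows. This is a minimal sketch under stated assumptions: the class name CountingSortSketch is hypothetical, the input values are assumed to lie in 1..k as required by the precondition, and Java arrays are 0-based, so the position C[A[j]] is shifted by one when writing into B.

```java
import java.util.Arrays;

class CountingSortSketch {

    // Sorts values in the range 1..k; returns a new sorted array (the vector B).
    static int[] sort(int[] a, int k) {
        int n = a.length;
        int[] c = new int[k + 1];      // c[0] is unused, so indices match values 1..k
        int[] b = new int[n];

        // c[i] = how many elements of a have value i
        for (int j = 0; j < n; j++) {
            c[a[j]]++;
        }
        // running sum: c[i] = how many elements of a have value <= i
        for (int i = 2; i <= k; i++) {
            c[i] += c[i - 1];
        }
        // scan a backwards, so that equal values keep their relative order
        for (int j = n - 1; j >= 0; j--) {
            b[c[a[j]] - 1] = a[j];     // -1 because Java arrays are 0-based
            c[a[j]]--;                 // corrective term for duplicate items
        }
        return b;
    }

    public static void main(String[] args) {
        int[] a = {3, 6, 4, 1, 3, 4, 1, 4};   // n = 8, k = 6
        System.out.println(Arrays.toString(sort(a, 6))); // prints [1, 1, 3, 3, 4, 4, 4, 6]
    }
}
```

The three loops visit n, k and n cells respectively (plus the O(k) allocation of the support vector), which matches the O(n+k) total.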

Example (n=8, k=6)

A = 3 6 4 1 3 4 1 4

C = 2 0 2 3 0 1     (number of elements in A with each value)

C = 2 2 4 7 7 8     (after the running sum)

Filling B, scanning A from j=8 down to j=1 (a dot marks a cell of B not filled yet):

j=8:  B = . . . . . . 4 .     C = 2 2 4 6 7 8
j=7:  B = . 1 . . . . 4 .     C = 1 2 4 6 7 8
j=6:  B = . 1 . . . 4 4 .     C = 1 2 4 5 7 8
j=5:  B = . 1 . 3 . 4 4 .     C = 1 2 3 5 7 8
j=4:  B = 1 1 . 3 . 4 4 .     C = 0 2 3 5 7 8
j=3:  B = 1 1 . 3 4 4 4 .     C = 0 2 3 4 7 8
j=2:  B = 1 1 . 3 4 4 4 6     C = 0 2 3 4 7 7
j=1:  B = 1 1 3 3 4 4 4 6     C = 0 2 2 4 7 7

Complexity

Lines 1-2: initialization of C: O(k)

Lines 3-4: computation of C: O(n)

Lines 6-7: running sum in C: O(k)

Lines 9-11: copy back to B: O(n)

Total complexity is therefore O(n+k).

The algorithm is useful only when k = O(n): in such a case, the overall complexity is O(n).

Merge Sort

Sorting

Merge Sort

The Merge Sort algorithm is a direct application of the Divide et Impera (divide and conquer) approach

Start:    6 12 4 5 2 9 5 12

Divide:   6 12 4 5 | 2 9 5 12

Solve:    4 5 6 12 | 2 5 9 12     (each half is sorted recursively)

Combine:  2 4 5 5 6 9 12 12

Merge Sort: Divide

The vector is simply partitioned into two sub-vectors, according to a splitting point

The splitting point is usually chosen at the middle of the vector

Before:  6 12 4 5 2 9 5 12        (p = 1, r = 8)

Divide:  6 12 4 5 | 2 9 5 12      (p = 1, q = 4, q+1 = 5, r = 8)

Merge Sort: Termination

Recursion terminates when the sub-vector:

Has only one element: p = r

Has no elements: p > r

Merge Sort: Combine

The combining step implies merging two sorted sub-vectors

Recursion guarantees that the sub-vectors are sorted

The merging approach compares the first element of each of the two vectors, and copies the lowest one

The result of the merging is saved in a different vector

Such an algorithm can be implemented in Θ(n) time.

Before:   4 5 6 12 | 2 5 9 12

Combine:  2 4 5 5 6 9 12 12

Pseudo-code

MERGE-SORT(A, p, r)

1 if p < r                            // Termination
2     then q ← ⌊(p+r)/2⌋              // Divide
3         MERGE-SORT(A, p, q)         // Solve
4         MERGE-SORT(A, q+1, r)       // Solve
5         MERGE(A, p, q, r)           // Combine

Note

We often use the following symbols:

⌊x⌋ = integer part of x, i.e. the largest integer not greater than x (floor function)

⌈x⌉ = the smallest integer not less than x (ceiling function)

Examples:

⌊3⌋ = ⌈3⌉ = 3

⌊3.1⌋ = 3; ⌈3.1⌉ = 4

The Merge procedure

MERGE(A, p, q, r)

1 i ← p ; j ← q+1 ; k ← 1
2 while( i ≤ q and j ≤ r )
3     if( A[i] < A[j] ) B[k] ← A[i] ; i ← i+1
4     else B[k] ← A[j] ; j ← j+1
5     k ← k+1
6 while( i ≤ q ) B[k] ← A[i] ; i ← i+1 ; k ← k+1
7 while( j ≤ r ) B[k] ← A[j] ; j ← j+1 ; k ← k+1
8 A[p..r] ← B[1..k-1]

Lines 2-5: at each iteration, the smallest number between the heads of the two vectors is copied to B

Lines 6-7: the «tail» of one of the vectors is emptied

Complexity: Θ(n).
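The MERGE-SORT and MERGE procedures can be sketched in Java as follows. This is a minimal sketch, not the code from the original slides: the class name MergeSortSketch is hypothetical, indices are 0-based, and the merge uses `<=` instead of `<` when comparing the heads, a deliberate change in this sketch so that ties are taken from the left run first.

```java
import java.util.Arrays;

class MergeSortSketch {

    // Sorts a[p..r] (both bounds inclusive, 0-based).
    static void mergeSort(int[] a, int p, int r) {
        if (p < r) {                       // termination: zero or one element
            int q = p + (r - p) / 2;       // floor of (p+r)/2, written to avoid overflow
            mergeSort(a, p, q);            // solve left half
            mergeSort(a, q + 1, r);        // solve right half
            merge(a, p, q, r);             // combine
        }
    }

    // Merges the sorted runs a[p..q] and a[q+1..r] through a temporary vector b.
    static void merge(int[] a, int p, int q, int r) {
        int[] b = new int[r - p + 1];
        int i = p, j = q + 1, k = 0;
        while (i <= q && j <= r) {
            // copy the smallest of the two heads; on ties take the left one
            if (a[i] <= a[j]) {
                b[k++] = a[i++];
            } else {
                b[k++] = a[j++];
            }
        }
        while (i <= q) b[k++] = a[i++];    // empty the tail of the left run
        while (j <= r) b[k++] = a[j++];    // empty the tail of the right run
        System.arraycopy(b, 0, a, p, k);   // copy back into a[p..r]
    }

    public static void main(String[] args) {
        int[] a = {6, 12, 4, 5, 2, 9, 5, 12};
        mergeSort(a, 0, a.length - 1);
        System.out.println(Arrays.toString(a)); // prints [2, 4, 5, 5, 6, 9, 12, 12]
    }
}
```

Taking the left element on ties does not matter for plain integers, but it is what would keep the algorithm stable when sorting objects by key.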

Complexity analysis

Termination: a simple test, Θ(1)

Divide (2): find the mid-point of the vector, D(n) = Θ(1)

Solve (3-4): solves 2 sub-problems of size n/2 each, 2T(n/2)

    One sub-problem has size ⌈n/2⌉, the other ⌊n/2⌋. This detail does not change the complexity result.

Combine (5): based on the Merge algorithm, C(n) = Θ(n).

Complexity

T(n) = Θ(1)               for n ≤ 1

T(n) = 2T(n/2) + Θ(n)     for n > 1

The solution (proof omitted…) is:

T(n) = Θ(n log n)

Intuitive understanding (n=16)

Sub-problem sizes at each level              Operations per level

16                                           1 x 16 = n
8 8                                          2 x 8 = n
4 4 4 4                                      4 x 4 = n
2 2 2 2 2 2 2 2                              8 x 2 = n
1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1              16 x 1 = n

Recursion levels: log2 n

Operations per level: n

Total operations: n log2 n
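The same count can be obtained by unrolling the recurrence. The derivation below is only a sketch under simplifying assumptions that are not in the slides: n is a power of 2, and the Θ(n) term is written as cn for some constant c > 0.

```latex
\begin{align*}
T(n) &= 2\,T(n/2) + cn \\
     &= 4\,T(n/4) + 2cn \\
     &= 8\,T(n/8) + 3cn \\
     &\;\;\vdots \\
     &= 2^{i}\,T(n/2^{i}) + i\,cn \\
     &= n\,T(1) + cn\log_2 n \qquad \text{(taking } i = \log_2 n\text{)} \\
     &= \Theta(n \log n)
\end{align*}
```

Each unrolling step adds one more cn term, one per recursion level, and the recursion stops after log2 n halvings.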

Warning

Not all recursive implementations have Θ(n log n) complexity.

For example, if merge sort is used with asymmetric partitioning (q=p+1), it degrades to an insertion sort, yielding Θ(n²).

Quicksort

Sorting

Collections.sort

Sorting

Sorting, in practice, in Java


A programmer’s motto says:

Use the system sort

i.e., the sorting algorithm already provided by your libraries

In other words, don’t re-implement your own sorting functions

The Collections framework provides:

public class Collections

This class consists exclusively of static methods that operate

on or return collections

public static <T extends Comparable<? super T>> void sort(List<T> list)

public static <T> void sort(List<T> list, Comparator<? super T> c)

Collections.sort(list)


Sorts the specified list into ascending order, according to

the natural ordering of its elements.

All elements in the list must implement

the Comparable interface.

Furthermore, all elements in the list must be mutually

comparable (that is, e1.compareTo(e2) must not throw

a ClassCastException for any elements e1 and e2 in the list).

This sort is guaranteed to be stable: equal elements will

not be reordered as a result of the sort.

The specified list must be modifiable, but need not be resizable.

http://docs.oracle.com/javase/7/docs/api/java/util/Collections.html#sort(java.util.List)
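The two Collections.sort overloads can be used as in the following minimal sketch. The class name CollectionsSortSketch, the helper method names and the sample data are hypothetical; only Collections.sort, Comparator and the standard collection classes come from the Java library.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

class CollectionsSortSketch {

    // Natural ordering: String implements Comparable<String>.
    static List<String> naturalOrder(List<String> names) {
        List<String> copy = new ArrayList<String>(names);   // modifiable copy
        Collections.sort(copy);
        return copy;
    }

    // Explicit ordering through a Comparator: by string length.
    // Strings of equal length keep their original relative order (stable sort).
    static List<String> byLength(List<String> names) {
        List<String> copy = new ArrayList<String>(names);
        Collections.sort(copy, new Comparator<String>() {
            @Override
            public int compare(String a, String b) {
                return Integer.compare(a.length(), b.length());
            }
        });
        return copy;
    }

    public static void main(String[] args) {
        List<String> names = Arrays.asList("Verdi", "Rossi", "Bianchi", "Neri");
        System.out.println(naturalOrder(names)); // prints [Bianchi, Neri, Rossi, Verdi]
        System.out.println(byLength(names));     // prints [Neri, Verdi, Rossi, Bianchi]
    }
}
```

In the second output, "Verdi" still precedes "Rossi" because both have length 5 and the sort does not reorder equal elements.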

Implementation of Collections.sort


This implementation is a stable, adaptive, iterative

mergesort that requires far fewer than n lg(n)

comparisons when the input array is partially sorted,

while offering the performance of a traditional mergesort

when the input array is randomly ordered.

If the input array is nearly sorted, the implementation

requires approximately n comparisons.

Temporary storage requirements vary from a small

constant for nearly sorted input arrays to n/2 object

references for randomly ordered input arrays.

http://docs.oracle.com/javase/7/docs/api/java/util/Collections.html#sort(java.util.List)

Resources

Algorithms in a Nutshell, by George T. Heineman, Gary Pollice, Stanley Selkow, O'Reilly Media

http://docs.oracle.com/javase/7/docs/api/java/lang/Comparable.html

http://www.sorting-algorithms.com/

License

These slides are distributed under the Creative Commons license "Attribution - NonCommercial - ShareAlike (CC BY-NC-SA)"

You are free:

    to copy, distribute, communicate to the public, display in public, represent, perform and recite this work

    to modify this work

Under the following conditions:

    Attribution: you must attribute the work to the original authors, in a way that does not suggest that they endorse you or your use of the work.

    NonCommercial: you may not use this work for commercial purposes.

    ShareAlike: if you alter or transform this work, or use it to create another one, you may distribute the resulting work only under a license identical or equivalent to this one.

http://creativecommons.org/licenses/by-nc-sa/3.0/