michael hanke introduction parallelsorting · 2017-11-09 · introduction michael hanke...

50
Introduction Michael Hanke Introduction Review of Sorting Algorithms Algorithms Based on Compare-And- Exchange Other Algorithms Parallel Sorting Michael Hanke School of Engineering Sciences Parallel Computations for Large-Scale Problems I Chapter 7, p. 1/50

Upload: others

Post on 31-Jan-2020

5 views

Category:

Documents


0 download

TRANSCRIPT

Introduction

MichaelHanke

Introduction

Review ofSortingAlgorithms

AlgorithmsBased onCompare-And-Exchange

OtherAlgorithms

Parallel Sorting

Michael Hanke

School of Engineering Sciences

Parallel Computations for Large-Scale Problems I

Chapter 7, p. 1/50

Introduction

MichaelHanke

Introduction

Review ofSortingAlgorithms

AlgorithmsBased onCompare-And-Exchange

OtherAlgorithms

Outline

1 Introduction

2 Review of Sorting Algorithms

3 Algorithms Based on Compare-And-Exchange

4 Other Algorithms

Chapter 7, p. 2/50

Introduction

MichaelHanke

Introduction

Review ofSortingAlgorithms

AlgorithmsBased onCompare-And-Exchange

OtherAlgorithms

The Problem

DefinitionLet r1, . . . , rN be a list of records where each record is identified by aunique key k1, . . . , kN . A sorting algorithm is an algorithm whichcomputes a permutation π such that, for the reordered listrπ(1), . . . , rπ(N), it holds kπ(1) ≤ kπ(2) ≤ · · · ≤ kπ(N) where “≤”denotes an order relation in the set of keys.

• The most often used order relations are numerical order andlexicographical order.

• The general definition includes also the case of sorting indecreasing order.

Chapter 7, p. 3/50

Introduction

MichaelHanke

Introduction

Review ofSortingAlgorithms

AlgorithmsBased onCompare-And-Exchange

OtherAlgorithms

Why Sorting?

• Information retrieval in large data sets• Optimization of many algorithms:

• graph algorithms• sparse matrix computations• irregular domains and grids

• Sorting is one of the basic tasks in computer science

Chapter 7, p. 4/50

Introduction

MichaelHanke

Introduction

Review ofSortingAlgorithms

AlgorithmsBased onCompare-And-Exchange

OtherAlgorithms

Parallel Sorting

Sorting is usually considered in extenso in introductory computerscience classes.

Why do it here again?So what is the challenge?

Chapter 7, p. 5/50

Introduction

MichaelHanke

Introduction

Review ofSortingAlgorithms

AlgorithmsBased onCompare-And-Exchange

OtherAlgorithms

Issues in Parallel Sorting

1 Where are input and output sequences stored?• Stored only in the memory of one processor;• distributed over all processors.

2 How are comparisions performed?• One element per process• More than one element per process

Chapter 7, p. 6/50

Introduction

MichaelHanke

Introduction

Review ofSortingAlgorithms

AlgorithmsBased onCompare-And-Exchange

OtherAlgorithms

Distributed Data

We need a generalization of the definition of sorting:• Assume the permutation π according to the sorting criterium begiven.

• Let p = 0, . . . ,P − 1 denote the processors. The datadistribution µ after the sorting algorithm fulfills:

1 µ is a linear data distribution of 1, . . . ,N,2 rπ(n) resides on processor p with (p, i) = µ(n).

With other words: After sorting, all records residing on processor pare less than or equal to all records residing on processor p + 1.

Chapter 7, p. 7/50

Introduction

MichaelHanke

Introduction

Review ofSortingAlgorithms

AlgorithmsBased onCompare-And-Exchange

OtherAlgorithms

Challenge

Obtain optimal (time) comlexity with a minimal number ofprocessors!

We will only consider internal sorting, that is, all data reside in themain memory.

Chapter 7, p. 8/50

Introduction

MichaelHanke

Introduction

Review ofSortingAlgorithms

AlgorithmsBased onCompare-And-Exchange

OtherAlgorithms

Repeat: Rank Sort

r = zeros(N,1);for i = 1:N

for j = 1:Nif a(j) < a(i)

r(i) = r(i)+1;end

endend

Properties:• On a shared memory machine with P = N processors, thespeedup is O(logN).

• On a distributed memory machine, there will be either a hugememory overhead or a large amount of communication.

Chapter 7, p. 9/50

Introduction

MichaelHanke

Introduction

Review ofSortingAlgorithms

AlgorithmsBased onCompare-And-Exchange

OtherAlgorithms

Classification

• Comparision-based algorithms: sorts a list by repeatedlycomparing pairs of elements and, if they are out of order,exchanging them (sorting by compare-and-exchange)

• Noncomparision-based algorithms: Use special a-priori knownproperties of the keys. (Example: The keys are integers from afixed interval.)

Chapter 7, p. 10/50

Introduction

MichaelHanke

Introduction

Review ofSortingAlgorithms

AlgorithmsBased onCompare-And-Exchange

OtherAlgorithms

Time Complexity

• The optimal complexity of comparision-based algorithms is

T ∗1 = O(N logN)

• The best theoretical parallel complexity for P = N processors is,therefore,

TN = O(logN)

(neglecting any communication)• There is an algorithm of this complexity, but the constanthidden in the Big-O notation is extremly large.

Chapter 7, p. 11/50

Introduction

MichaelHanke

Introduction

Review ofSortingAlgorithms

AlgorithmsBased onCompare-And-Exchange

OtherAlgorithms

Selected Sorting Algorithms

Method Best Average WorstRank sort O(N2) O(N2) O(N2)Bubble sort O(N) O(N2)Merge sort O(N logN) O(N logN) O(N logN)Insertion sort O(N) O(N2) O(N2)Quicksort O(N logN) O(N logN) O(N2)Heap sort O(N logN) O(N logN) O(N logN)

Chapter 7, p. 12/50

Introduction

MichaelHanke

Introduction

Review ofSortingAlgorithms

AlgorithmsBased onCompare-And-Exchange

OtherAlgorithms

Other Selected Sorting Algorithms

Method Best Average WorstBucket sort O(N) O(N) O(N2)Radix sort O(Nd/s) O(Nd/s) O(Nd/s)

d - key lengths - chunk size

Chapter 7, p. 13/50

Introduction

MichaelHanke

Introduction

Review ofSortingAlgorithms

AlgorithmsBased onCompare-And-Exchange

OtherAlgorithms

Basic Assumptions

• The records consist only of a key value.• The keys are a set of numbers.• The order relation is the natural order by magnitude.• Most often, the list is mapped to an array (vector), say a.• For a parallel version, the vector is linearly distributed over theprocessors 0, . . . ,P − 1.

Chapter 7, p. 14/50

Introduction

MichaelHanke

Introduction

Review ofSortingAlgorithms

AlgorithmsBased onCompare-And-Exchange

OtherAlgorithms

Compare And Exchange

• The basic operation for two data items A and B is

if A > Btemp = A;A = B;B = temp;

end

• What may be the strategy if A and B are stored on differentprocessors?

Chapter 7, p. 15/50

Introduction

MichaelHanke

Introduction

Review ofSortingAlgorithms

AlgorithmsBased onCompare-And-Exchange

OtherAlgorithms

Distributed Compare And Exchange

Processor 0

send(A, 1);receive(A, 1);

Processor 1

receive(A, 0);if A > B

send(B, 0);B = A;

elsesend(A, 0);

end

Chapter 7, p. 16/50

Introduction

MichaelHanke

Introduction

Review ofSortingAlgorithms

AlgorithmsBased onCompare-And-Exchange

OtherAlgorithms

Symmetric Distributed CompareAnd Exchange

• In the previous code snippet, one procesor is idle while the otheris doing the comparision

• In the spirit of SPMD programming, a more symmetriccompare-and-exchange implemenation would be desirable

Processor 0

send(A, 1);receive(B, 1);if A > B

A = B;end

Processor 1

receive(A, 0);send(B, 0);if A > B

B = A;end

• Even if the number of comparisions is doubled, the executiontime is identical to the first version

• In MPI, the send/receives can be conveniently combined intoone MPI_Sendrecv call

Chapter 7, p. 17/50

Introduction

MichaelHanke

Introduction

Review ofSortingAlgorithms

AlgorithmsBased onCompare-And-Exchange

OtherAlgorithms

Compare And Exchange For LinearData Distributions

• We will generalize the symmetric strategy to the case that N isa multiple of P

• Linear data distribution places N/P consecutive elements oneach processor

• Since finally the elements on each processor are sorted, we canassume that the first step of each sorting algorithm is localsorting

• Therefore, comparing elements on different processors amountsto a merging step (which has complexity (2N/P − 1)ta) followedby a split step

Chapter 7, p. 18/50

Introduction

MichaelHanke

Introduction

Review ofSortingAlgorithms

AlgorithmsBased onCompare-And-Exchange

OtherAlgorithms

Compare And Exchange: Algorithm

Processor 0

send(list0, 1);receive(list1, 1);list = merge(list0, list1);list0 = list(1:N/P);

Processor 1

receive(list0, 0);send(list1, 0);list = merge(list0, list1);list1 = list(N/P+1:end);

Note: This algorithm assumes implicitely that the results of themerge operations are identical on both processors.

AssumptionIn the following we will assume that each processor holds (at most)one element.

Chapter 7, p. 19/50

Introduction

MichaelHanke

Introduction

Review ofSortingAlgorithms

AlgorithmsBased onCompare-And-Exchange

OtherAlgorithms

Bubble Sort

• Idea: The largest number is moved to the end of the list by anumber of compare and exchanges. Then this step is repeatedfor the remaining list, and so on.

• Sequential algorithm (non-optimized!):

for i = N:-1:2for j = 1:i-1

if a(j) > a(j+1)temp = a(j);a(j) = a(j+1);a(j+1) = temp;

endend

end

Chapter 7, p. 20/50

Introduction

MichaelHanke

Introduction

Review ofSortingAlgorithms

AlgorithmsBased onCompare-And-Exchange

OtherAlgorithms

Bubble Sort: Properties

• Complexity is independent of the order of the elements• Time complexity O(N2)

• The algorithm is purely sequential• Straightforward parallelization idea: The bubbling action of thenext iteration of the inner loop could start before the precedingiteration has finished, so long as it does not overtake thepreceeding bubbling action

• This suggests that a pipeline implementation might be beneficial

Chapter 7, p. 21/50

Introduction

MichaelHanke

Introduction

Review ofSortingAlgorithms

AlgorithmsBased onCompare-And-Exchange

OtherAlgorithms

Bubble Sort: PipelinedComputation

We will not go into detail here!

Chapter 7, p. 22/50

Introduction

MichaelHanke

Introduction

Review ofSortingAlgorithms

AlgorithmsBased onCompare-And-Exchange

OtherAlgorithms

Parallel Bubble Sort: Odd-EvenSort

• Idea: Rearrange the comparisions in as many independentcomparisions as possible!

• Observation: Comparisions can be carried out in parallel if everyprocessor is involved in at most one compare-and-exchange.

• This can be achieved if the processors are grouped intoeven/odd pairs or odd/even pairs

Chapter 7, p. 23/50

Introduction

MichaelHanke

Introduction

Review ofSortingAlgorithms

AlgorithmsBased onCompare-And-Exchange

OtherAlgorithms

Odd-Even Sort: Algorithm

odd-even phase The odd processes p compare and exchange theirelements with the even processors p + 1

even-odd phase The even processes compare and exchange theirelements with the odd processors p + 1

Chapter 7, p. 24/50

Introduction

MichaelHanke

Introduction

Review ofSortingAlgorithms

AlgorithmsBased onCompare-And-Exchange

OtherAlgorithms

Odd-Even Sort (cont)

FactThe algorithm is guaranteed to terminate after N/2 odd-even andeven-odd steps.

Chapter 7, p. 25/50

Introduction

MichaelHanke

Introduction

Review ofSortingAlgorithms

AlgorithmsBased onCompare-And-Exchange

OtherAlgorithms

Odd-Even Sort: Implementation

p = 1, 3, . . . ,N − 1

% even phasesend(A, p-1);receive(B, p-1);if A < B

A = B;endif p <= N-3

% odd phasesend(A, p+1);receive(B, p+1);if A > B

A = B;end

end

p = 0, 2, . . . ,N − 2

% even phasereceive(A, p+1);send(B, p+1);if A < B

B = A;endif p >= 2

% odd phasereceive(A, p-1);send(B, p-1);if A > B

A = B;end

end

Chapter 7, p. 26/50

Introduction

MichaelHanke

Introduction

Review ofSortingAlgorithms

AlgorithmsBased onCompare-And-Exchange

OtherAlgorithms

Odd-Even Sort: Complexity

Assume linear data distribution:1 Initial local sort: O(N/P log(N/P))2 During the global sorting phase, there are P/2 iterations3 Each iteration contains:

1 4 send/receives of length N/P2 2 merges of length 2N/P

TP =P2

(4(tstartup +

NPtdata

)+ 2(2N/P − 1)ta

)+O(N/P log(N/P))

Neglecting communication costs, for P = N we obtain

TP = O(P), SP = O(logP)

Chapter 7, p. 27/50

Introduction

MichaelHanke

Introduction

Review ofSortingAlgorithms

AlgorithmsBased onCompare-And-Exchange

OtherAlgorithms

Merge Sort: Idea

• Fact: Two ordered lists of length n can be merged into oneordered list of length 2n with 2n − 1 comparisions

• A list of length n = 1 is ordered by definition• This suggests a divide-and conquer strategy: Apply the followingalgorithm mergesort recursively:

• If n > 1, divide the list in two halfs and call mergesort for bothsublists

• Merge the two sublists into one ordered list and return that list• If n = 1, return the list

Chapter 7, p. 28/50

Introduction

MichaelHanke

Introduction

Review ofSortingAlgorithms

AlgorithmsBased onCompare-And-Exchange

OtherAlgorithms

Merge Sort: Properties

• Complexity is independent of the order of the elements• Time complexity is O(N logN), which is optimal forcomparision-bases sorting algorithms

• The divide-and-conquer approach is immediately parallelizable:• The data (list) distribution step amounts to a scatter operation

(if the data are not already distributed in this way)• The merge step can be done along the ideas of recursive doubling

Chapter 7, p. 29/50

Introduction

MichaelHanke

Introduction

Review ofSortingAlgorithms

AlgorithmsBased onCompare-And-Exchange

OtherAlgorithms

Merge Sort: Implementation

• Assume the data already distributed• Assume for simplicity N = P = 2D being a power of 2• Implementation:

list = A;for d = 0:D-1

q = bitflip(p, d);send(list, q);receive(listq, q);list = merge(list, listq);

end

• Note: After completion, every processor holds the complete list!

Chapter 7, p. 30/50

Introduction

MichaelHanke

Introduction

Review ofSortingAlgorithms

AlgorithmsBased onCompare-And-Exchange

OtherAlgorithms

Merge Sort: Performance Analysis• For each d , the number of data exchanged is 2d

• Communication time

tcomm =D−1∑d=0

(tstartup + 2d tdata)

≈ Dtstartup + 2Dtdata = logNtstartup + Ntdata

• For each d , two lists of length 2d are merged• Computation time

tcomp =D−1∑d=0

(2 · 2d − 1)ta ≈ Nta

• Hence,TP = O(P), SP = O(logP)

Chapter 7, p. 31/50

Introduction

MichaelHanke

Introduction

Review ofSortingAlgorithms

AlgorithmsBased onCompare-And-Exchange

OtherAlgorithms

Quicksort

• In avarage, quicksort has a sequential time complexity ofO(N logN)

• Quicksort is another example of a divide-and-conquer algorithm

QuestionCan quicksort be a basis for a good parallel algorithm?

Chapter 7, p. 32/50

Introduction

MichaelHanke

Introduction

Review ofSortingAlgorithms

AlgorithmsBased onCompare-And-Exchange

OtherAlgorithms

Quicksort: Algorithm

1 Choose a pivot element from the list2 Partition the list into two sublists such that one list contains all

elements less than the pivot, while the remaining elements formanother list

3 Apply quicksort recursively to these two sublists

The divide-and-conquer strategy is similar to mergesort.

Chapter 7, p. 33/50

Introduction

MichaelHanke

Introduction

Review ofSortingAlgorithms

AlgorithmsBased onCompare-And-Exchange

OtherAlgorithms

Quicksort: Problems

• In contrast to mergesort, the data distribution is not known inadvance

• The initial partition must be done on one processor, only, whichwill seriously limit the speed

• The tree with subproblems may become heavily unbalanced, i.e.,the sublist may vary considerably in length

• The worst case behavior is O(N2) even in the parallel case

Chapter 7, p. 34/50

Introduction

MichaelHanke

Introduction

Review ofSortingAlgorithms

AlgorithmsBased onCompare-And-Exchange

OtherAlgorithms

Bitonic Mergesort

QuestionCan we do better?

DefinitionA sequence a1, a2, . . . , aN is called bitonic if either

1 there exists an index i such that a1 < · · · < ai andai > · · · > an, or

2 there is a cyclic shift of indices such that (1) holds.

Chapter 7, p. 35/50

Introduction

MichaelHanke

Introduction

Review ofSortingAlgorithms

AlgorithmsBased onCompare-And-Exchange

OtherAlgorithms

Bitonic Split• Let a1, . . . , aN be a bitonic sequence with N = 2n. Consider:

function [al,ar] = splitincr(a)n = length(a)/2;for i = 0:n-1

if a(i) > a(i+n)al(i) = a(i+n);ar(i) = a(i);

elseal(i) = a(i);ar(i) = a(i+n);

endend

Properties of the transformed sequence

• al and ar are two bitonic sequences;• all elements of the first sequence are smaller than those of thesecond sequence

Chapter 7, p. 36/50

Introduction

MichaelHanke

Introduction

Review ofSortingAlgorithms

AlgorithmsBased onCompare-And-Exchange

OtherAlgorithms

Bitonic Split: Decreasing

function [al,ar] = splitdecr(a)n = length(a)/2;for i = 0:n-1

if a(i) < a(i+n)al(i) = a(i+n);ar(i) = a(i);

elseal(i) = a(i);ar(i) = a(i+n);

endend

Chapter 7, p. 37/50

Introduction

MichaelHanke

Introduction

Review ofSortingAlgorithms

AlgorithmsBased onCompare-And-Exchange

OtherAlgorithms

Sorting a Bitonic Sequence

Chapter 7, p. 38/50

Introduction

MichaelHanke

Introduction

Review ofSortingAlgorithms

AlgorithmsBased onCompare-And-Exchange

OtherAlgorithms

Sorting a Bitonic Sequence (cont)

function a = bisort(a,dir)if length(a) == 1

returnendswitch dircase incr:

[al,ar] = splitincr(a);case decr:

[al,ar] = splitdecr(a);otherwise:

error(’Ooops’)enda1 = bisort(al,dir);a2 = bisort(ar,dir);a = [a1,a2];

Chapter 7, p. 39/50

Introduction

MichaelHanke

Introduction

Review ofSortingAlgorithms

AlgorithmsBased onCompare-And-Exchange

OtherAlgorithms

Bitonic Sort

• A sequence containing two elements is bitonic• Every bitonic sequence can be sorted by the algorithm givenbefore in increasing or decreasing order

• Joining two neighboring sequences where the first is increasingwhile the second is decreasing provides a bitonic sequence

Chapter 7, p. 40/50

Introduction

MichaelHanke

Introduction

Review ofSortingAlgorithms

AlgorithmsBased onCompare-And-Exchange

OtherAlgorithms

Bitonic Sort (cont)

Chapter 7, p. 41/50

Introduction

MichaelHanke

Introduction

Review ofSortingAlgorithms

AlgorithmsBased onCompare-And-Exchange

OtherAlgorithms

Bitonic Sort: Implementation

function a = bitonicsort(a,dir)n = length(a);if n == 1

returnenda1 = bitonicsort(a(0:n/2-1),incr);a2 = bitonicsort(a(n/2:n-1),decr);a = bisort([a1,a2],dir);

Chapter 7, p. 42/50

Introduction

MichaelHanke

Introduction

Review ofSortingAlgorithms

AlgorithmsBased onCompare-And-Exchange

OtherAlgorithms

Bitonic Sort: Complexity

FactBitonic Sort implemented on P = N = 2D processors has a timecomplexity of O(log2 N)!

• Bitonic sort can be mapped efficiently to mesh and hypercubeprocessor topologies.

Chapter 7, p. 43/50

Introduction

MichaelHanke

Introduction

Review ofSortingAlgorithms

AlgorithmsBased onCompare-And-Exchange

OtherAlgorithms

Bucket Sort

AssumptionThe keys are uniformly distributed on the interval [a, b].

• Idea: For P processors, subdivide the interval [a, b] into P equalchunks, Ip = [xp, xp+1) where xp = a+ ph, h = (b − a)/P

• Each processor scans its local list and determines where to sendthe elements to

• After having received all elements, sort the local lists

Chapter 7, p. 44/50

Introduction

MichaelHanke

Introduction

Review ofSortingAlgorithms

AlgorithmsBased onCompare-And-Exchange

OtherAlgorithms

Bucket Sort: Performance Analysis

• Computation: Each processor needs to perform N/Pcomparisions

• The local sort has O(N/P log(N/P)) comparisions. Hence,

tcomp = O(N/P log(N/P))ta

• Invoking the uniform distribution assumption, P − 1communication steps (all-to-all) with a message lengthO(N/P2),

tcomm = (P − 1)(tstartup + N/P2tdata)

Chapter 7, p. 45/50

Introduction

MichaelHanke

Introduction

Review ofSortingAlgorithms

AlgorithmsBased onCompare-And-Exchange

OtherAlgorithms

Sample Sort

• Bucket sort does not work efficiently if the interval is unknownor if the assumption of uniform distribution is not fullfilled

• Can one find a subdivision of the real axis x0 < x1 < · · · < xPsuch that the number of elements falling in each subinterval isroughly constant?

• This subdivision is data-dependent and must be generated

Chapter 7, p. 46/50

Introduction

MichaelHanke

Introduction

Review ofSortingAlgorithms

AlgorithmsBased onCompare-And-Exchange

OtherAlgorithms

Sample Sort: Idea

• A sample of size s is selected from the sequence and sorted• Select P − 1 elements x1 < · · · < xP−1 (called splitters)• Set x0 = −∞ and xP = +∞• Define the buckets by Ip = [xp, xp+1)

Chapter 7, p. 47/50

Introduction

MichaelHanke

Introduction

Review ofSortingAlgorithms

AlgorithmsBased onCompare-And-Exchange

OtherAlgorithms

Sample Sort: Complexity

TP = O(N/P log(N/P) local sort

+ O(P2 logP) sort sample+ O(P log(N/P)) block partition+ O(N/P) + O(P logP) communication

Here, we used a sample of P − 1 elements per process, s = P(P − 1).

Chapter 7, p. 48/50

Introduction

MichaelHanke

Introduction

Review ofSortingAlgorithms

AlgorithmsBased onCompare-And-Exchange

OtherAlgorithms

Radix Sort

AssumptionThe keys are positive integers

• Then every key k has a unique representation in a β-adicposition system,

k = a0β0 + a1β

1 + · · ·+ anβn

• Assume that n is the maximal exponent appearing• Idea: Sort the sequence by first sorting the least significant digit(a0), then a1, and so on until the most significant digit (an)

Chapter 7, p. 49/50

Introduction

MichaelHanke

Introduction

Review ofSortingAlgorithms

AlgorithmsBased onCompare-And-Exchange

OtherAlgorithms

Radix Sort: Complexity

Let β = 2r :TP = βn(O(logN) + O(N))

The O(N)-term belongs to the communication part.

Chapter 7, p. 50/50