sorting. quick sort example 2, 1, 3, 4, 5 2, 4, 3, 1, 5, 5, 6, 9, 7, 8, 5 s={6, 5, 9, 2, 4, 3, 5, 1,...
TRANSCRIPT
Sorting
Quick SortExample
2, 1, 3, 4, 5
2, 4, 3, 1, 5, 5, 6, 9, 7, 8, 5
S={6, 5, 9, 2, 4, 3, 5, 1, 7, 5, 8}
1, 2 3 4, 5 5 5, 6, 7, 8, 9
Quick Sort
Step 1. If n = 1 then return.
Step 2. Find the median m of the input array A.
Step 3. Use m to partition A into two
subsequences B and C.
Step 4. Quick sort B.
Step 5. Quick sort C.
O(1)O(n)
O(n)
t(n/2)
The time complexity of Quick sort
t(n)= O(1) + O(n) + O(n) + 2t(n/2) = cn + 2t(n/2) = O(n log n)
Merging Networks
(1, 1)-merger (comparator):
Sorting Networks
Merging Networks
(2, 2)-merger:
Sorting Networks
Running Time
s(2) = 1, i = 1
s(2i) = s(2i-1) + 1, i > 1.
s(2i): the time required in the ith stage.
There are log n stages in all.
s(2i) = i
t(n) = s(21) + s(22) + … + s(2log n) = 1 + 2 + … + log n = O(log2 n)
Number of Processors
q(2) = 1, i = 1
q(2i) = 2q(2i-1) + 2i-1 - 1, i > 1.
q(2i): the number of comparators required in the ith stage.
q(2i) = (i-1)2i-1 + 1
q(2i) = 2q(2i-1) + 2i-1 – 1
= 22q(2i-2) + 2i-1 – 2 + 2i-1 – 1
=23q(2i-3) + 2i-1 – 22 + 2i-1 – 21 + 2i-1 – 20
… = 2i-1q(21) + 2i-1 + … + 2i-1
– (1 + 2 + … + 2i-2) = 2i-1 + (i – 1)2i-1 – 2i-1 + 1
= (i – 1)2i-1 + 1
p(n) = 2(logn)-1q(21) + 2(logn)-2q(22) + … + 20q(2log n) = O(nlog2 n)
Cost
c(n) = p(n) * t(n)
= O(nlog2 n) * O(log2 n)
= O(nlog4 n)
Sorting on a linear array (odd-even transposition)
Procedure ODD-EVEN TRANSPOSITION (S) for j = 1 to ┌n/2┐ do (1) for i = 1, 3, …, 2└n/2┘ - 1 do in parallel if xi > xi+1
then xi ←→ xi+1
end if end for (2) for i = 2, 4, …, 2└(n-1)/2┘ do in parallel if xi > xi+1
then xi ←→ xi+1
end if end for end for
Time:
2
n odd-even steps.
ntnp * = O(2n ) Cost:
2 ways for reducing cost:(i) reduce running time(ii) reduce # of processors
Reducing running time is hopeless since the lower bound for sorting on a linear array of n processors is . n
Sorting on a linear array (odd-even transposition)
{10, 11, 12}
{1, 2, 3} {7, 8, 9}{4, 5, 6}
{9, 11, 12}
{1, 2, 3} {7, 8, 9}{4, 5, 6}
{9, 11, 12}
{1, 2, 5} {7, 8 10}{3, 4, 6}
{4, 9, 11}
{1, 2, 5} {3, 4, 6}{7, 8, 10}
{11, 4, 9}
{2, 5, 8} {3, 6, 12}{1, 7, 10}
{10, 11, 12}
{8, 2, 5} {3, 12, 6}{10, 1, 7}
(2.1)
1
(2.2)
(2.1)
(2.2)
AFTER STEP
INITIALLY
p 1 p 2 p 3 p 4
N processors are available, N < n.Each processor stores data elements.
nN
Stage 1: Sort sequentially in each processor.Stage 2: Odd-even transposition sort. Each comparison-exchange is replaced with a merge-split.
O((n/N)log(n/N))
┌ N/2┐O(n/N)用 sequential merge, 每次 O(n/N) 的時間 ,共有 N/2 個 steps, 所以總時間為
nn
N
nO
N
nO
N
N
n
N
nO log
2)log(
C(n) = p(n)*t(n) = )log( nNnnO
The algorithm is cost optimal when
nN log .
CRCW Sort
5 2 4 5
5 2 4 5
2 0 1 3
Procedure CRCW SORT(S) Step 1. for i = 1 to n do in parallel for j = 1 to n do in parallel if (si > sj) or (si = sj and i > j) then P(i, j) writes 1 in ci
else P(i, j) writes 0 in ci
end if end for end for Step 2. for i = 1 to n do in parallel P(i, 1) stores si in position 1 + ci of S end for
O(1)
O(1)
p(n) = n2
t(n) = O(1)
c(n) = n2
CREW Sort( 利用 CREW MERGE)
S={2, 8, 5, 10, 15, 1, 12, 6, 14, 3, 11, 7, 9, 4, 13, 16}N = 4
Step 1. P1: {2, 8, 5, 10} P2: {15, 1, 12, 6} P3: {14, 3, 11, 7} P4: {9, 4, 13, 16}
Step 2. P1: {2, 5, 8, 10} P2: {1, 6, 12, 15} P3: {3, 7, 11, 14} P4: {4, 9, 13, 16}
Step 3. P1, P2 : {1, 2, 5, 6, 8, 10, 12, 15} P3 , P4 : {3, 4, 7, 9, 11, 13, 14, 16}
P1, P2 , P3 , P4 : {1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16}
Procedure CREW SORT(S)
Step 1. for i = 1 to N do in parallel
Processor Pi
1.1 reads a distinct subsequence Si of S of size n/N
1.2 QUICKSORT(Si)
1.3 S(i, 1) ← Si
1.4 P(i, 1)← {Pi}
O(n/N log n/N)
Step 2. u ← 1 v ← N while v > 1 do
for m = 1 to └v/2┘ do in parallel P(u+1, m) ← P(u, 2m-1) ∪ P(u, 2m) The processors in the set P(u+1, m) perform CREW MERGE(S(u, 2m-1), S(u, 2m), S(u+1, m)) end for if v is odd then P(u+1, [v/2]) ← P(u, v) S(u+1, [v/2]) ← S(u, v) end if u ← u + 1 v ← [v/2] end while
每次 O(n/N + log n)共做 [log N] 次 ,總共所花時間為[log N] * O(n/N + log n)
t(n) = O((n/N) log(n/N)) + O((n/N) + log n) *log N = O((n/N) log n – (n/N) log N) + O((n/N) log N + log n log N) = O((n/N) log n + log n log N)
c(n) = p(n) * t(n) = N * O((n/N) log n + log n log N) = O(n log n + N log n log N)
The algorithm is cost optimal when N n/log N.≦
Sorting on the EREW Model
Simulating Procedure CREW Sort:用 MULTIPLE BROADCAST 來取代Concurrent Read.
多花 log N 的時間 .
t(n) = O((n/N)log n + log n log N) * O(log N) = O([(n/N) + log N] log n log N)
c(n) = O([(n/N) + log N] log n log N) * N = O((n +N log N) log n log N) which is not cost optimal.
Sorting by Conflict-Free Merging
用 EREW MERGE 取代 CREW MERGE.
每次 EREW MERGE 所花的時間為O((n/N) + log n log N).
要做 log N 次 ,
所以總共所花時間為t(n) = O((n/N) log (n/N)) + O((n/N) + log n log N) * log N
= O([(n/N) + log2N] log n)
c(n) = O((n + N log2N) log n)
which is cost optimal when N n/log≦ 2N.
Parallel Quicksort
S={5, 9, 12, 16, 18, 2, 10, 13, 17, 4, 7, 18, 18,1 1, 3, 17, 20, 19, 14, 8, 5, 17, 1, 11, 15, 10, 6}
n = 27 N = 5N = n 1-x = 27 1-x
x 0.5≒
將 n 個 data 分成 21/x 塊 , 每塊有 n/21/x 個 data.
原因稍後解釋
21/x = 21/0.5 = 4 n/21/x = 27/4 7 ≒
所以 PARALLEL SELECT 第 7, 14, 21 大的數字 ,分別另為 m1, m2, 和 m3.m1 = 6, m2 = 11, m3 =17.
S1 = {5, 2, 4, 3, 5, 1, 6}S2 = {9, 10, 7, 8, 10, 11, 11}S3 = {12, 16, 13, 14, 15, 17, 17}S4 = {18, 18, 18, 20, 19, 17}
m1 = 6, m2 = 11, m3 = 17
n = 7 N = n 1-x = 7 1-0.5 2≒
P1, P2:S1 = {5, 2, 4, 3, 5, 1, 6}
P3, P4:S2 = {9, 10, 7, 8, 10, 11, 11}
m1 = 2, m2 = 4, m3 = 5 m1 = 8, m2 = 10, m3 = 11
將 n 個 data 分成 21/x 塊 , 每塊有 n/21/x 個 data.
7 4 2
P1, P2:S = {5, 2, 4, 3, 5, 1, 6}
P3, P4:S = {9, 10, 7, 8, 10, 11, 11}
m1 = 2, m2 = 4, m3 = 5 m1 = 8, m2 = 10, m3 = 11
S1 = {1, 2}S2 = {3, 4}S3 = {5, 5}S4 = {6}
S1 = {7, 8}S2 = {9, 10}S3 = {10, 11}S4 = {11}
S1 = {5, 2, 4, 3, 5, 1, 6}S2 = {9, 10, 7, 8, 10, 11, 11}S3 = {12, 16, 13, 14, 15, 17, 17}S4 = {18, 18, 18, 20, 19, 17}
m1 = 6, m2 = 11, m3 = 17
n = 7 N = n 1-x = 7 1-0.5 2≒
P1, P2:S3 = {12, 16, 13, 14, 15, 17, 17}
P3, P4:S4 = {18, 18, 18, 20, 19, 17}
m1 = 13, m2 = 15, m3 = 17 m1 = 18, m2 = 18, m3 = 20
將 n 個 data 分成 21/x 塊 , 每塊有 n/21/x 個 data.
7 4 2
P1, P2:S = {12, 16, 13, 14, 15, 17, 17}
P3, P4:S = {18, 18, 18, 20, 19, 17}
m1 = 13, m2 = 15, m3 = 17 m1 = 18, m2 = 18, m3 = 20
S1 = {12, 13}S2 = {14, 15}S3 = {16, 17}S4 = {17}
S1 = {17, 18}S2 = {18, 18}S3 = {19, 20}S4 = { }
procedure EREW SORT (S) if kS
then QUICKSORT (S)else (1) for i=1 to k-1 do PARALLEL SELECT (S,
k
Si) {Obtain im }
11 : msSsS
(3) for i=2 to k-1 do
iii msmSsS 1:
end for
1: kk msSsS(5) for i=1 to k/2 do in parallel EREW SORT end for
iS
end for
(2)
(4)
(6) for 12 ki to k do in parallel
Si
end for
EREW SORT
end if
xk1
2xnN 1 : number of processors
m 1 m 2 m iS
n/2 1/x n/2 1/x n/2 1/x n/2 1/x
S 1 S kS 2 S j
,
Why 2
k?
n elements use xn 1
processors.
x
n1
2 elements use
222
1
11
1x
xx
x
nn
processors.
222
1
11
x
xx nk
n
time: nnOkntcnnt xx log2
which is cost optimal.
c(n) = p(n)*t(n) = n1-x * nx log n = n logn