
Posted on 20-Dec-2015


Parallel Prefix and Data Parallel Operations

Motivation: basic parallel operations that occur repeatedly. Let $\otimes$ be an associative operation:

$(a_1 \otimes a_2) \otimes a_3 = a_1 \otimes (a_2 \otimes a_3)$

How to compute $a_1 \otimes a_2 \otimes \cdots \otimes a_n$ in parallel in $O(\log n)$ time?


Approach 1

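        a0    a1    a2    a3    a4    a5    a6    a7
d=1   [0:0] [0:1] [1:2] [2:3] [3:4] [4:5] [5:6] [6:7]
d=2   [0:0] [0:1] [0:2] [0:3] [1:4] [2:5] [3:6] [4:7]
d=4   [0:0] [0:1] [0:2] [0:3] [0:4] [0:5] [0:6] [0:7]

([i:j] denotes $a_i \otimes \cdots \otimes a_j$; row d shows the array after the step with distance d.)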

Assume that $n = 2^k$ (the "+" below stands for $\otimes$):

for i = 0 to k-1
    for j = 0 to n-1-2^i do in parallel
        x[j + 2^i] = x[j] + x[j + 2^i]      // all reads of a step happen before its writes
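A minimal Python sketch of Approach 1, assuming one synchronous parallel step is simulated by copying the array before the writes. The names `hillis_steele_scan` and `op` are illustrative, not from the slides, and the loop also works when n is not a power of two.

```python
from typing import Callable, List, TypeVar

T = TypeVar("T")

def hillis_steele_scan(x: List[T], op: Callable[[T, T], T]) -> List[T]:
    """Inclusive prefix of x under an associative op, by recursive doubling."""
    x = list(x)
    n = len(x)
    d = 1  # d = 2^i in the slide's notation
    while d < n:
        old = x[:]  # every processor reads before anyone writes
        for j in range(n - d):  # "for j = 0 to n-1-2^i do in parallel"
            # old[j] covers earlier elements, so it is the left operand
            x[j + d] = op(old[j], old[j + d])
        d *= 2
    return x
```

With addition on [1, 2, ..., 8] this yields [1, 3, 6, 10, 15, 21, 28, 36]; passing string concatenation instead shows that the left-to-right order is preserved, which matters when $\otimes$ is not commutative.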


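How to do it on a tree architecture?

For each node:
  if there is a signal from left and right, St <- Sl + Sr        (upward pass)
  if there is a signal R, send R to the left child and R + Sl to the right child        (downward pass; the root starts with R = 0)
  if the node is a leaf and there is a signal R, X <- R + X

(Figure: a tree node with left-subtree sum Sl, right-subtree sum Sr, its total St passed upward, and the signal R received from its parent.)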
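A sequential Python simulation of the two passes, assuming addition as the operation and a heap-ordered array for the complete binary tree (the layout and the name `tree_prefix` are illustrative). Each loop iteration models one node's action; on tree hardware all nodes of a level act at once, so each pass takes O(log n) steps.

```python
from typing import List

def tree_prefix(x: List[int]) -> List[int]:
    """Prefix sums on a complete binary tree whose leaves hold x."""
    n = len(x)
    assert n > 0 and (n & (n - 1)) == 0, "this sketch assumes n is a power of two"
    # Heap layout: node v has children 2v and 2v+1; leaves are nodes n .. 2n-1.
    S = [0] * (2 * n)
    S[n:] = x
    # Upward pass: St <- Sl + Sr, children are computed before parents.
    for v in range(n - 1, 0, -1):
        S[v] = S[2 * v] + S[2 * v + 1]
    # Downward pass: R[v] = sum of all leaves to the left of v's subtree.
    R = [0] * (2 * n)  # the root receives R = 0
    for v in range(1, n):
        R[2 * v] = R[v]                 # left child: same R
        R[2 * v + 1] = R[v] + S[2 * v]  # right child: R + Sl
    # Leaves: X <- R + X
    return [R[n + i] + x[i] for i in range(n)]
```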


How to do it on a hypercube

A complete binary tree can be embedded into a hypercube. Simpler solution: each node j computes both its prefix x[j] and a running total sum[j] (initially x[j] = sum[j] = a_j):

for i = 0 to k-1
    for j = 0 to n-1 do in parallel
        x[j] = x[j] + sum[j_i]      if the i-th bit of j is 1
        sum[j] = sum[j] + sum[j_i]

where j_i has the same binary representation as j except for the i-th bit, which is complemented (j_i = j XOR 2^i); nodes j and j_i are therefore hypercube neighbors.
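A Python simulation of this hypercube scheme, assuming n = 2^k nodes and `+` as the operation (the name `hypercube_prefix` is illustrative). The slide writes x[j] + sum[j_i]; the sketch puts the partner's value on the left whenever the partner holds the earlier block, which is the same for addition but keeps the order correct for non-commutative operations such as string concatenation.

```python
def hypercube_prefix(a):
    """Return (x, s): the prefix and running total held by each node after k steps."""
    n = len(a)
    k = n.bit_length() - 1
    assert n == 1 << k, "n must be a power of two"
    x = list(a)  # prefix held by node j
    s = list(a)  # running total held by node j
    for i in range(k):
        old = s[:]  # neighbours exchange totals before anyone updates
        for j in range(n):
            p = j ^ (1 << i)  # neighbour j_i: j with bit i complemented
            if (j >> i) & 1:
                # partner holds the earlier block
                x[j] = old[p] + x[j]
                s[j] = old[p] + old[j]
            else:
                # partner holds the later block
                s[j] = old[j] + old[p]
    return x, s
```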


Prefix on Hypercube

Example for n = 8 (k = 3), starting from a0 ... a7 with X[j] = SUM[j] = a_j; [i:j] denotes $a_i \otimes \cdots \otimes a_j$.

node         0      1      2      3      4      5      6      7
d=1  X     [0:0]  [0:1]  [2:2]  [2:3]  [4:4]  [4:5]  [6:6]  [6:7]
     SUM   [0:1]  [0:1]  [2:3]  [2:3]  [4:5]  [4:5]  [6:7]  [6:7]
d=2  X     [0:0]  [0:1]  [0:2]  [0:3]  [4:4]  [4:5]  [4:6]  [4:7]
     SUM   [0:3]  [0:3]  [0:3]  [0:3]  [4:7]  [4:7]  [4:7]  [4:7]
d=4  X     [0:0]  [0:1]  [0:2]  [0:3]  [0:4]  [0:5]  [0:6]  [0:7]
     SUM   [0:7]  [0:7]  [0:7]  [0:7]  [0:7]  [0:7]  [0:7]  [0:7]


Applications of Data Parallel Operations

Any associative operation, for example:

– min, max, add
– adding two binary numbers
– finite state automata
– radix sort
– segmented prefix sum
– routing
  • packing
  • unpacking
  • broadcast (copy-scan)
– solving recurrence equations
– straight line computation (parallel arithmetic evaluation)


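Adding two n-bit numbers as parallel prefix

• $a = a_{n-1} \ldots a_0$
• $b = b_{n-1} \ldots b_0$
• $s = a + b$
• note that $s_i = a_i \oplus b_i \oplus c_{i-1}$ (with $c_{-1} = 0$)
• to compute $c_i$, define the generate bit $g$ and the propagate bit $p$ as:

$g_i = a_i \wedge b_i, \quad p_i = a_i \vee b_i$    ($p_i = a_i \oplus b_i$ gives the same carries)

• define $\odot$ as: $(g, p) \odot (g', p') = (g \vee (p \wedge g'),\ p \wedge p')$

Then the carry bit $c_i$ can be computed by

$(G_i, P_i) = (g_i, p_i) \odot (g_{i-1}, p_{i-1}) \odot \cdots \odot (g_0, p_0)$

and $G_i = c_i$.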
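A Python sketch of the carry computation as a prefix over (g, p) pairs, assuming $p_i = a_i \vee b_i$ as above. The names `combine`, `scan` and `add_via_prefix` are illustrative. The scan runs from bit 0 upward, so the more significant (later) pair is placed on the left of $\odot$.

```python
def combine(hi, lo):
    """(g, p) ⊙ (g', p') = (g OR (p AND g'), p AND p'); hi is the more significant pair."""
    g, p = hi
    g2, p2 = lo
    return (g | (p & g2), p & p2)

def scan(xs, op):
    """Inclusive prefix by recursive doubling; op(earlier, later) must be associative."""
    xs = list(xs)
    d = 1
    while d < len(xs):
        old = xs[:]  # synchronous step: read everything, then write
        for j in range(d, len(xs)):
            xs[j] = op(old[j - d], old[j])
        d *= 2
    return xs

def add_via_prefix(a, b, n):
    """Add two n-bit unsigned integers using carries obtained by parallel prefix."""
    bits_a = [(a >> i) & 1 for i in range(n)]
    bits_b = [(b >> i) & 1 for i in range(n)]
    gp = [(x & y, x | y) for x, y in zip(bits_a, bits_b)]  # (g_i, p_i), bit 0 first
    # (G_i, P_i) = (g_i, p_i) ⊙ ... ⊙ (g_0, p_0): the higher index goes on the left
    GP = scan(gp, lambda earlier, later: combine(later, earlier))
    carries = [G for G, _ in GP]  # c_i = G_i
    s = 0
    for i in range(n):
        c_prev = carries[i - 1] if i > 0 else 0
        s |= (bits_a[i] ^ bits_b[i] ^ c_prev) << i  # s_i = a_i XOR b_i XOR c_{i-1}
    if n > 0:
        s |= carries[n - 1] << n  # carry out of the top bit
    return s
```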


Hardware circuit of recursive look-ahead adder

(Figure: recursive look-ahead adder circuit with inputs a0 ... a15 and b0 ... b15.)


Parsing a regular language

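Input string: b c c b c. States: q0 (start), q1, q2, and the reject state qr (qr goes to qr on every symbol).

Transition function:
δ(q0,b) = q2, δ(q0,c) = q1
δ(q1,b) = q0, δ(q1,c) = qr
δ(q2,b) = qr, δ(q2,c) = q0

Each symbol is a mapping of the states (q0, q1, q2):
b: (q0, q1, q2) -> (q2, q0, qr)
c: (q0, q1, q2) -> (q1, qr, q0)

Composition of such mappings is associative, so the state reached after every prefix of the input is a parallel prefix over the per-symbol mappings, for example:
bc:   (q0, q1, qr)
cb:   (q0, qr, q2)
bccb: (q0, qr, qr)    (bc followed by cb)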
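A Python sketch of parsing by prefix, assuming each mapping is stored as a tuple indexed by state (the names `symbol_map`, `then` and `states_after_each_prefix` are illustrative). `then(f, g)` means "apply f, then g"; because it is associative, the same recursive-doubling scan used for sums applies unchanged.

```python
STATES = ["q0", "q1", "q2", "qr"]
INDEX = {q: i for i, q in enumerate(STATES)}
DELTA = {
    ("q0", "b"): "q2", ("q0", "c"): "q1",
    ("q1", "b"): "q0", ("q1", "c"): "qr",
    ("q2", "b"): "qr", ("q2", "c"): "q0",
    ("qr", "b"): "qr", ("qr", "c"): "qr",  # the reject state is absorbing
}

def symbol_map(sym):
    """state -> next state for one input symbol, as a tuple over STATES."""
    return tuple(DELTA[(q, sym)] for q in STATES)

def then(f, g):
    """Composition: apply f first, then g."""
    return tuple(g[INDEX[q]] for q in f)

def scan(xs, op):
    """Inclusive prefix by recursive doubling; op(earlier, later) must be associative."""
    xs = list(xs)
    d = 1
    while d < len(xs):
        old = xs[:]
        for j in range(d, len(xs)):
            xs[j] = op(old[j - d], old[j])
        d *= 2
    return xs

def states_after_each_prefix(text, start="q0"):
    maps = scan([symbol_map(ch) for ch in text], then)
    return [m[INDEX[start]] for m in maps]
```

For "bccbc" starting in q0, the states after each prefix are q2, q0, q1, q0, q1, so the string is not rejected.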


Segmented Prefix operation

before: 1 2 3 4 5 6 7 8
after:  1 3 3 7 12 18 7 15

Segments: [1 2] [3 4 5 6] [7 8]; the prefix sum restarts at each segment boundary.


Segmented Prefix computation

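Let $\otimes$ be any associative operation. For the segmented version of $\otimes$, define $\otimes'$ as follows, where $|x$ marks an element $x$ that begins a new segment:

$a \otimes' b = a \otimes b$
$a \otimes' |b = |b$
$|a \otimes' b = |(a \otimes b)$
$|a \otimes' |b = |b$

Then $\otimes'$ is associative, and we can compute the segmented operation in $O(\log n)$ time.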
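A Python sketch of $\otimes'$ with $\otimes = +$, assuming $|x$ is encoded as the pair (1, x) and an ordinary element as (0, x); the names `seg_op` and `segmented_prefix_sum` are illustrative. It reproduces the example on the previous slide.

```python
def seg_op(left, right):
    """⊗' on (flag, value) pairs; flag 1 marks an element that begins a segment."""
    f1, v1 = left
    f2, v2 = right
    if f2:                # a ⊗' |b = |b   and   |a ⊗' |b = |b
        return (1, v2)
    return (f1, v1 + v2)  # a ⊗' b = a ⊗ b   and   |a ⊗' b = |(a ⊗ b)

def scan(xs, op):
    """Inclusive prefix by recursive doubling; op(earlier, later) must be associative."""
    xs = list(xs)
    d = 1
    while d < len(xs):
        old = xs[:]
        for j in range(d, len(xs)):
            xs[j] = op(old[j - d], old[j])
        d *= 2
    return xs

def segmented_prefix_sum(values, flags):
    pairs = scan(list(zip(flags, values)), seg_op)
    return [v for _, v in pairs]
```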


Enumerating

data         = [5 6 3 1 8 3 7 5 9 2]

active procs = [1 0 1 1 0 0 1 0 1 0]

enumerated   = [0 x 1 2 x x 3 x 4 x]
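Enumeration is an exclusive prefix sum of the active flags, read only at active positions. A short Python sketch, using `itertools.accumulate` as a sequential stand-in for the parallel scan (`enumerate_active` is an illustrative name; `None` plays the role of x):

```python
from itertools import accumulate

def enumerate_active(active):
    """Rank of each active processor among the active ones (None for inactive)."""
    # exclusive prefix sum: number of active processors strictly before i
    before = [0] + list(accumulate(active))[:-1]
    return [b if a else None for a, b in zip(active, before)]
```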


Packing

data = [5 6 3 1 8 3 7 5 9 2]

active procs = [1 0 1 1 0 0 1 0 1 0]

enumerated = [0 x 1 2 x x 3 x 4 x]

packed data =[5 3 1 7 9 x x x x x]
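Packing follows directly from enumeration: each active processor writes its element to the address given by its enumerated value, and no two writes collide. A Python sketch under the same conventions (`pack` is an illustrative name; `None` stands for x):

```python
from itertools import accumulate

def pack(data, active):
    """Move the active elements to the front, keeping their order."""
    before = [0] + list(accumulate(active))[:-1]  # the enumerated values
    out = [None] * len(data)
    for d, a, dest in zip(data, active, before):
        if a:
            out[dest] = d  # distinct destinations: an exclusive write suffices
    return out
```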


Packing and Unpacking on Hypercube

Packing
• adjust bit 0
• adjust bit 1
• adjust bit 2
• ...
• adjust bit k-1

Unpacking
• adjust bit k-1
• adjust bit k-2
• ...
• adjust bit 1
• adjust bit 0

How about in the order of adjust bit 0, 1, ..., k-1 for packing?
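A Python simulation of packing on a hypercube by adjusting one address bit per step, assuming n = 2^k nodes and that "adjust bit b" means moving an element along dimension b so that bit b of its current node matches bit b of its destination. `hypercube_pack` and its `bit_order` parameter are hypothetical names; the function raises if two elements ever occupy the same node, so different bit orders can be tried against the question above.

```python
def hypercube_pack(data, active, bit_order=None):
    n = len(data)
    k = n.bit_length() - 1
    assert n == 1 << k, "n must be a power of two"
    order = list(range(k)) if bit_order is None else list(bit_order)
    dest, count = {}, 0
    for i in range(n):  # enumerate: destination of each active element
        if active[i]:
            dest[i] = count
            count += 1
    pos = {i: i for i in dest}  # current node of each travelling element
    for b in order:
        mask = 1 << b
        for i in pos:
            # adjust bit b: one hop along dimension b, or stay put
            pos[i] = (pos[i] & ~mask) | (dest[i] & mask)
        nodes = list(pos.values())
        if len(nodes) != len(set(nodes)):
            raise RuntimeError(f"two elements share a node after adjusting bit {b}")
    out = [None] * n
    for i, node in pos.items():
        out[node] = data[i]
    return out
```

With the default order (bit 0 first) the example below packs without conflicts; adjusting bit 1 before bit 0 makes the elements from nodes 1 and 3 meet at node 1.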


Unpacking

Address 0 1 2 3 4 5 6 7 8 9

data = [6 2 3 5 9 x x x x x]

active procs = [1 0 1 1 0 0 1 0 1 0]

enumerated = [0 x 1 2 x x 3 x 4 x]

destination = [0 2 3 6 8 x x x x x]

unpacked data = [6 x 2 3 x x 5 x 9 x]
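Unpacking is the inverse: the k-th packed element goes to the address of the k-th active processor, i.e. destination[k] in the table above. A Python sketch (`unpack` is an illustrative name; `None` stands for x):

```python
from itertools import accumulate

def unpack(packed, active):
    """Send the k-th packed element to the k-th active address."""
    before = [0] + list(accumulate(active))[:-1]  # the enumerated values
    out = [None] * len(active)
    for addr, (a, rank) in enumerate(zip(active, before)):
        if a:
            out[addr] = packed[rank]  # destination[rank] == addr
    return out
```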


Copy Scan (broadcast)

address 0 1 2 3 4 5 6 7 8 9

data = [ 6 2 3 5 9 4 1 7 8 10]

segmented bit = [ 1 0 1 1 0 0 1 0 1 0]

result = [ 6 6 3 5 5 5 1 1 8 8]
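Copy-scan can be read as a segmented prefix whose underlying operation keeps its left operand ($a \otimes b = a$, which is associative). A Python sketch under that reading (`copy_op` and `copy_scan` are illustrative names):

```python
def copy_op(left, right):
    """Segmented 'keep the first value' on (flag, value) pairs."""
    f1, v1 = left
    f2, v2 = right
    return (1, v2) if f2 else (f1, v1)  # a new segment restarts the copy

def scan(xs, op):
    """Inclusive prefix by recursive doubling; op(earlier, later) must be associative."""
    xs = list(xs)
    d = 1
    while d < len(xs):
        old = xs[:]
        for j in range(d, len(xs)):
            xs[j] = op(old[j - d], old[j])
        d *= 2
    return xs

def copy_scan(data, seg_bits):
    return [v for _, v in scan(list(zip(seg_bits, data)), copy_op)]
```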


Radix Sort

for j = 0 to k-1                          // x has k bits; least significant bit first
    for all i in [0 .. n-1] do in parallel {
        if j-th bit of x[i] is 0 {
            y[i] = enumerate              // rank among the elements whose bit j is 0
            c = count                     // number of elements whose bit j is 0
        }
        if j-th bit of x[i] is 1
            y[i] = enumerate + c          // rank among the 1s, placed after all 0s
        x[y[i]] = x[i]
    }

Radix sort, another version:

for j = 0 to k-1                          // x has k bits
    for all i in [0 .. n-1] do in parallel {
        pack left x[i] if j-th bit of x[i] is 0
        pack right x[i] if j-th bit of x[i] is 1
    }
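A Python sketch of the first version, assuming non-negative integers of k bits. Each pass is a stable split built from enumerate and count; because the split is stable, processing bit 0 first leaves the array fully sorted after bit k-1. The result is written to a fresh array, which models all reads happening before the writes. `split` and `radix_sort` are illustrative names.

```python
from itertools import accumulate

def split(x, j):
    """One pass: stable partition of x by bit j using enumerate and count."""
    zeros = [1 - ((v >> j) & 1) for v in x]
    ones = [1 - z for z in zeros]
    enum0 = [0] + list(accumulate(zeros))[:-1]  # rank among the 0s
    enum1 = [0] + list(accumulate(ones))[:-1]   # rank among the 1s
    c = sum(zeros)                              # count of 0s
    y = [e0 if z else e1 + c for z, e0, e1 in zip(zeros, enum0, enum1)]
    out = [None] * len(x)
    for i, v in enumerate(x):
        out[y[i]] = v  # x[y[i]] = x[i]
    return out

def radix_sort(x, k):
    for j in range(k):  # least significant bit first
        x = split(x, j)
    return x
```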


Quick Sort

1. Pick a pivot p

2. Broadcast p

3. For all PE i, compare A[i] with p

{ if A[i] <p, pack left A[i] in the segment

if A[i] >= p, pack right A[i] in the segment

}

4. Mark the segment boundary

5. Quick sort each segment recursively
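A round-based Python sketch in which all segments are partitioned at the same time, as a data-parallel machine would do it. Assumptions that differ from the slide: the pivot is the first element of each segment, and each segment is split three ways (< p, = p, > p) instead of two, so that every round makes progress even when p is the segment minimum. `data_parallel_quicksort` is an illustrative name, and the loops are sequential stand-ins for the parallel steps.

```python
def data_parallel_quicksort(a):
    a = list(a)
    n = len(a)
    starts = [i == 0 for i in range(n)]  # segment-start flags
    while True:
        # Step 2, broadcast: copy each segment's first element (its pivot) across it
        pivot = []
        for i in range(n):
            pivot.append(a[i] if starts[i] else pivot[-1])
        if all(x == p for x, p in zip(a, pivot)):
            return a  # every segment holds equal keys, so the array is sorted
        new_a, new_starts = [], []
        i = 0
        while i < n:
            j = i + 1
            while j < n and not starts[j]:
                j += 1
            seg, p = a[i:j], pivot[i]
            # Steps 3-4: pack by comparison with p, then mark the new boundaries
            for part in ([v for v in seg if v < p],
                         [v for v in seg if v == p],
                         [v for v in seg if v > p]):
                if part:
                    new_a.extend(part)
                    new_starts.extend([True] + [False] * (len(part) - 1))
            i = j
        a, starts = new_a, new_starts  # Step 5: recurse on all segments at once
```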


Solving Linear Recurrence Equations

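$f_n = a_{n-1} f_{n-1} + a_{n-2} f_{n-2}$

In matrix form:

$$\begin{pmatrix} f_n \\ f_{n-1} \end{pmatrix} = \begin{pmatrix} a_{n-1} & a_{n-2} \\ 1 & 0 \end{pmatrix} \begin{pmatrix} f_{n-1} \\ f_{n-2} \end{pmatrix}$$

Since matrix multiplication is associative, every $f_n$ follows from a parallel prefix of the $2 \times 2$ coefficient matrices applied to $(f_1, f_0)^T$.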
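A Python sketch of this matrix formulation, assuming the coefficients are given as a list `a` and the initial values f0, f1 are known (`matmul` and `solve_recurrence` are illustrative names). The prefix products $P_n = M_n M_{n-1} \cdots M_2$ put the later matrix on the left, and $f_n$ is the top row of $P_n$ applied to $(f_1, f_0)^T$.

```python
def matmul(A, B):
    """2x2 matrix product; matrices are ((a, b), (c, d))."""
    (a, b), (c, d) = A
    (e, f), (g, h) = B
    return ((a * e + b * g, a * f + b * h), (c * e + d * g, c * f + d * h))

def scan(xs, op):
    """Inclusive prefix by recursive doubling; op(earlier, later) must be associative."""
    xs = list(xs)
    d = 1
    while d < len(xs):
        old = xs[:]
        for j in range(d, len(xs)):
            xs[j] = op(old[j - d], old[j])
        d *= 2
    return xs

def solve_recurrence(a, f0, f1, n_terms):
    """f_0 .. f_{n_terms-1} for f_n = a[n-1] f_{n-1} + a[n-2] f_{n-2}."""
    mats = [((a[n - 1], a[n - 2]), (1, 0)) for n in range(2, n_terms)]  # M_2, M_3, ...
    prefix = scan(mats, lambda earlier, later: matmul(later, earlier))  # P_n = M_n ... M_2
    f = [f0, f1] + [P[0][0] * f1 + P[0][1] * f0 for P in prefix]
    return f[:n_terms]
```

With all coefficients equal to 1 and (f0, f1) = (0, 1) this produces the Fibonacci numbers.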


Pointer Jumping and Tree Computation

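How to compute a prefix on a linked list?

X:             1  2  3  4  5  6  7    (each element points to the next one; the last points to NIL)

In parallel for every i (all reads of a step happen before its writes):
If NEXT[i] != NIL then
    X[i] <- X[i] + X[NEXT[i]]
    NEXT[i] <- NEXT[NEXT[i]]

after step 1:  3  5  7  9 11 13  7
after step 2: 10 14 18 22 18 13  7
after step 3: 28 27 25 22 18 13  7

How can we obtain the prefix order 1 3 6 10 15 21 28 instead?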
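A Python simulation of pointer jumping that records X after every synchronous step, assuming `None` represents NIL (`pointer_jumping` is an illustrative name). It reproduces the three rows of the trace above; the number of steps is about log2 of the list length.

```python
NIL = None

def pointer_jumping(x, nxt):
    """Return the list of X arrays: the initial one and one per synchronous step."""
    x, nxt = list(x), list(nxt)
    history = [x[:]]
    while any(p is not NIL for p in nxt):
        old_x, old_next = x[:], nxt[:]  # all reads before all writes
        for i in range(len(x)):
            if old_next[i] is not NIL:
                x[i] = old_x[i] + old_x[old_next[i]]
                nxt[i] = old_next[old_next[i]]
        history.append(x[:])
    return history
```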


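Application: Tree computation (pre-order numbering)

(Figure: a tree with the labels "Each node" and "Leaf node" and weights of 1.)

This can also be applied to in-order and post-order numbering, the number of children, depth, etc., and also to biconnected components.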
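The figure did not survive extraction. A common way to obtain pre-order numbers with a list prefix is the Euler-tour technique, which we assume is what the slide illustrates: walk every tree edge down and back up, give downward edges weight 1 and upward edges weight 0, and take a prefix sum along the tour. The sketch builds the tour sequentially and uses `itertools.accumulate` in place of pointer jumping; `children`, `euler_tour` and `preorder_numbers` are hypothetical names.

```python
from itertools import accumulate

def euler_tour(children, root):
    """Directed edges of the Euler tour in DFS order: weight 1 going down, 0 going up."""
    tour = []

    def visit(u):
        for v in children.get(u, []):
            tour.append((u, v, 1))  # downward edge: first visit of v
            visit(v)
            tour.append((v, u, 0))  # upward edge back to the parent

    visit(root)
    return tour

def preorder_numbers(children, root):
    tour = euler_tour(children, root)
    ranks = list(accumulate(w for _, _, w in tour))  # list prefix over the tour
    number = {root: 0}
    for (u, v, w), r in zip(tour, ranks):
        if w == 1:
            number[v] = r  # nodes entered so far = pre-order number of v
    return number
```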


Recurrence Equation

Example: LU decomposition on a triangular matrix