PREFIX SUM CIRCUITS By: D.M.Rasanjalee Himali csc 8530: Parallel Algorithm Instructor: Dr. Sushil Prasad

Upload: benjamin-richard

Post on 21-Dec-2015


PREFIX SUM CIRCUITS

By: D.M.Rasanjalee Himali
csc 8530: Parallel Algorithm
Instructor: Dr. Sushil Prasad


PREFIX SUM: OVERVIEW

Also called the scan operation.

Inputs:
Binary operator: ⊕ (associative)
Input sequence: [x0, x1, ..., xn]

Output:
Output sequence: [x0, (x0 ⊕ x1), ..., (x0 ⊕ x1 ⊕ ... ⊕ xn)]

Ex: Prefix sum([1, 2, 3, 4, 5, 6, 7, 8]) = [1, 3, 6, 10, 15, 21, 28, 36]
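
The definition above can be sketched as a plain sequential scan. This is only an illustration of the input/output relation, not one of the circuits discussed later; the name `prefix_scan` and its `op` parameter are assumptions made for this example.

```python
from operator import add

def prefix_scan(xs, op=add):
    """Inclusive scan: out[i] = xs[0] op xs[1] op ... op xs[i]."""
    out = []
    for x in xs:
        # The first output is the first input; every later output extends
        # the previous running result by one more element.
        out.append(x if not out else op(out[-1], x))
    return out

print(prefix_scan([1, 2, 3, 4, 5, 6, 7, 8]))
# [1, 3, 6, 10, 15, 21, 28, 36]
```

Any associative operator can be passed as `op`, for example `max` for a running maximum.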


PREFIX SUM : CIRCUITS LOWER BOUND

The building block is an adder. A prefix sum circuit must have:

A depth of Ω(log n) as lower bound
A circuit width of Ω(n) as lower bound
A circuit size of Ω(n) as lower bound

[Figure: circuit diagram with its width and depth labeled]


PREFIX SUM : A RECURSIVE CIRCUIT

Assume that n is a power of 2.
A problem of size n is broken into 2 identical problems of size n/2 each.
The 2 subproblems are solved recursively.
Their solutions are combined to obtain a solution to the original problem.


PREFIX SUM : A RECURSIVE CIRCUIT

The even-index elements {x0, x2, ..., xn-2} and the odd-index elements {x1, x3, ..., xn-1} of the input are fed separately and simultaneously into 2 circuits, each computing the prefix sums of n/2 elements.

[Figure: Recursive computation of prefix sums. Inputs x0 to x7 feed two circuits for prefix sums of n/2 elements. The even-index circuit outputs x0, x0+x2, x0+x2+x4, x0+x2+x4+x6; the odd-index circuit outputs x1, x1+x3, x1+x3+x5, x1+x3+x5+x7. These are combined into the outputs s0 to s7.]
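
The recursive construction can be sketched in Python as follows. This is a minimal sketch under stated assumptions: the operator is ordinary addition (the combining step interleaves the two halves, so it relies on commutativity), the input length is a power of 2, and the function name `recursive_prefix` and the way adders and levels are counted are choices made for this example rather than something given on the slides.

```python
def recursive_prefix(xs):
    """Return (sums, adders, levels) for the even/odd recursive circuit.

    Assumes len(xs) is a power of 2 and that the operator is addition.
    """
    n = len(xs)
    if n == 1:
        return [xs[0]], 0, 0
    # Two sub-circuits of n/2 inputs each, working in parallel.
    even, size_e, lev_e = recursive_prefix(xs[0::2])  # x0, x2, x4, ...
    odd, size_o, lev_o = recursive_prefix(xs[1::2])   # x1, x3, x5, ...
    sums = [even[0]]                   # s0 = x0 needs no adder
    adders = size_e + size_o
    for i in range(1, n):
        k = i // 2
        if i % 2 == 1:
            sums.append(even[k] + odd[k])      # s_(2k+1)
        else:
            sums.append(even[k] + odd[k - 1])  # s_(2k)
        adders += 1                    # one combining adder per output
    # The combining adders form one extra level below the sub-circuits.
    return sums, adders, max(lev_e, lev_o) + 1

sums, adders, levels = recursive_prefix([1, 2, 3, 4, 5, 6, 7, 8])
print(sums, adders, levels)
# [1, 3, 6, 10, 15, 21, 28, 36] 17 3
```

For 8 inputs the sketch uses 17 adders on 3 levels, which shows the size growing faster than linearly while the depth stays at log n.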


PREFIX SUM : A RECURSIVE CIRCUIT ANALYSIS

Circuit width is O(n).

Circuit depth d(n):
d(1) = 0
d(n) = d(n/2) + 1, for n >= 2
Thus d(n) = O(log n).

Circuit size s(n):
s(1) = 0
s(n) = 2s(n/2) + n - 1, for n >= 2
Thus s(n) = O(n log n).

Circuit is optimal w.r.t. depth and width.

Circuit is not optimal w.r.t. size.
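
The two recurrences can be unrolled to exact closed forms, which makes the O(log n) and O(n log n) bounds concrete. The derivation below is a worked sketch that assumes n = 2^k, as the slides do; the symbol k is introduced here only for the unrolling.

```latex
\begin{align*}
d(n) &= d(n/2) + 1 = d(n/4) + 2 = \dots = d(1) + k = \log_2 n, \\
s(n) &= 2\,s(n/2) + (n - 1) \\
     &= 2^k s(1) + \sum_{j=0}^{k-1} 2^j \left( \frac{n}{2^j} - 1 \right) \\
     &= k\,n - (2^k - 1) = n \log_2 n - n + 1 .
\end{align*}
```

For n = 8 this gives d(8) = 3 and s(8) = 17.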

[Figure repeated: Recursive computation of prefix sums]


PREFIX SUM : AN OPTIMAL CIRCUIT

This circuit recursively computes the odd-index sums, by first adding adjacent input pairs and then computing the prefix sums of the n/2 pair sums:
s1 = x0+x1
s3 = x0+x1+x2+x3
...
sn-1 = x0+x1+...+xn-1

Then add xi+1 to the sum si = x0+x1+...+xi for i = 1, 3, ..., n-3.
This requires (n/2)-1 adders and yields the even-indexed sums s2, s4, ..., sn-2.

s0 = x0

[Figure: Optimal prefix sums circuit for 8 inputs, with inputs x0 to x7 and outputs s0 to s7]
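
The construction above can be sketched in Python. This is a hedged illustration, not the slide's circuit diagram: the function name `optimal_prefix` is an assumption, the input length is assumed to be a power of 2, and `levels` is counted the way the recurrence on the analysis slide counts it (one level for the pair adders, the sub-circuit, then one level for the even-indexed sums).

```python
def optimal_prefix(xs):
    """Return (sums, adders, levels) for the odd-sums-first circuit.

    Assumes len(xs) is a power of 2.
    """
    n = len(xs)
    if n == 1:
        return [xs[0]], 0, 0
    if n == 2:
        return [xs[0], xs[0] + xs[1]], 1, 1
    # First level: add adjacent pairs (n/2 adders).
    pairs = [xs[i] + xs[i + 1] for i in range(0, n, 2)]
    # Recursion: prefix sums of the pair sums are s1, s3, ..., s_(n-1).
    odd_sums, sub_adders, sub_levels = optimal_prefix(pairs)
    sums = [None] * n
    sums[0] = xs[0]            # s0 = x0
    sums[1::2] = odd_sums
    # Last level: s_(i+1) = s_i + x_(i+1) for i = 1, 3, ..., n-3.
    for i in range(1, n - 2, 2):
        sums[i + 1] = sums[i] + xs[i + 1]
    adders = n // 2 + sub_adders + (n // 2 - 1)
    return sums, adders, sub_levels + 2

sums, adders, levels = optimal_prefix([1, 2, 3, 4, 5, 6, 7, 8])
print(sums, adders, levels)
# [1, 3, 6, 10, 15, 21, 28, 36] 11 5
```

Unlike the even/odd split, this version only ever combines adjacent ranges in order, so it does not rely on the operator being commutative.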


PREFIX SUM : AN OPTIMAL CIRCUIT ANALYSIS

Circuit width is O(n).

Circuit depth d(n):
d(1) = 0
d(2) = 1
d(n) = d(n/2) + 2, for n > 2
Thus d(n) = 2 log n - 1, for n >= 2.

Circuit size s(n):
s(1) = 0
s(2) = 1
s(n) = s(n/2) + n - 1, for n > 2
Thus s(n) = 2n - 2 - log n.

The size, width, and depth are all asymptotically optimal.
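
The closed forms follow by unrolling the recurrences down to the base case n = 2. The derivation below is a worked sketch assuming n = 2^k with k >= 1; the symbol k is introduced here only for the unrolling.

```latex
\begin{align*}
d(n) &= d(2) + 2(k - 1) = 2k - 1 = 2\log_2 n - 1, \\
s(n) &= s(2) + \sum_{j=0}^{k-2} \left( \frac{n}{2^j} - 1 \right) \\
     &= 1 + n\left( 2 - \frac{4}{n} \right) - (k - 1) \\
     &= 2n - 2 - \log_2 n .
\end{align*}
```

For n = 8 this gives d(8) = 5 and s(8) = 11.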

[Figure repeated: Optimal prefix sums circuit for 8 inputs]


A New Approach to Constructing Optimal Prefix Circuits with Small Depth


INTRODUCTION

The paper represents a prefix circuit as a directed acyclic graph containing n input nodes, n output nodes, at least n – 1 operation nodes, and at least one duplication node.

All the directed edges are assumed to point downward.


A SERIAL PREFIX CIRCUIT (S(n))

[Figure: Serial prefix circuit S(n), with inputs 1, 2, 3, ..., n across the top and levels 0, 1, 2, ..., n-1 down the side]


DEPTH-SIZE OPTIMALITY

For any prefix circuit D, d(D) + s(D) >= 2n – 2.

Thus, D is depth-size optimal, or optimal for short, if d(D) + s(D) = 2n – 2.

In figure 1, s(S) = d(S) = n – 1; thus, S is optimal. Its fan-out is fo(S) = 2.
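
The optimality test is simple arithmetic, sketched below. The helper names `is_depth_size_optimal` and `serial_circuit_params` are assumptions made for this example; the depth and size of S(n) are the values stated on the slide, and the second check uses the depth and size that the earlier recurrences give for 8 inputs.

```python
def is_depth_size_optimal(n, depth, size):
    """True when depth + size meets the lower bound 2n - 2 exactly."""
    if depth + size < 2 * n - 2:
        # No prefix circuit on n inputs can fall below the bound.
        raise ValueError("depth + size is below the lower bound 2n - 2")
    return depth + size == 2 * n - 2

def serial_circuit_params(n):
    """(depth, size) of the serial circuit S(n), as stated on the slide."""
    return n - 1, n - 1

print(is_depth_size_optimal(8, *serial_circuit_params(8)))  # True
# Depth 5 and size 11 (the 8-input circuit analyzed earlier) sum to 16,
# which is above 2n - 2 = 14, so that circuit is not optimal in this sense.
print(is_depth_size_optimal(8, 5, 11))                      # False
```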


SOME IMPORTANT CONCEPTS

Symbol - Description

i:j - The result of computing xi ⊕ xi+1 ⊕ … ⊕ xj; i <= j

ia(D) = a - Line 1 of prefix circuit D has a duplication node at level a and has no duplication nodes at any level less than a

l(D) = b - Line n of D obtains 1:n at level b


BRIEF REVIEW OF PREVIOUS RESULTS

A prefix circuit D can be defined with sets of operation nodes at level i, i = 1, 2, ..., d(D):

Gi = {(x, y) | at level i on line y there is an operation node whose left input is the output of a node on line x at level i – 1}.

If (x, y) ∈ Gi, the corresponding operation node can be denoted as (x, y)i.
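
A circuit given as the sets G1, G2, ... can be evaluated level by level, as in the sketch below. Two things here are assumptions rather than statements from the slides: the right input of node (x, y)i is taken to be the value on line y at level i – 1, and the serial circuit S(n) is encoded as Gi = {(i, i+1)}. The names `run_circuit` and `serial_levels` are hypothetical.

```python
from operator import add

def run_circuit(levels, xs, op=add):
    """Evaluate a prefix circuit given as [G1, G2, ..., Gd].

    Each Gi is a set of (x, y) pairs with 1-based line numbers. The node
    on line y at level i combines the level i-1 value of line x (left
    input) with the level i-1 value of line y (assumed right input).
    """
    vals = list(xs)
    for g in levels:
        prev = vals[:]  # snapshot of level i-1, so nodes act in parallel
        for x, y in g:
            vals[y - 1] = op(prev[x - 1], prev[y - 1])
    return vals

def serial_levels(n):
    """Assumed encoding of S(n): one node (i, i+1) at each level i."""
    return [{(i, i + 1)} for i in range(1, n)]

print(run_circuit(serial_levels(8), [1, 2, 3, 4, 5, 6, 7, 8]))
# [1, 3, 6, 10, 15, 21, 28, 36]
```

In this encoding the depth is `len(levels)` and the size is the total number of pairs, so the serial circuit has depth and size n – 1.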


PREFIX CIRCUITS Q AND W

Q(n)

Ex:

PROPERTY 1


PREFIX CIRCUITS Q AND W

W(n)

A prefix circuit defined with the following operation nodes:

By definition, d(W) = n – 1, s(W) = 2n – 3, ia(W) = n – 2, l(W) = n – 1.

[Figures: W(3) on lines 1 to 3 with levels 1 to 2, and W(4) on lines 1 to 4 with levels 1 to 3]


COMPOSITION OF PREFIX CIRCUITS

A(n1) and B(n2) are two prefix circuits with n1 and n2 inputs, respectively.

A(n1) and B(n2) can be composed into a prefix circuit with n1 + n2 – 1 inputs by merging the operation node that produces 1:n1 on line n1 of A(n1) with the first duplication node on line 1 of B(n2).

The resulting circuit is denoted by A(n1).B(n2).

Ex: W(3).W(4)

[Figures: W(3), W(4), and their composition W(3).W(4) on lines 1 to 6 with levels 1 to 3]


SOPC – SIZE OPTIMAL PREFIX CIRCUITS

Ex: W(n) is SOPC(n, n-2, n-1) with fan-out 4


SOPC – SIZE OPTIMAL PREFIX CIRCUITS


SIZE OPTIMAL PREFIX CIRCUIT WL

Prefix circuit WL(5) is defined with the following operation nodes:

Fig. 3 shows that d(WL(5)) = 7, fo(WL(5)) = 4, ia(WL(5)) = 5, l(WL(5)) = 6.
s(WL(5)) = 23 = 2 × 13 – 2 + 5 – 6.

Therefore, by Definition 2, WL(5) is SOPC(13, 5, 6).

[Figure: WL(5), on lines 1 to 13 with levels 1 to 7]
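
The size arithmetic used for WL(5) can be checked mechanically. The sketch below assumes that a circuit counted as SOPC(n, ia, l) has size 2n – 2 + ia – l; that formula is inferred from the arithmetic shown for WL(5), since the text of Definition 2 is not reproduced in this transcript. The function name `sopc_size` is hypothetical.

```python
def sopc_size(n, ia, l):
    """Size implied by the WL(5) arithmetic: s = 2n - 2 + ia - l.

    Assumption: this is the size condition behind SOPC(n, ia, l).
    """
    return 2 * n - 2 + ia - l

# WL(5) is SOPC(13, 5, 6) with size 23.
print(sopc_size(13, 5, 6))                       # 23

# W(n) is SOPC(n, n-2, n-1) and has size 2n - 3.
n = 6
print(sopc_size(n, n - 2, n - 1) == 2 * n - 3)   # True
```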


SIZE OPTIMAL PREFIX CIRCUIT WL

Starting from WL(5):

Move the operation node (1, 13)6 of WL(5) downward by 1 level to be (1, 13)7.

Move the other operation nodes at level 6 and level 7 downward by 2 levels.

The result is a new prefix circuit WA(5):
d(WA(5)) = 9, fo(WA(5)) = 4, and WA(5) is SOPC(13, 6, 7).

[Figure: WA(5), on lines 1 to 13 with levels 1 to 9]


SIZE OPTIMAL PREFIX CIRCUIT WL

Starting from WL(5):

Move all the nodes at levels 6 and 7 downward by 2 levels.

The result is a new prefix circuit WB(5):
d(WB(5)) = 9, fo(WB(5)) = 4, and WB(5) is SOPC(13, 7, 8).

[Figure: WB(5), on lines 1 to 13 with levels 1 to 9]


SIZE OPTIMAL PREFIX CIRCUIT WL

By Property 6, WA(5)•WB(5) is SOPC(25, 6, 8).

Starting from WA(5)•WB(5):

Delete the operation node (13, 25)8.

Add the operation nodes (13, 25)6 and (1, 25)7.

The result is WL(6).

[Figures: WA(5)•WB(5) and WL(6), each on lines 1 to 25 with levels 1 to 9]


SIZE OPTIMAL PREFIX CIRCUIT WL

The above method of obtaining WL(6) from WL(5) can be generalized to derive WL(t + 1) from WL(t), t ≥ 5.


SIZE OPTIMAL PREFIX CIRCUIT WL


SIZE OPTIMAL PREFIX CIRCUIT WV

Prefix circuit WV(5) is defined with the following operation nodes:

d(WV(5)) = 6, fo(WV(5)) = 4, ia(WV(5)) = 4, l(WV(5)) = 5.
s(WV(5)) = 17 = 2 × 10 – 2 + 4 – 5.

Therefore, by Definition 2, WV(5) is SOPC(10, 4, 5).

[Figure: WV(5), on lines 1 to 10 with levels 1 to 6]


SIZE OPTIMAL PREFIX CIRCUIT WV

Starting from WV(5):

Move the operation node (1, 10)6 of WV(5) downward by 1 level.

Move the other operation nodes at level 5 and level 6 downward by 2 levels.

The result is a new prefix circuit VA(5):
d(VA(5)) = 8, fo(VA(5)) = 3, and VA(5) is SOPC(10, 5, 6).


SIZE OPTIMAL PREFIX CIRCUIT WV

Starting from WV(5):

Move all the nodes at levels 5 and 6 downward by 2 levels.

The result is a new prefix circuit VB(5):
d(VB(5)) = 8, fo(VB(5)) = 4, and VB(5) is SOPC(10, 6, 7).


SIZE OPTIMAL PREFIX CIRCUIT WV

By Property 6, VA(5)•VB(5) is SOPC(19, 5, 7).

Starting from VA(5)•VB(5):

Delete the operation node (10, 19)7.

Add the operation nodes (10, 19)5 and (1, 19)6.

The result is WV(6).

[Figure: WV(6), on lines 1 to 19 with levels 1 to 8]
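
The two Property 6 compositions quoted in these slides, WA(5)•WB(5) and VA(5)•VB(5), follow the same parameter pattern, sketched below. The rule coded here is an assumption inferred from those two examples only, because the statement of Property 6 is not reproduced in this transcript: the composed circuit takes n1 + n2 – 1 inputs, keeps ia from the first circuit and l from the second, and the levels are assumed to line up (l of the first equals ia of the second). The size formula 2n – 2 + ia – l is likewise inferred from the slides' arithmetic. The names `compose_sopc` and `sopc_size` are hypothetical.

```python
def sopc_size(n, ia, l):
    """Size inferred from the slides' arithmetic: s = 2n - 2 + ia - l."""
    return 2 * n - 2 + ia - l

def compose_sopc(a, b):
    """Parameters of A.B for A = SOPC(n1, ia1, l1), B = SOPC(n2, ia2, l2).

    Assumed rule (inferred from two examples): (n1 + n2 - 1, ia1, l2),
    valid when l1 == ia2 so the merged nodes sit at the same level.
    """
    (n1, ia1, l1), (n2, ia2, l2) = a, b
    if l1 != ia2:
        raise ValueError("assumed precondition l(A) == ia(B) does not hold")
    return n1 + n2 - 1, ia1, l2

wa, wb = (13, 6, 7), (13, 7, 8)
va, vb = (10, 5, 6), (10, 6, 7)
print(compose_sopc(wa, wb))  # (25, 6, 8)
print(compose_sopc(va, vb))  # (19, 5, 7)

# Under the inferred size formula, the sizes of the parts add up.
print(sopc_size(*wa) + sopc_size(*wb) == sopc_size(*compose_sopc(wa, wb)))  # True
```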


SIZE OPTIMAL PREFIX CIRCUIT WV

The above method of obtaining WV(6) from WV(5) can be generalized to derive WV(t + 1) from WV(t), t ≥ 5.


SIZE OPTIMAL PREFIX CIRCUIT WV


DEPTH-SIZE OPTIMAL PREFIX CIRCUIT WE4

Let WG(t) = WL(t)•WL(t–1)•...•WL(5)•W(4), for t ≥ 5, and WG(4) = W(4).


DEPTH -SIZE OPTIMAL PREFIX CIRCUIT WE4

[Figure: WE4(27), on lines 1 to 27 with levels 1 to 6]

WE4(27) = Q(15).WV(5).W(4)
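
The input count of a chain of compositions follows directly from the rule that composing two circuits shares one line. The sketch below checks this against the examples in these slides; the function name `composed_inputs` is an assumption made for this example.

```python
def composed_inputs(*sizes):
    """Inputs of A1.A2. ... .Ak, where Ai has sizes[i] inputs.

    Each of the k - 1 compositions merges one line of the left circuit
    with line 1 of the right circuit, so one input is shared each time.
    """
    if not sizes:
        raise ValueError("need at least one circuit")
    return sum(sizes) - (len(sizes) - 1)

print(composed_inputs(15, 10, 4))  # Q(15).WV(5).W(4) -> 27
print(composed_inputs(3, 4))       # W(3).W(4)        -> 6
```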


DEPTH -SIZE OPTIMAL PREFIX CIRCUIT WE4


COMPARISON OF OPTIMAL PREFIX CIRCUITS

OPTIMAL PREFIX CIRCUIT   FAN-OUT     DEPTH
LYD                      unbounded   2 log n - 6 to 2 log n - 3
M                        unbounded   2 log n - 5 to 2 log n - 3
H4                       4           2 log n - 5 to 2 log n - 3
Z4                       4           2 log n - 6 to 2 log n - 3
WE4                      4           2 log n - 6 to 2 log n - 3


COMPARISON OF OPTIMAL PREFIX CIRCUITS

n: n = 21
comment: d(WE4)=d(H4) i.e. WE4 is better than H4.

n: n = 29, 65 ≤ n ≤ 67, 139 ≤ n ≤ 145, 287 ≤ n ≤ 303, 583 ≤ n ≤ 621, 1175 ≤ n ≤ 1259, 2359 ≤ n ≤ 2537, or 4727 ≤ n ≤ 5095
comment: d(Z4) = d(WE4) – 1

n: 45 ≤ n ≤ 46, 99 ≤ n ≤ 102, 209 ≤ n ≤ 214, 431 ≤ n ≤ 438, 877 ≤ n ≤ 886, 1771 ≤ n ≤ 1782, or 3561 ≤ n ≤ 3574
comment: d(WE4) = d(Z4) – 1


REFERENCES

Yen-Chun Lin and Jun-Wei Hsiao, "A new approach to constructing optimal prefix circuits with small depth," Proceedings of the International Symposium on Parallel Architectures, Algorithms and Networks (I-SPAN '02), 2002, pp. 86-91.


Additional Information

P(n) Q(n)


P(n)


Q(n)


WE4(n)

The fan-out of Q is at most 4.
The fan-out of WV(5) is 4. By Theorem 8, the fan-out of WV(t) is 3, t = 6.
By Property 4 and Theorem 7, the fan-out of W(4) and WL is 4.
Hence, WE4(n) has a fan-out of 4.