prefix sum circuits by: d.m.rasanjalee himali csc 8530: parallel algorithm instructor: dr. sushil...

Post on 21-Dec-2015


PREFIX SUM CIRCUITS

By: D.M. Rasanjalee Himali

CSC 8530: Parallel Algorithm

Instructor: Dr. Sushil Prasad

PREFIX SUM: OVERVIEW

Also called the scan operation.

Inputs:
    Binary operator: ⊕ (associative)
    Input sequence: [x0, x1, ..., xn]

Output:
    Output sequence: [x0, (x0 ⊕ x1), ..., (x0 ⊕ x1 ⊕ ... ⊕ xn)]

Ex: Prefix sum([1, 2, 3, 4, 5, 6, 7, 8]) = [1, 3, 6, 10, 15, 21, 28, 36]
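The definition above can be illustrated with a short sequential sketch. It uses Python's itertools.accumulate; the function name prefix_scan is chosen here for illustration and does not come from the slides.

```python
from itertools import accumulate
import operator


def prefix_scan(xs, op=operator.add):
    """Return [x0, x0 op x1, ..., x0 op ... op xn] for an associative op."""
    return list(accumulate(xs, op))


# The example from the slide, with addition as the binary operator.
print(prefix_scan([1, 2, 3, 4, 5, 6, 7, 8]))  # [1, 3, 6, 10, 15, 21, 28, 36]

# Any associative operator works, for example a running maximum.
print(prefix_scan([3, 1, 4, 1, 5], max))      # [3, 3, 4, 4, 5]
```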


PREFIX SUM: CIRCUITS LOWER BOUND

The building block is an adder.

A prefix sum circuit must have:
    A depth of Ω(log n) as a lower bound
    A circuit width of Ω(n) as a lower bound
    A circuit size of Ω(n) as a lower bound

[Figure: circuit layout, with Width along one axis and Depth along the other]

PREFIX SUM: A RECURSIVE CIRCUIT

Assume that n is a power of 2.

A problem of size n is broken into 2 identical problems of size n/2 each.

The 2 subproblems are solved recursively.

Their solutions are combined to obtain a solution to the original problem.

PREFIX SUM: A RECURSIVE CIRCUIT

The even-index elements {x0, x2, ..., xn-2} and the odd-index elements {x1, x3, ..., xn-1} of the input are fed separately and simultaneously into 2 circuits, each computing the prefix sums of n/2 elements.

[Figure: Recursive computation of prefix sums. Inputs x0 to x7 feed two circuits for prefix sums of n/2 elements; one produces x0, x0+x2, x0+x2+x4, x0+x2+x4+x6 and the other produces x1, x1+x3, x1+x3+x5, x1+x3+x5+x7. These are combined into the outputs s0 to s7.]
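The even/odd split can be sketched in software as follows. The combining step is an assumption read off the figure: each even-indexed output adds an even-side partial sum to the previous odd-side partial sum, and each odd-indexed output adds the two partial sums at the same position. The function name is illustrative only.

```python
def recursive_prefix_sums(xs):
    """Prefix sums via the even/odd split; len(xs) must be a power of 2."""
    n = len(xs)
    if n == 1:
        return list(xs)
    # Two independent subproblems of size n/2 (solved in parallel in hardware).
    evens = recursive_prefix_sums(xs[0::2])  # evens[k] = x0 + x2 + ... + x(2k)
    odds = recursive_prefix_sums(xs[1::2])   # odds[k]  = x1 + x3 + ... + x(2k+1)
    out = [0] * n
    out[0] = evens[0]                        # s0 = x0 needs no adder
    for k in range(n // 2):
        if k > 0:
            out[2 * k] = evens[k] + odds[k - 1]   # even-indexed output
        out[2 * k + 1] = evens[k] + odds[k]       # odd-indexed output
    return out


print(recursive_prefix_sums([1, 2, 3, 4, 5, 6, 7, 8]))
```

The combining loop performs n - 1 additions, all of which could run in a single level of adders.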


PREFIX SUM: A RECURSIVE CIRCUIT ANALYSIS

Circuit width is O(n).

Circuit depth d(n):
    d(1) = 0
    d(n) = d(n/2) + 1, for n >= 2
    Thus d(n) = O(log n)

Circuit size s(n):
    s(1) = 0
    s(n) = 2s(n/2) + n - 1, for n >= 2
    Thus s(n) = O(n log n)

The circuit is optimal w.r.t. depth and width.

The circuit is not optimal w.r.t. size.
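As a quick check of the two recurrences, the sketch below evaluates them directly and compares them with closed forms. The closed forms d(n) = log2 n and s(n) = n log2 n - n + 1 are not stated on the slide; they follow from the recurrences by induction and are consistent with the O(log n) and O(n log n) bounds.

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def depth(n):
    """d(1) = 0, d(n) = d(n/2) + 1 for n >= 2 (n a power of 2)."""
    return 0 if n == 1 else depth(n // 2) + 1


@lru_cache(maxsize=None)
def size(n):
    """s(1) = 0, s(n) = 2 s(n/2) + n - 1 for n >= 2 (n a power of 2)."""
    return 0 if n == 1 else 2 * size(n // 2) + n - 1


for n in (2, 4, 8, 16, 1024):
    k = n.bit_length() - 1          # log2(n) for a power of 2
    assert depth(n) == k
    assert size(n) == n * k - n + 1
    print(n, depth(n), size(n))
```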


PREFIX SUM: AN OPTIMAL CIRCUIT

This circuit recursively computes the odd-index sums:
    s1 = x0 + x1
    s3 = x0 + x1 + x2 + x3
    ...
    sn-1 = x0 + x1 + ... + xn-1

Then add xi+1 to the sum si = x0 + x1 + ... + xi for i = 1, 3, ..., n-3.
    Requires (n/2) - 1 adders
    Yields the even-indexed sums s2, s4, ..., sn-2

s0 = x0

[Figure: Optimal prefix sums circuit for 8 inputs, with inputs x0 to x7 and outputs s0 to s7]
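The construction can be sketched as below. The sketch assumes that the odd-index sums are obtained by first adding adjacent pairs x(2k) + x(2k+1) and then recursing on the n/2 pair sums, which is the usual reading of the figure; the function name and the returned adder and depth counters are illustrative additions.

```python
def optimal_prefix_sums(xs):
    """Return (prefix sums, adders used, depth); len(xs) must be a power of 2."""
    n = len(xs)
    if n == 1:
        return list(xs), 0, 0
    # First level: add adjacent pairs, using n/2 adders.
    pairs = [xs[2 * k] + xs[2 * k + 1] for k in range(n // 2)]
    # Recursing on the pair sums yields the odd-indexed sums s1, s3, ..., s(n-1).
    odd_sums, adders, depth = optimal_prefix_sums(pairs)
    adders += n // 2
    depth += 1
    out = [0] * n
    out[0] = xs[0]                       # s0 = x0
    for k in range(n // 2):
        out[2 * k + 1] = odd_sums[k]
    # Last level: s(i+1) = s(i) + x(i+1) for i = 1, 3, ..., n-3.
    for i in range(1, n - 2, 2):
        out[i + 1] = out[i] + xs[i + 1]
        adders += 1
    if n > 2:
        depth += 1                       # the last level exists only for n > 2
    return out, adders, depth


sums, adders, depth = optimal_prefix_sums([1, 2, 3, 4, 5, 6, 7, 8])
print(sums, adders, depth)  # [1, 3, 6, 10, 15, 21, 28, 36] 11 5
```

For 8 inputs the sketch uses 11 adders over 5 levels.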


PREFIX SUM: AN OPTIMAL CIRCUIT ANALYSIS

Circuit width is O(n).

Circuit depth d(n):
    d(1) = 0
    d(2) = 1
    d(n) = d(n/2) + 2, for n > 2
    Thus d(n) = 2 log n - 1

Circuit size s(n):
    s(1) = 0
    s(2) = 1
    s(n) = s(n/2) + n - 1, for n > 2
    Thus s(n) = 2n - 2 - log n

The size, width, and depth are all asymptotically optimal.


A New Approach to Constructing Optimal Prefix Circuits with Small Depth


INTRODUCTION

The paper represents a prefix circuit as a directed acyclic graph containing n input nodes, n output nodes, at least n - 1 operation nodes, and at least one duplication node.

All the directed edges are assumed to point downward.

A SERIAL PREFIX CIRCUIT (S(n))

[Figure: Serial prefix circuit S(n), with inputs 1 to n across the top and levels 0 to n-1 down the side]

DEPTH-SIZE OPTIMALITY

For any prefix circuit D with n inputs, d(D) + s(D) >= 2n - 2, where d(D) is the depth of D and s(D) is its size.

Thus, D is depth-size optimal, or optimal for short, if d(D) + s(D) = 2n - 2.

In Figure 1, s(S) = d(S) = n - 1; thus, S is optimal. Its fan-out is fo(S) = 2.
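A small sketch of the optimality test, applied to the serial circuit. It assumes that size counts operation nodes and that depth counts levels containing at least one operation node; the function names are illustrative.

```python
def serial_prefix(xs):
    """Serial circuit S(n): one operation node per level, chained left to right."""
    out = [xs[0]]
    size = depth = 0
    for x in xs[1:]:
        out.append(out[-1] + x)
        size += 1    # one operation node for this output
        depth += 1   # it must wait for the previous node, so it adds a level
    return out, size, depth


def is_depth_size_optimal(n, depth, size):
    """True when d(D) + s(D) meets the lower bound 2n - 2."""
    return depth + size == 2 * n - 2


n = 8
sums, size, depth = serial_prefix(list(range(1, n + 1)))
print(sums, size, depth, is_depth_size_optimal(n, depth, size))
```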

SOME IMPORTANT CONCEPTS

Symbol: Description

i:j : The result of computing xi ⊕ xi+1 ⊕ ... ⊕ xj, where ⊕ is the binary operator and i <= j

ia(D) = a : Line 1 of prefix circuit D has a duplication node at level a and has no duplication nodes at any level less than a

l(D) = b : Line n of D obtains 1:n at level b

BRIEF REVIEW OF PREVIOUS RESULTS

A prefix circuit D can be defined by its sets of operation nodes at level i, for i = 1, 2, ..., d(D):

G_i = {(x, y) | at level i on line y there is an operation node whose left input is the output of a node on line x at level i - 1}.

If (x, y) ∈ G_i, the corresponding operation node can be denoted as (x, y)_i.
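The level-set description lends itself to a direct software representation. The sketch below stores a circuit as a list of sets, one per level, and evaluates it. It assumes that the right input of an operation node on line y is the output of line y at the previous level, and that a line with no operation node at a level passes its value through; neither detail is spelled out on the slide. The serial circuit is encoded as one node (i, i+1) at each level i.

```python
def evaluate_circuit(xs, levels):
    """Evaluate a prefix circuit given as [G_1, G_2, ...].

    Each G_i is a set of pairs (x, y) with 1-based line numbers.
    """
    values = list(xs)
    for g in levels:
        prev = values[:]                      # outputs of the previous level
        for (x, y) in g:
            # left input: line x; right input (assumed): line y
            values[y - 1] = prev[x - 1] + prev[y - 1]
    return values


def serial_levels(n):
    """Level sets of the serial circuit: one node (i, i+1) at level i."""
    return [{(i, i + 1)} for i in range(1, n)]


levels = serial_levels(8)
print(evaluate_circuit([1, 2, 3, 4, 5, 6, 7, 8], levels))
print("depth:", len(levels), "size:", sum(len(g) for g in levels))
```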


PREFIX CIRCUITS Q AND W

Q(n)

Ex:

PROPERTY 1


PREFIX CIRCUITS Q AND W

W(n)

A prefix circuit defined with the following operation nodes:

By definition, d(W) = n - 1, s(W) = 2n - 3, ia(W) = n - 2, l(W) = n - 1.

[Figure: W(3), with inputs 1 to 3 and levels 1 to 2, and W(4), with inputs 1 to 4 and levels 1 to 3]

COMPOSITION OF PREFIX CIRCUITS

A(n1) and B(n2) are two prefix circuits with n1 and n2 inputs, respectively.

A(n1) and B(n2) can be composed into a prefix circuit with n1 + n2 - 1 inputs by merging the operation node that produces 1:n1 on line n1 of A(n1) with the first duplication node on line 1 of B(n2).

The resulting circuit is denoted by A(n1).B(n2).

Ex: W(3).W(4)

[Figure: W(3) with inputs 1 to 3, W(4) with inputs 1 to 4, and the composed circuit W(3).W(4) with inputs 1 to 6 and levels 1 to 3]
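The input count of a composition is easy to check numerically. The helper names below are hypothetical; the sketch only tracks the number of inputs and does not build the circuit itself.

```python
from functools import reduce


def composed_inputs(n1, n2):
    """Inputs of A(n1).B(n2): the merged line is counted once."""
    return n1 + n2 - 1


def chain_inputs(sizes):
    """Inputs of a left-to-right chain of compositions."""
    return reduce(composed_inputs, sizes)


print(composed_inputs(3, 4))   # 6, matching the 6 input lines of W(3).W(4)
print(chain_inputs([3, 4, 4]))
```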


SOPC – SIZE OPTIMAL PREFIX CIRCUITS

Ex: W(n) is SOPC(n, n-2, n-1) with fan-out 4


SIZE OPTIMAL PREFIX CIRCUIT WL

Prefix circuit WL(5) is defined with the following operation nodes:

Fig. 3 shows that d(WL(5)) = 7, fo(WL(5)) = 4, ia(WL(5)) = 5, l(WL(5)) = 6.

s(WL(5)) = 23 = 2 × 13 - 2 + 5 - 6

Therefore, by Definition 2, WL(5) is SOPC(13, 5, 6).

[Figure (Fig. 3): WL(5), with inputs 1 to 13 and levels 1 to 7]
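The arithmetic above suggests that a circuit D with n inputs qualifies as SOPC(n, ia(D), l(D)) when its size equals 2n - 2 + ia(D) - l(D). Definition 2 is not reproduced in this transcript, so that formula is an inference from the slide's arithmetic rather than a quotation. Under that assumption, a short check:

```python
def sopc_size(n, ia, l):
    """Size implied by SOPC(n, ia, l), assuming s = 2n - 2 + ia - l."""
    return 2 * n - 2 + ia - l


# WL(5) is stated to be SOPC(13, 5, 6) with s(WL(5)) = 23.
assert sopc_size(13, 5, 6) == 23

# W(n) is stated to be SOPC(n, n-2, n-1) with s(W) = 2n - 3.
for n in range(3, 10):
    assert sopc_size(n, n - 2, n - 1) == 2 * n - 3

print(sopc_size(13, 5, 6))  # 23
```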

SIZE OPTIMAL PREFIX CIRCUIT WL

Starting from WL(5):

Move the operation node (1, 13)_6 of WL(5) downward by 1 level to become (1, 13)_7.

Move the other operation nodes at level 6 and level 7 downward by 2 levels.

The new prefix circuit is WA(5): d(WA(5)) = 9, fo(WA(5)) = 4, and WA(5) is SOPC(13, 6, 7).

[Figure: WA(5), with inputs 1 to 13 and levels 1 to 9]

SIZE OPTIMAL PREFIX CIRCUIT WL

Starting from WL(5):

Move all the nodes at levels 6 and 7 downward by 2 levels.

The new prefix circuit is WB(5): d(WB(5)) = 9, fo(WB(5)) = 4, and WB(5) is SOPC(13, 7, 8).

[Figure: WB(5), with inputs 1 to 13 and levels 1 to 9]

SIZE OPTIMAL PREFIX CIRCUIT WL

By Property 6, WA(5)•WB(5) is SOPC(25, 6, 8).

Starting from WA(5)•WB(5):

Delete the operation node (13, 25)_8.

Add the operation nodes (13, 25)_6 and (1, 25)_7.

The result is WL(6).

[Figure: WA(5)•WB(5) and WL(6), each with inputs 1 to 25 and levels 1 to 9]

SIZE OPTIMAL PREFIX CIRCUIT WL

The above method of obtaining WL(6) from WL(5) can be generalized to derive WL(t + 1) from WL(t), t ≥ 5.

SIZE OPTIMAL PREFIX CIRCUIT WV

Prefix circuit WV(5) is defined with the following operation nodes:

d(WV(5)) = 6, fo(WV(5)) = 4, ia(WV(5)) = 4, l(WV(5)) = 5.

s(WV(5)) = 17 = 2 × 10 - 2 + 4 - 5

Therefore, by Definition 2, WV(5) is SOPC(10, 4, 5).

[Figure: WV(5), with inputs 1 to 10 and levels 1 to 6]

SIZE OPTIMAL PREFIX CIRCUIT WV

Starting from WV(5):

Move the operation node (1, 10)_6 of WV(5) downward by 1 level.

Move the other operation nodes at level 5 and level 6 downward by 2 levels.

The new prefix circuit is VA(5): d(VA(5)) = 8, fo(VA(5)) = 3, and VA(5) is SOPC(10, 5, 6).

SIZE OPTIMAL PREFIX CIRCUIT WV

Starting from WV(5):

Move all the nodes at levels 5 and 6 downward by 2 levels.

The new prefix circuit is VB(5): d(VB(5)) = 8, fo(VB(5)) = 4, and VB(5) is SOPC(10, 6, 7).

SIZE OPTIMAL PREFIX CIRCUIT WV

By Property 6, VA(5)•VB(5) is SOPC(19, 5, 7).

Starting from VA(5)•VB(5):

Delete the operation node (10, 19)_7.

Add the operation nodes (10, 19)_5 and (1, 19)_6.

The result is WV(6).

[Figure: WV(6), with inputs 1 to 19 and levels 1 to 8]

SIZE OPTIMAL PREFIX CIRCUIT WV

The above method of obtaining WV(6) from WV(5) can be generalized to derive WV(t + 1) from WV(t), t ≥ 5.

DEPTH-SIZE OPTIMAL PREFIX CIRCUIT WE4

Let WG(t) = WL(t)•WL(t-1)•...•WL(5)•W(4), for t ≥ 5, and WG(4) = W(4).

[Figure: WE4(27) = Q(15).WV(5).W(4), with inputs 1 to 27 and levels 1 to 6]

COMPARISON OF OPTIMAL PREFIX CIRCUITS

OPTIMAL PREFIX CIRCUIT    FAN-OUT      DEPTH
LYD                       unbounded    2 log n - 6 to 2 log n - 3
M                         unbounded    2 log n - 5 to 2 log n - 3
H4                        4            2 log n - 5 to 2 log n - 3
Z4                        4            2 log n - 6 to 2 log n - 3
WE4                       4            2 log n - 6 to 2 log n - 3

COMPARISON OF OPTIMAL PREFIX CIRCUITS

n: comment

n = 21:
    d(WE4) = d(H4), i.e. WE4 is better than H4.

n = 29, 65 ≤ n ≤ 67, 139 ≤ n ≤ 145, 287 ≤ n ≤ 303, 583 ≤ n ≤ 621, 1175 ≤ n ≤ 1259, 2359 ≤ n ≤ 2537, or 4727 ≤ n ≤ 5095:
    d(Z4) = d(WE4) - 1

45 ≤ n ≤ 46, 99 ≤ n ≤ 102, 209 ≤ n ≤ 214, 431 ≤ n ≤ 438, 877 ≤ n ≤ 886, 1771 ≤ n ≤ 1782, or 3561 ≤ n ≤ 3574:
    d(WE4) = d(Z4) - 1

REFERENCES

Yen-Chun Lin and Jun-Wei Hsiao, "A new approach to constructing optimal prefix circuits with small depth," Proceedings of the International Symposium on Parallel Architectures, Algorithms and Networks (I-SPAN '02), 2002, pp. 86-91.

Additional Information

[Figures: prefix circuits P(n) and Q(n)]

WE4(n)

The fan-out of Q is at most 4.

The fan-out of WV(5) is 4.

By Theorem 8, the fan-out of WV(t) is 3, t = 6.

By Property 4 and Theorem 7, the fan-out of W(4) and WL is 4.

Hence, WE4(n) has a fan-out of 4.
