PREFIX SUM CIRCUITS
By: D.M. Rasanjalee Himali
CSC 8530: Parallel Algorithm
Instructor: Dr. Sushil Prasad
PREFIX SUM: OVERVIEW
Also called the scan operation.
Inputs: an associative binary operator $\oplus$ and an input sequence $[x_0, x_1, \ldots, x_{n-1}]$.
Output: the sequence $[x_0,\ x_0 \oplus x_1,\ \ldots,\ x_0 \oplus x_1 \oplus \cdots \oplus x_{n-1}]$.
Ex: with $\oplus = +$,
Prefix sum([1, 2, 3, 4, 5, 6, 7, 8]) = [1, 3, 6, 10, 15, 21, 28, 36]
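The following is a minimal sequential sketch of the scan defined above. The function name `prefix_scan` is illustrative and does not come from the slides. It applies the operator left to right and uses n − 1 operator applications. Viewed as a circuit, that chain has depth n − 1.

```python
from operator import add
from typing import Callable, List, TypeVar

T = TypeVar("T")


def prefix_scan(xs: List[T], op: Callable[[T, T], T] = add) -> List[T]:
    """Inclusive scan: out[i] = xs[0] op xs[1] op ... op xs[i]."""
    out: List[T] = []
    for x in xs:
        # The first output is x0 itself; every later one costs one operator application.
        out.append(op(out[-1], x) if out else x)
    return out


print(prefix_scan([1, 2, 3, 4, 5, 6, 7, 8]))  # [1, 3, 6, 10, 15, 21, 28, 36]
```

Any associative operator can be passed as `op`, for example `max` or string concatenation.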
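PREFIX SUM: CIRCUITS LOWER BOUND
With an adder as the building block, any prefix sum circuit on n inputs must have:
A depth of $\Omega(\log n)$ as a lower bound, since the last output depends on all n inputs and each adder combines only two values
A circuit width of $\Omega(n)$ as a lower bound
A circuit size of $\Omega(n)$ as a lower bound
[Figure: a prefix sum circuit with its width and depth marked]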
PREFIX SUM: A RECURSIVE CIRCUIT
Assume that n is a power of 2. A problem of size n is broken into two identical problems of size n/2 each. The two subproblems are solved recursively, and their solutions are combined to obtain a solution to the original problem.
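PREFIX SUM: A RECURSIVE CIRCUIT
The even-indexed elements $\{x_0, x_2, \ldots, x_{n-2}\}$ and the odd-indexed elements $\{x_1, x_3, \ldots, x_{n-1}\}$ of the input are fed separately and simultaneously into two circuits. Each circuit computes the prefix sums of n/2 elements. One final level of adders then combines the two sets of results:
- $s_0 = x_0$
- $s_{2j+1} = (x_0 + x_2 + \cdots + x_{2j}) + (x_1 + x_3 + \cdots + x_{2j+1})$
- $s_{2j} = (x_0 + x_2 + \cdots + x_{2j}) + (x_1 + x_3 + \cdots + x_{2j-1})$ for $j \ge 1$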
[Figure: Recursive computation of prefix sums for n = 8. The even-indexed inputs feed one circuit for prefix sums of n/2 elements, producing x0, x0+x2, x0+x2+x4, x0+x2+x4+x6. The odd-indexed inputs feed the other, producing x1, x1+x3, x1+x3+x5, x1+x3+x5+x7. The outputs are s0 ... s7.]
PREFIX SUM: A RECURSIVE CIRCUIT ANALYSIS
Circuit width is O(n).
Circuit depth d(n): $d(1) = 0$, $d(n) = d(n/2) + 1$ for $n \ge 2$; thus $d(n) = O(\log n)$.
Circuit size s(n): $s(1) = 0$, $s(n) = 2s(n/2) + n - 1$ for $n \ge 2$; thus $s(n) = O(n \log n)$.
The circuit is optimal with respect to depth and width.
The circuit is not optimal with respect to size.
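The recursive circuit and its recurrences can be checked with a small simulation. This is a sketch under stated assumptions:
- The function name `recursive_prefix` is hypothetical.
- Inputs are plain integers under +.
- The combining level uses $s_{2j+1} = E_j + O_j$ and $s_{2j} = E_j + O_{j-1}$, where $E_j$ and $O_j$ are the even- and odd-indexed prefix sums. This rule accounts for exactly the n − 1 adders in the size recurrence.

The function returns the prefix sums, the adder count, and the depth.

```python
from typing import List, Tuple


def recursive_prefix(xs: List[int]) -> Tuple[List[int], int, int]:
    """Simulate the recursive even/odd prefix-sum circuit.

    Returns (prefix sums, number of adders, circuit depth).
    Assumes len(xs) is a power of 2, as on the slide.
    """
    n = len(xs)
    if n == 1:
        return list(xs), 0, 0  # a single input is already its own prefix sum
    # Two half-size circuits run side by side on even- and odd-indexed inputs.
    ev, size_e, depth_e = recursive_prefix(xs[0::2])  # ev[j] = x0 + x2 + ... + x_{2j}
    od, size_o, depth_o = recursive_prefix(xs[1::2])  # od[j] = x1 + x3 + ... + x_{2j+1}
    out = [ev[0]]  # s0 = x0 needs no adder
    for j in range(n // 2):
        if j > 0:
            out.append(ev[j] + od[j - 1])  # s_{2j}: n/2 - 1 adders in total
        out.append(ev[j] + od[j])  # s_{2j+1}: n/2 adders in total
    # All n - 1 combining adders sit on one extra level below the two sub-circuits.
    return out, size_e + size_o + n - 1, max(depth_e, depth_o) + 1


sums, size, depth = recursive_prefix([1, 2, 3, 4, 5, 6, 7, 8])
print(sums, size, depth)  # [1, 3, 6, 10, 15, 21, 28, 36] 17 3
```

For n = 8 this gives 17 adders and depth 3, which matches s(8) = 2s(4) + 7 = 17 and d(8) = 3. In general, the size recurrence solves to n log n − n + 1.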
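PREFIX SUM: AN OPTIMAL CIRCUIT
This circuit first adds adjacent pairs $x_{2j} + x_{2j+1}$ (n/2 adders). It then recursively computes the prefix sums of these n/2 pair sums, which are exactly the odd-indexed sums:
$s_1 = x_0 + x_1$, $s_3 = x_0 + x_1 + x_2 + x_3$, ..., $s_{n-1} = x_0 + x_1 + \cdots + x_{n-1}$.
Next, it adds $x_{i+1}$ to the sum $s_i = x_0 + x_1 + \cdots + x_i$ for $i = 1, 3, \ldots, n-3$. This step requires $n/2 - 1$ adders and yields the even-indexed sums $s_2, s_4, \ldots, s_{n-2}$.
Finally, $s_0 = x_0$.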
[Figure: Optimal prefix sums circuit for 8 inputs (inputs x0 ... x7, outputs s0 ... s7)]
PREFIX SUM: AN OPTIMAL CIRCUIT ANALYSIS
Circuit width is O(n).
Circuit depth d(n): $d(1) = 0$, $d(2) = 1$, $d(n) = d(n/2) + 2$ for $n > 2$; thus $d(n) = 2\log n - 1$ for $n \ge 2$.
Circuit size s(n): $s(1) = 0$, $s(2) = 1$, $s(n) = s(n/2) + n - 1$ for $n > 2$; thus $s(n) = 2n - 2 - \log n$.
The size, width, and depth are all optimal to within a constant factor.
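A companion sketch for the optimal circuit follows. It is again a hypothetical helper, `optimal_prefix`, working on integers under +. It proceeds in three stages:
1. Pair adjacent inputs.
2. Recurse on the n/2 pair sums to get the odd-indexed outputs.
3. Add the even-indexed fix-ups.

It counts adders and levels the way the recurrences do. It charges one level for the pairing stage, and one for the fix-up stage whenever that stage has any adders.

```python
from typing import List, Tuple


def optimal_prefix(xs: List[int]) -> Tuple[List[int], int, int]:
    """Simulate the pair / recurse / fix-up prefix-sum circuit.

    Returns (prefix sums, number of adders, depth counted stage by stage as in
    the recurrence d(n) = d(n/2) + 2). Assumes len(xs) is a power of 2.
    """
    n = len(xs)
    if n == 1:
        return list(xs), 0, 0
    # Pairing stage: n/2 adders form x_{2j} + x_{2j+1}.
    pairs = [xs[2 * j] + xs[2 * j + 1] for j in range(n // 2)]
    # Recursion on the pair sums yields the odd-indexed outputs s1, s3, ..., s_{n-1}.
    odd, sub_size, sub_depth = optimal_prefix(pairs)
    out = [xs[0]]  # s0 = x0
    for j in range(n // 2):
        if j > 0:
            out.append(odd[j - 1] + xs[2 * j])  # s_{2j} = s_{2j-1} + x_{2j}
        out.append(odd[j])  # s_{2j+1}
    fixups = n // 2 - 1  # adders producing s2, s4, ..., s_{n-2}
    size = n // 2 + sub_size + fixups
    # One level for pairing, the recursive circuit, then one fix-up level (none when n = 2).
    depth = 1 + sub_depth + (1 if fixups > 0 else 0)
    return out, size, depth


print(optimal_prefix([1, 2, 3, 4, 5, 6, 7, 8]))  # ([1, 3, 6, 10, 15, 21, 28, 36], 11, 5)
```

For n = 8 this gives 11 adders and 5 levels, that is, 2n − 2 − log n and 2 log n − 1.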
A New Approach to Constructing Optimal Prefix Circuits with Small Depth
INTRODUCTION
The paper models a prefix circuit as a directed acyclic graph containing n input nodes, n output nodes, at least n − 1 operation nodes, and at least one duplication node.
All directed edges are assumed to point downward.
A SERIAL PREFIX CIRCUIT (S(n))
[Figure 1: Serial prefix circuit S(n), with inputs 1 ... n on separate lines and levels down to n − 1]
DEPTH-SIZE OPTIMALITY
For any prefix circuit D with n inputs, $d(D) + s(D) \ge 2n - 2$.
Accordingly, D is called depth-size optimal, or optimal for short, if $d(D) + s(D) = 2n - 2$.
In Figure 1, $s(S) = d(S) = n - 1$; thus, S is optimal. Its fan-out is $fo(S) = 2$.
SOME IMPORTANT CONCEPTS

| Symbol | Description |
|---|---|
| i:j | The result of computing $x_i \oplus x_{i+1} \oplus \cdots \oplus x_j$, for $i \le j$ |
| ia(D) = a | Line 1 of prefix circuit D has a duplication node at level a and no duplication node at any level less than a |
| l(D) = b | Line n of D obtains 1:n at level b |
BRIEF REVIEW OF PREVIOUS RESULTS
A prefix circuit D can be defined by its sets of operation nodes at levels i = 1, 2, ..., d(D):
$G_i = \{(x, y) \mid$ at level $i$ on line $y$ there is an operation node whose left input is the output of a node on line $x$ at level $i - 1\}$.
If $(x, y) \in G_i$, the corresponding operation node can be denoted $(x, y)_i$.
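The level-set description lends itself to a direct check. The sketch below is an illustration, not code from the paper. The name `simulate` is hypothetical. It tracks each line's value only as the interval i:j that the line holds, with lines numbered 1 to n.

Each pair (x, y) in $G_i$ combines the value that line x held at level i − 1 with line y's own value. The circuit is valid when line k ends with 1:k.

The sketch takes S(n) to be $G_i = \{(i, i + 1)\}$. Running it on S(n) reproduces d(S) = s(S) = n − 1, and hence d + s = 2n − 2.

```python
from typing import Dict, List, Set, Tuple

Pair = Tuple[int, int]


def simulate(n: int, levels: List[Set[Pair]]) -> Tuple[bool, int, int]:
    """Evaluate a prefix circuit given as G_1, ..., G_d on symbolic intervals.

    Line k starts with k:k. A pair (x, y) in G_i sets line y to
    (value of line x at level i - 1) combined with (value of line y at level i - 1).
    Returns (every line k ends with 1:k, depth, size).
    """
    value: Dict[int, Pair] = {k: (k, k) for k in range(1, n + 1)}
    for ops in levels:
        before = dict(value)  # all nodes at level i read outputs of level i - 1
        lines = [y for _, y in ops]
        if len(lines) != len(set(lines)):
            raise ValueError("a line can hold only one operation node per level")
        for x, y in ops:
            (a, b), (c, d) = before[x], before[y]
            if b + 1 != c:
                raise ValueError(f"({x}, {y}) would join {a}:{b} with {c}:{d}")
            value[y] = (a, d)
    is_prefix = all(value[k] == (1, k) for k in range(1, n + 1))
    return is_prefix, len(levels), sum(len(ops) for ops in levels)


def serial(n: int) -> List[Set[Pair]]:
    """S(n): at level i, line i + 1 combines 1:i from line i with its own input."""
    return [{(i, i + 1)} for i in range(1, n)]


ok, depth, size = simulate(8, serial(8))
print(ok, depth, size, depth + size == 2 * 8 - 2)  # True 7 7 True
```

A non-serial example also passes the check. For n = 4, the levels [{(1, 2), (3, 4)}, {(2, 3), (2, 4)}] give depth 2 and size 4, so d + s = 6 = 2n − 2 as well.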
PREFIX CIRCUITS Q AND W
Q(n)
Ex:
PROPERTY 1
PREFIX CIRCUITS Q AND W
W(n)
W(n) is a prefix circuit defined with the following operation nodes:
By definition, d(W) = n − 1, s(W) = 2n − 3, ia(W) = n − 2, and l(W) = n − 1.
[Figure: W(3) on inputs 1 ... 3 (levels 1 ... 2) and W(4) on inputs 1 ... 4 (levels 1 ... 3)]
COMPOSITION OF PREFIX CIRCUITS
A(n1) and B(n2) are two prefix circuits with n1 and n2 inputs, respectively.
A(n1) and B(n2) can be composed into a prefix circuit with n1 + n2 − 1 inputs. To do so, merge the operation node that produces 1:n1 on line n1 of A(n1) with the first duplication node on line 1 of B(n2).
The resulting circuit is denoted by A(n1)•B(n2).
Ex: W(3)•W(4)
[Figure: W(3), W(4), and their composition W(3)•W(4) on inputs 1 ... 6 with levels 1 ... 3]
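Line n1 of A(n1) and line 1 of B(n2) become the same line, so every composition removes one input from the total. Below is a small arithmetic sketch. The helper names are hypothetical. The size rule is an assumption read from the definition: merging an operation node with a duplication node removes no operation node, so sizes add.

```python
from functools import reduce


def composed_inputs(*ns: int) -> int:
    """Inputs of A1•A2•...•Ak: each composition shares one line, giving n1 + n2 - 1."""
    return reduce(lambda n1, n2: n1 + n2 - 1, ns)


def composed_size(*sizes: int) -> int:
    """Assumed: merging the node pair deletes no operation node, so sizes add."""
    return sum(sizes)


def w_size(n: int) -> int:
    """s(W(n)) = 2n - 3, as stated for W(n)."""
    return 2 * n - 3


print(composed_inputs(3, 4), composed_size(w_size(3), w_size(4)))  # 6 8
print(composed_inputs(15, 10, 4))  # 27, the input count of WE4(27) = Q(15)•WV(5)•W(4)
```

The same rule gives 25 inputs for WA(5)•WB(5) and 19 for VA(5)•VB(5). These counts match the SOPC(25, 6, 8) and SOPC(19, 5, 7) labels on the later slides.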
SOPC – SIZE OPTIMAL PREFIX CIRCUITS
Ex: W(n) is SOPC(n, n-2, n-1) with fan-out 4
SIZE OPTIMAL PREFIX CIRCUIT WL
Prefix circuit WL(5) is defined with the following operation nodes:
Fig. 3 shows that d(WL(5)) = 7, fo(WL(5)) = 4, ia(WL(5)) = 5, and l(WL(5)) = 6. Moreover, s(WL(5)) = 23 = 2 × 13 − 2 + 5 − 6.
Therefore, by Definition 2, WL(5) is SOPC(13, 5, 6).
[Fig. 3: WL(5) on inputs 1 ... 13, levels 1 ... 7]
SIZE OPTIMAL PREFIX CIRCUIT WL
Starting from WL(5):
Move the operation node $(1, 13)_6$ of WL(5) down by 1 level, so that it becomes $(1, 13)_7$.
Move the other operation nodes at levels 6 and 7 down by 2 levels.
The result is a new prefix circuit WA(5), with d(WA(5)) = 9 and fo(WA(5)) = 4; WA(5) is SOPC(13, 6, 7).
[Figure: WA(5) on inputs 1 ... 13, levels 1 ... 9]
SIZE OPTIMAL PREFIX CIRCUIT WL
Starting from WL(5), move all the nodes at levels 6 and 7 down by 2 levels.
The result is a new prefix circuit WB(5), with d(WB(5)) = 9 and fo(WB(5)) = 4; WB(5) is SOPC(13, 7, 8).
[Figure: WB(5) on inputs 1 ... 13, levels 1 ... 9]
SIZE OPTIMAL PREFIX CIRCUIT WL
By Property 6, WA(5)•WB(5) is SOPC(25, 6, 8).
Starting from WA(5)•WB(5), delete the operation node $(13, 25)_8$ and add the operation nodes $(13, 25)_6$ and $(1, 25)_7$. The result is WL(6).
[Figure: WA(5)•WB(5) and WL(6), each on inputs 1 ... 25 with levels 1 ... 9]
SIZE OPTIMAL PREFIX CIRCUIT WL
The method used above to obtain WL(6) from WL(5) generalizes to derive WL(t + 1) from WL(t) for t ≥ 5.
SIZE OPTIMAL PREFIX CIRCUIT WV
Prefix circuit WV(5) is defined with the following operation nodes:
d(WV(5)) = 6, fo(WV(5)) = 4, ia(WV(5)) = 4, and l(WV(5)) = 5. Moreover, s(WV(5)) = 17 = 2 × 10 − 2 + 4 − 5.
Therefore, by Definition 2, WV(5) is SOPC(10, 4, 5).
[Figure: WV(5) on inputs 1 ... 10, levels 1 ... 6]
SIZE OPTIMAL PREFIX CIRCUIT WV
Starting from WV(5):
Move the operation node $(1, 10)_6$ of WV(5) down by 1 level.
Move the other operation nodes at levels 5 and 6 down by 2 levels.
The result is a new prefix circuit VA(5), with d(VA(5)) = 8 and fo(VA(5)) = 3; VA(5) is SOPC(10, 5, 6).
SIZE OPTIMAL PREFIX CIRCUIT WV
Starting from WV(5), move all the nodes at levels 5 and 6 down by 2 levels.
The result is a new prefix circuit VB(5), with d(VB(5)) = 8 and fo(VB(5)) = 4; VB(5) is SOPC(10, 6, 7).
SIZE OPTIMAL PREFIX CIRCUIT WV
By Property 6, VA(5)•VB(5) is SOPC(19, 5, 7).
Starting from VA(5)•VB(5), delete the operation node $(10, 19)_7$ and add the operation nodes $(10, 19)_5$ and $(1, 19)_6$. The result is WV(6).
[Figure: WV(6) on inputs 1 ... 19, levels 1 ... 8]
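The size figures on the WL and WV slides follow one pattern. In every quoted case, a circuit labeled SOPC(n, ia(D), l(D)) has s(D) = 2n − 2 + ia(D) − l(D). Definition 2 itself is not reproduced in these notes. The hypothetical helper `sopc_size` below therefore encodes only this inferred relation. It checks the relation against the numbers stated on the slides.

```python
def sopc_size(n: int, ia: int, l: int) -> int:
    """Size s implied by SOPC(n, ia, l) under the relation s = 2n - 2 + ia - l,
    inferred from the worked examples (not a quotation of Definition 2)."""
    return 2 * n - 2 + ia - l


# Sizes quoted on the slides.
assert sopc_size(13, 5, 6) == 23  # WL(5)
assert sopc_size(10, 4, 5) == 17  # WV(5)
# W(n) is SOPC(n, n - 2, n - 1), and s(W(n)) = 2n - 3.
assert all(sopc_size(n, n - 2, n - 1) == 2 * n - 3 for n in range(3, 50))
# Moving nodes down changes ia and l but not the size.
assert sopc_size(13, 6, 7) == sopc_size(13, 7, 8) == 23  # WA(5), WB(5)
assert sopc_size(10, 5, 6) == sopc_size(10, 6, 7) == 17  # VA(5), VB(5)
# The composites' sizes equal the sums of their parts' sizes.
assert sopc_size(25, 6, 8) == 23 + 23  # WA(5)•WB(5)
assert sopc_size(19, 5, 7) == 17 + 17  # VA(5)•VB(5)
print("all SOPC size checks pass")
```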
SIZE OPTIMAL PREFIX CIRCUIT WV
The method used above to obtain WV(6) from WV(5) generalizes to derive WV(t + 1) from WV(t) for t ≥ 5.
DEPTH-SIZE OPTIMAL PREFIX CIRCUIT WE4
Let WG(t) = WL(t)•WL(t − 1)•...•WL(5)•W(4) for t ≥ 5, and WG(4) = W(4).
DEPTH-SIZE OPTIMAL PREFIX CIRCUIT WE4
[Figure: WE4(27) = Q(15)•WV(5)•W(4), on inputs 1 ... 27 with levels 1 ... 6]
COMPARISON OF OPTIMAL PREFIX CIRCUITS

| Optimal prefix circuit | Fan-out | Depth |
|---|---|---|
| LYD | unbounded | 2 log n − 6 to 2 log n − 3 |
| M | unbounded | 2 log n − 5 to 2 log n − 3 |
| H4 | 4 | 2 log n − 5 to 2 log n − 3 |
| Z4 | 4 | 2 log n − 6 to 2 log n − 3 |
| WE4 | 4 | 2 log n − 6 to 2 log n − 3 |
COMPARISON OF OPTIMAL PREFIX CIRCUITS

| n | Comment |
|---|---|
| n = 21 | d(WE4) = d(H4), i.e., WE4 is better than H4. |
| n = 29, 65 ≤ n ≤ 67, 139 ≤ n ≤ 145, 287 ≤ n ≤ 303, 583 ≤ n ≤ 621, 1175 ≤ n ≤ 1259, 2359 ≤ n ≤ 2537, or 4727 ≤ n ≤ 5095 | d(Z4) = d(WE4) − 1 |
| 45 ≤ n ≤ 46, 99 ≤ n ≤ 102, 209 ≤ n ≤ 214, 431 ≤ n ≤ 438, 877 ≤ n ≤ 886, 1771 ≤ n ≤ 1782, or 3561 ≤ n ≤ 3574 | d(WE4) = d(Z4) − 1 |
REFERENCES
Yen-Chun Lin and Jun-Wei Hsiao, "A new approach to constructing optimal prefix circuits with small depth," in Proceedings of the International Symposium on Parallel Architectures, Algorithms and Networks (I-SPAN '02), 2002, pp. 86–91.
Additional Information
P(n) Q(n)
WE4(n)
The fan-out of Q is at most 4, and the fan-out of WV(5) is 4. By Theorem 8, the fan-out of WV(t) is 3 for t = 6. By Property 4 and Theorem 7, the fan-out of W(4) and of WL is 4. Hence, WE4(n) has a fan-out of 4.