a decomposition algorithm to structure arithmetic circuits ajay k. verma, philip brisk, paolo ienne...
TRANSCRIPT
![Page 1: A Decomposition Algorithm to Structure Arithmetic Circuits Ajay K. Verma, Philip Brisk, Paolo Ienne Ecole Polytechnique Fédérale de Lausanne (EPFL) International](https://reader036.vdocument.in/reader036/viewer/2022062519/5697bfe81a28abf838cb62f6/html5/thumbnails/1.jpg)
A Decomposition Algorithm to Structure Arithmetic Circuits
Ajay K. Verma, Philip Brisk, Paolo Ienne
Ecole Polytechnique Fédérale de Lausanne (EPFL)
International Workshop on Logic and Synthesis
August 1, 2009
![Page 2: A Decomposition Algorithm to Structure Arithmetic Circuits Ajay K. Verma, Philip Brisk, Paolo Ienne Ecole Polytechnique Fédérale de Lausanne (EPFL) International](https://reader036.vdocument.in/reader036/viewer/2022062519/5697bfe81a28abf838cb62f6/html5/thumbnails/2.jpg)
Logic Optimization Strategies
Ripple-Carry Adder Carry-Lookahead Adder
• Logic synthesis tools– Local optimization via Boolean minimization
• Architectural transformation– Not with “traditional” logic synthesis
1
![Page 3: A Decomposition Algorithm to Structure Arithmetic Circuits Ajay K. Verma, Philip Brisk, Paolo Ienne Ecole Polytechnique Fédérale de Lausanne (EPFL) International](https://reader036.vdocument.in/reader036/viewer/2022062519/5697bfe81a28abf838cb62f6/html5/thumbnails/3.jpg)
Naïve Leading Zero Detector
xi is TRUE if (i+1)th most-significant bit is the leading non-zero bit
xi = a15a14 … a15-(i-2)a15-(i-1)a15-i
Convert xi to a binary number
2
![Page 4: A Decomposition Algorithm to Structure Arithmetic Circuits Ajay K. Verma, Philip Brisk, Paolo Ienne Ecole Polytechnique Fédérale de Lausanne (EPFL) International](https://reader036.vdocument.in/reader036/viewer/2022062519/5697bfe81a28abf838cb62f6/html5/thumbnails/4.jpg)
Optimized LZD [Oklobdzija 1994]
3
![Page 5: A Decomposition Algorithm to Structure Arithmetic Circuits Ajay K. Verma, Philip Brisk, Paolo Ienne Ecole Polytechnique Fédérale de Lausanne (EPFL) International](https://reader036.vdocument.in/reader036/viewer/2022062519/5697bfe81a28abf838cb62f6/html5/thumbnails/5.jpg)
Comparison
0.36 ns (427 μm2)
0.30 ns (392 μm2)
16% faster, 8% smaller
4
![Page 6: A Decomposition Algorithm to Structure Arithmetic Circuits Ajay K. Verma, Philip Brisk, Paolo Ienne Ecole Polytechnique Fédérale de Lausanne (EPFL) International](https://reader036.vdocument.in/reader036/viewer/2022062519/5697bfe81a28abf838cb62f6/html5/thumbnails/6.jpg)
Outline• Algorithmic Overview
• Progressive Decomposition Algorithm… – … and its shortcomings– [Verma et al., DAC 2007]
• New Algorithm
• Experimental Results
• Conclusion
5
![Page 7: A Decomposition Algorithm to Structure Arithmetic Circuits Ajay K. Verma, Philip Brisk, Paolo Ienne Ecole Polytechnique Fédérale de Lausanne (EPFL) International](https://reader036.vdocument.in/reader036/viewer/2022062519/5697bfe81a28abf838cb62f6/html5/thumbnails/7.jpg)
Input Condensation• Leader Expressions
– Sufficient to evaluate expression– Once evaluated, you can discard input bits– Works for circuits with “effective online algorithms”
IN
OneBig
Circuit
OUTRecursively compute leader expressions
again
Leader Expressions
L |L| < |IN|
Smaller Circuit
OUT
IN
6
![Page 8: A Decomposition Algorithm to Structure Arithmetic Circuits Ajay K. Verma, Philip Brisk, Paolo Ienne Ecole Polytechnique Fédérale de Lausanne (EPFL) International](https://reader036.vdocument.in/reader036/viewer/2022062519/5697bfe81a28abf838cb62f6/html5/thumbnails/8.jpg)
8:4 Parallel Counter
sc
(Leader Expressions)
7
![Page 9: A Decomposition Algorithm to Structure Arithmetic Circuits Ajay K. Verma, Philip Brisk, Paolo Ienne Ecole Polytechnique Fédérale de Lausanne (EPFL) International](https://reader036.vdocument.in/reader036/viewer/2022062519/5697bfe81a28abf838cb62f6/html5/thumbnails/9.jpg)
Hierarchical Circuit Construction
Use leader expressions as building blocks to impose hierarchy
8
![Page 10: A Decomposition Algorithm to Structure Arithmetic Circuits Ajay K. Verma, Philip Brisk, Paolo Ienne Ecole Polytechnique Fédérale de Lausanne (EPFL) International](https://reader036.vdocument.in/reader036/viewer/2022062519/5697bfe81a28abf838cb62f6/html5/thumbnails/10.jpg)
Progressive Decomposition
• Choose a subset of input bits– How many bits?– Many different combinations?
• Find leader expressions– Optimize via Boolean ring properties– Find identities
• Discard dependent expressions
x y zz = f(x, y)
• Rewrite circuit in terms of leader expressions• Recursively process the remaining circuit
9
![Page 11: A Decomposition Algorithm to Structure Arithmetic Circuits Ajay K. Verma, Philip Brisk, Paolo Ienne Ecole Polytechnique Fédérale de Lausanne (EPFL) International](https://reader036.vdocument.in/reader036/viewer/2022062519/5697bfe81a28abf838cb62f6/html5/thumbnails/11.jpg)
Progressive Decomposition: Shortcomings and Concerns
• [Verma et al., DAC 2007]
• Entire algorithm based on Reed-Muller Form– Rewrite ‘your’ optimizer, e.g., if you use AIGs or BDDs.– Exponential blowup for leading one detector
• Cannot optimize multipliers
• Cannot optimize “structurable circuits” surrounded by peripheral logic
10
![Page 12: A Decomposition Algorithm to Structure Arithmetic Circuits Ajay K. Verma, Philip Brisk, Paolo Ienne Ecole Polytechnique Fédérale de Lausanne (EPFL) International](https://reader036.vdocument.in/reader036/viewer/2022062519/5697bfe81a28abf838cb62f6/html5/thumbnails/12.jpg)
M1 M2
48
E1 E2
19 19
4
sign
negs1 s2
xor
out
Compound CircuitsM1 M2
48
E1 E2
19 19
sign
not
out
and
1
4
s1 s2
xor
g72x
0.82 ns (7998 μm2)
12% faster, 55% larger
0.94 ns (5142 μm2)
11
![Page 13: A Decomposition Algorithm to Structure Arithmetic Circuits Ajay K. Verma, Philip Brisk, Paolo Ienne Ecole Polytechnique Fédérale de Lausanne (EPFL) International](https://reader036.vdocument.in/reader036/viewer/2022062519/5697bfe81a28abf838cb62f6/html5/thumbnails/13.jpg)
Support Sets
• Progressive Decomposition– Support sets are subsets of one another or disjoint– Blocks must always reduce the number of inputs
12
![Page 14: A Decomposition Algorithm to Structure Arithmetic Circuits Ajay K. Verma, Philip Brisk, Paolo Ienne Ecole Polytechnique Fédérale de Lausanne (EPFL) International](https://reader036.vdocument.in/reader036/viewer/2022062519/5697bfe81a28abf838cb62f6/html5/thumbnails/14.jpg)
Support Sets
• New Approach– Support sets may overlap– Relaxes input condensation constraint– Both conditions are necessary to support multipliers
13
![Page 15: A Decomposition Algorithm to Structure Arithmetic Circuits Ajay K. Verma, Philip Brisk, Paolo Ienne Ecole Polytechnique Fédérale de Lausanne (EPFL) International](https://reader036.vdocument.in/reader036/viewer/2022062519/5697bfe81a28abf838cb62f6/html5/thumbnails/15.jpg)
New Algorithm: Overview• Supports any representation with minimization algorithms
– We use BDDs
• Use SAT to check functional dependency – [Lee et al., ICCAD 2007]
• Restrict Operator computes generalized cofactors– [Coudert and Madre, ICCAD 1990]
• Entropy-based delay estimator – [Macii et al., GLS-VLSI 1999]– Imprecise, but effectively computes relative delays
14
![Page 16: A Decomposition Algorithm to Structure Arithmetic Circuits Ajay K. Verma, Philip Brisk, Paolo Ienne Ecole Polytechnique Fédérale de Lausanne (EPFL) International](https://reader036.vdocument.in/reader036/viewer/2022062519/5697bfe81a28abf838cb62f6/html5/thumbnails/16.jpg)
Input Bit Selection via Random Sampling
a5 a4 a3 a2 0 1
b5 b4 b3 b2 1 1 a5 1 a3 a2 a1 0
b5 b4 1 1 b1 b0
Complexity of a 4-bit adder Complexity of a 6-bit adder
• Select every combination of k input bits for k < 6• Randomly assign values to the bits• Estimate the complexity of the resulting circuit
15
![Page 17: A Decomposition Algorithm to Structure Arithmetic Circuits Ajay K. Verma, Philip Brisk, Paolo Ienne Ecole Polytechnique Fédérale de Lausanne (EPFL) International](https://reader036.vdocument.in/reader036/viewer/2022062519/5697bfe81a28abf838cb62f6/html5/thumbnails/17.jpg)
Computing Leader ExpressionsE – input expression
B – chosen input bits
S – leader expressions
found thus far
R – remaining bits
Is E functionally dependent on SR?
1. Randomly sample R’s assignment space
2. Find missing leader expressions using SAT [Lee et al. ICCAD 2007]
- Satisfying assignments provide missing leader expressions
- S contains all leader expressions if no satisfying assignments exist
E
B R
S
16
![Page 18: A Decomposition Algorithm to Structure Arithmetic Circuits Ajay K. Verma, Philip Brisk, Paolo Ienne Ecole Polytechnique Fédérale de Lausanne (EPFL) International](https://reader036.vdocument.in/reader036/viewer/2022062519/5697bfe81a28abf838cb62f6/html5/thumbnails/18.jpg)
Redundant Leader Expressions
• Leader expression is an input bit– Non-disjunctive decomposition is required– Remove from the set
• Leader expression contains no useful information– Remove from the set
a0b1 + a1b0
B = {a0, b0} a1 = b1 = 1 a0 + b0
Cannot help to compute the original expression
17
![Page 19: A Decomposition Algorithm to Structure Arithmetic Circuits Ajay K. Verma, Philip Brisk, Paolo Ienne Ecole Polytechnique Fédérale de Lausanne (EPFL) International](https://reader036.vdocument.in/reader036/viewer/2022062519/5697bfe81a28abf838cb62f6/html5/thumbnails/19.jpg)
• Generalized cofactors– E = (a+b)x + (ab + c)y– g = a + b + c
– E |g=0 = 0
– E |g=1 = (a + b)x + (ab + c)y = E
• The general case is a reduction to SAT– Problem instances tend to be small
Redundant Leader Expressions
18
![Page 20: A Decomposition Algorithm to Structure Arithmetic Circuits Ajay K. Verma, Philip Brisk, Paolo Ienne Ecole Polytechnique Fédérale de Lausanne (EPFL) International](https://reader036.vdocument.in/reader036/viewer/2022062519/5697bfe81a28abf838cb62f6/html5/thumbnails/20.jpg)
Rewrite Original Expression
• Rewrite as Shannon expansion using the Restrict operator – Generalized cofactors are not unique – The order in which the cofactors of each leader expression are
computed may affect the result
• For each cofactor:– Estimate the delay– Estimate delay of F based on
Shannon expansion
• Select the cofactor that leads to the minimal estimated delay of F
F = g(F |g=1) + g’(F |g=0)
D(F) max{D(F |g=1), D(F |g=0), D(g)} + D(mux)
F
F |g=0
F |g=1
g
19
![Page 21: A Decomposition Algorithm to Structure Arithmetic Circuits Ajay K. Verma, Philip Brisk, Paolo Ienne Ecole Polytechnique Fédérale de Lausanne (EPFL) International](https://reader036.vdocument.in/reader036/viewer/2022062519/5697bfe81a28abf838cb62f6/html5/thumbnails/21.jpg)
Rinse, Repeat
We picked this set of input bits to optimize!We generated a set of leader expressionsLocal optimization using your favorite LS tool can’t hurt.The leader expressions are now frozen, and the block that computes them is optimized. Optimize the remaining circuit
20
![Page 22: A Decomposition Algorithm to Structure Arithmetic Circuits Ajay K. Verma, Philip Brisk, Paolo Ienne Ecole Polytechnique Fédérale de Lausanne (EPFL) International](https://reader036.vdocument.in/reader036/viewer/2022062519/5697bfe81a28abf838cb62f6/html5/thumbnails/22.jpg)
Experimental Setup
Circuit written by hand
Known Arithmetic Circuits
Prog. Decomp.
Synopsis Design Compiler
- compile_ultra - minimize delay
Artisan Standard CellsUMC (90 nm)
1 2 3
New Algorith
m
4
21
![Page 23: A Decomposition Algorithm to Structure Arithmetic Circuits Ajay K. Verma, Philip Brisk, Paolo Ienne Ecole Polytechnique Fédérale de Lausanne (EPFL) International](https://reader036.vdocument.in/reader036/viewer/2022062519/5697bfe81a28abf838cb62f6/html5/thumbnails/23.jpg)
Critical Path Delay
0
0.1
0.2
0.3
0.4
0.5
0.6
0.7
0.8
0.9
1
16-bit ADD 12-bit 3ADD 8x8-bit MUL 16-bit SHIFT ADPCM g72x SAD
ns
Original Progressive Decomposition
Our Algorithm Library/Manual Implementation
Optimized for Area, Not DelayProgressive Decomposition
Fails
22
![Page 24: A Decomposition Algorithm to Structure Arithmetic Circuits Ajay K. Verma, Philip Brisk, Paolo Ienne Ecole Polytechnique Fédérale de Lausanne (EPFL) International](https://reader036.vdocument.in/reader036/viewer/2022062519/5697bfe81a28abf838cb62f6/html5/thumbnails/24.jpg)
Area
μm2
Original Progressive Decomposition
Our Algorithm Library/Manual Implementation
Optimized for Area, Not Delay
0
2000
4000
6000
8000
10000
12000
14000
16-bit ADD 12-bit 3ADD 8x8-bit MUL 16-bit SHIFT ADPCM g72x SAD
Progressive Decomposition Fails
23
![Page 25: A Decomposition Algorithm to Structure Arithmetic Circuits Ajay K. Verma, Philip Brisk, Paolo Ienne Ecole Polytechnique Fédérale de Lausanne (EPFL) International](https://reader036.vdocument.in/reader036/viewer/2022062519/5697bfe81a28abf838cb62f6/html5/thumbnails/25.jpg)
Conclusion
• Technique to structure arithmetic circuits– Fixes shortcomings of Progressive Decomposition
• Our approach is orthogonal to classical Boolean minimization techniques
• Discovered new implementation of a k-input MAX function– Similar structure to LZD Circuit– Will appear at ICCAD 2009
24
![Page 26: A Decomposition Algorithm to Structure Arithmetic Circuits Ajay K. Verma, Philip Brisk, Paolo Ienne Ecole Polytechnique Fédérale de Lausanne (EPFL) International](https://reader036.vdocument.in/reader036/viewer/2022062519/5697bfe81a28abf838cb62f6/html5/thumbnails/26.jpg)
Computing Leader Expressions
E
B R
S
Original Variables
B = {b1, …, bm}
R = {r1, …, rn}
Dummy Variables
C = {c1, …, cm}
S = {s1, …, sn}
Extra Constraints
[Lee et al., ICCAD 2007]
ri = si, 1 < i < n
For each leader expression ej:
ej(b1, …, bm) = ej(c1, …, cm)
E(b1, …, bm, r1, …, rn) E(c1, …, cm, s1, …, sn)
ej
Two different input assignments
![Page 27: A Decomposition Algorithm to Structure Arithmetic Circuits Ajay K. Verma, Philip Brisk, Paolo Ienne Ecole Polytechnique Fédérale de Lausanne (EPFL) International](https://reader036.vdocument.in/reader036/viewer/2022062519/5697bfe81a28abf838cb62f6/html5/thumbnails/27.jpg)
Input Bit Selection
EE’
X
B = {x, y, z}
E’ |xyz=000
E’ |xyz=001
E’ |xyz=111
…
B = {x, y, z}
Use delay estimator for each E’The complexity of E’ is the metric by which we evaluate each group of input bits
Compute every combination of k input bits for k < 6
Assign values to x, y, z using random sampling