EE241 - Spring 2011
Advanced Digital Integrated Circuits
Lecture 22: Adders
Announcements
Homework #4 due on Monday
Quiz #4 on Monday
Final exam next Wednesday!
  80 minutes, in class
Project reports due Wednesday, May 3, noon
  6 pages, double column
Project presentations on Wednesday, May 4, at 2pm in BWRC
  20 min + 5 min Q&A
Outline
Last lecture
  Domino logic
This lecture
  Other dynamic styles
  Digital arithmetic
Reading: Selected publications
Other Dynamic Logic Styles
Self-Resetting Domino
Signals exist as pulses, not levels
Used in Pentium 4 (130nm generation)
Pulsed Static CMOS
RH – Reset high
RL – Reset low
Fast pull-up Fast pull-down
Chen, Ditlow, US Pat. 5,495,188 Feb. 1996.
Sense-Amplifying Logic
Matsui, JSSC 12/94
SA-F/F
Falling edge Rising edge
Dynamic Logic with SA-F/F
Example
4-Bit Adder
20-Bit Carry-Skip Adder
Pentium 4 (Prescott) 7GHz Path
Deleganes, ISSCC’04
Sense Amplifier
Can build in logic if needed
Timing
Carry-Skip Adder
Adders
Arithmetic Circuits
Chapter 11, Rabaey, 2nd ed.
Selected journal publications
Books:
  Ercegovac and Lang, “Digital Arithmetic,” Elsevier, 2004
  “High-Speed VLSI Arithmetic Units: Adders and Multipliers,” by V. Oklobdzija, in Chandrakasan et al.
Adders
EE141
  Ripple carry & implementation
  Carry bypass (skip)
  Carry select
  Carry lookahead (basic)
EE241
  Conditional sum
  More carry lookahead
Conditional Sum Adders
Carry-in = 0: $s_i^0 = x_i \oplus y_i$, $c_{o,i}^0 = x_i y_i$
Carry-in = 1: $s_i^1 = \overline{x_i \oplus y_i}$, $c_{o,i}^1 = x_i + y_i$
Sklansky, Trans. on Comp., 6/60
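The conditional-sum scheme above can be sketched in Python: each bit produces both candidate (sum, carry) pairs, and neighbouring blocks are then merged with 2-way selects until one block remains. This is only an illustrative sketch. The function name, the LSB-first bit lists and the power-of-two word length are assumptions, not part of the lecture.

```python
def conditional_sum_add(x, y, n=8, cin=0):
    """Add two n-bit integers with a conditional-sum scheme (n a power of 2)."""
    xs = [(x >> i) & 1 for i in range(n)]
    ys = [(y >> i) & 1 for i in range(n)]
    # Per-bit blocks: (sum bits, carry-out) for carry-in = 0 and carry-in = 1.
    blocks = []
    for a, b in zip(xs, ys):
        s0, c0 = a ^ b, a & b            # s_i^0, c_{o,i}^0
        s1, c1 = 1 - (a ^ b), a | b      # s_i^1, c_{o,i}^1
        blocks.append((([s0], c0), ([s1], c1)))
    # Merge neighbouring blocks; the low block's carry-out selects the high half.
    while len(blocks) > 1:
        merged = []
        for lo, hi in zip(blocks[0::2], blocks[1::2]):
            pair = []
            for k in (0, 1):             # assumed carry-in to the merged block
                lo_sum, lo_c = lo[k]
                hi_sum, hi_c = hi[lo_c]  # 2-way MUX controlled by lo_c
                pair.append((lo_sum + hi_sum, hi_c))
            merged.append(tuple(pair))
        blocks = merged
    bits, cout = blocks[0][cin]
    return sum(b << i for i, b in enumerate(bits)), cout
```

Each pass of the `while` loop doubles the block width, so an n-bit add needs log2(n) levels of selects after the per-bit stage.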
Conditional Sum Adders
TG Conditional Sum
Conditional Cell
Conditional Sum Adder
2-way MUXes
Rothermel, JSSC 89
TG Conditional Sum
Serial connection of transmission gates
Chain length = $1 + \log_2 n$
Signal propagation
DPL Conditional Sum
CLA
“Conditional carry select”
DPL Conditional Sum
Block Conditional Sums
Carry-Lookahead Adders
Adder trees
  Radix of a tree
  Minimum depth trees
  Sparse trees
Logic manipulations
  Conventional vs. Ling
  Stack height limiting
Lookahead Adder: Basic Idea
[Figure: N-bit lookahead adder with inputs (A0, B0), (A1, B1), ..., (AN-1, BN-1), propagate signals P0 ... PN-1, carries Ci,0 ... Ci,N-1 and sums S0 ... SN-1]
$C_{i,k+1} = C_{o,k} = f(A_k, B_k, C_{i,k}) = g_k + p_k C_{i,k}$
Propagate and Generate Signals
Define 2 (or 3) new variables which ONLY depend on inputs $a_k$, $b_k$
Generate: $g_k = a_k b_k$
Propagate: $p_k = a_k + b_k$ (could be XOR as well)
(Delete: $d_k = \overline{a_k}\,\overline{b_k}$)
$c_{out}(g_k, p_k) = g_k + p_k c_{in}$
$s(g_k, p_k) = a_k \oplus b_k \oplus c_{in}$
Can also derive expressions for $s$ and $c_{out}$ based on $d_k$ and $p_k$
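As a quick check of these definitions, the sketch below computes generate, propagate and delete for one bit and evaluates the carry and sum expressions. The function names are illustrative assumptions. The form of the carry written with delete and the XOR propagate is one possible derivation, not necessarily the one intended on the slide.

```python
def pgd(a, b):
    """Return (generate, propagate, delete) for one bit position."""
    g = a & b
    p = a | b                  # OR form of propagate
    d = (1 - a) & (1 - b)      # delete: both inputs are 0
    return g, p, d

def carry_and_sum(a, b, cin):
    """c_out = g + p*c_in and s = a xor b xor c_in."""
    g, p, _ = pgd(a, b)
    return g | (p & cin), a ^ b ^ cin

def carry_from_delete(a, b, cin):
    """Carry-out written with delete and the XOR form of propagate."""
    _, _, d = pgd(a, b)
    p_xor = a ^ b
    return (1 - d) & ((1 - p_xor) | cin)
```

Enumerating all eight input combinations confirms that both carry forms agree with the full-adder truth table.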
Lookahead Adder
Lookahead Equations
Position k:
$c_k = g_k + p_k c_{k-1}$
Position k + 1:
$c_{k+1} = g_{k+1} + p_{k+1} c_k$
$\quad = g_{k+1} + p_{k+1} (g_k + p_k c_{k-1})$
$\quad = g_{k+1} + p_{k+1} g_k + p_{k+1} p_k c_{k-1}$
Carry exists if:
- generated in stage k + 1
- generated in stage k and propagated through k + 1
- propagated through both k and k + 1
Lookahead Adder
• Unrolling of carry recurrence can be continued
• If unrolled to level k, the result is a two-level AND-OR structure
• AND Fan-In = k + 1, OR Fan-In = k + 1
• k + 1 transistors in the MOS stack
• Limits k to 2 – 4
• Later referred to as a radix of an adder
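The fan-in counts can be checked with a small sketch that builds the unrolled product terms for a k-bit group and compares the two-level form with a bit-by-bit ripple. Signal names such as `g0`, `p0` and `cin`, and all function names, are assumptions made for illustration.

```python
from itertools import product

def unrolled_carry_terms(k):
    """Product terms of the carry out of a k-bit group, unrolled to two levels.

    Each term is a list of signal names that are ANDed; the terms are ORed.
    """
    terms = []
    for j in range(k - 1, -1, -1):
        # g_j propagated through all more significant bits of the group
        terms.append([f"p{i}" for i in range(k - 1, j, -1)] + [f"g{j}"])
    # carry-in propagated through every bit of the group
    terms.append([f"p{i}" for i in range(k - 1, -1, -1)] + ["cin"])
    return terms

def eval_terms(terms, values):
    """Evaluate the AND-OR structure for a dict of signal values."""
    return int(any(all(values[name] for name in term) for term in terms))

def matches_ripple(k):
    """Compare the two-level form with a ripple evaluation for all inputs."""
    terms = unrolled_carry_terms(k)
    for bits in product((0, 1), repeat=2 * k + 1):
        a, b, cin = bits[:k], bits[k:2 * k], bits[-1]
        values, c = {"cin": cin}, cin
        for i in range(k):
            values[f"g{i}"] = a[i] & b[i]
            values[f"p{i}"] = a[i] | b[i]
            c = values[f"g{i}"] | (values[f"p{i}"] & c)
        if eval_terms(terms, values) != c:
            return False
    return True
```

For a k-bit group, `len(unrolled_carry_terms(k))` gives the OR fan-in and the longest term gives the AND fan-in. Both come out as k + 1.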
Carry Lookahead Trees
$C_{o,0} = G_0 + P_0 C_{i,0}$
$C_{o,1} = G_1 + P_1 G_0 + P_1 P_0 C_{i,0}$
$C_{o,2} = G_2 + P_2 G_1 + P_2 P_1 G_0 + P_2 P_1 P_0 C_{i,0}$
$\quad = (G_2 + P_2 G_1) + (P_2 P_1)(G_0 + P_0 C_{i,0}) = G_{2:1} + P_{2:1} C_{o,0}$
Can continue building the tree hierarchically
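A brute-force check of the regrouping of $C_{o,2}$ into group signals is sketched below. The function names and the tuple layout (index 0 is the least significant bit) are assumptions for illustration.

```python
from itertools import product

def carry_flat(g, p, cin):
    """C_o,2 from the fully expanded two-level expression."""
    return (g[2] | (p[2] & g[1]) | (p[2] & p[1] & g[0])
            | (p[2] & p[1] & p[0] & cin))

def carry_hierarchical(g, p, cin):
    """C_o,2 from the group signals G_2:1, P_2:1 and C_o,0."""
    co0 = g[0] | (p[0] & cin)
    g21 = g[2] | (p[2] & g[1])   # group generate over bits 2..1
    p21 = p[2] & p[1]            # group propagate over bits 2..1
    return g21 | (p21 & co0)

def forms_agree():
    """Check both forms for every combination of g, p and carry-in."""
    return all(
        carry_flat(v[0:3], v[3:6], v[6]) == carry_hierarchical(v[0:3], v[3:6], v[6])
        for v in product((0, 1), repeat=7)
    )
```

The hierarchical form needs only 2-input merges, which is what allows the tree to be built level by level.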
Tree Adders
$P_G = p_m p_l$
$G_G = g_m + p_m g_l$
m – more significant
l – less significant
Start from the input P, G, and continue up the tree
2-bit groups, then 4-bit groups, …
$(G_G, P_G) = (g_m, p_m) \bullet (g_l, p_l) = (g_m + p_m g_l,\; p_m p_l)$
Kogge, Stone, Trans. on Comp., ’73
Radix 2
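The merge operator can be written directly as a function on (g, p) pairs. Because the operator is associative, groups can be combined in any bracketing, which is what lets the tree grow from 2-bit to 4-bit groups and beyond. The names `dot`, `group_pg` and `is_associative` are illustrative assumptions.

```python
from functools import reduce
from itertools import product

def dot(hi, lo):
    """(g_m, p_m) . (g_l, p_l) = (g_m + p_m g_l, p_m p_l); hi is more significant."""
    g_m, p_m = hi
    g_l, p_l = lo
    return (g_m | (p_m & g_l), p_m & p_l)

def group_pg(pairs):
    """Combine (g, p) pairs, listed most significant first, into one group pair."""
    return reduce(dot, pairs)

def is_associative():
    """Check (a . b) . c == a . (b . c) for every combination of bit pairs."""
    vals = list(product((0, 1), repeat=2))
    return all(dot(dot(a, b), c) == dot(a, dot(b, c))
               for a in vals for b in vals for c in vals)
```

For example, `group_pg([(0, 1), (0, 1), (1, 0)])` describes two propagating bits above a generating bit, so the 3-bit group generates a carry and does not propagate.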
Adder Structure
Carry tree and sum precompute operate in parallel
Sum select – selects the correct precomputed sum based on final carry
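The split into carry computation, sum precompute and sum select can be sketched as follows. To keep the example short, a simple ripple loop stands in for the carry tree. That substitution, the function name and the 16-bit default are assumptions for illustration only.

```python
def add_with_sum_select(x, y, n=16):
    """Add two n-bit integers using precomputed sums and a per-bit select."""
    a = [(x >> i) & 1 for i in range(n)]
    b = [(y >> i) & 1 for i in range(n)]
    # Sum precompute: both candidates per bit, independent of any carry.
    s0 = [ai ^ bi for ai, bi in zip(a, b)]   # carry into bit i is 0
    s1 = [1 - s for s in s0]                 # carry into bit i is 1
    # Carry computation (a ripple loop stands in for the carry tree here).
    carries, c = [], 0
    for ai, bi in zip(a, b):
        carries.append(c)                    # carry into this bit
        c = (ai & bi) | ((ai | bi) & c)
    # Sum select: a 2-way MUX per bit controlled by the carry into that bit.
    bits = [s1[i] if carries[i] else s0[i] for i in range(n)]
    return sum(bit << i for i, bit in enumerate(bits)), c
```

Since `s0` and `s1` do not depend on `carries`, hardware can compute them while the carry tree is still evaluating. Only the final MUX lies after the carry.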
Adder Optimization
If given
  Input capacitance
  Overall fanout (loading capacitance)
  Wiring structure
  Adder topology
Optimization can be performed to:
  Minimize the delay subject to a power constraint
  Minimize the power for a given delay constraint
Design Considerations for CLA Adders
Wire capacitance is determined by the microarchitecture
  Carry signals cross a certain number of bit slices
The adder topology determines the wire capacitance
  Weak function of gate sizing
The capacitance of wires depends on the tree topology and wiring/shielding methodology
[Figure: datapath bit slices 0 to 63. Labels: from register files / cache / bypass, multiplexers, shifter, adder stages 1 to 3 with wiring between them, sum select, loopback bus, to register files / cache]
Specifying the Output Capacitance
Fanout is dictated by the architecture
In Itanium, each IEU drives 6 other IEUs, register files and the cache, through a long bus
Thus the fanout is larger than 15-20, but depends on the ratio of the IEU input capacitance to the bus capacitance
Bus is driven through a buffer, thus reducing the adder fanout to close to 1.
Specifying the Input Capacitance
Larger Cin:
  Less impact of internal wires
  Less fanout (less impact of the bus)
  Faster adder
  Power grows linearly with Cin
Smaller Cin:
  Larger impact of internal wires
  Larger fanout
  Slower, lower power adder
Optimum tradeoff:
  For desired dE/dD (for both adder and 6 IEUs) find optimal Cg/Cw
  For example, dE/dD = 2, Cg/Cw = 2.5-3
Carry Tree Considerations
Number of signals merging at each stage (radix)
Uniform vs. non-uniform
Number of logic levels
Full vs. sparse trees
Tree Adders: Kogge-Stone
[Figure: 16-bit radix-2 Kogge-Stone tree, inputs (A0, B0) to (A15, B15), outputs S0 to S15]
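A behavioural sketch of a radix-2 Kogge-Stone adder is given below. At each level every bit position merges its group with the group ending `dist` bits lower, and `dist` doubles per level. The function name, the zero carry-in and the returned level count are assumptions for illustration. This models the logic only, not sizing or wiring.

```python
def kogge_stone_add(x, y, n=16):
    """Add two n-bit integers with a radix-2 Kogge-Stone prefix tree (carry-in = 0)."""
    g = [((x >> i) & 1) & ((y >> i) & 1) for i in range(n)]
    p = [((x >> i) & 1) ^ ((y >> i) & 1) for i in range(n)]
    G, P = g[:], p[:]
    dist, levels = 1, 0
    while dist < n:
        # Every position i >= dist merges with the group ending at i - dist.
        # Both new lists are built from the old G and P before reassignment.
        G, P = (
            [G[i] | (P[i] & G[i - dist]) if i >= dist else G[i] for i in range(n)],
            [P[i] & P[i - dist] if i >= dist else P[i] for i in range(n)],
        )
        dist *= 2
        levels += 1
    # G[i] is now the carry out of bit i; the carry into bit i is G[i - 1].
    carry_in = [0] + G[:-1]
    total = sum((p[i] ^ carry_in[i]) << i for i in range(n))
    return total, G[-1], levels
```

For n = 16 the loop runs four times, matching the four merge levels of a 16-bit radix-2 tree.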
Tree Adders: Other Trees
Ladner-Fischer
[Figure: 16-bit Ladner-Fischer tree, inputs (A0, B0) to (A15, B15), outputs S0 to S15]
Tree Adders: Radix 4
[Figure: 16-bit radix-4 Kogge-Stone tree, inputs (a0, b0) to (a15, b15), outputs S0 to S15]
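The effect of radix on the number of levels can be illustrated with a prefix tree whose radix is a parameter: each level merges up to `radix` groups spaced `dist` bits apart, then multiplies `dist` by the radix. This is a logic-level sketch under assumed names (`prefix_carries`, `ripple_carries`). It says nothing about the gate delay of the wider merge cells.

```python
def prefix_carries(g, p, radix=4):
    """Carry out of every bit from a Kogge-Stone style tree of the given radix."""
    n = len(g)
    G, P = list(g), list(p)
    dist, levels = 1, 0
    while dist < n:
        new_g, new_p = G[:], P[:]
        for i in range(n):
            acc_g, acc_p = G[i], P[i]
            # Merge the group ending at i with up to radix - 1 less
            # significant groups, each dist bits further down.
            for k in range(1, radix):
                j = i - k * dist
                if j < 0:
                    break
                acc_g |= acc_p & G[j]
                acc_p &= P[j]
            new_g[i], new_p[i] = acc_g, acc_p
        G, P = new_g, new_p
        dist *= radix
        levels += 1
    return G, levels

def ripple_carries(g, p):
    """Reference: carry out of every bit computed one bit at a time."""
    out, c = [], 0
    for gi, pi in zip(g, p):
        c = gi | (pi & c)
        out.append(c)
    return out
```

With 16 bits, `radix=4` needs two levels and `radix=2` needs four, at the cost of taller stacks in each radix-4 merge.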
Ling Adder
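Conventional CLA:
$g_i = a_i b_i$
$p_i = a_i \oplus b_i$
$G_{i:0} = g_i + p_i G_{i-1:0}$
$S_i = a_i \oplus b_i \oplus G_{i-1:0}$
Ling’s equations:
$g_i = a_i b_i$
$t_i = a_i + b_i$
$H_{i:0} = g_i + t_{i-1} H_{i-1:0}$
$S_i = (t_i \oplus H_{i:0}) + g_i t_{i-1} H_{i-1:0}$
Ling, IBM J. Res. Dev., 5/81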
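Ling's recurrences can be checked against ordinary addition with a short sketch. It evaluates the pseudo-carry $H_{i:0}$ serially, which is enough to validate the equations but is not how a lookahead tree would compute them. The function name, the zero carry-in and the boundary values $H_{-1:0} = t_{-1} = 0$ are assumptions.

```python
def ling_add(x, y, n=16):
    """Add two n-bit integers using Ling's pseudo-carry H (carry-in = 0)."""
    a = [(x >> i) & 1 for i in range(n)]
    b = [(y >> i) & 1 for i in range(n)]
    g = [ai & bi for ai, bi in zip(a, b)]
    t = [ai | bi for ai, bi in zip(a, b)]
    bits = []
    h_prev, t_prev = 0, 0          # H_{-1:0} and t_{-1}: no carry-in
    for i in range(n):
        h = g[i] | (t_prev & h_prev)                 # H_{i:0}
        s = (t[i] ^ h) | (g[i] & t_prev & h_prev)    # S_i
        bits.append(s)
        h_prev, t_prev = h, t[i]
    carry_out = t_prev & h_prev    # G_{n-1:0} = t_{n-1} H_{n-1:0}
    return sum(bit << i for i, bit in enumerate(bits)), carry_out
```

The true carry into bit i is recovered as $t_{i-1} H_{i-1:0}$, which is why the sum logic picks up the extra $t$ term.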
Ling Adder
Conventional radix-4:
$G_{3:0} = g_3 + p_3 g_2 + p_3 p_2 g_1 + p_3 p_2 p_1 g_0$
Ling’s radix-4:
$H_{3:0} = g_3 + t_2 g_2 + t_2 t_1 g_1 + t_2 t_1 t_0 g_0$
$\quad = g_3 + g_2 + t_2 g_1 + t_2 t_1 g_0$
Reduces the stack height (or width)
Reduces input loading
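The simplification of $H_{3:0}$ and its relation to $G_{3:0}$ can be verified exhaustively over all 4-bit inputs. In the sketch below the conventional propagate is taken as XOR and $t$ as OR. The function name is an assumption for illustration.

```python
from itertools import product

def check_radix4_ling():
    """Verify the radix-4 Ling identities for every pair of 4-bit inputs."""
    for bits in product((0, 1), repeat=8):
        a, b = bits[:4], bits[4:]
        g = [x & y for x, y in zip(a, b)]
        p = [x ^ y for x, y in zip(a, b)]   # conventional propagate
        t = [x | y for x, y in zip(a, b)]   # Ling's transmit
        G = (g[3] | (p[3] & g[2]) | (p[3] & p[2] & g[1])
             | (p[3] & p[2] & p[1] & g[0]))
        H_full = (g[3] | (t[2] & g[2]) | (t[2] & t[1] & g[1])
                  | (t[2] & t[1] & t[0] & g[0]))
        H = g[3] | g[2] | (t[2] & g[1]) | (t[2] & t[1] & g[0])
        # The simplified H must equal the expanded one, and G = t3 * H.
        if H_full != H or G != (t[3] & H):
            return False
    return True
```

The longest product term in `G` has four literals while the longest in the simplified `H` has three, which is the stack-height reduction noted above.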
Ling vs. CLA
[Figure: clocked (CK) gate schematics for the conventional G3 and Ling’s H3, drawn in terms of the a and b inputs]
Ling vs. CLA: Sum Pre-Computation
Conventional CLA:
$S_i^0 = a_i \oplus b_i$
$S_i^1 = \overline{a_i \oplus b_i}$
Ling’s:
$S_i^0 = a_i \oplus b_i$
$S_i^1 = a_i \oplus b_i \oplus (a_{i-1} + b_{i-1})$
Ling vs. CLA (64 bit)
[Figure: energy [pJ] (axis 4 to 49) vs. delay [FO4] (axis 7 to 15) for radix-2 Ling, radix-4 Ling, radix-2 CLA and radix-4 CLA; annotations: 0.5 FO4, 1 FO4]
Ling vs. CLA
Tradeoff between the first carry and the sum circuit complexity
Later carry stages are unchanged from conventional CLA
Reduced input loading and a smaller stack speed up the carry
Sum gets more complex
With tight power constraints Ling is slower than CLA
Next Lecture
Finish adders
Wrap-up