ee241 - spring 2005 (bwrcs.eecs.berkeley.edu/classes/icdesign/ee241_s05/...)
EE241 - Spring 2005
Advanced Digital Integrated Circuits
Lecture 6: Optimization for Performance
Admin
Project proposals due by Friday 5 pm (by e-mail to Huifang and myself)
Title
Short abstract of 10-15 lines describing the problem you are trying to address
Special office hours today right after class (3:30-4:30pm)
Some feedback on ISSCC? What caught your eye?
Today’s lecture
Using the models we have created so far to create an environment for optimization
Reading: ICCAD paper by Stojanovic et al.
Chapters 2 and 3 in the text by K. Bernstein (High Speed CMOS Design Styles)
Background material from Rabaey, 2nd ed., Chapters 5 and 6
Static Timing Analysis
Computing the critical (longest) path delay: longest-path algorithm on a DAG [Kirkpatrick, IBM J. R&D, 1966]
Used in most ASIC designs today
Limitations:
False paths
Simultaneous arrival times
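The longest-path computation on a DAG can be sketched in Python as below. This is a minimal sketch. It assumes each timing arc has a fixed delay and that primary-input arrival times are known. The function `arrival_times` and the small netlist are hypothetical and are not taken from any STA tool. Like the method on the slide, it has no notion of false paths or simultaneous arrivals.

```python
from collections import defaultdict, deque

def arrival_times(edges, primary_inputs):
    """Latest arrival time at every node of a timing DAG.

    edges: list of (u, v, delay) timing arcs.
    primary_inputs: {node: arrival time} for the inputs.
    Nodes are processed in topological order (Kahn's algorithm), so every
    arc is relaxed exactly once: O(V + E).
    """
    succ = defaultdict(list)
    indeg = defaultdict(int)
    nodes = set(primary_inputs)
    for u, v, d in edges:
        succ[u].append((v, d))
        indeg[v] += 1
        nodes.update((u, v))
    # Non-input nodes start at -inf until some arc reaches them.
    at = {n: primary_inputs.get(n, float("-inf")) for n in nodes}
    ready = deque(n for n in nodes if indeg[n] == 0)
    while ready:
        u = ready.popleft()
        for v, d in succ[u]:
            at[v] = max(at[v], at[u] + d)  # keep the latest (longest-path) arrival
            indeg[v] -= 1
            if indeg[v] == 0:
                ready.append(v)
    return at

# Hypothetical netlist: a, b -> g1 -> g2 -> out, with c also feeding g2.
edges = [("a", "g1", 1.0), ("b", "g1", 1.0), ("g1", "g2", 2.0),
         ("c", "g2", 1.0), ("g2", "out", 1.0)]
at = arrival_times(edges, {"a": 0.0, "b": 0.5, "c": 3.0})
print(at["out"])  # 5.0: the late input c sets the critical path
```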
Signal Arrival Times
NAND gate:
Simultaneous Arrival Times
NAND gate:
Impact of Arrival Times
(Figure: delay vs. arrival-time difference tA - tB for inputs A and B; the delay depends on whether A or B arrives early, with a difference of up to 25%)
Optimization for Performance
Performance-critical blocks
Start with a synthesized design:
Easier to explore architectures
Easy to verify
Provides some level of performance optimization
Understand the limits of synthesized designs
Performance Optimization
(Figure: power vs. delay)
Increasing the performance increases power!
Performance Optimization
(Figure: power vs. delay for microarchitecture A and microarchitecture B)
Performance Optimization
(Figure: power vs. delay for synthesized microarchitecture A, microarchitecture B, and custom microarchitecture A)
How to Increase Performance?
Scale technology
Circuit level: transistor sizing, buffering
Wire optimization, repeaters
Supply and threshold voltage
Logic styles
Timing, latches
Microarchitecture: block topologies (adders, multipliers)
Pipelining
Parallelism
Sizing Logic Paths for Speed
Frequently, the input capacitance of a logic path is constrained
Logic has to drive some capacitance
Example: the ALU load in an Intel microprocessor is > 0.5 pF
How do we size the ALU datapath to achieve maximum speed?
Review the method of logical effort
Inverter Chain
(Figure: inverter chain from In to Out driving load CL)
If CL and Cin are given:
- How many stages are needed to minimize the delay?
- How should the inverters be sized?
May need some additional constraints.
Delay Formula
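$$\text{Delay} \sim R_W\,(C_{int} + C_L)$$
$$t_p = k\,R_W C_{int}\left(1 + \frac{C_L}{C_{int}}\right) = t_{p0}\left(1 + \frac{f}{\gamma}\right)$$
$C_{int} = \gamma\,C_{gin}$, with $\gamma \approx 1$
$f = C_L/C_{gin}$: effective fanout
$R_W = R_{unit}/W$; $C_{int} = W\,C_{unit}$
$t_{p0} = 0.7\,R_{unit}C_{unit}$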
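Plugging numbers into both forms of the delay formula shows why only the fanout matters. Widening the inverter by W divides R_W by W and multiplies C_int by W, so t_p0 does not change. This is a minimal sketch. The unit values (10 kOhm, 1 fF), the width and the load are hypothetical.

```python
def inverter_delay(W, C_L, R_unit, C_unit, k=0.7):
    """Delay ~ k * R_W * (C_int + C_L), with R_W = R_unit / W and C_int = W * C_unit."""
    R_W = R_unit / W
    C_int = W * C_unit
    return k * R_W * (C_int + C_L)

def inverter_delay_fanout(f, gamma, tp0):
    """The same delay written as t_p0 * (1 + f / gamma)."""
    return tp0 * (1 + f / gamma)

# Hypothetical unit values: R_unit = 10 kOhm, C_unit = 1 fF, gamma = 1.
R_unit, C_unit, gamma = 10e3, 1e-15, 1.0
W, C_L = 3.0, 12e-15
C_gin = W * C_unit / gamma      # from C_int = gamma * C_gin
f = C_L / C_gin                 # effective fanout, about 4
tp0 = 0.7 * R_unit * C_unit     # k * R_W * C_int: the W factors cancel
print(inverter_delay(W, C_L, R_unit, C_unit), inverter_delay_fanout(f, gamma, tp0))
```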
Apply to Inverter Chain
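(Figure: chain of N inverters, stages 1, 2, …, N, from In to Out, driving CL)
$$t_p = t_{p,1} + t_{p,2} + \dots + t_{p,N}$$
$$t_{p,j} \sim R_{unit}C_{unit}\left(1 + \frac{C_{gin,j+1}}{\gamma\,C_{gin,j}}\right)$$
$$t_p = \sum_{j=1}^{N} t_{p,j} = t_{p0}\sum_{j=1}^{N}\left(1 + \frac{C_{gin,j+1}}{\gamma\,C_{gin,j}}\right), \qquad C_{gin,N+1} = C_L$$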
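The summed chain delay can be evaluated directly from the stage input capacitances. This is a minimal Python sketch. It assumes capacitances in arbitrary consistent units and reports delay in units of t_p0. The helper name `chain_delay` is hypothetical.

```python
def chain_delay(c_gin, C_L, tp0=1.0, gamma=1.0):
    """t_p = t_p0 * sum_j (1 + C_gin,j+1 / (gamma * C_gin,j)), with C_gin,N+1 = C_L.

    c_gin: input capacitances [C_gin,1, ..., C_gin,N] of the N inverters.
    """
    caps = list(c_gin) + [C_L]  # append the load as C_gin,N+1
    return tp0 * sum(1 + caps[j + 1] / (gamma * caps[j]) for j in range(len(c_gin)))

print(chain_delay([1, 2, 4], C_L=8))  # fanouts 2, 2, 2 -> 9.0
print(chain_delay([1, 1, 4], C_L=8))  # uneven fanouts 1, 4, 2 -> 10.0
```

The same total capacitance ratio (8) gives a larger delay when it is split unevenly across the stages. This is the motivation for the tapering result that follows.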
Apply to Inverter Chain
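(Figure: chain of N inverters, stages 1, 2, …, N, from In to Out, driving CL)
$$t_p = t_{p,1} + t_{p,2} + \dots + t_{p,N}$$
With $\gamma = 1$:
$$t_{p,j} \sim R_{unit}C_{unit}\left(1 + \frac{C_{gin,j+1}}{C_{gin,j}}\right)$$
$$t_p = \sum_{j=1}^{N} t_{p,j} = t_{p0}\sum_{j=1}^{N}\left(1 + \frac{C_{gin,j+1}}{C_{gin,j}}\right), \qquad C_{gin,N+1} = C_L$$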
Optimal Tapering for Given N
Delay equation has N - 1 unknowns, Cgin,2 – Cgin,N
To minimize the delay, set the N - 1 partial derivatives to zero
Result: Cgin,j+1/Cgin,j = Cgin,j/Cgin,j-1
Size of each stage is the geometric mean of its two neighbors:
$$C_{gin,j} = \sqrt{C_{gin,j-1}\,C_{gin,j+1}}$$
- each stage has the same effective fanout (Cout/Cin)
- each stage has the same delay
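The geometric-mean condition can also be checked numerically. Starting from a poor sizing, each free stage is repeatedly replaced by the geometric mean of its neighbors. In log space this is a Gauss-Seidel iteration that settles on equally spaced logarithms, that is, a constant fanout. This is a sketch under those assumptions. The function name and the starting point are hypothetical.

```python
import math

def taper_by_relaxation(C1, C_L, N, sweeps=200):
    """Enforce C_gin,j = sqrt(C_gin,j-1 * C_gin,j+1) for the free stages.

    C_gin,1 = C1 and C_gin,N+1 = C_L are fixed; C_gin,2 .. C_gin,N are free.
    In log space this is Gauss-Seidel on x_j = (x_{j-1} + x_{j+1}) / 2, which
    converges to a geometric progression with ratio (C_L / C1) ** (1 / N).
    """
    caps = [C1] * N + [C_L]  # crude start: every free stage equal to C1
    for _ in range(sweeps):
        for j in range(1, N):
            caps[j] = math.sqrt(caps[j - 1] * caps[j + 1])
    return caps

caps = taper_by_relaxation(1.0, 64.0, N=3)
print([round(c, 6) for c in caps])  # [1.0, 4.0, 16.0, 64.0]
```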
Optimum Delay and Number of Stages
When each stage is sized by f and has the same effective fanout f:
$$f^N = F = C_L / C_{gin,1}$$
Effective fanout of each stage:
$$f = \sqrt[N]{F}$$
Minimum path delay:
$$t_p = N\,t_{p0}\left(1 + \frac{\sqrt[N]{F}}{\gamma}\right)$$
Example
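(Figure: three-stage inverter chain from In to Out with stage input capacitances C1, f·C1, f²·C1 and load CL = 8 C1)
CL/C1 has to be evenly distributed across N = 3 stages:
$$f = \sqrt[3]{8} = 2$$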
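The example can be reproduced with a few lines of Python. The sketch below distributes F = CL/C1 evenly over N stages and returns the fanout, the stage sizes and the minimum delay N·t_p0·(1 + f/γ). It assumes delay in units of t_p0 and γ = 1 by default. `tapered_chain` is a hypothetical helper name.

```python
def tapered_chain(C1, C_L, N, tp0=1.0, gamma=1.0):
    """Evenly distribute F = C_L / C1 over N stages.

    Returns the per-stage fanout f = F ** (1/N), the stage input
    capacitances C1 * f ** i, and the minimum delay N * tp0 * (1 + f / gamma).
    """
    f = (C_L / C1) ** (1.0 / N)
    sizes = [C1 * f ** i for i in range(N)]
    return f, sizes, N * tp0 * (1 + f / gamma)

f, sizes, tp = tapered_chain(C1=1.0, C_L=8.0, N=3)
print(round(f, 6), [round(s, 6) for s in sizes], round(tp, 6))  # 2.0 [1.0, 2.0, 4.0] 9.0
```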
Optimum Number of Stages
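For a given load CL and a given input capacitance Cin, find the optimal number of stages N and the optimal sizing f:
$$C_L = F \cdot C_{in} = f^N C_{in}, \qquad N = \frac{\ln F}{\ln f}$$
$$t_p = N\,t_{p0}\left(\frac{F^{1/N}}{\gamma} + 1\right) = \frac{t_{p0}\ln F}{\gamma}\left(\frac{f + \gamma}{\ln f}\right)$$
$$\frac{\partial t_p}{\partial f} = \frac{t_{p0}\ln F}{\gamma}\cdot\frac{\ln f - 1 - \gamma/f}{\ln^2 f} = 0$$
$$f = e^{\,1 + \gamma/f}$$
For γ = 0: f = e, N = ln F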
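The condition f = e^(1 + γ/f) has no closed-form solution for γ > 0, but it is easy to solve numerically. This is a minimal bisection sketch. It relies on g(f) = ln f - 1 - γ/f being increasing for f > 0 and γ ≥ 0, and on the bracket [1, 100] containing the root for the γ values plotted on the next slide. The function name and bracket are assumptions.

```python
import math

def optimal_fanout(gamma, lo=1.0, hi=100.0, tol=1e-12):
    """Solve f = exp(1 + gamma / f), i.e. ln f - 1 - gamma / f = 0, by bisection.

    g(f) = ln f - 1 - gamma / f is increasing in f; it is negative at f = 1
    and positive at f = 100 for 0 <= gamma < 360.
    """
    g = lambda f: math.log(f) - 1 - gamma / f
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if g(mid) < 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

for gamma in (0.0, 0.5, 1.0, 2.0, 3.0):
    print(gamma, round(optimal_fanout(gamma), 3))
```

For γ = 0 this returns e, and for γ = 1 it gives about 3.59, in line with f_opt ≈ 3.6 quoted on the next slide.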
Optimum Effective Fanout f
Optimum f for a given process, defined by γ:
$$f = e^{\,1 + \gamma/f}$$
f_opt ≈ 3.6 for γ = 1
(Plot: f_opt vs. γ, for γ from 0 to 3)
Impact of Loading on tp
With self-loading, γ = 1
(Plot: normalized delay vs. effective fanout f, for f from 1 to 5)
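The shape of this curve can be checked numerically. The sketch assumes the plotted "normalized delay" is (f + γ)/ln f, the factor multiplying t_p0·ln F/γ in the optimum-stage derivation. It divides by the value at f_opt ≈ 3.591 for γ = 1. The helper name is hypothetical.

```python
import math

def normalized_delay(f, gamma=1.0):
    """(f + gamma) / ln f: t_p in units of t_p0 * ln(F) / gamma (requires f > 1)."""
    return (f + gamma) / math.log(f)

F_OPT = 3.591  # optimum for gamma = 1, from f = exp(1 + gamma / f)
for f in (1.5, 2.0, 3.0, 3.6, 4.0, 5.0):
    print(f, round(normalized_delay(f) / normalized_delay(F_OPT), 3))
```

The minimum is shallow. A fanout of 4 costs well under 1% over the optimum and a fanout of 5 about 4%, while a fanout of 2 costs roughly 20%. This is why a stage ratio near 4 is a common rule of thumb.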
Extending the Model
For given N: Ci+1/Ci = Ci/Ci-1
To find N: Ci+1/Ci ~ 4
Method of logical effort generalizes this to any logic path
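(Figure: N-stage logic path from In to Out driving CL)
$$\text{Delay} = \sum_{i=1}^{N}\left(p_i + g_i \cdot f_i\right) \qquad \text{(in units of } \tau_{inv}\text{)}$$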
Logical Effort
$$\text{Delay} = k\,R_{unit}C_{unit}\left(1 + \frac{C_L}{\gamma\,C_{in}}\right) = \tau\,(p + g \cdot f)$$
p: intrinsic delay, a gate parameter independent of W
g: logical effort, a gate parameter independent of W
f: electrical effort (fanout)
Normalize everything to an inverter: g_inv = 1, p_inv = 1
Divide everything by τ_inv
(everything is measured in unit delays τ_inv)
Assume γ = 1.
Delay in a Logic Gate
Gate delay:
d = h + p (effort delay + intrinsic delay)
Effort delay:
h = g·f (logical effort × effective fanout, with f = Cout/Cin)
Logical effort is a function of topology, independent of sizing
Effective fanout (electrical effort) is a function of load/gate size
Logical Effort
The inverter has the smallest logical effort and intrinsic delay of all static CMOS gates
The logical effort of a gate represents the ratio of its input capacitance to the inverter's input capacitance when sized to deliver the same current
Logical effort increases with the gate complexity
Logical Effort
Logical effort is the ratio of the input capacitance of a gate to the input capacitance of an inverter with the same output current
(Figure: inverter with g = 1 and two other gates, with size factors 1.8 and 1.5)
Logical Effort of Gates
(Plot: normalized delay d vs. fan-out f for an inverter and a NAND gate, with g, p, and d to be filled in for each)
Logical Effort of Gates
(Plot: normalized delay d vs. fan-out f for an inverter and a NAND gate)
Inverter: g = 1, p = 1, d = f + 1
NAND: g = 3.5/3, p = 5.5/3, d = (3.5/3)f + 1.8
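The linear delay model d = g·f + p can be tabulated for the two gates using the (g, p) values given on the slide. This is a minimal sketch. The dictionary and helper names are hypothetical.

```python
def gate_delay(g, p, f):
    """Normalized delay d = g * f + p, in units of tau_inv."""
    return g * f + p

# (g, p) pairs as given on the slide.
GATES = {"INV": (1.0, 1.0), "NAND": (3.5 / 3, 5.5 / 3)}
for f in (1, 2, 4):
    row = {name: round(gate_delay(g, p, f), 2) for name, (g, p) in GATES.items()}
    print(f, row)
```

The logical effort g sets the slope of each line and the intrinsic delay p sets its intercept. The NAND is therefore slower than the inverter at every fan-out.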
Add Branching Effort
Branching effort:
$$b = \frac{C_{on\text{-}path} + C_{off\text{-}path}}{C_{on\text{-}path}}$$
Multistage Networks
Stage effort: hi = gifi
Path electrical effort: F = Cout/Cin
Path logical effort: G = g1g2…gN
Branching effort: B = b1b2…bN
Path effort: H = GFB
Path delay: D = Σd_i = Σp_i + Σh_i
$$D = \sum_{i=1}^{N}\left(p_i + g_i \cdot f_i\right)$$
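The path-level quantities combine as H = G·B·F, and the delay for a given set of stage fanouts is D = Σ(p_i + g_i·f_i). In the sketch below, the INV → NAND → INV path, the branching factor of 2 at the NAND output and F = 64 are hypothetical. The (g, p) values are taken from the earlier slide. The fanouts f_i include branching, so their product must equal B·F = 128.

```python
import math

def path_effort(g, b, C_in, C_out):
    """H = G * B * F, with G = prod(g_i), B = prod(b_i), F = C_out / C_in."""
    return math.prod(g) * math.prod(b) * (C_out / C_in)

def path_delay(g, p, f):
    """D = sum_i (p_i + g_i * f_i), in units of tau_inv."""
    return sum(pi + gi * fi for gi, pi, fi in zip(g, p, f))

# Hypothetical path: INV -> NAND -> INV, NAND output branches (b = 2), F = 64.
g = [1.0, 3.5 / 3, 1.0]
p = [1.0, 5.5 / 3, 1.0]
b = [1.0, 2.0, 1.0]
H = path_effort(g, b, C_in=1.0, C_out=64.0)
D = path_delay(g, p, f=[4.0, 8.0, 4.0])  # prod(f) = B * F = 128
print(round(H, 3), round(D, 3))
```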
Optimum Effort per Stage
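When each stage bears the same effort:
$$h^N = H \quad\Rightarrow\quad \hat{h} = \sqrt[N]{H}$$
Stage efforts: g1f1 = g2f2 = … = gNfN
Effective fanout of each stage: $f_i = \hat{h}/g_i$
Minimum path delay:
$$\hat{D} = \sum_{i=1}^{N}\left(g_i f_i + p_i\right) = N H^{1/N} + P, \qquad P = \sum_{i=1}^{N} p_i$$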
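Equal-effort sizing can be carried out by working backwards from the load. Each stage sees a total load b_i·C_in,i+1, so its input capacitance is C_in,i = g_i·b_i·C_in,i+1/ĥ. The recursion ends exactly at C_in,1 = C_in. This is a minimal sketch. `size_path` is a hypothetical name, and the three-inverter path with F = 64 is only an example.

```python
import math

def size_path(g, p, b, C_in, C_out):
    """Equal-effort sizing: h_hat = H ** (1/N), f_i = h_hat / g_i.

    Returns (h_hat, minimum delay N * h_hat + sum(p), stage input caps),
    computed from the load backwards with C_in,i = g_i * b_i * C_in,i+1 / h_hat.
    """
    N = len(g)
    H = math.prod(g) * math.prod(b) * (C_out / C_in)
    h_hat = H ** (1.0 / N)
    D_min = N * h_hat + sum(p)
    caps, c_next = [], C_out
    for gi, bi in zip(reversed(g), reversed(b)):
        c_next = gi * bi * c_next / h_hat
        caps.append(c_next)
    return h_hat, D_min, caps[::-1]  # caps[0] reproduces C_in

h_hat, D_min, caps = size_path([1, 1, 1], [1, 1, 1], [1, 1, 1], C_in=1.0, C_out=64.0)
print(round(h_hat, 6), round(D_min, 6), [round(c, 6) for c in caps])  # 4.0 15.0 [1.0, 4.0, 16.0]
```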
Optimal Number of Stages
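For a given load and a given input capacitance of the first gate:
find the optimal number of stages and optimal sizing
$$D = N H^{1/N} + P$$
$$\frac{\partial D}{\partial N} = -H^{1/N}\ln\left(H^{1/N}\right) + H^{1/N} + p_{inv} = 0$$
(assuming stages are added as inverters, so P grows by p_inv per stage)
Substitute the ‘best stage effort’ $\hat{h} = H^{1/N}$:
$$\hat{h}\left(1 - \ln \hat{h}\right) + p_{inv} = 0$$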
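In practice N must be an integer, so the simplest approach is to evaluate D(N) for a range of stage counts and keep the best one. This sketch assumes every stage is inverter-like, so P = N·p_inv. The function name and the upper limit `n_max` are hypothetical.

```python
import math

def best_stage_count(H, p_inv=1.0, n_max=20):
    """Integer N minimizing D(N) = N * H ** (1/N) + N * p_inv.

    Assumes every stage adds a parasitic delay p_inv (P = N * p_inv).
    """
    return min(range(1, n_max + 1), key=lambda n: n * H ** (1.0 / n) + n * p_inv)

H = 64.0
N = best_stage_count(H)
print(N, H ** (1.0 / N))  # 3 stages, stage effort close to 4
# Continuous estimate: N = ln H / ln h_hat, with h_hat ~ 3.59 for p_inv = 1.
print(math.log(H) / math.log(3.59))
```

For H = 64 the continuous estimate is about 3.25. Rounding to N = 3 gives a stage effort of 4, which sits on the flat part of the delay curve.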
Logical Effort Optimization Methodology
For smaller problems, easy to translate into set of analytical expressions
Feed them into a Matlab optimizer
With some careful manipulations, can be turned into a convex optimization problem (Stojanovic)
Easily extended to add power/energy
Optimization for Performance
Options:
• Technology choice: CMOS, bipolar, BiCMOS, GaAs, superconducting
• Logic-level optimizations: logic depth, network topology, fan-out, gate complexity
• Circuit optimizations: logic style, transistor sizing
• Physical optimization: implementation choice, layout strategy
• Wires are the key
Logic Level Optimizations
(Figure: alternative logic structures of different logic depth between registers R)
Techniques: Restructuring, pipelining, retiming, technology mapping
Well covered by today’s logic and sequential synthesis
Logic Optimizations (2)
Technique: removal of common sub-expressions
Start from tree structure/output
Fan-out: Tp = O(FO); it also affects wiring capacitance
Late arriving
Logic Optimizations (3)
(Plot: t_p (ns) vs. fan-in from 1 to 9; t_pLH grows linearly, t_pHL quadratically)
AVOID LARGE FAN-IN GATES! (Typically FI < 4)
Tp = O(FI²)! Observation: only true if fan-in translates into series devices (e.g., NAND pull-down, NOR pull-up); otherwise linear
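The quadratic dependence on fan-in follows from an Elmore-delay view of a series stack. In this simplified model, node i (counted from ground) sees i series resistances on its discharge path, so the delay sums R·C·(1 + 2 + … + n). The sketch assumes equal device resistance R and equal node capacitance C. Real gates upsize the stacked devices, which changes the constants but not the n² trend. The function name and parameters are hypothetical.

```python
def series_stack_delay(n, R, C, k=0.69):
    """Elmore delay of an n-high series stack discharging to ground.

    Simplified model: every device has resistance R and every node (including
    the output) has capacitance C. Node i sees i * R to ground, so
    t = k * R * C * (1 + 2 + ... + n) = k * R * C * n * (n + 1) / 2.
    """
    return k * sum(i * R * C for i in range(1, n + 1))

for n in range(1, 9):
    print(n, series_stack_delay(n, R=1.0, C=1.0, k=1.0))  # grows as n(n+1)/2
```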
Logic Optimizations (4)
(Plot: t_p (ps) vs. fan-out for INV, NAND, and NOR gates)
Slope is a function of “driving strength”
All the gates have the same drive current
Technology Mapping for Performance
Alternative coverings
Use low-FI modules on critical path(s)
Library composition?
CMOS Logic Styles
CMOS trade-offs: speed, power (energy), area
Design trade-offs: robustness, scalability, design time
Many styles: don't try to remember the names; remember the principles
Changing the logic style: can it be done without breaking the synthesis flow?
CMOS Logic Styles
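Complementary (figure: pull-up network PUN and pull-down network PDN between VDD and GND, inputs A, B, C, output OUT)
robust, scales
large and slow
Pass Transistor Logic (figure: logic network with inputs A, B, C and output OUT)
simple and fast
not always very efficient
versatile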
CMOS Logic Styles
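Ratioed Logic (figure: load device to VDD and pull-down network PDN to GND, inputs A, B, C, output OUT)
small & fast
static power
R_PDN << R_LOAD
Dynamic Logic (figure: PDN with inputs In1, In2, In3, clock φ, output Out driving CL)
small & fastest!
noise issues
scales?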
Pulsed Static CMOS
RH: reset high; RL: reset low
(Figure labels: fast pull-up, fast pull-down)
Chen, Ditlow, US Pat. 5,495,188 Feb. 1996.
PS-CMOS
Evaluation and reset waves: reset is 1.5x slower
PS-CMOS
Advantages:
No dynamic nodes – good noise immunity
Reset delay slower than evaluation
No data-dependent delay (worst case gets better)
No false transitions
Disadvantages:
Width of reset wave limits logic depth
Margin in design
Skewing Gates
Different rising and falling delays
(Figure: gate with transistor widths W and W; LE = ?)
Skewing Gates
(Figure: gate with transistor widths 4W and W; LE = ?)
Skewing Gates