•VLSI Design I; A. Milenkovic •1
CPE/EE 427, CPE 527 VLSI Design I
L13: Wires, Design for Speed
Department of Electrical and Computer Engineering University of Alabama in Huntsville
Aleksandar Milenkovic (www.ece.uah.edu/~milenka)
www.ece.uah.edu/~milenka/cpe527-05F
10/11/2005 VLSI Design I; A. Milenkovic 2
Course Administration
• Instructor: Aleksandar Milenkovic
  – milenka@ece.uah.edu
  – www.ece.uah.edu/~milenka
  – EB 217-L
  – Mon. 5:30 PM – 6:30 PM, Wed. 12:30 PM – 1:30 PM
• URL: http://www.ece.uah.edu/~milenka/cpe527-05F
• TA: Joel Wilder
• Labs: Lab#4 due 10/14/05; Lab#5 due 10/21/05
• Hws: Solutions in secure directory /scr (cpe427fall05, ?)
• Project: Proposals were due on 10/10/05
• Test I: 10/17/05
• Text: CMOS VLSI Design, 3rd ed., Weste, Harris
• Review: Chapters 1, 2, 3, 4
• Today: Wires, Design for Speed (meet AM in the Lab tonight)
Outline
• Introduction
• Wire Resistance
• Wire Capacitance
• Wire RC Delay
• Crosstalk
• Wire Engineering
• Repeaters
Introduction
• Chips are mostly made of wires, called interconnect
  – In stick diagrams, wires set the size
  – Transistors are little things under the wires
  – Many layers of wires
• Wires are as important as transistors
  – Speed
  – Power
  – Noise
• Alternating layers run orthogonally
Wire Geometry
• Pitch = w + s
• Aspect ratio: AR = t/w
  – Old processes had AR << 1
  – Modern processes have AR ≈ 2
    • Pack in many skinny wires
[Figure: wire cross-section showing length l, width w, spacing s, thickness t, and height h]
Layer Stack
• AMI 0.6 µm process has 3 metal layers
• Modern processes use 6-10+ metal layers
• Example: Intel 180 nm process
  – M1: thin, narrow (< 3λ)
    • High density cells
  – M2-M4: thicker
    • For longer wires
  – M5-M6: thickest
    • For VDD, GND, clk

Layer | T (nm) | W (nm) | S (nm) | AR  | Dielectric below layer (nm)
6     | 1720   | 860    | 860    | 2.0 | 1000
5     | 1600   | 800    | 800    | 2.0 | 1000
4     | 1080   | 540    | 540    | 2.0 | 700
3     | 700    | 320    | 320    | 2.2 | 700
2     | 700    | 320    | 320    | 2.2 | 700
1     | 480    | 250    | 250    | 1.9 | 800
(Substrate lies below layer 1)
Wire Resistance
• ρ = resistivity (Ω*m)
• R□ = sheet resistance (Ω/□)
  – □ is a dimensionless unit(!)
• Count number of squares
  – R = R□ * (# of squares)
$R = \frac{\rho}{t}\frac{l}{w} = R_\square \frac{l}{w}$
[Figure: wire of length l, width w, thickness t. One rectangular block: R = R□ (L/W) Ω. Four rectangular blocks (2L × 2W): R = R□ (2L/2W) Ω = R□ (L/W) Ω]
Choice of Metals
• Until the 180 nm generation, most wires were aluminum
• Modern processes often use copper
  – Cu atoms diffuse into silicon and damage FETs
  – Must be surrounded by a diffusion barrier

Metal           | Bulk resistivity (µΩ*cm)
Silver (Ag)     | 1.6
Copper (Cu)     | 1.7
Gold (Au)       | 2.2
Aluminum (Al)   | 2.8
Tungsten (W)    | 5.3
Molybdenum (Mo) | 5.3
Sheet Resistance
• Typical sheet resistances in 180 nm process

Layer                     | Sheet resistance (Ω/□)
Diffusion (silicided)     | 3-10
Diffusion (no silicide)   | 50-200
Polysilicon (silicided)   | 3-10
Polysilicon (no silicide) | 50-400
Metal1                    | 0.08
Metal2                    | 0.05
Metal3                    | 0.05
Metal4                    | 0.03
Metal5                    | 0.02
Metal6                    | 0.02
Contact Resistance
• Contacts and vias also have 2-20 Ω of resistance each
• Use many contacts for lower R
  – Use many small contacts, since current crowds around the periphery of each contact
Wire Capacitance
• Wire has capacitance per unit length
  – To neighbors
  – To layers above and below
• Ctotal = Ctop + Cbot + 2Cadj
[Figure: cross-section of a wire on layer n (width w, thickness t, spacing s to each neighbor) between layers n+1 and n-1 at heights h1 and h2, showing Ctop, Cbot, and Cadj]
Capacitance Trends
• Parallel plate equation: C = εA/d
  – Wires are not parallel plates, but obey trends
  – Increasing area (W, t) increases capacitance
  – Increasing distance (s, h) decreases capacitance
• Dielectric constant
  – ε = kε0
    • ε0 = 8.85 × 10^-14 F/cm
    • k = 3.9 for SiO2
• Processes are starting to use low-k dielectrics
  – k ≈ 3 (or less) as dielectrics use air pockets
M2 Capacitance Data
• Typical wires have ~ 0.2 fF/µm
  – Compare to 2 fF/µm for gate capacitance
[Figure: Ctotal (aF/µm, 0-400) vs. wire width w (nm, 0-2000) for an isolated M2 wire and for M2 between M1 and M3 planes, at spacings s = 320, 480, 640 nm and s = ∞]
Diffusion & Polysilicon
• Diffusion capacitance is very high (about 2 fF/µm)
  – Comparable to gate capacitance
  – Diffusion also has high resistance
  – Avoid using diffusion runners for wires!
• Polysilicon has lower C but high R
  – Use for transistor gates
  – Occasionally for very short wires between gates
Lumped Element Models
• Wires are a distributed system
  – Approximate with lumped element models
• 3-segment π-model is accurate to 3% in simulation
• L-model needs 100 segments for same accuracy!
• Use single segment π-model for Elmore delay
[Figure: a wire with total R and C split into N segments of R/N and C/N, and the single-segment lumped models: L-model (R, C), π-model (C/2, R, C/2), and T-model (R/2, C, R/2)]
Example
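• Metal2 wire in 180 nm process
  – 5 mm long
  – 0.32 µm wide
• Construct a 3-segment π-model
  – R□ = 0.05 Ω/□ => R = 0.05 Ω/□ × (5000 µm / 0.32 µm) = 781 Ω
  – Cpermicron = 0.2 fF/µm => C = 0.2 fF/µm × 5000 µm = 1 pF
[Figure: 3-segment π-model: three series resistors of 260 Ω (R/3), each segment with 167 fF (C/6) at each end]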
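The sketch below reproduces the example's arithmetic. It computes R from the number of squares, C from capacitance per micron, and the per-segment π-model values. `pi_model` is a hypothetical helper, and the inputs are the Metal2 numbers above.

```python
# Sketch of an N-segment pi-model for a uniform wire.
# pi_model() is a hypothetical helper; inputs are the Metal2 example above.

def pi_model(length_um, width_um, sheet_res_ohm_sq, cap_fF_per_um, segments):
    """Return (R_total, C_total_fF, R_per_segment, C_at_each_end_fF)."""
    squares = length_um / width_um            # number of squares along the wire
    r_total = sheet_res_ohm_sq * squares      # R = R_sq * (# of squares)
    c_total = cap_fF_per_um * length_um       # C = C_per_micron * length
    r_seg = r_total / segments                # each segment carries R/N
    c_end = c_total / (2 * segments)          # C/N per segment, split C/(2N) per end
    return r_total, c_total, r_seg, c_end


r, c, r_seg, c_end = pi_model(5000, 0.32, 0.05, 0.2, segments=3)
print(f"R = {r:.0f} ohm, C = {c:.0f} fF")
print(f"each segment: {r_seg:.0f} ohm with {c_end:.0f} fF at each end")
```

Where two segments meet, their end capacitors sit on the same node and can be summed. The figure keeps them separate, as drawn.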
Wire RC Delay
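• Estimate the delay of a 10x inverter driving a 2x inverter at the end of the 5 mm wire from the previous example.
  – R = 2.5 kΩ*µm for gates
  – Unit inverter: 0.36 µm nMOS, 0.72 µm pMOS
  – Driver: R = 2.5 kΩ*µm / (10 × 0.36 µm) ≈ 690 Ω
  – Load: 2 × (0.36 + 0.72) µm × 2 fF/µm ≈ 4 fF
  – tpd = (690 Ω)(500 fF) + (690 Ω + 781 Ω)(500 fF + 4 fF) ≈ 1.1 ns
[Figure: driver (690 Ω) → wire π-model (781 Ω with 500 fF at each end) → load (4 fF)]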
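Below is a minimal sketch of the same Elmore estimate. It assumes the single-segment π-model wire from the example and the ~2 fF/µm gate capacitance quoted on the "M2 Capacitance Data" slide. `elmore_driver_wire_load` is a hypothetical helper name.

```python
# Sketch: Elmore delay of driver -> single-segment pi-model wire -> load.
# Gate capacitance of ~2 fF/um comes from the "M2 Capacitance Data" slide;
# elmore_driver_wire_load() is a hypothetical helper.

R_GATE_OHM_UM = 2500.0             # R = 2.5 kOhm*um for gates
C_GATE_FF_PER_UM = 2.0             # ~2 fF/um gate capacitance
UNIT_N_UM, UNIT_P_UM = 0.36, 0.72  # unit inverter widths


def elmore_driver_wire_load(r_drv, r_wire, c_wire_fF, c_load_fF):
    """Elmore delay (s) with the wire as R_wire between two C_wire/2 capacitors."""
    c_near = c_wire_fF / 2              # driver-end cap sees only r_drv
    c_far = c_wire_fF / 2 + c_load_fF   # far-end cap sees r_drv + r_wire
    return (r_drv * c_near + (r_drv + r_wire) * c_far) * 1e-15


r_drv = R_GATE_OHM_UM / (10 * UNIT_N_UM)                 # 10x driver, ~694 ohm
c_load = 2 * (UNIT_N_UM + UNIT_P_UM) * C_GATE_FF_PER_UM  # 2x load, ~4.3 fF
tpd = elmore_driver_wire_load(r_drv, 781.0, 1000.0, c_load)
print(f"R_driver = {r_drv:.0f} ohm, C_load = {c_load:.2f} fF, tpd = {tpd * 1e9:.2f} ns")
```

The code does not round to 690 Ω and 4 fF, so its result is slightly larger than the hand calculation. Both round to about 1.1 ns.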
Simulated Wire Delays
[Figure: simulated voltage (V) vs. time (ns) for Vin and Vout on wires of length L/10, L/4, L/2, and L]
Wire Delay Models
• Ideal wire
  – The same voltage is present at every point of the wire at every point in time: the wire is equipotential
  – Only holds for very short wires, i.e., interconnects between nearest-neighbor gates
• Lumped C model
  – When only a single parasitic component (C, R, or L) is dominant, its distributed fractions are lumped into a single circuit element
  – When the resistive component is small and the switching frequency is low to medium, only C needs to be considered; the wire itself introduces no delay, and the only impact on performance comes from the wire capacitance
  – Good for short wires; pessimistic and inaccurate for long wires
[Figure: driver with resistance RDriver driving a wire with capacitance per unit length cwire, modeled as a single lumped capacitance Clumped at Vout]
Wire Delay Models, cont'd
• Lumped RC model
  – Total wire resistance is lumped into a single R and total capacitance into a single C
  – Good for short wires; pessimistic and inaccurate for long wires
• Distributed RC model
  – Circuit parasitics are distributed along the length, L, of the wire
    • c and r are the capacitance and resistance per unit length
  – Delay is determined using the Elmore delay equation
    $\tau_{Di} = \sum_{k=1}^{N} c_k r_{ik}$, where $r_{ik}$ is the resistance of the path from Vin shared by nodes i and k
[Figure: distributed RC line (r, c, L) from Vin to VN, modeled as N segments, each with series resistance r∆L and capacitance c∆L]
Chain Network Elmore Delay
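[Figure: RC chain from Vin to VN; node i (i = 1 … N) has series resistance ri and capacitance ci to ground]
$\tau_{D1} = c_1 r_1$, $\quad \tau_{D2} = c_1 r_1 + c_2 (r_1 + r_2)$
$\tau_{Di} = c_1 r_1 + c_2 (r_1 + r_2) + \dots + c_i (r_1 + r_2 + \dots + r_i)$
If all resistances are equal ($r_j = r_{eq}$): $\tau_{Di} = c_1 r_{eq} + 2 c_2 r_{eq} + 3 c_3 r_{eq} + \dots + i\, c_i r_{eq}$
Elmore delay equation: $\tau_{DN} = \sum_{i=1}^{N} c_i r_{ii} = \sum_{i=1}^{N} c_i \sum_{j=1}^{i} r_j$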
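The chain Elmore formula needs only a running sum of upstream resistance, as sketched below. `elmore_chain` is a hypothetical helper. Its lists are ordered from Vin toward VN.

```python
# Sketch: Elmore delay at node N of an RC chain,
# tau_DN = sum_i c_i * (r_1 + ... + r_i). elmore_chain() is a hypothetical helper.
from itertools import accumulate


def elmore_chain(r, c):
    """Return tau_DN for series resistances r[i] and node capacitances c[i]."""
    if len(r) != len(c):
        raise ValueError("r and c must have the same length")
    upstream = accumulate(r)  # r_ii = r_1 + ... + r_i, resistance shared with node N
    return sum(ci * rii for ci, rii in zip(c, upstream))


# Equal resistances: tau = r_eq * (1*c1 + 2*c2 + 3*c3)
print(elmore_chain([1.0, 1.0, 1.0], [1.0, 1.0, 1.0]))  # 6.0
```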
Distributed RC Model for Simple Wires
• A length-L RC wire can be modeled by N segments of length L/N
  – The resistance and capacitance of each segment are given by rL/N and cL/N
$\tau_{DN} = \left(\frac{L}{N}\right)^2 (cr + 2cr + \dots + Ncr) = (crL^2)\,\frac{N(N+1)}{2N^2} = RC\,\frac{N+1}{2N}$
where R (= rL) and C (= cL) are the total lumped resistance and capacitance of the wire
• For large N, $\tau_{DN} = RC/2 = rcL^2/2$
• Delay of a wire is a quadratic function of its length, L
• The delay is 1/2 of that predicted by the lumped RC model
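A quick numeric check of the closed form is sketched below. It sums the Elmore terms of an N-segment ladder and compares them with RC(N+1)/(2N). `ladder_delay` is a hypothetical helper, and R = C = 1 are normalized illustrative values.

```python
# Sketch: Elmore delay of an N-segment ladder model of a uniform wire vs.
# the closed form RC(N+1)/(2N). ladder_delay() is a hypothetical helper;
# R = C = 1 are normalized illustrative values.


def ladder_delay(R, C, n):
    """Elmore delay at the far end of n equal segments (R/n, C/n each)."""
    r_seg, c_seg = R / n, C / n
    # node i sits behind i segments of upstream resistance
    return sum(c_seg * (i * r_seg) for i in range(1, n + 1))


R, C = 1.0, 1.0
for n in (1, 3, 10, 100, 1000):
    closed_form = R * C * (n + 1) / (2 * n)
    print(f"N={n:5d}  Elmore={ladder_delay(R, C, n):.4f}  RC(N+1)/(2N)={closed_form:.4f}")
```

N = 1 gives the lumped-RC value RC. As N grows, both columns approach RC/2.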
Putting It All Together
[Figure: driver RDriver (input Vin) driving a distributed wire (rw, cw, L) to Vout]
• Total propagation delay considering both the driver and the wire:
  $\tau_D = R_{Driver} C_w + \frac{R_w C_w}{2} = R_{Driver} C_w + 0.5\, r_w c_w L^2$
  and $t_p = 0.69\, R_{Driver} C_w + 0.38\, R_w C_w$, where $R_w = r_w L$ and $C_w = c_w L$
• The delay introduced by wire resistance becomes dominant when $R_w C_w / 2 \ge R_{Driver} C_w$, i.e., when $L \ge 2 R_{Driver} / r_w$
  – For an RDriver = 1 kΩ driving a 1 µm wide Al1 wire, Lcrit is 2.67 cm
Design Rules of Thumb
• RC delays should be considered when tpRC > tpgate of the driving gate
  – Since $t_{pRC} = 0.38\, rcL^2$, this happens for $L > L_{crit} = \sqrt{t_{pgate} / (0.38\, rc)}$
  – Actual Lcrit depends upon the size of the driving gate and the interconnect material
• RC delays should be considered when the rise (fall) time at the line input is smaller than RC, the rise (fall) time of the line: trise < RC
  – When this is not met, the change in the signal is slower than the propagation delay of the wire, so a lumped C model suffices
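A minimal sketch of the first rule of thumb follows. It uses the Metal2 r and c from the earlier example (0.05 Ω/□ on a 0.32 µm wire, 0.2 fF/µm). The 50 ps gate delay is a hypothetical value chosen only for illustration.

```python
import math

# Sketch: length beyond which distributed RC delay (0.38*r*c*L^2) exceeds the
# driving gate's delay. r, c: Metal2 example values; the 50 ps gate delay is
# HYPOTHETICAL. l_crit_rule_of_thumb_um() is a hypothetical helper.


def l_crit_rule_of_thumb_um(t_pgate_s, r_ohm_per_um, c_F_per_um):
    return math.sqrt(t_pgate_s / (0.38 * r_ohm_per_um * c_F_per_um))


r = 0.05 / 0.32   # ohm/um
c = 0.2e-15       # F/um
L = l_crit_rule_of_thumb_um(50e-12, r, c)
print(f"L_crit ~ {L / 1000:.2f} mm")
```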
Delay with Long Interconnects
• When gates are farther apart, wire capacitance and resistance can no longer be ignored.
[Figure: driver with internal capacitance cint (input Vin) driving a wire (rw, cw, L) loaded by fanout capacitance cfan at Vout]
$t_p = 0.69 R_{dr} C_{int} + (0.69 R_{dr} + 0.38 R_w) C_w + 0.69 (R_{dr} + R_w) C_{fan}$
$\quad = 0.69 R_{dr} (C_{int} + C_{fan}) + 0.69 (R_{dr} c_w + r_w C_{fan}) L + 0.38\, r_w c_w L^2$
where $R_{dr} = (R_{eqn} + R_{eqp})/2$
• Wire delay rapidly becomes the dominant factor in the delay budget for longer wires, due to the quadratic term.
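The sketch below splits tp into its constant, linear, and quadratic terms to show the L² term taking over. All parameter values are hypothetical, loosely in the range of the earlier Metal2 numbers. `tp_long_wire` is a hypothetical helper.

```python
# Sketch: t_p(L) for a gate driving a long wire, split into constant, linear,
# and quadratic terms. All parameter values are HYPOTHETICAL;
# tp_long_wire() is a hypothetical helper.


def tp_long_wire(L_um, r_dr, c_int, c_fan, r_w, c_w):
    const = 0.69 * r_dr * (c_int + c_fan)
    linear = 0.69 * (r_dr * c_w + r_w * c_fan) * L_um
    quad = 0.38 * r_w * c_w * L_um ** 2
    return const, linear, quad


params = dict(r_dr=1000.0, c_int=5e-15, c_fan=10e-15, r_w=0.15, c_w=0.2e-15)
for L in (100, 1000, 10000, 50000):  # microns
    const, lin, quad = tp_long_wire(L, **params)
    total = const + lin + quad
    print(f"L={L:6d} um  tp={total * 1e12:9.1f} ps  quadratic share={quad / total:.0%}")
```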
Crosstalk
• A capacitor does not like to change its voltage instantaneously.
• A wire has high capacitance to its neighbor.
  – When the neighbor switches from 1 -> 0 or 0 -> 1, the wire tends to switch too.
  – Called capacitive coupling or crosstalk.
• Crosstalk effects
  – Noise on nonswitching wires
  – Increased delay on switching wires
Crosstalk Delay
• Assume layers above and below on average are quiet
  – Second terminal of capacitor can be ignored
  – Model as Cgnd = Ctop + Cbot
• Effective Cadj depends on behavior of neighbors
  – Miller effect
[Figure: wire A and neighbor B coupled by Cadj, each with Cgnd to ground]

B                    | ∆V    | Ceff(A)      | MCF
Constant             | VDD   | Cgnd + Cadj  | 1
Switching with A     | 0     | Cgnd         | 0
Switching opposite A | 2VDD  | Cgnd + 2Cadj | 2
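The table reduces to Ceff = Cgnd + MCF·Cadj per neighbor. The sketch below applies it to a wire with a neighbor on each side. The string labels, the `c_eff` helper, and the normalized capacitances are hypothetical illustrative choices.

```python
# Sketch: effective capacitance of switching wire A under the Miller coupling
# factor (MCF) model: each neighbor contributes MCF * Cadj.
# Labels, c_eff(), and normalized values are HYPOTHETICAL.

MCF = {"constant": 1, "with": 0, "opposite": 2}  # neighbor B behavior -> MCF


def c_eff(c_gnd, c_adj, neighbors):
    """Cgnd plus MCF-weighted coupling to each listed neighbor."""
    return c_gnd + sum(MCF[b] * c_adj for b in neighbors)


# Wire with a neighbor on each side (Cgnd = Cadj = 1, normalized)
for case in (["with", "with"], ["constant", "constant"], ["opposite", "opposite"]):
    print(case, c_eff(1.0, 1.0, case))
```

The effective load varies over a 5:1 range depending on what the neighbors do, which explains the delay variation.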
Crosstalk Noise
• Crosstalk causes noise on nonswitching wires
• If victim is floating:
  – model as capacitive voltage divider
  $\Delta V_{victim} = \frac{C_{adj}}{C_{gnd-v} + C_{adj}}\, \Delta V_{aggressor}$
[Figure: aggressor coupled through Cadj to a floating victim with Cgnd-v to ground]
Driven Victims
• Usually victim is driven by a gate that fights noise
  – Noise depends on relative resistances
  – Victim driver is in linear region, agg. in saturation
  – If sizes are same, Raggressor = 2-4 x Rvictim
$\Delta V_{victim} = \frac{C_{adj}}{C_{gnd-v} + C_{adj}} \cdot \frac{1}{1 + k}\, \Delta V_{aggressor}$
$k = \frac{\tau_{aggressor}}{\tau_{victim}} = \frac{R_{aggressor}(C_{gnd-a} + C_{adj})}{R_{victim}(C_{gnd-v} + C_{adj})}$
[Figure: aggressor driven through Raggressor (Cgnd-a to ground) coupled through Cadj to a victim driven through Rvictim (Cgnd-v to ground)]
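Both noise formulas are sketched below: the capacitive divider for a floating victim, and the extra 1/(1+k) factor for a driven victim. The 1.8 V swing and the normalized capacitances and resistances are hypothetical values. The function names are hypothetical helpers.

```python
# Sketch: peak crosstalk noise on a victim wire. Floating victim: capacitive
# divider. Driven victim: divider scaled by 1/(1+k), k = tau_agg / tau_vic.
# All numeric values and function names are HYPOTHETICAL.


def noise_floating(dv_agg, c_adj, c_gnd_v):
    return dv_agg * c_adj / (c_gnd_v + c_adj)


def noise_driven(dv_agg, c_adj, c_gnd_v, c_gnd_a, r_agg, r_vic):
    k = (r_agg * (c_gnd_a + c_adj)) / (r_vic * (c_gnd_v + c_adj))  # tau_agg / tau_vic
    return noise_floating(dv_agg, c_adj, c_gnd_v) / (1 + k)


vdd = 1.8
print(noise_floating(vdd, c_adj=1.0, c_gnd_v=1.0))             # half the aggressor swing
print(noise_driven(vdd, 1.0, 1.0, 1.0, r_agg=3.0, r_vic=1.0))  # k = 3 cuts it by 4x
```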
Coupling Waveforms
[Figure: simulated aggressor and victim waveforms, voltage (0-1.8 V) vs. t (0-2000 ps). Peak victim noise: undriven 50%; half-size driver 16%; equal-size driver 8%; double-size driver 4%]
• Simulated coupling for Cadj = Cvictim
Noise Implications
• So what if we have noise?
• If the noise is less than the noise margin, nothing happens
• Static CMOS logic will eventually settle to the correct output even if disturbed by large noise spikes
  – But glitches cause extra delay
  – Also cause extra power from false transitions
• Dynamic logic never recovers from glitches
• Memories and other sensitive circuits also can produce the wrong answer
Wire Engineering
• Goal: achieve delay, area, power goals with acceptable noise
• Degrees of freedom:
  – Width
  – Spacing
  – Layer
  – Shielding
[Figure: left, delay (ns), RC/2, vs. pitch (nm); right, coupling, 2Cadj / (2Cadj + Cgnd), vs. pitch (nm); curves labeled by wire spacing (320, 480, 640 nm)]
[Figure: example track assignments with vdd and gnd wires interleaved among signal wires (a0-a3, b0-b2) as shields]
Repeaters
• R and C are proportional to l
• RC delay is proportional to l^2
  – Unacceptably great for long wires
• Break long wires into N shorter segments
  – Drive each one with an inverter or buffer
[Figure: a wire of length l between driver and receiver, broken into N segments of length l/N, each driven by a repeater]
Repeater Design
• How many repeaters should we use?
• How large should each one be?
• Equivalent circuit
  – Wire length l/N
    • Wire capacitance Cw*l/N, resistance Rw*l/N
  – Inverter width W (nMOS = W, pMOS = 2W)
    • Gate capacitance C'*W, resistance R/W
[Figure: one segment: driver resistance R/W, wire resistance Rw*l/N with Cw*l/2N at each end, driving the next repeater's gate capacitance C'W]
Repeater Results
• Write equation for Elmore delay
  – Differentiate with respect to W and N
  – Set equal to 0, solve
$\frac{l}{N} = \sqrt{\frac{2RC'}{R_w C_w}}$
$\frac{t_{pd}}{l} = \left(2 + \sqrt{2}\right)\sqrt{RC' R_w C_w}$ (~60-80 ps/mm in 180 nm process)
$W = \sqrt{\frac{R C_w}{R_w C'}}$
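The closed-form repeater results are easy to evaluate numerically, as in the sketch below. The parameter values are assumptions assembled from numbers quoted earlier in these notes:
• R = 2.5 kΩ*µm
• C' = 6 fF/µm, i.e., ~2 fF/µm gate capacitance times 3W of total gate width (nMOS W plus pMOS 2W)
• The Metal2 wire: Rw = 0.05/0.32 Ω/µm, Cw = 0.2 fF/µm

`repeater_design` is a hypothetical helper.

```python
import math

# Sketch: optimal repeater spacing, size, and delay per length from the
# closed-form results. Parameter values are ASSUMPTIONS built from earlier
# numbers in these notes; repeater_design() is a hypothetical helper.
# Units: R in ohm*um, C_prime and Cw in F/um, Rw in ohm/um.


def repeater_design(R, C_prime, Rw, Cw):
    seg_len = math.sqrt(2 * R * C_prime / (Rw * Cw))                      # l/N (um)
    width = math.sqrt(R * Cw / (Rw * C_prime))                            # W (um)
    delay_per_len = (2 + math.sqrt(2)) * math.sqrt(R * C_prime * Rw * Cw)  # s/um
    return seg_len, width, delay_per_len


seg, W, d = repeater_design(R=2500.0, C_prime=6e-15, Rw=0.05 / 0.32, Cw=0.2e-15)
print(f"l/N = {seg / 1000:.2f} mm, W = {W:.1f} um, tpd/l = {d * 1e15:.0f} ps/mm")
```

With these assumed inputs the delay lands inside the quoted 60-80 ps/mm range. This is a consistency check, not the author's own calculation.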
Designing for Speed
Review: CMOS Inverter: Dynamic
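[Figure: inverter with Vin = VDD; the pull-down transistor is modeled as resistance Rn discharging CL at Vout]
tpHL = f(Rn, CL)
$t_{pHL} = 0.69\, R_{eqn} C_L$
$t_{pHL} = 0.69 \cdot \frac{3}{4} \frac{C_L V_{DD}}{I_{DSATn}} \approx \frac{0.52\, C_L}{(W/L)_n\, k'_n\, V_{DSATn}}$ (assuming $V_{DD} \gg V_{Tn} + V_{DSATn}/2$)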
Review: Designing Inverters for Performance
• Reduce CL
  – internal diffusion capacitance of the gate itself
  – interconnect capacitance
  – fanout
• Increase W/L ratio of the transistor
  – the most powerful and effective performance optimization tool in the hands of the designer
  – watch out for self-loading!
• Increase VDD
  – only minimal improvement in performance at the cost of increased energy dissipation
• Slope engineering: keeping signal rise and fall times smaller than or equal to the gate propagation delays and of approximately equal values
  – good for performance
  – good for power consumption
Switch Delay Model
[Figure: switch delay models of an inverter, a 2-input NAND, and a 2-input NOR; each transistor is replaced by its equivalent resistance (Rn or Rp; ReqA for input A), with internal node capacitance Cint and load capacitance CL]
Input Pattern Effects on Delay
• Delay is dependent on the pattern of inputs (2-input NAND)
• Low to high transition
  – both inputs go low
    • delay is 0.69 (Rp/2) CL since two p-resistors are on in parallel
  – one input goes low
    • delay is 0.69 Rp CL
• High to low transition
  – both inputs go high
    • delay is 0.69 (2Rn) CL
• Adding transistors in series (without sizing) slows down the circuit
[Figure: switch model of the 2-input NAND: parallel pull-up resistors Rp, series pull-down resistors Rn with internal node capacitance Cint, load CL]
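The three first-order cases can be tabulated directly, as sketched below. The sketch ignores Cint, as the bullets do. The Rn, Rp, and CL values are hypothetical. `nand2_delays` is a hypothetical helper.

```python
# Sketch: first-order switch-model delays of a 2-input NAND for the input
# patterns above, ignoring Cint. Rn, Rp, CL are HYPOTHETICAL values;
# nand2_delays() is a hypothetical helper.


def nand2_delays(Rn, Rp, CL):
    return {
        "both inputs low (L->H)": 0.69 * (Rp / 2) * CL,   # two pMOS in parallel
        "one input low (L->H)": 0.69 * Rp * CL,           # one pMOS conducting
        "both inputs high (H->L)": 0.69 * (2 * Rn) * CL,  # two nMOS in series
    }


for pattern, t in nand2_delays(Rn=5e3, Rp=10e3, CL=10e-15).items():
    print(f"{pattern:26s} {t * 1e12:6.1f} ps")
```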
Delay Dependence on Input Patterns
[Figure: output voltage (V) vs. time (ps) for input patterns A=B=1→0, A=1 B=1→0, and A=1→0 B=1]

Input data pattern | Delay (ps)
A=B=0→1            | 69
A=1, B=0→1         | 62
A=0→1, B=1         | 50
A=B=1→0            | 35
A=1, B=1→0         | 76
A=1→0, B=1         | 57

2-input NAND with NMOS = 0.5 µm/0.25 µm, PMOS = 0.75 µm/0.25 µm, CL = 10 fF
Transistor Sizing
[Figure: switch models of a 2-input NAND and a 2-input NOR with relative transistor widths (1 and 2) annotated, each with internal node capacitance Cint and load CL]
Fan-In Considerations
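[Figure: 4-input NAND with inputs A, B, C, D; pull-down chain internal node capacitances C1, C2, C3; load CL]
Distributed RC model (Elmore delay):
$t_{pHL} = 0.69\, R_{eqn} (C_1 + 2C_2 + 3C_3 + 4C_L)$
Propagation delay deteriorates rapidly as a function of fan-in: quadratically in the worst case.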
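The quadratic growth comes from the internal-node terms. Node i sits behind i series transistors, so those terms sum to Cint·N(N-1)/2. The sketch below generalizes the 4-input formula to N inputs. It assumes equal Reqn per transistor, equal Cint per internal node, and hypothetical numeric values. `nand_tphl` is a hypothetical helper.

```python
# Sketch: Elmore tpHL of an N-input NAND pull-down chain, generalizing
# 0.69*Reqn*(C1 + 2*C2 + 3*C3 + 4*CL). ASSUMES equal Reqn and Cint per stage;
# numeric values are HYPOTHETICAL; nand_tphl() is a hypothetical helper.


def nand_tphl(n, Reqn, Cint, CL):
    """Node i (1..n-1) sees i series transistors; the output node sees n."""
    internal = sum(i * Cint for i in range(1, n))   # = Cint * n*(n-1)/2
    return 0.69 * Reqn * (internal + n * CL)


for n in (2, 4, 8, 16):
    print(n, f"{nand_tphl(n, Reqn=1e3, Cint=5e-15, CL=10e-15) * 1e12:.1f} ps")
```

In real gates, Reqn and Cint also change with sizing and fan-in. This fixed-parameter model only shows the shape of the trend.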
tp as a Function of Fan-In
[Figure: tp (ps, 0-1250) vs. fan-in (2-16): tpHL grows as a quadratic function of fan-in, tpLH as a linear function of fan-in]
• Gates with a fan-in greater than 4 should be avoided.
Fast Complex Gates: Design Technique 1
• Transistor sizing
  – as long as fan-out capacitance dominates
• Progressive sizing
[Figure: N-input pull-down chain (inputs In1 … InN on transistors M1 … MN, internal nodes C1, C2, C3, load CL) treated as a distributed RC line]
M1 > M2 > M3 > … > MN
The FET closest to the output should be the smallest.
Can reduce delay by more than 20%; decreasing gains as technology shrinks
Fast Complex Gates: Design Technique 2
• Input re-ordering
  – when not all inputs arrive at the same time
[Figure: two orderings of a 3-input pull-down stack (M1-M3, internal nodes C1 and C2, load CL). The critical-path input makes the last 0→1 transition while the other inputs are already 1.
 Left: the critical input drives the transistor nearest ground; CL, C1, and C2 are all charged, so the delay is determined by the time to discharge CL, C1, and C2.
 Right: the critical input drives the transistor nearest the output; C1 and C2 are already discharged, so the delay is determined by the time to discharge CL.]
Sizing and Ordering Effects
[Figure: 4-input NAND (inputs A-D, internal nodes C1-C3, CL = 100 fF) with pull-up widths of 3 and pull-down widths either uniform (4) or progressively sized (4, 5, 6, 7)]
• Progressive sizing in pull-down chain gives up to a 23% improvement.
• Input ordering saves 5%
  – critical path A: 23%
  – critical path D: 17%
Fast Complex Gates: Design Technique 3
• Alternative logic structures
[Figure: alternative gate-level implementations of F = ABCDEFGH]
Fast Complex Gates: Design Technique 4
• Isolating fan-in from fan-out using buffer insertion
[Figure: a gate driving CL directly vs. through an inserted buffer]
• The real lesson is that optimizing the propagation delay of a gate in isolation is misguided.
Logical Effort: Design Technique 5
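• Logical effort generalizes to multistage networks
• Path logical effort: $G = \prod g_i$
• Path electrical effort: $H = \frac{C_{out\text{-}path}}{C_{in\text{-}path}}$
• Path effort: $F = \prod f_i = \prod g_i h_i$
Example: path with input capacitance 10 and load 20, internal stage input capacitances x, y, z:
  g1 = 1, h1 = x/10
  g2 = 5/3, h2 = y/x
  g3 = 4/3, h3 = z/y
  g4 = 1, h4 = 20/z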
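For the example path the electrical efforts telescope: (x/10)(y/x)(z/y)(20/z) = 20/10 = H. The path has no branching, so F = GH. The sketch below computes G, H, and F exactly with fractions.

```python
from fractions import Fraction
from math import prod

# Sketch: path logical, electrical, and total effort for the 4-stage example
# (input capacitance 10, load 20). Fractions keep the arithmetic exact.
g = [Fraction(1), Fraction(5, 3), Fraction(4, 3), Fraction(1)]
C_in, C_out = 10, 20

G = prod(g)                  # path logical effort
H = Fraction(C_out, C_in)    # path electrical effort
F = G * H                    # path effort (no branching in this path)
print(f"G = {G}, H = {H}, F = {F} (~{float(F):.2f})")
```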
Branching Effort
• Introduce branching effort
  – Accounts for branching between stages in path
  $b = \frac{C_{on\,path} + C_{off\,path}}{C_{on\,path}}$
  $B = \prod b_i$
  Note: $\prod h_i = BH$
• Now we compute the path effort
  – F = GBH
Multistage Delays
• Path effort delay: $D_F = \sum f_i$
• Path parasitic delay: $P = \sum p_i$
• Path delay: $D = \sum d_i = D_F + P$
Designing Fast Circuits
• Delay is smallest when each stage bears same effort
  $\hat{f} = g_i h_i = F^{1/N}$
• Thus minimum delay of N stage path is
  $D = N F^{1/N} + P$
• This is a key result of logical effort
  – Find fastest possible delay
  – Doesn't require calculating gate sizes
Gate Sizes
• How wide should the gates be for least delay?
  $\hat{f} = gh = g\,\frac{C_{out}}{C_{in}} \;\Rightarrow\; C_{in_i} = \frac{g_i\, C_{out_i}}{\hat{f}}$
• Working backward, apply capacitance transformation to find input capacitance of each gate given load it drives.
• Check work by verifying input cap spec is met.
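The sketch below applies the capacitance transformation to the earlier 4-stage example path (g = 1, 5/3, 4/3, 1; input capacitance 10, load 20). It works backward from the load and then checks that the first stage comes out at the specified input capacitance of 10.

```python
# Sketch: size the 4-stage example path for least delay by working backward
# with C_in = g * C_out / f_hat, then check the input-capacitance spec.

g = [1.0, 5 / 3, 4 / 3, 1.0]
C_in_spec, C_load = 10.0, 20.0

G = 1.0
for gi in g:
    G *= gi
F = G * (C_load / C_in_spec)     # no branching, so F = G*H
f_hat = F ** (1 / len(g))        # equal effort per stage: F^(1/N)

caps = []
c_out = C_load
for gi in reversed(g):           # work backward from the load
    c_in = gi * c_out / f_hat
    caps.append(c_in)
    c_out = c_in
caps.reverse()                   # caps[0] is the first stage's input capacitance

x, y, z = caps[1:]
print(f"f_hat = {f_hat:.3f}, x = {x:.1f}, y = {y:.1f}, z = {z:.1f}")
print(f"check: first-stage input cap = {caps[0]:.2f} (spec {C_in_spec})")
```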
Best Number of Stages
• How many stages should a path use?
  – Minimizing number of stages is not always fastest
• Example: drive 64-bit datapath with unit inverter
  $D = N F^{1/N} + P = N (64)^{1/N} + N$

Initial driver (input capacitance 1) → datapath load (64)
N | f   | D    | Stage input capacitances
1 | 64  | 65   | 1
2 | 8   | 18   | 1, 8
3 | 4   | 15   | 1, 4, 16 (fastest)
4 | 2.8 | 15.3 | 1, 2.8, 8, 23
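The table can be regenerated from D = N·F^(1/N) + N·p_inv with F = 64 and p_inv = 1, as the example assumes. The sketch below also scans a few more N to confirm the minimum. `chain_delay` is a hypothetical helper.

```python
# Sketch: delay of an N-stage inverter chain driving the 64-unit load,
# D = N * F**(1/N) + N * p_inv with F = 64, p_inv = 1 (as in the example).
# chain_delay() is a hypothetical helper.

F, p_inv = 64, 1.0


def chain_delay(n):
    f = F ** (1 / n)               # stage effort
    return f, n * f + n * p_inv    # (stage effort, total delay)


for n in range(1, 5):
    f, d = chain_delay(n)
    print(f"N={n}: f={f:.1f}, D={d:.1f}")
best = min(range(1, 9), key=lambda n: chain_delay(n)[1])
print("fastest N:", best)
```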
Derivation
• Consider adding inverters to end of path
  – How many give least delay?
[Figure: a logic block of n1 stages followed by N - n1 extra inverters; path effort F]
$D = N F^{1/N} + \sum_{i=1}^{n_1} p_i + (N - n_1)\, p_{inv}$
$\frac{\partial D}{\partial N} = -F^{1/N} \ln F^{1/N} + F^{1/N} + p_{inv} = 0$
• Define best stage effort $\rho = F^{1/N}$
$p_{inv} + \rho\,(1 - \ln \rho) = 0$
Best Stage Effort
• $p_{inv} + \rho\,(1 - \ln \rho) = 0$ has no closed-form solution
• Neglecting parasitics (pinv = 0), we find ρ = 2.718 (e)
• For pinv = 1, solve numerically for ρ = 3.59
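One simple way to solve for ρ numerically is bisection, sketched below. The choice of bisection is mine; the slide only says "solve numerically". For ρ > 1 the left-hand side has derivative -ln ρ < 0, so it decreases monotonically. The bracket [1, 100] therefore contains exactly one root for small non-negative p_inv. `best_stage_effort` is a hypothetical helper.

```python
import math

# Sketch: solve p_inv + rho*(1 - ln(rho)) = 0 for the best stage effort by
# bisection. The left side decreases for rho > 1, so [1, 100] brackets the root
# for small p_inv >= 0. best_stage_effort() is a hypothetical helper.


def best_stage_effort(p_inv, lo=1.0, hi=100.0, tol=1e-10):
    def f(rho):
        return p_inv + rho * (1 - math.log(rho))
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if f(mid) > 0:   # still left of the root
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


print(f"p_inv = 0: rho = {best_stage_effort(0.0):.3f}")
print(f"p_inv = 1: rho = {best_stage_effort(1.0):.2f}")
```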
Sensitivity Analysis
• How sensitive is delay to using exactly the best number of stages?
• 2.4 < ρ < 6 gives delay within 15% of optimal
  – We can be sloppy!
  – I like ρ = 4
[Figure: normalized delay D(N)/D(N̂) vs. N/N̂ (0.5-2.0); ρ = 6 and ρ = 2.4 mark the edges of the 15% band; marked values 1.15, 1.26, and 1.51]