CPE/EE 427, CPE 527 VLSI Design I

L13: Wires, Design for Speed

Department of Electrical and Computer Engineering University of Alabama in Huntsville

Aleksandar Milenkovic (www.ece.uah.edu/~milenka)
www.ece.uah.edu/~milenka/cpe527-05F

10/11/2005 VLSI Design I; A. Milenkovic

Course Administration

• Instructor: Aleksandar Milenkovic
  – milenka@ece.uah.edu
  – www.ece.uah.edu/~milenka
  – EB 217-L
  – Mon. 5:30 PM – 6:30 PM, Wed. 12:30 PM – 1:30 PM
• URL: http://www.ece.uah.edu/~milenka/cpe527-05F
• TA: Joel Wilder
• Labs: Lab#4: due 10/14/05; Lab#5: 10/21/05
• Hws: Solutions in secure directory /scr (cpe427fall05, ?)
• Project: Proposals were due on 10/10/05
• Test I: 10/17/05
• Text: CMOS VLSI Design, 3rd ed., Weste, Harris
• Review: Chapters 1, 2, 3, 4
• Today: Wires, Design for Speed (meet AM in the Lab tonight)

Outline

• Introduction
• Wire Resistance
• Wire Capacitance
• Wire RC Delay
• Crosstalk
• Wire Engineering
• Repeaters

Introduction

• Chips are mostly made of wires called interconnect
  – In stick diagram, wires set size
  – Transistors are little things under the wires
  – Many layers of wires
• Wires are as important as transistors
  – Speed
  – Power
  – Noise

• Alternating layers run orthogonally

Wire Geometry

• Pitch = w + s
• Aspect ratio: AR = t/w
  – Old processes had AR << 1
  – Modern processes have AR ≈ 2

• Pack in many skinny wires

Layer Stack

• AMI 0.6 µm process has 3 metal layers
• Modern processes use 6-10+ metal layers
• Example: Intel 180 nm process
  – M1: thin, narrow (< 3λ), for high density cells
  – M2-M4: thicker, for longer wires
  – M5-M6: thickest, for VDD, GND, clk

Layer   T (nm)   W (nm)   S (nm)   AR
6       1720     860      860      2.0
5       1600     800      800      2.0
4       1080     540      540      2.0
3       700      320      320      2.2
2       700      320      320      2.2
1       480      250      250      1.9

Wire Resistance

• $R = \dfrac{\rho}{t} \cdot \dfrac{l}{w} = R_\square \dfrac{l}{w}$
• ρ = resistivity (Ω*m)
• R□ = ρ/t = sheet resistance (Ω/□)
  – □ is a dimensionless unit(!)
• Count number of squares
  – R = R□ * (# of squares)
• 1 rectangular block: R = R□ (L/W) Ω
• 4 rectangular blocks: R = R□ (2L/2W) Ω = R□ (L/W) Ω

Choice of Metals

• Until 180 nm generation, most wires were aluminum
• Modern processes often use copper
  – Cu atoms diffuse into silicon and damage FETs
  – Must be surrounded by a diffusion barrier

Metal             Bulk resistivity (µΩ*cm)
Silver (Ag)       1.6
Copper (Cu)       1.7
Gold (Au)         2.2
Aluminum (Al)     2.8
Tungsten (W)      5.3
Molybdenum (Mo)   5.3

Sheet Resistance

• Typical sheet resistances in 180 nm process

Layer                        Sheet Resistance (Ω/□)
Diffusion (silicided)        3-10
Diffusion (no silicide)      50-200
Polysilicon (silicided)      3-10
Polysilicon (no silicide)    50-400
Metal1                       0.08
Metal2                       0.05
Metal3                       0.05
Metal4                       0.03
Metal5                       0.02
Metal6                       0.02

Contact Resistance

• Contacts and vias also have resistance, typically 2-20 Ω each
• Use many contacts for lower R
  – Use many small contacts rather than one large one, because current crowds around the contact periphery

Wire Capacitance

• Wire has capacitance per unit length
  – To neighbors
  – To layers above and below
• Ctotal = Ctop + Cbot + 2Cadj

[Figure: cross-section of wires on layer n, between layer n+1 above and layer n-1 below]

Capacitance Trends

• Parallel plate equation: C = εA/d
  – Wires are not parallel plates, but obey trends
  – Increasing area (W, t) increases capacitance
  – Increasing distance (s, h) decreases capacitance
• Dielectric constant
  – ε = kε0
  – ε0 = 8.85 × 10^-14 F/cm
  – k = 3.9 for SiO2
• Processes are starting to use low-k dielectrics
  – k ≈ 3 (or less) as dielectrics use air pockets

M2 Capacitance Data

• Typical wires have ~ 0.2 fF/µm
  – Compare to 2 fF/µm for gate capacitance

[Figure: M2 capacitance versus wire width w (nm, 0 to 2000) for spacings s = 320, 480, 640 nm, and ∞; one family of curves for an isolated wire, one for a wire between M1 and M3 planes]

Diffusion & Polysilicon

• Diffusion capacitance is very high (about 2 fF/µm)
  – Comparable to gate capacitance
  – Diffusion also has high resistance
  – Avoid using diffusion runners for wires!
• Polysilicon has lower C but high R
  – Use for transistor gates
  – Occasionally for very short wires between gates

Lumped Element Models

• Wires are a distributed system
  – Approximate with lumped element models
• 3-segment π-model is accurate to 3% in simulation
• L-model needs 100 segments for same accuracy!
• Use single segment π-model for Elmore delay

[Figure: N-segment lumped models of a wire with total resistance R and capacitance C: L-model, π-model (C/2 at each end), and T-model (R/2 on each side)]

Example

• Metal2 wire in 180 nm process
  – 5 mm long
  – 0.32 µm wide
• Construct a 3-segment π-model
  – R□ = 0.05 Ω/□ => R = 781 Ω
  – Cpermicron = 0.2 fF/µm => C = 1 pF

[Figure: three cascaded π segments, each with 260 Ω in series and 167 fF to ground at each end]
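The arithmetic behind this example can be checked with a short sketch. The function name and argument names are illustrative only (not from the slides); the numbers are the ones quoted above.

```python
# Sketch: build an N-segment pi-model from sheet resistance and
# capacitance per micron. Names are illustrative, not from the lecture.

def pi_model(sheet_res_ohm_sq, width_um, length_um, cap_ff_per_um, n_segments):
    squares = length_um / width_um            # number of squares along the wire
    r_total = sheet_res_ohm_sq * squares      # total wire resistance (ohm)
    c_total = cap_ff_per_um * length_um       # total wire capacitance (fF)
    r_seg = r_total / n_segments              # series resistance per segment
    c_half = c_total / (2 * n_segments)       # capacitor at each end of a segment
    return r_total, c_total, r_seg, c_half

# Metal2 wire: 0.05 ohm/sq, 0.32 um wide, 5 mm long, 0.2 fF/um, 3 segments
r_total, c_total, r_seg, c_half = pi_model(0.05, 0.32, 5000.0, 0.2, 3)
print(f"R = {r_total:.0f} ohm, C = {c_total:.0f} fF")
print(f"per segment: {r_seg:.0f} ohm, {c_half:.0f} fF at each end")
```

Each segment carries one third of the resistance and one sixth of the capacitance at each of its two ends, which reproduces the 260 Ω and 167 fF values.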

Wire RC Delay

• Estimate the delay of a 10x inverter driving a 2x inverter at the end of the 5 mm wire from the previous example.
  – R = 2.5 kΩ*µm for gates
  – Unit inverter: 0.36 µm nMOS, 0.72 µm pMOS
  – tpd = 1.1 ns

[Figure: equivalent circuit with a 690 Ω driver resistance and the wire as a single π segment: 781 Ω in series, 500 fF at each end]
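A minimal sketch of the Elmore calculation behind the 1.1 ns estimate. It assumes the 2 fF/µm gate capacitance quoted on the M2 capacitance slide for the 2x receiver and ignores the driver's own diffusion capacitance; both choices, and all names, are assumptions of this sketch.

```python
# Sketch: Elmore delay of a 10x inverter driving a 5 mm Metal2 wire
# (single-segment pi-model) that ends at a 2x inverter.

R_GATE_OHM_UM = 2500.0     # 2.5 kOhm*um, from the slide
C_GATE_FF_PER_UM = 2.0     # assumed gate capacitance per um of gate width
UNIT_NMOS_UM = 0.36
UNIT_PMOS_UM = 0.72

def inverter_resistance(size):
    # Effective resistance scales inversely with transistor width.
    return R_GATE_OHM_UM / (size * UNIT_NMOS_UM)

def inverter_input_cap_ff(size):
    # Input sees both the nMOS and the pMOS gate.
    return C_GATE_FF_PER_UM * size * (UNIT_NMOS_UM + UNIT_PMOS_UM)

r_driver = inverter_resistance(10)     # about 690 ohm
c_load_ff = inverter_input_cap_ff(2)   # about 4 fF
r_wire = 781.25                        # ohm, from the previous example
c_wire_ff = 1000.0                     # fF, from the previous example

# Half the wire capacitance sits at the driver, half at the far end.
t_pd_s = (r_driver * (c_wire_ff / 2)
          + (r_driver + r_wire) * (c_wire_ff / 2 + c_load_ff)) * 1e-15
print(f"R_driver = {r_driver:.0f} ohm, t_pd = {t_pd_s * 1e9:.2f} ns")
```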

Simulated Wire Delays

[Figure: simulated voltage (V) versus time (nsec, 0 to 5) showing Vin and Vout at positions L/10, L/4, L/2, and L along the wire]

Wire Delay Models

• Ideal wire
  – same voltage is present at every segment of the wire at every point in time (the wire is an equipotential)
  – only holds for very short wires, i.e., interconnects between very nearest neighbor gates
• Lumped C model
  – when only a single parasitic component (C, R, or L) is dominant, the different fractions are lumped into a single circuit element
  – when the resistive component is small and the switching frequency is low to medium, only C needs to be considered; the wire itself does not introduce any delay, and the only impact on performance comes from wire capacitance

[Figure: driver (RDriver) charging a single capacitor Clumped at Vout; the wire is characterized by its capacitance per unit length]

Wire Delay Models, con’t

• Lumped RC model
  – total wire resistance is lumped into a single R and total capacitance into a single C
  – good for short wires; pessimistic and inaccurate for long wires
• Distributed RC model
  – circuit parasitics are distributed along the length, L, of the wire
  – c and r are the capacitance and resistance per unit length
  – Delay is determined using the Elmore delay equation: $\tau_{Di} = \sum_{k=1}^{N} c_k r_{ik}$

[Figure: wire (r, c, L) between Vin and VN, modeled as a ladder of series resistors r∆L and shunt capacitors c∆L]

Chain Network Elmore Delay

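[Figure: RC chain with nodes 1, 2, …, i-1, i, …, N; series resistances r1 … rN and node capacitances c1 … cN]

• τD1 = c1r1
• τD2 = c1r1 + c2(r1+r2)
• τDi = c1r1 + c2(r1+r2) + … + ci(r1+r2+…+ri)
• If every resistance equals req: τDi = c1req + 2c2req + 3c3req + … + icireq
• Elmore delay equation: $\tau_{DN} = \sum_{i=1}^{N} c_i r_{ii} = \sum_{i=1}^{N} c_i \sum_{j=1}^{i} r_j$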

Distributed RC Model for Simple Wires

• A length L RC wire can be modeled by N segments of length L/N
  – The resistance and capacitance of each segment are given by r L/N and c L/N
  – $\tau_{DN} = \left(\frac{L}{N}\right)^2 (cr + 2cr + \dots + Ncr) = crL^2 \, \frac{N(N+1)}{2N^2} = CR \, \frac{N+1}{2N}$
  – where R (= rL) and C (= cL) are the total lumped resistance and capacitance of the wire
• For large N: τDN = RC/2 = rcL²/2
• Delay of a wire is a quadratic function of its length, L
• The delay is 1/2 of that predicted (by the lumped model)
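The convergence to RC/2 can be seen numerically. The sketch below evaluates the chain Elmore sum directly for a uniform wire split into N segments and compares it with the closed form RC(N+1)/(2N); function names and the sample R and C values are illustrative.

```python
# Sketch: Elmore delay of an RC chain and its limit for a uniform wire.

def elmore_chain(resistances, capacitances):
    """Delay to the last node: sum over i of c_i * (r_1 + ... + r_i)."""
    delay = 0.0
    upstream_r = 0.0
    for r, c in zip(resistances, capacitances):
        upstream_r += r            # resistance shared with the path to node i
        delay += c * upstream_r
    return delay

def distributed_wire_delay(r_total, c_total, n):
    """Uniform wire modeled as n equal segments of r_total/n and c_total/n."""
    return elmore_chain([r_total / n] * n, [c_total / n] * n)

R, C = 781.0, 1e-12                # sample totals: 781 ohm, 1 pF
for n in (1, 2, 10, 100, 1000):
    ratio = distributed_wire_delay(R, C, n) / (R * C)
    closed_form = (n + 1) / (2 * n)
    print(f"N = {n:4d}: tau/RC = {ratio:.4f} (closed form {closed_form:.4f})")
```

With N = 1 the result is the lumped RC value; as N grows the ratio approaches 0.5.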

Putting It All Together

[Figure: driver with resistance RDriver driving a wire (rw, cw, L)]

• Total propagation delay considers both driver and wire
  – τD = RDriverCw + (RwCw)/2 = RDriverCw + 0.5rwcwL²
  – tp = 0.69 RDriverCw + 0.38 RwCw, where Rw = rwL and Cw = cwL
• The delay introduced by wire resistance becomes dominant when (RwCw)/2 ≥ RDriverCw (when L ≥ 2RDriver/rw)
  – For RDriver = 1 kΩ driving a 1 µm wide Al1 wire, Lcrit is 2.67 cm
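A small sketch of the critical-length condition. The slides do not state the resistance per unit length of the Al1 wire; the value 0.075 Ω/µm used here is back-solved from the quoted Lcrit and is an assumption, as is the sample capacitance per unit length.

```python
# Sketch: driver and wire contributions to the Elmore delay, and the
# length at which the wire term catches up with the driver term.

def elmore_terms(r_driver, r_per_um, c_per_um, length_um):
    r_w = r_per_um * length_um
    c_w = c_per_um * length_um
    driver_term = r_driver * c_w       # RDriver * Cw
    wire_term = 0.5 * r_w * c_w        # (Rw * Cw) / 2
    return driver_term, wire_term

def critical_length_um(r_driver, r_per_um):
    # RDriver*Cw = Rw*Cw/2  =>  L = 2*RDriver / rw
    return 2.0 * r_driver / r_per_um

R_DRIVER = 1000.0      # 1 kOhm, from the slide
R_PER_UM = 0.075       # ohm/um, ASSUMED (back-solved from Lcrit = 2.67 cm)
C_PER_UM = 0.1e-15     # F/um, arbitrary: it cancels out of Lcrit

l_crit_um = critical_length_um(R_DRIVER, R_PER_UM)
l_crit_cm = l_crit_um * 1e-4
driver_term, wire_term = elmore_terms(R_DRIVER, R_PER_UM, C_PER_UM, l_crit_um)
print(f"Lcrit = {l_crit_cm:.2f} cm")
```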

Design Rules of Thumb

• rc delays should be considered when tpRC > tpgate of the driving gate
  – Lcrit > √(tpgate/(0.38rc))
  – actual Lcrit depends upon the size of the driving gate and the interconnect material
• rc delays should be considered when the rise (fall) time at the line input is smaller than RC, the rise (fall) time of the line
  – trise < RC
  – when not met, the change in the signal is slower than the propagation delay of the wire, so a lumped C model suffices

Delay with Long Interconnects

• When gates are farther apart, wire capacitance and resistance can no longer be ignored.

[Figure: driver, wire (rw, cw, L), and fanout gate at Vout]

• tp = 0.69RdrCint + (0.69Rdr + 0.38Rw)Cw + 0.69(Rdr + Rw)Cfan
     = 0.69Rdr(Cint + Cfan) + 0.69(Rdrcw + rwCfan)L + 0.38rwcwL²
  – where Rdr = (Reqn + Reqp)/2, Rw = rwL and Cw = cwL
• Wire delay rapidly becomes the dominant factor (due to the quadratic term) in the delay budget for longer wires.

Crosstalk

• A capacitor does not like to change its voltage instantaneously.

• A wire has high capacitance to its neighbor.
  – When the neighbor switches from 1->0 or 0->1, the wire tends to switch too.
  – Called capacitive coupling or crosstalk.
• Crosstalk effects
  – Noise on nonswitching wires
  – Increased delay on switching wires

Crosstalk Delay

• Assume layers above and below on average are quiet
  – Second terminal of capacitor can be ignored
  – Model as Cgnd = Ctop + Cbot
• Effective Cadj depends on behavior of neighbors
  – Miller effect

[Figure: adjacent wires A and B coupled by Cadj, each with Cgnd to ground]

B                       ∆V      Ceff(A)          MCF
Constant                VDD     Cgnd + Cadj      1
Switching with A        0       Cgnd             0
Switching opposite A    2VDD    Cgnd + 2Cadj     2
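The table reduces to Ceff(A) = Cgnd + MCF * Cadj, where MCF is the Miller coupling factor. A minimal sketch, with illustrative names and key strings:

```python
# Sketch: effective capacitance seen by wire A for each behavior of neighbor B.

MILLER_COUPLING_FACTOR = {
    "constant": 1,    # B holds its value: Cadj is fully charged
    "with": 0,        # B switches with A: no voltage change across Cadj
    "opposite": 2,    # B switches opposite A: 2*VDD change across Cadj
}

def effective_cap(c_gnd, c_adj, neighbor):
    return c_gnd + MILLER_COUPLING_FACTOR[neighbor] * c_adj

for behavior in ("constant", "with", "opposite"):
    print(behavior, effective_cap(c_gnd=1.0, c_adj=1.0, neighbor=behavior))
```

Since the RC delay of A scales with Ceff(A), a neighbor switching in the opposite direction gives the slowest case and one switching in the same direction gives the fastest.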

Crosstalk Noise

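• Crosstalk causes noise on nonswitching wires
• If victim is floating:
  – model as capacitive voltage divider
  – $\Delta V_{victim} = \dfrac{C_{adj}}{C_{gnd-v} + C_{adj}} \, \Delta V_{aggressor}$

[Figure: aggressor switching by ∆Vaggressor, coupled through Cadj to the victim, which has Cgnd-v to ground and moves by ∆Vvictim]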

Driven Victims

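• Usually victim is driven by a gate that fights noise
  – Noise depends on relative resistances
  – Victim driver is in linear region, aggressor driver in saturation
  – If sizes are same, Raggressor = 2-4 x Rvictim
• $\Delta V_{victim} = \dfrac{C_{adj}}{C_{gnd-v} + C_{adj}} \cdot \dfrac{1}{1 + k} \, \Delta V_{aggressor}$
• $k = \dfrac{\tau_{aggressor}}{\tau_{victim}} = \dfrac{R_{aggressor}\,(C_{gnd-a} + C_{adj})}{R_{victim}\,(C_{gnd-v} + C_{adj})}$

[Figure: aggressor driven through Raggressor with Cgnd-a to ground, coupled through Cadj to the victim driven through Rvictim with Cgnd-v to ground]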
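Both noise formulas fit in one small function: the floating-victim divider is the driven-victim expression with k = 0 (an infinitely weak victim driver). The sketch below uses illustrative names and sample values that are not from the slides, apart from the 2-4x resistance ratio.

```python
# Sketch: crosstalk noise on a victim wire, floating or driven.

def time_constant_ratio(r_agg, r_vic, c_gnd_a, c_gnd_v, c_adj):
    """k = tau_aggressor / tau_victim."""
    return (r_agg * (c_gnd_a + c_adj)) / (r_vic * (c_gnd_v + c_adj))

def victim_noise(dv_aggressor, c_adj, c_gnd_v, k=0.0):
    """Capacitive divider, attenuated by 1/(1+k) when the victim is driven."""
    return c_adj / (c_gnd_v + c_adj) * dv_aggressor / (1.0 + k)

VDD = 1.8                                       # sample supply voltage
floating = victim_noise(VDD, c_adj=1.0, c_gnd_v=1.0)
k = time_constant_ratio(r_agg=3.0, r_vic=1.0,   # Raggressor = 3 x Rvictim
                        c_gnd_a=1.0, c_gnd_v=1.0, c_adj=1.0)
driven = victim_noise(VDD, c_adj=1.0, c_gnd_v=1.0, k=k)
print(f"floating: {floating / VDD:.0%} of VDD, driven: {driven / VDD:.1%} of VDD")
```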

Coupling Waveforms

• Simulated coupling for Cadj = Cvictim

[Figure: aggressor and victim waveforms versus t (ps, 0 to 2000)]

Victim driver          Noise
Undriven               50%
Half size driver       16%
Equal size driver      8%
Double size driver     4%

Noise Implications

• So what if we have noise?
• If the noise is less than the noise margin, nothing happens
• Static CMOS logic will eventually settle to correct output even if disturbed by large noise spikes
  – But glitches cause extra delay
  – Also cause extra power from false transitions
• Dynamic logic never recovers from glitches
• Memories and other sensitive circuits also can produce the wrong answer

Wire Engineering

• Goal: achieve delay, area, power goals with acceptable noise

• Degrees of freedom:
  – Width
  – Spacing
  – Layer
  – Shielding

[Figure: two plots versus pitch (nm, 0 to 2000) for wire spacings of 320, 480, and 640 nm]

[Figure: shielding examples interleaving signal wires (a0-a3, b0-b2) with vdd and gnd wires]

Repeaters

• R and C are proportional to l
• RC delay is proportional to l²
  – Unacceptably great for long wires
• Break long wires into N shorter segments
  – Drive each one with an inverter or buffer

[Figure: a wire of length l between driver and receiver, and the same wire split into N segments with a repeater driving each segment]

Repeater Design

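• How many repeaters should we use?
• How large should each one be?
• Equivalent Circuit
  – Wire length l, split into N segments of length l/N
    • Wire capacitance Cw*l/N and resistance Rw*l/N per segment
  – Inverter width W (nMOS = W, pMOS = 2W)
    • Gate capacitance C'*W, resistance R/W

[Figure: one repeater stage: driver resistance R/W, wire segment as a π-model with resistance Rwl/N and Cwl/2N at each end, and load C'W]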

Repeater Results

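• Write equation for Elmore Delay
  – Differentiate with respect to W and N
  – Set equal to 0, solve
• $\dfrac{l}{N} = \sqrt{\dfrac{2RC'}{R_w C_w}}$
• $\dfrac{t_{pd}}{l} = \left(2 + \sqrt{2}\right) \sqrt{R C' R_w C_w}$
  – ~60-80 ps/mm in 180 nm process
• $W = \sqrt{\dfrac{R C_w}{R_w C'}}$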
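A rough numeric check of these results for the Metal2 wire used earlier (0.05 Ω/□, 0.32 µm wide, 0.2 fF/µm) and R = 2.5 kΩ*µm. The slides do not give a value for C'; the sketch assumes C' = 6 fF/µm, i.e. 3 µm of gate (nMOS = W, pMOS = 2W) per µm of W at 2 fF/µm. That value and all names below are assumptions.

```python
import math

# Sketch: optimal repeater spacing, repeater width, and delay per length.

R_GATE = 2500.0            # ohm*um (R in the slides)
C_GATE = 6e-15             # F/um, ASSUMED value of C' (see lead-in)
R_WIRE = 0.05 / 0.32       # ohm/um for a 0.32 um wide Metal2 wire
C_WIRE = 0.2e-15           # F/um

def repeater_design(r, c_gate, r_w, c_w):
    segment_length = math.sqrt(2.0 * r * c_gate / (r_w * c_w))   # l/N
    width = math.sqrt(r * c_w / (r_w * c_gate))                  # W
    delay_per_length = (2.0 + math.sqrt(2.0)) * math.sqrt(r * c_gate * r_w * c_w)
    return segment_length, width, delay_per_length

seg_len_um, width_um, delay_s_per_um = repeater_design(R_GATE, C_GATE, R_WIRE, C_WIRE)
delay_ps_per_mm = delay_s_per_um * 1e12 * 1e3
print(f"segment length = {seg_len_um:.0f} um")
print(f"repeater nMOS width = {width_um:.1f} um")
print(f"delay = {delay_ps_per_mm:.0f} ps/mm")
```

Under these assumptions the delay per unit length falls inside the quoted 60-80 ps/mm range.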

Designing for Speed

Review: CMOS Inverter: Dynamic

[Figure: inverter with Vin = VDD]

• tpHL = f(Rn, CL)
• tpHL = 0.69 Reqn CL
• $t_{pHL} = 0.69 \cdot \dfrac{3}{4} \cdot \dfrac{C_L V_{DD}}{I_{DSATn}} = 0.52 \, \dfrac{C_L}{(W/L)_n \, k'_n \, V_{DSATn}}$

Review: Designing Inverters for Performance

• Reduce CL
  – internal diffusion capacitance of the gate itself
  – interconnect capacitance
  – fanout
• Increase W/L ratio of the transistor
  – the most powerful and effective performance optimization tool in the hands of the designer
  – watch out for self-loading!
• Increase VDD
  – only minimal improvement in performance at the cost of increased energy dissipation
• Slope engineering: keeping signal rise and fall times smaller than or equal to the gate propagation delays and of approximately equal values
  – good for performance
  – good for power consumption

Switch Delay Model

[Figure: switch delay model of an inverter, with internal capacitance Cint and load capacitance CL]

Input Pattern Effects on Delay

• Delay is dependent on the pattern of inputs
• Low to high transition
  – both inputs go low
    • delay is 0.69 (Rp/2) CL since two p-resistors are on in parallel
  – one input goes low
    • delay is 0.69 Rp CL
• High to low transition
  – both inputs go high
    • delay is 0.69 (2Rn) CL
• Adding transistors in series (without sizing) slows down the circuit

Delay Dependence on Input Patterns

[Figure: simulated output waveforms versus time (psec, 0 to 400) for A=B=1→0; A=1, B=1→0; and A=1→0, B=1]

Input Data Pattern     Delay (psec)
A = B = 0→1            69
A = 1, B = 0→1         62
A = 0→1, B = 1         50
A = B = 1→0            35
A = 1, B = 1→0         76
A = 1→0, B = 1         57

2-input NAND with NMOS = 0.5 µm/0.25 µm, PMOS = 0.75 µm/0.25 µm, CL = 10 fF

Transistor Sizing

[Figure: transistor sizing example; labels Rn, Cint]

Fan-In Considerations

Distributed RC model(Elmore delay)

tpHL = 0.69 Reqn(C1+2C2+3C3+4CL)

Propagation delay deteriorates rapidly as a function of fan-in: quadratically in the worst case.

tp as a Function of Fan-In

[Figure: tp versus fan-in (2 to 16); one curve is a quadratic function of fan-in, the other a linear function of fan-in]

Gates with a fan-in greater than 4 should be avoided.

Fast Complex Gates: Design Technique 1

• Transistor sizing
  – as long as fan-out capacitance dominates
• Progressive sizing
  – M1 > M2 > M3 > … > MN
  – The FET closest to the output should be the smallest.
  – Can reduce delay by more than 20%; decreasing gains as technology shrinks

[Figure: series chain of transistors M1 … MN (input InN shown) driving CL, modeled as a distributed RC line]

• Input re-ordering
  – when not all inputs arrive at the same time
  – critical-path input (0→1) far from the output, other inputs at 1, internal nodes still charged: delay determined by time to discharge CL, C1 and C2
  – critical-path input (0→1) next to the output, other inputs at 1, internal nodes already discharged: delay determined by time to discharge CL

Sizing and Ordering Effects

Progressive sizing in pull-down chain gives up to a 23% improvement.

Input ordering saves 5%
  – critical path A: 23%
  – critical path D: 17%

[Figure: example gate with transistor sizes 3, 3, 3, 3 and a 100 fF load]

Fast Complex Gates: Design Technique 3

• Alternative logic structures
  – F = ABCDEFGH
• Isolating fan-in from fan-out using buffer insertion
• Real lesson is that optimizing the propagation delay of a gate in isolation is misguided.

Logical Effort: Design Technique 5

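• Logical effort generalizes to multistage networks
• Path Logical Effort: $G = \prod g_i$
• Path Electrical Effort: $H = \dfrac{C_{out\text{-}path}}{C_{in\text{-}path}}$
• Path Effort: $F = \prod f_i = \prod g_i h_i$

[Figure: four-stage path with input capacitance 10, stage input capacitances x, y, z, and load 20]

Stage   g      h
1       1      x/10
2       5/3    y/x
3       4/3    z/y
4       1      20/z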

Branching Effort

• Introduce branching effort
  – Accounts for branching between stages in path
  – $b = \dfrac{C_{on\ path} + C_{off\ path}}{C_{on\ path}}$
  – $B = \prod b_i$
  – Note: $\prod h_i = BH$
• Now we compute the path effort
  – F = GBH

Multistage Delays

• Path Effort Delay: $D_F = \sum f_i$
• Path Parasitic Delay: $P = \sum p_i$
• Path Delay: $D = \sum d_i = D_F + P$

Designing Fast Circuits

• Path delay: $D = \sum d_i = D_F + P$
• Delay is smallest when each stage bears same effort
  – $\hat{f} = g_i h_i = F^{1/N}$
• Thus minimum delay of N stage path is
  – $D = N F^{1/N} + P$
• This is a key result of logical effort
  – Find fastest possible delay
  – Doesn't require calculating gate sizes

Gate Sizes

• How wide should the gates be for least delay?
  – $\hat{f} = g h = g \, \dfrac{C_{out}}{C_{in}} \;\Rightarrow\; C_{in_i} = \dfrac{g_i \, C_{out_i}}{\hat{f}}$
• Working backward, apply capacitance transformation to find input capacitance of each gate given load it drives.
• Check work by verifying input cap spec is met.
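The backward pass can be sketched on the four-stage path from the logical effort slide (g = 1, 5/3, 4/3, 1; input capacitance 10; load 20; no branching). Function and variable names are illustrative.

```python
import math

# Sketch: size a path for least delay by working backward from the load.

def size_path(logical_efforts, c_in, c_out):
    n = len(logical_efforts)
    path_logical_effort = math.prod(logical_efforts)    # G
    path_electrical_effort = c_out / c_in               # H
    path_effort = path_logical_effort * path_electrical_effort  # F (B = 1)
    f_hat = path_effort ** (1.0 / n)                    # best stage effort
    input_caps = []
    load = c_out
    for g in reversed(logical_efforts):
        load = g * load / f_hat                         # Cin_i = g_i * Cout_i / f_hat
        input_caps.append(load)
    input_caps.reverse()                                # order: stage 1 ... stage N
    return f_hat, input_caps

f_hat, caps = size_path([1.0, 5.0 / 3.0, 4.0 / 3.0, 1.0], c_in=10.0, c_out=20.0)
c1, x, y, z = caps
print(f"f_hat = {f_hat:.3f}")
print(f"x = {x:.2f}, y = {y:.2f}, z = {z:.2f}, first-stage input cap = {c1:.2f}")
```

The first-stage input capacitance coming back as 10 is the "check work" step: the computed sizes meet the input capacitance specification.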

Best Number of Stages

• How many stages should a path use?
  – Minimizing number of stages is not always fastest
• Example: drive 64-bit datapath with unit inverter
  – D = NF^(1/N) + P = N(64)^(1/N) + N

[Figure: chains of 1 to 4 inverters between the initial driver (size 1) and the datapath load (size 64)]

N    1     2     3     4
f    64    8     4     2.8
D    65    18    15    15.3

Fastest: N = 3
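The table follows directly from D = N(64)^(1/N) + N. A short sketch that evaluates it for a few stage counts, assuming a parasitic delay of 1 per inverter as in the formula above (names are illustrative):

```python
# Sketch: path delay versus number of inverter stages for path effort F = 64.

def stage_effort(n, path_effort):
    return path_effort ** (1.0 / n)

def path_delay(n, path_effort, p_inv=1.0):
    return n * stage_effort(n, path_effort) + n * p_inv

PATH_EFFORT = 64.0
delays = {n: path_delay(n, PATH_EFFORT) for n in range(1, 7)}
best_n = min(delays, key=delays.get)

for n, d in delays.items():
    marker = "  <- fastest" if n == best_n else ""
    print(f"N = {n}: f = {stage_effort(n, PATH_EFFORT):5.2f}, D = {d:5.1f}{marker}")
```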

Derivation

• Consider adding inverters to end of path
  – How many give least delay?

[Figure: logic block of n1 stages with path effort F, followed by N - n1 extra inverters]

• $D = N F^{1/N} + \sum_{i=1}^{n_1} p_i + (N - n_1)\, p_{inv}$
• $\dfrac{\partial D}{\partial N} = -F^{1/N} \ln F^{1/N} + F^{1/N} + p_{inv} = 0$
• Define best stage effort $\rho = F^{1/N}$
  – $p_{inv} + \rho\,(1 - \ln \rho) = 0$

Best Stage Effort

• $p_{inv} + \rho\,(1 - \ln \rho) = 0$ has no closed-form solution
• Neglecting parasitics (pinv = 0), we find ρ = 2.718 (e)
• For pinv = 1, solve numerically for ρ = 3.59
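One way to do the numerical solve is plain bisection, sketched below. The bracket [e, 20] is a choice made for this sketch: the left-hand side equals pinv ≥ 0 at ρ = e and decreases for ρ > 1, so a root lies in that interval for moderate pinv.

```python
import math

# Sketch: solve p_inv + rho*(1 - ln(rho)) = 0 for the best stage effort rho.

def best_stage_effort(p_inv, lo=math.e, hi=20.0, tol=1e-10):
    def residual(rho):
        return p_inv + rho * (1.0 - math.log(rho))
    # residual(lo) >= 0 and residual(hi) < 0; shrink the bracket by halves.
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if residual(mid) > 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

for p_inv in (0.0, 1.0):
    print(f"p_inv = {p_inv}: rho = {best_stage_effort(p_inv):.3f}")
```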

Sensitivity Analysis

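• How sensitive is delay to using exactly the best number of stages?
• 2.4 < ρ < 6 gives delay within 15% of optimal
  – We can be sloppy!
  – I like ρ = 4

[Figure: delay relative to optimal versus number of stages relative to optimal; axis ticks at 0.5, 0.7, 1.0, 1.4, 2.0 and at 1.15, 1.26; points marked at ρ = 2.4 and ρ = 6]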
