useful skew and linear programming › ece660 › presentation_sept... · 2004-08-24 · vaibhav...

Useful Skew And Linear Programming

By: Vaibhav Nawale

Tejas Pipwala

Outline of the presentation

• Skew: the problem in modern VLSI design
• Zero Skew vs. Useful Skew
• Current Techniques employed for optimal useful clock skew setting
• Linear Programming overview
• LP as a tool for optimizing useful skew setting

Skew: the problem in modern VLSI designs

Causes of Clock Skew
• Interconnect delays
• Buffer delays
• Process variations

Effects of Clock Skew
• Slows down the pace of doubling clock speed (places an overhead)
• Causes malfunction due to timing violations

Zero Skew vs. Useful Skew

Zero skew: a traditional solution to the problem of skew
• Clock distribution design and data path design are considered separately
• Different clock distribution topologies with appropriate buffer insertions are used to eliminate skew
• Almost impossible to achieve; requires increased wire size, with power and area penalties and higher costs

Useful skew: a recent approach to solve the clock-skew problem
• Takes into account the interactions between the data path and the clock to make use of the clock skew
• Various methods (like LPP) determine the optimal permissible useful skew
• Effective low cost solution to the clock skew problem, by introducing intentional skew within the permissible limits

Current Techniques
Skew = local constraint

Current Techniques (cont.)
Skew scheduling for design robustness

• Timing is correct as long as the signal arrives in the permissible skew range.
• Design will be more robust if the clock signal arrival time is in the middle of the permissible skew range, rather than on the edge.
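The idea of aiming for the middle of the permissible range can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names are hypothetical, skew is defined here as x_i - x_j for a latch pair i -> j, and the bounds come from the hold check (x_i + MIN(i,j) >= x_j + HOLD) and the setup check (x_i + SETUP + MAX(i,j) <= x_j + PERIOD) used later in this presentation.

```python
def permissible_skew_range(d_min, d_max, period, setup=0.0, hold=0.0):
    """Return (low, high) bounds on skew = x_i - x_j for a latch pair i -> j.

    Assumed convention (not from the slides): skew is launch minus capture time.
    """
    low = hold - d_min              # hold:  x_i + d_min >= x_j + hold
    high = period - setup - d_max   # setup: x_i + setup + d_max <= x_j + period
    if low > high:
        raise ValueError("no permissible skew exists for this period")
    return low, high


def robust_skew(d_min, d_max, period, setup=0.0, hold=0.0):
    """Return the midpoint of the permissible range and the margin on each side."""
    low, high = permissible_skew_range(d_min, d_max, period, setup, hold)
    return (low + high) / 2.0, (high - low) / 2.0


# Hypothetical pair: min delay 2, max delay 6, clock period 8, zero setup/hold.
midpoint, margin = robust_skew(2.0, 6.0, 8.0)
```

Choosing the midpoint leaves the same margin against setup and hold failures, which is the robustness argument made above.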

Current Techniques (cont.)
Cycle stealing on synchronous circuits
• The proposed method can fully discover the cycle stealing effect on level-sensitive synchronous designs using an algorithm.
• This algorithm first constructs a latch graph from a timing analysis on the combinational logic.
• Then it analyzes cycle stealing based on overlay timing relationships among latch nodes.
• A breadth-first search algorithm examines all possible cycle stealing among latches. The algorithm also considers the fact that cycle stealing is topology dependent.

Current Techniques (cont.)
Cycle stealing (cont.)

• A backtracking method using a breadth-first search technique is applied to verify the overlay relationships through multiple stages.
• In the backtracking process, a feedback loop starts at a root latch, traverses the latch graph, and ends at the root latch. The delays in some feedback loops cannot satisfy the timing requirements, and backtracking of overlay timing information can detect these timing violations.
• The technique averages the negative slack at each latch stage in the feedback loop. Thus, for each edge, only the maximum negative slack is recorded.

Current Techniques (cont.)
Automatic Clock Skew Scheduling

• This makes use of an algorithm (the min cycle algorithm) to minimize the cycle period of a synchronous circuit by optimizing clock skews.
• The idea is to utilize the bound imposed by the critical cycle of the circuit and the setup/hold time constraints on individual registers to compute the clock period.
• The resultant period is the best that can be achieved by clock skew scheduling.
• This algorithm takes advantage of the basic structure of practical circuits and hence is very efficient.

Current Techniques (cont.)
Min Cycle algorithm application

Tmax = 8
E' = {e1}, E'' = {e3, e4}
ticks(a) = 0, ticks(b) = -1, ticks(c) = 0
θ1(e1) = ∅, θ1(e2) = 2, θ1(e3) = 8, θ1(e4) = 8, θ2(e1) = 8 → θ = 2
∴ skew(a) = 0, skew(b) = 2, skew(c) = 0, new Tmax = 6

Thus, in the first iteration of the algorithm, the clock period can be improved by at most 2. The algorithm iterates until θ = 0.

Current Techniques (cont.)
Min-Cycle Algorithm

/* Select all forward critical edges and form new graph C' (setup critical). */
E' = { e | e = (u,v) and [ dmax(e) - skew(v) + skew(u) ] == Tmax }
C' = (V, E')

/* Select all backward critical edges and form graph C'' (hold critical). */
E'' = { e | e = (u,v) and [ skew(u) + dmin(e) ] == skew(v) }
C'' = (V, E'')

/* Propagate ticks to maximum # of registers in the forward (setup) and backward (hold) directions. */
for each v ∈ V in topological order {
    ticks(v) = 0                                       : if v is root of C'
               min{ { ticks(u') - 1 | (u',v) ∈ E' },
                    { ticks(u'') | (v,u'') ∈ E'' } }   : otherwise
}

/* Compute maximum θ for adjusting Tmax. */
for each e = (u,v) ∈ E {
    if( ticks(v) + 1 > ticks(u) ) {    /* setup */
        t = dmax(u,v) + skew(u) - skew(v)
        θ1 = min{ [ Tmax - t ] / [ ticks(v) - ticks(u) + 1 ] }
    }
    if( ticks(u) > ticks(v) ) {        /* hold */
        t = dmin(u,v) + skew(u) - skew(v)
        θ2 = min{ [ t ] / [ ticks(u) - ticks(v) ] }
    }
}
θ = min{ θ1 , θ2 }
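For experimentation, the minimum period under clock skew scheduling can also be found with a different, simpler technique than the min-cycle algorithm above: bisection on the period, with a Bellman-Ford feasibility check on the setup and hold difference constraints. The Python sketch below is that substitute technique, not the min-cycle algorithm itself; the function names, the path tuple layout `(u, v, d_min, d_max)`, and the three-register example are assumptions made for illustration.

```python
def feasible_schedule(num_regs, paths, period, setup=0.0, hold=0.0):
    """Return clock arrival times satisfying all constraints, or None.

    Each constraint is written as x[v] <= x[u] + w and relaxed Bellman-Ford style.
    """
    edges = []
    for u, v, d_min, d_max in paths:
        edges.append((u, v, d_min - hold))            # hold:  x_u + d_min >= x_v + hold
        edges.append((v, u, period - setup - d_max))  # setup: x_u + setup + d_max <= x_v + period
    x = [0.0] * num_regs  # equivalent to a virtual source with zero-weight edges
    for _ in range(num_regs + 1):
        changed = False
        for a, b, w in edges:
            if x[a] + w < x[b] - 1e-12:
                x[b] = x[a] + w
                changed = True
        if not changed:
            return x
    return None  # still relaxing: a negative cycle makes this period infeasible


def min_period(num_regs, paths, setup=0.0, hold=0.0, iterations=60):
    """Bisect on the period; assumes the zero-skew period is feasible."""
    low, high = 0.0, max(p[3] for p in paths) + setup
    if feasible_schedule(num_regs, paths, high, setup, hold) is None:
        raise ValueError("zero-skew period violates a hold constraint")
    for _ in range(iterations):
        mid = (low + high) / 2.0
        if feasible_schedule(num_regs, paths, mid, setup, hold) is None:
            low = mid
        else:
            high = mid
    return high


# Hypothetical 3-register loop a -> b -> c -> a: (u, v, d_min, d_max).
paths = [(0, 1, 3.0, 8.0), (1, 2, 2.0, 4.0), (2, 0, 3.0, 6.0)]
best_period = min_period(3, paths)
```

In this made-up loop the zero-skew period is 8 (the largest maximum delay), while skew scheduling brings it down to the average maximum delay around the cycle, 6.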

Linear Programming overview

Problems involving the determination of the minimum or maximum of a function are a part of classical mathematical analysis and have been denoted by the collective term extreme value problems.

The domain that incorporates constrained extreme value problems is labeled as the field of linear programming.

$$\max:\; z = c_1 x_1 + c_2 x_2 + \dots + c_n x_n \qquad (1)$$

subject to

$$
\begin{aligned}
a_{11} x_1 + a_{12} x_2 + \dots + a_{1n} x_n &= b_1 \\
a_{21} x_1 + a_{22} x_2 + \dots + a_{2n} x_n &= b_2 \\
&\;\;\vdots \\
a_{m1} x_1 + a_{m2} x_2 + \dots + a_{mn} x_n &= b_m
\end{aligned}
\qquad (2)
$$

and

$$x_j \ge 0, \quad j = 1, 2, 3, \dots, n$$

Methods

• Graphical methods: only for geometric understanding
• Simplex method: an iterative method that moves along the boundary of the feasible solution space. It covers a fraction of all the extreme points to determine the optimal solution
• Revised simplex method: a modification to the simplex method to solve large and sparse problems
• Interior point methods: faster iterative solution approaching the optimum value from within the feasible solution region
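To make the role of extreme points concrete, here is a minimal Python sketch of the graphical idea for two variables: enumerate the intersections of constraint boundaries, keep the feasible ones, and pick the best objective value. It is not the simplex method. It assumes inequality constraints of the form a1*x + a2*y <= b (rather than the equality form shown earlier), a bounded feasible region, and a hypothetical function name `solve_lp_2d`.

```python
from fractions import Fraction
from itertools import combinations


def solve_lp_2d(c, A, b):
    """Maximize c[0]*x + c[1]*y subject to A[k][0]*x + A[k][1]*y <= b[k], x >= 0, y >= 0.

    Returns (z, x, y) at the best vertex, or None if no feasible vertex exists.
    """
    rows = [(Fraction(a1), Fraction(a2), Fraction(bk)) for (a1, a2), bk in zip(A, b)]
    # Non-negativity written in the same <= form: -x <= 0 and -y <= 0.
    rows += [(Fraction(-1), Fraction(0), Fraction(0)),
             (Fraction(0), Fraction(-1), Fraction(0))]
    best = None
    for (a1, a2, b1), (a3, a4, b2) in combinations(rows, 2):
        det = a1 * a4 - a2 * a3
        if det == 0:
            continue  # parallel boundaries do not define a vertex
        x = (b1 * a4 - a2 * b2) / det  # Cramer's rule
        y = (a1 * b2 - a3 * b1) / det
        if all(r1 * x + r2 * y <= rb for r1, r2, rb in rows):
            z = c[0] * x + c[1] * y
            if best is None or z > best[0]:
                best = (z, x, y)
    return best


# Hypothetical problem: max 3x + 2y  s.t.  x + y <= 4,  x + 3y <= 6,  x <= 3.
result = solve_lp_2d((3, 2), [(1, 1), (1, 3), (1, 0)], [4, 6, 3])
```

Enumerating every pair of constraints grows quickly with problem size, which is why the simplex method visits only a fraction of the extreme points.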

LP as a tool for optimizing useful skew setting

All approaches that apply LPP to useful skew setting have to satisfy the Setup and Hold time specifications. The objective function may vary, but the basic constraint equations remain the same.

Hold time check

$$x_i + MIN(i,j) \ge x_j + HOLD$$

Set up time check

$$x_i + SETUP + MAX(i,j) \le x_j + PERIOD$$

and

$$lower\_bound \le x_i \le upper\_bound$$

where $x_i$ is the clock arrival time at, say, latch i, and MAX(i,j)/MIN(i,j) are the max/min delays between the latch pair.

Different objectives

• Initially, clock skew optimization was proposed with a view to improving synchronous circuit speed performance:
  minimize: Time period (P)
  subject to: the setup & hold constraints
• Another objective can be viewed as optimizing the useful skew such that the clock signal arrival time is in the middle of the permissible skew range. This gives the circuit additional robustness to uncertainty in clock arrival times.
• A third objective is seen as optimal skew setting (using a combination of 2 LP problems) incorporated in gate sizing to take advantage of the larger timing budget:
  minimize: Area subject to: Delay <= Pspec
  or
  minimize: P subject to: Area <= Aspec

Linear programming: a good choice

• The ubiquity and immense utility of linear programming, the simplicity, beauty, and accessibility of the underlying mathematics, its enormous capability for modeling and solving a vast array of concrete real-world problems, and its intimate connections with computer science provide ample justification for its use in solving a wide variety of problems.
• Clock time minimization by optimal useful skew setting is just one of the many.
• The existing algorithms do not consider the process variations that cause uncertainties in the clock arrival time. The use of LPP makes it much easier to add the uncertainty as an additional constraint to determine the optimal time.

References:
[1] Ichiang Lin, Ludwig John Kwok Eng, “Analyzing cycle stealing on synchronous circuits with level sensitive latches”, Proceedings of the 29th ACM/IEEE Design Automation Conference, July 1992

[2] Ravindram Kaushik, “Automatic clock skewing”, Department of Electrical Engineering and Computer Science, University of California, Berkeley

[3] Fishburn, J. P., “Clock Skew Optimization”, IEEE Transactions on Computers, Volume 39, Issue 7, July 1990

[4] Joe G. Xi, Wayne W.-M. Dai, “Useful Skew Clock Routing With Gate Sizing for Low Power Design”, Proceedings of the 33rd Design Automation Conference, June 1996

[5] Xun Liu, Marios C. Papaefthymiou, Eby G. Friedman, “Maximizing performance by retiming and clock skew scheduling”, Proceedings of the 36th Design Automation Conference, 21-25 June 1999

STATISTICAL TIMING ANALYSIS

Kranthi Bodepudi
Praveen Vangari

Outline
• STA – case analysis
• Process variations
• Statistical Timing Analysis
– Full chip analysis
– Path based analysis
• Delays based on arrival times (NAND, NOR)

Static timing analysis

• Process parameters modeled in STA using case analysis.

• Analysis using Best-case, nominal and worst-case SPICE parameter sets.

• Deterministic delays are used, statistical variation in silicon is hidden. (die-die)

• Worst case analysis for within-die variations leads to pessimistic results (since it ignores inherent statistical variations)

Process Variations

• Uncertainty in device and interconnect characteristics
– Effective gate length
– Doping concentrations
– Oxide thickness

• Inter-die variation and intra-die variation

Statistical Timing Analysis

• Objective (PDF/CDF)
• Full chip analysis
– Bounds
– Selective Enumeration
• Path based analysis
– Gate level model
– Spatial correlation model

Statistical timing analysis formulation

• Circuit delay is a random variable
• The CDF of the delay of a probabilistic timing graph Gp is

$$P(D(G_P) \le t) = \int_{D(G_P) \le t} p_1(t_1)\, p_2(t_2) \cdots \, dt_1\, dt_2 \cdots$$

where pi(ti) = PDF of the delay of edge i.
• Enumerate all possible edge delays
– Exponential run-time = k^n
– k = number of discrete points in each edge delay PDF

Statistical timing analysis
• Arrival time propagation

• Independent arrival times
Two arrival times An,i and An,j at nodes ni and nj are independent if the fanin subgraphs GS,i and GS,j at nodes ni and nj are disjoint (meaning they have no common edges) or if any common edges have a deterministic delay.

• Dependent arrival times
Consider the set of k predecessor nodes np,i of node n, with fanin subgraphs GS,i, and the intersection graph GI = {NI, EI} consisting of the union of edges and nodes shared by two or more subgraphs, excluding the source node ns. The dependence set of n is the set of nodes {n1, n2, ..., nd, ...} such that nd lies on the intersection graph (nd ∈ NI) and has one or more fanout edges ei that do not lie on the intersection graph (ei ∉ EI).

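Arrival times

• Independent arrival time propagation
– CDF of the maximum of two independent arrival times with CDFs P(t) and Q(t):

$$P_G(t) = P(t) \cdot Q(t)$$

– PDF of the sum of two independent delays with PDFs p(t) and q(t):

$$p_G(t) = \int_0^{\infty} p(t - \tau) \cdot q(\tau)\, d\tau$$

– Linear run time complexity
– Only valid if arrival times are independent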
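For discrete delay distributions, independent arrival time propagation reduces to two operations: a convolution when an edge delay is added to an arrival time, and a product of CDFs when two independent arrival times converge. The Python sketch below illustrates both; representing a PDF as a dictionary from time to probability, and the function names, are assumptions made for this example.

```python
from collections import defaultdict
from fractions import Fraction


def add_delay(arrival, delay):
    """PDF of (arrival + delay) for independent discrete PDFs: a convolution."""
    out = defaultdict(Fraction)
    for t, p in arrival.items():
        for d, q in delay.items():
            out[t + d] += p * q
    return dict(out)


def cdf_at(pdf, t):
    """Probability that the variable is <= t."""
    return sum(p for u, p in pdf.items() if u <= t)


def stat_max(a, b):
    """PDF of max(a, b) for independent a and b, via the product of their CDFs."""
    out = {}
    previous = Fraction(0)
    for t in sorted(set(a) | set(b)):
        cdf = cdf_at(a, t) * cdf_at(b, t)
        if cdf > previous:
            out[t] = cdf - previous
        previous = cdf
    return out


half = Fraction(1, 2)
a = {1: half, 2: half}
b = {1: half, 3: half}
shifted = add_delay(a, {1: Fraction(1)})  # deterministic edge delay of 1
latest = stat_max(a, b)
```

Both operations are only valid when the inputs are independent, which is the restriction noted above.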

Arrival times
• Dependent arrival time propagation

1. Identify all dependence nodes in the circuit.
2. Propagate arrival time PDFs in the circuit until the first dependence node nd is encountered.
3. Enumerate the pairs (ti, pi) of the arrival time PDF Ad at nd, and for each pair propagate ti with conditional probability pi.
4. Propagate ti using independent arrival time propagation until the next dependence node is encountered, and repeat the above step.
5. Compute the final arrival time PDF at node x by summing the conditional arrival time PDFs weighted by the product of their conditional probabilities.

• Recursive enumeration of the arrival time PDFs of all dependence nodes has exponential run-time complexity.

• Since there are fewer dependence nodes than edges, enumerating arrival times only at the dependence nodes is sufficient.

Statistical Bounds

• Upper bound
– The arrival time CDF P(t) is an upper bound of the arrival time CDF Q(t) if and only if, for all t,

$$P(t) \le Q(t)$$

• Lower bound
– Given the CDFs X(t) and Y(t) of two dependent random variables x and y and the random variable z = max(x, y), it is clear that the CDF min(X(t), Y(t)) is a lower bound on the CDF of z.
– Similar to independent arrival time propagation, but at convergence nodes the statistical max function is replaced with a selection of the arrival time with the largest expected value.
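The lower-bound statement can be checked numerically on small joint distributions. In the sketch below, a joint distribution of two dependent arrival times is a dictionary from (x, y) pairs to probabilities; that representation, the function name, and the two example distributions are assumptions for illustration. In the convention used here, one CDF bounds another from above (in delay) when it lies at or below it for every t, so the check is that the CDF of z = max(x, y) never exceeds min(X(t), Y(t)).

```python
from fractions import Fraction


def cdfs_at(joint, t):
    """Return (X(t), Y(t), Z(t)) where Z is the CDF of max(x, y)."""
    X = sum(p for (x, y), p in joint.items() if x <= t)
    Y = sum(p for (x, y), p in joint.items() if y <= t)
    Z = sum(p for (x, y), p in joint.items() if max(x, y) <= t)
    return X, Y, Z


half = Fraction(1, 2)
# Fully correlated arrival times: both early or both late.
correlated = {(1, 1): half, (2, 2): half}
# Anti-correlated arrival times: one early exactly when the other is late.
anti_correlated = {(1, 2): half, (2, 1): half}

checks = []
for joint in (correlated, anti_correlated):
    for t in (0, 1, 2, 3):
        X, Y, Z = cdfs_at(joint, t)
        checks.append(Z <= min(X, Y))
```

For the fully correlated case the CDF of the maximum equals min(X(t), Y(t)), so the bound is tight there; for the anti-correlated case it is strictly smaller at t = 1.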

Overall approach
• Exact Graph Reduction
– If $t_{\min,A_1} > t_{\max,A_2}$, where A1 and A2 are two arrival time PDFs that converge at a node, then A2 is pruned.
– Series and Parallel Reduction
– Parallel Reduction removes local reconvergence and improves the quality of lower bounds.

Overall approach continued..

• Selective enumeration
– Bound computation with enumeration of dependence nodes
• Partition of the sample space, reduction of dependencies of arrival time PDFs.
• When all dependence nodes of a particular convergence node are enumerated, the arrival times at this node are independent and the lower bound can be computed using the statistical maximum, instead of selection of the arrival time with the latest expected value.
• As the number of enumerated dependence nodes increases, the quality of the bounds increases.

Results of bounds and selective Enumeration

Path-Based statistical timing analysis

• Chooses top ‘n’ critical paths with deterministic timing analysis

• Statistical analysis of each path
• Avoids the issue of reconvergence
• Accounts for intra-die process variations

Published work
Considers
– Inter- and intra-die correlations
– Variations of input slope and output loads
– Spatial correlation of intra-die gate length variations

Device length

Path delay Dp resulting from variation of the length of individual gates in the path.

$$L_{total,i} = L_{nom} + \Delta L_{inter} + \Delta L_{intra,i}$$

$$D_P = \sum_i D_i(L_{nom} + \Delta L_{inter} + \Delta L_{intra,i})$$

Simplifying assumption

$$D_i(L_{nom} + \Delta L_{inter} + \Delta L_{intra,i}) = D_i(L_{nom}) + \Delta D_i(\Delta L_{inter}) + \Delta D_i(\Delta L_{intra,i})$$

• Inter-die variability analysis ( $\sum_i D_i(L_{nom}) + \sum_i \Delta D_i(\Delta L_{inter})$ )
– Computed for different possibilities of worst case to best case process corners.
• Intra-die variability analysis ( $\sum_i \Delta D_i(\Delta L_{intra,i})$ )
– Function of multiple independent random variables

$$\sum_i \Delta D_i(\Delta L_{intra,i}) = \sum_i \frac{\partial D_i}{\partial L_{intra,i}} \times \Delta L_{intra,i}$$

-- Change in output slope of preceding gate ( $\Delta S_{i-1}$ )
-- Change in input slope of succeeding gate ( $\Delta cl_{i+1}$ )

$$\Delta D_i = f(\Delta cl_{i+1}, \Delta S_{i-1}, \Delta L_{intra,i})$$

Total change in path delay

$$D_{p,intra} = \sum_i (K_i \times \Delta L_{intra,i})$$

$K_i$ -- coefficient of total path delay change due to intra-die device length variation $\Delta L_i$ at gate ‘i’.
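As a small numeric illustration of the intra-die term, the sketch below evaluates the path delay change as the sum over gates of K_i times the intra-die length variation at gate i. It also computes a standard deviation under the added assumption that the per-gate variations are independent, which is a simplification (the spatial correlation model exists precisely because they are not). Function names and all numeric values are hypothetical.

```python
import math


def intra_die_delay_change(k, delta_l):
    """Sum of K_i * delta_L_i over the gates of a path."""
    return sum(ki * dli for ki, dli in zip(k, delta_l))


def intra_die_delay_sigma(k, sigma_l):
    """Standard deviation of the path delay change.

    Assumes independent per-gate length variations (an assumption of this sketch).
    """
    return math.sqrt(sum((ki * si) ** 2 for ki, si in zip(k, sigma_l)))


# Hypothetical two-gate path: sensitivities in ps/nm, variations in nm.
k = [3.0, 4.0]
change = intra_die_delay_change(k, [1.0, -0.5])
sigma = intra_die_delay_sigma(k, [1.0, 1.0])
```

Because the coefficients K_i are fixed for a path, the path delay change is a linear function of the length variations, which keeps the statistics simple.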

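Spatial Correlation model

Delay variation for each gate, with the respective coefficients of total path delay change, can be expressed as

$$\Delta D_1 = K_1(\Delta L_{2,1} + \Delta L_{1,1} + \Delta L_{0,1})$$

$$\Delta D_2 = K_2(\Delta L_{2,4} + \Delta L_{1,1} + \Delta L_{0,1})$$

$$\Delta D_3 = K_3(\Delta L_{2,15} + \Delta L_{1,4} + \Delta L_{0,1})$$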
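The way shared region variables create correlation can be seen with a short sketch. Each gate's delay variation is modeled as K times the sum of the length-variation variables of the regions it lies in, one per level, written as (level, region). Assuming those region variables are mutually independent with known variances (an assumption of this sketch), the covariance of two gate delay variations is the product of their coefficients times the summed variance of the regions they share. The coefficients and variances below are hypothetical; the region indices follow the three gates in the expressions above.

```python
def delay_covariance(gate_a, gate_b, regions, k, variance):
    """Covariance of two gate delay variations under independent region variables."""
    shared = set(regions[gate_a]) & set(regions[gate_b])
    return k[gate_a] * k[gate_b] * sum(variance[r] for r in shared)


# (level, region) variables per gate, as in the three expressions above.
regions = {
    1: [(2, 1), (1, 1), (0, 1)],
    2: [(2, 4), (1, 1), (0, 1)],
    3: [(2, 15), (1, 4), (0, 1)],
}
k = {1: 1, 2: 1, 3: 1}                                   # hypothetical coefficients
variance = {r: 1 for rs in regions.values() for r in rs}  # hypothetical variances

cov_12 = delay_covariance(1, 2, regions, k, variance)
cov_13 = delay_covariance(1, 3, regions, k, variance)
var_1 = delay_covariance(1, 1, regions, k, variance)
```

Gates 1 and 2 share two region variables and are therefore more strongly correlated than gates 1 and 3, which share only the top-level variable.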

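NAND

[Figure: NAND gate schematic with inputs A and B, supply VDD, internal capacitance Cab, and load capacitance CL]

A (ns)   B (ns)   Delay (ps)
0        1        68.31
0        2        68.491
0        3        68.586
0        4        68.662
0        5        68.708

A (ns)   B (ns)   Delay (ps)
1        0        63.115
2        0        63.112
3        0        63.114
4        0        63.112
5        0        63.113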

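NOR

[Figure: NOR gate schematic with inputs A and B, supply VDD, internal capacitance Cab, and load capacitance CL]

A (ps)   B (ps)   Delay (ps)
0        10       44.22
0        20       42.21
0        30       41.25
0        40       41.14
0        50       40.84

A (ps)   B (ps)   Delay (ps)
10       0        44.81
20       0        44.02
30       0        44.94
40       0        45.64
50       0        46.15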

References
• “Statistical Timing Analysis using Bounds”, Aseem Agarwal, David Blaauw, Vladimir Zolotov, Sarma Vrudhula.

• “Statistical Timing Analysis using Bounds and Selective Enumeration”, Aseem Agarwal, David Blaauw, Vladimir Zolotov, Sarma Vrudhula.

• “Path-Based Statistical Timing Analysis Considering Inter- and Intra-Die Correlations”, Aseem Agarwal, David Blaauw, Vladimir Zolotov, Savithri Sundareswaran, Min Zhao, Kaushik Gala, Rajendran Panda.

Body Biasing (BB)

and

Evolutionary Algorithms

David Balhiser

and

David M. Sendek

EE660 Advanced Topics - VLSI Design

16 September 2003

Outline

• Body Biasing
– What is it ?
– Why do it ?
– How does it work ?
– How do we use body biasing ?
• Evolutionary Algorithms

Body Biasing – What is it ? (Forward Biasing)

If Vsb > 0 => Forward Biasing, where Vsb is the Source-to-Body Voltage.

Advantages:
• Used in active mode to increase operating frequency.
• Improves short channel effects, thus reducing sensitivity to critical dimensional variation.
• If a voltage divider circuit is applied, Vbp < Vdd or Vbn > Vss can be generated on chip.

Disadvantages:
• Increases leakage power as well.
• If additional circuits are present, a p-well will likely be required to isolate elements which could include separate biases (triple-well).

Body Biasing – What is it ? (Reverse Biasing)

If Vsb < 0 => Reverse Biasing, where Vsb is the Source-to-Body Voltage.

Advantages:
• Lowers leakage power

Disadvantages:
• Requires a separate power supply for Vbp and/or Vbn

[10]

The Focus will be on N-well Biasing

Body Biasing – What is it ?(Triple Well)

Body Biasing – Why Do it ? (General Trends)

↑Vdd => ↓TD => ↑fop=> ↑Ps , ↑PD => ↑IL

↑Vbn => ↓Vth, ↓TD => ↑fop=> ↑Ps (FBB), ↑PD

↑Vbp => ↓Vth, ↑TD (RBB) => ↓fop=> ↓Ps , → PD , ↑PSH

BOTTOM LINE:
If ↑fop is desired, => FBB => ↓Vth (which also ↑Ps )
If ↓Ps is desired, => RBB => ↑Vth (which also ↓fop )

↓Vth => ↑ IL , ↓TD (Forward Bias) => ↑fop

[3, 5]

LEGEND:
Vdd = Supply Voltage
Vth = Threshold Voltage
TD = Delay Time
fop = Operating Frequency
Ps = Static Power Dissipation
PD = Dynamic Power Dissipation
IL = Leakage Current
↑ = Increasing
↓ = Decreasing
↑↓ = Variable
→ = Little Change
FBB = Forward Body Bias
RBB = Reverse Body Bias
NBB = Normal Body Bias

Vth is changed for a Power-Performance Trade-Off

$$V_{th} = V_{fb} + V_{T\text{-}MOS}$$

$$V_{th} = V_{fb} + 2\phi_b + \frac{\sqrt{2\,\varepsilon_{Si}\, q\, N_A\,(2\phi_b + |V_{sb}|)}}{C_{OX}}$$

$$V_{th} = V_{t0} + \gamma\left[\sqrt{2\phi_b + |V_{sb}|} - \sqrt{2\phi_b}\right]$$

where

$$\gamma = \frac{t_{ox}}{\varepsilon_{ox}}\sqrt{2\, q\, \varepsilon_{Si}\, N_A} = \frac{1}{C_{OX}}\sqrt{2\,\varepsilon_{Si}\, q\, N_A}$$

LEGEND:
Vth = Threshold Voltage
Vfb = Flatband Voltage
VT-MOS = Ideal Threshold Voltage
φb = Bulk Potential
εSi = Permittivity of Silicon
q = Charge of an Electron
NA = Substrate Doping Concentration
|Vsb| = Voltage Between Source and Substrate
COX = Oxide Capacitance
Vt0 = Threshold Voltage for Vsb = 0
γ = Constant for Substrate Bias Effect
tox = Gate Oxide Thickness
εox = Dielectric Constant

[7]

Body Biasing – How Does it Work ? (Formulas)

Vth is changed since Vsb is Directly Affected by Body Biasing
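The body-effect relation Vth = Vt0 + γ[(2φb + Vsb)^1/2 - (2φb)^1/2] can be evaluated directly to see how the bias moves the threshold. The Python sketch below makes two assumptions that are not in the slides: it takes Vsb as a signed quantity (positive for reverse bias, negative for forward bias) so that forward bias lowers Vth, and it uses made-up parameter values for Vt0, γ, and φb.

```python
import math


def threshold_voltage(v_sb, v_t0=0.45, gamma=0.4, phi_b=0.3):
    """Vth from the body-effect relation; all default parameters are hypothetical.

    v_sb is signed in this sketch: positive = reverse bias, negative = forward bias.
    """
    if 2.0 * phi_b + v_sb < 0.0:
        raise ValueError("bias outside the range where this simple model applies")
    return v_t0 + gamma * (math.sqrt(2.0 * phi_b + v_sb) - math.sqrt(2.0 * phi_b))


vth_nominal = threshold_voltage(0.0)
vth_reverse = threshold_voltage(0.5)
vth_forward = threshold_voltage(-0.3)
```

The square-root dependence means the threshold shift per volt of bias shrinks as reverse bias grows, so large reverse biases give diminishing returns.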

$$I_{ds} = \frac{\mu\,\varepsilon}{t_{ox}} \cdot \frac{W}{L_{eff}} \cdot \frac{(V_{gs} - V_{tn})^2}{2} \quad \text{(in saturation)}$$

$$I_L = \frac{W}{L_{eff}}\, I_s \left[1 - e^{-V_{dd}/V_T}\right] e^{-(V_{th} + V_{off})/(n V_T)}$$

LEGEND:
Ids = Drain-to-Source Current
µ = Channel Carrier Mobility
ε = Permittivity of the Gate Insulator
tox = Gate Oxide Thickness
W = Width of the Channel
Leff = Effective Length of the Channel
Vgs = Gate-to-Source Voltage = Vds + Vth
Vtn = Threshold Voltage of NFET
Vth = Threshold Voltage
Vds = Drain-to-Source Voltage
IL = Leakage Current = Ioff (weak inversion region)
Is = process constant (empirically derived)
Vdd = Supply Voltage
VT = Thermal Voltage
Voff = Offset Voltage (empirically derived; process dependent)
n = swing factor (empirically derived; process dependent)

[3, 7, 16]


Vth has an Exponential Impact on Leakage Current
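The exponential dependence is easy to quantify with the leakage expression IL = (W/Leff) Is [1 - e^(-Vdd/VT)] e^[-(Vth+Voff)/(n VT)]. The Python sketch below evaluates it for two threshold voltages; every parameter value (W/Leff, Is, Voff, n, VT) is a hypothetical placeholder rather than data from any process.

```python
import math


def leakage_current(v_th, v_dd=1.8, w_over_l=10.0, i_s=1e-6,
                    v_off=0.0, n=1.5, v_thermal=0.026):
    """Subthreshold leakage from the expression above; defaults are hypothetical."""
    return (w_over_l * i_s
            * (1.0 - math.exp(-v_dd / v_thermal))
            * math.exp(-(v_th + v_off) / (n * v_thermal)))


# Lowering Vth by 100 mV (for example through forward body bias).
ratio = leakage_current(0.3) / leakage_current(0.4)
expected_ratio = math.exp(0.1 / (1.5 * 0.026))
```

With these placeholder values a 100 mV reduction in Vth raises leakage by roughly a factor of 13, and the ratio depends only on n and VT, not on the device size.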

$$P_D = f_{op}\, C_L\, V_{DD}^2\, \alpha$$

$$P_S = \sum I_L\, V_{DD}$$

$$P_{sh} = f_{op}\, V_{DD}\, \alpha \int_{1\,cycle} I_S\, dt$$

$$P_{total} = P_D + P_S + P_{sh}$$

LEGEND:
PD = Dynamic Power Dissipation
fop = Operating Frequency
CL = Capacitive Load
VDD = Supply Voltage
α = Activity Factor
PS = Static Power Dissipation
IL = Leakage Current
Psh = Short-Circuit Power Dissipation
∫1cycle = Integration over 1 clock cycle
IS = Short Circuit Current
Ptotal = Total Power Dissipation
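The three power components (dynamic, static, and short-circuit) can be combined in a few lines. In this Python sketch the short-circuit integral over one cycle is passed in as a precomputed charge, which is an assumption made to keep the example small; the function names and all numeric values are hypothetical.

```python
import math


def dynamic_power(f_op, c_load, v_dd, activity):
    """P_D = f_op * C_L * V_DD^2 * alpha."""
    return f_op * c_load * v_dd ** 2 * activity


def static_power(leakage_currents, v_dd):
    """P_S = sum of leakage currents times V_DD."""
    return math.fsum(leakage_currents) * v_dd


def short_circuit_power(f_op, v_dd, activity, charge_per_cycle):
    """P_sh = f_op * V_DD * alpha * (integral of I_S over one cycle)."""
    return f_op * v_dd * activity * charge_per_cycle


def total_power(p_dynamic, p_static, p_short):
    return p_dynamic + p_static + p_short


# Hypothetical block: 1 GHz, 1 pF, 1.8 V, 10% activity.
p_d = dynamic_power(1e9, 1e-12, 1.8, 0.1)
p_s = static_power([1e-6, 2e-6], 1.8)
p_sh = short_circuit_power(1e9, 1.8, 0.1, 1e-15)
p_total = total_power(p_d, p_s, p_sh)
```

Only the static term is independent of frequency and activity, which is why it dominates in standby modes and why reverse body bias targets it.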

[3, 6]


$$t = \frac{d_L\, k}{(V_{DD} - V_{th})^{\alpha}}$$

LEGEND:
t = Delay of a basic CMOS Inverter
dL = Logic Depth of the path
k = Process Dependent constant
VDD = Supply Voltage
Vth = Threshold Voltage
α = Measure of Velocity Saturation

[3]


“Alpha-Power Model”
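A quick way to see the performance side of the trade-off is to evaluate the alpha-power relation t = dL k / (VDD - Vth)^α for two threshold voltages. The Python sketch below does that; the function name and every numeric value (logic depth, k, α) are hypothetical.

```python
import math


def path_delay(logic_depth, k, v_dd, v_th, alpha):
    """Alpha-power delay: logic_depth * k / (v_dd - v_th) ** alpha."""
    if v_dd <= v_th:
        raise ValueError("supply must exceed the threshold voltage")
    return logic_depth * k / (v_dd - v_th) ** alpha


# Hypothetical path: depth 10, k = 1e-11, alpha = 1.3, Vdd = 1.8 V.
delay_nominal = path_delay(10, 1e-11, 1.8, 0.8, 1.3)
delay_forward = path_delay(10, 1e-11, 1.8, 0.7, 1.3)   # Vth lowered by FBB
delay_reverse = path_delay(10, 1e-11, 1.8, 0.9, 1.3)   # Vth raised by RBB
```

Read together with the leakage expression, this shows the trade: lowering Vth shortens the delay polynomially while raising leakage exponentially.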

Background

Types of Process Variations
• Lot-to-Lot (L2L)
• Wafer-to-Wafer (W2W)
• Die-to-Die (D2D)
• Within Die (WID)

Body Biasing
• Static Body Biasing
• Adaptive Body Biasing

The ABB Methods Presented Focus on WID Parameter Modifications

Body Biasing – How do we use body biasing ?

Static Body Biasing
• FBB on critical paths to increase performance (↑fop)
• RBB on non-critical paths (↓Ps )
• Combination of the two

A Single Biasing Value is Selected


Adaptive Body Biasing (ABB)

• Power-Performance Trade-off across the Chip
– Predominantly used to optimize power and/or performance
– Different power modes (such as active, stand-by, sleep)
• Power-Performance Trade-off of sub-circuits within the chip
– Affects yields (binning)
• Power-Performance Trade-off of every N-well within the chip
– Affects yields (binning)
– Most useful for WID variations



[Figure: Power and Frequency curves for Vdd scaling and Body Biasing, with FBB and RBB regions marked]

Curves are Process Dependent

Adaptive Body Bias (ABB) Techniques

Addressing Power Consumption

VBC = Voltage Bias Controller
VBCG = VBC Generation
VBCP = VBC PFET
VPCN = VBC NFET
VBCI = VBC
vsub = substrate voltage = vdd - vwell
vbn = NFET Bias Voltage (0v -> -1.5v)
vbp = PFET Bias Voltage (1.8v -> 3.3v)
cbn = Control NFET Bias (1.8v -> -1.5v)
cbp = Control PFET Bias (0v -> 3.3v)
cbpr = Feedback Signal from PFETs (furthest corner)
cbnr = Feedback Signal from NFETs (furthest corner)
VBCR = Voltage Bias Controller Return
Vbbenb = Vbb Enable
Vbbenbr = Vbb Enable Mode Transition Terminate

Modes of Operation:
1. Standby Mode
2. Data Retention Mode (for battery backup)

Operation: Bias the PFET & NFET substrates to a single level to reduce leakage current. Feedback (furthest corner) is used to reduce switching noise. [8]


Addressing Power Consumption (cont.)

[8]


Improving Yields

[5, 15]

- CUT represents a critical path
- ROenable is a Ring Oscillator for frequency and active power measurements

Testchip Subsites Single Testchip Subsite


[5, 15]

φ = Desired Operating Frequency
PD = Phase Detector
5-bit Counter = Clocked by the PD; its value represents the desired body bias to apply (32mV resolution)
Op Amp = Converts the digital value to an analog level

Operation: The PD compares the critical path delay with the target operating frequency. The FF clk divider allows the body bias generator to stabilize and the critical path to adapt to the new body bias before the PD is updated.

Improving Yields


[5, 15]

Improving Yields (cont.)

[Figure panels: Initial Distribution of Chips; Yield Improvement (w/biasing)]

LEGEND:
ABB = Adaptive Body Bias
NBB = Normal Body Bias
σ/µ = Frequency Variation
RBB = Reverse Body Bias
FBB = Forward Body Bias

Biasing Histogram


[5, 15]

Improving Yields (cont.)

Biasing Histogram


Improving Yields (controlling WID variations)

vdiv_cntl = Voltage Divider Control
Pull_cntl = switch for Vbp (or if tied to Vdd, can control Vdd)

Operation: Individual PFET N-wells are biased based upon a scan latch control. The same scheme could be used to control Vdd too. This permits frequency increases and/or power reduction for product binning.

[4, 11]


Improving Yields (controlling WID variations) (cont.)

[4, 11]

LEGEND:
NWF = N-Well Floating
VDD = VDD Biasing
NWB = N-Well Biasing (Vbp)
DWB = Dual Well Biasing (Vbp & Vbn)
VDIV = Voltage Divider Control

[Figure panels: Initial Distribution of Chips; Final Distribution of Chips (w/biasing); Yield Improvement (w/biasing)]

Evolutionary Algorithms

Why Search Optimization ?

Optimization is used in lieu of exhaustive searching.

Problem:
• Assume we have hundreds of sub-circuits of a die where a body bias is required. In addition, assume there are thousands of dies.
• The goal is to determine the body bias set points for each sub-circuit for each chip.
• It would take years to exhaustively search for the best body bias set point of each sub-circuit for a single chip.

Hence, search optimization is used.

Evolutionary Algorithm*

• Based upon Darwin’s Theory of Evolution
• A search optimization technique

Algorithm

initialize population;              //Generate initial population & encode to
                                    //format for each individual in the population
evaluate population;                //Determine “fitness” of each individual
while NOT (termination criteria)    //Usually, the number of generations or “good” soln
{
    select population;              //Survival of the “fittest” individuals
    crossover population;           //Mating between individuals
    mutate population;              //A slight characteristic change to an individual
    evaluate population;            //Determines “fitness” of each individual
}

[1] * = Also referred to as a Genetic Algorithm (GA)
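The loop above maps almost line for line onto a small Python function. The sketch below is one possible instance, not a reference implementation: it assumes binary chromosomes, a fitness function to be minimized, tournament selection of size two, single-point crossover, bit-flip mutation, and elitism (the best individual is copied unchanged into the next generation). All parameter names and defaults are hypothetical.

```python
import random


def genetic_algorithm(fitness, n_bits, pop_size=20, generations=30,
                      p_mutation=0.05, seed=0):
    """Minimize `fitness` over bit lists; returns (best, best-fitness history)."""
    rng = random.Random(seed)
    # initialize population
    population = [[rng.randint(0, 1) for _ in range(n_bits)] for _ in range(pop_size)]
    # evaluate population
    best = min(population, key=fitness)
    history = [fitness(best)]

    def select(pop):
        # tournament of two: the fitter (lower fitness) individual survives
        a, b = rng.sample(pop, 2)
        return a if fitness(a) <= fitness(b) else b

    for _ in range(generations):          # termination: fixed number of generations
        children = [best[:]]              # elitism keeps the best individual
        while len(children) < pop_size:
            parent1, parent2 = select(population), select(population)
            cut = rng.randint(1, n_bits - 1)            # single-point crossover
            child = parent1[:cut] + parent2[cut:]
            child = [bit ^ 1 if rng.random() < p_mutation else bit for bit in child]
            children.append(child)
        population = children
        best = min(population, key=fitness)
        history.append(fitness(best))
    return best, history


# Toy objective: minimize the number of 1 bits in a 16-bit chromosome.
best, history = genetic_algorithm(sum, n_bits=16)
```

Because of elitism, the best fitness in `history` never gets worse from one generation to the next, although the run is still not guaranteed to reach the global optimum.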

Basic Genetic Algorithm

[Figure: Parameter Space of all Possible Solutions [2]. Legend markers: 1 Chromosome or Individual (a possible solution); results of crossover (chromosome sharing); results of mutation (minor chromosome change); absolute “minimum” (for discussion purposes), the optimum chromosome.]

[Figure: Search Space of all Possible Solutions. X axis: Generations (0, 1, 2, 3, ..., n). Y axis: Fitness (minimize). Annotations: One Population; Population Convergence; Optimum Solution or Target.]

Each Chromosome has an Associated fitness value

Basic Genetic Algorithm

• A population consists of individuals called chromosomes, and the size is usually static. The population represents one set of solutions.
• One iteration is a generation.
• A chromosome can be represented/encoded as binary, real, integer, characters, etc.
• A chromosome structure can be an array, a Directed Acyclic Graph, a tree, a list, etc.
• A chromosome is evaluated on its “fitness”, the function to be optimized.

Basic Genetic Algorithm Operators

Crossover (single point)
chromosome 1:      1001011001
chromosome 2:      1110111000
child chromosome:  1001111000

Mutation
chromosome:  1100010101 → 1100110101

Why These Operators ?
• Crossover drives the population towards fitter individuals
• Mutation ensures population diversity

Selection Operation Ensures Survival of the Fittest Individuals
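The two operator examples (single-point crossover of 1001011001 with 1110111000, and mutation of 1100010101) can be reproduced with two short Python functions. Chromosomes are plain bit strings here, and the crossover point and mutated position are passed explicitly so the result is deterministic; in a real run both would be chosen at random. The function names are hypothetical.

```python
def single_point_crossover(parent1, parent2, point):
    """Child takes parent1 up to `point` and parent2 from `point` onward."""
    return parent1[:point] + parent2[point:]


def mutate(chromosome, index):
    """Flip the bit at `index` of a bit-string chromosome."""
    flipped = "1" if chromosome[index] == "0" else "0"
    return chromosome[:index] + flipped + chromosome[index + 1:]


child = single_point_crossover("1001011001", "1110111000", point=4)
mutant = mutate("1100010101", index=4)
```

A crossover point after the fourth bit and a flip of the fifth bit give the child and the mutated chromosome from the examples.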

Basic Genetic Algorithm Variations (a short list)

Cross Over Operator
• Single Point
• Two-point
• Uniform (every point)
• No Crossover (Random-walk Hill Climber)

Selection Operation
• Tournament Selection
• Roulette Wheel Selection
• Elitism

Basic Genetic Algorithm

Advantages
• No presumptions about the problem space.
• If presumptions are made, they can help convergence to an optimal solution.
• Low developmental cost.
• Applicable to a wide range of problems where no “good” method is available.

Disadvantages
• No guarantee of finding optimal solutions in a finite amount of time. But if a near-optimal solution is acceptable, this is an advantage.
• The search can get stuck in a local minimum, although mutation helps it escape.
• Parameter tuning is mostly by trial & error, such as for a multi-objective function with a single fitness function: F = αf1 + (1-α)f2, where α is a “weight”.

[2]

Multi-Objective (MO) GA’s

• Optimize two or more objectives concurrently.
• Determines/maps the trade-off space for a given problem.
• For ABB optimization:
– GA Objectives:
  • 1) Delay minimization.
  • 2) Power minimization.
– Constraint: Limited tester time per part (1 minute).
– Systematic goals: Yield enhancement. Bin upgrading.

NSGA-II Algorithm

• Maintains solution diversity using an improved density metric.
• Retains best solutions between generations (elitist).
• Reasonable O(MN^2) time complexity.
• Empirically tested to show best or near-best results for a wide variety of MO problems versus other advanced MO GA’s.
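The core notion behind multi-objective ranking is Pareto dominance: one solution dominates another if it is no worse in every objective and strictly better in at least one. The Python sketch below extracts the non-dominated set for two minimized objectives (delay, power). It illustrates only that first front; it is not the NSGA-II algorithm, which also sorts the remaining fronts and uses a density metric. The candidate values are hypothetical.

```python
def dominates(a, b):
    """True if a is no worse than b in all objectives and better in at least one."""
    return (all(x <= y for x, y in zip(a, b))
            and any(x < y for x, y in zip(a, b)))


def pareto_front(points):
    """Return the points that no other point dominates (all objectives minimized)."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


# Hypothetical (delay, power) results for four body bias settings.
candidates = [(1.0, 5.0), (2.0, 3.0), (3.0, 4.0), (4.0, 1.0)]
front = pareto_front(candidates)
```

The setting (3.0, 4.0) is dropped because (2.0, 3.0) is both faster and lower power; the remaining three settings form the trade-off curve.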

Summary

Body Biasing is a circuit design technique to trade power & performance

Body Biasing can improve yield

Body Biasing can be done adaptively

An “intelligent” search method is used to determine body bias set points.

Genetic Algorithms provide a methodology to search “intelligently”.

References
1. M. Srinivas and Lalit M. Patnaik, “Genetic Algorithms: A Survey”, IEEE Computer, June 1994

2. Kalyanmoy Deb, “Evolutionary Algorithms: Techniques and Applications”, 9th International Conference on Neural Information Processing, 4th Asia-Pacific Conference on Simulated Evolution and Learning, International Conference on Fuzzy Systems and Knowledge Discovery, November 18-22, 2002, Singapore

3. Justin Gregg, “A Low Cost Individual-Well Adaptive Body Bias (IWABB) Scheme for Leakage Power Reduction, Performance Enhancement and Manufacturing Yield Improvement in the Presence of Intra-Die Variations”, Summer 2003

4. Justin Gregg and Tom W. Chen, “Optimization of Individual Well Adaptive Body Biasing (IWABB) Using Single Objective Algorithms”, draft Summer 2003

5. James W. Tschanz, James T. Kao, Siva G. Narendra, Raj Nair, Dimitri A. Antoniadis, Anantha P. Chandrakasan and Vivek De, “Adaptive Bias for Reducing Impacts of Die-to-Die and Within-Die Parameter Variations on Microprocessor Frequency and Leakage”, IEEE Journal of Solid-State Circuits, Vol. 37, No. 11, November 2002, pgs. 1396 – 1402

6. Ajith Leo Chandy, “Optimized Placement and Allocation of Decoupling Capacitors for Improvement in System Performance Under the Leakage Power Constraint”, draft Fall 2003

7. Neil H.E. Weste and Kamran Eshraghian, “Principles of CMOS VLSI Design: A Systems Perspective”, second edition, Addison-Wesley, October 1994

8. Hiroyuki Mizuno et al., “A 18µA-standby-Current 1.8v 200MHz Microprocessor with Self Substrate-Biased Data-Retention Mode”, ISSCC99, 16 February 1999

References

10. Masayuki Miyazaki et al., “A 1.2GIPS/W Microprocessor Using Speed-Adaptive Threshold-Voltage CMOS With Forward Bias”, IEEE J. of SSC, Vol. 37, No. 2, pgs. 210-217, February 2002

11. Tom Chen, “EE660 Advanced Topics – VLSI Design”, Fall 2003

12. Masayuki Miyazaki et al., “A Delay Distribution Squeezing Scheme with Speed-Adaptive Threshold-Voltage CMOS (SA-Vt CMOS) Low Voltage LSIs”, 1998 International Symposium on Low Power Electronics and Design Proceedings, pgs. 48-53, August 1998

13. Ali Keshavarzi et al., “Forward Body Bias for Microprocessors in 130nm Technology Generation and Beyond”, 2002 Symposium On VLSI Circuits Digest of Technical Papers, pgs. 312-315, 2002

14. Siva Narendra et al., “Forward Body Bias for Microprocessors in 130nm Technology Generation and Beyond”, IEEE J. of SSC, Vol. 38, No. 5, pgs. 696-701, May 2003

15. James W. Tschanz, Siva G. Narendra, Raj Nair, Dimitri A. Antoniadis, and Vivek De, “Effectiveness of Adaptive Supply Voltage and Body Bias for Reducing Impacts of Parameter Variations in Low Power and High Performance Microprocessors”, IEEE Journal of Solid-State Circuits, Vol. 38, No. 5, May 2003, pgs. 826 – 829

16. J. H. Huang et al., “A Robust Physical and Predictive Model for Deep-Submicrometer MOS Circuit Simulation”, IEEE 1993 Custom Integrated Circuits Conference, 1993

Backup Slides

Biasing Simulations (Using Spice)

NAND Circuit NOR Circuit

Biasing Simulation Results - NAND

Sweeping Vdd (Vbp = 1.8V, Vbn = 0.0V)

[Chart: NAND Gate Static Power (Sweep Vdd, Normal Biasing). X axis: Vdd (volts), 1.4 to 2.2. Y axis: Power (watts), log scale from 1.00E-11 to 1.00E+00. Data labels as extracted: 1.51E-05, 4.47E-08, 1.17E-11, 1.08E-11, 3.15E-11.]

[Chart: NAND Gate Delay Time (Sweep Vdd, Normal Biasing). X axis: Vdd (volts), 1.4 to 2.2. Y axis: Delay Time (seconds), 0.00E+00 to 2.50E-10. Data labels as extracted: 1.94E-10, 1.51E-10, 1.30E-10, 1.15E-10, 1.02E-10.]

[Chart: NAND Gate Threshold Voltage (Sweep Vdd, Normal Biasing). X axis: Vdd (volts), 1.4 to 2.2. Y axis: Threshold Voltage (volts), 0.00E+00 to 1.40E+00. Data labels as extracted: 1.15E+00, 9.33E-01, 7.47E-01, 5.32E-01, 2.88E-01.]


Sweeping Vbn (Vdd = 1.8V, Vbp = 1.8V)

[Chart: NAND Gate Static Power (Sweep Vbn). X axis: Vbn (volts), -0.4 to 0.4, with Reverse Bias and Forward Bias regions marked. Y axis: Power (watts), log scale from 1.00E-11 to 1.00E+00. Data labels as extracted: 3.15E-11, 3.21E-11, 3.16E-11, 3.18E-11, 8.36E-10.]

[Chart: NAND Gate Delay Time (Sweep Vbn). X axis: Vbn (volts), -0.4 to 0.4. Y axis: Delay Time (seconds), 1.05E-10 to 1.45E-10. Data labels as extracted: 1.18E-10, 1.21E-10, 1.30E-10, 1.35E-10, 1.40E-10.]

[Chart: NAND Gate Threshold Voltage (Sweep Vbn). X axis: Vbn (volts), -0.4 to 0.4. Y axis: Threshold Voltage (volts), 6.80E-01 to 7.90E-01. Data labels as extracted: 7.76E-01, 7.63E-01, 7.47E-01, 7.31E-01, 7.18E-01.]


Sweeping Vbp

NAND Gate Static Power (Sweep Vbp)

1.69E-111.69E-111.92E-10

1.34E-07

3.15E-11

1.00E-11

1.00E-101.00E-091.00E-081.00E-071.00E-061.00E-051.00E-041.00E-031.00E-021.00E-01

1.00E+001.4 1.6 1.8 2 2.2

Vbp (volts)

Pow

er (w

atts

)

Static Power

Reverse BiasForward Bias

Vdd = 1.8V, Vbn = 0.0V

NAND Gate Delay Time (Sweep Vbp)

1.30E-10

1.27E-10

1.24E-10

1.20E-101.21E-10

1.22E-101.23E-10

1.24E-101.25E-10

1.26E-10

1.27E-10

1.28E-101.29E-10

1.30E-10

1.31E-101.4 1.6 1.8

Vbp (volts)

Dela

y Ti

me

(sec

onds

)Delay Time

Forward Bias


Sweeping Vbp

NAND Gate Threshold Voltage (Sweep Vbp)

[Plot: Threshold Voltage (volts) vs. Vbp (volts), x axis 1.4 to 2.2, y axis 6.00E-01 to 8.50E-01. Data labels as extracted: 6.88E-01, 7.47E-01, 7.47E-01, 7.87E-01, 8.18E-01.]

Vdd = 1.8V, Vbn = 0.0V


Biasing Simulation Results - NOR

Sweeping Vdd

Vbp = 1.8V, Vbn = 0.0V

NOR Gate Delay Time (Sweep Vdd, Normal Biasing)

[Plot: Delay Time (seconds) vs. Vdd (volts), x axis 1.4 to 2.2, y axis 0.00E+00 to 3.00E-10. Data labels as extracted: 2.38E-10, 1.83E-10, 1.24E-10, 1.23E-10, 1.11E-10.]

NOR Gate Static Power (Sweep Vdd, Normal Biasing)

[Plot: Static Power (watts, log scale 1.00E-12 to 1.00E+00) vs. Vdd (volts), x axis 1.4 to 2.2. Data labels as extracted: 2.96E-06, 7.46E-09, 5.58E-12, 5.98E-12, 7.29E-12.]


Sweeping Vdd

Vbp = 1.8V, Vbn = 0.0V

NOR Gate Threshold Voltage (Sweep Vdd, Normal Biasing)

[Plot: Threshold Voltage (volts) vs. Vdd (volts), x axis 1.4 to 2.2, y axis 0.00E+00 to 6.00E-01. Data labels as extracted: 5.68E-01, 4.73E-01, 3.95E-01, 1.49E-01, 0.00E+00.]


Sweeping Vbn

Vdd = 1.8V, Vbp = 1.8V

NOR Gate Static Power (Sweep Vbn)

[Plot: Static Power (watts, log scale 1.00E-12 to 1.00E+00) vs. Vbn (volts), x axis -0.4 to 0.4. Data labels as extracted: 7.59E-12, 7.48E-12, 7.32E-12, 7.66E-12, 8.12E-10.]

NOR Gate Delay Time (Sweep Vbn)

[Plot: Delay Time (seconds) vs. Vbn (volts), x axis 0 to 0.4, y axis 1.15E-10 to 1.45E-10. Data labels as extracted: 1.40E-10, 1.36E-10, 1.26E-10.]

Reverse Bias Forward Bias Forward Bias


Sweeping Vbn

Vdd = 1.8V, Vbp = 1.8V

NOR Gate Threshold Voltage (Sweep Vbn)

[Plot: Threshold Voltage (volts) vs. Vbn (volts), x axis -0.4 to 0.4, y axis 0.00E+00 to 5.00E-01. Data labels as extracted: 4.53E-01, 4.32E-01, 3.89E-01, 3.35E-01, 2.89E-01.]



Sweeping Vbp

Vdd = 1.8V, Vbn = 0.0V

NOR Gate Static Power (Sweep Vbp)

[Plot: Static Power (watts, log scale 1.00E-12 to 1.00E+00) vs. Vbp (volts), x axis 1.4 to 2.2. Data labels as extracted: 8.58E-12, 6.60E-12, 5.57E-11, 6.69E-08, 7.31E-12.]

NOR Gate Delay Time (Sweep Vbp)

[Plot: Delay Time (seconds) vs. Vbp (volts), x axis 1.4 to 2.2, y axis 0.00E+00 to 1.80E-10. Data labels as extracted: 1.10E-10, 1.32E-10, 1.24E-10, 1.51E-10, 1.56E-10.]




Sweeping Vbp

Vdd = 1.8V, Vbn = 0.0V

NOR Gate Threshold Voltage (Sweep Vbp)

[Plot: Threshold Voltage vs. Vbp, x axis 1.4 to 2.2, y axis 0.00E+00 to 5.00E-01. Data labels as extracted: 2.85E-01, 3.09E-01, 3.94E-01, 3.82E-01, 4.73E-01.]


Eldo Spice NAND Gate Source Code

* Filename: biasingNAND2.cir
* Programmers: David M. Sendek & David Balhiser
* Date: 2 September 2003
* Course: EE660 Advanced Topics - VLSI Design
* Professor: Dr. Tom Chen

* BACKGROUND:
* This is the netlist file that was originally created by
* Mentor Graphics Design Architect (Eldo or Spice model of
* the NAND CMOS circuit). This file has been modified
* (removal of extraneous comments & commands) as well as
* incorporating Eldo Spice commands to extract various
* circuit parameters. The width of the PFET & NFET were
* obtained from David Balhiser.

* DESCRIPTION:
* This circuit model is a basic NAND CMOS circuit. The
* purpose of this exercise is to observe the effects of
* varying Vdd, Vbn and Vbp. Vdd is the supply voltage
* to the CMOS NAND circuit. Vbn is the bias voltage to
* the substrate of the NFETs. Vbp is the bias voltage
* to the substrate of the PFETs. If Vbp < Vdd OR
* Vbn > Vss (ground), then this is a forward biased
* circuit. If Vbp > Vdd OR Vbn < Vss (ground), then
* this is a reverse biased circuit.

* The intent is to observe the effects upon threshold
* voltage (Vth), risetime & falltime and static power
* consumption by modifying Vdd, forward & reverse
* biasing the substrate of the CMOS circuit.

* The library used in EE571 & this example
.LIB /class/EE571/models/log018.eldo53 TT

* Define parameters, i.e., variables with an initial value
* that can be later "swept" to change the values
.param swVdd = 1.8v
.param swVbn = 0.0v
.param swVbp = 1.8v

* Define Vdd and Ground
v0 GND 0 DC 0
vdd VDD GND DC swVdd
v3 swVbn 0 DC swVbn
v4 swVbp GND DC swVbp

* Define input pulse
*v1 in1 GND pulse(0 1.8 20n 100p 100p 20n 40n)
*v2 in2 GND pulse(0 1.8 10n 100p 100p 10n 20n)
v1 in1 GND pwl(0n 1.8 5n 1.8 5.5n 0 12n 0 12.5n 1.8 18n 1.8)
v2 in2 GND pwl(0n 1.8 5n 1.8 5.5n 0 12n 0 12.5n 1.8 18n 1.8)

* Define the load capacitance of 5 femto Farads
C_I$416 out GND 5F

* N-FETs & P-FETs for a NAND circuit.
* Note: The fields of the circuit showing connections
* are as follows (for M_p1):
* label = M_p1
* drain = vdd
* gate = in2
* source = out
* body = swVbp
* This model uses 1.8v MOS device models
M_n2 N$1 in2 GND swVbn nch.6 W=1.6U L= 0.18U
M_n1 out in1 N$1 swVbn nch.6 W=1.6U L= 0.18U
M_p2 vdd in1 out swVbp pch.6 W=2.44U L= 0.18U
M_p1 vdd in2 out swVbp pch.6 W=2.44U L= 0.18U

Eldo Spice NAND Gate Source Code (cont.)

* Step through various Vdd & biasing voltages to observe
* the results. Note: Only 1 step command can be active
* at a time. The others are commented out
*.step param swVdd 1.4v 2.2v 0.2v
*.step param swVbn -0.4v 0.4v 0.2v
.step param swVbp 1.4v 2.2v 0.2v

* Specify the transient analysis
.tran 0.01ns 40ns

* Note: If the .plot command is commented out the results of the simulation
* output with the extract commands can be observed in the biasingNAND.chi
* file.
.plot v(in1) v(in2)
.plot v(out)

* Extract the rise and fall delays at 50% of Vdd
* Returns the x-axis value of v(out) at the crossing of y-axis value of 0.9
* at the first (1) or second (2) occurrence.
.extract label = risetime (xthres(v(out), 0.9, 1) - xthres(v(in2), 0.9, 1))
.extract label = falltime (xthres(v(out), 0.9, 2) - xthres(v(in2), 0.9, 2))
.extract label = Delay ((xthres(v(out),0.9,1)-xthres(v(in2),0.9,1))+(xthres(v(out),0.9,2)-xthres(v(in2),0.9,2)))

* Extract the threshold voltage, i.e., Vdd/2 = 0.9V
* Returns the y-axis value of v(in1) when the x-axis value is xthres(v(out),0.9,1).
* xthres(v(out),0.9,1) returns the x-axis value of v(out) at the crossing of
* y-axis value of 0.9 at the first occurrence (1).
.extract label = V1_threshold (yval(v(in1),xthres(v(out), 0.9, 1)))
.extract label = V2_threshold (yval(v(in2),xthres(v(out), 0.9, 1)))

* Note: As part of the Eldo Spice simulation, static power is displayed
* during the simulation run. This is simply copied down
.END
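The extraction commands in the listing measure delay as the difference between the times at which the output and the input cross 0.9 V. As a rough illustration of that idea outside the simulator, the Python sketch below finds the n-th crossing of a level in sampled waveform data by linear interpolation. The function `crossing_time` and the sample waveforms are hypothetical and are not part of Eldo; times are in nanoseconds.

```python
def crossing_time(times, values, level, occurrence=1):
    """Return the time of the n-th crossing of `level`, by linear interpolation.

    Returns None if the waveform crosses `level` fewer than `occurrence` times.
    """
    count = 0
    for i in range(1, len(times)):
        v0, v1 = values[i - 1], values[i]
        # A crossing happens when consecutive samples straddle the level,
        # or when a sample lands exactly on it after starting off it.
        if (v0 - level) * (v1 - level) < 0 or (v1 == level and v0 != level):
            count += 1
            if count == occurrence:
                frac = (level - v0) / (v1 - v0)
                return times[i - 1] + frac * (times[i] - times[i - 1])
    return None


# Hypothetical samples: the input falls while the output rises slightly later.
t_in, v_in = [0.0, 5.0, 5.5, 12.0], [1.8, 1.8, 0.0, 0.0]
t_out, v_out = [0.0, 5.2, 5.6, 12.0], [0.0, 0.0, 1.8, 1.8]

rise_delay = crossing_time(t_out, v_out, 0.9) - crossing_time(t_in, v_in, 0.9)
```

Passing `occurrence=2` would pick the second crossing, which is how the listing separates the rising and falling output edges.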

Eldo Spice NOR Gate Source Code

* Filename: biasingNOR.cir
* Programmers: David M. Sendek & David Balhiser
* Date: 2 September 2003
* Course: EE660 Advanced Topics - VLSI Design
* Professor: Dr. Tom Chen

* BACKGROUND:
* This is the netlist file that was originally created by
* Mentor Graphics Design Architect (Eldo or Spice model of
* the NOR CMOS circuit). This file has been modified
* (removal of extraneous comments & commands) as well as
* incorporating Eldo Spice commands to extract various
* circuit parameters. The width of the PFET & NFET were
* obtained from David Balhiser.

* DESCRIPTION:
* This circuit model is a basic NOR CMOS circuit. The
* purpose of this exercise is to observe the effects of
* varying Vdd, Vbn and Vbp. Vdd is the supply voltage
* to the CMOS NOR circuit. Vbn is the bias voltage to
* the substrate of the NFETs. Vbp is the bias voltage
* to the substrate of the PFETs. If Vbp < Vdd OR
* Vbn > Vss (ground), then this is a forward biased
* circuit. If Vbp > Vdd OR Vbn < Vss (ground), then
* this is a reverse biased circuit.

* The intent is to observe the effects upon threshold
* voltage (Vth), risetime & falltime and static power
* consumption by modifying Vdd, forward & reverse
* biasing the substrate of the CMOS circuit.

* The library used in EE571 & this example
.LIB /class/EE571/models/log018.eldo53 TT

* Define parameters, i.e., variables with an initial value
* that can be later "swept" to change the values
.param swVdd = 1.8v
.param swVbn = 0.0v
.param swVbp = 1.8v

* Define Vdd and Ground
v0 GND 0 DC 0
vdd VDD GND DC swVdd
v3 swVbn 0 DC swVbn
v4 swVbp GND DC swVbp

* Define input pulse
*v1 in1 GND pulse(0 1.8 20n 100p 100p 20n 40n)
*v2 in2 GND pulse(0 1.8 10n 100p 100p 10n 20n)
v1 in1 GND pwl(0n 1.8 5n 1.8 5.5n 0 12n 0 12.5n 1.8 18n 1.8)
v2 in2 GND pwl(0n 1.8 5n 1.8 5.5n 0 12n 0 12.5n 1.8 18n 1.8)

* Define the load capacitance of 5 femto Farads
C_I$416 out GND 5F

* N-FETs & P-FETs for a NOR circuit.
* Note: The fields of the circuit showing connections
* are as follows (for M_p1):
* label = M_p1
* drain = vdd
* gate = in2
* source = N$1
* body = swVbp
* This model uses 1.8v MOS device models
M_n2 out in2 GND swVbn nch.6 W=1.6U L= 0.18U
M_n1 out in1 GND swVbn nch.6 W=1.6U L= 0.18U
M_p2 N$1 in1 out swVbp pch.6 W=2.44U L= 0.18U
M_p1 vdd in2 N$1 swVbp pch.6 W=2.44U L= 0.18U

Eldo Spice NOR Gate Source Code (cont.)

* Step through various Vdd & biasing voltages to observe
* the results. Note: Only 1 step command can be active
* at a time. The others are commented out
*.step param swVdd 1.4v 2.2v 0.2v
*.step param swVbn -0.4v 0.4v 0.2v
.step param swVbp 1.4v 2.2v 0.2v

* Specify the transient analysis
.tran 0.001ns 40ns

* Note: If the .plot command is commented out the results of the simulation
* output with the extract commands can be observed in the biasingNOR.chi
* file.
.plot v(in1) v(in2)
.plot v(out)

* Extract the rise and fall delays at 50% of Vdd
* Returns the x-axis value of v(out) at the crossing of y-axis value of 0.9
* at the first (1) or second (2) occurrence.
.extract label = risetime (xthres(v(out), 0.9, 1) - xthres(v(in2), 0.9, 1))
.extract label = falltime (xthres(v(out), 0.9, 2) - xthres(v(in2), 0.9, 2))
.extract label = Delay ((xthres(v(out),0.9,1)-xthres(v(in2),0.9,1))+(xthres(v(out),0.9,2)-xthres(v(in2),0.9,2)))

* Extract the threshold voltage, i.e., Vdd/2 = 0.9V
* Returns the y-axis value of v(in1) when the x-axis value is xthres(v(out),0.9,1).
* xthres(v(out),0.9,1) returns the x-axis value of v(out) at the crossing of
* y-axis value of 0.9 at the first occurrence (1).
.extract label = V1_threshold (yval(v(in1),xthres(v(out), 0.9, 1)))
.extract label = V2_threshold (yval(v(in2),xthres(v(out), 0.9, 1)))

* Note: As part of the Eldo Spice simulation, static power is displayed
* during the simulation run. This is simply copied down
.END


Different Power Modes

LCB = Leakage Current Monitors
SSB = Self-Substrate Bias Circuit
Vbb = Substrate Bias Voltage

Operation: Dynamically varies Vth with substrate bias feedback control circuits to reduce active and standby power dissipation

Variable Vt Scheme

SSB Circuit

[9]


Different Power Modes (cont.)

[9]

[Figure labels: Frequency Fluctuation Control (using RBB); Operating Current Control; Max Forward Bias; RBB; FBB; NFET Body Bias Voltage (Vbb) Control; Vth 0.1v Variation; Substrate Bias Control]


Addressing Increased Performance (cont.)

Leakage Current Monitor

Substrate Charge Injector
- Drives substrate from Vstandby to GND

[9]


Addressing Power Consumption

VBC = Voltage Bias Controller
VBCG = VBC Generation
VBCP = VBC PFET
VBCN = VBC NFET
VBCI = VBC
vsub = substrate voltage = vdd - vwell
vbn = NFET Bias Voltage (0v -> -1.5v)
vbp = PFET Bias Voltage (1.8v -> 3.3v)
cbn = Control NFET Bias (1.8v -> -1.5v)
cbp = Control PFET Bias (0v -> 3.3v)
cbpr = Feedback Signal from PFETs (furthest corner)
cbnr = Feedback Signal from NFETs (furthest corner)
VBCR = Voltage Bias Controller Return
Vbbenb = Vbb Enable
Vbbenbr = Vbb Enable Mode Transition Terminate

Modes of Operation:
1. Standby Mode
2. Data Retention Mode (for battery backup)

Operation: Bias the PFET & NFET substrates to a single level to reduce leakage current. Feedback (furthest corner) is used to reduce switching noise. [8]


Improving Manufacturing Yields

[10, 12]

Operation: Uses forward & reverse body biasing to attain higher operating frequency and low power simultaneously. Technique is referred to as Speed Adaptive Threshold Voltage (SA-Vth). Bias is controlled so the delay remains constant.

SA-Vth Scheme

Clock Pulse Modulator
- Produces 3 clks at different phases with ¼ duty cycle


Improving Manufacturing Yields (cont.)

[10, 12]

Delay Comparator
- Compares delayed signals A & B with ↓clk0

Up/Down Shift Register as a decoder
- Level is either incremented or decremented, based upon delay line variation
- Vbp & Vbn are fed to the delay line circuit so the “delay” of the delay line becomes a pre-determined time

Substrate Bias (Vbb) Generators

Effectiveness of Sleep Transistors in Leakage Current Reduction

Presentation 1

Chinmay Gupte
Ashutosh Sharma

16th September, 2003

• Review of leakage current reduction techniques
• Sleep transistor configuration
• Initial design of 6T SRAM Cell

Leakage Problem

• Subthreshold leakage current Ileakage (or “off-state” current Ioff) is the small amount of drain current when VGS < VT.

• Subthreshold leakage current varies exponentially with threshold voltage VT.

• This current is influenced by VT, channel physical dimensions, channel/surface doping profile, drain/source junction depth, gate oxide thickness, and VDD.

• Scaling and power reduction trends in future technologies will cause subthreshold leakage currents to become an increasingly large component of power dissipation.
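To make the exponential dependence on VT concrete, the sketch below uses the common first-order model Ioff = I0 * 10^(-VT / S), where S is the subthreshold swing in volts per decade. This model and the values of `i0` and `swing` are assumptions chosen for illustration; they are not taken from the slides.

```python
def off_current(vt, i0=1e-7, swing=0.085):
    """First-order subthreshold model: Ioff = I0 * 10**(-VT / S).

    vt    : threshold voltage in volts
    i0    : hypothetical drain current at VT = 0, in amperes
    swing : hypothetical subthreshold swing, in volts per decade
    """
    return i0 * 10 ** (-vt / swing)


# Lowering VT by one swing (85 mV here) raises the off current tenfold.
ratio = off_current(0.300) / off_current(0.385)
```

Under these assumptions every 85 mV of threshold reduction costs a decade of leakage, which is why scaling VT along with VDD makes leakage a growing share of total power.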

Leakage Mechanisms

• Pn Reverse Bias Current (I1)
• Weak Inversion (I2)
• Drain Induced Barrier Lowering (I3)
• Gate Induced Drain Leakage (I4)
• Punchthrough (I5)
• Narrow Width Effect (I6)
• Gate Oxide Tunneling (I7)
• Hot Carrier Injection (I8)

Leakage Current Mechanisms of deep submicron transistors

Leakage Current Reduction Techniques

• Temperature
– With lowered temperature
- Sensitivity factor St decreases
- Threshold voltage VT increases
- Ioff decreases
– Rise in VT with cooler temperature is becoming insignificant, and temperature as a technique for leakage reduction may lose its efficiency for oxides that are getting thinner with SIA Roadmap achievements.

ID vs. VG showing temperature sensitivity of Ioff

Leakage Current Reduction Techniques

• Substrate-Well Biasing
– Biasing the source-well voltage to negative voltages for n-channel and positive voltages for p-channel transistor increases VT.
– As VT increases, Ioff decreases.
– Amount of backbiasing is limited by GIDL and the increased field across thin oxide.
– Indiscriminate backbiasing of the wells may lead to higher Ioff if GIDL is not understood.

n-channel log(ID) vs VG for 6 substrate biases on 0.35um logic process technology (VD = 2.7 V)
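The rise in VT under back bias is usually described by the textbook body-effect relation VT = VT0 + γ(√(2φF + VSB) − √(2φF)). The sketch below evaluates it for an n-channel device. The parameter values (`vt0`, `gamma`, `two_phi_f`) are hypothetical, and the model ignores GIDL, which the slide names as the practical limit on back biasing.

```python
from math import sqrt


def threshold_with_body_bias(vsb, vt0=0.45, gamma=0.4, two_phi_f=0.8):
    """Body-effect model for an n-channel device.

    vsb       : source-to-body voltage in volts (positive = reverse body bias)
    vt0       : hypothetical zero-bias threshold voltage, volts
    gamma     : hypothetical body-effect coefficient, V**0.5
    two_phi_f : hypothetical surface potential 2*phi_F, volts
    """
    if two_phi_f + vsb < 0:
        raise ValueError("forward bias too large for this simple model")
    return vt0 + gamma * (sqrt(two_phi_f + vsb) - sqrt(two_phi_f))


vt_zero_bias = threshold_with_body_bias(0.0)
vt_reverse_1v = threshold_with_body_bias(1.0)
```

With these numbers a 1 V reverse bias raises VT by roughly 0.18 V; a small forward bias (negative `vsb`) lowers it instead.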


• Input Vector Control
– Due to the transistor stacking effect, the leakage of a circuit depends on its input combination.
– A minimum leakage pattern, that maximizes the number of “off” transistors in all stacks across the circuit, is used to drive the circuit while in standby mode.
– Requires addition of multiplexers, resulting in additional leakage and delay.

• Stack Effect
– Forcing series transistors to be off simultaneously.
– Exerts reverse bias between gate and source of the transistor above the “off” transistor.
– Lower (more negative) VGS gives lower leakage.
– Leakage reduction achieved with minimal overhead in area, power and process technology.

Stack Effect
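Finding a minimum leakage pattern can be pictured as a search over input vectors. The sketch below does this exhaustively for a tiny circuit of 2-input NAND gates. The netlist, the per-vector leakage numbers, and the function names are all hypothetical; the only property borrowed from the slides is that the (0, 0) input, with both series nMOS devices off, leaks least.

```python
from itertools import product

# Hypothetical leakage (nA) of one 2-input NAND for each input pair.
# (0, 0) is lowest because both series nMOS devices are off (stack effect).
NAND2_LEAKAGE_NA = {(0, 0): 0.1, (0, 1): 0.6, (1, 0): 0.5, (1, 1): 1.2}

# Hypothetical netlist in topological order: gate -> (input a, input b).
NETLIST = {"g1": ("a", "b"), "g2": ("b", "c"), "g3": ("g1", "g2")}
PRIMARY_INPUTS = ["a", "b", "c"]


def total_leakage(vector):
    """Sum the leakage of every gate for one primary-input vector."""
    values = dict(zip(PRIMARY_INPUTS, vector))
    total = 0.0
    for gate, (x, y) in NETLIST.items():
        pair = (values[x], values[y])
        total += NAND2_LEAKAGE_NA[pair]
        values[gate] = 0 if pair == (1, 1) else 1  # NAND logic
    return total


def minimum_leakage_vector():
    """Try all input vectors and return the one with the least leakage."""
    return min(product((0, 1), repeat=len(PRIMARY_INPUTS)), key=total_leakage)


best_vector = minimum_leakage_vector()
```

Exhaustive search grows as 2^n in the number of inputs, so it only serves to illustrate the idea on small blocks.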


• Gating the Supply Voltage
– Basic idea is to shut down the power supply so the idle units do not consume any power.
– Use high threshold transistors called sleep transistors.
– Use MTCMOS technology

Multi-Threshold CMOS [MTCMOS] Technology

• High Vt Devices used to reduce Leakage Current
• Low Vt Devices used to enhance Performance (Delay)
• CMOS Circuit with SLEEP TRANSISTOR (Dual VT Technology)
- Effective for Burst mode type integrated circuits [Active & Standby Modes]
• Examples: Cell Phone, Pager, Processor Cache etc.
• Active Mode: Sleep Transistor ON
• Standby Mode: Sleep Transistor OFF

MTCMOS Circuit Structure

Multi-Threshold CMOS [MTCMOS] Technology

• pMOS and nMOS sleep transistors are generally not both used.
• Only nMOS is used.
– nMOS Area is smaller than pMOS.
– nMOS ‘ON’ Resistance is small.
– As a result only High-to-Low Swing is affected.
• This structure reduces Leakage Current by several orders of magnitude.
– Total effective W of original CMOS is reduced to the W of a single “OFF” nMOS (provided the width is smaller than the original pull down width)
– High Vt nMOS results in an exponential reduction in leakage current.

Multi-Threshold CMOS [MTCMOS] Circuits

• Sleep Transistor Sizing Problems (Area vs. Delay Performance)
– Large Size
• Lower ‘ON’ Resistance, shorter High-to-Low Delay [GOOD]
• Overall Area [BAD]
• Dynamic Power Consumption [BAD]
– Small Size
• Higher ‘ON’ Resistance, lower Leakage Current [GOOD]
• Overall Area [GOOD]
• Dynamic Power Consumption [GOOD]
• High-to-Low Delay [BAD]
– Proper sizing of Sleep Transistors is a key issue in power gating designs


• Other Problems
– Vx Drop Effects:
• Gate Drive reduces from Vcc to Vcc-Vx
• Vt of Pull Down nMOS increases due to Body Effect.
– Backward Current Effects:
• Charges output capacitance from 0 to Vx
• Noise Margin for next logic reduces

MTCMOS block illustrating equivalent resistance, capacitance, and reverse conduction effects.
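The Vx drop can be estimated by treating the conducting sleep transistor as a resistor, so that Vx = I × Ron and the gate drive falls to Vcc − Vx. The sketch below applies this with Ron assumed inversely proportional to transistor width. The resistor model, the constant `r_unit_ohm_um`, and the example currents are assumptions for illustration; the model also ignores the feedback of Vx on the discharge current itself.

```python
def virtual_ground_voltage(discharge_current_a, r_unit_ohm_um, width_um):
    """Vx = I * Ron, with the 'on' sleep transistor modeled as Ron = r_unit / W."""
    r_on = r_unit_ohm_um / width_um
    return discharge_current_a * r_on


def gate_drive(vcc, vx):
    """Effective gate drive of the pull-down network once the virtual ground rises."""
    return vcc - vx


# Hypothetical numbers: 1 mA discharge current, 2000 ohm*um unit resistance.
vx_wide = virtual_ground_voltage(1e-3, 2000.0, 20.0)   # wide sleep transistor
vx_narrow = virtual_ground_voltage(1e-3, 2000.0, 5.0)  # narrow sleep transistor
drive_wide = gate_drive(1.8, vx_wide)
drive_narrow = gate_drive(1.8, vx_narrow)
```

In this example the narrow device costs four times the voltage drop of the wide one, which is the area versus delay trade-off that sleep transistor sizing has to settle.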


• Sleep transistor sizing depends on the circuit's discharge current pattern.
– Which in turn depends on the input vector.

• Some discharge patterns may affect the timing limitations on the critical path.

• When analyzing MTCMOS circuits, one cannot simply examine the critical paths in the circuit, but must also consider all other accompanying gates that are switching.

• For optimal sizing, one would need to exhaustively simulate the entire circuit for all possible input vectors and all sleep transistor sizes such that delay on the critical path is within tolerable limits.

• J. Kao et al. have proposed MTCMOS Hierarchical Sizing Based on Mutually Exclusive Discharge patterns.

– In this approach, logic that does not switch at the same time shares a common sleep transistor.

– It is a systematic approach and gives an approximate size of the sleep transistor.
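The benefit of sharing a sleep transistor among gates that never discharge together can be seen with a small calculation: the shared device only has to carry the peak of the summed current, not the sum of the individual peaks. The sketch below is a simplified illustration of that observation, not the hierarchical algorithm of Kao et al. The current profiles, the resistor model Ron = r_unit / W, and the Vx budget are hypothetical.

```python
def peak_shared_current(profiles):
    """Peak of the summed current when all gates share one sleep transistor."""
    return max(sum(slot) for slot in zip(*profiles))


def sum_of_individual_peaks(profiles):
    """Total peak current if every gate had its own sleep transistor."""
    return sum(max(profile) for profile in profiles)


def min_width_um(peak_current_a, r_unit_ohm_um, max_vx):
    """Smallest width that keeps Vx = I * (r_unit / W) at or below max_vx."""
    return peak_current_a * r_unit_ohm_um / max_vx


# Hypothetical discharge currents (A) of three gates over three time slots.
# The gates discharge in different slots, i.e. they are mutually exclusive.
profiles = [
    [1.0e-3, 0.0, 0.0],
    [0.0, 0.8e-3, 0.0],
    [0.0, 0.0, 0.9e-3],
]

shared_width = min_width_um(peak_shared_current(profiles), 2000.0, 0.05)
separate_width = min_width_um(sum_of_individual_peaks(profiles), 2000.0, 0.05)
```

With these numbers one shared device needs well under half the total width of three separate ones for the same Vx budget.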

Sleep Transistor Circuit Topologies

Module/Circuit-level Sleep transistor configuration

• Consists of only one sleep transistor for the entire circuit.

• Results in increased virtual GND interconnect resistance.

• Large Sleep transistor is needed to maintain reasonable performance.

• Therefore it is not a good solution.


Cluster-based Sleep transistor configuration

• Consists of one sleep transistor per cluster.
• Sizing of the sleep transistor depends on the Maximum Simultaneous Switching Current (MSSC) per cluster.
• In order to have a small sleep transistor, logic in a cluster is chosen such that this MSSC is minimum.
• This results in extra constraints on placement and may conflict with timing-driven placement.
• Also, the Total Sleep Transistor Area will be quite large.
• Still not a good solution.

Distributed Sleep Transistor Network (DSTN) Configuration

• Combination of Circuit-level & Cluster-based Sleep transistor configuration.
• Virtual GND lines of all the clusters are connected together.
• In DSTN, the current discharges through both its own and the neighboring sleep transistors, making virtual ground better strapped.
• Results show that DSTN is better than Circuit-level & Cluster-based approaches.


6T SRAM CELL

        Pass Transistors   Inverter 1   Inverter 2
Read    M3      M4         M1     M5    M2     M6    Result
1       1       1          0      1     1      0     VC > VC_bar
0       1       1          1      0     0      1     VC < VC_bar

Operation

• RS = 0, M3=M4=0, Vc and Vc_bar reach voltage level Vdd - Vtp.
• When RS=1, M3=M4=1, read/write is performed.
• Write “0”: Vc is forced low by write circuitry and V1=0 due to M3 and V2=1 due to M6.
• Write “1”: Vc_bar is forced low by write circuitry and V1=1 due to M5 and V2=0 due to M4.

Circuit Topology of CMOS SRAM cell

The difference in voltages is small and is detected by the sense amplifiers in data-read circuitry.

6T SRAM CELL

Schematic of the 6T SRAM Cell in TSMC 0.18 um Technology

6T SRAM CELL Functionality

[Waveform plot: operations Write “1”, Read “1”, Write “0”, Read “0”; traces Read Select (RS), Bit Line C (Vc), Bit Line C_Bar (Vc_bar), Write Line “0”, Write Line “1”]

EE660 – Advanced Topics in VLSI Design

Project 5 – Presentation 1

Accurate RC Extraction by Using a Neural Network to Compensate for OPC Effects

Mahir Aydin
John Pratt

Presentation Outline

• An in-depth look at OPC (Optical Proximity Correction) and its benefits to deep submicron VLSI fabrication
• Project overview
• Review of test layout shapes
• Mentor’s Calibre OPC tool overview

Motivation for OPC

• Feature size decreases faster than the lithographic wavelength
– Current lithography deep-UV radiation wavelength = 248 nm
– Metal-1 minimum wire width in TSMC 0.18um process = 230 nm
• Resulting imperfect wiring has significant impact on performance as feature size decreases

Imperfect Wiring

Optical Proximity Correction

• Pre-compensating for sub-wavelength effects by altering the mask before fabrication

• Three major OPC techniques– Line-end treatment– Line biasing– Scattering bars

Line-End Treatment

• Hammerheads and Serifs

Line Biasing

• Line width varies with pitch
• Line biasing adjusts the width of the wires to compensate for this variation
• Works well for dense lines, but not as effective on isolated lines
• Scattering bars are used for controlling the width of isolated lines
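Line biasing can be pictured as a lookup from local pitch to a width adjustment. The sketch below shows one way such a rule table might be applied. The pitch limits and bias values are invented placeholders, including their signs and magnitudes, since real values are calibrated per process and exposure system; `biased_width` is a hypothetical name.

```python
import bisect

# Hypothetical rule table. A pitch up to PITCH_LIMITS_NM[i] gets BIAS_NM[i];
# the last bias applies to any pitch above the final limit.
PITCH_LIMITS_NM = [500, 800, 1200]
BIAS_NM = [0, 10, 20, 30]


def biased_width(drawn_width_nm, pitch_nm):
    """Return the mask width after applying the pitch-dependent bias rule."""
    index = bisect.bisect_left(PITCH_LIMITS_NM, pitch_nm)
    return drawn_width_nm + BIAS_NM[index]


dense_line = biased_width(230, 460)     # pitch within the first bin
relaxed_line = biased_width(230, 600)   # pitch within the second bin
widest_bin = biased_width(230, 1500)    # pitch beyond the last limit
```

A table like this is cheap to apply across a layout, but it has no notion of the two-dimensional neighborhood of a feature.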

Line Biasing

Scattering Bars

• Small, sub-resolution features placed alongside isolated lines to control their width during exposure

Rule Based OPC

• A collection of techniques generated for a certain process.

• Rules vary by feature size as well as exposure system

• Automatically applied to a design, similar to DRC

Model Based OPC

• Features are fragmented into small pieces, and OPC is applied to individual fragments, taking neighborhood effects into account

• Analytic or empirical model used for correction, the latter can be based on actual SEM photographs

• More refined than rule based OPC, but much more computationally intensive

Rule vs. Model Based OPC

Project Overview

• RC extraction
– Input: Layout before OPC
– Output: The RC values of different wires (poly, metal1, etc.) after OPC is applied and the mask is transferred to wafer
– Neural network will be used to perform pattern recognition and prediction

Sample of Layout Shapes

Calibre Workbench

Light intensity vs. distance

Light intensity image

OPC Effects

[Figure panels: Original Layout; OPC corrected Layout; Print image on original layout; Print image on corrected layout]

More OPC effects

[Figure panels: Layout # 1 (Original Layout); Layout # 2 (1 iteration); Layout # 3 (6 iterations); Layout # 4 (6 iterations on Layout #2)]

Project 6: A Better DRC

-Review of OPC and PSM

-Overview of Mentor’s Calibre OPC and PSM Tools

-Test-bench Layout Shapes

Review of OPC and PSM

The Problem

• Wavelength of light is the limiting factor in optical lithography

Year        Wavelength   Description            Minimum Feature Size
1980-1988   436nm        Mercury Vapor G-line   1000nm
1988-1994   365nm        Mercury Vapor I-line   350nm
1994-1996   365nm        Mercury Vapor I-line   250nm
1996-2001   248nm        KrF Excimer Laser      150nm
2002        193nm        ArF Excimer Laser      90nm

Subwavelength Lithography

Possible Solutions

• Electron Beam Lithography
• Extreme Ultra Violet Lithography
• Proximity X-ray
• Ion Beam
• Subwavelength Lithography:
– Stick with current Ultra Violet wavelengths
– Use Resolution Enhancement Techniques (RET)

The Problem with Subwavelength Lithography: Diffraction

[Figure labels, built up over three slides: Planar light source; Mask; Point light source; Fourier distribution of light on wafer]

OPC: Optical Proximity Correction

Serifs on corners to reduce corner rounding and feature length shortening

Width variation to compensate for effects of adjacent features

OPC Pattern

PSM: Phase Shifting Mask

[Figure labels: Phase Shifter; Mask; Amplitude; Intensity]

Overview of Mentor’s Calibre OPC and PSM Tools

What it can do for you

• Enables Silicon Accuracy, Speed and Yield From 180nm to 65nm.

• Provides Resolution Enhancement Technologies (OPC, PSM) for deep submicron physical verification.

• Ensures feature sizes below the wavelength of light will print accurately.

• Minimizes mask costs and lowers overall write times.

The Calibre WORKbench

Drawn Layout

OPC Corrected Layout

Simulated Wafer Image

Test-bench Layout Shapes

Layout shapes I

Layout shapes II

Layout shapes III

Layout shapes IV

Layout shapes V

A Better DRC

• Removing the hard DRC rules, allowing simulated wafer images to guide a more intelligent DRC checker.

• Training a Neural Network to detect DRC failures as a percentage yield, not just a pass or fail result.

Questions?

Project #7

by
Michael Malander
Jayashree Sridharan

Introduction

• Review of general statistics related to correlation

• Review of current techniques of statistical timing analysis

Correlation

• What is correlation
– A quantitative assessment of the strength of relationship between two variables
• Why is correlation significant
• Correlation and Causation

Correlation Example

• Pearson’s sample correlation coefficient: $r = \frac{\sum Z_X Z_Y}{n - 1}$

• Correlation coefficient r = 0.905

[Scatter plot: Positive Correlation, Y (-5 to 25) vs. X (0 to 25)]
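The z-score form of Pearson's coefficient, r = Σ(Z_X · Z_Y) / (n − 1), can be computed directly. The sketch below does so with the sample standard deviation; the data points are hypothetical and are not the ones plotted on the slide.

```python
from math import sqrt


def z_scores(values):
    """Standardize values using the sample mean and sample standard deviation."""
    n = len(values)
    mean = sum(values) / n
    std = sqrt(sum((v - mean) ** 2 for v in values) / (n - 1))
    return [(v - mean) / std for v in values]


def pearson_r(xs, ys):
    """Pearson's sample correlation: sum of z-score products over (n - 1)."""
    zx, zy = z_scores(xs), z_scores(ys)
    return sum(a * b for a, b in zip(zx, zy)) / (len(xs) - 1)


# Hypothetical data: a perfectly linear pair and a perfectly inverse pair.
r_positive = pearson_r([1, 2, 3, 4], [2, 4, 6, 8])
r_negative = pearson_r([1, 2, 3, 4], [8, 6, 4, 2])
```

Values near +1 or −1 indicate a strong linear relationship, and values near 0 indicate a weak one; neither says anything about causation.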

Current Techniques

• Circuit Delay
• Types of optimization
– Interconnect
– Gate

Methods of Analysis

• Monte Carlo
• Statistical Timing Analysis (STA)
– Why use STA

Interconnect

• Process Variation
– Parasitic R
– Parasitic C
• Environmental Variation
– Signal Arrival Time
– Capacitive Coupling

Gate Delay

• Process Variation
– Variations in Length
– Doping Density
– Oxide thickness
• Environmental Variation
– Arrival Time
– Input Slew Rate
– Threshold Voltage

Statistical Timing Analysis

• Longest Path
• Statistical Variation of Delay
– Interconnect
– Gate
• False Path Elimination
• Weight of Sensitizable Paths
• Re-evaluation of Longest Path

Probabilistic Event Propagation

• Algorithm
– Arrival time and cell delay are taken as random variables
– A cell-level netlist and cell delays are used as inputs to produce signal arrival times as output
– Circuit is first partitioned into supergates
– The events at the output of a supergate are obtained using cross product sampling evaluation and recursive sampling evaluation
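Treating arrival times and cell delays as discrete random variables, the arrival time at a gate output can be built from two operations: taking the latest of the input arrivals, then adding the cell delay. The sketch below implements both for independent inputs. It is a simplified illustration only: it assumes independence and leaves out the supergate partitioning and sampling steps that the method uses to handle correlated signals. The distributions and function names are hypothetical, with times in picoseconds.

```python
from collections import defaultdict
from itertools import product


def latest_arrival(a, b):
    """Distribution of max(A, B) for independent discrete arrival times."""
    out = defaultdict(float)
    for (ta, pa), (tb, pb) in product(a.items(), b.items()):
        out[max(ta, tb)] += pa * pb
    return dict(out)


def add_delay(arrival, delay):
    """Distribution of arrival + delay for independent discrete variables."""
    out = defaultdict(float)
    for (t, p), (d, q) in product(arrival.items(), delay.items()):
        out[t + d] += p * q
    return dict(out)


# Hypothetical inputs: each maps a time in ps to its probability.
in1 = {100: 0.5, 120: 0.5}
in2 = {110: 1.0}
cell_delay = {30: 0.5, 40: 0.5}

output_arrival = add_delay(latest_arrival(in1, in2), cell_delay)
```

Here the output arrives at 140, 150 or 160 ps with probabilities 0.25, 0.5 and 0.25.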

False Path

• Goal is to estimate a true circuit delay distribution and deliver critical, sensitizable and true paths
• Algorithm
– Phase 1:
• Slack is used to find nodes that are critical
• Construct critical paths using critical nodes
– Phase 2:
• Identify the timingly true paths using Monte Carlo sampling scheme
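Phase 1 relies on slack, the difference between the required time and the arrival time at each node. The sketch below computes slack on a small timing graph with fixed delays and lists the nodes with zero slack. The graph, the delays, and the function name are hypothetical, and the Monte Carlo step of Phase 2 is not shown.

```python
# Hypothetical timing graph in topological order: node -> (delay, fan-in nodes).
GRAPH = {
    "a": (2, []),
    "b": (3, []),
    "c": (4, ["a", "b"]),
    "d": (1, ["a"]),
    "e": (2, ["c", "d"]),
}


def slacks(graph, required_at_outputs):
    """Return slack = required time - arrival time for every node."""
    # Forward pass: latest arrival time at each node's output.
    arrival = {}
    for node, (delay, fanin) in graph.items():
        arrival[node] = delay + max((arrival[f] for f in fanin), default=0)

    # Build fan-out lists for the backward pass.
    fanout = {node: [] for node in graph}
    for node, (_, fanin) in graph.items():
        for f in fanin:
            fanout[f].append(node)

    # Backward pass: required time at each node's output.
    required = {}
    for node in reversed(list(graph)):
        if not fanout[node]:
            required[node] = required_at_outputs
        else:
            required[node] = min(required[s] - graph[s][0] for s in fanout[node])

    return {node: required[node] - arrival[node] for node in graph}


node_slack = slacks(GRAPH, required_at_outputs=9)
critical_nodes = [node for node, s in node_slack.items() if s == 0]
```

The zero-slack nodes b, c and e form the critical path in this example; a statistical version would repeat the calculation with sampled delays.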

Spatial Correlation

• Algorithm
– For each gate assign an effective length
  • Inter Die Variation
    – Enumerate Possibilities from Best to worst case
    – Use Length variation to establish Delay variation
    – Discretization of Length Distribution
  • Intra Die Variation
    – Delay Sensitivity multiplied by Length
    – Include Spatial Correlation
– Calculate the delay of each path
  • Sum of delay of each gate in the path
  • Include effects of arrival time and input slope

Spatial Correlation

• Intra-Die Length Variation
– Sum of weighted Spatial variations
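The steps on these two slides can be sketched as a small calculation: each gate's length shift is the inter-die shift plus a weighted sum of regional (spatial) shifts, its delay changes by the delay sensitivity times that shift, and the path delay is the sum over its gates. The sketch below follows that outline under assumptions of its own: a linear delay model, two spatial regions, and invented numbers. Arrival time and input slope effects are left out.

```python
def gate_length_shift(inter_die_nm, region_shifts_nm, weights):
    """Effective length shift: inter-die part plus weighted spatial parts."""
    intra_die = sum(w * r for w, r in zip(weights, region_shifts_nm))
    return inter_die_nm + intra_die


def path_delay(gates, inter_die_nm, region_shifts_nm):
    """Sum of gate delays; each gate is (nominal_ps, sensitivity_ps_per_nm, weights)."""
    total = 0.0
    for nominal, sensitivity, weights in gates:
        shift = gate_length_shift(inter_die_nm, region_shifts_nm, weights)
        total += nominal + sensitivity * shift
    return total


# Hypothetical two-gate path. Gate 2 straddles both regions, so its length
# shift is correlated with gate 1 through the shared first region.
gates = [
    (50.0, 2.0, (1.0, 0.0)),
    (60.0, 1.5, (0.5, 0.5)),
]

nominal_delay = path_delay(gates, 0.0, (0.0, 0.0))
shifted_delay = path_delay(gates, 2.0, (1.0, -1.0))
```

Because gates share region variables, their delays move together, which is the spatial correlation that independent per-gate sampling would miss.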

Conclusion

• Current Research
– Combination of Process and Environmental
• Goal
– Standard Deviation
– Correlation
