useful skew and linear programming › ece660 › presentation_sept... · 2004-08-24 · vaibhav...
Useful Skew And Linear Programming
By: Vaibhav Nawale
Tejas Pipwala
Outline of the presentation
• Skew: the problem in modern VLSI design
• Zero Skew vs. Useful Skew
• Current techniques employed for optimal useful clock skew setting
• Linear Programming overview
• LP as a tool for optimizing useful skew setting
Skew : the problem in modern VLSI designs
Causes of Clock Skew
• Interconnect delays
• Buffer delays
• Process variations
Effects of Clock Skew
• Slows down the pace of doubling clock speed (places an overhead)
• Causes malfunction due to timing violations
Zero Skew vs. Useful Skew
Zero skew: a traditional solution to the problem of skew
• Clock distribution design and data path design are considered separately
• Different clock distribution topologies with appropriate buffer insertions are used to eliminate skew
• Almost impossible to achieve; increased wire size, more power (area) penalty and higher costs
Useful skew: a recent approach to solve the clock-skew problem
• Takes into account the interactions between the data path and the clock to make use of the clock skew
• Various methods (like LPP) determine the optimal permissible useful skew
• Effective low cost solution to the clock skew problem, by introducing intentional skew within the permissible limits
Current Techniques
Skew = local constraint
Current Techniques (cont..)
Skew scheduling for design robustness
• Timing is correct as long as the signal arrives in the permissible skew range.
• The design will be more robust if the clock signal arrival time is in the middle of the permissible skew range, rather than on the edge.
Current Techniques (cont..)
Cycle stealing on synchronous circuits
• The proposed method uses an algorithm to fully discover the cycle stealing effect in level-sensitive synchronous designs.
• The algorithm first constructs a latch graph from a timing analysis of the combinational logic.
• It then analyzes cycle stealing based on the overlay timing relationships among latch nodes.
• A breadth-first search examines all possible cycle stealing among latches. The algorithm also accounts for the fact that cycle stealing is topology dependent.
Current Techniques (cont..)
Cycle stealing (cont..)
• A backtracking method using breadth-first search is applied to verify the overlay relationships through multiple stages.
• In the backtracking process, a feedback loop starts at a root latch, traverses the latch graph, and ends at the root latch. The delays in some feedback loops cannot satisfy the timing requirements, and backtracking of the overlay timing information detects these timing violations.
• The technique averages the negative slack over the latch stages in the feedback loop. For each edge, only the maximum negative slack is recorded.
Current Techniques (cont..)
Automatic Clock Skew Scheduling
• Uses an algorithm (the min cycle algorithm) to minimize the cycle period of a synchronous circuit by optimizing clock skews.
• The idea is to use the bound imposed by the critical cycle of the circuit, together with the setup/hold time constraints on individual registers, to compute the clock period.
• The resulting period is the best that can be achieved by clock skew scheduling.
• The algorithm takes advantage of the basic structure of practical circuits and hence is very efficient.
Current Techniques (cont...)
Min Cycle algorithm application
Tmax = 8
E' = {e1}, E'' = {e3, e4}
ticks(a) = 0, ticks(b) = -1, ticks(c) = 0
θ1(e1) = ∅, θ1(e2) = 2, θ1(e3) = 8, θ1(e4) = 8, θ2(e1) = 8 → θ = 2
∴ skew(a) = 0, skew(b) = 2, skew(c) = 0, new Tmax = 6
Thus, in the first iteration of the algorithm, the maximum permissible improvement of the clock period is 2. The algorithm iterates until θ = 0.
Current Techniques (cont..)
Min-Cycle Algorithm
/* Select all forward critical edges and form new graph C' (setup critical). */
E' = { e | e = (u,v) and [ dmax(e) - skew(v) + skew(u) ] == Tmax }
C' = (V, E')
/* Select all backward critical edges and form graph C'' (hold critical) */
E'' = { e | e = (u,v) and [ skew(u) + dmin(e) ] == skew(v) }
C'' = (V, E'')
/* Propagate ticks to maximum # of registers in the forward (setup) and backward (hold) directions */
for each v ∈ V in topological order {
    ticks(v) = 0                                       : if v is root of C'
               min{ { ticks(u') - 1 | (u',v) ∈ E' },
                    { ticks(u'') | (v,u'') ∈ E'' } }   : otherwise
}
/* Compute maximum θ for adjusting Tmax */
for each e = (u,v) ∈ E {
    if( ticks(v) + 1 > ticks(u) ) { /* setup */
        t = dmax(u,v) + skew(u) - skew(v)
        θ1 = min{ [ Tmax - t ] / [ ticks(v) - ticks(u) + 1 ] }
    }
    if( ticks(u) > ticks(v) ) { /* hold */
        t = dmin(u,v) + skew(u) - skew(v)
        θ2 = min{ [ t ] / [ ticks(u) - ticks(v) ] }
    }
}
θ = min{ θ1 , θ2 }
Linear Programming overview
• Problems involving the determination of the minimum or maximum of a function are a part of classical mathematical analysis and have been denoted by the collective term extreme value problems.
• Constrained extreme value problems whose objective and constraints are linear form the field of linear programming.

\[
\max:\; z = c_1 x_1 + c_2 x_2 + \dots + c_n x_n \qquad (1)
\]
subject to
\[
\begin{aligned}
a_{11} x_1 + a_{12} x_2 + \dots + a_{1n} x_n &= b_1 \\
a_{21} x_1 + a_{22} x_2 + \dots + a_{2n} x_n &= b_2 \\
&\;\;\vdots \\
a_{m1} x_1 + a_{m2} x_2 + \dots + a_{mn} x_n &= b_m
\end{aligned}
\qquad (2)
\]
and
\[
x_j \ge 0, \quad j = 1, 2, 3, \dots, n
\]
Methods
• Graphical methods; only for geometric understanding
• Simplex method; an iterative method that moves along the boundary of the feasible solution space. It covers a fraction of all the extreme points to determine the optimal solution
• Revised simplex method; a modification to the simplex method to solve large and sparse problems
• Interior point methods; faster iterative solution approaching the optimum value from within the feasible solution region
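The role of extreme points can be illustrated with a brute-force sketch for a two-variable problem. This is not the simplex method: it simply intersects every pair of constraint boundaries and keeps the best feasible corner, which is what the graphical method does by eye. The function name `solve_2var_lp`, the inequality form of the constraints, and the numbers are all assumptions made for illustration, and the sketch assumes a bounded feasible region.

```python
from fractions import Fraction
from itertools import combinations


def solve_2var_lp(c, constraints):
    """Maximize c[0]*x + c[1]*y subject to a*x + b*y <= r for each (a, b, r).

    Brute force: every pair of boundary lines is intersected and the best
    feasible intersection (extreme point) is returned as (z, x, y).
    """
    best = None
    for (a1, b1, r1), (a2, b2, r2) in combinations(constraints, 2):
        det = a1 * b2 - a2 * b1
        if det == 0:
            continue  # parallel boundaries never meet in a corner
        # Cramer's rule, with exact rational arithmetic
        x = Fraction(r1 * b2 - r2 * b1, det)
        y = Fraction(a1 * r2 - a2 * r1, det)
        if all(a * x + b * y <= r for a, b, r in constraints):
            z = c[0] * x + c[1] * y
            if best is None or z > best[0]:
                best = (z, x, y)
    return best


# Hypothetical problem: max 3x + 2y
# s.t. x + y <= 4, x + 3y <= 6, x <= 3, x >= 0, y >= 0
CONSTRAINTS = [(1, 1, 4), (1, 3, 6), (1, 0, 3), (-1, 0, 0), (0, -1, 0)]
z, x, y = solve_2var_lp((3, 2), CONSTRAINTS)
print(z, x, y)  # 11 3 1
```

Enumerating all corners grows combinatorially with the number of constraints, which is why the simplex method visits only a fraction of them.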
LP as a tool for optimizing useful skew setting
• All approaches that apply LPP to useful skew setting have to satisfy the setup and hold time specifications.
• The objective function may vary, but the basic constraint equations remain the same.
Hold time check:
\[ x_i + MIN(i,j) \ge x_j + HOLD \]
Setup time check:
\[ x_i + SETUP + MAX(i,j) \le x_j + PERIOD \]
and
\[ lower\_bound \le x_i \le upper\_bound \]
where \( x_i \) is the clock arrival time at latch i and MAX(i,j)/MIN(i,j) are the max/min delays between the latch pair.
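The setup and hold checks above are difference constraints on pairs of arrival times, so for a fixed PERIOD their feasibility can be tested without an LP solver. The sketch below swaps in a different technique from the one the slides name: Bellman-Ford relaxation on the constraint graph, wrapped in a bisection on PERIOD. It ignores the lower/upper bounds on x_i, and the function names and the 3-latch example data are hypothetical.

```python
def feasible_skews(n, paths, period, setup, hold):
    """Return arrival times x[0..n-1] that pass all checks, or None."""
    edges = []  # (u, v, w) encodes the constraint x[v] - x[u] <= w
    for i, j, dmin, dmax in paths:
        edges.append((i, j, dmin - hold))            # hold:  x_i + MIN >= x_j + HOLD
        edges.append((j, i, period - setup - dmax))  # setup: x_i + SETUP + MAX <= x_j + PERIOD
    x = [0.0] * n  # distances from a virtual source tied to every latch
    for _ in range(n):
        changed = False
        for u, v, w in edges:
            if x[u] + w < x[v] - 1e-12:
                x[v] = x[u] + w
                changed = True
        if not changed:
            return x
    return None  # still relaxing after n passes: negative cycle, infeasible


def min_period(n, paths, setup, hold, iters=60):
    """Bisect on PERIOD between 0 and the zero-skew period."""
    lo, hi = 0.0, max(p[3] for p in paths) + setup
    if feasible_skews(n, paths, hi, setup, hold) is None:
        raise ValueError("infeasible even at the zero-skew period")
    for _ in range(iters):
        mid = (lo + hi) / 2
        if feasible_skews(n, paths, mid, setup, hold) is None:
            lo = mid
        else:
            hi = mid
    return hi, feasible_skews(n, paths, hi, setup, hold)


# Hypothetical 3-latch ring: (from latch, to latch, MIN delay, MAX delay)
PATHS = [(0, 1, 2.0, 8.0), (1, 2, 2.0, 4.0), (2, 0, 2.0, 3.0)]
period, skews = min_period(3, PATHS, setup=0.0, hold=0.0)
print(round(period, 6), skews)
```

With zero skew this ring needs a period of 8 (its longest path); with intentional skew the period is limited by the MAX minus MIN spread of the first path. An LP formulation reaches the same answer and makes it easier to add other objectives or constraints.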
Different objectives
• Initially, clock skew optimization was proposed with a view to improving synchronous circuit speed performance:
  minimize: time period (P)
  subject to: the setup & hold constraints
• Another objective is to optimize the useful skew such that the clock signal arrival time is in the middle of the permissible skew range. This gives the circuit additional robustness to uncertainty in clock arrival times.
• A third objective is optimal skew setting (using a combination of 2 LP problems) incorporated in gate sizing to take advantage of the larger timing budget:
  minimize: Area subject to: Delay <= Pspec
  or
  minimize: P subject to: Area <= Aspec
Linear programming; a good choice
• The ubiquity and utility of linear programming, the simplicity and accessibility of the underlying mathematics, its capability for modeling and solving a vast array of concrete real-world problems, and its close connections with computer science justify its use for a wide variety of problems.
• Clock period minimization by optimal useful skew setting is just one of the many.
• The existing algorithms do not consider the process variations that cause uncertainty in the clock arrival time. With LPP it is much easier to add this uncertainty as an additional constraint when determining the optimal clock period.
References:
[1] Ichiang Lin, Ludwig John Kwok Eng, "Analyzing cycle stealing on synchronous circuits with level sensitive latches", Proceedings of the 29th ACM/IEEE Design Automation Conference, July 1992
[2] Ravindram Kaushik, "Automatic clock skewing", Department of Electrical Engineering and Computer Science, University of California, Berkeley
[3] Fishburn, J M., "Clock Skew Optimization", IEEE Transactions on Computers, Volume 39, Issue 7, July 1990
[4] Joe G. Xi, Wayne W.-M. Dai, "Useful Skew Clock Routing With Gate Sizing for Low Power Design", Proceedings of the 33rd Design Automation Conference, June 1996
[5] Xun Liu, Marios C. Papaefthymiou, Eby G. Friedman, "Maximizing performance by retiming and clock skew scheduling", Proceedings of the 36th Design Automation Conference, 21-25 June 1999
STATISTICAL TIMING ANALYSIS
Kranthi Bodepudi
Praveen Vangari
Outline
• STA – case analysis
• Process variations
• Statistical Timing Analysis
– Full chip analysis
– Path based analysis
• Delays based on arrival times (NAND, NOR)
Static timing analysis
• Process parameters modeled in STA using case analysis.
• Analysis using Best-case, nominal and worst-case SPICE parameter sets.
• Deterministic delays are used, statistical variation in silicon is hidden. (die-die)
• Worst case analysis for within-die variations leads to pessimistic results (since it ignores inherent statistical variations)
Process Variations
• Uncertainty in device
• Interconnect characteristics
– Effective gate length
– Doping concentrations
– Oxide thickness
• Inter-die variation and intra-die variation
Statistical Timing Analysis
• Objective (PDF/CDF)
• Full chip analysis
– Bounds
– Selective Enumeration
• Path based analysis
– Gate level model
– Spatial correlation model
Statistical timing analysis formulation
• Circuit delay is a random variable
• The CDF of the delay of a probabilistic timing graph \( G_P \) is
\[
P(D(G_P) \le t) = \int_{D(G_P) \le t} p_1(t_1)\, p_2(t_2) \cdots \, dt_1\, dt_2 \cdots
\]
where \( p_i(t_i) \) = PDF of the delay of edge i.
• Enumerate all possible edge delays
Exponential run-time = \( k^n \) for n edges
k = number of discrete values in an edge delay PDF.
Statistical timing analysis
• Arrival time propagation
• Independent arrival times
Two arrival times An,i and An,j at nodes ni and nj are independent if the fanin subgraphs GS,i and GS,j at nodes ni and nj are disjoint (meaning they have no common edges) or if any common edges have a deterministic delay.
• Dependent arrival times
Consider the set of k predecessor nodes np,i of node n, with fanin subgraphs GS,i, and the intersection graph GI = {Ni, Ei} consisting of the union of edges and nodes shared by two or more subgraphs, excluding the source node ns. The dependence set of n is the set of nodes {n1, n2, ..., nd, ...} such that nd lies on the intersection graph and has one or more fanout edges ei that do not lie on the intersection graph.
Arrival times
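• Independent arrival times
PDF (sum of two independent delays, a convolution):
\[
p_{PG}(t) = \int_0^{\infty} p(t - \tau) \cdot q(\tau)\, d\tau
\]
CDF (maximum of two independent arrival times, a product):
\[
P_{PG}(t) = P(t) \cdot Q(t)
\]
– Linear run time complexity
– Only valid if arrival times are independent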
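For discrete PDFs the two independent-case operations reduce to a few lines. The sketch below is illustrative only: it stores a PDF as a dictionary from time to probability, and the helper names `add_delay`, `stat_max` and `cdf` are assumptions, not names from the slides. The pairwise loops are written for clarity rather than for linear run time.

```python
from collections import defaultdict


def add_delay(arrival, delay):
    """PDF of (arrival time + edge delay) for independent variables: a convolution."""
    out = defaultdict(float)
    for ta, pa in arrival.items():
        for td, pd in delay.items():
            out[ta + td] += pa * pd
    return dict(out)


def stat_max(a, b):
    """PDF of max(a, b) for independent arrival times a and b."""
    out = defaultdict(float)
    for ta, pa in a.items():
        for tb, pb in b.items():
            out[max(ta, tb)] += pa * pb
    return dict(out)


def cdf(pdf, t):
    """Probability that the variable is <= t."""
    return sum(p for x, p in pdf.items() if x <= t)


# Hypothetical discrete PDFs: {time: probability}
A = {1: 0.5, 2: 0.5}
B = {1: 0.5, 3: 0.5}
M = stat_max(A, B)
S = add_delay(A, B)
print(M, S)
```

The CDF of `M` equals the product of the CDFs of `A` and `B` at every time point, which is the product rule for independent arrival times. If `A` and `B` shared a common edge delay, this product would no longer hold.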
Arrival times
• Dependent arrival time propagation
1. Identify all dependence nodes in the circuit.
2. Propagate arrival time PDFs in the circuit until the first dependence node nd is encountered.
3. Enumerate the pairs (ti, pi) of the arrival time PDF Ad at nd and, for each pair, propagate ti with conditional probability pi.
4. Propagate ti using independent arrival time propagation until the next dependence node is encountered, then repeat the step above.
5. Compute the final arrival time PDF at node x by summing the conditional arrival time PDFs weighted by the product of their conditional probabilities.
• Recursive enumeration of the arrival time PDFs of all dependence nodes has exponential run-time complexity.
• Since there are fewer dependence nodes than edges, enumerating arrival times at the dependence nodes alone is sufficient.
Statistical Bounds
• Upper bound
– The arrival time CDF P(t) is an upper bound of the arrival time CDF Q(t) if and only if, for all t, \( P(t) \le Q(t) \).
• Lower bound
– Given the CDFs X(t) and Y(t) of two dependent random variables x and y and the random variable z = max(x, y), it is clear that the CDF min(X(t), Y(t)) is a lower bound on the CDF of z.
– Similar to independent arrival time propagation, but at convergence nodes the statistical max function is replaced with a selection of the arrival time with the largest expected value.
Overall approach
• Exact Graph Reduction
– If \( t_{1,\min} > t_{2,\max} \), where A1 and A2 are two arrival time PDFs that converge at a node, then A2 is pruned.
– Series and Parallel Reduction
– Parallel Reduction removes local reconvergence and improves the quality of lower bounds.
Overall approach continued..
• Selective enumeration
– Bound computation with enumeration of dependence nodes
• Partition of sample space, reduction of dependencies of arrival time PDFs.
• When all dependence nodes of a particular convergence node are enumerated, the arrival times at this node are independent and the lower bound can be computed using the statistical maximum, instead of selecting the arrival time with the latest expected value.
• As the number of enumerated dependence nodes increases, the quality of the bounds increases.
Results of bounds and selective Enumeration
Path-Based statistical timing analysis
• Chooses top ‘n’ critical paths with deterministic timing analysis
• Statistical analysis of each path
• Avoids the issue of reconvergence
• Accounts for intra die process variations
Published work
Considers
– Inter and Intra-die correlations
– Variations of input slope and output loads
– Spatial correlation of intra-die gate length variations
Device length
Path delay Dp resulting from variation of the length of individual gates in the path:
\[ L_{total,i} = L_{nom} + \Delta L_{inter} + \Delta L_{intra,i} \]
\[ D_P = \sum_i D_i(L_{nom} + \Delta L_{inter} + \Delta L_{intra,i}) \]
Simplifying assumption:
\[ D_i(L_{nom} + \Delta L_{inter} + \Delta L_{intra,i}) = D_i(L_{nom}) + \Delta D_i(\Delta L_{inter}) + \Delta D_i(\Delta L_{intra,i}) \]
• Inter-die variability analysis ( \( \sum_i D_i(L_{nom}) + \sum_i \Delta D_i(\Delta L_{inter}) \) )
– Computed for different possibilities of worst case to best case process corners.
• Intra-die variability analysis ( \( \sum_i \Delta D_i(\Delta L_{intra,i}) \) )
– Function of multiple independent random variables
\[ \sum_i \Delta D_i(\Delta L_{intra,i}) = \sum_i \frac{\partial D_i}{\partial L_{intra,i}} \times \Delta L_{intra,i} \]
\[ \Delta D_i = f(\Delta cl_{i+1}, \Delta S_{i-1}, \Delta L_{intra,i}) \]
\( \Delta S_{i-1} \) -- Change in output slope of preceding gate
\( \Delta cl_{i+1} \) -- Change in input slope of succeeding gate
Total change in path delay
\[ \Delta D_{p,intra} = \sum_i (K_i \times \Delta L_{intra,i}) \]
\( K_i \) -- coefficient of total path delay change due to intra-die device length \( \Delta L_i \) at gate ‘i’.
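Spatial Correlation model
Delay variation for each gate, with the respective coefficients of total path delay change, can be expressed as
\[
\begin{aligned}
\Delta D_1 &= K_1 (\Delta L_{2,1} + \Delta L_{1,1} + \Delta L_{0,1}) \\
\Delta D_2 &= K_2 (\Delta L_{2,4} + \Delta L_{1,1} + \Delta L_{0,1}) \\
\Delta D_3 &= K_3 (\Delta L_{2,15} + \Delta L_{1,4} + \Delta L_{0,1})
\end{aligned}
\]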
NAND
[Figure: NAND gate schematic with inputs A and B, capacitances CL and Cab, and supply VDD.]

A (ns)   B (ns)   Delay (ps)
0        1        68.31
0        2        68.491
0        3        68.586
0        4        68.662
0        5        68.708

A (ns)   B (ns)   Delay (ps)
1        0        63.115
2        0        63.112
3        0        63.114
4        0        63.112
5        0        63.113

NOR
[Figure: NOR gate schematic with inputs A and B, capacitances CL and Cab, and supply VDD.]

A (ps)   B (ps)   Delay (ps)
0        10       44.22
0        20       42.21
0        30       41.25
0        40       41.14
0        50       40.84

A (ps)   B (ps)   Delay (ps)
10       0        44.81
20       0        44.02
30       0        44.94
40       0        45.64
50       0        46.15
References
• “Statistical Timing Analysis using Bounds”, Aseem Agarwal, David Blaauw, Vladimir Zolotov, Sarma Vrudhula.
• “Statistical Timing Analysis using Bounds and Selective Enumeration”, Aseem Agarwal, David Blaauw, Vladimir Zolotov, Sarma Vrudhula.
• “Path-Based Statistical Timing Analysis Considering Inter- and Intra-Die Correlations”, Aseem Agarwal, David Blaauw, Vladimir Zolotov, Savithri Sundareswaran, Min Zhao, Kaushik Gala, Rajendran Panda.
Body Biasing (BB)
and
Evolutionary Algorithms
David Balhiser
and
David M. Sendek
EE660 Advanced Topics - VLSI Design
16 September 2003
Outline
• Body Biasing
– What is it ?
– Why do it ?
– How does it work ?
– How do we use body biasing ?
• Evolutionary Algorithms
Body Biasing – What is it ? (Forward Biasing)
If Vsb > 0 => Forward Biasing, where Vsb is the Source-to-Body Voltage.
Advantages:
• Used in active mode to increase operating frequency.
• Improves short channel effects, thus reducing sensitivity to critical dimension variation.
• If a voltage divider circuit is applied, Vbp < Vdd or Vbn > Vss can be generated on chip.
Disadvantages:
• Increases leakage power as well.
• If additional circuits are present, a p-well will likely be required to isolate elements, which could include separate biases (triple-well).
Body Biasing – What is it ? (Reverse Biasing)
If Vsb < 0 => Reverse Biasing, where Vsb is the Source-to-Body Voltage.
Advantages:
• Lowers leakage power
Disadvantages:
• Requires a separate power supply for Vbp and/or Vbn
[10]
The Focus will be on N-well Biasing
Body Biasing – What is it ?(Triple Well)
Body Biasing – Why Do it ? (General Trends)
↑Vdd => ↓TD => ↑fop => ↑Ps, ↑PD => ↑IL
↑Vbn => ↓Vth, ↓TD => ↑fop => ↑Ps (FBB), ↑PD
↑Vbp => ↓Vth, ↑TD (RBB) => ↓fop => ↓Ps, →PD, ↑PSH
↓Vth => ↑IL, ↓TD (Forward Bias) => ↑fop
BOTTOM LINE:
If ↑fop is desired => FBB => ↓Vth (which also ↑Ps)
If ↓Ps is desired => RBB => ↑Vth (which also ↓fop)
[3, 5]
LEGEND:
Vdd = Supply Voltage
Vth = Threshold Voltage
TD = Delay Time
fop = Operating Frequency
Ps = Static Power Dissipation
PD = Dynamic Power Dissipation
IL = Leakage Current
↑ = Increasing
↓ = Decreasing
↑↓ = Variable
→ = Little Change
FBB = Forward Body Bias
RBB = Reverse Body Bias
NBB = Normal Body Bias
Vth is changed for a Power-Performance Trade-Off
Body Biasing – How Does it Work ? (Formulas)
\[ V_{th} = V_{fb} + V_{T\text{-}MOS} \]
\[ V_{th} = V_{fb} + 2\phi_b + \frac{\sqrt{2 \varepsilon_{Si}\, q\, N_A\, (2\phi_b + |V_{sb}|)}}{C_{OX}} \]
\[ V_{th} = V_{t0} + \gamma \left[ \sqrt{2\phi_b + |V_{sb}|} - \sqrt{2\phi_b} \right] \]
where
\[ \gamma = \frac{t_{ox}}{\varepsilon_{ox}} \sqrt{2 q\, \varepsilon_{Si}\, N_A} = \frac{1}{C_{OX}} \sqrt{2 \varepsilon_{Si}\, q\, N_A} \]
LEGEND:
Vth = Threshold Voltage
Vfb = Flatband Voltage
VT-MOS = Ideal Threshold Voltage
φb = Bulk Potential
εSi = Permittivity of Silicon
q = Charge of an Electron
NA = Substrate Doping Concentration
|Vsb| = Voltage Between Source and Substrate
COX = Oxide Capacitance
Vt0 = Threshold Voltage for Vsb = 0
γ = Constant for Substrate Bias Effect
tox = Gate Oxide Thickness
εox = Dielectric Constant
[7]
Vth is changed since Vsb is directly affected by body biasing
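A small numeric sketch of the body-effect expression may help. It follows the |Vsb| form of the threshold formula, so it only shows how Vth moves with the magnitude of the source-to-body voltage. All device values (oxide thickness, doping, Vt0, φb) are hypothetical placeholders, not data from the slides.

```python
import math

Q = 1.602e-19          # electron charge (C)
EPS0 = 8.854e-12       # vacuum permittivity (F/m)
EPS_SI = 11.7 * EPS0   # permittivity of silicon
EPS_OX = 3.9 * EPS0    # permittivity of SiO2


def body_effect_gamma(t_ox, n_a):
    """gamma = (t_ox / eps_ox) * sqrt(2 * q * eps_Si * N_A), SI units."""
    return (t_ox / EPS_OX) * math.sqrt(2 * Q * EPS_SI * n_a)


def threshold_voltage(v_sb, vt0=0.4, phi_b=0.35, gamma=0.2):
    """Vth = Vt0 + gamma * (sqrt(2*phi_b + |Vsb|) - sqrt(2*phi_b))."""
    return vt0 + gamma * (math.sqrt(2 * phi_b + abs(v_sb)) - math.sqrt(2 * phi_b))


# Hypothetical process: 4 nm oxide, 1e17 cm^-3 doping (1e23 m^-3)
g = body_effect_gamma(t_ox=4e-9, n_a=1e23)
for v_sb in (0.0, 0.2, 0.5):
    print(v_sb, round(threshold_voltage(v_sb, gamma=g), 4))
```

The loop shows Vth rising with |Vsb|, which is the lever that reverse body bias uses to cut leakage at the cost of speed.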
Body Biasing – How Does it Work ? (Formulas)
\[ I_{ds} = \frac{\mu \varepsilon}{t_{ox}} \cdot \frac{W}{L_{eff}} \cdot \frac{(V_{gs} - V_{tn})^2}{2} \quad \text{(in saturation)} \]
\[ I_L = \frac{W}{L_{eff}}\, I_s \left[ 1 - e^{-V_{dd}/V_T} \right] e^{-(V_{th} + V_{off})/(n V_T)} \]
LEGEND:
Ids = Drain-to-Source Current
µ = Channel Carrier Mobility
ε = Permittivity of the Gate Insulator
tox = Gate Oxide Thickness
W = Width of the Channel
Leff = Effective Length of the Channel
Vgs = Gate-to-Source Voltage = Vds + Vth
Vtn = Threshold Voltage of NFET
Vth = Threshold Voltage
Vds = Drain-to-Source Voltage
IL = Leakage Current = Ioff (weak inversion region)
Is = process constant (empirically derived)
Vdd = Supply Voltage
VT = Thermal Voltage
Voff = Offset Voltage (empirically derived; process dependent)
n = swing factor (empirically derived; process dependent)
[3, 7, 16]
Vth has an Exponential Impact on Leakage Current
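The exponential dependence can be made concrete by evaluating the leakage expression at two threshold voltages. The sketch below uses placeholder values for W/Leff, Is, Voff, n and VT; these are assumptions chosen only to show the trend.

```python
import math


def leakage_current(v_th, v_dd=1.8, w_over_l=10.0, i_s=1e-6,
                    v_off=0.0, n=1.5, v_t=0.0259):
    """IL = (W/Leff) * Is * [1 - exp(-Vdd/VT)] * exp(-(Vth + Voff) / (n * VT))."""
    return (w_over_l * i_s
            * (1 - math.exp(-v_dd / v_t))
            * math.exp(-(v_th + v_off) / (n * v_t)))


# Lowering Vth by 100 mV (for example through forward body bias)
ratio = leakage_current(0.3) / leakage_current(0.4)
print(round(ratio, 1))
```

With these placeholder values a 100 mV drop in Vth raises leakage by roughly an order of magnitude, which is why forward body bias is costly in standby.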
Body Biasing – How Does it Work ? (Formulas)
\[ P_D = f_{op}\, C_L\, V_{DD}^2\, \alpha \]
\[ P_S = \sum I_L V_{DD} \]
\[ P_{sh} = f_{op}\, V_{DD}\, \alpha \int_{1\,cycle} I_S\, dt \]
\[ P_{total} = P_D + P_S + P_{sh} \]
LEGEND:
PD = Dynamic Power Dissipation
fop = Operating Frequency
CL = Capacitive Load
VDD = Supply Voltage
α = Activity Factor
PS = Static Power Dissipation
IL = Leakage Current
Psh = Short-Circuit Power Dissipation
∫1cycle = Integration over 1 clock cycle
IS = Short Circuit Current
Ptotal = Total Power Dissipation
[3, 6]
Body Biasing – How Does it Work ? (Formulas)
“Alpha-Power Model”
\[ t = \frac{d_L\, k}{(V_{DD} - V_{th})^{\alpha}} \]
LEGEND:
t = Delay of a basic CMOS Inverter
dL = Logic Depth of the path
k = Process Dependent constant
VDD = Supply Voltage
Vth = Threshold Voltage
α = Measure of Velocity Saturation
[3]
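The delay side of the trade-off follows directly from the alpha-power expression. The sketch below evaluates it with hypothetical values for k and α; only the relative change between operating points is meaningful.

```python
def gate_delay(v_dd, v_th, logic_depth=1, k=1.0, alpha=1.3):
    """t = dL * k / (VDD - Vth)^alpha, in arbitrary time units."""
    return logic_depth * k / (v_dd - v_th) ** alpha


nominal = gate_delay(1.8, 0.40)
forward = gate_delay(1.8, 0.35)   # lower Vth, as under forward body bias
reverse = gate_delay(1.8, 0.45)   # higher Vth, as under reverse body bias
print(forward / nominal, reverse / nominal)
```

The same 50 mV shift in Vth that changes delay by a few percent changes leakage by a much larger factor, so the bias point is a genuine power-performance trade-off.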
Background
Types of Process Variations
• Lot-to-Lot (L2L)
• Wafer-to-Wafer (W2W)
• Die-to-Die (D2D)
• Within Die (WID)
Body Biasing
• Static Body Biasing
• Adaptive Body Biasing
The ABB Methods Presented Focus on WID Parameter Modifications
Body Biasing – How do we use body biasing ?
Static Body Biasing
• FBB on critical paths to increase performance (↑fop)
• RBB on non-critical paths (↓Ps)
• Combination of the two
A Single Biasing Value is Selected
Body Biasing – How do we use body biasing ?
Adaptive Body Biasing (ABB)
• Power-Performance Trade-off across the chip
– Predominantly used to optimize power and/or performance
– Different power modes (such as active, stand-by, sleep)
• Power-Performance Trade-off of sub-circuits within the chip
– Affects yields (binning)
• Power-Performance Trade-off of every N-well within the chip
– Affects yields (binning)
– Most useful for WID variations
Body Biasing – How do we use body biasing ?
[Figure: plot with Frequency and Power axes; curves labeled Vdd and Body Biasing (FBB, RBB).]
Curves are Process Dependent
Adaptive Body Bias (ABB) Techniques
Addressing Power Consumption
VBC = Voltage Bias Controller
VBCG = VBC Generation
VBCP = VBC PFET
VBCN = VBC NFET
VBCI = VBC
vsub = substrate voltage = vdd - vwell
vbn = NFET Bias Voltage (0v -> -1.5v)
vbp = PFET Bias Voltage (1.8v -> 3.3v)
cbn = Control NFET Bias (1.8v -> -1.5v)
cbp = Control PFET Bias (0v -> 3.3v)
cbpr = Feedback Signal from PFETs (furthest corner)
cbnr = Feedback Signal from NFETs (furthest corner)
VBCR = Voltage Bias Controller Return
Vbbenb = Vbb Enable
Vbbenbr = Vbb Enable Mode Transition Terminate
Modes of Operation:
1. Standby Mode
2. Data Retention Mode (for battery backup)
Operation: Bias the PFET & NFET substrates to a single level to reduce leakage current. Feedback (furthest corner) is used to reduce switching noise. [8]
Adaptive Body Bias (ABB) Techniques
Addressing Power Consumption (cont.)
[8]
Adaptive Body Bias (ABB) Techniques
Improving Yields
[5, 15]
- CUT represents a critical path
- ROenable is a Ring Oscillator for frequency and active power measurements
Adaptive Body Bias (ABB) Techniques
[5, 15]
φ = Desired Operating Frequency
PD = Phase Detector
5-bit Counter = Clocked by PD; its value represents the desired body bias to apply (32mV resolution)
Op Amp = Converts digital to analog level
Operation: PD compares the critical path delay with the target operating frequency. The FF clk divider allows the body bias generator to stabilize and the critical path to adapt to the new body bias before the PD is updated.
Improving Yields
Adaptive Body Bias (ABB) Techniques
[5, 15]
Improving Yields (cont.)
Initial Distribution of Chips
Yield Improvement (w/biasing)
LEGEND:
ABB = Adaptive Body Bias
NBB = Normal Body Bias
σ/µ = Frequency Variation
RBB = Reverse Body Bias
FBB = Forward Body Bias
Biasing Histogram
Adaptive Body Bias (ABB) Techniques
[5, 15]
Improving Yields (cont.)
Biasing Histogram
Adaptive Body Bias (ABB) Techniques
Improving Yields (controlling WID variations)
vdiv_cntl = Voltage Divider Control
Pull_cntl = switch for Vbp (or, if tied to Vdd, can control Vdd)
Operation: Individual PFET N-wells are biased based upon a scan latch control. The same scheme could be used to control Vdd too. This permits frequency increases and/or power reduction for product binning.
[4, 11]
Adaptive Body Bias (ABB) Techniques
Improving Yields (controlling WID variations) (cont.)
[4, 11]
LEGEND:
NWF = N-Well Floating
VDD = VDD Biasing
NWB = N-Well Biasing (Vbp)
DWB = Dual Well Biasing (Vbp & Vbn)
VDIV = Voltage Divider Control
Initial Distribution of Chips
Final Distribution of Chips (w/biasing)
Yield Improvement (w/biasing)
Evolutionary Algorithms
Why Search Optimization ?
Optimization is used in lieu of exhaustive searching.
Problem:
• Assume we have hundreds of sub-circuits on a die where a body bias is required. In addition, assume there are thousands of dies.
• The goal is to determine the body bias set points for each sub-circuit for each chip.
• It would take years to exhaustively search for the best body bias set point of each sub-circuit for a single chip.
Hence, search optimization is used.
Evolutionary Algorithm*
• Based upon Darwin’s Theory of Evolution
• A search optimization technique
Algorithm
initialize population;              //Generate initial population & encode to
                                    //format for each individual in the population
evaluate population;                //Determine “fitness” of each individual
while NOT (termination criteria)    //Usually, the number of generations or “good” soln
{
    select population;              //Survival of the “fittest” individuals
    crossover population;           //Mating between individuals
    mutate population;              //A slight characteristic change to an individual
    evaluate population;            //Determines “fitness” of each individual
}
[1] * = Also referred to as a Genetic Algorithm (GA)
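The loop above can be sketched as a small runnable program. Everything concrete here is an assumption made for illustration: bit-list chromosomes, truncation selection with one elite survivor, single-point crossover, per-bit mutation, a fixed generation count as the termination criterion, and a toy fitness (`count_zeros`, to be minimized) standing in for a real body-bias cost.

```python
import random


def run_ga(fitness, n_bits=16, pop_size=20, generations=30, p_mut=0.05, seed=0):
    """Minimize fitness(bits) with a basic GA; returns (best, best-per-generation)."""
    rng = random.Random(seed)
    # initialize population
    pop = [[rng.randint(0, 1) for _ in range(n_bits)] for _ in range(pop_size)]
    history = []
    for _ in range(generations):
        pop.sort(key=fitness)                 # evaluate (lower is fitter)
        history.append(fitness(pop[0]))
        parents = pop[: pop_size // 2]        # select: keep the fitter half
        children = [pop[0][:]]                # elitism: best survives unchanged
        while len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            cut = rng.randrange(1, n_bits)    # single-point crossover
            child = a[:cut] + b[cut:]
            # mutate: flip each bit with probability p_mut
            child = [bit ^ 1 if rng.random() < p_mut else bit for bit in child]
            children.append(child)
        pop = children
    best = min(pop, key=fitness)
    history.append(fitness(best))
    return best, history


def count_zeros(bits):
    """Toy fitness: the all-ones chromosome is the optimum (fitness 0)."""
    return bits.count(0)


best, history = run_ga(count_zeros)
print(history[0], history[-1])
```

Because the elite individual is copied unchanged into each new generation, the best fitness in `history` never gets worse, which is the population convergence behaviour the slides describe.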
Basic Genetic Algorithm
[Figure: parameter space of all possible solutions [2]. Legend markers: 1 chromosome or individual (a possible solution); results of crossover (chromosome sharing); results of mutation (minor chromosome change); absolute “minimum” (for discussion purposes), the optimum chromosome.]
[Figure: search space of all possible solutions. Fitness (minimize) vs. generations 0, 1, 2, 3, ..., n, showing population convergence toward the optimum solution or target. Each generation is one population; each chromosome has an associated fitness value.]
Basic Genetic Algorithm
• A population consists of individuals called chromosomes, and its size is usually static. The population represents one set of solutions.
• One iteration is a generation.
• A chromosome can be represented/encoded as binary, real, integer, characters, etc.
• A chromosome structure can be an array, a Directed Acyclic Graph, a tree, a list, etc.
• A chromosome is evaluated on its “fitness”, the function to be optimized.
Basic Genetic Algorithm Operators
Crossover (single point)
chromosome 1: 1001011001
chromosome 2: 1110111000
child chromosome: 1001111000
Mutation
chromosome 1100010101 → 1100110101
Why These Operators ?
• Crossover drives the population towards fitter individuals
• Mutation ensures population diversity
Selection Operation Ensures Survival of the Fittest Individuals
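The two operators can be reproduced on the example chromosomes. The crossover point (after the fourth bit) and the mutated position (index 4, counting from 0) are inferred from the example strings rather than stated on the slide, and the function names are illustrative.

```python
def single_point_crossover(parent1, parent2, point):
    """Child takes parent1's bits before `point` and parent2's bits from `point` on."""
    return parent1[:point] + parent2[point:]


def mutate(chromosome, index):
    """Flip the single bit at `index`."""
    flipped = "1" if chromosome[index] == "0" else "0"
    return chromosome[:index] + flipped + chromosome[index + 1:]


child = single_point_crossover("1001011001", "1110111000", 4)
mutant = mutate("1100010101", 4)
print(child)   # 1001111000
print(mutant)  # 1100110101
```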
Basic Genetic Algorithm Variations(a short list)
Crossover Operator
• Single Point
• Two-point
• Uniform (every point)
• No Crossover (Random-walk Hill Climber)
Selection Operation
• Tournament Selection
• Roulette Wheel Selection
• Elitism
Basic Genetic Algorithm
Advantages
• No presumptions about the problem space. If presumptions are made, they can help the search converge to an optimal solution.
• Low development cost.
• Applicable to a wide range of problems where no “good” method is available.
Disadvantages
• No guarantee of finding optimal solutions in a finite amount of time. But if a near-optimal solution is sufficient, this is an advantage.
• The search can get stuck in a local minimum, although mutation helps it escape.
• Parameter tuning is mostly by trial & error, for example when a multi-objective function is folded into a single fitness function F = αf1 + (1-α)f2, where α is a “weight”.
[2]
Multi-Objective (MO) GA’s
• Optimize two or more objectives concurrently.
• Determines/maps the trade-off space for a given problem.
• For ABB optimization:
– GA objectives:
  1) Delay minimization.
  2) Power minimization.
– Constraint: limited tester time per part (1 minute).
– Systematic goals: yield enhancement, bin upgrading.
NSGA-II Algorithm
• Maintains solution diversity using an improved density metric.
• Retains the best solutions between generations (elitist).
• Reasonable O(MN^2) time complexity (M objectives, population size N).
• Empirically tested to show best or near-best results for a wide variety of MO problems versus other advanced MO GA’s.
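The trade-off space that a multi-objective GA maps is the set of non-dominated solutions. The sketch below shows only that dominance filter for two minimized objectives (delay, power). It is not NSGA-II itself, which additionally ranks successive fronts and applies a density metric. The candidate values are hypothetical.

```python
def dominates(a, b):
    """a dominates b if it is no worse in every objective and better in at least one."""
    return (all(x <= y for x, y in zip(a, b))
            and any(x < y for x, y in zip(a, b)))


def pareto_front(points):
    """Keep the points not dominated by any other point (pairwise comparison)."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


# Hypothetical (delay, power) results for four body-bias settings
CANDIDATES = [(1.0, 5.0), (2.0, 3.0), (3.0, 4.0), (4.0, 1.0)]
front = pareto_front(CANDIDATES)
print(front)  # [(1.0, 5.0), (2.0, 3.0), (4.0, 1.0)]
```

The setting (3.0, 4.0) is dropped because (2.0, 3.0) is better in both delay and power; the remaining three are genuine trade-offs. Comparing every pair over M objectives is where a cost of the order M·N² comes from.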
Summary
Body Biasing is a circuit design technique to trade power & performance
Body Biasing can improve yield
Body Biasing can be done adaptively
An “intelligent” search method is used to determine body bias set points.
Genetic Algorithms provide a methodology to search “intelligently”.
References
1. M. Srinivas and Lalit M. Patnaik, “Genetic Algorithms: A Survey”, IEEE Computer, June 1994
2. Kalyanmoy Deb, “Evolutionary Algorithms: Techniques and Applications”, 9th International Conference on Neural Information Processing, 4th Asia-Pacific Conference on Simulated Evolution and Learning, International Conference on Fuzzy Systems and Knowledge Discovery, November 18-22, 2002, Singapore
3. Justin Gregg, “A Low Cost Individual-Well Adaptive Body Bias (IWABB) Scheme for Leakage Power Reduction, Performance Enhancement and Manufacturing Yield Improvement in the Presence of Intra-Die Variations”, Summer 2003
4. Justin Gregg and Tom W. Chen, “Optimization of Individual Well Adaptive Body Biasing (IWABB) Using Single Objective Algorithms”, draft Summer 2003
5. James W. Tschanz, James T. Kao, Siva G. Narendra, Raj Nair, Dimitri A. Antoniadis, Anantha P. Chandrakasan and Vivek De, “Adaptive Bias for Reducing Impacts of Die-to-Die and Within-Die Parameter Variations on Microprocessor Frequency and Leakage”, IEEE Journal of Solid-State Circuits, Vol. 37, No. 11, November 2002, pgs. 1396 – 1402
6. Ajith Leo Chandy, “Optimized Placement and Allocation of Decoupling Capacitors for Improvement in System Performance Under the Leakage Power Constraint”, draft Fall 2003
7. Neil H.E. Weste and Kamran Eshraghian, “Principles of CMOS VLSI Design: A Systems Perspective”, second edition, Addison-Wesley, October 1994
8. Hiroyuki Mizuno et al., “A 18µA-standby-Current 1.8v 200MHz Microprocessor with Self Substrate-Biased Data-Retention Mode”, ISSCC99, 16 February 1999
10. Masayuki Miyazaki et al., “A 1.2GIPS/W Microprocessor Using Speed-Adaptive Threshold-Voltage CMOS With Forward Bias”, IEEE J. of SSC, Vol. 37, No. 2, pgs. 210-217, February 2002
11. Tom Chen, “EE660 Advanced Topics – VLSI Design”, Fall 2003
12. Masayuki Miyazaki et al., “A Delay Distribution Squeezing Scheme with Speed-Adaptive Threshold-Voltage CMOS (SA-Vt CMOS) Low Voltage LSIs”, 1998 International Symposium on Low Power Electronics and Design Proceedings, pgs. 48-53, August 1998
13. Ali Keshavarzi et al., “Forward Body Bias for Microprocessors in 130nm Technology Generation and Beyond”, 2002 Symposium on VLSI Circuits Digest of Technical Papers, pgs. 312-315, 2002
14. Siva Narendra et al., “Forward Body Bias for Microprocessors in 130nm Technology Generation and Beyond”, IEEE J. of SSC, Vol. 38, No. 5, pgs. 696-701, May 2003
15. James W. Tschanz, Siva G. Narendra, Raj Nair, Dimitri A. Antoniadis, and Vivek De, “Effectiveness of Adaptive Supply Voltage and Body Bias for Reducing Impacts of Parameter Variations in Low Power and High Performance Microprocessors”, IEEE Journal of Solid-State Circuits, Vol. 38, No. 5, May 2003, pgs. 826 – 829
16. J. H. Huang et al., “A Robust Physical and Predictive Model for Deep-Submicrometer MOS Circuit Simulation”, IEEE 1993 Custom Integrated Circuits Conference, 1993
Backup Slides
Biasing Simulations (Using Spice)
NAND Circuit NOR Circuit
Biasing Simulation Results - NAND
Sweeping Vdd (Vbp = 1.8V, Vbn = 0.0V)
[Figure: NAND Gate Static Power (Sweep Vdd, Normal Biasing). Power (watts, log scale) vs. Vdd (volts) at 1.4, 1.6, 1.8, 2, 2.2. Data labels as extracted: 1.51E-05, 4.47E-08, 1.17E-11, 1.08E-11, 3.15E-11.]
[Figure: NAND Gate Delay Time (Sweep Vdd, Normal Biasing). Delay time (seconds) vs. Vdd (volts) at 1.4, 1.6, 1.8, 2, 2.2. Data labels as extracted: 1.94E-10, 1.51E-10, 1.30E-10, 1.15E-10, 1.02E-10.]
Biasing Simulation Results - NAND
Sweeping Vdd (Vbp = 1.8V, Vbn = 0.0V)
[Figure: NAND Gate Threshold Voltage (Sweep Vdd, Normal Biasing). Threshold voltage (volts) vs. Vdd (volts) at 1.4, 1.6, 1.8, 2, 2.2. Data labels as extracted: 1.15E+00, 9.33E-01, 7.47E-01, 5.32E-01, 2.88E-01.]
Biasing Simulation Results - NAND
Sweeping Vbn (Vdd = 1.8V, Vbp = 1.8V)
[Figure: NAND Gate Static Power (Sweep Vbn). Power (watts, log scale) vs. Vbn (volts) at -0.4, -0.2, 0, 0.2, 0.4, with Reverse Bias and Forward Bias regions marked. Data labels as extracted: 3.15E-11, 3.21E-11, 3.16E-11, 3.18E-11, 8.36E-10.]
[Figure: NAND Gate Delay Time (Sweep Vbn). Delay time (seconds) vs. Vbn (volts) at -0.4, -0.2, 0, 0.2, 0.4, with Reverse Bias and Forward Bias regions marked. Data labels as extracted: 1.18E-10, 1.21E-10, 1.30E-10, 1.35E-10, 1.40E-10.]
Biasing Simulation Results - NAND
Sweeping Vbn (Vdd = 1.8V, Vbp = 1.8V)
[Figure: NAND Gate Threshold Voltage (Sweep Vbn). Threshold voltage (volts) vs. Vbn (volts) at -0.4, -0.2, 0, 0.2, 0.4, with Reverse Bias and Forward Bias regions marked. Data labels as extracted: 7.76E-01, 7.63E-01, 7.47E-01, 7.31E-01, 7.18E-01.]
Biasing Simulation Results - NAND
Sweeping Vbp (Vdd = 1.8V, Vbn = 0.0V)
[Figure: NAND Gate Static Power (Sweep Vbp). Power (watts, log scale) vs. Vbp (volts) at 1.4, 1.6, 1.8, 2, 2.2, with Reverse Bias and Forward Bias regions marked. Data labels as extracted: 1.69E-11, 1.69E-11, 1.92E-10, 1.34E-07, 3.15E-11.]
[Figure: NAND Gate Delay Time (Sweep Vbp). Delay time (seconds) vs. Vbp (volts) at 1.4, 1.6, 1.8, Forward Bias region marked. Data labels as extracted: 1.30E-10, 1.27E-10, 1.24E-10.]
Biasing Simulation Results - NAND
Sweeping Vbp
Vdd = 1.8V, Vbn = 0.0V
[Figure: NAND Gate Threshold Voltage (Sweep Vbp). x-axis: Vbp (volts), 1.4 to 2.2; y-axis: Threshold Voltage (volts). Regions marked: Reverse Bias, Forward Bias. Data labels as extracted: 6.88E-01, 7.47E-01, 7.47E-01, 7.87E-01, 8.18E-01]
Biasing Simulation Results - NOR
Sweeping Vdd
Vbp = 1.8V, Vbn = 0.0V
[Figure: NOR Gate Delay Time (Sweep Vdd, Normal Biasing). x-axis: Vdd (volts), 1.4 to 2.2; y-axis: Delay Time (seconds). Data labels as extracted: 2.38E-10, 1.83E-10, 1.24E-10, 1.23E-10, 1.11E-10]
[Figure: NOR Gate Static Power (Sweep Vdd, Normal Biasing). x-axis: Vdd (volts), 1.4 to 2.2; y-axis: Static Power (watts), log scale. Data labels as extracted: 2.96E-06, 7.46E-09, 5.58E-12, 5.98E-12, 7.29E-12]
Biasing Simulation Results - NOR
Sweeping Vdd
Vbp = 1.8V, Vbn = 0.0V
[Figure: NOR Gate Threshold Voltage (Sweep Vdd, Normal Biasing). x-axis: Vdd (volts), 1.4 to 2.2; y-axis: Threshold Voltage (volts). Data labels as extracted: 5.68E-01, 4.73E-01, 3.95E-01, 1.49E-01, 0.00E+00]
Biasing Simulation Results - NOR
Sweeping Vbn
Vdd = 1.8V, Vbp = 1.8V
[Figure: NOR Gate Static Power (Sweep Vbn). x-axis: Vbn (volts), -0.4 to 0.4; y-axis: Static Power (watts), log scale. Data labels as extracted: 7.59E-12, 7.48E-12, 7.32E-12, 7.66E-12, 8.12E-10]
[Figure: NOR Gate Delay Time (Sweep Vbn). x-axis: Vbn (volts); y-axis: Delay Time (seconds). Regions marked: Reverse Bias, Forward Bias. Data labels as extracted: 1.40E-10, 1.36E-10, 1.26E-10]
Biasing Simulation Results - NOR
Sweeping Vbn
Vdd = 1.8V, Vbp = 1.8V
[Figure: NOR Gate Threshold Voltage (Sweep Vbn). x-axis: Vbn (volts), -0.4 to 0.4; y-axis: Threshold Voltage (volts). Regions marked: Reverse Bias, Forward Bias. Data labels as extracted: 4.53E-01, 4.32E-01, 3.89E-01, 3.35E-01, 2.89E-01]
Biasing Simulation Results - NOR
Sweeping Vbp
Vdd = 1.8V, Vbn = 0.0V
[Figure: NOR Gate Static Power (Sweep Vbp). x-axis: Vbp (volts), 1.4 to 2.2; y-axis: Static Power (watts), log scale. Regions marked: Reverse Bias, Forward Bias. Data labels as extracted: 8.58E-12, 6.60E-12, 5.57E-11, 6.69E-08, 7.31E-12]
[Figure: NOR Gate Delay Time (Sweep Vbp). x-axis: Vbp (volts), 1.4 to 2.2; y-axis: Delay Time (seconds). Regions marked: Reverse Bias, Forward Bias. Data labels as extracted: 1.10E-10, 1.32E-10, 1.24E-10, 1.51E-10, 1.56E-10]
Biasing Simulation Results - NOR
Sweeping Vbp
Vdd = 1.8V, Vbn = 0.0V
[Figure: NOR Gate Threshold Voltage (Sweep Vbp). x-axis ticks: 1.4 to 2.2; y-axis ticks: 0.00E+00 to 5.00E-01. Regions marked: Reverse Bias, Forward Bias. Data labels as extracted: 2.85E-01, 3.09E-01, 3.94E-01, 3.82E-01, 4.73E-01]
Eldo Spice NAND Gate Source Code
* Filename: biasingNAND2.cir
* Programmers: David M. Sendek & David Balhiser
* Date: 2 September 2003
* Course: EE660 Advanced Topics - VLSI Design
* Professor: Dr. Tom Chen
* BACKGROUND:
* This is the netlist file that was originally created by
* Mentor Graphics Design Architect (Eldo or Spice model of
* the NAND CMOS circuit). This file has been modified
* (removal of extraneous comments & commands) as well as
* incorporating Eldo Spice commands to extract various
* circuit parameters. The width of the PFET & NFET were
* obtained from David Balhiser.
* DESCRIPTION:
* This circuit model is a basic NAND CMOS circuit. The
* purpose of this exercise is to observe the effects of
* varying Vdd, Vbn and Vbp. Vdd is the supply voltage
* to the CMOS NAND circuit. Vbn is the bias voltage to
* the substrate of the NFETs. Vbp is the bias voltage
* to the substrate of the PFETs. If Vbp < Vdd OR
* Vbn > Vss (ground), then this is a forward biased
* circuit. If Vbp > Vdd OR Vbn < Vss (ground), then
* this is a reverse biased circuit.
* The intent is to observe the effects upon threshold
* voltage (Vth), risetime & falltime and static power
* consumption by modifying Vdd, forward & reverse
* biasing the substrate of the CMOS circuit.
* The library used in EE571 & this example
.LIB /class/EE571/models/log018.eldo53 TT
* Define parameters, i.e., variables with an initial value
* that can be later "swept" to change the values
.param swVdd = 1.8v
.param swVbn = 0.0v
.param swVbp = 1.8v
* Define Vdd and Ground
v0 GND 0 DC 0
vdd VDD GND DC swVdd
v3 swVbn 0 DC swVbn
v4 swVbp GND DC swVbp
* Define input pulse
*v1 in1 GND pulse(0 1.8 20n 100p 100p 20n 40n)
*v2 in2 GND pulse(0 1.8 10n 100p 100p 10n 20n)
v1 in1 GND pwl(0n 1.8 5n 1.8 5.5n 0 12n 0 12.5n 1.8 18n 1.8)
v2 in2 GND pwl(0n 1.8 5n 1.8 5.5n 0 12n 0 12.5n 1.8 18n 1.8)
* Define the load capacitance of 5 femto Farads
C_I$416 out GND 5F
* N-FETs & P-FETs for a NAND circuit.
* Note: The fields of the circuit showing connections
* are as follows (for M_p1):
* label = M_p1
* drain = vdd
* gate = in2
* source = out
* body = swVbp
* This model uses 1.8v MOS device models
M_n2 N$1 in2 GND swVbn nch.6 W=1.6U L= 0.18U
M_n1 out in1 N$1 swVbn nch.6 W=1.6U L= 0.18U
M_p2 vdd in1 out swVbp pch.6 W=2.44U L= 0.18U
M_p1 vdd in2 out swVbp pch.6 W=2.44U L= 0.18U
Eldo Spice NAND Gate Source Code(cont.)
* Step through various Vdd & biasing voltages to observe
* the results. Note: Only 1 step command can be active
* at a time. The others are commented out
*.step param swVdd 1.4v 2.2v 0.2v
*.step param swVbn -0.4v 0.4v 0.2v
.step param swVbp 1.4v 2.2v 0.2v
* Specify the transient analysis
.tran 0.01ns 40ns
* Note: If the .plot command is commented out the results of the simulation
* output with the extract commands can be observed in the biasingNAND.chi
* file.
.plot v(in1) v(in2)
.plot v(out)
* Extract the rise and fall delays at 50% of Vdd
* Returns the x-axis value of v(out) at the crossing of y-axis value of 0.9
* at the first (1) or second (2) occurrence.
.extract label = risetime (xthres(v(out), 0.9, 1) - xthres(v(in2), 0.9, 1))
.extract label = falltime (xthres(v(out), 0.9, 2) - xthres(v(in2), 0.9, 2))
.extract label = Delay ((xthres(v(out),0.9,1)-xthres(v(in2),0.9,1))+(xthres(v(out),0.9,2)-xthres(v(in2),0.9,2)))
* Extract the threshold voltage, i.e., Vdd/2 = 0.9V
* Returns the y-axis value of v(in1) at the x-axis value xthres(v(out),0.9,1).
* xthres(v(out),0.9,1) returns the x-axis value of v(out) at the crossing of
* y-axis value of 0.9 at the first occurrence (1).
.extract label = V1_threshold (yval(v(in1),xthres(v(out), 0.9, 1)))
.extract label = V2_threshold (yval(v(in2),xthres(v(out), 0.9, 1)))
* Note: As part of the Eldo Spice simulation, static power is displayed
* during the simulation run. This is simply copied down
.END
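The forward/reverse bias rule stated in the netlist's DESCRIPTION comment can be written as a small check. The Python sketch below is not part of the original netlist; the function name `classify_body_bias` and the comparison tolerance are assumptions made for illustration. It labels each step of the active swVbp sweep.

```python
import math

def classify_body_bias(vdd, vbn, vbp, vss=0.0, tol=1e-9):
    """Classify NFET and PFET body bias following the DESCRIPTION comment."""
    def compare(value, reference):
        # Returns 0 when equal within tolerance, else +1 or -1.
        if math.isclose(value, reference, abs_tol=tol):
            return 0
        return 1 if value > reference else -1

    # NFET: Vbn > Vss is forward bias, Vbn < Vss is reverse bias.
    nfet = {1: "forward", -1: "reverse", 0: "normal"}[compare(vbn, vss)]
    # PFET: Vbp < Vdd is forward bias, Vbp > Vdd is reverse bias.
    pfet = {-1: "forward", 1: "reverse", 0: "normal"}[compare(vbp, vdd)]
    return nfet, pfet

# Values matching the active ".step param swVbp 1.4v 2.2v 0.2v" sweep.
for vbp in (1.4, 1.6, 1.8, 2.0, 2.2):
    print(vbp, classify_body_bias(vdd=1.8, vbn=0.0, vbp=vbp))
```

The sweep values are listed explicitly rather than generated by repeated addition, so that the 1.8 V step compares equal to Vdd.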
Eldo Spice NOR Gate Source Code
* Filename: biasingNOR.cir
* Programmers: David M. Sendek & David Balhiser
* Date: 2 September 2003
* Course: EE660 Advanced Topics - VLSI Design
* Professor: Dr. Tom Chen
* BACKGROUND:
* This is the netlist file that was originally created by
* Mentor Graphics Design Architect (Eldo or Spice model of
* the NOR CMOS circuit). This file has been modified
* (removal of extraneous comments & commands) as well as
* incorporating Eldo Spice commands to extract various
* circuit parameters. The width of the PFET & NFET were
* obtained from David Balhiser.
* DESCRIPTION:
* This circuit model is a basic NOR CMOS circuit. The
* purpose of this exercise is to observe the effects of
* varying Vdd, Vbn and Vbp. Vdd is the supply voltage
* to the CMOS NOR circuit. Vbn is the bias voltage to
* the substrate of the NFETs. Vbp is the bias voltage
* to the substrate of the PFETs. If Vbp < Vdd OR
* Vbn > Vss (ground), then this is a forward biased
* circuit. If Vbp > Vdd OR Vbn < Vss (ground), then
* this is a reverse biased circuit.
* The intent is to observe the effects upon threshold
* voltage (Vth), risetime & falltime and static power
* consumption by modifying Vdd, forward & reverse
* biasing the substrate of the CMOS circuit.
* The library used in EE571 & this example
.LIB /class/EE571/models/log018.eldo53 TT
* Define parameters, i.e., variables with an initial value
* that can be later "swept" to change the values
.param swVdd = 1.8v
.param swVbn = 0.0v
.param swVbp = 1.8v
* Define Vdd and Ground
v0 GND 0 DC 0
vdd VDD GND DC swVdd
v3 swVbn 0 DC swVbn
v4 swVbp GND DC swVbp
* Define input pulse
*v1 in1 GND pulse(0 1.8 20n 100p 100p 20n 40n)
*v2 in2 GND pulse(0 1.8 10n 100p 100p 10n 20n)
v1 in1 GND pwl(0n 1.8 5n 1.8 5.5n 0 12n 0 12.5n 1.8 18n 1.8)
v2 in2 GND pwl(0n 1.8 5n 1.8 5.5n 0 12n 0 12.5n 1.8 18n 1.8)
* Define the load capacitance of 5 femto Farads
C_I$416 out GND 5F
* N-FETs & P-FETs for a NOR circuit.
* Note: The fields of the circuit showing connections
* are as follows (for M_p1):
* label = M_p1
* drain = vdd
* gate = in2
* source = N$1
* body = swVbp
* This model uses 1.8v MOS device models
M_n2 out in2 GND swVbn nch.6 W=1.6U L= 0.18U
M_n1 out in1 GND swVbn nch.6 W=1.6U L= 0.18U
M_p2 N$1 in1 out swVbp pch.6 W=2.44U L= 0.18U
M_p1 vdd in2 N$1 swVbp pch.6 W=2.44U L= 0.18U
Eldo Spice NOR Gate Source Code(cont.)
* Step through various Vdd & biasing voltages to observe
* the results. Note: Only 1 step command can be active
* at a time. The others are commented out
*.step param swVdd 1.4v 2.2v 0.2v
*.step param swVbn -0.4v 0.4v 0.2v
.step param swVbp 1.4v 2.2v 0.2v
* Specify the transient analysis
.tran 0.001ns 40ns
* Note: If the .plot command is commented out the results of the simulation
* output with the extract commands can be observed in the biasingNOR.chi
* file.
.plot v(in1) v(in2)
.plot v(out)
* Extract the rise and fall delays at 50% of Vdd
* Returns the x-axis value of v(out) at the crossing of y-axis value of 0.9
* at the first (1) or second (2) occurrence.
.extract label = risetime (xthres(v(out), 0.9, 1) - xthres(v(in2), 0.9, 1))
.extract label = falltime (xthres(v(out), 0.9, 2) - xthres(v(in2), 0.9, 2))
.extract label = Delay ((xthres(v(out),0.9,1)-xthres(v(in2),0.9,1))+(xthres(v(out),0.9,2)-xthres(v(in2),0.9,2)))
* Extract the threshold voltage, i.e., Vdd/2 = 0.9V
* Returns the y-axis value of v(in1) at the x-axis value xthres(v(out),0.9,1).
* xthres(v(out),0.9,1) returns the x-axis value of v(out) at the crossing of
* y-axis value of 0.9 at the first occurrence (1).
.extract label = V1_threshold (yval(v(in1),xthres(v(out), 0.9, 1)))
.extract label = V2_threshold (yval(v(in2),xthres(v(out), 0.9, 1)))
* Note: As part of the Eldo Spice simulation, static power is displayed
* during the simulation run. This is simply copied down
.END
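To make the `.extract` delay measurements concrete, here is a minimal Python sketch of a threshold-crossing search with linear interpolation. It is a stand-in written for illustration, not Eldo's implementation: the Python function `xthres` only mimics the behavior described in the netlist comments, and the output waveform `v_out` is a hypothetical idealized response (times in ns), not simulation data.

```python
import math

def xthres(times, values, level, occurrence=1):
    """Return the time of the n-th crossing of `level` (linear interpolation)."""
    count = 0
    for i in range(1, len(times)):
        v0, v1 = values[i - 1], values[i]
        # A crossing is a segment that starts strictly on one side of the
        # level and ends on or beyond it.
        if (v0 < level <= v1) or (v0 > level >= v1):
            count += 1
            if count == occurrence:
                t0, t1 = times[i - 1], times[i]
                return t0 + (level - v0) * (t1 - t0) / (v1 - v0)
    raise ValueError("waveform does not cross the level that many times")

# Input stimulus taken from the pwl() source in the netlist (ns, volts).
t_in = [0.0, 5.0, 5.5, 12.0, 12.5, 18.0]
v_in = [1.8, 1.8, 0.0, 0.0, 1.8, 1.8]
# Hypothetical output waveform, assumed for this example only.
t_out = [0.0, 5.3, 5.5, 12.3, 12.5, 18.0]
v_out = [0.0, 0.0, 1.8, 1.8, 0.0, 0.0]

risetime = xthres(t_out, v_out, 0.9, 1) - xthres(t_in, v_in, 0.9, 1)
falltime = xthres(t_out, v_out, 0.9, 2) - xthres(t_in, v_in, 0.9, 2)
delay = risetime + falltime
print(risetime, falltime, delay)
```

As in the netlist, the 0.9 V level is fixed, so it equals Vdd/2 only when Vdd is 1.8 V.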
Adaptive Body Bias (ABB) Techniques
Different Power Modes
LCB = Leakage Current Monitors
SSB = Self-Substrate Bias Circuit
Vbb = Substrate Bias Voltage
Operation: Dynamically varies Vth with substrate bias feedback control circuits to reduce active and standby power dissipation.
Variable Vt Scheme
SSB Circuit
[9]
Adaptive Body Bias (ABB) Techniques
Different Power Modes (cont.)
[9]
Frequency Fluctuation Control (using RBB) Operating Current Control
Max Forward Bias
RBB FBB
NFET Body Bias Voltage (Vbb) Control
Vth 0.1v Variation
Substrate Bias Control
Adaptive Body Bias (ABB) Techniques
Addressing Increased Performance (cont.)
Leakage Current Monitor
Substrate Charge Injector
- Drives substrate from Vstandby to GND
[9]
Adaptive Body Bias (ABB) Techniques
Addressing Power Consumption
VBC = Voltage Bias Controller
VBCG = VBC Generation
VBCP = VBC PFET
VBCN = VBC NFET
VBCI = VBC
vsub = substrate voltage = vdd - vwell
vbn = NFET Bias Voltage (0v -> -1.5v)
vbp = PFET Bias Voltage (1.8v -> 3.3v)
cbn = Control NFET Bias (1.8v -> -1.5v)
cbp = Control PFET Bias (0v -> 3.3v)
cbpr = Feedback Signal from PFETs (furthest corner)
cbnr = Feedback Signal from NFETs (furthest corner)
VBCR = Voltage Bias Controller Return
Vbbenb = Vbb Enable
Vbbenbr = Vbb Enable Mode Transition Terminate
Modes of Operation:
1. Standby Mode
2. Data Retention Mode (for battery backup)
Operation: Bias the PFET & NFET substrates to a single level to reduce leakage current. Feedback (furthest corner) is used to reduce switching noise. [8]
Adaptive Body Bias (ABB) Techniques
Improving Manufacturing Yields
[10, 12]
Operation: Uses forward & reverse body biasing to attain higher operating frequency and low power simultaneously. Technique is referred to as Speed Adaptive-Threshold Voltage (SA-Vth). Bias is controlled so the delay remains constant.
SA-Vth Scheme
Clock Pulse Modulator
- Produces 3 clks at different phases with ¼ duty cycle
Adaptive Body Bias (ABB) Techniques
Improving Manufacturing Yields (cont.)
[10, 12]
Delay Comparator
- Compares delayed signals A & B with ↓clk0
Up/Down Shift Register as a decoder
- Level is either incremented or decremented, based upon delay line variation
- Vbp & Vbn are fed to the delay line circuit so the “delay” of the delay line becomes a pre-determined time
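The feedback behavior described for the delay comparator and up/down shift register can be sketched as a simple control loop. This Python sketch is an illustration under stated assumptions, not the circuit from [10, 12]: the delay-line model `delay_line_ps`, the 2 ps step per level, the target, the tolerance, and the convention that a higher level means a faster delay line are all hypothetical.

```python
def update_level(level, delay_ps, target_ps, tol_ps, max_level):
    """One comparator decision: step the bias level up, down, or hold."""
    if delay_ps > target_ps + tol_ps:
        return min(level + 1, max_level)  # too slow: step toward faster bias
    if delay_ps < target_ps - tol_ps:
        return max(level - 1, 0)          # too fast: step toward slower bias
    return level                          # within tolerance: hold

def delay_line_ps(level):
    """Hypothetical delay line: each bias level removes 2 ps of delay."""
    return 130 - 2 * level

level = 0
history = []
for _ in range(10):
    level = update_level(level, delay_line_ps(level),
                         target_ps=120, tol_ps=1, max_level=15)
    history.append(level)
print(history)
```

With these assumed numbers the level climbs one step per cycle and then holds once the delay line reaches the target, which is the "delay remains constant" behavior the slide describes.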
Substrate Bias (Vbb) Generators
Review of leakage current reduction techniques
Sleep transistor configuration
Initial design of 6T SRAM Cell
Chinmay Gupte
Ashutosh Sharma
Effectiveness of Sleep Transistors in Leakage Current Reduction
Presentation 1
16th September, 2003
Leakage Problem
• Subthreshold leakage current Ileakage (or “off-state” current Ioff) is the small amount of drain current when VGS < VT.
• Subthreshold leakage current varies exponentially with threshold voltage VT.
• This current is influenced by VT, channel physical dimensions, channel/surface doping profile, drain/source junction depth, gate oxide thickness, and VDD.
• Scaling and power reduction trends in future technologies will cause subthreshold leakage currents to become an increasingly large component of power dissipation.
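The exponential dependence of subthreshold leakage on VT can be illustrated with the common first-order model I_off = I0 * 10^(-VT / S), where S is the subthreshold swing in volts per decade. The sketch below is illustrative only; the values of `i0` and `swing` are assumptions and are not taken from the presentation.

```python
import math

def subthreshold_leakage(vt, i0=1e-6, swing=0.1):
    """First-order off-state current at VGS = 0.

    vt    : threshold voltage in volts
    i0    : assumed current at VGS = VT, in amperes (hypothetical)
    swing : assumed subthreshold swing in volts per decade (hypothetical)
    """
    return i0 * 10 ** (-vt / swing)

for vt in (0.2, 0.3, 0.4, 0.5):
    print(f"VT = {vt:.1f} V -> Ioff = {subthreshold_leakage(vt):.3e} A")
```

In this model, every increase of VT by one swing S cuts the leakage by a factor of ten, which is why small threshold shifts matter so much.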
Leakage Mechanisms
• PN Reverse Bias Current (I1)
• Weak Inversion (I2)
• Drain Induced Barrier Lowering (I3)
• Gate Induced Drain Leakage (I4)
• Punchthrough (I5)
• Narrow Width Effect (I6)
• Gate Oxide Tunneling (I7)
• Hot Carrier Injection (I8)
Leakage Current Mechanisms of deep-submicron transistors
Leakage Current Reduction Techniques
• Temperature
– With lowered temperature:
- Sensitivity factor St decreases
- Threshold voltage VT increases
- Both effects reduce Ioff
– Rise in VT with cooler temperature is becoming insignificant, and temperature as a technique for leakage reduction may lose its efficiency for oxides that are getting thinner with SIA Roadmap achievements.
ID vs. VG showing temperature sensitivity of Ioff
Leakage Current Reduction Techniques
• Substrate-Well Biasing
– Biasing the source-well voltage to negative voltages for n-channel and positive voltages for p-channel transistor increases VT.
– As VT increases, Ioff decreases.
– Amount of backbiasing is limited by GIDL and the increased field across thin oxide.
– Indiscriminate backbiasing of the wells may lead to higher Ioff if GIDL is not understood.
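The increase of VT under reverse body bias is commonly described by the textbook body-effect relation VT = VT0 + γ(√(2φF + VSB) − √(2φF)). The Python sketch below evaluates it; the parameter values (`vt0`, `gamma`, `two_phi_f`) are assumptions chosen for illustration and do not come from the 0.35um process mentioned on the slide. The model also ignores GIDL, which the slide notes as the practical limit on backbiasing.

```python
import math

def threshold_voltage(vsb, vt0=0.45, gamma=0.4, two_phi_f=0.8):
    """n-channel threshold voltage versus source-to-body voltage VSB.

    vsb > 0 is reverse body bias, vsb < 0 is forward body bias.
    vt0, gamma and two_phi_f are hypothetical illustration values.
    """
    if two_phi_f + vsb < 0:
        raise ValueError("forward bias too large for this simple model")
    return vt0 + gamma * (math.sqrt(two_phi_f + vsb) - math.sqrt(two_phi_f))

for vsb in (-0.4, 0.0, 0.4, 1.0):
    print(f"VSB = {vsb:+.1f} V -> VT = {threshold_voltage(vsb):.3f} V")
```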
n-channel log(ID) vs VG for 6 substrate biases on 0.35um logic process technology (VD = 2.7 V)
Leakage Current Reduction Techniques
• Input Vector Control
– Due to the transistor stacking effect, the leakage of a circuit depends on its input combination.
– A minimum leakage pattern, which maximizes the number of “off” transistors in all stacks across the circuit, is used to drive the circuit while in standby mode.
– Requires addition of multiplexers, resulting in additional leakage and delay.
• Stack Effect
– Forcing series transistors to be off simultaneously.
– Exerts reverse bias between gate and source of the transistor above the “off” transistor.
– VGS decreases, so leakage decreases.
– Leakage reduction achieved with minimal overhead in area, power and process technology.
Stack Effect
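Finding a minimum leakage input pattern can be sketched as an exhaustive search over input vectors. The example below uses a toy circuit of two NAND2 gates, g1 = NAND(a, b) and g2 = NAND(g1, c). The per-state leakage table is hypothetical (relative units); it only encodes the stack-effect idea that a gate with both series nMOS transistors off leaks least.

```python
from itertools import product

# Hypothetical relative leakage of a NAND2 gate for each input state.
# (0, 0) is lowest because both stacked nMOS transistors are off.
NAND2_LEAKAGE = {(0, 0): 1.0, (0, 1): 3.0, (1, 0): 2.5, (1, 1): 6.0}

def nand(a, b):
    return 0 if (a and b) else 1

def circuit_leakage(a, b, c):
    """Total leakage of g1 = NAND(a, b) feeding g2 = NAND(g1, c)."""
    g1 = nand(a, b)
    return NAND2_LEAKAGE[(a, b)] + NAND2_LEAKAGE[(g1, c)]

def minimum_leakage_vector():
    """Exhaustively search all input vectors for the lowest total leakage."""
    return min(product((0, 1), repeat=3), key=lambda v: circuit_leakage(*v))

best = minimum_leakage_vector()
print(best, circuit_leakage(*best))
```

Exhaustive search is only practical for small input counts, since the number of vectors doubles with each added input.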
Leakage Current Reduction Techniques
• Gating the Supply Voltage
– Basic idea is to shut down the power supply so the idle units do not consume any power.
– Use high threshold transistors called sleep transistors.
– Use MTCMOS technology
Multi-Threshold CMOS [MTCMOS] Technology
• High Vt Devices used to reduce Leakage Current
• Low Vt Devices used to enhance Performance (Delay)
• CMOS Circuit with SLEEP TRANSISTOR (Dual VT Technology)
– Effective for Burst mode type integrated circuits [Active & Standby Modes]
• Examples: Cell Phone, Pager, Processor Cache etc.
• Active Mode: Sleep Transistor ON
• Standby Mode: Sleep Transistor OFF
MTCMOS Circuit Structure
Multi-Threshold CMOS [MTCMOS] Technology
• pMOS and nMOS sleep transistors are generally not used together.
• Only the nMOS sleep transistor is used.
– nMOS area is smaller than pMOS.
– nMOS ‘ON’ resistance is small.
– As a result, only the High-to-Low swing is affected.
• This structure reduces Leakage Current by several orders of magnitude.
– Total effective W of the original CMOS is reduced to the W of a single “OFF” nMOS (provided the width is smaller than the original pull down width)
– High Vt nMOS results in an exponential reduction in leakage current.
Multi-Threshold CMOS [MTCMOS] Circuits
• Sleep Transistor Sizing Problems (Area vs. Delay Performance)
– Large Size
• ‘ON’ Resistance decreases, so High-to-Low Delay decreases [GOOD]
• Overall Area [BAD]
• Dynamic Power Consumption [BAD]
– Small Size
• ‘ON’ Resistance increases, so Leakage Current decreases [GOOD]
• Overall Area [GOOD]
• Dynamic Power Consumption [GOOD]
• High-to-Low Delay [BAD]
– Proper sizing of sleep transistors is a key issue in power gating designs
Multi-Threshold CMOS [MTCMOS] Technology
• Other Problems
– Vx Drop Effects:
• Gate Drive reduces from Vcc to Vcc-Vx
• Vt of Pull Down nMOS increases due to Body Effect.
– Backward Current Effects:
• Charges output capacitance from 0 to Vx
• Noise Margin for next logic reduces
MTCMOS block illustrating equivalent resistance, capacitance, and reverse conduction effects.
Multi-Threshold CMOS [MTCMOS] Technology
• Sleep transistor sizing depends on the circuit discharge current pattern.
– This in turn depends on the input vector.
• Some discharge patterns may affect the timing limitations on the critical path.
• When analyzing MTCMOS circuits, one cannot simply examine the critical paths in the circuit, but must also consider all other accompanying gates that are switching.
• For optimal sizing, one would need to exhaustively simulate the entire circuit for all possible input vectors and all sleep transistor sizes such that delay on the critical path is within tolerable limits.
• J. Kao et al. have proposed MTCMOS Hierarchical Sizing Based on Mutually Exclusive Discharge patterns.
– In this approach, logic blocks that do not switch at the same time share a common sleep transistor.
– It is a systematic approach and gives an approximate size of the sleep transistor.
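The benefit of sharing a sleep transistor among blocks with mutually exclusive discharge patterns can be shown with a first-order sizing model. The sketch below is not the algorithm of J. Kao et al. It assumes the virtual-ground drop is Vx = I * R_on with R_on inversely proportional to width, and the current profiles, the `r_on_ohm_um` value, and the Vx budget are hypothetical.

```python
import math

def sleep_transistor_width_um(peak_current_a, r_on_ohm_um, vx_max_v):
    """Smallest width keeping Vx = I * (r_on_ohm_um / W) at or below vx_max_v."""
    return peak_current_a * r_on_ohm_um / vx_max_v

def peak_simultaneous_current(profiles):
    """Peak over time of the summed discharge current of all blocks."""
    return max(sum(step) for step in zip(*profiles))

# Hypothetical discharge currents (amperes) of three blocks over three time slots.
profiles = [
    [1.0e-3, 0.0, 0.0],
    [0.0, 0.8e-3, 0.0],
    [0.0, 0.0, 0.5e-3],
]
R_ON_OHM_UM = 2000.0  # assumed on-resistance times width, ohm * um
VX_MAX_V = 0.05       # assumed allowed virtual-ground bounce

shared = sleep_transistor_width_um(
    peak_simultaneous_current(profiles), R_ON_OHM_UM, VX_MAX_V)
separate = sum(
    sleep_transistor_width_um(max(p), R_ON_OHM_UM, VX_MAX_V) for p in profiles)
print(shared, separate)
```

Because the three blocks never discharge in the same slot, the shared device is sized for the largest single peak rather than for the sum of the peaks.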
Sleep Transistor Circuit Topologies
Module/Circuit-level Sleep transistor configuration
• Consists of only one sleep transistor for the entire circuit.
• Results in increased virtual GND interconnect resistance.
• Large Sleep transistor is needed to maintain reasonable performance.
• Therefore it is not a good solution.
Sleep Transistor Circuit Topologies
Cluster-based Sleep transistor configuration
• Consists of one sleep transistor per cluster.
• Sizing of sleep transistor depends on Maximum Simultaneous Switching Currents (MSSC) per Cluster.
• In order to have a small sleep transistor, logic in a cluster is chosen such that this MSSC is minimum.
• This results in extra constraints on placement and may conflict with timing-driven placement.
• Also, Total Sleep Transistor Area will be quite large.
• Still not a good solution.
Sleep Transistor Circuit Topologies
Distributed Sleep Transistor Network (DSTN) Configuration
• Combination of Circuit-level & Cluster-based Sleep transistor configuration.
• Virtual GND lines of all the clusters are connected together.
• In DSTN, the current discharges through both its own and the neighboring sleep transistors, making virtual ground better strapped.
• Results show that DSTN is better than Circuit-level & Cluster-based approaches.
6T SRAM CELL
Read | Pass Transistors | Inverter 1 | Inverter 2 | Result
     | M3  M4           | M1  M5     | M2  M6     |
1    | 1   1            | 0   1      | 1   0      | VC > VC_bar
0    | 1   1            | 1   0      | 0   1      | VC < VC_bar
The difference in voltages is small and is detected by the sense amplifiers in data-read circuitry.
Operation
• RS = 0, M3=M4=0, Vc and Vc_bar reach voltage level Vdd - Vtp.
• When RS=1, M3=M4=1, read/write is performed.
• Write “0”: Vc is forced low by write circuitry and V1=0 due to M3 and V2=1 due to M6.
• Write “1”: Vc_bar is forced low by write circuitry and V1=1 due to M5 and V2=0 due to M4.
Circuit Topology of CMOS SRAM cell
6T SRAM CELL
Schematic of the 6T SRAM Cell in TSMC 0.18 um Technology
Write “1” Read “1” Write “0” Read “0”
Read Select (RS)
Bit Line C(Vc)
Bit Line C_Bar (Vc_bar)
Write Line “0”
Write Line “1”
6T SRAM CELL
Functionality
EE660 – Advanced Topics in VLSI Design
Project 5 – Presentation 1
Accurate RC Extraction by Using a Neural Network to Compensate for OPC Effects
Mahir Aydin
John Pratt
Presentation Outline
• An in-depth look at OPC (Optical Proximity Correction) and its benefits to deep submicron VLSI fabrication
• Project overview
• Review of test layout shapes
• Mentor’s Calibre OPC tool overview
Motivation for OPC
• Feature size decreases faster than the lithographic wavelength
– Current lithography deep-UV radiation wavelength = 248 nm
– Metal-1 minimum wire width in TSMC 0.18um process = 230 nm
• Resulting imperfect wiring has significant impact on performance as feature size decreases
Imperfect Wiring
Optical Proximity Correction
• Pre-compensating for sub-wavelength effects by altering the mask before fabrication
• Three major OPC techniques
– Line-end treatment
– Line biasing
– Scattering bars
Line-End Treatment
• Hammerheads and Serifs
Line Biasing
• Line width varies with pitch
• Line biasing adjusts the width of the wires to compensate for this variation
• Works well for dense lines, but not as effective on isolated lines
• Scattering bars are used for controlling the width of isolated lines
Line Biasing
Scattering Bars
• Small, sub-resolution features placed alongside isolated lines to control their width during exposure
Rule Based OPC
• A collection of techniques generated for a certain process.
• Rules vary by feature size as well as exposure system
• Automatically applied to a design, similar to DRC
Model Based OPC
• Features are fragmented into small pieces, and OPC is applied to individual fragments, taking neighborhood effects into account
• Analytic or empirical model used for correction; the latter can be based on actual SEM photographs
• More refined than rule based OPC, but much more computationally intensive
Rule vs. Model Based OPC
Project Overview
• RC extraction
– Input: Layout before OPC
– Output: The RC values of different wires (poly, metal1, etc.) after OPC is applied and the mask is transferred to wafer
– Neural network will be used to perform pattern recognition and prediction
Sample of Layout Shapes
Calibre Workbench
Light intensity vs. distance
Light intensity image
OPC Effects
Original Layout
OPC corrected Layout
Print image on original layout
Print image on corrected layout
More OPC effects
Layout # 1 (Original Layout)
Layout # 2 (1 iteration)
Layout # 3 (6 iterations)
Layout # 4 (6 iterations on Layout #2)
Project 6: A Better DRC
-Review of OPC and PSM
-Overview of Mentor’s Calibre OPC and PSM Tools
-Test-bench Layout Shapes
Review of OPC and PSM
The Problem
• Wavelength of light is the limiting factor in optical lithography
Year      | Wavelength | Description          | Minimum Feature Size
1980-1988 | 436nm      | Mercury Vapor G-line | 1000nm
1988-1994 | 365nm      | Mercury Vapor I-line | 350nm
1994-1996 | 365nm      | Mercury Vapor I-line | 250nm
1996-2001 | 248nm      | KrF Excimer Laser    | 150nm
2002      | 193nm      | ArF Excimer Laser    | 90nm
Subwavelength Lithography
Possible Solutions
• Electron Beam Lithography
• Extreme Ultra Violet Lithography
• Proximity X-ray
• Ion Beam
• Subwavelength Lithography:
– Stick with current Ultra Violet wavelengths
– Use Resolution Enhancement Techniques (RET)
The Problem with Subwavelength Lithography: Diffraction
Planar light source
Mask
Point light source
Fourier distribution of light on wafer
OPC: Optical Proximity Correction
Serifs on corners to reduce corner rounding and feature length shortening
Width variation to compensate for effects of adjacent features
OPC Pattern
PSM: Phase Shifting Mask
Phase Shifter
Mask
Amplitude
Intensity
Overview of Mentor’s Calibre OPC and PSM Tools
What it can do for you
• Enables Silicon Accuracy, Speed and Yield from 180nm to 65nm.
• Provides Resolution Enhancement Technologies (OPC, PSM) for deep submicron physical verification.
• Ensures feature sizes below the wavelength of light will print accurately.
• Minimizes mask costs and lowers overall write times.
The Calibre WORKbench
Drawn Layout
OPC Corrected Layout
Simulated Wafer Image
Test-bench Layout Shapes
Layout shapes I
Layout shapes II
Layout shapes III
Layout shapes IV
Layout shapes V
A Better DRC
• Removing the hard DRC rules, allowing simulated wafer images to guide a more intelligent DRC checker.
• Training a Neural Network to detect DRC failures as a percentage yield, not just a pass or fail result.
Questions?
Project #7
by
Michael Malander
Jayashree Sridharan
Introduction
• Review of general statistics related to correlation
• Review of current techniques of statistical timing analysis
Correlation
• What is correlation
– A quantitative assessment of the strength of relationship between two variables
• Why is correlation significant
• Correlation and Causation
Correlation Example
• Pearson’s sample correlation coefficient: $r = \frac{\sum Z_X Z_Y}{n - 1}$
• Correlation coefficient r = 0.905
[Figure: Positive Correlation scatter plot. x-axis: X, 0 to 25; y-axis: Y, -5 to 25]
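As a worked illustration of Pearson's sample correlation coefficient, the sketch below computes r as the sum of products of z-scores divided by n − 1, using sample standard deviations. The function name `pearson_r` and the data are made up for the example; they are not the data behind the r = 0.905 plot.

```python
import math

def pearson_r(xs, ys):
    """Pearson's sample correlation: sum(z_x * z_y) / (n - 1)."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equally long samples with n >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sample standard deviations (divide by n - 1).
    sx = math.sqrt(sum((x - mean_x) ** 2 for x in xs) / (n - 1))
    sy = math.sqrt(sum((y - mean_y) ** 2 for y in ys) / (n - 1))
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for a constant sample")
    return sum(((x - mean_x) / sx) * ((y - mean_y) / sy)
               for x, y in zip(xs, ys)) / (n - 1)

xs = [1, 2, 3, 4, 5]
print(pearson_r(xs, [3, 5, 7, 9, 11]))   # perfectly linear, increasing
print(pearson_r(xs, [11, 9, 7, 5, 3]))   # perfectly linear, decreasing
```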
Current Techniques
• Circuit Delay
• Types of optimization
– Interconnect
– Gate
Methods of Analysis
• Monte Carlo
• Statistical Timing Analysis (STA)
– Why use STA
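For comparison with statistical timing analysis, a Monte Carlo delay estimate can be sketched in a few lines. This is a minimal illustration, assuming independent Gaussian gate delays along a single path; the nominal delays, sigma, sample count, and seed are hypothetical.

```python
import random
import statistics

def monte_carlo_path_delay(nominal_delays, sigma, samples=20000, seed=1):
    """Sample the total delay of one path with independent Gaussian gate delays."""
    rng = random.Random(seed)
    totals = [
        sum(rng.gauss(d, sigma) for d in nominal_delays)
        for _ in range(samples)
    ]
    return statistics.mean(totals), statistics.stdev(totals)

# Three gates of 100 ps nominal delay, 5 ps standard deviation each (assumed).
mean_ps, std_ps = monte_carlo_path_delay([100.0, 100.0, 100.0], sigma=5.0)
print(mean_ps, std_ps)
```

The cost is the usual motivation for STA: the sampling loop has to be repeated for every path and enough samples are needed for the tails to converge, whereas STA propagates distributions through the circuit once.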
Interconnect
• Process Variation
– Parasitic R
– Parasitic C
• Environmental Variation
– Signal Arrival Time
– Capacitive Coupling
Gate Delay
• Process Variation
– Variations in Length
– Doping Density
– Oxide thickness
• Environmental Variation
– Arrival Time
– Input Slew Rate
– Threshold Voltage
Statistical Timing Analysis
• Longest Path
• Statistical Variation of Delay
– Interconnect
– Gate
• False Path Elimination
• Weight of Sensitizable Paths
• Re-evaluation of Longest Path
Probabilistic Event Propagation
• Algorithm
– Arrival time and Cell Delay are taken as random variables
– A cell-level netlist and cell delays are used as inputs to produce signal arrival times as output
– Circuit is first partitioned into supergates
– The events at the output of a supergate are obtained using cross product sampling evaluation and recursive sampling evaluation
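The idea of treating arrival times and cell delays as random variables can be sketched with discrete distributions. The code below is a simplified illustration, not the supergate algorithm from the slide: it assumes the two input arrival times are statistically independent (the case without reconvergent fanout), and all distributions are made-up examples with integer time units.

```python
from collections import defaultdict
from itertools import product

def max_dist(a, b):
    """Distribution of max(A, B) for independent discrete arrival times."""
    out = defaultdict(float)
    for (ta, pa), (tb, pb) in product(a.items(), b.items()):
        out[max(ta, tb)] += pa * pb
    return dict(out)

def add_dist(a, d):
    """Distribution of A + D for independent arrival time A and cell delay D."""
    out = defaultdict(float)
    for (ta, pa), (td, pd) in product(a.items(), d.items()):
        out[ta + td] += pa * pd
    return dict(out)

# Hypothetical {time: probability} distributions.
arrival_a = {1: 0.5, 2: 0.5}
arrival_b = {1: 0.5, 3: 0.5}
cell_delay = {1: 1.0}

# Output arrival = latest input arrival plus the cell delay.
output_arrival = add_dist(max_dist(arrival_a, arrival_b), cell_delay)
print(sorted(output_arrival.items()))
```

When inputs share upstream logic they are correlated, and the simple product of probabilities used here no longer holds; partitioning into supergates is the slide's way of handling that case.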
False Path
• Goal is to estimate a true circuit delay distribution and deliver critical, sensitizable and true paths
• Algorithm
– Phase 1:
• Slack is used to find nodes that are critical
• Construct critical paths using critical nodes
– Phase 2:
• Identify the paths that are true with respect to timing using a Monte Carlo sampling scheme
Spatial Correlation
• Algorithm
– For each gate assign an effective length
• Inter Die Variation
– Enumerate Possibilities from Best to worst case
– Use Length variation to establish Delay variation
– Discretization of Length Distribution
• Intra Die Variation
– Delay Sensitivity multiplied by Length
– Include Spatial Correlation
– Calculate the delay of each path
• Sum of delay of each gate in the path
• Include effects of arrival time and input slope
Spatial Correlation
• Intra-Die Length Variation
– Sum of weighted Spatial variations
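The steps listed for the spatial correlation approach can be sketched as follows: each gate's length shift is an inter-die shift plus a weighted sum of spatial components, each gate's delay is its nominal delay plus a sensitivity times that shift, and the path delay is the sum over gates. The `Gate` structure, the weights, and all numbers are hypothetical, and the effects of arrival time and input slope are left out.

```python
import math
from dataclasses import dataclass

@dataclass
class Gate:
    nominal_delay: float   # delay at nominal channel length (ps, assumed)
    sensitivity: float     # delay change per unit length change (assumed)
    weights: tuple         # weight of each spatial variation component

def gate_length_shift(gate, inter_die_shift, spatial_components):
    """Effective length shift: inter-die part plus weighted intra-die parts."""
    intra = sum(w * x for w, x in zip(gate.weights, spatial_components))
    return inter_die_shift + intra

def path_delay(gates, inter_die_shift, spatial_components):
    """Sum of gate delays, each adjusted by sensitivity times length shift."""
    return sum(
        g.nominal_delay
        + g.sensitivity * gate_length_shift(g, inter_die_shift, spatial_components)
        for g in gates
    )

path = [
    Gate(nominal_delay=100.0, sensitivity=2.0, weights=(1.0, 0.0)),
    Gate(nominal_delay=120.0, sensitivity=1.5, weights=(0.5, 0.5)),
]
total = path_delay(path, inter_die_shift=1.0, spatial_components=(2.0, -2.0))
print(total)
```

Gates that share a spatial component with nonzero weight move together, which is how this kind of model captures correlation between nearby gates.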
Conclusion
• Current Research
– Combination of Process and Environmental
• Goal
– Standard Deviation
– Correlation