DfX and Signoff: Challenges and Opportunities
Andrew B. Kahng
UCSD CSE and ECE Departments
[email protected]
http://vlsicad.ucsd.edu
What is “Design for X”?
• X = growing set of IC implementation challenges
  • Manufacturability
  • Yield
  • Reliability
  • Variability
  • Power
  • Test
  • Cost
  • Debug
• All of these are now:
  • “first-class citizens”
  • expensive (fear + uncertainty + doubt = design margin)
• DfC, DfM, DfP, DfT, DfR, DfV
2abk ISVLSI 120820
What is “Signoff”?
• Foundation of contract between design house and foundry
• “Chip should work”: stack of models, margins, analyses
  • Function, timing, signal integrity, power integrity, …
• Example of complexity: Static Timing Analysis
  • “Margin stack” for voltage signoff: nominal Vdd, static IR drop, power grid IR gradient, dynamic IR, HCI/NBTI, signoff Vdd
  • Corner explosion
    • Process: FF, FFG, FS, SF, TT, SSF, SS, …
    • RC: C-worst, Cc-worst, C-best, Cc-best, RC-worst, RC-best, …
  • Variations: OCV analysis
    • Launch path: worst case
    • Capture path: best case
    • Temp, voltage gradients?
  • Analysis style
    • Graph-based analysis: pessimistic
    • Path-based analysis: exponential runtime
• Crisis: margins = pessimism → overdesign, schedule delay
[Figure: “margin stack” for voltage signoff, from operating voltage to signoff Vdd]
Challenge: Scaling of Value
[Figure: four scaling-trend panels, each annotated “non-ideality”]
• DENSITY: transistor count [M] vs. MPU release date, 1998 to 2014. Source: [CPUDB]
• POWER: dynamic power (W) and active capacitance density (nF/mm^2), 2009 to 2024, ideal vs. non-ideality. Source: [JeongK08]
• DRIVE CURRENT: extended planar bulk, UTB FD, and DG (μA/μm) vs. ideal scaling, 2006 to 2016. Source: [ITRS]
• SUPPLY VOLTAGE: volts vs. MPU release date, 1995 to 2016. Source: [CPUDB]
Challenge: Value of Technology
• Margin → lost benefits of technology*
[Figure: design quality (e.g., frequency) vs. technology generation; signoff with larger guardbands for DfX leads to lost benefits]
*Value proposition of next node nowadays: “20-20-20” (% P, P, A)
The Big Lever for DfX and Signoff: Margin
• Need tradeoffs among various types of margin
• Need co-optimization of margin across engineering scopes and chip implementation phases
[Figure: MARGIN at the center of three concerns: turnaround time, product quality of results, and model and analysis accuracy. Units shown: ps, nm, mV, …; power, area, fmax, Iddq, …; rms, %, σ]
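Motivating Study: The Price of Margin [ISQED-2009]
• Expected impact of margin reduction:
  • Delay reduction → easier optimization
  • Easier optimization → smaller gate size, shorter wire
  • Smaller gate size, shorter wire → smaller area (A)
  • Smaller area (A) → smaller #defects → smaller cost
• Yield and dies per wafer as functions of area:
  $Y_r = e^{-A d}$
  $N_{dies} = \frac{\pi r^2}{A} - \frac{2 \pi r}{\sqrt{2A}}$
  (d: defect density)
  (r: wafer radius)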
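The area-to-cost argument can be illustrated with a minimal sketch. It assumes a Poisson yield model Y = exp(-A·d) and the common gross-dies-per-wafer approximation πr²/A − 2πr/√(2A); the wafer radius, defect density, and die areas below are illustrative assumptions, not values from the study.

```python
import math


def gross_dies(area_mm2: float, wafer_radius_mm: float) -> float:
    """Approximate dies per wafer: pi*r^2/A minus an edge-loss term."""
    r = wafer_radius_mm
    return math.pi * r**2 / area_mm2 - 2.0 * math.pi * r / math.sqrt(2.0 * area_mm2)


def poisson_yield(area_mm2: float, defects_per_mm2: float) -> float:
    """Poisson yield model: Y = exp(-A * d)."""
    return math.exp(-area_mm2 * defects_per_mm2)


def good_dies(area_mm2: float, wafer_radius_mm: float, defects_per_mm2: float) -> float:
    """Expected good dies per wafer = gross dies * yield."""
    return gross_dies(area_mm2, wafer_radius_mm) * poisson_yield(area_mm2, defects_per_mm2)


if __name__ == "__main__":
    RADIUS_MM = 150.0        # assumed 300 mm wafer
    DEFECT_DENSITY = 0.002   # assumed defects per mm^2
    for area in (100.0, 90.0):  # assumed die areas before/after margin reduction
        print(f"A = {area:5.1f} mm^2 -> good dies = {good_dies(area, RADIUS_MM, DEFECT_DENSITY):.1f}")
```

A smaller die helps twice in this model: more dies fit on the wafer, and each die is less likely to contain a defect.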
Design Outcomes from Margin Reduction
• Experiments with industry chip implementation flow
  • RTL design (AES, JPEG, SOC1)
  • Technology (90nm, 65nm, 45nm)
  • Flow: synthesis → placement → clock tree synthesis → routing, with cell library guardband reduction and RC guardband reduction
  • Outcomes: area, wirelength, runtime, #viols, yield
• 40% guardband reduction
  • Area: 13% reduction
  • Dynamic power: 13% reduction
  • Leakage power: 19% reduction
  • Wirelength: 12% reduction
  • SP&R runtime: 28% reduction
  • #Timing viols.: 100% reduction
  • #Good dies (w/ process enhancement): 10% increase
  • #Good dies (w/o process enhancement): 4% increase
• Quantified impacts of margin reduction → greater interest in margin reduction techniques
Margin, DfX and Signoff: 4 Mindsets
• Be Reactive
• Be Adaptive
• Be Proactive
• Be Predictive
Margin, DfX and Signoff: Mindsets
• Be Reactive: Develop design methodology in response to new or increased variation …
• Be Adaptive
• Be Proactive
• Be Predictive
Double-Patterning at 20nm: Bimodal Variation [SPIE2008, ASPDAC2009, ICCAD2009]
• Two patterning steps → two CD (critical dimension) distributions
  • Green lines from 1st patterning
  • Blue lines from 2nd patterning
• Comparison of design guardband (min-max delay)
• Unimodal representation is too pessimistic
[Figure: best-case and worst-case delay (s), 0 to 3.0E-11, vs. CD mean difference (1 nm to 6 nm) for the large CD group, the small CD group, and pooled CD]
Impact of Bimodality on Path Delay
• Bimodality can help reduce path delay variation
  • Reduction of covariance when colorings are mixed
• Uniform coloring (C12 C12 C12 C12): + + + + = +4, variation is accumulated
• Alternate coloring (C12 C21 C12 C21): + - + - = 0, variation is compensated
[Figure: SPICE simulation results, sigma / mean (%) from 0 to 25 vs. CD mean difference (0 to 6 nm), uniform vs. alternate coloring]
Impact of Bimodality on Clock Skew
• Different coloring sequences in a clock network
  • Clock skew between launch and capture
  • Same color on all clock buffers is better!

Case   Source to Sink A       Source to Sink B
1      C12+C12+C12+…+C12      C12+C12+C12+…+C12
2      C12+C12+C12+…+C12      C21+C21+C21+…+C21

[Figure: clock skew (s), 0 to 6.0E-11, vs. CD mean difference (0 nm to 6 nm) for Case 1 and Case 2]
Bimodal CD Distribution: 3 Key Facts
1. Design requires bimodal-aware timing models
  • Unimodal representation is too pessimistic
2. Data paths benefit from alternate (mixed) coloring
  • Exploit existence of two uncorrelated CD populations
  • Minimize correlated variations in a given path
3. Clock paths benefit from uniform coloring
  • Correlated variation between launch and capture paths minimizes bimodality-induced clock skew
Design can exploit both correlated and uncorrelated variations
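The accumulate-versus-compensate argument can be sketched with a toy model. It assumes each stage's delay shifts by the same magnitude `delta`, with the sign set by the stage's coloring (C12 or C21); the function names and the linear model are illustrative assumptions, not the timing model used in the cited work.

```python
# Toy model: sign of the per-stage delay shift for each coloring (assumption).
SHIFT_SIGN = {"C12": +1, "C21": -1}


def path_shift(coloring, delta=1.0):
    """Total delay shift of a path: sum of signed per-stage shifts."""
    return sum(SHIFT_SIGN[c] * delta for c in coloring)


def clock_skew(launch_coloring, capture_coloring, delta=1.0):
    """Bimodality-induced skew: difference between the two clock-path shifts."""
    return abs(path_shift(launch_coloring, delta) - path_shift(capture_coloring, delta))


if __name__ == "__main__":
    uniform = ["C12"] * 4
    alternate = ["C12", "C21", "C12", "C21"]
    print("data path, uniform coloring  :", path_shift(uniform))     # shifts accumulate
    print("data path, alternate coloring:", path_shift(alternate))   # shifts cancel
    print("clock skew, same coloring    :", clock_skew(uniform, uniform))
    print("clock skew, opposite coloring:", clock_skew(uniform, ["C21"] * 4))
```

In this model a data path wants mixed coloring so that its own shifts cancel, while a launch/capture clock pair wants identical coloring so that their shifts cancel against each other.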
DPL Layout-to-Mask Flow
• Flow steps: RTL-to-GDS, bimodal-aware timing analysis, DPL mask coloring
• Optimization 1: maximization of alternate coloring (datapaths)
  • Alternate coloring using integer-linear programming
• Optimization 2: placement perturbation for color conflict removal (clock and data paths)
  • Coloring conflict > minimum resolution
  • Placement perturbation using dynamic programming
Overall Timing Improvement
• Bimodal timing model → reduce pessimism (margin)
• Alternate coloring → improve timing
• Placement perturbation → remove conflicts

Columns 2nm, 4nm, 6nm give the mean CD difference.

Stage                            #Conflicts   Timing metric   2nm       4nm        6nm
Initial coloring (unimodal)      0            WNS (ns)        -1.113    -2.016     -2.902
                                              TNS (ns)        -671.1    -1776.3    -3348.5
Initial coloring (bimodal)       0            WNS (ns)        -0.191    -0.354     -0.527
                                              TNS (ns)        -8.17     -26.56     -64.64
Alternative coloring             219          WNS (ns)        -0.090    -0.145     -0.267
                                              TNS (ns)        -1.48     -3.85      -22.40
Detailed place (+ECO routing)    0            WNS (ns)        -0.104    -0.183     -0.295
                                              TNS (ns)        -3.43     -10.45     -28.42

Bimodality impact (and naïve margins) can be effectively mitigated!
Futures: VERY High-Dimensional Modeling
• Combinatorial explosion of libraries (signoff corners)
  • Ptr ∈ {SS, NN, SF, FS, FF}, Vth ∈ {LVT, RVT, HVT}, Lgate ∈ {-10, 0, +10, …}
  • Pint ∈ {R, C} x {min, max} x {Layers: 1, 2, ..., 8}
  • T ∈ {-40, 0, 25, 85, 125}
  • V ∈ {0.7, 1.0} x {-10%, 0%, +10%}
  • Aging ∈ {5 years, …, 10 years}
  • Consequences: > 1000 corners, library data volume, validation runtime
• Many more corners with multi-patterning, misalignment
  [Figure: normal coupling, more coupling, less coupling]
• Traditional mitigations
  • Designers: buy bigger disks, mix gba / pba, suffer, …
  • EDA companies: apply interpolation
    • Synopsys CCS library format: interpolate between corners
    • Solido Fast PVT: adaptively seek worst-case corner; predict non-simulated corners
• Coming reasons for “high dimensional modeling”: near-threshold operation, 3D integration, UTBB SOI, …
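The size of the corner space can be estimated with a short sketch. It assumes a full cross product of the axes listed on the slide, truncates the open-ended sets (Lgate, aging) to the values shown, and ignores the per-layer interconnect dimension; all of these are simplifying assumptions made for illustration.

```python
from itertools import product
from math import prod

# Axis values copied from the slide; open-ended sets ("…") are truncated (assumption).
axes = {
    "transistor_process": ["SS", "NN", "SF", "FS", "FF"],
    "vth": ["LVT", "RVT", "HVT"],
    "lgate_offset": [-10, 0, +10],
    # Interconnect: {R, C} x {min, max}; the 8-layer dimension is ignored here.
    "interconnect": list(product(["R", "C"], ["min", "max"])),
    "temperature_c": [-40, 0, 25, 85, 125],
    "vdd": [(v, tol) for v in (0.7, 1.0) for tol in (-0.10, 0.0, +0.10)],
    "aging_years": [5, 10],
}

corner_count = prod(len(values) for values in axes.values())

if __name__ == "__main__":
    for name, values in axes.items():
        print(f"{name:20s} {len(values)} values")
    print("full cross product:", corner_count)  # 5*3*3*4*5*6*2 = 10800
```

Even with the truncations, the cross product is well beyond 1000 corners, which is why interpolation and adaptive corner search are attractive.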
Futures: DfS (Stress)
• Thermomechanical stress effects, e.g., in TSV-based 3DIC
  • Thinned Si substrates (<50um), bumps/pillars, temperature gradients
• Design for Stress (DfS)
  • DfM: model stress effects at time zero
  • DfR: model stress effects through time
• Stress-driven failure mechanisms:
  (1) Mechanical integrity failure: cracking, fracturing, fatigue ...
  (2) Electrical performance failure: change in carrier mobility
• What will “stress signoff” look like?
[Figure: TSV layout automation, TSV stress simulation. Source: Samsung Electronics, “3D TSV Technology & Wide I/O Memory Solutions”, DAC 2012]
Margin, DfX and Signoff: Mindsets
• Be Reactive
• Be Adaptive: Use “sense and adapt” to recover quality lost by margin and overdesign
• Be Proactive
• Be Predictive
Adaptive Voltage Scaling (AVS) Approaches
• AVS is used to recover the power part of overdesign
• Different approaches deliver different tradeoffs among power saving, design complexity, area, etc.
• AVS classes: open-loop AVS, closed-loop AVS, error detection system (error tolerance)
• AVS approaches
  • Freq. & Vdd LUT: pre-characterized LUT [Martin02]
  • Post-silicon characterization: process-aware AVS [Tschanz03]
  • Process and temperature-aware AVS
    • Generic on-chip monitor [Burd00]
    • Design-dependent replica (monitor) [Elgebaly07, Drake08, Chan12]
  • In-situ performance monitor: measure actual critical paths [Hartman06, Fick10]
  • Error detection and correction system: Vdd scaling until error occurs [Das06, Tschanz10]
[Figure: power for each approach, from Freq. & Vdd LUT through error detection system]
Process-aware Voltage Scaling (PVS) [ICCAD-2012]
• Monitor design considerations
  • Critical path may be difficult to identify (IP from 3rd party)
  • Multiple modes/voltages → Fmax calibration takes long test time
• Proposal: tunable monitor
  • Design monitor to guardband for arbitrary circuit (overdesign)
  • Tune monitor based on Fmax of sample chips to recover design margin (calibrate only once)
• Abstract voltage scaling property instead of matching critical path
  • Enable analysis of worst-case voltage scaling
[Flow: PVS RO + SoC with closed-loop AVS. Without Fmax of sample chips: configure RO for worst case. With Fmax of sample chips: configure RO so that all sample chips meet timing. Store target frequency and RO configurations in a ROM.]
Voltage Scaling Properties
• Vmin = minimum Vdd to meet timing constraints; determined by process distance / scaling rate
  • Process distance: process-induced frequency shift relative to target frequency
  • Scaling rate: frequency shift for a unit voltage difference, Δf/ΔV
[Figure: frequency vs. V curves for FF, TT, SS; marks f_target, Vnom, Vmin_path(k) for process condition k, the process distance, and ΔV]
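The relationship among Vmin, process distance, and scaling rate can be sketched as follows. This is a first-order illustration that assumes frequency is locally linear in voltage around Vnom, so that Vmin = Vnom − (process distance / scaling rate); the sign convention, function name, and numbers are assumptions, not taken from the slide.

```python
def vmin(v_nom: float, f_at_vnom: float, f_target: float, scaling_rate: float) -> float:
    """First-order Vmin estimate.

    process distance = f(Vnom) - f_target   (positive for fast chips)
    scaling rate     = df/dV                (frequency gained per volt)
    """
    process_distance = f_at_vnom - f_target
    delta_v = process_distance / scaling_rate
    return v_nom - delta_v


if __name__ == "__main__":
    V_NOM = 1.0            # V, assumed
    F_TARGET = 1000.0      # MHz, assumed
    SCALING_RATE = 2000.0  # MHz per V, assumed
    for corner, f_at_vnom in (("FF", 1100.0), ("TT", 1000.0), ("SS", 900.0)):
        print(corner, f"Vmin = {vmin(V_NOM, f_at_vnom, F_TARGET, SCALING_RATE):.3f} V")
```

Fast chips have a positive process distance and can run below Vnom; slow chips have a negative one and need more than Vnom.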
PVS Monitor Design Concept
• RO is used as a reference for voltage scaling
• Design ROs with the worst-case voltage scaling properties → guardband for arbitrary circuits
• A circuit meets its timing when
  $\max_{1 \le i \le m} V_{min\_ro}(i, k) \ge \max_{1 \le j \le n} V_{min\_path}(j, k)$
  (left: maximum of m ROs; right: maximum of n paths)
• Design challenges
  • Vmin_ro > Vmin of any data path across all process conditions
[Figure: frequency vs. V curves for FF, TT, SS; marks f_target, Vnom, and Vmin(k)]
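The guardband condition can be checked per process condition with a few lines. This sketch assumes Vmin values are already available for each RO and each path at each process condition k; the dictionary layout, function name, and voltages are hypothetical.

```python
def ro_guardbands_paths(vmin_ro: dict, vmin_path: dict) -> bool:
    """True if, at every process condition k, the largest RO Vmin
    is at least the largest path Vmin."""
    return all(max(vmin_ro[k]) >= max(vmin_path[k]) for k in vmin_path)


if __name__ == "__main__":
    # Hypothetical Vmin values in volts, keyed by process condition.
    vmin_ro = {"SS": [1.04, 1.06], "TT": [0.98, 1.00], "FF": [0.93, 0.95]}
    vmin_path = {"SS": [1.01, 1.05, 1.03], "TT": [0.96, 0.99, 0.97], "FF": [0.90, 0.94, 0.92]}
    print("ROs guardband all paths:", ro_guardbands_paths(vmin_ro, vmin_path))
```

If the check fails at any process condition, scaling Vdd until the ROs meet the target frequency could leave some path short of timing.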
Vmin Analysis
• Key observation: Vmin is bounded by NMOS- or PMOS-dominated cells (e.g., NOR3 at FS corner)
  → Use NAND, NOR type ROs
[Figure: Vmin (V), 0.500 to 1.100, by cell type (INVX0, NAND2X0, NAND3X0, NAND4X0, NOR2X0, NOR3X0, NOR4X0) at corners SS, TT, FF, SF, FS]
Design RO with Tunable Vmin
• Identified two circuit knobs to tune Vmin
  • Series resistance
  • Cell types (INV, NAND, NOR)
• Example circuit strategy
  • Allow tuning of series resistance of each stage to high or low
  • Different cell types cover different process corners
[Figure: RO stages with 1-bit control pins selecting high or low series resistance]
PVS Experiment Result
• Setup: 65nm, OpenSPARC T1 module, Monte Carlo SPICE simulation
• Default setting: low resistance in all stages → Vmin_est - Vmin_chip = 13mV on average (guardband for worst-case)
• With Fmax information per die, can tune RO configuration to drive Vmin_est - Vmin_chip → 0
• Better on-chip sensing and adaptation → more reduction of runtime power overheads (Vdd)
[Figure: more aggressive scaling toward min margin]
Futures: Integration of Adaptivity With Signoff
• How can we make signoff aware of adaptation?
  • What is the signoff corner, or even the meaning of “worst-case signoff”, for an adaptive, self-throttling circuit?
  • (Will discuss this in a few minutes…)
• Adaptivity 2.0
  • Principled tradeoffs among adaptation approaches (open-loop, closed-loop, error-tolerant, …)
  • Principled tradeoffs among sensor designs (generic, design-specific, tunable, in-situ, number of copies, spatial location, sensor fusion, …) within a “sense and adapt” paradigm
• Holy Grail: “sign off at typical”
Margin, DfX and Signoff: Mindsets
• Be Reactive
• Be Adaptive
• Be Proactive: Reduce margins and achieve greater value by leveraging design/system information [= “exploit available information”]
• Be Predictive
1. Signoff in Presence of … Overdrive Mode
• Multi-mode operation → multi-mode signoff
  • Example: nominal mode and overdrive mode
• Selection of signoff modes affects area, power
  • Mode = {voltage, frequency}
• Improve performance, power, or area: reduce overdesign (margin)
[Figure: Vdd vs. time, alternating between nominal (NOM) and overdrive (OD) modes with durations tnom and tOD]
[Figure: power (mW), 85 to 99, vs. overdrive voltage (V), 1.03 to 1.17; annotated 14%]
Be proactive: optimize the overdrive mode
A Baby Step: How Does Overdesign Occur?
• Design cone
  • Solution space for multi-mode signoff
  • Defined by tradeoff between frequency and voltage
[Figure: frequency vs. voltage; the design cone of mode A, bounded by LVT and HVT]
A Baby Step: How Does Overdesign Occur?
• Design cone
  • Solution space for multi-mode signoff
  • Defined by tradeoff between frequency and voltage
• Dominance of modes
  • The mode with tighter timing constraints is the dominant mode
[Figure: frequency vs. voltage; design cones (LVT to HVT) of modes A and B; mode A is the dominant mode]
A Baby Step: How Does Overdesign Occur?
• Design cone
  • Solution space for multi-mode signoff
  • Defined by tradeoff between frequency and voltage
• Dominance of modes
  • The mode with tighter timing constraints is the dominant mode
• Equivalent dominance
  • No mode is dominated by another mode
  • Modes are in each other’s design cones
[Figure: frequency vs. voltage; design cones (LVT to HVT) of modes A and B; modes A and B exhibit equivalent dominance]
Lemma: Multi-mode signoff at modes which do not exhibit pairwise equivalent dominance leads to overdesign
Guideline: signoff modes should exhibit equivalent dominance
Signoff Mode Optimization
• Problem: Given the nominal mode, search for the best possible overdrive mode to maximize overdrive performance, s.t. average and peak power satisfy predefined constraints
• Today: “Signoff & Scale”
  • Sign off at nominal mode
  • Scale the voltage to increase frequency until the power constraint is hit
  [Figure: frequency vs. voltage; (1) signoff at nominal mode (Vnom, fnom), (2) scaling to overdrive mode (VOD, fOD)]
• A Better Flow
  • Search within the design cone
  • Implement multi-mode design
  [Figure: frequency vs. voltage; search for the overdrive mode within the design cone of the nominal mode (Vnom, fnom)]
Experimental Results

Metric        Signoff & Scale   Proposed Flow   Exhaustive Search
fOD (MHz)     711               764             768
VOD (V)       1.14              1.14            1.15
Area (µm2)    31029             32016           32020
POD (mW)      49.13             49.14           49.76
Pavg (mW)     21.73             20.90           20.24

• Better signoff mode selection improves performance by 7%
• Flow requires about 25% runtime compared to exhaustive search with similar area (-0.01%), power (+3%) and performance (-0.5%)
2. Signoff in Presence of … Adaptivity
• NBTI and PBTI degrade circuit performance over lifetime
  • |Vth| of a transistor increases when it is on (stress)
  • Part of the |Vth| increment is recovered when the transistor is off
• Degradation is a function of circuit activity, unknown at design time
• How much margin (library derating) is needed for aging at design signoff?
• Degradation vs. assumptions
  • Signal-probability-aware aging model: difficult to obtain accurate signal probability
  • AC stress aging model: does not guarantee worst case
  • DC stress aging model: worst-case but pessimistic
[Figure: degradation vs. transistor stress time under signal probability, AC, and DC assumptions, each with max Vdd and adaptive Vdd]
• Degradation with adaptive Vdd < with Vmax!
Be proactive: optimize margin in signoff corner = “How to find the correct 10-year timing library?”
Quantifying Design Margin
• Quantify degradation under adaptive Vdd
• Calculate expected degradation using aging-aware STA [Wang09]
• Require design changes if degraded timing fails to meet timing requirements
  • Time-consuming design loop!
• Need a better signoff flow
[Flowchart: set Ftarget and Vinit; if Fmax > Ftarget fails, set Vdd = Vdd + ΔVdd; run aging-aware STA; if time ≠ lifetime, set time = time + Δtime and repeat; at end of life, if Fmax > Ftarget fails, apply sizing and reset time = 0; outputs: Vfinal, ΔVth]
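The adapt-then-age loop can be sketched with toy models. Everything numeric here is an assumption for illustration: `fmax` is a linear stand-in for timing analysis, `aging_step` is a linear stand-in for an aging model, and the constants are arbitrary. The sizing branch of the design loop is omitted.

```python
VTH0 = 0.3        # V, assumed fresh threshold voltage
K_GHZ_PER_V = 2.0 # GHz per V of overdrive, assumed


def fmax(vdd: float, dvth: float) -> float:
    """Toy timing model: Fmax grows linearly with gate overdrive."""
    return K_GHZ_PER_V * (vdd - VTH0 - dvth)


def aging_step(vdd: float, dt_years: float, rate: float = 0.004) -> float:
    """Toy aging model: |Vth| shift grows with supply voltage and stress time."""
    return rate * vdd * dt_years


def simulate_avs_aging(f_target, v_init, lifetime_years=10.0, dt_years=0.5,
                       dv=0.01, v_max=1.2):
    """Return (Vfinal, total Vth shift) after lifetime with adaptive Vdd."""
    vdd, dvth = v_init, 0.0

    def adapt(v):
        # AVS: raise Vdd in steps until the timing target is met.
        while fmax(v, dvth) < f_target:
            v += dv
            if v > v_max:
                raise RuntimeError("target not reachable below v_max; resize the design")
        return v

    for _ in range(round(lifetime_years / dt_years)):
        vdd = adapt(vdd)
        dvth += aging_step(vdd, dt_years)
    return adapt(vdd), dvth


if __name__ == "__main__":
    v_final, dvth = simulate_avs_aging(f_target=1.0, v_init=0.8)
    print(f"Vfinal = {v_final:.2f} V, Vth shift = {dvth * 1000:.1f} mV")
```

Because the supply stays below the maximum for most of the lifetime, the accumulated shift in this toy model is smaller than it would be under a constant maximum Vdd.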
Heuristics for Characterizing Derated Library
• Vfinal: Vdd at end of life with AVS
• Observation 1: Degradation with a static Vfinal is similar to degradation of adaptive Vdd
  • Approximate adaptive Vdd by assuming a static Vfinal
• Vfinal is not available at early design stage (design has not been implemented)
• Observation 2: Vfinal is not sensitive to circuits
  • Use INV chain to calculate Vfinal
[Figure: Vfinal vs. normalized Ftarget (1.00, 0.99, 0.98, 0.97, 0.96); Vfinal difference < 1%]
Aging-Aware Signoff Accounting for AVS
• Proposed method:
  • Step 1: Simulate INV chain with AVS+aging to obtain Vfinal
  • Step 2: Characterize derated library with Vfinal
  • Step 3: Implement circuits with derated library
• Experiment
  • Reference: aging-aware STA+AVS design and signoff flow
  • Proposed method: use derated library obtained from heuristics
  • Others: use derated libraries with various derating setups
[Flowchart: inputs Vinit, Ftarget; Step 1: calculate Vfinal with an INV chain; Step 2: characterize derated library with Vfinal; Step 3: circuit design, then check Fmax > Ftarget; if N, sizing; if Y, done]
Experimental Results (Aging Signoff)
• 32nm technology
• DC degradation @ 125C
• Accurate Vfinal estimation reduces design overhead
• Library Vdd (V) settings: Vmax=1.1V, Vfinal=0.99V, Adaptive, V’final=0.98V

Case                  1        2       3       4       5       6       7 (reference)   8 (proposed)
Slack at signoff      23.3     39.1    32.9    3.2     25.4    0.3     2.5             0.7
Slack at 10 years     -117.0   3.5     -53.2   10.8    2.2     16.1    11.9            8.1
Area (um2)            95%      97%     96%     98%     99%     105%    100%            100%
Vdd at 10 years (V)   1.10     1.03    1.10    1.04    0.98    0.91    0.99            0.99
Average power (mW)    124%     108%    128%    105%    98%     84%     100%            100%

• Optimistic signoff corners fail to meet timing at end of life (Vmax = 1.1V)
• Pessimistic signoff corner → area penalty
• Good signoff corners: avoid 20% extra power or 5% area overhead
Futures: What If We Knew …
• Scenarios and duty cycles?
• Workloads?
• Accuracy requirements?
• Lifetime requirements?
Huge opportunities for proactivity and margin reduction
What If We Knew … (scenarios, duty cycles)
Dynamic Voltage Freq. Scaling
• DVFS allows adaptation to workloads & operating conditions
• Multi-mode (or DVFS) design operates at multiple power/performance points with different lifetimes
  • 1.0V, 1GHz (e.g., talk mode)
  • 0.7V, 100MHz (e.g., standby mode)
• Conventional EDA tool: requires constraints (freq., voltage) before implementation (which constraints will provide minimum energy?)
• Replication: create replicas that target each performance mode (replication incurs a large area overhead)
Be proactive: use scenario/duty cycle information for multi-mode optimization
Implementation Results (OpenSPARC T1)
• Context-aware design shows up to 19.5%, 7.6% (avg.) energy reduction over conventional multi-mode design
• Replication-based design shows up to 25.4%, 9.1% (avg.) energy reduction over conventional multi-mode design
• Selective-replication design
  • FFU module has 12% energy savings through selective-replication multi-mode design
• Layout results (OpenSPARC/FFU)
  • 16% power reduction with 10% area overhead (R = 1%)
[Figure: energy reduction (0% to 16%) vs. allowable area overhead (0% to 30%) for duty cycle R = 1%, 5%, 10%]
What If We Knew … (switch activity from workloads)
Error-Tolerant Design [ASPDAC 2010, DAC 2010, TCAD 2012]
• CPU, heal thyself ...
• Errors are detected and corrected with a redundancy technique
Problem:
• Many paths have near-critical slack → wall of (critical) slack
• Scaling beyond the critical operating point causes massive errors that cannot be corrected
Be proactive: reshape slack distribution for gracefully increasing error rate
• Frequently-exercised paths: upsize cells
• Rarely-exercised paths: downsize cells
[Figure: number of paths vs. slack, before and after reshaping]
What If We Knew … (accuracy requirements)
Approximate Design
• What is the square root of 10? “a little more than three” vs. “3.162278...”
• Approximation could be faster and more powerful
• Tradeoff: higher accuracy vs. lower power consumption
Problem:
• Accuracy requirement can change during runtime → benefits of approximation could be reduced
Be proactive: adapt to changing requirements with runtime accuracy configuration
• [DAC 2012] “accuracy-configurable approximate adder”
What If We Knew … (Lifetime (MTTF) Reqts)
[Figure: dependency graph relating MTTF to supply voltage, frequency, temperature, Jrms, AF (α), timing slack, |Vthp|, |Vthn|, slew rate, load/fanout, driver size, gate length, wire width, wire spacing, and junction resistance]
• Edge types
  • Direct relation: if A increases then B increases
  • Inverse relation: if A increases then B decreases
• Edge labels: failure mechanism (EM, TDDB, NBTI, HCI) or general
• Node types: tunable at design time; tunable at design or runtime
Potential Impacts of Relaxed EM Signoff
• Guardbanding for EM affects area and performance
• [Black’s Equation]
  $I_{rms\_limit}^2 = \frac{MTTF_{default}}{MTTF_{require}} \, I_{rms\_default}^2$
• How much gain in Fmax with relaxed EM guardband?
• What about area?
[Figure: Freq. and Area]
Be proactive: optimize design tradeoffs with relaxed EM requirement
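The current-limit relaxation implied by the squared-current form of Black's equation can be sketched as below. It assumes temperature and all other terms are held fixed, so that MTTF scales as 1/I_rms²; the exponent default of 2 follows the squared terms on the slide, and the function name and example values are illustrative.

```python
import math


def relaxed_irms_limit(irms_default: float, mttf_default: float,
                       mttf_require: float, exponent: float = 2.0) -> float:
    """Scale the default Irms limit for a different required MTTF,
    assuming MTTF is proportional to Irms ** (-exponent)."""
    return irms_default * (mttf_default / mttf_require) ** (1.0 / exponent)


if __name__ == "__main__":
    # Example: relax the lifetime requirement from 10 years to 7 years (-30%).
    scale = relaxed_irms_limit(1.0, mttf_default=10.0, mttf_require=7.0)
    print(f"Irms limit scales by {scale:.3f}x")  # sqrt(10/7)
```

Under these assumptions a 30% shorter required lifetime raises the allowed RMS current by roughly 20%, which is the headroom that can be traded for frequency or area.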
MTTF vs. Fmax
• Fmax increases as MTTFrequire is relaxed
  • Up to +60% of Fmax for -30% of MTTFrequire
• Fmax improvement is determined by
  • Mix of cell sizes
  • Length and timing constraints of critical paths
• Setup: 65nm technology, fixed area
[Figure: % increase of Fmax (0% to 100%) vs. MTTFrequire (axis ticks 10 down to 1) for DMA, AES, JPEG; annotated: -30% of MTTFrequire = +60% of Fmax]
How to Best Recapture Margin in PD?
• Fmax increases with Irms_limit
• What is the best lever to trade area for Fmax?
  • Non-default (spacing) rules (NDRs)
  • Reduce fanouts
  • Downsize drivers

∆Fmax / ∆area
Design   NDRs   Reduce fanouts   Downsize drivers
AES      5.06   1.54             1.23
JPEG     0.00   0.00             0.00
DMA      1.75   0.17             0.00

Expt. setup: -30% of MTTFrequire (10 years → 7 years)
[Figure: Fmax and area vs. Irms_limit]
Margin, DfX and Signoff: Mindsets
• Be Reactive
• Be Adaptive
• Be Proactive
• Be Predictive:
?
Futures for Prediction (just 3)
• Discover and feed the “available information” about future systems and technology: Pathfinding
• Build accurate, predictive, high-dimensional models: Machine Learning, Model-Building
• Change the envelope of design (= what has to be predicted): Optimization
Pathfinding
• System-Technology co-exploration, co-optimization
  • E.g., for 3D IC
See: ABK Semicon West TExpoT, July 2009; Dr. Riko Radojcic, EDPS April 2012
Models = Predictors: Better Modeling Science
• Techniques
  • Regression
  • Non-parametric regression
  • Latin Hypercube Sampling (LHS)
  • Adaptive Sampling
• Applications
  • ORION NoC Power and Area Modeling
  • High-Dimensional Modeling: Cell Delay with voltage noise and body biasing (ongoing)
  • Across-field systematic variation (STMicro 28nm) for DoseMap correction
  • CACTI Off-Chip Memory Model
Example Benefit From Improved Sampling
• Sampling strategies affect modeling overheads and accuracies
• Adaptive Sampling reduces worst-case estimation error by 10% to 15% compared to Latin Hypercube Sampling
• Results are consistent for different non-parametric modeling approaches
[Figure: % worst-case (WC) estimation errors, 0% to 25%, for RBF, KG, and MARS models under LHS and adaptive sampling; annotated -15%, -10%]
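For reference, the Latin Hypercube Sampling baseline can be sketched in a few lines. This is a generic textbook construction on the unit hypercube, not the sampling code behind the results above; the function name and seed handling are assumptions, and the adaptive sampling strategy is not shown.

```python
import random


def latin_hypercube(n_samples: int, n_dims: int, seed: int = 0):
    """Return n_samples points in [0, 1)^n_dims such that, in every
    dimension, each of the n_samples equal-width strata holds exactly one point."""
    rng = random.Random(seed)
    columns = []
    for _ in range(n_dims):
        # One uniformly random point inside each stratum [i/n, (i+1)/n).
        column = [(i + rng.random()) / n_samples for i in range(n_samples)]
        # Shuffle so strata are paired randomly across dimensions.
        rng.shuffle(column)
        columns.append(column)
    return [tuple(column[i] for column in columns) for i in range(n_samples)]


if __name__ == "__main__":
    for point in latin_hypercube(n_samples=8, n_dims=3):
        print(tuple(round(x, 3) for x in point))
```

Each coordinate would then be mapped onto a modeling axis such as voltage, temperature, or process parameter before the corresponding simulations are run.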
Change What Is Possible: Optimizers
• Example: sizing optimization
  • Optimization of power, delay and area
  • Knobs: gate-width, gate-length, Vth
  • Fundamental to all phases of RTL-to-GDS flow
• Common heuristics/algorithms
  • Continuous methods: linear programming, convex optimization, Lagrangian relaxation
  • Discrete methods: dynamic programming, sensitivity-based sizing
  • Tradeoff: optimality vs. scalability
New Sensitivity-Guided Metaheuristics [ICCAD-2012]
• GTR: seek violation-free solutions w/ sensitivity = $\Delta TNS / (\Delta leakage)^{\alpha}$
• PRFT: iteratively reduce total leakage power w/ sensitivities (SF1, SF2, …) =
  • $\Delta leakage / \Delta delay$
  • $\Delta leakage \times slack$
  • $\Delta leakage / (\Delta delay \times \#path)$
  • $\Delta leakage \times slack / \#path$
  • $\Delta leakage \times slack / (\Delta delay \times \#path)$
• Multi-start w/ different parameters (α, γ) and “Go-with-the-winners”
• α: leakage exponent, γ: % of upsizing, SF: sensitivity functions, TNS: total negative slack
• Flow
  • Global Timing Recovery (GTR): coarse search, then fine-grain search (multistart), perturbing (upsizing) bottleneck cells
  • Power Reduction with Feasible Timing (PRFT): sensitivity-guided greedy sizing
[Figure: TNS & leakage during GTR & PRFT, with perturbing steps marked]
Leakage Reduction [ICCAD-2012]
• Leakage (normalized) comparison on ISPD 2012 benchmarks
  • Contest best: best of all entries in the competition (ISPD 2012 contest)
  • Intel Labs (contest organizer) released five (near-optimal) results
  • ISPD 2012 contest: http://archive.sigda.org/ispd/contests/12/ispd2012_contest.html
• UCSD-UM Trident sizer: best-reported results on all but one of ISPD 2012: 43% further reduction over contest winner
• Outperforms Intel results on four out of five available cases
• [UCSD-UCLA open source sizing page available (contact me)]
[Figure: normalized leakage for Intel Labs, GTR+PRFT, and contest best]
Conclusions
• At today’s leading edge, 20% power, 20% speed and 20% area are a “good” return on $billions in technology development cost
• Better management of DfX and Signoff can return entire technology nodes of value … and Margin is the key
• We will need to be:
  • Reactive: manage impacts of new variation sources
  • Adaptive: mitigate the consequences of pessimism
  • Proactive: exploit application and design knowledge whenever possible
  • Predictive: pathfinding of future systems and technology, science of modeling, optimization to change the envelope
• Many Challenges, Opportunities to deliver high value!
Acknowledgments
• Former students: Igor Markov, Puneet Gupta, Kwangok Jeong, Chul-Hong Park
• Current students: Tuck-Boon Chan, Seokhyeong Kang, Siddhartha Nath, Vaishnav Srinivas, Jonas Chan, Jiajia Li
• Current/recent visitors: Bong-Il Park (Samsung), Thierno Diallo (ST)
• Work supported by SRC, NSF, MARCO, IMPACT Center (UC Discovery), companies
THANK YOU
BACKUP
Another Leakage Optimization: DoseMap [DAC08]
• ASML DoseMapper technology
  • Adjusts exposure dose to improve CD uniformity
  • Compensates for CD error induced by Across-Chip, Across-Wafer Linewidth Variation
• Dose sensitivity
  • Linewidth has an approximately linear relationship with the exposure dose
  • Dose sensitivity (DS): -2nm/%
• Idea: design-aware modulation of the exposure dose!
  • Increase dose → decrease gate CD of timing-critical device → more speed
  • Decrease dose → increase gate CD of non-timing-critical device → less leakage
• Our results: +8% frequency with zero leakage penalty
  • 28nm silicon experiments with ST-Crolles
[Figure: slit profile, dose map, scan direction; adjust exposure dose]
Superpose Leakage Reduction, CDU [DAC08]
• Trad. DoseMap: same CDs
  • Improve global CD uniformity (CDU) → achieve same gate CD for all devices
  • Does not address device yield improvement
  • No design awareness
• Our DoseMap: different CDs
  • Device on setup-timing critical path → larger dose → faster-switching transistors
  • Non-critical (or, hold-timing critical) device → smaller dose → less leaky transistors
  • Improve timing yield with no leakage penalty: “for free”
References (1)
• [Burd00] T. D. Burd, T. A. Pering, A. J. Stratakos and R. W. Brodersen, “A Dynamic Voltage Scaled Microprocessor System”, IEEE JSSC 35(11) (2000), pp. 1571-1580.
• [Chan12] T.-B. Chan, P. Gupta, A. B. Kahng and L. Lai, “DDRO: A Novel Performance Monitoring Methodology Based on Design-Dependent Ring Oscillators”, Proc. ISQED, 2012, pp. 633-640.
• [Das06] S. Das, D. Roberts, S. Lee, S. Pant, D. Blaauw, T. Austin, K. Flautner and T. Mudge, “A Self-Tuning DVS Processor using Delay-Error Detection and Correction”, IEEE JSSC, 2006, pp. 792-804.
• [Drake08] A. Drake, R. Senger, H. Singh, G. Carpenter and N. James, “Dynamic Measurement of Critical-Path Timing”, Proc. IEEE International Conference on Integrated Circuit Design and Technology and Tutorial, 2008, pp. 249-252.
• [Elgebaly07] M. Elgebaly and M. Sachdev, “Variation-Aware Adaptive Voltage Scaling System”, IEEE Transactions on Very Large Scale Integration Systems 15(5) (2007), pp. 560-571.
• [Fick10] D. Fick, N. Liu, Z. Foo, M. Fojtik, J.-S. Seo, D. Sylvester and D. Blaauw, “In Situ Delay-Slack Monitor for High-Performance Processors Using An All-Digital Self-Calibrating 5ps Resolution Time-to-Digital Converter”, Proc. International Solid State Circuits Conference, 2010, pp. 188-189.
• [Hartman06] Hartman, “PowerWise Adaptive Voltage Scaling Minimizes Energy Consumption”, http://www.ti.com/lit/wp/snvy006/snvy006.pdf
• [Martin02] S. M. Martin, K. Flautner, T. Mudge and D. Blaauw, “Combined dynamic voltage scaling and adaptive body biasing for lower power microprocessors under dynamic workloads”, Proc. IEEE/ACM ICCAD, 2002, pp. 721-725.
• [Tschanz03] J. W. Tschanz, S. Narendra, R. Nair and V. De, “Effectiveness of Adaptive Supply Voltage and Body Bias for Reducing Impact of Parameter Variations in Low Power and High Performance Microprocessors”, IEEE JSSC 38(5) (2003), pp. 826-829.
• [Tschanz10] J. Tschanz, K. Bowman, S.-L. Lu, P. Aseron, M. Khellah, A. Raychowdhury, B. Geuskens, C. Tokunaga, C. Wilkerson, T. Karnik and V. De, “A 45nm resilient and adaptive microprocessor core for dynamic variation tolerance”, IEEE ISSCC, 2010, pp. 282-283.
• [Wang09] W. Wang, S. Yang and Y. Cao, “Node Criticality Computation for Circuit Timing Analysis and Optimization under NBTI effect”, Proc. ISQED, 2009, pp. 763-768.
• [Wu08] S. H. Wu, A. Tetalbaum and L.-C. Wang, “How Does Inverse Temperature Dependence Affect Timing Sign-Off”, Proc. IEEE Intl. Conf. on Integrated Circuit Design and Technology Tutorial, 2008, pp. 297-300.
References (2)
• [JeongKS08] K. Jeong, A. B. Kahng and K. Samadi, “Quantified Impacts of Guardband Reduction on Design Process Outcomes”, Proc. International Symposium on Quality Electronic Design, 2008, pp. 790-897.
• [Synop] http://www.synopsys.com.cn/information/snug/2011/signal-electro-migration-analysis-and-fixing-research-in-ic-compiler-2
• [ITRS] http://www.itrs.net/
Lifetime with 1000-Hour HTOL Test
• Is qualifying for HTOL (high temperature operating life) test a stronger constraint on signoff than reducing MTTF?
  • We assume HTOL temperature is at least 40˚C more than nominal EM signoff temperature
• HTOL qualification can be a stronger constraint on signoff than reducing MTTF to 3 years or below
  • At nominal temperature of 75˚C, HTOL qualification at, e.g., 115˚C is equivalent to design for EM MTTF of 3 years
  • At nominal temperature of 105˚C, HTOL qualification at, e.g., 145˚C is equivalent to design for EM MTTF of 2 years
[Figure: equivalent MTTF (years), 0 to 45, vs. HTS temp (˚C), 115 to 235, for nominal temperatures 75, 105, and 125; marked points 75+40, 105+40, 125+40, 105+60, 125+60]