UCSD VLSI CAD Laboratory and UIUC PASSAT Group - ASPDAC, Jan. 21, 2010
Slack Redistribution for Graceful Degradation Under Voltage Overscaling
Andrew B. Kahng†, Seokhyeong Kang†, Rakesh Kumar‡ and John Sartori‡
†VLSI CAD Laboratory, UCSD; ‡PASSAT Group, UIUC
Outline
• Background and motivation
  • Voltage scaling and BTWC designs
  • Limitation of Traditional CAD Flow
• Power-Aware Slack Redistribution
  • Our design optimization goal
  • Related work: BlueShift
  • Our Heuristic
• Experimental Framework and Results
  • Design methodology
  • Testbed
  • Results and analysis
• Conclusions and Ongoing Work
Reducing Power with Voltage Scaling
• Power is a first-order design constraint
• Voltage scaling can significantly reduce power
• Voltage scaling may result in timing violations
• Voltage scaling is limited because of timing errors
[Figure: power versus voltage (toward lower voltage), marking the point where timing errors begin to occur]
CPU, Heal Thyself...
Better-Than-Worst-Case Design
• Better-than-worst-case (BTWC) design approach
  • Optimize for normal operating conditions
  • Trade off reliability and power/performance
  • Have error detection/correction mechanism (e.g., Razor*)
• Traditional IC design
  • Does not allow timing errors in STA
  • Fixed target frequency and operating voltage
• BTWC design
  • Error correction architecture allows timing errors
  • Overclocking or voltage overscaling
• BTWC design allows tradeoffs between reliability and power
* Ernst et al. “Razor: A low power pipeline based on circuit-level timing speculation”, Proc. MICRO 2003.
Voltage Scaling with Error Correction
• Error correction incurs power overhead
• Overscaling is possible for Better-Than-Worst-Case designs
[Figure: two plots against voltage v (toward lower voltage) with points A and B marked. PE(v): error rate at voltage v. pwr(v): power consumption at v. Minimum power at point B.]
Limitations of Traditional CAD Flow
• Conventional designs exhibit critical operating points (COP)
• Many paths have near-critical slack → wall of (critical) slack
• Scaling beyond the COP causes massive errors that cannot be corrected
• Conventional designs fail critically when voltage is scaled down
• The error rate should instead increase gracefully: gradual slope slack
[Figure: number of paths versus timing slack (zero slack marked), contrasting a ‘wall’ of slack with gradual slope slack; error rate versus lower voltage / higher frequency, with the COP marked]
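The contrast between a ‘wall’ of slack and a gradual slope can be illustrated with a minimal sketch. The two slack lists and the helper `failing_fraction` below are hypothetical and not taken from the slides; the sketch only counts failing paths and ignores how often each path is exercised.

```python
def failing_fraction(slacks_ps, lost_slack_ps):
    """Fraction of paths whose slack turns negative after losing lost_slack_ps."""
    failing = sum(1 for s in slacks_ps if s - lost_slack_ps < 0)
    return failing / len(slacks_ps)


# Hypothetical slack distributions in picoseconds (illustrative values only).
wall = [10] * 8 + [200, 400]                                 # most paths near-critical
gradual = [10, 50, 100, 150, 200, 250, 300, 350, 400, 450]   # spread-out slack

# Each row: slack lost to voltage scaling, failing fraction for each distribution.
for lost in (0, 20, 120):
    print(lost, failing_fraction(wall, lost), failing_fraction(gradual, lost))
```

With these made-up numbers, losing only 20 ps of slack makes 80% of the ‘wall’ paths fail but only 10% of the gradual-slope paths, which is the behavior the slide describes as failing critically versus degrading gracefully.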
Our Design Optimization Goal
• Problem: Minimize power for a given error rate
• Goal: Achieve a ‘gradual slope’ slack distribution
• Approach:
  • Frequently-exercised paths: upsize cells
  • Rarely-exercised paths: downsize cells
• We make a gradual slope slack distribution
[Figure: number of paths versus timing slack, with zero slack after voltage scaling marked; the ‘wall’ of slack is contrasted with the ‘gradual slope’ slack. Annotations: rarely exercised paths; frequently exercised paths with gradual failure characteristic]
Related Work: BlueShift
• BlueShift*: maximize frequency for a given error rate
• BlueShift speed up
  • Paths with the highest frequency of timing errors
  • FBB (forward body-biasing) & Timing override
• Limitation
  • Repetitive gate level simulation – impractical
  • Design overhead of FBB
• BlueShift is impractical with modern SOC designs
[Flowchart: Gate-level simulation → Compute error rate → ER < Target? NO: Speed up paths and repeat; YES: Finish]
* Greskamp et al. “Blueshift: Designing processors for timing speculation from the ground up”, HPCA 2009.
Our Heuristic
[Flowchart: Set initial voltage → Error rate estimation → decision “ER < ERtarget” (YES / NO branches) with Voltage scaling and Optimize Paths steps → PowerReduction → Finish]
• Optimize slack distribution by cell swaps, exploiting switching activity information
• Iteratively scale the target voltage until the error rate exceeds a target, and optimize negative slack paths
• Our heuristic: Voltage scaling → Optimize paths → Power reduction
Heuristic Implementation – Voltage Scaling
[Figure, two panels showing paths A, B and C between the nominal voltage and the target voltage. Fixed target voltage: the negative slack of Path A at the target voltage, the actual voltage at the target error rate, and the resulting unnecessary cell sizing. Find target voltage and optimize iteratively: the negative slack of Path A at the target voltage with the estimated error rate, and the new target voltage.]
• Optimize with fixed target voltage
• Lower voltage incrementally
• Load a pre-characterized library at each voltage point
• With iterative voltage scaling, we can find the minimum operating voltage
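The incremental voltage search can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the `libraries` dictionary stands in for the pre-characterized library at each voltage point, all delays and toggle rates are invented, `estimated_error_rate` is a simplified stand-in for the error-rate estimate, and the path optimization step between voltage points is omitted.

```python
def estimated_error_rate(path_delays_ps, toggle_rate, clock_ps):
    """Toy estimate: sum the toggle rates of paths that miss the clock period."""
    return sum(toggle_rate[p] for p, d in path_delays_ps.items() if d > clock_ps)


def find_min_voltage(libraries, toggle_rate, clock_ps, er_target):
    """Step down through the voltage points; return the lowest one that still
    meets er_target, or None if even the highest voltage does not."""
    lowest_ok = None
    for v_mv in sorted(libraries, reverse=True):
        delays = libraries[v_mv]  # "load" the library for this voltage point
        if estimated_error_rate(delays, toggle_rate, clock_ps) > er_target:
            break
        lowest_ok = v_mv
    return lowest_ok


# Hypothetical path delays (ps) per voltage point (mV); values are illustrative.
libraries = {
    1000: {"A": 800, "B": 700, "C": 600},
    950: {"A": 900, "B": 780, "C": 660},
    900: {"A": 1020, "B": 880, "C": 740},
    850: {"A": 1150, "B": 1010, "C": 850},
}
toggle_rate = {"A": 0.01, "B": 0.05, "C": 0.20}

print(find_min_voltage(libraries, toggle_rate, clock_ps=1000, er_target=0.02))
```

With these made-up numbers the search stops at 900 mV: path A already fails there, but it is rarely exercised, so the estimate stays under the 2% target until path B also fails at 850 mV.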
Heuristic Implementation – Optimize Paths
• Main idea: increase slack of frequently-exercised paths in order of decreasing switching activity
• Procedure
  1. Pick a critical path p with maximum switching activity
  2. Resize cell instance ci in p
  3. If slack of path p is not improved, cell change is restored
  4. Repeat 2–3 for all cell instances in path p
  5. Repeat 2–4 for all critical paths
• OptimizePaths procedure reduces error rates and enables further voltage scaling
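The OptimizePaths procedure can be sketched on a toy model. Everything here is an assumption made for illustration: paths are plain dictionaries, each cell has a hypothetical per-size delay table in picoseconds, "resize" means trying the next larger size, and path slack is the clock period minus the sum of cell delays. A real flow would query a timing engine instead.

```python
def path_slack(path, sizes, delay_table, clock_ps):
    """Slack of a path: clock period minus the sum of its cell delays."""
    return clock_ps - sum(delay_table[c][sizes[c]] for c in path["cells"])


def optimize_paths(paths, sizes, delay_table, clock_ps):
    """Try one upsizing step per cell on each critical path, most active path first."""
    critical = [p for p in paths if path_slack(p, sizes, delay_table, clock_ps) < 0]
    for p in sorted(critical, key=lambda q: q["toggle"], reverse=True):
        for c in p["cells"]:
            if sizes[c] + 1 >= len(delay_table[c]):
                continue  # no larger size available for this cell
            before = path_slack(p, sizes, delay_table, clock_ps)
            sizes[c] += 1  # resize the cell instance
            if path_slack(p, sizes, delay_table, clock_ps) <= before:
                sizes[c] -= 1  # slack not improved: restore the change
    return sizes


# Hypothetical data: delay_table[cell][size index] in ps. The larger u2 is
# slower here, standing in for a swap that does not help the path.
delay_table = {"u1": [500, 400], "u2": [600, 650], "u3": [300, 200]}
paths = [
    {"cells": ["u1", "u2"], "toggle": 0.30},  # frequently exercised, critical
    {"cells": ["u3"], "toggle": 0.01},        # rarely exercised, not critical
]
sizes = optimize_paths(paths, {"u1": 0, "u2": 0, "u3": 0}, delay_table, clock_ps=1000)
print(sizes)
```

In this example only `u1` is upsized: the swap on `u2` is restored because it does not improve the slack of the path, and `u3` is never touched because its path is not critical.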
Heuristic Implementation – Power Reduction
• Main idea: Downsize cells on rarely-exercised paths in order of increasing toggle rate
• Procedure
  1. Pick a cell c with minimum toggle rate
  2. Downsize cell c with logically equivalent cell
  3. Incremental timing analysis and check error rate
  4. If error rate is increased, cell change is restored
  5. Repeat 1–4
• PowerReduction procedure reduces power without affecting error rate
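A minimal sketch of the PowerReduction loop, under the same kind of toy assumptions: hypothetical per-size delay tables in picoseconds, a single downsizing step per cell, and a simplified error-rate estimate that sums the toggle rates of paths missing the clock period. None of these names or numbers come from the slides.

```python
def estimated_error_rate(paths, sizes, delay_table, clock_ps):
    """Toy estimate: sum the toggle rates of paths that miss the clock period."""
    return sum(
        p["toggle"]
        for p in paths
        if sum(delay_table[c][sizes[c]] for c in p["cells"]) > clock_ps
    )


def power_reduction(cell_toggle, paths, sizes, delay_table, clock_ps):
    """Downsize cells one step each, lowest toggle rate first, keeping a change
    only if the estimated error rate does not increase."""
    for c in sorted(cell_toggle, key=cell_toggle.get):
        if sizes[c] == 0:
            continue  # already the smallest size
        before = estimated_error_rate(paths, sizes, delay_table, clock_ps)
        sizes[c] -= 1  # downsize to the next smaller equivalent cell
        if estimated_error_rate(paths, sizes, delay_table, clock_ps) > before:
            sizes[c] += 1  # error rate increased: restore the change
    return sizes


# Hypothetical data: delay_table[cell][size index] in ps, index 0 is smallest.
delay_table = {"u1": [500, 400], "u2": [600, 550], "u3": [300, 200]}
paths = [
    {"cells": ["u1", "u2"], "toggle": 0.30},
    {"cells": ["u3"], "toggle": 0.01},
]
cell_toggle = {"u1": 0.30, "u2": 0.30, "u3": 0.01}
sizes = power_reduction(cell_toggle, paths, {"u1": 1, "u2": 1, "u3": 1},
                        delay_table, clock_ps=1000)
print(sizes)
```

Here `u3` (lowest toggle rate) is downsized first; downsizing `u1` is restored because it pushes the active path past the clock period, while `u2` can still be downsized because the path then exactly meets timing.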
Heuristic Implementation – Error Rate Estimation
• Error rate estimation: use toggle rate from SAIF (Switching Activity Interchange Format)
• We estimate error rates without functional simulation
• Error rate contribution of one flip-flop (P_NEG: negative-slack paths, P_ALL: all paths):
  $ER_{ff} = TG_{ff} \cdot \dfrac{\sum_{p \in P_{NEG}} TG_p}{\sum_{p \in P_{ALL}} TG_p}$
• Error rate of an entire design D:
  $ER_D = \alpha \sum_{ff \in D} ER_{ff}$, where α is a compensation parameter
• Example: flip-flop X with TG(X) = 0.6 and paths P1, P2, P3, where TG(P1) = 0.3, TG(P2) = 0.2, TG(P3) = 0.1 and Slack(P1) = positive, Slack(P2) = negative, Slack(P3) = positive:
  $ER(X) = TG(X) \cdot \dfrac{TG(P2)}{TG(P1) + TG(P2) + TG(P3)} = 0.2$
[Figure: paths P1, P2 and P3 between flip-flops (f/f) clocked by CLK, with the timing error caused by P2 marked at X]
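The toggle-rate based estimate can be checked numerically with the flip-flop example from this slide (TG(X) = 0.6, path toggle rates 0.3, 0.2 and 0.1, only P2 with negative slack). The function names, the second flip-flop's rate and the α value used below are hypothetical; the sketch scales the flip-flop's toggle rate by the share of path toggles that come from negative-slack paths, then sums over flip-flops with the compensation parameter α.

```python
def flip_flop_error_rate(tg_ff, path_toggle, negative_slack_paths):
    """ER_ff = TG_ff * (sum of TG over negative-slack paths) / (sum of TG over all paths)."""
    total = sum(path_toggle.values())
    if total == 0:
        return 0.0  # a flip-flop whose paths never toggle contributes no errors
    negative = sum(path_toggle[p] for p in negative_slack_paths)
    return tg_ff * negative / total


def design_error_rate(flip_flop_rates, alpha):
    """ER_D = alpha * (sum of ER_ff over all flip-flops in the design)."""
    return alpha * sum(flip_flop_rates)


# Values from the slide's example for flip-flop X.
er_x = flip_flop_error_rate(0.6, {"P1": 0.3, "P2": 0.2, "P3": 0.1}, ["P2"])
print(round(er_x, 6))  # approximately 0.2, as on the slide

# Hypothetical second flip-flop and alpha, only to show the design-level sum.
print(round(design_error_rate([er_x, 0.05], alpha=1.0), 6))
```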
Power Reduction Through Slack Redistribution
• Power consumption @BTWC
  • Minimum power Pmin is obtained at minimum operating voltage Vmin
1. OptimizePaths
  • Minimize error rate
  • Enable further voltage scaling
2. ReducePower
  • Downsize cells
  • Obtain additional power reduction
[Figure: three plots of error rate and power consumption versus voltage (toward lower voltage), each marking the maximum error rate, the operating point, Vmin and Pmin: the starting design, the effect of step 1 (OptimizePaths), and the effect of step 2 (ReducePower)]
Design Methodology
• Benchmark generation
  • Virtutech Simics – Full system simulation and capture test vectors
• Functional simulation
  • Cadence NC Verilog – Gate level simulation
• Library characterization
  • Cadence SignalStorm – Synopsys Liberty generation for each voltage
• Heuristic (Slack Optimization)
  • Implement in C++ and use Tcl socket interface with Synopsys PrimeTime
• ECO P&R
  • Cadence SOCEncounter – ECO implementation
Testbed
• Target design: sub-modules of OpenSPARC T1
• Benchmark
  • Ammp, bzip2, equake, sort and twolf
  • Make test vectors with 1 billion cycles for each sub-module
• Implementation
  • TSMC 65GP technology with standard SP&R flow
List of Experiments
• Design techniques
  1. SP&R with 0.8 GHz (loose constraints)
  2. SP&R with 1.2 GHz (tight constraints)
  3. BlueShift: timing override
  4. Slack Optimizer
• Experiments compare all design techniques with respect to:
  1. Power consumption at each voltage point
  2. Actual error rates from gate level simulation
  3. Power consumption at each target error rate
  4. Estimated processor-wide power consumption
Error Rate and Power Results
• Error rate at each operating voltage (test case: lsu_dctl)
• Power consumption at each operating voltage
Comparison of Power and Slack Results
• Power consumption at each target error rate
• Slack distribution
Power Reduction and Area Overhead
• Power reduction after optimization (@ 2% error rate)
  • Max. 32.8%, Avg. 12.5% power reduction
• Area overhead of design approaches
Processor-wide Results*
• Slack optimization extends range of voltage scaling and reduces Razor recovery cost
* Kahng et al. “Designing a Processor From the Ground Up to Allow Voltage/Reliability Tradeoffs”, HPCA 2010.
Conclusions and Ongoing Work
• Showed limitations of a BTWC design
• Presented design technique – slack redistribution
  • Optimize frequently exercised critical paths
  • De-optimize rarely-exercised paths
• Demonstrated significant power benefits of gradual slack design
  • Reduced power by up to 33%, and by 12.5% on average
• Ongoing work
  • Reliability-power tradeoffs for embedded memory
  • Applying to heterogeneous multi-core architecture
THANK YOU
BACKUP
CPU, Heal Thyself
• Razor* system
  • Timing errors can be corrected
  • Manage the trade-off between system voltage and error rate
  • New design methodology is needed
* Ernst et al. “Razor: A low power pipeline based on circuit-level timing speculation”, Proc. MICRO 2003.
Razor – How it works
• Razor Implementation
  • Main flip-flop latches at T, but shadow latch latches at T+skew
  • If a timing violation occurs, main flip-flop will latch incorrect value, but shadow latch should latch correct value
  • Comparator signals error and the late arriving value is fed back into the main flip-flop
* Ernst et al. “Razor: A low power pipeline based on circuit-level timing speculation”, Proc. MICRO 2003.
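The main flip-flop / shadow latch behavior described here can be modeled in a few lines. This is a behavioral toy under stated assumptions, not a description of the actual Razor circuit: the function name, the picosecond units and the idea that a late signal simply leaves the old value in place are all illustrative.

```python
def razor_capture(arrival_ps, clock_ps, skew_ps, new_value, old_value):
    """Return (value held by the main flip-flop after correction, error flag)."""
    # Main flip-flop samples at T: a late signal leaves the old value in place.
    main = new_value if arrival_ps <= clock_ps else old_value
    # Shadow latch samples at T + skew, so it tolerates a later arrival.
    shadow = new_value if arrival_ps <= clock_ps + skew_ps else old_value
    # Comparator: a mismatch means the main flip-flop missed the value.
    error = main != shadow
    if error:
        main = shadow  # feed the late-arriving value back into the main flip-flop
    return main, error


print(razor_capture(900, 1000, 200, new_value=1, old_value=0))   # on time
print(razor_capture(1100, 1000, 200, new_value=1, old_value=0))  # late, within skew
```

In this toy model a signal arriving later than T+skew is missed by both elements and goes undetected, which is why the skew window bounds how far voltage can be overscaled.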
BTWC: Voltage Scaling
• Error correction needs additional clock cycles and incurs power overhead
• Overclocking case
  [Figure: PE(f), the error rate at frequency f, and perf(f), the performance at f, plotted against frequency f with points a, b, c at fa, fb, fc. Maximum performance at point c.]
• Voltage scaling case
  [Figure: PE(v), the error rate at voltage v, and pwr(v), the power consumption at v, plotted against voltage v (toward lower voltage) with points a, b, c at va, vb, vc. Minimum power at point c.]
Limitation of Voltage Scaling
• At some voltage, circuit breaks down
• Voltage scaling must halt after only 10% scaling.
[Figure: errors per cycle (0.0 to 0.5) versus voltage (1.0 down to 0.5)]
Reason for Steep Error Degradation
• Critical paths are bunched up in traditional designs.
Slack Re-distribution Example
[Figure: two slack distributions split into negative and positive slack (axis marks at 0.0 and -0.1), labeled Error Rate = 1% and Error Rate = 25%]
Heuristic Implementation – Error Rate Estimation
• Error rate contribution of one flip-flop:
  $ER_{ff} = TG_{ff} \cdot \dfrac{\sum_{p \in P_{NEG}} TG_p}{\sum_{p \in P_{ALL}} TG_p}$ (1)
• Error rate of an entire design D:
  $ER_D = \alpha \sum_{ff \in D} ER_{ff}$ (2), where α is a compensation parameter
• Actual vs. estimated error rates
Gradual Slack Distribution
Slack optimization achieves gradual slack distribution.
Processor Error Rate and Power
Designs with comparable error rates have much higher power/area overheads.
Reliability/Power Tradeoff
Slack-optimized design enjoys continued power reduction as error rate increases.
Enhancing Razor-based Design
Slack optimization extends range of voltage scaling and reduces Razor recovery cost.
Moore’s Law
• Power consumption of processor node doubles every 18 months.
Power Scaling
• With current design techniques, processor power will soon be on par with a nuclear power plant