design-dependent ring oscillatorarm cortex m3 ddro . acknowledgments •thanks to professor dennis...
TRANSCRIPT
![Page 1: Design-Dependent Ring OscillatorARM CORTEX M3 DDRO . Acknowledgments •Thanks to Professor Dennis Sylvester, Matt Fojtik, David Fick, and Daeyeon Kim from University of Michigan 22](https://reader034.vdocument.in/reader034/viewer/2022042218/5ec454ed480d6057bc1c521a/html5/thumbnails/1.jpg)
DDRO: A Novel Performance Monitoring Methodology Based on Design-Dependent Ring Oscillators
Tuck-Boon Chan†, Puneet Gupta§, Andrew B. Kahng†‡ and Liangzhen Lai§
UC San Diego ECE† and CSE‡ Departments, La Jolla, CA 92093
UC Los Angeles EE§ Department, Los Angeles, CA 90095
1
![Page 2: Design-Dependent Ring OscillatorARM CORTEX M3 DDRO . Acknowledgments •Thanks to Professor Dennis Sylvester, Matt Fojtik, David Fick, and Daeyeon Kim from University of Michigan 22](https://reader034.vdocument.in/reader034/viewer/2022042218/5ec454ed480d6057bc1c521a/html5/thumbnails/2.jpg)
Outline
• Performance Monitoring: An Introduction
• DDRO Implementation
• Delay Estimation from Measured DDRO Delays
• Experiment Results
• Conclusions
2
![Page 3: Design-Dependent Ring OscillatorARM CORTEX M3 DDRO . Acknowledgments •Thanks to Professor Dennis Sylvester, Matt Fojtik, David Fick, and Daeyeon Kim from University of Michigan 22](https://reader034.vdocument.in/reader034/viewer/2022042218/5ec454ed480d6057bc1c521a/html5/thumbnails/3.jpg)
Performance Monitoring
• Process corner identification
– Adaptive voltage scaling, adaptive body-bias
• Runtime adaptation
– DVFS
• Manufacturing process tuning
– Wafer and test pruning [Chan10]
3
![Page 4: Design-Dependent Ring OscillatorARM CORTEX M3 DDRO . Acknowledgments •Thanks to Professor Dennis Sylvester, Matt Fojtik, David Fick, and Daeyeon Kim from University of Michigan 22](https://reader034.vdocument.in/reader034/viewer/2022042218/5ec454ed480d6057bc1c521a/html5/thumbnails/4.jpg)
Monitor Taxonomy
• In-situ monitors:
– In-situ time-to-digital converter (TDC) [Fick10]
– In-situ path RO [Ngo10, Wang08]
• Replica monitors:
– One monitor: representative path [Liu10]
– Many monitors: PSRO [Bhushan06]
4
• How many monitors?
• How to design monitors?
• How to use monitors?
∆t = ?? ∆t = ??
![Page 5: Design-Dependent Ring OscillatorARM CORTEX M3 DDRO . Acknowledgments •Thanks to Professor Dennis Sylvester, Matt Fojtik, David Fick, and Daeyeon Kim from University of Michigan 22](https://reader034.vdocument.in/reader034/viewer/2022042218/5ec454ed480d6057bc1c521a/html5/thumbnails/5.jpg)
Key Observation: Sensitivities Cluster!
• The sensitivities form natural clusters – Design dependent
– Multiple monitors • One monitor per cluster
5
• Each dot represents ∆delay of a critical path under variations
![Page 6: Design-Dependent Ring OscillatorARM CORTEX M3 DDRO . Acknowledgments •Thanks to Professor Dennis Sylvester, Matt Fojtik, David Fick, and Daeyeon Kim from University of Michigan 22](https://reader034.vdocument.in/reader034/viewer/2022042218/5ec454ed480d6057bc1c521a/html5/thumbnails/6.jpg)
DDRO Contributions
• Systematic methodology to design multiple DDROs based on clustering
• Systematic methodology to leverage monitors to estimate chip delay
6
![Page 7: Design-Dependent Ring OscillatorARM CORTEX M3 DDRO . Acknowledgments •Thanks to Professor Dennis Sylvester, Matt Fojtik, David Fick, and Daeyeon Kim from University of Michigan 22](https://reader034.vdocument.in/reader034/viewer/2022042218/5ec454ed480d6057bc1c521a/html5/thumbnails/7.jpg)
Outline
• Performance Monitoring: An Introduction
• DDRO Implementation
– Delay model
– Sensitivity Clustering
– DDRO Synthesis
• Delay Estimation from Measured DDRO Delays
• Experiment Results
• Conclusions
7
![Page 8: Design-Dependent Ring OscillatorARM CORTEX M3 DDRO . Acknowledgments •Thanks to Professor Dennis Sylvester, Matt Fojtik, David Fick, and Daeyeon Kim from University of Michigan 22](https://reader034.vdocument.in/reader034/viewer/2022042218/5ec454ed480d6057bc1c521a/html5/thumbnails/8.jpg)
Delay Model and Model Verification
• Linear model correlates well with SPICE results
8
• Assume a linear delay model for variations
(1 )nom j jd d V G
Real delay
Nominal delay
Sensitivities Variation magnitude
Variation source index
![Page 9: Design-Dependent Ring OscillatorARM CORTEX M3 DDRO . Acknowledgments •Thanks to Professor Dennis Sylvester, Matt Fojtik, David Fick, and Daeyeon Kim from University of Michigan 22](https://reader034.vdocument.in/reader034/viewer/2022042218/5ec454ed480d6057bc1c521a/html5/thumbnails/9.jpg)
Sensitivities and Clustering
– Use kmeans++ algorithm
– Choose best k-way clustering solution in 100 random starts
– Each cluster centroid = target sensitivity for a DDRO
• Synthesize DDROs to meet target sensitivities 9
• Cluster the critical paths based on sensitivities
• Extract delay sensitivity based on finite difference method
1jG nom
j
nom
d dV
d
![Page 10: Design-Dependent Ring OscillatorARM CORTEX M3 DDRO . Acknowledgments •Thanks to Professor Dennis Sylvester, Matt Fojtik, David Fick, and Daeyeon Kim from University of Michigan 22](https://reader034.vdocument.in/reader034/viewer/2022042218/5ec454ed480d6057bc1c521a/html5/thumbnails/10.jpg)
DDRO Synthesis • Gate module is the basic building block of DDRO
– Consists of standard cells from qualified library
• Multiple cells are concatenated in a gate module – Inner cells are less sensitive to input slews and output
load variation
– Delay sensitivity is independent of other modules
10
![Page 11: Design-Dependent Ring OscillatorARM CORTEX M3 DDRO . Acknowledgments •Thanks to Professor Dennis Sylvester, Matt Fojtik, David Fick, and Daeyeon Kim from University of Michigan 22](https://reader034.vdocument.in/reader034/viewer/2022042218/5ec454ed480d6057bc1c521a/html5/thumbnails/11.jpg)
ILP formulation
• Module sensitivity is independent of its location
• Module number can only be integers
• Formulate the synthesis problem as integer linear programming (ILP) problem
11
Minimize: RO
sensitivity Target
sensitivity
Subject to: Delaymin < Module 1
delay < Delaymax
< Stagemax hs
RO sensitivity
Module h sensitivity ( hs )
( hs )
![Page 12: Design-Dependent Ring OscillatorARM CORTEX M3 DDRO . Acknowledgments •Thanks to Professor Dennis Sylvester, Matt Fojtik, David Fick, and Daeyeon Kim from University of Michigan 22](https://reader034.vdocument.in/reader034/viewer/2022042218/5ec454ed480d6057bc1c521a/html5/thumbnails/12.jpg)
Outline
• Performance Monitoring: An Introduction
• DDRO Implementation
• Delay Estimation from Measured DDRO Delays
– Sensitivity Decomposition
– Path Delay Estimation
– Cluster Delay Estimation
• Experiment Results
• Conclusions
12
![Page 13: Design-Dependent Ring OscillatorARM CORTEX M3 DDRO . Acknowledgments •Thanks to Professor Dennis Sylvester, Matt Fojtik, David Fick, and Daeyeon Kim from University of Michigan 22](https://reader034.vdocument.in/reader034/viewer/2022042218/5ec454ed480d6057bc1c521a/html5/thumbnails/13.jpg)
Sensitivity Decomposition
• Based on the cluster representing RO
• User linear decomposition to fully utilize all ROs
13
Sens(path)
Sens(RO1)
Sens(RO2)
= 0.9 x Sens(RO1) + 0.1 x Sens(RO2)
Path sensitivity
RO sensitivity ∑(bk x ) +
Sensitivity residue
![Page 14: Design-Dependent Ring OscillatorARM CORTEX M3 DDRO . Acknowledgments •Thanks to Professor Dennis Sylvester, Matt Fojtik, David Fick, and Daeyeon Kim from University of Michigan 22](https://reader034.vdocument.in/reader034/viewer/2022042218/5ec454ed480d6057bc1c521a/html5/thumbnails/14.jpg)
• Given DDRO delay, use the sensitivity decomposition
• Apply margin for estimation confidence
• One estimation per path
Path Delay Estimation
14
(1 )
:
ro nompath path k ki nom k inom
k
path
i i res
d dd d b u
d
where u l V G
Predicted path delay
Other variation components Sensitivity residue
Measured from RO Margin
![Page 15: Design-Dependent Ring OscillatorARM CORTEX M3 DDRO . Acknowledgments •Thanks to Professor Dennis Sylvester, Matt Fojtik, David Fick, and Daeyeon Kim from University of Michigan 22](https://reader034.vdocument.in/reader034/viewer/2022042218/5ec454ed480d6057bc1c521a/html5/thumbnails/15.jpg)
Cluster Delay Estimation
• For run-time delay estimation, may be impractical to make one prediction per path
• Reuse the clustering
– Assume a pseudo-path for each cluster
– Use statistical method to compute the nominal delay and delay sensitivity of the pseudo-path
– Estimate the pseudo-path delay
• One estimation per cluster 15
max{ , }X
cluster path
id d path i cluster X
![Page 16: Design-Dependent Ring OscillatorARM CORTEX M3 DDRO . Acknowledgments •Thanks to Professor Dennis Sylvester, Matt Fojtik, David Fick, and Daeyeon Kim from University of Michigan 22](https://reader034.vdocument.in/reader034/viewer/2022042218/5ec454ed480d6057bc1c521a/html5/thumbnails/16.jpg)
Outline
• Introduction
• Implementation
• Delay Estimation
• Experiment Results
• Conclusion
16
![Page 17: Design-Dependent Ring OscillatorARM CORTEX M3 DDRO . Acknowledgments •Thanks to Professor Dennis Sylvester, Matt Fojtik, David Fick, and Daeyeon Kim from University of Michigan 22](https://reader034.vdocument.in/reader034/viewer/2022042218/5ec454ed480d6057bc1c521a/html5/thumbnails/17.jpg)
Sensitivity Extraction
• All variability data from a commercial 45nm statistical SPICE model
17
0.960
0.970
0.980
0.990
1.000
1.010
No
rmal
ize
d d
ela
y
7stages Inverter chain RO delay
Hvt Svt
![Page 18: Design-Dependent Ring OscillatorARM CORTEX M3 DDRO . Acknowledgments •Thanks to Professor Dennis Sylvester, Matt Fojtik, David Fick, and Daeyeon Kim from University of Michigan 22](https://reader034.vdocument.in/reader034/viewer/2022042218/5ec454ed480d6057bc1c521a/html5/thumbnails/18.jpg)
Experiment Setup
• Use Monte-Carlo method to simulate critical path delays and DDRO delays
• Apply delay estimation methods with certain estimation confidence – 99% in all experiments
• Compare the amount of delay over-prediction
– Delay from DDRO estimation vs. Delay from critical paths
18
![Page 19: Design-Dependent Ring OscillatorARM CORTEX M3 DDRO . Acknowledgments •Thanks to Professor Dennis Sylvester, Matt Fojtik, David Fick, and Daeyeon Kim from University of Michigan 22](https://reader034.vdocument.in/reader034/viewer/2022042218/5ec454ed480d6057bc1c521a/html5/thumbnails/19.jpg)
Linear Model Results Global variation only
19
M0
AES
MIPS
• Overestimation reduces as the number of cluster and RO increases
• The two estimation methods perform similarly
3.5% 0.5%
![Page 20: Design-Dependent Ring OscillatorARM CORTEX M3 DDRO . Acknowledgments •Thanks to Professor Dennis Sylvester, Matt Fojtik, David Fick, and Daeyeon Kim from University of Michigan 22](https://reader034.vdocument.in/reader034/viewer/2022042218/5ec454ed480d6057bc1c521a/html5/thumbnails/20.jpg)
Linear Model Results Global and local variations
20
M0
AES
MIPS
• With local variation, the benefit of having more ROs saturates
• Local variation can only be captured by in-situ monitors
4% 3%
![Page 21: Design-Dependent Ring OscillatorARM CORTEX M3 DDRO . Acknowledgments •Thanks to Professor Dennis Sylvester, Matt Fojtik, David Fick, and Daeyeon Kim from University of Michigan 22](https://reader034.vdocument.in/reader034/viewer/2022042218/5ec454ed480d6057bc1c521a/html5/thumbnails/21.jpg)
Conclusion and Future Work
• A systematic method to design multiple DDROs based on clustering
• An efficient method to predict chip delay
• By using multiple DDROs, delay overestimation is reduced by up to 25% (from 4% to 3%)
– Still limited by local variations
• Test chip tapeout using 45nm technology
– With an ARM CORTEX M3 Processor
21
ARM CORTEX M3 DDRO
![Page 22: Design-Dependent Ring OscillatorARM CORTEX M3 DDRO . Acknowledgments •Thanks to Professor Dennis Sylvester, Matt Fojtik, David Fick, and Daeyeon Kim from University of Michigan 22](https://reader034.vdocument.in/reader034/viewer/2022042218/5ec454ed480d6057bc1c521a/html5/thumbnails/22.jpg)
Acknowledgments
• Thanks to Professor Dennis Sylvester, Matt Fojtik, David Fick, and Daeyeon Kim from University of Michigan
22
![Page 23: Design-Dependent Ring OscillatorARM CORTEX M3 DDRO . Acknowledgments •Thanks to Professor Dennis Sylvester, Matt Fojtik, David Fick, and Daeyeon Kim from University of Michigan 22](https://reader034.vdocument.in/reader034/viewer/2022042218/5ec454ed480d6057bc1c521a/html5/thumbnails/23.jpg)
Thank you!
23
![Page 24: Design-Dependent Ring OscillatorARM CORTEX M3 DDRO . Acknowledgments •Thanks to Professor Dennis Sylvester, Matt Fojtik, David Fick, and Daeyeon Kim from University of Michigan 22](https://reader034.vdocument.in/reader034/viewer/2022042218/5ec454ed480d6057bc1c521a/html5/thumbnails/24.jpg)
Test Chip
• Test chip tapeout using 45nm technology
– With an ARM CORTEX M3 Processor
24
ARM CORTEX M3 DDRO
![Page 25: Design-Dependent Ring OscillatorARM CORTEX M3 DDRO . Acknowledgments •Thanks to Professor Dennis Sylvester, Matt Fojtik, David Fick, and Daeyeon Kim from University of Michigan 22](https://reader034.vdocument.in/reader034/viewer/2022042218/5ec454ed480d6057bc1c521a/html5/thumbnails/25.jpg)
Gate-module
• The delay sensitivities for different input slew and output load
combinations.
• Use 5 stages
as trade-off
between
module area
and stability
25
![Page 26: Design-Dependent Ring OscillatorARM CORTEX M3 DDRO . Acknowledgments •Thanks to Professor Dennis Sylvester, Matt Fojtik, David Fick, and Daeyeon Kim from University of Michigan 22](https://reader034.vdocument.in/reader034/viewer/2022042218/5ec454ed480d6057bc1c521a/html5/thumbnails/26.jpg)
SPICE Results
Global and local variations
26
AES M0 MIPS
![Page 27: Design-Dependent Ring OscillatorARM CORTEX M3 DDRO . Acknowledgments •Thanks to Professor Dennis Sylvester, Matt Fojtik, David Fick, and Daeyeon Kim from University of Michigan 22](https://reader034.vdocument.in/reader034/viewer/2022042218/5ec454ed480d6057bc1c521a/html5/thumbnails/27.jpg)
Process Tuning
• Circuit performance monitoring is potentially helpful as test structure for manufacturing process tuning
27
– How to exploit the performance
monitors to make short-loop
monitoring?
T. Chan, ICCAD 2010
Measured I-V , C-V values after M-1
Scribe-line test structures
Compressed design dependent parameters Design
Delay and leakage power model
wafer Early
Performance estimation
Wafer pruning
![Page 28: Design-Dependent Ring OscillatorARM CORTEX M3 DDRO . Acknowledgments •Thanks to Professor Dennis Sylvester, Matt Fojtik, David Fick, and Daeyeon Kim from University of Michigan 22](https://reader034.vdocument.in/reader034/viewer/2022042218/5ec454ed480d6057bc1c521a/html5/thumbnails/28.jpg)
Existing Monitors
Generic Design-dependent
Many monitors
N/A Representative path [Xie10] In-situ monitors [Fick10] Critical-path replica [Black00, Shaik11] In-situ path RO [Ngo10, Wang08]
Multiple monitors
PSRO [Bhushan06] RO [Tetelbaum09]
This work TRC [Drake08] Process monitors [Burns08, Philling09]
One monitor PLL [Kang10] Representative Path [Liu10]
28