Statistically Rigorous Regression Modeling for the Microprocessor Design Space

Benjamin C. Lee (1,2), David M. Brooks (1)
[email protected]

1 Division of Engineering and Applied Sciences, Harvard University
2 Center for Applied Scientific Computing Research, Lawrence Livermore National Laboratory

Benjamin C. Lee, David M. Brooks :: 18 June 2006 :: Workshop on Modeling, Benchmarking, and Simulation



Outline

- Motivation & Background :: Simulation Challenges, Simulation Paradigms, Regression Theory
- Model Derivation :: Experimental Methodology, Correlation Analysis, Model Specification
- Model Evaluation :: Validation Approach, Performance, Power
- Conclusion :: Summary, Future Directions


Microarchitectural Design Space

- Trend toward chip multiprocessors (CMPs) with varying core designs
  - Examples :: POWER4, Pentium 4, UltraSPARC T1
- Goal :: tractably quantify trade-offs between core complexity and core count


Design Space Exploration

- Limitations of existing simulation methodology
  - Trace sampling and compression reduce per-simulation costs
  - Existing techniques do not reduce the number of simulations
  - Space size increases exponentially with parameter count
  - Multi-threaded, multi-core simulations are further constrained
- Prior design space analyses
  - Consider only a limited number of design points
  - Vary one or two parameters at fine granularity
  - Vary multiple parameters at coarse granularity
  - Hold the majority of parameters at constant values


Simulation Paradigms

- Objectives
  - Comprehensively understand the microprocessor design space
  - Selectively perform a modest number of simulations
  - Efficiently leverage simulation data
- Random configuration sampling
  - Sample points uniformly at random (UAR) from the design space for simulation
  - Controls the exponential increase in design count
- Statistical inference
  - Reveals trends and trade-offs from sparse sampling
  - Enables prediction for metrics of interest
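The sampling step above can be sketched in a few lines. This is an illustrative Python sketch, not the paper's tooling: only three of the twelve parameter groups are shown, with value grids taken from the predictor table later in the talk.

```python
import random

# Illustrative slice of the design space: parameter -> candidate values
# (grids follow the talk's predictor table; the full space has 12 groups).
design_space = {
    "depth": list(range(9, 37, 3)),       # FO4 :: 9::3::36
    "width": [4, 8, 16],                  # instruction bandwidth
    "l2cache_size": list(range(11, 16)),  # log2(entries) :: 11::1::15
}

def sample_uar(space, n, seed=0):
    """Draw n configurations uniformly at random (UAR) from the space."""
    rng = random.Random(seed)
    return [{name: rng.choice(values) for name, values in space.items()}
            for _ in range(n)]

samples = sample_uar(design_space, n=4)
```

Because each parameter value is drawn independently and uniformly, every configuration in the cross-product space is equally likely, which is what makes the later regression training data unbiased.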


Statistical Inference

- Approach
  - Models approximate solutions to intractable problems
  - Require initial data to train and formulate the model
  - Leverage correlations in the initial data for prediction
- Regression modeling
  - Efficient formulation :: sample 1K of ≈1B designs, fit by least squares
  - Accurate inference :: 4–7% median error
  - Static accuracy :: no predictive training


Model Formulation

- Notation
  - n observations
  - Response :: y = y_1, ..., y_n
  - Predictors for observation i :: x_i = x_{i,1}, ..., x_{i,p}
  - Regression coefficients :: β = β_0, ..., β_p
  - Random error :: e = e_1, ..., e_n, where e_i ~ N(0, σ²)
  - Transformations :: f; g = g_1, ..., g_p
- Model

    f(y_i) = β g(x_i) + e_i = β_0 + Σ_{j=1}^{p} β_j g_j(x_{i,j}) + e_i
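For concreteness, here is a minimal least-squares fit of this model form with NumPy. The transformations g_j (identity and square) and the coefficient values are invented for illustration; the paper fits its models in R, not Python.

```python
import numpy as np

# Fit f(y_i) = b0 + sum_j b_j * g_j(x_ij) + e_i by ordinary least squares.
rng = np.random.default_rng(0)
n = 200
x = rng.uniform(0.0, 1.0, size=(n, 2))
G = np.column_stack([x[:, 0], x[:, 1] ** 2])   # g_1(x) = x, g_2(x) = x^2
beta = np.array([0.5, 2.0, -1.0])              # true b0, b1, b2 (made up)
y = beta[0] + G @ beta[1:] + rng.normal(0.0, 0.01, n)

A = np.column_stack([np.ones(n), G])           # design matrix with intercept
beta_hat, *_ = np.linalg.lstsq(A, y, rcond=None)
```

The key point is that any nonlinear transformations g_j become ordinary columns of the design matrix, so the fit itself stays linear least squares.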


Predictor Interaction

- Modeling interaction
  - Suppose the effects of predictors x_1, x_2 cannot be separated
  - Construct the interaction predictor x_3 = x_1 x_2:

    y = β_0 + β_1 x_1 + β_2 x_2 + β_3 x_1 x_2 + e

- Example
  - Let x_1 be pipeline depth and x_2 be L2 cache size
  - The performance impact of pipelining is affected by cache size:

    Speedup = Depth / (1 + Stalls/Inst)
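A small numerical sketch of why these two predictors interact; all constants below are hypothetical, not the paper's. Deepening the pipeline raises Stalls/Inst (each miss costs more cycles in a deeper pipe), and by how much depends on the cache.

```python
# Speedup = Depth / (1 + Stalls/Inst), with stalls per instruction modeled
# (hypothetically) as growing with both the miss rate and the depth.
def speedup(depth, miss_rate, stall_per_miss_per_stage=0.05):
    stalls_per_inst = miss_rate * stall_per_miss_per_stage * depth
    return depth / (1.0 + stalls_per_inst)

# Doubling depth helps more when a large L2 keeps the miss rate low:
gain_small_l2 = speedup(24, miss_rate=0.10) / speedup(12, miss_rate=0.10)
gain_large_l2 = speedup(24, miss_rate=0.02) / speedup(12, miss_rate=0.02)
```

Because the benefit of x_1 (depth) depends on the level of x_2 (cache size), a separable model β_1x_1 + β_2x_2 cannot capture the effect; the product term β_3x_1x_2 can.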


Predictor Non-Linearity

- Restricted cubic splines [Stone, SS '86]
  - Divide the predictor domain into intervals separated by knots
  - Piecewise cubic polynomials joined at the knots
  - Higher-order polynomials provide better fits
- Location of knots
  - Location of knots is less important than the number of knots
  - Place knots at fixed predictor quantiles
- Number of knots
  - Flexibility and the risk of over-fitting increase with knot count
  - 5 knots or fewer are often sufficient
  - 4 knots balance flexibility against over-fitting
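The restricted cubic spline basis can be written down directly. This sketch uses Harrell's parameterization (as in the Design package the talk uses later); the knot locations are illustrative rather than the paper's quantiles.

```python
# Restricted cubic spline basis: for knots t_1 < ... < t_k, predictor x
# expands into x plus k-2 cubic terms, constrained to be linear in the tails.
def rcs_basis(x, knots):
    t = sorted(knots)
    k = len(t)
    cube = lambda v: max(v, 0.0) ** 3          # truncated cubic (v)+^3
    span = t[k - 1] - t[k - 2]
    terms = [x]
    for j in range(k - 2):
        terms.append(cube(x - t[j])
                     - cube(x - t[k - 2]) * (t[k - 1] - t[j]) / span
                     + cube(x - t[k - 1]) * (t[k - 2] - t[j]) / span)
    return terms

knots = [12.0, 18.0, 24.0, 30.0]   # 4 knots (hypothetical quantiles)
basis = rcs_basis(20.0, knots)     # -> [x, C_1(x), C_2(x)]
```

Each basis term enters the regression as an ordinary column, so fitting remains least squares; beyond the outermost knots every term is linear in x, which tames the tails.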


Prediction

- Expected response
  - Suppose the coefficients β and predictor values x_{i,1}, ..., x_{i,p} are known
  - The expected response is a weighted sum of the predictor values, since E[e_i] = 0:

    E[y_i] = E[β_0 + Σ_{j=1}^{p} β_j x_{i,j}] + E[e_i] = β_0 + Σ_{j=1}^{p} β_j x_{i,j}


Tools and Benchmarks

- Simulation framework
  - Turandot :: a cycle-accurate, trace-driven simulator
  - PowerTimer :: power models derived from circuit analyses
  - The baseline simulator models a POWER4/POWER5 architecture
- Benchmarks
  - SPEC CPU2000 :: compute-intensive benchmarks
  - SPECjbb :: Java server benchmark
- Statistical framework
  - R :: software environment for statistical computing
  - Hmisc and Design packages [Harrell, Springer '01]


Configuration Sampling

- Design space size
  - For i ∈ [1, p], S_i defines the possible values for parameter x_i
  - S = ∏_{i=1}^{p} S_i defines the design space
  - |S| = ∏_{i=1}^{p} |S_i| defines the space size
  - B defines the set of benchmarks, giving |B| × |S| potential simulations
  - |S| ≈ 10^9 and |B| = 22
- Sampling uniformly at random (UAR)
  - Sample n = 4,000 design points and benchmarks
  - Unbiased observations over the full range of parameter values
  - Exposes trends and trade-offs between parameters at fine granularity


Predictors :: Microarchitecture

Set  Group                 Parameter             Measure        Range        |S_i|
S1   Depth                 depth                 FO4            9::3::36     10
S2   Width                 width                 insn b/w       4,8,16       3
                           L/S reorder queue     entries        15::15::45
                           store queue           entries        14::14::42
                           functional units      count          1,2,4
S3   Physical Registers    general purpose (GP)  count          40::10::130  10
                           floating-point (FP)   count          40::8::112
                           special purpose (SP)  count          42::6::96
S4   Reservation Stations  branch                entries        6::1::15     10
                           fixed-point/memory    entries        10::2::28
                           floating-point        entries        5::1::14
S5   I-L1 Cache            i-L1 cache size       log2(entries)  7::1::11     5
S6   D-L1 Cache            d-L1 cache size       log2(entries)  6::1::10     5
S7   L2 Cache              L2 cache size         log2(entries)  11::1::15    5
                           L2 cache latency      cycles         6::2::14
S8   Control Latency       branch latency        cycles         1,2          2
S9   FX Latency            ALU latency           cycles         1::1::5      5
                           FX-multiply latency   cycles         4::1::8
                           FX-divide latency     cycles         35::5::55
S10  FP Latency            FPU latency           cycles         5::1::9      5
                           FP-divide latency     cycles         25::5::45
S11  L/S Latency           Load/Store latency    cycles         3::1::7      5
S12  Memory Latency        main memory latency   cycles         70::5::115   10

Ranges are written min::step::max. Parameters within a set vary jointly, so |S_i| counts the settings of the set.
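A quick consistency check on the quoted space size: multiplying the per-set cardinalities |S_i| from the table reproduces |S| ≈ 10^9.

```python
from math import prod

# |S_i| for S1..S12, read off the rightmost column of the table.
cardinalities = [10, 3, 10, 10, 5, 5, 5, 2, 5, 5, 5, 10]

space_size = prod(cardinalities)   # |S| = prod_i |S_i| = 937,500,000
```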


Predictors :: Application-Specific

- Application characteristics
  - Collect program characteristics on the baseline architecture
  - Baseline instruction throughput (BIPS)
  - Cache access patterns (i-L1, d-L1, L2 miss rates)
  - Branch patterns (branch frequency, misprediction rate)
  - Sources of pipeline stalls (per-queue stall histograms)
- Application effects
  - Characteristics are significant predictors when interacting with microarchitectural predictors
  - Example :: the impact of the d-L1 cache is affected by access rates


Variable Clustering

[Figure: hierarchical clustering of predictors by squared Spearman rank correlation (ρ², vertical axis 0.0–1.0). Clustered variables include microarchitectural predictors (depth, width, phys_reg, resv, icache_size, dcache_size, l2cache_size, fix_lat, ctl_lat, ls_lat, fpu_lat, mem_lat) and application characteristics (bips, base_bips, il1miss_rate, il2miss_rate, dl1miss_rate, dl2miss_rate, br_rate, br_mis_rate, br_stall, and per-queue stall counts such as stall_cast, stall_resv, stall_dmissq, stall_storeq, stall_reorderq, stall_inflight, stall_rename).]
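The clustering metric is the squared Spearman rank correlation between predictor pairs. A self-contained sketch follows; it ignores rank ties for simplicity, and the data values are made up.

```python
# Squared Spearman rank correlation: the Pearson correlation of the two
# variables' ranks, squared. This simple ranking ignores ties.
def ranks(values):
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0.0] * len(values)
    for position, i in enumerate(order):
        r[i] = float(position)
    return r

def spearman_rho2(a, b):
    ra, rb = ranks(a), ranks(b)
    n = len(ra)
    ma, mb = sum(ra) / n, sum(rb) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(ra, rb))
    var_a = sum((x - ma) ** 2 for x in ra)
    var_b = sum((y - mb) ** 2 for y in rb)
    return (cov * cov) / (var_a * var_b)

depth = [9, 12, 18, 24, 30, 36]
stalls = [0.2, 0.3, 0.5, 0.9, 1.4, 2.0]   # hypothetical, monotone in depth
rho2 = spearman_rho2(depth, stalls)
```

Squaring makes the metric direction-free: a perfectly increasing and a perfectly decreasing relationship both score ρ² = 1, which is what a redundancy analysis wants.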


Strength of Marginal Relationships


Regression Model Specification

- Interactions
  - Pipeline width and depth interact with
    - instruction bandwidth structures (queues, register file)
    - the cache hierarchy
  - Cache hierarchy sizes interact with
    - adjacent levels in the hierarchy
    - application-specific access rates
  - Baseline performance interacts with resource sizings
- Restricted cubic splines
  - Weaker relationships (latencies, caches, queues) :: 3 knots
  - Stronger relationships (depth, registers) :: 4 knots
  - Baseline application performance :: 5 knots


Validation Approach

- Framework
  - Formulate models with n* < n = 4,000 samples
  - Obtain 100 additional random samples for validation
  - Quantify percentage error, 100 × |ŷ_i − y_i| / y_i
- Model variants
  - Baseline (B) :: model the untransformed response
  - Variance Stabilized (S) :: model the square root of the response
  - Regional (S+R) :: for each query, reformulate the model with the samples configured most similarly to the query, using the distance

    d = [ Σ_{i=1}^{p} ((a_i − b_i) / a_i)² ]^{1/2}

  - Application-Specific (S+A) :: fix the sampled benchmarks
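The error metric and the regional distance translate directly into code; the configuration vectors and prediction values below are illustrative.

```python
# Percentage error used for validation, and the normalized distance used to
# pick the most similarly configured samples for the regional (S+R) variant.
def pct_error(y_hat, y):
    return 100.0 * abs(y_hat - y) / y

def distance(a, b):
    return sum(((ai - bi) / ai) ** 2 for ai, bi in zip(a, b)) ** 0.5

query   = [24, 8, 13]      # hypothetical: depth, width, log2(L2 entries)
nearby  = [21, 8, 13]
faraway = [9, 16, 11]
d_near, d_far = distance(query, nearby), distance(query, faraway)
err = pct_error(1.05, 1.00)   # a 5% miss
```

Dividing each coordinate difference by a_i normalizes parameters with very different scales (e.g. FO4 depths versus log2 cache sizes) before they are combined.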


Performance Prediction


Power Prediction


Performance-Power Comparison

- Performance accuracy
  - 7.4% median error for the S+A model
  - S+A reduces performance variance across applications
  - S+R is ineffective since the application is the primary determinant of performance
- Power accuracy
  - 4.3% median error for the S+R model
  - S+R reduces power variance across configurations
  - S+A is ineffective since resource sizings are the primary determinants of power


Summary

- Simulation challenges
  - Design space studies limited by simulation costs
  - Existing frameworks reduce only per-simulation costs
- Regression models
  - Sampling :: 1K of ≈1B configurations, UAR
  - Specification :: correlation analyses
  - Refinement :: variance-stabilizing transformations
- Model evaluation
  - 7.4% and 4.3% median errors for performance and power
  - S+A and S+R are more effective for performance and power, respectively


Future Directions

- Model applications
  - Demonstrate applicability to prior studies
  - Models enable more aggressive studies
  - Construct a CMP simulation framework
- Model improvements
  - Techniques and transformations to further reduce error and bias
- Survey approaches in statistical inference
  - Compare regression modeling with machine learning

Appendix

Publications
www.deas.harvard.edu/~bclee

B.C. Lee and D.M. Brooks. Statistically rigorous regression modeling for the microprocessor design space. ISCA-33: Workshop on Modeling, Benchmarking, and Simulation, June 2006.

B.C. Lee and D.M. Brooks. Accurate, efficient regression modeling for microarchitectural performance, power prediction. ASPLOS-XII: International Conference on Architectural Support for Programming Languages and Operating Systems, Oct 2006. (To appear)


References I

Y. Li, B.C. Lee, D. Brooks, Z. Hu, and K. Skadron. CMP design space exploration subject to physical constraints. HPCA-12: International Symposium on High-Performance Computer Architecture, Feb 2006.

L. Eeckhout, S. Nussbaum, J. Smith, and K. De Bosschere. Statistical simulation: Adding efficiency to the computer designer's toolbox. IEEE Micro, Sept/Oct 2003.

R. Liu and K. Asanovic. Accelerating architectural exploration using canonical instruction segments. ISPASS: International Symposium on Performance Analysis of Systems and Software, Austin, Texas, March 2006.

T. Sherwood, E. Perelman, G. Hamerly, and B. Calder. Automatically characterizing large scale program behavior. ASPLOS-X: Architectural Support for Programming Languages and Operating Systems, October 2002.

B.C. Lee and D.M. Brooks. Effects of pipeline complexity on SMT/CMP power-performance efficiency. ISCA-32: Workshop on Complexity Effective Design, June 2005.


References II

C. Stone. Comment: Generalized additive models. Statistical Science, 1986.

F. Harrell. Regression Modeling Strategies. Springer, New York, NY, 2001.

J. Yi, D. Lilja, and D. Hawkins. Improving computer architecture simulation methodology by adding statistical rigor. IEEE Computer, Nov 2005.

P. Joseph, K. Vaswani, and M.J. Thazhuthaveetil. Construction and use of linear regression models for processor performance analysis. HPCA-12: International Symposium on High Performance Computer Architecture, Austin, Texas, February 2006.

S. Nussbaum and J. Smith. Modeling superscalar processors via statistical simulation. PACT 2001: International Conference on Parallel Architectures and Compilation Techniques, Barcelona, Sept 2001.


Assessing Fit

- Multiple correlation statistic
  - R² is the fraction of response variance captured by the predictors
  - Larger R² suggests a better fit to the observed data
  - R² → 1 suggests over-fitting (less likely if p < n/20)

    R² = 1 − [ Σ_{i=1}^{n} (y_i − ŷ_i)² ] / [ Σ_{i=1}^{n} (y_i − ȳ)² ],  where ȳ = (1/n) Σ_{i=1}^{n} y_i

- Residual distribution assumptions
  - Residuals are normally distributed, e_i ~ N(0, σ²)
  - No correlation between residuals and the response or predictors
  - Validate with scatterplots and quantile-quantile plots

    ê_i = y_i − β_0 − Σ_{j=1}^{p} β_j x_{i,j}
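The fit statistic and residuals above translate directly into code; the response and prediction values below are invented.

```python
# R^2 = 1 - SS_res / SS_tot, and residuals e_i = y_i - yhat_i.
def r_squared(y, y_hat):
    n = len(y)
    y_bar = sum(y) / n
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, y_hat))
    ss_tot = sum((yi - y_bar) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot

y     = [1.0, 2.0, 3.0, 4.0]
y_hat = [1.1, 1.9, 3.2, 3.8]          # hypothetical model predictions
r2 = r_squared(y, y_hat)              # 1 - 0.10/5.0 = 0.98
residuals = [yi - fi for yi, fi in zip(y, y_hat)]
```

Plotting these residuals against ŷ (or against each predictor) is the scatterplot check named in the slide; a quantile-quantile plot against N(0, σ²) checks the normality assumption.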


Predictor Non-Linearity I

- Polynomial transformations
  - Undesirable peaks and valleys
  - Differing trends across regions
- Linear splines
  - Piecewise linear regions separated by knots
  - Inadequate for complex, highly curved relationships
- Restricted cubic splines
  - Higher-order polynomials provide better fits
  - Continuous at the knots
  - Linear constraint on the tails


Predictor Non-Linearity II

- Location of knots
  - Location of knots is less important than the number of knots
  - Place knots at fixed predictor quantiles
- Number of knots
  - Flexibility and the risk of over-fitting increase with knot count
  - 5 knots or fewer are often sufficient [Stone, SS '86]
  - 4 knots are a good compromise between flexibility and over-fitting
  - Fewer knots are required for small data sets


Significance Testing I

- Approach
  - Given two nested models, the null hypothesis H0 states that the additional predictors in the larger model have no association with the response
  - Test H0 with F-statistics and p-values
- Example
  - Testing a predictor interaction requires comparing nested models
  - Consider the model y = β_0 + β_1 x_1 + β_2 x_2 + β_3 x_1 x_2
  - Test the significance of x_1 with the null hypothesis H0 : β_1 = β_3 = 0


Significance Testing II

- F-statistic
  - Compare two nested models using their R² values and the F-statistic
  - R² is the fraction of response variance captured by the predictors:

    R² = 1 − [ Σ_{i=1}^{n} (y_i − ŷ_i)² ] / [ Σ_{i=1}^{n} (y_i − ȳ)² ]

  - The F-statistic of two nested models follows an F distribution; with R²_* for the smaller model and k additional predictors (p total):

    F_{k, n−p−1} = [ (R² − R²_*) / k ] × [ (n − p − 1) / (1 − R²) ]

- P-values
  - The probability that an F-statistic greater than or equal to the observed value would occur under H0
  - Small p-values cast doubt on H0
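The nested-model test reduces to arithmetic on the two R² values; the numbers below are invented for illustration.

```python
# F-statistic comparing a smaller model (R^2_*) against a larger model that
# adds k predictors, reaching p predictors total over n observations.
def f_statistic(r2_full, r2_star, n, p, k):
    return ((r2_full - r2_star) / k) * ((n - p - 1) / (1.0 - r2_full))

# e.g. do k = 2 added interaction terms matter? (hypothetical R^2 values)
F = f_statistic(r2_full=0.92, r2_star=0.90, n=1000, p=10, k=2)
# Compare F against the F_{k, n-p-1} distribution to obtain a p-value.
```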


Treatment of Missing Data

- Missing completely at random (MCAR)
  - Treat unobserved design points as missing data
  - Sampling UAR ensures observations are MCAR
  - Data are missing for reasons unrelated to the characteristics or responses of the configuration
- Informative missing
  - Data are more likely to be missing if their responses are systematically higher or lower
  - "Missingness" is then non-ignorable and must itself be modeled
  - Sampling UAR avoids such modeling complications


Performance Associations I

[Figure: mean bips over the N = 4,000 samples, binned by predictor values, in three panels — Pipeline (depth, width, phys_reg, resv), Latency (mem_lat, ls_lat, ctl_lat, fix_lat, fpu_lat), and Memory Hierarchy (l2cache_size, icache_size, dcache_size).]


Performance Associations II

[Figure: mean bips over the N = 4,000 samples, binned by application characteristics — Branch (br_rate, br_mis_rate) and Stalls (stall_inflight, stall_dmissq, stall_cast, stall_storeq, stall_reorderq, stall_resv, stall_rename).]


Performance Associations III

[Figure: mean bips over the N = 4,000 samples, binned by cache miss rates (il1miss_rate, dl1miss_rate, dl2miss_rate) and by baseline performance (base_bips).]


Significance Tests

- Microarchitectural predictors
  - The majority of F-tests imply significance (p-values < 2.2e-16)
  - Several predictors were less significant:
    - Control latency (p-value = 0.1247)
    - Reservation station size (p-value = 0.1239)
    - L1 instruction cache size (p-value = 0.02941)
- Application-specific predictors
  - The majority of F-tests imply significance (p-values < 2.2e-16)
  - Pipeline stalls classified by structure are less significant:
    - Completion and reorder queue stalls (p-values > 0.4)


Related Work

- Statistical significance ranking
  - Yi :: Plackett-Burman designs, effect rankings
  - Joseph :: stepwise regression, coefficient rankings
  - Bound parameter values to improve tractability
  - Require simulation for estimation
- Synthetic workloads
  - Eeckhout :: profile workloads to obtain synthetic traces
  - Nussbaum :: superscalar and SMP simulation
  - Obtain distributions of instructions and data dependencies
  - Require simulation with smaller traces for estimation
