Workloads tutorial 02
DESCRIPTION: workload modelling for performance testing
TRANSCRIPT
-
Workload Modeling
and its Effect on
Performance EvaluationDror Feitelson
Hebrew University
Thanks to participants and program committee; thanks to Monien; abuse hospitality to talk about my agenda
-
Performance Evaluation
In system design
  Selection of algorithms
  Setting parameter values
In procurement decisions
  Value for money
  Meet usage goals
For capacity planning
An important and basic activity
-
The Good Old Days
The skies were blue
The simulation results were conclusive
Our scheme was better than theirs
Feitelson & Jette, JSSPP 1997
Focus on system design. Widely different designs lead to conclusive results.
-
But in their papers,
Their scheme was better than ours!
But the literature is full of contradictory results.
-
How could they be so wrong?
This leads to the question: what is the cause of the contradictions?
-
Performance evaluation depends on:
The system's design (what we teach in algorithms and data structures)
Its implementation (what we teach in programming courses)
The workload to which it is subjected
The metric used in the evaluation
Interactions between these factors
Next: our focus is the workloads.
-
Performance evaluation depends on:
The system's design (what we teach in algorithms and data structures)
Its implementation (what we teach in programming courses)
The workload to which it is subjected
The metric used in the evaluation
Interactions between these factors
-
Outline for Today
Three examples of how workloads affect performance evaluation
Workload modeling
  Getting data
  Fitting, correlations, stationarity
  Heavy tails, self similarity
Research agenda
In the context of parallel job scheduling
Job scheduling, not task scheduling
-
Example #1
Gang Scheduling and
Job Size Distribution
-
Gang What?!?
Time slicing parallel jobs with coordinated context switching
Ousterhout
matrix
Ousterhout, ICDCS 1982
-
Gang What?!?
Time slicing parallel jobs with coordinated context switching
Ousterhout
matrix
Optimization:
Alternative
scheduling
Ousterhout, ICDCS 1982
-
Packing Jobs
Use a buddy system for allocating processors
Feitelson & Rudolph, Computer 1990
-
Packing Jobs
Use a buddy system for allocating processors
Start with full system in one block
-
Packing Jobs
Use a buddy system for allocating processors
To allocate repeatedly partition in two to get desired size
-
Packing Jobs
Use a buddy system for allocating processors
-
Packing Jobs
Use a buddy system for allocating processors
Or use existing partition
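The repeated partitioning described above can be sketched in code. This is an illustrative sketch, not the implementation from the paper; the class and method names are our own:

```python
# Minimal buddy allocator for processor partitions (illustrative sketch).
# Sizes are rounded up to powers of two; freed blocks coalesce with their buddy.
class BuddyAllocator:
    def __init__(self, total):
        assert total & (total - 1) == 0, "total must be a power of two"
        self.total = total
        self.free = {total: [0]}          # size -> list of free block offsets

    def alloc(self, n):
        """Allocate a partition of >= n processors; return (start, size) or None."""
        size = 1
        while size < n:
            size *= 2                     # round request up to a power of two
        s = size                          # find smallest free block that fits
        while s <= self.total and not self.free.get(s):
            s *= 2
        if s > self.total:
            return None
        start = self.free[s].pop()
        while s > size:                   # repeatedly partition in two
            s //= 2
            self.free.setdefault(s, []).append(start + s)  # upper buddy stays free
        return (start, size)

    def release(self, start, size):
        while size < self.total:          # coalesce while the buddy is also free
            buddy = start ^ size
            blocks = self.free.get(size, [])
            if buddy in blocks:
                blocks.remove(buddy)
                start = min(start, buddy)
                size *= 2
            else:
                break
        self.free.setdefault(size, []).append(start)
```

The internal fragmentation mentioned on the next slide shows up in `alloc`: a request for 3 processors consumes a block of 4.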
-
The Question:
The buddy system leads to internal fragmentation
But it also improves the chances of alternative scheduling, because processors are allocated in predefined groups
Which effect dominates the other?
-
The Answer (part 1):
Feitelson & Rudolph, JPDC 1996
The answer depends on the workload, but it is not a full answer because the workload is unknown. Dashed lines: provable bounds.
-
The Answer (part 2):
Note logarithmic Y axis
-
The Answer (part 2):
-
The Answer (part 2):
-
The Answer (part 2):
Many small jobs
Many sequential jobs
Many power of two jobs
Practically no jobs use the full machine
Conclusion: buddy system should work well
-
Verification
Feitelson, JSSPP 1996
Using Feitelson workload
-
Example #2
Parallel Job Scheduling
and Job Scaling
-
Variable Partitioning
Each job gets a dedicated partition for the duration of its execution
Resembles 2D bin packing
Packing large jobs first should lead to better performance
But what about correlation of size and runtime?
First-fit decreasing is optimal
-
Scaling Models
Constant work
  Parallelism for speedup: Amdahl's Law
  Large first ⇒ SJF
Constant time
  Size and runtime are uncorrelated
Memory bound
  Large first ⇒ LJF
  Full-size jobs lead to blockout
Worley, SIAM JSSC 1990
Question is which model applies within the context of a single machine
-
Scan Algorithm
Keep jobs in separate queues according to size (sizes are powers of 2)
Serve the queues Round Robin, scheduling all jobs from each queue (they pack perfectly)
Assuming constant work model, large jobs only block the machine for a short time
But the memory bound model would lead to excessive queueing of small jobs
Krueger et al., IEEE TPDS 1994
Important point: schedule order determined by size
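The queue-per-size idea can be sketched as follows. This is an illustrative sketch of the Scan ordering only (no timing or preemption); names are our own:

```python
from collections import deque

# Sketch of the Scan idea: one queue per power-of-two size class; queues
# are served Round Robin, draining each queue entirely so that equal-size
# jobs pack the machine perfectly.
def scan_order(jobs):
    """jobs: list of (job_id, size); returns the order in which jobs are served."""
    queues = {}
    for jid, size in jobs:
        s = 1
        while s < size:
            s *= 2                        # round size up to a power of two
        queues.setdefault(s, deque()).append(jid)
    order = []
    sizes = sorted(queues)
    while any(queues[s] for s in sizes):  # Round-Robin sweep over size classes
        for s in sizes:
            while queues[s]:              # schedule ALL jobs of this class
                order.append(queues[s].popleft())
    return order
```

Note how the service order is determined purely by size class, which is exactly why the memory bound model (small jobs dominate, large jobs run long) would starve the small-job queues.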
-
The Data
-
The Data
-
The Data
-
The Data
Data: SDSC Paragon, 1995/6
-
The Data
Data: SDSC Paragon, 1995/6
Partitions with equal numbers of jobs; many more small jobs.
-
The Data
Data: SDSC Paragon, 1995/6
Similar range, different shape; 80th percentile moves from
-
Conclusion
Parallelism used for better results, not for faster results
Constant work model is unrealistic
Memory bound model is reasonable
Scan algorithm will probably not perform well in practice
-
Example #3
Backfilling and
User Runtime Estimation
-
Backfilling
Variable partitioning can suffer from external fragmentation
Backfilling optimization: move jobs forward to fill in holes in the schedule
Requires knowledge of expected job runtimes
-
Variants
EASY backfilling: make reservation for first queued job
Conservative backfilling: make reservation for all queued jobs
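The EASY backfilling rule can be sketched as a single check. This is a simplified illustration with hypothetical names and numbers; it assumes no running job frees processors before the head job's reservation time:

```python
# Sketch of the EASY backfilling rule: a candidate job may jump ahead of
# the queue only if it does not delay the reservation of the first
# queued (head) job. Simplifying assumption: the set of free processors
# does not grow before the reservation time.
def can_backfill(free_now, now, head_size, head_start, cand_size, cand_est):
    """free_now: idle processors; head_start: reservation time of the head job;
    cand_est: the candidate's user-supplied runtime estimate."""
    if cand_size > free_now:
        return False                      # does not fit at all
    if now + cand_est <= head_start:
        return True                       # finishes before the reservation
    # otherwise it must leave enough free processors for the head job
    return cand_size <= free_now - head_size
```

Conservative backfilling applies the same check against the reservations of all queued jobs, not just the first, which is why it admits fewer backfill moves.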
-
User Runtime Estimates
Lower estimates improve the chance of backfilling, and thus of better response time
Too low estimates run the risk of having the job killed
So estimates should be accurate, right?
-
They Aren't
Mualem & Feitelson, IEEE TPDS 2001
Short = failed; killed jobs typically exceeded their runtime estimate, ~15%
-
Surprising Consequences
Inaccurate estimates actually lead to improved performance
Performance evaluation results may depend on the accuracy of runtime estimates
Example: EASY vs. conservative
  Using different workloads
  And different metrics
Will focus on second bullet
-
EASY vs. Conservative
Using CTC SP2 workload
-
EASY vs. Conservative
Using Jann workload model
Note: Jann model of CTC
-
EASY vs. Conservative
Using Feitelson workload model
-
Conflicting Results Explained
Jann uses accurate runtime estimates
This leads to a tighter schedule
EASY is not affected too much
Conservative manages less backfilling of long jobs, because it respects more reservations
Relative measure: more by EASY = less by conservative
-
Conservative is bad for the long jobs
Good for short ones that are respected
-
Conflicting Results Explained
Response time sensitive to long jobs, which favor EASY
Slowdown sensitive to short jobs, which favor conservative
All this does not happen at CTC, because estimates are so loose that backfill can occur even under conservative
-
Verification
Run CTC workload with accurate estimates
-
But What About My Model?
It simply does not have such small, long jobs
-
Workload Data Sources
-
No Data
Innovative unprecedented systems
  Wireless
  Hand-held
Use an educated guess
  Self similarity
  Heavy tails
  Zipf distribution
-
Serendipitous Data
Data may be collected for various reasons
  Accounting logs
  Audit logs
  Debugging logs
  Just-so logs
Can lead to wealth of information
-
NASA Ames iPSC/860 log
42050 jobs from Oct-Dec 1993
user job nodes runtime date time
user4 cmd8 32 70 11/10/93 10:13:17
user4 cmd8 32 70 11/10/93 10:19:30
user42 nqs450 32 3300 11/10/93 10:22:07
user41 cmd342 4 54 11/10/93 10:22:37
sysadmin pwd 1 6 11/10/93 10:22:42
user4 cmd8 32 60 11/10/93 10:25:42
sysadmin pwd 1 3 11/10/93 10:30:43
user41 cmd342 4 126 11/10/93 10:31:32
Feitelson & Nitzberg, JSSPP 1995
-
Distribution of Job Sizes
-
Distribution of Job Sizes
-
Distribution of Resource Use
-
Distribution of Resource Use
-
Degree of Multiprogramming
-
System Utilization
-
Job Arrivals
-
Arriving Job Sizes
-
Distribution of Interarrival Times
-
Distribution of Runtimes
-
User Activity
-
Repeated Execution
-
Application Moldability
Of jobs run more than once
-
Distribution of Run Lengths
-
Predictability in Repeated Runs
For jobs run more than 5 times
-
Recurring Findings
Many small and serial jobs
Many power-of-two jobs
Weak correlation of job size and duration
Job runtimes are bounded but have CV > 1
Inaccurate user runtime estimates
Non-stationary arrivals (daily/weekly cycle)
Power-law user activity, run lengths
-
Instrumentation
Passive: snoop without interfering
Active: modify the system
Collecting the data interferes with system behavior
Saving or downloading the data causes additional interference
Partial solution: model the interference
-
Data Sanitation
Strange things happen
Leaving them in is safe and faithful to the real data
But it risks situations in which a non-representative situation dominates the evaluation results
-
Arrivals to SDSC SP2
-
Arrivals to LANL CM-5
-
Arrivals to CTC SP2
-
Arrivals to SDSC Paragon
What are they doing at 3:30 AM?
-
3:30 AM
Nearly every day, a set of 16 jobs is run by the same user
Most probably the same set, as they typically have a similar pattern of runtimes
Most probably these are administrative jobs that are executed automatically
-
Arrivals to CTC SP2
-
Arrivals to SDSC SP2
-
Arrivals to LANL CM-5
-
Arrivals to SDSC Paragon
-
Are These Outliers?
These large activity outbreaks are easily distinguished from normal activity
  They last for several days to a few weeks
  They appear at intervals of several months to more than a year
  They are each caused by a single user!
Therefore easy to remove
-
Two Aspects
In workload modeling, should you include this in the model?
  In a general model, probably not
  Conduct separate evaluation for special conditions (e.g. DOS attack)
In evaluations using raw workload data, there is a danger of bias due to unknown special circumstances
-
Automation
The idea:
  Cluster daily data based on various workload attributes
  Remove days that appear alone in a cluster
  Repeat
The problem:
  Strange behavior often spans multiple days
Cirne & Berman, Wkshp Workload Charact. 2001
-
Workload Modeling
-
Statistical Modeling
Identify attributes of the workload
Create empirical distribution of each attribute
Fit empirical distribution to create model
Synthetic workload is created by sampling from the model distributions
-
Fitting by Moments
Calculate model parameters to fit moments of empirical data
Problem: does not fit the shape of the distribution
-
Jann et al, JSSPP 1997
-
Fitting by Moments
Calculate model parameters to fit moments of empirical data
Problem: does not fit the shape of the distribution
Problem: very sensitive to extreme data values
-
Effect of Extreme Runtime Values
Downey & Feitelson, PER 1999
Change when top records omitted:
omit     mean    CV
0.01%   -2.1%   -29%
0.02%   -3.0%   -35%
0.04%   -3.7%   -39%
0.08%   -4.6%   -39%
0.16%   -5.7%   -42%
0.31%   -7.1%   -42%
-
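Fitting by moments and its sensitivity to extremes can be sketched in a few lines. As an illustration (our own function names, toy data, and a gamma model, which the slides do not prescribe), we match mean and variance and then add one extreme value:

```python
# Sketch of fitting by moments: match a gamma distribution's mean and
# variance to the data (method of moments).
def fit_gamma_moments(data):
    n = len(data)
    mean = sum(data) / n
    var = sum((x - mean) ** 2 for x in data) / n
    shape = mean * mean / var   # gamma shape parameter k
    scale = var / mean          # gamma scale parameter theta
    return shape, scale

# A single extreme value shifts the fitted shape drastically,
# mirroring the sensitivity shown in the table above.
base = [1.0, 2.0, 3.0, 4.0, 5.0]
k1, _ = fit_gamma_moments(base)           # well-behaved data
k2, _ = fit_gamma_moments(base + [500.0]) # one outlier added
```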
Alternative: Fit to Shape
Maximum likelihood: what distribution parameters were most likely to lead to the given observations
  Needs initial guess of functional form
Phase type distributions
  Construct the desired shape
Goodness of fit
  Kolmogorov-Smirnov: difference in CDFs
  Anderson-Darling: added emphasis on tail
  May need to sample observations
-
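The Kolmogorov-Smirnov idea just mentioned is simple to compute directly. A minimal sketch (our own names; exponential model chosen only as an example, with its maximum-likelihood rate 1/mean):

```python
import math

# Sketch: Kolmogorov-Smirnov distance between an empirical sample and a
# model CDF -- the "difference in CDFs" goodness-of-fit measure.
def ks_distance(sample, cdf):
    xs = sorted(sample)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs):
        # compare the model CDF to the empirical CDF on both sides of its step
        d = max(d, abs(cdf(x) - i / n), abs(cdf(x) - (i + 1) / n))
    return d

# Fit an exponential by maximum likelihood (rate = 1 / sample mean) and
# measure how well its shape matches the data.
sample = [0.5, 1.0, 1.5, 2.0, 4.0]
rate = len(sample) / sum(sample)
exp_cdf = lambda x: 1.0 - math.exp(-rate * x)
d = ks_distance(sample, exp_cdf)
```

Anderson-Darling differs by weighting the CDF difference so that discrepancies in the tail count more.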
Correlations
Correlation can be measured by the correlation coefficient
It can be modeled by a joint distribution function
Both may not be very useful
-
Correlation Coefficient
Gives low results for correlation of runtime and size in parallel systems
system         CC
CTC SP2       -0.029
KTH SP2        0.011
SDSC SP2       0.145
LANL CM-5      0.211
SDSC Paragon   0.305
-
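For reference, the statistic behind the per-system CC values is the ordinary Pearson correlation coefficient, which can be computed directly (illustrative sketch, our own function name):

```python
import math

# Pearson correlation coefficient between two equally long series,
# e.g. job sizes and runtimes from a log.
def corrcoef(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)
```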
Distributions
A restricted version of a joint distribution
-
Modeling Correlation
Divide range of one attribute into sub-ranges
Create a separate model of the other attribute for each sub-range
Models can be independent, or a model parameter can depend on the sub-range
-
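The sub-range approach can be sketched as follows. This is an illustrative sketch with hypothetical boundaries and an exponential per-class runtime model, neither of which is prescribed by the slides:

```python
import random

# Sketch of modeling a size-runtime correlation by sub-ranges: split job
# sizes into classes and fit a separate runtime model per class.
def fit_subrange_models(jobs, boundaries=(2, 16)):
    """jobs: list of (size, runtime). Returns mean runtime per size class."""
    classes = {}
    for size, rt in jobs:
        c = sum(size >= b for b in boundaries)   # index of the sub-range
        classes.setdefault(c, []).append(rt)
    return {c: sum(v) / len(v) for c, v in classes.items()}

def sample_runtime(models, size, boundaries=(2, 16), rng=random):
    """Draw a synthetic runtime for a job of the given size."""
    c = sum(size >= b for b in boundaries)
    return rng.expovariate(1.0 / models[c])      # exponential per sub-range
```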
Stationarity
Problem of daily/weekly activity cycle
  Not important if unit of activity is very small (network packet)
  Very meaningful if unit of work is long (parallel job)
-
How to Modify the Load
Multiply interarrivals or runtimes by a factor
  Changes the effective length of the day
Multiply machine size by a factor
  Modifies packing properties
Add users
-
Stationarity
Problem of daily/weekly activity cycle
  Not important if unit of activity is very small (network packet)
  Very meaningful if unit of work is long (parallel job)
Problem of new/old system
  Immature workload
  Leftover workload
-
Heavy Tails
-
Tail Types
When a distribution has mean m, what is the distribution of samples that are larger than x?
Light: expected to be smaller than x+m
Memoryless: expected to be x+m
Heavy: expected to be larger than x+m
-
Formal Definition
Tail decays according to a power law: Pr[X > x] ∝ x^(-a), 0 < a ≤ 2
Test: the log-log complementary distribution, log F̄(x) = log Pr[X > x] plotted against log x, should be a straight line with slope -a
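The LLCD test can be sketched numerically: estimate the slope of log survival probability vs. log x over the tail of a sample. This is an illustrative sketch (our own function name and tail fraction); a least-squares fit on the LLCD is known to be a crude estimator, but it shows the idea:

```python
import math, random

# Sketch of the log-log complementary distribution (LLCD) test: for a
# power-law tail, log Pr[X > x] vs. log x is roughly linear with slope -a.
def llcd_slope(sample, tail_frac=0.1):
    """Least-squares slope of the LLCD over the top tail_frac of the sample."""
    xs = sorted(sample)
    n = len(xs)
    k = max(2, int(n * tail_frac))
    # (log x_(j), log empirical survival at x_(j)) for the top k order statistics
    pts = [(math.log(xs[n - k + i]), math.log((k - i) / n)) for i in range(k)]
    mx = sum(x for x, _ in pts) / k
    my = sum(y for _, y in pts) / k
    num = sum((x - mx) * (y - my) for x, y in pts)
    den = sum((x - mx) ** 2 for x, _ in pts)
    return num / den

random.seed(1)
pareto = [random.paretovariate(1.5) for _ in range(20000)]  # true a = 1.5
slope = llcd_slope(pareto)  # should come out near -1.5
```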
-
Consequences
Large deviations from the mean are realistic
Mass disparity
  small fraction of samples responsible for large part of total mass
  Most samples together account for negligible part of mass
Crovella, JSSPP 2001
-
-
Unix File Sizes Survey, 1993
-
Unix File Sizes LLCD
-
Consequences
Large deviations from the mean are realistic
Mass disparity
  small fraction of samples responsible for large part of total mass
  Most samples together account for negligible part of mass
Infinite moments
  For a ≤ 1 the mean is undefined
  For a ≤ 2 the variance is undefined
Crovella, JSSPP 2001
-
-
Pareto Distribution
With parameter a = 1 the density is proportional to x^(-2)
The expectation is then ∫ x · x^(-2) dx ∝ ln x
i.e. it grows with the number of samples
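This growth is easy to see empirically. A small sketch (our own function name and seed) tracks the running mean of Pareto samples with a = 1, whose expectation is undefined:

```python
import random

# Sketch: the running mean of Pareto(a = 1) samples keeps drifting upward
# with the sample count, illustrating the undefined expectation.
def running_means(n, seed=0):
    rng = random.Random(seed)
    total, means = 0.0, []
    for i in range(1, n + 1):
        total += rng.paretovariate(1.0)   # heavy-tailed, infinite mean
        means.append(total / i)
    return means

m = running_means(100000)   # the running mean grows roughly like ln n
```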
-
Pareto Samples
-
Pareto Samples
-
Pareto Samples
-
Effect of Samples from Tail
In simulation:
  A single sample may dominate results
  Example: response times of processes
In analysis:
  Average long-term behavior may never happen in practice
-
Real Life
Data samples are necessarily bounded
The question is how to generalize to the model distribution
  Arbitrary truncation
  Lognormal or phase-type distributions
  Something in between
-
Solution 1: Truncation
Postulate an upper bound on the distribution
Question: where to put the upper bound
Probably OK for qualitative analysis
May be problematic for quantitative simulations
-
Solution 2: Model the Sample
Approximate the empirical distribution using a mixture of exponentials (e.g. phase-type distributions)
  In particular, exponential decay beyond highest sample
In some cases, a lognormal distribution provides a good fit
Good for mathematical analysis
-
Solution 3: Dynamic
Place an upper bound on the distribution
Location of bound depends on total number of samples required
Example:
Note: does not change during simulation
-
Self Similarity
-
The Phenomenon
The whole has the same structure as certain parts
Example: fractals
-
The Phenomenon
The whole has the same structure as certain parts
Example: fractals
In workloads: burstiness at many different time scales
Note: relates to a time series
-
Job Arrivals to SDSC Paragon
-
Process Arrivals to SDSC Paragon
-
Long-Range Correlation
A burst of activity implies that values in the time series are correlated
A burst covering a large time frame implies correlation over a long range
This is contrary to assumptions about the independence of samples
-
Aggregation
Replace each subsequence of m consecutive values by their mean
If self-similar, the new series will have statistical properties that are similar to the original (i.e. bursty)
If independent, will tend to average out
-
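Aggregation itself is one line of code. A minimal sketch (our own names; independent exponential data used only to show the averaging-out effect):

```python
import random

# Sketch of the aggregation test: replace each window of m values by its
# mean. A bursty self-similar series stays bursty; an independent one
# smooths out, its variance shrinking roughly like 1/m.
def aggregate(series, m):
    return [sum(series[i:i + m]) / m for i in range(0, len(series) - m + 1, m)]

def variance(xs):
    mu = sum(xs) / len(xs)
    return sum((x - mu) ** 2 for x in xs) / len(xs)

rng = random.Random(42)
iid = [rng.expovariate(1.0) for _ in range(10000)]   # independent samples
v1 = variance(iid)
v10 = variance(aggregate(iid, 10))                   # much smaller for iid data
```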
Poisson Arrivals
-
Tests
Essentially based on the burstiness-retaining nature of aggregation
Rescaled range (R/s) metric: the range (sum) of n samples as a function of n
-
R/s Metric
-
Tests
Essentially based on the burstiness-retaining nature of aggregation
Rescaled range (R/s) metric: the range (sum) of n samples as a function of n
Variance-time metric: the variance of an aggregated time series as a function of the aggregation level
-
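The variance-time metric can be sketched directly: regress log variance of the aggregated series on log aggregation level. An illustrative sketch (our own names; uniform iid data as the baseline case):

```python
import math, random

# Sketch of the variance-time metric: the slope of log Var(aggregated
# series) vs. log m. A slope near -1 indicates independence; a slope
# clearly above -1 (slope = 2H - 2) suggests self-similarity.
def aggregate(series, m):
    return [sum(series[i:i + m]) / m for i in range(0, len(series) - m + 1, m)]

def variance(xs):
    mu = sum(xs) / len(xs)
    return sum((x - mu) ** 2 for x in xs) / len(xs)

def variance_time_slope(series, levels=(1, 2, 4, 8, 16, 32)):
    pts = [(math.log(m), math.log(variance(aggregate(series, m))))
           for m in levels]
    mx = sum(x for x, _ in pts) / len(pts)
    my = sum(y for _, y in pts) / len(pts)
    num = sum((x - mx) * (y - my) for x, y in pts)
    den = sum((x - mx) ** 2 for x, _ in pts)
    return num / den

rng = random.Random(7)
iid = [rng.random() for _ in range(20000)]   # independent, so slope ~ -1
slope = variance_time_slope(iid)
```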
Variance Time Metric
-
Modeling Self Similarity
Generate workload by an on-off process
  During on period, generate work at a steady pace
  During off period, do nothing
On and off period lengths are heavy tailed
Multiplex many such sources
Leads to long-range correlation
-
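This construction can be sketched as follows (our own names and parameters; Pareto on/off period lengths as one choice of heavy-tailed distribution):

```python
import random

# Sketch: multiplex many on-off sources with heavy-tailed (Pareto)
# on/off period lengths -- the classic construction of self-similar traffic.
def onoff_traffic(n_sources=50, length=5000, alpha=1.4, seed=3):
    rng = random.Random(seed)
    total = [0] * length
    for _ in range(n_sources):
        t, on = 0, rng.random() < 0.5          # random initial phase
        while t < length:
            dur = int(rng.paretovariate(alpha)) + 1   # heavy-tailed period
            if on:
                for i in range(t, min(t + dur, length)):
                    total[i] += 1              # one unit of work per tick
            t += dur
            on = not on
    return total

traffic = onoff_traffic()   # bursty at many time scales
```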
Research Areas
-
Effect of Users
Workload is generated by users
Human users do not behave like a random sampling process
  Feedback based on system performance
  Repetitive working patterns
-
Feedback
User population is finite
Users back off when performance is inadequate
Negative feedback
Better system stability
Need to explicitly model this behavior -
Locality of Sampling
Users display different levels of activity at different times
At any given time, only a small subset of users is active
-
Active Users
-
Locality of Sampling
Users display different levels of activity at different times
At any given time, only a small subset of users is active
These users repeatedly do the same thing
Workload observed by system is not a random sample from long-term distribution
-
SDSC Paragon Data
-
SDSC Paragon Data
-
Growing Variability
-
SDSC Paragon Data
-
SDSC Paragon Data
-
Locality of Sampling
The questions:
How does this affect the results of performance evaluation?
Can this be exploited by the system, e.g. by a scheduler?
-
Hierarchical Workload Models
Model of user population
  Modify load by adding/deleting users
Model of a single user's activity
  Built-in self similarity using heavy-tailed on/off times
Model of application behavior and internal structure
  Capture interaction with system attributes
-
A Small Problem
We don't have data for these models
Especially for user behavior such as feedback
  Need interaction with cognitive scientists
And for distribution of application types and their parameters
  Need detailed instrumentation
-
Final Words
-
We like to think that we design systems based on solid foundations
-
But beware:
the foundations might be unbased assumptions!
-
We should have more science in computer science:
Collect data rather than make assumptions
Run experiments under different conditions
Make measurements and observations
Make predictions and verify them
Share data and programs to promote good practices and ensure comparability
Computer Systems are Complex
Science = experimental science, like physics, chemistry, biology
-
Advice from the Experts
Science is built of facts as a house is built of stones. But a collection of facts is no more a science than a heap of stones is a house
-- Henri Poincaré
-
Advice from the Experts
Science is built of facts as a house is built of stones. But a collection of facts is no more a science than a heap of stones is a house
-- Henri Poincaré
Everything should be made as simple as possible, but not simpler
-- Albert Einstein
-
Acknowledgements
Students: Ahuva Mualem, David Talby, Uri Lublin
Larry Rudolph / MIT
Data in Parallel Workloads Archive:
  Joefon Jann / IBM
  Allen Downey / Wellesley
  CTC SP2 log / Steven Hotovy
  SDSC Paragon log / Reagan Moore
  SDSC SP2 log / Victor Hazelwood
  LANL CM-5 log / Curt Canada
  NASA iPSC/860 log / Bill Nitzberg