Active Sampling for Accelerated Learning of Performance Models
Piyush Shivam, Shivnath Babu, Jeff Chase
Duke University
Networked Computing Utility

[Diagram: a task scheduler maps a task workflow onto a network of clusters or grid sites (Site A, Site B, Site C; clusters C1, C2, C3).]

• A network of clusters or grid sites.
• Each site is a pool of heterogeneous resources (e.g., CPU, memory, storage, network), managed as a shared utility.
• Jobs are task/data workflows.
• Challenge: choose the ‘best’ resource mapping/schedule for the job mix.
• Instance of “utility resource planning”.
• Solution under construction: NIMO
Subproblem: Predict Job Completion Time
Sample | CPU speed | Memory size | Network latency | Disk spindles | Execution time
s1     | 2.4 GHz   | 2 GB        | 1 ms            | 10            | 2 hours
...    | ...       | ...         | ...             | ...           | ...
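Not part of the original slides: a minimal sketch of this learning subproblem, assuming each sample's resource attributes are encoded numerically and a simple linear predictor is fit by least squares. Only the s1 row comes from the table above; the other samples and the candidate assignment are made up for illustration, and NIMO's actual models are described on the following slides.

import numpy as np

# Each training sample: (CPU speed in GHz, memory in GB, network latency in ms,
# disk spindles) -> execution time in hours. Only s1 is from the table above.
samples = np.array([
    [2.4, 2.0, 1.0, 10],   # s1
    [1.8, 1.0, 5.0,  4],   # hypothetical additional samples
    [3.0, 4.0, 0.5, 12],
    [2.0, 2.0, 2.0,  8],
])
exec_time = np.array([2.0, 4.5, 1.4, 2.8])  # hours (s1 = 2 hours; rest hypothetical)

# Fit a linear predictor exec_time ~ w . attributes + b by least squares.
X = np.hstack([samples, np.ones((len(samples), 1))])  # append a bias column
w, *_ = np.linalg.lstsq(X, exec_time, rcond=None)

# Predict completion time for a candidate resource assignment.
candidate = np.array([2.6, 3.0, 1.5, 10, 1.0])
print("predicted hours:", candidate @ w)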
Premises (Limitations)
• Important batch applications are run repeatedly.
  – Most resources are consumed by applications we have seen in the past.
• Behavior is predictable across data sets…
  – …given some attributes associated with the data set.
  – Stable behavior per unit of data processed (D).
  – D is predictable from data set attributes.
• Behavior depends only on resource attributes.
  – CPU type and clock, seek time, spindle count.
• Utility controls the resources assigned to each job.
  – Virtualization enables precise control.
• Your mileage may vary.
NIMO: NonInvasive Modeling for Optimization
• NIMO learns end-to-end performance models.
  – Models predict performance as a function of (a) the application profile, (b) the data set profile, and (c) the resource profile of a candidate resource assignment.
• NIMO is active.
  – NIMO collects training data for learning models by conducting proactive experiments on a ‘workbench’.
• NIMO is noninvasive.
The Big Picture

[Diagram: the NIMO architecture. An application profiler and a resource profiler feed a training set database; active learning builds a model that answers “what if…” queries, mapping app/data profiles and candidate resource profiles to (target) performance. A scheduler uses the model to place jobs across the grid sites (Site A, Site B, Site C; clusters C1, C2, C3). Jobs and benchmarks run under pervasive instrumentation, and the collected metrics are correlated with job logs.]
Generic End-to-End Model
[Diagram: an execution timeline divided into compute phases (compute resource busy) and stall phases (compute resource stalled on I/O), annotated with the occupancies Oa (compute), Os (stall), Od (storage), and On (network).]

T (completion time) = D_totaldata × (Oa + Os)

• Occupancy: the average time consumed per unit of data; directly observable.
• Oa: compute occupancy. Os: stall occupancy, which reflects Od (storage occupancy) and On (network occupancy).
• Independent variables: the resource profile and the data profile. Dependent variables: the occupancies (and hence T), obtained via statistical learning.
Complexity (e.g., latency hiding, concurrency, arm contention) is captured implicitly in the training data rather than in the structure of the model.
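Not part of the original slides: a minimal sketch of how the end-to-end predictor composes, assuming (as in the equation above) per-unit-data occupancies that are themselves learned functions of the resource and data profiles. The occupancy functions and profile fields below are illustrative stand-ins, not NIMO's actual predictors.

# Completion time T = D_totaldata * (Oa + Os):
# Oa = compute occupancy, Os = stall occupancy (both: time per unit of data).
# In NIMO the occupancies are learned from training data; here they are
# stand-in callables for illustration only.

def predict_completion_time(total_data, resource_profile, data_profile,
                            compute_occupancy, stall_occupancy):
    """T = D * (Oa + Os), with occupancies in time per unit of data."""
    oa = compute_occupancy(resource_profile, data_profile)
    os_ = stall_occupancy(resource_profile, data_profile)
    return total_data * (oa + os_)

# Hypothetical learned occupancy predictors (e.g., regressions over profiles).
compute_occ = lambda r, d: 0.8 / r["cpu_ghz"]           # faster CPU -> lower Oa
stall_occ   = lambda r, d: 0.002 * r["net_latency_ms"]  # slower network -> higher Os

T = predict_completion_time(total_data=10_000,
                            resource_profile={"cpu_ghz": 2.4, "net_latency_ms": 1.0},
                            data_profile={},
                            compute_occupancy=compute_occ,
                            stall_occupancy=stall_occ)
print(f"predicted completion time: {T:.0f} time units")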
Sampling Challenges
• Full system operating range
  – Samples must cover the space of candidate resource assignments.
• Cost of sample acquisition
  – Acquiring a sample has a non-negligible cost, e.g., the time to acquire the sample, or the opportunity cost for the application.
• Curse of dimensionality
  – Too many parameters!
  – E.g., 10 dimensions × 10 values per dimension.
  – At 5 minutes per sample, covering even 1% of the space takes 951 years!
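The back-of-the-envelope arithmetic behind the “951 years” figure, written out:

dimensions = 10
values_per_dimension = 10
minutes_per_sample = 5

total_assignments = values_per_dimension ** dimensions     # 10^10 assignments
one_percent = 0.01 * total_assignments                     # 10^8 samples
years = one_percent * minutes_per_sample / (60 * 24 * 365)
print(f"{years:.0f} years")                                # ~951 years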
Active Learning in NIMO

[Plot: accuracy of the current model vs. number of training samples, comparing passive sampling with active sampling; 100% accuracy marked on the y-axis.]

• Passive sampling might not expose the system operating range.
• Active sampling using “design of experiments” collects the most relevant training data.
• Automatic and quick.
How to learn accurate models quickly? Sample Carefully

[Plot: accuracy of the current model vs. number of training samples, comparing passive sampling, active sampling without acceleration, and active sampling with acceleration; 100% accuracy marked on the y-axis.]
Active Sampling Challenges
• How to expose the main factors and interactions in the shortest time?
  – Which dimensions/attributes to perturb?
  – What values to choose for the attributes?
• Where to conduct the experiment?
  – On a separate system (“workbench”) or “live”?
Planning ‘active’ experiments
1. Choose a predictor function to refine.
   • Focus on the most significant/relevant predictors, or on the least accurate.
   • Example: a CPU-intensive app needs an accurate compute-time predictor.
2. Choose an attribute (if any) to add to the predictor.
   • Example: CPU speed.
3. Choose the values of the attributes.
4. Conduct the experiment.
5. Compute the current prediction error; go to Step 1.
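Not part of the original slides: a minimal, self-contained sketch of this experiment-planning loop. Every helper below (predictor/attribute/value selection, running an experiment, refitting) is a placeholder policy standing in for the choices detailed on the next slides, not NIMO's implementation.

import random

class Predictor:
    def __init__(self, name, attributes):
        self.name = name
        self.attributes = list(attributes)   # attributes currently in the model
        self.error = 1.0                     # current prediction error (fraction)

def choose_predictor(predictors):
    # Dynamic ordering: refine the predictor with maximum current error.
    return max(predictors, key=lambda p: p.error)

def choose_attribute(predictor, candidates):
    # Add the next unused attribute (stand-in for a relevance-based ordering).
    unused = [a for a in candidates if a not in predictor.attributes]
    return unused[0] if unused else None

def choose_values(predictor):
    # Stand-in: pick a level for each attribute in the predictor.
    return {a: random.choice([1, 2, 3]) for a in predictor.attributes}

def run_experiment(values):
    # Stand-in for running the application on the workbench and measuring time.
    return {"assignment": values, "exec_time": random.uniform(1, 5)}

def refit_and_score(predictor, training_set):
    # Stand-in for refitting the predictor and scoring it on a test set.
    predictor.error *= 0.8
    return predictor.error

predictors = [Predictor("compute_time", ["cpu_speed"]),
              Predictor("stall_time", ["net_latency"])]
candidates = ["cpu_speed", "net_latency", "mem_size", "disk_spindles"]
training_set = []

for step in range(20):                               # experiment budget
    p = choose_predictor(predictors)                 # 1. predictor to refine
    attr = choose_attribute(p, candidates)           # 2. attribute to add (if any)
    if attr:
        p.attributes.append(attr)
    values = choose_values(p)                        # 3. values to try
    training_set.append(run_experiment(values))      # 4. conduct the experiment
    if refit_and_score(p, training_set) < 0.1:       # 5. error; stop or loop
        break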
Choosing the Next Predictor
• Learn the most significant/relevant predictors first.
  – Static vs. dynamic ordering.
  – Static: define a total order, e.g., a priori or by pre-estimates of influence (Plackett-Burman); cycle through the order round-robin or by improvement threshold.
  – Dynamic: choose the predictor with the maximum current error.
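Not part of the original slides: a small illustration of the two orderings; the predictor names and error values are made up.

from itertools import cycle

# Static ordering: fix a total order (e.g., from Plackett-Burman pre-estimates
# of influence) and cycle through it round-robin.
static_order = ["compute_time", "stall_time", "network_time"]
round_robin = cycle(static_order)
next_static = next(round_robin)

# Dynamic ordering: pick the predictor with the maximum current error.
current_error = {"compute_time": 0.05, "stall_time": 0.30, "network_time": 0.12}
next_dynamic = max(current_error, key=current_error.get)

print(next_static, next_dynamic)   # compute_time  stall_time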
Choosing New Attributes
• Include the most significant/relevant attributes.
  – Choose attributes that expose main factors and interactions.
• Add an attribute when the error reduction from further training with the current set falls below a threshold.
• Choose the attribute with the maximum potential improvement in accuracy.
  – Establish a total order using a pre-estimate of relevance (Plackett-Burman).
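Not part of the original slides: one way to express the attribute-addition rule above as code; the relevance ordering, attribute names, and threshold are illustrative.

# Attributes ranked by pre-estimated relevance (e.g., via Plackett-Burman).
relevance_order = ["cpu_speed", "net_latency", "mem_size", "disk_spindles"]

def maybe_add_attribute(current_attrs, recent_error_reduction, threshold=0.01):
    """Add the next most relevant attribute once further training with the
    current attribute set stops paying off."""
    if recent_error_reduction >= threshold:
        return None                                  # keep training with current set
    remaining = [a for a in relevance_order if a not in current_attrs]
    return remaining[0] if remaining else None

print(maybe_add_attribute(["cpu_speed"], recent_error_reduction=0.002))  # net_latency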
Choosing New Values
• Select a new value sample to train the selected predictor function with the chosen set of attributes.
• A range of approaches balance coverage vs. interactions:
  – Binary search / bracketing.
  – Plackett-Burman (PB) designs to identify interactions.
  – La-Ib designs, where a = #levels per value and b = degree of interactions.
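Not part of the original slides: a minimal sketch of the binary-search/bracketing idea for picking the next value of a single attribute, here bisecting the interval between the two already-sampled values whose observed times differ most. This particular rule is an illustrative stand-in, not necessarily NIMO's exact policy.

def next_value_by_bracketing(sampled):
    """sampled: list of (attribute_value, observed_time) pairs.
    Bisect the bracket where the observed behavior changes most."""
    sampled = sorted(sampled)
    gaps = [(abs(t2 - t1), (v1 + v2) / 2)
            for (v1, t1), (v2, t2) in zip(sampled, sampled[1:])]
    return max(gaps)[1]

# e.g., CPU speed (GHz) vs. measured compute time (hours) at coarse levels:
print(next_value_by_bracketing([(1.0, 6.0), (2.0, 3.1), (3.0, 2.2)]))  # 1.5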
Experimental Results
• Biomedical applications
  – BLAST, fMRI, NAMD, CardioWave
• Resources
  – 5 CPU speeds, 6 network latencies, 5 memory sizes
  – 5 × 6 × 5 = 150 resource assignments
• Goal: learn an execution time model with the fewest training assignments.
• A separate test set is used to evaluate the accuracy of the current model.
BLAST Application
• Total time for all 150 assignments: 130 hrs
• Active sampling: 5 hrs, using 2% of the sample space
• With an incorrect order of predictor refinement: 12 hrs, 10% of the sample space
BLAST Application
• Total time for all 150 assignments: 130 hrs
• Active sampling: 5 hrs, using 2% of the sample space
• With an incorrect order of attribute refinement: 12 hrs, 10% of the sample space
Summary/Conclusions
• Current SLT: given the right data, learn the right model.
• Use active sampling to acquire the right data.
• Ongoing experiments demonstrate the importance/potential of guided active sampling.
  – 2% of the sample space yields >= 90% model accuracy.
• Upcoming VLDB paper…