![Page 1: Active and Accelerated Learning of Cost Models for Optimizing Scientific Applications](https://reader036.vdocument.in/reader036/viewer/2022081603/56814911550346895db649e3/html5/thumbnails/1.jpg)
Active and Accelerated Learning of Cost Models for Optimizing Scientific Applications
Piyush Shivam, Shivnath Babu, Jeffrey Chase
Duke University
![Page 2: Active and Accelerated Learning of Cost Models for Optimizing Scientific Applications](https://reader036.vdocument.in/reader036/viewer/2022081603/56814911550346895db649e3/html5/thumbnails/2.jpg)
Networked Computing Utility
[Figure: a task scheduler and a task workflow alongside Sites A, B, and C, with clusters C1, C2, and C3]
• A network of clusters or grid sites
• Each site is a pool of heterogeneous resources
• Jobs are task workflows
• Challenge: choose good resource assignments for the jobs
![Page 3: Active and Accelerated Learning of Cost Models for Optimizing Scientific Applications](https://reader036.vdocument.in/reader036/viewer/2022081603/56814911550346895db649e3/html5/thumbnails/3.jpg)
Example: Assigning Resources to Run Tasks
[Figure: Sites A, B, and C with clusters C1, C2, and C3, a home file server, and plans P1, P2, and P3]
• A workflow with a single task
• Task input data at Site A
• Execution plan ≡ Resource assignment
Plan  CPU     Storage
P1    Site A  Site A
P2    Site B  Site A
P3    Site B  Site B
![Page 4: Active and Accelerated Learning of Cost Models for Optimizing Scientific Applications](https://reader036.vdocument.in/reader036/viewer/2022081603/56814911550346895db649e3/html5/thumbnails/4.jpg)
Plan Selection Problem
[Figure: task workflow → plan enumeration → cost of each plan → choose best plan]
Plans  CPU     Storage  Cost
P1     Site A  Site A   T1
P2     Site B  Site A   T2
…      …       …        …
Cost: Plan Execution Time
Challenge: Need cost models to estimate plan execution time
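The plan selection step above can be sketched in a few lines of Python. This is a minimal illustration, not NIMO's implementation: the plan table comes from the slide, but `toy_cost_model`, `HOME_SITE`, and every timing number are invented stand-ins for a learned cost model.

```python
# Plans from the slide's table: plan name -> (CPU site, storage site).
PLANS = {
    "P1": ("Site A", "Site A"),
    "P2": ("Site B", "Site A"),
    "P3": ("Site B", "Site B"),
}
HOME_SITE = "Site A"  # the task's input data starts here (from the slide)


def toy_cost_model(cpu_site, storage_site):
    """Hypothetical cost model: estimated execution time in seconds.

    All numbers are invented for illustration only.
    """
    compute_s = {"Site A": 120.0, "Site B": 80.0}[cpu_site]
    # Assumed cost of copying the input away from its home site.
    staging_s = 0.0 if storage_site == HOME_SITE else 90.0
    # Assumed penalty when the CPU reads its data from a remote site.
    remote_io_s = 0.0 if cpu_site == storage_site else 60.0
    return compute_s + staging_s + remote_io_s


def choose_best_plan(plans, cost_model):
    """Estimate the cost of every enumerated plan and return the cheapest."""
    costs = {name: cost_model(*assignment) for name, assignment in plans.items()}
    best = min(costs, key=costs.get)
    return best, costs


best_plan, estimated_costs = choose_best_plan(PLANS, toy_cost_model)
```

Swapping `toy_cost_model` for a learned model leaves the selection logic unchanged, which is why the quality of the cost model is the hard part.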
![Page 5: Active and Accelerated Learning of Cost Models for Optimizing Scientific Applications](https://reader036.vdocument.in/reader036/viewer/2022081603/56814911550346895db649e3/html5/thumbnails/5.jpg)
Generating Cost Models is Hard
• Non-declarative
– Scientific workflow tasks are usually scripts (MATLAB, Perl)
– Such tasks are not database operators like join or select
– Hence: task is a black box with no prior knowledge
• Heterogeneous resources
– Computational grid setting
– Performance varies a lot across resource assignments
• Data dependency
– Performance can vary significantly based on properties of input data & parameters to scripts
![Page 6: Active and Accelerated Learning of Cost Models for Optimizing Scientific Applications](https://reader036.vdocument.in/reader036/viewer/2022081603/56814911550346895db649e3/html5/thumbnails/6.jpg)
Problem Setting
• Scientific workflows at DSCR (Duke Shared Cluster Resource)
• Important scientific workflows are run repeatedly
– Opportunity to observe & learn task behavior
– Better plan selection for subsequent runs
• Sequential scientific workflows
– Each task runs on a single node
– >90% of workflows at DSCR are sequential
![Page 7: Active and Accelerated Learning of Cost Models for Optimizing Scientific Applications](https://reader036.vdocument.in/reader036/viewer/2022081603/56814911550346895db649e3/html5/thumbnails/7.jpg)
NIMO System
NonInvasive Modeling for Optimization
NIMO learns cost models for task workflows
– End-to-end cost models
• Incorporate properties of tasks, resources, & data
– Non-invasive
• No changes to tasks
– Automated and active
• Automatically collects training data for learning cost models
[Figure: NIMO next to the scheduler, with Sites A, B, and C and clusters C1, C2, and C3]
![Page 8: Active and Accelerated Learning of Cost Models for Optimizing Scientific Applications](https://reader036.vdocument.in/reader036/viewer/2022081603/56814911550346895db649e3/html5/thumbnails/8.jpg)
NIMO Fills a Gap
• Workflow Management Systems (WFMSs)
– WFMSs use database technology for managing all aspects of scientific workflows [Liu ’04, Shankar ’05]
• Batch scheduling systems
– Knowledge of plan execution time is assumed for optimizing resource assignments [Casanova ’00, Phan ’05, Kelly ’03]
NIMO generates cost models for these systems
![Page 9: Active and Accelerated Learning of Cost Models for Optimizing Scientific Applications](https://reader036.vdocument.in/reader036/viewer/2022081603/56814911550346895db649e3/html5/thumbnails/9.jpg)
Roadmap
• Cost models
• NIMO: active learning of cost models
• Experimental evaluation
• Related work
• Conclusions
• Future work
![Page 10: Active and Accelerated Learning of Cost Models for Optimizing Scientific Applications](https://reader036.vdocument.in/reader036/viewer/2022081603/56814911550346895db649e3/html5/thumbnails/10.jpg)
Cost Model
[Figure: task, resource assignment, and input data feed the cost model for a task, which outputs the execution time; shown for a task workflow]
Total workflow execution time can be derived using the cost models for individual tasks
![Page 11: Active and Accelerated Learning of Cost Models for Optimizing Scientific Applications](https://reader036.vdocument.in/reader036/viewer/2022081603/56814911550346895db649e3/html5/thumbnails/11.jpg)
Task Cost Model
• Compute phase (compute resource busy): Oa (compute occupancy)
• Stall phase (compute resource stalled on I/O): Os (stall occupancy), made up of On (network occupancy) and Od (storage occupancy)
T = D * (Oa + On + Od), where T is the execution time and D is the total data
occupancy: average time spent per unit of data
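As a worked example of the occupancy formula, the sketch below evaluates T = D * (Oa + On + Od). The function name, the units (MB and ms/MB), and the sample values are assumptions chosen for illustration; they are not measurements from the talk.

```python
def execution_time_ms(total_data_mb, o_compute, o_network, o_storage):
    """Task cost model T = D * (Oa + On + Od).

    Occupancies are average times per unit of data (here: ms per MB),
    so the result is in milliseconds.
    """
    return total_data_mb * (o_compute + o_network + o_storage)


# Invented example: 1000 MB of data, 20 ms/MB compute,
# 5 ms/MB network stall, 10 ms/MB storage stall.
t_ms = execution_time_ms(1000, o_compute=20, o_network=5, o_storage=10)
```

Here the stall occupancy is 15 ms/MB, so a faster CPU can reduce at most the 20 ms/MB compute share of the total.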
![Page 12: Active and Accelerated Learning of Cost Models for Optimizing Scientific Applications](https://reader036.vdocument.in/reader036/viewer/2022081603/56814911550346895db649e3/html5/thumbnails/12.jpg)
Cost Model
[Figure: the task, resource assignment, and input data are described by a task profile, a resource profile, and a data profile; the cost model outputs the execution time]
T = D * (Oa + On + Od)
![Page 13: Active and Accelerated Learning of Cost Models for Optimizing Scientific Applications](https://reader036.vdocument.in/reader036/viewer/2022081603/56814911550346895db649e3/html5/thumbnails/13.jpg)
Learning Cost Models
Learning the cost model = Learning profiles + Learning predictors
![Page 14: Active and Accelerated Learning of Cost Models for Optimizing Scientific Applications](https://reader036.vdocument.in/reader036/viewer/2022081603/56814911550346895db649e3/html5/thumbnails/14.jpg)
Statistical Learning of Predictors
Independent variables: Resource profile ( ), Data profile ( )
Dependent variables
Ex: Learn each predictor as a regression model from the training data
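A minimal sketch of learning one predictor as a regression model follows. It assumes a single resource-profile attribute (CPU speed) and a compute occupancy that is linear in 1/speed; the training samples are invented and noise-free, and the talk does not state which regression form NIMO uses.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


# Invented training samples: (CPU speed in GHz, compute occupancy in ms/MB).
samples = [(1.0, 42.0), (2.0, 22.0), (4.0, 12.0)]

# Assumed feature transform: occupancy is modeled as linear in 1 / speed.
xs = [1.0 / ghz for ghz, _ in samples]
ys = [occupancy for _, occupancy in samples]
intercept, slope = fit_line(xs, ys)


def predict_compute_occupancy(cpu_ghz):
    """Predictor for compute occupancy learned from the samples above."""
    return intercept + slope / cpu_ghz
```

The independent variable here is a resource-profile attribute and the dependent variable is an occupancy, matching the split on the slide.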
![Page 15: Active and Accelerated Learning of Cost Models for Optimizing Scientific Applications](https://reader036.vdocument.in/reader036/viewer/2022081603/56814911550346895db649e3/html5/thumbnails/15.jpg)
Challenges in Learning
• Cost of sample acquisition
• Coverage of system operating range
• Curse of dimensionality
– Suppose: 10 profile attributes × 10 values per attribute, and 5 minutes for a task run (sample). If we sample 1% of the space to build the cost model, that takes 951 years!
[Figure: accuracy of the current best model vs. elapsed time, comparing passive learning with active & accelerated learning; the upper bound is the best accuracy possible]
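The 951-year figure follows from the slide's own numbers. The arithmetic below assumes samples run one after another and a 365-day year.

```python
attributes = 10
values_per_attribute = 10
minutes_per_sample = 5
sample_percent = 1

# Every combination of attribute values is one point in the space.
space_size = values_per_attribute ** attributes      # 10**10 points
samples = space_size * sample_percent // 100         # 10**8 task runs
total_minutes = samples * minutes_per_sample         # 5 * 10**8 minutes

minutes_per_year = 60 * 24 * 365
years = total_minutes / minutes_per_year             # about 951.3 years
```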
![Page 16: Active and Accelerated Learning of Cost Models for Optimizing Scientific Applications](https://reader036.vdocument.in/reader036/viewer/2022081603/56814911550346895db649e3/html5/thumbnails/16.jpg)
Active (and Accelerated) Learning
• Which predictors are important?
• Which profile attributes should each predictor have?
• What values to consider for each profile attribute during training?
Resource profile Data profile
![Page 17: Active and Accelerated Learning of Cost Models for Optimizing Scientific Applications](https://reader036.vdocument.in/reader036/viewer/2022081603/56814911550346895db649e3/html5/thumbnails/17.jpg)
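NIMO System
[Figure: NIMO architecture, with the following labeled components]
• NIMO workbench
• WAN emulator (nistnet)
• Task profiler
• Resource profiler
• Data profiler
• Run standard benchmarks
• Training set database
• Active & accelerated learning
• Scheduler, with Sites A, B, and C and clusters C1, C2, and C3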
![Page 18: Active and Accelerated Learning of Cost Models for Optimizing Scientific Applications](https://reader036.vdocument.in/reader036/viewer/2022081603/56814911550346895db649e3/html5/thumbnails/18.jpg)
Active Learning Algorithm
Initialization
While( ) {
}
![Page 19: Active and Accelerated Learning of Cost Models for Optimizing Scientific Applications](https://reader036.vdocument.in/reader036/viewer/2022081603/56814911550346895db649e3/html5/thumbnails/19.jpg)
Active Learning Algorithm
Initialization
While( ) {
  Pick a new assignment
  Run task on chosen assignment
  Relearn predictors
}
Relearn Predictors
• Relearn predictors with the new set of training samples
• Compute current prediction error of each predictor
– Fixed test set
– Cross-validation
[Figure: sample training-set rows with resource attribute values such as 1GHz, 512MB, and 10ms]
![Page 20: Active and Accelerated Learning of Cost Models for Optimizing Scientific Applications](https://reader036.vdocument.in/reader036/viewer/2022081603/56814911550346895db649e3/html5/thumbnails/20.jpg)
Active Learning Algorithm
Initialization
While( ) {
  Choose a predictor to refine
  Choose attributes for the predictor
  Choose attribute values for the run
  Run task on chosen assignment
  Relearn predictors
}
Predictor Choice
• Predictors – fa, fn, fd, fD
• Order predictors + Traverse this order
– Ex: relevance-based order (Plackett-Burman)
– Ex: choose predictor with current max. error
![Page 21: Active and Accelerated Learning of Cost Models for Optimizing Scientific Applications](https://reader036.vdocument.in/reader036/viewer/2022081603/56814911550346895db649e3/html5/thumbnails/21.jpg)
Active Learning Algorithm
Attribute Choice
• Each predictor takes profile attributes as input
• Not all attributes are equally relevant
• Order attributes + Traverse this order
![Page 22: Active and Accelerated Learning of Cost Models for Optimizing Scientific Applications](https://reader036.vdocument.in/reader036/viewer/2022081603/56814911550346895db649e3/html5/thumbnails/22.jpg)
Active Learning Algorithm
Value Choice
• Cover the operating range of attributes
• Expose main interactions with other attributes
![Page 23: Active and Accelerated Learning of Cost Models for Optimizing Scientific Applications](https://reader036.vdocument.in/reader036/viewer/2022081603/56814911550346895db649e3/html5/thumbnails/23.jpg)
Experimental Results
• Biomedical workflows (from DSCR)
– BLAST, fMRI, NAMD, CardioWave
– Single task workflows
• Plan space in the heterogeneous networked utility
– 5 CPU speeds, 6 network latencies, 5 memory sizes
– 5 × 6 × 5 = 150 resource plans
• Goal: Converge quickly to a fairly accurate cost model
– We use regression models for the predictors
– Model validation details in previous work (ICAC 2005)
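The 150-plan space above is the cross product of the three resource attributes. The sketch below enumerates it; the specific speeds, latencies, and memory sizes are placeholders, since the slides give only the counts.

```python
from itertools import product

# Placeholder values: only the counts (5, 6, 5) come from the slide.
cpu_speeds_ghz = [0.5, 1.0, 1.5, 2.0, 2.5]
network_latencies_ms = [0, 2, 6, 10, 14, 18]
memory_sizes_mb = [64, 128, 256, 512, 1024]

# Each resource plan is one (CPU speed, latency, memory) combination.
resource_plans = list(product(cpu_speeds_ghz, network_latencies_ms, memory_sizes_mb))
plan_count = len(resource_plans)
```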
![Page 24: Active and Accelerated Learning of Cost Models for Optimizing Scientific Applications](https://reader036.vdocument.in/reader036/viewer/2022081603/56814911550346895db649e3/html5/thumbnails/24.jpg)
Performance Summary
• Error: Mean absolute % error in predicted execution time
• A separate test set for evaluating the error
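The error metric named above can be computed as in this sketch. The function name and the sample execution times are illustrative assumptions, not results from the evaluation.

```python
def mean_absolute_percentage_error(actual, predicted):
    """Mean absolute % error of predicted vs. actual execution times."""
    if not actual or len(actual) != len(predicted):
        raise ValueError("need two non-empty sequences of equal length")
    total = sum(abs(p - a) / a for a, p in zip(actual, predicted))
    return 100.0 * total / len(actual)


# Invented test set: measured and predicted execution times in seconds.
actual_times = [100.0, 200.0, 400.0]
predicted_times = [110.0, 180.0, 400.0]
error_percent = mean_absolute_percentage_error(actual_times, predicted_times)
```

The per-sample errors here are 10%, 10%, and 0%, so the mean is about 6.7%.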
![Page 25: Active and Accelerated Learning of Cost Models for Optimizing Scientific Applications](https://reader036.vdocument.in/reader036/viewer/2022081603/56814911550346895db649e3/html5/thumbnails/25.jpg)
BLAST Application: Predictor Choice
![Page 26: Active and Accelerated Learning of Cost Models for Optimizing Scientific Applications](https://reader036.vdocument.in/reader036/viewer/2022081603/56814911550346895db649e3/html5/thumbnails/26.jpg)
BLAST Application: Attribute Choice
![Page 27: Active and Accelerated Learning of Cost Models for Optimizing Scientific Applications](https://reader036.vdocument.in/reader036/viewer/2022081603/56814911550346895db649e3/html5/thumbnails/27.jpg)
Related Work
• Workflow Management Systems (WFMSs)
– [Shankar ’05, Liu ’04 etc.]
• Performance prediction in scientific applications
– [Carrington ’05, Rosti ’02, etc.]
• Learning cost models using statistical techniques
– [Zhang ’05, Zhu ’96, etc.]
• NIMO is end-to-end, noninvasive, and active (acquires model learning data automatically)
![Page 28: Active and Accelerated Learning of Cost Models for Optimizing Scientific Applications](https://reader036.vdocument.in/reader036/viewer/2022081603/56814911550346895db649e3/html5/thumbnails/28.jpg)
Conclusions
• NIMO:
– Learns cost models for scientific workflows
– Noninvasive and end-to-end
– Active and accelerated learning: Learns accurate cost models quickly
– Fills a gap in Workflow Management Systems
![Page 29: Active and Accelerated Learning of Cost Models for Optimizing Scientific Applications](https://reader036.vdocument.in/reader036/viewer/2022081603/56814911550346895db649e3/html5/thumbnails/29.jpg)
Future Work
• NIMO + SHIRAKO
– A policy-based resource-leasing system that can slice-and-dice virtualized resources
• NIMO + Fa
– Processing system-management queries (e.g., root-cause diagnosis, forecasting performance problems, capacity-planning)
[Figure: NIMO next to the scheduler, with Sites A, B, and C and clusters C1, C2, and C3]
![Page 30: Active and Accelerated Learning of Cost Models for Optimizing Scientific Applications](https://reader036.vdocument.in/reader036/viewer/2022081603/56814911550346895db649e3/html5/thumbnails/30.jpg)
Backup Slides for Explanation
![Page 31: Active and Accelerated Learning of Cost Models for Optimizing Scientific Applications](https://reader036.vdocument.in/reader036/viewer/2022081603/56814911550346895db649e3/html5/thumbnails/31.jpg)
See Paper for Details of Steps
• Each algorithm step has sub-algorithms
• Example: Choosing the predictor to refine in current step
– Goal: learn most relevant predictors first
– Static vs. dynamic ordering
• Static:
– Define total order: a priori or using estimates of influence (Plackett-Burman)
– Traverse the order: round-robin vs. improvement-threshold-based
• Dynamic: choose the predictor with maximum current prediction error
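The ordering strategies listed above might look like the sketch below. The function names are hypothetical, and the reading of "improvement-threshold-based" (stay on a predictor while its error keeps improving by at least a threshold, then move to the next one, wrapping around) is an assumption; the paper defines the actual rules.

```python
STATIC_ORDER = ["fa", "fn", "fd", "fD"]  # predictor names from the slides


def dynamic_choice(errors):
    """Dynamic ordering: the predictor with maximum current prediction error."""
    return max(errors, key=errors.get)


def round_robin_choice(order, step):
    """Static order, round-robin traversal: cycle through the total order."""
    return order[step % len(order)]


def threshold_choice(order, current, last_improvement, threshold):
    """Static order, improvement-threshold traversal (assumed semantics).

    Keep refining the current predictor while the last run reduced its
    error by at least `threshold`; otherwise advance to the next one.
    """
    if last_improvement >= threshold:
        return current
    return order[(order.index(current) + 1) % len(order)]


picked_dynamic = dynamic_choice({"fa": 5.0, "fn": 12.0, "fd": 3.0, "fD": 1.0})
picked_round_robin = round_robin_choice(STATIC_ORDER, step=4)
picked_threshold = threshold_choice(STATIC_ORDER, "fa", last_improvement=0.5, threshold=1.0)
```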
![Page 32: Active and Accelerated Learning of Cost Models for Optimizing Scientific Applications](https://reader036.vdocument.in/reader036/viewer/2022081603/56814911550346895db649e3/html5/thumbnails/32.jpg)
Active and Accelerated Learning
![Page 33: Active and Accelerated Learning of Cost Models for Optimizing Scientific Applications](https://reader036.vdocument.in/reader036/viewer/2022081603/56814911550346895db649e3/html5/thumbnails/33.jpg)
Latency hiding
![Page 34: Active and Accelerated Learning of Cost Models for Optimizing Scientific Applications](https://reader036.vdocument.in/reader036/viewer/2022081603/56814911550346895db649e3/html5/thumbnails/34.jpg)
Saturation