utility driven adaptive workflow execution

Combining the strengths of UMIST and The Victoria University of Manchester. Utility Driven Adaptive Workflow Execution. Kevin Lee, School of Computer Science, University of Manchester. Currently at University of Mannheim, Germany. [email protected] www.kevin-lee.co.uk/research.html. 20th May 2009.


TRANSCRIPT

Page 1: Utility Driven Adaptive Workflow Execution

Combining the strengths of UMIST and The Victoria University of Manchester

Utility Driven Adaptive Workflow Execution

Kevin Lee School of Computer Science, University of Manchester

Currently at University of Mannheim, Germany

[email protected]

www.kevin-lee.co.uk/research.html

20th May 2009

Page 2: Utility Driven Adaptive Workflow Execution

1. Problem Overview

Concerning: scientific workflows executing on Grids

Characteristics:
•Very long running
•Small delays can have large effects due to dependencies
•Involve highly distributed resources
•Limited control over resources
•Uncertain execution and batch queue times

Statically schedule a workflow before it starts executing:
•Using current information about the execution environment

What happens if the environment changes?
•Resources appear/disappear
•Loads change due to resources being used

Obvious solution: adapt at runtime!

Page 3: Utility Driven Adaptive Workflow Execution

2. Background: Montage workflow

Mosaic created by Montage from a run of the M101 galaxy images

Figure: a simple Montage workflow.

Can execute on Grid resources.

Can be specified in a high-level abstract form:
•Logical files
•Logical transformations

Montage

•Deliver science-grade mosaics on demand

•Produce mosaics from a wide range of data sources

•User-specified parameters of projection, coordinates, size, rotation and spatial sampling

Page 4: Utility Driven Adaptive Workflow Execution

2. Background

Pegasus workflow execution:

•Compilation: abstract (logical) -> concrete

•Submission: graph dependency manager

•Execution: jobs execute on grid resources

•Reporting: task and workflow status

Page 5: Utility Driven Adaptive Workflow Execution

3. Adaptive Workflow Execution

Retrofit an adaptivity framework to Pegasus

Minimal changes to Pegasus

Touch points via Sensors and Effectors

Page 6: Utility Driven Adaptive Workflow Execution

3. Adaptive Workflow Execution

When a grid site has contention

•Batch queue times are higher than estimated

When a grid site is under-utilised

•Batch queue times are lower than estimated

A static schedule may be initially correct

•It diverges from the ideal with time.

Aim

•Adapt to changing batch queue times

Page 7: Utility Driven Adaptive Workflow Execution

3. Adaptive Workflow Execution

Sensors -> Monitoring

To monitor the progress of an executing workflow, we parse the live log. Example:

2/17 11:53:14 Event: ULOG_GRID_SUBMIT for Condor Node mBackground_ID000023 (4713.0)
2/17 11:53:14 Event: ULOG_EXECUTE for Condor Node mBackground_ID000021 (4709.0)
2/17 11:53:14 Number of idle job procs: 4
2/17 11:53:20 Event: ULOG_EXECUTE for Condor Node mBackground_ID000019 (4708.0)
2/17 11:53:20 Number of idle job procs: 3
2/17 11:53:28 Event: ULOG_JOB_TERMINATED for Condor Node mBackground_ID000022 (4710.0)
2/17 11:53:28 Node mBackground_ID000022 job proc (4710.0) completed successfully.

RegEx:

([\d]+)/([\d]+).([\d]+):([\d]+):([\d]+).Event:.([\S]+_[\S]+).for.Condor.Node.([a-zA-Z0-9_]+)

Result: XML events for job queued, executed and terminated, made available to analysis as a stream.
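As a rough illustration of the monitoring step, the RegEx from this slide can be applied to the sample log lines with Python's `re` module. The function names, the dictionary keys and the `<event>` element layout are assumptions made for this sketch; the slides do not show the actual XML schema.

```python
import re
import xml.etree.ElementTree as ET

# Pattern copied from the slide; "." is used there as a loose separator.
LOG_PATTERN = re.compile(
    r"([\d]+)/([\d]+).([\d]+):([\d]+):([\d]+).Event:.([\S]+_[\S]+)"
    r".for.Condor.Node.([a-zA-Z0-9_]+)"
)


def parse_line(line):
    """Return a dict for an event line, or None for any other log line."""
    match = LOG_PATTERN.search(line)
    if match is None:
        return None
    month, day, hour, minute, second, event, node = match.groups()
    return {
        "month": int(month), "day": int(day),
        "h": int(hour), "m": int(minute), "s": int(second),
        "event": event, "job": node,
    }


def to_xml(record):
    """Serialise one parsed event as XML (hypothetical element layout)."""
    element = ET.Element("event", {k: str(v) for k, v in record.items()})
    return ET.tostring(element, encoding="unicode")


sample = [
    "2/17 11:53:14 Event: ULOG_GRID_SUBMIT for Condor Node mBackground_ID000023 (4713.0)",
    "2/17 11:53:14 Number of idle job procs: 4",
    "2/17 11:53:28 Event: ULOG_JOB_TERMINATED for Condor Node mBackground_ID000022 (4710.0)",
]

# Lines that are not events (such as the idle-proc counter) are skipped.
events = [r for r in map(parse_line, sample) if r is not None]
for record in events:
    print(to_xml(record))
```

Only lines containing `Event:` produce a record, so status lines pass through without raising errors.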

Page 8: Utility Driven Adaptive Workflow Execution

3. Adaptive Workflow Execution

Analysis

Uses the CQL continuous query language to group and analyse the events. CQL is SQL-like but with extensions for queries over time.

1. Calculates current average job queue times over a period of time
2. Causes re-planning when queue times are more or less than expected

<vquery>
  <cqlvquery>select h*3600+m*60+s,job,site,est from workflowlog where event="ULOG_SUBMIT";</cqlvquery>
  <cqlvtable>register stream submittedjobs (time int, job char(22), site char(22), est int);</cqlvtable>
</vquery>
<vquery>
  <cqlvquery>select h*3600+m*60+s,job from workflowlog where event="ULOG_EXECUTE";</cqlvquery>
  <cqlvtable>register stream executedjobs (time int, job char(22));</cqlvtable>
</vquery>
<vquery>
  <cqlvquery>Rstream (select executed.time-submitted.time, executed.job, submitted.site, submitted.est from executedjobs[Range 360 Seconds] as executed,submittedjobs as submitted where executed.job=submitted.job);</cqlvquery>
  <cqlvtable>register stream jobdelay (delay int, job char(22), site char(22), est int);</cqlvtable>
</vquery>
<cqlquery>select site, delay, est, (delay-est) from jobdelay where (delay-est)>20;</cqlquery>

Output from this query triggers planning
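The logic of the CQL queries can be approximated in plain Python to show what they compute: join submitted and executed events on the job name, derive the queue delay, and report jobs whose delay exceeds the estimate by more than 20 seconds. This is a sketch, not the CQL engine itself; the type names, the `late_jobs` function, the submission times, sites and estimates are all hypothetical values chosen for illustration.

```python
from typing import NamedTuple


class Submitted(NamedTuple):
    time: int   # seconds since midnight: h*3600 + m*60 + s
    job: str
    site: str
    est: int    # estimated queue time in seconds


class Executed(NamedTuple):
    time: int
    job: str


def to_seconds(h, m, s):
    return h * 3600 + m * 60 + s


def late_jobs(submitted, executed, now, window=360, threshold=20):
    """Mimic the jobdelay stream and the final filter of the queries."""
    by_job = {s.job: s for s in submitted}
    results = []
    for e in executed:
        # Approximates "executedjobs[Range 360 Seconds]".
        if e.time > now or now - e.time > window:
            continue
        s = by_job.get(e.job)
        if s is None:
            continue
        delay = e.time - s.time
        if delay - s.est > threshold:
            results.append((s.site, delay, s.est, delay - s.est))
    return results


# Hypothetical submission times, sites and estimates for two jobs.
submitted = [
    Submitted(to_seconds(11, 50, 0), "mBackground_ID000021", "cluster1", 60),
    Submitted(to_seconds(11, 52, 30), "mBackground_ID000019", "cluster2", 60),
]
executed = [
    Executed(to_seconds(11, 53, 14), "mBackground_ID000021"),
    Executed(to_seconds(11, 53, 20), "mBackground_ID000019"),
]
flagged = late_jobs(submitted, executed, now=to_seconds(11, 54, 0))
print(flagged)
```

In this example only the first job is reported: it waited 194 seconds against an estimate of 60, while the second started 10 seconds earlier than estimated.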

Page 9: Utility Driven Adaptive Workflow Execution

3. Adaptive Workflow Execution

Planning

Planning has the task of recalculating a better assignment for the workflow.

Data we have:
•Workflow DAG
•Current assignment
•Collected data about resources: number of CPUs, execution times, average queue time
•What we’ve submitted since the execution started

Approach:

Call out to a Matlab-based utility function optimiser (MADS)

Each iteration:

•The optimiser proposes a new potential assignment

•We provide a function that evaluates the new potential assignment

Proceed with the search until the best assignment is found
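The propose-and-evaluate loop can be sketched as follows. This is not MADS: a simple greedy local search stands in for the optimiser, and the toy utility function, site speeds and task names are assumptions made only so the example runs.

```python
def local_search(tasks, sites, initial, evaluate, max_iterations=100):
    """Greedy stand-in for the optimiser: move one task at a time."""
    best = dict(initial)
    best_utility = evaluate(best)
    for _ in range(max_iterations):
        improved = False
        for task in tasks:
            for site in sites:
                if site == best[task]:
                    continue
                # Propose a new potential assignment.
                candidate = dict(best)
                candidate[task] = site
                # Evaluate it with the caller-provided utility function.
                utility = evaluate(candidate)
                if utility > best_utility:
                    best, best_utility = candidate, utility
                    improved = True
        if not improved:
            break  # no single move improves the utility any further
    return best, best_utility


SPEED = {"cluster1": 1.0, "cluster2": 2.0}  # hypothetical relative speeds


def toy_utility(assignment):
    """Hypothetical utility: negative of the most loaded site's finish time."""
    load = {site: 0 for site in SPEED}
    for site in assignment.values():
        load[site] += 1
    makespan = max(load[site] / SPEED[site] for site in SPEED)
    return -makespan


tasks = ["t1", "t2", "t3", "t4"]
initial = {task: "cluster1" for task in tasks}
best, best_utility = local_search(tasks, list(SPEED), initial, toy_utility)
print(best, best_utility)
```

Because the optimiser only sees the utility value, the same loop works unchanged whichever utility function is plugged in.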

Page 10: Utility Driven Adaptive Workflow Execution

3. Adaptive Workflow Execution

Planning

Firstly, for each proposed new assignment we calculate estimated queue times.

Estimated Queue Time (EQT): based on external demand, the new demand and the change in actual queue times

•An estimate of external demand for a period p

•Assigned demand for the period p

•The candidate demand: the demand we’ll put on the resources

Full explanation in papers

Page 11: Utility Driven Adaptive Workflow Execution

3. Adaptive Workflow Execution

Planning

Next, calculate the Predicted Response Time (PRT) for the workflow:

•Completion time of the last task plus any adaptation cost

•Recursive formula to estimate the completion time of the last task

So now we have an estimate of how long a workflow will take for each new assignment.

We need a way of judging how good an assignment is in relation to its PRT and the resources used.
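A recursive completion-time estimate over a DAG might look like the sketch below. The exact formula is in the papers, not on the slide, so this assumes a common form: a task completes at the latest completion time of its predecessors, plus the estimated queue time of its site, plus its execution time there. The DAG, times and adaptation cost are hypothetical.

```python
from functools import lru_cache


def predicted_response_time(dag, exec_time, queue_time, assignment,
                            adaptation_cost=0):
    """dag maps each task to the list of tasks it depends on."""

    @lru_cache(maxsize=None)
    def completion(task):
        # A task becomes ready when its slowest predecessor has finished.
        ready = max((completion(p) for p in dag[task]), default=0)
        site = assignment[task]
        return ready + queue_time[site] + exec_time[(task, site)]

    # Completion time of the last task plus any adaptation cost.
    return max(completion(task) for task in dag) + adaptation_cost


# Hypothetical diamond-shaped workflow and timings (seconds).
dag = {"a": [], "b": ["a"], "c": ["a"], "d": ["b", "c"]}
queue_time = {"cluster1": 30, "cluster2": 10}
exec_time = {(t, "cluster1"): 20 for t in dag}
exec_time.update({(t, "cluster2"): 10 for t in dag})
assignment = {"a": "cluster2", "b": "cluster1", "c": "cluster2", "d": "cluster2"}

prt = predicted_response_time(dag, exec_time, queue_time, assignment,
                              adaptation_cost=5)
print(prt)
```

Here task `b` on the slower, busier site sits on the critical path, which is the kind of effect the planner can remove by trying a different assignment.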

Page 12: Utility Driven Adaptive Workflow Execution

3. Adaptive Workflow Execution

Option 1: Utility for Response time:

Purely tries to use the fastest resources available to complete the workflow

EQT ensures a resource isn’t overloaded

The utility is therefore just:

The higher the Utility value the better

The optimiser will try multiple candidate assignments until a ‘good’ one is found

Planning

Page 13: Utility Driven Adaptive Workflow Execution

3. Adaptive Workflow Execution

Planning

Option 2: Utility for Profit:

As resources are not free, we attach a value to using resources:

•A reward for completing a workflow within a target time

•A cost for using a resource to execute a task

Cost for a workflow assignment:

Profit is a measure of utility minus cost

The utility is a calculation of how likely the assignment is to complete before the target response time

The larger the ‘profit’ the better for the optimiser
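The profit calculation can be sketched as utility minus cost. The slide does not give the likelihood function, so a logistic curve is used here purely as a placeholder assumption; the `spread` parameter and function names are hypothetical. The reward of 100 and the per-task costs of 1 and 2 are the values quoted in the experimental setup.

```python
import math


def completion_probability(prt, target, spread=60.0):
    """Placeholder likelihood of finishing by the target (assumed logistic)."""
    return 1.0 / (1.0 + math.exp((prt - target) / spread))


def assignment_cost(assignment, site_cost):
    """Sum the cost of the resource used by each task."""
    return sum(site_cost[site] for site in assignment.values())


def profit(prt, target, assignment, site_cost, reward=100.0):
    """Profit = utility (reward weighted by likelihood) minus resource cost."""
    utility = reward * completion_probability(prt, target)
    return utility - assignment_cost(assignment, site_cost)


site_cost = {"cluster1": 1, "cluster2": 2}
assignment = {"t1": "cluster1", "t2": "cluster1", "t3": "cluster2"}

# A predicted response time exactly on target gives a likelihood of 0.5.
on_target = profit(prt=600, target=600, assignment=assignment,
                   site_cost=site_cost)
early = profit(prt=300, target=600, assignment=assignment,
               site_cost=site_cost)
print(on_target, early)
```

With this shape, paying for a faster resource is only worthwhile when the gain in likelihood outweighs the extra cost, which matches the behaviour described for U(Profit).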

Page 14: Utility Driven Adaptive Workflow Execution

3. Adaptive Workflow Execution

Execution/Deploying a new assignment

For a new assignment:

1. Tell the local DAG manager to halt the workflow(s)

2. Collect the locations of all the partial results

3. Modify local databases with this new data

4. Replan the workflow(s) with the new assignment

5. Deploy the workflow

6. Continue monitoring the new execution

This repeats every time a new assignment is available
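The deployment steps can be read as a small orchestration routine. The sketch below only fixes the order of the steps; every callable passed in is a hypothetical stand-in for the real interaction with the DAG manager and Pegasus, and the file location is invented for the demonstration.

```python
def deploy_new_assignment(assignment, halt, collect_partial_results,
                          replica_catalog, replan, submit):
    """Run the halt, collect, update, replan, deploy steps in order."""
    halt()                                      # 1. halt the workflow(s)
    partial = collect_partial_results()         # 2. locate partial results
    replica_catalog.update(partial)             # 3. record them locally
    plan = replan(assignment, replica_catalog)  # 4. replan with new assignment
    submit(plan)                                # 5. deploy the workflow
    return plan                                 # 6. caller resumes monitoring


# Recording stand-ins so the call order can be inspected.
calls = []
catalog = {}


def fake_halt():
    calls.append("halt")


def fake_collect():
    calls.append("collect")
    return {"proj_1.fits": "cluster1:/scratch/proj_1.fits"}  # invented path


def fake_replan(assignment, known_files):
    calls.append("replan")
    return {"assignment": assignment, "reuse": sorted(known_files)}


def fake_submit(plan):
    calls.append("deploy")


plan = deploy_new_assignment({"t1": "cluster2"}, fake_halt, fake_collect,
                             catalog, fake_replan, fake_submit)
print(calls, plan)
```

Recording partial results before replanning is what lets the new plan reuse completed work instead of rerunning it.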

Page 15: Utility Driven Adaptive Workflow Execution

4. Experimental Evaluation

Workflow:

27-node Montage workflow of M17

Takes between 20 mins and a few hours depending on resources

Profit gain is 100 for completing within the target

Resources:

Two Clusters.

Linux, Sun Grid Engine, WSGRAM, Globus.

(1) is less powerful with longer queue times

(2) is more powerful with shorter queue times

(2) costs more than (1). (1) costs 1, (2) costs 2.

Page 16: Utility Driven Adaptive Workflow Execution

4. Experimental Evaluation

Experiment 1

Single workflow. Periodic Load Applied to Cluster 1.

Utility based on response time

The adaptive version performs an adaptation, which results in a faster workflow

Page 17: Utility Driven Adaptive Workflow Execution

4. Experimental Evaluation

Experiment 2: Same as experiment 1 but for different target response times

•U(RT) always performs the best.
•U(Profit) meets the high and mid target response times at less cost than U(RT).
•U(Profit) fails to meet the low target response time, so uses the cheapest resources.

Page 18: Utility Driven Adaptive Workflow Execution

4. Experimental Evaluation

Experiment 3

Two Montage workflows. Periodic Load Applied to Cluster 1.

Achieved by submitting and monitoring two workflows at the same time.

Utility is the Sum of all U(RT) and U(Profit) for all workflows.

•U(RT) always performs the best.
•U(Profit) meets the loose and mid target response times at less cost than U(RT).
•U(Profit) fails to meet the tight target response time, so uses the cheapest resources.

Page 19: Utility Driven Adaptive Workflow Execution

5. Conclusions

An Approach to optimising workflow execution:

•Long running workflows

•Takes into account a workflow’s structure

•Takes into account current loads

•Takes into account the loads we will apply

•Minimal intervention to workflow infrastructure

•Good results for Response time and Profit focus

Page 20: Utility Driven Adaptive Workflow Execution

Questions?