Opportune Job Shredding: An Efficient Approach for Scheduling Parameter Sweep Applications

Post on 23-Feb-2016

Opportune Job Shredding: An Efficient Approach for Scheduling Parameter Sweep Applications

Rohan Kurian, Pavan Balaji, P. Sadayappan

The Ohio State University

Parameter Sweep Applications

• An important class of applications
• Set of independent tasks
• MCell Application
  • 3D simulations for sub-cellular architecture/physiology
• GTOMO (Parallel Tomography) Application
  • Multiple view-point simulation
• Systems exist for scheduling on the Grid
• Cluster-based Scheduling?

Application Level Schedulers

• Manage the scheduling of applications
• Break the application into appropriate chunks
• APST (AppLeS Parameter Sweep Template)
• NIMROD
• Greedy approach to schedule PSA chunks

Presentation Roadmap

• Job Scheduling in Clusters
• Multi-Site Job Scheduling
• PSA Scheduling Strategies
• Multi-Site Scheduling of PSAs
• Performance Evaluation
• Conclusions

Job Scheduling in Clusters

• Mapping arriving jobs to available resources
• Multiple Schemes for Scheduling
  • First Come First Serve (FCFS)
  • Conservative Scheduling
  • Aggressive or EASY Scheduling
• Fair-Share Constraints
  • A user cannot have more than ‘N’ queued jobs
• Submitting the multiple chunks of a PSA job
  • Violation of Fair-Share constraints
  • Combine chunks to form a single parallel job
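
The fair-share problem above can be illustrated with a minimal sketch. It assumes every PSA task needs one processor, and it packs tasks longest-first onto the least-loaded processor; that packing rule, and the names `Job`, `violates_fair_share` and `combine_tasks`, are hypothetical and are not taken from the slides.

```python
from dataclasses import dataclass


@dataclass
class Job:
    user: str
    procs: int      # processors requested
    runtime: float  # estimated runtime, arbitrary time units


def violates_fair_share(queue, user, new_jobs, max_queued):
    """True if queuing `new_jobs` more jobs would push `user` past the limit N."""
    already_queued = sum(1 for job in queue if job.user == user)
    return already_queued + new_jobs > max_queued


def combine_tasks(user, task_runtimes, procs):
    """Pack single-processor tasks onto `procs` processors as one parallel job.

    Assumption: longest-task-first greedy packing; the slides do not say
    how the tasks are packed.
    """
    loads = [0.0] * procs
    for t in sorted(task_runtimes, reverse=True):
        # Give the task to the currently least-loaded processor.
        loads[loads.index(min(loads))] += t
    return Job(user=user, procs=procs, runtime=max(loads))


tasks = [4, 3, 3, 2, 2, 2]
# Six separately submitted tasks would exceed a limit of N = 5 queued jobs.
print(violates_fair_share([], "alice", len(tasks), max_queued=5))
# One combined parallel job stays within the limit.
job = combine_tasks("alice", tasks, procs=2)
print(violates_fair_share([], "alice", 1, max_queued=5))
print(job.procs, job.runtime)
```

The combined job counts as a single queue entry, which is why it satisfies the fair-share limit that the individual chunks would violate.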

Formation of PSAs in Clusters

[Figure: Small Independent Tasks are combined into a Parallel Parameter Sweep Application]

Multi-Site Job Scheduling

• Multiple Simultaneous Requests
  • Job submitted to multiple sites
  • Started on the earliest cluster
• Existing schemes have limitations
  • Heterogeneous Clusters
  • Different Scheduling Schemes

[Figure: Multiple simultaneous requests. Each of Site 1, Site 2 and Site 3 has a Meta Scheduler and a Local Scheduler, and receives Jobs.]
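
A minimal sketch of the multiple-simultaneous-requests idea: the job is offered to every site and runs where it can start earliest. The function name `place_job`, the use of estimated start times, and the withdrawal of the losing requests are assumptions made for illustration; the slides state only that the job starts on the earliest cluster.

```python
def place_job(estimated_start):
    """Pick the site whose local scheduler can start the job earliest.

    `estimated_start` maps a site name to its estimated start time.
    Returns the winning site and the sites whose requests are withdrawn
    (withdrawing the other requests is an assumption of this sketch).
    """
    if not estimated_start:
        raise ValueError("job must be submitted to at least one site")
    winner = min(estimated_start, key=estimated_start.get)
    withdrawn = sorted(site for site in estimated_start if site != winner)
    return winner, withdrawn


# Illustrative start times only; these numbers are not from the slides.
winner, withdrawn = place_job({"site1": 30.0, "site2": 12.0, "site3": 45.0})
print(winner, withdrawn)
```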

PSA Scheduling Strategies

• Flooding based Job Shredding
  • Submit all chunks in the PSA at once
  • Greedy approach
  • Improves User and System metrics
  • Doesn’t ensure fairness to Non-PSA jobs
• Opportune Job Shredding
  • Uses an additional Application-Level Scheduler
  • Monitors the current schedule of the system
  • If no normal backfill is possible
    • Allow PSA jobs to shred and backfill
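
The opportune rule can be sketched as a single decision step. This is a sketch under stated assumptions, not the authors' implementation: a hole in the schedule is modelled as `idle_procs` processors free for a finite `window` of time, every PSA task needs one processor, and the names `Job`, `PSA` and `opportune_step` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Job:
    name: str
    procs: int
    runtime: float


@dataclass
class PSA:
    name: str
    task_runtime: float  # every task is assumed to need one processor
    tasks_left: int


def opportune_step(queue, psa, idle_procs, window):
    """Decide how to use a hole of `idle_procs` processors lasting `window`.

    `window` must be finite. Returns ("backfill", job name),
    ("shred", task count) or ("idle", None).
    """
    # 1. Normal backfill has priority: take the first queued job that fits.
    for job in queue:
        if job.procs <= idle_procs and job.runtime <= window:
            return ("backfill", job.name)
    # 2. No normal backfill is possible: shred off as many PSA tasks as fit.
    rounds = int(window // psa.task_runtime)  # tasks per processor, back to back
    n_tasks = min(psa.tasks_left, idle_procs * rounds)
    if n_tasks > 0:
        return ("shred", n_tasks)
    return ("idle", None)


psa = PSA("sweep", task_runtime=10.0, tasks_left=20)
# The only queued job is too wide for the hole, so PSA tasks are shredded into it.
print(opportune_step([Job("wide", 8, 10.0)], psa, idle_procs=4, window=25.0))
# A queued job that fits is backfilled first and the PSA waits.
print(opportune_step([Job("narrow", 2, 20.0)], psa, idle_procs=4, window=25.0))
```

Checking the regular queue before the PSA is what separates this from flooding: PSA chunks only take capacity that no normal job could have used.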

Multi-Site Scheduling for PSAs

• Two-level Application Level Schedulers
• No constraints on sites
  • Allowed to have different speeds
  • Allowed to have different scheduling policies
• Similar to “Multiple Simultaneous Requests”
  • Simultaneous requests only for PSAs

Multi-Site Scheduling for PSAs

[Figure: A Meta Application-Level Scheduler spans Site 1, Site 2 and Site 3; each site has an App-Level Scheduler, a Job Queue and a Local Scheduler.]
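
One way to picture the two-level arrangement: each site's application-level scheduler reports a hole, and the meta application-level scheduler hands out PSA tasks until none remain. The class `MetaAppScheduler`, the helper `tasks_that_fit`, the hole sizes and the task work are all hypothetical; only the idea of sites with different speeds comes from the slides.

```python
class MetaAppScheduler:
    """Holds the PSA's unscheduled tasks and hands them out to sites on request."""

    def __init__(self, total_tasks):
        self.remaining = total_tasks

    def request(self, wanted):
        """Grant up to `wanted` tasks, never more than are left."""
        granted = min(wanted, self.remaining)
        self.remaining -= granted
        return granted


def tasks_that_fit(idle_procs, window, task_work, speed):
    """Tasks a site can run in a hole, assuming a task takes task_work / speed there."""
    per_proc = int((window * speed) // task_work)
    return idle_procs * per_proc


meta = MetaAppScheduler(total_tasks=40)
# (idle processors, hole length) reported by each site's application-level scheduler.
holes = {"site1": (4, 10.0), "site2": (4, 10.0), "site3": (4, 10.0)}
speeds = {"site1": 2.0, "site2": 1.0, "site3": 3.0}  # relative speeds, illustrative
grants = {
    site: meta.request(tasks_that_fit(procs, window, task_work=10.0, speed=speeds[site]))
    for site, (procs, window) in holes.items()
}
print(grants, meta.remaining)
```

Because each site only asks for what fits its own hole, the sites can run different local scheduling policies without the meta level needing to know about them.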

Performance Metrics

• Response Time
  • Completion Time - Submit Time
• Slowdown
  • Response Time / Runtime
• Loss of Capacity (LOC)
  • LOC = min {processors requested by waiting jobs, idle processors}
  • T = Time for which this state lasts
  • LOC = LOC x T
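
The three metrics can be written out as small functions. This is a sketch: the function names are hypothetical, and summing LOC x T over successive system states is an assumption, since the slide defines only the contribution of a single state.

```python
def response_time(submit, completion):
    """Response time = completion time - submit time."""
    return completion - submit


def slowdown(submit, completion, runtime):
    """Slowdown = response time / runtime."""
    return response_time(submit, completion) / runtime


def loss_of_capacity(states):
    """Accumulate min(waiting procs, idle procs) x T over system states.

    `states` is a list of (waiting_procs, idle_procs, duration) tuples.
    Summing over states is an assumption of this sketch.
    """
    return sum(min(waiting, idle) * duration for waiting, idle, duration in states)


# Illustrative values only; none of these numbers come from the slides.
print(response_time(submit=10, completion=70))
print(slowdown(submit=10, completion=70, runtime=20))
print(loss_of_capacity([(8, 4, 10), (2, 6, 5), (0, 3, 7)]))
```

A state with idle processors but no waiting jobs contributes nothing, so LOC counts only capacity that could have served queued work.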

Evaluation Scheme

• Simulation based Approach
• CTC trace from Feitelson’s archive
• EASY backfilling used
• For multi-site evaluation
  • CTC traces from 3 different months
  • Processing speeds in the ratio 2:1:3

Flooding Based Job Shredding

[Figure: Average Slowdown (10% PSA Jobs). x-axis: Load (1, 1.2, 1.5); y-axis: Percentage decrease; series: All Jobs, PSA Jobs, Non-PSA Jobs]

[Figure: Average Response Time (10% PSA Jobs). x-axis: Load (1, 1.2, 1.5); y-axis: Percentage decrease; series: All Jobs, PSA Jobs, Non-PSA Jobs]

• Up to 60% improvement for PSA Jobs
• Up to 90% worse performance for Non-PSA Jobs
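
The charts report each metric as a percentage decrease, so a positive value is an improvement and a negative value is a worsening. A minimal sketch of that calculation follows; the baseline is assumed to be the same workload scheduled without job shredding, which the slides do not state explicitly, and the numbers are illustrative rather than taken from the results.

```python
def percentage_decrease(baseline, value):
    """Positive result: the metric dropped (improved); negative: it got worse."""
    if baseline == 0:
        raise ValueError("baseline must be non-zero")
    return 100.0 * (baseline - value) / baseline


# Illustrative numbers only.
print(percentage_decrease(200.0, 80.0))   # metric fell from 200 to 80
print(percentage_decrease(100.0, 190.0))  # metric rose from 100 to 190
```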

Flooding: Job Category wise breakup

[Figure: Average Response Time (10% PSA Jobs). x-axis: Load (1, 1.2, 1.5); y-axis: Percentage decrease; series: NarrowShort, NarrowLong, WideShort, WideLong]

[Figure: Average Slowdown (10% PSA Jobs). x-axis: Load (1, 1.2, 1.5); y-axis: Percentage decrease; series: NarrowShort, NarrowLong, WideShort, WideLong]

• Narrow Short Non-PSA jobs suffer most
• Loss of back-filling opportunities is the main reason

Flooding: Loss of Capacity

[Figure: Loss Of Capacity (10% PSA Jobs). x-axis: Load (1, 1.2, 1.5); y-axis: Percentage decrease; series: 10% PSA Jobs]

• Up to 75% improvement in the Loss of Capacity

Opportune Job Shredding

[Figure: Average Response Time (10% PSA Jobs). x-axis: Load (1, 1.2, 1.5); y-axis: Percentage decrease; series: All Jobs, PSA Jobs, Non-PSA Jobs]

[Figure: Average Slowdown (10% PSA Jobs). x-axis: Load (1, 1.2, 1.5); y-axis: Percentage decrease; series: All Jobs, PSA Jobs, Non-PSA Jobs]

• Up to 70% improvement for PSA Jobs
• Less than 2% worsening in performance for Non-PSA Jobs

Opportune: Job Category wise breakup

[Figure: Average Response Time (10% PSA Jobs). x-axis: Load (1, 1.2, 1.5); y-axis: Percentage decrease; series: NarrowShort, NarrowLong, WideShort, WideLong]

[Figure: Average Slowdown (10% PSA Jobs). x-axis: Load (1, 1.2, 1.5); y-axis: Percentage decrease; series: NarrowShort, NarrowLong, WideShort, WideLong]

• No category of Non-PSA jobs suffers more than 7%

Opportune: Loss of Capacity

[Figure: Loss Of Capacity (10% PSA Jobs). x-axis: Load (1, 1.2, 1.5); y-axis: Percentage decrease; series: 10% PSA Jobs]

• Up to 12% improvement in the Loss of Capacity

Opportune (Multi-Site)

[Figure: Average Response Time (10% PSA Jobs). x-axis: Load (1, 1.2, 1.5); y-axis: Percentage decrease; series: PSA Jobs and Non-PSA Jobs for each of Cluster1, Cluster2, Cluster3]

[Figure: Average Slowdown (10% PSA Jobs). x-axis: Load (1, 1.2, 1.5); y-axis: Percentage decrease; series: PSA Jobs and Non-PSA Jobs for each of Cluster1, Cluster2, Cluster3]

• Up to 95% improvement for PSA Jobs
• No significant loss of performance for Non-PSA jobs

Opportune (Multi-Site): Response Time

[Figure: Average Response Time (10% PSA Jobs). x-axis: Load (1, 1.2, 1.5); y-axis: Percentage decrease; series: PSA Jobs and Non-PSA Jobs for each of Cluster1, Cluster2, Cluster3]

• Up to 75% improvement for PSA Jobs
• No significant loss of performance for Non-PSA jobs

Opportune (Multi-Site): Slowdown

[Figure: Average Slowdown (10% PSA Jobs). x-axis: Load (1, 1.2, 1.5); y-axis: Percentage decrease; series: PSA Jobs and Non-PSA Jobs for each of Cluster1, Cluster2, Cluster3]

• Up to 95% improvement for PSA Jobs
• No significant loss of performance for Non-PSA jobs

Opportune (Multi-Site): Loss of Capacity

[Figure: Loss Of Capacity (10% PSA Jobs). x-axis: Load (1, 1.2, 1.5); y-axis: Percentage decrease; series: Cluster1, Cluster2, Cluster3]

• Up to 45% improvement in the Loss of Capacity

Concluding Remarks

• Opportune Job Shredding
  • Efficient Scheduling of PSAs
  • Single Site and Multi-Site versions
  • Significant improvement for PSA jobs
  • Ensures that Non-PSA jobs are not affected
• Plan to integrate this with production schedulers

Thank You!
