A Scheduling Component for e-Science Central
Anirudh Agarwal, Jacek Cała
Introduction
• e-Science Central (e-SC): a cloud-based workflow management system for data analytics.
– Workflows are composed of blocks, which can be written in Java, R, Octave, JavaScript, Gnuplot and, more recently, bash.
– Portable: workflows can run on a laptop, a cluster, or private and public clouds.
• EUBrazil Cloud Connect (EUBCC): a project to create an intercontinental, federated infrastructure for scientific use.
– A joint effort between Brazil and several EU countries.
– Three user applications demonstrate the potential of the EUBCC infrastructure:
  • Leishmania Virtual Laboratory,
  • Heart Simulation,
  • Biodiversity and climate change.
EUBrazil Cloud Connect
[Figure: EUBCC architecture. Labels: Users; Programming Frameworks & Services; Execution & Provisioning Services; Infrastructure Providers; Data Providers; AAI. Components shown: COMPSs, e-SC API, PMES, CSGRID, e-SC, PDAS, mc2, IM, VMRC, fogbow, Opportunistic Cloud, Private Cloud, HPC, LSF, OGE. Standards shown: OCCI, CDMI, BES, OVF, x509, oAuth2, VOMS.]
e-Science Central workflow execution model
• Workflows are constructed from a number of interacting blocks.
• Each workflow invocation is deployed onto one engine as a single job.
• Each engine can process one or more workflows at a time.
• Workflows can be composite: they can submit sub-workflow invocations, allowing for parallelism.
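The execution model above can be illustrated with a small simulation: one shared queue of invocations, a pool of engines that each take one invocation per round and run it as a single job, and composite workflows whose sub-workflow invocations go back onto the same queue. This is a sketch under simplifying assumptions (one invocation per engine per round, and it ignores whether a parent waits for its children); `Invocation` and `run_pool` are hypothetical names, not part of e-SC.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class Invocation:
    """A workflow invocation; children are the sub-workflow invocations it submits."""
    name: str
    children: list["Invocation"] = field(default_factory=list)


def run_pool(submitted: list[Invocation], engines: int) -> list[list[str]]:
    """Simulate one shared queue drained by a pool of single-threaded engines.

    In each round every idle engine takes the next invocation and runs it as a
    single job; sub-workflow invocations it submits join the same queue.
    Returns the names of the invocations started in each round.
    """
    queue = deque(submitted)
    rounds = []
    while queue:
        batch = [queue.popleft() for _ in range(min(engines, len(queue)))]
        rounds.append([inv.name for inv in batch])
        for inv in batch:
            queue.extend(inv.children)  # a composite workflow fans out
    return rounds


parent = Invocation("parent", [Invocation(f"sub-{i}") for i in range(3)])
print(run_pool([parent], engines=3))
```

With three engines, the parent runs first and its three sub-workflows then run in parallel in the next round.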
Advantages of the current model
• Simple management:
– a single pool of engines,
– the pool can grow and shrink according to needs,
– engines can be of different speeds.
• Good scalability:
– very little overhead.
[Figure: relative speed-up vs. number of nodes (0 to 250), ideal vs. actual. Data labels: 20.0; 99.0% (49); 95.5% (95); 92.5% (139); 181.2.]
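For reference, relative speed-up and parallel efficiency are usually defined as follows. These are the standard definitions, assumed here rather than stated on the slide, with $T(n)$ the time to process the workload on $n$ nodes:

```latex
S(n) = \frac{T(1)}{T(n)}, \qquad E(n) = \frac{S(n)}{n}
```

Ideal scaling means $S(n) = n$, i.e. $E(n) = 100\%$; for example, a speed-up of 95.5 on 100 nodes would correspond to an efficiency of 95.5%.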
Limitations of the current model
• Too simple for more sophisticated needs:
– heterogeneous workflows/blocks,
– heterogeneous hardware infrastructure.
• No control over the invocation dispatch policy:
– no priorities, e.g. an administrator's invocations are treated the same as a user's,
– no fairness: a single user can block the system by submitting thousands of invocations,
– invocation messages may be consumed in an unfavourable manner.
• Once an invocation message has been placed in the JMS queue, it cannot be reallocated.
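The fairness problem can be made concrete with a toy model of strict first-in, first-out dispatch from one shared queue. It is a minimal sketch assuming every invocation takes exactly one round and each engine runs one invocation at a time; `fifo_last_start` is a hypothetical helper.

```python
from collections import deque


def fifo_last_start(arrivals: list[str], engines: int) -> dict[str, int]:
    """Dispatch invocations strictly in arrival order from one shared queue.

    `arrivals` lists the submitting user of each invocation. In each round,
    every engine takes one invocation (assumed to take one round). Returns,
    per user, the round in which that user's last invocation starts.
    """
    queue = deque(arrivals)
    last_start: dict[str, int] = {}
    round_no = 0
    while queue:
        for _ in range(min(engines, len(queue))):
            last_start[queue.popleft()] = round_no
        round_no += 1
    return last_start


print(fifo_last_start(["alice"] * 1000 + ["bob"], engines=10))
```

With ten engines, Bob's single invocation does not start until round 100, after all of Alice's 1000 invocations have been dispatched.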
8
Selected scheduler requirements
• Run workflows according to their hard and soft requirements and to static and dynamic infrastructure capabilities:
– support for heterogeneous workflows and resources (a consequence of using federated resources),
– data-aware scheduling,
– user-defined scheduling policies.
• Allow the system to adapt its size dynamically (cloud bursting, opportunistic resources).
• Allow users to specify workflow priorities.
• Improve the use of available resources:
– offer users/administrators some optimisation strategies.
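Matching workflows to engines by hard and soft requirements can be sketched as a filter followed by a ranking: hard requirements exclude engines, while soft requirements, data locality (static and dynamic information) and current load only order the remaining candidates. The classes, fields and the ranking order below are assumptions for illustration, not the e-SC design.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Engine:
    name: str
    capabilities: set[str]                              # static: software/hardware offered
    cpu_load: float = 0.0                               # dynamic: 0.0 idle .. 1.0 saturated
    local_data: set[str] = field(default_factory=set)   # datasets already on this engine


@dataclass
class Requirements:
    hard: set[str]                                      # every item must be satisfied
    soft: set[str] = field(default_factory=set)         # preferred, not mandatory
    inputs: set[str] = field(default_factory=set)       # datasets the workflow reads


def choose_engine(req: Requirements, engines: list[Engine]) -> Optional[Engine]:
    """Drop engines that violate a hard requirement, then rank the rest."""
    eligible = [e for e in engines if req.hard <= e.capabilities]
    if not eligible:
        return None  # nothing can run it now (e.g. a trigger for cloud bursting)

    def score(e: Engine) -> tuple[int, int, float]:
        # Assumed ranking: soft matches, then data locality, then lower CPU load.
        return (len(req.soft & e.capabilities),
                len(req.inputs & e.local_data),
                -e.cpu_load)

    return max(eligible, key=score)
```

Putting data locality ahead of CPU load means a busier engine that already holds the input data can win over an idle one; a user-defined policy could simply reorder the tuple.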
Our focus in EUBCC
• Run workflows according to their hard and soft requirements and to static and dynamic infrastructure capabilities:
– support for heterogeneous workflows and resources (a consequence of using federated resources),
– data-aware scheduling,
– user-defined scheduling policies.
• Allow the system to adapt its size dynamically (cloud bursting, opportunistic resources).
• Allow users to specify workflow priorities.
• Improve the use of available resources:
– offer users/administrators some optimisation strategies.
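User-specified priorities and fairness between users can both be expressed in the order in which pending invocations are released. The sketch below is one possible rule, assuming that a lower number means a higher priority and that, within a priority level, the user with the fewest invocations dispatched so far goes first; `FairPriorityQueue` is a hypothetical name.

```python
import itertools
from collections import defaultdict
from typing import Optional


class FairPriorityQueue:
    """Pending invocations, dispatched by priority first and fair share second.

    Assumptions: a lower number means a higher priority; among equal
    priorities, the user who has had the fewest invocations dispatched so far
    goes first; remaining ties are broken by arrival order.
    """

    def __init__(self) -> None:
        self._pending: list[tuple[int, int, str, str]] = []  # (priority, seq, user, invocation)
        self._dispatched: dict[str, int] = defaultdict(int)
        self._seq = itertools.count()

    def submit(self, user: str, invocation: str, priority: int = 1) -> None:
        self._pending.append((priority, next(self._seq), user, invocation))

    def pop(self) -> Optional[str]:
        if not self._pending:
            return None
        # The key is recomputed on every call because dispatch counts change,
        # so a plain heap keyed at submit time would not stay fair.
        best = min(self._pending,
                   key=lambda p: (p[0], self._dispatched[p[2]], p[1]))
        self._pending.remove(best)
        self._dispatched[best[2]] += 1
        return best[3]
```

If Alice submits three invocations, then Bob one, then an administrator one with priority 0, the release order is the administrator's, then Alice's first, then Bob's, then Alice's remaining two. The linear scan is O(n) per dispatch, which is acceptable for a sketch.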
Proposed solution
• Add a scheduling component (as a pluggable module) between the e-SC server and the engines.
• Use the Performance Monitor, which gathers information about the system.
• Have a dedicated (one-to-one) JMS queue for each engine (or engine pool?).
• Based on a scheduling policy, choose the best engine to send the workflow to.
• Ensure that pending workflows can be rescheduled when all the execution threads are busy.
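The proposed structure can be sketched as a scheduler with one queue per engine and an interchangeable policy object. This is a minimal sketch under stated assumptions: in-memory deques stand in for the per-engine JMS queues, and `SchedulingPolicy`, `ShortestQueue`, `Scheduler` and `take` are hypothetical names. Invocations stay in the scheduler's pending list until the policy accepts an engine, so they can still be rescheduled.

```python
from abc import ABC, abstractmethod
from collections import deque
from typing import Optional


class SchedulingPolicy(ABC):
    """Pluggable strategy: return the engine to use, or None if none is suitable now."""

    @abstractmethod
    def choose(self, invocation: str, queues: dict[str, deque]) -> Optional[str]:
        ...


class ShortestQueue(SchedulingPolicy):
    """Example policy: the engine with the fewest queued invocations, below a cap."""

    def __init__(self, max_queued: int = 1) -> None:
        self.max_queued = max_queued

    def choose(self, invocation, queues):
        candidates = [e for e, q in queues.items() if len(q) < self.max_queued]
        return min(candidates, key=lambda e: len(queues[e])) if candidates else None


class Scheduler:
    """Sits between the server and the engines, with one queue per engine."""

    def __init__(self, engines: list[str], policy: SchedulingPolicy) -> None:
        self.queues: dict[str, deque] = {e: deque() for e in engines}  # stand-ins for JMS queues
        self.pending: deque = deque()  # not yet committed to any engine
        self.policy = policy

    def submit(self, invocation: str) -> None:
        self.pending.append(invocation)
        self.reschedule()

    def take(self, engine: str) -> str:
        """An engine consumes the head of its own queue when it starts a job."""
        return self.queues[engine].popleft()

    def reschedule(self) -> None:
        # Keep assigning pending invocations until the policy declines.
        while self.pending:
            target = self.policy.choose(self.pending[0], self.queues)
            if target is None:
                return
            self.queues[target].append(self.pending.popleft())
```

Because the policy is a separate object, swapping `ShortestQueue` for a load-, data- or priority-aware policy does not change the scheduler itself.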
Proposed Solution (cont.)
[Figure: components jBoss AS, e-SC server, JMS queue, scheduling component, and one JMS queue per engine pool (pools A, B and C; all engines within a pool are equivalent). Legend: workflow invocations dispatched by the server; workflow invocations started by engines; workflow invocations started by users.]
Progress so far…
[Figure: components jBoss AS, e-SC server, JMS queues, scheduling component, engine pool and Performance monitor. Legend: workflow invocations dispatched by the server; workflow invocations started by engines; workflow invocations started by users; performance and provenance information.]
Progress so far (cont.)
• The current scheduling policy is based on CPU load:
– not effective; it serves only as a proof of concept (PoC).
• More advanced queue management:
– able to dynamically attach a new engine to a “scheduler” queue,
– able to grow the queue pool if needed.
• Able to hold workflow invocations in the scheduler when all the engine execution threads are exhausted:
– currently assuming one execution thread per engine.
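The proof-of-concept behaviour can be sketched as follows: each invocation goes to the idle engine with the lowest reported CPU load, engines can be attached at any time, and invocations are held in the scheduler while every engine's single execution thread is busy. This is an illustrative sketch; `CpuLoadScheduler` and its methods are hypothetical, and the load values are assumed to come from reports such as those of the Performance Monitor.

```python
from collections import deque
from typing import Optional


class CpuLoadScheduler:
    """Send each invocation to the idle engine with the lowest reported CPU load.

    Assumes one execution thread per engine, so an engine is either idle or busy.
    When every engine is busy, invocations are held here rather than pushed onto
    an engine queue, so they can go to whichever engine frees up first.
    """

    def __init__(self) -> None:
        self.cpu_load: dict[str, float] = {}          # latest reported load per engine
        self.running: dict[str, Optional[str]] = {}   # engine -> invocation, or None if idle
        self.held: deque = deque()

    def attach_engine(self, name: str, cpu_load: float = 0.0) -> None:
        """Dynamically add an engine; it immediately picks up held work."""
        self.cpu_load[name] = cpu_load
        self.running[name] = None
        self._drain()

    def report_load(self, name: str, cpu_load: float) -> None:
        self.cpu_load[name] = cpu_load

    def submit(self, invocation: str) -> None:
        self.held.append(invocation)
        self._drain()

    def finished(self, name: str) -> None:
        self.running[name] = None
        self._drain()

    def _drain(self) -> None:
        while self.held:
            idle = [e for e, inv in self.running.items() if inv is None]
            if not idle:
                return  # all execution threads exhausted: keep holding
            target = min(idle, key=lambda e: self.cpu_load[e])
            self.running[target] = self.held.popleft()
```

Attaching a new engine while invocations are held releases one of them straight away, which is the effect of growing the pool.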
Current problems and issues
• The CPU load policy is too simple.
• One engine vs. one engine pool per queue.
• Impact of the delay along the path engine → PM → scheduler.
• Missing event-based communication between the engine and the server.
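Event-based communication means that an engine announces a status change once, when it happens, and every interested party is notified immediately instead of polling. A minimal in-process publish/subscribe sketch follows; `EventBus`, the `job-status` topic and the event fields are hypothetical, and in e-SC such a channel might instead be built on a JMS topic.

```python
from collections import defaultdict
from typing import Callable

Handler = Callable[[dict], None]


class EventBus:
    """Minimal in-process publish/subscribe (a stand-in for, e.g., a JMS topic)."""

    def __init__(self) -> None:
        self._handlers: dict[str, list[Handler]] = defaultdict(list)

    def subscribe(self, topic: str, handler: Handler) -> None:
        self._handlers[topic].append(handler)

    def publish(self, topic: str, event: dict) -> None:
        # Every subscriber is notified as soon as the event happens: no polling interval.
        for handler in self._handlers[topic]:
            handler(event)


# Usage: both the server and the scheduler react to an engine's status change.
bus = EventBus()
server_status: dict[str, str] = {}   # invocation -> last known status
idle_engines: set[str] = set()       # scheduler's view of free engines


def server_on_status(event: dict) -> None:
    server_status[event["invocation"]] = event["status"]


def scheduler_on_status(event: dict) -> None:
    if event["status"] == "finished":
        idle_engines.add(event["engine"])
    else:
        idle_engines.discard(event["engine"])


bus.subscribe("job-status", server_on_status)
bus.subscribe("job-status", scheduler_on_status)
bus.publish("job-status", {"engine": "e1", "invocation": "w1", "status": "running"})
bus.publish("job-status", {"engine": "e1", "invocation": "w1", "status": "finished"})
```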
Delay problem
[Figure: sequence diagram between the e-SC server, JMS queue, Scheduler, Performance Monitor and Engine. Steps: Start Workflow; Check for Jobs; Get Information from PM; Send job to correct engine; Update PM; Update Server about job status. A 5 sec delay is marked.]
• The scheduler gets outdated engine information from the PM because of the 5-second delay.
• As a result, the wrong engine may be selected or the wrong task may be assigned.
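The effect of the delay can be reproduced with a toy model in which the scheduler picks the least-loaded engine from a snapshot that is refreshed only every few dispatches. The refresh interval in dispatches stands in for the 5-second reporting delay, and the fixed load increase per job is an assumption; `dispatch` is a hypothetical helper.

```python
def dispatch(true_load: dict[str, float], jobs: int, refresh_every: int,
             load_per_job: float = 0.5) -> list[str]:
    """Send `jobs` invocations, each to the engine with the lowest load in the
    scheduler's snapshot. The snapshot is refreshed only every `refresh_every`
    dispatches, standing in for the Performance Monitor's reporting delay."""
    true_load = dict(true_load)          # don't mutate the caller's dict
    snapshot: dict[str, float] = {}
    choices = []
    for i in range(jobs):
        if i % refresh_every == 0:
            snapshot = dict(true_load)   # fresh data arrives from the PM
        target = min(snapshot, key=snapshot.get)
        choices.append(target)
        true_load[target] += load_per_job  # the real load rises at once...
        # ...but the snapshot still shows the old value until the next refresh.
    return choices


loads = {"e1": 0.0, "e2": 0.1, "e3": 0.2}
print(dispatch(loads, jobs=3, refresh_every=1))  # up-to-date information
print(dispatch(loads, jobs=3, refresh_every=3))  # stale information
```

With up-to-date information the three jobs are spread over the three engines; with a stale snapshot all three go to the engine that merely looked least loaded when the snapshot was taken.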
Expected issues and problems
• For more sophisticated policies:
– lack of input information about the task and its inputs and outputs:
  • hardware/software requirements and capabilities,
  • the absence of an estimated completion time for the task rules out many scheduling policies,
  • data locality can play an important role.
• Support for cloud bursting:
– e.g. interaction with an Infrastructure Manager.
• Support for simulation:
– e.g. integration with WorkflowSim.
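To see why missing completion-time estimates rule out many policies, consider Minimum Completion Time (MCT), a standard list-scheduling heuristic chosen here purely for illustration: it needs, for every task, an estimated run time on every engine. The task and engine names and the estimates are hypothetical.

```python
def mct_schedule(task_estimates: dict[str, dict[str, float]],
                 engines: list[str]) -> dict[str, str]:
    """Minimum Completion Time: assign each task, in order, to the engine on
    which it would finish earliest.

    task_estimates[task][engine] is the estimated run time of `task` on
    `engine`, exactly the kind of input that is currently unavailable.
    """
    ready_at = {e: 0.0 for e in engines}  # time at which each engine becomes free
    assignment = {}
    for task, runtimes in task_estimates.items():
        best = min(engines, key=lambda e: ready_at[e] + runtimes[e])
        ready_at[best] += runtimes[best]
        assignment[task] = best
    return assignment


estimates = {
    "t1": {"fast": 2.0, "slow": 5.0},
    "t2": {"fast": 2.0, "slow": 5.0},
    "t3": {"fast": 2.0, "slow": 3.0},
}
print(mct_schedule(estimates, ["fast", "slow"]))
```

Here t3 goes to the slower engine because it would finish there at time 3 rather than at time 6 behind t1 and t2; a policy that sees only current CPU load has no way to make that trade-off.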