USC Viterbi School of Engineering Scientific Workflows and Systems Ewa Deelman

Post on 11-Jan-2016

Page 1

USC Viterbi School of Engineering

Scientific Workflows and Systems

Ewa Deelman

Page 2

Outline

• Scientific workflows

• Business workflows

• Different workflow systems

– Taverna

– Kepler

– Triana

– Askalon

Page 3

Applications today

• Complex
– Involve many computational steps
– Require many (possibly diverse) resources

• Composed of individual application components
– Components written by different individuals
– Components require and generate large amounts of data
– Components written in different languages

• Reuse of individual intermediate data products

• Need to keep track of how the data was produced

Page 4

Workflow Instance

Ewa Deelman, www.isi.edu/~deelman, pegasus.isi.edu

[Figure: workflow instance. Each of Image 1, Image 2, ..., Image n has its own Collect image and AdjustColor steps; the branches join in Co-Add image, followed by Visualize.]
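The workflow instance above can be read as a small dependency graph. The following is a minimal Python sketch, not taken from the slides: the task names are hypothetical, and the edge direction (collect, then adjust color, then co-add, then visualize) is an assumption read off the figure labels. It only computes a valid execution order; no images are processed.

```python
from graphlib import TopologicalSorter


def build_instance(n):
    """Return {task: set of predecessor tasks} for n input images."""
    deps = {}
    for i in range(1, n + 1):
        deps[f"collect_{i}"] = set()                      # no prerequisites
        deps[f"adjust_color_{i}"] = {f"collect_{i}"}      # per-image branch
    # The co-add step joins all branches; visualize runs last.
    deps["co_add"] = {f"adjust_color_{i}" for i in range(1, n + 1)}
    deps["visualize"] = {"co_add"}
    return deps


def execution_order(deps):
    """One order in which the tasks can run without violating dependencies."""
    return list(TopologicalSorter(deps).static_order())


order = execution_order(build_instance(3))
print(order)
```

The per-image branches have no edges between them, which is what lets a workflow system run them in parallel on different resources.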

Page 5

Business Workflows

Page 6

Business Workflows

• Designed to compose applications based on web services

• BPEL
– Standard language for service interactions
– Has many constructs to deal with the invocation of web services, including fault handling and support for conditional logic.

Page 7

BPEL constructs

• <receive>: Blocks until a matching message is received. Typically used to receive a message from the client or a callback from a partner web service.

• <reply>: Sends a message in response to a message received via a <receive>.

• <invoke>: Performs an invocation on a web service (one-way or request-response).

• <assign>: Assigns a value to a variable.

• <sequence>: Executes a list of activities sequentially, in lexical order.

• <flow>: Executes the activities in parallel.

• <while>: Repeats an activity for as long as a condition holds.

• <switch>: Selects one branch for execution from a set of branches, based on a value.
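To make the control-flow constructs concrete, here is a rough Python analogue of <assign>, <sequence>, <flow>, <while> and <switch>. It is a sketch of the semantics only, not BPEL and not any engine's API: the function names and the shared `ctx` dictionary standing in for process variables are assumptions made for illustration.

```python
from concurrent.futures import ThreadPoolExecutor


def assign(name, fn):
    """<assign>: store a computed value in a process variable."""
    def run(ctx):
        ctx[name] = fn(ctx)
    return run


def sequence(*activities):
    """<sequence>: run activities one after another, in the order written."""
    def run(ctx):
        for activity in activities:
            activity(ctx)
    return run


def flow(*activities):
    """<flow>: run activities in parallel and wait for all of them."""
    def run(ctx):
        with ThreadPoolExecutor(max_workers=len(activities)) as pool:
            for future in [pool.submit(a, ctx) for a in activities]:
                future.result()  # re-raises a failure from any branch
    return run


def while_(condition, body):
    """<while>: repeat the body as long as the condition holds."""
    def run(ctx):
        while condition(ctx):
            body(ctx)
    return run


def switch(cases, otherwise):
    """<switch>: run the first branch whose predicate is true."""
    def run(ctx):
        for predicate, activity in cases:
            if predicate(ctx):
                return activity(ctx)
        return otherwise(ctx)
    return run


process = sequence(
    assign("n", lambda c: 0),
    while_(lambda c: c["n"] < 3, assign("n", lambda c: c["n"] + 1)),
    flow(assign("a", lambda c: c["n"] * 2), assign("b", lambda c: c["n"] + 10)),
    switch([(lambda c: c["a"] > 5, assign("branch", lambda c: "big"))],
           otherwise=assign("branch", lambda c: "small")),
)

ctx = {}
process(ctx)
print(ctx)
```

The two branches of the `flow` write to different variables, so their relative order does not affect the result.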

Page 8

Many BPEL engines

• ActiveBPEL
• IBM BPEL4J
• Oracle BPEL Process Manager
• Microsoft Windows Workflow Foundation
• …

Page 9

Scientific vs Business Workflows

Scientific workflows:

• Large amounts of data
• Varied granularity of computations
• Large number of computations
• Often standalone components
• Non-programmers need to be able to compose them
• Need to provide provenance info
• Performance is important

Business workflows:

• Deal with services across domains
• Do not deal with standalone application components
• Usually not very data intensive
– Data can be easily sent between services
• Important to agree on standard interfaces so that MS & IBM can work together
• Focus on functionality/interoperability rather than performance

Page 10

Example of a business workflow

Page 11

Example of Scientific Workflow

• Workflow Specification Components
– Standalone computations
– Designed by different individuals

[Figure: example workflow. Inputs Image1, Image2 and Image3; task nodes Project (×3), Diff (×2), Fitplane (×2), BgModel, Background (×3) and Add.]

Page 12

Different workflow systems

Taverna, a workbench for bioinformatics workflows

Slides courtesy of Katy Wolstencroft

Page 13

The Community Problems

• Everything is Distributed

– Data, Resources and Scientists

• Heterogeneous data

• Very few standards

– I/O formats, data representation, annotation

– Everything is a string!

Integration of data and interoperability of resources is difficult

Page 14

Lots of Resources

NAR 2007 – 968 databases

Page 15

Traditional Bioinformatics

12181 acatttctac caacagtgga tgaggttgtt ggtctatgtt ctcaccaaat ttggtgttgt
12241 cagtctttta aattttaacc tttagagaag agtcatacag tcaatagcct tttttagctt
12301 gaccatccta atagatacac agtggtgtct cactgtgatt ttaatttgca ttttcctgct
12361 gactaattat gttgagcttg ttaccattta gacaacttca ttagagaagt gtctaatatt
12421 taggtgactt gcctgttttt ttttaattgg gatcttaatt tttttaaatt attgatttgt
12481 aggagctatt tatatattct ggatacaagt tctttatcag atacacagtt tgtgactatt
12541 ttcttataag tctgtggttt ttatattaat gtttttattg atgactgttt tttacaattg
12601 tggttaagta tacatgacat aaaacggatt atcttaacca ttttaaaatg taaaattcga
12661 tggcattaag tacatccaca atattgtgca actatcacca ctatcatact ccaaaagggc
12721 atccaatacc cattaagctg tcactcccca atctcccatt ttcccacccc tgacaatcaa
12781 taacccattt tctgtctcta tggatttgcc tgttctggat attcatatta atagaatcaa

Page 16

Cutting and Pasting

• Advantages:
– Low technology on both server and client side
– Very robust: hard to break
– Data integration happens along the way

• Disadvantages:
– Time consuming (and painful!)
  • Can be repeated rarely
  • Limited to small data sets
– Error prone:
  • Poor repeatability

Page 17

Pipeline Programming

• Advantages
– Repeatable
– Allows automation
– Quick, reliable, efficient

• Disadvantages
– Requires programming skills
– Difficult to modify
– Requires local tool and database installation
– Requires tool and database maintenance!!!

Page 18

What we want as a solution

A system that:

• Allows automation
• Allows easy repetition, verification and sharing of experiments
• Works on distributed resources
• Requires few programming skills
• Runs on a local desktop / laptop

Page 19

myGrid as a solution

myGrid allows the automated orchestration of in silico experiments over distributed resources from the scientist’s desktop

Built on computer science technologies of:
• Web services
• Workflows
• Semantic web technologies

Page 20

Workflows

– General technique for describing and enacting a process
– Describes what you want to do, not how you want to do it
– High level description of the experiment

[Figure: example workflow linking three web services: RepeatMasker, GenScan and Blast.]

Page 21

Workflows

Workflow language specifies how bioinformatics processes fit together.

High level workflow diagram separated from any lower level coding – you don’t have to be a coder to build workflows.

Workflow is a kind of script or protocol that you configure when you run it.

Easier to explain, share, relocate, reuse and repurpose.

Workflow <=> Model

Workflow is the integrator of knowledge

The METHODS section of a scientific publication

Page 22

Workflow Advantages

• Automation

– Capturing processes in an explicit manner

– Tedium! Computers don’t get bored/distracted/hungry/impatient!

– Saves repeated time and effort

• Modification, maintenance, substitution and personalisation

• Easy to share, explain, relocate, reuse and build

• Releases Scientists/Bioinformaticians to do other work

• Record

– Provenance: what the data is like, where it came from, its quality

Page 23

Taverna Workflow Components

Scufl: Simple Conceptual Unified Flow Language

Taverna: Writing, running workflows & examining results

SOAPLAB: Makes applications available

[Figure: SOAPLAB wraps any application as a web service; existing web services, e.g. DDBJ BLAST, are used directly.]

Page 24

An Open World

• Open domain services and resources.
• Taverna accesses 3000+ services
• Third party – we don’t own them – we didn’t build them
• All the major providers
– NCBI, DDBJ, EBI …
• Enforce NO common data model.

• Quality Web Services considered desirable

Page 25

Adding your own web services

• SoapLab
– http://www.ebi.ac.uk/soaplab/

• Java API Consumer
– import Java API of libSBML as workflow components

Page 26

Shield the Scientist – Bury the Complexity

[Figure: Taverna architecture. The Taverna Workbench edits a Scufl (Simple Conceptual Unified Flow Language) model, which the workflow enactor executes. The enactor drives one processor per service type: plain web service, Soaplab, local Java app, enactor, BioMOBY, WSRF, BioMART, Styx client, R package, and others. Further figure labels: Workflow Execution, Application.]

Page 27

Kepler

Slides courtesy of Bertram Ludaescher

Page 28

Scientific Workflow

Capture how a scientist works with data and analytical tools
– data access, transformation, analysis, visualization
– possible worldview: dataflow-oriented (cf. signal-processing)

Scientific workflow (wf) benefits (compare w/ script-based approaches):
– wf automation
– wf & component reuse
– wf design, documentation
– wf archival, sharing
– built-in concurrency (task-, pipeline-parallelism)
– built-in provenance support
– distributed execution (Grid) support
– …

Page 29

Ex: SEEK Ecological Niche Modeling Pipeline

• Scientific Workflow paradigm:
– Reusable components (“actors”): a scientist’s verbs/actions
– Top-level workflows ≈ conceptual representation of the science process, sentences in the scientist’s language
– Sub-workflows ≈ increasing levels of detail

• Separation of concerns:
– actors: what to do
– parameters: configurable behavior
– channels: dataflow, pipeline composition
– directors: fix execution model, scheduling
– semantic types: smart discovery, linking

D Pennington, D Higgins, AT Peterson, M Jones, B Ludaescher, S Bowers. Ecological Niche Modeling using the Kepler Workflow System. Workflows for e-Science, Springer.

Page 30

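Simple Kepler workflow using R (a statistics package)

Data source from EcoGrid (metadata-driven ingestion)

R processing script:

    res <- lm(BARO ~ T_AIR)   # fit a linear model of BARO against T_AIR
    res                       # print the fitted model
    plot(T_AIR, BARO)         # scatter plot of the two variables
    abline(res)               # overlay the fitted regression line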

Page 31

Plumbing with Style … (Norbert Podhorszki UC Davis, Scott Klasky ORNL)

[Figure: workflow stages Monitor, Transfer, Convert and Archive.]

• Plasma physics simulation on 2048 processors on Seaborg@NERSC (LBL)
– Gyrokinetic Toroidal Code (GTC) to study energy transport in fusion devices (plasma microturbulence)
– Generating 800GB of data (3000 files, 6000 timesteps, 267MB/timestep), 30+ hour simulation run

• Under workflow control:
– Monitor (watch) simulation progress (via remote scripts)
– Transfer from NERSC to ORNL concurrently with the simulation run
– Convert each file to an HDF5 file
– Archive files in 4GB chunks into HPSS
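The archive step groups files into chunks of at most 4GB. A minimal sketch of one way to do that grouping is shown below. It is not the authors' implementation: it assumes every file is 267MB, takes 4GB as 4096MB, and packs files greedily in arrival order, all of which are illustrative choices.

```python
def chunk_files(sizes_mb, chunk_limit_mb=4096):
    """Group consecutive files so that no group exceeds the size limit.

    Returns a list of chunks, each a list of file indices.
    """
    chunks, current, used = [], [], 0
    for index, size in enumerate(sizes_mb):
        # Close the current chunk if adding this file would overflow it.
        if current and used + size > chunk_limit_mb:
            chunks.append(current)
            current, used = [], 0
        current.append(index)
        used += size
    if current:
        chunks.append(current)
    return chunks


sizes = [267] * 3000          # assumed: 3000 files of 267MB each
chunks = chunk_files(sizes)
print(len(chunks), "chunks;", sum(sizes) / 1000, "GB in total")
```

Under these assumptions each chunk holds 15 files (15 × 267MB = 4005MB), and the total of 3000 × 267MB is about 800GB, in line with the figure quoted on the slide.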

Page 32

Our Starting Point: Actor-Oriented Modeling

Ports

– each actor has a set of input and output ports

– denote the actor’s signature

– produce/consume data (a.k.a. tokens)

– parameters are special “static” ports

Page 33

Actor-Oriented Modeling

Dataflow Connections

– unidirectional actor “communication” channels

– connect output ports with input ports

– for composing analysis pipelines

Page 34

Actor-Oriented Modeling

Sub-workflows / Composite Actors

– composite actors “wrap” sub-workflows

– like actors, have signatures (i/o ports of sub-workflow)

– hierarchical workflows (arbitrary nesting levels)

Page 35

Actor-Oriented Modeling

Directors

– define the execution semantics of workflow graphs

– execute the workflow graph (according to some schedule)

– sub-workflows may have different directors

– promotes reusability
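The actor-oriented ideas from the last few slides (ports, parameters, channels, directors) can be sketched in a few lines of Python. This is an illustration only and not Kepler's API: the class names, the single implicit output port per actor, and the fire-once `SequentialDirector` are all simplifying assumptions.

```python
class Actor:
    """A component with named input ports, one output, and static parameters."""

    def __init__(self, name, fn, inputs=(), **params):
        self.name = name
        self.fn = fn
        self.inputs = tuple(inputs)   # input port names (the signature)
        self.params = params          # parameters act as "static" ports

    def fire(self, tokens):
        args = [tokens[port] for port in self.inputs]
        return self.fn(*args, **self.params)


class Workflow:
    def __init__(self):
        self.actors = {}
        self.channels = {}   # (destination actor, input port) -> source actor

    def add(self, actor):
        self.actors[actor.name] = actor
        return actor

    def connect(self, src, dst, port):
        """Unidirectional channel from src's output to dst's input port."""
        self.channels[(dst.name, port)] = src.name


class SequentialDirector:
    """Fires each actor once, as soon as all of its inputs are available."""

    def run(self, wf):
        produced = {}
        pending = list(wf.actors.values())
        while pending:
            progressed = False
            for actor in list(pending):
                sources = {p: wf.channels[(actor.name, p)] for p in actor.inputs}
                if all(s in produced for s in sources.values()):
                    tokens = {p: produced[s] for p, s in sources.items()}
                    produced[actor.name] = actor.fire(tokens)
                    pending.remove(actor)
                    progressed = True
            if not progressed:
                raise RuntimeError("workflow graph contains a cycle")
        return produced


wf = Workflow()
source = wf.add(Actor("source", lambda: [1, 2, 3]))
scale = wf.add(Actor("scale", lambda xs, factor: [x * factor for x in xs],
                     inputs=("xs",), factor=10))
total = wf.add(Actor("total", lambda xs: sum(xs), inputs=("xs",)))
wf.connect(source, scale, "xs")
wf.connect(scale, total, "xs")

results = SequentialDirector().run(wf)
print(results["total"])
```

The workflow graph says nothing about scheduling; swapping in a different director class would change how the same graph executes, which is the separation of concerns the slides describe.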

Page 36

Models of Computation (A Wf Engineer’s Issue)

Directors separate the concerns of orchestration and scheduling from conceptual design

– Synchronous Dataflow (SDF)
  • Statically analyzable: schedule, no deadlocks, fixed buffer requirements; executable as a single thread by the director.

– Process Networks (PN)
  • Generalizes SDF. Actors execute as separate threads/processes, with queues of unbounded size (Kahn/MacQueen networks).

– Directed Acyclic Graph (DAG)
  • Special case of SDF. No loops, no pipelining.

– Continuous Time (CT)
  • Connections represent the value of a continuous time signal at some point in time ... Often used to model physical processes.

– Discrete Event (DE)
  • Actors communicate through a queue of events in time. Used for instantaneous reactions in physical systems.

– …
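As an illustration of the Process Networks model described above, the sketch below runs three actors as separate threads connected by unbounded FIFO queues. It is a toy in plain Python, not a Kepler or Ptolemy director; the `DONE` sentinel used to shut the pipeline down is an assumption of this sketch.

```python
import queue
import threading

DONE = object()  # sentinel token marking the end of a stream


def producer(out_q, n):
    for i in range(n):
        out_q.put(i)
    out_q.put(DONE)


def square(in_q, out_q):
    # Blocking reads: the actor waits until a token is available.
    while (token := in_q.get()) is not DONE:
        out_q.put(token * token)
    out_q.put(DONE)


def collect(in_q, sink):
    while (token := in_q.get()) is not DONE:
        sink.append(token)


def run_network(n):
    a, b = queue.Queue(), queue.Queue()   # unbounded FIFO channels
    sink = []
    threads = [
        threading.Thread(target=producer, args=(a, n)),
        threading.Thread(target=square, args=(a, b)),
        threading.Thread(target=collect, args=(b, sink)),
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return sink


result = run_network(5)
print(result)
```

Because each channel has one writer and one reader and reads block, the output does not depend on how the threads are interleaved, while the stages can still overlap in time (pipeline parallelism).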

Page 37

Everything is a service / actor…

Page 38

Smart Discovery

Find a component (here: an actor) in different locations (“categories”)

• … based on the semantic annotation of the component (or its ports)

[Figure labels: Browse for Components; Search for Component Name; Search for Category / Keyword]

Page 39

Behold the Beauty of Scientific Workflow Design

Author: Kristian Stevens, UC Davis

Page 40

… Shimology Part 2: the ugly truth inside

Author: Kristian Stevens, UC Davis

Page 41

Triana

Slides courtesy of Ian Taylor

Page 42

Triana Focus

• Two core underlying focuses:
– Interactive graphical programming of the distributed tasks - complex editing
  • Intuitive drag/drop flexible editing - copy/paste services, wizards for creating tools/toolboxes, user interfaces, adding nodes and multi-level grouping.
  • Has been used as a “graphical editor” for other languages, e.g. DAG, VDLx (DAX in progress).
– Heterogeneous workflows - Bridge the gap between different distributed environments
  • Use cross-environment interfaces
  • Led to integration with GAT (pre SAGA), GAP

Page 43

Types of Uses

– For fine-grained operations, specifying dataflow for local operations

– Or coarse-grained composition of a distributed workflow

– Or both - can connect heterogeneous tools (e.g. Web services, Java units, Jxta services) in one workflow

Has been used as a dataflow system, a distributed-workflow environment, a workflow-management system, an automated scripting tool, and a workflow editor.

Page 44

Current Capabilities

• Local Java Units
– 600 units in signal, image, audio, text processing, complete math/stats toolboxes etc.
– Common units - flexible importers/exporters, graphing, duplicators
– Data types - strong data types for a number of domains - includes run-time checking

• Distributed Integration
– GAT - Java GAT implementation - graphical representation of GAT primitives - supports GRAM, GridFTP, etc.
– GAP - SOA publish, find, bind triad of operations
  • Bindings: Jxta, P2PS, Web Services, WS-RF
– Group unit deployment

• Legacy Applications
– Can incorporate legacy applications easily (using local GAT adaptor) - standard file in/out interface

Page 45

Distributed Work-flow

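[Figure: distributed workflow architecture. Workflow commands or a workflow description (e.g. BPEL4WS) drive a Triana Engine, which connects to Triana Service & Engine peers, remote legacy applications and distributed services. Labelled paths: distributing Triana units or groups (Java); integrating legacy applications into the workflow; integrating Web services or P2P services. Interface labels: GAP, GAT & GAP, GAP. Layer labels: Upperware, Middleware.]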

Page 46

Triana, the GAT and the GAP

[Figure: Triana as a graphical Grid computing environment or portal, built on two interfaces.]

GAP Interface (service based computing: deployment, discovery and communication with distributed services, e.g. P2P and (GSI) Web services)
– P2PS: P2PS Discovery, P2PS Pipes
– JXTA: JXTA Discovery, JXTA Pipes
– Web Services: UDDI, SOAP

GAT Interface (Grid computing: job submission, file services)
– Condor, Globus RLS, Unicore, PBS, GridLab GRMS, SGE, SSH, WSRF, LDR, .NET, GridFTP, other..

Page 47

Audio Processing (Groups)

Page 48

Group Units

Page 49

GAT Interface

• Main deliverable of GridLab
• Application-level interface
• With a set of adapters
– That adapt the interface to an underlying capability
• Versions in C++ and Java
• Precursor to SAGA - Simple API for Grid Applications

Page 50

GAT Adapters: Example

[Figure: an application calls Copy File(Machine A, Machine B) through the GAT API (resource management, streaming/comms, file management, job management, monitoring, collection management). The GAT Engine dispatches the call to an adapter: the Grid FTP adapter over a Grid FTP connection in a Grid environment, or the Jxta file adapter over a Jxta pipe in a P2P environment.]
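The adapter idea in the figure, one application-level call dispatched to whichever middleware can serve it, can be sketched as follows. This is not the GAT API: every class and method name, and the `gsiftp://` and `jxta://` URL prefixes used to pick an adapter, are hypothetical. The adapters return a description string instead of moving data.

```python
from abc import ABC, abstractmethod


class FileAdapter(ABC):
    """Adapts the generic copy call to one underlying capability."""

    scheme = ""

    def can_handle(self, src, dst):
        return src.startswith(self.scheme) and dst.startswith(self.scheme)

    @abstractmethod
    def copy(self, src, dst):
        ...


class GridFtpAdapter(FileAdapter):
    scheme = "gsiftp://"

    def copy(self, src, dst):
        # A real adapter would open a GridFTP connection here.
        return f"gridftp copy {src} -> {dst}"


class JxtaFileAdapter(FileAdapter):
    scheme = "jxta://"

    def copy(self, src, dst):
        # A real adapter would stream the file over a Jxta pipe here.
        return f"jxta pipe copy {src} -> {dst}"


class Engine:
    """Picks the first adapter able to serve a request."""

    def __init__(self, adapters):
        self.adapters = list(adapters)

    def copy_file(self, src, dst):
        for adapter in self.adapters:
            if adapter.can_handle(src, dst):
                return adapter.copy(src, dst)
        raise LookupError(f"no adapter for {src} -> {dst}")


engine = Engine([GridFtpAdapter(), JxtaFileAdapter()])
grid_result = engine.copy_file("gsiftp://machineA/data.h5", "gsiftp://machineB/data.h5")
p2p_result = engine.copy_file("jxta://machineA/data.h5", "jxta://machineB/data.h5")
print(grid_result)
print(p2p_result)
```

The application code calls `copy_file` the same way in both environments; only the adapter list changes.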

Page 51

GAP Interface

• Motivated by GAT
• A simple service based API for
– Service deployment
– Service discovery
– Pipe based communication
• Static application interface with multiple middleware bindings
– P2PS (name…?)
– JXTA
– Web services

[Figure: GAP Interface with three bindings: P2PS (P2PS Discovery, P2PS Pipes), JXTA (JXTA Discovery, JXTA Pipes) and Web Services (UDDI, SOAP).]

Page 52

Deploying and Connecting To Remote Services

• Running services are automatically discovered via the GAP Interface, and appear in the tool tree

• User can drag remote services onto the workspace and connect cables to them like standard tools (except the cables represent actual JXTA/P2PS pipes)

[Figure label: Remote Services]

Page 53

Web Service Discovery

• Triana allows users to query UDDI repositories

• Alternatively, users can import services directly from WSDL

Page 54

Complex Data Types

• Users can build their own interface for creating/mediating between complex types

• Alternatively, Triana can dynamically generate an interface from the WSDL2Java generated bean class

Page 55

Askalon

Slides Courtesy of Thomas Fahringer

Page 56

Goal: simple, efficient, effective application development for the Grid

• Invisible Grid
• Application Modeling (UML) and programming at a high level of abstraction (AGWL)
• Semantics technologies
• Semi-automatic deployment
• SOA-based runtime environment with stateful services
• Analysis and optimization of performance, costs and reliability

ASKALON: Application Development and Runtime Environment for the Grid

Page 57

ASKALON Workflow Composition and Runtime Environment

[Figure: ASKALON architecture. UML-based workflow composition produces an AGWL document (<agwl> <parallel> activity </parallel> </agwl>), which is passed to the runtime middleware services. Service labels: Execution Engine, Scheduler, Resource Manager, Data Repository, Job, Performance Analysis. The services run on WSRF and the Globus toolkit, on top of the Grid.]

Page 58

Austrian Grid

[Figure: map of Austrian Grid sites (UniVie, Uni-Linz, UIBK, Uni-Sbg, FHV, ZID Grid) showing each site's machines, CPU counts, CA/RA markers and local schedulers (MAUI, Torque, PBS, SGE).]

• 517 CPUs distributed across 5 cities and over 20 parallel computers

Parallel computer   #     CPU       Clock   Architecture   Location
altix1.jku          64    ITA2      1.6     ccNUMA         Linz
hydra.gup           16    Athlon    1.6     COW            Linz
schafberg.sbg       16    ITA2      1.6     ccNUMA         Salzburg
grid.fhv.at         21    Xeon      3       COW            Vorarlberg
gescher.vcpc        32    Xeon      3       COW            Vienna
karwendel.dps       80    Opteron   2.2     COW            Innsbruck
altix1.uibk         16    ITA2      1.6     ccNUMA         Innsbruck
hc-ma.uibk          16    Opteron   2.2     COW            Innsbruck
zid-grid            272   P4        1.8     NOW            Innsbruck

Page 59

ASKALON Workflows

• Activity = basic or atomic unit of computation

• Activity type
– Functional description of the activity
  • Signature specified by data input/output ports
– Semantically meaningful name
  • E.g. matrix multiplication, Gaussian elimination, povray, png2yuv, ffmpeg, FFT, LAPW, WASIM, …
– Implementation-independent

• Workflow = collection of activity types interconnected through control flow and data flow dependencies
– Plus some advanced constructs

• Activity deployment
– Binds an activity type to a concrete installed implementation
– Description of how to instantiate the activity
– Registered by the application provider in a special registry of the Resource Management service
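The split between an abstract activity type and its concrete deployments can be sketched as a small registry. This is a toy model, not ASKALON's registry or its interface: the class names, the `resolve` method and the executable paths are hypothetical, while the site names are borrowed from the Austrian Grid table for flavour.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ActivityType:
    """Abstract, implementation-independent description of an activity."""
    name: str
    inputs: tuple
    outputs: tuple


@dataclass(frozen=True)
class ActivityDeployment:
    """Binds an activity type to one installed implementation."""
    type_name: str
    site: str
    executable: str   # hypothetical path, for illustration only


class DeploymentRegistry:
    def __init__(self):
        self._types = {}
        self._deployments = {}

    def register_type(self, activity_type):
        self._types[activity_type.name] = activity_type
        self._deployments.setdefault(activity_type.name, [])

    def register_deployment(self, deployment):
        if deployment.type_name not in self._types:
            raise KeyError(f"unknown activity type: {deployment.type_name}")
        self._deployments[deployment.type_name].append(deployment)

    def resolve(self, type_name, site=None):
        """Return a deployment for the type, optionally restricted to a site."""
        matches = [d for d in self._deployments.get(type_name, [])
                   if site is None or d.site == site]
        if not matches:
            raise LookupError(f"no deployment of {type_name} on {site}")
        return matches[0]


registry = DeploymentRegistry()
registry.register_type(ActivityType("povray", ("scene",), ("image",)))
registry.register_deployment(
    ActivityDeployment("povray", "karwendel.dps", "/opt/povray/bin/povray"))
registry.register_deployment(
    ActivityDeployment("povray", "altix1.uibk", "/usr/local/bin/povray"))

chosen = registry.resolve("povray", site="altix1.uibk")
print(chosen)
```

A workflow written against the type name `povray` stays unchanged when deployments are added or removed; only the registry contents differ.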

Page 60

ASKALON: Abstract Grid Workflow Language (AGWL)

• Atomic activities
– abstract from the real implementation, e.g. Web services, legacy applications
– Sequential constructs: <sequence>
– Conditional constructs: <if>, <switch>

• Basic compound activities
– Loop constructs: <while>, <dowhile>, <for>, <forEach>
– Directed Acyclic Graph constructs: <dag>

• Advanced compound activities
– Parallel section constructs: <parallel>
– Parallel loop constructs: <parallelFor>, <parallelForEach>

• Data flow constructs
– dataIn/dataOut ports, collections, data repositories, data set distributions, etc.

• Properties
– provide hints about the behavior of activities
– Predicted I/O data size, computational complexity, non-functional parameters

• Constraints
– Optimization metric (e.g. performance, cost, fault tolerance)
– Scheduling constraints (e.g. compute architecture, disk, memory)

Page 61

ASKALON Workflow Development Stack

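[Figure: ASKALON workflow development stack. Layers from top to bottom: Portal, AGWL (Abstract Grid Workflow Language), CGWR (Concrete Grid Workflow Representation), Grid. The application developer works at the top; the ASKALON middleware does the concretizing toward the Grid. Artifacts per level: UML workflow model (UML), activity type (XML), activity type (Java), activity deployment (ASKALON), activity instance (Grid).]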

Page 62

Real-world Scientific Workflows with ASKALON

• WIEN2k

• Material science application

• Technical University of Vienna
– Institute of Theoretical Chemistry

• Seven activity types

• Over 500 activity instances

• Statically unknown number of sequential loop iterations

[Figure: WIEN2k workflow. StageIn, LAPW0, a parallel section LAPW1_K1 ... LAPW1_Kn, LAPW2_FERMI, a parallel section LAPW2_K1 ... LAPW2_Kn, Sumpara, Lcore, Mixer, a "Converged?" test that closes the loop, and StageOut.]
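The control structure in the figure, a parallel fan-out inside a loop whose iteration count is only known at run time, can be sketched as below. The code has nothing to do with WIEN2k's physics: the per-k-point step, the combine step and the convergence test are numeric stand-ins (a Newton iteration for the square root of 2), and the two parallel sections of the real workflow are collapsed into one for brevity.

```python
from concurrent.futures import ThreadPoolExecutor


def run_until_converged(state, k_points, step_k, combine, converged, max_iters=100):
    """Repeat a parallel section followed by a combine step until convergence."""
    iterations = 0
    with ThreadPoolExecutor() as pool:
        while True:
            # Parallel section: one task per k-point.
            partials = list(pool.map(lambda k: step_k(state, k), k_points))
            # Sequential part: merge the partial results into a new state.
            state = combine(state, partials)
            iterations += 1
            if converged(state) or iterations >= max_iters:
                return state, iterations


# Stand-in computations, chosen only so that the loop terminates.
def step_k(x, k):
    return (x + 2.0 / x) / 2.0


def combine(x, partials):
    return sum(partials) / len(partials)


def converged(x):
    return abs(x * x - 2.0) < 1e-9


state, iterations = run_until_converged(1.0, range(4), step_k, combine, converged)
print(state, iterations)
```

The number of loop iterations is not fixed in the workflow description; it depends on the data, which is why the slide calls it statically unknown.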

Page 63

Resource Management

• Resource brokerage
– Interface to MDS information service for resource discovery
– Selection based on matchmaking

• Advance reservation
– Useful for co-allocation purposes

• GLARE
– Registry of activity deployments

• Activity deployment
– Binds an abstract activity type to a concrete implementation
– Refers to an installed executable or a deployed Web/Grid service
– Description of how to instantiate the activity
– Registered in GLARE by the application provider

Page 64

Askalon Runtime Environment

Dynamic Bindings of Workflow Abstract - Concrete

[Figure: an abstract workflow (Node 1 to Node 4) is made up of abstract activity types. The Resource Manager binds each activity type to an activity deployment (a Web service or an executable), which yields the concrete workflow.]

Page 65

Composite Activities

• Composite activity
– Sequence
– Parallel activities
– Conditional activities: if, switch
– Sequential loops: for, while, for each
– Parallel loops: parallel for, parallel for each
– Sub-workflows

<sequence name="seq">
  <dataIn name="in" source="..." />
  <activity name="A1">
    <dataIn name="in" source="seq/in" />
    ...
    <dataOut name="out" />
  </activity>
  <activity name="A2">
    <dataIn name="in" source="A1/out" />
    ...
    <dataOut name="out" />
  </activity>
  <dataOut name="out" source="A2/out" />
</sequence>

[Figure: Sequence of A1 followed by A2, with data flow and control flow arrows.]
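The `source` attributes in the sequence example encode the data flow between ports. A minimal sketch of extracting those edges with Python's standard XML parser follows. It handles only the simplified fragment shown here, not the full AGWL schema, and the value `user/input` filled in for the elided outer source is a placeholder invented for this sketch.

```python
import xml.etree.ElementTree as ET

AGWL_FRAGMENT = """
<sequence name="seq">
  <dataIn name="in" source="user/input" />
  <activity name="A1">
    <dataIn name="in" source="seq/in" />
    <dataOut name="out" />
  </activity>
  <activity name="A2">
    <dataIn name="in" source="A1/out" />
    <dataOut name="out" />
  </activity>
  <dataOut name="out" source="A2/out" />
</sequence>
"""


def data_flow_edges(xml_text):
    """Return (source port, destination port) pairs found in the fragment."""
    root = ET.fromstring(xml_text)
    edges = []
    # Look at the enclosing construct first, then at each nested activity.
    for owner in [root, *root.iter("activity")]:
        for port in owner.findall("dataIn") + owner.findall("dataOut"):
            source = port.get("source")
            if source:
                edges.append((source, f"{owner.get('name')}/{port.get('name')}"))
    return edges


edges = data_flow_edges(AGWL_FRAGMENT)
for src, dst in edges:
    print(f"{src} -> {dst}")
```

The edge from `A1/out` to `A2/in` is the data dependency that forces A2 to run after A1, independently of the control flow implied by the enclosing sequence.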

Page 66

If-then-else

<if>
  <dataIn ...>
  <condition> ... </condition>
  <then>
    <activity name="A2">
      <dataIn name="in" source="..." />
      ...
      <dataOut name="out" />
    </activity>
  </then>
  <else>
    <activity name="A3">
      <dataIn name="in" source="..." />
      ...
      <dataOut name="out" />
    </activity>
  </else>
  <dataOut name="ifout" source="A2/out,A3/out" />
</if>

[Figure: if-then-else control flow over activities A0, A1, A2 and A3, with then and else branches and steps numbered (1) to (4).]

Page 67

Execution Engine

• Workflow controller
– Converts XML-based specification (AGWL) to internal representation
– Executes the workflow according to control and data flow dependencies

• One separate Controller for every workflow instance

• Event system
– Other components can subscribe to the internal events
– e.g. logging, controller, tool (WS-Notification), ...

• Logging and database
– For post-mortem performance analysis

• GT4 WSRF wrapper
– Send WS-Notifications to the portal

Scheduler
– Receives jobs ready to execute from the task loop
– Retrieves the available resources from GridARM
– Assigns the task to the best machine according to the selection criteria
  o Clock speed * number of free processors
  o Prediction information, memory available, …
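The first selection criterion, clock speed multiplied by the number of free processors, is simple enough to sketch. The code below is an illustration under stated assumptions rather than the ASKALON scheduler: it assumes each task occupies one processor, ignores prediction information and memory, and uses invented free-processor counts (the clock speeds are taken from the Austrian Grid table).

```python
from dataclasses import dataclass


@dataclass
class Machine:
    name: str
    clock_ghz: float
    free_processors: int


def score(machine):
    """Selection criterion from the slide: clock speed * free processors."""
    return machine.clock_ghz * machine.free_processors


def schedule(tasks, machines):
    """Greedily assign each ready task to the currently best-scoring machine."""
    assignment = {}
    for task in tasks:
        candidates = [m for m in machines if m.free_processors > 0]
        if not candidates:
            break                      # nothing free: remaining tasks wait
        best = max(candidates, key=score)
        assignment[task] = best.name
        best.free_processors -= 1      # assumed: one processor per task
    return assignment


machines = [
    Machine("karwendel", clock_ghz=2.2, free_processors=4),
    Machine("altix1", clock_ghz=1.6, free_processors=6),
]
assignment = schedule(["t1", "t2", "t3"], machines)
print(assignment)
```

Because the score is recomputed after every assignment, load spreads across machines instead of piling onto whichever one scored highest at the start.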

[Figure: execution engine components. Core: Task Loop, Fault Handler, Controller, AGWL Interpreter. Surrounding components: Event System, GT4 WSRF Service, Logging & Database, Scheduler, Execution / Launching Framework. External: GridARM, AGWL input.]