Yolanda Gil (gil@isi.edu)
USC Information Sciences Institute
AAAI-08 Tutorial, July 13, 2008

Part III

Computational Workflows in Wings/Pegasus

AAAI-08 Tutorial on Computational Workflows for

Large-Scale Artificial Intelligence Research

Our Approach

•  Express analysis as distributed workflows
     -  Data analysis as distributed application
•  User-centric workflow refinement process
     -  Start with high-level problem description, add layers of detail, map to distributed execution environment
•  Knowledge-rich descriptions of workflows (OWL/RDF)
     -  Descriptions of input data and data products (aka “metadata”)
     -  Models of components in terms of I/O data and their function
•  Automation of resource allocation and optimization
     -  Efficient scheduling algorithms for workflow graphs
     -  Optimization techniques of broad applicability
•  Build on distributed computing research (Grid)
     -  Designed, by definition, to be robust, secure, flexible

The Wings/Pegasus Workflow System [Gil et al 07; Deelman et al 03; Deelman et al 05; Kim et al 08; Gil et al forthcoming]

WINGS: Knowledge-based workflow environment (www.isi.edu/ikcap/wings)
•  Ontology-based reasoning on workflows and data (W3C’s OWL)
•  Workflow library of useful analyses
•  Proactive assistance + automation
•  Execution-independent workflows

Pegasus: Automated workflow refinement and execution (pegasus.isi.edu)
•  Optimize for performance, cost, reliability
•  Assign execution resources
•  Manage execution through DAGMan
•  Daily operational use in many domains

Grid services (condor.uwisc.edu, www.globus.org)
•  Secure and controlled sharing of distributed services, computing, data
•  Scalable service-oriented architecture
•  Commercial quality, open source

Wings: Workflow Instance Generation and Selection

[Figure: architecture diagram. Recoverable labels are listed below.]

User roles: Student; Seasoned NL Researcher; Algorithm Developer

Activities and example requests:
•  Workflow Selection: “Show me workflows that classify datasets”
•  Data Selection: “Run this workflow with the weather1980 data set”
•  Workflow Creation: “Validate this workflow based on the component specs”
•  Component Specification: “Here is a new classification algorithm, has a parameter for smoothing, is compiled for MPI”

Knowledge sources:
•  Workflow Libraries
     -  Workflow templates specify complex analysis sequences
     -  Workflow instances specify data
•  Data Repositories
     -  Preexisting data collections
     -  Workflow execution results
•  Application Components
     -  Specifies data requirements
     -  Specifies execution requirements
•  Ontologies (OWL): domain terms, component types, workflow products

Workflow stages: Workflow Template, Workflow Instance, Executable Workflow
Systems: WINGS, Pegasus, DAGMan/Grid

[Figure: software stack. Legend: Workflow System; National Middleware Infrastructure (NMI) software. Recoverable labels are listed below.]

Wings
•  Workflow validation
•  Data/Comp selection
•  Metadata generation
•  Workflow generation

Pegasus
•  Site selection
•  Replica selection
•  Workflow optimization

Workflow submission

NMI software
•  Globus RLS: replica mgmt
•  GRAM: remote submission
•  GridFTP: data transfer
•  Condor DAGMan: execution engine
•  Condor-G: job manager
•  Nagios: monitoring probes

All software is open source

Workflow Structure

•  We take to heart the separation of “programming” from “analysis” activities
     -  Components are designed by programmers and can be complex (and need testing, debugging, loops should terminate, etc.)
     -  Workflows are composed by non-programmers and should have simple structure: the focus is on selecting application components and data
•  Therefore, our workflow structure is very streamlined
     -  Only iterations handled are parallel data processing pipelines
     -  Only conditionals handled are data-driven component selections
     -  Standard workflow languages offer much more complex constructs
•  Workflow structure designed to:
     -  Be accessible to users
     -  Facilitate automation and failure recovery

Core Workflow Concepts

•  Workflow consists of
     -  Components: software to be executed
     -  Links: data flow among components
•  Directed Acyclic Graphs (DAGs)
     -  Facilitate automation, esp. execution monitoring and repair
•  Data always handled through files
•  Special handling of some control constructs and loops (more on this later)
     -  Choices of components
     -  Iterations over data sets
•  Layered workflow refinement process
     -  Select application components -> select data -> select execution resources
•  Each layer adds more information to the same basic workflow structure

[Figure: example workflow graph with components C1, C2, C3 and files F1 through F6.]
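The concepts on this slide (components linked by files, forming a DAG) can be sketched in a few lines of Python. The component and file names reuse the figure's labels, but the wiring between them is an assumption made for illustration, since the actual edges of the figure cannot be recovered from the transcript.

```python
from graphlib import TopologicalSorter

# Hypothetical wiring: each component lists the files it reads and writes.
# Names follow the figure (C1..C3, F1..F6); the edges themselves are assumed.
components = {
    "C1": {"inputs": ["F1", "F2"], "outputs": ["F3", "F4"]},
    "C2": {"inputs": ["F3"], "outputs": ["F5"]},
    "C3": {"inputs": ["F4", "F5"], "outputs": ["F6"]},
}


def component_dependencies(components):
    """Derive component-to-component dependencies from shared files."""
    producer = {f: name for name, c in components.items() for f in c["outputs"]}
    deps = {name: set() for name in components}
    for name, c in components.items():
        for f in c["inputs"]:
            if f in producer:  # files with no producer are workflow inputs
                deps[name].add(producer[f])
    return deps


deps = component_dependencies(components)
# static_order() raises graphlib.CycleError if the graph is not acyclic,
# which is one reason a DAG structure makes automated execution simpler.
execution_order = list(TopologicalSorter(deps).static_order())
print(execution_order)
```

Because all data flows through files, the dependency graph falls out of the file names alone; no separate control-flow description is needed in this sketch.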

Workflow Abstraction Layers

•  We use several layers of description of workflows

WINGS: Workflow Representation

[Figure: two class hierarchies, labeled “Domain Ontology” and “Ontology of Components”. Recoverable class names are listed below; “. . .” in the figure marks omitted classes.]

Class names beginning with “Compute-”:
•  Compute-Hazard-Level-given-IMR-input-parameters
•  Compute-Hazard-Level-with-SA-given-IMR-input-parameters
•  Compute-Hazard-Level-with-PGA-given-IMR-input-parameters
•  Compute-Hazard-Level-with-PGV-given-IMR-input-parameters
•  Compute-Hazard-Level-with-SA-Median-given-IMR-input-parameters
•  Compute-Hazard-Level-with-SA-Std-Dev-given-IMR-input-parameters
•  Compute-Hazard-Level-with-SA-Prob-Exc-given-IMR-input-parameters
•  Compute-F2-Hazard-Level-given-Field-2000-input-parameters
•  Compute-F2-SA-Median-given-Field-2000-input-parameters
•  Compute-F2-SA-Median-wrt-Distance-JB-given-Fault-Type-&-Basin-Depth-&-…
•  Compute-F2-SA-MEDIAN-wrt-VS30-given-Fault-Type-&-Basin-Depth-&-…

Other class names:
•  Hazard-Level; Hazard-Level-with-SA; Hazard-Level-with-PGA; Hazard-Level-with-PGV
•  Hazard-Level-with-SA-Median; Hazard-Level-with-SA-Std-Dev; Hazard-Level-with-SA-Prob-Exc
•  Hazard-Level-with-Median; Hazard-Level-with-Std-Dev
•  F2-Hazard-Level; F2-SA-Median-wrt-VS30
•  F2-operation-SA-Median-Distance-JB; F2-operation-SA-Median-VS30
•  Parameter; IMR-Input-Parameter; Field-2000-Input-Parameter; Fault-Type; Basin-Depth; Distance
•  IMT; IMR; probability-function

WINGS: Representing Components

•  Any input or output can be defined as a file collection
     -  Same file type
     -  Unspecified cardinality
     -  Ordered
•  Inputs and outputs through files
     -  Files are typed
•  Each input is uniquely identified by a file descriptor (~ parameterID)
•  Ordered lists of file descriptors for both I and O

[Figure: component C-one with file descriptors D1, D2, D3; component C-many with file descriptors DC11, D12, D13 and files labeled F1.]
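A minimal sketch of this component model, assuming a simple in-memory representation: each component carries ordered lists of typed file descriptors, and a descriptor may be flagged as a collection. The class names, the `TextFile` type, and the split of descriptors into inputs and outputs are hypothetical; Wings itself stores these descriptions in OWL.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class FileDescriptor:
    """One input or output slot of a component (roughly a parameter ID)."""
    descriptor_id: str           # e.g. "D1"
    file_type: str               # files are typed
    is_collection: bool = False  # same type, unspecified cardinality, ordered


@dataclass
class Component:
    name: str
    inputs: list = field(default_factory=list)   # ordered FileDescriptors
    outputs: list = field(default_factory=list)  # ordered FileDescriptors

    def accepts(self, descriptor_id, files):
        """Check that `files`, a list of (name, type) pairs, fits an input slot."""
        slot = next(d for d in self.inputs if d.descriptor_id == descriptor_id)
        if not slot.is_collection and len(files) != 1:
            return False
        return all(ftype == slot.file_type for _, ftype in files)


# Assumed roles: D1/D2 and DC11/D12 are inputs, D3 and D13 are outputs.
c_one = Component(
    "C-one",
    inputs=[FileDescriptor("D1", "TextFile"), FileDescriptor("D2", "TextFile")],
    outputs=[FileDescriptor("D3", "TextFile")],
)
c_many = Component(
    "C-many",
    inputs=[FileDescriptor("DC11", "TextFile", is_collection=True),
            FileDescriptor("D12", "TextFile")],
    outputs=[FileDescriptor("D13", "TextFile")],
)
```

For example, `c_many.accepts("DC11", [("a.txt", "TextFile"), ("b.txt", "TextFile")])` holds because DC11 is a collection slot, while the same two files would be rejected for the single-file slot D1 of `c_one`.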

Data Descriptions

•  Metadata of different kinds can be organized in ontology
•  Files represented as instances and classified in ontology according to their metadata
•  File collections also represented as instances and defined as ordered sets of file instances
•  A file Skolem is created for each class as a representative instance (more on this later)
•  Similarly, a file collection Skolem is created for each class

[Figure: Application-Specific Metadata Ontologies, with Content Metadata and Format Metadata. Other labels: Kim-Homepage, Gil-Homepage, File Collection, IKCAP-pages, EHS-T, EHCS-T.]

A Component in a Workflow Template

•  Nodes correspond to individual application components
•  Links include file descriptors for origin and destination and a file Skolem

Notation: “S” marks a Skolem

[Figure: template fragment with nodes N1, N2, N3 (components C-one and C67), links L1 to L4, file descriptors D1, D2, D3, D6, D7, and file Skolems FS-A, FS-B, FS-C, FS-D.]

File Collections in a Workflow Template

•  Links that include file descriptors that are collections refer to file collection Skolems
•  Using the same file Skolem ID or file collection Skolem ID in different links indicates identity

[Figure: node N1 (component C-many) with file descriptors DC11, D12, D13, links L1 to L3, file collection Skolem FCS-A, and file Skolems FS-B, FS-C.]

Iteration Over File Collections in a Workflow Template

•  Iteration over sets compactly represented with single nodes that contain component collections
•  Will be expanded to as many jobs as files are specified for the executable workflow
•  Links capture formation of file collections as input

[Figure: compact view with component-collection node NC1 (C-one; D1, D2, D3), node N2 (C-many; DC11, D12, D13), links L1 to L4, file collection Skolems FCS-G, FCS-K, FCS-Z, and file Skolem FS-Y. Expanded view with C-one instances labeled (G1, K1, Z1), (G2, K2, Z2), ..., (G88, K88, Z88), plus C-many and Y1.]

Iteration With a Constant in a Workflow Template

•  Nodes that represent component collections can take the same file from the same link when the link contains a file Skolem instead of a file collection Skolem

[Figure: compact view with component-collection node NC1 (C-one; D1, D2, D3), node N2 (C-many; DC11, D12, D13), links L1 to L4, file collection Skolems FCS-G, FCS-Z, and file Skolems FS-K1, FS-Y. Expanded view with C-one instances labeled (G1, K1, Z1), (G2, K1, Z2), ..., (G88, K1, Z88), plus C-many and Y1.]
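The two iteration slides can be illustrated with one small expansion routine. This is a sketch under assumptions, not the Wings implementation: a binding that is a list stands for a file collection and is iterated element by element, while a binding that is a single string stands for a constant file shared by every job. The function name and the mapping of G, K, Z files to descriptors D1, D2, D3 are hypothetical.

```python
def expand_collection_node(component, bindings):
    """Expand one component-collection node into one job per collection element.

    `bindings` maps a file descriptor to either a list (a file collection,
    iterated element by element) or a single string (a constant file that
    every job receives).
    """
    sizes = {len(v) for v in bindings.values() if isinstance(v, list)}
    if len(sizes) > 1:
        raise ValueError("all collections on one node must have the same size")
    n_jobs = sizes.pop() if sizes else 1
    jobs = []
    for i in range(n_jobs):
        files = {d: (v[i] if isinstance(v, list) else v) for d, v in bindings.items()}
        jobs.append({"component": component, "files": files})
    return jobs


# Iteration over collections: job i gets (G_i, K_i) and writes Z_i.
iterating = expand_collection_node(
    "C-one", {"D1": ["G1", "G2", "G3"], "D2": ["K1", "K2", "K3"], "D3": ["Z1", "Z2", "Z3"]}
)

# Iteration with a constant: every job gets the same K1.
constant = expand_collection_node(
    "C-one", {"D1": ["G1", "G2", "G3"], "D2": "K1", "D3": ["Z1", "Z2", "Z3"]}
)
```

Three elements are used here instead of the figure's 88 to keep the example short; the template stays a single node in both cases and only the expansion differs.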

Constraints on Workflow Templates

•  Constraints on number of elements in different collections
•  Constraints on files/collections of different workflow components

[Figure: graph of constraints for CybershakeTemplate. Recoverable labels are listed below.]

Node labels:
•  CybershakeTemplate
•  InputLink_SiteNameFile_to_BoxNameCheck; InputLink_RuptureVars_to_SeisgmogramGen; InputLink_SGTCollforRup_to_SeismogramGen
•  F-RV; C-RuptVars; CC-RuptureVariations
•  F-SGT; C-SGT-forRups; CC-SGTs
•  SiteNameFile; SGTsSiteName; SiteName; N_Rups

Edge labels: hasLink; hasFile; hasSiteName; hasN_Items; isSameAs
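As a rough illustration of the two kinds of constraints named on this slide, the sketch below checks that a property has the same value on two template elements. The element and property names come from the figure; the metadata values, the pairing of elements, and the `violated` helper are assumptions made for the example.

```python
# Hypothetical metadata for two collections and one file in a template.
metadata = {
    "CC-RuptureVariations": {"hasN_Items": 4},
    "CC-SGTs": {"hasN_Items": 4, "hasSiteName": "PAS"},
    "SiteNameFile": {"hasSiteName": "PAS"},
}

# Each constraint says: property `prop` must have the same value on both items.
same_value_constraints = [
    ("CC-RuptureVariations", "CC-SGTs", "hasN_Items"),  # sizes of two collections
    ("CC-SGTs", "SiteNameFile", "hasSiteName"),         # files of different components
]


def violated(metadata, constraints):
    """Return the constraints whose two sides disagree or lack a value."""
    bad = []
    for a, b, prop in constraints:
        va, vb = metadata[a].get(prop), metadata[b].get(prop)
        if va is None or vb is None or va != vb:
            bad.append((a, b, prop))
    return bad


print(violated(metadata, same_value_constraints))
```

With the values above no constraint is violated; changing `hasN_Items` on one collection would make the first constraint appear in the result.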

Workflow Instances

•  Input data selected from the file library by querying for files of the type of file Skolems
•  Logical names created for new data products with metadata based on file Skolems
•  Compact Workflow Instance = WT + bindings (WT: workflow template)
•  Easy to understand, and easily transformed into an expanded WI (workflow instance) and a DAX for Pegasus

[Figure: template with nodes N1 to N4 (components C-one, C67, C-plenty), links L1 to L7, file descriptors D1, D2, D3, D6, D7, D8, DC9, and Skolems FS-A, FS-B, F-C, FCS-D, FS-E, FS-F, FS-G. Bindings shown: File85, File28, FileColl54, F34254-05-06-08, F34255-05-06-08, F34256-05-06-08, F34257-05-06-08, grouped in the figure as “Existing data” and “New data products”.]
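The idea that a compact workflow instance is a template plus bindings can be sketched as follows, assuming a toy file library keyed by type. The file types, the Skolem table, and the naming scheme for new data products are all hypothetical; the slide only states that inputs are found by type query and that new products receive generated logical names.

```python
import itertools

# Hypothetical file library: logical file name -> type.
file_library = {"File85": "Corpus", "File28": "Lexicon", "File90": "Corpus"}

# File Skolems of a template: Skolem id -> (type, whether it is a workflow input).
skolems = {
    "FS-A": ("Corpus", True),
    "FS-B": ("Lexicon", True),
    "FS-E": ("WordCounts", False),
}


def bind(skolems, file_library, run_id):
    """Yield one set of bindings per combination of matching input files."""
    inputs = {
        s: [f for f, t in sorted(file_library.items()) if t == typ]
        for s, (typ, is_input) in skolems.items() if is_input
    }
    # Logical names for new data products (assumed naming scheme).
    products = {
        s: f"{typ}-{s}-{run_id}"
        for s, (typ, is_input) in skolems.items() if not is_input
    }
    names = list(inputs)
    for combo in itertools.product(*(inputs[s] for s in names)):
        yield {**dict(zip(names, combo)), **products}


bindings = list(bind(skolems, file_library, "05-06-08"))
```

Each element of `bindings`, paired with the unchanged template, would correspond to one compact workflow instance in this sketch.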

AUTOMATED WORKFLOW INSTANCE GENERATION IN WINGS

Workflow Instance Expressions

•  Compact expression for efficient search and matching
•  Expanded expression when further details are needed

[Figure: example workflow shown in compact and expanded form. Labels: Corpus, Kernel_Rules, Split, Filter_Rules, Prune_Rules, Binarize, Generate_Rule_Map, Compile, XRS_Rules, BRF_Rules, Lexicon_Dictionary; several elements are marked 1…n; bindings WSJ-2001 and KR-09-05.]

Expanded Workflow Instance

<rdf:RDF ..(xmlns definitions)....>
  <wflns:WorkflowInstance rdf:ID="WFT0b">
    <wflns:hasDescription rdf:datatype="http://www.w3.org/2001/XMLSchema#string">
      Count the number of unique words in a file
    </wflns:hasDescription>
    <wflns:hasNode rdf:resource="#N1"/>
    <wflns:hasNode rdf:resource="#N2"/>
    <wflns:hasLink rdf:resource="#L12"/>
    <wflns:hasLink rdf:resource="#L01"/>
    <wflns:hasLink rdf:resource="#L2Output"/>
  </wflns:WorkflowInstance>
  <wflns:InOutLink rdf:ID="L12">
    <wflns:hasOriginFileDescription rdf:resource="http://www.isi.edu/ikcap/wings/domains/linguistics/componentLibrary.owl#remDupesOutputFile"/>
    <wflns:hasFile rdf:resource="http://www.isi.edu/ikcap/wings/domains/linguistics/fileLibrary.owl#F12_WFT0b_1117161532484"/>
    <wflns:hasDestinationFileDescription rdf:resource="http://www.isi.edu/ikcap/wings/domains/linguistics/componentLibrary.owl#CountWordsInputFile"/>
    <wflns:hasDestinationNode>
      <wflns:Node rdf:ID="N2">
        <wflns:hasComponent rdf:resource="http://www.isi.edu/ikcap/wings/domains/linguistics/componentLibrary.owl#countWordsV1"/>
      </wflns:Node>
    </wflns:hasDestinationNode>
    <wflns:hasOriginNode>
      <wflns:Node rdf:ID="N1">
        <wflns:hasComponent rdf:resource="http://www.isi.edu/ikcap/wings/domains/linguistics/componentLibrary.owl#removeDuplicatesV1"/>
      </wflns:Node>
    </wflns:hasOriginNode>
  </wflns:InOutLink>
  <wflns:InputLink rdf:ID="L01">
    <wflns:hasDestinationNode rdf:resource="#N1"/>
    <wflns:hasFile rdf:resource="http://www.isi.edu/ikcap/wings/domains/linguistics/fileLibrary.owl#test_txt_WFT0b_1117161532484"/>
    <wflns:hasDestinationFileDescription rdf:resource="http://www.isi.edu/ikcap/wings/domains/linguistics/componentLibrary.owl#remDupesInputFile"/>
  </wflns:InputLink>
  <wflns:OutputLink rdf:ID="L2Output">
    <wflns:hasOriginFileDescription rdf:resource="http://www.isi.edu/ikcap/wings/domains/linguistics/componentLibrary.owl#CountWordsOutputFile"/>
    <wflns:hasOriginNode rdf:resource="#N2"/>
    <wflns:hasFile rdf:resource="http://www.isi.edu/ikcap/wings/domains/linguistics/fileLibrary.owl#F2Output_WFT0b_1117161532484"/>
  </wflns:OutputLink>
</rdf:RDF>

W Instance: “dax” for Pegasus

<?xml version="1.0" encoding="UTF-8"?>
<!-- generated: 2004-08-18T10:53:01-05:00 -->
<adag xmlns="http://www.griphyn.org/chimera/DAX"
      xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
      xsi:schemaLocation="http://www.griphyn.org/chimera/DAX http://www.griphyn.org/chimera/dax-1.8.xsd"
      version="1.7" count="1" index="0" name="WorkFlow0b">

  <!-- part 1: list of all files used (may be empty) -->
  <filename file="vahi.f.a" link="input"/>
  <filename file="vahi.f.b1" link="inout"/>
  <filename file="vahi.f.b2" link="output"/>

  <!-- part 2: definition of all jobs (at least one) -->
  <job id="ID000001" namespace="vds" name="removeDups" version="1.0" level="3" dv-namespace="vds" dv-name="top" dv-version="1.0">
    <argument>-a top -T60 -i <filename file="vahi.f.a"/> -o <filename file="vahi.f.b1"/> </argument>
    <uses file="vahi.f.a" link="input" dontRegister="false" dontTransfer="false"/>
    <uses file="vahi.f.b1" link="output" dontRegister="true" dontTransfer="true" temporaryHint="true"/>
  </job>
  <job id="ID000002" namespace="vds" name="countWords" version="1.0" level="2" dv-namespace="vds" dv-name="left" dv-version="1.0">
    <argument>-a left -T60 -i <filename file="vahi.f.b1"/> -o <filename file="vahi.f.b2"/> -p 0.5</argument>
    <uses file="vahi.f.b1" link="input" dontRegister="false" dontTransfer="false" temporaryHint="true"/>
    <uses file="vahi.f.b2" link="output" dontRegister="true" dontTransfer="true" temporaryHint="true"/>
  </job>

  <!-- part 3: list of control-flow dependencies (empty for single jobs) -->
  <child ref="ID000002">
    <parent ref="ID000001"/>
  </child>
</adag>
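To show how the three parts of a DAX fit together, here is a small Python sketch that reads a trimmed copy of the document above with the standard library's `xml.etree.ElementTree`. The trimmed XML keeps only the attributes needed for the example, and the cross-check of dependencies against shared files is an illustration, not something Pegasus is claimed to do in this form.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the DAX above: two jobs and one dependency.
DAX = """<adag xmlns="http://www.griphyn.org/chimera/DAX" name="WorkFlow0b">
  <job id="ID000001" name="removeDups">
    <uses file="vahi.f.a" link="input"/>
    <uses file="vahi.f.b1" link="output"/>
  </job>
  <job id="ID000002" name="countWords">
    <uses file="vahi.f.b1" link="input"/>
    <uses file="vahi.f.b2" link="output"/>
  </job>
  <child ref="ID000002"><parent ref="ID000001"/></child>
</adag>"""

NS = {"dax": "http://www.griphyn.org/chimera/DAX"}
root = ET.fromstring(DAX)

# Part 2 of the DAX: job id -> transformation name.
jobs = {j.get("id"): j.get("name") for j in root.findall("dax:job", NS)}

# Part 3 of the DAX: (parent, child) pairs.
edges = [
    (p.get("ref"), c.get("ref"))
    for c in root.findall("dax:child", NS)
    for p in c.findall("dax:parent", NS)
]


def files(job_id, link):
    """Files a job uses with the given link direction ("input" or "output")."""
    job = root.find(f"dax:job[@id='{job_id}']", NS)
    return {u.get("file") for u in job.findall("dax:uses", NS) if u.get("link") == link}


# Each dependency should be backed by a file the parent writes and the child reads.
shared = {(p, c): files(p, "output") & files(c, "input") for p, c in edges}
print(jobs, edges, shared)
```

Here the single dependency between `removeDups` and `countWords` is explained by the intermediate file `vahi.f.b1`.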

AUTOMATED METADATA GENERATION IN WINGS

Metadata Reasoning for File Name Generation and Workflow Validation

•  Filename Generation
     -  Explicit representation of metadata in ontology (e.g. source id, rupture id)
     -  Propagate metadata attributes for all data products when creating workflow instance
     -  Names for intermediate files are created automatically from the metadata
•  Workflow Validation
     -  Explicit representation of metadata constraints (examples are shown below)
          -  Constraints on individual files and collections
          -  Constraints on component inputs and outputs
          -  Constraints among components in a workflow
     -  Check constraints while generating workflow instantiations

Propagation of metadata for filename generation: an example

[Figure: metadata propagation around the component SeismogramGen_Li. Recoverable file labels and metadata are listed below; entries repeated in the figure are shown once.]

RVM
•  127_6.rvm
     -  source_id: 127
     -  rupture_id: 6

Rupture_variation
•  127_6.txt.variation-s0000-h0000
     -  source_id: 127
     -  rupture_id: 6
     -  slip_realization_#: 0
     -  hypo_center_#: 1
•  127_6.txt.variation-s0000-h0001
     -  source_id: 127
     -  rupture_id: 6
     -  slip_realization_#: 0
     -  hypo_center_#: 1

SGT
•  FD_SGT/PAS_1/A/SGT161
     -  site_name: PAS
     -  tensor_direction: 1
     -  time_period: A
     -  xyz_volumn_id: 161

Seismogram
•  Seismogram_PAS_127_6.grm
     -  site_name: PAS
     -  source_id: 127
     -  rupture_id: 6
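The seismogram file name in this example can be reproduced from the metadata of its inputs. The sketch below assumes that the output inherits `site_name` from the SGT file and `source_id` and `rupture_id` from the rupture variation, and that the name follows the pattern visible in the example; both the `propagate` helper and the format string are hypothetical.

```python
def propagate(*inputs, keys):
    """Collect the listed metadata attributes from the inputs, rejecting conflicts."""
    merged = {}
    for meta in inputs:
        for k in keys:
            if k in meta and merged.setdefault(k, meta[k]) != meta[k]:
                raise ValueError(f"inputs disagree on {k}")
    return merged


# Metadata values as listed on the slide.
rupture_variation = {"source_id": 127, "rupture_id": 6,
                     "slip_realization_#": 0, "hypo_center_#": 1}
sgt = {"site_name": "PAS", "tensor_direction": 1,
       "time_period": "A", "xyz_volumn_id": 161}

seismogram_meta = propagate(rupture_variation, sgt,
                            keys=["site_name", "source_id", "rupture_id"])

# Assumed naming pattern, inferred from the single example on the slide.
filename = "Seismogram_{site_name}_{source_id}_{rupture_id}.grm".format(**seismogram_meta)
print(filename)
```

The conflict check matters when several inputs carry the same attribute: if two inputs disagreed on `source_id`, generating a single name would be meaningless, so the sketch raises an error instead.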

AUTOMATIC WORKFLOW GENERATION IN WINGS

Automatic Template-Based Workflow Generation Algorithm

Workflow request = Workflow Template + Seed Constraints

WR0: Workflow Template
[Figure: template diagram.]

WR0: Seed Constraints
•  dataVariable5 data:contains data:Muti-party-communication
•  dataVariable0 data:creator 5048
•  dataVariable1 data:creator 5048

[Figure: pipeline diagram. Steps are indented; the workflows passed between steps are flush left.]

unified well-formed request
     Seed workflow from request
seeded workflows
     Find input data requirements
binding-ready workflows
     Data source selection
bound workflows
     Parameter selection
configured workflows
     Workflow instantiation
workflow instances
     Workflow grounding
ground workflows
     Workflow ranking
top-k workflows
     Workflow mapping
executable workflows

Step 1: Workflow Template is Seeded

[Figure: pipeline diagram repeated from the algorithm overview.]

Step 2: Backward Sweep

[Figure: pipeline diagram repeated from the algorithm overview.]

Step 3: Select Data Sources

[Figure: pipeline diagram repeated from the algorithm overview; workflow labels: E-07, S-NY.]

Step 4: Forward Sweep

[Figure: pipeline diagram repeated from the algorithm overview; workflow labels: E-07, S-NY.]

Step 5: Workflow Instantiation

[Figure: pipeline diagram repeated from the algorithm overview; workflow labels: E-07, S-NY, Result-PartA, Result-PartB.]

Step 6: Workflow Grounding

[Figure: pipeline diagram repeated from the algorithm overview, next to a ground workflow whose jobs are connected by “parent” links; workflow labels: E-07, S-NY, Result-PartA, Result-PartB.]

Ground Workflow (excerpt):
<job id = “j42” name=“Neuman-BC”> <argument> -i E-07 17.5 -o ES-07….

Step 7: Workflow Ranking

•  W1: estimated exec time 3hrs
•  W2: estimated exec time 20hrs
•  W3: estimated exec time 3d
•  W4: estimated exec time 5hrs

[Figure: pipeline diagram repeated from the algorithm overview.]

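A minimal sketch of the ranking step, assuming that candidates are ordered by estimated execution time alone. The slide shows only the estimates, so the criterion and the `top_k` helper are assumptions; a real ranking could also weigh cost or reliability.

```python
import heapq

# Estimated execution times from the slide, converted to hours (3 d = 72 h).
estimated_hours = {"W1": 3, "W2": 20, "W3": 72, "W4": 5}


def top_k(estimates, k):
    """Return the k workflows with the smallest estimated execution time."""
    return heapq.nsmallest(k, estimates, key=estimates.get)


print(top_k(estimated_hours, 2))
```

Under this criterion the two best candidates are W1 and W4, and only those would be passed on to workflow mapping.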

Page 38: Part III Computational Workflows in Wings/Pegasusgil/AAAI08TutorialSlides/3-Wings.pdf · 2008-08-01 · USC Information Sciences Institute Yolanda Gil (gil@isi.edu) AAAI-08 Tutorial

Step 8: Workflow Mapping

Ground workflow: 15 compute nodes devoid of resource assignment

Executable workflow: mapped to 3 sites
•  13 data stage-in nodes
•  11 compute nodes (1-2 & 5-6 reduced based on available intermediate data)
•  8 inter-site data transfers
•  14 data stage-out nodes to long-term storage
•  14 data registration nodes (data cataloging)

[Figure: ground workflow graph and executable workflow graph, each with numbered compute nodes]

[Figure: workflow generation pipeline, from the unified well-formed request to the top-k and executable workflows]
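
The reduction from 15 to 11 compute nodes can be sketched as a backward traversal from the requested outputs: a job is kept only if something it produces is still needed and not already available. This is an illustrative sketch, not the Pegasus implementation; the toy job names, the file names, and the `reduce_workflow` helper are all hypothetical.

```python
def reduce_workflow(jobs, requested, available):
    """Keep only the jobs that must run to produce the requested files.

    jobs: dict mapping job name -> (input files, output files)
    requested: files the user wants
    available: files that already exist (e.g. cataloged intermediate data)
    """
    # Map every file to the job that produces it.
    producer = {}
    for name, (_, outputs) in jobs.items():
        for f in outputs:
            producer[f] = name

    needed_jobs = set()
    pending = [f for f in requested if f not in available]
    while pending:
        f = pending.pop()
        job = producer.get(f)
        if job is None or job in needed_jobs:
            continue  # file has no producer, or its job is already kept
        needed_jobs.add(job)
        inputs, _ = jobs[job]
        # Inputs that already exist do not need to be recomputed.
        pending.extend(i for i in inputs if i not in available)
    return needed_jobs


# Hypothetical four-job chain: j1 -> j2 -> j3 -> j4.
JOBS = {
    "j1": (["raw"], ["a"]),
    "j2": (["a"], ["b"]),
    "j3": (["b"], ["c"]),
    "j4": (["c"], ["d"]),
}

if __name__ == "__main__":
    # Intermediate file "b" is already available, so j1 and j2 are pruned.
    print(sorted(reduce_workflow(JOBS, requested=["d"], available={"raw", "b"})))
```

The stage-in, transfer, stage-out, and registration nodes listed on the slide would be added around the surviving compute jobs once sites are chosen; the sketch covers only the pruning.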

Page 39: Part III Computational Workflows in Wings/Pegasusgil/AAAI08TutorialSlides/3-Wings.pdf · 2008-08-01 · USC Information Sciences Institute Yolanda Gil (gil@isi.edu) AAAI-08 Tutorial

Why Do We Automate All This? So You Don't Have To

| Request ID | # Binding-Ready Workflow Candidates | # Bound Workflow Candidates | # Configured Workflow Candidates | # Calls to c:find-DODs-given-output-requirements | # Calls to d:find-data-objects | # Calls to c:predict-DODs-given-input-requirements | Workflow Generation Time |
| R1 | 6 | 8 | 8 | 1 | 6 | 8 | 5 s |
| R2 | 6 | 8 | 8 | 7 | 6 | 16 | 4 s |
| R3 | 6 | 24 | 24 | 7 | 6 | 48 | 7 s |
| R4 | 6 | 24 | 24 | 13 | 6 | 72 | 8 s |
| R5 | 18 | 64 | 48 | 7 | 18 | 128 | 22 s |
| R6 | 18 | 288 | 216 | 7 | 18 | 576 | 81 s |
| R7 | 18 | 16 | 12 | 7 | 18 | 32 | 10 s |
| R8 | 6 | 0 | 0 | 1 | 6 | 0 | 1 s |

Column groups annotated on the slide:
•  Workflow candidates generated + considered (many are eliminated)
•  Queries about data
•  Queries about tools

[Figure: workflow generation pipeline, from the unified well-formed request to the top-k and executable workflows]

Page 40: Part III Computational Workflows in Wings/Pegasusgil/AAAI08TutorialSlides/3-Wings.pdf · 2008-08-01 · USC Information Sciences Institute Yolanda Gil (gil@isi.edu) AAAI-08 Tutorial

WINGS DEMO

Page 41: Part III Computational Workflows in Wings/Pegasusgil/AAAI08TutorialSlides/3-Wings.pdf · 2008-08-01 · USC Information Sciences Institute Yolanda Gil (gil@isi.edu) AAAI-08 Tutorial

Editing a Seed & Template, Generating a DAX

Workflow seed = Workflow Template + Seed Constraints

WR0: Workflow Template

WR0: Seed Constraints
•  dataVariable5 data:contains data:Muti-party-communication
•  dataVariable0 data:creator 5048
•  dataVariable1 data:creator 5048

[Figure: workflow generation pipeline; on this slide the "binding-ready workflows" label reads "candidate workflows" and the ground workflows are labeled "ground workflows (DAXes)"]
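
The composition "workflow seed = workflow template + seed constraints" can be pictured with a small data structure. This is a sketch only: Wings describes workflows in OWL/RDF, whereas the `Constraint` and `WorkflowSeed` classes below are hypothetical stand-ins. The template identifier and the three triples are copied from the slide.

```python
from dataclasses import dataclass, field
from typing import List, NamedTuple


class Constraint(NamedTuple):
    subject: str    # a data variable of the template
    predicate: str  # a metadata property
    value: str      # the required value


@dataclass
class WorkflowSeed:
    template: str  # identifier of the workflow template
    constraints: List[Constraint] = field(default_factory=list)

    def constraints_on(self, variable: str) -> List[Constraint]:
        """Return the seed constraints that mention the given data variable."""
        return [c for c in self.constraints if c.subject == variable]


# Template identifier and triples as shown on the slide.
SEED = WorkflowSeed(
    template="WR0",
    constraints=[
        Constraint("dataVariable5", "data:contains", "data:Muti-party-communication"),
        Constraint("dataVariable0", "data:creator", "5048"),
        Constraint("dataVariable1", "data:creator", "5048"),
    ],
)

if __name__ == "__main__":
    print(SEED.constraints_on("dataVariable0"))
```

Keeping the constraints separate from the template reflects the slide's point that the same template can be seeded with different constraints for different requests.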

Page 42: Part III Computational Workflows in Wings/Pegasusgil/AAAI08TutorialSlides/3-Wings.pdf · 2008-08-01 · USC Information Sciences Institute Yolanda Gil (gil@isi.edu) AAAI-08 Tutorial

SCEC WORKFLOWS IN WINGS

Page 43: Part III Computational Workflows in Wings/Pegasusgil/AAAI08TutorialSlides/3-Wings.pdf · 2008-08-01 · USC Information Sciences Institute Yolanda Gil (gil@isi.edu) AAAI-08 Tutorial

Seismic Hazard Analysis in Southern California Earthquake Center (SCEC) [Slide from T. Jordan]

InSAR Image of the Hector Mine Earthquake
•  A satellite-generated Interferometric Synthetic Aperture Radar (InSAR) image of the 1999 Hector Mine earthquake.
•  Shows the displacement field in the direction of radar imaging.
•  Each fringe (e.g., from red to red) corresponds to a few centimeters of displacement.

Seismic Hazard Model

Page 44: Part III Computational Workflows in Wings/Pegasusgil/AAAI08TutorialSlides/3-Wings.pdf · 2008-08-01 · USC Information Sciences Institute Yolanda Gil (gil@isi.edu) AAAI-08 Tutorial

Reusable High-Level Workflow Templates

•  Intensional descriptions of data sets
•  Intensional descriptions of parallel computations
•  Querying results of other data creation subworkflows
•  Rich metadata descriptions for all data products

Page 45: Part III Computational Workflows in Wings/Pegasusgil/AAAI08TutorialSlides/3-Wings.pdf · 2008-08-01 · USC Information Sciences Institute Yolanda Gil (gil@isi.edu) AAAI-08 Tutorial

Workflows for Seismic Hazard Analysis [Gil et al 06; Kim et al 06; Gil et al 07]

•  Input data: a site and an earthquake forecast model
   •  thousands of possible fault ruptures and rupture variations, each a file, unevenly distributed
   •  ~110,000 rupture variations to be simulated for a given site
•  High-level template combines 11 application codes
•  8,048 application nodes in the workflow instance generated by Wings
•  24,135 nodes in the executable workflow generated by Pegasus, including:
   •  data stage-in jobs, data stage-out jobs, data registration jobs
•  Executed in USC HPCC cluster (1,820 nodes w/ dual processors), but only < 144 available
   •  Including MPI jobs, each runs on hundreds of processors for 25-33 hours
   •  Runtime was 1.9 CPU years
•  Provenance records kept throughout the generation and execution process for 100,000 workflow data products
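
A quick back-of-the-envelope check of these figures, as a sketch. It assumes that every node in the executable workflow that is not an application node is an auxiliary job added during mapping, and that a CPU year is 365 days; neither assumption is stated on the slide.

```python
APPLICATION_NODES = 8_048   # workflow instance generated by Wings
EXECUTABLE_NODES = 24_135   # executable workflow generated by Pegasus
CPU_YEARS = 1.9             # reported runtime

# Nodes added on top of the application nodes (assumed to be auxiliary jobs).
auxiliary_nodes = EXECUTABLE_NODES - APPLICATION_NODES

# Growth factor from workflow instance to executable workflow.
expansion = EXECUTABLE_NODES / APPLICATION_NODES

# Runtime in CPU hours, assuming 365-day years.
cpu_hours = CPU_YEARS * 365 * 24

if __name__ == "__main__":
    print(auxiliary_nodes)
    print(round(expansion, 2))
    print(round(cpu_hours))
```

Under these assumptions the executable workflow is roughly three times the size of the workflow instance, which gives a sense of how much bookkeeping the mapping step takes off the user's hands.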

Page 46: Part III Computational Workflows in Wings/Pegasusgil/AAAI08TutorialSlides/3-Wings.pdf · 2008-08-01 · USC Information Sciences Institute Yolanda Gil (gil@isi.edu) AAAI-08 Tutorial

DAX automatically generated from WINGS

14,639 jobs for 4,626 ruptures with 106,124 rupture variations for USC site

Page 47: Part III Computational Workflows in Wings/Pegasusgil/AAAI08TutorialSlides/3-Wings.pdf · 2008-08-01 · USC Information Sciences Institute Yolanda Gil (gil@isi.edu) AAAI-08 Tutorial

Summary: Creating Workflows with WINGS

•  Separates analysis spec from data
   •  Workflow template as reusable well-defined acceptable analysis process
   •  Workflow instance binds template to data for particular analyses
•  Ensures that the data complies with the component specifications and their constraints within the workflow
•  Represents data collections (nominal or otherwise) within the workflow specification
•  Automatically generates descriptions and metadata for new data products to be created by the workflow execution
•  Compact workflow instance is user-friendly and reusable
   •  Separates data provenance (workflow instance) and pedigree (workflow template)
•  Expands workflow instance into DAX for Pegasus, which creates the executable workflow
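
The last point, expanding a compact workflow instance into a much larger job list, can be sketched as follows. This is not the Wings expansion algorithm or the DAX schema: `expand_step`, the component name, and the file names are hypothetical, and the sketch covers only the simplest case of one component applied to each element of a data collection.

```python
def expand_step(component, collection, output_suffix):
    """Expand one template step over a data collection into one job per file.

    Returns a list of job dicts with an id, the component to run,
    and the concrete input and output file names.
    """
    jobs = []
    for index, input_file in enumerate(collection, start=1):
        jobs.append({
            "id": f"{component}_{index}",
            "component": component,
            "inputs": [input_file],
            "outputs": [f"{input_file}{output_suffix}"],
        })
    return jobs


if __name__ == "__main__":
    # The compact instance names the collection once;
    # the expansion has one job per element of the collection.
    variations = [f"rupture_variation_{i}" for i in range(1, 4)]
    for job in expand_step("simulate", variations, ".out"):
        print(job["id"], job["inputs"], job["outputs"])
```

The compact form stays readable no matter how large the collection is, while the expanded form is what a mapper such as Pegasus needs to schedule individual jobs.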

Page 48: Part III Computational Workflows in Wings/Pegasusgil/AAAI08TutorialSlides/3-Wings.pdf · 2008-08-01 · USC Information Sciences Institute Yolanda Gil (gil@isi.edu) AAAI-08 Tutorial

Key Benefits

•  Efficient and correct creation of new workflows
   •  By retrieving a template and filling in the data
•  Framework ensures adherence to methodology
   •  Represents widely accepted analysis methodologies as templates
   •  Supports repeatability of experiments/analyses
   •  Enables controlled variations
•  Ensures better quality of data analysis results
   •  Attaches provenance and pedigree information

Page 49: Part III Computational Workflows in Wings/Pegasusgil/AAAI08TutorialSlides/3-Wings.pdf · 2008-08-01 · USC Information Sciences Institute Yolanda Gil (gil@isi.edu) AAAI-08 Tutorial

Ongoing and Future Work

•  Interactive assistance in creating valid workflow templates
   •  Based on CAT (Composition Analysis Tool) [Kim et al 05]
•  More sophisticated models of components
•  Automatic completion of a workflow's data conversion and formatting steps through AI planning techniques
•  Tracking new versions of components, invalidating data and workflows from old versions
•  Workflow template libraries
   •  Indexing, retrieval
•  Managing collections of workflows as part of an overall analysis activity
   •  E.g., parameter sweeping, variants of analysis