Yolanda Gil, AAAI-08 Tutorial, July 13, 2008, USC Information Sciences Institute
Part III
Computational Workflows in Wings/Pegasus
AAAI-08 Tutorial on Computational Workflows for
Large-Scale Artificial Intelligence Research
Our Approach
Express analysis as distributed workflows
• Data analysis as distributed application
User-centric workflow refinement process
• Start with high-level problem description, add layers of detail, map to distributed execution environment
Knowledge-rich descriptions of workflows -- OWL/RDF
• Descriptions of input data and data products (aka “metadata”)
• Models of components in terms of I/O data and their function
Automation of resource allocation and optimization
• Efficient scheduling algorithms for workflow graphs
• Optimization techniques of broad applicability
Build on distributed computing research -- GRID
• Designed, by definition, to be robust, secure, flexible
The Wings/Pegasus Workflow System [Gil et al 07; Deelman et al 03; Deelman et al 05; Kim et al 08; Gil et al forthcoming]
WINGS: Knowledge-based workflow environment (www.isi.edu/ikcap/wings)
• Ontology-based reasoning on workflows and data (W3C’s OWL)
• Workflow library of useful analyses
• Proactive assistance + automation
• Execution-independent workflows
Pegasus: Automated workflow refinement and execution (pegasus.isi.edu)
• Optimize for performance, cost, reliability
• Assign execution resources
• Manage execution through DAGMan
• Daily operational use in many domains
Grid services (condor.uwisc.edu, www.globus.org)
• Secure and controlled sharing of distributed services, computing, data
• Scalable service-oriented architecture
• Commercial quality, open source
Wings: Workflow Instance Generation and Selection
[Figure: WINGS architecture diagram. Recoverable labels follow.]
User roles: STUDENT, SEASONED NL RESEARCHER, ALGORITHM DEVELOPER
User activities: Workflow Selection, Data Selection, Workflow Creation, Component Specification
Example requests:
• “Show me workflows that classify datasets”
• “Run this workflow with the weather1980 data set”
• “Validate this workflow based on the component specs”
• “Here is a new classification algorithm, has a parameter for smoothing, is compiled for MPI”
WINGS knowledge sources (OWL):
• Workflow Libraries: workflow templates specify complex analysis sequences; workflow instances specify data
• Data Repositories: preexisting data collections; workflow execution results
• Application Components: specify data requirements; specify execution requirements
• Ontologies: domain terms, component types, workflow products
Products: Workflow Template, Workflow Instance, Executable Workflow (Pegasus, DAGMan/Grid)
[Figure: software stack of the workflow system. Recoverable labels follow.]
Wings: Workflow validation, Data/Comp selection, Metadata generation, Workflow generation
Pegasus: Site selection, Replica selection, Workflow optimization
Workflow submission
Middleware: Globus RLS (replica mgmt), GRAM (remote submission), GridFTP (data transfer), Condor DAGMan (execution engine), Condor-G (job manager), Nagios (monitoring probes)
LEGEND: Workflow System; National Middleware Infrastructure (NMI) software
All software is open source
Workflow Structure
We take to heart the separation of “programming” from “analysis” activities
– Components are designed by programmers and can be complex (and need testing, debugging, loops should terminate, etc)
– Workflows are composed by non-programmers and should have simple structure -- focus is on selecting application components and data
Therefore, our workflow structure is very streamlined
• Only iterations handled are parallel data processing pipelines
• Only conditionals handled are data-driven component selections
• Standard workflow languages offer much more complex constructs
Workflow structure designed to:
• Be accessible to users
• Facilitate automation and failure recovery
Core Workflow Concepts
[Figure: example workflow graph with components C1, C2, C3 and files F1 to F6]
Workflow consists of
• Components: software to be executed
• Links: data flow among components
Directed Acyclic Graphs (DAGs)
• Facilitate automation, esp. execution monitoring and repair
Data always handled through files
Special handling of some control constructs and loops (more on this later)
• Choices of components
• Iterations over data sets
Layered workflow refinement process
• Select application components -> select data -> select execution resources
Each layer adds more information to the same basic workflow structure
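The core concepts above can be sketched in a few lines of Python. This is an illustrative model only, not the WINGS data structures: the wiring between components and files below is hypothetical, since the figure's exact links cannot be recovered from this transcript. It derives dependencies from the files that components read and write, then orders the components with a topological sort.

```python
from graphlib import TopologicalSorter

# Hypothetical wiring: each component lists the files it reads and writes.
# The names echo the slide's labels, but the actual links are an assumption.
components = {
    "C1": {"inputs": ["F1", "F2"], "outputs": ["F3", "F4"]},
    "C2": {"inputs": ["F3"], "outputs": ["F5"]},
    "C3": {"inputs": ["F4", "F5"], "outputs": ["F6"]},
}


def dependencies(components):
    """Map each component to the set of components that produce its inputs."""
    producer = {f: name for name, c in components.items() for f in c["outputs"]}
    return {
        name: {producer[f] for f in c["inputs"] if f in producer}
        for name, c in components.items()
    }


def execution_order(components):
    """Return one valid execution order; raises CycleError if not a DAG."""
    return list(TopologicalSorter(dependencies(components)).static_order())


order = execution_order(components)
print(order)
```

Because data always flows through files, the dependency graph never has to be written by hand in this sketch: it falls out of the file links.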
Workflow Abstraction Layers
We use several layers of description of workflows
WINGS: Workflow Representation
[Figure: a Domain Ontology linked to an Ontology of Components, seismic hazard example. Recoverable labels follow; the grouping is approximate.]
Domain Ontology:
• Hazard-Level, Hazard-Level-with-SA, Hazard-Level-with-PGA, Hazard-Level-with-PGV
• Hazard-Level-with-SA-Median, Hazard-Level-with-SA-Std-Dev, Hazard-Level-with-SA-Prob-Exc
• Hazard-Level-with-Median, Hazard-Level-with-Std-Dev
• F2-Hazard-Level, F2-SA-Median-wrt-VS30
• Parameter, IMR-Input-Parameter, Field-2000-Input-Parameter, Fault-Type, Basin-Depth, Distance, . . .
• IMR, IMT, probability-function
Ontology of Components:
• Compute-Hazard-Level-given-IMR-input-parameters
• Compute-Hazard-Level-with-SA-given-IMR-input-parameters
• Compute-Hazard-Level-with-PGA-given-IMR-input-parameters
• Compute-Hazard-Level-with-PGV-given-IMR-input-parameters
• Compute-Hazard-Level-with-SA-Median-given-IMR-input-parameters
• Compute-Hazard-Level-with-SA-Std-Dev-given-IMR-input-parameters
• Compute-Hazard-Level-with-SA-Prob-Exc-given-IMR-input-parameters
• Compute-F2-Hazard-Level-given-Field-2000-input-parameters
• Compute-F2-SA-Median-given-Field-2000-input-parameters
• Compute-F2-SA-Median-wrt-Distance-JB-given-Fault-Type-&-Basin-Depth-&-…
• Compute-F2-SA-MEDIAN-wrt-VS30-given-Fault-Type-&-Basin-Depth-&-…
• F2-operation-SA-Median-Distance-JB, F2-operation-SA-Median-VS30
WINGS: Representing Components
Inputs and outputs through files
• Files are typed
Each input is uniquely identified by a file descriptor (~ parameterID)
Ordered lists of file descriptors for both I and O
Any input or output can be defined as a file collection
• Same file type
• Unspecified cardinality
• Ordered
[Figure: component C-one with file descriptors D1, D2, D3; component C-many with descriptors DC11, D12, D13, where DC11 takes a collection of F1 files]
Data Descriptions
Metadata of different kinds can be organized in ontology
Files represented as instances and classified in ontology according to their metadata
File collections also represented as instances and defined as ordered sets of file instances
A file Skolem is created for each class as a representative instance (more on this later)
Similarly, a file collection Skolem is created for each class
[Figure: Application-Specific Metadata Ontologies (Content Metadata, Format Metadata). Labels: file instances Kim-Homepage and Gil-Homepage; EHS-T; File Collection IKCAP-pages (Kim-Homepage, Gil-Homepage, …); EHCS-T]
A Component in a Workflow Template
Nodes correspond to individual application components
Links include file descriptors for origin and destination and a file Skolem
Notation: “S” marks a Skolem
[Figure: template with nodes N1, N2, N3 (components C-one and C67, descriptors D1, D2, D3, D6, D7), links L1 to L4, and file Skolems FS-A, FS-B, FS-C, FS-D]
File Collections in a Workflow Template
Links that include file descriptors that are collections refer to file collection Skolems
Using the same file Skolem ID or file collection Skolem ID in different links indicates identity
[Figure: node N1 (component C-many, descriptors DC11, D12, D13) with links L1 to L3, file collection Skolem FCS-A, and file Skolems FS-B, FS-C]
Iteration Over File Collections in a Workflow T
Iteration over sets compactly represented with single nodes that contain component collections
Will be expanded to as many jobs as files are specified for the executable workflow
Links capture formation of file collections as input
[Figure: node NC1 holds a collection of C-one components. It consumes file collections FCS-G (G1 … G88) and FCS-K (K1 … K88) and produces FCS-Z (Z1 … Z88), which feeds C-many at node N2 to produce FS-Y (Y1). Links L1 to L4.]
Iteration With a Constant in a Workflow T
Nodes that represent component collections can take the same file from the same link when the link contains a file Skolem instead of a file collection Skolem
[Figure: node NC1 holds a collection of C-one components. Each one receives a different file from FCS-G (G1 … G88) but the same file K1 (file Skolem FS-K1), producing FCS-Z (Z1 … Z88), which feeds C-many at node N2 to produce FS-Y (Y1). Links L1 to L4.]
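The two iteration patterns, over paired collections and with a constant file, can be sketched as a single expansion step. This is a minimal illustration, not the WINGS algorithm: the function name, the dictionary layout, and the assignment of G and K files to descriptors D1 and D2 are all assumptions made for the example, and 3 files stand in for the 88 in the figures.

```python
def expand_collection_node(component, inputs):
    """Expand one component-collection node into one job per collection element.

    `inputs` maps a file descriptor to either a list (a file collection,
    iterated element by element) or a single string (a constant file that
    every job receives).
    """
    sizes = {len(v) for v in inputs.values() if isinstance(v, list)}
    if len(sizes) != 1:
        raise ValueError("need at least one collection, all of equal size")
    n = sizes.pop()
    jobs = []
    for i in range(n):
        bound = {d: (v[i] if isinstance(v, list) else v) for d, v in inputs.items()}
        jobs.append({"component": component, "inputs": bound})
    return jobs


# Iteration over two collections: G_i is paired with K_i.
paired = expand_collection_node(
    "C-one", {"D1": ["G1", "G2", "G3"], "D2": ["K1", "K2", "K3"]}
)

# Iteration with a constant: every job receives the same K1.
constant = expand_collection_node(
    "C-one", {"D1": ["G1", "G2", "G3"], "D2": "K1"}
)
```

The template stays compact because the node is stored once; the number of jobs is only fixed when concrete files are bound.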
Constraints on Workflow Templates
Constraints on number of elements in different collections
Constraints on files/collections of different workflow components
[Figure: constraint graph for CybershakeTemplate. Recoverable labels: links (hasLink) InputLink_SiteNameFile_to_BoxNameCheck, InputLink_RuptureVars_to_SeisgmogramGen, InputLink_SGTCollforRup_to_SeismogramGen, …; files (hasFile) SiteNameFile, F-RV, F-SGT; components C-RuptVars, C-SGT-forRups; component collections CC-RuptureVariations, CC-SGTs; properties hasSiteName (SiteName, SGTsSiteName), hasN_Items (N_Rups), isSameAs]
Workflow Instances
Input data selected from the file library by querying for files of the type of file Skolems
Logical names created for new data products with metadata based on file Skolems
Compact Workflow Instance = WT + bindings
Easy to understand, and easily transformed into an expanded WI and a DAX for Pegasus
[Figure: workflow template (nodes N1 to N4, links L1 to L7, components C-one, C67, C-plenty, file Skolems FS-A, FS-B, F-C, FS-E, FS-F, FS-G, FCS-D) with bindings. Bound names shown: File85, File28, FileColl54, F34254-05-06-08, F34255-05-06-08, F34256-05-06-08, F34257-05-06-08. Legend: Existing data; New data products.]
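The idea that a compact workflow instance is a template plus bindings can be sketched as follows. This is an assumption-laden illustration, not the WINGS representation: the class, the file types ("Corpus", "Rules", "Report"), and the naming scheme for new products are hypothetical. The slides state that real names are derived from metadata, which this sketch reduces to a run identifier.

```python
from dataclasses import dataclass, field


@dataclass
class CompactInstance:
    template: str                                   # name of the workflow template
    skolem_types: dict                              # file Skolem -> file type
    bindings: dict = field(default_factory=dict)    # file Skolem -> logical file name


def bind_inputs(instance, library, input_skolems):
    """Bind each input Skolem to the first library file of the matching type."""
    for s in input_skolems:
        wanted = instance.skolem_types[s]
        matches = [name for name, ftype in library.items() if ftype == wanted]
        if not matches:
            raise LookupError(f"no file of type {wanted!r} for {s}")
        instance.bindings[s] = matches[0]


def name_new_products(instance, run_id):
    """Create a logical name for every Skolem that is still unbound."""
    for s in instance.skolem_types:
        instance.bindings.setdefault(s, f"{s}-{run_id}")


# Hypothetical file library: logical file name -> file type.
library = {"File85": "Corpus", "File28": "Rules"}
inst = CompactInstance("WT", {"FS-A": "Corpus", "FS-B": "Rules", "FS-E": "Report"})
bind_inputs(inst, library, ["FS-A", "FS-B"])
name_new_products(inst, "05-06-08")
print(inst.bindings)
```

Keeping the bindings separate from the template is what lets the same template be reused with different data.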
AUTOMATED WORKFLOW INSTANCE GENERATION IN WINGS
Workflow Instance Expressions
• Compact expression for efficient search and matching
• Expanded expression when further details are needed
[Figure: example workflow. Recoverable labels: Corpus, Kernel_Rules, Split, Filter_Rules, Prune_Rules, Binarize, Generate_Rule_Map, Compile, XRS_Rules, BRF_Rules, Lexicon_Dictionary; 1…n marks on several links; bindings WSJ-2001 and KR-09-05]
Expanded Workflow Instance
<rdf:RDF ..(xmlns definitions)....>
  <wflns:WorkflowInstance rdf:ID="WFT0b">
    <wflns:hasDescription rdf:datatype="http://www.w3.org/2001/XMLSchema#string">
      Count the number of unique words in a file
    </wflns:hasDescription>
    <wflns:hasNode rdf:resource="#N1"/>
    <wflns:hasNode rdf:resource="#N2"/>
    <wflns:hasLink rdf:resource="#L12"/>
    <wflns:hasLink rdf:resource="#L01"/>
    <wflns:hasLink rdf:resource="#L2Output"/>
  </wflns:WorkflowInstance>
  <wflns:InOutLink rdf:ID="L12">
    <wflns:hasOriginFileDescription rdf:resource="http://www.isi.edu/ikcap/wings/domains/linguistics/componentLibrary.owl#remDupesOutputFile"/>
    <wflns:hasFile rdf:resource="http://www.isi.edu/ikcap/wings/domains/linguistics/fileLibrary.owl#F12_WFT0b_1117161532484"/>
    <wflns:hasDestinationFileDescription rdf:resource="http://www.isi.edu/ikcap/wings/domains/linguistics/componentLibrary.owl#CountWordsInputFile"/>
    <wflns:hasDestinationNode>
      <wflns:Node rdf:ID="N2">
        <wflns:hasComponent rdf:resource="http://www.isi.edu/ikcap/wings/domains/linguistics/componentLibrary.owl#countWordsV1"/>
      </wflns:Node>
    </wflns:hasDestinationNode>
    <wflns:hasOriginNode>
      <wflns:Node rdf:ID="N1">
        <wflns:hasComponent rdf:resource="http://www.isi.edu/ikcap/wings/domains/linguistics/componentLibrary.owl#removeDuplicatesV1"/>
      </wflns:Node>
    </wflns:hasOriginNode>
  </wflns:InOutLink>
  <wflns:InputLink rdf:ID="L01">
    <wflns:hasDestinationNode rdf:resource="#N1"/>
    <wflns:hasFile rdf:resource="http://www.isi.edu/ikcap/wings/domains/linguistics/fileLibrary.owl#test_txt_WFT0b_1117161532484"/>
    <wflns:hasDestinationFileDescription rdf:resource="http://www.isi.edu/ikcap/wings/domains/linguistics/componentLibrary.owl#remDupesInputFile"/>
  </wflns:InputLink>
  <wflns:OutputLink rdf:ID="L2Output">
    <wflns:hasOriginFileDescription rdf:resource="http://www.isi.edu/ikcap/wings/domains/linguistics/componentLibrary.owl#CountWordsOutputFile"/>
    <wflns:hasOriginNode rdf:resource="#N2"/>
    <wflns:hasFile rdf:resource="http://www.isi.edu/ikcap/wings/domains/linguistics/fileLibrary.owl#F2Output_WFT0b_1117161532484"/>
  </wflns:OutputLink>
</rdf:RDF>
W Instance: “dax” for Pegasus
<?xml version="1.0" encoding="UTF-8"?>
<!-- generated: 2004-08-18T10:53:01-05:00 -->
<adag xmlns="http://www.griphyn.org/chimera/DAX"
      xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
      xsi:schemaLocation="http://www.griphyn.org/chimera/DAX http://www.griphyn.org/chimera/dax-1.8.xsd"
      version="1.7" count="1" index="0" name="WorkFlow0b">
  <!-- part 1: list of all files used (may be empty) -->
  <filename file="vahi.f.a" link="input"/>
  <filename file="vahi.f.b1" link="inout"/>
  <filename file="vahi.f.b2" link="output"/>
  <!-- part 2: definition of all jobs (at least one) -->
  <job id="ID000001" namespace="vds" name="removeDups" version="1.0" level="3" dv-namespace="vds" dv-name="top" dv-version="1.0">
    <argument>-a top -T60 -i <filename file="vahi.f.a"/> -o <filename file="vahi.f.b1"/> </argument>
    <uses file="vahi.f.a" link="input" dontRegister="false" dontTransfer="false"/>
    <uses file="vahi.f.b1" link="output" dontRegister="true" dontTransfer="true" temporaryHint="true"/>
  </job>
  <job id="ID000002" namespace="vds" name="countWords" version="1.0" level="2" dv-namespace="vds" dv-name="left" dv-version="1.0">
    <argument>-a left -T60 -i <filename file="vahi.f.b1"/> -o <filename file="vahi.f.b2"/> -p 0.5</argument>
    <uses file="vahi.f.b1" link="input" dontRegister="false" dontTransfer="false" temporaryHint="true"/>
    <uses file="vahi.f.b2" link="output" dontRegister="true" dontTransfer="true" temporaryHint="true"/>
  </job>
  <!-- part 3: list of control-flow dependencies (empty for single jobs) -->
  <child ref="ID000002">
    <parent ref="ID000001"/>
  </child>
</adag>
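To make the three parts of the DAX concrete, here is a small reader written with Python's standard xml.etree.ElementTree. It is a sketch for illustration, not part of Pegasus: the function name read_dax is made up, and the embedded document is a trimmed copy of the DAX above that keeps only the jobs, their file uses, and the dependency section.

```python
import xml.etree.ElementTree as ET

# Namespace taken from the xmlns attribute of the <adag> element above.
NS = {"dax": "http://www.griphyn.org/chimera/DAX"}

# Trimmed copy of the DAX above (assumption: attributes not needed here are dropped).
DAX = """<adag xmlns="http://www.griphyn.org/chimera/DAX" version="1.7" name="WorkFlow0b">
  <job id="ID000001" namespace="vds" name="removeDups" version="1.0">
    <uses file="vahi.f.a" link="input"/>
    <uses file="vahi.f.b1" link="output"/>
  </job>
  <job id="ID000002" namespace="vds" name="countWords" version="1.0">
    <uses file="vahi.f.b1" link="input"/>
    <uses file="vahi.f.b2" link="output"/>
  </job>
  <child ref="ID000002">
    <parent ref="ID000001"/>
  </child>
</adag>"""


def read_dax(text):
    """Return ({job id: job name}, [(parent id, child id), ...])."""
    root = ET.fromstring(text)
    jobs = {j.get("id"): j.get("name") for j in root.findall("dax:job", NS)}
    edges = [
        (p.get("ref"), c.get("ref"))
        for c in root.findall("dax:child", NS)
        for p in c.findall("dax:parent", NS)
    ]
    return jobs, edges


jobs, edges = read_dax(DAX)
print(jobs, edges)
```

The single edge says that countWords (ID000002) must wait for removeDups (ID000001), which matches the data flow through vahi.f.b1.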
AUTOMATED METADATA GENERATION IN WINGS
Metadata Reasoning for file name generation and workflow validation
Filename Generation
• Explicit representation of metadata in ontology (e.g. source id, rupture id)
• Propagate metadata attributes for all data products when creating workflow instance
• Names for intermediate files are created automatically from the metadata
Workflow Validation
• Explicit representation of metadata constraints (examples are shown below)
  – Constraints on individual files and collections
  – Constraints on component inputs and outputs
  – Constraints among components in a workflow
• Check constraints while generating workflow instantiations
Propagation of metadata for filename generation: an example
[Figure: metadata propagation around component SeismogramGen_Li. File types, files, and metadata shown:]
RVM: 127_6.rvm
  - source_id: 127 - rupture_id: 6
Rupture_variation: 127_6.txt.variation-s0000-h0000
  - source_id: 127 - rupture_id: 6 - slip_realization_#: 0 - hypo_center_#: 1
Rupture_variation: 127_6.txt.variation-s0000-h0001
  - source_id: 127 - rupture_id: 6 - slip_realization_#: 0 - hypo_center_#: 1
SGT: FD_SGT/PAS_1/A/SGT161
  - site_name: PAS - tensor_direction: 1 - time_period: A - xyz_volumn_id: 161
Seismogram: Seismogram_PAS_127_6.grm
  - site_name: PAS - source_id: 127 - rupture_id: 6
…
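The example above can be reproduced with a short sketch: metadata from the input files is merged, and the output name is built from the merged attributes. The function names and the choice of which attributes to keep are assumptions for illustration; only the attribute values and the resulting name come from the slide.

```python
def propagate(*sources, keep):
    """Merge metadata from input files, keeping only the named attributes."""
    merged = {}
    for meta in sources:
        for key in keep:
            if key in meta:
                if key in merged and merged[key] != meta[key]:
                    raise ValueError(f"conflicting values for {key}")
                merged[key] = meta[key]
    return merged


def seismogram_name(meta):
    """Build a file name from metadata, following the slide's example pattern."""
    return "Seismogram_{site_name}_{source_id}_{rupture_id}.grm".format(**meta)


# Metadata copied from the example above.
rupture_variation = {
    "source_id": 127,
    "rupture_id": 6,
    "slip_realization_#": 0,
    "hypo_center_#": 1,
}
sgt = {"site_name": "PAS", "tensor_direction": 1, "time_period": "A", "xyz_volumn_id": 161}

meta = propagate(rupture_variation, sgt, keep=["site_name", "source_id", "rupture_id"])
name = seismogram_name(meta)
print(name)
```

The conflict check is a stand-in for workflow validation: if two inputs disagreed on an attribute that the output inherits, the instance would be rejected rather than named.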
AUTOMATIC WORKFLOW GENERATION IN WINGS
Automatic Template-Based Workflow Generation Algorithm
Workflow request = Workflow Template + Seed Constraints
WR0: Workflow Template [figure]
WR0: Seed Constraints
  dataVariable5 data:contains data:Muti-party-communication
  dataVariable0 data:creator 5048
  dataVariable1 data:creator 5048
[Figure: generation pipeline. Each stage consumes the product of the previous one.]
unified well-formed request
-> Seed workflow from request -> seeded workflows
-> Find input data requirements -> binding-ready workflows
-> Data source selection -> bound workflows
-> Parameter selection -> configured workflows
-> Workflow instantiation -> workflow instances
-> Workflow grounding -> ground workflows
-> Workflow ranking -> top-k workflows
-> Workflow mapping -> executable workflows
Step 1: Workflow Template is Seeded
Step 2: Backward Sweep
Step 3: Select Data Sources
[Figure labels: E-07, S-NY]
Step 4: Forward Sweep
[Figure labels: E-07, S-NY]
Step 5: Workflow Instantiation
[Figure labels: E-07, S-NY, Result-PartA, Result-PartB]
Step 6: Workflow Grounding
[Figure: Ground Workflow with parent links between jobs; labels E-07, S-NY, Result-PartA, Result-PartB]
<job id = “j42” name=“Neuman-BC”> <argument> -i E-07 17.5 -o ES-07….
Step 7: Workflow Ranking
W1: estimated exec time 3hrs
W2: estimated exec time 20hrs
W3: estimated exec time 3d
W4: estimated exec time 5hrs
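As a rough illustration of the ranking step, the sketch below orders the four candidates by estimated execution time and keeps the k fastest. Using execution time as the only criterion is an assumption: the slide shows the estimates but does not state the ranking function. The unit parsing ("hrs", "d") is likewise invented to read the slide's strings.

```python
import heapq

# Assumed conversion factors for the units that appear on the slide.
UNIT_HOURS = {"hrs": 1, "d": 24}


def to_hours(estimate):
    """Convert strings such as '3hrs' or '3d' to a number of hours."""
    for unit, factor in UNIT_HOURS.items():
        if estimate.endswith(unit):
            return float(estimate[: -len(unit)]) * factor
    raise ValueError(f"unknown time unit in {estimate!r}")


def top_k(estimates, k):
    """Return the k workflow names with the smallest estimated time."""
    return heapq.nsmallest(k, estimates, key=lambda w: to_hours(estimates[w]))


estimates = {"W1": "3hrs", "W2": "20hrs", "W3": "3d", "W4": "5hrs"}
ranked = top_k(estimates, 2)
print(ranked)
```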
Step 8: Workflow Mapping
Ground workflow: 15 compute nodes devoid of resource assignment
Executable workflow: mapped to 3 sites
• 13 data stage-in nodes
• 11 compute nodes (1-2&5-6 reduced based on available intermediate data)
• 8 inter-site data transfers
• 14 data stage-out nodes to long-term storage
• 14 data registration nodes (data cataloging)
[Figure: node graphs of the ground workflow and the executable workflow]
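The reduction from 15 to 11 compute nodes comes from skipping jobs whose outputs already exist. A simplified version of that idea is sketched below; it is not the Pegasus implementation, and the job names, file names, and function signature are hypothetical. It walks backward from the requested files and keeps only the jobs still needed.

```python
def reduce_workflow(jobs, available, requested):
    """Keep only the jobs needed to produce the `requested` files.

    jobs: {job id: {"inputs": [...], "outputs": [...]}}
    available: set of files that already exist (e.g. found in a data catalog)
    requested: files the user wants at the end
    """
    producer = {f: j for j, spec in jobs.items() for f in spec["outputs"]}
    needed_jobs = set()
    pending = [f for f in requested if f not in available]
    while pending:
        f = pending.pop()
        job = producer.get(f)
        if job is None:
            raise LookupError(f"{f} is neither available nor produced by any job")
        if job in needed_jobs:
            continue
        needed_jobs.add(job)
        # Only chase inputs that do not exist yet.
        pending.extend(i for i in jobs[job]["inputs"] if i not in available)
    return needed_jobs


# Hypothetical three-job chain: a -> j1 -> b -> j2 -> c -> j3 -> d
chain = {
    "j1": {"inputs": ["a"], "outputs": ["b"]},
    "j2": {"inputs": ["b"], "outputs": ["c"]},
    "j3": {"inputs": ["c"], "outputs": ["d"]},
}

full = reduce_workflow(chain, available={"a"}, requested=["d"])
reduced = reduce_workflow(chain, available={"a", "c"}, requested=["d"])
print(sorted(full), sorted(reduced))
```

When the intermediate file c is already available, j1 and j2 drop out, which is the same effect the slide reports for nodes 1-2 and 5-6.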
Why Do We Automate All This? So You Don’t Have To
Request ID | # Binding-Ready Workflow Candidates | # Bound Workflow Candidates | # Configured Workflow Candidates | # Calls to c:find-DODs-given-output-requirements | # Calls to d:find-data-objects | # Calls to c:predict-DODs-given-input-requirements | Workflow Generation Time
R1 | 6 | 8 | 8 | 1 | 6 | 8 | 5 s
R2 | 6 | 8 | 8 | 7 | 6 | 16 | 4 s
R3 | 6 | 24 | 24 | 7 | 6 | 48 | 7 s
R4 | 6 | 24 | 24 | 13 | 6 | 72 | 8 s
R5 | 18 | 64 | 48 | 7 | 18 | 128 | 22 s
R6 | 18 | 288 | 216 | 7 | 18 | 576 | 81 s
R7 | 18 | 16 | 12 | 7 | 18 | 32 | 10 s
R8 | 6 | 0 | 0 | 1 | 6 | 0 | 1 s
Candidate columns: workflow candidates generated + considered (many are eliminated)
Calls to d: functions: queries about data
Calls to c: functions: queries about tools
WINGS DEMO
Editing a Seed & Template, Generating a DAX
Workflow seed = Workflow Template + Seed Constraints
WR0: Workflow Template [figure]
WR0: Seed Constraints
  dataVariable5 data:contains data:Muti-party-communication
  dataVariable0 data:creator 5048
  dataVariable1 data:creator 5048
[Figure: generation pipeline. Stage products are labeled here as seeded workflows, candidate workflows, bound workflows, configured workflows, workflow instances, ground workflows (DAXes), top-k workflows, executable workflows.]
SCEC WORKFLOWS IN WINGS
Seismic Hazard Analysis in Southern California Earthquake Center (SCEC) [Slide from T. Jordan]
Seismic Hazard Model
InSAR Image of the Hector Mine Earthquake
• A satellite-generated Interferometric Synthetic Aperture Radar (InSAR) image of the 1999 Hector Mine earthquake.
• Shows the displacement field in the direction of radar imaging
• Each fringe (e.g., from red to red) corresponds to a few centimeters of displacement.
Reusable High-Level Workflow Templates
• Intensional descriptions of data sets
• Intensional descriptions of parallel computations
• Querying results of other data creation subworkflows
• Rich metadata descriptions for all data products
Workflows for Seismic Hazard Analysis [Gil et al 06; Kim et al 06; Gil et al 07]
Input data: a site and an earthquake forecast model
• thousands of possible fault ruptures and rupture variations, each a file, unevenly distributed
• ~110,000 rupture variations to be simulated for a given site
High-level template combines 11 application codes
8048 application nodes in the workflow instance generated by Wings
24,135 nodes in the executable workflow generated by Pegasus, including:
• data stage-in jobs, data stage-out jobs, data registration jobs
Executed in USC HPCC cluster (1820 nodes w/ dual processors) but only < 144 available
• Including MPI jobs, each runs on hundreds of processors for 25-33 hours
• Runtime was 1.9 CPU years
Provenance records kept throughout the generation and execution process for 100,000 workflow data products
DAX automatically generated from WINGS
14,639 jobs for 4,626 ruptures with 106,124 rupture variations for USC site
Summary: Creating Workflows with WINGS
Separates analysis spec from data
• Workflow template as reusable well-defined acceptable analysis process
• Workflow instance binds template to data for particular analyses
Ensures that the data complies with the component specifications and their constraints within the workflow
Represents data collections (nominal or otherwise) within the workflow specification
Automatically generates descriptions and metadata for new data products to be created by the workflow execution
Compact workflow instance is user-friendly and reusable
• Separates data provenance (workflow instance) and pedigree (workflow template)
Expands workflow instance into DAX for Pegasus, which creates the executable workflow
Key Benefits
Efficient and correct creation of new workflows
• By retrieving a template and filling in the data
Framework ensures adherence to methodology
• Represents as templates widely-accepted analysis methodologies
• Supports repeatability of experiments/analyses
• Enables controlled variations
Ensures better quality of data analysis results
• Attaches provenance and pedigree information
Ongoing and Future Work
Interactive assistance in creating valid workflow templates
• Based on CAT (Composition Analysis Tool) [Kim et al 05]
More sophisticated models of components
Automatic completion of workflow’s data conversion and formatting steps through AI planning techniques
Tracking new versions of components, invalidating data and workflows from old versions
Workflow template libraries
• Indexing, retrieval
Managing collections of workflows as part of an overall analysis activity
• E.g.: parameter sweeping, variants of analysis