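[Figure: the SCEC composition process. Components shown: Compositional Analysis Tool (CAT), Pathway Composition Tool, Grid-Based Data Selector, DAX Generator, Pegasus, Condor DAGMan, and the Grid (host1, host2, data). Supporting services: CAT Knowledge Base, SCEC Datatype DB, Metadata Catalog Service, Replica Location Service. Other labels: Dax, Dag, Rsl, HAZARD MAP.]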
The Grid is a complex, distributed and heterogeneous execution environment. Running applications on it requires knowledge of many Grid services: users need to discover the available resources and schedule jobs onto them, essentially composing detailed application workflow descriptions by hand. This leaves users struggling with the complexity of the Grid and weighing which resources to use, where to run the computations, and where to access the data. There is therefore a need to automate workflow generation and execution as much as possible.
Pegasus: Planning for Execution in Grids http://pegasus.isi.edu
Maps from abstract to concrete workflow.
Isolates the user from many Grid details.
Automatically locates physical locations for both components (transformations) and data, via Globus RLS and the Transformation Catalog.
Finds appropriate resources to execute the components (via Globus MDS).
Interfaces with external site selectors.
Publishes newly derived data products.
Reuses existing data products where applicable.
Supports on-demand staging of binary executables.
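The reuse of existing data products can be pictured as a pruning pass over the abstract workflow: a job needs to run only if a file it produces is required and not yet registered. The following is a minimal Python sketch of that idea, not Pegasus's actual implementation. The job names, the file names, and the `available` set (standing in for a replica catalog lookup) are all hypothetical.

```python
from typing import Dict, List, Set

# Hypothetical abstract workflow: job name -> logical input/output files.
WORKFLOW: Dict[str, Dict[str, List[str]]] = {
    "preparation": {"inputs": ["region.params"], "outputs": ["pathway2.in"]},
    "pathway2": {"inputs": ["pathway2.in"], "outputs": ["wave.bin"]},
    "pathway2pgv": {"inputs": ["wave.bin"], "outputs": ["hazard.map"]},
}


def jobs_to_run(jobs: Dict[str, Dict[str, List[str]]],
                available: Set[str],
                targets: Set[str]) -> Set[str]:
    """Return the jobs that must still run to produce `targets`.

    `available` holds logical file names that already exist somewhere
    (an assumption standing in for a replica catalog query).
    """
    # Map each logical output file to the job that produces it.
    producer = {out: name for name, job in jobs.items() for out in job["outputs"]}
    needed: Set[str] = set()
    pending = list(targets)
    while pending:
        lfn = pending.pop()
        if lfn in available:
            continue  # an existing replica can be reused; no job required
        if lfn not in producer:
            raise ValueError(f"no replica and no producing job for {lfn!r}")
        job = producer[lfn]
        if job not in needed:
            needed.add(job)
            # The job's own inputs must now be available or produced.
            pending.extend(jobs[job]["inputs"])
    return needed


# Nothing derived exists yet: the whole chain runs.
full_plan = jobs_to_run(WORKFLOW, {"region.params"}, {"hazard.map"})
# The simulation output is already registered: only the last job runs.
reduced_plan = jobs_to_run(WORKFLOW, {"region.params", "wave.bin"}, {"hazard.map"})
```

Working backwards from the requested files means that jobs whose outputs are never needed, or are already available, drop out of the plan without any special casing.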
Other Success Stories
Laser Interferometer Gravitational Wave Observatory (LIGO) http://www.ligo.caltech.edu
Montage http://montage.ipac.caltech.edu
BLAST Genome Analysis and Database Update http://www-fp.mcs.anl.gov/pdq/pdq.htm
ATLAS Monte Carlo data production
Sloan Digital Sky Survey galaxy cluster finding http://www.sdss.org
Planning the SCEC Pathways: Pegasus at work on the Grid
People Involved:
ISI: Ewa Deelman, Sridhar Gullapalli, Carl Kesselman, John McGee, Gaurang Mehta, Gurmeet Singh, Mei-Hui Su, Karan Vahi
SCEC: Vipin Gupta, Phil Maechling
USC: Maureen Dougherty, Brian Mendenhall, Garrick Staples
SCEC COMPOSITION PROCESS
CAT (Compositional Analysis Tool), an ontology-based workflow composition tool, or PCT (Pathway Composition Tool) generates the application workflow template (using ontologies and data types).
The Grid-Based Input Data Selection component allows the user to select the input data necessary to populate the workflow template. The result is an abstract workflow that refers only to the logical application components and logical input data required for a pathway.
The DAX generator translates the abstract workflow into a corresponding XML description (DAX).
Pegasus takes in the DAX and generates the concrete workflow. The concrete workflow identifies the Grid resources on which the jobs will run and refers to the physical locations of the input data.
Condor DAGMan submits the workflow to the Grid and tracks its execution. Successful execution generates the final hazard map for the region.
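The step from abstract to concrete workflow can be illustrated with a small Python sketch. It is only a sketch under stated assumptions: plain dictionaries stand in for the replica and transformation catalogs, the site choice is passed in as a function, and every host name, path, and URL below is hypothetical. It is not the Pegasus planner.

```python
from typing import Callable, Dict, List

# Hypothetical abstract job: logical names only, no hosts or paths.
ABSTRACT_JOBS = [
    {"name": "pathway2", "transformation": "pathway2", "inputs": ["pathway2.in"]},
]
# Stand-in for a replica catalog: logical file name -> physical URL.
REPLICAS = {"pathway2.in": "gsiftp://host1.example.org/data/pathway2.in"}
# Stand-in for a transformation catalog: component -> {site: executable path}.
TRANSFORMATIONS = {
    "pathway2": {"host1": "/opt/scec/bin/pathway2", "host2": "/usr/local/bin/pathway2"},
}


def concretize(abstract_jobs: List[dict],
               replicas: Dict[str, str],
               transformations: Dict[str, Dict[str, str]],
               choose_site: Callable[[List[str]], str]) -> List[dict]:
    """Bind each abstract job to a site, an executable, and physical inputs."""
    concrete = []
    for job in abstract_jobs:
        installs = transformations[job["transformation"]]
        site = choose_site(sorted(installs))  # only sites that have the executable
        # Inputs without a replica are assumed to be produced by earlier jobs.
        stage_in = {lfn: replicas[lfn] for lfn in job["inputs"] if lfn in replicas}
        concrete.append({
            "job": job["name"],
            "site": site,
            "executable": installs[site],
            "stage_in": stage_in,
        })
    return concrete


# Simplest possible policy for the example: take the first candidate site.
plan = concretize(ABSTRACT_JOBS, REPLICAS, TRANSFORMATIONS, lambda sites: sites[0])
```

Passing the site choice in as a function mirrors the idea of pluggable site selectors: the binding logic stays the same while the selection policy varies.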
A View of SCEC Composition Process
PEGASUS ENGINE
[Diagram: components of the Pegasus engine and their pluggable implementations.]
CPlanner (gencdag)
Pegasus command line clients: rls-client, tc-client, genpoolconfig client
Data Transfer Mechanism: GridLab transfer, Transfer2, Multiple Transfer, globus-url-copy, Stork
Transformation Catalog Mechanism (TC): Database, File
Resource Information Catalog: MDS, File
Submit Writer: Condor, Stork Writer, GridLab GRMS
Site Selector: RoundRobin, Min-Min, Max-Min, Prophesy, Random, Grasp
Replica Query and Registration Mechanism: RLS
Replica Selection
Legend: Existing Interfaces, Research Implementations, Production Implementations, Interfaces in development
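Of the site selectors named in the engine diagram, Min-Min is a well-known scheduling heuristic: repeatedly pick the (job, site) pair with the earliest estimated completion time, assign it, and update that site's ready time. The Python sketch below shows the textbook heuristic for independent jobs. It is not taken from the Pegasus code base, and the job names, site names, and runtime estimates are hypothetical.

```python
from typing import Dict

# Hypothetical runtime estimates: RUNTIMES[job][site], in arbitrary time units.
RUNTIMES: Dict[str, Dict[str, float]] = {
    "a": {"s1": 2.0, "s2": 4.0},
    "b": {"s1": 3.0, "s2": 3.0},
    "c": {"s1": 10.0, "s2": 8.0},
}


def min_min(runtimes: Dict[str, Dict[str, float]]) -> Dict[str, str]:
    """Assign independent jobs to sites with the Min-Min heuristic."""
    # Time at which each site becomes free; all sites start idle.
    ready = {site: 0.0 for row in runtimes.values() for site in row}
    unscheduled = set(runtimes)
    schedule: Dict[str, str] = {}
    while unscheduled:
        best = None  # (completion time, job, site)
        # Sorting makes tie-breaking deterministic.
        for job in sorted(unscheduled):
            for site, runtime in sorted(runtimes[job].items()):
                finish = ready[site] + runtime
                if best is None or finish < best[0]:
                    best = (finish, job, site)
        finish, job, site = best
        schedule[job] = site
        ready[site] = finish  # the site is busy until this job completes
        unscheduled.remove(job)
    return schedule


assignment = min_min(RUNTIMES)
```

Max-Min differs only in which job is committed each round: it takes the job whose best completion time is largest, so long jobs are placed first.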
Testbed
[Diagram: testbed sites, grouped as SCEC Resources, Teragrid Resources, and Other Resources. Labels: RLS, Palm Springs, Caltech Teragrid, SCEC, NCSA-Teragrid, SDSC-Teragrid, USC, ANL-Teragrid, PSC-Teragrid.]
SCEC Workflow
The Preparation job prepares the input for the Pathway 2 simulation.
Pathway 2 is a Fortran-based wave propagation MPI code.
Pathway2PGV reads the binary output file generated by Pathway 2 and converts it into a hazard map that can be visualized.
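The three jobs above form a simple chain, which Condor DAGMan can run from a DAG input file built from `JOB` and `PARENT ... CHILD` statements. As a hedged illustration, the Python sketch below renders such a file. The node names and submit file names are hypothetical, and the file Pegasus actually generates would contain further nodes (for example, data transfer and registration jobs) not shown here.

```python
from typing import List, Tuple

# Hypothetical node names and Condor submit files for the three-job chain.
JOBS: List[Tuple[str, str]] = [
    ("preparation", "preparation.sub"),
    ("pathway2", "pathway2.sub"),
    ("pathway2pgv", "pathway2pgv.sub"),
]
# Each edge is (parent, child): the child starts only after the parent succeeds.
EDGES: List[Tuple[str, str]] = [
    ("preparation", "pathway2"),
    ("pathway2", "pathway2pgv"),
]


def to_dagman(jobs: List[Tuple[str, str]], edges: List[Tuple[str, str]]) -> str:
    """Render a DAGMan input file from job declarations and dependencies."""
    declared = {name for name, _ in jobs}
    for parent, child in edges:
        if parent not in declared or child not in declared:
            raise ValueError(f"edge {parent} -> {child} uses an undeclared job")
    lines = [f"JOB {name} {submit_file}" for name, submit_file in jobs]
    lines += [f"PARENT {parent} CHILD {child}" for parent, child in edges]
    return "\n".join(lines) + "\n"


dag_text = to_dagman(JOBS, EDGES)
```

Written to a file, this text could be handed to `condor_submit_dag`, assuming the referenced submit files exist.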