
Upload: erica-campbell

Post on 28-Dec-2015


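[Figure: SCEC composition workflow diagram. Components: Compositional Analysis Tool (CAT), Pathway Composition Tool, Grid-Based Data Selector, DAX Generator, Pegasus, Condor DAGMan, and the Grid (host1, host2, Data). Supporting services: CAT Knowledge Base, SCEC Datatype DB, Metadata Catalog Service, Replica Location Service. Intermediate artifacts: DAX, DAG, RSL. Final output: hazard map.]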

The Grid is a complex, distributed and heterogeneous execution environment. Running applications on it requires knowledge of many Grid services: users need to discover the available resources and schedule jobs onto them, essentially composing detailed application workflow descriptions by hand. This leaves users struggling with the complexity of the Grid and weighing which resources to use, where to run the computations, and where to access the data. There is therefore a need to automate the workflow generation and execution process as much as possible.

Pegasus: Planning for Execution in Grids http://pegasus.isi.edu

Maps from abstract to concrete workflow.

Isolates the user from many Grid details.

Automatically locates physical locations for both components (transformations) and data, via Globus RLS and the Transformation Catalog.

Finds appropriate resources to execute the components (via Globus MDS).

Interfaces with external site selectors.

Publishes newly derived data products.

Reuses existing data products where applicable.

Supports on-demand staging of binary executables.
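The mapping from abstract to concrete workflow described above can be illustrated with a minimal Python sketch. It is not Pegasus code: the `AbstractJob` class, the `plan` function, the dictionary-based catalogs, and all file names and URLs are hypothetical stand-ins for the Replica Location Service and the Transformation Catalog. The reuse rule is also simplified to "skip a job when every one of its outputs is already registered".

```python
from dataclasses import dataclass


@dataclass
class AbstractJob:
    """A job that names only logical components and logical files."""
    name: str
    transformation: str  # logical component name
    inputs: list         # logical file names consumed
    outputs: list        # logical file names produced


def plan(jobs, replica_catalog, transformation_catalog, site):
    """Map abstract jobs to concrete ones for a single, already chosen site.

    replica_catalog: logical file name -> list of physical URLs (hypothetical
        stand-in for a replica lookup).
    transformation_catalog: (transformation, site) -> executable path
        (hypothetical stand-in for a transformation lookup).
    """
    concrete = []
    for job in jobs:
        # Data reuse (simplified): skip jobs whose outputs already exist.
        if job.outputs and all(lfn in replica_catalog for lfn in job.outputs):
            continue
        executable = transformation_catalog[(job.transformation, site)]
        # Inputs missing from the catalog are assumed to come from upstream jobs.
        stage_in = [replica_catalog[lfn][0]
                    for lfn in job.inputs if lfn in replica_catalog]
        concrete.append({"job": job.name, "site": site,
                         "executable": executable, "stage_in": stage_in})
    return concrete


if __name__ == "__main__":
    jobs = [
        AbstractJob("prep", "preparation", ["fault_model.dat"], ["mesh.bin"]),
        AbstractJob("sim", "pathway2", ["mesh.bin"], ["wave.bin"]),
    ]
    replicas = {
        "fault_model.dat": ["gsiftp://host1.example.org/data/fault_model.dat"],
        "mesh.bin": ["gsiftp://host2.example.org/data/mesh.bin"],
    }
    executables = {
        ("preparation", "host1"): "/opt/scec/bin/preparation",
        ("pathway2", "host1"): "/opt/scec/bin/pathway2",
    }
    for entry in plan(jobs, replicas, executables, "host1"):
        print(entry)
```

In this example the preparation job is dropped because its output is already registered, so only the simulation job reaches the concrete workflow.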

Other Success Stories

Laser Interferometer Gravitational Wave Observatory (LIGO) http://www.ligo.caltech.edu

Montage http://montage.ipac.caltech.edu

BLAST Genome Analysis and Database Update http://www-fp.mcs.anl.gov/pdq/pdq.htm

ATLAS Monte Carlo data production

Sloan Digital Sky Survey galaxy cluster finding http://www.sdss.org

Planning the SCEC Pathways: Pegasus at work on the Grid

People Involved:

ISI: Ewa Deelman, Sridhar Gullapalli, Carl Kesselman, John McGee, Gaurang Mehta, Gurmeet Singh, Mei-Hui Su, Karan Vahi

SCEC: Vipin Gupta, Phil Maechling

USC: Maureen Dougherty, Brian Mendenhall, Garrick Staples

SCEC COMPOSITION PROCESS

CAT (Compositional Analysis Tool), an ontology-based workflow composition tool, or PCT (Pathway Composition Tool) generates the application workflow template (using ontologies and data types).

The Grid-Based Input Data selection component allows the user to select the input data necessary to populate the workflow template. The result is an abstract workflow that refers only to the logical application components and logical input data required for a pathway.

The DAX generator translates the abstract workflow into a corresponding XML description (DAX).

Pegasus takes in the DAX and generates the concrete workflow. The concrete workflow identifies the resources used to run on the Grid and refers to the physical locations of the input data.

Condor DAGMan submits the workflow to the Grid and tracks its execution. Successful execution generates the final hazard map for the region.
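As an illustration of the DAX generation step, the sketch below serializes a small abstract workflow to XML using Python's standard library. The element and attribute names (`adag`, `job`, `uses`, `child`, `parent`) are modeled loosely on the DAX format and should be read as assumptions, not as a schema-valid DAX. The job identifiers and file names are hypothetical.

```python
import xml.etree.ElementTree as ET


def build_dax(jobs, dependencies, name="scec-pathway"):
    """Serialize an abstract workflow to a simplified DAX-like XML string.

    jobs: job id -> {"transformation": str, "inputs": [...], "outputs": [...]}
    dependencies: child job id -> list of parent job ids
    """
    root = ET.Element("adag", name=name)
    for job_id, spec in jobs.items():
        job = ET.SubElement(root, "job", id=job_id, name=spec["transformation"])
        # Only logical file names appear here; no physical locations.
        for lfn in spec.get("inputs", []):
            ET.SubElement(job, "uses", file=lfn, link="input")
        for lfn in spec.get("outputs", []):
            ET.SubElement(job, "uses", file=lfn, link="output")
    for child, parents in dependencies.items():
        child_el = ET.SubElement(root, "child", ref=child)
        for parent in parents:
            ET.SubElement(child_el, "parent", ref=parent)
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    jobs = {
        "ID1": {"transformation": "preparation",
                "inputs": ["fault_model.dat"], "outputs": ["mesh.bin"]},
        "ID2": {"transformation": "pathway2",
                "inputs": ["mesh.bin"], "outputs": ["wave.bin"]},
    }
    print(build_dax(jobs, {"ID2": ["ID1"]}))
```

The point of the abstract form is visible in the output: it carries component names, logical files, and dependencies, and leaves sites, executable paths, and replica locations to the planner.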

A View of SCEC Composition Process

[Figure: Pegasus engine architecture (diagram labels only).

Pegasus command line clients: CPlanner (gencdag), Rls-client, Tc-client, Genpoolconfig client

Data Transfer Mechanism: Gridlab transfer, Transfer2, Multiple Transfer, Globus-url-copy, Stork

Transformation Catalog Mechanism (TC): Database, File

Resource Information Catalog: MDS, File

Submit Writer: Condor, Stork Writer, GridLab GRMS

Site Selector: Round Robin, Min-Min, Max-Min, Prophesy, Random, Grasp

Replica Query and Registration Mechanism: RLS

Replica Selection: RLS

Legend: Existing Interfaces, Research Implementations, Production Implementations, Interfaces in development]
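Two of the site selection strategies named in the diagram, Round Robin and Min-Min, can be sketched in a few lines of Python. This is a generic illustration of the two heuristics for independent jobs, not the Pegasus site selector interface. The function names, the runtime estimates, and the site names are hypothetical.

```python
from itertools import cycle


def round_robin(jobs, sites):
    """Assign jobs to sites in turn, ignoring runtime estimates."""
    site_cycle = cycle(sites)
    return {job: next(site_cycle) for job in jobs}


def min_min(runtimes):
    """Min-Min heuristic for independent jobs.

    runtimes[job][site] is the estimated runtime of job on site.
    Repeatedly pick the (job, site) pair with the earliest completion
    time, assign it, and advance that site's ready time.
    """
    ready = {site: 0.0 for times in runtimes.values() for site in times}
    unassigned = set(runtimes)
    schedule = {}
    while unassigned:
        finish, job, site = min(
            (ready[s] + t, j, s)
            for j in unassigned
            for s, t in runtimes[j].items()
        )
        schedule[job] = site
        ready[site] = finish
        unassigned.remove(job)
    return schedule


if __name__ == "__main__":
    estimates = {
        "a": {"site1": 2, "site2": 4},
        "b": {"site1": 3, "site2": 3},
        "c": {"site1": 10, "site2": 1},
    }
    print(round_robin(["a", "b", "c"], ["site1", "site2"]))
    print(min_min(estimates))
```

Round Robin needs no information about the sites, while Min-Min depends on runtime estimates being available for every job and site pair. Max-Min follows the same loop but picks the job whose best completion time is largest.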

[Figure: Testbed. Sites shown: Palm Springs, Caltech Teragrid, SCEC, NCSA-Teragrid, SDSC-Teragrid, USC, ANL-Teragrid, PSC-Teragrid. Legend: SCEC Resources, Teragrid Resources, Other Resources.]

SCEC Workflow

A preparation job prepares the input for the Pathway 2 simulation. Pathway 2 is a Fortran-based wave propagation MPI code. Pathway2PGV reads the binary output file generated by Pathway 2 and converts it into a hazard map that can be visualized.
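The three-job chain described above (preparation, Pathway 2, Pathway2PGV) can be written as a DAGMan input file. The Python sketch below generates one using DAGMan's `JOB` and `PARENT ... CHILD` statements. The job names and submit file names are hypothetical, and in practice Pegasus generates these files itself.

```python
def dagman_description(jobs, edges):
    """Build the text of a DAGMan input file.

    jobs: list of (job name, submit file) pairs
    edges: list of (parent job, child job) pairs
    """
    lines = [f"JOB {name} {submit_file}" for name, submit_file in jobs]
    lines += [f"PARENT {parent} CHILD {child}" for parent, child in edges]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical job and submit file names for the SCEC workflow.
    jobs = [
        ("preparation", "preparation.sub"),
        ("pathway2", "pathway2.sub"),
        ("pathway2pgv", "pathway2pgv.sub"),
    ]
    edges = [("preparation", "pathway2"), ("pathway2", "pathway2pgv")]
    print(dagman_description(jobs, edges), end="")
```

DAGMan releases a job only after all of its parents have completed, so the MPI simulation starts once its input is prepared and the conversion to a hazard map runs last.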