TRANSCRIPT
GriPhyN & iVDGL Architectural Issues
GGF5 BOF
Data Intensive Applications
Common Architectural Issues and Drivers
Edinburgh, 23 July 2002
Mike Wilde
Argonne National Laboratory
Grid Physics Network / International Virtual Data Grid Laboratory
Project Summary
Principal requirements
– IT research: virtual data and transparent execution
– Grid building: deploy an international grid laboratory at scale
Components developed/used
– Virtual Data Toolkit; Linux deployment platform
– Virtual Data Catalog, request planner and executor, DAGMan, NeST
Scale of current testbeds
– ATLAS Test Grid – 8 sites
– CMS Test Grid – 5 sites
– Compute nodes: ~900 @ UW, UofC, UWM, UTB, ANL
– >50 researchers and grid-builders working on IT research challenge problems and demos
Future directions (2002 & 2003)
– Extensive work on virtual data, planning, catalog architecture, and fault tolerance
Chimera Overview
Concept: Tools to support management of transformations and derivations as community resources
Technology: Chimera virtual data system including virtual data catalog and virtual data language; use of GriPhyN virtual data toolkit for automated data derivation
Results: Successful early applications to CMS and SDSS data generation/analysis
Future: Public release of prototype, new apps, knowledge representation, planning
“Chimera” Virtual Data Model
Transformation designers create programmatic abstractions
– Simple or compound; augment with metadata
Production managers create bulk derivations
– Can materialize data products or leave them virtual
Users track their work through derivations
– Augment (replace?) the scientist’s log book
Definitions can be augmented with metadata
– The key to intelligent data retrieval
– Issues relating to metadata propagation
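As a rough illustration of the model (a sketch only; the names and structures below are invented and are not Chimera's actual interface), a virtual data catalog can either hand back an existing replica of a data product or hand back the derivation recipe needed to (re)create it:

# Hypothetical sketch of virtual-data resolution; not the Chimera API.
derivations = {
    # logical file -> (transformation, inputs) recorded when the derivation was defined
    "file2": ("tr1", ["file1"]),
    "file3": ("tr2", ["file2"]),
}
replicas = {"file1": "gsiftp://host.example.org/data/file1"}  # materialized products only

def resolve(lfn):
    """Return a physical replica if the product is materialized,
    otherwise the derivation chain that can (re)produce it."""
    if lfn in replicas:
        return ("replica", replicas[lfn])
    transformation, inputs = derivations[lfn]
    return ("derive", transformation, [resolve(i) for i in inputs])

print(resolve("file3"))  # plan: derive file3 via tr2 from file2, itself derived via tr1 from file1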
CMS Pipeline in VDL-0
[Figure: CMS production pipeline – pythia_input → pythia.exe → cmsim_input → cmsim.exe → writeHits → writeDigis]

begin v /usr/local/demo/scripts/cmkin_input.csh
  file i ntpl_file_path
  file i template_file
  file i num_events
  stdout cmkin_param_file
end

begin v /usr/local/demo/binaries/kine_make_ntpl_pyt_cms121.exe
  pre cms_env_var
  stdin cmkin_param_file
  stdout cmkin_log
  file o ntpl_file
end

begin v /usr/local/demo/scripts/cmsim_input.csh
  file i ntpl_file
  file i fz_file_path
  file i hbook_file_path
  file i num_trigs
  stdout cmsim_param_file
end

begin v /usr/local/demo/binaries/cms121.exe
  condor copy_to_spool=false
  condor getenv=true
  stdin cmsim_param_file
  stdout cmsim_log
  file o fz_file
  file o hbook_file
end

begin v /usr/local/demo/binaries/writeHits.sh
  condor getenv=true
  pre orca_hits
  file i fz_file
  file i detinput
  file i condor_writeHits_log
  file i oo_fd_boot
  file i datasetname
  stdout writeHits_log
  file o hits_db
end

begin v /usr/local/demo/binaries/writeDigis.sh
  pre orca_digis
  file i hits_db
  file i oo_fd_boot
  file i carf_input_dataset_name
  file i carf_output_dataset_name
  file i carf_input_owner
  file i carf_output_owner
  file i condor_writeDigis_log
  stdout writeDigis_log
  file o digis_db
end
Data Dependencies – VDL-1
TR tr1( out a2, in a1 ) {
  profile hints.exec-pfn = "/usr/bin/app1";
  argument stdin = ${a1};
  argument stdout = ${a2};
}

TR tr2( out a2, in a1 ) {
  profile hints.exec-pfn = "/usr/bin/app2";
  argument stdin = ${a1};
  argument stdout = ${a2};
}

DV x1->tr1( a2=@{out:file2}, a1=@{in:file1} );
DV x2->tr2( a2=@{out:file3}, a1=@{in:file2} );
[Figure: derived data dependency graph – file1 → x1 → file2 → x2 → file3]
Executor Example: Condor DAGMan
Directed Acyclic Graph Manager
Specify the dependencies between Condor jobs using a DAG data structure
Manage dependencies automatically
– e.g., “Don’t run job B until job A has completed successfully” (sketched in Python after this slide)
Each job is a “node” in the DAG
Any number of parent or child nodes
No loops
[Figure: diamond DAG – Job A is the parent of Job B and Job C; Job D depends on both B and C]
Slide courtesy Miron Livny, U. Wisconsin
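The dependency rule above can be illustrated with a minimal Python sketch (illustration only, not DAGMan itself): each node runs only after all of its parents have completed successfully.

# Minimal sketch of DAG-ordered execution for the diamond DAG on the slide.
parents = {
    "A": [],
    "B": ["A"],
    "C": ["A"],
    "D": ["B", "C"],
}

def run(job):
    print("running job", job)   # stand-in for submitting the Condor job for this node

done = set()
while len(done) < len(parents):
    for job, deps in parents.items():
        # a node becomes eligible only once every parent has finished successfully
        if job not in done and all(p in done for p in deps):
            run(job)
            done.add(job)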
Joint work with Jim Annis, Steve Kent, FNAL
Size distribution of galaxy clusters?
[Plot: galaxy cluster size distribution – number of clusters (1 to 100,000) vs. number of galaxies (1 to 100), log-log scale]
Chimera Virtual Data System + GriPhyN Virtual Data Toolkit + iVDGL Data Grid (many CPUs)
Chimera Application: Sloan Digital Sky Survey Analysis
[Figure: cluster-finding data pipeline – tsObj inputs feed field jobs (stage 1), which feed brg jobs (stage 2), then core (stage 3), cluster (stage 4), and finally the catalog (stage 5), with fan-in at each stage]
Cluster-finding Data Pipeline
Small SDSS Cluster-Finding DAG
And Even Bigger: 744 Files, 387 Nodes
Vision: Distributed Virtual Data Service
[Figure: applications at local sites, regional centers, and Tier 1 centers each hold a Virtual Data Catalog (VDC); together the catalogs federate into a distributed virtual data service]
Knowledge Management - Strawman Architecture
Knowledge-based requests are formulated in terms of science data
– E.g., “Give me a specific transform of channels c, p, & t over time range t0–t1”
Finder finds the data files
– Translates the range “t0–t1” into a set of files
Coder creates an execution plan and defines derivations from known transformations
– Can deal with missing files (e.g., file c in the LIGO example)
The knowledge request is answered in terms of datasets; the Coder translates datasets into logical files (or objects, queries, tables, …); the Planner translates logical entities into physical entities (sketched below)
[Figure: a knowledge-based request flows to the Finder (backed by the Metadata Catalog), then to the Coder (backed by the Virtual Data Catalog), then to the Planner, which emits an abstract DAG]
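A rough Python sketch of this strawman flow may make the division of labor clearer (every interface below is hypothetical; none of these functions exist in Chimera or the VDT): the Finder resolves a science-level request to logical files, the Coder emits derivations for anything missing, and the Planner binds the result into an abstract DAG.

# Hypothetical sketch of the strawman knowledge-management flow; illustration only.
def finder(channels, t0, t1, metadata_catalog):
    """Translate a science-level time range into the logical files covering it."""
    return [f for f in metadata_catalog
            if f["channel"] in channels and t0 <= f["time"] <= t1]

def coder(logical_files, virtual_data_catalog):
    """For files that are not materialized, emit derivations from known transformations."""
    derivations = [virtual_data_catalog[f["name"]]
                   for f in logical_files if not f["materialized"]]
    return logical_files, derivations

def planner(logical_files, derivations):
    """Turn logical files plus derivations into an abstract DAG for the executor."""
    return {"inputs": [f["name"] for f in logical_files],
            "jobs": derivations}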
GriPhyN/PPDG Data Grid Architecture
[Figure: layered architecture – an Application submits an abstract DAG to the Planner, which produces a concrete DAG for the Executor (DAGMan, Kangaroo); supporting services include Catalog Services (MCAT, GriPhyN catalogs), Info Services (MDS), Policy/Security (GSI, CAS), Monitoring (MDS), Replica Management (GDMP), and a Reliable Transfer Service; Compute Resources (GRAM) and Storage Resources (GridFTP, GRAM, SRM) are provided via Globus]
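One way to picture the abstract-to-concrete step in this architecture is the following Python sketch (the catalog contents, site names, and field names are made up): the planner binds each logical file to a physical replica and each job to an execution site before handing the concrete DAG to the executor.

# Illustrative only: binding an abstract DAG to physical resources.
replica_catalog = {"file1": "gsiftp://site-a.example.org/data/file1"}  # hypothetical
sites = ["site-a.example.org", "site-b.example.org"]                   # hypothetical

abstract_dag = [
    {"job": "x1", "transformation": "tr1", "inputs": ["file1"], "outputs": ["file2"]},
    {"job": "x2", "transformation": "tr2", "inputs": ["file2"], "outputs": ["file3"]},
]

def make_concrete(abstract_dag):
    concrete = []
    for node in abstract_dag:
        concrete.append({
            "job": node["job"],
            "site": sites[0],  # a real planner would consult policy and grid status here
            "stage_in": [replica_catalog.get(f, f) for f in node["inputs"]],
            "outputs": node["outputs"],
        })
    return concrete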
Common Problem #1: (evolving) View of Data Grid Stack
[Figure: evolving data grid stack at each site – Data Transport (GridFTP), Reliable File Transfer, Local Replica Catalog (flat or hierarchical), Replica Location Service, Publish-Subscribe Service (GDMP), Storage Element and Storage Element Manager, Reliable Replication (RRM)]
GriPhyN Data Grid Architectural Context
Preliminary view of a discussion in progress. 13 May 2002, notes by M. Wilde, based on discussions with M. Livny, I. Foster, W. Allcock, A. Shoshani, and many others. NOTE: most interconnections are either approximate or not yet shown here.
[Figure: architectural context – on the submission side, an Application drives an Abstract Planner, a Concrete Planner, DAGMan, DAPman, Condor-G, and Personal Condor; data and policy services include GridFTP and other transfer protocols, an RFT Service, the Replica Location Service, a Virtual Data Service, and a CAS/Policy Repository; each Site is reached through a Gatekeeper and GRAM Client, with Condor and PBS job managers feeding local queues, NeST, storage managers (SRM, DRM, HRM, SRB), file systems (NFS, AFS, local FS), and tertiary store]
Architectural Complexities
Common Problem #2: Request Planning
Map of grid resources
Incoming work to plan
– Queue? With lookahead?
Status of grid resources
– State (up/down)
– Load (current, queued, and anticipated)
– Reservations
Policy
– Allocation (commitment of a resource to a VO or group based on policy)
Ability to change decisions dynamically
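As a rough illustration of what the planner would have to track (a sketch; every field name below is invented), its working state might look like this:

# Hypothetical snapshot of the inputs a request planner would consult.
planner_state = {
    "resource_map": {"site-a": {"cpus": 400}, "site-b": {"cpus": 500}},
    "status": {"site-a": {"up": True, "load": 0.7, "queued": 120},
               "site-b": {"up": False, "load": 0.0, "queued": 0}},
    "reservations": [{"site": "site-a", "vo": "CMS", "until": "2002-08-01"}],
    "policy": {"site-a": {"CMS": 0.8, "ATLAS": 0.2}},   # allocation of the site by VO
    "work_queue": ["request-17", "request-18"],          # incoming work, with lookahead
}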
Policy
Focus is on resource allocation (not on security)
Allocation examples:
– “CMS should get 80% of the resources at Caltech” (averaged monthly)
– “The Higgs group has high priority at BNL until 8/1”
Need to apply fair-share scheduling to the grid
Need to understand the allocation models dictated by funders and data centers
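The two allocation examples could be encoded roughly as follows (a sketch only; the policy language and its evaluation were open questions at this point):

# Hypothetical encoding of the allocation examples above.
policies = [
    {"vo": "CMS", "site": "Caltech", "share": 0.80, "window": "monthly"},
    {"group": "Higgs", "site": "BNL", "priority": "high", "expires": "2002-08-01"},
]

def over_fair_share(vo, site, usage, capacity):
    """True if a VO has consumed more than its policy share of a site in this window."""
    for p in policies:
        if p.get("vo") == vo and p["site"] == site:
            return usage / capacity > p["share"]
    return False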
Grids as overlays on shared resources
[Figure: virtual organizations as overlays on shared resources – sites S1A, S2A, …, SnA and S1C each contain compute nodes (C), a shared Storage Element (SE), and a Local Replica Catalog (LRC); Site 1 also runs SAS-1; VO-A has its own CAS-A, Planner, Virtual Data Service (vdb), and Replica Location Service (RIS); VO-C likewise has CAS-C]
Grid Scheduling Problem
Given an abstract DAG representing logical work:
– Where should each compute job be executed?
  > What does site and VO policy say?
  > What does grid “weather” dictate?
– Where is the required data now?
– Where should data results be sent?
Stop and re-schedule computations?
Suspend or de-prioritize work in progress to let higher-priority work go through?
Degree of policy control?
Is a “grid” an entity? An “aggregator” of resources?
How is data placement coordinated with planning?
Use of an Execution Profiler in the planner architecture:
– Characterize resource needs of an app over time
– Parameterize resource requirements of an app by its parameters
What happens when things go wrong?
Policy and the Planner
Planner considers:
– Policy (fairly static, from CAS/SAS)
– Grid status
– Job (user/group) resource consumption history
– Job profiles (resources over time) from Prophesy
[Figure: the planner draws on policy, grid status, accounting records and job usage info, and job profile records; Prophesy (predictor) turns job profiling data into profile records]
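A toy scoring function (purely illustrative; the inputs, weights, and field names are assumptions, not a proposed algorithm) shows how these four inputs might be combined when ranking candidate sites for a job:

# Illustrative only: rank a candidate site using policy, status, usage history,
# and a job profile such as Prophesy might predict.
def score_site(site, job, policy, status, usage_history, job_profile):
    if not status[site]["up"]:
        return float("-inf")                               # skip sites that are down
    share = policy.get(site, {}).get(job["vo"], 0.0)        # static allocation from CAS/SAS
    used = usage_history.get((job["vo"], site), 0.0)         # fraction of the site already consumed
    headroom = max(share - used, 0.0)
    fits = job_profile["cpu_hours"] <= status[site]["free_cpu_hours"]
    return headroom * (1.0 if fits else 0.1) - status[site]["load"]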
Open Issues – Planner (1)
Does the planner have a queue? If so, how does a planner manage its queue?
How many planners are there? Is it a service?
How is responsibility partitioned between the planner and the executor (cluster scheduler)?
How many other entities need to be coordinated?
– RFT, DAPman, SRM, NeST, …?
– How to wait on reliable file transfers?
How does the planner estimate times if it only has partial responsibility for when/where things run?
How is data placement planning coordinated with request planning?
Open Issues – Planner (2)
Clearly need incremental planning (e.g., for analysis)
Stop and re-schedule computations?
Suspend or de-prioritize work in progress to let higher-priority work go through?
Degree of policy control?
Is the “grid” an entity?
Use of an Execution Profiler in the planner architecture:
– Characterize the resource requirements of an app over time
– Parameterize the resource requirements of an app w.r.t. its (salient) parameters
What happens when things go wrong?
Issue Summary
Consolidate the data grid stack
– Reliable file transfer
– Reliable replication
– Replica catalog and virtual data catalog scaled for global use
Define interfaces and locations of planners
Unify job workflow representation around DAGs
Define how to state and manage policy
Strategies for fault tolerance – similar to re-planning for weather and policy changes?
Evolution of services to OGSA