TRANSCRIPT
GriPhyN & iVDGL Architectural Issues
GGF5 BOF
Data Intensive Applications
Common Architectural Issues and Drivers
Edinburgh, 23 July 2002
Mike Wilde
Argonne National Laboratory
Grid Physics Network / International Virtual Data Grid Laboratory
Project Summary
Principal requirements
– IT research: virtual data and transparent execution
– Grid building: deploy an international grid laboratory at scale
Components developed/used
– Virtual Data Toolkit; Linux deployment platform
– Virtual Data Catalog, request planner and executor, DAGMan, NeST
Scale of current testbeds
– ATLAS Test Grid – 8 sites
– CMS Test Grid – 5 sites
– Compute nodes: ~900 @ UW, UofC, UWM, UTB, ANL
– >50 researchers and grid-builders working on IT research challenge problems and demos
Future directions (2002 & 2003)
– Extensive work on virtual data, planning, catalog architecture, and fault tolerance
Chimera Overview
Concept: Tools to support management of transformations and derivations as community resources
Technology: Chimera virtual data system including virtual data catalog and virtual data language; use of GriPhyN virtual data toolkit for automated data derivation
Results: Successful early applications to CMS and SDSS data generation/analysis
Future: Public release of prototype, new apps, knowledge representation, planning
“Chimera” Virtual Data Model
Transformation designers create programmatic abstractions
– Simple or compound; augment with metadata
Production managers create bulk derivations
– Can materialize data products or leave them virtual
Users track their work through derivations
– Augment (replace?) the scientist’s log book
Definitions can be augmented with metadata
– The key to intelligent data retrieval
– Issues relating to metadata propagation
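As a rough illustration of the model (a sketch only; the names and structures below are invented and are not Chimera's actual interface), a virtual data catalog can either hand back an existing replica of a data product or hand back the derivation recipe needed to (re)create it:

# Hypothetical sketch of virtual-data resolution; not the Chimera API.
derivations = {
    # logical file -> (transformation, inputs) recorded when the derivation was defined
    "file2": ("tr1", ["file1"]),
    "file3": ("tr2", ["file2"]),
}
replicas = {"file1": "gsiftp://host.example.org/data/file1"}  # materialized products only

def resolve(lfn):
    """Return a physical replica if the product is materialized,
    otherwise the derivation chain that can (re)produce it."""
    if lfn in replicas:
        return ("replica", replicas[lfn])
    transformation, inputs = derivations[lfn]
    return ("derive", transformation, [resolve(i) for i in inputs])

print(resolve("file3"))  # plan: derive file3 via tr2 from file2, itself derived via tr1 from file1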
CMS Pipeline in VDL-0
[Figure: CMS production pipeline – pythia_input → pythia.exe → cmsim_input → cmsim.exe → writeHits → writeDigis]

begin v /usr/local/demo/scripts/cmkin_input.csh
  file i ntpl_file_path
  file i template_file
  file i num_events
  stdout cmkin_param_file
end

begin v /usr/local/demo/binaries/kine_make_ntpl_pyt_cms121.exe
  pre cms_env_var
  stdin cmkin_param_file
  stdout cmkin_log
  file o ntpl_file
end

begin v /usr/local/demo/scripts/cmsim_input.csh
  file i ntpl_file
  file i fz_file_path
  file i hbook_file_path
  file i num_trigs
  stdout cmsim_param_file
end

begin v /usr/local/demo/binaries/cms121.exe
  condor copy_to_spool=false
  condor getenv=true
  stdin cmsim_param_file
  stdout cmsim_log
  file o fz_file
  file o hbook_file
end

begin v /usr/local/demo/binaries/writeHits.sh
  condor getenv=true
  pre orca_hits
  file i fz_file
  file i detinput
  file i condor_writeHits_log
  file i oo_fd_boot
  file i datasetname
  stdout writeHits_log
  file o hits_db
end

begin v /usr/local/demo/binaries/writeDigis.sh
  pre orca_digis
  file i hits_db
  file i oo_fd_boot
  file i carf_input_dataset_name
  file i carf_output_dataset_name
  file i carf_input_owner
  file i carf_output_owner
  file i condor_writeDigis_log
  stdout writeDigis_log
  file o digis_db
end
Data Dependencies – VDL-1
TR tr1( out a2, in a1 ) {
  profile hints.exec-pfn = "/usr/bin/app1";
  argument stdin = ${a1};
  argument stdout = ${a2};
}

TR tr2( out a2, in a1 ) {
  profile hints.exec-pfn = "/usr/bin/app2";
  argument stdin = ${a1};
  argument stdout = ${a2};
}

DV x1->tr1( a2=@{out:file2}, a1=@{in:file1} );
DV x2->tr2( a2=@{out:file3}, a1=@{in:file2} );
[Figure: derived data dependency graph – file1 → x1 → file2 → x2 → file3]
Executor Example: Condor DAGMan
Directed Acyclic Graph Manager
Specify the dependencies between Condor jobs using a DAG data structure
Manage dependencies automatically
– e.g., “Don’t run job B until job A has completed successfully” (sketched in Python after this slide)
Each job is a “node” in the DAG
Any number of parent or child nodes
No loops
[Figure: diamond DAG – Job A is the parent of Job B and Job C; Job D depends on both B and C]
Slide courtesy Miron Livny, U. Wisconsin
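The dependency rule above can be illustrated with a minimal Python sketch (illustration only, not DAGMan itself): each node runs only after all of its parents have completed successfully.

# Minimal sketch of DAG-ordered execution for the diamond DAG on the slide.
parents = {
    "A": [],
    "B": ["A"],
    "C": ["A"],
    "D": ["B", "C"],
}

def run(job):
    print("running job", job)   # stand-in for submitting the Condor job for this node

done = set()
while len(done) < len(parents):
    for job, deps in parents.items():
        # a node becomes eligible only once every parent has finished successfully
        if job not in done and all(p in done for p in deps):
            run(job)
            done.add(job)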
Joint work with Jim Annis, Steve Kent, FNAL
Size distribution of galaxy clusters?
[Plot: galaxy cluster size distribution – number of clusters (1 to 100,000) vs. number of galaxies (1 to 100), log-log scale]
Chimera Virtual Data System + GriPhyN Virtual Data Toolkit + iVDGL Data Grid (many CPUs)
Chimera Application: Sloan Digital Sky Survey Analysis
[Figure: cluster-finding data pipeline – tsObj inputs feed field jobs (stage 1), which feed brg jobs (stage 2), then core (stage 3), cluster (stage 4), and finally the catalog (stage 5), with fan-in at each stage]
Cluster-finding Data Pipeline
Small SDSS Cluster-Finding DAG
And Even Bigger: 744 Files, 387 Nodes
Vision: Distributed Virtual Data Service
[Figure: applications at local sites, regional centers, and Tier 1 centers each hold a Virtual Data Catalog (VDC); together the catalogs federate into a distributed virtual data service]
Knowledge Management - Strawman Architecture
Knowledge-based requests are formulated in terms of science data
– E.g., “Give me a specific transform of channels c, p, & t over time range t0–t1”
Finder finds the data files
– Translates the range “t0–t1” into a set of files
Coder creates an execution plan and defines derivations from known transformations
– Can deal with missing files (e.g., file c in the LIGO example)
The knowledge request is answered in terms of datasets; the Coder translates datasets into logical files (or objects, queries, tables, …); the Planner translates logical entities into physical entities (sketched below)
[Figure: a knowledge-based request flows to the Finder (backed by the Metadata Catalog), then to the Coder (backed by the Virtual Data Catalog), then to the Planner, which emits an abstract DAG]
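A rough Python sketch of this strawman flow may make the division of labor clearer (every interface below is hypothetical; none of these functions exist in Chimera or the VDT): the Finder resolves a science-level request to logical files, the Coder emits derivations for anything missing, and the Planner binds the result into an abstract DAG.

# Hypothetical sketch of the strawman knowledge-management flow; illustration only.
def finder(channels, t0, t1, metadata_catalog):
    """Translate a science-level time range into the logical files covering it."""
    return [f for f in metadata_catalog
            if f["channel"] in channels and t0 <= f["time"] <= t1]

def coder(logical_files, virtual_data_catalog):
    """For files that are not materialized, emit derivations from known transformations."""
    derivations = [virtual_data_catalog[f["name"]]
                   for f in logical_files if not f["materialized"]]
    return logical_files, derivations

def planner(logical_files, derivations):
    """Turn logical files plus derivations into an abstract DAG for the executor."""
    return {"inputs": [f["name"] for f in logical_files],
            "jobs": derivations}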
GriPhyN/PPDG Data Grid Architecture
[Figure: layered architecture – an Application submits an abstract DAG to the Planner, which produces a concrete DAG for the Executor (DAGMan, Kangaroo); supporting services include Catalog Services (MCAT, GriPhyN catalogs), Info Services (MDS), Policy/Security (GSI, CAS), Monitoring (MDS), Replica Management (GDMP), and a Reliable Transfer Service; Compute Resources (GRAM) and Storage Resources (GridFTP, GRAM, SRM) are provided via Globus]
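One way to picture the abstract-to-concrete step in this architecture is the following Python sketch (the catalog contents, site names, and field names are made up): the planner binds each logical file to a physical replica and each job to an execution site before handing the concrete DAG to the executor.

# Illustrative only: binding an abstract DAG to physical resources.
replica_catalog = {"file1": "gsiftp://site-a.example.org/data/file1"}  # hypothetical
sites = ["site-a.example.org", "site-b.example.org"]                   # hypothetical

abstract_dag = [
    {"job": "x1", "transformation": "tr1", "inputs": ["file1"], "outputs": ["file2"]},
    {"job": "x2", "transformation": "tr2", "inputs": ["file2"], "outputs": ["file3"]},
]

def make_concrete(abstract_dag):
    concrete = []
    for node in abstract_dag:
        concrete.append({
            "job": node["job"],
            "site": sites[0],  # a real planner would consult policy and grid status here
            "stage_in": [replica_catalog.get(f, f) for f in node["inputs"]],
            "outputs": node["outputs"],
        })
    return concrete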
Common Problem #1: (evolving) View of Data Grid Stack
[Figure: evolving data grid stack at each site – Data Transport (GridFTP), Reliable File Transfer, Local Replica Catalog (flat or hierarchical), Replica Location Service, Publish-Subscribe Service (GDMP), Storage Element and Storage Element Manager, Reliable Replication (RRM)]
GriPhyN Data Grid Architectural Context
Preliminary view of a discussion in progress. 13 May 2002, notes by M. Wilde, based on discussions with M. Livny, I. Foster, W. Allcock, A. Shoshani, and many others. NOTE: most interconnections are either approximate or not yet shown here.
[Figure: architectural context – on the submission side, an Application drives an Abstract Planner, a Concrete Planner, DAGMan, DAPman, Condor-G, and Personal Condor; data and policy services include GridFTP and other transfer protocols, an RFT Service, the Replica Location Service, a Virtual Data Service, and a CAS/Policy Repository; each Site is reached through a Gatekeeper and GRAM Client, with Condor and PBS job managers feeding local queues, NeST, storage managers (SRM, DRM, HRM, SRB), file systems (NFS, AFS, local FS), and tertiary store]
Architectural Complexities
Common Problem #2: Request Planning
Map of grid resources
Incoming work to plan
– Queue? With lookahead?
Status of grid resources
– State (up/down)
– Load (current, queued, and anticipated)
– Reservations
Policy
– Allocation (commitment of a resource to a VO or group based on policy)
Ability to change decisions dynamically
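As a rough illustration of what the planner would have to track (a sketch; every field name below is invented), its working state might look like this:

# Hypothetical snapshot of the inputs a request planner would consult.
planner_state = {
    "resource_map": {"site-a": {"cpus": 400}, "site-b": {"cpus": 500}},
    "status": {"site-a": {"up": True, "load": 0.7, "queued": 120},
               "site-b": {"up": False, "load": 0.0, "queued": 0}},
    "reservations": [{"site": "site-a", "vo": "CMS", "until": "2002-08-01"}],
    "policy": {"site-a": {"CMS": 0.8, "ATLAS": 0.2}},   # allocation of the site by VO
    "work_queue": ["request-17", "request-18"],          # incoming work, with lookahead
}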
Policy
Focus is on resource allocation (not on security)
Allocation examples:
– “CMS should get 80% of the resources at Caltech” (averaged monthly)
– “The Higgs group has high priority at BNL until 8/1”
Need to apply fair-share scheduling to the grid
Need to understand the allocation models dictated by funders and data centers
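The two allocation examples could be encoded roughly as follows (a sketch only; the policy language and its evaluation were open questions at this point):

# Hypothetical encoding of the allocation examples above.
policies = [
    {"vo": "CMS", "site": "Caltech", "share": 0.80, "window": "monthly"},
    {"group": "Higgs", "site": "BNL", "priority": "high", "expires": "2002-08-01"},
]

def over_fair_share(vo, site, usage, capacity):
    """True if a VO has consumed more than its policy share of a site in this window."""
    for p in policies:
        if p.get("vo") == vo and p["site"] == site:
            return usage / capacity > p["share"]
    return False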
Grids as overlays on shared resources
[Figure: virtual organizations as overlays on shared resources – sites S1A, S2A, …, SnA and S1C each contain compute nodes (C), a shared Storage Element (SE), and a Local Replica Catalog (LRC); Site 1 also runs SAS-1; VO-A has its own CAS-A, Planner, Virtual Data Service (vdb), and Replica Location Service (RIS); VO-C likewise has CAS-C]
Grid Scheduling Problem
Given an abstract DAG representing logical work:
– Where should each compute job be executed?
  > What does site and VO policy say?
  > What does grid “weather” dictate?
– Where is the required data now?
– Where should data results be sent?
Stop and re-schedule computations?
Suspend or de-prioritize work in progress to let higher-priority work go through?
Degree of policy control?
Is a “grid” an entity? An “aggregator” of resources?
How is data placement coordinated with planning?
Use of an Execution Profiler in the planner architecture:
– Characterize resource needs of an app over time
– Parameterize resource requirements of an app by its parameters
What happens when things go wrong?
Policy and the Planner
Planner considers:
– Policy (fairly static, from CAS/SAS)
– Grid status
– Job (user/group) resource consumption history
– Job profiles (resources over time) from Prophesy
[Figure: the planner draws on policy, grid status, accounting records and job usage info, and job profile records; Prophesy (predictor) turns job profiling data into profile records]
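A toy scoring function (purely illustrative; the inputs, weights, and field names are assumptions, not a proposed algorithm) shows how these four inputs might be combined when ranking candidate sites for a job:

# Illustrative only: rank a candidate site using policy, status, usage history,
# and a job profile such as Prophesy might predict.
def score_site(site, job, policy, status, usage_history, job_profile):
    if not status[site]["up"]:
        return float("-inf")                               # skip sites that are down
    share = policy.get(site, {}).get(job["vo"], 0.0)        # static allocation from CAS/SAS
    used = usage_history.get((job["vo"], site), 0.0)         # fraction of the site already consumed
    headroom = max(share - used, 0.0)
    fits = job_profile["cpu_hours"] <= status[site]["free_cpu_hours"]
    return headroom * (1.0 if fits else 0.1) - status[site]["load"]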
Open Issues – Planner (1)
Does the planner have a queue? If so, how does a planner manage its queue?
How many planners are there? Is it a service?
How is responsibility partitioned between the planner and the executor (cluster scheduler)?
How many other entities need to be coordinated?
– RFT, DAPman, SRM, NeST, …?
– How to wait on reliable file transfers?
How does the planner estimate times if it only has partial responsibility for when/where things run?
How is data placement planning coordinated with request planning?
Open Issues – Planner (2)
Clearly need incremental planning (e.g., for analysis)
Stop and re-schedule computations?
Suspend or de-prioritize work in progress to let higher-priority work go through?
Degree of policy control?
Is the “grid” an entity?
Use of an Execution Profiler in the planner architecture:
– Characterize the resource requirements of an app over time
– Parameterize the resource requirements of an app w.r.t. its (salient) parameters
What happens when things go wrong?
Issue Summary
Consolidate the data grid stack
– Reliable file transfer
– Reliable replication
– Replica catalog and virtual data catalog scaled for global use
Define interfaces and locations of planners
Unify job workflow representation around DAGs
Define how to state and manage policy
Strategies for fault tolerance – similar to re-planning for weather and policy changes?
Evolution of services to OGSA