TRANSCRIPT
Grid Technologies: Enabling Collaborative Science
Ian Foster
Mathematics and Computer Science Division
Argonne National Laboratory
and
Department of Computer Science
The University of Chicago
http://www.mcs.anl.gov/~foster
Plenary Talk, I2 Virtual Conference, October 2, 2001
[email protected] ARGONNE CHICAGO
Issues I Will Address
Problem statement: What are Grids?
Architecture and technologies
Projects and futures
The Grid Problem
Resource sharing & coordinated problem solving in dynamic, multi-institutional virtual organizations
Elements of the Problem
Resource sharing
– Computers, storage, sensors, networks, …
– Sharing always conditional: issues of trust, policy, negotiation, payment, …
Coordinated problem solving
– Beyond client-server: distributed data analysis, computation, collaboration, …
Dynamic, multi-institutional virtual orgs
– Community overlays on classic org structures
– Large or small, static or dynamic
Grid Communities & Applications: Data Grids for High Energy Physics
[Figure: the tiered LHC data-distribution model. The online system feeds the CERN Computer Centre (Tier 0) and an offline processor farm (~20 TIPS) at ~PBytes/sec; Tier 1 regional centres (FermiLab ~4 TIPS; France, Italy, Germany) connect at ~622 Mbits/sec (or air freight); Tier 2 centres (~1 TIPS each, e.g. Caltech) connect onward at ~622 Mbits/sec; institutes (~0.25 TIPS) and physicist workstations (Tier 4) connect at ~100 MBytes/sec down to ~1 MBytes/sec, with a physics data cache at each institute.]
There is a “bunch crossing” every 25 nsecs.
There are 100 “triggers” per second.
Each triggered event is ~1 MByte in size.
Physicists work on analysis “channels”. Each institute will have ~10 physicists working on one or more channels; data for these channels should be cached by the institute server.
1 TIPS is approximately 25,000 SpecInt95 equivalents.
Image courtesy Harvey Newman, Caltech
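The figure's data rates can be cross-checked with a little arithmetic (a rough sketch using the order-of-magnitude numbers quoted above):

```python
# 100 triggered events per second at ~1 MByte each gives the
# ~100 MBytes/sec stream shown between tiers in the figure.
triggers_per_sec = 100
event_size_mbytes = 1
raw_rate = triggers_per_sec * event_size_mbytes  # MBytes/sec
assert raw_rate == 100

# Sustained over a year, that stream is petabyte-scale:
seconds_per_year = 365 * 24 * 3600
yearly_pbytes = raw_rate * seconds_per_year / 1e9  # MBytes -> PBytes
assert 3 < yearly_pbytes < 4  # roughly 3.2 PBytes/year
```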
Grid Communities and Applications: Network for Earthquake Eng. Simulation
NEESgrid: national infrastructure to couple earthquake engineers with experimental facilities, databases, computers, & each other
On-demand access to experiments, data streams, computing, archives, collaboration
NEESgrid: Argonne, Michigan, NCSA, UIUC, USC
Why Discuss Architecture?
Descriptive
– Provide a common vocabulary for use when describing Grid systems
Guidance
– Identify key areas in which services are required
Prescriptive
– Define standard “Intergrid” protocols and APIs to facilitate creation of interoperable Grid systems and portable applications
What Sorts of Standards?
Need for interoperability when different groups want to share resources
– E.g., IP lets me talk to your computer, but how do we establish & maintain sharing?
– How do I discover, authenticate, authorize, describe what I want to do, etc., etc.?
Need for shared infrastructure services to avoid repeated development, installation, e.g.
– One port/service for remote access to computing, not one per tool/application
– X.509 enables sharing of Certificate Authorities
So, in Defining Grid Architecture, We Must Address …
Development of Grid protocols & services
– Protocol-mediated access to remote resources
– New services: e.g., resource brokering
– “On the Grid” = speak Intergrid protocols
– Mostly (extensions to) existing protocols
Development of Grid APIs & SDKs
– Facilitate application development by supplying higher-level abstractions
The model is the Internet and Web
The Role of Grid Services (aka Middleware) and Tools
[Figure: a layered stack in which user-level tools (collaboration tools, data management tools, distributed simulation, …) build on common Grid services (remote monitoring, remote access, information services, fault detection, resource management, …), which in turn build on the network.]
Layered Grid Architecture (By Analogy to Internet Architecture)
Application
Collective – “Coordinating multiple resources”: ubiquitous infrastructure services, app-specific distributed services
Resource – “Sharing single resources”: negotiating access, controlling use
Connectivity – “Talking to things”: communication (Internet protocols) & security
Fabric – “Controlling things locally”: access to, & control of, resources
[The diagram aligns these with the Internet protocol architecture: Application, Transport/Internet, and Link layers.]
Where Are We With Architecture?
No “official” standards exist. But:
– Globus Toolkit has emerged as the de facto standard for several important Connectivity, Resource, and Collective protocols
– GGF has an architecture working group
– Technical specifications are being developed for architecture elements: e.g., security, data, resource management, information
– Internet drafts submitted in security area
Grid Services Architecture: Connectivity Layer Protocols & Services
Communication
– Internet protocols: IP, DNS, routing, etc.
Security: Grid Security Infrastructure (GSI)
– Uniform authentication & authorization mechanisms in multi-institutional setting
– Single sign-on, delegation, identity mapping
– Public key technology, SSL, X.509, GSS-API (several Internet drafts document extensions)
– Supporting infrastructure: Certificate Authorities, key management, etc.
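The single sign-on and delegation model can be sketched in miniature. The toy below (the credential structure and all names are invented for illustration; real GSI uses X.509 proxy certificates with actual cryptographic signatures) shows only the chain-of-trust idea: a long-term identity issues a short-lived proxy, and a service accepts it only if it chains to a trusted CA and has not expired.

```python
# Hypothetical sketch of GSI-style proxy delegation -- NO real cryptography.
import time

TRUSTED_CAS = {"DOE Science CA"}  # illustrative CA name

def issue_proxy(user_identity, issuer_ca, lifetime_s=12 * 3600):
    """Derive a short-lived proxy credential from a long-term identity."""
    return {
        "subject": user_identity + "/proxy",
        "issuer": user_identity,
        "ca": issuer_ca,
        "expires": time.time() + lifetime_s,
    }

def verify(proxy):
    """Accept only proxies chaining to a trusted CA that are unexpired."""
    return proxy["ca"] in TRUSTED_CAS and proxy["expires"] > time.time()

proxy = issue_proxy("/O=Grid/CN=Ian Foster", "DOE Science CA")
assert verify(proxy)  # services at any site can now check this proxy
```

The point of the pattern is that the user signs in once; remote services verify delegated proxies rather than repeatedly challenging the user.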
GSI in Action: “Create Processes at A and B that Communicate & Access Files at C”
[Figure: a user performs single sign-on via a “grid-id” and generates a proxy credential (or retrieves one from an online repository). With mutual authentication throughout, the user proxy issues remote process-creation requests to GSI-enabled GRAM servers at Site A (Kerberos) and Site B (Unix); each site authorizes the request, maps it to a local id, creates a process, and generates credentials (a restricted proxy, plus a Kerberos ticket at Site A). The two processes communicate and issue a remote file-access request to a GSI-enabled FTP server at Site C (Kerberos), which authorizes, maps to a local id, and accesses the file on its storage system.]
Grid Services Architecture (3): Resource Layer Protocols & Services
Resource management: GRAM
– Remote allocation, reservation, monitoring, control of [compute] resources
Data access: GridFTP
– High-performance data access & transport
Information: MDS (GRRP, GRIP)
– Access to structure & state information
& others emerging: catalog access, code repository access, accounting, …
All integrated with GSI
GRAM Resource Management Protocol
Grid Resource Allocation & Management
– Allocation, monitoring, control of computations
Simple HTTP-based RPC
– Job request: returns opaque, transferable “job contact” string for access to job
– Job cancel, job status, job signal
– Event notification (callbacks) for state changes
Protocol/server address robustness (exactly-once execution), authentication, authorization
Servers for most schedulers; C and Java APIs
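The RPC pattern above can be illustrated with a toy in-process sketch. Everything here is invented for illustration (the URL scheme, the RSL string, the function names); it is not the actual GRAM wire protocol, only the shape of the interaction: a job request returns an opaque contact string that any authorized holder can later use for status and control.

```python
# Toy sketch of the GRAM job-request/contact pattern (illustrative only).
import uuid

JOBS = {}  # job contact -> job record, standing in for a GRAM gatekeeper

def job_request(rsl):
    """Submit a job description; return an opaque, transferable contact."""
    contact = "https://gatekeeper.example.org:2119/jobs/" + uuid.uuid4().hex
    JOBS[contact] = {"rsl": rsl, "state": "PENDING"}
    return contact

def job_status(contact):
    return JOBS[contact]["state"]

def job_cancel(contact):
    JOBS[contact]["state"] = "DONE"

contact = job_request('&(executable="/bin/date")(count=1)')
assert job_status(contact) == "PENDING"
job_cancel(contact)
assert job_status(contact) == "DONE"
```

Because the contact string is opaque and transferable, a broker can submit a job and hand control of it to another client, which is the property the slide highlights.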
Resource Management Futures: GRAM-2 (planned for late 2001)
Advance reservations
– As prototyped in GARA in previous 2 years
Multiple resource types
– Manage anything: storage, networks, etc., etc.
Recoverable requests, timeout, etc.
– Build on early work with Condor group
Use of SOAP (RPC using HTTP + XML)
– First step towards Web Services
Policy evaluation points for restricted proxies
Karl Czajkowski, Steve Tuecke, others
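As a rough illustration of “RPC using HTTP + XML”, the snippet below builds a minimal SOAP envelope for a hypothetical job request. The SOAP envelope namespace is standard, but the `jobRequest` element and its contents are invented here; GRAM-2's actual message formats were still being defined at the time of this talk.

```python
# Illustrative construction of a SOAP 1.1 envelope (the RPC-over-HTTP+XML
# idea behind GRAM-2's planned Web Services direction).
import xml.etree.ElementTree as ET

SOAP_NS = "http://schemas.xmlsoap.org/soap/envelope/"

def soap_job_request(executable):
    """Serialize a hypothetical job request as a SOAP envelope string."""
    env = ET.Element(f"{{{SOAP_NS}}}Envelope")
    body = ET.SubElement(env, f"{{{SOAP_NS}}}Body")
    req = ET.SubElement(body, "jobRequest")   # invented element name
    ET.SubElement(req, "executable").text = executable
    return ET.tostring(env, encoding="unicode")

msg = soap_job_request("/bin/date")
assert "jobRequest" in msg and "/bin/date" in msg
```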
Data Access & Transfer
GridFTP: extended version of popular FTP protocol for Grid data access and transfer
Secure, efficient, reliable, flexible, extensible, parallel, concurrent, e.g.:
– Third-party data transfers, partial file transfers
– Parallelism, striping (e.g., on PVFS)
– Reliable, recoverable data transfers
Reference implementations
– Existing clients and servers: wuftpd, ncftp
– Flexible, extensible libraries
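The parallelism and striping bullets rest on simple range arithmetic: one logical transfer is split into N byte ranges moved over concurrent streams. A sketch of the partitioning (real GridFTP negotiates streams and offsets through extended FTP commands; this shows only the range math):

```python
# Partition a file of file_size bytes into n contiguous (offset, length)
# ranges, one per parallel stream, distributing any remainder evenly.
def stripe_ranges(file_size, n_streams):
    base, extra = divmod(file_size, n_streams)
    ranges, offset = [], 0
    for i in range(n_streams):
        length = base + (1 if i < extra else 0)
        ranges.append((offset, length))
        offset += length
    return ranges

assert stripe_ranges(10, 3) == [(0, 4), (4, 3), (7, 3)]
# The ranges always cover the whole file exactly once:
assert sum(length for _, length in stripe_ranges(1_000_000, 8)) == 1_000_000
```

Partial file transfer is the same idea applied once: request a single (offset, length) range instead of the whole file.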
Grid Services Architecture (4): Collective Layer Protocols & Services
Index servers aka metadirectory services
– Custom views on dynamic resource collections assembled by a community
Resource brokers (e.g., Condor Matchmaker)
– Resource discovery and allocation
Replica management and replica selection
– Optimize aggregate data access performance
Co-reservation and co-allocation services
– End-to-end performance
Etc., etc.
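The resource-broker idea can be sketched with a minimal matchmaker in the spirit of Condor's Matchmaker. The dictionaries and matching rule here are drastically simplified inventions; real ClassAds evaluate symmetric constraint expressions on both the job and resource sides.

```python
# Toy matchmaker: jobs advertise requirements, resources advertise
# properties; the broker returns a resource satisfying all requirements.
def match(job, resources):
    for res in resources:
        if all(res.get(k, 0) >= v for k, v in job["requirements"].items()):
            return res["name"]
    return None  # no resource satisfies the job

resources = [
    {"name": "anl-cluster", "cpus": 16, "memory_gb": 8},
    {"name": "ncsa-cluster", "cpus": 128, "memory_gb": 64},
]
job = {"requirements": {"cpus": 32, "memory_gb": 16}}
assert match(job, resources) == "ncsa-cluster"
```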
The Grid Information Problem
Large numbers of distributed “sensors” with different properties
Need for different “views” of this information, depending on community membership, security constraints, intended purpose, sensor type
The Globus Toolkit Solution: MDS-2
Registration & enquiry protocols, information models, query languages
– Provides standard interfaces to sensors
– Supports different “directory” structures supporting various discovery/access strategies
Karl Czajkowski, Steve Fitzgerald, others
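The registration-and-enquiry pattern can be sketched as follows. The class and field names are invented; MDS-2 actually builds on LDAP-style directories, with GRRP providing soft-state registration and GRIP the enquiry protocol.

```python
# Toy index server: sensors register entries; communities query
# constrained "views" of the collection.
class IndexServer:
    def __init__(self):
        self.entries = []

    def register(self, entry):
        """GRRP-like registration of a sensor's information."""
        self.entries.append(entry)

    def query(self, **constraints):
        """GRIP-like enquiry: return entries matching all constraints."""
        return [e for e in self.entries
                if all(e.get(k) == v for k, v in constraints.items())]

mds = IndexServer()
mds.register({"type": "compute", "site": "ANL", "free_nodes": 12})
mds.register({"type": "storage", "site": "SDSC", "free_tb": 40})
assert mds.query(type="compute")[0]["site"] == "ANL"
```

Different communities would stand up different index servers over the same sensors, which is how the “different views” requirement of the previous slide is met.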
GriPhyN/PPDG Data Grid Architecture
[Figure: an Application hands DAGs to a Planner, which in turn passes DAGs to an Executor (DAGMAN, Kangaroo). These are supported by Catalog Services (MCAT; GriPhyN catalogs), Info Services (MDS), Policy/Security (GSI, CAS), Monitoring (MDS, NWS), Replica Management (GDMP), and a Reliable Transfer Service, over Compute and Storage Resources accessed via GRAM, GridFTP, and SRM; all built on Globus. Marked components form the operational initial solution.]
Ewa Deelman, Mike Wilde, others www.griphyn.org
Selected Major Grid Projects
Access Grid (www.mcs.anl.gov/FL/accessgrid; DOE, NSF): Create & deploy group collaboration systems using commodity technologies
BlueGrid (IBM): Grid testbed linking IBM laboratories
DISCOM (www.cs.sandia.gov/discom; DOE Defense Programs): Create operational Grid providing access to resources at three U.S. DOE weapons laboratories
DOE Science Grid (sciencegrid.org; DOE Office of Science): Create operational Grid providing access to resources & applications at U.S. DOE science laboratories & partner universities
Earth System Grid (ESG) (earthsystemgrid.org; DOE Office of Science): Delivery and analysis of large climate model datasets for the climate research community
European Union (EU) DataGrid (eu-datagrid.org; European Union): Create & apply an operational grid for applications in high energy physics, environmental science, bioinformatics
Selected Major Grid Projects (continued)
EuroGrid, Grid Interoperability (GRIP) (eurogrid.org; European Union): Create technologies for remote access to supercomputer resources & simulation codes; in GRIP, integrate with Globus
Fusion Collaboratory (fusiongrid.org; DOE Office of Science): Create a national computational collaboratory for fusion research
Globus Project (globus.org; DARPA, DOE, NSF, NASA, Microsoft): Research on Grid technologies; development and support of Globus Toolkit; application and deployment
GridLab (gridlab.org; European Union): Grid technologies and applications
GridPP (gridpp.ac.uk; U.K. eScience): Create & apply an operational grid within the U.K. for particle physics research
Grid Research Integration Dev. & Support Center (grids-center.org; NSF): Integration, deployment, support of the NSF Middleware Infrastructure for research & education
Selected Major Grid Projects (continued)
Grid Application Dev. Software (hipersoft.rice.edu/grads; NSF): Research into program development technologies for Grid applications
Grid Physics Network (griphyn.org; NSF): Technology R&D for data analysis in physics experiments: ATLAS, CMS, LIGO, SDSS
Information Power Grid (ipg.nasa.gov; NASA): Create and apply a production Grid for aerosciences and other NASA missions
International Virtual Data Grid Laboratory (ivdgl.org; NSF): Create international Data Grid to enable large-scale experimentation on Grid technologies & applications
Network for Earthquake Eng. Simulation Grid (neesgrid.org; NSF): Create and apply a production Grid for earthquake engineering
Particle Physics Data Grid (ppdg.net; DOE Science): Create and apply production Grids for data analysis in high energy and nuclear physics experiments
Selected Major Grid Projects (continued)
TeraGrid (teragrid.org; NSF): U.S. science infrastructure linking four major resource sites at 40 Gb/s
UK Grid Support Center (grid-support.ac.uk; U.K. eScience): Support center for Grid projects within the U.K.
Unicore (BMBFT): Technologies for remote access to supercomputers
Also many technology R&D projects: e.g., Condor, NetSolve, Ninf, NWS
See also www.gridforum.org
The 13.6 TF TeraGrid: Computing at 40 Gb/s
[Figure: the four TeraGrid/DTF sites and their resources, linked via external networks: NCSA/PACI (8 TF, 240 TB), SDSC (4.1 TF, 225 TB), Caltech, and Argonne, with HPSS and UniTree archival storage systems at the sites.]
TeraGrid/DTF: NCSA, SDSC, Caltech, Argonne (www.teragrid.org)
iVDGL: International Virtual Data Grid Laboratory
[Figure: diagram of planned iVDGL sites, distinguishing Tier0/1, Tier2, and Tier3 facilities and their interconnects: 10 Gbps, 2.5 Gbps, 622 Mbps, and other links.]
U.S. PIs: Avery, Foster, Gardner, Newman, Szalay (www.ivdgl.org)
NSF GRIDS Center
Grid Research, Integration, Deployment, & Support (GRIDS) Center
Develop, deploy, support
– Middleware infrastructure for national-scale collaborative science and engineering
– Integration platform for experimental middleware technologies
UC, USC/ISI, UW, NCSA, SDSC
Partner with Internet2, SURA, Educause in NSF Middleware Initiative
www.grids-center.org, www.nsf-middleware.org
And What’s This Got To Do With …
CORBA?
– Grid-enabled CORBA underway
Java, Jini, Jxta?
– Java CoG Kit. Jini, Jxta: future uncertain
Web Services, .NET, J2EE?
– Major Globus focus (GRAM-2: SOAP, WSDL)
– Workflow/choreography services
– Q: What can Grid offer to Web services?
Next revolutionary technology of the month?
– They’ll need Grid technologies too
The Future: All Software is Network-Centric
We don’t build or buy “computers” anymore, we borrow or lease required resources
– When I walk into a room, need to solve a problem, need to communicate
A “computer” is a dynamically, often collaboratively constructed collection of processors, data sources, sensors, networks
– Similar observations apply for software
And Thus …
Reduced barriers to access mean that we do much more computing, and more interesting computing, than today => Many more components (& services); massive parallelism
All resources are owned by others => Sharing (for fun or profit) is fundamental; trust, policy, negotiation, payment
All computing is performed on unfamiliar systems => Dynamic behaviors, discovery, adaptivity, failure
Acknowledgments
Globus R&D is joint with numerous people
– Carl Kesselman, Co-PI
– Steve Tuecke, principal architect at ANL
– Others to be acknowledged
GriPhyN R&D is joint with numerous people
– Paul Avery, Co-PI; Newman, Lazzarini, Szalay
– Mike Wilde, project coordinator
– Carl Kesselman, Miron Livny CS leads
– ATLAS, CMS, LIGO, SDSS participants; others
Support: DOE, DARPA, NSF, NASA, Microsoft
Summary
The Grid problem: Resource sharing & coordinated problem solving in dynamic, multi-institutional virtual organizations
Grid architecture: Emphasize protocol and service definition to enable interoperability and resource sharing
Globus Toolkit a source of protocol and API definitions, reference implementations
See: globus.org, griphyn.org, gridforum.org, grids-center.org, nsf-middleware.org