An Introduction to Grid Computing - David Groep 2004.09.13 1
Grid Computing Introduction
David Groep, NIKHEF Physics Data Processing Group
DutchGrid
Talk Outline
• The vision
• A problem and a computing model
• What makes a Grid
  – Authentication and Authorization
  – Protocols and Standards
  – Putting it together: Collective Services
• Into production: the LHC Computing Grid
• Building your own Grid
Grid – a vision
The GRID: networked data processing centres, with “middleware” software as the “glue” between resources.
Researchers perform their activities regardless of geographical location, interact with colleagues, and share and access data.
Scientific instruments and experiments provide huge amounts of data.
A Glimpse of the Problem in HEP
Place event info on 3D map
Trace trajectories through hits
Assign type to each track
Find particles you want
Needle in a haystack!
This is a “relatively easy” case
HEP Data Rates
• 40 MHz (40 TB/sec) into level 1 - special hardware
• 75 kHz (75 GB/sec) into level 2 - embedded processors
• 5 kHz (5 GB/sec) into level 3 - PCs
• 100 Hz (100 MB/sec) into data recording & offline analysis
• Reconstructing and analyzing one event takes about 90 s
• Maybe only a few out of a million are interesting. But we have to check them all!
• The analysis program needs lots of calibration, determined from inspecting the results of the first pass. Each event will be analyzed several times!
• Raw data rate: ~5 PByte/yr/expt.
• Total volume: ~20 PByte/yr
• Per major centre: ~2 PByte/yr
The ATLAS experiment
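A rough back-of-the-envelope check of the figures above, as a minimal Python sketch. It assumes (this is not stated on the slides) that the 90 s per event refers to a single CPU and that events have to be processed at the 100 Hz recording rate; the variable names are illustrative only.

```python
# Figures quoted on the slides
recording_rate_hz = 100          # events written to storage per second
recording_bandwidth_mb_s = 100   # MB/sec at the recording stage
cpu_seconds_per_event = 90       # reconstruct and analyze one event

# Average size of one recorded event
event_size_mb = recording_bandwidth_mb_s / recording_rate_hz

# CPUs needed to keep up with data taking for a single pass,
# assuming one CPU works on one event at a time (an assumption)
cpus_for_one_pass = recording_rate_hz * cpu_seconds_per_event

print(f"average event size: {event_size_mb:.0f} MB")
print(f"CPUs to keep up with one pass: {cpus_for_one_pass}")
```

Since each event is analyzed several times, the real requirement is a multiple of the single-pass figure.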
LHC User Distribution
• Putting all computers in one spot leads to traffic jams
• Which spot is willing to pay for & maintain 100k CPUs?
Computing models
Once upon a time…
Mainframe
Mini Computer
Microcomputer
Cluster
(by Christophe Jacquet)
The Grid & Distributed Computing
(by Christophe Jacquet)
…and today
Realizing the Grid Vision
The Grid was the logical next step at the end of the 1990s:
• Harnessing desktop power became commonplace
  – 1988: Condor, later: SETI@Home, Entropia, Distributed.NET
• Peer-to-peer data access protocols emerged
  – 1999: Napster, later: Gnutella, KaZaa, BitTorrent
• Network access became extremely fast
  – 1997: wide area bandwidth starts to double every 9 months!
• Meta-computing experiments
  – 1995: I-Way, GUSTO, …
Beyond meta-computing: the Grid
A grid integrates resources that
– are not owned or administered by one single organisation
– speak a common, open protocol … that is generic
– work as a coordinated, transparent system
And …
– can be used by many people from multiple organisations
– that work together in one Virtual Organisation
Virtual Organisations
A set of individuals or organisations, not under single hierarchical control, temporarily joining forces to solve a particular problem at hand, bringing to the collaboration a subset of their resources, sharing those at their discretion and each under their own conditions.
• A VO is a temporary alliance of stakeholders
  – Users
  – Service providers
  – Information providers
Coordination and Security
• Parties have no a-priori trust relationship
  – need for trusted third parties (TTPs): PKI certificates
  – should span whole Grid infrastructure
• Community formation is independent of resources
  – VOs as a whole negotiate access to resources
  – community management is done by the VO
  – VO establishment/change/liquidation can be rapid
  – needs Grid-wide authorization/enforcement solution
Grid Security Infrastructure (GSI)
Certification Authorities
• Alice generates a key pair, private key (d,n) and public key (e,n), and ships a ‘request’ to the CA
  – Certificate Request: (e,n), CommonName=‘Alice’, Organization=‘KNMI’
• The CA holds a CA key and a CA cert (self-signed)
• CA checks identifiers against identity of requestor
• CA signs the request with the CA key
• The signed certificate is shipped to Alice and published
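The signing step can be illustrated with textbook RSA in the (d,n)/(e,n) notation of the slide. This is a toy sketch only: the primes are far too small for real use, there is no padding, the request is a plain string rather than an X.509 structure, and all function names are made up for this example.

```python
import hashlib

def make_key_pair(p, q, e=65537):
    """Return ((d, n), (e, n)) for textbook RSA built from two primes."""
    n = p * q
    d = pow(e, -1, (p - 1) * (q - 1))  # modular inverse (Python 3.8+)
    return (d, n), (e, n)

def digest(data, n):
    """Hash the data and reduce it to a number below the modulus."""
    return int.from_bytes(hashlib.sha256(data).digest(), "big") % n

def sign(data, private_key):
    d, n = private_key
    return pow(digest(data, n), d, n)

def verify(data, signature, public_key):
    e, n = public_key
    return pow(signature, e, n) == digest(data, n)

# Toy primes (Mersenne primes); real keys are 2048 bits or more.
ca_private, ca_public = make_key_pair(2**107 - 1, 2**127 - 1)
alice_private, alice_public = make_key_pair(2**61 - 1, 2**89 - 1)

# Alice's request: her identifiers plus her public key (e, n).
request = f"CommonName=Alice,Organization=KNMI,key={alice_public}".encode()

# The CA signs the request with the CA private key ...
certificate_signature = sign(request, ca_private)

# ... and anyone holding the CA public key can check the result.
genuine_ok = verify(request, certificate_signature, ca_public)
tampered_ok = verify(request + b",tampered", certificate_signature, ca_public)
print(genuine_ok, tampered_ok)
```

Alice's private key never leaves her machine in this flow; only the public half travels in the request.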
Your certificate
• Your private key is valuable, keep it safe
  – protected with a pass phrase (conventional symmetric crypto)
  – store it securely
  – make proxies for signing on to the grid
• Find all your credential data in $HOME/.globus/
  – Private key in “userkey.pem”
  – Public key certificate in “usercert.pem”
  – CAs that you trust in “~/.globus/certificates/”
Chain: CA → User → User proxy
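One simple way to check that a key file is stored securely is to confirm that nobody but its owner can access it. The sketch below assumes a POSIX file system and uses a temporary stand-in file instead of a real userkey.pem; the helper name is hypothetical and not part of any Grid tool.

```python
import os
import stat
import tempfile

def only_owner_can_access(path):
    """Return True if group and others have no permissions on the file."""
    mode = os.stat(path).st_mode
    return (mode & (stat.S_IRWXG | stat.S_IRWXO)) == 0

# Demonstration on a temporary stand-in for userkey.pem
with tempfile.TemporaryDirectory() as workdir:
    key_path = os.path.join(workdir, "userkey.pem")
    with open(key_path, "w") as handle:
        handle.write("not a real key\n")

    os.chmod(key_path, 0o644)            # world-readable: too open
    too_open = only_owner_can_access(key_path)

    os.chmod(key_path, 0o400)            # owner read-only
    locked_down = only_owner_can_access(key_path)

print(too_open, locked_down)
```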
Authorization: grid-mapfile and ACs
Each resource can authorize or ban users
• Tedious but simple way: grid-mapfile
    $ cat /etc/grid-security/grid-mapfile
    "/O=dutchgrid/O=users/O=nikhef/CN=David Groep" davidg
    "/O=dutchgrid/O=users/O=nikhef/CN=Michiel Botje" h24
    "/O=dutchgrid/O=users/O=sara/CN=Ron Trompert" ront
    "/O=dutchgrid/O=users/O=nikhef/CN=Jeffrey Templon" aliprod
• VO-managed membership lists: edg-mkgridmap
    $ cat /opt/edg/etc/edg-mkgridmap.conf
    group ldap://grid-vo.nikhef.nl/ou=lcg1,o=alice,dc=cern,dc=ch .alice
    $ cat /var/adm/grid-mapfile
    "/C=IT/O=INFN/L=Catania/CN=Roberto Barbera" .alice
• Dynamic communities and multi-VO membership: VOMS and/or Attribute Certificates
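The grid-mapfile format shown above (a quoted certificate subject followed by a local account name) is simple enough to parse in a few lines. The sketch below is an illustration of the lookup a resource performs, not the code used by any Grid middleware; the function names are hypothetical, and skipping "#" comment lines is an assumption.

```python
import shlex

def parse_grid_mapfile(text):
    """Map each quoted certificate subject to its local account name."""
    mapping = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and (assumed) comment lines
        subject, account = shlex.split(line)
        mapping[subject] = account
    return mapping

def authorize(subject, mapping):
    """Return the local account for a subject, or None if it is not listed."""
    return mapping.get(subject)

example = '''
"/O=dutchgrid/O=users/O=nikhef/CN=David Groep" davidg
"/O=dutchgrid/O=users/O=sara/CN=Ron Trompert" ront
"/C=IT/O=INFN/L=Catania/CN=Roberto Barbera" .alice
'''

mapping = parse_grid_mapfile(example)
print(authorize("/O=dutchgrid/O=users/O=nikhef/CN=David Groep", mapping))
print(authorize("/O=example/CN=Unknown User", mapping))
```

A user whose subject is not listed gets no local account, which is how a resource bans everyone it has not explicitly authorized.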
‘Common and open protocols’
[Figure: layered protocol stack, top to bottom]
• Applications
• Application Toolkits: EDG RB, MPICH-G2, Condor-G
• Grid Services: GRAM, GridFTP, BDII, RLS/RMC
• Grid Security Infrastructure (GSI)
• Fabric: Farms, MPPs, Desktops, HPSS, Equipment
Other labels in the figure: Genius, MySQL
Current protocols
Protocols used in production today (legacy):
• GridFTP – data transfer
• GRAM – job submission
• LDAP and GLUE – information system (MDS, BDII)
New direction: Web Services
• Web Services Resource Framework (WSRF): syntax for service access
• Open Grid Services Architecture (OGSA): composition and behaviour
What is a WS-Resource?
• Web service: Operation execution component made available at an endpoint address
  – Implementation often stateless, but accesses state
• WS-Resource: Web service + associated resource
  – Equivalently: A resource with an associated WS
• A WS-Resource has
  – Identity: Can be uniquely identified/referenced
  – Lifetime: Often created & destroyed by clients
  – State: Can be projected as an XML document
• WS-Resource type = Web service interface
• WS-Resources are not just for physical devices
  – Jobs, subscriptions, logical data sets, etc.
slide by Steve Tuecke
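The three properties of a WS-Resource (identity, lifetime, state projected as XML) can be made concrete with a small in-memory model. This is a conceptual sketch only: it does not use or imitate the WSRF message formats, and the class names, the "status" field and the dictionary used as a resource store are all assumptions made for this example.

```python
import uuid
import xml.etree.ElementTree as ET
from dataclasses import dataclass, field

@dataclass
class JobResource:
    """Hypothetical stateful resource that sits behind the service."""
    status: str = "queued"
    resource_id: str = field(default_factory=lambda: uuid.uuid4().hex)

    def to_xml(self):
        # State: projected as an XML document
        root = ET.Element("JobResource", id=self.resource_id)
        ET.SubElement(root, "status").text = self.status
        return ET.tostring(root, encoding="unicode")

class JobService:
    """Service operations; all state lives in the stored resources."""

    def __init__(self):
        self._resources = {}  # stands in for a database

    def create(self):
        # Lifetime: a client creates the resource ...
        job = JobResource()
        self._resources[job.resource_id] = job
        return job.resource_id  # Identity: the id the client keeps

    def get_state(self, resource_id):
        return self._resources[resource_id].to_xml()

    def destroy(self, resource_id):
        # ... and a client destroys it again
        del self._resources[resource_id]

service = JobService()
job_id = service.create()
print(service.get_state(job_id))
service.destroy(job_id)
```

Every call names the resource by its id, which is the role the resource id plays next to the address in an endpoint reference.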
Web Services Model
[Figure: a Web Service behind its Interface (described by WSDL), hosted in a run-time environment]
slide by Steve Tuecke
Web Services Model: Invoking a Web Service
[Figure: an Endpoint Reference holds the address; a message sent to that address reaches the Web Service through its Interface, inside the run-time environment]
slide by Steve Tuecke
WS-Resource Framework Model: Using a Web service to access a resource
[Figure: the Endpoint Reference holds an address and a resource id; the message carries the id to the Web Service Interface, where the id selects the resource (context) inside the run-time environment]
slide by Steve Tuecke
Access in a coordinated way
Transparent crossing of domain boundaries, satisfying constraints of
– site autonomy
– authenticity, integrity, confidentiality
• single sign-on to all services
• ways to address services collectively
  – via command-line tools or submission portals (existing apps)
  – via API calls to Grid middleware
Grid Security In Action
• User: single sign-on via “grid-id” & generation of proxy credential, or retrieval of proxy credential from an online repository
• User Proxy (holding the proxy credential) sends remote process creation requests* to Site A and Site B
• Site A (Kerberos): Computer with GSI-enabled GRAM server
  – authorize, map to local id, create process, generate credentials
  – Process runs with local id, restricted proxy and Kerberos ticket
• Site B (Unix): Computer with GSI-enabled GRAM server
  – ditto
  – Process runs with local id and restricted proxy
• Communication* between the processes
• Remote file access request* to Site C (Kerberos): Storage system with GSI-enabled FTP server
  – authorize, map to local id, access file
* With mutual authentication
slide by Steve Tuecke
Information System is Central Nervous System of Grid
[Figure: C nodes connected to a central Information System; the legend shows symbols for C, DS and a Grid software service (like http server)]
Info system defines grid
slide by Jeff Templon
Data Grid
[Figure: the same grid with the information system (I.S.), C nodes, DS nodes and a D.M.S.]
slide by Jeff Templon
Computing Task Submission
[Figure: grid with I.S., C nodes, DS nodes, D.M.S. and W.M.S.]
• proxy + command; (data); submitted to the W.M.S.
• Coarse Requirements go to the I.S., Candidate Clusters come back
• Get fresh, detailed info from the candidates
slide by Jeff Templon
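The two-stage matching in this figure (a coarse query to the information system, then fresh detail from the candidates) can be sketched as follows. All data, field names and the ranking rule ("most free CPUs wins") are invented for illustration; the slide does not say how the W.M.S. ranks candidates.

```python
# Hypothetical coarse view published in the information system
INFO_SYSTEM = [
    {"name": "cluster-a", "os": "linux", "total_cpus": 140},
    {"name": "cluster-b", "os": "linux", "total_cpus": 36},
    {"name": "cluster-c", "os": "irix", "total_cpus": 1024},
]

# Hypothetical fresh answers, as if obtained from each cluster directly
FRESH_INFO = {
    "cluster-a": {"free_cpus": 3},
    "cluster-b": {"free_cpus": 20},
    "cluster-c": {"free_cpus": 500},
}

def candidate_clusters(requirements):
    """Stage 1: coarse match against the information system."""
    return [
        cluster["name"]
        for cluster in INFO_SYSTEM
        if cluster["os"] == requirements["os"]
        and cluster["total_cpus"] >= requirements["min_cpus"]
    ]

def choose_cluster(requirements):
    """Stage 2: get fresh detail from the candidates and pick one."""
    usable = [
        name
        for name in candidate_clusters(requirements)
        if FRESH_INFO[name]["free_cpus"] >= requirements["min_cpus"]
    ]
    if not usable:
        return None
    # Assumed ranking rule: prefer the cluster with the most free CPUs
    return max(usable, key=lambda name: FRESH_INFO[name]["free_cpus"])

job_requirements = {"os": "linux", "min_cpus": 10}
print(candidate_clusters(job_requirements))
print(choose_cluster(job_requirements))
```

The coarse stage keeps the load on individual clusters low, since only the few candidates that pass it are asked for up-to-date details.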
Computing Task Execution
[Figure: grid with I.S., a C node, DS nodes, D.M.S., W.M.S. and a logger. Labels: “proxy + command; (data);”, “Where is my data?”, “Find DMS”, “proxy”, “List of best locations”]
slide by Jeff Templon
Computing Task Execution
[Figure: grid with I.S., a C node, DS nodes, D.M.S., W.M.S. and a logger. Labels: “How to contact O.D.S.?”, “Where do I put the data?”, “proxy + data”, “Register output”, “Done”]
slide by Jeff Templon
Realising the Grid Vision
The Grid was the logical next step at the end of the 1990s:
• Harnessing desktop power became commonplace
  – 1988: Condor, later: SETI@Home, Entropia, Distributed.NET
• Peer-to-peer data access protocols emerged
  – 1999: Napster, later: Gnutella, KaZaa, BitTorrent
• Network access became extremely fast
  – 1997: wide area bandwidth starts to double every 9 months!
• 1997: Globus starts developing basic middleware
  – 1996: middleware by Legion, 2000: Unicore
• Massive take-up of the Grid vision in 1999
  – led in Europe by the EU DataGrid
  – others include: NASA-IPG, CrossGrid, GridLab, PPDG, Alliance,
• Global Production Grids since 2003
  – LHC Computing Grid project (LCG)
  – Enabling Grids for E-science in Europe (EGEE)
Some of the Resources
• 1.2 PByte near-line StorageTek
• 36 node IA32 cluster ‘matrix’
• 468 CPU IA64 + 1024 CPU MIPS
• multi-Gbit links to 100TByte cache
• 7 TByte cache
• 140 nodes IA32
• 1Gbit link SURFnet
• multiple links with SARA
applet and video by Stuart Wakefield, IC London