Jon Wakelin: Condor, Globus and SRB: Tools for Constructing a Campus Grid
![Page 1: Jon Wakelin Condor, Globus and SRB: Tools for Constructing a Campus Grid](https://reader035.vdocument.in/reader035/viewer/2022062515/56649d095503460f949dc08f/html5/thumbnails/1.jpg)
Jon Wakelin
Condor, Globus and SRB: Tools for Constructing a Campus Grid
![Page 2: Jon Wakelin Condor, Globus and SRB: Tools for Constructing a Campus Grid](https://reader035.vdocument.in/reader035/viewer/2022062515/56649d095503460f949dc08f/html5/thumbnails/2.jpg)
2 Overview
• Condor
• Globus
• Storage Resource Broker (SRB)
• UoBGrid
• Summary
![Page 3: Jon Wakelin Condor, Globus and SRB: Tools for Constructing a Campus Grid](https://reader035.vdocument.in/reader035/viewer/2022062515/56649d095503460f949dc08f/html5/thumbnails/3.jpg)
3 Condor Overview
• High Throughput Computing Environment
– From networked resources (Condor pool)
• Like other schedulers
– Queuing mechanism
– Prioritisation scheme
– Scheduling policy
• Unlike other schedulers
– Doesn’t need dedicated resources
– Desktop workstations, library or PC lab computers
– Cycle scavenging
![Page 4: Jon Wakelin Condor, Globus and SRB: Tools for Constructing a Campus Grid](https://reader035.vdocument.in/reader035/viewer/2022062515/56649d095503460f949dc08f/html5/thumbnails/4.jpg)
4 Class Ads
• Classified advertisements
– Machine Class Ads (for sale)
– Job Class Ads (wanted)
• Machine Class Ads
– Created from information “advertised” by machines in the Condor pool
– Can add extra Class Ad information
• Job Class Ads
– Created from information in the Condor submit file
– Created from default values
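The "for sale" and "wanted" idea can be sketched in a few lines of Python. This is a toy model only: ads are plain dicts, each Requirements entry is a Python predicate, and the machine names, the ImageSize attribute and its values are illustrative assumptions rather than real Condor pool data or the ClassAd language itself.

```python
# Toy ClassAd matchmaking: ads are plain dicts, Requirements are predicates.
# Illustration only; this is not Condor's ClassAd language or API.

def matches(job_ad, machine_ad):
    """Return True when each ad's Requirements accepts the other ad."""
    job_ok = job_ad["Requirements"](machine_ad)
    machine_ok = machine_ad["Requirements"](job_ad)
    return job_ok and machine_ok

def matchmake(job_ad, machine_ads):
    """Return the names of all machines whose ad matches the job ad."""
    return [m["Name"] for m in machine_ads if matches(job_ad, m)]

# Hypothetical machine ads ("for sale").
machines = [
    {"Name": "lab-pc-01", "Arch": "INTEL", "OpSys": "LINUX", "Memory": 512,
     "Requirements": lambda job: job["ImageSize"] <= 512},
    {"Name": "lab-pc-02", "Arch": "INTEL", "OpSys": "WINNT51", "Memory": 256,
     "Requirements": lambda job: True},
]

# Hypothetical job ad ("wanted").
job = {
    "Owner": "jon", "ImageSize": 128,
    "Requirements": lambda m: m["Arch"] == "INTEL" and m["OpSys"] == "LINUX",
}

matched = matchmake(job, machines)
print(matched)
```

The match is two-sided: the job must accept the machine and the machine must accept the job, which is why both ads carry their own Requirements.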
![Page 5: Jon Wakelin Condor, Globus and SRB: Tools for Constructing a Campus Grid](https://reader035.vdocument.in/reader035/viewer/2022062515/56649d095503460f949dc08f/html5/thumbnails/5.jpg)
5 Different roles in a Condor pool
• Central Manager
• Submit
• Execute
• Or a combination of these
– e.g. submit and execute node
• Different daemons will be started depending on the role of the machine
![Page 6: Jon Wakelin Condor, Globus and SRB: Tools for Constructing a Campus Grid](https://reader035.vdocument.in/reader035/viewer/2022062515/56649d095503460f949dc08f/html5/thumbnails/6.jpg)
6 Condor Daemons
• All Machines
– condor_master - controls other daemons
• Central Manager
– condor_collector - Collects information from other machines
– condor_negotiator - Performs matchmaking
• Execute
– condor_startd - Starts, stops, suspends jobs
• Submit
– condor_schedd - Maintains queue of jobs
![Page 7: Jon Wakelin Condor, Globus and SRB: Tools for Constructing a Campus Grid](https://reader035.vdocument.in/reader035/viewer/2022062515/56649d095503460f949dc08f/html5/thumbnails/7.jpg)
7 Job Submission
Executable = /bin/ls
Arguments = -l
InitialDir = /usr/bin
Output = out
Error = err
Queue
![Page 8: Jon Wakelin Condor, Globus and SRB: Tools for Constructing a Campus Grid](https://reader035.vdocument.in/reader035/viewer/2022062515/56649d095503460f949dc08f/html5/thumbnails/8.jpg)
8 Job Submission
Executable = /bin/ls
Arguments = -l
InitialDir = /usr/bin
# $(Process) expands to 0, 1, ... for each queued job
Output = out.$(Process)
Error = err.$(Process)
Queue 2
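A rough sketch of how $(Process) turns one submit description into per-job file names. The helper names are made up for illustration, and the expansion is a simplification of what Condor does when it processes the submit file.

```python
import re

def expand_process(template, process):
    """Replace $(Process), in any letter case, with the job's process number."""
    return re.sub(r"\$\(process\)", str(process), template, flags=re.IGNORECASE)

def output_names(queue_count, out_template="out.$(Process)",
                 err_template="err.$(Process)"):
    """Return (output, error) file names for each job created by 'Queue N'."""
    return [
        (expand_process(out_template, p), expand_process(err_template, p))
        for p in range(queue_count)
    ]

names = output_names(2)
print(names)
```

With "Queue 2" the process numbers are 0 and 1, so each job writes to its own pair of files instead of overwriting a shared one.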
![Page 9: Jon Wakelin Condor, Globus and SRB: Tools for Constructing a Campus Grid](https://reader035.vdocument.in/reader035/viewer/2022062515/56649d095503460f949dc08f/html5/thumbnails/9.jpg)
9 Job Submission
Executable = /bin/ls
Arguments = -l
InitialDir = /usr/bin
Output = out.$(Process)
Error = err.$(Process)
# Only match machines whose Class Ad satisfies this expression
Requirements = ((Arch=="INTEL" && OpSys=="LINUX") || (Arch=="INTEL" && OpSys=="IRIX65"))
Queue 2
![Page 10: Jon Wakelin Condor, Globus and SRB: Tools for Constructing a Campus Grid](https://reader035.vdocument.in/reader035/viewer/2022062515/56649d095503460f949dc08f/html5/thumbnails/10.jpg)
10 Job Submission
# $$(Arch) and $$(OpSys) are filled in from the matched machine's Class Ad
Executable = gaussian.$$(Arch).$$(OpSys)
InitialDir = /home/jon
Input = chlorobenzene.in
Output = chlorobenzene.$(Process)
Error = chlorobenzene.$(Process)
Requirements = ((Arch=="INTEL" && OpSys=="LINUX") || (Arch=="INTEL" && OpSys=="IRIX65"))
Queue 2
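The $$(...) form differs from $(...) in that its value comes from the machine the job was matched to. A minimal sketch of that substitution, assuming the machine ad is a plain dict (the function name and the case-sensitive lookup are simplifications for illustration):

```python
import re

def substitute_matched(value, machine_ad):
    """Replace each $$(Attr) with the matched machine's value for Attr."""
    def lookup(match):
        return str(machine_ad[match.group(1)])
    return re.sub(r"\$\$\((\w+)\)", lookup, value)

# Hypothetical Class Ad attributes of the machine chosen by matchmaking.
machine = {"Arch": "INTEL", "OpSys": "LINUX"}

exe = substitute_matched("gaussian.$$(Arch).$$(OpSys)", machine)
print(exe)
```

One submit file can therefore select a different binary per platform, provided a file with each resulting name exists.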
![Page 11: Jon Wakelin Condor, Globus and SRB: Tools for Constructing a Campus Grid](https://reader035.vdocument.in/reader035/viewer/2022062515/56649d095503460f949dc08f/html5/thumbnails/11.jpg)
11 Condor Commands
• condor_submit <submit_file>
• condor_q
• condor_rm
• condor_status
– Displays pool status in a succinct format
• condor_status -l <machine>
– Displays full Class Ad information
![Page 12: Jon Wakelin Condor, Globus and SRB: Tools for Constructing a Campus Grid](https://reader035.vdocument.in/reader035/viewer/2022062515/56649d095503460f949dc08f/html5/thumbnails/12.jpg)
12 Condor-G
• Condor interface to access Globus resources
– Condor submit file
– Condor commands
– Keeps log of runs
– Adds fault tolerance
• Can be used to perform matchmaking
– Must create machine Class Ads manually
– condor_advertise command
– Can be used to create a resource broker
– No RB functionality in Globus Toolkit
![Page 13: Jon Wakelin Condor, Globus and SRB: Tools for Constructing a Campus Grid](https://reader035.vdocument.in/reader035/viewer/2022062515/56649d095503460f949dc08f/html5/thumbnails/13.jpg)
13 Globus Toolkit Overview
• Globus is a toolkit, not a turnkey solution
• Globus Toolkit 2.4.3 is a common choice for production grids
• Four main components
– Authentication (GSI)
– Resource management (GRAM)
– Data transfer (GridFTP)
– Resource discovery and monitoring (MDS)
![Page 14: Jon Wakelin Condor, Globus and SRB: Tools for Constructing a Campus Grid](https://reader035.vdocument.in/reader035/viewer/2022062515/56649d095503460f949dc08f/html5/thumbnails/14.jpg)
14 Authentication
• Grid users need to obtain something called a certificate
• Applications can use the certificate to establish the identity of the user…
• i.e. authenticate the user
![Page 15: Jon Wakelin Condor, Globus and SRB: Tools for Constructing a Campus Grid](https://reader035.vdocument.in/reader035/viewer/2022062515/56649d095503460f949dc08f/html5/thumbnails/15.jpg)
15 PKI Authentication
• Public Key Infrastructure
– Public/private keys
– Used to encrypt data
– And to sign certificates
• Certification Authority (CA)
– User creates a certificate request
– CA signs certificates
– UK eScience CA at RAL
• Certificate contains
– Identity/Distinguished Name (DN)
– Public key
– Signature and identity of CA
![Page 16: Jon Wakelin Condor, Globus and SRB: Tools for Constructing a Campus Grid](https://reader035.vdocument.in/reader035/viewer/2022062515/56649d095503460f949dc08f/html5/thumbnails/16.jpg)
16 GSI Authentication
• Grid Security Infrastructure
– Extensions to PKI (X509, SSL extensions)
– Single sign-on
– Delegation
– grid-proxy-init – command to create proxy certificates
(Diagram: CA, User and Proxy certificates, linked by signatures)
![Page 17: Jon Wakelin Condor, Globus and SRB: Tools for Constructing a Campus Grid](https://reader035.vdocument.in/reader035/viewer/2022062515/56649d095503460f949dc08f/html5/thumbnails/17.jpg)
17 Resource Management (GRAM)
• Grid Resource Allocation Manager
– Gatekeeper
– Resource Specification Language
– JobManagers
![Page 18: Jon Wakelin Condor, Globus and SRB: Tools for Constructing a Campus Grid](https://reader035.vdocument.in/reader035/viewer/2022062515/56649d095503460f949dc08f/html5/thumbnails/18.jpg)
18 GRAM - Gatekeeper
• Daemon runs on a grid resource
• Processes incoming globus requests
• Authenticates users
– Configured to trust a given CA
– e.g. UK eScience CA at RAL
• Maps user to local account
– DN => username
– grid-mapfile
• Passes the job onto the jobmanager
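The DN => username step can be sketched as a lookup in a parsed grid-mapfile. The sketch assumes the common one-entry-per-line layout of a quoted DN followed by a single local account; the DN and username below are invented for illustration.

```python
import shlex

def parse_grid_mapfile(text):
    """Map each quoted Distinguished Name to its local username."""
    mapping = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        dn, username = shlex.split(line)  # shlex keeps the quoted DN whole
        mapping[dn] = username
    return mapping

# Hypothetical grid-mapfile content.
example = '''
# DN                                                   local account
"/C=UK/O=eScience/OU=Bristol/L=IS/CN=example user" euser
'''

mapping = parse_grid_mapfile(example)
print(mapping)
```

A user whose DN is absent from the mapping has no local account to run under on that resource.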
![Page 19: Jon Wakelin Condor, Globus and SRB: Tools for Constructing a Campus Grid](https://reader035.vdocument.in/reader035/viewer/2022062515/56649d095503460f949dc08f/html5/thumbnails/19.jpg)
19 GRAM - RSL
• Resource Specification Language
– (attribute op value) in parentheses
• Operators
– Numerical operators within clauses (<, <=, >, >=, =, !=)
– Logical operators between clauses (&, |)
• Attributes
– Predefined
  • executable, arguments, stdin, stdout, stderr, environment
  • maxCpuTime, maxWallTime, maxMemory, project, queue
– User defined
  • May be handled by subsequent application
![Page 20: Jon Wakelin Condor, Globus and SRB: Tools for Constructing a Campus Grid](https://reader035.vdocument.in/reader035/viewer/2022062515/56649d095503460f949dc08f/html5/thumbnails/20.jpg)
20 GRAM - RSL
&(executable="/bin/ls") (arguments="-l") (directory="/usr/bin")
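As a sketch, an RSL string of this shape can be assembled from attribute/value pairs. The function name is invented, only the "=" operator is handled, and no quoting or escaping of values is attempted.

```python
def to_rsl(attributes):
    """Join (attribute="value") clauses under a leading & (logical AND)."""
    clauses = "".join('({}="{}")'.format(name, value) for name, value in attributes)
    return "&" + clauses

rsl = to_rsl([
    ("executable", "/bin/ls"),
    ("arguments", "-l"),
    ("directory", "/usr/bin"),
])
print(rsl)
```

This describes the same job as the Condor submit file shown earlier: run /bin/ls -l in /usr/bin.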
![Page 21: Jon Wakelin Condor, Globus and SRB: Tools for Constructing a Campus Grid](https://reader035.vdocument.in/reader035/viewer/2022062515/56649d095503460f949dc08f/html5/thumbnails/21.jpg)
21 GRAM - JobManagers
• Perl modules
– Convert RSL into scheduler specific language
• Reference implementations
– Fork, Condor, PBS, LSF
• May need to roll-your-own
– e.g. LoadLeveler, SGE
– Or just to add extra functionality
![Page 22: Jon Wakelin Condor, Globus and SRB: Tools for Constructing a Campus Grid](https://reader035.vdocument.in/reader035/viewer/2022062515/56649d095503460f949dc08f/html5/thumbnails/22.jpg)
22 Data Management (GridFTP)
• File Transfer Protocol
– Extension of the standard FTP protocol stack to include extra functionality
  • GSI authentication
  • Third-party transfers
  • Striped transfers
– User application is globus-url-copy
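A small sketch of preparing a globus-url-copy invocation, which takes a source URL followed by a destination URL. The host names and paths are placeholders, the helper is invented, and limiting the schemes to gsiftp:// and file:// is a choice made for this example only. The command is built but not run.

```python
def gridftp_copy_command(source_url, dest_url):
    """Build the argument list for a globus-url-copy transfer."""
    for url in (source_url, dest_url):
        if not url.startswith(("gsiftp://", "file://")):
            raise ValueError("unsupported URL scheme: " + url)
    return ["globus-url-copy", source_url, dest_url]

# Third-party transfer: both endpoints are remote GridFTP servers
# (hypothetical hosts), so data moves directly between them.
cmd = gridftp_copy_command(
    "gsiftp://source.example.org/data/input.dat",
    "gsiftp://dest.example.org/scratch/input.dat",
)
print(" ".join(cmd))
```

The resulting list could be passed to subprocess.run once a valid proxy certificate exists.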
![Page 23: Jon Wakelin Condor, Globus and SRB: Tools for Constructing a Campus Grid](https://reader035.vdocument.in/reader035/viewer/2022062515/56649d095503460f949dc08f/html5/thumbnails/23.jpg)
23 Information Services (MDS)
• Collect and provide status information about Grid resources
• MDS: Monitoring and Discovery Service
– GRIS: Grid Resource Information Service
  • Collects info about local resource
  • Reports to GIIS server
– GIIS: Grid Index Information Service
  • Aggregates information from GRIS servers
  • One per organisation
– Same executable with different configuration
![Page 24: Jon Wakelin Condor, Globus and SRB: Tools for Constructing a Campus Grid](https://reader035.vdocument.in/reader035/viewer/2022062515/56649d095503460f949dc08f/html5/thumbnails/24.jpg)
24 Storage Resource Broker (SRB) Overview
• Uniform interface to heterogeneous data storage resources
– Unix, Irix, Linux file systems
– Windows
– Databases
– Physical media (tape storage)
• SRB is middleware
– Allows access to a wide range of data resources
– Allows a wide range of user apps to be written
– All accessed through a “narrow” API
(Diagram: Applications, API, Storage)
![Page 25: Jon Wakelin Condor, Globus and SRB: Tools for Constructing a Campus Grid](https://reader035.vdocument.in/reader035/viewer/2022062515/56649d095503460f949dc08f/html5/thumbnails/25.jpg)
25 SRB Access
• Applications
– Scommands: command line
– MySRB: Web access
– inQ: Windows GUI
• APIs
– Java, C, C++, Python, Perl
![Page 26: Jon Wakelin Condor, Globus and SRB: Tools for Constructing a Campus Grid](https://reader035.vdocument.in/reader035/viewer/2022062515/56649d095503460f949dc08f/html5/thumbnails/26.jpg)
26 UoBGrid Overview
• What is a Campus Grid?
• Our Situation
• Software Choices
• Services
![Page 27: Jon Wakelin Condor, Globus and SRB: Tools for Constructing a Campus Grid](https://reader035.vdocument.in/reader035/viewer/2022062515/56649d095503460f949dc08f/html5/thumbnails/27.jpg)
27 What is a Campus Grid?
• A Grid: Single sign-on to multiple resources located in different administrative domains
![Page 28: Jon Wakelin Condor, Globus and SRB: Tools for Constructing a Campus Grid](https://reader035.vdocument.in/reader035/viewer/2022062515/56649d095503460f949dc08f/html5/thumbnails/28.jpg)
28 Our Situation
• Dedicated departmental clusters
– Windows Condor pools not a requested resource
• Separation of user communities
– Parallel vs serial usage
• All contained within a single firewall domain
• Wanted to become partners in the NGS
– Systems must be compatible
– Encourage our users to become NGS users
• Full Economic Costing coming soon!
– Important to keep usage records
– Ensure best usage of purchased resources for sustainable future
![Page 29: Jon Wakelin Condor, Globus and SRB: Tools for Constructing a Campus Grid](https://reader035.vdocument.in/reader035/viewer/2022062515/56649d095503460f949dc08f/html5/thumbnails/29.jpg)
29 Software Choices
• Condor 6.6.7
• Globus 2.4.3
• MyProxy
• GSI-SSH
• Storage Resource Broker (SRB)
• Virtual Data Toolkit (VDT)
– Bundles many useful tools
– Platform independent installation
– Supported release of Globus Toolkit, MyProxy & GSI-SSH
![Page 30: Jon Wakelin Condor, Globus and SRB: Tools for Constructing a Campus Grid](https://reader035.vdocument.in/reader035/viewer/2022062515/56649d095503460f949dc08f/html5/thumbnails/30.jpg)
30 Planned Resources
![Page 31: Jon Wakelin Condor, Globus and SRB: Tools for Constructing a Campus Grid](https://reader035.vdocument.in/reader035/viewer/2022062515/56649d095503460f949dc08f/html5/thumbnails/31.jpg)
31 Current Resources
• 4 Servers
– RB: Resource Broker
– VOM: Virtual Organisation Manager
– MDS: Monitoring and Discovery Service
– SRB: Storage Resource Broker
• 4 Compute Resources
– Monster2 - SGE, 20 CPU
– Tuya - PBS, 16 CPU
– Grendel - PBS, 110 CPU
– BSESrv1 - PBS, 28 CPU
![Page 32: Jon Wakelin Condor, Globus and SRB: Tools for Constructing a Campus Grid](https://reader035.vdocument.in/reader035/viewer/2022062515/56649d095503460f949dc08f/html5/thumbnails/32.jpg)
32 Resource Broker
• Condor-G with matchmaking
• Custom script for determination of resource status
– Converts MDS information into Condor Class Ads
– Adds information about available software
• User submission script
– Creates Condor submit file
– Software requirements passed into Condor submit file
– Submits jobs
– Sends data to SRB
• http://cerb-rb.bris.ac.uk/cgi-bin/rb_status.cgi
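The conversion step might look roughly like the sketch below. It is not the UoBGrid script: the input record, the attribute names (FreeCpus, JobManager, Software), the host name and the software list are all assumptions chosen to show the shape of the idea, namely status information in and "Name = value" ad text out.

```python
def to_classad(attributes):
    """Render a dict as ClassAd-style 'Name = value' lines."""
    lines = []
    for name, value in attributes.items():
        if isinstance(value, bool):
            rendered = "TRUE" if value else "FALSE"
        elif isinstance(value, (int, float)):
            rendered = str(value)
        else:
            rendered = '"{}"'.format(value)  # strings are double-quoted
        lines.append("{} = {}".format(name, rendered))
    return "\n".join(lines)

def resource_ad(mds_record, software):
    """Combine resource status with the software list into one machine ad."""
    ad = {
        "MyType": "Machine",
        "Name": mds_record["hostname"],
        "FreeCpus": mds_record["free_cpus"],
        "JobManager": mds_record["jobmanager"],
        "Software": ",".join(sorted(software)),
    }
    return to_classad(ad)

# Hypothetical status record for one compute resource.
record = {"hostname": "grendel.example.org", "free_cpus": 12, "jobmanager": "pbs"}
ad_text = resource_ad(record, {"gaussian", "matlab"})
print(ad_text)
```

Text of this kind is what would then be handed to condor_advertise so that Condor-G can match jobs against the resource.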
![Page 33: Jon Wakelin Condor, Globus and SRB: Tools for Constructing a Campus Grid](https://reader035.vdocument.in/reader035/viewer/2022062515/56649d095503460f949dc08f/html5/thumbnails/33.jpg)
33 Virtual Organisation Manager
• Built using
– Webserver
  • Apache + mod_ssl
  • Perl CGI
– Postgres database
– Modified Globus JobManagers
• Functionality
– Record of users and machines
– Administrative functions
– Accounting/usage statistics
![Page 34: Jon Wakelin Condor, Globus and SRB: Tools for Constructing a Campus Grid](https://reader035.vdocument.in/reader035/viewer/2022062515/56649d095503460f949dc08f/html5/thumbnails/34.jpg)
34 Virtual Organisation Manager
• Admin – via web interface (https)
– Access based on certificate/DN
– Add/remove users
– Add/remove resources
– Control users’ access to resources
• Constructs grid-mapfiles for all resources
• https://cerb-vom.bris.ac.uk/vom-bin/VOM.cgi
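Constructing a grid-mapfile per resource from central records could be sketched as follows. The data layout (one dict of users, one dict of access rights), the DNs, the account names and the helper are assumptions for illustration, not the VOM's actual schema.

```python
def build_grid_mapfiles(users, access):
    """Return {resource: grid-mapfile text}.

    users:  {dn: local_username}
    access: {resource: set of DNs allowed on that resource}
    """
    mapfiles = {}
    for resource, allowed in access.items():
        lines = ['"{}" {}'.format(dn, users[dn]) for dn in sorted(allowed)]
        mapfiles[resource] = "\n".join(lines) + "\n"
    return mapfiles

# Hypothetical users and access rights.
DN1 = "/C=UK/O=eScience/OU=Bristol/L=IS/CN=example user one"
DN2 = "/C=UK/O=eScience/OU=Bristol/L=IS/CN=example user two"
users = {DN1: "euser1", DN2: "euser2"}
access = {"grendel": {DN1, DN2}, "tuya": {DN1}}

mapfiles = build_grid_mapfiles(users, access)
print(mapfiles["grendel"])
```

Keeping the records in one place means that removing a user's access to a resource amounts to regenerating that resource's file.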
![Page 35: Jon Wakelin Condor, Globus and SRB: Tools for Constructing a Campus Grid](https://reader035.vdocument.in/reader035/viewer/2022062515/56649d095503460f949dc08f/html5/thumbnails/35.jpg)
35 Virtual Organisation Manager
• Accounting/usage statistics
– Usage by machine
– Usage by users
• Modified GRAM JobManagers
– Job details sent to DB on completion
– executable, arguments, start time, end time, CPU, wall time, memory, virtual memory, jobmanager-type, number of nodes
• http://cerb-vom.bris.ac.uk/cgi-bin/VOM-usage-stats.cgi
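"Usage by machine" and "usage by users" amount to grouping the stored job records by one field and summing another. A minimal sketch, with invented field names and records standing in for rows from the usage database:

```python
from collections import defaultdict

def total_wall_time(jobs, key):
    """Sum wall time over job records, grouped by the given field."""
    totals = defaultdict(float)
    for job in jobs:
        totals[job[key]] += job["wall_time"]
    return dict(totals)

# Hypothetical job records (wall time in seconds).
jobs = [
    {"user": "jon", "machine": "Grendel", "wall_time": 120.0},
    {"user": "jon", "machine": "Tuya", "wall_time": 30.0},
    {"user": "ann", "machine": "Grendel", "wall_time": 60.0},
]

by_machine = total_wall_time(jobs, "machine")
by_user = total_wall_time(jobs, "user")
print(by_machine, by_user)
```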
![Page 36: Jon Wakelin Condor, Globus and SRB: Tools for Constructing a Campus Grid](https://reader035.vdocument.in/reader035/viewer/2022062515/56649d095503460f949dc08f/html5/thumbnails/36.jpg)
36 Resource Monitor
• Runs GIIS
– Collects information from UoBGrid resources
• Runs Big Brother monitoring software
– Client/server model
– Server pings registered resources
– Client records local system info and reports to server
![Page 37: Jon Wakelin Condor, Globus and SRB: Tools for Constructing a Campus Grid](https://reader035.vdocument.in/reader035/viewer/2022062515/56649d095503460f949dc08f/html5/thumbnails/37.jpg)
37 Big Brother Monitoring System
Web-available status page with easy-to-understand functionality for helpdesk and admin staff.
![Page 38: Jon Wakelin Condor, Globus and SRB: Tools for Constructing a Campus Grid](https://reader035.vdocument.in/reader035/viewer/2022062515/56649d095503460f949dc08f/html5/thumbnails/38.jpg)
38 Storage Resource Broker
• All UoBGrid users are given an SRB account
• GSI authentication enabled for Scommands
• Access via certificate
![Page 39: Jon Wakelin Condor, Globus and SRB: Tools for Constructing a Campus Grid](https://reader035.vdocument.in/reader035/viewer/2022062515/56649d095503460f949dc08f/html5/thumbnails/39.jpg)
39
(Diagram of UoB Grid and NGS. Labels: User, SRB, MDS, RB, VOM, Monster2 (SGE), bserv (PBS), Grendel (PBS), Tuya (PBS), BDII, RAL, Oxford, Leeds, Man)
![Page 40: Jon Wakelin Condor, Globus and SRB: Tools for Constructing a Campus Grid](https://reader035.vdocument.in/reader035/viewer/2022062515/56649d095503460f949dc08f/html5/thumbnails/40.jpg)
40
(Diagram as on slide 39)
Compute resources running GRIS report to information servers
![Page 41: Jon Wakelin Condor, Globus and SRB: Tools for Constructing a Campus Grid](https://reader035.vdocument.in/reader035/viewer/2022062515/56649d095503460f949dc08f/html5/thumbnails/41.jpg)
41
(Diagram as on slide 39)
Resource Broker polls information servers and converts MDS information into Condor Class Ads
![Page 42: Jon Wakelin Condor, Globus and SRB: Tools for Constructing a Campus Grid](https://reader035.vdocument.in/reader035/viewer/2022062515/56649d095503460f949dc08f/html5/thumbnails/42.jpg)
42
(Diagram as on slide 39)
User logs on to Resource Broker to submit job. Jobs are matched to resources using Condor
![Page 43: Jon Wakelin Condor, Globus and SRB: Tools for Constructing a Campus Grid](https://reader035.vdocument.in/reader035/viewer/2022062515/56649d095503460f949dc08f/html5/thumbnails/43.jpg)
43
(Diagram as on slide 39)
Job details sent to machine by Condor-G
![Page 44: Jon Wakelin Condor, Globus and SRB: Tools for Constructing a Campus Grid](https://reader035.vdocument.in/reader035/viewer/2022062515/56649d095503460f949dc08f/html5/thumbnails/44.jpg)
44
(Diagram as on slide 39)
Upon completion, output files are sent back to the Resource Broker
![Page 45: Jon Wakelin Condor, Globus and SRB: Tools for Constructing a Campus Grid](https://reader035.vdocument.in/reader035/viewer/2022062515/56649d095503460f949dc08f/html5/thumbnails/45.jpg)
45
(Diagram as on slide 39)
If a job runs on UoB Grid resources, run details are sent to the VOM DB, for NGS and UoB users alike.
![Page 46: Jon Wakelin Condor, Globus and SRB: Tools for Constructing a Campus Grid](https://reader035.vdocument.in/reader035/viewer/2022062515/56649d095503460f949dc08f/html5/thumbnails/46.jpg)
46
(Diagram as on slide 39)
Finally, output files are sent from the RB to the Storage Resource Broker
![Page 47: Jon Wakelin Condor, Globus and SRB: Tools for Constructing a Campus Grid](https://reader035.vdocument.in/reader035/viewer/2022062515/56649d095503460f949dc08f/html5/thumbnails/47.jpg)
47 Summary
• Condor
– Standalone: high throughput computing system
– Matchmaking with Class Ads
– Condor-G: interface to Globus Toolkit
• Globus Toolkit
– Applications, Protocols, APIs
– GSI – Certificates (DN, public key, digital signature)
• UoBGrid
– Centralised access to disparate resources
– Custom components created to fill functionality gaps
– Globus gives authenticated access to resources
– Condor provides matchmaking (i.e. brokering)
– SRB provides storage
![Page 48: Jon Wakelin Condor, Globus and SRB: Tools for Constructing a Campus Grid](https://reader035.vdocument.in/reader035/viewer/2022062515/56649d095503460f949dc08f/html5/thumbnails/48.jpg)
48 Useful URLs
• SRB: http://www.sdsc.edu/srb/
• Globus: http://www.globus.org/
• Condor: http://www.cs.wisc.edu/condor/
• UK eScience CA: https://ca.grid-support.ac.uk/
• NGS: http://www.ngs.ac.uk/
• UoBGrid: http://escience.bris.ac.uk