6d.1
Schedulers and Resource Brokers
ITCS 4146/5146, UNC-Charlotte, B. Wilkinson, Feb 12, 2007
Topics
• Local schedulers
• Condor
6d.2
Scheduler
• The job manager submits jobs to the scheduler.
• The scheduler assigns work to resources so as to meet specified time requirements.
6d.3
Scheduling
[Figure from "Introduction to Grid Computing with Globus," IBM Redbooks]
6d.4
Executing GT 4 jobs
Globus has three job execution modes:
• Interactive
• Interactive-streaming
• Batch
6d.5
GT 4 “Fork” Scheduler
• Attempts to execute the job immediately
• Provided for starting and controlling a job on the local host when the job has no special software or other requirements.
6d.6
Batch scheduling
• "Batch" is a term from the early days of computing, when one submitted a deck of punched cards as the program and came back later, perhaps the next morning, to collect the results after the program had run.
6d.7
[Figure: Relationship between GT4 GRAM and a Local Scheduler (I. Foster). Labeled components: client; GRAM services in the GT4 Java Container; GRAM adapter; local scheduler; local job control; job functions; user job; compute element.]
6d.8
globusrun-ws -Ft flag
• Selects the scheduler.
• Default: Fork, for single jobs.
• Other schedulers have to be added separately and must be supported by a GRAM “adapter.”
6d.9
Scheduler adapters included in GT 4
• PBS (Portable Batch System)
• Condor
• LSF (Load Sharing Facility)
Third-party adapter provided for:
• SGE (Sun Grid Engine)
6d.10
globusrun-ws -Ft flag
Examples:
globusrun-ws -Ft Condor    (on coit-grid02-4)
globusrun-ws -Ft SGE       (on coit-grid01 and toralds.cis.uncw.edu)
6d.11
(Local) Scheduler Issues
• Distributing jobs
  – based on the load and characteristics of machines, available disk storage, network characteristics, …
• Runtime scheduling!
• Arranging data in the right place (staging)
  – data replication and movement as needed
  – data error checking
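As an illustration of distributing jobs based on machine load and available disk, here is a minimal Python sketch of one simple placement rule: each job goes to the least-loaded machine that still has enough free disk. The `Machine` fields, the job tuple format and the greedy rule are all assumptions for illustration; real local schedulers use richer information (network characteristics, staging of data) and their own policies.

```python
from dataclasses import dataclass, field

@dataclass
class Machine:
    name: str
    load: float        # hypothetical load figure (e.g. load average)
    free_disk_mb: int  # hypothetical free disk space, in MB
    jobs: list = field(default_factory=list)

def place_jobs(jobs, machines):
    """Greedily send each job to the least-loaded machine with enough free disk.

    jobs: list of (name, disk_needed_mb, added_load) tuples (hypothetical format).
    Returns {job name: machine name}, with None when no machine has enough disk.
    """
    placement = {}
    for name, disk_needed, added_load in jobs:
        candidates = [m for m in machines if m.free_disk_mb >= disk_needed]
        if not candidates:
            placement[name] = None  # job has to wait, or its data must be staged elsewhere
            continue
        best = min(candidates, key=lambda m: m.load)  # first least-loaded machine wins ties
        best.load += added_load
        best.free_disk_mb -= disk_needed
        best.jobs.append(name)
        placement[name] = best.name
    return placement

machines = [Machine("nodeA", 0.5, 500), Machine("nodeB", 0.1, 100)]
print(place_jobs([("j1", 50, 1.0), ("j2", 200, 1.0), ("j3", 1000, 1.0)], machines))
# {'j1': 'nodeB', 'j2': 'nodeA', 'j3': None}
```

Because load and free disk are updated after every placement, later jobs see the effect of earlier ones; this is what distinguishes runtime scheduling from a fixed assignment made in advance.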
6d.12
Scheduler Issues (continued)
• Performance
  – Error checking, checkpointing
  – Job monitoring, progress monitoring
  – QoS (Quality of Service)
  – Cost (an area considered by the Nimrod-G scheduler)
• Security
  – Need to authenticate and authorize remote users for job submission
• Fault tolerance
• Automatic scheduling
6d.13
Scheduling policies
• First-in, first-out
• Favor certain types of jobs
• Shortest job first
• Smallest (or largest) memory first
• Short (or long) running job first
• Fair sharing or priority to certain users
• Dynamic policies
  – depending upon time of day and load
  – custom, preemptive, process migration
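To see why the choice of policy matters, the following Python sketch compares first-in, first-out with shortest job first (and long-running job first) on one machine. It assumes all jobs are already queued at time 0 and run one after another; the job names and run times are hypothetical.

```python
def waiting_times(run_times):
    """Waiting time of each job when jobs run back to back on one machine."""
    waits, clock = [], 0
    for t in run_times:
        waits.append(clock)
        clock += t
    return waits

def order_jobs(jobs, policy):
    """Order jobs, given as (name, submit_order, run_time) tuples, under a policy.

    All jobs are assumed to be queued at time 0; submit_order records arrival order.
    """
    if policy == "fifo":  # first-in, first-out
        return sorted(jobs, key=lambda j: j[1])
    if policy == "sjf":   # shortest job first, ties broken by arrival order
        return sorted(jobs, key=lambda j: (j[2], j[1]))
    if policy == "ljf":   # long-running job first
        return sorted(jobs, key=lambda j: (-j[2], j[1]))
    raise ValueError(f"unknown policy: {policy}")

jobs = [("A", 0, 8), ("B", 1, 4), ("C", 2, 1)]  # hypothetical run times in hours
for policy in ("fifo", "sjf"):
    order = order_jobs(jobs, policy)
    waits = waiting_times([j[2] for j in order])
    print(policy, [j[0] for j in order], sum(waits) / len(waits))
# fifo runs A, B, C (average wait 20/3 h); sjf runs C, B, A (average wait 2.0 h)
```

Shortest job first minimizes the average wait here, but a long job can be starved if short jobs keep arriving, which is one reason for fair-share and dynamic policies.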
6d.14
Advance Reservation
• Requesting actions at times in the future.
• “A service level agreement in which the conditions of the agreement start at some agreed-upon time in the future”
From: “The Grid 2, Blueprint for a New Computing Infrastructure,” I. Foster and C. Kesselman, editors, Morgan Kaufmann, 2004.
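A minimal sketch of the bookkeeping behind advance reservation, assuming a resource with a fixed number of nodes and integer time slots (both assumptions): a request for some future window is accepted only if the nodes it asks for, added to those already reserved, never exceed capacity anywhere in that window. The class and field names are hypothetical and not taken from any Globus or scheduler API.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Reservation:
    user: str
    start: int  # start time, e.g. hours from now (hypothetical unit)
    end: int    # end time, exclusive
    nodes: int  # number of nodes requested

class ReservationBook:
    """Accepts a future reservation only if node capacity is never exceeded."""

    def __init__(self, total_nodes):
        self.total_nodes = total_nodes
        self.accepted = []

    def nodes_in_use(self, t):
        return sum(r.nodes for r in self.accepted if r.start <= t < r.end)

    def request(self, r):
        """Return True and record the reservation if it fits, else False."""
        if r.start >= r.end or r.nodes > self.total_nodes:
            return False
        # Usage only rises when a reservation starts, so the peak inside
        # [r.start, r.end) occurs at r.start or at an accepted start within it.
        times = {r.start} | {a.start for a in self.accepted if r.start <= a.start < r.end}
        if any(self.nodes_in_use(t) + r.nodes > self.total_nodes for t in times):
            return False
        self.accepted.append(r)
        return True

book = ReservationBook(total_nodes=10)
print(book.request(Reservation("alice", 10, 14, 6)))  # True
print(book.request(Reservation("bob", 12, 16, 6)))    # False: 12 nodes needed at t=12
print(book.request(Reservation("carol", 14, 18, 6)))  # True: alice's slot has ended
```

Once accepted, a reservation is a commitment: the scheduler must keep those nodes free for that window, which is what makes it an agreement about the future rather than an ordinary queued job.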
6d.15
Resource Broker
• “A scheduler that optimizes the performance of a particular resource.
• Performance may be measured by such criteria as fairness (to ensure that all requests for the resources are satisfied) or utilization (to measure the amount of the resource used).”
From: “The Grid 2, Blueprint for a New Computing Infrastructure,” I. Foster and C. Kesselman, editors, Morgan Kaufmann, 2004.
6d.16
Scheduler/Resource Broker Examples
We have used Condor and Sun Grid Engine:
• Condor/Condor-G
  – Used in the Fall 2004 course and this year in assignment 3.
• Sun Grid Engine
  – Used in the Fall 2005 course.
6d.17
Condor
• First developed at the University of Wisconsin-Madison in the mid-1980s to convert a collection of distributed workstations and clusters into a high-throughput computing facility.
• Key concept: using the otherwise wasted computing power of idle workstations.
6d.18
Condor
• Converts collections of distributed workstations and dedicated clusters into a distributed high-throughput computing facility.
6d.19
Uses
• Consider the following scenario:
– I have a simulation that takes two hours to run on my high-end computer
– I need to run it 1000 times with slightly different parameters each time.
– If I do this on one computer, it will take at least 2000 hours (or about 3 months)
From: “Condor: What it is and why you should worry about it,” by B. Beckles, University of Cambridge, Seminar, June 23, 2004
6d.20
– Suppose my department has 100 PCs like mine that are mostly sitting idle overnight (say 8 hours a day).
– If I could use them when their legitimate users are not using them, so that I do not inconvenience them, I could get about 800 CPU hours/day.
– This is an ideal situation for Condor.
• I could do my simulations in 2.5 days.
From: “Condor: What it is and why you should worry about it,” by B. Beckles, University of Cambridge, Seminar, June 23, 2004
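The arithmetic in this scenario can be checked with a few lines of Python. The sketch assumes, as the scenario implicitly does, that each 2-hour run fits inside one machine's 8-hour idle period and that all idle CPU hours can be used.

```python
def sequential_hours(runs, hours_per_run):
    """Total time if all runs are done one after another on a single computer."""
    return runs * hours_per_run

def days_on_idle_pool(runs, hours_per_run, machines, idle_hours_per_day):
    """Days needed on a pool of idle machines.

    Assumes every run fits inside one machine's idle period (2 h runs in an 8 h night)
    and that the idle CPU hours can be used fully.
    """
    cpu_hours_per_day = machines * idle_hours_per_day
    return sequential_hours(runs, hours_per_run) / cpu_hours_per_day

print(sequential_hours(1000, 2))           # 2000 hours, about 83 days
print(days_on_idle_pool(1000, 2, 100, 8))  # 2000 / 800 = 2.5 days
```

In other words, 1000 × 2 = 2000 CPU hours of work, and 100 machines × 8 idle hours = 800 CPU hours per day, giving 2000 / 800 = 2.5 days.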
6d.21
Condor Features
• Resource finder
• Batch queue manager
• Scheduler
• Checkpoint/restart
• Process migration
6d.22
Intended to run jobs even if:
• Machines crash
• Disk space is exhausted
• Software is not installed
• Machines are needed by others
• Machines are managed by others
• Machines are far away
6d.23
How does Condor work?
• A collection of machines running Condor is called a pool.
• Individual pools can be joined together in a process called flocking.
From: “Condor: What it is and why you should worry about it,” by B. Beckles, University of Cambridge, Seminar, June 23, 2004
6d.24
Machine Roles
• Machines have one or more of 4 roles:
  – Central manager
  – Submit machine (submit host)
  – Execution machine (execute host)
  – Checkpoint server
6d.25
Central Manager
• Resource broker for a pool.
• Keeps track of which machines are available, what jobs are running, negotiates which machine will run which job, etc.
• Only one central manager per pool.
6d.26
Submit Machine
• Machine that submits jobs to the pool.
• There must be at least one submit machine in a pool, and usually there is more than one.
6d.27
Execute Machine
• Machine on which jobs can be run.
• There must be at least one execute machine in a pool, and usually there is more than one.
6d.28
Checkpoint Server
• Machine that stores all checkpoint files produced by jobs that checkpoint.
• There can be only one checkpoint server in a pool.
• A checkpoint server is optional.
6d.29
Possible Configuration
• A central manager.
• Some machines that can only be submit hosts.
• Some machines that can only be execute hosts.
• Some machines that can be both submit and execute hosts.
6d.30
6d.31
Types of Jobs
• Jobs are classified according to the environment (“universe”) Condor provides for them. Currently there are seven environments:
  – Standard
  – Vanilla
  – PVM
  – MPI
  – Globus
  – Java
  – Scheduler
6d.32
Standard
• For jobs compiled with Condor libraries.
• Allows for checkpointing and remote system calls.
• Jobs must be single-threaded.
• Not available under Windows.
6d.33
Checkpointing
• Certain jobs can checkpoint, both periodically for safety and when interrupted.
• If a checkpointed job is interrupted, it resumes from the last checkpointed state when it starts again.
• Generally no changes to the source code are needed; the program only has to be linked with Condor’s Standard Universe support library.
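Condor’s Standard Universe checkpointing is transparent: the linked library saves the process state, so the program itself does nothing. To show the underlying idea (save state periodically, resume from the last saved state after an interruption), here is an application-level sketch in Python instead; this is a different technique from Condor’s, and the file name and the "sum of integers" workload are hypothetical.

```python
import json
import os
import tempfile

def run_with_checkpoints(total_steps, ckpt_path, every=10, stop_after=None):
    """Sum 0..total_steps-1, saving progress to ckpt_path every `every` steps.

    stop_after simulates an interruption after that many steps of this run.
    Returns the final sum, or None if the run was interrupted.
    """
    # Resume from the last checkpoint if one exists, otherwise start fresh.
    if os.path.exists(ckpt_path):
        with open(ckpt_path) as f:
            state = json.load(f)
    else:
        state = {"step": 0, "acc": 0}

    steps_this_run = 0
    while state["step"] < total_steps:
        if stop_after is not None and steps_this_run == stop_after:
            return None  # interrupted: work since the last checkpoint is lost
        state["acc"] += state["step"]
        state["step"] += 1
        steps_this_run += 1
        if state["step"] % every == 0:
            # Write a temporary file, then rename it, so a crash mid-write
            # never leaves a half-written checkpoint behind.
            tmp_path = ckpt_path + ".tmp"
            with open(tmp_path, "w") as f:
                json.dump(state, f)
            os.replace(tmp_path, ckpt_path)
    return state["acc"]

path = os.path.join(tempfile.mkdtemp(), "state.json")
print(run_with_checkpoints(100, path, stop_after=35))  # None (last checkpoint at step 30)
print(run_with_checkpoints(100, path))                 # 4950, resuming from step 30
```

The interrupted run loses steps 30 to 34; the checkpoint interval trades the cost of writing checkpoints against the amount of work that must be redone.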
6d.34
Vanilla
• For jobs that cannot be compiled with Condor libraries, and for shell scripts and Windows batch files.
• No checkpointing or remote system calls.
6d.35
PVM: For PVM programs.
MPI: For MPI programs (MPICH).
Both PVM and MPI are message-passing libraries used to write message-passing programs, typically for local clusters of computers.
MPI could also be used in grid computing – we will talk about this later in the course.
6d.36
Globus: For submitting jobs to resources managed by Globus (version 2.2 and higher).
Java: For Java programs (written for the Java Virtual Machine).
Scheduler: Used with DAG-scheduled jobs (see later).
6d.37
Submitting a job
• A job is submitted to a “submit host” using the condor_submit command.
• The job is described in a “submit description” file.
• The submit description file includes details similar to those in a Globus RSL file, e.g., the name of the executable, its arguments, etc.
6d.38
Condor Submit Description File
Describes the job to Condor. Used with the condor_submit command.
Description file example:
    # This is a comment, condor submit file
    Universe   = vanilla
    Executable = /home/abw/condor/myProg
    Input      = myProg.stdin
    Output     = myProg.stdout
    Error      = myProg.stderr
    Arguments  = -arg1 -arg2
    InitialDir = /home/abw/condor/assignment4
    Queue
6d.39
Submitting Multiple Jobs
• A submit file can specify multiple jobs
  – Example: Queue 500 will submit 500 jobs at once.
• Condor calls a group of jobs a cluster.
• Each job within a cluster is called a process.
• A Condor job ID is the cluster number, a period, and the process number, for example 26.2.
• A single job is also a cluster, but with a single process (process 0).
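The job ID convention above can be sketched in a few lines of Python. The cluster number is assigned by Condor at submit time; here it is simply passed in (26, as in the example), and the function names are hypothetical.

```python
def job_ids(cluster, queue_count):
    """IDs for the jobs created by one submit file with 'Queue <queue_count>'.

    Each ID is '<cluster>.<process>'; processes are numbered from 0.
    """
    return [f"{cluster}.{process}" for process in range(queue_count)]

def parse_job_id(job_id):
    """Split an ID such as '26.2' into (cluster, process)."""
    cluster, process = job_id.split(".")
    return int(cluster), int(process)

print(job_ids(26, 3))        # ['26.0', '26.1', '26.2']
print(parse_job_id("26.2"))  # (26, 2)
```

A single job (plain Queue) is the case queue_count = 1, which gives just process 0.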
6d.40
Submitting a job with requirements and preferences
• Done using Condor’s “ClassAd” mechanism, which may include:
  – What it requires
  – What it desires
  – What it prefers, and
  – What it will accept
• These details are first specified in the submit description file.
6d.41
The condor_submit command creates a “ClassAd” from the submit description file, which is then used in the ClassAd matchmaking mechanism.
Command:
    condor_submit submit.prog1
[Figure: submit description file → condor_submit → ClassAd file]
6d.42
Specifying Requirements
• A C/Java-like Boolean expression that must evaluate to TRUE for a match.
    # This is a comment, condor submit file
    Universe     = vanilla
    Executable   = /home/abw/condor/myProg
    InitialDir   = /home/abw/condor/assignment4
    Requirements = Memory >= 512 && Disk > 10000
    queue 500
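The Requirements line behaves like a predicate evaluated against each machine’s advertised attributes. The Python sketch below renders that one expression as a function over hypothetical machine attribute dictionaries. It is a simplification: real ClassAd expressions are evaluated by Condor’s own expression language, which has its own rules for undefined attributes; here a missing attribute is simply treated as "no match" (an assumption).

```python
def requirements(ad):
    """Python rendering of: Requirements = Memory >= 512 && Disk > 10000"""
    memory, disk = ad.get("Memory"), ad.get("Disk")
    if memory is None or disk is None:
        return False  # simplification: a missing attribute never matches
    return memory >= 512 and disk > 10000

machines = [
    {"Name": "m1", "Memory": 1024, "Disk": 50000},
    {"Name": "m2", "Memory": 256,  "Disk": 80000},
    {"Name": "m3", "Memory": 512,  "Disk": 9000},
    {"Name": "m4", "Disk": 20000},  # does not advertise Memory
]
print([m["Name"] for m in machines if requirements(m)])  # ['m1']
```

With queue 500, the same requirement applies to all 500 jobs, so only machines like m1 would be considered for any of them.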
6d.43
ClassAd Matchmaking
Used to ensure that jobs are run according to the constraints of both users and machine owners.
Example of user constraints: “I need a Pentium IV with at least 512 Mbytes of RAM and a speed of at least 3.8 GHz.”
Example of machine owner constraints: “Never run jobs owned by Fred.”
6d.44
ClassAd Matchmaking Steps
1. Agents (jobs) and resources (computers) advertise their characteristics and requirements in “classified advertisements.”
2. The matchmaker scans the ClassAds and creates pairs that satisfy each other’s constraints and preferences.
3. The matchmaker informs both parties of the match.
4. The agent and the resource make contact.
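Steps 1 and 2 can be sketched in Python to show what makes matchmaking two-sided: a pair is formed only when the job’s Requirements accept the machine and the machine’s Requirements accept the job. In this sketch, ads are plain dictionaries and Requirements are Python functions of (my ad, other ad); this representation, the greedy "first free machine" rule and all names are assumptions, not Condor’s implementation. Steps 3 and 4 are not modeled.

```python
def symmetric_match(job, machine):
    """Both sides' Requirements must hold: the job accepts the machine and vice versa."""
    return job["Requirements"](job, machine) and machine["Requirements"](machine, job)

def matchmake(jobs, machines):
    """Greedy matchmaker: give each job the first unclaimed machine that matches both ways.

    Returns (job owner, machine name) pairs; notifying the parties (step 3) and the
    agent contacting the resource (step 4) are not modeled.
    """
    free = list(machines)
    pairs = []
    for job in jobs:
        for machine in free:
            if symmetric_match(job, machine):
                pairs.append((job["Owner"], machine["Name"]))
                free.remove(machine)  # safe: we stop iterating right after removing
                break
    return pairs

# Ads as plain dicts; "Requirements" is a Python function of (my_ad, other_ad).
jobs = [
    {"Owner": "alice",
     "Requirements": lambda my, other: other["Arch"] == "INTEL" and other["Memory"] >= 512},
    {"Owner": "Fred",
     "Requirements": lambda my, other: other["Arch"] == "INTEL"},
]
machines = [
    # Machine owner constraint from the earlier example: "Never run jobs owned by Fred"
    {"Name": "m1", "Arch": "INTEL", "Memory": 1024,
     "Requirements": lambda my, other: other["Owner"] != "Fred"},
    {"Name": "m2", "Arch": "INTEL", "Memory": 256,
     "Requirements": lambda my, other: True},
]
print(matchmake(jobs, machines))  # [('alice', 'm1'), ('Fred', 'm2')]
```

Even if Fred’s job is considered first, it cannot be paired with m1, because m1’s owner constraint rejects it.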
6d.45
[Figure: the job advertises a Job ClassAd and machines advertise Machine ClassAds; the matchmaker produces a match between the job and one machine.]
6d.46
Job ClassAd Example
    [
    MyType = "Job"
    TargetType = "Machine"
    Requirements = ((other.Arch == "INTEL" && other.OpSys == "LINUX")
                    && other.Disk > my.DiskUsage)
    DiskUsage = 6000        (6 MB)
    ]
The Requirements expression must evaluate to true for a match.
6d.47
Machine ClassAd Example
    [
    MyType = "Machine"
    TargetType = "Job"
    Machine = "coit-grid01.uncc.edu"
    Requirements = (LoadAvg <= 0.300000) && (KeyboardIdle > (15*60))
    Arch = "INTEL"
    OpSys = "LINUX"
    Disk = 1000000
    ]
LoadAvg <= 0.300000: low load average.
KeyboardIdle > (15*60): keyboard idle for more than 15 minutes.
6d.48
ClassAd Rank Statement
• Can be used in a job ClassAd to select among compatible machines; the machine with the highest rank is chosen.
• The Rank expression should evaluate to a floating-point number.
Example:
    Rank = (Memory * 10000) + KFlops
(KFlops is a measure of machine speed.)
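The following Python sketch shows how the example Rank expression orders machines that already satisfy the job’s Requirements. The machine attributes are hypothetical, and Memory is assumed to be in Mbytes.

```python
def rank(machine):
    """Python rendering of the job's Rank = (Memory * 10000) + KFlops."""
    return float(machine["Memory"] * 10000 + machine["KFlops"])

def best_machine(compatible):
    """Among machines that already satisfy Requirements, pick the highest Rank."""
    if not compatible:
        return None
    return max(compatible, key=rank)["Name"]

compatible = [
    {"Name": "m1", "Memory": 512,  "KFlops": 900000},
    {"Name": "m2", "Memory": 1024, "KFlops": 300000},
    {"Name": "m3", "Memory": 1024, "KFlops": 500000},
]
print(best_machine(compatible))  # m3
```

In this expression each Mbyte of memory counts as much as 10,000 KFlops. m2 and m3 have equal memory, so speed decides between them; m1’s faster processor does not make up for its 512 Mbytes less memory.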
6d.49
Rank Statement
Can also be used in a machine ClassAd during matchmaking.
Example:
    Rank = (other.Department == self.Department)
where Department is defined in the machine’s own ClassAd and in the job ClassAd, say:
    Department = "Computer Science"
6d.50
Using Rank in a Machine ClassAd
Job ClassAd:
    [
    MyType = "Job"
    TargetType = "Machine"
    …
    Department = "Computer Science"
    …
    ]
Machine ClassAd:
    [
    MyType = "Machine"
    TargetType = "Job"
    …
    Rank = (other.Department == self.Department)
    …
    ]
6d.51
Directed Acyclic Graph Manager (DAGMan)
Meta-scheduler
Allows one to specify dependencies between Condor jobs.
6d.52
Example: “Do not run Job B until Job A has completed successfully.”
Especially important for jobs working together (as in Grid computing).
6d.53
Directed Acyclic Graph (DAG)
• A data structure used to represent dependencies.
• Directed graph.
• No cycles.
• Each job is a node in the DAG.
• Each node can have any number of parents and children as long as there are no loops (acyclic graph).
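The "no cycles" condition is exactly what guarantees that the jobs can be put in an order in which every job comes after all of its parents. A standard way to compute such an order, and to detect a cycle at the same time, is Kahn’s topological sort, sketched here in Python. The dictionary format (child mapped to its set of parents) is an assumption for illustration.

```python
from collections import deque

def topological_order(parents):
    """Kahn's algorithm. parents maps each node to the set of its parent nodes.

    Returns an order in which every job comes after all of its parents,
    or raises ValueError if the graph has a cycle (i.e. it is not a DAG).
    """
    nodes = set(parents) | {p for ps in parents.values() for p in ps}
    children = {n: [] for n in nodes}
    indegree = {n: 0 for n in nodes}
    for child, ps in parents.items():
        for p in ps:
            children[p].append(child)
            indegree[child] += 1
    ready = deque(sorted(n for n in nodes if indegree[n] == 0))  # jobs with no parents
    order = []
    while ready:
        n = ready.popleft()
        order.append(n)
        for c in sorted(children[n]):
            indegree[c] -= 1          # one more parent of c is done
            if indegree[c] == 0:
                ready.append(c)
    if len(order) != len(nodes):      # some nodes never became ready: a cycle
        raise ValueError("cycle detected: not a DAG")
    return order

print(topological_order({"B": {"A"}, "C": {"A"}, "D": {"B", "C"}}))  # ['A', 'B', 'C', 'D']
```

If the graph contained a loop, the nodes on it would each wait for another node on the loop and never become ready, which is why such a graph cannot be scheduled.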
6d.54
DAG
[Figure: a diamond-shaped DAG with Job A at the top, Jobs B and C in the middle, and Job D at the bottom]
Do job A.
Do jobs B and C after job A has finished.
Do job D after both jobs B and C have finished.
6d.55
Defining a DAG
• Defined by a .dag file, listing each of the nodes and their dependencies.
• Each “Job” statement has an abstract job name (say A) and a submit description file (say a.condor).
• A PARENT-CHILD statement describes the relationship between two or more jobs.
• Other statements are available.
6d.56
Example:
    # diamond.dag
    Job A a.sub
    Job B b.sub
    Job C c.sub
    Job D d.sub
    Parent A Child B C
    Parent B C Child D
[Figure: the corresponding diamond DAG, A → B and C → D]
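To make the meaning of the Job and Parent/Child lines concrete, here is a small Python sketch that reads the diamond.dag text into a table of jobs and a child-to-parents map. It handles only these two kinds of statement and ignores every other DAGMan statement; because the example mixes "Job"/"Parent"/"Child" capitalization, keywords are compared case-insensitively. The function name is hypothetical.

```python
def parse_dag(text):
    """Parse the JOB and PARENT ... CHILD ... lines of a DAGMan-style .dag file.

    Returns (jobs, parents): jobs maps job name -> submit file,
    parents maps each child job -> set of its parent jobs.
    Any other statement is ignored in this sketch.
    """
    jobs, parents = {}, {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):  # skip blank lines and comments
            continue
        words = line.split()
        upper = [w.upper() for w in words]    # the example mixes Job/JOB-style case
        if upper[0] == "JOB":
            jobs[words[1]] = words[2]
        elif upper[0] == "PARENT":
            i = upper.index("CHILD")
            for child in words[i + 1:]:       # every child depends on every parent listed
                parents.setdefault(child, set()).update(words[1:i])
    return jobs, parents

DIAMOND = """\
# diamond.dag
Job A a.sub
Job B b.sub
Job C c.sub
Job D d.sub
Parent A Child B C
Parent B C Child D
"""
jobs, parents = parse_dag(DIAMOND)
print(jobs)     # {'A': 'a.sub', 'B': 'b.sub', 'C': 'c.sub', 'D': 'd.sub'}
print(parents)  # B and C each have parent A; D has parents B and C
```

Note that one Parent/Child line with several names on each side expands to every parent-child combination, which is how "Parent B C Child D" gives D two parents.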
6d.57
To start a DAG, use the condor_submit_dag command with the .dag file:
condor_submit_dag diamond.dag
condor_submit_dag submits a Scheduler Universe Job with DAGMan as the executable.
6d.58
Running a DAG
• DAGMan acts as a scheduler managing the submission of jobs to Condor based upon DAG dependencies.
• DAGMan holds jobs and submits them to the Condor queue at the appropriate times.
6d.59
Job Failures
• DAGMan continues until it cannot make further progress and then creates a rescue file holding the current state of the DAG.
• When the failed job is ready to be re-run, the rescue file is used to restore the prior state of the DAG.
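The following in-memory Python sketch mimics this behavior: a job is released only when all its parents have succeeded, the run stops when nothing more can make progress, and the set of completed jobs plays the role of the rescue state for a later re-run. It is only an illustration of the idea; the real DAGMan submits actual Condor jobs, runs ready jobs concurrently, and writes its own rescue file, whose format is not shown here. `run_job` is a hypothetical stand-in for running one Condor job.

```python
def run_dag(parents, all_jobs, run_job, done=frozenset()):
    """Submit each job only after all its parents succeeded (DAGMan-style sketch).

    parents: child -> set of parents; all_jobs: every node name;
    run_job: callable(name) -> bool (True = success), standing in for a Condor job;
    done: jobs already completed, e.g. restored from an earlier run's rescue state.
    Returns (completed, failed). When no further progress is possible, `completed`
    is the state a rescue file would need to record.
    """
    completed, failed = set(done), set()
    progress = True
    while progress:
        progress = False
        ready = sorted(j for j in all_jobs
                       if j not in completed and j not in failed
                       and parents.get(j, set()) <= completed)
        for job in ready:  # every job whose parents have all finished successfully
            if run_job(job):
                completed.add(job)
            else:
                failed.add(job)
            progress = True
    return completed, failed

DIAMOND_PARENTS = {"B": {"A"}, "C": {"A"}, "D": {"B", "C"}}
ALL_JOBS = {"A", "B", "C", "D"}

# First run: job C fails, so D can never become ready.
completed, failed = run_dag(DIAMOND_PARENTS, ALL_JOBS, lambda job: job != "C")
print(sorted(completed), sorted(failed))  # ['A', 'B'] ['C']

# After fixing C, re-run from the saved state: only C and D are run.
completed, failed = run_dag(DIAMOND_PARENTS, ALL_JOBS, lambda job: True, done=completed)
print(sorted(completed), sorted(failed))  # ['A', 'B', 'C', 'D'] []
```

The benefit of restoring the saved state is that jobs A and B, which already succeeded, are not run a second time.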
6d.60
Summary of Key Condor Features
• High-throughput computing using an opportunistic environment.
• Provides mechanisms for running jobs on remote machines.
• Matchmaking
• Checkpointing
• DAG scheduling
6d.61
Quiz
Give one reason why a scheduler or resource broker is used in conjunction with Globus:
(a) Globus does not provide the ability to submit jobs.
(b) Globus does not provide the ability to make advance reservations.
(c) No reason whatsoever.
(d) Globus does not provide the ability to transfer files.
6d.62
Identify which of the following are similarities between Condor ClassAd and Globus RSL (version 1 or 2). (There may be more than one similarity.)
(a) There are no similarities.
(b) They both provide a means of specifying command line arguments for the job.
(c) They both provide a means of specifying whether a named user is allowed to execute a job.
(d) They both provide a means of specifying machine requirements for a job.
6d.63
In the context of schedulers, what is meant by the term “Advance Reservation”?
(a) Requesting an advance.
(b) Submitting a more advanced job.
(c) Move onto the next job.
(d) Requesting actions at a future time.
6d.64
More Information
• http://www.cs.wisc.edu/condor
• Chapter 11, Condor and the Grid, D. Thain, T. Tannenbaum, and M. Livny, Grid Computing: Making The Global Infrastructure a Reality, F. Berman, A. J. G. Hey, and G. Fox, editors, John Wiley, 2003.
• “Condor-G: A Computation Management Agent for Multi-Institutional Grids,” J. Frey, T. Tannenbaum, I. Foster, M. Livny, S. Tuecke, Proc. 10th Int. Symp. High Performance Distributed Computing (HPDC-10) Aug. 2001.
6d.65
Questions