Condor Tutorial
Prabhaker Mateti, Wright State University
Mateti, Condor2
Acknowledgements
Many of these slides are adapted from tutorials by Miron Livny and his associates, University of Wisconsin-Madison
http://www.cs.wisc.edu/condor

Clusters with Part Time Nodes
Cycle Stealing: Running jobs on workstations that do not belong to the owner of the jobs.
Definition of Idleness: E.g., no keyboard and no mouse activity
Tools/Libraries
– Condor
– PVM
– MPI

Performance v. Throughput
High Performance - Very large amounts of processing capacity over short time periods
– FLOPS - Floating Point Operations Per Second
High Throughput - Large amounts of processing capacity sustained over very long time periods
– FLOPY - Floating Point Operations Per Year
FLOPY = 365 x 24 x 60 x 60 x FLOPS?
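The question mark on the formula can be made concrete with a little arithmetic. The sketch below is only an illustration: the `availability` parameter (the fraction of the year a machine actually spends on your jobs) is an assumption added here, not something the slide defines.

```python
SECONDS_PER_YEAR = 365 * 24 * 60 * 60  # 31,536,000 seconds in a non-leap year


def flopy(flops, availability=1.0):
    """Floating point operations per year for a sustained rate `flops`.

    `availability` is a hypothetical fraction (0..1) of the year during
    which the machine really works for you; 1.0 reproduces the slide's formula.
    """
    return flops * availability * SECONDS_PER_YEAR
```

With `availability=1.0` the formula on the slide holds exactly; on cycle-stolen workstations the fraction is usually well below 1, which is why throughput is measured over the long run rather than derived from peak FLOPS.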

Cooperation
Workstations are “personal”
Use by others slows you down
– Immediate-Eviction
– Pause-and-Migrate
Willing to share
– Letting you cycle-steal
Willing to trust

Granularity of Migration
Process migration
– Process = Collection of objects
– at least one active object
Object migration
– Passive objects
– Active objects

Migration of Jobs: Technical Issues
Checkpointing: Preserving the state of the process so it can be resumed.
One architecture to another
Your “environment”
– keyboard, mouse, display, files, …

Condor
A system for high throughput computing by making use of idle computing resources
Lots of jobs over a long period of time, not a short burst of high performance
Manages both machines and jobs
Has been stable, and delivered thousands of CPU hours

Condor Techniques
Migratory programs
– Checkpointing
– Remote IO
Resource matching

Condor: Assumptions
Large numbers of workstations are idle most of the time
Owners of such machines would not mind their use by others while idle
Owners want their work to be given high priority

Roles
Owner offers his machine for use by others
User requests to run his jobs
Administrator manages the pool of available machines
Multiple roles possible

Classified Advertisements: Example
MyType = "Machine"
TargetType = "Job"
Name = "froth.cs.wisc.edu"
StartdIpAddr = "<128.105.73.44:33846>"
Arch = "INTEL"
OpSys = "SOLARIS26"
VirtualMemory = 225312
Disk = 35957
KFlops = 21058
Mips = 103
LoadAvg = 0.011719
KeyboardIdle = 12
Cpus = 1
Memory = 128
Requirements = LoadAvg <= 0.300000 && KeyboardIdle > 15 * 60
Rank = 0

Condor User Requests
Describes the program, and its needs
Example condor_submit File
Universe = standard
Executable = /home/wsu03/condor/my_job.condor
Input = my_job.stdin
Output = my_job.stdout
Error = my_job.stderr
Log = my_job.log
Arguments = -arg1 -arg2
InitialDir = /home/wsu03/condor/run_1
Queue

ClassAds: Example for Jobs
Requirements = Arch == "INTEL" && OpSys == "LINUX" && Memory > 20
Rank = (Memory > 32) * ( (Memory * 100) + (IsDedicated * 10000) + Mips )

Condor Pool of Machines
“Pool” can be a single machine, or a group of machines volunteered by their owners
Determined by a “central manager” - the matchmaker and centralized information repository
Each machine runs various daemons to provide different services, either to the users who submit jobs, the machine owners, or the pool itself

Condor System Structure
[Figure: Condor system structure]

Condor Agents
Condor Resource Agent
– condor_startd daemon
– allows a machine to execute Condor jobs
– enforces owner policy
Condor User Agent
– condor_schedd daemon
– allows a machine to submit jobs to a pool

Condor: Robustness
Checkpointing allows guaranteed forward progress of your jobs, even jobs that run for weeks before completion
If an execute machine crashes, you only lose work done since the last checkpoint
Condor maintains a persistent job queue - if the submit machine crashes, Condor will recover
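The "lose only the work since the last checkpoint" idea can be sketched at application level. This is not how Condor does it: the standard universe checkpoints the whole process image transparently after re-linking. The function name, the state layout, and the `crash_at` test hook below are all hypothetical, chosen only to show the save-and-resume pattern.

```python
import os
import pickle


def run_job(total_steps, checkpoint_path, checkpoint_every=100, crash_at=None):
    """Sum 0..total_steps-1, resuming from a checkpoint file if one exists.

    `crash_at` is a hypothetical hook that simulates an execute-machine crash.
    """
    if os.path.exists(checkpoint_path):
        with open(checkpoint_path, "rb") as f:
            state = pickle.load(f)          # resume from the saved state
    else:
        state = {"step": 0, "total": 0}     # fresh start

    while state["step"] < total_steps:
        if crash_at is not None and state["step"] == crash_at:
            raise RuntimeError("simulated crash")
        state["total"] += state["step"]
        state["step"] += 1
        if state["step"] % checkpoint_every == 0:
            # Write to a temporary file, then rename, so a crash during
            # the write never leaves a half-written checkpoint behind.
            tmp_path = checkpoint_path + ".tmp"
            with open(tmp_path, "wb") as f:
                pickle.dump(state, f)
            os.replace(tmp_path, checkpoint_path)
    return state["total"]
```

If the job dies at step 250 with a checkpoint every 100 steps, the rerun restarts from step 200, so only 50 steps are repeated.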

What’s Condor Good For?
Managing a large number of jobs
– You specify the jobs in a file and submit them to Condor, which runs them all and can send you email when they complete
– Mechanisms to help you manage huge numbers of jobs (1000’s), all the data, etc.
– Condor can handle inter-job dependencies (DAGMan)

Throughput
Checkpointing allows your job to run on opportunistic resources, not dedicated
Checkpointing permits migration - if a machine is no longer available, migrate
With remote system calls, you don’t even need an account on a machine where your job executes

Can your program work with Condor?
What kind of I/O does it do?
Does it use TCP/IP? (network sockets)
Can the job be resumed?
Multiple processes?
– fork(), pvm_addhost(), etc.

Typical IO
Interactive TTY
“Batch” TTY (just reads from STDIN and writes to STDOUT or STDERR, but you can redirect to/from files)
X Windows
NFS, AFS, or another network file system
Local file system
TCP/IP

Condor Universes
Different universes support different functionalities
Vanilla
Standard
Scheduler
PVM

Condor Universes: IO support
No support for interactive TTY
            X11   NFS   LocalFiles   TCP
Vanilla      x     x        -         x
Standard     -     x        x         -
Scheduler    x     x        x         x
PVM          x     x        x         x

Condor Universes
PVM (Parallel Virtual Machine)
– Multiple processes in Condor
Scheduler
– The job is run on the submit machine, not on a remote execute machine
– Job is automatically restarted if the condor_schedd is shut down
– Used to schedule jobs

Submitting Jobs to Condor
Choosing a “Universe” for your job
Preparing your job
– Making it “batch-ready”
– Re-linking if checkpointing and remote system calls are desired (condor_compile)
Creating a submit description file
condor_submit your request to the User Agent (condor_schedd)

Making your job “batch-ready”
Must be able to run in the background: no interactive input, windows, GUI, etc.
Can still use STDIN, STDOUT, and STDERR, but files are used for these instead of the actual devices
If your job expects input from the keyboard, you have to put the input you want into a file

Preparing Your Job (cont’d)
If you are going to use the standard universe with checkpointing and remote system calls, you must re-link your job with Condor’s libraries
condor_compile gcc -o myjob myjob.c

Submit Description File
Tells Condor about your job:
– Which executable, universe, input, output and error files to use, command-line arguments, environment variables, any special requirements or preferences (more on this later)
Can describe many jobs at once (a “cluster”), each with different input, arguments, output, etc.

Example condor_submit File
Universe = standard
Executable = /home/wsu03/condor/my_job.condor
Input = my_job.stdin
Output = my_job.stdout
Error = my_job.stderr
Log = my_job.log
Arguments = -arg1 -arg2
InitialDir = /home/wsu03/condor/run_1
Queue

Example Submit Description File
Submits a single job to the standard universe, specifies files for STDIN, STDOUT and STDERR, creates a UserLog, defines command line arguments, and specifies the directory the job should be run in
As if you did
% cd /home/wsu03/condor/run_1
% /home/wsu03/condor/my_job.condor -arg1 -arg2 \
    > my_job.stdout 2> my_job.stderr \
    < my_job.stdin

“Clusters” and “Processes”
A submit file describes one or more jobs
The collection of jobs is called a “cluster”
Each job is called a “process” or “proc”
A Condor “Job ID” is the cluster number, a period, and the proc number (e.g., 23.5)
Proc numbers always start at 0

A Cluster Submit Description File
Universe = standard
Executable = /home/wsu03/condor/my_job.condor
Input = my_job.stdin
Output = my_job.stdout
Error = my_job.stderr
Log = my_job.log
Arguments = -arg1 -arg2
InitialDir = /home/wsu03/condor/run_$(Process)
Queue 500

A Cluster Submit Description File
“Queue 500” = submit 500 jobs at once
The initial directory for each job is specified with the $(Process) macro
$(Process) will be expanded to the process number for each job in the cluster; “run_0”, “run_1”, … “run_499” directories
All the input/output files will be in different directories
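The macro expansion is plain text substitution, which a few lines of Python can imitate. This is an illustrative sketch, not Condor's parser; the helper names are hypothetical, and the cluster number 23 in the usage note is borrowed from the Job ID example earlier in the deck.

```python
def expand_process_macro(template, count):
    """Return the per-job values of a submit-file entry containing $(Process)."""
    return [template.replace("$(Process)", str(proc)) for proc in range(count)]


def job_ids(cluster, count):
    """Job IDs are '<cluster>.<proc>', with proc numbers starting at 0."""
    return [f"{cluster}.{proc}" for proc in range(count)]
```

For example, `expand_process_macro("/home/wsu03/condor/run_$(Process)", 500)` yields the 500 directories from run_0 to run_499, and `job_ids(23, 500)` yields the matching IDs 23.0 through 23.499. The directories must already exist before you submit.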

condor_submit
condor_submit the-submit-file-name
condor_submit parses the file and creates a “ClassAd” that describes your job(s)
Creates the files you specified for STDOUT and STDERR
Sends your job’s ClassAd(s) and executable to the condor_schedd, which stores the job in its queue

Monitoring Your Jobs
Using condor_q
Using a “User Log” file
Using condor_status
Using condor_rm
Getting email from Condor
Using condor_history after completion

Using condor_q
Displays the status of your jobs, how much compute time each has accumulated, etc.
Many different options:
– A single job, a single cluster, all jobs that match a certain constraint, or all jobs
– Can view remote job queues, either individual queues, or “-global”

Using a “User Log” file
Specify in your submit file:
– Log = filename
Entries logged for:
– When it was submitted
– when it started executing
– if it is checkpointed or vacated
– if there are any problems, etc.

Using condor_status
Use the “-run” option to see
– Machines running jobs
– The user who submitted each job
– The machine they submitted from
Can also view the status of various submitters with “-submitter <name>”

Using condor_rm
Removes a job from the Condor queue
You can only remove jobs that you own
Root can condor_rm someone else’s jobs
You can give specific job ID’s (cluster or cluster.proc), or you can remove all of your jobs with the “-a” option.

Getting Email from Condor
By default, Condor will send you email when your job completes
If you don’t want this email, put this in your submit file:
notification = never
If you want email every time something happens to your job (checkpoint, exit, etc), use this:
notification = always

Getting Email from Condor
If you only want email if your job exits with an error, use this:
notification = error
By default, the email is sent to your account on the host you submitted from. If you want the email to go to a different address, use this:
notify_user = [email protected]

Using condor_history
Once your job completes, it will no longer show up in condor_q
Now, you must use condor_history to view the job’s ClassAd
The status field (“ST”) will have either a “C” for “completed”, or an “X” if the job was removed with condor_rm

Classified Advertisements
A ClassAd is a set of named expressions
– Each named expression is an attribute
Expressions are similar to those in C …
– Constants, attribute references, operators

Classified Advertisements: Example
MyType = "Machine"
TargetType = "Job"
Name = "froth.cs.wisc.edu"
StartdIpAddr = "<128.105.73.44:33846>"
Arch = "INTEL"
OpSys = "SOLARIS26"
VirtualMemory = 225312
Disk = 35957
KFlops = 21058
Mips = 103
LoadAvg = 0.011719
KeyboardIdle = 12
Cpus = 1
Memory = 128
Requirements = LoadAvg <= 0.300000 && KeyboardIdle > 15 * 60
Rank = 0

ClassAd Matching
ClassAds are always considered in pairs:
– Does ClassAd A match ClassAd B (and vice versa)?
– This is called “2-way matching”
If the same attribute appears in both ClassAds, you can specify which attribute you mean by putting “MY.” or “TARGET.” in front of the attribute name

ClassAd Matching “Example”
ClassAd A
MyType = "Apartment"
TargetType = "ApartmentRenter"
SquareArea = 3500
RentOffer = 1000
OnBusLine = True
Rank = UnderGrad==False + TARGET.RentOffer
Requirements = MY.RentOffer - TARGET.RentOffer < 150
ClassAd B
MyType = "ApartmentRenter"
TargetType = "Apartment"
UnderGrad = False
RentOffer = 900
Rank = 1/(TARGET.RentOffer + 100.0) + 50*HeatIncluded
Requirements = OnBusLine && SquareArea > 2700
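The two-way test on this pair of ads can be traced by hand in Python. The sketch below is a simplification, not the ClassAd evaluator: each Requirements expression is hand-translated into a function of (my, target), the unqualified names in ad B are assumed to resolve in the other ad because B does not define them, and Rank is left out.

```python
# Each ad is a dict; "Requirements" is a hand-written translation of the
# expression on the slide, taking (my_ad, target_ad).
apartment = {
    "MyType": "Apartment",
    "TargetType": "ApartmentRenter",
    "SquareArea": 3500,
    "RentOffer": 1000,
    "OnBusLine": True,
    "Requirements": lambda my, target: my["RentOffer"] - target["RentOffer"] < 150,
}

renter = {
    "MyType": "ApartmentRenter",
    "TargetType": "Apartment",
    "UnderGrad": False,
    "RentOffer": 900,
    # OnBusLine and SquareArea are not defined in this ad, so they are
    # looked up in the target ad (an assumption of this sketch).
    "Requirements": lambda my, target: target["OnBusLine"] and target["SquareArea"] > 2700,
}


def matches(a, b):
    """Two-way match: types must line up and BOTH Requirements must hold."""
    types_ok = a["TargetType"] == b["MyType"] and b["TargetType"] == a["MyType"]
    return bool(types_ok and a["Requirements"](a, b) and b["Requirements"](b, a))
```

Here the apartment accepts the renter because 1000 - 900 = 100 is below 150, and the renter accepts the apartment because it is on a bus line and larger than 2700. A renter offering 800 would be rejected by the apartment's side alone, so no match is made.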

ClassAds in the Condor System
ClassAds allow Condor to be a general system
– Constraints and ranks on matches expressed by the entities themselves
– Only priority logic integrated into the Match-Maker
All principal entities in the Condor system are represented by ClassAds
– Machines, Jobs, Submitters

ClassAds: Example for Machines
Friend = Owner == "tannenba" || Owner == "wright"
ResearchGroup = Owner == "jbasney" || Owner == "raman"
Trusted = Owner != "rival" && Owner != "riffraff"
Requirements = Trusted && ( ResearchGroup || (LoadAvg < 0.3 && KeyboardIdle > 15*60) )
Rank = Friend + ResearchGroup*10

ClassAd Machine Example
Machine will never start a job submitted by “rival” or “riffraff”
If someone from ResearchGroup (“jbasney” or “raman”) submits a job, it will always run
If anyone else submits a job, it will only run here if the keyboard has been idle for more than 15 minutes and the load average is less than 0.3

Machine Rank Example Described
If the machine is running a job submitted by owner “foo”, it will give this a Rank of 0, since foo is neither a friend nor in the same research group
If “wright” or “tannenba” submits a job, it will be ranked at 1 (since Friend will evaluate to 1 and ResearchGroup is 0)
If “raman” or “jbasney” submits a job, it will have a rank of 10
While a machine is running a job, it will be preempted for a higher ranked job
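The machine policy from the ClassAd example can be restated as two small Python functions to check the cases described above. This is a hand translation for illustration, not ClassAd syntax; the function and constant names are hypothetical, and booleans are converted to 0/1 explicitly, as the Rank expression implies.

```python
FRIENDS = {"tannenba", "wright"}
RESEARCH_GROUP = {"jbasney", "raman"}
UNTRUSTED = {"rival", "riffraff"}


def machine_requirements(owner, load_avg, keyboard_idle):
    """Requirements = Trusted && (ResearchGroup || (LoadAvg < 0.3 && KeyboardIdle > 15*60))."""
    trusted = owner not in UNTRUSTED
    research = owner in RESEARCH_GROUP
    return trusted and (research or (load_avg < 0.3 and keyboard_idle > 15 * 60))


def machine_rank(owner):
    """Rank = Friend + ResearchGroup*10, with True counted as 1 and False as 0."""
    return int(owner in FRIENDS) + int(owner in RESEARCH_GROUP) * 10
```

So a job from “raman” (rank 10) would preempt a running job from “wright” (rank 1), which in turn would preempt one from “foo” (rank 0).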

ClassAds: Example for Jobs
Requirements = Arch == "INTEL" && OpSys == "LINUX" && Memory > 20
Rank = (Memory > 32) * ( (Memory * 100) + (IsDedicated * 10000) + Mips )

Job Example Described
The job must run on an Intel CPU, running Linux, with more than 20 megs of RAM
All machines with 32 megs of RAM or less are Ranked at 0
Machines with more than 32 megs of RAM are ranked according to how much RAM they have, if the machine is dedicated (which counts a lot to this job!), and how fast the machine is, as measured in MIPS
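A worked calculation makes the weights in the job's Rank expression easier to see. The functions below are a hand translation into Python for illustration only; the machine ads are plain dicts, and `IsDedicated` is assumed to be 0 or 1.

```python
def job_requirements(machine):
    """Requirements = Arch == "INTEL" && OpSys == "LINUX" && Memory > 20."""
    return (machine["Arch"] == "INTEL"
            and machine["OpSys"] == "LINUX"
            and machine["Memory"] > 20)


def job_rank(machine):
    """Rank = (Memory > 32) * ((Memory * 100) + (IsDedicated * 10000) + Mips)."""
    score = machine["Memory"] * 100 + machine["IsDedicated"] * 10000 + machine["Mips"]
    return int(machine["Memory"] > 32) * score
```

A non-dedicated machine with 128 megs and 103 MIPS ranks 128*100 + 0 + 103 = 12903; making it dedicated adds 10000, the equivalent of 100 extra megs of RAM, which is why dedication "counts a lot".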

ClassAd Attributes in your Pool
Condor defines a number of attributes by default, which are listed in the User Manual (“About Requirements and Rank”)
To see if machines in your pool have other attributes defined, use:
– condor_status -long <hostname>
A custom-defined attribute might not be defined on all machines in your pool, so you’ll probably want to use “meta-operators”

ClassAd “Meta-Operators”
Meta operators allow you to compare against “UNDEFINED” as if it were a real value:
– =?= is “meta-equal-to”
– =!= is “meta-not-equal-to”
– Color != "Red" (non-meta) would evaluate to UNDEFINED if Color is not defined
– Color =!= "Red" would evaluate to True if Color is not defined, since UNDEFINED is not "Red"
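The difference between the ordinary and the meta operators can be modelled with a sentinel value. This is a simplified sketch: the `UNDEFINED` object and the four function names are hypothetical, and other details of the real ClassAd operators (type checking, ERROR values) are ignored.

```python
class _Undefined:
    """Sentinel standing in for the ClassAd UNDEFINED value."""

    def __repr__(self):
        return "UNDEFINED"


UNDEFINED = _Undefined()


def eq(a, b):
    """Ordinary ==: any UNDEFINED operand makes the result UNDEFINED."""
    if a is UNDEFINED or b is UNDEFINED:
        return UNDEFINED
    return a == b


def ne(a, b):
    """Ordinary !=: any UNDEFINED operand makes the result UNDEFINED."""
    if a is UNDEFINED or b is UNDEFINED:
        return UNDEFINED
    return a != b


def meta_eq(a, b):
    """=?= : UNDEFINED is treated as a real value, so the result is always True or False."""
    if a is UNDEFINED or b is UNDEFINED:
        return a is b
    return a == b


def meta_ne(a, b):
    """=!= : the negation of =?=."""
    return not meta_eq(a, b)
```

With `color = UNDEFINED`, `ne(color, "Red")` gives UNDEFINED, so a Requirements expression built on it would not be satisfied, while `meta_ne(color, "Red")` gives True.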

Priorities In Condor
User Priorities
– Priorities between users in the pool to ensure fairness
– The lower the value, the better the priority
Job Priorities
– Priorities that users give to their own jobs to determine the order in which they will run
– The higher the value, the better the priority
– Only matters within a given user’s jobs

User Priorities in Condor
Each active user in the pool has a user priority
Viewed or changed with condor_userprio
The lower the number, the better
A given user’s share of available machines is inversely related to the ratio between user priorities.
– Example: Fred’s priority is 10, Joe’s is 20. Fred will be allocated twice as many machines as Joe.
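The Fred and Joe example follows from giving each user a share proportional to 1/priority. The sketch below only reproduces that arithmetic; the pool size of 30 machines in the usage note is an assumption for illustration, integer priorities are assumed so that exact fractions can be used, and the real negotiator also accounts for demand and other factors.

```python
from fractions import Fraction


def machine_shares(priorities, total_machines):
    """Split `total_machines` so each user's share is proportional to 1/priority.

    `priorities` maps user name -> integer priority value (lower is better).
    """
    weights = {user: Fraction(1, prio) for user, prio in priorities.items()}
    total_weight = sum(weights.values())
    return {user: total_machines * w / total_weight for user, w in weights.items()}
```

For `machine_shares({"fred": 10, "joe": 20}, 30)` the weights are 1/10 and 1/20, so Fred gets 2/3 of the pool (20 machines) and Joe 1/3 (10 machines), twice as many for Fred.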

User Priorities in Condor, cont.
Condor continuously adjusts user priorities over time
– machines allocated > priority, priority worsens
– machines allocated < priority, priority improves
Priority Preemption
– Higher priority users will grab machines away from lower priority users (thanks to Checkpointing…)
– Starvation is prevented
– Priority “thrashing” is prevented

Job Priorities in Condor
Can be set at submit-time in your description file with:
prio = <number>
Can be viewed with condor_q
Can be changed at any time with condor_prio
The higher the number, the more likely the job will run (only among the jobs of an individual user)

Managing a Large Cluster of Jobs
Condor can manage huge numbers of jobs
Special features of the submit description file make this easier
Condor can also manage inter-job dependencies with condor_dagman
– For example: job A should run first, then jobs B and C; when those finish, submit D, etc.
– We’ll discuss DAGMan later

Submitting a Large Cluster
Each process runs in its own directory: InitialDir = dir.$(process)
Can either have multiple Queue entries, or put a number after Queue to tell Condor how many to submit: Queue 1000
A cluster is more efficient: Your jobs will run faster, and they’ll use less space
Can only have one executable per cluster: Different executables must be different clusters!

Inter-Job Dependencies with DAGMan
DAGMan handles a set of jobs that must be run in a certain order
Also provides “pre” and “post” operations, so you can have a program or script run before each job is submitted and after it completes
Robust: handles errors and submit-machine crashes

Using DAGMan
You define a DAG description file, which is similar in function to the submit file you give to condor_submit
DAGMan restrictions:
– Each job in the DAG must be in its own cluster (for now)
– All jobs in the DAG must have a User Log and must share the same file

DAGMan Description File
# is a comment
First section names the jobs in your DAG and associates a submit description file with each job
Second (optional) section defines PRE and POST scripts to run
Final section defines the job dependencies

Example DAGMan File
Job A A.submit
Job B B.submit
Job C C.submit
Job D D.submit
Script PRE D d_input_checker
Script POST A a_output_processor A.out
PARENT A CHILD B C
PARENT B C CHILD D
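The dependency section of the example file describes a diamond: A, then B and C, then D. The sketch below reads only the Job and PARENT ... CHILD lines and derives one valid run order. It is an illustration, not DAGMan's parser: the function names are hypothetical, Script lines are skipped, and `graphlib` requires Python 3.9 or later.

```python
from graphlib import TopologicalSorter

DAG_TEXT = """\
Job A A.submit
Job B B.submit
Job C C.submit
Job D D.submit
Script PRE D d_input_checker
Script POST A a_output_processor A.out
PARENT A CHILD B C
PARENT B C CHILD D
"""


def parse_dependencies(text):
    """Return {job: set of parent jobs} from Job and PARENT...CHILD lines."""
    deps = {}
    for line in text.splitlines():
        words = line.split()
        if not words or words[0].startswith("#"):
            continue                      # blank line or comment
        keyword = words[0].upper()
        if keyword == "JOB":
            deps.setdefault(words[1], set())
        elif keyword == "PARENT":
            split = [w.upper() for w in words].index("CHILD")
            parents, children = words[1:split], words[split + 1:]
            for parent in parents:
                deps.setdefault(parent, set())
            for child in children:
                deps.setdefault(child, set()).update(parents)
    return deps


def run_order(text):
    """One order in which the jobs could be submitted, parents first."""
    return list(TopologicalSorter(parse_dependencies(text)).static_order())
```

`run_order(DAG_TEXT)` always starts with A and ends with D; B and C may appear in either order, since nothing orders them relative to each other and DAGMan could run them at the same time.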

Setting up a DAG for Condor
Create all the submit description files for the individual jobs
Prepare any executables you plan to use
Can have a mix of Vanilla and Standard jobs
Set up any PRE/POST commands or scripts you wish to use

Submitting a DAG to Condor
condor_submit_dag DAG-description-file
This will check your input file for errors and submit a copy of condor_dagman as a scheduler universe job with all the necessary command-line arguments

Removing a DAG
On shutdown, DAGMan will remove any jobs that are currently in the queue that are associated with its DAG
Once all jobs are gone, DAGMan itself will exit, and the scheduler universe job will be removed from the queue

Typical Problems
Special requirements expressions for vanilla jobs
You didn’t submit it from a directory that is shared
Condor isn’t running as root
You don’t have your file permissions set up correctly

Special Requirements Expressions for Vanilla Jobs
When you submit a vanilla job, Condor automatically appends two extra Requirements:
– UID_DOMAIN == <submit_uid_domain>
– FILESYSTEM_DOMAIN == <submit_fs>
Since there are no remote system calls with Vanilla jobs, they depend on a shared file system and a common UID space to run as you and access your files

Special Requirements Expressions for Vanilla Jobs
By default, each machine in your pool is in its own UID_DOMAIN and FILESYSTEM_DOMAIN, so your pool administrator has to configure your pool specially if there really is a common UID space and a network file system
If you don’t have an account on the remote system, Vanilla jobs won’t work

Shared Files for Vanilla Jobs
Maybe not all directories are shared
Initialdir = /tmp will probably cause trouble for Vanilla jobs!
You must be sure to set Initialdir to a shared directory (or cd into it to run condor_submit) for Vanilla jobs

Why Don’t My Jobs Run?
Try condor_q -analyze
Try specifying a User Log for your job
Look at condor_userprio: maybe you have a low priority and higher priority users are being served
Problems with file permissions or network file systems
Look at the SchedLog

Using condor_q -analyze
Analyzes your job’s ClassAd, gets all the ClassAds of the machines in the pool, and tells you what’s going on:
Will report errors in your Requirements expression (impossible to match, etc.)
Will tell you about user priorities in the pool (other people have better priority)

Looking at condor_userprio
You can look at condor_userprio yourself
If your priority value is a really high number (because you’ve been running a lot of Condor jobs), other users will have priority to run jobs in your pool

File Permissions in Condor
If Condor isn’t running as root, the condor_shadow process runs as the user the condor_schedd is running as (usually “condor”)
You must grant this user write access to your output files, and read access to your input files (both STDOUT, STDIN from your submit file, as well as files your job explicitly opens)

File Permissions in Condor
Often, there will be a “condor” group and you can make your files owned and writable by this group
For vanilla jobs, even if the UID_DOMAIN setting is correct and matches on your submit and execute machines, if Condor isn’t running as root, your job will be started as user “condor”, not as you!

Problems with NFS in Condor
For NFS, sometimes the administrators will set up read-only mounts, or have UIDs remapped for certain partitions (the classic example is root = nobody, but modern NFS can do arbitrary remappings)

Problems with NFS in Condor
If your pool uses NFS automounting, the directory that Condor thinks is your InitialDir might not exist on a remote machine
With automounting, you always need to specify InitialDir explicitly
– InitialDir = /home/me/...

Problems with AFS in Condor
If your pool uses AFS, the condor_shadow, even if it’s running with your UID, will not have your AFS token.
You must grant an unauthenticated AFS user the appropriate access to your files
Some sites provide a better alternative than world-writable files
– Host ACLs
– Network-specific ACLs

Looking at the SchedLog
The log file of the condor_schedd, the “SchedLog” file, can give you a clue if there are problems.
Find it with: condor_config_val schedd_log
You might need your pool administrator to turn on a higher “debugging level” to see more verbose output

Other User Features
Submit-Only installation
Heterogeneous Submit
PVM jobs

Submit-Only Installation
Can install just a condor_master and condor_schedd on your machine
Can submit jobs into a remote pool
Special option to condor_install

Heterogeneous Submit
The job you submit doesn’t have to be for the same platform as the machine you submit from
– Maybe you have access to a pool that is full of Alphas, but you have a Sparc on your desk, and moving all your data is a pain
You can take an Alpha binary, copy it to your Sparc, and submit it with a requirements expression that says you need to run on ALPHA/OSF1

PVM Jobs in Condor
Condor can run parallel applications
– PVM applications now
– Future work includes support for MPI
Master-Worker Paradigm
What does Condor-PVM do?
How to compile and submit Condor-PVM jobs

Master-Worker Paradigm
Condor-PVM is designed to run PVM applications based on the master-worker paradigm.
Master
– has a pool of work, sends pieces of work to the workers, manages the work and the workers
Worker
– gets a piece of work, does the computation, sends the result back
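The division of labour between master and workers can be shown without PVM at all. The sketch below uses Python threads and queues purely to illustrate the pattern; it makes no PVM calls, the function names are hypothetical, and squaring a number stands in for the real computation.

```python
import queue
import threading


def worker(tasks, results):
    """Get a piece of work, do the computation, send the result back."""
    while True:
        item = tasks.get()
        if item is None:          # sentinel from the master: no more work
            break
        results.put((item, item * item))


def master(work_items, num_workers=3):
    """Hold the pool of work, hand pieces to workers, and collect the results."""
    tasks, results = queue.Queue(), queue.Queue()
    threads = [threading.Thread(target=worker, args=(tasks, results))
               for _ in range(num_workers)]
    for t in threads:
        t.start()
    for item in work_items:
        tasks.put(item)
    for _ in threads:
        tasks.put(None)           # one stop sentinel per worker
    for t in threads:
        t.join()
    collected = {}
    while not results.empty():
        item, value = results.get()
        collected[item] = value
    return collected
```

Because the master owns the list of outstanding work, the pattern suits non-dedicated machines: if a worker disappears, the master can hand its piece to another worker. That recovery step is not shown here.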

What does Condor-PVM do?
Condor acts as the PVM resource manager.
All pvm_addhost requests get re-mapped to Condor.
– Condor dynamically constructs PVM virtual machines out of non-dedicated desktop machines.
When a machine leaves the pool, the user gets notified via the normal PVM notification mechanisms.

Submission of Condor-PVM jobs
Binary Compatible
– Compile and link with PVM library just as normal PVM applications. No need to link with Condor.
In the submit description file, set:
universe = PVM
machine_count = <min>..<max>

Resource Agent Configuration Expressions
[Figure: flow diagram of the resource agent expressions START, WANT SUSPEND, SUSPEND, VACATE, WANT VACATE, and KILL, with True/False branch labels]

Resource Agent Configuration
Default Setup
WANT_VACATE : True
WANT_SUSPEND : True
START : Keyboard_Idle && CPU_Idle
SUSPEND : Keyboard_Busy || CPU_Busy
CONTINUE : Keyboard and CPU idle again
VACATE : If Suspended > 10 minutes
KILL : If spent > 10 minutes in VACATE state
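One way to read the default setup is as a small state machine driven by keyboard and CPU activity. The sketch below is an interpretation for illustration only: the state names ("Idle", "Running", "Suspended", "Vacating", "Killed") and the function signature are hypothetical, not condor_startd's actual states, and a successful checkpoint during vacating is not modelled.

```python
TEN_MINUTES = 10 * 60  # seconds


def next_state(state, keyboard_idle, cpu_idle, seconds_in_state):
    """Apply the default policy once and return the resulting state."""
    machine_idle = keyboard_idle and cpu_idle
    if state == "Idle":
        # START : Keyboard_Idle && CPU_Idle
        return "Running" if machine_idle else "Idle"
    if state == "Running":
        # SUSPEND : Keyboard_Busy || CPU_Busy (WANT_SUSPEND is True)
        return "Running" if machine_idle else "Suspended"
    if state == "Suspended":
        # CONTINUE : keyboard and CPU idle again
        if machine_idle:
            return "Running"
        # VACATE : suspended for more than 10 minutes (WANT_VACATE is True)
        return "Vacating" if seconds_in_state > TEN_MINUTES else "Suspended"
    if state == "Vacating":
        # KILL : more than 10 minutes spent vacating
        return "Killed" if seconds_in_state > TEN_MINUTES else "Vacating"
    return state
```

The effect is a graduated response: a brief return of the owner only pauses the job, a longer one asks it to checkpoint and leave, and only a job that fails to leave in time is killed.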

condor_master
Watches/restarts other daemons
Sends Email if suspicious problems arise
Runs condor_preen
Provides administrator remote control

Condor Administrator Commands
condor_off [ hostname … ]
condor_on
condor_restart
condor_reconfig
condor_vacate
Can be used by the Owner also

Host-based Access Control
HOSTALLOW and HOSTDENY settings grant machines (subnets, domains) different access levels:
READ access
WRITE access
ADMINISTRATOR access
OWNER access

Host-based Access Control Ex.
HOSTDENY_READ = *.com
HOSTALLOW_WRITE = *.cs.wright.edu
HOSTDENY_WRITE = ppp*.wright.edu, 172.44.*
HOSTALLOW_ADMINISTRATOR = osis111.cs.wright.edu
HOSTALLOW_OWNER = $(FULL_HOSTNAME), $(HOSTALLOW_ADMINISTRATOR)
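How allow and deny lists combine for one access level can be sketched with shell-style wildcard matching. The rules below are assumptions made for this illustration: a deny match always wins, an allow list, if present, must match, and with no allow list everything not denied is accepted. The function name and the test host names are hypothetical; consult the Administrators manual for the exact semantics.

```python
from fnmatch import fnmatchcase


def is_allowed(host, allow=None, deny=None):
    """Decide access for `host` (a name or dotted IP string) at one access level.

    `allow` and `deny` are lists of wildcard patterns such as "*.cs.wright.edu".
    """
    host = host.lower()
    if deny and any(fnmatchcase(host, pattern.lower()) for pattern in deny):
        return False                       # assumed: deny takes precedence
    if allow:
        return any(fnmatchcase(host, pattern.lower()) for pattern in allow)
    return True                            # assumed: no allow list means open


# The WRITE and READ settings from the example above.
WRITE_ALLOW = ["*.cs.wright.edu"]
WRITE_DENY = ["ppp*.wright.edu", "172.44.*"]
READ_DENY = ["*.com"]
```

Under these assumptions `is_allowed("gamma.cs.wright.edu", WRITE_ALLOW, WRITE_DENY)` is True, while a dial-up host such as `ppp7.wright.edu` or any address starting with 172.44. is refused WRITE access.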

Configuration File Hierarchy
condor_config
– Pool-wide default
– Condor pool administrator’s requirements
condor_config.local
– Overrides for a specific machine
– Reflects Owner’s requirements
condor_config.root
– System Administrator requirements

Obtaining Condor
Condor accounts available! [email protected]
Condor executables can be downloaded from http://www.cs.wisc.edu/condor
Complete Users and Administrators manual http://www.cs.wisc.edu/condor/manual