Job Management Systems SGE v1.3  Author: Anand Vaidya [email protected]


Post on 03-Jun-2018


Page 1: Linux Cluster Job Management Systems Sge2197

8/12/2019 Linux Cluster Job Management Systems Sge2197

http://slidepdf.com/reader/full/linux-cluster-job-management-systems-sge2197 1/22

Job Management Systems

SGE v1.3

Author: Anand Vaidya

[email protected]


Why use SGE?

Maintain order in a shared resource, like queueing up at a movie ticket counter rather than mobbing the counter.

Apply different usage policies: PhDs and Profs get better treatment than first-year grads.

Everyone gets a fair share of the computing resource.


How does SGE work?

Users submit jobs to the Grid Engine. Unless resources are immediately available, non-interactive jobs are kept in queues until resources to execute them become available.

Jobs are passed on to the available execution hosts.

Records of each job's progress through the system are kept and reported when requested.


SGE Components

Hosts
    Master (coordinates activities, holds queues)
    Execution (workers)
    Administration (sets up the system, queues, etc.)
    Submit (users can submit jobs from these)
    Usually the master and admin hosts are the same machine.

Queues (defined by the administrator)

User and Administrator Commands

Daemons: sge_qmaster (Master Daemon), sge_schedd (Scheduler Daemon), sge_execd (Execution Daemon) and sge_commd (Communication Daemon)


SGE Commands - qhost

What is the state of the cluster? How many nodes, type, load? What is my chance of getting a node?

    [root@shark ~]# qhost
    HOSTNAME   ARCH        NCPU  LOAD  MEMTOT  MEMUSE  SWAPTO  SWAPUS
    -----------------------------------------------------------------
    global     -           -     -     -       -       -       -
    shark-c00  lx24-amd64  ?     ?.0?  3.7G    40.8M   4.0G    0.0
    shark-c0?  lx24-amd64  ?     ?.00  3.7G    14.7M   4.0G    0.0
    shark-c03  lx24-amd64  ?     1.96  3.7G    1?.7M   4.0G    0.0
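The questions this slide asks (how many nodes, what load) can be answered by post-processing qhost output. What follows is a minimal sketch, assuming the eight-column qhost layout with a two-line header; `summarize_qhost` is a hypothetical helper name, not part of SGE.

```shell
#!/bin/sh
# summarize_qhost: read `qhost` output on stdin, count the hosts listed and
# name the one with the lowest reported load.
# Assumes the layout HOSTNAME ARCH NCPU LOAD MEMTOT MEMUSE SWAPTO SWAPUS.
summarize_qhost() {
    awk '
        NR <= 2        { next }   # skip the column header and the dashed rule
        NF < 4         { next }   # skip blank or truncated lines
        $1 == "global" { next }   # pseudo-host, carries no values
                       { hosts++ }
        # hosts that are down report "-" as load and cannot be the best pick
        $4 != "-" && (best == "" || $4 + 0 < min) { min = $4 + 0; best = $1 }
        END { printf "hosts=%d least_loaded=%s\n", hosts, best }
    '
}

# Query the live cluster only when qhost is actually installed.
if command -v qhost >/dev/null 2>&1; then
    qhost | summarize_qhost
fi
```

A low load on the least-loaded host suggests a free node, but the scheduler makes the final decision, so treat the result as a hint only.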


SGE Commands - qsub

Create a job script (myjob.sh)

Submit for execution:

    $ qsub myjob.sh
    Your job 94? ("myjob.sh") has been submitted.

Simplest job:

    [vaidya@shark ~]$ cat myjob.sh
    #!/bin/sh
    sleep 10
    date > /tmp/test1.out.txt

Variations: qsub -cwd myjob.sh
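When qsub is called from another script, the job number usually has to be picked out of the confirmation line. This is a minimal sketch, assuming the message has the form `Your job N ("name") has been submitted`; `job_id_from_qsub` is a hypothetical helper name, and array-job messages are not handled.

```shell
#!/bin/sh
# job_id_from_qsub: read qsub's confirmation line on stdin and print only
# the numeric job ID. Prints nothing if the line does not match.
job_id_from_qsub() {
    sed -n 's/^Your job \([0-9][0-9]*\) .*has been submitted.*$/\1/p'
}

# Submit only when qsub is installed and the job script exists.
if command -v qsub >/dev/null 2>&1 && [ -f myjob.sh ]; then
    id=$(qsub -cwd myjob.sh | job_id_from_qsub)
    echo "submitted as job $id"
fi
```

The captured ID can then be passed to `qstat -j` or `qdel`.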


(C) Anand Vaidya anand@novaglobal.com.sg

SGE Commands - qstat

Check the status of your job:

    qstat
    qstat -f
    qstat -u username
    qstat -j job_id

    [root@shark ~]# qstat
    job-ID  prior    name     user   state  submit/start at      queue            slots  ja-task-ID
    -----------------------------------------------------------------------------------------------
    637     0.???00  HCPDIV9  test1  r      0?/19/2006 10:16:31  all.q@shark-c00  1
    6?8     0.???00  HCPDIV1  test1  r      0?/19/2006 13:39:3?  all.q@shark-c00  1
    674     0.???00  CCDVI?   test1  r      0?/19/2006 ?3:??:17  all.q@shark-c01  1
    67?     0.???00  CCDVI1   test1  r      0?/19/2006 ?3:??:17  all.q@shark-c0?  1


SGE Commands - qstat

The status of the job is indicated by letters:

    qw   - waiting        t    - transferring
    r    - running        s, S - suspended
    R    - restarted      T    - threshold

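The state letters on this slide can be turned into words with a small lookup. This is a minimal sketch covering only the codes listed here; `describe_state` is a hypothetical helper name, and the field positions used in the usage loop assume the default qstat column order.

```shell
#!/bin/sh
# describe_state: translate one qstat state code into a short description.
# Codes that are not on the slide, including combined codes, give "unknown".
describe_state() {
    case "$1" in
        qw)  echo "waiting" ;;
        t)   echo "transferring" ;;
        r)   echo "running" ;;
        s|S) echo "suspended" ;;
        R)   echo "restarted" ;;
        T)   echo "threshold" ;;
        *)   echo "unknown" ;;
    esac
}

# Print job ID, state code and description for the current user's jobs.
# Assumes job-ID is field 1 and state is field 5 (job names without spaces).
if command -v qstat >/dev/null 2>&1; then
    qstat -u "$(id -un)" | awk 'NR > 2 { print $1, $5 }' |
    while read -r job state; do
        echo "$job $state $(describe_state "$state")"
    done
fi
```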

SGE Commands - qdel

Delete your job, if you wish:

    $ qdel 943
    vaidya has deleted job 943


SGE Commands - qmon

qmon is an X Windows GUI tool to submit/delete/view jobs and configure the SGE system.

Example: Submit a job using qmon

Click the Job Submission icon.
Click the Job Script file selection icon to open a file selection box and select your script file. Then, click OK.
Click the Submit button at the bottom of the Job Submission dialog.
After a couple of seconds, you should be able to monitor your job in the Job Control dialog. Click the Job Control icon in the QMON control panel.
You first see it under Pending Jobs, and it quickly moves to Running Jobs after it gets started.


SGE Commands – qsh, qtcsh

Submit an interactive session request:

    qlogin
    qrsh

Ensure you have a valid X server running on your desktop. Allow remote X clients to display on your desktop.

Submit an interactive session request:

    qsh
    qtcsh

Note: using this feature needs additional configuration; it may not work otherwise.


SGE Commands – jobscript

Sample job script:

    #!/bin/bash
    #
    #$ -cwd
    #$ -j y
    #$ -S /bin/bash
    #$ -V
    date
    sleep 10
    env
    date


SGE Commands – jobscript

Sample job script:

    #!/bin/bash
    #
    #$ -cwd
    #$ -j y
    #$ -S /bin/bash
    #
    # infile.txt and outfile.txt stand for the program's own arguments
    $MPI_DIR/mpirun -np $NSLOTS -machinefile $TMPDIR/machines \
        myparallelprog.exe infile.txt outfile.txt


SGE Commands – jobscript

    -cwd               change to current dir before running job
    -j y               merge error with stdout
    -r y               code is re-runnable
    -N name            set the job name
    -l h_rt=00:30:00   run job for max of 30 mins
    -pe mpich          invoke parallel environment
    -pe mpich-ib       use InfiniBand parallel environment
    -pe mpich-eth      use Ethernet parallel environment
    -V                 carry all env variable settings
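Several of these options can be embedded in the job script itself as `#$` directives instead of being typed on the qsub command line. The sketch below writes such a script and syntax-checks it; the file name `example_job.sh` and the job name `example_job` are placeholders chosen for illustration.

```shell
#!/bin/sh
# Write a job script whose "#$" lines carry the qsub options, so a plain
# "qsub example_job.sh" picks them up. Names here are placeholders.
cat > example_job.sh <<'EOF'
#!/bin/bash
#$ -cwd
#$ -j y
#$ -r y
#$ -N example_job
#$ -l h_rt=00:30:00
#$ -S /bin/bash
#$ -V
date
sleep 10
date
EOF

# The directives are ordinary comments to the shell, so a syntax check
# works without SGE being present.
sh -n example_job.sh && echo "example_job.sh: syntax OK"

# Submit only when qsub is installed.
if command -v qsub >/dev/null 2>&1; then
    qsub example_job.sh
fi
```

The parallel environment options are left out because the environment names and slot counts depend on how the administrator configured the cluster.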


Admin Commands

The next few slides show commands useful for SGE admins (not users/researchers).


SGE Commands – qconf

Show:

    complexes:        qconf -sc
    queues:           qconf -sql
    PE:               qconf -spl
    exec hosts:       qconf -sel
                      qconf -se c3
    submit hosts:     qconf -ss
    admin hosts:      qconf -sh
    list calendars:   qconf -scall
    configuration:    qconf -sconf
    user list:        qconf -suserl
    scheduler conf:   qconf -ssconf
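An administrator who wants a quick snapshot of the whole setup can loop over the show flags on this slide. This is a minimal sketch; `show_flag` and the object names it accepts are hypothetical labels invented here, and only the flags themselves come from the slide. The per-host form `qconf -se <host>` is left out because it needs a host name.

```shell
#!/bin/sh
# show_flag: map a descriptive object name (made up for this sketch) to the
# qconf "show" flag listed on the slide. Returns 1 for unknown names.
show_flag() {
    case "$1" in
        complexes)     echo "-sc" ;;
        queues)        echo "-sql" ;;
        pe)            echo "-spl" ;;
        exec-hosts)    echo "-sel" ;;
        submit-hosts)  echo "-ss" ;;
        admin-hosts)   echo "-sh" ;;
        calendars)     echo "-scall" ;;
        configuration) echo "-sconf" ;;
        users)         echo "-suserl" ;;
        scheduler)     echo "-ssconf" ;;
        *)             return 1 ;;
    esac
}

# Dump every listing in one pass, but only when qconf is installed.
if command -v qconf >/dev/null 2>&1; then
    for obj in complexes queues pe exec-hosts submit-hosts admin-hosts \
               calendars configuration users scheduler; do
        echo "== $obj =="
        qconf "$(show_flag "$obj")"
    done
fi
```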


SGE Commands – qping

    [anand@shark-c02 ~]$ qping -info shark-c01 537 execd 1
    05/24/2006 21:57:34:
    SIRM version:             0.1
    SIRM message id:          1
    start time:               05/24/2006 21:31:37 (1148477497)
    run time [s]:             17?8
    messages in read buffer:  0
    messages in write buffer: 0
    nr. of connected clients: 2
    status:                   0
    info:                     dispatcher: (0.04)
    Monitor:                  disabled


LSF Commands

    bsub      submit a job
    bstop     suspend a job
    bresume   resume a suspended task
    btop      move job to top
    bswitch   move jobs between queues
    lsgrun    run a task on a set of hosts
    bkill     kill a job


LSF Commands

    lsmon     monitor load, resource availability...
    lsid      show LSF details (version etc.)
    lshosts   show hosts & static info
    lsload    show load info for hosts
    lsinfo    show LSF config info
    busers    show user info
    bacct     show accounting info on finished jobs
    bjobs     show info on jobs
    bpeek     show stdout/stderr of unfinished jobs


Acknowledgements & Copying

This material is based on my experience as well as material collected from SGE documentation.

This presentation can be redistributed as follows:

No commercial re-distribution: e.g., as part of a for-profit CD-ROM or as part of your sales pitch. Seek my permission first.
Must attribute the document creator.
Share alike: if you use this document and enhance or modify it, share the modifications or the modified document.

Which means I apply: Creative Commons License,

http://creativecommons.org/licenses/by-nc-sa/ . /


The End

Thanks for your time. If you have any feedback, corrections or questions, please contact me: Anand Vaidya,

[email protected]

This document was created with OpenOffice on Linux. Email me if you want the odp file instead of the pdf.