APST Internals

Sathish Vadhiyar


Post on 14-Jan-2016



The apstd daemon should be started on the local resource.
It opens a port to listen for apst client requests.
It runs on the host where the input files are located.
Input files can also be specified using the <files> element.
apstd automatically copies output files from the working directory to the directory where apstd was started.
apst and apstd should be started by the same user, since apstd writes files on behalf of the apst user.

An APST run is associated with an XML file.
Task dependencies can be enforced through the APST XML. In the example below, 'second' takes as input the file 'anyfile' that 'first' produces, so 'first' has to run before 'second'.

<apst>
  <tasks>
    <task executable='second' input='anyfile'/>
    <task executable='first' output='anyfile'/>
  </tasks>
</apst>

Sometimes there is no file dependency, only a task dependency. Here the file is declared with download='no' and transfer='no', so it is not copied and serves only to order the tasks.

<apst>
  <files>
    <file id='anyfile' download='no' transfer='no'/>
  </files>
  <tasks>
    <task executable='second' input='anyfile'/>
    <task executable='first' output='anyfile'/>
  </tasks>
</apst>

XML example

<apst>
  <storage>
    <disk id='blueDisk'><scp server='blue.ufo.edu'/></disk>
    <disk id='greenDisk'><scp server='green.ufo.edu'/></disk>
    <disk id='redDisk'><scp server='red.ufo.edu'/></disk>
    <disk id='grayDisk'><scp server='gray.ufo.edu'/></disk>
  </storage>
  <compute>
    <host id='blueHost' disk='blueDisk'><ssh server='blue.ufo.edu'/></host>
    <host id='greenHost' disk='greenDisk'><ssh server='green.ufo.edu'/></host>
    <host id='redHost' disk='redDisk'><ssh server='red.ufo.edu'/></host>
    <host id='grayHost' disk='grayDisk'><ssh server='gray.ufo.edu'/></host>
  </compute>
  <files>
    <file id='blueOrGreenOnly' transfer='no'>
      <copy disk='blueDisk'/>
      <copy disk='greenDisk'/>
    </file>
  </files>
  <tasks>
    <task executable='first' input='blueOrGreenOnly'/>
  </tasks>
</apst>

Security

Some kind of security regarding which commands apstd will accept over the socket.

Given a description of the tasks to do and the resources (disks and machines) available, APST will assign individual tasks to available machines, copy the input files, run the tasks, and return the output files. APST also tries to assign tasks to machines intelligently, using information such as the load and speed of individual machines.

The main APST program, apstd, handles all of the task assignment, application execution, and file copying.

Splitting the control and user interface portions of APST like this allows you, for example, to run apstd on your main system but control it from your laptop.

Using local resources

<apst>
  <compute>
    <host id='myMachine'/>
  </compute>
  <tasks>
    <task executable='perl'
          arguments='/home/${USER}/apst/Examples/charcount.pl /home/${USER}/apst/Examples/charcount0.dat'
          stdout='charcount0.out' />
  </tasks>
</apst>

/home/${USER}/apst/bin/apstd -d --port 7890 first.xml

APST can use remote machines accessed through either Globus GRAM or ssh; remote storage accessed through a Globus GASS server, scp, ftp, sftp, or an SRB server; and queueing systems controlled by Condor, DQS, LoadLeveler, LSF, PBS, or SGE.

Accessing remote resources: walk-through

<apst>
  <compute>
    <host id='blueHost'>
      <ssh server='blue.ufo.edu'/>
    </host>
  </compute>
</apst>

This launches tasks on blueHost through ssh, but assumes that files on the local disk can be accessed directly from blueHost.

<apst>
  <storage>
    <disk id='blueDisk'>
      <scp server='blue.ufo.edu'/>
    </disk>
  </storage>
  <compute>
    <host id='blueHost' disk='blueDisk'>
      <ssh server='blue.ufo.edu'/>
    </host>
  </compute>
</apst>

This tells apstd that blueHost can see files available on blueDisk, rather than those on the local disk.

<apst>
  <storage>
    <disk id='blueDisk'>
      <scp server='blue.ufo.edu'/>
    </disk>
  </storage>
  <compute>
    <host id='blueHost' disk='blueDisk'>
      <ssh server='blue.ufo.edu'/>
    </host>
  </compute>
  <tasks>
    <task executable='perl'
          arguments='/home/${USER}/apst/Examples/charcount.pl /home/${USER}/apst/Examples/charcount0.dat'
          stdout='charcount0.out' />
  </tasks>
</apst>

The problem with this XML is that it requires APST to be installed on the remote machine in /home/${USER}/apst, since the arguments task attribute refers to files in this directory.

<apst>
  <storage>
    <disk id='local' datadir='/home/${USER}/apst/example'/>
    <disk id='blueDisk' datadir='/tmp'>
      <scp server='blue.ufo.edu'/>
    </disk>
  </storage>
  <compute>
    <host id='blueHost' disk='blueDisk'>
      <ssh server='blue.ufo.edu'/>
    </host>
  </compute>
  <tasks>
    <task executable='perl'
          arguments='./charcount.pl ./charcount0.dat'
          input='charcount.pl charcount0.dat'
          stdout='charcount0.out' />
  </tasks>
</apst>

Equivalent to:

scp /home/${USER}/apst/Examples/charcount.pl blue.ufo.edu:/tmp/charcount.pl
scp /home/${USER}/apst/Examples/charcount0.dat blue.ufo.edu:/tmp/charcount0.dat
ssh blue.ufo.edu 'cd /tmp; perl ./charcount.pl ./charcount0.dat > charcount0.out'
scp blue.ufo.edu:/tmp/charcount0.out /home/${USER}/apst/Examples/charcount0.out

Run the above example:

/home/${USER}/apst/bin/apstd -d --port 7890 second.xml

For Globus:
  scp -> gass
  ssh -> globus
  <globus server='blue.ufo.edu:4300'/>, i.e. the machine and port where the gatekeeper is running
  E.g. <gass server='gridftp://blue.ufo.edu:2345'/>
  Run grid-proxy-init before starting apstd.
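As a rough sketch of that substitution, the earlier blue.ufo.edu example might look like this with Globus access methods. It assumes that <gass> and <globus> simply take the place of <scp> and <ssh> inside <disk> and <host>. The server values are the illustrative ones from this slide, and the whole file is an untested configuration.

```xml
<apst>
  <storage>
    <disk id='local' datadir='/home/${USER}/apst/example'/>
    <!-- Assumption: gass replaces scp as the disk access method -->
    <disk id='blueDisk' datadir='/tmp'>
      <gass server='gridftp://blue.ufo.edu:2345'/>
    </disk>
  </storage>
  <compute>
    <!-- Assumption: globus replaces ssh; 4300 is the gatekeeper port used on the slide -->
    <host id='blueHost' disk='blueDisk'>
      <globus server='blue.ufo.edu:4300'/>
    </host>
  </compute>
  <tasks>
    <task executable='perl'
          arguments='./charcount.pl ./charcount0.dat'
          input='charcount.pl charcount0.dat'
          stdout='charcount0.out' />
  </tasks>
</apst>
```

A valid proxy from grid-proxy-init would need to exist before apstd is started with a file like this.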

apst client program

You can use apst to examine your application's state; add, stop, or restart tasks; and add or disable resources.

/home/${USER}/apst/bin/apst --host localhost:7890 command

Accessing batch systems

<apst>
  <storage>
    <disk id='bigDisk'>
      <scp server='big.ufo.edu'/>
    </disk>
  </storage>
  <compute>
    <host id='bigHost' disk='bigDisk' cpus='8'>
      <ssh server='big.ufo.edu'/>
      <pbs nodes='20' time='240' queue='normal'/>
    </host>
  </compute>
</apst>

<pbs> can be replaced with <lsf>, <condor>, or <loadleveler>.

Gridinfo tag

<apst>
  <gridinfo>
    <infosource id='localInfo'>
      <local/>
    </infosource>
    <infosource id='gangliaInfo'>
      <ganglia server='ganglia.ufo.edu'/>
    </infosource>
    <infosource id='mdsInfo'>
      <mds server='mds.ufo.edu:2345' basedn='mds-vo-name=local,o=grid'/>
    </infosource>
    <infosource id='nwsInfo'>
      <nws server='nws.ufo.edu:8800'/>
    </infosource>
  </gridinfo>
</apst>

apstd daemon

Can be started with the --heuristic= option. The default is wq.
The XML file has <storage>, <compute>, <files>, and <tasks> sections.

<disk>
  Attributes: unique id, datadir
  Contains an access method element
  The access method can be <ftp/>, <gass/>, <local/>, <scp/>, <sftp/>, or <srb/>
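A minimal sketch of a <storage> section using these attributes and one of the other access methods. The server attribute on <sftp> is assumed by analogy with the <scp> examples earlier and is not confirmed by the slides. The host name and paths are the illustrative ones already used in this deck.

```xml
<storage>
  <!-- Local disk declared with only id and datadir, as in the earlier example -->
  <disk id='local' datadir='/home/${USER}/apst/example'/>
  <!-- Assumption: sftp takes a server attribute in the same way scp does -->
  <disk id='blueDisk' datadir='/tmp'>
    <sftp server='blue.ufo.edu'/>
  </disk>
</storage>
```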

<host>
  Attributes: unique id, cpus, disk, dnsname, memory, wd
  Access method: <globus/>, <local/>, or <ssh/>
  Batch queuing system: <condor/>, <dqs/>, <loadleveler/>, <lsf/>, <pbs/>, or <sge/>
    Attributes: account, memory, node, nodetype, queue, stdin, stdout, stderr, time, option

<files>
  Specifies input, output, and executable files
  Contains one or more <file> elements

<file>
  Input files may have a transfer attribute (yes or no), indicating whether the file has to be transferred from the submitting machine
  Output files analogously have a download attribute, and may also have a size attribute giving the size of the output file, which is useful for scheduling decisions

A <file> element may contain <copy> elements for input files.
They indicate the placement of copies of the file that you have pre-staged to remote disks.
Each will have a disk attribute and a copy attribute.

<task>
  Attributes: executable, id, groups, wd, arguments, input, stdin, stdout, stderr, priority, host, memory, cost

<infosource>
  Access method: <ganglia/>, <local/>, <mds/>, or <nws/>