Heterogeneous Grid Design and Implementation
Thesis Presentation
By Jeffrey Wells
State University of New York Institute of Technology
May 7, 2008
CSC 599
Outline
• Purpose
• Overview
• Intro to Globus Toolkit and Condor
• Interoperability Experiments
• Results
• Conclusion
Purpose
This thesis investigates the extent to which two open source approaches to Grid computing achieve interoperability. The Globus Alliance’s Globus Toolkit and the University of Wisconsin-Madison’s Condor scheduler were used in this thesis as an example of interoperability.
Overview
• What is a Grid?
• Condor Scheduler
• Globus Toolkit
• BITS Regional Grid
• SUNYIT Local Grid Network
• Grid Security
What is a Grid?
A Grid, as defined by Ian Foster of the University of Chicago, is a system that:
• coordinates resources that are not subject to centralized control
• uses standardized, open, general-purpose protocols and interfaces
• delivers non-trivial qualities of service
Examples of Grids:
• TeraGrid, with 20 teraflops of computing power and 1 petabyte of storage
• Access Grid, used for scheduling and conducting meetings
• eDiaMoND, used for medical research in England
Condor Scheduler
Condor High Throughput Computing (HTC)
• Ties idle resources together to harness their unused computing power in a distributed fashion.
• Condor was developed by the University of Wisconsin-Madison.
Other distributed schedulers:
• PBS (Portable Batch System)
• LSF (Load Sharing Facility)
• CSF (Community Scheduler Framework)
• SETI (Search for Extraterrestrial Intelligence)
Globus Toolkit
The Globus Toolkit is an open source software toolkit used for building Grid systems and applications. It is constantly being developed by the Globus Alliance at the University of Chicago and many others all over the world.
Another Grid toolkit: Virtual Data Toolkit (VDT)
SUNY IT Local Grid Network
192.168.14.20   Globus 405
192.168.14.30   Globus 405
192.168.14.40   Condor 605
192.168.14.50   Globus 405
192.168.14.60   Globus 405, Condor 605
192.168.14.70   Condor 605
bitsgw
Grid Security
The Grid Security Infrastructure (GSI) uses public key cryptography as the backbone of its functionality.
The reasons behind GSI are:
• to provide secure communication between the resources of a Grid
• to avoid a centrally managed security system
• to allow “single sign-on” for users of the Grid, including delegation of credentials for jobs that require more than one resource and/or site
SUNY Geneseo
Debian Linux Cluster
Condor Execute/Submit
Services used, tested and evaluated:
• GridFTP, RFT (Reliable File Transfer)
• Delegation, authentication, authorization
• Credential management
• Grid Security Infrastructure (GSI)
• Various Condor submits
Diagram labels: Globus Services; Condor Central Manager (Scheduler); Central Manager; Submit/Execute (three machines); Globus (two machines).
• The Condor Central Manager (Scheduler) submits jobs either to a Condor Submit/Execute machine or to a Globus machine.
• Each machine “advertises” its resources to the Central Manager via a ClassAd.
• The Central Manager matches resources with the requirements of the submitted job.
• The Central Manager sends the executable to the remote resource that matches the requirements.
• Once the job is completed, the Execute machine reports back to the Central Manager.
• The Central Manager reports the final results.
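The advertise-and-match cycle described above can be sketched in a few lines of Python. This is only an illustration of the idea, not Condor's implementation: the `MachineAd`, `Job` and `CentralManager` names, the attribute names and the machine names are all hypothetical, and real ClassAds use their own expression language rather than Python functions.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class MachineAd:
    """Simplified stand-in for a machine ClassAd (hypothetical fields)."""
    name: str
    attrs: dict


@dataclass
class Job:
    """A job plus a requirements predicate over machine attributes."""
    executable: str
    requirements: Callable[[dict], bool]


class CentralManager:
    """Collects machine ads and matches jobs against them."""

    def __init__(self):
        self.ads = []

    def advertise(self, ad: MachineAd) -> None:
        # Each machine reports its resources to the Central Manager.
        self.ads.append(ad)

    def match(self, job: Job) -> Optional[MachineAd]:
        # Return the first advertised machine that satisfies the job.
        for ad in self.ads:
            if job.requirements(ad.attrs):
                return ad
        return None


manager = CentralManager()
manager.advertise(MachineAd("exec1", {"OpSys": "LINUX", "Memory": 512}))
manager.advertise(MachineAd("exec2", {"OpSys": "LINUX", "Memory": 2048}))

job = Job("/bin/date", lambda a: a["OpSys"] == "LINUX" and a["Memory"] >= 1024)
chosen = manager.match(job)
print(chosen.name if chosen else "no match")
```

Only the second machine advertises enough memory, so it is the one selected; a job whose requirements no machine meets gets no match and would stay queued.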
Diagram arrow labels: Job Request; ClassAd/Results.
Various Jobs Implemented
Condor Jobs
• Vanilla
• Standard
• Java
• Parallel
• Grid (Globus)
Globus Jobs
• Forwarded a job to a Condor machine with a scheduler
• From a Condor scheduler to a Globus machine (Globus Job)
• Forwarded jobs to other Globus machines
Interoperability Experiments
• Globus, Condor and Condor-G
• Condor-G Interface
• Job Examples
• Condor to Globus Job Submit
• Globus to Condor Job Submit
• Test Scripts
• Swift Workflow
• Some More Test Scripts
Globus, Condor and Condor-G
Linux Cluster
Condor Workstation Pool
Globus Services
Condor Scheduler
Condor-G manages jobs through the resource manager of the Globus Toolkit.
Results of a job passed to the Globus Toolkit are returned via the Condor-G interface.
condor_startd advertises the resource and executes the job.
condor_starter spawns the remote job.
condor_shadow runs on the submit machine and acts as the resource manager for the job.
condor_master is responsible for keeping all the rest of the Condor daemons running.
condor_schedd manages the job queue and submits jobs to remote resources.
condor_negotiator is responsible for the matchmaking.
Condor-G Interface
Linux Cluster
Globus Services
Condor Workstation Pool
Condor-G uses the Globus resource manager to start a job on the remote machine. It also manages the job running on the remote resource.
Condor-G waits for the job to be completed and then returns the results.
Condor-G interface
Job Examples
Condor Job and Globus Script

######################
# Condor to Globus
# test.submit
######################
universe = grid
executable = myscript.sh
arguments = TestJob 10
JobManager_type = Condor
grid_type = gt4
globusscheduler = https://stengal.cs.sunyit.edu:8443/wsrf/services/ManagedJobFactoryService/
log = test.log
output = test.output
error = test.error
should_transfer_files = YES
when_to_transfer_output = ON_EXIT
Queue

#! /bin/sh
echo "I'm process id $$ on" `hostname`
echo "This is sent to standard error" 1>&2
date
echo "Running as binary $0" "$@"
echo "My name (argument 1) is $1"
echo "My sleep duration (argument 2) is $2"
sleep $2
echo "Sleep of $2 seconds finished. Exiting"
echo "RESULT: 0 SUCCESS"
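The test.submit description above is a flat list of `key = value` commands closed by a queue statement. As a rough sketch of how such a file could be generated from a script, the Python below renders a subset of the same settings. The `render_submit` helper is hypothetical and not part of Condor; the values are copied from the slide.

```python
def render_submit(settings: dict, queue: str = "queue") -> str:
    """Render key = value lines followed by a queue statement."""
    lines = [f"{key} = {value}" for key, value in settings.items()]
    lines.append(queue)
    return "\n".join(lines) + "\n"


# Settings taken from the test.submit example; dict order is preserved.
grid_job = {
    "universe": "grid",
    "executable": "myscript.sh",
    "arguments": "TestJob 10",
    "log": "test.log",
    "output": "test.output",
    "error": "test.error",
    "should_transfer_files": "YES",
    "when_to_transfer_output": "ON_EXIT",
}

text = render_submit(grid_job)
print(text)
```

Writing `text` to a file and passing that file to condor_submit would be the next step on a machine with Condor installed; the grid-specific lines from the slide (`grid_type`, `globusscheduler`, `JobManager_type`) can be added to the dictionary in the same way.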
Condor Job and MPI Program

#########################
# Submit description file
# for /bin/hostname
# (Parallel)
#########################
universe = parallel
executable = /bin/hostname
machine_count = 2
log = parallellogfile
output = outfileMPI.$(NODE)
error = errfileMPI.$(NODE)
should_transfer_files = YES
when_to_transfer_output = ON_EXIT
queue

MPI Program

#include "mpi.h"
#include <stdio.h>

int main( int argc, char* argv[] )
{
    int rank, size;
    MPI_Init( &argc, &argv );
    /* rank: this process's index; size: total number of processes */
    MPI_Comm_rank( MPI_COMM_WORLD, &rank );
    MPI_Comm_size( MPI_COMM_WORLD, &size );
    printf( "I am %d of %d\n", rank, size );
    MPI_Finalize();
    return 0;
}
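The parallel submit description above names its output and error files with the `$(NODE)` macro, so each node writes to its own file. The Python sketch below shows the resulting file names under the assumption that node numbers start at 0, in the same way MPI ranks do; `expand_node_macro` is a hypothetical helper written for illustration, not a Condor function.

```python
def expand_node_macro(template: str, machine_count: int) -> list:
    """Substitute each node number (assumed to start at 0) for $(NODE)."""
    return [template.replace("$(NODE)", str(node)) for node in range(machine_count)]


# machine_count = 2, as in the submit description file.
outputs = expand_node_macro("outfileMPI.$(NODE)", 2)
errors = expand_node_macro("errfileMPI.$(NODE)", 2)
print(outputs)
print(errors)
```

With two machines this gives one output file and one error file per node, which keeps the results from each node separate.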
Condor to Globus Job Submit
Diagram labels: Condor-G; Condor (Scheduler); GASS Server; Gatekeeper; Job Manager; Globus Toolkit; Job.
1.) Central Manager submits grid job
2.) Job passes through Condor-G to the Globus gatekeeper
3.) Verify security via gatekeeper
4.) Forward job to job manager
5.) Process and return result to Central Manager
Globus to Condor Job Submission
Diagram labels: Local Machine; Remote Machine; GRAM Client; GASS Server; GRAM Gatekeeper; GRAM Job Manager; Batch System Condor; GASS Client; Grid-Proxy.
Diagram arrow labels: GRAM Job Request; Creation; Job Request; Data; Callback.
Sample Test Scripts
Perl scripts were created to test most of the functionality of the BITS regional Grid.

Job submit from Globus to Condor:
print "\n------> Submitting a Job to Condor on Stengel <---------\n";
system "globusrun-ws -submit -Ft Condor -S -c /bin/date";

Job submit from Condor to Globus:
print "-----> Submitting a Condor Globus Job <--------\n";
system "condor_submit /home/wells/testjobs/condorjobs/globussubmits/submitGFork";
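The Perl snippets above print a banner and shell out to a submit command. A comparable driver that also records each command's exit status might look like the Python sketch below. The two command lines are copied from the slide; `run_tests` and its `runner` parameter are hypothetical, and the demonstration uses a stand-in runner so that nothing is actually submitted.

```python
import subprocess

# Command lines copied from the Perl test scripts.
TESTS = [
    ("Globus to Condor",
     ["globusrun-ws", "-submit", "-Ft", "Condor", "-S", "-c", "/bin/date"]),
    ("Condor to Globus",
     ["condor_submit",
      "/home/wells/testjobs/condorjobs/globussubmits/submitGFork"]),
]


def run_tests(tests, runner=None):
    """Run each command and return (label, passed) pairs."""
    if runner is None:
        # Default: really execute the command and use its exit status.
        runner = lambda cmd: subprocess.run(cmd).returncode
    results = []
    for label, cmd in tests:
        print(f"------> {label} <------")
        results.append((label, runner(cmd) == 0))
    return results


# Dry run with a stand-in runner that pretends every command succeeded.
demo = run_tests(TESTS, runner=lambda cmd: 0)
print(demo)
```

On a Grid host, calling `run_tests(TESTS)` without a runner would execute the real commands; a missing executable raises `FileNotFoundError` rather than returning a status.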
Swift Workflow
Swift is a data-oriented coarse-grained scripting language that supports dataset typing and mapping, dataset iteration, conditional branching, and sub-workflow composition
The Swift programs, also known as workflows, are written in a language called SwiftScript
Swift handles the execution of these programs on remote sites
Sample Test Scripts cont.
Swift Job submit to SUNY_IT3 (Geneseo):
print "\n<-------- Swift Job Sent to SUNY_IT3 ------------>\n";
system "swift -sites.file /home/wells/testjobs/swiftjobs/sites3.xml /home/wells/testjobs/swiftjobs/first.swift";
Results
• Condor.pm is malformed for job submissions from Globus to Condor.
• should_transfer_files = YES and when_to_transfer_output = ON_EXIT must be added to the script.
• -S is used in Globus Toolkit 4.0.5, versus -s in 4.0.4.
• Mpiexe.py and mpdlib.py were modified so that WS-GRAM was able to send a distributed job to MPICH2. Thanks to Dr. Ralph Butler of Middle Tennessee State University.
• Another application layer can easily be added to the Globus Toolkit.
• Applications are changing and maturing faster than the documentation.
• Mailing lists and groups are not always helpful, and questions do not always get a response.
• Documentation on the MPI-2 and Globus Toolkit connection is scarce and outdated.
• Documentation on the Condor and Globus interface is outdated. Resolved by installing Condor first and then Globus with the Condor scheduler.
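The flag change between releases noted in the results can be captured in a small lookup when test scripts must run against both versions. The Python sketch below covers only the two releases mentioned; the `submit_command` helper is hypothetical, and the assumption that the rest of the command line is identical under 4.0.4 is not confirmed by the slides.

```python
# Flag spellings as reported in the results; no other releases are covered.
REPORTED_FLAG = {"4.0.4": "-s", "4.0.5": "-S"}


def submit_command(gt_version: str, executable: str = "/bin/date") -> list:
    """Build the Globus-to-Condor submit command for a given release."""
    try:
        flag = REPORTED_FLAG[gt_version]
    except KeyError:
        raise ValueError(
            f"no flag reported for Globus Toolkit {gt_version}") from None
    return ["globusrun-ws", "-submit", "-Ft", "Condor", flag, "-c", executable]


print(" ".join(submit_command("4.0.5")))
```

Raising an error for any other release is a deliberate choice here: it avoids guessing at behavior that was not tested.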
Conclusion
1. It is necessary to modify the Condor.pm script in order to allow the Globus Toolkit to submit jobs to the Condor Scheduler.
2. It is necessary to correct Mpiexe.py and mpdlib.py in order for the Globus Toolkit to submit a distributed job to MPICH2.
3. Investigation found that -S is now used to submit a job to Condor under 4.0.5, versus -s under 4.0.4.
4. Another application layer can be easily added to the Globus Toolkit without affecting the interoperability with the Condor Scheduler.
5. Documentation is scarce on the MPI-2 and Globus Toolkit connection and is also outdated.
6. Applications are changing and maturing faster than the documentation.
References
Globus Toolkit Version 4 Grid Security Infrastructure: A Standards Perspective. The Globus Security Team, Version 4 updated September 12, 2005. Retrieved on September 26, 2007 from http://www.globus.org/toolkit/docs/development/4.1.2/security/GT4-GSI-Overview.pdf/
Tanenbaum, A. (2003) Computer Networks, Fourth Edition. New Jersey: Prentice Hall PTR
Condor Users Manual Version 6.8 (2007) Retrieved September 24, 2007 from http://www.cs.wisc.edu/condor/manual/v6.8/
Globus Toolkit Administration Manual (2007) Retrieved September 24, 2007 from http://www.globus.org/toolkit/AdministrationManual.pdf
Swift Users Guide (Change Revision 1700). Retrieved on February 16, 2008 from http://www.ci.uchicago.edu/swift/guides/userguide.pdf
Swift – Home (2007), retrieved on February 16, 2008 from http://www.ci.uchicago.edu/swift/
Yong Zhao, Mihael Hategan, Ben Clifford, Ian Foster, Gregor von Laszewski, Ioan Raicu, Tiberiu Stef-Praun, Mike Wilde (2007) Swift: Fast, Reliable, Loosely Coupled Parallel Computation. Retrieved on March 2, 2008 from http://www.ci.uchicago.edu/swift/papers/Swift-SWF07.pdf
References (cont.)
Mausolf, J. (2005, August 09) Grid In Action: Implementation SOA and Web Services In Grid. Retrieved September 24, 2007, from http://www.ibm.com/developmentworks/Grid/library/gr-gt4graph/
Foster, I. (2002) What is a Grid? A Three Point Checklist. Argonne National Laboratory & University of Chicago. Retrieved September 2, 2007 from http://www.globus.org
Overview of the Grid Security Infrastructure, Globus Alliance Globus Toolkit. Retrieved May 6, 2008 from http://www.globus.org/security/overview.html
Noel, C (2007). What is a Grid? CETIC’s Tentative Definition. Retrieved on September 6, 2007 from http://www.cetic.be/article432.html