Installation of a Condor Supercomputing Pool


Page 1: Installation of a Condor Supercomputing pool

Brain Campbell

Bryce Carmichael

Unquiea Wade

Mentor: Dr. Eric Akers

Page 2: Installation of a Condor Supercomputing pool

The International Polar Year was designed to study and better understand the current state of climatic changes to the world's ice sheets. For the last few decades, automated weather stations and satellites in geosynchronous orbit have been producing data sets. Today, large amounts of this data remain unexplored due to insufficient funding and the scarcity of resources. For this reason, the Polar Grid concept was proposed to delegate the analysis of the existing data sets.

The goal of the Elizabeth City State University Polar Grid team was to construct a model network to serve as a base for a supercomputing pool. The pool will be constructed on the university's campus and linked to the overall Polar Grid system. Software packages and protocols currently in use at other institutions around the nation were researched, and from the candidates the Condor software was chosen. Condor, created and developed at the University of Wisconsin, was selected for its ease of use and its capacity for expansion.

An eighteen-node computing pool was constructed and tested with Condor in Dixon Hall's second-floor lab. The pool comprised seventeen desktops running on a Windows NT platform, with the pool's master, a Linux-based server, housed in Lane Hall.

Page 3: Installation of a Condor Supercomputing pool

The goal was to utilize all of our computers.

Gain knowledge about Supercomputing.

Set up a pool of computers that can be accessed by Polar Grid.

Familiarize team members with job submission and overall operation of Condor.

Page 4: Installation of a Condor Supercomputing pool

What is Supercomputing?

Supercomputing is a term given to a system capable of processing at speeds much greater than those of commercially available CPUs.

High-throughput computing is used to describe systems with intermediate processing abilities.

Page 5: Installation of a Condor Supercomputing pool

• Distributed computing utilizes a network of many computers, each accomplishing a portion of an overall task, to achieve a computational result much more quickly than with a single computer.

• Distributed computing also allows many users to interact and connect openly.

• Parallel processing is the simultaneous processing of the same task on two or more microprocessors in order to obtain faster results.

• The computer resources can include a single computer with multiple processors.
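As a concrete illustration of the parallel-processing idea above, the sketch below (a modern C++ example written for this summary, not part of the original project code) splits one summation task across two threads on a single multi-processor machine:

    // parallel_sum.cpp - two threads each sum half of an array at the same time,
    // then the partial results are combined (requires a C++11 compiler or later).
    #include <iostream>
    #include <numeric>
    #include <thread>
    #include <vector>

    int main() {
        std::vector<int> data(1000000, 1);      // one million elements, all set to 1
        long long front_sum = 0, back_sum = 0;
        std::size_t mid = data.size() / 2;

        // Each thread works on its own half of the data simultaneously.
        std::thread front([&] { front_sum = std::accumulate(data.begin(), data.begin() + mid, 0LL); });
        std::thread back([&] { back_sum = std::accumulate(data.begin() + mid, data.end(), 0LL); });
        front.join();
        back.join();

        std::cout << "total = " << (front_sum + back_sum) << std::endl;   // prints 1000000
        return 0;
    }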

Page 6: Installation of a Condor Supercomputing pool

• Parallel processing allows more intimate communication between nodes, increasing efficiency.

• As the size of the network grows, communication takes up a greater part of the CPU's time.

• This can be limited by using more than one type of protocol in a system.

Page 7: Installation of a Condor Supercomputing pool

Condor is a specialized workload management system for compute-intensive jobs. Like other full-featured batch systems, Condor provides a job queueing mechanism, scheduling policy, priority scheme, resource monitoring, and resource management.

Beowulf is a design for high-performance parallel computing clusters on inexpensive personal computer hardware. A Beowulf cluster is a group of usually identical PC computers running a Free and Open Source Software (FOSS) Unix-like operating system, such as BSD, Linux, or Solaris.

BOINC is a software platform for volunteer computing and desktop Grid computing. BOINC is designed to support applications that have large computation requirements, storage requirements, or both.

Page 8: Installation of a Condor Supercomputing pool

The Condor project was started in 1988. Condor was built from the results of the Remote Unix project and from the continuation of research in the area of Distributed Resource Management (DRM).

Condor was created at the University of Wisconsin-Madison (UW-Madison), and it was first installed as a production system in the UW-Madison Department of Computer Science.

Page 9: Installation of a Condor Supercomputing pool

Versatility:

Capability of switching between distributed and parallel computing.

Multiple programming codes for simple execution of jobs.

Operates on multiple platforms.

Page 10: Installation of a Condor Supercomputing pool

Availability – open-source software.

Easy expansion – any number of nodes can be added to an existing pool.

Cost efficiency – any CPU meeting the base requirements can be used efficiently.

Page 11: Installation of a Condor Supercomputing pool

Windows

• Condor for Windows requires Windows 2000 (or better) or Windows XP.

• 300 megabytes of free disk space is recommended. Significantly more disk space may be needed to run jobs with large data files.

• Condor for Windows will operate on either an NTFS or FAT file system. However, for security purposes, NTFS is preferred.

Unix

• The size requirements for the downloads currently vary from about 20 Mbytes (statically linked HP-UX on PA-RISC) to more than 50 Mbytes (dynamically linked IRIX on an SGI).

• In addition, considerable disk space is needed in the local directory of any machine that submits jobs to Condor.

Page 12: Installation of a Condor Supercomputing pool


The Condor software can be accessed through its main website.

Condor can be downloaded for various platforms such as Solaris, Linux/Unix, Windows, and Mac.

Administrative and user manuals are also available on the website.

http://parrot.cs.wisc.edu/

Page 13: Installation of a Condor Supercomputing pool

Installation – overseen through the Windows installation wizard.

Changes to the defaults:

Pool master node – a Linux-based machine in Lane Hall (10.40.20.37); having a Linux-based master will allow the eventual use of the full array of Condor options.

Read and write access – parameters changed to include 10.*.*.* to allow feedback and access from the different nodes.

Because the CERSER labs are used during class hours, each node is required to be idle for 15 minutes before it becomes available to perform tasks. If a task is interrupted, it will be restarted on a different machine if the original node is not freed within ten minutes.
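One way these changes might look in a node's local Condor configuration file is sketched below; the macro names follow the Condor 7.0 manual, but the exact values used on the ECSU machines (beyond those stated above) are assumptions for illustration:

    ## Local configuration sketch (condor_config.local) - illustrative only
    CONDOR_HOST     = 10.40.20.37        ## Linux-based pool master in Lane Hall
    HOSTALLOW_READ  = 10.*.*.*           ## allow status queries from campus nodes
    HOSTALLOW_WRITE = 10.*.*.*           ## allow job submission and feedback from campus nodes
    ## Start a job only after 15 minutes (900 seconds) without keyboard or mouse activity
    START = KeyboardIdle > (15 * 60)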

Page 14: Installation of a Condor Supercomputing pool

Jobs can be submitted in any executable file format through the Condor bin directory.

Jobs are submitted from the Condor bin directory with the condor_submit command followed by the name of a submit description file; the status of the nodes within the pool can be checked with the command condor_status.
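A minimal submit description file of the kind condor_submit expects is sketched below; the file and executable names (analyze.sub, analyze.exe) are hypothetical:

    ## analyze.sub - minimal Condor submit description (illustrative)
    universe   = vanilla        ## run an ordinary, unmodified executable
    executable = analyze.exe    ## program to run on an execute node
    output     = analyze.out    ## where the program's standard output is written
    error      = analyze.err    ## where its standard error is written
    log        = analyze.log    ## Condor's own record of the job's life cycle
    queue                       ## place one copy of the job in the queue

The job would then be submitted with condor_submit analyze.sub, and condor_status would show which nodes are available to run it.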

Page 15: Installation of a Condor Supercomputing pool

The condor_status command brings up a listing of the current platform and availability of each node. Availability is signified by the one-word qualifiers in the fourth column:

Unclaimed: The node is open but is unable to perform the specified task

Claimed: The node is currently running a specified task

Matched: The node is open and can perform a specified task

Owner: The node has a local user demanding its attention
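For example, the following invocations (option names taken from the Condor manual) narrow the listing by state:

    condor_status             ## list every node with its platform, state, and activity
    condor_status -avail      ## show only machines that are available to run jobs
    condor_status -claimed    ## show only machines currently claimed by running jobs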

Page 16: Installation of a Condor Supercomputing pool

After submission, a task can be traced through the pool using the condor_q command.

The results of the tasks can be seen in the output files created by the executable, or in the .log file that is created automatically for each task.
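A typical check on a submitted task might look like the following; the job id and log file name are hypothetical and would match whatever was used at submission:

    condor_q                   ## list the jobs in the queue with their status
    condor_q 24.0              ## show a single job by its cluster.process number
    cat analyze.log            ## read the log file named in the submit description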

Page 17: Installation of a Condor Supercomputing pool

A Condor pool composed of 17 nodes running on the Windows NT platform has been established in the Dixon Hall laboratory, operating under a Linux-based master housed in the Lane Hall offices.

To date, simple tasks have been submitted using C++ code and have run successfully through the pool.
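A simple task of the sort described could be as small as the following C++ program (an illustrative sketch, not the team's actual code), compiled and then named as the executable in a submit description file:

    // hello_condor.cpp - a trivial job used to verify that submitted work runs on the pool.
    #include <iostream>

    int main() {
        std::cout << "Hello from a Condor execute node." << std::endl;
        return 0;
    }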

Diagnostic assessment showed two CPUs unconnected to the network and naming redundancies that hindered the installation of the Condor system.

Page 18: Installation of a Condor Supercomputing pool

Installation of Condor was a success.

Expansion of the cluster is easy and can be done efficiently with minimal cost in resources.

Management of and programming with Condor can be done at the undergraduate level and is encouraged.

Page 19: Installation of a Condor Supercomputing pool

Familiarize more of the CERSER teams with the Condor software.

Continue the expansion of the Condor pool.

Link ECSU to the Polar Grid network.

Encourage the development of programs to aid future CERSER research projects.

Page 20: Installation of a Condor Supercomputing pool

1. Andrew S. Tanenbaum and Maarten van Steen (2002): Distributed Systems: Principles and Paradigms. New Jersey: Prentice-Hall Inc.

2. C. Amza, A.L. Cox, S. Dwarkadas, P. Keleher, H. Lu, R. Rajamony, W. Yu, and W. Zwaenepoel. TreadMarks: Shared memory computing on networks of workstations, to appear in IEEE Computer (draft copy): www.cs.rice.edu/willy/TreadMarks/papers.html.

3. A.J. van der Steen, An evaluation of some Beowulf clusters, Technical Report WFI-00-07, Utrecht University, Dept. of Computational Physics, December 2000. (Also available through www.euroben.nl, directory reports/.)

4. A.J. van der Steen, Overview of recent supercomputers and high-end servers, June 2005, www.euroben.nl, directory reports/.

5. http://www.cs.wisc.edu/condor/manual/v7.0/

6. http://boinc.berkeley.edu/trac/wiki/BoincIntro

7. http://www.supercomputingonline.com/ads.php

Page 21: Installation of a Condor Supercomputing pool