
CLUSTER COMPUTING

A SEMINAR REPORT SUBMITTED TO

COCHIN UNIVERSITY OF SCIENCE AND TECHNOLOGY

In partial fulfillment of the requirements for the award of the degree of

MASTER OF COMPUTER APPLICATIONS

Submitted by

STIMI K.O.

(Register No: 95550052)

ER&DCI INSTITUTE OF TECHNOLOGY

Vellayambalam, Thiruvananthapuram.


ER&DCI INSTITUTE OF TECHNOLOGY

CENTRE FOR DEVELOPMENT OF ADVANCED COMPUTING

Thiruvananthapuram

CERTIFICATE

Certified that this is a bona fide record of the seminar work entitled "CLUSTER COMPUTING" by Stimi K.O., in partial fulfillment of the requirements for the award of the degree of Master of Computer Applications of Cochin University of Science and Technology, during the period 2005-2008.

Internal Examiner

Place: Thiruvananthapuram


Acknowledgement

As I submit the final complete form of my seminar entitled "CLUSTER COMPUTING", I wish to express my gratitude to all who helped and supported me in bringing it to completion.

First of all, I want to express my sincere gratitude to Smt Kumari Roshni V S, Principal, ER&DCI Institute of Technology, for her inspiration and constant encouragement, which made me take up this seminar and bring it to completion.

I also express my deep sense of gratitude to Sri Parameswaran Nampoothiri, Mrs Hudlin Leo, and the teaching faculty at ER&DCI IT, who conscientiously helped me to complete this seminar.

Above all, I am extremely thankful to God for giving me the ability and endurance to complete this seminar.


ABSTRACT

A computer cluster is a group of loosely coupled computers that work together so closely that in many respects they can be viewed as a single computer. Clusters are commonly connected through fast local area networks. They are usually deployed to improve speed and/or reliability over that provided by a single computer, while typically being much more cost-effective than single computers of comparable speed or reliability. Cluster computing has emerged as a result of the convergence of several trends, including the availability of inexpensive high-performance microprocessors and high-speed networks, and the development of standard software tools for high-performance distributed computing. Clusters have evolved to support applications ranging from e-commerce to high-performance database applications. Clustering has been available since the 1980s, when it was used in DEC's VMS systems. IBM's Sysplex is a cluster approach for mainframe systems. Microsoft, Sun Microsystems, and other leading hardware and software companies offer clustering packages that are said to offer scalability as well as availability. Cluster computing can also be used as a relatively low-cost form of parallel processing for scientific and other applications that lend themselves to parallel operations.


CONTENTS

1. Introduction

2. History

3. Clusters

4. Why Clusters?

5. Comparing Old and New

6. Logical View of Clusters

7. Architecture

8. Components of Cluster Computer

9. Cluster Classifications

10. Issues to be Considered

11. Future Trends

12. Conclusion

13. Reference


INTRODUCTION

Computing is an evolutionary process. Five generations of development history, with each generation improving on the previous one's technology, architecture, software, applications, and representative systems, make that clear. As part of this evolution, computing requirements driven by applications have always outpaced the available technology, so system designers have always needed to seek faster, more cost-effective computer systems. Parallel and distributed computing provides the best solution by offering computing power that greatly exceeds the technological limitations of single-processor systems. Unfortunately, although the parallel and distributed computing concept has been with us for over three decades, the high cost of multiprocessor systems has blocked commercial success so far. Today, a wide range of applications are hungry for higher computing power, and even though single-processor PCs and workstations can now provide extremely fast processing, the even faster execution that multiple processors can achieve by working concurrently is still needed. Now, finally, costs are falling as well. Networked clusters of commodity PCs and workstations using off-the-shelf processors and communication platforms such as Myrinet, Fast Ethernet, and Gigabit Ethernet are becoming increasingly cost-effective and popular. This concept, known as cluster computing, will surely continue to flourish: clusters can provide enormous computing power that a pool of users can share or that can be collectively used to solve a single application. In addition, clusters avoid the very high cost that led to the demise of massively parallel machines.

Clusters, built using commodity-off-the-shelf (COTS) hardware components and free or commonly used software, are playing a major role in solving large-scale science, engineering, and commercial applications. Cluster computing has emerged as a result of the convergence of several trends, including the availability of inexpensive high-performance microprocessors and high-speed networks, the development of standard software tools for high-performance distributed computing, and the increasing need of computing power for computational science and commercial applications.


CLUSTER HISTORY

The first commodity clustering product was ARCnet, developed by Datapoint in 1977. ARCnet wasn't a commercial success, and clustering didn't really take off until DEC released its VAXcluster product in the 1980s for the VAX/VMS operating system. The ARCnet and VAXcluster products not only supported parallel computing but also shared file systems and peripheral devices. They were intended to provide the advantages of parallel processing while maintaining data reliability and uniqueness. VAXcluster, now VMScluster, is still available on OpenVMS systems from HP running on Alpha and Itanium systems.

The history of cluster computing is intimately tied up with the evolution of networking technology. As networking technology has become cheaper and faster, cluster computers have become significantly more attractive.

How to run applications faster? There are three ways to improve performance:

Work harder

Work smarter

Get help

Era of computing:

Rapid technical advances

Recent advances in VLSI technology

Software technology

Grand challenge applications have become the main driving force

Parallel computing

CLUSTERS

Extraordinary technological improvements over the past few years in areas such as microprocessors, memory, buses, networks, and software have made it possible to assemble groups of inexpensive personal computers and/or workstations into a cost-effective system that functions in concert and possesses tremendous processing power. Cluster computing is not new, but in company with other technical capabilities, particularly in the area of networking, this class of machines is becoming a high-performance platform for parallel and distributed applications. Scalable computing clusters, ranging from a cluster of (homogeneous or heterogeneous) PCs or workstations to SMPs (Symmetric Multiprocessors), are rapidly becoming the standard platforms for high-performance and large-scale computing. A cluster is a group of independent computer systems and thus forms a loosely coupled multiprocessor system, as shown in the figure.


However, the cluster computing concept also poses three pressing research challenges:

A cluster should be a single computing resource and provide a single system image. This is in contrast to a distributed system, where the nodes serve only as individual resources.

It must provide scalability by letting the system scale up or down. The scaled-up system should provide more functionality or better performance. The system's total computing power should increase proportionally to the increase in resources. The main motivation for a scalable system is to provide a flexible, cost-effective information-processing tool.

The supporting operating system and communication mechanism must be efficient enough to remove the performance bottlenecks.

The concept of Beowulf clusters originated at the Center of Excellence in Space Data and Information Sciences (CESDIS), located at the NASA Goddard Space Flight Center in Maryland. The goal of building a Beowulf cluster is to create a cost-effective parallel computing system from commodity components to satisfy specific computational requirements for the earth and space sciences community. The first Beowulf cluster was built from 16 Intel DX4 processors connected by a channel-bonded 10 Mbps Ethernet, and it ran the Linux operating system. It was an instant success, demonstrating the concept of using a commodity cluster as an alternative choice for high-performance computing (HPC).

After the success of the first Beowulf cluster, several more were built by CESDIS using several generations and families of processors and networks. Beowulf is a concept of clustering commodity computers to form a parallel, virtual supercomputer. It is easy to build a unique Beowulf cluster from components that you consider most appropriate for your applications. Such a system can provide a cost-effective way to gain features and benefits (fast and reliable services) that have historically been found only on more expensive proprietary shared-memory systems. The typical architecture of a cluster is shown in Figure 3. As the figure illustrates, numerous design choices exist for building a Beowulf cluster.

WHY CLUSTERS?

The question may arise why clusters are designed and built when perfectly good commercial supercomputers are available on the market. The answer is that the latter are expensive, while clusters are surprisingly powerful. The supercomputer has come to play a larger role in business applications. In areas from data mining to fault-tolerant performance, clustering technology has become increasingly important. Commercial products have their place, and there are perfectly good reasons to buy a commercially produced supercomputer, if it is within our budget and our applications can keep the machine busy all the time. We will also need a data center to keep it in, and then there is the budget to keep up with the maintenance and upgrades required to keep our investment up to par. However, many who need to harness supercomputing power do not buy supercomputers because they cannot afford them, and such machines are also very difficult to upgrade. Clusters, on the other hand, are a cheap and easy way to take off-the-shelf components and combine them into a single supercomputer. In some areas of research, clusters are actually faster than commercial supercomputers. Clusters also have the distinct advantage that they are simple to build using components available from hundreds of sources. We do not even have to use new equipment to build a cluster.

Price/Performance

The most obvious benefit of clusters, and the most compelling reason for the growth in their use, is that they have significantly reduced the cost of processing power. One indication of this phenomenon is the Gordon Bell Award for Price/Performance Achievement in Supercomputing, which in many of the last several years has been awarded to Beowulf-type clusters. One of the most recent entries, the Avalon cluster at Los Alamos National Laboratory, "demonstrates price/performance an order of magnitude superior to commercial machines of equivalent performance." This reduction in the cost of entry to high-performance computing (HPC) has been due to the commoditization of both hardware and software, particularly over the last 10 years. All the components of computers have dropped dramatically in price in that time. The components critical to the development of low-cost clusters are:

1. Processors - commodity processors are now capable of computational power previously reserved for supercomputers; witness Apple Computer's recent ad campaign touting the G4 Macintosh as a supercomputer.

2. Memory - the memory used by these processors has dropped in cost right along with the processors.

3. Networking components - the most recent group of products to experience commoditization and dramatic cost decreases is networking hardware. High-speed networks can now be assembled with these products for a fraction of the cost necessary only a few years ago.

4. Motherboards, buses, and other subsystems - all of these have become commodity products, allowing the assembly of affordable computers from off-the-shelf components.

COMPARING OLD AND NEW


Today, open standards-based HPC systems are being used to solve problems ranging from high-end, floating-point-intensive scientific and engineering problems to data-intensive tasks in industry. Some of the reasons why HPC clusters outperform RISC-based systems include:

Collaboration

Scientists can collaborate in real time across dispersed locations, bridging isolated islands of scientific research and discovery, when HPC clusters are based on open-source and building-block technology.

Scalability

HPC clusters can grow in overall capacity because processors and nodes can be added as demand increases.

Availability

Because single points of failure can be eliminated, if any one system component goes down, the system as a whole or the solution (multiple systems) stays highly available.

Ease of technology refresh

Processors, memory, disk, or operating system (OS) technology can be easily updated, and new processors and nodes can be added or upgraded as needed.

Affordable service and support

Compared to proprietary systems, the total cost of ownership can be much lower. This includes service, support, and training.

Vendor lock-in

The age-old problem of proprietary vs. open systems that use industry-accepted standards is eliminated.

System manageability

The installation, configuration, and monitoring of key elements of proprietary systems is usually accomplished with proprietary technologies, complicating system management. The servers of an HPC cluster can be easily managed from a single point using readily available network infrastructure and enterprise management software.

Reusability of components

Commercial components can be reused, preserving the investment. For example, older nodes can be deployed as file/print servers, web servers, or other infrastructure servers.

Disaster recovery

Large SMPs are monolithic entities located in one facility. HPC systems can be collocated or geographically dispersed to make them less susceptible to disaster.


LOGICAL VIEW OF CLUSTER

A Beowulf cluster uses a multicomputer architecture, as depicted in the figure. It features a parallel computing system that usually consists of one or more master nodes and one or more compute nodes, or cluster nodes, interconnected via widely available network interconnects. All of the nodes in a typical Beowulf cluster are commodity systems (PCs, workstations, or servers) running commodity software such as Linux.

The master node acts as a server for the Network File System (NFS) and as a gateway to the outside world. As an NFS server, the master node provides user file space and other common system software to the compute nodes via NFS. As a gateway, the master node allows users to gain access through it to the compute nodes. Usually, the master node is the only machine that is also connected to the outside world, using a second network interface card (NIC). The sole task of the compute nodes is to execute parallel jobs. In most cases, therefore, the compute nodes do not have keyboards, mice, video cards, or monitors. All access to the client nodes is provided via remote connections from the master node. Because compute nodes do not need to access machines outside the cluster, nor do machines outside the cluster need to access compute nodes directly, compute nodes commonly use private IP addresses, such as the 10.0.0.0/8 or 192.168.0.0/16 address ranges. From a user's perspective, a Beowulf cluster appears as a Massively Parallel Processor (MPP) system. The most common methods of using the system are to access the master node either directly or through Telnet or remote login from personal workstations. Once on the master node, users can prepare and compile their parallel applications, and also spawn jobs on a desired number of compute nodes in the cluster. Applications must be written in parallel style and use the message-passing programming model. Jobs of a parallel application are spawned on compute nodes, which work collaboratively until the application finishes. During the execution, compute nodes use standard message-passing middleware, such as the Message Passing Interface (MPI) and Parallel Virtual Machine (PVM), to exchange information.
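To make the message-passing model concrete, the following minimal sketch (an illustration, not part of the original report) shows how each job spawned on a compute node can identify its rank and the node it runs on. It assumes an MPI implementation such as MPICH or Open MPI is installed, that the code is compiled on the master node with mpicc, and that it is launched with something along the lines of mpirun -np 4 ./hello; the file name and process count are made-up examples.

/* hello.c - minimal MPI sketch: each process reports its rank and host node. */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    int rank, size, name_len;
    char node_name[MPI_MAX_PROCESSOR_NAME];

    MPI_Init(&argc, &argv);                        /* join the parallel job         */
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);          /* this process's id (0..size-1) */
    MPI_Comm_size(MPI_COMM_WORLD, &size);          /* total number of processes     */
    MPI_Get_processor_name(node_name, &name_len);  /* which compute node we run on  */

    printf("Process %d of %d running on node %s\n", rank, size, node_name);

    MPI_Finalize();                                /* leave the parallel job        */
    return 0;
}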


ARCHITECTURE

A cluster is a type of parallel or distributed processing system which consists of a collection of interconnected stand-alone computers cooperatively working together as a single, integrated computing resource.

A node is a single- or multiprocessor system with memory, I/O facilities, and an OS.

A cluster:

generally consists of 2 or more computers (nodes) connected together

resides in a single cabinet, or is physically separated and connected via a LAN

appears as a single system to users and applications

provides a cost-effective way to gain features and benefits


Three principal features usually provided by cluster computing are availability, scalability, and simplification. Availability is provided by the cluster of computers operating as a single system and continuing to provide services even when one of the individual computers is lost due to a hardware failure or other reason. Scalability is provided by the inherent ability of the overall system to allow new components, such as computers, to be added as the overall system's load increases. Simplification comes from the ability of the cluster to allow administrators to manage the entire group as a single system. This greatly simplifies the management of groups of systems and their applications. The goal of cluster computing is to facilitate sharing a computing load over several systems without either the users or the administrators needing to know that more than one system is involved. The Windows NT Server edition of the Windows operating system is an example of a base operating system that has been modified to include an architecture that allows a cluster computing environment to be established.

Cluster computing has been employed for over fifteen years, but it is the recent demand for higher availability in small businesses that has caused an explosion in this field. Electronic databases and electronic malls have become essential to the daily operation of small businesses. Access to this critical information has created a large demand for the principal features of cluster computing.


There are some key concepts that must be understood when forming a cluster computing resource. Nodes or systems are the individual members of a cluster. They can be computers, servers, and other such hardware, although each node generally has memory and processing capabilities. If one node becomes unavailable, the other nodes can carry the demand load so that applications or services are always available. There must be at least two nodes to compose a cluster structure; otherwise they are just called servers. The collection of software on each node that manages all cluster-specific activity is called the cluster service. The cluster service manages all of the resources, the canonical items in the system, and sees them as identical opaque objects. Resources can be such things as physical hardware devices, like disk drives and network cards, or logical items, like logical disk volumes, TCP/IP addresses, applications, and databases.


When a resource is providing its service on a specific node, it is said to be on-line. A collection of resources to be managed as a single unit is called a group. Groups contain all of the resources necessary to run a specific application and, if need be, to connect to the service provided by the application in the case of client systems. These groups allow administrators to combine resources into larger logical units so that they can be managed as a unit. This, of course, means that all operations performed on a group affect all resources contained within that group. Normally the development of a cluster computing system occurs in phases. The first phase involves establishing the underpinnings in the base operating system and building the foundation of the cluster components. These should focus on providing enhanced availability to key applications using storage that is accessible to two nodes. The following stages occur as demand increases and should allow for much larger clusters to be formed. These larger clusters should have a true distribution of applications, higher-performance interconnects, widely distributed storage for easy accessibility, and load balancing. Cluster computing will become even more prevalent in the future because of the growing needs and demands of businesses as well as the spread of the Internet.

Clustering Concepts

Clusters are in fact quite simple. They are a bunch of computers tied together with a network, working on a large problem that has been broken down into smaller pieces. There are a number of different strategies we can use to tie them together, and a number of different software packages that can be used to make the software side of things work.


Parallelism

The name of the game in high-performance computing is parallelism. It is the quality that allows something to be done in parts that work independently, rather than as a task with so many interlocking dependencies that it cannot be further broken down. Parallelism operates at two levels: hardware parallelism and software parallelism.

Hardware Parallelism

On one level, hardware parallelism deals with the CPU of an individual system and how we can squeeze performance out of sub-components of the CPU that can speed up our code. At another level there is the parallelism that is gained by having multiple systems working on a computational problem in a distributed fashion. These forms are known as 'fine-grained' parallelism, for parallelism inside the CPU or across multiple CPUs in the same system, and 'coarse-grained' parallelism, for a collection of separate systems acting in concert.

CPU Level Parallelism

A computer's CPU is commonly pictured as a device that operates on one instruction after another in a straight line, always completing one step or instruction before a new one is started. But new CPU architectures have an inherent ability to do more than one thing at once. The logic of the CPU chip divides the CPU into multiple execution units, which allow the CPU to attempt to process more than one instruction at a time. Two hardware features of modern CPUs support multiple execution units: the cache, a small memory inside the CPU, and the pipeline, a small area of memory inside the CPU where instructions that are next in line to be executed are stored. Both the cache and the pipeline allow impressive increases in CPU performance.


System level Parallelism

It is the parallelism of multiple nodes coordinating to work on a problem in parallel that gives the cluster its power. There are other levels at which even more parallelism can be introduced into this system. For example, if we decide that each node in our cluster will be a multi-CPU system, we introduce a fundamental degree of parallel processing at the node level. Having more than one network interface on each node introduces communication channels that may be used in parallel to communicate with other nodes in the cluster. Finally, if we use multiple disk drive controllers in each node, we create parallel data paths that can be used to increase the performance of the I/O subsystem.

Software Parallelism

Software parallelism is the ability to find well-defined areas in a problem we want to solve that can be broken down into self-contained parts. These parts are the program elements that can be distributed, and they give us the speedup that we want to get out of a high-performance computing system. Before we can run a program on a parallel cluster, we have to ensure that the problem we are trying to solve is amenable to being done in a parallel fashion. Almost any problem that is composed of smaller subproblems that can be quantified can be broken down into smaller problems and run on a node of a cluster.
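To illustrate what such a decomposition can look like in practice, the sketch below (a made-up example; the helper chunk_range is introduced purely for illustration) splits an array-style summation into self-contained chunks. Each chunk depends only on its own index range, so each could be handed to a different node or process and the partial results combined afterwards.

/* decompose.c - hypothetical sketch: split a summation into independent chunks. */
#include <stdio.h>

/* Compute the half-open range [*begin, *end) owned by worker rank
 * out of nworkers, over a problem of size n. */
static void chunk_range(long n, int rank, int nworkers, long *begin, long *end)
{
    long base = n / nworkers, extra = n % nworkers;
    *begin = rank * base + (rank < extra ? rank : extra);
    *end   = *begin + base + (rank < extra ? 1 : 0);
}

int main(void)
{
    const long n = 1000000;
    const int nworkers = 4;              /* pretend these are 4 cluster nodes   */
    double total = 0.0;

    for (int rank = 0; rank < nworkers; rank++) {
        long begin, end;
        chunk_range(n, rank, nworkers, &begin, &end);

        double partial = 0.0;            /* this loop is fully independent ...  */
        for (long i = begin; i < end; i++)
            partial += (double)i;
        total += partial;                /* ... only the combine step is shared */
    }

    printf("sum of 0..%ld = %.0f\n", n - 1, total);
    return 0;
}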

System-Level Middleware

System-level middleware offers a Single System Image (SSI) and a high-availability infrastructure for processes, memory, storage, I/O, and networking. The single system image illusion can be implemented using hardware or software infrastructure. This unit focuses on SSI at the operating system or subsystem level.


A modular architecture for SSI allows services provided by lower-level layers to be used for the implementation of higher-level services. This unit discusses design issues, architecture, and representative systems for job/resource management, network RAM, software RAID, single I/O space, and virtual networking. A number of operating systems have proposed SSI solutions, including MOSIX, UnixWare, and Solaris MC. It is important to discuss one or more such systems, as they help students to understand architecture and implementation issues.

Message Passing Primitives

Although new high-performance protocols are available for cluster computing, some instructors may want to provide students with a brief introduction to message-passing programs using the BSD Sockets interface to the Transmission Control Protocol/Internet Protocol (TCP/IP) before introducing more complicated parallel programming with distributed-memory programming tools. If students have already had a course in data communications or computer networks, then this unit should be skipped. Students should have access to a networked computer lab with the Sockets libraries enabled; Sockets usually come installed on Linux workstations.
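As a rough sketch of the kind of BSD Sockets exercise this unit refers to (the peer address 10.0.0.2, port 5000, and the message text are made-up example values, not from the report), the client below opens a TCP connection to another node and sends it a short message.

/* sock_client.c - minimal BSD Sockets sketch: send one message over TCP. */
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

int main(void)
{
    int fd = socket(AF_INET, SOCK_STREAM, 0);       /* TCP socket           */
    if (fd < 0) { perror("socket"); return 1; }

    struct sockaddr_in peer;
    memset(&peer, 0, sizeof(peer));
    peer.sin_family = AF_INET;
    peer.sin_port   = htons(5000);                  /* example port         */
    inet_pton(AF_INET, "10.0.0.2", &peer.sin_addr); /* example compute node */

    if (connect(fd, (struct sockaddr *)&peer, sizeof(peer)) < 0) {
        perror("connect");
        close(fd);
        return 1;
    }

    const char *msg = "hello from the master node\n";
    write(fd, msg, strlen(msg));                    /* send the message     */
    close(fd);
    return 0;
}

A matching server would create a socket, bind it to the port, listen, and accept the connection; TCP/IP is one of the transports that higher-level message-passing libraries such as MPI can run over.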

Parallel Programming Using MPI

An introduction to distributed-memory programming using a standard tool such as the Message Passing Interface (MPI) [23] is basic to cluster computing. Current versions of MPI generally assume that programs will be written in C, C++, or Fortran. However, Java-based versions of MPI are becoming available.
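As an indicative example of the distributed-memory style this unit covers (a sketch only, assuming a working MPI installation such as MPICH or Open MPI), each process below integrates part of 4/(1+x^2) over [0,1] and MPI_Reduce combines the partial results into an approximation of pi on rank 0.

/* pi_mpi.c - sketch of distributed-memory programming with MPI.
 * Compile with mpicc and launch with mpirun; exact commands depend on
 * the MPI implementation installed on the cluster. */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    const long nsteps = 1000000;             /* number of integration intervals */
    const double h = 1.0 / (double)nsteps;
    int rank, size;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    /* Each rank handles every size-th interval, starting at its own rank. */
    double partial = 0.0, pi = 0.0;
    for (long i = rank; i < nsteps; i += size) {
        double x = h * ((double)i + 0.5);    /* midpoint of interval i */
        partial += 4.0 / (1.0 + x * x);
    }
    partial *= h;

    /* Combine the partial sums on rank 0. */
    MPI_Reduce(&partial, &pi, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);

    if (rank == 0)
        printf("pi is approximately %.12f (computed by %d processes)\n", pi, size);

    MPI_Finalize();
    return 0;
}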


Application-Level Middleware

Application-level middleware is the layer of software between the operating system and applications. Middleware provides various services required by an application to function correctly. A course in cluster programming can include some coverage of middleware tools such as CORBA, Remote Procedure Call, Java Remote Method Invocation (RMI), or Jini. Sun Microsystems has produced a number of Java-based technologies that can become units in a cluster programming course, including the Java Development Kit (JDK) product family, which consists of the essential tools and APIs for all developers writing in the Java programming language, through to APIs for telephony (JTAPI), database connectivity (JDBC), 2D and 3D graphics, security, and electronic commerce. These technologies enable Java to interoperate with many other devices, technologies, and software standards.

Single System Image

A single system image is the illusion, created by software or hardware, that presents a collection of resources as one more powerful resource. SSI makes the cluster appear like a single machine to the user, to applications, and to the network. A cluster without an SSI is not a cluster. Every SSI has a boundary. SSI support can exist at different levels within a system, one able to be built on another.


Single System Image Benefits

Provide a simple, straightforward view of all system resources and activities from any node of the cluster

Free the end user from having to know where an application will run

Free the operator from having to know where a resource is located

Let the user work with familiar interfaces and commands, and allow administrators to manage the entire cluster as a single entity

Reduce the risk of operator errors, with the result that end users see improved reliability and higher availability of the system

Allow centralized or decentralized system management and control, avoiding the need for skilled administrators for system administration

Present multiple, cooperating components of an application to the administrator as a single application

Greatly simplify system management

Provide location-independent message communication

Help track the locations of all resources so that there is no longer any need for system operators to be concerned with their physical location

Provide transparent process migration and load balancing across nodes

Improve system response time and performance

High-speed networks

The network is the most critical part of a cluster. Its capabilities and performance directly influence the applicability of the whole system for HPC. Cluster interconnects range from Local/Wide Area Networks (LAN/WAN), such as Fast Ethernet and ATM, to System Area Networks (SAN), such as Myrinet and Memory Channel.

Example: Fast Ethernet

100 Mbps over UTP or fiber-optic cable

MAC protocol: CSMA/CD


COMPONENTS OF CLUSTER COMPUTER

1. Multiple High Performance Computers

a. PCs

b. Workstations

c. SMPs (CLUMPS)

2. State of the art Operating Systems

a. Linux (Beowulf)

b. Microsoft NT (Illinois HPVM)

c. SUN Solaris (Berkeley NOW)

d. HP UX (Illinois - PANDA)

e. OS gluing layers (Berkeley GLUnix)

3. High Performance Networks/Switches

a. Ethernet (10 Mbps)

b. Fast Ethernet (100 Mbps)

c. Gigabit Ethernet (1 Gbps)

d. Myrinet (1.2 Gbps)

e. Digital Memory Channel

f. FDDI

4. Network Interface Card

a. Myrinet has NIC

b. User-level access support

5. Fast Communication Protocols and Services

a. Active Messages (Berkeley)

b. Fast Messages (Illinois)

c. U-net (Cornell)

d. XTP (Virginia)


6. Cluster Middleware

a. Single System Image (SSI)

b. System Availability (SA) Infrastructure

7. Hardware

a. DEC Memory Channel, DSM (Alewife, DASH), SMP Techniques

8. Operating System Kernel/Gluing Layers

a. Solaris MC, Unixware, GLUnix

9. Applications and Subsystems

a. Applications (system management and electronic forms)

b. Runtime systems (software DSM, PFS etc.)

c. Resource management and scheduling software (RMS)

10. Parallel Programming Environments and Tools

a. Threads (PCs, SMPs, NOW..)

b. MPI

c. PVM

d. Software DSMs (Shmem)

e. Compilers

f. RAD (rapid application development tools)

g. Debuggers

h. Performance Analysis Tools

i. Visualization Tools

11. Applications

a. Sequential

b. Parallel / Distributed (Cluster-aware app.)


CLUSTER CLASSIFICATIONS

Clusters are classified into several categories based on factors such as 1) application target, 2) node ownership, 3) node hardware, 4) node operating system, and 5) node configuration.

Clusters based on Application Target are again classified into two:

High Performance (HP) Clusters

High Availability (HA) Clusters

Clusters based on Node Ownership are again classified into two:

Dedicated clusters

Non-dedicated clusters

Clusters based on Node Hardware are again classified into three:

Clusters of PCs (CoPs)

Clusters of Workstations (COWs)

Clusters of SMPs (CLUMPs)

Clusters based on Node Operating System are again classified into:

Linux Clusters (e.g., Beowulf)

Solaris Clusters (e.g., Berkeley NOW)

Digital VMS Clusters

HP-UX Clusters

Microsoft Wolfpack Clusters


Clusters based on Node Configuration are again classified into:

Homogeneous Clusters - all nodes have similar architectures and run the same OS

Heterogeneous Clusters - nodes have different architectures and run different OSs

ISSUES TO BE CONSIDERED

Cluster Networking

If you are mixing hardware that has different networking technologies, there will be large differences in the speed with which data will be accessed and with which individual nodes can communicate. If it is in your budget, make sure that all of the machines you want to include in your cluster have similar networking capabilities and, if at all possible, network adapters from the same manufacturer.

Cluster Software

You will have to build versions of the clustering software for each kind of system you include in your cluster.

Programming

Our code will have to be written to support the lowest common denominator for data types supported by the least powerful node in our cluster. With mixed machines, the more powerful machines will have attributes that cannot be attained in the less powerful ones.

Timing

This is the most problematic aspect of heterogeneous clusters. Since these machines have different performance profiles, our code will execute at different rates on the different kinds of nodes. This can cause serious bottlenecks if a process on one node is waiting for the results of a calculation on a slower node. The second kind of heterogeneous cluster is made from different machines in the same architectural family: e.g. a collection of Intel boxes where the machines are different generations, or machines of the same generation from different manufacturers.

Network Selection

There are a number of different kinds of network topologies, including buses, cubes of various degrees, and grids/meshes. These network topologies are implemented by using one or more network interface cards (NICs) installed in the head node and compute nodes of our cluster.

Speed Selection

No matter what topology you choose for your cluster, you will want the fastest network that your budget allows. Fortunately, the availability of high-speed computers has also forced the development of high-speed networking systems. Examples are 10 Mbit Ethernet, 100 Mbit Ethernet, gigabit networking, channel bonding, etc.


FUTURE TRENDS - GRID COMPUTING

As computer networks become cheaper and faster, a new computing paradigm, called the Grid, has evolved. The Grid is a large system of computing resources that performs tasks and provides users with a single point of access, commonly based on a World Wide Web interface, to these distributed resources. Users consider the Grid to be a single computational resource. Resource management software, frequently referred to as middleware, accepts jobs submitted by users and schedules them for execution on appropriate systems in the Grid, based upon resource management policies. Users can submit thousands of jobs at a time without being concerned about where they run. The Grid may scale from single systems to supercomputer-class compute farms that utilize thousands of processors. Depending on the type of application, the interconnection between the Grid parts can be performed using dedicated high-speed networks or the Internet. By providing scalable, secure, high-performance mechanisms for discovering and negotiating access to remote resources, the Grid promises to make it possible for scientific collaborations to share resources on an unprecedented scale, and for geographically distributed groups to work together in ways that were previously impossible. Examples of new applications that benefit from using Grid technology include the coupling of advanced scientific instrumentation or desktop computers with remote supercomputers; collaborative design of complex systems via high-bandwidth access to shared resources; ultra-large virtual supercomputers constructed to solve problems too large to fit on any single computer; and rapid, large-scale parametric studies.

Grid technology is currently under intensive development. Major Grid projects include NASA's Information Power Grid, two NSF Grid projects (the NCSA Alliance's Virtual Machine Room and NPACI), the European DataGrid Project, and the ASCI Distributed Resource Management project. The first Grid tools are also already available for developers. The Globus Toolkit [20] represents one such example and includes a set of services and software libraries to support Grids and Grid applications.


CONCLUSION

Clusters are promising:

They solve the parallel processing paradox

They offer incremental growth and match funding patterns

New trends in hardware and software technologies are likely to make clusters more promising and fill the SSI gap

Cluster-based supercomputers (Linux-based clusters) can be seen everywhere!


REFERENCE

www.buyya.com

www.beowulf.org

www.clustercomp.org

www.sgi.com

www.thu.edu.tw/~sci/journal/v4/000407.pdf

www.dgs.monash.edu.au/~rajkumar/cluster

www.cfi.lu.lv/teor/pdf/LASC_short.pdf

www.webopedia.com

www.howstuffworks.com
