TRANSCRIPT
Computer Clustering Technologies
By
Manish Chopra
Corporate Trainer
IIHT Ltd. Bangalore
August, 2008
Computer Clustering Technologies – IIHT Ltd.
Though the clustering of computers is a deep and dedicated area of research in computer science, the
scope of this document is limited to the following topics:
1. Introduction to Clusters
2. History of Computing Clusters
3. Types and Categorization of Clusters
4. Cluster Components and Resources
5. Cluster Implementations
1. Introduction to Clusters
In computer science, “clustering” refers to the study, practice and implementation of technology
infrastructure in the form of a collection of interconnected computer systems collaborating to
achieve specific end goals. The concept of a cluster involves taking two or more computers and
organizing them to work together to provide higher availability, reliability and scalability than
can be obtained by using a single computer. Some clusters are designed to deliver high
performance, with respect to the speed and efficiency of the applications that run on them.
When any kind of failure occurs in a cluster, resources are immediately redirected to
alternatives and the workload is redistributed. The end user may experience only a short lag in
service, most often without even knowing that a component in the cluster has broken down.
A computer cluster may be a group of loosely coupled computers that work together and, to the
end user, can be viewed as a single large computing system. Clusters are commonly connected
through high-speed networks, and are usually designed and deployed to improve speed and
reliability, while typically being more cost-effective. A well-designed cluster solution uses
redundant systems and components so that the failure of an individual server does not affect
the availability of the entire cluster.
2. History of Computing Clusters
The formal engineering basis of cluster computing as a means of doing parallel work of any sort
was established by Gene Amdahl of IBM, who in 1967 published a seminal paper on parallel
processing (http://www-inst.eecs.berkeley.edu/~n252/paper/Amdahl.pdf). The result came to be
known as Amdahl's Law, which describes mathematically the speedup one can expect from
parallelizing any given, otherwise serially performed, task on a parallel architecture. The paper
defined the engineering basis for both multiprocessor computing and cluster computing, where the
primary differentiator is whether the inter-processor communications are supported "inside" the
computer or "outside" it on an external network.
Coincidentally the history of early computer clusters is directly tied with the history of early
networks, as one of the primary motivations for the development of a network was to link
computing resources, creating a de facto computer cluster. Packet switching networks were
conceptually invented by the RAND Corporation in 1962. Using the concept of a packet switched
network, the ARPANET project succeeded in creating the world's first network based computer
cluster in 1969 by linking four different computer centers. ARPANET (Advanced Research
Projects Agency Network) was the world's first operational packet switching network, and is
known as the predecessor of the present global Internet.
Figure 1 - ARPANET logical map, March 1977
The development of customer-built and research clusters proceeded hand in hand with that of both
networks and the UNIX operating system from the early 1970s, as both TCP/IP and the
Xerox PARC project created and formalized protocols for network-based communications.
However, it wasn't until 1983 that the protocols and tools for easily doing remote job distribution
and file sharing were defined (in the context of BSD UNIX, as implemented by Sun Microsystems)
and hence became generally available commercially, along with a shared file system.
The first commercial clustering product was ARCnet, developed by Datapoint in 1977. ARCnet
wasn't a commercial success, and clustering didn't really take off until DEC released its
VAXcluster product in 1984 for the VAX/VMS operating system. The ARCnet and
VAXcluster products supported not only parallel computing but also shared file systems and
peripheral devices. They were intended to provide the advantages of parallel processing while
maintaining data reliability and uniqueness. VAXcluster, now VMScluster, is still available on
OpenVMS systems from HP running on Alpha and Itanium systems.
Figure 2 - A Beowulf Cluster
The Beowulf cluster shown above, in the lab of the College of Engineering at Boise State
University, comprises 61 nodes with 122 2.4 GHz Xeon processors. The lab is sponsored by the
National Science Foundation.
No history of commodity compute clusters would be complete without noting the pivotal role
played by the development of Parallel Virtual Machine (PVM) software in 1989
(http://www.csm.ornl.gov/pvm/pvm_home.html). This open source software based on TCP/IP
communications enabled the instant creation of a virtual supercomputer, a high performance
compute cluster made out of any TCP/IP-connected systems. Free-form heterogeneous clusters
built on top of this model rapidly achieved total throughput in FLOPS (Floating-point Operations
Per Second) that greatly exceeded what was available even from the most expensive supercomputers.
In 1993, PVM and the advent of inexpensive networked computers led NASA to start a project to
build supercomputers out of commodity clusters. NASA developed a Beowulf cluster in 1995,
a compute cluster built on top of a commodity network for the specific purpose of
"being a supercomputer" capable of performing tightly coupled parallel HPC computations. This
in turn spurred the independent development of Grid computing as a named entity, although
Grid-style clustering had been around at least as long as the UNIX operating system and the
ARPANET themselves.
Two other noteworthy early commercial clusters were the Tandem Himalaya (a 1994
high-availability product) and the IBM S/390 Parallel Sysplex (also from 1994, primarily for business use).
3. Types and Categorization of Clusters
Understanding the main clustering technologies will allow us to see where a given approach to
clustering fits in, and how organizations use clustering to meet their goals. When multiple
computer systems are combined into a single "whole", two aspirations can be identified
for a cluster employing different hardware components:
• Parallel processing
• Application availability
These are not the only aspirations, but they are broad enough to categorize the key
technologies that involve clustering techniques. An immense amount of research has been done by
technology vendors on clustering techniques, since organizations' requirements for
a cluster implementation vary depending on the type and size of the cluster. Computing clusters
may be categorized into the following types:
1. High Performance Clusters
2. High Availability Clusters
3. Load Balancing Clusters
4. Database Clusters
5. Web Server Clusters
6. Storage Clusters
7. Single System Image
8. Grid computing
High Performance Clusters (HPC): High-performance clusters are implemented primarily to
provide increased performance by splitting a computational task across many different nodes in
the cluster, and are most commonly used in scientific computing. The nodes are "coupled" in
such a way that each runs its own instance of the operating system but presents a single interface
to the user. Events are managed via a central coordinator, or "coupling facility". Individual
nodes need access to a common pool of I/O devices, possibly via a Storage Area Network
(SAN). The underlying operating system provides some form of cluster-wide filesystem or
allows cluster-wide access to raw disk space. HPC clusters are optimized for workloads which
require jobs or processes happening on the separate cluster computer nodes to communicate
actively during the computation. These include computations where intermediate results from
one node's calculations will affect future calculations on other nodes.
Six of the Top 10 (http://www.top500.org) fastest computers in the world run as high-
performance clusters. Hewlett Packard offers the HyperPlex product to provide high-
performance parallel processing. One of the more popular HPC implementations is a cluster with
nodes running Linux as the OS and free software to implement the parallelism. This
configuration is often referred to as a Beowulf cluster. Such clusters commonly run custom
programs designed to exploit the parallelism available on HPC clusters. Many such programs
use libraries such as MPI, which are specially designed for writing scientific applications
for HPC computers.
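The divide-compute-combine pattern behind such clusters can be sketched in a few lines. This is only an illustration, with a Python thread pool standing in for the cluster's nodes; a real Beowulf cluster would distribute the chunks with MPI over the network.

```python
from concurrent.futures import ThreadPoolExecutor

def partial_sum(bounds):
    """The work one node performs on its own slice of the problem."""
    lo, hi = bounds
    return sum(range(lo, hi))

def cluster_sum(n, nodes=4):
    """Decompose [0, n) into one chunk per node, compute the chunks
    in parallel, then combine the partial results."""
    step = n // nodes
    chunks = [(i * step, n if i == nodes - 1 else (i + 1) * step)
              for i in range(nodes)]
    with ThreadPoolExecutor(max_workers=nodes) as pool:
        return sum(pool.map(partial_sum, chunks))

print(cluster_sum(1_000_000))  # same answer as sum(range(1_000_000))
```

Only the small partial results, not the raw data, are combined at the end; keeping that combine step cheap relative to the per-node work is what makes the decomposition pay off.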
High Availability Clusters (HA): Also known as failover clusters, these are computer clusters
implemented primarily to provide high availability of the services the cluster offers. They
operate by having redundant computers or nodes that are used to provide service when system
components fail. Normally, if a server running a particular application crashes, the application
will be unavailable until someone fixes the crashed server. HA clustering remedies this situation
by detecting hardware/software faults and immediately restarting the application on another
system without requiring administrative intervention, a process known as failover.
In these clusters, we are providing a computing resource that is able to sustain failures. Here, we
are looking at the types of outages that affect our ability to continue processing. Outages come in
two main forms: planned and unplanned. Planned outages include software and hardware
upgrades, maintenance, or regulatory conditions that effectively stop processing. Unplanned
outages are the difficult ones: power failures, failed hardware, software bugs, human error,
natural disasters, and terrorism, to name only a few. We don't know in advance when they are
going to happen.
Figure 3 - A two node High Availability Cluster network diagram
HA cluster implementations attempt to build redundancy into a cluster to eliminate single points
of failure, including multiple network connections and data storage that is multiply connected
via storage area networks. HA clusters usually use a heartbeat private network connection
to monitor the health and status of each node in the cluster.
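The heartbeat mechanism can be sketched as follows. This is a simplified, hypothetical model (the class and function names are ours): each node periodically reports in over the private network, a node whose heartbeat has not been seen within the timeout is declared failed, and its services are then restarted on a surviving node.

```python
import time

class HeartbeatMonitor:
    """Declare a node failed if no heartbeat arrives within `timeout` seconds."""
    def __init__(self, nodes, timeout=3.0, clock=time.monotonic):
        self.timeout = timeout
        self.clock = clock
        self.last_seen = {node: clock() for node in nodes}

    def beat(self, node):
        """Record a heartbeat received from `node` over the private network."""
        self.last_seen[node] = self.clock()

    def failed_nodes(self):
        now = self.clock()
        return [n for n, t in self.last_seen.items()
                if now - t > self.timeout]

def failover(monitor, services):
    """Reassign services owned by failed nodes to a surviving node."""
    dead = set(monitor.failed_nodes())
    alive = [n for n in monitor.last_seen if n not in dead]
    for svc, owner in services.items():
        if owner in dead and alive:
            services[svc] = alive[0]   # restart the service on a healthy node
    return services
```

Real HA packages such as Linux-HA layer quorum and fencing logic on top of this basic detection loop, to avoid "split-brain" situations where two nodes each believe the other is dead.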
There are many commercial implementations of High-Availability clusters for many operating
systems. Hewlett Packard's Serviceguard, IBM's HACMP, and Veritas Cluster Services are all
examples of High Availability Clusters. The Linux-HA project is one commonly used free
software HA package for the Linux Operating System.
Load Balancing Clusters: Load balancing is a technique to spread work between two or more
computers, network links, CPUs, hard drives, or other resources in order to get optimal resource
utilization, throughput, or response time. Using multiple components with load balancing,
instead of a single component, may increase reliability through redundancy. The balancing
service is usually provided by a dedicated program or hardware device.
Here, the user gets a view of a single computing resource insofar as submitting jobs to be
executed is concerned. This will provide a single interface to the user as well as distribute
workload over all nodes in the cluster. Each node is running its own instance of the operating
system, and in some installations, the operating systems can be heterogeneous. We hope to
achieve high throughput because we have multiple machines to run our jobs, as well as high
availability because the loss of a single node means that we can rerun a particular job on another
node. This will require all nodes to have access to all data and applications necessary to run a
job.
Load balancing clusters operate by having all workload come through one or more load-
balancing front ends, which then distribute it to a collection of back end servers. Although they
are implemented primarily for improved performance, they commonly include high-availability
features as well. Such a cluster of computers is sometimes referred to as a server farm. There are
many commercial load balancers available including Platform LSF HPC, Moab Cluster Suite
and Maui Cluster Scheduler. The Linux Virtual Server project provides one commonly used
free software package for the Linux OS.
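The front-end/back-end arrangement described above can be sketched minimally. This is an illustrative model, not any vendor's product: the front end hands each incoming request to the next back-end server in turn, the round-robin policy being the simplest balancing strategy.

```python
import itertools

class RoundRobinBalancer:
    """Front end that dispatches each incoming request to the
    next back-end server in turn."""
    def __init__(self, backends):
        self.backends = list(backends)
        self._cycle = itertools.cycle(self.backends)

    def dispatch(self, request):
        server = next(self._cycle)
        return server, request

lb = RoundRobinBalancer(["node1", "node2", "node3"])
assigned = [lb.dispatch(f"job{i}")[0] for i in range(6)]
print(assigned)  # the six jobs spread evenly, two per node
```

Production balancers add health checks on top of this, skipping back ends that the heartbeat mechanism has declared failed, which is how load-balancing clusters pick up their high-availability features.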
Database Clusters: Many database vendors now offer the facility to have a single database
managed from several nodes, like Oracle Parallel Server. The user simply sends a request to
the database management processes, but in this case the processes can be running concurrently
on multiple nodes. Consistency is maintained by sophisticated message passing between the
database management processes themselves. Each node is running its own instance of the
operating system. All nodes in the cluster are connected to the disks holding the database and
will have to provide either a cluster-wide filesystem or simple raw disk access managed by the
database processes themselves.
Web Server Clusters: In this solution, there is a front-end Web server that receives all
incoming requests from the Internet/Intranet. Instead of servicing these thousands of requests, the
dispatcher sends individual requests to a farm of actual Web servers that will individually
construct a response and send the response back to the originator directly. In effect, the
dispatcher is acting as a load balancer. Some solutions simply send requests to backend servers
on a round-robin basis. Others are more content aware and send requests based on the likely
response times. Local Director from Cisco Systems, ACEdirector from Alteon, and HP e-
Commerce Traffic Director Server Appliance are examples of Web-cluster solutions. One
thing to be aware of is the redundancy in these solutions: Local Director, for example, is a
hardware-based solution, which could itself become your Single Point of Failure. While these
solutions achieve higher performance in responding to individual requests, we may need to
investigate further whether an individual solution can support improved availability through
redundancy of devices.
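A content- or load-aware dispatcher of the kind mentioned above can be sketched as follows. This is a hypothetical illustration (the names are ours, not Cisco's or Alteon's): the front end tracks a moving average of each back end's observed response time and routes the next request to the currently fastest server.

```python
class ResponseTimeDispatcher:
    """Web-cluster front end that sends each request to the back-end
    server with the lowest moving-average response time."""
    def __init__(self, backends):
        # every server starts with an optimistic estimate of 0 ms
        self.avg_ms = {b: 0.0 for b in backends}

    def pick(self):
        """Choose the back end with the best (lowest) estimate."""
        return min(self.avg_ms, key=self.avg_ms.get)

    def record(self, backend, elapsed_ms, alpha=0.3):
        """Fold an observed response time into the moving average."""
        old = self.avg_ms[backend]
        self.avg_ms[backend] = (1 - alpha) * old + alpha * elapsed_ms
```

The exponential moving average smooths out one-off slow responses, so a single garbage-collection pause on a back end does not permanently divert traffic away from it.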
Storage Clusters: A common problem for systems is being limited in the number of devices
they can attach to. Interfaces like PCI are common these days, but a server with more
than 10 interfaces is not so common, because the individual costs of such servers are normally
high. A solution would be to use a collection of small blade servers all connected to a common pool
of storage devices. Today, we would call this pool of storage devices a SAN, a Storage Area
Network. Back in the late 1980s and early 1990s, Tandem (as it was known then) developed a
product known as ServerNet, which was based on the concept known as System Area Network
(SAN). Here, we have inter-device links as well as inter-host links. All nodes can, if necessary,
see all devices. Centralized storage management has benefits for ease of management, for
example, performing a backup to a tape device. Because we have inter-device communication,
we can simply instruct a disk to send I/O to a tape without going through the system/memory bus
of the individual host. Today, we would call this a serverless backup. We gain higher throughput
in this instance, although we need to be careful that the communications medium used has the
bandwidth and low latency needed to respond to multiple requests from multiple hosts. High
availability is not necessarily achieved because individual devices like disks and tapes are still
Single Points of Failure.
Single System Image (SSI): An SSI is a conceptual notion of providing, at some level, a single
instance of "a thing." All of the designs we have listed above can be viewed as SSIs, but it
depends at what level of abstraction you view them from. A simple load-leveling batch scheduler
can be seen as an SSI from the perspective of a user submitting batch jobs. A Web cluster is an
SSI because the user at home has no concept of who, what, or how many devices are capable of
responding to his query; he gets a single response. Database users submit their queries and their
screens are updated accordingly, regardless of which node performed their query. Nuclear
scientists at Los Alamos National Laboratory perform their massive nuclear weapon simulations
unaware of which nodes are processing their individual calculations. As we can see here, these
SSIs appear to operate at different levels. This is true for all SSIs. There are three main levels at
which an SSI can exist: application, operating system (kernel), and hardware. An individual
SSI can support multiple levels, each building on the other. Hewlett Packard's (formerly DEC's)
TruCluster technology is an example: a cluster at the operating system level and the
application level that presents the cluster administrator with a "single image" of the operating
system files held on a central machine. Installing and updating operating system software happens
only once, on the SSI files. The individual systems themselves are independent nodes, each
effectively running its own instance of the operating system.
The SSI also has a concept known as a "boundary." In the case of TruCluster, the boundary
would be at the operating system level. Anything performed outside the boundary breaks the
illusion of a unified computing resource; for example, in the case of TruCluster, performing
hardware upgrades on an individual node exposes the "individuality" of single nodes in the
cluster. Another example is openMosix, a Linux kernel extension for SSI clustering.
This kernel extension turns a network of ordinary computers into a supercomputer for Linux
applications.
Grid computing: The key differences between grids and traditional clusters are that grids
connect collections of computers which do not fully trust each other, and hence operate more like
a computing utility than like a single computer. In addition, grids typically support more
heterogeneous collections than are commonly supported in clusters.
Grid computing is optimized for workloads which consist of many independent jobs or packets
of work, which do not have to share data between the jobs during the computation process. Grids
serve to manage the allocation of jobs to computers which will perform the work independently
of the rest of the grid cluster. Resources such as storage may be shared by all the nodes, but
intermediate results of one job do not affect other jobs in progress on other nodes of the grid.
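Because grid jobs are independent, the scheduler's task reduces to allocation. A minimal sketch (our own illustration, not any grid middleware's API) greedily assigns each job, largest first, to the currently least-loaded machine; since no job depends on another's intermediate results, any assignment is correct and only the total completion time varies.

```python
import heapq

def allocate(jobs, machines):
    """Greedily assign independent jobs, given as (name, cost) pairs,
    to the currently least-loaded machine.  Largest jobs first: the
    classic longest-processing-time heuristic."""
    heap = [(0.0, m) for m in machines]   # (accumulated load, machine)
    heapq.heapify(heap)
    schedule = {m: [] for m in machines}
    for name, cost in sorted(jobs, key=lambda j: -j[1]):
        load, m = heapq.heappop(heap)     # machine with least work so far
        schedule[m].append(name)
        heapq.heappush(heap, (load + cost, m))
    return schedule
```

For example, `allocate([("a", 4), ("b", 3), ("c", 2), ("d", 1)], ["g1", "g2"])` balances the four jobs so that each machine carries a total cost of 5. Real grid schedulers add the trust and heterogeneity concerns noted above: jobs must be matched to machines whose architecture, data access, and security policy permit them to run.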
Grid computing is a form of distributed computing whereby a "super and virtual computer" is
composed of a cluster of networked, loosely-coupled computers, acting in concert to perform
very large tasks. This technology has been applied to computationally-intensive scientific,
mathematical, and academic problems through volunteer computing, and it is used in commercial
enterprises for such diverse applications as drug discovery, economic forecasting, seismic
analysis, and back-office data processing in support of e-commerce and web services.
What distinguishes grid computing from typical cluster computing systems is that grids tend to
be more loosely coupled, heterogeneous, and geographically dispersed. Also, while a computing
grid may be dedicated to a specialized application, it is often constructed with the aid of general
purpose grid software libraries and middleware.
Grid computing is presently being applied successfully by the National Science Foundation,
NASA's Information Power Grid, Pratt & Whitney, Bristol-Myers Squibb, and American
Express. Another well-known project is distributed.net, which was started in 1997 and has run a
number of successful projects in its history. The Enabling Grids for E-science project, which is
based in the European Union and includes sites in Asia and the United States, is a follow-up
project to the European Data Grid (EDG) and is arguably the largest computing grid on the
planet. This, along with the LHC Computing Grid, has been developed to support the experiments
using the CERN Large Hadron Collider.
4. Cluster Components and Resources
A computer cluster consists of various hardware components such as CPU cabinets mounted on
racks, storage arrays and SANs, and network communication infrastructure, and, most
importantly, the system and application software needed to run an organized cluster constituted
from these components.
MPI (Message Passing Interface) is a widely available communications library that enables
parallel programs to be written in languages such as C and Fortran; it is used, for example, in
climate modeling programs.
The GNU/Linux world sports various cluster software, such as:
• Beowulf, distcc, MPICH and others - mostly specialized application clustering. distcc
provides parallel compilation when using GCC.
• Linux Virtual Server, Linux-HA - director-based clusters that allow incoming requests for
services to be distributed across multiple cluster nodes.
• MOSIX, openMosix, Kerrighed, OpenSSI - full-blown clusters integrated into the kernel
that provide for automatic process migration among homogeneous nodes. OpenSSI,
openMosix and Kerrighed are single-system image implementations.
DragonFly BSD, a recent fork of FreeBSD 4.8, is being redesigned at its core to enable native
clustering capabilities. It also aims to achieve single-system image capabilities.
MSCS is Microsoft's high-availability cluster service for Windows. Based on technology
developed by Digital Equipment Corporation, the current version supports up to eight nodes in a
single cluster, typically connected to a SAN. A set of APIs supports cluster-aware applications,
while generic templates provide support for non-cluster-aware applications.
Open Source Clustering Software
• Kerrighed
• Linux-Cluster Project and Global File System for High-Availability
• Maui Cluster Scheduler
• OpenSSI - HA, HPC, and load-balancing clusters with or without a SAN.
• OpenMosix
• OpenSCE
• Open Source Cluster Application Resources (OSCAR)
• Rocks Cluster Distribution
• Sun GridEngine
• TORQUE Resource Manager
• WareWulf
Clustering products
• Alchemi
• BOINC
• HP's OpenVMS
• IBM's HACMP and Parallel Sysplex
• KeyCluster
• United Devices Grid MP
• Linux-HA
• MC Service Guard for HP-UX systems
• Microsoft Cluster Server (MSCS)
• Moab Cluster Suite
• Platform LSF
• NEC ExpressCluster
• Novell Cluster Services for Linux and NetWare
• Oracle Real Application Cluster (RAC)
• PolyServe
• Red Hat Cluster Suite
• SteelEye LifeKeeper
• Sun N1 GridEngine
• Veritas Cluster Server (VCS)
• Scyld Beowulf Cluster
• Platform Rocks
5. Cluster Implementations
There are several thousand cluster implementations across the globe. They may be as small as a
two-node cluster, while at the other end there are supercomputers comprising numerous nodes,
providing High Performance, High Availability, Load Balancing and a multitude of clustering
features.
The TOP500 organization publishes a list of the 500 fastest computers twice a year, usually
including many clusters. The current top supercomputer is the Department of Energy's
BlueGene/L system, with a performance of 280.6 TeraFlops. Second place is held by another
BlueGene/L system, with a performance of 91.29 TeraFlops.
Figure 4 - IBM’s Blue Gene/L supercomputer
In the latest rankings, three of the top five computers are located outside of the United States.
Significantly, fourth place went to a Hewlett-Packard supercomputer located at the
Computational Research Laboratories in Pune, India.
Clustering can provide significant performance benefits versus price. The System X
supercomputer at Virginia Tech, the twentieth most powerful supercomputer on Earth as of
November 2005, is a 12.25 TFlops computer cluster of 1100 Apple XServe G5 2.3 GHz dual
processor machines (4 GB RAM, 80 GB SATA HD) running Mac OS X. The cluster initially
consisted of Power Mac G5s; the Xserves are smaller, reducing the size of the cluster. The total
cost of the previous Power Mac system was $5.2 million, a tenth of the cost of slower mainframe
supercomputers.
Figure 5 – NASA’s Columbia Supercomputer
Ranked the fourth fastest supercomputer in the world on the November 2005 TOP500 list,
Columbia increased NASA's total high-end computing, storage, and network capacity
tenfold. This has enabled advances in science not previously possible on NASA's high-end
systems. It consists of a 10,240-processor SGI Altix system comprising 20 nodes, each with
512 Intel Itanium 2 processors, running a Linux operating system.
Internet References:
• http://en.wikipedia.org/wiki/Computer_cluster
• http://www.nasa.gov/centers/goddard
Book References:
• Charles Keenan : HP-UX CSE Official Study Guide and Desk Reference
• Karl Kopper : The Linux Enterprise Cluster