TRANSCRIPT
Computer Clustering Technologies
By
Manish Chopra
Corporate Trainer
IIHT Ltd. Bangalore
August, 2008
Computer Clustering Technologies – IIHT Ltd.
Though the clustering of computers is a deep and dedicated area of research in computer science, the
scope of this document is limited to the following topics:
1. Introduction to Clusters
2. History of Computing Clusters
3. Types and Categorization of Clusters
4. Cluster Components and Resources
5. Cluster Implementations
1. Introduction to Clusters
In computer science, “clustering” refers to the study, practice and implementation of technology
infrastructure in the form of a collection of interconnected computer systems collaborating to
achieve specific end goals. The concept of a cluster involves taking two or more computers and
organizing them to work together to provide higher availability, reliability and scalability than
can be obtained by using a single computer. Some clusters are designed to deliver high
performance, with respect to the speed and efficiency of the applications that run on them.
When any kind of failure occurs in a cluster, resources are immediately redirected to
alternatives and the workload is redistributed. The end user may experience only a short lag in
service, most often without even knowing that a component in the cluster has broken down.
A computer cluster may be a group of loosely coupled computers that work together and, to the
end user, can be viewed as a single large computing system. Clusters are commonly connected
through high-speed networks, and are usually designed and deployed to improve speed and
reliability, while typically being more cost-effective. A well-designed cluster solution uses
redundant systems and components so that the failure of an individual server does not affect
the availability of the entire cluster.
2. History of Computing Clusters
The formal engineering basis of cluster computing as a means of doing parallel work of any sort
was established by Gene Amdahl of IBM, who in 1967 published a seminal paper on parallel
processing (http://www-inst.eecs.berkeley.edu/~n252/paper/Amdahl.pdf). The result came to be
known as Amdahl's Law, which describes mathematically the speedup one can expect from
parallelizing any given, otherwise serially performed, task on a parallel architecture. The paper
defined the engineering basis for both multiprocessor computing and cluster computing, where the
primary differentiator is whether the inter-processor communications are supported "inside" the
computer or "outside" it on an external network.
Coincidentally the history of early computer clusters is directly tied with the history of early
networks, as one of the primary motivations for the development of a network was to link
computing resources, creating a de facto computer cluster. Packet switching networks were
conceptually invented by the RAND Corporation in 1962. Using the concept of a packet switched
network, the ARPANET project succeeded in creating the world's first network based computer
cluster in 1969 by linking four different computer centers. ARPANET (Advanced Research
Projects Agency Network) was the world's first operational packet switching network, and is
known as the predecessor of the present global Internet.
Figure 1 - ARPANET logical map, March 1977
The development of customer-built and research clusters proceeded hand in hand with that of both
networks and the UNIX operating system from the early 1970s, as both TCP/IP and the
Xerox PARC project created and formalized protocols for network-based communications.
However, it wasn't until 1983 that the protocols and tools for easily doing remote job distribution
and file sharing were defined (in the context of BSD UNIX, as implemented by Sun Microsystems)
and hence became generally available commercially, along with a shared file system.
The first commercial clustering product was ARCnet, developed by Datapoint in 1977. ARCnet
wasn't a commercial success, and clustering didn't really take off until DEC released its
VAXcluster product in 1984 for the VAX/VMS operating system. The ARCnet and
VAXcluster products supported not only parallel computing but also shared file systems and
peripheral devices. They were intended to provide the advantages of parallel processing while
maintaining data reliability and uniqueness. VAXcluster, now VMScluster, is still available on
OpenVMS systems from HP running on Alpha and Itanium systems.
Figure 2 - A Beowulf Cluster
The Beowulf cluster shown above, in the lab of the College of Engineering at Boise State
University, comprises 61 nodes with 122 2.4 GHz Xeon processors. The lab is sponsored by the
National Science Foundation.
No history of commodity compute clusters would be complete without noting the pivotal role
played by the development of Parallel Virtual Machine (PVM) software in 1989
(http://www.csm.ornl.gov/pvm/pvm_home.html). This open source software based on TCP/IP
communications enabled the instant creation of a virtual supercomputer, a high performance
compute cluster made out of any TCP/IP-connected systems. Free-form heterogeneous clusters
built on top of this model rapidly achieved total throughput in FLOPS (Floating-point Operations
Per Second) that greatly exceeded what was available even from the most expensive supercomputers.
In 1993, PVM and the advent of inexpensive networked computers led NASA to start a project to
build supercomputers out of commodity clusters. NASA developed a Beowulf cluster in 1995,
a compute cluster built on top of a commodity network for the specific purpose of
"being a supercomputer" capable of performing tightly coupled parallel HPC computations. This
in turn spurred the independent development of Grid computing as a named entity, although
Grid-style clustering had been around at least as long as the UNIX operating system and the
ARPANET themselves.
Two other noteworthy early commercial clusters were the Tandem Himalaya (a 1994
high-availability product) and the IBM S/390 Parallel Sysplex (also from 1994, primarily for business use).
3. Types and Categorization of Clusters
Understanding the main clustering technologies will allow us to see where a given approach to
clustering fits in, and how organizations use clustering to meet their goals. When multiple
computer systems are combined into a single "whole", two aspirations can be identified
for a cluster employing different hardware components:
• Parallel processing
• Application availability
These are not the only aspirations, but they are broad enough to categorize the key
technologies that involve clustering techniques. An immense amount of research has been done by
technology vendors on clustering techniques, since organizations' requirements for
a cluster implementation vary depending on the type and size of the cluster. Computing clusters
may be categorized into the following types:
1. High Performance Clusters
2. High Availability Clusters
3. Load Balancing Clusters
4. Database Clusters
5. Web Server Clusters
6. Storage Clusters
7. Single System Image
8. Grid computing
High Performance Clusters (HPC): High-performance clusters are implemented primarily to
provide increased performance by splitting a computational task across many different nodes in
the cluster, and are most commonly used in scientific computing. The nodes are "coupled" in
such a way that each runs its own instance of the operating system but presents a single interface
to the user. Events are managed via a central coordinator, or "coupling facility". Individual
nodes need access to a common pool of I/O devices, possibly via a Storage Area Network
(SAN). The underlying operating system provides some form of cluster-wide filesystem or
allows cluster-wide access to raw disk space. HPC clusters are optimized for workloads which
require jobs or processes happening on the separate cluster computer nodes to communicate
actively during the computation. These include computations where intermediate results from
one node's calculations will affect future calculations on other nodes.
Six of the Top 10 (http://www.top500.org) fastest computers in the world run as high-
performance clusters. Hewlett Packard offers the HyperPlex product to provide high-
performance parallel processing. One of the more popular HPC implementations is a cluster with
nodes running Linux as the OS and free software to implement the parallelism. This
configuration is often referred to as a Beowulf cluster. Such clusters commonly run custom
programs designed to exploit the parallelism available on HPC clusters. Many such programs
use libraries such as MPI, which are specially designed for writing scientific applications
for HPC computers.
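The divide-compute-combine pattern behind such clusters can be sketched in a few lines. This is only an illustration, with a Python thread pool standing in for the cluster's nodes; a real Beowulf cluster would distribute the chunks with MPI over the network.

```python
from concurrent.futures import ThreadPoolExecutor

def partial_sum(bounds):
    """The work one node performs on its own slice of the problem."""
    lo, hi = bounds
    return sum(range(lo, hi))

def cluster_sum(n, nodes=4):
    """Decompose [0, n) into one chunk per node, compute the chunks
    in parallel, then combine the partial results."""
    step = n // nodes
    chunks = [(i * step, n if i == nodes - 1 else (i + 1) * step)
              for i in range(nodes)]
    with ThreadPoolExecutor(max_workers=nodes) as pool:
        return sum(pool.map(partial_sum, chunks))

print(cluster_sum(1_000_000))  # same answer as sum(range(1_000_000))
```

Only the small partial results, not the raw data, are combined at the end; keeping that combine step cheap relative to the per-node work is what makes the decomposition pay off.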
High Availability Clusters (HA): Also known as failover clusters, these are computer clusters
implemented primarily to provide high availability of the services the cluster offers. They
operate by having redundant computers or nodes that are used to provide service when system
components fail. Normally, if a server running a particular application crashes, the application
will be unavailable until someone fixes the crashed server. HA clustering remedies this situation
by detecting hardware/software faults and immediately restarting the application on another
system without requiring administrative intervention, a process known as failover.
In these clusters, we are providing a computing resource that is able to sustain failures. Here, we
are looking at the types of outages that affect our ability to continue processing. Outages come in
two main forms: planned and unplanned. Planned outages include software and hardware
upgrades, maintenance, or regulatory conditions that effectively stop processing. Unplanned
outages are the difficult ones: power failures, failed hardware, software bugs, human error,
natural disasters, and terrorism, to name only a few. We don't know in advance when they are
going to happen.
Figure 3 - A two node High Availability Cluster network diagram
HA cluster implementations attempt to build redundancy into a cluster to eliminate single points
of failure, including multiple network connections and data storage that is multiply connected
via storage area networks. HA clusters usually use a heartbeat private network connection
to monitor the health and status of each node in the cluster.
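The heartbeat mechanism can be sketched as follows. This is a simplified, hypothetical model (the class and function names are ours): each node periodically reports in over the private network, a node whose heartbeat has not been seen within the timeout is declared failed, and its services are then restarted on a surviving node.

```python
import time

class HeartbeatMonitor:
    """Declare a node failed if no heartbeat arrives within `timeout` seconds."""
    def __init__(self, nodes, timeout=3.0, clock=time.monotonic):
        self.timeout = timeout
        self.clock = clock
        self.last_seen = {node: clock() for node in nodes}

    def beat(self, node):
        """Record a heartbeat received from `node` over the private network."""
        self.last_seen[node] = self.clock()

    def failed_nodes(self):
        now = self.clock()
        return [n for n, t in self.last_seen.items()
                if now - t > self.timeout]

def failover(monitor, services):
    """Reassign services owned by failed nodes to a surviving node."""
    dead = set(monitor.failed_nodes())
    alive = [n for n in monitor.last_seen if n not in dead]
    for svc, owner in services.items():
        if owner in dead and alive:
            services[svc] = alive[0]   # restart the service on a healthy node
    return services
```

Real HA packages such as Linux-HA layer quorum and fencing logic on top of this basic detection loop, to avoid "split-brain" situations where two nodes each believe the other is dead.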
There are many commercial implementations of High-Availability clusters for many operating
systems. Hewlett Packard's Serviceguard, IBM's HACMP, and Veritas Cluster Services are all
examples of High Availability Clusters. The Linux-HA project is one commonly used free
software HA package for the Linux Operating System.
Load Balancing Clusters: Load balancing is a technique to spread work between two or more
computers, network links, CPUs, hard drives, or other resources in order to get optimal resource
utilization, throughput, or response time. Using multiple components with load balancing,
instead of a single component, may increase reliability through redundancy. The balancing
service is usually provided by a dedicated program or hardware device.
Here, the user gets a view of a single computing resource insofar as submitting jobs to be
executed is concerned. This will provide a single interface to the user as well as distribute
workload over all nodes in the cluster. Each node is running its own instance of the operating
system, and in some installations, the operating systems can be heterogeneous. We hope to
achieve high throughput because we have multiple machines to run our jobs, as well as high
availability because the loss of a single node means that we can rerun a particular job on another
node. This will require all nodes to have access to all data and applications necessary to run a
job.
Load balancing clusters operate by having all workload come through one or more load-
balancing front ends, which then distribute it to a collection of back end servers. Although they
are implemented primarily for improved performance, they commonly include high-availability
features as well. Such a cluster of computers is sometimes referred to as a server farm. There are
many commercial load balancers available including Platform LSF HPC, Moab Cluster Suite
and Maui Cluster Scheduler. The Linux Virtual Server project provides one commonly used
free software package for the Linux OS.
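The front-end/back-end arrangement described above can be sketched minimally. This is an illustrative model, not any vendor's product: the front end hands each incoming request to the next back-end server in turn, the round-robin policy being the simplest balancing strategy.

```python
import itertools

class RoundRobinBalancer:
    """Front end that dispatches each incoming request to the
    next back-end server in turn."""
    def __init__(self, backends):
        self.backends = list(backends)
        self._cycle = itertools.cycle(self.backends)

    def dispatch(self, request):
        server = next(self._cycle)
        return server, request

lb = RoundRobinBalancer(["node1", "node2", "node3"])
assigned = [lb.dispatch(f"job{i}")[0] for i in range(6)]
print(assigned)  # the six jobs spread evenly, two per node
```

Production balancers add health checks on top of this, skipping back ends that the heartbeat mechanism has declared failed, which is how load-balancing clusters pick up their high-availability features.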
Database Clusters: Many database vendors now offer the facility to have a single database
managed from several nodes, like Oracle Parallel Server. The user simply sends a request to
the database management processes, but in this case the processes can be running concurrently
on multiple nodes. Consistency is maintained by sophisticated message passing between the
database management processes themselves. Each node is running its own instance of the
operating system. All nodes in the cluster are connected to the disks holding the database and
will have to provide either a cluster-wide filesystem or simple raw disk access managed by the
database processes themselves.
Web Server Clusters: In this solution, there is a front-end Web server that receives all
incoming requests from the Internet/Intranet. Instead of servicing these thousands of requests, the
dispatcher sends individual requests to a farm of actual Web servers that will individually
construct a response and send the response back to the originator directly. In effect, the
dispatcher is acting as a load balancer. Some solutions simply send requests to backend servers
on a round-robin basis. Others are more content aware and send requests based on the likely
response times. Local Director from Cisco Systems, ACEdirector from Alteon, and HP e-
Commerce Traffic Director Server Appliance are examples of Web-cluster solutions. One
thing to be aware of is the redundancy in these solutions: Local Director, for example, is a
hardware-based solution, which could itself become your Single Point of Failure. While these
solutions achieve higher performance in responding to individual requests, we may need to
investigate further whether an individual solution can support improved availability through
redundancy of devices.
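A content- or load-aware dispatcher of the kind mentioned above can be sketched as follows. This is a hypothetical illustration (the names are ours, not Cisco's or Alteon's): the front end tracks a moving average of each back end's observed response time and routes the next request to the currently fastest server.

```python
class ResponseTimeDispatcher:
    """Web-cluster front end that sends each request to the back-end
    server with the lowest moving-average response time."""
    def __init__(self, backends):
        # every server starts with an optimistic estimate of 0 ms
        self.avg_ms = {b: 0.0 for b in backends}

    def pick(self):
        """Choose the back end with the best (lowest) estimate."""
        return min(self.avg_ms, key=self.avg_ms.get)

    def record(self, backend, elapsed_ms, alpha=0.3):
        """Fold an observed response time into the moving average."""
        old = self.avg_ms[backend]
        self.avg_ms[backend] = (1 - alpha) * old + alpha * elapsed_ms
```

The exponential moving average smooths out one-off slow responses, so a single garbage-collection pause on a back end does not permanently divert traffic away from it.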
Storage Clusters: A common problem for systems is being limited in the number of devices
they can attach to. Interfaces like PCI are common these days, but a server with more
than 10 interfaces is not so common, because the individual costs of such servers are normally
high. A solution would be to use a collection of small blade servers all connected to a common pool
of storage devices. Today, we would call this pool of storage devices a SAN, a Storage Area
Network. Back in the late 1980s and early 1990s, Tandem (as it was known then) developed a
product known as ServerNet, which was based on the concept known as System Area Network
(SAN). Here, we have inter-device links as well as inter-host links. All nodes can, if necessary,
see all devices. Centralized storage management has benefits for ease of management, for
example, performing a backup to a tape device. Because we have inter-device communication,
we can simply instruct a disk to send I/O to a tape without going through the system/memory bus
of the individual host. Today, we would call this a serverless backup. We gain higher throughput
in this instance, although we need to be careful that the communications medium used has the
bandwidth and low latency needed to respond to multiple requests from multiple hosts. High
availability is not necessarily achieved because individual devices like disks and tapes are still
Single Points of Failure.
Single System Image (SSI): An SSI is a conceptual notion of providing, at some level, a single
instance of "a thing." All of the designs we have listed above can be viewed as SSIs, but it
depends at what level of abstraction you view them from. A simple load-leveling batch scheduler
can be seen as an SSI from the perspective of a user submitting batch jobs. A Web cluster is an
SSI because the user at home has no concept of who, what, or how many devices are capable of
responding to his query; he gets a single response. Database users submit their queries and their
screens are updated accordingly, regardless of which node performed their query. Nuclear
scientists at Los Alamos National Laboratory perform their massive nuclear weapon simulations
unaware of which nodes are processing their individual calculations. As we can see here, these
SSIs appear to operate at different levels. This is true for all SSIs. There are three main levels at
which an SSI can exist: application, operating system (kernel), and hardware. An individual
SSI can support multiple levels, each building on the other. Hewlett Packard's (formerly DEC's)
TruCluster technology is an example: a cluster at the operating system level and the
application level that presents the cluster administrator with a "single image" of the operating
system files held on a central machine. Installing and updating operating system software happens
only once, on the SSI files. The individual systems themselves are independent nodes, each
effectively running its own instance of the operating system.
The SSI also has a concept known as a "boundary." In the case of TruCluster, the boundary
would be at the operating system level. Anything performed outside the boundary breaks the
illusion of a unified computing resource; for example, in the case of TruCluster, performing
hardware upgrades on an individual node exposes the "individuality" of single nodes in the
cluster. Another example is openMosix, a Linux kernel extension for SSI clustering.
This kernel extension turns a network of ordinary computers into a supercomputer for Linux
applications.
Grid computing: The key differences between grids and traditional clusters are that grids
connect collections of computers which do not fully trust each other, and hence operate more like
a computing utility than like a single computer. In addition, grids typically support more
heterogeneous collections than are commonly supported in clusters.
Grid computing is optimized for workloads which consist of many independent jobs or packets
of work, which do not have to share data between the jobs during the computation process. Grids
serve to manage the allocation of jobs to computers which will perform the work independently
of the rest of the grid cluster. Resources such as storage may be shared by all the nodes, but
intermediate results of one job do not affect other jobs in progress on other nodes of the grid.
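Because grid jobs are independent, the scheduler's task reduces to allocation. A minimal sketch (our own illustration, not any grid middleware's API) greedily assigns each job, largest first, to the currently least-loaded machine; since no job depends on another's intermediate results, any assignment is correct and only the total completion time varies.

```python
import heapq

def allocate(jobs, machines):
    """Greedily assign independent jobs, given as (name, cost) pairs,
    to the currently least-loaded machine.  Largest jobs first: the
    classic longest-processing-time heuristic."""
    heap = [(0.0, m) for m in machines]   # (accumulated load, machine)
    heapq.heapify(heap)
    schedule = {m: [] for m in machines}
    for name, cost in sorted(jobs, key=lambda j: -j[1]):
        load, m = heapq.heappop(heap)     # machine with least work so far
        schedule[m].append(name)
        heapq.heappush(heap, (load + cost, m))
    return schedule
```

For example, `allocate([("a", 4), ("b", 3), ("c", 2), ("d", 1)], ["g1", "g2"])` balances the four jobs so that each machine carries a total cost of 5. Real grid schedulers add the trust and heterogeneity concerns noted above: jobs must be matched to machines whose architecture, data access, and security policy permit them to run.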
Grid computing is a form of distributed computing whereby a "super and virtual computer" is
composed of a cluster of networked, loosely-coupled computers, acting in concert to perform
very large tasks. This technology has been applied to computationally-intensive scientific,
mathematical, and academic problems through volunteer computing, and it is used in commercial
enterprises for such diverse applications as drug discovery, economic forecasting, seismic
analysis, and back-office data processing in support of e-commerce and web services.
What distinguishes grid computing from typical cluster computing systems is that grids tend to
be more loosely coupled, heterogeneous, and geographically dispersed. Also, while a computing
grid may be dedicated to a specialized application, it is often constructed with the aid of general
purpose grid software libraries and middleware.
Grid computing is presently being applied successfully by the National Science Foundation,
NASA's Information Power Grid, Pratt & Whitney, Bristol-Myers Squibb, and American
Express. Another well-known project is distributed.net, which was started in 1997 and has run a
number of successful projects in its history. The Enabling Grids for E-science project, which is
based in the European Union and includes sites in Asia and the United States, is a follow-up
project to the European Data Grid (EDG) and is arguably the largest computing grid on the
planet. This, along with the LHC Computing Grid, has been developed to support the experiments
using the CERN Large Hadron Collider.
4. Cluster Components and Resources
A computer cluster consists of various hardware components such as CPU cabinets mounted on
racks, storage arrays and SANs, and network communication infrastructure, and, most
importantly, the system and application software needed to run an organized cluster constituted
from these components.
MPI (Message Passing Interface) is a widely available communications library that enables
parallel programs to be written in languages such as C and Fortran; it is used, for example, in
climate modeling programs.
The GNU/Linux world sports various cluster software, such as:
• Beowulf, distcc, MPICH and others - mostly specialized application clustering. distcc
provides parallel compilation when using GCC.
• Linux Virtual Server, Linux-HA - director-based clusters that allow incoming requests for
services to be distributed across multiple cluster nodes.
• MOSIX, openMosix, Kerrighed, OpenSSI - full-blown clusters integrated into the kernel
that provide for automatic process migration among homogeneous nodes. OpenSSI,
openMosix and Kerrighed are single-system image implementations.
DragonFly BSD, a recent fork of FreeBSD 4.8, is being redesigned at its core to enable native
clustering capabilities. It also aims to achieve single-system image capabilities.
MSCS is Microsoft's high-availability cluster service for Windows. Based on technology
developed by Digital Equipment Corporation, the current version supports up to eight nodes in a
single cluster, typically connected to a SAN. A set of APIs supports cluster-aware applications,
while generic templates provide support for non-cluster-aware applications.
Open Source Clustering Software
• Kerrighed
• Linux-Cluster Project and Global File System for High-Availability
• Maui Cluster Scheduler
• OpenSSI - HA, HPC, and load-balancing clusters with or without a SAN.
• OpenMosix
• OpenSCE
• Open Source Cluster Application Resources (OSCAR)
• Rocks Cluster Distribution
• Sun GridEngine
• TORQUE Resource Manager
• WareWulf
Clustering products
• Alchemi
• BOINC
• HP's OpenVMS
• IBM's HACMP and Parallel Sysplex
• KeyCluster
• United Devices Grid MP
• Linux-HA
• MC Service Guard for HP-UX systems
• Microsoft Cluster Server (MSCS)
• Moab Cluster Suite
• Platform LSF
• NEC ExpressCluster
• Novell Cluster Services for Linux and NetWare
• Oracle Real Application Cluster (RAC)
• PolyServe
• Red Hat Cluster Suite
• SteelEye LifeKeeper
• Sun N1 GridEngine
• Veritas Cluster Server (VCS)
• Scyld Beowulf Cluster
• Platform Rocks
5. Cluster Implementations
There are several thousand cluster implementations across the globe. They may be as small as a
two-node cluster, while at the other end there are supercomputers comprising numerous nodes,
providing High Performance, High Availability, Load Balancing and a multitude of clustering
features.
The TOP500 organization publishes a list of the 500 fastest computers twice a year, usually
including many clusters. The current top supercomputer is the Department of Energy's
BlueGene/L system, with a performance of 280.6 TeraFlops. Second place is held by another
BlueGene/L system, with a performance of 91.29 TeraFlops.
Figure 4 - IBM’s Blue Gene/L supercomputer
In the latest rankings, three of the top five computers are located outside of the United States.
Significantly, fourth place went to a Hewlett-Packard supercomputer located at the
Computational Research Laboratories in Pune, India.
Clustering can provide significant performance benefits versus price. The System X
supercomputer at Virginia Tech, the twentieth most powerful supercomputer on Earth as of
November 2005, is a 12.25 TFlops computer cluster of 1100 Apple XServe G5 2.3 GHz dual
processor machines (4 GB RAM, 80 GB SATA HD) running Mac OS X. The cluster initially
consisted of Power Mac G5s; the Xserves are smaller, reducing the size of the cluster. The total
cost of the previous Power Mac system was $5.2 million, a tenth of the cost of slower mainframe
supercomputers.
Figure 5 – NASA’s Columbia Supercomputer
Ranked the fourth fastest supercomputer in the world on the November 2005 TOP500 list,
Columbia increased NASA's total high-end computing, storage, and network capacity
tenfold. This has enabled advances in science not previously possible on NASA's high-end
systems. It consists of a 10,240-processor SGI Altix system comprising 20 nodes, each with
512 Intel Itanium 2 processors, running a Linux operating system.
Internet References:
• http://en.wikipedia.org/wiki/Computer_cluster
• http://www.nasa.gov/centers/goddard
Book References:
• Charles Keenan : HP-UX CSE Official Study Guide and Desk Reference
• Karl Kopper : The Linux Enterprise Cluster