cluster computing ppt

1 Classification of Cluster Computer

Author: rupak-bhattacharjee

Post on 02-Nov-2014






Cluster Computing Paper Presentation & Seminar A computer cluster is a group of loosely coupled computers that work together closely so that ..


Page 1: Cluster Computing PPT



Classification of Cluster Computer

Page 2: Cluster Computing PPT


Clusters Classification..1

Based on Focus (in Market)

– High Performance (HP) Clusters
• Grand Challenge Applications

– High Availability (HA) Clusters
• Mission Critical applications

Page 3: Cluster Computing PPT


HA Cluster: Server Cluster with "Heartbeat" Connection
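The "heartbeat" connection in this slide title is what lets one server notice that another has gone silent. A minimal sketch of that idea follows. The class name `HeartbeatMonitor`, the 3-second timeout, and the node names are assumptions made for illustration and do not come from the slides. Timestamps are passed in explicitly so the example does not depend on a real clock.

```python
class HeartbeatMonitor:
    """Tracks the last heartbeat seen from each peer node (illustrative only)."""

    def __init__(self, timeout_s):
        self.timeout_s = timeout_s   # assumed silence threshold before failover
        self.last_seen = {}          # node name -> timestamp of last heartbeat

    def record(self, node, now):
        """Call whenever a heartbeat message arrives from `node`."""
        self.last_seen[node] = now

    def failed_nodes(self, now):
        """Nodes whose last heartbeat is older than the timeout."""
        return sorted(n for n, t in self.last_seen.items()
                      if now - t > self.timeout_s)


monitor = HeartbeatMonitor(timeout_s=3.0)
monitor.record("primary", now=0.0)
monitor.record("standby", now=0.0)
monitor.record("standby", now=4.0)      # the primary stays silent
print(monitor.failed_nodes(now=5.0))    # ['primary']
```

In an HA pair, the surviving node would typically take over the failed node's services once it appears in `failed_nodes`.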

Page 4: Cluster Computing PPT


Clusters Classification..2

Based on Workstation/PC Ownership

– Dedicated Clusters

– Non-dedicated clusters
• Adaptive parallel computing

• Also called Communal multiprocessing

Page 5: Cluster Computing PPT


Clusters Classification..3

Based on Node Architecture..

– Clusters of PCs (CoPs)

– Clusters of Workstations (COWs)

– Clusters of SMPs (CLUMPs)

Page 6: Cluster Computing PPT


Building Scalable Systems: Cluster of SMPs (Clumps)

Performance of SMP Systems Vs. Four-Processor Servers in a Cluster

Page 7: Cluster Computing PPT


Clusters Classification..4

Based on Node OS Type..
– Linux Clusters (Beowulf)
– Solaris Clusters (Berkeley NOW)
– NT Clusters (HPVM)
– AIX Clusters (IBM SP2)
– SCO/Compaq Clusters (Unixware)
– …….Digital VMS Clusters, HP clusters,


Page 8: Cluster Computing PPT


Clusters Classification..5

Based on node components architecture & configuration (Processor Arch, Node Type: PC/Workstation.. & OS: Linux/NT..):

– Homogeneous Clusters
• All nodes will have similar configuration

– Heterogeneous Clusters
• Nodes based on different processors and running different OSes.

Page 9: Cluster Computing PPT


Clusters Classification..6a

Dimensions of Scalability & Levels of Clustering







[Figure: labels CPU / I/O / Memory / OS; Public Metacomputing]

Page 10: Cluster Computing PPT


Clusters Classification..6b

Group Clusters (#nodes: 2-99)
– (a set of dedicated/non-dedicated computers, mainly connected by SAN like Myrinet)

Departmental Clusters (#nodes: 99-999)

Organizational Clusters (#nodes: many 100s)
– (using ATM networks)

Internet-wide Clusters = Global Clusters (#nodes: 1000s to many millions)
– Metacomputing
– Web-based Computing
– Agent Based Computing
• Java plays a major role in web and agent based computing

Page 11: Cluster Computing PPT


Cluster Middleware


Single System Image

Page 12: Cluster Computing PPT



What is Middleware ?
What is Single System Image ?
Benefits of Single System Image
SSI Boundaries
SSI Levels
Relationship between Middleware Modules.
Strategy for SSI via OS
Solaris MC: An example OS supporting SSI
Cluster Monitoring Software

Page 13: Cluster Computing PPT


What is Cluster Middleware ?

An interface between user applications and cluster hardware and OS platform.

Middleware packages support each other at the management, programming, and implementation levels.

Middleware Layers:

– SSI Layer
– Availability Layer: It enables the cluster services of
• Checkpointing, automatic failover, recovery from failure,
• fault-tolerant operation among all cluster nodes.

Page 14: Cluster Computing PPT


Middleware Design Goals

Complete Transparency

– Lets the user see a single cluster system..
• Single entry point, ftp, telnet, software loading...

Scalable Performance

– Easy growth of cluster
• no change of API & automatic load distribution.

Enhanced Availability

– Automatic Recovery from failures
• Employ checkpointing & fault tolerant technologies

– Handle consistency of data when replicated..

Page 15: Cluster Computing PPT


What is Single System Image (SSI) ?

A single system image is the illusion, created by software or hardware, that a collection of computing elements appears as a single computing resource.

SSI makes the cluster appear like a single machine to the user, to applications, and to the network.

A cluster without an SSI is not a cluster

Page 16: Cluster Computing PPT


Benefits of Single System Image

Usage of system resources transparently

Improved reliability and higher availability

Simplified system management

Reduction in the risk of operator errors

User need not be aware of the underlying system architecture to use these machines effectively

Page 17: Cluster Computing PPT


SSI vs. Scalability(design space of competing


Page 18: Cluster Computing PPT


Desired SSI Services

Single Entry Point

– telnet– telnet node1.cluster.

Single File Hierarchy: xFS, AFS, Solaris MC Proxy

Single Control Point: Management from single GUI

Single virtual networking

Single memory space - DSM

Single Job Management: Glunix, Codine, LSF

Single User Interface: Like workstation/PC windowing environment (CDE in Solaris/NT); it may also use Web technology

Page 19: Cluster Computing PPT


Availability Support Functions

Single I/O Space (SIO):

– any node can access any peripheral or disk devices without the knowledge of physical location.

Single Process Space (SPS)

– Any process on any node can create processes cluster-wide, and they communicate through signals, pipes, etc., as if they were on a single node.

Checkpointing and Process Migration.

– Saves the process state and intermediate results in memory to disk to support rollback recovery when node fails. PM for Load balancing...
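The checkpointing bullet above can be illustrated with a small rollback-recovery sketch. This is a single-process toy, not how any particular cluster middleware implements it. The function names, the `job.ckpt` file name, and the summing workload are all hypothetical.

```python
import os
import pickle
import tempfile


def save_checkpoint(path, state):
    """Write state to disk atomically so a crash never leaves a half-written file."""
    tmp = path + ".tmp"
    with open(tmp, "wb") as f:
        pickle.dump(state, f)
    os.replace(tmp, path)


def load_checkpoint(path, default):
    """Return the last saved state, or `default` if no checkpoint exists yet."""
    if not os.path.exists(path):
        return default
    with open(path, "rb") as f:
        return pickle.load(f)


def run(path, n_steps, crash_at=None):
    """Sum 0..n_steps-1, checkpointing after every step (hypothetical workload)."""
    state = load_checkpoint(path, {"step": 0, "total": 0})
    while state["step"] < n_steps:
        if crash_at is not None and state["step"] == crash_at:
            raise RuntimeError("simulated node failure")
        state["total"] += state["step"]
        state["step"] += 1
        save_checkpoint(path, state)
    return state["total"]


ckpt = os.path.join(tempfile.mkdtemp(), "job.ckpt")
try:
    run(ckpt, 10, crash_at=6)
except RuntimeError:
    pass                      # the "node" failed partway through
result = run(ckpt, 10)        # the restart resumes from step 6, not from 0
print(result)                 # 45
```

Process migration for load balancing could reuse the same saved state by loading the checkpoint on a different node, assuming that node can read the checkpoint file.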


Page 20: Cluster Computing PPT


SSI Levels

SSI levels follow the computer science notion of levels of abstraction (a house is at a higher level of abstraction than walls, ceilings, and floors).

Application and Subsystem Level

Operating System Kernel Level

Hardware Level

Page 21: Cluster Computing PPT


Cluster Computing - Research Projects

Beowulf (CalTech and NASA) - USA
CCS (Computing Centre Software) - Paderborn, Germany
Condor - University of Wisconsin-Madison, USA
DJM (Distributed Job Manager) - Minnesota Supercomputing Center
DQS (Distributed Queuing System) - Florida State University, USA
EASY - Argonne National Lab, USA
HPVM (High Performance Virtual Machine) - UIUC & now UCSB, USA
far - University of Liverpool, UK
Gardens - Queensland University of Technology, Australia
Generic NQS (Network Queuing System) - University of Sheffield, UK
NOW (Network of Workstations) - Berkeley, USA
NIMROD - Monash University, Australia
PBS (Portable Batch System) - NASA Ames and LLNL, USA
PRM (Prospero Resource Manager) - Uni. of S. California, USA
QBATCH - Vita Services Ltd., USA

Page 22: Cluster Computing PPT


Cluster Computing - Commercial Software

Codine (Computing in Distributed Network Environment) - GENIAS GmbH, Germany
LoadLeveler - IBM Corp., USA
LSF (Load Sharing Facility) - Platform Computing, Canada
NQE (Network Queuing Environment) - Craysoft Corp., USA
OpenFrame - Centre for Development of Advanced Computing, India
RWCP (Real World Computing Partnership), Japan
Unixware (SCO - Santa Cruz Operations), USA
Solaris-MC (Sun Microsystems), USA

Page 23: Cluster Computing PPT


Representative Cluster Systems

1. Solaris MC
2. Berkeley NOW
3. their comparison with Beowulf & HPVM

Page 24: Cluster Computing PPT


Next Generation Distributed Computing:

The Solaris MC Operating System

Page 25: Cluster Computing PPT


Why new software?

Without software, a cluster is:

– Just a network of machines

– Requires specialized applications

– Hard to administer

With a cluster operating system:

– Cluster becomes a scalable, modular computer

– Users and administrators see a single large machine

– Runs existing applications

– Easy to administer

New software makes cluster better for the customer

Page 26: Cluster Computing PPT


Cluster computing and Solaris MC

Goal: use computer clusters for general-purpose computing

Support existing customers and applications

Solution: Solaris MC (Multi Computer) operating system

A distributed operating system (OS) for multi-computers

Page 27: Cluster Computing PPT


What is the Solaris MC OS ?

Solaris MC extends standard Solaris

Solaris MC makes the cluster look like a single machine

– Global file system
– Global process management
– Global networking

Solaris MC runs existing applications unchanged
– Supports Solaris ABI (Application Binary Interface)

Page 28: Cluster Computing PPT


Applications Ideal for:

– Web and interactive servers
– Databases
– File servers
– Timesharing

Benefits for vendors and customers
– Preserves investment in existing applications
– Modular servers with low entry-point price and low cost of ownership
– Easier system administration
– Solaris could become a preferred platform for clustered


Page 29: Cluster Computing PPT


Solaris MC is a running research system

Designed, built and demonstrated Solaris MC prototype
– Cluster of SPARCstations connected with Myrinet network
– Runs unmodified commercial parallel database, scalable Web server, parallel make

Next: Solaris MC Phase II
– High availability
– New I/O work to take advantage of clusters
– Performance evaluation

Page 30: Cluster Computing PPT


Advantages of Solaris MC

Leverages continuing investment in Solaris
– Same applications: binary-compatible
– Same kernel, device drivers, etc.
– As portable as base Solaris - will run on SPARC, x86, PowerPC

State of the art distributed systems techniques
– High availability designed into the system
– Powerful distributed object-oriented framework

Ease of administration and use
– Looks like a familiar multiprocessor server to users, system administrators, and applications

Page 31: Cluster Computing PPT


Solaris MC details

Solaris MC is a set of C++ loadable modules on top of Solaris

– Very few changes to existing kernel

A private Solaris kernel per node: provides reliability

Object-oriented system with well-defined interfaces

Page 32: Cluster Computing PPT


Solaris MC components

Object and communication support

High availability support

PXFS global distributed file system

Process management

Networking

Solaris MC Architecture

[Figure: labels System call interface; File system; Object framework; Existing Solaris 2.5 kernel; Object invocations; Solaris MC]


Page 33: Cluster Computing PPT


Object Orientation

Better software maintenance, change, and evolution
– Well-defined interfaces
– Separate implementation from interface
– Interface inheritance

Solaris MC uses:
– IDL: a better way to define interfaces
– CORBA object model: a better RPC (Remote Procedure Call)
– C++: a better C

Page 34: Cluster Computing PPT


Object and Communication Framework

Mechanism for nodes and modules to communicate
– Inter-node and intra-node interprocess communication
– Optimized protocols for trusted computing base
– Efficient, low-latency communication primitives

Object communication independent of interconnect
– We use Ethernet, fast Ethernet, FibreChannel, Myrinet
– Allows interconnect hardware to be upgraded

Page 35: Cluster Computing PPT


High Availability Support

Node failure doesn't crash entire system
– Unaffected nodes continue running
– Better than a SMP
– A requirement for mission critical market

Well-defined failure boundaries
– Separate kernel per node - OS does not use shared memory

Object framework provides support
– Delivers failure notifications to servers and clients
– Group membership protocol detects node failures

Each subsystem is responsible for its recovery
– Filesystem, process management, networking,


Page 36: Cluster Computing PPT


PXFS: Global Filesystem

Single-system image of file system

Backbone of Solaris MC

Coherent access and caching of files and directories
– Caching provides high performance

Access to I/O devices

Page 37: Cluster Computing PPT


PXFS: An object-oriented VFS

PXFS builds on existing Solaris file systems
– Uses the vnode/virtual file system interface (VFS) externally
– Uses object communication internally

Page 38: Cluster Computing PPT


Process management

Provide global view of processes on any node
– Users, administrators, and applications see global view
– Supports existing applications

Uniform support for local and remote processes
– Process creation/waiting/exiting (including remote execution)
– Global process identifiers, groups, sessions
– Signal handling
– procfs (/proc)
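One common way to obtain cluster-wide process identifiers is to pack a node number and a node-local PID into a single integer. The sketch below shows that idea only. The bit widths and function names are assumptions for illustration, and the slides do not say how Solaris MC actually encodes its global PIDs.

```python
NODE_BITS = 8     # assumed: up to 256 nodes
LOCAL_BITS = 24   # assumed: up to 2**24 local process ids per node


def make_global_pid(node_id, local_pid):
    """Pack (node, local pid) into one cluster-wide identifier."""
    if not 0 <= node_id < (1 << NODE_BITS):
        raise ValueError("node_id out of range")
    if not 0 <= local_pid < (1 << LOCAL_BITS):
        raise ValueError("local_pid out of range")
    return (node_id << LOCAL_BITS) | local_pid


def split_global_pid(gpid):
    """Recover (node, local pid) from a global identifier."""
    return gpid >> LOCAL_BITS, gpid & ((1 << LOCAL_BITS) - 1)


def route_signal(gpid, this_node):
    """Decide whether a signal is delivered locally or forwarded to another node."""
    node, local = split_global_pid(gpid)
    return ("local", local) if node == this_node else ("forward", node, local)


gpid = make_global_pid(3, 1234)
print(split_global_pid(gpid))          # (3, 1234)
print(route_signal(gpid, this_node=0)) # ('forward', 3, 1234)
```

With an identifier like this, a command such as `kill` can work unchanged: the layer underneath inspects the node part and forwards the request when the target is remote.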

Page 39: Cluster Computing PPT


Process management benefits

Global process management helps users and administrators

Users see familiar single machine process model

Can run programs on any node

Location of process in the cluster doesn’t matter

Use existing commands and tools: unmodified ps, kill, etc.

Page 40: Cluster Computing PPT


Networking goals

Cluster appears externally as a single SMP server
– Familiar to customers
– Access cluster through single network address
– Multiple network interfaces supported but not required

Scalable design
– Protocol and network application processing on any node
– Parallelism provides high server performance

Page 41: Cluster Computing PPT


Networking: Implementation

A programmable "packet filter"
– Packets routed between network device and the correct node
– Efficient, scalable, and supports parallelism
– Supports multiple protocols with existing protocol stacks

Parallelism of protocol processing and applications
– Incoming connections are load-balanced across the cluster
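The load balancing of incoming connections can be sketched as a function that maps each connection to one node. The version below hashes the connection's addressing fields so that every packet of the same connection reaches the same node. This hashing scheme, the function name `pick_node`, and the addresses are assumptions for illustration. The slides do not describe the actual Solaris MC packet-filter policy.

```python
import hashlib


def pick_node(src_ip, src_port, dst_port, nodes):
    """Map a connection to one node; the same connection always maps to the same node."""
    key = f"{src_ip}:{src_port}->{dst_port}".encode()
    digest = hashlib.sha256(key).digest()
    index = int.from_bytes(digest[:4], "big") % len(nodes)
    return nodes[index]


nodes = ["node0", "node1", "node2", "node3"]
counts = {n: 0 for n in nodes}
for port in range(40000, 41000):          # 1000 hypothetical client connections
    counts[pick_node("10.0.0.7", port, 80, nodes)] += 1
print(counts)
```

Because the mapping depends only on the packet's own fields, it needs no shared connection table, which is one way such a filter could stay scalable.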

Page 42: Cluster Computing PPT



4 node, 8 CPU prototype with Myrinet demonstrated
– Object and communication infrastructure
– Global file system (PXFS) with coherency and caching
– Networking TCP/IP with load balancing
– Global process management (ps, kill, exec, wait, rfork, /proc)
– Monitoring tools
– Cluster membership protocols

Demonstrated applications
– Commercial parallel database
– Scalable Web server
– Parallel make
– Timesharing

Solaris-MC team is working on high availability

Page 43: Cluster Computing PPT


Summary of Solaris MC

Clusters likely to be an important market

Solaris MC preserves customer investment in Solaris
– Uses existing Solaris applications
– Familiar to customers
– Looks like a multiprocessor, not a special cluster architecture
– Ease of administration and use

Clusters are ideal for important applications
– Web server, file server, databases, interactive services

State-of-the-art object-oriented distributed implementation
– Designed for future growth

Page 44: Cluster Computing PPT


Berkeley NOW Project

Page 45: Cluster Computing PPT


NOW @ Berkeley

Design & Implementation of higher-level system
– Global OS (Glunix)
– Parallel File Systems (xFS)
– Fast Communication (HW for Active Messages)
– Application Support

Overcoming technology shortcomings
– Fault tolerance
– System Management

NOW Goal: Faster for Parallel AND Sequential

Page 46: Cluster Computing PPT


NOW Software Components


[Figure: labels Large Seq. Apps; Parallel Apps; Sockets, Split-C, MPI, HPF, vSM; Active Messages; Name Svr; Global Layer Unix; Unix (Solaris) Workstation; VN segment Driver (one per workstation); Myrinet Scalable Interconnect]



Page 47: Cluster Computing PPT


Active Messages: Lightweight Communication Protocol

Key Idea: Network Process ID attached to every message that HW checks upon receipt
– Net PID match, as fast as before
– Net PID mismatch, interrupt and invoke OS

Can mix LAN messages and MPP messages; invoke OS & TCP/IP only when not cooperating (if everyone uses same physical layer format)

Page 48: Cluster Computing PPT


MPP Active Messages

Key Idea: associate a small user-level handler directly with each message

Sender injects the message directly into the network

Handler executes immediately upon arrival
– Pulls the message out of the network and integrates it into the ongoing computation, or replies

No buffering (beyond transport), no parsing, no allocation, primitive scheduling

Page 49: Cluster Computing PPT


Active Message Model

Every message contains in its header the address of a user-level handler, which is executed immediately at user level

No receive side buffering of messages

Supports protected multiprogramming of a large number of users onto finite physical network resource

Active message operations, communication events and threads are integrated in a simple and cohesive model

Provides naming and protection
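The model above, in which a message names its own handler and the handler runs on arrival without receive-side buffering, can be mimicked in a few lines. In this sketch, handler "addresses" are indices into a table, and delivery is an ordinary function call. The class and handler names are hypothetical. A real Active Messages layer runs on network hardware and is not shown here.

```python
class Node:
    """A receiving node with a table of user-level handlers (illustrative)."""

    def __init__(self):
        self.handlers = []     # handler "addresses" are indices into this table
        self.total = 0         # state of the ongoing computation

    def register(self, fn):
        """Add a handler and return the index that senders put in the header."""
        self.handlers.append(fn)
        return len(self.handlers) - 1

    def deliver(self, message):
        """Run the handler named in the header as soon as the message arrives."""
        handler_id, payload = message
        return self.handlers[handler_id](self, payload)


def add_handler(node, payload):
    node.total += payload      # integrate the data straight into the computation


def read_handler(node, payload):
    return node.total          # reply with the current value


node = Node()
ADD = node.register(add_handler)
READ = node.register(read_handler)
for value in (5, 7, 30):
    node.deliver((ADD, value))
reply = node.deliver((READ, None))
print(reply)                   # 42
```

Note that `deliver` never queues a message: each one is consumed by its handler immediately, which is the property the slide calls "no receive side buffering".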

Page 50: Cluster Computing PPT


Active Message Model (Contd..)






[Figure: labels data; pc; Active Message]


Page 51: Cluster Computing PPT


xFS: File System for NOW

Serverless File System: All data with clients
– Uses MP cache coherency to reduce traffic

Files striped for parallel transfer

Large file cache ("cooperative caching - Network RAM")

                Miss Rate   Response Time
Client/Server   10%         1.8 ms
xFS             4%          1.0 ms

(42 WS, 32 MB/WS, 512 MB/server, 8KB/access)
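"Files striped for parallel transfer" means a file's blocks are spread over several machines so they can be read or written at the same time. The sketch below shows plain round-robin striping and reassembly on in-memory byte strings. The function names and block size are assumptions for illustration, and it omits everything else a real striped file system would need, such as redundancy and cache coherency.

```python
def stripe(data, n_nodes, block_size):
    """Split data into fixed-size blocks and deal them round-robin to nodes."""
    stripes = [[] for _ in range(n_nodes)]
    for i in range(0, len(data), block_size):
        stripes[(i // block_size) % n_nodes].append(data[i:i + block_size])
    return stripes


def reassemble(stripes):
    """Read blocks back in round-robin order to rebuild the original bytes."""
    out = []
    longest = max(len(s) for s in stripes)
    for row in range(longest):
        for s in stripes:
            if row < len(s):
                out.append(s[row])
    return b"".join(out)


data = bytes(range(20))
stripes = stripe(data, n_nodes=3, block_size=4)
print([len(s) for s in stripes])      # [2, 2, 1]
print(reassemble(stripes) == data)    # True
```

With three nodes, up to three blocks can be transferred at once, which is where the parallel-transfer benefit comes from.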

Page 52: Cluster Computing PPT


Glunix: Gluing Unix

It is built on top of Solaris

It glues together Solaris running on cluster nodes.

Supports transparent remote execution and load balancing; allows existing applications to run.

Provides globalized view of system resources like Solaris MC

Gang schedules parallel jobs to be as good as dedicated MPP for parallel jobs
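Gang scheduling runs all processes of a parallel job in the same time slot, one per node, so they can communicate without waiting for each other to be scheduled. The sketch below packs jobs into time slots with a simple first-fit rule. The function name, the job list, and the first-fit policy are assumptions for illustration and are not taken from Glunix.

```python
def gang_schedule(jobs, n_nodes):
    """First-fit packing of jobs into time slots.

    `jobs` is a list of (name, number_of_processes) pairs.
    Returns a list of slots; each slot lists, per node, the job it runs
    (None means the node is idle in that slot).
    """
    slots = []
    for name, width in jobs:
        if width > n_nodes:
            raise ValueError(f"{name} needs more nodes than the cluster has")
        for slot in slots:
            free = [i for i, j in enumerate(slot) if j is None]
            if len(free) >= width:
                break                      # this existing slot has room
        else:
            slot = [None] * n_nodes        # no slot fits: open a new one
            slots.append(slot)
            free = list(range(n_nodes))
        for i in free[:width]:
            slot[i] = name
    return slots


schedule = gang_schedule([("A", 4), ("B", 2), ("C", 2), ("D", 3)], n_nodes=4)
for slot in schedule:
    print(slot)
```

The scheduler would then cycle through the slots, switching every node to the next slot at the same moment.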

Page 53: Cluster Computing PPT


3 Paths for Applications on NOW?

Revolutionary (MPP Style): write new programs from scratch using MPP languages, compilers, libraries,…

Porting: port programs from mainframes, supercomputers, MPPs, …

Evolutionary: take sequential program & use

1) Network RAM: first use memory of many computers to reduce disk accesses; if not fast enough, then:

2) Parallel I/O: use many disks in parallel for accesses not in file cache; if not fast enough, then:

3) Parallel program: change program until it sees enough processors that it is fast

=> Large speedup without fine grain parallel program

Page 54: Cluster Computing PPT


Comparison of 4 Cluster Systems

Page 55: Cluster Computing PPT


Clusters Revisited

Page 56: Cluster Computing PPT



We have discussed Clusters
– Enabling Technologies
– Architecture & its Components
– Classifications
– Middleware
– Single System Image
– Representative Systems

Page 57: Cluster Computing PPT



Clusters are promising..

Solve parallel processing paradox

Offer incremental growth and match funding pattern.

New trends in hardware and software technologies are likely to make clusters more that

Cluster-based supercomputers can be seen