![Page 1: (c) Raj Rajkumar Buyya, Monash University, Melbourne, Australia. rajkumar@ieee.org rajkumar Low Cost Supercomputing Parallel](https://reader035.vdocument.in/reader035/viewer/2022062222/5697bfed1a28abf838cb8df6/html5/thumbnails/1.jpg)
(c) Raj
Rajkumar Buyya, Monash University, Melbourne, Australia.
rajkumar@ieee.org http://www.dgs.monash.edu.au/~rajkumar
Low Cost Supercomputing
Parallel Processing on Linux Clusters
Agenda
Cluster? Enabling Tech. & Motivations
Cluster Architecture
Cluster Components and Linux
Parallel Processing Tools on Linux
Cluster Facts
Resources and Conclusions
Need for More Computing Power:
Grand Challenge Applications
Solving technology problems using
computer modeling, simulation and analysis
Life Sciences
Mechanical Design & Analysis (CAD/CAM)
Aerospace
Geographic Information Systems
Two Eras of Computing
[Figure: timeline from 1940 to 2030 showing the Sequential Era and the Parallel Era. Layers in each era: Architectures, System Software, Applications, P.S.Es. Phases: R & D, Commercialization, Commodity.]
Competing Computer Architectures
Vector Computers (VC), proprietary systems
– provided the breakthrough needed for the emergence of computational science, but they were only a partial answer.
Massively Parallel Processors (MPP), proprietary systems
– high cost and a low performance/price ratio.
Symmetric Multiprocessors (SMP)
– suffer from limited scalability
Distributed Systems
– difficult to use and hard to extract parallel performance from.
Clusters, gaining popularity
– High Performance Computing: Commodity Supercomputing
– High Availability Computing: Mission Critical Applications
Technology Trend...
Performance of PC/workstation components has almost reached that of the components used in supercomputers…
– Microprocessors (50% to 100% per year)
– Networks (Gigabit ..)
– Operating Systems
– Programming environment
– Applications
Rate of performance improvement of commodity components is very high.
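As a rough illustration of what the 50% to 100% per year figure for microprocessors implies, the sketch below compounds a constant yearly improvement rate and derives the corresponding doubling time. The function names and the 5-year horizon are assumptions made for this example only; the slide does not state how long such rates persist.

```python
import math

def growth_factor(rate_per_year: float, years: float) -> float:
    """Total speed-up after compounding `rate_per_year` for `years` years."""
    return (1.0 + rate_per_year) ** years

def doubling_time(rate_per_year: float) -> float:
    """Years needed for performance to double at a constant yearly rate."""
    return math.log(2.0) / math.log(1.0 + rate_per_year)

for rate in (0.5, 1.0):  # the 50% and 100% per-year figures from the slide
    print(f"{rate:.0%}/year: x{growth_factor(rate, 5):.1f} in 5 years, "
          f"doubles every {doubling_time(rate):.2f} years")
```

At these rates a commodity processor doubles in performance roughly every one to two years, which is the basis of the argument that clusters can keep absorbing new components.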
Technology Trend
The Need for Alternative
Supercomputing Resources
Cannot afford to buy “Big Iron” machines
– due to their high cost and short life span.
– cut-down of funding
– don’t “fit” well into today's funding model.
– ….
Paradox: the time required to develop a parallel application for solving a GCA is equal to:
– the half-life of parallel supercomputers.
Clusters are the best alternative!
Supercomputing-class commodity components are available
They “fit” very well with today’s/future funding model.
Can leverage future technological advances
– VLSI, CPUs, Networks, Disk, Memory, Cache, OS, programming tools, applications,...
Best of both Worlds!
High Performance Computing (talk focused on this)
– parallel computers/supercomputer-class workstation cluster
– dependable parallel computers
High Availability Computing
– mission-critical systems
– fault-tolerant computing
What is a cluster?
A cluster is a type of parallel or distributed processing system, which consists of a collection of interconnected stand-alone computers cooperatively working together as a single, integrated computing resource.
A typical cluster:
– Network: Faster, closer connection than a typical network (LAN)
– Low latency communication protocols
– Looser connection than SMP
So What’s So Different about Clusters?
Commodity Parts?
Communications Packaging?
Incremental Scalability?
Independent Failure?
Intelligent Network Interfaces?
Complete System on every node
– virtual memory
– scheduler
– files
– …
Nodes can be used individually or combined...
Clustering of Computers
for Collective Computing
[Figure: timeline with the labels 1960, 1990, 1995+]
Computer Food Chain (Now and Future)
Demise of Mainframes, Supercomputers, & MPPs
Cluster Configuration 1: Dedicated Cluster
Cluster Configuration 2: Enterprise Clusters (use JMS like Codine)
Shared Pool of Computing Resources: Processors, Memory, Disks
Interconnect
Guarantee at least one workstation to many individuals (when active)
Deliver large % of collective resources to few individuals at any one time
Windows of Opportunities
MPP/DSM:
– Compute across multiple systems: parallel.
Network RAM:
– Idle memory in other nodes. Page across other nodes' idle memory
Software RAID:
– file system supporting parallel I/O and reliability, mass-storage.
Multi-path Communication:
– Communicate across multiple networks: Ethernet, ATM, Myrinet
Cluster Computer Architecture
Major issues in cluster design
Size Scalability (physical & application)
Enhanced Availability (failure management)
Single System Image (look-and-feel of one system)
Fast Communication (networks & protocols)
Load Balancing (CPU, Net, Memory, Disk)
Security and Encryption (clusters of clusters)
Distributed Environment (Social issues)
Manageability (admin. and control)
Programmability (simple API if required)
Applicability (cluster-aware and non-aware app.)
Scalability Vs. Single System Image
Linux-based Tools for
High Performance Computing
High Availability Computing
Hardware
Linux OS is running/driving...
– PCs (Intel x86 processors)
– Workstations (Digital Alphas)
– SMPs (CLUMPS)
– Clusters of Clusters
Linux supports networking with
– Ethernet (10 Mbps)/Fast Ethernet (100 Mbps)
– Gigabit Ethernet (1 Gbps)
– SCI (Dolphin, MPI, 12 µs latency)
– ATM
– Myrinet (1.2 Gbps)
– Digital Memory Channel
– FDDI
Communication Software
Traditional OS supported facilities (heavy weight due to protocol processing)..
– Sockets (TCP/IP), Pipes, etc.
Light weight protocols (User Level)
– Active Messages (AM) (Berkeley)
– Fast Messages (Illinois)
– U-net (Cornell)
– XTP (Virginia)
– Virtual Interface Architecture (industry standard)
Cluster Middleware
Resides between the OS and applications and offers an infrastructure for supporting:
– Single System Image (SSI)
– System Availability (SA)
SSI makes a collection of machines appear as a single machine (globalised view of system resources). telnet cluster.myinstitute.edu
SA - Checkpointing and process migration..
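The checkpointing idea mentioned under System Availability can be sketched at application level as follows. This is only an illustration under stated assumptions: the function name, the `fail_at` parameter, and the pickle file are hypothetical, and cluster middleware would typically checkpoint whole processes transparently rather than rely on the application saving its own state.

```python
import os
import pickle
import tempfile

def run_with_checkpoints(total_steps, ckpt_path, fail_at=None):
    """Sum 0..total_steps-1, saving progress after every step.

    `fail_at` simulates a node crash at that step (illustration only).
    """
    # Resume from the last checkpoint if one exists, else start fresh.
    if os.path.exists(ckpt_path):
        with open(ckpt_path, "rb") as f:
            state = pickle.load(f)
    else:
        state = {"step": 0, "acc": 0}

    while state["step"] < total_steps:
        if fail_at is not None and state["step"] == fail_at:
            raise RuntimeError("simulated node failure")
        state["acc"] += state["step"]
        state["step"] += 1
        with open(ckpt_path, "wb") as f:   # persist progress
            pickle.dump(state, f)
    return state["acc"]

ckpt = os.path.join(tempfile.mkdtemp(), "job.ckpt")
try:
    run_with_checkpoints(10, ckpt, fail_at=6)
except RuntimeError:
    pass                                   # the "node" went down at step 6
result = run_with_checkpoints(10, ckpt)    # restart resumes from step 6
print(result)
```

The second call does not repeat steps 0 to 5; it picks up from the saved state, which is the property that makes long-running jobs survive node failures.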
Cluster Middleware
OS / Gluing Layers
– Solaris MC, Unixware, MOSIX
– Beowulf “Distributed PID”
Runtime Systems
– Runtime systems (software DSM, PFS, etc.)
– Resource management and scheduling (RMS):
• CODINE, CONDOR, LSF, PBS, NQS, etc.
Programming environments
Threads (PCs, SMPs, NOW..)
– POSIX Threads
– Java Threads
MPI
– http://www-unix.mcs.anl.gov/mpi/mpich/
PVM
– http://www.epm.ornl.gov/pvm/
Software DSMs (Shmem)
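To give a feel for the message-passing style shared by the environments listed above, here is a minimal master/worker sketch. It uses only Python's standard `threading` and `queue` modules as a stand-in for send and receive; it is not the MPI or PVM API, and the function names are assumptions made for the example.

```python
import queue
import threading

def worker(tasks: queue.Queue, results: queue.Queue) -> None:
    """Receive (lo, hi) ranges, send back partial sums; stop on None."""
    while True:
        msg = tasks.get()
        if msg is None:
            break
        lo, hi = msg
        results.put(sum(range(lo, hi)))

def parallel_sum(n: int, workers: int = 4) -> int:
    """Sum the integers in [0, n) by farming chunks out to workers."""
    tasks, results = queue.Queue(), queue.Queue()
    threads = [threading.Thread(target=worker, args=(tasks, results))
               for _ in range(workers)]
    for t in threads:
        t.start()
    # "Scatter": split [0, n) into at most one contiguous chunk per worker.
    step = max(1, -(-n // workers))  # ceiling division, never zero
    chunks = [(lo, min(lo + step, n)) for lo in range(0, n, step)]
    for chunk in chunks:
        tasks.put(chunk)
    for _ in threads:
        tasks.put(None)              # one stop message per worker
    # "Reduce": combine the partial sums as they arrive.
    total = sum(results.get() for _ in chunks)
    for t in threads:
        t.join()
    return total

print(parallel_sum(1_000_000))
```

The workers share no data and communicate only through messages, so the same decomposition carries over to processes on separate cluster nodes. In CPython the threads illustrate the pattern rather than deliver a speed-up for CPU-bound work.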
Development Tools
Compilers
– C/C++/Java/
Debuggers
Performance Analysis Tools
Visualization Tools
GNU: www.gnu.org
Applications
Sequential (benefit from the cluster)
Parallel / Distributed (Cluster-aware app.)
– Grand Challenge applications
• Weather Forecasting
• Quantum Chemistry
• Molecular Biology Modeling
• Engineering Analysis (CAD/CAM)
• Ocean Modeling
• …………
– PDBs, web servers, data-mining
Linux Webserver (Network Load Balancing)
http://proxy.iinchina.net/~wensong/ippfvs/
High Performance (by serving requests through lightly loaded machines)
High Availability (detecting failed nodes and isolating them from the cluster)
Transparent/Single System view
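The two properties listed for the load-balanced web server, sending requests to a lightly loaded machine and isolating failed nodes, can be modelled with a small least-connections dispatcher. This is a user-level sketch only: the class and method names are hypothetical, and it does not represent the implementation of the project linked above.

```python
from dataclasses import dataclass

@dataclass
class Server:
    name: str
    active_connections: int = 0
    alive: bool = True

class LeastLoadedBalancer:
    """Route each request to the live server with the fewest connections."""

    def __init__(self, servers):
        self.servers = list(servers)

    def mark_failed(self, name: str) -> None:
        # Isolate a failed node so that it receives no further requests.
        for s in self.servers:
            if s.name == name:
                s.alive = False

    def dispatch(self) -> str:
        live = [s for s in self.servers if s.alive]
        if not live:
            raise RuntimeError("no live servers in the cluster")
        target = min(live, key=lambda s: s.active_connections)
        target.active_connections += 1
        return target.name

lb = LeastLoadedBalancer([Server("node1"), Server("node2"), Server("node3")])
first = [lb.dispatch() for _ in range(3)]   # spreads over all three nodes
lb.mark_failed("node2")
later = [lb.dispatch() for _ in range(4)]   # node2 is skipped from now on
print(first, later)
```

Clients only ever talk to the dispatcher, which is what gives the transparent, single-system view: they cannot tell which node served them or that one node has dropped out.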
A typical Cluster Computing Environment
[Figure: layered stack, top to bottom: Application; PVM / MPI / RSH; ???; Hardware/OS]
CC should support
Multi-user, time-sharing environments
Nodes with different CPU speeds and memory sizes (heterogeneous configuration)
Many processes, with unpredictable requirements
Unlike SMP: insufficient “bonds” between nodes
– Each computer operates independently
– Inefficient utilization of resources
Multicomputer OS for UNIX (MOSIX)
An OS module (layer) that provides the applications with the illusion of working on a single system
Remote operations are performed like local operations
Transparent to the application - user interface unchanged
[Figure: layered stack, top to bottom: Application; PVM / MPI / RSH; MOSIX; Hardware/OS. MOSIX offers the missing link.]
http://www.mosix.cs.huji.ac.il/
MOSIX: Main tool
Preemptive process migration that can migrate any process, anywhere, anytime
Supervised by distributed algorithms that respond on-line to global resource availability, transparently
Load-balancing: migrate processes from over-loaded to under-loaded nodes
Memory ushering: migrate processes from a node that has exhausted its memory, to prevent paging/swapping
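The load-balancing policy, moving processes from over-loaded to under-loaded nodes, can be illustrated with a toy model. This is a deliberately simplified, centralized sketch: the `balance` function and its `threshold` parameter are assumptions made for the example, whereas MOSIX makes these decisions with distributed algorithms, as the slide states.

```python
def balance(loads: dict, threshold: int = 1) -> list:
    """Migrate one process at a time from the most to the least loaded node.

    `loads` maps node name -> number of runnable processes and is updated
    in place. Returns the list of (source, destination) migrations.
    """
    if threshold < 1:
        raise ValueError("threshold must be >= 1 to guarantee termination")
    migrations = []
    while True:
        src = max(loads, key=loads.get)
        dst = min(loads, key=loads.get)
        if loads[src] - loads[dst] <= threshold:
            break                      # loads are even enough; stop
        loads[src] -= 1
        loads[dst] += 1
        migrations.append((src, dst))
    return migrations

loads = {"n1": 9, "n2": 1, "n3": 2}
moves = balance(loads)
print(loads, len(moves))
```

Memory ushering follows the same pattern with free memory, rather than process count, as the quantity being evened out.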
MOSIX for Linux at HUJI
A scalable cluster configuration:
– 50 Pentium-II 300 MHz
– 38 Pentium-Pro 200 MHz (some are SMPs)
– 16 Pentium-II 400 MHz (some are SMPs)
Over 12 GB cluster-wide RAM
Connected by the Myrinet 2.56 Gb/s LAN
Runs Red-Hat 6.0, based on Kernel 2.2.7
Upgrade: HW with Intel, SW with Linux
Download MOSIX: http://www.mosix.cs.huji.ac.il/
Nimrod - A tool for parametric modeling on clusters
http://www.dgs.monash.edu.au/~davida/nimrod.html
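Parametric modeling means running the same program over every combination of a set of parameter values, with each combination becoming an independent job that a cluster can run in parallel. The sketch below shows only that expansion step; the `parameter_sweep` function and the parameter names are hypothetical, and it does not use Nimrod's own job description format.

```python
import itertools

def parameter_sweep(parameters: dict) -> list:
    """Expand {name: [values]} into one job (a dict) per combination."""
    names = list(parameters)
    return [dict(zip(names, combo))
            for combo in itertools.product(*(parameters[n] for n in names))]

# Hypothetical parameters for an illustrative simulation.
jobs = parameter_sweep({
    "pressure": [1.0, 2.0, 5.0],
    "angle": [0, 15, 30, 45],
})
print(len(jobs), jobs[0], jobs[-1])
```

Because the jobs do not communicate with each other, they can be handed to any free node, which makes this kind of workload a natural fit for clusters.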
Job processing with Nimrod
PARMON: A Cluster Monitoring Tool
[Figure: PARMON client (parmon) running on a JVM, connected through a high-speed switch to the PARMON server (parmond) on each node]
Resource Utilization at a Glance
Linux cluster in Top500
Top500 Supercomputing Sites (www.top500.org) declared Avalon (http://cnls.lanl.gov/avalon/), a Beowulf cluster, the 113th most powerful computer in the world.
70 processor DEC Alpha cluster
Cost: $152K
Completely commodity and Free Software
price/performance is $15/Mflop,
performance similar to 1993’s 1024-node CM-5
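The quoted cost and price/performance figures can be cross-checked with a few lines of arithmetic. The values derived below follow only from the rounded numbers on the slide and are not measured results.

```python
cost_usd = 152_000          # total cost quoted on the slide
usd_per_mflop = 15          # price/performance quoted on the slide
processors = 70

implied_mflops = cost_usd / usd_per_mflop
print(f"implied performance: {implied_mflops / 1000:.1f} Gflop/s")
print(f"per processor:       {implied_mflops / processors:.0f} Mflop/s")
print(f"cost per processor:  ${cost_usd / processors:,.0f}")
```

Under these figures the cluster delivers on the order of 10 Gflop/s for roughly $2,200 per processor.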
Adoption of the Approach
Concluding Remarks
Clusters are promising..
Solve the parallel processing paradox
Offer incremental growth and match funding patterns
New trends in hardware and software technologies are likely to make clusters more promising and fill the SSI gap..
so that cluster-based supercomputers (Linux-based clusters) can be seen everywhere!
Announcement: formation of
IEEE Task Force on Cluster Computing
(TFCC)
http://www.dgs.monash.edu.au/~rajkumar/tfcc/
http://www.dcs.port.ac.uk/~mab/tfcc/
Well, Read my book for….
http://www.dgs.monash.edu.au/~rajkumar/cluster/
Thank You ...
?