
HP IT-Symposium 2006

www.decus.de

© 2004 Hewlett-Packard Development Company, L.P. Subject to change. 5982-7693DEE. August 2004

HPC Cluster: High Performance Computing Cluster
Status Update
18 May 2006
Dr. Werner Höhn, Senior Consultant Presales, HPC, Bad Homburg
Talk 3F01, IT-Symposium 2006

Agenda

1. Architectures and HPC Cluster Building Blocks
2. Cluster Management Software
3. HP's Cluster Filesystem: SFS


HP – Decus IT-Symposium 2006 – www.decus.de 4

HP Cluster Platforms

HPC platforms: choice, performance, manageability.

• Cluster platforms: CP3000, CP4000 & CP4000BL (Version 2), CP6000 (new!)
• HP Integrity servers: rx1620, rx2620, rx4640, rx7620, rx8620, Superdome
• HP ProLiant servers: DL140 G2, DL145 G2, DL360 G4p, DL380 G4, DL385, DL585
• HP workstations: xw8200, nw8240, xw9300, c8000
• HP BladeSystems: BL20p G3, BL25p, BL30p, BL35p, BL40p, BL45p, BL60p (new!)


HP – Decus IT-Symposium 2006 – www.decus.de 5

HP Choice of Standard Processors: Leading the Dual-Core Curve

Opteron, Xeon and IA64 in 2006:
• ProLiant DP/MP dual core (Opteron) – 2Q05
• ProLiant MP/DP dual core (Paxville) – 4Q05
• ProLiant UP dual core – 4Q05
• ProLiant DP 1066 MHz dual core (Dempsey) – 2Q06
• Itanium 2 dual core (Montecito) – 3Q06

New dual-core Intel Xeon processors for ProLiant platforms arrived in 4Q05.

HP – Decus IT-Symposium 2006 – www.decus.de 6

Architecture Comparison: Intel Xeon DP vs. AMD Opteron

AMD Opteron:
• Bus bottlenecks reduced or eliminated
• Adding CPUs adds memory and I/O bandwidth
• 5.3 GB/s dedicated CPU memory bandwidth
• CPU-to-CPU coherent HyperTransport (cHT) links offer 3.2 GB/s in each direction (HT1, 800 MHz)
• Each PCI-X bus has 3.2 GB/s of bandwidth
• I/O is independent of memory access

Intel Xeon DP:
• Under full load each CPU gets at most half of the maximum bus bandwidth
• Memory and I/O must share the same bus
• However, FSB clock rate will increase significantly
• Single bus: not highly scalable past 2-way
• Multiple buses are coming
• More functional units


HP – Decus IT-Symposium 2006 – www.decus.de 7

STREAM benchmark on DL585 (Kernel 2.6.6), MB/s; the 4P column shows previous unofficial / official (1 Aug 2005) numbers:

Operation | DL585-1P | DL585-2P | DL585-4P (unofficial / official)
copy      |  2994.53 |  5915.54 | 11677.71 / 13893
scale     |  2984.81 |  5904.61 | 11667.05 / 13894
add       |  3047.45 |  6044.21 | 11926.22 / 14599
triad     |  3014.73 |  5974.97 | 11800.23 / 14562

http://www.cs.virginia.edu/stream/stream_mail/2005/0005.html
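For reference, the four STREAM kernels measure sustained memory bandwidth with simple vector operations. Below is a rough numpy sketch of the triad kernel (a = b + s*c); it is illustrative only, since splitting the triad into two array operations moves more data than the C benchmark does, so its numbers are not directly comparable to the table above.

```python
import time
import numpy as np

def stream_triad(n=10_000_000, reps=5):
    """Estimate memory bandwidth with the STREAM triad a = b + s*c.

    Returns the best rate in MB/s over `reps` runs, counting 24 bytes
    moved per element (three arrays of 8-byte doubles), as STREAM does.
    """
    b = np.random.rand(n)
    c = np.random.rand(n)
    a = np.empty_like(b)
    s = 3.0
    best = 0.0
    for _ in range(reps):
        t0 = time.perf_counter()
        np.multiply(c, s, out=a)   # a = s * c
        np.add(a, b, out=a)        # a = b + s * c
        dt = time.perf_counter() - t0
        best = max(best, 24.0 * n / dt / 1e6)
    return best

if __name__ == "__main__":
    print(f"triad: {stream_triad():.1f} MB/s")
```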

HP – Decus IT-Symposium 2006 – www.decus.de 8

K8L: internet news excerpt (largely from Chuck Moore, 16 May 2006)

• Quad-core chips in 2007, 65 nm (one die)
• HyperTransport up to 5.2 GT/s, presumably at 8 B per clock
• Improvements in cache coherency (multi-socket systems?)
• L2 cache per core, on chip; shared L3 cache
• Separate power supply for memory controller and cores
• 48-bit virtual/physical addressing (up to 256 TB) and 1 GB pages
• DDR2 starting with the FX2 (AM2) socket, DDR3 later (with a new socket?)
• RAS features for memory access and HT
• Out-of-order special load, similar to memory disambiguation/speculative load
• 2x 128-bit SSE units, also used for FP ops


HP – Decus IT-Symposium 2006 – www.decus.de 9

DP Dempsey Overview

DP Dempsey feature summary:
• Availability target: Q1'06
• Process technology: 65 nm
• Socket: LGA 771
• Intel® EM64T: yes
• Hyper-Threading Technology: yes
• Independent L2 cache: 2 x 2 MB
• Intel I/OAT: yes
• Demand Based Switching (DBS): yes**

Diagram: two execution cores, each with its own 2 MB L2 cache, sharing the FSB to the MCH.

** DBS will not be supported on all SKUs.

HP – Decus IT-Symposium 2006 – www.decus.de 10

Intel x86 processor roadmap: remarks

• Dual-Core Xeon/Paxville, October 2005 – Paxville DP started at a 3.2 GHz clock rate, 2x2 MB on-chip L2 cache, 800 MT/s FSB.
• Dempsey – similar to Paxville, but 65 nm DP. Unlike Intel's "Presler", has SMP support. 2.5-3.73 GHz, 667 to 1066 MHz FSB, 2x2 MB on-chip L2 cache.
• Woodcrest (80 W) – dual-core processor primarily for 2-socket systems (based on Intel's Merom and Conroe cores, NGA). 1333 MT/s FSB, dual independent buses, 4 MB shared L2 cache, 1.6-3.0 GHz, more functional units.
• Clovertown – quad-core version of Woodcrest, built from 2 Woodcrest dies on one MCM. Presumably 1066 MT/s FSB clock rate.


Agenda

1. Architectures and HPC Cluster Building Blocks
2. Cluster Management Software
3. HP's Cluster Filesystem: SFS

HP – Decus IT-Symposium 2006 – www.decus.de 12

HP ProLiant servers for HPTC: power and choice for scale-out solutions

• DL500 Series – DL585: the industry's most capable Opteron-based 4-way server, with 64 GB memory capacity and best-in-class management and uptime features
• DL300 Series – DL360 / DL380 / DL385: maximum compute power with commercial robustness
• DL100 Series – DL140/DL145: high-performance, low-cost 2P/1U compute nodes optimized for HPC environments
• BL series – BL2xp/BL3xp/BL4xp: performance 2P & 4P blades designed for density, with cluster manageability & connectivity


HP – Decus IT-Symposium 2006 – www.decus.de 13

HP Cluster Platform components: specific nodes

• ProLiant DL145: two processors, 1U
  − Opteron 2.8 GHz SC or 2.6 GHz DC
  − two PCI-X slots
  − up to 16 GB memory (PC3200 DDR)
• ProLiant DL585: four processors, 4U
  − Opteron 2.8 GHz SC or 2.6 GHz DC
  − six PCI-X buses, 8 slots
  − up to 64 GB memory (PC2700 DDR), or 32 GB (PC3200 DDR)
• Xeon-64 DL360: two sockets, 1U
  − 3.4 GHz
  − two PCI Express slots
  − up to 8 GB memory (DDR333)
• Integrity servers
  − next generation of Itanium 2 processors
  − rx1620, rx2620 and rx4640

HP – Decus IT-Symposium 2006 – www.decus.de 14

ProLiant DL145 G2 highlights

A low-cost 1U server with two AMD Opteron 200-series processors, delivering outstanding performance and price for both High Performance Computing (HPC) environments and cost-conscious server deployments. Speed bump: March 2006.

Performance
• AMD Opteron processors with 1 GHz HyperTransport
  − Model 285 (2.6 GHz/1 MB) dual-core
  − Model 254 (2.8 GHz/1 MB) single-core
  − Integrated memory controller running at processor frequency, and support for AMD PowerNow!™
• 8 DIMM slots supporting up to 16 GB of 400 MHz DDR1 memory (DDR3200)
• 2 PCI-X 64-bit/133 MHz slots (one full-length, one low-profile)
• Optional PCI Express support @ x16 (full-length)
• Non-hot-plug SATA and SCSI hard disk drive support
• SATA RAID 0 & 1 support (optional SAS HBA)

Design & connectivity
• New bezel design with UID for easy identification in large-scale rack deployments
• Simplified rack rail design and common 1U rail kit
• 4 USB ports: 2 front & 2 rear

Management
• HP ProLiant Lights-Out 100i remote management
• IPMI 1.5 / IPMI 2.0


HP – Decus IT-Symposium 2006 – www.decus.de 15

ProLiant DL145 G2 Overview

Ideal for large clustered High Performance Computing (HPC) environments and general-purpose compute requirements for corporate datacenters and cost-conscious small and medium businesses.

Key benefits
− Maximum-performance 2P/1U compute node at an affordable price
− Reliable and flexible infrastructure compute engine for businesses of all sizes
− Complete tools for essential system management
− Tier 1 vendor service and support

HP – Decus IT-Symposium 2006 – www.decus.de 16

DL145 G2 OS support — full support for:

• Microsoft Windows
  − 2003 Server (Enterprise, Standard, and Web)
  − 2003 Server for 64-bit Extended Systems (Standard & Enterprise)
  − 2000 Server and Advanced Server
• Red Hat
  − Enterprise Linux AS 3, WS 3, ES 3, AS 4, WS 4, ES 4 (each 32-bit & 64-bit)
• SuSE Linux ES 8 and ES 9 (32-bit & 64-bit)
• Solaris 10 (64-bit)


HP – Decus IT-Symposium 2006 – www.decus.de 17

DL145 G2 layout

• (2) 133 MHz PCI-X slots: (1) full-length & (1) low-profile; optional PCI-E @ x16 in place of the full-length slot
• 2 non-hot-plug SATA or SCSI HDDs
• 500 W power supply
• 6 non-hot-plug fans
• Processor & memory modules

HP – Decus IT-Symposium 2006 – www.decus.de 18

Product Overview: 4U 4P ProLiant DL585

Maximum performance
• Up to four 852-series 2.6 GHz AMD Opteron processors with 1 MB L2 cache and on-board 2.6 GHz memory controller for outstanding performance and scalability
• HyperTransport technology delivering 8 GB/s CPU-to-CPU throughput for maximum performance and scalability
• Up to 64 GB 2-way interleaved DDR
  − PC2700: 64 GB running at 266 MHz, or 48 GB at 333 MHz
  − PC3200: 32 GB running at 400 MHz
• 8 expansion slots: 6 x 64-bit/100 MHz and 2 x 64-bit/133 MHz PCI-X
• Dual-port Gbit NIC and Smart Array 5i Plus controller with battery-backed write cache enabler

ProLiant management
• Powerful Integrated Lights-Out (iLO) technology embedded
• Support for SmartStart and Systems Insight Manager

Outstanding uptime
• Advanced ECC memory protection
• Hot-plug redundant power supplies and fans
• Redundant ROMs


HP – Decus IT-Symposium 2006 – www.decus.de 19

Product Overview: ProLiant DL585

The industry's top-performing x86 4-way rack server, combining AMD's new Opteron dual-core processor technology, best-in-class management and high-uptime features in a system ideal for large data center deployments.

Performance
• Support for 4 800-series AMD Opteron single- and dual-core processors with on-board full-speed memory controller for outstanding performance and scalability
• HyperTransport technology delivering 8.0 GB/s CPU-to-CPU throughput for maximum performance and scalability
• Up to 128 GB dual-channel DDR1 at 266 MHz (PC2700), 48 GB at 333 MHz (PC2700) or 32 GB at 400 MHz (PC3200)
• 6 x 64-bit/100 MHz PCI-X expansion slots and 2 x 64-bit/133 MHz PCI-X slots
• Dual-port Gbit NIC and Smart Array 5i Plus controller with battery-backed write cache enabler

Management & deployment
• Powerful Integrated Lights-Out (iLO) technology embedded
• Support for SmartStart and Systems Insight Manager

Uptime
• Advanced ECC memory protection
• Hot-plug redundant power supplies and fans
• Redundant ROMs

HP – Decus IT-Symposium 2006 – www.decus.de 20

AMD Opteron™ Dual Core Overview

• The AMD Opteron™ processor was designed from the start to add a second core
  − The port already existed on the crossbar/SRI
  − One die with 2 CPU cores, each core with its own 1 MB L2 cache
• Drops into existing AMD Opteron 940-pin sockets that are compatible with 90 nm single-core processors
• A BIOS update is all that is necessary to bring a 2-processor/2-core server up and running as a 2-processor/4-core server
• The 2 CPU cores share the same memory and HyperTransport™ technology resources found in single-core AMD Opteron processors
  − The integrated memory controller & HyperTransport links route out the same as in today's implementation

Existing AMD Opteron™ design (diagram): CPU0 and CPU1, each with a 1 MB L2 cache, attach to the System Request Interface and crossbar switch, which connect to the memory controller and the HT0/HT1/HT2 links.


HP – Decus IT-Symposium 2006 – www.decus.de 21

Opteron price-performance: single vs. dual core

Chart: performance (SPECint_rate + SPECfp_rate, base) plotted against list price for DL585 2.4 SC, DL585 2.6 SC, DL585 2.2 DC, Sun V40z 2.2 DC, DL145 G2 2.6 SC and DL145 G2 2.2 DC, with lines of constant price-performance. Source: www.spec.org

HP – Decus IT-Symposium 2006 – www.decus.de 22

HP extends its x86 blade offerings: more performance & choice with 100% compatibility

Greater 32-bit performance, ProLiant design consistency, transparent 32/64-bit capabilities, dual-core.

• BL25p – performance 2P blade server using AMD Opteron technology, providing the industry's best blade performance; ideal for high-performance blade deployments. 16 GB RAM max; up to 2 HP SCSI hard drives.
• BL35p – double-dense 2P blade server using AMD Opteron technology, optimized for compute density and external storage solutions. 8 GB RAM max; 2 SFF ATA or 1 SFF SAS drive with pre-failure alerting.
• BL45p – 4P blade for superior scalability and infrastructure apps. Occupies 2 server blade bays; 32 GB RAM max; up to 2 HP SCSI hard drives.


HP – Decus IT-Symposium 2006 – www.decus.de 23

HP ProLiant DL360 G4p

• 1-2 Intel Xeon DP (3.4 and 3.6 GHz)
• 2 MB L2 cache
• Hyper-Threading technology
• EM64T 64-bit extension
• 800 MHz system bus
• 1-12 GB DDR main memory (PC3200)
• Online spare memory
• 2 hot-plug SCSI disks (max.), or 2 SATA disks (no DVD then)
• Smart Array 6i Plus RAID controller
• 2 64-bit (133 MHz) PCI-X slots
• Optional PCI Express
• iLO integrated management processor
• 2 integrated Gigabit NICs
• Optional redundant power supplies
• 1 rack unit (1U)

HP – Decus IT-Symposium 2006 – www.decus.de 24

DL360 G4p with dual-core processors (dual-core Xeon available since February '06)

Concentrated 1U compute power: a flexible, enterprise-class 1U server with integrated Lights-Out management and essential fault tolerance.

DL360 G4p SCSI:
• Processor/memory: up to 3.8 GHz Intel Xeon processors with 2 MB L2 cache; 1 GB DDR2 memory standard
• Drives: 2 Ultra320 SCSI drive bays
• Slots: 2 PCI-X; optional PCI Express
• RAID: Smart Array 6i; optional 128 MB BBWC; RAID 0/1

DL360 G4p SAS:
• Processor/memory: dual-core 2.8 GHz Intel Xeon, or up to 3.8 GHz Intel Xeon processors; 2 MB L2 cache per core; up to 12 GB DDR2 memory
• Drives: 4 SFF SAS drive bays (maximum internal drives)
• Slots: 1 available PCI-X; optional PCI Express
• RAID: Smart Array P600; 256 MB BBWC included; RAID 0/1/5/6 in a slot


HP – Decus IT-Symposium 2006 – www.decus.de 25

HP support advantages

• Access to 88,000 service professionals
• 70 worldwide help desks for around-the-clock support
• 24 x 365 business-critical support in 160 countries
• Full range of startup, installation, extended warranty, network planning, software updates, system health checks, recovery services, and IT outsourcing
• Instant Support Enterprise Edition (ISEE)
  − Single common support solution to manage the entire IT network
  − Filters events to identify all actionable service events
  − Proactively reduces downtime risks and provides quick recovery
  − Simplifies multi-vendor service management
  − Supports all ProLiant hardware & OS platforms

HP – Decus IT-Symposium 2006 – www.decus.de 26

Why we win. The HP difference: the ProLiant advantage

More flexibility, more choice. HP does not force customers into one business-model area. We provide a variety of solutions, allowing customers to purchase products where they feel most comfortable, such as from a trusted reseller. HP also provides complete CTO capabilities with its Factory Express solution.

Factory Express
• A pre-priced, pre-packaged, comprehensive and flexible portfolio of configured, customized and integrated factory solutions and deployment services
• Customers choose how their solution is built, integrated, tested, shipped and deployed.


Agenda

1. Architectures and HPC Cluster Building Blocks
2. Cluster Management Software
3. HP's Cluster Filesystem: SFS

HP – Decus IT-Symposium 2006 – www.decus.de 28

HP Cluster Platforms

• Factory-integrated hardware solution with optional software installation
  − Includes nodes, interconnects, network, racks, etc., integrated & tested
• Configure to order from 5 nodes to 1024 nodes (more by request)
  − Uniform, worldwide specification and product menus
  − Fully integrated, with HP warranty and support

Platform | Compute nodes | Interconnects
HP Cluster Platform 3000 | ProLiant DL360 G4, ProLiant DL140 G2 | GigE, IB, Myrinet
HP Cluster Platform 4000 and 4000BL | ProLiant DL145 G2, ProLiant DL585 (GigE, IB, Myrinet, Quadrics); ProLiant BL35p/BL45p (GigE) | see node entries
HP Cluster Platform 6000 | Integrity rx1620, Integrity rx2620 | GigE, IB, Quadrics

Dual-core processors available.


HP – Decus IT-Symposium 2006 – www.decus.de 29

Basic elements in HP Cluster Platforms

• Nodes:
  − Control nodes (aka head node, management node)
  − Compute nodes
  − Utility nodes (aka service nodes): added nodes for special admin tasks (e.g., login, file)
  − Visualization nodes
• Network/switches:
  − Admin and cluster network (GigE)
  − Console network (leverages IPMI/iLO functionality in the nodes)
  − Optional high-performance cluster interconnect
• Rack infrastructure: PDUs, monitor
• Software options
• Integration

HP – Decus IT-Symposium 2006 – www.decus.de 30

Interconnect technology directions

• Ethernet
  − Integrated 1 GbE
  − Optional 10 GbE w/ offload
  − Optional multifunction 1 GbE (e.g. trunking)
• InfiniBand
  − 4X DDR HCAs & switches (e.g. for PCI-E in the DL145)
• Myrinet
  − HP-MPI support for MX
  − Myri-10G
• QsNet
  − Elan4
  − QsNet III (Elan5)
• HTX options (?)

Topology diagram: top-level switches (288 ports) above node-level switches (24 ports), each node-level switch connecting to 12 nodes.


HP – Decus IT-Symposium 2006 – www.decus.de 31

High-performance interconnects

• InfiniBand (PCI-e)
  − Emerging industry standard
  − IB 4x: 1.8 GB/s, 3.5 µs MPI latency
  − 24-port, 96-port, 288-port switches
  − Scalable topologies with federation of switches (top-level 288-port switches; node-level 24-port switches, each connecting 12 nodes)
• Myrinet
  − Rev F: 489 MB/s, 2.6 µs MPI latency
  − Rev E: 800 MB/s, 2.7 µs MPI latency
  − 16-port, 128-port, 256-port switches
  − Scalable topologies with federation of switches (node-level 128-port switches, each connecting 64 nodes)
• Quadrics
  − Elan4: 800 MB/s, <1.3 µs MPI latency
  − 8-, 32-, 64-, 128-port switches
  − Scalable topologies with federation of switches (top-level 264-port switches; node-level 128-port switches, each connecting 64 nodes)
• GigE
  − 60-80 MB/s, >40 µs MPI latency

HP – Decus IT-Symposium 2006 – www.decus.de 32

HP Cluster Platform examples

• 32 compute node configuration
• 128 node configuration
• Designed for expansion


HP – Decus IT-Symposium 2006 – www.decus.de 33

New additions to HP Cluster Platforms

• CP3000 with dual-core Xeon processors
• Visualization building blocks
• New CP4000BL based on BL35p and BL45p blade servers, providing:
  − Simplified management
  − Performance and scalability
  − Reduced interconnect and network complexity
  − High density
  − Centralized power management
• Cluster Platform Express
  − Faster, easier way of configuring and ordering small clusters
  − Single-rack (up to 32 nodes) CP3000 and CP4000, with GigE or InfiniBand
  − Configuration tool at www.hp.com/go/hptc

HP – Decus IT-Symposium 2006 – www.decus.de 34

XC Cluster: HP's Linux-based production cluster for HPC

• A production computing environment for HPC built on Linux/industry-standard clusters
  − Industrial-hardened, scalable, supported
  − Integrates leading technologies from open source and partners
• Simple and complete product for real-world usage
  − Turn-key, with single-system ease of deployment and management
  − Smart provisioning takes the guesswork and experimentation out of deployment
  − Allows customers to focus on computation rather than infrastructure
• General-purpose technical computing flexibility
  − Supports throughput 'farms' as well as MPI parallel jobs
  − Plugs into 'grid' environments


HP – Decus IT-Symposium 2006 – www.decus.de 35

HP delivers a complete solution

• HP Cluster Platform: nodes, networks, storage
• XC System Software: Linux OS, cluster manager, job scheduler (LSF), HP-MPI, SFS client
• Compilers and development tools: validated selection of compilers, math libraries, debuggers, profiling tools
• Applications: extensive portfolio of tested applications
• HP Services: support and consulting, training, on-site staffing

HP – Decus IT-Symposium 2006 – www.decus.de 36

XC System Architecture


HP – Decus IT-Symposium 2006 – www.decus.de 37

Installation and configuration

• Head node configuration
  − Includes the code-replication environment (SystemImager): "golden client" and image server
• Other node configuration
  − Image installed via SystemImager in a two-phase process
    • Phase 1: node is generically imaged using Flamethrower (multicast)
    • Phase 2: per-node personality is applied using configuration data
  − At the end of this process, all nodes have been rebooted and configured with their respective personality
• Smart provisioning: recommends and assigns roles, based on user preferences
  − Automated discovery of network topology
  − Distributed service roles
  − Sets up the firewall

Diagram: the head node is installed via Kickstart from the XC distribution; SystemImager then propagates the golden-client image to the other nodes over the admin network, with node data kept in the XC database (XCDB).

HP – Decus IT-Symposium 2006 – www.decus.de 38

Adding nodes

• Simple set of commands to discover additional nodes and identify associated switches
• Define roles via cluster config commands
• Image the nodes
• The cluster continues to be online and available
• Adding switches? A discover-switch command.


HP – Decus IT-Symposium 2006 – www.decus.de 39

Monitoring

• Nagios
  − An open-source host, service and network monitoring program
  − Monitoring of network services such as SMTP
  − Monitoring of host resources such as processor load
  − Simple plugin design that allows administrators to easily develop their own service checks and define event handlers
  − Parallelized service checks
  − Contact notifications when service or host problems occur and get resolved
• Info collected via Supermon
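The "develop their own service checks" point above rests on Nagios's small plugin contract: a check prints one status line and exits with 0 = OK, 1 = WARNING, 2 = CRITICAL, 3 = UNKNOWN. A minimal sketch of a processor-load check follows; the threshold values are illustrative, not from the slides.

```python
#!/usr/bin/env python3
# Sketch of a Nagios-style service check for processor load.
import os
import sys

def check_load(warn=4.0, crit=8.0):
    """Return (exit_code, status_line) per the Nagios plugin contract."""
    try:
        load1, _, _ = os.getloadavg()   # 1-minute load average
    except OSError:
        return 3, "UNKNOWN - load average not available"
    if load1 >= crit:
        return 2, f"CRITICAL - load {load1:.2f} >= {crit}"
    if load1 >= warn:
        return 1, f"WARNING - load {load1:.2f} >= {warn}"
    return 0, f"OK - load {load1:.2f}"

if __name__ == "__main__":
    code, message = check_load()
    print(message)        # Nagios shows this line in its UI
    sys.exit(code)        # Nagios interprets the exit code
```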

HP – Decus IT-Symposium 2006 – www.decus.de 40

XC resource management

• SLURM and LSF
  − Scalability
  − Handling of STDIO, signals, etc.
  − Common underpinning to allow for PBS and/or other batch schedulers
• LSF manages the user workload and creates demand for resources.
• SLURM manages the cluster resources and provisions them to workload queues.


HP – Decus IT-Symposium 2006 – www.decus.de 41

Manageability

• Comprehensive and integrated system
  − Hardware and software
  − Supported by HP; includes patches and updates
• Automated discovery and smart provisioning
• Central point of control and administration
• Integrated job management
  − Includes job accounting functions from LSF
• Upgrade utilities

HP – Decus IT-Symposium 2006 – www.decus.de 42

XC System Software V3

• Based on a Red Hat EL 4.0 AS-compatible distribution
• Open-source utilities
  − SLURM (resource management)
  − SystemImager w/ Flamethrower (cluster install and update)
  − Nagios and Supermon (monitoring)
  − Syslog-ng (logging)
• Multiple LSF implementation choices:
  − Integrated with SLURM
  − LSF standalone, bypassing SLURM
  − Bypass the LSF install and use a 3rd-party scheduler
• SFS support, with the option to use it as a high-performance global file system in the cluster
• HP-MPI integrated
• Full support and service worldwide; extensive testing
• Available on HP Cluster Platform 3000, 4000, and 6000


HP – Decus IT-Symposium 2006 – www.decus.de 43

ISVs and XC Clusters: key applications tested and supported

• Life and material sciences: Accelrys Materials Studio, Lion Biosciences, SCM ADF, Tripos, OpenEye, plus lots of open-source code, and more coming
• CAE: Abaqus, ACUSIM Acusolve, ADINA, ANSYS, AVL Fire, CD-Adapco Star-CD, ESI CFD/ACE, PamCrash/PamFlow, Exa PowerFlow, Fluent, LSTC LS-Dyna, MSC Software Marc and Nastran, Mecalog Radioss, UGS Nastran, ...
• Grid and resource management: Altair PBSPro, Globus Toolkit, Platform LSF and LSF MultiCluster, United Devices MP Synergy

Go to the XC web site for the latest update on ISV support:
http://www.hp.com/techservers/clusters/xc_clusters.html

HP – Decus IT-Symposium 2006 – www.decus.de 44

XC also available with large SMP nodes

• HP Cluster Platforms are built with rack-optimized nodes, primarily 2P, along with the 4P rx4640
• Some customers desire a mix within a cluster to support large-memory jobs as well as typical distributed cluster applications
• XC Clusters have been deployed at multiple sites with modified HP Cluster Platform designs that incorporate nodes based on HP Integrity rx8620 servers with 16-way SMP


HP – Decus IT-Symposium 2006 – www.decus.de 45

HP Services available for XC clusters

• XC Clusters are rack-integrated at the factory
  − Includes staging and testing
• On-site customer installation by HP technicians
• Required set-up and startup service from C&I
• Standard product support and warranties
  − Software: 90-day warranty
  − Hardware warranty = underlying server nodes
  − Standard support offerings available worldwide
• Optional services include:
  − Cluster integration management
  − Cluster system quickstart
  − Cluster applications quickstart
  − Training

HP – Decus IT-Symposium 2006 – www.decus.de 46

Performance benchmarks

• XC ranks at the top of the current TAP (Top Application Performance) list maintained by Purdue: http://www.purdue.edu/TAPlist/


HP – Decus IT-Symposium 2006 – www.decus.de 47

Linux cluster programming environment

• Compilers
  − Intel Visual Fortran, Visual C++
  − Portland Group Inc.
  − PathScale
  − GNU
• Debuggers and profilers
  − Etnus TotalView
  − Intel VTune, AMD OProfile
• Libraries
  − MPI: HP-MPI, MPICH, Linda, OpenMP
  − Math libraries: HP MLIB, Intel MKL, AMD ACML

HP – Decus IT-Symposium 2006 – www.decus.de 48

New HP-MPI V2.1: the universal MPI for Linux, HP-UX, XC & Tru64

• Broadens the portfolio of applications
  − Transparent support for multiple interconnects: TCP/IP, Quadrics, InfiniBand, Myrinet
  − Enables a single executable for each OS (HP-UX, Linux, Tru64 UNIX)
  − Endorsed by major ISVs
• New functionality and performance enhancements
  − MPI-2 support
  − MPICH compatibility
  − Profiling tools
• Available on non-HP platforms through our ISV partners supporting HP-MPI


HP – Decus IT-Symposium 2006 – www.decus.de 49

HP CMU (Cluster Management Utility) has 3 main features

• The management feature: helps with day-to-day administration. You can halt, boot, reboot, or broadcast commands to a set of nodes.
• The cloning feature: helps you rapidly deploy a golden image to all the nodes of a large cluster.
• The monitoring feature: helps you see the state of your cluster at a glance. HP CMU can warn you whenever the state of a node changes.

HP – Decus IT-Symposium 2006 – www.decus.de 50

HP CMU V2.0 graphical user interface

• The root window lists the nodes in your cluster and provides a visual status of node activity.
• You can select any number of nodes and apply a command to the entire group.
• On this single window you can supervise the status of more than 1024 nodes.
• A single click on a node cell opens a telnet or console session on the node.

HP IT-Symposium 2006

www.decus.de 26

HP – Decus IT-Symposium 2006 – www.decus.de 51

HP CMU management feature

• Simply select the desired nodes and HP CMU will execute the command on all of them.
• HP CMU uses the functionality of the ECI and RILOE/iLO management cards.

HP – Decus IT-Symposium 2006 – www.decus.de 52

HP CMU console broadcasting

• Typing can be broadcast to all selected node consoles, or a single console can be accessed directly.


HP – Decus IT-Symposium 2006 – www.decus.de 53

HP CMU event handling

• Node status is probed regularly over the network. When a node's status changes (up to down, or down to up), HP CMU can optionally:
  − send mail to configured user(s)
  − display a pop-up window
  − execute a script with the node name and status as arguments.
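The polling loop described above can be sketched as follows. The ping-based probe, the interval, and the `notify.sh` action are illustrative stand-ins, not CMU's actual implementation.

```python
# Sketch of an event-handling loop: poll node status regularly and
# fire an action on each up/down transition.
import subprocess
import time

def probe(node):
    """Return True if the node answers a single ping (illustrative probe)."""
    return subprocess.call(
        ["ping", "-c", "1", "-W", "1", node],
        stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL) == 0

def watch(nodes, on_change, probe=probe, interval=30, rounds=None):
    """Call on_change(node, status) whenever a node's up/down state flips."""
    state = {}
    n = 0
    while rounds is None or n < rounds:
        for node in nodes:
            up = probe(node)
            if node in state and state[node] != up:
                on_change(node, "up" if up else "down")
            state[node] = up
        n += 1
        if rounds is None or n < rounds:
            time.sleep(interval)
    return state

# Example action, analogous to CMU's "execute a script" option
# (notify.sh is a hypothetical script name):
# watch(["n001", "n002"], lambda node, s: subprocess.call(["notify.sh", node, s]))
```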

HP – Decus IT-Symposium 2006 – www.decus.de 54

HP CMU cluster configuration tools

• The scan-node tool automatically registers the nodes, with their network parameters, in the HP CMU database.
• The whole cluster configuration is handled in a single file and can be exported and imported for backup or replication.


HP – Decus IT-Symposium 2006 – www.decus.de 55

CMU V3.0 backup

HP – Decus IT-Symposium 2006 – www.decus.de 56

HP CMU cloning mechanism (phase 1)


HP – Decus IT-Symposium 2006 – www.decus.de 57

HP CMU cloning mechanism (phase 2)

HP – Decus IT-Symposium 2006 – www.decus.de 58

HP CMU monitoring interface: cluster view

Screenshot showing a group summary, per-node CPU usage, node state, and a raised alert.


HP – Decus IT-Symposium 2006 – www.decus.de 59

HP CMU monitoring design goals

• Highly scalable: the nodes are divided into network entities (nodes on the same switch) and report their monitoring results to a secondary server within their network entity. Each secondary server consolidates the data and sends it to the management node.
• Highly customizable: the parameters to monitor are stored in a text file, with an independent script associated with each parameter. Users can easily choose the parameters they want, and can also define their own actions (e.g., CPU consumption of their own application).
• Highly adaptable: the monitoring daemons are totally independent from the GUI, so the results of cluster monitoring can be used by any other application if the appropriate plug-in is written.
• Highly reliable: whenever a monitoring daemon stops sending data, it is automatically respawned by its master.

Agenda

1. Architectures and HPC Cluster Building Blocks
2. Cluster Management Software
3. HP's Cluster Filesystem: SFS


HP – Decus IT-Symposium 2006 – www.decus.de 61

HP HPC solution = cluster of clusters ...

Diagram: many compute nodes, login nodes and admin nodes joined by a high-speed interconnect (GbE, InfiniBand, Myrinet or Quadrics), plus:
• GigE Ethernet for boot and system-control traffic, with connectivity to all nodes
• 10/100 Ethernet for out-of-band management (power on/off, etc.), with connectivity to all nodes
• Campus network access via the login nodes
• An HP SFS cluster (Lustre) with an admin node, an MDS node and multiple OSS nodes serving the compute nodes

HP – Decus IT-Symposium 2006 – www.decus.de 62

What is HP StorageWorks Scalable File Share (HP SFS)?

• Lustre™ file system
• NFS
• 1000+ Linux clients
• 64 servers
• 512 TB - 1 PB of data
• Best cost-performance in the industry


HP – Decus IT-Symposium 2006 – www.decus.de 63

Lustre: scalable high-performance filesystem

• New architecture, with the benefit of hindsight
• Open-source technology
• Developed by Cluster File Systems
  − HP continuing involvement through the DoD PathForward program (Hendrix): CMD, security
• Separate and scalable metadata (MDT)
• Object-based storage (OST)
• Highly efficient, network-independent layer
• Near-linear scaling for clients and servers
• POSIX compliant

HP – Decus IT-Symposium 2006 – www.decus.de 64

Lustre components

• Clients perform system & parallel file I/O and file locking against the Object Storage Targets, and file creation, file status, directory, metadata & concurrency operations against the Metadata Servers.
• Metadata Servers and Object Storage Targets cooperate for recovery.
• An LDAP server holds configuration information, network connection details and security management.


HP – Decus IT-Symposium 2006 – www.decus.de 65

Lustre Key Features
• Scalable performance (100s of OSTs, 1000s of clients)
• Separation of metadata handling from I/O processing
− open() and close() go to the MDT; read() and write() go directly to the OSTs
• Striping of file data across multiple OSTs
− Example: a 12 MB file striped across 4 OSTs; stripes 0, 4, 8 land on OST1, stripes 1, 5, 9 on OST2, stripes 2, 6, 10 on OST3, stripes 3, 7, 11 on OST4
• Failover
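The round-robin stripe layout above can be sketched in a few lines. This is an illustrative model only: the 1 MB stripe size and 4-OST stripe count are assumed parameters for the slide's example, not HP SFS defaults.

```python
# Minimal sketch of Lustre-style round-robin striping (illustrative only;
# stripe size and OST count are assumptions matching the slide's example).

STRIPE_SIZE = 1 * 1024 * 1024  # assume 1 MB stripes
NUM_OSTS = 4                   # file striped across 4 OSTs, as on the slide

def stripe_location(offset):
    """Map a file byte offset to (OST index, offset within that OST's object)."""
    stripe_no = offset // STRIPE_SIZE        # global stripe number
    ost = stripe_no % NUM_OSTS               # round-robin over the OSTs
    local_stripe = stripe_no // NUM_OSTS     # stripe index on that OST
    local_offset = local_stripe * STRIPE_SIZE + offset % STRIPE_SIZE
    return ost, local_offset

# The 12 stripes of a 12 MB file land as on the slide:
# OST index 0 gets stripes 0, 4, 8; index 1 gets 1, 5, 9; and so on.
layout = {}
for stripe in range(12):
    ost, _ = stripe_location(stripe * STRIPE_SIZE)
    layout.setdefault(ost, []).append(stripe)
print(layout)  # {0: [0, 4, 8], 1: [1, 5, 9], 2: [2, 6, 10], 3: [3, 7, 11]}
```

Because each client can compute this mapping itself after a single open() against the MDT, reads and writes go straight to the OSTs in parallel, which is the source of the near-linear scaling claimed on the previous slide.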

HP – Decus IT-Symposium 2006 – www.decus.de 66

HP SFS – converting technology to product
• HP SFS = “Scalable File Share”
− Product offering that provides a reliable global file system for Linux clusters
− Lustre open-source technology + HP management tools + hardware
− Balanced system with optimised performance
• Integration, qualification and support
− One-stop shop for a complete Lustre solution
− Global support by HP Services
• System management
− System administration
− Additional features to deliver a “whole product”
[Diagram: HP SFS = HP servers + HP storage + interconnect + HP sysman tools + HP qualification + HP support]


HP – Decus IT-Symposium 2006 – www.decus.de 67

HP SFS Hardware Architecture
• HP SFS system
− Servers grouped into cells
− 1-32 cells per system
− Each cell can perform a number of roles
• HP ProLiant DL380 server
− High-bandwidth I/O
− Paired servers provide redundancy
• HP SFS20 storage enclosure
− Developed for HP SFS
− Dual-attached for redundancy
− 12 SATA disks (250 GB)
− 2 TB storage per enclosure
− 2-8 enclosures per cell
[Diagram: Linux clients connected via the interconnect to multiple SFS cells]
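The paired-server redundancy can be modelled in a toy sketch: because every SFS20 enclosure is dual-attached, the surviving server of a pair can take over its partner's enclosures. The server and enclosure names are hypothetical, and real HP SFS failover is handled by the product's management software, not by code like this.

```python
# Toy model of paired-server redundancy with dual-attached enclosures
# (illustrative; names are hypothetical, not HP SFS conventions).

# One SFS cell: a server pair, each normally serving half of the cell's
# dual-attached SFS20 enclosures.
cell = {
    "oss1": ["sfs20-a", "sfs20-b"],
    "oss2": ["sfs20-c", "sfs20-d"],
}

def fail_over(cell, failed):
    """On server failure, the partner takes over the failed server's enclosures."""
    surviving = [s for s in cell if s != failed]
    assert len(surviving) == 1, "model assumes a two-server cell"
    partner = surviving[0]
    # Dual attachment means the partner already has a physical path
    # to every enclosure, so takeover is just a reassignment.
    return {partner: cell[partner] + cell[failed]}

after = fail_over(cell, "oss1")
print(after)  # {'oss2': ['sfs20-c', 'sfs20-d', 'sfs20-a', 'sfs20-b']}
```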

HP – Decus IT-Symposium 2006 – www.decus.de 68

X-Large HP SFS/SFS20 Configuration
240 TB usable, 11 GB/s read, 9.2 GB/s write
• Base cabinet: 2 MDSes, 2 OSSes, switches, console
− 16 TB usable, 0.7 GB/s read, 0.6 GB/s write (on ELAN4)
• Expansion cabinets: 4 OSSes with 4 SFS20s per OSS
− Per cabinet: 32 TB usable, 1.5 GB/s read, 1.2 GB/s write (ELAN4)
• Total depicted: 2 MDSes and 30 OSSes (on ELAN4)
[Diagram: base cabinet plus expansion cabinets]
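The headline numbers follow from the per-cabinet figures. A quick sanity check (the expansion-cabinet count of 7 is inferred from the 30-OSS total, not stated explicitly on the slide):

```python
# Sanity check of the X-Large configuration totals quoted above
# (a sketch; the cabinet count is inferred from the slide, not stated).

base_oss, exp_oss_per_cab = 2, 4
total_oss = 30
expansion_cabs = (total_oss - base_oss) // exp_oss_per_cab  # -> 7 cabinets

usable_tb = 16 + expansion_cabs * 32           # base 16 TB + 32 TB per expansion cab
read_gbs = 0.7 + expansion_cabs * 1.5          # base + per-cabinet read bandwidth
write_gbs = 0.6 + expansion_cabs * 1.2         # base + per-cabinet write bandwidth

print(expansion_cabs, usable_tb, round(read_gbs, 1), round(write_gbs, 1))
# -> 7 cabinets, 240 TB usable, ~11.2 GB/s read, ~9.0 GB/s write
```

This matches the slide's 240 TB and 11 GB/s read; the computed ~9.0 GB/s write is slightly below the quoted 9.2 GB/s, so the slide's write figure presumably comes from measurement rather than simple addition.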


HP – Decus IT-Symposium 2006 – www.decus.de 69

Storage – performance and availability
• SFS20
− Highly competitive cost/performance
− 2 TB per active SFS20, 8 SFS20s per OSS pair
− Dual host-connected for failover
− RAID5; RAID6 (ADG); optionally mirrored
• EVA3000 (EVA4000)
− Enterprise-class storage
− Dual controller; dual fabric; virtual RAID5

HP – Decus IT-Symposium 2006 – www.decus.de 70

SFS20 Product Overview
• StorageWorks Modular Smart Array 20 (MSA20)
− Low-cost, high-capacity external storage, ideal for low-I/O workloads such as reference data, archival, and disk-to-disk backup
− 2U, Serial-ATA to U320 SCSI, external storage array


HP – Decus IT-Symposium 2006 – www.decus.de 71

HP SFS V2.1 – current, since Jan 2006
• Dual GigE support
− Double bandwidth
• SFS20
− 500 GB disk support
• DL360 G4p server support
• Misc updates
− Interconnect versions
− Clients
• Failover on interconnect failure
• Improved monitoring

HP – Decus IT-Symposium 2006 – www.decus.de 72

HP SFS V2.2 – Q2 2006
• EVA4000
− Approx. double the performance of the EVA3000
− FC dual path
• Lustre 1.4.6+
− Quotas
− ACLs
− Improved networking configuration
• Enhanced systems management
− Insight Manager integration
• Server software
− RHEL4 / rh2.6


HP – Decus IT-Symposium 2006 – www.decus.de 73

HP SFS performance
[Chart: HP SFS read + write throughput for 1 and 2 OSSes, each with 4 SFS20s; aggregate bandwidth in MB/s (0-1200) vs. number of clients (1-18), with curves for aggregate read and aggregate write at 1 and 2 OSSes.]

HP – Decus IT-Symposium 2006 – www.decus.de 74

http://www.mscsoftware.com/support/prod_support/nastran/performance/v0109_sngl.cfm

Elapsed/CPU times for XXCMD benchmark :


HP – Decus IT-Symposium 2006 – www.decus.de 75

MSC.Nastran V2001.0.9 Serial Test Results

From MSC’s webpage:

Name    Ndof       Description        SOL  MEM      SCR Disk Used  Total I/O  Comments
XXCMDA  1,584,622  Car Body           103  400 Mb   31 Gb          730 Gb     XXCMD with ACMS
XXCMD   1,584,622  Car Body           103  800 Mb   43 Gb          2400 Gb    1073 Roots
XLTDF   529,027    Car Body           108  450 Mb   5 Gb           209 Gb     32 Frequency Increments
XXAFST  2,490,516  Propeller Housing  101  400 Mb   10 Gb          77 Gb
XLOOP   486,573    Car Body           200  1700 Mb  26 Gb          1500 Gb    3 Design cycles, 500+ roots
XLEMF   654,560    Car Body           111  400 Mb   11 Gb          328 Gb     Acoustics, 448 + 34 Roots
LGQDF   31,125     Cube w/ interior   108  100 Mb   0.6 Gb         700 Gb     76 Frequency Increments

HP – Decus IT-Symposium 2006 – www.decus.de 76

Preliminary application I/O measurements for SFS
[Chart: MSC.Nastran XXCMD frequency run (medium I/O), SFS vs. MSA (DAS); elapsed time in seconds (0-25000) vs. number of hosts (1, 2, 4, 8, 16, 32), with curves for "sfs 1 per host", "sfs 2 per host", "msa 1 per host", "msa 2 per host".]
Data from: Mark Kelly, HP Richardson


HP – Decus IT-Symposium 2006 – www.decus.de 77

HP SFS Excellent Scaling
[Chart: write scaling for 1-48 OSTs; aggregate write bandwidth in MB/s (0-5000) vs. number of OSTs, with a linear fit ("write scaling from zero") showing near-linear scaling.]