
HPCx Annual Seminar 2006

Cray XT3 for Science

David Tanqueray, Cray UK Limited

[email protected]

Page 2 - HPCx Presentation 4th October 2006

Topics
• Cray Introduction
• The Cray XT3
• Cray Roadmap
• Some XT3 Applications

Page 3 - HPCx Presentation 4th October 2006

Supercomputing is all we do

Sustained Gigaflop
• Achieved in 1988 on a Cray Y-MP
• Static finite element analysis
• 1 Gigaflop/sec on a Cray Y-MP with 8 processors
• 1988 Gordon Bell prize winner

Phong Vu, Cray Research; Horst Simon, NASA Ames; Cleve Ashcraft, Yale University; Roger Grimes, Boeing Computer Services; John Lewis, Boeing Computer Services; Barry Peyton, Oak Ridge Nat. Laboratory

Sustained Teraflop
• Achieved in 1998 on a T3E
• LSMS: Locally self-consistent multiple scattering method
• Metallic magnetism simulation for understanding thin film disc drive read heads, and magnets used in motors and power generation
• 1.02 Teraflops on Cray T3E-1200E with 1480 processors
• 1998 Gordon Bell prize winner

B. Ujfalussy, Xindong Wang, Xiaoguang Zhang, D. M. C. Nicholson, W. A. Shelton, and G. M. Stocks; Oak Ridge National Laboratory. A. Canning; NERSC, Lawrence Berkeley National Laboratory. Yang Wang; Pittsburgh Supercomputing Center. B. L. Gyorffy; H. H. Wills Physics Laboratory, University of Bristol.

Sustained Petaflop
• Goal by 2010
• Cascade programme
• 1st Petaflop order for Cray from ORNL

Page 4 - HPCx Presentation 4th October 2006

Cray Scientific Customers

Sandia National Laboratories - Red Storm System
• 41.5 TFLOP peak performance
• 140 cabinets
• 11,648 AMD Opteron™ processors
• 10 TB DDR memory
• 240 TB of disk storage
• Approximately 3,000 ft²

Oak Ridge National Laboratory - National Leadership Computing Facility
• Cray-ORNL selected by DOE for the National Leadership Computing Facility (NLCF)
• Goal: build the most powerful supercomputer in the world
• 250-teraflop capability by 2007
• 50-100 TFLOP sustained performance on challenging scientific applications
• Cray X1/X1E and Cray XT3

Pittsburgh Supercomputing Center
• Cray XT3 named “Big Ben”
• Peak performance of 10 TeraFlops
• 2,000 AMD Opteron processors
• Applications: protein simulations, storm forecasting, global climate modeling, earthquake simulations

Page 5 - HPCx Presentation 4th October 2006

Cray Scientific Customers

Swiss National Supercomputing Centre (CSCS)
• First Cray XT3 in Europe
• “Horizon” is a joint initiative with the Paul Scherrer Institut (PSI)
• Expanded to 18 cabinets, 8.6 TF, in August 2006
• Highly used: 90+ % node utilization, 3x oversubscription, typical jobs using 64-256 cpus
• Applications: material science, environmental science, life sciences, astronomy, chemistry

AWE
• Cray XT3 system, dual core, 40 TeraFlops
• Applications: weapons physics, material science, engineering

Page 6 - HPCx Presentation 4th October 2006

Topics
• Cray Introduction
• The Cray XT3
• Cray Roadmap
• Some XT3 Applications

Page 7 - HPCx Presentation 4th October 2006

The Cray XT3 Supercomputer
• MPP Architecture: scales from 256 to 60,000 processing cores
• Purpose-built Interconnect: 6.4 GB/s per processor socket delivers scalable performance
• Scalable Application Performance: Light Weight Kernel enables scalability and fine-grain synchronization; MPP job management and scheduling
• Scalable I/O: up to 100 GB/s; private and shared global parallel file system
• Reliability at Scale: 400 hours MTBF at 1,000 processors; one moving part, redundancy model; built-in RAS for system management (a worked illustration of the MTBF figure follows this list)
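As a rough illustration of what the quoted reliability figure implies, assuming node failures are independent and exponentially distributed (an assumption; the slide does not state a failure model), the system MTBF scales inversely with node count, so the implied per-node MTBF is:

\[
\mathrm{MTBF}_{\mathrm{node}} \approx N \times \mathrm{MTBF}_{\mathrm{system}}
= 1{,}000 \times 400\,\mathrm{h} = 4\times 10^{5}\,\mathrm{h} \approx 46\ \mathrm{years\ per\ processor}.
\]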

Scalable by Design: every aspect of the system supports applications that use hundreds or thousands of processors simultaneously on the same problem.


Cray XT3

Page 8 - HPCx Presentation 4th October 2006

Cray XT3 Processing Element: Measured Performance

[Diagram of the processing element with measured figures: six network links, each >3 GB/s in each direction (7.6 GB/s peak for each link); 5.7 GB/s sustained (1.1 bytes/flop); 6.5 GB/s sustained; 2.17 GB/s sustained (0.42 bytes/flop); 51.6 ns latency measured.]
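For context, the two bytes/flop ratios on the diagram are mutually consistent: dividing each sustained bandwidth by its quoted ratio gives the same per-socket peak of roughly 5.2 GF/s, which would correspond to a 2.6 GHz single-core Opteron issuing two floating-point results per clock (the clock speed is an assumption, as it is not stated on the slide):

\[
\frac{5.7\ \mathrm{GB/s}}{1.1\ \mathrm{B/flop}} \approx 5.2\ \mathrm{GF/s},
\qquad
\frac{2.17\ \mathrm{GB/s}}{0.42\ \mathrm{B/flop}} \approx 5.2\ \mathrm{GF/s}.
\]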

Page 9 - HPCx Presentation 4th October 2006

Scalable Software Architecture: UNICOS/lc

• Microkernel on compute PEs, full-featured Linux on service PEs
• Contiguous memory layout used on compute processors to streamline communications
• Service PEs specialize by function (specialized Linux nodes)
• Software architecture eliminates OS “jitter” and enables reproducible run times
• Large machines boot in under 30 minutes, including the filesystem
• Job launch time is a couple of seconds on 1000s of PEs

[Diagram: compute partition (compute PEs) and service partition (login, network, system and I/O PEs).]

Page 10 - HPCx Presentation 4th October 2006

Programming Environment

• The Portland Group (PGI) compilers (unmodified from the Linux version)
• High-performance MPI library
• SHMEM library
• AMD math libraries (ACML 2.6)
• CrayPat & Apprentice2 performance tools
• Etnus TotalView debugger available
• Static binaries only
• UPC support coming
• GNU compilers are also supported, and support for PathScale is being added
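To make the programming model concrete, here is a minimal sketch of the kind of MPI program these tools target; it uses only standard MPI calls (no Cray-specific API is assumed) and would be built with the PGI or GNU compilers and run across the compute PEs:

```c
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    int rank, size;

    /* Start MPI and find out which PE we are and how many PEs the job has. */
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    /* Each compute PE reports in; on the XT3 this code runs under the
       lightweight kernel on the compute partition. */
    printf("Hello from PE %d of %d\n", rank, size);

    MPI_Finalize();
    return 0;
}
```

Since the slide notes that the XT3 supports static binaries only, the resulting executable would be linked statically before being launched on the compute partition.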

Page 11 - HPCx Presentation 4th October 2006

Cray Apprentice2

Call Graph Profile

Time Line View

Communication Activity View

Communication Overview

Pair-wise Communication View

Page 12 - HPCx Presentation 4th October 2006

Topics
• Cray Introduction
• The Cray XT3
• Cray Roadmap
• Some XT3 Applications

Page 13 - HPCx Presentation 4th October 2006

Cray Roadmap

Page 14 - HPCx Presentation 4th October 2006

“Hood” Program

• Next-generation MPP compute blade: product evolution for both Cray XT3 and Cray XD1 customers
• Processor: next-generation AMD processor socket; dual-core with multi-core upgrade later
• Memory: DDR2-667
• Interface: HyperTransport 1.0
• Infrastructure: Rainier
• Interconnect: next-generation SeaStar ASIC
• Packaging/cooling: air cooled, as XT3 (96 sockets per cabinet)

Page 15 - HPCx Presentation 4th October 2006

Rainier Infrastructure

Cray’s next-generation products will rely on a common “Rainier” infrastructure:

• Opteron-based service & I/O (SIO) blades
• SeaStar network
• Single local file system
• Single point of login
• Single point of administration

Delivered with one or more types of compute resources

• Hood compute blades (scalar)
• BlackWidow compute cabinets (vector)
• Eldorado compute blades (multi-threading)

Provides multiple architectures to users in a single system

• Single administrative and user environment
• Common infrastructure means more budget can go towards compute resources

A major step on the road to adaptive supercomputing

The Rainier infrastructure will allow customers to “mix-and-match” compute resources.

Page 18 - HPCx Presentation 4th October 2006

DARPA - Cascade Project (Advanced Research Program)
• Goal of a “trans-petaflops system”: robust, easier to program, more broadly applicable

Phase I
• Started in June 2002 for one year
• Five total vendors
• University partners

Phase II
• Awarded in June 2003
• $49.9M to Cray
• Two other vendors
• Three-year contracts

Phase III
• Proposal submissions in May 2006
• Award to one or two vendors

“The cycle time from when engineers have an idea to when they have a program ready to run is one of the bottlenecks in high-end computers, and it will only become worse as we develop bigger and bigger machines.”
- Robert Graybill, HPCS Program

Page 19 - HPCx Presentation 4th October 2006

Topics
• Cray Introduction
• The Cray XT3
• Cray Roadmap
• Some XT3 Applications

Page 20 - HPCx Presentation 4th October 2006

Cray XT3 for Science

Material Science
• Atomic modelling: LSMS at Pittsburgh SC
• Nanoparticle research: behaviour of FePt nanoparticles at ORNL

Fusion
• Plasma simulation: behaviour of plasma in a tokamak

Earth Science
• Understanding earthquakes: earthquake simulation at PSC
• Global climate modelling: ECHAM at Max Planck

[Image: significant earthquakes]

Page 21 - HPCx Presentation 4th October 2006

Atomic Modeling Code Performance on Cray XT3

[Chart: LSMS performance on the Cray XT3 "bigben" at PSC - total performance (GFLOPS) versus number of atoms (nodes), comparing GFLOPS achieved for the run with perfect scaling and estimated performance.]

8.09 TFLOPS on 2068 nodes

8.03 TFLOPS on 2048 nodes

As processors are added to the problem, efficiency stays high.
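The two quoted data points bear this out: the per-node rate is essentially unchanged as the problem grows, which is what high parallel efficiency means here:

\[
\frac{8{,}030\ \mathrm{GFLOPS}}{2{,}048\ \mathrm{nodes}} \approx 3.92\ \mathrm{GFLOPS/node},
\qquad
\frac{8{,}090\ \mathrm{GFLOPS}}{2{,}068\ \mathrm{nodes}} \approx 3.91\ \mathrm{GFLOPS/node}.
\]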

Page 22 - HPCx Presentation 4th October 2006

Nanoparticle Research for Next Generation Storage using a Cray XT3 at Oak Ridge National Laboratory

The Science Challenge
• Potential to develop revolutionary new magnetic storage media
• Need a realistic thermodynamic description of the magnetic behavior of FePt nanoparticles

Computational Challenge
• The ability to rapidly perform the magnetic energy evaluations for nanoparticles containing thousands of atoms
• Requires linear-scaling codes, efficient numerical methods, and fast communication between processors

HPC Solution
• Cray XT3, 25 TFlops
• Nanoparticles contain thousands of atoms

Sets World Record – first time a FePt particle with 2662 atoms has been simulated

Page 23 - HPCx Presentation 4th October 2006

Largest-ever AORSA Simulation: 3072 processors of the NCCS Cray XT3

The Science Challenge
• Simulate the behavior of plasma in a tokamak, the core of the multinational fusion reactor ITER

Computational Challenge
• The code, AORSA, solves Maxwell's equations – describing the behavior of electric and magnetic fields and their interaction with matter – for hot plasma in tokamak geometry (a sketch of the time-harmonic form is given after the chart below)

HPC Solution
• In August 2005, Oak Ridge National Laboratory researcher Fred Jaeger performed the largest AORSA run to date
• Utilized 3072 processors, roughly 60% of the entire Cray XT3

[Chart: AORSA on the Cray XT3 “Jaguar” system compared with Seaborg, an IBM Power3; the columns represent execution phases of the code. Aggregate is the total wall time, with Jaguar showing more than a factor of 3 improvement over Seaborg.]
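For reference, and as a sketch only (the full AORSA formulation is not reproduced on the slide), the time-harmonic Maxwell curl equations that a frequency-domain wave solver of this kind starts from, assuming fields varying as e^{-iωt}, combine into a wave equation for the electric field driven by the current density (antenna plus plasma response):

\[
\nabla \times \mathbf{E} = i\omega \mathbf{B}, \qquad
\nabla \times \mathbf{B} = \mu_0 \mathbf{J} - \frac{i\omega}{c^2}\mathbf{E}
\;\;\Longrightarrow\;\;
\nabla \times \nabla \times \mathbf{E} - \frac{\omega^2}{c^2}\mathbf{E} = i\omega\mu_0 \mathbf{J}.
\]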

Page 26 - HPCx Presentation 4th October 2006

Understanding Earthquakes using “Big Ben”, the Cray XT3 at the Pittsburgh Supercomputing Center

Significant Earthquakes

Modeling of Wave Propagation

The Science Challenge
• Simulate a magnitude 7.7 earthquake in southern California, centered on a 230 km portion of the San Andreas fault
• Predict the worst-hit locations and inform changes to building codes
• Take into account multiple different soil layers and complex ground motion

Computational Challenge: a new simulation
• Models the impact on a larger area
• Higher vibration frequencies – where most damage is expected

HPC Solution
• “Big Ben” – Cray XT3 with 2090 AMD Opteron processors

Accurately forecast ground motion – ultimately saving lives

Page 27 - HPCx Presentation 4th October 2006

WRF: Hurricane Katrina

• 48-hour WRF forecast run of Hurricane Katrina

• Domain size: 480 x 400 x 31

• 4 km grid resolution

• Integration timestep: 20 seconds

• Output: once an hour

• Runtime: 49 minutes on 240 Cray XT3 processors

• Image is water vapor content 2 km above the ground
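From the figures on this slide, the run advances 8,640 timesteps over roughly six million grid points and completes about 59 times faster than real time:

\[
\frac{48 \times 3600\ \mathrm{s}}{20\ \mathrm{s/step}} = 8640\ \mathrm{steps},\qquad
480 \times 400 \times 31 \approx 5.95\times 10^{6}\ \mathrm{grid\ points},\qquad
\frac{48\ \mathrm{h}}{49\ \mathrm{min}} \approx 59.
\]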

Page 29 - HPCx Presentation 4th October 2006

The Science Challenge
• Increase our ability to understand, detect and eventually predict the human influence on climate
• Most scientists agree that global warming is influenced by human activity

Computational Challenge
• Advance the global climate modeling code to 50 km grid spacing and 60 vertical levels

HPC Solution
• A Cray XT3 with thousands of processors ran ECHAM5 at a record speed of 1.4 Tflops
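As a rough back-of-the-envelope check on the problem size (using the Earth's surface area of about 5.1 x 10^8 km², which is not a figure from the slide), a 50 km grid with 60 vertical levels works out to roughly twelve million grid cells:

\[
\frac{5.1\times 10^{8}\ \mathrm{km}^2}{(50\ \mathrm{km})^2} \approx 2.0\times 10^{5}\ \mathrm{columns},
\qquad
2.0\times 10^{5} \times 60 \approx 1.2\times 10^{7}\ \mathrm{grid\ cells}.
\]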

Cray XT3 Enabled the Fastest-Ever ECHAM5 Performance


Climate Modeling using the Cray XT3: Performance Evaluation at Max Planck Institute for Meteorology

“MPI-M estimates that a Cray XT3 would make it possible to complete their next-generation IPCC assessment runs in about the same real time as today, despite requiring 120 times more computation. This advance promises to significantly improve the scale and scope of the analysis researchers will be able to submit for the next assessment report of the IPCC.”
- Dr. Annette Kirk, Max Planck Institute for Meteorology
http://idw-online.de/pages/de/news136278

Page 30 - HPCx Presentation 4th October 2006

[Chart: ECHAM5 T255L60 performance on the XT3 - time per timestep (seconds, 0.0-0.7) and sustained GFLOPS (0-1400) versus CPU count (384, 768, 1536, 2048, 3072).]

Page 31 - HPCx Presentation 4th October 2006

The Shape of Things to Come

Final Slide

What Do You Need To Know?