
Page 1:

NERSC User Group Meeting

Future Technology Assessment

Horst D. Simon, NERSC Division Director

February 23, 2001

Page 2: DOE Science Computational Requirements … always outpace available resources

FY2001 Request        13,645,300
FY2001 Awards          7,532,200
FY2002 "Requests"     20,448,000
FY2003 "Requests"     28,358,000

(MPP resources only. The FY02 and FY03 figures are estimates for NERSC planning only.)
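For context, a minimal arithmetic sketch (not part of the slide) of the gap implied by the figures above; the unit is whatever MPP allocation unit the slide reports.

```python
# A minimal sketch (not from the slide) of the request/award gap implied by the
# figures above; the unit is whatever MPP allocation unit the slide reports.
requests = {"FY2001": 13_645_300, "FY2002": 20_448_000, "FY2003": 28_358_000}
awards_fy2001 = 7_532_200

print(f"FY2001: {awards_fy2001 / requests['FY2001']:.0%} of requested MPP resources awarded")
for fy, req in requests.items():
    print(f"{fy}: requests are {req / awards_fy2001:.1f}x the FY2001 award level")
```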

Page 3: Traditional NERSC Computational Strategy

• Traditional strategy within existing NERSC Program funding: acquire new computational capability every three years
  - 3 to 4 times capability increase over existing systems

• Early, commercial, balanced systems with focus on
  - stable programming environment
  - mature system management tools
  - good sustained-to-peak performance ratio

• Total value of $25M - $30M
  - About $9-10M/yr. using lease-to-own

• Have two generations in service at a time

- e.g. T3E and IBM SP

• Phased introduction

Page 4: Re-evaluation of the Strategy

In order to evaluate our strategy for NERSC-4 and beyond, we employ:

• Trend analysis: determine target ranges for performance of future systems which assure the high-end capability for the Office of Science

• Technology analysis: understand the different technology options to get into the target range

• Constraint analysis: understand what is feasible (space, power, budget)

Page 5: NERSC Peak Performance History

Page 6: TOP 500 - Performance Development

[Chart: TOP500 performance development, performance in GFlop/s (log scale, 0.1 to 100,000) for the N=1, N=10, N=100, and N=500 entries and the list sum, with NERSC-1, NERSC-2, and NERSC-3 Phase 1 marked.]

Page 7: TOP500 Performance Development

[Chart: TOP500 performance development, performance in GFlop/s (log scale, 0.1 to 100,000) for the N=1, N=10, N=100, and N=500 entries and the list sum.]

Page 8: Performance Development

[Chart: TOP500 performance development and extrapolation, 100 MFlop/s to 1 PFlop/s, showing the N=1, N=10, N=100, N=500, and Sum trend lines, the ASCI and Earth Simulator milestones, and the NERSC-1 through NERSC-4 peak-performance points ("Peak!").]

Page 9: Trend Analysis

• In order to maintain its flagship role, new NERSC capability systems should be in the TOP10 at installation time

• Aggregate system performance growth has been accelerated by Moore's Law plus increased parallelism: expect a factor of 5-6 every three years

• NERSC-4 in late 2003 should have at least 20-30 Tflops LINPACK performance

• NERSC-5 in late 2006 should have at least 100-180 Tflops LINPACK performance
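The NERSC-5 range follows from compounding the assumed growth onto the NERSC-4 target; a minimal sketch using only the figures on this slide:

```python
# Sketch of the trend extrapolation described above: aggregate TOP500
# performance grows by roughly a factor of 5-6 every three years
# (Moore's Law plus increased parallelism).
nersc4_linpack_tflops = (20, 30)   # late-2003 target range from this slide
growth_per_cycle = (5, 6)          # assumed growth factor per 3-year cycle

low = nersc4_linpack_tflops[0] * growth_per_cycle[0]
high = nersc4_linpack_tflops[1] * growth_per_cycle[1]
print(f"NERSC-5 (late 2006) target: {low}-{high} Tflop/s LINPACK")
# -> 100-180 Tflop/s, matching the NERSC-5 range quoted above
```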

Page 10: Extrapolation to the Next Decade

[Chart: extrapolation of TOP500 performance development to the next decade, performance in GFlop/s (log scale, 0.1 to 1,000,000), with the 1 TFlop/s and 1 PFlop/s levels marked and ASCI, Earth Simulator, and Blue Gene shown against the N=1, N=10, N=500, and Sum trend lines.]

Page 11: 2000 - 2005: Technology Options

• Clusters
  — SMP nodes, with custom interconnect
  — PCs, with commodity interconnect
  — vector nodes (in Japan)

• Custom-built supercomputers
  — Cray SV-2
  — IBM Blue Gene

• Other technology to influence HPC
  — IRAM/PIM
  — low-power processors (Transmeta)
  — consumer electronics (PlayStation 2)
  — Internet computing

(Slide annotations: "not yet mature for NERSC-4"; "not general purpose")

Page 12: Cray SV2 Overview

— Basic building block is a 50/100 GFLOPS node
— 4 CPUs per node, IEEE floating point; design goal is 12.8 GFLOPS per CPU
— 8, 16, or 32 GB of coherent flat shared memory per CPU
— SSI to 1024 nodes: 50/100 TFLOPS, 32 TB
— 100 GB/sec interconnect capacity to/from each node
— ~1 microsecond latency anywhere in hypercube topology
— Targeted date of introduction: mid-2002
— LC cabinets; integral HEU (heat exchange unit)
— Up to 64 cabinets (4096 CPUs / 50 TFLOPS) in a mesh topology
— Availability 4Q2002
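A quick back-of-the-envelope check (a sketch using only the numbers quoted on this slide) of how the node and system peaks follow from the per-CPU design goal:

```python
# Back-of-the-envelope check of the SV2 peak figures quoted above
# (a sketch using only numbers from this slide).
gflops_per_cpu = 12.8              # design goal per CPU
cpus_per_node = 4

node_gflops = cpus_per_node * gflops_per_cpu
print(f"Node peak: {node_gflops:.1f} Gflop/s (the '50 GFLOPS' node figure)")

ssi_nodes = 1024                   # single-system-image limit
print(f"1024-node SSI peak: {ssi_nodes * node_gflops / 1000:.1f} Tflop/s")

max_cpus = 64 * 64                 # up to 64 cabinets of 64 CPUs each
print(f"64-cabinet peak: {max_cpus * gflops_per_cpu / 1000:.1f} Tflop/s (~'50 TFLOPS')")
```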

Page 13:

[Figure: Liquid-cooled cabinet (64 CPUs), showing the incoming power box, air coil, FC-72 filters and gear pumps, router modules, node modules, power supplies, heat exchanger, and I/O cables. Cray Scalable Systems Update - copyright Cray Inc., used by permission.]

Page 14: 2000 - 2005: Technology Options

• Clusters
  — SMP nodes, with custom interconnect
  — PCs, with commodity interconnect
  — vector nodes (in Japan)

• Custom-built supercomputers
  — Cray SV-2
  — IBM Blue Gene

• Other technology to influence HPC
  — IRAM/PIM
  — low-power processors (Transmeta)
  — consumer electronics (PlayStation 2)
  — Internet computing

(Slide annotations: "not yet mature for NERSC-4"; "not general purpose, high risk")

Page 15: Global Earth Simulator

• 30 Tflop/s system in Japan
• completion 2002
• driven by climate and earthquake simulation requirements
• built by NEC
• CMOS vector nodes

Page 16: Earth Simulator

Page 17: Global Earth Simulator Building

Page 18: Japanese Vector Platforms

In the 2002 – 2005 time frame these platforms do not offer any advantage compared to SMP clusters built by American commercial vendors:
  — Distributed memory requires message passing
  — Three levels of memory hierarchy require more complicated trade-offs for performance
  — Similar space and power requirements

By 2003-04 a shared-memory vector supercomputer will no longer be a capability platform. NERSC could pursue this as a capacity platform in addition to NERSC-4 (at the expense of a smaller capability).

Page 19: 2000 - 2005: Technology Options

• Clusters
  — SMP nodes, with custom interconnect
  — PCs, with commodity interconnect
  — vector nodes (in Japan)

• Custom-built supercomputers
  — Cray SV-2
  — IBM Blue Gene

• Other technology to influence HPC
  — IRAM/PIM
  — low-power processors (Transmeta)
  — consumer electronics (PlayStation 2)
  — Internet computing

(Slide annotations: "not yet mature for NERSC-4"; "not general purpose, high risk"; "no technical advantage")

Page 20: Cluster of SMP Approach

[Figure: IBM cluster-of-SMP roadmap diagram (recoverable labels include "Processor", "Building", and "Blue Gene").]

Page 21: 10 - 100 Tflop/s Cluster of SMPs

• Relatively low risk
  — Systems are extensions of current product lines to the high end
  — Several commercially viable vendors in the US
  — Experience at NERSC
  — Leverage from ASCI investment

• The first ones are already on order
  — LLNL is installing a 10 Tflop/s system now
  — LANL just ordered a 30 Tflop/s Compaq system

Page 22: 100 - 1000 Tflop/s Cluster of SMPs (IBM Roadmap)

[Figure: IBM roadmap diagram (recoverable labels include "Processor", "Building", and "Blue Gene").]

Page 23: PC Clusters: Contributions of Beowulf

• An experiment in parallel computing systems

• Established vision of low cost, high end computing

• Demonstrated effectiveness of PC clusters for some (not all) classes of applications

• Provided networking software

• Conveyed findings to broad community (great PR)

• Tutorials and book

• Design standard to rally community!

• Standards beget: books, trained people, software … virtuous cycle

Adapted from Gordon Bell, presentation at Salishan 2000

Page 24: Linus's Law: Linux Everywhere

• Software is or should be free (Stallman)

• All source code is “open”

• Everyone is a tester

• Everything proceeds a lot faster when everyone works on one code (HPC: nothing gets done if resources are scattered)

• Anyone can support and market the code for any price

• Zero cost software attracts users!

• All the developers write lots of code

• Prevents community from losing HPC software (CM5, T3E)

Page 25: Is a Commodity Cluster a Supercomputer?

• The good
  — Hardware cost – commercial off-the-shelf
  — Majority of software is Open Source
  — Popular and trendy (see TOP500 list)
  — Well-established programming model (MPI)

• The bad
  — Architecturally imbalanced
  — Higher level of complexity in HW and SW
  — User and system environment not fully featured like a supercomputer

Page 26: Is a Commodity Cluster a Supercomputer? (cont.)

• The unknown
  — Real lifecycle costs
  — Rate of improvement of software environment (system and user level)
  — Performance and scalability
  — Applicability to broad range of applications

What is the feasibility and cost-effectiveness of cluster systems for high-performance production capability computing workload?

NERSC is currently evaluating these issues to prepare for NERSC-4

Major announcements about PC Clusters (Shell, NCSA)

Page 27: Summary on Technology Assessment

Likelihood that technology will be chosen:

  Technology             NERSC-4 (FY2003)   NERSC-5 (FY2006)
  Cluster of SMP         75%                40%
  PC Cluster             20%                40%
  Vectors (Japanese)     0.1%               0%
  Custom built (SV-2)    4.9%               5% (or 0%??)
  New technology         0%                 15%

Page 28: How Big Can NERSC-4 Be?

• Assume delivery in FY 2003
• Assume no other space is used in Oakland until NERSC-4
• Assume cost is not an issue (at least for now)
• Assume technology still progresses
  — ASCI will have a 30 Tflop/s system running for over 2 years

Page 29: Full Computer Room

[Floor plan: full computer room, showing the available space in Phase B of the OSF.]

Page 30: How close is 100 Tflop/s?

• Available gross space in Oakland is 7,700 sf without major changes
  — Assume it is 70% usable
  — The rest goes to air handlers, columns, etc.
• That gives 5,400 sf of space for racks
• IBM system used for estimates
  — Other vendors are similar
• Each processor is 1.5 GHz, yielding 6 Gflop/s
• An SMP node is made up of 32 processors
• 2 nodes in a frame
  — 64 processors in a frame = 384 Gflop/s per frame
• Frames are 32 - 36" wide and 48" deep
  — Service clearance of 3 feet in front and back (which can overlap)
  — 3 ft by 7 ft is 21 sf per frame

Page 31: Practical System Peak

• Rack distribution
  — 60% of racks are for CPUs
    • 90% are user/computation nodes
    • 10% are system support nodes
  — 20% of racks are for switch fabric
  — 20% of racks are for disks
• 5,400 sf / 21 sf per frame = 257 frames
• 277 nodes are directly used for computation
  — 8,870 CPUs for computation
  — System total is 9,856 CPUs (308 nodes)
• Practical system peak is 53 Tflop/s
  — 0.192 Tflop/s per node × 277 nodes
  — Some other places would claim 60 Tflop/s
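The two sizing slides above reduce to a short calculation; a minimal sketch follows (all inputs come from the slides, with rounding as on the slides):

```python
# Minimal sketch of the floor-space sizing estimate on the two slides above
# (all inputs come from the slides; rounding follows the slides).
gross_sf = 7_700                          # available gross space in Oakland
usable_sf = round(gross_sf * 0.70, -2)    # 70% usable -> ~5,400 sf
sf_per_frame = 21                         # 3 ft x 7 ft footprint incl. clearance

frames = int(usable_sf // sf_per_frame)   # ~257 frames
cpu_frames = int(frames * 0.60)           # 60% of racks hold CPU nodes
nodes_total = cpu_frames * 2              # 2 nodes per frame -> ~308 nodes
compute_nodes = int(nodes_total * 0.90)   # 90% user/computation -> ~277

cpus_per_node = 32
node_tflops = cpus_per_node * 6 / 1000    # 6 Gflop/s per 1.5 GHz CPU -> 0.192

print(f"Frames: {frames}, compute nodes: {compute_nodes}, "
      f"compute CPUs: {compute_nodes * cpus_per_node}")  # slide rounds CPUs to 8,870
print(f"Practical system peak: {compute_nodes * node_tflops:.0f} Tflop/s")  # ~53
```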

Page 32: NERSC-4

• Based on this analysis, NERSC can accommodate a 53 Tflop/s peak system in the existing facility with projected cluster-of-SMP technology

• Even at an optimistic cost estimate of $1-2M per Tflop/s, budgets will be the limiting factor
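A hedged illustration of that last point, combining the $1-2M per Tflop/s estimate with the 53 Tflop/s practical peak and the traditional $25M-$30M procurement value mentioned earlier in the talk:

```python
# Illustration of the budget constraint: combine the optimistic $1-2M per
# Tflop/s estimate with the 53 Tflop/s practical peak and the traditional
# $25M-$30M procurement value from earlier in the talk.
peak_tflops = 53
cost_per_tflops = (1e6, 2e6)     # optimistic $/Tflop/s range
low = peak_tflops * cost_per_tflops[0] / 1e6
high = peak_tflops * cost_per_tflops[1] / 1e6
print(f"Estimated system cost: ${low:.0f}M - ${high:.0f}M, "
      "versus a traditional procurement value of $25M - $30M")
```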

Page 33: Outline

• Role of NERSC in SciDAC
  — DOE Topical Computing Facilities
  — Enabling Technology Centers
  — DOE's Scientific Challenge Projects

Page 34: SciDAC Overview

Note: SBIR/STTR for ASCR's Mathematical, Information, and Computational Sciences Division is increased by $1.2M in FY'01 due to increases in operating expenses. The above numbers also do not include the requested $2.0M increase in the Computational Sciences Graduate Fellowship Program.

[Diagram: SciDAC program structure. ASCR/MICS programs (fundamental research and R&D for applications testbeds): Applied Mathematics, Computer Science, Advanced Computing Software Tools, Scientific Application Pilots; Enabling Technology Centers (+$19.2M); Networking, Collaboratory Tools, and Collaboratory Pilots; Energy Sciences Network (ESnet); Advanced Computing Research Facilities, including the National Energy Research Scientific Computing Center (NERSC); Facilities (+$12.3M); access to facilities and links to researchers (increments shown: +$1.5M, +$2.6M, +$6.0M, +$5.8M). BES, BER, FES, and HENP programs (fundamental research and R&D for science): Modeling and Simulation, Physical Theory, and Scientific Challenge Teams (+$20.0M) spanning Materials Sciences, Chemical Sciences, Combustion Sciences, Accelerator Sciences, High-Energy Physics, Nuclear Physics, Fusion Sciences, Biological Sciences, Global Climate, and more.]

Page 35: SciDAC adjustments to strategy

• SciDAC provides an accelerated strategy
• Increased funding of $5.8M/yr planned
• NERSC-3 contract has several options to allow upgrade of the existing Phase 2 system
• OASCR has not yet decided on the level of incremental funding for NERSC platforms
• NERSC is preparing a SciDAC platform strategy to maintain a balanced system and provide maximal capability to SciDAC users
• Planned funding would permit an upgrade of NERSC-3 to 5-6 Tflop/s peak at the expense of an unbalanced system

Page 36: NERSC's and LBNL's role

• The role of the NERSC Center as Flagship Facility for SciDAC is well defined

• NERSC should be able to compete for Topical Centers

• NERSC must be an active participant in the development and deployment of new technology in the ETCs (Enabling Technology Centers)

• NERSC must be an active participant in the Scientific Challenge Teams

Page 37: PDSF: a "Topical Facility" since 1996

• PDSF and NERSC hardware arrived at LBNL at the same time in 1996

• MICS agreed to dedicate 2 FTEs to PDSF operation and to integrate PDSF into NERSC

• PDSF at NERSC evolved into a unique resource for HEP community

• PDSF strength: cost effective processing and easy access to NERSC HPSS system

• HENP experiments can draw upon resources and expertise within NERSC

• NERSC was stimulated to pursue R&D projects in
  — data-intensive computing
  — distributed data access & computing
  — cluster computing

Page 38: PDSF Users and Collaborations

• ATLAS, D0, CDF, E895, E896, GC5, PHENIX, STAR

• HENP groups which are using or have used (at a significant level) PDSF include: AMANDA, ATLAS, CDF, E871, E895, GC5, NA49, PHENIX, RHIC Theory, SNO, STAR

• Specific software/production projects include:
  — CERNlib port to T3E
    • NERSC personnel (HCG & USG) helped with the port of CERNlibs to the T3E
    • The NERSC T3E was used for the port of CERNlibs
  — NERSC T3E provided 1/2 of the data generated by STAR GEANT for the first STAR Mock Data Challenge
    • The Pittsburgh Supercomputing Center T3E provided the other 1/2 of the data
    • Stored on HPSS
    • Transferred using DPSS and pftp

Page 39: Current PDSF Configuration

Page 40: Enabling Technology Centers

NERSC/LBNL is currently engaged in proposal activities for ETCs, which leverage the experience of development and deployment in the center and the research experience of scientific staff at LBNL:
  — Applied Mathematics (LBNL, LANL, …)
  — Scientific Data Management (LBNL, LLNL, ORNL, ANL)
  — Benchmarking and Performance Evaluation (LBNL, ORNL, LLNL, ANL)
  — Systems Software (LBNL, ANL, ORNL, …)
  — Optimal Solvers (LLNL, ANL, LBNL, …)
  — Data Analysis and Visualization (ANL, LBNL, …)

Page 41: Scientific Challenge Projects

NERSC is currently actively involved with the following pre-proposal activities:

• Climate (ANL, LLNL, NCAR, LANL) – FY2000 funding

• Accelerator Modeling (SLAC, LANL) – FY2000 funding

• Materials (ORNL, Ames Lab, ANL, …)
• Astrophysics (LBNL-Physics, …)
• Fusion (PPPL, LLNL, …)