TRANSCRIPT
NERSC User Group Meeting
Future Technology Assessment
Horst D. Simon, NERSC Division Director
February 23, 2001
DOE Science Computational Requirements …
… always outpace available resources

FY2001 Request      13,645,300
FY2001 Awards        7,532,200
FY2002 "Requests"   20,448,000
FY2003 "Requests"   28,358,000

MPP resources only. The FY02 and FY03 figures are estimates for NERSC planning only.
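The gap the slide describes can be quantified directly from the table; a minimal sketch (figures copied from the table above, units as listed there):

```python
# Oversubscription sketch; all figures are from the table above (MPP resources only).
fy2001_request = 13_645_300
fy2001_award = 7_532_200
fy2003_request = 28_358_000   # NERSC planning estimate

oversubscription = fy2001_request / fy2001_award  # ~1.81x: requests already outpace awards
growth = fy2003_request / fy2001_award            # ~3.76x the FY2001 award level
print(round(oversubscription, 2), round(growth, 2))
```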
Traditional NERSC Computational Strategy
• Traditional strategy within existing NERSC Program funding: acquire new computational capability every three years
  — 3 to 4 times the capability of existing systems
• Early, commercial, balanced systems with a focus on
  — stable programming environment
  — mature system management tools
  — good sustained-to-peak performance ratio
• Total value of $25M - $30M
  — About $9-10M/yr using lease-to-own
• Two generations in service at a time
  — e.g. T3E and IBM SP
• Phased introduction
Re-evaluation of the Strategy
To evaluate our strategy for NERSC-4 and beyond, we employ:
• Trend analysis: determine target performance ranges for future systems that assure high-end capability for the Office of Science
• Technology analysis: understand the different technology options for getting into the target range
• Constraint analysis: understand what is feasible (space, power, budget)
NERSC Peak Performance History

TOP 500 - Performance Development
[Figure: TOP500 performance development on a log scale (GFlop/s), with trend lines for N=1, N=10, N=100, N=500, and the Sum, and NERSC-1, NERSC-2, and NERSC-3 Phase 1 marked.]
Performance Development
[Figure: TOP500 trend lines (N=1, N=10, N=100, N=500, Sum) on a log scale from 100 MFlop/s to 1 PFlop/s, with ASCI systems, the Earth Simulator, and NERSC-1 through NERSC-4 (peak) marked.]
Trend Analysis
• In order to maintain its flagship role, new NERSC capability systems should be in the TOP10 at installation time
• Aggregate system performance has accelerated because of Moore's Law plus increased parallelism: expect a factor of 5-6 every three years
• NERSC-4 in late 2003 should have at least 20-30 Tflop/s LINPACK performance
• NERSC-5 in late 2006 should have at least 100-180 Tflop/s LINPACK performance
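The NERSC-5 target follows from the stated growth rate by simple geometric extrapolation; a sketch (the factor of 5-6 per three-year cycle and the NERSC-4 base range are taken from the bullets above):

```python
def project(low_tflops, high_tflops, cycles, factor_low=5, factor_high=6):
    """Extrapolate a LINPACK performance range forward by `cycles` three-year
    cycles, using the stated growth factor of 5-6 per cycle."""
    return low_tflops * factor_low**cycles, high_tflops * factor_high**cycles

# NERSC-5 (late 2006) projected one cycle out from NERSC-4 (late 2003):
print(project(20, 30, 1))  # (100, 180) Tflop/s, matching the NERSC-5 target
```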
Extrapolation to the Next Decade
[Figure: TOP500 extrapolation on a log scale from 0.1 GFlop/s to 1 PFlop/s, extending the N=1, N=10, N=500, and Sum trend lines into the next decade, with ASCI, the Earth Simulator, and Blue Gene marked.]
2000 - 2005: Technology Options
• Clusters
  — SMP nodes, with custom interconnect
  — PCs, with commodity interconnect
  — vector nodes (in Japan)
• Custom-built supercomputers
  — Cray SV-2
  — IBM Blue Gene (not yet mature for NERSC-4)
• Other technology to influence HPC (not general purpose)
  — IRAM/PIM
  — low-power processors (Transmeta)
  — consumer electronics (PlayStation 2)
  — Internet computing
Cray SV2 Overview
• Basic building block is a 50/100 GFLOPS node:
  — 4 CPUs per node, IEEE floating point; design goal is 12.8 GFLOPS per CPU
  — 8, 16, or 32 GB of coherent flat shared memory per CPU
• SSI (single system image) to 1024 nodes: 50/100 TFLOPS, 32 TB:
  — 100 GB/sec interconnect capacity to/from each node
  — ~1 microsecond latency anywhere in hypercube topology
• Targeted date of introduction: mid-2002
• Liquid-cooled (LC) cabinets; integral HEU (heat exchange unit)
• Up to 64 cabinets (4096 CPUs / 50 TFLOPS) in a mesh topology
• Availability 4Q2002
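As a sanity check, the node and system peaks quoted above are consistent with the per-CPU design goal; a small sketch using only the figures on the slide:

```python
# Cross-check of the SV2 figures above (12.8 GFLOPS/CPU is the stated design goal).
gflops_per_cpu = 12.8
cpus_per_node = 4
node_gflops = cpus_per_node * gflops_per_cpu   # 51.2, i.e. the "50" GFLOPS node

ssi_nodes = 1024
ssi_tflops = node_gflops * ssi_nodes / 1000    # ~52.4, i.e. the "50" TFLOPS SSI

max_cpus = 64 * 64                             # 64 cabinets x 64 CPUs = 4096 CPUs
max_tflops = max_cpus * gflops_per_cpu / 1000  # ~52.4 TFLOPS for the full mesh
print(node_gflops, round(ssi_tflops, 1), max_cpus)
```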
[Figure: Liquid-Cooled Cabinet (64 CPUs), showing the incoming power box, air coil, FC-72 filters, router modules, node modules, power supplies, heat exchanger, FC-72 gear pumps, and I/O cables. Cray Scalable Systems Update - Copyright Cray Inc, used by permission.]
2000 - 2005: Technology Options
• Clusters
  — SMP nodes, with custom interconnect
  — PCs, with commodity interconnect
  — vector nodes (in Japan)
• Custom-built supercomputers
  — Cray SV-2 (high risk)
  — IBM Blue Gene (not yet mature for NERSC-4)
• Other technology to influence HPC (not general purpose)
  — IRAM/PIM
  — low-power processors (Transmeta)
  — consumer electronics (PlayStation 2)
  — Internet computing
Global Earth Simulator
• 30 Tflop/s system in Japan
• completion 2002
• driven by climate and earthquake simulation requirements
• built by NEC
• CMOS vector nodes
Earth Simulator
[Photo: the Earth Simulator system.]

Global Earth Simulator Building
[Photo: the Earth Simulator building.]
Japanese Vector Platforms
In the 2002 - 2005 time frame these platforms do not offer any advantage compared to SMP clusters built by American commercial vendors:
— Distributed memory requires message passing
— Three levels of memory hierarchy require more complicated trade-offs for performance
— Similar space and power requirements

By 2003-4 a shared memory vector supercomputer will no longer be a capability platform. NERSC could pursue this as a capacity platform in addition to NERSC-4 (at the expense of a smaller capability).
2000 - 2005: Technology Options
• Clusters
  — SMP nodes, with custom interconnect
  — PCs, with commodity interconnect
  — vector nodes in Japan (no techn. advantage)
• Custom-built supercomputers
  — Cray SV-2 (high risk)
  — IBM Blue Gene (not yet mature for NERSC-4)
• Other technology to influence HPC (not general purpose)
  — IRAM/PIM
  — low-power processors (Transmeta)
  — consumer electronics (PlayStation 2)
  — Internet computing
Cluster of SMP Approach
[Diagram: building a cluster of SMPs, from processor to SMP building block to full system; Blue Gene marked.]
10 - 100 Tflop/s Cluster of SMPs
• Relatively low risk
  — Systems are extensions of current product lines to the high end
  — Several commercially viable vendors in the US
  — Experience at NERSC
  — Leverage from ASCI investment
• The first ones are already on order
  — LLNL is installing a 10 Tflop/s system now
  — LANL just ordered a 30 Tflop/s Compaq system
100 - 1000 Tflop/s Cluster of SMPs (IBM Roadmap)
[Diagram: IBM roadmap from processor to SMP building block to a full Blue Gene-scale system.]
PC Clusters: Contributions of Beowulf
• An experiment in parallel computing systems
• Established vision of low cost, high end computing
• Demonstrated effectiveness of PC clusters for some (not all) classes of applications
• Provided networking software
• Conveyed findings to broad community (great PR)
• Tutorials and book
• Design standard to rally community!
• Standards beget: books, trained people, software … virtuous cycle
Adapted from Gordon Bell, presentation at Salishan 2000
Linus’s Law: Linux Everywhere
• Software is or should be free (Stallman)
• All source code is “open”
• Everyone is a tester
• Everything proceeds a lot faster when everyone works on one code (HPC: nothing gets done if resources are scattered)
• Anyone can support and market the code for any price
• Zero cost software attracts users!
• All the developers write lots of code
• Prevents community from losing HPC software (CM5, T3E)
Is a Commodity Cluster a Supercomputer?
• The good
  — Hardware cost: commercial off-the-shelf
  — Majority of software is Open Source
  — Popular and trendy (see TOP500 list)
  — Well-established programming model (MPI)
• The bad
  — Architecturally imbalanced
  — Higher level of complexity in HW and SW
  — User and system environment not as fully featured as a supercomputer's
Is a Commodity Cluster a Supercomputer? (cont.)
• The unknown
  — Real lifecycle costs
  — Rate of improvement of the software environment (system and user level)
  — Performance and scalability
  — Applicability to a broad range of applications

What is the feasibility and cost-effectiveness of cluster systems for a high-performance production capability computing workload?

NERSC is currently evaluating these issues to prepare for NERSC-4.

Major announcements about PC clusters (Shell, NCSA).
Summary on Technology Assessment
Likelihood that technology will be chosen:

                        NERSC-4    NERSC-5
                        FY2003     FY2006
Cluster of SMP            75%        40%
PC Cluster                20%        40%
Vectors (Japanese)        0.1%        0%
Custom built (SV-2)       4.9%        5% (or 0%??)
New technology             0%        15%
How Big Can NERSC-4 Be?
• Assume delivery in FY2003
• Assume no other space is used in Oakland until NERSC-4
• Assume cost is not an issue (at least for now)
• Assume technology still progresses
  — ASCI will have a 30 Tflop/s system running for over 2 years
Full Computer Room
[Floor plan: Phase B of the OSF, with the full computer room and the available space marked.]
How close is 100 Tflop/s?
• Available gross space in Oakland is 7,700 sf without major changes
  — Assume it is 70% usable
  — The rest goes to air handlers, columns, etc.
• That gives 5,400 sf of space for racks
• IBM system used for estimates
  — Other vendors are similar
• Each processor is 1.5 GHz, yielding 6 Gflop/s
• An SMP node is made up of 32 processors
• 2 nodes in a frame
  — 64 processors in a frame = 384 Gflop/s per frame
• Frames are 32-36" wide and 48" deep
  — Service clearance of 3 feet in front and back (which can overlap)
  — 3 ft by 7 ft = 21 sf per frame
Practical System Peak
• Rack distribution
  — 60% of racks are for CPUs
    • 90% are user/computation nodes
    • 10% are system support nodes
  — 20% of racks are for switch fabric
  — 20% of racks are for disks
• 5,400 sf / 21 sf per frame = 257 frames
• 277 nodes are directly used for computation
  — 8,870 CPUs for computation
  — System total is 9,856 CPUs (308 nodes)
• Practical system peak is 53 Tflop/s
  — 0.192 Tflop/s per node × 277 nodes
  — Some other places would claim 60 Tflop/s
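The sizing arithmetic on the last two slides can be reproduced in a few lines; all inputs are the slides' figures and the rounding follows the slides (a sketch, not an official sizing tool):

```python
# Floor-space sizing sketch; inputs are taken from the two slides above.
usable_sf = 5400            # ~70% of the 7,700 sf gross space in Oakland
sf_per_frame = 21           # 3 ft x 7 ft footprint, incl. service clearance
gflops_per_cpu = 6          # 1.5 GHz processor
cpus_per_node = 32
nodes_per_frame = 2

frames = usable_sf // sf_per_frame               # 257 frames fit
nodes = round(frames * 0.60 * nodes_per_frame)   # 60% of racks hold CPUs -> 308 nodes
compute_nodes = round(nodes * 0.90)              # 90% user/computation -> 277 nodes
peak_tflops = compute_nodes * cpus_per_node * gflops_per_cpu / 1000
print(frames, nodes, compute_nodes, round(peak_tflops, 1))  # 257 308 277 53.2
```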
NERSC-4
• Based on this analysis, NERSC can accommodate a 53 Tflop/s peak system in the existing facility with projected cluster-of-SMP technology
• Even at an optimistic cost estimate of $1-2M per Teraflop/s, budgets will be the limiting factor
Outline
• Role of NERSC in SciDAC
  — DOE Topical Computing Facilities
  — Enabling Technology Centers
  — DOE's Scientific Challenge Projects
SciDAC Overview
[Diagram: SciDAC program structure, spanning the ASCR/MICS programs and the BES, BER, FES, and HENP programs]
• ASCR/MICS side: Applied Mathematics and Computer Science (fundamental research); Advanced Computing Software Tools and Scientific Application Pilots (R&D for applications/testbeds); Networking, Collaboratory Tools, and Collaboratory Pilots
• Applications side (BES, BER, FES, HENP): Physical Theory (fundamental research); Modeling and Simulation (R&D for science) in Materials Sciences, Chemical Sciences, Combustion Sciences, Accelerator Sciences, High-Energy Physics, Nuclear Physics, Fusion Sciences, Biological Sciences, Global Climate, ...
• Enabling Technology Centers (+$19.2M)
• Scientific Challenge Teams (+$20.0M)
• Facilities (+$12.3M): National Energy Research Scientific Computing Center (NERSC) (+$5.8M), Advanced Computing Research Facilities, and the Energy Sciences Network (ESnet); access to facilities and links to researchers (+$1.5M, +$2.6M, +$6.0M)

Note: SBIR/STTR for ASCR's Mathematical, Information, and Computational Sciences Division is increased by $1.2M in FY01 due to increases in operating expenses. The above numbers also do not include the requested $2.0M increase in the Computational Sciences Graduate Fellowship Program.
SciDAC adjustments to strategy
• SciDAC provides an accelerated strategy
• Increased funding of $5.8M/yr is planned
• The NERSC-3 contract has several options to allow upgrade of the existing Phase 2 system
• OASCR has not yet decided on the level of incremental funding for NERSC platforms
• NERSC is preparing a SciDAC platform strategy to maintain a balanced system and provide maximal capability to SciDAC users
• Planned funding would permit an upgrade of NERSC-3 to 5-6 Tflop/s peak at the expense of an unbalanced system
NERSC's and LBNL's role
• The role of the NERSC Center as Flagship Facility for SciDAC is well defined
• NERSC should be able to compete for Topical Centers
• NERSC must be an active participant in the development and deployment of new technology in the ETCs (Enabling Technology Centers)
• NERSC must be an active participant in the Scientific Challenge Teams
PDSF: a "Topical Facility" since 1996
• PDSF and NERSC hardware arrived at LBNL at the same time in 1996
• MICS agreed to dedicate 2 FTEs to PDSF operation and to integrate PDSF into NERSC
• PDSF at NERSC evolved into a unique resource for the HEP community
• PDSF strength: cost-effective processing and easy access to the NERSC HPSS system
• HENP experiments can draw upon resources and expertise within NERSC
• NERSC was stimulated to pursue R&D projects in
  — data-intensive computing
  — distributed data access & computing
  — cluster computing
PDSF Users and Collaborations
• ATLAS, D0, CDF, E895, E896, GC5, PHENIX, STAR
• HENP groups which are using or have used (at a significant level) PDSF include: AMANDA, ATLAS, CDF, E871, E895, GC5, NA49, PHENIX, RHIC Theory, SNO, STAR
• Specific software/production projects include:
  — CERNlib port to the T3E: NERSC personnel (HCG & USG) helped port the CERNlibs to the NERSC T3E
  — The NERSC T3E provided 1/2 of the data generated by STAR GEANT for the first STAR Mock Data Challenge
    • The Pittsburgh Supercomputing Center T3E provided the other 1/2 of the data
    • Stored on HPSS
    • Transferred using DPSS and pftp
Current PDSF Configuration
Enabling Technology Centers
NERSC/LBNL is currently engaged in proposal activities for ETCs, which leverage the experience of development and deployment in the center and the research experience of scientific staff at LBNL:
— Applied Mathematics (LBNL, LANL, …)
— Scientific Data Management (LBNL, LLNL, ORNL, ANL)
— Benchmarking and Performance Evaluation (LBNL, ORNL, LLNL, ANL)
— Systems Software (LBNL, ANL, ORNL …)
— Optimal Solvers (LLNL, ANL, LBNL …)
— Data Analysis and Visualization (ANL, LBNL, …)
Scientific Challenge Projects
NERSC is currently actively involved in the following pre-proposal activities:
• Climate (ANL, LLNL, NCAR, LANL) – FY2000 funding
• Accelerator Modeling (SLAC, LANL) – FY2000 funding
• Materials (ORNL, Ames Lab, ANL …)
• Astrophysics (LBNL-Physics, …)
• Fusion (PPPL, LLNL, …)