Climate Research at the National Energy Research Scientific Computing Center (NERSC)
Bill Kramer, Deputy Director and Head of High Performance Computing
CAS 2001, October 30, 2001
NERSC Vision
NERSC strives to be a world leader in accelerating scientific discovery through
computation. Our vision is to provide high-performance computing tools and expertise
to tackle science's biggest and most challenging problems, and to play a major
role in advancing large-scale computational science and computer science.
Outline
• NERSC-3: Successfully fielding the world’s most powerful unclassified computing resource
• The NERSC Strategic Proposal: An Aggressive Vision for the Future of the Flagship Computing Facility of the Office of Science
• Scientific Discovery through Advanced Computing (SciDAC) at NERSC
• Support for Climate Computing at NERSC: Ensuring Success for the National Program
FY00 MPP Users/Usage by Scientific Discipline
[Charts: NERSC FY00 MPP Users by Discipline; NERSC FY00 MPP Usage by Discipline]
NERSC FY00 Usage by Site
[Charts: MPP usage and PVP usage by site]
FY00 Users/Usage by Institution Type
NERSC Computing Highlights for FY 01
• NERSC-3 is in full and final production, exceeding its original capability by more than 30% and with much larger memory.
• Increased total FY02 allocations of computer time by 450% over FY01.
• Activated the new Oakland Scientific Facility.
• Upgraded the NERSC network connection to 655 Mbit/s (OC12), roughly 4 times the previous bandwidth.
• Increased archive storage capacity with 33% more tape slots and double the number of tape drives.
• PDSF, the T3E, the SV1s, and other systems all continue operating very well.
Oakland Scientific Facility
• 20,000 sf computer room; 7,000 sf office space
  — 16,000 sf of computer space built out
  — NERSC occupying 12,000 sf
• Ten-year lease with three five-year options
• $10.5M computer room construction cost
• Option for an additional 20,000+ sf computer room
HPSS Archive Storage
[Charts: Monthly I/O by Month and System; File Counts by Date and System; Cumulative Storage by Month and System. Each is broken out by Archive, User/Regent, and Backup systems, covering October 1998 through April 2001.]
• 190 Terabytes of data in the storage systems
• 9 Million files in the storage systems
• Average 600-800 GB of data transferred per day; peak 1.5 TB
• Average 18,000 files transferred per day; peak 60,000
• 500-600 tape mounts per day; peak 2,000 (12 per system)
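A quick derived figure (my arithmetic, not stated on the slide): the totals above imply an average archived file size of roughly 20 MB.

$$\frac{190\ \text{TB}}{9\times 10^{6}\ \text{files}}\approx 21\ \text{MB per file}$$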
NERSC-3 Vital Statistics
• 5 Teraflop/s peak performance; 3.05 Teraflop/s on Linpack
  — 208 nodes, 16 CPUs per node at 1.5 Gflop/s per CPU
  — "Worst case" Sustained System Performance measure: 0.358 Tflop/s (7.2% of peak)
  — "Best case" Gordon Bell submission: 2.46 Tflop/s on 134 nodes (77%)
• 4.5 TB of main memory
  — 140 nodes with 16 GB each, 64 nodes with 32 GB, and 4 nodes with 64 GB
• 40 TB total disk space
  — 20 TB formatted shared, global, parallel file space; 15 TB local disk for system usage
• Unique 512-way double/single switch configuration
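These headline figures are mutually consistent; a quick check of the arithmetic (mine, not on the slide):

$$208\ \text{nodes}\times 16\ \tfrac{\text{CPUs}}{\text{node}}\times 1.5\ \tfrac{\text{Gflop/s}}{\text{CPU}}\approx 4.99\ \text{Tflop/s peak},\qquad \frac{0.358\ \text{Tflop/s}}{4.99\ \text{Tflop/s}}\approx 7.2\%$$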
Two Gordon Bell Prize Finalists Are Using NERSC-3
• Materials Science -- 2016-atom supercell models for spin dynamics simulations of the magnetic structure of an iron-manganese/cobalt interface. Using 2176 processors of NERSC-3, the code sustained 2.46 teraflop/s. – M. Stocks and team at ORNL and U. Pittsburgh, with A. Canning at NERSC.
• Climate Modeling -- Shallow Water Climate Model sustained 361 Gflop/s (12%) – S. Thomas et al., NCAR.
Section of an FeMn/Co interface shows a new magnetic structure that is different from the magnetic structure of pure FeMn.
NERSC System Architecture
[Diagram: major NERSC systems interconnected by FDDI/Ethernet (10/100/Gigabit) and HIPPI networks, with a connection to ESnet]
• IBM SP (NERSC-3 Phase 2): 2532 processors, 1824 GB of memory, 32 TB of disk
• CRI T3E-900 (644/256) and CRI SV1 cluster
• HPSS archive storage with IBM and STK robots
• PDSF, DPSS, Research Cluster, LBNL Cluster, Millennium
• Visualization lab, remote visualization server, symbolic manipulation server
• SGI, MaxStrat
NERSC Strategic Proposal
An Aggressive Vision for the Future of the Flagship Computing Facility of the Office of Science
The NERSC Strategic Proposal
• Requested in February 2001 by the Office of Science as a proposal for the next five years of the NERSC Center and Program
• Proposal and Implementation Plan delivered to OASCR at the end of May 2001
• The proposal plays to NERSC's strengths but anticipates rapid and broad changes in scientific computing
• Results of the DOE review expected in late November or December 2001
High-End Systems: A Carefully Researched Plan for Growth
A three-year procurement cycle for leading-edge computing platforms
Balanced systems, with appropriate data storage and networking
NERSC Support for the DOE Scientific Discovery through Advanced Computing (SciDAC)
Scientific Discovery Through Advanced Computing
DOE science programs need dramatic advances in simulation capabilities to meet their mission goals:
• Subsurface Transport
• Global Systems
• Health Effects, Bioremediation
• Fusion Energy
• Combustion
• Materials
LBNL/NERSC SciDAC Portfolio – Project Leadership
• Scientific Data Mgmt Center (ISIC). PI: Shoshani. Partners: ANL, LLNL, ORNL, UC San Diego, Georgia Institute of Technology, Northwestern Univ., North Carolina State Univ. Annual funding: $624,000
• Applied Partial Differential Equations Center (ISIC). PI: Colella. Partners: LLNL, Univ. of Washington, North Carolina, Wisconsin, UC Davis, NYU. Annual funding: $1,700,000
• Performance Evaluation Research Center (ISIC). PI: Bailey. Partners: ORNL, ANL, LLNL, Univ. of Maryland, Tennessee, Illinois at Urbana-Champaign, UC San Diego. Annual funding: $276,000
• DOE Science Grid: Enabling and Deploying the SciDAC Collaboratory Software Environment. PI: Johnston. Partners: ORNL, ANL, NERSC, PNNL. Annual funding: $510,000
• Advanced Computing for the 21st Century Accelerator Science and Technology. PI: Ryne. Partners: NERSC, SLAC. Annual funding: $650,000
Applied Partial Differential Equations ISIC
• New algorithmic capabilities with high-performance implementations on high-end computers:
  — Adaptive mesh refinement
  — Cartesian grid embedded boundary methods for complex geometries
  — Fast adaptive particle methods
• Close collaboration with applications scientists
• Common mathematical and software framework for multiple applications
Participants: LBNL (J. Bell, P. Colella), LLNL, Courant Institute, Univ. of Washington, Univ. of North Carolina, Univ. of California at Davis, Univ. of Wisconsin
Developing a new algorithmic and software framework for solving partial differential equations in core mission areas.
Scientific Data Management ISIC
Participants: ANL, LBNL, LLNL, ORNL, Georgia Tech, NCSU, NWU, SDSC
Goals: optimize and simplify
• Access to very large data sets
• Access to distributed data
• Access to heterogeneous data
• Data mining of very large data sets
SDM-ISIC technology:
• Optimizing shared access from mass storage systems
• Metadata and knowledge-based federations
• API for Grid I/O
• High-dimensional cluster analysis
• High-dimensional indexing
• Adaptive file caching
• Agents
[Diagram: petabytes of data on tape and terabytes on disk feed scientific simulations, experiments, analysis, and discovery. Today, data manipulation (getting files from the tape archive, extracting subsets of data from files, reformatting data, getting data from heterogeneous distributed systems, moving data over the network) takes roughly 80% of a scientist's time, leaving about 20% for analysis and discovery; using SDM-ISIC technology, the proportions are reversed to roughly 20% manipulation and 80% analysis.]
SciDAC Portfolio – NERSC as a Collaborator
• DOE Science Grid: Enabling and Deploying the SciDAC Collaboratory Software Environment. Co-PI: Kramer. Lead PI: Johnston, LBNL. Annual funding: $225,000
• Scalable Systems Software Enabling Technology Center. Co-PI: Hargrove. Lead PI: Al Geist, ORNL. Annual funding: $198,000
• Advanced Computing for the 21st Century Accelerator Science and Technology. Co-PI: Ng. Lead PI: Robert Ryne, LBNL. Annual funding: $200,000
• Terascale Optimal PDE Simulations Center (TOPS Center). Co-PI: Ng. Lead PIs: Barry Smith and Jorge More, ANL. Annual funding: $516,000
• Earth System Grid: The Next Generation – Turning Climate Datasets into Community Resources. Co-PI: Shoshani. Lead PI: Ian Foster, ANL. Annual funding: $255,000
• Particle Physics Data Grid Collaboratory Pilot. Co-PI: Shoshani. Lead PI: Richard Mount, SLAC. Annual funding: $405,000
• Collaborative Design and Development of the Community Climate System Model for Terascale Computers. Co-PI: Ding. Lead PIs: Malone (LANL) and Drake (ORNL). Annual funding: $400,000
Strategic Project Support
• Specialized Consulting Support
  — Project facilitator assigned
    • Help defining project requirements
    • Help with getting resources
    • Code tuning and optimization
  — Special service coordination
    • Queues, throughput, increased limits, etc.
• Specialized Algorithmic Support
  — Project facilitator assigned
    • Develop and improve algorithms
    • Performance enhancement
  — Coordination with ISICs to represent work and activities
Strategic Project Support
• Special Software Support
  — Projects can request support for packages and software that are specific to their work and not broadly applicable to the general community
• Visualization Support
  — Apply NERSC visualization software to projects
  — Develop and improve methods specific to the projects
  — Support any project visitors who use the local LBNL visualization lab
• SciDAC Conference and Workshop Support
  — NERSC staff will provide content and presentations at project events
  — Provide custom training at project events
  — NERSC staff attend and participate in project events
Strategic Project Support
• Web Services for interested projects
  — Provide areas on NERSC web servers for interested projects
    • Password-protected areas
    • Safe "sandbox" area for dynamic script development
  — Provide web infrastructure
    • Templates, structure, tools, forms, dynamic data scripts (cgi-bin)
  — Archive for mailing lists
  — Provide consulting support to help projects organize and manage web content
• CVS Support
  — Provide a server area for interested projects
    • Backup, administration, access control
  — Provide access to code repositories
  — Help projects set up and manage code repositories
Strategic Project Area Facilitators
Area | User Services Facilitator | Scientific Computing Facilitator
Fusion | David Turner | Dr. Jodi Lamoureux
QCD | Dr. Majdi Baddourah | Dr. Jodi Lamoureux
Experimental Physics | Dr. Iwona Sakrejda | Dr. Jodi Lamoureux
Astrophysics | Dr. Richard Gerber | Dr. Peter Nugent
Accelerator Physics | Dr. Richard Gerber | Dr. Esmond Ng
Chemistry | Dr. David Skinner | Dr. Lin Wang
Life Science | Dr. Jonathan Carter | Dr. Chris Ding
Climate | Dr. Harsh Anand Passi | Dr. Chris Ding
Computer Science | Thomas Deboni | Dr. Parry Husbands / Dr. Osni Marques (for CCA)
Applied Math | Dr. Majdi Baddourah | Dr. Chao Yang
NERSC Support for Climate Research
Ensuring Success for the National Program
Climate Projects at NERSC
• 20+ projects from the base MPP allocations, with about 6% of the entire base resource
• Two strategic climate projects:
  — High Resolution Global Coupled Ocean/Sea Ice Modeling – Matt Maltrud, LANL
    • 5% of total SP hours (920,000 wall clock hours)
    • "Couple high resolution ocean general circulation model with high resolution dynamic thermodynamic sea ice model in a global context."
    • 1/10th degree resolution (3 to 5 km in polar regions; see the grid-spacing note below)
  — Warren Washington, Tom Bettge, Tony Craig, et al.
    • PCM coupler
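For orientation, a rough grid-spacing calculation (my arithmetic, not from the slide): at 1/10 degree the nominal spacing is about 11 km at the equator, and the east-west spacing shrinks with the cosine of latitude, which is where the 3 to 5 km polar-region figure comes from.

$$0.1^{\circ}\times 111\ \tfrac{\text{km}}{\text{degree}}\approx 11\ \text{km},\qquad 11\ \text{km}\times\cos 70^{\circ}\approx 3.8\ \text{km}$$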
Early Scientific Results Using NERSC-3
• Climate Modeling – 50 km resolution global climate simulation run in a 3-year test, proving that the model is robust to a large increase in spatial resolution. This is the highest spatial resolution ever used: about 32 times more grid cells than ~300 km grids, taking roughly 200 times as long to run. – P. Duffy, LLNL
[Figure: Reaching Regional Climate Resolution]
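A rough consistency check of those factors (my reasoning, not from the slide): refining from ~300 km to 50 km multiplies the number of horizontal grid cells by roughly the square of the refinement factor, and because the timestep usually has to shrink along with the grid spacing (a CFL-type constraint), total cost grows roughly with the cube.

$$\left(\tfrac{300}{50}\right)^{2}=36\ \text{(vs. the quoted}\ \sim 32\times\ \text{cells)},\qquad \left(\tfrac{300}{50}\right)^{3}=216\ \text{(vs. the quoted}\ \sim 200\times\ \text{runtime)}$$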
Some other Climate Projects NERSC staff have helped with
• Richard Loft, Stephen Thomas, and John Dennis (NCAR) – using 2,048 processors on NERSC-3, demonstrated that the dynamical core of an atmospheric general circulation model (GCM) can be integrated at a rate of 130 years per day
• Inez Fung (UCB) – using the CSM to build a carbon-climate simulation package on the SV1
• Mike Wehner – using the CCM for large-scale ensemble simulations on the T3E
• Doug Rotman – atmospheric chemistry/aerosol simulations
• Tim Barnett and Detlef Stammer – PCM runs on the T3E and SP
ACPI/Avantgarde/SciDAC
• Work done by Chris Ding and team:
  — Comprehensive performance analysis of GPFS on the IBM SP (supported by Avant Garde)
  — I/O performance analysis; see http://www.nersc.gov/research/SCG/acpi/IO/
  — Numerical reproducibility and stability
  — MPH: a library for a distributed multi-component environment
Special Support for Climate Computing
NCAR CSM version 1.2
• NERSC was the first site to port NCAR CSM to a non-NCAR Cray PVP machine
• Main users: Inez Fung (UCB) and Mike Wehner (LLNL)
NCAR CCM3.6.6
• Independently of CSM, NERSC ported NCAR CCM3.6.6 to the NERSC Cray PVP cluster
• See http://hpcf.nersc.gov/software/apps/climate/ccm3/
Special Support for Climate Computing – cont.
• T3E netCDF parallelization
  — NERSC solicited user input to define parallel I/O requirements for the MOM3, LAN, and CAMILLE climate models (Ron Pacanowski, Venkatramani Balaji, Michael Wehner, Doug Rotman, and John Tannahill)
  — Development of the netCDF parallelization on the T3E was done by Dr. R.K. Owen of NERSC/USG based on the modelers' requirements:
    • Better I/O performance
    • Master/slave read/write capability
    • Support for a variable unlimited dimension
    • Allow a subset of PEs to open/close a netCDF dataset
    • User-friendly API
    • Etc.
  — Demonstrated netCDF parallel I/O usage by building model-specific I/O test cases (MOM3, CAMILLE)
  — The official netCDF 3.5 UNIDATA release includes "added support provided by NERSC for multiprocessing on Cray T3E." http://www.unidata.ucar.edu/packages/netcdf/release-notes-3.5.0.html
• Parallel netCDF for the IBM SP is under development by Dr. Majdi Baddourah of NERSC/USG (see the sketch below for the baseline pattern this work replaces)
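For context, here is a minimal sketch (mine, not the NERSC implementation) of the baseline "master PE does the I/O" pattern that the parallel netCDF work described above improves on: every PE sends its slab of data to PE 0, which writes the file with the standard serial netCDF C API. The file name, variable name, and sizes are illustrative, and error checking is omitted for brevity.

/* Hedged sketch (not NERSC code): conventional "PE 0 writes" pattern
 * using MPI plus the standard serial netCDF C API. */
#include <mpi.h>
#include <netcdf.h>
#include <stdlib.h>

#define NX 64                      /* local slab size per PE (illustrative) */

int main(int argc, char **argv)
{
    int rank, nprocs;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &nprocs);

    float local[NX];               /* each PE's piece of a 1-D model field */
    for (int i = 0; i < NX; i++)
        local[i] = rank + 0.01f * i;

    if (rank == 0) {
        int ncid, dimid, varid;
        nc_create("field.nc", NC_CLOBBER, &ncid);
        nc_def_dim(ncid, "x", (size_t)nprocs * NX, &dimid);
        nc_def_var(ncid, "field", NC_FLOAT, 1, &dimid, &varid);
        nc_enddef(ncid);

        float *buf = malloc(NX * sizeof(float));
        for (int src = 0; src < nprocs; src++) {
            if (src == 0) {
                for (int i = 0; i < NX; i++) buf[i] = local[i];
            } else {
                /* funnel every other PE's slab through PE 0 */
                MPI_Recv(buf, NX, MPI_FLOAT, src, 0,
                         MPI_COMM_WORLD, MPI_STATUS_IGNORE);
            }
            size_t start = (size_t)src * NX, count = NX;
            nc_put_vara_float(ncid, varid, &start, &count, buf);
        }
        free(buf);
        nc_close(ncid);
    } else {
        MPI_Send(local, NX, MPI_FLOAT, 0, 0, MPI_COMM_WORLD);
    }

    MPI_Finalize();
    return 0;
}

The point of the T3E and SP parallelization work, as listed above, is to let PEs participate in netCDF reads and writes directly instead of funneling everything through a single PE.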
Additional Support for Climate
• The Scientific Computing and User Services Groups have staff with a special climate focus
• Received funding for a new climate support person at NERSC, who will:
  — Provide software, consulting, and documentation support for climate researchers at NERSC
  — Port the second generation of NCAR's Community Climate System Model (CCSM-2) to NERSC's IBM SP
  — Put the modified source code under CVS control so that individual investigators at NERSC can access the NERSC version and modify and manipulate their own source without affecting others
  — Provide necessary support and consultation on operational issues
  — Develop enhancements to netCDF on NERSC machines that benefit NERSC's climate researchers
  — Respond in a timely, complete, and courteous manner to NERSC user clients, and provide an interface between NERSC users and staff
NERSC Systems Utilization
[Chart: MPP Charging and Usage, FY 1998-2000. 30-day moving averages of CPU hours from October 1997 through September 2000, broken out by Lost Time, Pierre Free, Pierre, GC0, Mcurie, and Overhead, plotted against Max CPU Hours and 80%/85%/90% goal lines. Annotations mark the merging of systems, two allocation-starvation periods, the start of checkpoint/restart for capability jobs, and full scheduling functionality.]
• Utilization improved by roughly 4.4% per month
• T3E: 95% gross utilization
• IBM SP: 80-85% gross utilization
NERSC Systems Run "large" Jobs
[Charts: Mcurie (T3E) and IBM SP MPP time by job size, 30-day moving averages of hours from late April through late November 2000, broken out by job-size bins: <16, 17-32, 33-64, 65-96, 97-128, 129-256, and 257-512 processors.]
Balancing Utilization and Turnaround
• NERSC consistently delivers high utilization on MPP systems while running large applications.
• We are now working with our users to establish methods that provide improved services:
  — Guaranteed throughput for at least a selected group of projects
  — More interactive and debugging resources for parallel applications
  — Longer application runs
  — More options in resource requests
• Because of the special turnaround requirements of the large climate users:
  — NERSC established a queue working group (T. Bettge and Vince Wayland at NCAR)
  — Set up special queue scheduling procedures that provide an agreed-upon amount of turnaround per day whenever there is work in the queue (September 2001)
  — Will present a plan on job scheduling at the NERSC User Group Meeting, November 12, 2001, in Denver
Wait times in “regular” queue
[Chart: wait times in the "regular" queue for climate jobs vs. all other jobs]
NERSC Is Delivering on Its Commitment to Make the Entire DOE Scientific Computing Enterprise Successful
• NERSC sets the standard for effective supercomputing resources
• NERSC is a major player in SciDAC and will coordinate its projects and collaborations
• NERSC is providing targeted support to SciDAC projects
• NERSC continues to provide targeted support for the climate community and is acting on its input and needs