what supercomputers still can’t do – a reflection on the state of the art in cse horst d. simon

What Supercomputers Still Can’t Do – a Reflection on the State of the Art in CSE

Horst D. Simon
Associate Laboratory Director, Computing Sciences
Director, NERSC

CIS’04, Shanghai, P.R. China
December 16, 2004
http://www.nersc.gov/~simon

Overview

• Introducing NERSC and Computing Sciences at Berkeley Lab

• Current Trends in Supercomputing (High-End Computing)

• What Supercomputers Do

• What Supercomputers Can’t Do

http://www.nersc.gov/

NERSC Serves the Scientific Community

NERSC Center Overview

• Funded by DOE, annual budget $38M, about 60 staff
  – Traditional strategy to invest equally in newest compute platform, staff, and other resources

• Supports open, unclassified, basic research

• Close collaborations between university and NERSC in computer science and computational science


NERSC System Architecture

• HPSS: 12 IBM SP servers, 15 TB of cache disk, 8 STK robots, 44,000 tape slots, 20 200 GB drives, 60 20 GB drives, max capacity 5-8 PB

• PDSF: 400 processors (peak 375 GFlop/s), 360 GB of memory, 35 TB of disk, Gigabit and Fast Ethernet; Ratio = (1, 93)

• IBM SP NERSC-3 “Seaborg”: 6,656 processors (peak 10 TFlop/s), 7.8 TB of memory, 44 TB of disk; Ratio = (8, 7)

• LBNL “Alvarez” cluster: 174 processors (peak 150 GFlop/s), 87 GB of memory, 1.5 TB of disk, Myrinet 2000; Ratio = (.6, 100)

• Visualization server “escher”: SGI Onyx 3400, 12 processors, 2 InfiniteReality 4 graphics pipes, 24 GB of memory, 4 TB of disk

• Testbeds and servers; symbolic manipulation server; SGI

• Networking and storage shown in the diagram: ESnet; OC 48 (2400 Mbps); Gigabit Ethernet; Jumbo Gigabit Ethernet; 10/100 Megabit Ethernet; FC disk; STK robots

Ratio = (RAM Bytes per Flop, Disk Bytes per Flop)
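The balance ratio defined on this slide can be recomputed from the listed peak, memory, and disk figures. The sketch below is only an illustration: the function name `balance_ratio` and the use of decimal units (1 GB = 10^9 bytes) are assumptions, and the computed values need not match the ratios as printed for every system, since those may be rounded or damaged in extraction.

```python
# Bytes-per-flop balance ratios from the figures listed on the slide.
# Decimal units are an assumption; the slide does not state a convention.
GIGA, TERA = 1e9, 1e12

systems = {
    "PDSF": {"peak_flops": 375 * GIGA, "ram_bytes": 360 * GIGA, "disk_bytes": 35 * TERA},
    "Seaborg": {"peak_flops": 10 * TERA, "ram_bytes": 7.8 * TERA, "disk_bytes": 44 * TERA},
    "Alvarez": {"peak_flops": 150 * GIGA, "ram_bytes": 87 * GIGA, "disk_bytes": 1.5 * TERA},
}


def balance_ratio(peak_flops, ram_bytes, disk_bytes):
    """Return (RAM bytes per flop/s, disk bytes per flop/s)."""
    return ram_bytes / peak_flops, disk_bytes / peak_flops


ratios = {name: balance_ratio(**spec) for name, spec in systems.items()}

for name, (ram, disk) in ratios.items():
    print(f"{name}: ({ram:.2f}, {disk:.1f})")
```

A ratio near 1 byte of memory per flop/s is a common rule of thumb for a balanced system, which is why the slide reports it next to each machine.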


NERSC Capability Plan

[Chart: planned peak capability in TeraFlop/s by year, 2005 to 2010, stacked by system: NERSC 3, NCS, NCSb, NERSC 5L, NERSC 6L, Cluster. Bar labels as extracted: 13, 81, 248, 81, 229, 202.]








Technology Trends: Microprocessor Capability

2X transistors/chip every 1.5 years, called “Moore’s Law”

Moore’s Law

Microprocessors have become smaller, denser, and more powerful.

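Gordon Moore (co-founder of Intel) predicted in 1965 that the transistor density of semiconductor chips would double roughly every year, and revised the rate in 1975 to every two years; the doubling is commonly quoted as roughly every 18 months.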

Slide source: Jack Dongarra
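As a rough illustration of what "2X every 1.5 years" implies, the following sketch computes the cumulative growth factor over a given span. The function name `growth_factor` is hypothetical, and the 1.5-year doubling period is simply the figure quoted on the slide, not a measured value.

```python
def growth_factor(years, doubling_period_years=1.5):
    """Cumulative growth after `years`, assuming one doubling per period."""
    return 2.0 ** (years / doubling_period_years)


# One doubling period gives 2x; ten periods (15 years) give 2**10 = 1024x.
print(growth_factor(1.5))
print(growth_factor(15))
```

Under this assumption, a decade corresponds to roughly a hundredfold increase (2^(10/1.5) is about 100).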


TOP 500 Performance Development

[Chart: TOP500 performance, 1993 to 2004, logarithmic scale from 100 Mflop/s to 1 Pflop/s. Series: SUM (1.167 TF/s rising to 1.127 PF/s), N=1 (59.7 GF/s rising to 70.72 TF/s), N=500 (0.4 GF/s rising to 850 GF/s). Annotated systems: Fujitsu 'NWT' NAL, Intel ASCI Red (Sandia), IBM ASCI White (LLNL), NEC Earth Simulator, IBM BlueGene/L. A further marker is labeled “My Laptop”.]

TOP 500 Performance Projection

[Chart: TOP500 performance extrapolated, 1993 to 2015, logarithmic scale from 100 Mflop/s to 1 Eflop/s. Series: SUM, N=1, N=500. Annotation: DARPA HPCS.]
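A projection of this kind is a straight-line extrapolation on a logarithmic scale. The sketch below estimates the implied annual growth factor from the SUM values shown on the development chart (1.167 TF/s and 1.127 PF/s) and extrapolates forward. The helper names are hypothetical, and treating the two values as exactly 11 years apart (1993 and 2004) is an assumption.

```python
import math


def annual_growth(start_value, end_value, years):
    """Constant yearly growth factor that turns start_value into end_value."""
    return (end_value / start_value) ** (1.0 / years)


def doubling_time_years(annual_factor):
    """Years needed to double at a constant yearly growth factor."""
    return math.log(2.0) / math.log(annual_factor)


def years_to_reach(current, target, annual_factor):
    """Years until `current` grows to `target` at a constant growth factor."""
    return math.log(target / current) / math.log(annual_factor)


SUM_1993_TFLOPS = 1.167   # from the chart labels
SUM_2004_TFLOPS = 1127.0  # 1.127 PF/s, from the chart labels
SPAN_YEARS = 11           # assumption: endpoints taken as 1993 and 2004

factor = annual_growth(SUM_1993_TFLOPS, SUM_2004_TFLOPS, SPAN_YEARS)
print(f"annual growth factor: {factor:.2f}")
print(f"doubling time: {doubling_time_years(factor):.2f} years")
print(f"years from 2004 until SUM reaches 1 Eflop/s: "
      f"{years_to_reach(SUM_2004_TFLOPS, 1e6, factor):.1f}")
```

Under these assumptions the aggregate performance of the list doubles in a little over a year, faster than the 1.5-year doubling quoted for transistor counts, because parallelism has grown along with per-chip capability.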

Asian Countries

[Chart: number of TOP500 systems in Asian countries by year, 1993 to 2004, vertical scale 0 to 150. Series: Japan, South Korea, China, India, Others.]

Supercomputing Today

• Microprocessors have made desktop computing in 2004 what supercomputing was in 1993.

• Massive parallelism has changed the “high end” completely.

• Today clusters of Symmetric Multiprocessors are the standard supercomputer architecture.

• The microprocessor revolution will continue with little attenuation for at least another 10 years.

• Continued discussion over architecture for High-End Computing (custom versus commodity).








What Supercomputers Do

- Introducing Computational Science and Engineering (CSE)

- Four important observations about CSE illustrated by examples from NERSC


Simulation: The Third Pillar of Science

• Traditional scientific and engineering paradigm:
  (1) Do theory or paper design.
  (2) Perform experiments or build system.

• Limitations:
  – Too difficult: build large wind tunnels.
  – Too expensive: build a throw-away passenger jet.
  – Too slow: wait for climate or galactic evolution.
  – Too dangerous: weapons, drug design, climate experimentation.

• Computational science paradigm:
  (3) Use high performance computer systems to simulate the phenomenon
  • Based on known physical laws and efficient numerical methods.
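To make step (3) concrete, here is a deliberately tiny sketch of "a known physical law plus a numerical method": one-dimensional heat diffusion advanced with an explicit finite-difference scheme. It is not taken from any NERSC code; the function name `diffuse`, the fixed zero-temperature boundaries, and all parameter values are assumptions chosen for illustration.

```python
def diffuse(u, alpha, dx, dt, steps):
    """Advance the 1D heat equation u_t = alpha * u_xx with an explicit scheme.

    The two end values are held fixed (boundary conditions).
    """
    r = alpha * dt / dx ** 2
    if r > 0.5:
        # The explicit scheme is only stable for r <= 1/2.
        raise ValueError("time step too large for stability")
    u = list(u)
    for _ in range(steps):
        new = u[:]
        for i in range(1, len(u) - 1):
            new[i] = u[i] + r * (u[i - 1] - 2.0 * u[i] + u[i + 1])
        u = new
    return u


# A single hot cell in the middle of a cold rod spreads out over time.
initial = [0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0]
result = diffuse(initial, alpha=1.0, dx=1.0, dt=0.25, steps=10)
print([round(v, 3) for v in result])
```

Production simulations differ from this sketch mainly in scale and in the sophistication of the numerical method, which is where the later observations about algorithms and architectures come in.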


Computational Science – Third Pillar of Science

Many programs in DOE need dramatic advances in simulation capabilities to meet their mission goals.

SciDAC program created in 2001

Application areas shown:
• Subsurface Transport
• Health Effects, Bioremediation
• Combustion
• Materials
• Fusion Energy
• Components of Matter
• Global Climate


Computational Science and Engineering (CSE)

• CSE is a widely accepted label for an evolving field concerned with the science of and the engineering of systems and methodologies to solve computational problems arising throughout science and engineering

• CSE is characterized by
  – Multi-disciplinary
  – Multi-institutional
  – Requiring high end resources
  – Large teams
  – Focus on community software

• CSE is not “just programming” (and not CS)

• Teraflop/s computing is necessary but not sufficient

Reference: Petzold, L., et al., Graduate Education in CSE, SIAM Rev., 43 (2001), 163-177


First Observation about CSE

1. CSE permits us to ask new scientific questions

• The increased computational capability available today lets us do more of the same (scaling to larger problems, more refinement, etc.),

• but it is most effectively used when addressing qualitatively new science questions.


High Resolution Climate Modeling on NERSC-3 – P. Duffy, et al., LLNL

Wintertime Precipitation

As model resolution becomes finer, results converge towards observations

Tropical Cyclones and Hurricanes

Research by: Michael Wehner, Berkeley Lab, Phil Duffy, and G. Bala, LLNL

• Hurricanes are extreme events with large impacts on human and natural systems

• Characterized by high vorticity (winds), very low pressure centers, and upper air temperature warm anomalies

• Wind speeds on the Saffir-Simpson Hurricane Scale
  – Category one: 74-95 mph (64-82 kt or 119-153 km/hr)
  – Category two: 96-110 mph (83-95 kt or 154-177 km/hr)
  – Category three: 111-130 mph (96-113 kt or 178-209 km/hr)
  – Category four: 131-155 mph (114-135 kt or 210-249 km/hr)
  – Category five: >155 mph (135 kt or 249 km/hr)

How will the hurricane cycle change as the mean climate changes?
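Counting simulated storms by severity requires turning a wind speed into a category. A minimal sketch using the mph thresholds listed above follows; the function name `saffir_simpson_category` is hypothetical, returning 0 for winds below hurricane strength is an assumption, and fractional speeds between the listed integer ranges (for example 95.5 mph) are assigned to the lower category.

```python
def saffir_simpson_category(wind_mph):
    """Map a sustained wind speed in mph to a Saffir-Simpson category (0 to 5)."""
    if wind_mph > 155:
        return 5
    if wind_mph >= 131:
        return 4
    if wind_mph >= 111:
        return 3
    if wind_mph >= 96:
        return 2
    if wind_mph >= 74:
        return 1
    return 0  # below hurricane strength


for speed in (60, 80, 100, 120, 140, 160):
    print(speed, "mph -> category", saffir_simpson_category(speed))
```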


Tropical Cyclones in Climate Models

• Tropical cyclones are not generally seen in integrations of global atmospheric general circulation models at climate model resolutions (T42 ~ 300 km).

• In fact, in CCM3 at T239 (50 km), the lowest pressure attained is 995 mb. No realistic cyclones are simulated.

• However, in high resolution simulations of the finite volume dynamics version of CAM2, strong tropical cyclones are common.


Finite Volume Dynamics CAM

• Run in an ‘AMIP’ mode
  – Specified sea surface temperature and sea ice extent
  – Integrated from 1979 to 2000

• We are studying four resolutions
  – B: 2° x 2.5°
  – C: 1° x 1.25°
  – D: 0.5° x 0.625°
  – E: 0.25° x 0.375°

• Processor Configuration and Cost (IBM SP3)
  – B: 64 processors, 10 wall clock hours / simulated year
  – C: 160 processors, 22 wall clock hours / simulated year
  – D: 640 processors, 33 wall clock hours / simulated year
  – E: 640 processors, 135 wall clock hours / simulated year
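The cost of each resolution is easier to compare as processor-hours per simulated year. The sketch below multiplies the processor counts by the wall clock hours listed above; the names are hypothetical, and taking the 1979 to 2000 integration as 21 simulated years is an assumption.

```python
# (processors, wall clock hours per simulated year) as listed on the slide
configs = {
    "B": (64, 10),
    "C": (160, 22),
    "D": (640, 33),
    "E": (640, 135),
}

SIMULATED_YEARS = 21  # assumption: 1979 to 2000 counted as 21 years


def cpu_hours_per_simulated_year(processors, wall_hours):
    """Processor-hours consumed for one simulated year."""
    return processors * wall_hours


costs = {name: cpu_hours_per_simulated_year(*cfg) for name, cfg in configs.items()}

for name, cost in costs.items():
    wall_days = configs[name][1] * SIMULATED_YEARS / 24.0
    print(f"{name}: {cost} CPU-hours/simulated year, "
          f"about {wall_days:.0f} wall clock days for the full run")
```

On these figures, resolution E costs 135 times as many processor-hours per simulated year as resolution B, which illustrates why resolving tropical cyclones was out of reach for earlier systems.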


New Science Question: Hurricane Statistics

[Table: hurricane counts per basin. Columns: 1979, 1980, 1981, 1982, Obs]

Northwest Pacific Basin: >25, ~30, 40

Atlantic Basin: ~6, ~12, ?

Work in progress; results to be published later this year

What is the effect of different climate scenarios on number and severity of tropical storms?


Second Observation about CSE

2. CSE makes most progress when applied mathematics and computer science are tightly integrated into the project

• Increasing computer power alone will not give us sufficient capability to solve most important problems

• Teraflop/s is necessary but not sufficient


Application in Combustion: Block-Structured AMR

(J. Bell and P. Colella, LBNL)

Each level is a union of rectangular patches

Each grid patch:
• Logically structured, rectangular
• Refined in space and time by evenly dividing coarse grid cells
• Dynamically created/destroyed to track time-dependent features
• In parallel, grids distributed based on work estimate

[Figure: Level 0, Level 1, Level 2]

Block-structured hierarchical grids (Berger and Colella, 1989)
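The patch properties listed above can be sketched in a few lines. This is a toy illustration, not the LBNL implementation: the `Patch` class, the refinement ratio of 2, the use of cell count as the work estimate, and the greedy largest-first assignment in `distribute` are all assumptions made for the example.

```python
import heapq
from dataclasses import dataclass


@dataclass(frozen=True)
class Patch:
    """A logically rectangular 2D grid patch, in cell indices of its level."""
    level: int
    lo: tuple  # inclusive lower corner (i, j)
    hi: tuple  # inclusive upper corner (i, j)

    def num_cells(self):
        return (self.hi[0] - self.lo[0] + 1) * (self.hi[1] - self.lo[1] + 1)

    def refine(self, ratio=2):
        """Evenly divide every coarse cell into ratio x ratio fine cells."""
        lo = tuple(c * ratio for c in self.lo)
        hi = tuple((c + 1) * ratio - 1 for c in self.hi)
        return Patch(self.level + 1, lo, hi)


def distribute(patches, num_ranks):
    """Greedy assignment: largest work estimate first, to the least loaded rank."""
    heap = [(0, rank) for rank in range(num_ranks)]
    heapq.heapify(heap)
    assignment = {rank: [] for rank in range(num_ranks)}
    for patch in sorted(patches, key=lambda p: p.num_cells(), reverse=True):
        load, rank = heapq.heappop(heap)
        assignment[rank].append(patch)
        heapq.heappush(heap, (load + patch.num_cells(), rank))
    return assignment


coarse = Patch(level=0, lo=(2, 3), hi=(5, 6))
fine = coarse.refine(ratio=2)
print(fine, fine.num_cells())
```

Refinement in time follows the same ratio: with a ratio of 2, a fine patch takes two time steps for each step of its parent level, so the work per patch grows with both its cell count and its level.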


Experiment and Simulation

Experiment by R. Cheng in LBNL combustion lab

Simulations by J. Bell and M. Day, LBNL, using NERSC


V-Flame Simulation Stats

• AMR stats

• Run on seaborg.nersc.gov, 256 CPUs, 2 steps/hr

• In 2004, the Berkeley Lab group is the only group capable of fully detailed simulations of laboratory-scale methane flames. Groups employing traditional simulation techniques are severely limited, even on vector-parallel supercomputers.


Third Observation about CSE

3. The most promising algorithms are a poor match for today’s most popular system architectures


SciDAC Algorithm Success Story

• A general sparse solver, Parallel SuperLU, developed at Berkeley Lab by Sherry Li, has been incorporated into NIMROD

• Improvement in NIMROD execution time by a factor of five to ten on the NERSC IBM SP. “This would be the equivalent of three to five years progress in computing hardware.”

• Sustained performance of sparse solvers on current architectures is less than 10% of peak
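The "three to five years" equivalence in the quote can be checked with a short calculation. The sketch assumes hardware performance doubles every 1.5 years, as in the Moore's Law slide earlier in the talk; the function name `equivalent_hardware_years` is hypothetical.

```python
import math


def equivalent_hardware_years(speedup, doubling_period_years=1.5):
    """Years of hardware progress needed to match an algorithmic speedup."""
    return doubling_period_years * math.log2(speedup)


low = equivalent_hardware_years(5)
high = equivalent_hardware_years(10)
print(f"5x speedup  ~ {low:.1f} years of hardware progress")
print(f"10x speedup ~ {high:.1f} years of hardware progress")
```

Under that assumption, a factor of five corresponds to about 3.5 years and a factor of ten to about 5 years, consistent with the range given in the quote.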


Near Term Science Breakthroughs Enabled by Computing

[Table columns: Science Area, Goals, Computational Methods, Breakthrough Target]

• Nanoscience
  – Goals: Simulate the synthesis and predict the properties of multi-component nanosystems
  – Computational methods: Quantum molecular dynamics; Quantum Monte Carlo; Iterative eigensolvers; Dense linear algebra; Parallel 3D FFTs
  – Breakthrough target: Simulate nanostructures with hundreds to thousands of atoms as well as transport and optical properties and other parameters

• Combustion
  – Goals: Predict combustion processes to provide efficient, clean and sustainable energy
  – Computational methods: Explicit finite difference; Implicit finite difference; Zero-dimensional physics; Adaptive mesh refinement; Lagrangian particle methods
  – Breakthrough target: Simulate laboratory scale flames with high fidelity representations of governing physical processes

• Fusion
  – Goals: Understand high-energy density plasmas and develop an integrated simulation of a fusion reactor
  – Computational methods: Multi-physics, multi-scale; Particle methods; Regular and irregular access; Nonlinear solvers; Adaptive mesh refinement
  – Breakthrough target: Simulate the ITER reactor

• Climate
  – Goals: Accurately detect and attribute climate change, predict future climate and engineer mitigation strategies
  – Computational methods: Finite difference methods; FFTs; Regular and irregular access; Simulation ensembles
  – Breakthrough target: Perform a full ocean/atmosphere climate model with 0.125 degree spacing, with an ensemble of 8-10 runs

• Astrophysics
  – Goals: Determine through simulations and analysis of observational data the origin, evolution and fate of the universe, the nature of matter and energy, galaxy and stellar evolutions
  – Computational methods: Multi-physics, multi-scale; Dense linear algebra; Parallel 3D FFTs; Spherical transforms; Particle methods; Adaptive mesh refinement
  – Breakthrough target: Simulate the explosion of a supernova with a full 3D model


Science Drives Architecture

State-of-the-art computational science requires increasingly diverse and complex algorithms

Only balanced systems that can perform well on a variety of problems will meet future scientists’ needs!

Data-parallel and scalar performance are both important

[Table columns: Science Areas | Multi-Physics and Multi-Scale | Dense Linear Algebra | FFTs | Particle Methods | AMR | Data Parallelism | Irregular Control Flow]

Nanoscience X X X X X X

Combustion X X X X X

Fusion X X X X X X

Climate X X X X X

Astrophysics X X X X X X X


New Science Presents New Architecture Challenges

Future high end computing requires an architecture capable of achieving high performance across a spectrum of key state-of-the-art applications

• Data parallel algorithms do well on machines with high memory bandwidth (vector or superscalar)

• Irregular control flow requires excellent scalar performance

• Spectral and other methods require high bisection bandwidth


Scalar Performance Increasingly Important

• Cannot use dense methods for largest systems because of N^3 algorithm scaling. Need to use sparse and adaptive methods with irregular control flow

• Complex microphysics results in complex inner loops

“It would be a major step backward to acquire a new platform that could reach the 100 Tflop level for only a few applications that had ‘clean’ microphysics. Increasingly realistic models usually mean increasingly complex microphysics. Complex microphysics is not amenable to [simple vector operations].”

– Doug Swesty, SUNY Stony Brook
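The N^3 scaling mentioned in the first bullet can be made concrete with a back-of-the-envelope estimate. The sketch uses the standard operation count of about (2/3)N^3 floating-point operations for a dense LU factorization and 8 bytes per matrix entry; the problem size and the sustained rate of 1 Tflop/s are assumptions chosen only for illustration.

```python
def dense_lu_flops(n):
    """Approximate floating-point operations for dense LU of an n x n matrix."""
    return (2.0 / 3.0) * n ** 3


def dense_storage_bytes(n, bytes_per_entry=8):
    """Memory needed to hold a dense n x n matrix of double-precision values."""
    return bytes_per_entry * n ** 2


n = 1_000_000                # assumed problem size
sustained_flops = 1e12       # assumed sustained rate: 1 Tflop/s

seconds = dense_lu_flops(n) / sustained_flops
print(f"time: {seconds / 86400:.1f} days")
print(f"memory: {dense_storage_bytes(n) / 1e12:.0f} TB")
print(f"cost ratio when n doubles: {dense_lu_flops(2 * n) / dense_lu_flops(n):.0f}x")
```

Doubling the problem size multiplies the dense cost by eight, so added hardware capability is quickly consumed, which is the reason the slide points to sparse and adaptive methods despite their irregular control flow.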


Overview




• What Supercomputers Still Can’t Do


Projected Performance Development

[Chart: TOP 500 Performance Projection, 1993 to 2015, logarithmic scale from 100 Mflop/s to 1 Eflop/s. Series: SUM, N=1, N=500. Annotations: DARPA HPCS, BlueGene/L.]

The Exponential Growth of Computing, 1900-1998

[Chart: labeled data points include Hollerith Tabulator, Bell Calculator Model 1, ENIAC, IBM 704, IBM 360 Model 75, Cray 1, Pentium II PC]

Adapted from Kurzweil, The Age of Spiritual Machines


The Exponential Growth of Computing, 1900-2100

Adapted from Kurzweil, The Age of Spiritual Machines


Growth of Computing Power and “Mental Power”

Hans Moravec, CACM 10, 2003, pp 90-97


Why this simplistic view is wrong

• Unsuitability of Current Architectures
  – Teraflop systems are focused on excelling in computing, which is only one of the six (or eight) dimensions of human intelligence

• Fundamental lack of mathematical models for cognitive processes
  – That’s why we are not using the most powerful computers today for cognitive tasks

• Complexity limits
  – We don’t even know yet how to model turbulence; how then do we model thought?


“The computer model turns out not to be helpful in explaining what people actually do when they think and perceive”

Hubert Dreyfus, p. 189

Example: one of the biggest success stories of machine intelligence, the chess computer “Deep Blue”, did not teach us anything about how a chess grandmaster thinks.


Six Dimensions of Intelligence

1. Verbal-Linguistic: ability to think in words and to use language to express and appreciate complex concepts

2. Logical-Mathematical: makes it possible to calculate, quantify, consider propositions and hypotheses, and carry out complex mathematical operations

3. Spatial: capacity to think and orient oneself in a physical three-dimensional environment

4. Bodily-Kinesthetic: ability to manipulate objects and fine-tune physical skills

5. Musical: sensitivity to pitch, melody, rhythm, and tone

6. Interpersonal: capacity to understand and interact effectively with others

Howard Gardner. Frames of Mind: The Theory of Multiple Intelligences. New York: Basic Books, 1983, 1993.


Current State of Supercomputers


Retina to Visual Cortex Mapping

http://cgl.elte.hu/~racz/santafe.html


Building New Models

• About 1/3 of the human brain is probably dedicated to processing visual information

• We have only very rudimentary knowledge of the principles of human vision computing

• Research project by Don Glaser at UC Berkeley investigates mapping from retina to visual cortex

• Attempt to model “optical illusions” and simple movement of objects in the visual cortex

• Current models limited to about 10^5 neurons

• Project at NERSC in 2005


Fourth Observation about CSE

4. There are vast areas of science and engineering where CSE has not even begun to make an impact

– current list of CSE applications is almost the same as 15 years ago

– current set of architectures is capturing only a small subset of human cognitive abilities

– in many scientific areas there is still an almost complete absence of computational models

See also: Y. Deng, J. Glimm, and D. H. Sharp, Perspectives on Parallel Computing, Daedalus Vol. 121 (1992) 31-52.


Major Application Areas of CSE

• Science
  – Global climate modeling
  – Astrophysical modeling
  – Biology: genomics, protein folding, drug design
  – Computational chemistry
  – Computational material sciences and nanosciences

• Engineering
  – Crash simulation
  – Semiconductor design
  – Earthquake and structural modeling
  – Computational fluid dynamics
  – Combustion

• Business
  – Financial and economic modeling
  – Transaction processing, web services, and search engines

• Defense
  – Nuclear weapons: test by simulations
  – Cryptography

This list from 2004 is identical to a list from 1992!


Conclusions

• CSE has become well established in the US and is at the threshold of enabling significant scientific breakthroughs

• CSE permits us to ask new scientific questions

• CSE makes most progress when applied mathematics and computer science are tightly integrated

• CSE has tremendous research opportunities for computer scientists and applied mathematicians

• The most promising algorithms are a poor match for today’s most popular system architectures

• There are vast areas of science and engineering where computational modeling has not even begun to make an impact (e.g. cognitive computing)

