![Page 1: High Performance Computing An overview Alan Edelman Massachusetts Institute of Technology Applied Mathematics & Computer Science and AI Labs (Interactive](https://reader036.vdocument.in/reader036/viewer/2022070415/5697bfed1a28abf838cb903b/html5/thumbnails/1.jpg)
High Performance Computing: An Overview
Alan Edelman
Massachusetts Institute of Technology
Applied Mathematics & Computer Science and AI Labs
(Interactive Supercomputing, Chief Science Officer)
![Page 2: High Performance Computing An overview Alan Edelman Massachusetts Institute of Technology Applied Mathematics & Computer Science and AI Labs (Interactive](https://reader036.vdocument.in/reader036/viewer/2022070415/5697bfed1a28abf838cb903b/html5/thumbnails/2.jpg)
Not said: many powerful computer owners prefer low profiles
![Page 3: High Performance Computing An overview Alan Edelman Massachusetts Institute of Technology Applied Mathematics & Computer Science and AI Labs (Interactive](https://reader036.vdocument.in/reader036/viewer/2022070415/5697bfed1a28abf838cb903b/html5/thumbnails/3.jpg)
Some historical machines
![Page 4: High Performance Computing An overview Alan Edelman Massachusetts Institute of Technology Applied Mathematics & Computer Science and AI Labs (Interactive](https://reader036.vdocument.in/reader036/viewer/2022070415/5697bfed1a28abf838cb903b/html5/thumbnails/4.jpg)
Earth Simulator was #1, now #30
![Page 5: High Performance Computing An overview Alan Edelman Massachusetts Institute of Technology Applied Mathematics & Computer Science and AI Labs (Interactive](https://reader036.vdocument.in/reader036/viewer/2022070415/5697bfed1a28abf838cb903b/html5/thumbnails/5.jpg)
Moore’s Law
• The number of people who point out that “Moore’s Law” is dead is doubling every year.
• Feb 2008: NSF requests $20M for "Science and Engineering Beyond Moore's Law"
– Ten years out, Moore’s law itself may be dead
• Moore’s law has various forms and versions never stated by Moore, but roughly doubling every 18 months to 2 years
– Number of transistors
– Computational Power
– Parallelism!
Still good for a while!
At Risk!
![Page 6: High Performance Computing An overview Alan Edelman Massachusetts Institute of Technology Applied Mathematics & Computer Science and AI Labs (Interactive](https://reader036.vdocument.in/reader036/viewer/2022070415/5697bfed1a28abf838cb903b/html5/thumbnails/6.jpg)
AMD Opteron quad-core 8350, Sept 2007
Eight core in 2009?
2.0? 2.0?
![Page 7: High Performance Computing An overview Alan Edelman Massachusetts Institute of Technology Applied Mathematics & Computer Science and AI Labs (Interactive](https://reader036.vdocument.in/reader036/viewer/2022070415/5697bfed1a28abf838cb903b/html5/thumbnails/7.jpg)
Intel Clovertown and Dunnington
Six Core: Later in 2008?
![Page 8: High Performance Computing An overview Alan Edelman Massachusetts Institute of Technology Applied Mathematics & Computer Science and AI Labs (Interactive](https://reader036.vdocument.in/reader036/viewer/2022070415/5697bfed1a28abf838cb903b/html5/thumbnails/8.jpg)
Sun Niagara 2
[Block diagram labels: 8 × MT UltraSparc core, each with FPU and 8K D$; Crossbar Switch, 179 GB/s (fill), 90 GB/s (writethru); 4MB Shared L2 (16 way); 4x128b FBDIMM memory controllers; Fully Buffered DRAM, 42.7 GB/s (read), 21.3 GB/s (write)]
1.4 GHz
16 core in 2008?
![Page 9: High Performance Computing An overview Alan Edelman Massachusetts Institute of Technology Applied Mathematics & Computer Science and AI Labs (Interactive](https://reader036.vdocument.in/reader036/viewer/2022070415/5697bfed1a28abf838cb903b/html5/thumbnails/9.jpg)
Accelerators
IBM Cell Blade
[Block diagram, two Cell chips. Labels per chip: PPE 512K L2; 8 × SPE 256K, each with MFC; EIB (Ring Network); BIF; XDR; XDR DRAM 25.6 GB/s. Link label: <20 GB/s each direction]
NVIDIA
[Block diagram labels: Global Thread Scheduler; 8 × (2 SMs, 16K each; Address units; 8KB L1 const$, tex$); Crossbar?? Ring??; 128KB L2 const$ & texture$ (shared across SMs); DRAM controllers (6 x 64b); 86.4 GB/s; 768MB GDDR3 Device DRAM]
![Page 10: High Performance Computing An overview Alan Edelman Massachusetts Institute of Technology Applied Mathematics & Computer Science and AI Labs (Interactive](https://reader036.vdocument.in/reader036/viewer/2022070415/5697bfed1a28abf838cb903b/html5/thumbnails/10.jpg)
SiCortex
• Teraflops from Milliwatts
![Page 11: High Performance Computing An overview Alan Edelman Massachusetts Institute of Technology Applied Mathematics & Computer Science and AI Labs (Interactive](https://reader036.vdocument.in/reader036/viewer/2022070415/5697bfed1a28abf838cb903b/html5/thumbnails/11.jpg)
Software
Give me software leverage and a supercomputer, and I shall solve the world’s problems
(apologies to) Archimedes
![Page 12: High Performance Computing An overview Alan Edelman Massachusetts Institute of Technology Applied Mathematics & Computer Science and AI Labs (Interactive](https://reader036.vdocument.in/reader036/viewer/2022070415/5697bfed1a28abf838cb903b/html5/thumbnails/12.jpg)
What’s wrong with this story?
• I can’t get my five-year-old son off my (serial) computer
• I have access to the world’s fastest machines and have nothing cool to show him!
![Page 13: High Performance Computing An overview Alan Edelman Massachusetts Institute of Technology Applied Mathematics & Computer Science and AI Labs (Interactive](https://reader036.vdocument.in/reader036/viewer/2022070415/5697bfed1a28abf838cb903b/html5/thumbnails/13.jpg)
Engineers and Scientists (The leading indicators)
• Mostly work in serial (still!) (Just like my 5 year old)
• Those working in parallel: go to conferences, show off speedups
• Software: MPI (Message Passing Interface)
– Really thought of as the only choice
– Some say the assembler of parallel computing
– Some say has allowed code to be portable
– Others say has held back progress and performance
![Page 14: High Performance Computing An overview Alan Edelman Massachusetts Institute of Technology Applied Mathematics & Computer Science and AI Labs (Interactive](https://reader036.vdocument.in/reader036/viewer/2022070415/5697bfed1a28abf838cb903b/html5/thumbnails/14.jpg)
Old Homework (emphasized for effect)
• Download a parallel program from somewhere.
– Make it work
• Download another parallel program
– Now, …, make them work together!
![Page 15: High Performance Computing An overview Alan Edelman Massachusetts Institute of Technology Applied Mathematics & Computer Science and AI Labs (Interactive](https://reader036.vdocument.in/reader036/viewer/2022070415/5697bfed1a28abf838cb903b/html5/thumbnails/15.jpg)
Apples and Oranges
• A: row distributed array (or worse)
• B: column distributed array (or worse)
• C=A+B
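The mismatch above can be sketched in a few lines of plain Python. This is only an illustration, not code from the talk: the "processors" are simulated as list entries, and the helper names (`distribute_rows`, `distribute_cols`, `cols_to_rows`, `add_row_distributed`) are hypothetical. The point is that A and B cannot be added locally until one of them is redistributed, which on a real machine is an all-to-all communication step.

```python
# Hypothetical sketch: each "processor" owns one block of a matrix (list of lists).

def distribute_rows(M, p):
    """Split M into p contiguous row blocks, one per simulated processor."""
    n = len(M)
    bounds = [n * k // p for k in range(p + 1)]
    return [M[bounds[k]:bounds[k + 1]] for k in range(p)]

def distribute_cols(M, p):
    """Split M into p contiguous column blocks, one per simulated processor."""
    m = len(M[0])
    bounds = [m * k // p for k in range(p + 1)]
    return [[row[bounds[k]:bounds[k + 1]] for row in M] for k in range(p)]

def cols_to_rows(col_blocks, p):
    """Redistribute column blocks into row blocks.

    Stands in for the all-to-all data movement a real machine would need.
    """
    n = len(col_blocks[0])
    # Rebuild each full row from its pieces, then hand out row blocks again.
    full = [sum((blk[i] for blk in col_blocks), []) for i in range(n)]
    return distribute_rows(full, p)

def add_row_distributed(A_blocks, B_blocks):
    """Purely local add: each processor adds the rows it owns."""
    return [[[a + b for a, b in zip(ra, rb)] for ra, rb in zip(Ab, Bb)]
            for Ab, Bb in zip(A_blocks, B_blocks)]

P = 2
A = [[i * 4 + j for j in range(4)] for i in range(4)]
B = [[10 * (i * 4 + j) for j in range(4)] for i in range(4)]

A_rows = distribute_rows(A, P)       # A: row distributed
B_cols = distribute_cols(B, P)       # B: column distributed
B_rows = cols_to_rows(B_cols, P)     # extra step forced by the mismatch
C_rows = add_row_distributed(A_rows, B_rows)
C = [row for block in C_rows for row in block]
```

The addition itself is trivial; all of the extra code exists only to reconcile the two layouts.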
![Page 16: High Performance Computing An overview Alan Edelman Massachusetts Institute of Technology Applied Mathematics & Computer Science and AI Labs (Interactive](https://reader036.vdocument.in/reader036/viewer/2022070415/5697bfed1a28abf838cb903b/html5/thumbnails/16.jpg)
MPI Performance vs PThreads
Professional Performance Study by Sam Williams
[Figure: two bar charts, Intel Clovertown and AMD Opteron. Vertical axis: GFlop/s, 0.0 to 4.0. Horizontal axis: Dense, Protein, FEM-Sphr, FEM-Cant, Tunnel, FEM-Har, QCD, FEM-Ship, Econom, Epidem, FEM-Accel, Circuit, Webbase, LP, Median. Series: Naïve Single Thread, MPI (autotuned), Pthreads (autotuned)]
MPI may introduce speed bumps on current architectures
![Page 17: High Performance Computing An overview Alan Edelman Massachusetts Institute of Technology Applied Mathematics & Computer Science and AI Labs (Interactive](https://reader036.vdocument.in/reader036/viewer/2022070415/5697bfed1a28abf838cb903b/html5/thumbnails/17.jpg)
MPI Based Libraries
Typical sentence: … we enjoy using parallel computing libraries such as Scalapack
• What else? … you know, such as scalapack
• And …? Well, there is scalapack
• (petsc, superlu, mumps, trilinos, …)
• Very few users, still many bugs, immature
• Highly Optimized Libraries? Yes and No
![Page 18: High Performance Computing An overview Alan Edelman Massachusetts Institute of Technology Applied Mathematics & Computer Science and AI Labs (Interactive](https://reader036.vdocument.in/reader036/viewer/2022070415/5697bfed1a28abf838cb903b/html5/thumbnails/18.jpg)
Natural Question may not be the most important
• How do I parallelize x?
– First question many students ask
– Answer often either one of
  • Fairly obvious
  • Very difficult
– Can miss the true issues of high performance
  • These days people are often good at exploiting locality for performance
  • People are not very good about hiding communication and anticipating data movement to avoid bottlenecks
  • People are not very good about interweaving multiple functions to make the best use of resources
– Usually misses the issue of interoperability
  • Will my program play nicely with your program?
  • Will my program really run on your machine?
![Page 19: High Performance Computing An overview Alan Edelman Massachusetts Institute of Technology Applied Mathematics & Computer Science and AI Labs (Interactive](https://reader036.vdocument.in/reader036/viewer/2022070415/5697bfed1a28abf838cb903b/html5/thumbnails/19.jpg)
Real Computations have Dependencies (example FFT)
Time wasted on the telephone
![Page 20: High Performance Computing An overview Alan Edelman Massachusetts Institute of Technology Applied Mathematics & Computer Science and AI Labs (Interactive](https://reader036.vdocument.in/reader036/viewer/2022070415/5697bfed1a28abf838cb903b/html5/thumbnails/20.jpg)
Modern Approaches
• Allow users to “wrap up” computations into nice packages often denoted threads
• Express dependencies among threads
• Threads need not be bound to a processor
• Not really new at all: see Arvind, Dataflow, etc.
• Industry has not yet caught up with the damage SPMD and MPI have done
• See Transactional Memories, Streaming Languages, etc.
Advantages
• Easier on Programmer
• More productivity
• Allows for autotuning
• Can Overlap Communication with Computation
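A minimal sketch of the idea, assuming Python's standard `concurrent.futures` as a stand-in for a real dataflow runtime: computations are wrapped as named tasks, dependencies are declared explicitly, and a small scheduler starts each task as soon as its inputs are ready, on whichever worker is free. The function `run_graph` and the task names are hypothetical, chosen only for illustration.

```python
from concurrent.futures import ThreadPoolExecutor, wait, FIRST_COMPLETED

def run_graph(tasks, deps, workers=4):
    """Run named tasks while respecting declared dependencies.

    tasks: name -> callable taking the results of its dependencies, in order
    deps:  name -> list of names that must finish first (missing = no deps)
    """
    results = {}            # name -> finished result
    pending = {}            # future -> name, for tasks currently running
    remaining = set(tasks)  # tasks not yet launched
    with ThreadPoolExecutor(max_workers=workers) as pool:
        while remaining or pending:
            # Launch every task whose dependencies have all finished.
            ready = [n for n in remaining
                     if all(d in results for d in deps.get(n, []))]
            for n in ready:
                args = [results[d] for d in deps.get(n, [])]
                pending[pool.submit(tasks[n], *args)] = n
                remaining.discard(n)
            if not pending:
                raise ValueError("dependency cycle or unknown dependency")
            # Block only until the next task finishes, then loop again.
            done, _ = wait(list(pending), return_when=FIRST_COMPLETED)
            for fut in done:
                results[pending.pop(fut)] = fut.result()
    return results

# Two independent loads can overlap; "combine" waits for both branches.
tasks = {
    "load_a": lambda: 2,
    "load_b": lambda: 3,
    "mul": lambda a, b: a * b,
    "add": lambda a, b: a + b,
    "combine": lambda m, s: m - s,
}
deps = {
    "mul": ["load_a", "load_b"],
    "add": ["load_a", "load_b"],
    "combine": ["mul", "add"],
}
out = run_graph(tasks, deps)
```

Nothing in the task definitions says which worker runs what. That placement decision belongs to the scheduler, which is what leaves room for autotuning and for overlapping communication with computation.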
![Page 21: High Performance Computing An overview Alan Edelman Massachusetts Institute of Technology Applied Mathematics & Computer Science and AI Labs (Interactive](https://reader036.vdocument.in/reader036/viewer/2022070415/5697bfed1a28abf838cb903b/html5/thumbnails/21.jpg)
LU Example
![Page 22: High Performance Computing An overview Alan Edelman Massachusetts Institute of Technology Applied Mathematics & Computer Science and AI Labs (Interactive](https://reader036.vdocument.in/reader036/viewer/2022070415/5697bfed1a28abf838cb903b/html5/thumbnails/22.jpg)
Software
Give me software leverage and a supercomputer, and I shall solve the world’s problems
(apologies to) Archimedes
![Page 23: High Performance Computing An overview Alan Edelman Massachusetts Institute of Technology Applied Mathematics & Computer Science and AI Labs (Interactive](https://reader036.vdocument.in/reader036/viewer/2022070415/5697bfed1a28abf838cb903b/html5/thumbnails/23.jpg)
New Standards for Quality of Computation
• Associative Law: (a+b)+c = a+(b+c)
• Not true in roundoff
• Mostly didn’t matter in serial
• Parallel computation reorganizes computation
• Lawyers get very upset!
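A small illustration in Python, assuming IEEE double precision: the same four numbers are summed left to right (the serial order) and then in two chunks whose partial sums are combined, as a two-processor reduction might do. The helper names `serial_sum` and `chunked_sum` are hypothetical. An explicit loop is used rather than the built-in `sum`, because some Python versions compensate for roundoff inside `sum`.

```python
import math

def serial_sum(values):
    """Left-to-right summation, the order a serial loop would use."""
    total = 0.0
    for v in values:
        total += v
    return total

def chunked_sum(values, p):
    """Sum p contiguous chunks separately, then combine the partial sums.

    Mimics how a parallel reduction reorganizes the additions.
    """
    n = len(values)
    bounds = [n * k // p for k in range(p + 1)]
    partials = [serial_sum(values[bounds[k]:bounds[k + 1]]) for k in range(p)]
    return serial_sum(partials)

a, b, c = 0.1, 0.2, 0.3
left = (a + b) + c
right = a + (b + c)
print(left == right)            # the two groupings disagree in the last bit

values = [1e16, 1.0, -1e16, 1.0]
print(serial_sum(values))       # serial order
print(chunked_sum(values, 2))   # two-"processor" order
print(math.fsum(values))        # correctly rounded reference
```

Both orderings are legitimate evaluations of the same mathematical sum, yet they return different floating-point answers, and neither matches the correctly rounded result from `math.fsum`.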