Trends in Computing Architecture
CMSC828E
Ramani Duraiswami
Several slides taken from a Microway/NVIDIA webinar
Some figures adapted from web sources
Problem sizes in simulation and data processing are increasing
• Change in paradigm in science
– Simulate then test
– Fidelity demands larger simulations
– Problems being simulated are also much more
• Sensors are getting varied and cheaper; and storage is getting cheaper
– Cameras, microphones
• Other large data
– Text (all the newspapers, books, technical papers)
– Genome data
– Medical/biological data (X-Ray, PET, MRI, Ultrasound, Electron microscopy …)
– Climate (Temperature, Salinity, Pressure, Wind, Oxygen content, …)
Ways to attack problem size growth
• Faster algorithms with better asymptotic complexity
• Faster processors
– “Moore’s law will take care of it”
• Go parallel!
– Clusters of computers
– New data parallel chips (multicore processors, GPUs)
“Moore’s Law will take care of it”
• Not a law but an observation by Gordon Moore in the 1960s
• Number of transistors on a chip doubles roughly every 18 months (the commonly quoted figure; Moore’s own estimates were one year in 1965, revised to two years in 1975)
• Commonly taken to mean that the performance of the “standard computer” improves exponentially, with a doubling time of 18 months
Refuting the Moore’s law argument
• Argument:
– Moore’s law: processor speed doubles every 18 months
– If we wait long enough, the computer will get fast enough to let my inefficient algorithm tackle the problem
• Is this true?
– Yes, for algorithms with linear asymptotic complexity
– No! For algorithms with higher asymptotic complexity
– Most scientific algorithms are O(N^2) or O(N^3)
– For a million variables, we would need about 16 generations of Moore’s law before an O(N^2) algorithm was comparable with an O(N) algorithm
• Did no one tell you that Moore’s law is dead?
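The generation count in the argument above can be checked with a short calculation. This is a minimal sketch, assuming both algorithms have equal constant factors and that each "generation" exactly doubles speed; the function name and the constant below are illustrative, not from the lecture. Under those assumptions the gap for N = 10^6 is a factor of N, which needs log2(N), roughly 20, doublings. That is the same order as the slide's figure of about 16, which would follow if the O(N) algorithm carried a constant factor about 16 times larger (an assumption here, not something the slide states).

```python
import math

# Doubling time quoted on the slide (18 months), expressed in years.
DOUBLING_TIME_YEARS = 1.5


def generations_to_close_gap(n: int, slow_exponent: int, fast_exponent: int = 1) -> float:
    """Speed doublings needed before an O(n^slow_exponent) algorithm takes
    as long as an O(n^fast_exponent) algorithm takes today.

    Assumes equal constant factors for both algorithms (illustrative only).
    """
    cost_ratio = n ** (slow_exponent - fast_exponent)
    return math.log2(cost_ratio)


if __name__ == "__main__":
    n = 1_000_000  # "a million variables"
    for exponent in (2, 3):
        g = generations_to_close_gap(n, exponent)
        years = g * DOUBLING_TIME_YEARS
        print(f"O(N^{exponent}) vs O(N): {g:.1f} doublings, about {years:.0f} years")
```

Either way the wait is measured in decades, and meanwhile N itself keeps growing, so a better algorithm wins long before the hardware catches up.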
Moore’s Law is dead:
“Issues at small scales”
- Lithography not possible
- 2D electrostatics harder to control
- “Parasitic resistance” degrades performance
- Device-to-device variations will be larger
- Ultra-thin bodies and hyper-abrupt junctions make manufacturing difficult
Moore’s Law is dead!
• Feature sizes and clock speeds on commodity chips have been stagnant over the past 4 years
– ~3 GHz and 45 nm
• All manufacturers are going with multicore to maintain performance
– Core 2, Core 2 Duo, quad-core, …
• Shared memory multiprocessing
– Intel has demonstrated several many-core systems
• Graphics processors and gaming consoles have already been on the multicore path for a decade!
Sony Playstation 3
2.18 teraflops <$400
Difficult to program
Microsoft X-Box 360
1.04 teraflops <$300
Difficult to program
Gamer Power
GeForce 8800 GTX
Multicore Intel box with 3 GPUs in slots
~1 teraflop for < $3000 (shown with 1 GPU)
Programming on the GPU
• GPU organized as groups of multiprocessors (8 relatively slow processors each) with a small amount of their own memory and access to common shared memory
• Factor of 100s difference in speed as one goes up the memory hierarchy
• To achieve gains, problems must fit the GPU programming paradigm and manage memory carefully
• Fortunately many practically important tasks do map well, and work is under way on converting others
– Image and audio processing
– Some types of linear algebra cores
– Many machine learning algorithms
• Research issues:
– Identifying important tasks and mapping them to the architecture
– Making it convenient for programmers to call GPU code from host code
Memory hierarchy (figure): local memory ~50 kB; GPU shared memory ~1 GB; host memory ~2-32 GB
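The "fit the paradigm and manage memory" point can be illustrated with a tiled reduction: data is copied in bulk from the large, slow memory into a small, fast buffer, worked on there by a group of processors, and only a small result is written back. The sketch below is a serial Python emulation of that access pattern, not GPU code. `TILE_SIZE` and both function names are hypothetical, and only the group size of 8 comes from the slide.

```python
from typing import Sequence

PROCESSORS_PER_GROUP = 8  # the slide's "8 relatively slow processors"
TILE_SIZE = 256           # hypothetical tile size, chosen to fit in local memory


def tile_sum(tile: Sequence[float]) -> float:
    """Emulate one multiprocessor group reducing a tile held in local memory.

    Each of the 8 processors sums a strided slice of the tile; the 8 partial
    sums are then combined. On real hardware the slices run concurrently.
    """
    partials = [sum(tile[p::PROCESSORS_PER_GROUP]) for p in range(PROCESSORS_PER_GROUP)]
    return sum(partials)


def tiled_sum(data: Sequence[float]) -> float:
    """Sum `data` tile by tile, touching the large memory once per element."""
    total = 0
    for start in range(0, len(data), TILE_SIZE):
        # One bulk copy from the large shared memory into the small local buffer.
        local = list(data[start:start + TILE_SIZE])
        total += tile_sum(local)
    return total


if __name__ == "__main__":
    print(tiled_sum(list(range(1000))))
```

The design point is that every element crosses the slow boundary exactly once and all repeated access happens in the small buffer, which is what makes the factor-of-100s speed gap between memory levels tolerable.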
4 cores
What is GPU Computing?
Computing with CPU + GPU
Heterogeneous Computing
Not 2x or 3x: speedups are 20x to 150x
• 146X: Medical imaging (U of Utah)
• 36X: Molecular dynamics (U of Illinois, Urbana)
• 18X: Video transcoding (Elemental Tech)
• 50X: Matlab computing (AccelerEyes)
• 100X: Astrophysics (RIKEN)
• 149X: Financial simulation (Oxford)
• 47X: Linear algebra (Universidad Jaime)
• 20X: 3D ultrasound (Techniscan)
• 130X: Quantum chemistry (U of Illinois, Urbana)
• 30X: Gene sequencing (U of Maryland)
Accelerating Time to Discovery
Chart values, CPU only vs. with GPU: 4.6 days vs. 27 minutes; 2.7 days vs. 30 minutes; 8 hours vs. 13 minutes; 3 hours vs. 16 minutes
Source: Stone, Phillips, Hardy, Schulten
Molecular Dynamics
Available MD software
NAMD / VMD (alpha release)
HOOMD
ACE-MD
MD-GPU
Ongoing work
LAMMPS
CHARMM
GROMACS
AMBER
Source: Anderson, Lorenz, Travesset
Quantum Chemistry
Source: Ufimtsev, Martinez
Source: Yasuda
Available MD software
NAMD / VMD (alpha release)
HOOMD
ACE-MD
Ongoing work
LAMMPS
CHARMM
Q-Chem
Gaussian
GAMESS
Computational Fluid Dynamics (CFD)
Source: Thibault, Senocak
Source: Tolke, Krafczyk
Ongoing work
Navier-Stokes
Lattice Boltzmann
3D Euler Solver
Weather and ocean modeling
Electromagnetics / Electrodynamics
FDTD acceleration using GPUs
Source: Acceleware
FDTD Solvers
Acceleware
EM Photonics
CUDA Tutorial
Ongoing work
Maxwell equation solver
Ring Oscillator (FDTD)
Particle beam dynamics simulator
Weather, Atmospheric, & Ocean Modeling
Source: Michalakes, Vachharajani
CUDA-accelerated WRF available
Other kernels in WRF being ported
Ongoing work
Tsunami modeling
Ocean modeling
Several CFD codes
Source: Matsuoka, Akiyama, et al
Computational Finance
Source: CUDA SDK
Financial Computing Software vendors
SciComp: Derivatives pricing modeling
Hanweck: Options pricing & risk analysis
Aqumin: 3D visualization of market data
Exegy: High-volume Tickers & Risk Analysis
QuantCatalyst: Pricing & Hedging Engine
Oneye: Algorithmic Trading
Arbitragis Trading: Trinomial Options Pricing
Ongoing work
LIBOR Monte Carlo market model
Callable Swaps and Continuous Time
Source: SciComp