a case study of communication optimizations on 3d ...charm.cs.illinois.edu/newpapers/09-24/talk.pdfa...
TRANSCRIPT
A CASE STUDY OF COMMUNICATION OPTIMIZATIONS ON 3D MESH INTERCONNECTS
University of Illinois at Urbana-Champaign
Abhinav Bhatele, Eric Bohm, Laxmikant V. KaleParallel Programming Laboratory
Euro-Par 2009
Outline
MotivationSolution: Mapping of OpenAtomPerformance BenefitsBigger Picture:
Resources NeededHeuristic Solutions
Automatic Mapping
August 27th, 2009
2
Abhinav Bhatele @ Euro-Par 2009
OpenAtom
Ab-Initio Molecular Dynamics codeConsider electrostatic interactions between the nuclei and electronsCalculate different energy termsDivided into different phases with lot of communication
August 27th, 2009
3
Abhinav Bhatele @ Euro-Par 2009
OpenAtom on Blue Gene/L
August 27th, 2009Abhinav Bhatele @ Euro-Par 2009
4
0
0.05
0.1
0.15
0.2
0.25
0.3
512 1024 2048 4096 8192
Tim
e pe
r ste
p (s
ecs)
No. of cores
w32 Default
Runs on Blue Gene/L at IBM T J Watson Research Center, CO mode
w32 = 32 water molecules with 70 Ry cutoff
The problem lies in …
August 27th, 2009Abhinav Bhatele @ Euro-Par 2009
5
Performance Analysis and Visualization Tool: Projections (part of Charm++) – Timeline View
Solution –
August 27th, 2009Abhinav Bhatele @ Euro-Par 2009
6
Topology Aware Mapping
Processor Virtualization
User View System View
Programmer: Decomposes the computation into objects
Runtime: Maps the computation on to the processors
August 27th, 2009
7
Abhinav Bhatele @ Euro-Par 2009
Benefits of Charm++
August 27th, 2009Abhinav Bhatele @ Euro-Par 2009
8
Computation is divided into objects/chares/virtual processors (VPs)Separates decomposition from mappingVPs can be flexibly mapped to actual physical processors (PEs)
Topology Manager API†
The application needs information such asDimensions of the partitionRank to physical co-ordinates and vice-versa
TopoManager: a uniform APIOn BG/L and BG/P: provides a wrapper for system callsOn XT3/4/5, there are no such system callsProvides a clean and uniform interface to the application
August 27th, 2009Abhinav Bhatele @ Euro-Par 2009
9
† http://charm.cs.uiuc.edu/~bhatele/phd/topomgr.htm
Parallelization using Charm++
August 27th, 2009
10
Eric Bohm, Glenn J. Martyna, Abhinav Bhatele, Sameer Kumar, Laxmikant V. Kale, John A. Gunnels, and Mark E. Tuckerman. Fine Grained Parallelization of the Car-Parrinello ab initio MD Method on Blue Gene/L. IBM J. of R. and D.: Applications of Massively Parallel Systems, 52(1/2):159-174, 2008.
Abhinav Bhatele @ Euro-Par 2009
Mapping Challenge
August 27th, 2009Abhinav Bhatele @ Euro-Par 2009
11
Load Balancing: Multiple VPs per PEMultiple groups of communicating objects
Intra-group communicationInter-group communication
Conflicting communication requirements
Topology Mapping of Chare Arrays
August 27th, 2009
12
RealSpace and GSpace have state-wisecommunication Paircalculator and
GSpace have plane-wisecommunication
Abhinav Bhatele @ Euro-Par 2009
Performance Improvements on BG/L
August 27th, 2009Abhinav Bhatele @ Euro-Par 2009
13
0
0.05
0.1
0.15
0.2
0.25
0.3
512 1024 2048 4096 8192
Tim
e pe
r ste
p (s
ecs)
No. of cores
w32 Defaultw32 Topology
Runs on Blue Gene/L at IBM T J Watson Research Center, CO mode, Year: 2006
w32 = 32 water molecules with 70 Ry cutoff
Improved Timeline Views
August 27th, 2009Abhinav Bhatele @ Euro-Par 2009
14
Results on Blue Gene/L
August 27th, 2009Abhinav Bhatele @ Euro-Par 2009
15
0
5
10
15
20
25
1024 2048 4096 8192 16384
Tim
e pe
r ste
p (s
ecs)
No. of cores
w256 Defaultw256 TopologyGST_BIG DefaultGST_BIG Topology
GST_BIG = 64 Ge, 128 Sb and 256 Te molecules with 20 Ry cutoff
Runs on Blue Gene/L at IBM T J Watson Research Center, CO mode
Results on Blue Gene/P
August 27th, 2009Abhinav Bhatele @ Euro-Par 2009
16
0
2
4
6
8
10
12
1024 2048 4096 8192
Tim
e pe
r ste
p (s
ecs)
No. of cores
w256 Defaultw256 Topology
w256 = 256 water molecules with 70 Ry cutoff
Runs on Blue Gene/P at Argonne National Laboratory, VN mode
Results on Cray XT3
August 27th, 2009Abhinav Bhatele @ Euro-Par 2009
17
2
3
4
5
6
7
8
512 1024 2048
Tim
e pe
r ste
p (s
ecs)
No. of cores
w256 Defaultw256 TopologyGST_BIG DefaultGST_BIG Topology
Runs on Cray XT3 (Bigben) at Pittsburgh Supercomputing Center, VN mode(with system reservation to obtain complete 3d mesh shapes)
Performance Analysis
August 27th, 2009Abhinav Bhatele @ Euro-Par 2009
18
0
500000
1000000
1500000
2000000
2500000
3000000
1024 2048 4096 8192
Idle
Tim
e (s
ecs)
No. of cores
DefaultTopology
Performance Analysis and Visualization Tool: Projections – Idle time added across all processors
w256M_70Ry on Blue Gene/L
Reduction in Communication Volume
August 27th, 2009Abhinav Bhatele @ Euro-Par 2009
19
0
200
400
600
800
1000
1200
1024 2048 4096 8192
Band
wid
th (
GB)
No. of cores
DefaultTopology
Data obtained from Blue Gene/P’s Uniform Performance Counters
w256M_70Ry on Blue Gene/P
Relative Performance Improvement
August 27th, 2009Abhinav Bhatele @ Euro-Par 2009
20
0
0.5
1
1.5
2
2.5
512 1024 2048 4096 8192
% Im
prov
emen
t
No. of cores
Cray XT3Blue Gene/LBlue Gene/P
w256M_70Ry
Bigger picture
August 27th, 2009Abhinav Bhatele @ Euro-Par 2009
21
Different kinds of applications:Computation boundCommunication bound
Latency tolerantLatency sensitive
Technique:Obtain processor topology and application communication graphHeuristic Techniques for mapping
Why does distance affect message latencies?
Consider a 3D mesh/torus interconnect
Message latencies can be modeled by
(Lf/B) x D + L/B
Lf = length of flit, B = bandwidth,
D = hops, L = message size
When (Lf * D) << L, first term is negligible
August 27th, 2009Abhinav Bhatele @ Euro-Par 2009
22
But in presence of contention …
Automatic Topology Aware Mapping
August 27th, 2009Abhinav Bhatele @ Euro-Par 2009
23
Many MPI applications exhibit a simple two-dimensional near-neighbor communication patternExamples: MILC, WRF, POP, Stencil, …
Acknowledgements:Shawn Brown and Chad Vizino (PSC)Glenn Martyna, Sameer Kumar, Fred Mintzer (IBM)Teragrid for running time on Bigben (XT3)ANL for running time on Blue Gene/P
DOE Grant B341494 (CSAR), DOE Grant DE-FG05-08OR23332 (ORNL LCF) and NSF Grant ITR 0121357
August 27th, 2009Abhinav Bhatele @ Euro-Par 2009
Funding
E-mail: [email protected]: http://charm.cs.illinois.edu