a case study of communication optimizations on 3d ...charm.cs.illinois.edu/newpapers/09-24/talk.pdfa...

24
A CASE STUDY OF COMMUNICATION OPTIMIZATIONS ON 3D MESH INTERCONNECTS University of Illinois at Urbana-Champaign Abhinav Bhatele, Eric Bohm, Laxmikant V. Kale Parallel Programming Laboratory Euro-Par 2009

Upload: others

Post on 09-Jul-2020

3 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: A Case Study of Communication Optimizations on 3D ...charm.cs.illinois.edu/newPapers/09-24/talk.pdfA CASE STUDY OF COMMUNICATION OPTIMIZATIONS ON 3D MESH INTERCONNECTS University of

A CASE STUDY OF COMMUNICATION OPTIMIZATIONS ON 3D MESH INTERCONNECTS

University of Illinois at Urbana-Champaign

Abhinav Bhatele, Eric Bohm, Laxmikant V. KaleParallel Programming Laboratory

Euro-Par 2009

Page 2: A Case Study of Communication Optimizations on 3D ...charm.cs.illinois.edu/newPapers/09-24/talk.pdfA CASE STUDY OF COMMUNICATION OPTIMIZATIONS ON 3D MESH INTERCONNECTS University of

Outline

MotivationSolution: Mapping of OpenAtomPerformance BenefitsBigger Picture:

Resources NeededHeuristic Solutions

Automatic Mapping

August 27th, 2009

2

Abhinav Bhatele @ Euro-Par 2009

Page 3: A Case Study of Communication Optimizations on 3D ...charm.cs.illinois.edu/newPapers/09-24/talk.pdfA CASE STUDY OF COMMUNICATION OPTIMIZATIONS ON 3D MESH INTERCONNECTS University of

OpenAtom

Ab-Initio Molecular Dynamics codeConsider electrostatic interactions between the nuclei and electronsCalculate different energy termsDivided into different phases with lot of communication

August 27th, 2009

3

Abhinav Bhatele @ Euro-Par 2009

Page 4: A Case Study of Communication Optimizations on 3D ...charm.cs.illinois.edu/newPapers/09-24/talk.pdfA CASE STUDY OF COMMUNICATION OPTIMIZATIONS ON 3D MESH INTERCONNECTS University of

OpenAtom on Blue Gene/L

August 27th, 2009Abhinav Bhatele @ Euro-Par 2009

4

0

0.05

0.1

0.15

0.2

0.25

0.3

512 1024 2048 4096 8192

Tim

e pe

r ste

p (s

ecs)

No. of cores

w32 Default

Runs on Blue Gene/L at IBM T J Watson Research Center, CO mode

w32 = 32 water molecules with 70 Ry cutoff

Page 5: A Case Study of Communication Optimizations on 3D ...charm.cs.illinois.edu/newPapers/09-24/talk.pdfA CASE STUDY OF COMMUNICATION OPTIMIZATIONS ON 3D MESH INTERCONNECTS University of

The problem lies in …

August 27th, 2009Abhinav Bhatele @ Euro-Par 2009

5

Performance Analysis and Visualization Tool: Projections (part of Charm++) – Timeline View

Page 6: A Case Study of Communication Optimizations on 3D ...charm.cs.illinois.edu/newPapers/09-24/talk.pdfA CASE STUDY OF COMMUNICATION OPTIMIZATIONS ON 3D MESH INTERCONNECTS University of

Solution –

August 27th, 2009Abhinav Bhatele @ Euro-Par 2009

6

Topology Aware Mapping

Page 7: A Case Study of Communication Optimizations on 3D ...charm.cs.illinois.edu/newPapers/09-24/talk.pdfA CASE STUDY OF COMMUNICATION OPTIMIZATIONS ON 3D MESH INTERCONNECTS University of

Processor Virtualization

User View System View

Programmer: Decomposes the computation into objects

Runtime: Maps the computation on to the processors

August 27th, 2009

7

Abhinav Bhatele @ Euro-Par 2009

Page 8: A Case Study of Communication Optimizations on 3D ...charm.cs.illinois.edu/newPapers/09-24/talk.pdfA CASE STUDY OF COMMUNICATION OPTIMIZATIONS ON 3D MESH INTERCONNECTS University of

Benefits of Charm++

August 27th, 2009Abhinav Bhatele @ Euro-Par 2009

8

Computation is divided into objects/chares/virtual processors (VPs)Separates decomposition from mappingVPs can be flexibly mapped to actual physical processors (PEs)

Page 9: A Case Study of Communication Optimizations on 3D ...charm.cs.illinois.edu/newPapers/09-24/talk.pdfA CASE STUDY OF COMMUNICATION OPTIMIZATIONS ON 3D MESH INTERCONNECTS University of

Topology Manager API†

The application needs information such asDimensions of the partitionRank to physical co-ordinates and vice-versa

TopoManager: a uniform APIOn BG/L and BG/P: provides a wrapper for system callsOn XT3/4/5, there are no such system callsProvides a clean and uniform interface to the application

August 27th, 2009Abhinav Bhatele @ Euro-Par 2009

9

† http://charm.cs.uiuc.edu/~bhatele/phd/topomgr.htm

Page 10: A Case Study of Communication Optimizations on 3D ...charm.cs.illinois.edu/newPapers/09-24/talk.pdfA CASE STUDY OF COMMUNICATION OPTIMIZATIONS ON 3D MESH INTERCONNECTS University of

Parallelization using Charm++

August 27th, 2009

10

Eric Bohm, Glenn J. Martyna, Abhinav Bhatele, Sameer Kumar, Laxmikant V. Kale, John A. Gunnels, and Mark E. Tuckerman. Fine Grained Parallelization of the Car-Parrinello ab initio MD Method on Blue Gene/L. IBM J. of R. and D.: Applications of Massively Parallel Systems, 52(1/2):159-174, 2008.

Abhinav Bhatele @ Euro-Par 2009

Page 11: A Case Study of Communication Optimizations on 3D ...charm.cs.illinois.edu/newPapers/09-24/talk.pdfA CASE STUDY OF COMMUNICATION OPTIMIZATIONS ON 3D MESH INTERCONNECTS University of

Mapping Challenge

August 27th, 2009Abhinav Bhatele @ Euro-Par 2009

11

Load Balancing: Multiple VPs per PEMultiple groups of communicating objects

Intra-group communicationInter-group communication

Conflicting communication requirements

Page 12: A Case Study of Communication Optimizations on 3D ...charm.cs.illinois.edu/newPapers/09-24/talk.pdfA CASE STUDY OF COMMUNICATION OPTIMIZATIONS ON 3D MESH INTERCONNECTS University of

Topology Mapping of Chare Arrays

August 27th, 2009

12

RealSpace and GSpace have state-wisecommunication Paircalculator and

GSpace have plane-wisecommunication

Abhinav Bhatele @ Euro-Par 2009

Page 13: A Case Study of Communication Optimizations on 3D ...charm.cs.illinois.edu/newPapers/09-24/talk.pdfA CASE STUDY OF COMMUNICATION OPTIMIZATIONS ON 3D MESH INTERCONNECTS University of

Performance Improvements on BG/L

August 27th, 2009Abhinav Bhatele @ Euro-Par 2009

13

0

0.05

0.1

0.15

0.2

0.25

0.3

512 1024 2048 4096 8192

Tim

e pe

r ste

p (s

ecs)

No. of cores

w32 Defaultw32 Topology

Runs on Blue Gene/L at IBM T J Watson Research Center, CO mode, Year: 2006

w32 = 32 water molecules with 70 Ry cutoff

Page 14: A Case Study of Communication Optimizations on 3D ...charm.cs.illinois.edu/newPapers/09-24/talk.pdfA CASE STUDY OF COMMUNICATION OPTIMIZATIONS ON 3D MESH INTERCONNECTS University of

Improved Timeline Views

August 27th, 2009Abhinav Bhatele @ Euro-Par 2009

14

Page 15: A Case Study of Communication Optimizations on 3D ...charm.cs.illinois.edu/newPapers/09-24/talk.pdfA CASE STUDY OF COMMUNICATION OPTIMIZATIONS ON 3D MESH INTERCONNECTS University of

Results on Blue Gene/L

August 27th, 2009Abhinav Bhatele @ Euro-Par 2009

15

0

5

10

15

20

25

1024 2048 4096 8192 16384

Tim

e pe

r ste

p (s

ecs)

No. of cores

w256 Defaultw256 TopologyGST_BIG DefaultGST_BIG Topology

GST_BIG = 64 Ge, 128 Sb and 256 Te molecules with 20 Ry cutoff

Runs on Blue Gene/L at IBM T J Watson Research Center, CO mode

Page 16: A Case Study of Communication Optimizations on 3D ...charm.cs.illinois.edu/newPapers/09-24/talk.pdfA CASE STUDY OF COMMUNICATION OPTIMIZATIONS ON 3D MESH INTERCONNECTS University of

Results on Blue Gene/P

August 27th, 2009Abhinav Bhatele @ Euro-Par 2009

16

0

2

4

6

8

10

12

1024 2048 4096 8192

Tim

e pe

r ste

p (s

ecs)

No. of cores

w256 Defaultw256 Topology

w256 = 256 water molecules with 70 Ry cutoff

Runs on Blue Gene/P at Argonne National Laboratory, VN mode

Page 17: A Case Study of Communication Optimizations on 3D ...charm.cs.illinois.edu/newPapers/09-24/talk.pdfA CASE STUDY OF COMMUNICATION OPTIMIZATIONS ON 3D MESH INTERCONNECTS University of

Results on Cray XT3

August 27th, 2009Abhinav Bhatele @ Euro-Par 2009

17

2

3

4

5

6

7

8

512 1024 2048

Tim

e pe

r ste

p (s

ecs)

No. of cores

w256 Defaultw256 TopologyGST_BIG DefaultGST_BIG Topology

Runs on Cray XT3 (Bigben) at Pittsburgh Supercomputing Center, VN mode(with system reservation to obtain complete 3d mesh shapes)

Page 18: A Case Study of Communication Optimizations on 3D ...charm.cs.illinois.edu/newPapers/09-24/talk.pdfA CASE STUDY OF COMMUNICATION OPTIMIZATIONS ON 3D MESH INTERCONNECTS University of

Performance Analysis

August 27th, 2009Abhinav Bhatele @ Euro-Par 2009

18

0

500000

1000000

1500000

2000000

2500000

3000000

1024 2048 4096 8192

Idle

Tim

e (s

ecs)

No. of cores

DefaultTopology

Performance Analysis and Visualization Tool: Projections – Idle time added across all processors

w256M_70Ry on Blue Gene/L

Page 19: A Case Study of Communication Optimizations on 3D ...charm.cs.illinois.edu/newPapers/09-24/talk.pdfA CASE STUDY OF COMMUNICATION OPTIMIZATIONS ON 3D MESH INTERCONNECTS University of

Reduction in Communication Volume

August 27th, 2009Abhinav Bhatele @ Euro-Par 2009

19

0

200

400

600

800

1000

1200

1024 2048 4096 8192

Band

wid

th (

GB)

No. of cores

DefaultTopology

Data obtained from Blue Gene/P’s Uniform Performance Counters

w256M_70Ry on Blue Gene/P

Page 20: A Case Study of Communication Optimizations on 3D ...charm.cs.illinois.edu/newPapers/09-24/talk.pdfA CASE STUDY OF COMMUNICATION OPTIMIZATIONS ON 3D MESH INTERCONNECTS University of

Relative Performance Improvement

August 27th, 2009Abhinav Bhatele @ Euro-Par 2009

20

0

0.5

1

1.5

2

2.5

512 1024 2048 4096 8192

% Im

prov

emen

t

No. of cores

Cray XT3Blue Gene/LBlue Gene/P

w256M_70Ry

Page 21: A Case Study of Communication Optimizations on 3D ...charm.cs.illinois.edu/newPapers/09-24/talk.pdfA CASE STUDY OF COMMUNICATION OPTIMIZATIONS ON 3D MESH INTERCONNECTS University of

Bigger picture

August 27th, 2009Abhinav Bhatele @ Euro-Par 2009

21

Different kinds of applications:Computation boundCommunication bound

Latency tolerantLatency sensitive

Technique:Obtain processor topology and application communication graphHeuristic Techniques for mapping

Page 22: A Case Study of Communication Optimizations on 3D ...charm.cs.illinois.edu/newPapers/09-24/talk.pdfA CASE STUDY OF COMMUNICATION OPTIMIZATIONS ON 3D MESH INTERCONNECTS University of

Why does distance affect message latencies?

Consider a 3D mesh/torus interconnect

Message latencies can be modeled by

(Lf/B) x D + L/B

Lf = length of flit, B = bandwidth,

D = hops, L = message size

When (Lf * D) << L, first term is negligible

August 27th, 2009Abhinav Bhatele @ Euro-Par 2009

22

But in presence of contention …

Page 23: A Case Study of Communication Optimizations on 3D ...charm.cs.illinois.edu/newPapers/09-24/talk.pdfA CASE STUDY OF COMMUNICATION OPTIMIZATIONS ON 3D MESH INTERCONNECTS University of

Automatic Topology Aware Mapping

August 27th, 2009Abhinav Bhatele @ Euro-Par 2009

23

Many MPI applications exhibit a simple two-dimensional near-neighbor communication patternExamples: MILC, WRF, POP, Stencil, …

Page 24: A Case Study of Communication Optimizations on 3D ...charm.cs.illinois.edu/newPapers/09-24/talk.pdfA CASE STUDY OF COMMUNICATION OPTIMIZATIONS ON 3D MESH INTERCONNECTS University of

Acknowledgements:Shawn Brown and Chad Vizino (PSC)Glenn Martyna, Sameer Kumar, Fred Mintzer (IBM)Teragrid for running time on Bigben (XT3)ANL for running time on Blue Gene/P

DOE Grant B341494 (CSAR), DOE Grant DE-FG05-08OR23332 (ORNL LCF) and NSF Grant ITR 0121357

August 27th, 2009Abhinav Bhatele @ Euro-Par 2009

Funding

E-mail: [email protected]: http://charm.cs.illinois.edu