
© 2009 IBM Corporation

Parallel Programming with X10/APGAS

IBM UPC and X10 teams

The Challenge

Parallelism scaling replaces frequency scaling as the foundation for increased performance, with a profound impact on future software:
– Multi-core chips
– Cluster parallelism
– Heterogeneous parallelism

[Figure: heterogeneous parallelism — Cell Broadband Engine block diagram: a 64-bit Power Architecture PPE with VMX (PPU, PXU, L1, L2) and eight SPEs (SPU, SXU, local store, SMF) on the EIB (up to 96 B/cycle), with MIC/dual XDR memory and BIC/FlexIO I/O at 16 B/cycle per port]

[Figure: cluster parallelism — SMP nodes (PEs with L1 caches and shared L2 caches, memory) connected through a "scalable unit" cluster interconnect switch/fabric with I/O gateway nodes; hundreds of such cluster nodes]

[Figure: large-scale parallelism — Blue Gene and Roadrunner]

APGAS Realization

Through languages:
– Asynchronous Co-Array Fortran (ACAF): extension of CAF with asyncs
– Asynchronous UPC (AUPC): proper extension of UPC with asyncs
– X10 (already asynchronous): extension of sequential Java
Language runtimes share a common APGAS runtime.

Through an APGAS library in C, Fortran, Java (co-habiting with MPI):
– Implements PGAS: remote references, global data-structures
– Implements inter-place messaging: optimizes inlineable asyncs
– Implements global and/or collective operations
– Implements intra-place concurrency: atomic operations, algorithmic scheduler

Libraries reduce the cost of adoption; languages offer enhanced productivity benefits.

XL UPC status: on a path to an IBM-supported product in 2011.

APGAS Advantages

– The programming model is still based on shared memory; familiar to many programmers.
– Place hierarchies provide a way to deal with heterogeneity; async data transfers between places are not an ad-hoc artifact of the Cell.
– Asyncs offer an elegant framework subsuming multi-core parallelism and messaging (see the sketch below).
– There are many opportunities for compiler optimizations, e.g. communication aggregation, so the programmer can write more abstractly and still get good performance.
– There are many opportunities for static checking for concurrency/distribution design errors.
– The programming model is implementable on a variety of hardware architectures, which leads to better application portability.
– There are many opportunities for hardware optimizations based on APGAS.
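As an illustration of asyncs subsuming both multi-core parallelism and messaging, here is a minimal X10-style sketch (2.0-era syntax; doLocalWork and doRemoteWork are hypothetical placeholders, not library calls):

    import x10.io.Console;

    class Both {
        // Hypothetical work routines, used only to make the sketch concrete.
        static def doLocalWork()  { Console.OUT.println("local work at "  + here); }
        static def doRemoteWork() { Console.OUT.println("remote work at " + here); }

        public static def runBoth(remote: Place) {
            finish {
                async doLocalWork();               // multi-core parallelism: new activity at the current place
                async at (remote) doRemoteWork();  // messaging: activity shipped to another place
            }
            // finish waits for both activities, wherever they ran
        }

        public static def main(args: Rail[String]) {
            // With a multi-place launch you would pass some other place;
            // `here` keeps the sketch runnable even with a single place.
            runBoth(here);
        }
    }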

X10 Project Status

• X10 is an APGAS language in the Java family of languages (a minimal example follows this list).
• X10 is an open-source project (Eclipse Public License). Documentation, releases, implementation source code, benchmarks, etc. are all publicly available at http://x10-lang.org
• X10 and X10DT 2.0 just released:
  – Added structs for improved space/time efficiency
  – More flexible distributed object model (global fields/methods)
  – Static checking of place types (locality constraints)
  – X10DT 2.0 supports the X10 C++ backend
  – X10 2.0 used in the 2009 HPC Challenge (Class 2) submission
• X10 2.0 platforms:
  – Java backend (compiles X10 to Java): runs on any Java 5 JVM; single-process implementation (all places in one JVM)
  – C++ backend (compiles X10 to C++): AIX, Linux, Cygwin, MacOS, Solaris; PowerPC, x86, x86_64, SPARC; multi-process implementation (one place per process); uses the common APGAS runtime
• X10 Innovation Grants: http://www.ibm.com/developerworks/university/innovation/ — a program to support academic research and curricular development activities in the area of computing at scale on cloud computing platforms based on the X10 programming language.
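To make the "Java family" point concrete, here is the classic "Hello Whole World" program in roughly X10 2.0 syntax (the signature of main and the form of Place.places varied slightly across 2.x releases):

    import x10.io.Console;

    // One activity is spawned at every place; finish waits for all of them.
    public class HelloWholeWorld {
        public static def main(args: Rail[String]) {
            finish for (p in Place.places()) {
                async at (p) {
                    Console.OUT.println("Hello World from place " + here.id);
                }
            }
        }
    }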

Asynchronous PGAS Programming Model

Programming Models: Bridging the Gap Between Programmer and Hardware

• A programming model provides an abstraction of the architecture that enables programmers to express their solutions in a manner relevant to their domain:
  – Mathematicians write equations
  – MBAs write business logic
• Compilers, language runtimes, libraries, and operating systems implement the programming model, bridging the gap to the hardware.
• Development and performance tools provide the surrounding ecosystem for a programming model and its implementation.
• The evolution of programming models impacts:
  – Design methodologies
  – Operating systems
  – Programming environments

[Figure: the programming model layered between design methodologies, operating systems, and programming environments on one side and compilers, runtimes, libraries, and operating systems on the other]

Two basic ideas: Places and Asynchrony

Fine-grained concurrency
• async S

Atomicity
• atomic S
• when (c) S

Global data-structures
• points, regions, distributions, arrays

Place-shifting operations
• at (P) S

Ordering
• finish S
• clock

A sketch combining several of these constructs follows below.
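A hedged sketch (X10 2.0-era syntax, modeled on the one-slot buffer example commonly used in X10 tutorials) showing several of these constructs together: when for conditional atomicity, async for fine-grained concurrency, and finish for ordering:

    import x10.io.Console;

    // A one-slot buffer: send blocks until the slot is empty,
    // receive blocks until it is full (conditional atomic blocks).
    class OneBuffer {
        private var datum: Object = null;
        private var filled: Boolean = false;

        public def send(v: Object) {
            when (!filled) {
                datum = v;
                filled = true;
            }
        }

        public def receive(): Object {
            when (filled) {
                val v = datum;
                datum = null;
                filled = false;
                return v;
            }
        }
    }

    public class ProducerConsumer {
        public static def main(args: Rail[String]) {
            val buf = new OneBuffer();
            finish {                                      // ordering: wait for both activities
                async for (var i: Int = 0; i < 10; i++)   // producer activity
                    buf.send("item " + i);
                async for (var i: Int = 0; i < 10; i++)   // consumer activity
                    Console.OUT.println(buf.receive());
            }
        }
    }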

Performance results: Power5+ cluster

X10       LU        RA        Stream     FFT
nodes     GFlop/s   MUP/s     GBytes/s   GFlop/s
4         354       6.34      325.7      23.67
8         666       12.31     650.5      40.62
16        1268      23.02     1287.8     65.92
32        –         43.1      2601.5     –

UPC       LU        RA        Stream     FFT
nodes     GFlop/s   MUP/s     GBytes/s   GFlop/s
4         379       5.5       140        7.9
8         747       10.8      256        13
16        1442      21.5      523        26.3
32        2333      43.3      1224       39.8

[Chart: HPL performance comparison — GFlop/s (log scale) vs. nodes (4, 8, 16, 32) for X10, UPC, and peak]

[Chart: FFT performance comparison — GFlop/s (log scale) vs. nodes (4, 8, 16, 32) for X10 and UPC]

Benchmark system: IBM Poughkeepsie Benchmark Center — 32 Power5+ nodes, 16 SMT (2x) processors/node, 64 GB/node, 1.9 GHz; HPS switch, 2 GBytes/s/link.

Performance results – Blue Gene/P

X10       LU        RA        Stream     FFT
nodes     GFlop/s   GUP/s     GBytes/s   GFlop/s
32        117       0.042     141        –
1024      3893      1.05      4516       –
2048      –         –         –          –
4096      –         –         –          –

UPC       LU        RA        Stream     FFT
nodes     GFlop/s   GUP/s     GBytes/s   GFlop/s
32        242       0.04      168        6.4
1024      7744      1.27      5376       156
2048      15538     2.54      –          –
4096      28062     5.04      –          –

Benchmark system: IBM T.J. Watson Research Center (Watson Shaheen) — 4 racks of Blue Gene/P, 1024 nodes/rack, 4 CPUs/node at 850 MHz, 4 GBytes/node RAM, 16 x 16 x 16 torus.

[Chart: HPL performance comparison — GFlop/s (log scale) vs. nodes (32, 1024, 2048, 4096) for X10, UPC, and peak]

[Chart: FFT performance comparison — GFlop/s (log scale) vs. nodes (32, 1024, 2048, 4096) for X10 and UPC]
