x10 tutorial x10.sf

91
X10 Tutorial © 2007 IBM Corporation 1 IBM Research X10 Tutorial http://x10.sf.net Vijay Saraswat (based on tutorial co-authored with Vivek Sarkar, Christoph von Praun, Nate Nystrom, Igor Peshansky) August 2008 IBM Research This material is based upon work supported by the Defense Advanced Research Projects Agency under its Agreement No. HR0011-07-9-0002.

Upload: zack

Post on 13-Jan-2016

83 views

Category:

Documents


0 download

DESCRIPTION

X10 Tutorial http://x10.sf.net. Vijay Saraswat (based on tutorial co-authored with Vivek Sarkar, Christoph von Praun, Nate Nystrom, Igor Peshansky) August 2008 IBM Research. - PowerPoint PPT Presentation

TRANSCRIPT

Page 1: X10 Tutorial x10.sf

X10 Tutorial

© 2007 IBM Corporation1

IBM

Res

earc

h

X10 Tutorialhttp://x10.sf.net

Vijay Saraswat(based on tutorial co-authored with Vivek Sarkar,

Christoph von Praun, Nate Nystrom, Igor Peshansky)August 2008IBM Research

This material is based upon work supported by the Defense Advanced Research Projects Agency under its Agreement No. HR0011-07-9-0002.

Page 2: X10 Tutorial x10.sf

X10 Tutorial

© 2007 IBM Corporation2

IBM

Res

earc

h

Acknowledgements

X10 Core Team (IBM)– Ganesh Bikshandi, Sreedhar Kodali, Nathaniel

Nystrom, Igor Peshansky, Vijay Saraswat, Pradeep Varma, Sayantan Sur, Olivier Tardieu, Krishna Venkat, Tong Wen, Jose Castanos, Ankur Narang, Komondoor Raghavan

X10 Tools– Philippe Charles, Robert Fuhrer

Emeritus– Kemal Ebcioglu, Christian Grothoff, Vincent Cave,

Lex Spoon, Christoph von Praun, Rajkishore Barik, Chris Donawa, Allan Kielstra

Research colleagues– Vivek Sarkar, Rice U– Satish Chandra,Guojing Cong– Ras Bodik, Guang Gao, Radha Jagadeesan, Jens

Palsberg, Rodric Rabbah, Jan Vitek– Vinod Tipparaju, Jarek Nieplocha (PNNL)– Kathy Yelick, Dan Bonachea (Berkeley)– Several others at IBM

Recent Publications

1. “Solving large, irregular graph problems using adaptive work-stealing”, to appear in ICPP 2008.

2. “Constrained types for OO languages”, to appear in OOPSLA 2008.

3. “Type Inference for Locality Analysis of Distributed Data Structures”, PPoPP 2008.

4. “Deadlock-free scheduling of X10 Computations with bounded resources”, SPAA 2007

5. “A Theory of Memory Models”, PPoPP 2007.

6. “May-Happen-in-Parallel Analysis of X10 Programs”, PPoPP 2007.

7. “An annotation and compiler plug-in system for X10”, IBM Technical Report, Feb 2007.

8. “Experiences with an SMP Implementation for X10 based on the Java Concurrency Utilities” Workshop on Programming Models for Ubiquitous Parallelism (PMUP), September 2006.

9. "An Experiment in Measuring the Productivity of Three Parallel Programming Languages”, P-PHEC workshop, February 2006.

10. "X10: An Object-Oriented Approach to Non-Uniform Cluster Computing", OOPSLA conference, October 2005.

11. "Concurrent Clustered Programming", CONCUR conference, August 2005.

12. "X10: an Experimental Language for High Productivity Programming of Scalable Systems", P-PHEC workshop, February 2005.

Tutorials TiC 2006, PACT 2006, OOPSLA 2006, PPoPP 2007, SC 2007 Graduate course on X10 at U Pisa (07/07) Graduate course at Waseda U (Tokyo, 04/08)

Page 3: X10 Tutorial x10.sf

X10 Tutorial

© 2007 IBM Corporation3

IBM

Res

earc

h

Acknowledgements (contd.)

X10 is an open source project (Eclipse Public License).

Reference implementation in Java, runs on any Java 5 VM.– Windows/Intel, Linux/Intel– AIX/PPC, Linux/PPC– Runs on multiprocessors

X10Flash project --- cluster implementation of X10 under development at IBM– Translation of X10 to C + LAPI– Will not be discussed in today’s

tutorial– Contact: Vijay Saraswat,

[email protected]

Website: http://x10.sf.net

Website contains– Language specification– Tutorial material– Presentations– Download instructions– Copies of some papers– Pointers to mailing list

This material is based upon work supported in part by the Defense Advanced Research Projects Agency under its Agreement No. HR0011-07-9-0002.

Page 4: X10 Tutorial x10.sf

X10 Tutorial

© 2007 IBM Corporation4

IBM

Res

earc

h

Outline

1. What is X10?• background, status

2. Basic X10 (single place)• async, finish, atomic• future, force

3. Basic X10 (arrays & loops)• points, rectangular regions,

arrays• for, foreach

4. Scalable X10 (multiple places)• places, distributions, distributed

arrays, ateach, BadPlaceException

5. Clocks• creation, registration, next,

resume, drop, ClockUseException

6. Basic serial constructs that differ from Java• const, nullable, extern

7. Advanced topics• Value types, conditional

atomic sections (when), general regions & distributions

• Refer to language spec for details

Page 5: X10 Tutorial x10.sf

X10 Tutorial

© 2007 IBM Corporation5

IBM

Res

earc

h

What is X10?

X10 is a new experimental language developed in the IBM PERCS project as part of the DARPA program on High Productivity Computing Systems (HPCS)

X10’s goal is to provide a new parallel programming model and its embodiment in a high level language that:1. is more productive than current models, 2. can support higher levels of abstraction better than current

models, and 3. can exploit the multiple levels of parallelism and non-uniform

data access that are critical for obtaining scalable performance in current and future HPC systems

Page 6: X10 Tutorial x10.sf

X10 Tutorial

© 2007 IBM Corporation6

IBM

Res

earc

h

The X10 RoadMap

JVM Implementation

Language Definition

Libraries, APIs, Tools, user trials

X10 Flash C/C++ Implementation

v1.7 spec

MS5 MS6 MS7 MS8 MS9 MS10

v1.7 impl

v1.7 impl Rel 1

Initial debugger release

v1.7 impl Rel 2

v2.0 spec

Concur. refactorings

Design of X10 Implementation for PERCS

Port to PERCS h/w

Multi-process debugger

More refactorings

Public Release of X10 2.0 system

Trial at Rice U

v2.0 impl

v2.0 impl

X10DT enhancements

Advanced debugger features

SSCA#2, UTS Other Apps (TBD)

Internal milestone

DARPA milestones

Page 7: X10 Tutorial x10.sf

X10 Tutorial

© 2007 IBM Corporation7

IBM

Res

earc

h

Current X10 Environment:Unoptimized Single-process Implementation

Foo.x10

x10c X10 compiler --- translates Foo.x10 to Foo.java, uses javac to generate Foo.class from Foo.java

Foo.class

X10 source program --- must contain a class named Foo with a public static void main(String[] args) method

X10 Virtual Machine(JVM + J2SE libraries +

X10 libraries + X10 Multithreaded Runtime)

External DLL’s

X10 externinterface

X10 Abstract Performance Metrics(event counts, distribution efficiency)X10 Program Output

X10 program translated into Java ---// #line pseudocomment in Foo.java specifies source line mapping in Foo.x10

Foo.java

x10c Foo.x10

x10 Foo.x10

Caveat: this is a prototype implementation with many limitations. Please be patient!

Page 8: X10 Tutorial x10.sf

X10 Tutorial

© 2007 IBM Corporation8

IBM

Res

earc

h

Eclipse environment for X10 – X10DT

Page 9: X10 Tutorial x10.sf

X10 Tutorial

© 2007 IBM Corporation9

IBM

Res

earc

h

Future X10 Environment

Parallelism through

Unordered Tasks

X10 application programs

X10 Deployment

HPC Runtime Environment

(Parallel Environment, MPI, LAPI, …)

HPC Parallel System

Implicit parallelism,

Implicit data distributions

X10 places --- abstraction of explicit control & data distribution

Mapping of places to nodes in HPC Parallel Environment

Primitive constructs for parallelism, communication, and synchronization

Target system for parallel application

X10 Libraries

Domain Specific Languages (Streaming, …)

Page 10: X10 Tutorial x10.sf

X10 Tutorial

© 2007 IBM Corporation10

IBM

Res

earc

h

The current architectural landscape

(100’s of suchcluster nodes)

I/Ogatewaynodes

“Scalable Unit” Cluster Interconnect Switch/Fabric

Road Runner: Cell-accelerated OpteronRoad Runner: Cell-accelerated OpteronMulti-core w/ accelerators (IXP 2850)Multi-core w/ accelerators (IXP 2850)

Blue GeneBlue GenePower6 ClustersPower6 Clusters

. . .

Memory

PEs,

SMP Node

PEs,

. . . . . .

Memory

PEs,

SMP Node

PEs,

Interconnect

Page 11: X10 Tutorial x10.sf

X10 Tutorial

© 2007 IBM Corporation11

IBM

Res

earc

h

X10 vs. Java

X10 is an extended subset of Java– Base language = Java 1.4

• Java 5 features (generics, metadata, etc.) are currently not supported in X10

– Notable features removed from Java• Concurrency --- threads, synchronized, etc.• Java arrays – replaced by X10 arrays

– Notable features added to Java• Concurrency – async, finish, atomic, future, force, foreach, ateach, clocks• Distribution --- points, distributions• X10 arrays --- multidimensional distributed arrays, array reductions, array

initializers, • Serial constructs --- nullable, const, extern, value types

X10 supports both OO and non-OO programming paradigms

Page 12: X10 Tutorial x10.sf

X10 Tutorial

© 2007 IBM Corporation12

IBM

Res

earc

h

x10.lang standard library

Java package with “built in” classes that provide support for selected X10 constructs

Standard types– boolean, byte, char, double, float, int, long, short, String

x10.lang.Object -- root class for all instances of X10 objects x10.lang.clock --- clock instances & clock operations x10.lang.dist --- distribution instances & distribution operations x10.lang.place --- place instances & place operations x10.lang.point --- point instances & point operations x10.lang.region --- region instances & region operations

All X10 programs implicitly import the x10.lang.* package, so the x10.lang prefix can be omitted when referring to members of x10.lang.* classes

e.g., place.MAX_PLACES, dist.factory.block([0:100,0:100]), …

Similarly, all X10 programs also implicitly import the java.lang.* package e.g., X10 programs can use Math.min() and Math.max() from java.lang In case of conflict (e.g. Integer), user must import the desired one explicitly, e.g.

import java.lang.Integer;

Page 13: X10 Tutorial x10.sf

X10 Tutorial

© 2007 IBM Corporation13

IBM

Res

earc

h

Calling foreign functions from X10 programs

Java methods– Can be called directly from X10 programs– Java class will be loaded automatically as part of X10

program execution– Basic rule: don’t call any method that can perform

wait/notify or related thread operations• Calling synchronized methods is okay

C functions– Need to use extern declaration in X10, and perform a

System.loadLibrary() call

Page 14: X10 Tutorial x10.sf

X10 Tutorial

© 2007 IBM Corporation14

IBM

Res

earc

h

Outline

1. What is X10?• background, status

2. Basic X10 (single place)• async, finish, atomic• future, force

3. Basic X10 (arrays & loops)• points, rectangular regions,

arrays• for, foreach

4. Scalable X10 (multiple places)• places, distributions, distributed

arrays, ateach, BadPlaceException

5. Clocks• creation, registration, next,

resume, drop, ClockUseException

6. Basic serial constructs that differ from Java• const, nullable, extern

7. Advanced topics• Value types, conditional

atomic sections (when), general regions & distributions

• Refer to language spec for details

Page 15: X10 Tutorial x10.sf

X10 Tutorial

© 2007 IBM Corporation15

IBM

Res

earc

h

Basic X10 (Single Place)

async = construct used to execute a statement in parallel as a new activity

finish = construct used to check for global termination of statement and all the activities that it has created

atomic = construct used to coordinate accesses to shared heap by multiple activities

future = construct used to evaluate an expression in parallel as a new activity

force = construct used to check for termination of future

Core constructs used for intra-place (shared memory) parallel programming

Page 16: X10 Tutorial x10.sf

X10 Tutorial

© 2007 IBM Corporation16

IBM

Res

earc

h

async

async S Creates a new child activity

that executes statement S Returns immediately S may reference final

variables in enclosing blocks Activities cannot be named Activity cannot be aborted or

cancelled

Stmt ::= async Stmt

cf Cilk’s spawn

void run() {

if (r < 2) return;

final Fib f1 = new Fib(r-1),

f2 = new Fib(r-2);

finish {

async f1.run();

f2.run();

}

result = f1.r + f2.r;

}

See also TutAsync.x10

Page 17: X10 Tutorial x10.sf

X10 Tutorial

© 2007 IBM Corporation17

IBM

Res

earc

h

finish

finish S Execute S, but wait until all (transitively)

spawned asyncs have terminated.

Rooted exception model Trap all exceptions thrown by spawned

activities. Throw an (aggregate) exception if any

spawned async terminates abruptly. implicit finish at main activity

finish is useful for expressing “synchronous” operations on (local or) remote data.

Stmt ::= finish Stmt

cf Cilk’s sync

void run() {

if (r < 2) return;

final Fib f1 = new Fib(r-1),

f2 = new Fib(r-2);

finish {

async f1.run();

f2.run();

}

result = f1.r + f2.r;

}

Page 18: X10 Tutorial x10.sf

X10 Tutorial

© 2007 IBM Corporation18

IBM

Res

earc

h

Granularity of concurrency

Too many tasks can reduce performance.– Overhead for task creation,

scheduling, interference, cache effects.

Too few tasks may result in under-utilization.– Idle workers

In some cases, workload can be divided equally among all tasks statically– E.g. array-based

computations Some problems may need

dynamic load-balancing– Work-stealing in X10 1.7.

Identify tasks that can be performed in parallel – Use async to specify them

Identify point in code by which tasks must be finished– Use finish

If tasks will read/write shared variables during execution, ensure atomicity of updates– Use atomic

In principle, create k*P tasks, where P is the number of processors available.

Page 19: X10 Tutorial x10.sf

X10 Tutorial

© 2007 IBM Corporation19

IBM

Res

earc

h

Example programs

Fibonacci

NQueens Tree

How would you parallelize these computations?

Page 20: X10 Tutorial x10.sf

X10 Tutorial

© 2007 IBM Corporation20

IBM

Res

earc

h

Termination

Local termination: Statement s terminates locally when activity has completed all its

computation with respect to s.

Global termination:Local termination + activities that have been spawned by s terminated globally (recursive definition)

main function is root activity program terminates iff root activity terminates.

(implicit finish at root activity) ‘daemon threads’ (child outlives root activity) not

allowed in X10

Page 21: X10 Tutorial x10.sf

X10 Tutorial

© 2007 IBM Corporation21

IBM

Res

earc

h

Termination (Example)

public void main (String[] args) { ... finish { async {

for () { async {... } } finish async {... } ... }} // finish

}

termination

local globalstart

Page 22: X10 Tutorial x10.sf

X10 Tutorial

© 2007 IBM Corporation22

IBM

Res

earc

h

Rooted computation X10

root activity

public void main (String[] args) { ... finish { async {

for () { async {... } } finish async {... } ... }} // finish

}

...

ancestor relation

Root-of hierarchy

Page 23: X10 Tutorial x10.sf

X10 Tutorial

© 2007 IBM Corporation23

IBM

Res

earc

h

Rooted exception model

public void main (String[] args) { ... finish { async {

for () { async {... } } finish async {... } ... }

} // finish}

...

root-of relation

exception flow along root-of relation

Propagation along the lexical scoping:Exceptions that are not caught inside an activity are propagated to the nearest suspended ancestor in the root-of relation.

Page 24: X10 Tutorial x10.sf

X10 Tutorial

© 2007 IBM Corporation24

IBM

Res

earc

h

Example: rooted exception model (async)

int result = 0;try { finish { ateach (point [i]:dist.factory.unique()) { throw new Exception (“Exception from “+here.id) } throw new Exception(); result = 42; } // finish} catch (x10.lang.MultipleExceptions me) { System.out.print(me);} assert (result == 42); // always true

no exceptions are ‘thrown on the floor’ exceptions are propagated across activity and place

boundaries

Page 25: X10 Tutorial x10.sf

X10 Tutorial

© 2007 IBM Corporation25

IBM

Res

earc

h

Behavioral annotations

nonblocking On any input store, a nonblocking method can continue execution or

terminate. (dual: blocking, default: nonblocking)

recursively nonblocking Nonblocking, and every spawned activity is recursively nonblocking.

local A local method guarantees that its execution will only access variables that are local to the place of the current activity. (dual: remote, default: local)

sequentialMethod does not create concurrent activities.In other words, method does not use async, foreach, ateach. (dual: parallel, default: parallel)

Sequential and nonblocking imply recursively nonblocking.

Page 26: X10 Tutorial x10.sf

X10 Tutorial

© 2007 IBM Corporation26

IBM

Res

earc

h

atomic

Atomic blocks are conceptually executed in a single step while other activities are suspended: isolation and atomicity.

An atomic block ...– must be nonblocking– must not create concurrent

activities (sequential)– must not access remote data

(local) // push data onto concurrent // list-stackNode node = new Node(data);atomic { node.next = head; head = node; }

// target defined in lexically// enclosing scope.atomic boolean CAS(Object old, Object new) { if (target.equals(old)) { target = new; return true; } return false;}

Stmt ::= atomic StatementMethodModifier ::= atomic

Page 27: X10 Tutorial x10.sf

X10 Tutorial

© 2007 IBM Corporation27

IBM

Res

earc

h

Statically checked conditions on atomic blocks

An atomic block must...be local, sequential, nonblocking:

...not include blocking operations– no await, no when, no calls to blocking methods

... not include access to data at remote places– no ateach, no future, only calls to local methods

... not spawn other activities– no async, no foreach, only calls to sequential methods

Page 28: X10 Tutorial x10.sf

X10 Tutorial

© 2007 IBM Corporation28

IBM

Res

earc

h

Exceptions in atomic blocks Atomicity guarantee only for successful execution.

– Exceptions should be caught inside atomic block– Explicit undo in the catch handler

(Uncaught) exceptions propagate across the atomic block boundary; atomic terminates on normal or abrupt termination of its block.

boolean move(Collection s, Collection d, Object o) { atomic { if (!s.remove(o)) { return false; // object not found } else { try { d.add(o); } catch (RuntimeException e) { s.add(o); // explicit undo throw e; // exception } return true; // move succeeded } }}

Page 29: X10 Tutorial x10.sf

X10 Tutorial

© 2007 IBM Corporation29

IBM

Res

earc

h

Data races with async / foreach

final double arr[R] = …; // global array

class ReduceOp { double accu = 0.0; double sum ( double[.] arr ) { foreach (point p: arr) { atomic accu += arr[p]; } return accu;}

concurrent conflictingaccess to shared variable:data race

X10 guideline for avoiding data races: access shared variables inside an atomic block combine ateach and foreach with finish declare data to be read-only where possible (final or value type)

finish

Page 30: X10 Tutorial x10.sf

X10 Tutorial

© 2007 IBM Corporation30

IBM

Res

earc

h

NQueens: Searching in parallel

NQueens.x10

How should this be parallelized?

Page 31: X10 Tutorial x10.sf

X10 Tutorial

© 2007 IBM Corporation31

IBM

Res

earc

h

future

future (P) S Creates a new child activity at

place P, that executes statement S;

Returns immediately. S may reference final variables

in enclosing blocks.

future vs. async Return result from

asynchronous computation Tolerate latency of remote

access.

// global dist. arrayfinal double a[D] = …; final int idx = …;

future<double> fd = future (a.distribution[idx]) { // executed at a[idx]’s // place a[idx]; };

Expr ::= future PlaceExpSingleListopt {Expr }

future type no subtype relation between T and future<T>

Considering addition of a delayed future: needs run() to be called before it is activated

Page 32: X10 Tutorial x10.sf

X10 Tutorial

© 2007 IBM Corporation32

IBM

Res

earc

h

future example

public class TutFuture1 { static int fib (final int n) { if ( n <= 0 ) return 0; if ( n == 1 ) return 1; future<int> x = future { fib(n-1) }; int y = fib(n-2); return x.force() + y; }

public static void main(String[] args) { System.out.println("fib(10) = " + fib(10)); }}

Divide and conquer: recursive calls execute concurrently.

Page 33: X10 Tutorial x10.sf

X10 Tutorial

© 2007 IBM Corporation33

IBM

Res

earc

h

Example: rooted exception model (future)

double div (final double divisor) future<double> f = future { return 42.0 / divisor; } double result; try { result = f.force(); } catch (ArithmeticException e) { result = 0.0; } return result;}

Exception is propagated when the future is forced.

Page 34: X10 Tutorial x10.sf

X10 Tutorial

© 2007 IBM Corporation34

IBM

Res

earc

h

Futures can deadlock

nullable<future<int>> f1=null;nullable<future<int>> f2=null;

void main(String[] args) { f1 = future(here){a1()}; f2 = future(here){a2()}; f1.force();}

int a1() { nullable<future<int>> tmp=null; do { tmp=f2; } while (tmp == null); return tmp.force();}

int a2() { nullable<future<int>> tmp=null; do { tmp=f1; } while (tmp == null); return tmp.force();}

X10 guidelines to avoid deadlock: avoid futures as shared variables force called by same activity that created body of future, or a

descendent.

cyclic wait condition

Page 35: X10 Tutorial x10.sf

X10 Tutorial

© 2007 IBM Corporation35

IBM

Res

earc

h

Outline

1. What is X10?• background, status

2. Basic X10 (single place)• async, finish, atomic• future, force

3. Basic X10 (arrays & loops)• points, rectangular regions,

arrays• for, foreach

4. Scalable X10 (multiple places)• places, distributions,

distributed arrays, ateach, BadPlaceException

5. Clocks• creation, registration, next,

resume, drop, ClockUseException

6. Basic serial constructs that differ from Java• const, nullable, extern

7. Advanced topics• Value types, conditional

atomic sections (when), general regions & distributions

• Refer to language spec for details

Page 36: X10 Tutorial x10.sf

X10 Tutorial

© 2007 IBM Corporation36

IBM

Res

earc

h

point

A point is an element of an n-dimensional Cartesian space (n>=1) with integer-valued coordinates e.g., [5], [1, 2], …

– Dimensions are numbered from 0 to n-1– n is also referred to as the rank of the point

A point variable can hold values of different ranks e.g., – point p; p = [1]; … p = [2,3]; …

Operations– p1.rank

• returns rank of point p1– p1[i]

• returns element (i mod p1.rank) if i < 0 or i >= p1.rank– p1.lt(p2), p1.le(p2), p1.gt(p2), p1.ge(p2)

• returns true iff p1 is lexicographically <, <=, >, or >= p2 • only defined when p1.rank and p1.rank are equal

Page 37: X10 Tutorial x10.sf

X10 Tutorial

© 2007 IBM Corporation37

IBM

Res

earc

h

Syntax extensions for points

Implicit syntax for points: point p = [1,2] point p = point.factory(1,2)

Exploded variable declarations for points: point p [i,j] // final int i,j

Typical uses :– region R = [0:M-1,0:N-1];– for (point p [i, j] : R) { ... }– for (point [i, j] : R) { ... }– point sum (point [i,j], point [k, l]) { return [i+k, j+l]; }

– int [.] iarr = new int [R] (point [i,j]) { return i; }

Page 38: X10 Tutorial x10.sf

X10 Tutorial

© 2007 IBM Corporation38

IBM

Res

earc

h

Example: point (TutPoint1)

public class TutPoint { public static void main(String[] args) { point p1 = [1,2,3,4,5]; point p2 = [1,2]; point p3 = [2,1]; System.out.println("p1 = " + p1 + " ; p1.rank = " + p1.rank + " ; p1[2] = " + p1[2]); System.out.println("p2 = " + p2 + " ; p3 = " + p3 + " ; p2.lt(p3) = " + p2.lt(p3)); }}

Console output:

p1 = [1,2,3,4,5] ; p1.rank = 5 ; p1[2] = 3p2 = [1,2] ; p3 = [2,1] ; p2.lt(p3) = true

Page 39: X10 Tutorial x10.sf

X10 Tutorial

© 2007 IBM Corporation39

IBM

Res

earc

h

Rectangular regions

A rectangular region is the set of points contained in a rectangular subspace

A region variable can hold values of different ranks e.g., – region R; R = [0:10]; … R = [-100:100, -100:100]; … R = [0:-1]; …

Operations– R.rank ::= # dimensions in region; – R.size() ::= # points in region– R.contains(P) ::= predicate if region R contains point P– R.contains(S) ::= predicate if region R contains region S– R.equal(S) ::= true if region R equals region S– R.rank(i) ::= projection of region R on dimension i (a one-dimensional region)– R.rank(i).low() ::= lower bound of ith dimension of region R– R.rank(i).high() ::= upper bound of ith dimension of region R– R.ordinal(P) ::= ordinal value of point P in region R– R.coord(N) ::= point in region R with ordinal value = N– R1 && R2 ::= region intersection (will be rectangular if R1 and R2 are rectangular)– R1 || R2 ::= union of regions R1 and R2 (may not be rectangular)– R1 – R2 ::= region difference (may not be rectangular)

Page 40: X10 Tutorial x10.sf

X10 Tutorial

© 2007 IBM Corporation40

IBM

Res

earc

h

Example: region (TutRegion1)

public class TutRegion { public static void main(String[] args) { region R1 = [1:10, -100:100]; System.out.println("R1 = " + R1 + " ; R1.rank = " +

R1.rank + " ; R1.size() = " + R1.size() + " ; R1.ordinal([10,100]) = " + R1.ordinal([10,100]));

region R2 = [1:10,90:100]; System.out.println("R2 = " + R2 + " ; R1.contains(R2) =

" + R1.contains(R2) + " ; R2.rank(1).low() = " + R2.rank(1).low() + " ; R2.coord(0) = " + R2.coord(0));

}}

Console output:

R1 = {1:10,-100:100} ; R1.rank = 2 ; R1.size() = 2010 ; R1.ordinal([10,100]) = 2009

R2 = {1:10,90:100} ; R1.contains(R2) = true ; R2.rank(1).low() = 90 ; R2.coord(0) = [1,90]

Page 41: X10 Tutorial x10.sf

X10 Tutorial

© 2007 IBM Corporation41

IBM

Res

earc

h

Syntax extensions for regions

Region constructors

int hi, lo;region r = hi; region r = region.factory.region(0, hi) region r = [low:hi]; region r = region.factory.region(lo, hi)

region r1, r2; // 1-dim regionsregion r = [r1, r2]; region r = region.factory.region(r1, r2); // 2-dim region

Page 42: X10 Tutorial x10.sf

X10 Tutorial

© 2007 IBM Corporation42

IBM

Res

earc

h

X10 arrays

Java arrays are one-dimensional and local– e.g., array args in main(String[] args)– Multi-dimensional arrays are represented as “arrays of arrays” in

Java X10 has true multi-dimensional arrays (as Fortran) that can be

distributed (as in UPC, Co-Array Fortran, ZPL, Chapel, etc.)

Array declaration– T [.] A declares an X10 array with element type T– An array variable can refer to arrays with different rank

Array allocation– new T [ R ] creates a local rectangular X10 array with

rectangular region R as the index domain and T as the element (range) type

– e.g., int[.] A = new int[ [0:N+1, 0:N+1] ];Array initialization

– elaborate on a slide that follows...

Page 43: X10 Tutorial x10.sf

X10 Tutorial

© 2007 IBM Corporation43

IBM

Res

earc

h

Array declaration syntax: [] vs [.]

General arrays: <Type>[.]– one or multidimensional arrays– can be distributed– arbitrary region

Special case (“rail”): <Type>[] – 1 dimensional– 0-based, rectangular array– not distributed– can be used in place of general arrays– supports compile-time optimization

Page 44: X10 Tutorial x10.sf

X10 Tutorial

© 2007 IBM Corporation44

IBM

Res

earc

h

Simple array operations

A.rank ::= # dimensions in array A.region ::= index region (domain) of array A.distribution ::= distribution of array A A[P] ::= element at point P, where P belongs to A.region A | R ::= restriction of array onto region R

– Useful for extracting subarrays

Page 45: X10 Tutorial x10.sf

X10 Tutorial

© 2007 IBM Corporation45

IBM

Res

earc

h

Aggregate array operations

A.sum(), A.max() ::= sum/max of elements in array A1 <op> A2

– returns result of applying a pointwise op on array elements, when A1.region = A2. region

– <op> can include +, -, *, and / A1 || A2 ::= disjoint union of arrays A1 and A2

(A1.region and A2.region must be disjoint) A1.overlay(A2)

– returns an array with region, A1.region || A2.region, with element value A2[P] for all points P in A2.region and A1[P] otherwise.

Page 46: X10 Tutorial x10.sf

X10 Tutorial

© 2007 IBM Corporation46

IBM

Res

earc

h

Example: arrays (TutArray1)

public class TutArray1 { public static void main(String[] args) { int[.] A = new int[ [1:10,1:10] ] (point [i,j]) { return i+j;} ; System.out.println("A.rank = " + A.rank + " ; A.region = " + A.region); int[.] B = A | [1:5,1:5]; System.out.println("B.max() = " + B.max()); }}

Console output:

A.rank = 2 ; A.region = {1:10,1:10}B.max() = 10

array copy

Page 47: X10 Tutorial x10.sf

X10 Tutorial

© 2007 IBM Corporation47

IBM

Res

earc

h

Initialization of mutable arrays

Mutable array with nullable references to mutable objects:

nullable<RefType> [.] farr = new RefType[R]; // init with null value

Mutable array with references to mutable objects:

RefType [.] farr = new RefType [R]; // compile-time error, init required

RefType [.] farr = new RefType [R] (point[i]) { return RefType(here, i); }

Execution of initializer is implicitly parallel / distributed (pointwise operation):

That hold ‘reference to value objects’ (value object can be inlined by imp.)

int [.] iarr = new int[N] ; // init with default value, 0int [.] iarr = new int[.] {1, 2, 3, 4}; // Java styleValType [.] V = new ValType[N] (point[i])

{ return ValType(i);}; // explicit init

Page 48: X10 Tutorial x10.sf

X10 Tutorial

© 2007 IBM Corporation48

IBM

Res

earc

h

Initialization of value arrays

Initialization of value arrays requires an initializer.

Value array of reference to mutable objects:

RefType value [.] farr = new value RefType [N]; // compile-time error, init required

RefType value [.] farr = new value RefType [N] (point[i]) { return new Foo(); }

Value array of ‘reference to value objects’ (value object can be inlined)

int value [.] iarr = new value int[.] {1, 2, 3, 4}; // Java style init

ValType value [.] iarr = new value ValType[N] (point[i]) { return ValType(i); }; // explicit init

Page 49: X10 Tutorial x10.sf

X10 Tutorial

© 2007 IBM Corporation49

IBM

Res

earc

h

foreach

foreach (point p: R) S Creates |R| async statements in parallel at current place.

Termination of all (recursively created) activities can be ensured with finish.

finish foreach is a convenient way to achieve master-slave fork/join parallelism (OpenMP programming model)

foreach ( FormalParam: Expr ) Stmt

for (point p: R) async { S }

foreach (point p:R) S

Page 50: X10 Tutorial x10.sf

X10 Tutorial

© 2007 IBM Corporation50

IBM

Res

earc

h

Summing up elements of an array in parallel

See ArraySum.x10

When P is small, local sums can be summed up serially.

Page 51: X10 Tutorial x10.sf

X10 Tutorial

© 2007 IBM Corporation51

IBM

Res

earc

h

51

Iterative Averaging with a 1-D stencil

Problem Statement– Initialize a 1-D array, A[0:n+1] with A[0:n] = 0 & A[n+1] = n+1– A[0] = 0 and A[n+1] = n+1 are fixed boundary conditions– Iteratively compute new values for A[1:n] by averaging the values of the two

neighboring elements– Terminate when the sum of element changes is less than a given threshold

Acknowledgment– Example and codes for C + MPI, UPC, CAF versions were provided by Steve

Deitz and Brad Chamberlain from Cray, {deitz,bradc}@cray.com

Page 52: X10 Tutorial x10.sf

X10 Tutorial

© 2007 IBM Corporation52

IBM

Res

earc

h

1-D stencil: parallelism within a place

See Stencil1D.

Uses x10.util.dist.Distribution.block(R, P) to block-divide a region into P regions.

Exercise: Parallelize Pascal’s triangle.

Page 53: X10 Tutorial x10.sf

X10 Tutorial

© 2007 IBM Corporation53

IBM

Res

earc

h

Outline

1. What is X10?• background, status

2. Basic X10 (single place)• async, finish, atomic• future, force

3. Basic X10 (arrays & loops)• points, rectangular regions,

arrays• for, foreach

4. Scalable X10 (multiple places)• places, distributions,

distributed arrays, ateach, BadPlaceException

5. Clocks• creation, registration, next,

resume, drop, ClockUseException

6. Basic serial constructs that differ from Java• const, nullable, extern

7. Advanced topics• Value types, conditional atomic

sections (when), general regions & distributions

• Refer to language spec for details

Page 54: X10 Tutorial x10.sf

X10 Tutorial

© 2007 IBM Corporation54

IBM

Res

earc

h

Limitations of using a Single Place

Activity Stacks (S)

Shared Heap (H)

Largest deployment granularity for a single place is a single SMP– Smallest granularity can be a

single CPU or even a single hardware thread

Single SMP is inadequate for solving problems with large memory and compute requirements

X10 solution: incorporate multiple places as a core foundation of the X10 programming model

Enable deployment on large-scale clustered machines, with integrated support for intra-place parallelism

Storage classes: Immutable Data (I) Shared Heap (H) Activity Stacks (S)

Immutable Data (I) -- final variables,

value type instances

LocallySynchronous

(coherent access to intra-place shared heap)

. . .

Activities

Pla

ce

0

Page 55: X10 Tutorial x10.sf

X10 Tutorial

© 2007 IBM Corporation55

IBM

Res

earc

h

What is Partitioned Global Address Space (PGAS)?

Computation is performed in multiple places.

A place contains data that can be operated on remotely.

Data lives in the place it was created, for its lifetime.

A datum in one place may reference a datum in another place.

Data-structures (e.g. arrays) may be distributed across many places.

Places may have different computational properties (e.g. PPE, SPE, …).

A place expresses locality.

Address Space

Shared Memory

OpenMP

PGAS

UPC, CAF, X10Message passing

MPI

Process/Thread

Page 56: X10 Tutorial x10.sf

X10 Tutorial

© 2007 IBM Corporation56

IBM

Res

earc

h

What is Asynchronous PGAS?

Asynchrony– Simple explicitly concurrent

model for the user: async (p) S runs statement S “in parallel” at place p

– Controlled through finish, and local (conditional) atomic

Used for active messaging (remote asyncs), DMAs, fine-grained concurrency, fork/join concurrency, do-all/do-across parallelism– SPMD is a special case

Concurrency is made explicit and programmable.

Page 57: X10 Tutorial x10.sf

X10 Tutorial

© 2007 IBM Corporation57

IBM

Res

earc

h

Places in X10

place.MAX_PLACES = total number of places (runtime constant) place.places = value array of all places in an X10 place.factory.place(i) = place corresponding to index i here = place in which current activity is executing <place-expr>.toString() returns a string of the form “place(id=99)” <place-expr>.id returns the id of the place

X10 Places

Processes

X10 programs defines mapping from X10 objects to X10 places, and abstract

performance metrics on places

X10 Data Structures

Future X10 deployment system will define mapping from X10 places to processes, and

processes to nodes.

Nodes

Page 58: X10 Tutorial x10.sf

X10 Tutorial

© 2007 IBM Corporation58

IBM

Res

earc

h

Specifying number of places in X10DT

Page 59: X10 Tutorial x10.sf

X10 Tutorial

© 2007 IBM Corporation59

IBM

Res

earc

h

Locality rule

An activity executing an atomic block may access only local data.

Locality of data determined through types.– Data of type Foo! is

potentially remote.– Data of type Foo!p is at place

p.– Data of type Foo is local.

Data may be cast to local type. Failure throws BadPlaceException.

Activity may access remote data synchronously (outside an atomic block).– Write (update) access is as

if performed in a finish/async.

– Read access is as if performed in future/force.

Programmer may always use explicit asyncs to improve locality of computation.

Current implementation limitation

Place types not yet supported.

Atomic block does not check array accesses are place local.

Page 60: X10 Tutorial x10.sf

X10 Tutorial

© 2007 IBM Corporation60

IBM

Res

earc

h

Distributions in X10

A distribution maps every point in a region to a place.

Creating distributions (x10.lang.dist):– dist D1 = dist.factory.constant(R, here); // local distribution

– maps region R to here– dist D2 = dist.factory.block(R); // blocked distribution – dist D3 = dist.factory.cyclic(R); // cyclic distribution– dist D4 = dist.factory.unique(); // identity map on

[0:MAX_PLACES-1]

Page 61: X10 Tutorial x10.sf

X10 Tutorial

© 2007 IBM Corporation61

IBM

Res

earc

h

async and future with explicit place specifier

async (P) S Creates new activity to execute statement S at place P async S is equivalent to async (here) S

future (P) { E } Create new activity to evaluate expression E at place P future { E } is equivalent to future (here) { E }

Note that here in a child activity for an async/future computation will refer to the place P at which the child activity is executing, not the place where the parent activity is executing

Specify the destination place for async/future activities so as to obey the Locality rule e.g.,

async (O.location) O.x = 1;future<int> F = future (A.distribution[i]) { A[i] } ;

Page 62: X10 Tutorial x10.sf

X10 Tutorial

© 2007 IBM Corporation62

IBM

Res

earc

h

Inter-place communication using async and future

Question: how to assign A[i] = B[j], when A[i] and B[j] may be in different places?

Answer #1: Use nested async:finish async ( B.distribution[j] ) { final int bb = B[j]; async ( A.distribution[i] ) A[i] = bb;}

Answer #2: Use future-force and an async:final int b = future (B.distribution[j]) { B[j] }.force();

finish async ( A.distribution[i] ) A[i] = b;

Page 63: X10 Tutorial x10.sf

X10 Tutorial

© 2007 IBM Corporation63

IBM

Res

earc

h

ateach (distributed parallel iteration)

ateach (point p:D) S Creates |D| async statements in parallel at place specified by

distribution.

Termination of all (recursively created) activities with finish. ateach is a convenient construct for writing parallel matrix code

that is independent of the underlying distribution, e.g.,

SPMD computation:

ateach ( FormalParam: Expr ) Stmt

for (point p:D.region) async (D[p]) { S }

ateach (point p:D) S

ateach ( point p : A.distribution ) A[p] = f(B[p], C[p], D[p]) ;

finish ateach( point[i] : dist.factory.unique() ) S

Page 64: X10 Tutorial x10.sf

X10 Tutorial

© 2007 IBM Corporation64

IBM

Res

earc

h

Example: ateach (TutAteach1)

public class TutAteach1 { public static void main(String args[]) { finish ateach (point p: dist.factory.unique()) { System.out.println("Hello from " + here.id); } } // main()}

unique distribution: maps point i in region [0 : place.MAX_PLACES-1] to place place.factory.place(i).Console output:

Hello from 1 Hello from 0 Hello from 3 Hello from 4

Page 65: X10 Tutorial x10.sf

X10 Tutorial

© 2007 IBM Corporation65

IBM

Res

earc

h

AllDistribution

See AllDistributionP2P.x10

See AllDistributionP2PRemoteAccess.x10

Page 66: X10 Tutorial x10.sf

X10 Tutorial

© 2007 IBM Corporation66

IBM

Res

earc

h

Butterfly communication

Page 67: X10 Tutorial x10.sf

X10 Tutorial

© 2007 IBM Corporation67

IBM

Res

earc

h

Outline

1. What is X10?• background, status

2. Basic X10 (single place)• async, finish, atomic• future, force

3. Basic X10 (arrays & loops)• points, rectangular regions,

arrays• for, foreach

4. Scalable X10 (multiple places)• places, distributions,

distributed arrays, ateach, BadPlaceException

5. Clocks• creation, registration, next,

resume, drop, ClockUseException

6. Basic serial constructs that differ from Java• const, nullable, extern

7. Advanced topics• Value types, conditional atomic

sections (when), general regions & distributions

• Refer to language spec for details

Page 68: X10 Tutorial x10.sf

X10 Tutorial

© 2007 IBM Corporation68

IBM

Res

earc

h

Clocks: Motivation

Activity coordination using finish and force() is accomplished by checking for activity termination

But in many cases activities have a producer-consumer relationship and a “barrier”-like coordination is needed without waiting for activity termination– The activities involved may be in the same place or in different places

Design clocks to offer determinate and deadlock-free coordination between a dynamically varying number of activities.

Activity 0 Activity 1 Activity 2 . . .

Phase 0

Phase 1

. . .

Page 69: X10 Tutorial x10.sf

X10 Tutorial

© 2007 IBM Corporation69

IBM

Res

earc

h

Clocks (1/2)

clock c = clock.factory.clock(); Allocate a clock, register current activity with it. Phase 0 of c starts.

async(…) clocked (c1,c2,…) Sateach(…) clocked (c1,c2,…) Sforeach(…) clocked (c1,c2,…) S Create async activities registered on clocks c1, c2, …

c.resume(); Nonblocking operation that signals completion of work by current

activity for this phase of clock c

next; Barrier --- suspend until all clocks that the current activity is registered

with can advance. c.resume() is first performed for each such clock, if needed.

Next can be viewed like a “finish” of all computations under way in the current phase of the clock

Page 70: X10 Tutorial x10.sf

X10 Tutorial

© 2007 IBM Corporation70

IBM

Res

earc

h

Clocks (2/2)

c.drop(); Unregister with c. A terminating activity will implicitly drop all clocks that

it is registered on.

c.registered() Return true iff current activity is registered on clock c c.dropped() returns the opposite of c.registered()

ClockUseException Thrown if an activity attempts to transmit or operate on a clock that it is

not registered on Or if an activity attempts to transmit a clock in the scope of a finish

Page 71: X10 Tutorial x10.sf

X10 Tutorial

© 2007 IBM Corporation71

IBM

Res

earc

h

Semantics

Static semantics– An activity may operate only on those clocks it is registered with.– In finish S,S may not contain any (top-level) clocked asyncs.

Dynamic semantics– A clock c can advance only when all its registered activities have

executed c.resume().– An activity may not pass-on clocks on which it is not live to sub-

activities. – An activity is deregistered from a clock when it terminates

Page 72: X10 Tutorial x10.sf

X10 Tutorial

© 2007 IBM Corporation72

IBM

Res

earc

h

Example (TutClock1.x10)finish async { final clock c = clock.factory.clock(); foreach (point[i]: [1:N]) clocked (c) { while ( true ) { int old_A_i = A[i]; int new_A_i = Math.min(A[i],B[i]); if ( i > 1 ) new_A_i = Math.min(new_A_i,B[i-1]); if ( i < N ) new_A_i = Math.min(new_A_i,B[i+1]); A[i] = new_A_i; next; int old_B_i = B[i]; int new_B_i = Math.min(B[i],A[i]); if ( i > 1 ) new_B_i = Math.min(new_B_i,A[i-1]); if ( i < N ) new_B_i = Math.min(new_B_i,A[i+1]);

B[i] = new_B_i; next; if ( old_A_i == new_A_i && old_B_i == new_B_i ) break; } // while } // foreach } // finish async

parent transmits clock to child

exiting from while loop terminates activity for iteration i, and automatically deregisters activity from clock

Page 73: X10 Tutorial x10.sf

X10 Tutorial

© 2007 IBM Corporation73

IBM

Res

earc

h

clock phases

hierarchical static concurrency model

... ...

activities(foreach, finish)

dynamic concurrency model

...

...

Example (TutClock1.x10)

Page 74: X10 Tutorial x10.sf

X10 Tutorial

© 2007 IBM Corporation74

IBM

Res

earc

h

Clock safety

An activity may be registered on one or more clocks Clock c can advance only when all activities registered

with the clock have executed c.resume().

Runtime invariant: Clock operations are guaranteed to be deadlock-free and determinate.

Page 75: X10 Tutorial x10.sf

X10 Tutorial

© 2007 IBM Corporation75

IBM

Res

earc

h

Deadlock freedom

Where is this useful?– Whenever synchronization

pattern of a program is independent of the data read by the program

– True for a large majority of HPC codes.

– (Usually not true of reactive programs.)

Central theorem of X10:– Arbitrary programs with

async, atomic, finish (and clocks) are deadlock-free.

Key intuition:– atomic is deadlock-free.– finish has a tree-like

structure.– clocks are made to satisfy

conditions which ensure tree-like structure.

– Hence no cycles in wait-for graph.

Page 76: X10 Tutorial x10.sf

X10 Tutorial

© 2007 IBM Corporation76

IBM

Res

earc

h

Examples using clocks

See AllReductionBarrier.

See Streams, specifically Sieve of Eratosthenes, Fib.

Exercise: Try Pascal’s triangle.

Page 77: X10 Tutorial x10.sf

X10 Tutorial

© 2007 IBM Corporation77

IBM

Res

earc

h

Outline

1. What is X10?• background, status

2. Basic X10 (single place)• async, finish, atomic• future, force

3. Basic X10 (arrays & loops)• points, rectangular regions,

arrays• for, foreach

4. Scalable X10 (multiple places)• places, distributions,

distributed arrays, ateach, BadPlaceException

5. Clocks• creation, registration, next,

resume, drop, ClockUseException

6. Basic serial constructs that differ from Java• const, nullable, extern

7. Advanced topics• Value types, conditional atomic

sections (when), general regions & distributions

• Refer to language spec for details

Page 78: X10 Tutorial x10.sf

X10 Tutorial

© 2007 IBM Corporation78

IBM

Res

earc

h

when

when (E) S – Activity suspends until a state in which

the guard E is true. – In that state, S is executed atomically

and in isolation.

Guard E – boolean expression– must be nonblocking– must not create concurrent activities

(sequential)– must not access remote data (local)– must not have side-effects (const)

await (E)– syntactic shortcut for when (E) ;

Stmt ::= WhenStmtWhenStmt ::= when ( Expr ) Stmt | WhenStmt or (Expr) Stmt

class OneBuffer { nullable<Object> datum = null; boolean filled = false; void send(Object v) { when ( ! filled ) { datum = v; filled = true; } } Object receive() { when ( filled ) { Object v = datum; datum = null; filled = false; return v; } }}

Page 79: X10 Tutorial x10.sf

X10 Tutorial

© 2007 IBM Corporation79

IBM

Res

earc

h

Static semantics of guard for when / await

boolean field boolean expression with field access or constant values

class BufferBuffer { .. void send(Object v) { when (size() < MAX_SIZE) { datum = v; filled = true; } } ...}

compile-time error

Page 80: X10 Tutorial x10.sf

X10 Tutorial

© 2007 IBM Corporation80

IBM

Res

earc

h

Semaphores

class Semaphore { private boolean taken; void p() { when (!taken) taken = true; } atomic void v() { taken = false; }}

Page 81: X10 Tutorial x10.sf

X10 Tutorial

© 2007 IBM Corporation81

IBM

Res

earc

h

Value types : immutable instances

value class – Can only extend value class

or x10.lang.Object. – All fields are implicitly final – Can only be extended by

value classes.– May contain fields with

reference type.– May be implemented by

reference or copy.

Values are equal (==) if their fields are equal, recursively.

public value complex { double im, re; public complex(double im, double re) { this.im = im; this.re = re; } public complex add(complex a) { return new complex(im+a.im, re+a.re); } … }

Page 82: X10 Tutorial x10.sf

X10 Tutorial

© 2007 IBM Corporation82

IBM

Res

earc

h

Nullable Types

Unlike Java, reference types in X10 do not contain the value null by default.– A method invocation on a

variable of a reference type cannot throw an NPE.

The nullable type constructor can be used to add null to a type:

– nullable<Foo> : values of this type are references to Foo or null.

– nullable<nullable<Foo>> is the same as nullable<Foo>

Nullable may be applied to value types as well.– nullable<int> : value is null or

an int.

Page 83: X10 Tutorial x10.sf

X10 Tutorial

© 2007 IBM Corporation83

IBM

Res

earc

h

Dependent types

Classes and interfaces may define properties– final instance fields.

Types may contain a where clause: Foo(:c)– c is a condition, a conjunction

of equalities.– c may reference final

variables in environment or properties of Foo.

– c may reference special variable self, of type Foo.

Dependent types may be used everywhere where types are used.– Values may be cast to

dependent types.– Code is generated to check

values of properties at runtime.

Examples– foo(:place==p)– region(:rank==2)– dist(:region==r)

Page 84: X10 Tutorial x10.sf

IBM Research: Software Technology

© 2005 IBM Corporation84

Pro

gram

min

g T

echn

olog

ies

X10 Summary

Page 85: X10 Tutorial x10.sf

X10 Tutorial

© 2007 IBM Corporation85

IBM

Res

earc

h

X10 v1.5 Language Summary

async [(Place)] [clocked(c…)] Stm– Run Stm asynchronously at Place

finish Stm– Execute Stm, wait for all asyncs to terminate

Region– Collection of index points, e.g.

region r = [1:N,1:M]; foreach ( point P : Reg) Stm

– Run Stm asynchronously for each point in region

Distribution– Mapping from region to places, e.g.

• dist d = dist.factory.block(r); ateach ( point P : Dist) Stm

– Run Stm asynchronously for each point in dist, in its place.

new T– Allocate object at this place (here)

atomic Stm – Execute Stm atomically

next– suspend till all clocks that the current activity is

registered with can advance– Clocks are a generalization of barriers and MPI

communicators future [(Place)] [clocked(c…)] Expr

– Compute Expr asynchronously at Place F. force()

– Block until future F has been computed atomic Stm

– Execute Stm atomically

extern– Lightweight interface to native code

Deadlock safety: any X10 program written with async, atomic, finish, foreach, ateach, and clocks can never deadlock

Page 86: X10 Tutorial x10.sf

X10 Tutorial

© 2007 IBM Corporation86

IBM

Res

earc

h

x10.lang standard library

Java package with “built in” classes that provide support for selected X10 constructs

Standard types– boolean, byte, char, double, float, int, long, short, String

x10.lang.Object -- root class for all instances of X10 objects x10.lang.clock --- clock instances & clock operations x10.lang.dist --- distribution instances & distribution operations x10.lang.place --- place instances & place operations x10.lang.point --- point instances & point operations x10.lang.region --- region instances & region operations

All X10 programs implicitly import the x10.lang.* package, so the x10.lang prefix can be omitted when referring to members of x10.lang.* classes

e.g., place.MAX_PLACES, dist.factory.block([0:100,0:100]), …

Similarly, all X10 programs also implicitly import the java.lang.* package e.g., X10 programs can use Math.min() and Math.max() from java.lang In case of conflict (e.g. Integer), user must import the desired one explicitly, e.g.

import java.lang.Integer;

Page 87: X10 Tutorial x10.sf

X10 Tutorial

© 2007 IBM Corporation87

IBM

Res

earc

h

Rectangular regions

A rectangular region is the set of points contained in a rectangular subspace

A region variable can hold values of different ranks e.g., – region R; R = [0:10]; … R = [-100:100, -100:100]; … R = [0:-1]; …

Operations– R.rank ::= # dimensions in region; – R.size() ::= # points in region– R.contains(P) ::= predicate if region R contains point P– R.contains(S) ::= predicate if region R contains region S– R.equal(S) ::= true if region R equals region S– R.rank(i) ::= projection of region R on dimension i (a one-dimensional region)– R.rank(i).low() ::= lower bound of ith dimension of region R– R.rank(i).high() ::= upper bound of ith dimension of region R– R.ordinal(P) ::= ordinal value of point P in region R– R.coord(N) ::= point in region R with ordinal value = N– R1 && R2 ::= region intersection (will be rectangular if R1 and R2 are rectangular)– R1 || R2 ::= union of regions R1 and R2 (may not be rectangular)– R1 – R2 ::= region difference (may not be rectangular)

Page 88: X10 Tutorial x10.sf

X10 Tutorial

© 2007 IBM Corporation88

IBM

Res

earc

h

88

X10 Cheat Sheet: Regions & Distributions

Region: Expr : Expr -- 1-D region [ Range, …, Range ] -- Multidimensional Region Region && Region -- Intersection Region || Region -- Union Region – Region -- Set difference BuiltinRegion

Dist: Region -> Place -- Constant distribution Distribution | Place -- Restriction Distribution | Region -- Restriction Distribution || Distribution -- Union Distribution – Distribution -- Set difference Distribution.overlay ( Distribution ) BuiltinDistribution

Page 89: X10 Tutorial x10.sf

X10 Tutorial

© 2007 IBM Corporation89

IBM

Res

earc

h

X10 runtime parameters

-NUMBER_OF_LOCAL_PLACES=int The number of places (default = 1)

-INIT_THREADS_PER_PLACE=int Initial number of Java threads per single place(default =2)

-ABSTRACT_EXECUTION_STATS=false Dump out parallel execution statistics

-ABSTRACT_EXECUTION_TIMES=false If dumping statistics also dump out unblocked exec times

-BIND_THREADS=false Use platform-specific calls to bind Java threads to CPUs.

-BIND_THREADS_DIAGNOSTICS=false Print diagnostics related to platform-specific calls to

bind Java threads to CPUs.

-BAD_PLACE_RUNTIME_CHECK=false Perform runtime place checks

-MAIN_CLASS_NAME=null The name of the main class

-OPTIMIZE_FOREACH=false Experimental: Enable runtime loop optimizations.

-PRELOAD_CLASSES=false Pre-load all classes on recursively startup

-PRELOAD_STRINGS=false If pre-loading classes, also pre-load all strings

-LOAD=null Load specified shared library.

Page 90: X10 Tutorial x10.sf

X10 Tutorial

© 2007 IBM Corporation90

IBM

Res

earc

h

X10 preferences

Page 91: X10 Tutorial x10.sf

X10 Tutorial

© 2007 IBM Corporation91

IBM

Res

earc

h

Specifying X10 runtime parameters