An HPspmd Programming Model Bryan Carpenter NPAC at Syracuse University Syracuse, NY 13244 [email protected]


Post on 15-Jan-2016

Page 1: An HPspmd Programming Model

An HPspmd Programming Model

Bryan Carpenter

NPAC at Syracuse University

Syracuse, NY 13244 [email protected]

Page 2: An HPspmd Programming Model

Goals of this lecture

Motivate a parallel programming model that combines data-parallel features from HPF with an explicitly SPMD programming style.

Review in detail a specific HPspmd language called HPJava.

Page 3: An HPspmd Programming Model

Contents of Lecture

Introduction.
HPspmd language extensions.
Integration of high-level libraries.

HPJava.
Processes and distributed arrays.
Mapping arrays.
Array sections.
Rules and definitions.
A distributed array communication library.

Page 4: An HPspmd Programming Model

HPF status

Standard is more than 6 years old.

Many companies involved in the HPF forum are no longer in business; many of those remaining abandoned their HPF projects.

Problems:

Language too complex: robust compilers are very difficult to implement.

Perception that the language is inflexible: limited demand from application developers.

Most parallel applications are still developed in direct SPMD style, using MPI, etc.

Page 5: An HPspmd Programming Model

High-level SPMD libraries

While the HPF language hit problems, various data-parallel SPMD libraries have been deployed:

ScaLAPACK
PETSc
KeLP
Global Array Toolkit
PARTI/CHAOS
Adlib

Higher-level libraries support programming with distributed arrays in an essentially MPI-like environment.

Page 6: An HPspmd Programming Model

Idea of HPspmd

The library approach to distributed arrays clearly works, but lacks the uniformity and elegance of data-parallel languages. No unifying framework.

Can we take a minimal subset of the ideas from HPF (a unified syntax for distributed arrays) to make the library-based SPMD approach more attractive?

Page 7: An HPspmd Programming Model

Features of HPspmd

Adopts ideas, run-time technologies and some compilation techniques from HPF.

Abandons:
single, logical, global thread of control,
compiler-determined placement of computations,
compiler-generated, automatic insertion of communications.

Left with:
explicitly MIMD (SPMD) programming model,
syntax for representing distributed arrays,
syntax for expressing placement of computation.

Page 8: An HPspmd Programming Model

Benefits

Translators are much easier to implement than HPF compilers. No compiler magic needed.

Attractive framework for library development, avoiding inconsistent parametrizations of distributed array arguments.

Better prospects for handling irregular problems: easier to fall back on specialized libraries as required.

Ultimate fall-back: can directly call MPI functions from within an HPspmd program.

Page 9: An HPspmd Programming Model

Language extensions

HPspmd languages are extended from standard base languages (Fortran, C++, Java, . . .).

A program (fragment) that doesn't use the extensions should be executed exactly as an SPMD program: in independent processes with their own threads of control.

Distributed array types added. Strictly separate from sequential arrays of the base language; no attempt to conceal the distinction.

Distributed control constructs added. Most important is a distributed, data-parallel loop.

Page 10: An HPspmd Programming Model

An HPspmd program

Procs p = new Procs2(P, P);
on(p) {
  Range x = new ExtBlockRange(N, p.dim(0), 1);
  Range y = new ExtBlockRange(N, p.dim(1), 1);

  float [[,]] u = new float [[x, y]];

  . . . some code to initialize ‘u’

  for (int iter = 0; iter < NITER; iter++) {
    Adlib.writeHalo(u);

    overall (i = x for 1 : N-2)
      overall (j = y for 1 + (i` + iter) % 2 : N-2 : 2)
        u[i, j] = 0.25 * (u[i-1, j] + u[i+1, j] + u[i, j-1] + u[i, j+1]);
  }
}

Page 11: An HPspmd Programming Model

HPspmd Architecture

Page 12: An HPspmd Programming Model

HPJava

Language for parallel programming.

Extends Java with syntax for manipulating distributed arrays.

Implements the HPspmd model: independent processes executing the same program, sharing elements of distributed arrays.

Processes operate directly on locally owned elements. Explicit communication is needed in the program to permit access to elements owned by other processes.

Page 13: An HPspmd Programming Model

Processes and Process Grids

An HPJava program is started concurrently in some set of processes.

Processes are named through grid objects:

  Procs p = new Procs2(2, 3);

Assumes the program is currently executing on 6 or more processes.

Restrict execution to processes within the grid by the on construct:

  on(p) { . . . }

Page 14: An HPspmd Programming Model

Basic use of grids

HPJava program:

Procs p = new Procs2(2, 3);
on(p) {
  Dimension d = p.dim(0), e = p.dim(1);

  System.out.println("My coordinates are (" + d.crd() + ", " + e.crd() + ")");
}

Sample output:

My coordinates are (0, 2)

My coordinates are (1, 2)

My coordinates are (0, 0)

My coordinates are (1, 0)

My coordinates are (1, 1)

My coordinates are (0, 1)

Page 15: An HPspmd Programming Model

Distributed Arrays in HPJava

Many differences between distributed arrays and ordinary arrays of Java. New kind of container class with special syntax.

Type signatures and constructors use double brackets to emphasize the distinction:

Procs2 p = new Procs2(2, 3);
on(p) {
  Range x = new BlockRange(N, p.dim(0));
  Range y = new BlockRange(N, p.dim(1));

  float [[,]] a = new float [[x, y]];
  . . .
}

Page 16: An HPspmd Programming Model

2-dimensional array distributed over p

Page 17: An HPspmd Programming Model

Parallel programming

Matrix addition:

Procs2 p = new Procs2(2, 3);
on(p) {
  Range x = new BlockRange(N, p.dim(0));
  Range y = new BlockRange(N, p.dim(1));

  float [[,]] a = new float [[x, y]],
              b = new float [[x, y]],
              c = new float [[x, y]];

  . . . initialize values in ‘a’, ‘b’

  overall (i = x for :)
    overall (j = y for :)
      c[i, j] = a[i, j] + b[i, j];
}

Page 18: An HPspmd Programming Model

The overall construct

Second special control construct (after on): a distributed parallel loop.

General form parametrized by index triplet:

overall (i = x for l : u : s) { . . . }

l = lower bound, u = upper bound, s = step. All indices must be within range x.

Special forms:

overall (i = x for l : u) { . . . }

stride defaults to 1, and:

overall (i = x for :) { . . . }

lower bound = 0, upper bound = x.size() - 1.

Page 19: An HPspmd Programming Model

A parallel stencil update program

float [[,]] u = new float [[x, y]];

. . . initialize values in ‘u’

float [[,]] n = new float [[x, y]], s = new float [[x, y]],
            e = new float [[x, y]], w = new float [[x, y]];

Adlib.shift(n, u, 1, 0);
Adlib.shift(s, u, -1, 0);
Adlib.shift(e, u, 1, 1);
Adlib.shift(w, u, -1, 1);

overall (i = x for 1 : N - 2)
  overall (j = y for 1 : N - 2)
    u[i, j] = 0.25 * (n[i, j] + s[i, j] + e[i, j] + w[i, j]);

Page 20: An HPspmd Programming Model

Shift communication

As advertised, communication goes through a library call.

Use a binding of the Adlib function, shift:

  void shift(float [[,]] dst, float [[,]] src, int amount, int dimension);

Destination and source arrays must be identically aligned.

Implements “edge-off” shift.

Overloaded to apply to different array ranks and types.
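The "edge-off" behaviour can be illustrated on an ordinary one-dimensional Java array. This is only a sequential sketch, not the Adlib implementation: the class `EdgeOffShift` and its `shift` method are hypothetical, and the direction convention (`dst[i]` receives `src[i - amount]`) is an assumption, since the slide does not say which way a positive amount moves the data.

```java
import java.util.Arrays;

public class EdgeOffShift {
    // Hypothetical helper: copies src into dst displaced by `amount`.
    // Values shifted past either edge are dropped ("edge-off"), and
    // destination elements that receive nothing keep their old values.
    static void shift(float[] dst, float[] src, int amount) {
        for (int i = 0; i < dst.length; i++) {
            int from = i - amount;            // assumed direction convention
            if (from >= 0 && from < src.length) {
                dst[i] = src[from];
            }
        }
    }

    public static void main(String[] args) {
        float[] src = {1, 2, 3, 4};
        float[] dst = {-1, -1, -1, -1};
        shift(dst, src, 1);
        // dst[0] has no source element, so it keeps its old value.
        System.out.println(Arrays.toString(dst));
    }
}
```

In the distributed version the copy between neighbouring blocks is what requires communication; the element-wise rule stays the same.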

Page 21: An HPspmd Programming Model

About overall loop indexes

Why does the language demand use of shift? Could we just write:

overall (i = x for 1 : N - 2)
  overall (j = y for 1 : N - 2)
    u[i, j] = 0.25 * (u[i-1, j] + u[i+1, j] + u[i, j-1] + u[i, j+1]);

?

Generally, no. Symbols i, j are not integer loop indexes. They are distributed indexes.

Value of a distributed index is a location: an abstract element of a distributed range.

Page 22: An HPspmd Programming Model

Mapping of locations to grid

Page 23: An HPspmd Programming Model

Distributed indexes

Can only be declared in the header of an overall construct (or an at construct; see next slide).

No other location-valued variables (no Java type associated with a location).

In general a subscript used in a distributed array element reference must be a distributed index, whose value is a location in the associated range of the array.

Dramatically limits patterns of subscripting.

Page 24: An HPspmd Programming Model

The at construct

If a is a distributed array, generally cannot write:

  a [1, 4] = 73 ;

to assign an element. 1, 4 are not distributed indexes.

If x and y are the ranges of a, can write:

  at (i = x [1])
    at (j = y [4])
      a [i, j] = 73 ;

at is the final special control construct of HPJava. Similar to on: restricts execution of the body to processes holding the specified location.

Page 25: An HPspmd Programming Model

Relationship between overall and at

If s>0, the construct:

overall (i = x for l : u : s) {. . .}

is equivalent to

for (int n = l; n <= u; n += s) at (i = x [n]) {. . .}

If s<0, it is equivalent to

for (int n = l; n >= u; n += s) at (i = x [n]) {. . .}
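The equivalence can be mimicked in plain Java: every process runs the same integer loop, and the `at` test keeps only the indices whose location it holds. This is a sketch under stated assumptions: the class `OverallSketch`, the `owns` test and the block layout (blocks of ceil(N/P) indices) are hypothetical stand-ins, not HPJava's actual run-time.

```java
import java.util.ArrayList;
import java.util.List;

public class OverallSketch {
    // Assumed block distribution: process `crd` of `procs` owns one
    // contiguous block of ceil(n / procs) global indices.
    static boolean owns(int crd, int procs, int n, int global) {
        int block = (n + procs - 1) / procs;
        return global / block == crd;
    }

    // Global indices visited on process `crd` by a loop of the form
    // "overall (i = x for l : u : s)", written as for + at.
    static List<Integer> overall(int crd, int procs, int n, int l, int u, int s) {
        List<Integer> visited = new ArrayList<>();
        if (s > 0) {
            for (int g = l; g <= u; g += s)
                if (owns(crd, procs, n, g)) visited.add(g);   // at (i = x[g])
        } else if (s < 0) {
            for (int g = l; g >= u; g += s)
                if (owns(crd, procs, n, g)) visited.add(g);   // at (i = x[g])
        }
        return visited;
    }

    public static void main(String[] args) {
        // Range of size 8 over 2 processes, triplet 1 : 6 : 2.
        System.out.println(overall(0, 2, 8, 1, 6, 2));
        System.out.println(overall(1, 2, 8, 1, 6, 2));
    }
}
```

A real translator would presumably compute each process's local sub-range directly instead of testing every global index; the loop above only illustrates the semantics.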

Page 26: An HPspmd Programming Model

Global index expression

Inside the body of the construct:

at (i = x [n]) { . . . }

the expression i` stands for the integer value, n.

Most useful in overall. According to the equivalence in the previous slide, i` is then the global index value.

Page 27: An HPspmd Programming Model

A Complete example

Procs2 p = new Procs2(P, P);
on(p) {
  Range x = new BlockRange(N, p.dim(0));
  Range y = new BlockRange(N, p.dim(1));

  float [[,]] u = new float [[x, y]], r = new float [[x, y]];

  . . . Initialize ‘u’, ‘r’

  float [[,]] n = new float [[x, y]], s = new float [[x, y]],
              e = new float [[x, y]], w = new float [[x, y]];

  . . . Main loop

  Adlib.printArray(u);
}

Page 28: An HPspmd Programming Model

Initialize ‘u’, ‘r’

overall (i = x for :)
  overall (j = y for :)
    if (i` == 0 || i` == N - 1 || j` == 0 || j` == N - 1) {
      u[i, j] = (float) (i` * i` - j` * j`);
      r[i, j] = 0.0;
    }
    else
      u[i, j] = 0.0;

Page 29: An HPspmd Programming Model

Main loop

do {
  Adlib.shift(n, u, 1, 0);
  Adlib.shift(s, u, -1, 0);
  Adlib.shift(e, u, 1, 1);
  Adlib.shift(w, u, -1, 1);

  overall (i = x for 1 : N - 2)
    overall (j = y for 1 : N - 2) {
      float newU = 0.25 * (n[i, j] + s[i, j] + e[i, j] + w[i, j]);

      r[i, j] = Math.abs(newU - u[i, j]);
      u[i, j] = newU;
    }

} while(Adlib.maxval(r) > EPS);

Page 30: An HPspmd Programming Model

Load balancing: Mandelbrot set example

Set of complex numbers, c, such that the limit of the iteration:

  $z_1 = c$, $z_{i+1} = c + (z_i)^2$

has absolute value less than 2:

  $|z|^2 < 4$

Numerical computation of set: points outside the set are eliminated quickly; points inside or close to the set are computed for many iterations.

Page 31: An HPspmd Programming Model

Mandelbrot set computation

Procs2 p = new Procs2(2, 3);
on(p) {
  Range x = new BlockRange(N, p.dim(0));
  Range y = new BlockRange(N, p.dim(1));

  boolean [[,]] set = new boolean [[x, y]];

  overall (i = x for :)
    overall (j = y for :) {
      float cr = (4.0 * i` - 2 * N) / N;
      float ci = (4.0 * j` - 2 * N) / N;

      . . . Inner loop
    }

  Adlib.printArray(set);
}

Page 32: An HPspmd Programming Model

Inner loop

float zr = cr, zi = ci;  // assumed start z = c; the declaration is not shown on the slide

set[i, j] = false;
int k = 0;
while(zr * zr + zi * zi < 4.0) {
  if (k++ == CUTOFF) {
    set[i, j] = true;
    break;
  }

  // z = c + z * z

  float newr = cr + zr * zr - zi * zi;
  float newi = ci + 2 * zr * zi;

  zr = newr;
  zi = newi;
}
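The load imbalance comes from the very different iteration counts of individual points. A plain-Java sketch of the per-point work makes this visible; the class `MandelbrotPoint`, the method `iterations` and the starting value z = c are illustrative assumptions rather than part of the lecture's code.

```java
public class MandelbrotPoint {
    // Number of iterations of z = c + z * z performed for the point
    // c = (cr, ci) before |z|^2 reaches 4, capped at `cutoff`.
    static int iterations(float cr, float ci, int cutoff) {
        float zr = cr, zi = ci;               // assumed start: z = c
        int k = 0;
        while (k < cutoff && zr * zr + zi * zi < 4.0f) {
            float newr = cr + zr * zr - zi * zi;
            float newi = ci + 2 * zr * zi;
            zr = newr;
            zi = newi;
            k++;
        }
        return k;
    }

    public static void main(String[] args) {
        int cutoff = 100;
        // A point inside the set costs the full cutoff...
        System.out.println(iterations(0.0f, 0.0f, cutoff));
        // ...while a point far outside is rejected immediately.
        System.out.println(iterations(2.0f, 2.0f, cutoff));
    }
}
```

A process whose block lies mostly inside the set therefore does far more work than one whose block lies outside it.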

Page 33: An HPspmd Programming Model

Changing mapping of problem

Block distribution leads to poor load-balancing.

To go over to cyclic decomposition, just change

  Range x = new BlockRange(N, p.dim(0));
  Range y = new BlockRange(N, p.dim(1));

to

  Range x = new CyclicRange(N, p.dim(0));
  Range y = new CyclicRange(N, p.dim(1));
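The difference between the two mappings can be sketched as a pair of owner functions. The formulas below follow the usual HPF-style conventions (blocks of ceil(N/P) indices, round-robin dealing for cyclic); that HPJava's `BlockRange` and `CyclicRange` use exactly these formulas is an assumption, and the class and method names are hypothetical.

```java
public class Distributions {
    // Block: each process owns one contiguous chunk of ceil(n / procs) indices.
    static int blockOwner(int global, int n, int procs) {
        int block = (n + procs - 1) / procs;
        return global / block;
    }

    // Cyclic: indices are dealt out to processes round-robin.
    static int cyclicOwner(int global, int procs) {
        return global % procs;
    }

    public static void main(String[] args) {
        int n = 8, procs = 2;
        for (int g = 0; g < n; g++) {
            System.out.println(g + ": block -> " + blockOwner(g, n, procs)
                    + ", cyclic -> " + cyclicOwner(g, procs));
        }
    }
}
```

With the cyclic mapping, neighbouring points (which tend to have similar cost) land on different processes, which is why it evens out the Mandelbrot workload.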

Page 34: An HPspmd Programming Model

Block-wise decomposition of Mandelbrot set

Page 35: An HPspmd Programming Model

Cyclic decomposition of Mandelbrot set

Page 36: An HPspmd Programming Model

Using ghost regions

As discussed in the previous lecture, ghost regions are extremely useful in parallel stencil updates.

Usually in HPJava, distributed array subscripts must be distributed indexes.

Special syntax extension for subscripting arrays with ghost regions: shifted indexes allowed.

Page 37: An HPspmd Programming Model

Shifted indexes

If i is a distributed index, then:

  i ± expression

is a shifted index. Here expression is an integer, usually a small constant.

Assuming array a has suitable ghost regions, can write, say:

overall (i = x for 1 : N-2)
  overall (j = y for 1 : N-2)
    a[i, j] = 0.25 * (a[i-1, j] + a[i+1, j] + a[i, j-1] + a[i, j+1]);

Page 38: An HPspmd Programming Model

Creating arrays with ghost regions

No special syntax, but new range classes.

ExtBlockRange is a range class alignment-equivalent to BlockRange, but with ghost extensions.

Size of extensions specified in the constructor of the range object.

Page 39: An HPspmd Programming Model

Filling ghost regions

Ghost regions are not magic. They must be explicitly filled with values from (usually) neighboring processes.

Adlib has a collective communication operation, writeHalo, that does this.
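What filling a ghost region amounts to can be sketched sequentially for a one-dimensional block distribution with ghost width 1. The layout (one Java array per process, with one ghost cell at each end) and the names `HaloSketch` and `fillGhosts` are hypothetical; the real writeHalo is a collective operation that moves these values between processes.

```java
import java.util.Arrays;

public class HaloSketch {
    // Each row of `local` is one process's segment, laid out as
    // [left ghost, owned cells..., right ghost]. Ghost cells are filled
    // from the adjacent owned cell of the neighbouring segment; the outer
    // edges have no neighbour and are left untouched.
    static void fillGhosts(float[][] local) {
        int p = local.length;
        for (int q = 0; q < p; q++) {
            int lastOwned = local[q].length - 2;
            if (q > 0) {
                float[] left = local[q - 1];
                local[q][0] = left[left.length - 2];
            }
            if (q < p - 1) {
                local[q][lastOwned + 1] = local[q + 1][1];
            }
        }
    }

    public static void main(String[] args) {
        // Global array 1 2 3 4 split over two processes.
        float[][] local = { {0, 1, 2, 0}, {0, 3, 4, 0} };
        fillGhosts(local);
        System.out.println(Arrays.toString(local[0]));
        System.out.println(Arrays.toString(local[1]));
    }
}
```

After the fill, each segment can evaluate a nearest-neighbour stencil on its owned cells without any further communication.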

Page 40: An HPspmd Programming Model

Laplace equation using ghost regions

Procs2 p = new Procs2(P, P);
on(p) {
  Range x = new ExtBlockRange(N, p.dim(0), 1, 1);
  Range y = new ExtBlockRange(N, p.dim(1), 1, 1);

  float [[,]] a = new float [[x, y]];

  … Set boundary values of ‘a’

  … Main loop
}

Page 41: An HPspmd Programming Model

Main loop

float [[,]] b = new float [[x, y]], r = new float [[x, y]];
do {
  Adlib.writeHalo(a);

  overall (i = x for 1 : N-2)
    overall (j = y for 1 : N-2) {
      b[i, j] = 0.25 * (a[i-1, j] + a[i+1, j] + a[i, j-1] + a[i, j+1]);

      r[i, j] = Math.abs(b[i, j] - a[i, j]);
    }

  HPspmd.copy(a, b);

} while(Adlib.maxval(r) > EPS);

Page 42: An HPspmd Programming Model

Red-black version

float [[,]] r = new float [[x, y]];
HPspmd.init(r, 0.0);

int iter = 0;
do {
  Adlib.writeHalo(a);

  overall (i = x for 1 : N-2)
    overall (j = y for 1 + (i` + iter) % 2 : N-2 : 2) {

      float newA = 0.25 * (a[i-1, j] + a[i+1, j] + a[i, j-1] + a[i, j+1]);

      r[i, j] = Math.abs(newA - a[i, j]);

      a[i, j] = newA;
    }

  iter++;
} while(Adlib.maxval(r) > EPS);
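The lower bound `1 + (i + iter) % 2` with step 2 selects one colour of the checkerboard per iteration: only interior points with (i + j + iter) odd are updated. A sequential Java sketch of one such half-sweep on an ordinary array (the class `RedBlackSketch` and method `halfSweep` are hypothetical names, and the residual array is omitted for brevity):

```java
public class RedBlackSketch {
    // One half-sweep over the interior of an n x n grid. For a given `iter`
    // only points with (i + j + iter) odd are updated, so every update reads
    // neighbours of the other colour, which this half-sweep does not modify.
    static void halfSweep(float[][] a, int iter) {
        int n = a.length;
        for (int i = 1; i <= n - 2; i++) {
            for (int j = 1 + (i + iter) % 2; j <= n - 2; j += 2) {
                a[i][j] = 0.25f * (a[i - 1][j] + a[i + 1][j]
                                 + a[i][j - 1] + a[i][j + 1]);
            }
        }
    }

    public static void main(String[] args) {
        float[][] a = new float[4][4];
        a[0][1] = 4.0f;          // a single non-zero boundary value
        halfSweep(a, 0);         // iter 0 skips a[1][1]
        halfSweep(a, 1);         // iter 1 updates a[1][1] from its neighbours
        System.out.println(a[1][1]);
    }
}
```

Because the neighbours of an updated point are never written in the same half-sweep, the in-place update is safe to run in parallel.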

Page 43: An HPspmd Programming Model

Conway’s Life using ghost regions

int mode [] = {Adlib.CYCL, Adlib.CYCL};

Procs2 p = new Procs2(P, P);

on(p) {
  Range x = new ExtBlockRange(N, p.dim(0), 1, 1);
  Range y = new ExtBlockRange(N, p.dim(1), 1, 1);

  int [[,]] state = new int [[x, y]];

  … Define initial state of Life board, ‘state’.

  … Main loop
}

Page 44: An HPspmd Programming Model

Main loop

int [[,]] sums = new int [[x, y]];
for (int iter = 0; iter < NITER; iter++) {
  Adlib.writeHalo(state, mode);

  overall (i = x for :)
    overall (j = y for :)
      sums[i, j] = state[i-1, j-1] + state[i-1, j] + state[i-1, j+1]
                 + state[i, j-1] + state[i, j+1]
                 + state[i+1, j-1] + state[i+1, j] + state[i+1, j+1];

  overall (i = x for :)
    overall (j = y for :)
      switch (sums [i, j]) {
        case 2: break;
        case 3: state[i, j] = 1; break;
        default: state[i, j] = 0; break;
      }
}

Page 45: An HPspmd Programming Model

Collapsed Distributions

CollapsedRange, a subclass of Range, stands for a range that is not distributed.

In:

  Range x = new CollapsedRange(N);
  Range y = new BlockRange(M, p.dim(0));
  float [[,]] a = new float [[x, y]];

the first dimension of a is collapsed.

Page 46: An HPspmd Programming Model

Sequential array dimensions

Subscripts in the first dimension of the array declared above must still be distributed indexes, although it is effectively a sequential array with respect to that dimension.

It would be very convenient to use integer subscripts in sequential dimensions.

Introduce “subtypes” of distributed arrays with sequential dimensions. Example becomes:

  Range y = new BlockRange(M, p.dim(0));
  float [[*,]] a = new float [[N, y]];

Page 47: An HPspmd Programming Model

Syntax for sequential dimensions

Asterisk, *, appears in the slot of the type signature for a sequential dimension.

Integer expression (rather than range) appears in the constructor slot.

If x is a range, the expression

  new int [[10, x]]

has type int [[*,]].

Can use integer expressions for subscripts in element references!

Page 48: An HPspmd Programming Model

Replicated distributions

Collapsed distributions mean array rank can be larger than process grid rank.

Also allowed for array rank to be smaller than grid rank:

Procs2 p = new Procs2(P, P);

on(p) {
  Range x = new BlockRange(N, p.dim(0));
  float [[]] b = new float [[x]];
}

Array b is replicated over p.dim(1).

Page 49: An HPspmd Programming Model

Aside: replicated variables versus replicated values

The HPJava language does not enforce that all copies of replicated variables hold the same value at corresponding points of program execution.

However, a common programming practice is to maintain the same values in all copies (most of the time): “canonical style”.

The Adlib communication library, for example, typically broadcasts results to replicated destination arrays.

Page 50: An HPspmd Programming Model

Matrix multiplication example

float [[,]] c = new float [[x, y]];
float [[,*]] a = new float [[x, N]];
float [[*,]] b = new float [[N, y]];

… Initialize ‘a’, ‘b’

overall (i = x for :)
  overall (j = y for :) {
    c [i, j] = 0.0;
    for(int k = 0; k < N; k++)
      c[i, j] += a[i, k] * b[k, j];
  }

Page 51: An HPspmd Programming Model

Remarks on matrix example

Assumes a very specific set of alignment relations between a, b and c:

First dimension of a aligned with first dimension of c; second dimension collapsed; whole array replicated over the process dimension associated with the second dimension of c.

A general matrix multiplication procedure may accept any distributions for its arguments, then remap them to the required relation.

Page 52: An HPspmd Programming Model

General matrix multiplication

void matmul(float [[,]] c, float [[,]] a, float [[,]] b) {
  Group p = c.grp();
  Range x = c.rng(0), y = c.rng(1);

  int n = a.rng(1).size();

  float [[,*]] t1 = new float [[x, n]] on p;
  Adlib.remap(t1, a);

  float [[*,]] t2 = new float [[n, y]] on p;
  Adlib.remap(t2, b);

  on(p)
    … overall nest, c = t1 * t2. As previous example.
}

Page 53: An HPspmd Programming Model

Distribution group of arrays

The matmul procedure illustrates the general form of the distributed array constructor, with an on clause.

In general this specifies the distribution group of the array.

Distribution group defaults to the active process group; in all previous examples this was set by an enclosing on construct.

(Must call remap outside on(p){} here, because the source arrays a, b may have elements outside group p.)

Page 54: An HPspmd Programming Model

Array sections

HPJava has a way of representing regular sub-arrays, similar to Fortran 90.

Syntax similar to element references, but:
uses double brackets, and
freer rules about subscripts.

In particular, subscripts can be triplets.

Page 55: An HPspmd Programming Model

A two-dimensional FFT

2d FFT can be implemented simply by applying 1d FFT in parallel to all rows, then in parallel to all columns.

Pseudocode assumes existence of a fictitious complex primitive type. For real code, split complex arrays into two float arrays: real and imaginary parts.

(Java Grande Numerics WG drafted proposals for adding complex to Java.)

Page 56: An HPspmd Programming Model

2d FFT (pseudocode)

void fft1d(complex [[*]] u) {
  … Sequential FFT …
}

complex [[,*]] a = new complex [[x, N]];
complex [[*,]] b = new complex [[N, x]];

… Initial values in ‘a’

overall (i = x for :)
  fft1d(a [[i, :]]);

Adlib.remap(b, a);

overall (i = x for :)
  fft1d(b [[:, i]]);

… Result in ‘b’

Page 57: An HPspmd Programming Model

Aside: an array section expression is not a variable...

An array element reference is a variable: a[i, j] = 23.0;

An array section expression is not a variable: a[[:, 1]] = b; // Semantic error!

Assuming b is a one-dimensional array, may copy elements by: HPspmd.copy(a[[:, 1]], b);

Usual rule for object-valued expressions.

Page 58: An HPspmd Programming Model

Cholesky decomposition example

Similar to the LU decomposition of an earlier lecture, but applies to a symmetric matrix.

Choose to distribute by columns, instead of 2d decomposition.

k-th column is broadcast by passing sections to remap.

Page 59: An HPspmd Programming Model

Cholesky decomposition code

float [[*,]] a = new float [[N, x]];

… Some code to initialize ‘a’

float [[*]] col = new float [[N]];  // Collapsed, replicated

for (int k = 0; k < N-1; k++) {
  … Normalize k-th column of ‘a’

  Adlib.remap(col [[k+1 : N-1]], a [[k+1 : N-1, k]]);

  overall (j = x for k+1 : N-1)
    for (int i = j`; i < N; i++)
      a[i, j] -= col[i] * col[j`];
}

… Normalize element a[N-1, N-1]

Page 60: An HPspmd Programming Model

Column normalization details

. . .
// Normalize k-th column of ‘a’:
at (j = x[k]) {
  float diag = Math.sqrt(a [k, j]);
  a[k, j] = diag;
  for (int i = k+1; i < N; i++)
    a[i, j] /= diag;
}
. . .
// Normalize element a[N-1, N-1]:
at (j = x[N-1])
  a[N-1, j] = Math.sqrt(a[N-1, j]);
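For comparison, the same column-oriented algorithm can be written sequentially on an ordinary Java array. This is a sketch that mirrors the structure of the HPJava code (normalize column k, copy it into `col`, update the remaining columns); the class name `CholeskySketch` is hypothetical, the local copy into `col` stands in for the remap broadcast, and the input is assumed to be symmetric positive definite.

```java
public class CholeskySketch {
    // In-place Cholesky factorization: on return the lower triangle of `a`
    // holds L with A = L * L^T. The strict upper triangle is not referenced.
    static void cholesky(double[][] a) {
        int n = a.length;
        double[] col = new double[n];
        for (int k = 0; k < n - 1; k++) {
            // Normalize the k-th column.
            double diag = Math.sqrt(a[k][k]);
            a[k][k] = diag;
            for (int i = k + 1; i < n; i++) a[i][k] /= diag;

            // Stand-in for the remap: every "process" gets column k.
            for (int i = k + 1; i < n; i++) col[i] = a[i][k];

            // Update the remaining columns (the overall loop over j).
            for (int j = k + 1; j < n; j++)
                for (int i = j; i < n; i++)
                    a[i][j] -= col[i] * col[j];
        }
        // Normalize the last diagonal element.
        a[n - 1][n - 1] = Math.sqrt(a[n - 1][n - 1]);
    }

    public static void main(String[] args) {
        double[][] a = { {4, 2}, {2, 5} };
        cholesky(a);
        System.out.println(a[0][0] + " " + a[1][0] + " " + a[1][1]);
    }
}
```

In the distributed version each column update runs only on the process that owns column j, which is why the normalized k-th column must be broadcast first.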

Page 61: An HPspmd Programming Model

Aside: correct use of subscripts

In the assignment:

  a[i, j] -= col[i] * col[j`];

may not write col[j].

j is a location in the range x. The array col has a different, collapsed range.

This is not a “type” error, but it would be trapped as a runtime exception. Like ordinary array-bound checking.

Page 62: An HPspmd Programming Model

Subranges

Dimension of an array section produced by a triplet subscript is generally a subrange of the range of the parent.

Syntax for direct creation of subrange objects:

  Range u = x [0 : N/2 - 1];
  Range v = y [0 : N-1 : 2];

Page 63: An HPspmd Programming Model

Restricted groups

Discussed in the context of distributed array descriptors in an earlier lecture.

In HPJava a natural characterization of a restricted group is as a subgroup to which a particular location is mapped.

Syntax for direct creation of a restricted group:

  p / x [1]
  p / x [1] / y [4]

Page 64: An HPspmd Programming Model

Rules of HPJava

Various rules are imposed to ensure that all accesses to array elements really are local.

These are automatically enforced by compiler or run-time checks.

First formally define the active process group (APG).

Page 65: An HPspmd Programming Model

The active process group

Executing the construct:

on(p) {. . .}

changes the APG to p in its body.

If the current APG is p, executing the constructs:

  at (i = x[n]) {. . .}

or:

  overall (i = x for l : u : s) {. . .}

changes the APG to p / i in the body.

Page 66: An HPspmd Programming Model

Rules for distributed control constructs

The construct: on(p) {. . .}

can only appear if p is contained in the current APG.

The constructs: at (i = x[n]) {. . .}

or: overall (i = x for l : u : s) {. . .}

can only appear if x is distributed over a dimension of the APG.

Page 67: An HPspmd Programming Model

Rules for distributed array constructors

The expression

  new T [[e_0, . . ., e_r, . . .]] on p

can only appear if p is contained in the APG.

All e’s that are non-collapsed range objects are distributed over distinct dimensions of p.

If “on p” is omitted, the distribution group defaults to the APG.

Page 68: An HPspmd Programming Model

Rules for element reference subscripts

If a is a distributed array, in

  a[e_0, . . ., e_r, . . .]

an e can be an integer expression only if the dimension has the sequential attribute. Otherwise it must be a distributed index whose value is a location in the relevant array range.

Page 69: An HPspmd Programming Model

General rule for element access

If a is a distributed array and the location-valued subscripts in the element reference:

  a[e_0, . . ., e_r, . . .]

are i, j, …, the home group of the element is:

  p / i / j / . . .

An element may only be accessed when the active process group is contained in the home group of the element.