parallel and cluster computing 1. 2 array libraries u function extensions of c and fortran 77 with...

102
Parallel and Cluster C omputing 1

Post on 19-Dec-2015

220 views

Category:

Documents


2 download

TRANSCRIPT

Page 1: Parallel and Cluster Computing 1. 2 Array Libraries u Function extensions of C and Fortran 77 with array or vector libraries –The libraries are supposed

Parallel and Cluster Computing

1

Page 2: Parallel and Cluster Computing 1. 2 Array Libraries u Function extensions of C and Fortran 77 with array or vector libraries –The libraries are supposed

Parallel and Cluster Computing 2

Array Libraries

Function extensions of C and Fortran 77 with array or vector libraries– The libraries are supposed to be optimised for each

particular computer– Regular compilers can be used => no need in dedicated

optimising compilers One of the most well-known and well-designed array

libraries is the Basic Linear Algebra Subprograms (BLAS)– Provides basic array operations for numerical linear algebra– Available for most modern VP and SP computers

Page 3: Parallel and Cluster Computing 1. 2 Array Libraries u Function extensions of C and Fortran 77 with array or vector libraries –The libraries are supposed

Parallel and Cluster Computing 3

BLAS All BLAS routines are divided into 3 main categories:

– Level 1 BLAS addresses scalar and vector operations– Level 2 BLAS addresses matrix-vector operations– Level 3 BLAS addresses matrix-matrix operations

Routines of Level 1 do– vector reduction operations– vector rotation operations– element-wise and combined vector operations– data movement with vectors

Page 4: Parallel and Cluster Computing 1. 2 Array Libraries u Function extensions of C and Fortran 77 with array or vector libraries –The libraries are supposed

Parallel and Cluster Computing 4

Level 1 BLAS A vector reduction operation

– The addition of the scaled dot product of two real vectors x and y into a scaled scalar r

1

0

n

iii

T yxryxrr

The C interface of the routine implementing the operation isvoid BLAS_ddot(

enum blas_conj_type conj, int n, double alpha, const double *x, int incx, double beta, const double *y, int incy, double *r );

Page 5: Parallel and Cluster Computing 1. 2 Array Libraries u Function extensions of C and Fortran 77 with array or vector libraries –The libraries are supposed

Parallel and Cluster Computing 5

Level 1 BLAS (ctd) Other routines doing reduction operations

– Compute different vector norms of vector x– Compute the sum of the entries of vector x– Find the smallest or biggest component of vector x– Compute the sum of squares of the entries of vector x

Routines doing rotation operations– Generate Givens plane rotation– Generate Jacobi rotation– Generate Householder transformation

Page 6: Parallel and Cluster Computing 1. 2 Array Libraries u Function extensions of C and Fortran 77 with array or vector libraries –The libraries are supposed

Parallel and Cluster Computing 6

Level 1 BLAS (ctd) An element-wise vector operation

– The scaled addition of two real vectors x and y

The C interface of the routine implementing the operation isvoid BLAS_dwaxpby( int n, double alpha, const double *x, int incx, double beta, const double *y, int incy, double *w, int incw );

Function BLAS_cwaxpby does the same operation but on complex vectors

yxw

Page 7: Parallel and Cluster Computing 1. 2 Array Libraries u Function extensions of C and Fortran 77 with array or vector libraries –The libraries are supposed

Parallel and Cluster Computing 7

Level 1 BLAS (ctd) Other routines doing element-wise operations

– Scale the entries of a vector x by the real scalar 1/

– Scale a vector x by and a vector y by , add these two vectors to one another and store the result in the vector y

– Combine a scaled vector accumulation and a dot product

– Apply a plane rotation to vectors x and y

Page 8: Parallel and Cluster Computing 1. 2 Array Libraries u Function extensions of C and Fortran 77 with array or vector libraries –The libraries are supposed

Parallel and Cluster Computing 8

Level 1 BLAS (ctd) An example of data movement with vectors

– The interchange of real vectors x and y

The C interface of the routine implementing the operation isvoid BLAS_dswap( int n, double *x, int incx, double *y, int incy );

Function BLAS_cswap does the same operation but on complex vectors

Page 9: Parallel and Cluster Computing 1. 2 Array Libraries u Function extensions of C and Fortran 77 with array or vector libraries –The libraries are supposed

Parallel and Cluster Computing 9

Level 1 BLAS (ctd) Other routines doing data movement with vectors

– Copy vector x into vector y

– Sort the entries of real vector x in increasing or decreasing order and overwrite this vector x with the sorted vector as well as compute the corresponding permutation vector p

– Scale the entries of a vector x by the real scalar 1/

– Permute the entries of vector x according to permutation vector p

Page 10: Parallel and Cluster Computing 1. 2 Array Libraries u Function extensions of C and Fortran 77 with array or vector libraries –The libraries are supposed

Parallel and Cluster Computing 10

Level 2 BLAS

Routines of Level 2– Compute different matrix vector products– Do addition of scaled matrix vector products– Compute multiple matrix vector products– Solve triangular equations– Perform rank one and rank two updates– Some operations use symmetric or triangular matrices

Page 11: Parallel and Cluster Computing 1. 2 Array Libraries u Function extensions of C and Fortran 77 with array or vector libraries –The libraries are supposed

Parallel and Cluster Computing 11

Level 2 BLAS (ctd) To store matrices, the following schemes are used

– Column-based and row-based storage

– Packed storage for symmetric or triangular matrices

– Band storage for band matrices

Conventional storage– An nxn matrix A is stored in a one-dimensional array a

» aij => a[i+j*s] (C, column-wise storage)

» aij => a[j+i*s] (C, row-wise storage)

– If s=n, rows (columns) will be contiguous in memory

– If s>n, there will be a gap of (s-n) memory elements between two successive rows (columns)

– Only significant elements of symmetric/triangular matrices need be set

Page 12: Parallel and Cluster Computing 1. 2 Array Libraries u Function extensions of C and Fortran 77 with array or vector libraries –The libraries are supposed

Parallel and Cluster Computing 12

Packed Storage Packed storage

– The relevant triangle of a symmetric/triangular matrix is packed by columns or rows in a one-dimensional array

– The upper triangle of an nxn matrix A may be stored in a one-dimensional array a» aij(ij) => a[j+i*(2*n-i-1)/2] (C, row-wise storage)

Example.

22

1211

020100

00

0

a

aa

aaa

221211020100 aaaaaa=>

Page 13: Parallel and Cluster Computing 1. 2 Array Libraries u Function extensions of C and Fortran 77 with array or vector libraries –The libraries are supposed

Parallel and Cluster Computing 13

Band Storage Band storage

– A compact storage scheme for band matrices Consider Fortran and a column-wise storage scheme

– An mxn band matrix A with l subdiagonals and u superdiagonals may be stored in a 2-dimensional array A with l+u+1 rows and n columns

– Columns of matrix A are stored in corresponding columns of array A– Diagonals of matrix A are stored in rows of array A

» aij => A(u+i-j,j) for max(0,j-u) i min(m-1,j+l)

Example.

=>

444342

34333231

23222120

121110

0100

00

0

0

00

000

aaa

aaaa

aaaa

aaa

aa

423120

43322110

4433221100

34231201

aaa

aaaa

aaaaa

aaaa

Page 14: Parallel and Cluster Computing 1. 2 Array Libraries u Function extensions of C and Fortran 77 with array or vector libraries –The libraries are supposed

Parallel and Cluster Computing 14

Level 2 BLAS (ctd) An example of matrix vector multiplication operation

– The scaled addition of a real n-length vector y, and the product of a general real mxn matrix A and a real n-length vector x

yAxy The C interface of the routine implementing this operation is void BLAS_dgemv( enum blas_order_type order, enum blas_trans_type trans, int m, int n, double alpha, const double *a, int stride, const double *x, int incx, double beta, const double *y, int incy );

Parametersorder => blas_rowmajor or blas_colmajortrans => blas_no_trans (do not transpose A)

Page 15: Parallel and Cluster Computing 1. 2 Array Libraries u Function extensions of C and Fortran 77 with array or vector libraries –The libraries are supposed

Parallel and Cluster Computing 15

Level 2 BLAS (ctd) If matrix A is a general band matrix with l subdiagonals and u

superdiagonals, the function

void BLAS_dgbmv( enum blas_order_type order, enum blas_trans_type trans, int m, int n, int l, int u, double alpha, const double *a, int stride, const double *x, int incx, double beta, const double *y, int incy );

better uses the memory. It assumes that a band storage scheme is used to store matrix A.

Page 16: Parallel and Cluster Computing 1. 2 Array Libraries u Function extensions of C and Fortran 77 with array or vector libraries –The libraries are supposed

Parallel and Cluster Computing 16

Level 2 BLAS (ctd) Other routines of Level 2 perform the following operations

yAxy TAA where

Axx xAx T where A is triangularor

BxAxy yAx T , Axw

yAx T , Azw where A is triangular

as well as many others For any matrix-vector operation with a specific matrix operand

(triangular, symmetric, banded, etc.), there is a routine for each storage scheme that can be used to store the operand

Page 17: Parallel and Cluster Computing 1. 2 Array Libraries u Function extensions of C and Fortran 77 with array or vector libraries –The libraries are supposed

Parallel and Cluster Computing 17

Level 3 BLAS

Routines of Level 3 do– O(n2) matrix operations

» norms, diagonal scaling, scaled accumulation and addition» different storage schemes to store matrix operands are supported

– O(n3) matrix-matrix operations» multiplication, solving matrix equations, symmetric rank k and 2k

updates

– Data movement with matrices

Page 18: Parallel and Cluster Computing 1. 2 Array Libraries u Function extensions of C and Fortran 77 with array or vector libraries –The libraries are supposed

Parallel and Cluster Computing 18

Level 3 BLAS (ctd) An example of O(n2) matrix operation, which scales two real mxn matrices A

and B and stores their sum in a matrix C, is

The C interface of the routine implementing this operation under assumption that the matrices A, B and C are of the general form, is

void BLAS_dge_add( enum blas_order_type order, int m, int n, double alpha, const double *a, int stride_a, double beta, const double *b, int stride_b, double *c, int stride_c);

BAC

There are other 15 routines performing this operation for different types and forms of the matrices A, B and C

Page 19: Parallel and Cluster Computing 1. 2 Array Libraries u Function extensions of C and Fortran 77 with array or vector libraries –The libraries are supposed

Parallel and Cluster Computing 19

Level 3 BLAS (ctd) An example of O(n3) matrix-matrix operation involving a real

mxn matrix A, a real nxk matrix B, and a real mxk matrix C is

The C routine implementing the operation for matrices A, B and C in the general form is

void BLAS_dgemm( enum blas_order_type order, enum blas_trans_type trans_a, enum blas_trans_type trans_b, int m, int n, int k, double alpha, const double *a, int stride_a, const double *b, int stride_c, double beta, const double *c, int stride_c);

CABC

Page 20: Parallel and Cluster Computing 1. 2 Array Libraries u Function extensions of C and Fortran 77 with array or vector libraries –The libraries are supposed

Parallel and Cluster Computing 20

Level 3 BLAS (ctd) Data movement with matrices includes

– Copying matrix A or its transpose with storing the result in matrix B

B A or B AT

– Transposition of a square matrix A with the result overwriting matrix A

A AT

– Permutation of the rows or columns of matrix A by a permutation matrix P

A PA or A AP– Different types and forms of matrix operands as well as

different storage schemes are supported

Page 21: Parallel and Cluster Computing 1. 2 Array Libraries u Function extensions of C and Fortran 77 with array or vector libraries –The libraries are supposed

Parallel and Cluster Computing 21

Sparse BLAS Sparse BLAS

– Provides routines for unstructured sparse matrices– Poorer functionality compared to Dense and Banded BLAS

» only some basic array operations used in solving large sparse linear equations using iterative techniques

» matrix multiply, triangular solve, sparse vector update, dot product, gather/scatter

– Does not specify methods to store a sparse matrix» storage format is dependent on the algorithm, the original sparsity

pattern, the format in which the data already exists, etc.» sparse matrix arguments are a placeholder, or handle, which refers

to an abstract representation of a matrix, not the actual data components

Page 22: Parallel and Cluster Computing 1. 2 Array Libraries u Function extensions of C and Fortran 77 with array or vector libraries –The libraries are supposed

Parallel and Cluster Computing 22

Sparse BLAS (ctd) Several routines provided to create sparse matrices

– The internal representation is implementation dependent» Sparse BLAS applications are independent of the matrix storage

scheme, relying on the scheme provided by each implementation

A typical Sparse BLAS application– Creates an internal sparse matrix representation and returns

its handle – Uses the handle as a parameter in computational Sparse

BLAS routines– Calls a cleanup routine to free resourses associated with the

handle, when the matrix is no longer needed

Page 23: Parallel and Cluster Computing 1. 2 Array Libraries u Function extensions of C and Fortran 77 with array or vector libraries –The libraries are supposed

Parallel and Cluster Computing 23

Example Example. Consider a C program using Sparse BLAS

performing the matrix-vector operation y Ax, where

4.4001.4

03.300

4.202.20

0001.1

A

0.1

0.1

0.1

0.1

x

Page 24: Parallel and Cluster Computing 1. 2 Array Libraries u Function extensions of C and Fortran 77 with array or vector libraries –The libraries are supposed

Parallel and Cluster Computing 24

Example (ctd)#include <blas_sparse.h>int main() { const int n = 4, nonzeros = 6; double values[] = {1.1, 2.2, 2.4, 3.3, 4.1, 4.4}; int index_i[] = {0, 1, 1, 2, 3, 3}; int index_j[] = {0, 1, 3, 2, 0, 3}; double x[] = {1.0, 1.0, 1.0, 1.0}, y[] = {0.0, 0.0, 0.0, 0.0}; blas_sparse_matrix A; int k; double alpha = 1.0;  A = BLAS_duscr_begin(n, n); //Create Sparse BLAS handle  for(k=0; k < nonzeros; k++) //Insert entries one by one BLAS_duscr_insert_entry(A, values[k], index_i[k], index_j[k]); BLAS_uscr_end(A); // Complete construction of sparse matrix  //Compute matrix-vector product y = A*x BLAS_dusmv(blas_no_trans, alpha, A, x, 1, y, 1);  BLAS_usds(A); //Release matrix handle}

Page 25: Parallel and Cluster Computing 1. 2 Array Libraries u Function extensions of C and Fortran 77 with array or vector libraries –The libraries are supposed

Parallel and Cluster Computing

25

Page 26: Parallel and Cluster Computing 1. 2 Array Libraries u Function extensions of C and Fortran 77 with array or vector libraries –The libraries are supposed

Parallel and Cluster Computing 26

Parallel Languages

C and Fortran 77 do not reflect some essential features of VP and SP architectures– They cannot play the same role for VPs and SPs

Optimizing compilers– Only for a simple and limited class of applications

Array libraries– Cover a limited class of array operations

» Other array operations can be only expressed as a combination of the locally-optimized library array operations

» This excludes global optimization of combined array operations

Page 27: Parallel and Cluster Computing 1. 2 Array Libraries u Function extensions of C and Fortran 77 with array or vector libraries –The libraries are supposed

Parallel and Cluster Computing 27

Parallel Languages (ctd)

Parallel extensions of C and Fortran 77 allows programmers– To explicitly express in a portable form any array

operation» Compiler does not need to recognize code to parallelize

» Global optimisation of operations on array is possible

– We consider 2 parallel supersets of C and Fortran 77 » Fortran 90

» C[]

Page 28: Parallel and Cluster Computing 1. 2 Array Libraries u Function extensions of C and Fortran 77 with array or vector libraries –The libraries are supposed

Parallel and Cluster Computing 28

Fortran 90

Fortran 90 is a new Fortran standard released in 1991– Widely implemented since then

Two categories of new features– Modernization of Fortran according to the state-of-

the-art in serial programming languages– Support for explicit expression of operations on

arrays

Page 29: Parallel and Cluster Computing 1. 2 Array Libraries u Function extensions of C and Fortran 77 with array or vector libraries –The libraries are supposed

Parallel and Cluster Computing 29

Fortran 90 (ctd)

Serial extensions include – Free-format source code and some other simple

improvements – Dynamic memory allocation (automatic arrays,

allocatable arrays, and pointers and associated heap storage management)

– User-defined data types (structures)– Generic user-defined procedures (functions and

subroutines) and operators

Page 30: Parallel and Cluster Computing 1. 2 Array Libraries u Function extensions of C and Fortran 77 with array or vector libraries –The libraries are supposed

Parallel and Cluster Computing 30

Fortran 90 (ctd)

Serial extensions (ctd) – Recursive procedures– New control structures to support structured

programming – A new program unit, MODULE, for encapsulation

of data and a related set of procedures

We focus on parallel extensions

Page 31: Parallel and Cluster Computing 1. 2 Array Libraries u Function extensions of C and Fortran 77 with array or vector libraries –The libraries are supposed

Parallel and Cluster Computing 31

Fortran 90 (ctd)

Fortran 90 considers arrays first-class objects – Whole-array operations, assignments, and functions – Operations and assignments are extended in an

obvious way, on an element-by-element basis – Intrinsic functions are array-valued for array

arguments» operate element-wise if given an array as their argument

– Array expressions may include scalar constants and variables, which are replicated (or expanded) to the required number of elements

Page 32: Parallel and Cluster Computing 1. 2 Array Libraries u Function extensions of C and Fortran 77 with array or vector libraries –The libraries are supposed

Parallel and Cluster Computing 32

Fortran 90 (ctd)

Example.

REAL, DIMENSION(3,4,5) :: a, b, c, d…c = a + bd = SQRT(a)c = a + 2.0

Page 33: Parallel and Cluster Computing 1. 2 Array Libraries u Function extensions of C and Fortran 77 with array or vector libraries –The libraries are supposed

Parallel and Cluster Computing 33

WHERE Structure

Sometimes, some elements of arrays in an array-valued expression should be treated specially – Division by zero in a = 1./a should be avoided

WHERE statementWHERE (a /= 0.) a = 1./a

WHERE construct WHERE (a /= 0.) a = 1./a ELSEWHERE a = HUGE(a) END WHERE

Page 34: Parallel and Cluster Computing 1. 2 Array Libraries u Function extensions of C and Fortran 77 with array or vector libraries –The libraries are supposed

Parallel and Cluster Computing 34

Fortran 90 (ctd)

All the array elements in an array-valued expression or array assignment must be conformable, i.e., they must have the same shape – the same number of axes– the same number of elements along each axis

Example. REAL :: a(3,4,5), b(0:2,4,5), c(3,4,-1:3)

Arrays a, b, and c have the same rank of 3, extents of 3,4, and 5, shape of {3,4,5}, size of 60 – Only differ in the lower and upper dimension bounds

Page 35: Parallel and Cluster Computing 1. 2 Array Libraries u Function extensions of C and Fortran 77 with array or vector libraries –The libraries are supposed

Parallel and Cluster Computing 35

Array Section

An array section can be used everywhere in array assignments and array-valued expressions where a whole array is allowed

An array section may be specified with subscripts of the form of triplet : lower:upper:stride

It designates an ordered set i1,…,ik such that

– i1 = lower

– ij+1 = ij + stride ( j=1,…,k-1 )

– |ik - upper| < stride

Page 36: Parallel and Cluster Computing 1. 2 Array Libraries u Function extensions of C and Fortran 77 with array or vector libraries –The libraries are supposed

Parallel and Cluster Computing 36

Array Section (ctd)

Example. REAL :: a(50,50) What sections are designated by the following

expressions? What are the rank and shape for each section? – a(i,1:50:1), a(i,1:50) – a(i,:)– a(i,1:50:3)– a(i,50:1:-1)– a(11:40,j)– a(1:10,1:10)

Page 37: Parallel and Cluster Computing 1. 2 Array Libraries u Function extensions of C and Fortran 77 with array or vector libraries –The libraries are supposed

Parallel and Cluster Computing 37

Array Section (ctd)

Vector subscripts may also be used to specify array sections– Any expression whose value is a rank 1 integer

array may be used as a vector subsript

Example.REAL :: a(5,5), b(5)INTEGER :: index(5)index = (/5,4,3,2,1/)b = a(index,1)

Page 38: Parallel and Cluster Computing 1. 2 Array Libraries u Function extensions of C and Fortran 77 with array or vector libraries –The libraries are supposed

Parallel and Cluster Computing 38

Array Section (ctd)

Whole arrays and array sections of the same shape can be mixed in expressions and assignments

Note, that unlike a whole array, an array section may not occupy contiguous storage locations

Page 39: Parallel and Cluster Computing 1. 2 Array Libraries u Function extensions of C and Fortran 77 with array or vector libraries –The libraries are supposed

Parallel and Cluster Computing 39

Array Constants

Fortran 90 introduces array constants, or array constructors – The simplest form is just a list of elements enclosed in (/

and /)

– May contain lists of scalars, lists of arrays, and implied-DO loops

Examples.(/ 0, i=1,50 /)

(/ (3.14*i, i=4,100,3) /)

(/ ( (/ 5,4,3,2,1 /), i=1,5 ) /)

Page 40: Parallel and Cluster Computing 1. 2 Array Libraries u Function extensions of C and Fortran 77 with array or vector libraries –The libraries are supposed

Parallel and Cluster Computing 40

Array Constants (ctd)

The array constructors can only produce 1-dimensional arrays– Function RESHAPE can be used to construct

arrays of higher rank

REAL :: a(500,500)

a = RESHAPE( (/ (0., i=1,250000) /), (/ 500,500 /) )

Page 41: Parallel and Cluster Computing 1. 2 Array Libraries u Function extensions of C and Fortran 77 with array or vector libraries –The libraries are supposed

Parallel and Cluster Computing 41

Assumed-Shape and Automatic Arrays

Consider the user-defined procedure operating on arrays

SUBROUTINE swap(a,b)REAL, DIMENSION(:,:) :: a, bREAL, DIMENSION(SIZE(a,1), SIZE(a,2)) :: temptemp = aa = bb = tempEND SUBROUTINE swap

Page 42: Parallel and Cluster Computing 1. 2 Array Libraries u Function extensions of C and Fortran 77 with array or vector libraries –The libraries are supposed

Parallel and Cluster Computing 42

Assumed-Shape and Automatic Arrays (ctd)

Formal array arguments a and b are of assumed shape– Only the type and rank are specified– The actual shape is taken from that of the actual array

arguments

The local array temp is an example of the automatic array– Its size is set at runtime– It stops existing as soon as control leaves the procedure

Page 43: Parallel and Cluster Computing 1. 2 Array Libraries u Function extensions of C and Fortran 77 with array or vector libraries –The libraries are supposed

Parallel and Cluster Computing 43

Intrinsic Array Functions

Intrinsic array functions include– Extension of such intrinsic functions as SQRT, SIN, etc. to array arguments

– Specific array intrinsic functions Specific array intrinsic functions do the

following– Compute the scalar product of two vectors

(DOT_PRODUCT) and the matrix product of two matrices (MATMUL)

Page 44: Parallel and Cluster Computing 1. 2 Array Libraries u Function extensions of C and Fortran 77 with array or vector libraries –The libraries are supposed

Parallel and Cluster Computing 44

Specific Intrinsic Array Functions

– Perform diverse reduction operations on an array» logical multiplication (ALL) and addition (ANY)» counting the number of true elements in the

array» arithmetical multiplication (PRODUCT) and

addition (SUM) of its elements» finding the smallest (MINVAL) or the largest

(MAXVAL) element

Page 45: Parallel and Cluster Computing 1. 2 Array Libraries u Function extensions of C and Fortran 77 with array or vector libraries –The libraries are supposed

Parallel and Cluster Computing 45

Specific Intrinsic Array Functions (ctd)

– Return diverse attributes of an array» its shape (SHAPE)» the lower dimension bounds of the array (LBOUND)» the upper dimension bounds (UBOUND)» the number of elements (SIZE)» the allocation status of the array (ALLOCATED)

Page 46: Parallel and Cluster Computing 1. 2 Array Libraries u Function extensions of C and Fortran 77 with array or vector libraries –The libraries are supposed

Parallel and Cluster Computing 46

Specific Intrinsic Array Functions (ctd)

– Construct arrays by means of» merging two arrays under mask (MERGE)» packing an array into a vector (PACK)» replication of an array by adding a dimension

(SPREAD)» unpacking a vector (a rank 1 array) into an array

under mask (UNPACK)

Page 47: Parallel and Cluster Computing 1. 2 Array Libraries u Function extensions of C and Fortran 77 with array or vector libraries –The libraries are supposed

Parallel and Cluster Computing 47

Specific Intrinsic Array Functions (ctd)

– Reshape arrays (RESHAPE)

– Move array elements performing» the circular shift (CSHIFT)» the end-off shift (EOSHIFT)» the transpose of a rank 2 array (TRANSPOSE)

– Locate the first maximum (MAXLOC) or minimum (MINLOC) element in an array

Page 48: Parallel and Cluster Computing 1. 2 Array Libraries u Function extensions of C and Fortran 77 with array or vector libraries –The libraries are supposed

Parallel and Cluster Computing 48

C[]

C[] (C “brackets”) is a strict ANSI C superset allowing programmers to explicitly describe operations on arrays

Vector value, or vector– An ordered set of values (or vector values) of any

one type– Any vector type is characterised by

» the number of elements

» the type of elements

Page 49: Parallel and Cluster Computing 1. 2 Array Libraries u Function extensions of C and Fortran 77 with array or vector libraries –The libraries are supposed

Parallel and Cluster Computing 49

Vector Value and Vector Object

Vector object– A region of data storage, the contents of which can

represent vector values– An ordered sequence of objects (or vector objects)

of any one type Unlike ANSI C, C[] defines the notion of

value of array object – This value is vector

Page 50: Parallel and Cluster Computing 1. 2 Array Libraries u Function extensions of C and Fortran 77 with array or vector libraries –The libraries are supposed

Parallel and Cluster Computing 50

Vector Value and Vector Object (ctd)

Example. The value of the arrayint a[3][2] = {0,1,2,3,4,5};

is the vector{ {0,1}, {2,3}, {4,5} }

This vector has the shape {3,2}. This vector type is named by int[3][2] The shape of array = that of its vector value In C[], array object is a particular case of vector

object

Page 51: Parallel and Cluster Computing 1. 2 Array Libraries u Function extensions of C and Fortran 77 with array or vector libraries –The libraries are supposed

Parallel and Cluster Computing 51

Arrays and Pointers

C array is a contiguously allocated set of elements of any one type of object

C[] array is a a set of elements of any one type of object sequentially allocated with a positive stride– The stride is a distance between successive

elements of the array measured in units equal to the size of array element

– If stride is not specified, it is assumed to be 1

Page 52: Parallel and Cluster Computing 1. 2 Array Libraries u Function extensions of C and Fortran 77 with array or vector libraries –The libraries are supposed

Parallel and Cluster Computing 52

Arrays and Pointers (ctd)

C[] array has at least three attributes:– the type of elements– the number of elements– the allocation stride

Page 53: Parallel and Cluster Computing 1. 2 Array Libraries u Function extensions of C and Fortran 77 with array or vector libraries –The libraries are supposed

Parallel and Cluster Computing 53

Arrays and Pointers (ctd)

Example 1.int a[3];

int a[3:1];

a[0] a[1] a[2]

a[0] a[1] a[2]

Example 2.int a[3:3];

The slot between array elements is of 2xsizeof(int) bytes

Page 54: Parallel and Cluster Computing 1. 2 Array Libraries u Function extensions of C and Fortran 77 with array or vector libraries –The libraries are supposed

Parallel and Cluster Computing 54

Arrays and Pointers (ctd)

In C, a pointer has only one attribute– The type of object it points to– It is needed to correctly interpret

» the value of the object it points to

» the address operators + and - (operand(s) and result should point into the same array)

In C[], a pointer has an additional attribute, stride– If stride is not specified, it is assumed to be 1

Page 55: Parallel and Cluster Computing 1. 2 Array Libraries u Function extensions of C and Fortran 77 with array or vector libraries –The libraries are supposed

Parallel and Cluster Computing 55

Arrays and Pointers (ctd)

Example 1. The declarations int a[] = {0,1,2,3,4};int * p1 = (void*)a;int *:2 p2 = (void*)&a[4];

form the following structure of storage

p1+2 and p2-1 point to the same array element, a[2]

a[0] a[3] a[4]a[1] a[2]

p2p1

Page 56: Parallel and Cluster Computing 1. 2 Array Libraries u Function extensions of C and Fortran 77 with array or vector libraries –The libraries are supposed

Parallel and Cluster Computing 56

Arrays and Pointers (ctd)

Expressions e1[e2] or (e2)[e1] provide access to the e2-th element of an array e1 – Identical to (*((e1)+(e2))) – e2 is an integer expression – e1 is an lvalue that has the type “array of type ”

» converted to an expression of the type “pointer to type” pointing to the intial element of the array object

» the attribute stride of this pointer is identical to that of the array object

Page 57: Parallel and Cluster Computing 1. 2 Array Libraries u Function extensions of C and Fortran 77 with array or vector libraries –The libraries are supposed

Parallel and Cluster Computing 57

Arrays and Pointers (ctd)

C[] allows dynamic arrays

typedef int (*pDiag)[n:n+1];int a[n][n];int j;pDiag p = (void*)a;...for(j=0; j<n; j++)

(*p)[j]=1;

Page 58: Parallel and Cluster Computing 1. 2 Array Libraries u Function extensions of C and Fortran 77 with array or vector libraries –The libraries are supposed

Parallel and Cluster Computing 58

Blocking Operator

In C[], the value of an aray object is a vector– The i-th element of the vector is the value of the i-th

element of the array object The postfix [] operator (the blocking operator)

– Supports access to an array as the whole – Its operand has the type “array of type”– Blocks the conversion of the operand to a pointer

Example. int a[5], b[5:2], c[5:3]; » a[], b[], and c[] designate arrays a, b, and c as a whole» c[]=a[]+b[]

Page 59: Parallel and Cluster Computing 1. 2 Array Libraries u Function extensions of C and Fortran 77 with array or vector libraries –The libraries are supposed

Parallel and Cluster Computing 59

Lvector

In C, an lvalue is an expression designating an object

Example. int d[5][5];– d[i][j], d and d[0] are lvalues– d[i][j]+1 and &d[0] are not.

Modifiable lvalue– May be the left operand of an assignment operator

»d[i][j] is a modifiable lvalue»d and d[0] are not

Page 60: Parallel and Cluster Computing 1. 2 Array Libraries u Function extensions of C and Fortran 77 with array or vector libraries –The libraries are supposed

Parallel and Cluster Computing 60

Lvector (ctd)

In C[], an lvector is an expression designating a vector object

Modifiable lvector– May be the left operand of an assignment operator

Example. int d[5][5];– d, d[0], d[], and d[0][] are lvectors

»d[], and d[0][] are modifiable»d and d[0] are not modifiable

Page 61: Parallel and Cluster Computing 1. 2 Array Libraries u Function extensions of C and Fortran 77 with array or vector libraries –The libraries are supposed

Parallel and Cluster Computing 61

Lvector (ctd) Example. int a[4][4];

(*(int(*)[4:5])a)[]

a[0][0]

a[1][0]

a[2][0]

a[3][0]

a[0][1] a[0][2] a[0][3]

a[2][1]

a[3][1]

a[1][2]

a[3][2]

a[1][3]

a[2][3]

a[1][1]

a[2][2]

a[3][3]

Page 62: Parallel and Cluster Computing 1. 2 Array Libraries u Function extensions of C and Fortran 77 with array or vector libraries –The libraries are supposed

Parallel and Cluster Computing 62

Subarray

An object belongs to an array if– It is an element of the array, or– It belongs to an element of the array

Subarray – A set of objects belonging to an array– An array itself

Example (ctd). The main diagonal is a subarray– It is an array obect of the type int[4:5]

Page 63: Parallel and Cluster Computing 1. 2 Array Libraries u Function extensions of C and Fortran 77 with array or vector libraries –The libraries are supposed

Parallel and Cluster Computing 63

Subarray (ctd) Example. int a[4][4];

(*(int(*)[3:5])(&a[0][1]))[]

a[0][0]

a[1][0]

a[2][0]

a[3][0]

a[0][1] a[0][2] a[0][3]

a[2][1]

a[3][1]

a[1][2]

a[3][2]

a[1][3]

a[2][3]

a[1][1]

a[2][2]

a[3][3]

Page 64: Parallel and Cluster Computing 1. 2 Array Libraries u Function extensions of C and Fortran 77 with array or vector libraries –The libraries are supposed

Parallel and Cluster Computing 64

Subarray (ctd)

Not every regular set of objects belonging to an array makes up its subarray

Example. int a[5][5];a[0][0]

a[1][0]

a[2][0]

a[3][0]

a[0][1] a[0][2] a[0][3]

a[2][1]

a[3][1]

a[1][2]

a[3][2]

a[1][3]

a[2][3]

a[1][1]

a[2][2]

a[3][3]

a[0][4]

a[1][4]

a[2][4]

a[3][4]

a[4][0] a[4][1] a[4][2] a[4][3] a[4][4]

No constant modifiable lvector designates this “inner square”

Page 65: Parallel and Cluster Computing 1. 2 Array Libraries u Function extensions of C and Fortran 77 with array or vector libraries –The libraries are supposed

Parallel and Cluster Computing 65

Array Section

The [:] operator (the grid operator) – Supports access to array sections of general form

Syntax. e[l:r:s] – Expression e may have type “array of type” or

“pointer to type”– Expressions l, r, and s have integer types and denote

» the left bound» the right bound» the stride

Page 66: Parallel and Cluster Computing 1. 2 Array Libraries u Function extensions of C and Fortran 77 with array or vector libraries –The libraries are supposed

Parallel and Cluster Computing 66

Array Section (ctd)

Semantics. e[l:r:s] – A vector object of (r-s)/l+1 elements of type type

» Its i-th element is e[l+s*i]

– Expression e[l:r:s] is lvector– Expression e[l:r:s] is a modifiable lvector if

» All expressions {e[l+s*i] }i=0,1,… are modifiable lvectors/lvalues

– e[l:r:1] e[l:r]

Page 67: Parallel and Cluster Computing 1. 2 Array Libraries u Function extensions of C and Fortran 77 with array or vector libraries –The libraries are supposed

Parallel and Cluster Computing 67

Array Section (ctd)

Operand e in e[l:r:s] may have a vector type – Operator [:] is applied element-wise

– Let the vector value of e be {u1,…,uk}

» e[l:r:s] will designate a vector of k vectors

» The i-th element of the j-th vector will be uj[l+s*i] (j=1,

…,k)

Page 68: Parallel and Cluster Computing 1. 2 Array Libraries u Function extensions of C and Fortran 77 with array or vector libraries –The libraries are supposed

Parallel and Cluster Computing 68

Example

Example. int a[5][5];

a[1:3][1:3]

a[0][0]

a[1][0]

a[2][0]

a[3][0]

a[0][1] a[0][2] a[0][3]

a[2][1]

a[3][1]

a[1][2]

a[3][2]

a[1][3]

a[2][3]

a[1][1]

a[2][2]

a[3][3]

a[0][4]

a[1][4]

a[2][4]

a[3][4]

a[4][0] a[4][1] a[4][2] a[4][3] a[4][4]

Page 69: Parallel and Cluster Computing 1. 2 Array Libraries u Function extensions of C and Fortran 77 with array or vector libraries –The libraries are supposed

Parallel and Cluster Computing 69

Example (ctd)

a[1:3]

a[0][0]

a[1][0]

a[2][0]

a[3][0]

a[0][1] a[0][2] a[0][3]

a[2][1]

a[3][1]

a[1][2]

a[3][2]

a[1][3]

a[2][3]

a[1][1]

a[2][2]

a[3][3]

a[0][4]

a[1][4]

a[2][4]

a[3][4]

a[4][0] a[4][1] a[4][2] a[4][3] a[4][4]

a[0][0]

a[1][0]

a[2][0]

a[3][0]

a[0][1] a[0][2] a[0][3]

a[2][1]

a[3][1]

a[1][2]

a[3][2]

a[1][3]

a[2][3]

a[1][1]

a[2][2]

a[3][3]

a[0][4]

a[1][4]

a[2][4]

a[3][4]

a[4][0] a[4][1] a[4][2] a[4][3] a[4][4]

a[1:3][1:3]

Page 70: Parallel and Cluster Computing 1. 2 Array Libraries u Function extensions of C and Fortran 77 with array or vector libraries –The libraries are supposed

Parallel and Cluster Computing 70

Array Section (ctd)

Operands l and/or r in e[l:r:s] may be omitted – If l is omitted, the left bound is set to 0 – If r is omitted, the right bound is

» set to n-1, if the first operand e is an n-element array

» determined from the context, if e is a pointer

Example. int a[5][5];

– a[:][:] a[]

Page 71: Parallel and Cluster Computing 1. 2 Array Libraries u Function extensions of C and Fortran 77 with array or vector libraries –The libraries are supposed

Parallel and Cluster Computing 71

Element-Wise Vector Operators

The operand of the cast operator and the unary &, *, +, -, ~, !, ++, and –- operators may have a vector type– The operators are applied element-wise

Example.int j, k, l, m, n;

int *p[5] = { &j, &k, &l, &m, &n };

– *(p[1:3]) designates a vector object consisting of three integer variables k, l, and m.

Page 72: Parallel and Cluster Computing 1. 2 Array Libraries u Function extensions of C and Fortran 77 with array or vector libraries –The libraries are supposed

Parallel and Cluster Computing 72

Element-Wise Vector Operators (ctd)

Binary operators *, /, %, +, -, <<, >>, <, >, <=, >=, ==, !=, &, ^, |, &&, and || may have vector operands

If the operands have the same shape, then the operator is executed element-wise producing the result of this shape

Page 73: Parallel and Cluster Computing 1. 2 Array Libraries u Function extensions of C and Fortran 77 with array or vector libraries –The libraries are supposed

Parallel and Cluster Computing 73

Element-Wise Vector Operators (ctd)

In general, the operands may have different shapes but they must be conformable – 2 operands are conformable iff the beginning

of the shape of one operand is identical to the shape of the other operand» Vectors having shapes {9,8,7,6} and {9,8} are

conformable» A non-vector operand is conformable with any

vector operand (why?)

Page 74: Parallel and Cluster Computing 1. 2 Array Libraries u Function extensions of C and Fortran 77 with array or vector libraries –The libraries are supposed

Parallel and Cluster Computing 74

Element-Wise Vector Operators (ctd)

Let operands a and b be conformable, and rank(a) < rank(b)

The execution of the operator starts from conformable extension of the value of a to the shape of b – The conformable extension just replicates the value

by adding dimensions

Page 75: Parallel and Cluster Computing 1. 2 Array Libraries u Function extensions of C and Fortran 77 with array or vector libraries –The libraries are supposed

Parallel and Cluster Computing 75

Element-Wise Vector Operators (ctd)

Example. Conformable extension of vector {{1,2,3},{4,5,6}} of shape {2,3} to shape {2,3,2} is vector {{{1,1},{2,2},{3,3}},{{4,4},{5,5},{6,6}}}

Then the operator is applied element-wise to the result of the conformable extension of the value of a and the value of b, producing the result of the same shape as that of b

Page 76: Parallel and Cluster Computing 1. 2 Array Libraries u Function extensions of C and Fortran 77 with array or vector libraries –The libraries are supposed

Parallel and Cluster Computing 76

Element-Wise Vector Operators (ctd)

The assignment operators =, *=, +=, etc. may have vector operands – The left operand shall be a modifiable lvector – Its rank shall not be less than that of the right operand – The operands shall be conformable

Two-step execution– The right operand is conformably extended to the

shape of the left one– The assignment is executed element-wise

Page 77: Parallel and Cluster Computing 1. 2 Array Libraries u Function extensions of C and Fortran 77 with array or vector libraries –The libraries are supposed

Parallel and Cluster Computing 77

Element-Wise Vector Operators (ctd)

Example.

int a[m][n], b[m];...a[:][:] = b[:];

Page 78: Parallel and Cluster Computing 1. 2 Array Libraries u Function extensions of C and Fortran 77 with array or vector libraries –The libraries are supposed

Parallel and Cluster Computing 78

Example LU-factorization of the square matrix a by using the

Gaussian eliminationdouble a[n][n], t;int i, j;...for(i=0; i<n; i++) {

for(j=i+1; j<n; j++) { t = a[j][i]/a[i][i]; if(a[j][i]!=0.) a[j][i:n-1]-=t*a[i][i:n-1]; } }

Page 79: Parallel and Cluster Computing 1. 2 Array Libraries u Function extensions of C and Fortran 77 with array or vector libraries –The libraries are supposed

Parallel and Cluster Computing 79

Element-Wise Vector Operators (ctd)

By definition, e1[e2] is identical to (*((e1)+(e2)))– Therefore, e1 and e2 may be of vector type

The programmer can construct lvectors that designate irregular array sections

Example.int a[m][n], ind[] = {0,1,6,18};...a[ind[:]][:] = 0;

This code zeros the elements of the 0-th, 1-st, 6-th, and 18-th rows of array a

Page 80: Parallel and Cluster Computing 1. 2 Array Libraries u Function extensions of C and Fortran 77 with array or vector libraries –The libraries are supposed

Parallel and Cluster Computing 80

Element-Wise Vector Operators (ctd)

The first operand of the . operator may have a vector type

The second operand shall name a member of a structure or union type

The operator is executed element-wise– The result will have the same shape as the first

operand e->id is identical to (*e).id

Page 81: Parallel and Cluster Computing 1. 2 Array Libraries u Function extensions of C and Fortran 77 with array or vector libraries –The libraries are supposed

Parallel and Cluster Computing 81

Reduction Operators

Reduction operators [*], [+], [&], [^], [|], [&&], [||], [?<], and [?>]– Unary operators

– Only applicable to vector operands

– If v1,…,vn are the elements of the vector value of

the expression e, then the value of the expression [*]e is that of the expression (v1*…*vn)

Page 82: Parallel and Cluster Computing 1. 2 Array Libraries u Function extensions of C and Fortran 77 with array or vector libraries –The libraries are supposed

Parallel and Cluster Computing 82

Examples

Example 1. Dot product of the vectors a and b.

double a[n];

double b[n];

double c;

...

c = [+](a[]*b[]);

Page 83: Parallel and Cluster Computing 1. 2 Array Libraries u Function extensions of C and Fortran 77 with array or vector libraries –The libraries are supposed

Parallel and Cluster Computing 83

Examples (ctd)

Example 2. Maximal element of the matrix a

int a[m][n];

int max;

...

max = [?>][?>]a[];

Page 84: Parallel and Cluster Computing 1. 2 Array Libraries u Function extensions of C and Fortran 77 with array or vector libraries –The libraries are supposed

Parallel and Cluster Computing 84

Examples (ctd)

Example 3. Multiplication of matrices a and b

double a[m][l];double b[l][n];double c[m][n];int i;...for(i=0; i<m; i++)

c[i][] = [+](a[i][]*b[]);

Page 85: Parallel and Cluster Computing 1. 2 Array Libraries u Function extensions of C and Fortran 77 with array or vector libraries –The libraries are supposed

Parallel and Cluster Computing 85

Memory Hierarchy

Parallel programming systems for VPs and SPs take into account their modern memory structure– Optimal memory management is often more efficient than

optimal usage of IEUs

– Approaches to optimal memory management appear surprisingly similar to optimisation of parallel facilities

Simple two-level memory model– Small and fast register memory

– Large and relatively slow main memory

Page 86: Parallel and Cluster Computing 1. 2 Array Libraries u Function extensions of C and Fortran 77 with array or vector libraries –The libraries are supposed

Parallel and Cluster Computing 86

Memory Hierarchy (ctd)

A simple modern memory hierarchy– Register memory – Cache memory – Main memory – Disk memory

Cache memory– A buffer memory between main memory and

registers – Holds copies of some data from the main memory

Page 87: Parallel and Cluster Computing 1. 2 Array Libraries u Function extensions of C and Fortran 77 with array or vector libraries –The libraries are supposed

Parallel and Cluster Computing 87

Memory Hierarchy (ctd)

Execution of instruction reading a data item from the main memory into a register– Check if a copy of the data item is already in the

cache– If so, the data item will be actually transferred into

the register from the cache– If not, the data item will be transferred into the

register from the main memory, and a copy of the item will appear in the cache

Page 88: Parallel and Cluster Computing 1. 2 Array Libraries u Function extensions of C and Fortran 77 with array or vector libraries –The libraries are supposed

Parallel and Cluster Computing 88

Cache

Cache– Partitioned into cache lines

» Cache line is a minimum unit of data transfer between the cache and the main memory

» Scalars may be transferred only as a part of a cache line

– Much smaller than the main memory» The same cache line may reflect different data blocks

from the main memory

Page 89: Parallel and Cluster Computing 1. 2 Array Libraries u Function extensions of C and Fortran 77 with array or vector libraries –The libraries are supposed

Parallel and Cluster Computing 89

Cache (ctd)

Types of cache memory– Direct mapped

» each block of the main memory has only one place it can appear in the cache

– Fully associative » a block can be placed anywhere in the cache

– Set associative » a block can be placed in a restricted set of places» a set is a group of two or more cache lines» n-way associative cache

Page 90: Parallel and Cluster Computing 1. 2 Array Libraries u Function extensions of C and Fortran 77 with array or vector libraries –The libraries are supposed

Parallel and Cluster Computing 90

Cache (ctd)

Cache miss is the situation when a data item being referenced is not in the cache

Minimization of cache misses is able to significantly accelerate execution of the program

Programs intensively using basic operations on arrays are obviously suitable for that type of optimization

Page 91: Parallel and Cluster Computing 1. 2 Array Libraries u Function extensions of C and Fortran 77 with array or vector libraries –The libraries are supposed

Parallel and Cluster Computing 91

Loop Tiling

The main specific optimization minimizing the number of cache misses is loop tiling

Consider the loop nest for(i=0; i<m; i++) /* loop 1 */ for(j=0; j<n; j++) /* loop 2 */

if(i==0) b[j]=a[i][j]; else b[j]+=a[i][j];

b[j] are repeatedly used by successive iterations of loop 1

Page 92: Parallel and Cluster Computing 1. 2 Array Libraries u Function extensions of C and Fortran 77 with array or vector libraries –The libraries are supposed

Parallel and Cluster Computing 92

Loop Tiling (ctd)

If n is large enough, the data items may be flushed from the cache by the moment of their repeated use

To minimize the flushing of repeatedly used data items, the number of iterations of loop 2 may be decreased

To keep the total number of iterations of this loop nest unchanged, an additional controlling loop is introduced

Page 93: Parallel and Cluster Computing 1. 2 Array Libraries u Function extensions of C and Fortran 77 with array or vector libraries –The libraries are supposed

Parallel and Cluster Computing 93

Loop Tiling (ctd)

The transformed loop nest is: for(k=0; k<n; k+=T) //additional controlling loop 0 for(i=0; i<m; i++) // loop 1 for(j=k; j<min(k+T,n); j++) // loop 2 if(i==0) b[j]=a[i][j]; else b[j]+=a[i][j];

This transformation is called tiling T is the tile size

Page 94: Parallel and Cluster Computing 1. 2 Array Libraries u Function extensions of C and Fortran 77 with array or vector libraries –The libraries are supposed

Parallel and Cluster Computing 94

Loop Tiling (ctd)

In general, the loop tiling is applied to loop nests of the form

for(i1=...) /* loop 1 */ for(i2=...) /* loop 2 */ ... for(in=...) { /* loop n */ ... e[i2]…[in] ... }

The goal is to minimize the number of cache misses for reference e[i2]…[in], which is repeatedly used by successive iterations of loop 1

Page 95: Parallel and Cluster Computing 1. 2 Array Libraries u Function extensions of C and Fortran 77 with array or vector libraries –The libraries are supposed

Parallel and Cluster Computing 95

Loop Tiling and Optimising Compilers

The recognition of the loop nests, which can be tiled is the most difficult problem to be solved by optimising C and Fortran 77 compilers– Based on the analysis of data dependencies in loop nests

Theorem. The loop tiling is legally applicable (to the above loop nest) iff the loops from loop 2 to loop n are fully interchangeable – To prove the interchangability an analysis of data

dependence between different iterations of the loop nest is needed

Page 96: Parallel and Cluster Computing 1. 2 Array Libraries u Function extensions of C and Fortran 77 with array or vector libraries –The libraries are supposed

Parallel and Cluster Computing 96

Loop Tiling and Array Libraries

Level 3 BLAS is specified to support block algorithms of matrix-matrix operations

Partitioning matrices into blocks and performing the computation on the blocks maximizes the reuse of data held in the upper levels of memory hierarchy

Page 97: Parallel and Cluster Computing 1. 2 Array Libraries u Function extensions of C and Fortran 77 with array or vector libraries –The libraries are supposed

Parallel and Cluster Computing 97

Loop Tiling and Parallel Languages

Compilers for parallel languages do not need to recognize loops suitable for tiling

They can translate explicit operations on arrays into loop nests with the best possible temporal locality

Page 98: Parallel and Cluster Computing 1. 2 Array Libraries u Function extensions of C and Fortran 77 with array or vector libraries –The libraries are supposed

Parallel and Cluster Computing 98

Virtual memory

Instructions address virtual memory rather than the real physical memory

The virtual memory is partitioned into pages of a fixed size – Each page is stored on a disk until it is needed – When the page is needed, it copied to main

memory, with the virtual addresses mapping into real addresses

– This copying is known as paging or swapping

Page 99: Parallel and Cluster Computing 1. 2 Array Libraries u Function extensions of C and Fortran 77 with array or vector libraries –The libraries are supposed

Parallel and Cluster Computing 99

Virtual memory (ctd)

Programs processing large enough arrays do not fit into main memory – The swapping takes place each time when required data are

not in the main memory

– The swapping is a very expensive operation

– Minimization of the number of swappings can significantly accelerate the programs

The problem is similar to minimization of cache misses and can be, therefore, approached similarly

Page 100: Parallel and Cluster Computing 1. 2 Array Libraries u Function extensions of C and Fortran 77 with array or vector libraries –The libraries are supposed

Parallel and Cluster Computing 100

Vector and Superscalar Processors: Summary

VPs and SPs provide instruction-level parallelism, which is best exploited by applications with intensive operations on arrays

Such applications can be written in a serial programming language and complied by dedicated optimizing compilers performing some specific loop optimizations– Modular, portable, and reliable programming are supported

– Efficiency and portable efficiency are also supported but only for a limited class of programs

Page 101: Parallel and Cluster Computing 1. 2 Array Libraries u Function extensions of C and Fortran 77 with array or vector libraries –The libraries are supposed

Parallel and Cluster Computing 101

Vector and Superscalar Processors: Summary (ctd)

Array libraries allow the programmers to avoid the use of dedicated compilers– The programmers express operations on arrays

directly using calls to carefully implemented subroutines

– Modular, portable, and reliable programming are supported

– Limited efficiency and portable efficiency» Excludes global optimization of combined array

operations

Page 102: Parallel and Cluster Computing 1. 2 Array Libraries u Function extensions of C and Fortran 77 with array or vector libraries –The libraries are supposed

Parallel and Cluster Computing 102

Vector and Superscalar Processors: Summary (ctd)

Parallel languages combine advantages of the first and second approaches– Operations on arrays can be explicitly expressed

» No need in sophisticated algorithms to recognize parallelizable loops

– Global optimisation of combined array operations is possible

– They support general-purpose programming (unlike existing array libraries)