
1

Introduction to Matlab Distributed Computing Server

(MDCS)

Dan Mazur and Pier-Luc St-Onge
guillimin@calculquebec.ca

December 1st, 2015

Partners and sponsors

2

3

Exercise 0: Login and Setup

Example hand-out slip: 07:k41a0?wy#

● Ubuntu login:
  ● Username: csuser07
  ● Password: ___@[S07

● Guillimin login:
  ● ssh class07@guillimin.hpc.mcgill.ca
  ● Password: k41a0?wy#

4

Outline

● Introduction and Overview
● Configuring MDCS for Guillimin
● Submitting and monitoring jobs on Guillimin
  – batch command
● Parallel toolbox
  – parfor loops (parallel for loops)
  – spmd sections (single program multiple data)
  – distributed arrays (large memory problems)
  – GPUs and Xeon Phis

5

Parallel Computing Toolbox (PCT)

● High-level constructs for parallel programming
  – parallel for loops
  – distributed arrays
  – data parallel (spmd) sections
● Implicit (automatic) parallelism
● Implemented with MPI (MPICH2)
● Restricted to 12 cores on a single node
  – Multi-node scalability is built into MPICH2
  – Mathworks deliberately limits this scalability

6

MDCS Overview

● MDCS gives Parallel Computing Toolbox users access to a number of workers (set by the license terms) on any number of nodes

7

MDCS vs. PCT differences

● MDCS jobs are submitted to the batch system on a cluster, not run locally
  – Client-server model
● In PCT, one explicitly starts a parpool environment
  – In MDCS, this environment is requested in the batch() command (see the sketch below)
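As a minimal sketch of that difference (the script name is a placeholder; this deck's own submission code uses the 'matlabpool' argument, which newer Matlab releases rename to 'Pool'):

% Request a pool of 4 workers for myScript.m at submission time,
% instead of starting a parpool yourself.
myCluster = parcluster('guillimin');
j = batch(myCluster, 'myScript', 'matlabpool', 4);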

8

MDCS Overview

[Diagram: Matlab on your PC sends a .m script plus attached files to the MDCS on Guillimin, which runs them on the worker nodes.]

9

MDCS Overview

[Diagram: as above, with the job scheduler shown between the MDCS and the worker nodes.]

10

MDCS Overview

[Diagram: as above; monitoring information flows back from Guillimin to Matlab on your PC.]

11

MDCS Overview

[Diagram: the full workflow — Matlab on your PC submits a .m script plus attached files to the MDCS, the job scheduler dispatches work to the worker nodes, and monitoring information flows back to your PC.]

Important: Do not attach large data files. Data transfer to and from Guillimin is best accomplished with scp or sftp. See http://www.hpc.mcgill.ca for large file transfers.

12

MDCS Licensing

One N-worker MDCS job requires:

Provided by user (often via institution):
  – Desktop Matlab license
  – Parallel computing toolbox license
  – Additional toolbox licenses

Provided by McGill HPC:
  – Pool of 64 MDCS licenses
  – N x MDCS worker licenses
  – 1 master process worker license

13

MDCS Scenario

● Researchers begin with desktop Matlab under institutional licenses

● Eventually, researchers and research programs depend on the resulting software

● Problem sizes grow with time, eventually necessitating parallel computing

● No problem: Mathworks implements its parallel computing toolbox functionality on top of an MPI implementation with good scaling behaviour, provided by the free software community

– But it places restrictions on the number of nodes and cores

– And requires additional licenses to remove these restrictions

● Because of decisions made years ago, researchers find themselves facing either

– Potentially expensive license fees to unlock their software's capabilities, or

– Financial and time barriers to switching vendors (i.e. porting code)

14

MDCS Alternatives

● Compile MPI functions with mex
  – Difficult to maintain, cannot use PCT functions, cannot use the Matlab debugger, must have access to many individual Matlab licenses (e.g. a TAH license)
● Use MatlabMPI - uses a global file system for MPI-like communication
  – Low performance for tightly-coupled problems
● Use GNU Octave
  – Reduces the switching costs by re-implementing the Matlab programming language
  – Parallel capabilities are less mature than Matlab's
● Port code to another language (Python, R, Fortran, etc.)
  – Significant effort and time
● Contact us for help and advice
  – guillimin@calculquebec.ca

15

MDCS Desktop Configuration

1) Install the scripts used for communicating with the scheduler

2) Configure the cluster profile

3) Verify your setup

16

Exercise 1: Install Scripts

● Download and unpack the .tar.gz configuration file on your local machine
  – E.g. Linux:

cd <workdir>
wget http://www.hpc.mcgill.ca/downloads/mdcs_config/guillimin_mdcs_config_v2.3.tar.gz
tar -xvf guillimin_mdcs_config_v2.3.tar.gz

● Copy all "config/toolbox-local/*" files to the "<your_matlab_install>/toolbox/local" folder on your local machine

● Start or restart Matlab. Then test your installation:

>> glmnVersion

17

Permissions

● What if you don't have write access to the toolbox/local folder?
● Create a new folder in your home directory for Matlab scripts
● Add the new path to your Matlab path:
  path('newpath', path);
● Set the new path in a startup.m file (see the sketch below)
● Use the MATLABPATH environment variable on Mac and Linux OSs
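A minimal sketch of the startup.m approach (the folder name is illustrative, not prescribed by these slides):

% startup.m - executed automatically when Matlab starts.
% Prepend a personal scripts folder to the search path.
addpath('/home/alex/matlab-scripts');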

18

MDCS Integration Scripts

glmnCommSubFcn.m, glmnIndSubFcn.m — Main drivers for submitting jobs
glmnGetRemoteConn.m — Establishes the connection to the cluster with ssh
glmnPBS.m — Specifies the submission parameters
glmnCreateSubScript.m — Creates a script which will run on the cluster to submit the job
glmnGenSubmitString.m — Generates the qsub command
glmnExtractJobId.m — Gets the PBS jobID from the cluster
glmnCommJobWrapper.sh, glmnIndJobWrapper.sh — The script that is submitted to the worker nodes by qsub
glmnDeleteJobFcn.m — Cancels a job on the cluster through Matlab
glmnGetJobStateFcn.m — Gets the job status from the cluster

19

Avoiding Metadata Corruption

● Each (server, Matlab installation) pair requires its own pair of metadata folders: one on the submitting computer and one on Guillimin
● E.g. installing a new version of Matlab and re-using the same metadata folders will result in corruption
● E.g. submitting to a new MDCS server and re-using the same metadata folders will result in corruption
● Multiple users on the same client will require a shared metadata folder (read and write) or separate profiles
  – Important: You cannot re-use your class account configuration for other Guillimin accounts

20

How many metadata folders?

[Diagram: Servers: guillimin and orcinus. Clients: a lab computer with R2013a and R2014a installed, and a home computer with R2013a.]

21

How many metadata folders?

[Diagram: the same servers and clients as on the previous slide.]

Answer: 12 — three client Matlab installations × two servers = six (server, installation) pairs, each needing one folder on the client and one on the server.

22

Exercise 2: Configure your computer

● We have made a script, glmnConfigCluster.m, to make configuration easier
● Warning: glmnConfigCluster will overwrite any profile called 'guillimin'

>> glmnConfigCluster
Enter a unique name for your local computer (e.g. the hostname): workshop
Home directory on local computer (e.g. /home/alex, /Users/alex, or C:\\Users\\alex): /Users/dmazur
Home directory on guillimin (e.g. /home/alex): /home/dmazur
One last step: please connect to guillimin, and create your Matlab job directory:

mkdir -p /home/dmazur/.matlab/jobs/workshop/guillimin/R2014a

Once done, your local computer will be configured to submit jobs to guillimin.

23

Exercise 3: Validation

● You will want to test your new cluster with simple tests before trying more complicated codes
● Clicking the validation button in Matlab can take a long time, and the final test is expected to fail
● Perform the validation procedure from the McGill HPC documentation
  – Must be performed in the TestParfor directory:
    cd examples/TestParfor
  – In glmnPBS.m, set procsPerNode to 3

24

A simple batch job

● myCluster = parcluster('guillimin')
  – Selects a cluster profile
● j = batch(myCluster, ...)
  – Submits a job to the cluster
  – Prompted for username
  – Select 'no' when asked to use an identity file
  – Prompted for password
● wait(j): Waits for the job to finish

25

Exercise 4: Simple Batch Job

>> myCluster = parcluster('guillimin')
>> j = batch(myCluster, @rand, 1, {10, 10}, 'CurrentDirectory', '.');
>> wait(j)
>> r = fetchOutputs(j)

26

glmnPBS.m

● For parallel jobs, we have a script (glmnPBS.m) to make job submission easier
● Place this script in your working directory
● Before submission, check that you have a valid glmnPBS.m file and that your submission parameters are correct:

>> test = glmnPBS();
>> test.getSubmitArgs()

27

classdef glmnPBS
    %Guillimin PBS submission arguments
    properties
        % Local script, remote working directory (home, by default)
        localScript = 'TestParfor';
        workingDirectory = '.';

        % nodes, ppn, gpus, phis and other attributes
        numberOfNodes = 1;
        procsPerNode = 3;
        gpus = 0;
        phis = 0;
        attributes = '';

        % Specify the memory per process required
        pmem = '1700m';

        % Requested walltime
        walltime = '00:30:00';

        % Please use metaq unless you require a specific node type
        queue = 'metaq';

        % All jobs should specify an account or RAPid:
        % e.g.
        % account = 'xyz-123-aa'
        account = '';

        % You may use otherOptions to append a string to the qsub command
        % e.g.
        % otherOptions = '-M email[at]address.com -m bae'
        otherOptions = '';
    end

28

Submitting with glmnPBS.m

>> cluster = parcluster('guillimin');

>> glmnPBS.submitTo(cluster);

● Note that glmnPBS.m must be present for all job submissions, even with batch()

– Called by glmnCommSubFcn.m

methods(Static)
    function job = submitTo(cluster)
        opt = glmnPBS();
        job = batch(cluster, opt.localScript, ...
            'matlabpool', opt.getNbWorkers(), ...
            'CurrentDirectory', opt.workingDirectory ...
        );
    end
end

29

Matlab Job Monitor

● Parallel > Monitor Jobs
● Select Profile: guillimin
● Enter username
● Select 'no' when asked to use an identity file
● Enter password
● Tip: Set autoupdate to 'never', or use an identity file. Otherwise, Matlab interrupts your work with password requests.

30

Matlab Job Monitor

The Job Monitor can report the state of each job, and more details such as output and errors (right click).

31

Monitoring Jobs on Guillimin

● Show running and queued jobs:
    qstat -u class01
  – qstat shows both MDCS and other Guillimin jobs
● Detailed scheduler information for the job with jobID=########:
    qstat -f ########
● Metadata is stored in job-specific folders:
    /home/username/.matlab/jobs/workshop/guillimin/R2014a/Job1
  – The .log files contain output and errors from Matlab itself
  – The .txt files contain output from disp() and fprintf()
● You should create output and save Matlab (.mat) files within your Guillimin storage (scratch, home, or project spaces) using fprintf() and save() — see the sketch below
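A minimal sketch of writing results into your own Guillimin storage (the output directory and the results variable are assumed examples, not prescribed paths):

% Save results and append a log line under your own storage space
% (replace outdir with your scratch, home, or project space).
outdir = '/home/class07/results';            % assumed example path
r = rand(10);                                % stand-in for real results
save(fullfile(outdir, 'results.mat'), 'r');
fid = fopen(fullfile(outdir, 'log.txt'), 'a');
fprintf(fid, 'Job finished at %s\n', datestr(now));
fclose(fid);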

32

Exercise 5: Submit Parallel Job

● Change the working directory to the examples/TestParfor folder you copied from the .tar.gz configuration file
● Launch TestParfor.m using glmnPBS.m:

>> cluster = parcluster('guillimin')
>> job = glmnPBS.submitTo(cluster)

33

Make sure you are in the correct directory:

>> cluster = parcluster('guillimin')
>> job = glmnPBS.submitTo(cluster)

This script runs for ~15 minutes. You may use showq or the job monitor to monitor its progress.

34

Exercise Codes

● While your job is waiting/running...

● Please download and extract the exercise codes from our website:

● http://www.hpc.mcgill.ca/downloads/intro_mdcs/dec2015.tar.gz

35

Parallel Matlab

● Benefits of parallelism:
  – Computations complete faster
  – Scale to larger data sets in the same amount of time
  – Work with larger data sets using distributed memory

36

Parallel Matlab

● Implicit (automatic) parallelism
  – Bioinformatics toolbox
  – Image processing toolbox
  – Optimization toolbox
  – Signal processing toolbox
  – Statistics toolbox
  – etc.
● Explicit parallelism
  – Parallel toolbox
    ● parfor
    ● spmd
    ● distributed()

37

TestParfor.m

function TestParfor
clear all;
N = 4000;
filename = '~/output_test_parfor.txt';   % location of the output file on Guillimin
outfile = fopen(filename, 'w');
fprintf(outfile, 'CALCULATION LOG: \n\n');

% Serial 'for' loop executed on the head processor
tic;
for k = 1:10
    Ham(:,:,k) = rand(N) + i*rand(N);
    fprintf(outfile, 'Serial: Doing K-point : %3i\n', k);
    inv(Ham(:,:,k));
end
t2 = toc;
fprintf(outfile, 'Time serial = %12f\n', t2);
fclose(outfile);

% Parallel 'parfor' loop executed on 2 worker nodes
tic;
parfor k = 1:10
    Ham(:,:,k) = rand(N) + i*rand(N);
    outfile = fopen(filename, 'a');
    fprintf(outfile, 'Parallel: Doing K-point : %3i\n', k);
    fclose(outfile);
    inv(Ham(:,:,k));
end
t2 = toc;
outfile = fopen(filename, 'a');
fprintf(outfile, 'Time parallel = %12f\n', t2);
fprintf(outfile, 'CALCULATIONS DONE ... \n\n');
fclose(outfile);

38

Parfor

[Diagram: a serial for loop executes iterations i = 1, 2, 3, 4 one after another; a parallel parfor loop with 4 workers executes all four iterations simultaneously, in roughly a quarter of the time.]

39

~/output_test_parfor.txt

CALCULATION LOG:

Serial: Doing K-point :   1
Serial: Doing K-point :   2
Serial: Doing K-point :   3
Serial: Doing K-point :   4
Serial: Doing K-point :   5
Serial: Doing K-point :   6
Serial: Doing K-point :   7
Serial: Doing K-point :   8
Serial: Doing K-point :   9
Serial: Doing K-point :  10
Time serial = 553.056296
Parallel: Doing K-point :   7
Parallel: Doing K-point :   4
Parallel: Doing K-point :   6
Parallel: Doing K-point :   3
Parallel: Doing K-point :   5
Parallel: Doing K-point :   2
Parallel: Doing K-point :   1
Parallel: Doing K-point :   9
Parallel: Doing K-point :   8
Parallel: Doing K-point :  10
Time parallel = 291.879429
CALCULATIONS DONE ...

Ideal speedup = 2.00X
Actual speedup = 1.90X

The serial 'for' loop ran on the head processor; the parallel 'parfor' loop ran on 2 worker nodes, and its iterations completed out of order.

40

Parfor loops

● The loop index must be consecutive integers
  – It cannot be altered inside the loop
● Iterations must be independent of one another
  – Local or temporary variables modified inside the parfor loop can't be used after the loop
● parfor loops cannot be nested
  – But a parfor loop need not be the outermost for loop
● The Matlab editor will automatically warn about problems (see the sketch below)
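A minimal sketch illustrating these rules (variable names are illustrative):

results = zeros(1, 100);      % preallocated output
parfor k = 1:100              % consecutive integer index; never modify k inside
    tmp = sin(k)^2;           % temporary variable: private to each iteration
    results(k) = tmp;         % 'sliced' variable: each iteration writes its own element
end
% tmp is undefined here; results is fully populated after the loop
disp(results(10))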

41

Load Balancing

● Each iteration of the loop should do an equal amount of work

Good load balancing:

parfor i = 1:40
    x = rand(1000, 1000);
    inv(x);
end

Bad load balancing:

parfor i = 1:40
    x = rand(100*i, 100*i);
    inv(x);
end

The 40th iteration has much more work than the 1st iteration.

42

Parallel Reduction

>> s = 0;
>> parfor i = 1:40
>>     s = s + i;
>> end
>> disp(s)
   820

● The operation is performed 'atomically'
● The operation must be associative
  ● e.g. addition or multiplication
  ● not subtraction or division

43

Aside: Atomic Operations

>> s = 0;
>> parfor i = 1:40
>>     s = s + i;
>> end
>> disp(s)
   820

Each update to s takes three steps:
Step 1: Read s from memory
Step 2: Add the increment
Step 3: Store the result in s

[Diagram: non-atomic addition — workers 1 and 2 both read s = 0, each computes 0 + 1, and both store 1, so one of the two updates is lost.]

44

Aside: Atomic Operations

>> s = 0;
>> parfor i = 1:40
>>     s = s + i;
>> end
>> disp(s)
   820

Step 1: Read s from memory
Step 2: Add the increment
Step 3: Store the result in s

[Diagram: non-atomic addition — both workers read s = 0 and both store 1, losing an update. Atomic addition — worker 1 reads s = 0 and stores 1 before worker 2 reads s = 1 and stores 2, so no update is lost.]

45

Aside: Atomic Operations

>> s = 0;
>> parfor i = 1:40
>>     s = s + i;
>> end
>> disp(s)
   820

Step 1: Read s from memory
Step 2: Add the increment
Step 3: Store the result in s

[Diagram: atomic addition — each worker's read-add-store sequence completes before the next worker's begins, so every update is counted.]

Matlab calls 's' a 'reduction variable', and these operations are automatically atomic.

http://www.mathworks.com/help/distcomp/reduction-variables.html

46

Parallel Concatenation

>> y = [];
>> parfor i = 1:10
>>     y = [y, i];
>> end
>> disp(y)
     1     2     3     4     5     6     7     8     9    10

● The matrix is stored in the 'correct' order according to index i

47

Parameter Sweep

● Damped harmonic oscillator (equation below)
● Give an initial velocity for a variety of k's and b's and watch the maximum response amplitude
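For reference, a damped harmonic oscillator with mass m, damping coefficient b, and spring constant k obeys (a standard form, assumed here since the slide shows no equation):

m\ddot{x} + b\dot{x} + kx = 0, \qquad x(0) = 0, \quad \dot{x}(0) = v_0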

48

Exercise 6: Parameter Sweep

● paramsweep.m solves a second-order ordinary differential equation (ODE) for varying parameter values
● Modify this code to run in parallel on 2 workers
● Submit your modified code to the MDCS
● Retrieve the resulting plot from Guillimin using scp and view it on your laptop:

[laptop]$ scp class07@guillimin.hpc.mcgill.ca:~/paramsweep.png ./

49

50

Single Program Multiple Data

● The spmd command allows each worker to execute the same program on different data
● The variables labindex and numlabs are (for example) used to index the data
  – They are automatically defined inside spmd sections
● The functions labSend() and labReceive() are used to send and receive data between the workers

51

>> matlabpool(3) % Or parpool(3) on newer versions of Matlab
Starting matlabpool using the 'local' profile ... connected to 3 workers.
>> spmd
labindex
end
Lab 1: ans = 1
Lab 2: ans = 2
Lab 3: ans = 3
>> spmd
q = magic(labindex + 2);
end
>> q
q =
 Lab 1: class = double, size = [3 3]
 Lab 2: class = double, size = [4 4]
 Lab 3: class = double, size = [5 5]
>> q{1}

ans =

     8     1     6
     3     5     7
     4     9     2

>> q{2}

ans =

    16     2     3    13
     5    11    10     8
     9     7     6    12
     4    14    15     1

52

SPMD data load example

● spmd can be used to have each worker process data from separate files
● Example: process data stored in files datafile1.mat, datafile2.mat, etc.

spmd
    infile = load(['datafile' num2str(labindex) '.mat']);
    result = myfunc(infile)
end

53

Serial numerical integration

m = 10;
b = pi/2;
dx = b/m;
x = dx/2:dx:b-dx/2;
int = sum(cos(x)*dx)

This is a midpoint-rule approximation of the integral of cos(x) from 0 to b = pi/2, whose exact value is 1.

54

SPMD Integral

● We would like to parallelize this integral using spmd
● In terms of m, b, numlabs and labindex:
  – How many increments per lab?
  – Integration length per lab?
  – Local integration range?
● We can use gplus() to perform a global sum over workers

55

SPMD Integral

● We would like to parallelize this integral using spmd
● In terms of m, b, numlabs and labindex:
  – How many increments per lab?
    ● n = m / numlabs
  – Integration length per lab?
    ● Delta = dx * n = (b / m) * (m / numlabs) = b / numlabs
  – Local integration range?
    ● ai = (labindex - 1) * Delta
    ● bi = labindex * Delta
● We can use gplus(int, 1) to perform a global sum over int from each worker

56

SPMD Integral

e.g.) m = 10, numlabs = 5:
n = 10/5 = 2 increments per lab
Delta = (pi/2)/5 = pi/10
ai = (labindex-1)*pi/10
bi = labindex*pi/10

Sum over increments for one worker:
int = sum(cos(x)*dx);
Global sum over all workers:
int = gplus(int, 1);
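Putting the pieces together, a minimal sketch of the parallel integral (assuming a worker pool is already open and that numlabs divides m):

m = 10; b = pi/2; dx = b/m;
spmd
    n = m / numlabs;                         % increments per lab
    Delta = b / numlabs;                     % integration length per lab
    ai = (labindex - 1) * Delta;             % local lower bound
    x = ai + dx/2 : dx : ai + Delta - dx/2;  % local midpoints
    int = sum(cos(x) * dx);                  % local partial sum
    int = gplus(int, 1);                     % global sum, gathered on lab 1
end
total = int{1}   % ~1.0: the exact integral of cos on [0, pi/2] is 1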

57

Exercise 7: Numerical Integration

● integration.m is a serial numerical integration program
● Modify this code to run in parallel using the spmd command
● Submit your modified code to the MDCS using 2 workers

58

Distributed Arrays

matlabpool(4)
A = distributed([ a b c d; e f g h; i j k l; m n o p]);

[Diagram: the columns of A are spread across the 4 MDCS workers — worker 1 holds [a;e;i;m], worker 2 holds [b;f;j;n], worker 3 holds [c;g;k;o], worker 4 holds [d;h;l;p].]

59

Distributed Arrays

● Allow large data sets to be distributed over multiple nodes
● Distributed by columns
● Can be constructed by
  – partitioning a large array already in memory
  – combining smaller arrays into one large array
  – using distributed matrix constructor functions (distributed.rand(), distributed.zeros(), etc.)
● Operations on distributed arrays are automatically parallelized (see the sketch below)
● Arrays do not persist if the matlabpool is closed
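A minimal sketch using the constructor functions above (the size is illustrative; assumes an open pool):

d = distributed.rand(4000);   % 4000x4000, spread column-wise over the workers
s = sum(d(:));                % the reduction is computed in parallel across the pool
s = gather(s);                % bring the scalar result back to the client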

60

Codistributed Arrays

● Codistributed arrays provide much more control over how arrays are distributed
  – They can be distributed along any dimension
  – Different amounts of data can be distributed to different workers
● Codistributed arrays can be declared inside spmd sections, as in the sketch below
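A minimal sketch (assuming an open pool) distributing a matrix by rows rather than the default columns:

spmd
    codist = codistributor1d(1);                 % distribute along dimension 1 (rows)
    A = codistributed.rand(1000, 1000, codist);  % each worker owns a block of rows
    myRows = getLocalPart(A);                    % this worker's local block
end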

61

Exercise 8: Matrix Multiplication

● matrixmul.m is a serial matrix multiplication
● Modify this file to use distributed arrays
  – create distributed random arrays a, b
  – time a matrix multiplication: tic; c = a*b; toc
● Submit the job for 1 worker and then for 4 workers
  – What is the speedup (serial time / parallel time)?

62

Using GPUs with Matlab

● The Parallel Computing Toolbox can utilize CUDA-capable GPUs on the system (e.g. the K20s on Guillimin)
● GPU-enabled functions
  – fft, filter
  – toolbox functions
  – linear-algebra operations
● Custom CUDA kernels
  – .cu or .ptx formats

63

GPU Arrays

● Matlab can copy arrays to the GPU
● Matrix operations performed on the GPU can be much faster
● e.g. (expanded in the sketch below):
  – x = rand(1000, 'single', 'gpuArray');
  – x2 = x.*x; % performed on GPU
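Expanding that example into a minimal, self-contained sketch:

x  = rand(1000, 'single', 'gpuArray');  % 1000x1000 array created directly on the GPU
x2 = x .* x;                            % element-wise square, performed on the GPU
y  = gather(x2);                        % copy the result back to host memory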

64

Exercise 9: GPU Job

● fourier.m is a serial fast Fourier transform (FFT) code
● Modify this file to perform the same calculation using normal and GPU arrays
● Use tic and toc to time both operations and output the results
● Submit this job to a Guillimin GPU node
  – Hint: simply request, in glmnPBS.m:

numberOfNodes = 1;

procsPerNode = 1;

gpus = 1;

● What is the speedup from the GPU?

65

Summary

● Today we learned:
  – How to configure a desktop installation of Matlab to submit jobs to a cluster computer using MDCS
  – How to submit jobs to a cluster and monitor their output
  – How to write parallel Matlab applications using parfor, spmd, and distributed arrays
● Many Matlab programs can be parallelized with a very small change
● Note that parallel programming is a huge topic, and we have only scratched the surface!

66

Questions

What questions do you have?

67

Using Xeon Phi with Matlab

● Matlab uses the Intel MKL math library

● Version >= 11.0 of MKL has automatic offloading to Xeon Phi
  – Included in Matlab R2014a and newer
● On Guillimin:

module add ifort_icc

export MKL_MIC_MAX_MEMORY=16G

export MKL_MIC_ENABLE=1

matlab &
