Parallelization with the Matlab® Distributed Computing Server (MDCS) @ CBI cluster

Upload: lindsay-cobb

Post on 26-Dec-2015

Overview

• Parallelization with Matlab using the Parallel Computing Toolbox (PCT)

• Matlab Distributed Computing Server Introduction

• Benefits of using the MDCS

• Hardware/Software/Utilization @ CBI

• MDCS Usage Scenarios

• Hands-on Training

Parallelization with Matlab PCT

• The Matlab Parallel Computing Toolbox (PCT) provides access to multi-core, multi-system (MDCS), and GPU parallelism.

• Many built-in Matlab functions (e.g. fft) support parallelism transparently.

• Parallel language constructs make it possible to convert for loops into parfor loops.

• The toolbox addresses many different types of parallel software development challenges.

• MDCS allows locally developed, parallel-enabled Matlab applications to be scaled out.
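
A minimal sketch of the for-to-parfor conversion mentioned in this slide. The workload, variable names, and sizes are illustrative assumptions and are not taken from the original slides.

```matlab
% Illustrative workload: largest singular value of N small matrices.
% N and the matrix sizes are arbitrary values chosen for this sketch.
N = 8;
serialResult   = zeros(1, N);
parallelResult = zeros(1, N);

% Standard for loop: iterations run one after another on the client.
for k = 1:N
    A = magic(k + 2);                 % deterministic input so both loops agree
    serialResult(k) = max(svd(A));
end

% parfor loop: the iterations are independent of each other, so they can
% be distributed across the workers of a pool.
parfor k = 1:N
    A = magic(k + 2);
    parallelResult(k) = max(svd(A));
end

% Both loops compute the same values (up to floating-point rounding).
assert(max(abs(serialResult - parallelResult)) < 1e-9);
```

The conversion only works because no iteration reads a value written by another iteration; a loop with such dependencies would need to be restructured first.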

Parallelization with Matlab PCT

• Distributed / parallel algorithm characteristics

– Memory usage & CPU usage

• Load a 4 gigabyte file into memory, then calculate averages

– Communication / data I/O patterns

• Read file 1 (10 gigabytes), then run a function

• Worker B sends data to worker A, which runs a function and returns the data to worker B

– Dependencies

• Function 1 → Function 2 → Function 3

• Hardware resource contention (e.g. 16 cores each trying to read/write a set of files, bandwidth limitations on RAM)

• Managing large numbers of small files leads to filesystem contention
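
The "worker B sends data to worker A" communication pattern listed above could be sketched as follows. This assumes a pool with at least two workers is already open; the role assignment, the payload, and the use of cumsum as the function are hypothetical choices for illustration.

```matlab
% Assumes a pool with at least 2 workers is already open
% (matlabpool in releases of this era, parpool in later ones).
spmd
    if numlabs < 2
        error('This sketch needs at least 2 workers in the pool.');
    end
    workerA = 1;                          % illustrative role assignment
    workerB = 2;
    result = [];                          % stays empty on every lab except B
    if labindex == workerB
        data = 1:10;                      % hypothetical payload
        labSend(data, workerA);           % B -> A
        result = labReceive(workerA);     % wait for A's answer
    elseif labindex == workerA
        received = labReceive(workerB);   % A waits for B's data
        labSend(cumsum(received), workerB);   % A runs a function and returns the data
    end
end

% On the client, result is a Composite; element 2 holds worker B's copy.
disp(result{2});
```

Every such exchange adds latency, which is why algorithms with little inter-worker communication tend to scale better than ones that exchange data at every step.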

Parallelization with Matlab PCT

GPU cards / external accelerator cards

CPUs, multi-cores

Clusters

Applications have layers of parallelism. For an optimal solution, the application must be considered as a whole.

Scalability: use as many workers as possible in an efficient manner.

The Matlab PCT + MDCS framework automates much of the complexity of developing parallel & distributed applications.

Parallelization with Matlab PCT & MDCS

Distributed loops: parfor

Interactive development mode (matlabpool / pmode)

Distributed arrays (spmd)

CPUs, multi-cores, MDCS cluster

Scale out with the MDCS cluster in batch job submission mode
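
As a hedged illustration of the distributed-array (spmd) item above, the sketch below splits a vector across the labs and reduces it in parallel. The vector length and the sum-of-squares computation are assumptions chosen so that the result is easy to check by hand.

```matlab
% Assumes a pool is open; the more labs it has, the smaller each lab's part.
spmd
    D = codistributed.colon(1, 1000);     % the vector 1:1000, split across the labs
    nLocal = numel(getLocalPart(D));      % number of elements stored on this lab
    s = gather(sum(D .^ 2));              % parallel reduction, result copied to every lab
end

% s is a Composite on the client; take the copy held by lab 1.
total = s{1};
fprintf('Sum of squares of 1..1000: %d\n', total);
```

No lab ever holds the whole array, which is what allows data sets larger than the memory of a single machine to be processed.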

MDCS Benefits

MDCS worker processes (a.k.a. “labs”)

– The workers never request regular Matlab or toolbox licenses.

– The only license an MDCS worker ever uses is an MDCS worker license (of which we have up to 64).

– Toolboxes are unlocked for an MDCS worker based on the licenses owned by the client during the job submission process.

– A wonderful parallel algorithm development environment, with the superior visualization & profiling capabilities of the Matlab environment.

– Many built-in functions are parallel enabled: fft, lu, svd, …

– Distributed arrays allow the development of data-parallel algorithms.

– Codes that cannot be compiled with the Matlab Compiler can still be scaled.

– Development on a laptop can move directly to running on up to 64 MDCS labs. (Some simulations can go from years of runtime to days of runtime on 64 MDCS labs.)

MDCS Structure

Hardware/Software/Utilization @ CBI

• MDCS worker processes run on 4 physical servers (Dell PowerEdge M910).

• Four 16-core systems, each with 64 GB RAM and 2 Intel Xeon 2.26 GHz processors with 8 cores per processor.

• Total of 64 cores and 256 GB RAM, distributed among the systems.

• Maximum of 64 MDCS worker licenses available.

• Subsets of MDCS workers can be created based on project needs.

Usage Scenarios

Local system: interactive use (matlabpool / spmd / pmode / mpiprofile)

– Local system (e.g. one of the workstations @ CBI) as part of initial algorithm development.

MDCS: non-interactive use, job & task based

– 2 main types: independent vs. communicating jobs

• Both types can be used with either the local profile (on a non-cluster workstation) or the MDCS profile.

MDCS Workloads

2 main types of workloads can be implemented with the MDCS:

– A job is logically decomposed into a set of tasks. The job may have 1 or more tasks, and each task may or may not have additional parallelism within it.

CASE 1: Independent

Within a job the parallelism is fully independent, so MDCS workers can be used to offload some of the independent work units. The code does not make use of parallel language features such as parfor or spmd. Note: in many cases, a parfor loop can be transformed into a set of tasks.

– createJob() + createTask(), createTask(), … createTask()
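
A minimal sketch of the createJob() + createTask() pattern for independent work. It uses the default cluster profile rather than a named MDCS profile, and the task function (sum) and its inputs are placeholders chosen for illustration.

```matlab
c = parcluster();                  % cluster object for the default profile
job = createJob(c);                % independent job: tasks do not communicate

% One task per independent work unit. Each task calls sum on its own
% input and returns 1 output. The inputs are placeholder data.
inputs = {1:10, 1:100, 1:1000};
for k = 1:numel(inputs)
    createTask(job, @sum, 1, inputs(k));
end

submit(job);                       % hand the job to the scheduler
wait(job);                         % block until all tasks have finished
out = fetchOutputs(job);           % one row per task, in creation order
delete(job);                       % remove the job's data from the cluster

disp(out);
```

Because each task is scheduled separately, the tasks can start as soon as any worker becomes free; they do not need all workers to be available at the same time.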

CASE 2: Communicating

Within a single job the parallelism is more complex: the workers need to communicate, or Parallel Computing Toolbox language features such as parfor, spmd, or codistributed arrays are used.

– createCommunicatingJob(), createTask()
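
A hedged sketch of the createCommunicatingJob() + createTask() pattern. The default profile, the worker count of 2, and the trivial task body are assumptions for illustration; the default profile must be able to supply at least 2 workers.

```matlab
c = parcluster();                              % default profile
job = createCommunicatingJob(c, 'Type', 'SPMD');
job.NumWorkersRange = [2 2];                   % illustrative: run on exactly 2 workers

% A communicating job has a single task that every worker executes.
% gplus adds the value contributed by each lab and returns the total
% to all of them, so the workers must exchange data to finish.
createTask(job, @() gplus(labindex), 1, {});

submit(job);
wait(job);
out = fetchOutputs(job);                       % one row per worker
delete(job);

disp(out);
```

Unlike an independent job, this job only starts once all requested workers are available at the same time, since they must be able to talk to each other.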

MDCS Working Environment

Interactive Mode Sample (parfor)

For workloads that map well, parfor can yield exceptional performance improvements.

From years to days, or days to hours, for certain workloads: the ideal case is long-running jobs with little or no inter-job communication.

[Figure panels: parfor enabled on the MDCS; standard for loop]
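
The original slide compared the two loop forms with screenshots that did not survive extraction. The sketch below is not the author's code; it is an assumed stand-in that times a placeholder workload with both loop forms. The measured speedup depends entirely on the pool that is open when the parfor loop runs.

```matlab
n = 16;                                   % number of independent work units (illustrative)

% Standard for loop.
tic;
a = zeros(1, n);
for k = 1:n
    a(k) = max(abs(eig(magic(200 + k)))); % placeholder for an expensive computation
end
tSerial = toc;

% Same work as a parfor loop; iterations go to the workers of the open pool.
tic;
b = zeros(1, n);
parfor k = 1:n
    b(k) = max(abs(eig(magic(200 + k))));
end
tParallel = toc;

fprintf('for: %.2f s, parfor: %.2f s\n', tSerial, tParallel);
```

With work units this small, transfer and scheduling overhead can outweigh the gain; the benefit grows as each iteration becomes longer running.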

MDCS Scaling ( Batch Mode )
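
The figures for these slides were lost in extraction. As a hedged sketch of what batch-mode submission can look like, the example below offloads one function call to a worker of the default profile; the function (sum) and its input are placeholders. batch also accepts an option that gives the batch code its own pool of workers for parfor or spmd, but the name of that option has changed between releases, so check the documentation for the installed version.

```matlab
c = parcluster();                       % default profile; an MDCS profile could be used instead

% Run sum(1:1000) on a cluster worker; 1 is the number of outputs requested.
job = batch(c, @sum, 1, {1:1000});

% The client session is free to do other work while the job runs.

wait(job);                              % block until the job has finished
out = fetchOutputs(job);                % 1-by-1 cell array holding the result
delete(job);                            % clean up the job's data

fprintf('Result: %d\n', out{1});
```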

Summary

• Applied examples of using MDCS in batch mode are available as part of the hands-on section; more in-depth MDCS usage information is available via a consulting appointment.

• We can allocate a subset of MDCS workers on a per-project basis.

• Wonderful parallel algorithm design & development environment

• Scale out codes to up to 64 Matlab MDCS workers

– Both distributed compute & memory

• Minimizes usage of standard Matlab + toolbox licenses

• Many options for approaching the parallelization of computational workloads

Acknowledgements

• This project received computational, research & development, software design/development support from the Computational System Biology Core/Computational Biology Initiative, funded by the National Institute on Minority Health and Health Disparities (G12MD007591) from the National Institutes of Health. URL: http://www.cbi.utsa.edu

Contact Us

http://cbi.utsa.edu

Appendix A

Local Mode: Matlab Worker Process/Thread Structure

Parallel Computing Toolbox constructs can be tested in local mode; the “lab” abstraction allows the actual process used for a lab to reside either locally or on a distributed server node.

MPI is used for inter-process communication between “labs” (Matlab worker processes).
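
Where a lab actually runs is decided by the cluster profile rather than by the parallel code itself. The sketch below simply lists the profiles known to the client and inspects the default one; which profiles exist on a given installation is an assumption that has to be checked locally.

```matlab
% Names of all cluster profiles configured on this client.
profiles = parallel.clusterProfiles();
fprintf('Available profiles: %s\n', sprintf('%s ', profiles{:}));

% Cluster object for the default profile (typically the local one
% unless another profile has been set as the default).
c = parcluster();
fprintf('Default profile "%s" offers up to %d workers\n', c.Profile, c.NumWorkers);
```

Code tested against the local profile can then be pointed at a cluster by passing a different profile name to parcluster.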

Local Mode Scaling Sample(parfor)

Interactive Mode Sample (pmode/spmd)

Each lab handles a piece of the data.

Results are gathered on lab 1.

The client session requests that the complete data set be sent to it using lab2client.
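
The three steps above can be sketched with spmd as follows. This is an assumed equivalent, not the code from the original slide: lab2client is a pmode command, whereas in spmd the client reads lab 1's value by indexing the resulting Composite. The data set and the squaring step are placeholders.

```matlab
data = 1:20;                               % placeholder data set
spmd
    % Each lab takes every numlabs-th element, starting at its own index.
    myPart   = data(labindex:numlabs:end);
    mySquare = myPart .^ 2;                % this lab's piece of the work
    % Concatenate all pieces along dimension 2; only lab 1 receives the result.
    combined = gcat(mySquare, 2, 1);
end

% Copy lab 1's value from the Composite to the client.
result = combined{1};
disp(sort(result));                        % pieces arrive grouped by lab, so sort for display
```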

Local vs. MDCS Mode Compare (parfor)

Appendix B: MDCS Access

• Access to MDCS is provided via the Cheetah cluster.

– On Linux: ssh -Y [email protected]

– qlogin

– matlab &

– On Windows: use PuTTY + Xming with X11 forwarding

– qlogin

– matlab &
