TRANSCRIPT
-
Introduction to OpenMP
Eric Aubanel
Advanced Computational Research Laboratory
Faculty of Computer Science, UNB
Fredericton, New Brunswick
-
Shared Memory
[Diagram: several processes sharing a single address space.]
-
Shared Memory Multiprocessor
-
Distributed vs. DSM
[Diagram: distributed memory, where each process has its own address space and memory, connected by a network; vs. distributed shared memory (DSM), where the network joins the per-node memories into a global address space.]
-
Parallel Programming Alternatives
Use a new programming language
Use an existing sequential language modified to handle parallelism
Use a parallelizing compiler
Use library routines/compiler directives with an existing sequential language
Shared memory (OpenMP) vs. distributed memory (MPI)
-
What is Shared Memory Parallelization?
All processors can access all the memory in the parallel system (one address space).
The time to access the memory may not be equal for all processors: not necessarily a flat memory.
Parallelizing on an SMP does not reduce CPU time; it reduces wallclock time.
Parallel execution is achieved by generating multiple threads which execute in parallel.
The number of threads is (in principle) independent of the number of processors.
-
Threads: The Basis of SMP Parallelization
Threads are not full UNIX processes. They are lightweight, independent "collections of instructions" that execute within a UNIX process.
All threads created by the same process share the same address space. This is a blessing and a curse: "inter-thread" communication is efficient, but it is easy to stomp on memory and create race conditions.
Because they are lightweight, threads are (relatively) inexpensive to create and destroy. Creation of a thread can take three orders of magnitude less time than process creation!
Threads can be created and assigned to multiple processors: this is the basis of SMP parallelism!
-
Processes vs. Threads
[Diagram: a process contains code, heap, and one or more threads; each thread has its own instruction pointer (IP) and stack, while all threads share the process's code and heap.]
-
Methods of SMP Parallelism
1. Explicit use of threads
   Pthreads: see "Pthreads Programming" from O'Reilly & Associates, Inc.
2. Using a parallelizing compiler and its directives, you can generate pthreads "under the covers."
   Can use vendor-specific directives (e.g. !SMP$)
   Can use industry-standard directives (e.g. !$OMP and OpenMP)
-
OpenMP
1997: a group of hardware and software vendors announced their support for OpenMP, a new API for multi-platform shared-memory programming (SMP) on UNIX and Microsoft Windows NT platforms. www.openmp.org
OpenMP provides directives, embedded in C/C++ or Fortran source code (as comment lines in Fortran, pragmas in C/C++), for:
   scoping data
   specifying work load
   synchronization of threads
OpenMP provides function calls for obtaining information about threads, e.g., omp_get_num_threads(), omp_get_thread_num()
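A minimal sketch of these runtime calls (added for illustration, not from the original deck; assumes a compiler providing the standard omp_lib module and compiled with its OpenMP flag):

   program thread_info
   use omp_lib                          ! standard OpenMP runtime interface
   integer tid, nthreads
!$omp parallel private(tid)
   tid = omp_get_thread_num()           ! this thread's id (0 = master)
!$omp single
   nthreads = omp_get_num_threads()     ! size of the current team
   print *, 'number of threads:', nthreads
!$omp end single
   print *, 'hello from thread', tid
!$omp end parallel
   end program thread_info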
-
OpenMP example
subroutine saxpy(z, a, x, y, n)
integer i, n
real z(n), a, x(n), y
!$omp parallel do
do i = 1, n
   z(i) = a * x(i) + y
end do
return
end
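A possible driver for this routine (illustrative only; the array size and values are made up, not from the slides):

   program test_saxpy
   integer, parameter :: n = 40
   real z(n), x(n)
   integer i
   do i = 1, n
      x(i) = real(i)
   end do
   call saxpy(z, 2.0, x, 0.5, n)
   print *, z(1), z(n)                  ! expect 2.5 and 80.5
   end program test_saxpy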
-
OpenMP Threads
1. All OpenMP programs begin as a single process: the master thread
2. FORK: the master thread then creates a team of parallel threads
3. Parallel region statements are executed in parallel among the various team threads
4. JOIN: threads synchronize and terminate, leaving only the master thread
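A minimal fork/join sketch (an illustration added here, not part of the original deck):

   program fork_join
   use omp_lib
   print *, 'before: master thread only'
!$omp parallel
   ! FORK: a team of threads executes this region
   print *, 'inside region, thread', omp_get_thread_num()
!$omp end parallel
   ! JOIN: the team has synchronized; only the master continues
   print *, 'after: master thread only'
   end program fork_join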
-
Private vs Shared Variables
Serial execution: z, a, x, y, n, and i all live in global shared memory, and all data references go to global shared memory.
Parallel execution: references to z, a, x, y, n are to global shared memory; each thread has a private copy of i, and references to i are to the private copy.
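The same scoping can be spelled out explicitly on the directive. This variant of the saxpy loop is illustrative, not from the slides (the loop index of a parallel do is private even without the clause):

!$omp parallel do shared(z, a, x, y, n) private(i)
   do i = 1, n
      z(i) = a * x(i) + y
   end do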
-
Division of Work
subroutine saxpy(z, a, x, y, n)
integer i, n
real z(n), a, x(n), y
!$omp parallel do
do i = 1, n
   z(i) = a * x(i) + y
end do
return
end

With n = 40 and 4 threads, the iterations are divided among the threads: i = 1,10; i = 11,20; i = 21,30; i = 31,40.
[Diagram: z(1)..z(40), x(1)..x(40), n, a, and y all reside in global shared memory; each thread's private copy of i is in its local memory.]
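This block division corresponds to OpenMP's static schedule, which can also be requested explicitly; a sketch, assuming the standard schedule clause:

!$omp parallel do schedule(static)
   do i = 1, n
      z(i) = a * x(i) + y              ! i = 1,10 / 11,20 / 21,30 / 31,40 with 4 threads
   end do

schedule(dynamic, chunk) would instead hand out chunks of iterations to threads as they finish, which helps when iterations have uneven cost.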
-
Variable Scoping
The most difficult part of shared memory parallelization:
   What memory is shared
   What memory is private (i.e. each processor has its own copy)
   How private memory is treated vis-a-vis the global address space
Variables are shared by default, except for the loop index in a parallel do (see the sketch below).
This must mesh with the Fortran view of memory:
   Global: shared by all routines
   Local: local to a given routine
   Saved vs. non-saved variables (through the SAVE statement or -save option)
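One defensive idiom (illustrative, not from the slides) is default(none), which turns the defaults off and forces every variable to be scoped explicitly:

!$omp parallel do default(none) shared(z, a, x, y, n) private(i)
   do i = 1, n
      z(i) = a * x(i) + y
   end do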
-
Static vs. Automatic Variables
The Fortran 77 standard allows subprogram local variables to become undefined between calls, unless saved with a SAVE statement.

         STATIC       AUTOMATIC
AIX      (default)    -qnosave
IRIX     -static      -automatic (default)
SunOS    (default)    -stackvar
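Why this matters for OpenMP: a statically allocated local is a single copy shared by every thread. A hedged sketch of the hazard (the routine and variable names are made up):

   subroutine scale(v)
   real v, tmp
   save tmp                  ! static: one copy shared by all threads
   tmp = 2.0 * v             ! concurrent calls race on tmp
   v = tmp
   end

!$omp parallel do
   do i = 1, n
      call scale(x(i))       ! unsafe if tmp is static; safe if automatic (on each thread's stack)
   end do

Compiling with the AUTOMATIC options above puts such locals on each thread's stack and removes the race.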
-
OpenMP Directives in Fortran
Line continuation:
Fixed form:
!$OMP PARALLEL DO
!$OMP&PRIVATE (JMAX)
!$OMP&SHARED(A, B)
Free form:
!$OMP PARALLEL DO &
!$OMP PRIVATE (JMAX) &
!$OMP SHARED(A, B)
-
OpenMP Overhead
Overhead for parallelization is large (e.g. 8000 cycles for a parallel do over 16 processors of an SGI Origin 2000).
The size of the parallel work construct must be significant enough to overcome the overhead.
Rule of thumb: it takes 10 kFLOPS to amortize the overhead.
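OpenMP's if clause is one way to act on this rule of thumb: run the loop in parallel only when the trip count is worth the fork/join cost. A sketch; the threshold of 10000 is an illustrative guess, not from the slides:

!$omp parallel do if(n > 10000)        ! small n falls back to serial execution
   do i = 1, n
      z(i) = a * x(i) + y
   end do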
-
OpenMP Use
How is OpenMP typically used?
OpenMP is usually used to parallelize loops:
   Find your most time consuming loops.
   Split them up between threads.
Better scaling can be obtained using OpenMP parallel regions, but this can be tricky! (See the sketch below.)
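An illustrative parallel-region sketch (not from the slides; the second loop and the array w are hypothetical): one fork/join pays for several loops, instead of one fork/join per parallel do:

!$omp parallel private(i)
!$omp do
   do i = 1, n
      z(i) = a * x(i) + y
   end do
!$omp end do
!$omp do
   do i = 1, n
      w(i) = z(i) * z(i)    ! hypothetical second loop reusing the same team
   end do
!$omp end do
!$omp end parallel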
-
OpenMP vs. MPI
OpenMP:
   Only for shared memory computers
   Easy to incrementally parallelize
   More difficult to write highly scalable programs
   Small API based on compiler directives and limited library routines
   Same program can be used for sequential and parallel execution
   Shared vs. private variables can cause confusion
MPI:
   Portable to all platforms
   Parallelize all or nothing
   Vast collection of library routines
   Possible but difficult to use same program for serial and parallel execution
   Variables are local to each processor
-
References
Parallel Programming in OpenMP, by Chandra et al. (Morgan Kaufmann)
www.openmp.org
Multimedia tutorial at Boston University: scv.bu.edu/SCV/Tutorials/OpenMP/
Lawrence Livermore online tutorial: www.llnl.gov/computing/training/
European Workshop on OpenMP (EWOMP): www.epcc.ed.ac.uk/ewomp2000/