openmp.eric


TRANSCRIPT

  • Slide 1/22

    Introduction to OpenMP
    Eric Aubanel

    Advanced Computational Research Laboratory
    Faculty of Computer Science, UNB
    Fredericton, New Brunswick

  • Slide 2/22

    Shared Memory

    [Figure: processes sharing a single address space]

  • Slide 3/22

    Shared Memory Multiprocessor

  • Slide 4/22

    Distributed vs. DSM

    [Figure: distributed memory (each node has its own address space and processes, connected by a network) vs. distributed shared memory (separate memories joined by a network into one global address space)]

  • Slide 5/22

    Parallel Programming Alternatives

    Use a new programming language
    Use an existing sequential language modified to handle parallelism
    Use a parallelizing compiler
    Use library routines/compiler directives with an existing sequential language
    Shared memory (OpenMP) vs. distributed memory (MPI)

  • Slide 6/22

    What is Shared Memory Parallelization?

    All processors can access all the memory in the parallel system (one address space).

    The time to access memory may not be equal for all processors (not necessarily a flat memory).

    Parallelizing on an SMP does not reduce CPU time; it reduces wall-clock time.

    Parallel execution is achieved by generating multiple threads which execute in parallel.

    The number of threads is (in principle) independent of the number of processors.
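
    As a concrete illustration of the wall-clock point, here is a minimal timing sketch (assuming the standard omp_get_wtime() routine from the omp_lib module and an OpenMP-enabled compiler, e.g. gfortran -fopenmp): the total CPU time summed over threads stays roughly the same, but the elapsed time shrinks.

    ! Sketch: measure elapsed (wall-clock) time around a parallel loop.
    program walltime
      use omp_lib
      implicit none
      integer, parameter :: n = 1000000
      real :: x(n)
      double precision :: t0, t1
      integer :: i
      t0 = omp_get_wtime()
    !$omp parallel do
      do i = 1, n
         x(i) = sqrt(real(i))
      end do
      t1 = omp_get_wtime()
      print *, 'elapsed seconds:', t1 - t0
    end program walltime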

  • Slide 7/22

    Threads: The Basis of SMP Parallelization

    Threads are not full UNIX processes. They are lightweight, independent "collections of instructions" that execute within a UNIX process.

    All threads created by the same process share the same address space. This is a blessing and a curse: "inter-thread" communication is efficient, but it is easy to stomp on memory and create race conditions.

    Because they are lightweight, threads are (relatively) inexpensive to create and destroy. Creating a thread can take three orders of magnitude less time than creating a process!

    Threads can be created and assigned to multiple processors: this is the basis of SMP parallelism!

  • Slide 8/22

    Processes vs. Threads

    [Figure: a process containing shared code and heap, with several threads, each having its own instruction pointer (IP) and stack]

  • Slide 9/22

    Methods of SMP Parallelism

    1. Explicit use of threads: Pthreads (see "Pthreads Programming" from O'Reilly & Associates, Inc.)

    2. Using a parallelizing compiler and its directives, you can generate pthreads "under the covers":
       can use vendor-specific directives (e.g. !SMP$)
       can use industry-standard directives (e.g. !$OMP and OpenMP)

  • Slide 10/22

    OpenMP

    1997: a group of hardware and software vendors announced their support for OpenMP, a new API for multi-platform shared-memory programming (SMP) on UNIX and Microsoft Windows NT platforms. www.openmp.org

    OpenMP provides directives (comment lines in Fortran, pragmas in C/C++), embedded in the source code, for scoping data, specifying the division of work, and synchronizing threads.

    OpenMP provides function calls for obtaining information about threads, e.g. omp_get_num_threads(), omp_get_thread_num().
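
    A minimal sketch of these runtime calls (the standard names are omp_get_num_threads() and omp_get_thread_num(), available through the omp_lib module):

    ! Each thread reports its id and the size of the team.
    program hello_threads
      use omp_lib
      implicit none
      integer :: tid, nthreads
    !$omp parallel private(tid, nthreads)
      tid = omp_get_thread_num()
      nthreads = omp_get_num_threads()
      print *, 'thread', tid, 'of', nthreads
    !$omp end parallel
    end program hello_threads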

  • Slide 11/22

    OpenMP example

    subroutine saxpy(z, a, x, y, n)
    integer i, n
    real z(n), a, x(n), y
    !$omp parallel do
    do i = 1, n
       z(i) = a * x(i) + y
    end do
    return
    end
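
    A small driver for this routine might look as follows (a sketch only; the values are arbitrary, and it assumes the saxpy routine above is compiled into the same executable):

    ! Hypothetical driver for the saxpy example.
    program test_saxpy
      implicit none
      integer, parameter :: n = 8
      real :: z(n), x(n), a, y
      integer :: i
      a = 2.0
      y = 1.0
      do i = 1, n
         x(i) = real(i)
      end do
      call saxpy(z, a, x, y, n)
      print *, z
    end program test_saxpy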

  • Slide 12/22

    OpenMP Threads

    1. All OpenMP programs begin as a single process: the master thread.
    2. FORK: the master thread then creates a team of parallel threads.
    3. Parallel region statements are executed in parallel among the various team threads.
    4. JOIN: threads synchronize and terminate, leaving only the master thread.
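
    The fork/join pattern can be seen in a short sketch (assuming omp_get_thread_num() from omp_lib): the first and last prints run once on the master thread, while the print inside the parallel region runs once per team thread.

    ! Sketch of the fork/join model: serial -> parallel region -> serial.
    program fork_join
      use omp_lib
      implicit none
      print *, 'before: only the master thread'            ! executed once
    !$omp parallel
      print *, 'inside: thread', omp_get_thread_num()      ! executed by each team thread
    !$omp end parallel
      print *, 'after: threads have joined'                ! executed once again
    end program fork_join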

  • Slide 13/22

    Private vs. Shared Variables

    [Figure: serial vs. parallel view of the saxpy variables]

    Serial execution: all data references (z, a, x, y, n, i) go to global shared memory.

    Parallel execution: references to z, a, x, y, n are to global shared memory; each thread has a private copy of i, and references to i are to the private copy.
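
    The same scoping can be written out explicitly on the saxpy loop; this sketch has identical semantics to the default rules described above (the shared and private clauses are standard OpenMP):

    ! Sketch: the saxpy loop with explicit scoping clauses.
    ! z, a, x, y, n live in global shared memory; each thread gets its own copy of i.
    subroutine saxpy_scoped(z, a, x, y, n)
      implicit none
      integer :: i, n
      real :: z(n), a, x(n), y
    !$omp parallel do shared(z, a, x, y, n) private(i)
      do i = 1, n
         z(i) = a * x(i) + y
      end do
    !$omp end parallel do
    end subroutine saxpy_scoped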

  • Slide 14/22

    Division of Work

    subroutine saxpy(z, a, x, y, n)
    integer i, n
    real z(n), a, x(n), y
    !$omp parallel do
    do i = 1, n
       z(i) = a * x(i) + y
    end do
    return
    end

    [Figure: with n = 40 and 4 threads, the iterations are divided as i = 1-10, 11-20, 21-30, and 31-40; z, x, a, y, and n live in global shared memory, while each thread's copy of i is in local memory]
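
    One way to see this division of iterations is to print the thread number for each iteration; a sketch matching the slide's n = 40 and 4 threads (static scheduling assumed, which gives the contiguous chunks shown above):

    ! Sketch: report which thread executes which iterations.
    program division_of_work
      use omp_lib
      implicit none
      integer, parameter :: n = 40
      integer :: i
      call omp_set_num_threads(4)
    !$omp parallel do schedule(static)
      do i = 1, n
         print *, 'iteration', i, 'done by thread', omp_get_thread_num()
      end do
    end program division_of_work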

  • Slide 15/22

    Variable Scoping

    The most difficult part of shared memory parallelization:
    what memory is shared,
    what memory is private (i.e. each processor has its own copy),
    how private memory is treated vis-a-vis the global address space.

    Variables are shared by default, except for the loop index in a parallel do.

    This must mesh with the Fortran view of memory:
    Global: shared by all routines.
    Local: local to a given routine; saved vs. non-saved variables (through the SAVE statement or -save option).
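
    Because scoping mistakes are the most common source of bugs, one common practice (a sketch, not from the slides) is to use default(none) so that every variable referenced in the region must be scoped explicitly; the compiler then rejects anything left unlisted:

    ! Sketch: default(none) forces explicit scoping of every variable.
    subroutine scale(z, a, n)
      implicit none
      integer :: i, n
      real :: z(n), a
    !$omp parallel do default(none) shared(z, a, n) private(i)
      do i = 1, n
         z(i) = a * z(i)
      end do
    !$omp end parallel do
    end subroutine scale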

  • Slide 16/22

    Static vs. Automatic Variables

    The Fortran 77 standard allows subprogram local variables to become undefined between calls, unless saved with a SAVE statement.

             STATIC      AUTOMATIC
    AIX      (default)   -qnosave
    IRIX     -static     -automatic (default)
    SunOS    (default)   -stackvar
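
    The connection to threading: a saved (static) local variable is a single storage location shared by every thread that calls the routine, while an automatic local is private to each invocation. A sketch of the hazard (hypothetical routine, not from the slides):

    ! Sketch: a saved local is one shared location, so concurrent calls from
    ! different threads race on it; an automatic local would be safe.
    subroutine accumulate(x, total)
      implicit none
      real :: x, total
      real, save :: running = 0.0   ! static: one copy shared by all threads
      running = running + x         ! race condition if called from a parallel region
      total = running
    end subroutine accumulate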

  • Slide 17/22

    OpenMP Directives in Fortran

    Line continuation:

    Fixed form:
    !$OMP PARALLEL DO
    !$OMP&PRIVATE (JMAX)
    !$OMP&SHARED(A, B)

    Free form:
    !$OMP PARALLEL DO &
    !$OMP PRIVATE (JMAX) &
    !$OMP SHARED(A, B)

  • Slide 18/22

  • Slide 19/22

    OpenMP Overhead

    Overhead for parallelization is large (e.g. 8000 cycles for a parallel do over 16 processors of an SGI Origin 2000).

    The size of the parallel work construct must be significant enough to overcome the overhead.

    Rule of thumb: it takes 10 kFLOPS to amortize the overhead.
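
    One way to respect this rule of thumb is the directive's if clause (a sketch; the threshold of 10000 iterations is an arbitrary assumption): the loop runs in parallel only when the trip count is large enough to pay for the fork/join cost, and serially otherwise.

    ! Sketch: only parallelize when n is large enough to amortize the overhead.
    subroutine saxpy_if(z, a, x, y, n)
      implicit none
      integer :: i, n
      real :: z(n), a, x(n), y
    !$omp parallel do if(n > 10000)
      do i = 1, n
         z(i) = a * x(i) + y
      end do
    !$omp end parallel do
    end subroutine saxpy_if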

  • Slide 20/22

    OpenMP Use

    How is OpenMP typically used? OpenMP is usually used to parallelize loops:
    Find your most time-consuming loops.
    Split them up between threads.

    Better scaling can be obtained using OpenMP parallel regions, but this can be tricky! (See the sketch below.)
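
    The distinction: loop-level parallelism forks and joins at every loop, while a single parallel region enclosing several work-shared loops forks only once. A sketch of the second style (the loops are hypothetical placeholders):

    ! Sketch: one parallel region enclosing two work-shared loops
    ! (one fork/join instead of two).
    subroutine two_loops(a, b, n)
      implicit none
      integer :: i, n
      real :: a(n), b(n)
    !$omp parallel
    !$omp do
      do i = 1, n
         a(i) = real(i)
      end do
    !$omp end do
    !$omp do
      do i = 1, n
         b(i) = 2.0 * a(i)
      end do
    !$omp end do
    !$omp end parallel
    end subroutine two_loops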

  • Slide 21/22

    OpenMP vs. MPI

    OpenMP:
    Only for shared memory computers
    Easy to incrementally parallelize
    More difficult to write highly scalable programs
    Small API based on compiler directives and limited library routines
    Same program can be used for sequential and parallel execution
    Shared vs. private variables can cause confusion

    MPI:
    Portable to all platforms
    Parallelize all or nothing
    Vast collection of library routines
    Possible but difficult to use the same program for serial and parallel execution
    Variables are local to each processor

  • Slide 22/22

    References

    Parallel Programming in OpenMP, by Chandra et al. (Morgan Kaufmann)

    www.openmp.org

    Multimedia tutorial at Boston University: scv.bu.edu/SCV/Tutorials/OpenMP/

    Lawrence Livermore online tutorial: www.llnl.gov/computing/training/

    European Workshop on OpenMP (EWOMP): www.epcc.ed.ac.uk/ewomp2000/