Parallel Programming on the SGI Origin2000

DESCRIPTION

Parallel Programming on the SGI Origin2000. Taub Computer Center, Technion. Anne Weill-Zrahia, with thanks to Moshe Goldberg (TCC) and Igor Zacharov (SGI). March 2005. Topics include: Parallelization Concepts, SGI Computer Design, Efficient Scalar Design.

TRANSCRIPT
[Slide 1]

Parallel Programming on the SGI Origin2000

Anne Weill-Zrahia
Taub Computer Center, Technion

With thanks to Moshe Goldberg (TCC) and Igor Zacharov (SGI)

March 2005
[Slide 2]

Parallel Programming on the SGI Origin2000

1) Parallelization Concepts
2) SGI Computer Design
3) Efficient Scalar Design
4) Parallel Programming - OpenMP
5) Parallel Programming - MPI
[Slide 3]

4) Parallel Programming - OpenMP
[Slide 4]

[Diagram: a race condition on a joint bank account. Initial amount: IL 500. Limor in Haifa reads IL 500 and takes IL 150 (writes IL 350); Shimon in Tel Aviv reads IL 500 and takes IL 400 (writes IL 100). Depending on which write lands last, the final amount is IL 350 or IL 100. "Is this your joint bank account?"]
[Slide 5]

Introduction

- Parallelization instructions to the compiler:

      f77 -o prog -mp prog.f
  or:
      f77 -o prog -pfa prog.f

- Now we try to understand what a compiler has to determine when deciding how to parallelize.
- Note that when we talk loosely about parallelization, what is meant is: "Is the program, as presented here, parallelizable?"
- This is an important distinction, because sometimes rewriting can transform non-parallelizable code into a parallelizable form, as we will see.
[Slide 6]

Data dependency types

1) Iteration i depends on values calculated in the previous iteration i-1 (loop-carried dependence):

      do i=2,n
        a(i) = a(i-1)      <-- cannot be parallelized
      enddo

2) Data dependence within a single iteration (non-loop-carried dependence):

      do i=2,n
        c = ....
        a(i) = ... c ...   <-- parallelizable
      enddo

3) Reduction:

      do i=1,n
        s = s + x          <-- parallelizable
      enddo

All data dependencies in programs are variations on these fundamental types.
[Slide 7]

Data dependency analysis

Question: Are the following loops parallelizable?

      do i=2,n
        a(i) = b(i-1)
      enddo

YES!

      do i=2,n
        a(i) = a(i-1)
      enddo

NO!

Why?
[Slide 8]

Data dependency analysis

      do i=2,n
        a(i) = b(i-1)
      enddo

YES!

             cycle 1      cycle 2
      CPU1   A(2)=B(1)    A(5)=B(4)
      CPU2   A(3)=B(2)    A(6)=B(5)
      CPU3   A(4)=B(3)    A(7)=B(6)
[Slide 9]

Data dependency analysis

      do i=2,n
        a(i) = a(i-1)
      enddo

Scalar (non-parallel) run:

      CPU1   cycle 1: A(2)=A(1)
             cycle 2: A(3)=A(2)
             cycle 3: A(4)=A(3)
             cycle 4: A(5)=A(4)

In each cycle, NEW data from the previous cycle is read.
[Slide 10]

Data dependency analysis

      do i=2,n
        a(i) = a(i-1)
      enddo

NO!

             cycle 1
      CPU1   A(2)=A(1)
      CPU2   A(3)=A(2)
      CPU3   A(4)=A(3)

Each CPU will probably read OLD data.
[Slide 11]

Data dependency analysis

      do i=2,n
        a(i) = a(i-1)
      enddo

NO!

             cycle 1      cycle 2
      CPU1   A(2)=A(1)    A(5)=A(4)
      CPU2   A(3)=A(2)    A(6)=A(5)
      CPU3   A(4)=A(3)    A(7)=A(6)

Cycle 1 will probably read OLD data; cycle 2 may read NEW data.
[Slide 12]

Data dependency analysis

Another question: Are the following loops parallelizable?

      do i=3,n,2
        a(i) = a(i-1)
      enddo

YES!

      do i=1,n
        s = s + a(i)
      enddo

Depends!
[Slide 13]

Data dependency analysis

      do i=3,n,2
        a(i) = a(i-1)
      enddo

YES!

             cycle 1       cycle 2
      CPU1   A(3)=A(2)     A(9)=A(8)
      CPU2   A(5)=A(4)     A(11)=A(10)
      CPU3   A(7)=A(6)     A(13)=A(12)

With a stride of 2, the elements written (odd indices) are never the elements read (even indices).
[Slide 14]

Data dependency analysis

      do i=1,n
        s = s + a(i)
      enddo

Depends!

             cycle 1       cycle 2
      CPU1   S=S+A(1)      S=S+A(4)
      CPU2   S=S+A(2)      S=S+A(5)
      CPU3   S=S+A(3)      S=S+A(6)

- The value of S will be undetermined, and typically it will vary from one run to the next.
- This bug in parallel programming is called a "race condition".
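This particular race can be removed with OpenMP's reduction clause, which is covered in the clauses section later in this talk. The following is a minimal sketch, assuming a simple real array a of length n: each thread accumulates a private copy of s, and the private copies are combined at the end, so the result no longer depends on thread timing.

      program sumred
c     Sum reduction without a race: reduction(+:s) gives each thread
c     a private partial sum and combines the partial sums at the end.
      integer n, i
      parameter (n=100)
      real a(n), s
      do i = 1, n
         a(i) = real(i)
      enddo
      s = 0.0
c$omp parallel do private(i) reduction(+:s)
      do i = 1, n
         s = s + a(i)
      enddo
c$omp end parallel do
      print *, 'sum = ', s
      end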
[Slide 15]

Data dependency analysis

What is the principle involved here?

The examples shown fall into two categories:

1) Data being read is independent of data that is written:

      a(i) = b(i-1)    i=2,3,4...
      a(i) = a(i-1)    i=3,5,7...

2) Data being read depends on data that is written:

      a(i) = a(i-1)    i=2,3,4...
      s = s + a(i)     i=1,2,3...
[Slide 16]

Data dependency analysis

Here is a typical situation:

Is there a data dependency in the following loop?

      do i = 1,n
        a(i) = sin(x(i))
        result = a(i) + b(i)
        c(i) = result * c(i)
      enddo

No!

Clearly, "result" is a temporary variable that is reassigned in every iteration.

Note: "result" must be a "private" variable (this will be discussed later), as sketched below.
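As a preview, a minimal sketch of that loop with OpenMP; the private clause is covered in the clauses section, and the array names and size here are illustrative.

      program privtmp
c     "result" is private: each thread gets its own temporary,
c     so iterations no longer interfere through it.
      integer n, i
      parameter (n=8)
      real a(n), b(n), c(n), x(n), result
      do i = 1, n
         x(i) = 0.1 * i
         b(i) = 1.0
         c(i) = 2.0
      enddo
c$omp parallel do private(i,result)
      do i = 1, n
         a(i) = sin(x(i))
         result = a(i) + b(i)
         c(i) = result * c(i)
      enddo
c$omp end parallel do
      print *, c(1), c(n)
      end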
[Slide 17]

Data dependency analysis

Here is a (slightly different) typical situation:

Is there a data dependency in the following loop?

      do i = 1,n
        a(i) = sin(result)
        result = a(i) + b(i)
        c(i) = result * c(i)
      enddo

Yes!

The value of "result" is carried over from one iteration to the next.

This is the classical read/write situation, but now it is somewhat hidden.
[Slide 18]

Data dependency analysis

The loop could (symbolically) be rewritten:

      do i = 1,n
        a(i) = sin(result(i-1))
        result(i) = a(i) + b(i)
        c(i) = result(i) * c(i)
      enddo

Now substitute the expression for a(i):

      do i = 1,n
        a(i) = sin(result(i-1))
        result(i) = sin(result(i-1)) + b(i)
        c(i) = result(i) * c(i)
      enddo

This is really of the type "a(i) = a(i-1)"!
[Slide 19]

Data dependency analysis

One more: Can the following loop be parallelized?

      do i = 3,n
        a(i) = a(i-2)
      enddo

If this is parallelized, there will probably be different answers from one run to another.

Why?
[Slide 20]

Data dependency analysis

      do i = 3,n
        a(i) = a(i-2)
      enddo

             cycle 1      cycle 2
      CPU1   A(3)=A(1)    A(5)=A(3)
      CPU2   A(4)=A(2)    A(6)=A(4)

With 2 CPUs, this looks like it will be safe.
[Slide 21]

Data dependency analysis

      do i = 3,n
        a(i) = a(i-2)
      enddo

HOWEVER: what if there are 3 CPUs and not 2?

             cycle 1
      CPU1   A(3)=A(1)
      CPU2   A(4)=A(2)
      CPU3   A(5)=A(3)

In this case, a(3) is read and written in two threads at once.
[Slide 22]

RISC memory levels

[Diagram: a single CPU with its cache, connected to main memory]
[Slide 24]

RISC memory levels

[Diagram: multiple CPUs - CPU 0 and CPU 1, each with its own cache (Cache 0 and Cache 1) - both connected to the same main memory]
[Slide 27]

Definition of OpenMP

- Application Program Interface (API) for shared-memory parallel programming
- Directive-based approach with library support
- Targets existing applications and widely used languages:
  * Fortran API first released October 1997
  * C, C++ API first released October 1998
- Multi-vendor/platform support
[Slide 28]

Why was OpenMP developed?

- Parallel programming before OpenMP:
  * Standards existed for distributed memory (MPI and PVM)
  * No standard for shared-memory programming
- Vendors had different directive-based APIs for SMP:
  * SGI, Cray, Kuck & Assoc, DEC
  * Vendor proprietary, similar but not the same
  * Most were targeted at loop-level parallelism
- Commercial users and high-end software vendors have a big investment in existing codes
- End result: users wanting portability were forced to use MPI even for shared memory:
  * This sacrifices built-in SMP hardware benefits
  * Requires major effort
[Slide 29]

The Spread of OpenMP

Organization: Architecture Review Board
Web site: www.openmp.org

Hardware: HP/DEC, IBM, Intel, SGI, Sun

Software: Portland (PGI), NAG, Intel, Kuck & Assoc (KAI), Absoft
[Slide 30]

OpenMP Interface model

Directives and pragmas:
- Control structures
- Work sharing
- Data scope attributes:
  * private, firstprivate, lastprivate
  * shared
  * reduction

Runtime library routines:
- Control and query:
  * number of threads
  * nested parallel?
  * throughput mode
- Lock API

Environment variables (runtime environment):
- schedule type
- max number of threads
- nested parallelism
- throughput mode
[Slide 31]

OpenMP execution model

- An OpenMP program starts in a single thread, in sequential mode.
- To create additional threads, the user opens a parallel region:
  * additional slave threads are launched
  * the master thread is part of the team
  * the threads "disappear" at the end of the parallel region
- This model is repeated as needed.

[Diagram: a master thread runs sequentially, forks into parallel regions of 4, 2, and 3 threads in turn, and returns to the single master thread between regions]
[Slide 32]

Creating parallel threads

Fortran:

c$omp parallel [clause,clause]
      code to run in parallel
c$omp end parallel

C/C++:

#pragma omp parallel [clause,clause]
{
   code to run in parallel
}

Replicate execution:

      i = 0
c$omp parallel
      call foo(i,a,b)
c$omp end parallel
      print *, i

[Diagram: i=0 in the master thread; foo runs in each of the four team threads; print*,i back in the master thread]

Number of threads: set by a library call or an environment variable.
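A complete minimal program following this model (a sketch; the thread-count mechanisms appear on the following slides). Each thread in the team executes the body of the parallel region once:

      program hello
c     Every thread in the team runs the region; the master thread
c     is thread 0.
      integer iam, np
      integer omp_get_thread_num, omp_get_num_threads
c$omp parallel private(iam,np)
      iam = omp_get_thread_num()
      np  = omp_get_num_threads()
      print *, 'thread ', iam, ' of ', np
c$omp end parallel
      end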
[Slide 36]

OpenMP on the Origin 2000

Switches and formats:

      f77 -mp

Directive format (with continuation):

c$omp parallel do
c$omp+shared(a,b,c)

   OR

c$omp parallel do shared(a,b,c)

Conditional compilation:

c$    iam = omp_get_thread_num() + 1
[Slide 37]

OpenMP on the Origin 2000 - C

Switches and formats:

      cc -mp

#pragma omp parallel for \
        shared(a,b,c)

   OR

#pragma omp parallel for shared(a,b,c)
[Slide 38]

OpenMP on the Origin 2000

Parallel Do Directive

c$omp parallel do private(i)
      do i=1,n
        a(i) = i + 1
      enddo
c$omp end parallel do      <-- optional

Topics: Clauses, Detailed construct
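Assembled into a complete unit (a sketch), using the -mp switch shown on the earlier slides:

      program pardo
c     The iterations of the do loop are divided among the threads.
      integer n, i
      parameter (n=1000)
      integer a(n)
c$omp parallel do private(i)
      do i = 1, n
         a(i) = i + 1
      enddo
c$omp end parallel do
      print *, a(1), a(n)
      end

Compile with f77 -mp -o pardo pardo.f; the number of threads is set outside the program, as described on the environment-variables slide.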
[Slide 39]

OpenMP on the Origin 2000

Parallel Do Directive - Clauses

shared
private
default(private|shared|none)
firstprivate
lastprivate
reduction({operator|intrinsic}:var)
schedule(type[,chunk])
if(scalar_logical_expression)
ordered
copyin(var)
![Page 40: Parallel Programming on the SGI Origin2000](https://reader036.vdocument.in/reader036/viewer/2022062309/568148ee550346895db60be7/html5/thumbnails/40.jpg)
S S
Single thread Parallel region Single thread
S = shared variableP = private variable
Allocating private and shared variables
[Slide 41]

Clauses in OpenMP - 1

Clauses for the "parallel" directive specify data association rules and conditional computation.

shared (list) - data accessible by all threads; all threads refer to the same storage

private (list) - data private to each thread: a new storage location is created with that name for each thread, and the contents of that storage are not available outside the parallel region

default (private | shared | none) - default association for variables not otherwise mentioned

firstprivate (list) - same as private(list), but the contents are given an initial value from the variable with the same name outside the parallel region

lastprivate (list) - available only for work-sharing constructs: a shared variable with that name is set to the last computed value of a thread's private variable in the work-sharing construct (see the sketch below)
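A small sketch contrasting the two clauses (the variable names are illustrative): t enters each thread initialized from the outside value, and last leaves the loop with the value from the sequentially final iteration.

      program fplp
      integer i, t, last
      t = 10
c$omp parallel do firstprivate(t) lastprivate(last)
      do i = 1, 4
c        each thread's private copy of t starts at 10 (firstprivate)
         last = i + t
      enddo
c$omp end parallel do
c     lastprivate: after the loop, last holds the i=4 value (14)
      print *, last
      end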
[Slide 42]

Clauses in OpenMP - 2

reduction ({op|intrinsic}:list)
- variables in the list are named scalars of intrinsic type
- a private copy of each variable is made in each thread and initialized according to the intended operation
- at the end of the parallel region, or at another synchronization point, all private copies are combined
- the operation must be of one of the forms:
      x = x op expr
      x = intrinsic(x,expr)
      if (x .LT. expr) x = expr
      x++; x--; ++x; --x;
  where expr does not contain x

C operators and initial values:

      Op        Init
      + or -    0
      *         1
      &         ~0
      |         0
      ^         0
      &&        1
      ||        0

Fortran operators/intrinsics and initial values:

      Op/intrinsic   Init
      + or -         0
      *              1
      .AND.          .TRUE.
      .OR.           .FALSE.
      .EQV.          .TRUE.
      .NEQV.         .FALSE.
      MAX            smallest number
      MIN            largest number
      IAND           all bits on
      IOR, IEOR      0

Example:

c$omp parallel do reduction(+:a,y) reduction(.OR.:s)
[Slide 43]

Clauses in OpenMP - 3

copyin (list) - the list must contain common block (or global) names that have been declared threadprivate; data in the master thread's copy of that common block is copied to the thread-private storage at the beginning of the parallel region. There is no "copyout" clause: data in a private common block is not available outside of that thread.

if (scalar_logical_expression) - when an "if" clause is present, the enclosed code block is executed in parallel only if the scalar_logical_expression is .TRUE.

ordered - only for do/for work-sharing constructs: the code in the ORDERED block is executed in the same sequence as in sequential execution.

schedule (kind[,chunk]) - only for do/for work-sharing constructs: specifies the scheduling discipline for loop iterations.

nowait - the end of a work-sharing construct and the SINGLE directive imply a synchronization point unless "nowait" is specified.
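A sketch of copyin with a threadprivate common block; the block name /mydata/ and its variable are illustrative.

      program cpyin
c     /mydata/ is threadprivate: each thread has its own copy.
      integer ival
      common /mydata/ ival
c$omp threadprivate(/mydata/)
      integer omp_get_thread_num
      ival = 7
c$omp parallel copyin(/mydata/)
c     copyin initializes every thread's copy from the master's
c     value (7); later changes stay within each thread
      ival = ival + omp_get_thread_num()
c$omp end parallel
      print *, 'master copy = ', ival
      end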
[Slide 44]

OpenMP on the Origin 2000

Parallel Sections Directive

c$omp parallel sections private(i)
c$omp section
      block1
c$omp section
      block2
c$omp end parallel sections

Topics: Clauses, Detailed construct
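With the two blocks made concrete, a runnable sketch (print statements stand in for block1 and block2); each section is executed once, by some thread of the team:

      program psect
      integer omp_get_thread_num
c$omp parallel sections
c$omp section
      print *, 'block1 on thread ', omp_get_thread_num()
c$omp section
      print *, 'block2 on thread ', omp_get_thread_num()
c$omp end parallel sections
      end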
[Slide 45]

OpenMP on the Origin 2000

Parallel Sections Directive - Clauses

shared
private
default(private|shared|none)
firstprivate
lastprivate
reduction({operator|intrinsic}:var)
if(scalar_logical_expression)
copyin(var)
[Slide 46]

OpenMP on the Origin 2000

Defining a Parallel Region - Individual Do Loops

c$omp parallel shared(a,b)
c$omp do private(j)
      do j=1,n
        a(j) = j
      enddo
c$omp end do nowait
c$omp do private(k)
      do k=1,n
        b(k) = k
      enddo
c$omp end do
c$omp end parallel
[Slide 47]

OpenMP on the Origin 2000

Defining a Parallel Region - Explicit Sections

c$omp parallel shared(a,b)
c$omp section
      block1
c$omp single
      block2
c$omp section
      block3
c$omp end parallel
[Slide 48]

OpenMP on the Origin 2000

Synchronization Constructs

master / end master
critical / end critical
barrier
atomic
flush
ordered / end ordered
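A brief sketch combining three of these constructs (the shared counter is illustrative):

      program sync
c     critical: one thread at a time updates the shared counter.
c     barrier:  wait until all threads have contributed.
c     master:   only the master thread prints.
      integer count
      count = 0
c$omp parallel
c$omp critical
      count = count + 1
c$omp end critical
c$omp barrier
c$omp master
      print *, 'count = ', count
c$omp end master
c$omp end parallel
      end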
[Slide 49]

OpenMP on the Origin 2000

Run-Time Library Routines

Execution environment:

omp_set_num_threads
omp_get_num_threads
omp_get_max_threads
omp_get_thread_num
omp_get_num_procs
omp_in_parallel
omp_set_dynamic / omp_get_dynamic
omp_set_nested / omp_get_nested
[Slide 50]

OpenMP on the Origin 2000

Run-Time Library Routines

Lock routines:

omp_init_lock
omp_destroy_lock
omp_set_lock
omp_unset_lock
omp_test_lock
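A sketch of the lock routines protecting a shared counter. The lock variable must be able to hold an address; integer*8 is assumed here (newer compilers provide omp_lock_kind through an omp_lib module instead):

      program lockex
      integer*8 lck
      integer count
      count = 0
      call omp_init_lock(lck)
c$omp parallel shared(count,lck)
c     only the thread holding the lock may touch count
      call omp_set_lock(lck)
      count = count + 1
      call omp_unset_lock(lck)
c$omp end parallel
      call omp_destroy_lock(lck)
      print *, 'count = ', count
      end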
[Slide 51]

OpenMP on the Origin 2000

Environment Variables

OMP_NUM_THREADS (or MP_SET_NUMTHREADS)
OMP_DYNAMIC
OMP_NESTED
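Typical usage from the shell before starting the program (csh syntax, as was common on SGI systems; Bourne-type shells use export instead):

setenv OMP_NUM_THREADS 4      # run with 4 threads
setenv OMP_DYNAMIC FALSE      # keep the thread count fixed
setenv OMP_NESTED FALSE       # do not create nested parallel regions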
[Slide 52]

Exercise 5 - OpenMP to parallelize a loop
[Slides 53-56: the exercise program shown as images; visible labels: "initial values", "main loop"]

[Slide 57]
Enhancing Performance

- Ensuring sufficient work: running a loop in parallel adds runtime costs.
- Scheduling loops for load balancing.
[Slide 58]

The SCHEDULE clause

SCHEDULE (TYPE[,CHUNK])

Static: each thread is assigned its chunks of iterations before the loop runs, either of the given CHUNK size or equally sized.

Dynamic: at runtime, chunks are assigned to threads dynamically; each thread takes the next chunk when it finishes its current one (see the sketch below).
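A sketch of both types on the same loop (array and sizes illustrative): static hands out fixed blocks of 10 iterations in advance; dynamic lets each thread grab the next block of 10 when it finishes one, which balances uneven work.

      program sched
      integer n, i
      parameter (n=100)
      real a(n)
c     static: iteration blocks are assigned to threads in advance
c$omp parallel do schedule(static,10) private(i)
      do i = 1, n
         a(i) = real(i)
      enddo
c$omp end parallel do
c     dynamic: blocks are handed out at runtime, as threads free up
c$omp parallel do schedule(dynamic,10) private(i)
      do i = 1, n
         a(i) = a(i) * 2.0
      enddo
c$omp end parallel do
      print *, a(1), a(n)
      end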
[Slide 59]

OpenMP summary

- A small number of compiler directives to set up parallel execution of code, plus a runtime library with locking functions.
- Portable directives (supported by different vendors in the same way).
- Parallelization is for the SMP programming model: the machine should have a global address space.
- The number of execution threads is controlled outside the program.
- A correct OpenMP program should not depend on the exact number of execution threads, nor on the scheduling mechanism for work distribution.
- In addition, a correct OpenMP program should be (weakly) serially equivalent: the results of the computation should match the sequential program to within rounding accuracy.
- On SGI, OpenMP programming can be mixed with the MPI library, so it is possible to have "hierarchical parallelism":
  * OpenMP parallelism within a single node (global address space)
  * MPI parallelism between nodes in a cluster (network connection)