TRANSCRIPT - 8/6/2019

Basic Concepts in Parallelization

RvdP/V1 Tutorial, IWOMP 2010
CCS, University of Tsukuba
Tsukuba, Japan, June 14-16, 2010

Ruud van der Pas
Senior Staff Engineer, Oracle Solaris Studio, Oracle
Menlo Park, CA, USA
Outline
Introduction
Parallel Architectures
Parallel Programming Models
Data Races
Summary
Introduction
Why Parallelization?

Parallelization is another optimization technique. The goal is to reduce the execution time. To this end, multiple processors, or cores, are used.

[Figure: the same work executed on 1 core versus 4 cores. Using 4 cores, the execution time is 1/4 of the single-core time.]
What Is Parallelization ?
"Something" is parallel if there is a certain levelof independence in the order of operations
A sequence of machine instructions
A collection of program statements
An algorithmThe problem you're trying to solve
gr
an
ul
ari
ty
In other words, it doesn't matter in what orderthose operations are performed
What Is a Thread?

Loosely said, a thread consists of a series of instructions with its own program counter (PC) and state. A parallel program executes threads in parallel; these threads are then scheduled onto processors.

[Figure: four instruction streams, Thread 0 through Thread 3, each with its own program counter, scheduled onto the available processors.]
Parallel Overhead

The total CPU time often exceeds the serial CPU time:

- The newly introduced parallel portions in your program need to be executed
- Threads need time sending data to each other and synchronizing (communication)
  - Often the key contributor, spoiling all the fun
- Typically, things also get worse when increasing the number of threads

Efficient parallelization is about minimizing the communication overhead.
Communication

[Figure: wallclock time for serial execution compared with parallel execution on 4 cores, with and without communication.]

Parallel - without communication:
- Embarrassingly parallel: 4x faster
- Wallclock time is 1/4 of the serial wallclock time

Parallel - with communication:
- The additional communication consumes additional resources
- Less than 4x faster
- Wallclock time is more than 1/4 of the serial wallclock time
- Total CPU time increases
Load Balancing

[Figure: timelines for perfect load balancing versus load imbalance, where threads 1, 2 and 3 sit idle while one thread keeps working.]

Perfect load balancing:
- All threads finish in the same amount of time
- No thread is idle

Load imbalance:
- Different threads need a different amount of time to finish their task
- Some threads are idle while others still work
- Total wallclock time increases
- Program does not scale well
About Scalability

Define the speed-up S(P) as

   S(P) := T(1)/T(P)

The efficiency E(P) is defined as

   E(P) := S(P)/P

In the ideal case, S(P) = P and E(P) = P/P = 1 = 100%.

Unless the application is embarrassingly parallel, S(P) eventually starts to deviate from the ideal curve. Past this point, Popt, the application sees less and less benefit from adding processors.

[Figure: S(P) versus P, with the measured speed-up curve flattening away from the ideal line beyond Popt.]

Note that both metrics give no information on the actual run-time. As such, they can be dangerous to use.

In some cases, S(P) exceeds P. This is called "superlinear" behaviour. Don't count on this to happen though.
Amdahl's Law

This "law" describes the effect the non-parallelizable part of a program has on scalability. Note that the additional overhead caused by parallelization, and speed-up because of cache effects, are not taken into account.

Assume our program has a parallel fraction f. This implies for the execution time:

   T(1) := f*T(1) + (1-f)*T(1)

On P processors:

   T(P) = (f/P)*T(1) + (1-f)*T(1)

Amdahl's law:

   S(P) = T(1)/T(P) = 1 / (f/P + 1 - f)

Comments:
It is easy to scale on a small number of processors. Scalable performance, however, requires a high degree of parallelization, i.e. f very close to 1. This implies that you need to parallelize that part of the code where the majority of the time is spent.

[Figure: speed-up versus number of processors (1-16) for f = 1.00, 0.99, 0.75, 0.50, 0.25 and 0.00; only the curves with f close to 1 stay near the ideal line.]
Amdahl's Law in Practice

We can estimate the parallel fraction f. Recall:

   T(P) = (f/P)*T(1) + (1-f)*T(1)

It is trivial to solve this equation for f:

   f = (1 - T(P)/T(1)) / (1 - (1/P))

Example:

   T(1) = 100 and T(4) = 37  =>  S(4) = T(1)/T(4) = 2.70
   f = (1 - 37/100)/(1 - (1/4)) = 0.63/0.75 = 0.84 = 84%

Estimated performance on 8 processors is then:

   T(8) = (0.84/8)*100 + (1-0.84)*100 = 26.5
   S(8) = T(1)/T(8) = 3.78
Threads Are Getting Cheap .....
[Figure: elapsed time and speed-up for f = 84% on 1-16 cores. Using 8 cores: 100/26.5 = 3.8x faster.]
Numerical Results

Consider: A = B + C + D + E

Serial processing:

   A = B + C
   A = A + D
   A = A + E

Parallel processing:

   Thread 0          Thread 1
   T1 = B + C        T2 = D + E
   T1 = T1 + T2
   A  = T1

The roundoff behaviour is different, and so the numerical results may be different too. This is natural for parallel programs, but it may be hard to differentiate it from an ordinary bug ....
Cache Coherence
Typical Cache Based System

[Figure: a CPU connected to memory through an L1 and an L2 cache; the question mark asks what happens when caches hold copies of the same data.]
Cache Line Modifications

[Figure: a page in main memory and a cache line copied into two caches; modifying one copy creates a cache line inconsistency.]
Caches in a Parallel Computer

A cache line starts in memory. Over time, multiple copies of this line may exist.

Cache Coherence ("cc"):
- Tracks changes in copies
- Makes sure the correct cache line is used
- Different implementations possible
- Needs hardware support to make it efficient

[Figure: CPU 0 through CPU 3 with their caches and memory; stale copies of a modified line are invalidated.]
Parallel Architectures
Uniform Memory Access (UMA)

[Figure: multiple (possibly multicore) CPUs, each with a cache, connected to shared memory and I/O through a cache coherent interconnect.]

- Also called "SMP" (Symmetric MultiProcessor)
- Memory access time is uniform for all CPUs
- CPU can be multicore
- Interconnect is "cc":
  - Bus
  - Crossbar
- No fragmentation - memory and I/O are shared resources

Pro:
- Easy to use and to administer
- Efficient use of resources

Con:
- Said to be expensive
- Said to be non-scalable
NUMA

[Figure: nodes, each with a CPU, cache, local memory (M) and I/O, connected by a non cache coherent interconnect.]

- Also called "Distributed Memory" or NORMA (No Remote Memory Access)
- Memory access time is non-uniform; hence the name "NUMA"
- Interconnect is not "cc":
  - Ethernet, InfiniBand, etc, ......
- Runs 'N' copies of the OS
- Memory and I/O are distributed resources

Pro:
- Said to be cheap
- Said to be scalable

Con:
- Difficult to use and administer
- Inefficient use of resources
The Hybrid Architecture

[Figure: two SMP/multicore nodes connected by a second-level interconnect.]

- Second-level interconnect is not cache coherent
  - Ethernet, InfiniBand, etc, ....
- Hybrid architecture with all pros and cons:
  - UMA within one SMP/multicore node
  - NUMA across nodes
cc-NUMA

[Figure: SMP nodes joined by a cache coherent second-level interconnect.]

- Two-level interconnect:
  - UMA/SMP within one system
  - NUMA between the systems
- Both interconnects support cache coherence, i.e. the system is fully cache coherent
- Has all the advantages ('look and feel') of an SMP
- Downside is the non-uniform memory access time
Parallel Programming Models
How To Program A Parallel Computer?

There are numerous parallel programming models. The ones most well-known are:

- Distributed memory (runs on any cluster and/or a single SMP):
  - Sockets (standardized, low level)
  - PVM - Parallel Virtual Machine (obsolete)
  - MPI - Message Passing Interface (de-facto std)
- Shared memory (SMP only):
  - POSIX Threads (standardized, low level)
  - OpenMP (de-facto standard)
  - Automatic Parallelization (compiler does it for you)
Parallel Programming Models: Distributed Memory - MPI
What is MPI?

- MPI stands for the Message Passing Interface
- MPI is a very extensive de-facto parallel programming API for distributed memory systems (i.e. a cluster)
- An MPI program can however also be executed on a shared memory system
- First specification: 1994
- The current MPI-2 specification was released in 1997
- Major enhancements:
  - Remote memory management
  - Parallel I/O
  - Dynamic process management
More about MPI

- MPI has its own data types (e.g. MPI_INT)
  - User defined data types are supported as well
- MPI supports C, C++ and Fortran
  - Include file mpi.h in C/C++ and mpif.h in Fortran
- An MPI environment typically consists of at least:
  - A library implementing the API
  - A compiler and linker that support the library
  - A run time environment to launch an MPI program
- Various implementations available:
  - HPC ClusterTools, MPICH, MVAPICH, LAM, Voltaire MPI, Scali MPI, HP-MPI, .....
The MPI Execution Model

[Figure: all processes start, call MPI_Init, communicate with each other during execution, call MPI_Finalize, and all processes finish.]
The MPI Memory Model

[Figure: processes, each with private memory only, exchanging data through an explicit transfer mechanism.]

- All threads/processes have access to their own, private, memory only
- Data transfer and most synchronization has to be programmed explicitly
- All data is private
- Data is shared explicitly by exchanging buffers
Example - Send N Integers

   #include <mpi.h>                        /* include file               */

   you = 0; him = 1;

   MPI_Init(&argc, &argv);                 /* initialize MPI environment */
   MPI_Comm_rank(MPI_COMM_WORLD, &me);     /* get process ID             */

   if ( me == 0 ) {                        /* process 0 sends            */
      error_code = MPI_Send(&data_buffer, N, MPI_INT,
                            him, 1957, MPI_COMM_WORLD);
   } else if ( me == 1 ) {                 /* process 1 receives         */
      error_code = MPI_Recv(&data_buffer, N, MPI_INT,
                            you, 1957, MPI_COMM_WORLD,
                            MPI_STATUS_IGNORE);
   }

   MPI_Finalize();                         /* leave the MPI environment  */
Run time Behavior

[Figure: process 0 (me = 0) executes MPI_Send with N integers, destination 1 and label 1957; process 1 (me = 1) executes the matching MPI_Recv with N integers, sender 0 and label 1957. The message envelopes match: "Yes! Connection established".]
Parallel Programming Models: Shared Memory - Automatic Parallelization
Parallel Programming Models: Shared Memory - OpenMP
http://www.openmp.org

[Figure: threads 0 through P, each with its own memory, all attached to a single shared memory.]
Shared Memory Model

[Figure: threads, each with some private data, all accessing one globally shared memory.]

- All threads have access to the same, globally shared, memory
- Data can be shared or private
- Shared data is accessible by all threads
- Private data can only be accessed by the thread that owns it
- Data transfer is transparent to the programmer
- Synchronization takes place, but it is mostly implicit
About data

In a shared memory parallel program, variables have a "label" attached to them:

- Labelled "Private": visible to one thread only
  - A change made to local data is not seen by others
  - Example - local variables in a function that is executed in parallel
- Labelled "Shared": visible to all threads
  - A change made to global data is seen by all others
  - Example - global data
Example - Matrix times vector

   for (i=0; i<m; i++)
   {
      sum = 0.0;
      for (j=0; j<n; j++)
         sum += b[i][j]*c[j];
      a[i] = sum;
   }

With two threads, the iterations of the i-loop are split: thread 0 (TID = 0) executes i = 0,1,2,3,4 while thread 1 (TID = 1) executes i = 5,6,7,8,9:

   Thread 0 (i = 0):  sum = b[i=0][j]*c[j];  a[0] = sum
   Thread 1 (i = 5):  sum = b[i=5][j]*c[j];  a[5] = sum
   Thread 0 (i = 1):  sum = b[i=1][j]*c[j];  a[1] = sum
   Thread 1 (i = 6):  sum = b[i=6][j]*c[j];  a[6] = sum
   ... etc ...
A Black and White Comparison

   MPI                                    | OpenMP
   ---------------------------------------|--------------------------------------
   De-facto standard                      | De-facto standard
   Endorsed by all key players            | Endorsed by all key players
   Runs on any number of (cheap) systems  | Limited to one (SMP) system
   Grid ready                             | Not (yet?) grid ready
   High and steep learning curve          | Easier to get started (but, ...)
   You're on your own                     | Assistance from compiler
   All or nothing model                   | Mix and match model
   No data scoping (shared, private, ..)  | Requires data scoping
   More widely used (but ....)            | Increasingly popular (CMT !)
   Sequential version is not preserved    | Preserves sequential code
   Requires a library only                | Needs a compiler
   Requires a run-time environment        | No special environment
   Easier to understand performance       | Performance issues implicit
The Hybrid Parallel Programming Model
The Hybrid Programming Model

[Figure: two shared memory systems, each with threads 0 through P and their memories; within each system the model is shared memory, between the systems it is distributed memory.]
Data Races
About Parallelism

Parallelism means independence: there is no fixed ordering of the operations. "Something" that does not obey this rule is not parallel (at that level ...).
Shared Memory Programming

Threads communicate via shared memory.

[Figure: threads with private data reading and writing shared variables X and Y in the shared memory.]
What is a Data Race?
A data race occurs when two different threads in a multi-threaded shared memory program:

- Access the same (= shared) memory location
- Asynchronously, and
- Without holding any common exclusive locks, and
- At least one of the accesses is a write/store
Example of a Data Race
Consider this fragment, executed by multiple threads:

   #pragma omp parallel shared(n)
      {n = omp_get_thread_num();}

[Figure: both threads write (W) the shared variable n; the final value depends on which write happens last.]
Another Example
   #pragma omp parallel shared(x)
      {x = x + 1;}

[Figure: both threads read and write (R/W) the shared variable x; the two read-modify-write sequences can interleave, so an update can be lost.]
About Data Races
- Loosely described, a data race means that the update of a shared variable is not well protected
- A data race tends to show up in a nasty way:
  - Numerical results are (somewhat) different from run to run
  - Especially with floating-point data, difficult to distinguish from a numerical side-effect
  - Changing the number of threads can cause the problem to seemingly (dis)appear
  - May also depend on the load on the system
  - May only show up using many threads
Not a Parallel Loop
   for (i=0; i<n-1; i++)
      a[i] = a[i+1] + b[i];

Each iteration reads a[i+1], the value the next iteration overwrites: a loop-carried dependence, so the iterations cannot simply be executed in parallel.
- We manually parallelized the previous loop
- The compiler detects the data dependence and does not parallelize the loop
- Vectors a and b are of type integer
- We use the checksum of a as a measure for correctness:

     checksum += a[i]  for i = 0, 1, 2, ...., n-2

- The correct, sequential, checksum result is computed as a reference
- We ran the program using 1, 2, 4, 32 and 48 threads
- Each of these experiments was repeated 4 times
Numerical Results
   threads:  1   checksums 1953, 1953, 1953, 1953   correct value 1953
   threads:  2   checksums 1953, 1953, 1953, 1953   correct value 1953
   threads:  4   checksums 1905, 1905, 1953, 1937   correct value 1953
   threads: 32   checksums 1525, 1473, 1489, 1513   correct value 1953
   threads: 48   checksums  936, 1007,  887,  822   correct value 1953
Data Race In Action !
Summary
Parallelism Is Everywhere
Multiple levels of parallelism:

   Granularity        | Technology   | Programming Model
   -------------------|--------------|----------------------
   Instruction Level  | Superscalar  | Compiler
   Chip Level         | Multicore    | Compiler, OpenMP, MPI
   System Level       | SMP/cc-NUMA  | Compiler, OpenMP, MPI
   Grid Level         | Cluster      | MPI

Threads Are Getting Cheap