Intel Parallel Programming (Using OpenMP & MPI)
TRANSCRIPT
Dharmendra Savaliya, Subham Yadav, Bhaumik Patel, Darshak Shah
How do we keep processors always busy?
The fork/join model of OpenMP.
What is MPI?
STUDY PROBLEM, SEQUENTIAL PROGRAM
LOOK FOR OPPORTUNITIES FOR PARALLELISM
TRY TO KEEP ALL PROCESSORS BUSY DOING USEFUL WORK
DOMAIN DECOMPOSITION
TASK DECOMPOSITION
Dependence Graph
Domain decomposition: first, decide how data elements should be divided among processors; second, decide which tasks each processor should be doing.
Intel White paper : www.intel.com/pressroom/archive/reference/
Task decomposition: first, divide tasks among processors; second, decide which data elements will be accessed (read and/or written) by which processors.
Pipeline decomposition: a special kind of task decomposition.
1. for (i = 0; i < 3; i++) a[i] = b[i] / 2.0;    /* no dependences: all iterations can run in parallel */
2. for (i = 1; i < 4; i++) a[i] = a[i-1] * b[i]; /* loop-carried dependence: a[i] needs a[i-1], so iterations must run in order */
3. a = f(x, y, z); b = g(w, x); t = a + b; c = h(z); s = t / c;
   /* dependence graph: a, b, and c can be computed in parallel; t needs a and b; s needs t and c */
OpenMP is an API for parallel programming
First developed by the OpenMP Architecture Review Board in 1997 as a standard designed for shared-memory multiprocessors
A set of compiler directives, library functions, and environment variables, not a language in itself
Can be used with C, C++, or Fortran
Based on the fork/join model of threads
Strengths
Well-suited for domain decomposition
Available on Unix and Windows
Weaknesses
Not well-suited for task decomposition
Race conditions can arise from data dependences (see the sketch after this list)
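As a minimal sketch of how a dependence becomes a shared-memory race (the variable names are illustrative, not from the slides): both loops below sum the same numbers, but the first updates a shared variable without synchronization, while the second uses a reduction clause to make it safe.

#include <stdio.h>
#include <omp.h>

int main()
{
    int racy = 0, safe = 0;

    /* RACE: all threads read-modify-write the shared variable
       racy without synchronization, so updates can be lost. */
    #pragma omp parallel for
    for (int i = 1; i <= 1000; i++)
        racy += i;

    /* FIX: reduction gives each thread a private copy and
       combines the copies safely at the join. */
    #pragma omp parallel for reduction(+:safe)
    for (int i = 1; i <= 1000; i++)
        safe += i;

    printf("racy = %d, safe = %d (expected 500500)\n", racy, safe);
    return 0;
}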
Implementing Domain Decomposition
The compiler directive
#pragma omp parallel for
tells the compiler that the for loop that immediately follows can be executed in parallel, provided that (see the sketch after this list):
The number of loop iterations must be computable at run time, before the loop executes
The loop must not contain a break, return, or exit
The loop must not contain a goto to a label outside the loop
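A minimal sketch of the directive in use (the function and array names are illustrative, not from the slides); it parallelizes the dependence-free loop from example 1 above:

#include <omp.h>

/* Halve every element of b into a. Iterations are independent,
   the trip count n is known before the loop starts, and there is
   no break, return, exit, or goto, so the loop satisfies the
   restrictions above. */
void halve(double *a, const double *b, int n)
{
    #pragma omp parallel for
    for (int i = 0; i < n; i++)
        a[i] = b[i] / 2.0;
}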
The Hello World OpenMP program:

#include <stdio.h>
#include <omp.h>

int main()
{
    #pragma omp parallel
    printf("Hello World\n");
    return 0;
}

Since fork/join is a source of overhead, we want to maximize the amount of work done between each fork and join.
[Fork/join diagram: the master thread forks worker threads W0, W1, W2 and joins them again]
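As a usage note: with GCC this compiles with gcc -fopenmp hello.c (other compilers enable OpenMP with different flags), and every thread in the parallel team prints the line once.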
Message Passing Interface
Used on distributed-memory MIMD architectures
MPI specifies the API for message passing (communication-related routines)
Distributed Memory
Processes have only local memory and must use some other mechanism (e.g., message passing) to exchange information.
Advantage: programmers have explicit control over data distribution and communication
How do we make different processes do different things (MIMD functionality)?
A process needs to know its execution environment: it can usually decide what to do based on the number of processes in this job and its own process id.
How many processes are working on this problem? MPI_Comm_size
What is my id? MPI_Comm_rank
Rank is with respect to a communicator (the context of the communication). MPI_COMM_WORLD is a predefined communicator that includes all processes (already mapped to processors).
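A minimal sketch of rank-based branching, the usual way to get MIMD behavior out of a single program (the master/worker split shown is illustrative):

#include "mpi.h"
#include <stdio.h>

int main(int argc, char *argv[])
{
    int size, rank;
    MPI_Init(&argc, &argv);
    MPI_Comm_size(MPI_COMM_WORLD, &size);   /* how many processes in this job */
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);   /* my id within MPI_COMM_WORLD */

    if (rank == 0)
        printf("I am the master; %d processes are working\n", size);
    else
        printf("I am worker %d of %d\n", rank, size);

    MPI_Finalize();
    return 0;
}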
#include "mpi.h"
#include <stdio.h>
int main( int argc, char *argv[] )
{
MPI_Init( &argc, &argv );
printf( "Hello world\n" );
MPI_Finalize();
return 0;
}
• mpi.h contains MPI definitions and types
• An MPI program must start with MPI_Init
• An MPI program must exit with MPI_Finalize
• MPI functions are just library routines that can be used on top of regular C or C++
MPI_Send(start, count, datatype, dest, tag, comm)
MPI_Recv(start, count, datatype, source, tag, comm, status)
The simple MPI (six functions that make most programs work; a sketch using all six follows this list):
MPI_INIT
MPI_FINALIZE
MPI_COMM_SIZE
MPI_COMM_RANK
MPI_SEND
MPI_RECV
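A minimal sketch that uses all six (the payload value is illustrative): process 0 sends one integer to process 1, which prints it. Run it with at least two processes, e.g. mpirun -np 2 (the launcher name can vary between MPI implementations).

#include "mpi.h"
#include <stdio.h>

int main(int argc, char *argv[])
{
    int size, rank, value;
    MPI_Status status;

    MPI_Init(&argc, &argv);
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    if (size < 2) {                /* need both a sender and a receiver */
        MPI_Finalize();
        return 0;
    }
    if (rank == 0) {
        value = 42;                /* illustrative payload */
        MPI_Send(&value, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);
    } else if (rank == 1) {
        MPI_Recv(&value, 1, MPI_INT, 0, 0, MPI_COMM_WORLD, &status);
        printf("process 1 received %d from process 0\n", value);
    }

    MPI_Finalize();
    return 0;
}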
MPI                              OpenMP
Distributed-memory model         Shared-memory model
Runs on a distributed network    Runs on multi-core processors
Message based                    Directive based
Flexible and expressive          Easier to program and debug
OpenMP and MPI can be used together in one C program, but in this case the performance is lower than with MPI alone, so it is better to use OpenMP or MPI separately.
http://www.intel-software-academic-program.com/pages/courses
Any questions?