Copyright 1999, © Amy Apon, Ph.D.
Shared Memory Parallel Programming
Cluster Computing Short Course
Presentation One
Introduction

The commodity computers available today have capabilities that were previously only available on large mainframes and expensive workstations.

However, new programming techniques may be required in order to be able to take advantage of these capabilities.
Introduction, continued

A series of presentations will cover
• shared memory programming using the pthread library
• distributed memory programming using TCP sockets and MPI
• performance and architectural issues
Shared Memory Programming Outline

• Process state, process creation (30 min)
• Basics of threads (30 min)
• Thread synchronization (60 min)
• Symmetric multiprocessors (45 min)
The Process State Diagram

• A process is a program in execution!

[State diagram: a process enters, is scheduled onto the CPU, may return to the ready state on a context switch, blocks on an I/O request and becomes ready again on I/O completion, and exits when finished.]
Process state, process creation
fork() Creates Unix Processes

main()
{ int pid;
  ...
  if ((pid = fork()) != 0) { /* parent is here */
  }
  else { /* child is here */
  }
}

Note the parentheses around the assignment: without them, pid = fork() != 0 would assign only the comparison result (0 or 1) to pid.

Parent executes first. Child is an exact copy except for process ID! Child gets control at fork().
Process Creation in Unix

main()
{ int pid;
  ...
  if ((pid = fork()) != 0) { /* parent is here */
  }
  else { /* child is here */
  }
}

In the parent, the returned value of the fork is the (new) child's process ID. In the child, the returned value is 0. execve() is used to overlay the child with a new program.
A Typical Unix Process Tree

$ pstree
init-+-crond
     |-httpd---5*[httpd]
     |-inetd---in.telnetd---login---tcsh-+-pstree
     |-login---bash
     |-lpd
     |-qmail-send-+-qmail-clean
     |            |-qmail-lspawn
     |            |-qmail-rspawn
Process Status
$ ps
PID TTY TIME CMD
24897 pts/0 00:00:00 tcsh
24937 pts/0 00:00:00 ps
$ ps aux | more (to see all processes)
$ top (to see busy processes)
$ man command (to get more information)
Processes

• Don't share memory by default
  – can do this with shmat(), but high overhead!
• Have an entry in the process table
• Generally have high overhead for creation and for a context switch
• For more info, see Operating Systems, by Silberschatz and Galvin
Basics of Threads

• Threads are a fundamental tool for shared memory programming!
Basics of threads
pthread Library

• POSIX standard thread library
• Include in C, C++ programs
• Portable across all Unix platforms
• Some similarities and some differences with Windows threads
• Fully compatible with Java, Solaris threads
Thread Creation

#include <pthread.h>
#include <stdio.h>

void * hello (void * parm) {
   printf("Hello World! My parameter is %d\n", (int) parm);
   pthread_exit(0);
}

main() {
   pthread_t tid;
   pthread_create( &tid,          /* address of thread ID */
                   NULL,          /* thread attribute */
                   hello,         /* function to execute */
                   (void *) 1 );  /* single (address) parameter only! */
   pthread_join(tid, NULL);
}
To Execute a Thread Program

• Compile in Unix
$ gcc -o simple simple.c -lpthread
• Execute
$ simple (or ./simple if "." not on path)
A Thread is a "Lightweight Process"

• Executes and context switches like a process
• Has its own ID and program counter
• Shares code, global variables, open file pointers with creating process and threads
• Has its own local variables, stack space
A Thread is a "Lightweight Process"

• In most implementations, when a thread blocks, other threads do not block.
• This allows one thread to do I/O (wait) while another thread computes.
• Multiple threads can execute concurrently on a computer with more than one processor
Shared Global Variables

int globalvar = 0;
pthread_t tid[3];

void * ChangeVar (void * parm) {
   globalvar++;
   printf("I changed globalvar to %d", globalvar);
}

main() {
   int i;
   for( i=0; i<3; i++)
      pthread_create( &tid[i], NULL, ChangeVar, NULL );
   for( i=0; i<3; i++)
      pthread_join( tid[i], NULL);
}

THIS CODE HAS AN ERROR! The value printed by the printf might be different than the one computed by globalvar++.
A Context Switch Can Occur Anytime!

globalvar++;
printf("I changed globalvar to %d", globalvar);

Thread1 and Thread2 both execute this code. A context switch between the increment and the printf causes the printf's access to get the wrong value.
Race Condition

• A race condition occurs whenever the outcome of the program depends on which thread modifies a shared memory location first (i.e., "wins the race")
• A piece of code that accesses a shared memory location is called a critical section.
• Synchronization is required so that the shared access happens in mutual exclusion
Synchronization
Synchronization is Needed

Example 1: Producer/Consumer Problem
• producers place items into a shared buffer, consumers remove items from the buffer
• producers must not write into a full buffer
• consumers must not remove the same item

This occurs with network printer queues.
Synchronization is Needed

Example 2: Reader/Writer Problem
• Writers update a shared data item, readers read the item
• Writers must write in mutual exclusion; any number of readers can read at a time

Occurs with distributed database systems.
Synchronization is Needed

Example 3: Barrier Synchronization
• All threads must come to a common stopping place (STOP) before any can proceed
Thread Synchronization Tools

• mutex variables
• semaphores
• condition variables

Only access to shared variables must be controlled. Local variables in a thread are in private memory.
Mutex Variables

• Works like the service station key!
• One thread has the "key" at a time.
• Don't forget to give the key back when finished!

pthread_mutex_t mutex;
Mutex Example

pthread_mutex_t mut;
int globalvar = 0;

void * ChangeGlobalVar (void * parm) {
   int localvar;
   pthread_mutex_lock(&mut);      /* lock mutex before access */
   localvar = ++globalvar;
   pthread_mutex_unlock(&mut);    /* unlock mutex after access */
   printf("I changed globalvar to %d", localvar);
}

main() {
   pthread_mutex_init(&mut,NULL);
   /* create threads here */
Semaphores

#include <semaphore.h>
sem_t semA, semB;

Two primary operations on semaphores:
• sem_post(&semA);
• sem_wait(&semB);
Semaphores

• Count open service positions, like at a bank

sem_wait:   /* if a position is not open, wait */
   while (sem <= 0) wait;
   sem--;   /* I take the open position */

sem_post:
   sem++;   /* when I leave, a position is open */
Using Semaphores for Resource Allocation

main() {
   sem_init(&res_sem, 0, 5);   /* pshared, value (quantity of this resource) */
   ...
   sem_wait(&res_sem);         /* request resource */
   /* use resource here */
   sem_post(&res_sem);         /* release resource */
Using Semaphores for Barrier Synchronization

main() {
   sem_init(&semA, 0, 0);   /* initial value is 0 */
   sem_init(&semB, 0, 0);

Thread A:            Thread B:
   sem_post(&semB);     sem_post(&semA);
   sem_wait(&semA);     sem_wait(&semB);
Condition Variables

• Based on "Monitors", by C. A. R. Hoare
• Allow threads to wait for a resource to become available
• Always used with a mutex

pthread_mutex_t mutex;
pthread_cond_t notempty, notfull;
Condition Variables

• Give a thread waiting for a resource the first opportunity to use the mutex when the resource becomes available

[Timeline diagram: Thread A locks the mutex, enters the critical section, and waits on the condition, releasing the mutex. Thread B locks the mutex, releases the resource, and unlocks the mutex. Thread A then obtains the resource inside the critical section, unlocks the mutex, and uses the resource outside of a critical section.]
Using Condition Variables for Producer/Consumer

Producer:
   /* produce item in local buffer */
   pthread_mutex_lock(&mut);
   while (/* buffer is full */)
      pthread_cond_wait(&notfull, &mut);
   /* put an item in buffer */
   pthread_cond_signal(&notempty);
   pthread_mutex_unlock(&mut);

Consumer:
   pthread_mutex_lock(&mut);
   while (/* buffer is empty */)
      pthread_cond_wait(&notempty, &mut);
   /* get an item from buffer */
   pthread_cond_signal(&notfull);
   pthread_mutex_unlock(&mut);
   /* consume item */
Summary of Shared Memory Programming

We have covered:
• Process state, process creation
• Basics of pthreads
• Thread synchronization using mutex, semaphores, and condition variables
Useful pthread Information

• Getting Started With POSIX Threads, by Tom Wagner and Don Towsley
  http://centaurus.cs.umass.edu/~wagner/threads_html/tutorial.html
• On-line thread tutorial from Sun
  http://www.sun.com/sunworldonline/swol-02-1996/swol-02-threads.html
• Programming With Posix Threads, by David R. Butenhof (Addison-Wesley Professional Computing Series)
What is an SMP? (Symmetric Multiprocessor)

• A computer with more than one CPU
• Disk subsystem, network, main memory, I/O devices, ... are all equally accessible to all processors
• However, each processor has its own private cache
Symmetric Multiprocessor
Cache Memory

• Very fast memory, close to CPU

[Diagram: processor -> cache -> system bus -> main memory]
Cache Memory

• Makes main memory appear faster (on the average)
• If an item is in the cache, get it
• Otherwise, there is a cache miss and the item must be retrieved from memory
• Cache is 20 times (or more!) faster than memory
Cache Memory

• Works because of
  – spatial locality (next item to be accessed is likely to be close by)
  – temporal locality (this item is likely to be accessed again soon)

Think of the way we program, using loops, array access . . .
Cache Memory

• Is organized in cache lines
• A miss loads a cache line from memory

[Diagram: in a direct-mapped cache, any one of a set of memory "lines" can be loaded into one particular cache location.]
Cache Memory

• If a line in the cache needs to be replaced, then it must be copied back to memory

The location in memory is wrong until the cache line is copied back!
SMP's and Cache Memory

• SMP cache and memory can be wrong! -- the cache coherence problem

[Diagram: Proc 0's cache holds A=7, the new (correct) value; Proc 1's cache and main memory, connected by the system bus, still hold A=5, the old (incorrect) values!]
Snoopy Bus

• Most common solution to the cache coherence problem on SMP's
• The bus watches all reads and writes
• A cache miss causes the bus to broadcast a request for the newest value
• A write sends an invalidate message
Snoopy Bus

• Is a very busy bus!
• Works well for two (four?) processors, but is a classic "Von Neumann bottleneck"
• Does not perform well as the number of processors in the SMP gets larger!
Symmetric Multiprocessor

• Composed of two or more (symmetric) processors and one of every other subsystem
• Each processor has a private cache
• Snoopy bus is the most common solution to the cache coherence problem
Symmetric Multiprocessors

• Can degrade in performance as the number of processes increases
• Performance depends on the amount of data that is shared in the application!
Symmetric Multiprocessors

• For more information, see:
• In Search of Clusters: The ongoing battle in lowly parallel computing, Second Edition, by Gregory F. Pfister, Prentice Hall Publishing Company, 1998
• Or a book on computer architecture
Presentation Two: Distributed Memory Programming

• Distributed memory processing
• TCP client/server examples
• How MPI works over TCP
• Programming in MPI
• MPI set up, further information