parallel programming course explicit threading - intel programming course explicit threading paul...

71
Parallel Programming Course Explicit Threading Paul Guermonprez www.Intel-Software-Academic-Program.com [email protected] Intel Software 2012-03-14

Upload: buihanh

Post on 27-May-2018

237 views

Category:

Documents


3 download

TRANSCRIPT

Parallel Programming CourseExplicit Threading

Paul Guermonprezwww.Intel-Software-Academic-Program.com

[email protected] Software

2012-03-14

2

Programming with Explicit Threads

Identify tasks for threading

Identify computational model

•What work is to be done?

•What algorithm to use?

Identify how data will be accessed

•What is global? What is local?

•How is data to be assigned to threads?

3

Coding Considerations

Create function to encapsulate computation

•May be a function that already exists• If so, driver function may be needed to coordinate multiple threads

• Pthreads allows single parameter to thread function• C structure for multiple arguments

4

More Coding Considerations

Recast parameter to local variable if needed

•May be structure of several parameters

Add code to determine task of thread

•May need access to global variable

Add code to access data for task• Add code to protect global variables

• Shared data access may need to be restricted

Add code to synchronize thread executions

5

Task Allocation Methods

Static• Initial assignment sets work•Computations or data are divided equally• based on number of threads and thread ID

Dynamic•Useful if tasks require unequal or unknown execution time•Work given to threads as computation proceeds

6

What is Pthreads?

POSIX.1c standard

C language interface

Threads exist within same process

All threads are peers

•No explicit parent-child model

• Exception: “main thread” holds process information

7

Using Pthreads

Identify portions of code to thread

Encapsulate code into function

• If code is already a function, a driver function may need to be written to coordinate work of multiple threads

Add pthread_create() call to assign thread(s) to execute function

8

pthread_create

pthread_t *tidHandle of created thread

const pthread_attr_t *attr• attributes of thread to be created :•detach state, stack size, stack address

•void *(*function)(void *)• function to be mapped to thread

•void *arg• single argument to function

int pthread_create(tid, attr, function, arg);

9

pthread_create Explained

Spawn a thread running the function

Thread handle returned via pthread_t structure

• Specify NULL to use default attributes

Single argument sent to function

• If no arguments to function, specify NULL

Check error codes!• EAGAIN - insufficient resources to create thread• EINVAL - invalid attribute

10

Example: Thread Creation#include <stdio.h>#include <pthread.h>

void *hello () { printf("Hello Thread\n"); }

main() { pthread_t tid; pthread_create(&tid, NULL, hello, NULL);}

Main thread is process

When process ends, all threads end too.

This example may work, but we need some method of waiting for a thread to finish

11

Waiting for POSIX* Threads

Not a good idea!

// wasted cycles!// wasted cycles!

#include <stdio.h>#include <pthread.h>int threadDone = 0;

void *hello () { printf("Hello Thread\n"); threadDone = 1;}

main() { pthread_t tid; pthread_create(&tid, NULL, hello, NULL); while (!threadDone); }

12

Waiting for a Thread

pthread_t tid

• handle of joinable thread

void **val_ptr

• exit value returned by joined thread

int pthread_join(tid, val_ptr);

13

pthread_join Explained

Calling thread waits for thread with handle tid to terminate

•Only one thread can be joined

• Thread must be joinable

Exit value is returned from joined thread

• Type returned is (void *)

•Use NULL if no return value expected

Check Error Code :● ESRCH - thread (pthread_t) not found

● EINVAL - thread (pthread_t) not joinable

14

Thread States

Pthreads threads have two states

• joinable and detached

Threads are joinable by default

•Resources are kept until pthread_join

•Can be reset with attributes or API call

Detached threads cannot be joined• Resources can be reclaimed at termination• Cannot reset to be joinable

pthread_join detaches the thread automaticallyd

Threads can only be joined once.

15

Example: Multiple Threads

#include <stdio.h>#include <pthread.h>#define NUM_THREADS 4

void *hello () { printf("Hello Thread\n");

} main() {

pthread_t tid[NUM_THREADS];for (int i = 0; i < NUM_THREADS; i++)

pthread_create(&tid[i], NULL, hello, NULL);

for (int i = 0; i < NUM_THREADS; i++)pthread_join(tid[i], NULL);

}

There must be one call for each thread needed to be “joined” after termination

16

LAB #1: “HelloThreads”Modify the previous example code to print out

• Appropriate “Hello Thread” message

• Unique thread number • Use for-loop variable of pthread_create() loop

Example output:

Hello from Thread #0

Hello from Thread #1

Hello from Thread #2

Hello from Thread #3

17

What’s wrong?

What is printed for myNum?

void *threadFunc(void *pArg) { int* p = (int*)pArg; int myNum = *p; printf( "Thread number %d\n", myNum);}. . .// from main():for (int i = 0; i < numThreads; i++) { pthread_create(&tid[i], NULL, threadFunc, &i);}

18

Hello Threads Timeline

Time main Thread 0 Thread 1

T0 i = 0 --- ----

T1 create(&i) --- ---

T2 i++ (i == 1) launch ---

T3 create(&i) p = pArg ---

T4 i++ (i == 2) myNum = *p

myNum = 2

launch

T5 wait print(2) p = pArg

T6 wait exit myNum = *p

myNum = 2

19

Race Conditions

Concurrent access of same variable by multiple threads

•Read/Write conflict

•Write/Write conflict

Most common error in concurrent programs

May not be apparent at all times

20

How to Avoid Data Races

Scope variables to be local to threads

• Variables declared within threaded functions

• Allocate on thread’s stack

• TLS (Thread Local Storage)

Control shared access with critical regions

•Mutual exclusion and synchronization

• Lock, semaphore, condition variable, critical section, mutex…

21

Solution – “Local” Storage

void *threadFunc(void *pArg) { int myNum = *((int*)pArg); printf( "Thread number %d\n", myNum);}. . .

// from main():for (int i = 0; i < numThreads; i++) { tNum[i] = i; pthread_create(&tid[i], NULL, threadFunc, &tNum[i]);}

22

Pthreads* Mutex

Simple, flexible, and efficient

Enables correct programming structures for avoiding race conditions

New types

• pthread_mutex_t• the mutex variable

• pthread_mutexattr_t• mutex attributes

Before use, mutex must be initialized

Mutex can only be “held” by one thread at a time.

23

pthread_mutex_init

pthread_mutex_t *mutex

• mutex to be initialized

const pthread_mutexattr_t *attr

• attributes to be given to mutex

• Error Codes :• ENOMEM - insufficient memory for mutex• EAGAIN - insufficient resources (other than memory)• EPERM - no privilege to perform operation

int pthread_mutex_init( mutex, attr );

24

Alternate Initialization

Can also use the static initializer

PTHREAD_MUTEX_INITIALIZER

•Uses default attributes

Programmer must always pay attention to mutex scope

•Must be visible to threads

pthread_mutex_t mtx1 = PTHREAD_MUTEX_INITIALIZER;

25

pthread_mutex_lock

pthread_mutex_t *mutex

mutex to attempt to lock

int pthread_mutex_lock( mutex );

26

pthread_mutex_lock Explained

Attempts to lock mutex

• If mutex is locked by another thread, calling thread is blocked

Mutex is held by calling thread until unlocked

•Mutex lock/unlock must be paired or deadlock occurs

Check error code : ● EINVAL - mutex is invalid● EDEADLK - calling thread already owns mutex

27

pthread_mutex_unlock

pthread_mutex_t *mutex• mutex to be unlocked

Error Code :• EINVAL - mutex is invalid• EPERM - calling thread does not own mutex

int pthread_mutex_unlock( mutex );

28

Example: Use of mutex#define NUMTHREADS 4pthread_mutex_t gMutex; // why does this have to be global?int g_sum = 0;

void *threadFunc(void *arg) { int mySum = bigComputation(); pthread_mutex_lock( &gMutex ); g_sum += mySum; // threads access one at a time pthread_mutex_unlock( &gMutex );}

main() { pthread_t hThread[NUMTHREADS];

pthread_mutex_init( &gMutex, NULL ); for (int i = 0; i < NUMTHREADS; i++) pthread_create(&hThread[i],NULL,threadFunc,NULL);

for (int i = 0; i < NUMTHREADS; i++) pthread_join(hThread[i]);}

29

4 . 0

2 . 0

1 . 00 . 0 X

Numerical Integration Example

∫ 4 . 0

( 1 + x 2 ) d x = π0

1

static long num_steps=100000; double step, pi;

void main(){ int i; double x;

step = 1.0/(double) num_steps; for (i=0; i< num_steps; i++){ x = (i+0.5)*step; pi += 4.0/(1.0 + x*x); } pi *= step; printf("Pi = %f\n",pi);}

30

Lab 2: Computing Pi

Parallelize the numerical integration code using Win32* Threads

How can the loop iterations be divided among the threads?

What variables can be local?

What variables need to be visible to all threads?

31

Semaphores

Synchronization object that keeps a count•Represent the number of available resources

• Formalized by Edsgar Dijkstra

Two operation on semaphores•Wait [P(s)]: Thread waits until s > 0, then s = s-1

• Post [V(s)]: s = s + 1

32

POSIX Semaphores

Not part of the POSIX Threads specification

•Defined in POSIX.1b

Check for system support of semaphores before use

• Is _POSIX_SEMAPORES defined in <unistd.h>?

• If available, threads within a process can use semaphores

•Use header file <semaphore.h>

New type

• sem_t• the semaphore variable

33

sem_init

sem_t *sem

• counting semaphore to be initialized

int pshared

• if non-zero, semaphore can be shared across processes

unsigned int value

• initial value of semaphore

int sem_init( sem, pshared, value );

34

sem_init Explained

Initializes the semaphore object

If pshared is zero, semaphore can only be used by threads within the calling process

• If non-zero, semaphore can be used between processes

The value parameter sets the initial semaphore count value

Check error code :● EINVAL – sem is not valid semaphore● EPERM – process lacks approriate privelege● ENOSYS – semaphores not supported

35

sem_wait

sem_t *sem

• counting semaphore to decrement or wait

int sem_wait( sem );

36

sem_wait Explained

If semaphore count is greater than zero

•Decrement count by one (1)

• Proceed with code following

Else, if semaphore count is zeroThread blocks until value is greater than zero

Check error code :● EINVAL – sem is not valid semaphore● EDEADLK – deadlock condition was detected● ENOSYS – semaphores not supported

37

sem_post

sem_t *sem

• counting semaphore to be incremented

int sem_post( sem );

38

sem_post Explained

Post a wakeup to semaphore

If one or more threads are waiting, release one

•Otherwise, increment semaphore count by one (1)

Check error code :● EINVAL – sem is not valid semaphore● ENOSYS – semaphores not supported

39

Semaphore Uses

Control access to limited size data structures

•Queues, stacks, deques

•Use count to enumerate available elements

Throttle number of active threads within a region

Binary semaphore [0,1] can act as mutex

40

Semaphore Cautions

No ownership of semaphore

Any thread can release a semaphore, not just the last thread to wait

•Use good programming practice to avoid

No concept of abandoned semaphore

• If thread terminates before post, semaphore increment may be lost

•Deadlock

41

Example: Semaphore as Mutex

Main thread opens input file, waits for thread termination

Threads will

•Read line from input file

•Count all five letter words in line

42

Example: Mainsem_t hSem1, hSem2;FILE *fd; int fiveLetterCount = 0;

main(){ pthread_t hThread[NUMTHREADS];

sem_init (*hSem1, 0, 1); // Binary semaphore sem_init (*hSem2, 0, 1); // Binary semaphore

fd = fopen("InFile", "r"); // Open file for read

for (int i = 0; i < NUMTHREADS; i++) pthread_create (&hThread[i], NULL, CountFives, NULL); for (int i = 0; i < NUMTHREADS; i++) pthread_join (hThread[i], NULL);

fclose(fd);

printf("Number of five letter words is %d\n", fiveLetterCount);}

43

Example: Semaphores

void * CountFives(void *arg) { int bDone = 0 ; char inLine[132]; int lCount = 0;

while (!bDone) { sem_wait(*hSem1); // access to input bDone = (GetNextLine(fd, inLine) == EOF); sem_post(*hSem1); if (!bDone) if (lCount = GetFiveLetterWordCount(inLine)) { sem_wait(*hSem2); // update global fiveLetterCount += lCount; sem_post(*hsem2); } }}

44

Activity 3 – Using Semaphores

Use binary semaphores to control access to shared variables

45

Condition Variables

Semaphores are conditional on the semaphore count

Condition variable is associated with an arbitrary conditional • Same operations: wait and signal

Provides mutual exclusion• by having threads wait on the condition variable until

signaled.

46

Condition Variable and Mutex

Condition variables are always paired with a Mutex

• Protects evaluation of the conditional expression

• Prevents race condition between signaling thread and threads waiting on condition variable

47

Lost and Spurious Signals

Signal to condition variable is not saved

• If no thread waiting, signal is “lost”

• Thread can be deadlocked waiting for signal that will not be sent

Condition variable can (rarely) receive spurious signals

• Slowed execution from predictable signals

•Need to retest conditional expression

48

Condition Variable Algorithm

Avoids problems with lost and spurious signals

acquire mutex;while (conditional is true) wait on condition variable;perform critical region computation;update conditional;signal sleeping thread(s);release mutex;

Negation of condition needed to proceed

Mutex is automatically released when thread waits

May be optional

49

Condition Variablespthread_cond_init, pthread_cond_destroy

• initialize/destroy condition variable

pthread_cond_wait

• attempt to hold condition variable

pthread_cond_signal

• signal release of condition variable

pthread_cond_broadcast

• broadcast release of condition variable

50

Condition Variable Types

Data types used

• pthread_cond_t• the condition variable

• pthread_condattr_t• condition variable attributes

Before use, condition variable (and mutex) must be initialized

51

pthread_cond_init

pthread_cond_t *cond

• condition variable to be initialized

const pthread_condattr_t *attr

• attributes to be given to condition variable

•Return codes :• ENOMEM - insufficient memory for mutex

• EAGAIN - insufficient resources (other than memory)

• EBUSY - condition variable already intialized

• EINVAL - attr is invalid

int pthread_cond_init( cond, attr );

52

Alternate Initialization

Can also use the static initializer

PTHREAD_COND_INITIALIZER

•Uses default attributes

Programmer must always pay attention to condition (and mutex) scope

•Must be visible to threads

pthread_cond_t cond1 = PTHREAD_COND_INITIALIZER;

53

pthread_cond_wait

pthread_cond_t *cond• condition variable to wait on

pthread_mutex_t *mutex• mutex to be unlocked

int pthread_cond_wait( cond, mutex );

54

pthread_cond_wait Explained

Thread put to “sleep” waiting for signal on cond

Mutex is unlocked

• Allows other threads to acquire lock

•When signal arrives, mutex will be reacquired before pthread_cond_wait returns

Check error codes :• EINVAL - cond or mutex is invalid• EINVAL - different mutex for concurrent waits• EINVAL - calling thread does not own mutex

55

pthread_cond_signal

pthread_cond_t *cond• condition variable to be signaled

int pthread_cond_signal( cond );

56

pthread_cond_signal Explained

Signal condition variable, wake one waiting thread

If no threads waiting, no action taken

• Signal is not saved for future threads

Signaling thread need not have mutex

•May be more efficient

• Problem may occur if thread priorities used

Check error code :● EINVAL - cond is invalid

57

Example: Denominator

Two threads oversee a global variable

• Thread 1 calculates the value

• Thread 2 needs a non-zero value

A mutex controls access to the variable

Thread 1 signals thread 2 (waiting)

58

Example: Denominator

#include <pthread.h>

pthread_mutex_t denom_mtx = PTHREAD_MUTEX_INITIALIZER;pthread_cond_t denom_cond = PTHREAD_COND_INITIALIZER;float denominator = 0.0; /* global */

void *thread1() { pthread_mutex_lock( &denom_mtx );

denominator = f(); /* calculate denominator */ pthread_signal( &denom_cond ); /* signal waiting thread */

pthread_mutex_unlock( &denom_mtx );}

Thread 1 :

59

Example: Denominator

void *thread2() { float local_denom;

pthread_mutex_lock( &denom_mtx );

/* wait for non-zero denominator */ while( denominator == 0.0 ) pthread_cond_wait( &denom_cond, &denom_mtx ); local_denom = denominator;

pthread_mutex_unlock( &denom_mtx );

/* Use local copy of denominator for division */}

Thread 2 :

60

pthread_cond_broadcast

pthread_cond_t *cond• condition variable to signal

int pthread_cond_broadcast( cond );

61

pthread_cond_broadcast Explained

Wake all threads waiting on condition variable

If no threads waiting, no action taken

• Broadcast is not saved for future threads

Signaling thread need not have mutex

Check return code :● EINVAL - cond is invalid

62

Denominator: Many threadsAssume multiple threads created on Thread2()

What happens if…• all threads are waiting?• no threads are waiting?• only some threads are waiting?

• -> Everything works out just fine

void *thread1() { pthread_mutex_lock( &denom_mtx ); denominator = f(); pthread_cond_broadcast( &denom_cond ); /* wake all */ pthread_mutex_unlock( &denom_mtx );}

63

Activity 4 – Condition variables

Replace spin-wait and thread counting variable with condition variables to signal thread completion of computational piece

64

Some Advanced Functions

Thread-specific Data

Thread Attributes

Mutex Attributes

Condition Variable Attributes

Just a quick overview of functions

•Not too many details

65

Thread-specific Data

Another means for “local” storage

pthread_key_create

• Create thread-specific key for all threads

pthread_setspecific

• Associate thread-specific value with given key

pthread_getspecific

• Return current data value associated with key

pthread_key_delete

• Delete a data key

66

Thread Attribute Functionspthread_attr_init

• Initialize attribute object to default settings

pthread_attr_destroy

• Delete attribute object

pthread_{get|set}detachstate

• Return or set thread’s detach state

pthread_{get|set}stackaddr

• Return or set the stack address of thread

pthread_{get|set}stacksize

• Return or set the stack size of thread

67

Mutex Attribute Functionspthread_mutexattr_init

• Intialize mutex attribute object to defaults

pthread_mutexattr_destroy

• Delete mutex attribute object

pthread_mutexattr_{get|set}pshared

• Return or set whether mutex is shared between processes

68

Condition Attribute Functionspthread_condattr_init

• Initialize condition variable attribute object

pthread_condattr_destroy

• Destroy condition variable attribute object

pthread_condattr_{get|set}pshared

• Return or set whether condition variable is shared between processes

69

Summary

Create threads to execute work encapsulated within functions

Coordinate shared access between threads to avoid race conditions

• Local storage to avoid conflicts

• Synchronization objects to organize use

License Creative Commons - By 2.5

You are free:to Share — to copy, distribute and transmit the workto Remix — to adapt the workto make commercial use of the work

Under the following conditions:Attribution — You must attribute the work in the manner

specified by the author or licensor (but not in any way that suggests that they endorse you or your use of the work).

With the understanding that:Waiver — Any of the above conditions can be waived if you

get permission from the copyright holder.Public Domain — Where the work or any of its elements is in

the public domain under applicable law, that status is in no way affected by the license.

Other Rights — In no way are any of the following rights affected by the license:

o Your fair dealing or fair use rights, or other applicable copyright exceptions and limitations;

o The author's moral rights;o Rights other persons may have either in the work itself or in

how the work is used, such as publicity or privacy rights.Notice — For any reuse or distribution, you must make clear

to others the license terms of this work. The best way to do this is with a link to the web page http://creativecommons.org/licenses/by/2.5/