![Page 1: Thread Parallelism - Princeton University Computer ScienceJun 22, 2010 · USER THREADS AND KERNEL THREADS 7 User Thread Implemented in software library Transparent to the OS Will](https://reader033.vdocument.in/reader033/viewer/2022060313/5f0b42a87e708231d42fa315/html5/thumbnails/1.jpg)
THREAD PARALLELISM
Stephen Beard
LECTURE OUTLINE
Introduction to Threads
Correctness
Performance
INTRODUCTION TO THREADS
WHAT IS A THREAD?
THREADS VS. PROCESSES
Process
“Heavyweight”
Slower context switches
Expensive IPC
Independent
Secure
Protected memory space
Thread
“Lightweight”
Faster context switches
Direct communication
Share state and resources
Insecure
Shared memory space
(Generalities)
USER THREADS AND KERNEL THREADS
User Thread
Implemented in software library
Transparent to the OS (the kernel is unaware of them)
A blocking call in one thread will block the other threads
Library typically uses non-blocking calls then manages threads
Fast to create and manage
Do not benefit from multithreading or multiprocessing
Kernel Thread
Managed by OS
Will not block other threads
Slower to swap than user threads
THREAD IMPLEMENTATIONS
Many to One
One to One
Many to Many
WHY USE THREADS?
Interactive Programs – Avoid blocking!
Modern Hardware is designed for thread level parallelism (TLP)
Source: Tom Ball - PPCP-54454
HARDWARE FOR TLP
Chip Multi-Processors
GPUs
Clusters
Cloud Computing
Multithreading
MULTI-THREADING TERMS
Superscalar – ILP mechanism for performing multiple instructions concurrently (One CPU with multiple functional units)
Fine-Grained – Switch between threads on each cycle
Coarse-Grained – Switch between threads on 'costly' stalls (such as L2 cache miss)
Multiprocessing – Multi-core
Simultaneous – Multiple threads running concurrently on single processor
MULTITHREADING
Ex: Intel Pentium 4, Intel Itanium 2, Intel Hyper-Threading, Sun UltraSPARC, Intel Core 2 Duo
Source: Dr. Chris Lupo – CPE520 Advanced Computer Architecture Winter 2010
PTHREADS
PTHREADS (POSIX THREADS)
C library that provides
Thread management
Shared Memory
Locks
In Linux
One to One
Created using 'clone'
SIMPLE PTHREAD EXAMPLE
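The code from this slide did not survive extraction. As a stand-in, here is a minimal sketch of creating and joining threads with pthreads. The names `square`, `sum_of_squares`, and `NUM_THREADS` are illustrative assumptions, not the original example.

```c
#include <assert.h>
#include <pthread.h>

#define NUM_THREADS 4   /* hypothetical thread count for this sketch */

/* Thread body: squares the integer in the slot it was handed. */
static void *square(void *arg) {
    int *slot = arg;
    *slot = (*slot) * (*slot);
    return NULL;
}

/* Spawns NUM_THREADS threads, waits for all of them, and sums their results. */
int sum_of_squares(void) {
    pthread_t threads[NUM_THREADS];
    int values[NUM_THREADS];
    int sum = 0;

    for (int i = 0; i < NUM_THREADS; i++) {
        values[i] = i;  /* each thread gets its own slot, so nothing is shared */
        int rc = pthread_create(&threads[i], NULL, square, &values[i]);
        assert(rc == 0);
        (void)rc;
    }
    for (int i = 0; i < NUM_THREADS; i++) {
        int rc = pthread_join(threads[i], NULL);  /* wait before reading the slot */
        assert(rc == 0);
        (void)rc;
        sum += values[i];
    }
    return sum;
}
```

Because every thread writes only its own slot and the parent reads a slot only after `pthread_join`, this sketch needs no locks.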
METHODS OF THREAD COMMUNICATION
int gInt;
spawn t1, t2;
t1:
…
gInt = 5
…
t2:
…
…
int lInt = gInt
print lInt -> 5
…
Shared Memory - Memory that may be simultaneously accessed by multiple threads
METHODS OF THREAD COMMUNICATION
t1:
send 5
t2:
…
…
recv lInt
print lInt -> 5
Message Passing - Threads pass messages for data transfer and synchronization
THREAD CORRECTNESS
RACE CONDITIONS
Unsynchronized access to shared state from
multiple threads whose outcome depends upon
the order of access
r1.check, r2.check, r1.move, r2.move, CRASH
Source: Tom Ball - PPCP-54454
RACE CONDITION PROGRAM
SYNCHRONIZATION
Want to be able to control access to shared
memory
Several methods exist:
Mutex
Semaphore
Monitors
Barriers
NAIVELY FIXING OUR ROBOTS
Robot 1:
lock()
r1.check()
unlock()
...
…
lock()
r1.move()
unlock()
Robot 2:
lock()
…
…
r2.check()
unlock()
lock()
…
…
r2.move()
unlock()
CRASH: each call is locked on its own, so both checks can still complete before either move
ATOMICITY
A statement sequence S is atomic if S's effects appear to other threads as if S executed without interruption
FIXING OUR ROBOTS
Robot 1:
lock()
r1.check()
r1.move()
unlock()
Robot 2:
lock()
…
…
…
r2.check()
unlock()
MUTEX EXAMPLE
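The code from this slide is not in the extracted text. As a stand-in, here is a minimal sketch of a pthread mutex guarding a shared counter. The names `counter_lock`, `increment`, `count_with_mutex`, and the two constants are assumptions made for illustration.

```c
#include <assert.h>
#include <pthread.h>

#define WORKERS 4          /* hypothetical number of threads */
#define INCREMENTS 100000  /* hypothetical work per thread */

static long counter = 0;
static pthread_mutex_t counter_lock = PTHREAD_MUTEX_INITIALIZER;

static void *increment(void *unused) {
    (void)unused;
    for (int i = 0; i < INCREMENTS; i++) {
        pthread_mutex_lock(&counter_lock);
        counter++;  /* the read-modify-write happens while holding the lock */
        pthread_mutex_unlock(&counter_lock);
    }
    return NULL;
}

/* Runs WORKERS threads and returns the final counter value. */
long count_with_mutex(void) {
    pthread_t workers[WORKERS];
    counter = 0;
    for (int i = 0; i < WORKERS; i++) {
        int rc = pthread_create(&workers[i], NULL, increment, NULL);
        assert(rc == 0);
        (void)rc;
    }
    for (int i = 0; i < WORKERS; i++) {
        int rc = pthread_join(workers[i], NULL);
        assert(rc == 0);
        (void)rc;
    }
    return counter;
}
```

Without the lock and unlock calls, the increments from different threads could interleave and some updates would likely be lost.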
MUTEX IMPLEMENTATION - HARDWARE
Using XCHG on x86 to implement a mutex
XCHG exchanges two operands. If a memory operand is involved, BUS LOCK is asserted for the duration of the exchange.
; convention: [EBX] = 1 means unlocked, 0 means locked
LOCK: ; mutex pointer is in EBX; clobbers EAX
XOR EAX, EAX ; Set EAX to 0
XCHG EAX, [EBX] ; atomically store 0 and fetch the old value
AND EAX, EAX ; Test for 1 (sets ZF if the old value was 0)
JZ LOCK ; if we got a zero, spin-wait
RET
UNLOCK: ; mutex pointer is in EBX
MOV DWORD PTR [EBX], 1
RET
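The same exchange-based spin lock can be sketched portably with C11 atomics, where `atomic_exchange` plays the role of XCHG. The type name `spin_mutex` and the function names are hypothetical, and the sketch keeps the slide's convention that 1 means unlocked and 0 means locked.

```c
#include <assert.h>
#include <stdatomic.h>

typedef atomic_int spin_mutex;  /* hypothetical name; 1 = unlocked, 0 = locked */

void spin_init(spin_mutex *m) {
    atomic_init(m, 1);
}

/* Swaps in 0 once and reports whether the lock was free (old value 1). */
int spin_trylock(spin_mutex *m) {
    return atomic_exchange_explicit(m, 0, memory_order_acquire) == 1;
}

/* Keeps swapping in 0 until the old value shows the lock was free. */
void spin_lock(spin_mutex *m) {
    while (!spin_trylock(m)) {
        /* spin-wait */
    }
}

void spin_unlock(spin_mutex *m) {
    assert(atomic_load(m) == 0);  /* only a held lock should be released */
    atomic_store_explicit(m, 1, memory_order_release);
}
```

A production lock would usually yield or sleep after spinning for a while instead of spinning indefinitely.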
MUTEX IMPLEMENTATION - SOFTWARE
Peterson's Algorithm
Works for two processes, but can generalize
Does not work with out-of-order execution unless memory barriers are added
flag[0] = 0;
flag[1] = 0;
P0:
flag[0] = 1;
turn = 1;
while (flag[1] == 1 && turn == 1)
{
  // busy wait
}
// critical section
...
// end of critical section
flag[0] = 0;
P1:
flag[1] = 1;
turn = 0;
while (flag[0] == 1 && turn == 0)
{
  // busy wait
}
// critical section
...
// end of critical section
flag[1] = 0;
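One way to make Peterson's algorithm safe on hardware that reorders memory operations is to use sequentially consistent atomics. The sketch below does this with C11 `stdatomic.h`. The helper names (`peterson_lock`, `peterson_unlock`, `run_peterson`) and the iteration count are assumptions made for illustration.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>

static atomic_int flag[2];
static atomic_int turn;
static long shared_total;  /* plain variable, protected only by the lock below */

static void peterson_lock(int self) {
    int other = 1 - self;
    atomic_store(&flag[self], 1);  /* default seq_cst ordering forbids reordering */
    atomic_store(&turn, other);
    while (atomic_load(&flag[other]) == 1 && atomic_load(&turn) == other) {
        /* busy wait */
    }
}

static void peterson_unlock(int self) {
    atomic_store(&flag[self], 0);
}

static void *worker(void *arg) {
    int self = *(int *)arg;
    for (int i = 0; i < 50000; i++) {
        peterson_lock(self);
        shared_total++;  /* critical section */
        peterson_unlock(self);
    }
    return NULL;
}

/* Runs the two contending threads and returns the final total. */
long run_peterson(void) {
    pthread_t threads[2];
    int ids[2] = {0, 1};
    shared_total = 0;
    atomic_store(&flag[0], 0);
    atomic_store(&flag[1], 0);
    for (int i = 0; i < 2; i++) {
        int rc = pthread_create(&threads[i], NULL, worker, &ids[i]);
        assert(rc == 0);
        (void)rc;
    }
    for (int i = 0; i < 2; i++) {
        pthread_join(threads[i], NULL);
    }
    return shared_total;
}
```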
MUTEX IMPLEMENTATION
Exact locking mechanism is hardware dependent
If a thread fails to acquire lock
Waits for lock
Spin vs Yield
How to handle multiple threads waiting on single lock
Queue
Scheduler
Reentrant Locks
Allowed to acquire same lock multiple times
Must be released same number of times
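Reentrant locking can be sketched with a pthread mutex of type `PTHREAD_MUTEX_RECURSIVE`. The function name `nested_acquire` is hypothetical, and the feature-test macro at the top is an assumption about what a strict compiler mode needs to expose the recursive type.

```c
#define _XOPEN_SOURCE 700  /* assumed necessary to expose PTHREAD_MUTEX_RECURSIVE in strict modes */
#include <assert.h>
#include <pthread.h>

/* Acquires one recursive lock `depth` times from the same thread, releases it
   the same number of times, and returns how many acquisitions succeeded. */
int nested_acquire(int depth) {
    pthread_mutexattr_t attr;
    pthread_mutex_t lock;
    int acquired = 0;

    assert(depth >= 0);
    pthread_mutexattr_init(&attr);
    pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_RECURSIVE);
    pthread_mutex_init(&lock, &attr);
    pthread_mutexattr_destroy(&attr);

    for (int i = 0; i < depth; i++) {
        if (pthread_mutex_lock(&lock) == 0) {
            acquired++;  /* the owner may re-acquire without deadlocking */
        }
    }
    for (int i = 0; i < acquired; i++) {
        pthread_mutex_unlock(&lock);  /* one unlock per successful lock */
    }
    /* Destroying succeeds only once the lock is fully released. */
    int rc = pthread_mutex_destroy(&lock);
    return rc == 0 ? acquired : -1;
}
```

With a default (non-recursive) mutex, the second `pthread_mutex_lock` from the same thread would typically deadlock.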
OTHER ISSUES WITH LOCKS
Dead-lock – Circular waiting on locks
Live-lock – Locks state changing with no progress
Lock contention – Many threads require access to single lock
Lock overhead – Locking mechanisms are slow
Priority Inversion – Low priority thread holds lock, prevents progress of high priority
Convoying – Lock contention with slowest threads acquiring the lock first
PERFORMANCE
THREAD GRANULARITY
Better to have lots of threads doing a little work
or a few threads doing lots of work?
Depends on:
How much communication overhead will result?
Implementation of threads
Hardware
JACOBI ITERATIONS
For a matrix, on each iteration each element's new value = average of its neighbors' old values
How many threads?
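A single Jacobi sweep can be sketched as below. The four-neighbor stencil, the fixed boundary, and the name `jacobi_step` are assumptions, since the slide does not specify them.

```c
#include <assert.h>
#include <stddef.h>

/* One Jacobi sweep over the interior of a rows x cols grid stored row-major.
   Every new value depends only on old values, so the two buffers must be
   distinct. Boundary cells of `next` are left untouched. */
void jacobi_step(const double *old, double *next, size_t rows, size_t cols) {
    assert(old != next);
    for (size_t r = 1; r + 1 < rows; r++) {
        for (size_t c = 1; c + 1 < cols; c++) {
            next[r * cols + c] = 0.25 * (old[(r - 1) * cols + c]    /* up */
                                       + old[(r + 1) * cols + c]    /* down */
                                       + old[r * cols + (c - 1)]    /* left */
                                       + old[r * cols + (c + 1)]);  /* right */
        }
    }
}
```

Because the sweep reads only `old` and writes only `next`, disjoint bands of rows could be given to different threads with a barrier between iterations. How many bands pay off depends on the communication overhead discussed above.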
JACOBI IN C USING MPI
(Figure: Row for 800 iterations. Time (seconds) versus Space Size, 0 to 1200; legend entries 4, 16, 64, Erlang, MPI.)
JACOBI IN ERLANG
LOCKING GRANULARITY
Better to lock the entire structure, or parts?
Lock entire list when performing an operation
Only alter one lock per access to list
One thread in list blocks all others from accessing list
Lock each element of the list, hand-over-hand
Threads can work on different parts of the list concurrently
Lock per element, or group of elements
Threads in front of list prevent access to rest of list
LOCK-FREE DATA STRUCTURES
LOCK-FREE ALGORITHMS
Can be more efficient and scalable than locking
Not the same as wait-free
Lock-free guarantees system progress
Wait-free guarantees thread progress
Operation must have bound on number of steps till
completion
Very rare as their performance is generally low
Good for many reads, few writes
Most attempt the operation, then retry if a change occurred during the operation
COMPARE-AND-SWAP
CMPXCHG ON X86
Atomically compares contents of memory location
to a given value, if they match it updates value
Hardware support handles this operation
atomically
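Integral to lock-free structures
int compare_and_swap(int* reg, int oldval, int newval) {
    // Plain C for illustration only: the hardware performs all of this as one atomic step.
    // The parameter is named reg because register is a reserved word in C.
    int old_reg_val = *reg;
    if (old_reg_val == oldval)
        *reg = newval;
    return old_reg_val;
}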
LOCK-FREE LINKED LIST – INSERTION
Harris, “A pragmatic implementation of non-blocking linked-lists”, 2001 (15th International Symposium on Distributed Computing)
Create new node
do
Find insertion location, note left and right nodes
Set new.next = right
If (CAS &left.next, right, new) then return
while(true)
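The insertion steps above can be sketched in C11 as follows. This is an insert-only illustration, not the paper's full algorithm: it assumes a head sentinel node, uses NULL as the end of the list, and has no deletion marks. The names `node` and `list_insert` are hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdlib.h>

typedef struct node {
    int key;
    _Atomic(struct node *) next;
} node;

/* Inserts key into a sorted list that starts at a sentinel `head`.
   Returns false if the key is already present or allocation fails. */
bool list_insert(node *head, int key) {
    assert(head != NULL);
    node *fresh = malloc(sizeof *fresh);  /* create new node */
    if (fresh == NULL) {
        return false;
    }
    fresh->key = key;
    atomic_init(&fresh->next, NULL);

    for (;;) {
        /* Find insertion location, note left and right nodes. */
        node *left = head;
        node *right = atomic_load(&left->next);
        while (right != NULL && right->key < key) {
            left = right;
            right = atomic_load(&right->next);
        }
        if (right != NULL && right->key == key) {
            free(fresh);  /* never published, so safe to free */
            return false;
        }
        atomic_store(&fresh->next, right);  /* new.next = right */
        /* Publish only if left.next still points at right. */
        if (atomic_compare_exchange_strong(&left->next, &right, fresh)) {
            return true;
        }
        /* Another thread changed left.next: search again and retry. */
    }
}
```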
LOCK-FREE LINKED LIST
Delete creates problems
Naive Delete
Fails for concurrent insert
LOCK-FREE LINKED LIST
Correct delete requires two compare-and-swap operations
First mark deleted node as 'logically deleted'
Then 'physically delete' the node
PERFORMANCE OF LOCK-FREE LINKED LIST
1 million random insertions and deletions on keys 0 - 8191
LOCK-FREE ABA PROBLEM
Thread 1:
Insert 20 #interrupted (partial completion)
...
...
Insert 20 #finishes and
#improperly succeeds
Thread 2:
...
delete 30 #address A
insert 15 #address A
SOLUTIONS TO ABA
Keep “tag” bits on each pointer – ABA'
Requires double-word CAS
Use reference counts on cells (Valois)
Only reuse cell when reference count = 0
Use 'Load Linked' and 'Store Conditional'
LL returns value of memory location
SC stores only if no updates occurred since LL
PERFORMANCE NOT ALWAYS GREAT
1 million random insertions and deletions on keys 0 - 255
NEXT TIME…
Multi-process synchronization problems
Producer Consumer
Reader-Writer
DOALL
APPENDIX
More interesting topics
AVOIDING ERRORS WITH PTHREADS
Create data structures that handle most of the
synchronization for you
Code the locks once correctly, then don't worry about them anymore
For example:
Create a synchronized list
Perform locks inside add/remove/search functions
Synchronization now transparent to rest of program
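A synchronized list along these lines might look like the sketch below, with one mutex hidden inside the add/remove/search functions. The type and function names (`sync_list`, `sync_list_add`, and so on) are hypothetical.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stdlib.h>

typedef struct sync_node {
    int value;
    struct sync_node *next;
} sync_node;

typedef struct {
    sync_node *head;
    pthread_mutex_t lock;  /* one lock for the whole list */
} sync_list;

void sync_list_init(sync_list *list) {
    assert(list != NULL);
    list->head = NULL;
    pthread_mutex_init(&list->lock, NULL);
}

bool sync_list_add(sync_list *list, int value) {
    sync_node *n = malloc(sizeof *n);  /* allocate outside the critical section */
    if (n == NULL) {
        return false;
    }
    n->value = value;
    pthread_mutex_lock(&list->lock);
    n->next = list->head;
    list->head = n;
    pthread_mutex_unlock(&list->lock);
    return true;
}

bool sync_list_search(sync_list *list, int value) {
    bool found = false;
    pthread_mutex_lock(&list->lock);
    for (sync_node *n = list->head; n != NULL; n = n->next) {
        if (n->value == value) {
            found = true;
            break;
        }
    }
    pthread_mutex_unlock(&list->lock);
    return found;
}

bool sync_list_remove(sync_list *list, int value) {
    pthread_mutex_lock(&list->lock);
    sync_node **link = &list->head;
    while (*link != NULL && (*link)->value != value) {
        link = &(*link)->next;
    }
    sync_node *victim = *link;
    if (victim != NULL) {
        *link = victim->next;  /* unlink while holding the lock */
    }
    pthread_mutex_unlock(&list->lock);
    bool removed = (victim != NULL);
    free(victim);  /* free(NULL) is a no-op */
    return removed;
}
```

Callers never touch the mutex, so a locking mistake can only be made in one place.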
SMART PROGRAMMING WITH PTHREADS
Locks serialize the program, so use them as little as possible
Only place lock around critical area
Less time spent holding lock, less lock contention
Locks have high overhead
Constant locking and unlocking can result in poor
performance
WHAT IS A DATA RACE?
Two concurrent accesses to a memory location at least
one of which is a write.
Example: Data race between a read and a write
int x = 1;
Parallel.Invoke(
() => { x = 2; },
() => { System.Console.WriteLine(x); }
);
Outcome nondeterministic or worse
may print 1 or 2, or arbitrarily bad things on a relaxed
memory model
(Figure: one task writes x while the other reads x)
Practical Parallel and Concurrent
Programming DRAFT: comments to
6/22/2010
DATA RACES AND HAPPENS-BEFORE
Example of a data race with two writes:
int x = 1;
Parallel.Invoke( () => { x = 2; },
() => { x = 3; } );
System.Console.WriteLine(x);
We visualize the ordering of memory accesses with a
happens-before graph:
There is no path between
(write 2 to x) and (write 3 to x),
thus they are concurrent,
thus they create a data race
(note: the read is not in a data race)
(Figure: happens-before graph. write 1 to x precedes both write 2 to x and write 3 to x, and both of those precede read x.)
QUIZ: WHERE ARE THE DATA RACES?
Parallel.For(1,2, i => {
x = a[i];});
Parallel.For(1,2, i => {
a[i] = x;});
Parallel.For(1,2, i => {
a[i] = a[i+1];});
QUIZ: WHERE ARE THE DATA RACES?
Parallel.For(1,2, i => {
x = a[i];});
reads a[0], writes x
reads a[1], writes x
Race between two writes.
Parallel.For(1,2, i => {
a[i] = x;});
reads x, writes a[0]
reads x, writes a[1]
No Race between two reads.
Parallel.For(1,2, i => {
a[i] = a[i+1];});
reads a[2], writes a[1]
reads a[3], writes a[2]
Race between a read and a write.
SPOTTING READS & WRITES
Sometimes a single statement performs multiple
memory accesses
When you execute
a[i] = x
there are actually three
reads and one write:
reads x
reads a
reads i
writes a[i]
When you execute
x += y
there are actually two
reads and one write:
reads x
reads y
writes x
DATA RACES CAN BE HARD TO SPOT.
Code looks fine... at first.
Parallel.For(0, 10000, i => {a[i] = new Foo();})
Problem: we have to follow calls... even if they look
harmless at first (like a constructor).
Parallel.For(0, 10000, i => {a[i] = new Foo();})
class Foo {
    private static int counter;
    private int unique_id;
    public Foo()
    {
        unique_id = counter++;
    }
}