# Contention in shared-memory multiprocessors: Multiprocessor synchronization algorithms (20225241)

Posted on 21-Dec-2015


Contention in shared memory multiprocessors

Multiprocessor synchronization algorithms (20225241)

Lecturer: Danny Hendler

• Definitions

• Lower bound for consensus

• Lower bounds for counters, stacks and queues

Contention in shared-memory systems

Contention: the extent to which processes access the same memory locations simultaneously

When multiple processes simultaneously write to the same memory location, their accesses are serialized and the processes are stalled.

High contention hurts performance!

Memory Stalls & Write-Contention

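[Figure: processes p0, p1, p2, …, pj are all about to access the same variable; served one at a time, they incur 0, 1, 2, …, j stalls.]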

Write-contention is the maximum number of processes that can be enabled to perform a write or read-modify-write operation to the same memory location simultaneously.
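The definition can be illustrated with a small Python model. The helper names `write_contention` and `stalls`, and the rule that memory serves the pending accesses to one location in list order, are assumptions of this sketch rather than part of the lecture's formal model.

```python
from collections import Counter, defaultdict


def write_contention(pending):
    """`pending` lists (process, location) pairs: writes or RMW operations
    that the processes are simultaneously enabled to perform.
    Write-contention is the largest number of them on a single location."""
    counts = Counter(loc for _, loc in pending)
    return max(counts.values(), default=0)


def stalls(pending):
    """Assumption of this sketch: memory serves the accesses to each location
    one at a time, in list order. A process then incurs one stall for every
    access to its location that is served before its own."""
    served = defaultdict(int)
    result = {}
    for proc, loc in pending:
        result[proc] = served[loc]
        served[loc] += 1
    return result


pending = [("p0", "x"), ("p1", "x"), ("p2", "x"), ("p3", "y")]
print(write_contention(pending))  # 3
print(stalls(pending))            # {'p0': 0, 'p1': 1, 'p2': 2, 'p3': 0}
```

Three processes are poised on `x`, so the write-contention is 3 and the last of them to be served incurs 2 stalls, matching the 0, 1, 2, …, j pattern in the figure.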

Recall the consensus implementation we saw…

Decide(v) ; code for pi, i = 0, 1

1. CAS(C, null, v)
2. return C

Initially C = null

We use a single object, C, that supports the compare&swap and read operations.

What is the write-contention of this algorithm?

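Answer: n. If n processes run this code, all n of them may be poised to apply CAS to C at the same time.

It can be shown that this is the write-contention of any consensus algorithm.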
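The Decide algorithm above can be sketched in Python. Python exposes no hardware compare&swap, so the `CASRegister` class (a hypothetical name) emulates the atomic step with a lock; that lock is an assumption of the sketch, not part of the algorithm. Every thread's CAS targets the same object C, which is exactly the write-contention discussed above.

```python
import threading


class CASRegister:
    """Shared register supporting compare&swap and read (lock-emulated)."""

    def __init__(self, initial=None):
        self._value = initial
        self._lock = threading.Lock()

    def cas(self, expected, new):
        # Atomically: if the register holds `expected`, replace it with `new`.
        with self._lock:
            if self._value == expected:
                self._value = new
                return True
            return False

    def read(self):
        with self._lock:
            return self._value


def decide(C, v):
    C.cas(None, v)   # 1. CAS(C, null, v): only the first CAS succeeds
    return C.read()  # 2. return C: everyone reads the winner's value


def run_consensus(proposals):
    """Run one thread per proposal against a fresh C (initially null)."""
    C = CASRegister(None)
    decisions = [None] * len(proposals)

    def worker(i):
        decisions[i] = decide(C, proposals[i])

    threads = [threading.Thread(target=worker, args=(i,)) for i in range(len(proposals))]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return decisions


decisions = run_consensus([10, 20, 30, 40])
print(decisions)  # all four entries are equal
```

Which proposal wins depends on scheduling; agreement and validity hold either way.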

What can we say about the worst-case time complexity of objects such as counters, stacks and queues?

Naïve Counter Implementation

[Figure: processes 1 to 6 each apply FAI to a single shared FAI object and are served one at a time.]

The last processes to succeed incur θ(n) time complexity!

Can we do much better?
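To make the θ(n) claim concrete, here is a hypothetical simulation of the naïve counter: n processes are all poised on FAI to one shared word, and the memory serves them in an arbitrary (here random) order. The function name and the random service order are assumptions of this sketch.

```python
import random


def naive_counter_run(n, seed=0):
    """Simulate n processes all poised to apply FAI to the same shared word.
    The process served in position k (counting from 0) waits for the k FAIs
    served before it."""
    rng = random.Random(seed)
    order = list(range(n))
    rng.shuffle(order)
    counter = 0
    returned, stalled = {}, {}
    for k, p in enumerate(order):
        stalled[p] = k          # FAIs served before p's
        returned[p] = counter   # fetch ...
        counter += 1            # ... and increment
    return returned, stalled


returned, stalled = naive_counter_run(6)
print(sorted(returned.values()))  # [0, 1, 2, 3, 4, 5]
print(max(stalled.values()))      # 5, i.e. n-1 for the last process served
```

The values returned are always 0, …, n-1, but the total waiting grows as 0 + 1 + … + (n-1), so the last process pays θ(n).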

We will see a time lower bound of √n on non-blocking implementations of counters, stacks, queues…

Any such implementation either (a) suffers high contention or (b) suffers high latency.

Capture influence between processes

Time complexity is determined by the extent to which operations by different processes influence each other.

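Influence level

[Figure: a shared counter holding 17. Process p: "Hmmm… I will soon request a value." The other processes, each about to apply FAI: "Each of us may precede you and modify the value you will get!"]

The influence level (w.r.t. p) is the number of processes that may precede p and modify the value it gets.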

Modifying steps

[Figure: a shared counter holding 17. Process p: "Hmmm… I will soon request a value." Process q, one of the processes that may precede p ("Each of us may precede you!"), applies FAI: q gets 17 and the counter becomes 18.]

There is an atomic step in which q modifies p's return value: a modifying step.

We bring all the 'influencers' to the verge of performing a modifying step.
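The modifying step in the figure can be replayed against the counter's sequential specification. This is an illustrative sketch; `SequentialCounter` and `value_p_gets` are hypothetical names.

```python
class SequentialCounter:
    """Sequential specification of a fetch&increment counter."""

    def __init__(self, value):
        self.value = value

    def fai(self):
        old = self.value
        self.value += 1
        return old


def value_p_gets(q_goes_first):
    counter = SequentialCounter(17)
    if q_goes_first:
        counter.fai()      # q's modifying step: 17 -> 18
    return counter.fai()   # p's fetch&increment


print(value_p_gets(False))  # 17
print(value_p_gets(True))   # 18
```

A single step by q decides whether p gets 17 or 18, which is why q counts as an influencer of p.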

Space/Write-contention tradeoff

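• We bring all influencers to the verge of a modifying step.

• Each modifying step is necessarily a write/RMW operation.

S ≥ I/C

where S is the space complexity, I is the influence level and C is the write-contention. At most C influencers can be poised on any single base object, so their outstanding modifying steps are spread over at least I/C distinct base objects.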

Latency/Contention tradeoff

[Figure: a shared counter holding 17 with pending FAI operations. Process p: "Hmmm… I will soon request a value." The base objects on which there are outstanding modifying steps are marked.]

Process p can be made to read all these base objects in the course of its operation!

L_R ≥ I/C

where L_R is the number of base objects read, I is the influence level and C is the write-contention.

Time lower bound

L_R · C ≥ I

Time complexity is at least √I: either L_R ≥ √I or C ≥ √I.
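The two tradeoffs combine into the time bound as follows. B is a symbol introduced here (it does not appear on the slides) for the number of base objects with outstanding modifying steps, assuming as above that p can be made to read all of them:

```latex
\begin{aligned}
I &\le C \cdot B
  && \text{(at most $C$ influencers are poised on any one base object)}\\
S &\ge B \ge I/C, \qquad L_R \ge B \ge I/C\\
L_R \cdot C &\ge I\\
\max(L_R, C) &\ge \sqrt{L_R \cdot C} \;\ge\; \sqrt{I}
\end{aligned}
```

For objects whose influence level is in Ω(n), this gives the Ω(√n) bound announced earlier.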

Influence(n) objects class

Definition: The influence function, I_O(n), of a generic object O is defined as follows: I_O(n) = k if the influence level of any n-process nonblocking implementation of O is at least k.

Definition: Influence(n) is the class of generic objects whose influence function is in Ω(n).

Influence(n) includes: stacks, queues, hash-tables, pools, linearizable counters, consensus, approximate agreement…

Concurrent Counter is in Influence(n)

[Figure: a shared counter holding 17. Process p: "Hmmm… I will soon request a value." The other processes, each about to apply FAI: "Each of us may precede you!"]

Influence level is n-1: every q≠p can influence p.
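The claim can be checked by brute force on the sequential specification: for each q≠p, schedule q's pending FAI before p's and see whether p's value changes. `counter_influencers` is a hypothetical helper written for this sketch.

```python
def counter_influencers(n, p, start=17):
    """Return the processes q != p whose single pending FAI, if scheduled
    before p's FAI, changes the value p gets from a counter holding `start`."""

    def value_p_gets(preceding):
        # Each preceding process applies one FAI; p then reads the counter.
        return start + len(preceding)

    alone = value_p_gets([])
    return {q for q in range(n) if q != p and value_p_gets([q]) != alone}


print(sorted(counter_influencers(6, p=0)))  # [1, 2, 3, 4, 5]
```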

Stack is in Influence(n)

[Figure: a stack holding items 1, 2, 3, …, n. Process p: "Hmmm… I will soon attempt to pop a value." The other processes: "Each of us may precede you!"]

Influence level is n-1, e.g., if every q≠p has a pending pop operation.
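A sequential replay shows the same effect for a stack: with every other process holding a pending pop, each additional pop that precedes p's shifts the value p gets. The helper name and the bottom-to-top layout (1 at the bottom, n on top) are assumptions of this sketch.

```python
def value_p_pops(items, preceding_pops):
    """`items` lists the stack from bottom to top. Let `preceding_pops`
    pending pops by processes q != p take effect first, then return what
    p's pop gets (None if the stack is empty)."""
    stack = list(items)
    for _ in range(preceding_pops):
        if stack:
            stack.pop()
    return stack.pop() if stack else None


n = 6
items = list(range(1, n + 1))  # 1 at the bottom, n on top
print([value_p_pops(items, k) for k in range(n)])  # [6, 5, 4, 3, 2, 1]
```

Each of the n-1 other pops, if it precedes p's, changes p's result.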

Approximate Agreement is in Influence(n)

[Figure: process P1 proposes 0; processes P2, P3, …, Pn each propose 2ε.]

Influence level is n-1.

If p1 runs first, it must return 0. If it is preceded by an execution in which some q≠p1 terminates, p1 must return a value no less than ε: in that execution only 2ε has been proposed, so q returns 2ε, and p1's value must be within ε of q's.

In approximate agreement, each process proposes a value.

• Validity: each process must decide on a legal value, i.e., one in the range of the proposed values.

• Approximate agreement: the values decided by any two processes must be no more than ε apart.
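The two conditions can be written as a checker over one execution's proposals and decisions. `check_approximate_agreement` is a hypothetical helper, and the concrete numbers (ε = 1) are chosen only to replay the slide's scenario.

```python
def check_approximate_agreement(proposals, decisions, eps):
    """Validity: every decision lies in the range of proposed values.
    Approximate agreement: any two decisions are at most eps apart."""
    lo, hi = min(proposals), max(proposals)
    valid = all(lo <= d <= hi for d in decisions)
    agree = not decisions or max(decisions) - min(decisions) <= eps
    return valid and agree


eps = 1.0
# p1 proposes 0 and every other process proposes 2*eps, as on the slide.
proposals = [0.0, 2 * eps, 2 * eps]
# Some q != p1 ran first and decided 2*eps, so p1 must decide at least eps.
print(check_approximate_agreement(proposals, [1.0, 2.0, 2.0], eps))  # True
print(check_approximate_agreement(proposals, [0.9, 2.0, 2.0], eps))  # False
```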

The First-Generation Problem

• Every process calls a First operation once.

• We say an operation is in the first generation of execution E if it is not preceded in E by any other operation.

• All operations not in the first generation of the execution must return false.

• In quiescence, at least one operation from the first generation must have returned true.
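The problem statement can be made concrete with a deliberately simple sketch. This is not the optimal implementation from the slides: it uses a single test&set bit (emulated with a lock, since Python has no hardware test&set), so all n processes may be poised on the same bit and its write-contention is n. The class names `TestAndSetBit` and `FirstGeneration` are hypothetical.

```python
import threading


class TestAndSetBit:
    """Shared bit with an atomic test&set, emulated with a lock."""

    def __init__(self):
        self._bit = False
        self._lock = threading.Lock()

    def test_and_set(self):
        with self._lock:
            old = self._bit
            self._bit = True
            return old


class FirstGeneration:
    """First() returns True iff it wins the test&set.

    An operation that starts after another has completed finds the bit
    already set and returns False. The winner cannot be preceded by a
    completed operation, so it is in the first generation."""

    def __init__(self):
        self._tas = TestAndSetBit()

    def first(self):
        return not self._tas.test_and_set()


fg = FirstGeneration()
results = [fg.first() for _ in range(4)]  # four operations, one after another
print(results)  # [True, False, False, False]
```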

Lemma

The First-Generation object is in Influence(n), and for this problem our bound is tight.

The bound for Influence(n) is tight

An optimal implementation for the First-Generation problem

[Figure: a mark array of n multi-reader multi-writer atomic variables; the processes are divided into groups of √n processes.]

A linear lower bound on the number of stalls for long-lived objects

The following material is not required for the exam/assignments.

“Naïve” Counter Implementation

[Figure: processes 1 to 6 each apply FAI to a single shared word supporting fetch&inc.]

The last process incurs θ(n) time complexity!

Can we do better?

FAI: Fetch-and-Increment

Theorem: Consider any n-process implementation of an obstruction-free counter. Then the worst-case number of stalls incurred by a process as it performs a fetch&increment operation is at least n-1.

Worst-case number of stalls ≥ n-1

Start from an initial state. Fix a process p about to perform a fetch&increment operation.

Consider the path p takes if it runs uninterrupted, counting only its first access to each shared word.

[Figure: p's path visits shared words 1, 2, 3, 4 in order.]


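Let O1 be the first word along p's path that is written by some other process in some p-free execution.

There must be such a word: otherwise p would return the same value whether or not other fetch&increment operations completed before it started.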

Let E1 be an execution that maximizes the number of processes that are about to write to O1 over all p-free executions. Let G1 be the set of these processes, with |G1| = K1.

If K1 = n-1, we are done: p incurs n-1 stalls on O1.

Otherwise, we show that p must access yet another word that may be written by other processes.


What happens if p incurs the stalls on O1?

But now the rest of p's path may change.

[Figure: after p incurs the stalls on O1, its remaining path may differ from the original one.]

Assume p gets value v.

v: the value returned by p if we let it run and incur the stalls.

c: the number of fetch&increment operations completed before p starts its operation.

We have: v ∈ {c, …, c+K1}.

[Figure: timeline of fetch&inc operations, marking c, K1 and p's returned value v.]

We select some process q ∉ G1 ∪ {p}; such a process exists because K1 < n-1.

We let q perform K1+1 fetch&increment operations.

Then q must write to a word read by p after O1.

[Figure: timeline in which q's K1+1 fetch&inc operations precede p's operation, marking c+K1+1 and p's new return value v' > v.]

Now at least c+K1+1 operations precede p's, so p must return some v' > v; it can do so only if it observes q's operations after O1.
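The counting step can be checked mechanically. This sketch assumes, as we read the timeline, that q's K1+1 operations complete before p starts; the helper names are hypothetical.

```python
def values_without_q(c, k1):
    # v ∈ {c, ..., c+K1}: p's possible return values if q does not run
    return set(range(c, c + k1 + 1))


def min_value_with_q(c, k1):
    # q completes K1+1 FAIs before p starts, so at least c+K1+1 precede p
    return c + k1 + 1


c, k1 = 3, 2
print(sorted(values_without_q(c, k1)))  # [3, 4, 5]
print(min_value_with_q(c, k1))          # 6
```

Because the smallest value p may now return exceeds every value it could return without q, p cannot behave as if q never ran.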

Let O2 be the first word that p accesses after it incurs the K1 stalls and that is written by some process ∉ G1 ∪ {p}. Let E2 be an execution that maximizes the number of processes that are about to write to O2 over all (G1 ∪ {p})-free executions, and let G2 be these processes, with |G2| = K2.

Continuing with this construction, we get:

[Figure: p's path passes the words O1, O2, …, Om; before p accesses Oi, a group Gi of |Gi| = Ki processes is poised to write to it.]
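As we read the construction, each execution E(i+1) is chosen over (G1 ∪ … ∪ Gi ∪ {p})-free executions, so the groups are pairwise disjoint, and the construction stops only once every process other than p belongs to some group. A sketch of the resulting count (not a formula from the slides):

```latex
\begin{aligned}
&G_i \cap G_j = \emptyset \;\; (i \ne j), \qquad p \notin G_1 \cup \dots \cup G_m,\\
&G_1 \cup \dots \cup G_m \cup \{p\} = \text{all } n \text{ processes}\\
&\Longrightarrow\quad \#\text{stalls incurred by } p \;\ge\; \sum_{i=1}^{m} K_i \;=\; n-1.
\end{aligned}
```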

Conclusion: the “naïve” implementation is best possible!