Page 1: Concurrent Programming Without Locks Keir Fraser & Tim Harris Adapted from an earlier presentation by Phil Howard

Concurrent Programming Without Locks

Keir Fraser & Tim Harris

Adapted from an earlier presentation by Phil Howard

Page 2

Motivation

• Locking precludes parallelism

• Recall “A Lock-Free Multiprocessor OS Kernel” by Massalin et al.
  – Extensive use of CAS2 (aka DCAS, DCADS)
  – The instruction does not exist on today’s CPUs

• Need a practical and general non-blocking solution

Page 3

Solutions?

• Only use data structures that can be implemented with CAS?
  – Limiting

• RCU
  – Still uses locks for writers
  – Still limited to CAS data structures

• Software MCAS

• Transactional Memory

Page 4

Goals

• Concreteness
• Linearizability
• Non-blocking progress guarantee
• Disjoint access parallelism
• Read parallelism
• Dynamicity
• Practicable space costs
• Composability

Page 5

Caveats

• “It remains possible for a thread to see a mutually inconsistent view of shared memory if it performs a series of [read] calls.”

Page 6

Definitions

• Obstruction-freedom – a thread makes progress as long as it does not contend with other threads for access to any location

• Lock-freedom – the system as a whole makes progress

• Wait-freedom – every thread makes progress

Focus is on lock-free design: whole transactions are lock-free, not just the sub-components
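As a concrete instance of the lock-freedom definition, here is a minimal sketch (not taken from the slides) of a counter updated with a CAS retry loop from C11 `<stdatomic.h>`. The name `lockfree_add` is hypothetical. A failed strong CAS means some other thread changed the counter in the meantime, so the system as a whole progressed even though this thread has to retry. Nothing bounds the number of retries for one thread, which is why the loop is lock-free but not wait-free.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical helper: atomically add `delta` to `*counter` without a lock. */
long lockfree_add(_Atomic long *counter, long delta)
{
    assert(counter != NULL);
    long seen = atomic_load(counter);
    /* On failure the strong CAS stores the current value into `seen`,
       so each retry starts from what another thread just wrote. */
    while (!atomic_compare_exchange_strong(counter, &seen, seen + delta)) {
        /* another thread's update won; try again */
    }
    return seen + delta;
}
```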

Page 7

Design considerations

• Need to update multiple locations atomically – using only “real” instructions

• The secret?
  – Indirection!
  – Use descriptors to access values

Page 8

Figure: memory words at addresses 100–107, and a descriptor with a Status field and one entry per location it updates.

Descriptor

Address   Old Value   New Value
102       100         200
105       123         123
106       456         789

Page 9

Implications of Descriptors

• Commit operation atomically updates status field

• All accesses are indirect
  – Need to distinguish between descriptor or value
  – Need to choose “actual”, “old”, or “new” value

• Once a descriptor is made visible, only the status field changes

• Once an outcome is decided, the status value doesn’t change
  – Retries use a new descriptor

• Descriptors are managed via garbage collection
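The choice between the “actual”, “old”, and “new” value can be sketched as follows. This is a single-threaded illustration only, under assumptions of my own: addresses are small array indices, the types `entry_t` and `descriptor_t` and the function `logical_read` are hypothetical names, and the status names are a simplified set. The real algorithms make the same choice but must also cope with concurrent changes.

```c
#include <assert.h>
#include <stddef.h>

typedef enum { UNDECIDED, SUCCESSFUL, FAILED } status_t;   /* simplified set */

typedef struct { int address; long old_value; long new_value; } entry_t;

typedef struct {
    status_t status;
    size_t   n;
    entry_t  entries[3];
} descriptor_t;

/* Logical value of `address`, given plain memory and one visible descriptor. */
long logical_read(const long *memory, const descriptor_t *d, int address)
{
    assert(memory != NULL && d != NULL);
    for (size_t i = 0; i < d->n; i++) {
        if (d->entries[i].address == address) {
            /* Covered by the descriptor: the status alone picks old or new. */
            return d->status == SUCCESSFUL ? d->entries[i].new_value
                                           : d->entries[i].old_value;
        }
    }
    return memory[address];   /* not covered: the "actual" value */
}
```

Because only the status field changes after the descriptor is visible, flipping that one field switches every covered location from its old to its new value at once.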

Page 10

Other requirements

• Descriptors must be able to own locations

• Uncontended commits must work
  – Prepare phase
  – Decision point
  – Update status value
  – Clean up
  – Status values: UNDECIDED, READ-CHECK, SUCCESSFUL, FAILED

Page 11

Other Requirements

• Contended commits must make progress
  – Decided, but not complete
    • Help the other thread complete
  – Undecided, not read-check
    • Abort contending transactions
      – Without contention management can lead to live-lock
    • Help contending transactions
      – Sort memory addresses to prevent looping
  – Read-check
    • Abort at least one contender
    • Prevent live-locks by totally ordering transactions
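Sorting by address can be sketched with the standard `qsort`. The struct `mcas_entry` and the function `sort_entries` are hypothetical names for illustration. If every operation acquires its locations in ascending address order, two operations that help each other cannot wait on each other in a cycle.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

typedef struct { long *addr; long expected; long new_value; } mcas_entry;

/* Compare entries by the numeric value of their target address. */
static int by_address(const void *x, const void *y)
{
    uintptr_t a = (uintptr_t)((const mcas_entry *)x)->addr;
    uintptr_t b = (uintptr_t)((const mcas_entry *)y)->addr;
    return (a > b) - (a < b);
}

/* Put the entries of one operation into ascending address order. */
void sort_entries(mcas_entry *entries, size_t n)
{
    assert(entries != NULL || n == 0);
    qsort(entries, n, sizeof entries[0], by_address);
}
```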

Page 12

Algorithms

MCAS Multiple Compare And Swap

WSTM Word Software Transactional Memory

OSTM Object Software Transactional Memory

Page 13

MCAS

CAS( word *address,          // actual value
     word expected_value,
     word new_value );

(logically)
MCAS( int count,
      word *address[],       // actual values
      word expected_value[],
      word new_value[] );

(but an extra indirection is added)
(pointers must indirect through the descriptor!)
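The logical behaviour of MCAS can be written as a sequential specification: imagine the whole body below executing as one atomic step. This is a sketch of the intended semantics only, not the non-blocking implementation; the name `mcas_spec` and the use of `long` for words are assumptions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Sequential specification of MCAS: all-or-nothing update of n locations. */
bool mcas_spec(size_t n, long *addr[], const long expected[], const long new_value[])
{
    assert(addr != NULL && expected != NULL && new_value != NULL);
    for (size_t i = 0; i < n; i++)
        if (*addr[i] != expected[i])
            return false;            /* any mismatch: nothing is written */
    for (size_t i = 0; i < n; i++)
        *addr[i] = new_value[i];     /* all matched: write every new value */
    return true;
}
```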

Page 14

MCAS

• Operates only on aligned pointers

• Lower 2 bits used to distinguish value/descriptor

• Descriptors contain
  – status
  – N
  – address[]
  – expected[]
  – new_value[]
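Using the low two bits of an aligned pointer as a tag can be sketched like this. The particular tag values and the helper names (`tag_ptr`, `untag_ptr`, `is_mcas_desc`, `is_ccas_desc`) are assumptions for illustration; the slides only say that the lower two bits distinguish values from descriptors.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed tag assignment; any scheme that fits in two bits would do. */
enum { TAG_VALUE = 0, TAG_MCAS_DESC = 1, TAG_CCAS_DESC = 2, TAG_MASK = 3 };

/* Attach a tag to a pointer whose low two bits are free (4-byte alignment). */
static inline uintptr_t tag_ptr(void *p, unsigned tag)
{
    assert(((uintptr_t)p & TAG_MASK) == 0);
    assert(tag <= TAG_MASK);
    return (uintptr_t)p | tag;
}

/* Recover the original pointer by clearing the tag bits. */
static inline void *untag_ptr(uintptr_t w)
{
    return (void *)(w & ~(uintptr_t)TAG_MASK);
}

static inline int is_mcas_desc(uintptr_t w) { return (w & TAG_MASK) == TAG_MCAS_DESC; }
static inline int is_ccas_desc(uintptr_t w) { return (w & TAG_MASK) == TAG_CCAS_DESC; }
```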

Page 15

Data Access

Figure: a memory word holds either a plain value (300 in the figure) or a reference to a descriptor.

Descriptor, Status: SUCCESS

Address   Old Value   New Value
102       100         200

Descriptor, Status: UNKNOWN

Address   Old Value   New Value
105       100         200

Page 16

CCAS

Conditional CAS built from CAS
  – takes effect only if *condition == UNDECIDED
  – used to insert descriptor references

CCAS( word *address,
      word expected_value,
      word new_value,
      word *condition );

Returns the original value of *address
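The intended effect of CCAS, again as a sequential specification to be read as one atomic step. The name `ccas_spec`, the use of `long` and `int`, and the numeric encoding of the status values are assumptions.

```c
#include <assert.h>
#include <stddef.h>

enum { UNDECIDED = 0, SUCCESSFUL = 1, FAILED = 2 };   /* assumed encoding */

/* Sequential specification of CCAS: a CAS that is additionally gated
   on *condition still being UNDECIDED. */
long ccas_spec(long *address, long expected_value, long new_value,
               const int *condition)
{
    assert(address != NULL && condition != NULL);
    long original = *address;
    if (original == expected_value && *condition == UNDECIDED)
        *address = new_value;
    return original;                /* caller compares this with expected_value */
}
```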

Page 17

word *MCASRead(word **addr)
{
    word *v;

retry_read:
    v = CCASRead(addr);
    if (!IsMCASDesc(v)) return v;          // plain value: done

    mcas_descriptor *d = (mcas_descriptor *)v;
    for (int i = 0; i < d->N; i++) {
        if (d->addr[i] == addr) {
            if (d->status == SUCCESS) {
                if (CCASRead(addr) == v) return d->new[i];
                else goto retry_read;      // descriptor was removed meanwhile
            } else {                       // FAILED or UNKNOWN
                if (CCASRead(addr) == v) return d->expected[i];
                else goto retry_read;
            }
        }
    }
    return v;
}

Page 18

MCAS(3, {a,b,c}, {1,2,3}, {4,5,6})

Figure: memory before the operation, with a = 1, b = 2, c = 3.

Page 19

MCAS(3, {a,c,b}, {1,3,2}, {4,6,5})

Figure: memory still holds a = 1, b = 2, c = 3. A descriptor has been built with Status: UNKNOWN and N = 3:

Address   Old Value   New Value
a         1           4
b         2           5
c         3           6

Page 20

MCAS(3, {a,b,c}, {1,2,3}, {4,5,6})

Figure: the descriptor (N = 3) now has Status: SUCCESS, and memory changes from a = 1, b = 2, c = 3 to a = 4, b = 5, c = 6.

Address   Old Value   New Value
a         1           4
b         2           5
c         3           6

Page 21

bool MCAS(int N, word **a[], word *e[], word *n[])
{
    // Descriptors are reclaimed by garbage collection
    mcas_descriptor *d = new mcas_descriptor();
    d->N = N;
    d->status = UNDECIDED;
    for (int i = 0; i < N; i++) {
        d->a[i] = a[i];
        d->e[i] = e[i];
        d->n[i] = n[i];
    }
    address_sort(d);
    return mcas_help(d);
}

Page 22

bool mcas_help(mcas_descriptor *d)
{
    word *v, desired = FAILED;
    bool success;

    // Phase 1: acquire
    for (int i = 0; i < d->N; i++) {
        while (TRUE) {
            v = CCAS(d->a[i], d->e[i], d, &d->status);
            if (v == d->e[i] || v == d) break;    // acquired, or already ours
            if (IsMCASDesc(v))
                mcas_help((mcas_descriptor *)v);  // help the owner, then retry
            else
                goto decision_point;              // value changed: fail
        }
    }
    desired = SUCCESS;

decision_point:

Page 23

mcas_help continued

    // PHASE 2: read (not used by MCAS)

decision_point:
    CAS(&d->status, UNDECIDED, desired);

    // PHASE 3: clean up
    success = (d->status == SUCCESS);
    for (int i = 0; i < d->N; i++) {
        CAS(d->a[i], d, success ? d->n[i] : d->e[i]);
    }
    return success;
}

Page 24

Claiming Ownership

Figure: memory words 102, 104 and 108. Word 108 holds 999 and is being claimed through a CCAS descriptor.

MCAS descriptor, Status: UNKNOWN

Address   Old Value   New Value
102       100         200
104       456         789
108       999         777

CCAS descriptor

address     108
expected    999
new value   &MCAS_Descr
condition   &mcas->status

Page 26

word *CCAS(word **a, word *e, word *n, word *cond)
{
    ccas_descriptor *d = new ccas_descriptor();
    word *v;

    d->a = a; d->e = e; d->n = n; d->cond = cond;
    while ((v = CAS(d->a, d->e, d)) != d->e) {
        if (IsCCASDesc(v))
            CCASHelp((ccas_descriptor *)v);   // help the other CCAS, then retry
        else
            return v;                         // value differs from e
    }
    CCASHelp(d);
    return v;
}

void CCASHelp(ccas_descriptor *d)
{
    bool success = (*d->cond == UNDECIDED);
    CAS(d->a, d, success ? d->n : d->e);
}

Page 27

word *CCASRead(word **a)
{
    word *v = *a;
    while (IsCCASDesc(v)) {
        CCASHelp((ccas_descriptor *)v);   // finish the pending CCAS first
        v = *a;
    }
    return v;
}
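For readers who want to compile something, here is a sketch of CCAS, CCASHelp and CCASRead in C11. It departs from the slides in several labelled ways: words are `uintptr_t`, a single assumed tag bit (`CCAS_TAG`) marks descriptors, the status encoding is assumed, and the caller supplies the descriptor storage because C has no garbage collector. A caller-supplied descriptor must not be reused while another thread may still hold a reference to it, which is exactly the problem the slides delegate to garbage collection.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>
#include <stddef.h>

typedef uintptr_t word;                 /* plain values must keep bit 0 clear */
#define CCAS_TAG ((word)1)              /* assumed tag bit for CCAS descriptors */
enum { UNDECIDED = 0, SUCCESSFUL = 1, FAILED = 2 };   /* assumed encoding */

typedef struct {
    _Atomic word *a;                    /* location to update */
    word          e, n;                 /* expected and new value */
    _Atomic int  *cond;                 /* status word that must be UNDECIDED */
} ccas_descriptor;

static int is_ccas_desc(word w) { return (w & CCAS_TAG) != 0; }

/* Finish a CCAS: swap the descriptor for n or e, depending on *cond. */
static void ccas_help(ccas_descriptor *d)
{
    word self = (word)d | CCAS_TAG;
    word next = (atomic_load(d->cond) == UNDECIDED) ? d->n : d->e;
    atomic_compare_exchange_strong(d->a, &self, next);  /* losing the race is fine */
}

/* Returns the value seen at *a; the update happened only if that value
   equals e and *cond was UNDECIDED when the helper checked it. */
word ccas(ccas_descriptor *d, _Atomic word *a, word e, word n, _Atomic int *cond)
{
    assert(d != NULL && ((word)d & CCAS_TAG) == 0);
    assert(!is_ccas_desc(e) && !is_ccas_desc(n));
    d->a = a; d->e = e; d->n = n; d->cond = cond;
    for (;;) {
        word seen = e;
        if (atomic_compare_exchange_strong(a, &seen, (word)d | CCAS_TAG)) {
            ccas_help(d);               /* descriptor installed: complete it */
            return e;
        }
        if (!is_ccas_desc(seen))
            return seen;                /* value differs from e: no effect */
        ccas_help((ccas_descriptor *)(seen & ~CCAS_TAG));  /* help, then retry */
    }
}

/* Read a location, completing any CCAS that is in flight on it. */
word ccas_read(_Atomic word *a)
{
    word v = atomic_load(a);
    while (is_ccas_desc(v)) {
        ccas_help((ccas_descriptor *)(v & ~CCAS_TAG));
        v = atomic_load(a);
    }
    return v;
}
```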

Page 28

Conflicts

Figure: memory words 102, 104 and 108, with two MCAS descriptors that both target word 108.

First descriptor, Status: UNKNOWN

Address   Old Value   New Value
102       100         200
104       456         789
108       999         777

Second descriptor, Status: UNKNOWN

Address   Old Value   New Value
108       999         200

Page 29

bool mcas_help(mcas_descriptor *d)
{
    word *v, desired = FAILED;
    bool success;

    // Phase 1: acquire
    for (int i = 0; i < d->N; i++) {
        while (TRUE) {
            v = CCAS(d->a[i], d->e[i], d, &d->status);
            if (v == d->e[i] || v == d) break;
            if (IsMCASDesc(v))
                mcas_help((mcas_descriptor *)v);
            else
                goto decision_point;
        }
    }
    desired = SUCCESS;

decision_point:

Page 31

Conflicts

Figure: memory words 102, 104 and 108. Word 108 now holds 200.

Descriptor, Status: UNKNOWN

Address   Old Value   New Value
102       100         200
104       456         789
108       999         777

Page 32

bool mcas_help(mcas_descriptor *d)
{
    word *v, desired = FAILED;
    bool success;

    // Phase 1: acquire
    for (int i = 0; i < d->N; i++) {
        while (TRUE) {
            v = CCAS(d->a[i], d->e[i], d, &d->status);
            if (v == d->e[i] || v == d) break;
            if (IsMCASDesc(v))
                mcas_help((mcas_descriptor *)v);
            else
                goto decision_point;
        }
    }
    desired = SUCCESS;

decision_point:

Page 33

Conflicts

Figure: memory words 102, 104 and 108, with two MCAS descriptors that overlap on words 104 and 108.

First descriptor, Status: UNKNOWN

Address   Old Value   New Value
102       100         200
104       456         456
108       999         999

Second descriptor, Status: UNKNOWN

Address   Old Value   New Value
104       456         123
108       999         200

Page 34

bool mcas_help(mcas_descriptor *d)
{
    word *v, desired = FAILED;
    bool success;

    // Phase 1: acquire
    for (int i = 0; i < d->N; i++) {
        while (TRUE) {
            v = CCAS(d->a[i], d->e[i], d, &d->status);
            if (v == d->e[i] || v == d) break;
            if (!IsMCASDesc(v)) goto decision_point;
            mcas_help((mcas_descriptor *)v);
        }
    }
    desired = SUCCESS;

decision_point:

Page 35

mcas_help continued

    // PHASE 2: read (not used by MCAS)

decision_point:
    CAS(&d->status, UNDECIDED, desired);

    // PHASE 3: clean up
    success = (d->status == SUCCESS);
    for (int i = 0; i < d->N; i++) {
        CAS(d->a[i], d, success ? d->n[i] : d->e[i]);
    }
    return success;
}

Page 36

CCAS “failure modes”

• Someone helped us with the CCAS
  – call CCASHelp with our own descriptor
  – next time around, return MCAS descriptor
  – MCAS continues

• Someone else beat us to CCAS
  – help them with their CCAS
  – next time around, return their MCAS descriptor
  – help with their MCAS
  – our MCAS likely aborts

• Source value changed
  – return new value
  – MCAS aborts

Page 37

word *CCAS(word **a, word *e, word *n, word *cond)
{
    ccas_descriptor *d = new ccas_descriptor();
    word *v;

    d->a = a; d->e = e; d->n = n; d->cond = cond;
    while ((v = CAS(d->a, d->e, d)) != d->e) {
        if (!IsCCASDesc(v)) return v;
        CCASHelp((ccas_descriptor *)v);
    }
    CCASHelp(d);
    return v;
}

void CCASHelp(ccas_descriptor *d)
{
    bool success = (*d->cond == UNDECIDED);
    CAS(d->a, d, success ? d->n : d->e);
}

Page 38

CCASHelp “failure modes”

• MCAS aborted so status isn’t UNKNOWN
  – old value put back in place

• MCAS aborted, CCASHelp doesn’t restore value
  – MCAS cleanup will put old value back in place

• Race: status switches to SUCCESS between check and CAS
  – CAS will fail because CCAS descriptor already removed
  – CCAS return will not cause MCAS failure

• Race: status switches to FAILED between check and CAS
  – CAS will always fail because for MCAS to fail, someone must have read beyond us

Page 39

Cost

• 3N + 1 CAS instructions (plus all the other code)

• “it is worth noting that the three batches of N updates all act on the same locations”

• “[improvements] may be useful if there are systems in which CAS operates substantially more slowly than an ordinary write.”
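The 3N + 1 figure can be read off the code shown earlier, counting CAS instructions for an uncontended MCAS over N locations. Each CCAS in the acquire phase issues one CAS to install its CCAS descriptor and one in CCASHelp to replace it with the MCAS descriptor. The decision point issues one CAS on the status field, and clean-up issues one CAS per location:

```latex
\underbrace{N}_{\text{install CCAS descriptors}}
+ \underbrace{N}_{\text{CCASHelp}}
+ \underbrace{1}_{\text{status}}
+ \underbrace{N}_{\text{clean up}}
= 3N + 1
\qquad\text{e.g. } N = 3 \;\Rightarrow\; 10 \text{ CAS instructions.}
```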

Page 40

Deep Breath

Page 41

WSTM

• Remove requirement for space reserved in values being updated

• WSTM keeps track of locations rather than caller

• Provides read parallelism

• Obstruction free, not lock free nor wait free

Page 42

Data Structures

Figure: application memory (cells shown as 100, 200, 300 and 400), a table of orecs, one of which holds “version 52”, and a transaction descriptor.

Transaction descriptor, Status: Undecided

a1: (100,15) -> (200,16)
a2: (200,52) -> (100,53)

Page 43

Logical contents

• Orec contains a version number
  – value comes direct from memory

• Orec contains a descriptor reference
  – descriptor contains address
    • value comes from descriptor based on status
  – descriptor does not contain address
    • value comes direct from memory
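The three cases above can be sketched as a lookup function. This is a single-threaded illustration under assumptions: the orec is modelled as a struct with a nullable owner pointer rather than a packed word, and the names `orec_t`, `wstm_descriptor`, `wstm_entry` and `wstm_logical_contents` are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

typedef enum { WSTM_UNDECIDED, WSTM_READ_CHECK, WSTM_SUCCESSFUL, WSTM_FAILED } wstm_status;

typedef struct { const long *addr; long old_value, new_value; } wstm_entry;

typedef struct {
    wstm_status status;
    size_t      n;
    wstm_entry  entries[4];
} wstm_descriptor;

/* An orec holds either a version number or a reference to an owning descriptor. */
typedef struct {
    const wstm_descriptor *owner;    /* NULL means "holds a version number" */
    unsigned long          version;
} orec_t;

long wstm_logical_contents(const orec_t *orec, const long *addr)
{
    assert(orec != NULL && addr != NULL);
    const wstm_descriptor *d = orec->owner;
    if (d != NULL) {
        for (size_t i = 0; i < d->n; i++) {
            if (d->entries[i].addr == addr)    /* descriptor contains address */
                return d->status == WSTM_SUCCESSFUL ? d->entries[i].new_value
                                                    : d->entries[i].old_value;
        }
    }
    return *addr;   /* version number, or descriptor without this address */
}
```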

Page 44

Transaction Process

• Call WSTMRead/WSTMWrite to gather/change data
  – Builds transaction data structure, but it’s NOT visible

• WSTMCommitTransaction
  – Get ownership: update orecs
  – Read-check: check version numbers
  – Decide
  – Clean up

Page 45

Data Structures

Figure: commit walkthrough over memory cells shown as 100, 200, 300 and 400, with orecs labelled version 52, version 15, version 53 and version 16.

Transaction descriptor, Status: UNKNOWN, later Status: SUCCESS

a1: (100,15) -> (200,16)
a2: (200,52) -> (200,52), then (200,52) -> (100,53)

The figure also shows the values 200 and 100 next to the final status.

Page 46

Complications

• Fixed number of Orecs

• Hash collisions lead to false sharing
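A sketch of why a fixed orec table causes false sharing. The table size `OREC_COUNT` and the hash in `orec_index` are assumptions chosen for illustration, not the function used by the authors. Any two words whose addresses differ by a multiple of the table span map to the same orec, so transactions touching unrelated data can still conflict.

```c
#include <assert.h>
#include <stdint.h>
#include <stddef.h>

#define OREC_COUNT 1024u   /* assumed table size; must be a power of two */

/* Map a word address to an orec slot: drop the low 3 bits (8-byte words),
   then wrap into the fixed-size table. */
size_t orec_index(const void *addr)
{
    assert(addr != NULL);
    return (size_t)(((uintptr_t)addr >> 3) & (OREC_COUNT - 1));
}
```

With this hash, two 8-byte words that are exactly 1024 words apart share an orec, while neighbouring words do not.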

Page 47

Issues

• Orec ownership acts like a lock, so the simple scheme is not even obstruction-free

• Can’t help with “cleanup” because it might overwrite newer data

• Can’t determine value during READ-CHECK, so we’re forced to shoot down

• force_decision() might be circular, causing live-lock

• Helping requires <complicated> stealing of transactions

• Uncontended cost is N+2

Page 48

OSTM

• Objects are represented as opaque handles
  – can’t use pointers directly
  – must rewrite data structures

• Get accessible pointers via OSTMOpenForReading/OSTMOpenForWriting

• Eliminates need for Orecs/aliasing

Page 49

Evaluation

• “We use … reference-counting garbage collection”

• Evaluated with one thread/CPU

• “Since we know the number of threads participating in our experiments…”

Page 50

Uncontended Performance

Page 51

Contended Locks

Page 52

Data Contention

Page 53

Data/Lock Contention

Page 54

Spare Slides

Page 55

word WSTMRead(wstm_transaction *tx, word *addr)
{
    if (entry_exists) return entry->new_value;

    if (orec->type != descriptor) {
        create entry [current value, orec version]
    } else {
        force_decision(descriptor);   // can’t be ours: not in commit
        if (descriptor contains our address) {
            if (status == SUCCESS)
                create entry [descr.new_val, descr.new_ver]
            else
                create entry [descr.old_val, descr.old_ver]
        } else {
            create entry [current value, descr.aliased.new_ver]
        }
    }

    if (aliased) {
        if (entry->old_version != aliased->old_version)
            status = FAILED;
        entry->old_version = aliased->old_version;
        entry->new_version = aliased->new_version;
    }

    return entry->new_value;
}

Page 56

void WSTMWrite(wstm_transaction *tx, word *addr, word new_value)
{
    get entry using WSTMRead logic

    entry->new_value = new_value;

    for each aliased entry {
        entry->new_version++;
    }
}

Page 57

bool WSTMCommit(wstm_transaction *tx)
{
    if (tx->status == FAILED) return false;

    sort descriptor entries
    desired_status = FAILED;

    for each update
        if (!acquire_orec) goto decision_point;

    CAS(status, UNDECIDED, READ_CHECK);
    for each read
        if (!read_check) goto decision_point;

    desired_status = SUCCESS;

decision_point:

Page 58

decision_point:
    status = tx->status;
    while (status != FAILED && status != SUCCESS) {
        CAS(tx->status, status, desired_status);
        status = tx->status;
    }

    if (tx->status == SUCCESS)
        for each update
            *addr = entry->new_value;

    for each update
        release_orec

    return (tx->status == SUCCESS);
}

Page 59

bool read_check(wstm_transaction *tx, wstm_entry *entry)
{
    if (orec is WSTM_descriptor) {
        force_decision()
        if (SUCCESS)
            version = new_version;
        else
            version = old_version
    } else {
        version = orec_version;
    }

    return (version == entry->old_version);
}

Page 60

Data Structures

Figure: application memory (cells shown as 100, 200, 300 and 400) with locations labelled a1, a2 and a3, a table of orecs, one of which holds “version 52”, and a transaction descriptor.

Transaction descriptor, Status: Undecided

a1: (100,15) -> (200,16)
a2: (200,52) -> (100,53)
a3: (300,15) -> (300,16)