# Local-Spin Algorithms

Multiprocessor synchronization algorithms (20225241)

Lecturer: Danny Hendler

This presentation is based on the book “Synchronization Algorithms and Concurrent Programming” by G. Taubenfeld and on the survey “Shared-memory mutual exclusion: major research trends since 1986” by J. Anderson, Y-J. Kim and T. Herman.

The CC (cache-coherent) and DSM (distributed shared-memory) models

This figure is taken from the survey “Shared-memory mutual exclusion: major research trends since 1986” by J. Anderson, Y-J. Kim and T. Herman

Remote and local memory accesses

In a DSM system: an access by p is local if the variable resides in p’s own memory module, and remote otherwise.

In a cache-coherent system: an access of v by p is remote if it is the first access, or if v has been written by another process since p’s last access of it.
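The two definitions can be turned into a small counting exercise. The sketch below is only an illustration: the trace format `(process, op, variable)`, the `home` map and both function names are assumptions made for this example, and the CC rule is exactly the simplified one stated above (no cache capacity, no write-through details).

```python
def count_rmrs_dsm(trace, home):
    """DSM: an access is remote unless the variable lives in the accessor's module."""
    rmrs = {}
    for p, _op, v in trace:
        if home[v] != p:
            rmrs[p] = rmrs.get(p, 0) + 1
    return rmrs


def count_rmrs_cc(trace):
    """CC: an access is remote if it is p's first access of v, or if another
    process wrote v since p's last access of it."""
    rmrs = {}
    cached = set()  # (p, v) pairs: p holds a valid cached copy of v
    for p, op, v in trace:
        if (p, v) not in cached:
            rmrs[p] = rmrs.get(p, 0) + 1
        if op == "write":
            # a write invalidates every cached copy of v
            cached = {(q, w) for (q, w) in cached if w != v}
        cached.add((p, v))  # the accessor now holds a valid copy
    return rmrs


# p0 spins on `flag` five times, p1 then writes it, and p0 reads it once more.
trace = [(0, "read", "flag")] * 5 + [(1, "write", "flag"), (0, "read", "flag")]

print(count_rmrs_cc(trace))                # {0: 2, 1: 1}
print(count_rmrs_dsm(trace, {"flag": 1}))  # {0: 6}
print(count_rmrs_dsm(trace, {"flag": 0}))  # {1: 1}
```

The same spin loop costs p0 a constant number of RMRs under CC, one RMR per iteration under DSM when `flag` lives at p1, and none when `flag` lives at p0.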

Local-spin algorithms

In a local-spin algorithm, all busy waiting (‘await’) is done by read-only loops of local accesses that do not cause interconnect traffic.

The same algorithm may be local-spin on one architecture (DSM or CC) and non-local-spin on the other!

For local-spin algorithms, our complexity metric is the worst-case number of Remote Memory References (RMRs).

Peterson’s 2-process algorithm

Program for process 1

1. b[1] := true
2. turn := 1
3. await (b[0] = false or turn = 0)
4. CS
5. b[1] := false

Program for process 0

1. b[0] := true
2. turn := 0
3. await (b[1] = false or turn = 1)
4. CS
5. b[0] := false

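Is this algorithm local-spin on a DSM machine? No (both processes spin on turn, which is remote to at least one of them)

Is this algorithm local-spin on a CC machine? Yes (each process spins on its cached copies of b and turn)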
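A minimal Python sketch of the two programs, for experimentation only. It assumes GIL-enabled CPython, where the loads and stores below take effect in program order; on real hardware the same logic needs atomic variables and memory fences. The class name, `run_peterson` and the short sleep inside the spin loop (which only lets the other Python thread run) are assumptions of this example, not part of the algorithm.

```python
import threading
import time


class PetersonLock:
    """Peterson's 2-process lock; process ids are 0 and 1."""

    def __init__(self):
        self.b = [False, False]
        self.turn = 0

    def lock(self, i):
        other = 1 - i
        self.b[i] = True
        self.turn = i
        # await (b[other] = false or turn = other)
        while self.b[other] and self.turn == i:
            time.sleep(1e-5)  # CPython artifact: let the other thread run

    def unlock(self, i):
        self.b[i] = False


def run_peterson(iterations=100):
    lock, counter = PetersonLock(), [0]

    def worker(i):
        for _ in range(iterations):
            lock.lock(i)
            tmp = counter[0]      # deliberately non-atomic read-modify-write
            counter[0] = tmp + 1
            lock.unlock(i)

    threads = [threading.Thread(target=worker, args=(i,)) for i in (0, 1)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counter[0]


print(run_peterson())  # 200 when mutual exclusion holds
```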

Peterson’s 2-process algorithm

What is the RMR complexity on a DSM machine? Unbounded

What is the RMR complexity on a CC machine? Constant

Kessels’ single-writer algorithm

Program for process 0

1. b[0] := true
2. local[0] := turn[1]
3. turn[0] := local[0]
4. await (b[1] = false or local[0] ≠ turn[1])
5. CS
6. b[0] := false

Program for process 1

1. b[1] := true
2. local[1] := 1 - turn[0]
3. turn[1] := local[1]
4. await (b[0] = false or local[1] = turn[0])
5. CS
6. b[1] := false

Can Kessels’ algorithm be made local-spin on a DSM machine? Yes, if:

b[1], turn[1] are located at p0’s memory module

b[0], turn[0] are located at p1’s memory module
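The single-writer structure is easy to see in code: process i writes only b[i] and turn[i], and spins only on b[1-i] and turn[1-i], which is why the placement above makes every spin local. The sketch below assumes GIL-enabled CPython (program-order loads and stores); the class name, `run_kessels`, the initial values of `turn` and the sleep in the spin loops are assumptions of this example.

```python
import threading
import time


class KesselsLock:
    """Two-process lock in which every shared variable has a single writer."""

    def __init__(self):
        self.b = [False, False]
        self.turn = [0, 0]  # turn[i] is written only by process i (initial values assumed)

    def lock(self, i):
        self.b[i] = True
        if i == 0:
            local = self.turn[1]
            self.turn[0] = local
            # await (b[1] = false or local[0] != turn[1])
            while self.b[1] and local == self.turn[1]:
                time.sleep(1e-5)  # CPython artifact: let the other thread run
        else:
            local = 1 - self.turn[0]
            self.turn[1] = local
            # await (b[0] = false or local[1] = turn[0])
            while self.b[0] and local != self.turn[0]:
                time.sleep(1e-5)

    def unlock(self, i):
        self.b[i] = False


def run_kessels(iterations=100):
    lock, counter = KesselsLock(), [0]

    def worker(i):
        for _ in range(iterations):
            lock.lock(i)
            tmp = counter[0]      # deliberately non-atomic read-modify-write
            counter[0] = tmp + 1
            lock.unlock(i)

    threads = [threading.Thread(target=worker, args=(i,)) for i in (0, 1)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counter[0]


print(run_kessels())  # 200 when mutual exclusion holds
```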

Anderson’s queue-based algorithm

Shared:
  integer ticket: an RMW object, initially 0
  bit valid[0..n-1], initially valid[0] = 1 and valid[i] = 0 for i ∈ {1,..,n-1}

Local:
  integer myTicket

Program for process i

1. myTicket := fetch-and-inc-modulo-n(ticket) ; take a ticket
2. await valid[myTicket] = 1 ; wait for your turn
3. CS
4. valid[myTicket] := 0 ; dequeue
5. valid[(myTicket+1) mod n] := 1 ; signal successor

Figure: the valid array (indices 0..n-1) and the ticket counter.
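The program above can be tried out with the following sketch. Python has no hardware fetch-and-increment, so the `FetchAndIncModN` class simulates one with an internal guard lock; that class, `run_anderson` and the sleep in the spin loop are assumptions of this example. The lock supports at most n concurrent processes, as in the slide.

```python
import threading
import time


class FetchAndIncModN:
    """Simulated atomic fetch-and-inc-modulo-n register (the guard lock
    stands in for the hardware RMW instruction)."""

    def __init__(self, n):
        self._n, self._value = n, 0
        self._guard = threading.Lock()

    def fetch_and_inc(self):
        with self._guard:
            prev = self._value
            self._value = (prev + 1) % self._n
            return prev


class AndersonLock:
    def __init__(self, n):
        self.n = n
        self.ticket = FetchAndIncModN(n)
        self.valid = [1] + [0] * (n - 1)

    def lock(self):
        my_ticket = self.ticket.fetch_and_inc()    # take a ticket
        while self.valid[my_ticket] != 1:          # wait for your turn
            time.sleep(1e-5)                       # CPython artifact: let others run
        return my_ticket

    def unlock(self, my_ticket):
        self.valid[my_ticket] = 0                  # dequeue
        self.valid[(my_ticket + 1) % self.n] = 1   # signal successor


def run_anderson(n=4, iterations=50):
    lock, counter = AndersonLock(n), [0]

    def worker():
        for _ in range(iterations):
            t = lock.lock()
            tmp = counter[0]      # deliberately non-atomic read-modify-write
            counter[0] = tmp + 1
            lock.unlock(t)

    threads = [threading.Thread(target=worker) for _ in range(n)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counter[0]


print(run_anderson())  # 200 when mutual exclusion holds
```

Each process spins on its own array slot, which is a distinct cached location under CC; under DSM the slot a process receives depends on its ticket, so it cannot be placed in that process's memory module in advance.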

Anderson’s queue-based algorithm (cont’d)

Initial configuration: ticket = 0, valid = 1 0 0 0 0

After entry section of p3: ticket = 1, valid = 1 0 0 0 0, myTicket of p3 = 0

After p1 performs entry section: ticket = 2, valid = 1 0 0 0 0, myTicket of p3 = 0, myTicket of p1 = 1

After p3 exits: ticket = 2, valid = 0 1 0 0 0, myTicket of p1 = 1

Anderson’s queue-based algorithm (cont’d)

What is the RMR complexity on a DSM machine? Unbounded

What is the RMR complexity on a CC machine? Constant

Graunke and Thakkar’s algorithm

Uses the more common swap primitive:

swap(w, new)
  do atomically
    prev := w
    w := new
    return prev

Graunke and Thakkar’s algorithm (cont’d)

Shared:
  bit slots[0..n-1], initially slots[i] = 1 for i ∈ {0,..,n-1}
  structure {bit value, bit *slot} tail, initially {0, &slots[0]}

Local:
  structure {bit value, bit *slot} myRecord, prev
  bit temp

Figure: tail (value 0, pointing to slots[0]) and the slots array (indices 0..n-1), all entries initially 1.

Graunke and Thakkar’s algorithm (cont’d)

Program for process i

1. myRecord.value := slots[i] ; prepare to thread yourself to queue
2. myRecord.slot := &slots[i]
3. prev := swap(&tail, myRecord) ; prev now points to predecessor
4. await (*prev.slot ≠ prev.value) ; local spin until predecessor’s value changes
5. CS
6. temp := 1 - slots[i]
7. slots[i] := temp ; signal successor
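A sketch of the program above in Python. The pointer `&slots[i]` is represented by the index i, and the swap primitive is simulated by the `SwapCell` class using an internal guard lock; `SwapCell`, `run_graunke_thakkar` and the sleep in the spin loop are assumptions of this example.

```python
import threading
import time


class SwapCell:
    """Simulated register supporting atomic swap (the guard lock stands in
    for the hardware instruction)."""

    def __init__(self, initial):
        self._value = initial
        self._guard = threading.Lock()

    def swap(self, new):
        with self._guard:
            prev, self._value = self._value, new
            return prev


class GraunkeThakkarLock:
    def __init__(self, n):
        self.slots = [1] * n
        self.tail = SwapCell((0, 0))  # (value, slot index): {0, &slots[0]}

    def lock(self, i):
        my_record = (self.slots[i], i)            # thread yourself to the queue
        prev_value, prev_slot = self.tail.swap(my_record)
        # spin until the predecessor flips its slot
        while self.slots[prev_slot] == prev_value:
            time.sleep(1e-5)                      # CPython artifact: let others run

    def unlock(self, i):
        self.slots[i] = 1 - self.slots[i]         # signal successor


def run_graunke_thakkar(n=4, iterations=50):
    lock, counter = GraunkeThakkarLock(n), [0]

    def worker(i):
        for _ in range(iterations):
            lock.lock(i)
            tmp = counter[0]      # deliberately non-atomic read-modify-write
            counter[0] = tmp + 1
            lock.unlock(i)

    threads = [threading.Thread(target=worker, args=(i,)) for i in range(n)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counter[0]


print(run_graunke_thakkar())  # 200 when mutual exclusion holds
```

Note that a process spins on its predecessor's slot rather than on its own, which is why the spin is local under CC but not under DSM.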

Graunke and Thakkar’s algorithm (cont’d)

What is the RMR complexity on a DSM machine? Unbounded

What is the RMR complexity on a CC machine? Constant

The MCS queue-based algorithm

Type:
  Qnode: structure {bit locked, Qnode *next}

Shared:
  Qnode nodes[0..n-1]
  Qnode *tail, initially nil

Local:
  Qnode *myNode, initially &nodes[i]
  Qnode *prev, *successor

Has constant RMR complexity under both the DSM and CC models

Uses swap and CAS

The MCS queue-based algorithm (cont’d)

Program for process i

1. myNode->next := nil ; prepare to be last in queue
2. prev := swap(&tail, myNode) ; tail now points to myNode, prev to my predecessor (if any)
3. if (prev ≠ nil) ; I need to wait for a predecessor
4.   myNode->locked := true ; prepare to wait
5.   prev->next := myNode ; let my predecessor know it has to unlock me
6.   await (myNode->locked = false)
7. CS
8. if (myNode->next = nil) ; if not sure there is a successor
9.   if (compare-and-swap(&tail, myNode, nil) = false) ; if there is a successor
10.    await (myNode->next ≠ nil) ; spin until successor lets me know its identity
11.    successor := myNode->next ; get a pointer to my successor
12.    successor->locked := false ; unlock my successor
13. else ; for sure, I have a successor
14.   successor := myNode->next ; get a pointer to my successor
15.   successor->locked := false ; unlock my successor
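The entry and exit code above can be sketched in Python as follows. Swap and CAS are simulated by the `AtomicRef` class with an internal guard lock; `AtomicRef`, `QNode`, `run_mcs` and the sleep in the spin loops are assumptions of this example. Each process spins only on fields of its own node, which is what makes the spin local under both models.

```python
import threading
import time


class QNode:
    def __init__(self):
        self.locked = False
        self.next = None


class AtomicRef:
    """Simulated register with atomic swap and compare-and-swap."""

    def __init__(self, initial=None):
        self._value = initial
        self._guard = threading.Lock()

    def swap(self, new):
        with self._guard:
            prev, self._value = self._value, new
            return prev

    def compare_and_swap(self, expected, new):
        with self._guard:
            if self._value is expected:
                self._value = new
                return True
            return False


class MCSLock:
    def __init__(self):
        self.tail = AtomicRef(None)

    def lock(self, node):
        node.next = None                       # prepare to be last in queue
        prev = self.tail.swap(node)            # tail now points to my node
        if prev is not None:                   # I have a predecessor
            node.locked = True                 # prepare to wait
            prev.next = node                   # tell predecessor to unlock me
            while node.locked:                 # spin on my own node
                time.sleep(1e-5)               # CPython artifact: let others run

    def unlock(self, node):
        if node.next is None:                  # not sure there is a successor
            if self.tail.compare_and_swap(node, None):
                return                         # queue is empty again
            while node.next is None:           # successor is linking itself in
                time.sleep(1e-5)
        node.next.locked = False               # unlock my successor


def run_mcs(n=4, iterations=50):
    lock, counter = MCSLock(), [0]

    def worker():
        node = QNode()                         # plays the role of nodes[i]
        for _ in range(iterations):
            lock.lock(node)
            tmp = counter[0]                   # deliberately non-atomic update
            counter[0] = tmp + 1
            lock.unlock(node)

    threads = [threading.Thread(target=worker) for _ in range(n)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counter[0]


print(run_mcs())  # 200 when mutual exclusion holds
```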

A local-spin tournament-tree algorithm (Anderson, Yang, 1993)

O(log n) RMR complexity for both DSM and CC systems

This is ‘suspected’ to be optimal!

Uses O(n log n) registers

Figure: a binary tournament tree over processes 0..7, with internal nodes at Level 0, Level 1 and Level 2. Each node is identified by (level, number).

A local-spin tournament-tree algorithm (cont’d)

Shared:
- For each node (level, node), there are 3 registers:
  name[level, 2node], name[level, 2node+1], initially -1
  turn[level, node]
- For each level and each process i, a spin flag: flag[level, i]

Local:
  level, node, id, rival

A local-spin tournament-tree algorithm (cont’d)

Program for process i

1. id := i
2. for level = 0 to log n - 1 do ; from leaf to root
3.   node := ⌊id/2⌋ ; the current node
4.   name[level, 2node + (id mod 2)] := i ; identify yourself
5.   turn[level, node] := i ; update the tie-breaker
6.   flag[level, i] := 0 ; initialize the locally-accessible spin flag
7.   if (even(id))
8.     rival := name[level, id+1]
9.   else
10.    rival := name[level, id-1]
11.  if ((rival ≠ -1) and (turn[level, node] = i)) ; if not sure I should precede rival
12.    if (flag[level, rival] = 0)
13.      flag[level, rival] := 1 ; release the rival from waiting
14.    await (flag[level, i] ≠ 0) ; wait until sure the rival updated the tie-breaker
15.    if (turn[level, node] = i) ; if I lost
16.      await (flag[level, i] = 2) ; wait until rival notifies me it is my turn
17.  id := node ; move to the next level
18. CS
19. for level = log n - 1 downto 0 do ; begin exit code
20.   id := ⌊i/2^level⌋, node := ⌊id/2⌋ ; set node and id
21.   name[level, 2node + (id mod 2)] := -1 ; erase name
22.   rival := turn[level, node] ; find who rival is (if there is one)
23.   if (rival ≠ i) ; if there is a rival
24.     flag[level, rival] := 2 ; notify rival
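The program above can be sketched in Python as follows, using only reads and writes. It assumes that n is a power of two and that the code runs on GIL-enabled CPython (program-order loads and stores); the class name, `run_tournament` and the sleep in the spin loops are assumptions of this example. Since 2·node + (id mod 2) equals id, the name registers are indexed directly by id, and the sibling entry is `id ^ 1`.

```python
import threading
import time


class TournamentLock:
    def __init__(self, n):
        self.levels = n.bit_length() - 1  # log2(n), n a power of two
        self.name = [[-1] * (n >> lvl) for lvl in range(self.levels)]
        self.turn = [[-1] * (n >> (lvl + 1)) for lvl in range(self.levels)]
        self.flag = [[0] * n for _ in range(self.levels)]

    def lock(self, i):
        ident = i
        for level in range(self.levels):          # from leaf to root
            node = ident // 2
            self.name[level][ident] = i           # identify yourself
            self.turn[level][node] = i            # update the tie-breaker
            self.flag[level][i] = 0               # reset my spin flag
            rival = self.name[level][ident ^ 1]   # sibling entry of this node
            if rival != -1 and self.turn[level][node] == i:
                if self.flag[level][rival] == 0:
                    self.flag[level][rival] = 1   # release rival from waiting
                while self.flag[level][i] == 0:   # rival updated the tie-breaker?
                    time.sleep(1e-5)              # CPython artifact: let others run
                if self.turn[level][node] == i:   # I lost
                    while self.flag[level][i] != 2:
                        time.sleep(1e-5)
            ident = node                          # move to the next level

    def unlock(self, i):
        for level in reversed(range(self.levels)):  # from root to leaf
            ident = i >> level
            node = ident // 2
            self.name[level][ident] = -1          # erase name
            rival = self.turn[level][node]        # who is the rival, if any
            if rival != i:
                self.flag[level][rival] = 2       # notify rival


def run_tournament(n=4, iterations=50):
    lock, counter = TournamentLock(n), [0]

    def worker(i):
        for _ in range(iterations):
            lock.lock(i)
            tmp = counter[0]      # deliberately non-atomic read-modify-write
            counter[0] = tmp + 1
            lock.unlock(i)

    threads = [threading.Thread(target=worker, args=(i,)) for i in range(n)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counter[0]


print(run_tournament())  # 200 when mutual exclusion holds
```

Process i spins only on flag[level][i], so placing that flag in i's memory module gives local spinning under DSM as well as CC.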