1
Efficient Dependency Tracking for Relevant Events in Shared Memory Systems
Anurag Agarwal, Vijay K. Garg
PDS Lab, University of Texas at Austin
2
Outline
Motivation
Background
Chain Clock
Instances of Chain Clock
Experimental Results
Conclusion
3
Motivation
Dependency between events is required for global state information
Applications such as monitoring and debugging
Vector clocks [Fidge 88, Mattern 89]
O(N) operations for a system with N processes
Dynamic creation of processes is hard to handle
4
Outline
Motivation
Background
Chain Clock
Instances of Chain Clock
Experimental Results
Conclusion
5
Relevant Events
Events that are “useful” to the application
Example: predicate detection, e.g. “There are no messages in the channel”
[Figure: relevant events on processes p1, p2, p3, p4]
6
Vector Clocks [Fidge 88, Mattern 89]
Assign an N-tuple V to every relevant event
e → f iff e.V < f.V (clock condition)
Process Pi: initially V = (0, … , 0). On an event e:
I. If e is the receive of message m: V = max(V, m.V)
II. If e is a relevant event: V[i] = V[i] + 1
III. If e is the send of message m: m.V = V
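A minimal Python sketch of Rules I to III, assuming events are modeled as method calls on a per-process object; the names `VectorClockProcess` and `happened_before` are hypothetical. The comparison implements the clock condition e.V < f.V (less than or equal in every component, and not equal).

```python
from typing import List


class VectorClockProcess:
    """Process P_i of an n-process system applying Rules I to III."""

    def __init__(self, i: int, n: int) -> None:
        self.i = i
        self.v: List[int] = [0] * n  # V = (0, ..., 0)

    def receive(self, m_v: List[int]) -> None:
        # Rule I: V = max(V, m.V), taken component-wise
        self.v = [max(a, b) for a, b in zip(self.v, m_v)]

    def relevant_event(self) -> List[int]:
        # Rule II: V[i] = V[i] + 1; the event's timestamp is a copy of V
        self.v[self.i] += 1
        return list(self.v)

    def send(self) -> List[int]:
        # Rule III: the message carries m.V = V
        return list(self.v)


def happened_before(e_v: List[int], f_v: List[int]) -> bool:
    """Clock condition: e -> f iff e.V < f.V (<= in every component, not equal)."""
    return all(a <= b for a, b in zip(e_v, f_v)) and e_v != f_v
```

For example, if p1 has a relevant event and then sends to p2, and p2 has a relevant event before and after the receive, the first and third timestamps are (1,0) and (1,2), while p2's first event (0,1) is concurrent with (1,0).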
7
Outline
Motivation
Background
Chain Clock
Instances of Chain Clock
Experimental Results
Conclusion
8
Key Idea
Any chain in the computation poset can function as a process
[Figure: events a to h on processes p1 to p4, regrouped into two chains: a b c d and e f g h]
9
Chain Clocks
A component in the timestamp corresponds to a chain
Change Rule II of the vector clock algorithm: if e is a relevant event,
V[e.c] = V[e.c] + 1, where e.c is the chain assigned to e
Theorem: chain clocks guarantee the clock condition
Goal: online decomposition of the poset into as few chains as possible
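A minimal Python sketch of a chain clock, assuming timestamps are lists that grow when a new chain appears (a missing component counts as 0). The strategy that picks e.c is passed in as a hypothetical `choose_chain` function; letting process Pi always return i recovers ordinary vector clocks, since the events of one process form a chain.

```python
from typing import Callable, List


def _pad(u: List[int], n: int) -> List[int]:
    # Chain-clock vectors can differ in length; missing components are 0.
    return u + [0] * (n - len(u))


def vmax(u: List[int], v: List[int]) -> List[int]:
    n = max(len(u), len(v))
    return [max(a, b) for a, b in zip(_pad(u, n), _pad(v, n))]


def vless(u: List[int], v: List[int]) -> bool:
    """Clock condition test: u < v iff u <= v component-wise and u != v."""
    n = max(len(u), len(v))
    pu, pv = _pad(u, n), _pad(v, n)
    return all(a <= b for a, b in zip(pu, pv)) and pu != pv


class ChainClock:
    def __init__(self, choose_chain: Callable[[List[int]], int]) -> None:
        self.v: List[int] = []            # start with an empty vector
        self.choose_chain = choose_chain  # hypothetical hook that returns e.c

    def receive(self, m_v: List[int]) -> None:
        self.v = vmax(self.v, m_v)        # Rule I (unchanged)

    def relevant_event(self) -> List[int]:
        c = self.choose_chain(self.v)     # pick the chain e.c for this event
        self.v = _pad(self.v, c + 1)      # grow the vector if e.c is a new component
        self.v[c] += 1                    # modified Rule II: V[e.c] = V[e.c] + 1
        return list(self.v)

    def send(self) -> List[int]:
        return list(self.v)               # Rule III (unchanged): m.V = V
```

In this sketch, the instances of chain clocks differ only in how `choose_chain` picks e.c.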
10
Outline
Motivation
Background
Chain Clock
Instances of Chain Clock: DCC, ACC, VCC
Experimental Results
Conclusion
11
Dynamic Chain Clocks (DCC)
Shared vector Z maintains the up-to-date values of all components
Each process starts with an empty vector
Rule II: e.c = j such that Z[j] = e.V[j]; give preference to the component last updated by Pi
V[e.c] = V[e.c] + 1
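A minimal Python sketch of DCC, assuming one shared vector Z guarded by a `threading.Lock` so that choosing e.c and incrementing Z happen atomically, and assuming that a process opens a new component when no j satisfies Z[j] = V[j]. The class names `SharedZ` and `DCCProcess` are hypothetical, and picking the lowest-numbered candidate when the preferred one is unavailable is an arbitrary choice.

```python
import threading
from typing import List, Optional


class SharedZ:
    """Shared vector Z: Z[j] is the latest value issued on component (chain) j."""

    def __init__(self) -> None:
        self.z: List[int] = []
        self.lock = threading.Lock()


class DCCProcess:
    def __init__(self, shared: SharedZ) -> None:
        self.shared = shared
        self.v: List[int] = []              # each process starts with an empty vector
        self.last: Optional[int] = None     # component this process updated last

    def receive(self, m_v: List[int]) -> None:
        # Rule I with vectors of different lengths (missing components count as 0)
        n = max(len(self.v), len(m_v))
        a = self.v + [0] * (n - len(self.v))
        b = m_v + [0] * (n - len(m_v))
        self.v = [max(x, y) for x, y in zip(a, b)]

    def relevant_event(self) -> List[int]:
        with self.shared.lock:              # choose e.c and update Z atomically
            z = self.shared.z
            v = self.v + [0] * (len(z) - len(self.v))
            # Chain j can be extended if this process has seen its latest event.
            candidates = [j for j in range(len(z)) if z[j] == v[j]]
            if self.last in candidates:
                c = self.last               # prefer the component last updated by Pi
            elif candidates:
                c = candidates[0]
            else:
                z.append(0)                 # assumption: open a new component
                v.append(0)
                c = len(z) - 1
            v[c] += 1                       # V[e.c] = V[e.c] + 1
            z[c] += 1                       # Z[e.c] = Z[e.c] + 1
            self.v, self.last = v, c
        return list(self.v)

    def send(self) -> List[int]:
        return list(self.v)                 # Rule III: m.V = V
```

With two processes, p1's first relevant event gets (1), p2's first gets (0,1) because Z[0] = 1 differs from p2's V[0] = 0, and after p2 receives from p1 its vector is max{(1),(0,1)} = (1,1).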
12
DCC: Example
I. If e is the receive of message m: V = max(V, m.V)
II. If e is a relevant event: e.c = i such that Z[i] = V[i]; V[e.c] = V[e.c] + 1; Z[e.c] = Z[e.c] + 1
III. If e is the send of message m: m.V = V
[Figure: DCC run on processes p1, p2, p3 with local vectors V1, V2, V3 and shared vector Z; timestamps include (1), (0,1), (1,1) = max{(1),(0,1)}, (2,1), (3,1), (3,2)]
13
Problem
The number of processes can be much larger than the minimal number of chains
[Figure: DCC on processes p1 to p4; timestamps (1), (0,1), (1,2), (0,1,1), (1,2,2), (0,1,1,1), (1,2,2,2)]
14
Optimal Chain Decomposition
Antichain: a set of pairwise concurrent elements
Width: the maximum size of an antichain
Dilworth’s Theorem [1950]: a poset of width k can be partitioned into k chains and no fewer.
Requires knowledge of the complete poset
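Offline, once the whole poset is known, a minimum chain decomposition can be computed. The sketch below is not from the slides: it uses the standard reduction of Dilworth's theorem to maximum bipartite matching (Kuhn's augmenting paths). Each matched pair u < v places v directly after u on a chain, so the number of chains is n minus the matching size. It assumes `less` is the full strict order (for example the vector clock comparison); the function name is hypothetical.

```python
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")


def min_chain_decomposition(elems: Sequence[T], less: Callable[[T, T], bool]) -> List[List[T]]:
    """Partition a finite poset into the minimum number of chains (= its width)."""
    n = len(elems)
    succ = [[j for j in range(n) if less(elems[i], elems[j])] for i in range(n)]
    match_right = [-1] * n  # match_right[j] = i: j directly follows i on a chain

    def augment(i: int, seen: List[bool]) -> bool:
        # Kuhn's augmenting-path search from left vertex i.
        for j in succ[i]:
            if not seen[j]:
                seen[j] = True
                if match_right[j] == -1 or augment(match_right[j], seen):
                    match_right[j] = i
                    return True
        return False

    for i in range(n):
        augment(i, [False] * n)

    match_left = [-1] * n  # match_left[i] = j: successor of i on its chain
    for j, i in enumerate(match_right):
        if i != -1:
            match_left[i] = j

    chains: List[List[T]] = []
    for start in range(n):
        if match_right[start] == -1:  # no predecessor: a chain starts here
            chain, k = [], start
            while k != -1:
                chain.append(elems[k])
                k = match_left[k]
            chains.append(chain)
    return chains
```

For the vector timestamps (1,0), (0,1), (2,0), (1,2), which have width 2, this returns two chains: (1,0), (2,0) and (0,1), (1,2).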
15
Online Chain Decomposition
Elements of the poset are presented in a total order consistent with the poset
Elements are assigned to chains as they arrive
Can be modeled as a game between:
Bob: presents elements
Alice: assigns them to chains
Felsner [1997]: for a poset of width k, Bob can force Alice to use k(k+1)/2 chains
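For comparison, Dilworth's theorem gives k chains when the whole poset is known, while Felsner's online bound is the k-th triangular number, so the gap grows quadratically with the width:

```latex
\frac{k(k+1)}{2} = \sum_{i=1}^{k} i, \qquad
k = 2 \Rightarrow 3, \quad k = 3 \Rightarrow 6, \quad k = 10 \Rightarrow 55 \text{ chains}.
```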
16
Chain Partitioning Algorithm (ACC)
Felsner gave an algorithm that meets the k(k+1)/2 bound; our algorithm is simpler and more efficient
Sets of queues B1 … Bk with |Bi| = i
For an element z:
Insert z into the first queue q in Bi with head < z
Swap the queues of Bi and Bi-1, leaving q in its place
[Figure: queue sets B1, B2, B3 and an incoming element z]
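A Python sketch of the queue-based algorithm as stated above. Two details are assumptions, not taken from the slides: an empty queue accepts any element, and when no queue accepts z, a new set B(k+1) of k+1 empty queues is created before the usual swap. The class `ACC` and its methods are hypothetical; each queue keeps a fixed id, which would serve as the chain clock component of the elements placed on it.

```python
from typing import Any, Callable, List

Less = Callable[[Any, Any], bool]


class ACC:
    """Online chain partitioning with sets of queues B1..Bk where |Bi| = i.

    Elements must arrive in an order consistent with the poset. Each queue is a
    chain; its head is the element appended last.
    """

    def __init__(self, less: Less) -> None:
        self.less = less
        self.B: List[List[List[Any]]] = []  # self.B[i] holds the i+1 queues of B(i+1)
        self.ids: List[List[int]] = []      # chain id of each queue, parallel to self.B
        self.next_id = 0

    def _accepts(self, queue: List[Any], z: Any) -> bool:
        return not queue or self.less(queue[-1], z)  # assumption: empty accepts anything

    def insert(self, z: Any) -> int:
        """Place z on a chain and return the chain's id (its clock component)."""
        for i in range(len(self.B)):
            for q in range(len(self.B[i])):
                if self._accepts(self.B[i][q], z):
                    return self._place(i, q, z)
        # Assumed growth step: no queue accepts z, so add B(k+1) with k+1 empty queues.
        k = len(self.B)
        self.B.append([[] for _ in range(k + 1)])
        self.ids.append(list(range(self.next_id, self.next_id + k + 1)))
        self.next_id += k + 1
        return self._place(k, 0, z)

    def _place(self, i: int, q: int, z: Any) -> int:
        self.B[i][q].append(z)
        chain = self.ids[i][q]
        if i > 0:
            # Swap Bi \ {q} with B(i-1); q itself stays in Bi.
            rest_q = self.B[i][:q] + self.B[i][q + 1:]
            rest_id = self.ids[i][:q] + self.ids[i][q + 1:]
            self.B[i] = [self.B[i][q]] + self.B[i - 1]
            self.ids[i] = [chain] + self.ids[i - 1]
            self.B[i - 1], self.ids[i - 1] = rest_q, rest_id
        return chain
```

Fed the width-2 poset (1,0), (0,1), (2,0), (1,2) in that order, this sketch assigns chain ids 0, 1, 2, 1, i.e. three chains, which equals k(k+1)/2 for k = 2.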
17
Drawback of DCC and ACC
They require a shared data structure
Monitoring applications generally need a central server
Hybrid clocks: multiple servers, each responsible for a subset of processes
Each server finds chains within its process group
18
Shared Memory System
Accesses to shared variables induce dependencies
Observation: Access events for a shared variable form a chain
Variable-based Chain Clocks (VCC): associate a component with every variable
19
VCC Application: Predicate Detection
Predicate: (x = 1) and (y = 1)
Only events that change x or y are relevant
Associate one component of VCC with x and the other with y
Initially: x = 0, y = 0
[Figure: example run with writes x = 1, x = 2, x = 1 and y = 1, y = 2]
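A minimal Python sketch of VCC for this example, assuming every access to a variable happens under that variable's lock (so its accesses form a chain) and that only writes are relevant. Component 0 plays the role of x and component 1 that of y. Treating each variable like a message channel (merge its vector on access, then leave the result behind) is our reading of the slides, not something they spell out; `SharedVar` and `VCCThread` are hypothetical names.

```python
import threading
from typing import List


class SharedVar:
    """Hypothetical wrapper: a shared variable plus the VCC vector of its last access."""

    def __init__(self, component: int, n_components: int, value: int = 0) -> None:
        self.component = component       # the VCC component assigned to this variable
        self.value = value
        self.v = [0] * n_components      # vector left behind by the last access
        self.lock = threading.Lock()     # serializes accesses, so they form a chain


class VCCThread:
    def __init__(self, n_components: int) -> None:
        self.v = [0] * n_components

    def _access(self, var: SharedVar, relevant: bool) -> None:
        # Caller holds var.lock. Merge the variable's history (like Rule I),
        # tick the variable's component if relevant (Rule II with e.c = var's
        # component), then publish the result to the next accessor (like Rule III).
        self.v = [max(a, b) for a, b in zip(self.v, var.v)]
        if relevant:
            self.v[var.component] += 1
        var.v = list(self.v)

    def write(self, var: SharedVar, value: int) -> List[int]:
        """A write changes the variable, so it is a relevant event here."""
        with var.lock:
            var.value = value
            self._access(var, relevant=True)
            return list(self.v)

    def read(self, var: SharedVar) -> int:
        """A read is not relevant, but it still orders later accesses to var."""
        with var.lock:
            self._access(var, relevant=False)
            return var.value
```

If thread t1 writes x = 1 and thread t2 writes y = 1, their timestamps (1,0) and (0,1) are concurrent; a later write x = 2 by t2 gets (2,1), which follows both.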
20
Outline
Motivation
Background
Chain Clock
Instances of Chain Clock
Experimental Results
Conclusion
21
Experiments
Setup: a multithreaded application; each thread generates a sequence of events
Parameters: number of processes, number of events, probability of a relevant event
Metrics: number of components used, execution time
22
Components Used
Events = 100, probability of a relevant event = 1%
23
Execution Time
Events = 100, probability of a relevant event = 1%
24
Effect of Relevancy
Threads = 100, Events = 100
25
Conclusion
Generalized vector clocks to a class of algorithms called chain clocks
Dynamic Chain Clock (DCC) can provide large speedups and reduce memory requirements for applications
Antichain-based Chain Clock (ACC) meets the lower bound for online chain decomposition
26
Questions?
28
Example: Poset of width 2
For a poset of width 2, Bob can force Alice to use 3 chains
[Figure: elements labeled with their assigned chains: 1, 2, 1, 3]
32
Happened Before Relation (→) [Lamport 78]
Distributed computation with N processes
Every process executes a series of events: internal, send, or receive events
[Figure: events on processes p1 and p2]
e → f if there is a path from e to f
e ║ f if there is no path between e and f
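The path-based definition can be checked directly on a small event graph. This sketch uses hypothetical helper names and string event labels: it adds an edge from each event to its successor on the same process and from each send to its matching receive, then tests reachability with breadth-first search.

```python
from collections import deque
from typing import Dict, Iterable, List, Tuple

Graph = Dict[str, List[str]]


def build_graph(processes: List[List[str]], messages: Iterable[Tuple[str, str]]) -> Graph:
    """Edges from each event to its successor on the same process, and send -> receive."""
    g: Graph = {e: [] for events in processes for e in events}
    for events in processes:
        for a, b in zip(events, events[1:]):
            g[a].append(b)
    for send, recv in messages:
        g[send].append(recv)
    return g


def happened_before(g: Graph, e: str, f: str) -> bool:
    """e -> f iff f is reachable from e by a non-empty path (breadth-first search)."""
    seen = set()
    frontier = deque(g[e])
    while frontier:
        x = frontier.popleft()
        if x == f:
            return True
        if x not in seen:
            seen.add(x)
            frontier.extend(g[x])
    return False


def concurrent(g: Graph, e: str, f: str) -> bool:
    """e || f iff neither e -> f nor f -> e."""
    return e != f and not happened_before(g, e, f) and not happened_before(g, f, e)
```

With p1 executing a then b, p2 executing c then d, and a message from b to d, this gives a → d and c → d, while a ║ c and b ║ c.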
33
Future work
Lower bound for online chain decomposition when a decomposition into N chains is already known
Other chain decomposition strategies
34
Distributed System: Time vs Threads
Events = 100, probability of a relevant event = 1%
35
Distributed System: Events vs Time
Threads = 100, probability of a relevant event = 1%
36
Effect of Number of Events
Threads = 100, probability of a relevant event = 1%
40
Chain clocks:
Generalize vector clocks
Reduce the time and memory overhead
Elegantly handle dynamic process creation