distributed systems - carnegie mellon university · mism 95-702 distributed systems 3 skew and...
TRANSCRIPT
MISM 95-702 Distributed Systems 2
Learning Goals
• To understand:–The challenge of time in a distributed
system–How to synchronize distributed clocks–How you can assess the state of a
distributed system
MISM 95-702 Distributed Systems 3
Skew and drift
• Why can’t we have a global clock in a distributed system?
• Because clocks are imperfect:–Clock skew – different times
• Two clocks will have two different times.–Clock drift – different speeds
• Each clock varies in speed
Time• What is a second?
– 9,192,631,770 periods of transition between the two hyperfine levels of the ground state of Caesium-133 (Cs133)
• Ordinary quartz crystal clocks– Drifts 1 second every 11 days
• What can a computer do in a second?– Intel Core i7: ~100 Billion Instructions
MISM 95-702 Distributed Systems 4
MISM 95-702 Distributed Systems 5
Clocks
• Cesium clocks–Expensive
• GPS receiver–Less expensive– (GPS system has cesium clock(s))
MISM 95-702 Distributed Systems 6
3 Days in the life of my Mac10/20/14 11:55:00 PM ntpd[26] time reset -1.782968 s10/21/14 12:47:41 PM ntpd[26] time reset -0.719539 s10/21/14 4:30:51 PM ntpd[26] time reset +0.327154 s10/21/14 7:55:42 PM ntpd[26] time reset -0.238545 s10/21/14 10:29:06 PM ntpd[26] time reset +0.364890 s10/22/14 11:28:51 AM ntpd[26] time reset -1.058507 s10/22/14 3:09:51 PM ntpd[26] time reset +0.572059 s10/22/14 9:33:25 PM ntpd[26] time reset -0.165838 s10/22/14 10:19:11 PM ntpd[26] time reset +1.000670 s10/21/14 7:50:47 AM ntpd[26] time reset -0.171427 s10/21/14 10:10:30 AM ntpd[26] time reset +0.133970 s10/23/14 11:55:39 AM ntpd[26] time reset -0.136061 s10/23/14 12:37:57 PM ntpd[26] time reset -0.526902 s10/23/14 1:09:51 PM ntpd[26] time reset +0.400528 s
MISM 95-702 Distributed Systems 7
Network Time Protocol
Design Goals:• Sync with UTC over
Internet• Provide reliability via
redundancy• Scale to large number
of clients and servers• Defend against Mallory Graphic source:
http://en.wikipedia.org/wiki/Network_Time_Protocol
NTP Simulation• Work in pairs• Both students should have the clock demo page running
– http://tinyurl.com/702clock– Use a laptop, not a phone, and don’t allow the browser to sleep
• Use only ONE form sheet• ONE student starts the process
– fill in (a) and pass the form to student TWO• student TWO
– fills in (b) upon receipt of the form– then fills in (c) and passes the form back to student ONE
• student ONE– fills in (d)– calculates (e-i)– puts the value of (i) in their own clock demo (ONE only)
• Both students– eyeball your clocks and see if they are nearly synchronized 8
MISM 95-702 Distributed Systems 9
a Sent time
b Received time
c Sent-back time
d Returned-back time
Calculation
UDP Packet (reusable)
eTotal round trip
time(d-a)
f Remote processing time (c-b)
g Delay each way (e-f)/2
hOffset relative to
remote(d-g) - c
iAmount to adjust
local clock-h
a Sent time
b Received time
c Sent-back time
d Returned-back time
Calculation
UDP Packet (reusable)
eTotal round trip
time(d-a)
f Remote processing time (c-b)
g Delay each way (e-f)/2
hOffset relative to
remote(d-g) - c
iAmount to adjust
local clock-h
a Sent time
b Received time
c Sent-back time
d Returned-back time
Calculation
UDP Packet (reusable)
eTotal round trip
time(d-a)
f Remote processing time (c-b)
g Delay each way (e-f)/2
hOffset relative to
remote(d-g) - c
iAmount to adjust
local clock-h
MISM 95-702 Distributed Systems 11
Summarize
• Summarize in your own words how NTP synchronization works
• What is NTP synchronized time good enough for?–Median error is 2-5ms but has a long tail
• Source: Chi-Yao Hong , Chia-Chi Lin , Matthew Caesar, Clockscalpel: understanding root causes of internet clock synchronization inaccuracy, Proceedings of the 12th international conference on Passive and active measurement, p.204-213, March 20-22, 2011, Atlanta, GA
• What are its shortcomings?
Synchronization• There are two ways to synchronize
clocks in a distributed system.
1. Internal Synchronization:–Peer systems synchronize among
themselves–Bound
• If the synchronization scheme has a bound D– (i.e. two clocks will have max D skew)
• Then all clocks will agree within D.MISM 95-702 Distributed Systems 12
Synchronization
2. External Synchronization–Systems all synchronize with a more
authoritative external clock– I.e. UTC (Coordinated Universal Time)–Bound:
• Each clock is accurate to UTC within D
MISM 95-702 Distributed Systems 13
MISM 95-702 OCT 14
Process Histories& Global Histories
Process A
State 3c, 6p
SendA1 B:2p
State 3c, 4p
RecB1 B:1c
State 4c, 4p
SendA2 B:2c
State 2c, 4p
SendA3 B:2p
State 2c, 2p
RecB2 B:2p
State 2c, 4p
Process B
State 4c, 6p
RecA1 A:2p
State 4c, 8p
SendB1 A:1c
State 3c, 8p
RecA3 A:2p
State 3c, 10p
SendB2 A:2p
State 3c, 8p
Global History
State 3c, 6p
SendA1 B:2p
State 3c, 4p
State 4c, 6p
RecA1 A:2p
State 4c, 8p
SendB1 A:1c
…
Global Histories are:• causal• conjectures• they are not physical
MISM 95-702 OCT 15
Consistent historiesProcess A
State 3c, 6p
SendA1 B:2p
State 3c, 4p
RecB1 B:1c
State 4c, 4p
SendA2 B:2c
State 2c, 4p
SendA3 B:2p
State 2c, 2p
RecB2 B:2p
State 2c, 4p
Process B
State 4c, 6p
RecA1 A:2p
State 4c, 8p
SendB1 A:1c
State 3c, 8p
RecA3 A:2p
State 3c, 10p
SendB2 A:2p
State 3c, 8p
Before
After
Global History
State 4c, 6p
State 3c, 6p
MISM 95-702 OCT 16
Consistent historiesProcess A
State 3c, 6p
SendA1 B:2p
State 3c, 4p
RecB1 B:1c
State 4c, 4p
SendA2 B:2c
State 2c, 4p
SendA3 B:2p
State 2c, 2p
RecB2 B:2p
State 2c, 4p
Process B
State 4c, 6p
RecA1 A:2p
State 4c, 8p
SendB1 A:1c
State 3c, 8p
RecA3 A:2p
State 3c, 10p
SendB2 A:2p
State 3c, 8p
Before
After
Global History
State 3c, 6p
State 4c, 6p
MISM 95-702 OCT 17
Consistent historiesProcess A
State 3c, 6p
SendA1 B:2p
State 3c, 4p
RecB1 B:1c
State 4c, 4p
SendA2 B:2c
State 2c, 4p
SendA3 B:2p
State 2c, 2p
RecB2 B:2p
State 2c, 4p
Process B
State 4c, 6p
RecA1 A:2p
State 4c, 8p
SendB1 A:1c
State 3c, 8p
RecA3 A:2p
State 3c, 10p
SendB2 A:2p
State 3c, 8p
Before
After
Global History
State 4c, 6p
State 3c, 6p
SendA1 B:2p
MISM 95-702 OCT 18
Consistent historiesProcess A
State 3c, 6p
SendA1 B:2p
State 3c, 4p
RecB1 B:1c
State 4c, 4p
SendA2 B:2c
State 2c, 4p
SendA3 B:2p
State 2c, 2p
RecB2 B:2p
State 2c, 4p
Process B
State 4c, 6p
RecA1 A:2p
State 4c, 8p
SendB1 A:1c
State 3c, 8p
RecA3 A:2p
State 3c, 10p
SendB2 A:2p
State 3c, 8p
Before
After
Global History
State 4c, 6p
State 3c, 6p
SendA1 B:2p
State 3c, 4p
MISM 95-702 OCT 19
Consistent historiesProcess A
State 3c, 6p
SendA1 B:2p
State 3c, 4p
RecB1 B:1c
State 4c, 4p
SendA2 B:2c
State 2c, 4p
SendA3 B:2p
State 2c, 2p
RecB2 B:2p
State 2c, 4p
Process B
State 4c, 6p
RecA1 A:2p
State 4c, 8p
SendB1 A:1c
State 3c, 8p
RecA3 A:2p
State 3c, 10p
SendB2 A:2p
State 3c, 8p
Before
After
Global History
State 4c, 6p
State 3c, 6p
SendA1 B:2p
State 3c, 4p
RecB1 B:1c
X
in
MISM 95-702 OCT 20
Consistent historiesProcess A
State 3c, 6p
SendA1 B:2p
State 3c, 4p
RecB1 B:1c
State 4c, 4p
SendA2 B:2c
State 2c, 4p
SendA3 B:2p
State 2c, 2p
RecB2 B:2p
State 2c, 4p
Process B
State 4c, 6p
RecA1 A:2p
State 4c, 8p
SendB1 A:1c
State 3c, 8p
RecA3 A:2p
State 3c, 10p
SendB2 A:2p
State 3c, 8p
Before
After
Global History
State 4c, 6p
State 3c, 6p
SendA1 B:2p
State 3c, 4p
RecA1 A:2p
State 4c, 8p
Useful terminology to know• Process history• Happened-before relation • Consistent Global history
– A linearization of process histories– Maintaining the happened-before relation
• Consistent Global State– Also called a cut– A set of process states that could have occurred at the
same time.– Consistent cut
• A cut that maintains the happened-before relation– Inconsistent cut
• Breaks the happened-before relation
• Read Coulouris for concise definitions– And know definitions for final exam– Wikipedia can also be helpful
MISM 95-702 Distributed Systems 21
MISM 95-702 OCT 22
Diagrams can help to reason about global histories
p1
p2
p3
a b
c d
e f
m1
m2
Physicaltime
MISM 95-702 OCT 23
Consistent Cuts Example
p1
p2
p3
a b
c d
e f
m1
m2
Physicaltime
Source: Distributed Computing: Principles, Algorithms, and Systems By Ajay D. Kshemkalyani, Mukesh Singha
Because no global clocka straight line shows moreaccuracy than really possible
MISM 95-702 OCT 24
Are these global states (cuts) consistent?
p1
p2
p3
a b
c d
e f
m1
m2
Physicaltime
Notice state would include p1-3 but also m1
Logical vs Physical Concurrence• Logical concurrence: In distributed systems, two
events are logically concurrent if they do not causally affect each other.
• Physical concurrence: connotes that two events happen at the same instance in time. – This is unknowable in
a distributed system.• e is logically concurrent to all of a, b, m1, c, d, m2
• f is not logically concurrent to any of the events
MISM 95-702 OCT 26
p1
p2
p3
a b
c d
e f
m1
m2
Physicaltime
MISM 95-702 OCT 27
Lamport (Logical) Clocks• An alternative to (skewed, drifting) physical clocks• Two rules for assigning logical times to events:
1. Events on one process happen in order• Each happens-before the next• Increment the logical clock by 1 for each event
2. The passing of messages can be used to indicate happens-before between processes• The sending of the message happens-before the receiving of the
message.• Take the larger of the logical clock (+1) value from the process or the
message
MISM 95-702 OCT 31
L(d)>L(g) so did d happen after g?
a b
c d
e f
m1
m2
21
3 4
51
p1
p2
p3
Physical time
gNo.- d->f implies L(d) < L(f)- L(g) < L(d) does not imply g->d
Is this still useful?- It gives a partial ordering.
- The Lamport Clocks of two causally-related events will be ordered
- In this system design, d and gare not causally related.
- It does not matter if g->d
2
Total & Causal Ordering• Lamport Clocks can be used to coordinate the ordering of
distributed events so that each process executes events in the same order (e.g. updating a database)
• Vector clocks are extensions to Lamport's work:– Each process is assigned a process-index– Messages between processes carry a vector of process-index and
Lamport-clock with them• E.g. Dynamo, Amazon.com’s highly available key-value
storage system, uses vector clocks.– See: http://s3.amazonaws.com/AllThingsDistributed/sosp/amazon-dynamo-sosp2007.pdf Section 4.4
MISM 95-702 Distributed Systems 32
Global State Problem 1
• We have stores and warehouses all over the world
• Each has a local system that tracks inventory.
• What is our current level of inventory?
MISM 95-702 Distributed Systems 33
Global State Problem 2
• We have a very complex chemical manufacturing plant
• Each sensor and valve is computer controlled
• There are some sensor and valve combinations that are very dangerous
• How do we know if we are in one of those states?
MISM 95-702 Distributed Systems 34
Finding Global States
• It is often important to obtain the global state of a distributed system.
• It would be nice to– Be omniscient – see all at once– Or stop time
• But in distributed systems, we only have unreliable messages that take time to go from one process to another.
• What to do?
MISM 95-702 Distributed Systems 35
MISM 95-702 Distributed Systems 36
Chandy & LamportSnapshot Algorithm
• Assumptions:– Reliable channels
• A channel is a network path from one process to another
– Every message sent is reliably received• Once (not more than once)• In order (FIFO)
– There is a channel (path) between every 2 processes
– Any process may initiate a snapshot– Processes continue as normal while snapshot is
taking place.• They don't stop their work!
MISM 95-702 Distributed Systems 37
Snapshot Algorithm• Marker receiving rule:
When I receive a marker on channel cIf (I am not yet taking part in this snapshot) {
Record my state now (count # of each commodity in hand only)Do Marker sending ruleRecord that the state of channel c is emptyBegin keeping track of state all other incoming channels (except c):
(I.e. count commodities arriving via each channel.)} else if this marker is coming in from a new channel c {
Record that the state of channel c is all messages it has received since it began keeping track. (No more coming.)
Stop recording any further messages on channel c (they are now post-shapshot)If (this is last channel to receive marker on) send records to Monitor process
}• Marker sending rule:
For (each outgoing channel c) {Send one marker message over c}
MISM 95-702 OCT 38
Count the total of c's and p'sProcess A
State 3c, 6p
SendA1 B:2p
State 3c, 4p
RecB1 B:1c
State 4c, 4p
SendA2 B:2c
State 2c, 4p
SendA3 B:2p
State 2c, 2p
RecB2 B:2p
State 2c, 4p
Process B
State 4c, 6p
RecA1 A:2p
State 4c, 8p
SendB1 A:1c
State 3c, 8p
RecA3 A:2p
State 3c, 10p
SendB2 A:2p
State 3c, 8p
A B Tot
9 10 19
MISM 95-702 OCT 39
Count the total of c's and p'sProcess A
State 3c, 6p
SendA1 B:2p
State 3c, 4p
RecB1 B:1c
State 4c, 4p
SendA2 B:2c
State 2c, 4p
SendA3 B:2p
State 2c, 2p
RecB2 B:2p
State 2c, 4p
Process B
State 4c, 6p
RecA1 A:2p
State 4c, 8p
SendB1 A:1c
State 3c, 8p
RecA3 A:2p
State 3c, 10p
SendB2 A:2p
State 3c, 8p
A B Tot
7 10 17
A B TotA B Net Tot
7 10 2 19
MISM 95-702 OCT 40
Remember: Snapshot state needs to account for messages
in progressp1
p2
p3
a b
c d
e f
m1
m2
Physicaltime
Source: Distributed Computing: Principles, Algorithms, and Systems By Ajay D. Kshemkalyani, Mukesh Singha
Queue: jms/PITplayer2
PITplayer2
PITplayer1
PITplayer3
Queue: jms/PITplayer3
Queue: jms/PITplayer1
In the trading simulation, the channels are implemented as JMS Queues. This meets the Chandy & Lamport snapshot algorithm assumption that there are channels from each player to every other player, and that the channels are First-In-First-Out.All players are implemented as Message Drive Beans (aka Queue Listeners)
Using State for Debugging• A: waitreply(B)• B:• C: waitreply(A)• D: waitreply(F)• E: waitreply(F)• F: waitreply(G)• G: waitreply(D)
• How could info like this be collected?
• What does it say about the state of the system?
MISM 95-702 OCT 44