distributed systems - carnegie mellon university · mism 95-702 distributed systems 3 skew and...

MISM 95-702 Distributed Systems 1

Distributed Systems

Time and Global States


Learning Goals

• To understand:–The challenge of time in a distributed

system–How to synchronize distributed clocks–How you can assess the state of a

distributed system


Skew and drift

• Why can’t we have a global clock in a distributed system?

• Because clocks are imperfect:–Clock skew – different times

• Two clocks will have two different times.–Clock drift – different speeds

• Each clock varies in speed

Time• What is a second?

– 9,192,631,770 periods of transition between the two hyperfine levels of the ground state of Caesium-133 (Cs133)

• Ordinary quartz crystal clocks– Drifts 1 second every 11 days

• What can a computer do in a second?– Intel Core i7: ~100 Billion Instructions



Clocks

• Cesium clocks–Expensive

• GPS receiver–Less expensive– (GPS system has cesium clock(s))


3 Days in the life of my Mac10/20/14 11:55:00 PM ntpd[26] time reset -1.782968 s10/21/14 12:47:41 PM ntpd[26] time reset -0.719539 s10/21/14 4:30:51 PM ntpd[26] time reset +0.327154 s10/21/14 7:55:42 PM ntpd[26] time reset -0.238545 s10/21/14 10:29:06 PM ntpd[26] time reset +0.364890 s10/22/14 11:28:51 AM ntpd[26] time reset -1.058507 s10/22/14 3:09:51 PM ntpd[26] time reset +0.572059 s10/22/14 9:33:25 PM ntpd[26] time reset -0.165838 s10/22/14 10:19:11 PM ntpd[26] time reset +1.000670 s10/21/14 7:50:47 AM ntpd[26] time reset -0.171427 s10/21/14 10:10:30 AM ntpd[26] time reset +0.133970 s10/23/14 11:55:39 AM ntpd[26] time reset -0.136061 s10/23/14 12:37:57 PM ntpd[26] time reset -0.526902 s10/23/14 1:09:51 PM ntpd[26] time reset +0.400528 s


Network Time Protocol

Design Goals:• Sync with UTC over

Internet• Provide reliability via

redundancy• Scale to large number

of clients and servers• Defend against Mallory Graphic source:

http://en.wikipedia.org/wiki/Network_Time_Protocol

NTP Simulation• Work in pairs• Both students should have the clock demo page running

– http://tinyurl.com/702clock– Use a laptop, not a phone, and don’t allow the browser to sleep

• Use only ONE form sheet• ONE student starts the process

– fill in (a) and pass the form to student TWO• student TWO

– fills in (b) upon receipt of the form– then fills in (c) and passes the form back to student ONE

• student ONE– fills in (d)– calculates (e-i)– puts the value of (i) in their own clock demo (ONE only)

• Both students– eyeball your clocks and see if they are nearly synchronized 8


a Sent time

b Received time

c Sent-back time

d Returned-back time

Calculation

UDP Packet (reusable)

eTotal round trip

time(d-a)

f Remote processing time (c-b)

g Delay each way (e-f)/2

hOffset relative to

remote(d-g) - c

iAmount to adjust

local clock-h

a Sent time

b Received time

c Sent-back time


Calculation


eTotal round trip

time(d-a)



hOffset relative to

remote(d-g) - c

iAmount to adjust

local clock-h

a Sent time

b Received time

c Sent-back time


Calculation


eTotal round trip

time(d-a)



hOffset relative to

remote(d-g) - c

iAmount to adjust

local clock-h

NTP


NTPServer

Laptop

Time

a

cb

d


Summarize

• Summarize in your own words how NTP synchronization works

• What is NTP synchronized time good enough for?–Median error is 2-5ms but has a long tail

• Source: Chi-Yao Hong , Chia-Chi Lin , Matthew Caesar, Clockscalpel: understanding root causes of internet clock synchronization inaccuracy, Proceedings of the 12th international conference on Passive and active measurement, p.204-213, March 20-22, 2011, Atlanta, GA

• What are its shortcomings?

Synchronization• There are two ways to synchronize

clocks in a distributed system.

1. Internal Synchronization:–Peer systems synchronize among

themselves–Bound

• If the synchronization scheme has a bound D– (i.e. two clocks will have max D skew)

• Then all clocks will agree within D.MISM 95-702 Distributed Systems 12

Synchronization

2. External Synchronization–Systems all synchronize with a more

authoritative external clock– I.e. UTC (Coordinated Universal Time)–Bound:

• Each clock is accurate to UTC within D


MISM 95-702 OCT 14

Process Histories& Global Histories

Process A

State 3c, 6p

SendA1 B:2p

State 3c, 4p

RecB1 B:1c

State 4c, 4p

SendA2 B:2c

State 2c, 4p

SendA3 B:2p

State 2c, 2p

RecB2 B:2p

State 2c, 4p

Process B

State 4c, 6p

RecA1 A:2p

State 4c, 8p

SendB1 A:1c

State 3c, 8p

RecA3 A:2p

State 3c, 10p

SendB2 A:2p

State 3c, 8p

Global History

State 3c, 6p

SendA1 B:2p

State 3c, 4p

State 4c, 6p

RecA1 A:2p

State 4c, 8p

SendB1 A:1c

…

Global Histories are:• causal• conjectures• they are not physical

MISM 95-702 OCT 15

Consistent historiesProcess A

State 3c, 6p

SendA1 B:2p

State 3c, 4p

RecB1 B:1c

State 4c, 4p

SendA2 B:2c

State 2c, 4p

SendA3 B:2p

State 2c, 2p

RecB2 B:2p

State 2c, 4p

Process B

State 4c, 6p

RecA1 A:2p

State 4c, 8p

SendB1 A:1c

State 3c, 8p

RecA3 A:2p

State 3c, 10p

SendB2 A:2p

State 3c, 8p

Before

After

Global History

State 4c, 6p

State 3c, 6p

MISM 95-702 OCT 16


State 3c, 6p

SendA1 B:2p

State 3c, 4p

RecB1 B:1c

State 4c, 4p

SendA2 B:2c

State 2c, 4p

SendA3 B:2p

State 2c, 2p

RecB2 B:2p

State 2c, 4p

Process B

State 4c, 6p

RecA1 A:2p

State 4c, 8p

SendB1 A:1c

State 3c, 8p

RecA3 A:2p

State 3c, 10p

SendB2 A:2p

State 3c, 8p

Before

After

Global History

State 3c, 6p

State 4c, 6p

MISM 95-702 OCT 17


State 3c, 6p

SendA1 B:2p

State 3c, 4p

RecB1 B:1c

State 4c, 4p

SendA2 B:2c

State 2c, 4p

SendA3 B:2p

State 2c, 2p

RecB2 B:2p

State 2c, 4p

Process B

State 4c, 6p

RecA1 A:2p

State 4c, 8p

SendB1 A:1c

State 3c, 8p

RecA3 A:2p

State 3c, 10p

SendB2 A:2p

State 3c, 8p

Before

After

Global History

State 4c, 6p

State 3c, 6p

SendA1 B:2p

MISM 95-702 OCT 18


State 3c, 6p

SendA1 B:2p

State 3c, 4p

RecB1 B:1c

State 4c, 4p

SendA2 B:2c

State 2c, 4p

SendA3 B:2p

State 2c, 2p

RecB2 B:2p

State 2c, 4p

Process B

State 4c, 6p

RecA1 A:2p

State 4c, 8p

SendB1 A:1c

State 3c, 8p

RecA3 A:2p

State 3c, 10p

SendB2 A:2p

State 3c, 8p

Before

After

Global History

State 4c, 6p

State 3c, 6p

SendA1 B:2p

State 3c, 4p

MISM 95-702 OCT 19


State 3c, 6p

SendA1 B:2p

State 3c, 4p

RecB1 B:1c

State 4c, 4p

SendA2 B:2c

State 2c, 4p

SendA3 B:2p

State 2c, 2p

RecB2 B:2p

State 2c, 4p

Process B

State 4c, 6p

RecA1 A:2p

State 4c, 8p

SendB1 A:1c

State 3c, 8p

RecA3 A:2p

State 3c, 10p

SendB2 A:2p

State 3c, 8p

Before

After

Global History

State 4c, 6p

State 3c, 6p

SendA1 B:2p

State 3c, 4p

RecB1 B:1c

X

in

MISM 95-702 OCT 20


State 3c, 6p

SendA1 B:2p

State 3c, 4p

RecB1 B:1c

State 4c, 4p

SendA2 B:2c

State 2c, 4p

SendA3 B:2p

State 2c, 2p

RecB2 B:2p

State 2c, 4p

Process B

State 4c, 6p

RecA1 A:2p

State 4c, 8p

SendB1 A:1c

State 3c, 8p

RecA3 A:2p

State 3c, 10p

SendB2 A:2p

State 3c, 8p

Before

After

Global History

State 4c, 6p

State 3c, 6p

SendA1 B:2p

State 3c, 4p

RecA1 A:2p

State 4c, 8p

Useful terminology to know• Process history• Happened-before relation • Consistent Global history

– A linearization of process histories– Maintaining the happened-before relation

• Consistent Global State– Also called a cut– A set of process states that could have occurred at the

same time.– Consistent cut

• A cut that maintains the happened-before relation– Inconsistent cut

• Breaks the happened-before relation

• Read Coulouris for concise definitions– And know definitions for final exam– Wikipedia can also be helpful


MISM 95-702 OCT 22

Diagrams can help to reason about global histories

p1

p2

p3

a b

c d

e f

m1

m2

Physicaltime

MISM 95-702 OCT 23

Consistent Cuts Example

p1

p2

p3

a b

c d

e f

m1

m2

Physicaltime

Source: Distributed Computing: Principles, Algorithms, and Systems By Ajay D. Kshemkalyani, Mukesh Singha

Because no global clocka straight line shows moreaccuracy than really possible

MISM 95-702 OCT 24

Are these global states (cuts) consistent?

p1

p2

p3

a b

c d

e f

m1

m2

Physicaltime

Notice state would include p1-3 but also m1

MISM 95-702 OCT 25

Does a->e?

p1

p2

p3

a b

c d

e f

m1

m2

Physicaltime

Logical vs Physical Concurrence• Logical concurrence: In distributed systems, two

events are logically concurrent if they do not causally affect each other.

• Physical concurrence: connotes that two events happen at the same instance in time. – This is unknowable in

a distributed system.• e is logically concurrent to all of a, b, m1, c, d, m2

• f is not logically concurrent to any of the events

MISM 95-702 OCT 26

p1

p2

p3

a b

c d

e f

m1

m2

Physicaltime

MISM 95-702 OCT 27

Lamport (Logical) Clocks• An alternative to (skewed, drifting) physical clocks• Two rules for assigning logical times to events:

1. Events on one process happen in order• Each happens-before the next• Increment the logical clock by 1 for each event

2. The passing of messages can be used to indicate happens-before between processes• The sending of the message happens-before the receiving of the

message.• Take the larger of the logical clock (+1) value from the process or the

message

MISM 95-702 OCT 28

Lamport Clock numbering

a b

c d

e f

m1

m2

21

3 4

51

p1

p2

p3

Physical time

MISM 95-702 OCT 29

What time is g?

a b

c d

e f

m1

m2

21

3 4

51

p1

p2

p3

Physical time

g

MISM 95-702 OCT 30

Now what time is g?

a b

c d

e f

m1

m2

21

3 4

51

p1

p2

p3

Physical time

g

MISM 95-702 OCT 31

L(d)>L(g) so did d happen after g?

a b

c d

e f

m1

m2

21

3 4

51

p1

p2

p3

Physical time

gNo.- d->f implies L(d) < L(f)- L(g) < L(d) does not imply g->d

Is this still useful?- It gives a partial ordering.

- The Lamport Clocks of two causally-related events will be ordered

- In this system design, d and gare not causally related.

- It does not matter if g->d

2

Total & Causal Ordering• Lamport Clocks can be used to coordinate the ordering of

distributed events so that each process executes events in the same order (e.g. updating a database)

• Vector clocks are extensions to Lamport's work:– Each process is assigned a process-index– Messages between processes carry a vector of process-index and

Lamport-clock with them• E.g. Dynamo, Amazon.com’s highly available key-value

storage system, uses vector clocks.– See: http://s3.amazonaws.com/AllThingsDistributed/sosp/amazon-dynamo-sosp2007.pdf Section 4.4


Global State Problem 1

• We have stores and warehouses all over the world

• Each has a local system that tracks inventory.

• What is our current level of inventory?


Global State Problem 2

• We have a very complex chemical manufacturing plant

• Each sensor and valve is computer controlled

• There are some sensor and valve combinations that are very dangerous

• How do we know if we are in one of those states?


Finding Global States

• It is often important to obtain the global state of a distributed system.

• It would be nice to– Be omniscient – see all at once– Or stop time

• But in distributed systems, we only have unreliable messages that take time to go from one process to another.

• What to do?



Chandy & LamportSnapshot Algorithm

• Assumptions:– Reliable channels

• A channel is a network path from one process to another

– Every message sent is reliably received• Once (not more than once)• In order (FIFO)

– There is a channel (path) between every 2 processes

– Any process may initiate a snapshot– Processes continue as normal while snapshot is

taking place.• They don't stop their work!


Snapshot Algorithm• Marker receiving rule:

When I receive a marker on channel cIf (I am not yet taking part in this snapshot) {

Record my state now (count # of each commodity in hand only)Do Marker sending ruleRecord that the state of channel c is emptyBegin keeping track of state all other incoming channels (except c):

(I.e. count commodities arriving via each channel.)} else if this marker is coming in from a new channel c {

Record that the state of channel c is all messages it has received since it began keeping track. (No more coming.)

Stop recording any further messages on channel c (they are now post-shapshot)If (this is last channel to receive marker on) send records to Monitor process

}• Marker sending rule:

For (each outgoing channel c) {Send one marker message over c}

MISM 95-702 OCT 38

Count the total of c's and p'sProcess A

State 3c, 6p

SendA1 B:2p

State 3c, 4p

RecB1 B:1c

State 4c, 4p

SendA2 B:2c

State 2c, 4p

SendA3 B:2p

State 2c, 2p

RecB2 B:2p

State 2c, 4p

Process B

State 4c, 6p

RecA1 A:2p

State 4c, 8p

SendB1 A:1c

State 3c, 8p

RecA3 A:2p

State 3c, 10p

SendB2 A:2p

State 3c, 8p

A B Tot

9 10 19

MISM 95-702 OCT 39

Count the total of c's and p'sProcess A

State 3c, 6p

SendA1 B:2p

State 3c, 4p

RecB1 B:1c

State 4c, 4p

SendA2 B:2c

State 2c, 4p

SendA3 B:2p

State 2c, 2p

RecB2 B:2p

State 2c, 4p

Process B

State 4c, 6p

RecA1 A:2p

State 4c, 8p

SendB1 A:1c

State 3c, 8p

RecA3 A:2p

State 3c, 10p

SendB2 A:2p

State 3c, 8p

A B Tot

7 10 17

A B TotA B Net Tot

7 10 2 19

MISM 95-702 OCT 40

Remember: Snapshot state needs to account for messages

in progressp1

p2

p3

a b

c d

e f

m1

m2

Physicaltime

Source: Distributed Computing: Principles, Algorithms, and Systems By Ajay D. Kshemkalyani, Mukesh Singha

P1 P2

C12

C21

Y B RP1C21Total

Y B RP2C12Total

P1

C12

C21

Y B RP1C21C31TOTAL

P2P3

Y B RP3C13C23TOTAL

Y B RP2C12C32TOTAL

C13

C31

C32

C23

Queue: jms/PITplayer2

PITplayer2

PITplayer1

PITplayer3



In the trading simulation, the channels are implemented as JMS Queues. This meets the Chandy & Lamport snapshot algorithm assumption that there are channels from each player to every other player, and that the channels are First-In-First-Out.All players are implemented as Message Drive Beans (aka Queue Listeners)

Using State for Debugging• A: waitreply(B)• B:• C: waitreply(A)• D: waitreply(F)• E: waitreply(F)• F: waitreply(G)• G: waitreply(D)

• How could info like this be collected?

• What does it say about the state of the system?

MISM 95-702 OCT 44

Global State for Termination

• A: transaction 3382 completed• B: transaction 3382 completed• C: transaction 3382 incomplete• D: transaction 3382 completed• E: transaction 3382 completed


distributed systems - carnegie mellon university · mism 95-702 distributed systems 3 skew and...

Documents