1 principles of reliable distributed systems lecture 5: failure models, fault-tolerant broadcasts...

1

Principles of Reliable Distributed Systems

Lecture 5: Failure Models, Fault-Tolerant Broadcasts and

State-Machine Replication

Spring 2005

Dr. Idit Keidar

2

Today’s Material

• Distributed Systems 2nd edition, Sape Mullender (Editor) – Failure models, from Ch. 2– State machine replication, from Ch. 7– Fault-tolerant broadcasts, from Ch. 5– Vector clocks, Ch. 4 & Ch. 6 of Attiya-Welch

3

Models of Process FailuresCrash

Receive OmissionSend Omission

General Omission

Timing

Authenticated Byzantine

Byzantine

Benign

Malicious

4

Correct vs. Faulty Processes

• Look at a complete run (execution)– external observer’s view

• A process that does not fail in a run is correct in that run

• Otherwise, the process is faulty in the run– a process that fails any time in the run is faulty

throughout the entire run

5

Threshold Failure Model

• t out of n processes may fail

• t is usually given as a function of n, e.g.,– t < n– 2t < n– 3t < n

6

Models of Link Failures

• Failures can be attributed to links instead of processes

• Reliable links: – every message sent is eventually delivered

• Failure types:– Crash– Loss (omission)– Timing– Byzantine

7

Synchronous vs. Asynchronous

• Synchrony: – bounded latency, clock drift, processing time– process crash failures can be accurately detected

• Asynchrony: non-assumption– process crash failures cannot be accurately detected

• Failstop– like asynchronous, but crash failures accurately

detected

• Unreliable failure detectors – later in the course

8

Why These Models?

• Abstractions we can work with

• Behavior we care about– we don’t care if loss is due to router or network

card– we don’t care if Byzantine behavior is due to

software bug, hardware bug, or intrusion

• What happens if we run the algorithm and failures not in the model occur?

9

Providing Abstractions of Models

• In a synchronous system, we can implement failstop over crash

• If loss is bounded, we can use retransmission and abstract it away

• A process that detects that it is slow or incorrect can crash itself, simulating the crash model

10

State Machine(Active) Replication

• Client sends updates to all servers

• All servers are identical deterministic state machines– fixed number of replicas– start in same initial state

11

Fault-Tolerant State-Machine Replication

• How many replicas to mask – t crash failures?– t Byzantine failures?

12

Replica Coordination Requirements

• Agreement: all correct replicas receive all client requests

• Order: replicas process requests in the same order

13

Broadcast Service for Replication

• Primitives: broadcast(m), deliver(m).– For simplicity, assume m is unique.

Network

BroadcastAlgorithm

Application

deliverbroadcast

receivesend

BroadcastAlgorithm

Application

deliverbroadcast

receivesend

14

Reliable Broadcast Specification

• Validity: if a correct process broadcasts m then all correct processes eventually deliver m

• Agreement: if a correct process delivers m then all correct processes eventually deliver m– Uniform agreement: if any process delivers m then all

correct processes eventually deliver m

• Integrity: m is delivered by a correct process at most once, and only if it was previously broadcast

15

Reliable Broadcast Implementation

• Model: – asynchronous– process crash failures– reliable links between pairs of correct processes

• Simple implementation …• Does it work with links failures?

– under what condition?

• Does it solve Uniform Reliable Broadcast?

16

Ordering Properties for Broadcasts

• FIFO: If some process broadcasts message m before message m’, then every correct process that delivers m’ delivers m beforehand.

• Causal: If message m causally precedes message m’, then every correct process that delivers m’ delivers m beforehand.– causal order: transitive closure of FIFO + some process

delivers m before broadcasting m’

• Total order: If two correct processes deliver both m and m’, they deliver them in the same order.

17

FIFO Broadcast• Reliable Broadcast + FIFO• Simple implementation on top of reliable

broadcast (using sequence numbers)

Network

Reliable Broadcast

Application

receivesend

FIFO Broadcast

deliverbroadcast

deliverbroadcast

Reliable Broadcast

Application

receivesend

FIFO Broadcast

deliverbroadcast

deliverbroadcastReliable Broadcast

FIFO Deliver

18

Causal Broadcast

• Reliable Broadcast + Causal

• Implementation using Reliable Broadcast

• Reminder: last week we saw an implementation using

logical (Lamport) timestamps (LTS) that enforces total order in addition to Causal

• Fault-tolerance?

19

Vector Clocks

• Process pi has a vector clock VC[1…n]– VC[i] is the sequence number of the last

message broadcast by pi– for j≠i, VC[j] is the sequence number of the last

message pi delivered from pj

• VC is sent with each message m– for j≠i, m.VC[j] is pj’s latest message that

causally precedes m

20

Vector Clocks: Sending

• At process pi, on broadcast(m)– VC[i] := VC[i]+1– use reliable broadcast to send m with VC to all– deliver m locally

21

Vector Clocks: Delivery Rule

• Upon receive m– place in message buffer

• Deliver m from pj from buffer if– VC[j] = m.VC[j] -1 – forall k≠j : VC[k] ≥ m.VC[k]

• Upon deliver – VC[j] := VC[j] + 1

VC[j] is the number of messages of pj that

causally precedepi’s subsequent

messages

FIFO

22

Vector Clocks Example

<0,1,0>

<1,1,0><0,1,0><0,0,0>

<0,0,0>

<0,0,0> <0,1,0>

23

Fault-Tolerance?

<0,1,0>

<1,1,0><0,1,0><0,0,0>

<0,0,0>

<0,0,0>

X

X

24

Atomic Broadcast Services

• Atomic Broadcast:– Reliable Broadcast + Total Order

• FIFO Atomic Broadcast– FIFO + Reliable Broadcast + Total Order

• Causal Atomic Broadcast– Causal + Reliable Broadcast + Total Order

HW question: are FIFO Atomic Broadcast and Causal Atomic Broadcast equivalent?

25

Causal Atomic Broadcast in Failstop Model

• Order messages by logical timestamp (LTS)– Break ties by process id

• Use FIFO Broadcast• When is a message with LTS t delivered?

– Reminder: failstop = failures are accurately detected

– Assume further that no message from a faulty process arrives after its failure is detected

26

Example Exam Question

p1

p2

p3

m1

m1

m2

m2

m3

m3 m4

m4

t=0 t=5 t=10

0

0

0

0,2

1

1 1,1

1,32

2

3

2,2

3

1 2

2

2

27

Atomic Broadcast in Asynchronous Systems???

Alas, impossible if even one process can crash

28

The Consensus Problem

Each process has an input, should decide an output

• Agreement: correct processes’ decisions are the same

• Validity: decision is input of one process

• Termination: eventually all correct processes decide

29

Consensus versus Atomic Broadcast

• From Atomic Broadcast to Consensus

• From Consensus to Atomic Broadcast – Homework question

• From now on, we will focus mainly on consensus, and keep in mind that it suffices for Atomic Broadcast

30

Other Models

• Asynchronous– Consensus, and hence Atomic Broadcast impossible

even with one crash failure

– Reliable Broadcast possible

• Synchronous with Byzantine failures– Consensus, and hence Atomic Broadcast possible with

up to t = n-1 failures, assuming authentication

• Middle ground models will be discussed in future lessons

31

Now, Back to State Machines

• We can build state machines using Atomic Broadcast

• A client can forward its request to one of the servers to broadcast to the others– resend on timeout if the server fails– more than one for Byzantine failures

• What about client failures?– Crash? Byzantine?

32

Multicast Problems

• Processes organized in groups– groups have names– messages sent to groups– like broadcast, but for group members– processes can join and leave groups– more in upcoming recitations!

• Processes may care about who else is a member of the group (group membership)

1 principles of reliable distributed systems lecture 5: failure models, fault-tolerant broadcasts...

Documents

broadcast slide

t n slide

crash model slide

order slide

broadcast receivesend

course slide

correct process broadcasts

entire run slide