CC5212-1 PROCESAMIENTO MASIVO DE DATOS, OTOÑO 2016
Lecture 3: Distributed Systems II
Aidan Hogan, [email protected]


TRANSCRIPT

Page 1:

CC5212-1 PROCESAMIENTO MASIVO DE DATOS, OTOÑO 2016

Lecture 3: Distributed Systems II

Aidan Hogan, [email protected]

Page 2:

TYPES OF DISTRIBUTED SYSTEMS …

Page 3:

Client–Server Model

• Client makes request to server
• Server acts and responds

(For example: Email, WWW, Printing, etc.)

Page 4:

Client–Server: Three-Tier Server

Server tiers: Data, Logic, Presentation

Example flow:
• HTTP GET: total salary of all employees
• SQL: query salary of all employees
• Add all the salaries
• Create HTML page
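To make the three-tier flow concrete, here is a minimal sketch in Java (illustrative only, not from the slides): a stand-in data tier replaces the SQL query, the logic tier sums the salaries, and the presentation tier builds the HTML answering the HTTP GET. All names are invented.

```java
import java.util.List;

// Hypothetical sketch of the three tiers: data, logic, presentation.
public class SalaryReport {

    // Data tier (stand-in for "SQL: query salary of all employees").
    static List<Integer> querySalaries() {
        return List.of(1200, 950, 1800);   // would come from a database in practice
    }

    // Logic tier: "Add all the salaries".
    static int totalSalary(List<Integer> salaries) {
        return salaries.stream().mapToInt(Integer::intValue).sum();
    }

    // Presentation tier: "Create HTML page" answering the HTTP GET.
    static String toHtml(int total) {
        return "<html><body>Total salary: " + total + "</body></html>";
    }

    public static void main(String[] args) {
        System.out.println(toHtml(totalSalary(querySalaries())));
    }
}
```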

Page 5:

Peer‐to‐Peer: Unstructured

Pixie’s new album?

(For example: Kazaa, Gnutella)

Page 6:

Peer-to-Peer: Structured (DHT)

• Circular DHT:
  – Only aware of neighbours
  – O(n) lookups
• Implement shortcuts:
  – Skips ahead
  – Enables binary-search-like behaviour
  – O(log(n)) lookups

[Diagram: a ring of nodes with identifiers 000, 001, 010, 011, 100, 101, 110, 111; the query "Pixie's new album?" resolves to node 111]
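As a rough sketch of the "shortcuts" idea (not the slides' exact algorithm), the routine below routes a lookup on a ring of 8 nodes. Each node knows the nodes 1, 2 and 4 positions ahead; a lookup forwards the query to the farthest shortcut that does not overshoot the target, so it takes O(log n) hops instead of O(n). Names and constants are illustrative.

```java
// Minimal sketch of shortcut-based routing on a ring of 8 nodes (identifiers 000..111).
public class RingLookup {

    static final int N = 8;                       // ring size
    static final int[] SHORTCUTS = {4, 2, 1};     // powers of two, farthest first

    // Clockwise distance from 'from' to 'to' on the ring.
    static int distance(int from, int to) {
        return ((to - from) % N + N) % N;
    }

    // Route a lookup for 'key' starting at 'start'; returns the number of hops taken.
    static int lookup(int start, int key) {
        int current = start, hops = 0;
        while (current != key) {
            for (int step : SHORTCUTS) {
                if (step <= distance(current, key)) {  // take the biggest step that doesn't overshoot
                    current = (current + step) % N;
                    break;
                }
            }
            hops++;
        }
        return hops;
    }

    public static void main(String[] args) {
        // e.g. "Pixie's new album?" hashes to node 111 (= 7); start the lookup at node 000.
        System.out.println("hops from 000 to 111: " + lookup(0, 7));  // 3 hops = log2(8)
    }
}
```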

Page 7:

Desirable Criteria for Distributed Systems

• Transparency: appears as one machine
• Flexibility: supports more machines, more applications
• Reliability: system doesn't fail when a machine does
• Performance: quick runtimes, quick processing
• Scalability: handles more machines/data efficiently

Page 8:

Java RMI in the lab …

[Diagram: a Client (send) at 172.17.69.XXX, a Registry (port) and Server (receive) at localhost / 172.17.69.YYY, a key and skeleton, and a Directory]
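A minimal sketch of this setup using Java RMI's standard API; the interface name, registry key and port below are invented for illustration and are not the lab's actual code. The server exports an object and binds its stub under a key in the registry; the client looks the stub up by that key and calls it as if it were local.

```java
import java.rmi.Remote;
import java.rmi.RemoteException;
import java.rmi.registry.LocateRegistry;
import java.rmi.registry.Registry;
import java.rmi.server.UnicastRemoteObject;

// Minimal RMI sketch (names, key and port are illustrative).
public class RmiSketch {

    // Remote interface shared by client and server.
    public interface Echo extends Remote {
        String echo(String msg) throws RemoteException;
    }

    // Server-side implementation (the object behind the skeleton).
    public static class EchoImpl implements Echo {
        public String echo(String msg) { return "echo: " + msg; }
    }

    // Server side: export the object and register its stub under a key.
    static void startServer() throws Exception {
        Echo stub = (Echo) UnicastRemoteObject.exportObject(new EchoImpl(), 0);
        Registry registry = LocateRegistry.createRegistry(1099);  // "Registry (port)"
        registry.rebind("echo-service", stub);                    // key -> stub
    }

    // Client side: look up the stub in the registry and call it like a local object.
    static void callServer(String host) throws Exception {
        Registry registry = LocateRegistry.getRegistry(host, 1099);
        Echo remote = (Echo) registry.lookup("echo-service");
        System.out.println(remote.echo("hello from the client"));
    }

    public static void main(String[] args) throws Exception {
        startServer();             // in the lab, server and client run on different machines,
        callServer("localhost");   // so the client would use the server's IP (e.g. 172.17.69.YYY)
    }
}
```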

Page 9:

Eight Fallacies (to avoid)

1. The network is reliable
2. Latency is zero
3. Bandwidth is infinite
4. The network is secure
5. Topology doesn't change
6. There is one administrator
7. Transport cost is zero
8. The network is homogeneous

What about the system we built in the lab?

Page 10:

LET’S THINK ABOUT LAB 3

Page 11:

Using Java RMI to count trigrams …

[Diagram: the same Client / Registry / Server / Directory setup as before, now used to count trigrams]
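The RMI plumbing is the same as in the earlier sketch; the part specific to the lab task is the counting itself. Below is an illustrative word-trigram counter and a merge step for combining partial counts from several servers. This is a guess at the shape of the task, not the lab's actual code.

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative trigram counting (sequences of three consecutive words).
public class TrigramCount {

    // Count each sequence of three consecutive words in the input.
    static Map<String, Integer> countTrigrams(String[] words) {
        Map<String, Integer> counts = new HashMap<>();
        for (int i = 0; i + 2 < words.length; i++) {
            String trigram = words[i] + " " + words[i + 1] + " " + words[i + 2];
            counts.merge(trigram, 1, Integer::sum);
        }
        return counts;
    }

    // Merge the partial counts returned by several servers into a global count.
    static Map<String, Integer> merge(Map<String, Integer> a, Map<String, Integer> b) {
        Map<String, Integer> out = new HashMap<>(a);
        b.forEach((k, v) -> out.merge(k, v, Integer::sum));
        return out;
    }

    public static void main(String[] args) {
        String[] text = "a b c a b c a b".split(" ");
        System.out.println(countTrigrams(text));  // {a b c=2, b c a=2, c a b=2}
    }
}
```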

Page 12:

LIMITATIONS OF DISTRIBUTED COMPUTING: CAP THEOREM

Page 13:

But first … ACID

For traditional (non-distributed) databases …

1. Atomicity: transactions are all or nothing; fail cleanly
2. Consistency: doesn't break constraints/rules
3. Isolation: parallel transactions act as if sequential
4. Durability: system remembers changes

Have you heard of ACID guarantees in a database class?
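As a reminder of what atomicity looks like on a single traditional database, here is a hedged JDBC sketch: either both updates commit together or, on any failure, both are rolled back. The JDBC URL, table and columns are hypothetical, and a driver (e.g. H2) would need to be on the classpath.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.SQLException;
import java.sql.Statement;

// Sketch of an atomic transaction on a single (non-distributed) database.
public class TransferExample {
    public static void main(String[] args) {
        try (Connection con = DriverManager.getConnection("jdbc:h2:mem:test")) {
            con.setAutoCommit(false);                 // start a transaction
            try (Statement st = con.createStatement()) {
                st.executeUpdate("UPDATE account SET balance = balance - 100 WHERE id = 1");
                st.executeUpdate("UPDATE account SET balance = balance + 100 WHERE id = 2");
                con.commit();                         // both updates become durable together ...
            } catch (SQLException e) {
                con.rollback();                       // ... or neither is applied (atomicity)
            }
        } catch (SQLException e) {
            System.out.println("could not open the database: " + e.getMessage());
        }
    }
}
```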

Page 14:

What is CAP?

Three guarantees a distributed system could make:

1. Consistency: all nodes have a consistent view of the system
2. Availability: every read/write is acted upon
3. Partition-tolerance: the system works even if messages are lost

Page 15:

A Distributed System (Replication)

Page 16:

Consistency

[Diagram: both replicas report the same value: "There's 891 users in 'M'"]

Page 17:

Availability

[Diagram: a client asks "How many users start with 'M'?" and receives the answer 891]

Page 18:

Partition-Tolerance

[Diagram: despite lost messages between the replicas, a client asking "How many users start with 'M'?" still receives the answer 891]

Page 19:

The CAP Question

Can a distributed system guarantee consistency (all nodes have the same up-to-date view), availability (every read/write is acted upon) and partition-tolerance (the system works even if messages are lost) at the same time?

What do you think?

Page 20:

The CAP Answer

Page 21:

The CAP "Proof"

[Diagram: one replica is updated to "There's 892 users in 'M'" while, because of a partition, the other replica still reports "There's 891 users in 'M'" and answers a client's "How many users start with 'M'?" with 891]

Page 22:

The CAP "Proof" (in boring words)

• Consider machines m1 and m2 on either side of a partition:
  – If an update is allowed on m2 (Availability), then m1 cannot see the change (loses Consistency)
  – To make sure that m1 and m2 have the same, up-to-date view (Consistency), neither m1 nor m2 can accept any requests/updates (loses Availability)
  – Thus, only when m1 and m2 can communicate (lose Partition tolerance) can Availability and Consistency be guaranteed

Page 23:

The CAP Theorem

A distributed system cannot guarantee consistency (all nodes have the same up‐to‐date view), availability (every read/write is acted upon) and partition‐tolerance (the system works even if messages are lost) at the same time.

("Proof" as shown on the previous slide)

Page 24:

The CAP Triangle

[Diagram: a triangle with C, A and P at its corners: "Choose Two"]

Page 25:

CAP Systems

[Diagram: C, A and P as three sets with no common intersection]

CA: Guarantees to give a correct response, but only while the network works fine (Centralised / Traditional)

CP: Guarantees responses are correct even if there are network failures, but the response may fail (Weak availability)

AP: Always provides a "best-effort" response, even in the presence of network failures (Eventual consistency)

Page 26:

CA System

[Diagram: an update brings both replicas from "There's 891 users in 'M'" to "There's 892 users in 'M'"; a client asking "How many users start with 'M'?" is answered 892]

Page 27:

CP System

[Diagram: both replicas keep the consistent value "There's 891 users in 'M'"; a client asking "How many users start with 'M'?" is answered 891]

Page 28:

AP System

[Diagram: one replica has been updated to "There's 892 users in 'M'" while the other still reports "There's 891 users in 'M'"; a client asking "How many users start with 'M'?" receives the stale answer 891]

Page 29:

BASE (AP)

• Basically Available: pretty much always "up"
• Soft State: replicated, cached data
• Eventual Consistency: stale data tolerated, for a while

In what way was Twitter operating under BASE‐like conditions?

Page 30:

The CAP Theorem

• C, A in CAP ≠ C, A in ACID

• Simplified model:
  – Partitions are rare
  – Systems may be a mix of CA/CP/AP
  – C/A/P often continuous in reality!

• But concept useful/frequently discussed:
  – How to handle Partitions?
    • Availability? or
    • Consistency?

Page 31:

CONSENSUS

Page 32:

Consensus

• Goal: Build a reliable distributed system from unreliable components
  – "stable replica" semantics: the distributed system as a whole acts as if it were a single functioning machine

• Core feature: the system, as a whole, is able to agree on values (consensus)
  – Value may be:
    • Client inputs (what to store, what to process, what to return)
    • Order of execution
    • Internal organisation (e.g., who is leader)
    • …

Page 33:

Consensus

[Diagram: the replicas must agree on the value "There's 891 users in 'M'"]

Under what conditions is consensus (im)possible?

Page 34:

Lunch Problem

[Diagram: Bob, Alice and Chris]

10:30AM. Alice, Bob and Chris work in the same city. All three have agreed to go downtown for lunch today but have yet to decide on a place and a time.

Page 35:

CAP Systems (for example …)

[Diagram: C, A and P as three sets with no common intersection]

CA: They are guaranteed to go to the same place for lunch as long as each of them can be reached in time.

CP: If someone cannot be reached in time, they either all go to the same place for lunch or nobody goes for lunch.

AP: If someone cannot be reached in time, they all go for lunch downtown but might not end up at the same place.

But how easily they can reach consensus depends on how they communicate!

Page 36:

SYNCHRONOUS VS. ASYNCHRONOUS

Page 37:

Synchronous vs. Asynchronous

• Synchronous distributed system:
  – Messages expected by a given time
    • E.g., a clock tick
  – Missing message has meaning

• Asynchronous distributed system:
  – Messages can arrive at any time
  – Missing message could arrive any time!

Page 38:

Asynchronous Consensus: Texting

10:45 AM. Alice tries to invite Bob for lunch …

Alice: "Hey Bob, want to go downtown to McDonald's for lunch at 12:00AM?"

11:35 AM. No response. Should Alice head downtown?

Page 39:

Asynchronous Consensus: Texting

10:45 AM. Alice tries to invite Bob for lunch …

Alice: "Hey Bob, want to go downtown to McDonald's for lunch at 12:00AM?"
Bob: "Hmm … I don't like McDonald's much. How about Dominos instead?"

11:42 AM. No response. Where should Bob go?

Page 40:

Asynchronous Consensus: Texting

10:45 AM. Alice tries to invite Bob for lunch …

Alice: "Hey Bob, want to go downtown to McDonald's for lunch at 12:00AM?"
Bob: "Hmm … I don't like McDonald's much. How about Dominos instead?"
Alice: "Okay, let's go to Dominos."

11:38 AM. No response. Did Bob see the acknowledgement?

Page 41:

Asynchronous Consensus

• Impossible to guarantee!
  – A message delay can happen at any time and a node can wake up at the wrong time!
  – Fischer-Lynch-Paterson (1985): No consensus can be guaranteed amongst working nodes if there is even a single failure

• But asynchronous consensus can happen
  – As you should realise if you've ever successfully organised a meeting by email or text ;)

Page 42:

Asynchronous Consensus: Texting

10:45 AM. Alice tries to invite Bob for lunch …

Alice: "Hey Bob, want to go downtown to McDonald's for lunch at 12:00AM?"
Bob: "Hmm … I don't like McDonald's much. How about Dominos instead?"
Alice: "Okay, let's go to Dominos."

11:38 AM. No response: Bob's battery died. Alice misses the train downtown waiting for a message and heads to the cafeteria at work instead. Bob charges his phone …

Bob: "Heading to Dominos now. See you there!"

Page 43:

Asynchronous Consensus: Texting

How could Alice and Bob find consensus on a time and place to meet for lunch?

Page 44:

Synchronous Consensus: Telephone

10:45 AM. Alice tries to invite Bob for lunch …

Alice: "Hey Bob, want to go downtown to McDonald's for lunch at 12:00AM?"
Bob: "How about a completo at Domino's instead?"
Alice: "Okay. 12:00AM?"
Bob: "Yep!"
Alice: "See you then!"

10:46 AM. Clear consensus!

Page 45:

Synchronous Consensus

• Can be guaranteed!
  – But only under certain conditions …

What is the core difference between reaching consensus in synchronous (phone call) vs. asynchronous (texting or email) scenarios?

Page 46:

Synchronous Consensus: Telephone

10:45 AM. Alice tries to invite Bob for lunch …

Alice: "Hey Bob, want to go downtown to McDonald's for lunch at 12:00AM?"
Bob: "How about a completo …" (beep, beep, beep)
Alice: "Hello?"

10:46 AM. What's the protocol?

Page 47:

From asynchronous to synchronous

• Agree on a timeout Δ
  – Any message not received within Δ = failure
  – If a message arrives after Δ, the system returns to being asynchronous
• If Δ is unbounded, the system is asynchronous
• May need a large value for Δ in practice

How could we (in some cases) turn an asynchronous system into a synchronous system?

Page 48:

Eventually synchronous

• Eventually synchronous: assumes most messages will return within time Δ
  – More precisely, the number of messages that do not return in Δ is bounded
  – If a message does not return in time Δ and we keep retrying, eventually it will return in time Δ
• We don't need to set Δ so high
• True in many practical systems

Why might consensus be easier in an eventually synchronous system?
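One common way to impose the timeout Δ in practice is at the socket level. The sketch below (illustrative, not from the lecture) treats any read that takes longer than Δ as a failure and retries a bounded number of times, matching the eventually-synchronous assumption that a retried message will eventually arrive within Δ. Host, port and constants are invented.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;
import java.net.SocketTimeoutException;

// Sketch: treat any message not received within DELTA as a failure, and retry a bounded number of times.
public class TimeoutCall {

    static final int DELTA_MS = 2000;   // the agreed timeout Δ
    static final int MAX_RETRIES = 3;

    // Returns the first byte sent by the server, or -1 if every attempt failed or timed out.
    static int requestWithTimeout(String host, int port) {
        for (int attempt = 1; attempt <= MAX_RETRIES; attempt++) {
            try (Socket socket = new Socket()) {
                socket.connect(new InetSocketAddress(host, port), DELTA_MS);
                socket.setSoTimeout(DELTA_MS);         // reads now fail after Δ
                return socket.getInputStream().read(); // blocks for at most Δ
            } catch (SocketTimeoutException e) {
                System.out.println("attempt " + attempt + ": no reply within Δ, retrying");
            } catch (IOException e) {
                System.out.println("attempt " + attempt + ": failed (" + e.getMessage() + ")");
            }
        }
        return -1;  // declared a failure after MAX_RETRIES attempts
    }

    public static void main(String[] args) {
        System.out.println("result: " + requestWithTimeout("localhost", 9999));
    }
}
```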

Page 49:

FAULT TOLERANCE: FAIL–STOP VS. BYZANTINE

Page 50:

Faults

Page 51:

Fail–Stop Fault

• A machine fails to respond or times out (often hardware or load)
• Need at least f+1 replicated machines? (beware asynch.!)
  – f = number of clean failures

[Diagram: a replicated WordCount job; the working replicas return de 4.575.144, la 2.160.185, en 2.073.216, el 1.844.613, y 1.479.936, while one machine fails to respond]

Page 52:

Byzantine Fault

• A machine responds incorrectly/maliciously (often software)
• Need at least 2f+1 replicated machines?
  – f = number of (possibly Byzantine) failures

[Diagram: a replicated WordCount job; two replicas return de 4.575.144, la 2.160.185, en 2.073.216, el 1.844.613, y 1.479.936, while a Byzantine replica returns el 4.575.144, po 2.160.185, sé 2.073.216, ni 1.844.613, al 1.479.936]

How many replicated machines do we need to guarantee tolerance to f Byzantine faults?

Page 53:

Fail–Stop/Byzantine

• Naively:
  – Need f+1 replicated machines for fail–stop
  – Need 2f+1 replicated machines for Byzantine

• Not so simple if nodes must agree beforehand!

• Replicas must have consensus to be useful!
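A hedged sketch of why 2f+1 replicas are the naive answer for f Byzantine faults: if every request is sent to all replicas and we only trust an answer reported by a majority (at least f+1 of the 2f+1), then f faulty machines can never outvote the correct ones. The replica answers below are made-up WordCount-style outputs.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;

// Sketch: tolerate f Byzantine replicas by asking 2f+1 replicas and
// trusting only an answer reported by a majority (>= f+1 of them).
public class MajorityVote {

    static Optional<String> majority(List<String> answers, int f) {
        Map<String, Integer> votes = new HashMap<>();
        for (String a : answers) {
            votes.merge(a, 1, Integer::sum);
        }
        return votes.entrySet().stream()
                .filter(e -> e.getValue() >= f + 1)   // needs at least f+1 matching answers
                .map(Map.Entry::getKey)
                .findFirst();
    }

    public static void main(String[] args) {
        int f = 1;  // one possibly-Byzantine replica, so 2f+1 = 3 replicas
        List<String> answers = List.of(
                "de=4575144",   // correct replica
                "el=4575144",   // Byzantine replica answering incorrectly
                "de=4575144");  // correct replica
        System.out.println(majority(answers, f).orElse("no majority"));  // de=4575144
    }
}
```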

Page 54:

CONSENSUS GUARANTEES

Page 55:

Consensus Guarantees

• Under certain assumptions, for example:
  – synchronous, eventually synchronous, asynchronous
  – fail-stop, Byzantine
  – no failures, one node fails, less than half fail

… there are methods to provide consensus with certain guarantees

Page 56:

A Consensus Protocol

• Agreement/Consistency [Safety]: All working nodes agree on the same value. Anything agreed is final!

• Validity/Integrity [Safety]: Every working node decides at most one value. That value has been proposed by a working node.

• Termination [Liveness]: All working nodes eventually decide (after finite steps).

• Safety: Nothing bad ever happens
• Liveness: Something good eventually happens

Page 57:

A Consensus Protocol for Lunch

• Agreement/Consistency [Safety]: Everyone agrees on the same place downtown for lunch, or agrees not to go downtown.

• Validity/Integrity [Safety]: Agreement involves a place someone actually wants to go.

• Termination [Liveness]: A decision will eventually be reached (hopefully before lunch).

Page 58:

CONSENSUS PROTOCOL: TWO-PHASE COMMIT

Page 59:

Two-Phase Commit (2PC)

• Coordinator & cohort members

• Goal: Either all cohorts commit to the same value or no cohort commits to anything

• Assumes synchronous, fail-stop behaviour
  – Crashes are known!

Page 60:

Two-Phase Commit (2PC)

1. Voting:

Coordinator: "I propose McDonalds! Is that okay?"
Cohort 1: "Yes!"
Cohort 2: "Yes!"

Page 61:

Two-Phase Commit (2PC)

2. Commit:

Coordinator: "I have two yeses! Please commit."
Cohort 1: "Committed!"
Cohort 2: "Committed!"

Page 62:

Two-Phase Commit (2PC) [Abort]

1. Voting:

Coordinator: "I propose McDonalds! Is that okay?"
Cohort 1: "Yes!"
Cohort 2: "No!"

Page 63:

Two-Phase Commit (2PC) [Abort]

2. Commit:

Coordinator: "I don't have two yeses! Please abort."
Cohort 1: "Aborted!"
Cohort 2: "Aborted!"

Page 64:

Two‐Phase Commit (2PC)

1. Voting: A coordinator proposes a commit value. The other nodes vote “yes” or “no” (they cannot propose a new value!).

2. Commit: The coordinator counts the votes. If all are “yes”, the coordinator tells the nodes to accept (commit) the answer. If one is “no”, the coordinator aborts the commit.

• For n nodes, in the order of 4n messages:
  – 2n messages to propose value and receive votes
  – 2n messages to request commit and receive acks
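A minimal in-process sketch of the two phases just described (illustrative only; a real deployment would exchange these messages over the network and handle timeouts and failures, which is exactly where 2PC breaks down, as the next slides show).

```java
import java.util.List;
import java.util.function.Predicate;

// Illustrative in-process sketch of 2PC: voting, then commit or abort.
public class TwoPhaseCommit {

    // A cohort member votes on a proposed value and later applies the decision.
    static class Cohort {
        final String name;
        final Predicate<String> acceptable;
        Cohort(String name, Predicate<String> acceptable) {
            this.name = name;
            this.acceptable = acceptable;
        }
        boolean vote(String value) { return acceptable.test(value); }   // phase 1
        void decide(boolean commit) {                                   // phase 2
            System.out.println(name + ": " + (commit ? "Committed!" : "Aborted!"));
        }
    }

    // The coordinator proposes a value; all "yes" => commit, any "no" => abort.
    static boolean coordinate(String value, List<Cohort> cohorts) {
        boolean allYes = cohorts.stream().allMatch(c -> c.vote(value));  // 2n messages
        for (Cohort c : cohorts) {
            c.decide(allYes);                                            // 2n more messages
        }
        return allYes;
    }

    public static void main(String[] args) {
        List<Cohort> cohorts = List.of(
                new Cohort("Bob",   v -> true),                    // says yes to anything
                new Cohort("Chris", v -> !v.equals("McDonalds"))); // vetoes McDonalds
        coordinate("McDonalds", cohorts);   // aborted
        coordinate("Dominos", cohorts);     // committed
    }
}
```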

Page 65:

Two-Phase Commit (2PC)

What happens if the coordinator fails?
• Cohort members know the coordinator has failed!

[Diagram: the coordinator says "I have two yeses! Please commit." and then fails; one cohort has already answered "Committed!"; the other cohort asks it "Did you commit or abort?" and is told "Commit!"]

Page 66:

Two-Phase Commit (2PC)

What happens if a coordinator and a cohort fail? Not fault-tolerant!

[Diagram: the coordinator says "I have two yeses! Please commit!" and then fails, along with the cohort that answered "Committed!"; the remaining cohort is left asking "Did the other cohort commit or abort?"]

Page 67:

Two-Phase Commit (2PC)

What happens if there's a partition? Not fault-tolerant!

[Diagram: the coordinator says "I have two yeses! Please commit!"; one cohort answers "Committed!", but the other is cut off by a partition and is left asking "Should I commit or abort?"]

Page 68:

CONSENSUS PROTOCOL: THREE-PHASE COMMIT

Page 69:

Three-Phase Commit (3PC)

1. Voting:

Coordinator: "I propose McDonalds! Is that okay?"
Cohort 1: "Yes!"
Cohort 2: "Yes!"

Page 70:

Three-Phase Commit (3PC)

2. Prepare:

Coordinator: "I have two yeses! Prepare to commit."
Cohort 1: "Prepared to commit!"
Cohort 2: "Prepared to commit!"

Page 71:

Three-Phase Commit (3PC)

3. Commit:

Coordinator: "Everyone is prepared. Please commit."
Cohort 1: "Committed!"
Cohort 2: "Committed!"

Page 72:

Three-Phase Commit (3PC)

1. Voting: (As before for 2PC)
2. Prepare: If all votes agree, the coordinator sends and receives acknowledgements for a "prepare to commit" message
3. Commit: If all acknowledgements are received, the coordinator sends the "commit" message

• For n nodes, in the order of 6n messages:
  – 4n messages as for 2PC
  – +2n messages for "prepare to commit" + "ack."
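The difference from 2PC is easiest to see in the state a cohort member is in when something fails: because "prepared" is recorded before anyone commits, a surviving cohort that sees any peer in the prepared (or committed) state knows the decision was commit. A minimal, illustrative sketch of the cohort-side state machine (not from the slides):

```java
// Illustrative cohort-side state machine for 3PC.
public class ThreePhaseCohort {

    enum State { INIT, VOTED_YES, PREPARED, COMMITTED, ABORTED }

    private State state = State.INIT;

    void onVoteRequest(boolean accept) { state = accept ? State.VOTED_YES : State.ABORTED; }

    void onPrepare() { if (state == State.VOTED_YES) state = State.PREPARED; }

    void onCommit()  { if (state == State.PREPARED)  state = State.COMMITTED; }

    State state()    { return state; }

    public static void main(String[] args) {
        ThreePhaseCohort cohort = new ThreePhaseCohort();
        cohort.onVoteRequest(true);   // 1. voting
        cohort.onPrepare();           // 2. prepare to commit
        cohort.onCommit();            // 3. commit
        System.out.println(cohort.state());  // COMMITTED
    }
}
```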

Page 73:

Three-Phase Commit (3PC)

What happens if the coordinator fails?

[Diagram: the coordinator says "Everyone is prepared. Please commit!" and then fails. Both cohorts had answered "Prepared to commit!"; one asks the other "Is everyone else prepared to commit?", gets "Yes!", and both decide "Okay! Committing!"]

Page 74:

Three-Phase Commit (3PC)

What happens if the coordinator and a cohort member fail?
• The rest of the cohort know whether to abort/commit!

[Diagram: both cohorts had answered "Prepared to commit!"; after the coordinator and one cohort fail, the remaining cohort knows "It's a commit!" and decides "Okay! Committing!"]

Page 75:

Two-Phase vs. Three-Phase

• In 2PC, in case of failure, one cohort may already have committed/aborted while another cohort doesn't even know if the decision is commit or abort!

• In 3PC, this is not the case!

Did you spot the difference?

Page 76:

3PC useful to avoid locking

Page 77:

Two/Three-Phase Commits

• Assumes synchronous behaviour!
• Assumes knowledge of failures!
  – Cannot be guaranteed if there's a network partition!

• Assumes fail–stop errors

Page 78:

How to decide the leader?

We need a leader for consensus … so what if we need consensus for a leader?

Page 79:

CONSENSUS PROTOCOL: PAXOS

Page 80:

Turing Award: Leslie Lamport

• One of his contributions: PAXOS

Page 81:

PAXOS Phase 1a: Prepare

• A coordinator proposes with a number n

[Diagram: the proposer broadcasts "I wish to lead a proposal! (72)" and stores the number 72]

Page 82:

PAXOS Phase 1b: Promise

• By saying "okay", a cohort agrees to reject lower numbers

[Diagram: a member replies "Okay (72)! I accept and will reject proposals below 72." to "I wish to lead a proposal! (72)"; a competing "I wish to lead a proposal! (23)" is rejected with "Sorry! 72>23!". Members store 72]

Page 83:

PAXOS Phase 1a/b: Prepare/Promise

• This continues until a majority agree and a leader for the round is chosen …

[Diagram: the members reply "Okay (72)! I accept and will reject proposals below 72." to "I wish to lead a proposal! (72)"; all store 72]

Page 84:

PAXOS Phase 2a: Accept Request

• The leader must now propose the value to be voted on this round …

[Diagram: the leader asks "McDonalds? (72)"; members have 72 stored]

Page 85:

PAXOS Phase 2b: Accepted

• Nodes will accept if they haven't seen a higher request and acknowledge …

[Diagram: the members reply "Okay (72)!" to the request "McDonalds? (72)"]

Page 86:

PAXOS Phase 3: Commit

• If a majority pass the proposal, the leader tells the cohort members to commit …

[Diagram: the leader sends "Commit!" to the members, who all have 72 stored]

Page 87:

PAXOS Round

1A Prepare: "I'll lead with id n?"
1B Promise: "Okay: n is highest we've seen" … wait for a majority
2A Accept Request: "I propose 'v' with n" (the leader proposes)
2B Accepted: "Okay, 'v' sounds good" … wait for a majority
3A Commit: "We're agreed on 'v'"
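A hedged sketch of the acceptor ("cohort member") side of a single round, following the phases above: promise to ignore lower proposal numbers, then accept a value only if no higher number has been promised since. This is a simplification of classical Paxos (it omits reporting any previously accepted value back in the promise), and all names are invented.

```java
import java.util.Optional;

// Simplified acceptor for one Paxos round (phases 1b and 2b above).
public class PaxosAcceptor {

    private long promisedId = -1;      // highest proposal number promised so far
    private String acceptedValue = null;

    // Phase 1b: promise to reject anything below n, if n is the highest number seen.
    synchronized boolean promise(long n) {
        if (n > promisedId) {
            promisedId = n;            // "Okay (n)! I will reject proposals below n."
            return true;
        }
        return false;                  // "Sorry! promisedId > n"
    }

    // Phase 2b: accept the value if no higher number has been promised since.
    synchronized boolean accept(long n, String value) {
        if (n >= promisedId) {
            promisedId = n;
            acceptedValue = value;     // "Okay (n)!"
            return true;
        }
        return false;
    }

    synchronized Optional<String> accepted() { return Optional.ofNullable(acceptedValue); }

    public static void main(String[] args) {
        PaxosAcceptor acceptor = new PaxosAcceptor();
        System.out.println(acceptor.promise(72));             // true: 72 is the highest id seen
        System.out.println(acceptor.promise(23));             // false: already promised 72
        System.out.println(acceptor.accept(72, "McDonalds"));  // true: value accepted for id 72
        System.out.println(acceptor.accepted().orElse("none"));
    }
}
```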

Page 88:

PAXOS: No Agreement?

• If a majority cannot be reached, a new proposal is made with a higher number (by another member)

Page 89:

PAXOS: Failure Handling

• Leader is fluid: based on the highest ID the members have stored
  – If the Leader were fixed, PAXOS would be like 2PC
• Leader fails?
  – Another leader proposes with a higher ID
• Leader fails and recovers (asynchronous)?
  – Old leader superseded by new higher ID
• Partition?
  – Requires a majority / when the partition is lifted, members must agree on the higher ID

Page 90:

PAXOS: Guarantees

• Validity/Integrity: (assumes fail-stop errors)
  – Value proposed by a leader

• Agreement/Consistency: (assumes fewer than half encounter errors and that all errors are fail-stop)
  – A value needs a majority to pass
  – Each member can only choose one value
  – Therefore only one agreed value can have a majority in the system!

Page 91:

PAXOS: Guarantees

• Termination/Liveness: (only if at least eventually synchronous)
  – Apply PAXOS in rounds based on the timeout Δ
  – If messages exceed Δ, retry in a later round

Page 92:

PAXOS variations

• Some steps in classical PAXOS are not always needed; variants have been proposed:
  – Cheap PAXOS / Fast PAXOS / Byzantine PAXOS …

Page 93:

PAXOS In‐Use

Chubby: “Paxos Made Simple”

Page 94:

RECAP

Page 95:

CAP Systems

[Diagram: C, A and P as three sets with no common intersection]

CA: Guarantees to give a correct response, but only while the network works fine (Centralised / Traditional)

CP: Guarantees responses are correct even if there are network failures, but the response may fail (Weak availability)

AP: Always provides a "best-effort" response, even in the presence of network failures (Eventual consistency)

Page 96:

Consensus for CP-systems

• Synchronous vs. Asynchronous
  – Synchronous less difficult than asynchronous
• Fail–stop vs. Byzantine
  – Byzantine typically software (arbitrary response)
  – Fail–stop gives no response

Page 97:

Consensus for CP-systems

• Two-Phase Commit (2PC)
  – Voting
  – Commit
• Three-Phase Commit (3PC)
  – Voting
  – Prepare
  – Commit

Page 98:

Consensus for CP-systems

• PAXOS:
  – 1a. Prepare
  – 1b. Promise
  – 2a. Accept Request
  – 2b. Accepted
  – 3. Commit

Page 99:

Questions?