Intro to Distributed Systems Concepts

Jordan Halterman

INTRODUCTION

• What is a distributed system?

• Anatomy of a distributed system

• Fallacies of distributed computing

WHAT IS A DISTRIBUTED SYSTEM?

A collection of independent computers that appear to users as a single coherent system

“A collection of independent computers that appear to the users of the system as a single computer”

— Andrew Tanenbaum



“You know you have a distributed system when the crash of a computer you’ve never heard of stops you from getting any work done”

— Leslie Lamport



• Scalability and fault tolerance

• Memory, disk, and CPU are finite resources

• Computers crash and networks fail

• Science hasn’t kept up with technological needs



BUT DISTRIBUTED SYSTEMS ARE HARD!

THE TWO GENERALS PROBLEM

• Two generals on opposite sides of a valley must coordinate to decide when to attack

• Each general must be sure the other made the same decision

• Generals can only communicate through messages

• Messengers sent through the valley can be captured
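The difficulty can be sketched with a small simulation. This is only an illustration under assumed names (`deliver_messages`, `known_deliveries`) and an assumed loss probability: however many acknowledgements are exchanged, whoever sent the last message cannot know whether it arrived.

```python
import random


def deliver_messages(max_messages, loss_probability, rng):
    """Send alternating messages until a messenger is captured.

    Message 1 carries the plan from general A to general B, message 2 is
    B's acknowledgement, message 3 acknowledges the acknowledgement, and
    so on. Returns how many messages arrived before the first loss.
    """
    delivered = 0
    for _ in range(max_messages):
        if rng.random() < loss_probability:
            break  # messenger captured; the exchange stops here
        delivered += 1
    return delivered


def known_deliveries(delivered):
    """Return (sender_view, receiver_view) for the last delivered message.

    The receiver of message k has seen k messages arrive. Its sender can
    only be sure of the k - 1 messages before it, because nothing has
    come back yet to confirm message k.
    """
    if delivered == 0:
        return (0, 0)
    return (delivered - 1, delivered)


rng = random.Random(7)  # arbitrary seed, chosen only for repeatability
delivered = deliver_messages(max_messages=10, loss_probability=0.2, rng=rng)
sender_view, receiver_view = known_deliveries(delivered)
print(delivered, sender_view, receiver_view)
```

Adding more rounds only moves the uncertainty to a later message. It never removes it.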

ANATOMY OF A DISTRIBUTED SYSTEM

• Nodes

• Networks

• Protocols

NODES

• Each independent component of a distributed system is called a node

• Also known as a process, agent, or actor

• Operations within a node are fast

• Communication between nodes is slow

• Operations generally occur in order

SYSTEM MODEL

SYNCHRONOUS

• Bounded message delays

• Accurate global clock

• Easy to reason about

• You don’t have one

ASYNCHRONOUS

• Processes execute independently

• Unbounded message delays

• No global clock

• Difficult to reason about

• You have one

MESSAGE PASSING

• Nodes communicate via messages

• Examples: UDP, TCP, HTTP

FALLACIES OF DISTRIBUTED COMPUTING

• The network is reliable

• Latency is zero

• Bandwidth is infinite

• The network is secure

• Topology doesn’t change

• There is one administrator

• Transport cost is zero

• The network is homogeneous

FALLACY #1

THE NETWORK IS RELIABLE



• On average 5.2 devices and 40.8 links fail per day in Microsoft data centers

• The majority of Google’s outages that lasted more than 30 seconds were due to network maintenance or connectivity issues

• If network hardware doesn’t fail, software will

• We cannot rely on the network to deliver our communications



FALLACY #2

LATENCY IS ZERO



• Latency is the time it takes for a signal to travel from one computer to another

• Latency is bounded below by the speed of light

• It takes about 40 milliseconds for light to travel from New York to Paris and back

• The JVM executes billions of instructions per second
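The figures above can be checked with a little arithmetic. This is a rough sketch: the New York to Paris distance is an approximate great-circle value assumed here, and one billion instructions per second is taken as the low end of "billions".

```python
SPEED_OF_LIGHT_KM_PER_S = 299_792.458  # speed of light in a vacuum
NEW_YORK_TO_PARIS_KM = 5_837           # assumption: approximate great-circle distance
INSTRUCTIONS_PER_SECOND = 1e9          # assumption: low end of "billions"


def round_trip_ms(distance_km, speed_km_per_s=SPEED_OF_LIGHT_KM_PER_S):
    """Time in milliseconds for a signal to cover the distance twice."""
    return 2 * distance_km / speed_km_per_s * 1000


rtt_ms = round_trip_ms(NEW_YORK_TO_PARIS_KM)
instructions_per_round_trip = INSTRUCTIONS_PER_SECOND * rtt_ms / 1000

print(f"round trip: {rtt_ms:.1f} ms")
print(f"instructions executed meanwhile: {instructions_per_round_trip:,.0f}")
```

Under these assumptions, a single round trip costs tens of millions of local instructions. Real links are slower still, since light in optical fiber travels at roughly two-thirds of its vacuum speed and routes are rarely straight.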



FALLACY #3

BANDWIDTH IS INFINITE



• Bandwidth is roughly the amount of information that can be transmitted each second

• Networks are limited by hardware

• Applications are limited by software



FALLACY #4

THE NETWORK IS SECURE



• We see hacks of major corporations’ networks seemingly on a weekly basis

• In 2015, FoxGlove Security publicized a major vulnerability in Java’s serialization framework

• Allowing remote access to friendly users opens systems up to unfriendly ones


[Chart: data breaches since 2005]

FALLACY #5

TOPOLOGY DOESN’T CHANGE



• Administrators add and remove servers from networks

• We cannot depend on machines always being in the same place

• Service discovery and routing layers solve this problem



FALLACY #6

THERE IS ONE ADMINISTRATOR



• Production systems are often maintained and managed by numerous people

• Multiple administrators may institute conflicting policies



FALLACY #7

TRANSPORT COST IS ZERO



• Local processing is cheap

• Network communication is expensive

• Latency and bandwidth ensure transport cost is never zero



FALLACY #8

THE NETWORK IS HOMOGENEOUS



• Applications must be designed to work in a variety of environments

• Wired networks

• Wireless networks

• Cellular networks

• Satellite networks


CONCEPTS

• Time in distributed systems

• Consistency in distributed systems

• Partitioning and replication

THE CAP THEOREM

TRADEOFFS IN DISTRIBUTED SYSTEMS

            CONSISTENCY        AVAILABILITY   PARTITION TOLERANCE
ZOOKEEPER   STRONG             QUORUM         YES
DYNAMO      EVENTUALLY STRONG  HIGH           YES
MYSQL       STRONG             HIGH           NO

ORDER IN DISTRIBUTED SYSTEMS

• Order is necessary to enforce causal relationships

• Two types of order in distributed systems

• Partial order: order of dependent events

• Total order: order of all events

• Single-threaded applications are totally ordered

TIME IN DISTRIBUTED SYSTEMS

• Time can be used to enforce order

• Time can be used to enforce bounds on communications

• But each node’s clock progresses independently in an asynchronous system

• Physical clocks drift relative to one another

• Even NTP can only synchronize clocks to within a few milliseconds of each other



LAMPORT CLOCKS

• “Time, Clocks, and the Ordering of Events in a Distributed System”

• Developed by Leslie Lamport in 1978

• One of the seminal papers in distributed systems

• Determines partial ordering of events in a distributed system

• Also referred to as logical clocks
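A Lamport clock can be sketched in a few lines. The class and method names below are illustrative choices, not taken from the paper: each process keeps a counter, increments it on every local event or send, and on receive jumps past the timestamp carried by the message.

```python
class LamportClock:
    """A logical clock for one process (illustrative names)."""

    def __init__(self):
        self.time = 0

    def tick(self):
        """Record a local event."""
        self.time += 1
        return self.time

    def send(self):
        """Timestamp an outgoing message; sending counts as an event."""
        return self.tick()

    def receive(self, message_time):
        """Advance past both the local time and the message's timestamp."""
        self.time = max(self.time, message_time) + 1
        return self.time


a = LamportClock()
b = LamportClock()

a.tick()                 # local event on A
stamp = a.send()         # A sends a message to B
b.receive(stamp)         # B's clock moves past the message's timestamp
reply = b.send()         # B replies
a.receive(reply)         # A's clock moves past the reply's timestamp
print(a.time, b.time)
```

If one event causally precedes another, its timestamp is smaller. The converse does not hold: a smaller timestamp alone does not prove a causal relationship.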

VECTOR CLOCKS

• “Timestamps in Message Passing Systems That Preserve the Partial Ordering” - Colin J. Fidge

• “Virtual Time and Global States of Distributed Systems” - Friedemann Mattern

• Independently developed by two researchers in 1988

• Determines causal ordering of events in a distributed system

• Closely related to version vectors
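A minimal sketch of vector clocks, representing each clock as a dictionary from node name to counter. The function names (`increment`, `merge`, `compare`) are assumptions made for this example. Unlike a single Lamport counter, comparing two vectors can tell causally ordered events apart from concurrent ones.

```python
def increment(clock, node):
    """Return a copy of the clock with this node's counter advanced."""
    updated = dict(clock)
    updated[node] = updated.get(node, 0) + 1
    return updated


def merge(a, b):
    """Take the element-wise maximum, as done when a message is received."""
    return {n: max(a.get(n, 0), b.get(n, 0)) for n in a.keys() | b.keys()}


def compare(a, b):
    """Classify a relative to b: 'before', 'after', 'equal', or 'concurrent'."""
    nodes = a.keys() | b.keys()
    a_le_b = all(a.get(n, 0) <= b.get(n, 0) for n in nodes)
    b_le_a = all(b.get(n, 0) <= a.get(n, 0) for n in nodes)
    if a_le_b and b_le_a:
        return "equal"
    if a_le_b:
        return "before"
    if b_le_a:
        return "after"
    return "concurrent"


a1 = increment({}, "a")                # event on node a
b1 = increment({}, "b")                # independent event on node b
b2 = increment(merge(b1, a1), "b")     # b receives a's message, then records it

print(compare(a1, b1))  # neither event knows about the other
print(compare(a1, b2))  # a1 is in b2's causal history
```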

CONSISTENCY

CONSISTENCY MODELS

• Linearizability

• Sequential consistency

• Causal consistency

• Eventual strong consistency

• Eventual consistency

MORE CONSISTENCY MODELS

• Monotonic read consistency

• Monotonic write consistency

• Read-your-writes consistency

• Writes follow reads consistency

• Serializability

PARTITIONING

• Split data across multiple machines

• Reduces the amount of data each node must handle

• Reduces the amount of network I/O for certain algorithms
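One simple way to split data is hash partitioning, sketched below. The function name `partition_for`, the key format, and the choice of SHA-256 are assumptions made for illustration, not a prescribed scheme.

```python
import hashlib


def partition_for(key, num_partitions):
    """Map a string key to a partition index in [0, num_partitions)."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    # Use the first 8 bytes as an integer so the result is stable across runs.
    return int.from_bytes(digest[:8], "big") % num_partitions


keys = [f"user-{i}" for i in range(1000)]  # hypothetical keys

# Count how many keys land on each of four partitions.
counts = [0, 0, 0, 0]
for key in keys:
    counts[partition_for(key, 4)] += 1
print(counts)

# Growing from four to five partitions moves most keys to a new home.
moved = sum(partition_for(k, 4) != partition_for(k, 5) for k in keys)
print(f"{moved} of {len(keys)} keys moved")
```

Each node handles roughly a quarter of the keys, but changing the partition count reshuffles most of them, a drawback that consistent hashing is designed to reduce.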



REPLICATION

• Sharing information to ensure consistency between redundant services

• Active replication: push

• Passive replication: pull

• Quorum-based

• Gossip

SYNCHRONOUS

• Nodes updated between the request and response

• Consistency over performance

ASYNCHRONOUS

• State persisted locally and replicated after response

• Performance over consistency
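The difference between the two modes can be sketched with an in-memory primary and replica. The classes and the `flush` method are hypothetical, and real systems replicate over a network rather than through direct assignment.

```python
class Replica:
    """Holds a copy of the primary's data."""

    def __init__(self):
        self.data = {}


class Primary:
    """Accepts writes and copies them to replicas (hypothetical sketch)."""

    def __init__(self, replicas, synchronous):
        self.data = {}
        self.replicas = replicas
        self.synchronous = synchronous
        self.backlog = []  # writes not yet copied to replicas

    def write(self, key, value):
        self.data[key] = value
        if self.synchronous:
            # Replicas are updated before the caller gets a response.
            for replica in self.replicas:
                replica.data[key] = value
        else:
            # Respond immediately; replication happens later.
            self.backlog.append((key, value))
        return "ok"

    def flush(self):
        """Copy any pending writes to the replicas."""
        for key, value in self.backlog:
            for replica in self.replicas:
                replica.data[key] = value
        self.backlog.clear()


sync_replica = Replica()
Primary([sync_replica], synchronous=True).write("x", 1)
print(sync_replica.data)   # already contains the write

async_replica = Replica()
async_primary = Primary([async_replica], synchronous=False)
async_primary.write("x", 1)
print(async_replica.data)  # still empty until flush() runs
async_primary.flush()
print(async_replica.data)
```

If the asynchronous primary crashed before `flush` ran, the acknowledged write would exist only in its local state, which is the data-loss risk traded for lower latency.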

TRADEOFFS IN DISTRIBUTED SYSTEMS

Approaches: PRIMARY-BACKUP, GOSSIP, 2PC, QUORUM

Properties: CONSISTENCY, TRANSACTIONS, LATENCY, THROUGHPUT, DATA LOSS, READ ONLY

Values: EVENTUAL, STRONG, LOW, HIGH, FULL, HIGH, FULL, LOCAL, SOME, READ ONLY, LOW, MEDIUM, NONE, READ/WRITE

GOSSIP

• Gossip is one of the simplest distributed communication algorithms

• Inspired by the gossip that takes place in human communication

• Each node periodically chooses a random set of neighbors with which to exchange information

• Information propagates through the system quickly

• Version vectors can be used to resolve conflicts in updates
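How quickly information spreads can be seen in a small simulation. This is a simplified push-only sketch: the function name, the fanout of three, and the cluster size are assumptions, and only nodes that already hold the update contact peers.

```python
import random


def gossip_rounds(num_nodes, fanout, rng):
    """Count rounds until an update starting at node 0 reaches every node.

    In each round, every informed node pushes the update to `fanout`
    peers chosen at random (a node may happen to pick itself).
    """
    informed = {0}
    rounds = 0
    while len(informed) < num_nodes:
        newly_informed = set()
        for _ in informed:
            newly_informed.update(rng.sample(range(num_nodes), fanout))
        informed |= newly_informed
        rounds += 1
    return rounds


rng = random.Random(42)  # arbitrary seed, chosen only for repeatability
print(gossip_rounds(num_nodes=1000, fanout=3, rng=rng))
```

The number of informed nodes grows roughly geometrically at first, so the round count stays small compared with the number of nodes.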

CONSISTENT HASHING

• Map each object to a point on the edge of a circle

• Map each machine to a pseudo-random point on the same circle

• To find the node on which an object is stored, find the location of the object on the edge of the circle and walk around the circle until the first node is found
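The three steps above can be sketched as a small hash ring. The class name `HashRing`, the node names, and the use of SHA-256 to place points on the circle are assumptions made for this example. Production rings usually also place several virtual points per machine, which is omitted here.

```python
import bisect
import hashlib


def _point(value):
    """Map a string to a position on the circle (a 64-bit integer)."""
    digest = hashlib.sha256(value.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


class HashRing:
    def __init__(self, nodes=()):
        self._points = []   # sorted node positions on the circle
        self._owners = {}   # position -> node name
        for node in nodes:
            self.add(node)

    def add(self, node):
        point = _point(node)
        bisect.insort(self._points, point)
        self._owners[point] = node

    def remove(self, node):
        point = _point(node)
        self._points.remove(point)
        del self._owners[point]

    def lookup(self, key):
        """Walk from the key's position to the first node at or after it."""
        if not self._points:
            raise LookupError("ring has no nodes")
        index = bisect.bisect_left(self._points, _point(key))
        if index == len(self._points):
            index = 0  # wrap around the circle
        return self._owners[self._points[index]]


ring = HashRing(["node-a", "node-b", "node-c"])
keys = [f"object-{i}" for i in range(1000)]
before = {key: ring.lookup(key) for key in keys}

ring.add("node-d")
after = {key: ring.lookup(key) for key in keys}
moved = [key for key in keys if before[key] != after[key]]
print(f"{len(moved)} of {len(keys)} keys moved, all to node-d")
```

Adding a machine only takes over keys from its neighbor on the circle, so every key that moves ends up on the new node and the rest stay put.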



FAILURE DETECTION

• Failure detectors are characterized in terms of completeness and accuracy

• In a synchronous system, failure detection is solvable

• Certain problems are not solvable without failure detection in an asynchronous system

• A partitioned process is indistinguishable from a crashed process

• Thus reliable failure detection is impossible in an asynchronous system

• In practice, failure detection is usually based on timeouts
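A time-based detector can be sketched as follows. The class name, the five-second timeout, and passing the current time explicitly (to keep the example deterministic) are all assumptions made for illustration.

```python
class HeartbeatFailureDetector:
    """Suspects any node whose last heartbeat is older than the timeout."""

    def __init__(self, timeout):
        self.timeout = timeout
        self.last_seen = {}  # node name -> time of most recent heartbeat

    def heartbeat(self, node, now):
        self.last_seen[node] = now

    def suspected(self, now):
        return {
            node
            for node, seen in self.last_seen.items()
            if now - seen > self.timeout
        }


detector = HeartbeatFailureDetector(timeout=5.0)
detector.heartbeat("a", now=0.0)
detector.heartbeat("b", now=0.0)
detector.heartbeat("a", now=4.0)

print(detector.suspected(now=6.0))  # b has been silent for longer than 5 seconds

detector.heartbeat("b", now=7.0)    # b was only slow or partitioned, not crashed
print(detector.suspected(now=8.0))
```

The second heartbeat from `b` shows why such a detector can only suspect a failure: a late message looks the same as a crash until it arrives.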


LEADER ELECTION

• The process of selecting a single node to coordinate a cluster

• Difficult to account for failures

• Electing a leader allows a single process to control a cluster

• Frequently used in consensus algorithms

• But a single leader can limit throughput


BULLY ALGORITHM
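In the bully algorithm, a node that starts an election challenges every node with a higher ID. Any live higher node answers and takes over the election, so the highest live ID ends up as leader. The sketch below simulates this in a single process. The function name and parameters are hypothetical, and real implementations exchange messages and rely on timeouts to decide that a node did not answer.

```python
def bully_election(initiator, all_ids, alive):
    """Return the leader chosen when `initiator` starts an election.

    all_ids: every node ID in the cluster
    alive:   the subset of IDs that currently respond
    """
    if initiator not in alive:
        raise ValueError("the initiator must be alive")
    to_run = [initiator]  # nodes that still need to hold an election
    started = set()
    leader = None
    while to_run:
        node = to_run.pop()
        if node in started:
            continue
        started.add(node)
        # Challenge all higher IDs; only live nodes answer.
        responders = [peer for peer in all_ids if peer > node and peer in alive]
        if responders:
            to_run.extend(responders)  # each responder holds its own election
        else:
            leader = node              # nobody higher answered, so this node wins
    return leader


cluster = [1, 2, 3, 4, 5]
print(bully_election(initiator=1, all_ids=cluster, alive={1, 2, 4}))
```

With nodes 3 and 5 down, node 4 is the highest ID that answers, so it becomes the leader.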

CONSENSUS

• Single-system view, shared state

• Key to building consistent storage systems


• Agreement: every correct process must decide on the same value

• Integrity: every correct process decides at most one value, and any value decided must have been proposed

• Termination: every correct process eventually decides some value

• Validity: if all correct processes propose the same value v, then all correct processes decide v


• “Impossibility of Distributed Consensus with One Faulty Process”, Fischer, Lynch, and Paterson

• Commonly referred to as the FLP impossibility result

• No deterministic algorithm can guarantee consensus in an asynchronous system in which even one process may fail

• In practice, consensus can be reached


ZooKeeper Atomic Broadcast: “ZooKeeper: Wait-free Coordination for Internet-scale Systems”, Hunt, Konar et al.

Viewstamped Replication: “Viewstamped Replication”, Brian M. Oki and Barbara H. Liskov

Raft: “In Search of an Understandable Consensus Algorithm”, Diego Ongaro and John Ousterhout

Paxos: “The Part-Time Parliament”, Leslie Lamport

“Paxos Made Simple”, Leslie Lamport

• Leader election

• Log replication

• Failure detection

• Log compaction

• Membership changes


NEXT TIME

Distributed systems in practice

THANK YOU!

Jordan Halterman
