Distributed Systems, Fall 2011: Gossip and Highly Available Services
Outline
• Quality of Service (QoS) vs. High availability
• Gossip
– Basic concept
– Architecture
• Cloud computing
Quality of Service
• Quality of Service means different things depending on context
– Low number of crashes / high uptime
– Messages delivered in “reasonable” time (e.g. live streaming data)
– Many more interpretations
• Service Level Agreements (SLAs)
• Compensation for broken SLAs
– E.g. accounting in Grids / Clouds
High availability
• Uptime does not imply availability!
• Some level of service is better than none
• We do not always need replicas with sequential consistency or linearizability properties
– “Fresh enough” rather than “the freshest”
• Many modern (huge) systems have similar requirements
Quick refresher
• Sequential consistency
– Interleaving of operations performed (as if) on a single copy of the object
– Consistent with program order of the invoking client (as opposed to real time in linearizability)
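The refresher above can be made concrete: a sequentially consistent execution is any interleaving of the clients' operations that respects each client's program order. A minimal sketch (names are illustrative) that enumerates exactly those legal interleavings:

```python
def interleavings(*clients):
    """Enumerate every interleaving of the clients' operation sequences
    that preserves each client's program order -- the executions a
    sequentially consistent service is allowed to (appear to) produce."""
    seqs = [list(c) for c in clients]
    if all(not s for s in seqs):
        return [()]
    result = []
    for i, s in enumerate(seqs):
        if s:
            rest = [t[:] for t in seqs]
            head = rest[i].pop(0)
            for tail in interleavings(*rest):
                result.append((head,) + tail)
    return result

# Two operations by client A, one by client B: three legal orders,
# all keeping a1 before a2.
print(interleavings(["a1", "a2"], ["b1"]))
```

Real time never enters the check; that is the difference from linearizability, which additionally constrains the order by when operations actually happened.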
Gossip
• Framework, Architecture, and/or Protocol
• “Lazy” synchronization between replica managers
– Eventual consistency
• Very fault-tolerant
– Crashes are OK
– Clients may contact any replica manager for any operation
– Clients keep track of their own state
Basic ideas of Gossip
• Consistent service over time
– Provide clients with data at least as recent as what they have already observed
– Clients can work with a different replica for each operation
• Relaxed inter-replica consistency
– Not sequential consistency in general
Basic ideas contd.
• Update operations
– Change some value in the system
– Accepted immediately (carried out later)
• Query operations
– Read value(s) from the system
– Block until the replica manager is able to respond to the client
– Need fresh enough data to respond
Instructor’s Guide for Coulouris, Dollimore and Kindberg Distributed Systems: Concepts and Design Edn. 4
© Pearson Education 2005
Figure 15.6: Query and update operations in a gossip service
[Diagram: clients issue operations through front ends (FEs) to replica managers (RMs). A query carries the FE's previous timestamp ("Query, prev") and returns a value with a new timestamp ("Val, new"); an update ("Update, prev") returns an update id. The RMs exchange gossip messages among themselves.]
Message ordering
• Causal update ordering
• Forced (total and causal) update ordering
• Immediate update ordering
– Applied in a consistent order relative to any other update at all replica managers, regardless of that other update's own ordering guarantee
• Tradeoff: consistency vs. cost
– Causal is cheap and easy
– Causal order for queries to a single RM
Message ordering example
• Discussion forum (bulletin board)
• Causal order for discussion threads– Preserves conversation structures
• Forced order for registration– Clear order of who joined when
• Immediate for unregistering– No messages sent to ex-subscriber
Front ends
• Much more intelligent than in active/passive replication!
• Clients always use front ends– Even for inter-client messages (allows causally related messages and information dissemination)
Queries
• Client state (vector-clock) included in the call
• The RM returns values that are at least as recent as the client's state
– If the client's state is more recent than the RM's, the RM either requests the missing updates or waits for them
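The freshness rule above reduces to a dominance test between vector timestamps. A minimal sketch in the spirit of the textbook's gossip protocol (class and attribute names are illustrative, not from any real implementation):

```python
def dominates(a, b):
    """True if vector timestamp a >= b in every component."""
    return all(x >= y for x, y in zip(a, b))

class ReplicaManager:
    """Toy replica manager holding a single value."""
    def __init__(self, n_replicas):
        self.value = None
        self.value_ts = (0,) * n_replicas  # timestamp of the value

    def query(self, prev):
        """Answer only if the value is at least as recent as the
        client's timestamp `prev`; a real RM would otherwise fetch
        the missing updates from its peers or block the query."""
        if dominates(self.value_ts, prev):
            return self.value, self.value_ts
        return None  # not fresh enough yet

rm = ReplicaManager(3)
rm.value, rm.value_ts = "v", (2, 5, 6)
print(rm.query((2, 4, 6)))   # fresh enough: ('v', (2, 5, 6))
print(rm.query((3, 0, 0)))   # RM lags in the first component: None
```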
Updates
• Causal order
• Each update has a unique identifier
• Client state is included in the call to support the message orderings
• Updates are never blocking, only enqueued
• Due to the ordering guarantees, a client can update (or query) any RM and get the same result
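The non-blocking update path above can be sketched as follows, assuming the textbook's timestamping scheme: the RM increments its own entry in its replica timestamp, stamps the update with the client's timestamp plus that entry, logs it, and returns immediately (names are illustrative):

```python
class ReplicaManager:
    """Toy update path: accept immediately, apply later."""
    def __init__(self, n, my_index):
        self.i = my_index
        self.replica_ts = [0] * n   # updates this RM has *seen*
        self.value_ts = [0] * n     # updates applied to the value
        self.log = []               # pending (timestamp, op) entries

    def update(self, op, prev):
        """Accept an update without blocking: stamp it, enqueue it in
        the log, and return an id the FE merges into its timestamp."""
        self.replica_ts[self.i] += 1
        ts = list(prev)
        ts[self.i] = self.replica_ts[self.i]
        self.log.append((tuple(ts), op))
        return tuple(ts)            # the update id is its timestamp

rm = ReplicaManager(3, 0)
uid = rm.update("append(x)", (0, 0, 0))
print(uid)           # (1, 0, 0): accepted and logged, not yet applied
print(len(rm.log))   # 1
```

Applying the logged update to the value is deferred until the ordering guarantees allow it, which is why the call itself never blocks.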
Replication phases
• Request
– Normally the FE sends to a single RM (use more for higher fault tolerance)
• Update response
• Coordination
– Apply updates once the ordering allows
– No explicit coordination: only gossip messages required
• Execution
• Query response
• Agreement
– Lazy: RMs can wait and send updates in batches
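The lazy agreement phase might be sketched as a pairwise log exchange; this is a simplification (a real RM consults its timestamp table to send a peer only the entries it is missing, rather than the whole log):

```python
def gossip_exchange(log_a, ts_a, log_b, ts_b):
    """One lazy exchange between two RMs (sketch): B receives A's
    batched log, keeps the entries it has not seen, and merges the
    replica timestamps component-wise."""
    merged_ts = tuple(max(x, y) for x, y in zip(ts_a, ts_b))
    new_for_b = [entry for entry in log_a if entry not in log_b]
    return log_b + new_for_b, merged_ts

# RM A has seen updates u1 and u2; RM B has only seen u1.
log_b, ts_b = gossip_exchange(
    [((1, 0), "u1"), ((1, 1), "u2")], (1, 1),   # A's log and timestamp
    [((1, 0), "u1")], (1, 0))                   # B's log and timestamp
print(log_b)   # B now also holds u2
print(ts_b)    # (1, 1)
```

Because the exchange is idempotent and commutative, it can run at any frequency and in any peer order without further coordination.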
Front end timestamp
• FE embeds own vector clock with each message
• Updates
– The RM infers the order relative to other updates
• Queries
– The RM returns the oldest data that is still fresh enough
– E.g. the FE is at (2,4,6) and the RM's value timestamp is (2,5,5): the RM lags in the third component, so it waits for the missing update; once its value timestamp reaches (2,5,6) it replies, and the FE merges the returned timestamp into its own, giving (2,5,6)
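The merge the FE performs on every reply is simply a component-wise maximum of the two vector timestamps; with the slide's numbers:

```python
def merge(a, b):
    """Component-wise max: the FE's timestamp after seeing a reply."""
    return tuple(max(x, y) for x, y in zip(a, b))

# FE at (2, 4, 6) combines with a reply timestamped (2, 5, 5).
print(merge((2, 4, 6), (2, 5, 5)))  # (2, 5, 6)
```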
Front end timestamps contd.
• Clients can communicate with each other
– This creates causal relationships between messages
– Lets FEs update their vector timestamps so that future queries to the Gossip service will return more up-to-date data
Figure 15.8: A gossip replica manager, showing its main state components
[Diagram: each replica manager keeps a value, a value timestamp, an update log (a replica log of ⟨OperationID, Update, Prev, FE⟩ entries), a replica timestamp, an executed operation table, and a timestamp table for the other replica managers. FEs submit updates; stable updates are applied to the value; gossip messages are exchanged with the other replica managers.]
Gossiping
• The architecture does not specify when and with which peers to gossip
• Delay depends on:
– Frequency and duration of partitions
– Frequency of gossip exchanges (application dependent)
– Peer-selection policy
• Random
• Deterministic
• Topological
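The peer-selection policies listed above could be sketched as follows (illustrative only; a topological policy would instead choose among the RM's neighbours in some overlay, which is omitted here):

```python
import random

def pick_peer(peers, policy="random", round_no=0):
    """Choose a gossip partner for one round (illustrative policies)."""
    if policy == "random":
        return random.choice(peers)          # uniform random partner
    if policy == "deterministic":
        return peers[round_no % len(peers)]  # e.g. round-robin
    raise ValueError(f"unknown policy: {policy}")

peers = ["rm1", "rm2", "rm3"]
print(pick_peer(peers, "deterministic", round_no=4))  # rm2
print(pick_peer(peers, "random"))
```

Random selection spreads updates with high probability in O(log n) rounds; deterministic schedules are predictable but can propagate more slowly after partitions.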
Advanced Gossiping
• In some applications (think Facebook), replication can be geographically biased
• Read-only RMs close to clients can improve scalability a lot for read-intensive applications
Facebook and Gossip
• Inbox search problem: 25TB of data
• Facebook's solution: Cassandra
– Similar to distributed hash tables (DHTs); more about these in the next lecture
– Conceptually, a hash table with N replicas using lazy updates to share data
• Facebook uses Gossip-like algorithms for failure handling
– If a node in the DHT crashes, the event is propagated to the other replicas
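As a rough illustration of "a hash table with N replicas": a consistent-hashing-style sketch, where each key maps to a ring position and its N replicas live on the next N nodes clockwise. This is an assumption-laden toy, not Cassandra's actual placement code:

```python
import hashlib

def ring_position(name, ring_size=2**32):
    """Place a name on the ring via a stable hash."""
    return int(hashlib.md5(name.encode()).hexdigest(), 16) % ring_size

def replicas_for(key, nodes, n=3):
    """The n nodes clockwise from the key's ring position hold the
    key's replicas (consistent-hashing sketch)."""
    ring = sorted(nodes, key=ring_position)
    pos = ring_position(key)
    start = next((i for i, node in enumerate(ring)
                  if ring_position(node) >= pos), 0)  # wrap to node 0
    return [ring[(start + j) % len(ring)] for j in range(n)]

nodes = [f"node{i}" for i in range(5)]
print(replicas_for("alice:inbox", nodes, n=3))
```

Lazy (gossip-style) updates between the chosen replicas then give the eventual consistency discussed earlier in the lecture.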
Cloud computing
• Servers and tasks are run inside virtual machines
– Virtual machines can be moved and placed dynamically
– A master image is used to start new copies of virtual machines
– Elasticity: virtual machines are added or removed depending on certain system conditions
• A simple example follows
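The example itself is not part of this transcript; as a stand-in, a minimal threshold-based elasticity rule might look like this (the thresholds and limits are invented for illustration):

```python
def scale_decision(vm_count, avg_load, low=0.3, high=0.8,
                   min_vms=1, max_vms=10):
    """Threshold-based elasticity rule (illustrative numbers):
    clone a VM from the master image when average load is high,
    retire one when it is low, otherwise hold steady."""
    if avg_load > high and vm_count < max_vms:
        return vm_count + 1   # scale out
    if avg_load < low and vm_count > min_vms:
        return vm_count - 1   # scale in
    return vm_count           # steady state

print(scale_decision(3, 0.9))   # 4: scale out
print(scale_decision(3, 0.1))   # 2: scale in
print(scale_decision(3, 0.5))   # 3: steady
```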