full consistency lag and its applications


Upload: cassandra-austin

Post on 19-Feb-2017


Page 1: Full Consistency Lag and its Applications

Confidential and Proprietary. © 2013 Bazaarvoice, Inc.

EmoDB: Full Consistency Lag and its Applications

www.bazaarvoice.com

Page 2: Full Consistency Lag and its Applications

Fahd Siddiqui

Staff Software Engineer, Data Infrastructure

Bazaarvoice

linkedin.com/in/fahdsiddiqui

$ whoami

Page 3: Full Consistency Lag and its Applications

SaaS serving software that collects and displays user generated content, crunches analytics, and extracts insights.

Thousands of clients

Hundreds of millions of pieces of content

Hundreds of millions of unique visitors per month

Tens of billions of pageviews per month

Austin-based company founded in 2005

Engineering offices: Austin, San Francisco, New York

Page 4: Full Consistency Lag and its Applications

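Global Monthly Unique Visitors

[Chart comparing sites by global monthly unique visitors. Data labels in the transcript: 1B, 1B, 500M, 1B, 400M, 200M, 250M, 450M, 1B, 600M. Site names are not recoverable.]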

Page 5: Full Consistency Lag and its Applications

Overview

• Define Full Consistency Timestamp (FCT) and Full Consistency Lag (FCL)

• Applications of FCT in Distributed Systems

• Distributed Compaction

• Algorithm for determining FCT and FCL

• Final thought on FCT and gc_grace_seconds

Page 6: Full Consistency Lag and its Applications

Data Infrastructure

Page 7: Full Consistency Lag and its Applications

EmoDB Overview

Append-only key value store

Exposes a cross-region databus that listeners can subscribe to in order to watch for data change events

RESTful API

Multi-master, multi-datacenter, fault tolerant, horizontally scalable for reads and writes

Backed by multiple multi-region Cassandra rings

Page 8: Full Consistency Lag and its Applications

Full Consistency Timestamp (FCT)

• A distributed system is said to be fully consistent when all of its nodes hold the same state of the data. So, if record A is in state S on one node, then it is in the same state on all of its replicas.

• An eventually consistent system may never be in a fully consistent state.

• Full Consistency Timestamp (FCT) can be defined as a timestamp such that any event e with timestamp t <= FCT is guaranteed to be consistent on all nodes.

Page 9: Full Consistency Lag and its Applications

Full Consistency Timestamp (FCT)

More formally,

$\forall e, t_e \mid t_e < \mathrm{FCT} \Rightarrow e \in C$

where

C: set of consistent events

e: event

t_e: timestamp of event e

Page 10: Full Consistency Lag and its Applications

Full Consistency Timestamp (FCT)

In the inconsistent state:

FCT = T(∆2)

Page 11: Full Consistency Lag and its Applications

Some applications of FCT

• Invalidate Cache

• Synchronize events

• Monitor real-time staleness

• Distributed compactions (not the same as Cassandra compaction)

Page 12: Full Consistency Lag and its Applications

Invalidate Cache

• Invalidate caches across data centers without relying on Global CL.

• For instance, write to DC-1 (at any Consistency Level) and then call DC-2 to invalidate cache. DC-2 should not invalidate cache before the change is actually replicated to it.

Page 13: Full Consistency Lag and its Applications

Invalidate Cache

• Using FCT:

• Invalidate call to DC-2 includes update timestamp, t

• DC-2 waits until t <= FCT, before invalidating cache to make sure changes are replicated.
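A minimal sketch of how DC-2 might apply this rule. The names `wait_until_consistent`, `invalidate_when_replicated`, and `get_fct`, as well as the polling interval and timeout, are illustrative assumptions and not part of EmoDB's API; the cache is modeled as a plain dictionary.

```python
import time
from typing import Callable


def wait_until_consistent(update_ts: float,
                          get_fct: Callable[[], float],
                          poll_interval: float = 1.0,
                          timeout: float = 60.0,
                          sleep: Callable[[float], None] = time.sleep) -> bool:
    """Poll the (hypothetical) FCT source until update_ts <= FCT.

    Returns True once the update is covered by FCT, False on timeout.
    """
    waited = 0.0
    while True:
        if update_ts <= get_fct():
            return True
        if waited >= timeout:
            return False
        sleep(poll_interval)
        waited += poll_interval


def invalidate_when_replicated(cache: dict, key: str, update_ts: float,
                               get_fct: Callable[[], float], **kwargs) -> bool:
    """Drop `key` from the local cache only after the write is fully consistent."""
    if wait_until_consistent(update_ts, get_fct, **kwargs):
        cache.pop(key, None)
        return True
    return False
```

If the wait times out, the entry is left in place, so a caller would need to retry or fall back to a TTL; which of those is appropriate is a design choice the slides do not address.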

Page 14: Full Consistency Lag and its Applications

Synchronize Events

• Synchronize events when we need to be certain that writes are consistent in all data centers.

• For example, process P1 writes to DC-1, setting a value to {maintenance: true}. Process P2 should act upon it if and only if the value has propagated to all data centers (DC-2, DC-3, ..., DC-n).

Page 15: Full Consistency Lag and its Applications

Synchronize Events

• Using FCT:

• Process P2 polls to see whether there is any work for it to do

• P2 checks FCT to make sure that the update has propagated everywhere.

• P2 does work
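The three steps above could be sketched as a filter over pending updates. The `Update` record and `ready_updates` function are hypothetical names used only for illustration; timestamps are assumed to be comparable to FCT (for example, Unix seconds).

```python
from dataclasses import dataclass
from typing import Any, List, Tuple


@dataclass
class Update:
    key: str
    value: Any
    timestamp: float  # write time, in the same units as FCT


def ready_updates(pending: List[Update], fct: float) -> Tuple[List[Update], List[Update]]:
    """Split polled updates into those P2 may act on and those it must keep waiting for.

    An update is ready only when its timestamp is at or before FCT,
    meaning it is guaranteed to be present in every data center.
    """
    ready = [u for u in pending if u.timestamp <= fct]
    waiting = [u for u in pending if u.timestamp > fct]
    return ready, waiting
```

P2 would act on the `ready` list and re-check the `waiting` list on its next poll.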

Page 16: Full Consistency Lag and its Applications

Monitor real-time staleness

• At any given time, we need to know exactly how stale our data is.

• It is OK to report conservatively, that is, to report higher staleness than there actually is. But we should never under-report real-time data staleness.

Page 17: Full Consistency Lag and its Applications

Monitor real-time staleness

Chart Full Consistency Lag ( T - FCT )

Note: We update FCT every 5 minutes in the above chart
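As a rough illustration of the charted quantity, FCL is the current time T minus the last published FCT. The function name is an assumption; clamping at zero is also an assumption, added to guard against small clock differences.

```python
def full_consistency_lag(now: float, fct: float) -> float:
    """Full Consistency Lag (FCL) = T - FCT, never reported as negative."""
    return max(0.0, now - fct)
```

Because FCT is only refreshed periodically, the reported lag keeps growing between refreshes. That over-reports staleness rather than under-reporting it, which matches the conservative requirement stated earlier.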

Page 18: Full Consistency Lag and its Applications

Distributed Compaction

• In an append-only system such as EmoDB:

• A row is composed of deltas

• Writers append deltas, and readers resolve deltas to produce a resolved object
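A simplified sketch of delta resolution. It assumes each delta is a `(timestamp, changes)` pair where `changes` is a dictionary of field updates; EmoDB's real delta format is richer than this, so treat the model as illustrative only.

```python
from typing import Any, Dict, Iterable, Tuple

Delta = Tuple[float, Dict[str, Any]]


def resolve(deltas: Iterable[Delta]) -> Dict[str, Any]:
    """Apply deltas in timestamp order to produce the resolved object."""
    resolved: Dict[str, Any] = {}
    for _, changes in sorted(deltas, key=lambda d: d[0]):
        resolved.update(changes)  # later deltas override earlier ones
    return resolved
```

Every read pays for replaying all deltas in the row, which is why long rows raise read latency.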

Page 19: Full Consistency Lag and its Applications

Distributed Compaction

• To reduce read latency, we need to compact the deltas that we know are replicated to all data centers.

• Compaction should happen as frequently as possible without data loss.

• An older delta that is still in flight from a different data center while we are compacting may result in data loss.

Page 20: Full Consistency Lag and its Applications

Distributed Compaction

• Using FCT:

• Partially compact rows.

• Any deltas before FCT can be safely compacted
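A minimal sketch of partial compaction under the same simplified model, where a delta is a `(timestamp, changes)` pair. The functions `compact` and `resolve` are hypothetical helpers, not EmoDB code. Only deltas strictly before FCT are merged; newer deltas are left untouched because an older delta could still arrive among them.

```python
from typing import Any, Dict, List, Tuple

Delta = Tuple[float, Dict[str, Any]]


def resolve(deltas: List[Delta]) -> Dict[str, Any]:
    """Replay deltas in timestamp order (later deltas win)."""
    out: Dict[str, Any] = {}
    for _, changes in sorted(deltas, key=lambda d: d[0]):
        out.update(changes)
    return out


def compact(deltas: List[Delta], fct: float) -> List[Delta]:
    """Merge all deltas older than FCT into one; keep the rest as-is."""
    ordered = sorted(deltas, key=lambda d: d[0])
    old = [d for d in ordered if d[0] < fct]
    recent = [d for d in ordered if d[0] >= fct]
    if len(old) < 2:
        return ordered  # nothing worth merging
    # The merged delta takes the timestamp of the newest delta it replaces.
    return [(old[-1][0], resolve(old))] + recent
```

The row resolves to the same object before and after compaction, but readers replay fewer deltas.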

Page 21: Full Consistency Lag and its Applications

FCT - Algorithm

• Hinted Handoffs

• Introduced in the Dynamo paper as a way of handling failure.

• Cassandra uses it for fault-tolerant replication.

• If a write is made and a replica node for the key is down, then the coordinator stores a hint about dead replicas in the local system.hints table.

Page 22: Full Consistency Lag and its Applications

FCT - Algorithm

• Algorithm:

• Tf: timestamp of the oldest hint found in the entire ring.

• rpc_timeout: maximum timeout in Cassandra that nodes will use when communicating with each other

FCT = Tf – rpc_timeout
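The formula can be written as a small function. The slides do not say what Tf should be when the ring holds no hints at all; the sketch below assumes the current time is used in that case, which is an assumption of this example rather than something stated in the talk. All values are in seconds.

```python
from typing import Optional


def compute_fct(oldest_hint_ts: Optional[float],
                rpc_timeout: float,
                now: float) -> float:
    """FCT = Tf - rpc_timeout.

    oldest_hint_ts: Tf, the timestamp of the oldest hint in the ring,
                    or None when no hints exist (assumed: fall back to `now`).
    """
    tf = now if oldest_hint_ts is None else oldest_hint_ts
    return tf - rpc_timeout
```

Subtracting rpc_timeout is a safety margin: a write that is still within its timeout may not yet have produced a hint.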

Page 23: Full Consistency Lag and its Applications

FCT - Algorithm

• Finding oldest hint in the ring:

• Let's take a quick look at Hints table in system keyspace:

cqlsh:system> desc table hints;
CREATE TABLE system.hints (
    target_id uuid,
    hint_id timeuuid,
    message_version int,
    mutation blob,
    PRIMARY KEY (target_id, hint_id, message_version)
) WITH COMPACT STORAGE ...

• hint_id is a timeuuid and is a clustering key

Page 24: Full Consistency Lag and its Applications

FCT - Algorithm

• Finding oldest hint in the ring:

… which means that finding the minimum hint_id is quick

SELECT min(hint_id) AS old_hint_id FROM hints;

• Tf = minimum(oldest hint from each node in the ring)

• Note that we can only update FCT if all nodes are up.
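A sketch of combining the per-node results, assuming each node's query yields its minimum `hint_id` as a version 1 (time-based) UUID, or nothing when the node holds no hints. The function names and the shape of the input mapping are hypothetical; how the per-node queries are issued is outside the sketch.

```python
import uuid
from typing import Dict, Iterable, Optional, Tuple

# 100-nanosecond intervals between 1582-10-15 (UUID epoch) and 1970-01-01 (Unix epoch)
_UUID_EPOCH_OFFSET = 0x01B21DD213814000


def timeuuid_to_unix(u: uuid.UUID) -> float:
    """Extract the embedded timestamp of a version 1 UUID as Unix seconds."""
    return (u.time - _UUID_EPOCH_OFFSET) / 1e7


def oldest_hint_in_ring(oldest_per_node: Dict[str, Optional[uuid.UUID]],
                        all_nodes: Iterable[str]) -> Tuple[bool, Optional[float]]:
    """Return (ok, Tf).

    ok is False when any node did not answer; the caller should then keep
    the last good FCT. Tf is None when every node answered with no hints.
    """
    if any(node not in oldest_per_node for node in all_nodes):
        return False, None
    stamps = [timeuuid_to_unix(h) for h in oldest_per_node.values() if h is not None]
    return True, (min(stamps) if stamps else None)
```

Returning an explicit `ok` flag keeps "a node is unreachable" distinct from "there are no hints", since only the latter allows FCT to advance.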

Page 25: Full Consistency Lag and its Applications

FCT – Failure cases

• A node is down:

• A prerequisite for updating FCT is that all nodes in the ring are up at the time.

• If a node is down, FCT is not updated and remains at the old value.

• Any process that depends on FCT is not disrupted. It just uses the last “good” FCT.

• Not a failure

Page 26: Full Consistency Lag and its Applications

FCT – Failure cases

• Network failure between datacenters:

• A prerequisite for updating FCT is that all nodes in the ring are up and reachable at the time.

• In the event of a network outage, FCT is not recalculated and keeps its last value. This simply increases FCL (Full Consistency Lag).

• Not a failure

Page 27: Full Consistency Lag and its Applications

FCT – Failure cases

• Node is down for longer than hints storage window:

• Most impactful risk.

• A node may come back up after a long time and never receive its hints because they have expired. FCT would then be updated even though that node is missing writes, which is incorrect.

• Mitigate this risk by:

• Alerting when FCL gets close to the hints storage window and/or automatically disabling FCT updates.
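One possible shape for this mitigation, as a sketch. The function name, the status strings, and the 80% warning threshold are assumptions chosen for illustration; `hint_window_seconds` stands for whatever hint storage window the cluster is configured with.

```python
def fcl_status(fcl_seconds: float,
               hint_window_seconds: float,
               warn_ratio: float = 0.8) -> str:
    """Classify the current Full Consistency Lag against the hint window.

    "ok"              : lag is comfortably inside the window
    "alert"           : lag is close to the window, page someone
    "disable_updates" : lag reached the window, hints may have expired,
                        so FCT must not be advanced automatically
    """
    if fcl_seconds >= hint_window_seconds:
        return "disable_updates"
    if fcl_seconds >= warn_ratio * hint_window_seconds:
        return "alert"
    return "ok"
```

Once updates are disabled, re-enabling them would presumably require an operator to confirm that the returning node has been repaired.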

Page 28: Full Consistency Lag and its Applications

Monitor real-time staleness

Chart Full Consistency Lag ( T - FCT )

Note: We update FCT every 5 minutes in the above chart

Page 29: Full Consistency Lag and its Applications

FCT and gc_grace_seconds

• Cassandra compaction is a local operation.

• gc_grace_seconds is an arbitrarily configured value that decides how long a node keeps tombstones around before compaction may drop them. If this value is too short, you may resurrect deleted deltas. Too long, and you are being inefficient. Cassandra’s default value is 10 days.

• Equipped with FCT, we can get rid of gc_grace_seconds, and simply introduce an “auto” feature.

• In other words, if the FCT is known, then gc_grace_seconds is FCL.
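To see why the two are interchangeable, compare the usual grace-period rule with an FCT-based one. Both function names are hypothetical, and this is only a sketch of the proposed "auto" idea, not an existing Cassandra feature. With gc_grace_seconds = FCL = now - FCT, the condition `now - deleted_at > gc_grace_seconds` reduces to `deleted_at < FCT`.

```python
def purgeable_by_gc_grace(deleted_at: int, now: int, gc_grace_seconds: int) -> bool:
    """Classic rule: a tombstone may be dropped once it is older than the grace period."""
    return now - deleted_at > gc_grace_seconds


def purgeable_by_fct(deleted_at: int, fct: int) -> bool:
    """FCT rule: a tombstone may be dropped once the delete is known to be on all replicas."""
    return deleted_at < fct
```

The FCT form adapts on its own: when replication is healthy the lag is small and tombstones can go quickly, and when a node is down FCT stops advancing and tombstones are retained.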

Page 30: Full Consistency Lag and its Applications

Other works in this area

• "Quantifying Eventual Consistency with PBS" - Peter Bailis, et al. The VLDB Journal (2014) 23:279–302

• Probabilistically Bounded Staleness (PBS)

• Provides expected bounds on staleness due to eventual consistency.

Page 31: Full Consistency Lag and its Applications

@Bazaarvoice

@BazaarvoiceDev

http://www.bazaarvoice.com/

http://blog.developer.bazaarvoice.com/

Learn more