· form cassandra summit, usa, september 2014 . availability at netflix ©2015 datastax. do not...

www.informatik-aktuell.de

Patrick Guillebert Solutions Architect

Availability, scalability and consistency with Cassandra

Cassandra and the CAP Theorem

In a distributed system, at most two of these three properties can be guaranteed at the same time: consistency, availability, partition tolerance.

Availability and partition tolerance are built deeply into the design.

Consistency is set programmatically and is tunable.

©2014 DataStax Confidential. Do not distribute without consent. 2

Cassandra is an AP system

Cassandra’s design goals

Massive and predictable scaling

Always-On

Tunable consistency

Geographically distributed

Low latency

Operationally simple

2016: + User friendly

Numbers and facts from production deployments

Scale at Apple

Cassandra can scale from 3 to 1000+ nodes

• Footprint @ Apple

• 75,000+ nodes

• Tens of petabytes of data

• Millions of ops/second

• Largest cluster 1000+ nodes

Apple Inc.: Cassandra at Apple for Massive Scale

Video: https://www.youtube.com/watch?v=Bc4ql9TDzyg (from Cassandra Summit, USA, September 2014)

Availability at Netflix

Consistency at ING Bank

https://www.youtube.com/watch?v=-sD3x8-tuDU

Scalability

Linear and predictable scaling

Need to store more data? Add more nodes.

Need more throughput? Add more nodes.

Cassandra

• Is “future proof”

• By scaling out linearly on commodity hardware.

Partitioning

• The dataset is distributed across all nodes of the cluster

• Data subsets are called “token ranges”

• With vnodes, a node can own several small token ranges, 256 by default

[Figure: token range ring (Murmur3) running from -2^63 to +2^63, shared by Node 1, Node 2, Node 3 and Node 4]

The partitioner

[Figure: the key "@DataStax" is passed through the hash function to produce a token; values shown on the slide: 97203, -2245462676723223822, 7723358927203680754]

• The partitioner is responsible for calculating the token, which is the hash of the partition key of the data to store

• With Murmur3Partitioner, the token is simply a 64-bit number between -2^63 and 2^63 - 1

• Partitioners: Murmur3Partitioner (Murmur3), RandomPartitioner (MD5), ByteOrderedPartitioner
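The idea of turning a key into a token can be sketched in a few lines of Python. This is not Cassandra's Murmur3 implementation: the Python standard library has no Murmur3, so MD5 (the hash behind RandomPartitioner) stands in here and is truncated to a signed 64-bit value. The name `token_for` and the truncation step are assumptions made for illustration only, and the tokens it produces will not match those of a real cluster.

```python
import hashlib

# Bounds of the signed 64-bit token space used by Murmur3Partitioner.
MIN_TOKEN = -2**63
MAX_TOKEN = 2**63 - 1


def token_for(key: str) -> int:
    """Map a partition key to a signed 64-bit token (illustrative stand-in hash)."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    # Interpret the first 8 bytes of the digest as a signed big-endian integer.
    return int.from_bytes(digest[:8], byteorder="big", signed=True)


if __name__ == "__main__":
    print(token_for("@DataStax"))
```

The same key always yields the same token, which is what lets any node work out where a given piece of data lives.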

Data distribution within the cluster

[Figure: the partitioner hashes the key '@Datastax' to token 91 on a simplified 0-100 ring of four nodes; the slide labels Node 1 next to the token]

Availability

Node level availability

A node failure has no impact

• No failover

• 0 downtime

• 0 data loss

• Consistent responses

[Figure: five-node ring; parallel writes send the 1st, 2nd and 3rd copy to three replica nodes. RF=3, CL=QUORUM]

If 2 of the 3 replicas respond, the request succeeds.

Immediate consistency is ensured.
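The "2 out of 3" figure is the quorum for RF=3. A minimal Python sketch of the arithmetic follows; the function names are hypothetical and chosen for illustration.

```python
def quorum(replication_factor: int) -> int:
    """Smallest number of replicas that forms a majority."""
    return replication_factor // 2 + 1


def tolerated_failures(replication_factor: int) -> int:
    """Replicas that may be down while a QUORUM request still succeeds."""
    return replication_factor - quorum(replication_factor)


if __name__ == "__main__":
    for rf in (1, 2, 3, 5):
        print(rf, quorum(rf), tolerated_failures(rf))
```

With RF=3 the quorum is 2, so one replica can be down without affecting QUORUM requests. With RF=2 the quorum is also 2, so no replica may be down.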

Rack level availability

A rack failure has no impact on the service

[Figure: the 1st, 2nd and 3rd copy placed on nodes in three racks (RAC 1, RAC 2, RAC 3); write with RF=3, CL=QUORUM]

Immediate consistency (RF 3 / CL QUORUM):

Nodes are distributed across at least 3 racks. If a rack fails, a quorum of replicas remains available in the remaining racks and the request succeeds.
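The rack argument can be checked mechanically. The sketch below, in Python, takes the rack of each replica and tests whether a quorum survives the loss of any single rack. The function name and the rack labels are assumptions for illustration; this is not how Cassandra itself evaluates placement.

```python
from collections import Counter


def survives_any_rack_failure(replica_racks: list) -> bool:
    """True if a quorum of replicas remains after losing any one rack."""
    replication_factor = len(replica_racks)
    needed = replication_factor // 2 + 1
    per_rack = Counter(replica_racks)
    # Worst case: the rack holding the most replicas is the one that fails.
    return replication_factor - max(per_rack.values()) >= needed


if __name__ == "__main__":
    print(survives_any_rack_failure(["RAC 1", "RAC 2", "RAC 3"]))
    print(survives_any_rack_failure(["RAC 1", "RAC 1", "RAC 2"]))
```

One replica per rack passes the check. Two replicas in the same rack do not, which is why the slide asks for at least 3 racks at RF=3.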

Data Centre level availability

[Figure: two data centres, Frankfurt and Dublin, each a five-node ring holding a 1st, 2nd and 3rd copy of the data]

A DC failure has no impact on the service

Internals and practices that enable availability

• Cluster topology awareness (“NetworkTopologyStrategy”) in replication

A cluster is described as a group of data centres, containing racks, containing nodes.

This topology is used to place replicas in a “smart” fashion.

• Peer-to-peer architecture

All nodes are strictly equal on the read/write path; any node can be queried for any data.

Seeds have a special function only during node bootstrap.

Failover never happens.

• Trade-off on consistency

Response consistency is ensured by requiring a quorum of replicas to agree.

There is no single “master” that knows the latest write.

• Live operations

Whatever the setup (single or multi-DC), operations can be performed live, with no downtime.

Single DC replication

[Figure: single data center; the client driver sends a write for token 91 to a four-node ring with primary ranges 1-25, 26-50, 51-75 and 76-0]

CREATE KEYSPACE demo
WITH REPLICATION = {
  'class': 'SimpleStrategy',
  'replication_factor': 3
};

Primary Range for Token 91
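Replica placement on the simplified 0-100 ring can be sketched in Python as follows. The assignment of ranges to nodes is an assumption: the slide shows the ranges 1-25, 26-50, 51-75 and 76-0 but the transcript does not say which node owns which, so Node 1 is given 76-0 to match the earlier slide that places token 91 on Node 1. The walk to the next nodes on the ring mimics the idea behind SimpleStrategy and is not its actual implementation.

```python
from bisect import bisect_left

# Hypothetical assignment: each node owns the range that ends at its token.
# Node 1 owns 76-0, Node 2 owns 1-25, Node 3 owns 26-50, Node 4 owns 51-75.
RING = [(0, "Node 1"), (25, "Node 2"), (50, "Node 3"), (75, "Node 4")]


def replicas_for(token: int, replication_factor: int = 3) -> list:
    """Return the primary owner of a token followed by the next nodes on the ring."""
    tokens = [node_token for node_token, _ in RING]
    # First node whose token is >= the data token; past the end, wrap to the start.
    start = bisect_left(tokens, token) % len(RING)
    count = min(replication_factor, len(RING))
    return [RING[(start + i) % len(RING)][1] for i in range(count)]


if __name__ == "__main__":
    print(replicas_for(91))
```

Under these assumptions token 91 falls in the range 76-0, so the primary replica is Node 1 and the two further copies go to the next two nodes on the ring.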

Multi-DC replication

[Figure: the client driver writes token 91 to two four-node rings, one per data center (Data Center - West is labelled), with nodes split across Rack 1 and Rack 2; ranges shown: 1-13, 26-38, 51-63, 76-88]

CREATE KEYSPACE demo
WITH REPLICATION = {
  'class': 'NetworkTopologyStrategy',
  'dc-east': 2,
  'dc-west': 3
};

[Figure, continued: a remote coordinator forwards the write to the other data center; primary range for token 91; remaining ranges shown: 14-25, 39-50, 64-75, 89-100]
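With the replication settings of this keyspace (2 replicas in dc-east, 3 in dc-west), the consistency level decides whether a request still succeeds when a data center is lost. The Python sketch below counts live replicas only. The function names and the `live` dictionary are assumptions for illustration and do not reflect the driver or server API.

```python
# Replica counts per data center, taken from the keyspace definition.
REPLICATION = {"dc-east": 2, "dc-west": 3}


def quorum(replica_count: int) -> int:
    return replica_count // 2 + 1


def can_serve(level: str, local_dc: str, live: dict) -> bool:
    """live maps each data center to its number of reachable replicas."""
    if level == "LOCAL_QUORUM":
        return live[local_dc] >= quorum(REPLICATION[local_dc])
    if level == "QUORUM":
        return sum(live.values()) >= quorum(sum(REPLICATION.values()))
    if level == "EACH_QUORUM":
        return all(live[dc] >= quorum(rf) for dc, rf in REPLICATION.items())
    raise ValueError(f"level not covered by this sketch: {level}")


if __name__ == "__main__":
    west_down = {"dc-east": 2, "dc-west": 0}
    for level in ("LOCAL_QUORUM", "QUORUM", "EACH_QUORUM"):
        print(level, can_serve(level, "dc-east", west_down))
```

If dc-west is lost, a client in dc-east can still use LOCAL_QUORUM (2 of 2 local replicas), while QUORUM needs 3 of the 5 replicas overall and fails. Note that with only 2 replicas in dc-east, LOCAL_QUORUM there tolerates no local replica failure.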

What is different from a master-slave system?

Failover never happens. With no master, irrecoverable data corruption from a split-brain situation cannot occur.

Consistency

Consistency types

• Immediate consistency

The only consistency type a relational database provides.

A read always returns the last written data, whichever node is queried.

• Eventual consistency

A read will eventually return the last written data.

Depending on the node queried, responses may differ.

Cassandra offers both types of consistency; the choice is tunable.

Consistency Levels

Consistency levels are set on each individual request. The level determines the number of replicas that must acknowledge a write or agree on the value of the queried data.

The consistency type can therefore differ from table to table, or change over time.

• Common CLs: ONE, QUORUM, ALL

• Multi-DC CLs: LOCAL_ONE, LOCAL_QUORUM, EACH_QUORUM

• Special CLs: ANY, SERIAL, LOCAL_SERIAL

General consistency levels

ONE

QUORUM

ALL

ANY

SERIAL

Multi-DC consistency levels

LOCAL_ONE

LOCAL_QUORUM

EACH_QUORUM

LOCAL_SERIAL

The rule to tune consistency

Immediate consistency:

CL READ + CL WRITE > REPLICATION FACTOR

• READ QUORUM + WRITE QUORUM

• READ ONE + WRITE ALL

• READ ALL + WRITE ONE

Eventual consistency:

CL READ + CL WRITE <= REPLICATION FACTOR

• READ ONE + WRITE ONE

• READ ONE + WRITE ANY
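The rule can be written as a small check in Python. The helper names are hypothetical. Counting ANY as 0 replicas is a modelling assumption: a write at ANY can be acknowledged once a hint is stored, before any replica holds the data.

```python
def replicas_required(level: str, replication_factor: int) -> int:
    """Number of replicas that must respond for the given consistency level."""
    required = {
        "ANY": 0,  # assumption: a stored hint is enough, no replica is guaranteed
        "ONE": 1,
        "QUORUM": replication_factor // 2 + 1,
        "ALL": replication_factor,
    }
    return required[level]


def is_immediately_consistent(read_level: str, write_level: str,
                              replication_factor: int) -> bool:
    """Apply the rule CL READ + CL WRITE > replication factor."""
    total = (replicas_required(read_level, replication_factor)
             + replicas_required(write_level, replication_factor))
    return total > replication_factor


if __name__ == "__main__":
    for read, write in [("QUORUM", "QUORUM"), ("ONE", "ALL"),
                        ("ALL", "ONE"), ("ONE", "ONE"), ("ONE", "ANY")]:
        print(read, write, is_immediately_consistent(read, write, 3))
```

For RF=3 the first three combinations satisfy the rule and the last two do not, matching the two lists on the slide.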

Time-based conflict resolution

[Figure: the client driver queries a four-node ring; three replicas return versions with timestamps …523, …511 and …542]

The last write wins!
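A minimal Python sketch of last-write-wins resolution follows. The function name and the values "a", "b" and "c" are hypothetical, and the timestamps 523, 511 and 542 are shortened stand-ins for the elided values on the slide.

```python
def resolve(responses):
    """Pick the (value, timestamp) pair with the highest write timestamp."""
    return max(responses, key=lambda response: response[1])


if __name__ == "__main__":
    replica_responses = [("a", 523), ("b", 511), ("c", 542)]
    print(resolve(replica_responses))
```

The version stamped 542 is returned, because it is the most recent write among the replicas that answered.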

Empirical verification of the rule

[Figure: four-node ring with RF = 3; a write at CL QUORUM is followed by every possible combination of replicas answering a read at CL QUORUM]
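The same verification can be done by brute force. The Python sketch below enumerates every set of replicas that could acknowledge a write and every set that could answer a read, and checks that the two always share at least one replica. The function name and the numbering of replicas are assumptions for illustration.

```python
from itertools import combinations


def always_overlap(replication_factor: int, write_count: int, read_count: int) -> bool:
    """True if every possible write set shares a replica with every possible read set."""
    replicas = range(replication_factor)
    return all(
        set(write_set) & set(read_set)
        for write_set in combinations(replicas, write_count)
        for read_set in combinations(replicas, read_count)
    )


if __name__ == "__main__":
    print(always_overlap(3, 2, 2))  # QUORUM write, QUORUM read
    print(always_overlap(3, 1, 1))  # ONE write, ONE read
```

With RF=3, any two replicas that acknowledged the write overlap with any two replicas that answer the read, so at least one response carries the latest value. With ONE and ONE the sets can be disjoint.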

Use cases

Domains of use cases

• IoT

• Web / mobile

• Gaming

• Bank, finance

• Telecoms

Using C* to back a global reference data service

Requirements

• Always on

• High throughput

• Low latency

• Multi-region replication

Little or big data

Extending use cases with DSE

Thank you!