traveler's guide to cassandra

Who wants to be a Cassandra Millionaire? 40-minutes of best practice - getting you ready for certification @VictorFAnjos Toronto Cassandra Day

Upload: datastax-academy

Post on 11-Feb-2017

Category:

Technology


TRANSCRIPT

Page 1: Traveler's Guide to Cassandra

Who wants to be a Cassandra Millionaire? 40 minutes of best practice: getting you ready for certification

@VictorFAnjos

Toronto Cassandra Day

Page 2: Traveler's Guide to Cassandra

© 2015. All Rights Reserved. @VictorFAnjos

Page 3: Traveler's Guide to Cassandra

Welcome to Who Wants to be a Cassandra Millionaire

Prize ladder (questions 1 to 15): 100, 200, 300, 500, 1,000, 2,000, 4,000, 8,000, 16,000, 32,000, 64,000, 125,000, 250,000, 500,000, 1 Million

Lifeline: 50:50

Page 5: Traveler's Guide to Cassandra

This storage medium allows for best performance.

A: NAS / SAN

B: SSD

C: DAS SATA

D: DAS SCSI

Page 7: Traveler's Guide to Cassandra

Installation and considerations

how to store the datastore

Storage Area Network (SAN) vs. Solid State Drive (SSD)

Page 8: Traveler's Guide to Cassandra

Installation and considerations

how to store the datastore

Local (DAS), iSCSI, Fibre Channel

● AVOID network storage like the plague

● Direct Attached Storage FTW

● Disk latency is a HUGE deal for performance

Page 9: Traveler's Guide to Cassandra

Installation and considerations

how to store the datastore

SATA/SAS DAS

PCIe/NVMe DAS

Page 12: Traveler's Guide to Cassandra

When using SSDs, this filesystem type is best.

A: ZFS

B: Btrfs

C: Ext4

D: F2FS

Page 14: Traveler's Guide to Cassandra

Congratulations!

You’ve reached the 1,000 ops/s milestone!

Page 15: Traveler's Guide to Cassandra

Installation and considerations

i can’t believe it’s not btrfs

● ext4 is the easiest to use (it ships with most Linux distros), but F2FS gets 5-10% gains in write performance

● if NOT using F2FS, make sure to TRIM

● multiple disks → use RAID0

Page 17: Traveler's Guide to Cassandra

This is the sweet spot for swap when using C*.

A: 0

B: ½ of HEAP

C: Equal to HEAP

D: Equal to RAM

Page 19: Traveler's Guide to Cassandra

Installation and considerations

to swap or not to swap

Page 21: Traveler's Guide to Cassandra

Having 64G of RAM means you should optimize to have ___G of HEAP.

A: 64G

B: 32G

C: 16G

D: 8G

Page 23: Traveler's Guide to Cassandra

Installation and considerations

how much heap?

http://docs.datastax.com/en/cassandra/1.2/cassandra/operations/ops_tune_jvm_c.html
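
The heap question can be worked through with the default sizing rule used by `cassandra-env.sh` in that generation of Cassandra, as far as I understand it: max(min(1/2 RAM, 1024 MB), min(1/4 RAM, 8 GB)). The Python below is only a sketch of that arithmetic, not the actual shell script, and the function name is hypothetical.

```python
def default_max_heap_mb(system_memory_mb: int) -> int:
    """Sketch of the default rule: max(min(1/2 RAM, 1024 MB), min(1/4 RAM, 8192 MB))."""
    half = system_memory_mb // 2
    quarter = system_memory_mb // 4
    return max(min(half, 1024), min(quarter, 8192))


# Show how the cap kicks in as RAM grows.
for ram_gb in (1, 4, 16, 64):
    print(f"{ram_gb} GB RAM -> {default_max_heap_mb(ram_gb * 1024)} MB heap")
```

The memory left outside the heap is not wasted: the operating system uses it for the page cache, which Cassandra reads benefit from.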

Page 25: Traveler's Guide to Cassandra

Definitely DO NOT use this snitch in Multi-DC environments.

A: EC2Snitch

B: Dynamic Snitch

C: Simple Snitch

D: Property File Snitch

Page 27: Traveler's Guide to Cassandra

Installation and considerations

son of a snitch

Page 30: Traveler's Guide to Cassandra

To reduce latency and wire time to my app, I should opt for:

A: Synchronous AND Full Queries

B: Asynchronous AND Prepared Statements

C: Synchronous AND Prepared Statements

D: Asynchronous AND Full Queries

Page 32: Traveler's Guide to Cassandra

Achieving performance through code/drivers

should I stay or should I go

● Client writes to any Cassandra node

● Coordinator node replicates to other nodes (in local and remote Data Center)

● Local write acks returned to coordinator

● Client gets ack when enough total nodes are committed

● Data written to internal commit log disks

● When data arrives, remote node replicates data

MULTI DC

● Ack direct to source region coordinator

● Remote copies written to commit log disks

If a node or region goes offline, hinted handoff completes the write when the node comes back up (as long as there are enough nodes to satisfy the consistency level).

Page 33: Traveler's Guide to Cassandra

Achieving performance through code/drivers

should I stay or should I go

Prepare ONCE...

Bind and Execute multiple times.
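
A minimal Python sketch of "prepare once, bind and execute many times", combined with asynchronous execution. It assumes a `session` object that behaves like `cassandra.cluster.Session` from the DataStax Python driver (`prepare()` and `execute_async()`), and it borrows the `users` table from a later slide. The function name `insert_users` is hypothetical.

```python
from typing import Any, Iterable, List, Tuple

INSERT_USER_CQL = "INSERT INTO users (user, email, state) VALUES (?, ?, ?)"


def insert_users(session: Any, rows: Iterable[Tuple[str, str, str]]) -> List[Any]:
    """Prepare once, then bind and execute asynchronously for every row.

    Assumption: session.prepare(cql) returns a prepared statement and
    session.execute_async(statement, parameters) returns a future with .result().
    """
    # The CQL text is parsed once, not once per request.
    prepared = session.prepare(INSERT_USER_CQL)
    # Fire all requests without waiting; only bound values travel per request.
    futures = [session.execute_async(prepared, row) for row in rows]
    # Block only after every request is already in flight.
    return [future.result() for future in futures]
```

With the real driver the session would typically come from something like `Cluster(["127.0.0.1"]).connect("my_keyspace")`, where the contact point and keyspace name are placeholders.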

Page 36: Traveler's Guide to Cassandra

With RF=2 and CL=Quorum, operations failed when 1 node went down because of this.

A: 1 / 1 = 1

B: 2 / 1 = 2

C: 2 * 1 = 2

D: 2 / 2 + 1 = 2

Page 38: Traveler's Guide to Cassandra

Congratulations!

You’ve reached the 32,000 ops/s milestone!

Page 39: Traveler's Guide to Cassandra

Achieving performance through code/drivers

when friends aren’t enough

Replication Factor = 3

Insert into a cluster of size 6 with consistency QUORUM

Two of the three replica nodes for the token range must be up for the write to succeed
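
The quorum arithmetic behind this slide, and behind the RF=2 question before it, is floor(RF / 2) + 1. A short Python sketch (function names are hypothetical) shows why RF=3 survives one replica being down while RF=2 does not:

```python
def quorum(replication_factor: int) -> int:
    """Replicas that must respond for QUORUM: floor(RF / 2) + 1."""
    return replication_factor // 2 + 1


def tolerated_failures(replication_factor: int) -> int:
    """Replicas that can be down while QUORUM can still be met."""
    return replication_factor - quorum(replication_factor)


for rf in (1, 2, 3, 5):
    print(f"RF={rf}: quorum={quorum(rf)}, can lose {tolerated_failures(rf)} replica(s)")
```

Note that cluster size does not enter the calculation: in a 6-node cluster with RF=3, only the three replicas that own the token range count toward the quorum.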

Page 40: Traveler's Guide to Cassandra

Achieving performance through code/drivers

when friends aren’t enough

What happens now?

Cannot achieve consistency level QUORUM

Replica nodes for the partition key are DOWN

Page 42: Traveler's Guide to Cassandra

This mathematical and CS concept helps when data modeling for query optimization.

A: Truth table

B: Brewer’s Theorem

C: CAP Theorem

D: Entropy

Page 44: Traveler's Guide to Cassandra

Data modelling, CQLSH and more

the truth shall set you free

Motivated by CS, Math, Engineering

Allows for creating building blocks that yield a single output

More complex truth tables can arise

Page 45: Traveler's Guide to Cassandra

Data modelling, CQLSH and more

the truth shall set you free

How about searching for username?

And what about full_name?

Table: user_stream (1 marks the column used as the partition key)

user_id  username  full_name
1        0         0
0        1         0
0        0         1
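
One way to read the truth table, offered here as an interpretation rather than the speaker's stated method: each row is a lookup the application needs, and each lookup gets its own table partitioned by the marked column. The Python sketch below enumerates those tables; the `user_stream_by_*` names are hypothetical.

```python
from typing import Dict, List, Sequence

COLUMNS = ("user_id", "username", "full_name")

# Each row of the truth table: 1 marks the column used as partition key.
TRUTH_TABLE = (
    (1, 0, 0),
    (0, 1, 0),
    (0, 0, 1),
)


def tables_for(columns: Sequence[str], rows: Sequence[Sequence[int]]) -> Dict[str, List[str]]:
    """Map a hypothetical table name to its partition key column(s), one table per row."""
    tables: Dict[str, List[str]] = {}
    for row in rows:
        keys = [col for col, bit in zip(columns, row) if bit]
        tables["user_stream_by_" + "_".join(keys)] = keys
    return tables


for name, keys in tables_for(COLUMNS, TRUTH_TABLE).items():
    print(name, "-> PRIMARY KEY (" + ", ".join(keys) + ")")
```

Rows with more than one 1 would describe composite partition keys, which is where the more complex truth tables come in.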

Page 47: Traveler's Guide to Cassandra

A shift in paradigms: what should you maximize and reduce for best performance?

A: Reads / Batches

B: Writes / Batches

C: Writes / Deletes

D: Reads / Deletes

Page 50: Traveler's Guide to Cassandra

Data modelling, CQLSH and more

do the write thing

memtable write: < 100 ns

commit log write: ~1 ms

DELETEs and TTLs cause compactions

Page 53: Traveler's Guide to Cassandra

To avoid hitting the 2B record limit (per row), this term borrowed from RDBMS can still make sense.

A: ACID

B: Vector

C: Rollback

D: Sharding

Page 55: Traveler's Guide to Cassandra

Data modelling, CQLSH and more

sit on this and rotate
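
In Cassandra the borrowed idea of sharding usually shows up as adding a bucket to the partition key, so that no single partition grows without bound. A minimal Python sketch, assuming time-series data bucketed by UTC day (the function name and the choice of a daily bucket are assumptions, not taken from the slide):

```python
from datetime import datetime, timezone
from typing import Tuple


def bucketed_partition_key(entity_id: str, event_time: datetime) -> Tuple[str, str]:
    """Return (entity_id, day bucket) for use as a composite partition key."""
    day = event_time.astimezone(timezone.utc).strftime("%Y-%m-%d")
    return (entity_id, day)


key = bucketed_partition_key("user-42", datetime(2015, 6, 1, 23, 30, tzinfo=timezone.utc))
print(key)
```

In CQL this would correspond to a key of the form `PRIMARY KEY ((entity_id, day), event_time)`; reads then need to know which bucket or buckets to ask for.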

Page 57: Traveler's Guide to Cassandra

Many say to use sparingly; I would say avoid like the plague.

A: Batches

B: Synchronous

C: Secondary Indexes

D: MySQL

Page 59: Traveler's Guide to Cassandra

Performance must-haves

never be second best

writes are distributed among the cluster

each partition key refers to one exact position in which to get a row

but what do we do when we don’t have exactly the right type of index to specify a query?

    CREATE TABLE users (
        user varchar,
        email varchar,
        state varchar,
        PRIMARY KEY (user)
    );

    -- OPTION 1: create an index
    CREATE INDEX idxUBS ON users (state);

    -- OPTION 2: create another table (store data twice)
    CREATE TABLE usersByState (
        state varchar,
        user varchar,
        PRIMARY KEY (state, user)
    );
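
With option 2 the application keeps both tables in step itself, so every new user becomes two inserts. The Python sketch below only builds the statements and their parameters; the function name is hypothetical and the `?` placeholders assume the statements will be prepared before execution.

```python
from typing import List, Tuple


def writes_for_new_user(user: str, email: str, state: str) -> List[Tuple[str, tuple]]:
    """Return the (CQL, parameters) pairs needed to store one user in both tables."""
    return [
        ("INSERT INTO users (user, email, state) VALUES (?, ?, ?)", (user, email, state)),
        ("INSERT INTO usersByState (state, user) VALUES (?, ?)", (state, user)),
    ]


for cql, params in writes_for_new_user("victor", "victor@example.com", "ON"):
    print(cql, params)
```

The lookup by state then becomes a single-partition read such as `SELECT user FROM usersByState WHERE state = ?`, instead of an index query that may have to touch many nodes.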

Page 61: Traveler's Guide to Cassandra

This addition to C* can help with ACID-like transactions, at a bit of a performance hit.

A: UDT

B: Lightweight Transactions

C: JSON

D: Hinted handoff

Page 63: Traveler's Guide to Cassandra

Performance must-haves

slimfast agreement

Proposer:

● Prepares a proposal that is sent to a number of Acceptors.

● Waits for an acknowledgement (in the form of a promise) from the Acceptors.

● Sends an accept message to a Quorum of Acceptors with the new value to commit.

● Returns success/completion to the client.

Acceptor:

● Determines if the proposal is newer than what it has seen.

● Acknowledges with the highest proposal value it has seen AND the current value (of what is to be set).

● Receives the message to commit the new value.

● Accepts and returns on successful commit of the value.
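
The proposer and acceptor steps above can be sketched as a toy, in-memory, single-decree Paxos round in Python. This is a generic textbook sketch, not Cassandra's lightweight transaction implementation (which, as far as I know, adds a read step to evaluate the `IF` condition). It assumes ballots are unique and increasing; class and function names are hypothetical.

```python
from dataclasses import dataclass
from typing import Any, List, Optional, Tuple


@dataclass
class Acceptor:
    promised: int = 0                      # highest ballot promised so far
    accepted_ballot: int = 0               # ballot of the accepted value (0 = none yet)
    accepted_value: Optional[Any] = None

    def prepare(self, ballot: int) -> Optional[Tuple[int, Any]]:
        """Promise to ignore older ballots; report any value already accepted."""
        if ballot <= self.promised:
            return None                    # proposal is not newer than what was seen
        self.promised = ballot
        return (self.accepted_ballot, self.accepted_value)

    def accept(self, ballot: int, value: Any) -> bool:
        """Accept the value unless a newer ballot has been promised since."""
        if ballot < self.promised:
            return False
        self.promised = ballot
        self.accepted_ballot = ballot
        self.accepted_value = value
        return True


def propose(acceptors: List[Acceptor], ballot: int, value: Any) -> Optional[Any]:
    """Run one proposer round; return the chosen value, or None if no quorum."""
    quorum = len(acceptors) // 2 + 1
    promises = [p for p in (a.prepare(ballot) for a in acceptors) if p is not None]
    if len(promises) < quorum:
        return None
    # If any acceptor already accepted a value, adopt the one with the newest ballot.
    prior_ballot, prior_value = max(promises, key=lambda p: p[0])
    if prior_ballot > 0:
        value = prior_value
    acks = sum(a.accept(ballot, value) for a in acceptors)
    return value if acks >= quorum else None


nodes = [Acceptor() for _ in range(3)]
print(propose(nodes, ballot=1, value="A"))   # first proposal
print(propose(nodes, ballot=2, value="B"))   # later proposer must adopt the chosen value
print(propose(nodes, ballot=1, value="C"))   # stale ballot is rejected
```

The extra round trips between proposer and acceptors are where the performance hit of ACID-like operations comes from.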

Page 65: Traveler's Guide to Cassandra

Thank you