traveler's guide to cassandra

Posted on 11-Feb-2017

Category: Technology

Who wants to be a Cassandra Millionaire?

40 minutes of best practice, getting you ready for certification

@VictorFAnjos

Toronto Cassandra Day

© 2015. All Rights Reserved. @VictorFAnjos

Ladder (questions 1 to 15, in ops/s): 100, 200, 300, 500, 1,000, 2,000, 4,000, 8,000, 16,000, 32,000, 64,000, 125,000, 250,000, 500,000, 1 Million

Welcome to Who Wants to be a Cassandra Millionaire

This storage medium allows for the best performance.

A: NAS / SAN

B: SSD

C: DAS SATA

D: DAS SCSI

Installation and considerations

how to store the datastore

Storage Area Network, Solid State Drive

Local (DAS), iSCSI, Fibre Channel

● AVOID network storage like the plague

● Direct Attached Storage FTW

● Disk latency is a HUGE deal for performance

SATA/SAS DAS

PCIe/NVMe DAS

When using SSDs, this filesystem type is best.

A: ZFS

B: Btrfs

C: Ext4

D: F2FS

Congratulations!

You’ve reached the 1,000 ops/s milestone!

Installation and considerations

i can’t believe it’s not btrfs

● easiest to use ext4 (it’s on most Linux distros), but F2FS gets 5-10% gains in write performance

● if NOT using F2FS, make sure to TRIM

● multiple disks → use RAID0

This is the sweet spot for SWAP when using C*.

A: 0

B: ½ of HEAP

C: Equal to HEAP

D: Equal to RAM

Installation and considerations

to swap or not to swap

Having 64G of RAM means you should optimize to have ___G of HEAP.

A: 64G

B: 32G

C: 16G

D: 8G

Installation and considerations

how much heap?

http://docs.datastax.com/en/cassandra/1.2/cassandra/operations/ops_tune_jvm_c.html
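The heap question can be worked through as arithmetic. The sketch below assumes the default rule used by the stock cassandra-env.sh script, max(min(1/2 RAM, 1024 MB), min(1/4 RAM, 8192 MB)); the function name is hypothetical and the rule should be checked against the Cassandra version actually deployed.

```python
def default_max_heap_mb(system_ram_mb: int) -> int:
    """Sketch of the default heap rule (assumed from cassandra-env.sh):
    max(min(1/2 RAM, 1024 MB), min(1/4 RAM, 8192 MB))."""
    half_capped = min(system_ram_mb // 2, 1024)      # small machines: half of RAM, up to 1 GB
    quarter_capped = min(system_ram_mb // 4, 8192)   # large machines: a quarter of RAM, up to 8 GB
    return max(half_capped, quarter_capped)


if __name__ == "__main__":
    for ram_gb in (1, 4, 16, 64):
        print(ram_gb, "GB RAM ->", default_max_heap_mb(ram_gb * 1024), "MB heap")
```

For the 64G machine in the question this yields 8192 MB, i.e. 8G of heap, leaving the rest of the RAM to the OS page cache.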

Definitely DO NOT use this snitch in Multi-DC environments.

A: EC2Snitch

B: Dynamic Snitch

C: Simple Snitch

D: Property File Snitch

Installation and considerations

son of a snitch

To reduce latency and wire time to my app, I should opt for:

A: Synchronous AND Full Queries

B: Asynchronous AND Prepared Statements

C: Synchronous AND Prepared Statements

D: Asynchronous AND Full Queries

Achieving performance through code/drivers

should I stay or should I go

MULTI DC

● Client writes to any Cassandra node

● Coordinator node replicates to other nodes (in local and remote Data Center)

● Local write acks returned to coordinator

● Client gets ack when enough total nodes are committed

● Data written to internal commit log disks

● When data arrives, remote node replicates data

● Ack direct to source region coordinator

● Remote copies written to commit log disks

If a node or region goes offline, hinted handoff completes the write when the node comes back up (as long as there are enough nodes to satisfy consistency level).

Prepare ONCE...

Bind and Execute multiple times.
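A minimal sketch of "prepare once, bind and execute many times", combined with asynchronous execution. It assumes `session` is a connected `Session` from the DataStax Python driver (whose `prepare` and `execute_async` methods are used here) and a `users (user, email, state)` table like the one shown later in this deck; the function name and the CQL constant are illustrative only.

```python
from typing import Any, Iterable, List, Tuple

# Assumed schema: users (user, email, state); adjust to the real table.
INSERT_USER_CQL = "INSERT INTO users (user, email, state) VALUES (?, ?, ?)"


def insert_users_async(session: Any, rows: Iterable[Tuple[str, str, str]]) -> List[Any]:
    """Prepare the statement once, then bind and execute it for every row."""
    prepared = session.prepare(INSERT_USER_CQL)  # the CQL is parsed a single time

    # execute_async returns immediately with a future, so requests overlap on the wire.
    futures = [session.execute_async(prepared, row) for row in rows]

    # Block only at the end; result() raises if that particular write failed.
    return [future.result() for future in futures]
```

For very large inputs it is probably wise to cap the number of in-flight futures rather than submitting everything at once; that throttling is left out of the sketch.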

With RF=2 and CL=Quorum, operations failed when 1 node went down because of this.

A: 1 / 1 = 1

B: 2 / 1 = 2

C: 2 * 1 = 2

D: 2 / 2 + 1 = 2

Congratulations!

You’ve reached the 32,000 ops/s milestone!

Achieving performance through code/drivers

when friends aren’t enough

Replication Factor = 3

Insert into a cluster of size 6 with consistency Quorum

Two of the three replica nodes for the token range must be up for the write to succeed
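The quorum arithmetic behind both the RF=2 question and the RF=3 example can be sketched as follows. The helper names are hypothetical; the formula is the usual quorum = floor(RF / 2) + 1.

```python
def quorum(replication_factor: int) -> int:
    """Number of replicas that must respond at consistency level QUORUM."""
    return replication_factor // 2 + 1


def tolerated_replica_failures(replication_factor: int) -> int:
    """How many replicas of a token range may be down while QUORUM still succeeds."""
    return replication_factor - quorum(replication_factor)


if __name__ == "__main__":
    for rf in (1, 2, 3, 5):
        print(f"RF={rf}: quorum={quorum(rf)}, tolerated failures={tolerated_replica_failures(rf)}")
```

With RF=2 the quorum is 2, so no replica may be down; with RF=3 the quorum is still 2, so one replica can be lost. The cluster size (6 in the example) does not enter the formula, only the replicas that own the token range do.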

What happens now?

Cannot achieve consistency level QUORUM

Nodes in partition key DOWN

This mathematical and CS concept helps when data modeling for query optimization.

A: Truth table

B: Brewer’s Theorem

C: CAP Theorem

D: Entropy

Data modelling, CQLSH and more

the truth shall set you free

Motivated by CS, Math, Engineering

Allows for creating building blocks that yield a single output

More complex truth tables can arise

user_stream

← ← ← Partition Key → → →

user_id   username   full_name
1         0          0
0         1          0
0         0          1

How about searching for username?

And what about full_name?
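One possible reading of the truth table above: each row is a query, a 1 marks the column that query searches by, and each query gets its own table partitioned by that column. The sketch below turns the rows into CREATE TABLE statements under that assumption. The `user_stream_by_...` table names and the `text` column types are hypothetical, not taken from the deck.

```python
from typing import List, Sequence, Tuple

COLUMNS = ["user_id", "username", "full_name"]

# Each row is one query; a 1 marks the column the query searches by.
TRUTH_TABLE = [
    (1, 0, 0),
    (0, 1, 0),
    (0, 0, 1),
]


def tables_for(truth_table: Sequence[Tuple[int, ...]],
               columns: Sequence[str],
               base: str = "user_stream") -> List[str]:
    """Emit one CREATE TABLE statement per truth-table row (one table per query)."""
    statements = []
    for row in truth_table:
        keys = [col for col, bit in zip(columns, row) if bit]
        if not keys:
            raise ValueError("every query needs at least one partition key column")
        others = [col for col in columns if col not in keys]
        name = f"{base}_by_{'_'.join(keys)}"
        column_defs = ", ".join(f"{col} text" for col in keys + others)
        statements.append(
            f"CREATE TABLE {name} ({column_defs}, PRIMARY KEY (({', '.join(keys)})));"
        )
    return statements


if __name__ == "__main__":
    for statement in tables_for(TRUTH_TABLE, COLUMNS):
        print(statement)
```

Rows with more than one 1 would produce a composite partition key, which is where the more complex truth tables mentioned earlier come in.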

A shift in paradigms: what should you maximize and reduce for best performance?

A: Reads / Batches

B: Writes / Batches

C: Writes / Deletes

D: Reads / Deletes

Data modelling, CQLSH and more

do the write thing

memtable --- < 100ns

commit log --- ~ 1 ms

DELETES / TTL cause compactions

To avoid hitting the 2B record limit (per row), this term borrowed from RDBMS can still make sense.

A: ACID

B: Vector

C: Rollback

D: Sharding

Data modelling, CQLSH and more

sit on this and rotate
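In Cassandra data modelling, sharding usually means adding a bucket component to the partition key so that one logical row is spread over many partitions and rotates to a fresh one over time. A minimal sketch, assuming a daily bucket is appropriate for the write volume; the function name and the choice of `user_id` plus a UTC day string are illustrative, not from the deck.

```python
from datetime import datetime, timezone
from typing import Tuple


def bucketed_partition_key(user_id: str, event_time: datetime) -> Tuple[str, str]:
    """Return (user_id, 'YYYY-MM-DD') so each user's data rotates to a new partition daily."""
    # Normalise to UTC so every writer agrees on where the day boundary falls.
    day_bucket = event_time.astimezone(timezone.utc).strftime("%Y-%m-%d")
    return (user_id, day_bucket)


if __name__ == "__main__":
    now = datetime.now(timezone.utc)
    print(bucketed_partition_key("user-42", now))
```

The pair would typically become a composite partition key such as `PRIMARY KEY ((user_id, day), event_time)`. Reads then have to name the bucket, so the bucket size should match the time ranges queries actually ask for.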

Many say to use sparingly; I would say avoid like the plague.

A: Batches

B: Synchronous

C: Secondary Indexes

D: MySQL

Performance must-haves

never be second best

writes are distributed among the cluster

each partition key refers to one exact position in which to get a row

but what do we do when we don’t have exactly the right type of index to specify a query?

CREATE TABLE users (
    user varchar,
    email varchar,
    state varchar,
    PRIMARY KEY (user)
);

-- OPTION 1 : create an index
CREATE INDEX idxUBS ON users (state);

-- OPTION 2 : create another table (store data twice)
CREATE TABLE usersByState (
    state varchar,
    user varchar,
    PRIMARY KEY (state, user)
);

This recent addition to C* helps with ACID-like transactions, at a bit of a performance hit.

A: UDT

B: Lightweight Transactions

C: JSON

D: Hinted handoff

Performance must-haves

slimfast agreement

Proposer:

● Prepares a proposal that is sent to a number of Acceptors.

● Waits on an acknowledgement (in the form of a promise) from Acceptors.

● Sends accept message to a Quorum of Acceptors with the new value to commit.

● Returns success/completion to client.

Acceptor:

● Determines if proposal is newer than what it has seen.

● Acknowledges/agrees with its own highest proposal value seen AND the current value (of what is to be set).

● Receives message to commit new value.

● Accepts and returns on successful commit of value.
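The proposer and acceptor roles described above can be sketched as textbook single-decree Paxos. This is an in-memory illustration under simplifying assumptions (no network, no failures, integer proposal numbers starting at 1); it is not Cassandra's lightweight transaction implementation, and the class and function names are hypothetical.

```python
from dataclasses import dataclass
from typing import Any, List, Optional, Tuple


@dataclass
class Acceptor:
    promised: int = 0                     # highest proposal number promised so far
    accepted_n: int = 0                   # number of the last accepted proposal (0 = none)
    accepted_value: Optional[Any] = None  # value of the last accepted proposal

    def prepare(self, n: int) -> Optional[Tuple[int, Any]]:
        """Promise to ignore older proposals and report any value already accepted."""
        if n <= self.promised:
            return None                   # proposal is not newer than what was seen
        self.promised = n
        return (self.accepted_n, self.accepted_value)

    def accept(self, n: int, value: Any) -> bool:
        """Commit the value unless a newer proposal has been promised since."""
        if n < self.promised:
            return False
        self.promised = n
        self.accepted_n = n
        self.accepted_value = value
        return True


def propose(acceptors: List[Acceptor], n: int, value: Any) -> Optional[Any]:
    """Run one proposer round; return the chosen value, or None if the round failed."""
    majority = len(acceptors) // 2 + 1

    # Phase 1: send prepare and collect promises from the acceptors.
    promises = [p for p in (a.prepare(n) for a in acceptors) if p is not None]
    if len(promises) < majority:
        return None

    # If some acceptor already accepted a value, the newest one must be re-proposed.
    prior_n, prior_value = max(promises, key=lambda promise: promise[0])
    if prior_n > 0:
        value = prior_value

    # Phase 2: send accept and count the acknowledgements.
    acks = sum(a.accept(n, value) for a in acceptors)
    return value if acks >= majority else None
```

The extra round trips in the two phases are the "bit of a performance hit" that comes with this kind of agreement.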

Thank you
