introduction to cassandra

116
Introduction to Cassandra Wellington Ruby on Rails User Group Aaron Morton @aaronmorton 24/11/2010

Upload: aaronmorton

Post on 27-Jun-2015

2.732 views

Category:

Technology


1 download

DESCRIPTION

Recent talk I gave at the Wellington Rails User Group. I tried to build up the model of how and why Cassandra does things.

TRANSCRIPT

Page 1: Introduction to Cassandra

Introduction to Cassandra

Wellington Ruby on Rails User GroupAaron Morton @aaronmorton

24/11/2010

Page 2: Introduction to Cassandra

Disclaimer.This is an introduction not

a reference.

Page 3: Introduction to Cassandra

I may, from time to time and for the best possible

reasons, bullshit you.

Page 4: Introduction to Cassandra

What do you already know about Cassandra?

Page 5: Introduction to Cassandra

Get ready.

Page 6: Introduction to Cassandra

The next slide has a lot on it.

Page 7: Introduction to Cassandra

Cassandra is a distributed, fault tolerant, scalable, column oriented data

store.

Page 8: Introduction to Cassandra

A word about “column oriented”.

Page 9: Introduction to Cassandra

Relax.

Page 10: Introduction to Cassandra

It’s different to a row oriented DB like MySQL.

So...

Page 11: Introduction to Cassandra
Page 12: Introduction to Cassandra

For now, think about keys and values. Where each

value is a hash / dict.

Page 13: Introduction to Cassandra

Cassandra’s data model and on disk storage are based on the Google Bigtable

paper from 2006.

Page 14: Introduction to Cassandra

The distributed cluster design is based on the

Amazon Dynamo paper from 2007.

Page 15: Introduction to Cassandra

{‘foo’ => {‘bar’ => ‘baz’,},}

{key => {col_name => col_value,},}

Page 16: Introduction to Cassandra

Easy.Lets store ‘foo’ somewhere.

Page 17: Introduction to Cassandra

'foo'

Page 18: Introduction to Cassandra

But I want to be able to read it back if one machine

fails.

Page 19: Introduction to Cassandra

Lets distribute it on 3 of the 5 nodes I have.

Page 20: Introduction to Cassandra

This is the Replication Factor.

Called RF or N.

Page 21: Introduction to Cassandra

Each node has a token that identifies the upper value of

the key range it is responsible for.

Page 22: Introduction to Cassandra

#1<= E

#2<= J

#3<= O

#4<= T

#5<= Z

Page 23: Introduction to Cassandra

Client connects to a random node and asks it to coordinate storing the ‘foo’

key.

Page 24: Introduction to Cassandra

Each node knows about all other nodes in the cluster,

including their tokens.

Page 25: Introduction to Cassandra

This is achieved using a Gossip protocol. Every

second each node shares it’s full view of the cluster with 1 to 3 other nodes.

Page 26: Introduction to Cassandra

Our coordinator is node 5. It knows node 2 is

responsible for the ‘foo’ key.

Page 27: Introduction to Cassandra

#1<= E

#2'foo'

#3<= O

#4<= T

#5<= Z

Client

Page 28: Introduction to Cassandra

But there is a problem...

Page 29: Introduction to Cassandra

What if we have lots of values between F and J?

Page 30: Introduction to Cassandra

We end up with a “hot” section in our ring of

nodes.

Page 31: Introduction to Cassandra

That’s bad mmmkay?

Page 32: Introduction to Cassandra

You shouldn't have a hot section in your ring.

mmmkay?

Page 33: Introduction to Cassandra

A Partitioner is used to apply a transform to the

key. The transformed values are also used to define a

nodes’ range.

Page 34: Introduction to Cassandra

The Random Partitioner applies a MD5 transform. The range of all possible

keys values is changed to a 128 bit number.

Page 35: Introduction to Cassandra

There are other Partitioners, such as the

Order Preserving Partition. But start with the Random

Partitioner.

Page 36: Introduction to Cassandra

Let’s pretend all keys are now transformed to an

integer between 0 and 9.

Page 37: Introduction to Cassandra

Our 5 node cluster now looks like.

Page 38: Introduction to Cassandra

#1<= 2

#2<= 4

#3<= 6

#4<= 8

#5<= 0

Page 39: Introduction to Cassandra

Pretend our ‘foo’ key transforms to 3.

Page 40: Introduction to Cassandra

#1<= 2

#2"3"

#3<= 6

#4<= 8

#5<= 0

Client

Page 41: Introduction to Cassandra

Good start.

Page 42: Introduction to Cassandra

But where are the replicas?We want to replicate the

‘foo’ key 3 times.

Page 43: Introduction to Cassandra

A Replication Strategy is used to determine which

nodes should store replicas.

Page 44: Introduction to Cassandra

It’s also used to work out which nodes should have a

value when reading.

Page 45: Introduction to Cassandra

Simple Strategy orders the nodes by their token and places the replicas around

the ring.

Page 46: Introduction to Cassandra

Network Topology Strategy is aware of the racks and

Data Centres your servers are in. Can split replicas

between DC’s.

Page 47: Introduction to Cassandra

Simple Strategy will do in most cases.

Page 48: Introduction to Cassandra

Our coordinator will send the write to all 3 nodes at

once.

Page 49: Introduction to Cassandra

#1<= 2

#2"3"

#3"3"

#4"3"

#5<= 0

Client

Page 50: Introduction to Cassandra

Once the 3 replicas tell the coordinator they have

finished, it will tell the client the write completed.

Page 51: Introduction to Cassandra

Done.Let’s go home.

Page 52: Introduction to Cassandra

Hang on.What about fault tolerant?What if node #4 is down?

Page 53: Introduction to Cassandra

#1<= 2

#2"3"

#3"3"

#4"3"

#5<= 0

Client

Page 54: Introduction to Cassandra

The client must specify a Consistency Level for each

operation.

Page 55: Introduction to Cassandra

Consistency Level specifies how many nodes must

agree before the operation is a success.

Page 56: Introduction to Cassandra

For reads is known as R.For writes is known as W.

Page 57: Introduction to Cassandra

Here are the simple ones (there are a few more)...

Page 58: Introduction to Cassandra

One.The coordinator will only

wait for one node to acknowledge the write.

Page 59: Introduction to Cassandra

Quorum.N/2 + 1

Page 60: Introduction to Cassandra

All.

Page 61: Introduction to Cassandra

The cluster will work to eventually make all copies

of the data consistent.

Page 62: Introduction to Cassandra

To get consistent behaviour make sure that R + W > N.

You can do this by...

Page 63: Introduction to Cassandra

Always using Quorum for read and writes.

Or...

Page 64: Introduction to Cassandra

Use All for writes and One for reads.

Or...

Page 65: Introduction to Cassandra

Use All for reads and One for writes.

Page 66: Introduction to Cassandra

Try our write again, using Quorum consistency level.

Page 67: Introduction to Cassandra

Coordinator will wait for 2 nodes to complete the write before telling the client has completed.

Page 68: Introduction to Cassandra

#1<= 2

#2"3"

#3"3"

#4"3"

#5<= 0

Client

Page 69: Introduction to Cassandra

What about when node 4 comes online?

Page 70: Introduction to Cassandra

It will not have our “foo” key.

Page 71: Introduction to Cassandra

Won’t somebody please think of the “foo” key!?

Page 72: Introduction to Cassandra

During our write the coordinator will send a

Hinted Handoff to one of the online replicas.

Page 73: Introduction to Cassandra

Hinted Handoff tells the node that one of the

replicas was down and needs to be updated later.

Page 74: Introduction to Cassandra

#1<= 2

#2"3"

#3"3"

#4"3"

#5<= 0

Client

send "3" to #4

Page 75: Introduction to Cassandra

When node 4 comes back up, node 3 will eventually

process the Hinted Handoffs and send the

“foo” key to it.

Page 76: Introduction to Cassandra

#1<= 2

#2"3"

#3"3"

#4"3"

#5<= 0

Client

Page 77: Introduction to Cassandra

What if the “foo” key is read before the Hinted Handoff is processed?

Page 78: Introduction to Cassandra

#1<= 2

#2"3"

#3"3"

#4""

#5<= 0

Client

send "3" to #4

Page 79: Introduction to Cassandra

At our Quorum CL the coordinator asks all nodes that should have replicas to

perform the read.

Page 80: Introduction to Cassandra

Once CL nodes have returned, their values are

compared.

Page 81: Introduction to Cassandra

If the do not match a Read Repair process is kicked off.

Page 82: Introduction to Cassandra

A timestamp provided by the client during the write is used to determine the

“latest” value.

Page 83: Introduction to Cassandra

The “foo” key is written to node 4, and consistency

achieved, before the coordinator returns to the

client.

Page 84: Introduction to Cassandra

At lower CL the Read Repair happens in the

background and is probabilistic.

Page 85: Introduction to Cassandra

We can force Cassandra to repair everything using the

Anti Entropy feature.

Page 86: Introduction to Cassandra

Anti Entropy is the main feature for achieving

consistency. RR and HH are optimisations.

Page 87: Introduction to Cassandra

Anti Entropy started manually via command line

or Java JMX.

Page 88: Introduction to Cassandra

Great so far.

Page 89: Introduction to Cassandra

But ratemylolcats.com is going to be huge.

How do I store 100 Million pictures of cats?

Page 90: Introduction to Cassandra

Add more nodes.

Page 91: Introduction to Cassandra

More disk capacity, disk IO, memory, CPU, network IO.

More everything.

Page 92: Introduction to Cassandra

Linear scaling.

Page 93: Introduction to Cassandra

Clusters of 100+ TB.

Page 94: Introduction to Cassandra

And now for the data model.

Page 95: Introduction to Cassandra

From the outside in.

Page 96: Introduction to Cassandra

A Keyspace is the container for everything in your

application.

Page 97: Introduction to Cassandra

Keyspaces can be thought of as Databases.

Page 98: Introduction to Cassandra

A Column Family is a container for ordered and

indexed Columns.

Page 99: Introduction to Cassandra

Columns have a name, value, and timestamp

provided by the client.

Page 100: Introduction to Cassandra

The CF indexes the columns by name and

supports get operations by name.

Page 101: Introduction to Cassandra

CF’s do not define which columns can be stored in

them.

Page 102: Introduction to Cassandra

Column Families have a large memory overhead.

Page 103: Introduction to Cassandra

You typically have few (<10) CF’s in your Keyspace. But

there is no limit.

Page 104: Introduction to Cassandra

We have Rows. Rows have a key.

Page 105: Introduction to Cassandra

Rows store columns in one or more Column Families.

Page 106: Introduction to Cassandra

Different rows can store different columns in the same Column Family.

Page 107: Introduction to Cassandra

User CF

username => fredd_o_b => 04/03

username => bobcity => wellington

key => fred

key => bob

Page 108: Introduction to Cassandra

A key can store different columns in different Column Families.

Page 109: Introduction to Cassandra

User CF

username => fredd_o_b => 04/03

09:01 => tweet_6009:02 => tweet_70

key => fred

key => fred

Timeline CF

Page 110: Introduction to Cassandra

Here comes the Super Column Family to ruin it all.

Page 111: Introduction to Cassandra

Arrgggghhhhh.

Page 112: Introduction to Cassandra

A Super Column Family is a container for ordered and indexes Super Columns.

Page 113: Introduction to Cassandra

A Super Column has a name and an ordered and indexed list of Columns.

Page 114: Introduction to Cassandra

So the Super Column Family just gives another

level to our hash.

Page 115: Introduction to Cassandra

Social Super CF

following => {bob => 01/01/2010, tom => 01/02/2010}

followers => {bob => 01/01/2010}

key => fred

Page 116: Introduction to Cassandra

How about some code?