Cassandra for Ruby/Rails Devs

Intro to Cassandra, Tyler Hobbs

Upload: tyler-hobbs

Post on 24-Jan-2015


Page 1: Cassandra for Ruby/Rails Devs

Intro to Cassandra

Tyler Hobbs

Page 2: Cassandra for Ruby/Rails Devs

History

Cassandra = Dynamo (clustering) + BigTable (data model)

Inbox search

Page 3: Cassandra for Ruby/Rails Devs

Users

Page 4: Cassandra for Ruby/Rails Devs

Clustering

Every node plays the same role
– No masters, slaves, or special nodes
– No single point of failure

Page 5: Cassandra for Ruby/Rails Devs

Consistent Hashing

[Diagram: ring of nodes at 0, 10, 20, 30, 40, 50]

Page 6: Cassandra for Ruby/Rails Devs

Consistent Hashing

[Diagram: ring of nodes at 0, 10, 20, 30, 40, 50]

Key: 'www.google.com'

Page 7: Cassandra for Ruby/Rails Devs

Consistent Hashing

[Diagram: ring of nodes at 0, 10, 20, 30, 40, 50]

Key: 'www.google.com'

md5('www.google.com') = 14

Page 10: Cassandra for Ruby/Rails Devs

Consistent Hashing

[Diagram: ring of nodes at 0, 10, 20, 30, 40, 50]

Key: 'www.google.com'

md5('www.google.com') = 14

Replication Factor = 3
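The ring lookup on these slides can be sketched in a few lines of Python. This is a toy model, not Cassandra's implementation: the ring size of 60 and the rule that the first replica is the first node at or after the key's position are assumptions, and `toy_token` and `replicas_for_token` are hypothetical names. The slides place the key at 14 for illustration; a real md5 value reduced onto this toy ring will generally land somewhere else.

```python
import bisect
import hashlib

# Node positions taken from the ring diagram in the slides.
NODE_TOKENS = [0, 10, 20, 30, 40, 50]
RING_SIZE = 60  # assumed size of the toy ring (not stated in the slides)


def toy_token(key):
    """Hash a key onto the toy ring with md5 (real token spaces are far larger)."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % RING_SIZE


def replicas_for_token(token, rf, node_tokens=NODE_TOKENS):
    """Return the rf nodes that store a token, walking clockwise around the ring.

    Assumption: the first replica is the first node whose position is >= the
    key's position, wrapping around to the lowest node past the end.
    """
    tokens = sorted(node_tokens)
    start = bisect.bisect_left(tokens, token) % len(tokens)
    count = min(rf, len(tokens))
    return [tokens[(start + i) % len(tokens)] for i in range(count)]


# With the key at 14 and Replication Factor = 3, this sketch picks
# the nodes at 20, 30 and 40.
print(replicas_for_token(14, rf=3))  # [20, 30, 40]
```

Because every node can run the same lookup, any node can work out where a key lives, which is what lets a client talk to any node.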

Page 11: Cassandra for Ruby/Rails Devs

Clustering

Client can talk to any node

Page 12: Cassandra for Ruby/Rails Devs

Scaling

[Diagram: ring of nodes at 0, 10, 20, 30, 50; the node at 50 owns the red portion]

RF = 2

Page 13: Cassandra for Ruby/Rails Devs

Scaling

[Diagram: ring of nodes at 0, 10, 20, 30, 50, with a new node at 40]

Add a new node at 40

RF = 2
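To see how little data moves when the node at 40 joins, here is a rough Python sketch. It assumes a toy ring of 60 positions and a simple "next nodes clockwise" replica placement; `replicas` and `changed_tokens` are hypothetical helpers, not part of any Cassandra API.

```python
import bisect

RING_SIZE = 60  # assumed size of the toy ring (not stated in the slides)


def replicas(token, node_tokens, rf):
    """First rf nodes clockwise from the token (assumed placement rule)."""
    tokens = sorted(node_tokens)
    start = bisect.bisect_left(tokens, token) % len(tokens)
    return [tokens[(start + i) % len(tokens)] for i in range(min(rf, len(tokens)))]


def changed_tokens(before, after, rf):
    """Ring positions whose replica set differs between the two rings."""
    return [t for t in range(RING_SIZE)
            if replicas(t, before, rf) != replicas(t, after, rf)]


before = [0, 10, 20, 30, 50]
after = [0, 10, 20, 30, 40, 50]  # new node added at 40

primary_moves = changed_tokens(before, after, rf=1)
replica_moves = changed_tokens(before, after, rf=2)
print(primary_moves[0], primary_moves[-1])  # 31 40
print(replica_moves[0], replica_moves[-1])  # 21 40
```

In this model only the positions between the new node and its neighbours change hands; the rest of the ring is untouched.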

Page 15: Cassandra for Ruby/Rails Devs

Node Failures

[Diagram: ring of nodes at 0, 10, 20, 30, 40, 50, with a label marking Replicas]

RF = 2

Page 17: Cassandra for Ruby/Rails Devs

Node Failures

[Diagram: ring of nodes at 0, 10, 20, 30, 40, 50]

RF = 2

Page 18: Cassandra for Ruby/Rails Devs

Consistency, Availability

Consistency
– Can I read stale data?

Availability
– Can I write/read at all?

Tunable Consistency

Page 19: Cassandra for Ruby/Rails Devs

Consistency

N = Total number of replicas

R = Number of replicas read from
– (before the response is returned)

W = Number of replicas written to
– (before the write is considered a success)

Page 20: Cassandra for Ruby/Rails Devs

Consistency

N = Total number of replicas

R = Number of replicas read from
– (before the response is returned)

W = Number of replicas written to
– (before the write is considered a success)

W + R > N gives strong consistency

Page 21: Cassandra for Ruby/Rails Devs

Consistency

W + R > N gives strong consistency

N = 3, W = 2, R = 2

2 + 2 > 3 ==> strongly consistent

Page 22: Cassandra for Ruby/Rails Devs

Consistency

W + R > N gives strong consistency

N = 3, W = 2, R = 2

2 + 2 > 3 ==> strongly consistent

Only 2 of the 3 replicas must be available.
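The W + R > N rule works because any set of W written replicas must then share at least one member with any set of R replicas read. A small Python sketch can check this by brute force; `is_strongly_consistent` and `always_overlap` are hypothetical helper names used only for illustration.

```python
from itertools import combinations


def is_strongly_consistent(n, r, w):
    """The rule from the slides: W + R > N."""
    return r + w > n


def always_overlap(n, r, w):
    """Brute force: does every write set of size w intersect every read set of size r?"""
    replicas = range(n)
    return all(set(write_set) & set(read_set)
               for write_set in combinations(replicas, w)
               for read_set in combinations(replicas, r))


print(is_strongly_consistent(3, 2, 2))  # True
print(always_overlap(3, 2, 2))          # True: every read sees the latest write
print(always_overlap(3, 1, 1))          # False: a read may miss the written replica
```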

Page 23: Cassandra for Ruby/Rails Devs

Consistency

Tunable Consistency
– Specify N (Replication Factor) per data set
– Specify R, W per operation

Page 24: Cassandra for Ruby/Rails Devs

Consistency

Tunable Consistency
– Specify N (Replication Factor) per data set
– Specify R, W per operation
– Quorum: floor(N/2) + 1
  • R = W = Quorum
  • Strong consistency
  • Tolerate the loss of N – Quorum replicas
– R, W can also be 1 or N

Page 25: Cassandra for Ruby/Rails Devs

Availability

Can tolerate the loss of:
– N – R replicas for reads
– N – W replicas for writes
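As a quick worked example of the quorum and availability arithmetic, the following Python sketch computes the quorum size and the number of replicas that can be lost at each level. The function names are hypothetical, and quorum is taken as integer division of N by 2, plus 1.

```python
def quorum(n):
    """Quorum size for n replicas: integer division of n by 2, plus 1."""
    return n // 2 + 1


def tolerated_losses(n, required):
    """Replicas that can be down while an operation needing `required` replicas still succeeds."""
    return n - required


for n in (3, 5):
    q = quorum(n)
    print(f"N={n}: quorum={q}, "
          f"loss tolerated at quorum={tolerated_losses(n, q)}, "
          f"at 1={tolerated_losses(n, 1)}, at N={tolerated_losses(n, n)}")
# N=3: quorum=2, loss tolerated at quorum=1, at 1=2, at N=0
# N=5: quorum=3, loss tolerated at quorum=2, at 1=4, at N=0
```

Reading and writing at quorum keeps W + R > N while still surviving the loss of a minority of replicas, which is why it is the usual middle ground between 1 and N.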

Page 26: Cassandra for Ruby/Rails Devs

CAP Theorem

During node or network failure:

[Diagram: Availability and Consistency, each on a scale up to 100%, with regions labeled Possible and Not Possible]

Page 27: Cassandra for Ruby/Rails Devs

CAP Theorem

During node or network failure:

[Diagram: Availability and Consistency, each on a scale up to 100%, with regions labeled Possible and Not Possible, and Cassandra marked]

Page 28: Cassandra for Ruby/Rails Devs

Clustering

No single point of failure

Replication that works

Scales linearly
– 2x nodes = 2x performance
  • For both reads and writes
– Up to hundreds of nodes
– See “Netflix: 1 million writes/sec on AWS”

Operationally simple

Multi-Datacenter Replication

Page 29: Cassandra for Ruby/Rails Devs

Data Model

Comes from Google BigTable

Goals
– Commodity Hardware
  • Spinning disks
– Handle data sets much larger than memory
  • Minimize disk seeks
– High throughput
– Low latency
– Durable

Page 30: Cassandra for Ruby/Rails Devs

Column Families

Static
– Object data
– Similar to a table in a relational database

Dynamic
– Precomputed query results
– Materialized views

(these are just educational classifications)

Page 31: Cassandra for Ruby/Rails Devs

Static Column Families

Users

zznate     password: *     name: Nate
driftx     password: *     name: Brandon
thobbs     password: *     name: Tyler
jbellis    password: *     name: Jonathan     site: riptano.com

Page 32: Cassandra for Ruby/Rails Devs

Dynamic Column Families

Rows
– Each row has a unique primary key
– Sorted list of (name, value) tuples
  • Like an ordered hash
– The (name, value) tuple is called a “column”
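One way to picture a row is as a small ordered hash. The Python sketch below keeps (name, value) columns sorted by name; `Row` is a hypothetical class for illustration and says nothing about how Cassandra stores rows internally.

```python
import bisect


class Row:
    """A row as a list of (name, value) columns kept sorted by column name."""

    def __init__(self):
        self._names = []
        self._values = []

    def set(self, name, value):
        """Insert a column, or overwrite it if the name already exists."""
        i = bisect.bisect_left(self._names, name)
        if i < len(self._names) and self._names[i] == name:
            self._values[i] = value
        else:
            self._names.insert(i, name)
            self._values.insert(i, value)

    def columns(self):
        """Return the columns in sorted name order."""
        return list(zip(self._names, self._values))


row = Row()
row.set("password", "*")
row.set("name", "Tyler")
row.set("age", 24)
row.set("age", 34)  # writing an existing column overwrites it
print(row.columns())  # [('age', 34), ('name', 'Tyler'), ('password', '*')]
```

Keeping the columns sorted is what makes range reads over column names cheap.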

Page 33: Cassandra for Ruby/Rails Devs

Dynamic Column Families

Following

[Table: one row per user (zznate, driftx, thobbs, jbellis); the column names in each row are other users (driftx, thobbs, mdennis, zznate, pcmanus, xedin), with empty values]

Page 34: Cassandra for Ruby/Rails Devs

Dynamic Column Families

Other Examples:
– Timeline of tweets by a user
– Timeline of tweets by all of the people a user is following
– List of comments sorted by score
– List of friends grouped by state
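As a rough illustration of a dynamic column family used as a precomputed query result, the Python sketch below builds the "tweets by everyone a user follows" timeline at write time. The follower lists, timestamps, and function names are all made up for the example; each timeline is one row whose columns are kept sorted by timestamp.

```python
import bisect
from collections import defaultdict

# Hypothetical follower lists: author -> users who follow that author.
followers = {"driftx": ["thobbs", "zznate"], "thobbs": ["jbellis"]}

# One row per reader; columns are (timestamp, text) pairs kept in sorted order.
timelines = defaultdict(list)


def post_tweet(author, timestamp, text):
    """Write the tweet into the timeline row of every follower."""
    for reader in followers.get(author, []):
        bisect.insort(timelines[reader], (timestamp, f"{author}: {text}"))


def latest(user, count):
    """Read the newest `count` entries from one row, newest first."""
    row = timelines.get(user, [])
    return [text for _, text in row[::-1][:count]]


post_tweet("driftx", 1002, "second")
post_tweet("driftx", 1001, "first")
print(latest("thobbs", 1))  # ['driftx: second']
```

The read is a single-row slice, so the cost of building the timeline is paid once per write rather than on every page view.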

Page 35: Cassandra for Ruby/Rails Devs

The Data API

RPC-based API
– github.com/twitter/cassandra

CQL (Cassandra Query Language)
– code.google.com/a/apache-extras.org/p/cassandra-ruby/

Page 36: Cassandra for Ruby/Rails Devs

Inserting Data

INSERT INTO users (KEY, 'name', 'age') VALUES ('thobbs', 'Tyler', 24);

Page 37: Cassandra for Ruby/Rails Devs

Updating Data

Updates are the same as inserts:

INSERT INTO users (KEY, 'age') VALUES ('thobbs', 34);

Or

UPDATE users SET 'age' = 34 WHERE KEY = 'thobbs';

Page 38: Cassandra for Ruby/Rails Devs

Fetching Data

Whole row select:

SELECT * FROM users WHERE KEY = 'thobbs';

Page 39: Cassandra for Ruby/Rails Devs

Fetching Data

Explicit column select:

SELECT 'name', 'age' FROM users WHERE KEY = 'thobbs';

Page 40: Cassandra for Ruby/Rails Devs

Fetching Data

Get a slice of columns

UPDATE letters SET 1='a', 2='b', 3='c', 4='d', 5='e' WHERE KEY = 'key';

SELECT 1..3 FROM letters WHERE KEY = 'key';

Returns [(1, a), (2, b), (3, c)]

Page 41: Cassandra for Ruby/Rails Devs

Fetching Data

Get a slice of columns

SELECT FIRST 2 FROM letters WHERE KEY = 'key';

Returns [(1, a), (2, b)]

SELECT FIRST 2 REVERSED FROM letters WHERE KEY = 'key';

Returns [(5, e), (4, d)]

Page 42: Cassandra for Ruby/Rails Devs

Fetching Data

Get a slice of columns

SELECT 3..'' FROM letters WHERE KEY = 'key';

Returns [(3, c), (4, d), (5, e)]

SELECT FIRST 2 REVERSED 4..'' FROM letters WHERE KEY = 'key';

Returns [(4, d), (3, c)]
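The slice queries on the last few slides can be mimicked over an in-memory sorted row. This Python sketch is only a model of the semantics shown in the slides: `get_slice` is a hypothetical function, `None` stands in for the empty bound `''`, and both bounds are treated as inclusive.

```python
def get_slice(columns, start=None, finish=None, first=None, reverse=False):
    """Return a slice of (name, value) columns from a row sorted by name.

    start/finish are inclusive bounds; None means unbounded.
    With reverse=True the row is walked from the highest name downward,
    so start is the upper bound and finish the lower bound.
    """
    ordered = list(reversed(columns)) if reverse else list(columns)

    def in_range(name):
        if reverse:
            return ((start is None or name <= start)
                    and (finish is None or name >= finish))
        return ((start is None or name >= start)
                and (finish is None or name <= finish))

    result = [(name, value) for name, value in ordered if in_range(name)]
    return result if first is None else result[:first]


letters = [(1, "a"), (2, "b"), (3, "c"), (4, "d"), (5, "e")]

print(get_slice(letters, start=1, finish=3))               # SELECT 1..3
print(get_slice(letters, first=2))                         # SELECT FIRST 2
print(get_slice(letters, first=2, reverse=True))           # SELECT FIRST 2 REVERSED
print(get_slice(letters, start=3))                         # SELECT 3..''
print(get_slice(letters, start=4, first=2, reverse=True))  # SELECT FIRST 2 REVERSED 4..''
```

Each call reproduces the result listed on the corresponding slide.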

Page 43: Cassandra for Ruby/Rails Devs

Deleting Data

Delete a whole row:

DELETE FROM users WHERE KEY = 'thobbs';

Delete specific columns:

DELETE 'age' FROM users WHERE KEY = 'thobbs';

Page 44: Cassandra for Ruby/Rails Devs

Secondary Indexes

Built-in basic indexes

CREATE INDEX ageIndex ON users (age);

SELECT name FROM users WHERE age = 24 AND state = 'TX';

Page 45: Cassandra for Ruby/Rails Devs

Performance

Writes
– 10k – 30k per second per node
– Sub-millisecond latency

Reads
– 1k – 20k per second per node (depends on data set, caching)
– 0.1 to 10ms latency

Page 46: Cassandra for Ruby/Rails Devs

Other Features

Distributed Counters
– Can support millions of high-volume counters

Excellent Multi-datacenter Support
– Disaster recovery
– Locality

Hadoop Integration
– Isolation of resources
– Hive and Pig drivers

Compression

Page 47: Cassandra for Ruby/Rails Devs

What Cassandra Can't Do

Transactions
– Unless you use a distributed lock
– Atomicity, Isolation
– These aren't needed as often as you'd think

Limited support for ad-hoc queries
– Know what you want to do with the data

Page 48: Cassandra for Ruby/Rails Devs

Not One-size-fits-all

Use alongside an RDBMS

Page 49: Cassandra for Ruby/Rails Devs

Problems you shouldn't solve with C*

Prototyping

Distributed Locking

Small datasets
– (When you don't need availability)

Complex graph processing
– Shallow graph queries work well, though

Fundamentally highly relational/transactional data

Page 50: Cassandra for Ruby/Rails Devs

The sweet spot for Cassandra

Large dataset, low latency queries

Simple to medium complexity queries
– Key/value
– Time series, ordered data
– Lists, sets, maps

High Availability

Page 51: Cassandra for Ruby/Rails Devs

The sweet spot for Cassandra

Social
– Texts, comments, check-ins, collaboration

Activity
– Feeds, timelines, clickstreams, logs, sensor data

Metrics
– Performance data over time
– CloudKick, DataStax OpsCenter

Text Search
– Inbox search at Facebook

Page 52: Cassandra for Ruby/Rails Devs

ORMs

Poor integration

ORMs are not a natural fit for Cassandra
– In C*, we mainly care about queries, not objects
– Beyond simple K/V, abstraction breaks

Suggestion: don't waste time with an ORM
– C* will only be used for a specific subset of your data/queries
– Use the C* API directly in your model

Page 53: Cassandra for Ruby/Rails Devs

Tyler Hobbs, @tylhobbs

[email protected]

Questions?