Cassandra for Rails

A highly scalable, eventually consistent, distributed, structured key-value store. Wednesday, December 16, 2009

Upload: pablete

Post on 18-Dec-2014

DESCRIPTION

Cassandra DB: What do Facebook, Twitter and Digg have in common? Conferencia Rails 2009

TRANSCRIPT

Page 1: Cassandra for Rails

A highly scalable, eventually consistent, distributed, structured key-value store.

Wednesday, December 16, 2009

Page 2: Cassandra for Rails

Why

• Scaling existing relational databases is hard.

• Sharding is one solution, but it makes your RDBMS unusable.

• Operational nightmare.

Page 3: Cassandra for Rails

The Big Data Age

• Scale horizontally: just add more servers.

• Cluster growth: load is balanced automatically.

• Flexible schemas

• Key-oriented queries

• High availability, 24 x 7 x 365

Page 4: Cassandra for Rails

Cassandra Design

• High availability.

• Eventual consistency.

• Incremental scalability.

• Optimistic Replication.

• Low total cost of ownership.

• Tunable tradeoffs between consistency & latency.

• Minimal administration.
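
The tradeoff between consistency and latency is often described with three numbers: N replicas per key, W replicas that must acknowledge a write, and R replicas consulted on a read. The sketch below is only an illustration of that rule of thumb; the helper name and the chosen values are assumptions, not part of the slides.

```ruby
# Hypothetical helper: true when every read quorum overlaps every write quorum,
# so at least one replica consulted by a read has seen the latest acknowledged write.
def overlapping_quorums?(n, r, w)
  r + w > n
end

N = 3
[[1, 1], [1, 3], [2, 2], [3, 1]].each do |r, w|
  label = overlapping_quorums?(N, r, w) ? "read and write sets overlap" : "reads may be stale"
  puts "N=#{N} R=#{r} W=#{w}: #{label}"
end
```

Smaller R and W mean fewer replicas to wait for (lower latency) at the price of possibly stale reads.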

Page 5: Cassandra for Rails

General Data Models

• Key-Value

• Key-Columns

Row "com.cnn.www":
    "contents:"            "<html>..." at t3, t5 and t6
    "anchor:cnnsi.com"     "CNN" at t9
    "anchor:my.look.ca"    "CNN.com" at t8

Figure 1: A slice of an example table that stores Web pages. The row name is a reversed URL. The contents column family contains the page contents, and the anchor column family contains the text of any anchors that reference the page. CNN’s home page is referenced by both the Sports Illustrated and the MY-look home pages, so the row contains columns named anchor:cnnsi.com and anchor:my.look.ca. Each anchor cell has one version; the contents column has three versions, at timestamps t3, t5, and t6.

We settled on this data model after examining a variety of potential uses of a Bigtable-like system. As one concrete example that drove some of our design decisions, suppose we want to keep a copy of a large collection of web pages and related information that could be used by many different projects; let us call this particular table the Webtable. In Webtable, we would use URLs as row keys, various aspects of web pages as column names, and store the contents of the web pages in the contents: column under the timestamps when they were fetched, as illustrated in Figure 1.

Rows

The row keys in a table are arbitrary strings (currently up to 64KB in size, although 10-100 bytes is a typical size for most of our users). Every read or write of data under a single row key is atomic (regardless of the number of different columns being read or written in the row), a design decision that makes it easier for clients to reason about the system’s behavior in the presence of concurrent updates to the same row.

Bigtable maintains data in lexicographic order by row key. The row range for a table is dynamically partitioned. Each row range is called a tablet, which is the unit of distribution and load balancing. As a result, reads of short row ranges are efficient and typically require communication with only a small number of machines. Clients can exploit this property by selecting their row keys so that they get good locality for their data accesses. For example, in Webtable, pages in the same domain are grouped together into contiguous rows by reversing the hostname components of the URLs. For example, we store data for maps.google.com/index.html under the key com.google.maps/index.html. Storing pages from the same domain near each other makes some host and domain analyses more efficient.

Column Families

Column keys are grouped into sets called column families, which form the basic unit of access control. All data stored in a column family is usually of the same type (we compress data in the same column family together). A column family must be created before data can be stored under any column key in that family; after a family has been created, any column key within the family can be used. It is our intent that the number of distinct column families in a table be small (in the hundreds at most), and that families rarely change during operation. In contrast, a table may have an unbounded number of columns.

A column key is named using the following syntax: family:qualifier. Column family names must be printable, but qualifiers may be arbitrary strings. An example column family for the Webtable is language, which stores the language in which a web page was written. We use only one column key in the language family, and it stores each web page’s language ID. Another useful column family for this table is anchor; each column key in this family represents a single anchor, as shown in Figure 1. The qualifier is the name of the referring site; the cell contents is the link text.

Access control and both disk and memory accounting are performed at the column-family level. In our Webtable example, these controls allow us to manage several different types of applications: some that add new base data, some that read the base data and create derived column families, and some that are only allowed to view existing data (and possibly not even to view all of the existing families for privacy reasons).

Timestamps

Each cell in a Bigtable can contain multiple versions of the same data; these versions are indexed by timestamp. Bigtable timestamps are 64-bit integers. They can be assigned by Bigtable, in which case they represent “real time” in microseconds, or be explicitly assigned by client

Bigtable: A Distributed Storage System for Structured Data - Google Inc (to appear in OSDI 2006)

Page 6: Cassandra for Rails

General Data Models

• Key-Value

• Key-Columns

[Figure 1 of the Bigtable paper, as on the previous slide, annotated with two "Column Family" labels.]

A Column Family plays a role similar to a “Locality Group” in Google’s Bigtable terminology (in Bigtable, a locality group is a set of one or more column families that are stored together).

Bigtable: A Distributed Storage System for Structured Data - Google Inc

Page 7: Cassandra for Rails

STORAGE LAYOUTS

Page 8: Cassandra for Rails

Row-based Storage

• Pros: Good locality of access (on disk and in cache) across the different columns of a row. Reading or writing a single row is a single IO operation.

• Cons: If you want to scan only one column, you still read all the data.

Design Patterns for Distributed Non-Relational Databases - Todd Lipcon, Cloudera

Page 9: Cassandra for Rails

Columnar Storage

• Pros: Data for a given column is stored sequentially, so scanning a single column (e.g. aggregate queries) is fast.

• Cons: Reading a single row may seek once per column.

Design Patterns for Distributed Non-Relational Databases - Todd Lipcon, Cloudera

Page 10: Cassandra for Rails

Columnar Storage with Column Families (Locality Groups)

• Columns are organized into families (“locality groups”).

• Benefits of a row-based layout within a group.

• Benefits of a column-based layout across groups: you don’t have to read groups you don’t care about.

• Pros: Scanning the columns of a single family (aggregate queries) is fast.

• Cons: Reading a whole row may seek once per column family.

Design Patterns for Distributed Non-Relational Databases - Todd Lipcon, Cloudera
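
To make the difference between the layouts concrete, the following sketch flattens the same small table in row-based and in columnar order. The table contents and all method names are made up for illustration; real storage engines work on disk blocks, not Ruby arrays.

```ruby
# Hypothetical sample table; column names and values are made up for illustration.
COLUMNS = [:id, :name, :age].freeze
ROWS = [
  { id: 1, name: "Pablo",   age: 30 },
  { id: 2, name: "Antonio", age: 28 },
  { id: 3, name: "Mauro",   age: 25 }
].freeze

# Row-based layout: the values of one row are adjacent.
def row_layout(rows, columns)
  rows.flat_map { |row| columns.map { |c| row[c] } }
end

# Columnar layout: the values of one column are adjacent.
def columnar_layout(rows, columns)
  columns.flat_map { |c| rows.map { |row| row[c] } }
end

# Reading one whole row is a single contiguous slice of the row layout.
def read_row(rows, columns, index)
  row_layout(rows, columns)[index * columns.size, columns.size]
end

# Scanning one column is a single contiguous slice of the columnar layout.
def scan_column(rows, columns, column)
  columnar_layout(rows, columns)[columns.index(column) * rows.size, rows.size]
end

p read_row(ROWS, COLUMNS, 1)
p scan_column(ROWS, COLUMNS, :age)
```

Grouping columns into families sits between the two: each family is laid out row by row, and families you do not query are never touched.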

Page 11: Cassandra for Rails

Log Structured Merge Trees

• Writes go to a commit log and in-memory storage (Memtable)

• The Memtable is occasionally flushed to disk (SSTable)

• The SSTables are periodically compacted into one.

The log-structured merge-tree (LSM-tree) P. E. O’Neil, E. Cheng, D. Gawlick, and E. J. O’Neil.

Convert random writes to sequential writes.
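
The three bullets above can be sketched as a toy in-memory store. This is a simplified illustration under stated assumptions: the class name, the tiny memtable limit and the array-based "SSTables" are hypothetical and do not reflect Cassandra's actual implementation.

```ruby
# Toy LSM store: names and thresholds are illustrative, not Cassandra's.
class TinyLSM
  attr_reader :commit_log, :memtable, :sstables

  def initialize(memtable_limit: 2)
    @memtable_limit = memtable_limit
    @commit_log = []   # append-only; stands in for the on-disk commit log
    @memtable   = {}   # in-memory writes
    @sstables   = []   # immutable sorted runs, oldest first
  end

  def write(key, value)
    @commit_log << [key, value]   # sequential append, no random IO
    @memtable[key] = value
    flush if @memtable.size >= @memtable_limit
  end

  # Newest data wins: memtable first, then SSTables from newest to oldest.
  def read(key)
    return @memtable[key] if @memtable.key?(key)
    @sstables.reverse_each do |table|
      hit = table.assoc(key)
      return hit[1] if hit
    end
    nil
  end

  # Write the memtable out as one sorted, immutable run.
  def flush
    return if @memtable.empty?
    @sstables << @memtable.sort.freeze
    @memtable = {}
    @commit_log.clear   # flushed entries no longer need replaying
  end

  # Merge all runs into one; later (newer) runs overwrite older values.
  def compact
    merged = {}
    @sstables.each { |table| table.each { |k, v| merged[k] = v } }
    @sstables = merged.empty? ? [] : [merged.sort.freeze]
  end
end

store = TinyLSM.new
store.write("b", 1)
store.write("a", 2)   # memtable limit reached, first flush
store.write("b", 3)
store.write("c", 4)   # second flush
store.compact
p store.sstables
p store.read("b")
```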

Page 12: Cassandra for Rails

Write Operations

[Diagram: Write and Read arrows; the Memtable in RAM; the Commit Log and three SSTables on disk.]

Page 13: Cassandra for Rails

Read Operations

[Diagram: Write and Read arrows; the Memtable in RAM; the Commit Log and three SSTables on disk.]

Page 14: Cassandra for Rails

Read Operations

[Diagram: Write and Read arrows; the Memtable in RAM; the Commit Log and three SSTables on disk; a Bloom Filter on the read path.]
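
A Bloom filter lets a read skip SSTables that definitely do not contain the requested key. The sketch below is a minimal illustration; the bit-array size, the number of hashes and the salted-MD5 construction are assumptions chosen for brevity.

```ruby
require "digest"

# Toy Bloom filter; sizes and hash construction are illustrative choices.
class BloomFilter
  def initialize(bits: 1024, hashes: 3)
    @bits = bits
    @hashes = hashes
    @field = Array.new(bits, false)
  end

  def add(key)
    positions(key).each { |i| @field[i] = true }
  end

  # false => the key was definitely never added; true => the key may be present.
  def maybe_include?(key)
    positions(key).all? { |i| @field[i] }
  end

  private

  # Derive several bit positions by salting one digest with a counter.
  def positions(key)
    (0...@hashes).map { |n| Digest::MD5.hexdigest("#{n}:#{key}").to_i(16) % @bits }
  end
end

filter = BloomFilter.new
filter.add("pablete")
puts filter.maybe_include?("pablete")   # always true for an added key
puts filter.maybe_include?("xurde")     # false unless there is a false positive
```

False positives only cost an unnecessary SSTable lookup; false negatives cannot happen.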

Page 15: Cassandra for Rails

Flush Memtable

[Diagram: Write and Read arrows; the Memtable in RAM; the Commit Log and three SSTables on disk.]

Page 16: Cassandra for Rails

Flush Memtable

[Diagram: RAM and disk; the disk holds SSTable 1, SSTable 2, SSTable 3 and SSTable 4.]

Page 17: Cassandra for Rails

Compaction

[Diagram: SSTable 1, SSTable 2, SSTable 3 and SSTable 4 on disk are combined by a merge sort into SSTable 1'.]
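
Because every SSTable is already sorted by key, compaction can merge them in a single pass. The sketch below assumes a hypothetical in-memory representation, arrays of [key, value, timestamp] sorted by key with each key at most once per table, and keeps the version with the highest timestamp.

```ruby
# Merge several sorted runs into one, keeping the newest version of each key.
# Each hypothetical SSTable is an array of [key, value, timestamp] sorted by key.
def compact(sstables)
  cursors = sstables.map { 0 }   # next unread index in each table
  merged = []
  loop do
    live = sstables.each_index.select { |i| cursors[i] < sstables[i].size }
    break if live.empty?
    smallest = live.map { |i| sstables[i][cursors[i]][0] }.min
    # Tables whose current entry carries the smallest key.
    same = live.select { |i| sstables[i][cursors[i]][0] == smallest }
    merged << same.map { |i| sstables[i][cursors[i]] }.max_by { |_k, _v, ts| ts }
    same.each { |i| cursors[i] += 1 }
  end
  merged
end

older = [["a", "1", 10], ["c", "3", 10]]
newer = [["b", "2", 20], ["c", "33", 30]]
p compact([older, newer])
```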

Page 18: Cassandra for Rails

Compaction

[Diagram: RAM and disk; only SSTable 1' remains on disk.]

Page 19: Cassandra for Rails

Write Operations

[Diagram: Write and Read arrows; the Memtable in RAM; the Commit Log, three SSTables and SSTable 1' on disk.]

Page 20: Cassandra for Rails

WRITE PROPERTIES

• No locks in the critical path

• Sequential disk access

• Behaves like a write back Cache

• Append support without read ahead

• Atomicity guarantee for a key

• “Always Writable”: accepts writes during failure scenarios

Page 21: Cassandra for Rails

CAP Theorem

• CONSISTENCY: how and whether a system is left in a consistent state after an operation.

• AVAILABILITY: the system is guaranteed to remain operational over some period of time.

• PARTITION-TOLERANCE: the ability of a system to continue operating in the presence of network partitions.

Page 22: Cassandra for Rails

Eventual Consistency

• As t → ∞, readers will see writes.

• In a steady state, the system is guaranteed to eventually return the last written value.

• Examples: DNS or MySQL slave replication.

Page 23: Cassandra for Rails

Partitioning Scheme: Consistent Hashing

h(key)

Page 24: Cassandra for Rails

Partitioning Scheme: Consistent Hashing

key previously owned by A

Page 25: Cassandra for Rails

Partitioning Scheme: Consistent Hashing
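
A minimal sketch of consistent hashing follows, assuming an MD5-based h(key) and nodes placed on the ring by hashing their names; both choices, and the class and method names, are illustrative. It shows that adding a node only takes over keys from its neighbour on the ring.

```ruby
require "digest"

# Toy ring: node names and the MD5-based hash are illustrative choices.
class HashRing
  def initialize(nodes)
    @ring = nodes.map { |n| [position(n), n] }.sort
  end

  def add(node)
    @ring = (@ring + [[position(node), node]]).sort
  end

  # The owner is the first node at or after h(key), wrapping around the ring.
  def node_for(key)
    h = position(key)
    owner = @ring.find { |pos, _node| pos >= h } || @ring.first
    owner[1]
  end

  private

  def position(value)
    Digest::MD5.hexdigest(value.to_s).to_i(16)
  end
end

ring = HashRing.new(%w[A B C])
keys = (1..1000).map { |i| "key#{i}" }
before = keys.map { |k| ring.node_for(k) }
ring.add("D")
after = keys.map { |k| ring.node_for(k) }
moved = keys.each_index.count { |i| before[i] != after[i] }
puts "#{moved} of #{keys.size} keys moved, all of them to D"
```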

Page 26: Cassandra for Rails

Partitioning Scheme: Replication

N=3
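
One simple way to choose the N replicas of a key is to take its owner on the ring plus the next N-1 nodes clockwise. The sketch below assumes that strategy and a hypothetical four-node ring with integer tokens; other placement strategies exist.

```ruby
# Hypothetical ring: [token, node] pairs sorted by token, tokens in 0...100.
RING = { 10 => "A", 35 => "B", 60 => "C", 85 => "D" }.sort.freeze

# Nodes holding a copy of the data at this token: the owner plus n-1 successors.
def replicas_for(token, n = 3)
  start = RING.index { |pos, _node| pos >= token } || 0
  (0...n).map { |i| RING[(start + i) % RING.size][1] }
end

p replicas_for(40)   # owner is C, copies also go to D and A
p replicas_for(90)   # wraps around: A, B and C
```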

Page 27: Cassandra for Rails

Read Repair

[Diagram: the Client sends a Query to the Cassandra Cluster. The closest replica (Replica A) returns the Result; Replica B and Replica C receive a Digest Query and each return a Digest Response. The Result goes back to the Client.]

Page 28: Cassandra for Rails

Read Repair

[Diagram: the Client sends a Query to the Cassandra Cluster. The closest replica (Replica A) returns the Result; Replica B and Replica C receive a Digest Query and each return a Digest Response. The Result goes back to the Client.]

Read repair if digests differ
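
The read path in the diagram can be sketched as follows. This is a simplified illustration: the Replica struct, the MD5 digest and the decision to return the repaired value directly are assumptions, and a real cluster performs these steps over the network.

```ruby
require "digest"

# Each hypothetical replica stores key => [value, timestamp].
Replica = Struct.new(:name, :data) do
  def read(key)
    data[key]
  end

  # A digest is much cheaper to ship than the full value.
  def digest(key)
    Digest::MD5.hexdigest(read(key).inspect)
  end

  def write(key, value, timestamp)
    current = data[key]
    data[key] = [value, timestamp] if current.nil? || timestamp > current[1]
  end
end

# Full read from the closest replica, digest reads from the others;
# on a mismatch, push the newest version to every replica.
def read_with_repair(key, closest, others)
  result = closest.read(key)
  expected = closest.digest(key)
  if others.any? { |r| r.digest(key) != expected }
    all = [closest, *others]
    newest = all.map { |r| r.read(key) }.compact.max_by { |_value, ts| ts }
    all.each { |r| r.write(key, newest[0], newest[1]) } if newest
    result = newest
  end
  result && result[0]
end

a = Replica.new("A", { "email" => ["old@example.com", 1] })
b = Replica.new("B", { "email" => ["new@example.com", 2] })
c = Replica.new("C", {})
puts read_with_repair("email", a, [b, c])
p [a, b, c].map { |r| r.read("email") }
```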

Page 29: Cassandra for Rails

Cluster Membership

• A gossip protocol is used for cluster membership.

• Super lightweight, with mathematically provable properties.

• State is disseminated in O(log N) rounds, where N is the number of nodes in the cluster.

• Every T seconds, each member increments its heartbeat counter and selects one other member to send its list to.

• The receiving member merges that list with its own.
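
The protocol described above can be simulated in a few lines. This is only a sketch under stated assumptions: the class and method names are hypothetical, members gossip one after another within a round instead of concurrently, and a fixed random seed keeps runs repeatable.

```ruby
# Toy gossip simulation; class and method names are illustrative.
class Member
  attr_reader :name, :heartbeats

  def initialize(name)
    @name = name
    @heartbeats = { name => 0 }   # member name => highest heartbeat seen
  end

  # One gossip tick: bump our own heartbeat and send the list to one random peer.
  def tick(peers, rng)
    @heartbeats[@name] += 1
    peers.sample(random: rng).merge_list(@heartbeats)
  end

  # Merge a received list, keeping the highest heartbeat per member.
  def merge_list(remote)
    @heartbeats.merge!(remote) { |_member, mine, theirs| [mine, theirs].max }
  end
end

# Count the rounds until every member has heard of every other member.
def rounds_until_everyone_known(size, seed: 42, max_rounds: 1000)
  rng = Random.new(seed)
  members = (1..size).map { |i| Member.new("node#{i}") }
  rounds = 0
  until members.all? { |m| m.heartbeats.size == size } || rounds >= max_rounds
    members.each { |m| m.tick(members - [m], rng) }
    rounds += 1
  end
  rounds
end

[4, 16, 64].each do |n|
  puts "#{n} members: #{rounds_until_everyone_known(n)} rounds"
end
```

The number of rounds printed should grow slowly compared with the cluster size, in line with the O(log N) bullet.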

Page 30: Cassandra for Rails

Gossip Algorithm

Page 31: Cassandra for Rails

Gossip Algorithm: Round 1

Page 32: Cassandra for Rails

Gossip Algorithm: Round 2

Page 33: Cassandra for Rails

Gossip Algorithm: Round 3

Page 34: Cassandra for Rails

Gossip Algorithm: Round 4

Page 35: Cassandra for Rails

DATA MODEL

Page 36: Cassandra for Rails

Hierarchy (Cassandra term / relational analogy / example)

• ClusterName

• KeySpace / Database / Delicious

• ColumnFamily / Table / Users

• Key / ID / 12345

• Column / Attribute / email

Page 37: Cassandra for Rails

DATA MODEL: Columns

COLUMN FAMILY: Users

Each column is a (Name, Value, Timestamp) triple; every value below carries its own timestamp.

• KEY pablete: name = Pablo, lastname = Delgado, likes = Sugar

• KEY antonio: name = Antonio, lastname = Garrote

• KEY mauro: name = Mauro, lastname = Pompilio, age = 25, updated_at = 2009/05/03, language = es

Page 38: Cassandra for Rails

DATA MODEL: Columns

COLUMN FAMILY: Users

[Same Users table as on the previous slide (keys pablete, antonio and mauro, each with its Name / Value / Timestamp columns), with two annotations:]

• ordered keys

• ordered column keys

Page 39: Cassandra for Rails

DATA MODEL: SuperColumns

COLUMN FAMILY: Tags

KEY: pablete

[Diagram: the row for pablete holds supercolumns, each identified by a Name; the names shown are beach and mountain.]

Page 40: Cassandra for Rails

DATA MODEL: SuperColumns

COLUMN FAMILY: Tags

KEY: pablete

[Same diagram as on the previous slide (supercolumns beach and mountain), with two annotations:]

• ordered keys

• ordered supercolumn keys

Page 41: Cassandra for Rails

DATA MODEL: SuperColumns

COLUMN FAMILY: Tags

KEY: pablete

Supercolumns labeled on the slide: beach, mountain. Each supercolumn holds columns made of a Name and a Title.

Columns (Name: Title), in the order they appear on the slide:

• 9876: san-diego

• 843: barcelona

• 654: niza

• 777: cadaques

• 555: sicilia

• 78: trapani

• 1234: barcelona

• 888: andorra

Page 42: Cassandra for Rails

DATA MODEL: SuperColumns

COLUMN FAMILY: Tags

KEY: pablete

[Same table as on the previous slide (supercolumns beach and mountain; columns 9876: san-diego, 843: barcelona, 654: niza, 777: cadaques, 555: sicilia, 78: trapani, 1234: barcelona, 888: andorra), with three annotations:]

• ordered keys

• ordered supercolumn keys

• ordered column keys

Page 43: Cassandra for Rails

Ruby Thrift Client

http://wiki.apache.org/cassandra/ClientExamples

require './cassandra'
require './cassandra_constants'
require './cassandra_types'
require 'pp'

# Open a buffered Thrift connection to a local node.
transport = Thrift::BufferedTransport.new(Thrift::Socket.new("localhost", "9160"))
transport.open
client = CassandraThrift::Cassandra::Client.new(Thrift::BinaryProtocol.new(transport))

keyspace = "Blog"
key = "dude_login"
columnPath = CassandraThrift::ColumnPath.new(:column_family => "Users", :column => "email")
value = "dude@example.com"
# Timestamp in microseconds since the epoch.
t = Time.now
timestamp = t.to_i * 1_000_000 + t.usec

client.insert(keyspace, key, columnPath, value, timestamp, CassandraThrift::ConsistencyLevel::ZERO)

begin
  # The returned column carries the value "dude@example.com".
  pp client.get(keyspace, key, columnPath, CassandraThrift::ConsistencyLevel::ONE)
rescue CassandraThrift::NotFoundException => e
  puts "Key not found."
end

Page 44: Cassandra for Rails

Ruby with Cassandra gem

http://github.com/fauna/cassandra

require 'rubygems'
require 'cassandra'
require 'pp'
include Cassandra::Constants

client = Cassandra.new('Blog', "127.0.0.1:9160")

# COLUMNS
client.insert(:Users, 'awesomedude', {'email' => 'dude@awesome.com'})
columns = client.get(:Users, 'awesomedude')
pp columns['email']
#=> 'dude@awesome.com'

# SUPERCOLUMNS
client.insert(:Tags, 'awesomedude', {'beach' => {UUID.new => 'sandiego'}})
client.insert(:Tags, 'awesomedude', {'beach' => {UUID.new => 'barcelona'}})
client.insert(:Tags, 'awesomedude', {'mountain' => {UUID.new => "barcelona", UUID.new => "andorra"}})

tags = client.get(:Tags, 'awesomedude')
pp tags.keys
#=> ["beach","mountain"]

tags = client.get(:Tags, 'awesomedude', 'beach')
pp post_keys = tags.map { |timestamp, key| key }
#=> ["sandiego", "barcelona"]

client.remove(:Tags, 'awesomedude')

Page 45: Cassandra for Rails

Ruby with CassandraObject (Rails 3)

http://github.com/NZKoz/cassandra_object

class Customer < CassandraObject::Base
  attribute :first_name,    :type => String
  attribute :last_name,     :type => String
  attribute :date_of_birth, :type => Date
  attribute :preferences,   :type => Hash

  validate :should_be_cool

  key :uuid

  index :last_name, :reversed => true

  association :invoices, :unique => false, :inverse_of => :customer, :reversed => true

  private

  def should_be_cool
    unless ["Michael", "Anika", "Evan"].include?(first_name)
      errors.add(:first_name, "must be that of a cool person")
    end
  end
end

Page 46: Cassandra for Rails

Model with a Ruby Hash*

* An ordered hash, as in Ruby 1.9 for example.

database = Hash.new

# Column families map row keys to columns; "Tags" adds a supercolumn level.
database = {
  "Users" => {
    "malditogeek" => {"nombre" => "Mauro", "apellido" => "Pompilio"},
    "pablete"     => {"nombre" => "Pablo", "email"    => "pablete@gmail.com"},
    "xurde"       => {"nombre" => "Jorge", "language" => "asturianu"}
  },
  "Tags" => {
    "malditogeek" => {"flower" => {"2007/07/15" => "1234", "2007/06/15" => "542"},
                      "places" => {"2007/11/15" => "378",  "2007/10/13" => "5"},
                      "zoo"    => {"2007/12/23" => "7"}},
    "pablete"     => {"_notag_" => {"2005/01/12" => "1"}},
    "xurde"       => {"friends" => {"2007/04/10" => "3", "2007/01/12" => "2"}}
  },
  "Photos" => {
    "1"    => {"title" => "my friends", "url" => "http://wpic.com/2dh3kld"},
    "2"    => {"title" => "Ana",        "url" => "http://wpic.com/sdjkwre"},
    "3"    => {"title" => "Ladislav",   "url" => "http://wpic.com/gs34dre"},
    "5"    => {"title" => "Barcelona",  "url" => "http://wpic.com/y43fdfd"},
    "7"    => {"title" => "Monkey",     "url" => "http://wpic.com/fereref"},
    "378"  => {"title" => "New York",   "url" => "http://wpic.com/jklerit"},
    "542"  => {"title" => "White Rose", "url" => "http://wpic.com/kdjkejf"},
    "1234" => {"title" => "Black Rose", "url" => "http://wpic.com/236dgbg"}
  }
}

# Give me a user's email
puts database["Users"]["pablete"]["email"]

# Give me the photos tagged "flower" for user "malditogeek"
puts database["Tags"]["malditogeek"]["flower"].values

puts database["Photos"]["1234"]["title"]

Page 47: Cassandra for Rails

Model with a Ruby Hash*

* An ordered hash, as in Ruby 1.9 for example.

[Same database Hash as on the previous slide ("Users", "Tags" and "Photos"), now showing the result of the first query.]

# Give me a user's email
puts database["Users"]["pablete"]["email"]
#=> "pablete@gmail.com"

# Give me the photos tagged "flower" for user "malditogeek"
puts database["Tags"]["malditogeek"]["flower"].values

puts database["Photos"]["1234"]["title"]

Page 48: Cassandra for Rails

.+1+V+,"&4&g+,6(-":

.+1+V+,"&4&^>K,"!,>&4J&^>I+?.%10H""=>&4J&^>-0IV!">&4J&>]+$!0>@&>+3"??%.0>&4J&>G0I3%?%0>&__@&&&&&&&&&&&&&&&&&&&&&&&^>3+V?"1">&&&&&4J&^>-0IV!">&4J&>G+V?0>@&>"I+%?>&&&&4J&>3+V?"1"MHI+%?(*0I>__@&&&&&&&&&&&&&&&&&&&&&&&^>N$!.">&&&&&&&4J&^>-0IV!">&4J&>j0!H">@&>?+-H$+H">&4J&>+,1$!%+-$>&__&&&&&&&&&&&_@

&&&&&&&&&&&^>5+H,>&4J&^>I+?.%10H""=>&4J&^>7?0:"!>&&4J&^>[DD\)D\)Bk>&4J&>B[lm>@&>[DD\)DC)Bk>&4J&>km[>&_@&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&>3?+*",>&&4J&^>[DD\)BB)Bk>&4J&>l\n>@&&>[DD\)BD)Bl>&4J&>k>&_@&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&>o00>&&&&&4J&^>[DD\)B[)[l>&4J&>\>&__@&&&&&&&&&&&&&&&&&&&&&&&>3+V?"1">&&&&&4J&^>/-01+H/>&4J&^>[DDk)DB)B[>&4J&>B>&__@&&&&&&&&&&&&&&&&&&&&&&&>N$!.">&&&&&&&4J&^>7!%"-.,>&4J&^>[DD\)Dm)BD>&4J&>l>@&&&&>[DD\)DB)B[>&4J&>[>&__&&&&&&&&&&&&&&&&&&&&&&_&&&&&&&&&&&_@&&&&&&&&&&&^>G6010,>&4J&^>B>&&&&4J&^>1%1?">4J>I2&7!%"-.,>@&>$!?>4J>61138)):3%*(*0I)[.6l=?.>_@&&&&&&&&&&&&&&&&&&&&&&&&&>[>&&&&4J&^>1%1?">4J>h-+>@&&&&&&&&>$!?>4J>61138)):3%*(*0I),.f=:!">_@&&&&&&&&&&&&&&&&&&&&&&&&&>l>&&&&4J&^>1%1?">4J>Q+.%,?+L>@&&&>$!?>4J>61138)):3%*(*0I)H,lm.!">_@&&&&&&&&&&&&&&&&&&&&&&&&&>k>&&&&4J&^>1%1?">4J>9+!*"?0-+>@&&>$!?>4J>61138)):3%*(*0I)2ml7.7.>_@&&&&&&&&&&&&&&&&&&&&&&&&&>\>&&&&4J&^>1%1?">4J>]0-="2>@&&&&&>$!?>4J>61138)):3%*(*0I)7"!"!"7>_@&&&&&&&&&&&&&&&&&&&&&&&&&>l\n>&&4J&^>1%1?">4J>W":&p0!=>@&&&>$!?>4J>61138)):3%*(*0I)f=?"!%1>_@&&&&&&&&&&&&&&&&&&&&&&&&&>km[>&&4J&^>1%1?">4J>q6%1"&T0,">@&>$!?>4J>61138)):3%*(*0I)=.f="f7>_@&&&&&&&&&&&&&&&&&&&&&&&&&>B[lm>&4J&^>1%1?">4J>9?+*=&T0,">@&>$!?>4J>61138)):3%*(*0I)[lC.HVH>_&&&&&&&&&&&&&&&&&&&&&&&&_&&&&&&&&&&&_

Zr%L"&I"&+-&$,"!&"I+%?3$1,&.+1+V+,"`>K,"!,>a`>3+V?"1">a`>"I+%?>aZ4J&>3+V?"1"MHI+%?(*0I>

Zr%L"&I"&16"&36010,&1+H".&>7?0:"!>&70!&$,"!&>I+?.%10H""=>3$1,&.+1+V+,"`>5+H,>a`>I+?.%10H""=>a`>7?0:"!>a(L+?$",Z4J`>B[lm>@>km[>a

3$1,&.+1+V+,"`>G6010,>a`>B[lm>a`>1%1?">a&

*ordered hash, ruby 1.9 for example

Model with a ruby Hash*

Wednesday, December 16, 2009

Page 49: Cassandra for Rails

Model with a ruby Hash*

database = Hash.new

database = {
  'Users' => {
    'malditogeek' => {'nombre' => 'Mauro', 'apellido' => 'Pompilio'},
    'pablete'     => {'nombre' => 'Pablo', 'email' => 'pablete@gmail.com'},
    'xurde'       => {'nombre' => 'Jorge', 'language' => 'asturianu'}
  },
  'Tags' => {
    'malditogeek' => {
      'flower' => {'2009/09/18' => '1234', '2009/05/18' => '842'},
      'places' => {'2009/11/18' => '396', '2009/10/13' => '8'},
      'zoo'    => {'2009/12/23' => '9'}
    },
    'pablete' => {'_notag_' => {'2008/01/12' => '1'}},
    'xurde'   => {'friends' => {'2009/04/10' => '3', '2009/01/12' => '2'}}
  },
  'Photos' => {
    '1'    => {'title' => 'my friends', 'url' => 'http://wpic.com/2dh3kld'},
    '2'    => {'title' => 'Ana',        'url' => 'http://wpic.com/sdjkwre'},
    '3'    => {'title' => 'Ladislav',   'url' => 'http://wpic.com/gs34dre'},
    '8'    => {'title' => 'Barcelona',  'url' => 'http://wpic.com/y43fdfd'},
    '9'    => {'title' => 'Monkey',     'url' => 'http://wpic.com/fereref'},
    '396'  => {'title' => 'New York',   'url' => 'http://wpic.com/jklerit'},
    '842'  => {'title' => 'White Rose', 'url' => 'http://wpic.com/kdjkejf'},
    '1234' => {'title' => 'Black Rose', 'url' => 'http://wpic.com/235dgbg'}
  }
}

# Give me a user's email
puts database['Users']['pablete']['email']              # => 'pablete@gmail.com'

# Give me the photos tagged 'flower' for user 'malditogeek'
puts database['Tags']['malditogeek']['flower'].values   # => ['1234', '842']

puts database['Photos']['1234']['title']                # => 'Black Rose'

*ordered hash, ruby 1.9 for example
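
The ordered-hash footnote matters because Cassandra keeps the columns of a row sorted by column name, whereas a Ruby Hash (1.9 and later) only remembers insertion order. Below is a minimal sketch of the same nested lookup under that assumption. The data is a trimmed-down, made-up sample in the shape of the slide's hash, and `photos_tagged` is a hypothetical helper, not part of any Cassandra client library.

```ruby
# Hypothetical sample data in the shape used on the slide:
# column family => row key => (super column =>) column name => value
database = {
  'Users'  => { 'pablete' => { 'nombre' => 'Pablo', 'email' => 'pablete@gmail.com' } },
  'Tags'   => { 'malditogeek' => { 'flower' => { '2009/09/02' => '20', '2009/03/15' => '10' } } },
  'Photos' => { '10' => { 'title' => 'White Rose' }, '20' => { 'title' => 'Black Rose' } }
}

# Hypothetical helper: photo ids for one user's tag, oldest first.
# Sorting by column name (the date string) mimics the sorted order in which
# Cassandra stores columns, instead of relying on the order the hash was filled.
def photos_tagged(db, user, tag)
  columns = db.fetch('Tags', {}).fetch(user, {}).fetch(tag, {})
  columns.sort.map { |_date, photo_id| photo_id }
end

ids    = photos_tagged(database, 'malditogeek', 'flower')
titles = ids.map { |id| database['Photos'][id]['title'] }

puts database['Users']['pablete']['email']
puts titles.inspect
```

Using `fetch` with an empty-hash default means an unknown user or tag yields an empty list rather than a `NoMethodError` on `nil`.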

Page 50: Cassandra for Rails

Thanks

Pablo Delgado
@pablete

[email protected]
