Page 1: Using Apache Cassandra: What is this thing, and how do I use it?

©2013 DataStax. Do not distribute without consent.

@zanson

Jeremiah Jordan, Lead Software Engineer/Support

Using Apache Cassandra for Big Data: What is this thing, and how do I use it?

Monday, October 14, 2013


Who I am

• Jeremiah Jordan

• Lead Software Engineer in Support at DataStax

• Previously Senior Architect at Morningstar, Inc.

• Using Cassandra since 0.6

• Before that, wrote code for the F-22


Cassandra - An introduction


Cassandra - Intro

• Based on Amazon Dynamo and Google BigTable papers

• Shared nothing

• Distributed

• Data as safe as possible

• Predictable scaling

[Diagram: Dynamo and BigTable]


Cassandra - More than one server

• All nodes participate in a cluster

• Shared nothing

• Add or remove as needed

• More capacity? Add a server

• Each node owns a number of tokens

• Tokens denote a range of keys

• 4 nodes? -> Key range / 4

• Each node owns 1/4 of the data
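The token split above can be sketched in a few lines of Python. This is a toy model under stated assumptions: MD5 stands in for Cassandra's hash, the token space is 0 to 2^64 - 1 rather than the Murmur3Partitioner's -2^63 to 2^63 - 1, each node gets a single evenly spaced token (virtual nodes would give each node many smaller ranges), and all helper names are hypothetical. As in Cassandra, a node owns the keys whose tokens fall after the previous node's token and up to and including its own.

```python
import bisect
import hashlib
from collections import Counter

# Illustrative token space; Cassandra's Murmur3Partitioner actually uses [-2**63, 2**63 - 1].
RING_SIZE = 2**64


def token_for(key: str) -> int:
    """Hash a partition key onto the ring (MD5 stand-in, not Cassandra's Murmur3)."""
    return int.from_bytes(hashlib.md5(key.encode("utf-8")).digest()[:8], "big")


def evenly_spaced_tokens(num_nodes: int) -> list:
    """One token per node, spaced so each node owns RING_SIZE / num_nodes of the ring."""
    return [(i + 1) * RING_SIZE // num_nodes - 1 for i in range(num_nodes)]


def owner(key: str, tokens: list) -> int:
    """Index of the node whose range (previous token, own token] contains the key's token."""
    idx = bisect.bisect_left(tokens, token_for(key))
    return idx % len(tokens)  # wrap around past the highest token


tokens = evenly_spaced_tokens(4)
counts = Counter(owner(f"user{i}", tokens) for i in range(10_000))
print(sorted(counts.items()))  # roughly 2,500 keys per node
```

Because each of the four ranges covers exactly a quarter of the token space, roughly a quarter of any well-hashed key set lands on each node.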


Cassandra - Locally Distributed

• Client writes to any node

• Node coordinates with others

• Data replicated in parallel

• Replication factor (RF): How many copies of your data?

• RF = 3 here

Each node stores 3/4 of the cluster's total data.
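Where the 3/4 figure comes from, sketched under the assumption of SimpleStrategy-style placement: a range is stored on its owner plus the next RF - 1 nodes clockwise (NetworkTopologyStrategy also takes racks and data centers into account). Function and variable names are hypothetical.

```python
def replicas(primary: int, num_nodes: int, rf: int) -> list:
    """SimpleStrategy-style placement: the range's owner plus the next rf - 1 nodes clockwise."""
    return [(primary + step) % num_nodes for step in range(rf)]


NUM_NODES, RF = 4, 3

# held[node] = set of primary ranges (numbered by owning node) that this node stores a copy of
held = {node: set() for node in range(NUM_NODES)}
for primary_range in range(NUM_NODES):
    for node in replicas(primary_range, NUM_NODES, RF):
        held[node].add(primary_range)

for node, ranges in sorted(held.items()):
    print(f"node {node} stores ranges {sorted(ranges)} -> {len(ranges)}/{NUM_NODES} of the data")
```

Every range is written to three of the four nodes, so each node ends up holding three of the four ranges.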


Cassandra - Geographically Distributed

• Client writes to the local data center

• Data syncs across WAN

• Replication Factor per DC


Single coordinator
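Per-DC replication factors are declared on the keyspace with NetworkTopologyStrategy. The helper below only builds the CQL string, so it runs without a cluster; the keyspace name `myapp` and the data center names `DC1`/`DC2` are hypothetical (real names must match what the cluster's snitch reports).

```python
def network_topology_keyspace_cql(keyspace: str, rf_per_dc: dict) -> str:
    """Build a CREATE KEYSPACE statement that sets a separate replication factor per data center."""
    dc_options = ", ".join(f"'{dc}': {rf}" for dc, rf in rf_per_dc.items())
    return (
        f"CREATE KEYSPACE IF NOT EXISTS {keyspace} WITH replication = "
        f"{{'class': 'NetworkTopologyStrategy', {dc_options}}}"
    )


# Hypothetical keyspace and data center names.
cql = network_topology_keyspace_cql("myapp", {"DC1": 3, "DC2": 3})
print(cql)
# With a connected driver session this would be run as: session.execute(cql)
```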


Cassandra - Consistency

• Consistency Level (CL)

• Client specifies per read or write


• ALL = All replicas ack

• QUORUM = A majority (more than 50%) of replicas ack

• LOCAL_QUORUM = A majority of replicas in the local DC ack

• ONE = At least one replica acks
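The acknowledgement counts behind these levels, as a sketch: a quorum is a strict majority, floor(RF / 2) + 1, so RF = 3 needs 2 acks and a total RF of 6 needs 4. The function names and the two-DC layout are hypothetical, and levels such as ANY, TWO, THREE and EACH_QUORUM are left out.

```python
def quorum(rf: int) -> int:
    """A strict majority of rf replicas."""
    return rf // 2 + 1


def required_acks(cl: str, rf_per_dc: dict, local_dc: str) -> int:
    """Replica acknowledgements the coordinator waits for at a given consistency level."""
    total_rf = sum(rf_per_dc.values())
    if cl == "ALL":
        return total_rf
    if cl == "QUORUM":
        return quorum(total_rf)  # majority of all replicas, across every DC
    if cl == "LOCAL_QUORUM":
        return quorum(rf_per_dc[local_dc])  # majority of the coordinator's DC only
    if cl == "ONE":
        return 1
    raise ValueError(f"consistency level not modelled here: {cl}")


layout = {"DC1": 3, "DC2": 3}  # hypothetical two-DC layout
for level in ("ALL", "QUORUM", "LOCAL_QUORUM", "ONE"):
    print(level, required_acks(level, layout, local_dc="DC1"))
```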


Cassandra - Transparent to the application

• A single node failure shouldn’t cause the application to fail

• Replication Factor + Consistency Level = Success

• This example:

• RF = 3

• CL = QUORUM

A majority (2 of 3 replicas) acks, so we are good!
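Why RF = 3 with QUORUM survives one node failure: a quorum of 3 is 2, so a request still succeeds with one replica down but fails with two down. A minimal sketch with hypothetical helper names:

```python
def quorum(rf: int) -> int:
    """A strict majority of rf replicas."""
    return rf // 2 + 1


def succeeds(rf: int, down: int) -> bool:
    """True if a QUORUM request can still gather enough acks with `down` replicas unavailable."""
    return rf - down >= quorum(rf)


RF = 3
for down in range(RF + 1):
    print(f"{down} replica(s) down -> QUORUM {'succeeds' if succeeds(RF, down) else 'fails'}")
```

The same majority rule is why QUORUM reads see QUORUM writes: 2 + 2 > 3, so any read quorum shares at least one replica with any write quorum.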


Application Example - Layout

• Active-Active

• Service-based DNS routing


Cassandra Replication


Application Example - Uptime


• Normal server maintenance

• Application is unaware

Cassandra Replication


Application Example - Failure


• Data center failure

• Data is safe. Route traffic to the surviving data center.


Another happy user!
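A sketch of why the application survives losing a whole data center, assuming two data centers with RF = 3 each and clients that read and write at LOCAL_QUORUM (the slides do not state the layout or the consistency level; the DC names and helpers are hypothetical). QUORUM counts replicas across both data centers, so it needs 4 of 6 and fails once a DC is gone; LOCAL_QUORUM only needs 2 of the 3 replicas in the surviving DC.

```python
def quorum(rf: int) -> int:
    """A strict majority of rf replicas."""
    return rf // 2 + 1


def can_serve(cl: str, live_per_dc: dict, rf_per_dc: dict, local_dc: str) -> bool:
    """Whether enough live replicas remain to satisfy QUORUM or LOCAL_QUORUM."""
    if cl == "QUORUM":
        return sum(live_per_dc.values()) >= quorum(sum(rf_per_dc.values()))
    if cl == "LOCAL_QUORUM":
        return live_per_dc[local_dc] >= quorum(rf_per_dc[local_dc])
    raise ValueError(f"not modelled here: {cl}")


rf = {"East": 3, "West": 3}    # hypothetical data centers, RF = 3 in each
live = {"East": 0, "West": 3}  # the East data center has failed
for cl in ("QUORUM", "LOCAL_QUORUM"):
    print(cl, "from West:", can_serve(cl, live, rf, local_dc="West"))
```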


Five Years of Cassandra

[Timeline, Jul-08 to Jul-13: Cassandra releases 0.1, 0.3, 0.6, 0.7, 1.0, 1.2 and 2.0, alongside DSE]


Cassandra 2.0 - Big new features


Lightweight transactions: the problem

Session 1

SELECT * FROM users WHERE username = 'jbellis'

[empty resultset]

INSERT INTO users (username, password) VALUES ('jbellis', 'xdg44hh')

Session 2

SELECT * FROM users WHERE username = 'jbellis'

[empty resultset]

INSERT INTO users (username, password) VALUES ('jbellis', '8dhh43k')

It’s a race!

Who wins?
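Why neither session is stopped: Cassandra resolves two writes to the same column by write timestamp, and the later one silently replaces the earlier. The toy model below (a plain dict and made-up integer timestamps, not Cassandra's storage engine) replays the race; session 2's password wins and session 1 is never told.

```python
from dataclasses import dataclass


@dataclass
class Cell:
    value: str
    timestamp: int  # write timestamp (microseconds in Cassandra; a plain counter here)


def write(table: dict, key: str, value: str, timestamp: int) -> None:
    """Last-write-wins: keep whichever write carries the higher timestamp (ties ignored here)."""
    current = table.get(key)
    if current is None or timestamp > current.timestamp:
        table[key] = Cell(value, timestamp)


users = {}

# Both sessions read first and see nothing...
session1_saw = users.get("jbellis")
session2_saw = users.get("jbellis")

# ...so both insert. Neither write fails; the later timestamp simply overwrites.
write(users, "jbellis", "xdg44hh", timestamp=1001)  # session 1
write(users, "jbellis", "8dhh43k", timestamp=1002)  # session 2

print(session1_saw, session2_saw, users["jbellis"].value)
```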


LWT: details

• 4 round trips vs 1 for normal updates

• Paxos - Paxos made easy

• Immediate consistency with no leader election or failover

• For reads, ConsistencyLevel.SERIAL

• http://www.datastax.com/dev/blog/lightweight-transactions-in-cassandra-2-0
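For intuition about the Paxos bullet, here is a minimal single-decree Paxos over in-memory acceptors standing in for replicas. It shows only the textbook prepare/promise and propose/accept phases; Cassandra's implementation also adds read and commit steps, which is where the extra round trips come from. Class and function names are hypothetical.

```python
class Acceptor:
    """One replica's Paxos state for a single value."""

    def __init__(self):
        self.promised = -1         # highest ballot this acceptor has promised to honour
        self.accepted_ballot = -1  # ballot of the last accepted proposal (-1 = none)
        self.accepted_value = None

    def prepare(self, ballot):
        """Phase 1: promise to ignore lower ballots; report any value already accepted."""
        if ballot > self.promised:
            self.promised = ballot
            return True, self.accepted_ballot, self.accepted_value
        return False, -1, None

    def accept(self, ballot, value):
        """Phase 2: accept unless a higher ballot has been promised in the meantime."""
        if ballot >= self.promised:
            self.promised = ballot
            self.accepted_ballot = ballot
            self.accepted_value = value
            return True
        return False


def propose(acceptors, ballot, value):
    """Try to get `value` chosen with `ballot`; return the chosen value, or None on failure."""
    majority = len(acceptors) // 2 + 1
    promises = [a.prepare(ballot) for a in acceptors]
    granted = [(b, v) for ok, b, v in promises if ok]
    if len(granted) < majority:
        return None
    # If any acceptor already accepted a value, re-propose the one with the highest ballot.
    prior_ballot, prior_value = max(granted, key=lambda bv: bv[0])
    if prior_ballot >= 0:
        value = prior_value
    accepted = sum(a.accept(ballot, value) for a in acceptors)
    return value if accepted >= majority else None


nodes = [Acceptor() for _ in range(3)]
print(propose(nodes, ballot=1, value="xdg44hh"))  # first proposer gets its value chosen
print(propose(nodes, ballot=2, value="8dhh43k"))  # later proposer learns the chosen value
print(propose(nodes, ballot=1, value="8dhh43k"))  # stale ballot is rejected
```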


Using LWT

• Don’t overwrite an existing record

INSERT INTO users (username, email, ...) VALUES ('jbellis', '[email protected]', ...) IF NOT EXISTS;

• Only update record if condition is met

UPDATE users SET email = '[email protected]', ... WHERE username = 'jbellis' IF email = '[email protected]';
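The outcome of these statements can be modelled in plain Python: each conditional write reports whether it was applied, much as Cassandra returns an `[applied]` column. This is a hypothetical in-memory sketch of the semantics only (no Paxos, no replication), and the email addresses are made up.

```python
def insert_if_not_exists(table: dict, key: str, row: dict) -> dict:
    """Mimic INSERT ... IF NOT EXISTS: write the row only if the key is absent."""
    if key in table:
        # On failure Cassandra likewise returns the existing values alongside [applied]
        return {"[applied]": False, **table[key]}
    table[key] = dict(row)
    return {"[applied]": True}


def update_if(table: dict, key: str, column: str, expected, new_value) -> dict:
    """Mimic UPDATE ... SET column = new_value WHERE <key> IF column = expected."""
    row = table.get(key)
    current = None if row is None else row.get(column)
    if row is None or current != expected:
        return {"[applied]": False, column: current}
    row[column] = new_value
    return {"[applied]": True}


# Hypothetical addresses; the slide's real values are not recoverable.
users = {}
print(insert_if_not_exists(users, "jbellis", {"email": "old@example.com"}))
print(insert_if_not_exists(users, "jbellis", {"email": "other@example.com"}))
print(update_if(users, "jbellis", "email", "stale@example.com", "new@example.com"))
print(update_if(users, "jbellis", "email", "old@example.com", "new@example.com"))
```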


Installing Cassandra


Download Cassandra


Extract Cassandra


Set Up Data and Log Directories
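A Python sketch of this step, assuming the default locations in a stock 2.0-era `cassandra.yaml` (`/var/lib/cassandra/data`, `/var/lib/cassandra/commitlog`, `/var/lib/cassandra/saved_caches`) and the default log directory `/var/log/cassandra`; verify these against your own configuration. The helper name is hypothetical, and the demo runs against a scratch directory rather than `/`.

```python
import os
import tempfile

# Assumed default locations; check cassandra.yaml and the logging config before relying on them.
DEFAULT_DIRS = [
    "var/lib/cassandra/data",
    "var/lib/cassandra/commitlog",
    "var/lib/cassandra/saved_caches",
    "var/log/cassandra",
]


def setup_dirs(root: str = "/") -> list:
    """Create Cassandra's data, commit log, cache and log directories under `root`."""
    created = []
    for rel in DEFAULT_DIRS:
        path = os.path.join(root, rel)
        os.makedirs(path, exist_ok=True)
        if not os.access(path, os.W_OK):
            raise PermissionError(f"Cassandra needs write access to {path}")
        created.append(path)
    return created


# Dry run in a scratch directory; with root="/" this usually needs elevated privileges.
with tempfile.TemporaryDirectory() as scratch:
    for path in setup_dirs(scratch):
        print(path)
```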


Start Cassandra
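Once the node has started, a quick way to confirm it is accepting client connections is to open a TCP connection to the native-protocol port, 9042 by default, which the DataStax drivers use. This assumes native transport is enabled (`start_native_transport` in `cassandra.yaml`); the helper is a hypothetical sketch that only checks the port is open, not that CQL queries succeed.

```python
import socket


def is_listening(host: str = "127.0.0.1", port: int = 9042, timeout: float = 2.0) -> bool:
    """Return True if something accepts TCP connections on host:port."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


print("Cassandra reachable:", is_listening())
```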


Installing Cassandra Python Driver


Python Cassandra Driver


Install Python Cassandra Driver


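Connect and Create a Keyspace

import logging

from cassandra.cluster import Cluster

logging.basicConfig(level=logging.INFO)
log = logging.getLogger(__name__)

# Connect through a single local node; the driver discovers the rest of the cluster from it
cluster = Cluster(['127.0.0.1'])
session = cluster.connect()

log.info("creating keyspace...")
KEYSPACE = "testkeyspace"
# SimpleStrategy with replication_factor 1 only suits a single-node test cluster
session.execute("""
    CREATE KEYSPACE IF NOT EXISTS %s
    WITH replication = { 'class': 'SimpleStrategy', 'replication_factor': '1' }
    """ % KEYSPACE)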


Create a Table

log.info("setting keyspace...")
session.set_keyspace(KEYSPACE)

log.info("creating table...")
# thekey is the partition key; col1 is a clustering column
session.execute("""
    CREATE TABLE IF NOT EXISTS mytable (
        thekey text,
        col1 text,
        col2 text,
        PRIMARY KEY (thekey, col1)
    )
    """)
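What `PRIMARY KEY (thekey, col1)` means for the data, as a conceptual sketch (a dict of dicts, not Cassandra's storage format; helper names are hypothetical): `thekey` is the partition key that decides which nodes hold the row, and `col1` is a clustering column that orders rows within a partition and, together with `thekey`, identifies a row.

```python
from collections import defaultdict

# partitions[thekey][col1] = col2
partitions = defaultdict(dict)


def insert(thekey: str, col1: str, col2: str) -> None:
    # The same (thekey, col1) pair is the same row, so a second insert overwrites col2.
    partitions[thekey][col1] = col2


def select_partition(thekey: str) -> list:
    # WHERE thekey = ... reads a single partition; its rows come back ordered by col1.
    return [(thekey, c1, c2) for c1, c2 in sorted(partitions.get(thekey, {}).items())]


insert("key1", "b", "x")
insert("key1", "a", "b")
insert("key2", "a", "y")
insert("key1", "a", "z")  # overwrites the ("key1", "a") row
print(select_partition("key1"))
```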


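Insert a Row

from cassandra import ConsistencyLevel
from cassandra.query import SimpleStatement

# A plain CQL string; the consistency level is chosen per statement
query = SimpleStatement("""
    INSERT INTO mytable (thekey, col1, col2)
    VALUES ('key1', 'a', 'b')
    """, consistency_level=ConsistencyLevel.ONE)

log.info("inserting row")
session.execute(query)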


Insert Rows (Prepared Statement)

# Prepared once on the server; each execution then only sends the bound values
prepared = session.prepare("""
    INSERT INTO mytable (thekey, col1, col2)
    VALUES (?, ?, ?)
    """)

for i in range(10):
    log.info("inserting row %d" % i)
    bound = prepared.bind(("key%d" % i, "b%d" % i, "c%d" % i))
    session.execute(bound)


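Query Results

# execute_async returns a future immediately; result() blocks until the rows arrive
future = session.execute_async("""
    SELECT * FROM mytable WHERE thekey='key1'
    """)
rows = future.result()

log.info("key\tcol1\tcol2")
log.info("---\t----\t----")
for row in rows:
    # every column in mytable is text, so the row's values can be joined directly
    log.info("\t".join(row))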


Run It


Cassandra Applications - Drivers

• DataStax Drivers for Cassandra

• Java

• C#

• Python

• More on the way


Find Out More

Cassandra: http://cassandra.apache.org

DataStax Drivers: https://github.com/datastax

Documentation: http://www.datastax.com/docs

Getting Started: http://www.datastax.com/documentation/gettingstarted/index.html

Developer Blog: http://www.datastax.com/dev/blog

Cassandra Community Site: http://planetcassandra.org

Download: http://planetcassandra.org/Download/DataStaxCommunityEdition

Webinars: http://planetcassandra.org/Learn/CassandraCommunityWebinars

Cassandra Summit Talks: http://planetcassandra.org/Learn/CassandraSummit
