Transcript
Page 1: Cassandra and Clojure

©2013 DataStax. Do not distribute without consent.

Nick Bailey

OpsCenter Architect

Cassandra and Clojure

Page 2: Cassandra and Clojure

Who am I?

• OpsCenter Architect

• Monitoring/management tool for Cassandra

• Organizer of Austin Cassandra Users

• http://www.meetup.com/Austin-Cassandra-Users/

• Third Thursday each month. Come join!

• Working with Cassandra for 4 years

Page 3: Cassandra and Clojure

Cassandra - An introduction

Page 4: Cassandra and Clojure

Cassandra - Intro

• Based on Amazon Dynamo and Google BigTable papers

• Shared nothing

• Distributed

• Predictable scaling

Dynamo

BigTable

Page 5: Cassandra and Clojure

Users


Page 6: Cassandra and Clojure

Cassandra - Architecture

Page 7: Cassandra and Clojure

Cassandra - Cluster Architecture

• All nodes participate in a cluster

• Shared nothing

• Add or remove as needed

• More capacity? Add a server

Page 8: Cassandra and Clojure

Cassandra - Data Distribution

[Diagram: ring with tokens 0, 25, 50, 75]

• Each node owns 1 or more “tokens”

• Each piece of data has a “partition key”

• Partition key is hashed to determine token

• Hashes:

• Murmur3 (default)

• MD5
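The token lookup described above can be sketched in plain Clojure. This is a minimal illustration, not Cassandra's implementation: the ring, the node names, the 0-99 token range, and the helper names `token-for` and `owner` are all assumptions made for the example, and Clojure's built-in `hash` stands in for the Murmur3 partitioner. It assumes each node owns the range that ends at its token, wrapping around the ring.

```clojure
;; Hypothetical four-node ring: token -> node name (names are made up).
(def ring (sorted-map 0 "node-a" 25 "node-b" 50 "node-c" 75 "node-d"))

(defn token-for
  "Maps a partition key to a token in 0..99.
  Clojure's `hash` is only a stand-in for the real partitioner."
  [partition-key]
  (mod (hash partition-key) 100))

(defn owner
  "Returns the node whose token is the first one >= `token`,
  wrapping around to the lowest token when none is."
  [ring token]
  (let [[_ node] (or (first (subseq ring >= token))
                     (first ring))]
    node))

;; Which node would store rows for this partition key?
(owner ring (token-for "nickmbailey"))
```

For instance, `(owner ring 30)` walks forward from token 30 to the node at token 50, and `(owner ring 80)` wraps around to the node at token 0.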

Page 9: Cassandra and Clojure

Cassandra - Replication

• Client writes to any node

• Node coordinates with replicas

• Data replicated in parallel

• Replication factor (RF): How many copies of your data?
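As a rough sketch of how a replication factor turns into a set of replicas, the snippet below assumes SimpleStrategy-style placement: the owner of the token plus the next distinct nodes clockwise around the ring. The ring, node names, and the `replicas` helper are hypothetical and exist only for illustration.

```clojure
;; Hypothetical four-node ring: token -> node name (names are made up).
(def ring (sorted-map 0 "node-a" 25 "node-b" 50 "node-c" 75 "node-d"))

(defn replicas
  "Returns up to `rf` distinct nodes, starting at the owner of `token`
  and walking clockwise, wrapping around the ring if needed."
  [ring token rf]
  (let [nodes     (vals ring)
        clockwise (concat (map val (subseq ring >= token)) nodes)]
    (vec (take (min rf (count nodes)) (distinct clockwise)))))

;; With RF = 3, a row at token 30 is stored on three consecutive nodes.
(replicas ring 30 3)
```

The coordinator that receives the write would forward it to each node in this list in parallel.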

Page 10: Cassandra and Clojure

Cassandra - Failure Modes

• Consistency level

• How many nodes?

• ONE/QUORUM/ALL
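The relationship between consistency level and failures comes down to simple arithmetic: a quorum is a majority of the replicas, `floor(RF / 2) + 1`. The helper names below (`quorum`, `required-replicas`, `satisfiable?`) are invented for this sketch and are not part of any driver.

```clojure
(defn quorum
  "Majority of `rf` replicas: floor(rf / 2) + 1."
  [rf]
  (inc (quot rf 2)))

(defn required-replicas
  "Number of replicas that must respond for a given consistency level."
  [consistency-level rf]
  (case consistency-level
    :one    1
    :quorum (quorum rf)
    :all    rf))

(defn satisfiable?
  "True when enough replicas are still up to meet the consistency level."
  [consistency-level rf replicas-down]
  (>= (- rf replicas-down)
      (required-replicas consistency-level rf)))

;; With RF = 3 and one replica down, QUORUM still succeeds but ALL does not.
[(satisfiable? :quorum 3 1) (satisfiable? :all 3 1)]
```

With RF = 3, QUORUM tolerates one replica being down, ONE tolerates two, and ALL tolerates none.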

Page 11: Cassandra and Clojure

Cassandra - Geographically Distributed

• Client writes local

• Data syncs across WAN

• Replication Factor per DC

• Consistency Level

• LOCAL_QUORUM

[Diagram: Datacenter East and Datacenter West]
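To see why LOCAL_QUORUM matters across datacenters, compare the number of acknowledgements each level needs. The sketch below assumes a replication factor of 3 in each of two datacenters; the datacenter names and helper functions are hypothetical.

```clojure
(defn quorum
  "Majority of `rf` replicas: floor(rf / 2) + 1."
  [rf]
  (inc (quot rf 2)))

;; Assumed per-datacenter replication factors (illustrative values).
(def replication {"east" 3 "west" 3})

(defn local-quorum
  "Replicas needed for LOCAL_QUORUM: a majority within one datacenter."
  [replication dc]
  (quorum (get replication dc)))

(defn global-quorum
  "Replicas needed for QUORUM: a majority of all replicas in all datacenters."
  [replication]
  (quorum (reduce + (vals replication))))

;; LOCAL_QUORUM needs 2 local replicas; QUORUM needs 4 of the 6 overall,
;; so at least one acknowledgement has to cross the WAN.
[(local-quorum replication "east") (global-quorum replication)]
```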

Page 12: Cassandra and Clojure

Data Modeling - Concepts

Page 13: Cassandra and Clojure

CQL

• Cassandra Query Language

• SQL-like

• Not Relational

Page 14: Cassandra and Clojure

Terminology

• Keyspace

• Table (Column Family)

• Row

• Column

• Partition Key

• Clustering Key

Page 15: Cassandra and Clojure

Data Types

cqlsh:clojure_cassandra_demo> help types

CQL types recognized by this version of cqlsh:

ascii bigint blob boolean counter decimal double float inet int list map set text timestamp timeuuid uuid varchar varint

Page 16: Cassandra and Clojure

Advanced Concepts

• Lightweight Transactions

• Atomic Batches

• User Defined Types (coming soon)

Page 17: Cassandra and Clojure

Data Modeling - An Example

Page 18: Cassandra and Clojure

Approaching Data Modeling

• Model your queries, not your data

• Generally, optimize for reads

• Denormalize!

• Iterate!

Page 19: Cassandra and Clojure

Basic Last.fm Clone

• See songs that user X has listened to recently

• See user X’s favorite songs in a specific month

• See who has recently listened to artist Y

• See artist Y’s most popular songs in a specific week

Page 20: Cassandra and Clojure

Basic Last.fm Clone

• See songs that user X has listened to recently

• One of the most common patterns/data models

• Time series

• Immutable (good fit for Clojure!)

Page 21: Cassandra and Clojure

Basic Last.fm Clone

• See songs that user X has listened to recently

SELECT song, artist, played_at
FROM user_history
WHERE username = 'nickmbailey'
ORDER BY played_at DESC;

• Partition key = ‘username’

• Clustering key = ‘played_at’

Page 22: Cassandra and Clojure

Basic Last.fm Clone

• See songs that user X has listened to recently

CREATE TABLE user_history (
    username text,
    played_at timestamp,
    album text,
    artist text,
    song text,
    PRIMARY KEY (username, played_at)
) WITH CLUSTERING ORDER BY (played_at DESC);

Page 23: Cassandra and Clojure

Basic Last.fm Clone

• See songs that user X has listened to recently

• This table has a “bad” partition key: every play by a user lands in a single partition, which grows without bound

CREATE TABLE user_history (
    username text,
    played_at timestamp,
    album text,
    artist text,
    song text,
    PRIMARY KEY (username, played_at)
) WITH CLUSTERING ORDER BY (played_at DESC);

Page 24: Cassandra and Clojure

Basic Last.fm Clone

• See songs that user X has listened to recently

• Much better partition key: (username, year_and_month) caps each partition at one month of plays

CREATE TABLE user_history (
    username text,
    year_and_month text,
    played_at timestamp,
    album text,
    artist text,
    song text,
    PRIMARY KEY ((username, year_and_month), played_at)
) WITH CLUSTERING ORDER BY (played_at DESC);
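With a bucketed partition key, the client has to compute the `year_and_month` value itself on every write and read. A minimal sketch using `java.time` follows; it assumes buckets are formatted as `yyyy-MM` (matching values such as `2014-06`) and are computed in UTC, which is a choice made for this example rather than something the schema dictates. The function names are hypothetical.

```clojure
(import '(java.time Instant ZoneOffset)
        '(java.time.format DateTimeFormatter))

;; Formatter for the bucket value; UTC is an assumption of this sketch.
(def ^DateTimeFormatter bucket-format
  (.withZone (DateTimeFormatter/ofPattern "yyyy-MM") ZoneOffset/UTC))

(defn year-and-month
  "Returns the `year_and_month` bucket (for example \"2014-06\") for an Instant."
  [^Instant played-at]
  (.format bucket-format played-at))

(defn partition-key
  "Builds the composite partition key for a play as a map."
  [username ^Instant played-at]
  {:username       username
   :year_and_month (year-and-month played-at)})

(partition-key "nickmbailey" (Instant/parse "2014-06-30T22:13:54Z"))
```

Reads for "recent" plays then target the current bucket, and fall back to the previous one when the current month has too few rows.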

Page 25: Cassandra and Clojure

Basic Last.fm Clone

• See songs that user X has listened to recently

cqlsh:clojure_cassandra_demo> select * from user_history limit 5;

 username    | year_and_month | played_at                | album                    | artist                  | song
-------------+----------------+--------------------------+--------------------------+-------------------------+--------------------------
 nickmbailey | 2014-06        | 2014-06-30 17:13:54-0500 | Once More 'Round The Sun | Mastodon                | Halloween
 nickmbailey | 2014-06        | 2014-06-30 17:08:53-0500 | Once More 'Round The Sun | Mastodon                | Ember City
 b_hastings  | 2014-06        | 2014-06-30 12:57:12-0500 | Buena Vista Social Club  | Buena Vista Social Club | Chan Chan
 zack_smith  | 2014-07        | 2014-07-30 12:49:35-0500 | Awake Remix              | Tycho                   | Awake (Com Truise Remix)
 zack_smith  | 2014-03        | 2014-03-30 12:44:50-0500 | Awake Remix              | Tycho                   | Awake

Partition Key - unordered

Clustering Key - ordered

Page 26: Cassandra and Clojure

Basic Last.fm Clone

• See user X’s favorite songs in a specific month

SELECT song, artist, play_count
FROM user_history
WHERE username = 'nickmbailey' AND month = 'July'
ORDER BY play_count DESC;

• Partition key = ‘username’, ‘month’

• Clustering key = ‘play_count’?

• Counters are a special case

Page 27: Cassandra and Clojure

Counters

• A counter cannot be part of the PRIMARY KEY

• No ordering based on counter value

• All non-counter columns must be part of the PRIMARY KEY

• Limitations due to the storage format

Page 28: Cassandra and Clojure

Basic Last.fm Clone

• See user X’s favorite songs in a specific month

CREATE TABLE user_song_counts (
    username text,
    year_and_month text,
    artist text,
    song text,
    play_count counter,
    PRIMARY KEY ((username, year_and_month), artist, song)
);

Page 29: Cassandra and Clojure

Basic Last.fm Clone

• See user X’s favorite songs in a specific month

• Results are not ordered by play count

• Client will have to do the sorting

cqlsh:clojure_cassandra_demo> select * from user_song_counts where username = 'nickmbailey' and year_and_month = '2014-07';

 username    | year_and_month | artist   | song                              | count
-------------+----------------+----------+-----------------------------------+-------
 nickmbailey | 2014-07        | Amos Lee | Tricksters, Hucksters, And Scamps |    10
 nickmbailey | 2014-07        | Beck     | Blackbird Chain                   |     1
 nickmbailey | 2014-07        | Beck     | Blue Moon                         |     4
 nickmbailey | 2014-07        | Cherub   | <3                                |    12
 nickmbailey | 2014-07        | Cherub   | Chocolate Strawberries            |     6
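Since the rows come back in clustering order (artist, then song) rather than by count, the sort happens in the client. The sketch below uses the rows from the cqlsh output above; representing them as maps with keyword keys, and naming the counter key `:play_count` after the CREATE TABLE statement, are assumptions about the shape the client hands back. `favorite-songs` is a hypothetical helper.

```clojure
;; Rows copied from the cqlsh output; the map shape is an assumption.
(def rows
  [{:artist "Amos Lee" :song "Tricksters, Hucksters, And Scamps" :play_count 10}
   {:artist "Beck"     :song "Blackbird Chain"                   :play_count 1}
   {:artist "Beck"     :song "Blue Moon"                         :play_count 4}
   {:artist "Cherub"   :song "<3"                                :play_count 12}
   {:artist "Cherub"   :song "Chocolate Strawberries"            :play_count 6}])

(defn favorite-songs
  "Returns the `n` most played rows, highest play count first."
  [rows n]
  (take n (sort-by :play_count > rows)))

;; Top three songs for the month.
(map :song (favorite-songs rows 3))
```

Because a partition holds at most one month of one user's songs, sorting it in memory stays cheap.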

Page 30: Cassandra and Clojure

Basic Last.fm Clone

• See who has recently listened to artist Y

CREATE TABLE artist_history (
    artist text,
    year_and_week text,
    played_at timestamp,
    album text,
    song text,
    username text,
    PRIMARY KEY ((artist, year_and_week), played_at)
) WITH CLUSTERING ORDER BY (played_at DESC);

Page 31: Cassandra and Clojure

Basic Last.fm Clone

• See artist Y’s most popular songs in a specific week

CREATE TABLE artist_song_counts (
    artist text,
    year_and_week text,
    album text,
    song text,
    play_count counter,
    PRIMARY KEY ((artist, year_and_week), album, song)
);

Page 32: Cassandra and Clojure

Cassandra from Clojure

Page 33: Cassandra and Clojure

Building Blocks

• Java Driver

• Hayt

Page 34: Cassandra and Clojure

Java Driver

• Fully featured

• Connection pooling

• Failover policies

• Retry policies

• Sync and Async interfaces

• Exposes client metrics

• https://github.com/datastax/java-driver

Page 35: Cassandra and Clojure

Hayt

• CQL DSL

• Similar to Korma

• Solely for building CQL strings

• https://github.com/mpenet/hayt

(select :foo
        (where {:bar 1
                :baz 2}))

(->raw (select :foo (where {:bar 1 :baz 2})))
;; => "SELECT * FROM foo WHERE bar = 1 AND baz = 2;"

Page 36: Cassandra and Clojure

Clients

• Alia

• https://github.com/mpenet/alia

• Cassaforte

• https://github.com/clojurewerkz/cassaforte

• Both built on Java Driver and Hayt

• Not particularly different

Page 37: Cassandra and Clojure

Alia vs. Cassaforte

Cassaforte

(let [conn (cc/connect ["127.0.0.1"])]
  (cql/create-keyspace conn "cassaforte_keyspace"
    (with {:replication
           {:class "SimpleStrategy"
            :replication_factor 1}})))

Alia

(def cluster (alia/cluster {:contact-points ["localhost"]}))
(def session (alia/connect cluster))
(alia/execute session
  (create-keyspace :alia
    (if-exists false)
    (with {:replication
           {:class "SimpleStrategy"
            :replication_factor 1}})))

Page 38: Cassandra and Clojure

Learn by Example - Alia

Page 39: Cassandra and Clojure

Cluster Object

• Entry point

• Configures relevant client options

• :contact-points

• :load-balancing-policy

• :reconnection-policy

• :retry-policy

• and more!

(def cluster (alia/cluster {:contact-points ["localhost"]}))

Page 40: Cassandra and Clojure

Session Object

• A Session is associated with a keyspace

• Allows interacting with multiple keyspaces

(def cluster (alia/cluster {:contact-points ["localhost"]}))

;; Session with no default keyspace
(def session (alia/connect cluster))

;; Or, a session bound to a keyspace
(def session (alia/connect cluster :my_keyspace))

Page 41: Cassandra and Clojure

Querying

• Multiple ways to query

• alia/execute

• Synchronous, block on result

• alia/execute-async

• Returns a Lamina result-channel (basically, a promise)

• Optional success/error callbacks

• alia/execute-chan

• Returns a core.async channel

• We won’t dive into core.async now

Page 42: Cassandra and Clojure

Prepared Statements

• Statements can be prepared server side

• Better performance for common queries

(def prepared-statement
  (alia/prepare session "select * from users where user_name=?;"))

Page 43: Cassandra and Clojure

What else?

• See GitHub and docs

• https://github.com/mpenet/alia

• http://mpenet.github.io/alia/qbits.alia.html

Page 44: Cassandra and Clojure

Demo

Page 45: Cassandra and Clojure

Demo

• https://github.com/nickmbailey/clojure-cassandra-demo

• Built with

• CCM - https://github.com/pcmanus/ccm

• Alia - https://github.com/mpenet/alia

• ring - https://github.com/ring-clojure/ring

• compojure - https://github.com/weavejester/compojure

• hiccup - https://github.com/weavejester/hiccup

• least - https://github.com/Raynes/least

Page 46: Cassandra and Clojure

More

Cassandra: http://cassandra.apache.org

DataStax Drivers: https://github.com/datastax

Documentation: http://www.datastax.com/docs

Getting Started: http://www.datastax.com/documentation/gettingstarted/index.html

Developer Blog: http://www.datastax.com/dev/blog

Cassandra Community Site: http://planetcassandra.org

Download: http://planetcassandra.org/Download/DataStaxCommunityEdition

Webinars: http://planetcassandra.org/Learn/CassandraCommunityWebinars

Cassandra Summit Talks: http://planetcassandra.org/Learn/CassandraSummit

Page 47: Cassandra and Clojure

©2013 DataStax Confidential. Do not distribute without consent.

