Cassandra and Clojure
DESCRIPTION
An introduction to Cassandra as well as an example of accessing Cassandra from Clojure. Includes an introduction to cluster architecture and data model in Cassandra. The code for the examples is available at: https://github.com/nickmbailey/clojure-cassandra-demo
TRANSCRIPT
©2013 DataStax. Do not distribute without consent.
Nick Bailey
OpsCenter Architect
Cassandra and Clojure
Who am I?
• OpsCenter Architect
• Monitoring/management tool for Cassandra
• Organizer of Austin Cassandra Users
• http://www.meetup.com/Austin-Cassandra-Users/
• Third Thursday each month. Come join!
• Working with Cassandra for 4 years
Cassandra - An introduction
Cassandra - Intro
• Based on Amazon Dynamo and Google BigTable papers
• Shared nothing
• Distributed
• Predictable scaling
[Figure: diagram labeled Dynamo, BigTable, Users]
Cassandra - Architecture
Cassandra - Cluster Architecture
• All nodes participate in a cluster
• Shared nothing
• Add or remove as needed
• More capacity? Add a server
Cassandra - Data Distribution
[Figure: token ring with tokens 0, 25, 50, 75]
• Each node owns 1 or more “tokens”
• Each piece of data has a “partition key”
• Partition key is hashed to determine token
• Hashes:
• Murmur3 (default)
• MD5
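A minimal sketch of the lookup described above, assuming a toy ring of four nodes with tokens 0, 25, 50 and 75. The node names and the `key->token` hash are hypothetical stand-ins: Cassandra hashes with Murmur3 over a far larger token range, while this sketch uses Clojure's built-in `hash` reduced modulo 100.

```clojure
;; Toy ring: token -> owning node. Node names are made up for illustration.
(def ring (sorted-map 0 "node-a", 25 "node-b", 50 "node-c", 75 "node-d"))

(defn key->token
  "Stand-in for the partitioner: maps a partition key to a token in [0, 100)."
  [partition-key]
  (mod (hash partition-key) 100))

(defn token->node
  "Returns the node owning `token`: the first node whose token is >= `token`,
  wrapping around to the lowest token when there is none."
  [ring token]
  (if-let [[_ node] (first (subseq ring >= token))]
    node
    (val (first ring))))

(defn key->node
  "Hashes the partition key, then looks up the owning node."
  [ring partition-key]
  (token->node ring (key->token partition-key)))
```

With this ring, `(token->node ring 30)` returns `"node-c"`, and any token above 75 wraps around to `"node-a"`.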
Cassandra - Replication
• Client writes to any node
• Node coordinates with replicas
• Data replicated in parallel
• Replication factor (RF): How many copies of your data?
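To make the replication factor concrete, here is a rough sketch, assuming the simplest placement scheme: the owner of the token plus the next nodes clockwise around the ring, which is how SimpleStrategy places replicas. The ring, node names and `replicas` function are hypothetical.

```clojure
;; Toy ring: token -> owning node. Node names are made up for illustration.
(def ring (sorted-map 0 "node-a", 25 "node-b", 50 "node-c", 75 "node-d"))

(defn replicas
  "Returns `rf` distinct nodes for `token`: the owner first, then the
  following nodes clockwise, wrapping around the ring."
  [ring token rf]
  (let [from-token (map val (subseq ring >= token)) ; owner and nodes after it
        clockwise  (concat from-token (vals ring))] ; wrap around to the start
    (vec (take rf (distinct clockwise)))))
```

For example, `(replicas ring 60 3)` returns `["node-d" "node-a" "node-b"]`: the owner of token 60 and the two nodes that follow it.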
Cassandra - Failure Modes
• Consistency level
• How many nodes?
• ONE/QUORUM/ALL
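A small sketch of how a consistency level translates into a number of replicas that must respond, assuming the usual quorum definition of floor(RF / 2) + 1. The function names are illustrative and not part of any driver.

```clojure
(defn required-acks
  "Number of replicas that must respond for the given consistency level."
  [consistency-level rf]
  (case consistency-level
    :one    1
    :quorum (inc (quot rf 2)) ; floor(RF / 2) + 1
    :all    rf))

(defn satisfiable?
  "True when enough replicas are alive to meet the consistency level."
  [consistency-level rf live-replicas]
  (>= live-replicas (required-acks consistency-level rf)))
```

With RF = 3, QUORUM needs 2 replicas, so `(satisfiable? :quorum 3 2)` is true while `(satisfiable? :all 3 2)` is false: one node down still allows QUORUM reads and writes, but not ALL.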
Cassandra - Geographically Distributed
• Client writes local
• Data syncs across WAN
• Replication Factor per DC
• Consistency Level
• LOCAL_QUORUM
[Figure: two datacenters, Datacenter East and Datacenter West]
Data Modeling - Concepts
CQL
• Cassandra Query Language
• SQL-like
• Not Relational
Terminology
• Keyspace
• Table (Column Family)
• Row
• Column
• Partition Key
• Clustering Key
Data Types
cqlsh:clojure_cassandra_demo> help types
CQL types recognized by this version of cqlsh:
ascii bigint blob boolean counter decimal double float inet int list map set text timestamp timeuuid uuid varchar varint
Advanced Concepts
• Lightweight Transactions
• Atomic Batches
• User Defined Types (coming soon)
Data Modeling - An Example
Approaching Data Modeling
• Model your queries, not your data
• Generally, optimize for reads
• Denormalize!
• Iterate!
Basic Last.fm Clone
• See songs that user X has listened to recently
• See user X’s favorite songs in a specific month
• See who has recently listened to artist Y
• See artist Y’s most popular songs in a specific week
Basic Last.fm Clone
• See songs that user X has listened to recently
• One of the most common patterns/data models
• Time series
• Immutable (good fit for Clojure!)
Basic Last.fm Clone
• See songs that user X has listened to recently
SELECT song, artist, played_at
FROM user_history
WHERE username = 'nickmbailey'
ORDER BY played_at DESC;
• Partition key = ‘username’
• Clustering key = ‘played_at’
Basic Last.fm Clone
• See songs that user X has listened to recently
CREATE TABLE user_history (
    username text,
    played_at timestamp,
    album text,
    artist text,
    song text,
    PRIMARY KEY (username, played_at)
) WITH CLUSTERING ORDER BY (played_at DESC);
Basic Last.fm Clone
• See songs that user X has listened to recently
• This table has a “bad” partition key: each user's partition grows without bound
CREATE TABLE user_history (
    username text,
    played_at timestamp,
    album text,
    artist text,
    song text,
    PRIMARY KEY (username, played_at)
) WITH CLUSTERING ORDER BY (played_at DESC);
Basic Last.fm Clone
• See songs that user X has listened to recently
• Much better partition key
CREATE TABLE user_history (
    username text,
    year_and_month text,
    played_at timestamp,
    album text,
    artist text,
    song text,
    PRIMARY KEY ((username, year_and_month), played_at)
) WITH CLUSTERING ORDER BY (played_at DESC);
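With a bucketed partition key, the client has to compute `year_and_month` itself on every write and read. A minimal sketch, assuming Java 8+ `java.time` and a `yyyy-MM` bucket format; the function name is hypothetical.

```clojure
(import '(java.time LocalDateTime)
        '(java.time.format DateTimeFormatter))

;; Assumed bucket format: four-digit year, dash, two-digit month.
(def ^DateTimeFormatter bucket-format (DateTimeFormatter/ofPattern "yyyy-MM"))

(defn year-and-month
  "Derives the partition bucket from the time a song was played."
  [^LocalDateTime played-at]
  (.format played-at bucket-format))
```

For example, `(year-and-month (LocalDateTime/of 2014 6 30 17 13 54))` returns `"2014-06"`. One consequence of this design: a "recently played" view that crosses a month boundary has to query two partitions, the current bucket and the previous one.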
Basic Last.fm Clone
• See songs that user X has listened to recently
cqlsh:clojure_cassandra_demo> select * from user_history limit 5;

 username    | year_and_month | played_at                | album                    | artist                  | song
-------------+----------------+--------------------------+--------------------------+-------------------------+--------------------------
 nickmbailey | 2014-06        | 2014-06-30 17:13:54-0500 | Once More 'Round The Sun | Mastodon                | Halloween
 nickmbailey | 2014-06        | 2014-06-30 17:08:53-0500 | Once More 'Round The Sun | Mastodon                | Ember City
 b_hastings  | 2014-06        | 2014-06-30 12:57:12-0500 | Buena Vista Social Club  | Buena Vista Social Club | Chan Chan
 zack_smith  | 2014-07        | 2014-07-30 12:49:35-0500 | Awake Remix              | Tycho                   | Awake (Com Truise Remix)
 zack_smith  | 2014-03        | 2014-03-30 12:44:50-0500 | Awake Remix              | Tycho                   | Awake

Partition Key - unordered
Clustering Key - ordered
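One way to picture this, as a rough mental model only and not how Cassandra actually stores data: a table behaves like an unordered map from partition key to a sorted map of clustering key to row. All names below are hypothetical, and `update` assumes Clojure 1.7 or later.

```clojure
(defn insert-row
  "Adds `row` under `partition-key`, keeping rows within the partition
  sorted by `clustering-key` in descending order."
  [table partition-key clustering-key row]
  (update table partition-key
          (fnil assoc (sorted-map-by #(compare %2 %1))) ; newest first
          clustering-key row))

(defn read-partition
  "Returns the rows of one partition in clustering order."
  [table partition-key]
  (vals (get table partition-key)))

;; The outer map is a plain hash map: partitions have no meaningful order.
(def table
  (-> {}
      (insert-row ["nickmbailey" "2014-06"] "2014-06-30 17:08:53" {:song "Ember City"})
      (insert-row ["nickmbailey" "2014-06"] "2014-06-30 17:13:54" {:song "Halloween"})))
```

Reading the partition `["nickmbailey" "2014-06"]` returns the "Halloween" row before the "Ember City" row, regardless of insertion order.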
Basic Last.fm Clone
• See user X’s favorite songs in a specific month
SELECT song, artist, play_count
FROM user_history
WHERE username = 'nickmbailey' AND month = 'July'
ORDER BY play_count DESC;
• Partition key = ‘username’, ‘month’
• Clustering key = ‘play_count’?
• Counters are a special case
Counters
• A counter cannot be part of the PRIMARY KEY
• No ordering based on counter value
• All non-counter columns must be part of the PRIMARY KEY
• Limitations due to the storage format
Basic Last.fm Clone
• See user X’s favorite songs in a specific month
CREATE TABLE user_song_counts (
    username text,
    year_and_month text,
    artist text,
    song text,
    play_count counter,
    PRIMARY KEY ((username, year_and_month), artist, song)
);
Basic Last.fm Clone
• See user X’s favorite songs in a specific month
• Results are ordered by clustering key (artist, song), not by play count
• Client will have to do the sorting
cqlsh:clojure_cassandra_demo> select * from user_song_counts where username = 'nickmbailey' and year_and_month = '2014-07';

 username    | year_and_month | artist   | song                              | play_count
-------------+----------------+----------+-----------------------------------+------------
 nickmbailey | 2014-07        | Amos Lee | Tricksters, Hucksters, And Scamps |         10
 nickmbailey | 2014-07        | Beck     | Blackbird Chain                   |          1
 nickmbailey | 2014-07        | Beck     | Blue Moon                         |          4
 nickmbailey | 2014-07        | Cherub   | <3                                |         12
 nickmbailey | 2014-07        | Cherub   | Chocolate Strawberries            |          6
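Since Cassandra cannot order by a counter, the client-side sorting might look like the sketch below. It assumes the rows have already been fetched and are represented as Clojure maps with keyword keys; the `top-songs` helper is hypothetical.

```clojure
;; Rows as they might arrive on the client, in clustering order.
(def rows
  [{:artist "Amos Lee" :song "Tricksters, Hucksters, And Scamps" :play_count 10}
   {:artist "Beck"     :song "Blackbird Chain"                   :play_count 1}
   {:artist "Beck"     :song "Blue Moon"                         :play_count 4}
   {:artist "Cherub"   :song "<3"                                :play_count 12}
   {:artist "Cherub"   :song "Chocolate Strawberries"            :play_count 6}])

(defn top-songs
  "Returns the `n` most played rows, highest play count first."
  [rows n]
  (take n (sort-by :play_count > rows)))
```

Here `(map :song (top-songs rows 3))` yields `("<3" "Tricksters, Hucksters, And Scamps" "Chocolate Strawberries")`. Bucketing the partition by month keeps the number of rows to sort on the client small.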
Basic Last.fm Clone
• See who has recently listened to artist Y
CREATE TABLE artist_history (
    artist text,
    year_and_week text,
    played_at timestamp,
    album text,
    song text,
    username text,
    PRIMARY KEY ((artist, year_and_week), played_at)
) WITH CLUSTERING ORDER BY (played_at DESC);
Basic Last.fm Clone
• See artist Y’s most popular songs in a specific week
CREATE TABLE artist_song_counts (
    artist text,
    year_and_week text,
    album text,
    song text,
    play_count counter,
    PRIMARY KEY ((artist, year_and_week), album, song)
);
Cassandra from Clojure
Building Blocks
• Java Driver
• Hayt
Java Driver
• Fully featured
• Connection pooling
• Failover policies
• Retry policies
• Sync and Async interfaces
• Exposes client metrics
• https://github.com/datastax/java-driver
Hayt
• CQL DSL
• Similar to Korma
• Solely for building CQL strings
• https://github.com/mpenet/hayt
(select :foo
  (where {:bar 1
          :baz 2}))
(->raw (select :foo (where {:bar 1 :baz 2})))
> "SELECT * FROM foo WHERE bar = 1 AND baz = 2;"
Clients
• Alia
• https://github.com/mpenet/alia
• Cassaforte
• https://github.com/clojurewerkz/cassaforte
• Both built on Java Driver and Hayt
• Not particularly different
Alia vs. Cassaforte
Cassaforte
(let [conn (cc/connect ["127.0.0.1"])]
  (cql/create-keyspace conn "cassaforte_keyspace"
    (with {:replication {:class "SimpleStrategy"
                         :replication_factor 1}})))
Alia
(def cluster (alia/cluster {:contact-points ["localhost"]}))
(def session (alia/connect cluster))
(alia/execute session
  (create-keyspace :alia
    (if-exists false)
    (with {:replication {:class "SimpleStrategy"
                         :replication_factor 1}})))
Learn by Example - Alia
Cluster Object
• Entry point
• Configures relevant client options
• :contact-points
• :load-balancing-policy
• :reconnection-policy
• :retry-policy
• and more!
(def cluster (alia/cluster {:contact-points ["localhost"]}))
Session Object
• A Session is associated with a keyspace
• Allows interacting with multiple keyspaces
(def cluster (alia/cluster {:contact-points ["localhost"]}))
;; Session not bound to a keyspace
(def session (alia/connect cluster))
;; Session bound to a keyspace
(def session (alia/connect cluster :my_keyspace))
Querying
• Multiple ways to query
• alia/execute
• Synchronous, block on result
• alia/execute-async
• Returns a Lamina result-channel (basically, a promise)
• Optional success/error callbacks
• alia/execute-chan
• Returns a core.async channel
• We won’t dive into core.async now
Prepared Statements
• Statements can be prepared server side
• Better performance for common queries
(def prepared-statement (alia/prepare session "select * from users where user_name=?;"))
What else?
• See GitHub and docs
• https://github.com/mpenet/alia
• http://mpenet.github.io/alia/qbits.alia.html
Demo
Demo
• https://github.com/nickmbailey/clojure-cassandra-demo
• Built with
• CCM - https://github.com/pcmanus/ccm
• Alia - https://github.com/mpenet/alia
• ring - https://github.com/ring-clojure/ring
• compojure - https://github.com/weavejester/compojure
• hiccup - https://github.com/weavejester/hiccup
• least - https://github.com/Raynes/least
More
Cassandra: http://cassandra.apache.org
DataStax Drivers: https://github.com/datastax
Documentation: http://www.datastax.com/docs
Getting Started: http://www.datastax.com/documentation/gettingstarted/index.html
Developer Blog: http://www.datastax.com/dev/blog
Cassandra Community Site: http://planetcassandra.org
Download: http://planetcassandra.org/Download/DataStaxCommunityEdition
Webinars: http://planetcassandra.org/Learn/CassandraCommunityWebinars
Cassandra Summit Talks: http://planetcassandra.org/Learn/CassandraSummit