Things You Should Be Doing When Using Cassandra Drivers
Rebecca Mills, Junior Evangelist at DataStax, @rebccamills
What do I do?
2 Confidential
• Raise awareness of open source Apache Cassandra
• Develop content
• Identify problems newcomers might be encountering
• Develop strategies and material to ease that first experience of using Cassandra
Of course, all of this extends to drivers!
• Learn and experiment with the drivers as much as I can
• Develop “Getting Started” tutorials for drivers in various programming languages
• Make it my mission to bring the details to light
So How Can We Communicate with Cassandra in “X” Language?
We have what you need!
• DataStax provides drivers for Java, Python, and C#
• Fresh out of the oven: Ruby, Node.js, and C++
• There are also loads of open source drivers to choose from
• Check out the Planet Cassandra Client Drivers section
Let’s get into some of the basics of smart Cassandra driver usage:
1. One Cluster instance per cluster
• Configures important aspects of how connections and queries are handled, such as:
  – Contact points
  – Retry policies
  – Load balancing policies
from cassandra.cluster import Cluster
from cassandra.policies import DCAwareRoundRobinPolicy, TokenAwarePolicy
cluster = Cluster(
    ['10.1.1.3', '10.1.1.4', '10.1.1.5'],
    compression=True,
    load_balancing_policy=TokenAwarePolicy(
        DCAwareRoundRobinPolicy(local_dc='US_EAST')))
2. One Session per keyspace
• Handles query execution and connection pooling
• Long-lived object: not meant to be created and thrown away in a short-lived, request/response fashion
• Share the same cluster and session instances across your application
Cluster & Session
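from cassandra.cluster import Cluster
from cassandra.policies import DCAwareRoundRobinPolicy, TokenAwarePolicy
cluster = Cluster(
    ['10.1.1.3', '10.1.1.4', '10.1.1.5'],
    compression=True,
    load_balancing_policy=TokenAwarePolicy(
        DCAwareRoundRobinPolicy(local_dc='US_EAST')))
session = cluster.connect('demo')  # one long-lived session for the 'demo' keyspace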
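The advice is to create the cluster and session once and share them across the application. Below is a minimal sketch of one way to do that in Python. SessionHolder is a hypothetical helper, not part of the driver. The stand-in factory replaces a real Cluster(...).connect('demo') call so the example runs without a cluster.

```python
import threading


class SessionHolder:
    """Hypothetical helper: create one long-lived session lazily, then share it."""

    def __init__(self, factory):
        # factory: zero-argument callable that builds the session, e.g. (assumed)
        # lambda: Cluster(['10.1.1.3']).connect('demo')
        self._factory = factory
        self._lock = threading.Lock()
        self._session = None

    def get(self):
        # Double-checked locking: only the first caller runs the factory, and two
        # threads arriving at the same time cannot create two sessions.
        if self._session is None:
            with self._lock:
                if self._session is None:
                    self._session = self._factory()
        return self._session


# Usage with a stand-in factory (illustration only; no Cassandra involved).
created = []


def fake_factory():
    session = object()  # stands in for cluster.connect('demo')
    created.append(session)
    return session


holder = SessionHolder(fake_factory)
first, second = holder.get(), holder.get()
print(first is second, len(created))  # True 1
```

Every request handler calls holder.get() instead of connecting on its own, so connection pools are built once and reused.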
3. Use Prepared Statements
• Use them whenever you execute a statement more than once
• They have multiple benefits
• Prepare once, bind and execute multiple times
• We’ll talk more about this soon!
Cool
Useful
Deep Dives:
• Prepared Statements
• Load Balancing Policies
• Retry Policies
• Connection Pooling
• Async API
Why use Prepared Statements?
• More performant than plain query strings
• Parsed only once on the server
• We expect you to use them for repeated queries in production
• Help avoid CQL injection
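This pure-Python illustration shows the injection risk named in the last bullet. No driver is involved. The function name and the malicious input are hypothetical. When a value is pasted into the CQL text, a crafted value can add clauses of its own. With a prepared statement the text is fixed and the value travels separately as data.

```python
# Anti-pattern: pasting a value straight into the CQL text.
def build_update_unsafely(city, lastname):
    return "UPDATE users SET city = '%s' WHERE lastname = '%s'" % (city, lastname)


# Hypothetical malicious input typed into a "city" form field.
city = "Austin', email = 'attacker@example.com"
unsafe = build_update_unsafely(city, "Jones")
print(unsafe)
# UPDATE users SET city = 'Austin', email = 'attacker@example.com' WHERE lastname = 'Jones'

# With a prepared statement the CQL text is fixed when it is prepared, and the
# values are sent separately as data, so they cannot add clauses.
PREPARED_UPDATE = "UPDATE users SET city = ? WHERE lastname = ?"
bound_values = (city, "Jones")
print(PREPARED_UPDATE, bound_values)
```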
Prepared Statements
Consider a plain query string:
session.execute("""
INSERT INTO users (lastname, age, city, email, firstname) VALUES ('Jones', 35, 'Austin', '[email protected]', 'Bob')
""")
Prepared Statements
session.execute("""
INSERT INTO users (lastname, age, city, email, firstname) VALUES ('Smith', 24, 'Tampa', '[email protected]', 'Bob')
""")
session.execute("""
INSERT INTO users (lastname, age, city, email, firstname) VALUES ('Power', 45, 'New York', '[email protected]', 'Kate')
""")
session.execute("""
INSERT INTO users (lastname, age, city, email, firstname) VALUES ('Renolds', 33, 'Miami', '[email protected]', 'Carl')
""")
Prepared Statements
Now the same insert, as a prepared statement:
prepared_stmt = session.prepare(
    "INSERT INTO users (lastname, age, city, email, firstname) VALUES (?, ?, ?, ?, ?)")
bound_stmt = prepared_stmt.bind(['Jones', 35, 'Austin', '[email protected]', 'Bob'])
session.execute(bound_stmt)
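The three string inserts from the previous slide can reuse one prepared statement. The sketch below assumes a session object with prepare(), execute() and bind() behaving like the Python driver's. insert_users is a hypothetical helper, and the e-mail addresses are made up. A small stand-in session records the calls so the example runs without a cluster.

```python
INSERT_USER = (
    "INSERT INTO users (lastname, age, city, email, firstname) "
    "VALUES (?, ?, ?, ?, ?)"
)


def insert_users(session, rows):
    """Hypothetical helper; `session` is assumed to behave like a driver Session."""
    prepared = session.prepare(INSERT_USER)  # prepared (and parsed) once
    for row in rows:
        # Each execution only binds new values to the same prepared statement.
        session.execute(prepared.bind(row))


# The three users from the string example; the e-mail addresses are made up.
USERS = [
    ("Smith", 24, "Tampa", "bob.smith@example.com", "Bob"),
    ("Power", 45, "New York", "kate.power@example.com", "Kate"),
    ("Renolds", 33, "Miami", "carl.renolds@example.com", "Carl"),
]


class StandInSession:
    """Illustration only: records calls instead of talking to Cassandra."""

    def __init__(self):
        self.prepare_calls = 0
        self.executed = []

    def prepare(self, query):
        self.prepare_calls += 1
        return StandInPrepared(query)

    def execute(self, statement):
        self.executed.append(statement)


class StandInPrepared:
    def __init__(self, query):
        self.query = query

    def bind(self, values):
        return (self.query, tuple(values))


session = StandInSession()
insert_users(session, USERS)
print(session.prepare_calls, len(session.executed))  # 1 3
```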
What’s the difference?
Prepared Statements
INSERT with strings: the client sends the entire query string to Cassandra (large amount of data, parse cost)
INSERT with PreparedStatements: the client sends only the query ID and bound values (smaller amount of data, no parsing)
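The sketch below makes the size difference concrete with rough arithmetic. It is not the real native-protocol encoding. The 16-byte MD5 digest is an assumption that stands in for the ID the server returns on PREPARE, and the value sizes are simplified: 4 bytes for a CQL int, UTF-8 bytes for text. The row and e-mail address are hypothetical.

```python
import hashlib

PREPARED_TEXT = (
    "INSERT INTO users (lastname, age, city, email, firstname) "
    "VALUES (?, ?, ?, ?, ?)"
)
# Hypothetical row; the e-mail address is made up.
VALUES = ["Smith", 24, "Tampa", "bob.smith@example.com", "Bob"]
FULL_QUERY = (
    "INSERT INTO users (lastname, age, city, email, firstname) "
    "VALUES ('Smith', 24, 'Tampa', 'bob.smith@example.com', 'Bob')"
)

# Assumption: a 16-byte digest stands in for the ID the server returns on PREPARE.
statement_id = hashlib.md5(PREPARED_TEXT.encode("utf-8")).digest()


def encoded_size(value):
    # Simplified: a CQL int is 4 bytes; text is sent as UTF-8 bytes.
    return 4 if isinstance(value, int) else len(value.encode("utf-8"))


string_payload = len(FULL_QUERY.encode("utf-8"))  # whole CQL text, every time
prepared_payload = len(statement_id) + sum(encoded_size(v) for v in VALUES)
print(string_payload, prepared_payload)
```

The string version also has to be parsed by the server on every execution. The prepared version was parsed once, when it was prepared.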
So what does that mean to me?
Speed!
Prepared Statements
http://techblog.netflix.com/2013/12/astyanax-update.html
Prepared Statements
Preparing a statement inside a for loop is an anti-pattern:
for (int i = 0; i < 10; i++) {
    // Re-prepares the same query on every iteration
    PreparedStatement ps = session.prepare("UPDATE user SET disabled = 1 WHERE id = ?");
    session.execute(ps.bind(i));
}
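The simplest fix is to call prepare() once, before the loop. When the same queries are issued from many places, a cache keyed by the CQL text does the same job. Below is a minimal sketch, assuming a session with a prepare(query) method like the Python driver's. PreparedCache and CountingSession are hypothetical, and the stand-in session only counts calls. One simplification: prepare() runs while the lock is held.

```python
import threading


class PreparedCache:
    """Hypothetical helper: prepare each distinct CQL string only once."""

    def __init__(self, session):
        self._session = session
        self._statements = {}
        self._lock = threading.Lock()

    def get(self, query):
        # The lock keeps two threads from preparing the same text twice.
        with self._lock:
            if query not in self._statements:
                self._statements[query] = self._session.prepare(query)
            return self._statements[query]


class CountingSession:
    """Illustration only: counts prepare() calls instead of contacting Cassandra."""

    def __init__(self):
        self.prepare_calls = 0
        self.executed = []

    def prepare(self, query):
        self.prepare_calls += 1
        return query  # stands in for a PreparedStatement

    def execute(self, statement, values):
        self.executed.append((statement, values))


UPDATE_USER = "UPDATE user SET disabled = 1 WHERE id = ?"
session = CountingSession()
cache = PreparedCache(session)
for i in range(10):
    # Same loop as the anti-pattern, but the statement is prepared only once.
    session.execute(cache.get(UPDATE_USER), [i])
print(session.prepare_calls, len(session.executed))  # 1 10
```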
Load Balancing
• A load balancing policy determines which node (the coordinator) receives each insert or query.
• A client can read from or write to any node, but that is not always efficient.
• If a node receives a read or write for data owned by other nodes, it coordinates that request on the client's behalf, which costs an extra network hop.
• We can use a load balancing policy to control that choice.
Load Balancing deep dive
Using this example:
Cluster cluster = Cluster.builder()
        .addContactPoint("10.0.0.1")
        .withRetryPolicy(DefaultRetryPolicy.INSTANCE)
        .withLoadBalancingPolicy(
                new TokenAwarePolicy(new DCAwareRoundRobinPolicy()))
        .build();
Example data model
CREATE TABLE users (
  username text PRIMARY KEY,
  firstName text,
  lastName text
);
INSERT INTO users (username, firstName, lastName)
VALUES ('rmills', 'Rebecca', 'Mills');
INSERT INTO users (username, firstName, lastName)
VALUES ('pmcfadin', 'Patrick', 'McFadin');
Discover cluster
Client: .addContactPoint("10.0.0.1")
Ring, RF=3: 10.0.0.1 (00-25), 10.0.0.2 (26-50), 10.0.0.3 (51-75), 10.0.0.4 (76-100)
Populate connection pool
The client opens connections to the nodes in DC1.
Node      Primary  Replica  Replica
10.0.0.1  00-25    76-100   51-75
10.0.0.2  26-50    00-25    76-100
10.0.0.3  51-75    26-50    00-25
10.0.0.4  76-100   51-75    26-50
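The replica table follows from the ring and RF=3. Each range lives on its primary node and on the next two nodes clockwise, so each node also stores the two ranges that precede its own. The sketch below re-derives the table. It is a simplified SimpleStrategy-style placement using this example's 0-100 token space, not the driver's token-map code.

```python
# Ring from this example: each node's primary range, in ring order.
RING = [
    ("10.0.0.1", "00-25"),
    ("10.0.0.2", "26-50"),
    ("10.0.0.3", "51-75"),
    ("10.0.0.4", "76-100"),
]
RF = 3


def replica_table(ring, rf):
    """Map each node to the ranges it stores: its own plus the rf-1 preceding ones."""
    n = len(ring)
    table = {}
    for i, (node, primary) in enumerate(ring):
        # Walking counter-clockwise finds the ranges this node replicates.
        replicas = [ring[(i - k) % n][1] for k in range(1, rf)]
        table[node] = [primary] + replicas
    return table


for node, ranges in replica_table(RING, RF).items():
    print(node, *ranges)
# 10.0.0.1 00-25 76-100 51-75
# 10.0.0.2 26-50 00-25 76-100
# 10.0.0.3 51-75 26-50 00-25
# 10.0.0.4 76-100 51-75 26-50
```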
Request for data
SELECT firstName FROM users WHERE userName = 'rmills';
The partition key 'rmills' is hashed with Murmur3: Token = 15
Token Aware
SELECT firstName FROM users WHERE userName = 'rmills';  (Token = 15)
.withLoadBalancingPolicy(new TokenAwarePolicy(...))
Which node? Token = 15 falls in range 00-25, which is stored on 10.0.0.1 (primary), 10.0.0.2 and 10.0.0.3 (replicas), so the token-aware policy sends the query to one of these replicas.
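The "Which node?" lookup amounts to finding the range that owns the token and then walking clockwise for RF nodes. Below is a minimal sketch over this example's simplified 0-100 ring. A real token-aware policy gets replicas from the driver's cluster metadata and Murmur3 tokens, so replicas_for_token is a hypothetical stand-in.

```python
import bisect

# Ring from this example: (end of primary range, node), sorted by token.
RING = [(25, "10.0.0.1"), (50, "10.0.0.2"), (75, "10.0.0.3"), (100, "10.0.0.4")]
RF = 3


def replicas_for_token(token, ring=RING, rf=RF):
    """Return the owner of `token` followed by the next rf-1 nodes clockwise."""
    ends = [end for end, _ in ring]
    # The first range whose upper bound is >= token owns it (wrapping past the end).
    start = bisect.bisect_left(ends, token) % len(ring)
    return [ring[(start + k) % len(ring)][1] for k in range(rf)]


print(replicas_for_token(15))  # ['10.0.0.1', '10.0.0.2', '10.0.0.3']
print(replicas_for_token(77))  # ['10.0.0.4', '10.0.0.1', '10.0.0.2']
```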
Token Aware
.withLoadBalancingPolicy(new TokenAwarePolicy(new DCAwareRoundRobinPolicy()))
The token-aware policy wraps a DC-aware round-robin child policy, so the query for 'rmills' goes to a replica of Token = 15 in the local data center, DC1.
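The sketch below shows how a wrapping policy can build its query plan: local replicas first, then the rest of the child policy's plan. It is a simplified model, not the drivers' classes. The DC2 hosts are invented so the local-DC filter has something to filter, and the per-token replica lists are taken from the replica table above.

```python
import itertools

# Hypothetical host metadata: the four DC1 nodes from the example plus two
# invented DC2 hosts.
HOSTS = {
    "10.0.0.1": "DC1", "10.0.0.2": "DC1", "10.0.0.3": "DC1", "10.0.0.4": "DC1",
    "10.1.0.1": "DC2", "10.1.0.2": "DC2",
}
# Replicas per token, taken from the replica table in this example.
REPLICAS = {
    15: ["10.0.0.1", "10.0.0.2", "10.0.0.3"],
    77: ["10.0.0.4", "10.0.0.1", "10.0.0.2"],
}


class DCAwareRoundRobin:
    """Simplified child policy: rotate through local-DC hosts, remote hosts last."""

    def __init__(self, hosts, local_dc):
        self.local = [h for h, dc in hosts.items() if dc == local_dc]
        self.remote = [h for h, dc in hosts.items() if dc != local_dc]
        self._counter = itertools.count()

    def is_local(self, host):
        return host in self.local

    def query_plan(self):
        start = next(self._counter) % len(self.local)
        return self.local[start:] + self.local[:start] + self.remote


class TokenAware:
    """Simplified wrapper: local replicas first, then the child's own plan."""

    def __init__(self, child, replicas):
        self.child = child
        self.replicas = replicas

    def query_plan(self, token):
        first = [h for h in self.replicas[token] if self.child.is_local(h)]
        rest = [h for h in self.child.query_plan() if h not in first]
        return first + rest


policy = TokenAware(DCAwareRoundRobin(HOSTS, local_dc="DC1"), REPLICAS)
print(policy.query_plan(15))
```

The non-replica hosts stay in the plan as fallbacks. The wrapper reorders the child's plan; it does not discard it.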
Token Aware - Retry
If the request to the chosen replica times out, it can be retried on another host in the query plan, such as another replica for Token = 15.
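The sketch below shows the shape of moving on to the next host after a timeout. It rests on simplifying assumptions. Timeout and send are hypothetical stand-ins. Real drivers let the configured retry policy decide whether and where to retry, and DefaultRetryPolicy does not retry every error.

```python
class Timeout(Exception):
    """Stands in for a driver timeout error (hypothetical, for illustration)."""


def execute_with_plan(query_plan, send):
    """Try hosts in query-plan order; on a timeout, move on to the next host.

    `send(host)` is a hypothetical callable that performs the request.
    """
    errors = {}
    for host in query_plan:
        try:
            return host, send(host)
        except Timeout as exc:
            errors[host] = exc
    raise RuntimeError("all hosts timed out: %s" % ", ".join(errors))


# Usage: the first replica for Token = 15 is slow, so the second one answers.
def fake_send(host):
    if host == "10.0.0.1":
        raise Timeout(host)
    return "Rebecca"


print(execute_with_plan(["10.0.0.1", "10.0.0.2", "10.0.0.3"], fake_send))
# ('10.0.0.2', 'Rebecca')
```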
Without Token Aware
Using this modified example:
Cluster cluster = Cluster.builder()
        .addContactPoint("10.0.0.1")
        .withRetryPolicy(DefaultRetryPolicy.INSTANCE)
        .withLoadBalancingPolicy(new DCAwareRoundRobinPolicy())
        .build();
Request for data
SELECT firstName FROM users WHERE userName = 'pmcfadin';
The partition key 'pmcfadin' is hashed with Murmur3: Token = 77
No Token Aware
SELECT firstName FROM users WHERE userName = 'pmcfadin';  (Token = 77)
.withLoadBalancingPolicy(new DCAwareRoundRobinPolicy())
Without token awareness, the query goes to the next DC1 node in round-robin order, whether or not that node holds the data.
Data placement
Token = 77 falls in range 76-100, which is stored on 10.0.0.4 (primary), 10.0.0.1 and 10.0.0.2 (replicas).
Standard Round Robin
Round robin can pick 10.0.0.3, the only node that holds no copy of range 76-100. That node must coordinate the request, forwarding it to a replica and adding an extra network hop.
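A few lines of Python show how often plain round robin lands on a non-replica for Token = 77 in this four-node ring. It is a toy model. itertools.cycle stands in for the policy, and the replica set comes from the replica table above.

```python
import itertools

NODES = ["10.0.0.1", "10.0.0.2", "10.0.0.3", "10.0.0.4"]
# Nodes holding range 76-100 (Token = 77), from the replica table.
REPLICAS_FOR_77 = {"10.0.0.4", "10.0.0.1", "10.0.0.2"}

round_robin = itertools.cycle(NODES)
choices = []
for _ in range(len(NODES)):
    coordinator = next(round_robin)
    # A coordinator without a copy of the data must forward the request.
    needs_extra_hop = coordinator not in REPLICAS_FOR_77
    choices.append((coordinator, needs_extra_hop))
    print(coordinator, "forwards to a replica" if needs_extra_hop else "holds a replica")
```

With RF=3 on four nodes, one request in four pays the extra hop. A token-aware policy avoids that hop by trying replicas first.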
Load Balancing
• Default before Java driver 2.0.2: RoundRobinPolicy
• Now: TokenAwarePolicy
  – Adds token awareness to a child policy
  – Acts as a filter that wraps around another policy
  – Reduces network hops, because replicas are tried first
Load Balancing - Whitelist
• Ensures only the hosts from a provided list are used
• Wraps a child policy
• Used to limit the effects of automatic peer discovery
• Executes queries only on a given list of hosts
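A whitelist wrapper keeps the child policy's ordering and drops any host not on the list. The drivers ship their own versions, for example WhiteListPolicy in the Java driver. The classes below are a simplified, hypothetical model of the idea, not that code.

```python
class RoundRobin:
    """Simplified child policy: rotate through all known hosts."""

    def __init__(self, hosts):
        self.hosts = list(hosts)
        self._i = 0

    def query_plan(self):
        start = self._i % len(self.hosts)
        self._i += 1
        return self.hosts[start:] + self.hosts[:start]


class WhiteList:
    """Simplified wrapper: keep the child's order but drop hosts not in the list."""

    def __init__(self, child, allowed):
        self.child = child
        self.allowed = set(allowed)

    def query_plan(self):
        return [h for h in self.child.query_plan() if h in self.allowed]


# Peer discovery found four nodes, but only two are whitelisted.
policy = WhiteList(RoundRobin(["10.0.0.1", "10.0.0.2", "10.0.0.3", "10.0.0.4"]),
                   ["10.0.0.1", "10.0.0.3"])
print(policy.query_plan())  # ['10.0.0.1', '10.0.0.3']
print(policy.query_plan())  # ['10.0.0.3', '10.0.0.1']
```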
Asynchronous Statements
• The native binary protocol supports request pipelining
• A single connection can carry several simultaneous, independent request/response exchanges
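The model below shows the pipelining idea and assumes nothing about the driver's internals. In the native protocol each request frame carries a stream ID. The hypothetical PipelinedConnection class uses the same idea to send several requests without waiting and to match out-of-order responses to their requests.

```python
class PipelinedConnection:
    """Toy model of one connection with several requests in flight at once."""

    def __init__(self):
        self._next_stream_id = 0
        self.in_flight = {}  # stream ID -> query text

    def send(self, query):
        stream_id = self._next_stream_id
        self._next_stream_id += 1
        self.in_flight[stream_id] = query
        return stream_id  # returns at once; no waiting for the response

    def on_response(self, stream_id, result):
        # The stream ID, not arrival order, identifies which request this answers.
        query = self.in_flight.pop(stream_id)
        return query, result


conn = PipelinedConnection()
queries = [
    "SELECT firstName FROM users WHERE userName = 'rmills'",
    "SELECT firstName FROM users WHERE userName = 'pmcfadin'",
]
stream_ids = [conn.send(q) for q in queries]  # both requests are in flight together

# Responses arrive in the opposite order but still reach the right request.
answers = dict(conn.on_response(sid, name)
               for sid, name in zip(reversed(stream_ids), ["Patrick", "Rebecca"]))
print(answers)
```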
Asynchronous Statements
• You don't have to wait for a query to complete and return rows directly (non-blocking I/O)
• The method returns a future object almost immediately
Asynchronous Statements
import logging
from cassandra import ReadTimeout
log = logging.getLogger(__name__)
query = "SELECT * FROM users WHERE lastname=%s"
future = session.execute_async(query, [lastname])
# ... do some other work
try:
    rows = future.result()
    user = rows[0]
    print(user.firstname, user.age)
except ReadTimeout:
    log.exception("Query timed out:")
Asynchronous Statements
# build a list of futures
futures = []
query = "SELECT * FROM users WHERE lastname=%s"
for lastname in lastnames_to_fetch:  # lastnames_to_fetch: the last names to look up
    futures.append(session.execute_async(query, [lastname]))
# wait for them to complete and use the results
for future in futures:
    rows = future.result()
    print(rows[0].firstname, rows[0].age)
Where can I download the drivers?
Planet Cassandra
• A great place for Apache Cassandra resources!
• Blog posts, webinars, tutorials, and much, much more!
• Also a great place for your driver needs
Thank You! Twitter: @rebccamills