cassandra eu - state of cql

The State of CQL

Sylvain Lebresne (DataStax)

A short CQL primer

New in Cassandra 2.0

Native protocol

What's next?

A better API for Cassandra

Cassandra has often been regarded as hard to develop against.

It doesn't have to be that way!

Thrift is not satisfactory:

Not user friendly, hard to use.

Low level, very little abstraction.

Hard to evolve (in a backward compatible way).

Unreadable without driver abstraction.

Quick historical notes

CQL1 first introduced in Cassandra 0.8, became CQL2 in Cassandra 1.0

"These aren't the CQL you are looking for"

CQL3 (simply "CQL" from here on) introduced in Cassandra 1.2

Semantically, CQL1/CQL2 are closer to the Thrift API than to CQL3.

CQL3 is the version that's here to stay: no plan for a CQL4 any time soon.


A short CQL primer

The Cassandra Query Language

Syntactically, a subset of SQL (with a few extensions)

INSERT and UPDATE are both upserts

No joins, no sub-queries, no aggregation, ...

Denormalization is the norm: do the work at write time, not read time

CREATE TABLE users (
    user_id uuid,
    name text,
    password text,
    email text,
    picture_profile blob,
    PRIMARY KEY (user_id)
);
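To make the "INSERT and UPDATE are both upserts" point concrete, here is a minimal in-memory sketch in Java. It is not how Cassandra stores data; the class `UpsertSketch` and its methods are hypothetical names used only to show that a single write operation can model both statements: it creates the row when missing and overwrites only the supplied columns.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical in-memory model of a CQL table: primary key -> (column -> value).
final class UpsertSketch {
    private final Map<String, Map<String, String>> rows = new HashMap<>();

    // Models both INSERT and UPDATE: create the row if it is missing, then
    // overwrite only the supplied columns, leaving the others untouched.
    void upsert(String key, Map<String, String> columns) {
        rows.computeIfAbsent(key, k -> new HashMap<>()).putAll(columns);
    }

    Map<String, String> get(String key) {
        return rows.get(key);
    }

    public static void main(String[] args) {
        UpsertSketch users = new UpsertSketch();
        // An "UPDATE" on a row that does not exist yet simply creates it.
        users.upsert("u1", Map.of("name", "Tom"));
        // An "INSERT" on an existing row merges into it instead of failing.
        users.upsert("u1", Map.of("email", "tom@example.com"));
        System.out.println(users.get("u1"));
    }
}
```

Neither call checks whether the row exists first, which is the behaviour the slide describes.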

Denormalization: Cassandra modeling 101

Efficient queries in Cassandra are based on 2 principles:

the data queried is collocated on one replica set

the data queried is collocated on disk on those replicas

Denormalization is the technique that makes this achievable in practice.

But this means CQL exposes:

how to collocate data on the same replica set

how to collocate data on disk (for a given replica)

This is done in CQL through the primary key.

CQL distinguishes 2 sub-parts in the PRIMARY KEY:

partition key: decides the node on which the data is stored

clustering columns: within the same partition key, (CQL3) rows are physically ordered following the clustering columns

CREATE TABLE inboxes (
    user_id uuid,
    email_id timeuuid,
    sender text,
    recipients set<text>,
    subject text,
    is_read boolean,
    PRIMARY KEY (user_id, email_id)
);

This is important, because CQL only allows queries for which an explicit index exists:

-- Get last 50 emails in user 51b-23-ab8 inbox
SELECT * FROM inboxes WHERE user_id=51b-23-ab8 ORDER BY email_id DESC LIMIT 50;
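The roles of the partition key and the clustering column in the inboxes table can be sketched with plain Java collections. This is only an illustrative model, assuming string user ids and numeric email ids in place of uuid and timeuuid; `InboxSketch` and its methods are hypothetical names. The outer map plays the part of the partition key lookup, and the sorted inner map plays the part of rows ordered by the clustering column.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical model of the inboxes table:
// partition key (user_id) -> rows kept sorted by clustering column (email_id).
final class InboxSketch {
    private final Map<String, TreeMap<Long, String>> partitions = new HashMap<>();

    void insert(String userId, long emailId, String subject) {
        partitions.computeIfAbsent(userId, k -> new TreeMap<>()).put(emailId, subject);
    }

    // Models: SELECT ... WHERE user_id = ? ORDER BY email_id DESC LIMIT ?
    // One partition lookup, then a sequential walk over already-sorted rows.
    List<String> latest(String userId, int limit) {
        List<String> out = new ArrayList<>();
        TreeMap<Long, String> partition = partitions.get(userId);
        if (partition == null) {
            return out;
        }
        for (String subject : partition.descendingMap().values()) {
            if (out.size() == limit) {
                break;
            }
            out.add(subject);
        }
        return out;
    }

    public static void main(String[] args) {
        InboxSketch inbox = new InboxSketch();
        inbox.insert("alice", 1L, "hello");
        inbox.insert("alice", 3L, "newest");
        inbox.insert("alice", 2L, "middle");
        inbox.insert("bob", 9L, "other inbox");
        System.out.println(inbox.latest("alice", 2)); // prints [newest, middle]
    }
}
```

The query never scans other partitions and never sorts at read time; the ordering work was done when the rows were written.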

CQL main features

Collections (set, map and list)

Secondary indexes

Convenience functions (timeuuid, type conversions, ...)

For more details:

http://cassandra.apache.org/doc/cql3/CQL.html

http://www.datastax.com/documentation/cql/3.1/webhelp/index.html

New in Cassandra 2.0

Lightweight transactions:

INSERT INTO test (id, name) VALUES (42, 'Tom') IF NOT EXISTS;
UPDATE test SET password='newpass' WHERE id=42 IF password='oldpass';

Triggers:

CREATE TRIGGER myTrigger ON test USING 'my.trigger.Class';

ALTER DROP:

CREATE TABLE test (k int PRIMARY KEY, prop1 int, prop2 text, prop3 float);
ALTER TABLE test DROP prop3;

Preparing TIMESTAMP, TTL and LIMIT:

SELECT * FROM myTable LIMIT ?;
UPDATE myTable USING TTL ? SET v = 2 WHERE k = 'foo';
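The client-visible behaviour of the two lightweight transaction statements (IF NOT EXISTS and IF column = value) is a compare-and-set: the write is applied only when the condition holds, and the caller learns whether it was applied. The Java sketch below mirrors that behaviour with a single-process concurrent map. It says nothing about how Cassandra coordinates the check across replicas, and `ConditionalWriteSketch` is a hypothetical name.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Hypothetical single-process model of conditional writes: id -> password.
final class ConditionalWriteSketch {
    private final ConcurrentMap<Integer, String> passwords = new ConcurrentHashMap<>();

    // Models INSERT ... IF NOT EXISTS: returns true only if no row existed.
    boolean insertIfNotExists(int id, String password) {
        return passwords.putIfAbsent(id, password) == null;
    }

    // Models UPDATE ... IF password = expected: returns true only if the
    // current value matched and was replaced atomically.
    boolean updateIf(int id, String expected, String replacement) {
        return passwords.replace(id, expected, replacement);
    }

    public static void main(String[] args) {
        ConditionalWriteSketch test = new ConditionalWriteSketch();
        System.out.println(test.insertIfNotExists(42, "oldpass"));   // prints true
        System.out.println(test.insertIfNotExists(42, "other"));     // prints false
        System.out.println(test.updateIf(42, "wrong", "newpass"));   // prints false
        System.out.println(test.updateIf(42, "oldpass", "newpass")); // prints true
    }
}
```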

New in Cassandra 2.0 (continued)

Conditional DDL:

CREATE TABLE IF NOT EXISTS test (k int PRIMARY KEY);
DROP KEYSPACE IF EXISTS ks;

Secondary indexes everywhere (almost):

CREATE TABLE timeline (
    event_id uuid,
    created_at timeuuid,
    content blob,
    PRIMARY KEY (event_id, created_at)
);
CREATE INDEX ON timeline (created_at);

SELECT aliases:

SELECT event_id, dateOf(created_at) AS creation_date FROM timeline;

Coming in Cassandra 2.0.2

Named bind variables:

SELECT * FROM timeline WHERE created_at > :tlow AND created_at <= :thigh AND key = :k;

Prepared IN:

SELECT * FROM users WHERE user_id IN ?;

Limited SELECT DISTINCT:

CREATE TABLE test (
    event_id int,
    created_at timestamp,
    content blob,
    PRIMARY KEY (event_id, created_at)
);
SELECT DISTINCT event_id FROM test;

The native protocol

A binary transport protocol for CQL

Native protocol

Binary transport protocol for CQL

Query execution, prepared statements, authentication, compression, ...

Asynchronous (allows multiple concurrent queries per connection)

Server notifications (only generic cluster events currently)

Existing drivers for Java, C#, Python, C++, Golang, ...

Example usage of the Java driver (https://github.com/datastax/java-driver):

// Cluster, Session and Row come from com.datastax.driver.core
Cluster cluster = Cluster.builder().addContactPoint("127.0.0.1").build();
Session session = cluster.connect("myKeyspace");

for (Row row : session.execute("SELECT * FROM myTable")) {
    // Do something ...
}
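The "multiple concurrent queries per connection" point relies on tagging each request with an id so that responses can be matched back to their requests even when they arrive out of order. The sketch below illustrates only that matching idea in plain Java, with no networking; `MultiplexSketch`, `send` and `onResponse` are hypothetical names and are not part of any driver API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Consumer;

// Hypothetical model of one connection carrying several in-flight requests.
final class MultiplexSketch {
    private final Map<Integer, Consumer<String>> pending = new ConcurrentHashMap<>();
    private final AtomicInteger nextStreamId = new AtomicInteger();

    // Registers a callback under a fresh id and returns that id.
    // A real client would also write the query and the id to the socket here.
    int send(String query, Consumer<String> onResult) {
        int streamId = nextStreamId.getAndIncrement();
        pending.put(streamId, onResult);
        return streamId;
    }

    // Called when a response arrives; the id says which request it answers.
    void onResponse(int streamId, String body) {
        Consumer<String> callback = pending.remove(streamId);
        if (callback != null) {
            callback.accept(body);
        }
    }

    int inFlight() {
        return pending.size();
    }

    public static void main(String[] args) {
        MultiplexSketch connection = new MultiplexSketch();
        List<String> results = new ArrayList<>();
        int first = connection.send("SELECT * FROM a", body -> results.add("first: " + body));
        int second = connection.send("SELECT * FROM b", body -> results.add("second: " + body));

        // Responses may come back in any order; here the second one finishes first.
        connection.onResponse(second, "rows of b");
        connection.onResponse(first, "rows of a");
        System.out.println(results); // prints [second: rows of b, first: rows of a]
    }
}
```

Because no request blocks the connection while waiting, a slow query does not hold up a fast one sent after it.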

New in Cassandra 2.0: native protocol 2

Cursors:

for (Row row : session.execute("SELECT * FROM myTable")) {
    // Do something ...
}

Batching prepared statements:

PreparedStatement ps = session.prepare("INSERT INTO myTable (p1, p2) VALUES (?, ?)");

BatchStatement bs = new BatchStatement();
bs.add(ps.bind(0, "v1"));
bs.add(ps.bind(1, "v2"));
bs.add(ps.bind(2, "v3"));
session.execute(bs);

One-shot prepare and execute:

session.execute("INSERT INTO users (id, photo) VALUES (?, ?)", someId, photoBytes);

SASL for authentication
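The cursor feature means the simple for loop over a result set no longer has to load every row at once: rows can be fetched page by page as the iteration advances. A minimal sketch of that idea follows, assuming a list that stands in for the server-side result and a plain offset that stands in for whatever paging marker the protocol actually uses; `PagingSketch` is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

// Hypothetical iterator that pulls rows one page at a time, on demand.
final class PagingSketch implements Iterator<Integer> {
    private final List<Integer> serverRows; // stands in for the full result on the server
    private final int pageSize;
    private List<Integer> page = new ArrayList<>();
    private int posInPage = 0;
    private int nextOffset = 0;
    int fetches = 0; // number of simulated round trips

    PagingSketch(List<Integer> serverRows, int pageSize) {
        if (pageSize <= 0) {
            throw new IllegalArgumentException("pageSize must be positive");
        }
        this.serverRows = serverRows;
        this.pageSize = pageSize;
    }

    // Simulates one round trip that returns the next page of rows.
    private void fetchNextPage() {
        int end = Math.min(nextOffset + pageSize, serverRows.size());
        page = serverRows.subList(nextOffset, end);
        nextOffset = end;
        posInPage = 0;
        fetches++;
    }

    @Override
    public boolean hasNext() {
        if (posInPage < page.size()) {
            return true;
        }
        if (nextOffset >= serverRows.size()) {
            return false;
        }
        fetchNextPage();
        return posInPage < page.size();
    }

    @Override
    public Integer next() {
        if (!hasNext()) {
            throw new NoSuchElementException();
        }
        return page.get(posInPage++);
    }

    public static void main(String[] args) {
        PagingSketch rows = new PagingSketch(List.of(1, 2, 3, 4, 5), 2);
        int sum = 0;
        while (rows.hasNext()) {
            sum += rows.next();
        }
        System.out.println(sum);          // prints 15
        System.out.println(rows.fetches); // prints 3
    }
}
```

The calling code looks the same as iterating a fully materialized result; only the memory footprint and the number of round trips change.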

What's next?

Cassandra 2.1 and beyond

CQL: some ideas

Storage engine optimizations for CQL

Secondary index for collections

Server side functions

User defined types

User defined types

CREATE TYPE address (
    street text,
    zip_code int,
    state text,
    phones set<text>
);

CREATE TABLE users (
    id uuid PRIMARY KEY,
    name text,
    addresses map<text, address>
);

INSERT INTO users (id, name) VALUES (234-4a-761, 'Sylvain Lebresne');
UPDATE users SET addresses['work'] = {
    street: '777 Mariners Island Blvd #510',
    zip_code: 94404,
    state: 'CA',
    phones: { '650-389-6000' }
} WHERE id = 234-4a-761;

Thank You! (Questions?)
