Deep dive into CQL


Upload: rustam-aliyev

Post on 01-Dec-2014


DESCRIPTION

How the Cassandra database represents various CQL types at the storage layer.

TRANSCRIPT

Page 1: Deep dive into CQL

©2014 DataStax Confidential. Do not distribute without consent.

@rstml

Rustam Aliyev Solution Architect

Deep dive into CQL and CQL improvements in Cassandra 2.1


Page 2: Deep dive into CQL

What is CQL?

* Cassandra Query Language (CQL)

* SQL-like language for communicating with Cassandra

* Simpler than the Thrift API

* An abstraction layer that hides implementation details

This is what we want to understand

Page 3: Deep dive into CQL

Use Case

* Messaging Application

* Group Conversations

* Attachments

Page 4: Deep dive into CQL

Simple CQL Table

CREATE TABLE messages (
    conversation_id uuid,
    message_id timeuuid,
    content text,
    sender text,
    PRIMARY KEY (conversation_id, message_id)
);

Page 5: Deep dive into CQL

TimeUUID

* Also known as a Version 1 UUID

* Sortable

Timestamp to Microsecond + UUID = TimeUUID

04d580b0-9412-11e3-baa8-0800200c9a66 = 12 February 2014 13:18:06 GMT

http://www.famkruithof.net/uuid/uuidgen
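A minimal sketch of how the timestamp can be read back out of a TimeUUID, using only Python's standard `uuid` module. The helper name `timeuuid_to_datetime` is an assumption made for this illustration. The result is reported in UTC, so tools that render local time may show a different clock time for the same UUID.

```python
import uuid
from datetime import datetime, timedelta, timezone

# Version 1 UUIDs count 100-nanosecond intervals from the start of the
# Gregorian calendar (1582-10-15), not from the Unix epoch.
GREGORIAN_EPOCH = datetime(1582, 10, 15, tzinfo=timezone.utc)


def timeuuid_to_datetime(value: str) -> datetime:
    """Return the UTC timestamp embedded in a version 1 UUID."""
    parsed = uuid.UUID(value)
    if parsed.version != 1:
        raise ValueError("not a version 1 (time-based) UUID")
    # parsed.time is the 60-bit timestamp in 100 ns units; // 10 gives microseconds.
    return GREGORIAN_EPOCH + timedelta(microseconds=parsed.time // 10)


embedded = timeuuid_to_datetime("04d580b0-9412-11e3-baa8-0800200c9a66")
print(embedded.isoformat())
```

Because the timestamp occupies the leading fields of the UUID's time value, comparing two TimeUUIDs by `.time` gives their chronological order, which is what makes the type sortable.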

Page 6: Deep dive into CQL

Primary Key

CREATE TABLE messages (
    conversation_id uuid,
    message_id timeuuid,
    content text,
    sender text,
    PRIMARY KEY (conversation_id, message_id)
);

Partition Key: conversation_id

Clustering Column: message_id

* Also Primary Index

Page 7: Deep dive into CQL

Partition Key

conversation_id: 04d580b0-9412-…9a66

* Determines partition (and replicas)

* Remaining columns are stored on the determined partition

Diagram labels: Replica, RF=3
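The idea that the partition key alone picks the replicas can be sketched with a toy token ring. Everything here is an assumption for illustration: the six node names, the evenly spaced tokens, MD5 as the hash, and the simple "next distinct nodes clockwise" placement. Real clusters use their configured partitioner and replication strategy.

```python
import hashlib
from bisect import bisect_left


def token_for(partition_key: str) -> int:
    """Hash a partition key to a token (MD5 is a stand-in chosen for this sketch)."""
    return int.from_bytes(hashlib.md5(partition_key.encode()).digest(), "big")


def replicas_for(partition_key: str, ring: dict, rf: int = 3) -> list:
    """Walk the ring clockwise from the key's token and take rf distinct nodes."""
    tokens = sorted(ring)
    start = bisect_left(tokens, token_for(partition_key))
    picked = []
    for step in range(len(tokens)):
        node = ring[tokens[(start + step) % len(tokens)]]  # wraps around the ring
        if node not in picked:
            picked.append(node)
        if len(picked) == rf:
            break
    return picked


# Hypothetical six-node ring with evenly spaced tokens.
RING = {i * (2**128 // 6): f"node{i}" for i in range(6)}
owners = replicas_for("04d580b0-9412-3a00-93d1-46196ee79a66", RING)
print(owners)
```

The same key always hashes to the same token, so every message of a conversation lands on the same set of replicas.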

Page 8: Deep dive into CQL

Clustering Column

Merged, Sorted and Stored Sequentially

04d580b0-9412-…9a66

2013-04-03 07:01:00 content: Hi! sender: [email protected]

2013-04-03 07:03:20 content: Hello! sender: tom@example…

2013-04-03 07:04:52 content: Where are you? sender: [email protected]

2013-04-03 07:05:01 content: in Istanbul sender: tom@example…

2013-04-03 07:06:32 content: wow! how come sender: [email protected]

* Data on disk is ordered based on Clustering Column

* Efficient retrieval with range queries (slice)

SELECT * FROM messages
WHERE conversation_id = 04d580b0-9412-…9a66
  AND message_id > minTimeuuid('2013-04-03 07:04:00')
  AND message_id < maxTimeuuid('2013-04-03 07:10:00');
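Why a range query over the clustering column is cheap can be sketched with a sorted list and two binary searches. This is an illustration only: plain time strings stand in for TimeUUIDs, and `slice_partition` is a hypothetical helper, not a Cassandra API.

```python
from bisect import bisect_left, bisect_right

# Cells of one partition, already sorted by clustering value (time order).
# Plain "HH:MM:SS" strings stand in for TimeUUIDs in this sketch.
partition = [
    ("07:01:00", "Hi!"),
    ("07:03:20", "Hello!"),
    ("07:04:52", "Where are you?"),
    ("07:05:01", "in Istanbul"),
    ("07:06:32", "wow! how come"),
]


def slice_partition(rows, lower, upper):
    """Return rows with lower < key < upper using two binary searches."""
    keys = [key for key, _ in rows]
    start = bisect_right(keys, lower)  # first key strictly greater than lower
    end = bisect_left(keys, upper)     # first key not less than upper
    return rows[start:end]


selected = slice_partition(partition, "07:04:00", "07:10:00")
print([content for _, content in selected])
```

Since the data is already ordered, the result is one contiguous run; nothing outside the requested range has to be scanned.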

Page 9: Deep dive into CQL

Data on Disk

Partition Key (Row Key)

Column Name 1 Column Value 1

Column Name 2 Column Value 2

Column Name 3 Column Value 3

...

Column Name N Column Value N

Page 10: Deep dive into CQL

Data on Disk

04d580b0-9412-3a00-93d1-46196ee79a66

dbcd9d0f-9c23-11e2-7f7f-7f7f7f7f7f7f:

dbcd9d0f-9c23-11e2-7f7f-7f7f7f7f7f7f:content Hi!

dbcd9d0f-9c23-11e2-7f7f-7f7f7f7f7f7f:sender [email protected]

2f3feb0f-9c24-11e2-7f7f-7f7f7f7f7f7f:

2f3feb0f-9c24-11e2-7f7f-7f7f7f7f7f7f:content Hello!

2f3feb0f-9c24-11e2-7f7f-7f7f7f7f7f7f:sender [email protected]

...

Row key: Partition Key (conversation_id)

Cell name: Clustering Column (message_id) : Column Name

Cell value: Column Value

INSERT INTO messages (conversation_id, message_id, content, sender)
VALUES (04d580b0-9412-3a00-93d1-46196ee79a66, 2f3feb0f-9c24-11e2-7f7f-7f7f7f7f7f7f, 'Hello!', '[email protected]');
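The mapping from one CQL row to storage cells can be sketched as a small function. It mimics the notation of the slide (clustering value, a colon, then the column name) rather than the real binary encoding; `row_to_cells` and the e-mail address are hypothetical.

```python
def row_to_cells(clustering, columns):
    """Flatten one CQL row into (cell name, cell value) pairs."""
    cells = [(f"{clustering}:", "")]  # row marker: empty column name, empty value
    for name in sorted(columns):      # cells are kept sorted by name
        cells.append((f"{clustering}:{name}", columns[name]))
    return cells


cells = row_to_cells(
    "2f3feb0f-9c24-11e2-7f7f-7f7f7f7f7f7f",
    {"sender": "tom@example.com", "content": "Hello!"},  # hypothetical address
)
for name, value in cells:
    print(name, value)
```

Each non-key column of the INSERT becomes its own cell, and all of them share the clustering value as a prefix, which keeps a row's cells adjacent inside the partition.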

Page 11: Deep dive into CQL

Order of Clustering Keys

CREATE TABLE messages (
    conversation_id uuid,
    message_id timeuuid,
    content text,
    sender text,
    PRIMARY KEY (conversation_id, message_id)
) WITH CLUSTERING ORDER BY (message_id DESC);

* We need only the most recent N messages

* Storing messages in reverse TimeUUID order will speed up queries
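A small sketch of why descending storage helps the "most recent N" case. Time strings stand in for TimeUUIDs, and `latest_n` is a hypothetical helper.

```python
from itertools import islice

messages = ["07:01:00", "07:03:20", "07:04:52", "07:05:01", "07:06:32"]

ascending = sorted(messages)                 # default clustering order
descending = sorted(messages, reverse=True)  # CLUSTERING ORDER BY (message_id DESC)


def latest_n(stored_desc, n):
    """With descending storage, the newest n entries are simply the first n."""
    return list(islice(stored_desc, n))


newest = latest_n(descending, 2)
print(newest)
# The ascending layout holds the same entries at the far end, in reverse.
print(ascending[-2:][::-1])
```

With the descending layout the read starts at the beginning of the partition and stops after N entries.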

Page 12: Deep dive into CQL

Static Columns

CREATE TABLE messages (
    conversation_id uuid,
    conversation_owner text STATIC,
    message_id timeuuid,
    content text,
    sender text,
    PRIMARY KEY (conversation_id, message_id)
);

* Let’s add conversation owner (admin)

* Owner is related to conversation (Partition Key) not message (Clustering Key)

Page 13: Deep dive into CQL

Static Columns

UPDATE messages
SET conversation_owner = '[email protected]'
WHERE conversation_id = 04d580b0-9412-3a00-93d1-46196ee79a66;

* The same UPDATE on a non-static column will fail: a regular column also needs the clustering key (message_id) in the WHERE clause

Page 14: Deep dive into CQL

Static Columns on Disk

04d580b0-9412-3a00-93d1-46196ee79a66

:null:conversation_owner [email protected]

dbcd9d0f-9c23-11e2-7f7f-7f7f7f7f7f7f:

dbcd9d0f-9c23-11e2-7f7f-7f7f7f7f7f7f:content Hi!

dbcd9d0f-9c23-11e2-7f7f-7f7f7f7f7f7f:sender [email protected]

2f3feb0f-9c24-11e2-7f7f-7f7f7f7f7f7f:

2f3feb0f-9c24-11e2-7f7f-7f7f7f7f7f7f:content Hello!

2f3feb0f-9c24-11e2-7f7f-7f7f7f7f7f7f:sender [email protected]

...

Static Column
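The layout above can be sketched as a function that emits the static cell once, ahead of the per-message cells. The `:null:` prefix follows the slide's notation; `partition_cells` and the owner address are hypothetical.

```python
def partition_cells(static_columns, rows):
    """Lay out one partition: static cells first, then each row's cells.

    rows is a list of (clustering value, columns) pairs already in clustering order.
    """
    cells = [(f":null:{name}", value) for name, value in sorted(static_columns.items())]
    for clustering, columns in rows:
        cells.append((f"{clustering}:", ""))  # row marker
        for name, value in sorted(columns.items()):
            cells.append((f"{clustering}:{name}", value))
    return cells


layout = partition_cells(
    {"conversation_owner": "owner@example.com"},  # hypothetical address
    [
        ("dbcd9d0f-9c23-11e2-7f7f-7f7f7f7f7f7f", {"content": "Hi!"}),
        ("2f3feb0f-9c24-11e2-7f7f-7f7f7f7f7f7f", {"content": "Hello!"}),
    ],
)
for name, value in layout:
    print(name, value)
```

However many messages the conversation holds, the owner is stored in a single cell that every row of the partition shares.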

Page 15: Deep dive into CQL

Collections: Set

CREATE TABLE messages (
    conversation_id uuid,
    conversation_owner text STATIC,
    message_id timeuuid,
    content text,
    sender text,
    recipients set<text>,
    PRIMARY KEY (conversation_id, message_id)
);

* We want to keep the recipients of each message

* The list of recipients may vary as people join and leave the conversation

Page 16: Deep dive into CQL

Collections: Set

UPDATE messages
SET recipients = {'[email protected]', '[email protected]'}
WHERE conversation_id = 04d580b0-9412-3a00-93d1-46196ee79a66
  AND message_id = dbcd9d0f-9c23-11e2-7f7f-7f7f7f7f7f7f;

Page 17: Deep dive into CQL

Set on Disk

04d580b0-9412-3a00-93d1-46196ee79a66

:null:conversation_owner [email protected]

dbcd9d0f-9c23-11e2-7f7f-7f7f7f7f7f7f:

dbcd9d0f-9c23-11e2-7f7f-7f7f7f7f7f7f:content Hi!

dbcd9d0f-9c23-11e2-7f7f-7f7f7f7f7f7f:sender [email protected]

dbcd9d0f-9c23-11e2-7f7f-7f7f7f7f7f7f:recipients:[email protected]

dbcd9d0f-9c23-11e2-7f7f-7f7f7f7f7f7f:recipients:[email protected]

2f3feb0f-9c24-11e2-7f7f-7f7f7f7f7f7f:

...

Set
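A sketch of the set layout shown above: each element becomes part of a cell name and the cell value stays empty. The helper names and the e-mail addresses are hypothetical, and the string notation follows the slide rather than the real encoding.

```python
def set_to_cells(clustering, column, elements):
    """One cell per element: the element is part of the cell name, the value stays empty."""
    return sorted((f"{clustering}:{column}:{element}", "") for element in set(elements))


def add_element(cells, clustering, column, element):
    """Adding an element writes one cell; an element already present maps to the same cell."""
    return sorted(set(cells) | {(f"{clustering}:{column}:{element}", "")})


MESSAGE = "dbcd9d0f-9c23-11e2-7f7f-7f7f7f7f7f7f"
cells = set_to_cells(MESSAGE, "recipients", ["tom@example.com", "ann@example.com"])
cells = add_element(cells, MESSAGE, "recipients", "tom@example.com")  # duplicate, no new cell
cells = add_element(cells, MESSAGE, "recipients", "bob@example.com")
for name, _ in cells:
    print(name)
```

Uniqueness and ordering of the set fall out of the cell names: two writes of the same element address the same cell, and cells are kept sorted by name.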

Page 18: Deep dive into CQL

Collections: Map

CREATE TABLE messages (
    conversation_id uuid,
    conversation_owner text STATIC,
    message_id timeuuid,
    content text,
    sender text,
    recipients set<text>,
    attachments map<text,text>,
    PRIMARY KEY (conversation_id, message_id)
);

* Let’s add attachments to each message

* Each attachment has a name and a location (URI)

Page 19: Deep dive into CQL

Collections: Map

UPDATE messages
SET attachments = {'picture.png':'http://cdn.exmpl.com/1234.png',
                   'audio.wav':'http://cdn.exmpl.com/5678.wav'}
WHERE conversation_id = 04d580b0-9412-3a00-93d1-46196ee79a66
  AND message_id = dbcd9d0f-9c23-11e2-7f7f-7f7f7f7f7f7f;

Page 20: Deep dive into CQL

Map on Disk

04d580b0-9412-3a00-93d1-46196ee79a66

:null:conversation_owner [email protected]

dbcd9d0f-9c23-11e2-7f7f-7f7f7f7f7f7f:

dbcd9d0f-9c23-11e2-7f7f-7f7f7f7f7f7f:content Hi!

dbcd9d0f-9c23-11e2-7f7f-7f7f7f7f7f7f:sender [email protected]

dbcd9d0f-9c23-11e2-7f7f-7f7f7f7f7f7f:recipients:[email protected]

dbcd9d0f-9c23-11e2-7f7f-7f7f7f7f7f7f:recipients:[email protected]

dbcd9d0f-9c23-11e2-7f7f-7f7f7f7f7f7f:attachments:picture.png http://cdn.exmpl.com/1234.png

dbcd9d0f-9c23-11e2-7f7f-7f7f7f7f7f7f:attachments:audio.wav http://cdn.exmpl.com/5678.wav

2f3feb0f-9c24-11e2-7f7f-7f7f7f7f7f7f:

...

Map cell: Map Name : Key, then Value
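The map layout can be sketched the same way: the map key joins the cell name and the map value becomes the cell value. `map_to_cells` and the replacement URI are hypothetical; the notation follows the slide.

```python
def map_to_cells(clustering, column, mapping):
    """One cell per entry: the map key joins the cell name, the map value is the cell value."""
    return {f"{clustering}:{column}:{key}": value for key, value in mapping.items()}


MESSAGE = "dbcd9d0f-9c23-11e2-7f7f-7f7f7f7f7f7f"
cells = map_to_cells(
    MESSAGE,
    "attachments",
    {
        "picture.png": "http://cdn.exmpl.com/1234.png",
        "audio.wav": "http://cdn.exmpl.com/5678.wav",
    },
)

# Replacing one entry touches exactly one cell (hypothetical new location).
cells[f"{MESSAGE}:attachments:audio.wav"] = "http://cdn.exmpl.com/9999.wav"

for name in sorted(cells):
    print(name, cells[name])
```

Because every entry is its own cell, a single key can be written or overwritten without reading the rest of the map.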

Page 21: Deep dive into CQL

Collections: List

CREATE TABLE messages (
    conversation_id uuid,
    conversation_owner text STATIC,
    message_id timeuuid,
    content text,
    sender text,
    recipients set<text>,
    attachments map<text,text>,
    seen_by list<text>,
    PRIMARY KEY (conversation_id, message_id)
);

* We want to know which participants have seen a message, and preserve the order in which they saw it

Page 22: Deep dive into CQL

Collections: List

UPDATE messages
SET seen_by = ['[email protected]', '[email protected]']
WHERE conversation_id = 04d580b0-9412-3a00-93d1-46196ee79a66
  AND message_id = dbcd9d0f-9c23-11e2-7f7f-7f7f7f7f7f7f;

Page 23: Deep dive into CQL

List on Disk

04d580b0-9412-3a00-93d1-46196ee79a66

:null:conversation_owner [email protected]

dbcd9d0f-9c23-11e2-7f7f-7f7f7f7f7f7f:

dbcd9d0f-9c23-11e2-7f7f-7f7f7f7f7f7f:content Hi!

dbcd9d0f-9c23-11e2-7f7f-7f7f7f7f7f7f:sender [email protected]

dbcd9d0f-9c23-11e2-7f7f-7f7f7f7f7f7f:recipients:[email protected]

dbcd9d0f-9c23-11e2-7f7f-7f7f7f7f7f7f:recipients:[email protected]

dbcd9d0f-...-7f7f-7f7f7f7f7f7f:seen_by:26017c10-f487-11e2-801f-df9895e5d0f8 [email protected]

dbcd9d0f-...-7f7f-7f7f7f7f7f7f:seen_by:26017c11-f487-11e2-801f-df9895e5d0f8 [email protected]

2f3feb0f-9c24-11e2-7f7f-7f7f7f7f7f7f:

...

List cell: List Name : Element ID (TimeUUID), then Value
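How TimeUUID element IDs preserve list order can be sketched as follows. The helpers `element_id` and `append` are hypothetical; the clock sequence and node values are copied from the element IDs shown on the slide so that the generated IDs match its notation, and the viewer addresses are made up.

```python
import uuid


def element_id(timestamp_100ns):
    """Build a version 1 UUID for a timestamp given in 100 ns units."""
    time_low = timestamp_100ns & 0xFFFFFFFF
    time_mid = (timestamp_100ns >> 32) & 0xFFFF
    time_hi_version = ((timestamp_100ns >> 48) & 0x0FFF) | 0x1000  # version 1
    # Clock sequence (0x80, 0x1F) and node are fixed values taken from the slide's IDs.
    return uuid.UUID(fields=(time_low, time_mid, time_hi_version, 0x80, 0x1F, 0xDF9895E5D0F8))


def append(cells, column, value, timestamp_100ns):
    """Store the value under a time-based element ID and keep cells in ID time order."""
    cells.append(((column, element_id(timestamp_100ns)), value))
    cells.sort(key=lambda cell: cell[0][1].time)


BASE = uuid.UUID("26017c10-f487-11e2-801f-df9895e5d0f8").time
seen_by = []
append(seen_by, "seen_by", "second@example.com", BASE + 1)  # hypothetical viewers
append(seen_by, "seen_by", "first@example.com", BASE)
for (column, ident), value in seen_by:
    print(f"{column}:{ident}", value)
```

The list order is the order of the element IDs, not the order in which writes happen to arrive, and identical values can appear more than once because each gets its own ID.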

Page 24: Deep dive into CQL

User Defined Types (UDT)

CREATE TABLE messages (
    conversation_id uuid,
    conversation_owner text STATIC,
    message_id timeuuid,
    content text,
    sender text,
    recipients set<text>,
    seen_by list<text>,
    attachments map<text, frozen<attachment>>,
    PRIMARY KEY (conversation_id, message_id)
);

* New in Cassandra 2.1

* Let’s add more attributes to attachments

CREATE TYPE attachment (
    size int,
    mime text,
    uri text
);

Page 25: Deep dive into CQL

User Defined Types

UPDATE messages
SET attachments = attachments + {
    'picture.png': { size: 10240, mime: 'image/png', uri: 'http://cdn.exmpl.com/1234.png' }
}
WHERE conversation_id = 04d580b0-9412-3a00-93d1-46196ee79a66
  AND message_id = dbcd9d0f-9c23-11e2-7f7f-7f7f7f7f7f7f;

Page 26: Deep dive into CQL

UDT on Disk

04d580b0-9412-3a00-93d1-46196ee79a66

:null:conversation_owner [email protected]

dbcd9d0f-9c23-11e2-7f7f-7f7f7f7f7f7f:

dbcd9d0f-9c23-11e2-7f7f-7f7f7f7f7f7f:content Hi!

dbcd9d0f-9c23-11e2-7f7f-7f7f7f7f7f7f:sender [email protected]

dbcd9d0f-...-7f7f7f7f7f7f:recipients:[email protected]

dbcd9d0f-...-7f7f7f7f7f7f:recipients:[email protected]

dbcd9d0f-...-7f7f7f7f7f7f:attachments:picture.png 10240:'image/png':'http://cdn.exmpl.com/1234.png'

2f3feb0f-9c24-11e2-7f7f-7f7f7f7f7f7f:

...

Map cell: Map Key, then UDT Value
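A sketch of the point made by the layout above: the whole UDT is stored as one value under the map key, with its fields in declaration order. The `Attachment` class and `udt_value` helper are hypothetical, and the colon-separated text only mimics the slide's notation; the real encoding is binary.

```python
from typing import NamedTuple


class Attachment(NamedTuple):
    """Mirrors CREATE TYPE attachment (size int, mime text, uri text)."""
    size: int
    mime: str
    uri: str


def udt_value(attachment):
    """Render all fields, in declaration order, as one cell value (slide notation)."""
    return ":".join(repr(field) for field in attachment)


picture = Attachment(size=10240, mime="image/png", uri="http://cdn.exmpl.com/1234.png")
cell_name = "dbcd9d0f-9c23-11e2-7f7f-7f7f7f7f7f7f:attachments:picture.png"
cell = (cell_name, udt_value(picture))
print(*cell)
```

Since the fields share one cell, changing a single field in this layout means rewriting the whole value.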

Page 27: Deep dive into CQL

Secondary Indexes

CREATE TABLE messages (
    conversation_id uuid,
    conversation_owner text STATIC,
    message_id timeuuid,
    content text,
    sender text,
    recipients set<text>,
    seen_by list<text>,
    attachments map<text,text>,
    PRIMARY KEY (conversation_id, message_id)
);

* What if we want to look up messages by sender?

CREATE INDEX sender_idx ON messages(sender);

Page 28: Deep dive into CQL

Secondary Indexes

Page 29: Deep dive into CQL

Secondary Indexes Internally

sender_idx {
    "[email protected]" {
        54bbfd0f-9c02-11e2-7f7f-7f7f7f7f7f7f : null,
        df04610f-9c02-11e2-7f7f-7f7f7f7f7f7f : null
    },
    "[email protected]" {
        a82e4b0f-9c02-11e2-7f7f-7f7f7f7f7f7f : null
    }
}

* Each node keeps a reverse index for its local data only
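The "local data only" point can be sketched with a toy model in which each node indexes just the rows it stores, so a lookup by sender has to consult the nodes and merge their answers. The `Node` class, `find_by_sender`, and the sender addresses are hypothetical; the message IDs are the ones from the structure above.

```python
from collections import defaultdict


class Node:
    """Holds some rows plus a reverse index over the sender of those rows only."""

    def __init__(self):
        self.rows = {}                      # message_id -> sender
        self.sender_idx = defaultdict(set)  # sender -> message_ids stored on this node

    def write(self, message_id, sender):
        previous = self.rows.get(message_id)
        if previous is not None:
            self.sender_idx[previous].discard(message_id)  # drop the stale index entry
        self.rows[message_id] = sender
        self.sender_idx[sender].add(message_id)


def find_by_sender(nodes, sender):
    """Ask each node's local index and merge the answers."""
    found = set()
    for node in nodes:
        found |= node.sender_idx.get(sender, set())
    return found


node_a, node_b = Node(), Node()
node_a.write("54bbfd0f-9c02-11e2-7f7f-7f7f7f7f7f7f", "ann@example.com")
node_b.write("df04610f-9c02-11e2-7f7f-7f7f7f7f7f7f", "ann@example.com")
node_b.write("a82e4b0f-9c02-11e2-7f7f-7f7f7f7f7f7f", "tom@example.com")
print(sorted(find_by_sender([node_a, node_b], "ann@example.com")))
```

No single node can answer the query alone in this model, which is why an index lookup without a partition key fans out across the cluster.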

Page 30: Deep dive into CQL

Indexes on Collections

CREATE TABLE messages (
    conversation_id uuid,
    conversation_owner text STATIC,
    message_id timeuuid,
    content text,
    sender text,
    recipients set<text>,
    seen_by list<text>,
    attachments map<text,text>,
    PRIMARY KEY (conversation_id, message_id)
);

* New in Cassandra 2.1

CREATE INDEX recipients_idx ON messages(recipients);
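An index on a collection can be pictured as indexing every element separately, which is what lets a query of the form `WHERE recipients CONTAINS '...'` find the matching messages. The sketch below is an illustration under that assumption; `build_recipients_idx` and the addresses are hypothetical.

```python
from collections import defaultdict


def build_recipients_idx(rows):
    """Map each set element to the message ids whose recipients set contains it."""
    index = defaultdict(set)
    for message_id, recipients in rows.items():
        for recipient in recipients:
            index[recipient].add(message_id)
    return index


rows = {
    "dbcd9d0f-9c23-11e2-7f7f-7f7f7f7f7f7f": {"ann@example.com", "tom@example.com"},
    "2f3feb0f-9c24-11e2-7f7f-7f7f7f7f7f7f": {"tom@example.com"},
}
recipients_idx = build_recipients_idx(rows)
print(sorted(recipients_idx["tom@example.com"]))
```

A message with several recipients appears under each of them, so the index grows with the total number of elements rather than the number of rows.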

Page 31: Deep dive into CQL

Indexes on Collections

Page 32: Deep dive into CQL

Way more information

* 5-minute interviews

* Use cases

* Free training!

www.planetcassandra.org

Page 33: Deep dive into CQL

Questions?