cassandra 3 new features 2016
@doanduyhai
New Cassandra 3 Features
DuyHai DOAN
Apache Cassandra Evangelist

Who Am I?
Duy Hai DOAN, Apache Cassandra Evangelist
• talks, meetups, confs …
• open-source projects (Achilles, Apache Zeppelin ...)
• OSS Cassandra point of contact
☞ [email protected] ☞ @doanduyhai

DataStax
• Founded in April 2010
• We contribute a lot to Apache Cassandra™
• 400+ customers (25 of the Fortune 100), 400+ employees
• Headquarters in San Francisco Bay area
• EU headquarters in London, offices in France and Germany
• DataStax Enterprise = OSS Cassandra + extra features

Agenda
• Materialized Views
• User Defined Functions (UDF) and Aggregates (UDA)
• JSON Syntax
• New SASI full text search index

Materialized Views (MV)
• Why?
• Gotchas

Why Materialized Views?
• Relieve the pain of manual denormalization

CREATE TABLE user(id int PRIMARY KEY, country text, …);
CREATE TABLE user_by_country(country text, id int, …, PRIMARY KEY(country, id));

Materialized View In Action
CREATE TABLE user_by_country (country text, id int, firstname text, lastname text, PRIMARY KEY(country, id));

CREATE MATERIALIZED VIEW user_by_country AS
  SELECT country, id, firstname, lastname
  FROM user
  WHERE country IS NOT NULL AND id IS NOT NULL
  PRIMARY KEY(country, id);

Materialized Views Demo

Materialized View Performance
• Write performance
  • slower than normal write
  • local lock + read-before-write cost (but paid only once for all views)
  • for each base table update, worst case: mv_count x 2 (DELETE + INSERT) extra mutations for the views
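
The worst-case write cost above can be illustrated with a toy in-memory model. This is only a sketch under stated assumptions: `apply_base_write`, the dict-based `base` and `views`, and the one-key-column-per-view layout are hypothetical and say nothing about how Cassandra implements views internally.

```python
def apply_base_write(base, views, row_id, new_row):
    """Write new_row to the base table and propagate the change to each view.

    base:  dict mapping id -> row dict (toy base table)
    views: dict mapping a view's key column -> dict of (key value, id) -> row
    Returns the number of extra mutations generated for the views.
    """
    old_row = base.get(row_id)  # read-before-write, paid once for all views
    base[row_id] = new_row
    extra_mutations = 0
    for key_column, view in views.items():
        if old_row is not None:
            old_key = (old_row.get(key_column), row_id)
            key_changed = old_row.get(key_column) != new_row.get(key_column)
            if old_key in view and key_changed:
                del view[old_key]  # DELETE the stale view row
                extra_mutations += 1
        if new_row.get(key_column) is not None:  # mirrors "WHERE ... IS NOT NULL"
            view[(new_row[key_column], row_id)] = new_row  # INSERT the new view row
            extra_mutations += 1
    return extra_mutations


base = {}
views = {"country": {}, "lastname": {}}  # mv_count = 2
first = apply_base_write(base, views, 1, {"country": "FR", "lastname": "DOAN"})
second = apply_base_write(base, views, 1, {"country": "US", "lastname": "DOE"})
print(first, second)  # 2 4
```

The second write changes the key column of both views, so it reaches the worst case of mv_count x 2 extra mutations.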

Materialized View Performance
• Write performance vs manual denormalization
  • MV better because no client-server network traffic for read-before-write
  • MV better because less network traffic for multiple views (client-side BATCH)
• Makes developer life easier → priceless

Materialized View Performance
• Read performance vs secondary index
  • MV better because single node read (secondary index can hit many nodes)
  • MV better because single read path (secondary index = read index + read data)
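
The "single node read" point can be pictured with a toy cluster model. This is a rough sketch, not a benchmark: `owner` is a made-up deterministic partitioner standing in for token hashing, and the two read functions are hypothetical simplifications that ignore replication and the coordinator's real query strategy.

```python
def owner(partition_key, node_count):
    """Toy partitioner: deterministic stand-in for Cassandra's token hashing."""
    return sum(str(partition_key).encode()) % node_count


def read_via_secondary_index(users, country, node_count):
    """Base rows are spread by id; each node only indexes its own rows."""
    nodes = [[] for _ in range(node_count)]
    for user in users:
        nodes[owner(user["id"], node_count)].append(user)
    contacted = 0
    found = []
    for local_rows in nodes:  # any node may hold matching rows
        contacted += 1
        found.extend(u for u in local_rows if u["country"] == country)
    return sorted(u["id"] for u in found), contacted


def read_via_view(users, country, node_count):
    """View rows are partitioned by country, so one node holds the whole answer."""
    nodes = [[] for _ in range(node_count)]
    for user in users:
        nodes[owner(user["country"], node_count)].append(user)
    target = owner(country, node_count)
    found = [u for u in nodes[target] if u["country"] == country]
    return sorted(u["id"] for u in found), 1


users = [{"id": i, "country": "FR" if i % 2 else "US"} for i in range(1, 7)]
print(read_via_secondary_index(users, "FR", 4))  # ([1, 3, 5], 4)
print(read_via_view(users, "FR", 4))             # ([1, 3, 5], 1)
```

Both reads return the same rows; only the number of nodes contacted differs.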

Materialized Views Consistency
• Consistency level
  • CL honoured for base table, ONE for MV + local batchlog
• Weaker consistency guarantees for MV than for base table

Q & A

User Defined Functions (UDF)
• Why?
• UDAs
• Gotchas

Rationale
• Push computation server-side
  • save network bandwidth (1000 nodes!)
  • simplify client-side code
  • provide standard & useful functions (sum, avg …)
  • accelerate analytics use-cases (pre-aggregation for Spark)

How to create a UDF?
CREATE [OR REPLACE] FUNCTION [IF NOT EXISTS]
[keyspace.]functionName (param1 type1, param2 type2, …)
CALLED ON NULL INPUT | RETURNS NULL ON NULL INPUT
RETURNS returnType
LANGUAGE language
AS $$
  // source code here
$$;

• param1 type1, param2 type2, …: param name to refer to in the code, type = Cassandra type
• CALLED ON NULL INPUT: always called, null-check mandatory in code
• RETURNS NULL ON NULL INPUT: if any input is null, function execution is skipped and null is returned
• Cassandra types
  • primitives (boolean, int, …)
  • collections (list, set, map)
  • tuples
  • UDT
• JVM supported languages
  • Java, Scala
  • Javascript (slow)
  • Groovy, Jython, JRuby
  • Clojure (JSR 223 impl issue)
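
The two null-handling modes can be mimicked in a few lines of Python. This is a sketch of the semantics only, assuming a hypothetical `make_udf` wrapper and a hypothetical `safe_max` body; real UDF bodies are written in the declared LANGUAGE and run inside Cassandra.

```python
def make_udf(body, called_on_null_input):
    """Wrap a function body with one of the two null-handling modes."""
    def udf(*args):
        # RETURNS NULL ON NULL INPUT: skip the body as soon as one argument is null
        if not called_on_null_input and any(arg is None for arg in args):
            return None
        # CALLED ON NULL INPUT: the body always runs and must check for nulls itself
        return body(*args)
    return udf


def safe_max(a, b):
    """Body written for CALLED ON NULL INPUT: explicit null checks."""
    if a is None:
        return b
    if b is None:
        return a
    return max(a, b)


strict_max = make_udf(max, called_on_null_input=False)
lenient_max = make_udf(safe_max, called_on_null_input=True)

print(strict_max(1, None))   # None
print(lenient_max(1, None))  # 1
```

With RETURNS NULL ON NULL INPUT the body can stay naive; with CALLED ON NULL INPUT the author decides what a null argument means.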

UDF Demo

User Defined Aggregates (UDA)
• Real use-case for UDF
• Aggregation server-side → huge network bandwidth saving
• Provide similar behavior for Group By, Sum, Avg etc …

How to create a UDA?
CREATE [OR REPLACE] AGGREGATE [IF NOT EXISTS]
[keyspace.]aggregateName(type1, type2, …)
SFUNC accumulatorFunction
STYPE stateType
[FINALFUNC finalFunction]
INITCOND initCond;

• (type1, type2, …): only type, no param name
• STYPE: state type
• INITCOND: initial state value
• SFUNC: accumulator function. Signature: accumulatorFunction(stateType, type1, type2, …) RETURNS stateType
• FINALFUNC: optional final function. Signature: finalFunction(stateType)
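
How SFUNC, STYPE, INITCOND and FINALFUNC fit together can be sketched as a plain fold. The Python below is an illustration only: `run_aggregate`, `avg_state` and `avg_final` are hypothetical names, and the (count, total) tuple is just one possible state type for an average.

```python
def run_aggregate(rows, sfunc, initcond, finalfunc=None):
    """Fold rows into a state, the way a UDA is evaluated on the coordinator."""
    state = initcond                    # INITCOND: initial state
    for row in rows:
        state = sfunc(state, *row)      # SFUNC(state, col1, col2, ...) returns the new state
    if finalfunc is not None:
        return finalfunc(state)         # optional FINALFUNC(state)
    return state


def avg_state(state, value):
    """Accumulator: state is a (count, total) pair."""
    count, total = state
    return (count + 1, total + value)


def avg_final(state):
    """Final function: turn the (count, total) pair into the average."""
    count, total = state
    return total / count if count else None


rows = [(10,), (20,), (30,)]
print(run_aggregate(rows, avg_state, (0, 0), avg_final))  # 20.0
print(run_aggregate(rows, avg_state, (0, 0)))             # (3, 60)
```

Without a final function the raw state is returned, which is why FINALFUNC is optional in the syntax above.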

UDA Demo

Gotchas
• UDA in Cassandra is not distributed!
• Do not execute UDA on a large number of rows (10⁶ for ex.)
  • single fat partition
  • multiple partitions
  • full table scan
• → Increase client-side timeout
  • default Java driver timeout = 12 secs

Cassandra UDA or Apache Spark?

 Consistency Level | Single/Multiple Partition(s) | Recommended Approach
-------------------+------------------------------+------------------------------------------------
 ONE               | Single partition             | UDA with token-aware driver because node local
 ONE               | Multiple partitions          | Apache Spark because distributed reads
 > ONE             | Single partition             | UDA because data-locality lost with Spark
 > ONE             | Multiple partitions          | Apache Spark definitely

Q & A

JSON Syntax
• Why?
• Example

Why JSON?
• JSON is a very good exchange format
• But a terrible schema …
• How to have best of both worlds?
  • use Cassandra schema
  • convert rows to JSON format

JSON syntax for INSERT/UPDATE/DELETE
CREATE TABLE users (id text PRIMARY KEY, age int, state text);

INSERT INTO users JSON '{"id": "user123", "age": 42, "state": "TX"}';
INSERT INTO users(id, age, state) VALUES('me', fromJson('20'), 'CA');
UPDATE users SET age = fromJson('25') WHERE id = fromJson('"me"');
DELETE FROM users WHERE id = fromJson('"me"');

JSON syntax for SELECT
> SELECT JSON * FROM users WHERE id = 'me';
 [json]
----------------------------------------
 {"id": "me", "age": 25, "state": "CA"}

> SELECT JSON age,state FROM users WHERE id = 'me';
 [json]
----------------------------------------
 {"age": 25, "state": "CA"}

> SELECT age, toJson(state) FROM users WHERE id = 'me';
 age | system.tojson(state)
-----+----------------------
  25 | "CA"

JSON Syntax Demo
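
For readers without a cluster at hand, the row-to-JSON mapping shown in the SELECT examples can be approximated with Python's standard `json` module. This is only an analogy under stated assumptions: `select_json`, `to_json` and `from_json` are hypothetical helpers that mimic the shape of the output, not the server-side functions themselves.

```python
import json


def select_json(row, columns=None):
    """Mimic SELECT JSON: one JSON document per row, optionally a column subset."""
    selected = row if columns is None else {name: row[name] for name in columns}
    return json.dumps(selected)


def to_json(value):
    """Mimic toJson(column): encode a single value as JSON text."""
    return json.dumps(value)


def from_json(text):
    """Mimic fromJson('...'): decode JSON text into a typed value."""
    return json.loads(text)


row = {"id": "me", "age": 25, "state": "CA"}
print(select_json(row))                     # {"id": "me", "age": 25, "state": "CA"}
print(select_json(row, ["age", "state"]))   # {"age": 25, "state": "CA"}
print(to_json(row["state"]))                # "CA"
print(from_json('"me"'), from_json("25"))   # me 25
```

Note that a JSON string keeps its double quotes, which is why the text value is written as fromJson('"me"') in the UPDATE and DELETE statements.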

Q & A

SASI index, the search is over!
• Why?
• How?
• Who?
• Demo!
• When?

Why SASI?
• Searching (and full text search) was always a pain point for Cassandra
  • limited search predicates (=, <=, <, > and >= only)
  • limited scope (only on primary key columns)
• Existing secondary index performance is poor
  • reversed-index
  • use Cassandra itself as index storage …
  • limited predicate ( = ). Inequality predicate = full cluster scan 😱

How?
• New index structure = suffix trees
• Extended predicates (=, inequalities, LIKE %)
• Full text search (tokenizers, stop-words, stemming …)
• Query Planner to optimize AND predicates
• NO, we don’t use Apache Lucene
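
Why indexing suffixes makes LIKE '%term%' cheap can be shown with a toy example. The sketch below swaps the suffix tree for a simpler sorted list of suffixes searched with `bisect`; it is not SASI's on-disk structure, and `build_suffix_index` and `like_contains` are hypothetical names.

```python
from bisect import bisect_left


def build_suffix_index(rows):
    """rows: dict of row id -> text. Returns a sorted list of (suffix, row id)."""
    entries = []
    for row_id, text in rows.items():
        lowered = text.lower()
        for start in range(len(lowered)):
            entries.append((lowered[start:], row_id))
    entries.sort()
    return entries


def like_contains(index, term):
    """Emulate LIKE '%term%': a substring match is a suffix that starts with term."""
    term = term.lower()
    pos = bisect_left(index, (term,))  # first entry whose suffix is >= term
    matches = set()
    while pos < len(index) and index[pos][0].startswith(term):
        matches.add(index[pos][1])
        pos += 1
    return matches


index = build_suffix_index({1: "Cassandra", 2: "Spark", 3: "Sandra"})
print(sorted(like_contains(index, "sand")))  # [1, 3]
print(sorted(like_contains(index, "ark")))   # [2]
```

Because all suffixes sharing a prefix sit next to each other once sorted, a substring query becomes one binary search plus a short scan instead of a scan of every row.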

Who?
• Open source contribution by an engineering team from …

SASI Demo

When?
• Cassandra 3.5
• Later
  • support for OR clause: (aaa OR bbb) AND (ccc OR ddd)
  • index on collections (Set, List, Map)

Comparison
SASI vs Solr/ElasticSearch?
• Cassandra is not a search engine !!! (database = durability)
• always slower because 2 passes (SASI index read + original Cassandra data)
• no scoring
• no ordering (ORDER BY)
• no grouping (GROUP BY) → Apache Spark for analytics

Still, SASI covers 80% of search use-cases and people are happy!

Q & A