tdc2017 | são paulo - trilha nosql how we figured out we had a sre team at - cassandra: por que o...

Globalcode – Open4education Cassandra Why will the relational thinking destroy your system performance? Paulo Ricardo R. Almeida OCJP, 2 years working with Cassandra

Upload: tdc-globalcode

Post on 16-Mar-2018


TRANSCRIPT

Page 1: TDC2017 | São Paulo - NoSQL Track How we figured out we had a SRE team at -   Cassandra: why will relational thinking destroy your system's performance?

Globalcode – Open4education

Cassandra

Why will relational thinking destroy your system performance?

Paulo Ricardo R. Almeida

OCJP, 2 years working with Cassandra

Page 2:

Agenda

• What is Cassandra?
• Why Cassandra?
• Quick Review
• The Problem to tackle
• Relational solution and its drawbacks
• Addressing the problem with C* thinking
• Goals and Non-Goals
• Query First
• The Cassandra solution
• Benchmarking
• Additional resources

Page 3:

What is Cassandra?

Distributed
Fault Tolerant
Linear Scalability

Page 4:

Pick two of: Availability, Consistency, Partition Tolerance

Page 5:

Why Cassandra?

● Distributed Cache (Netflix EVCache)
● Real time Processing
● Data doesn't fit in one place
● High write workload
  ○ Time series data
  ○ Log storage/analysis
● Geographical distribution
● Performance

Page 6:

Quick Review

[Diagram labels: Client, Coordinator, Keyspace with RF = 3, token(partitionKey) using Partitioner]
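The labels in this diagram (client, coordinator, token(partitionKey), RF = 3) can be sketched as a toy token ring in Python. This is an illustration under stated assumptions, not Cassandra's implementation: MD5 stands in for the real partitioner's hash, each node gets a single token, and replicas are simply the next RF nodes clockwise on the ring. The node names and the `build_ring` and `replicas_for` helpers are hypothetical.

```python
import bisect
import hashlib

RING_SIZE = 2 ** 32  # toy token space; real partitioners use a much larger range


def token(partition_key: str) -> int:
    """Hash the partition key to a position on the ring (MD5 is a stand-in hash)."""
    digest = hashlib.md5(partition_key.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") % RING_SIZE


def build_ring(nodes):
    """Give each node one token and return (token, node) pairs sorted by token."""
    return sorted((token(node), node) for node in nodes)


def replicas_for(partition_key, ring, rf=3):
    """Walk clockwise from the key's token and collect the next `rf` nodes."""
    tokens = [t for t, _ in ring]
    start = bisect.bisect_left(tokens, token(partition_key))
    count = min(rf, len(ring))
    return [ring[(start + i) % len(ring)][1] for i in range(count)]


# Hypothetical five-node cluster; any node could act as coordinator and run this lookup.
ring = build_ring(["node-a", "node-b", "node-c", "node-d", "node-e"])
owners = replicas_for("PR", ring, rf=3)
print(owners)
```

The point of the sketch is that the partition key alone decides which nodes hold a row, so a query that supplies the partition key can be routed straight to its replicas.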

Page 7:

Page 8:

https://pandaforme.gitbooks.io/introduction-to-cassandra

Page 9:

The problem

Store TDC information (speakers and talks)

Page 10:

Relational Way

Page 11:

Relational Way

SELECT * FROM speaker
WHERE state = 'PR';

Page 12:

Relational Way

SELECT * FROM talk
INNER JOIN speaker
    ON speaker.id = talk.speaker_id_a
    OR speaker.id = talk.speaker_id_b;

Page 13:

Putting into Cassandra

Page 14:

Page 15:

Why?

SELECT * FROM speaker WHERE state = 'PR'
ALLOW FILTERING;

Retrieves all rows and filters them one by one
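A minimal Python sketch of what "read everything, then filter" costs. The `speakers` dictionary is a hypothetical in-memory stand-in for a table keyed by speaker id, and its rows are made-up sample data; the `filter_scan` helper is also hypothetical and only counts how many rows are touched.

```python
# Hypothetical stand-in for a table whose partition key is the speaker id.
speakers = {
    "id-1": {"name": "Speaker A", "state": "PR"},
    "id-2": {"name": "Speaker B", "state": "SP"},
    "id-3": {"name": "Speaker C", "state": "PR"},
    "id-4": {"name": "Speaker D", "state": "RJ"},
}


def filter_scan(table, state):
    """Emulate the filtering behavior: read every row, keep the ones that match."""
    rows_read = 0
    matches = []
    for row in table.values():
        rows_read += 1  # every row is read, whether or not it matches
        if row["state"] == state:
            matches.append(row["name"])
    return matches, rows_read


matches, rows_read = filter_scan(speakers, "PR")
print(matches, rows_read)
```

The number of rows read grows with the size of the table, not with the size of the answer, which is why this pattern degrades as data grows.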

Page 16:

Secondary index to improve read performance

Page 17:

Secondary Index

CREATE INDEX speaker_name ON speaker (name);

Page 18:

Secondary Index

0312 Paulo Almeida
2315 Gessica Dutra
...
0003 Jefferson….

5 lookups, 1 response = poor performance

SELECT * FROM tdc.speaker
WHERE name = 'Paulo Almeida';
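The "5 lookups, 1 response" cost can be sketched in Python, assuming the slide refers to a five-node cluster in which every node keeps an index over only the rows it stores. The way the index entries are spread across nodes below is an assumption, and `query_by_name` is a hypothetical helper.

```python
# Five hypothetical nodes, each with a local index: name -> ids of rows stored on that node.
local_indexes = [
    {"Paulo Almeida": ["0312"], "Gessica Dutra": ["2315"]},
    {"Jefferson": ["0003"]},
    {},
    {},
    {},
]


def query_by_name(indexes, name):
    """Ask every node's local index, because any node might hold a matching row."""
    lookups = 0
    ids = []
    for index in indexes:
        lookups += 1  # one lookup per node, even when the node has no match
        ids.extend(index.get(name, []))
    return ids, lookups


ids, lookups = query_by_name(local_indexes, "Paulo Almeida")
print(ids, lookups)
```

In this model the number of lookups equals the number of nodes, so adding nodes makes the same query more expensive rather than cheaper.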

Page 19:

Limitations

● No JOIN, LIKE… support
● No constraints
● No transaction (ACID)
● No consistency (Strong)
● Secondary Index doesn't scale well

Page 20:

Goals and Non-Goals

● Non-Goals
  ○ Minimize number of writes
  ○ Minimize data duplication
● Goals
  ○ Spread data evenly around the cluster
  ○ Minimize the number of partitions read

Page 21:

Query first!

● Know your queries first and model around them
  ○ Don't model around relations
  ○ Don't model around objects
  ○ Try to create a CF (column family, i.e. table) where you can satisfy the query by reading one partition

Page 22:

Queries

● Speaker by state
● Speaker by name
● Talks by speaker name
● Talks by keywords
● Talks by track

Page 23:

Cassandra Way

Page 24:

Cassandra Way

Page 25:

Data Modeling

CREATE KEYSPACE tdc WITH REPLICATION = {
    'class': 'SimpleStrategy',
    'replication_factor': 3
};

Page 26:

Data Modeling

CREATE TABLE tdc.speaker (
    id uuid,
    name text,
    email text,
    bio text,
    city text,
    state text,
    PRIMARY KEY (id)
);

Callouts: "tdc" is the keyspace; "id" is the partition key.

Page 27:

Data Modeling

CREATE TABLE tdc.speaker_by_name (
    speaker_id uuid,
    name text,
    PRIMARY KEY (name, speaker_id)
);

SELECT speaker_id FROM tdc.speaker_by_name WHERE name = $name;

SELECT * FROM tdc.speaker WHERE id = $speaker_id;

Better approach: requires 2 lookups in every case

Partition key: name

Page 28:

Data Modeling

SELECT * FROM tdc.speaker_by_state
WHERE state = 'PR';

CREATE TABLE tdc.speaker_by_state (
    speaker_id uuid,
    name text,
    state text,
    bio text,
    PRIMARY KEY (state, name, speaker_id)
) WITH CLUSTERING ORDER BY (name ASC, speaker_id ASC);

Partition key: state

Clustering key: name, speaker_id

Page 29:

Data Modeling

CREATE TABLE tdc.speaker_by_state (
    speaker_id uuid,
    name text,
    city text,
    state text,
    bio text,
    PRIMARY KEY (state, city, name, speaker_id)
) WITH CLUSTERING ORDER BY (city ASC, name ASC, speaker_id ASC);

Partition key: state. Clustering key: city, name, speaker_id.
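The roles of the partition key and the clustering key can be illustrated with a small Python model. It is a sketch only: a dictionary keyed by `state` plays the partition, and each partition keeps its rows sorted by (city, name, speaker_id), mirroring the primary key on this slide. The sample rows are made up, and `insert` and `speakers_by_state` are hypothetical helpers.

```python
import bisect
from collections import defaultdict

# state (partition key) -> rows kept sorted by (city, name, speaker_id) (clustering key)
partitions = defaultdict(list)


def insert(state, city, name, speaker_id):
    """Place the row in its partition at the position given by the clustering order."""
    bisect.insort(partitions[state], (city, name, speaker_id))


def speakers_by_state(state):
    """Serve the query from a single partition; rows come back already ordered."""
    return list(partitions.get(state, []))


# Made-up sample rows, inserted out of order on purpose.
insert("PR", "Maringa", "Speaker C", "id-3")
insert("SP", "Campinas", "Speaker B", "id-2")
insert("PR", "Curitiba", "Speaker D", "id-4")
insert("PR", "Curitiba", "Speaker A", "id-1")

for row in speakers_by_state("PR"):
    print(row)
```

The sorting work is paid at write time, so the "speakers by state" query reads one partition and needs no filtering or sorting afterwards.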

Page 30:

Data Modeling

BEGIN BATCH
    INSERT INTO speaker (id, …) VALUES (...);
    INSERT INTO speaker_by_name (name, ...) VALUES (...);
    INSERT INTO speaker_by_state (state, ...) VALUES (...);
APPLY BATCH;

Page 31:

Data Modeling

CREATE TABLE tdc.talk_by_speaker_name (
    talk_id uuid,
    talk_name text,
    speaker_name text,
    date timestamp,
    PRIMARY KEY (speaker_name, date, talk_id)
) WITH CLUSTERING ORDER BY (date DESC, talk_id ASC);

Page 32:

Data Modeling

CREATE INDEX talk_by_track_name ON tdc.talk (track_name);

SELECT * FROM tdc.talk WHERE track_name = 'Test';

Page 33:

Netflix benchmark
https://academy.datastax.com/planet-cassandra/nosql-performance-benchmarks

Page 34:

Netflix benchmark
https://academy.datastax.com/planet-cassandra/nosql-performance-benchmarks

Operations/sec

Nodes   Cassandra     Couchbase    HBase        MongoDB
1       18,925.59     1,554.14     973.85       1,278.81
2       35,539.69     2,985.28     3,430.59     1,441.32
4       64,911.39     3,755.28     6,451.95     1,801.06
8       117,237.91    10,138.80    6,262.95     2,195.92
16      210,237.90    11,761.31    15,268.93    1,230.96
32      348,682.44    21,375.02    58,463.15    2,335.14
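One way to read these numbers against the "linear scalability" claim from earlier in the talk is to compute the speedup over a single node and divide it by the node count. The short Python sketch below does that arithmetic for the Cassandra column only; the `speedup` and `efficiency` helpers are hypothetical names, and the input values are copied from the table.

```python
# Cassandra operations/sec by cluster size, copied from the table above.
cassandra_ops = {
    1: 18925.59,
    2: 35539.69,
    4: 64911.39,
    8: 117237.91,
    16: 210237.90,
    32: 348682.44,
}

baseline = cassandra_ops[1]


def speedup(nodes):
    """Throughput relative to the single-node result."""
    return cassandra_ops[nodes] / baseline


def efficiency(nodes):
    """Fraction of ideal linear scaling achieved (1.0 would be perfectly linear)."""
    return speedup(nodes) / nodes


for nodes in sorted(cassandra_ops):
    print(f"{nodes:>2} nodes: speedup {speedup(nodes):6.2f}x, efficiency {efficiency(nodes):.2f}")
```

A value of 1.0 in the efficiency column would mean throughput doubled exactly each time the cluster doubled.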

Page 35:

Page 36:

Resources

● Cassandra: The Definitive Guide
● DataStax self-paced Training
  ○ https://academy.datastax.com/resources/ds220-data-modeling
● DataStax CQL Reference
  ○ http://docs.datastax.com/en/cql/3.1/cql/cql_reference/cqlReferenceTOC.html
● Cassandra-demo-middle:
  ○ https://github.com/rochapaulo/cassandra-demo-middle
● Presentation source code:
  ○ https://github.com/rochapaulo/TDC-SP-2017-Cassandra
● YouTube videos

Page 37:

Thank you!

/rochapaulo

/pauloricardoalmeida

[email protected]