Cassandra advanced data modeling

Lyon Cassandra Users, Romain Hardouin, 2016-05-31

Page 1: Cassandra advanced data modeling

Cassandra
Advanced data modeling

Lyon Cassandra Users
Romain Hardouin
2016-05-31

Page 2: Cassandra advanced data modeling

$ who
Romain

$ pgrep -fl work
Cassandra architect

$ whatis teads
No.1 Video Advertising Marketplace

Page 3: Cassandra advanced data modeling

I. Introduction

II. Key principles

III. Chebotko methodology

IV. Time handling

Data modeling

Page 4: Cassandra advanced data modeling

I. Introduction

Page 5: Cassandra advanced data modeling

Theory

Page 6: Cassandra advanced data modeling

Theory

Chebotko diagrams

E&R

Page 7: Cassandra advanced data modeling

II. Key principles

Page 8: Cassandra advanced data modeling

Key Principles

Know your data

Know your queries

Denormalize
● Nest Data
● Duplicate Data

Page 9: Cassandra advanced data modeling

Know your domain

Conceptual Data Model, E&R
● Entities
● Relationships
● Attributes / Keys
● Cardinalities
● Constraints

Know your data

Page 10: Cassandra advanced data modeling

Entities & relationships

Know your data

Page 11: Cassandra advanced data modeling

Query-driven model

Application Workflow

New needs?
● New queries => new tables
● Alter table possible?

Know your data

Know your queries

Page 12: Cassandra advanced data modeling

Goal: one partition per query

Anti-patterns:
● Table scan
● Client-side joins (a.k.a. multi-table queries)
● Secondary index
● ALLOW FILTERING
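
The goal above can be sketched in CQL. The table `videos_by_user`, its columns, and the UUID literal are hypothetical and are not part of the talk; they only illustrate the difference between a single-partition read and a query that needs ALLOW FILTERING.

```cql
-- Hypothetical table, for illustration only.
CREATE TABLE videos_by_user (
    user_id  uuid,
    added_at timestamp,
    video_id uuid,
    title    text,
    PRIMARY KEY ((user_id), added_at, video_id)
) WITH CLUSTERING ORDER BY (added_at DESC, video_id ASC);

-- One partition per query: the partition key is fully specified,
-- so only the replicas owning this partition are read.
SELECT video_id, title
FROM videos_by_user
WHERE user_id = 5b6962dd-3f90-4c93-8f61-eabfa4a803e2
LIMIT 10;

-- Anti-pattern: no partition key, so every partition has to be scanned.
-- Cassandra rejects this query unless ALLOW FILTERING is added.
SELECT user_id, video_id, title
FROM videos_by_user
WHERE added_at > '2016-05-01 00:00:00+0000'
ALLOW FILTERING;
```

If the second query is a real need, the query-driven approach would typically answer it with a new table keyed for that access path rather than with ALLOW FILTERING.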

Know your data

Know your queries

Page 13: Cassandra advanced data modeling

Nest Data

Clustering columns

Collection columns

UDT columns

Know your data

Denormalize
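
As a minimal sketch of nesting with collection and UDT columns, assuming a hypothetical `users` table and `address` type that are not from the talk:

```cql
-- Hypothetical type and table, for illustration only.
CREATE TYPE address (
    street text,
    city   text,
    zip    text
);

CREATE TABLE users (
    user_id uuid,
    name    text,
    emails  set<text>,        -- collection column: several values nested in one row
    home    frozen<address>,  -- UDT column: a structured value nested in one row
    PRIMARY KEY ((user_id))
);

INSERT INTO users (user_id, name, emails, home)
VALUES (
    5b6962dd-3f90-4c93-8f61-eabfa4a803e2,
    'Alice',
    {'alice@example.com', 'alice@example.org'},
    {street: '1 Main St', city: 'Lyon', zip: '69001'}
);
```

A frozen UDT is written and read as a whole, which is why it is used here; it works the same way across Cassandra versions that support UDTs.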

Page 14: Cassandra advanced data modeling

Nest Data

Know your data

Denormalize

CREATE TABLE actors_by_video (
    video_id uuid,
    actor_name text,
    character_name text,
    PRIMARY KEY ((video_id), actor_name, character_name)
);
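
A sketch of how the nested rows of `actors_by_video` could be read. The UUID and the actor name are made-up values used only for illustration.

```cql
-- All actors and characters of one video: a single-partition read.
-- Rows come back sorted by actor_name, then character_name.
SELECT actor_name, character_name
FROM actors_by_video
WHERE video_id = 8a657475-0c3a-4f0c-9a1d-2f3b6f0a9d11;

-- Clustering columns are restricted from left to right,
-- so one actor's characters can be read within the same partition.
SELECT character_name
FROM actors_by_video
WHERE video_id = 8a657475-0c3a-4f0c-9a1d-2f3b6f0a9d11
  AND actor_name = 'Some Actor';
```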

Page 15: Cassandra advanced data modeling

Duplicate data

Writes are cheap: « Joins on write »

Duplication occurs at different levels:
● Table: Materialized views
● Partition
● Rows

Know your data

Denormalize
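
Table-level duplication with a materialized view (available since Cassandra 3.0) might look like the sketch below. It assumes the `actors_by_video` table shown earlier in the deck already exists; the view name `videos_by_actor` and the actor name are hypothetical.

```cql
-- Hypothetical view: the same data, partitioned by actor instead of by video.
-- Every primary key column of the base table must appear in the view's key.
CREATE MATERIALIZED VIEW videos_by_actor AS
    SELECT video_id, actor_name, character_name
    FROM actors_by_video
    WHERE actor_name IS NOT NULL
      AND video_id IS NOT NULL
      AND character_name IS NOT NULL
    PRIMARY KEY ((actor_name), video_id, character_name);

-- The "join on write" is done by Cassandra; the read stays single-partition.
SELECT video_id, character_name
FROM videos_by_actor
WHERE actor_name = 'Some Actor';
```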

Page 16: Cassandra advanced data modeling

III. Chebotko Methodology

Page 17: Cassandra advanced data modeling

From « A Big Data Modeling Methodology for Apache Cassandra »

Application workflow

Query workflow Query list

Page 18: Cassandra advanced data modeling

From « A Big Data Modeling Methodology for Apache Cassandra »

Chebotko Diagram

Page 19: Cassandra advanced data modeling

actors_by_video

video_id        uuid  K
actor_name      text  C↑
character_name  text  C↑

CREATE TABLE actors_by_video (
    video_id uuid,
    actor_name text,
    character_name text,
    PRIMARY KEY ((video_id), actor_name, character_name)
);

Chebotko Diagram

Page 20: Cassandra advanced data modeling

Chebotko mapping rules

MR 1: Entities & Relationships

MR 2: Equality search attributes (=)

MR 3: Inequality search attributes (<, >)

MR 4: Ordering attributes (↑↓)

MR 5: Key attributes, uniqueness
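
As a hedged illustration of how the mapping rules can drive a table design, take a hypothetical query: "readings of one sensor after a given time, newest first". The table `readings_by_sensor`, its columns, and the assumption that a sensor emits at most one reading per timestamp are not from the talk.

```cql
-- Hypothetical table derived from the query, for illustration only.
CREATE TABLE readings_by_sensor (
    sensor_id uuid,       -- MR 2: equality search attribute -> partition key
    read_at   timestamp,  -- MR 3 and MR 4: inequality search and ordering -> clustering column
    reading   double,     -- remaining attribute -> regular column
    -- MR 5: (sensor_id, read_at) is assumed to identify a reading uniquely
    PRIMARY KEY ((sensor_id), read_at)
) WITH CLUSTERING ORDER BY (read_at DESC);

-- The query the table was designed for: one partition, already sorted.
SELECT read_at, reading
FROM readings_by_sensor
WHERE sensor_id = 8a657475-0c3a-4f0c-9a1d-2f3b6f0a9d11
  AND read_at > '2016-05-01 00:00:00+0000';
```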

Page 21: Cassandra advanced data modeling

From « A Big Data Modeling Methodology for Apache Cassandra »

Chebotko mapping rules

Page 22: Cassandra advanced data modeling

Demo

Internet of Things

Kashlev Data Modeler

Page 23: Cassandra advanced data modeling

IV. Time handling

- Tombstones

- TTL

- UPSERTs

Page 24: Cassandra advanced data modeling

IV. Time handling

- Tombstones

- TTL

- UPSERTs

Page 25: Cassandra advanced data modeling

Eventual consistency

No instant deletes

Deletes are writes

SSTables are immutable files

Writes are spread across many files
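
A small sketch of "deletes are writes", using the `actors_by_video` table from earlier in the deck. The UUID and names are made-up values.

```cql
-- Each DELETE is stored as a new write (a tombstone) in a new SSTable;
-- the older SSTables holding the data are not modified.

-- Row tombstone: marks a single row as deleted.
DELETE FROM actors_by_video
WHERE video_id = 8a657475-0c3a-4f0c-9a1d-2f3b6f0a9d11
  AND actor_name = 'Some Actor'
  AND character_name = 'Some Character';

-- Partition tombstone: marks the whole partition as deleted.
DELETE FROM actors_by_video
WHERE video_id = 8a657475-0c3a-4f0c-9a1d-2f3b6f0a9d11;

-- Tombstones can only be purged by compaction once gc_grace_seconds
-- (a table option, 10 days by default) has elapsed.
```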

Page 26: Cassandra advanced data modeling
Page 27: Cassandra advanced data modeling

Goal: avoid reading too many* tombstones

...

...

* see tombstone_warn_threshold & tombstone_failure_threshold

Page 28: Cassandra advanced data modeling

IV. Time handling

- Tombstones

- TTL

- UPSERTs

Page 29: Cassandra advanced data modeling

TTLs

Data must be designed to be TTL'ed

tombstones
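
One possible way to design data "to be TTL'ed" is sketched below; it is an assumption about what the slide means, not the author's schema. The table `readings_by_sensor_day`, the daily bucket, and the 7-day retention are hypothetical.

```cql
-- Hypothetical table: one partition per sensor and day,
-- every row expires 7 days (604800 s) after it is written.
CREATE TABLE readings_by_sensor_day (
    sensor_id uuid,
    day       date,
    read_at   timestamp,
    reading   double,
    PRIMARY KEY ((sensor_id, day), read_at)
) WITH default_time_to_live = 604800;

-- A per-write TTL overrides the table default (here: 1 day).
INSERT INTO readings_by_sensor_day (sensor_id, day, read_at, reading)
VALUES (8a657475-0c3a-4f0c-9a1d-2f3b6f0a9d11, '2016-05-31',
        '2016-05-31 10:00:00+0000', 21.5)
USING TTL 86400;

-- Remaining lifetime, in seconds, of the stored value.
SELECT read_at, TTL(reading)
FROM readings_by_sensor_day
WHERE sensor_id = 8a657475-0c3a-4f0c-9a1d-2f3b6f0a9d11
  AND day = '2016-05-31';
```

Expired cells are treated as tombstones, so bucketing by time keeps expired data together in old partitions that recent queries no longer touch.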

Page 30: Cassandra advanced data modeling

Why?

What do we add?

Page 31: Cassandra advanced data modeling

TIME dimension

Page 32: Cassandra advanced data modeling

IV. Time handling

- Tombstones

- TTL

- UPSERTs

Page 33: Cassandra advanced data modeling

UPSERTs

Same INSERT over and over again?

UPSERTs hide this behavior

What if… one day you want to add time
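
One reading of this slide, sketched with hypothetical tables `user_status` and `user_status_history` (not from the talk): without time in the key, repeated INSERTs collapse into one row; once time is part of the key, each of them becomes a new row.

```cql
-- Without time in the key, an INSERT on an existing key is an UPSERT.
CREATE TABLE user_status (
    user_id uuid,
    status  text,
    PRIMARY KEY ((user_id))
);

INSERT INTO user_status (user_id, status)
VALUES (5b6962dd-3f90-4c93-8f61-eabfa4a803e2, 'online');
-- Same key again: silently overwrites the row, still one row in the table.
INSERT INTO user_status (user_id, status)
VALUES (5b6962dd-3f90-4c93-8f61-eabfa4a803e2, 'online');

-- With time as a clustering column, the same two writes are two rows.
CREATE TABLE user_status_history (
    user_id    uuid,
    changed_at timestamp,
    status     text,
    PRIMARY KEY ((user_id), changed_at)
) WITH CLUSTERING ORDER BY (changed_at DESC);

INSERT INTO user_status_history (user_id, changed_at, status)
VALUES (5b6962dd-3f90-4c93-8f61-eabfa4a803e2, '2016-05-31 10:00:00+0000', 'online');
INSERT INTO user_status_history (user_id, changed_at, status)
VALUES (5b6962dd-3f90-4c93-8f61-eabfa4a803e2, '2016-05-31 10:00:05+0000', 'online');
```

A write pattern that was harmless under UPSERT semantics can therefore grow partitions quickly after time is added, which is worth checking before changing the key.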

Page 34: Cassandra advanced data modeling
Page 35: Cassandra advanced data modeling

Questions?

Page 36: Cassandra advanced data modeling

Resources

« A Big Data Modeling Methodology for Apache Cassandra »
- Artem Chebotko, Andrey Kashlev & Shiyong Lu
- www.cs.wayne.edu/andrey/papers/TR-BIGDATA-05-2015-CKL.pdf

KDM
- Andrey Kashlev
- kdm.dataview.org