Cassandra Advanced Data Modeling
Lyon Cassandra Users, Romain Hardouin, 2016-05-31
$ who
Romain
$ pgrep -fl work
Cassandra architect
$ whatis teads
No.1 Video Advertising Marketplace
I. Introduction
II. Key principles
III. Chebotko methodology
IV. Time handling
Data modeling
I. Introduction
Theory
Chebotko diagrams
E&R
II. Key principles
Key principles
● Know your data
● Know your queries
● Denormalize
  - Nest data
  - Duplicate data
Know your domain
Conceptual Data Model, E&R
● Entities
● Relationships
● Attributes / Keys
● Cardinalities
● Constraints
Entities & relationships
Query-driven model
Application Workflow
New needs?
● New queries => new tables
● Alter table possible?
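As a rough illustration of "new queries => new tables", here is a sketch built on the actors_by_video table that this deck uses as its example. The column credited_as and the table videos_by_actor are hypothetical names, and the script assumes a keyspace has already been selected with USE.

```cql
-- The deck's example table, repeated so the sketch stands alone.
CREATE TABLE IF NOT EXISTS actors_by_video (
  video_id uuid,
  actor_name text,
  character_name text,
  PRIMARY KEY ((video_id), actor_name, character_name)
);

-- Possible: add a regular (non-key) column to an existing table.
-- credited_as is a hypothetical column; this statement fails if run twice.
ALTER TABLE actors_by_video ADD credited_as text;

-- Not possible: changing the primary key of an existing table.
-- A new query "videos for a given actor" therefore gets its own table,
-- which the application must fill on every write.
CREATE TABLE IF NOT EXISTS videos_by_actor (
  actor_name text,
  video_id uuid,
  character_name text,
  PRIMARY KEY ((actor_name), video_id, character_name)
);
```

So ALTER TABLE covers new attributes, while a new access path generally means a new table.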
Goal: one partition per query
Anti-patterns:
● Table scan
● Client joins (a.k.a. multi-table)
● Secondary index
● ALLOW FILTERING
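The difference between a one-partition query and a scan can be sketched on the deck's actors_by_video table. The UUID and the actor name are placeholder values, and a keyspace is assumed to be selected.

```cql
-- The deck's example table, repeated so the sketch stands alone.
CREATE TABLE IF NOT EXISTS actors_by_video (
  video_id uuid,
  actor_name text,
  character_name text,
  PRIMARY KEY ((video_id), actor_name, character_name)
);

-- Goal: the partition key is fully specified, so one partition answers the query.
SELECT actor_name, character_name
FROM actors_by_video
WHERE video_id = 123e4567-e89b-12d3-a456-426614174000;

-- Anti-pattern: no partition key in the WHERE clause.
-- Cassandra rejects this unless ALLOW FILTERING is appended,
-- and then it has to scan partitions across the cluster.
SELECT video_id, character_name
FROM actors_by_video
WHERE actor_name = 'Some Actor'
ALLOW FILTERING;
```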
Nest Data
Clustering columns
Collection columns
UDT columns
CREATE TABLE actors_by_video (
  video_id uuid,
  actor_name text,
  character_name text,
  PRIMARY KEY ((video_id), actor_name, character_name)
);
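The table above nests actors under a video with clustering columns. The other two nesting options, collection columns and UDT columns, might look like the following sketch. The type video_encoding, the table videos and all values are hypothetical, and a keyspace is assumed to be selected.

```cql
-- Hypothetical user-defined type grouping related attributes.
CREATE TYPE IF NOT EXISTS video_encoding (
  codec text,
  width int,
  height int
);

-- Hypothetical table nesting data inside a single row.
CREATE TABLE IF NOT EXISTS videos (
  video_id uuid,
  title text,
  tags set<text>,                    -- collection column
  encoding frozen<video_encoding>,   -- UDT column
  PRIMARY KEY ((video_id))
);

INSERT INTO videos (video_id, title, tags, encoding)
VALUES (123e4567-e89b-12d3-a456-426614174000,
        'Some title',
        {'drama', 'classic'},
        {codec: 'h264', width: 1920, height: 1080});
```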
Duplicate data
Writes are cheap: « Joins on write »
Duplication occurs at different levels:
● Table: Materialized views
● Partition
● Rows
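Table-level duplication with a materialized view could be sketched as follows. This assumes Cassandra 3.0 or later with materialized views enabled; the view name videos_by_actor_view is hypothetical.

```cql
-- The deck's example table, repeated so the sketch stands alone.
CREATE TABLE IF NOT EXISTS actors_by_video (
  video_id uuid,
  actor_name text,
  character_name text,
  PRIMARY KEY ((video_id), actor_name, character_name)
);

-- Cassandra maintains the duplicate on every write to the base table.
-- Every primary key column of the view must be restricted with IS NOT NULL.
CREATE MATERIALIZED VIEW IF NOT EXISTS videos_by_actor_view AS
  SELECT video_id, actor_name, character_name
  FROM actors_by_video
  WHERE actor_name IS NOT NULL
    AND video_id IS NOT NULL
    AND character_name IS NOT NULL
  PRIMARY KEY ((actor_name), video_id, character_name);

-- The same data, now readable by actor from a single partition.
SELECT video_id, character_name
FROM videos_by_actor_view
WHERE actor_name = 'Some Actor';
```

The alternative is to write the duplicate yourself into a second table, which is the "join on write" done by the application.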
III. Chebotko Methodology
From « A Big Data Modeling Methodology for Apache Cassandra »
Application workflow
Query workflow Query list
Chebotko Diagram

actors_by_video
video_id        uuid  K
actor_name      text  C↑
character_name  text  C↑

CREATE TABLE actors_by_video (
  video_id uuid,
  actor_name text,
  character_name text,
  PRIMARY KEY ((video_id), actor_name, character_name)
);
Chebotko mapping rules
MR 1: Entities & relationships
MR 2: Equality search attributes (=)
MR 3: Inequality search attributes (<, >)
MR 4: Ordering attributes (↑↓)
MR 5: Key attributes, uniqueness
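One way to read the mapping rules is to apply them to a single query. The sketch below assumes a hypothetical query "readings of one sensor over a time range, newest first"; the table readings_by_sensor, its columns and all values are illustrative and not taken from the paper.

```cql
-- Hypothetical table derived from the query; assumes a keyspace is selected.
CREATE TABLE IF NOT EXISTS readings_by_sensor (
  sensor_id uuid,            -- MR 2: equality search attribute -> partition key (K)
  reading_time timestamp,    -- MR 3 and MR 4: inequality search and ordering -> clustering column (C↓)
  temperature double,        -- MR 1: remaining attribute of the entity
  PRIMARY KEY ((sensor_id), reading_time)   -- MR 5: assumed to identify a reading uniquely
) WITH CLUSTERING ORDER BY (reading_time DESC);

-- The query the table was designed for: equality on K, range on C.
SELECT reading_time, temperature
FROM readings_by_sensor
WHERE sensor_id = 123e4567-e89b-12d3-a456-426614174000
  AND reading_time >= '2016-05-01 00:00:00+0000'
  AND reading_time <  '2016-06-01 00:00:00+0000';
```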
Demo: Internet of Things
Kashlev Data Modeler
IV. Time handling
- Tombstones
- TTL
- UPSERTs
Tombstones
Eventual consistency
No instant deletes
Deletes are writes
SSTables are immutable files
Writes are spread across many files
Goal: avoid reading too many* tombstones
* see tombstone_warn_threshold & tombstone_failure_threshold
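A minimal sketch of how tombstones appear and why reads pay for them. The table messages_by_queue and all values are hypothetical, and a keyspace is assumed to be selected.

```cql
-- Hypothetical table, not from the talk.
CREATE TABLE IF NOT EXISTS messages_by_queue (
  queue_id uuid,
  message_id timeuuid,
  body text,
  PRIMARY KEY ((queue_id), message_id)
);

-- A delete is a write: it records a tombstone instead of removing data in place.
DELETE FROM messages_by_queue
WHERE queue_id = 123e4567-e89b-12d3-a456-426614174000
  AND message_id = 50554d6e-29bb-11e5-b345-feff819cdc9f;

-- Writing an explicit null also records a tombstone for that column.
INSERT INTO messages_by_queue (queue_id, message_id, body)
VALUES (123e4567-e89b-12d3-a456-426614174000, now(), null);

-- This read has to skip every tombstone still present in the partition
-- before it can return live rows.
SELECT message_id, body
FROM messages_by_queue
WHERE queue_id = 123e4567-e89b-12d3-a456-426614174000
LIMIT 10;
```

Tombstones stay on disk until gc_grace_seconds has elapsed and compaction removes them, so a partition with frequent deletes can cross the warn and failure thresholds.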
TTLs
Data must be designed to be TTL'ed
Why? Tombstones
What do we add? A TIME dimension
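One possible reading of "add a time dimension" is to put a time bucket in the partition key, so that expired data sits in old partitions that current reads never touch. The sketch below follows that assumption; the table events_by_day, the one-day bucket and the TTL values are illustrative, and the date type assumes Cassandra 2.2 or later.

```cql
-- Hypothetical table; assumes a keyspace is selected.
CREATE TABLE IF NOT EXISTS events_by_day (
  source_id uuid,
  day date,                 -- time dimension: one partition per source and day
  event_time timestamp,
  payload text,
  PRIMARY KEY ((source_id, day), event_time)
) WITH default_time_to_live = 604800;   -- 7 days, in seconds

-- A per-write TTL overrides the table default.
INSERT INTO events_by_day (source_id, day, event_time, payload)
VALUES (123e4567-e89b-12d3-a456-426614174000,
        '2016-05-31',
        '2016-05-31 10:00:00+0000',
        'started')
USING TTL 86400;

-- Reads target the current day's partition; TTL() shows the seconds left.
SELECT event_time, payload, TTL(payload)
FROM events_by_day
WHERE source_id = 123e4567-e89b-12d3-a456-426614174000
  AND day = '2016-05-31';
```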
UPSERTs
Same INSERT over and over again?
UPSERTs hide this behavior
What if, one day, you want to add time?
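The upsert behavior, and what adding time later would involve, can be sketched as follows. Both tables and all values are hypothetical, and a keyspace is assumed to be selected.

```cql
-- Hypothetical table keyed by device only.
CREATE TABLE IF NOT EXISTS status_by_device (
  device_id uuid,
  status text,
  PRIMARY KEY ((device_id))
);

INSERT INTO status_by_device (device_id, status)
VALUES (123e4567-e89b-12d3-a456-426614174000, 'online');

-- Same primary key: no error, the previous value is silently overwritten.
INSERT INTO status_by_device (device_id, status)
VALUES (123e4567-e89b-12d3-a456-426614174000, 'offline');

-- Only one row exists for the device; the history is gone.
SELECT status
FROM status_by_device
WHERE device_id = 123e4567-e89b-12d3-a456-426614174000;

-- Keeping history means time becomes part of the primary key,
-- which cannot be altered in place, so it takes a new table.
CREATE TABLE IF NOT EXISTS status_history_by_device (
  device_id uuid,
  changed_at timestamp,
  status text,
  PRIMARY KEY ((device_id), changed_at)
) WITH CLUSTERING ORDER BY (changed_at DESC);
```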
Questions?
Resources
« A Big Data Modeling Methodology for Apache Cassandra »
- Artem Chebotko, Andrey Kashlev & Shiyong Lu
- www.cs.wayne.edu/andrey/papers/TR-BIGDATA-05-2015-CKL.pdf
KDM
- Andrey Kashlev
- kdm.dataview.org