Introduction to Data Modeling in Cassandra
BarCamp Kerala 2015
Who am I?
- Software Engineer at RapidValue
- Backend Engineer at Gudly
- Author of Flask-CQLAlchemy
What is Cassandra?
- A massively and linearly scalable NoSQL database
- High throughput, with nearly linear scaling for the right use cases
- Row/column oriented, with an SQL-like query language (CQL)
Brief History
- Created at Facebook by Avinash Lakshman (one of the authors of Amazon's Dynamo) and Prashant Malik
- Released as open source in 2008
- Became an Apache top-level project in 2010
Best Use Cases
- Playlists and collections
- Sensor data
- Personalization and recommendation engines
- Messaging
- Fraud detection
Notable features
- No single point of failure
- A clearly defined table schema in a NoSQL environment
- Near-linear horizontal scaling across commodity servers
- No joins
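Two of these features, no single point of failure and near-linear horizontal scaling, follow from every node being an equal peer that owns a slice of a hash ring. The Python sketch below is a simplified model built on stated assumptions. It uses one token per node (no virtual nodes) and places replicas on the next nodes clockwise, which is similar in spirit to SimpleStrategy. For the hash it uses MD5, which Cassandra's older RandomPartitioner used; the default Murmur3Partitioner is not reproduced here. `TokenRing`, `token` and the node names are hypothetical.

```python
import bisect
import hashlib


def token(key: str) -> int:
    """Hash a key onto the ring (MD5, as the older RandomPartitioner did)."""
    return int.from_bytes(hashlib.md5(key.encode("utf-8")).digest(), "big")


class TokenRing:
    """Hypothetical, simplified ring: one token per node, no vnodes."""

    def __init__(self, nodes, replication_factor=3):
        if replication_factor > len(nodes):
            raise ValueError("replication factor exceeds node count")
        # Each node's position on the ring is the hash of its name.
        self.ring = sorted((token(n), n) for n in nodes)
        self.tokens = [t for t, _ in self.ring]
        self.rf = replication_factor

    def replicas(self, partition_key: str):
        """Return the nodes that store this partition.

        The first replica is the first node whose token is >= the key's
        token (wrapping around the ring); the others are the next nodes
        clockwise.
        """
        start = bisect.bisect_left(self.tokens, token(partition_key)) % len(self.ring)
        return [self.ring[(start + i) % len(self.ring)][1] for i in range(self.rf)]


if __name__ == "__main__":
    ring = TokenRing(["node-a", "node-b", "node-c", "node-d"], replication_factor=3)
    for key in ["user:1", "user:2", "user:3"]:
        print(key, "->", ring.replicas(key))
```

There is no master in this model. Losing one node therefore still leaves the other replicas of each partition reachable. Adding a node only changes ownership for the part of the ring next to its token.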
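Brewer's Conjecture, a.k.a. the "CAP Theorem"
- Consistency: all nodes see the same data at any given time
- Availability: every request receives a response, whether it succeeded or failed
- Partition tolerance: the system keeps operating even when network failures split the nodes into groups that cannot reach each other
Cassandra is an AP database: by default it favors availability and partition tolerance, with consistency tunable per request.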
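The AP label describes Cassandra's default trade-off rather than a fixed property, because each request can choose a consistency level such as ONE, QUORUM or ALL. Take a replication factor N. A read answered by R replicas is guaranteed to overlap a write acknowledged by W replicas whenever R + W > N, so the read can return that write. The functions below are hypothetical helpers that only illustrate this arithmetic; they are not part of any Cassandra driver API. QUORUM is computed as floor(N / 2) + 1, assuming a single data center.

```python
def quorum(replication_factor: int) -> int:
    """Replicas needed for QUORUM in one data center: floor(N / 2) + 1."""
    return replication_factor // 2 + 1


def reads_see_latest_write(n: int, r: int, w: int) -> bool:
    """True when every read replica set must overlap every write replica set.

    n: replication factor; r, w: replicas that must answer a read / a write.
    """
    if not (1 <= r <= n and 1 <= w <= n):
        raise ValueError("r and w must be between 1 and n")
    return r + w > n


if __name__ == "__main__":
    n = 3
    q = quorum(n)  # 2 of 3 replicas
    print("QUORUM/QUORUM:", reads_see_latest_write(n, q, q))  # 2 + 2 > 3
    print("ONE/ONE:", reads_see_latest_write(n, 1, 1))        # 1 + 1 <= 3
    print("ONE/ALL:", reads_see_latest_write(n, 1, n))        # 1 + 3 > 3
```

Using ONE for both reads and writes gives up this overlap in exchange for lower latency and higher availability, which is the AP end of the spectrum.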
RDBMS vs Cassandra: Querying
SQL for querying:
SELECT * FROM users WHERE name = 'John Doe';
CQL for querying:
SELECT * FROM users WHERE name = 'John Doe';
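The two queries read the same but behave differently. In an RDBMS any column can appear in the WHERE clause. In CQL, this query is efficient only when name is the partition key, because Cassandra locates data by hashing the partition key. Filtering on any other column means scanning many partitions. Cassandra rejects such a query by default; depending on the version, it must be backed by a secondary index or explicitly marked ALLOW FILTERING. The toy model below is a hypothetical illustration of that difference, using a plain dict of partitions. It does not use the Cassandra driver, and the sample rows are made up.

```python
from collections import defaultdict


class ToyTable:
    """Hypothetical model of a CQL table: partition key -> list of rows."""

    def __init__(self, partition_key: str):
        self.partition_key = partition_key
        self.partitions = defaultdict(list)

    def insert(self, row: dict) -> None:
        # Rows are grouped (and, in Cassandra, placed on nodes) by partition key.
        self.partitions[row[self.partition_key]].append(row)

    def select_where(self, column: str, value, allow_filtering: bool = False):
        """Return (rows, partitions_touched) for WHERE column = value."""
        if column == self.partition_key:
            # Direct lookup: only one partition is read.
            return list(self.partitions.get(value, [])), 1
        if not allow_filtering:
            # Mimics Cassandra refusing an unrestricted filter by default.
            raise ValueError(f"{column} is not the partition key; "
                             "this query would need ALLOW FILTERING")
        # Filtering: every partition has to be scanned.
        rows = [r for part in self.partitions.values() for r in part
                if r.get(column) == value]
        return rows, len(self.partitions)


if __name__ == "__main__":
    users = ToyTable(partition_key="name")
    users.insert({"name": "John Doe", "email": "john@example.com"})
    users.insert({"name": "Jane Roe", "email": "jane@example.com"})
    print(users.select_where("name", "John Doe"))
    print(users.select_where("email", "jane@example.com", allow_filtering=True))
```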
Data Modeling
- Collection and analysis of data requirements
- Identification of participating entities and relationships
- Identification of data access patterns
Data Modeling
- A particular way of organizing and structuring data
- Design and specification of a database schema
- Schema optimization and data indexing techniques
Products of Data Modeling
Conceptual data model
- A technology-independent, unified view of the data
- For example, an entity-relationship model or a dimensional model
Conceptual Data Model: Entity-Relationship Diagram
Products of Data Modeling
Logical data model
- Specific to Cassandra
- Column family (table) diagrams, also known as Chebotko diagrams
Modeling Guidelines
- Writes are cheap; reads are comparatively expensive
- Joins are not possible
- Duplication (denormalization) is good
- Indexing (secondary indexes) adds latency
- All data required to answer a query must be stored in a single column family (table)
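Taken together, "joins are not possible", "duplication is good" and "all data for a query in one column family" lead to a specific practice. The application writes the same data into one table per query, each keyed for that query, instead of joining at read time. The sketch below assumes two hypothetical queries, "find a user by ID" and "find a user by email", and models the two tables as dicts. In CQL each would be a separate table; the names `users_by_id` and `users_by_email` are chosen only for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class User:
    user_id: int
    email: str
    name: str


# One "table" per query, each keyed by the column that query looks up.
users_by_id: dict[int, User] = {}
users_by_email: dict[str, User] = {}


def save_user(user: User) -> None:
    """Write the same user to every query table; the duplication is deliberate.

    Writes are cheap, so paying for two writes here lets each read below be
    a single key lookup with no join.
    """
    users_by_id[user.user_id] = user
    users_by_email[user.email] = user


def find_by_id(user_id: int) -> Optional[User]:
    return users_by_id.get(user_id)


def find_by_email(email: str) -> Optional[User]:
    return users_by_email.get(email)


if __name__ == "__main__":
    save_user(User(1, "john@example.com", "John Doe"))
    print(find_by_email("john@example.com"))
```

A real application would also have to delete the old `users_by_email` row when a user's email changes. It might use a logged BATCH in CQL to keep the tables in step.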
Data Modeling Methodology
For each query:
1. Identify the subset of the conceptual data model that describes the query's data
2. Apply a suitable mapping pattern to that subset and the query
3. Use a Chebotko diagram to describe the result as a logical model
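The "apply a suitable mapping pattern" step can be made concrete with a simplified version of the mapping rules commonly used in query-driven Cassandra modeling:
- columns the query filters with equality become the partition key;
- columns it sorts or range-filters on become clustering columns;
- any remaining columns needed to make rows unique are appended as clustering columns.

The hypothetical `table_for_query` helper below applies only those rules to produce a CQL CREATE TABLE string, and it ignores clustering order direction. The example query ("employees in a department, ordered by last name") and the table name `emp_by_dept` are assumptions for illustration.

```python
def table_for_query(name, columns, equality, ordering=(), key=()):
    """Build a CQL CREATE TABLE statement for one query (hypothetical helper).

    columns:  {column_name: cql_type}
    equality: columns the query filters with '=' -> partition key
    ordering: columns the query sorts or range-filters on -> clustering columns
    key:      extra columns that make each row unique -> clustering columns
    """
    if not equality:
        raise ValueError("the query needs at least one equality column")
    clustering = list(ordering) + [
        c for c in key if c not in ordering and c not in equality
    ]
    partition = "(" + ", ".join(equality) + ")"
    primary = ", ".join([partition] + clustering)
    body = ",\n".join(f"    {col} {typ}" for col, typ in columns.items())
    return f"CREATE TABLE {name} (\n{body},\n    PRIMARY KEY ({primary})\n);"


if __name__ == "__main__":
    print(table_for_query(
        "emp_by_dept",
        {"deptID": "int", "last_name": "varchar",
         "empID": "int", "first_name": "varchar"},
        equality=["deptID"], ordering=["last_name"], key=["empID"],
    ))
```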
Products of Data Modeling
Physical data model
- Specific to Cassandra
- CQL definitions
Physical Data Model
CQL CREATE statement:

    CREATE TABLE emp (
        empID int,
        deptID int,
        first_name varchar,
        last_name varchar,
        PRIMARY KEY (empID, deptID)
    );
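In PRIMARY KEY (empID, deptID), the first column, empID, is the partition key; it decides which node stores a row. deptID is a clustering column, which orders rows inside a partition. All rows for one empID therefore live together, sorted by deptID. The sketch below models that layout in memory. The `layout` function and the sample rows are hypothetical, and overwriting a whole row is a simplification of Cassandra's column-level last-write-wins upsert.

```python
from collections import defaultdict


def layout(rows):
    """Group rows the way PRIMARY KEY (empID, deptID) would.

    Returns {empID: [rows sorted by deptID]}: empID is the partition key and
    deptID the clustering column.
    """
    partitions = defaultdict(dict)
    for row in rows:
        # Same (empID, deptID) primary key: the later row replaces the earlier
        # one, a simplified stand-in for Cassandra's upsert behavior.
        partitions[row["empID"]][row["deptID"]] = row
    return {
        emp_id: [by_dept[d] for d in sorted(by_dept)]
        for emp_id, by_dept in partitions.items()
    }


if __name__ == "__main__":
    rows = [
        {"empID": 1, "deptID": 20, "first_name": "Asha", "last_name": "Nair"},
        {"empID": 1, "deptID": 10, "first_name": "Asha", "last_name": "Nair"},
        {"empID": 2, "deptID": 10, "first_name": "Ravi", "last_name": "Menon"},
    ]
    for emp_id, partition in layout(rows).items():
        print(emp_id, [r["deptID"] for r in partition])
```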
RDBMS vs Cassandra
- Cassandra handles complex and simple data equally well
- All data required to answer a query must be stored in a single column family (table)
- The data modeling methodology is driven by the queries as well as the data
- Data duplication is considered normal
Cassandra in Production
- Netflix
- Spotify
- Twitter