Cassandra Summit 2014: Real Data Models of Silicon Valley
DESCRIPTION
A lot has changed since I gave one of these talks and man, has it been good. 2.0 brought us a lot of new CQL features and now with 2.1 we get even more! Let me show you some real-life data models and those new features taking developer productivity to an all-new high. User Defined Types, New Counters, Paging, Static Columns. Exciting new ways of making your app truly killer!
TRANSCRIPT
![Page 1: Cassandra Summit 2014: Real Data Models of Silicon Valley](https://reader033.vdocument.in/reader033/viewer/2022051817/548f494bb479597e6a8b503a/html5/thumbnails/1.jpg)
Real Data Models of Silicon Valley
Patrick McFadin, Chief Evangelist for Apache Cassandra
@PatrickMcFadin
![Page 2: Cassandra Summit 2014: Real Data Models of Silicon Valley](https://reader033.vdocument.in/reader033/viewer/2022051817/548f494bb479597e6a8b503a/html5/thumbnails/2.jpg)
It's been an epic year
![Page 3: Cassandra Summit 2014: Real Data Models of Silicon Valley](https://reader033.vdocument.in/reader033/viewer/2022051817/548f494bb479597e6a8b503a/html5/thumbnails/3.jpg)
I've had a ton of fun!
• Traveling the world talking to people like you!
Warsaw
Stockholm
Melbourne
New York
Vancouver
Dublin
![Page 4: Cassandra Summit 2014: Real Data Models of Silicon Valley](https://reader033.vdocument.in/reader033/viewer/2022051817/548f494bb479597e6a8b503a/html5/thumbnails/4.jpg)
What's new?
• 2.1 is out!
• Amazing changes for performance and stability
![Page 5: Cassandra Summit 2014: Real Data Models of Silicon Valley](https://reader033.vdocument.in/reader033/viewer/2022051817/548f494bb479597e6a8b503a/html5/thumbnails/5.jpg)
Where are we going?
• 3.0 is next. Just hold on…
![Page 6: Cassandra Summit 2014: Real Data Models of Silicon Valley](https://reader033.vdocument.in/reader033/viewer/2022051817/548f494bb479597e6a8b503a/html5/thumbnails/6.jpg)
KillrVideo.com
• 2012 Summit
• Complete example for data modeling
[Mockup: www.killrvideos.com video page with Video Title, Description, Username, Rating, Tags (Foo, Bar), Comments, Recommended videos, an "Upload New!" button, and Ads by Google]
*Cat drawing by goodrob13 on Flickr
![Page 7: Cassandra Summit 2014: Real Data Models of Silicon Valley](https://reader033.vdocument.in/reader033/viewer/2022051817/548f494bb479597e6a8b503a/html5/thumbnails/7.jpg)
It’s alive!!!
• Hosted on Azure
• Code on Github
![Page 8: Cassandra Summit 2014: Real Data Models of Silicon Valley](https://reader033.vdocument.in/reader033/viewer/2022051817/548f494bb479597e6a8b503a/html5/thumbnails/8.jpg)
Data Model - Revisited
• Add in some 2.1 data models
• Replace (or remove) some app code
• Become a part of Cassandra OSS download
![Page 9: Cassandra Summit 2014: Real Data Models of Silicon Valley](https://reader033.vdocument.in/reader033/viewer/2022051817/548f494bb479597e6a8b503a/html5/thumbnails/9.jpg)
User Defined Types
• Complex data in one place
• No multi-gets (multi-partitions)
• Nesting!

    CREATE TYPE address (
        street text,
        city text,
        zip_code int,
        country text,
        cross_streets set<text>
    );

![Page 10: Cassandra Summit 2014: Real Data Models of Silicon Valley](https://reader033.vdocument.in/reader033/viewer/2022051817/548f494bb479597e6a8b503a/html5/thumbnails/10.jpg)
Before

    CREATE TABLE videos (
        videoid uuid,
        userid uuid,
        name varchar,
        description varchar,
        location text,
        location_type int,
        preview_thumbnails map<text,text>,
        tags set<varchar>,
        added_date timestamp,
        PRIMARY KEY (videoid)
    );

    CREATE TABLE video_metadata (
        video_id uuid PRIMARY KEY,
        height int,
        width int,
        video_bit_rate set<text>,
        encoding text
    );

    -- Two reads per video; ? is a bind marker for the video's uuid
    SELECT * FROM videos WHERE videoid = ?;
    SELECT * FROM video_metadata WHERE video_id = ?;

Title: Introduction to Apache Cassandra
Description: A one-hour talk on everything you need to know about a totally amazing database.
Playback rate: 480 720
In-application join
![Page 11: Cassandra Summit 2014: Real Data Models of Silicon Valley](https://reader033.vdocument.in/reader033/viewer/2022051817/548f494bb479597e6a8b503a/html5/thumbnails/11.jpg)
After
• Now video_metadata is embedded in videos

    CREATE TYPE video_metadata (
        height int,
        width int,
        video_bit_rate set<text>,
        encoding text
    );

    CREATE TABLE videos (
        videoid uuid,
        userid uuid,
        name varchar,
        description varchar,
        location text,
        location_type int,
        preview_thumbnails map<text,text>,
        tags set<varchar>,
        metadata set<frozen<video_metadata>>,
        added_date timestamp,
        PRIMARY KEY (videoid)
    );

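The difference between the two designs can be sketched with plain Python dictionaries standing in for tables, where each dictionary lookup models one query. This is only a toy model, not driver code: the integer id `2`, the reading of 480 and 720 as height and width, and the function names are assumptions for illustration, and the single nested `metadata` value simplifies the slide's `set<frozen<video_metadata>>` column.

```python
# Toy in-memory stand-ins for the tables; each dict lookup models one query.
# "Before": the video and its metadata live in two separate tables.
videos = {2: {"name": "Introduction to Apache Cassandra"}}
video_metadata = {2: {"height": 480, "width": 720}}

# "After": metadata is nested inside the videos row, like a UDT column.
videos_with_udt = {
    2: {
        "name": "Introduction to Apache Cassandra",
        "metadata": {"height": 480, "width": 720},
    },
}


def load_before(video_id):
    """Two reads plus a client-side merge (the in-application join)."""
    row = dict(videos[video_id])                # read 1
    row["metadata"] = video_metadata[video_id]  # read 2, merged by the app
    return row, 2


def load_after(video_id):
    """One read returns the video together with its embedded metadata."""
    return videos_with_udt[video_id], 1


before_row, before_reads = load_before(2)
after_row, after_reads = load_after(2)
```

Both paths produce the same row; the embedded version needs one read instead of two and no merge code in the application.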
![Page 12: Cassandra Summit 2014: Real Data Models of Silicon Valley](https://reader033.vdocument.in/reader033/viewer/2022051817/548f494bb479597e6a8b503a/html5/thumbnails/12.jpg)
Wait! Frozen??
• Staying out of technical debt
• 3.0 UDTs will not have to be frozen
• Applicable to User Defined Types and Tuples (wait for
Do you want to build a schema? Do you want to store some JSON?
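A frozen value is stored as a single unit, so an individual field cannot be changed in place and the whole value has to be overwritten. A loose Python analogy, not Cassandra code, is a frozen dataclass. The class name mirrors the video_metadata type from the earlier slide; the bit rate and encoding values are made up for illustration.

```python
from dataclasses import dataclass, replace, FrozenInstanceError


@dataclass(frozen=True)
class VideoMetadata:
    """Python stand-in for the video_metadata user defined type."""
    height: int
    width: int
    video_bit_rate: frozenset  # stand-in for set<text>
    encoding: str


def set_height_in_place(value, height):
    """Try to change one field; returns False because the value is frozen."""
    try:
        value.height = height
        return True
    except FrozenInstanceError:
        return False


meta = VideoMetadata(480, 720, frozenset({"1000kbps"}), "mp4")

# Stand-in for a row with a set<frozen<video_metadata>> column.
row = {"videoid": 2, "metadata": {meta}}

# Updating means building a new value and swapping out the old one whole.
updated = replace(meta, height=1080)
row["metadata"] = (row["metadata"] - {meta}) | {updated}
```

The analogy only covers the "replace the whole value" behaviour; it says nothing about how Cassandra serializes the data.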
![Page 13: Cassandra Summit 2014: Real Data Models of Silicon Valley](https://reader033.vdocument.in/reader033/viewer/2022051817/548f494bb479597e6a8b503a/html5/thumbnails/13.jpg)
Let’s store some JSON

    {
      "productId": 2,
      "name": "Kitchen Table",
      "price": 249.99,
      "description": "Rectangular table with oak finish",
      "dimensions": {
        "units": "inches",
        "length": 50.0,
        "width": 66.0,
        "height": 32
      },
      "categories": {
        "Home Furnishings": {
          "catalogPage": 45,
          "url": "/home/furnishings"
        },
        "Kitchen Furnishings": {
          "catalogPage": 108,
          "url": "/kitchen/furnishings"
        }
      }
    }

![Page 14: Cassandra Summit 2014: Real Data Models of Silicon Valley](https://reader033.vdocument.in/reader033/viewer/2022051817/548f494bb479597e6a8b503a/html5/thumbnails/14.jpg)
Let’s store some JSON

    CREATE TYPE dimensions (
        units text,
        length float,
        width float,
        height float
    );

![Page 15: Cassandra Summit 2014: Real Data Models of Silicon Valley](https://reader033.vdocument.in/reader033/viewer/2022051817/548f494bb479597e6a8b503a/html5/thumbnails/15.jpg)
Let’s store some JSON

    CREATE TYPE category (
        catalogPage int,
        url text
    );

![Page 16: Cassandra Summit 2014: Real Data Models of Silicon Valley](https://reader033.vdocument.in/reader033/viewer/2022051817/548f494bb479597e6a8b503a/html5/thumbnails/16.jpg)
Let’s store some JSON

    CREATE TABLE product (
        productId int,
        name text,
        price float,
        description text,
        dimensions frozen <dimensions>,
        categories map <text, frozen <category>>,
        PRIMARY KEY (productId)
    );

![Page 17: Cassandra Summit 2014: Real Data Models of Silicon Valley](https://reader033.vdocument.in/reader033/viewer/2022051817/548f494bb479597e6a8b503a/html5/thumbnails/17.jpg)
Let’s store some JSON

    INSERT INTO product (productId, name, price, description, dimensions, categories)
    VALUES (
        2,
        'Kitchen Table',
        249.99,
        'Rectangular table with oak finish',
        -- dimensions frozen <dimensions>
        { units: 'inches', length: 50.0, width: 66.0, height: 32 },
        -- categories map <text, frozen <category>>
        {
            'Home Furnishings': { catalogPage: 45, url: '/home/furnishings' },
            'Kitchen Furnishings': { catalogPage: 108, url: '/kitchen/furnishings' }
        }
    );

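How the JSON document lines up with the INSERT above can be sketched in Python. This is a minimal illustration under stated assumptions: the JSON is a well-formed version of the slide's document with categories keyed by name, `cql_literal` and `UDT_COLUMNS` are hypothetical helpers rather than part of any driver, and string concatenation like this is not injection-safe. In application code, a driver's prepared statements would be the usual route.

```python
import json

# Well-formed version of the slide's document (categories keyed by name).
DOC = json.loads("""
{
  "productId": 2,
  "name": "Kitchen Table",
  "price": 249.99,
  "description": "Rectangular table with oak finish",
  "dimensions": {"units": "inches", "length": 50.0, "width": 66.0, "height": 32},
  "categories": {
    "Home Furnishings": {"catalogPage": 45, "url": "/home/furnishings"},
    "Kitchen Furnishings": {"catalogPage": 108, "url": "/kitchen/furnishings"}
  }
}
""")

# Assumption: top-level columns that hold a UDT rather than a map of UDTs.
UDT_COLUMNS = {"dimensions"}


def cql_literal(value, as_udt=False):
    """Render a Python value as a CQL literal (illustrative only)."""
    if isinstance(value, bool):
        return "true" if value else "false"
    if isinstance(value, (int, float)):
        return repr(value)
    if isinstance(value, str):
        return "'" + value.replace("'", "''") + "'"
    if isinstance(value, dict):
        if as_udt:
            # UDT literal: bare field names.
            items = [f"{k}: {cql_literal(v)}" for k, v in value.items()]
        else:
            # Map literal: quoted text keys, each value rendered as a UDT.
            items = [
                f"{cql_literal(k)}: {cql_literal(v, as_udt=True)}"
                for k, v in value.items()
            ]
        return "{ " + ", ".join(items) + " }"
    raise TypeError(f"unsupported value: {value!r}")


columns = list(DOC)
values = [cql_literal(DOC[c], as_udt=(c in UDT_COLUMNS)) for c in columns]
stmt = f"INSERT INTO product ({', '.join(columns)}) VALUES ({', '.join(values)});"
print(stmt)
```

The nested `dimensions` object becomes a UDT literal with bare field names, while `categories` becomes a map literal whose quoted keys point at UDT values.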
![Page 18: Cassandra Summit 2014: Real Data Models of Silicon Valley](https://reader033.vdocument.in/reader033/viewer/2022051817/548f494bb479597e6a8b503a/html5/thumbnails/18.jpg)
Retrieving fields
![Page 19: Cassandra Summit 2014: Real Data Models of Silicon Valley](https://reader033.vdocument.in/reader033/viewer/2022051817/548f494bb479597e6a8b503a/html5/thumbnails/19.jpg)
Counters pt Deux
• Since 0.8
• Commit log replay would change counters
• Repair could change counters
• Performance was inconsistent. Lots of GC
![Page 20: Cassandra Summit 2014: Real Data Models of Silicon Valley](https://reader033.vdocument.in/reader033/viewer/2022051817/548f494bb479597e6a8b503a/html5/thumbnails/20.jpg)
The good
• Stable under load
• No commit log replay issues
• No repair weirdness
![Page 21: Cassandra Summit 2014: Real Data Models of Silicon Valley](https://reader033.vdocument.in/reader033/viewer/2022051817/548f494bb479597e6a8b503a/html5/thumbnails/21.jpg)
The bad
• Still can’t delete/reset counters
• Still needs to do a read before write.
![Page 22: Cassandra Summit 2014: Real Data Models of Silicon Valley](https://reader033.vdocument.in/reader033/viewer/2022051817/548f494bb479597e6a8b503a/html5/thumbnails/22.jpg)
Usage
Wait for it…
It’s the same! Carry on…
![Page 23: Cassandra Summit 2014: Real Data Models of Silicon Valley](https://reader033.vdocument.in/reader033/viewer/2022051817/548f494bb479597e6a8b503a/html5/thumbnails/23.jpg)
Static Fields
• New as of 2.0.6
• VERY specific, but useful
• Thrift people will like this

    CREATE TABLE t (
        k text,
        s text STATIC,
        i int,
        PRIMARY KEY (k, i)
    );

![Page 24: Cassandra Summit 2014: Real Data Models of Silicon Valley](https://reader033.vdocument.in/reader033/viewer/2022051817/548f494bb479597e6a8b503a/html5/thumbnails/24.jpg)
Why?

    CREATE TABLE weather (
        id int,
        time timestamp,
        weatherstation_name text,
        temperature float,
        PRIMARY KEY (id, time)
    );

Without a static column, the station name is stored again in every row of the partition:

    Partition key (storage row key): ID = 1
    Partition row 1:  2014-09-08 12:00:00 : name = SFO   2014-09-08 12:00:00 : temp = 63.4
    Partition row 2:  2014-09-08 12:01:00 : name = SFO   2014-09-08 12:01:00 : temp = 63.9
    Partition row 3:  2014-09-08 12:02:00 : name = SFO   2014-09-08 12:02:00 : temp = 64.0

With a static column, the name is stored once for the whole partition:

    Partition key (storage row key): ID = 1
    Static:           name = SFO
    Partition row 1:  2014-09-08 12:00:00 : temp = 63.4
    Partition row 2:  2014-09-08 12:01:00 : temp = 63.9
    Partition row 3:  2014-09-08 12:02:00 : temp = 64.0

    CREATE TABLE weather (
        id int,
        time timestamp,
        weatherstation_name text static,
        temperature float,
        PRIMARY KEY (id, time)
    );

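The saving can be counted with a small Python model of the storage cells in one partition. This is a sketch of the idea in the diagram, not of Cassandra's actual on-disk format; the function names and the `"static"` marker are hypothetical, and the readings reuse the SFO values from the slide.

```python
def cells_without_static(station, readings):
    """Every row carries its own copy of the station name."""
    cells = {}
    for timestamp, temperature in readings:
        cells[(timestamp, "weatherstation_name")] = station
        cells[(timestamp, "temperature")] = temperature
    return cells


def cells_with_static(station, readings):
    """The station name is stored once and shared by all rows."""
    cells = {("static", "weatherstation_name"): station}
    for timestamp, temperature in readings:
        cells[(timestamp, "temperature")] = temperature
    return cells


readings = [
    ("2014-09-08 12:00:00", 63.4),
    ("2014-09-08 12:01:00", 63.9),
    ("2014-09-08 12:02:00", 64.0),
]

plain = cells_without_static("SFO", readings)
static = cells_with_static("SFO", readings)
print(len(plain), len(static))
```

For n rows the model stores 2n cells without the static column and n + 1 with it, so the gap grows with every reading added to the partition.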
![Page 25: Cassandra Summit 2014: Real Data Models of Silicon Valley](https://reader033.vdocument.in/reader033/viewer/2022051817/548f494bb479597e6a8b503a/html5/thumbnails/25.jpg)
Usage
• Put static at the end of the column declaration
• Can’t be a part of:

    CREATE TABLE video_event (
        videoid uuid,
        userid uuid,
        preview_image_location text static,
        event varchar,
        event_timestamp timeuuid,
        video_timestamp bigint,
        PRIMARY KEY ((videoid, userid), event_timestamp, event)
    ) WITH CLUSTERING ORDER BY (event_timestamp DESC, event ASC);

![Page 26: Cassandra Summit 2014: Real Data Models of Silicon Valley](https://reader033.vdocument.in/reader033/viewer/2022051817/548f494bb479597e6a8b503a/html5/thumbnails/26.jpg)
Tuples
• A type that represents a group
• Up to 256 different elements

    CREATE TABLE tuple_table (
        id int PRIMARY KEY,
        three_tuple frozen <tuple<int, text, float>>,
        four_tuple frozen <tuple<int, text, float, inet>>,
        five_tuple frozen <tuple<int, text, float, inet, ascii>>
    );

![Page 27: Cassandra Summit 2014: Real Data Models of Silicon Valley](https://reader033.vdocument.in/reader033/viewer/2022051817/548f494bb479597e6a8b503a/html5/thumbnails/27.jpg)
Example Usage
• Track a drone’s position
• x, y, z in 3D Cartesian coordinates

    CREATE TABLE drone_position (
        droneId int,
        time timestamp,
        position frozen <tuple<float, float, float>>,
        PRIMARY KEY (droneId, time)
    );

![Page 28: Cassandra Summit 2014: Real Data Models of Silicon Valley](https://reader033.vdocument.in/reader033/viewer/2022051817/548f494bb479597e6a8b503a/html5/thumbnails/28.jpg)
What about partition size?
• A CQL partition is a logical projection of a storage row
• Storage rows can have up to 2 billion cells
• Each cell can hold up to 2 GB of data
![Page 29: Cassandra Summit 2014: Real Data Models of Silicon Valley](https://reader033.vdocument.in/reader033/viewer/2022051817/548f494bb479597e6a8b503a/html5/thumbnails/29.jpg)
How much is too much?
• How many cells before performance degrades?
• How many bytes per partition before it’s unmanageable?
• What is “practical”?
![Page 30: Cassandra Summit 2014: Real Data Models of Silicon Valley](https://reader033.vdocument.in/reader033/viewer/2022051817/548f494bb479597e6a8b503a/html5/thumbnails/30.jpg)
Old answer
• 2011: Pre-Cassandra 1.2 (actually tested on 0.8)
• Aaron Morton, Cassandra MVP and Founder of The Last Pickle
![Page 31: Cassandra Summit 2014: Real Data Models of Silicon Valley](https://reader033.vdocument.in/reader033/viewer/2022051817/548f494bb479597e6a8b503a/html5/thumbnails/31.jpg)
Conclusion
• Keep partition (storage row) length < 10k cells
• Total size in bytes below 64 MB (multi-pass compaction)
• Multiple hits to the 64 KB page size will start to hurt
TL;DR - It’s a performance tunable
![Page 32: Cassandra Summit 2014: Real Data Models of Silicon Valley](https://reader033.vdocument.in/reader033/viewer/2022051817/548f494bb479597e6a8b503a/html5/thumbnails/32.jpg)
The tests revisited
• Attempted to reproduce the same tests using CQL
• Cassandra 2.1, 2.0 and 1.2
• Tested partition sizes:
1. 100
2. 2114
3. 5,000
4. 10,000
5. 100,000
6. 1,000,000
7. 10,000,000
8. 100,000,000
9. 1,000,000,000
![Page 33: Cassandra Summit 2014: Real Data Models of Silicon Valley](https://reader033.vdocument.in/reader033/viewer/2022051817/548f494bb479597e6a8b503a/html5/thumbnails/33.jpg)
Results
[Chart: time in msec (y-axis) versus cells per partition (x-axis)]
![Page 34: Cassandra Summit 2014: Real Data Models of Silicon Valley](https://reader033.vdocument.in/reader033/viewer/2022051817/548f494bb479597e6a8b503a/html5/thumbnails/34.jpg)
The new answer
• Hundreds of thousands of cells per partition is not a problem
• Hundreds of megabytes per partition is best operationally
• The issue to manage is operations
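A back-of-the-envelope check against these guidelines can be sketched in Python. Everything numeric here is an assumption for illustration: one position report per second for the drone_position table, one cell per regular column per row with overhead ignored, an average of 20 bytes per cell, and a budget of 500,000 cells taken as one reading of "hundreds of thousands". The function names are hypothetical.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def estimate_partition(rows, regular_columns, avg_cell_bytes):
    """Rough (cells, bytes) for one partition.

    Assumes one cell per regular (non-key) column per row and ignores
    per-cell overhead, so treat the result as a lower bound.
    """
    cells = rows * regular_columns
    return cells, cells * avg_cell_bytes


def max_days_per_partition(rows_per_day, regular_columns, cell_budget):
    """Whole days of data that fit in one partition under a cell budget."""
    cells_per_day = rows_per_day * regular_columns
    return cell_budget // cells_per_day


# Assumed workload: one drone reporting its position once per second.
rows_per_day = SECONDS_PER_DAY
cells, size_bytes = estimate_partition(
    rows_per_day, regular_columns=1, avg_cell_bytes=20
)
days = max_days_per_partition(rows_per_day, regular_columns=1, cell_budget=500_000)
print(cells, size_bytes, days)
```

Under these assumptions a drone adds 86,400 cells per day, so a partition keyed only by droneId would pass the budget within a week. One common response is to add a time bucket, such as the day, to the partition key.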
![Page 35: Cassandra Summit 2014: Real Data Models of Silicon Valley](https://reader033.vdocument.in/reader033/viewer/2022051817/548f494bb479597e6a8b503a/html5/thumbnails/35.jpg)
Thank You!
Follow me on Twitter for more: @PatrickMcFadin
![Page 36: Cassandra Summit 2014: Real Data Models of Silicon Valley](https://reader033.vdocument.in/reader033/viewer/2022051817/548f494bb479597e6a8b503a/html5/thumbnails/36.jpg)
CASSANDRA SUMMIT 2014
September 10 - 11 | #CassandraSummit