Cassandra Day Chicago 2015: Advanced Data Modeling
TRANSCRIPT
Advanced Cassandra Data Modeling
Let Us Assume:
• You know your way around a cluster (at least theoretically)
• You have seen some CQL
Let Us Explore:
• Some use cases
• The Chebotko Method
• Some Cassandra 2.1 features
Use Cases
Top Gamer Scores
Top Scores
Daily Top 10 Users
 handle          | score
-----------------+-------
 subsonic        |  66.2
 neo             |  55.2
 bennybaru       |  49.2
 tigger          |  46.2
 velvetfog       |  45.2
 flashberg       |  43.6
 jbellis         |  43.4
 cafruitbat      |  43.2
 groovemerchant  |  41.2
 rustyrazorblade |  39.2
CREATE TABLE userScores (
  userId uuid,
  handle text static,
  gameId uuid,
  score_timestamp timestamp,
  score double,
  PRIMARY KEY ((userId, gameId), score_timestamp)
) WITH CLUSTERING ORDER BY (score_timestamp DESC);
CREATE TABLE TopTen (
  gameId uuid,
  process_timestamp timestamp,
  score double,
  userId uuid,
  handle text,
  PRIMARY KEY (gameId, process_timestamp, score)
) WITH CLUSTERING ORDER BY (process_timestamp DESC, score DESC)
AND default_time_to_live = '259200'
AND COMPACTION = {'class': 'DateTieredCompactionStrategy', 'enabled': 'TRUE'};
SELECT gameId, process_timestamp, score, handle, userId
FROM TopTen
WHERE gameid = 99051fe9-6a9c-46c2-b949-38ef78858dd0
AND process_timestamp <= '2014-12-31 14:00:00'
LIMIT 1;
SELECT gameId, process_timestamp, score, handle, userId
FROM TopTen
WHERE gameid = 99051fe9-6a9c-46c2-b949-38ef78858dd0
AND process_timestamp <= '2014-12-31 14:00:00';
 gameid                               | process_timestamp        | score | handle          | userid
--------------------------------------+--------------------------+-------+-----------------+--------------------------------------
 99051fe9-6a9c-46c2-b949-38ef78858dd0 | 2014-12-31 13:42:40-0800 |  66.2 | trinity         |
 99051fe9-6a9c-46c2-b949-38ef78858dd0 | 2014-12-31 13:42:40-0800 |  55.2 | neo             |
 99051fe9-6a9c-46c2-b949-38ef78858dd0 | 2014-12-31 13:42:40-0800 |  49.2 | bennbaru        |
 99051fe9-6a9c-46c2-b949-38ef78858dd0 | 2014-12-31 13:42:40-0800 |  46.2 | tigger          |
 99051fe9-6a9c-46c2-b949-38ef78858dd0 | 2014-12-31 13:42:40-0800 |  45.2 | velvetfog       |
 99051fe9-6a9c-46c2-b949-38ef78858dd0 | 2014-12-31 13:42:40-0800 |  43.6 | flashberg       |
 99051fe9-6a9c-46c2-b949-38ef78858dd0 | 2014-12-31 13:42:40-0800 |  43.4 | jbellis         |
 99051fe9-6a9c-46c2-b949-38ef78858dd0 | 2014-12-31 13:42:40-0800 |  43.2 | catfruitbat     |
 99051fe9-6a9c-46c2-b949-38ef78858dd0 | 2014-12-31 13:42:40-0800 |  41.2 | groovemerchant  | 99051fe9-6a9c-46c2-b949-38ef78858d03
 99051fe9-6a9c-46c2-b949-38ef78858dd0 | 2014-12-31 13:42:40-0800 |  39.2 | rustyrazorblade | 99051fe9-6a9c-46c2-b949-38ef78858d01
 99051fe9-6a9c-46c2-b949-38ef78858dd0 | 2014-12-31 13:42:40-0800 |  20.2 | driftx          | 99051fe9-6a9c-46c2-b949-38ef78858d08
Yay!
Spark?
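The TopTen table is presumably filled by a periodic job that reads the raw scores and writes the best rows under a fresh process_timestamp; the slide above hints at Spark for that job. As a minimal sketch of the selection step only, here is the same idea in plain Python. The helper name top_ten and the in-memory sample list are assumptions for illustration, not part of the talk.

```python
import heapq
from typing import Iterable, List, Tuple

# (handle, score) pairs, e.g. each user's best score for one game.
Score = Tuple[str, float]

def top_ten(scores: Iterable[Score], n: int = 10) -> List[Score]:
    """Return the n highest-scoring rows, best first (hypothetical helper)."""
    return heapq.nlargest(n, scores, key=lambda row: row[1])

# Sample rows borrowed from the result table shown earlier.
sample = [("subsonic", 66.2), ("neo", 55.2), ("driftx", 20.2), ("tigger", 46.2)]
leaders = top_ten(sample, n=3)
print(leaders)
```

A real job would write each returned row to TopTen with the same process_timestamp, so that one query on the newest timestamp returns the whole leaderboard.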
File Storage
File Storage: Stories In Brief
• User creates an account
• User uploads image
• Image is distributed worldwide
• User can check access patterns
File Storage: Access Patterns
• Recall a single image
• Recall all images in a given time range
• Recall specific images over a given time range
• Recall the times each image was accessed
File Storage: User Creation
CREATE TABLE user (
  username text,
  firstname text,
  lastname text,
  emails list<text>,
  PRIMARY KEY (username)
);
INSERT INTO user (username, firstname, lastname, emails)
VALUES ('tlberglund', 'Tim', 'Berglund', ['[email protected]', '[email protected]'])
IF NOT EXISTS;
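File Storage: Image Model
// The slide omits the types of the first four columns. Those shown here follow
// images_timeseries; timestamp for created_at is an assumption.
CREATE TABLE image (
  image_id uuid,
  username text,
  created_at timestamp,
  image_name text,
  image_description text,
  tags list<text>,
  images map<text, uuid>,
  PRIMARY KEY (image_id)
);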
File Storage: Image Accesses
CREATE TABLE images_timeseries (
  username text,
  bucket int,
  sequence timestamp,
  image_id uuid,
  image_name text,
  image_description text,
  images map<text, uuid>,
  PRIMARY KEY ((username, bucket), sequence)
) WITH CLUSTERING ORDER BY (sequence DESC);
File Storage: Image Buckets
CREATE TABLE bucket_index (
  username text,
  bucket int,
  PRIMARY KEY (username, bucket)
) WITH CLUSTERING ORDER BY (bucket DESC);
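The slides do not say how the bucket column is computed. As a sketch under one possible assumption, a bucket could be a calendar month encoded as an int such as 201412, so that a time-range query first works out which (username, bucket) partitions to read. Both helper names below are hypothetical.

```python
from datetime import datetime, timezone
from typing import List

def bucket_for(ts: datetime) -> int:
    """Map a timestamp to a monthly bucket such as 201412 (assumed scheme)."""
    return ts.year * 100 + ts.month

def buckets_between(start: datetime, end: datetime) -> List[int]:
    """List every monthly bucket touched by [start, end], newest first."""
    buckets = []
    year, month = end.year, end.month
    while (year, month) >= (start.year, start.month):
        buckets.append(year * 100 + month)
        month -= 1
        if month == 0:
            year, month = year - 1, 12
    return buckets

start = datetime(2014, 11, 15, tzinfo=timezone.utc)
end = datetime(2015, 1, 3, tzinfo=timezone.utc)
print(buckets_between(start, end))
```

Newest-first ordering mirrors the DESC clustering order on bucket_index, so an application can stop reading partitions as soon as it has enough rows.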
File Storage: Chunked Blobs
CREATE TABLE blob (
  object_id uuid,    // unique identifier
  chunk_count int,   // total number of chunks
  size int,          // total size (bytes)
  chunk_size int,    // max chunk size
  checksum text,
  attributes text,   // json-encoded metadata
  PRIMARY KEY (object_id)
);
CREATE TABLE blob_chunk (
  object_id uuid,
  chunk_id int,
  chunk_size int,
  data blob,
  PRIMARY KEY ((object_id, chunk_id))
);
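To make the two chunked-blob tables concrete, the following sketch splits a byte string into rows shaped like blob and blob_chunk, then reassembles and verifies them. It works on in-memory dictionaries only; the function names, the zero-based chunk_id, and the use of SHA-256 for the checksum column are assumptions, since the slides do not specify them.

```python
import hashlib
import uuid
from typing import Dict, List

def chunk_blob(data: bytes, chunk_size: int = 1024 * 1024) -> Dict:
    """Split data into rows shaped like the blob / blob_chunk tables."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    object_id = uuid.uuid4()
    chunks: List[Dict] = []
    for offset in range(0, len(data), chunk_size):
        piece = data[offset:offset + chunk_size]
        chunks.append({
            "object_id": object_id,
            "chunk_id": len(chunks),      # zero-based position (assumed)
            "chunk_size": len(piece),
            "data": piece,
        })
    header = {
        "object_id": object_id,
        "chunk_count": len(chunks),
        "size": len(data),
        "chunk_size": chunk_size,
        "checksum": hashlib.sha256(data).hexdigest(),  # hash choice is assumed
    }
    return {"blob": header, "blob_chunk": chunks}

def reassemble(rows: Dict) -> bytes:
    """Concatenate chunks in chunk_id order and verify the checksum."""
    ordered = sorted(rows["blob_chunk"], key=lambda c: c["chunk_id"])
    data = b"".join(c["data"] for c in ordered)
    if hashlib.sha256(data).hexdigest() != rows["blob"]["checksum"]:
        raise ValueError("checksum mismatch")
    return data

rows = chunk_blob(b"x" * 2500, chunk_size=1000)
print(rows["blob"]["chunk_count"], [c["chunk_size"] for c in rows["blob_chunk"]])
```

Because blob_chunk uses (object_id, chunk_id) as its partition key, each chunk lands in its own partition, and a reader fetches chunk 0 through chunk_count - 1 individually.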
File Storage: Access Log
CREATE TABLE access_log (
  object_id uuid,
  access_date text,
  access_time timestamp,
  ip_address inet,
  PRIMARY KEY ((object_id, access_date), access_time, ip_address)
);
User Registration
User Registration: LWT
Coordinator 1:
SELECT * FROM users WHERE userName = 'tlberglund';
INSERT INTO users (username, ...) VALUES ('tlberglund', ...);
Coordinator 2:
SELECT * FROM users WHERE userName = 'pmcfadin';
INSERT INTO users (username, ...) VALUES ('pmcfadin', ...);
Which user wins?
LWT
• Lightweight transactions
• Uses the Paxos algorithm
• Hard to understand
• So easy to use
Coordinator 1:
INSERT INTO users (username, ...) VALUES ('tlberglund', ...) IF NOT EXISTS;
Coordinator 2:
INSERT INTO users (username, ...) VALUES ('pmcfadin', ...) IF NOT EXISTS;
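The difference between the read-then-write race and the conditional insert can be sketched with an in-memory stand-in for the users table. This is an illustration of the semantics only: the class UsersTable is hypothetical, a local lock is not how Cassandra achieves atomicity (it uses Paxos), and the sketch assumes both clients try to claim the same username.

```python
import threading
from typing import Dict, Optional

class UsersTable:
    """In-memory stand-in for the users table (illustration only)."""

    def __init__(self) -> None:
        self._rows: Dict[str, str] = {}
        self._lock = threading.Lock()

    def select(self, username: str) -> Optional[str]:
        return self._rows.get(username)

    def insert(self, username: str, owner: str) -> None:
        # Plain INSERT: the last write silently wins.
        self._rows[username] = owner

    def insert_if_not_exists(self, username: str, owner: str) -> bool:
        # Check and write happen as one step, like INSERT ... IF NOT EXISTS.
        with self._lock:
            if username in self._rows:
                return False
            self._rows[username] = owner
            return True

# Racy read-then-write: both clients check before either one writes.
racy = UsersTable()
seen_by_1 = racy.select("tlberglund")
seen_by_2 = racy.select("tlberglund")
if seen_by_1 is None:
    racy.insert("tlberglund", "client 1")
if seen_by_2 is None:
    racy.insert("tlberglund", "client 2")  # overwrites client 1's row

# Conditional insert: only the first attempt is applied.
safe = UsersTable()
applied_1 = safe.insert_if_not_exists("tlberglund", "client 1")
applied_2 = safe.insert_if_not_exists("tlberglund", "client 2")
print(racy.select("tlberglund"), applied_1, applied_2, safe.select("tlberglund"))
```

In the racy run both clients believe they registered the name, yet only the second row survives; in the conditional run the second client is told its insert was not applied.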
Paxos Write: Happy Path
[Diagram: one proposer and three acceptor nodes; the value being proposed is FLAT WHITE.]
1. Prepare: Proposer generates a sequence number and sends 1:FLAT WHITE to the acceptors.
2. Promise: Acceptors compare sequence numbers.
3. Accept Request: Proposer receives quorum, "makes progress".
4. Acceptance: Acceptors check sequence numbers one more time.
Paxos Write: Better Offer
[Diagram: the same proposer and three acceptors; the proposer starts with 5:CAFÉ CUBANO.]
1. Prepare: Proposer sends 5:CAFÉ CUBANO. Meanwhile, one acceptor had gotten another proposal (8:FRENCH PRESS) when we weren't looking…
2. Promise: One acceptor answers 8:FRENCH PRESS, the other two answer 5:CAFÉ CUBANO.
3. Accept Request: Proposer changes its mind! It now sends 8:FRENCH PRESS.
4. Acceptance: Two acceptors are surprised about this, but the sequence numbers work out… The result is FRENCH PRESS.
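The two Paxos walkthroughs can be approximated with a small single-decree Paxos sketch. This is textbook Paxos in one process, not Cassandra's implementation; the class and function names are hypothetical. One deliberate difference from the slides: in textbook Paxos an acceptor that has seen ballot 8 would reject a prepare with ballot 5, so the "better offer" run below assumes the proposer uses ballot 9, which lets every acceptor answer and report the earlier FRENCH PRESS proposal.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

@dataclass
class Acceptor:
    promised: int = 0                            # highest ballot promised so far
    accepted: Optional[Tuple[int, str]] = None   # (ballot, value) last accepted

    def prepare(self, ballot: int):
        """Phase 1: promise to ignore lower ballots; report any accepted value."""
        if ballot <= self.promised:
            return False, None
        self.promised = ballot
        return True, self.accepted

    def accept(self, ballot: int, value: str) -> bool:
        """Phase 2: accept unless a higher ballot was promised meanwhile."""
        if ballot < self.promised:
            return False
        self.promised = ballot
        self.accepted = (ballot, value)
        return True

def propose(acceptors: List[Acceptor], ballot: int, value: str) -> Optional[str]:
    """Run one round; return the chosen value, or None if no quorum."""
    quorum = len(acceptors) // 2 + 1
    replies = [a.prepare(ballot) for a in acceptors]
    promises = [acc for ok, acc in replies if ok]
    if len(promises) < quorum:
        return None
    # If any acceptor already accepted something, adopt the newest such value.
    prior = [acc for acc in promises if acc is not None]
    if prior:
        value = max(prior, key=lambda acc: acc[0])[1]
    votes = sum(a.accept(ballot, value) for a in acceptors)
    return value if votes >= quorum else None

# Happy path: three fresh acceptors, ballot 1.
happy = propose([Acceptor(), Acceptor(), Acceptor()], 1, "FLAT WHITE")

# Better offer: one acceptor already accepted 8:FRENCH PRESS.
busy = [Acceptor(promised=8, accepted=(8, "FRENCH PRESS")), Acceptor(), Acceptor()]
better = propose(busy, 9, "CAFÉ CUBANO")
print(happy, "/", better)
```

The second run shows the proposer "changing its mind": it wanted CAFÉ CUBANO but must carry forward the value an acceptor had already accepted.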
Lightweight Transactions
• Good solution for distributed race conditions
• At some cost in latency
• Run your own load tests!
• Now…why were you using ZooKeeper?
When Complex Domains Attack
Dr. Chebotko's Data Modeling Emporium
Chebotko Method
• Conceptual model
• Logical model
• Physical model
Chebotko Method: Conceptual Model
• Abstract, implementation-independent model
• Traditionally built in Chen ER notation
• Describes entities, relationships, roles, keys, and cardinalities
[Figure: Chen ER diagram of a music domain. Entities: Album, Performer (IsA Artist or Band; disjoint, covering), Track, User, Activity (IsA Rate or Play; disjoint, not covering). Relationships: releases, has-member, has, performs, involvedIn, with 1:n and n:m cardinalities. Attributes shown: title, year, genre, format, cover-image, name, country, style, founded, born, died, period, number, id, preferences, timestamp, rating.]
Chebotko Method: Logical Model
• A diagram showing queries and tables
• Ensures that each query "fits" in a partition
• Tends to produce one table per query
1. Identify access patterns ("queries")
2. Find a subset of the conceptual model that satisfies a query
3. Determine key
4. Verify maximum partition size
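For step 4, one estimate commonly used alongside this methodology counts the values stored in a partition as rows × (columns − primary key columns − static columns) + static columns. The slides do not show this formula, so treat it as an assumption brought in for illustration; the function name and the figure of 20 tracks per album are hypothetical as well. The column counts come from the Tracks_by_album table in the logical model (album, year, number, performer, genre, title).

```python
def partition_values(rows: int, columns: int, key_columns: int,
                     static_columns: int) -> int:
    """Estimate the number of stored values in one partition.

    rows           -- expected rows in the partition
    columns        -- total columns in the table
    key_columns    -- partition key plus clustering columns
    static_columns -- columns stored once per partition
    """
    regular = columns - key_columns - static_columns
    if rows < 0 or regular < 0:
        raise ValueError("inconsistent column or row counts")
    return rows * regular + static_columns

# Tracks_by_album: 6 columns, 3 key columns (album, year, number),
# 2 static columns (performer, genre), assumed 20 tracks per album.
tracks_by_album = partition_values(rows=20, columns=6, key_columns=3,
                                   static_columns=2)
print(tracks_by_album)
```

An estimate like this is compared against whatever partition-size budget the cluster operators set; a table such as Albums_by_genre, where one genre may hold a very large number of albums, is where the check matters most.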
ACCESS PATTERNS
Q1: Find performers for a specified style; order by performer (ASC).
Q2: Find information for a specified performer (artist or band).
Q3: Find information for a specified album (title and year).
Q4: Find albums for a specified performer; order by album release year (DESC) and title (ASC).
Q5: Find albums for a specified genre; order by performer (ASC), year (DESC), and title (ASC).
Q6: Find albums and performers for a specified track title; order by performer (ASC), year (DESC), and title (ASC).
Q7: Find tracks for a specified album (title and year); order by track number (ASC).
Q8: Find information for a specified user.
Q9: Find activities for a specified user; order by activity time (DESC).
Q10: Find statistics for a specified track.
Q11: Find user activities for a specified track; order by activity time (DESC).
Q12: Find user activities for a specified activity type.
…
TABLES (K = partition key, C↑/C↓ = clustering column ascending/descending, S = static, IDX = secondary index)
Performers_by_style (Q1): style K, name C↑
Performer (Q2): name K, type, country, style, founded, born, died
Album (Q3): title K, year K, performer, genre, tracks (map)
Albums_by_performer (Q4): performer K, year C↓, title C↑, genre
Albums_by_genre (Q5): genre K, performer C↑, year C↓, title C↑
Albums_by_track (Q6): track K, performer C↑, year C↓, title C↑
Tracks_by_album (Q7): album K, year K, number C↑, performer S, genre S, title
User (Q8): id K, name, email, preferences (set)
Activities_by_user (Q9, Q12): user K, activity (timeuuid) C↓, type IDX, album_title, album_year, track_title, rating
Track_stats (Q10): album_title K, album_year K, track_title K, num_ratings (counter), sum_ratings (counter), num_plays (counter)
Activities_by_track (Q11): album_title K, album_year K, track_title K, activity (timeuuid) C↓, user, type, rating
Chebotko Method: Logical Model Analysis
• Natural or surrogate keys?
• Are write conflicts (overwrites) possible?
• What data types to use?
• How large are partitions?
• How much data duplication is required?
• Are client-side joins required and at what cost?
• Are data consistency anomalies possible?
• How to enable transactions?
Chebotko Method: Physical Model
• Not a diagram!
• Just the CQL version of the logical tables
C* 2.1 Features
Bonus!
User-Defined Types
UDTs
• Good for modeling nested "value objects"
• Eliminates extra queries, in-app joins
• Mechanism for denormalization
CREATE TYPE address (
  street text,
  city text,
  zip_code int,
  country text,
  cross_streets set<text>
);
CREATE TABLE videos (
  videoid uuid,
  userid uuid,
  name varchar,
  description varchar,
  location text,
  location_type int,
  preview_thumbnails map<text,text>,
  tags set<varchar>,
  added_date timestamp,
  PRIMARY KEY (videoid)
);
CREATE TABLE video_metadata (
  video_id uuid PRIMARY KEY,
  height int,
  width int,
  video_bit_rate set<text>,
  encoding text
);
In-app Join:
SELECT * FROM videos WHERE videoId = 2;
SELECT * FROM video_metadata WHERE video_id = 2;
CREATE TYPE video_metadata (
  height int,
  width int,
  video_bit_rate set<text>,
  encoding text
);
CREATE TABLE videos (
  videoid uuid,
  userid uuid,
  name varchar,
  description varchar,
  location text,
  location_type int,
  preview_thumbnails map<text,text>,
  tags set<varchar>,
  metadata set<frozen<video_metadata>>,
  added_date timestamp,
  PRIMARY KEY (videoid)
);
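The effect of moving video_metadata from its own table into a UDT column can be sketched with plain Python dictionaries standing in for tables. Everything here is illustrative: the sample values ("demo", 720, and so on) and the helper names are made up, and a frozen dataclass is only a rough analogue of a frozen UDT.

```python
from dataclasses import dataclass
from typing import FrozenSet

@dataclass(frozen=True)
class VideoMetadata:
    """Immutable value object, loosely analogous to frozen<video_metadata>."""
    height: int
    width: int
    video_bit_rate: FrozenSet[str]
    encoding: str

meta = VideoMetadata(720, 1280, frozenset({"1000kbps"}), "mp4")

# Two-table layout: the application looks up both rows and joins them.
videos = {2: {"name": "demo"}}
video_metadata = {2: meta}

def load_with_join(video_id: int) -> dict:
    row = dict(videos[video_id])                   # first lookup
    row["metadata"] = {video_metadata[video_id]}   # second lookup, joined in-app
    return row

# UDT-style layout: the metadata travels inside the video row.
videos_udt = {2: {"name": "demo", "metadata": frozenset({meta})}}

def load_embedded(video_id: int) -> dict:
    return videos_udt[video_id]                    # single lookup

print(load_with_join(2)["metadata"] == load_embedded(2)["metadata"])
```

Both loaders return the same information; the embedded layout simply gets it in one read, at the cost of duplicating the metadata wherever the video row is denormalized.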
Thank You!