Time Series with Apache Cassandra - Long Version
DESCRIPTION
Apache Cassandra has proven to be one of the best solutions for storing and retrieving time series data. This talk will give you an overview of the many ways you can be successful. We will discuss how the storage model of Cassandra is well suited for this pattern and go over examples of how best to build data models.
TRANSCRIPT
![Page 1: Time series with Apache Cassandra - Long version](https://reader034.vdocument.in/reader034/viewer/2022042714/54b6ca784a7959e5268b47f4/html5/thumbnails/1.jpg)
©2013 DataStax Confidential. Do not distribute without consent.
@PatrickMcFadin
Patrick McFadin, Chief Evangelist
Time Series with Apache Cassandra
![Page 2: Time series with Apache Cassandra - Long version](https://reader034.vdocument.in/reader034/viewer/2022042714/54b6ca784a7959e5268b47f4/html5/thumbnails/2.jpg)
Quick intro to Cassandra
• Shared nothing
• Masterless peer-to-peer
• Based on Dynamo
![Page 3: Time series with Apache Cassandra - Long version](https://reader034.vdocument.in/reader034/viewer/2022042714/54b6ca784a7959e5268b47f4/html5/thumbnails/3.jpg)
Scaling
• Add nodes to scale
• Millions Ops/s
[Chart: throughput (ops/sec) for Cassandra, HBase, Redis and MySQL]
![Page 4: Time series with Apache Cassandra - Long version](https://reader034.vdocument.in/reader034/viewer/2022042714/54b6ca784a7959e5268b47f4/html5/thumbnails/4.jpg)
Uptime
• Built to replicate
• Resilient to failure
• Always on
NONE
![Page 5: Time series with Apache Cassandra - Long version](https://reader034.vdocument.in/reader034/viewer/2022042714/54b6ca784a7959e5268b47f4/html5/thumbnails/5.jpg)
Easy to use
• CQL is a familiar syntax
• Friendly to programmers
• Paxos for locking

CREATE TABLE users (
  username varchar,
  firstname varchar,
  lastname varchar,
  email list<varchar>,
  password varchar,
  created_date timestamp,
  PRIMARY KEY (username)
);

-- Plain insert: overwrites any existing row with the same username
INSERT INTO users (username, firstname, lastname,
  email, password, created_date)
VALUES ('pmcfadin','Patrick','McFadin',
  ['[email protected]'],'ba27e03fd95e507daf2937c937d499ab',
  '2011-06-20 13:50:00');

-- Conditional insert (uses Paxos): applied only if the row does not exist yet
INSERT INTO users (username, firstname,
  lastname, email, password, created_date)
VALUES ('pmcfadin','Patrick','McFadin',
  ['[email protected]'],
  'ba27e03fd95e507daf2937c937d499ab',
  '2011-06-20 13:50:00')
IF NOT EXISTS;
![Page 6: Time series with Apache Cassandra - Long version](https://reader034.vdocument.in/reader034/viewer/2022042714/54b6ca784a7959e5268b47f4/html5/thumbnails/6.jpg)
Time series in production
• It’s all about “What’s happening”
• Data is the new currency
![Page 7: Time series with Apache Cassandra - Long version](https://reader034.vdocument.in/reader034/viewer/2022042714/54b6ca784a7959e5268b47f4/html5/thumbnails/7.jpg)
Stackdriver
• AWS and Rackspace monitoring
• Quick indexes
• Batch rollup results
![Page 8: Time series with Apache Cassandra - Long version](https://reader034.vdocument.in/reader034/viewer/2022042714/54b6ca784a7959e5268b47f4/html5/thumbnails/8.jpg)
MyDrive
• Moved from Mongo to Cassandra
• Queue processing
• Bottleneck was storing the data
“One thing that is not at all obvious from the graph is that the system was also under massively heavier strain after the switch to Cassandra because of additional bulk processing going on in the background.” - Karl Matthias, MyDrive
![Page 9: Time series with Apache Cassandra - Long version](https://reader034.vdocument.in/reader034/viewer/2022042714/54b6ca784a7959e5268b47f4/html5/thumbnails/9.jpg)
Paddy Power
• Real-time product and pricing
• Much like stock tickers
• Active-active across two data centers
“Specifically for Cassandra and Datastax, the ability to process time-series data is something that lots of companies have done in the past, not something that we were very aware of, and that was one of the reasons why we chose this as the first use case for Cassandra.” - John Turner, Paddy Power
![Page 10: Time series with Apache Cassandra - Long version](https://reader034.vdocument.in/reader034/viewer/2022042714/54b6ca784a7959e5268b47f4/html5/thumbnails/10.jpg)
Internet Of Things
• 15B devices by 2015
• 40B devices by 2020!
![Page 11: Time series with Apache Cassandra - Long version](https://reader034.vdocument.in/reader034/viewer/2022042714/54b6ca784a7959e5268b47f4/html5/thumbnails/11.jpg)
Why Cassandra for Time Series
• Scales
• Resilient
• Good data model
• Efficient Storage Model
What about that?
![Page 12: Time series with Apache Cassandra - Long version](https://reader034.vdocument.in/reader034/viewer/2022042714/54b6ca784a7959e5268b47f4/html5/thumbnails/12.jpg)
Example 1: Weather Station
• Weather station collects data
• Cassandra stores in sequence
• Application reads in sequence
![Page 13: Time series with Apache Cassandra - Long version](https://reader034.vdocument.in/reader034/viewer/2022042714/54b6ca784a7959e5268b47f4/html5/thumbnails/13.jpg)
Use case
• Store data per weather station
• Store time series in order: first to last
Needed Queries
• Get all data for one weather station
• Get data for a single date and time
• Get data for a range of dates and times
Data Model to support queries
![Page 14: Time series with Apache Cassandra - Long version](https://reader034.vdocument.in/reader034/viewer/2022042714/54b6ca784a7959e5268b47f4/html5/thumbnails/14.jpg)
Data Model
• Weather Station Id and Time are unique
• Store as many as needed

CREATE TABLE temperature (
  weatherstation_id text,
  event_time timestamp,
  temperature text,
  PRIMARY KEY (weatherstation_id,event_time)
);

INSERT INTO temperature(weatherstation_id,event_time,temperature)
VALUES ('1234ABCD','2013-04-03 07:01:00','72F');
INSERT INTO temperature(weatherstation_id,event_time,temperature)
VALUES ('1234ABCD','2013-04-03 07:02:00','73F');
INSERT INTO temperature(weatherstation_id,event_time,temperature)
VALUES ('1234ABCD','2013-04-03 07:03:00','73F');
INSERT INTO temperature(weatherstation_id,event_time,temperature)
VALUES ('1234ABCD','2013-04-03 07:04:00','74F');
![Page 15: Time series with Apache Cassandra - Long version](https://reader034.vdocument.in/reader034/viewer/2022042714/54b6ca784a7959e5268b47f4/html5/thumbnails/15.jpg)
Storage Model - Logical View

SELECT weatherstation_id,event_time,temperature
FROM temperature
WHERE weatherstation_id='1234ABCD';

| weatherstation_id | event_time | temperature |
|---|---|---|
| 1234ABCD | 2013-04-03 07:01:00 | 72F |
| 1234ABCD | 2013-04-03 07:02:00 | 73F |
| 1234ABCD | 2013-04-03 07:03:00 | 73F |
| 1234ABCD | 2013-04-03 07:04:00 | 74F |
![Page 16: Time series with Apache Cassandra - Long version](https://reader034.vdocument.in/reader034/viewer/2022042714/54b6ca784a7959e5268b47f4/html5/thumbnails/16.jpg)
Storage Model - Disk Layout

SELECT weatherstation_id,event_time,temperature
FROM temperature
WHERE weatherstation_id='1234ABCD';

Merged, Sorted and Stored Sequentially

| Row Key | 2013-04-03 07:01:00 | 2013-04-03 07:02:00 | 2013-04-03 07:03:00 | 2013-04-03 07:04:00 | 2013-04-03 07:05:00 | 2013-04-03 07:06:00 |
|---|---|---|---|---|---|---|
| 1234ABCD | 72F | 73F | 73F | 74F | 74F | 75F |
![Page 17: Time series with Apache Cassandra - Long version](https://reader034.vdocument.in/reader034/viewer/2022042714/54b6ca784a7959e5268b47f4/html5/thumbnails/17.jpg)
Query patterns
• Range queries
• “Slice” operation on disk

SELECT weatherstation_id,event_time,temperature
FROM temperature
WHERE weatherstation_id='1234ABCD'
AND event_time >= '2013-04-03 07:01:00'
AND event_time <= '2013-04-03 07:04:00';

| Row Key | 2013-04-03 07:01:00 | 2013-04-03 07:02:00 | 2013-04-03 07:03:00 | 2013-04-03 07:04:00 | 2013-04-03 07:05:00 | 2013-04-03 07:06:00 |
|---|---|---|---|---|---|---|
| 1234ABCD | 72F | 73F | 73F | 74F | 74F | 75F |

Single seek on disk (the cells from 07:01:00 to 07:04:00 are contiguous)
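The slice idea can be sketched outside Cassandra. The Python below is only an illustration, not Cassandra's storage engine: it assumes one partition held as a list already sorted by event_time, and the function name `slice_range` is hypothetical. Because the cells are sorted, a range query reduces to finding two positions and reading the run between them.

```python
from bisect import bisect_left, bisect_right

# One partition (weatherstation_id='1234ABCD'), cells kept sorted by event_time.
# Values are the sample readings from the slides.
partition = [
    ("2013-04-03 07:01:00", "72F"),
    ("2013-04-03 07:02:00", "73F"),
    ("2013-04-03 07:03:00", "73F"),
    ("2013-04-03 07:04:00", "74F"),
    ("2013-04-03 07:05:00", "74F"),
    ("2013-04-03 07:06:00", "75F"),
]


def slice_range(cells, start, end):
    """Return the contiguous run of cells with start <= event_time <= end."""
    times = [t for t, _ in cells]   # fixed-width timestamps sort chronologically
    lo = bisect_left(times, start)  # first cell at or after start
    hi = bisect_right(times, end)   # one past the last cell at or before end
    return cells[lo:hi]


rows = slice_range(partition, "2013-04-03 07:01:00", "2013-04-03 07:04:00")
for event_time, temperature in rows:
    print("1234ABCD", event_time, temperature)
```

The result comes back already ordered by event_time, which is why no sort step is needed after the read.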
![Page 18: Time series with Apache Cassandra - Long version](https://reader034.vdocument.in/reader034/viewer/2022042714/54b6ca784a7959e5268b47f4/html5/thumbnails/18.jpg)
Query patterns
• Range queries
• “Slice” operation on disk

SELECT weatherstation_id,event_time,temperature
FROM temperature
WHERE weatherstation_id='1234ABCD'
AND event_time >= '2013-04-03 07:01:00'
AND event_time <= '2013-04-03 07:04:00';

| weatherstation_id | event_time | temperature |
|---|---|---|
| 1234ABCD | 2013-04-03 07:01:00 | 72F |
| 1234ABCD | 2013-04-03 07:02:00 | 73F |
| 1234ABCD | 2013-04-03 07:03:00 | 73F |
| 1234ABCD | 2013-04-03 07:04:00 | 74F |

Sorted by event_time
Programmers like this
![Page 19: Time series with Apache Cassandra - Long version](https://reader034.vdocument.in/reader034/viewer/2022042714/54b6ca784a7959e5268b47f4/html5/thumbnails/19.jpg)
Additional help on the storage engine
![Page 20: Time series with Apache Cassandra - Long version](https://reader034.vdocument.in/reader034/viewer/2022042714/54b6ca784a7959e5268b47f4/html5/thumbnails/20.jpg)
SSTable seeks
• Each read minimum 1 seek
• Cache and bloom filter help minimize
Total seek time = Disk Latency * number of seeks
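As a worked instance of the formula above, here is a small Python sketch. The 10 ms latency and the SSTable counts are hypothetical figures chosen for illustration; they do not come from the talk.

```python
def total_seek_time_ms(disk_latency_ms, number_of_seeks):
    """Total seek time = disk latency * number of seeks (the slide's formula)."""
    return disk_latency_ms * number_of_seeks


# Hypothetical figures: 10 ms per seek on a spinning disk.
one_sstable = total_seek_time_ms(10, 1)    # best case: a single seek
four_sstables = total_seek_time_ms(10, 4)  # same row spread over four SSTables

print(one_sstable, "ms vs", four_sstables, "ms")
```

The gap between the two numbers is the reason every skipped SSTable matters for read latency.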
![Page 21: Time series with Apache Cassandra - Long version](https://reader034.vdocument.in/reader034/viewer/2022042714/54b6ca784a7959e5268b47f4/html5/thumbnails/21.jpg)
The key to speed
Use the first part of the primary key to get the node (data localization)
Minimize seeks for SSTables (Bloom Filter, Key Cache up-to-date)
Find the data fast in the SSTable (Indexes)
![Page 22: Time series with Apache Cassandra - Long version](https://reader034.vdocument.in/reader034/viewer/2022042714/54b6ca784a7959e5268b47f4/html5/thumbnails/22.jpg)
Min/Max Value Hint
• New since 2.0
• Range index on primary key values per SSTable
• Minimizes seeks on range data
CASSANDRA-5514 if you are interested in details

SELECT event_time,temperature
FROM temperature
WHERE weatherstation_id='1234ABCD'
AND event_time >= '2013-04-03 07:01:00'
AND event_time <= '2013-04-03 07:04:00';

Which SSTable?
Row Key: 1234ABCD Min event_time: 2013-04-01 00:00:00 Max event_time: 2013-04-04 23:59:59 (This one)
Row Key: 1234ABCD Min event_time: 2013-04-05 00:00:00 Max event_time: 2013-04-09 23:59:59
Row Key: 1234ABCD Min event_time: 2013-03-27 00:00:00 Max event_time: 2013-03-31 23:59:59
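The pruning idea behind the min/max hint can be sketched in a few lines of Python. This is an illustration of the concept only, not the code from CASSANDRA-5514: the labels `sstable-1` to `sstable-3` and the function `candidates` are hypothetical, while the min and max values are the ones shown on the slide.

```python
# (label, min event_time, max event_time) per SSTable for row key 1234ABCD.
# The labels are hypothetical; the ranges are the ones on the slide.
sstables = [
    ("sstable-1", "2013-04-01 00:00:00", "2013-04-04 23:59:59"),
    ("sstable-2", "2013-04-05 00:00:00", "2013-04-09 23:59:59"),
    ("sstable-3", "2013-03-27 00:00:00", "2013-03-31 23:59:59"),
]


def candidates(tables, start, end):
    """Keep only SSTables whose [min, max] range overlaps the queried [start, end]."""
    return [
        label
        for label, min_time, max_time in tables
        if min_time <= end and max_time >= start
    ]


hits = candidates(sstables, "2013-04-03 07:01:00", "2013-04-03 07:04:00")
print(hits)
```

Only the SSTable covering 2013-04-01 through 2013-04-04 survives the check, so the other two are never touched on disk.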
![Page 23: Time series with Apache Cassandra - Long version](https://reader034.vdocument.in/reader034/viewer/2022042714/54b6ca784a7959e5268b47f4/html5/thumbnails/23.jpg)
Ingestion models
• Apache Kafka
• Apache Flume
• Storm
• Spark Streaming
• Custom Applications
[Diagram: Apache Kafka and your totally killer application]
![Page 24: Time series with Apache Cassandra - Long version](https://reader034.vdocument.in/reader034/viewer/2022042714/54b6ca784a7959e5268b47f4/html5/thumbnails/24.jpg)
Kafka + Storm
• Kafka provides reliable queuing
• Storm processes (rollups, counts)
• Cassandra stores at the same speed
• Storm lookup on Cassandra
[Diagram: Queue (Apache Kafka) → Process (Apache Storm) → Store]
![Page 25: Time series with Apache Cassandra - Long version](https://reader034.vdocument.in/reader034/viewer/2022042714/54b6ca784a7959e5268b47f4/html5/thumbnails/25.jpg)
Flume
• Source accepts data
• Channel buffers data
• Sink processes and stores
• Popular for log processing
[Diagram: Application, Load Balancer, Syslog → Source → Channel → Sink]
![Page 26: Time series with Apache Cassandra - Long version](https://reader034.vdocument.in/reader034/viewer/2022042714/54b6ca784a7959e5268b47f4/html5/thumbnails/26.jpg)
Dealing with data at speed
• 1 million writes per second?
• 1 insert every microsecond
• Collisions?
• Primary Key determines node placement
• Random partitioning
• Special data type - TimeUUID
[Diagram: your totally killer application writes weatherstation_id='1234ABCD' and weatherstation_id='5678EFGH']
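To see why collisions are a concern at this rate, here is a minimal Python sketch. It assumes simulated events exactly one microsecond apart (the slide's 1 million writes per second) and compares a millisecond-resolution timestamp key with a version 1 UUID from Python's standard `uuid` module, which stands in for Cassandra's TimeUUID type here.

```python
import uuid

# Simulated arrival times: 1,000 events exactly one microsecond apart,
# expressed in microseconds since an arbitrary start.
event_times_us = range(1000)

# Keying on a millisecond timestamp: events in the same millisecond share a key,
# so with a (station, timestamp) primary key later writes would overwrite earlier ones.
millisecond_keys = {t // 1000 for t in event_times_us}

# Keying on a version 1 (time-based) UUID instead: one fresh value per event.
timeuuid_keys = {uuid.uuid1() for _ in event_times_us}

print(len(millisecond_keys), "distinct millisecond keys")
print(len(timeuuid_keys), "distinct version 1 UUIDs")
```

With a millisecond key, all 1,000 events collapse onto one key, while the UUID-based key keeps each of them distinct.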
![Page 27: Time series with Apache Cassandra - Long version](https://reader034.vdocument.in/reader034/viewer/2022042714/54b6ca784a7959e5268b47f4/html5/thumbnails/27.jpg)
How does data replicate?
![Page 28: Time series with Apache Cassandra - Long version](https://reader034.vdocument.in/reader034/viewer/2022042714/54b6ca784a7959e5268b47f4/html5/thumbnails/28.jpg)
Partitioning
Primary key determines placement*

| key | age | car | gender |
|---|---|---|---|
| jim | 36 | camaro | M |
| carol | 37 | subaru | F |
| johnny | 12 | | M |
| suzy | 10 | | F |
![Page 29: Time series with Apache Cassandra - Long version](https://reader034.vdocument.in/reader034/viewer/2022042714/54b6ca784a7959e5268b47f4/html5/thumbnails/29.jpg)
Key Hashing

| PK | MD5 Hash |
|---|---|
| jim | 5e02739678... |
| carol | a9a0198010... |
| johnny | f4eb27cea7... |
| suzy | 78b421309e... |

MD5* hash operation yields a 128-bit number for keys of any size.
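A minimal Python sketch of the hashing step, using the standard `hashlib` module. The function name `md5_token` is hypothetical, and this shows only the general idea of turning a key into a 128-bit number; it is not claimed to reproduce Cassandra's exact token computation, and the printed values are not checked against the truncated hashes on the slide.

```python
import hashlib


def md5_token(partition_key: str) -> int:
    """Hash a key of any length to a 128-bit integer using MD5."""
    digest = hashlib.md5(partition_key.encode("utf-8")).digest()  # always 16 bytes
    return int.from_bytes(digest, "big")


for key in ["jim", "carol", "johnny", "suzy"]:
    # Print each token as 32 hex digits (128 bits).
    print(key, format(md5_token(key), "032x"))
```

Whatever the key length, the output has the same fixed size, which is what lets the ring be divided into fixed token ranges.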
![Page 30: Time series with Apache Cassandra - Long version](https://reader034.vdocument.in/reader034/viewer/2022042714/54b6ca784a7959e5268b47f4/html5/thumbnails/30.jpg)
The Token Ring
[Diagram: ring of four nodes: Node A, Node B, Node C, Node D]
![Page 31: Time series with Apache Cassandra - Long version](https://reader034.vdocument.in/reader034/viewer/2022042714/54b6ca784a7959e5268b47f4/html5/thumbnails/31.jpg)
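| PK | MD5 Hash |
|---|---|
| jim | 5e02739678... |
| carol | a9a0198010... |
| johnny | f4eb27cea7... |
| suzy | 78b421309e... |

| Node | Start | End |
|---|---|---|
| A | 0xc000000000..1 | 0x0000000000..0 |
| B | 0x0000000000..1 | 0x4000000000..0 |
| C | 0x4000000000..1 | 0x8000000000..0 |
| D | 0x8000000000..1 | 0xc000000000..0 |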
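The lookup from hash to node can be sketched as a search over the range end tokens. The Python below assumes the four ranges from the table, read as 128-bit values, and uses hypothetical names (`RING`, `owner`). The slide truncates the hashes, but with these boundaries the leading byte alone decides the owner, so only that byte is used in the examples.

```python
from bisect import bisect_left

# End token of each node's range; a node owns (previous end, own end].
RING = [
    (0x00 << 120, "A"),  # A's range wraps around the top of the token space
    (0x40 << 120, "B"),
    (0x80 << 120, "C"),
    (0xC0 << 120, "D"),
]
END_TOKENS = [end for end, _ in RING]


def owner(token: int) -> str:
    """Return the node whose range contains a 128-bit token."""
    idx = bisect_left(END_TOKENS, token)  # first range end >= token
    return RING[idx % len(RING)][1]       # past the last end wraps back to A


# Leading bytes of the hashes on the slide: jim 5e, carol a9, johnny f4, suzy 78.
for name, leading_byte in [("jim", 0x5E), ("carol", 0xA9), ("johnny", 0xF4), ("suzy", 0x78)]:
    print(name, "->", "Node", owner(leading_byte << 120))
```

Under these assumptions jim and suzy land on Node C, carol on Node D and johnny on Node A.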
![Page 32: Time series with Apache Cassandra - Long version](https://reader034.vdocument.in/reader034/viewer/2022042714/54b6ca784a7959e5268b47f4/html5/thumbnails/32.jpg)
![Page 33: Time series with Apache Cassandra - Long version](https://reader034.vdocument.in/reader034/viewer/2022042714/54b6ca784a7959e5268b47f4/html5/thumbnails/33.jpg)
![Page 34: Time series with Apache Cassandra - Long version](https://reader034.vdocument.in/reader034/viewer/2022042714/54b6ca784a7959e5268b47f4/html5/thumbnails/34.jpg)
![Page 35: Time series with Apache Cassandra - Long version](https://reader034.vdocument.in/reader034/viewer/2022042714/54b6ca784a7959e5268b47f4/html5/thumbnails/35.jpg)
![Page 36: Time series with Apache Cassandra - Long version](https://reader034.vdocument.in/reader034/viewer/2022042714/54b6ca784a7959e5268b47f4/html5/thumbnails/36.jpg)
Replication
carol a9a0198010...
[Diagram: ring of four nodes: Node A, Node B, Node C, Node D]
![Page 37: Time series with Apache Cassandra - Long version](https://reader034.vdocument.in/reader034/viewer/2022042714/54b6ca784a7959e5268b47f4/html5/thumbnails/37.jpg)
![Page 38: Time series with Apache Cassandra - Long version](https://reader034.vdocument.in/reader034/viewer/2022042714/54b6ca784a7959e5268b47f4/html5/thumbnails/38.jpg)
Replication
Replication factor = 3
carol a9a0198010...
[Diagram: ring of four nodes: Node A, Node B, Node C, Node D]
Consistency is a different topic for later
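One way to picture replication factor 3 is "the owning node plus the next two nodes around the ring". The Python sketch below assumes that simple placement rule, which the slides do not spell out, together with the four token ranges shown earlier. The names `NODES`, `RANGE_ENDS` and `replicas` are hypothetical.

```python
NODES = ["A", "B", "C", "D"]  # ring order
RANGE_ENDS = [0x00 << 120, 0x40 << 120, 0x80 << 120, 0xC0 << 120]


def replicas(token: int, replication_factor: int = 3) -> list:
    """Primary owner of the token plus the next nodes around the ring."""
    # First node whose range end is >= token; tokens past the last end wrap to A.
    primary = next((i for i, end in enumerate(RANGE_ENDS) if token <= end), 0)
    return [NODES[(primary + k) % len(NODES)] for k in range(replication_factor)]


# carol's hash on the slide starts with a9, which falls in Node D's range.
carol = replicas(0xA9 << 120)
print("carol ->", carol)
```

Under this assumption carol is stored on Node D and copied to Nodes A and B, so losing any single node still leaves two copies.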
![Page 39: Time series with Apache Cassandra - Long version](https://reader034.vdocument.in/reader034/viewer/2022042714/54b6ca784a7959e5268b47f4/html5/thumbnails/39.jpg)
TimeUUID
• Also known as a Version 1 UUID
• Sortable
• Reversible
Timestamp to Microsecond + UUID = TimeUUID
04d580b0-9412-11e3-baa8-0800200c9a66 = Wednesday, February 12, 2014 6:18:06 PM GMT
http://www.famkruithof.net/uuid/uuidgen
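"Reversible" means the timestamp can be read back out of the UUID. Here is a sketch using Python's standard `uuid` module; the helper name `timeuuid_to_datetime` is hypothetical. A version 1 UUID stores its time as a count of 100-nanosecond intervals since 1582-10-15, so the code shifts that count to the Unix epoch before converting.

```python
import uuid
from datetime import datetime, timedelta, timezone

# Number of 100-ns intervals between the UUID epoch (1582-10-15) and 1970-01-01.
UUID_EPOCH_OFFSET = 0x01B21DD213814000


def timeuuid_to_datetime(value: str) -> datetime:
    """Recover the UTC timestamp embedded in a version 1 UUID."""
    u = uuid.UUID(value)
    if u.version != 1:
        raise ValueError("not a version 1 (time-based) UUID")
    unix_ticks = u.time - UUID_EPOCH_OFFSET  # 100-ns ticks since 1970-01-01
    return datetime(1970, 1, 1, tzinfo=timezone.utc) + timedelta(
        microseconds=unix_ticks // 10
    )


when = timeuuid_to_datetime("04d580b0-9412-11e3-baa8-0800200c9a66")
print(when.isoformat())  # 2014-02-12T18:18:06.267000+00:00
```

The decoded value agrees with the date shown on the slide for the same UUID.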
![Page 40: Time series with Apache Cassandra - Long version](https://reader034.vdocument.in/reader034/viewer/2022042714/54b6ca784a7959e5268b47f4/html5/thumbnails/40.jpg)
Example 2: Financial Transactions
• Trading of stocks
• When did they happen?
• Massive speeds and volumes
“Sirca, a non-profit university consortium based in Sydney, is the world’s biggest broker of financial data, ingesting into its database 2 million pieces of information a second from every major trading exchange.”*
* http://www.theage.com.au/it-pro/business-it/help-poverty-theres-an-app-for-that-20140120-hv948.html
![Page 41: Time series with Apache Cassandra - Long version](https://reader034.vdocument.in/reader034/viewer/2022042714/54b6ca784a7959e5268b47f4/html5/thumbnails/41.jpg)
Use case
• Store data per symbol and date
• Store time series in reverse order: last to first
• Make sure every transaction is unique
Needed Queries
• Get all trades for symbol and day
• Get trade for a single date and time
• Get last 10 trades for symbol and date
Data Model to support queries
![Page 42: Time series with Apache Cassandra - Long version](https://reader034.vdocument.in/reader034/viewer/2022042714/54b6ca784a7959e5268b47f4/html5/thumbnails/42.jpg)
Data Model
• date is int of days since epoch
• timeuuid keeps it unique
• Reverse the times for later queries

CREATE TABLE stock_ticks (
  symbol text,
  date int,
  trade timeuuid,
  trade_details text,
  PRIMARY KEY ((symbol, date), trade)
) WITH CLUSTERING ORDER BY (trade DESC);

INSERT INTO stock_ticks(symbol, date, trade, trade_details)
VALUES ('NFLX',340,04d580b0-1431-1e33-baf8-0833200c98a6,'BUY:2000');
INSERT INTO stock_ticks(symbol, date, trade, trade_details)
VALUES ('NFLX',340,05d580b0-6472-1ef3-a3a8-0430200c9a66,'BUY:300');
INSERT INTO stock_ticks(symbol, date, trade, trade_details)
VALUES ('NFLX',340,02d580b0-9412-d223-55a8-0976200c9a25,'SELL:450');
INSERT INTO stock_ticks(symbol, date, trade, trade_details)
VALUES ('NFLX',340,08d580b0-4482-11e3-5fd3-3421200c9a65,'SELL:3000');
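The date column is described as an int of days since epoch. A small Python sketch of that conversion follows, assuming the Unix epoch (1970-01-01), which the slide does not state explicitly. The helper name `days_since_epoch` is hypothetical, and the value 340 in the inserts is treated as a sample bucket only.

```python
from datetime import date

EPOCH = date(1970, 1, 1)  # assumption: the slide's "epoch" is the Unix epoch


def days_since_epoch(day: date) -> int:
    """Whole days between the epoch and the given calendar date."""
    return (day - EPOCH).days


bucket = days_since_epoch(date(2014, 2, 12))
print(bucket)  # 16113
```

Using the day number as part of the partition key keeps each symbol's trades for one day together in a single, bounded partition.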
![Page 43: Time series with Apache Cassandra - Long version](https://reader034.vdocument.in/reader034/viewer/2022042714/54b6ca784a7959e5268b47f4/html5/thumbnails/43.jpg)
Storage Model - Logical View

SELECT trade,trade_details
FROM stock_ticks
WHERE symbol='NFLX' AND date=340;

| symbol:date | trade | trade_details | |
|---|---|---|---|
| NFLX:340 | 08d580b0-4482-11e3-5fd3-3421200c9a65 | SELL:3000 | Last thing inserted |
| NFLX:340 | 02d580b0-9412-d223-55a8-0976200c9a25 | SELL:450 | |
| NFLX:340 | 05d580b0-6472-1ef3-a3a8-0430200c9a66 | BUY:300 | |
| NFLX:340 | 04d580b0-1431-1e33-baf8-0833200c98a6 | BUY:2000 | First thing inserted |
![Page 44: Time series with Apache Cassandra - Long version](https://reader034.vdocument.in/reader034/viewer/2022042714/54b6ca784a7959e5268b47f4/html5/thumbnails/44.jpg)
Storage Model - Disk Layout

SELECT trade,trade_details
FROM stock_ticks
WHERE symbol='NFLX' AND date=340;

Order is from last trade to first

| Row Key | 08d580b0-4482-11e3-5fd3-3421200c9a65 | 02d580b0-9412-d223-55a8 | 05d580b0-6472-1ef3-a3a8-0430200c9a66 | 04d580b0-1431-1e33-baf8-0833200c98a6 |
|---|---|---|---|---|
| NFLX:340 | SELL:3000 | SELL:450 | BUY:300 | BUY:2000 |
![Page 45: Time series with Apache Cassandra - Long version](https://reader034.vdocument.in/reader034/viewer/2022042714/54b6ca784a7959e5268b47f4/html5/thumbnails/45.jpg)
Query patterns
• Limit queries
• Get last X trades

SELECT trade,trade_details
FROM stock_ticks
WHERE symbol='NFLX' AND date=340
LIMIT 3;

| Row Key | 08d580b0-4482-11e3-5fd3-3421200c9a65 | 02d580b0-9412-d223-55a8-0976200c9a25 | 05d580b0-6472-1ef3-a3a8-0430200c9a66 | 04d580b0-1431-1e33-baf8-0833200c98a6 |
|---|---|---|---|---|
| NFLX:340 | SELL:3000 (from here) | SELL:450 | BUY:300 (to here) | BUY:2000 |
![Page 46: Time series with Apache Cassandra - Long version](https://reader034.vdocument.in/reader034/viewer/2022042714/54b6ca784a7959e5268b47f4/html5/thumbnails/46.jpg)
Query patterns
• Limit queries
• Get last X trades

SELECT trade,trade_details
FROM stock_ticks
WHERE symbol='NFLX' AND date=340
LIMIT 3;

Reverse sorted by trade
Last 3 trades

| symbol:date | trade | trade_details |
|---|---|---|
| NFLX:340 | 08d580b0-4482-11e3-5fd3-3421200c9a65 | SELL:3000 |
| NFLX:340 | 02d580b0-9412-d223-55a8-0976200c9a25 | SELL:450 |
| NFLX:340 | 05d580b0-6472-1ef3-a3a8-0430200c9a66 | BUY:300 |
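Why the reverse clustering order makes "last X trades" cheap can be shown with a toy model. The Python below is an illustration, not Cassandra's implementation: the class `DescendingPartition` is hypothetical, and plain integer sequence numbers stand in for the timeuuid ordering. Cells are kept newest first at write time, so a limit query only reads the head of the partition.

```python
from bisect import insort


class DescendingPartition:
    """Cells for one (symbol, date) partition, newest trade first."""

    def __init__(self):
        # (negated sequence, details): ascending sort on the negated
        # sequence keeps the newest trade at the front.
        self._cells = []

    def insert(self, sequence: int, details: str) -> None:
        insort(self._cells, (-sequence, details))

    def limit(self, n: int) -> list:
        """Read the first n cells, similar in spirit to LIMIT n; no re-sort needed."""
        return [details for _, details in self._cells[:n]]


nflx_340 = DescendingPartition()
# Same insertion order as the slide: BUY:2000 first, SELL:3000 last.
for seq, details in enumerate(["BUY:2000", "BUY:300", "SELL:450", "SELL:3000"]):
    nflx_340.insert(seq, details)

last_three = nflx_340.limit(3)
print(last_three)  # ['SELL:3000', 'SELL:450', 'BUY:300']
```

The sorting cost is paid once on write, which suits a workload that mostly asks for the most recent trades.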
![Page 47: Time series with Apache Cassandra - Long version](https://reader034.vdocument.in/reader034/viewer/2022042714/54b6ca784a7959e5268b47f4/html5/thumbnails/47.jpg)
Way more examples
• 5 minute interviews
• Use cases
• Free training!
www.planetcassandra.org
![Page 48: Time series with Apache Cassandra - Long version](https://reader034.vdocument.in/reader034/viewer/2022042714/54b6ca784a7959e5268b47f4/html5/thumbnails/48.jpg)
![Page 49: Time series with Apache Cassandra - Long version](https://reader034.vdocument.in/reader034/viewer/2022042714/54b6ca784a7959e5268b47f4/html5/thumbnails/49.jpg)
Thank You!
Follow me for more updates all the time: @PatrickMcFadin