Time Series with Apache Cassandra (Strata)
DESCRIPTION
This talk is geared around understanding the basics of how Apache Cassandra stores and accesses time series data.
TRANSCRIPT
©2013 DataStax Confidential. Do not distribute without consent.
@PatrickMcFadin
Patrick McFadin, Chief Evangelist
Time Series with Apache Cassandra
Quick intro to Cassandra
• Shared nothing
• Masterless peer-to-peer
• Based on Dynamo
Scaling
• Add nodes to scale
• Millions Ops/s
[Chart: throughput (ops/sec) for Cassandra, HBase, Redis and MySQL]
Uptime
• Built to replicate
• Resilient to failure
• Always on
Easy to use
• CQL is a familiar syntax
• Friendly to programmers
• Paxos for locking
CREATE TABLE users (
  username varchar,
  firstname varchar,
  lastname varchar,
  email list<varchar>,
  password varchar,
  created_date timestamp,
  PRIMARY KEY (username)
);

INSERT INTO users (username, firstname, lastname,
  email, password, created_date)
VALUES ('pmcfadin','Patrick','McFadin',
  ['[email protected]'],'ba27e03fd95e507daf2937c937d499ab',
  '2011-06-20 13:50:00');

INSERT INTO users (username, firstname,
  lastname, email, password, created_date)
VALUES ('pmcfadin','Patrick','McFadin',
  ['[email protected]'],
  'ba27e03fd95e507daf2937c937d499ab',
  '2011-06-20 13:50:00')
IF NOT EXISTS;
Time series in production
• It’s all about “What’s happening”
• Data is the new currency
“Sirca, a non-profit university consortium based in Sydney, is the world’s biggest broker of financial data, ingesting into its database 2 million pieces of information a second from every major trading exchange.”*
* http://www.theage.com.au/it-pro/business-it/help-poverty-theres-an-app-for-that-20140120-hv948.html
Why Cassandra for Time Series
• Scales
• Resilient
• Good data model
• Efficient Storage Model
What about that?
Data Model
• Weather Station Id and Time are unique
• Store as many as needed
CREATE TABLE temperature (
  weatherstation_id text,
  event_time timestamp,
  temperature text,
  PRIMARY KEY (weatherstation_id,event_time)
);

INSERT INTO temperature(weatherstation_id,event_time,temperature)
VALUES ('1234ABCD','2013-04-03 07:01:00','72F');
INSERT INTO temperature(weatherstation_id,event_time,temperature)
VALUES ('1234ABCD','2013-04-03 07:02:00','73F');
INSERT INTO temperature(weatherstation_id,event_time,temperature)
VALUES ('1234ABCD','2013-04-03 07:03:00','73F');
INSERT INTO temperature(weatherstation_id,event_time,temperature)
VALUES ('1234ABCD','2013-04-03 07:04:00','74F');
Storage Model - Logical View
SELECT weatherstation_id,event_time,temperature FROM temperature WHERE weatherstation_id='1234ABCD';

weatherstation_id  event_time           temperature
1234ABCD           2013-04-03 07:01:00  72F
1234ABCD           2013-04-03 07:02:00  73F
1234ABCD           2013-04-03 07:03:00  73F
1234ABCD           2013-04-03 07:04:00  74F
Storage Model - Disk Layout
SELECT weatherstation_id,event_time,temperature FROM temperature WHERE weatherstation_id='1234ABCD';

1234ABCD
  2013-04-03 07:01:00  72F
  2013-04-03 07:02:00  73F
  2013-04-03 07:03:00  73F
  2013-04-03 07:04:00  74F
  2013-04-03 07:05:00  74F
  2013-04-03 07:06:00  75F
Merged, Sorted and Stored Sequentially
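The "Merged, Sorted and Stored Sequentially" idea can be illustrated with a toy sketch. It assumes two hypothetical flushed runs (`run_a`, `run_b`) for partition key 1234ABCD, each already sorted by event_time, and merges them into one ordered sequence. It is a simplified model for intuition, not Cassandra's actual storage engine.

```python
import heapq

# Hypothetical runs for partition key '1234ABCD'. Each run is already sorted
# by event_time (ISO-style timestamps sort correctly as plain strings).
run_a = [
    ("2013-04-03 07:01:00", "72F"),
    ("2013-04-03 07:03:00", "73F"),
    ("2013-04-03 07:05:00", "74F"),
]
run_b = [
    ("2013-04-03 07:02:00", "73F"),
    ("2013-04-03 07:04:00", "74F"),
    ("2013-04-03 07:06:00", "75F"),
]


def merge_runs(*runs):
    """Merge already-sorted runs into a single list ordered by event_time."""
    return list(heapq.merge(*runs))


merged = merge_runs(run_a, run_b)
for event_time, temperature in merged:
    print(event_time, temperature)
```

Because each input is already sorted, the merge reads every run once, front to back, and never re-sorts the whole partition.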
Query patterns
• Range queries
• “Slice” operation on disk

SELECT event_time,temperature FROM temperature WHERE weatherstation_id='1234ABCD' AND event_time > '2013-04-03 07:01:00' AND event_time < '2013-04-03 07:04:00';

1234ABCD
  2013-04-03 07:01:00  72F
  2013-04-03 07:02:00  73F
  2013-04-03 07:03:00  73F
  2013-04-03 07:04:00  74F
  2013-04-03 07:05:00  74F
  2013-04-03 07:06:00  75F
Single seek on disk
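A rough way to picture the "slice" is a binary search into a partition that is already sorted by event_time, followed by one contiguous read. The sketch below is a toy in-memory model under that assumption; `slice_partition` is a hypothetical helper, not a Cassandra API.

```python
from bisect import bisect_left, bisect_right

# Toy model of one partition: cells for '1234ABCD', kept sorted by event_time.
partition = [
    ("2013-04-03 07:01:00", "72F"),
    ("2013-04-03 07:02:00", "73F"),
    ("2013-04-03 07:03:00", "73F"),
    ("2013-04-03 07:04:00", "74F"),
    ("2013-04-03 07:05:00", "74F"),
    ("2013-04-03 07:06:00", "75F"),
]


def slice_partition(cells, after, before):
    """Return cells with after < event_time < before as one contiguous slice."""
    times = [t for t, _ in cells]
    lo = bisect_right(times, after)   # index of first cell strictly after `after`
    hi = bisect_left(times, before)   # index of first cell at or past `before`
    return cells[lo:hi]


rows = slice_partition(partition, "2013-04-03 07:01:00", "2013-04-03 07:04:00")
for event_time, temperature in rows:
    print(event_time, temperature)
```

With the strict bounds used in the query, the slice holds the 07:02:00 and 07:03:00 readings. The point is that the matching cells sit next to each other, so one positioned read covers the whole range.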
Query patterns
• Range queries
• “Slice” operation on disk

SELECT event_time,temperature FROM temperature WHERE weatherstation_id='1234ABCD' AND event_time > '2013-04-03 07:01:00' AND event_time < '2013-04-03 07:04:00';

weatherstation_id  event_time           temperature
1234ABCD           2013-04-03 07:01:00  72F
1234ABCD           2013-04-03 07:02:00  73F
1234ABCD           2013-04-03 07:03:00  73F
1234ABCD           2013-04-03 07:04:00  74F
Sorted by event_time
Programmers like this
Ingestion models
• Apache Kafka
• Apache Flume
• Storm
• Custom Applications
[Diagram: Apache Kafka and "Your totally killer application"]
Dealing with data at speed
• 1 million writes per second?
• 1 insert every microsecond
• Collisions?

• Primary Key determines node placement
• Random partitioning
• Special data type - TimeUUID
[Diagram: "Your totally killer application" with weatherstation_id='1234ABCD' and weatherstation_id='5678EFGH']
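To make "Primary Key determines node placement" concrete, here is a deliberately simplified sketch: hash the partition key to a token, then map the token to a node. The node names, the MD5 stand-in hash and the modulo mapping are all assumptions for illustration. A real cluster assigns token ranges on a ring and uses its configured partitioner.

```python
import hashlib

# Hypothetical four-node cluster; the names are placeholders.
NODES = ["node-a", "node-b", "node-c", "node-d"]


def token_for(partition_key: str) -> int:
    """Hash the partition key to a large integer token (MD5 as a stand-in)."""
    digest = hashlib.md5(partition_key.encode("utf-8")).digest()
    return int.from_bytes(digest, "big")


def node_for(partition_key: str, nodes=NODES) -> str:
    """Pick a node from the token. Simplification: modulo, not ring ranges."""
    return nodes[token_for(partition_key) % len(nodes)]


for key in ("1234ABCD", "5678EFGH"):
    print(key, "->", node_for(key))
```

The same weatherstation_id always hashes to the same node, so all readings for one station stay together, while different stations spread across the cluster.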
TimeUUID
• Also known as a Version 1 UUID
• Sortable
• Reversible
Timestamp to Microsecond + UUID = TimeUUID
04d580b0-9412-11e3-baa8-0800200c9a66 = Wednesday, February 12, 2014 6:18:06 PM GMT
http://www.famkruithof.net/uuid/uuidgen
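The "Reversible" property can be checked with Python's standard `uuid` module. This is a minimal sketch: a version 1 UUID embeds a count of 100-nanosecond intervals since 1582-10-15, so subtracting the offset to the Unix epoch recovers the wall-clock time. The helper name `timeuuid_to_datetime` is hypothetical.

```python
import uuid
from datetime import datetime, timedelta, timezone

# Number of 100 ns intervals between 1582-10-15 (UUID epoch) and 1970-01-01.
GREGORIAN_TO_UNIX_100NS = 0x01B21DD213814000


def timeuuid_to_datetime(value: str) -> datetime:
    """Recover the UTC timestamp embedded in a version 1 (time-based) UUID."""
    u = uuid.UUID(value)
    if u.version != 1:
        raise ValueError("not a version 1 (time-based) UUID")
    unix_100ns = u.time - GREGORIAN_TO_UNIX_100NS
    epoch = datetime(1970, 1, 1, tzinfo=timezone.utc)
    return epoch + timedelta(microseconds=unix_100ns // 10)


when = timeuuid_to_datetime("04d580b0-9412-11e3-baa8-0800200c9a66")
print(when.isoformat())
```

Applied to the UUID from the slide, this decodes to 2014-02-12 18:18:06 UTC, matching the date shown.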
Way more information
• 5 minute interviews
• Use cases
• Free training!
www.planetcassandra.org
Thank You!
Follow me for more updates all the time: @PatrickMcFadin