![Page 1: Storing time series data with Apache Cassandra](https://reader030.vdocument.in/reader030/viewer/2022020101/55bec1cdbb61eb0d7b8b4792/html5/thumbnails/1.jpg)
@PatrickMcFadin
Patrick McFadin, Chief Evangelist for Apache Cassandra, DataStax
Storing Time Series Data with Apache Cassandra
![Page 2: Storing time series data with Apache Cassandra](https://reader030.vdocument.in/reader030/viewer/2022020101/55bec1cdbb61eb0d7b8b4792/html5/thumbnails/2.jpg)
My Background
…ran into this problem
![Page 3: Storing time series data with Apache Cassandra](https://reader030.vdocument.in/reader030/viewer/2022020101/55bec1cdbb61eb0d7b8b4792/html5/thumbnails/3.jpg)
Gave it my best shot
shard 1 shard 2 shard 3 shard 4
router
client
Patrick, all your wildest dreams will come true.
![Page 4: Storing time series data with Apache Cassandra](https://reader030.vdocument.in/reader030/viewer/2022020101/55bec1cdbb61eb0d7b8b4792/html5/thumbnails/4.jpg)
Just add complexity!
![Page 5: Storing time series data with Apache Cassandra](https://reader030.vdocument.in/reader030/viewer/2022020101/55bec1cdbb61eb0d7b8b4792/html5/thumbnails/5.jpg)
A new plan
![Page 6: Storing time series data with Apache Cassandra](https://reader030.vdocument.in/reader030/viewer/2022020101/55bec1cdbb61eb0d7b8b4792/html5/thumbnails/6.jpg)
Dynamo Paper (2007)
• How do we build a data store that is:
  • Reliable
  • Performant
  • “Always On”
• Nothing new and shiny
Evolutionary. Real. Computer Science
Also the basis for Riak and Voldemort
![Page 7: Storing time series data with Apache Cassandra](https://reader030.vdocument.in/reader030/viewer/2022020101/55bec1cdbb61eb0d7b8b4792/html5/thumbnails/7.jpg)
BigTable (2006)
• Richer data model
• 1 key. Lots of values
• Fast sequential access
• 38 papers cited
![Page 8: Storing time series data with Apache Cassandra](https://reader030.vdocument.in/reader030/viewer/2022020101/55bec1cdbb61eb0d7b8b4792/html5/thumbnails/8.jpg)
Cassandra (2008)
• Distributed features of Dynamo
• Data model and storage from BigTable
• Graduated to a top-level Apache project on February 17, 2010
![Page 9: Storing time series data with Apache Cassandra](https://reader030.vdocument.in/reader030/viewer/2022020101/55bec1cdbb61eb0d7b8b4792/html5/thumbnails/9.jpg)
A Data Ocean, Lake, or Pond
An In-Memory Database
A Key-Value Store
A magical database unicorn that farts rainbows
![Page 10: Storing time series data with Apache Cassandra](https://reader030.vdocument.in/reader030/viewer/2022020101/55bec1cdbb61eb0d7b8b4792/html5/thumbnails/10.jpg)
Cassandra for Applications
APACHE
CASSANDRA
![Page 11: Storing time series data with Apache Cassandra](https://reader030.vdocument.in/reader030/viewer/2022020101/55bec1cdbb61eb0d7b8b4792/html5/thumbnails/11.jpg)
Basic Architecture
![Page 12: Storing time series data with Apache Cassandra](https://reader030.vdocument.in/reader030/viewer/2022020101/55bec1cdbb61eb0d7b8b4792/html5/thumbnails/12.jpg)
Row
Partition Key 1: Column 1, Column 2, Column 3, Column 4
![Page 13: Storing time series data with Apache Cassandra](https://reader030.vdocument.in/reader030/viewer/2022020101/55bec1cdbb61eb0d7b8b4792/html5/thumbnails/13.jpg)
Partition
Partition Key 1: Column 1, Column 2, Column 3, Column 4
Partition Key 1: Column 1, Column 2, Column 3, Column 4
Partition Key 1: Column 1, Column 2, Column 3, Column 4
Partition Key 1: Column 1, Column 2, Column 3, Column 4
![Page 14: Storing time series data with Apache Cassandra](https://reader030.vdocument.in/reader030/viewer/2022020101/55bec1cdbb61eb0d7b8b4792/html5/thumbnails/14.jpg)
Table
Partition Key 1: Column 1, Column 2, Column 3, Column 4
Partition Key 1: Column 1, Column 2, Column 3, Column 4
Partition Key 1: Column 1, Column 2, Column 3, Column 4
Partition Key 1: Column 1, Column 2, Column 3, Column 4
Partition Key 2: Column 1, Column 2, Column 3, Column 4
Partition Key 2: Column 1, Column 2, Column 3, Column 4
Partition Key 2: Column 1, Column 2, Column 3, Column 4
Partition Key 2: Column 1, Column 2, Column 3, Column 4
![Page 15: Storing time series data with Apache Cassandra](https://reader030.vdocument.in/reader030/viewer/2022020101/55bec1cdbb61eb0d7b8b4792/html5/thumbnails/15.jpg)
Keyspace
Keyspace 1 contains Table 1 and Table 2.
Each table holds four rows for Partition Key 1 and four rows for Partition Key 2, and each row has Column 1, Column 2, Column 3, Column 4.
![Page 16: Storing time series data with Apache Cassandra](https://reader030.vdocument.in/reader030/viewer/2022020101/55bec1cdbb61eb0d7b8b4792/html5/thumbnails/16.jpg)
Node
Server
![Page 17: Storing time series data with Apache Cassandra](https://reader030.vdocument.in/reader030/viewer/2022020101/55bec1cdbb61eb0d7b8b4792/html5/thumbnails/17.jpg)
Token
Server
• Each partition key is hashed to a token (Murmur3 computes a 128-bit hash; Cassandra keeps a 64-bit value)
• Consistent hash between -2^63 and 2^63 - 1
• Each node owns a range of those values
• A node's token marks the beginning of its range, which runs up to the next node's token
• Virtual nodes break these ranges down further
Data
Token  Range
0      …
![Page 18: Storing time series data with Apache Cassandra](https://reader030.vdocument.in/reader030/viewer/2022020101/55bec1cdbb61eb0d7b8b4792/html5/thumbnails/18.jpg)
Cluster
Token  Range
0      0-100
Server: 0-100
![Page 19: Storing time series data with Apache Cassandra](https://reader030.vdocument.in/reader030/viewer/2022020101/55bec1cdbb61eb0d7b8b4792/html5/thumbnails/19.jpg)
Cluster
Token  Range
0      0-50
51     51-100
Server: 0-50
Server: 51-100
![Page 20: Storing time series data with Apache Cassandra](https://reader030.vdocument.in/reader030/viewer/2022020101/55bec1cdbb61eb0d7b8b4792/html5/thumbnails/20.jpg)
Cluster
Token  Range
0      0-25
26     26-50
51     51-75
76     76-100
Server: 0-25
Server: 26-50
Server: 51-75
Server: 76-100
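The four-server ring above can be sketched as a sorted list of tokens plus a binary search. This is a minimal illustration using the toy 0-100 token space from the slides; the server names are hypothetical, and it does not model virtual nodes.

```python
from bisect import bisect_right

# Toy ring from the slides: each server's token is the start of its range.
# Server names are made up for illustration.
RING = [(0, "server-1"), (26, "server-2"), (51, "server-3"), (76, "server-4")]
TOKENS = [token for token, _ in RING]


def owner(token: int) -> str:
    """Return the server whose range contains `token` (expected 0..100)."""
    # bisect_right finds the first token greater than `token`;
    # the entry just before it is the start of the owning range.
    index = bisect_right(TOKENS, token) - 1
    return RING[index][1]


print(owner(15))  # server-1 (range 0-25)
print(owner(77))  # server-4 (range 76-100)
```

Adding a server only means inserting one more token into the sorted list, which is why the ranges on the slides split without reshuffling everything.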
![Page 21: Storing time series data with Apache Cassandra](https://reader030.vdocument.in/reader030/viewer/2022020101/55bec1cdbb61eb0d7b8b4792/html5/thumbnails/21.jpg)
Replication
DC1: RF=1
Node      Primary
10.0.0.1  00-25
10.0.0.2  26-50
10.0.0.3  51-75
10.0.0.4  76-100
![Page 22: Storing time series data with Apache Cassandra](https://reader030.vdocument.in/reader030/viewer/2022020101/55bec1cdbb61eb0d7b8b4792/html5/thumbnails/22.jpg)
Replication
DC1: RF=2
Node      Primary  Replica
10.0.0.1  00-25    76-100
10.0.0.2  26-50    00-25
10.0.0.3  51-75    26-50
10.0.0.4  76-100   51-75
![Page 23: Storing time series data with Apache Cassandra](https://reader030.vdocument.in/reader030/viewer/2022020101/55bec1cdbb61eb0d7b8b4792/html5/thumbnails/23.jpg)
Replication
DC1: RF=3
Node      Primary  Replica  Replica
10.0.0.1  00-25    76-100   51-75
10.0.0.2  26-50    00-25    76-100
10.0.0.3  51-75    26-50    00-25
10.0.0.4  76-100   51-75    26-50
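The replica tables above follow a simple pattern: the primary owner of a range, then the next nodes around the ring. A minimal sketch of that placement, assuming the toy 0-100 token space and the four DC1 addresses from the slides (real replication strategies also take racks and data centers into account):

```python
# Nodes in ring order, with the start of each node's primary range.
NODES = ["10.0.0.1", "10.0.0.2", "10.0.0.3", "10.0.0.4"]
RANGE_STARTS = [0, 26, 51, 76]


def replicas(token: int, rf: int) -> list[str]:
    """Return the `rf` nodes holding `token`: the primary, then its ring successors."""
    # The primary is the last node whose range start is <= token.
    primary = max(i for i, start in enumerate(RANGE_STARTS) if start <= token)
    # Walk clockwise, wrapping around the end of the ring.
    return [NODES[(primary + step) % len(NODES)] for step in range(rf)]


print(replicas(15, 1))  # ['10.0.0.1']
print(replicas(15, 3))  # ['10.0.0.1', '10.0.0.2', '10.0.0.3']
print(replicas(80, 3))  # ['10.0.0.4', '10.0.0.1', '10.0.0.2']
```

This matches the RF=3 table: range 00-25 appears as primary on 10.0.0.1 and as a replica on 10.0.0.2 and 10.0.0.3.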
![Page 24: Storing time series data with Apache Cassandra](https://reader030.vdocument.in/reader030/viewer/2022020101/55bec1cdbb61eb0d7b8b4792/html5/thumbnails/24.jpg)
Consistency
DC1: RF=3
Node      Primary  Replica  Replica
10.0.0.1  00-25    76-100   51-75
10.0.0.2  26-50    00-25    76-100
10.0.0.3  51-75    26-50    00-25
10.0.0.4  76-100   51-75    26-50
Client: Write to partition 15
![Page 25: Storing time series data with Apache Cassandra](https://reader030.vdocument.in/reader030/viewer/2022020101/55bec1cdbb61eb0d7b8b4792/html5/thumbnails/25.jpg)
Consistency level
Consistency Level  Number of Nodes Acknowledged
One                One - Read repair triggered
Local One          One - Read repair in local DC
Quorum             51% (a majority of replicas)
Local Quorum       51% in local DC (a majority of the local replicas)
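The "51%" in the table is shorthand for a majority of replicas, which works out to floor(RF / 2) + 1. A small sketch of that arithmetic; the function name and the string labels are illustrative, not a driver API:

```python
def required_acks(consistency_level: str, replication_factor: int) -> int:
    """Number of replica acknowledgements needed before a request succeeds.

    For the LOCAL_* levels, pass the replication factor of the local data center.
    """
    if consistency_level in ("ONE", "LOCAL_ONE"):
        return 1
    if consistency_level in ("QUORUM", "LOCAL_QUORUM"):
        # A majority: more than half of the replicas.
        return replication_factor // 2 + 1
    raise ValueError(f"unsupported consistency level: {consistency_level}")


print(required_acks("ONE", 3))           # 1
print(required_acks("QUORUM", 3))        # 2
print(required_acks("LOCAL_QUORUM", 5))  # 3
```

With RF=3, a quorum write still succeeds when one of the three replicas is down.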
![Page 26: Storing time series data with Apache Cassandra](https://reader030.vdocument.in/reader030/viewer/2022020101/55bec1cdbb61eb0d7b8b4792/html5/thumbnails/26.jpg)
Consistency
DC1: RF=3
Node      Primary  Replica  Replica
10.0.0.1  00-25    76-100   51-75
10.0.0.2  26-50    00-25    76-100
10.0.0.3  51-75    26-50    00-25
10.0.0.4  76-100   51-75    26-50
Client: Write to partition 15, CL = One
![Page 27: Storing time series data with Apache Cassandra](https://reader030.vdocument.in/reader030/viewer/2022020101/55bec1cdbb61eb0d7b8b4792/html5/thumbnails/27.jpg)
Consistency
DC1: RF=3
Node      Primary  Replica  Replica
10.0.0.1  00-25    76-100   51-75
10.0.0.2  26-50    00-25    76-100
10.0.0.3  51-75    26-50    00-25
10.0.0.4  76-100   51-75    26-50
Client: Write to partition 15, CL = One
![Page 28: Storing time series data with Apache Cassandra](https://reader030.vdocument.in/reader030/viewer/2022020101/55bec1cdbb61eb0d7b8b4792/html5/thumbnails/28.jpg)
Consistency
DC1: RF=3
Node      Primary  Replica  Replica
10.0.0.1  00-25    76-100   51-75
10.0.0.2  26-50    00-25    76-100
10.0.0.3  51-75    26-50    00-25
10.0.0.4  76-100   51-75    26-50
Client: Write to partition 15, CL = Quorum
![Page 29: Storing time series data with Apache Cassandra](https://reader030.vdocument.in/reader030/viewer/2022020101/55bec1cdbb61eb0d7b8b4792/html5/thumbnails/29.jpg)
Multi-datacenter
DC1: RF=3
Node      Primary  Replica  Replica
10.0.0.1  00-25    76-100   51-75
10.0.0.2  26-50    00-25    76-100
10.0.0.3  51-75    26-50    00-25
10.0.0.4  76-100   51-75    26-50
Client: Write to partition 15
DC2: RF=3
Node      Primary  Replica  Replica
10.1.0.1  00-25    76-100   51-75
10.1.0.2  26-50    00-25    76-100
10.1.0.3  51-75    26-50    00-25
10.1.0.4  76-100   51-75    26-50
![Page 30: Storing time series data with Apache Cassandra](https://reader030.vdocument.in/reader030/viewer/2022020101/55bec1cdbb61eb0d7b8b4792/html5/thumbnails/30.jpg)
![Page 31: Storing time series data with Apache Cassandra](https://reader030.vdocument.in/reader030/viewer/2022020101/55bec1cdbb61eb0d7b8b4792/html5/thumbnails/31.jpg)
![Page 32: Storing time series data with Apache Cassandra](https://reader030.vdocument.in/reader030/viewer/2022020101/55bec1cdbb61eb0d7b8b4792/html5/thumbnails/32.jpg)
Cassandra Query Language - CQL
![Page 33: Storing time series data with Apache Cassandra](https://reader030.vdocument.in/reader030/viewer/2022020101/55bec1cdbb61eb0d7b8b4792/html5/thumbnails/33.jpg)
Table
CREATE TABLE weather_station (
    id text,
    name text,
    country_code text,
    state_code text,
    call_sign text,
    lat double,
    long double,
    elevation double,
    PRIMARY KEY(id)
);
Table Name
Column Name
Column CQL Type
Primary Key Designation
Partition Key
![Page 34: Storing time series data with Apache Cassandra](https://reader030.vdocument.in/reader030/viewer/2022020101/55bec1cdbb61eb0d7b8b4792/html5/thumbnails/34.jpg)
Table
CREATE TABLE daily_aggregate_precip (
    wsid text,
    year int,
    month int,
    day int,
    precipitation counter,
    PRIMARY KEY ((wsid), year, month, day)
) WITH CLUSTERING ORDER BY (year DESC, month DESC, day DESC);
Partition Key
Clustering Columns
Order Override
![Page 35: Storing time series data with Apache Cassandra](https://reader030.vdocument.in/reader030/viewer/2022020101/55bec1cdbb61eb0d7b8b4792/html5/thumbnails/35.jpg)
Insert
INSERT INTO weather_station (id, call_sign, country_code, elevation, lat, long, name, state_code)
VALUES ('727930:24233', 'KSEA', 'US', 121.9, 47.467, -122.32, 'SEATTLE SEATTLE-TACOMA INTL A', 'WA');
Table Name
Fields
Values
Partition Key: Required
![Page 36: Storing time series data with Apache Cassandra](https://reader030.vdocument.in/reader030/viewer/2022020101/55bec1cdbb61eb0d7b8b4792/html5/thumbnails/36.jpg)
Select
SELECT id, call_sign, country_code, elevation, lat, long, name, state_code
FROM weather_station
WHERE id = '727930:24233';

 id           | call_sign | country_code | elevation | lat    | long    | name                          | state_code
--------------+-----------+--------------+-----------+--------+---------+-------------------------------+------------
 727930:24233 | KSEA      | US           | 121.9     | 47.467 | -122.32 | SEATTLE SEATTLE-TACOMA INTL A | WA

Fields
Table Name
Primary Key: Partition Key Required
![Page 37: Storing time series data with Apache Cassandra](https://reader030.vdocument.in/reader030/viewer/2022020101/55bec1cdbb61eb0d7b8b4792/html5/thumbnails/37.jpg)
Update
UPDATE weather_station
SET name = 'SeaTac International Airport'
WHERE id = '727930:24233';

 id           | call_sign | country_code | elevation | lat    | long    | name                         | state_code
--------------+-----------+--------------+-----------+--------+---------+------------------------------+------------
 727930:24233 | KSEA      | US           | 121.9     | 47.467 | -122.32 | SeaTac International Airport | WA

Table Name
Fields to Update: Not in Primary Key
Primary Key
![Page 38: Storing time series data with Apache Cassandra](https://reader030.vdocument.in/reader030/viewer/2022020101/55bec1cdbb61eb0d7b8b4792/html5/thumbnails/38.jpg)
Delete
DELETE FROM weather_station
WHERE id = '727930:24233';
Table Name
Primary Key: Required
![Page 39: Storing time series data with Apache Cassandra](https://reader030.vdocument.in/reader030/viewer/2022020101/55bec1cdbb61eb0d7b8b4792/html5/thumbnails/39.jpg)
Collections
Set
CREATE TABLE weather_station (
    id text,
    name text,
    country_code text,
    state_code text,
    call_sign text,
    lat double,
    long double,
    elevation double,
    equipment set<text>,
    PRIMARY KEY(id)
);
equipment set<text>
Column Name
CQL Type: For Ordering
![Page 40: Storing time series data with Apache Cassandra](https://reader030.vdocument.in/reader030/viewer/2022020101/55bec1cdbb61eb0d7b8b4792/html5/thumbnails/40.jpg)
Collections
Set
List
CREATE TABLE weather_station (
    id text,
    name text,
    country_code text,
    state_code text,
    call_sign text,
    lat double,
    long double,
    elevation double,
    equipment set<text>,
    service_dates list<timestamp>,
    PRIMARY KEY(id)
);
equipment set<text>
Column Name
CQL Type: For Ordering
service_dates list<timestamp>
Column Name
CQL Type
![Page 41: Storing time series data with Apache Cassandra](https://reader030.vdocument.in/reader030/viewer/2022020101/55bec1cdbb61eb0d7b8b4792/html5/thumbnails/41.jpg)
Collections
Set
List
Map
CREATE TABLE weather_station (
    id text,
    name text,
    country_code text,
    state_code text,
    call_sign text,
    lat double,
    long double,
    elevation double,
    equipment set<text>,
    service_dates list<timestamp>,
    service_notes map<timestamp,text>,
    PRIMARY KEY(id)
);
equipment set<text>
Column Name
CQL Type: For Ordering
service_dates list<timestamp>
Column Name
CQL Type
service_notes map<timestamp,text>
Column Name
CQL Key Type
CQL Value Type
![Page 42: Storing time series data with Apache Cassandra](https://reader030.vdocument.in/reader030/viewer/2022020101/55bec1cdbb61eb0d7b8b4792/html5/thumbnails/42.jpg)
UDF and UDA
User Defined Function
CREATE FUNCTION state_group_and_count(state map<text, int>, type text)
    CALLED ON NULL INPUT
    RETURNS map<text, int>
    LANGUAGE java
    AS '
        Integer count = (Integer) state.get(type);
        if (count == null) count = 1; else count++;
        state.put(type, count);
        return state;
    ';
User Defined Aggregate
CREATE OR REPLACE AGGREGATE group_and_count(text)
    SFUNC state_group_and_count
    STYPE map<text, int>
    INITCOND {};
As of Cassandra 2.2
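An aggregate of this shape is a fold: start from INITCOND and apply the state function once per row. A rough Python equivalent of the same logic, useful only for reasoning about what the aggregate returns (the function names mirror the CQL above; the sample values are made up):

```python
from functools import reduce


def state_group_and_count(state: dict, type_: str) -> dict:
    """State function: bump the counter for `type_` and return the updated state."""
    state[type_] = state.get(type_, 0) + 1
    return state


def group_and_count(values):
    """Aggregate: fold the state function over all values, starting from an empty map."""
    return reduce(state_group_and_count, values, {})


# Hypothetical column values, one per row.
print(group_and_count(["rain", "snow", "rain"]))  # {'rain': 2, 'snow': 1}
```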
![Page 43: Storing time series data with Apache Cassandra](https://reader030.vdocument.in/reader030/viewer/2022020101/55bec1cdbb61eb0d7b8b4792/html5/thumbnails/43.jpg)
Example: Weather Station
• Weather station collects data
• Cassandra stores in sequence
• Application reads in sequence
![Page 44: Storing time series data with Apache Cassandra](https://reader030.vdocument.in/reader030/viewer/2022020101/55bec1cdbb61eb0d7b8b4792/html5/thumbnails/44.jpg)
Queries supported
CREATE TABLE raw_weather_data (
    wsid text,
    year int,
    month int,
    day int,
    hour int,
    temperature double,
    dewpoint double,
    pressure double,
    wind_direction int,
    wind_speed double,
    sky_condition int,
    sky_condition_text text,
    one_hour_precip double,
    six_hour_precip double,
    PRIMARY KEY ((wsid), year, month, day, hour)
) WITH CLUSTERING ORDER BY (year DESC, month DESC, day DESC, hour DESC);
Get weather data given
• Weather Station ID
• Weather Station ID and Time
• Weather Station ID and Range of Time
![Page 45: Storing time series data with Apache Cassandra](https://reader030.vdocument.in/reader030/viewer/2022020101/55bec1cdbb61eb0d7b8b4792/html5/thumbnails/45.jpg)
Primary Key
CREATE TABLE raw_weather_data (
    wsid text,
    year int,
    month int,
    day int,
    hour int,
    temperature double,
    dewpoint double,
    pressure double,
    wind_direction int,
    wind_speed double,
    sky_condition int,
    sky_condition_text text,
    one_hour_precip double,
    six_hour_precip double,
    PRIMARY KEY ((wsid), year, month, day, hour)
) WITH CLUSTERING ORDER BY (year DESC, month DESC, day DESC, hour DESC);
![Page 46: Storing time series data with Apache Cassandra](https://reader030.vdocument.in/reader030/viewer/2022020101/55bec1cdbb61eb0d7b8b4792/html5/thumbnails/46.jpg)
Primary key relationship
PRIMARY KEY ((wsid),year,month,day,hour)
![Page 47: Storing time series data with Apache Cassandra](https://reader030.vdocument.in/reader030/viewer/2022020101/55bec1cdbb61eb0d7b8b4792/html5/thumbnails/47.jpg)
Primary key relationship
Partition Key
PRIMARY KEY ((wsid),year,month,day,hour)
![Page 48: Storing time series data with Apache Cassandra](https://reader030.vdocument.in/reader030/viewer/2022020101/55bec1cdbb61eb0d7b8b4792/html5/thumbnails/48.jpg)
Primary key relationship
PRIMARY KEY ((wsid),year,month,day,hour)
Partition Key Clustering Columns
![Page 49: Storing time series data with Apache Cassandra](https://reader030.vdocument.in/reader030/viewer/2022020101/55bec1cdbb61eb0d7b8b4792/html5/thumbnails/49.jpg)
Primary key relationship
Partition Key Clustering Columns
10010:99999
PRIMARY KEY ((wsid),year,month,day,hour)
![Page 50: Storing time series data with Apache Cassandra](https://reader030.vdocument.in/reader030/viewer/2022020101/55bec1cdbb61eb0d7b8b4792/html5/thumbnails/50.jpg)
Primary key relationship
PRIMARY KEY ((wsid),year,month,day,hour)
Partition Key  Clustering Columns  temperature
10010:99999    2005:12:1:10        -5.6
               2005:12:1:9         -5.1
               2005:12:1:8         -4.9
               2005:12:1:7         -5.3
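One way to picture this primary key is a map from partition key to rows kept sorted by the clustering columns. The sketch below models that in plain Python with the readings used in this deck; it illustrates the layout only and says nothing about how Cassandra implements it.

```python
from collections import defaultdict

# wsid -> {(year, month, day, hour): temperature}
partitions = defaultdict(dict)


def insert(wsid, year, month, day, hour, temperature):
    """Store one reading; the clustering columns identify the row within the partition."""
    partitions[wsid][(year, month, day, hour)] = temperature


def read_partition(wsid):
    """Return rows newest first, mirroring CLUSTERING ORDER BY (... DESC)."""
    rows = partitions[wsid]
    return [(key, rows[key]) for key in sorted(rows, reverse=True)]


# Insert out of order on purpose; reads still come back sorted.
insert("10010:99999", 2005, 12, 1, 7, -5.3)
insert("10010:99999", 2005, 12, 1, 10, -5.6)
insert("10010:99999", 2005, 12, 1, 8, -4.9)
insert("10010:99999", 2005, 12, 1, 9, -5.1)

for clustering_key, temperature in read_partition("10010:99999"):
    print(clustering_key, temperature)
```

Because the whole station lives under one key and the rows are already ordered by time, "latest readings for a station" needs no sorting at read time.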
![Page 51: Storing time series data with Apache Cassandra](https://reader030.vdocument.in/reader030/viewer/2022020101/55bec1cdbb61eb0d7b8b4792/html5/thumbnails/51.jpg)
Partition keys
10010:99999   Murmur3 Hash  Token = 7224631062609997448
722266:13850  Murmur3 Hash  Token = -6804302034103043898
INSERT INTO raw_weather_data(wsid,year,month,day,hour,temperature) VALUES ('10010:99999',2005,12,1,7,-5.6);
INSERT INTO raw_weather_data(wsid,year,month,day,hour,temperature) VALUES ('722266:13850',2005,12,1,7,-5.6);
Consistent hash. 64-bit number between -2^63 and 2^63 - 1
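To make the idea concrete, the sketch below maps a partition key to a signed 64-bit number. It uses MD5 from the Python standard library purely as a stand-in hash, so it will not reproduce the Murmur3 tokens shown on the slide; `toy_token` is a hypothetical helper, not part of any Cassandra driver.

```python
import hashlib


def toy_token(partition_key: str) -> int:
    """Map a partition key to a signed 64-bit token.

    Assumption: MD5 stands in for Murmur3 here, so the values differ
    from what a real cluster would compute.
    """
    digest = hashlib.md5(partition_key.encode("utf-8")).digest()
    # Keep the first 8 bytes and read them as a signed integer,
    # which lands in the range -2**63 .. 2**63 - 1.
    return int.from_bytes(digest[:8], byteorder="big", signed=True)


for key in ("10010:99999", "722266:13850"):
    print(key, toy_token(key))
```

The property that matters is that the same key always hashes to the same token, so any node can work out where a partition lives without a lookup table.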
![Page 52: Storing time series data with Apache Cassandra](https://reader030.vdocument.in/reader030/viewer/2022020101/55bec1cdbb61eb0d7b8b4792/html5/thumbnails/52.jpg)
Partition keys
10010:99999   Murmur3 Hash  Token = 15
722266:13850  Murmur3 Hash  Token = 77
For this example, let's make it a reasonable number
INSERT INTO raw_weather_data(wsid,year,month,day,hour,temperature) VALUES ('10010:99999',2005,12,1,7,-5.6);
INSERT INTO raw_weather_data(wsid,year,month,day,hour,temperature) VALUES ('722266:13850',2005,12,1,7,-5.6);
![Page 53: Storing time series data with Apache Cassandra](https://reader030.vdocument.in/reader030/viewer/2022020101/55bec1cdbb61eb0d7b8b4792/html5/thumbnails/53.jpg)
Data Locality
DC1: RF=3
Node      Primary  Replica  Replica
10.0.0.1  00-25    76-100   51-75
10.0.0.2  26-50    00-25    76-100
10.0.0.3  51-75    26-50    00-25
10.0.0.4  76-100   51-75    26-50
Client (DC1): Read partition 15
DC2: RF=3
Node      Primary  Replica  Replica
10.1.0.1  00-25    76-100   51-75
10.1.0.2  26-50    00-25    76-100
10.1.0.3  51-75    26-50    00-25
10.1.0.4  76-100   51-75    26-50
Client (DC2): Read partition 15
![Page 54: Storing time series data with Apache Cassandra](https://reader030.vdocument.in/reader030/viewer/2022020101/55bec1cdbb61eb0d7b8b4792/html5/thumbnails/54.jpg)
Data Locality
wsid='10010:99999' ?
1000 Node Cluster
You are here!
![Page 55: Storing time series data with Apache Cassandra](https://reader030.vdocument.in/reader030/viewer/2022020101/55bec1cdbb61eb0d7b8b4792/html5/thumbnails/55.jpg)
Writes
CREATE TABLE raw_weather_data (
    wsid text,
    year int,
    month int,
    day int,
    hour int,
    temperature double,
    dewpoint double,
    pressure double,
    wind_direction int,
    wind_speed double,
    sky_condition int,
    sky_condition_text text,
    one_hour_precip double,
    six_hour_precip double,
    PRIMARY KEY ((wsid), year, month, day, hour)
) WITH CLUSTERING ORDER BY (year DESC, month DESC, day DESC, hour DESC);
![Page 56: Storing time series data with Apache Cassandra](https://reader030.vdocument.in/reader030/viewer/2022020101/55bec1cdbb61eb0d7b8b4792/html5/thumbnails/56.jpg)
Writes
CREATE TABLE raw_weather_data (
    wsid text,
    year int,
    month int,
    day int,
    hour int,
    temperature double,
    PRIMARY KEY ((wsid), year, month, day, hour)
) WITH CLUSTERING ORDER BY (year DESC, month DESC, day DESC, hour DESC);
INSERT INTO raw_weather_data(wsid,year,month,day,hour,temperature) VALUES ('10010:99999',2005,12,1,10,-5.6);
INSERT INTO raw_weather_data(wsid,year,month,day,hour,temperature) VALUES ('10010:99999',2005,12,1,9,-5.1);
INSERT INTO raw_weather_data(wsid,year,month,day,hour,temperature) VALUES ('10010:99999',2005,12,1,8,-4.9);
INSERT INTO raw_weather_data(wsid,year,month,day,hour,temperature) VALUES ('10010:99999',2005,12,1,7,-5.3);
![Page 57: Storing time series data with Apache Cassandra](https://reader030.vdocument.in/reader030/viewer/2022020101/55bec1cdbb61eb0d7b8b4792/html5/thumbnails/57.jpg)
Write Path
Client
INSERT INTO raw_weather_data(wsid,year,month,day,hour,temperature) VALUES ('10010:99999',2005,12,1,7,-5.3);
Node
Commit Log
Memtable: two rows for Partition Key 1 (Column 1, Column 2, Column 3, Column 4)
Data: SSTable, SSTable, SSTable, SSTable
* Compaction *
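The write path in the diagram can be sketched as a toy class: append to the commit log, update the memtable, flush the memtable to an immutable sorted SSTable when it fills up, and merge SSTables during compaction. `ToyNode` and its flush threshold are assumptions for illustration; real nodes handle durability, timestamps and tombstones, none of which is modeled here.

```python
class ToyNode:
    """Greatly simplified model of one node's write path."""

    def __init__(self, memtable_limit=2):
        self.commit_log = []    # append-only record, used for crash recovery
        self.memtable = {}      # in-memory rows, keyed by (partition, clustering)
        self.sstables = []      # immutable sorted runs "on disk", oldest first
        self.memtable_limit = memtable_limit

    def write(self, partition_key, clustering_key, value):
        self.commit_log.append((partition_key, clustering_key, value))
        self.memtable[(partition_key, clustering_key)] = value
        if len(self.memtable) >= self.memtable_limit:
            self.flush()

    def flush(self):
        """Write the memtable out as a sorted SSTable and start a fresh one."""
        self.sstables.append(sorted(self.memtable.items()))
        self.memtable = {}
        # Simplification: flushed data no longer needs its commit log entries.
        self.commit_log.clear()

    def compact(self):
        """Merge all SSTables into one; newer runs overwrite older values."""
        merged = {}
        for sstable in self.sstables:
            merged.update(sstable)
        self.sstables = [sorted(merged.items())]


node = ToyNode(memtable_limit=2)
for hour, temp in [(10, -5.6), (9, -5.1), (8, -4.9), (7, -5.3)]:
    node.write("10010:99999", (2005, 12, 1, hour), temp)

print(len(node.sstables))   # 2 flushed SSTables
node.compact()
print(len(node.sstables))   # 1 merged SSTable
```

Writes never modify an existing SSTable, which is what keeps them sequential and fast.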
![Page 58: Storing time series data with Apache Cassandra](https://reader030.vdocument.in/reader030/viewer/2022020101/55bec1cdbb61eb0d7b8b4792/html5/thumbnails/58.jpg)
Date Tiered Compaction Strategy
• Group similar time blocks
• Never compact again
• Used for high density
SSTable  T=2015-01-01 -> 2015-01-05
SSTable  T=2015-01-06 -> 2015-01-10
SSTable  T=2015-01-11 -> 2015-01-15
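The grouping idea can be illustrated by bucketing write dates into fixed windows. The five-day window and the start date below are taken from the example SSTables on this slide; the real strategy sizes its windows differently, so treat this as a sketch of the concept only.

```python
from datetime import date, timedelta

WINDOW_DAYS = 5                 # assumption: fixed window, as in the slide's example
FIRST_DAY = date(2015, 1, 1)    # assumption: start of the first window


def window_for(day: date) -> tuple[date, date]:
    """Return the (start, end) dates of the window that `day` falls into."""
    index = (day - FIRST_DAY).days // WINDOW_DAYS
    start = FIRST_DAY + timedelta(days=index * WINDOW_DAYS)
    end = start + timedelta(days=WINDOW_DAYS - 1)
    return start, end


for day in (date(2015, 1, 3), date(2015, 1, 7), date(2015, 1, 15)):
    start, end = window_for(day)
    print(day, "->", start, "to", end)
```

Data written in the same window ends up in the same SSTable, so a query over a recent time range touches few files and old windows can be left alone.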
![Page 59: Storing time series data with Apache Cassandra](https://reader030.vdocument.in/reader030/viewer/2022020101/55bec1cdbb61eb0d7b8b4792/html5/thumbnails/59.jpg)
Storage Model - Logical View
SELECT wsid, hour, temperature
FROM raw_weather_data
WHERE wsid='10010:99999' AND year = 2005 AND month = 12 AND day = 1;
wsid         hour          temperature
10010:99999  2005:12:1:10  -5.6
10010:99999  2005:12:1:9   -5.1
10010:99999  2005:12:1:8   -4.9
10010:99999  2005:12:1:7   -5.3
![Page 60: Storing time series data with Apache Cassandra](https://reader030.vdocument.in/reader030/viewer/2022020101/55bec1cdbb61eb0d7b8b4792/html5/thumbnails/60.jpg)
Storage Model - Disk Layout
SELECT wsid, hour, temperature
FROM raw_weather_data
WHERE wsid='10010:99999' AND year = 2005 AND month = 12 AND day = 1;
10010:99999  2005:12:1:10  2005:12:1:9  2005:12:1:8  2005:12:1:7
             -5.6          -5.1         -4.9         -5.3
Merged, Sorted and Stored Sequentially
![Page 61: Storing time series data with Apache Cassandra](https://reader030.vdocument.in/reader030/viewer/2022020101/55bec1cdbb61eb0d7b8b4792/html5/thumbnails/61.jpg)
Storage Model - Disk Layout
SELECT wsid, hour, temperature
FROM raw_weather_data
WHERE wsid='10010:99999' AND year = 2005 AND month = 12 AND day = 1;
10010:99999  2005:12:1:11  2005:12:1:10  2005:12:1:9  2005:12:1:8  2005:12:1:7
             -4.9          -5.6          -5.1         -4.9         -5.3
Merged, Sorted and Stored Sequentially
![Page 62: Storing time series data with Apache Cassandra](https://reader030.vdocument.in/reader030/viewer/2022020101/55bec1cdbb61eb0d7b8b4792/html5/thumbnails/62.jpg)
Storage Model - Disk Layout
SELECT wsid, hour, temperature
FROM raw_weather_data
WHERE wsid='10010:99999' AND year = 2005 AND month = 12 AND day = 1;
10010:99999  2005:12:1:12  2005:12:1:11  2005:12:1:10  2005:12:1:9  2005:12:1:8  2005:12:1:7
             -5.4          -4.9          -5.6          -5.1         -4.9         -5.3
Merged, Sorted and Stored Sequentially
![Page 63: Storing time series data with Apache Cassandra](https://reader030.vdocument.in/reader030/viewer/2022020101/55bec1cdbb61eb0d7b8b4792/html5/thumbnails/63.jpg)
Read Path
Client
SELECT wsid, hour, temperature
FROM raw_weather_data
WHERE wsid='10010:99999' AND year = 2005 AND month = 12 AND day = 1 AND hour >= 7 AND hour <= 10;
Node
Memtable: two rows for Partition Key 1 (Column 1, Column 2, Column 3, Column 4)
Data: SSTable, SSTable, SSTable
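On a read, the node has to combine whatever is in the memtable with the matching rows in each SSTable and keep the most recent version of every row. A minimal sketch of that merge, assuming each stored value carries a write timestamp; the data structures, the timestamps and the corrected -4.8 reading are all made up for illustration:

```python
def read_partition(partition_key, memtable, sstables):
    """Merge one partition from all sources, newest write wins, newest row first.

    Each source maps (partition_key, clustering_key) -> (value, written_at).
    """
    newest = {}
    for source in [*sstables, memtable]:
        for (pk, clustering), (value, written_at) in source.items():
            if pk != partition_key:
                continue
            # Keep the version with the highest write timestamp.
            if clustering not in newest or written_at > newest[clustering][1]:
                newest[clustering] = (value, written_at)
    return [(c, newest[c][0]) for c in sorted(newest, reverse=True)]


WSID = "10010:99999"
older_sstable = {
    (WSID, (2005, 12, 1, 7)): (-5.3, 1),
    (WSID, (2005, 12, 1, 8)): (-4.9, 2),
}
newer_sstable = {(WSID, (2005, 12, 1, 8)): (-4.8, 5)}  # hypothetical later correction
memtable = {(WSID, (2005, 12, 1, 9)): (-5.1, 3)}

result = read_partition(WSID, memtable, [older_sstable, newer_sstable])
for clustering_key, temperature in result:
    print(clustering_key, temperature)
```

The fewer SSTables a partition is spread across, the less merging a read has to do, which is one reason compaction matters for read latency.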
![Page 64: Storing time series data with Apache Cassandra](https://reader030.vdocument.in/reader030/viewer/2022020101/55bec1cdbb61eb0d7b8b4792/html5/thumbnails/64.jpg)
Query patterns
• Range queries
• “Slice” operation on disk
SELECT wsid, hour, temperature
FROM raw_weather_data
WHERE wsid='10010:99999' AND year = 2005 AND month = 12 AND day = 1 AND hour >= 7 AND hour <= 10;
Partition key for locality: 10010:99999
Single seek on disk
2005:12:1:10  2005:12:1:9  2005:12:1:8  2005:12:1:7
-5.6          -5.1         -4.9         -5.3
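Because rows inside a partition are stored in sorted order, a range query reduces to finding two positions and reading everything in between. The sketch below shows this with binary search over an in-memory list; the hours are kept ascending here for simplicity (the table itself uses descending order), and the readings are the ones used in this deck.

```python
from bisect import bisect_left, bisect_right

# One partition (wsid 10010:99999, 2005-12-01), sorted by hour.
hours = [7, 8, 9, 10, 11, 12]
temperatures = [-5.3, -4.9, -5.1, -5.6, -4.9, -5.4]


def slice_hours(low: int, high: int):
    """Return (hour, temperature) pairs with low <= hour <= high."""
    start = bisect_left(hours, low)     # first position with hour >= low
    end = bisect_right(hours, high)     # first position with hour > high
    # One contiguous read, no scanning of rows outside the range.
    return list(zip(hours[start:end], temperatures[start:end]))


print(slice_hours(7, 10))
# [(7, -5.3), (8, -4.9), (9, -5.1), (10, -5.6)]
```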
![Page 65: Storing time series data with Apache Cassandra](https://reader030.vdocument.in/reader030/viewer/2022020101/55bec1cdbb61eb0d7b8b4792/html5/thumbnails/65.jpg)
Query patterns
• Range queries
• “Slice” operation on disk
SELECT wsid, hour, temperature
FROM raw_weather_data
WHERE wsid='10010:99999' AND year = 2005 AND month = 12 AND day = 1 AND hour >= 7 AND hour <= 10;
Programmers like this
Sorted by event_time
wsid         hour          temperature
10010:99999  2005:12:1:10  -5.6
10010:99999  2005:12:1:9   -5.1
10010:99999  2005:12:1:8   -4.9
10010:99999  2005:12:1:7   -5.3
![Page 67: Storing time series data with Apache Cassandra](https://reader030.vdocument.in/reader030/viewer/2022020101/55bec1cdbb61eb0d7b8b4792/html5/thumbnails/67.jpg)
Thank you!
Bring the questions
Follow me on twitter @PatrickMcFadin