time series data with apache cassandra
DESCRIPTION
TRANSCRIPT
![Page 1: Time Series Data with Apache Cassandra](https://reader034.vdocument.in/reader034/viewer/2022052617/547a4e68b4af9fd3158b4a87/html5/thumbnails/1.jpg)
Time Series Data With Apache Cassandra
StrangeloopSeptember 19, 2014
Eric [email protected]
@jericevans
![Page 2: Time Series Data with Apache Cassandra](https://reader034.vdocument.in/reader034/viewer/2022052617/547a4e68b4af9fd3158b4a87/html5/thumbnails/2.jpg)
Open
![Page 3: Time Series Data with Apache Cassandra](https://reader034.vdocument.in/reader034/viewer/2022052617/547a4e68b4af9fd3158b4a87/html5/thumbnails/3.jpg)
Open
![Page 4: Time Series Data with Apache Cassandra](https://reader034.vdocument.in/reader034/viewer/2022052617/547a4e68b4af9fd3158b4a87/html5/thumbnails/4.jpg)
Open
![Page 5: Time Series Data with Apache Cassandra](https://reader034.vdocument.in/reader034/viewer/2022052617/547a4e68b4af9fd3158b4a87/html5/thumbnails/5.jpg)
Open
![Page 6: Time Series Data with Apache Cassandra](https://reader034.vdocument.in/reader034/viewer/2022052617/547a4e68b4af9fd3158b4a87/html5/thumbnails/6.jpg)
Network
Management
System
![Page 7: Time Series Data with Apache Cassandra](https://reader034.vdocument.in/reader034/viewer/2022052617/547a4e68b4af9fd3158b4a87/html5/thumbnails/7.jpg)
OpenNMS: What It Is
● Network Management System○ Discovery and Provisioning○ Service monitoring○ Data collection○ Event management, notifications
● Java, open source, GPLv3● Since 1999
![Page 8: Time Series Data with Apache Cassandra](https://reader034.vdocument.in/reader034/viewer/2022052617/547a4e68b4af9fd3158b4a87/html5/thumbnails/8.jpg)
Time series: RRDTool
● Round Robin Database● First released 1999● Time series storage● File-based, constant-size, self-maintaining● Automatic, incremental aggregation
![Page 9: Time Series Data with Apache Cassandra](https://reader034.vdocument.in/reader034/viewer/2022052617/547a4e68b4af9fd3158b4a87/html5/thumbnails/9.jpg)
… and oh yeah, graphing
![Page 10: Time Series Data with Apache Cassandra](https://reader034.vdocument.in/reader034/viewer/2022052617/547a4e68b4af9fd3158b4a87/html5/thumbnails/10.jpg)
Consider
● 5+ IOPs per update (read-modify-write)!● 100,000s of metrics, 1,000s IOPS● 1,000,000s of metrics, 10,000s IOPS● 15,000 RPM SAS drive, ~175-200 IOPS
![Page 11: Time Series Data with Apache Cassandra](https://reader034.vdocument.in/reader034/viewer/2022052617/547a4e68b4af9fd3158b4a87/html5/thumbnails/11.jpg)
![Page 12: Time Series Data with Apache Cassandra](https://reader034.vdocument.in/reader034/viewer/2022052617/547a4e68b4af9fd3158b4a87/html5/thumbnails/12.jpg)
Hmmm
We collect and write a great deal; We read (graph) relatively little.
So why are we aggregating everything?
![Page 13: Time Series Data with Apache Cassandra](https://reader034.vdocument.in/reader034/viewer/2022052617/547a4e68b4af9fd3158b4a87/html5/thumbnails/13.jpg)
Also
● Not everything is a graph● Inflexible● Incremental backups impractical● Availability subject to filesystem access
![Page 14: Time Series Data with Apache Cassandra](https://reader034.vdocument.in/reader034/viewer/2022052617/547a4e68b4af9fd3158b4a87/html5/thumbnails/14.jpg)
TIL
Metrics typically appear in groups that are accessed together.
Optimizing storage for grouped access is a great idea!
![Page 15: Time Series Data with Apache Cassandra](https://reader034.vdocument.in/reader034/viewer/2022052617/547a4e68b4af9fd3158b4a87/html5/thumbnails/15.jpg)
What OpenNMS needs:
● High throughput● High availability● Late aggregation● Grouped storage/retrieval
![Page 16: Time Series Data with Apache Cassandra](https://reader034.vdocument.in/reader034/viewer/2022052617/547a4e68b4af9fd3158b4a87/html5/thumbnails/16.jpg)
Cassandra
● Apache top-level project● Distributed database● Highly available● High throughput● Tunable consistency
![Page 17: Time Series Data with Apache Cassandra](https://reader034.vdocument.in/reader034/viewer/2022052617/547a4e68b4af9fd3158b4a87/html5/thumbnails/17.jpg)
SSTables
Writes
Commitlog
Memtable
SSTable
DiskMemory
![Page 18: Time Series Data with Apache Cassandra](https://reader034.vdocument.in/reader034/viewer/2022052617/547a4e68b4af9fd3158b4a87/html5/thumbnails/18.jpg)
Write Properties
● Optimized for write throughput● Sorted on disk● Perfect for time series!
![Page 19: Time Series Data with Apache Cassandra](https://reader034.vdocument.in/reader034/viewer/2022052617/547a4e68b4af9fd3158b4a87/html5/thumbnails/19.jpg)
Partitioning
A
B
C
Key: Apple
...
AZ
![Page 20: Time Series Data with Apache Cassandra](https://reader034.vdocument.in/reader034/viewer/2022052617/547a4e68b4af9fd3158b4a87/html5/thumbnails/20.jpg)
Placement
A
B
C
Key: Apple
...
![Page 21: Time Series Data with Apache Cassandra](https://reader034.vdocument.in/reader034/viewer/2022052617/547a4e68b4af9fd3158b4a87/html5/thumbnails/21.jpg)
Replication
A
B
C
Key: Apple
...
![Page 22: Time Series Data with Apache Cassandra](https://reader034.vdocument.in/reader034/viewer/2022052617/547a4e68b4af9fd3158b4a87/html5/thumbnails/22.jpg)
CAP Theorem
Consistency
Availability
Partition tolerance
![Page 23: Time Series Data with Apache Cassandra](https://reader034.vdocument.in/reader034/viewer/2022052617/547a4e68b4af9fd3158b4a87/html5/thumbnails/23.jpg)
Consistency
A
B
?
W=2
![Page 24: Time Series Data with Apache Cassandra](https://reader034.vdocument.in/reader034/viewer/2022052617/547a4e68b4af9fd3158b4a87/html5/thumbnails/24.jpg)
Consistency
?
B
C
R=2
R+W > N
![Page 25: Time Series Data with Apache Cassandra](https://reader034.vdocument.in/reader034/viewer/2022052617/547a4e68b4af9fd3158b4a87/html5/thumbnails/25.jpg)
Distribution Properties
● Symmetrical● Linearly scalable● Redundant● Highly available
![Page 26: Time Series Data with Apache Cassandra](https://reader034.vdocument.in/reader034/viewer/2022052617/547a4e68b4af9fd3158b4a87/html5/thumbnails/26.jpg)
D ata odelM
![Page 27: Time Series Data with Apache Cassandra](https://reader034.vdocument.in/reader034/viewer/2022052617/547a4e68b4af9fd3158b4a87/html5/thumbnails/27.jpg)
Data Modelresource
![Page 28: Time Series Data with Apache Cassandra](https://reader034.vdocument.in/reader034/viewer/2022052617/547a4e68b4af9fd3158b4a87/html5/thumbnails/28.jpg)
Data Modelresource
T1 T2 T3
![Page 29: Time Series Data with Apache Cassandra](https://reader034.vdocument.in/reader034/viewer/2022052617/547a4e68b4af9fd3158b4a87/html5/thumbnails/29.jpg)
Data Modelresource
T1
M1 M2
V1 V2
M3
V3
T2
M1 M2
V1 V2
M3
V3
T3
M1 M2
V1 V2
M3
V3
![Page 30: Time Series Data with Apache Cassandra](https://reader034.vdocument.in/reader034/viewer/2022052617/547a4e68b4af9fd3158b4a87/html5/thumbnails/30.jpg)
Data Model
CREATE TABLE samples ( T timestamp,
M text,
V double,
resource text,
PRIMARY KEY(resource, T, M));
![Page 31: Time Series Data with Apache Cassandra](https://reader034.vdocument.in/reader034/viewer/2022052617/547a4e68b4af9fd3158b4a87/html5/thumbnails/31.jpg)
Data model
V1T1 M1 V1T2 M1 T3 V1M1resource
![Page 32: Time Series Data with Apache Cassandra](https://reader034.vdocument.in/reader034/viewer/2022052617/547a4e68b4af9fd3158b4a87/html5/thumbnails/32.jpg)
Data model
SELECT * FROM samplesWHERE resource = ‘resource’AND T >= ‘T1’ AND T <= ‘T3’;
V1T1 M1 V1T2 M1 T3 V1M1resource
![Page 33: Time Series Data with Apache Cassandra](https://reader034.vdocument.in/reader034/viewer/2022052617/547a4e68b4af9fd3158b4a87/html5/thumbnails/33.jpg)
Data model
SELECT * FROM samplesWHERE resource = ‘resource’AND T >= ‘T1’ AND T <= ‘T3’;
V1T1 M1 V1T2 M1 T3 V1M1resource
![Page 34: Time Series Data with Apache Cassandra](https://reader034.vdocument.in/reader034/viewer/2022052617/547a4e68b4af9fd3158b4a87/html5/thumbnails/34.jpg)
Data model
V1T1 M1 V1T2 M1 T3 V1M1resource
T1 M1 V1resource
T2 M2 V2resource
T3 M3 V3resource
![Page 35: Time Series Data with Apache Cassandra](https://reader034.vdocument.in/reader034/viewer/2022052617/547a4e68b4af9fd3158b4a87/html5/thumbnails/35.jpg)
Newts
● Standalone time series data-store○ Java API○ REST interface
● Raw sample storage and retrieval● Flexible aggregations (computed at read)
○ Rate (from counter types)○ Pluggable aggregation functions○ Arbitrary calculations
![Page 36: Time Series Data with Apache Cassandra](https://reader034.vdocument.in/reader034/viewer/2022052617/547a4e68b4af9fd3158b4a87/html5/thumbnails/36.jpg)
Newts
● Cassandra-speed● Resource search indexing (preliminary)● Approaching “1.0”● Apache license● Github (http://github.com/OpenNMS/newts)● http://newts.io
![Page 37: Time Series Data with Apache Cassandra](https://reader034.vdocument.in/reader034/viewer/2022052617/547a4e68b4af9fd3158b4a87/html5/thumbnails/37.jpg)
Fin