Cassandra for mission critical data
![Page 1: Cassandra for mission critical data](https://reader035.vdocument.in/reader035/viewer/2022062503/58ed3e0e1a28ab7d4f8b4613/html5/thumbnails/1.jpg)
Apache Cassandra for mission critical data
OLEKSANDR SEMENOV
![Page 2: Cassandra for mission critical data](https://reader035.vdocument.in/reader035/viewer/2022062503/58ed3e0e1a28ab7d4f8b4613/html5/thumbnails/2.jpg)
Agenda
1) CAP Theorem
2) NoSQL vs RDBMS: advantages and disadvantages
3) What is Cassandra? History.
4) Cassandra features
5) Cassandra data model
6) Ways to access data: Thrift, CQL, Kundera ORM
![Page 3: Cassandra for mission critical data](https://reader035.vdocument.in/reader035/viewer/2022062503/58ed3e0e1a28ab7d4f8b4613/html5/thumbnails/3.jpg)
What is NoSQL
NoSQL does not mean "Not SQL".
It means "Not Only SQL" or "Not Relational Database".
![Page 4: Cassandra for mission critical data](https://reader035.vdocument.in/reader035/viewer/2022062503/58ed3e0e1a28ab7d4f8b4613/html5/thumbnails/4.jpg)
CAP Theorem
You can choose only two: Consistency, Availability, Partition tolerance
![Page 5: Cassandra for mission critical data](https://reader035.vdocument.in/reader035/viewer/2022062503/58ed3e0e1a28ab7d4f8b4613/html5/thumbnails/5.jpg)
![Page 6: Cassandra for mission critical data](https://reader035.vdocument.in/reader035/viewer/2022062503/58ed3e0e1a28ab7d4f8b4613/html5/thumbnails/6.jpg)
Choosing AP data stores
Cassandra is an AP data store
![Page 7: Cassandra for mission critical data](https://reader035.vdocument.in/reader035/viewer/2022062503/58ed3e0e1a28ab7d4f8b4613/html5/thumbnails/7.jpg)
RDBMS
+ Strong mathematical basis
+ Referential integrity
+ ACID transactions
+ Standard SQL
+ Well-known approaches to data modeling
- Poor performance on large data volumes
- Scaling issues
![Page 8: Cassandra for mission critical data](https://reader035.vdocument.in/reader035/viewer/2022062503/58ed3e0e1a28ab7d4f8b4613/html5/thumbnails/8.jpg)
NoSQL
+ Great performance
+ Flexible data schema
+ Easy scaling
- Data redundancy
- Integrity must be ensured by the developer in most cases
- Different access interfaces for different stores
- Paradigm shift required
- BASE consistency model instead of ACID transactions
![Page 9: Cassandra for mission critical data](https://reader035.vdocument.in/reader035/viewer/2022062503/58ed3e0e1a28ab7d4f8b4613/html5/thumbnails/9.jpg)
ACID consistency model
Atomicity
• Transactions are all or nothing
Consistency
• Data written is valid according to all defined rules
Isolation
• Transactions do not affect each other
Durability
• Data written will not be lost
![Page 10: Cassandra for mission critical data](https://reader035.vdocument.in/reader035/viewer/2022062503/58ed3e0e1a28ab7d4f8b4613/html5/thumbnails/10.jpg)
BASE consistency model
![Page 11: Cassandra for mission critical data](https://reader035.vdocument.in/reader035/viewer/2022062503/58ed3e0e1a28ab7d4f8b4613/html5/thumbnails/11.jpg)
BASE system example
![Page 12: Cassandra for mission critical data](https://reader035.vdocument.in/reader035/viewer/2022062503/58ed3e0e1a28ab7d4f8b4613/html5/thumbnails/12.jpg)
What is Cassandra?
Cassandra is a key-multivalue store that is:
• non-relational
• highly scalable
• decentralized
• eventually consistent
![Page 13: Cassandra for mission critical data](https://reader035.vdocument.in/reader035/viewer/2022062503/58ed3e0e1a28ab7d4f8b4613/html5/thumbnails/13.jpg)
History
![Page 14: Cassandra for mission critical data](https://reader035.vdocument.in/reader035/viewer/2022062503/58ed3e0e1a28ab7d4f8b4613/html5/thumbnails/14.jpg)
Who uses Cassandra?
![Page 15: Cassandra for mission critical data](https://reader035.vdocument.in/reader035/viewer/2022062503/58ed3e0e1a28ab7d4f8b4613/html5/thumbnails/15.jpg)
![Page 16: Cassandra for mission critical data](https://reader035.vdocument.in/reader035/viewer/2022062503/58ed3e0e1a28ab7d4f8b4613/html5/thumbnails/16.jpg)
Cassandra Features
Decentralized
• each node has the same role and can process any request
Replication
• Cassandra supports multi-datacenter replication
Scalable
• read and write throughput both increase linearly as new machines are added
Durable
• data, once written, survives hardware failure
![Page 17: Cassandra for mission critical data](https://reader035.vdocument.in/reader035/viewer/2022062503/58ed3e0e1a28ab7d4f8b4613/html5/thumbnails/17.jpg)
Cassandra Features
Fault-tolerant
• data is automatically replicated to multiple nodes for fault tolerance
Tunable consistency
• you can choose the desired consistency level
CQL
• SQL-like query language
Very fast IO
• Both reads and writes are very fast
![Page 18: Cassandra for mission critical data](https://reader035.vdocument.in/reader035/viewer/2022062503/58ed3e0e1a28ab7d4f8b4613/html5/thumbnails/18.jpg)
Availability: partitioning with SPOF
![Page 19: Cassandra for mission critical data](https://reader035.vdocument.in/reader035/viewer/2022062503/58ed3e0e1a28ab7d4f8b4613/html5/thumbnails/19.jpg)
Availability: Cassandra & no SPOF
• Each node can act as a router
• Data is replicated to several nodes according to the replication factor
![Page 20: Cassandra for mission critical data](https://reader035.vdocument.in/reader035/viewer/2022062503/58ed3e0e1a28ab7d4f8b4613/html5/thumbnails/20.jpg)
Replication Factor
Replication Factor = 3
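
A minimal Python sketch of how a replication factor of 3 could map one key to three nodes. It assumes a simplified ring in which each node has a single token and the replicas are the next nodes clockwise. The node names, the MD5-based `token` helper, and `replicas_for` are illustrative assumptions, not Cassandra APIs.

```python
import hashlib
from bisect import bisect_left


def token(value):
    """Map a string to a ring position (MD5 is only an illustrative choice)."""
    return int(hashlib.md5(value.encode("utf-8")).hexdigest(), 16)


def replicas_for(key, nodes, replication_factor):
    """Return the nodes that would hold a copy of `key` on a toy ring."""
    ring = sorted((token(name), name) for name in nodes)
    tokens = [t for t, _ in ring]
    # First node whose token is >= the key's token, wrapping around the ring.
    start = bisect_left(tokens, token(key)) % len(ring)
    # A cluster smaller than the replication factor cannot hold more copies.
    count = min(replication_factor, len(ring))
    return [ring[(start + i) % len(ring)][1] for i in range(count)]


nodes = ["node-a", "node-b", "node-c", "node-d", "node-e"]
owners = replicas_for("user:42", nodes, replication_factor=3)
print(owners)  # three distinct node names
```

With five nodes and a replication factor of 3, any single node can fail and two copies of every key remain reachable.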
![Page 21: Cassandra for mission critical data](https://reader035.vdocument.in/reader035/viewer/2022062503/58ed3e0e1a28ab7d4f8b4613/html5/thumbnails/21.jpg)
Availability
![Page 22: Cassandra for mission critical data](https://reader035.vdocument.in/reader035/viewer/2022062503/58ed3e0e1a28ab7d4f8b4613/html5/thumbnails/22.jpg)
Tunable consistency
Consistency can be set on a per-operation basis
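
A short Python sketch of the arithmetic behind tunable consistency. ONE, QUORUM and ALL are Cassandra consistency levels; the helper names `replicas_required` and `read_sees_latest_write` are made up for this illustration. The rule shown is the usual overlap condition: a read is guaranteed to see the latest acknowledged write when the write and read replica counts together exceed the replication factor.

```python
def replicas_required(level, replication_factor):
    """Number of replicas that must acknowledge an operation at a given level."""
    if level == "ONE":
        return 1
    if level == "QUORUM":
        return replication_factor // 2 + 1
    if level == "ALL":
        return replication_factor
    raise ValueError("unsupported consistency level: " + level)


def read_sees_latest_write(write_level, read_level, replication_factor):
    """True when the read and write replica sets must overlap (R + W > RF)."""
    w = replicas_required(write_level, replication_factor)
    r = replicas_required(read_level, replication_factor)
    return r + w > replication_factor


for write_level, read_level in [("ONE", "ONE"), ("QUORUM", "QUORUM"), ("ALL", "ONE")]:
    print(write_level, read_level, read_sees_latest_write(write_level, read_level, 3))
```

Lower levels trade this guarantee for latency and availability, which is why the level is chosen per operation.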
![Page 23: Cassandra for mission critical data](https://reader035.vdocument.in/reader035/viewer/2022062503/58ed3e0e1a28ab7d4f8b4613/html5/thumbnails/23.jpg)
Write path in Cassandra
• A write can be sent to any node; that node acts as the coordinator
• Data is written to the commit log (for durability) and then to the memtable
• The memtable is periodically flushed to disk as an SSTable, and a new memtable is created in memory
• Deletes are a special case of writes: a tombstone is written
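
The write path on a single node can be sketched in Python as a toy model. This is a heavy simplification under stated assumptions: the `ToyNode` class, its method names, and the count-based flush threshold are invented for illustration, everything lives in memory, and compaction and commit log cleanup are left out.

```python
TOMBSTONE = object()  # marker written in place of a value when a key is deleted


class ToyNode:
    def __init__(self, flush_threshold=3):
        self.commit_log = []   # append-only record of every write
        self.memtable = {}     # newest values, held in memory
        self.sstables = []     # flushed tables, oldest first, never modified
        self.flush_threshold = flush_threshold

    def write(self, key, value):
        self.commit_log.append((key, value))  # durability first
        self.memtable[key] = value
        if len(self.memtable) >= self.flush_threshold:
            self.flush()

    def delete(self, key):
        # A delete is just a write of a tombstone.
        self.write(key, TOMBSTONE)

    def flush(self):
        if self.memtable:
            self.sstables.append(dict(self.memtable))
            self.memtable = {}  # start a fresh memtable

    def read(self, key):
        # Newest data wins: check the memtable, then SSTables from newest to oldest.
        for table in [self.memtable] + list(reversed(self.sstables)):
            if key in table:
                value = table[key]
                return None if value is TOMBSTONE else value
        return None


node = ToyNode(flush_threshold=2)
node.write("a", 1)
node.write("b", 2)   # second write reaches the threshold and triggers a flush
node.delete("a")     # tombstone lands in the new memtable
print(node.read("a"), node.read("b"))
```

Note that the flushed SSTable still contains the old value of `a`; the tombstone in the newer memtable is what hides it from reads.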
![Page 24: Cassandra for mission critical data](https://reader035.vdocument.in/reader035/viewer/2022062503/58ed3e0e1a28ab7d4f8b4613/html5/thumbnails/24.jpg)
Read path in Cassandra
• Any server can be queried; it acts as the coordinator
• The coordinator contacts a node that holds the requested key
• If the consistency level is lower than ALL, read repair is performed in the background
Read at consistency level = ONE
![Page 25: Cassandra for mission critical data](https://reader035.vdocument.in/reader035/viewer/2022062503/58ed3e0e1a28ab7d4f8b4613/html5/thumbnails/25.jpg)
Read repair
• Read repair means that when a query is made against a given key, we perform a digest query against all the replicas of the key and push the most recent version to any out-of-date replicas.
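
The idea of read repair can be sketched in Python. This toy version compares full values by timestamp rather than exchanging digests, and models each replica as a plain dictionary; the `Versioned` type and the `read_with_repair` function are assumptions made for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Versioned:
    value: str
    timestamp: int


def read_with_repair(replicas, key):
    """Return the newest value for `key` and update replicas that are behind."""
    versions = [replica[key] for replica in replicas if key in replica]
    if not versions:
        return None
    newest = max(versions, key=lambda v: v.timestamp)
    for replica in replicas:
        if replica.get(key) != newest:
            replica[key] = newest  # push the most recent version
    return newest.value


replicas = [
    {"k": Versioned("old", 1)},
    {"k": Versioned("new", 2)},
    {},  # this replica missed the write entirely
]
result = read_with_repair(replicas, "k")
print(result)
```

After the call all three replicas hold the version with the highest timestamp.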
![Page 26: Cassandra for mission critical data](https://reader035.vdocument.in/reader035/viewer/2022062503/58ed3e0e1a28ab7d4f8b4613/html5/thumbnails/26.jpg)
Cassandra data model

| RDBMS | Cassandra |
| --- | --- |
| Database | Keyspace |
| Table | ColumnFamily |
| Columns | Columns, SuperColumns |
![Page 27: Cassandra for mission critical data](https://reader035.vdocument.in/reader035/viewer/2022062503/58ed3e0e1a28ab7d4f8b4613/html5/thumbnails/27.jpg)
ColumnFamilies usage patterns
Static
Dynamic
![Page 28: Cassandra for mission critical data](https://reader035.vdocument.in/reader035/viewer/2022062503/58ed3e0e1a28ab7d4f8b4613/html5/thumbnails/28.jpg)
Columns
A column is a tuple of three fields: name, value and timestamp
![Page 29: Cassandra for mission critical data](https://reader035.vdocument.in/reader035/viewer/2022062503/58ed3e0e1a28ab7d4f8b4613/html5/thumbnails/29.jpg)
Special column types
• Expiring columns – columns that are removed automatically after a time to live (TTL)
• Counter columns – columns that hold a count which clients increment or decrement
• SuperColumns – columns which contain other columns. Deprecated.
![Page 30: Cassandra for mission critical data](https://reader035.vdocument.in/reader035/viewer/2022062503/58ed3e0e1a28ab7d4f8b4613/html5/thumbnails/30.jpg)
SuperColumns
![Page 31: Cassandra for mission critical data](https://reader035.vdocument.in/reader035/viewer/2022062503/58ed3e0e1a28ab7d4f8b4613/html5/thumbnails/31.jpg)
Indexes
• Primary index – index built on the key of each row
• Secondary index – index on column values; must be created manually. Good only for low-cardinality columns. Example: a Gender column can have only two values, M and F. And that is a problem.
• Indexing is performed in the background
![Page 32: Cassandra for mission critical data](https://reader035.vdocument.in/reader035/viewer/2022062503/58ed3e0e1a28ab7d4f8b4613/html5/thumbnails/32.jpg)
Data modelling
• Query-driven approach is required
• How to get data if I can query only by key?
• Denormalize it!
• Create multiple tables for data
• Use fast writes to do as few reads as possible
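
A small Python sketch of the denormalization idea, using dictionaries in place of tables. Table names, fields and the `save_user` function are hypothetical. Each query gets its own table keyed by what that query looks up, and every write updates all of them.

```python
from collections import defaultdict

# One "table" per query, each keyed by what the query looks up.
users_by_id = {}
users_by_email = {}
users_by_city = defaultdict(dict)


def save_user(user_id, email, city, name):
    """Write the same row to every table, so each read is a single key lookup."""
    row = {"user_id": user_id, "email": email, "city": city, "name": name}
    users_by_id[user_id] = dict(row)          # copies model duplicated storage
    users_by_email[email] = dict(row)
    users_by_city[city][user_id] = dict(row)


save_user(1, "ann@example.com", "Kyiv", "Ann")
save_user(2, "bob@example.com", "Kyiv", "Bob")

print(users_by_email["ann@example.com"]["name"])
print(sorted(users_by_city["Kyiv"]))
```

The cost is three writes per user and redundant data; the benefit is that no query needs a join or a scan.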
![Page 33: Cassandra for mission critical data](https://reader035.vdocument.in/reader035/viewer/2022062503/58ed3e0e1a28ab7d4f8b4613/html5/thumbnails/33.jpg)
What is Cassandra good for?
• Time series data (logs, sensor data)
• Write-intensive applications
• Applications with a predefined query model
![Page 34: Cassandra for mission critical data](https://reader035.vdocument.in/reader035/viewer/2022062503/58ed3e0e1a28ab7d4f8b4613/html5/thumbnails/34.jpg)
Never use Cassandra
• If you want to replace a traditional RDBMS with it
• If you can’t tell in which way your data will be queried
• If you have a lot of reads
• If strong consistency is required (financial, medical areas)
• Cassandra is not a silver-bullet solution
![Page 35: Cassandra for mission critical data](https://reader035.vdocument.in/reader035/viewer/2022062503/58ed3e0e1a28ab7d4f8b4613/html5/thumbnails/35.jpg)
![Page 36: Cassandra for mission critical data](https://reader035.vdocument.in/reader035/viewer/2022062503/58ed3e0e1a28ab7d4f8b4613/html5/thumbnails/36.jpg)
Ways to access data
Thrift
• First & native client. Deprecated.
Hector, Pelops
• Libraries based on Thrift
CQL
• SQL-like language, very limited
Kundera
• ORM/ONM framework
![Page 37: Cassandra for mission critical data](https://reader035.vdocument.in/reader035/viewer/2022062503/58ed3e0e1a28ab7d4f8b4613/html5/thumbnails/37.jpg)
Thrift
• Apache Thrift – framework for cross-language services development
• Supported languages: C++, Java, Python, PHP, Ruby, Erlang, Perl, Haskell, C#, Cocoa, JavaScript, Smalltalk, OCaml and others.
• Was developed by Facebook and released in 2007
• Deprecated as a Cassandra interface
![Page 38: Cassandra for mission critical data](https://reader035.vdocument.in/reader035/viewer/2022062503/58ed3e0e1a28ab7d4f8b4613/html5/thumbnails/38.jpg)
Hector
• Hector is a high-level Java client for Apache Cassandra, currently in use on a number of production systems.
• Includes an incredible number of features
![Page 39: Cassandra for mission critical data](https://reader035.vdocument.in/reader035/viewer/2022062503/58ed3e0e1a28ab7d4f8b4613/html5/thumbnails/39.jpg)
Hector main features
• Security – connection using Kerberos
• Integration with the Speed4j monitoring library
• Hector Object Mapper – simple ORM (not compliant with JPA)
• Connection pooling
• Failover behavior on the client side
![Page 40: Cassandra for mission critical data](https://reader035.vdocument.in/reader035/viewer/2022062503/58ed3e0e1a28ab7d4f8b4613/html5/thumbnails/40.jpg)
CQL
CQL – a SQL-like language introduced in Cassandra 0.8
Offers the following functionality:
• No JOINs
• Creating/dropping keyspaces, column families, columns and rows
• Inserting/retrieving columns
• Indexing
![Page 41: Cassandra for mission critical data](https://reader035.vdocument.in/reader035/viewer/2022062503/58ed3e0e1a28ab7d4f8b4613/html5/thumbnails/41.jpg)
Kundera ORM
Kundera is a “Polyglot Object Mapper”
Supports:
◦ Cassandra
◦ HBase
◦ MongoDB
◦ RDBMS
◦ and others
![Page 42: Cassandra for mission critical data](https://reader035.vdocument.in/reader035/viewer/2022062503/58ed3e0e1a28ab7d4f8b4613/html5/thumbnails/42.jpg)
Kundera ORM
JPA 2.1 compliant
Supports cross-datastore persistence
Supports many-to-many relationships
Allows adding support for any NoSQL store by implementing a Client Extension
![Page 43: Cassandra for mission critical data](https://reader035.vdocument.in/reader035/viewer/2022062503/58ed3e0e1a28ab7d4f8b4613/html5/thumbnails/43.jpg)
Performance Comparison
Benchmarked on Amazon Ubuntu large instance:
◦ 7.5 GB memory
◦ 4 EC2 Compute Units (2 virtual cores with 2 EC2 Compute Units each)
◦ 64-bit platform
![Page 44: Cassandra for mission critical data](https://reader035.vdocument.in/reader035/viewer/2022062503/58ed3e0e1a28ab7d4f8b4613/html5/thumbnails/44.jpg)
Performance Comparison
Concurrent load – 1 record per thread

| Number of threads (1 record each) | Pelops time (s) | Hector time (s) | Kundera time (s) |
| --- | --- | --- | --- |
| 10 | 0.148 | 0.100 | 0.117 |
| 100 | 0.350 | 0.363 | 0.361 |
| 1000 | 1.793 | 1.885 | 2.180 |
| 10000 | 11.478 | 11.480 | 14.262 |
| 40000 | 38.887 | 37.241 | 41.977 |
| 50000 | 48.646 | 47.749 | 49.285 |
| 100000 | 91.280 | 92.874 | 97.707 |
![Page 45: Cassandra for mission critical data](https://reader035.vdocument.in/reader035/viewer/2022062503/58ed3e0e1a28ab7d4f8b4613/html5/thumbnails/45.jpg)
Performance Comparison
[Chart: Concurrent load, 1 record per thread. Series: Pelops, Hector, Kundera. X axis: number of threads. Y axis: time, s.]
![Page 46: Cassandra for mission critical data](https://reader035.vdocument.in/reader035/viewer/2022062503/58ed3e0e1a28ab7d4f8b4613/html5/thumbnails/46.jpg)
Performance Comparison
Concurrent + Bulk load – 1000 records per thread

| Number of threads (1000 records each) | Pelops time (s) | Hector time (s) | Kundera time (s) |
| --- | --- | --- | --- |
| 10 | 5.929 | 5.286 | 7.722 |
| 100 | 34.750 | 32.228 | 39.124 |
| 1000 | 368.022 | 352.711 | 393.931 |
![Page 47: Cassandra for mission critical data](https://reader035.vdocument.in/reader035/viewer/2022062503/58ed3e0e1a28ab7d4f8b4613/html5/thumbnails/47.jpg)
Performance Comparison
[Chart: Concurrent + Bulk load, 1000 records per thread. Series: Kundera, Hector, Pelops. X axis: number of threads. Y axis: time, s.]
![Page 48: Cassandra for mission critical data](https://reader035.vdocument.in/reader035/viewer/2022062503/58ed3e0e1a28ab7d4f8b4613/html5/thumbnails/48.jpg)
Cassandra limitations
The key (and column names) must be smaller than 64 KB.
The maximum number of columns per row is 2 billion.
A single column value may not be larger than 2 GB.
All data read must fit in memory, because Thrift lacks streaming support.
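
As an illustration, the size limits above could be checked on the client before a write. This Python sketch is hypothetical: the constant names and `find_limit_violations` are not part of any Cassandra client, "2 billion" is taken as 2,000,000,000, and "2 GB" is read as 2 GiB.

```python
MAX_NAME_BYTES = 64 * 1024            # keys and column names must be smaller than this
MAX_COLUMNS_PER_ROW = 2_000_000_000   # "2 billion" columns per row
MAX_VALUE_BYTES = 2 * 1024 ** 3       # "2 GB", read here as 2 GiB (assumption)


def find_limit_violations(key, columns):
    """Return a list of problems for one row.

    `key` is bytes; `columns` maps column names (bytes) to values (bytes).
    """
    problems = []
    if len(key) >= MAX_NAME_BYTES:
        problems.append("row key is too long")
    if len(columns) > MAX_COLUMNS_PER_ROW:
        problems.append("too many columns in row")
    for name, value in columns.items():
        if len(name) >= MAX_NAME_BYTES:
            problems.append("column name is too long")
        if len(value) > MAX_VALUE_BYTES:
            problems.append("column value is too large")
    return problems


ok = find_limit_violations(b"user:42", {b"name": b"Ann"})
bad = find_limit_violations(b"k" * MAX_NAME_BYTES, {b"c" * 70000: b"v"})
print(ok, bad)
```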
![Page 49: Cassandra for mission critical data](https://reader035.vdocument.in/reader035/viewer/2022062503/58ed3e0e1a28ab7d4f8b4613/html5/thumbnails/49.jpg)
Summary
Great I/O performance
Several data access interfaces
AP data store (CAP)
Production ready & production proven
Good for time series data
Extremely available
![Page 50: Cassandra for mission critical data](https://reader035.vdocument.in/reader035/viewer/2022062503/58ed3e0e1a28ab7d4f8b4613/html5/thumbnails/50.jpg)
References
Datastax - http://www.datastax.com/docs/1.1/index
Apache Cassandra - http://cassandra.apache.org/
All Things Distributed - http://www.allthingsdistributed.com/
Hector - http://hector-client.github.com/hector/build/html/index.html
Kundera - https://github.com/impetus-opensource/Kundera
![Page 51: Cassandra for mission critical data](https://reader035.vdocument.in/reader035/viewer/2022062503/58ed3e0e1a28ab7d4f8b4613/html5/thumbnails/51.jpg)
Thank you!