Talk About Apache Cassandra
Outline
• Overview
• Architecture Overview
• Partitioning and Replication
• Data Consistency
Overview
• Distributed
  – Data partitioned among all nodes
• Extremely Scalable
  – Add new node = Add more capacity
  – Easy to add new node
• Fault tolerant
  – All nodes are the same
  – Read/Write anywhere
  – Automatic data replication
Overview
• High Performance
• Schema-less (not completely true)
  – Need to provide basic settings for each column family.
http://blog.cubrid.org/dev-platform/nosql-benchmarking/
Architecture Overview
• Keyspace
  – Where the replication strategy and replication factor are defined
  – RDBMS synonym: Database
• Column family
  – Standard (recommended) or Super
  – Lots of settings can be defined
  – RDBMS synonym: Table
• Row/Record
  – Indexed by key. Columns might be indexed as well
  – Column names are sorted based on the comparator
  – Each column has its own timestamp
Architecture Overview
Standard CF
{
  Key1: { column1: value, column2: value },
  Key2: { column1: value, column2: value }
}
Standard column families are recommended. Super columns can largely be replaced by composite columns.
Super CF
{
  Key1: {
    super_column1: { subColumn1: value, subColumn2: value },
    super_column2: { subColumn1: value, subColumn2: value }
  },
  Key2: {
    super_column1: { subColumn1: value, subColumn2: value }
  }
}
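The remark that composite columns can stand in for super columns can be illustrated with a small sketch. It models a row as a Python dict and a composite column name as a tuple; the helper `flatten_super_row` and the sample values are hypothetical and only show the shape of the data, not how Cassandra stores it.

```python
def flatten_super_row(super_row):
    """Turn {super_column: {sub_column: value}} into {(super_column, sub_column): value}."""
    return {
        (super_name, sub_name): value
        for super_name, sub_columns in super_row.items()
        for sub_name, value in sub_columns.items()
    }


# A super column family row, modeled as nested dicts (sample data).
super_cf = {
    "Key1": {
        "super_column1": {"subColumn1": "a", "subColumn2": "b"},
        "super_column2": {"subColumn1": "c", "subColumn2": "d"},
    },
}

# The same data in a standard column family with composite column names.
standard_cf = {key: flatten_super_row(row) for key, row in super_cf.items()}

# Tuples compare component by component, so sorting the composite names
# keeps all sub-columns of one former super column next to each other.
sorted_names = sorted(standard_cf["Key1"])
print(sorted_names)
```

Because the first component of each composite name is the former super column name, a slice over one prefix reads back what used to be a single super column.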
Architecture Overview
• Commit log
  – Used to capture write activity. Data durability is assured.
• Memtable
  – Used to store the most recent write activity.
• SSTable
  – When a memtable is flushed to disk, it becomes an SSTable.
Architecture Overview
• Data write path
[Diagram: Data is written to the Commitlog and the Memtable; when the Memtable is flushed, it becomes an SSTable.]
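The write path can be sketched as a toy model. This is an illustration only, not Cassandra's implementation: the class `ToyStore`, its `flush_threshold` parameter, and the in-memory lists standing in for on-disk files are all assumptions made for the example.

```python
import time


class ToyStore:
    """Toy model of the write path: commit log, then memtable, then SSTable."""

    def __init__(self, flush_threshold=2):
        self.commit_log = []   # append-only record of every write (durability)
        self.memtable = {}     # most recent writes, kept in memory
        self.sstables = []     # immutable snapshots of flushed memtables
        self.flush_threshold = flush_threshold  # hypothetical setting

    def write(self, key, column, value, timestamp=None):
        ts = time.time() if timestamp is None else timestamp
        # 1. Record the write in the commit log first.
        self.commit_log.append((key, column, value, ts))
        # 2. Apply it to the memtable; each column carries its own timestamp.
        self.memtable.setdefault(key, {})[column] = (value, ts)
        # 3. Flush once the memtable holds enough rows.
        if len(self.memtable) >= self.flush_threshold:
            self.flush()

    def flush(self):
        # A flushed memtable becomes an SSTable; rows are stored sorted by key.
        if self.memtable:
            self.sstables.append(dict(sorted(self.memtable.items())))
            self.memtable = {}


store = ToyStore(flush_threshold=2)
store.write("boris", "first name", "boris", timestamp=1)
store.write("alice", "first name", "alice", timestamp=2)  # triggers a flush
store.write("carol", "first name", "carol", timestamp=3)  # stays in the memtable
print(len(store.sstables), list(store.memtable))
```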
Architecture Overview
• Data read path
  – Search the row cache. If there is a hit, return the result; no further action is needed.
  – If there is no hit in the row cache, read the data from the memtable(s) and SSTable(s), collate the results, and return them.
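A minimal sketch of the read path follows. It assumes each memtable and SSTable maps a row key to `{column: (value, timestamp)}` and that collating means keeping, per column, the value with the newest timestamp. The function `read_row`, the sample data, and the decision to populate the cache after a miss are assumptions for illustration.

```python
def read_row(key, row_cache, memtable, sstables):
    """Return the merged columns for `key`, or None if the row is unknown."""
    # 1. A row cache hit is returned immediately.
    if key in row_cache:
        return row_cache[key]

    # 2. Otherwise gather the row's fragments from the memtable and every SSTable.
    merged = {}
    for source in [memtable, *sstables]:
        for column, (value, ts) in source.get(key, {}).items():
            # Collate: for each column keep the value with the newest timestamp.
            if column not in merged or ts > merged[column][1]:
                merged[column] = (value, ts)

    if not merged:
        return None
    result = {column: value for column, (value, ts) in merged.items()}
    row_cache[key] = result  # populate the cache for later reads (assumption)
    return result


sstables = [
    {"boris": {"first name": ("boris", 1), "last name": ("Yan", 1)}},
    {"boris": {"last name": ("Yen", 2)}},  # newer value for the same column
]
memtable = {"boris": {"email": ("boris@example.com", 3)}}
cache = {}
row = read_row("boris", cache, memtable, sstables)
print(row)
```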
Partitioning and Replication
• In Cassandra, the total data managed by the cluster is represented as a circular space or ring.
• The ring is divided up into ranges equal to the number of nodes, with each node being responsible for one or more ranges of the overall data.
• Before a node can join the ring, it must be assigned a token. The token determines the node’s position on the ring and the range of data it is responsible for.
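As a rough illustration of token assignment, the sketch below spaces one token per node evenly around the ring. It assumes a token space of 0 to 2**127, which is the range commonly used with the random partitioner but is not stated in these slides; `initial_tokens` is a hypothetical helper name.

```python
def initial_tokens(node_count, token_space=2**127):
    """Return one evenly spaced token per node over the ring."""
    return [i * token_space // node_count for i in range(node_count)]


tokens = initial_tokens(4)
for node, token in enumerate(tokens):
    print(f"node {node}: initial token = {token}")
```

Evenly spaced tokens give each node a range of the same size, which is what keeps the load balanced when row keys are hashed.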
Partitioning
[Diagram: a ring of nodes with tokens 0, 2…, 4…, 6… and 8…; one node is marked "boris" is inserted here.]
Data is inserted and assigned a row key in a column family.
{ boris: { first name: boris, last name: Yen } }
Data is placed on a node based on its row key.
Partitioning Strategies
• Random Partitioning
  – This is the default and recommended strategy. It partitions data as evenly as possible across all nodes using an MD5 hash of every column family row key.
• Ordered Partitioning
  – Stores column family row keys in sorted order across all nodes in the cluster.
  – Sequential writes can cause hot spots
  – More administrative overhead to load balance the cluster
  – Uneven load balancing for multiple column families
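Random partitioning can be sketched as hashing the row key with MD5 and looking up the owning node on the ring. This is a simplified model: the way the digest is reduced to a token, the 2**127 token space, and the node names are assumptions, and Cassandra's own partitioner differs in its details.

```python
import bisect
import hashlib


def md5_token(row_key, token_space=2**127):
    """Map a row key to a ring position using an MD5 hash (simplified)."""
    digest = hashlib.md5(row_key.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % token_space


def owner(token, ring):
    """Return the node that owns `token`.

    `ring` is a list of (node_token, node_name) sorted by token. A node owns
    the range from the previous node's token (exclusive) up to its own token
    (inclusive), so the owner is the first node whose token is >= `token`,
    wrapping around to the first node past the end of the ring.
    """
    node_tokens = [t for t, _ in ring]
    index = bisect.bisect_left(node_tokens, token) % len(ring)
    return ring[index][1]


ring = [(0, "node-a"), (2**125, "node-b"), (2**126, "node-c"), (3 * 2**125, "node-d")]
print(owner(md5_token("boris"), ring))
```

With an ordered partitioner the token would be derived from the key itself rather than from a hash, so adjacent keys land on the same node, which is where the hot spots come from.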
Setting up data Partitioning
• The data partitioning strategy is controlled via the partitioner option in the cassandra.yaml file.
• Once a cluster is initialized with a partitioner option, it cannot be changed without reloading all of the data in the cluster.
Replication
• To ensure fault tolerance and no single point of failure, you can replicate one or more copies of every row across nodes in the cluster
• Replication is controlled by the parameters replication factor and replication strategy of a keyspace
• Replication factor controls how many copies of a row should be stored in the cluster.
• Replication strategy controls how the data is replicated.
Replication
[Diagram: a ring of nodes with tokens 0, 2…, 4…, 6… and 8…; with RF=3, three nodes are marked "boris" is inserted here.]
Data is inserted and assigned a row key in a column family.
{ boris: { first name: boris, last name: Yen } }
A copy of the row is replicated across various nodes based on the assigned replication factor.
Replication Strategies
• Simple Strategy
  – Place the original row on a node determined by the partitioner. Additional replica rows are placed on the next nodes clockwise in the ring.
• Network Topology Strategy
  – Allows replication between different racks in a data center and/or between multiple data centers
  – The original row is placed according to the partitioner. Additional replica rows in the same data center are then placed by walking the ring clockwise until a node in a different rack from the previous replica is found. If there is no such node, additional replicas are placed in the same rack.
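Both placement rules can be sketched for a single data center. The ring is modeled as a list of node names in clockwise token order; the function names, node names, and rack map are hypothetical. The rack-aware version interprets "a different rack" as "a rack that does not hold a replica yet", which is an assumption of this sketch rather than a statement of Cassandra's exact algorithm.

```python
def simple_strategy(ring, start, rf):
    """Owner first, then the next nodes clockwise until rf replicas are chosen."""
    return [ring[(start + i) % len(ring)] for i in range(min(rf, len(ring)))]


def rack_aware_strategy(ring, racks, start, rf):
    """Walk clockwise from the owner, preferring racks without a replica."""
    order = [ring[(start + i) % len(ring)] for i in range(len(ring))]
    replicas, skipped, seen_racks = [], [], set()
    for node in order:
        if len(replicas) == rf:
            break
        if racks[node] not in seen_racks:
            replicas.append(node)
            seen_racks.add(racks[node])
        else:
            skipped.append(node)
    # No unused rack left: fall back to nodes on racks that already hold a replica.
    for node in skipped:
        if len(replicas) == rf:
            break
        replicas.append(node)
    return replicas


ring = ["n1", "n2", "n3", "n4"]
racks = {"n1": "r1", "n2": "r1", "n3": "r2", "n4": "r2"}
print(simple_strategy(ring, 0, 3))
print(rack_aware_strategy(ring, racks, 0, 2))
```

In the example, the simple strategy puts two of three replicas on rack r1, while the rack-aware walk skips n2 so that the first two replicas sit on different racks.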
Replication - Network Topology Strategy
RF={DC1:2, DC2:2}
http://www.datastax.com/docs/1.0/cluster_architecture/replication
Replication Mechanics
• Cassandra uses a snitch to define how nodes are grouped together within the overall network topology, such as rack and data center groupings.
• The snitch is configured in the cassandra.yaml file.
Replication Mechanics - Snitches
• Simple Snitch
  – The default; used with the simple replication strategy
• Rack Inferring Snitch
  – Infers the topology of the network by analyzing the node IP addresses. This snitch assumes that the second octet identifies the data center where a node is located, and the third octet identifies the rack
• Property File Snitch
  – Determines the location of nodes by referring to a user-defined file, cassandra-topology.properties
• EC2 Snitch
  – For deployments on Amazon EC2 only
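The rule used by the Rack Inferring Snitch can be sketched in a few lines. The function `infer_location` and the sample addresses are hypothetical; the sketch only applies the octet convention described above.

```python
def infer_location(ip_address):
    """Infer (data_center, rack) from an IPv4 address.

    The second octet identifies the data center, the third octet the rack.
    """
    octets = ip_address.split(".")
    if len(octets) != 4:
        raise ValueError(f"not an IPv4 address: {ip_address!r}")
    return octets[1], octets[2]


for ip in ["10.1.5.20", "10.1.6.21", "10.2.5.22"]:
    dc, rack = infer_location(ip)
    print(f"{ip}: data center {dc}, rack {rack}")
```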
Data Consistency
• Cassandra supports tunable data consistency
• Choose from strong and eventual consistency depending on the need
• Can be done on a per-operation basis, and for both reads and writes.
• Handles multi-data center operations
Consistency Level for Writes
• Any
  – A write must succeed on any available node (hint included)
• One
  – A write must succeed on any node responsible for that row (either primary or replica)
• Quorum
  – A write must succeed on a quorum of replica nodes (RF/2 + 1, rounded down to a whole number)
• Local_Quorum
  – A write must succeed on a quorum of replica nodes in the same data center as the coordinator node.
• Each_Quorum
  – A write must succeed on a quorum of replica nodes in all data centers
• All
  – A write must succeed on all replica nodes for a row key
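The quorum arithmetic can be worked through with a short sketch. It uses the RF/2 + 1 formula with integer division and a per-data-center replication factor such as RF={DC1:2, DC2:2}. The function names are hypothetical, and the Any level is left out because a stored hint also satisfies it.

```python
def quorum(replication_factor):
    """Quorum size: RF/2 + 1 with integer division."""
    return replication_factor // 2 + 1


def required_acks(level, rf_by_dc, local_dc):
    """Replica acknowledgements needed for a write at the given level.

    rf_by_dc maps data center name -> replication factor in that data center.
    Returns an int when any replica counts, or a dict of data center -> acks
    when the level constrains where the replicas must be.
    """
    total_rf = sum(rf_by_dc.values())
    if level == "ONE":
        return 1
    if level == "QUORUM":
        return quorum(total_rf)
    if level == "LOCAL_QUORUM":
        return {local_dc: quorum(rf_by_dc[local_dc])}
    if level == "EACH_QUORUM":
        return {dc: quorum(rf) for dc, rf in rf_by_dc.items()}
    if level == "ALL":
        return total_rf
    raise ValueError(f"unsupported level: {level}")


rf_by_dc = {"DC1": 2, "DC2": 2}
for level in ["ONE", "QUORUM", "LOCAL_QUORUM", "EACH_QUORUM", "ALL"]:
    print(level, required_acks(level, rf_by_dc, local_dc="DC1"))
```

Note that with a replication factor of 2 a quorum is also 2, so Local_Quorum in this layout cannot tolerate a replica being down in the local data center.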
Consistency Level for Reads
• One
  – Reads from the closest node holding the data
• Quorum
  – Returns a result from a quorum of servers with the most recent timestamp for the data
• Local_Quorum
  – Returns a result from a quorum of servers with the most recent timestamp for the data in the same data center as the coordinator node
• Each_Quorum
  – Returns a result from a quorum of servers with the most recent timestamp in all data centers
• All
  – Returns a result from all replica nodes for a row key
Built-in Consistency Repair Features
• Read Repair
  – When a read is done, the coordinator node compares, in the background, the data from all the remaining replicas that own the row. If they are inconsistent, it issues writes to the out-of-date replicas to update the row.
• Anti-Entropy Node Repair
• Hinted Handoff
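The read repair step described above can be sketched as a comparison of replica responses by timestamp. The function `read_repair`, the replica names, and the sample values are hypothetical; the sketch handles a single column and only reports which replicas would receive a repair write.

```python
def read_repair(responses):
    """Find the newest value and the replicas that are out of date.

    responses maps replica name -> (value, timestamp) for one column.
    """
    newest = max(responses.values(), key=lambda vt: vt[1])
    stale = sorted(name for name, vt in responses.items() if vt[1] < newest[1])
    return newest, stale


responses = {
    "node-a": ("Yen", 20),
    "node-b": ("Yan", 10),  # missed the latest write
    "node-c": ("Yen", 20),
}
newest, stale = read_repair(responses)
for name in stale:
    print(f"repair write to {name}: {newest}")
```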
What is New in 1.0
• Column Family Compression
  – 2x-4x reduction in data size
  – 25-35% performance improvement on reads
  – 5-10% performance improvement on writes
• Improved Memory and Disk Space Management
  – Off-heap row cache
  – Storage engine self-tuning
  – Faster disk space reclamation
• Tunable Compaction Strategy
  – Supports a LevelDB-style compaction algorithm that can be enabled on a per-column family basis.
What is New in 1.0
• Cassandra Windows Service
• Improved Write Consistency and Performance
  – Hint data is stored more efficiently
  – Coordinator nodes no longer need to wait for the failure detector to mark a node as down before saving hints for unresponsive nodes.
• Running a full node repair to reconcile missed writes is not necessary. Full node repair is only necessary when multiple nodes fail simultaneously or a node is lost entirely
• Default read repair probability has been reduced from 100% to 10%
Anti-Patterns
• Non-Sun JVM
• CommitLog+Data on the same disk
  – Does not apply to SSDs or EC2
• Oversized JVM heaps
  – 6-8 GB is good
  – 10-12 GB is possible and in some circumstances “correct”
  – 16GB == max JVM heap size
  – > 16GB => badness
http://www.slideshare.net/mattdennis/cassandra-antipatterns
Anti-Patterns
• Large batch mutations
  – Timeout => entire mutation must be retried => wasted work
  – Keep the batch mutations to 10-100 (this really depends on the HW)
• Ordered partitioner
  – Creates hot spots
  – Requires extra care from operators
• Cassandra auto selection of tokens
  – Always specify your initial token.
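One way to follow the advice on batch size is to split a large list of mutations into small batches before sending them. The sketch below only does the splitting; `chunked`, the default `batch_size` of 50, and the sample mutations are assumptions, and sending each batch is left to whatever client is in use.

```python
def chunked(mutations, batch_size=50):
    """Yield successive batches of at most `batch_size` mutations."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(mutations), batch_size):
        yield mutations[start:start + batch_size]


# Sample mutations as (row key, column, value) tuples.
mutations = [("user%d" % i, "first name", "value%d" % i) for i in range(120)]
batches = list(chunked(mutations, batch_size=50))
print([len(batch) for batch in batches])
```

If one small batch times out, only that batch has to be retried instead of the whole set.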
Anti-Patterns
• Super Column
  – 10-15 percent performance penalty on reads and writes
  – Easier/better to use composite columns
• Read before write
• Windows
Want to Learn More
• http://www.datastax.com/resources/tutorials
• http://www.datastax.com/docs/1.0/index
P.S. Most of the content in this presentation comes from the websites above.
Q&A
We are hiring people
• If you are interested in what we are doing, please contact us.