©2012 DataStax
The Apache Cassandra storage engine
Sylvain Lebresne
NoSQL matters 2012
About me
• Sylvain Lebresne
• @pcmanus
1. What is Apache Cassandra
2. Data Model
3. The storage engine
about:project
• Distributed data store aimed at big data.
• Apache project since 2010.
• Version 1.1 released last month.
• Proven in production (Netflix, Twitter, Reddit, Cisco, ...). Largest known cluster holds over 300TB across more than 400 machines.
Apache Cassandra
A database:
• distributed / decentralized
• replicated & durable
• scalable / elastic
• fault-tolerant / no SPOF
• highly available
• data center aware
[Diagram: data centers in the US and Europe]
Data Model
• Not SQL (no transactions, no joins), but more than key/value.
• Inspired by Google BigTable.
• Based on column families.
Users
Ex: user profiles
“For each user, holds profile info”
Row 50e8-e29b: birth_year = 1994, fname = Justin, lname = Bieber
Row 2ab1-f1b7: birth_year = 1978, email = [email protected], fname = Ashton, lname = Kutcher
Timeline
Ex: user’s Tweets
“For each user, tweets he has made”
Row 50e8-e29b:
0: @LiveLoveKary glad you had a good birthday #muchlove
1: @NickDeMoura happy bday my dude.
2: @MickyArison @miamiHEAT thanks for the game tonight
3: still a little tired. back in the studio today with Timbaland
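The Users and Timeline examples can be pictured as a map from row key to a sorted map of columns. The sketch below is only an in-memory illustration of that shape, not Cassandra's API; the helper names `get_row` and `slice_columns` are made up for this example.

```python
# Hypothetical picture of a column family: row key -> {column name -> value}.
users = {
    "50e8-e29b": {"birth_year": 1994, "fname": "Justin", "lname": "Bieber"},
    "2ab1-f1b7": {"birth_year": 1978, "fname": "Ashton", "lname": "Kutcher"},
}

# In the Timeline column family the column names are tweet positions.
timeline = {
    "50e8-e29b": {
        0: "@LiveLoveKary glad you had a good birthday #muchlove",
        1: "@NickDeMoura happy bday my dude.",
        2: "@MickyArison @miamiHEAT thanks for the game tonight",
        3: "still a little tired. back in the studio today with Timbaland",
    },
}


def get_row(column_family, row_key):
    """Return the row's columns as (name, value) pairs sorted by column name."""
    return sorted(column_family.get(row_key, {}).items())


def slice_columns(column_family, row_key, start, end):
    """Return the columns whose name lies in [start, end], in column order."""
    return [(name, value)
            for name, value in get_row(column_family, row_key)
            if start <= name <= end]
```

Rows do not need the same columns, and a slice such as `slice_columns(timeline, "50e8-e29b", 1, 2)` reads a contiguous range of one row.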
There’s more
• Secondary indexes
• Distributed counters
• Composite columns
Goal
• Writes are harder than reads to scale
• Spinning disks aren’t good with random I/O
• Goal: minimize random I/O
A write’s journey
[Diagram: the memtable lives in memory; the commit log and the SSTables live on the hard drive]
1. write(k1, c1:v1): the update is appended to the commit log and applied to the memtable.
2. ack: the write is acknowledged.
3. write(k2, c1:v1 c2:v2): appended to the commit log and applied to the memtable.
4. write(k1, c1:v4 c3:v3): appended to the commit log; in the memtable the update is merged into the existing k1 row.
5. flush: the memtable is written to disk as an SSTable with its index, drawn as holding k1 c1:v4 c2:v2 c3:v3 and k2 c1:v1 c2:v2; the commit log is then cleaned up.
6. more updates: k1 c1:v5 c4:v4 and k2 c1:v2 c3:v3 go to the commit log and the memtable.
7. flush: a second SSTable, with its own index, now holds k1 c1:v5 c4:v4 and k2 c1:v2 c3:v3.
Writes properties
• No reads or seeks
• Only sequential I/O
• Immutable SSTables: easy snapshots
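The write path described above can be sketched in a few lines. This is a toy model under stated assumptions: the `Store` class and its method names are invented for illustration, and Python lists stand in for the files on disk. It only shows why a write needs no read or seek: one append to the log, one in-memory update, and a flush that writes sorted rows in one pass.

```python
class Store:
    """Toy write path: commit log append, memtable update, flush to an SSTable."""

    def __init__(self):
        self.commit_log = []  # stands in for the append-only log on disk
        self.memtable = {}    # row key -> {column: value}, held in memory
        self.sstables = []    # flushed tables, oldest first, never modified

    def write(self, key, columns):
        # Sequential append for durability; nothing is read back from disk.
        self.commit_log.append((key, dict(columns)))
        # Merge the new columns into the in-memory row.
        self.memtable.setdefault(key, {}).update(columns)
        return "ack"

    def flush(self):
        # Write the rows sorted by key as an immutable (tuple-based) SSTable.
        sstable = tuple(
            (key, tuple(sorted(columns.items())))
            for key, columns in sorted(self.memtable.items())
        )
        self.sstables.append(sstable)
        # The flushed data is now durable, so the log entries can be dropped.
        self.memtable = {}
        self.commit_log.clear()
        return sstable
```

Because a flushed table is never modified in this model, taking a snapshot amounts to keeping a reference to the existing tables.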
A read’s journey
[Diagram: two SSTables on the hard drive, each with its index]
SSTable 1: k1 c1:v4 c2:v2 c3:v3; k2 c1:v1 c2:v2
SSTable 2: k1 c1:v5 c4:v4; k2 c1:v2 c3:v3
read(k1): the k1 rows found in both SSTables are merged into c1:v5 c2:v2 c3:v3 c4:v4
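The merge step of a read can be sketched as follows. The slides do not show how the winning value is chosen; this sketch assumes each column carries a write timestamp and the newest one wins. The function name `read` and the integer timestamps are illustrative only.

```python
def read(key, sstables, memtable=None):
    """Merge every stored version of a row; per column, the newest timestamp wins.

    Each table maps row key -> {column: (timestamp, value)}.
    """
    merged = {}  # column name -> (timestamp, value)
    sources = list(sstables) + ([memtable] if memtable else [])
    for table in sources:
        for column, (timestamp, value) in table.get(key, {}).items():
            if column not in merged or timestamp > merged[column][0]:
                merged[column] = (timestamp, value)
    return {column: value for column, (_, value) in sorted(merged.items())}


# The two SSTables from the slide, with assumed timestamps 1 (older) and 2 (newer).
sstable_1 = {"k1": {"c1": (1, "v4"), "c2": (1, "v2"), "c3": (1, "v3")},
             "k2": {"c1": (1, "v1"), "c2": (1, "v2")}}
sstable_2 = {"k1": {"c1": (2, "v5"), "c4": (2, "v4")},
             "k2": {"c1": (2, "v2"), "c3": (2, "v3")}}

row = read("k1", [sstable_1, sstable_2])
```

With these inputs `row` contains the same four columns as the merged result on the slide.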
Compaction
• Goal: keep the number of SSTables low
• Merge sort against multiple SSTables
• Sequential I/O
[Diagram: two SSTables, each with its index, are merged into a single SSTable with one index]
SSTable 1: k1 c1:v4 c2:v2 c3:v3; k2 c1:v1 c2:v2
SSTable 2: k1 c1:v5 c4:v4; k2 c1:v2 c3:v3
Compacted SSTable: k1 c1:v5 c2:v2 c3:v3 c4:v4; k2 c1:v2 c2:v2 c3:v3
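Since every SSTable is already sorted by row key, compaction is a k-way merge. The sketch below uses `heapq.merge` from the Python standard library to stream the inputs in key order; the `compact` function and the timestamp-based conflict rule are assumptions made for the example, not Cassandra's code.

```python
import heapq
from itertools import groupby


def compact(*sstables):
    """Merge sorted SSTables into one sorted SSTable in a single pass.

    Each SSTable is a list of (row_key, {column: (timestamp, value)})
    sorted by row_key. Per column, the newest timestamp wins (assumed rule).
    """
    # heapq.merge reads each input front to back, i.e. sequentially.
    rows_in_key_order = heapq.merge(*sstables, key=lambda row: row[0])
    compacted = []
    for key, versions in groupby(rows_in_key_order, key=lambda row: row[0]):
        columns = {}
        for _, row_columns in versions:
            for name, cell in row_columns.items():
                if name not in columns or cell[0] > columns[name][0]:
                    columns[name] = cell
        compacted.append((key, dict(sorted(columns.items()))))
    return compacted


old = [("k1", {"c1": (1, "v4"), "c2": (1, "v2"), "c3": (1, "v3")}),
       ("k2", {"c1": (1, "v1"), "c2": (1, "v2")})]
new = [("k1", {"c1": (2, "v5"), "c4": (2, "v4")}),
       ("k2", {"c1": (2, "v2"), "c3": (2, "v3")})]

merged_table = compact(old, new)
```

The output has one row per key, so a later read of k1 or k2 touches one table instead of two.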
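SSTables
[Diagram: layout of an SSTable]
Memory: Bloom filter (BF), index summary
Disk:
• Index: row keys (k1, k2, k3, ...) with their positions in the data file (0, 312, ...)
• Data: for each row, the key (k1, k2, ...), a column Bloom filter (Col. BF), a column index (Col. Index), then the columns (c1:v4 c2:v2 c3:v3 ...)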
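One way to read the index summary / index / data layering is as a two-level lookup: a small sampled summary in memory narrows the search to a short stretch of the on-disk index, which then gives the row's offset in the data file. The sketch below assumes that reading; the function names, the sampling interval, and the use of Python lists in place of files are all hypothetical.

```python
import bisect


def build_index(rows):
    """rows: list of (key, serialized_bytes) sorted by key -> (data, index)."""
    data = bytearray()
    index = []  # (key, offset of the row in the data file)
    for key, payload in rows:
        index.append((key, len(data)))
        data += payload
    return bytes(data), index


def build_summary(index, interval=2):
    """Sample every `interval`-th index entry: (key, position in the index)."""
    return [(index[i][0], i) for i in range(0, len(index), interval)]


def find_offset(key, summary, index):
    """Use the summary to pick a small index range, then scan only that range."""
    sampled_keys = [k for k, _ in summary]
    slot = bisect.bisect_right(sampled_keys, key) - 1
    if slot < 0:
        return None  # key sorts before the first key of this SSTable
    start = summary[slot][1]
    end = summary[slot + 1][1] if slot + 1 < len(summary) else len(index)
    for candidate, offset in index[start:end]:
        if candidate == key:
            return offset
    return None


data, index = build_index([("k1", b"a" * 312), ("k2", b"bb"), ("k3", b"c")])
summary = build_summary(index, interval=2)
```

Here `find_offset("k2", summary, index)` scans two index entries rather than the whole index before returning the row's offset.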
Optimizations
• Row Cache
• Bloom filters: rule out whole SSTables for a read
• Key Cache
• Row & column indexes
• ...
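A Bloom filter answers "might this SSTable contain the key?" and never gives a false negative, which is why a "no" lets a read skip the table entirely. The sketch below is a generic Bloom filter, not Cassandra's implementation; the bit-array size, the number of hashes, and the use of SHA-256 as the hash are arbitrary choices for the example.

```python
import hashlib


class BloomFilter:
    """Generic Bloom filter: no false negatives, occasional false positives."""

    def __init__(self, size_bits=1024, num_hashes=3):
        self.size_bits = size_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((size_bits + 7) // 8)

    def _positions(self, key):
        # Derive num_hashes bit positions by hashing the key with different seeds.
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{key}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.size_bits

    def add(self, key):
        for pos in self._positions(key):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, key):
        # False means the key was definitely never added.
        return all(self.bits[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(key))


def tables_to_read(key, tables):
    """tables: list of (bloom_filter, sstable); keep those the filter cannot rule out."""
    return [sstable for bloom, sstable in tables if bloom.might_contain(key)]
```

A read would call something like `tables_to_read` first and only consult the index of the tables that remain.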
Other features
• Compression
• Checksums
• Time to live
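Compression and checksums combine naturally when data is stored in chunks: each chunk is compressed on its own and carries a checksum that is verified on read. The sketch below illustrates the idea with `zlib` from the Python standard library; the chunk size, the choice of zlib and CRC32, and the function names are assumptions for the example and say nothing about Cassandra's on-disk format.

```python
import zlib


def write_chunks(data, chunk_size=64 * 1024):
    """Split data into chunks; store (compressed bytes, CRC32 of those bytes)."""
    chunks = []
    for start in range(0, len(data), chunk_size):
        compressed = zlib.compress(data[start:start + chunk_size])
        chunks.append((compressed, zlib.crc32(compressed)))
    return chunks


def read_chunks(chunks):
    """Verify each chunk's checksum before decompressing it."""
    out = bytearray()
    for compressed, checksum in chunks:
        if zlib.crc32(compressed) != checksum:
            raise ValueError("checksum mismatch: chunk is corrupted")
        out += zlib.decompress(compressed)
    return bytes(out)
```

Working per chunk means a read only decompresses the chunks it needs, and corruption is detected before bad bytes reach the reader.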
QUESTIONS?
• http://cassandra.apache.org/
• http://wiki.apache.org/cassandra/
• http://www.datastax.com/docs/1.0