C* Summit 2013: Cassandra on Flash: Performance & Efficiency Lessons Learned by Matt Stump

#CASSANDRA13

Matthew Stump | Architect @ KISSmetrics

Real-time Large Queries

Upload: planet-cassandra

Post on 06-Dec-2014


DESCRIPTION

Flash Memory technology, deployed as server-side PCIe or solid state disks (SSDs), is emerging as a critical tool for performance and efficiency in data centers of all scales. This presentation will discuss how the use of Flash impacts Cassandra deployments in terms of configuration, DRAM requirements and performance expectations. Ideas on leveraging C*'s cutting-edge data-center awareness to blend flash and disk storage nodes for cost and workload efficiency will also be shared. Flash media itself will be examined from a physical perspective to understand endurance issues. Data on write amplification under bulk-load and operational workload conditions will be presented to explain the impact to Flash of C*'s Log Structured Merge Tree architecture and the associated compactions. Finally, we will examine strategies to make Cassandra more Flash-aware using both conventional techniques as well as emerging Non-volatile memory (NVM) programming capabilities. Lessons learned from real-world customer deployments will be shared to complete this presentation.

TRANSCRIPT

Page 2: C* Summit 2013: Cassandra on Flash: Performance & Efficiency Lessons Learned by Matt Stump

#CASSANDRA13

KISSmetrics Customers Want

*Churn Prediction

*AB Tests

*Which Blog Posts and Ad Campaigns Attract High Value Customers?

*User Conversion Funnel

*Revenue Prediction

*Customer Acquisition Costs

*Customer Lifetime Value

Understanding Queries

RowKey  username  first_name  last_name  postal_code
cstar   cstar     Cassandra   Database   94110
user2   user2     Some        Guy        94112

RowKey
94110   cstar
94112   user2  user4  user7  ...
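
The index row above is effectively an inverted view of the users table: the indexed column value becomes the row key, and the matching user keys become its columns. A minimal sketch of that idea follows, assuming plain in-memory dictionaries; the name `build_inverted_index` is hypothetical and this is not how Cassandra stores its indexes internally.

```python
from collections import defaultdict


def build_inverted_index(rows, column):
    """Map each value of `column` to the list of row keys holding it."""
    index = defaultdict(list)
    for row_key, record in rows.items():
        index[record[column]].append(row_key)
    return dict(index)


# Sample rows taken from the users table on the earlier slide.
users = {
    "cstar": {"first_name": "Cassandra", "last_name": "Database", "postal_code": 94110},
    "user2": {"first_name": "Some", "last_name": "Guy", "postal_code": 94112},
}

index = build_inverted_index(users, "postal_code")
print(index)
```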

Where Secondary Indexes Break

*High Cardinality Data

*Only one index per query

*Indexes are distributed

*Only some datatypes; no counters

*Range queries are expensive

What Do I Want?

*Index high cardinality data; e.g. counters

*Complex queries, with multiple clauses

*Results in < 500ms for billions of rows

*Sub-field searching with regular expressions

*Range queries

Bitmap and Bit-Slice Indexes

RowKey
94110   00001000 01000000 00000000 000000000
94112   10000110 01000000 00000000 000000000

hash(“cstar”) = 4
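
One way to read the bitmap rows: each user is assigned a bit position, and the bitmap for a field value has that bit set when the user holds the value. The sketch below assumes a 32-bit width and a hypothetical position table; only the position of "cstar" (4) comes from the slide, and the positions for user2, user4 and user7 are made up for illustration.

```python
from collections import defaultdict

BITMAP_WIDTH = 32  # assumed width, for illustration only


def build_bitmap_index(rows, position_of):
    """Map each field value to an integer bitmap.

    Bit i is set when the row assigned to position i has that value.
    """
    index = defaultdict(int)
    for row_key, value in rows:
        index[value] |= 1 << position_of(row_key)
    return dict(index)


def render(bitmap, width=BITMAP_WIDTH):
    """Render a bitmap with position 0 leftmost, in groups of 8 bits."""
    bits = "".join("1" if (bitmap >> i) & 1 else "0" for i in range(width))
    return " ".join(bits[i:i + 8] for i in range(0, width, 8))


# Hypothetical position assignment; the slide only states hash("cstar") = 4.
positions = {"cstar": 4, "user2": 0, "user4": 5, "user7": 6}
rows = [("cstar", 94110), ("user2", 94112), ("user4", 94112), ("user7", 94112)]

index = build_bitmap_index(rows, positions.__getitem__)
print(94110, render(index[94110]))
print(94112, render(index[94112]))
```

A real hash would need a strategy for collisions between row keys, which this sketch ignores.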

SELECT * FROM users WHERE zipcode = 94110 OR zipcode = 94112

94112 or 94110    10001110 01000000 00000000 000000000

Field    Index
94110    10001010 01000000 00000000 000000000
94112    10000110 01000000 00000000 000000000

SELECT * FROM users WHERE Event1 = true AND Event2 = true

Event1 and Event2    10000010 01000000 00000000 000000000

Field     Index
Event1    10001010 01000000 00000000 000000000
Event2    10000110 01000000 00000000 000000000
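
The OR and AND queries on the last two slides reduce to bitwise operations on the stored bitmaps. The sketch below reproduces both results from the bit strings shown on the slides; the helper names `to_int` and `to_bits` are assumptions introduced for illustration.

```python
def to_int(bits):
    """Parse a space-separated bit string into an integer."""
    return int(bits.replace(" ", ""), 2)


def to_bits(value, template):
    """Format value using the same width and spacing as template."""
    width = len(template.replace(" ", ""))
    digits = iter(format(value, f"0{width}b"))
    return "".join(" " if ch == " " else next(digits) for ch in template)


# Bitmaps copied from the slides (the same pair is used for both queries).
a = "10001010 01000000 00000000 000000000"
b = "10000110 01000000 00000000 000000000"

either = to_bits(to_int(a) | to_int(b), a)  # rows matching at least one clause
both = to_bits(to_int(a) & to_int(b), a)    # rows matching both clauses

print("OR :", either)
print("AND:", both)
```

Because each clause is a single bitwise pass over the bitmaps, adding clauses to a query adds passes rather than extra index lookups per row.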

Field          Value  Slice
event_counter  1      10001010 01000000 00000000 000000000
event_counter  2      10000110 01000000 00000000 000000000

SELECT * FROM users WHERE event_counter < 5

Value1 or Value2    10001110 01000000 00000000 000000000
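
A range predicate over a bit-sliced field can be answered by OR-ing the slices of every value that falls inside the range. The sketch below uses only the first byte of the two slices shown above and adds a hypothetical slice for value 7 to show a value that falls outside the range; `range_query` is an illustrative name, not part of any library.

```python
from functools import reduce
from operator import or_


def range_query(slices, low, high):
    """OR together the bitmaps of every value v with low <= v < high."""
    matching = (bitmap for value, bitmap in slices.items() if low <= value < high)
    return reduce(or_, matching, 0)


slices = {
    1: 0b10001010,  # first byte of the slice for event_counter = 1
    2: 0b10000110,  # first byte of the slice for event_counter = 2
    7: 0b01000001,  # hypothetical slice, outside the queried range
}

result = range_query(slices, 0, 5)  # event_counter < 5
print(format(result, "08b"))
```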

"this is a test string"

['thi', 's i', 's a', ' te', 'st ', 'str', 'ing']

['0x746869', '0x732069', '0x732061', '0x207465', '0x737420', '0x737472', '0x696e67']
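
The two lists above can be reproduced by cutting the string into consecutive 3-character chunks and hex-encoding each chunk's bytes. The sketch assumes ASCII text and non-overlapping chunks, as the slides show; whether the real index uses a sliding window instead is not stated. The function names are hypothetical.

```python
def trigrams(text):
    """Split text into consecutive, non-overlapping 3-character chunks."""
    return [text[i:i + 3] for i in range(0, len(text), 3)]


def encode(chunk):
    """Hex-encode a chunk so it can be indexed as a numeric value."""
    return "0x" + chunk.encode("ascii").hex()


chunks = trigrams("this is a test string")
encoded = [encode(chunk) for chunk in chunks]
print(chunks)
print(encoded)
```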

Field       Value            Slice
text_field  0x207465 ' te'   10001010 01000000 00000000 000000000
text_field  0x696e67 'ing'   10111110 10001000 00000000 000001000
text_field  0x732061 's a'   10001010 01000001 00001000 110101110
text_field  0x732069 's i'   10001010 01000000 10110011 000000000
text_field  0x737420 'st '   10001010 01001100 10110111 000000000
text_field  0x737472 'str'   10001010 01000000 00011010 011000000
text_field  0x746869 'thi'   10001010 01000000 10110111 000000010

"thi.*ing"

"thi" AND "ing"

0x746869 AND 0x696e67
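
The rewrite of "thi.*ing" into an AND of two trigram slices can be sketched as a two-step search: intersect the bitmaps of the literal trigrams to get candidate rows, then run the real regular expression on the candidates only. The sketch below is an illustration under stated assumptions: it indexes overlapping (sliding-window) trigrams so a literal can match at any offset, the sample documents are made up, and the list of literals is passed in by hand rather than extracted from the pattern.

```python
import re
from collections import defaultdict


def sliding_trigrams(text):
    """Return every overlapping 3-character substring of text."""
    return {text[i:i + 3] for i in range(len(text) - 2)}


def build_index(docs):
    """Map each trigram to a bitmap of the rows that contain it."""
    index = defaultdict(int)
    for row, text in enumerate(docs):
        for gram in sliding_trigrams(text):
            index[gram] |= 1 << row
    return index


def search(docs, index, literals, pattern):
    """AND the literal trigram bitmaps, then verify candidates with the regex."""
    candidates = (1 << len(docs)) - 1
    for literal in literals:
        candidates &= index.get(literal, 0)
    regex = re.compile(pattern)
    return [
        row for row in range(len(docs))
        if (candidates >> row) & 1 and regex.search(docs[row])
    ]


# Hypothetical documents; only the first comes from the slides.
docs = ["this is a test string", "thick fog", "nothing here", "ring this"]
index = build_index(docs)
matches = search(docs, index, ["thi", "ing"], "thi.*ing")
print(matches)
```

The verification step matters because the bitmap intersection ignores order and overlap: "nothing here" and "ring this" contain both trigrams but do not match the pattern.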

"th.*ing"

"th" AND "ing"

range(0x746800, 0x7468FF) AND 0x696e67

range("th" + 0x00, "th" + 0xFF) AND "ing"

Field       Value            Slice
text_field  0x207465 ' te'   10001010 01000000 00000000 000000000
text_field  0x696e67 'ing'   10111110 10001000 00000000 000001000
text_field  0x732061 's a'   10001010 01000001 00001000 110101110
text_field  0x732069 's i'   10001010 01000000 10110011 000000000
text_field  0x737420 'st '   10001010 01001100 10110111 000000000
text_field  0x737472 'str'   10001010 01000000 00011010 011000000
text_field  0x746869 'thi'   10001010 01000000 10110111 000000010
text_field  0x74687A 'thz'   10000000 00000001 00011100 000110010

range(0x746800, 0x7468FF) AND 0x696e67
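
A literal shorter than three characters, such as "th", cannot be looked up as a single trigram, so it becomes a range over every trigram that starts with it. A minimal sketch of that expansion follows. The bitmaps are short hypothetical values rather than the ones in the table, and the dictionary scan stands in for what would be a range read over sorted keys in a real store.

```python
from functools import reduce
from operator import or_


def prefix_range(prefix, width=3):
    """Return the lowest and highest encoded values that start with prefix."""
    pad = width - len(prefix)
    low = int.from_bytes(prefix + b"\x00" * pad, "big")
    high = int.from_bytes(prefix + b"\xff" * pad, "big")
    return low, high


def or_range(slices, low, high):
    """OR the bitmaps of every slice whose key lies in [low, high]."""
    matching = (bitmap for key, bitmap in slices.items() if low <= key <= high)
    return reduce(or_, matching, 0)


# Keys come from the slides; the 4-bit bitmaps are hypothetical.
slices = {
    0x696E67: 0b1101,  # 'ing'
    0x737472: 0b0100,  # 'str'
    0x746869: 0b0011,  # 'thi'
    0x74687A: 0b1000,  # 'thz'
}

low, high = prefix_range(b"th")
th_any = or_range(slices, low, high)   # rows containing any 'th?' trigram
result = th_any & slices[0x696E67]     # ... that also contain 'ing'
print(hex(low), hex(high), format(result, "04b"))
```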

Implementation

Query & Indexing Engine

Queries and Events

RowKey Offset 0x00 Offset 0x01 Offset 0x02 Offset 0x03

event1_0x00 10011000 10011000

event1_0x01 10011000 10011000 10011000
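
One possible reading of the layout above, offered as an assumption rather than a description of the actual engine: a long bitmap is split into segments, each segment is a row whose key is the field name plus a segment number, and each column holds one byte of the bitmap at a given offset. Columns that would be all zeros are simply absent. The constants and function names below are hypothetical.

```python
COLUMNS_PER_ROW = 4  # assumed: the slide shows offsets 0x00 to 0x03
BITS_PER_COLUMN = 8  # assumed: each cell on the slide holds 8 bits


def locate(field, position):
    """Translate a bit position into (row key, column offset, bit in column)."""
    column_index, bit = divmod(position, BITS_PER_COLUMN)
    segment, offset = divmod(column_index, COLUMNS_PER_ROW)
    return f"{field}_0x{segment:02x}", offset, bit


def set_bit(store, field, position):
    """Set one bit, creating the row and column only when needed."""
    row_key, offset, bit = locate(field, position)
    row = store.setdefault(row_key, {})
    row[offset] = row.get(offset, 0) | (0x80 >> bit)  # bit 0 is leftmost


store = {}
for position in (0, 3, 4, 40):
    set_bit(store, "event1", position)

for row_key, columns in store.items():
    print(row_key, {hex(o): format(v, "08b") for o, v in columns.items()})
```

Under these assumptions, positions 0, 3 and 4 land in the same column and produce the byte 10011000 seen on the slide, while position 40 creates a column in the second row without materializing anything in between.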

Results So Far

*Results returned for an 8-clause query over 4 billion rows in < 2 seconds

*Full regular expression support

*Full support for range queries

*Ability to index any numeric value, or any value that can be hashed

What Isn't Finished

*Support for atomic counters

*"Group By" query aggregation

*Still working on event processing and distribution

https://github.com/project-z/

@mattstump

THANK YOU