Cassandra 2.1 boot camp, compaction

Compaction

Agenda

● Overview
● Compaction strategies
● Tombstones
● Code walkthrough

● SSTables immutable
● Get rid of duplicate/overwritten data
● Drop deleted data and tombstones

● Manually, nodetool compact / scrub ...
● When we add sstables
  ○ After flush
  ○ Once a compaction is done
  ○ After streaming
● Search for usages of
  ○ o.a.c.db.compaction.CompactionManager#submitBackground

Types of compaction

● Minor - runs automatically in the background
● Major - includes all sstables, only for size tiered compaction
● Single-sstable compactions
  ○ upgradesstables
  ○ scrub
  ○ cleanup
● Anticompaction
  ○ After incremental repair to split out repaired/unrepaired data

Compaction strategies

● Pluggable interface
● Strategies decide
  ○ what sstables to compact
  ○ how big they should be
  ○ what implementation of CompactionTask to use
● Strategies can get notified when adding new sstables
  ○ Makes it possible to make smart decisions when deciding which sstables to compact
  ○ LCS does this to keep track of what sstables are in each level
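The pluggable idea above can be sketched roughly as follows. This is a Python illustration only, not Cassandra's Java API: `SSTable`, `CompactionStrategy`, `LevelTrackingStrategy`, the method names, and the "compact L0 at 4 sstables" rule are all hypothetical names and assumptions made for the example.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass


@dataclass(frozen=True)
class SSTable:
    """Hypothetical stand-in for an sstable: a name, a size and a level."""
    name: str
    size_mb: int
    level: int = 0


class CompactionStrategy(ABC):
    """Hypothetical interface: the strategy decides what to compact."""

    @abstractmethod
    def next_background_candidates(self):
        """Return the sstables to compact next (empty list if nothing to do)."""

    def on_sstable_added(self, sstable):
        """Notification hook; strategies that do not need it can ignore it."""


class LevelTrackingStrategy(CompactionStrategy):
    """Toy strategy that uses the notification to track sstables per level."""

    def __init__(self):
        self.levels = {}  # level number -> list of sstables

    def on_sstable_added(self, sstable):
        self.levels.setdefault(sstable.level, []).append(sstable)

    def next_background_candidates(self):
        # Toy rule (an assumption): compact L0 once it holds 4 or more sstables.
        l0 = self.levels.get(0, [])
        return list(l0) if len(l0) >= 4 else []


strategy = LevelTrackingStrategy()
for i in range(4):
    strategy.on_sstable_added(SSTable(f"sstable-{i}", size_mb=50))
candidates = strategy.next_background_candidates()
```

Because the strategy hears about every new sstable, it never has to rescan everything to know what lives in each level.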

SizeTieredCompactionStrategy

● Combines sstables based on their size
● Skips sstables that are ‘cold’ - not read much
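A minimal sketch of grouping sstables by size, assuming a bucketing rule where an sstable joins a bucket if its size is within a band around the bucket's average. The function names are hypothetical, the band limits and threshold are illustrative defaults, and the skipping of cold sstables is not modelled here.

```python
def size_tiered_buckets(sizes, bucket_low=0.5, bucket_high=1.5):
    """Group sstable sizes into buckets of similar size."""
    buckets = []  # each bucket is a list of sizes
    for size in sorted(sizes):
        for bucket in buckets:
            avg = sum(bucket) / len(bucket)
            if avg * bucket_low <= size <= avg * bucket_high:
                bucket.append(size)
                break
        else:
            # No existing bucket is close enough in size: start a new one.
            buckets.append([size])
    return buckets


def pick_bucket(buckets, min_threshold=4):
    """Return the first bucket with enough sstables to be worth compacting."""
    for bucket in buckets:
        if len(bucket) >= min_threshold:
            return bucket
    return []


sizes_mb = [10, 11, 12, 9, 100, 110, 1000]
buckets = size_tiered_buckets(sizes_mb)
chosen = pick_bucket(buckets)
```

With these sizes the four small sstables end up in one bucket and are the ones picked; the two mid-sized ones and the large one wait until they have enough peers.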

LeveledCompactionStrategy

● Keeps levels of non-overlapping sstables
● Each level is 10x the size of the previous one
● All sstables in levels 1+ are about the same size (160MB)
● L0 is the dumping ground, overlapping, larger sstables
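The level arithmetic can be worked through in a few lines. This sketch assumes that L1 targets 10 sstables' worth of data (so 10 x 160MB); that starting point and the helper names are assumptions for illustration, only the 10x factor and the 160MB size come from the slide.

```python
SSTABLE_SIZE_MB = 160   # sstable size from the slide
FANOUT = 10             # each level is 10x the size of the previous one


def max_level_size_mb(level, sstable_size_mb=SSTABLE_SIZE_MB, fanout=FANOUT):
    """Target size of level 1+, assuming L1 holds `fanout` sstables."""
    if level < 1:
        raise ValueError("L0 has no fixed target size in this sketch")
    return sstable_size_mb * fanout ** level


def levels_needed(total_mb):
    """Smallest number of levels (1+) whose combined targets hold total_mb."""
    level, capacity = 0, 0
    while capacity < total_mb:
        level += 1
        capacity += max_level_size_mb(level)
    return level


l1 = max_level_size_mb(1)   # 1600 MB
l2 = max_level_size_mb(2)   # 16000 MB
l3 = max_level_size_mb(3)   # 160000 MB
```

Under these assumptions, roughly 100GB of data fits within three levels, which is why the number of levels grows slowly with data size.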

Tombstones

● Write a tombstone to delete data
● Covers data, but only data that is older than the tombstone
● Drop covered data during compaction
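A small sketch of what "covers" means, assuming each cell carries a write timestamp and the tombstone applies to the whole partition. The function name and data layout are hypothetical, and how equal timestamps are resolved is deliberately not modelled.

```python
def drop_covered(cells, tombstone_ts):
    """Return the cells that survive a tombstone written at tombstone_ts.

    cells maps a cell name to a (timestamp, value) pair. Only cells
    strictly older than the tombstone are treated as covered.
    """
    return {
        name: (ts, value)
        for name, (ts, value) in cells.items()
        if not ts < tombstone_ts
    }


cells = {
    "name": (10, "alice"),            # written before the delete: covered
    "email": (30, "a@example.com"),   # written after the delete: survives
}
survivors = drop_covered(cells, tombstone_ts=20)
```

The write at timestamp 30 is kept even though a delete was issued, because the tombstone only covers data older than itself.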

When can we drop tombstones?

● Once the tombstone has existed for gc_grace_seconds
● When the tombstone is guaranteed to not cover any data on the node
  ○ All sstables containing the key are included in the compaction
  ○ The other sstables where the key exists only contain newer data
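The two conditions above can be combined into one check. This is a sketch under stated assumptions: `deleted_at` and `now` are in seconds, `tombstone_ts` is the tombstone's write timestamp, and `other_min_timestamps` holds the oldest timestamp of each sstable that contains the key but is not part of the compaction. All names are hypothetical.

```python
def can_drop_tombstone(tombstone_ts, deleted_at, now, gc_grace_seconds,
                       other_min_timestamps):
    """True if the tombstone is old enough and covers no data elsewhere."""
    # Condition 1: the tombstone has existed for gc_grace_seconds.
    old_enough = deleted_at + gc_grace_seconds < now
    # Condition 2: every sstable outside the compaction that holds the key
    # only contains newer data. An empty list means all sstables with the
    # key are included in the compaction.
    covers_nothing_elsewhere = all(tombstone_ts < ts for ts in other_min_timestamps)
    return old_enough and covers_nothing_elsewhere


GC_GRACE = 864000  # ten days, used here as an example value
later = 1000 + GC_GRACE + 1

all_included = can_drop_tombstone(100, 1000, later, GC_GRACE, [])
older_data_elsewhere = can_drop_tombstone(100, 1000, later, GC_GRACE, [50])
only_newer_elsewhere = can_drop_tombstone(100, 1000, later, GC_GRACE, [200])
too_young = can_drop_tombstone(100, 1000, 1010, GC_GRACE, [])
```

Only the second and fourth cases must keep the tombstone: in one it may still cover older data in another sstable, in the other it has not yet lived through the grace period.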

Code walkthrough

CompactionManager

● submitBackground
  ○ Trigger minor compaction
  ○ Fill executor with BackgroundCompactionTasks
● BackgroundCompactionTask
● submitMaximal
  ○ Major compaction
  ○ Not blocking, get() the future to block
  ○ runWithCompactionsDisabled
● OneSSTableOperation
  ○ Common way to run the single-sstable compactions in parallel

CompactionTask

● Gets executed in the CompactionExecutor and does the actual compacting

● Eventually calls runWith(..) which is where the magic happens

CompactionController

● Keep track of overlapping sstables
  ○ Is the currently compacting key in any other sstable?
● maxPurgeableTimestamp(DecoratedKey key)
  ○ How old tombstones do we need to keep?
  ○ Worst case, currently compacting key is the oldest in that sstable
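A rough sketch of the idea behind maxPurgeableTimestamp, written in Python rather than the actual Java. It assumes each overlapping sstable exposes the set of keys it may contain and its minimum timestamp; `OverlappingSSTable` and its fields are hypothetical. Since we do not know how old the key's data is inside another sstable, the sketch takes the worst case and uses that sstable's minimum timestamp.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class OverlappingSSTable:
    """Hypothetical view of an sstable that is not part of the compaction."""
    keys: frozenset
    min_timestamp: int


def max_purgeable_timestamp(key, overlapping):
    """Tombstones older than the returned timestamp cannot cover data elsewhere."""
    # Worst case: the key is the oldest thing in each sstable that holds it.
    candidates = [s.min_timestamp for s in overlapping if key in s.keys]
    # If no other sstable holds the key, nothing limits purging.
    return min(candidates, default=math.inf)


overlapping = [
    OverlappingSSTable(frozenset({"a", "b"}), min_timestamp=50),
    OverlappingSSTable(frozenset({"b"}), min_timestamp=20),
]
bound_a = max_purgeable_timestamp("a", overlapping)
bound_b = max_purgeable_timestamp("b", overlapping)
bound_z = max_purgeable_timestamp("z", overlapping)
```

The bound is conservative: it may keep a tombstone longer than strictly needed, but it never drops one that could still cover data.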

SSTableRewriter

● Open compaction results early

SSTableWriter

● Writes sstables…
● Give it rows, it writes index, data file, sstable metadata files etc
● openEarly(..)
  ○ link index and data files
  ○ in-memory-fake the rest of the files
● Collect SSTable metadata

SSTable metadata

● Collected whenever an sstable is written
● StatsMetadata
  ○ Kept on-heap
  ○ min/maxTimestamp
  ○ min/maxColumnNames
  ○ sstableLevel
● CompactionMetadata
  ○ Deserialized when needed
  ○ ancestors
  ○ cardinalityEstimator - HyperLogLog signature
● ValidationMetadata
  ○ Used to validate sstables when opening

Iterators all the way down

First sstable:
a  1 2 3
b  2 3 5
d  .. .. ..

Second sstable:
a  2 5 7
b  2 4 5
e  .. .. ..

Merged:
a  1 2 3 5 7
b  2 3 4 5
d  .. .. .. .. ..
e  .. .. .. .. ..

● “Partition iterator” for each sstable (SSTableScanner)

● “Cell iterator” for each partition (OnDiskAtomIterator)

● MergeIterator (MI) that takes a number of (sorted) iterators and merges them

● One MI for sstables that merges partitions

● One MI for each partition that merges cells
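The two levels of merging can be sketched with Python's standard library. This is an illustration, not the Cassandra classes: partitions are modelled as (key, sorted cell list) pairs, `heapq.merge` stands in for the MergeIterator, and the function names are hypothetical. The sample data uses the a and b partitions from the example above.

```python
import heapq
from itertools import groupby

# Each sstable is a list of (partition key, sorted cells), sorted by key.
sstable_1 = [("a", [1, 2, 3]), ("b", [2, 3, 5])]
sstable_2 = [("a", [2, 5, 7]), ("b", [2, 4, 5])]


def merge_cells(cell_iterators):
    """Inner merge: combine sorted cell iterators, keeping one copy per cell."""
    merged = heapq.merge(*cell_iterators)
    return [cell for cell, _ in groupby(merged)]


def merge_sstables(*sstables):
    """Outer merge: combine partitions by key, then merge each one's cells."""
    partitions = heapq.merge(*sstables, key=lambda p: p[0])
    for key, group in groupby(partitions, key=lambda p: p[0]):
        yield key, merge_cells(iter(cells) for _, cells in group)


result = list(merge_sstables(sstable_1, sstable_2))
```

The result holds one entry per partition, with the cells of both sstables merged in order, matching the merged rows for a and b.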

MergeIterator

● Interesting implementation is ManyToOne
● Merges many sorted iterators into one
● Reducer
  ○ reduce(..) gets called for every version that should be reduced
  ○ getReduced() gets called when all versions with the same name/priority/value have been reduce():ed

MergeIterator

1. call next()
2. poll one item out of the PQ
3. Reducer.reduce(..)
4. goto 2, until we find an item that differs
5. Call next() on the iterators you polled
6. Re-add the iterators to the PQ
7. return Reducer.getReduced
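The steps above can be sketched as a Python generator. This is not the ManyToOne implementation itself: items are assumed to be (name, timestamp) pairs, each input is assumed to be sorted with at most one item per name, and `NewestWinsReducer` is a hypothetical Reducer that keeps the version with the highest timestamp.

```python
import heapq


class NewestWinsReducer:
    """Hypothetical Reducer: collects versions, returns the newest one."""

    def __init__(self):
        self.versions = []

    def reduce(self, item):
        self.versions.append(item)

    def get_reduced(self):
        reduced = max(self.versions, key=lambda item: item[1])
        self.versions = []
        return reduced


def merge_many_to_one(iterators, reducer):
    """Merge sorted iterators of (name, timestamp) pairs, one result per name."""
    pq = []
    for index, source in enumerate(iterators):
        it = iter(source)
        first = next(it, None)
        if first is not None:
            # The index breaks ties so iterators are never compared.
            heapq.heappush(pq, (first, index, it))
    while pq:
        head, index, it = heapq.heappop(pq)        # step 2: poll one item
        reducer.reduce(head)                       # step 3
        polled = [(index, it)]
        while pq and pq[0][0][0] == head[0]:       # step 4: until an item differs
            item, index, it = heapq.heappop(pq)
            reducer.reduce(item)
            polled.append((index, it))
        for index, it in polled:                   # steps 5 and 6
            following = next(it, None)
            if following is not None:
                heapq.heappush(pq, (following, index, it))
        yield reducer.get_reduced()                # step 7


first_input = [("a", 1), ("b", 5)]
second_input = [("a", 3), ("c", 2)]
merged = list(merge_many_to_one([first_input, second_input], NewestWinsReducer()))
```

Step 1, the call to next(), corresponds to the consumer pulling the next value from the generator. Both versions of "a" go through reduce(..) before a single result is returned for it.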

CompactionIterable

● Creates LazilyCompactedRow
● Simple Reducer

LazilyCompactedRow

● “Lazy” because we don’t deserialize until we need to

● Uses a MergeIterator to merge the rows
● Drops tombstones if possible
  ○ Uses CompactionController for this
