Cassandra 2.1 boot camp, compaction
DESCRIPTION
Cassandra Summit Boot Camp, 2014. Protocol, Queries, and Cell Names. Marcus Eriksson, presenter.
TRANSCRIPT
Compaction
Agenda
● Overview
● Compaction strategies
● Tombstones
● Code walkthrough
Why?
● SSTables are immutable
● Get rid of duplicate/overwritten data
● Drop deleted data and tombstones
When?
● Manually, nodetool compact / scrub ...
● When we add sstables
○ After flush
○ Once a compaction is done
○ After streaming
● Search for usages of
○ o.a.c.db.compaction.CompactionManager#submitBackground
Types of compaction
● Minor - runs automatically in the background
● Major - includes all sstables, only for size tiered compaction
● Single-sstable compactions
○ upgradesstables
○ scrub
○ cleanup
● Anticompaction
○ After incremental repair to split out repaired/unrepaired data
Compaction strategies
● Pluggable interface
● Strategies decide
○ what sstables to compact
○ how big they should be
○ what implementation of CompactionTask to use
● Strategies can get notified when adding new sstables
○ Makes it possible to make smart decisions when deciding which sstables to compact
○ LCS does this to keep track of what sstables are in each level
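The pluggable-strategy idea above can be sketched roughly as follows. This is an illustrative Python outline, not Cassandra's Java interface: `CompactionStrategy`, `on_sstable_added`, `next_background_candidates`, `LevelTrackingStrategy`, and the `l0_threshold` trigger are all hypothetical names and values chosen for the example.

```python
from abc import ABC, abstractmethod
from collections import defaultdict
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class SSTable:
    name: str
    size_mb: int
    level: int = 0


class CompactionStrategy(ABC):
    """Hypothetical strategy interface; all names are illustrative."""

    def on_sstable_added(self, sstable: SSTable) -> None:
        """Notification hook. Strategies that keep no state can ignore it."""

    @abstractmethod
    def next_background_candidates(self) -> List[SSTable]:
        """Return the sstables to compact next, or an empty list."""


class LevelTrackingStrategy(CompactionStrategy):
    """Toy strategy that tracks which sstables are in each level."""

    def __init__(self, l0_threshold: int = 4):
        self.levels = defaultdict(list)
        self.l0_threshold = l0_threshold  # assumed trigger, not from the talk

    def on_sstable_added(self, sstable: SSTable) -> None:
        # Record the new sstable under its level as soon as we are notified.
        self.levels[sstable.level].append(sstable)

    def next_background_candidates(self) -> List[SSTable]:
        # Compact level 0 once enough sstables have piled up there.
        l0 = self.levels[0]
        return list(l0) if len(l0) >= self.l0_threshold else []
```

Because the strategy is told about every new sstable, it can answer "what should be compacted next?" from its own bookkeeping instead of rescanning all sstables.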
SizeTieredCompactionStrategy
● Combines sstables based on their size
● Skips sstables that are ‘cold’ - not read much
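A minimal sketch of grouping sstables by size, assuming a bucket accepts an sstable whose size falls within a band around the bucket's average. The function names, the `bucket_low`/`bucket_high`/`min_threshold` parameters, and their default values are assumptions for illustration, and the "cold sstable" filtering from the slide is not modeled.

```python
def bucket_by_size(sizes, bucket_low=0.5, bucket_high=1.5):
    """Group sstable sizes into buckets of similar size (simplified)."""
    buckets = []
    for size in sorted(sizes):
        for bucket in buckets:
            avg = sum(bucket) / len(bucket)
            # Join the first bucket whose average size is close enough.
            if avg * bucket_low <= size <= avg * bucket_high:
                bucket.append(size)
                break
        else:
            # No bucket was close enough, so start a new one.
            buckets.append([size])
    return buckets


def pick_bucket(buckets, min_threshold=4):
    """Pick the fullest bucket that has at least min_threshold sstables."""
    eligible = [b for b in buckets if len(b) >= min_threshold]
    return max(eligible, key=len) if eligible else []


sizes_mb = [10, 11, 12, 9, 100, 105, 1000]
buckets = bucket_by_size(sizes_mb)
candidates = pick_bucket(buckets)
```

With these inputs the four small sstables land in one bucket and become the compaction candidates, while the larger ones wait until enough similarly sized peers exist.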
LeveledCompactionStrategy
● Keeps levels of non-overlapping sstables
● Each level is 10x the size of the previous one
● All sstables in levels 1+ are about the same size (160MB)
● L0 is the dumping ground, overlapping, larger sstables
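The 10x growth can be made concrete with a little arithmetic. The 160MB sstable size and the factor of 10 come from the slide; the assumption that level 1 holds ten sstables is added here only so the numbers have a starting point.

```python
SSTABLE_SIZE_MB = 160  # from the slide
FANOUT = 10            # "each level is 10x the size of the previous one"


def level_capacity_mb(level, l1_sstables=10):
    """Target size of a level in MB.

    Assumes level 1 holds `l1_sstables` sstables; that starting point is an
    assumption of this sketch, not something stated in the talk.
    """
    if level < 1:
        raise ValueError("level 0 has no fixed target size in this sketch")
    return l1_sstables * SSTABLE_SIZE_MB * FANOUT ** (level - 1)


capacities = {level: level_capacity_mb(level) for level in (1, 2, 3)}
```

Under that assumption the targets are 1,600MB, 16,000MB, and 160,000MB for levels 1 to 3.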
Tombstones
● Write a tombstone to delete data
● Covers data, but only data that is older than the tombstone
● Drop covered data during compaction
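How a tombstone covers only older data can be sketched as a timestamp comparison. The `Cell` type, the use of `None` as a tombstone marker, and the rule that a tombstone wins a timestamp tie are assumptions of this sketch.

```python
from typing import NamedTuple, Optional


class Cell(NamedTuple):
    name: str
    value: Optional[str]  # None marks a tombstone in this sketch
    timestamp: int


def is_tombstone(cell: Cell) -> bool:
    return cell.value is None


def reconcile(versions):
    """Return the winning version of a cell: the one with the newest timestamp.

    On equal timestamps the tombstone wins (an assumption of this sketch).
    """
    return max(versions, key=lambda c: (c.timestamp, is_tombstone(c)))


# The tombstone (ts=5) covers the older write (ts=1) ...
covered = reconcile([Cell("c", "old", 1), Cell("c", None, 5)])
# ... but not a write that happened after the delete (ts=9).
rewritten = reconcile([Cell("c", None, 5), Cell("c", "new", 9)])
```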
When can we drop tombstones?
● Once the tombstone has existed for gc_grace_seconds
● When the tombstone is guaranteed to not cover any data on the node
○ All sstables containing the key are included in the compaction
○ The other sstables where the key exists only contain newer data
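The two conditions above can be combined into a single check. This is a sketch under stated assumptions: the function and parameter names are hypothetical, the tombstone's age is measured in seconds since it was written, and the caller is assumed to know the oldest timestamp for the key in each sstable that is not part of the compaction.

```python
def can_drop_tombstone(tombstone_ts, deletion_time_s, now_s,
                       gc_grace_seconds, outside_oldest_timestamps):
    """Decide whether a tombstone can be dropped during a compaction.

    outside_oldest_timestamps: for every sstable that contains the key but
    is NOT included in the compaction, the oldest timestamp it holds for
    that key. An empty list means all sstables with the key are included.
    """
    # Condition 1: the tombstone has existed for at least gc_grace_seconds.
    if now_s - deletion_time_s < gc_grace_seconds:
        return False
    # Condition 2: every other sstable holds only data newer than the
    # tombstone, so the tombstone covers nothing outside this compaction.
    return all(ts > tombstone_ts for ts in outside_oldest_timestamps)
```

For example, with `gc_grace_seconds=864000` a tombstone written 1,000,000 seconds ago can be dropped if the key appears in no other sstable, but must be kept if another sstable still holds older data for that key.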
Code walkthrough
CompactionManager
● submitBackground
○ Trigger minor compaction
○ Fill executor with BackgroundCompactionTasks
● BackgroundCompactionTask
● submitMaximal
○ Major compaction
○ Not blocking, get() the future to block
○ runWithCompactionsDisabled
● OneSSTableOperation
○ Common way to run the single-sstable compactions in parallel
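The "not blocking, get() the future to block" pattern looks roughly like this in Python, where `Future.result()` plays the role of Java's `Future.get()`. The `major_compaction` function is a hypothetical stand-in that simply merges sets of keys; it is not what `submitMaximal` does internally.

```python
from concurrent.futures import ThreadPoolExecutor


def major_compaction(sstables):
    """Hypothetical stand-in: merge all sstables into one sorted key list."""
    return sorted(set().union(*sstables))


with ThreadPoolExecutor(max_workers=1) as executor:
    # submit() returns a future immediately; the work runs in the background.
    future = executor.submit(major_compaction, [{"a", "b"}, {"b", "c"}])
    # Only this call blocks until the compaction has finished.
    merged = future.result()
```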
CompactionTask
● Gets executed in the CompactionExecutor and does the actual compacting
● Eventually calls runWith(..) which is where the magic happens
CompactionTask
CompactionController
● Keep track of overlapping sstables
○ Is the currently compacting key in any other sstable?
● maxPurgeableTimestamp(DecoratedKey key)
○ How old tombstones do we need to keep?
○ Worst case, currently compacting key is the oldest in that sstable
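The worst-case reasoning can be sketched as follows: if an overlapping sstable might contain the key, assume the key's data there is as old as that sstable's minimum timestamp. The `OverlappingSSTable` type and the use of a plain set as the key-presence check are assumptions of this sketch, not the controller's actual data structures.

```python
from typing import FrozenSet, Iterable, NamedTuple


class OverlappingSSTable(NamedTuple):
    keys: FrozenSet[str]  # stand-in for a key-presence check
    min_timestamp: int    # oldest timestamp anywhere in the sstable


NO_LIMIT = float("inf")


def max_purgeable_timestamp(key: str,
                            overlapping: Iterable[OverlappingSSTable]):
    """Tombstones older than the returned bound are candidates for purging.

    For each overlapping sstable that may contain the key, assume the worst
    case: the key's data is the oldest data in that sstable.
    """
    bound = NO_LIMIT
    for sstable in overlapping:
        if key in sstable.keys:
            bound = min(bound, sstable.min_timestamp)
    return bound


others = [
    OverlappingSSTable(frozenset({"a", "b"}), min_timestamp=100),
    OverlappingSSTable(frozenset({"b"}), min_timestamp=40),
]
```

Here key `a` gets a bound of 100, key `b` a bound of 40, and a key found in no overlapping sstable gets no limit at all.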
SSTableRewriter
● Open compaction results early
SSTableWriter
● Writes sstables…
● Give it rows, it writes index, data file, sstable metadata files etc
● openEarly(..)
○ link index and data files
○ in-memory-fake the rest of the files
● Collect SSTable metadata
SSTable metadata
● Collected whenever an sstable is written
● StatsMetadata
○ Kept on-heap
○ min/maxTimestamp
○ min/maxColumnNames
○ sstableLevel
● CompactionMetadata
○ Deserialized when needed
○ ancestors
○ cardinalityEstimator - HyperLogLog signature
● ValidationMetadata
○ Used to validate sstables when opening
Iterators all the way down
[Figure: two sorted sstables merged into one]
First sstable:
a 1 2 3
b 2 3 5
d .. .. ..
Second sstable:
a 2 5 7
b 2 4 5
e .. .. ..
Merged result:
a 1 2 3 5 7
b 2 3 4 5
d .. .. .. .. ..
e .. .. .. .. ..
● “Partition iterator” for each sstable (SSTableScanner)
● “Cell iterator” for each partition (OnDiskAtomIterator)
● MergeIterator (MI) that takes a number of (sorted) iterators and merges them
● One MI for sstables that merges partitions
● One MI for each partition that merges cells
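The two levels of merging can be sketched with Python's standard `heapq.merge`, using the `a` and `b` partitions from the figure (partitions `d` and `e` are left out because their cells are elided on the slide). Representing an sstable as a sorted list of `(key, cells)` pairs is an assumption of this sketch.

```python
import heapq
from itertools import groupby

# Each sstable is a list of (partition key, sorted cells), sorted by key.
sstable1 = [("a", [1, 2, 3]), ("b", [2, 3, 5])]
sstable2 = [("a", [2, 5, 7]), ("b", [2, 4, 5])]


def merge_cells(cell_iterables):
    """Inner merge: combine sorted cells of one partition, dropping repeats."""
    merged = heapq.merge(*cell_iterables)
    return [cell for cell, _ in groupby(merged)]


def merge_sstables(*sstables):
    """Outer merge: walk partitions of all sstables in key order."""
    partitions = heapq.merge(*sstables, key=lambda p: p[0])
    for key, group in groupby(partitions, key=lambda p: p[0]):
        # All versions of this partition, one per sstable that contains it.
        yield key, merge_cells([cells for _, cells in group])


merged = list(merge_sstables(sstable1, sstable2))
```

The result matches the merged rows shown for `a` and `b` in the figure.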
MergeIterator
● Interesting implementation is ManyToOne
● Merges many sorted iterators into one
● Reducer
○ reduce(..) gets called for every version that should be reduced
○ getReduced() gets called when all versions with the same name/priority/value have been reduce()d
MergeIterator
1. call next()
2. poll one item out of the PQ
3. Reducer.reduce(..)
4. goto 2, until we find an item that differs
5. Call next() on the iterators you polled
6. Re-add the iterators to the PQ
7. return Reducer.getReduced
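The steps above can be sketched with a heap as the priority queue. This is a simplified Python illustration of the many-to-one merge, not the Java implementation; `CountingReducer` and `merge_many_to_one` are hypothetical names, and the reducer here just counts how many equal versions it saw.

```python
import heapq

_DONE = object()  # sentinel for an exhausted iterator


class CountingReducer:
    """Hypothetical reducer: returns (item, number of versions merged)."""

    def __init__(self):
        self._item, self._count = None, 0

    def reduce(self, item):
        self._item = item
        self._count += 1

    def get_reduced(self):
        result = (self._item, self._count)
        self._item, self._count = None, 0
        return result


def merge_many_to_one(iterables, reducer):
    queue = []
    for index, iterable in enumerate(iterables):
        it = iter(iterable)
        first = next(it, _DONE)
        if first is not _DONE:
            # The unique index breaks ties so iterators are never compared.
            heapq.heappush(queue, (first, index, it))
    while queue:
        head = queue[0][0]
        polled = []
        # Steps 2-4: poll and reduce until the head of the queue differs.
        while queue and queue[0][0] == head:
            item, index, it = heapq.heappop(queue)
            reducer.reduce(item)
            polled.append((index, it))
        # Steps 5-6: advance the polled iterators and re-add them.
        for index, it in polled:
            following = next(it, _DONE)
            if following is not _DONE:
                heapq.heappush(queue, (following, index, it))
        # Step 7: hand back the reduced value.
        yield reducer.get_reduced()


result = list(merge_many_to_one([[1, 2, 3], [2, 5, 7]], CountingReducer()))
```

Only the value 2 appears in both inputs, so it is the only item reduced from two versions.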
CompactionIterable
● Creates LazilyCompactedRow
● Simple Reducer
LazilyCompactedRow
● “Lazy” because we don’t deserialize until we need to
● Uses a MergeIterator to merge the rows
● Drops tombstones if possible
○ Uses CompactionController for this