Cassandra 2.1 boot camp, Compaction

Compaction

Upload: joshmckenzie

Post on 29-Jun-2015


DESCRIPTION

Cassandra Summit Boot Camp, 2014 Protocol, Queries, and Cell Names Marcus Eriksson presenter

TRANSCRIPT

Page 1: Cassandra 2.1 boot camp, Compaction

Compaction

Page 2: Cassandra 2.1 boot camp, Compaction

Agenda

● Overview
● Compaction strategies
● Tombstones
● Code walkthrough

Page 3: Cassandra 2.1 boot camp, Compaction

Why?

● SSTables are immutable
● Get rid of duplicate/overwritten data
● Drop deleted data and tombstones

Page 4: Cassandra 2.1 boot camp, Compaction

When?

● Manually, nodetool compact / scrub ...
● When we add sstables
  ○ After flush
  ○ Once a compaction is done
  ○ After streaming
● Search for usages of
  ○ o.a.c.db.compaction.CompactionManager#submitBackground

Page 5: Cassandra 2.1 boot camp, Compaction

Types of compaction

● Minor - runs automatically in the background
● Major - includes all sstables, only for size tiered compaction
● Single-sstable compactions
  ○ upgradesstables
  ○ scrub
  ○ cleanup
● Anticompaction
  ○ After incremental repair to split out repaired/unrepaired data

Page 6: Cassandra 2.1 boot camp, Compaction

Compaction strategies

● Pluggable interface
● Strategies decide
  ○ what sstables to compact
  ○ how big they should be
  ○ what implementation of CompactionTask to use
● Strategies can get notified when adding new sstables
  ○ Makes it possible to make smart decisions when deciding which sstables to compact
  ○ LCS does this to keep track of what sstables are in each level
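The pluggable-strategy idea can be sketched in a few lines of Python. This is an illustration only, not Cassandra's real interface: the class names, the method names, and the "compact L0 once it holds 4 sstables" rule are all assumptions made up for the example. It shows a strategy being notified about new sstables and using that to keep per-level bookkeeping.

```python
from abc import ABC, abstractmethod
from collections import defaultdict


class CompactionStrategySketch(ABC):
    """Hypothetical stand-in for a pluggable strategy (not the real interface)."""

    def add_sstable(self, sstable):
        """Notification hook: called whenever a new sstable appears."""

    @abstractmethod
    def next_background_candidates(self):
        """Return the sstables to compact next (possibly an empty list)."""


class LevelTrackingStrategy(CompactionStrategySketch):
    """Tracks which sstables are in each level via the notification hook."""

    def __init__(self):
        self.levels = defaultdict(list)

    def add_sstable(self, sstable):
        name, level = sstable
        self.levels[level].append(name)

    def next_background_candidates(self):
        # Toy rule, purely illustrative: compact level 0 once it holds 4 sstables.
        level_zero = self.levels[0]
        return list(level_zero) if len(level_zero) >= 4 else []


strategy = LevelTrackingStrategy()
for i in range(4):
    strategy.add_sstable((f"sstable-{i}", 0))
strategy.add_sstable(("sstable-4", 1))
print(strategy.next_background_candidates())
```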

Page 7: Cassandra 2.1 boot camp, Compaction

SizeTieredCompactionStrategy

● Combines sstables based on their size
● Skips sstables that are ‘cold’ - not read much
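A minimal sketch of "combines sstables based on their size": group sstables whose size is close to a bucket's running average, then compact a bucket once it has enough members. The function names are hypothetical, the 0.5/1.5/4 thresholds are example values chosen for the illustration, and the 'cold' sstable filtering is not modelled.

```python
def size_tiered_buckets(sizes, bucket_low=0.5, bucket_high=1.5):
    """Group sstable sizes into buckets of similar size.

    A size joins the first bucket whose running average `avg` satisfies
    avg * bucket_low <= size <= avg * bucket_high; otherwise it starts a new bucket.
    """
    buckets = []  # each entry: [running_average, [member sizes]]
    for size in sorted(sizes):
        for bucket in buckets:
            avg, members = bucket
            if avg * bucket_low <= size <= avg * bucket_high:
                members.append(size)
                bucket[0] = sum(members) / len(members)
                break
        else:
            buckets.append([size, [size]])
    return [members for _, members in buckets]


def compactable_buckets(buckets, min_threshold=4):
    """Only buckets with at least `min_threshold` sstables are worth compacting."""
    return [b for b in buckets if len(b) >= min_threshold]


buckets = size_tiered_buckets([10, 12, 11, 9, 100, 110, 1000])
print(buckets)                       # [[9, 10, 11, 12], [100, 110], [1000]]
print(compactable_buckets(buckets))  # [[9, 10, 11, 12]]
```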

Page 8: Cassandra 2.1 boot camp, Compaction

LeveledCompactionStrategy

● Keeps levels of non-overlapping sstables
● Each level is 10x the size of the previous one
● All sstables in levels 1+ are about the same size (160MB)
● L0 is the dumping ground, overlapping, larger sstables
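The "10x per level" rule is easy to work through numerically. The sketch below assumes that level 1 holds ten 160MB sstables (the slide gives the 10x ratio and the sstable size, but not the size of level 1), so treat the absolute numbers as illustrative.

```python
SSTABLE_SIZE_MB = 160  # per-sstable target size from the slide
FANOUT = 10            # each level is 10x the size of the previous one


def level_capacity_mb(level):
    """Approximate capacity of a level >= 1, assuming L1 holds FANOUT sstables."""
    if level < 1:
        raise ValueError("L0 has no fixed capacity in this sketch")
    return SSTABLE_SIZE_MB * FANOUT ** level


def sstables_in_full_level(level):
    """How many equally sized sstables fit into a full level."""
    return level_capacity_mb(level) // SSTABLE_SIZE_MB


for level in (1, 2, 3):
    print(level, level_capacity_mb(level), sstables_in_full_level(level))
```

Under these assumptions L1 is about 1.6GB, L2 about 16GB and L3 about 160GB.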

Page 9: Cassandra 2.1 boot camp, Compaction

Tombstones

● Write a tombstone to delete data
● Covers data, but only data that is older than the tombstone
● Drop covered data during compaction
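A small sketch of what "covers data that is older than the tombstone" means when several versions of one cell meet in a compaction. The `Cell` type and the `drop_covered` helper are hypothetical; a `value` of `None` stands for a tombstone, and resolving equal timestamps in favour of the tombstone is a choice made for this sketch.

```python
from typing import NamedTuple, Optional


class Cell(NamedTuple):
    name: str
    timestamp: int
    value: Optional[str]  # None marks a tombstone in this sketch


def drop_covered(versions):
    """Drop every version covered by the newest tombstone; keep that tombstone."""
    newest_tombstone = max(
        (v.timestamp for v in versions if v.value is None), default=None
    )
    if newest_tombstone is None:
        return list(versions)  # nothing was deleted
    return [
        v for v in versions
        if v.timestamp > newest_tombstone
        or (v.value is None and v.timestamp == newest_tombstone)
    ]


versions = [Cell("c", 1, "old"), Cell("c", 5, None), Cell("c", 9, "new")]
print(drop_covered(versions))
```

The write at timestamp 1 is covered and disappears; the write at timestamp 9 is newer than the tombstone and survives. The tombstone itself is kept here; whether it can also be dropped is a separate question.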

Page 10: Cassandra 2.1 boot camp, Compaction

When can we drop tombstones?

● Once the tombstone has existed for gc_grace_seconds
● When the tombstone is guaranteed to not cover any data on the node
  ○ All sstables containing the key are included in the compaction, or
  ○ The other sstables where the key exists only contain newer data
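The two conditions can be written down as a single predicate. This is a sketch under stated assumptions, not the actual implementation: the function and parameter names are invented, and the caller is assumed to know the exact timestamps of the key's data in sstables outside the compaction.

```python
def can_drop_tombstone(tombstone_ts, deleted_at, now, gc_grace_seconds,
                       timestamps_outside_compaction):
    """Decide whether a tombstone may be dropped.

    tombstone_ts: write timestamp of the tombstone.
    deleted_at / now: wall-clock seconds.
    timestamps_outside_compaction: timestamps of data for the same key in
        sstables that are NOT part of this compaction (empty when every
        sstable containing the key is included).
    """
    old_enough = deleted_at + gc_grace_seconds < now
    covers_nothing_elsewhere = all(
        ts > tombstone_ts for ts in timestamps_outside_compaction
    )
    return old_enough and covers_nothing_elsewhere


GC_GRACE = 864_000  # ten days, used here only as an example value

print(can_drop_tombstone(100, 0, 1_000_000, GC_GRACE, []))     # True
print(can_drop_tombstone(100, 0, 1_000, GC_GRACE, []))         # False: too young
print(can_drop_tombstone(100, 0, 1_000_000, GC_GRACE, [50]))   # False: still covers data
print(can_drop_tombstone(100, 0, 1_000_000, GC_GRACE, [150]))  # True: only newer data elsewhere
```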

Page 11: Cassandra 2.1 boot camp, Compaction

Code walkthrough

Page 12: Cassandra 2.1 boot camp, Compaction
Page 13: Cassandra 2.1 boot camp, Compaction

CompactionManager

● submitBackground
  ○ Trigger minor compaction
  ○ Fill executor with BackgroundCompactionTasks
● BackgroundCompactionTask
● submitMaximal
  ○ Major compaction
  ○ Not blocking, get() the future to block
  ○ runWithCompactionsDisabled
● OneSSTableOperation
  ○ Common way to run the single-sstable compactions in parallel
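The "not blocking, get() the future to block" pattern is a general executor idiom. The sketch below shows a Python analogue using `concurrent.futures`, where `result()` plays the role of `get()` on a Java future. The `major_compaction` and `submit_maximal` functions are stand-ins written for this example and do not mirror the real method bodies.

```python
from concurrent.futures import ThreadPoolExecutor


def major_compaction(sstables):
    """Stand-in for the real work: pretend to merge all sstables into one."""
    return "+".join(sorted(sstables))


executor = ThreadPoolExecutor(max_workers=1)


def submit_maximal(sstables):
    """Returns immediately with a future; the work runs on the executor."""
    return executor.submit(major_compaction, sstables)


future = submit_maximal(["c", "a", "b"])
# The caller is free to do other things here; nothing has blocked yet.
result = future.result()  # blocks until the task has finished
executor.shutdown()
print(result)
```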

Page 14: Cassandra 2.1 boot camp, Compaction

CompactionTask

● Gets executed in the CompactionExecutor and does the actual compacting

● Eventually calls runWith(..) which is where the magic happens

Page 15: Cassandra 2.1 boot camp, Compaction

CompactionTask

Page 16: Cassandra 2.1 boot camp, Compaction

CompactionController

● Keep track of overlapping sstables
  ○ Is the currently compacting key in any other sstable?
● maxPurgeableTimestamp(DecoratedKey key)
  ○ How old tombstones do we need to keep?
  ○ Worst case, currently compacting key is the oldest in that sstable
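The worst-case reasoning can be sketched like this: for every overlapping sstable that might contain the key, assume the key's data is as old as the oldest timestamp anywhere in that sstable, and take the minimum. The `SSTable` tuple and its `keys` set (a stand-in for a cheap "might this sstable contain the key?" check) are assumptions for illustration, not the real data structures.

```python
from typing import NamedTuple


class SSTable(NamedTuple):
    name: str
    keys: frozenset     # stand-in for a membership check on the sstable
    min_timestamp: int  # oldest timestamp of anything in the sstable


NO_LIMIT = float("inf")  # no overlapping sstable contains the key


def max_purgeable_timestamp(key, overlapping):
    """Tombstones for `key` older than the returned value cannot cover
    any data in the overlapping sstables."""
    return min(
        (s.min_timestamp for s in overlapping if key in s.keys),
        default=NO_LIMIT,
    )


overlapping = [
    SSTable("x", frozenset({"a", "b"}), 40),
    SSTable("y", frozenset({"b"}), 10),
]
print(max_purgeable_timestamp("a", overlapping))  # 40
print(max_purgeable_timestamp("b", overlapping))  # 10
print(max_purgeable_timestamp("z", overlapping))  # inf
```

The bound is deliberately conservative: key "b" may only have recent data in sstable "y", but because only the sstable-wide minimum is known, a tombstone for "b" with timestamp 10 or later has to be kept.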

Page 17: Cassandra 2.1 boot camp, Compaction

SSTableRewriter

● Open compaction results early

Page 18: Cassandra 2.1 boot camp, Compaction

SSTableWriter

● Writes sstables…
● Give it rows, it writes index, data file, sstable metadata files etc
● openEarly(..)
  ○ link index and data files
  ○ in-memory-fake the rest of the files
● Collect SSTable metadata

Page 19: Cassandra 2.1 boot camp, Compaction

SSTable metadata

● Collected whenever an sstable is written
● StatsMetadata
  ○ Kept on-heap
  ○ min/maxTimestamp
  ○ min/maxColumnNames
  ○ sstableLevel
● CompactionMetadata
  ○ Deserialized when needed
  ○ ancestors
  ○ cardinalityEstimator - HyperLogLog signature
● ValidationMetadata
  ○ Used to validate sstables when opening
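"Collected whenever an sstable is written" suggests a collector that is updated as cells pass through the writer and is finalized once the file is complete. Below is a minimal sketch covering only the min/max timestamp and level fields; the class and method names are hypothetical and unrelated to the real classes.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StatsSketch:
    min_timestamp: int
    max_timestamp: int
    sstable_level: int


class CollectorSketch:
    """Accumulates statistics while cells are being written."""

    def __init__(self, sstable_level=0):
        self.sstable_level = sstable_level
        self.timestamps_seen = 0
        self.min_timestamp = float("inf")
        self.max_timestamp = float("-inf")

    def update_timestamp(self, timestamp):
        self.timestamps_seen += 1
        self.min_timestamp = min(self.min_timestamp, timestamp)
        self.max_timestamp = max(self.max_timestamp, timestamp)

    def finish(self):
        if self.timestamps_seen == 0:
            raise ValueError("no cells were written")
        return StatsSketch(self.min_timestamp, self.max_timestamp, self.sstable_level)


collector = CollectorSketch(sstable_level=0)
for ts in (42, 7, 19):
    collector.update_timestamp(ts)
stats = collector.finish()
print(stats)
```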

Page 20: Cassandra 2.1 boot camp, Compaction

Iterators all the way down

sstable 1      sstable 2      merged
a 1 2 3        a 2 5 7        a 1 2 3 5 7
b 2 3 5        b 2 4 5        b 2 3 4 5
d .. .. ..     e .. .. ..     d .. .. .. .. ..
                              e .. .. .. .. ..

● “Partition iterator” for each sstable (SSTableScanner)

● “Cell iterator” for each partition (OnDiskAtomIterator)

● MergeIterator (MI) that takes a number of (sorted) iterators and merges them

● One MI for sstables that merges partitions

● One MI for each partition that merges cells
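The two levels of merging can be reproduced with the standard library, using the "a" and "b" partitions from the diagram (the "d" and "e" partitions have elided cells, so they are left out). This is only a sketch of the idea: `heapq.merge` and `itertools.groupby` stand in for the MergeIterator, and cells are plain integers.

```python
import heapq
from itertools import groupby

# Each sstable is a sorted sequence of (partition key, sorted cells).
sstable_1 = [("a", [1, 2, 3]), ("b", [2, 3, 5])]
sstable_2 = [("a", [2, 5, 7]), ("b", [2, 4, 5])]


def merge_cells(cell_iterators):
    """Inner merge: combine sorted cell iterators, collapsing equal cells."""
    return [cell for cell, _ in groupby(heapq.merge(*cell_iterators))]


def merge_sstables(*sstables):
    """Outer merge: combine sorted partition iterators, then merge the
    cells of all partitions that share a key."""
    partitions = heapq.merge(*sstables, key=lambda p: p[0])
    for key, group in groupby(partitions, key=lambda p: p[0]):
        yield key, merge_cells(cells for _, cells in group)


merged = list(merge_sstables(sstable_1, sstable_2))
print(merged)  # [('a', [1, 2, 3, 5, 7]), ('b', [2, 3, 4, 5])]
```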

Page 21: Cassandra 2.1 boot camp, Compaction

MergeIterator

● Interesting implementation is ManyToOne
● Merges many sorted iterators into one
● Reducer
  ○ reduce(..) gets called for every version that should be reduced
  ○ getReduced() gets called when all versions with the same name/priority/value have been reduce():ed

Page 22: Cassandra 2.1 boot camp, Compaction

MergeIterator

1. call next()
2. poll one item out of the PQ
3. Reducer.reduce(..)
4. goto 2, until we find an item that differs
5. Call next() on the iterators you polled
6. Re-add the iterators to the PQ
7. return Reducer.getReduced
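The seven steps can be followed in a short Python sketch built on a heap as the priority queue. It is an approximation written for this transcript, not a port of the real class: `CountingReducer` is a hypothetical reducer that collapses equal items and counts how many versions it saw.

```python
import heapq

_EXHAUSTED = object()


class CountingReducer:
    """Hypothetical reducer: remembers the item and counts its versions."""

    def __init__(self):
        self.item, self.count = None, 0

    def reduce(self, item):
        self.item = item
        self.count += 1

    def get_reduced(self):
        result = (self.item, self.count)
        self.item, self.count = None, 0  # reset for the next group
        return result


def many_to_one(iterables, reducer):
    # PQ entries are (head item, source index, iterator). The unique index
    # breaks ties, so the iterators themselves are never compared.
    pq = []
    for index, iterable in enumerate(iterables):
        iterator = iter(iterable)
        head = next(iterator, _EXHAUSTED)
        if head is not _EXHAUSTED:
            heapq.heappush(pq, (head, index, iterator))
    while pq:                               # step 1: one pass per next()
        current = pq[0][0]
        polled = []
        while pq and pq[0][0] == current:   # steps 2-4: poll equal items
            head, index, iterator = heapq.heappop(pq)
            reducer.reduce(head)            # step 3
            polled.append((index, iterator))
        for index, iterator in polled:      # steps 5-6: advance and re-add
            head = next(iterator, _EXHAUSTED)
            if head is not _EXHAUSTED:
                heapq.heappush(pq, (head, index, iterator))
        yield reducer.get_reduced()         # step 7


result = list(many_to_one([[1, 2, 3], [2, 5, 7]], CountingReducer()))
print(result)  # [(1, 1), (2, 2), (3, 1), (5, 1), (7, 1)]
```

The value 2 appears in both inputs, so the reducer sees two versions of it before `get_reduced()` is called once for the group.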

Page 23: Cassandra 2.1 boot camp, Compaction

CompactionIterable

● Creates LazilyCompactedRow
● Simple Reducer

Page 24: Cassandra 2.1 boot camp, Compaction

LazilyCompactedRow

● “Lazy” because we don’t deserialize until we need to

● Uses a MergeIterator to merge the rows
● Drops tombstones if possible
  ○ Uses CompactionController for this