Cassandra 2.1 Boot Camp: Compaction

Posted on 29-Jun-2015


DESCRIPTION

Cassandra Summit Boot Camp, 2014. Protocol, Queries, and Cell Names. Marcus Eriksson, presenter.

TRANSCRIPT

Compaction

Agenda

● Overview
● Compaction strategies
● Tombstones
● Code walkthrough

Why?

● SSTables are immutable
● Get rid of duplicate/overwritten data
● Drop deleted data and tombstones
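
Because sstables are immutable, an overwrite lands in a new sstable, and compaction is where only the newest version is kept. The sketch below illustrates that last-write-wins rule under simplifying assumptions: the `LastWriteWins` class, its `Cell` type and the `reconcile`/`compact` helpers are hypothetical (not Cassandra's API), versions are compared by write timestamp only, and a tie simply keeps the version seen first, whereas Cassandra has its own tie-break rules.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

final class LastWriteWins {
    // Hypothetical cell: column name, value and the write timestamp.
    static final class Cell {
        final String name;
        final String value;
        final long timestamp;

        Cell(String name, String value, long timestamp) {
            this.name = name;
            this.value = value;
            this.timestamp = timestamp;
        }
    }

    // Keep the version with the higher timestamp; on a tie this sketch keeps 'a'
    // (a simplification, not Cassandra's tie-break).
    static Cell reconcile(Cell a, Cell b) {
        return b.timestamp > a.timestamp ? b : a;
    }

    // Collapse all versions of each cell, possibly read from several sstables, to one per name.
    static Map<String, Cell> compact(List<Cell> versions) {
        Map<String, Cell> result = new TreeMap<>();
        for (Cell c : versions) {
            result.merge(c.name, c, LastWriteWins::reconcile);
        }
        return result;
    }
}
```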

When?

● Manually: nodetool compact / scrub ...
● When we add sstables
○ After flush
○ Once a compaction is done
○ After streaming
● Search for usages of
○ o.a.c.db.compaction.CompactionManager#submitBackground

Types of compaction

● Minor - runs automatically in the background
● Major - includes all sstables, only for size-tiered compaction
● Single-sstable compactions
○ upgradesstables
○ scrub
○ cleanup
● Anticompaction
○ After incremental repair, to split out repaired/unrepaired data

Compaction strategies

● Pluggable interface
● Strategies decide
○ which sstables to compact
○ how big they should be
○ which implementation of CompactionTask to use
● Strategies can get notified when new sstables are added
○ Makes it possible to make smart decisions about which sstables to compact
○ LCS does this to keep track of which sstables are in each level

SizeTieredCompactionStrategy

● Combines sstables based on their size
● Skips sstables that are ‘cold’ (not read much)
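
A minimal sketch of size-based bucketing, modeled on how STCS groups sstables of similar size. The constants mirror the STCS options bucket_low (0.5), bucket_high (1.5), min_sstable_size (50 MB) and min_threshold (4) at their assumed defaults; treat those values and the `SizeTieredBuckets` class as assumptions rather than Cassandra's code. The sketch ignores read hotness, so it does not skip cold sstables.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

final class SizeTieredBuckets {
    static final double BUCKET_LOW = 0.5;                   // assumed bucket_low
    static final double BUCKET_HIGH = 1.5;                  // assumed bucket_high
    static final long MIN_SSTABLE_SIZE = 50L * 1024 * 1024; // assumed min_sstable_size (50 MiB)
    static final int MIN_THRESHOLD = 4;                     // assumed min_threshold

    // Group sstable sizes (bytes) into buckets whose members are close to the bucket average.
    static List<List<Long>> buckets(List<Long> sizes) {
        List<Long> sorted = new ArrayList<>(sizes);
        Collections.sort(sorted);
        List<List<Long>> buckets = new ArrayList<>();
        List<Double> averages = new ArrayList<>();
        for (long size : sorted) {
            boolean placed = false;
            for (int i = 0; i < buckets.size() && !placed; i++) {
                double avg = averages.get(i);
                boolean similar = size > avg * BUCKET_LOW && size < avg * BUCKET_HIGH;
                boolean bothSmall = size < MIN_SSTABLE_SIZE && avg < MIN_SSTABLE_SIZE;
                if (similar || bothSmall) {
                    List<Long> bucket = buckets.get(i);
                    bucket.add(size);
                    // Update the running average of the bucket.
                    averages.set(i, (avg * (bucket.size() - 1) + size) / bucket.size());
                    placed = true;
                }
            }
            if (!placed) {
                List<Long> bucket = new ArrayList<>();
                bucket.add(size);
                buckets.add(bucket);
                averages.add((double) size);
            }
        }
        return buckets;
    }

    // Only buckets with at least MIN_THRESHOLD sstables are compaction candidates.
    static List<List<Long>> candidates(List<Long> sizes) {
        List<List<Long>> result = new ArrayList<>();
        for (List<Long> bucket : buckets(sizes)) {
            if (bucket.size() >= MIN_THRESHOLD) result.add(bucket);
        }
        return result;
    }
}
```

With sstables of 100, 105, 110 and 120 MB plus one of 1000 MB, the four similar sstables form a single candidate bucket and the large one sits alone in its own bucket.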

LeveledCompactionStrategy

● Keeps levels of non-overlapping sstables
● Each level is 10x the size of the previous one
● All sstables in levels 1+ are about the same size (160MB)
● L0 is the dumping ground: overlapping, larger sstables
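
The size targets follow from the two numbers on the slide. A minimal sketch, assuming level N (N >= 1) may hold up to 10^N sstables of 160 MB each; the `LeveledSizes` class and its helpers are hypothetical, and L0 is left out because it is the overlapping dumping ground rather than a sized level.

```java
final class LeveledSizes {
    static final long SSTABLE_SIZE_MB = 160; // target sstable size from the slide

    // Assumed capacity of level N (N >= 1): 10^N sstables of SSTABLE_SIZE_MB each.
    static long maxMegabytesForLevel(int level) {
        if (level < 1) {
            throw new IllegalArgumentException("L0 has no fixed size target in this sketch");
        }
        long mb = SSTABLE_SIZE_MB;
        for (int i = 0; i < level; i++) {
            mb *= 10;
        }
        return mb;
    }

    // A level that grows past its target needs compacting into the next level.
    static boolean needsCompaction(int level, long currentMegabytes) {
        return currentMegabytes > maxMegabytesForLevel(level);
    }
}
```

Under that assumption L1 holds about 1,600 MB, L2 about 16,000 MB and L3 about 160,000 MB.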

Tombstones

● Write a tombstone to delete data
● Covers data, but only data that is older than the tombstone
● Drop covered data during compaction
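
A tombstone is itself a timestamped write, so "covers" is a timestamp comparison. A minimal sketch, assuming a single partition-level tombstone written at timestamp `deletedAt`; the `TombstoneCover` class is hypothetical, and treating a cell with exactly the same timestamp as covered is an assumption of this sketch. Only the covered data is dropped here; the tombstone itself is kept.

```java
import java.util.ArrayList;
import java.util.List;

final class TombstoneCover {
    // Hypothetical live cell: name plus write timestamp.
    static final class Cell {
        final String name;
        final long timestamp;

        Cell(String name, long timestamp) {
            this.name = name;
            this.timestamp = timestamp;
        }
    }

    // The tombstone shadows every cell that is not newer than it (equal counts as covered here).
    static boolean isCovered(Cell cell, long deletedAt) {
        return cell.timestamp <= deletedAt;
    }

    // During compaction, covered cells are simply not written to the new sstable.
    static List<Cell> dropCovered(List<Cell> cells, long deletedAt) {
        List<Cell> kept = new ArrayList<>();
        for (Cell c : cells) {
            if (!isCovered(c, deletedAt)) kept.add(c);
        }
        return kept;
    }
}
```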

When can we drop tombstones? (both conditions must hold)

● Once the tombstone is older than gc_grace_seconds
● When the tombstone is guaranteed not to cover any data on the node
○ Either all sstables containing the key are included in the compaction
○ or the other sstables where the key exists only contain newer data
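
Both conditions can be combined into one predicate. A minimal sketch, assuming `gcBefore` is computed as now minus gc_grace_seconds (all in seconds) and that the coverage condition is expressed as "the tombstone is older than `maxPurgeableTimestamp`", the per-key value CompactionController provides; the `TombstonePurge` class and its parameter names are hypothetical.

```java
final class TombstonePurge {
    // localDeletionTimeSec: node-local time (seconds) at which the tombstone was written.
    // markedForDeleteAt: the tombstone's write timestamp, same unit as cell timestamps.
    // maxPurgeableTimestamp: tombstones strictly older than this cannot cover data
    //                        for the key outside the sstables being compacted.
    static boolean canDrop(long localDeletionTimeSec, long markedForDeleteAt,
                           long nowSec, long gcGraceSeconds, long maxPurgeableTimestamp) {
        long gcBefore = nowSec - gcGraceSeconds;
        boolean gcGraceExpired = localDeletionTimeSec < gcBefore;
        boolean coversNothingOutside = markedForDeleteAt < maxPurgeableTimestamp;
        return gcGraceExpired && coversNothingOutside;
    }
}
```

When no other sstable contains the key, `maxPurgeableTimestamp` can be taken as `Long.MAX_VALUE`, so only gc_grace_seconds decides.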

Code walkthrough

CompactionManager

● submitBackground
○ Trigger minor compaction
○ Fill executor with BackgroundCompactionTasks
● BackgroundCompactionTask
● submitMaximal
○ Major compaction
○ Not blocking; call get() on the future to block
○ runWithCompactionsDisabled
● OneSSTableOperation
○ Common way to run the single-sstable compactions in parallel
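
The "not blocking, get() the future to block" pattern and the parallel single-sstable operations both reduce to plain java.util.concurrent futures. A minimal sketch, assuming an `ExecutorService` stands in for the compaction executor and a `Function` stands in for the per-sstable work (scrub, cleanup, upgradesstables); `ParallelSSTableOps` and its methods are hypothetical, not the OneSSTableOperation API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Future;
import java.util.function.Function;

final class ParallelSSTableOps {
    // Submit one task per sstable and return the futures immediately (non-blocking).
    static <T, R> List<Future<R>> submitAll(ExecutorService executor, List<T> sstables,
                                            Function<T, R> operation) {
        List<Future<R>> futures = new ArrayList<>();
        for (T sstable : sstables) {
            Callable<R> task = () -> operation.apply(sstable);
            futures.add(executor.submit(task));
        }
        return futures;
    }

    // A caller that wants to block calls get() on every future, in submission order.
    static <R> List<R> waitForAll(List<Future<R>> futures)
            throws InterruptedException, ExecutionException {
        List<R> results = new ArrayList<>();
        for (Future<R> f : futures) {
            results.add(f.get());
        }
        return results;
    }
}
```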

CompactionTask

● Gets executed in the CompactionExecutor and does the actual compacting

● Eventually calls runWith(..), which is where the magic happens


CompactionController

● Keep track of overlapping sstables
○ Is the currently compacting key in any other sstable?
● maxPurgeableTimestamp(DecoratedKey key)
○ How old must a tombstone be before we can drop it?
○ Worst case: assume the currently compacting key holds the oldest data in each overlapping sstable
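
The worst-case reasoning can be sketched as a minimum over the overlapping sstables that might contain the key. A minimal sketch, assuming each overlapping sstable offers a "might contain key" check (a plain `Set` stands in for the bloom filter) and its minimum timestamp from sstable metadata; the `PurgeableTimestamp` class is hypothetical and ignores data still in memtables.

```java
import java.util.List;
import java.util.Set;

final class PurgeableTimestamp {
    // Hypothetical overlapping sstable that is NOT part of the current compaction.
    static final class OverlappingSSTable {
        final Set<String> keys;  // stand-in for the bloom filter
        final long minTimestamp; // oldest write timestamp in the whole sstable

        OverlappingSSTable(Set<String> keys, long minTimestamp) {
            this.keys = keys;
            this.minTimestamp = minTimestamp;
        }

        boolean mightContain(String key) {
            return keys.contains(key);
        }
    }

    // Worst case: the key's data in each overlapping sstable is as old as that sstable's
    // minimum timestamp. Tombstones strictly older than the result cannot shadow anything there.
    static long maxPurgeableTimestamp(String key, List<OverlappingSSTable> overlapping) {
        long min = Long.MAX_VALUE;
        for (OverlappingSSTable s : overlapping) {
            if (s.mightContain(key)) {
                min = Math.min(min, s.minTimestamp);
            }
        }
        return min;
    }
}
```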

SSTableRewriter

● Open compaction results early

SSTableWriter

● Writes sstables…
● Give it rows and it writes the index, data file, sstable metadata files, etc.
● openEarly(..)
○ link index and data files
○ fake the rest of the files in memory
● Collect SSTable metadata

SSTable metadata

● Collected whenever an sstable is written
● StatsMetadata
○ Kept on-heap
○ min/maxTimestamp
○ min/maxColumnNames
○ sstableLevel
● CompactionMetadata
○ Deserialized when needed
○ ancestors
○ cardinalityEstimator - HyperLogLog signature
● ValidationMetadata
○ Used to validate sstables when opening
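
Collecting stats such as min/maxTimestamp while writing is a running fold over the cells passing through the writer. A minimal sketch, assuming the writer hands every cell's timestamp to a collector; the `StatsCollector` class is hypothetical and tracks only the two timestamps, which is what a check like maxPurgeableTimestamp can read back later without touching the data file.

```java
final class StatsCollector {
    private long minTimestamp = Long.MAX_VALUE;
    private long maxTimestamp = Long.MIN_VALUE;

    // Called once per cell appended by the (hypothetical) writer.
    void update(long cellTimestamp) {
        minTimestamp = Math.min(minTimestamp, cellTimestamp);
        maxTimestamp = Math.max(maxTimestamp, cellTimestamp);
    }

    // Read back when the sstable is finished and its metadata is written.
    long minTimestamp() {
        return minTimestamp;
    }

    long maxTimestamp() {
        return maxTimestamp;
    }
}
```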

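Iterators all the way down

Example: partitions and their cells in two sstables, merged into one

First sstable:
a 1 2 3
b 2 3 5
d .. .. ..

Second sstable:
a 2 5 7
b 2 4 5
e .. .. ..

Merged:
a 1 2 3 5 7
b 2 3 4 5
d .. .. .. .. ..
e .. .. .. .. ..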

● “Partition iterator” for each sstable (SSTableScanner)

● “Cell iterator” for each partition (OnDiskAtomIterator)

● MergeIterator (MI) that takes a number of (sorted) iterators and merges them

● One MI for sstables that merges partitions

● One MI for each partition that merges cells

MergeIterator

● The interesting implementation is ManyToOne
● Merges many sorted iterators into one
● Reducer
○ reduce(..) gets called for every version that should be reduced
○ getReduced() gets called once all versions with the same name/priority/value have been passed to reduce()

MergeIterator

1. Call next()
2. Poll one item out of the priority queue (PQ)
3. Reducer.reduce(..)
4. Go to 2, until we find an item that differs
5. Call next() on the iterators you polled
6. Re-add the iterators to the PQ
7. Return Reducer.getReduced()
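
The seven steps map onto a priority queue of source iterators ordered by their current head item. A minimal sketch in the spirit of ManyToOne, not Cassandra's class: `ManyToOneMerge`, its `Candidate` holder and its nested `Reducer` interface (including the `onKeyChange()` reset hook) are hypothetical.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;
import java.util.PriorityQueue;

final class ManyToOneMerge<In, Out> implements Iterator<Out> {
    // Hypothetical reducer contract: reduce() per equal version, getReduced() once per group.
    interface Reducer<I, O> {
        void onKeyChange();     // reset state before a new group of equal items
        void reduce(I current); // called for every version of the current item
        O getReduced();         // called once all equal versions have been reduced
    }

    // One source iterator together with the item it is currently positioned on.
    private static final class Candidate<T> {
        final Iterator<T> source;
        T head;

        Candidate(Iterator<T> source) { this.source = source; }

        boolean advance() {
            if (!source.hasNext()) return false;
            head = source.next();
            return true;
        }
    }

    private final Comparator<In> comparator;
    private final Reducer<In, Out> reducer;
    private final PriorityQueue<Candidate<In>> queue;

    ManyToOneMerge(List<Iterator<In>> sources, Comparator<In> comparator, Reducer<In, Out> reducer) {
        this.comparator = comparator;
        this.reducer = reducer;
        // The PQ orders candidates by their current head item.
        this.queue = new PriorityQueue<>(Math.max(1, sources.size()),
                (a, b) -> comparator.compare(a.head, b.head));
        for (Iterator<In> source : sources) {
            Candidate<In> c = new Candidate<>(source);
            if (c.advance()) queue.add(c);
        }
    }

    @Override
    public boolean hasNext() { return !queue.isEmpty(); }

    @Override
    public Out next() {
        if (queue.isEmpty()) throw new NoSuchElementException();
        reducer.onKeyChange();
        List<Candidate<In>> polled = new ArrayList<>();
        // Steps 2-4: poll the smallest item, keep polling while the next head compares equal.
        Candidate<In> first = queue.poll();
        In current = first.head;
        reducer.reduce(current);
        polled.add(first);
        while (!queue.isEmpty() && comparator.compare(queue.peek().head, current) == 0) {
            Candidate<In> c = queue.poll();
            reducer.reduce(c.head);
            polled.add(c);
        }
        // Steps 5-6: advance every polled iterator and re-add those not yet exhausted.
        for (Candidate<In> c : polled) {
            if (c.advance()) queue.add(c);
        }
        // Step 7.
        return reducer.getReduced();
    }
}
```

Merging the cells of partition a from the example, 1 2 3 and 2 5 7, with a reducer that keeps one copy of each value yields 1 2 3 5 7: the duplicate 2 is passed to reduce() twice but emitted once.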

CompactionIterable

● Creates LazilyCompactedRow
● Simple Reducer

LazilyCompactedRow

● “Lazy” because we don’t deserialize until we need to

● Uses a MergeIterator to merge the rows
● Drops tombstones if possible
○ Uses CompactionController for this
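
The laziness can be pictured as pulling cells out of the merge only when the writer asks for the next one, and deciding about purging at that moment. A minimal sketch, assuming the merged cells arrive as an `Iterator` and that a tombstone is purgeable when its local deletion time is before `gcBefore` and its timestamp is older than the controller's maxPurgeableTimestamp; `LazyPurgingIterator` and its `Cell` type are hypothetical, not the LazilyCompactedRow API.

```java
import java.util.Iterator;
import java.util.NoSuchElementException;

final class LazyPurgingIterator implements Iterator<LazyPurgingIterator.Cell> {
    // Hypothetical cell: tombstones carry the local time (seconds) at which they were written.
    static final class Cell {
        final String name;
        final long timestamp;
        final boolean tombstone;
        final long localDeletionTime;

        Cell(String name, long timestamp, boolean tombstone, long localDeletionTime) {
            this.name = name;
            this.timestamp = timestamp;
            this.tombstone = tombstone;
            this.localDeletionTime = localDeletionTime;
        }
    }

    private final Iterator<Cell> merged;      // output of the per-partition merge
    private final long gcBefore;              // now - gc_grace_seconds
    private final long maxPurgeableTimestamp; // from the compaction controller
    private Cell next;                        // cell fetched ahead by hasNext(), if any

    LazyPurgingIterator(Iterator<Cell> merged, long gcBefore, long maxPurgeableTimestamp) {
        this.merged = merged;
        this.gcBefore = gcBefore;
        this.maxPurgeableTimestamp = maxPurgeableTimestamp;
    }

    private boolean purgeable(Cell c) {
        return c.tombstone && c.localDeletionTime < gcBefore && c.timestamp < maxPurgeableTimestamp;
    }

    // Pull from the merge only on demand, skipping purgeable tombstones.
    @Override
    public boolean hasNext() {
        while (next == null && merged.hasNext()) {
            Cell c = merged.next();
            if (!purgeable(c)) next = c;
        }
        return next != null;
    }

    @Override
    public Cell next() {
        if (!hasNext()) throw new NoSuchElementException();
        Cell c = next;
        next = null;
        return c;
    }
}
```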
