positional update handling in column stores

Sándor Héman, Marcin Zukowski, Niels Nes, Lefteris Sidirourgos, Peter Boncz

Positional Update Handling in Column Stores

http://www.cwi.nl/


UPDATE IN PLACE: A Poison Apple? (Jim Gray, 1981)

“..for performance reasons, most disc-based systems have been seduced into updating the data in place.”

30 years of hardware improvements, with sequential throughput outpacing random-access latency, make in-place updating less feasible every year.

Alternative: a differential approach. In column stores, in-place updating is by now clearly infeasible.


Problem: Column Store Updates

• I/O proportional to number of attributes
  – I/O blocks large and compressed
  – Sometimes even replicated
  – Read-optimized, hence update-unfriendly

• Table often kept ordered on sort-key (SK) attributes
  – A uniform update load leads to scattered write access


Solution: Differential Structure

• Maintain updates (INS/DEL/MOD) in a differential structure
  – Merge with base table during scan


• Challenges:
  – Efficiently maintainable data structure
  – Minimize Merge impact for read-only queries


Naïve Approach: Delta Tables

• For each table, maintain two update-friendly row-store tables:
  – INS(C1..Cn)
  – DEL(SK1..SKm)
  – MOD = DEL + INS

Base table: inventory, Sort-Key (SK): [store, prod]

store   prod   new  qty
London  stool  N    10
London  table  N    20
Paris   rug    N    1
Paris   stool  N    5

Inserts table: INS

store   prod   new  qty
Berlin  chair  Y    5
Berlin  cloth  Y    20

Deletes table: DEL

store   prod
Paris   rug

• Rewrite table scans for an up-to-date image:
  MergeUnion[store,prod](Scan(INS), MergeDiff[store,prod](Scan(Inventory), Scan(DEL)))

• Expensive!
  – I/O to scan the SK ‘merge’ columns, even if the query does not need the SK columns
  – Each query pays the CPU effort to locate the same change positions over and over again

Actual table: inventory, Sort-Key (SK): [store, prod]

store   prod   new  qty
Berlin  chair  Y    5
Berlin  cloth  Y    20
London  stool  N    10
London  table  N    20
Paris   stool  N    5
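The rewritten scan can be sketched in a few lines of Python. This is only an illustration of the value-based merge on the example tables above; the function names `merge_diff` and `merge_union` are hypothetical stand-ins for the MergeDiff and MergeUnion operators, and all inputs are assumed to be sorted on the sort key.

```python
import heapq

def merge_diff(base, dels, key):
    """Yield base rows whose sort key does not occur in dels (both SK-sorted)."""
    dels = iter(dels)
    d = next(dels, None)
    for row in base:
        k = key(row)
        while d is not None and d < k:
            d = next(dels, None)
        if d is not None and d == k:
            continue  # row is deleted
        yield row

def merge_union(left, right, key):
    """Merge two SK-sorted row streams into one SK-sorted stream."""
    return heapq.merge(left, right, key=key)

sk = lambda row: row[:2]  # sort key: (store, prod)

inventory = [("London", "stool", "N", 10), ("London", "table", "N", 20),
             ("Paris", "rug", "N", 1), ("Paris", "stool", "N", 5)]
ins = [("Berlin", "chair", "Y", 5), ("Berlin", "cloth", "Y", 20)]
dels = [("Paris", "rug")]

actual = list(merge_union(ins, merge_diff(inventory, dels, sk), sk))
```

Note that every comparison in this sketch touches the SK values, which is why the SK columns must be read even when the query itself does not use them.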


The Idea: Positional Updates

• Remember the position of an update rather than its SK values
  – Merge once at write time: a read-optimized approach
  – No need to scan SK columns
  – Scan can skip, so less CPU overhead

Notation:
• TABLEx: state of TABLE at time x
• SID(t): StableID
  – Position of tuple t in the immutable base TABLE0 (stable)
• RIDx(t): RowID
  – Position of visible tuple t at time x (VOLATILE!)
  – SID(t) = RID0(t)


SID/RID Example

TABLE0

SID  STORE   PROD   NEW  QTY  RID
0    London  chair  N    30   0
1    London  stool  N    10   1
2    London  table  N    20   2
3    Paris   rug    N    1    3
4    Paris   stool  N    5    4

INSERT INTO inventory VALUES(‘Berlin’, ‘table’, Y, 10)
INSERT INTO inventory VALUES(‘Berlin’, ‘cloth’, Y, 20)
INSERT INTO inventory VALUES(‘Berlin’, ‘chair’, Y, 5)

TABLE1

SID  STORE   PROD   NEW  QTY  RID
0    Berlin  chair  Y    5    0
0    Berlin  cloth  Y    20   1
0    Berlin  table  Y    10   2
0    London  chair  N    30   3
1    London  stool  N    10   4
2    London  table  N    20   5
3    Paris   rug    N    1    6
4    Paris   stool  N    5    7
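The SID and RID columns of this example can be recomputed with a short Python sketch. It assumes, as the example suggests, that an inserted tuple receives the SID of the first stable tuple that sorts after it, and that RIDs simply number the visible tuples in sort-key order; the variable names are illustrative only.

```python
import bisect

sk = lambda row: row[:2]  # sort key: (store, prod)

table0 = [("London", "chair", "N", 30), ("London", "stool", "N", 10),
          ("London", "table", "N", 20), ("Paris", "rug", "N", 1),
          ("Paris", "stool", "N", 5)]
inserts = [("Berlin", "table", "Y", 10), ("Berlin", "cloth", "Y", 20),
           ("Berlin", "chair", "Y", 5)]

base_keys = [sk(r) for r in table0]

# Stable tuples keep their TABLE0 position as SID.
tagged = [(sid, row) for sid, row in enumerate(table0)]
# Assumption: an insert gets the SID of the first stable tuple sorting after it.
tagged += [(bisect.bisect_left(base_keys, sk(r)), r) for r in inserts]

# RIDs enumerate the visible tuples in sort-key order.
tagged.sort(key=lambda pair: sk(pair[1]))
table1 = [(sid, row, rid) for rid, (sid, row) in enumerate(tagged)]

for sid, row, rid in table1:
    print(sid, *row, rid)
```

All three inserted Berlin tuples share SID 0 because they sort before the first stable tuple, while their RIDs 0, 1 and 2 push every stable tuple's RID up by three.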


SIDs and RIDs

• RID(t) = SID(t) + ∆(t)
• ∆(t) = #inserts before t – #deletes before t = RID(t) – SID(t)
• SID and RID are monotonically increasing
  – Organize positional updates on SID in a counting B-tree that keeps track of cumulative deltas (∆)
• Positional Delta Tree (PDT)
  – SIDs are stable
  – Only need to maintain cumulative ∆ on the path from root to leaf
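The relation RID(t) = SID(t) + ∆(t) can be illustrated with a merge scan over a flat, SID-sorted list of updates. This is a deliberately simplified sketch, not the tree structure of the slides: `positional_merge` is a hypothetical name, only inserts and deletes are handled, and inserts at a given SID are assumed to precede a delete at the same SID.

```python
def positional_merge(base, updates):
    """Return (rid, sid, row) for every visible tuple.

    base    : list of rows; the list index is the SID.
    updates : SID-sorted list of (sid, "ins", row) or (sid, "del", None).
    No sort-key values are compared; only positions are used.
    """
    visible, delta, u = [], 0, 0
    for sid, row in enumerate(base):
        deleted = False
        while u < len(updates) and updates[u][0] == sid:
            _, kind, value = updates[u]
            if kind == "ins":
                visible.append((sid + delta, sid, value))  # RID = SID + delta
                delta += 1
            else:
                deleted = True
                delta -= 1
            u += 1
        if not deleted:
            visible.append((sid + delta, sid, row))
    # Inserts positioned after the last stable tuple.
    for sid, _, value in updates[u:]:
        visible.append((sid + delta, sid, value))
        delta += 1
    return visible

table0 = ["London chair", "London stool", "London table", "Paris rug", "Paris stool"]
updates = [(0, "ins", "Berlin chair"), (0, "ins", "Berlin cloth"), (3, "del", None)]
result = positional_merge(table0, updates)
```

With two inserts at SID 0 and a delete at SID 3, the scan yields RIDs 0 to 5, and the last stable tuple (SID 4) ends up at RID 5 because its ∆ is 2 – 1 = 1.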


PDT Example

Starting from TABLE0:

INSERT INTO inventory VALUES(‘Berlin’, ‘table’, Y, 10)
INSERT INTO inventory VALUES(‘Berlin’, ‘cloth’, Y, 20)
INSERT INTO inventory VALUES(‘Berlin’, ‘chair’, Y, 5)

[Figure: PDT after the three inserts. Root: separator SID 0; subtree ∆ = 2 (left), 1 (right). Left leaf: (SID 0, ins, i2), (SID 0, ins, i1). Right leaf: (SID 0, ins, i0).]

Insert Value Table

    STORE   PROD   NEW  QTY
i0  Berlin  table  Y    10
i1  Berlin  cloth  Y    20
i2  Berlin  chair  Y    5


PDT Example

Starting from TABLE1:

DELETE FROM inventory WHERE store = ‘Berlin’ AND prod = ‘table’
DELETE FROM inventory WHERE store = ‘Paris’ AND prod = ‘rug’

[Figure: PDT after the two deletes. Root: separator SID 0; subtree ∆ = 2 (left), -1 (right). Left leaf: (SID 0, ins, i2), (SID 0, ins, i1). Right leaf: (SID 3, del, d0). Insert Value Table slots: i0, i1, i2.]


PDT Example

TABLE2

SID  STORE   PROD   NEW  QTY  RID
0    Berlin  chair  Y    5    0
0    Berlin  cloth  Y    20   1
0    London  chair  N    30   2
1    London  stool  N    10   3
2    London  table  N    20   4
4    Paris   stool  N    5    5

INSERT INTO inventory VALUES (‘Paris’, ‘rack’, Y, 4)

Insert at RID = 5

[Figure: PDT before and after the insert. Root: separator SID 0; subtree ∆ = 2 (left), -1 (right). Lookup: RID 5 > 0 + 2, so the search descends into the right leaf. Before: right leaf holds (SID 3, del, d0). After: right leaf holds (SID 3, ins, i0), (SID 3, del, d0).]

Insert Value Table

    STORE   PROD   NEW  QTY
i0  Paris   rack   Y    4
i1  Berlin  cloth  Y    20
i2  Berlin  chair  Y    5


PDT Example

INSERT INTO inventory VALUES (‘London’, ‘rack’, Y, 4)
INSERT INTO inventory VALUES (‘Berlin’, ‘rack’, Y, 4)

[Figure: PDT after the two inserts, annotated with separator SIDs and subtree ∆, and with separator RIDs and running ∆.]

Inner nodes

Node         Separator SID  Subtree ∆  Separator RID  Running ∆
Root         1              3, 1       4              3
Left child   0              2, 1       2              2
Right child  3              1, 0       7              4

Leaves

Leaf  Entries (SID, type, value)    RID   Running ∆
1     (0, ins, i2), (0, ins, i1)    0, 1  0, 1
2     (0, ins, i4)                  2     2
3     (1, ins, i3)                  4     3
4     (3, ins, i0), (3, del, d0)    7, 8  4, 5
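The RID and ∆ annotations of this example follow directly from the leaf entries, which a few lines of Python can confirm. The sketch below is an assumption-laden reading of the figure: leaves are listed left to right, each entry is (SID, type, value), an insert counts +1 and a delete -1, and the grouping of leaves under the root is taken from the figure labels.

```python
from itertools import accumulate

# Leaf entries of the example PDT, left to right: (SID, type, value slot).
leaves = [
    [(0, "ins", "i2"), (0, "ins", "i1")],
    [(0, "ins", "i4")],
    [(1, "ins", "i3")],
    [(3, "ins", "i0"), (3, "del", "d0")],
]

def weight(kind):
    """Contribution of one update to delta: +1 for an insert, -1 for a delete."""
    return 1 if kind == "ins" else -1

# Subtree delta stored per leaf, and per child of the root (two leaves each).
leaf_delta = [sum(weight(kind) for _, kind, _ in leaf) for leaf in leaves]
root_delta = (sum(leaf_delta[:2]), sum(leaf_delta[2:]))

# Running delta *before* each entry, and the RID it implies: RID = SID + delta.
entries = [entry for leaf in leaves for entry in leaf]
running = [0] + list(accumulate(weight(kind) for _, kind, _ in entries))[:-1]
rids = [sid + delta for (sid, _, _), delta in zip(entries, running)]

print(leaf_delta, root_delta, running, rids)
```

Only the subtree ∆ values need to be stored; the running ∆ and the RIDs can be derived while descending from the root, which is why an update only touches the nodes on one root-to-leaf path.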


Stacking PDTs

• Arbitrary number of layers: “deltas on deltas on ..”
  – RID domain of child PDT = SID domain of parent PDT
• Generalization: a PDT contains all differences in a time interval [lo,hi]
• PDT[t2,t3] vs PDT[t0,t1] are:
  – consecutive: t2 = t1
  – aligned: t2 = t0 (“same base”)
  – overlapping: [t2,t3] overlaps [t0,t1] (“uncomparable” / “incompatible”)

[Figure: a Table with PDT layers stacked on top of it, each PDT annotated with its [lo,hi] interval; separate panels illustrate consecutive, aligned and overlapping PDTs.]

PDT


Stacking for Isolation

• ‘Lock’ PDT down for further updates
  – Immutable read-PDT: BIG, main-memory resident
• ‘Stack’ empty PDT on top
  – Updateable write-PDT: SMALL, L2-cache resident
  – Note: PDTs are consecutive
• Once in a while changes are propagated
  – Propagate() operation
  – Requires consecutive PDTs

[Figure: Stable Table, Read-PDT and Write-PDT stacked to form TABLEx; Propagate() moves the Write-PDT contents into the Read-PDT.]


Snapshot Isolation

• Transaction creates snapshot copy of write-PDT
• Updates go into trans-PDT
• On commit, Propagate() trans-PDT into write-PDT

[Figure: Stable Table, Read-PDT and Write-PDT form TABLEx; the transaction state holds a copy of the Write-PDT with a Trans-PDT on top, which Propagate() merges into the Write-PDT.]


Optimistic Concurrency Control

• Two concurrent transactions, A and B
  – Each holds its own copy of the write-PDT with a trans-PDT on top
• A commits before B
  – Propagate() merges A's trans-PDT into the write-PDT
• Cannot commit B into the modified write-PDT!
  – A changed the RID enumeration
• Serialize(A, B)
  – Makes aligned PDTs consecutive
  – MAY FAIL!! Failure means the transaction aborts
  – Succeeds if there is no conflict (conflict = write-set intersection)
• After Serialize(), B's trans-PDT is consecutive to the write-PDT and can be committed with Propagate()
• Extend to any number of concurrent transactions by serializing against all PDTs of transactions that committed during its lifetime (a.k.a. backward-looking OCC)

[Figure: Stable Table, Read-PDT and Write-PDT form TABLEx; TransA and TransB each hold a copy of the Write-PDT plus a Trans-PDT. A's Trans-PDT is propagated first; B's Trans-PDT is then passed through Serialize() and propagated.]
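The effect of Serialize() can be sketched on flat update lists. This is a hypothetical simplification, not the algorithm from the paper: updates are (SID, type) pairs over the same base, any two updates at the same SID are conservatively treated as a conflict, and the names `serialize` and `Conflict` are illustrative.

```python
class Conflict(Exception):
    """Raised when the two write sets intersect; the later transaction aborts."""

def serialize(b_updates, a_updates):
    """Re-express B's updates on top of A's result.

    Both inputs are lists of (sid, kind) with kind "ins" or "del", positioned
    against the same base (aligned). The output positions B's updates against
    the table image after A (consecutive).
    Assumption: touching the same SID counts as a conflict.
    """
    a_sids = {sid for sid, _ in a_updates}
    shifted = []
    for sid, kind in b_updates:
        if sid in a_sids:
            raise Conflict(sid)
        # A's inserts before this position shift it right, A's deletes left.
        delta = sum(1 if k == "ins" else -1 for s, k in a_updates if s < sid)
        shifted.append((sid + delta, kind))
    return shifted

# A inserted two tuples at SID 0 and deleted the tuple at SID 3.
trans_a = [(0, "ins"), (0, "ins"), (3, "del")]
# B, started from the same snapshot, deletes the tuple at SID 4.
trans_b = [(4, "del")]

serialized_b = serialize(trans_b, trans_a)
```

In this example B's delete moves from position 4 to position 5, matching the RID that the tuple with SID 4 has once A's updates are applied. For several committed transactions, the same step would be repeated against each of their PDTs in commit order.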


Concluding..

• PDTs speed up differential update merging
  – Reduced I/O volume
  – Reduced CPU merge overhead

• Tree structure
  – Logarithmic lookup & maintenance of volatile RIDs
  – Main operations: Merge(), Propagate(), Serialize()

• PDTs are stackable, and capture the Write-Set
  – Great structure for Snapshot Isolation

• Formal definitions, algorithms and benchmarks in paper


Thank you!


Microbenchmarks


TPCH-30
