Positional Update Handling in Column Stores
Sándor Héman, Marcin Zukowski, Niels Nes, Lefteris Sidirourgos, Peter Boncz
Positional Update Handling in Column Stores
UPDATE IN PLACE: A Poison Apple? (Jim Gray, 1981)
“…for performance reasons, most disc-based systems have been seduced into updating the data in place.”
30 years of hardware improvements: sequential throughput has improved far more than random-access latency, so updating in place becomes less feasible every year.
Alternative: the differential approach. In column stores, in-place updating is by now clearly infeasible.
Problem: Column Store Updates
• I/O proportional to the number of attributes
  – I/O blocks are large and compressed
  – Sometimes even replicated
  – Read-optimized, hence update-unfriendly
• Table often kept ordered on sort-key (SK) attributes
  – A uniform update load causes scattered write access
Solution: Differential Structure
• Maintain updates (INS/DEL/MOD) in a differential structure
  – Merge with the base table during scan
• Challenges:
  – Efficiently maintainable data structure
  – Minimize merge impact on read-only queries
Naïve Approach: Delta Tables
• For each table, maintain two update-friendly row-store tables:
  – INS(C1..Cn)
  – DEL(SK1..SKm)
  – MOD = DEL + INS

store   prod   new  qty
London  stool  N    10
London  table  N    20
Paris   rug    N    1
Paris   stool  N    5
Base table: inventory; sort key (SK): [store, prod]

store   prod   new  qty
Berlin  chair  Y    5
Berlin  cloth  Y    20
Inserts table: INS

store  prod
Paris  rug
Deletes table: DEL
• Rewrite table scans to obtain the up-to-date image:
  MergeUnion[store,prod](Scan(INS), MergeDiff[store,prod](Scan(Inventory), Scan(DEL)))
• Expensive!
  – I/O to scan the SK 'merge' columns, even if the query does not need them
  – Each query pays the CPU effort to locate the same change positions over and over again
store   prod   new  qty
Berlin  chair  Y    5
Berlin  cloth  Y    20
London  stool  N    10
London  table  N    20
Paris   stool  N    5
Actual table: inventory; sort key (SK): [store, prod]
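A minimal Python sketch of this rewritten scan, assuming every input is a list of tuples already sorted on (store, prod). The helpers `merge_diff` and `merge_union` are hypothetical stand-ins for the MergeDiff and MergeUnion operators, built on the standard library's `heapq.merge`. Note that both have to compare sort-key values for every base row, which is where the SK I/O and CPU cost of the naive approach comes from.

```python
import heapq
from typing import Iterable, Iterator, Tuple

Row = Tuple[str, str, str, int]   # (store, prod, new, qty)
Key = Tuple[str, str]             # sort key (store, prod)

def sort_key(row: Row) -> Key:
    return (row[0], row[1])

def merge_diff(base: Iterable[Row], dels: Iterable[Key]) -> Iterator[Row]:
    """MergeDiff[store,prod]: drop base rows whose sort key appears in DEL."""
    dels = iter(dels)
    d = next(dels, None)
    for row in base:
        k = sort_key(row)                 # SK columns are read for every base row
        while d is not None and d < k:
            d = next(dels, None)          # advance DEL past smaller keys
        if d == k:
            continue                      # deleted: skip this base row
        yield row

def merge_union(ins: Iterable[Row], rest: Iterable[Row]) -> Iterator[Row]:
    """MergeUnion[store,prod]: interleave two SK-sorted streams."""
    return heapq.merge(ins, rest, key=sort_key)

inventory = [("London", "stool", "N", 10), ("London", "table", "N", 20),
             ("Paris", "rug", "N", 1), ("Paris", "stool", "N", 5)]
INS = [("Berlin", "chair", "Y", 5), ("Berlin", "cloth", "Y", 20)]
DEL = [("Paris", "rug")]

actual = list(merge_union(INS, merge_diff(inventory, DEL)))
for row in actual:
    print(*row)
```

Running it prints the five rows of the actual table, in sort-key order.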
The Idea: Positional Updates
• Remember the position of an update rather than its SK values
  – Merge once, at write time → read-optimized approach
  – No need to scan SK columns
  – Scan can skip to the change positions → less CPU overhead
Notation:
• TABLEx: state of TABLE at time x
• SID(t): StableID
  – Position of tuple t in the immutable base TABLE0 (stable)
• RIDx(t): RowID
  – Position of visible tuple t at time x (VOLATILE!)
  – SID(t) = RID0(t)
SID/RID Example
TABLE0
SID STORE  PROD  NEW QTY RID
0   London chair N   30  0
1   London stool N   10  1
2   London table N   20  2
3   Paris  rug   N   1   3
4   Paris  stool N   5   4

INSERT INTO inventory VALUES ('Berlin', 'table', 'Y', 10)
INSERT INTO inventory VALUES ('Berlin', 'cloth', 'Y', 20)
INSERT INTO inventory VALUES ('Berlin', 'chair', 'Y', 5)

TABLE1
SID STORE  PROD  NEW QTY RID
0   Berlin chair Y   5   0
0   Berlin cloth Y   20  1
0   Berlin table Y   10  2
0   London chair N   30  3
1   London stool N   10  4
2   London table N   20  5
3   Paris  rug   N   1   6
4   Paris  stool N   5   7
SIDs and RIDs
• RID(t) = SID(t) + ∆(t)
• ∆(t) = #inserts before t − #deletes before t = RID(t) − SID(t)
• SID and RID are monotonically increasing
  – Organize positional updates on SID in a counting B-tree that keeps track of cumulative deltas (∆)
• Positional Delta Tree (PDT)
  – SIDs are stable
  – Only the cumulative ∆ on the root-to-leaf path needs to be maintained
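To make RID(t) = SID(t) + ∆(t) concrete, here is a flat Python stand-in for a PDT, not the tree itself: updates are keyed by SID, and each insert is filed under the stable tuple it precedes. The class `FlatDeltas` and its methods are hypothetical names. It recomputes ∆ by scanning all updates, which is exactly the linear cost that the counting B-tree's cumulative ∆ is meant to avoid.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterator, List, Set, Tuple

Row = Tuple[str, str, str, int]   # (store, prod, new, qty)

@dataclass
class FlatDeltas:
    """Hypothetical flat stand-in for a PDT: updates keyed by SID, not by sort key."""
    inserts: Dict[int, List[Row]] = field(default_factory=dict)  # rows placed before stable tuple SID, in RID order
    deletes: Set[int] = field(default_factory=set)               # SIDs of deleted stable tuples

    def delta(self, sid: int) -> int:
        """∆(t) = #inserts before t - #deletes before t, for stable tuple t = sid."""
        ins = sum(len(rows) for s, rows in self.inserts.items() if s <= sid)
        dels = sum(1 for s in self.deletes if s < sid)
        return ins - dels

    def rid(self, sid: int) -> int:
        """RID(t) = SID(t) + ∆(t), for a stable tuple that is still visible."""
        return sid + self.delta(sid)

    def merge(self, stable: List[Row]) -> Iterator[Row]:
        """Merge(): one pass over the stable table; no sort-key values are compared."""
        for sid, row in enumerate(stable):
            yield from self.inserts.get(sid, [])
            if sid not in self.deletes:
                yield row
        yield from self.inserts.get(len(stable), [])  # rows appended after the last tuple

table0 = [("London", "chair", "N", 30), ("London", "stool", "N", 10),
          ("London", "table", "N", 20), ("Paris", "rug", "N", 1),
          ("Paris", "stool", "N", 5)]

# TABLE1: the three Berlin rows all sort before SID 0 (London chair).
pdt = FlatDeltas(inserts={0: [("Berlin", "chair", "Y", 5),
                              ("Berlin", "cloth", "Y", 20),
                              ("Berlin", "table", "Y", 10)]})
table1 = list(pdt.merge(table0))
print(pdt.rid(0), pdt.rid(4))   # London chair, Paris stool
```

With the three Berlin inserts of the SID/RID example, London chair (SID 0) gets RID 3 and Paris stool (SID 4) gets RID 7, as in TABLE1.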
PDT Example
INSERT INTO inventory VALUES ('Berlin', 'table', 'Y', 10)
INSERT INTO inventory VALUES ('Berlin', 'cloth', 'Y', 20)
INSERT INTO inventory VALUES ('Berlin', 'chair', 'Y', 5)

Insert Value Table
i0: Berlin table Y 10
i1: Berlin cloth Y 20
i2: Berlin chair Y 5

PDT
  Root: separator SID 0; subtree ∆ 2 | 1
  Leaf 1 (SID, type, value): (0, ins, i2), (0, ins, i1)
  Leaf 2 (SID, type, value): (0, ins, i0)

TABLE0
SID STORE  PROD  NEW QTY RID
0   London chair N   30  0
1   London stool N   10  1
2   London table N   20  2
3   Paris  rug   N   1   3
4   Paris  stool N   5   4
PDT Example
DELETE FROM inventory WHERE store = 'Berlin' AND prod = 'table'
DELETE FROM inventory WHERE store = 'Paris' AND prod = 'rug'

TABLE1
SID STORE  PROD  NEW QTY RID
0   Berlin chair Y   5   0
0   Berlin cloth Y   20  1
0   Berlin table Y   10  2
0   London chair N   30  3
1   London stool N   10  4
2   London table N   20  5
3   Paris  rug   N   1   6
4   Paris  stool N   5   7

Insert Value Table
i0: Berlin table Y 10
i1: Berlin cloth Y 20
i2: Berlin chair Y 5

PDT
  Root: separator SID 0; subtree ∆ 2 | −1
  Leaf 1 (SID, type, value): (0, ins, i2), (0, ins, i1)
  Leaf 2 (SID, type, value): (3, del, d0)
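A hedged sketch of how a delete addressed by RID could be translated, again using a flat SID-keyed stand-in rather than the tree; the function name `delete_at_rid` is hypothetical. If the RID hits a row that is itself a pending insert, such as Berlin table, the insert entry is simply dropped; if it hits a stable tuple, such as Paris rug, a delete entry is recorded on its SID. The two statements run one after the other, so Paris rug sits at RID 5 (not RID 6 as in TABLE1) by the time the second DELETE executes.

```python
from typing import Dict, List, Set, Tuple

Row = Tuple[str, str, str, int]   # (store, prod, new, qty)

def delete_at_rid(n_stable: int, inserts: Dict[int, List[Row]],
                  deletes: Set[int], rid: int) -> None:
    """Delete the row currently visible at `rid` (flat sketch, not the tree algorithm)."""
    cursor = 0                                   # RID of the next visible row
    for sid in range(n_stable + 1):
        pending = inserts.get(sid, [])           # inserts placed before stable tuple `sid`
        if rid < cursor + len(pending):          # target is an inserted row:
            pending.pop(rid - cursor)            # just drop the insert entry
            if not pending:
                del inserts[sid]
            return
        cursor += len(pending)
        if sid < n_stable and sid not in deletes:
            if rid == cursor:                    # target is stable tuple `sid`:
                deletes.add(sid)                 # record a delete on its SID
                return
            cursor += 1
    raise IndexError("rid beyond end of table")

n_stable = 5                                     # TABLE0 has SIDs 0..4
inserts = {0: [("Berlin", "chair", "Y", 5), ("Berlin", "cloth", "Y", 20),
               ("Berlin", "table", "Y", 10)]}
deletes: Set[int] = set()

delete_at_rid(n_stable, inserts, deletes, 2)     # Berlin table (RID 2)
delete_at_rid(n_stable, inserts, deletes, 5)     # Paris rug (RID 5 after the first delete)
print(inserts, deletes)
```

The result has two inserts left under SID 0 and a delete on SID 3, the same shape as the PDT after both deletes.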
PDT Example
INSERT INTO inventory VALUES ('Paris', 'rack', 'Y', 4)

TABLE2
SID STORE  PROD  NEW QTY RID
0   Berlin chair Y   5   0
0   Berlin cloth Y   20  1
0   London chair N   30  2
1   London stool N   10  3
2   London table N   20  4
4   Paris  stool N   5   5

Insert at RID = 5
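Insert Value Table
i0: Paris rack Y 4
i1: Berlin cloth Y 20
i2: Berlin chair Y 5

PDT
  Root: separator SID 0; subtree ∆ 2 | −1
  Leaf 1 (SID, type, value): (0, ins, i2), (0, ins, i1)
  Leaf 2 (SID, type, value): (3, ins, i0), (3, del, d0)

Descent: RID 5 > 0 + 2 (separator SID plus the ∆ of the left subtree), so the insert goes into leaf 2.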
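The insert can be sketched the same way: walk the SIDs while counting visible RIDs, and file the new row under the first SID whose pending inserts reach the target RID. This is a flat, linear-time Python stand-in with hypothetical names (`insert_at_rid`, `image`); the PDT instead reaches the right leaf in logarithmic time by comparing the RID against SIDs plus ∆. Because the deleted Paris rug keeps its SID, the new row lands under SID 3, ahead of the delete entry, as in leaf 2 of the example.

```python
from typing import Dict, List, Set, Tuple

Row = Tuple[str, str, str, int]   # (store, prod, new, qty)

def insert_at_rid(n_stable: int, inserts: Dict[int, List[Row]], deletes: Set[int],
                  rid: int, row: Row) -> int:
    """Make `row` visible at position `rid`; return the SID the insert is filed under."""
    cursor = 0                                    # RID of the next visible row
    for sid in range(n_stable + 1):
        pending = inserts.get(sid, [])            # inserts already placed before stable tuple `sid`
        if rid <= cursor + len(pending):
            inserts.setdefault(sid, []).insert(rid - cursor, row)
            return sid
        cursor += len(pending)
        if sid < n_stable and sid not in deletes:
            cursor += 1                           # a visible stable tuple occupies one RID
        # a deleted stable tuple keeps its SID but takes no RID
    raise IndexError("rid beyond end of table")

def image(stable: List[Row], inserts: Dict[int, List[Row]], deletes: Set[int]) -> List[Row]:
    """Current table image: stable rows merged with the positional updates."""
    out: List[Row] = []
    for sid, row in enumerate(stable):
        out.extend(inserts.get(sid, []))
        if sid not in deletes:
            out.append(row)
    out.extend(inserts.get(len(stable), []))
    return out

table0 = [("London", "chair", "N", 30), ("London", "stool", "N", 10),
          ("London", "table", "N", 20), ("Paris", "rug", "N", 1),
          ("Paris", "stool", "N", 5)]
inserts = {0: [("Berlin", "chair", "Y", 5), ("Berlin", "cloth", "Y", 20)]}
deletes = {3}                                     # Paris rug

sid = insert_at_rid(len(table0), inserts, deletes, 5, ("Paris", "rack", "Y", 4))
table3 = image(table0, inserts, deletes)
print(sid, table3[5])
```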
PDT Example
INSERT INTO inventory VALUES ('London', 'rack', 'Y', 4)
INSERT INTO inventory VALUES ('Berlin', 'rack', 'Y', 4)
[Figure: the PDT after both inserts. Leaves (SID, type, value): (0, ins, i2), (0, ins, i1) | (0, ins, i4) | (1, ins, i3) | (3, ins, i0), (3, del, d0). Node annotations: separator SIDs with subtree ∆, and separator RIDs with running ∆.]
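A small two-level Python sketch of the counting idea, under simplifying assumptions: a single root over a list of leaves, no leaf splits, and no descent by RID. The root keeps each child's subtree ∆, so `rid_of_stable` can skip whole leaves that precede a stable tuple without looking inside them. `Leaf`, `child_summary` and `rid_of_stable` are hypothetical names; the leaf contents are the four leaves of the example above.

```python
from dataclasses import dataclass
from typing import List, Tuple

Entry = Tuple[int, str, str]   # (SID, type, value reference), e.g. (3, "del", "d0")

@dataclass
class Leaf:
    entries: List[Entry]       # kept in RID order

    @property
    def delta(self) -> int:
        """Subtree ∆ the parent keeps for this child: #ins - #del."""
        return sum(1 if kind == "ins" else -1 for _, kind, _ in self.entries)

def child_summary(leaves: List[Leaf]) -> List[Tuple[int, int, int, int]]:
    """Per child: (first SID, subtree ∆, RID of first entry, running ∆ before it)."""
    out, running = [], 0
    for leaf in leaves:
        first_sid = leaf.entries[0][0]
        out.append((first_sid, leaf.delta, first_sid + running, running))
        running += leaf.delta
    return out

def rid_of_stable(leaves: List[Leaf], sid: int) -> int:
    """RID of visible stable tuple `sid`; leaves wholly before it are skipped via their ∆."""
    running = 0
    for leaf in leaves:
        if leaf.entries[0][0] > sid:
            break                          # this leaf and all later ones follow tuple `sid`
        if leaf.entries[-1][0] < sid:
            running += leaf.delta          # whole leaf precedes `sid`: use the stored ∆
            continue
        for s, kind, _ in leaf.entries:    # boundary leaf: inspect individual entries
            if s < sid or (s == sid and kind == "ins"):
                running += 1 if kind == "ins" else -1
    return sid + running

leaves = [Leaf([(0, "ins", "i2"), (0, "ins", "i1")]),
          Leaf([(0, "ins", "i4")]),
          Leaf([(1, "ins", "i3")]),
          Leaf([(3, "ins", "i0"), (3, "del", "d0")])]

print(child_summary(leaves))
print([rid_of_stable(leaves, s) for s in (0, 1, 2, 4)])
```

For this example it yields RIDs 3, 5, 6 and 8 for London chair, London stool, London table and Paris stool, and first-entry RIDs 0, 2, 4 and 7 for the four leaves. In a multi-level tree the same skipping would happen at every inner node along one root-to-leaf path, which is what keeps lookups logarithmic.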
Stacking PDTs
• Arbitrary number of layers: “deltas on deltas on ...”
  – RID domain of child PDT = SID domain of parent PDT
• Generalization: a PDT contains all differences in the time interval [lo, hi]
• PDT(t2, t3) versus PDT(t0, t1) are:
  – consecutive if t2 = t1
  – aligned if t2 = t0 (“same base”)
  – overlapping if [t2, t3] overlaps [t0, t1] (“uncomparable” / “incompatible”)
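Reading through stacked PDTs can be sketched as applying one delta layer after another, where each layer's positions index the output of the layer beneath it. In this hypothetical Python sketch (`apply_layer`, `scan` and the `Layer` tuple are assumptions, and rows are plain strings for brevity), the upper layer addresses its deletes by TABLE1 RIDs (2 and 6), not by the SIDs of the stable table.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical layer format: (rows inserted before position p, deleted positions).
Layer = Tuple[Dict[int, List[str]], Set[int]]

def apply_layer(rows: List[str], layer: Layer) -> List[str]:
    """One layer: its positions (its SIDs) are the RIDs of `rows`, the layer below."""
    inserts, deletes = layer
    out: List[str] = []
    for pos, row in enumerate(rows):
        out.extend(inserts.get(pos, []))
        if pos not in deletes:
            out.append(row)
    out.extend(inserts.get(len(rows), []))   # rows appended at the end
    return out

def scan(stable: List[str], layers: List[Layer]) -> List[str]:
    """Merge a stack of layers bottom-up: deltas on deltas on ..."""
    rows = stable
    for layer in layers:
        rows = apply_layer(rows, layer)
    return rows

table0 = ["London chair", "London stool", "London table", "Paris rug", "Paris stool"]
lower: Layer = ({0: ["Berlin chair", "Berlin cloth", "Berlin table"]}, set())  # TABLE0 -> TABLE1
upper: Layer = ({}, {2, 6})   # TABLE1 -> TABLE2: Berlin table and Paris rug, by TABLE1 RID
table2 = scan(table0, [lower, upper])
print(table2)
```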
Stacking for Isolation
• 'Lock' a PDT against further updates
  – Immutable read-PDT: BIG, main-memory resident
• 'Stack' an empty PDT on top
  – Updatable write-PDT: SMALL, L2-cache resident
  – Note: the PDTs are consecutive
• Once in a while, changes are propagated
  – Propagate() operation
  – Requires consecutive PDTs
[Figure: Stable Table, Read-PDT and Write-PDT stacked to form TABLEx; Propagate() folds the Write-PDT into the Read-PDT.]
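Snapshot Isolation
• A transaction creates a snapshot copy of the write-PDT
• Its updates go into a trans-PDT
• On commit, Propagate() the trans-PDT into the write-PDT
[Figure: Stable Table, Read-PDT, Write-PDT; the transaction state holds a copy of the Write-PDT with a Trans-PDT on top, which Propagate() applies to the Write-PDT on commit.]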
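A hedged Python sketch of this snapshot-isolation flow, with flat layers instead of trees: the transaction copies the write-PDT, writes into its own trans-PDT addressed by RIDs of its snapshot, and on commit `propagate` replays the trans-PDT into the write-PDT. The read-PDT is left out (treated as empty), and `image`, `insert_at`, `delete_at` and `propagate` are hypothetical names. Replaying from the highest position down keeps the lower positions valid while the write-PDT changes underneath.

```python
import copy
from typing import Dict, List, Set, Tuple

Layer = Tuple[Dict[int, List[str]], Set[int]]   # (rows inserted before position p, deleted positions)

def image(rows: List[str], layer: Layer) -> List[str]:
    """Output of one layer; its positions index `rows`, the output of the layer below."""
    inserts, deletes = layer
    out: List[str] = []
    for pos, row in enumerate(rows):
        out.extend(inserts.get(pos, []))
        if pos not in deletes:
            out.append(row)
    out.extend(inserts.get(len(rows), []))
    return out

def insert_at(n: int, layer: Layer, rid: int, row: str) -> None:
    """Make `row` appear at output position `rid` of `layer` (n = rows below it)."""
    inserts, deletes = layer
    cursor = 0
    for pos in range(n + 1):
        pending = inserts.get(pos, [])
        if rid <= cursor + len(pending):
            inserts.setdefault(pos, []).insert(rid - cursor, row)
            return
        cursor += len(pending)
        if pos < n and pos not in deletes:
            cursor += 1
    raise IndexError(rid)

def delete_at(n: int, layer: Layer, rid: int) -> None:
    """Remove whatever sits at output position `rid` of `layer`."""
    inserts, deletes = layer
    cursor = 0
    for pos in range(n + 1):
        pending = inserts.get(pos, [])
        if rid < cursor + len(pending):          # an inserted row: drop the insert
            pending.pop(rid - cursor)
            if not pending:
                del inserts[pos]
            return
        cursor += len(pending)
        if pos < n and pos not in deletes:
            if rid == cursor:                    # an underlying row: record a delete
                deletes.add(pos)
                return
            cursor += 1
    raise IndexError(rid)

def propagate(n: int, lower: Layer, upper: Layer) -> None:
    """Fold the consecutive `upper` (positions = output of `lower`) into `lower`, in place."""
    up_ins, up_dels = upper
    for p in sorted(set(up_ins) | up_dels, reverse=True):  # high to low
        if p in up_dels:
            delete_at(n, lower, p)
        for k, row in enumerate(up_ins.get(p, [])):
            insert_at(n, lower, p + k, row)

stable = ["London chair", "London stool", "London table", "Paris rug", "Paris stool"]
write_pdt: Layer = ({0: ["Berlin chair", "Berlin cloth", "Berlin table"]}, set())

snapshot = copy.deepcopy(write_pdt)   # transaction start: private copy of the write-PDT
trans: Layer = ({}, set())            # empty trans-PDT on top of the snapshot
# Updates are addressed by RIDs of the snapshot image (TABLE1).
trans[1].update({2, 6})               # delete Berlin table, Paris rug
trans[0][6] = ["Paris rack"]          # insert before Paris rug's old position
seen_by_txn = image(image(stable, snapshot), trans)

propagate(len(stable), write_pdt, trans)   # commit: nobody else committed meanwhile
print(image(stable, write_pdt) == seen_by_txn, write_pdt)
```

Here no other transaction committed in between, so the write-PDT still equals the snapshot and the two PDTs are consecutive, which is the precondition for Propagate(). The resulting write-PDT has inserts under SID 0 and an insert plus a delete under SID 3, like the PDT example.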
Optimistic Concurrency Control
• Two concurrent transactions
• A commits before B
• Cannot commit B into the modified write-PDT!
  – A changed the RID enumeration
• Serialize(A, B)
  – Makes aligned PDTs consecutive
  – MAY FAIL!! → transaction abort
  – Succeeds if there is no conflict (conflict = write-set intersection)
• Extend to any number of concurrent transactions by serializing against the PDTs of all transactions that committed during its lifetime (a.k.a. backward-looking OCC)
[Figure: Stable Table, Read-PDT and Write-PDT with the Trans-PDTs of A and B; A's Trans-PDT is propagated into the Write-PDT, and Serialize() makes B's Trans-PDT consecutive to A's before it can be propagated.]
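A minimal Python sketch of Serialize() for flat layers that only insert and delete, under stated assumptions: B's positions are renumbered through A's changes, the conflict test is reduced to both transactions deleting the same tuple, and B's rows are placed after A's rows inserted at the same position (a real system would need the sort key to order them). `serialize`, `serialize_all`, `shift` and `apply` are hypothetical names; `serialize_all` is the backward-looking loop over the PDTs committed during B's lifetime.

```python
from typing import Dict, List, Optional, Set, Tuple

Layer = Tuple[Dict[int, List[str]], Set[int]]   # (rows inserted before position p, deleted positions)

def apply(rows: List[str], layer: Layer) -> List[str]:
    inserts, deletes = layer
    out: List[str] = []
    for pos, row in enumerate(rows):
        out.extend(inserts.get(pos, []))
        if pos not in deletes:
            out.append(row)
    out.extend(inserts.get(len(rows), []))
    return out

def shift(p: int, a: Layer) -> int:
    """Position p of the shared base image, renumbered for A's output (A changed the RIDs)."""
    a_ins, a_dels = a
    return (p + sum(len(r) for q, r in a_ins.items() if q <= p)
            - sum(1 for q in a_dels if q < p))

def serialize(a: Layer, b: Layer) -> Optional[Layer]:
    """Make B (aligned with committed A) consecutive to A; None means B must abort."""
    a_ins, a_dels = a
    b_ins, b_dels = b
    if a_dels & b_dels:                 # write sets intersect: both deleted the same tuple
        return None
    new_ins: Dict[int, List[str]] = {}
    for p, rows in sorted(b_ins.items()):
        # Assumed tie-break: B's rows go after A's rows inserted at the same position.
        new_ins.setdefault(shift(p, a), []).extend(rows)
    return new_ins, {shift(p, a) for p in b_dels}

def serialize_all(committed: List[Layer], b: Layer) -> Optional[Layer]:
    """Backward-looking OCC: serialize against every PDT committed during B's lifetime."""
    result: Optional[Layer] = b
    for a in committed:                 # commit order; each one consecutive to the previous
        result = serialize(a, result)
        if result is None:
            return None
    return result

base = ["a", "b", "c", "d"]
A: Layer = ({1: ["x"]}, {3})            # committed first: insert x before b, delete d
B: Layer = ({3: ["y"]}, {0})            # aligned with A: insert y before d, delete a
b2 = serialize(A, B)
print(b2, apply(apply(base, A), b2))
print(serialize(A, ({}, {3})))          # B also deletes d: conflict, abort
```

B's insert "before d" ends up at the end of A's image because A deleted d; the second call returns None because both transactions wrote the same tuple.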
Concluding…
• PDTs speed up differential update merging
  – Reduced I/O volume
  – Reduced CPU merge overhead
• Tree structure
  – Logarithmic lookup and maintenance of volatile RIDs
  – Main operations: Merge(), Propagate(), Serialize()
• PDTs are stackable and capture the write set
  – Great structure for snapshot isolation
• Formal definitions, algorithms and benchmarks in the paper
Thank you!
Microbenchmarks
TPCH-30