Split Snapshots and Skippy Indexing: Long Live the Past!
Ross Shaull
Liuba Shrira
Brandeis University
Our Idea of a Snapshot
• A window to the past in a storage system
• Access data as it was at the time the snapshot was requested
• System-wide
• Snapshots may be kept forever
– I.e., “long-lived” snapshots
• Snapshots are consistent
– Whatever that means…
• High frequency (up to CDP, continuous data protection)
Why Take Snapshots?
• Fix operator errors
• Auditing
– When did Bob’s salary change, and who made the changes?
• Analysis
– How much capital was tied up in blue shirts at the beginning of this fiscal year?
• We don’t necessarily know now what will be interesting in the future
BITE
• Give the storage system a new capability: Back-in-Time Execution
• Run read-only code against current state and any snapshot
• After issuing a BITE request, no special code is required to access data in the snapshot
Other Approaches: Databases
• ImmortalDB, Time-Split B-Tree (Lomet)
– Reorganizes current state
– Complex
• Snapshot isolation (PostgreSQL, Oracle)
– Extension to transactions
– Only for the recent past
• Oracle Flashback
– Page-level copy of the recent past (not kept forever)
– Interface seems similar to BITE
Other Approaches: FS
• WAFL (Hitz), ext3cow (Peterson)
– Limited on-disk locality
– Application-level consistency is a challenge
• VSS (Sankaran)
– Blocks disk requests
– Suitable for backup-type frequency
A Different Approach
• Goals:
– Avoid declustering current state
– Don’t change how current state is accessed
– Application requests snapshot
– Snapshots are “on-line” (not in a warehouse)
• Split Snapshots
– Copy past out incrementally
– Snapshots available through a virtualized buffer manager
Our Storage System Model
• A “database”
– Has transactions
– Has recovery log
– Organizes data in pages on disk
Our Consistency Model
• Crash consistency
– Imagine that a snapshot is declared, but then, before any modifications can be made, the system crashes
– After restart, recovery kicks in and the current state is restored to *some* consistent point
– All snapshots will have this same consistency guarantee after a crash
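Our Storage System Model
[Figure: the application asks the database “I want record R” (a “Snapshot Now” request is also shown). The access methods find the table, find the root, search for R, and return R, reading pages (P1, P3, …) through the cache. The page table maps each page to its disk address (P1 at address X, P2 at address Y, …); the disk holds pages P1 … Pn.]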
Retaining the Past
[Figure: two approaches to retaining the past, shown side by side (“Versus”).]
Copy-on-Write (COW)
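[Figure: pages P1 and P2 with a page table. After the operations below, a new copy of P1 is written and the current page table points to it, while the old page table, still pointing at the original P1 and P2, is kept as snapshot page table “S”.]
Operations: Snapshot “S”, then Modify P1
The old page table became the snapshot page table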
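A minimal Python sketch of the page-table copy-on-write above, in its no-overwrite (write-anywhere) flavor. It assumes a hypothetical in-memory “disk” (a list of page images) and page tables as dicts; `CowStore` and its methods are illustrative names, not from the talk. Declaring a snapshot keeps the old page table as the snapshot page table, and a modification writes a new page image and repoints only the current table, so the snapshot table still reaches the old image.

```python
from typing import Dict, List, Optional

class CowStore:
    """Hypothetical no-overwrite store: page tables map page id -> disk address."""

    def __init__(self, pages: Dict[str, str]) -> None:
        self.disk: List[str] = []              # append-only stand-in for the disk
        self.page_table: Dict[str, int] = {}
        for page_id, data in pages.items():
            self.disk.append(data)
            self.page_table[page_id] = len(self.disk) - 1
        self.snapshots: Dict[str, Dict[str, int]] = {}

    def snapshot(self, name: str) -> None:
        # The old page table becomes snapshot page table `name`; writers
        # continue with a copy of it as the new current page table.
        self.snapshots[name] = self.page_table
        self.page_table = dict(self.page_table)

    def modify(self, page_id: str, data: str) -> None:
        # Never overwrite in place: write a new image, repoint the current table only.
        self.disk.append(data)
        self.page_table[page_id] = len(self.disk) - 1

    def read(self, page_id: str, snapshot: Optional[str] = None) -> str:
        table = self.page_table if snapshot is None else self.snapshots[snapshot]
        return self.disk[table[page_id]]

store = CowStore({"P1": "p1-v0", "P2": "p2-v0"})
store.snapshot("S")
store.modify("P1", "p1-v1")
print(store.read("P1"), store.read("P1", "S"))  # p1-v1 p1-v0
```

Copying the whole table per snapshot keeps the sketch short; it also shows why the current state gets declustered, since the new P1 lands wherever free space is.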
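Split-COW
[Figure: the current pages P1 and P2 stay in place under the current page table; the pre-state of the modified page P1 is copied out, and snapshots S and S+1 have their own snapshot page tables, SPT(S) and SPT(S+1). The unmodified page P2 appears in both.]
Expensive to update P2 in both page tables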
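A hypothetical sketch of split copy-on-write with eagerly maintained snapshot page tables, to make the cost above concrete. The current page is updated in place; on the first write after a snapshot, its pre-state is copied once to a separate snapstore. The `spt_updates` counter (our addition) shows that a page left untouched across several snapshots needs an entry in every one of their SPTs when it is finally copied. All names are illustrative.

```python
from typing import Dict, List

class SplitCow:
    """Hypothetical split-COW store with one eagerly maintained SPT per snapshot."""

    def __init__(self, pages: Dict[str, str]) -> None:
        self.current: Dict[str, str] = dict(pages)   # current state, updated in place
        self.snapstore: List[str] = []               # copied-out pre-states
        self.spts: List[Dict[str, int]] = []         # SPT(S) for S = 0, 1, 2, ...
        # Highest snapshot index whose pre-state of the page is already captured.
        self.covered: Dict[str, int] = {p: -1 for p in pages}
        self.spt_updates = 0

    def declare_snapshot(self) -> int:
        self.spts.append({})
        return len(self.spts) - 1

    def modify(self, page_id: str, data: str) -> None:
        latest = len(self.spts) - 1
        first = self.covered.get(page_id, -1) + 1
        if first <= latest:
            # First write since snapshot `first`: copy the pre-state out once...
            self.snapstore.append(self.current[page_id])
            addr = len(self.snapstore) - 1
            # ...but record it in every SPT declared since the page was last copied.
            for s in range(first, latest + 1):
                self.spts[s][page_id] = addr
                self.spt_updates += 1
            self.covered[page_id] = latest
        self.current[page_id] = data

    def read(self, page_id: str, snapshot: int) -> str:
        # Pages missing from the SPT are unchanged since the snapshot.
        addr = self.spts[snapshot].get(page_id)
        return self.current[page_id] if addr is None else self.snapstore[addr]
```

With snapshot S, a write to P1, snapshot S+1, then a write to P2, P2’s single copied pre-state costs two SPT entries, one in SPT(S) and one in SPT(S+1).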
What’s Next
1. How do we manage the metadata?
2. How will snapshot pages be accessed?
3. Can we be non-disruptive?
Metadata Solution
• Metadata (page tables) is created incrementally
• Keeping many SPTs is costly
• Instead, write “mappings” into a log
• Materialize an SPT on demand
Maplog
• Mappings created incrementally
• Added to an append-only log
• Start points to the first mapping created after a snapshot is declared
[Figure: an append-only maplog of mappings (P1 P1 P2 P1 P1 P1 P2 P1 P2 P3) with Snap 1 through Snap 6 declared along it; each snapshot’s Start points into the log.]
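A hypothetical sketch of the maplog bookkeeping: instead of updating many SPTs, each copy-out appends one `(page, address)` mapping to an append-only list, and declaring snapshot S only records `Start(S)`, the index the next mapping will occupy. The class, its fields, and the `copied_since` counter (snapshot count at a page’s last copy) are illustrative assumptions.

```python
from typing import Dict, List, Tuple

class Maplog:
    """Hypothetical append-only maplog with a Start pointer per snapshot."""

    def __init__(self, pages: Dict[str, str]) -> None:
        self.current = dict(pages)                 # current state, updated in place
        self.snapstore: List[str] = []             # copied-out pre-states
        self.log: List[Tuple[str, int]] = []       # (page id, snapstore address)
        self.start: List[int] = []                 # start[S] = Start(S)
        self.copied_since: Dict[str, int] = {}     # page -> snapshots declared at last copy

    def declare_snapshot(self) -> int:
        self.start.append(len(self.log))           # first mapping created after S
        return len(self.start) - 1

    def modify(self, page_id: str, data: str) -> None:
        # Copy out only on the first write since the most recent snapshot;
        # that single mapping serves every snapshot declared since the last copy.
        if self.start and self.copied_since.get(page_id, 0) < len(self.start):
            self.snapstore.append(self.current[page_id])
            self.log.append((page_id, len(self.snapstore) - 1))
            self.copied_since[page_id] = len(self.start)
        self.current[page_id] = data
```

Declaring a snapshot is now O(1), and each copied page costs exactly one appended mapping, no matter how many snapshots it covers.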
Maplog
• Materialize an SPT with a scan
• The scan for SPT(S) begins at Start(S)
• Notice that we read some mappings that we do not need
[Figure: the maplog (P1 P1 P2 P1 P1 P1 P2 P1 P2 P3, with Snap 1 through Snap 6), scanned forward from Start(S).]
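Materializing SPT(S) can be sketched as a forward scan from `Start(S)` in which the first mapping encountered for each page wins: it was created by the first write to that page after S, so it holds the page’s state as of S. Later mappings for an already-mapped page are the unneeded reads the slide points out, and pages with no mapping after `Start(S)` are unchanged since S and are read from the current state. The function is a hypothetical illustration; it stops early once every page is mapped.

```python
from typing import Dict, List, Set, Tuple

def materialize_spt(log: List[Tuple[str, int]], start: int,
                    pages: Set[str]) -> Tuple[Dict[str, int], int, int]:
    """Return (SPT, mappings read, redundant mappings read) for a scan from `start`."""
    spt: Dict[str, int] = {}
    read = redundant = 0
    for page_id, addr in log[start:]:
        read += 1
        if page_id in spt:
            redundant += 1          # a later pre-state of a page already mapped
        else:
            spt[page_id] = addr     # first-encountered mapping: state as of S
            if pages.issubset(spt):
                break               # every page mapped; the rest of the log is not needed
    return spt, read, redundant

log = [("P1", 0), ("P1", 1), ("P2", 2), ("P1", 3), ("P3", 4), ("P2", 5)]
print(materialize_spt(log, 1, {"P1", "P2", "P3"}))
# ({'P1': 1, 'P2': 2, 'P3': 4}, 4, 1)
```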
Cost of Scanning Maplog
• Let the overwrite cycle length L be the number of page updates required to overwrite the entire database
• A maplog scan cannot be longer than the overwrite cycle (once every page has been updated after Start(S), every page has a mapping)
• Let N be the number of pages in the database
• For a uniformly random workload, L ≈ N ln N (by the “coupon collector’s waiting time” problem)
• Skew in the update workload lengthens the overwrite cycle
• A skew of 80/20 (80% of updates to 20% of pages) increases L by a factor of 4
Skew hurts
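The coupon-collector estimate can be checked numerically. The sketch below computes the exact expectation N·H_N (H_N is the N-th harmonic number, which the slide approximates as ln N) and runs a small seeded simulation. The hot/cold skew model (a `hot_share` of updates going to a `hot_fraction` of pages) is our assumption for illustration, not the workload generator used in the talk, so its numbers should not be read as reproducing the factor of 4.

```python
import math
import random

def expected_overwrite_cycle(n: int) -> float:
    """Expected updates until all n pages have been overwritten, uniform workload: n * H_n."""
    return n * sum(1.0 / k for k in range(1, n + 1))

def simulate_overwrite_cycle(n: int, rng: random.Random,
                             hot_fraction: float = 0.0, hot_share: float = 0.0) -> int:
    """Count random page updates until every one of n pages has been updated at least once.

    Hypothetical hot/cold skew model: a hot_share of updates go to the first
    hot_fraction of pages (0.2 and 0.8 give an 80/20 skew); the defaults are uniform.
    """
    n_hot = int(n * hot_fraction)
    if not 0 <= n_hot < n or (n_hot and not 0.0 < hot_share < 1.0):
        raise ValueError("need some cold pages and 0 < hot_share < 1 when pages are hot")
    seen = set()
    updates = 0
    while len(seen) < n:
        if n_hot and rng.random() < hot_share:
            page = rng.randrange(n_hot)        # a hot page: 0 .. n_hot - 1
        else:
            page = rng.randrange(n_hot, n)     # a cold page: n_hot .. n - 1
        seen.add(page)
        updates += 1
    return updates

if __name__ == "__main__":
    n = 1000
    print(round(expected_overwrite_cycle(n)), round(n * math.log(n)))  # 7485 6908
    rng = random.Random(1)
    print(simulate_overwrite_cycle(n, rng), simulate_overwrite_cycle(n, rng, 0.2, 0.8))
```

In this toy model the cold pages dominate: they receive only a fifth of the updates, so collecting all of them takes far longer than in the uniform case.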
Skippy
• Copy the first-encountered mapping (FEM) within a node to the next level
[Figure: Skippy Level 1 (P1 P2 P1 P1 P2 P3) built above the maplog (P1 P1 P2 P1 P1 P1 P2 P1 P2 P3, with Snap 1 through Snap 6 and Start); arrows mark the copies of mappings into Level 1 and the pointers between levels.]
Skippy
[Figure: the maplog (P1 P1 P2 P1 P1 P1 P2 P1 P2 P3, with Snap 1 through Snap 6 and Start) and Skippy Level 1 (P1 P2 P1 P1 P2 P3).]
Cuts the redundant mapping count in half
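A simplified one-level Skippy sketch, assuming fixed-size level-0 nodes and a level 1 stored as one list of FEM copies per level-0 node (a simplification: the slides suggest level 1 has its own nodes and pointers, and deeper levels repeat the construction over level 1). A scan starts at Start(S) in level 0, finishes that node, then reads only the level-1 copies for the following nodes. Each copy is the first mapping of its page within its node, so the first-encountered mapping of every page is unchanged and the SPT matches a plain maplog scan. Function names are hypothetical.

```python
from typing import Dict, List, Set, Tuple

Mapping = Tuple[str, int]   # (page id, snapstore address)

def build_level1(maplog: List[Mapping], node_size: int) -> List[List[Mapping]]:
    """Copy the first-encountered mapping (FEM) of each page in every level-0 node."""
    level1: List[List[Mapping]] = []
    for base in range(0, len(maplog), node_size):
        seen: Set[str] = set()
        fems: List[Mapping] = []
        for page_id, addr in maplog[base:base + node_size]:
            if page_id not in seen:        # later mappings in the same node are redundant
                seen.add(page_id)
                fems.append((page_id, addr))
        level1.append(fems)
    return level1

def _scan(entries: List[Mapping], spt: Dict[str, int], pages: Set[str]) -> Tuple[int, bool]:
    """First-encountered-wins over `entries`; return (mappings read, all pages found)."""
    read = 0
    for page_id, addr in entries:
        read += 1
        spt.setdefault(page_id, addr)
        if pages.issubset(spt):
            return read, True
    return read, False

def plain_scan(maplog: List[Mapping], start: int,
               pages: Set[str]) -> Tuple[Dict[str, int], int]:
    spt: Dict[str, int] = {}
    read, _ = _scan(maplog[start:], spt, pages)
    return spt, read

def skippy_scan(maplog: List[Mapping], level1: List[List[Mapping]], node_size: int,
                start: int, pages: Set[str]) -> Tuple[Dict[str, int], int]:
    spt: Dict[str, int] = {}
    node = start // node_size
    # Finish the level-0 node that contains Start(S)...
    read, done = _scan(maplog[start:(node + 1) * node_size], spt, pages)
    # ...then jump up and read only the FEM copies of every later node.
    for fems in level1[node + 1:]:
        if done:
            break
        n, done = _scan(fems, spt, pages)
        read += n
    return spt, read
```

On the ten-mapping log P1 P1 P2 P1 P1 P1 P2 P1 P2 P3 with three mappings per node, a scan from index 1 reads 9 mappings in level 0 alone but 6 with level 1, producing the same SPT.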
K-Level Skippy
• Can eliminate the effect of skew, or more
• Enables ad-hoc, on-line access to snapshots, whether they are old or young

| Skew | # Skippy Levels | Time to Materialize SPT (s) |
|---|---|---|
| 50/50 | 0 | 13.8 |
| 80/20 | 0 | 19.0 |
| 80/20 | 1 | 15.8 |
| 80/20 | 2 | 14.7 |
| 80/20 | 3 | 13.9 |
| 99/1 | 0 | 33.3 |
| 99/1 | 1 | 6.69 |
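Accessing Snapshots
• Transparent to layers above the cache
• Indirection layer to redirect page requests from a BITE transaction into the snapstore
[Figure: a request to read the current state is served from the cache (P1, P2) as usual, while a BITE request is redirected to pages in the snapstore.]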
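The indirection idea can be sketched as a page source with an optional SPT. A BITE transaction supplies the materialized SPT for its snapshot: any page listed there is fetched from the snapstore, and unlisted pages (unchanged since the snapshot) come from the current state. The read-only application function is identical in both cases, which is the BITE property described earlier. `PageReader`, `salary_of`, and the one-page record layout are hypothetical.

```python
from typing import Dict, List, Optional

class PageReader:
    """Hypothetical page source sitting below the access methods."""

    def __init__(self, current: Dict[str, dict], snapstore: List[dict],
                 spt: Optional[Dict[str, int]] = None) -> None:
        self.current = current        # current state: page id -> page contents
        self.snapstore = snapstore    # copied-out pre-states
        self.spt = spt                # None for ordinary transactions

    def get_page(self, page_id: str) -> dict:
        if self.spt is not None and page_id in self.spt:
            return self.snapstore[self.spt[page_id]]   # redirected into the snapstore
        return self.current[page_id]                   # current (or unchanged) page

def salary_of(reader: PageReader, name: str) -> int:
    """Read-only application code: unaware of which state it is reading."""
    return reader.get_page("P1")[name]

current = {"P1": {"bob": 120}, "P2": {"alice": 100}}
snapstore = [{"bob": 100}]
print(salary_of(PageReader(current, snapstore), "bob"))                   # 120
print(salary_of(PageReader(current, snapstore, spt={"P1": 0}), "bob"))    # 100
```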
Non-Disruptiveness
• Can we create Skippy and COW pre-states without disrupting the current state?
• Key idea:
– Leverage recovery to defer all snapshot-related writes
– Write snapshot data in the background to a secondary disk
Implementation
• BDB (Berkeley DB) 4.6.21
• Page cache augmented
– COWs write-locked pages
– Trickles COW’d pages out over time
• Leverage recovery
– Metadata is created in memory at transaction commit time, but only written at checkpoint time
– After a crash, snapshot pages and metadata can be recovered in one log pass
• Costs
– Snapshot log record
– Extra memory
– Longer checkpoints
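A hypothetical sketch of how an augmented page cache could defer snapshot work: pre-states of write-locked pages are COW’d into memory, a background step trickles a few of them out to the snapstore, mappings are created in memory at commit, and a checkpoint flushes the remaining pages and then the mappings. The class, method names, and batch size are assumptions for illustration, not Berkeley DB APIs, and aborts are not modeled.

```python
from collections import deque
from typing import Deque, Dict, List, Tuple

class DeferredSnapshotCache:
    """Hypothetical page-cache extension that defers all snapshot writes."""

    def __init__(self) -> None:
        self.pending: Deque[Tuple[int, str]] = deque()     # COW'd pre-states still in memory
        self.snapstore: Dict[int, str] = {}                # stands in for the secondary disk
        self.uncommitted: List[Tuple[str, int]] = []       # mappings of the running transaction
        self.mem_mappings: List[Tuple[str, int]] = []      # committed, but only in memory
        self.durable_mappings: List[Tuple[str, int]] = []  # written at checkpoint
        self.next_addr = 0

    def cow(self, page_id: str, pre_state: str) -> int:
        """Copy a page's pre-state in memory when it is write-locked."""
        addr = self.next_addr
        self.next_addr += 1
        self.pending.append((addr, pre_state))
        self.uncommitted.append((page_id, addr))
        return addr

    def commit(self) -> None:
        # Mapping metadata is created in memory at commit time...
        self.mem_mappings.extend(self.uncommitted)
        self.uncommitted.clear()

    def trickle(self, max_pages: int) -> int:
        """Background step: write up to max_pages COW'd pages to the snapstore."""
        written = 0
        while self.pending and written < max_pages:
            addr, data = self.pending.popleft()
            self.snapstore[addr] = data
            written += 1
        return written

    def checkpoint(self) -> None:
        # ...and only written out at checkpoint time, after the remaining pages.
        self.trickle(len(self.pending))
        self.durable_mappings.extend(self.mem_mappings)
        self.mem_mappings.clear()
```

The foreground path only touches memory; the costs listed on the slide show up as the `pending` and `mem_mappings` buffers (extra memory) and the extra work in `checkpoint` (longer checkpoints).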
Early Disruptiveness Results
• Single-threaded updating workload of 100,000 transactions
• 66M database
• We can retain a snapshot after every transaction for a 6–8% penalty to writers
• Tests with readers show little impact on sequential scans (not depicted)
Time (s) to run the workload, by skew:

| Skew | No Snapshots | Snapshots Every Other Transaction | Snapshots Every Transaction |
|---|---|---|---|
| 50/50 | 631 | 656 | 674 |
| 80/20 | 575 | 593 | 613 |
| 99/1 | 472 | 508 | 511 |
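The writer penalty can be recomputed from the chart values, assuming they were extracted series by series in skew order 50/50, 80/20, 99/1 (our reading of the chart, which agrees with the 6–8% claim). The helper name is illustrative.

```python
# Workload times in seconds from the chart (assumed order: 50/50, 80/20, 99/1).
times = {
    "No Snapshots": [631, 575, 472],
    "Snapshots Every Other Transaction": [656, 593, 508],
    "Snapshots Every Transaction": [674, 613, 511],
}
skews = ["50/50", "80/20", "99/1"]

def overhead_pct(series: str) -> list:
    """Percent slowdown of `series` relative to running without snapshots."""
    base = times["No Snapshots"]
    return [round(100 * (t - b) / b, 1) for t, b in zip(times[series], base)]

for series in ("Snapshots Every Other Transaction", "Snapshots Every Transaction"):
    print(series, dict(zip(skews, overhead_pct(series))))
# Snapshots Every Other Transaction {'50/50': 4.0, '80/20': 3.1, '99/1': 7.6}
# Snapshots Every Transaction {'50/50': 6.8, '80/20': 6.6, '99/1': 8.3}
```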
Paper Trail
• Upcoming poster and short paper at ICDE08
• “Skippy: a New Snapshot Indexing Method for Time Travel in the Storage Manager” to appear in SIGMOD08
• Poster and workshop talks
– NEDBDay08, SYSTOR08
Questions?
Backups…
Recovery Sketch 1
• Snapshots are crash consistent
• Must recover data and metadata for all snapshots since the last checkpoint
• Pages might have been trickled out, so we must truncate the snapstore back to the last mapping before the previous checkpoint
• We require only that a snapshot log record be forced into the log with a group commit; no other data or metadata must be logged until checkpoint.
Recovery Sketch 2
• Walk backward through the WAL, applying UNDOs
• When a snapshot record is encountered, copy the “dirty” pages and create a mapping
• The trouble is that snapshots can be concurrent with transactions
• Cope with this by “COWing” a page when an UNDO for a different transaction is applied to that page
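A much-simplified sketch of the backward pass. It assumes the page state after redoing every update in the log is already known, every record is either an update carrying its before-image or a snapshot record, and no transaction is in flight across a snapshot record, so the concurrent case from the last bullet is not modeled. Walking backward, each UNDO marks its page dirty; on reaching a snapshot record, every dirty page has been rolled back to its state as of that snapshot, so it is copied and mapped. Record shapes and the function name are hypothetical.

```python
from typing import Dict, List, Tuple, Union

UpdateRec = Tuple[str, str, str]   # ("update", page id, before-image)
SnapRec = Tuple[str, int]          # ("snapshot", snapshot id)

def recover_snapshot_pages(log: List[Union[UpdateRec, SnapRec]],
                           end_state: Dict[str, str]):
    state = dict(end_state)        # assumed: state after redoing every update in `log`
    dirty = set()
    snapstore: List[str] = []
    mappings: List[Tuple[int, str, int]] = []   # (snapshot id, page id, snapstore address)
    for rec in reversed(log):
        if rec[0] == "update":
            _, page_id, before = rec
            state[page_id] = before             # apply the UNDO
            dirty.add(page_id)
        elif rec[0] == "snapshot":
            _, snap_id = rec
            # Dirty pages now hold their state as of this snapshot: copy and map them.
            for page_id in sorted(dirty):
                snapstore.append(state[page_id])
                mappings.append((snap_id, page_id, len(snapstore) - 1))
            dirty.clear()
    mappings.reverse()             # restore forward (maplog) order
    return snapstore, mappings
```

Updates made before the oldest snapshot record in the log never produce a mapping, because no snapshot needs their pre-states.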
The Future
• Sometimes we want to scrub the past
– Running out of space?
– Retention windows for SOX compliance
• Change past state representation
– Deduplication
– Compression