a chicken in every pot: a persistent snapshot memory scaled in time
![Page 1: A chicken in every pot: a persistent snapshot memory scaled in time](https://reader036.vdocument.in/reader036/viewer/2022062519/56815223550346895dc06872/html5/thumbnails/1.jpg)
A chicken in every pot: a persistent snapshot memory
scaled in time
Liuba Shrira and Hao Xu
Brandeis University
![Page 2: A chicken in every pot: a persistent snapshot memory scaled in time](https://reader036.vdocument.in/reader036/viewer/2022062519/56815223550346895dc06872/html5/thumbnails/2.jpg)
Storage systems: the 7 year itch
1984: rotational delay – FFS
1991: large memory - LFS
1998: cheaper disk - Elephant
2005: .. a chicken in every pot :
snapshot box on the side..
![Page 3: A chicken in every pot: a persistent snapshot memory scaled in time](https://reader036.vdocument.in/reader036/viewer/2022062519/56815223550346895dc06872/html5/thumbnails/3.jpg)
Trends
Hardware: Disk
Cheap ($1/GB) and getting cheaper
Software Industry: Forbes (12/2004) says:
need for keeping past state is growing
![Page 4: A chicken in every pot: a persistent snapshot memory scaled in time](https://reader036.vdocument.in/reader036/viewer/2022062519/56815223550346895dc06872/html5/thumbnails/4.jpg)
Trends cont.
- A casino chases a card counter
- IT dept. chased by Sarbanes Oxley
- Hippocratic DB audited about patient privacy preservation
Need to analyze past activity
![Page 5: A chicken in every pot: a persistent snapshot memory scaled in time](https://reader036.vdocument.in/reader036/viewer/2022062519/56815223550346895dc06872/html5/thumbnails/5.jpg)
SNAP: a snapshot system for an object storage system
Goal:
Storage system capability for
back-in-time execution (BITE):
application runs against
read-only snapshots
without synchronization
Analysis in retrospect
![Page 6: A chicken in every pot: a persistent snapshot memory scaled in time](https://reader036.vdocument.in/reader036/viewer/2022062519/56815223550346895dc06872/html5/thumbnails/6.jpg)
Baseline Requirements for BITE
Consistent snapshots: same (old) invariants hold
BITE of general code: after-the-fact ad-hoc analysis (vs predefined SQL access methods)
App chooses the snapshot: snapshot state meaningful to app (vs “some time in the past”)
High time “resolution”: fine-grained past analysis (vs backup for recovery)
![Page 7: A chicken in every pot: a persistent snapshot memory scaled in time](https://reader036.vdocument.in/reader036/viewer/2022062519/56815223550346895dc06872/html5/thumbnails/7.jpg)
Over long time-scales..
Living with the past: how close?
today: too close (Temporal DB, CVFS)
or too far (warehouse - Netezza)
Snapshots can be of long-term importance, or transient
today: uniform - apps cannot discriminate
Inherent tension: latency of access vs cost of representation (space and time)
today: limited adaptation - compress or not
![Page 8: A chicken in every pot: a persistent snapshot memory scaled in time](https://reader036.vdocument.in/reader036/viewer/2022062519/56815223550346895dc06872/html5/thumbnails/8.jpg)
Capturing past states
Two ways:
Cheap - no-overwrite update: past stays put, copy new:
less to write, but bloated DB, past inherits same rep
Opportunistic - in-place update: past is copied out, separated:
more to write, but can write smartly, can tailor past rep, and DB stays clustered
(vigor)
![Page 9: A chicken in every pot: a persistent snapshot memory scaled in time](https://reader036.vdocument.in/reader036/viewer/2022062519/56815223550346895dc06872/html5/thumbnails/9.jpg)
Our requirements:
Non-disruptive past: just right distance - separated
At adaptive distance: e.g. faster BITE on more recent states
Discriminated past: application classifies, snapshot system filters:
Some snapshots outlive others, some can be accessed faster
Flexible classification: e.g. after the fact
![Page 10: A chicken in every pot: a persistent snapshot memory scaled in time](https://reader036.vdocument.in/reader036/viewer/2022062519/56815223550346895dc06872/html5/thumbnails/10.jpg)
Snapshot system operations
Request to take a snapshot (declaration):
sid: snapshot_request (filter_spec)
Request to access a snapshot v:
snapshot_access (sid)
Request to specify a filter for a snapshot v:
lazy_filter (sid, filter_spec)
T1, T2, S1, T3, T4, T5, S2,…
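A minimal sketch of how the three operations above might look as a Python interface. Only the operation names come from the slide; the class name `SnapshotSystem`, the `commit` helper, the in-memory bookkeeping, and the string-valued filter specs are illustrative assumptions, not SNAP's actual API.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class SnapshotSystem:
    """Toy, in-memory model of the snapshot operations (hypothetical)."""
    history: List[str] = field(default_factory=list)          # T1, T2, S1, ...
    filters: Dict[int, Optional[str]] = field(default_factory=dict)
    next_sid: int = 1

    def commit(self, txn_name: str) -> None:
        # Record a committed transaction in the serial history.
        self.history.append(txn_name)

    def snapshot_request(self, filter_spec: Optional[str] = None) -> int:
        # Declare a snapshot; it reflects every transaction committed so far.
        sid = self.next_sid
        self.next_sid += 1
        self.filters[sid] = filter_spec
        self.history.append(f"S{sid}")
        return sid

    def lazy_filter(self, sid: int, filter_spec: str) -> None:
        # Attach or change the filter spec after the snapshot was declared.
        if sid not in self.filters:
            raise KeyError(f"unknown snapshot {sid}")
        self.filters[sid] = filter_spec

    def snapshot_access(self, sid: int) -> List[str]:
        # Return the transactions whose effects snapshot `sid` reflects.
        cut = self.history.index(f"S{sid}")
        return [h for h in self.history[:cut] if h.startswith("T")]


# Reproduce the history on the slide: T1, T2, S1, T3, T4, T5, S2
snap = SnapshotSystem()
snap.commit("T1")
snap.commit("T2")
s1 = snap.snapshot_request()
for name in ("T3", "T4", "T5"):
    snap.commit(name)
s2 = snap.snapshot_request("long-lived")
snap.lazy_filter(s1, "short-lived")   # classification after the fact
```

In this toy model a snapshot is just a cut in the serial history, so `snapshot_access(s1)` sees T1 and T2 only.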
![Page 11: A chicken in every pot: a persistent snapshot memory scaled in time](https://reader036.vdocument.in/reader036/viewer/2022062519/56815223550346895dc06872/html5/thumbnails/11.jpg)
Baseline storage system
General interface: pages and a page table
transactions access objects on pages
Server:
DB disk: slotted pages of objects,
physical oid (page#, o#) and a page table
Transaction Log
Cache: pages and modified object cache
![Page 12: A chicken in every pot: a persistent snapshot memory scaled in time](https://reader036.vdocument.in/reader036/viewer/2022062519/56815223550346895dc06872/html5/thumbnails/12.jpg)
Storage system, cont.
optimistic CC + ARIES
Clients:
fetch pages, run transactions
send modified objects to server
Server:
validates, commits (WAL)
caches committed modifications
no-force, no-STEAL
![Page 13: A chicken in every pot: a persistent snapshot memory scaled in time](https://reader036.vdocument.in/reader036/viewer/2022062519/56815223550346895dc06872/html5/thumbnails/13.jpg)
The snapshot system
Archive separated from DB:
Archive i/o sequential, DB random
Copy-on-write (COW): copy out snapshot states into archive
just before updating DB during cleaning.
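The copy-out rule can be sketched as follows. This is a simplified illustration under stated assumptions: the class `CowStore` and its fields are hypothetical, pages are plain strings, and the sketch folds the cleaning step into `update`, whereas the system on the slide copies out when cached modifications are cleaned to the DB disk.

```python
from typing import Dict, List, Set, Tuple


class CowStore:
    """Toy copy-on-write store: page id -> contents (hypothetical names)."""

    def __init__(self, pages: Dict[str, str]) -> None:
        self.db = dict(pages)                            # current database state
        self.archive: List[Tuple[int, str, str]] = []    # append-only: (snapshot, page, pre-state)
        self.latest = 0                                  # most recent snapshot id, 0 = none yet
        self.copied: Set[str] = set()                    # pages already copied out for `latest`

    def declare_snapshot(self) -> int:
        self.latest += 1
        self.copied.clear()
        return self.latest

    def update(self, page: str, contents: str) -> None:
        # Copy the pre-state out once per snapshot, just before the first overwrite.
        if self.latest and page not in self.copied:
            self.archive.append((self.latest, page, self.db[page]))
            self.copied.add(page)
        self.db[page] = contents


store = CowStore({"P": "p0", "S": "s0"})
v1 = store.declare_snapshot()
store.update("P", "p1")
store.update("P", "p2")   # second overwrite of P: nothing more to archive
store.update("S", "s1")
```

Because the archive is only ever appended to, archive writes stay sequential while the database pages are updated in place.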
![Page 14: A chicken in every pot: a persistent snapshot memory scaled in time](https://reader036.vdocument.in/reader036/viewer/2022062519/56815223550346895dc06872/html5/thumbnails/14.jpg)
Snapshot interface
Same as DB:
Snapshot Pages
Snapshot Page Table
So BITE is transparent: BITE on snapshot S(v) uses PageTable(v)
![Page 15: A chicken in every pot: a persistent snapshot memory scaled in time](https://reader036.vdocument.in/reader036/viewer/2022062519/56815223550346895dc06872/html5/thumbnails/15.jpg)
Snapshot system: below the interface
Some S(v) pages are in the archive,
some in DB
and pages in the archive can have
different representations
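A minimal sketch of the redirection this implies for BITE on snapshot v: a page is read from the archive when PageTable(v) has an entry for it and from the database otherwise. The function name, argument names, and the dictionary-based storage are assumptions made for illustration.

```python
from typing import Dict


def read_snapshot_page(page: str,
                       snapshot_page_table: Dict[str, int],
                       archive: Dict[int, str],
                       db: Dict[str, str]) -> str:
    """Resolve a page for code running against snapshot v."""
    addr = snapshot_page_table.get(page)
    if addr is not None:
        return archive[addr]    # pre-state was copied out to the archive
    return db[page]             # page not overwritten since v: still in the DB


db = {"P": "p-current", "Q": "q-current"}
archive = {0: "p-as-of-v"}
page_table_v = {"P": 0}         # only P was overwritten after snapshot v

p_seen_by_bite = read_snapshot_page("P", page_table_v, archive, db)
q_seen_by_bite = read_snapshot_page("Q", page_table_v, archive, db)
```

The application code is unchanged; only the page table it is handed differs.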
![Page 16: A chicken in every pot: a persistent snapshot memory scaled in time](https://reader036.vdocument.in/reader036/viewer/2022062519/56815223550346895dc06872/html5/thumbnails/16.jpg)
BITE (v): namespace redirection
![Page 17: A chicken in every pot: a persistent snapshot memory scaled in time](https://reader036.vdocument.in/reader036/viewer/2022062519/56815223550346895dc06872/html5/thumbnails/17.jpg)
Creating non-disruptive snapshots: (i/o bound system)
Archiving snapshot states when cleaning
can slow down cleaning
compared to a system without snapshots.
Copying to the archive disk (sequential I/O)
in parallel to database I/O (random)
can partially hide archiving cost
behind database I/O.
![Page 18: A chicken in every pot: a persistent snapshot memory scaled in time](https://reader036.vdocument.in/reader036/viewer/2022062519/56815223550346895dc06872/html5/thumbnails/18.jpg)
Creating snapshots: how well can you hide?
Is determined by:
how much is archived:
compactness of snapshot representation, frequency, snapshot update workload (overwriting)
cost of archiving: sequential, other archive traffic – BITE
![Page 19: A chicken in every pot: a persistent snapshot memory scaled in time](https://reader036.vdocument.in/reader036/viewer/2022062519/56815223550346895dc06872/html5/thumbnails/19.jpg)
Creating snapshots: some issues
Issue: avoid overwriting snapshot states
(without blocking, pinning etc.)
Issue: update snapshot metadata efficiently (large, dynamic page tables)
Issue: filter out long-lived snaps (focus here)
![Page 20: A chicken in every pot: a persistent snapshot memory scaled in time](https://reader036.vdocument.in/reader036/viewer/2022062519/56815223550346895dc06872/html5/thumbnails/20.jpg)
New techniques for copy-out snapshots:
- VMOB: in-memory versioned data structure preserves snapshot states w/out blocking
- LPT: incrementally archived page table with logarithmic reconstruction cost
- Filtering: exploit smart representation for past states (focus here)
![Page 21: A chicken in every pot: a persistent snapshot memory scaled in time](https://reader036.vdocument.in/reader036/viewer/2022062519/56815223550346895dc06872/html5/thumbnails/21.jpg)
Filtering: motivation
Want unlimited past at high resolution
but
some snapshots are transient
others of long-term interest to application
application needs to discriminate between snapshots
![Page 22: A chicken in every pot: a persistent snapshot memory scaled in time](https://reader036.vdocument.in/reader036/viewer/2022062519/56815223550346895dc06872/html5/thumbnails/22.jpg)
Thresher: a filtering system for SNAP
![Page 23: A chicken in every pot: a persistent snapshot memory scaled in time](https://reader036.vdocument.in/reader036/viewer/2022062519/56815223550346895dc06872/html5/thumbnails/23.jpg)
Snapshot representation
What can representation do for filtering?
lifetime-based allocation – avoids fragmentation
diff-based encoding – reduces cost of copying
adaptive combination - real winner
![Page 24: A chicken in every pot: a persistent snapshot memory scaled in time](https://reader036.vdocument.in/reader036/viewer/2022062519/56815223550346895dc06872/html5/thumbnails/24.jpg)
Example: hierarchical snapshots at multiple time granularity
ICU patient monitoring DB takes snapshots:
minute by minute: vital sign monitor readings
hourly: includes nurse’s writeup summarizing monitor readings
daily: includes doctor’s notes summarizing nurse’s checkups
Doctor’s snapshots have a longer lifetime than nurse’s …
![Page 25: A chicken in every pot: a persistent snapshot memory scaled in time](https://reader036.vdocument.in/reader036/viewer/2022062519/56815223550346895dc06872/html5/thumbnails/25.jpg)
Brief overview: snapshot creation
Some notation:
Snapshot span
Recorded pages
example:
.. v4, T: w (x_P), T’: w (y_S), v5, T’’ ..
Span of v4: T, T’
Pages recorded by snapshot v4: P, S
![Page 26: A chicken in every pot: a persistent snapshot memory scaled in time](https://reader036.vdocument.in/reader036/viewer/2022062519/56815223550346895dc06872/html5/thumbnails/26.jpg)
Incremental snapshot creation:
Archived snapshot pages: dispersed:
v4: P, S | v5: P, Q | …
Archived snapshot page tables (PT):
PT(v4): addr(P4), addr(S4); PT(v5): addr(P5), addr(Q5) ..
Another talk: how to construct archived page tables:
Construct APT(v4) = recorded(v4) + Construct APT(v5)
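The recurrence on this slide can be sketched as a naive recursive merge in which entries recorded by the earlier snapshot take precedence. The function name `construct_apt`, the `latest` parameter, and the string addresses are illustrative assumptions; this linear version is not the incrementally archived page table with logarithmic reconstruction cost mentioned earlier in the talk.

```python
from typing import Dict


def construct_apt(v: int,
                  recorded: Dict[int, Dict[str, str]],
                  latest: int) -> Dict[str, str]:
    """APT(v) = recorded(v) + APT(v+1); pages missing from the result are read from the DB."""
    if v > latest:
        return {}
    apt = construct_apt(v + 1, recorded, latest)   # entries inherited from later snapshots
    apt.update(recorded.get(v, {}))                # v's own recorded pages override them
    return apt


# Page-table entries recorded by each snapshot, as on the slide.
recorded = {
    4: {"P": "addr(P4)", "S": "addr(S4)"},
    5: {"P": "addr(P5)", "Q": "addr(Q5)"},
}
apt_v4 = construct_apt(4, recorded, latest=5)
apt_v5 = construct_apt(5, recorded, latest=5)
```

Here APT(v4) takes P and S from its own recorded pages and inherits Q from v5, since Q was not overwritten during the span of v4.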
![Page 27: A chicken in every pot: a persistent snapshot memory scaled in time](https://reader036.vdocument.in/reader036/viewer/2022062519/56815223550346895dc06872/html5/thumbnails/27.jpg)
Filtering example: filter out short-lived v5
v4 (doctor’s), v5 (nurse’s):
v4: P, S | v5: P, Q | v6 …
Archive filter: retain long-lived v4, reclaim v5:
reclaim P5, retain Q5 (v4 needs it)
filtering incremental snapshots creates fragmentation
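Why P5 can be reclaimed while Q5 cannot follows from which snapshots read each archived copy: a copy recorded by snapshot u serves every snapshot after the previous recorder of that page, up to and including u. A minimal sketch of that rule, with the function name and data layout assumed for illustration:

```python
from typing import Dict, Set, Tuple


def reclaimable(recorded: Dict[int, Set[str]],
                retained: Set[int]) -> Set[Tuple[int, str]]:
    """Return the archived copies (snapshot, page) that no retained snapshot needs."""
    order = sorted(recorded)
    last_recorder: Dict[str, int] = {}   # page -> latest earlier snapshot that recorded it
    free: Set[Tuple[int, str]] = set()
    for u in order:
        for page in recorded[u]:
            prev = last_recorder.get(page)
            # Snapshots in (prev, u] read this copy of the page.
            readers = [v for v in order if (prev is None or v > prev) and v <= u]
            if not any(v in retained for v in readers):
                free.add((u, page))
            last_recorder[page] = u
    return free


recorded = {4: {"P", "S"}, 5: {"P", "Q"}}
free_copies = reclaimable(recorded, retained={4})   # keep v4, filter out v5
```

Only the copy of P recorded by v5 is freed, which leaves a hole between retained pages in a sequentially written archive.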
![Page 28: A chicken in every pot: a persistent snapshot memory scaled in time](https://reader036.vdocument.in/reader036/viewer/2022062519/56815223550346895dc06872/html5/thumbnails/28.jpg)
Problem: fragmentation
• fragmented archive, over time:
non-sequential archive writes
or
random reads to copy out long-lived states
![Page 29: A chicken in every pot: a persistent snapshot memory scaled in time](https://reader036.vdocument.in/reader036/viewer/2022062519/56815223550346895dc06872/html5/thumbnails/29.jpg)
Our approach: filter-spec
Filter spec determines
relative snapshot lifetime
“App knows best”:
the app supplies a filter spec
the system filters
![Page 30: A chicken in every pot: a persistent snapshot memory scaled in time](https://reader036.vdocument.in/reader036/viewer/2022062519/56815223550346895dc06872/html5/thumbnails/30.jpg)
avoid fragmentation with filter-spec
Known at snapshot declaration –
use lifetime-based allocation
After the fact -
use a flexible rep to filter lazily
rep allows adaptive trade-off:
cost of filtering vs cost of BITE
![Page 31: A chicken in every pot: a persistent snapshot memory scaled in time](https://reader036.vdocument.in/reader036/viewer/2022062519/56815223550346895dc06872/html5/thumbnails/31.jpg)
App specifies filter at declaration
Long-lived area: P4 S4 Q5
Short-lived area: P5
Invariant : to reclaim w/out fragmentation,
short-lived areas store no long-lived pages
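One way to maintain this invariant, sketched under assumptions: when a page is copied out, place it in the long-lived area if any snapshot that reads that copy is long-lived. The function `place_pages`, the two-area layout, and the `"long"`/`"short"` labels are hypothetical simplifications of a filter spec.

```python
from typing import Dict, List, Set


def place_pages(recorded: Dict[int, Set[str]],
                lifetime: Dict[int, str]) -> Dict[str, List[str]]:
    """Assign each copied-out page to the 'long' or 'short' archive area."""
    order = sorted(recorded)
    last_recorder: Dict[str, int] = {}
    areas: Dict[str, List[str]] = {"long": [], "short": []}
    for u in order:
        for page in sorted(recorded[u]):
            prev = last_recorder.get(page)
            # Snapshots in (prev, u] read this copy of the page.
            readers = [v for v in order if (prev is None or v > prev) and v <= u]
            area = "long" if any(lifetime[v] == "long" for v in readers) else "short"
            areas[area].append(f"{page}{u}")
            last_recorder[page] = u
    return areas


areas = place_pages({4: {"P", "S"}, 5: {"P", "Q"}},
                    lifetime={4: "long", 5: "short"})
long_area = list(areas["long"])
short_area = list(areas["short"])

# Reclaiming the short-lived snapshots frees the whole short-lived area at once.
areas["short"].clear()
```

With the lifetimes known at declaration, Q5 lands next to P4 and S4, so dropping v5 leaves no hole in the long-lived area.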
![Page 32: A chicken in every pot: a persistent snapshot memory scaled in time](https://reader036.vdocument.in/reader036/viewer/2022062519/56815223550346895dc06872/html5/thumbnails/32.jpg)
FilterTree: filter pages for free
![Page 33: A chicken in every pot: a persistent snapshot memory scaled in time](https://reader036.vdocument.in/reader036/viewer/2022062519/56815223550346895dc06872/html5/thumbnails/33.jpg)
After-the-fact (lazy) filtering
Some applications want
to defer filter specification
Lazy filtering requires copying
We can specialize representation (compact)
to reduce copying cost
![Page 34: A chicken in every pot: a persistent snapshot memory scaled in time](https://reader036.vdocument.in/reader036/viewer/2022062519/56815223550346895dc06872/html5/thumbnails/34.jpg)
Compact representation: diffs
Two components filtered separately:
compact diffs – reduce cost of copying (diffs clustered by page)
checkpoints – accelerate BITE (page-based snapshots, system-declared, can use FilterTree)
![Page 35: A chicken in every pot: a persistent snapshot memory scaled in time](https://reader036.vdocument.in/reader036/viewer/2022062519/56815223550346895dc06872/html5/thumbnails/35.jpg)
Adaptive trade-off
Like recovery log:
less frequent checkpoints
increase compactness
more frequent checkpoints
accelerate BITE
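The trade-off can be illustrated with a toy reconstruction routine. This sketch assumes forward diffs applied to the nearest earlier checkpoint, in the style of a recovery log; the direction and encoding of diffs in the actual system may differ, and the names `reconstruct`, `checkpoints`, and `diffs` are hypothetical.

```python
from typing import Dict, Tuple

Page = Dict[str, int]


def reconstruct(version: int,
                checkpoints: Dict[int, Page],
                diffs: Dict[int, Page]) -> Tuple[Page, int]:
    """Rebuild a page as of `version`; also return how many diffs were applied."""
    base = max(c for c in checkpoints if c <= version)   # nearest earlier checkpoint
    page = dict(checkpoints[base])
    applied = 0
    for v in range(base + 1, version + 1):
        page.update(diffs[v])                            # apply the objects changed in v
        applied += 1
    return page, applied


diffs = {1: {"x": 1}, 2: {"y": 2}, 3: {"x": 3}, 4: {"x": 4}, 5: {"y": 5}}
sparse = {0: {"x": 0, "y": 0}}                           # one checkpoint: most compact
dense = {0: {"x": 0, "y": 0}, 4: {"x": 4, "y": 2}}       # extra checkpoint: faster BITE

page_sparse, cost_sparse = reconstruct(5, sparse, diffs)
page_dense, cost_dense = reconstruct(5, dense, diffs)
```

Both configurations rebuild the same page state; the extra checkpoint costs a full page of archive space but cuts the number of diffs applied for version 5 from five to one.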
![Page 36: A chicken in every pot: a persistent snapshot memory scaled in time](https://reader036.vdocument.in/reader036/viewer/2022062519/56815223550346895dc06872/html5/thumbnails/36.jpg)
Lazy filtering: checkpoints filtered for free
[Figure: FilterTree for checkpoints (B1, B2, B3, …) and archive regions for diff extents (E1, E2, E3; G1 (diffs), G2 (diffs), …)]
![Page 37: A chicken in every pot: a persistent snapshot memory scaled in time](https://reader036.vdocument.in/reader036/viewer/2022062519/56815223550346895dc06872/html5/thumbnails/37.jpg)
But some applications want more:
lazy filtering
and
faster BITE
e.g. app runs BITE on a batch of recent snapshots
to decide which ones to retain -
needs fast BITE to keep up..
![Page 38: A chicken in every pot: a persistent snapshot memory scaled in time](https://reader036.vdocument.in/reader036/viewer/2022062519/56815223550346895dc06872/html5/thumbnails/38.jpg)
Combined hybrid
Faster BITE in recent window
and
Lazy filtering
![Page 39: A chicken in every pot: a persistent snapshot memory scaled in time](https://reader036.vdocument.in/reader036/viewer/2022062519/56815223550346895dc06872/html5/thumbnails/39.jpg)
Hybrid: checkpoints and checkpoint filtered for free
![Page 40: A chicken in every pot: a persistent snapshot memory scaled in time](https://reader036.vdocument.in/reader036/viewer/2022062519/56815223550346895dc06872/html5/thumbnails/40.jpg)
Status
Implemented:
SNAP and Thresher for Thor storage system
Performance results –
encouraging.
Here is a 5,000-foot view:
![Page 41: A chicken in every pot: a persistent snapshot memory scaled in time](https://reader036.vdocument.in/reader036/viewer/2022062519/56815223550346895dc06872/html5/thumbnails/41.jpg)
Performance metrics
Cost of filtering: non-disruptiveness = rate-of-drain / rate-of-pour
t_clean determines rate-of-drain
workload parameter: overwriting
Compactness of diff-based rep:
retention relative to page-based rep
R_diff - fixed
R_ckp - tunable by frequency of checkpoints
workload parameter: density
BITE - page-based snapshots vs diff-based vs DB
![Page 42: A chicken in every pot: a persistent snapshot memory scaled in time](https://reader036.vdocument.in/reader036/viewer/2022062519/56815223550346895dc06872/html5/thumbnails/42.jpg)
Non-disruptiveness
Storage system w/hybrid snapshots vs
w/out snapshots (Thor)
How much drop in
rate-of-drain / rate-of-pour
![Page 43: A chicken in every pot: a persistent snapshot memory scaled in time](https://reader036.vdocument.in/reader036/viewer/2022062519/56815223550346895dc06872/html5/thumbnails/43.jpg)
Experimental configuration
Workloads:
extend multiuser OO7 to control
density, overwriting
System configuration:
single client, medium OO7 – small DB 185MB
multiple clients – large DB 140GB
![Page 44: A chicken in every pot: a persistent snapshot memory scaled in time](https://reader036.vdocument.in/reader036/viewer/2022062519/56815223550346895dc06872/html5/thumbnails/44.jpg)
FilterTree
Free!
![Page 45: A chicken in every pot: a persistent snapshot memory scaled in time](https://reader036.vdocument.in/reader036/viewer/2022062519/56815223550346895dc06872/html5/thumbnails/45.jpg)
Non-disruptiveness/single client: “summertime … life is easy”
![Page 46: A chicken in every pot: a persistent snapshot memory scaled in time](https://reader036.vdocument.in/reader036/viewer/2022062519/56815223550346895dc06872/html5/thumbnails/46.jpg)
Non-disruptiveness/multi user: “DB works harder”
![Page 47: A chicken in every pot: a persistent snapshot memory scaled in time](https://reader036.vdocument.in/reader036/viewer/2022062519/56815223550346895dc06872/html5/thumbnails/47.jpg)
Summary: non-disruptive snapshot memory
Unlimited filtered past
is cheaper than you may think.
.. A chicken in every pot..
Every storage system
can have a snapshot box on the side..
![Page 48: A chicken in every pot: a persistent snapshot memory scaled in time](https://reader036.vdocument.in/reader036/viewer/2022062519/56815223550346895dc06872/html5/thumbnails/48.jpg)
To get there:
Generalize:
ARIES/ STEAL / underway
file systems / need extended interfaces
Beyond:
upgrades/ have techniques
provenance / need ideas..