data versioning systems

Data Versioning Systems. Research Proficiency Exam. Ningning Zhu; advisor: Tzi-cker Chiueh. Computer Science Department, State University of New York at Stony Brook. Feb 10, 2003.


Page 1: Data Versioning Systems

Data Versioning Systems

Research Proficiency Exam

Ningning Zhu

Advisor: Tzi-cker Chiueh

Computer Science Department

State University of New York at Stony Brook

Feb 10, 2003

Page 2: Data Versioning Systems

Definitions

Data Object

Granularity of a data object: file, tuple, database table, database, logical volume, block device

Version of a Data Object: a consistent state, a snapshot, a point-in-time image

Data Repository

Version Repository

Page 3: Data Versioning Systems

Why do we need data versioning?

Documentation version control

Human mistakes

Malicious attacks

Software failure

History study

Page 4: Data Versioning Systems

Data Versioning Vs. Other Techniques

Backup Mirroring Replication Redundancy Perpetual storage

Page 5: Data Versioning Systems

Design Issues

Resource consumption: storage capacity, CPU, storage bandwidth, network bandwidth

Performance: access to old versions and to the current object; throughput, latency

Maintenance effort

Page 6: Data Versioning Systems

Design Options

Who performs it? User, application, file system, database system, object store, virtual disk, block device

Where and what to save? Separate version repository? Full image vs. delta

How? Frequency, scope

Page 7: Data Versioning Systems

Data Versioning Techniques

Save

Represent

Extract

Page 8: Data Versioning Systems

Save: naive approach (1)

Page 9: Data Versioning Systems

Save: Split Mirror (2)

Page 10: Data Versioning Systems

Save: copy-old-while-update-new (3)

Page 11: Data Versioning Systems

Save: keep-old-and-create-new (4)
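
The figure for this slide did not survive extraction. As a rough illustration of what keep-old-and-create-new could look like, here is a minimal Python sketch in which every write goes to a fresh slot and a snapshot only freezes the block map. The class name `VersionedBlockStore` and its methods are hypothetical, not taken from the slides.

```python
class VersionedBlockStore:
    """Keep-old-and-create-new: a write never overwrites an existing slot."""

    def __init__(self):
        self.slots = []        # append-only physical storage
        self.current = {}      # logical block number -> slot index
        self.snapshots = []    # frozen copies of the block map

    def write(self, block_no, data):
        # The old data stays where it is; only the map entry moves.
        self.slots.append(data)
        self.current[block_no] = len(self.slots) - 1

    def snapshot(self):
        # Saving a version copies the map, not the data blocks.
        self.snapshots.append(dict(self.current))
        return len(self.snapshots) - 1

    def read(self, block_no, version=None):
        block_map = self.current if version is None else self.snapshots[version]
        return self.slots[block_map[block_no]]


store = VersionedBlockStore()
store.write(0, b"alpha")
v0 = store.snapshot()
store.write(0, b"beta")
```

After these calls, `store.read(0)` returns the new contents while `store.read(0, v0)` still returns the old ones, at the cost of fragmenting the current version across slots.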

Page 12: Data Versioning Systems

Represent (1)

Full image: easy to extract, consumes more resources

Delta: reference direction, reference object, differencing algorithm

Chain of deltas and full images

Page 13: Data Versioning Systems

Represent: Chain structure (2)

Forward delta: V1, D(1,2), D(2,3), V4, D(4,5), D(5,6), V7

Forward delta with version jumping: V1, D(1,2), D(1,3), V4, D(4,5), D(4,6), V7

Reverse delta: V1, D(3,2), D(4,3), V4, D(6,5), D(7,6), V7
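
To make the three chain layouts concrete, the following sketch extracts a version under each of them. It assumes versions are plain Python dicts and uses a toy key-level delta; `diff` and `patch` are illustrative helpers, not the differencing algorithm of any system discussed here.

```python
def diff(src, dst):
    """Delta D(src, dst): what must change to turn src into dst."""
    changed = {k: v for k, v in dst.items() if k not in src or src[k] != v}
    removed = [k for k in src if k not in dst]
    return changed, removed


def patch(src, delta):
    """Apply a delta produced by diff() and return the resulting version."""
    changed, removed = delta
    out = {k: v for k, v in src.items() if k not in removed}
    out.update(changed)
    return out


v1 = {"a": 1}
v2 = {"a": 1, "b": 2}
v3 = {"a": 5, "b": 2}
v4 = {"b": 2, "c": 3}

# Forward delta: V3 = V1 + D(1,2) + D(2,3), two patches.
forward_v3 = patch(patch(v1, diff(v1, v2)), diff(v2, v3))

# Version jumping: V3 = V1 + D(1,3), a single patch from the full image.
jump_v3 = patch(v1, diff(v1, v3))

# Reverse delta: V2 = V4 + D(4,3) + D(3,2), walking back from the newer image.
reverse_v2 = patch(patch(v4, diff(v4, v3)), diff(v3, v2))
```

Version jumping bounds extraction to one patch but its deltas grow with distance from the full image; reverse deltas make recent versions cheap and old ones expensive.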

Page 14: Data Versioning Systems

Represent: differencing algorithm (3)

Insert/Delete (diff) vs. Insert/Copy (bdiff)

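Rabin fingerprint

Given a sequence of bytes t_1, t_2, t_3, \ldots and a window of \beta bytes:

\[ RF(t_1, t_2, t_3, \ldots, t_\beta) = (t_1 p^{\beta-1} + t_2 p^{\beta-2} + \cdots + t_\beta) \bmod M \]

\[ RF(t_{i+1}, \ldots, t_{i+\beta}) = ((RF(t_i, \ldots, t_{i+\beta-1}) - t_i p^{\beta-1}) \cdot p + t_{i+\beta}) \bmod M \]

SHA-1: collision-resistant hash function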
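
A small sketch of a Rabin-style fingerprint computed over a sliding window, in the integer mod-M form. The constants `P`, `M`, and `BETA` are arbitrary values chosen for illustration, and the function names are hypothetical. The point is that each new window costs constant work instead of a full recomputation.

```python
P = 257               # assumed base
M = 1_000_000_007     # assumed modulus
BETA = 4              # assumed window length in bytes


def rf_direct(window):
    """Fingerprint of one window, computed from scratch (Horner's rule)."""
    h = 0
    for t in window:
        h = (h * P + t) % M
    return h


def rf_rolling(data, beta=BETA):
    """Yield the fingerprint of every beta-byte window of data."""
    if len(data) < beta:
        return
    top = pow(P, beta - 1, M)          # weight of the byte leaving the window
    h = rf_direct(data[:beta])
    yield h
    for i in range(len(data) - beta):
        # Drop data[i], shift by one position, bring in data[i + beta].
        h = ((h - data[i] * top) * P + data[i + beta]) % M
        yield h


data = b"data versioning systems"
rolled = list(rf_rolling(data))
direct = [rf_direct(data[i:i + BETA]) for i in range(len(data) - BETA + 1)]
```

Here `rolled` and `direct` hold the same values; a differencing tool would index such fingerprints to find candidate copy regions between two versions.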

Page 15: Data Versioning Systems

XDFS

Drawbacks of traditional version control: slow extraction, fragmentation, lack of atomicity support

XDFS

A user-level file system with versioning support

Separates version labeling from delta compression

Effective delta chain

Built upon Berkeley DB

Page 16: Data Versioning Systems

Log-Structured File System: Sprite LFS

Access assumption: small writes

Data structures:

Inode

Inode map

Indirect block

Segment summary

Segment usage table

Superblock (fixed disk location)

Checkpoint region (fixed disk location)

Directory change log

Page 17: Data Versioning Systems

Research Data Versioning System

File system: Elephant, Comprehensive Versioning File System (CVFS)

Object store: Self-Secure Storage System (S4), Oceanstore

Database system: Postgres and Fastrek

Storage system: Petal and Frangipani

Page 18: Data Versioning Systems

Elephant File System (1)

Retention policies:

Keep one

Keep all

Keep safe

Keep landmark (intelligently add landmark)

Page 19: Data Versioning Systems

Elephant File System (2)

Metadata organization

Page 20: Data Versioning Systems

S4: Self-Secure Storage System (1)

Object-store interface Log everything Audit log Efficient metadata logging

Page 21: Data Versioning Systems

S4: Metadata Inefficiency (2)

Page 22: Data Versioning Systems

CVFS: Comprehensive Versioning (1)

Journal based logging vs. Multi-version B-tree

Page 23: Data Versioning Systems

CVFS: Comprehensive Versioning (2)

Journal-based vs. Multi-version B-tree

Assumptions about metadata access

Optimizations: Cleaner: pointers in version repository Both forward delta and reverse delta Checkpointing and clustering Bounded old version access by forcing checkpoint

Page 24: Data Versioning Systems

Oceanstore: decentralized storage

A global-scale persistent storage system

A deep archival system

A data entity is identified by <A-GUID, V-GUID>

Internal data structure is similar to S4.

Use B+ tree for object block indexing

Page 25: Data Versioning Systems

Postgres: a multi-version database (1)

Versioning support: “Save” of a version in the database context; optimized towards “extract”

Database structure and operation: tables made up of tuples; primary and secondary indices; transaction log: <TID, operation>; Update = Delete + Insert

Page 26: Data Versioning Systems

Postgres: record structure (2)

Extra fields for versioning:

OID: record ID, shared by all versions of this record

Xmin: TID of the inserting transaction

Tmin: commit time of Xmin

Xmax: TID of the deleting transaction

Tmax: commit time of Xmax

PTR: forward pointer from the old version to the new one
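
The Tmin/Tmax fields are what make point-in-time reads possible. Below is a minimal sketch of that idea, assuming an in-memory list stands in for a table; it omits Xmin, Xmax, and PTR, and the names `Record`, `update`, and `as_of` are hypothetical rather than Postgres internals.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Record:
    oid: int                       # shared by all versions of the record
    value: str
    tmin: float                    # commit time of the inserting transaction
    tmax: Optional[float] = None   # commit time of the deleting one; None = live


def update(table, oid, new_value, commit_time):
    """Update = delete + insert: close the live version, append a new one."""
    for r in table:
        if r.oid == oid and r.tmax is None:
            r.tmax = commit_time
    table.append(Record(oid, new_value, commit_time))


def as_of(table, oid, t):
    """Return the version of record oid that was valid at time t, if any."""
    for r in table:
        if r.oid == oid and r.tmin <= t and (r.tmax is None or t < r.tmax):
            return r
    return None


table = [Record(1, "a", tmin=10.0)]
update(table, 1, "b", commit_time=20.0)
```

With this table, `as_of(table, 1, 15.0)` finds the old version, `as_of(table, 1, 25.0)` finds the new one, and a time before 10.0 finds nothing.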

Page 27: Data Versioning Systems

Postgres: Save (3)

Page 28: Data Versioning Systems

Postgres: Represent & Extract (4)

Full image + forward delta

SQL query with TIME parameter

Build indices using R-tree for ops: contained in, overlap with

Secondary indices: when a delta record is inserted, if secondary indices need to be changed, a full image needs to be constructed

Page 29: Data Versioning Systems

Postgres: Frequency of extraction (5)

No archive: timestamp never filled in

Light archive: extract time from TIME meta table

Heavy archive: on first use, extract time from TIME metadata, then fill the field; on later use, read it directly from the data record

Page 30: Data Versioning Systems

Postgres: Hardware Assumption (6)

Another level of archival storage: WORM (optical disks)

Optimizations: indexing, access method, query plan

Combine indexing at magnetic disks and archival storage

Page 31: Data Versioning Systems

Fastrek: application of versioning

Built on top of Postgres Tracking read operation Tracking write operation

Tmin, Tmax

Data dependency analysis Fast and intelligent repair

Page 32: Data Versioning Systems

Petal and Frangipani

Petal: a distributed storage system that supports virtual disk snapshots

<virtual disk id, off> -> <physical disk id, off>

<virtual disk id, epoch, off> -> <physical disk id, off>

Frangipani: a distributed file system built on top of Petal

Versioning by creating virtual disk snapshots

Coarse granularity: mainly for backup purposes
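
One way to read the epoch-extended mapping is that a lookup for a snapshot returns the newest translation whose epoch is not later than the snapshot's. The sketch below follows that reading for a single virtual disk; it is an assumption for illustration, and `VirtualDisk` with its integer "physical locations" is hypothetical, not Petal's actual data structure.

```python
class VirtualDisk:
    def __init__(self):
        self.epoch = 0
        self.mapping = {}      # (epoch, offset) -> physical location
        self.next_phys = 0

    def write(self, off):
        """Return the physical location backing off in the current epoch."""
        key = (self.epoch, off)
        if key not in self.mapping:
            # First write in this epoch: allocate a new location so that
            # earlier epochs keep pointing at the old one.
            self.mapping[key] = self.next_phys
            self.next_phys += 1
        return self.mapping[key]

    def snapshot(self):
        """Freeze the current epoch and start a new one."""
        frozen = self.epoch
        self.epoch += 1
        return frozen

    def lookup(self, off, epoch=None):
        """Find the newest translation with epoch <= the requested epoch."""
        e = self.epoch if epoch is None else epoch
        while e >= 0:
            if (e, off) in self.mapping:
                return self.mapping[(e, off)]
            e -= 1
        return None


disk = VirtualDisk()
first = disk.write(7)
snap = disk.snapshot()
second = disk.write(7)
third = disk.write(9)
```

Taking a snapshot here is only an epoch increment, which is consistent with the coarse, backup-oriented use described for Frangipani.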

Page 33: Data Versioning Systems

Commercial Data Versioning Systems

Network Appliance IBM EMC

Page 34: Data Versioning Systems

Network Appliance: WAFL

Network Appliance Customized for NFS and RAID

Automatic checkpointing

Utilize NVRAM: fast recovery

Good performance: update batching, least blocking upon versioning

Easy extraction: .snapshot directory

Page 35: Data Versioning Systems

WAFL: system layout

Page 36: Data Versioning Systems

WAFL: Limited Versioning

Page 37: Data Versioning Systems

Network Appliance: SnapMirror

Built upon WAFL Synchronous Mirroring Semi-synchronous Mirroring Asynchronous Mirroring

A 15-minute interval saves 50% of updates

SnapMirror: Get block information from blockmap Schedule mirroring at block-device level

Page 38: Data Versioning Systems

IBM (Flash Copy ESS)

A block-device mirroring system

Copy-old-while-update-new

Use ESS cache and fast write to mask write latency

Use a bitmap to keep track of each block of the old version and the new version
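
A minimal sketch of copy-old-while-update-new with a per-block bitmap, assuming the volume is a Python list of blocks. The class `CopyOnWriteSnapshot` is hypothetical and ignores caching, fast write, and background copying.

```python
class CopyOnWriteSnapshot:
    def __init__(self, volume):
        self.volume = volume                   # live volume, updated in place
        self.copied = [False] * len(volume)    # bitmap: old block saved yet?
        self.saved = {}                        # block number -> old contents

    def write(self, block_no, data):
        if not self.copied[block_no]:
            # First write since the snapshot: copy the old block aside.
            self.saved[block_no] = self.volume[block_no]
            self.copied[block_no] = True
        self.volume[block_no] = data           # then update in place

    def read_snapshot(self, block_no):
        # Unmodified blocks are still shared with the live volume.
        if self.copied[block_no]:
            return self.saved[block_no]
        return self.volume[block_no]


volume = [b"a", b"b"]
snap = CopyOnWriteSnapshot(volume)
snap.write(0, b"x")
snap.write(0, b"y")
```

Only the first write to a block pays the extra copy; the second write to block 0 above finds the bit already set and goes straight to the live volume.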

Page 39: Data Versioning Systems

EMC (TimeFinder)

Split mirror Implementation

Page 40: Data Versioning Systems

Proposal:

Non-point-in-time versioning What is the most valuable state?

Operation-based journaling Natural metadata journaling efficiency

Design Transparent mirroring and versioning Primary site non-journaling, mirror site journaling against intrusion, mistake Applied to network file server

Page 41: Data Versioning Systems

Repairable File Service: architecture

Page 42: Data Versioning Systems

Represent: operation-based

Delta: NFS packets Journal: Reverse delta chain

No checkpointing overhead A chain of 2 months will cost <$100

Efficient metadata journaling: 100-200 bytes for an inode or directory update; one hash table entry for an indirect block update
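
The sketch below illustrates a reverse delta chain kept as an operation journal. It is a simplification: a `write(path, data)` call stands in for an NFS request, each journal entry stores the contents it replaced, and timestamps are assumed to be non-decreasing. `JournaledStore` and its methods are hypothetical names.

```python
class JournaledStore:
    def __init__(self):
        self.files = {}      # current state: path -> contents
        self.journal = []    # undo records: (time, path, old contents or None)

    def write(self, time, path, data):
        # Record what is being replaced before applying the operation.
        self.journal.append((time, path, self.files.get(path)))
        self.files[path] = data

    def state_at(self, time):
        """Undo every operation later than time on a copy of the state."""
        files = dict(self.files)
        for t, path, old in reversed(self.journal):
            if t <= time:
                break
            if old is None:
                files.pop(path, None)    # the file did not exist before
            else:
                files[path] = old
        return files


store = JournaledStore()
store.write(1, "/etc/passwd", "root:x")
store.write(2, "/tmp/note", "hello")
store.write(3, "/etc/passwd", "root:x\nevil:x")
```

Because only the current state and the undo records are kept, no periodic checkpoint is needed, but extracting an old state costs time proportional to how far back it lies.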

Page 43: Data Versioning Systems

Save: a hybrid approach

Data block update: copy-old-create-new

Metadata update:

Naïve: read old, write old, update new

Variation of naïve: guess old, write old, update new

Variation of naïve: get old, write old, update new

Page 44: Data Versioning Systems

User Level Journaling File System

Page 45: Data Versioning Systems

System Layout

Page 46: Data Versioning Systems

Extract: intelligent and fast repair

Dependency logging Dependency analysis Fast Repair

Fast extract of most valuable state of a data system

Drawback: Poor performance for other extract specification
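
Dependency analysis can be pictured as taint propagation over a time-ordered log of reads and writes. The sketch below assumes a log of `(process, operation, file)` tuples and a known set of compromised processes; the log format and the function `tainted` are hypothetical, and the rule is deliberately conservative (a file stays suspect once written by a suspect process).

```python
def tainted(log, bad_processes):
    """Return (suspect processes, suspect files) after replaying the log.

    log: list of (process, op, path) in time order, op is "read" or "write".
    """
    bad_procs = set(bad_processes)
    bad_files = set()
    for proc, op, path in log:
        if op == "write" and proc in bad_procs:
            bad_files.add(path)      # output of a suspect process
        elif op == "read" and path in bad_files:
            bad_procs.add(proc)      # the reader now depends on suspect data
    return bad_procs, bad_files


log = [
    ("mallory", "write", "a"),
    ("bob", "read", "a"),
    ("bob", "write", "b"),
    ("carol", "read", "c"),
    ("carol", "write", "c"),
]
procs, files = tainted(log, {"mallory"})
```

In this log only files `a` and `b` would need to be rolled back; `c` is untouched by the dependency chain and can keep its latest contents.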

Page 47: Data Versioning Systems

Conclusion (1)

Hardware technology -> DV possible: capacity, random-access storage, CPU time

Penalty of data loss -> DV a necessity: data loss, system down time

DV technology: journaling, B+ tree, differencing algorithm

Page 48: Data Versioning Systems

Conclusion (2)

DV at application level

DV at file system/database level

DV at storage system/block device level

A combined and flexible solution to satisfy all DV requirements at low cost.

Page 49: Data Versioning Systems

Future Trend (1)

Comprehensive versioning Perpetual versioning High performance versioning

Comparable to non-versioning system

Intrusion oriented versioning Testing new untrusted application Reduce system maintenance cost

Semantic extraction

Page 50: Data Versioning Systems

Future Trend (2)

In decentralized storage systems, integrate DV with, and separate it from: replication, redundancy, mirroring, encryption

Avoid having similar functionality implemented by multiple modules