Page 1: Fast and Failure-Consistent Updates of Application …storageconference.us/2016/Slides/JiaxinOu.pdfFast and Failure-Consistent Updates of Application Data in Non-Volatile Main Memory

Fast and Failure-Consistent Updates of Application Data in Non-Volatile Main Memory File System

Jiaxin Ou, Jiwu Shu

([email protected])

Storage Research Laboratory
Department of Computer Science and Technology
Tsinghua University

Outline

Background and Motivation

FCFS Design

Evaluation

Conclusion

Failure Consistency

Failure Consistency (Failure-Consistent Updates)

− Atomicity and durability
− The system is able to recover to a consistent state from unexpected system failures

Application Level Consistency

− Update multiple files atomically and selectively

Example:

Atomic_Group{
write(fd1, "data1");
write(fd2, "data2");
}

Either both writes persist successfully, or neither does

Existing approaches for supporting application level consistency on NVMM

Approach 1: the application implements its own consistent update protocol

− Stack (top to bottom): Application (e.g., SQLite, MySQL) with consistent update protocol (Journaling); NVMM-based FS (e.g., BPFS, PMFS); NVMM

Approach 2: a traditional transactional file system provides the protocol

− Stack (top to bottom): Application (e.g., SQLite, MySQL); Traditional Transactional FS (Valor) with consistent update protocol (Journaling); DRAM Page Cache; Block Layer; NVMM

Drawbacks annotated on the diagram

− Application-level protocol: complex and error-prone [OSDI 14]
− Traditional transactional FS: high double-copy and block layer overheads
− Journaling: high journaling overheads

Our Goal: Correct Application Level Consistency + High Performance

FCFS

− Stack (top to bottom): Application (e.g., SQLite, MySQL); FCFS with consistent update protocol (NVMM-optimized WAL); NVMM

Comparison of Different File Systems on NVMM Storage

− Application level consistency, low performance: Traditional Transactional File Systems (Valor [FAST 09])
− File system level consistency, low performance: Traditional File Systems (Ext2, Ext3, Ext4)
− File system level consistency, high performance: State-of-the-art NVMM-based File Systems (BPFS [SOSP 09], PMFS [EuroSys 14], NOVA [FAST 16])
− Application level consistency, high performance: FCFS

An Example of How to Use FCFS

The atomic group

Atomic_Group{
write(fd1, "data1");
write(fd2, "data2");
}

is written with the FCFS interface as

tx_id = tx_begin();
tx_add(tx_id, fd1);
tx_add(tx_id, fd2);
write(fd1, "data1");
write(fd2, "data2");
tx_commit(tx_id);

Interface: Description

tx_begin(TxInfo): creates a new transaction

tx_add(TxID, Fd): relates a file descriptor to a designated transaction

tx_commit(TxID): commits a transaction

tx_abort(TxID): cancels a transaction entirely
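
As a rough illustration of the semantics of these four calls, the following Python sketch models them with an in-memory stand-in. The class name ToyTxFS, the buffer-until-commit strategy, and the append-style write are assumptions made for this example only; they are not FCFS's implementation, and tx_begin takes no TxInfo argument here.

```python
class ToyTxFS:
    """Toy in-memory model of the tx_* interface (hypothetical, not FCFS itself)."""

    def __init__(self):
        self.files = {}    # fd -> bytes that count as persisted
        self.txs = {}      # tx_id -> list of buffered (fd, data) writes
        self.fd_tx = {}    # fd -> tx_id the descriptor is attached to
        self.next_id = 1

    def tx_begin(self):
        tx_id = self.next_id
        self.next_id += 1
        self.txs[tx_id] = []
        return tx_id

    def tx_add(self, tx_id, fd):
        if tx_id not in self.txs:
            raise KeyError("unknown transaction")
        self.fd_tx[fd] = tx_id

    def write(self, fd, data):
        tx_id = self.fd_tx.get(fd)
        if tx_id is None:
            # Descriptor is not part of a transaction: apply immediately.
            self.files[fd] = self.files.get(fd, b"") + data
        else:
            # Buffer until commit so the group becomes visible as a unit.
            self.txs[tx_id].append((fd, data))

    def _detach(self, tx_id):
        pending = self.txs.pop(tx_id)
        for fd in [f for f, t in self.fd_tx.items() if t == tx_id]:
            del self.fd_tx[fd]
        return pending

    def tx_commit(self, tx_id):
        for fd, data in self._detach(tx_id):
            self.files[fd] = self.files.get(fd, b"") + data

    def tx_abort(self, tx_id):
        self._detach(tx_id)  # buffered writes are simply dropped


fs = ToyTxFS()
tx = fs.tx_begin()
fs.tx_add(tx, "fd1")
fs.tx_add(tx, "fd2")
fs.write("fd1", b"data1")
fs.write("fd2", b"data2")
before_commit = dict(fs.files)   # nothing is persisted yet
fs.tx_commit(tx)
after_commit = dict(fs.files)    # both writes appear together
```

Calling tx_abort instead of tx_commit in this model leaves fs.files untouched, which is the "neither does" half of the atomic group guarantee.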

Opportunities and Challenges for Providing Fast Failure-Consistent Update in NVMM FS

Opportunities

− Direct access to NVMM allows fine-grained logging
− Asynchronous checkpointing can move the checkpointing latency off the critical path under low storage load

Challenges

− #1: How to guarantee that a log unit will not be shared by different transactions? (Correctness)
− #2: How to balance the tradeoff between copy cost and log tracking overhead? (Performance)
− #3: How to improve checkpointing performance under high storage load? (Performance)

Key Ideas of FCFS

Our Goal: to propose a novel NVMM-optimized file system (FCFS) that provides application-level consistency without relying on the OS page cache layer

Key Ideas of FCFS (NVMM-optimized WAL):

− Hybrid Fine-grained Logging to address Challenges #1 and #2
• Decouple the logging methods of metadata and data updates
• Use a fast Two-Level Volatile Index to track uncheckpointed log data

− Concurrently Selective Checkpointing to address Challenge #3
• Committed updates to different blocks are checkpointed concurrently
• Committed updates of the same block are checkpointed using the Selective Checkpointing Algorithm

1. Hybrid Fine-grained Logging

Challenge #1: Correctness

Logging granularity (byte vs cacheline)

− A log unit should not be shared by different transactions

Metadata

• Smallest unshared unit is a metadata structure
• A metadata structure can be of any size (e.g., directory entry)
• Candidate granularities: byte granularity, cacheline granularity

Data

• Smallest unshared unit is a file
• Files are allocated in units of blocks
• Candidate granularities: byte granularity, cacheline granularity

1. Hybrid Fine-grained Logging

Challenge #2: Performance

Tradeoff: log tracking cost vs data copy cost

− Impacted by logging granularity (byte vs cacheline) and logging mode (undo vs redo)

Metadata (update size is small)

• Byte granularity redo logging has high log tracking cost
→ Byte granularity undo logging

Data (update size can be very large)

• Undo logging has high data copy cost for large updates
• Byte granularity redo logging has high log tracking cost
→ Cacheline granularity redo logging
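
To make the two choices concrete, here is a minimal Python sketch of what each kind of log unit could look like. The helper names (redo_units, undo_record, apply_undo), the 64-byte cacheline constant, and the sample metadata bytes are assumptions for illustration; the slides do not show FCFS's log record layout.

```python
CACHELINE = 64  # bytes; assumed cacheline size


def redo_units(offset, length):
    """Cacheline-aligned (start, size) log units covering a data write (length > 0)."""
    first = offset // CACHELINE
    last = (offset + length - 1) // CACHELINE
    return [(i * CACHELINE, CACHELINE) for i in range(first, last + 1)]


def undo_record(region, offset, length):
    """Byte-granularity undo record: the exact old bytes about to be overwritten."""
    return (offset, bytes(region[offset:offset + length]))


def apply_undo(region, record):
    """Roll a metadata region back using one undo record."""
    offset, old = record
    region[offset:offset + len(old)] = old


# Metadata: a small in-place update, undo-logged at byte granularity.
meta = bytearray(b"name=old;size=10")
rec = undo_record(meta, 5, 3)     # saves b"old" before it is overwritten
meta[5:8] = b"new"
apply_undo(meta, rec)             # abort path restores the old bytes

# Data: a 100-byte write at offset 100 is redo-logged as whole cachelines.
units = redo_units(100, 100)
```

The undo record copies only the 3 old bytes, which is cheap because metadata updates are small. The data write is logged as three whole cachelines, so the log is tracked per cacheline rather than per byte.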

1. Hybrid Fine-grained Logging

Another Challenge: How to reduce the log tracking cost of the data log (cacheline granularity redo logging)?

− Example: each 64 B cacheline log unit may need at least 16 bytes of index

Solution: Two-Level Volatile Index

Mapping: (logical block, cacheline id) → (physical block)

Different versions’ log blocks form a pending list

• First level: logical block → pending list head (radix tree)
• Second level: traverse the pending list to get the physical block which contains the latest data of a cacheline, using the cacheline bitmap

Overheads: Each 4 KB log block requires at most 16 bytes of index data (first level) and 8 bytes of bitmap (second level)
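
A minimal Python sketch of such a two-level lookup follows, under stated assumptions: a dict stands in for the radix tree, a Python list (newest first) stands in for the pending list, and the class and method names are invented for the example. With 64 cachelines per 4 KB block, the per-block bitmap fits in 64 bits, matching the 8 bytes quoted above.

```python
CACHELINES_PER_BLOCK = 64  # 4 KB / 64 B, so one 64-bit bitmap per log block


class TwoLevelIndex:
    """Hypothetical model of a (logical block, cacheline id) -> physical block index."""

    def __init__(self):
        # First level: logical block -> pending list (dict stands in for a radix tree).
        self.heads = {}

    def add_log_block(self, logical, physical, cachelines):
        """Record a new log block holding the given cachelines of a logical block."""
        bitmap = 0
        for cl in cachelines:
            if not 0 <= cl < CACHELINES_PER_BLOCK:
                raise ValueError("cacheline id out of range")
            bitmap |= 1 << cl
        # The newest version goes to the head of the pending list.
        self.heads.setdefault(logical, []).insert(0, (physical, bitmap))

    def lookup(self, logical, cl):
        """Second level: walk the pending list, newest first, testing the bitmap."""
        for physical, bitmap in self.heads.get(logical, []):
            if (bitmap >> cl) & 1:
                return physical
        return None  # not in the log: the original block holds the latest data


idx = TwoLevelIndex()
idx.add_log_block(7, 100, [0, 1])   # older version of logical block 7
idx.add_log_block(7, 101, [1])      # newer version rewrites cacheline 1 only
newest_cl1 = idx.lookup(7, 1)       # found in the newer log block
newest_cl0 = idx.lookup(7, 0)       # only the older log block has it
missing = idx.lookup(7, 5)          # never logged
```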

2. Concurrently Selective Checkpointing

Challenge #3: How to improve checkpointing performance under high storage load?

Concurrent Checkpointing

− Committed updates to different blocks are checkpointed concurrently to enhance the concurrency of checkpointing

Selective Checkpointing

− Committed updates of the same block are checkpointed using the Selective Checkpointing Algorithm to reduce the checkpointing copy overhead
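
The concurrent half of this idea can be sketched in Python as follows. This is only a model under assumptions: the function name, the (block, cacheline, data) tuple layout, and the use of a thread pool are invented for illustration, and updates to the same block are applied here in plain commit order rather than with the Selective Checkpointing Algorithm.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def checkpoint_concurrently(committed, storage, workers=4):
    """Apply committed updates, one worker task per block.

    committed: list of (block_no, cacheline_id, data) in commit order.
    storage:   dict block_no -> dict cacheline_id -> data (the permanent blocks).
    """
    per_block = defaultdict(list)
    for block_no, cl, data in committed:
        per_block[block_no].append((cl, data))

    # Create the target blocks up front so workers never resize the outer dict.
    for block_no in per_block:
        storage.setdefault(block_no, {})

    def checkpoint_block(block_no):
        target = storage[block_no]        # each task touches only its own block
        for cl, data in per_block[block_no]:
            target[cl] = data             # later commits overwrite earlier ones

    with ThreadPoolExecutor(max_workers=workers) as pool:
        list(pool.map(checkpoint_block, per_block))
    return storage


committed = [(1, 0, b"a"), (2, 0, b"b"), (1, 0, b"c"), (1, 3, b"x")]
result = checkpoint_concurrently(committed, {})
```

Because each task owns exactly one block, no locking between tasks is needed in this model; ordering only matters among updates to the same block.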

2. Concurrently Selective Checkpointing

Another Challenge: How to ensure correct failure recovery despite out-of-order checkpointing?

− What if a newer log entry is deallocated before an older log entry and the system crashes before deallocating the older one?
− How to guarantee that the commit log entry is deallocated last?

Solution: Maintaining two ordering properties during log deallocation

− Redo log entries are deallocated following the pending list order
− A global committed list ensures the deallocation order between the commit log entry and the other metadata/data log entries of a transaction
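
The first question above can be illustrated with a small Python model. It assumes that recovery replays every surviving committed redo entry in log order (oldest first); that recovery rule, the function name, and the sample values are assumptions for the example, not details taken from the slides.

```python
def recover(home, surviving_log):
    """Replay surviving redo entries, oldest first (assumed recovery rule)."""
    state = dict(home)
    for key, value in surviving_log:
        state[key] = value
    return state


# Cacheline "c0" was updated twice; the latest version is already checkpointed.
home = {"c0": "v2"}
log = [("c0", "v1"), ("c0", "v2")]   # oldest entry first

# Newer entry deallocated first, then a crash: only the stale entry survives.
bad = recover(home, log[:1])

# Older entry deallocated first, then a crash: only the newest entry survives.
good = recover(home, log[1:])
```

In this model the first ordering rolls the cacheline back to stale data on recovery, while the second leaves it at the latest version, which is why the deallocation order matters.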

2. Concurrently Selective Checkpointing

Selective Checkpointing Algorithm

− Leveraging NVMM’s byte-addressability to reduce the checkpointing copy overhead

[Figure: versions D3, D2, D1 and D0 of block D, with the labels "Log Block" and "Original Block"]

Note: D0~D3 refer to different versions of block D; Cij is the jth cacheline in the ith version of block D

Step 1: A new permanent data block, the one holding the largest number of latest cachelines, is selected

Step 2: Copy the latest cacheline data from the other blocks to the newly selected permanent block

• Copy C22, C13, C05 from D2, D1, D0 to D3

Step 3: Atomically modify the reference to the original block so that it refers to the newly selected permanent block

2. Concurrently Selective Checkpointing

Overhead Comparison

Traditional Constant Checkpointing

• Copy 3 blocks = 3 * 6 * 64 B = 1152 B

Selective Checkpointing

• Copy 3 cachelines and modify one block pointer = 3 * 64 B + 8 B = 200 B

The Selective Checkpointing Algorithm significantly reduces the checkpointing copy overhead (200 B instead of 1152 B in this example)
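
The three steps and the byte counts above can be reproduced with a short Python sketch. Several details are assumptions: the function name, the dict-based block model, the tie-break toward the newer version, and in particular which cachelines D3 wrote (0, 1 and 4 here), since the figure did not survive in this transcript. The copied cachelines C22, C13 and C05 and the 6-cacheline block size come from the slides.

```python
CACHELINE = 64   # bytes per cacheline
POINTER = 8      # bytes for the block pointer that is switched atomically


def selective_checkpoint(versions):
    """versions: dict version_no -> dict cacheline_id -> data; higher is newer.

    Returns (target version, list of (source version, cacheline) copies, bytes written).
    """
    # Find which version holds the latest copy of each cacheline.
    latest = {}
    for v in sorted(versions):
        for cl in versions[v]:
            latest[cl] = v

    # Step 1: pick the version holding the most latest cachelines.
    counts = {}
    for v in latest.values():
        counts[v] = counts.get(v, 0) + 1
    target = max(counts, key=lambda v: (counts[v], v))

    # Step 2: copy the remaining latest cachelines into the selected block.
    copies = []
    for cl, v in sorted(latest.items()):
        if v != target:
            versions[target][cl] = versions[v][cl]
            copies.append((v, cl))

    # Step 3 (modelled as a return value): the caller repoints the reference.
    return target, copies, len(copies) * CACHELINE + POINTER


versions = {
    0: {j: "C0%d" % j for j in range(6)},   # D0: original block, 6 cachelines
    1: {3: "C13"},                          # D1: log block
    2: {2: "C22"},                          # D2: log block
    3: {0: "C30", 1: "C31", 4: "C34"},      # D3: log block (contents assumed)
}
target, copies, selective_bytes = selective_checkpoint(versions)
traditional_bytes = 3 * 6 * CACHELINE       # copy 3 blocks of 6 cachelines each
```

With this input the sketch selects D3, copies cachelines 2, 3 and 5 from D2, D1 and D0, and writes 200 bytes, against 1152 bytes for copying the three blocks whole.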

Evaluations of Failure-Consistent Updates

• NC is a no-consistency system
• FG-WAL implements the failure-consistent update protocol using fine-grained write-ahead logging
• SCSP implements the failure-consistent update protocol using short-circuit shadow paging [SOSP 09]
• Valor is a traditional transactional file system

The latency of the FCFS-based version is the lowest among all failure-consistent versions (FG-WAL, SCSP, Valor)

Evaluations of Real Applications

• NC turns off the transactional part of each application

[Figure panels: Throughput Performance; NVMM Write Size]

FCFS-based applications outperform the original ones by up to 93% (MySQL running the YCSB workload)

Conclusion

Existing NVMM file systems do not guarantee the consistency of application data, while applications’ own consistency protocols are complex and error-prone

FCFS is the first NVMM-optimized file system that gives applications both correctness and high performance when consistently updating their data on NVMM storage

FCFS employs an NVMM-optimized WAL scheme that reduces the overhead of supporting failure consistency by fully leveraging NVMM’s byte addressability and high concurrency, without relying on the page-cache layer

FCFS’s failure-consistent update protocol and FCFS-based applications significantly outperform conventional protocols and the original applications, respectively

Thank You!

Jiaxin Ou, Jiwu Shu ([email protected])