NANDFS: A Flexible Flash File System for RAM-Constrained Systems
Aviad Zuck, Ohad Barzilay and Sivan Toledo



Overview

Introduction + motivation
Flash properties
Big Ideas
Going into details
Software engineering, tests and experiments
General flash issues


Flash is Everywhere

99% of embedded systems contain some form of flash

Resilient to vibrations and extreme conditions
Up to 100 times faster (random access) than rotating disks


What’s missing?


Sequential access And

“Today, consumer-grade SSD costs from $2 to $3.45 per gigabyte, hard drives about $0.38 per gigabyte…”

Computerworld.com, 27.8.2008*

*http://www.computerworld.com/s/article/print/9112065/Solid_state_disk_lackluster_for_laptops_PCs


NOR Flash        NAND Flash
Looser           Constrained
Mostly reads     Storage
Few MB           Many MB/GB


Two Ways of Flash Management

NTFS, FAT, ext3, …

JFFS, YAFFS, NANDFS, …


So Why NANDFS?


NANDFS Also Has:

File locking
Transactions
Competitive performance and graceful degradation


How is it Done, in a Nutshell?
Explanation does not fit in a nutshell
Complex data structures
New garbage collection mechanism
And much more…

Let’s elaborate


Flash Properties


Flash memory is divided into pages – 0.5KB, 2KB, 4KB
A page consists of data and metadata areas – 16B of metadata for every 512B of data
Pages are arranged in units – 32/64/128 pages per unit
Metadata contains a unit validity indicator, ECC and file system metadata


Erasures & Programming

Page bits are initialized to 1’s
Writing clears bits (1 to 0)
Bits are set by erasing an entire unit (“erase unit”)
An erase unit has limited endurance
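The program/erase rules above can be modelled in a few lines of C. This is only an illustrative sketch: the sizes and the names `erase_unit_t`, `unit_erase` and `page_program` are assumptions, not part of NANDFS or of any flash driver.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define PAGE_SIZE 512        /* assumed data-area size in bytes */
#define PAGES_PER_UNIT 32    /* assumed pages per erase unit */

typedef struct {
    uint8_t pages[PAGES_PER_UNIT][PAGE_SIZE];
    unsigned erase_count;    /* wear indicator: endurance is limited */
} erase_unit_t;

/* Erasing is the only way to set bits: the whole unit goes back to 1's. */
void unit_erase(erase_unit_t *u) {
    memset(u->pages, 0xFF, sizeof u->pages);
    u->erase_count++;
}

/* Programming can only clear bits, so the stored value is (old AND new). */
void page_program(erase_unit_t *u, unsigned page, const uint8_t *data) {
    assert(page < PAGES_PER_UNIT);
    for (unsigned i = 0; i < PAGE_SIZE; i++)
        u->pages[page][i] &= data[i];
}
```

Programming the same page twice without an erase in between yields the bitwise AND of both writes, which is why in-place overwrite is not usable on flash.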


The Design of NANDFS - The “Big” Ideas


Log-structured design

Overwrite-in-place is not permitted in flash

Caching avoids rippling effect


Modular Flash File System

Traditional Block Device    NANDFS “Block Device”
READ                        READ
WRITE                       ALLOCATE-AND-WRITE
(TRIM)                      TRIM

Modularity is good. But…
We need a block device API designed for flash

We call our “block device” the sequencing layer

We need to allocate a new block and get a handle to it
We need a way to mark blocks that are not used any more

High-level Design

A 2-layer structure:
File System Layer - transactional file system with Unix-like file structure
Sequencing Layer – manages the allocation of immutable page-sized chunks of data. Assists in crash recovery and atomicity


The Sequencing Layer


Divides flash into fixed-size physical units called slots
Slots are assigned to segments - logical units of the same size
Each segment maps to one matching physical slot, except one “active segment”, which is mapped to two slots.


Block access
Segment ~> slot mapping table in RAM
Block is referenced by a logical handle <segment_id, offset_in_segment>
Address translation
Example: Logical address <0,2> ~> Physical address 8
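A minimal sketch of this translation in C follows. The figure that accompanied the slide is not available, so the table contents and the slot size are assumptions chosen so that the slide's example, <0,2> ~> 8, comes out; `translate`, `seg_to_slot` and the constants are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

#define NUM_SEGMENTS 4      /* assumed */
#define PAGES_PER_SLOT 3    /* assumed, so that <0,2> lands on physical page 8 */

typedef struct {
    uint16_t segment_id;
    uint16_t offset;        /* offset_in_segment, counted in pages */
} logical_addr_t;

/* Segment -> slot table kept in RAM: one entry per segment, not per page. */
const uint16_t seg_to_slot[NUM_SEGMENTS] = { 2, 0, 3, 1 };

/* Physical page = first page of the mapped slot + offset inside the segment. */
uint32_t translate(logical_addr_t a) {
    assert(a.segment_id < NUM_SEGMENTS && a.offset < PAGES_PER_SLOT);
    return (uint32_t)seg_to_slot[a.segment_id] * PAGES_PER_SLOT + a.offset;
}
```

Because the table has one entry per segment, its RAM footprint grows with the number of slots rather than with the number of pages.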


Where’s the innovation?
Logical address mapping is not a new idea:
Logical Disk (1993), YAFFS, JFFS, and more
Many FTLs use some logical address mapping
Full mapping ~> expensive
Coarse-grained mapping
Fragmentation, performance degradation
Costly merges

* DFTL: A Flash Translation Layer Employing Demand-based Selective Caching of Page-level Address Mappings (2009)

The difference in NANDFS
NANDFS uses coarse-grained mapping, not full mapping
Less RAM for page mapping (more RAM flexibility)
Collect garbage while preserving validity of pointers to non-obsolete blocks

Appropriate for flash, not for magnetic disks


Block allocation
NANDFS is log-structured
New blocks are allocated sequentially from the active segment.
In a log-structured system blocks are never re-written
File pointer structures need to be updated to reflect the new location of the data.


Garbage collection

TRIM - pages with obsolete data are marked with a special “obsolete flag”

The sequencing layer maintains a counter of obsolete pages for every segment.

Problem - erase units (EUs) contain a mixture of valid and obsolete pages, so we can’t simply collect entire EUs

Solution: garbage collection is performed together with allocation


Reclamation unit = Segment
The sequencing layer chooses a segment to reclaim, and allocates it another (fresh) second slot.
Reclaim obsolete pages while copying non-obsolete pages

NOTICE – Logical addresses are preserved, although the physical translation changes
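One way to read the reclamation step is sketched below, under the assumption that each live page is copied to the same offset in the fresh slot, so that a handle <segment_id, offset> stays valid once the segment is remapped. The in-RAM `slot_t` model and `reclaim_segment` are illustrative names only, not the NANDFS code.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define PAGES_PER_SLOT 4    /* assumed, kept tiny for illustration */
#define PAGE_SIZE 8         /* assumed */

typedef struct {
    unsigned char data[PAGES_PER_SLOT][PAGE_SIZE];
    bool written[PAGES_PER_SLOT];
    bool obsolete[PAGES_PER_SLOT];
} slot_t;

/* Copy live pages from old_slot into fresh at identical offsets.
 * Offsets that held obsolete (or unwritten) pages stay free in the fresh
 * slot and can serve new allocations. Returns the number of free offsets. */
int reclaim_segment(const slot_t *old_slot, slot_t *fresh) {
    int freed = 0;
    assert(old_slot != fresh);
    memset(fresh, 0, sizeof *fresh);    /* stands in for a freshly erased slot */
    for (int off = 0; off < PAGES_PER_SLOT; off++) {
        if (old_slot->written[off] && !old_slot->obsolete[off]) {
            memcpy(fresh->data[off], old_slot->data[off], PAGE_SIZE);
            fresh->written[off] = true;
        } else {
            freed++;
        }
    }
    return freed;
}
```

After the copy, only the segment's entry in the segment ~> slot table has to change; pointers stored in file system structures are untouched.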


Finally, when the new slot is full, the old slot is erased.
It can now be used to reclaim another segment
We choose the segment with the highest obsolete counter as the new “active segment”.

This would not work well on rotating disks – too many seek operations
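The selection rule can be sketched as a scan over the per-segment obsolete counters. The function name and the tie-breaking rule (lowest index wins) are assumptions made for this example.

```c
#include <assert.h>

/* Return the index of the segment with the most obsolete pages.
 * Ties are broken in favour of the lowest segment number (an assumption). */
int pick_segment_to_reclaim(const unsigned *obsolete_count, int num_segments) {
    int best = 0;
    assert(num_segments > 0);
    for (int s = 1; s < num_segments; s++) {
        if (obsolete_count[s] > obsolete_count[best])
            best = s;
    }
    return best;
}
```

Picking the segment with the most obsolete pages minimizes the number of live pages that have to be copied before the old slot can be erased.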


Sequencing Layer Recovery

When a new slot is allocated to a segment, a segment header is written in the slot’s first page

Header contains:
Incremented segment sequencing number
Segment number
Segment type
Checkpoint (further details later)


On mounting, the header of every slot is read
The segment-to-slot map can be reconstructed using only the data from the headers

Other systems (with complete mapping) need to scan the entire flash
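A simplified sketch of the mount-time reconstruction: read one header per slot and, for each segment, keep the slot with the highest sequence number. The header layout and names are assumptions, and the sketch deliberately ignores the segment under reclamation, which legitimately owns two slots.

```c
#include <assert.h>
#include <stdint.h>

#define NUM_SLOTS 4       /* assumed */
#define NUM_SEGMENTS 3    /* assumed */
#define NO_SLOT (-1)

typedef struct {
    uint32_t sequence;    /* incremented every time a slot is allocated */
    int segment;          /* segment number, or -1 if the slot holds no header */
} slot_header_t;

/* One header read per slot is enough to rebuild the RAM mapping table. */
void rebuild_map(const slot_header_t hdr[NUM_SLOTS], int seg_to_slot[NUM_SEGMENTS]) {
    uint32_t best_seq[NUM_SEGMENTS] = { 0 };
    for (int s = 0; s < NUM_SEGMENTS; s++)
        seg_to_slot[s] = NO_SLOT;
    for (int slot = 0; slot < NUM_SLOTS; slot++) {
        int seg = hdr[slot].segment;
        if (seg < 0)
            continue;                       /* erased or reserve slot */
        assert(seg < NUM_SEGMENTS);
        if (seg_to_slot[seg] == NO_SLOT || hdr[slot].sequence > best_seq[seg]) {
            seg_to_slot[seg] = slot;        /* newest header wins */
            best_seq[seg] = hdr[slot].sequence;
        }
    }
}
```

The cost of mounting is therefore proportional to the number of slots, not to the number of pages.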


Bad EU Management

Each flash memory chip contains some bad EUs
Some slots contain more valid EUs than others
Solution – some slots are set aside as a bank of reserve EUs


Brief Summary


The Design of NANDFS - More Ideas


Wear Leveling
Writes and erases should be spread evenly over all EUs

Problem: some slots may be reclaimed rarely
Solution: perform a periodic random wear-leveling process
Choose a random slot and copy it to a fresh slot
Incurs only a low overhead
Guarantees near-optimal expected endurance (Ben-Aroya and Toledo, 2006)

Technique widely used (YAFFS, JFFS)
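A hedged sketch of the random wear-leveling decision follows. The slides do not say how the process is triggered, so the "on average once every `WEAR_LEVEL_PERIOD` events" rule, the constants and the function names are all assumptions. The random values are passed in as parameters to keep the decision function deterministic and testable.

```c
#include <assert.h>
#include <stdlib.h>

#define NUM_SLOTS 512             /* assumed */
#define WEAR_LEVEL_PERIOD 1000    /* assumed: about one relocation per 1000 events */

/* Decide whether to run wear leveling now and, if so, on which slot.
 * r1 and r2 are random numbers supplied by the caller.
 * Returns the slot to copy to a fresh slot, or -1 for "do nothing". */
int wear_level_pick(unsigned r1, unsigned r2) {
    if (r1 % WEAR_LEVEL_PERIOD != 0)
        return -1;
    return (int)(r2 % NUM_SLOTS);
}

/* Convenience wrapper using the C library generator. */
int wear_level_step(void) {
    int slot = wear_level_pick((unsigned)rand(), (unsigned)rand());
    assert(slot < NUM_SLOTS);
    return slot;
}
```

Because the victim is chosen at random, a slot holding static data is eventually moved as well, which frees its lightly worn erase units for reuse.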


Transactions

File system operations are atomic and transactional

Marking pages as obsolete is not straightforward
Simple transaction – block re-write

After rewriting, old data block should be marked obsolete

If we mark it, and the transaction aborts before completing, old data should remain valid

If already marked as obsolete – cannot undo


Solution: Perform the valid-to-obsolete transition (VOT) AFTER the transaction commits.

Write VOT records to flash in dedicated pages
On commit, use VOT records to mark pages as obsolete
Maintain on flash a linked list of all pages written in a specific transaction
Keep in RAM a pointer to the last page written in a transaction
On abort, mark all pages written by the transaction as obsolete
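The commit and abort paths can be sketched with a small in-RAM model. All names here are hypothetical; in particular the `vots` array stands in for VOT records that the real system writes to dedicated flash pages.

```c
#include <assert.h>
#include <stdbool.h>

#define NUM_PAGES 16    /* assumed */
#define MAX_VOTS 8      /* assumed */
#define NONE (-1)

typedef struct {
    bool obsolete;
    int prev_in_txn;    /* linked list: previous page written by the same transaction */
} page_meta_t;

typedef struct {
    int last_page;          /* RAM pointer to the last page the transaction wrote */
    int vots[MAX_VOTS];     /* pages to obsolete once the transaction commits */
    int num_vots;
} txn_t;

page_meta_t flash[NUM_PAGES];

void txn_begin(txn_t *t) { t->last_page = NONE; t->num_vots = 0; }

/* Rewrite a block: the new copy is linked into the transaction's page list,
 * the old copy is only recorded in a VOT, not yet marked obsolete. */
void txn_rewrite(txn_t *t, int old_page, int new_page) {
    assert(t->num_vots < MAX_VOTS);
    flash[new_page].obsolete = false;
    flash[new_page].prev_in_txn = t->last_page;
    t->last_page = new_page;
    t->vots[t->num_vots++] = old_page;
}

/* Commit: only now do the old pages become obsolete. */
void txn_commit(txn_t *t) {
    for (int i = 0; i < t->num_vots; i++)
        flash[t->vots[i]].obsolete = true;
    txn_begin(t);
}

/* Abort: walk the list backwards and obsolete everything the transaction wrote. */
void txn_abort(txn_t *t) {
    for (int p = t->last_page; p != NONE; p = flash[p].prev_in_txn)
        flash[p].obsolete = true;
    txn_begin(t);
}
```

Either way exactly one version of each block ends up obsolete: the new one on abort, the old one on commit.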


Checkpoints
Snapshot of system state
Ensures returning to a stable state following a crash
Checkpoint is written:
As part of a segment header.
Whenever a transaction commits.
Structure:
Obsolete counters array
Pointer to last-written block address of committed transaction
Pointers to the last-written blocks of all on-going transactions
Pointer to root inode


Simple Example


Finding the Last Checkpoint
At any given time there is only one valid checkpoint in flash
On mounting:
Locate the last allocated slot (using its sequence #)
Perform a binary search to see if another, later checkpoint exists in the slot
Abort all other transactions
Truncate all pages written after the checkpoint
Finish the transaction that was committed


File System Layer


Files represented by inode trees
File metadata
Direct pointers to data pages
Indirect pointers etc.
All pointers are logical pointers
Regular files not permitted to be sparse


Root file and directory inodes may be sparse.
Hole indicated by special flag


The Root File
Array of inodes


When a file is deleted a page-size hole is created

When creating a file a hole can easily be located

If no hole exists, allocate a new inode by extending the root file
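The inode allocation policy described above can be sketched over a toy model of the root file, where each array entry stands for one page-sized inode slot. `root_file_t`, `inode_alloc` and `inode_free` are illustrative names, and the fixed capacity is an assumption.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_INODES 8    /* assumed capacity of the toy root file */

typedef struct {
    bool is_hole[MAX_INODES];   /* true where a deleted file left a hole */
    int length;                 /* number of inode slots currently in the root file */
} root_file_t;

/* Deleting a file leaves a page-size hole at its inode slot. */
void inode_free(root_file_t *r, int ino) {
    assert(ino >= 0 && ino < r->length);
    r->is_hole[ino] = true;
}

/* Reuse the first hole if there is one; otherwise extend the root file.
 * Returns the inode number, or -1 if the toy root file is full. */
int inode_alloc(root_file_t *r) {
    for (int i = 0; i < r->length; i++) {
        if (r->is_hole[i]) {
            r->is_hole[i] = false;
            return i;
        }
    }
    if (r->length == MAX_INODES)
        return -1;
    r->is_hole[r->length] = false;
    return r->length++;
}
```

With this scheme an inode number is simply the offset of the inode in the root file, so no separate inode table is needed.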


Directory Structure
Directory = array of directory entries
inode number
Length
UTF-8 file name.
Direntry length <= 256 bytes.
Direntries packed into chunks without gaps


chunk size < (page - direntry size) ~> directory contains “hole”

Allocating a new direntry requires finding a hole
Direntry lookup is sequential
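A sketch of packed directory entries and the sequential lookup follows. The exact on-flash layout is not given in the slides; a 4-byte inode number, a 1-byte name length and an unterminated UTF-8 name are assumed here, and the function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Assumed layout: 4-byte inode number, 1-byte name length, then the name. */
#define DIRENT_HEADER 5

/* Append one entry right after the previous one (no gaps).
 * Returns the new number of used bytes in the chunk. */
size_t dirent_append(uint8_t *chunk, size_t used, uint32_t ino, const char *name) {
    size_t n = strlen(name);
    assert(DIRENT_HEADER + n <= 256);       /* direntry length <= 256 bytes */
    memcpy(chunk + used, &ino, 4);
    chunk[used + 4] = (uint8_t)n;
    memcpy(chunk + used + DIRENT_HEADER, name, n);
    return used + DIRENT_HEADER + n;
}

/* Sequential lookup: walk the packed entries until the name matches.
 * Returns the inode number, or -1 if the name is not present. */
int64_t dirent_lookup(const uint8_t *chunk, size_t used, const char *name) {
    size_t n = strlen(name), pos = 0;
    while (pos < used) {
        uint8_t len = chunk[pos + 4];
        if (len == n && memcmp(chunk + pos + DIRENT_HEADER, name, n) == 0) {
            uint32_t ino;
            memcpy(&ino, chunk + pos, 4);
            return ino;
        }
        pos += DIRENT_HEADER + len;
    }
    return -1;
}
```

The lookup cost is linear in the directory size, which is the price of the compact layout.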


System Calls

Most system calls (creat, unlink, mkdir…) are atomic transactions

A transaction that handles a write() commits only on close()
System calls that modify a single file can be bundled into a single transaction
5 consecutive calls to write() + close() on a single file are treated as a single transaction
Overhead of transaction commit ~ 1 (actual physical page writes / minimum possible page writes)


Running Out of Space
A log-structured file system writes even when the user deletes files
When flash is full, the system may have too few free pages to delete a file
Solution – maintain the number of free+obsolete pages.
If the next write lowers this number below a threshold - abort transactions until we have enough free pages

Threshold is $c + (b/l)$, where
c = # of blocks written on direntry delete
b = max file pages
l = re-do records per page
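The space check can be sketched as follows, assuming the threshold reads as c + b/l (the formula on the slide is damaged, so this reading is an assumption) and rounding b/l up to whole pages. The function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Pages that must stay available so that any file can still be deleted:
 * c blocks for the direntry delete, plus enough pages to hold one
 * record for each of the b pages of the largest file, l records per page. */
unsigned delete_threshold(unsigned c, unsigned b, unsigned l) {
    assert(l > 0);
    return c + (b + l - 1) / l;     /* b/l rounded up (assumption) */
}

/* A write may proceed only if free+obsolete pages stay at or above the threshold. */
bool write_allowed(unsigned free_plus_obsolete, unsigned pages_to_write,
                   unsigned threshold) {
    return free_plus_obsolete >= pages_to_write &&
           free_plus_obsolete - pages_to_write >= threshold;
}
```

For example, with c = 4, b = 100 and l = 32 the reserve is 4 + 4 = 8 pages.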


Software Engineering


Coding
Code written with intention to be “humanly readable”
(&(transactions[tid]))->f_type = 0x02
vs.
TRANSACTION_SET_FTYPE(tid, FTYPE_FILE)

Embedded development
External libraries not an option (math, string)
More macros, fewer functions (stack)
No debugging – need good simulator!
Compatibility with various gcc builds – cygwin, debian, arm-gcc
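The readable form could be backed by macros along these lines. Only `TRANSACTION_SET_FTYPE`, `FTYPE_FILE`, `transactions`, `f_type` and the value 0x02 appear on the slide; the struct layout, `MAX_TRANSACTIONS`, `TRANSACTION_GET_FTYPE` and `example_usage` are assumptions for illustration.

```c
#include <assert.h>
#include <stdint.h>

#define MAX_TRANSACTIONS 3      /* assumed table size */
#define FTYPE_FILE 0x02         /* value taken from the raw form on the slide */

typedef struct {
    uint8_t f_type;             /* assumed field type */
} transaction_t;

transaction_t transactions[MAX_TRANSACTIONS];

/* Macros instead of functions: no call frame, so no extra stack usage. */
#define TRANSACTION_SET_FTYPE(tid, ftype) ((&(transactions[(tid)]))->f_type = (ftype))
#define TRANSACTION_GET_FTYPE(tid)        (transactions[(tid)].f_type)

void example_usage(void) {
    TRANSACTION_SET_FTYPE(0, FTYPE_FILE);
    assert(TRANSACTION_GET_FTYPE(0) == FTYPE_FILE);
}
```

The macro expands to exactly the raw expression, so readability comes at no run-time cost.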


Incremental development

High-level and low-level design preceded development
3 weeks

Code written bottom-up
Flash driver –> sequencing layer –> file system layer
Caching layer added later. Challenging…
1 year (~commercial code)

Test-driven development
“By hand” (no libraries)


My own boss - lessons

Time frames Outsider notes

Feedback “pairing”


Experiments & Tests


Testing

Extensive test suite:
Integration and performance tests
Extensive crash tests
Large set of unit tests for every function
Integrated into eCos
Tests and integration verified on actual 32 MB flash


Experiments

Simulated 1GB flash
Configuration - 512 slots, 8 reserved for bad-block replacement
6 open files and 8 file descriptors
3 concurrent transactions


Workload


Slot Partitioning


Mounting

YAFFS mounting time - 2.7s
80% utilization
256MB simulation

Endurance

Repeatedly re-write a small file when the file system contains a static 205MB file.


(Some) Challenges in flash


Single vs. Multi level cell
Flash classified by number of bits stored in a single cell

SLC (1 bit)                   MLC (2-4 bits)
Smaller capacity              Cheaper
Errors from partial writes    Write-constrained
Faster                        More error-prone
                              Less endurance


Parallelism

*Picture from N Agrawal, V Prabhakaran, T Wobber (2008)


Simple example for utilizing parallelism

* J Seol, H Shim, J Kim, and S Maeng (2009)


Enterprise storage

* SW Lee, B Moon, C Park, JM Kim, SW Kim (2008)

Disk bandwidth (sequential) still 2-3 times higher than flash

Flash read/write latency is smaller than disk latency by more than an order of magnitude

This improves throughput of transaction processing – useful for database servers


The End

Thank you!