
Page 1: Designing a True Direct-Access File System with DevFSpages.cs.wisc.edu/~sudarsun/docs/devfs_kannan_fast18.pdf · BW: 2.6MB/s 250MB/s 1.3GB/s S/W cost: ... • Object stores have been

Designing a True Direct-Access File System with DevFS

Yuangang Wang, Jun Xu, Gopinath Palani

Huawei Technologies

Sudarsun Kannan, Andrea Arpaci-Dusseau, Remzi Arpaci-Dusseau

University of Wisconsin-Madison

Modern Fast Storage Hardware

• Faster nonvolatile memory technologies such as NVMe, 3D XPoint

             Hard Drives   PCIe-Flash   3D XPoint
H/W Lat:     7.1ms         68us         12us
BW:          2.6MB/s       250MB/s      1.3GB/s
S/W cost:    8us           8us          6us
OS cost:     5us           5us          4us

• Bottlenecks shift from hardware to software (file system)
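The shift can be checked with the latency figures on this slide. The following is a minimal sketch, assuming the software cost is paid on top of the hardware latency (the slide does not say whether the two overlap); the dictionary and function names are illustrative only.

```python
# Latency figures from the slide, in microseconds.
# Assumption: software cost is added on top of hardware latency.
DEVICES = {
    "Hard Drives": {"hw_lat_us": 7100.0, "sw_cost_us": 8.0},
    "PCIe-Flash": {"hw_lat_us": 68.0, "sw_cost_us": 8.0},
    "3D XPoint": {"hw_lat_us": 12.0, "sw_cost_us": 6.0},
}


def software_share(hw_lat_us: float, sw_cost_us: float) -> float:
    """Fraction of the total access time that is spent in software."""
    return sw_cost_us / (hw_lat_us + sw_cost_us)


def shares() -> dict:
    """Software share for every device class listed above."""
    return {name: software_share(**d) for name, d in DEVICES.items()}


if __name__ == "__main__":
    for name, share in shares().items():
        print(f"{name:12s} software share: {share:.1%}")
```

Under this assumption the software share grows from roughly 0.1% on hard drives to about 10% on PCIe flash and about a third on 3D XPoint.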

Why Use OS File System?

• Millions of applications use OS-level file system (FS)
  - Guarantees integrity, concurrency, crash-consistency, and security
• Object stores have been designed to reduce OS cost [HDFS, CEPH]
  - Developers unwilling to modify POSIX interface
  - Need faster file systems and not a new interface
• User-level POSIX-based file systems fail to satisfy fundamental properties

Device-level File System (DevFS)

• Move file system into the device hardware
• Use device-level CPU and memory for DevFS
• Apps. bypass OS for control and data plane
• DevFS handles integrity, concurrency, crash-consistency, and security
• Achieves true direct-access

[Figure: Application, FS kernel, and DevFS inside the NVMe device (metadata, data). Read/write data requests go to DevFS, which performs the steps otherwise done by the kernel FS: check security, update metadata, update data.]

Challenges of Hardware File System

• Limited memory inside the device
  - Reverse-cache inactive file system structures to host memory
• DevFS lacks visibility into OS state (e.g., process permission)
  - Make OS share required (process) information with “down-call”

Performance

• Emulate DevFS at the device-driver level
• Compare DevFS with state-of-the-art NOVA file system
• Benchmarks - more than 2X write and 1.8X read throughput
• Snappy compression application - up to 22% higher throughput
• Memory-optimized design reduces file system memory by 5X

Outline

Introduction
Background
Motivation
DevFS Design
Evaluation
Conclusion

Traditional S/W Storage Stack

[Figure: Application, FS kernel, and NVMe device (metadata, data). Read/write data requests pass through the kernel FS, which checks security, updates metadata, and updates data.]

• Kernel FS maintains security and manages integrity, crash-consistency, and concurrency
• User-to-kernel switch for every data plane operation
• High software-indirection latency before storage access

Holy grail of Storage Research

Challenge 1: How to bypass OS and provide direct-storage access?

Challenge 2: How to provide direct-access without compromising integrity, concurrency, crash-consistency, and security?

[Figure: Application with FS library, FS kernel, and SSD (metadata, data); read/write data path]

Classes of Direct-Access File Systems

• Prior approaches have attempted to provide user-level direct access
• We categorize them into four classes:
  - Hybrid user-level
  - Hybrid user-level with trusted server (Microkernel approach)
  - Hybrid device
  - Full device-level file system (proposed)

Hybrid User-level File System

• Split file system into user library and kernel FS components
• Library handles data plane (e.g., read, write) and manages metadata
• Kernel FS handles control plane (e.g., file creation)

Well-known hybrid approaches
  - Arrakis (OSDI ’14)
  - Strata (SOSP ’17)

[Figure: Application with FS lib (read/write data), FS kernel (sharing, protection; create file), NVMe]

Hybrid Device File System

• File system split across user-level library, kernel, and hardware
• Control and data-plane operations same as hybrid user-level FS
• However, some functionalities moved inside the hardware

Well-known hybrid approaches
  - Moneta-D (ASPLOS ’12)
  - TxDev (OSDI ’08)

[Figure: Application with FS lib (manage metadata; read/write data), FS kernel (sharing, protection; create file), FS H/W (perm. check, Tx), NVMe]


File System Properties

• Integrity
  - Correctness of FS metadata for single & concurrent access
• Crash-consistency
  - FS metadata consistent after a failure
• Security
  - No permission violation for both control and data-plane
  - OS-level file system checks permission for control and data plane

Hybrid User-level FS Integrity Problem

[Figure: Application with FS lib, FS kernel (create file), NVMe (metadata, data). Examples: Arrakis (OSDI ’14), Strata (SOSP ’17).]

• Kernel FS coordinates sharing and protection
• FS lib manages metadata and has direct access for the data plane
• FS lib is untrusted (buggy or malicious)
  - Can compromise metadata integrity and impact crash consistency
  - Data plane security compromised

Concurrent Access?

[Figure: App. 1 and App. 2, each with its own FS lib, both issue Append(F1, buff, 4k). Each library sets the free block bitmap, appends the data block, and updates the inode, skipping locking. Two inode versions are shown: inode {size = 4K, m_time = 1} and inode {size = 0, m_time = 2}.]

• Arrakis and Strata trap into OS for data plane and control plane: no direct access
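The hazard on this slide can be replayed deterministically. The sketch below is a toy model, not code from Arrakis or Strata: two untrusted libraries snapshot the shared metadata, skip locking, and then commit, so one append is lost. All class and function names are hypothetical.

```python
# Deterministic replay of two FS libraries appending to the same file
# without locking. All names are illustrative.
BLOCK = 4096


class SharedState:
    def __init__(self, nblocks: int = 8):
        self.bitmap = [0] * nblocks   # free-block bitmap (0 = free)
        self.inode_size = 0           # file size stored in the inode


class FsLib:
    """One application's library; each step is a separate method so the
    caller controls the interleaving."""

    def __init__(self, state: SharedState):
        self.state = state
        self.block = None
        self.seen_size = None

    def read_metadata(self) -> None:
        # Snapshot the inode size and pick the first free block.
        self.seen_size = self.state.inode_size
        self.block = self.state.bitmap.index(0)

    def commit_append(self) -> None:
        # Set the bitmap bit and update the inode from the stale snapshot.
        self.state.bitmap[self.block] = 1
        self.state.inode_size = self.seen_size + BLOCK


def racy_appends() -> SharedState:
    state = SharedState()
    a, b = FsLib(state), FsLib(state)
    a.read_metadata()
    b.read_metadata()      # b reads before a commits: same block, same size
    a.commit_append()
    b.commit_append()
    return state


def serialized_appends() -> SharedState:
    state = SharedState()
    for lib in (FsLib(state), FsLib(state)):
        lib.read_metadata()
        lib.commit_append()  # read and commit happen as one unit
    return state
```

In the racy replay both appends claim the same block and the inode ends at 4K instead of 8K; serializing the two steps, which is what a trusted component would enforce, gives the expected result.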

Approaches Summary

Properties compared for each file system: Integrity, Crash consistency, Security, Concurrency, POSIX support, Direct access
[Table: per-property marks not recoverable]

Class                  File System
Kernel-level FS        NOVA
Hybrid user-level FS   Arrakis, Strata
Microkernel            Aerie
Hybrid-device FS       Moneta-D, TxDev
FUSE                   Ext4-FUSE
Device FS              DevFS



DevFS Internals

• Modern storage devices contain multiple CPUs
• Support up to 64K I/O queues
• To exploit concurrency, each file has its own I/O queue and journal

[Figure: DevFS running on the controller CPU. Global structures: super block, bitmaps, inodes, and dentries, shown both as on-disk file metadata and as in-memory metadata. Per-file structures: submission queue (SQ), completion queue (CQ), per-file journal, and per-file blocks (journal, data). In-memory filemap tree: /root, /root/dir, /root/proc. The application's user FS lib calls Vaddr = CreateBuffer() for an OS-allocated command buffer.]

filemap {
    *dentry
    *inode
    *queues
    *mem_journal
    *disk_journal
}

DevFS I/O Operation

[Figure: three builds over the DevFS Internals diagram (user FS lib, OS-allocated command buffer, per-file submission and completion queues, per-file journal, per-file blocks, global structures, controller CPU). Labels added across the builds: Open(f1), Write(fd, buff, 4k, off=3), Cmd (twice), and Journal (up to three entries).]
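The per-file queue and journal design can be illustrated with a small model. This is a sketch under stated assumptions, not DevFS code: the class names, the tuple layout of a command, and the journal-before-data ordering are all hypothetical choices made for illustration.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class FileMap:
    """Per-file structures, loosely following the filemap fields on the slide."""
    path: str
    sq: deque = field(default_factory=deque)      # submission queue
    cq: deque = field(default_factory=deque)      # completion queue
    journal: list = field(default_factory=list)   # per-file journal
    blocks: dict = field(default_factory=dict)    # block offset -> data


class DevFSModel:
    def __init__(self):
        self.filemaps = {}   # stands in for the in-memory filemap tree
        self.next_fd = 0
        self.fds = {}

    def open(self, path: str) -> int:
        fm = self.filemaps.setdefault(path, FileMap(path))
        fd = self.next_fd
        self.next_fd += 1
        self.fds[fd] = fm
        return fd

    def submit_write(self, fd: int, data: bytes, off: int) -> None:
        # The user library only enqueues a command; no OS call is modeled.
        self.fds[fd].sq.append(("WRITE", off, data))

    def controller_poll(self) -> int:
        """Drain every per-file SQ; returns the number of commands handled."""
        handled = 0
        for fm in self.filemaps.values():
            while fm.sq:
                op, off, data = fm.sq.popleft()
                # Assumed ordering: journal the operation, then update data.
                fm.journal.append((op, off, len(data)))
                fm.blocks[off] = data
                fm.cq.append(("OK", op, off))
                handled += 1
        return handled
```

Because every file owns its queues and journal, commands for different files never share a queue in this model, which is the property the slides rely on for concurrency.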

Capacitance Benefits Inside H/W

• Writing journals to storage has high overheads
• Modern storage devices have device-level capacitors
• Capacitors safely flush memory state to storage after power failure
• DevFS uses device memory for file system state
  - Can avoid writing in-memory state to disk journal
  - Overcomes the “double writes” problem
• Capacitance support improves performance

Challenges of Hardware File System

• Limited memory inside the storage device (today’s focus)
  - Reverse-cache inactive file system structures to host memory
• DevFS lacks visibility into OS state (e.g., process permission)
  - Make OS share required information with “down-call”
  - Please see the paper for more details

Device Memory Limitation

• Device RAM size constrained by cost ($) and power consumption
• RAM used mainly by flash translation layer (FTL)
  - RAM size proportional to FTL’s logical-to-physical block mapping
  - Example: 512 GB SSD uses 2 GB RAM to support translations

Unlike kernel FS, device FS footprint must be kept small

Memory Consuming File Structures

• Our analysis shows four in-memory structures using 90% of memory
  - Inode (840 bytes) - created for file open, not freed until deletion
  - Dentry (192 bytes) - created for file open, kept in a cache
  - File pointer (256 bytes) - released when file is closed
  - Others (156 bytes) - e.g., DevFS file map structure
• Simple workload - open and close 1 million files
  - DevFS memory consumption ~1.2 GB (60% of device memory)
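The ~1.2 GB figure is consistent with the per-structure sizes on this slide. The sketch below is a back-of-the-envelope check; the assumption that file pointers are excluded (because they are released on close) is an interpretation, not something the slide states, and the names are illustrative.

```python
# Per-structure sizes from the slide, in bytes.
SIZES = {"inode": 840, "dentry": 192, "file_pointer": 256, "others": 156}
FILES = 1_000_000


def resident_bytes(num_files: int, released_on_close=("file_pointer",)) -> int:
    """Memory still held after every file has been opened and then closed.

    Assumption: structures named in released_on_close no longer count.
    """
    per_file = sum(
        size for name, size in SIZES.items() if name not in released_on_close
    )
    return per_file * num_files


if __name__ == "__main__":
    total = resident_bytes(FILES)
    print(f"{total / 1e9:.3f} GB resident after closing {FILES} files")
```

With 840 + 192 + 156 = 1188 bytes per file, one million files hold about 1.19 GB, which is close to 60% of the 2 GB device RAM given in the earlier SSD example.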

Reducing Memory Usage

• On-demand allocation of structures
  - Structures such as filemap not used after file is closed
  - Allocated after first write and released when a file is closed
• Reverse Caching
  - Move inactive structures to host memory
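On-demand allocation can be pictured with a small model in which the filemap exists only between the first write and the close. This is a sketch with hypothetical names; the real filemap holds queues and journals rather than Python lists.

```python
class OpenFile:
    """Toy model of on-demand allocation; names are hypothetical."""

    def __init__(self, path: str):
        self.path = path
        self.filemap = None          # nothing is allocated at open time

    def write(self, data: bytes) -> None:
        if self.filemap is None:     # allocate on the first write
            self.filemap = {"queues": [], "journal": []}
        self.filemap["journal"].append(len(data))

    def close(self) -> None:
        self.filemap = None          # release when the file is closed
```

A file that is opened and closed without being written never pays for a filemap in this model.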

Reverse-Caching to Reduce Memory

• Move inactive inode and dentry structures to host memory

[Figure: Application and host memory (inode cache, dentry cache) on the host; DevFS device memory (inode list, dentry list, file ptr list) on the device]

0. Reserved during mount
1. close(file)
2. Move to host cache
3. open(file)
4. Check host for dentry and inode
5. Move to device and delete cache
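The numbered steps on this slide can be sketched as a two-level cache. This is a toy model with illustrative names, not DevFS identifiers: device memory holds structures for open files, and host memory holds them after close.

```python
class ReverseCache:
    """Toy model: device memory holds active structures, host memory holds
    inactive ones."""

    def __init__(self):
        self.device = {}   # path -> (inode, dentry) for open files
        self.host = {}     # path -> (inode, dentry) for closed files

    def close(self, path: str) -> None:
        # Steps 1-2: on close, move the structures to the host cache.
        self.host[path] = self.device.pop(path)

    def open(self, path: str):
        # Steps 3-5: on open, check the host cache first; on a hit, move the
        # structures back to the device and delete the host copy.
        if path in self.device:
            return self.device[path]
        if path in self.host:
            self.device[path] = self.host.pop(path)
        else:
            # First open: create fresh placeholder structures.
            self.device[path] = ({"ino": path}, {"name": path})
        return self.device[path]
```

Only files that are currently open occupy device memory here; the cost is one transfer in each direction per close and reopen.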

Decompose FS Structures

• Reverse caching is complicated for the inode
• Inode’s fields are accessed even after the file is closed (e.g., directory traversal)
• Frequently moving between host cache and device can be expensive!
• Our solution: split file system structures (e.g., inode) into a host and a device structure

Decompose FS Structures

DevFS inode structure (840 bytes)

struct devfs_inode_info {
    inode_list
    page_tree
    journals
    ...
    struct inode vfs_inode
}

Decomposed DevFS structure

struct devfs_inode_info {
    /* always kept in device */
    struct *inode_device
    /* moved to host after close */
    struct *inode_host
}

593 bytes


Evaluation

• Benchmarks and Applications
  - Filebench
  - Snappy - widely used multi-threaded file compression
• Evaluation comparison
  - NOVA - state-of-the-art in-kernel NVM file system
  - DevFS-naïve - DevFS without direct access
  - DevFS-cap - without direct access but with capacitor support
  - DevFS-cap-direct - capacitor support + direct access
• For direct-access, benchmark and applications run as driver

Filebench - Random Write

[Figure: bar chart of throughput (100K ops/second, axis 0 to 16) for 1KB, 4KB, and 16KB writes, comparing NOVA, DevFS-naïve, DevFS-cap, and DevFS-cap-direct. Annotations: 27%, 2.4X.]

• DevFS-naïve suffers from high journaling overhead
• DevFS-cap uses capacitors to avoid on-disk journaling
• DevFS-cap-direct achieves true direct-access bypassing OS

Snappy Compression Performance

[Figure: throughput (100K ops/second, axis 0 to 1.2) versus file size (1KB, 4KB, 16KB, 64KB, 256KB) for NOVA, DevFS-naïve, DevFS-cap, and DevFS-cap-direct. Annotation: 22%.]

Workload steps: read a file, compress, write output, sync file

• Gains even for compute + I/O intensive application

Memory Reduction Benefits

• Filebench - File Create workload (create 1M files and close files)

[Figure: stacked bar chart of memory usage (MB, axis 0 to 1600), split into filemap, dentry, and inode, for four configurations:
  Cap - no memory reduction
  Demand - on-demand FS structures
  Dentry - reverse caching dentry
  Inode + Dentry - reverse caching dentry + inode]

• Demand allocation reduces memory consumption by 156MB (14%)
• Inode and Dentry reverse caching reduces memory by 5X

Memory Reduction Performance Impact

[Figure: throughput (100K ops/sec, axis 0 to 2) for Cap, Demand, Dentry, Inode + Dentry, and Inode + Dentry + Direct. Annotation: 14%.]

• Dentry and Inode reverse caching overhead less than 14%
• Overhead mainly due to structure movement cost

Summary

• Motivation
  - Eliminating OS overhead and providing direct access is critical
  - Hybrid user-level file systems compromise fundamental properties
• Solution
  - We design DevFS, which moves the FS into the storage H/W
  - Provides direct-access without compromising FS properties
  - To reduce the memory footprint of DevFS, we design reverse-caching
• Evaluation
  - Emulated DevFS shows up to 2X I/O performance gains
  - Reduces memory usage by 5X with 14% performance impact

Conclusion

• We are moving towards a storage era with microsecond latency
• Eliminating software (OS) overhead is critical
  - But without compromising fundamental storage properties
• Near-hardware access latency requires embedding S/W into H/W
• We take a first step towards moving the file system into H/W
• Several challenges such as H/W integration, support for RAID, snapshots, and deduplication are yet to be addressed

Permission Checking

[Figure: APP and User-FS in user space, OS on the host, and the DevFS credential region and permission manager in the device; steps numbered 1 to 4]

• Process scheduled to CPU
• OS sets credential in DevFS
  - Credential region, indexed by host CPU: 0: Task1.cred, 1: Task1.cred, ..., 24: Task2.cred
• User-FS issues Write(UID, buff, 4k, off=1)
  - Command fields: payload = buff, ops = READ, UID = 1, off = 1, size = 4K
• Permission manager compares credentials:

t_cred = get_task_cred(CPUID)
inode_cred = get_inode_cred(fd)
compare_cred(t_cred, inode_cred)
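The check on this slide can be sketched as a lookup keyed by host CPU. The function names get_task_cred and get_inode_cred come from the slide; everything else (the Cred type, the dictionaries, and modeling compare_cred as a plain equality test) is an assumption made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Cred:
    uid: int


class PermissionManager:
    def __init__(self):
        self.cpu_creds = {}    # credential region: host CPU id -> task cred
        self.inode_creds = {}  # fd -> credential stored with the inode

    def set_credential(self, cpu_id: int, cred: Cred) -> None:
        # "Down-call" made by the OS when it schedules a process on a CPU.
        self.cpu_creds[cpu_id] = cred

    def get_task_cred(self, cpu_id: int):
        return self.cpu_creds.get(cpu_id)

    def get_inode_cred(self, fd: int):
        return self.inode_creds.get(fd)

    def check(self, cpu_id: int, fd: int) -> bool:
        t_cred = self.get_task_cred(cpu_id)
        inode_cred = self.get_inode_cred(fd)
        # compare_cred is modeled here as a plain owner match.
        return t_cred is not None and t_cred == inode_cred
```

The device never sees process state directly in this model; it only trusts what the OS last wrote for the CPU that issued the request.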

Concurrent Access

[Figure: throughput (100K ops/second, axis 0 to 2) versus number of instances (1, 4, 8, 12, 16) for NOVA, DevFS [+cap], and DevFS [+cap +direct]. Annotation: limited CPUs inside device.]

• Limited device CPUs restrict DevFS scaling
• DevFS uses only 4 device CPUs

Slow CPU Impact - Snappy 4KB

[Figure: throughput (100K ops/sec, axis 0 to 1.2) versus CPU frequency (1.2, 1.4, 1.8, 2.2, 2.6 GHz) for DevFS [+cap] and DevFS [+cap +direct]]

Questions?

Thanks!