NVRAM-oriented Lustre Persistent Cache on Client

Lingfang Zeng, Xi Li*, and Wen Cheng

Wuhan National Laboratory for Optoelectronics (WNLO)

Huazhong University of Science and Technology (HUST)

*DDN / Whamcloud


With Contributions from

Yingjin Qian, Shuichi Ihara, Carlos Aoki Thomaz, and Shilong Wang @ DDN

Andreas Dilger @ DDN/Whamcloud

Tim Süß and André Brinkmann @ JGU

Chunyan Li, Fang Wang, and Dan Feng @ HUST

LPCC (SC2019)


Outline

BACKGROUND: PROBLEM & TERMINOLOGY & OBJECTIVES

METHODS: HIERARCHICAL PERSISTENT CLIENT CACHING

IMPLEMENTATION: RW-PCC & RO-PCC & RULE-BASED TRIGGERING & POLICY ENGINE

EVALUATIONS: EXPERIMENT & RESULTS


01 BACKGROUND: PROBLEM & TERMINOLOGY & OBJECTIVES


Hierarchical Storage Management (HSM)

https://semiengineering.com/a-new-memory-contender/

https://storageswiss.com/2018/01/10/enterprise-needs-to-learn-from-hpc-environments/

HPC workloads were too big to be stored only on flash


HSM Tier

• Compute servers: HBM, NVRAM/SCM
• Performance storage: DRAM, SSD (performance HDD)
• Capacity storage: DRAM, capacity HDD

Supercomputer storage tiers:
• NVRAM/SCM
• SSD burst buffer
• HDD/SSD parallel file system
• SMR object store
• DVD/tape archive

Lang’s Law: the more tiers, the more tears


Problems

• Performance: speed defines the winner
• Cache: utilization rate of Lustre client devices (NVMe, flash-based SSD, NVRAM/SCM)
• Data consistency
• Transparency


Industry and Academic Solutions

Andrew File System [TOCS’88, CMU]

Coda File System [TOCS’88, CMU]

FS-Cache [Linux Symposium’06, Red Hat]

BWCC [CLUSTER’12, CAS]

Nache [FAST’07, RU & IBM]

Panache [FAST’10, IBM]

Mercury [MSST’12, NetApp]

GPFS’ LROC [IBM]

TRIO [CLUSTER’15, FSU & ORNL & AU]

BurstFS [SC’16, FSU & LLNL]

MetaKV [IPDPS’17, FSU & LLNL]

Dmcache [TOCS’88, CMU]

Xcachefs [SBU, 2005]

FlashCache [CASES’06, UM]

Bcache [LWN, 2010]


Related Work

Figure: FS-Cache architecture. Labels: FSCache; client file systems NFS, AFS, ISOFS; cache back ends CacheFS, CacheFiles, OtherCache.

Reference: Howells, FS-Cache: A Network Filesystem Caching Facility, Red Hat UK Ltd.

Sivathanu+, A Versatile Persistent Caching Framework for File Systems, Stony Brook University, Technical Report FSL-05-05.

• Read-only cache

• Tolerate I/O failures in cache

• File system meta-operations (both cache and source)


Related Work

Reference: Eshel+, Panache: A parallel file system cache for global file access, FAST'10

Wang+, An ephemeral burst-buffer file system for scientific applications, SC'16


Lustre File System

• Management Target (MGT)
• Metadata Targets (MDTs)
• Metadata Servers (~10's)
• NVMe MDTs on client net
• NVMe OSTs/LNet routers on client network "Burst Buffer"
• SAS Object Storage Targets (OSTs)
• SATA SMR Archive OSTs (Erasure Coded)

Figure based on Andreas Dilger's Lustre User Group 2018 presentation: Lustre 2.12 and beyond (see http://opensfs.org/lug-2018-agenda/)


Shared DDN IME @ ICHEC

Cray Trinity @ LANL

HSM Tier

• Data plane

• Control plane

• Erasure coding


Shared DDN IME @ ICHEC

Cray Trinity @ LANL

HSM Tier

• Storage-side flash acceleration

• I/O histogram

• Performance statistics

• Dynamic flush

Server-side Seagate Nytro NXD @ Sanger


Shared DDN IME @ ICHEC

Cray Trinity @ LANL

HSM Tier

• LPCC

Server-side Seagate Nytro NXD @ Sanger

Client-side Lustre Persistent Client Cache (LPCC)


Lustre’s DLM and Layout Lock

• Distributed lock manager (DLM): data and metadata consistency
• A separate namespace
• Exclusive mode (EX) lock
• Concurrent read mode (CR) lock
• L.Gen field

Figure: the server namespace and the Client1 and Client2 namespaces, with Lock A, Lock B, and Lock C.
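To make the EX and CR modes concrete, here is a minimal sketch of the classic DLM lock-compatibility table. The `LockMode` enum and the `can_grant` helper are names invented for illustration; the slide itself names only the EX and CR modes, so the other four modes are included on the assumption that the standard six-mode DLM scheme is meant.

```python
from enum import Enum


class LockMode(Enum):
    """The six classic DLM lock modes (only EX and CR appear on the slide)."""
    NL = "null"
    CR = "concurrent read"
    CW = "concurrent write"
    PR = "protected read"
    PW = "protected write"
    EX = "exclusive"


M = LockMode

# For each requested mode: the modes other holders may keep at the same time.
COMPATIBLE = {
    M.NL: {M.NL, M.CR, M.CW, M.PR, M.PW, M.EX},
    M.CR: {M.NL, M.CR, M.CW, M.PR, M.PW},
    M.CW: {M.NL, M.CR, M.CW},
    M.PR: {M.NL, M.CR, M.PR},
    M.PW: {M.NL, M.CR},
    M.EX: {M.NL},
}


def can_grant(requested, held_modes):
    """Return True if `requested` conflicts with none of the held modes."""
    return all(held in COMPATIBLE[requested] for held in held_modes)


if __name__ == "__main__":
    # A CR lock coexists with everything except EX.
    print(can_grant(M.CR, [M.CR, M.PW]))
    print(can_grant(M.EX, [M.CR]))
```

An EX request is refused while any other client holds even a CR lock, which is why taking an EX lock forces other holders to give theirs up first.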


Lustre HSM

Reference: Lustre manual. http://doc.lustre.org/lustre_manual.pdf

Agents – Lustre file system clients running Copytool

Coordinator – Acts as an interface between the policy engine, the metadata server (MDS), and the Copytool


Key Idea

• Logical two-tier (with physical multitier): simple and efficient architecture (memory vs. disk)
• A global namespace: space efficient
• Latencies and lock conflicts can be significantly reduced
• Caching reduces the pressure on OSTs: small or random I/Os can be regularized into big sequential I/Os, and temporary files do not even need to be flushed to OSTs
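The point about regularizing small or random I/Os into big sequential I/Os can be illustrated with a short sketch. The `coalesce` function below is hypothetical and is not taken from LPCC; it only shows how a set of buffered small writes, given as (offset, length) pairs, could be merged into a few sequential runs before being flushed.

```python
def coalesce(extents):
    """Merge (offset, length) writes into sorted, non-overlapping runs.

    Adjacent or overlapping extents are joined, so many small random
    writes turn into a few large sequential ones.
    """
    merged = []  # list of [start, end) pairs, kept sorted by start
    for offset, length in sorted(extents):
        end = offset + length
        if merged and offset <= merged[-1][1]:
            # Touches or overlaps the previous run: extend it.
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([offset, end])
    return [(start, end - start) for start, end in merged]


if __name__ == "__main__":
    # Four 4 KiB writes issued out of order; three of them are contiguous.
    writes = [(8192, 4096), (0, 4096), (4096, 4096), (20480, 4096)]
    print(coalesce(writes))  # [(0, 12288), (20480, 4096)]
```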


02 METHODS: HIERARCHICAL PERSISTENT CLIENT CACHING


Overview of LPCC Architecture


I/O Path

Figure: a user application issues I/O through VFS to Llite. Normal I/O continues through LOV, OSC-1 to OSC-N, PTL-RPC, and LNET. Cache I/O goes to the NVM file system (NVM-FS) on NVM.


03 IMPLEMENTATION: RW-PCC & RO-PCC & RULE-BASED TRIGGERING & POLICY ENGINE


Lustre Read-Write PCC Caching (attach)


Lustre Read-Write PCC Caching (restore)

• Notify all clients having cached the layout to invalidate their layouts
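The invalidation step can be pictured with a toy model. Everything in the sketch below is hypothetical (the `Client` and `MetadataServer` classes, their methods, and the use of a per-file layout generation counter); it is not Lustre code. It only illustrates the idea that a restore changes the layout and that every client still caching the old layout is told to drop it.

```python
class Client:
    """Toy client that remembers the layout generation it last saw."""

    def __init__(self, name):
        self.name = name
        self.cached_gen = {}  # file id -> layout generation

    def invalidate(self, fid):
        # Drop the cached layout; it must be fetched again before the next I/O.
        self.cached_gen.pop(fid, None)


class MetadataServer:
    """Toy server tracking layout generations and which clients cache them."""

    def __init__(self):
        self.layout_gen = {}  # file id -> current layout generation
        self.holders = {}     # file id -> clients caching the layout
        self.cached_on = {}   # file id -> client holding the RW-PCC copy

    def create(self, fid):
        self.layout_gen[fid] = 0

    def get_layout(self, client, fid):
        self.holders.setdefault(fid, set()).add(client)
        client.cached_gen[fid] = self.layout_gen[fid]
        return self.layout_gen[fid]

    def attach(self, client, fid):
        # Assumption: attach simply records which client caches the file.
        self.cached_on[fid] = client

    def restore(self, fid):
        # The file leaves the cache: bump the generation and notify holders.
        self.cached_on.pop(fid, None)
        self.layout_gen[fid] += 1
        for client in self.holders.pop(fid, set()):
            client.invalidate(fid)


if __name__ == "__main__":
    mds, c1, c2 = MetadataServer(), Client("client1"), Client("client2")
    mds.create("f1")
    mds.get_layout(c1, "f1")
    mds.get_layout(c2, "f1")
    mds.attach(c1, "f1")
    mds.restore("f1")
    print(c1.cached_gen, c2.cached_gen)
```

After the restore both clients hold no layout for the file and pick up the new generation on their next `get_layout` call.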


Lustre Read-only PCC Caching (attach)


Lustre Read-only PCC Caching (I/O flow)


Rule-based Persistent Client Caching

• Different users, groups, and projects or filenames, e.g. (projid={500,1000} & fname=*.h5),(uid=1001)
• Quota limitation
• Cache isolation
• Auto LPCC caching mechanism
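A rule such as the example above can be evaluated with a small matcher. The sketch below is an illustration, not the LPCC implementation: it assumes that `&` means AND within a parenthesized clause, that `,` between clauses means OR, and that several values listed for one field are alternatives. The `FileInfo` type and the `should_cache` helper are hypothetical names.

```python
from dataclasses import dataclass
from fnmatch import fnmatch


@dataclass
class FileInfo:
    """Attributes a caching rule may test (hypothetical container)."""
    name: str
    uid: int
    gid: int
    projid: int


# (projid={500,1000} & fname=*.h5),(uid=1001) written as data:
# a list of clauses (OR), each mapping a field to its allowed values (AND).
RULE = [
    {"projid": {500, 1000}, "fname": {"*.h5"}},
    {"uid": {1001}},
]


def clause_matches(clause, info):
    """True if the file satisfies every condition of one clause."""
    for key, allowed in clause.items():
        if key == "fname":
            # File names are matched against shell-style patterns.
            if not any(fnmatch(info.name, pattern) for pattern in allowed):
                return False
        elif getattr(info, key) not in allowed:
            return False
    return True


def should_cache(rule, info):
    """True if any clause of the rule matches the file."""
    return any(clause_matches(clause, info) for clause in rule)


if __name__ == "__main__":
    print(should_cache(RULE, FileInfo("run.h5", uid=2000, gid=100, projid=500)))
    print(should_cache(RULE, FileInfo("run.txt", uid=2000, gid=100, projid=500)))
    print(should_cache(RULE, FileInfo("notes.txt", uid=1001, gid=100, projid=0)))
```

Keeping the rule as plain data rather than parsing the textual form keeps the sketch short; a real deployment would parse whatever syntax the tool accepts.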


Cache Prefetching and Replacement

• Policy engine: manage data movement
• Lustre changelogs: periodic prefetching decision
• LRU and SIZE
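The two replacement policies named above can be sketched as follows. This is a minimal illustration under stated assumptions, not the policy engine itself: LRU is taken to evict the least recently accessed file first, SIZE is taken to evict the largest file first, and the `CacheIndex` class is a hypothetical bookkeeping structure.

```python
from collections import OrderedDict


class CacheIndex:
    """Tracks cached files and picks eviction victims (illustrative only)."""

    def __init__(self, capacity):
        self.capacity = capacity    # bytes the cache may hold
        self.files = OrderedDict()  # name -> size, least recently used first

    def used(self):
        return sum(self.files.values())

    def touch(self, name, size):
        """Record an access; the file becomes the most recently used."""
        self.files.pop(name, None)
        self.files[name] = size

    def victims(self, policy="LRU"):
        """Return names to evict, in order, until usage fits the capacity."""
        if policy == "LRU":
            order = list(self.files)  # oldest access first
        elif policy == "SIZE":
            order = sorted(self.files, key=self.files.get, reverse=True)
        else:
            raise ValueError(f"unknown policy: {policy}")
        excess = self.used() - self.capacity
        chosen = []
        for name in order:
            if excess <= 0:
                break
            chosen.append(name)
            excess -= self.files[name]
        return chosen


if __name__ == "__main__":
    index = CacheIndex(capacity=100)
    index.touch("a", 20)
    index.touch("b", 70)
    index.touch("c", 50)
    print(index.victims("LRU"))   # ['a', 'b']
    print(index.victims("SIZE"))  # ['b']
```

With 140 bytes cached against a capacity of 100, LRU has to remove two files while SIZE frees enough space by removing the single largest one, which shows why the choice of policy matters for the amount of data movement.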


04 EVALUATIONS: EXPERIMENT & RESULTS


Evaluation Setup

• NVRAM is simulated with DRAM: DRAM (28GB) / NVM (100GB)
• Write latency: 200ns
• Clients (8), OSS (3), MDS (1)

Server: Inspur SA5248L
CPU: Intel(R) Xeon(R) CPU E5-2620, 2.00GHz
DRAM: 128GB
SSD: Kingston SA400S37/240G
HDD: SAS disks
OS: CentOS 7.5
Kernel: 3.10.0-862.6.3
Network: 1GB Ethernet
Lustre: Lustre 2.11.53
Fio: fio-3.1
Filebench: filebench 1.5-alpha3
IOR: IOR-3.2.0

Single Client Read Performance (fio)

Figure: bandwidth (MB/s) vs. I/O block size (1K, 4K, 64K, 512K, 1M, 4M) for lustre, ssd-LPCC, pm-ext4-LPCC, ext4-dax-LPCC, pmfs-LPCC, and nova-LPCC.

Single Client Read Performance (fio)

Figure: throughput (IOPS) vs. I/O block size (1K, 4K, 64K, 512K, 1M, 4M) for lustre, ssd-LPCC, pm-ext4-LPCC, ext4-dax-LPCC, pmfs-LPCC, and nova-LPCC.

Single Client Read Performance (fio)

Figure: latency (microseconds) vs. I/O block size (1K, 4K, 64K, 512K, 1M, 4M) for lustre, ssd, pm-ext4, ext4-dax, pmfs, and nova.

Single Client Write Performance (fio)

Figure: bandwidth (MB/s) vs. I/O block size (1K, 4K, 64K, 512K, 1M, 4M) for lustre, ssd-LPCC, pm-ext4-LPCC, ext4-dax-LPCC, pmfs-LPCC, and nova-LPCC.

Single Client Write Performance (fio)

Figure: throughput (IOPS) vs. I/O block size (1K, 4K, 64K, 512K, 1M, 4M) for lustre, ssd-LPCC, pm-ext4-LPCC, ext4-dax-LPCC, pmfs-LPCC, and nova-LPCC.

Single Client Write Performance (fio)

Figure: latency (microseconds) vs. I/O block size (1K, 4K, 64K, 512K, 1M, 4M) for Lustre, SSD-LPCC, PM-ext4-LPCC, ext4-dax-LPCC, PMFS-LPCC, and NOVA-LPCC.

Write Performance (RW-PCC, filebench)

Workload     Average file size   Num. of files   Read-write ratio
fileserver   128KB               0.3 million     1:2
webserver    64KB                0.3 million     10:1
varmail      32KB                0.3 million     1:1

Figure: throughput (IOPS) per workload (fileserver, webserver, varmail) for Lustre, SSD-LPCC, and NVRAM-LPCC.

Throughput of NVRAM-LPCC / SSD-LPCC vs. Lustre:
• Fileserver: 4x / 2x
• Webserver: 9.2x / 8.3x
• Varmail: 1.94x / 1.5x

Read Scale Test (RW-PCC, IOR)

Figure: aggregate bandwidth (MB/s) vs. number of clients (1, 2, 4, 8) for Lustre, NVRAM-LPCC, and SSD-LPCC.

Write Scale Test (RW-PCC, IOR)

Figure: aggregate bandwidth (MB/s) vs. number of clients (1, 2, 4, 8) for Lustre, SSD-LPCC, and NVRAM-LPCC.

RO-PCC Scalability (I/O block size=1MB)

Figure: read bandwidth (MB/s) vs. data size (1G, 2G, 4G, 8G, 16G, 32G, 64G) for Lustre, SSD-LPCC, and NVRAM-LPCC.

NVRAM-LPCC vs. SSD-LPCC (12.52x), NVRAM-LPCC vs. Lustre (44.20x)

RO-PCC Scalability

Figure: read bandwidth (MB/s) vs. number of clients (1, 2, 4, 8) for Lustre, SSD-LPCC, and NVRAM-LPCC.


Summary

• The performance of the cache medium has a great influence on LPCC performance
   • Kingston SA400S37/240G SSD (proposed)
   • 512GB Samsung 840 PRO SSD (SC19)
• High performance based on NVRAM: less overhead, and network latencies and lock conflicts significantly reduced
• Reduce the pressure on the OSTs
• No page cache
• Optimized flush (Load/Store)
• NVRAM-LPCC >> pm-ext4-LPCC


NVRAM-oriented Lustre Persistent Cache on Client

Lingfang Zeng*, Xi Li#, and Wen Cheng*

*Wuhan National Laboratory for Optoelectronics (WNLO)

*Huazhong University of Science and Technology (HUST)

#DDN / Whamcloud

Thanks!