Lustre and NFS v4


DESCRIPTION

My presentation contrasting the Lustre file system and NFS v4

TRANSCRIPT

Page 1: Lustre And Nfs V4

4/29/2008 1

Lustre and NFS v4.0

Chris Sosa

For Grimshaw’s Grid Seminar

Page 2: Lustre And Nfs V4

Lustre – Motivation

- Need for a file system for large clusters that has the following attributes:
  - Highly scalable (> 10,000 nodes)
  - Provides petabytes of storage
  - High throughput (100 GB/sec)
- Datacenters have different needs, so we need a general-purpose back-end file system

Page 3: Lustre And Nfs V4

Lustre = Linux + Cluster

- Peter Braam created the design for Lustre at CMU and went on to found Cluster File Systems
- Cluster File Systems was bought by Sun in late 2007; Lustre is now part of Sun
- Lustre is the file system with the largest share in HPC (see BlueGene (or not))

Page 4: Lustre And Nfs V4

Features of Lustre

- Open-source object-based cluster file system
- Fully compliant with POSIX
- Features (i.e. what I will discuss)
  - Object Protocols
  - Intent-based Locking
  - Adaptive Locking Policies
  - Aggressive Caching

Page 5: Lustre And Nfs V4

System Overview

Page 6: Lustre And Nfs V4

Object Protocols

Page 7: Lustre And Nfs V4

Intent-based Locking

Page 8: Lustre And Nfs V4

Adaptive Locking Policies

- Policy depends on context
- Mode 1: Clients performing operations on something that mostly only they use (e.g. /home/username)
- Mode 2: Clients performing operations on a highly contended resource (e.g. /tmp)
- DLM is capable of granting locks on an entire subtree and on whole files
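
The two modes above can be pictured as a policy function that picks a lock scope from how contended a directory is. This is only a minimal sketch: the `DirStats` record, the `CONTENTION_THRESHOLD` value, and the function name are hypothetical, and the slide does not say how Lustre handles the contended case, so the sketch simply assumes a narrower per-file lock there.

```python
from dataclasses import dataclass

# Hypothetical threshold: number of distinct clients that makes a path "contended".
CONTENTION_THRESHOLD = 2


@dataclass
class DirStats:
    path: str
    active_clients: int  # distinct clients recently operating under this path


def choose_lock_scope(stats: DirStats) -> str:
    """Return 'subtree' for mostly private paths and 'file' for contended ones."""
    if stats.active_clients < CONTENTION_THRESHOLD:
        # Mode 1: one client dominates, so a single subtree lock avoids
        # repeated lock round trips.
        return "subtree"
    # Mode 2 (assumed behavior): many clients compete, so use a narrower lock.
    return "file"


home = choose_lock_scope(DirStats("/home/username", active_clients=1))
tmp = choose_lock_scope(DirStats("/tmp", active_clients=40))
```

Here `home` comes out as a subtree lock and `tmp` as a per-file lock; a real lock manager would base the decision on richer state than a single counter.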

Page 9: Lustre And Nfs V4

Aggressive Caching

- Keeps local journal of updates for locked files
  - One per file operation
  - Hard linked files get special treatment with subtree locks
- Lock revoked -> updates flushed and replayed
- Use subtree change times to validate cache entries
- Additionally features collaborative caching -> referrals to other dedicated cache service
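
The journal-then-replay behavior can be illustrated with a toy write-back cache. This is a sketch under stated assumptions, not Lustre code: the `CachingClient` class, its method names, and the use of a plain dictionary as the "server" are all invented for illustration.

```python
class CachingClient:
    """Toy write-back cache: updates are journaled locally while a lock is held."""

    def __init__(self, server: dict):
        self.server = server    # a dict standing in for the remote file system
        self.journal = []       # one entry per file operation, in order
        self.holds_lock = False

    def acquire_lock(self) -> None:
        self.holds_lock = True

    def write(self, path: str, data: str) -> None:
        if not self.holds_lock:
            raise RuntimeError("write-back caching requires the lock")
        # Record the update locally instead of contacting the server.
        self.journal.append(("write", path, data))

    def on_lock_revoked(self) -> None:
        # Flush: replay the journaled operations on the server in order.
        for op, path, data in self.journal:
            if op == "write":
                self.server[path] = data
        self.journal.clear()
        self.holds_lock = False


server = {}
client = CachingClient(server)
client.acquire_lock()
client.write("/home/username/a.txt", "v1")
client.write("/home/username/a.txt", "v2")
server_before_revoke = dict(server)  # snapshot taken before the flush
client.on_lock_revoked()
```

The snapshot taken before revocation is empty because nothing reaches the server until the lock is revoked; afterwards the server holds the result of replaying both writes.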

Page 10: Lustre And Nfs V4

On to NFS Version 4.0

Page 11: Lustre And Nfs V4

Motivation

- We want a file system that provides distributed transparent access in a heterogeneous network
- NFS pre-4 had a lot of issues
  - Caches had no guarantees
  - Terrible failure semantics
    - Hanging locks
    - Server / Clients were never sure of anything
  - Data coherency, what’s that?

Page 12: Lustre And Nfs V4

Overview of NFS v4

- Stateful Protocol
- Compound Operations
- Lease-based Locks
- “Delegation” to clients
- Close-Open Cache Consistency
- Better security

Page 13: Lustre And Nfs V4

Stateful

- Borrowed model from CIFS (Common Internet File System) see MS (Marty’s supporters)
- Open/Close
  - Opens also handle creates, etc.
  - Close semantics
  - Opens do byte locking and file locking atomically on the open
  - Locks / delegation released on file close
- Everything done with file handles
  - Always a notion of a “current file handle” i.e. see pwd

Page 14: Lustre And Nfs V4

COMPOUND Ops

- Problem: Normal filesystem semantics have too many RPCs (boo)
- Solution: Group many calls into one call (yay)
- Semantics
  - Run sequentially
  - Fails on first failure
  - Returns status of each individual RPC in the compound response (either to failure or success)

[Image caption: Compound Kitty]
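
The COMPOUND semantics listed on this slide (run sequentially, stop at the first failure, report a status for every operation attempted) can be sketched in a few lines. This is an illustrative model only: `run_compound` and the stand-in callables are hypothetical, and the operation names are used purely as labels.

```python
from typing import Callable, List, Tuple

Op = Tuple[str, Callable[[], None]]


def run_compound(ops: List[Op]) -> List[Tuple[str, str]]:
    """Run ops in order; stop at the first failure; return per-op statuses."""
    results = []
    for name, fn in ops:
        try:
            fn()
        except OSError as exc:
            # Record the failing op's status, then skip everything after it.
            results.append((name, f"ERR: {exc}"))
            break
        results.append((name, "OK"))
    return results


def ok() -> None:
    pass


def missing() -> None:
    raise FileNotFoundError("no such file")


results = run_compound([("PUTFH", ok), ("LOOKUP", missing), ("READ", ok)])
```

In this run the response holds a status for the first two operations and nothing for the third, which is never attempted. The client pays one round trip instead of three.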

Page 15: Lustre And Nfs V4

Lease-based Locks

- Both byte-range and file locks
- Heartbeats keep locks alive (renew lock)
- A lease on every lock that indicates that the client is still up
- If server fails, waits at least the agreed upon lease time (constant) before accepting any other lock requests
- If client fails, locks are released by server at the end of lease period
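
The lease rules above can be modeled with explicit timestamps. This is a minimal sketch under stated assumptions: `LeaseLockServer` and its methods are hypothetical, time is passed in as a plain number rather than read from a clock, and constructing the object stands in for a server restart. It also ignores details the slide does not cover, such as clients reclaiming locks during the waiting period.

```python
class LeaseLockServer:
    """Toy lock server whose locks expire unless renewed within the lease."""

    def __init__(self, lease_seconds: float, now: float = 0.0):
        self.lease_seconds = lease_seconds
        self.locks = {}  # resource -> (client, expiry time)
        # After a (re)start, refuse new locks for one full lease period.
        self.grace_until = now + lease_seconds

    def _expire(self, now: float) -> None:
        # Drop locks whose lease ran out, e.g. because the client died.
        self.locks = {r: (c, e) for r, (c, e) in self.locks.items() if e > now}

    def lock(self, client: str, resource: str, now: float) -> bool:
        if now < self.grace_until:
            return False
        self._expire(now)
        holder = self.locks.get(resource)
        if holder is not None and holder[0] != client:
            return False
        self.locks[resource] = (client, now + self.lease_seconds)
        return True

    def renew(self, client: str, now: float) -> None:
        # Heartbeat: extend every unexpired lock held by this client.
        self._expire(now)
        for resource, (holder, _) in list(self.locks.items()):
            if holder == client:
                self.locks[resource] = (holder, now + self.lease_seconds)


server = LeaseLockServer(lease_seconds=10, now=0)
during_grace = server.lock("A", "file", now=5)      # refused: still waiting
a_granted = server.lock("A", "file", now=10)        # lease runs until t=20
server.renew("A", now=15)                           # lease now runs until t=25
b_blocked = server.lock("B", "file", now=22)        # A's lease is still valid
b_after_expiry = server.lock("B", "file", now=26)   # A stopped renewing
```

Client B only gets the lock once A's last renewal has lapsed, which is how a crashed client's locks are cleaned up without any explicit release.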

Page 16: Lustre And Nfs V4

Delegation

- Tells client no one else has the file (similar to Lustre’s first mode)
- Client exposes callbacks
- Difference here between 4.0 / 4.1
- Here’s a second bullet

Page 17: Lustre And Nfs V4

Close-Open Consistency

- Any opens that happen after a close finishes are consistent with the information from the last close
- Last close wins the competition
- Not coherent (without locks)
- You have to reopen to see if you won
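
A toy model makes the "last close wins" rule concrete. This is a sketch only: the `Server` and `ClientFile` classes are invented for illustration, with writes buffered on the client, flushed at close, and refreshed at open.

```python
class Server:
    def __init__(self) -> None:
        self.data = ""


class ClientFile:
    """Toy client handle: data is fetched at open and flushed at close."""

    def __init__(self, server: Server):
        self.server = server
        self.buf = ""

    def open(self) -> None:
        self.buf = self.server.data  # revalidate against the server

    def read(self) -> str:
        return self.buf

    def write(self, data: str) -> None:
        self.buf = data  # buffered locally until close

    def close(self) -> None:
        self.server.data = self.buf  # flush to the server


server = Server()
a, b, reader = ClientFile(server), ClientFile(server), ClientFile(server)

a.open()
b.open()
a.write("from A")
b.write("from B")

a.close()
reader.open()
seen_after_a_closed = reader.read()  # consistent with A's close

b.close()                            # B closes last and overwrites A
a.open()
seen_by_a_after_reopen = a.read()    # A has to reopen to learn it lost
```

Without locks, neither writer is told about the other. The only guarantee is that an open issued after a close completes sees that close's data.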

Page 18: Lustre And Nfs V4

Security

- Uses the GSS-API framework
- All IDs are formed with
  - User@domain
  - Group@domain
- Every implementation must have Kerberos v5
- Every implementation must have LIPKEY

[Image caption: Meow]

Page 19: Lustre And Nfs V4

Other Stuff

- Replication / Migration mechanism added
  - Special error messages to indicate migration
  - Special attribute for both replication and migration that gives the other / new location
- If the file system response is too slow, or the client gets the special error message, it can check the special attribute for the read-only replica (or stop using security)

Page 20: Lustre And Nfs V4

Comparison of NFSv3 and NFSv4

Page 21: Lustre And Nfs V4

Questions?