CSS434 DFS 1
CSS434 Distributed File Systems
Textbook Ch8, 13
Professor: Munehiro Fukuda
CSS434 DFS 2
DFS Desirable Features
Transparency:
  Access transparency: a single set of operations
  Location transparency: a uniform file name space
  Mobility transparency: file mobility
  Performance transparency: performance comparable to a centralized file system
Concurrency and synchronization: concurrent access requests should complete consistently.
  Forward/backward validation
File caching and replication:
  Caching: at client/server for scalability
  Replication: at multiple servers for availability
Heterogeneity: should allow a variety of nodes to share files across different storage media and OSes
  Similarity between Unix and NTFS: stream-oriented files, a tree-structured system
  Difference between Unix and NTFS: CR char included in NTFS, file naming
Fault tolerance: at-most-once or at-least-once semantics
Consistency: Unix one-copy update semantics, session semantics, etc.
Security: should protect files from network intruders.
CSS434 DFS 3
Consistency Maintenance in Various Storage Systems
Properties compared: Sharing, Persistence, Distributed cache/replicas, Consistency maintenance

Storage system | Example
Main memory | RAM
File system | UNIX file system
Distributed file system | Sun NFS
Web | Web server
Distributed shared memory | Ivy (Ch. 16)
Remote objects (RMI/ORB) | CORBA
Persistent object store | CORBA Persistent Object Service
Persistent distributed object store | PerDiS, Khazana
CSS434 DFS 4
File Service Architecture
[Figure: a client computer runs application programs on top of a client module (file caching); a server computer runs the flat file service and the directory service (file caching/replication). Consistency maintenance spans the two.]
CSS434 DFS 5
DFS Services
Flat file service
  File-accessing mechanism: deciding where to manage remote files and the unit of data transfer (at server or client? file, block, or byte?)
  File-sharing semantics: providing file update semantics similar to, but weaker than, Unix
  File-caching mechanism: improving performance/scalability
  File-replication mechanism: improving performance/availability
Directory service
  Mapping between text file names and references to files (i.e., file IDs)
CSS434 DFS 6
Flat File Service Operations
Read(FileId, i, n) -> Data (throws BadPosition)
  If 1 ≤ i ≤ Length(File): reads a sequence of up to n items from a file starting at item i and returns it in Data.
Write(FileId, i, Data) (throws BadPosition)
  If 1 ≤ i ≤ Length(File)+1: writes a sequence of Data to a file, starting at item i, extending the file if necessary.
Create() -> FileId
  Creates a new file of length 0 and delivers a UFID for it.
Delete(FileId)
  Removes the file from the file store.
GetAttributes(FileId) -> Attr
  Returns the file attributes for the file.
SetAttributes(FileId, Attr)
  Sets the file attributes (only those attributes that are not shaded in ).
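The operations above can be sketched as a small in-memory service. This is only an illustration under stated assumptions: the class name FlatFileService, the integer UFIDs, and the dictionary-based attribute record are hypothetical choices, not part of the slides. Positions are 1-based, as in the slide.

```python
class BadPosition(Exception):
    """Raised when a 1-based position is outside the range the slide allows."""


class FlatFileService:
    """Hypothetical in-memory flat file service; a file is a list of items."""

    def __init__(self):
        self._files = {}   # UFID -> list of items
        self._attrs = {}   # UFID -> attribute dict (assumed representation)
        self._next_id = 1

    def create(self):
        # New file of length 0; the UFID here is just a counter (assumption).
        ufid = self._next_id
        self._next_id += 1
        self._files[ufid] = []
        self._attrs[ufid] = {}
        return ufid

    def read(self, file_id, i, n):
        data = self._files[file_id]
        if not 1 <= i <= len(data):
            raise BadPosition(i)
        return data[i - 1:i - 1 + n]   # up to n items starting at item i

    def write(self, file_id, i, items):
        data = self._files[file_id]
        if not 1 <= i <= len(data) + 1:
            raise BadPosition(i)
        # Overwrite from item i, extending the file if necessary.
        data[i - 1:i - 1 + len(items)] = items

    def delete(self, file_id):
        del self._files[file_id]
        del self._attrs[file_id]

    def get_attributes(self, file_id):
        return dict(self._attrs[file_id])

    def set_attributes(self, file_id, attr):
        self._attrs[file_id].update(attr)
```

Note that no operation takes a file name or keeps an open-file state: every call carries the UFID and the position, which is what lets such a service be stateless.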
CSS434 DFS 7
Directory Service Operations
Lookup(Dir, Name) -> FileId (throws NotFound)
  Locates the text name in the directory and returns the relevant UFID. If Name is not in the directory, throws an exception.
AddName(Dir, Name, File) (throws NameDuplicate)
  If Name is not in the directory, adds (Name, File) to the directory and updates the file's attribute record. If Name is already in the directory, throws an exception.
UnName(Dir, Name) (throws NotFound)
  If Name is in the directory, the entry containing Name is removed from the directory. If Name is not in the directory, throws an exception.
GetNames(Dir, Pattern) -> NameSeq
  Returns all the text names in the directory that match the regular expression Pattern.
[Figure: directories on host1, host2, and host3 each hold a name (Name1, Name2, Name3) added with AddName(Dir, Name, file) for the same file, so its ref count = 3. If ref_count = 0, the file is deleted.]
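The directory operations and the reference count in the figure can be sketched as follows. The class name DirectoryService, the make_dir helper, and the use of plain strings as directory identifiers are assumptions made for illustration; un_name returning True when the count reaches zero is likewise a hypothetical convention.

```python
import re


class NotFound(Exception):
    pass


class NameDuplicate(Exception):
    pass


class DirectoryService:
    """Hypothetical directory service mapping text names to file IDs."""

    def __init__(self):
        self._dirs = {}       # directory id -> {text name: file id}
        self.ref_count = {}   # file id -> number of directory entries

    def make_dir(self, dir_id):
        # Helper that is not in the slide; creates an empty directory.
        self._dirs[dir_id] = {}

    def lookup(self, dir_id, name):
        try:
            return self._dirs[dir_id][name]
        except KeyError:
            raise NotFound(name) from None

    def add_name(self, dir_id, name, file_id):
        entries = self._dirs[dir_id]
        if name in entries:
            raise NameDuplicate(name)
        entries[name] = file_id
        self.ref_count[file_id] = self.ref_count.get(file_id, 0) + 1

    def un_name(self, dir_id, name):
        """Remove the entry; return True if the file should now be deleted."""
        entries = self._dirs[dir_id]
        if name not in entries:
            raise NotFound(name)
        file_id = entries.pop(name)
        self.ref_count[file_id] -= 1
        if self.ref_count[file_id] == 0:
            del self.ref_count[file_id]
            return True
        return False

    def get_names(self, dir_id, pattern):
        rx = re.compile(pattern)
        return sorted(n for n in self._dirs[dir_id] if rx.fullmatch(n))
```

With three names added for one file, the count is 3, and only the third un_name reports that the file can be removed from the file store.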
CSS434 DFS 8
File-Accessing Models
Accessing Remote Files

File access | Merits | Demerits
Remote service model (at a server) | A simple implementation | Communication overhead
Data caching model (at a client that cached a file copy) | Reducing network traffic | Cache consistency problem

Unit of Data Transfer

Transfer level | Merits | Demerits
File | Simple, less communication overhead, and immune to server failures | A client required to have large storage space
Block (e.g., NFS) | A client not required to have large storage space | More network traffic/overhead
Byte | Flexibility maximized | Difficult cache management to handle the variable-length data
Record | Handling structured and indexed files | More network traffic; more overhead to reconstruct a file
CSS434 DFS 9
File-Sharing Semantics
Define when modifications of the file data made by a user are observable by other users
1. Unix semantics
2. Session semantics
3. Immutable shared-files semantics
4. Transaction-like semantics
CSS434 DFS 10
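File-Sharing Semantics
Unix Semantics (One-copy Update Semantics)
Absolute ordering (seen by all clients as if only a single copy existed and were updated immediately)
[Figure: timeline t1 to t6. Client A issues Append(c) and Append(d); Client B issues two reads and Append(e). The single copy grows from "a b" to "a b c", "a b c d", and finally "a b c d e", and each read returns the contents at that moment.]
Network delays (a weaker semantics is inevitable)
[Figure: the same operations with delayed messages, so a client may see stale or partial contents such as "a", "a b", or "b c".]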
CSS434 DFS 11
File-Sharing Semantics
Session Semantics
[Figure: Clients A, B, and C and the server. The server initially holds "a b".
  Client A: Open(file) gets "a b"; Append(c), Append(d), Append(e) give "a b c d e"; Close(file) uploads it, so the server holds "a b c d e".
  Client B: Open(file) gets "a b"; Append(x), Append(y), Append(z) give "a b x y z"; Close(file) uploads it.
  Client C: Open(file) gets "a b c d e"; Append(m) gives "a b c d e m".]
File writes may overwrite previous updates. A file lock is needed to prevent these overwrites.
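A minimal sketch of the overwrite problem under session semantics, assuming that Open downloads the whole file into a private copy and Close uploads the whole copy back. The Server and Session class names are hypothetical.

```python
class Server:
    """Holds the single master copy of the file."""

    def __init__(self, contents):
        self.contents = list(contents)


class Session:
    """One open/close session of a client on the file."""

    def __init__(self, server):
        self.server = server
        self.copy = list(server.contents)   # Open(file): private copy

    def append(self, item):
        self.copy.append(item)              # visible only to this client

    def close(self):
        # Close(file): the whole private copy replaces the master copy.
        self.server.contents = list(self.copy)


server = Server("ab")
a = Session(server)
b = Session(server)
for ch in "cde":
    a.append(ch)
for ch in "xyz":
    b.append(ch)
a.close()   # server now holds a b c d e
b.close()   # server now holds a b x y z; A's appends are lost
```

The last close wins, which is why the slide calls for a file lock when two sessions may write the same file.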
CSS434 DFS 12
File-Sharing Semantics
Session Semantics with File Lock
[Figure: Clients A and B both Open(file) and get "a b" from the server. Client B issues Append(x) and saves with ^x^s, so the server holds "a b x". Client A issues Append(c); a lockt test detects the conflict, and the user needs to choose: quit, steal, or proceed. On saving a conflicting copy ("a b c"), the user needs to choose: quit, save anyway, or type ^x^w to write it to a different file (file2, file3).]
CSS434 DFS 13
File-Sharing Semantics
Transaction-Like Semantics (Concurrency Control)
[Figure, drawn once for backward validation and once for forward validation: Clients A to D each run a transaction from Trans_start to Trans_end, followed by validation and commitment. One transaction ends in Trans_abort and Trans_restart.
  Client A: R1 R2 W3 R4 W5
  Client B: R1 R2 W6 R4 W7
  Client C: R1 R2 W9 R4 W8
  Client D: R1 R2 R6 R8 W8]
Backward validation: compare reads with former writes.
Forward validation: compare writes with later reads. At Trans_end, abort itself or the conflicting active transactions.
Which validation is better?
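The two checks can be sketched with set operations on the read and write sets of Clients A to D from the figure. The function names are hypothetical, and the commit order used in the usage lines (A, B, and C finish before D) is an assumption, since the figure's timing did not survive extraction.

```python
def backward_validate(read_set, committed_write_sets):
    """Backward validation: the finishing transaction's reads are compared
    with the writes of overlapping transactions that already committed."""
    return all(read_set.isdisjoint(ws) for ws in committed_write_sets)


def forward_validate(write_set, active_read_sets):
    """Forward validation: the finishing transaction's writes are compared
    with the reads made so far by transactions that are still active."""
    return all(write_set.isdisjoint(rs) for rs in active_read_sets)


# Read/write sets taken from the figure (item numbers only).
reads = {"A": {1, 2, 4}, "B": {1, 2, 4}, "C": {1, 2, 4}, "D": {1, 2, 6, 8}}
writes = {"A": {3, 5}, "B": {6, 7}, "C": {9, 8}, "D": {8}}

# Backward: D ends last and has read items 6 and 8, written by B and C.
d_ok = backward_validate(reads["D"], [writes["A"], writes["B"], writes["C"]])

# Forward: B ends while D is active and has already read item 6.
b_ok = forward_validate(writes["B"], [reads["D"]])
```

Under these assumptions both checks detect the same conflict, but they differ in who can be aborted: backward validation can only abort the validating transaction (D), while forward validation lets B choose between aborting itself and aborting D. A read-only transaction has an empty write set and always passes forward validation, whereas backward validation must still check its reads.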
CSS434 DFS 14
File-Sharing Semantics
Immutable Shared-Files Semantics
[Figure: the server holds Version 1.0. Clients A and B each create a tentative version based on 1.0. The first one committed becomes Version 1.1; committing the second causes a version conflict, which is handled by one of three options: ignore the conflict (Version 1.2), merge (Version 1.2), or abort.]
The choice depends on each file system. Abort is simple (later, client A can decide to overwrite the file with its tentative version based on 1.0 by changing the corresponding directory).
CSS434 DFS 15
File-Caching Schemes
Cache Location

Location | Merits | Demerits
No caching | No modifications | Frequent disk access, busy network traffic
In server's main memory | One-time disk access, easy implementation, Unix-like file-sharing semantics | Busy network traffic
In client's disk | One-time network access, no size restriction | Cache consistency problem, file access semantics, frequent disk access, no diskless workstation
In client's main memory | Maximum performance, diskless workstation, scalability | Size restriction, cache consistency problem, file access semantics

[Figure: client and server separated by the node boundary, each with main memory and a disk. The file is on the server's disk; copies may be kept in the server's main memory, the client's main memory, and the client's disk.]
CSS434 DFS 16
File-Caching Schemes
Modification Propagation
Write-through scheme
  Pros: Unix-like semantics and high reliability
  Cons: Poor write performance
Delayed-write scheme
  Write on cache displacement
  Periodic write
  Write on close
  Pros:
    Write accesses complete quickly.
    Some writes may be omitted by the following writes.
    Gathering all writes mitigates network overhead.
  Cons:
    Delaying write propagation results in fuzzier file-sharing semantics.
[Figure: Clients 1 and 2 each cache a copy of the server's file in main memory. With immediate write, each write W by Client 1 goes straight to the server's file, giving a new version. With delayed write, W stays in Client 1's copy and reaches the server later.]
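The difference between the two propagation schemes can be sketched by counting the writes that reach the server. The class names and the writes_received counter are hypothetical, and write-on-close is used as the delayed-write policy.

```python
class Server:
    def __init__(self):
        self.blocks = {}
        self.writes_received = 0   # number of network writes seen

    def write(self, key, value):
        self.blocks[key] = value
        self.writes_received += 1


class WriteThroughCache:
    """Every write updates the cache and the server immediately."""

    def __init__(self, server):
        self.server = server
        self.cache = {}

    def write(self, key, value):
        self.cache[key] = value
        self.server.write(key, value)

    def close(self):
        pass   # nothing pending


class DelayedWriteCache:
    """Writes stay in the cache and are flushed on close."""

    def __init__(self, server):
        self.server = server
        self.cache = {}
        self.dirty = set()

    def write(self, key, value):
        self.cache[key] = value   # a later write replaces an earlier one
        self.dirty.add(key)

    def close(self):
        for key in sorted(self.dirty):
            self.server.write(key, self.cache[key])
        self.dirty.clear()
```

Three writes to the same block cost three server writes with write-through but only one with write-on-close; the price is that other clients cannot see any of them until the close.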
CSS434 DFS 17
File-Caching Schemes
Cache Validation Schemes: Client-Initiated Approach
Checking before every access (Unix-like semantics but too slow)
Checking periodically (better performance but fuzzy file-sharing semantics)
Checking on file open (simple, suitable for session semantics)
Problem: High network traffic
[Figure: Clients 1 and 2 each cache a copy of the server's file. Two pairings are shown: check before every access with write-through (delayed write?), and check-on-open with write-on-close (check-on-close?).]
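A minimal sketch of check-on-open, assuming the server keeps a version number per file and the client compares it with the version of its cached copy. The version counter, the class names, and the fetches counter are assumptions for illustration; real systems often compare modification timestamps instead.

```python
class Server:
    def __init__(self):
        self.files = {}   # name -> (version, data)

    def store(self, name, data):
        version = self.files.get(name, (0, None))[0] + 1
        self.files[name] = (version, data)

    def get_version(self, name):
        return self.files[name][0]   # the cheap validation request

    def fetch(self, name):
        return self.files[name]      # the expensive transfer


class Client:
    def __init__(self, server):
        self.server = server
        self.cache = {}    # name -> (version, data)
        self.fetches = 0

    def open(self, name):
        cached = self.cache.get(name)
        # Client-initiated validation: ask the server on every open.
        if cached is None or cached[0] != self.server.get_version(name):
            self.cache[name] = self.server.fetch(name)
            self.fetches += 1
        return self.cache[name][1]
```

The server keeps no record of which clients hold copies, so it can stay stateless; the cost is one validation message per open even when nothing has changed.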
CSS434 DFS 18
File-Caching Schemes
Cache Validation Schemes: Server-Initiated Approach
Keeping track of clients having a copy
Denying a new request, queuing it, and disabling caching
Notifying all clients of any update on the original file
Problem:
  Violating the client-server model
  Stateful servers
  Check-on-open still needed for the 2nd file opening
[Figure: Clients 1 and 2 hold copies of the server's file and a write W by one of them reaches the server (write-through or delayed write?). The server notifies (invalidates) the copy held by Client 3 and denies a new open from Client 4.]
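A minimal sketch of server-initiated invalidation, assuming the server records which clients hold a copy of each file and calls back every other holder when the file is written. The class and method names are hypothetical, and a direct method call stands in for the callback RPC.

```python
class Client:
    def __init__(self, name):
        self.name = name
        self.cache = {}   # filename -> data

    def invalidate(self, filename):
        # Callback from the server: drop the stale copy.
        self.cache.pop(filename, None)


class Server:
    def __init__(self):
        self.files = {}     # filename -> data
        self.holders = {}   # filename -> set of clients with a copy (state)

    def read(self, client, filename):
        self.holders.setdefault(filename, set()).add(client)
        client.cache[filename] = self.files[filename]
        return client.cache[filename]

    def write(self, writer, filename, data):
        self.files[filename] = data
        # Notify every other client that cached the file.
        for other in self.holders.get(filename, set()) - {writer}:
            other.invalidate(filename)
        self.holders[filename] = {writer}
        writer.cache[filename] = data
```

The holders table is exactly the per-client state that makes the server stateful, and the server calling a client is the reversal of roles the slide lists as a violation of the client-server model.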
CSS434 DFS 19
Homework Assignment 4
Session semantics
Client-side/server-side caching
Server-initiated invalidation
[Figure: a server whose /tmp holds file1 and file2, and Clients 1 and 2 (cwd /tmp), each caching file1 and editing it with emacs. chmod 600 and chmod 400 are applied to the cached copies. Functions shown: download( ), upload( ), invalidate( ), writeback( ).]

Server table:
name | readers | owner | state
file1 | client2 | client1 | wShare
file2 | client3 | | rShare

Client 1 table:
Name | Access | Owner | state
file1 | write | true | wOwn

Client 2 table:
Name | Access | Owner | state
file1 | read | false | rShare
CSS434 DFS 20
File Access Improvements
Data sieving for a single client
  Read a larger contiguous file portion
  Extract actual file portions from it
Collective I/O for multiple clients
  Read contiguous space, thereafter distribute subspaces to each client
  Disk-directed I/O
  Server-directed I/O
  Two-phase I/O (client-directed)
CSS434 DFS 21
Data Sieving
User’s request for non-contiguous file portions
Read a larger contiguous block into memory
Copy requested portions into user’s buffer
(from R. Thakur’s Data Sieving and Collective I/O in ROMIO, 1998)
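The three steps above can be sketched as follows. This is an illustration only, not ROMIO's implementation: the function sieve_read and the read_contiguous callback are hypothetical names, and requests are assumed to be (offset, length) pairs in bytes.

```python
def sieve_read(read_contiguous, requests):
    """Serve several non-contiguous requests with one contiguous read.

    read_contiguous(offset, length) returns bytes from the file.
    requests is a list of (offset, length) pairs.
    """
    if not requests:
        return []
    # Smallest contiguous extent that covers every requested portion.
    start = min(offset for offset, _ in requests)
    end = max(offset + length for offset, length in requests)
    block = read_contiguous(start, end - start)   # single large read
    # Copy only the requested portions into the caller's buffers.
    return [block[offset - start:offset - start + length]
            for offset, length in requests]
```

The trade-off is extra data read (the holes between requests) in exchange for far fewer I/O calls, so it pays off when the requested portions are close together.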
CSS434 DFS 22
Two-Phase I/O
[Figure: in phase 1, each of P0, P1, P2, and P3 reads a contiguous portion of the file. In phase 2, the processes redistribute the data among themselves so that each one ends up with the portions it actually needs.]
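The two phases can be sketched in a single process, with lists standing in for the per-process buffers and the redistribution messages. The function name two_phase_read and the wants callback (which maps an element index to the rank that needs it) are assumptions for illustration; the usage lines assume a cyclic distribution over four processes.

```python
def two_phase_read(file_data, nprocs, wants):
    """Return (staged, result): the data each rank read contiguously in
    phase 1 and the data each rank holds after redistribution in phase 2."""
    n = len(file_data)
    chunk = -(-n // nprocs)   # ceiling division

    # Phase 1: rank r reads one contiguous chunk, whatever it contains.
    staged = [file_data[r * chunk:(r + 1) * chunk] for r in range(nprocs)]

    # Phase 2: every element is sent to the rank that actually wants it.
    result = [[] for _ in range(nprocs)]
    for r, part in enumerate(staged):
        for k, item in enumerate(part):
            result[wants(r * chunk + k)].append(item)
    return staged, result


# Eight elements, four processes, element i wanted by rank i % 4.
staged, result = two_phase_read(list(range(8)), 4, lambda i: i % 4)
```

Each process issues one large contiguous read instead of many small strided ones, and the fine-grained shuffling happens over the interconnect, which is usually much faster than the disks.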
CSS434 DFS 23
File Stripes Transfer in a Hierarchy
(from Fukuda/Miyauchi, Journal of Supercomputing)
[Figure: a commander (Id: 0) above a tree of sentinels rooted at the root sentinel (Id: 2), with sentinels Id: 8, 9, 32, 33, 36, 37, 38, 39, 128, 129, 130, 131, 132, and 528. Files read through the GUI are kept as key/value pairs, where the key names the destination sentinel and the stripe (32_inputFile1_0, 32_inputFile2_0, 128_inputFile1_1, 528_inputFile1_7, 528_inputFile2_7) and the value is the stripe's contents. The stripes are passed down the hierarchy toward sentinels 32, 128, and 528.]
CSS434 DFS 24
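DFS Example
Sun NFS
[Figure: Client A, Client B, and the server. On each machine, user processes access files through VFS, which dispatches to the local FS or to NFS. Clients run an NFS client with an RPC stub; the server runs an NFS server with an RPC stub. The server (/ with usr, bin, org) exports directories, which Client A (/ with usr, bin, shared) and Client B (/ with opt, bin, shared) mount under shared.]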
CSS434 DFS 25
Sun NFS
Installation
Server:
  Check if NFS is running: rpcinfo -p
  Start NFS: /etc/rc.d/init.d/nfs start
  Edit /etc/exports file: /dir/to/export client1(permissions), client2(…
  Export dirs in /etc/exports: exportfs -a
  Check exported directories: showmount -e
Client:
  Import a server's directory: mount -o options server_name:/dir /my_dir
    bg: keep retrying the import in the background upon a failure
    intr: a process will be interrupted if its I/O request to the server dir is pending
    soft: allows a client to time out the connection after a number of retries
    rw/ro: normal r/w or read only
Underlying connections:
[Figure: the client asks the portmapper for the NFS mount service port, mountd checks permission, and NFS RPCs then go to nfs on port 2049.]
CSS434 DFS 26
Sun NFS
Overviews
Communication
  RPC: a compound procedure
    Lookup, Open, and Read
Server status
  Stateless: simple implementation in ver 3.
  Stateful: allowing clients to cache files in ver 4.
    RPC callback from a server to invalidate a client's cache
Synchronization
  Session semantics
  File locking in ver 4: lock, lockt, locku, and renew
    Ex. Emacs: tests with lockt when modifying a buffer, locks the file with lock, and unlocks it with locku after writing the buffer contents to the file.
  Share reservation: specify how to share a file (with ro, wo, or r/w)
CSS434 DFS 27
Sun NFS
Overviews (Cont'd)
Caching
  In client's memory
  Session semantics
  Revalidation of client's cache upon re-opening the same file
  Open delegation:
    A server delegates an open decision to a writing client, which can then handle open requests from other clients on the same machine.
    A server calls back the client when receiving an open request from another machine.
Fault Tolerance
  RPC failure: use a duplicate-request cache
  File locking failure: provide a grace period during which a client can reclaim locks previously granted and the server rebuilds its previous state.
CSS434 DFS 28
Sun NFS
Duplicate Request Cache
[Figure: three client/server timelines in which the client retransmits a request with XID = 1234.
  Case 1: the retransmission arrives before the transaction has completed. Too soon, ignore.
  Case 2: the retransmission arrives right after the reply was sent. Just replied, ignore.
  Case 3: the retransmission arrives after the transaction completed and the reply was sent; the server replies again.]
Then, when does the server delete this cached result?
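A simplified sketch of a duplicate-request cache keyed by XID. It models only two of the cases in the figure (a duplicate that arrives while the request is in progress, and a duplicate that arrives after completion); the "just replied" timing case and the eviction policy the slide asks about are deliberately left out. The class name, the start/finish split, and the returned strings are hypothetical.

```python
class DuplicateRequestCache:
    """Keeps non-idempotent requests from being executed twice."""

    def __init__(self, handler):
        self.handler = handler     # executes a request, returns a reply
        self.in_progress = set()   # XIDs currently being executed
        self.replies = {}          # XID -> reply saved after completion
        self.executed = 0

    def start(self, xid):
        """Classify an arriving request: 'new', 'ignore', or 'resend'."""
        if xid in self.in_progress:
            return "ignore"        # too soon: the original is still running
        if xid in self.replies:
            return "resend"        # answer from the cache, do not re-execute
        self.in_progress.add(xid)
        return "new"

    def finish(self, xid, request):
        """Execute the request once and remember its reply."""
        reply = self.handler(request)
        self.executed += 1
        self.in_progress.discard(xid)
        self.replies[xid] = reply
        return reply
```

Because replies accumulate in the cache, a real server has to bound it, for example by entry age or cache size, which is the question the slide leaves open.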
CSS434 DFS 29
DFS Example
Andrew File System
[Figure: workstations and servers connected by the network. Each workstation runs user programs and a Venus process on top of a UNIX kernel; each server runs a Vice process on top of a UNIX kernel.]
CSS434 DFS 30
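AFS
File Name Space
[Figure: the client's name space (/ with usr, tmp, bin) is divided into a local part and a shared part. Local files are kept in the client's Unix FS, and symbolic links lead into the shared space. The client's user process reaches shared files through the Venus process and its cache; the server's Vice process serves them from the server's Unix FS.]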
CSS434 DFS 31
AFS
System Call Interception
[Figure: on a workstation, a user program issues UNIX file system calls to the UNIX kernel. The kernel serves local files through the UNIX file system on the local disk and passes non-local file operations to Venus.]
CSS434 DFS 32
AFS
Implementation of file system calls
(Participants: User process, UNIX kernel, Venus, Net, Vice)

open(FileName, mode)
  UNIX kernel: If FileName refers to a file in shared file space, pass the request to Venus.
  Venus: Check list of files in local cache. If not present or there is no valid callback promise, send a request for the file to the Vice server that is custodian of the volume containing the file.
  Vice: Transfer a copy of the file and a callback promise to the workstation. Log the callback promise.
  Venus: Place the copy of the file in the local file system, enter its local name in the local cache list and return the local name to UNIX.
  UNIX kernel: Open the local file and return the file descriptor to the application.

read(FileDescriptor, Buffer, length)
  UNIX kernel: Perform a normal UNIX read operation on the local copy.

write(FileDescriptor, Buffer, length)
  UNIX kernel: Perform a normal UNIX write operation on the local copy.

close(FileDescriptor)
  UNIX kernel: Close the local copy and notify Venus that the file has been closed.
  Venus: If the local copy has been changed, send a copy to the Vice server that is the custodian of the file.
  Vice: Replace the file contents and send a callback to all other clients holding callback promises on the file.
CSS434 DFS 33
DFS Example
xFS
[Figure: clients, metadata managers, and storage servers connected by a LAN.]
Write path:
  1: Write requests
  2: Log them in a segment
  3: Fragment the segment and send the fragments to a stripe group of servers
Read path:
  1: Read request
  2: Query a manager
  3: Collaborative caching (read data from another client if possible)
CSS434 DFS 34
DFS Example
Plan 9
[Figure: File server 1, File server 2, a computation server, and a network interface each export a directory tree; the network interface exports net, which gives access to the Internet. The client imports these trees into its own name space. Features illustrated: union directory, remote execution, and network access.]
CSS434 DFS 35
Paper Review by Students
  Sun NFS
  Andrew File System
  xFS
  Plan 9
  LFS
Discussions
  What file-sharing semantics is each system based on?
  Which systems use server-side caching?
  Which systems use client-side caching?
  Which systems use client-initiated validation?
  Which systems use server-initiated validation?
CSS434 DFS 36
Non-Turn-In Exercises
Q1. In transaction-like semantics, a.k.a. concurrency control, compare the pros and cons of backward and forward validation. In particular, consider the case where each transaction includes more read than write operations.
  Backward validation
    Pros:
    Cons:
  Forward validation
    Pros:
    Cons:
Q2. Answer the following questions about file caching. When you are asked to show which systems use a given caching scheme, choose all applicable systems from NFS, AFS, xFS, and Plan 9.
Q2-1. Why can file caching contribute to performance improvement? Give two reasons.
  Reason 1:
  Reason 2:
Q2-2. State one merit of using server-side caching. Which system uses server-side caching?
  Merit:
  System: Plan 9 (Answer)
CSS434 DFS 37
Non-Turn-In Exercises
Q2-3. Client-side caching allows multiple clients to cache the same file. There are two schemes to validate the contents of a locally cached file (or to invalidate the contents of the same file cached at remote clients): client-initiated and server-initiated validation. Does client-initiated validation require a file server to be stateful? Justify your answer. Also show which systems use client-initiated validation.
  Stateless or stateful?
  Reason:
  Systems: NFS, Plan 9 (Answer)
Q2-4. Does server-initiated validation require a file server to be stateful? Justify your answer. Also show which systems use server-initiated validation.
  Stateless or stateful?
  Reason:
  Systems: AFS, xFS (Answer)