Distributed Systems
CS 15-440
Distributed File Systems- Part II
Lecture 20, Nov 16, 2011
Majd F. Sakr, Mohammad Hammoud and Vinay Kolar
Today…
Last session: Distributed File Systems – Part I
Today’s session: Distributed File Systems – Part II
Announcements:
• Problem solving assignment 4 (the last PS) has been posted and is due by Dec 7
• Project 4 (the last project) will be posted before the end of this week
Discussion on Distributed File Systems
Distributed File Systems (DFSs)
DFS Aspects

Aspect: Description
Architecture: How are DFSs generally organized?
Processes: Who are the cooperating processes? Are processes stateful or stateless?
Communication: What is the typical communication paradigm followed by DFSs? How do processes in DFSs communicate?
Naming: How is naming often handled in DFSs?
Synchronization: What are the file sharing semantics adopted by DFSs?
Consistency and Replication: What are the various features of client-side caching as well as server-side replication?
Fault Tolerance: How is fault tolerance handled in DFSs?
Naming In DFSs
NFS is considered representative of how naming is handled in DFSs
Naming In NFS
The fundamental idea underlying the NFS naming model is to provide clients with complete transparency
Transparency in NFS is achieved by allowing a client to mount a remote file system into its own local file system
However, instead of mounting an entire file system, NFS allows clients to mount only part of a file system
A server is said to export a directory when it makes that directory, and its entries, available to clients; a client, in turn, mounts the exported directory into its own name space
Mounting in NFS
[Figure] The server exports the directory /users/steen. Client A mounts the steen subdirectory at /remote/vu, so the file appears there as /remote/vu/mbox; Client B mounts the same subdirectory at /work/me, so the same file appears as /work/me/mbox. Because the same file ends up with different names on different clients, sharing files becomes harder.
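The effect of per-client mount points can be sketched with a toy namespace model (plain Python dictionaries; none of this is NFS's actual interface, and the directory names are taken from the example above):

```python
# Toy model of NFS-style mounting: a namespace is a nested dict, and
# mounting grafts a server's exported subtree at a client-chosen point.

server = {"users": {"steen": {"mbox": "mail data"}}}

def mount(client_ns, mount_path, exported_subtree):
    """Graft an exported subtree into the client namespace at mount_path."""
    node = client_ns
    parts = mount_path.strip("/").split("/")
    for part in parts[:-1]:
        node = node.setdefault(part, {})
    node[parts[-1]] = exported_subtree

def lookup(ns, path):
    """Resolve a path component by component in a namespace."""
    node = ns
    for part in path.strip("/").split("/"):
        node = node[part]
    return node

# Both clients mount the same exported directory /users/steen,
# but at different local mount points.
client_a, client_b = {}, {}
mount(client_a, "/remote/vu", server["users"]["steen"])
mount(client_b, "/work/me", server["users"]["steen"])

# The same file now has two different names on the two clients.
assert lookup(client_a, "/remote/vu/mbox") == lookup(client_b, "/work/me/mbox")
```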
Sharing Files In NFS
A common solution for sharing files in NFS is to provide each client with a name space that is partly standardized
For example, each client may use the local directory /usr/bin as a mount point
A remote file system can then be mounted in the same manner for each user
Example
[Figure] The server again exports /users/steen, and both clients mount the steen subdirectory at the standardized local directory /usr/bin. The file is now named /usr/bin/mbox at both Client A and Client B, and sharing files is resolved.
Mounting Nested Directories In NFSv3
An NFS server, S, can itself mount directories, Ds, that are exported by other servers
However, in NFSv3, S is not allowed to export Ds to its own clients
Instead, a client of S has to mount Ds explicitly
If S were allowed to export Ds, it would have to return to its clients file handles that include identifiers for the exporting servers
NFSv4 solves this problem
Mounting Nested Directories in NFS
[Figure] The client imports a directory from server A, and server A itself imports a directory (install) from server B. In NFSv3 the client needs to explicitly import that subdirectory from server B.
NFS: Mounting Upon Logging In (1)
Another problem with the NFS naming model has to do with deciding when a remote file system should be mounted
Example: Let us assume a large system with thousands of users and that each user has a local directory /home that is used to mount the home directories of other users
Alice’s (a user) home directory is made locally available to her as /home/alice
This directory can be automatically mounted when Alice logs into her workstation
In addition, Alice may have access to Bob’s (another user) public files by accessing Bob’s directory through /home/bob
NFS: Mounting Upon Logging In (2)
Example (Cont’d):
The question, however, is whether Bob’s home directory should also be mounted automatically when Alice logs in
If automatic mounting is followed for each user:
• Logging in could incur a lot of communication and administrative overhead
• All users would have to be known in advance
A better approach is to transparently mount another user’s home directory on demand
On-Demand Mounting In NFS
On-demand mounting of a remote file system is handled in NFS by an automounter, which runs as a separate process on the client’s machine
[Figure] On the client machine, the NFS client, the automounter, and the local file system interface cooperate with the server machine (which holds /users/alice) as follows:
1. A lookup of “/home/alice” arrives at the local file system interface
2. The automounter creates the subdirectory “alice” under /home
3. The automounter issues a mount request to the server machine
4. The subdirectory “alice” is mounted from the server
Synchronization In DFSs
• File sharing semantics
• Lock management
Unix Semantics In Single Processor Systems
Synchronization for file systems would not be an issue if files were not shared
When two or more users share the same file at the same time, it is necessary to define the semantics of reading and writing
In single processor systems, a read operation after a write will return the value just written
Such a model is referred to as Unix Semantics
[Figure] On a single machine, the original file contains “ab”. Process A writes “c”, after which a read by process B returns “abc”.
Unix Semantics In DFSs
In a DFS, Unix semantics can be achieved easily if there is only one file server and clients do not cache files
Hence, all reads and writes go directly to the file server, which processes them strictly sequentially
This approach provides UNIX semantics; however, performance may degrade because all file requests must go to a single server
Caching and Unix Semantics
The performance of a DFS with one single file server and Unix semantics can be improved by caching
If a client locally modifies a cached file, however, and shortly afterwards another client reads the file from the server, that client will get an obsolete file
[Figure] Both client machines have cached the file “ab” from the file server:
1. Client machine #1 reads “ab”
2. Process A on client machine #1 writes “c”, so its cached copy becomes “abc”
3. Process B on client machine #2 reads from the server and still gets “ab”
Session Semantics (1)
One way to avoid obsolete copies is to propagate all changes to cached files back to the server immediately
Implementing such an approach is very difficult
An alternative solution is to relax the semantics of file sharing:
Session semantics: changes to an open file are initially visible only to the process that modified the file; only when the file is closed are the changes made visible to other processes
Session Semantics (2)
Using session semantics raises the question of what happens if two or more clients are simultaneously caching and modifying the same file
One solution is to say that as each file is closed in turn, its value is sent back to the server
The final result depends on whose close request is most recently processed by the server
A less pleasant solution, but easier to implement, is to say that the final result is one of the candidates and leave the choice of the candidate unspecified
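Session semantics, including the last-close-wins outcome, can be sketched with a toy model (this is not any real DFS API; the class and method names are illustrative):

```python
# Sketch of session semantics: each client works on a private copy of
# the file; changes reach the server only on close, and the final
# content is whichever close the server processes last.

class Server:
    def __init__(self, content):
        self.content = content

class Session:
    def __init__(self, server):
        self.server = server
        self.copy = server.content        # open(): fetch a private copy
    def write(self, data):
        self.copy += data                 # visible only to this session
    def close(self):
        self.server.content = self.copy   # propagate changes on close

srv = Server("ab")
s1, s2 = Session(srv), Session(srv)       # both clients open the file
s1.write("c")
s2.write("d")
assert srv.content == "ab"                # neither change visible yet
s1.close()
s2.close()                                # processed last -> wins
assert srv.content == "abd"
```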
Immutable Semantics (1)
A different approach to the semantics of file sharing in DFSs is to make all files immutable
With immutable semantics there is no way to open a file for writing
What is possible is to create an entirely new file
Hence, the problem of how to deal with two processes, one writing and the other reading, just disappears
Immutable Semantics (2)
However, what happens if two processes try to replace the same file?
Allow one of the new files to replace the old one (either the last one or non-deterministically)
What to do if a file is replaced while another process is busy reading it?
Solution 1: Arrange for the reader to continue using the old file
Solution 2: Detect that the file has changed and make subsequent attempts to read from it fail
Atomic Transactions
A different approach to the semantics of file sharing in DFSs is to use atomic transactions where all changes occur atomically
A key property is that all calls contained in a transaction are carried out indivisibly
A process first executes some type of BEGIN_TRANSACTION primitive to signal that what follows must be executed indivisibly
Then come system calls to read and write one or more files
When done, an END_TRANSACTION primitive is executed
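The BEGIN_TRANSACTION/END_TRANSACTION pattern can be sketched with a toy in-memory store (illustrative only; real transactional file systems also handle aborts and concurrency control):

```python
# Sketch of transactional file sharing: writes between BEGIN and END
# are staged privately and applied to the store indivisibly on commit.

class FileStore:
    def __init__(self):
        self.files = {"f": "ab"}

class Transaction:
    def __init__(self, store):          # BEGIN_TRANSACTION
        self.store = store
        self.staged = {}
    def read(self, name):
        # Reads see this transaction's own staged writes first.
        return self.staged.get(name, self.store.files[name])
    def write(self, name, data):
        self.staged[name] = data        # not yet visible to others
    def commit(self):                   # END_TRANSACTION
        self.store.files.update(self.staged)  # applied all at once

store = FileStore()
t = Transaction(store)
t.write("f", t.read("f") + "c")
assert store.files["f"] == "ab"         # invisible until END_TRANSACTION
t.commit()
assert store.files["f"] == "abc"
```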
Semantics of File Sharing: Summary
There are four ways of dealing with shared files in a DFS:

Method: Comment
UNIX Semantics: Every operation on a file is instantly visible to all processes
Session Semantics: No changes are visible to other processes until the file is closed
Immutable Files: No updates are possible; simplifies sharing and replication
Transactions: All changes occur atomically
Synchronization In DFSs
• File sharing semantics
• Lock management
Central Lock Manager
In client-server architectures (especially with stateless servers), additional facilities for synchronizing accesses to shared files are required
A central lock manager can be deployed where accesses to a shared resource are synchronized by granting and denying access permissions
[Figure] Three panels show processes P0, P1, and P2 with a central lock manager that keeps a request queue:
1. P0 sends a lock request and the lock is granted (a lease is obtained)
2. P1 sends a lock request while the lock is held; the request is denied and queued
3. When P0 releases the lock (or its lease expires), the lock is granted to the next process in the queue
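The grant/deny/queue behaviour of a central lock manager can be sketched as follows (a toy model; leases and fault handling are omitted):

```python
from collections import deque

# Sketch of a central lock manager: one holder at a time; waiting
# processes are queued and granted the lock on release.

class LockManager:
    def __init__(self):
        self.holder = None
        self.queue = deque()
    def request(self, pid):
        if self.holder is None:
            self.holder = pid
            return "granted"
        self.queue.append(pid)          # denied for now, queued
        return "queued"
    def release(self, pid):
        assert self.holder == pid       # only the holder may release
        self.holder = self.queue.popleft() if self.queue else None

lm = LockManager()
assert lm.request("P0") == "granted"
assert lm.request("P1") == "queued"
lm.release("P0")
assert lm.holder == "P1"                # next in queue gets the lock
```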
File Locking In NFSv4
NFSv4 distinguishes read locks from write locks
Multiple clients can simultaneously access the same part of a file provided they only read data
A write lock is needed to obtain exclusive access to modify part of a file
NFSv4 operations related to file locking are:
Operation: Description
Lock: Create a lock for a range of bytes
Lockt: Test whether a conflicting lock has been granted
Locku: Remove a lock from a range of bytes
Renew: Renew the lease on a specified lock
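On Unix clients, byte-range locks in the spirit of NFSv4's Lock/Locku operations are exposed to applications as POSIX advisory locks. A minimal sketch using Python's fcntl module (Unix-only; whether the locks are forwarded to an NFS server depends on the mount, so this only illustrates the byte-range interface):

```python
import fcntl
import os
import tempfile

# Byte-range locking: a shared (read) lock on a range of bytes can
# coexist with other readers; an exclusive (write) lock cannot.

path = os.path.join(tempfile.mkdtemp(), "shared.dat")
with open(path, "wb") as f:
    f.write(b"0" * 100)

fd = os.open(path, os.O_RDWR)
# "Lock" with a shared (read) lock on bytes 0..9
fcntl.lockf(fd, fcntl.LOCK_SH, 10, 0)
# Upgrade to an exclusive (write) lock on the same range
fcntl.lockf(fd, fcntl.LOCK_EX, 10, 0)
# "Locku": remove the lock from the range
fcntl.lockf(fd, fcntl.LOCK_UN, 10, 0)
os.close(fd)
```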
Sharing Files in Coda
When a client successfully opens a file f, an entire copy of f is transferred to the client’s machine
The server records that the client has a copy of f
If client A has opened f for writing and another client B wants to open f as well (for reading or writing), B’s attempt will fail
If client A has opened f for reading, an attempt by B for reading succeeds
An attempt by B to open for writing would succeed as well
DFS Aspects
Aspect Description
Architecture How are DFSs generally organized?
Processes • Who are the cooperating processes? • Are processes stateful or stateless?
Communication • What is the typical communication paradigm followed by DFSs?
• How do processes in DFSs communicate?
Naming How is naming often handled in DFSs?
Synchronization What are the file sharing semantics adopted by DFSs?
Consistency and Replication What are the various features of client-side caching as well as server-side replication?
Fault Tolerance How is fault tolerance handled in DFSs?
Aspect Description
Architecture How are DFSs generally organized?
Processes • Who are the cooperating processes? • Are processes stateful or stateless?
Communication • What is the typical communication paradigm followed by DFSs?
• How do processes in DFSs communicate?
Naming How is naming often handled in DFSs?
Synchronization What are the file sharing semantics adopted by DFSs?
Consistency and Replication What are the various features of client-side caching as well as server-side replication?
Fault Tolerance How is fault tolerance handled in DFSs?
Consistency and Replication In DFSs
• Client-side caching
• Server-side replication
Client-Side Caching In Coda
Caching and replication play an important role in DFSs, most notably when they are designed to operate over WANs
To see how client-side caching is deployed in practice, we discuss client-side caching in Coda
Clients in Coda always cache entire files, regardless of whether the file is opened for reading or writing
Cache coherence in Coda is maintained by means of callbacks
Callback Promise and Break
For each file, the server from which a client had cached the file keeps track of which clients have a copy of that file
A server is said to record a callback promise
When a client updates its local copy of a file for the first time, it notifies the server
Subsequently, the server sends an invalidation message to other clients
Such an invalidation message is called callback break
Using Cached Copies in Coda
The interesting aspect of client-side caching in Coda is that as long as a client knows it has an outstanding callback promise at the server, it can safely access the file locally
[Figure] Client A, the server, and client B interact as follows:
1. Client A opens f for reading (session SA); f is transferred and the server records a callback promise for A
2. Client B opens f for reading and writing (session SB) and updates it; the server sends A an invalidation message (callback break), then B closes
3. B opens f again (session S’B); its callback promise is still valid, so its cached copy is OK and no file transfer is needed
4. After B closes, A opens f for reading again (session S’A); A’s promise has been broken, so its cached copy is not OK and a fresh copy of f is transferred
Consistency and Replication In DFSs
Client-Side Caching Server-Side Replication
Server-Side Replication
Server-side replication in DFSs is applied (as usual) for fault-tolerance and performance
However, a problem with server-side replication is that a combination of a high degree of replication and a low read/write ratio may degrade performance
For an N-fold replicated file, a single update request will lead to an N-fold increase of update operations
Concurrent updates need to be synchronized
Storage Groups in Coda
The unit of replication in Coda is a collection of files called a volume
The collection of Coda servers that have a copy of a volume are known as that volume’s Volume Storage Group (VSG)
In the presence of failures, a client may not have access to all servers in a volume’s VSG
A client’s Accessible Volume Storage Group (AVSG) for a volume consists of those servers in that volume’s VSG that the client can currently access
If the AVSG is empty, the client is said to be disconnected
Maintaining Consistency in Coda
Coda uses a variant of Read-One, Write-All (ROWA) to maintain consistency of a replicated volume
When a client needs to read a file, it contacts one of the members in its AVSG of the volume to which that file belongs
When closing a session on an updated file, the client transfers it in parallel to each member in the AVSG
The scheme works fine as long as there are no failures (i.e., each client’s AVSG of a volume equals the volume’s VSG)
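The read-one, write-all pattern over the AVSG can be sketched as follows (a toy model; the server names match the example that follows, and the failure-free synchronization machinery is omitted):

```python
# Sketch of Coda's ROWA variant: read from one member of the AVSG,
# write (on session close) to all members of the AVSG.

class Replica:
    def __init__(self):
        self.content = "v1"

vsg = {"S1": Replica(), "S2": Replica(), "S3": Replica()}
avsg = ["S1", "S2"]                      # S3 unreachable for this client

def read(avsg):
    return vsg[avsg[0]].content          # read-one: any accessible member

def write_all(avsg, content):
    for name in avsg:                    # write-all accessible members
        vsg[name].content = content

write_all(avsg, read(avsg) + "+update")
assert vsg["S1"].content == vsg["S2"].content == "v1+update"
assert vsg["S3"].content == "v1"         # missed the update
```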
A Consistency Example (1)
Consider a volume that is replicated across 3 servers S1, S2, and S3
For client A, assume its AVSG covers servers S1 and S2, whereas client B has access only to server S3
[Figure] Servers S1, S2, and S3 with clients A and B; a broken network separates {S1, S2, client A} from {S3, client B}
A Consistency Example (2)
Coda allows both clients, A and B:
• To open a replicated file f for writing
• To update their respective copies
• To transfer their copies back to the members in their AVSG
Obviously, there will be different versions of f stored in the VSG
The question is how this inconsistency can be detected and resolved
The solution adopted by Coda is deploying a versioning scheme
Versioning in Coda (1)
The versioning scheme in Coda entails that a server Si in a VSG maintains a Coda Version Vector CVVi(f) for each file f contained in the VSG
If CVVi(f)[j] = k, then server Si knows that server Sj has seen at least version k of file f
CVVi(f)[i] is the number of the current version of f stored at server Si
An update of f at server Si will lead to an increment of CVVi(f)[i]
Versioning in Coda (2)
CVV1(f) = CVV2(f) = CVV3(f) = [1, 1, 1] (initially)
When client A reads f from one of the servers in its AVSG, say S1, it also receives CVV1(f)
After updating f, client A multicasts f to each server in its AVSG (i.e., S1 and S2)
Versioning in Coda (3)
S1 and S2 will then record that their respective copies have been updated, but not that of S3 (i.e., CVV1(f) = CVV2(f) = [2, 2, 1])
Meanwhile, client B is allowed to open a session in which it receives a copy of f from server S3
If so, client B may subsequently update f
Versioning in Coda (4)
Say, client B then closes its session and transfers the update to S3
S3 updates its version vector to CVV3(f) = [1, 1, 2]
When the partition is healed, the three servers will notice that a conflict has occurred and that it needs to be repaired (i.e., the inconsistency is detected)
How the inconsistency is resolved is discussed by Kumar and Satyanarayanan (1995)
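Conflict detection with version vectors reduces to checking whether either vector dominates the other; a sketch using the vectors from the example above (function names are illustrative):

```python
# Sketch of Coda version-vector comparison: a conflict exists when
# neither vector dominates the other, i.e. each side has seen an
# update the other side missed.

def dominates(v, w):
    return all(vi >= wi for vi, wi in zip(v, w))

def conflict(v, w):
    return not dominates(v, w) and not dominates(w, v)

# Vectors from the partition example: A's update reached S1 and S2,
# B's update reached only S3.
cvv1 = [2, 2, 1]        # at S1 (and S2)
cvv3 = [1, 1, 2]        # at S3
assert conflict(cvv1, cvv3)         # detected when the partition heals

# Had B never updated f, S3 would still hold [1, 1, 1]: no conflict,
# since S1's version simply supersedes it.
assert not conflict(cvv1, [1, 1, 1])
```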
Fault Tolerance In DFSs
• Fault Tolerance in DFSs is typically handled according to the principles we discussed in the Fault Tolerance lectures
• Hence, we will concentrate mainly on some special issues in fault tolerance for DFSs
Handling Byzantine Failures
One of the problems that is often ignored when dealing with fault tolerance in DFSs is that servers may exhibit Byzantine failures
To achieve protection against Byzantine failures, the server group must consist of at least 3k+1 processes (assuming that at most k processes fail at once)
In practical settings, such a protection can only be achieved if non-faulty processes are ensured to execute all operations in the same order (Castro and Liskov, 2002)
Each request can be tagged with a sequence number, and a single coordinator can be used to serialize all operations
Quorum Mechanism (1)
The coordinator itself can fail
Furthermore, processes can go through a series of views, where:
• In each view, the members of a group agree on the non-faulty processes
• A member initiates a view change when the current coordinator appears to be failing
To this end, a quorum mechanism is used
Quorum Mechanism (2)
The quorum mechanism goes as follows:
• Whenever a process P receives a request to execute an operation o with a sequence number n in a view v, it sends this request to all other processes
• P waits until it has received a confirmation from at least 2k other processes that have seen the same request
• Such a set of confirmations is called a quorum certificate; a quorum of size 2k+1 is said to be obtained
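The quorum-certificate test can be sketched as follows (a toy model; the message tuple format is illustrative):

```python
# Sketch of the quorum test: with at most k Byzantine processes, a
# process accepts a request once 2k matching confirmations from other
# processes (plus its own copy) form a quorum of size 2k+1.

def has_quorum_certificate(confirmations, request, k):
    matching = sum(1 for c in confirmations if c == request)
    return matching >= 2 * k            # own copy makes it 2k+1

k = 1
req = ("op", 7, "view-0")               # (operation, sequence number, view)
confirms = [("op", 7, "view-0"),
            ("op", 7, "view-0"),
            ("bogus", 9, "view-0")]     # a faulty process disagrees
assert has_quorum_certificate(confirms, req, k)        # 2 matching = 2k
assert not has_quorum_certificate(confirms[:1], req, k)
```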
Quorum Mechanism (3)
The whole quorum mechanism consists of 5 phases, involving the client, a master, and the replicas:
1. Request: the client sends a request to the master
2. Pre-prepare: the master multicasts a sequence number for the request
3. Prepare: each replica multicasts its acceptance to the others
4. Commit: agreement is reached; all processes inform each other and execute the operation
5. Reply: the client is given the result
Next Class
Virtualization