2012 Storage Developer Conference. © Insert Your Company Name. All Rights Reserved.
Design and Implementation of SMB Locking in a Clustered File System
Aravind Srinivasan
EMC, Isilon Storage Division
Agenda
Overview
OneFS Overview
Fundamentals of Distributed Locking
Challenges in implementing distributed locking
Design and Implementation of DLM in OneFS
Implementation of SMB locking on top of the DLM in OneFS
Overview
Any clustered file system needs a robust Distributed Lock Manager (DLM) to synchronize access to shared resources
A file sharing protocol such as SMB must use the DLM appropriately to regulate access to files from multiple clients
OneFS Overview
Isilon OneFS Cluster
NAS file server
Scalable
Add more storage in 5 mins
Reliable
8x mirror / +4 parity
Striped across nodes
Single volume file system
3 to 144 nodes
Fully symmetric peers
No metadata servers
Commodity hardware
CPU, Mem, Disks
Isilon OneFS File System
Concurrent access to all files with all protocols
SMB1/SMB2
NFSv3/NFSv4
SSH
HTTP/FTP
OneFS – High Level Overview
OneFS is Isilon's sixth-generation operating system that provides the intelligence behind all Isilon scale-out storage systems.
It combines the three layers of traditional storage architectures—file system, volume manager and RAID—into one unified software layer, creating a single intelligent file system that spans all nodes within a cluster.
OneFS – High Level Overview
Isilon's OneFS enables:
Independent or linear scalability of performance and capacity
A single point of management for large and rapidly growing repositories of data
Mission-critical reliability and high availability with state-of-the-art data protection
Fundamentals of Distributed Locking
Multiple writers to the same file: need a reader-writer lock
Writers can be on different nodes: need a distributed locking system
[Figure: Node 1 and Node 2 both write to /volume/somefile on a clustered file system volume without coordination. File contents corrupted!]
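The corruption on this slide can be reproduced with a small, deterministic simulation. This is only a sketch: the function name `two_node_update` and the list standing in for the file are made up for illustration, and the "lock" is modeled simply by letting one node finish its read-modify-write before the other starts.

```python
def two_node_update(use_exclusive_lock):
    """Two nodes each add one record to a shared file via read-modify-write."""
    file_contents = []  # stands in for /volume/somefile

    if use_exclusive_lock:
        # With an exclusive lock, each node completes its whole
        # read-modify-write cycle before the next node may start.
        for node in ("node1", "node2"):
            snapshot = list(file_contents)   # read
            snapshot.append(node)            # modify
            file_contents = snapshot         # write
    else:
        # Without a lock, both nodes read the same stale contents,
        # and the second write silently discards the first.
        snapshot1 = list(file_contents)      # node 1 reads
        snapshot2 = list(file_contents)      # node 2 reads
        file_contents = snapshot1 + ["node1"]  # node 1 writes
        file_contents = snapshot2 + ["node2"]  # node 2 overwrites node 1
    return file_contents
```

Without the lock, node 1's record is lost; with it, both records survive in order.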
DLM Challenges
Performance
Requirements that differ from protocol to protocol
Exposing the appropriate APIs to utilize the DLM
Design and Implementation of DLM in OneFS
Goal of DLM
[Figure: two writers to /ifs/somefile on a OneFS volume each take an EX-lock on the Lk resource through the DLM module (lk) before writing. File contents intact.]
DLM in OneFS
From the perspective of the DLM, a resource is simply an identifier. It can be a number or it can be an arbitrary blob of data (as in the OneFS Lock Manager).
A resource can be acquired in a number of modes, which determine the level of exclusivity required by the client.
The DLM in OneFS is named LK
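As a rough illustration of "resource as opaque blob" plus "modes that determine exclusivity", the sketch below keys a lock table on raw bytes and checks a mode compatibility set. The names `Mode`, `LockTable`, and `try_acquire`, and the two-mode compatibility rule, are assumptions for this example and are not the LK interface.

```python
from enum import Enum


class Mode(Enum):
    SHARED = "shared"
    EXCLUSIVE = "exclusive"


# Pairs of modes that two different clients may hold at once (assumed rule).
COMPATIBLE = {(Mode.SHARED, Mode.SHARED)}


def compatible(a, b):
    return (a, b) in COMPATIBLE or (b, a) in COMPATIBLE


class LockTable:
    """Single-process stand-in for a lock manager keyed by opaque resources."""

    def __init__(self):
        self._holders = {}  # resource blob -> {client: mode}

    def try_acquire(self, resource: bytes, client: str, mode: Mode) -> bool:
        holders = self._holders.setdefault(resource, {})
        for other, held in holders.items():
            if other != client and not compatible(held, mode):
                return False  # conflicting mode held by someone else
        holders[client] = mode
        return True

    def release(self, resource: bytes, client: str) -> None:
        self._holders.get(resource, {}).pop(client, None)
```

The table never interprets the resource bytes; it only compares them, which is what lets the same manager serve different kinds of locks.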
Requirements of LK
The goal of LK is to provide the infrastructure upon which POSIX, NFS and SMB can implement kernel-enforced, cluster-coherent locks.
Requirements of LK
The requirements can be grouped into the following major areas:
Ranges allowed (i.e., number of bits, behavior at boundaries)
Semantics (i.e., modes allowed)
Wait types (i.e., blocking, non-blocking, asynchronous)
Requirements of LK (Contd)
Conversions allowed (i.e., can one type of lock be converted to another? E.g., converting a lock from shared to exclusive)
Reference counting semantics (stacked vs. reference counted)
Fairness (strict vs. opportunistic)
Miscellaneous
LK Design
The DLM is split into two distinct roles: Initiator and Coordinator
LK - Coordinator
The coordinator will deal with nodes as a whole, and won’t know about individual threads on a node.
From the coordinator’s point of view, a node will request a lock, own the lock, and then release the lock. For example, if a node asks for an exclusive lock while holding a shared lock, the exclusive lock will be granted immediately, provided that no other nodes hold shared locks.
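The node-level view can be sketched as follows. This is a simplified model under stated assumptions: only shared and exclusive modes exist, a granted request replaces whatever mode the node held before, and the class and method names are hypothetical.

```python
class Coordinator:
    """Tracks lock ownership per node; it never sees individual threads."""

    def __init__(self):
        self._owners = {}  # resource -> {node_id: "shared" | "exclusive"}

    def request(self, resource, node_id, mode):
        owners = self._owners.setdefault(resource, {})
        # Only locks held by *other* nodes can conflict with this request.
        others = {n: m for n, m in owners.items() if n != node_id}
        if mode == "exclusive":
            granted = not others
        else:  # "shared"
            granted = all(m == "shared" for m in others.values())
        if granted:
            owners[node_id] = mode  # replaces the node's previous mode
        return granted

    def release(self, resource, node_id):
        self._owners.get(resource, {}).pop(node_id, None)
```

A node that already holds a shared lock gets its exclusive request granted at once when it is the only holder, and is refused while another node still holds shared.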
LK - Initiator
The initiator is the one requesting the lock.
On the initiator side, there is one entry for each resource for which there is a local owner or waiter.
Each entry contains a list of all the local owners and a number of queues containing waiters. The main queue hangs directly off of the lock entry, while the rest hang off of per-lock-type structures.
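The per-resource entry described above might look roughly like the structure below. The field names, the `add_waiter` routing rule (no lock type means the main queue), and the `reap` helper are assumptions for illustration; the slides do not say how LK decides which queue a waiter joins.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class LockEntry:
    resource: bytes
    owners: list = field(default_factory=list)        # local owners
    main_queue: deque = field(default_factory=deque)  # hangs off the entry
    type_queues: dict = field(default_factory=dict)   # lock type -> deque

    def add_waiter(self, waiter, lock_type=None):
        if lock_type is None:
            self.main_queue.append(waiter)
        else:
            self.type_queues.setdefault(lock_type, deque()).append(waiter)

    def is_idle(self):
        # True when there is no local owner and no waiter on any queue.
        return (not self.owners and not self.main_queue
                and not any(self.type_queues.values()))


class Initiator:
    def __init__(self):
        self.entries = {}  # resource -> LockEntry

    def entry_for(self, resource):
        return self.entries.setdefault(resource, LockEntry(resource))

    def reap(self, resource):
        # An entry exists only while it has a local owner or waiter.
        entry = self.entries.get(resource)
        if entry is not None and entry.is_idle():
            del self.entries[resource]
```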
Three Messages In LK
LK uses three messages to communicate between the initiator and the coordinator:
Request - Generated by the initiator and sent to the coordinator. Contains the Needs and Wants of the initiator.
Grant - Generated by the coordinator and sent to the initiator. Contains the goals for the resource on this initiator and the additional holds.
Release - Generated by the initiator and sent to the coordinator. Used to release an initiator's hold on a lock.
LK Terms
Need - The mode of the lock that the client requires. E.g., needs shared mode.
Want - The set of modes of the lock that the client may want as soon as they are not being used by another client. E.g., want exclusive and delete.
Holds - The set of additional modes of the lock which the coordinator has granted the initiator. E.g., holding exclusive and shared.
Goal - The set of modes that the initiator should attempt to achieve as soon as it is able.
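The three messages and the four terms can be pictured as plain records. This sketch only shows which term travels in which message; the field names, the use of strings for modes, and the `modes` field on `Release` are assumptions, not the LK wire format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Request:
    """Initiator -> coordinator."""
    resource: bytes
    need: str                       # mode required now
    want: frozenset = frozenset()   # modes wanted once no other client uses them


@dataclass(frozen=True)
class Grant:
    """Coordinator -> initiator."""
    resource: bytes
    goal: frozenset    # modes the initiator should try to reach
    holds: frozenset   # additional modes the coordinator has granted


@dataclass(frozen=True)
class Release:
    """Initiator -> coordinator; gives up the initiator's hold."""
    resource: bytes
    modes: frozenset   # assumed field: which modes are being released


# Values taken from the examples on the slide.
req = Request(b"res", need="shared", want=frozenset({"exclusive", "delete"}))
grant = Grant(b"res", goal=frozenset(),
              holds=frozenset({"exclusive", "shared"}))
rel = Release(b"res", modes=grant.holds)
```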
LK Terms (Contd)
The resource parameter in LK represents what is being locked or unlocked. It is an arbitrary blob of data.
The locker parameter represents who is locking or unlocking. This is the parameter which is used for deadlock detection.
The domain parameter represents, not surprisingly, the lock domain. There can be multiple lock domains in existence at any time, each one controlling locks for a different aspect of the system. E.g., OPLOCK domain/CBRL domain.
The wait type parameter controls whether the potentially-blocking functions are allowed to block indefinitely.
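A minimal sketch of how the four parameters could fit together, assuming one exclusive mode per domain. The `Domain` class, the `lock` method, and the string return values are hypothetical, the three wait types are the ones listed under the LK requirements, and deadlock detection on the locker is not modeled.

```python
from enum import Enum


class WaitType(Enum):
    BLOCKING = 1
    NON_BLOCKING = 2
    ASYNC = 3


class Domain:
    """One lock domain; each domain keeps its own, independent lock state."""

    def __init__(self, name):
        self.name = name
        self._owner = {}  # resource blob -> locker

    def lock(self, resource, locker, wait_type):
        owner = self._owner.get(resource)
        if owner is None or owner == locker:
            self._owner[resource] = locker
            return "granted"
        if wait_type is WaitType.NON_BLOCKING:
            return "would_block"   # caller asked not to wait
        return "queued"            # a real DLM would block or complete later


oplock_domain = Domain("OPLOCK")
cbrl_domain = Domain("CBRL")
```

Because each domain has its own state, a resource locked in the OPLOCK domain can still be locked by a different locker in the CBRL domain.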
LK Callbacks
Lock owners can register callbacks to be called when the node gives up a certain type of lock.
The initiator delays releasing locks which have callbacks registered. Instead, it creates a special type of local waiter and puts it on the main queue.
When the special local waiter is converted into a lock owner, the callback is then called. After the callback is done, its lock owner will go away, and the initiator will release the lock for real.
That is, of course, unless there are still other lock callbacks pending in the main queue.
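The delayed-release sequence can be sketched like this. It is a single-threaded model with made-up names (`InitiatorLock`, `give_up`); each registered callback is queued as the "special local waiter", run while it temporarily owns the lock, and the real release happens only after the queue is empty.

```python
from collections import deque


class InitiatorLock:
    def __init__(self):
        self.callbacks = []
        self.main_queue = deque()
        self.released = False

    def register_callback(self, fn):
        self.callbacks.append(fn)

    def give_up(self):
        """Called when the node is about to give up this lock."""
        # Instead of releasing now, queue one special waiter per callback.
        for fn in self.callbacks:
            self.main_queue.append(fn)
        self.callbacks = []
        self._drain()

    def _drain(self):
        while self.main_queue:
            waiter = self.main_queue.popleft()  # waiter becomes a lock owner
            waiter()                            # callback runs; lock still held
            # the temporary owner goes away here
        self.released = True                    # no callbacks pending: release
```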
SMB Locking on top of LK in OneFS
SMB in OneFS uses LK for all of its locking, including oplocks and byte-range locks (BRLs).
An event channel is registered between the SMB daemon and the OneFS kernel.
The results from LK are communicated via the registered event channel.
SMB Locking on top of LK in OneFS
The locker parameter is specified as part of the syscall to acquire the appropriate lock.
The locker parameter depends on the kind of lock: the client lease key for leases, the MID, TID and PID combination for BRLs, or just the file pointer for legacy oplocks.
Basically, the locker uniquely identifies the owner of the lock.
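One way to picture the three locker shapes is as distinct value types, so that two lockers compare equal only when they are the same kind and carry the same identifying fields. The class names and field types below are assumptions for illustration, not the OneFS syscall layout.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LeaseLocker:
    lease_key: bytes      # client lease key (leases)


@dataclass(frozen=True)
class BrlLocker:
    mid: int              # MID, TID and PID combination (BRLs)
    tid: int
    pid: int


@dataclass(frozen=True)
class OplockLocker:
    file_pointer: int     # file pointer (legacy oplocks)


def same_owner(a, b):
    # Dataclass equality is False across different locker kinds,
    # even if the raw field values happen to match.
    return a == b
```

Frozen dataclasses are hashable, so a locker can also serve directly as a dictionary key when tracking owners.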
SMB Locking on top of LK in OneFS
A unique 64-bit ID is also passed as part of the syscall, which will be used to register callbacks in LK.
Whenever a lock is contended, the registered callback routine is triggered and notifies userspace using the appropriate ID.
Userspace has to maintain the async state and respond to the message from the kernel appropriately.
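On the userspace side, the async state could be as simple as a table keyed by the 64-bit ID. This is a sketch only: the class `AsyncLockState`, its methods, and the idea of a per-ID handler are hypothetical, and the event-channel transport and the syscall itself are not modeled.

```python
import itertools


class AsyncLockState:
    """Userspace bookkeeping for locks acquired with a callback ID."""

    def __init__(self):
        self._ids = itertools.count(1)
        self._pending = {}  # 64-bit ID -> handler to run on contention

    def register(self, on_contended):
        lock_id = next(self._ids) & 0xFFFFFFFFFFFFFFFF  # keep within 64 bits
        self._pending[lock_id] = on_contended
        return lock_id  # this ID would accompany the lock request

    def handle_kernel_event(self, lock_id):
        """Dispatch a contention notification received from the kernel."""
        handler = self._pending.get(lock_id)
        if handler is None:
            return False  # unknown or already forgotten ID
        handler(lock_id)
        return True

    def forget(self, lock_id):
        self._pending.pop(lock_id, None)
```

A handler here might, for example, start an oplock break toward the client and call `forget` once the lock is given up.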
SMB Locking on top of LK in OneFS
Using LK for locking pushes all the SMB locking requirements down to the kernel, which significantly improves performance and also achieves cluster coherency.
The support for callbacks enables us to register async operations and prevent blocking in the kernel.
Summary
Distributed locking in OneFS is achieved by using a OneFS-specific DLM called LK.
LK achieves basic cluster coherency and also provides performance and scalability benefits.
LK can also be easily extended to support future protocols by adding a new lock domain if necessary.
Questions?
Contact
Aravind Srinivasan [email protected]