2012 Storage Developer Conference. © EMC. All Rights Reserved.2014 Storage Developer Conference. © EMC Corporation. All Rights Reserved.
Implementing Witness service for various cluster failover scenarios
Rafal Szczesniak, EMC/Isilon
Long time ago vs. now
SMB1 – no high availability at all
SMB2 – durable and resilient handles (file opens)
SMB3 – persistent handles, multi-channel and Witness
What is Witness?
DCE/RPC interface (see [MS-SWN])
Service providing early detection of connection failures instead of relying on TCP timeouts
Means of (partial) control over client connections
OneFS cluster
[Diagram: clients reach the cluster through the client/application Ethernet layer (optional 2nd switch) using NFS, SMB, FTP, HTTP and HDFS; the Isilon IQ storage layer uses InfiniBand for intracluster communication (optional 2nd switch)]
Witness Service in OneFS cluster
[Diagram: OneFS cluster (clients, Ethernet layer, Isilon IQ storage layer with intracluster InfiniBand communication, optional 2nd switches) with a client's SMB Connection and Witness Registration marked]
Interfaces and Groups
Interface group as an abstraction of cluster nodes’ network interfaces
Usually the same as OneFS address pool
Separate groups for separate OneFS Access Zones
Caching the state of interfaces
Requesting the interface information from the system all the time can be expensive
The interface state does not change very often
We can cache the information for as long as we need it
Caching on-demand
The internal list of interfaces is populated only when needed
The number of interfaces can be substantial, especially in a cluster with multiple Access Zones
Updating a large cache could be expensive too, so it’s easier to keep track of only those interfaces the clients ask about
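The on-demand idea above can be sketched as follows. This is a minimal illustration, not the OneFS implementation: `InterfaceCache`, `fetch_state` and the state strings are hypothetical names, and the system query is replaced by a stub.

```python
from typing import Callable, Dict, List


class InterfaceCache:
    """Caches interface state lazily: an entry appears on first lookup."""

    def __init__(self, fetch_state: Callable[[str], str]) -> None:
        self._fetch_state = fetch_state  # stands in for the expensive system query
        self._states: Dict[str, str] = {}

    def get(self, name: str) -> str:
        # Ask the system only the first time a client asks about this interface.
        if name not in self._states:
            self._states[name] = self._fetch_state(name)
        return self._states[name]

    def update(self, name: str, state: str) -> bool:
        # Changes to interfaces nobody asked about are ignored, keeping the cache small.
        if name not in self._states:
            return False
        self._states[name] = state
        return True


calls: List[str] = []


def fake_system_query(name: str) -> str:
    calls.append(name)
    return "available"


cache = InterfaceCache(fake_system_query)
cache.get("10.0.0.1")
cache.get("10.0.0.1")  # served from the cache, no second system query
ignored = cache.update("10.0.0.2", "unavailable")  # never requested, so not tracked
tracked = cache.update("10.0.0.1", "unavailable")
```

Only interfaces that a client has asked about occupy the cache, so an update for any other interface costs a single dictionary lookup.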
Resource monitor
Thin layer providing access to the cluster “resources”
The only resources monitored (at the moment): Interface, Interface Group
Allows querying the current information
Allows subscribing for events and unsubscribing when the server is no longer interested in updates
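A rough sketch of such a thin layer, assuming a callback-based design. The class and method names (`ResourceMonitor`, `query`, `subscribe`, `unsubscribe`, `report`) are illustrative and are not taken from the actual server.

```python
from collections import defaultdict
from typing import Callable, Dict, List

Callback = Callable[[str, str], None]  # called with (resource name, new state)


class ResourceMonitor:
    """Thin layer over cluster resources (interfaces and interface groups)."""

    def __init__(self) -> None:
        self._state: Dict[str, str] = {}
        self._subscribers: Dict[str, List[Callback]] = defaultdict(list)

    def query(self, resource: str) -> str:
        # Current information about a resource; "unknown" if never reported.
        return self._state.get(resource, "unknown")

    def subscribe(self, resource: str, callback: Callback) -> None:
        self._subscribers[resource].append(callback)

    def unsubscribe(self, resource: str, callback: Callback) -> None:
        if callback in self._subscribers[resource]:
            self._subscribers[resource].remove(callback)

    def report(self, resource: str, state: str) -> None:
        # Record the change and deliver it to everyone still subscribed.
        self._state[resource] = state
        for callback in list(self._subscribers[resource]):
            callback(resource, state)


seen: List[tuple] = []


def on_change(resource: str, state: str) -> None:
    seen.append((resource, state))


monitor = ResourceMonitor()
monitor.subscribe("10.0.0.1", on_change)
monitor.report("10.0.0.1", "unavailable")
monitor.unsubscribe("10.0.0.1", on_change)
monitor.report("10.0.0.1", "available")  # no longer delivered to on_change
```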
What does availability mean?
Network interface failures
Server process crashes or deadlocks
System crashes
Resource monitor modules and events
Individual modules can keep track of all sorts of things independently
Subscribing to certain (or any) changes enables the module to submit events to an Interface or Interface Group
Witness server has the authority to filter the events and make its own decisions on how the clients should be notified
Resource events
Virtually any change happening to a subscribed resource can generate an event
Examples of events to watch for:
Interface state change to unavailable
New interface added to an Interface Group
Submitted events are “pre-treated” by the server before they are used to generate client notifications
Resource events (contd.)
Modules have a large degree of freedom in what can cause an event submission
The server has the authority to say which events will turn into the actual notifications
Resource event
What does it include?
Module Id
Type of event (changed/added/removed)
Resource
Destination (optional, if the module has any suggestions)
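The four fields listed above map naturally onto a small record type. A minimal sketch, with hypothetical type and field names (`ResourceEvent`, `EventType`) chosen for illustration:

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class EventType(Enum):
    CHANGED = "changed"
    ADDED = "added"
    REMOVED = "removed"


@dataclass(frozen=True)
class ResourceEvent:
    module_id: str                      # which module submitted the event
    event_type: EventType               # changed / added / removed
    resource: str                       # the interface or interface group concerned
    destination: Optional[str] = None   # optional suggestion from the module


# An interface failure reported by a network module, with no suggestion attached.
failure = ResourceEvent("network", EventType.CHANGED, "10.0.0.1")

# A maintenance module suggesting where the clients should go.
move = ResourceEvent("maintenance", EventType.CHANGED, "10.0.0.1",
                     destination="10.0.0.2")
```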
Interface events queue
Keeping track of the availability
Multiple different modules look at different aspects of availability
We need all of them to give us a “go” in order to consider an Interface available
The Witness server updates a list of Problems for each Interface as “go” and “no-go” reports come in with their respective events
The list is empty = There are no problems = The interface is available
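The rule "empty list = available" can be sketched with a set of module ids per interface. The names below (`InterfaceAvailability`, `report`) are assumptions made for the example:

```python
from typing import Set


class InterfaceAvailability:
    """Tracks which modules currently report a problem for one interface."""

    def __init__(self) -> None:
        self._problems: Set[str] = set()

    def report(self, module_id: str, ok: bool) -> None:
        # A "go" clears that module's problem; a "no-go" records it.
        if ok:
            self._problems.discard(module_id)
        else:
            self._problems.add(module_id)

    @property
    def available(self) -> bool:
        # The interface is available only when no module reports a problem.
        return not self._problems


iface = InterfaceAvailability()
initially = iface.available
iface.report("network", ok=False)
iface.report("process", ok=False)
iface.report("network", ok=True)
after_partial_recovery = iface.available   # "process" still reports a problem
iface.report("process", ok=True)
after_full_recovery = iface.available
```

Keying the problems by module means one module's "go" cannot cancel another module's "no-go".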
Updating interface state
Any module can submit events to an interface at any time (given subscriptions)
Witness server starts a work item (a function started in a separate thread) to process the events
After processing, subsequent work items are started to queue notifications in each individual client registration
Work items queuing the notifications resume execution of the pending asynchronous requests and send the responses to the Witness clients
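The flow above (submit, process, wake up, notify) might look roughly like this. It is a sketch under assumptions: work items are modelled with Python's `ThreadPoolExecutor`, the pending asynchronous request with a `threading.Event`, and all names are illustrative.

```python
import queue
import threading
from concurrent.futures import ThreadPoolExecutor, wait
from typing import List, Optional


class Registration:
    """One client registration with a parked asynchronous notify request."""

    def __init__(self, client: str) -> None:
        self.client = client
        self.response: Optional[List[str]] = None
        self._ready = threading.Event()

    def notify(self, messages: List[str]) -> None:
        # Queue the notification and wake up the pending request.
        self.response = messages
        self._ready.set()

    def wait_for_response(self, timeout: float) -> Optional[List[str]]:
        self._ready.wait(timeout)
        return self.response


pool = ThreadPoolExecutor(max_workers=4)
events: "queue.Queue[str]" = queue.Queue()
registrations = [Registration("client-a"), Registration("client-b")]


def process_events():
    # Work item 1: drain the events submitted to the interface.
    batch: List[str] = []
    while True:
        try:
            batch.append(events.get_nowait())
        except queue.Empty:
            break
    # Follow-up work items: one per client registration.
    return [pool.submit(reg.notify, list(batch)) for reg in registrations]


events.put("10.0.0.1 unavailable")              # a module submits an event
follow_ups = pool.submit(process_events).result()
wait(follow_ups)
responses = {reg.client: reg.wait_for_response(timeout=5) for reg in registrations}
pool.shutdown()
```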
[Diagrams: Updating interface state in four stages: submit, process, wake up, notify]
Resource monitor modules
Different modules can keep track of different things independently
Each module handles its specific failover scenario
Scenario: Testing
A module with an IPC interface and a command line client simulates the network interfaces and groups and their changes
Can create and keep an arbitrary number of groups and interfaces
Useful for simulating unusual events
Testing module (netsim)
Scenario: Network interface failure
Wired to OneFS cluster networking configuration (Flexnet)
Interface and address pool information received from the system service
Waiting for changes in a separate thread watching individual address pools
Notified through file descriptors
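Waiting on file descriptors in a separate thread can be sketched as below. The slides do not describe the Flexnet notification API, so pipes stand in for the real descriptors; `watch_pools` and the pool name are hypothetical, and a POSIX system is assumed.

```python
import os
import selectors
import threading
from typing import Callable, Dict, List


def watch_pools(pool_fds: Dict[str, int],
                on_change: Callable[[str], None],
                stop_fd: int) -> None:
    """Block until a watched descriptor becomes readable, then report the pool."""
    sel = selectors.DefaultSelector()
    for pool_name, fd in pool_fds.items():
        sel.register(fd, selectors.EVENT_READ, pool_name)
    sel.register(stop_fd, selectors.EVENT_READ, None)  # used only to stop the thread
    running = True
    while running:
        for key, _ in sel.select():
            os.read(key.fd, 4096)       # drain the wake-up bytes
            if key.data is None:
                running = False         # stop requested
            else:
                on_change(key.data)     # an address pool changed
    sel.close()


pool_read, pool_write = os.pipe()       # stands in for a pool's notification fd
stop_read, stop_write = os.pipe()
changed: List[str] = []

watcher = threading.Thread(
    target=watch_pools,
    args=({"pool0": pool_read}, changed.append, stop_read),
)
watcher.start()
os.write(pool_write, b"x")              # simulate a change in pool0
os.write(stop_write, b"x")              # then ask the watcher to stop
watcher.join(timeout=5)
for fd in (pool_read, pool_write, stop_read, stop_write):
    os.close(fd)
```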
Flexnet Service in OneFS cluster
[Diagram: OneFS cluster (clients using NFS, SMB, FTP, HTTP and HDFS; Ethernet layer; Isilon IQ storage layer with intracluster InfiniBand communication; optional 2nd switches) with Flexnet labeled]
Network module
Scenario: Server process failure
OneFS Group Manager watching other nodes in the cluster provides the feed
It can keep track of the state of certain processes on other nodes
The module gets notified about the changes in the same way as the Network module
Group Manager in OneFS cluster
[Diagram: OneFS cluster (clients using NFS, SMB, FTP, HTTP and HDFS; Ethernet layer; Isilon IQ storage layer with intracluster InfiniBand communication; optional 2nd switches) with Group Manager labeled]
Scenario: Maintenance
Sometimes we need to gracefully take a node off the cluster
Existing client connections should “go away”
The module can make the node interfaces look unavailable
It can also move all connections to a different node or even a completely different group
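A minimal sketch of what a maintenance module might submit, assuming events are plain dictionaries. The function `maintenance_events` and the field names are illustrative, not OneFS code.

```python
from typing import Dict, List, Optional


def maintenance_events(node_interfaces: List[str],
                       destination: Optional[str] = None
                       ) -> List[Dict[str, Optional[str]]]:
    """Build one event per interface of the node being taken out of service.

    Without a destination the interfaces simply look unavailable;
    with one, each event also suggests where the connections should move.
    """
    return [
        {
            "module_id": "maintenance",
            "type": "changed",
            "resource": iface,
            "state": "unavailable",
            "destination": destination,
        }
        for iface in node_interfaces
    ]


# Drain a node: its interfaces just look unavailable.
drain = maintenance_events(["10.0.0.1", "10.0.0.2"])

# Drain a node and suggest a target in a different group.
move = maintenance_events(["10.0.0.1"], destination="10.0.1.5")
```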
Beyond failover
Witness “move” notification can be used for load balancing
What would it take?
Connection resource type (to have control over individual connections)
A module checking the load on other nodes and requesting the move if one of them is overloaded (perhaps another use for witness)
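The decision such a load-checking module would make could be as simple as the sketch below. The load metric, the threshold and the name `pick_move` are assumptions made for the example; the slides only outline the idea.

```python
from typing import Dict, Optional, Tuple


def pick_move(loads: Dict[str, float],
              threshold: float) -> Optional[Tuple[str, str]]:
    """Return (overloaded node, least loaded node), or None if no move is needed."""
    if len(loads) < 2:
        return None
    busiest = max(loads, key=loads.get)
    idlest = min(loads, key=loads.get)
    # No move when nothing exceeds the threshold or all nodes are equally loaded.
    if loads[busiest] <= threshold or busiest == idlest:
        return None
    return busiest, idlest


decision = pick_move({"node1": 0.95, "node2": 0.40, "node3": 0.55}, threshold=0.80)
no_move = pick_move({"node1": 0.50, "node2": 0.40}, threshold=0.80)
```

A real module would turn a non-empty decision into a "move" request for selected connections on the overloaded node.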
Beyond Witness itself
Witness RPC is in fact only loosely tied to the SMB protocol
Information provided by the Resource Monitor (network interfaces status) may be useful for other services, too
Thank you!
Questions?
Rafal Szczesniak