IBM Spectrum Scale Fundamentals Workshop for Americas, Part 4: Spectrum Scale Replication & Stretch Clusters
TRANSCRIPT
Course materials may not be reproduced in whole or in part without the prior written permission of IBM.
Spectrum Scale Replication & Stretch Clusters
© Copyright IBM Corporation 2015
Unit objectives
After completing this unit, you should be able to:
• Describe replication
• Describe the pros and cons of replication
• Describe a stretch cluster
Synchronous data replication
• Allows you to synchronously replicate
– A file, a set of files, or the entire file system
• Gives you finer replication granularity than mirroring an entire volume, which also saves space
– Allows you to replicate metadata and/or data
– Provides an additional layer of protection on top of the RAID-level protection of the underlying volumes
– Supports a maximum of 3 copies of the data
– Replication is synchronous only
– Asynchronous replication can be achieved using the AFM feature
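As a sketch of how these levels are set (the file system name fs1 and the stanza file name nsd.stanza are hypothetical), replication defaults and maximums are chosen when the file system is created:

```shell
# Create a file system with 2 copies of data and metadata by default.
# -m/-r set the default metadata/data replica counts; -M/-R set the
# maximums, which cannot be raised after the file system is created.
mmcrfs fs1 -F nsd.stanza -m 2 -M 2 -r 2 -R 2

# Replication can later be adjusted per file (up to the -M/-R maximums):
mmchattr -m 2 -r 2 /gpfs/fs1/important.dat
```

The defaults for newly created files can also be changed later with mmchfs -m and -r, which illustrates the per-file granularity described above.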
Synchronous data replication
• To replicate or not to replicate?
– This is Spectrum Scale-level replication, an availability layer "on top of" the already built-in data availability (RAID) characteristics of the disk subsystem(s) being used
– Can be used across sites
– Some performance impact:
Writes are roughly 50–67% slower with two or three copies
Reads are the same speed
– Your storage effectively becomes more expensive, since more of your usable space is consumed by duplicate copies of your data
Data Replication Warnings
• If you decide to use replication
– Always replicate your metadata at a minimum
– Never replicate your data without also replicating your metadata
• If you did, then in the event of a failure you would not be able to mount your file system to retrieve your replicated data
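A hedged sketch of this recommended minimum (all names are hypothetical): replicate metadata even when data is kept at a single copy.

```shell
# 2 metadata copies, 1 data copy: the file system stays mountable after
# a failure group is lost, although unreplicated data will be missing.
mmcrfs fs1 -F nsd.stanza -m 2 -M 2 -r 1 -R 2

# Verify the current default replication settings:
mmlsfs fs1 -m -r
```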
Replication relies on failure groups
• Failure group
– A group of disks in a storage pool that Spectrum Scale assumes share a common point of failure, separate from the disks in other failure groups
– Can be changed at any time (run mmrestripefs afterwards to fix data placement)
• A file is replicated when a copy of its data blocks exists in two failure groups
– Ensures that no two replicas of the same block become unavailable due to a single failure
• Can be set either at NSD creation time using the mmcrnsd command or later on using the mmchdisk command
• It is important to set failure groups correctly to get effective file system replication
• Replication is per storage pool
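For illustration (disk and file system names are hypothetical), a failure group can be changed after the fact with mmchdisk, here using the older colon-separated disk descriptor; newer releases also accept a stanza file via -F:

```shell
# Move nsd1 into failure group 2
# (descriptor fields: DiskName:::DiskUsage:FailureGroup)
mmchdisk fs1 change -d "nsd1:::dataAndMetadata:2"

# Re-replicate files whose replicas now violate the failure-group layout:
mmrestripefs fs1 -r
```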
The third failure group
File System Descriptor Quorum
• In addition to quorum nodes, three disks (NSDs) by default are used as file system descriptor disks.
• A majority of the descriptor replicas on this subset of disks must remain available to sustain file system operations.
• Spectrum Scale can move them from one disk to another in case of failure.
• Use the mmlsdisk -L command to see the location of the descriptors.
• Can add one by creating a descOnly NSD.
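A sketch of adding such a tie-breaker descriptor disk (the device, NSD, server, and file system names are hypothetical):

```shell
# Stanza for a small disk that holds only a file system descriptor:
cat > desc.stanza <<'EOF'
%nsd:
  device=/dev/sdx
  nsd=descnsd1
  servers=sitecnode1
  usage=descOnly
  failureGroup=3
EOF

mmcrnsd -F desc.stanza          # define the NSD
mmadddisk fs1 -F desc.stanza    # add the descOnly disk to the file system
```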
(Diagram: disk descriptor quorum across Node 1 and Node 2.)
Replication/Failure groups and storage pools
• Creating an NSD requires [ mmcrnsd ]
– O/S disk name
– NSD server list
• Optional, but recommended
– NSD name
– Failure group (related to replication)
– Storage pool (related to policy / ILM)
• Disk stanza:
%nsd:
  device=/dev/sdav2
  nsd=nsd1
  servers=k145n06,k145n05
  usage=dataOnly
  failureGroup=5
  pool=poolA
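With a stanza like the one above saved in a file (nsd.stanza here is an arbitrary name), NSD creation is a one-liner:

```shell
mmcrnsd -F nsd.stanza   # create the NSD(s) described in the stanza file
mmlsnsd                 # list NSDs and their server lists to verify
```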
Accessing replicated data
• Default operation
– Read: read from any copy
– Write: write to both copies
• Control with readReplicaPolicy
– local: read from a block device or an NSD server on the same subnet
– Used for read-heavy workloads replicated across distance
• Operation with an unavailable disk
– Disk marked "down" in the FS descriptor
– Read: read the available copy
– Write:
• Log changes for fast recovery (possible performance impact)
• Set the "missing update" flag in the inode
• Write the available copy
• Recovery
– Replay only the changes to the restored storage
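The read policy above is a cluster-wide configuration option; as a sketch:

```shell
# Prefer the replica reachable via a local block device or an NSD server
# on the same subnet (useful for read-heavy, cross-site replication):
mmchconfig readReplicaPolicy=local -i   # -i applies the change immediately
```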
Replication examples
• Full replication
– 2 failure groups
– Data and metadata
– On failure, the file system remains fully available
• Metadata replication
– Replicate only metadata
– On failure, data is missing but the file system stays mounted
(Diagrams: full replication across Failure Groups 1 and 2 with Failure Group 3 as descOnly; metadata-only replication across Failure Groups 1–4, with inodes intact, metadata OK, and data missing after a failure.)
Mixing replication with pools
• Replicate only metadata
• Three data pools for capacity and a single namespace
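A hedged sketch of this layout (the pool names, rules, and file system name are all hypothetical): placement rules route new files into the data pools, while metadata stays in the replicated system pool.

```shell
cat > policy.txt <<'EOF'
RULE 'docs'    SET POOL 'pool1' WHERE LOWER(NAME) LIKE '%.doc'
RULE 'media'   SET POOL 'pool2' WHERE LOWER(NAME) LIKE '%.mp4'
RULE 'default' SET POOL 'pool3'
EOF

mmchpolicy fs1 policy.txt   # install the placement policy
```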
Reliability: Multiple-site high availability
• Multi-site quorum configuration
• Replicate across sites
• Bandwidth requirements are based on the application
• Often called "two sites and a laptop"
• Distributed data
– Data is distributed across 2 sites; a 3rd site contains a quorum node for availability
• Sites A and B
– Contain the core Spectrum Scale nodes and storage
– Multiple quorum nodes in each site
• Site C
– Contains a single quorum node
– Serves as a tie breaker if one of the other sites becomes inaccessible
– Holds a file system descriptor NSD
(Diagram: a single Spectrum Scale system spanning Site A and Site B, with Site C connected over the WAN.)
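The quorum layout above can be sketched as follows (node names are hypothetical): with five quorum nodes spread over three sites, losing any one site still leaves a node-quorum majority.

```shell
mmchnode --quorum -N sitea-n1,sitea-n2,siteb-n1,siteb-n2,sitec-n1
mmlscluster   # verify which nodes carry the quorum designation
```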
Recovering from a storage failure
• Fix replication using mmrestripefs:
mmrestripefs Device -R
• Usage:
mmrestripefs Device {-m | -r | -b | -p | -R} [-P PoolName]
             [-N {Node[,Node...] | NodeFile | NodeClass}]
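For example (disk and file system names are hypothetical), after a failed disk has been repaired:

```shell
mmchdisk fs1 start -d "nsd1"   # bring the repaired disk back online
mmrestripefs fs1 -r            # restore replication of ill-replicated files
# mmrestripefs fs1 -R instead changes replica placement to match
# new replication settings (for example after mmchfs -m/-r).
```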
Review
• Replication can be on a single file or a whole file system
• Replication is spread across failure groups
• Replication is even more important when you do not have any
RAID support underneath for your volumes
• Replication is always synchronous
• Asynchronous Replication is covered by another feature called
Active File Management (AFM)
Spectrum Scale Stretch Clusters
Spectrum Scale Stretch Clusters
• Stretch clusters combine two or more clusters into one giant cluster
• Stretch clusters are intended for inter-site or close-proximity clusters, not over a WAN unless the amount of data is small
• Replication is not required, but it is usually the intention for a stretch cluster
• If replication between clusters is not the goal, a multi-cluster setup may be preferable
• If replication is the goal, but it is between data centers over a WAN, then AFM may be a better choice if synchronous replication is NOT required
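As a hedged sketch of the multi-cluster alternative (the cluster names, node names, key file names, and file system names are all hypothetical), the owning cluster grants access and the accessing cluster mounts the file system remotely:

```shell
# On both clusters: generate authentication keys
mmauth genkey new

# On the owning cluster: register the accessing cluster's key, then grant it access
mmauth add compute.example.com -k compute_id_rsa.pub
mmauth grant compute.example.com -f fs1

# On the accessing cluster: register the owning cluster and mount its file system
mmremotecluster add storage.example.com -n node1,node2 -k storage_id_rsa.pub
mmremotefs add rfs1 -f fs1 -C storage.example.com -T /gpfs/rfs1
mmmount rfs1 -a
```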
© 2013 IBM Corporation
Spectrum Scale Architecture (Basics)
(Diagram 1: storage attached via SAN, shared SAS, twin-tailed, etc.; each SAN LUN has a "1:1" relation to a Spectrum Scale NSD. LUN = Logical Unit Number; NSD = Network Shared Disk.)
Spectrum Scale Architecture (Basics)
(Diagram 1a: SAN or twin-tailed SAS attachment; each LUN has a "1:1" relation to a Spectrum Scale NSD.)
Spectrum Scale Architecture (Common)
(Diagram 2: NSD servers attached to SAN LUNs serve Spectrum Scale NSD clients over the LAN.)
Spectrum Scale Architecture (Typical)
(Diagram 3: NSD servers on a SAN, plus twin-tailed and internal disks (FPO = File Placement Optimizer), serving NSD clients over LAN / WAN / InfiniBand or any mixture. Spectrum Scale replicates data on disk; one or multiple file systems; files are placed on different devices under policy control.)
Remote Cluster Mount (synchronous)
(Diagram 4: a local cluster of NSD clients mounts, over LAN or InfiniBand, a file system served by the NSD servers and LUNs of a remote cluster. NSD = Network Shared Disk.)
Stretch Cluster (synchronous)
(Diagram 4a: Site 1 and Site 2 each have a local LAN, NSD servers, NSD clients, and LUNs, joined by an inter-site LAN, with a quorum node at a 3rd site. Spectrum Scale replicates data between the sites, and the file system is active across both sites.)
Spectrum Scale Active File Management (async)
(Diagram 5: a local cluster caches data (R/W) over the WAN or InfiniBand from a remote cluster of NSD servers and LUNs. NSD = Network Shared Disk.)