IBM Spectrum Scale Fundamentals Workshop for Americas, Part 4: Spectrum Scale Replication & Stretch Clusters
TRANSCRIPT
Course materials may not be reproduced in whole or in part without the prior written permission of IBM.
Spectrum Scale Replication & Stretch Clusters
© Copyright IBM Corporation 2015
Unit objectives
After completing this unit, you should be able to:
• Describe replication
• Describe the pros and cons of replication
• Describe a stretch cluster
Synchronous data replication
• Allows you to synchronously replicate
– A file, a set of files, or the entire file system
• Gives you finer replication granularity than mirroring an entire volume, which also saves space
– Allows you to replicate metadata and/or data
– Provides an additional layer of protection on top of the RAID-level protection of the underlying volumes
– Supports a maximum of 3 copies of the data
– Replication is synchronous only
– Asynchronous replication can be achieved using the AFM feature
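As a sketch of how these levels are set (the file system name fs1 and the stanza file name nsd.stanza are hypothetical), replication defaults and maximums are chosen when the file system is created:

```shell
# Create a file system with 2 copies of data and metadata by default.
# -m/-r set the default metadata/data replica counts; -M/-R set the
# maximums, which cannot be raised after the file system is created.
mmcrfs fs1 -F nsd.stanza -m 2 -M 2 -r 2 -R 2

# Replication can later be adjusted per file (up to the -M/-R maximums):
mmchattr -m 2 -r 2 /gpfs/fs1/important.dat
```

The defaults for newly created files can also be changed later with mmchfs -m and -r, which illustrates the per-file granularity described above.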
Synchronous data replication
• To replicate or not to replicate?
– This is Spectrum Scale-level replication, an availability layer "on top of" the already built-in data availability (RAID) characteristics of the disk subsystem(s) being used
– Can be used across sites
– Some performance impact:
Writes are roughly 50–67% slower with two or three copies
Reads are the same speed
– Your storage effectively becomes more expensive, since more of your usable space is consumed by duplicate copies of your data
Data Replication Warnings
• If you decide to use replication
– Always replicate your metadata at a minimum
– Never replicate your data without also replicating your metadata
• If you did, then in the event of a failure you would not be able to mount your file system to retrieve your replicated data
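A hedged sketch of this recommended minimum (all names are hypothetical): replicate metadata even when data is kept at a single copy.

```shell
# 2 metadata copies, 1 data copy: the file system stays mountable after
# a failure group is lost, although unreplicated data will be missing.
mmcrfs fs1 -F nsd.stanza -m 2 -M 2 -r 1 -R 2

# Verify the current default replication settings:
mmlsfs fs1 -m -r
```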
Replication relies on failure groups
• Failure group
– A group of disks in a storage pool that Spectrum Scale assumes share a common point of failure, separate from the disks in other failure groups
– Can be changed at any time (run mmrestripefs afterwards to fix data placement)
• A file is replicated when a copy of its data blocks exists in two failure groups
– Ensures that no two replicas of the same block become unavailable due to a single failure
• Can be set either at NSD creation time using the mmcrnsd command or later on using the mmchdisk command
• It is important to set failure groups correctly to get effective file system replication
• Replication is per storage pool
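For illustration (disk and file system names are hypothetical), a failure group can be changed after the fact with mmchdisk, here using the older colon-separated disk descriptor; newer releases also accept a stanza file via -F:

```shell
# Move nsd1 into failure group 2
# (descriptor fields: DiskName:::DiskUsage:FailureGroup)
mmchdisk fs1 change -d "nsd1:::dataAndMetadata:2"

# Re-replicate files whose replicas now violate the failure-group layout:
mmrestripefs fs1 -r
```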
The third failure group
File System Descriptor Quorum
• In addition to quorum nodes, three disks (NSDs) by default are used as file system descriptor disks.
• A majority of the descriptor replicas on this subset of disks must remain available to sustain file system operations.
• Spectrum Scale can move them from one disk to another in case of failure.
• Use the mmlsdisk -L command to see the location of the descriptors.
• Can add one by creating a descOnly NSD.
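A sketch of adding such a tie-breaker descriptor disk (the device, NSD, server, and file system names are hypothetical):

```shell
# Stanza for a small disk that holds only a file system descriptor:
cat > desc.stanza <<'EOF'
%nsd:
  device=/dev/sdx
  nsd=descnsd1
  servers=sitecnode1
  usage=descOnly
  failureGroup=3
EOF

mmcrnsd -F desc.stanza          # define the NSD
mmadddisk fs1 -F desc.stanza    # add the descOnly disk to the file system
```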
(Diagram: disk descriptor quorum across Node 1 and Node 2.)
Replication/Failure groups and storage pools
• Creating an NSD requires [ mmcrnsd ]
– O/S disk name
– NSD server list
• Optional, but recommended
– NSD name
– Failure group (related to replication)
– Storage pool (related to policy / ILM)
• Disk stanza:
%nsd:
  device=/dev/sdav2
  nsd=nsd1
  servers=k145n06,k145n05
  usage=dataOnly
  failureGroup=5
  pool=poolA
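With a stanza like the one above saved in a file (nsd.stanza here is an arbitrary name), NSD creation is a one-liner:

```shell
mmcrnsd -F nsd.stanza   # create the NSD(s) described in the stanza file
mmlsnsd                 # list NSDs and their server lists to verify
```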
Accessing replicated data
• Default operation
– Read: read from any copy
– Write: write to both copies
• Control with readReplicaPolicy
– local: read from a block device or an NSD server on the same subnet
– Used for read-heavy workloads replicated across distance
• Operation with an unavailable disk
– Disk marked "down" in the FS descriptor
– Read: read the available copy
– Write:
• Log changes for fast recovery (possible performance impact)
• Set the "missing update" flag in the inode
• Write the available copy
• Recovery
– Replay only the changes to the restored storage
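The read policy above is a cluster-wide configuration option; as a sketch:

```shell
# Prefer the replica reachable via a local block device or an NSD server
# on the same subnet (useful for read-heavy, cross-site replication):
mmchconfig readReplicaPolicy=local -i   # -i applies the change immediately
```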
Replication examples
• Full replication
– 2 failure groups
– Data and metadata
– On failure, the file system remains fully available
• Metadata replication
– Replicate only metadata
– On failure, data is missing but the file system stays mounted
(Diagrams: full replication across Failure Groups 1 and 2 with Failure Group 3 as descOnly; metadata-only replication across Failure Groups 1–4, with inodes intact, metadata OK, and data missing after a failure.)
Mixing replication with pools
• Replicate only metadata
• Three data pools for capacity and a single namespace
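A hedged sketch of this layout (the pool names, rules, and file system name are all hypothetical): placement rules route new files into the data pools, while metadata stays in the replicated system pool.

```shell
cat > policy.txt <<'EOF'
RULE 'docs'    SET POOL 'pool1' WHERE LOWER(NAME) LIKE '%.doc'
RULE 'media'   SET POOL 'pool2' WHERE LOWER(NAME) LIKE '%.mp4'
RULE 'default' SET POOL 'pool3'
EOF

mmchpolicy fs1 policy.txt   # install the placement policy
```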
Reliability: Multiple-site high availability
• Multi-site quorum configuration
• Replicate across sites
• Bandwidth requirements are based on the application
• Often called "two sites and a laptop"
• Distributed data
– Data is distributed across 2 sites; a 3rd site contains a quorum node for availability
• Sites A and B
– Contain the core Spectrum Scale nodes and storage
– Multiple quorum nodes in each site
• Site C
– Contains a single quorum node
– Serves as a tie breaker if one of the other sites becomes inaccessible
– Holds a file system descriptor NSD
(Diagram: a single Spectrum Scale system spanning Site A and Site B, with Site C connected over the WAN.)
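The quorum layout above can be sketched as follows (node names are hypothetical): with five quorum nodes spread over three sites, losing any one site still leaves a node-quorum majority.

```shell
mmchnode --quorum -N sitea-n1,sitea-n2,siteb-n1,siteb-n2,sitec-n1
mmlscluster   # verify which nodes carry the quorum designation
```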
Recovering from a storage failure
• Fix replication using mmrestripefs:
mmrestripefs Device -R
• Usage:
mmrestripefs Device {-m | -r | -b | -p | -R} [-P PoolName]
             [-N {Node[,Node...] | NodeFile | NodeClass}]
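For example (disk and file system names are hypothetical), after a failed disk has been repaired:

```shell
mmchdisk fs1 start -d "nsd1"   # bring the repaired disk back online
mmrestripefs fs1 -r            # restore replication of ill-replicated files
# mmrestripefs fs1 -R instead changes replica placement to match
# new replication settings (for example after mmchfs -m/-r).
```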
Review
• Replication can be on a single file or a whole file system
• Replication is spread across failure groups
• Replication is even more important when you do not have any
RAID support underneath for your volumes
• Replication is always synchronous
• Asynchronous Replication is covered by another feature called
Active File Management (AFM)
Spectrum Scale Stretch Clusters
Spectrum Scale Stretch Clusters
• Stretch clusters combine two or more clusters into one giant cluster
• Stretch clusters are intended for inter-site or close-proximity clusters, not over a WAN unless the amount of data is small
• Replication is not required, but it is usually the intention for a stretch cluster
• If replication between clusters is not the goal, a multi-cluster setup may be preferable
• If replication is the goal, but it is between data centers over a WAN, then AFM may be a better choice if synchronous replication is NOT required
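As a hedged sketch of the multi-cluster alternative (the cluster names, node names, key file names, and file system names are all hypothetical), the owning cluster grants access and the accessing cluster mounts the file system remotely:

```shell
# On both clusters: generate authentication keys
mmauth genkey new

# On the owning cluster: register the accessing cluster's key, then grant it access
mmauth add compute.example.com -k compute_id_rsa.pub
mmauth grant compute.example.com -f fs1

# On the accessing cluster: register the owning cluster and mount its file system
mmremotecluster add storage.example.com -n node1,node2 -k storage_id_rsa.pub
mmremotefs add rfs1 -f fs1 -C storage.example.com -T /gpfs/rfs1
mmmount rfs1 -a
```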
© 2013 IBM Corporation
Spectrum Scale Architecture (Basics)
(Diagram 1: storage attached via SAN, shared SAS, twin-tailed, etc.; each SAN LUN has a "1:1" relation to a Spectrum Scale NSD. LUN = Logical Unit Number; NSD = Network Shared Disk.)
Spectrum Scale Architecture (Basics)
(Diagram 1a: SAN or twin-tailed SAS attachment; each LUN has a "1:1" relation to a Spectrum Scale NSD.)
Spectrum Scale Architecture (Common)
(Diagram 2: NSD servers attached to SAN LUNs serve Spectrum Scale NSD clients over the LAN.)
Spectrum Scale Architecture (Typical)
(Diagram 3: NSD servers on a SAN, plus twin-tailed and internal disks (FPO = File Placement Optimizer), serving NSD clients over LAN / WAN / InfiniBand or any mixture. Spectrum Scale replicates data on disk; one or multiple file systems; files are placed on different devices under policy control.)
Remote Cluster Mount (synchronous)
(Diagram 4: a local cluster of NSD clients mounts, over LAN or InfiniBand, a file system served by the NSD servers and LUNs of a remote cluster. NSD = Network Shared Disk.)
Stretch Cluster (synchronous)
(Diagram 4a: Site 1 and Site 2 each have a local LAN, NSD servers, NSD clients, and LUNs, joined by an inter-site LAN, with a quorum node at a 3rd site. Spectrum Scale replicates data between the sites, and the file system is active across both sites.)
Spectrum Scale Active File Management (async)
(Diagram 5: a local cluster caches data (R/W) over the WAN or InfiniBand from a remote cluster of NSD servers and LUNs. NSD = Network Shared Disk.)