best practices in dr using vcs & vvr

54
Best Practices in DR using VCS & VVR Customer Forum, March 2012 Shrikant Ghare, Yatin Nayak

Upload: others

Post on 18-Dec-2021

6 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Best Practices in DR using VCS & VVR

Best Practices in DR using VCS & VVR

Customer Forum, March 2012

Shrikant Ghare, Yatin Nayak

Page 2: Best Practices in DR using VCS & VVR

Agenda

DR using VCS & VVR - Best Practices 2

DR using VCS 1

DR using VVR 2

Q & A 3

Page 3: Best Practices in DR using VCS & VVR

DR using VCS

DR using VCS – Best Practices 3

DR using VCS – Quick overview 1

Best Practices for DR configurations 2

Advanced DR Configurations 3

Page 4: Best Practices in DR using VCS & VVR

DR Testing Data Availability Application Recovery

VCS DR Configuration Process

4 DR using VCS – Best Practices

Page 5: Best Practices in DR using VCS & VVR

DR Configuration: Application Recoverability

5

Production Cluster

gconic

gcoip

wac ClusterService SG

DiskGroup

Mount

Oracle

Oracle SG

DNS

HTC

DR Cluster

gconic

gcoip

wac ClusterService SG

Sync/Async Replication HTC PVOL HTC SVOL

HTC

DiskGroup

Mount

Oracle

Oracle SG

DNS GCO

DR using VCS – Best Practices

Page 6: Best Practices in DR using VCS & VVR

DR Configuration: Data Availability

6

Cluster A

Cluster B

LLT/GCO

PVols SVols

ESCON or FC Replication

link

MCU RCU

SMPL

COPY

SMPL SMPL

paircreate

synchronized Link restored, pairresync

PVOL down, horctakeover

PVOL up, horctakeover

State diagram for HTC SVOLs Replication configuration using HTC

DR using VCS – Best Practices

Page 7: Best Practices in DR using VCS & VVR

VCS integrates with several Data Availability options

7

Symantec

EMC

HDS/HP

IBM

Others

Veritas Volume Replicator

Campus Cluster

NBU RealTime

SRDF, SRDF/Star, VPLEX

MirrorView

RecoverPoint

Hitachi TrueCopy/HP XP CA, Hitachi/HP XP 3DC

Hitachi Universal Replicator

3PAR, EVA CA

DB2 HADR

IBM Global Mirror, IBM Metro Mirror, DS4K

RemoteMirror

SVC CopyServices, XIVMirror

Oracle DataGuard, DataGuard Broker

NetApp SnapMirror

DR using VCS – Best Practices

Page 8: Best Practices in DR using VCS & VVR

Simulate DR Failover: FireDrill

Disaster Readiness Test: Firedrills

8

Primary Site

Volume Snapshot

Initiate Fire Drill Mount Snapshot

Replication

Test Application Reset

Secondary Site

• Support for all replication technologies

– HW rep: SRDFSnap, HTCSnap etc.

– SW rep: DiskGroupSnap, RVGSnapshot, etc.

• Flavors: Gold/Silver/Bronze

DR using VCS – Best Practices

Page 9: Best Practices in DR using VCS & VVR

DR using VCS

9

DR using VCS – Quick overview 1

Best Practices for DR configurations 2

Advanced DR Configurations 3

DR using VCS – Best Practices

Page 10: Best Practices in DR using VCS & VVR

VCS Stretch Cluster: Best Practices

• Define Site boundaries

– VCS: SystemZones,

– VxVM: Host Site ID, Disk/Enclosure Site ID

• Configure the failover behavior for stretch clusters

– AutoFailoverPolicy Recommended value: 2 for manual failover across SystemZones

10

Zone A Zone B

Single VCS Cluster

APP 1

DR using VCS – Best Practices

Page 11: Best Practices in DR using VCS & VVR

VCS GCO: Best Practices

• Configuring DR infrastructure: Use automation tools/wizards

• Java GUI/gcoconfig & VCS CLIs

– Reduce time

– Avoid human errors

• Configuring the failover behavior for Global Service Groups

– Use Manual ClusterFailoverPolicy (default)

• For short distances between DR sites, ClusterFailoverPolicy can be Connected

– Use Preswitch during planned migrations

11

GCO

APP 1

Manual Failover

DR using VCS – Best Practices

Page 12: Best Practices in DR using VCS & VVR

VCS GCO: Best Practices

• Configure Steward for 2-site DR setups

• Configure DNS agent for consistent name resolution after DR failover

12

GCO

Steward

APP 1

GCO

phonon.symantec.com phonon.symantec.com

DR using VCS – Best Practices

Page 13: Best Practices in DR using VCS & VVR

VCS Replication Agents: Best Practices

• Install latest version of DR agent from SORT

– https://sort.symantec.com/agents

– Freeze SGs before upgrading/installing replication agent

• Define the takeover behavior of the replication resource

– SplitTakeover = 0

– AutoTakeover = 0

13

GCO

Stale Data Replication link

suspended/broken

APP 1

DR using VCS – Best Practices

Page 14: Best Practices in DR using VCS & VVR

VCS Replication Agents: Best Practices

• Use separate Device Group per Application

• For out-of-band arrays, configure all the IP addresses in the resource definition

14

APP 1 APP 2 APP 3

clarspa.symantec.com

clarspb.symantec.com

DR using VCS – Best Practices

Page 15: Best Practices in DR using VCS & VVR

VCS Replication Agents: Best Practices

• For Virtual machines managing in-band arrays, map the Control Devices in Raw device mode to the Guest

– e.g. Physical RDM in Vmware

• Adjust the OnlineTimeout of the resource

– For EMC SRDF, use sigma script to assist in setting DevFOTime

– Increase ActionTimeout for Oracle DataGuard

• EMC RecoverPoint (CDP/R), use latest FailoverImage

– e.g. latest (recommended), bookmark_name, TIME=Timestamp

• Set PanicSystemOnDGLoss on DiskGroup

15 DR using VCS – Best Practices

Page 16: Best Practices in DR using VCS & VVR

Disaster Readiness Test: Best Practices

• Identify Firedrill flavor (recommended: Gold)

• Use space optimized snapshots wherever possible

– E.g. EMC CLARiiON Snapview, EMC Symmetrix Snap

16

APP 1

GCO

Inter-array replication

Local mirror/snapshot

DR using VCS – Best Practices

Page 17: Best Practices in DR using VCS & VVR

Disaster Readiness Test: Best Practices

• Use fdsetup wizard to configure Firedrill Service Group

• Schedule periodic Firedrills using fdsched

• Configure only one System for Firedrill SG

17 DR using VCS – Best Practices

Page 18: Best Practices in DR using VCS & VVR

DR using VCS

18

DR using VCS – Quick overview 1

Best Practices for DR configurations 2

Advanced DR Configurations 3

DR using VCS – Best Practices

Page 19: Best Practices in DR using VCS & VVR

Multi Tier applications

SFRAC SFHA SFCFS

VCS

AppHA AppHA AppHA

Production Site DR Site

PROD VBS FINANCE VBS DR VBS

Replication

GCO

Cluster Fault Event

1. Disaster event: (GCO) Cluster Fault

2. On DR site, run “VBS Showplan” to check whether VBS can be started

3. Perform “VBS Start” on DR Site

4. For the GCO Tier force takeover is done to start DB Tier SG

5. Subsequently App and Web Tier SGs Started

19

Dat

abas

e Ti

er

Mid

dle

war

e Ti

er

Web

Tie

r

DR using VCS – Best Practices

Page 20: Best Practices in DR using VCS & VVR

DR for Virtual Machines

• VM images replicated to DR site

– To maintain consistent application environment (no config drifts) across Data Centers

– Replicated Network parameters may be incorrect in the DR site because of different subnet

Best Practices

• Use VCS Virtualization Agents to manage Virtual machines

– E.g. Solaris Ldoms, Solaris Zones, AIX WPARs

• Use VCS Virtualization agents’ DROpts attributes

• IP Address, Netmask, Gateway, DNS, etc.

– VM Network Reconfiguration after DR failover

20

GCO

Replication

DR using VCS – Best Practices

Page 21: Best Practices in DR using VCS & VVR

Multi site replication

21

• Use VCS Replication agents to handle multi site replication configurations

– SRDF (Concurrent), SRDFstar, HTC3DC, RVGPrimary (with Bunker), etc.

• Configure single stretch cluster covering the synchronous sites

• Separate cluster for the asynchronous site, connected by VCS GCO

Sync Replication

Async Replication

Recovery link

APP 4

Primary Site Tertiary Site

Bunker/Secondary Site

DR using VCS – Best Practices

Page 22: Best Practices in DR using VCS & VVR

Agenda

DR using VCS & VVR - Best Practices 22

DR using VCS 1

DR using VVR 2

Q & A 3

Page 23: Best Practices in DR using VCS & VVR

Planning Replication

Goals

23

Advanced Replication Features

Optimizing Replication

Configuring Replication

Elective VVR best

practices

DR using VVR

Page 24: Best Practices in DR using VCS & VVR

Primary Site

Secondary Site

Replication Options: where VVR fits in

24

Storage-Based Replication

Host-Based Replication

Application-Based Replication

APP APP

DR using VVR

Page 25: Best Practices in DR using VCS & VVR

Primary Datacenter Secondary Site

25

VVR – Replication objects

Storage

Replicator

Log (SRL)

RLINK

Primary RVG

Secondary RVG

Storage

Replicator

Log (SRL)

APP

Production

Data

Secondary

Data

DR using VVR

Page 26: Best Practices in DR using VCS & VVR

VVR - Flexible replication modes to meet your SLA

26

• Maximum Protection: Zero RPO

• Ideal for small distances (< 100 KM) Synchronous

• Maximum Performance

• Ideal for any distance between sites Asynchronous

• Maximum Protection + Maximum Performance

• Cost effective 3-site solution Bunker

DR using VVR

Page 27: Best Practices in DR using VCS & VVR

Primary Datacenter

Secondary Site (DR site)

VVR Asynchronous Replication High performance replication across any distance

Application

host Storage

array Storage

array

SAN WAN

Write I/O

Cache & order I/O

Transfer I/O (distance = more time)

Store I/O

(distance = more time)

Transfer ack

Acknowledge

Next I/O Write I/O

in order

Performance Path

NOT in Performance Path

Application

host

27

I/O ack

DR using VVR

Page 28: Best Practices in DR using VCS & VVR

28

Best Practices in VVR - Overview

28

• VRAdvisor

Planning

Replication

• General practices

• Configuring a

secondary

• HA-DR agents

• Bunker replication

• Compression

• Off-host processing • VVR memory pool tuning

Configuring

Replication

Optimizing

Replication

Advanced

Replication

features

Best

Practices

overview

DR using VVR

Page 29: Best Practices in DR using VCS & VVR

29

DR using VVR

Page 30: Best Practices in DR using VCS & VVR

VRAdvisor – a VVR Planning tool

Use VRAdvisor

Graphic interface

No need to install VVR

How can I achieve desired RPO/RTO?

How do I tolerate network outages?

30 DR using VVR

Page 31: Best Practices in DR using VCS & VVR

VRAdvisor – a VVR Planning tool

• Tolerance to network & secondary outages

• Tolerance to I/O spikes

• RPO & RTO planning

The problems

• Data collection and analysis

• Graphical charts

• SRL size planning

• Network bandwidth recommendations for

• sync & async modes of replication

• specified RPO & RTO

• What-if analysis

What VRAdvisor does

31 DR using VVR

Page 32: Best Practices in DR using VCS & VVR

VRAdvisor – a VVR Planning tool

32 DR using VVR

Page 33: Best Practices in DR using VCS & VVR

VRAdvisor – a VVR Planning tool

33 DR using VVR

Page 34: Best Practices in DR using VCS & VVR

• General practices

• Configuring a VVR secondary

• Setting up HA-DR agents

Best Practices in VVR – Configuring Replication

34

Configuring

Replication

Planning

Replication

Optimizing

Replication

Advanced

Replication

features

Best

Practices in

VVR

DR using VVR

Page 35: Best Practices in DR using VCS & VVR

Configuring VVR – 101

• Create the primary Replicated Volume Group (RVG)

#> vradmin –g diskgroup createpri rvg volumelist srl

• Add the secondary site to this RVG

#> vradmin –g diskgroup addsec rvg primary secondary

• Replication is now configured, and can be started by

#> vradmin –g diskgroup –a startrep rvg secondary

• Monitor replication status by running

#> vradmin –l repstatus rvg

35 DR using VVR

Page 36: Best Practices in DR using VCS & VVR

General Practices for Configuring Replication

36

• 1 RVG per application

• 1 RVG per disk group

Better availability

• Use different physical disks for SRL and data volumes

• Stripe the SRL over several physical disks

• Tier-1/SSD storage for SRL: all writes go through the SRL!

Better performance

• Mirror all data volumes and SRLs across arrays

Better protection

DR using VVR

Page 37: Best Practices in DR using VCS & VVR

• General practices

• Configuring a VVR secondary

• Setting up HA-DR agents

Best Practices in VVR – Configuring Replication

37

Setting up

Replication

Planning

Replication

Optimizing

Replication

Advanced

Replication

features

Best

Practices in

VVR

DR using VVR

Page 38: Best Practices in DR using VCS & VVR

Best Practices for Configuring a VVR secondary

38

• Plan ahead for application clustering by configuring the replication IP addresses as virtual IPs

Virtual IP

• Open all the VVR heartbeat and data ports (vrport)

• For NAT-based firewalls, vol_vvr_use_nat tunable must be set to 1

Firewall

• Enable bandwidth limit for VVR, if network bandwidth is constrained or shared

• Use autosync with SmartMove for efficient initial synchronization (SF 6.0+)

• Use compression feature (SF 6.0+)

Network bandwidth

DR using VVR

Page 39: Best Practices in DR using VCS & VVR

• General practices

• Configuring a VVR secondary

• Setting up HA-DR agents

Best Practices in VVR – Configuring Replication

39

Setting up

Replication

Planning

Replication

Optimizing

Replication

Advanced

Replication

features

Best

Practices in

VVR

DR using VVR

Page 40: Best Practices in DR using VCS & VVR

Mount Service Group (Parallel & Global)

40

CVMVoldg

Online local firm dependency

RVGShared

RVGSharedPri

IP

NIC

CFSmount

CVMVxconfigd

CVMCluster

CFSfsckd

CVM Service

Group

(Parallel)

RVGShared Service Group

(Parallel)

RVGLogowner Service Group

(Failover)

App

IP

NIC

GCO Service Group

(Failover)

Online local firm

dependency

Online local firm

dependency

RVGLogowner

HA-DR Agent hierarchy – an example

DR using VVR

Page 41: Best Practices in DR using VCS & VVR

• VVR memory pool tuning

• Performance tuning

Best Practices in VVR – Optimizing replication

41

Setting up

Replication

Planning

Replication

Optimizing

Replication

Advanced

Replication

features

Best

Practices in

VVR

DR using VVR

Page 42: Best Practices in DR using VCS & VVR

Replica Update

State Machine

VVR Memory Pools - overview

NMCOM

MESSAGE SERVER

NMCOM

READBACK I/O ENGINE

READ WRITE

SRL

REPLICAS DATA VOLs

SECONDARY I/O

ENGINE

SRL DATA VOLs

USER

KERNEL

USER

KERNEL

4145 TCP

ANON UDP

WAN

RVIOMEM

NMCOM NMCOM

RDBACK

Primary Network Secondary

42 DR using VVR

Page 43: Best Practices in DR using VCS & VVR

• Using bunker replication

• Using off-host processing

• Using compression

Best Practices in VVR – Advanced Features

43

Setting up

Replication

Planning

Replication

Tuning

Replication

Advanced

Replication

features

Best

Practices in

VVR

DR using VVR

Page 44: Best Practices in DR using VCS & VVR

Best Practices for bunker replication

44

44

VVR VVR

VVR

Production Site

Disaster Recovery Site

Bunker Site

Asynchronous (any distance, over IP)

Synchronous (small distance; IP or FC)

Replay log (during DR)

Storage needed only for bunker SRL

- Follow bunker proximity guidelines

- Dedicate separate disks to bunker SRL

- Dedicate separate controllers to bunker SRL

DR using VVR

Page 45: Best Practices in DR using VCS & VVR

Planning VVR configuration • VRAdvisor

Summary

45

Advanced Replication Features • Bunker replication

Optimizing Replication • VVR memory pool tuning

Configuring Replication • Adding VVR secondary

• HA-DR agents

Elective VVR best

practices

DR using VVR

Page 46: Best Practices in DR using VCS & VVR

Thank you!

Copyright © 2010 Symantec Corporation. All rights reserved. Symantec and the Symantec Logo are trademarks or registered trademarks of Symantec Corporation or its affiliates in the U.S. and other countries. Other names may be trademarks of their respective owners. This document is provided for informational purposes only and is not intended as advertising. All warranties relating to the information in this document, either express or implied, are disclaimed to the maximum extent allowed by law. The information in this document is subject to change without notice.

Thank you!

46

Shrikant Ghare [email protected] Yatin Nayak [email protected]

Page 47: Best Practices in DR using VCS & VVR

Multi Tier applications: Planned migration (TBD)

SFRAC SFHA SFCFS

VCS

AppHA AppHA AppHA

Production Site DR Site

PROD VBS FINANCE VBS DR VBS

Replication

GCO

ONLINE

1. VBS contains mix of Global & Local SGs

2. Global SG is ONLINE on Production Site

3. “VBS Showplan” on DR site would show that VBS on DR cannot go online

4. Stop the VBS on Production site

5. Start the VBS on DR site

47 Advanced DR Configurations

Page 48: Best Practices in DR using VCS & VVR

Planning for Disaster Recovery

Ensure Business Continuance after Disaster

– Reduce downtime cost and recovery time (RTO)

– Reduce data loss (RPO)

– Automatic detection and notification of disaster/site failures

– Automated application component startup in the right order

– Ability to do planned migration of application to DR site

Veritas Cluster Server (VCS) for DR automation

– Single solution for local High Availability & Disaster Recovery

Planning for Disaster Recovery 48

Page 49: Best Practices in DR using VCS & VVR

DR Events in VOM

Advanced DR Configurations 49

Page 50: Best Practices in DR using VCS & VVR

DR Events in VOM

Advanced DR Configurations 50

Page 51: Best Practices in DR using VCS & VVR

Futures

• DR SLA computation

– Instant computation of DR SLAs like RPO and RTO

• VOM based recovery plan for SG and VBS

• Multi Tier Firedrills

Futures 51

Page 52: Best Practices in DR using VCS & VVR

Replication Configuration: Best Practices

• Identify hosts for replication management and install appropriate vendor tools on all control hosts

– These could be the same as Application’s SystemList, or different

• For in-band arrays, map the control devices to Control hosts

• E.g. EMC Gatekeepers, Hitachi Command Device

– For virtual machine control hosts, map the control devices in Raw device mode e.g. Physical RDM

• Configure security wherever possible

– Install secure vendor tools wherever available e.g. NaviSecCli

– Generate appropriate array security files

Best Practices for DR Configurations 52

Page 53: Best Practices in DR using VCS & VVR

Multi Tier Application: DR Best Practices

• Use VBS to manage multi-tier applications • Use Showplan to check the readiness of the Business Service on

DR Site for planned/unplanned operations

• There should be at least one global service group configured in VBS DR setup on at least one of the VBS Cluster Tier • Recommended: Global SGs at all tiers

• The failover policy of the global service group should be Manual • Multiple tiers can be consolidated into the same tier at the DR

site

53 Advanced DR Configurations

Page 54: Best Practices in DR using VCS & VVR

DR Configurations: Topologies

Metropolitan HA (Stretch Cluster)

Wide-Area DR (Global Cluster)

Local HA • Reduce

• Failover time

• Operator dependency

• Operator error

• Complexity

• Provide comprehensive data and application availability

VCS Benefits Veritas Cluster Server (VCS) for Disaster Recovery

SAP APP 1 APP 3 APP 2 SAP SAP APP 4 APP 4 APP 1 APP 2 APP 3

54

Steward

Asynchronous Replication

Synch Replication or Mirroring

DR using VCS – Best Practices