8/14/2019 Vmware Vcenter Srm Wp en Best Practices
VMware vCenter Site Recovery Manager 4.0
Performance and Best Practices for Performance
Architecting Your Recovery Plan to Minimize Recovery Time
WHITE PAPER
Table of Contents
Introduction
About VMware vCenter Site Recovery Manager 4.0
Performance Considerations in a Site Recovery Manager Environment
About This Paper
Reference Setup Environment
Site Recovery Manager Server Configuration Recommendation and Database Sizing
Summary of performance improvements made in Site Recovery Manager 4.0 over Site Recovery Manager 1.0
1. Basic Operation Latency Overview
2. Test Recovery Time vs. Real Recovery Time
3. Site Recovery Manager Scalability Performance
3.1. Protection (Scaling with a number of virtual machines)
3.2. Recovery (Scaling with a number of virtual machines and scaling with a number of protection groups)
3.3. High Latency Network
3.3.1. Creating protection groups on a high latency network
3.3.2. Recoveries on a high latency network
4. Architecting Recovery Plans (From a performance/recovery time perspective)
4.1. Virtual machine to protection group relation
4.2. Recovery time iSCSI/FC vs. NFS
4.3. Placeholder virtual machines placement and VMware Distributed Resource Scheduler (DRS) behavior on the recovery site
4.3.1. Job throttling during a recovery
4.4. Standby hosts on the recovery site and enabling DPM
4.5. High priority virtual machines and suspending virtual machines
4.6. Advanced Settings/VMware Tools
4.7. Specify a Non-replicated Datastore for Swap Files
5. Recommendations
6. Appendix
6.1. Acknowledgements
6.2. About the author
6.3. References
Reference Setup Environment
Below is the setup used for all Site Recovery Manager experiments presented in this white paper.
Note: Site Recovery Manager does not impose similar hardware requirements across both sites. You can have a different number of ESX Server hosts at the protected and recovery sites.
In this particular setup, both Site Recovery Manager and vCenter Server were installed on the same physical machine (this applies to both the protected and the recovery site).
Fig 1. Illustration of the Site Recovery Manager Testbed Environment (diagram: protected site and recovery site, each with Virtual Center, Site Recovery Manager, and datastore groups; the sites communicate over TCP, and array replication between the datastore groups also runs over TCP)
Hardware/Software Configuration
Experimental Setup
Site Recovery Manager 4.0, VMware vCenter 4.0, and VMware ESX 3.5 U4 were used for performance measurements.
WANem was used for simulating a high-latency network with packet drops.
Site Recovery Manager and VMware vCenter Server: Protected Site and Recovery Site Configuration
Host Computer: Dell PowerEdge R900
CPUs: 2 x Quad Core E7310 Xeon, 1.6GHz
RAM: 0GB
Network: Broadcom BCM5708C NetXtreme II GigE
Site Recovery Manager Server software: Site Recovery Manager 4.0
VMware vCenter Server software: VMware vCenter 4.0
ESX System: 5 ESX 3.5 hosts on Protected Site, 5 ESX 3.5 hosts on Recovery Site
Host Computer: Dell PE2650
CPUs: Intel Xeon CPU 3.06GHz
RAM: 8GB
Network: NetXtreme BCM570 Gigabit Ethernet, 1000 Mbps
ESX software: VMware ESX 3.5 U4
Network Simulation Software
WANem v2.1
Storage
LeftHand SAN/iQ 8.0 VSA nodes - SRA_8.0.00.1682
Site Recovery Manager Server Configuration Recommendation and Database Sizing
Below are the minimum hardware resource requirements for Site Recovery Manager 4.0 on both sites:
Processor: 2.0GHz or higher Intel or AMD x86 processor
Memory: 2GB minimum
Disk Storage: 2GB minimum
Networking: Gigabit recommended
Site Recovery Manager 4.0 uses 400KB of RAM per protected virtual machine at the recovery site during recoveries.
Site Recovery Manager uses a database on both the protected and recovery sites to store information. The protected site Site Recovery Manager database stores data regarding the protection group settings and protected virtual machines, while the recovery site Site Recovery Manager database stores information on recovery plan settings, results of testing recovery plans, results of running a real recovery, transactions made during a test or real recovery run, and more. Some of the disk space usage is permanent in nature, while some of it is transient, e.g., the space required for temporary transactional data during a running recovery.
It is recommended that the Site Recovery Manager database be installed as close to the Site Recovery Manager server as possible, as this reduces the round-trip time (RTT) between the two. With shorter round trips to the database server, recovery time performance will not suffer greatly.
You can use the same database server to support the vCenter database instance and the Site Recovery Manager database instance.
Database size is dependent upon:
Number of protected virtual machines
Number of protection groups
Number of recovery plans
Transient data written during test and real recoveries
Extra steps added to the recovery plan
Etc.
Refer to the Tools section in the Site Recovery Manager resources page for more information on how to size your Site Recovery Manager database for Oracle and SQL Server based on the mentioned parameters.
Summary of performance improvements made in Site Recovery Manager 4.0 over Site Recovery Manager 1.0
The following improvements are representative of the performance data collected from the lab environment described in the Introduction section. The scale of improvement may vary in your lab environment, as it depends on the scale of your inventory [4] and setup configuration [5].
UI improvements
UI responsiveness improvement for the Inventory Mapping screen with a large-scale inventory
UI responsiveness improvements for creating protection groups with a large number of virtual machines
UI performance speedup for displaying large recovery plans
Recovery time improvements
Increased parallelism for recovering virtual machines: 2 recovery virtual machine operations started per host instead of 1
Noticeable performance improvement for deleting swap files during recoveries, available with ESX 4.0 U1
Noticeable performance improvement for preparing storage during recovery plan execution via efficient database access for a large number of virtual machines
Noticeable performance improvement for all concurrent sub-steps during recovery plan execution via efficient database access for a large number of virtual machines
Noticeable performance improvement in searching datastores during recovery plan execution, available with VC 4.0 U1
Distribution of shadow virtual machines across hosts on the recovery site during protection improves recovery time performance
Other
Noticeable performance improvement in recovery plan creation
Test and real recovery time performance has improved in Site Recovery Manager 4.0 when compared to Site Recovery Manager 1.0 because of the recovery time improvements mentioned above.
[4] Inventory in reference includes, but is not limited to, virtual machines, protection groups, datastores, clusters, networks, and folders.
[5] Setup configuration in reference includes, but is not limited to, hardware configuration and RTT between the Site Recovery Manager Server and the Site Recovery Manager database.
http://www.vmware.com/products/srm/resource.html
1. Basic Operation Latency Overview
VMware vCenter Site Recovery Manager revolutionizes the way disaster recovery plans are designed and executed. This involves two simple steps: protection and recovery.
Protection involves the following operations:
Array manager configuration
Inventory mapping
Creating a protection group
Recovery involves the following operations:
Creating a recovery plan
Test recovery
Real recovery
The following graph depicts average baseline latencies [6] for major operations.
Note: The actual numbers can vary in a real deployment, and the numbers presented here are from a specific setup. The virtual machines used for all the data presented in this white paper did not have any IP customization specs or any waiting for heartbeats during the recovery. The motivation was to exercise Site Recovery Manager behavior during a recovery with different settings.
Fig 2. Basic Site Recovery Manager operations latency overview (bar chart; x-axis: major SRM operations, namely Inventory Mapping (1 entity), Creating a Protection Group (1 VM), Creating a Recovery Plan (1 VM), Test Recovery (1 VM), and Real Recovery (1 VM); y-axis: latency in seconds, 0 to 120; bar labels as extracted: 0.3, 0.1, 106, 94, 6)
Creating a protection group: Protecting protected site virtual machine(s) and creating placeholder virtual machine(s) on the recovery site
Inventory mapping: Mapping inventory entities like networks, compute resources, and virtual machine folders between the protected and the recovery site
[6] In this white paper, latency is defined as the time taken for a certain operation to finish.
Creating a recovery plan: Creating a recovery plan with protection group(s) consisting of protected virtual machine(s)
Test recovery: Executing a test recovery
Real recovery: Executing a real recovery
2. Test Recovery Time vs. Real Recovery Time
With any disaster recovery solution, you need to have an estimate of the recovery time. In addition to providing real recovery, Site Recovery Manager supports a non-disruptive test-recovery mode. The empirical data gathered from the setup show that real recovery time is capped by test recovery time. Test recovery time is higher than the real recovery time, as the former involves reverting the datacenter back to its original state. This involves recovery site operations like powering off test virtual machines, resetting the storage, and replacing test virtual machines with placeholder virtual machines (see Figure 3).
A real recovery, on the other hand, includes steps not executed in a test recovery, such as powering off production virtual machines at the protected site (if the protected site is still available, for example, in case of a planned migration).
Fig 3. Steps executed only in test recovery
Fig 4. Step executed only in real recovery (in case of datacenter migration)
As a general rule, you can gauge the real recovery time by measuring the time taken for a test recovery until the message prompt step (step 8 in Figure 3). In case of planned datacenter migration, you need to account for the time taken to shut down the virtual machines on the protected site (the step shown in Figure 4) as well.
Estimating an accurate RTO is relatively difficult, as there are a lot of variable parameters to consider, such as network latencies, resource availability, and storage I/O. For instance, recovery operation latency during the Prepare Storage step in a test recovery differs from the latency during a real recovery, since the two recovery modes entail different storage-level operations.
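The rule of thumb above can be written down as a small calculation. The sketch below is only an illustration: the function name, its parameters, and the sample durations are assumptions made here, not values from this paper, and the result is a rough gauge rather than an RTO guarantee.

```python
def estimate_real_recovery_seconds(test_time_until_prompt_s,
                                   protected_shutdown_s=0.0,
                                   planned_migration=False):
    """Rough real recovery estimate based on a measured test recovery.

    test_time_until_prompt_s: test recovery time measured up to the
        message prompt step (hypothetical input, in seconds).
    protected_shutdown_s: time to shut down the protected site virtual
        machines; only counted for a planned migration.
    """
    if test_time_until_prompt_s < 0 or protected_shutdown_s < 0:
        raise ValueError("durations must be non-negative")
    estimate = test_time_until_prompt_s
    if planned_migration:
        # The protected site is reachable, so its VMs are shut down first.
        estimate += protected_shutdown_s
    return estimate


# Hypothetical measurements: 1200 s of test recovery, 180 s of shutdowns.
print(estimate_real_recovery_seconds(1200.0, 180.0, planned_migration=True))
```

Storage-level differences between the two modes (for example in the Prepare Storage step) are not modeled, so the number should be treated as a starting point.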
3. Site Recovery Manager Scalability Performance
All the following operations were executed over a low-latency network.
3.1. Protection (Scaling with a number of virtual machines)
Site Recovery Manager 4.0 allows protecting a maximum of 1000 virtual machines. Site Recovery Manager scales well with an increasing number of virtual machines in a protection group. The majority of the time for this operation is spent creating placeholder virtual machines on the recovery site.
Fig 5. Protection group creation time: scaling with virtual machines under a single protection group (bar chart titled Create Protection Groups; x-axis: 1 protection group with 100, 200, 300, 400, and 500 virtual machines; y-axis: time in minutes, 0 to 45)
3.2. Recovery (Scaling with a number of virtual machines and scaling with a number of protection groups)
Site Recovery Manager 4.0 allows recovering a maximum of 1000 virtual machines and a maximum of 150 protection groups. As mentioned previously, test recovery time is higher than real recovery time. The same can be observed when scaling the number of virtual machines.
Site Recovery Manager 4.0 provides support for replicated NFS storage as well as iSCSI and FC. In Figures 6 and 7 you will find the statistics we gathered using software iSCSI replicated storage.
Fig 6. Test and real recovery time: scaling with virtual machines under a single protection group, software iSCSI (bar chart; series: Test Recovery and Real Recovery; x-axis: 1 protection group with 100, 200, 300, 400, and 500 virtual machines; y-axis: recovery time in minutes, 0 to 160)
Fig 7. Test recovery time: scaling with protection groups, 1 virtual machine per protection group (bar chart; series: Test Recovery; x-axis: 30, 60, 90, 120, and 150 protection groups with the same number of virtual machines; y-axis: recovery time in minutes, 0 to 100)
3.3. High Latency Network
Here is the setup used for simulating a high-latency network across both sites. WANem 2.1 was used to simulate a high-latency network with different RTTs.
Fig 8. Illustration of the Site Recovery Manager Testbed Environment with a high-latency network (diagram: protected site and recovery site, each with Virtual Center, Site Recovery Manager, and datastore groups; site-to-site traffic and array replication run over TCP, with WANem between the sites)
Network RTTs are typically 75 to 100 milliseconds for intra-US networks, about 250 milliseconds for transatlantic networks, and 320 to 430 milliseconds for satellite networks.
Latencies for creating protection groups and for real recoveries show some impact with a high-latency network between the protected and the recovery site.
The following experiments were carried out with RTTs of 100, 250, and 400 milliseconds.
3.3.1. Creating protection groups on a high latency network
Latencies for creating protection groups are affected by a high-latency network between the protected and the recovery site.
Creating a protection group involves the creation and monitoring of placeholder virtual machines on the recovery site; this involves a number of Remote Procedure Calls (RPCs).
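To see why RTT matters for an RPC-heavy operation, a back-of-the-envelope model can help. The sketch below assumes, purely for illustration, that the RPCs are issued one after another and that each waits one full round trip; the paper does not state how many RPCs are made or whether they overlap, so the call count used here is hypothetical.

```python
def added_latency_seconds(sequential_rpc_count, rtt_ms):
    """Extra wall-clock time if every RPC waits one full round trip.

    Both arguments are illustrative inputs; the model ignores server-side
    processing time and any pipelining of calls.
    """
    if sequential_rpc_count < 0 or rtt_ms < 0:
        raise ValueError("inputs must be non-negative")
    return sequential_rpc_count * rtt_ms / 1000.0


# Hypothetical: 1000 sequential RPCs at the RTTs used in the experiments.
for rtt_ms in (100, 250, 400):
    print(rtt_ms, "ms RTT ->", added_latency_seconds(1000, rtt_ms), "s added")
```

Under this model the added time grows linearly with RTT, which is the qualitative effect the section describes.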
Fig 9. Creating protection groups on a high-latency network with different RTTs between the protected and recovery site (bar chart titled Creating a Protection Group; x-axis: 1 protection group with 100 virtual machines at 100ms, 250ms, and 400ms RTT; y-axis: time in minutes, 0 to 10)
3.3.2. Recoveries on a high latency network
Real recoveries are affected by high-latency networks because of the operation to shut down remote virtual machines on the protected site. This applies to planned migration only and not to real disasters (as there would be no access to the protected site virtual machines in this case). A high-latency network between a protected and a recovery site should not affect test recoveries. Thus, network latency between the protected and the recovery sites needs to be considered when estimating real recovery time.
Fig 10. Test and real recovery time on a high-latency network with different RTTs (bar chart; series: Test Recovery and Real Recovery; x-axis: 1 protection group with 100 virtual machines at 100ms, 250ms, and 400ms RTT; y-axis: recovery time in minutes, 20 to 23.5)
4. Architecting Recovery Plans (From a performance/recovery time perspective)
In this section, Site Recovery Manager best practices in certain areas are provided. They will help you architect efficient recovery plans that minimize recovery time.
4.1. Virtual machine to protection group relation
For each protection group included in a recovery plan, Site Recovery Manager needs to communicate with the underlying storage to create snapshots of replicated LUNs (or promote replicated LUNs in case of real recovery) in that protection group and present them to recovery site hosts. Currently, this operation is executed sequentially for each protection group. As a result, adding more protection groups to a recovery plan, as opposed to adding more virtual machines, is more costly from a recovery time perspective. This is also dependent upon the underlying storage used for replication between the two sites.
The following are recovery time measurements for a test recovery of 150 virtual machines in a single protection group vs. 150 virtual machines in 150 protection groups (1 virtual machine per protection group).
Fig 11. Virtual machine to protection group relation (stacked bar chart; segments: Prepare Storage, Recover VMs, Power off Recovered VMs, Reset Storage; bars: 150 VMs under 1 Protection Group and 150 VMs under 150 Protection Groups; y-axis: test recovery time in seconds, 0 to 6000)
As shown in Figure 11, for a specific environment, the former case outperforms the latter. These numbers can change significantly depending on the environment.
Key Takeaways:
Grouping virtual machines in fewer protection groups enables faster test and real recoveries, provided those virtual machines have no constraints preventing them from being grouped under similar protection groups.
The above recommendation applies to both test and real recoveries. Note: the step to reset storage is not a part of the real recovery.
Adding a couple of protection groups does not necessarily increase the recovery time by a large factor. The recovery time overhead of adding a single protection group with a single virtual machine is more than that of adding a single virtual machine under an existing protection group.
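Because the storage operation runs sequentially for each protection group, its cost scales with the number of groups rather than the number of virtual machines. The sketch below illustrates this under a deliberately simple assumption that every protection group costs the same fixed storage time; the function name and the 30-second figure are hypothetical and are not measurements from this paper.

```python
def storage_phase_seconds(num_protection_groups, per_group_storage_s):
    """Time spent on storage operations when groups are handled one by one.

    per_group_storage_s is an assumed constant cost per protection group;
    real arrays may behave differently depending on LUN count and vendor.
    """
    if num_protection_groups < 0 or per_group_storage_s < 0:
        raise ValueError("inputs must be non-negative")
    return num_protection_groups * per_group_storage_s


# Same 150 virtual machines, grouped two ways (30 s per group is made up).
print(storage_phase_seconds(1, 30.0))    # one large protection group
print(storage_phase_seconds(150, 30.0))  # one protection group per VM
```

The model says nothing about the Recover VMs step, which depends on the number of virtual machines in both layouts.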
4.2. Recovery time iSCSI/FC vs. NFS
When working with iSCSI/FC storage, Site Recovery Manager initiates a rescan on the HBAs on the hosts used for the recovery to make the storage available to all the hosts during a recovery (test and real). These rescan calls are issued in parallel across all recovery site hosts.
When working with NFS storage, Site Recovery Manager mounts the replicated NFS volumes/snapshots on the ESX hosts during the recovery. Site Recovery Manager 4.0 issues all the mount/unmount calls in a serial manner per host.
This means that if it takes X seconds to mount a single replicated NFS volume on a single recovery site host, and you have N volumes to mount across M hosts in the recovery site cluster, then it will take N*M*X seconds to mount all the volumes across all the hosts during a recovery (both test and real). Similar behavior is expected for unmounting volumes.
For large-scale recoveries with a large number of hosts and NFS volumes hosting the protected virtual machines, it might take more time to mount or unmount NFS volumes across all the recovery site hosts as compared to rescanning all host HBAs for iSCSI or FC.
Key Takeaways:
It is a good practice to have fewer, but larger, NFS volumes so that the time taken to mount a large number of volumes decreases during the recovery.
This might also translate to fewer protection groups on your setup, which will help even more in reducing the recovery time, as mentioned in Section 4.1.
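The serial mount behavior described in this section amounts to multiplying the number of volumes, the number of hosts, and the per-mount time. The sketch below shows that arithmetic and how consolidating volumes changes the total; the function name and the sample numbers (volume counts, host count, 2 seconds per mount) are assumptions chosen for illustration only.

```python
def nfs_mount_seconds(num_volumes, num_hosts, seconds_per_mount):
    """Total time to mount every volume on every host, one call at a time."""
    if min(num_volumes, num_hosts) < 0 or seconds_per_mount < 0:
        raise ValueError("inputs must be non-negative")
    return num_volumes * num_hosts * seconds_per_mount


# Hypothetical cluster of 5 hosts, 2 s per mount.
many_small = nfs_mount_seconds(num_volumes=8, num_hosts=5, seconds_per_mount=2.0)
few_large = nfs_mount_seconds(num_volumes=4, num_hosts=5, seconds_per_mount=2.0)
print(many_small, few_large)
```

Halving the number of volumes halves the mount time in this model, which is the reasoning behind preferring fewer, larger NFS volumes.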
4.3. Placeholder virtual machines placement and VMware Distributed Resource Scheduler (DRS) behavior on the recovery site
Site Recovery Manager 4.0 creates placeholder virtual machines at the recovery site for each protected virtual machine on the protected site when creating protection groups. The placement of these placeholder virtual machines is a major factor in Site Recovery Manager's determination of where the virtual machines will be powered on during recovery. A rule of thumb is to distribute them evenly on all the available ESX hosts.
Site Recovery Manager 4.0 distributes all the shadow virtual machines in a random fashion across all hosts within a cluster. This is the default behavior and is not dependent on any kind of cluster settings. During a recovery (test and real), each of these placeholder virtual machines is replaced by its respective recovered virtual machine registered from the recovered datastore.
Site Recovery Manager 4.0 initiates a default of 2 Recovery virtual machines operations per host. It can concurrently start up to a max of 18 such operations during a recovery. That is why it becomes important to ensure that placeholder virtual machines are distributed across all hosts planned to be used for recovery, to obtain the max level of parallelism Site Recovery Manager offers for recovery operations. This will help in decreasing the recovery time.
For more details on job throttling during a recovery, refer to Section 4.3.1.
If you are adding new hosts to the recovery site after the virtual machines have already been protected, you should migrate shadow virtual machines (drag and drop) across the newly added hosts to fully leverage the new hardware for recovery, in order to decrease recovery time.
Site Recovery Manager 4.0 also offers host failure resiliency (i.e., if any ESX host hosting a placeholder virtual machine does not respond during a recovery for any reason, such as the ESX host not having access to the recovered datastore, then Site Recovery Manager selects other available hosts to recover that virtual machine on).
If VMware DRS is enabled on the recovery cluster, then Site Recovery Manager will utilize VMware DRS to load balance the recovered virtual machines across the hosts to ensure optimal performance and RTO. In this case, you do not need to manually distribute shadow virtual machines across all the hosts.
If VMware DRS is not enabled on the recovery cluster, concurrency during recovery will still be maintained because of the default random shadow virtual machine placement during virtual machine protection across the hosts within the recovery cluster.
4.3.1. Job throttling during a recovery
To alleviate resource pressure on Site Recovery Manager Server and vCenter Server, Site Recovery Manager performs job throttling by limiting the maximum number of concurrent operations initiated during a recovery.
Most of the sub-steps in a recovery plan are comprised of units of execution, each of which is dispatched to a single ESX host via vCenter Server. The following sub-steps support concurrent executions:
Shut down protected virtual machines on the protected site (real recovery only)
Suspend non-critical virtual machines
Recover normal priority virtual machines
Recover low priority virtual machines
Recover no power on virtual machines
Resume non-critical virtual machines (test recovery only)
Cleanup virtual machines post test (test recovery only)
At any time, the number of sub-steps executed concurrently is 18 or 2 times the number of ESX hosts used for recovery, whichever is less. Adding more than 9 hosts to your recovery site will not speed up the Recovery virtual machine step. These additional hosts will be used for recovering virtual machines, but the maximum number of hosts that can be used simultaneously by Site Recovery Manager during a recovery is 9.
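The throttling rule can be expressed as a one-line calculation. The sketch below assumes the limits are 18 concurrent operations overall and 2 per host, which is consistent with the 9-host ceiling stated above; the constant and function names are made up for this illustration and are not Site Recovery Manager settings.

```python
# Assumed limits, chosen to match the 9-host ceiling described in the text.
MAX_CONCURRENT_OPERATIONS = 18
OPERATIONS_PER_HOST = 2


def max_concurrent_operations(num_recovery_hosts):
    """Upper bound on sub-steps running at once for a given host count."""
    if num_recovery_hosts < 0:
        raise ValueError("host count must be non-negative")
    return min(MAX_CONCURRENT_OPERATIONS,
               OPERATIONS_PER_HOST * num_recovery_hosts)


for hosts in (3, 5, 9, 12):
    print(hosts, "hosts ->", max_concurrent_operations(hosts), "operations")
```

Beyond 9 hosts the bound stays flat, so extra hosts add capacity for running the recovered virtual machines but not additional recovery concurrency.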
Key Takeaways:
It is a good practice to have VMware DRS enabled on a recovery site. Migrations might be initiated as DRS tries to load balance the cluster during the recovery.
If DRS is not enabled, ensure that all recovery site hosts are being used for hosting placeholder virtual machines. This will ensure maximum parallelism during recovery. Evenly distribute all placeholder virtual machines manually across all hosts in this case.
4.4. Standby hosts on the recovery site and enabling DPM
Site Recovery Manager 4.0 works with vSphere to recover virtual machines on a DPM-enabled cluster.
If the recovery site cluster has DPM enabled, Site Recovery Manager works with DPM to power on any standby hosts in the cluster. Site Recovery Manager then works with DRS to use these hosts for recovering virtual machines during a test and real recovery.
For the case presented below, a real recovery with 100 virtual machines was carried out on a recovery site cluster (with DPM disabled and enabled) with 5 hosts, out of which 2 were placed in standby mode.
Fig 12. Recovery time benefits with VMware DPM: recovering virtual machines under a single protection group (bar chart; bars: DPM disabled on a cluster with 2 standby hosts and DPM enabled on a cluster with 2 standby hosts; real recovery with 1 protection group of 100 virtual machines; y-axis: recovery time in seconds, 0 to 3500)
Recovery with DPM enabled completed faster, as Site Recovery Manager had 2 extra powered-on hosts to use for recovering virtual machines concurrently.
Virtual machine recovery starts only after Site Recovery Manager has finished bringing all ESX hosts out of the standby mode. These standby ESX hosts are powered on concurrently. This power-on process creates an overhead, which is the maximum of the time taken to bring any one host out of standby mode. In general, the aforementioned overhead is relatively small compared to the gain in recovery time performance due to the availability of more ESX hosts. Since this overhead does not vary as the number of virtual machines increases, you will reap more performance benefits if you have a larger number of protected virtual machines.
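Since the standby hosts are powered on concurrently, the overhead is set by the slowest host rather than by the sum of all hosts. A minimal sketch of that reasoning follows; the function name and the per-host times are hypothetical.

```python
def standby_exit_overhead_seconds(host_exit_times_s):
    """Overhead before recovery can start when hosts exit standby in parallel.

    host_exit_times_s: per-host times to leave standby mode (illustrative).
    Returns 0.0 when no host is in standby.
    """
    return max(host_exit_times_s, default=0.0)


# Two hypothetical standby hosts taking 240 s and 300 s to power on.
print(standby_exit_overhead_seconds([240.0, 300.0]))
```

The overhead is the same whether 100 or 1000 virtual machines are recovered afterwards, which is why the relative benefit grows with the number of protected virtual machines.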
Key Takeaways:
It is a good practice to enable DPM on recovery site clusters if your recovery site hosts are in standby mode. More hosts lead to more concurrency for recovering virtual machines and thus result in shorter recovery time.
If the hosts are in standby mode and DPM is not enabled, bring the hosts out of standby mode manually and drag and drop shadow virtual machines onto them.
When protecting virtual machines (i.e., creating protection groups), ensure that the recovery site hosts mapped under the respective inventory are in a proper powered-on state; otherwise, Site Recovery Manager will not use those hosts to create placeholder virtual machines.
4.5. High priority virtual machines and suspending virtual machines
In a recovery plan, the virtual machines being recovered can be assigned as high, normal, or low priority virtual machines.
The assignment of the order of these virtual machines and the total number of virtual machines placed in each category will affect the recovery time.
For example, shown in Figure 13 are the recovery times for executing a test recovery with an increasing number of high priority virtual machines.
Fig 13. Test recovery: varying the number of high priority virtual machines (bar chart; series: Test Recovery; x-axis: 1 protection group with 100 virtual machines, of which 20, 40, and 60 were high priority; y-axis: recovery time in minutes, 23 to 31)
As you can see, the recovery time does increase with the number of virtual machines placed in high priority, as all high priority operations to recover virtual machines will be executed sequentially. This applies to both real and test recoveries.
As an alternative to placing some virtual machines in the high priority group and some virtual machines in the normal priority group in a single recovery plan, one can separate out all the virtual machines to be recovered into logical groups: Group 1, with the first level of virtual machines to be recovered, and Group 2, with the second level of virtual machines to be recovered (i.e., dependent upon the first-level virtual machines in Group 1). Virtual machines in Group 1 can be placed in normal priority, and virtual machines in Group 2 can be placed in low priority. Both of these priority groups allow parallel recovery of virtual machines. Since the low priority group will start after the normal priority group, you will still be maintaining the dependency between the logical groups of virtual machines without sacrificing the recovery time. Note that dependency is maintained between priority groups, and there is no concept of dependency across virtual machines within a single priority group.
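The effect of priority assignment can be illustrated with a simplified timing model. The sketch below assumes every virtual machine takes the same time to recover, that high priority virtual machines are recovered one at a time, and that normal and low priority virtual machines are recovered in batches of a fixed concurrency. All names and numbers are hypothetical; this is not how Site Recovery Manager schedules work internally.

```python
import math


def group_seconds(num_vms, per_vm_s, concurrency):
    """Time for one priority group when `concurrency` VMs recover at once."""
    if concurrency < 1:
        raise ValueError("concurrency must be at least 1")
    return math.ceil(num_vms / concurrency) * per_vm_s


def plan_seconds(high, normal, low, per_vm_s, concurrency):
    """Priority groups run one after another: high, then normal, then low."""
    return (group_seconds(high, per_vm_s, 1)            # sequential
            + group_seconds(normal, per_vm_s, concurrency)
            + group_seconds(low, per_vm_s, concurrency))


# 100 VMs, 10 s each, 10 at a time (all values made up).
print(plan_seconds(high=20, normal=80, low=0, per_vm_s=10.0, concurrency=10))
print(plan_seconds(high=0, normal=20, low=80, per_vm_s=10.0, concurrency=10))
```

In this model, moving the first-level virtual machines from high to normal priority and their dependents to low priority keeps the ordering between the two groups while letting both groups recover in parallel batches.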
Key Takeaways:
It is important to chart out the dependencies between virtual machines to be recovered here, so that only a certain number of required virtual machines are assigned as high priority. It does impact recovery time, test as well as real recovery.
As an alternative to placing virtual machines in the high priority group, one can separate all the virtual machines to be recovered into logical groups: Group 1 with level 1 virtual machines, and Group 2 with level 2 virtual machines, which are dependent upon virtual machines in Group 1. Group 1 virtual machines can then be placed in the normal priority group and Group 2 virtual machines in the low priority group within the same recovery plan. This should maintain dependency across both logical groups and will still reduce the recovery time by introducing more concurrency for both priority groups. Note that dependency is maintained between priority groups, and there is no concept of dependency across virtual machines within a single priority group.
Suspending virtual machines on the recovery site will also impact recovery time.
4.6. Advanced Settings/VMware Tools
Site Recovery Manager 4.0 provides the capability to change the advanced settings [7]. One advanced setting that is related to Site Recovery Manager performance is SanProvider.minLunGroupComputationInterval.
Whenever the protected site inventory changes, Site Recovery Manager will perform a new LUN Group computation. If you are adding multiple virtual machines to the datacenter, or changing any inventory in general, you can get Site Recovery Manager to wait before doing another computation by setting SanProvider.minLunGroupComputationInterval to the approximate time taken to make the change. For example, setting SanProvider.minLunGroupComputationInterval to N tells Site Recovery Manager that there should be at least N seconds in between any two consecutive LUN Group Computation tasks. This setting is intended to make LUN Group Computation tasks less frequent when there are a lot of inventory changes going on.
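A minimum interval between consecutive computations has the effect of coalescing bursts of inventory changes. The sketch below is a toy model of that idea only: it assumes a change arriving while a computation is already pending is covered by that pending computation, and that a new computation never starts sooner than N seconds after the previous one. The function name and the timestamps are hypothetical, and the real scheduling inside Site Recovery Manager may differ.

```python
def computation_start_times(change_times_s, min_interval_s):
    """Model when computations would start for a series of inventory changes."""
    if min_interval_s < 0:
        raise ValueError("interval must be non-negative")
    starts = []
    for change_time in sorted(change_times_s):
        if not starts:
            starts.append(change_time)
        elif change_time > starts[-1]:
            # The change came after the last scheduled start, so another
            # computation is needed, but not before the interval elapses.
            starts.append(max(change_time, starts[-1] + min_interval_s))
        # Otherwise a pending computation already covers this change.
    return starts


changes = [0, 1, 2, 3, 4]  # five changes, one per second (made up)
print(computation_start_times(changes, min_interval_s=0))
print(computation_start_times(changes, min_interval_s=10))
```

With no interval every change triggers its own computation, while a 10-second interval collapses the burst into two computations in this model.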
VMware strongly recommends that VMware Tools be installed in all protected virtual machines. Many Site Recovery Manager recovery operations depend on the proper installation of VMware Tools in the protected virtual machines:
Wait for OS heartbeat while powering on a virtual machine, and wait for network change while reconfiguring a recovered virtual machine.
Site Recovery Manager depends on VMware Tools to report the OS heartbeat and the completion of a network change. If you do not have VMware Tools installed on any of the protected virtual machines, you can choose to set the timeout values for Wait for OS Heartbeat and Wait for Network Change to zero (0).
Wait for virtual machines to shut down on the protected site.
During a real recovery, Site Recovery Manager tries to gracefully shut down the virtual machines on the protected site. Before Site Recovery Manager forcibly powers a virtual machine off, it tries to shut down the guest OS. If your intention is to power off the virtual machines without gracefully shutting down the guest OS, you can set the timeout value called Recovery.powerStateChangeTimeout in the advanced settings to zero. Note: this applies only to virtual machines with VMware Tools installed, and the timeout is automatically set to zero (0) if no VMware Tools are installed.
[7] Refer to the subsection Working with Advanced Settings in section 5 of the Site Recovery Manager Administration Guide for more details on Advanced Settings.
http://www.vmware.com/pdf/srm-admin.pdf
4.7. Specify a Non-replicated Datastore for Swap Files
Every virtual machine requires a swap file, which is normally created in the same datastore as the other virtual machine files. When you use Site Recovery Manager, this datastore is replicated. To prevent swap files from being replicated, create them on a non-replicated datastore.
If you are using a non-replicated datastore for swap files, you must create a non-replicated datastore for all protected clusters at both the protected and recovery sites.
Procedure
1. In the vSphere Client, right-click an ESX cluster and click Edit Settings.
2. In the Settings window for the cluster, click Swapfile Location and select Store the swapfile in the datastore specified by the host, then click OK.
3. For each host in the cluster, select a non-replicated datastore.
a. Click the Configuration tab.
b. On the Swapfile Location line, click Edit.
c. In the Virtual Machine Swapfile Location window, select a non-replicated datastore and click OK.
Recovery Time advantages
Site Recovery Manager makes remote calls to vCenter Server to delete any swap files found on a replicated datastore during a recovery on the recovery site. If swap files reside on non-replicated datastores, then this step will be skipped, which will help speed up the recovery. This will also avoid wasting network bandwidth during replication between the sites.
5. Recommendations
VMware vCenter Site Recovery Manager provides advanced capabilities for disaster recovery management, non-disruptive testing, and automated failover. The following performance recommendations have been made in this paper:
It is recommended that the Site Recovery Manager database be installed as close to the Site Recovery Manager server as possible, such that it reduces the RTT between the two. This way, recovery time performance will not suffer greatly because of round trips to the database server.
Grouping virtual machines under fewer protection groups enables faster test and real recoveries, provided those virtual machines have no constraints preventing them from being grouped under similar protection groups.
It is a good practice to have fewer, but larger, NFS volumes so that the time taken to mount a large number of such volumes decreases during the recovery. This might also translate to fewer protection groups on your setup, leading to reduced recovery time.
It is a good practice to have VMware DRS enabled on a recovery site. Migrations might be seen as DRS tries to load balance the cluster during the recovery.
It is a good practice to enable DPM on recovery site clusters if your recovery site hosts are in standby state. More hosts lead to more concurrency for recovering virtual machines and thus result in shorter recovery time.
If DPM is not enabled and the hosts are in a standby state, bring the hosts out of standby mode manually and drag and drop shadow virtual machines onto them.
VMware, Inc. Hillview Avenue, Palo Alto, CA, USA www.vmware.com
It is important to chart out the dependencies between and priorities for virtual machines to be recovered, so that only a certain number of required virtual machines are assigned as high priority. As an alternative to placing virtual machines in the high priority group, one can separate all the virtual machines to be recovered into logical groups: Group 1 with level 1 virtual machines, and Group 2 with level 2 virtual machines, which are dependent upon virtual machines in Group 1. Group 1 virtual machines can then be placed in the normal priority group and Group 2 virtual machines in the low priority group within the same plan. This should maintain dependency across both logical groups and will still reduce the recovery time by introducing more concurrency for both priority groups. Note that dependency is maintained between priority groups, and there is no concept of dependency across virtual machines within a single priority group.
It is strongly recommended that VMware Tools be installed in all protected virtual machines in order to accurately acquire their heartbeats and network change notification.
Make sure any internal script or call-out prompt does not block recovery indefinitely.
Specify a non-replicated datastore for swap files. This avoids wasting network bandwidth during replication between sites, and avoids remote calls to vCenter Server during a recovery to delete swap files for all virtual machines, which in turn helps in speeding up the recovery.
6. Appendix
6.1. Acknowledgements
Thanks to Arturo Fagundo, David Levy, Maria Basmanova, Win Crofton, Fabio Rojas, Site Recovery Manager team, John Liang, Rajit Kambo, Lee Dilworth, Desmond Chan, Michael White, Site Recovery Manager field staff, and VMware Technical Marketing for their input and reviews on various drafts of this paper.
6.2. About the author
Aalap Desai is an MTS Performance Engineer at VMware. He has been working on the performance project for VMware vCenter Site Recovery Manager. Aalap received his Masters in Computer Science from Syracuse University.
6.3. References
VMware vCenter Site Recovery Manager Documentation
http://www.vmware.com/support/pubs/srm_pubs.html
VMware vCenter Site Recovery Manager 4.0 Evaluator's Guide
http://www.vmware.com/files/pdf/vcenter-srm-evaluators-guide.pdf
Site Recovery Manager Administration Guide
http://www.vmware.com/pdf/srm-admin.pdf
VMware vCenter Site Recovery Manager Resources for Business Continuity
http://www.vmware.com/products/srm/resource.html
Tools for sizing the Site Recovery Manager database based on the inventory size can be found under the Tools section online.