vmware vcenter srm wp en best practices

Upload: mmonce8916

Post on 30-May-2018

226 views

Category:

Documents


0 download

TRANSCRIPT

  • 8/14/2019 Vmware Vcenter Srm Wp en Best Practices

    1/20

    VMware vCenter

    SiteRecovery Manager 4.0

    Performance and Best

    Practices for Performance

    ArchitectingYourRecoveryPlantoMinimizeRecoveryTimeW H I T E P A P E R

  • 8/14/2019 Vmware Vcenter Srm Wp en Best Practices

    2/20

    VMware vCenter Site Recovery Manager 4.0Performance and Best Practices for Performance

    T E C H N I C A L W H I T E P A P E R / 2

    Table of Contents

    Introduction 3

    About VMware vCenter Site Recovery Manager 4.0 3

    Perormance Considerations in a Site Recovery Manager Environment 3

    About This Paper 3

    Reerence Setup Environment 4

    Site Recovery Manager Server Confguration Recommendation

    and Database Sizing 5

    Summary o perormance improvements made in Site Recovery Manager 4.0

    over Site Recovery Manager 1.0 6

    1. Basic Operation Latency Overview 7

    2. Test Recovery Time vs. Real Recovery Time 8

    3. Site Recovery Manager Scalability Perormance 9

    3.1. Protection (Scaling with a number o virtual machines) 9

    3.2. Recovery (Scaling with a number o virtual machines and scaling

    with a number o protection groups) 9

    3.3. High Latency Network 11

    3.3.1. Creating protection groups on a high latency network 11

    3.3.2. Recoveries on a high latency network 12

    4. Architecting Recovery Plans (From a perormance/recovery time perspective) 13

    4.1. Virtual machine to protection group relation 13

    4.2. Recovery time iSCSI/FC vs. NFS 14

    4.3. Placeholder virtual machines placement and VMware Distributed

    Resource Scheduler (DRS) behavior on the recovery site 14

    4.3.1. Job throttling during a recovery 15

    4.4. Standby hosts on the recovery site and enabling DPM 15

    4.5. High priority virtual machines and suspending virtual machines 17

    4.6. Advanced Settings/VMware Tools 18

    4.7. Speciy a Non-replicated Datastore or Swap Files 19

    5. Recommendations 19

    6. Appendix 20

    6.1. Acknowledgements 20

    6.2. About the author 20

    6.3. Reerences 20

  • 8/14/2019 Vmware Vcenter Srm Wp en Best Practices

    3/20

  • 8/14/2019 Vmware Vcenter Srm Wp en Best Practices

    4/20

    VMware vCenter Site Recovery Manager 4.0Performance and Best Practices for Performance

    T E C H N I C A L W H I T E P A P E R / 4

    Reerence Setup Environment

    BelowisthesetupusedorallSiteRecoveryManagerexperimentspresentedinthiswhitepaper

    Note: SiteRecoveryManagerdoesnotimposesimilarhardwarerequirementsacrossbothsitesYoucanhave

    diferentnumberoESXServerhostsatprotectedandrecoverysites

    InthisparticularsetupbothSiteRecoveryManagerandvCenterServerwereinstalledonthesamephysical

    machine(thisappliestoboththeprotectedandtherecoverysite)

    Datastore Groups Datastore Groups

    Recovery SiteProtected SiteOver TCP

    Virtual CenterVirtual Center

    Array Replication

    Over TCP

    ++ ++Site Recovery

    ManagerSite Recovery

    Manager

    Fig 1.IllustrationotheSiteRecoveryManagerTestbedEnvironment

    Hardware/Sotware Confguration

    Experimental Setup

    SiteRecoveryManager40VMwarevCenter40andVMwareESX5U4wereusedorperormance

    measurements

    WANemwasusedorsimulatingahigh-latencynetworkwithpacketdrops

    Site Recovery Manager and VMware vCenter ServerProtected Site and Recovery Site Confguration

    HostComputer:DellPowerEdgeR900

    CPUs:2xQuadCoreE7310Xeon,1.6GHz

    RAM:0GB

    Network:BroadcomBCM5708CNetXtremeIIGigE

    SiteRecoveryManagerServersotware:SiteRecoveryManager40

    VMwarevCenterServersotware:VMwarevCenter40

  • 8/14/2019 Vmware Vcenter Srm Wp en Best Practices

    5/20

    VMware vCenter Site Recovery Manager 4.0Performance and Best Practices for Performance

    T E C H N I C A L W H I T E P A P E R / 5

    ESX System5 ESX 3.5 hosts on Protected Site, 5 ESX 3.5 hosts on Recovery Site

    HostComputer:DellPE2650

    CPUs:IntelXeonCPU3.06GHz

    RAM:8GB

    Network:NetXtremeBCM570GigabitEthernet00mbps

    ESXsotware:VMwareESX5U4

    Network Simulation Sotware

    WANemv

    Storage

    LeftHandSANIQ8.0VSAnodes-SRA_8.0.00.1682

    Site Recovery Manager Server Coniguration Recommendation and Database SizingBelowaretheminimumhardwareresourcerequirementsorSiteRecoveryManager40onbothsites:

    Processor2.0GHzorhigherIntelorAMDx86processor

    Memory2GBminimum

    DiskStorage2GBminimum

    NetworkingGigabitrecommended

    SiteRecoveryManager40uses400KBRAMperprotectedvirtualmachineortherecoverysiteduring

    recoveries

    SiteRecoveryManagerusesadatabaseonbothprotectedandrecoverysitestostoreinormationThe

    protectedsiteSiteRecoveryManagerdatabasestoresdataregardingtheprotectiongroupsettingsand

    protectedvirtualmachineswhiletherecoverysiteSiteRecoveryManagerdatabasestoresinormationonrecoveryplansettingsresultsortestingrecoveryplansresultsorrunningarealrecoverytransactionsmade

    duringatestorrealrecoveryrunandmoreSomeothediskspaceusageispermanentinnaturewhilesome

    oitistransientegthespacerequiredortemporarytransactionaldataduringarunningrecovery

    ItisrecommendedthattheSiteRecoveryManagerdatabasebeinstalledasclosetotheSiteRecoveryManager

    serveraspossibleasitreducestheRoundTripTimes(RTT)betweenbothothemThiswayrecoverytime

    perormancewillnotsufergreatlybecauseoshorterroundtripstothedatabaseserver

    YoucanusethesamedatabaseservertosupportthevCenterdatabaseinstanceandtheSiteRecovery

    Managerdatabaseinstance

    Databasesizeisdependentupon:

    Numberofprotectedvirtualmachines

    Numberofprotectiongroups

    Numberofrecoveryplans

    Transientdatawrittenduringtestandrealrecoveries

    Extrastepsaddedtotherecoveryplan

    Etc.

  • 8/14/2019 Vmware Vcenter Srm Wp en Best Practices

    6/20

    VMware vCenter Site Recovery Manager 4.0Performance and Best Practices for Performance

    T E C H N I C A L W H I T E P A P E R / 6

    ReertotheToolssectionintheSiteRecoveryManagerresources pageormoreinormationonhowtosize

    yourSiteRecoveryManagerdatabaseorOracleandSQLServerbasedonthementionedparameters

    Summary o perormance improvements made in Site Recovery Manager 4.0 over Site Recovery

    Manager 1.0

    Theollowingimprovementsarerepresentativeotheperormancedatacollectedromthelabenvironment

    describedintheIntroductionsectionThescaleoimprovementmayvaryinyourlabenvironmentasit

    dependsonthescaleoyourinventory 4andsetupconguration5

    UI improvements

    UIresponsivenessimprovementforInventoryMappingscreenwithalargescaleinventory

    UIresponsivenessimprovementsforcreatingprotectiongroupswithalargenumberofvirtualmachines

    UIperformancespeedupfordisplayinglargerecoveryplans

    Recovery time improvements

    Increasedparallelismforrecoveringvirtualmachines-2recoveryvirtualmachineoperationsstartedper

    hostinsteado

    NoticeableperformanceimprovementfordeletingswaplesduringrecoveriesavailablewithESX4.0U1

    Noticeableperformanceimprovementforpreparingstorageduringrecoveryplanexecutionviaecient

    databaseaccessoralargenumberovirtualmachines

    Noticeableperformanceimprovementforallconcurrentsub-stepsduringrecoveryplanexecutionviaecient

    databaseaccessoralargenumberovirtualmachines

    Noticeableperformanceimprovementinsearchingdatastoresduringrecoveryplanexecutionavailablewith

    VC40U

    Distributionofshadowvirtualmachinesacrosshostsontherecoverysiteduringprotectionimprovesrecovery

    timeperormanceOther

    Noticeableperformanceimprovementinrecoveryplancreation

    TestandrealrecoverytimeperormancehasimprovedinSiteRecoveryManager40whencomparedtoSite

    RecoveryManager0becauseotherecoverytimeimprovementsmentionedabove

    4 Inventory in reerence includes, but not limited to, virtual machines, protection groups, datastores, clusters, network and olders.

    5Setup conguration in reerence includes, but not limited to, hardware conguration and RTT between Site Recovery Manager Server and Site Recovery

    Manager database.

    http://www.vmware.com/products/srm/resource.htmlhttp://www.vmware.com/products/srm/resource.html
  • 8/14/2019 Vmware Vcenter Srm Wp en Best Practices

    7/20

    VMware vCenter Site Recovery Manager 4.0Performance and Best Practices for Performance

    T E C H N I C A L W H I T E P A P E R / 7

    1. Basic Operation Latency Overview

    VMwarevCenterSiteRecoveryManagerrevolutionizesthewaydisasterrecoveryplansaredesignedand

    executedThisinvolvestwosimplesteps:protectionandrecovery

    Protectioninvolvesollowingoperations:

    Arraymanagerconguration

    Inventorymapping

    Creatingaprotectiongroup

    Recoveryinvolvesollowingoperations:

    Creatingarecoveryplan

    Testrecovery

    Realrecovery

    Theollowinggraphdepictsaveragebaselinelatenciesormajoroperations

    Note: Theactualnumberscanvaryinarealdeploymentandthenumberspresentedhereareroma

    specicsetupThevirtualmachinesusedorallthedatapresentedinthiswhitepaperdidnothaveanyIP

    customizationspecsoranywaitingorheartbeatsduringtherecoveryThemotivationwastoexerciseSite

    RecoveryManagerbehaviorduringarecoverywithdiferentsettings

    Inventory Mapping(1 entity)

    Creating a ProtectionGroup (1 VM)

    0.3 0.1

    106

    94

    6

    Creating aRecovery Plan (1 VM)

    Test Recovery(1 VM)

    Real Recovery(1 VM)

    Latency

    (sec)

    0

    20

    40

    60

    80

    100

    120

    Major SRM Operations

    Fig 2.BasicSiteRecoveryManageroperationslatencyoverview

    Creating a protection group:Protectingprotectedsitevirtualmachine(s)andcreatingplaceholdervirtual

    machine(s)ontherecoverysite

    Inventory mapping:Mappinginventoryentitieslikenetworkscomputeresourcesandvirtualmachineolders

    betweentheprotectedandtherecoverysite

    6In this white paper, latency is dened as the time taken or a certain operation to nish.

  • 8/14/2019 Vmware Vcenter Srm Wp en Best Practices

    8/20

    VMware vCenter Site Recovery Manager 4.0Performance and Best Practices for Performance

    T E C H N I C A L W H I T E P A P E R / 8

    Creating a recovery plan: Creatingarecoveryplanwithprotectiongroup(s)consistingoprotectedvirtual

    machine(s)

    Test recovery: Executingatestrecovery

    Real recovery:Executingarealrecovery

    2. Test Recovery Time vs. Real Recovery Time

    WithanydisasterrecoverysolutionyouneedtohaveanestimationotherecoverytimeInadditionto

    providingrealrecoverySiteRecoveryManagersupportsanon-disruptivetest-recoverymodeTheempirical

    datagatheredromthesetupshowthatrealrecoverytimeiscappedbytestrecoverytimeTestrecovery

    timeishigherthantherealrecoverytimeastheormerinvolvesrevertingthedatacenterbacktoitsoriginal

    stateThisinvolvesrecoverysiteoperationslikepoweringoftestvirtualmachinesresettingthestorageand

    replacingtestvirtualmachineswithplaceholdervirtualmachines(seeFigure)

    Arealrecoveryontheotherhandincludesstepsnotexecutedinatestrecoverysuchaspoweringof

    productionvirtualmachinesatprotectedsite(iprotectedsiteisstillavailableorexampleincaseoaplanned

    migration)

    Fig 3. Stepsexecutedonlyintestrecovery

    Fig 4.Stepexecutedonlyinrealrecovery(incaseodatacentermigration)

    Asageneralruleyoucangaugetherealrecoverytimebymeasuringthetimetakenoratestrecoveryuntilthe

    messagepromptstep(step8inFigure)Incaseoplanneddatacentermigrationyouneedtoaccountorthe

    timetakentoshutdownthevirtualmachines(stepinFigure4)ontheprotectedsiteaswell

    EstimatinganaccurateRTOisrelativelydicultastherearealotovariableparameterstoconsidersuchas

    networklatenciesresourceavailabilityandstorageI/OForinstancerecoveryoperationlatencyduringthe

    PrepareStoragestepinatestrecoverydiersfromthelatencyduringarealrecoverysincethetworecovery

    modesentaildiferentstorageleveloperations

  • 8/14/2019 Vmware Vcenter Srm Wp en Best Practices

    9/20

    VMware vCenter Site Recovery Manager 4.0Performance and Best Practices for Performance

    T E C H N I C A L W H I T E P A P E R / 9

    3. Site Recovery Manager Scalability Perormance

    Alltheollowingoperationswereexecutedoveralowlatencynetwork

    3.1. Protection (Scaling with a number o virtual machines)SiteRecoveryManager40allowsprotectingamaximumo000virtualmachinesSiteRecoveryManager

    scaleswellwithanincreasingnumberovirtualmachinesinaprotectiongroupMajoritytimeorthisoperation

    isspentincreatingplaceholdervirtualmachinesontherecoverysite

    Create Protection Groups

    1 protection group -

    100 virtual machines

    1 protection group -

    200 virtual machines

    1 protection group -

    300 virtual machines

    1 protection group -

    400 virtual machines

    1 protection group -

    500 virtual machines

    Time(M

    in)

    0

    5

    10

    15

    20

    25

    30

    35

    45

    Fig 5.Protectiongroupcreationtime:scalingwithvirtualmachinesunderasingleprotectiongroup

    3.2. Recovery (Scaling with a number o virtual machines and scaling with a number o protection

    groups)

    SiteRecoveryManager40allowsrecoveringamaximumo000virtualmachinesandamaximumo50

    protectiongroupsAsmentionedpreviouslytestrecoverytimeishigherthanrealrecoverytimeThesamecan

    beobservedbyscalingthenumberovirtualmachines

  • 8/14/2019 Vmware Vcenter Srm Wp en Best Practices

    10/20

    VMware vCenter Site Recovery Manager 4.0Performance and Best Practices for Performance

    T E C H N I C A L W H I T E P A P E R / 1 0

    SiteRecoveryManager40providessupportorreplicatedNFSstorageaswellasiSCSIandFCIngure6and

    7youwillndthestatisticswegatheredusingsotwareiSCSIreplicatedstorage

    Real RecoveryTest Recovery

    RecoveryTime(Min)

    0

    20

    40

    60

    80

    100

    120

    140

    160

    1 protection group -

    100 virtual machines1 protection group -

    200 virtual machines

    1 protection group -

    300 virtual machines

    1 protection group -

    400 virtual machines

    1 protection group -

    500 virtual machines

    Fig 6. Testandrealrecoverytime:scalingwithvirtualmachinesunderasingleprotectiongroupSotwareiSCSI

    Test Recovery

    RecoveryTime(Min)

    0

    10

    20

    30

    40

    50

    60

    70

    80

    90

    100

    30 protection groups -

    30 virtual machines

    60 protection groups -

    60 virtual machines

    90 protection groups -

    90 virtual machines

    120 protection groups -

    120 virtual machines

    150 protection groups -

    150 virtual machines

    FigTestrecoverytime:scalingwithprotectiongroups(virtualmachineperprotectiongroup)

  • 8/14/2019 Vmware Vcenter Srm Wp en Best Practices

    11/20

    VMware vCenter Site Recovery Manager 4.0Performance and Best Practices for Performance

    T E C H N I C A L W H I T E P A P E R / 1 1

    3.3. High Latency Network

    Hereisthesetupusedforsimulatingahighlatencynetworkacrossbothsites.WANem2.1wasusedtosimulate

    ahighlatencynetworkwithdiferentRTT

    Datastore Groups Datastore Groups

    Over TCP

    Array Replication

    Over TCP

    WANem

    Virtual Center ++ Virtual Center ++

    Recovery SiteProtected Site

    Site RecoveryManager

    Site RecoveryManager

    Fig 8.IllustrationotheSiteRecoveryManagerTestbedEnvironmentwithahighlatencynetwork

    NetworkRTTaretypically75to00millisecondsorintra-USnetworksabout50millisecondsor

    transatlanticnetworksand0to40millisecondsorsatellitenetworks

    Latenciesorcreatingprotectiongroupsandrealrecoveriesshowsomeimpactwithahighlatencynetworkbetweentheprotectedandtherecoverysite

    TheollowingexperimentswerecarriedoutwithRTTo0050400milliseconds

    3.3.1. Creating protection groups on a high latency network

    Latenciesorcreatingprotectiongroupsareafectedbyahighlatencynetworkbetweentheprotectedandthe

    recoverysite

    Creatingaprotectiongroupinvolvesthecreationandmonitoringoplaceholdervirtualmachinesonthe

    recoverysitethisinvolvesanumberoRemoteProcedureCalls(RPC)

  • 8/14/2019 Vmware Vcenter Srm Wp en Best Practices

    12/20

    VMware vCenter Site Recovery Manager 4.0Performance and Best Practices for Performance

    T E C H N I C A L W H I T E P A P E R / 1 2

    Creating a Protection Group

    1 protection group with

    100 virtual machines - 250ms RTT

    1 protection group with

    100 virtual machines - 400ms RTT

    Time(Min)

    0

    1

    2

    3

    4

    5

    6

    7

    8

    910

    1 protection group with

    100 virtual machines -100ms RTT

    Fig 9.CreatingprotectiongroupsonahighlatencynetworkwithdiferentRTTbetweenprotectedandrecoverysite

    3.3.2. Recoveries on a high latency network

    Realrecoveriesareafectedbyhighlatencynetworksbecauseotheoperationtoshutdownremotevirtual

    machineontheprotectedsiteThisappliestoplannedmigrationonlyandnottorealdisasters(astherewould

    benoaccesstotheprotectedsitevirtualmachinesinthiscase)Ahighlatencynetworkbetweenaprotected

    andarecoverysiteshouldnotafecttestrecoveriesThusnetworklatencybetweentheprotectedandthe

    recoverysitesneedstobeconsideredwhenestimatingrealrecoverytime

    Real RecoveryTest Recovery

    RecoveryTime(Min)

    20

    20.5

    21

    21.5

    22

    22.5

    23

    23.5

    1 protection group with

    100 virtual machines - 250ms RTT

    1 protection group with

    100 virtual machines - 400ms RTT

    1 protection group with

    100 virtual machines -100ms RTT

    Fig 10.TestandrealrecoverytimeonahighlatencynetworkwithdiferentRTT

  • 8/14/2019 Vmware Vcenter Srm Wp en Best Practices

    13/20

    VMware vCenter Site Recovery Manager 4.0Performance and Best Practices for Performance

    T E C H N I C A L W H I T E P A P E R / 1 3

    4. Architecting Recovery Plans (From a perormance/recovery time perspective)

    InthissectionSiteRecoveryManagerbestpracticesincertainareasareprovidedTheywillhelpyouarchitect

    ecientrecoveryplansthatminimizerecoverytime

    4.1. Virtual machine to protection group relation

    ForeachprotectiongroupincludedinarecoveryplanSiteRecoveryManagerneedstocommunicatewith

    theunderlyingstoragetocreatesnapshotsoreplicatedLUNs(orpromotereplicatedLUNsincaseoreal

    recovery)inthatprotectiongroupandpresentthemtorecoverysitehostsCurrentlythisoperationis

    executedsequentiallyoreachprotectiongroupAsaresultaddingmoreprotectiongroupstoarecovery

    planasopposedtoaddingmorevirtualmachinesismorecostlyromarecoverytimeperspectiveThisisalso

    dependentupontheunderlyingstorageusedorreplicationbetweenthetwosites

    Theollowingarerecoverytimemeasurementsoratestrecoveryor50virtualmachinesinasingleprotection

    groupvs50virtualmachinesin50protectiongroups(virtualmachineperprotectiongroup)

    Prepare Storage Recover VMs Power o Recovered VMs Reset Storage

    150 VMs under 1 Protection Group 150 VMs under 150 Protection Group

    TestRecoveryTime(sec)

    0

    1000

    2000

    3000

    4000

    5000

    6000

    Fig 11. Virtualmachinetoprotectiongrouprelation

    AsshowninFigureoraspecicenvironmenttheormercaseoutperormsthelatterThesenumberscan

    changesignicantlydependingontheenvironment

    Key Takeaways:

    Groupingvirtualmachinesinfewerprotectiongroupsenablesfastertestandrealrecoveriesprovidedthose

    virtualmachineshavenoconstraintspreventingthemrombeinggroupedundersimilarprotectiongroups

    Theaboverecommendationappliestobothtestandrealrecoveries.Note:thesteptoresetstorageisnota

    partotherealrecovery

    Addingacoupleofprotectiongroupsdoesnotnecessarilyincreasetherecoverytimebyalargefactor.The

    recoverytimeoverheadoaddingasingleprotectiongroupwithasinglevirtualmachineismorethanadding

    asinglevirtualmachineunderanexistingprotectiongroup

  • 8/14/2019 Vmware Vcenter Srm Wp en Best Practices

    14/20

    VMware vCenter Site Recovery Manager 4.0Performance and Best Practices for Performance

    T E C H N I C A L W H I T E P A P E R / 1 4

    4.2. Recovery time iSCSI/FC vs. NFS

    WhenworkingwithiSCSI/FCstorage,SiteRecoveryManagerinitiatesarescanontheHBAsonthehostsused

    ortherecoverytomakethestorageavailabletoallthehostsduringarecovery(testandreal)Theserescan

    callsareissuedinparallelacrossallrecoverysitehosts

    WhenworkingwithNFSstorageSiteRecoveryManagermountsthereplicatedNFSvolumes/snapshotson

    theESXhostsduringtherecoverySiteRecoveryManager40issuesallthemount/un-mountcallsinaserial

    mannerperhost

    ThismeansthatiittakesXsecondstomountasinglereplicatedNFSvolumeonasinglerecoverysitehostand

    youhave0volumestomountacross0hostsintherecoverysiteclusterthenitwilltake0*0*Xsecondsto

    mountallthevolumesacrossallthehostsduringarecovery(bothtestandreal)Similarbehaviorisexpected

    orunmountingvolumes

    ForlargescalerecoverieswithalargenumberohostsandNFSvolumeshostingtheprotectedvirtual

    machinesitmighttakemoretimetomountorun-mountNFSvolumesacrossalltherecoverysitehostsas

    comparedtorescanningallhostHBAsforiSCSIorFC.

    Key Takeaways:

    Itisagoodpracticetohavefewer,butlarger,NFSvolumessothatthetimetakentomountalargenumberof

    volumesdecreasesduringtherecovery

    Thismightalsotranslatetofewerprotectiongroupsonyoursetupwhichwillhelpevenmoreinreducingthe

    recoverytimeasmentionedinSection4.1

    4.3. Placeholder virtual machines placement and VMware Distributed Resource Scheduler (DRS) behav-

    ior on the recovery site

    SiteRecoveryManager40createsplaceholdervirtualmachinesattherecoverysiteoreachprotectedvirtual

    machineontheprotectedsitewhencreatingprotectiongroupsTheplacementotheseplaceholdervirtual

    machinesisamajoractorinSiteRecoveryManagerdeterminationowherethevirtualmachineswillbe

    poweredonduringrecoveryAruleothumbistodistributethemevenlyonalltheavailableESXhosts

    SiteRecoveryManager40distributesalltheshadowvirtualmachinesinarandomashionacrossallhosts

    withinacluster.Thisisthedefaultbehaviorandisnotdependentonanykindofclustersettings.Duringarecovery(testandreal)eachotheseplaceholdervirtualmachinesarereplacedbytheirrespectiverecovered

    virtualmachinesregisteredromtherecovereddatastore

    SiteRecoveryManager40initiatesadeaultoRecoveryvirtualmachinesoperationsperhostItcan

    concurrentlystartuptoamaxo8suchoperationsduringarecoveryThatiswhyitbecomesimportantto

    ensurethatplaceholdervirtualmachinesaredistributedacrossallhostsplannedtobeusedorrecoveryto

    obtainthemaxleveloparallelismSiteRecoveryManagerofersorrecoveryoperationsThiswillhelpin

    decreasingtherecoverytime

    Formoredetailsonjobthrottlingduringarecoveryreerto Section4.3.1

    Iyouareaddingnewhoststotherecoverysiteaterthevirtualmachineshavealreadybeenprotectedyou

    shouldmigrateshadowvirtualmachines(draganddrop)acrossthenewlyaddedhoststoullyleveragethe

    newhardwareorrecoveryinordertodecreaserecoverytime

    SiteRecoveryManager40alsoofershostailureresiliency(ieianyESXhosthostingaplaceholdervirtual

    machinedoesnotrespondduringarecoverybecauseoanyreasonsuchasthattheESXhostdoesnothave

    accesstotherecovereddatastorethenSiteRecoveryManagerselectsotheravailablehoststorecoverthat

    virtualmachineon)

    IfVMwareDRSisenabledontherecoverycluster,thenSiteRecoveryManagerwillutilizeVMwareDRStoload

    balancetherecoveredvirtualmachinesacrossthehoststoensureoptimalperormanceandRTOInthiscase

    youdonotneedtomanuallydistributeshadowvirtualmachinesacrossallthehosts

  • 8/14/2019 Vmware Vcenter Srm Wp en Best Practices

    15/20

    VMware vCenter Site Recovery Manager 4.0Performance and Best Practices for Performance

    T E C H N I C A L W H I T E P A P E R / 1 5

    IfVMwareDRSisnotenabledontherecoverycluster,concurrencyduringrecoverywillstillbemaintained

    becauseothedeaultrandomshadowvirtualmachineplacementduringvirtualmachineprotectionacrossthe

    hostswithintherecoverycluster

    4.3.1. Job throttling during a recoveryToalleviateresourcepressureonSiteRecoveryManagerServerandvCenterServerSiteRecoveryManager

    perormsjobthrottlingbylimitingthemaximumnumberoconcurrentoperationsinitiatedduringarecovery

    Mostothesub-stepsinarecoveryplanarecomprisedounitsoexecutioneachowhichisdispatchedtoa

    singleESXhostviavCenterServerTheollowingsub-stepssupportconcurrentexecutions:

    Shutdownprotectedvirtualmachinesontheprotectedsite(realrecoveryonly)

    Suspendnon-criticalvirtualmachines

    Recovernormalpriorityvirtualmachines

    Recoverlowpriorityvirtualmachines

    Recovernopoweronvirtualmachines

    Resumenon-criticalvirtualmachines(testrecoveryonly)

    Cleanupvirtualmachinesposttest(testrecoveryonly)

    Atanytimethenumberosub-stepsexecutedconcurrentlyis8ortimesthenumberoESXhostsusedor

    recovery,whicheverisless.Addingmorethan9hoststoyourrecoverysitewillnotspeeduptheRecovery

    virtualmachinestepTheseadditionalhostswillbeusedorrecoveringvirtualmachinesbutthemaximum

    numberofhoststhatcanbeusedsimultaneouslybySiteRecoveryManagerduringarecoveryis9.

    Key Takeaways:

    ItisagoodpracticetohaveVMwareDRSenabledonarecoverysite.MigrationsmightbeinitiatedasDRS

    triestoloadbalancetheclusterduringtherecovery

    IfDRSisnotenabled,ensurethatallrecoverysitehostsarebeingusedforhostingplaceholdervirtual

    machinesThiswillensuremaximumparallelismduringrecoveryEvenlydistributeallplaceholdervirtualmachinesmanuallyacrossallhostsinthiscase

    4.4. Standby hosts on the recovery site and enabling DPM

    SiteRecoveryManager4.0workswithvSpheretorecovervirtualmachinesonaDPMenabledcluster.

    IftherecoverysiteclusterhasDPMenabled,SiteRecoveryManagerworkswithDPMtopoweronanystandby

    hostsinthecluster.SiteRecoveryManagerthenworkswithDRStousethesehostsforrecoveringvirtual

    machinesduringatestandrealrecovery

    Forthecasepresentedbelowarealrecoverywith00virtualmachineswascarriedoutonarecoverysite

    cluster(withDPMdisabledandenabled)with5hostsoutofwhich2wereplacedinstandbymode.

  • 8/14/2019 Vmware Vcenter Srm Wp en Best Practices

    16/20

    VMware vCenter Site Recovery Manager 4.0Performance and Best Practices for Performance

    T E C H N I C A L W H I T E P A P E R / 1 6

    DPM DISABLED on a cluster with 2 standby hosts DPM ENABLED on a cluster with 2 standby hosts

    Real Recovery - with 1 protection group - 100 virtual machines

    RecoveryTime(sec)

    0

    500

    1000

    1500

    2000

    2500

    3000

    3500

    Fig 12.RecoverytimebenetswithVMwareDPM:recoveringvirtualmachinesunderasingleprotectiongroup

    RecoverywithDPMenabledcompletedfasterasSiteRecoveryManagerhad2extrapowered-onhoststouse

    orrecoveringvirtualmachinesconcurrently

    VirtualmachinerecoverystartsonlyaterSiteRecoveryManagerhasnishedbringingallESXhostsouto

    thestandbymodeThesestandbyESXhostsarepoweredonconcurrentlyThispower-onprocesscreatesan

    overheadwhichisthemaximumothetimetakentobringanyonehostoutostandbymodeIngeneralthe

    aorementionedoverheadisrelativelysmallcomparedtothegaininrecoverytimeperormanceduetothe

    availabilityomoreESXhostsSincethisoverheaddoesnotvaryasthenumberovirtualmachinesincreases

    youwillreapmoreperormancebenetsiyouhavealargernumberoprotectedvirtualmachines

    Key Takeaways:

    ItisagoodpracticetoenableDPMonrecoverysiteclustersifyourrecoverysitehostsareinstandbymode.

    Morehostsleadtomoreconcurrencyorrecoveringvirtualmachinesandthusresultsinshorterrecoverytime

    IfthehostsareinastandbymodeandDPMisnotenabled,bringthehostsoutofstandbymodemanuallyand

    draganddropshadowvirtualmachinesonthem

    Whenprotectingvirtualmachines(i.e.creatingprotectiongroups)ensurethattherecoverysitehosts

    mappedundertherespectiveinventoryareinaproperpoweredonstateotherwiseSiteRecoveryManager

    willnotusethosehoststocreateplaceholdervirtualmachines

  • 8/14/2019 Vmware Vcenter Srm Wp en Best Practices

    17/20

    VMware vCenter Site Recovery Manager 4.0Performance and Best Practices for Performance

    T E C H N I C A L W H I T E P A P E R / 1 7

    4.5. High priority virtual machines and suspending virtual machines

    Inarecoveryplanthevirtualmachinesbeingrecoveredcanbeassignedashighnormalorlowpriorityvirtual

    machines

    Theassignmentotheorderothesevirtualmachinesandthetotalnumberovirtualmachinesplacedineachcategorywillafecttherecoverytime

    ForexampleshowninFigurearetherecoverytimesorexecutingatestrecoverywithincreasingnumbero

    highpriorityvirtualmachines

    Test Recovery

    1 protection group wtih

    100 virtual machines -

    20 of which were high priority

    1 protection group wtih

    100 virtual machines -

    40 of which were high priority

    1 protection group wtih

    100 virtual machines -

    60 of which were high priority

    RecoveryTime(Min)

    23

    24

    25

    26

    27

    28

    29

    30

    31

    Fig 13.Testrecovery:varyingthenumberohighpriorityvirtualmachines

    Asyoucanseetherecoverytimedoesincreasewiththenumberovirtualmachinesplacedinhighpriorityas

    allhighpriorityoperationstorecovervirtualmachineswillbeexecutedsequentiallyThisappliestobothreal

    andtestrecoveries

    Asanalternativetoplacingsomevirtualmachinesinthehighprioritygroupandsomevirtualmachinesinthe

    normalprioritygroupinasinglerecoveryplanonecanseparateoutallthevirtualmachinestoberecovered

    intologicalgroupsGroupwiththerstlevelovirtualmachinestoberecoveredGroupwiththesecond

    levelovirtualmachinestoberecovered(iedependentupontherstlevelvirtualmachinesinGroup)

    VirtualmachinesinGroupcanbeplacedinnormalpriorityandvirtualmachinesinGroupcanbeplacedinlow

    priorityBothotheseprioritygroupsallowparallelrecoveryovirtualmachinesSincethelowprioritygroup

    willstartaterthenormalprioritygroupyouwillstillbemaintainingthedependencybetweenthelogical

    groupsovirtualmachineswithoutsacricingtherecoverytimeNotethatdependencyismaintainedbetween

    prioritygroupsandthereisnoconceptodependencyacrossvirtualmachineswithinasingleprioritygroup

    Key Takeaways:

    Itisimportanttochartoutthedependenciesbetweenvirtualmachinestoberecoveredheresothatonlya

    certainnumberorequiredvirtualmachinescanbeassignedashighpriorityItdoesimpactrecoverytime

    testaswellasrealrecovery

  • 8/14/2019 Vmware Vcenter Srm Wp en Best Practices

    18/20

    VMware vCenter Site Recovery Manager 4.0Performance and Best Practices for Performance

    T E C H N I C A L W H I T E P A P E R / 1 8

    Asanalternativetoplacingvirtualmachinesinhighprioritygroup,onecanseparateallthevirtualmachines

    toberecoveredinlogicalgroupsGroupwithlevelvirtualmachinesGroupwithlevelvirtualmachines

    whicharedependentuponvirtualmachinesinGroupGroupvirtualmachinescanthenbeplacedinthe

    normalprioritygroupandGroupvirtualmachinesinthelowprioritygroupwithinthesamerecovery

    planThisshouldmaintaindependencyacrossbothlogicalgroupsandwillstillreducetherecoverytimeby

    introducingmoreconcurrencyorbothprioritygroupsNotethatdependencyismaintainedbetweenpriority

    groupsandthereisnoconceptodependencyacrossvirtualmachineswithinasingleprioritygroup

    Suspendingvirtualmachinesontherecoverysitewillalsoimpactrecoverytime.

    4.6. Advanced Settings/VMware Tools

    SiteRecoveryManager40providesthecapabilitytochangetheadvancedsettings 7Oneadvancedsettingthat

    isrelatedtoSiteRecoveryManagerperormanceisSanProviderminLunGroupComputationInterval

    WhenevertheprotectedsiteinventorychangesSiteRecoveryManagerwillperormanewLUNGroup

    computationIyouareaddingmultiplevirtualmachinestothedatacenterorchanginganyinventory

    ingeneralyoucangetSiteRecoveryManagertowaitbeoredoinganothercomputationbysetting

    SanProviderminLunGroupComputationIntervaltotheapproximatetimetakentomakethechangeForexample

    settingSanProviderminLunGroupComputationIntervaltoNtellsSiteRecoveryManagerthatthereshouldbeatleastNsecondsinbetweenanytwoconsecutiveLUNGroupComputationtasksThissettingisintendedto

    makeLUNGroupComputationtaskslessrequentwhentherearealotoinventorychangesgoingon

    VMwarestronglyrecommendsthatVMwareToolsbeinstalledinallprotectedvirtualmachinesManySite

    RecoveryManagerrecoveryoperationsdependontheproperinstallationoVMwareToolsintheprotected

    virtualmachines

    WaitforOSheartbeatwhilepoweringonvirtualmachineandwaitfornetworkchangewhilereconguring

    recoveredvirtualmachine

    SiteRecoveryManagerdependsonVMwareToolstoreportOSheartbeatandnishonetworkchangeI

    youdonothaveVMwareToolsinstalledonanyotheprotectedvirtualmachinesyoucanchoosetosetthe

    timeoutvaluesforWaitforOSHeartbeatandWaitforNetworkChangetozero(0).

    Waitforvirtualmachinestoshutdownontheprotectedsite.

    Duringarealrecovery,SiteRecoveryManagertriestogracefullyshutdownthevirtualmachinesonthe

    protectedsiteBeoreSiteRecoveryManagerorciblypowersavirtualmachineofittriestoshutdownthe

    guestOSIyourintentionistopowerofthevirtualmachineswithoutgraceullyshuttingdowntheguestOS

    youcansetthetimeoutvaluecalledRecoverypowerStateChangeTimeoutintheadvancedsettingstozero

    Note:thisappliesonlyorvirtualmachineswithVMwareToolsinstalledandthetimeoutisautomaticallyset

    tozero(0)inoVMwareToolsareinstalled

    7 Reer to the subsection Working with Advanced Settings in section 5 o Site Recovery Manager Administration Guide or more details on Advanced Settings.

    http://www.vmware.com/pdf/srm-admin.pdfhttp://www.vmware.com/pdf/srm-admin.pdf
  • 8/14/2019 Vmware Vcenter Srm Wp en Best Practices

    19/20

    VMware vCenter Site Recovery Manager 4.0Performance and Best Practices for Performance

    T E C H N I C A L W H I T E P A P E R / 1 9

    4.7. Speciy a Non-replicated Datastore or Swap Files

    Everyvirtualmachinerequiresaswaplewhichisnormallycreatedinthesamedatastoreastheothervirtual

    machinelesWhenyouuseSiteRecoveryManagerthisdatastoreisreplicatedTopreventswaplesrom

    beingreplicatedcreatethemonanon-replicateddatastore

    Iyouareusinganon-replicateddatastoreorswaplesyoumustcreateanon-replicateddatastoreorall

    protectedclustersatboththeprotectedandrecoverysites

    Procedure

    1. In the vSphere Client, right-click an ESX cluster and click Edit Settings.

    2. In the Settings window or the cluster, click Swapfle Location and select Store the swaple in the

    datastore specied by the host, then click OK.

    3. For each host in the cluster, select a nonreplicated datastore.

    a. Click the Confguration tab.

    b. On the Swaple Location line, click Edit.

    c. In the Virtual Machine Swaple Location window, select a nonreplicated datastore and click OK.

    Recovery Time advantages

    SiteRecoveryManagermakesremotecallstovCenterServertodeleteanyswaplesoundonareplicated

    datastoreduringarecoveryontherecoverysiteIswaplesresideonnon-replicateddatastoresthenthisstep

    willbeskippedanditwillhelpspeeduptherecoveryThiswillalsoavoidwastingonetworkbandwidthduring

    replicationbetweenthesites

    5. Recommendations

    VMwarevCenterSiteRecoveryManagerprovidesadvancedcapabilitiesordisasterrecoverymanagement

    non-disruptivetestingandautomatedailoverTheollowingperormancerecommendationshavebeenmade

    inthispaper:

    ItisrecommendedthatSiteRecoveryManagerdatabasebeinstalledasclosetotheSiteRecoveryManagerserveraspossiblesuchthatitreducestheRTTbetweenbothothemThiswayrecoverytimeperormance

    willnotsufergreatlybecauseoroundtripstothedatabaseserver

    Groupingvirtualmachinesunderfewerprotectiongroupsenablesfastertestandrealrecoveries,provided

    thosevirtualmachineshavenoconstraintspreventingthemrombeinggroupedundersimilarprotection

    groups

    ItisagoodpracticetohavefewerbutlargerNFSvolumessothatthetimetakentomountalargenumber

    osuchvolumesdecreasesduringtherecoveryThismightalsotranslatetoewerprotectiongroupsonyour

    setupleadingtoreducedrecoverytime

    ItisagoodpracticetohaveVMwareDRSenabledonarecoverysite.MigrationsmightbeseenasDRStriesto

    loadbalancetheclusterduringtherecovery

    ItisagoodpracticetoenableDPMonrecoverysiteclustersifyourrecoverysitehostsareinstandbystate.Morehostsleadtomoreconcurrencyorrecoveringvirtualmachinesandthusresultsinshorterrecoverytime

    IfDPMisnotenabledandthehostsareinastandbystate,bringthehostsoutofstandbymodemanuallyand

    draganddropshadowvirtualmachinesonthem

  • 8/14/2019 Vmware Vcenter Srm Wp en Best Practices

    20/20

    VMware, Inc.HillviewAvenuePaloAltoCAUSATel--Fax--wwwvmwarecom

    VMware vCenter Site Recovery Manager 4.0Performance and Best Practices for Performance

    Itisimportanttochartoutthedependenciesbetweenandprioritiesforvirtualmachinestoberecoveredso

    thatonlyacertainnumberorequiredvirtualmachinescanbeassignedashighpriorityAsanalternative

    toplacingvirtualmachinesinhighprioritygrouponecanseparateallthevirtualmachinestoberecovered

    inlogicalgroupsGroupwithlevelvirtualmachinesandGroupwithlevelvirtualmachineswhich

    aredependentuponvirtualmachinesinGroupGroupvirtualmachinescanthenbeplacedinnormal

    prioritygroupandGroupvirtualmachinesinthelowprioritygroupwithinthesameplanThisshould

    maintaindependencyacrossbothlogicalgroupsandwillstillreducetherecoverytimebyintroducingmore

    concurrencyorbothprioritygroupsNotethatdependencyismaintainedbetweenprioritygroupsandthere

    isnoconceptodependencyacrossvirtualmachineswithinasingleprioritygroup

    ItisstronglyrecommendedthatVMwareToolsbeinstalledinallprotectedvirtualmachinesinorderto

    accuratelyacquiretheirheartbeatsandnetworkchangenotication

    Makesureanyinternalscriptorcall-outpromptdoesnotblockrecoveryindenitely.

    Specifyanon-replicateddatastoreforswaples.Thisavoidswastingnetworkbandwidthduringreplication

    betweensitesandremotecallstovCenterServerduringarecoverytodeleteswaplesorallvirtual

    machineswhichinturnhelpsinspeedinguptherecovery

    6. Appendix

    6.1. Acknowledgements

    ThankstoArturoFagundo,DavidLevy,MariaBasmanova,WinCrofton,FabioRojas,SiteRecoveryManager

    team,JohnLiang,RajitKambo,LeeDilworth,DesmondChan,MichaelWhite,SiteRecoveryManagereldsta,

    andVMwareTechnicalMarketingortheirinputandreviewsonvariousdratsothispaper

    6.2. About the author

    AalapDesaiisaMTSPerformanceEngineeratVMware.Hehasbeenworkingontheperformanceproject

    orVMwarevCenterSiteRecoveryManagerAalapreceivedhisMastersinComputerScienceromSyracuse

    University

    6.3. Reerences

    VMwarevCenterSiteRecoveryManagerDocumentation

    http://www.vmware.com/support/pubs/srm_pubs.html

    VMwarevCenterSiteRecoveryManager4.0EvaluatorsGuide

    http://www.vmware.com/fles/pd/vcenter-srm-evaluators-guide.pd

    SiteRecoveryManagerAdministrationGuide

    http://www.vmware.com/pd/srm-admin.pd

    VMwarevCenterSiteRecoveryManagerResourcesforBusinessContinuity

    http://www.vmware.com/products/srm/resource.html

    ToolsforsizingtheSiteRecoveryManagerDatabasebasedontheinventorysizecanbefoundunderthe

    Toolssectiononline.

    http://www.vmware.com/support/pubs/srm_pubs.htmlhttp://www.vmware.com/files/pdf/vcenter-srm-evaluators-guide.pdfhttp://www.vmware.com/pdf/srm-admin.pdfhttp://www.vmware.com/products/srm/resource.htmlhttp://www.vmware.com/products/srm/resource.htmlhttp://www.vmware.com/pdf/srm-admin.pdfhttp://www.vmware.com/files/pdf/vcenter-srm-evaluators-guide.pdfhttp://www.vmware.com/support/pubs/srm_pubs.html