table of contents - vmgu.ru · table of contents introduction disclaimer about the author...

Table of Contents

Introduction
Disclaimer
About the author
Introduction to HA
Components of HA
Fundamental Concepts
Restarting Virtual Machines
Virtual SAN and Virtual Volumes specifics
Adding resiliency to HA
Admission Control
VM and Application Monitoring
vSphere HA and ...
Use Case - Stretched Clusters
Advanced Settings
Summarizing
Changelog

vSphere 6.x HA Deepdive


VMware vSphere 6.x HA Deepdive

Like many of you I am constantly trying to explore new ways to share content with the rest of the world. Over the course of the last decade I have done this in many different formats; some of them were easy to do and others not so much. Books always fell in that last category, which is a shame as I have always enjoyed writing them.

I wanted to explore the different options there are to create content and share it in different ways, without the need to re-do formatting and waste a lot of time on things I do not want to waste time on. After an afternoon of reading and researching, GitBook popped up. It looked like an interesting platform/solution that would allow me to create content both online and offline, push and pull it to and from a repository, and build a static website from it as well as publish it in a variety of different formats.

Let it be clear that this is a trial, and this may or may not result in a follow-up. I am starting with the vSphere High Availability content as that is what I am most familiar with and will be easiest to update.

A special thanks goes out to everyone who has contributed in any shape or form to this project. First of all Frank Denneman, the person with whom I wrote the first 3 versions of the Clustering Deepdive and who designed all the great diagrams which you find throughout this publication. Of course also: Doug Baer for editing the content in the past, and my technical conscience: Keith Farkas, Cormac Hogan, Manoj Krishnan, Anne Holler, Mustafa Uysal and Gabriel Tarasuk-Levin.

For offline reading, feel free to download this publication in any of the following formats: PDF - ePub - Mobi.

The source of this publication is stored on both GitBook as well as GitHub. Feel free to submit/contribute where possible and needed. Note that it is also possible to leave feedback on the content by simply clicking on the "+" on the right side of the paragraph you want to comment on (hover over it with your mouse). I will read and incorporate feedback as soon as I have time, hence it is useful to check back regularly and validate your downloaded version against the details below.

vSphere 6.x HA Deepdive, book version: 1.0.4. Book built with GitBook version: 2.6.7.

Thanks for reading, and enjoy!


Duncan Epping
Chief Technologist Storage and Availability - VMware


Disclaimer

Although every precaution has been taken in the preparation of this book, the publisher and author assume no responsibility for errors or omissions. Neither is any liability assumed for damages resulting from the use of the information contained herein.

The author of this publication works for VMware. The opinions expressed here are the author's personal opinions. Content published was not approved in advance by VMware and does not necessarily reflect the views and opinions of VMware. This is the author's book, not a VMware book.

Copyrights / Licensing

Figure 1 - Creative Commons License


About the Author

Duncan Epping is a Chief Technologist working in the Office of CTO of VMware's Storage and Availability business unit. In that role, he serves as a partner and trusted adviser to VMware’s customers primarily in EMEA. Main responsibilities are ensuring VMware’s future innovations align with essential customer needs and translating customer problems to opportunities. Duncan specializes in Software Defined Storage, hyper-converged infrastructures and business continuity / disaster recovery solutions. He has 1 patent granted and 4 patents pending on the topic of availability, storage and resource management. Duncan is a VMware Certified Design Expert (VCDX007) and the main author and owner of VMware/Virtualization blog Yellow-Bricks.com.

He can be followed on Twitter @DuncanYB.


Introduction to vSphere High Availability

Availability has traditionally been one of the most important aspects when providing services. When providing services on a shared platform like VMware vSphere, the impact of downtime exponentially grows as many services run on a single physical machine. As such VMware engineered a feature called VMware vSphere High Availability. VMware vSphere High Availability, hereafter simply referred to as HA, provides a simple and cost effective solution to increase availability for any application running in a virtual machine regardless of its operating system. It is configured using a couple of simple steps through vCenter Server (vCenter) and as such provides a uniform and simple interface. HA enables you to create a cluster out of multiple ESXi hosts. This will allow you to protect virtual machines and their workloads. In the event of a failure of one of the hosts in the cluster, impacted virtual machines are automatically restarted on other ESXi hosts within that same VMware vSphere Cluster (cluster).

Figure 2 - High Availability in action

On top of that, in the case of a Guest OS level failure, HA can restart the failed Guest OS. This feature is called VM Monitoring, but is sometimes also referred to as VM-HA. This might sound fairly complex but again can be implemented with a single click.


Figure 3 - OS Level HA just a single click away

Unlike many other clustering solutions, HA is a simple solution to implement and literally enabled within 5 clicks. On top of that, HA is widely adopted and used in all situations. However, HA is not a 1:1 replacement for solutions like Microsoft Clustering Services / Windows Server Failover Clustering (WSFC). The main difference between WSFC and HA is that WSFC was designed to protect stateful cluster-aware applications, while HA was designed to protect any virtual machine regardless of the type of workload within. HA can also be extended to the application layer through the use of VM and Application Monitoring.

In the case of HA, a fail-over incurs downtime as the virtual machine is literally restarted on one of the remaining hosts in the cluster, whereas WSFC transitions the service to one of the remaining nodes in the cluster when a failure occurs. Contrary to what many believe, WSFC does not guarantee that there is no downtime during a transition. On top of that, your application needs to be cluster-aware and stateful in order to get the most out of this mechanism, which limits the number of workloads that could really benefit from this type of clustering.

One might ask why you would want to use HA when a virtual machine is restarted and service is temporarily lost. The answer is simple: not all virtual machines (or services) need 99.999% uptime. For many services the type of availability HA provides is more than sufficient. On top of that, many applications were never designed to run on top of a WSFC cluster. This means that there is no guarantee of availability or data consistency if an application is clustered with WSFC but is not cluster-aware.
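To put a figure such as 99.999% in perspective, the following minimal Python sketch converts an availability percentage into a yearly downtime budget. It assumes a 365-day year, and the function name is ours, chosen for illustration only.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # assumption: a non-leap year


def downtime_minutes_per_year(availability_percent: float) -> float:
    """Return the yearly downtime budget, in minutes, for a given availability."""
    if not 0.0 <= availability_percent <= 100.0:
        raise ValueError("availability must be between 0 and 100 percent")
    return MINUTES_PER_YEAR * (1.0 - availability_percent / 100.0)


if __name__ == "__main__":
    for level in (99.0, 99.9, 99.99, 99.999):
        minutes = downtime_minutes_per_year(level)
        print(f"{level}% uptime allows {minutes:.1f} minutes of downtime per year")
```

Under this assumption "five nines" leaves a little over five minutes of downtime per year, which is why a restart-based mechanism like HA is a good fit for services that can tolerate more than that.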

In addition, WSFC clustering can be complex and requires special skills and training. One example is managing patches and updates/upgrades in a WSFC environment; this could even lead to more downtime if not operated correctly and definitely complicates operational procedures. HA however reduces complexity, costs (associated with downtime and WSFC), resource overhead and unplanned downtime for minimal additional costs. It is important to note that HA, contrary to WSFC, does not require any changes to the guest as HA is provided on the hypervisor level. Also, VM Monitoring does not require any additional software or OS modifications except for VMware Tools, which should be installed anyway as a best practice. In case even higher availability is required, VMware also provides a level of application awareness through Application Monitoring, which has been leveraged by partners like Symantec to enable application level resiliency and could be used by in-house development teams to increase resiliency for their application.

HA has proven itself over and over again and is widely adopted within the industry; if you are not using it today, hopefully you will be convinced after reading this section of the book.

vSphere 6.0

Before we dive into the main constructs of HA and describe all the choices one has to make when configuring HA, we will first briefly touch on what’s new in vSphere 6.0 and describe the basic requirements and steps needed to enable HA. This book covers all the released versions of what is known within VMware as “Fault Domain Manager” (FDM), which was introduced with vSphere 5.0. We will call out the differences in behavior in the different versions where applicable; our baseline however is vSphere 6.0.

What’s New in 6.0?

Compared to vSphere 5.0 the changes introduced with vSphere 6.0 for HA appear to be minor. However, some of the new functionality will make the life of many of you much easier. Although the list is relatively short, from an engineering point of view many of these things have been an enormous effort as they required changes to the deep fundamentals of the HA architecture.

- Support for Virtual Volumes – With Virtual Volumes a new type of storage entity is introduced in vSphere 6.0. This has also resulted in some changes in the HA architecture to accommodate this new way of storing virtual machines
- Support for Virtual SAN – This was actually introduced with vSphere 5.5, but as it is new to many of you and led to changes in the architecture we decided to include it in this update
- VM Component Protection – This allows HA to respond to a scenario where the connection to the virtual machine’s datastore is impacted temporarily or permanently
  - HA “Response for Datastore with All Paths Down”
  - HA “Response for Datastore with Permanent Device Loss”
- Increased host scale – Cluster limit has grown from 32 to 64 hosts
- Increased VM scale – Cluster limit has grown from 4000 VMs to 8000 VMs per cluster
- Secure RPC – Secures the VM/App monitoring channel
- Full IPv6 support
- Registration of “HA Disabled” VMs on hosts after failure


What is required for HA to Work?

Each feature or product has very specific requirements and HA is no different. Knowing the requirements of HA is part of the basics we have to cover before diving into some of the more complex concepts. For those who are completely new to HA, we will also show you how to configure it.

Prerequisites

Before enabling HA it is highly recommended to validate that the environment meets all the prerequisites. We have also included recommendations from an infrastructure perspective that will enhance resiliency.

Requirements:

- Minimum of two ESXi hosts
- Minimum of 5 GB memory per host to install ESXi and enable HA
- VMware vCenter Server
- Shared Storage for virtual machines
- Pingable gateway or other reliable address

Recommendation:

- Redundant Management Network (not a requirement, but highly recommended)
- 8 GB of memory or more per host
- Multiple shared datastores

Firewall Requirements

The following table contains the ports that are used by HA for communication. If your environment contains firewalls external to the host, ensure these ports are opened for HA to function correctly. HA will open the required ports on the ESX or ESXi firewall.

Port   Protocol   Direction
8182   UDP        Inbound
8182   TCP        Inbound
8182   UDP        Outbound
8182   TCP        Outbound
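When an external firewall sits between hosts, a quick reachability probe can help confirm that the TCP side of the table above is open. The sketch below is a generic Python socket check, not a VMware tool. The helper name and the default timeout are our own assumptions, and a UDP rule cannot be verified this way.

```python
import socket

HA_PORT = 8182  # port listed in the table above


def tcp_port_open(host: str, port: int = HA_PORT, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unreachable hosts alike.
        return False
```

A `False` result only means that no connection could be established from where the probe runs. It does not by itself distinguish a blocked port from a host on which nothing is listening, for example because HA has not been enabled yet.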

Configuring vSphere High Availability


HA can be configured with the default settings within a couple of clicks. The following steps will show you how to create a cluster and enable HA, including VM Monitoring, using the vSphere Web Client. Each of the settings and the design decisions associated with these steps will be described in more depth in the following chapters.

1. Click “Hosts & Clusters” under Inventories on the Home tab.
2. Right-click the Datacenter in the Inventory tree and click New Cluster.
3. Give the new cluster an appropriate name. We recommend at a minimum including the location of the cluster and a sequence number, e.g. ams-hadrs-001.
4. Select Turn On vSphere HA.
5. Ensure “Enable host monitoring” and “Enable admission control” are selected.
6. Select “Percentage of cluster resources…” under Policy and specify a percentage.
7. Enable VM Monitoring Status by selecting “VM and Application Monitoring”.
8. Click “OK” to complete the creation of the cluster.

Figure 4 - Ready to complete the New Cluster Wizard


When the HA cluster has been created, the ESXi hosts can be added to the cluster simply by right-clicking the host and selecting “Move To”, if they were already added to vCenter, or by right-clicking the cluster and selecting “Add Host”.

When an ESXi host is added to the newly-created cluster, the HA agent will be loaded and configured. Once this has completed, HA will enable protection of the workloads running on this ESXi host.

As we have clearly demonstrated, HA is a simple clustering solution that will allow you to protect virtual machines against host failure and operating system failure in literally minutes. Understanding the architecture of HA will enable you to reach that extra 9 when it comes to availability. The following chapters will discuss the architecture and fundamental concepts of HA. We will also discuss all decision-making moments to ensure you will configure HA in such a way that it meets the requirements of your or your customer’s environment.


Components of High Availability

Now that we know what the prerequisites are and how to configure HA, the next step is describing which components form HA. Keep in mind that this is still a “high level” overview. There is more under the cover that we will explain in following chapters. The following diagram depicts a two-host cluster and shows the key HA components.

Figure 5 - Components of High Availability

As you can clearly see, there are three major components that form the foundation for HA as of vSphere 6.0:

- FDM
- HOSTD
- vCenter

The first and probably the most important component that forms HA is FDM (Fault Domain Manager). This is the HA agent.


The FDM agent is responsible for many tasks such as communicating host resource information, virtual machine states and HA properties to other hosts in the cluster. FDM also handles heartbeat mechanisms, virtual machine placement, virtual machine restarts, logging and much more. We are not going to discuss all of this in-depth separately as we feel that this will complicate things too much.

FDM, in our opinion, is one of the most important agents on an ESXi host, when HA is enabled, of course, and we are assuming this is the case. The engineers recognized this importance and added an extra level of resiliency to HA. FDM uses a single-process agent. However, FDM spawns a watchdog process. In the unlikely event of an agent failure, the watchdog functionality will pick up on this and restart the agent to ensure HA functionality remains without anyone ever noticing it failed. The agent is also resilient to network interruptions and “all paths down” (APD) conditions. Inter-host communication automatically uses another communication path (if the host is configured with redundant management networks) in the case of a network failure.

HA has no dependency on DNS as it works with IP addresses only. This is one of the major improvements that FDM brought. This does not mean that ESXi hosts need to be registered with their IP addresses in vCenter; it is still a best practice to register ESXi hosts by their fully qualified domain name (FQDN) in vCenter. Although HA does not depend on DNS, remember that other services may depend on it. On top of that, monitoring and troubleshooting will be much easier when hosts are correctly registered within vCenter and have a valid FQDN.

Basic design principle: Although HA is not dependent on DNS, it is still recommended to register the hosts with their FQDN for ease of operations/management.

vSphere HA also has a standardized logging mechanism, where a single log file has been created for all operational log messages; it is called fdm.log. This log file is stored under /var/log/ as depicted in Figure 6.

Figure 6 - HA log file


Basic design principle: Ensure syslog is correctly configured and log files are offloaded to a safe location to offer the possibility of performing a root cause analysis in case disaster strikes.

HOSTD Agent

One of the most crucial agents on a host is HOSTD. This agent is responsible for many of the tasks we take for granted like powering on virtual machines. FDM talks directly to HOSTD and vCenter, so it is not dependent on VPXA, like in previous releases. This is, of course, to avoid any unnecessary overhead and dependencies, making HA more reliable than ever before and enabling HA to respond faster to power-on requests. That ultimately results in higher VM uptime.

When, for whatever reason, HOSTD is unavailable or not yet running after a restart, the host will not participate in any FDM-related processes. FDM relies on HOSTD for information about the virtual machines that are registered to the host, and manages the virtual machines using HOSTD APIs. In short, FDM is dependent on HOSTD and if HOSTD is not operational, FDM halts all functions and waits for HOSTD to become operational.

vCenter

That brings us to our final component, the vCenter Server. vCenter is the core of every vSphere Cluster and is responsible for many tasks these days. For our purposes, the following are the most important and the ones we will discuss in more detail:

- Deploying and configuring HA Agents
- Communication of cluster configuration changes
- Protection of virtual machines

vCenter is responsible for pushing out the FDM agent to the ESXi hosts when applicable. The push of these agents is done in parallel to allow for faster deployment and configuration of multiple hosts in a cluster. vCenter is also responsible for communicating configuration changes in the cluster to the host which is elected as the master. We will discuss this concept of master and slaves in the following chapter. Examples of configuration changes are modification or addition of an advanced setting or the introduction of a new host into the cluster.

HA leverages vCenter to retrieve information about the status of virtual machines and, of course, vCenter is used to display the protection status (Figure 7) of virtual machines. (What “virtual machine protection” actually means will be discussed in chapter 3.) On top of that, vCenter is responsible for the protection and unprotection of virtual machines. This not only applies to user-initiated power-offs or power-ons of virtual machines, but also in the case where an ESXi host is disconnected from vCenter, at which point vCenter will request the master HA agent to unprotect the affected virtual machines.

Figure 7 - Virtual machine protection state

Although HA is configured by vCenter and exchanges virtual machine state information with it, vCenter is not involved when HA responds to failure. It is comforting to know that in case of a failure of the host containing the virtualized vCenter Server, HA takes care of the failure and restarts the vCenter Server on another host, including all other configured virtual machines from that failed host.

There is a corner case scenario with regards to vCenter failure: if the ESXi hosts are so-called “stateless hosts” and Distributed vSwitches are used for the management network, virtual machine restarts will not be attempted until vCenter is restarted. For stateless environments, vCenter and Auto Deploy availability is key as the ESXi hosts literally depend on them.

If vCenter is unavailable, it will not be possible to make changes to the configuration of the cluster. vCenter is the source of truth for the set of virtual machines that are protected, the cluster configuration, the virtual machine-to-host compatibility information, and the host membership. So, while HA, by design, will respond to failures without vCenter, HA relies on vCenter to be available to configure or monitor the cluster.

When a virtual vCenter Server, or the vCenter Server Appliance, has been implemented, we recommend setting the correct HA restart priorities for it. Although vCenter Server is not required to restart virtual machines, there are multiple components that rely on vCenter and, as such, a speedy recovery is desired. When configuring your vCenter virtual machine with a high priority for restarts, remember to include all services on which your vCenter server depends for a successful restart: DNS, MS AD and MS SQL (or any other database server you are using).

Basic design principles:

1. In stateless environments, ensure vCenter and Auto Deploy are highly available as recovery time of your virtual machines might be dependent on them.
2. Understand the impact of virtualizing vCenter. Ensure it has high priority for restarts and ensure that services which vCenter Server depends on are available: DNS, AD and database.


Fundamental Concepts

Now that you know about the components of HA, it is time to start talking about some of the fundamental concepts of HA clusters:

- Master / Slave agents
- Heartbeating
- Isolated vs Network partitioned
- Virtual Machine Protection
- Component Protection

Everyone who has implemented vSphere knows that multiple hosts can be configured into a cluster. A cluster can best be seen as a collection of resources. These resources can be carved up with the use of vSphere Distributed Resource Scheduler (DRS) into separate pools of resources or used to increase availability by enabling HA.

The HA architecture introduces the concept of master and slave HA agents. Except during network partitions, which are discussed later, there is only one master HA agent in a cluster. Any agent can serve as a master, and all others are considered its slaves. A master agent is in charge of monitoring the health of virtual machines for which it is responsible and restarting any that fail. The slaves are responsible for forwarding information to the master agent and restarting any virtual machines at the direction of the master. The HA agent, regardless of its role as master or slave, also implements the VM/App monitoring feature which allows it to restart virtual machines in the case of an operating system failure or restart services in the case of an application failure.

Master Agent

As stated, one of the primary tasks of the master is to keep track of the state of the virtual machines it is responsible for and to take action when appropriate. In a normal situation there is only a single master in a cluster. We will discuss the scenario where multiple masters can exist in a single cluster in one of the following sections, but for now let’s talk about a cluster with a single master. A master will claim responsibility for a virtual machine by taking “ownership” of the datastore on which the virtual machine’s configuration file is stored.


Basic design principle: To maximize the chance of restarting virtual machines after a failure we recommend masking datastores on a cluster basis. Although sharing of datastores across clusters will work, it will increase complexity from an administrative perspective.

That is not all, of course. The HA master is also responsible for exchanging state information with vCenter. This means that it will not only receive but also send information to vCenter when required. The HA master is also the host that initiates the restart of virtual machines when a host has failed. You may immediately want to ask what happens when the master is the one that fails, or, more generically, which of the hosts can become the master and when is it elected?

Election

A master is elected by a set of HA agents whenever the agents are not in network contact with a master. A master election thus occurs when HA is first enabled on a cluster and when the host on which the master is running:

- fails,
- becomes network partitioned or isolated,
- is disconnected from vCenter Server,
- is put into maintenance or standby mode,
- or when HA is reconfigured on the host.

The HA master election takes approximately 15 seconds and is conducted using UDP. While HA won’t react to failures during the election, once a master is elected, failures detected before and during the election will be handled. The election process is simple but robust. The host that is participating in the election with the greatest number of connected datastores will be elected master. If two or more hosts have the same number of datastores connected, the one with the highest Managed Object Id will be chosen. This however is done lexically, meaning that 99 beats 100 as 9 is larger than 1. For each host, the HA State of the host will be shown on the Summary tab. This includes the role, as depicted in the screenshot below where the host is a master host.
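The selection rule described above (most connected datastores first, then the lexically highest Managed Object Id) can be sketched in a few lines of Python. This is an illustration of the rule only, not FDM code: the `Candidate` type, the `elect_master` function and the `host-NN` identifier format are assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class Candidate:
    moid: str             # Managed Object Id, e.g. "host-99" (format assumed)
    datastore_count: int  # number of datastores the host is connected to


def elect_master(candidates: Iterable[Candidate]) -> Candidate:
    """Pick the host with the most datastores; break ties on the
    lexically (string-wise) highest Managed Object Id."""
    pool = list(candidates)
    if not pool:
        raise ValueError("no candidates participating in the election")
    # Tuples compare element by element: the integer count first, then the
    # MoID as a plain string, which is what makes "host-99" beat "host-100".
    return max(pool, key=lambda c: (c.datastore_count, c.moid))
```

With two hosts that both see four datastores, `host-99` wins over `host-100` because the comparison is made character by character rather than numerically.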

After a master is elected, each slave that has management network connectivity with it will set up a single secure, encrypted TCP connection to the master. This secure connection is SSL-based. One thing to stress here though is that slaves do not communicate with each other after the master has been elected unless a re-election of the master needs to take place.


Figure 8 - Master Agent

As stated earlier, when a master is elected it will try to acquire ownership of all of the datastores it can directly access or access by proxying requests to one of the slaves connected to it using the management network. For regular storage architectures it does this by locking a file called “protectedlist” that is stored on the datastores in an existing cluster. The master will also attempt to take ownership of any datastores it discovers along the way, and it will periodically retry any it could not take ownership of previously.

The naming format and location of this file is as follows:

/<root of datastore>/.vSphere-HA/<cluster-specific-directory>/protectedlist

For those wondering how “cluster-specific-directory” is constructed:

<uuid of vCenter Server>-<number part of the MoID of the cluster>-<random 8 char string>-<name of the host running vCenter Server>

The master uses this protectedlist file to store the inventory. It keeps track of which virtual machines are protected by HA. Calling it an inventory might be slightly overstating: it is a list of protected virtual machines and it includes information around virtual machine CPU reservation and memory overhead. The master distributes this inventory across all datastores in use by the virtual machines in the cluster. The next screenshot shows an example of this file on one of the datastores.
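To make the naming format above concrete, here is a small Python sketch that assembles the directory name and the full path from its parts. The function names are ours, and all example values (UUID, cluster MoID, random string, host name, datastore root) are made up. We also assume the "number part" of the cluster MoID is its trailing run of digits.

```python
import re
from pathlib import PurePosixPath


def cluster_specific_directory(vc_uuid: str, cluster_moid: str,
                               random_suffix: str, vc_host_name: str) -> str:
    """Join the four documented parts with hyphens."""
    match = re.search(r"(\d+)$", cluster_moid)  # assumption: trailing digits
    if match is None:
        raise ValueError(f"no numeric part found in MoID {cluster_moid!r}")
    return f"{vc_uuid}-{match.group(1)}-{random_suffix}-{vc_host_name}"


def protectedlist_path(datastore_root: str, cluster_dir: str) -> PurePosixPath:
    """Return <root of datastore>/.vSphere-HA/<cluster dir>/protectedlist."""
    return PurePosixPath(datastore_root) / ".vSphere-HA" / cluster_dir / "protectedlist"


if __name__ == "__main__":
    # Purely illustrative values, not taken from a real environment.
    directory = cluster_specific_directory("1234abcd", "domain-c26", "a1b2c3d4", "vcenter01")
    print(protectedlist_path("/vmfs/volumes/datastore1", directory))
```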


Figure 9 - Protectedlist file

Now that we know the master locks a file on the datastore and that this file stores inventory details, what happens when the master is isolated or fails? If the master fails, the answer is simple: the lock will expire and the new master will relock the file if the datastore is accessible to it.

In the case of isolation, this scenario is slightly different, although the result is similar. The master will release the lock it has on the file on the datastore to ensure that when a new master is elected it can determine the set of virtual machines that are protected by HA by reading the file. If, by any chance, a master should fail right at the moment that it became isolated, the restart of the virtual machines will be delayed until a new master has been elected. In a scenario like this, accuracy and the fact that virtual machines are restarted are more important than a short delay.

Let’s assume for a second that your master has just failed. What will happen and how do the slaves know that the master has failed? HA uses a point-to-point network heartbeat mechanism. If the slaves have received no network heartbeats from the master, the slaves will try to elect a new master. This new master will read the required information and will initiate the restart of the virtual machines within roughly 10 seconds.

Restarting virtual machines is not the only responsibility of the master. It is also responsible for monitoring the state of the slave hosts and reporting this state to vCenter Server. If a slave fails or becomes isolated from the management network, the master will determine which virtual machines must be restarted. When virtual machines need to be restarted, the master is also responsible for determining the placement of those virtual machines. It uses a placement engine that will try to distribute the virtual machines to be restarted evenly across all available hosts.
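As a rough illustration of what "distributing evenly" can look like, the Python sketch below always hands the next virtual machine to the host with the fewest restarts assigned so far. It is our own simplification and not the actual HA placement engine: it counts virtual machines only and ignores everything else a real placement decision would have to weigh. All names are hypothetical.

```python
import heapq
from typing import Dict, List, Sequence


def distribute_restarts(vms: Sequence[str], hosts: Sequence[str]) -> Dict[str, List[str]]:
    """Assign each VM to the currently least-loaded host (by VM count only)."""
    if not hosts:
        raise ValueError("no hosts available for restarts")
    # Heap entries are (assigned_count, original_index, host); the index keeps
    # tie-breaking deterministic and follows the order hosts were given in.
    heap = [(0, index, host) for index, host in enumerate(hosts)]
    heapq.heapify(heap)
    placement: Dict[str, List[str]] = {host: [] for host in hosts}
    for vm in vms:
        count, index, host = heapq.heappop(heap)
        placement[host].append(vm)
        heapq.heappush(heap, (count + 1, index, host))
    return placement
```

With five virtual machines and two surviving hosts, one host ends up with three restarts and the other with two.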


All of these responsibilities are really important, but without a mechanism to detect that a slave has failed, the master would be useless. Just like the slaves receive heartbeats from the master, the master receives heartbeats from the slaves so it knows they are alive.

Slaves

A slave has substantially fewer responsibilities than a master: a slave monitors the state of the virtual machines it is running and informs the master about any changes to this state.

The slave also monitors the health of the master by monitoring heartbeats. If the master becomes unavailable, the slaves initiate and participate in the election process. Last but not least, the slaves send heartbeats to the master so that the master can detect outages. Like the master-to-slave communication, all slave-to-master communication is point-to-point. HA does not use multicast.

Figure 10 - Slave Agent

Files for both Slave and Master

Before explaining the details it is important to understand that both Virtual SAN and Virtual Volumes have introduced changes to the location and the usage of files. For specifics on these two different storage architectures we refer you to those respective sections in the book.

Both the master and slave use files not only to store state, but also as a communication mechanism. We’ve already seen the protectedlist file (Figure 9) used by the master to store the list of protected virtual machines. We will now discuss the files that are created by both the master and the slaves. Remote files are files stored on a shared datastore and local files are files that are stored in a location only directly accessible to that host.

Remote Files

The set of powered-on virtual machines is stored in a per-host “poweron” file. It should be noted that, because a master also hosts virtual machines, it also creates a “poweron” file.

The naming scheme for this file is as follows: host-number-poweron

Tracking virtual machine power-on state is not the only thing the “poweron” file is used for. This file is also used by a slave to inform the master that it is isolated from the management network: the top line of the file will contain either a 0 or a 1. A 0 (zero) means not-isolated and a 1 (one) means isolated. The master will inform vCenter about the isolation of the host.
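The isolation flag described above is simple enough to sketch. The Python below reads only the top line of a “poweron” file's contents and maps 0 and 1 to a boolean. The layout of the rest of the file is not described here, so the sketch deliberately ignores it. The function names, and the reading of the naming scheme as `host-<number>-poweron`, are our assumptions.

```python
def poweron_file_name(host_number: int) -> str:
    """Build a per-host file name following the host-number-poweron scheme."""
    return f"host-{host_number}-poweron"


def is_isolated(poweron_contents: str) -> bool:
    """Interpret the top line of a poweron file: 0 = not isolated, 1 = isolated."""
    lines = poweron_contents.splitlines()
    if not lines:
        raise ValueError("empty poweron file")
    flag = lines[0].strip()
    if flag == "0":
        return False
    if flag == "1":
        return True
    raise ValueError(f"unexpected isolation flag: {flag!r}")
```

These files are maintained by the HA agents themselves, so the sketch is meant for understanding the mechanism, not as a way to edit the files.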

Local Files

As mentioned before, when HA is configured on a host, the host will store specific information about its cluster locally.

Figure 11 - Locally stored files

Each host, including the master, will store data locally. The data that is locally stored is important state information, namely the VM-to-host compatibility matrix, cluster configuration, and host membership list. This information is persisted locally on each host. Updates to this information are sent to the master by vCenter and propagated by the master to the slaves. Although we expect that most of you will never touch these files (and we highly recommend against modifying them), we do want to explain how they are used:

- clusterconfig: This file is not human-readable. It contains the configuration details of the cluster.
- vmmetadata: This file is not human-readable. It contains the actual compatibility info matrix for every HA protected virtual machine and lists all the hosts with which it is compatible, plus a vm/host dictionary.
- fdm.cfg: This file contains the configuration settings around logging. For instance, the level of logging and syslog details are stored in here.
- hostlist: A list of hosts participating in the cluster, including hostname, IP addresses, MAC addresses and heartbeat datastores.

Heartbeating

We mentioned it a couple of times already in this chapter, and it is an important mechanism that deserves its own section: heartbeating. Heartbeating is the mechanism used by HA to validate whether a host is alive. HA has two different heartbeating mechanisms. These heartbeat mechanisms allow it to determine what has happened to a host when it is no longer responding. Let’s discuss traditional network heartbeating first.

Network Heartbeating

Network Heartbeating is used by HA to determine if an ESXi host is alive. Each slave will send a heartbeat to its master and the master sends a heartbeat to each of the slaves; this is a point-to-point communication. These heartbeats are sent by default every second.

When a slave isn’t receiving any heartbeats from the master, it will try to determine whether it is Isolated – we will discuss “states” in more detail later on in this chapter.

Basic design principle: Network heartbeating is key for determining the state of a host. Ensure the management network is highly resilient to enable proper state determination.

Datastore Heartbeating

Datastore heartbeating adds an extra level of resiliency and prevents unnecessary restart attempts from occurring as it allows vSphere HA to determine whether a host is isolated from the network or is completely unavailable. How does this work?

Datastore heartbeating enables a master to more accurately determine the state of a host that is not reachable via the management network. The new datastore heartbeat mechanism is used in case the master has lost network connectivity with the slaves. The datastore heartbeat mechanism is then used to validate whether a host has failed or is merely isolated/network partitioned. Isolation will be validated through the “poweron” file which, as mentioned earlier, will be updated by the host when it is isolated. Without the “poweron” file, there is no way for the master to validate isolation. Let that be clear! Based on the results of checks of both files, the master will determine the appropriate action to take. If the master determines that a host has failed (no datastore heartbeats), the master will restart the failed host’s virtual machines. If the master determines that the slave is Isolated or Partitioned, it will only take action when it is appropriate to take action. This means that the master will only initiate restarts when virtual machines are down or powered down / shut down by a triggered isolation response, but we will discuss this in more detail in Chapter 4.

By default, HA selects 2 heartbeat datastores – it will select datastores that are available on all hosts, or as many as possible. Although it is possible to configure an advanced setting (das.heartbeatDsPerHost) to allow for more datastores for datastore heartbeating, we do not recommend configuring this option as the default should be sufficient for most scenarios, except for stretched cluster environments where it is recommended to have two in each site manually selected.

The selection process gives preference to VMFS over NFS datastores, and seeks to choose datastores that are backed by different LUNs or NFS servers when possible. If desired, you can also select the heartbeat datastores yourself. We, however, recommend letting vCenter deal with this operational “burden” as vCenter uses a selection algorithm to select heartbeat datastores that are presented to all hosts. This, however, is not a guarantee that vCenter can select datastores which are connected to all hosts. It should be noted that vCenter is not site-aware. In scenarios where hosts are geographically dispersed it is recommended to manually select heartbeat datastores to ensure each site has one site-local heartbeat datastore at minimum.

Basic design principle: In a metro-cluster / geographically dispersed cluster we recommend setting the minimum number of heartbeat datastores to four. It is recommended to manually select site local datastores, two for each site.

Figure 12 - Selecting the heartbeat datastores

The question now arises: what, exactly, is this datastore heartbeating and which datastore is used for this heartbeating? Let’s answer which datastore is used for datastore heartbeating first as we can simply show that with a screenshot, see below. vSphere displays extensive details around the “Cluster Status” on the Cluster’s Monitor tab. This for instance shows you which datastores are being used for heartbeating and which hosts are using which specific datastore(s). In addition, it displays how many virtual machines are protected and how many hosts are connected to the master.

In block based storage environments HA leverages an existing VMFS file system mechanism. The datastore heartbeat mechanism uses a so called “heartbeat region” which is updated as long as the file is open. On VMFS datastores, HA will simply check whether the heartbeat region has been updated. In order to update a datastore heartbeat region, a host needs to have at least one open file on the volume. HA ensures there is at least one file open on this volume by creating a file specifically for datastore heartbeating. In other words, a per-host file is created on the designated heartbeating datastores, as shown below. The naming scheme for this file is as follows: host-number-hb.

On NFS datastores, each host will write to its heartbeat file once every 5 seconds, ensuring that the master will be able to check host state. The master will simply validate this by checking that the time-stamp of the file changed.

Realize that in the case of a converged network environment, the effectiveness of datastore heartbeating will vary depending on the type of failure. For instance, a NIC failure could impact both network and datastore heartbeating. If, for whatever reason, the datastore or NFS share becomes unavailable or is removed from the cluster, HA will detect this and select a new datastore or NFS share to use for the heartbeating mechanism.

Basic design principle: Datastore heartbeating adds a new level of resiliency but is not the be-all end-all. In converged networking environments, the use of datastore heartbeating adds little value due to the fact that a NIC failure may result in both the network and storage becoming unavailable.

Isolated versus Partitioned

We’ve already briefly touched on it and it is time to have a closer look. When it comes to network failures there are two different states that exist. What are these exactly and when is a host Partitioned rather than Isolated? Before we explain this we want to point out that there is the state as reported by the master and the state as observed by an administrator, and that each has its own characteristics.


First, consider the administrator’s perspective. Two hosts are considered partitioned if they are operational but cannot reach each other over the management network. Further, a host is isolated if it does not observe any HA management traffic on the management network and it can’t ping the configured isolation addresses. It is possible for multiple hosts to be isolated at the same time. We call a set of hosts that are partitioned but can communicate with each other a “management network partition”. Network partitions involving more than two partitions are possible but not likely.

Now, consider the HA perspective. When HA agents are not in network contact with a master, they will elect a new master. So, when a network partition exists, a master election will occur so that a host failure or network isolation within this partition will result in appropriate action on the impacted virtual machine(s). The screenshot below shows possible ways in which an Isolation or a Partition can occur.

Figure 13 - Isolated versus Partitioned

If a cluster is partitioned in multiple segments, each partition will elect its own master, meaning that if you have 4 partitions your cluster will have 4 masters. When the network partition is corrected, one of the four masters will take over the role and be responsible for the cluster again. It should be noted that a master could claim responsibility for a virtual machine that lives in a different partition. If this occurs and the virtual machine happens to fail, the master will be notified through the datastore communication mechanism.

In the HA architecture, whether a host is partitioned is determined by the master reporting the condition. So, in the above example, the master on host ESXi-01 will report ESXi-03 and 04 partitioned while the master on host 04 will report 01 and 02 partitioned. When a partition occurs, vCenter reports the perspective of one master.

A master reports a host as partitioned or isolated when it can’t communicate with the host over the management network but can observe the host’s datastore heartbeats via the heartbeat datastores. The master alone cannot differentiate between these two states – a host is reported as isolated only if the host informs the master via the datastores that it is isolated.

This still leaves open the question of how the master differentiates between a Failed, Partitioned, or Isolated host.

When the master stops receiving network heartbeats from a slave, it will check for host “liveness” for the next 15 seconds. Before the host is declared failed, the master will validate if it has actually failed or not by doing additional liveness checks. First, the master will validate if the host is still heartbeating to the datastore. Second, the master will ping the management IP address of the host. If both are negative, the host will be declared Failed. This doesn’t necessarily mean the host has PSOD’ed; it could be the network is unavailable, including the storage network, which would make this host Isolated from an administrator’s perspective but Failed from an HA perspective. As you can imagine, however, there are various combinations possible. The following table depicts these combinations including the “state”.

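| State | Network Heartbeat | Storage Heartbeat | Host Liveness Ping | Isolation Criteria Met |
| --- | --- | --- | --- | --- |
| Running | Yes | N/A | N/A | N/A |
| Isolated | No | Yes | No | Yes |
| Partitioned | No | Yes | No | No |
| Failed | No | No | No | N/A |
| FDM Agent Down | N/A | N/A | Yes | N/A |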
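The state table can be read as a small decision procedure. The Python sketch below is one possible reading of it, not the FDM agent's actual logic: the function name and the order in which the checks are evaluated are our assumptions, chosen so that every row of the table maps to its listed state and the N/A cells are simply never consulted.

```python
def classify_host(network_hb: bool, storage_hb: bool,
                  ping_ok: bool, isolation_met: bool) -> str:
    """Map the master's observations of a host to a state from the table."""
    if network_hb:
        return "Running"            # remaining columns are N/A
    if ping_ok:
        return "FDM Agent Down"     # host answers pings but the agent is silent
    if storage_hb:
        # Alive on the datastore, unreachable over the network.
        return "Isolated" if isolation_met else "Partitioned"
    return "Failed"                 # no network heartbeat, no datastore heartbeat, no ping
```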


HA will trigger an action based on the state of the host. When the host is marked as Failed, a restart of the virtual machines will be initiated. When the host is marked as Isolated, the master might initiate the restarts.

The one thing to keep in mind when it comes to isolation response is that a virtual machine will only be shut down or powered off when the isolated host knows there is a master out there that has taken ownership for the virtual machine or when the isolated host loses access to the home datastore of the virtual machine.

For example, if a host is isolated and runs two virtual machines, stored on separate datastores, the host will validate if it can access each of the home datastores of those virtual machines. If it can, the host will validate whether a master owns these datastores. If no master owns the datastores, the isolation response will not be triggered and restarts will not be initiated. If the host does not have access to the datastore, for instance, during an “All Paths Down” condition, HA will trigger the isolation response to ensure the “original” virtual machine is powered down and will be safely restarted. This is to avoid so-called “split-brain” scenarios.
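The per-virtual-machine check described in this example boils down to two questions. The following Python sketch captures that rule under stated assumptions: the function and parameter names are hypothetical, and the two booleans stand in for checks the isolated host performs against the virtual machine's home datastore.

```python
def should_trigger_isolation_response(datastore_accessible: bool,
                                      master_owns_datastore: bool) -> bool:
    """Decide, for one virtual machine, whether the isolation response fires."""
    if not datastore_accessible:
        # E.g. an All Paths Down condition: power down the original VM
        # so a restart elsewhere cannot lead to a split-brain scenario.
        return True
    # Datastore reachable: only act if a master has taken ownership,
    # otherwise nobody would restart the VM and the downtime is pointless.
    return master_owns_datastore
```

For the two virtual machines in the example, the function would be evaluated once per home datastore, so one virtual machine can be powered down while the other keeps running.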

To reiterate, as this is a very important aspect of HA and how it handles network isolations, the remaining hosts in the cluster will only be requested to restart virtual machines when the master has detected that either the host has failed or has become isolated and the isolation response was triggered.

Virtual Machine Protection

Virtual machine protection happens on several layers but is ultimately the responsibility of vCenter. We have explained this briefly but want to expand on it a bit more to make sure everyone understands the dependency on vCenter when it comes to protecting virtual machines. We do want to stress that this only applies to protecting virtual machines; virtual machine restarts in no way require vCenter to be available at the time.

When the state of a virtual machine changes, vCenter will direct the master to enable or disable HA protection for that virtual machine. Protection, however, is only guaranteed when the master has committed the change of state to disk. The reason for this, of course, is that a failure of the master would result in the loss of any state changes that exist only in memory. As pointed out earlier, this state is distributed across the datastores and stored in the “protectedlist” file.

When the power state change of a virtual machine has been committed to disk, the master will inform vCenter Server so that the change in status is visible both for the user in vCenter and for other processes like monitoring tools.


To clarify the process, we have created a workflow diagram of the protection of a virtual machine from the point it is powered on through vCenter:

Figure 14 - Virtual Machine protection workflow

But what about “unprotection?” When a virtual machine is powered off, it must be removed from the protectedlist. We have documented this workflow in the following diagram for the situation where the power off is invoked from vCenter.

Figure 15 - Virtual Machine Unprotection workflow

Restarting Virtual Machines

In the previous chapter, we have described most of the lower level fundamental concepts of HA. We have shown you that multiple mechanisms increase resiliency and reliability of HA. Reliability of HA in this case mostly refers to restarting (or resetting) virtual machines, as that remains HA’s primary task.

HA will respond when the state of a host has changed, or, better said, when the state of one or more virtual machines has changed. There are multiple scenarios in which HA will respond to a virtual machine failure, the most common of which are listed below:

- Failed host
- Isolated host
- Failed guest operating system

Depending on the type of failure, but also depending on the role of the host, the process will differ slightly. Changing the process results in slightly different recovery timelines. There are many different scenarios and there is no point in covering all of them, so we will try to describe the most common scenarios and include timelines where possible.

Before we dive into the different failure scenarios, we want to explain how restart priority and retries work.

Restart Priority and Order

HA can take the configured priority of the virtual machine into account when restarting VMs. However, it is good to know that Agent VMs take precedence during the restart procedure as the “regular” virtual machines may rely on them. A good example of an agent virtual machine is a virtual storage appliance.

Prioritization is done by each host and not globally. Each host that has been requested to initiate restart attempts will attempt to restart all top priority virtual machines before attempting to start any other virtual machines. If the restart of a top priority virtual machine fails, it will be retried after a delay. In the meantime, however, HA will continue powering on the remaining virtual machines. Keep in mind that some virtual machines might be dependent on the agent virtual machines. You should document which virtual machines are dependent on which agent virtual machines and document the process to start up these services in the right order in the case the automatic restart of an agent virtual machine fails.


Basic design principle: Virtual machines can be dependent on the availability of agent virtual machines or other virtual machines. Although HA will do its best to ensure all virtual machines are started in the correct order, this is not guaranteed. Document the proper recovery process.

Besides agent virtual machines, HA also prioritizes FT secondary machines. We have listed the full order in which virtual machines will be restarted below:

- Agent virtual machines
- FT secondary virtual machines
- Virtual Machines configured with a restart priority of high
- Virtual Machines configured with a medium restart priority
- Virtual Machines configured with a low restart priority
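The ordering in this list can be illustrated with a short Python sketch. The category labels and the function name are hypothetical; the sketch only shows how a host could order the virtual machines it has been asked to restart, and, as noted above, the order in which power-on attempts are issued is not a guarantee of the order in which virtual machines actually come up.

```python
# Restart categories in the order listed above (labels are illustrative).
RESTART_ORDER = ["agent", "ft_secondary", "high", "medium", "low"]

def restart_sequence(vms):
    """Order (name, category) pairs by restart category.

    sorted() is stable, so VMs in the same category keep their input order.
    """
    rank = {category: i for i, category in enumerate(RESTART_ORDER)}
    return [name for name, category in sorted(vms, key=lambda vm: rank[vm[1]])]
```

For example, `restart_sequence([("web", "low"), ("vsa", "agent"), ("db", "high")])` places the agent virtual machine first and the low-priority one last.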

It should be noted that HA will not place any virtual machines on a host if the required number of agent virtual machines are not running on the host at the time placement is done.

Now that we have briefly touched on it, we would also like to address “restart retries” and parallelization of restarts as that more or less dictates how long it could take before all virtual machines of a failed or isolated host are restarted.

Restart Retries

The number of retries is configurable as of vCenter 2.5 U4 with the advanced option “das.maxvmrestartcount”. The default value is 5. Note that the initial restart is included.

HA will try to start the virtual machine on one of your hosts in the affected cluster; if this is unsuccessful on that host, the restart count will be increased by 1. Before we go into the exact timeline, let it be clear that T0 is the point at which the master initiates the first restart attempt. This by itself could be 30 seconds after the virtual machine has failed. The elapsed time between the failure of the virtual machine and the restart, though, will depend on the scenario of the failure, which we will discuss in this chapter.

As said, the default number of restarts is 5. There are specific times associated with each of these attempts. The following bullet list will clarify this concept. The ‘m’ stands for “minutes” in this list.

- T0 – Initial Restart
- T2m – Restart retry 1
- T6m – Restart retry 2
- T14m – Restart retry 3
- T30m – Restart retry 4
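The gaps between the attempts in this list are 2, 4, 8 and 16 minutes, so each wait is double the previous one. The Python sketch below reproduces the listed offsets from that observation. The function and parameter names are ours, and continuing the doubling beyond the five default attempts is an extrapolation, not documented behavior.

```python
def restart_schedule(max_attempts: int = 5, first_delay_min: int = 2) -> list[int]:
    """Return the offset in minutes from T0 of each restart attempt."""
    if max_attempts < 1:
        return []
    offsets = [0]                  # T0: the initial restart
    delay = first_delay_min
    for _ in range(max_attempts - 1):
        offsets.append(offsets[-1] + delay)
        delay *= 2                 # each wait doubles: 2, 4, 8, 16 minutes
    return offsets
```

With the defaults this yields the offsets 0, 2, 6, 14 and 30, matching the list above.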

Figure 16 - High Availability restart timeline

As clearly depicted in the diagram above, a successful power-on attempt could take up to ~30 minutes in the case where multiple power-on attempts are unsuccessful. This is, however, not exact science. For instance, there is a 2-minute waiting period between the initial restart and the first restart retry. HA will start the 2-minute wait as soon as it has detected that the initial attempt has failed. So, in reality, T2 could be T2 plus 8 seconds. Another important fact that we want to emphasize is that there is no coordination between masters, and so if multiple ones are involved in trying to restart the virtual machine, each will retain its own sequence. Multiple masters could attempt to restart a virtual machine. Although only one will succeed, it might change some of the timelines.

What about VMs which are "disabled" for HA? What will happen with those VMs? Before vSphere 6.0 those VMs would be left alone; as of vSphere 6.0 these VMs will be registered on another host after a failure. This will allow you to easily power on that VM when needed without needing to manually re-register it yourself. Note that HA will not power on the VM; it will just register it for you!

Let’s give an example to clarify the scenario in which a master fails during a restart sequence:

Cluster: 4 Hosts (esxi01, esxi02, esxi03, esxi04)

Master: esxi01

The host “esxi02” is running a single virtual machine called “vm01” and it fails. The master, esxi01, will try to restart it but the attempt fails. It will try restarting “vm01” up to 5 times but, unfortunately, on the 4th try, the master also fails. An election occurs and “esxi03” becomes the new master. It will now initiate the restart of “vm01”, and if that restart would fail it will retry it up to 4 times again for a total including the initial restart of 5.

Be aware, though, that a successful restart might never occur if the restart count is reached and all five restart attempts (the default value) were unsuccessful.

When it comes to restarts, one thing that is very important to realize is that HA will not issue more than 32 concurrent power-on tasks on a given host. To make that more clear, let’s use the example of a two host cluster: if a host fails which contained 33 virtual machines and all of these had the same restart priority, 32 power on attempts would be initiated. The 33rd power on attempt will only be initiated when one of those 32 attempts has completed regardless of success or failure of one of those attempts.
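To illustrate the effect of the limit of 32 concurrent power-on tasks, here is a simplified Python simulation. It is a sketch, not HA's scheduler: the function name is hypothetical, attempt durations are supplied by the caller, and an attempt counts as completed after its duration whether it succeeded or failed.

```python
import heapq

def power_on_start_times(durations, limit=32):
    """Return the start time of each power-on attempt, in issue order.

    At most `limit` attempts run concurrently; a queued attempt starts
    as soon as any running attempt completes.
    """
    running = []   # min-heap of completion times of in-flight attempts
    starts = []
    for duration in durations:
        if len(running) < limit:
            start = 0.0                        # a free slot is available at T0
        else:
            start = heapq.heappop(running)     # wait for the earliest completion
        starts.append(start)
        heapq.heappush(running, start + duration)
    return starts
```

With 33 attempts of 10 seconds each, the first 32 start at 0 and the 33rd starts at 10, which mirrors the two host example above.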

Now, here comes the gotcha. If there are 32 low-priority virtual machines to be powered on and a single high-priority virtual machine, the power on attempt for the low-priority virtual machines will not be issued until the power on attempt for the high priority virtual machine has completed. Let it be absolutely clear that HA does not wait to restart the low-priority virtual machines until the high-priority virtual machines are started; it waits for the issued power on attempt to be reported as “completed”. In theory, this means that if the power on attempt fails, the low-priority virtual machines could be powered on before the high priority virtual machine.

The restart priority however does guarantee that when a placement is done, the higher priority virtual machines get first right to any available resources.

Basic design principle: Configuring restart priority of a virtual machine is not a guarantee that virtual machines will actually be restarted in this order. Ensure proper operational procedures are in place for restarting services or virtual machines in the appropriate order in the event of a failure.

Now that we know how virtual machine restart priority and restart retries are handled, it is time to look at the different scenarios.

- Failed host
  - Failure of a master
  - Failure of a slave
- Isolated host and response

Failed Host

When discussing a failed host scenario it is necessary to make a distinction between the failure of a master versus the failure of a slave. We want to emphasize this because the time it takes before a restart attempt is initiated differs between these two scenarios. Although the majority of you probably won’t notice the time difference, it is important to call out. Let’s start with the most common failure, that of a host failing, but note that failures generally occur infrequently. In most environments, hardware failures are very uncommon to begin with. Just in case it happens, it doesn’t hurt to understand the process and its associated timelines.

The Failure of a Slave

The failure of a slave host is a fairly complex scenario. Part of this complexity comes from the introduction of a new heartbeat mechanism. Actually, there are two different scenarios: one where heartbeat datastores are configured and one where heartbeat datastores are not configured. Keeping in mind that this is an actual failure of the host, the timeline is as follows:

- T0 – Slave failure.
- T3s – Master begins monitoring datastore heartbeats for 15 seconds.
- T10s – The host is declared unreachable and the master will ping the management network of the failed host. This is a continuous ping for 5 seconds.
- T15s – If no heartbeat datastores are configured, the host will be declared dead.
- T18s – If heartbeat datastores are configured, the host will be declared dead.

The master monitors the network heartbeats of a slave. When the slave fails, these heartbeats will no longer be received by the master. We have defined this as T0. After 3 seconds (T3s), the master will start monitoring for datastore heartbeats and it will do this for 15 seconds. On the 10th second (T10s), when no network or datastore heartbeats have been detected, the host will be declared as “unreachable”. The master will also start pinging the management network of the failed host at the 10th second and it will do so for 5 seconds. If no heartbeat datastores were configured, the host will be declared “dead” at the 15th second (T15s) and virtual machine restarts will be initiated by the master. If heartbeat datastores have been configured, the host will be declared dead at the 18th second (T18s) and restarts will be initiated. We realize that this can be confusing and hope the timeline depicted in the diagram below makes it easier to digest.
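The two end points of this timeline follow from simple arithmetic: the 5-second ping that starts at T10s ends at T15s, and the 15-second datastore heartbeat window that starts at T3s ends at T18s. The Python sketch below encodes just that; the function name and event wording are illustrative and not taken from the product.

```python
def slave_failure_timeline(heartbeat_datastores_configured: bool):
    """Return (seconds after T0, event) pairs for a slave failure."""
    events = [
        (0, "slave fails; network heartbeats stop"),
        (3, "master starts monitoring datastore heartbeats (15 s window)"),
        (10, "host declared unreachable; master pings it for 5 s"),
    ]
    if heartbeat_datastores_configured:
        declared_dead = 3 + 15     # datastore heartbeat window ends at T18s
    else:
        declared_dead = 10 + 5     # ping window ends at T15s
    events.append((declared_dead, "host declared dead; restarts initiated"))
    return events
```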

Figure 17 - Restart timeline slave failure

The master filters the virtual machines it thinks failed before initiating restarts. The master uses the protectedlist for this; the on-disk state could be obtained by only one master at a time since it required opening the protectedlist file in exclusive mode. If there is a network partition, multiple masters could try to restart the same virtual machine as vCenter Server also provided the necessary details for a restart. As an example, it could happen that a master has locked a virtual machine’s home datastore and has access to the protectedlist while the other master is in contact with vCenter Server and as such is aware of the current desired protected state. In this scenario it could happen that the master which does not own the home datastore of the virtual machine will restart the virtual machine based on the information provided by vCenter Server.

This change in behavior was introduced to avoid the scenario where a restart of a virtual machine would fail due to insufficient resources in the partition which was responsible for the virtual machine. With this change, there is less chance of such a situation occurring as the master in the other partition would be using the information provided by vCenter Server to initiate the restart.

That leaves us with the question of what happens in the case of the failure of a master.

The Failure of a Master

In the case of a master failure, the process and the associated timeline are slightly different. The reason is that there needs to be a master before any restart can be initiated. This means that an election will need to take place amongst the slaves. The timeline is as follows:

- T0 – Master failure.
- T10s – Master election process initiated.
- T25s – New master elected and reads the protectedlist.
- T35s – New master initiates restarts for all virtual machines on the protectedlist which are not running.

Slaves receive network heartbeats from their master. If the master fails, let’s define this as T0 (T zero), the slaves detect this when the network heartbeats cease to be received. As every cluster needs a master, the slaves will initiate an election at T10s. The election process takes 15s to complete, which brings us to T25s. At T25s, the new master reads the protectedlist. This list contains all the virtual machines which are protected by HA. At T35s, the master initiates the restart of all virtual machines that are protected but not currently running. The timeline depicted in the diagram below hopefully clarifies the process.
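The master failure timeline is a sum of three intervals, which the following Python sketch makes explicit. The constant and function names are ours. The 10-second detection delay and the 15-second election come straight from the text, while the 10 seconds between reading the protectedlist and initiating restarts is inferred from the gap between T25s and T35s.

```python
DETECTION_S = 10       # heartbeats missed until the election is initiated
ELECTION_S = 15        # duration of the election process
RESTART_DELAY_S = 10   # inferred gap between T25s and T35s

def master_failure_timeline() -> dict[str, int]:
    """Return the offset in seconds from T0 of each step after a master failure."""
    election_start = DETECTION_S
    new_master = election_start + ELECTION_S
    restarts = new_master + RESTART_DELAY_S
    return {
        "election initiated": election_start,
        "new master elected, reads protectedlist": new_master,
        "restarts initiated": restarts,
    }
```

Compared with the failure of a slave, the first restart is initiated roughly 17 to 20 seconds later because an election has to complete first.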

Figure 18 - Restart timeline master failure

Besides the failure of a host, there is another reason for restarting virtual machines: an isolation event.

Isolation Response and Detection

Before we discuss the timeline and the process around the restart of virtual machines after an isolation event, we will discuss Isolation Response and Isolation Detection. One of the first decisions that will need to be made when configuring HA is the “Isolation Response”.

Isolation Response


The Isolation Response refers to the action that HA takes for its virtual machines when the host has lost its connection with the network and the remaining nodes in the cluster. This does not necessarily mean that the whole network is down; it could just be the management network ports of this specific host. Today there are three isolation responses: “Power off”, “Leave powered on” and “Shut down”. This isolation response answers the question, “what should a host do with the virtual machines it manages when it detects that it is isolated from the network?” Let’s discuss these three options more in-depth:

- Power off – When isolation occurs, all virtual machines are powered off. It is a hard stop, or to put it bluntly, the “virtual” power cable of the virtual machine will be pulled out!
- Shut down – When isolation occurs, all virtual machines running on the host will be shut down using a guest-initiated shutdown through VMware Tools. If this is not successful within 5 minutes, a “power off” will be executed. This time out value can be adjusted by setting the advanced option das.isolationShutdownTimeout. If VMware Tools is not installed, a “power off” will be initiated immediately.
- Leave powered on – When isolation occurs on the host, the state of the virtual machines remains unchanged.

This setting can be changed on the cluster settings under virtual machine options.

Figure 19 - Cluster default settings

The default setting for the isolation response has changed multiple times over the last couple of years and this has caused some confusion.

- Up to ESXi 3.5 U2 / vCenter 2.5 U2 the default isolation response was “Power off”.
- With ESXi 3.5 U3 / vCenter 2.5 U3 this was changed to “Leave powered on”.
- With vSphere 4.0 it was changed to “Shut down”.
- With vSphere 5.0 it has been changed to “Leave powered on”.

Keep in mind that these changes are only applicable to newly created clusters. When creating a new cluster, it may be required to change the default isolation response based on the configuration of existing clusters and/or your customer’s requirements, constraints and expectations. When upgrading an existing cluster, it might be wise to apply the latest default values. You might wonder why the default has changed once again. There was a lot of feedback from customers that “Leave powered on” was the desired default value.


Basic design principle: Before upgrading an environment to later versions, ensure you validate the best practices and default settings. Document them, including justification, to ensure all people involved understand your reasons.

The question remains, which setting should be used? The obvious answer applies here; it depends. We prefer “Leave powered on” because it eliminates the chances of having a false positive and its associated down time. One of the problems that people have experienced in the past is that HA triggered its isolation response when the full management network went down, basically resulting in the power off (or shutdown) of every single virtual machine and none being restarted. This problem has been mitigated. HA will validate if virtual machine restarts can be attempted – there is no reason to incur any down time unless absolutely necessary. It does this by validating that a master owns the datastore the virtual machine is stored on. Of course, the isolated host can only validate this if it has access to the datastores. In a converged network environment with iSCSI storage, for instance, it would be impossible to validate this during a full isolation as the validation would fail due to the inaccessible datastore from the perspective of the isolated host.

We feel that changing the isolation response is most useful in environments where a failure of the management network is likely correlated with a failure of the virtual machine network(s). If the failure of the management network won’t likely correspond with the failure of the virtual machine networks, isolation response would cause unnecessary downtime as the virtual machines can continue to run without management network connectivity to the host.

A second use for power off / shut down is in scenarios where the virtual machine retains access to the virtual machine network but loses access to its storage; leaving the virtual machine powered on could result in two virtual machines on the network with the same IP address.

It is still difficult to decide which isolation response should be used. The following table was created to provide some more guidelines.

vSphere6.xHADeepdive

43RestartingVirtualMachines

Page 44: Table of Contents - vmgu.ru · Table of Contents Introduction Disclaimer About the author Introduction to HA Components of HA ... VMware engineered a feature called VMware vSphere

| Likelihood that host will retain access to VM datastore | Likelihood VMs will retain access to VM network | Recommended Isolation Policy | Rationale |
| --- | --- | --- | --- |
| Likely | Likely | Leave Powered On | Virtual machine is running fine, no reason to power it off |
| Likely | Unlikely | Either Leave Powered On or Shutdown | Choose shutdown to allow HA to restart virtual machines on hosts that are not isolated and hence are likely to have access to storage |
| Unlikely | Likely | Power Off | Use Power Off to avoid having two instances of the same virtual machine on the virtual machine network |
| Unlikely | Unlikely | Leave Powered On or Power Off | Leave Powered On if the virtual machine can recover from the network/datastore outage if it is not restarted because of the isolation, and Power Off if it likely can’t |
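For those who like to keep such guidelines next to their design documentation, the table can be captured as a small lookup. This is only a sketch of the table above; the function and constant names are our own and are not part of any vSphere API.

```python
# Guideline table as a lookup. Keys are
# (host likely keeps datastore access, VMs likely keep VM network access).
# All names here are illustrative, not vSphere identifiers.
ISOLATION_POLICY_GUIDELINES = {
    (True, True): ("Leave Powered On",),
    (True, False): ("Leave Powered On", "Shutdown"),
    (False, True): ("Power Off",),
    (False, False): ("Leave Powered On", "Power Off"),
}


def recommend_isolation_policy(keeps_datastore, keeps_vm_network):
    """Return the isolation policies the guideline table suggests."""
    key = (bool(keeps_datastore), bool(keeps_vm_network))
    return ISOLATION_POLICY_GUIDELINES[key]


if __name__ == "__main__":
    # Storage likely lost, VM network likely retained.
    print(recommend_isolation_policy(False, True))
```

Where the table lists two options, the rationale column still has to be weighed by hand; the lookup does not make that decision for you.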

The question that we haven’t answered yet is how HA knows which virtual machines have been powered off due to the triggered isolation response and why the isolation response is more reliable than with previous versions of HA. Previously, HA did not care and would always try to restart the virtual machines according to the last known state of the host. That is no longer the case. Before the isolation response is triggered, the isolated host will verify whether a master is responsible for the virtual machine.

As mentioned earlier, it does this by validating whether a master owns the home datastore of the virtual machine. When the isolation response is triggered, the isolated host removes the virtual machines which are powered off or shut down from the “poweron” file. The master will recognize that the virtual machines have disappeared and initiate a restart. On top of that, when the isolation response is triggered, the isolated host will create a per-virtual machine file under a “poweredoff” directory which indicates to the master that this virtual machine was powered down as a result of a triggered isolation response. This information will be read by the master node when it initiates the restart attempt in order to guarantee that only virtual machines that were powered off / shut down by HA will be restarted by HA.
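The bookkeeping described above can be sketched as simple set logic. This is a conceptual model only: the real “poweron” and “poweredoff” files are internal to HA, and the function and parameter names below are hypothetical.

```python
def vms_for_master_to_restart(last_known_poweron, current_poweron, poweredoff_markers):
    """Model of the master's decision after an isolation response.

    last_known_poweron: VMs the master last read from the host's poweron file
    current_poweron:    VMs still listed after the isolation response ran
    poweredoff_markers: VMs with a per-VM marker in the poweredoff directory
    """
    disappeared = set(last_known_poweron) - set(current_poweron)
    # Only VMs that HA itself powered off carry a marker and are restarted.
    return sorted(disappeared & set(poweredoff_markers))


if __name__ == "__main__":
    # vm02 vanished from the poweron file but has no marker,
    # for example because an administrator powered it off.
    print(vms_for_master_to_restart({"vm01", "vm02", "vm03"}, {"vm03"}, {"vm01"}))
```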

This is, however, only one part of the increased reliability of HA. Reliability has also been improved with respect to “isolation detection,” which will be described in the following section.

Isolation Detection

We have explained what the options are to respond to an isolation event and what happens when the selected response is triggered. However, we have not extensively discussed how isolation is detected. The mechanism is fairly straightforward and works with heartbeats, as explained earlier. There are, however, two scenarios again, and the process and associated timelines differ for each of them:

- Isolation of a slave
- Isolation of a master

Before we explain the differences in process between both scenarios, we want to make sure it is clear that a change in state will result in the isolation response not being triggered in either scenario. This means that if a single ping is successful, or the host observes election traffic and is elected a master or slave, the isolation response will not be triggered, which is exactly what you want as avoiding downtime is at least as important as recovering from downtime. When a host has declared itself isolated and observes election traffic, it will declare itself no longer isolated.

Isolation of a Slave

HA triggers a master election process before it will declare a host isolated. In the timeline below, “s” refers to seconds.

- T0 – Isolation of the host (slave)
- T10s – Slave enters “election state”
- T25s – Slave elects itself as master
- T25s – Slave pings “isolation addresses”
- T30s – Slave declares itself isolated
- T60s – Slave “triggers” isolation response

When the isolation response is triggered, HA creates a “power-off” file for any virtual machine HA powers off whose home datastore is accessible. Next it powers off (or shuts down) the virtual machine and updates the host’s poweron file. The power-off file is used to record that HA powered off the virtual machine and so HA should restart it. These power-off files are deleted when a virtual machine is powered back on or HA is disabled.

After the completion of this sequence, the master will learn the slave was isolated through the “poweron” file as mentioned earlier, and will restart virtual machines based on the information provided by the slave.

Figure 20 - Isolation of a slave timeline

Isolation of a Master

In the case of the isolation of a master, the timeline is a bit less complicated because there is no need to go through an election process. In this timeline, “s” refers to seconds.

- T0 – Isolation of the host (master)
- T0 – Master pings “isolation addresses”
- T5s – Master declares itself isolated
- T35s – Master “triggers” isolation response
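The two isolation timelines can be placed side by side with a small sketch. In both lists the response is triggered 30 seconds after the host declares itself isolated; modelling that gap as a single parameter is our own simplification, and the function name is hypothetical.

```python
def isolation_timeline(role, response_delay_s=30):
    """Return (seconds, event) tuples for an isolated slave or master.

    The offsets are taken from the timelines in the text. The 30 s gap
    between declaring isolation and triggering the response is modelled
    as response_delay_s (an assumption made for illustration).
    """
    if role == "slave":
        events = [
            (0, "host isolated"),
            (10, "enters election state"),
            (25, "elects itself as master"),
            (25, "pings isolation addresses"),
            (30, "declares itself isolated"),
        ]
    elif role == "master":
        events = [
            (0, "host isolated"),
            (0, "pings isolation addresses"),
            (5, "declares itself isolated"),
        ]
    else:
        raise ValueError("role must be 'slave' or 'master'")
    declared_at = events[-1][0]
    events.append((declared_at + response_delay_s, "triggers isolation response"))
    return events


if __name__ == "__main__":
    for role in ("slave", "master"):
        for seconds, event in isolation_timeline(role):
            print(f"{role:6} T{seconds}s {event}")
```

The difference of 25 seconds between the two roles is the time a slave spends waiting for and completing the election before it starts pinging the isolation addresses.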

Additional Checks

Before a host declares itself isolated, it will ping the default isolation address, which is the gateway specified for the management network, and it will continue to ping the address until it becomes unisolated. HA gives you the option to define one or multiple additional isolation addresses using an advanced setting. This advanced setting is called das.isolationaddress and could be used to reduce the chances of having a false positive. We recommend setting an additional isolation address. If a secondary management network is configured, this additional address should be part of the same network as the secondary management network. If required, you can configure up to 10 additional isolation addresses. A secondary management network will more than likely be on a different subnet, and it is recommended to specify an additional isolation address which is part of that subnet.
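As an illustration, the additional isolation addresses could be prepared as a set of advanced options before they are entered in the cluster settings. The sketch below assumes that multiple addresses are expressed with a numeric suffix (das.isolationaddress0 through das.isolationaddress9); the text above only names das.isolationaddress, so verify the exact option names against the documentation for your version. The helper name and the example addresses are hypothetical.

```python
import ipaddress

MAX_ADDITIONAL_ISOLATION_ADDRESSES = 10  # limit stated in the text


def isolation_address_options(addresses):
    """Build an {option name: value} mapping for additional isolation addresses.

    Assumption: addresses are numbered das.isolationaddress0..9.
    """
    if len(addresses) > MAX_ADDITIONAL_ISOLATION_ADDRESSES:
        raise ValueError("at most 10 additional isolation addresses are supported")
    options = {}
    for index, address in enumerate(addresses):
        ipaddress.ip_address(address)  # raises ValueError for a malformed address
        options[f"das.isolationaddress{index}"] = address
    return options


if __name__ == "__main__":
    # Example values only: one address per management subnet.
    print(isolation_address_options(["192.168.1.1", "10.10.1.1"]))
```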

Figure 21 - Isolation Address

Selecting an Additional Isolation Address

A question asked by many people is which address should be specified for this additional isolation verification. We generally recommend an isolation address close to the hosts to avoid too many network hops and an address that would correlate with the liveness of the virtual machine network. In many cases, the most logical choice is the physical switch to which the host is directly connected. Basically, use the gateway for whatever subnet your management network is on. Another usual suspect would be a router or any other reliable and pingable device on the same subnet. However, when you are using IP-based shared storage like NFS or iSCSI, the IP address of the storage device can also be a good choice.

Basic design principle: Select a reliable secondary isolation address. Try to minimize the number of “hops” between the host and this address.

Isolation Policy Delay

For those who want to increase the time it takes before HA executes the isolation response, an advanced setting is available. This setting is called “das.config.fdm.isolationPolicyDelaySec” and allows changing the number of seconds to wait before the isolation policy is executed. The minimum value is 30. If set to a value less than 30, the delay will be 30 seconds. We do not recommend changing this advanced setting unless there is a specific requirement to do so. In almost all scenarios 30 seconds should suffice.

Restarting Virtual Machines

The most important procedure has not yet been explained: restarting virtual machines. We have dedicated a full section to this concept.

We have explained the difference in behavior from a timing perspective for restarting virtual machines in the case of both master node and slave node failures. For now, let’s assume that a slave node has failed. When the master node declares the slave node as Partitioned or Isolated, it determines which virtual machines were running on it using the information it previously read from the host’s “poweron” file. These files are asynchronously read approximately every 30s. If the host was not Partitioned or Isolated before the failure, the master uses cached data to determine the virtual machines that were last running on the host before the failure occurred.

Before it will initiate the restart attempts, though, the master will first validate that the virtual machine should be restarted. This validation uses the protection information vCenter Server provides to each master, or if the master is not in contact with vCenter Server, the information saved in the protected list files. If the master is not in contact with vCenter Server or has not locked the file, the virtual machine is filtered out. At this point, all virtual machines having a restart priority of “disabled” are also filtered out.

Now that HA knows which virtual machines it should restart, it is time to decide where the virtual machines are placed. HA will take multiple things into account:

- CPU and memory reservation, including the memory overhead of the virtual machine
- Unreserved capacity of the hosts in the cluster
- Restart priority of the virtual machine relative to the other virtual machines that need to be restarted
- Virtual-machine-to-host compatibility set
- The number of dvPorts required by a virtual machine and the number available on the candidate hosts
- The maximum number of vCPUs and virtual machines that can be run on a given host
- Restart latency
- Whether the active hosts are running the required number of agent virtual machines

Restart latency refers to the amount of time it takes to initiate virtual machine restarts. This means that virtual machine restarts will be distributed by the master across multiple hosts to avoid a boot storm, and thus a delay, on a single host.

If a placement is found, the master will send each target host the set of virtual machines it needs to restart. If this list exceeds 32 virtual machines, HA will limit the number of concurrent power on attempts to 32. If a virtual machine successfully powers on, the node on which the virtual machine was powered on will inform the master of the change in power state. The master will then remove the virtual machine from the restart list.

If a placement cannot be found, the master will place the virtual machine on a “pending placement list” and will retry placement of the virtual machine when one of the following conditions changes:

- A new virtual-machine-to-host compatibility list is provided by vCenter.
- A host reports that its unreserved capacity has increased.
- A host (re)joins the cluster (for instance, when a host is taken out of maintenance mode, a host is added to a cluster, etc.).
- A new failure is detected and virtual machines have to be failed over.
- A failure occurred when failing over a virtual machine.

But what about DRS? Wouldn’t DRS be able to help during the placement of virtual machines when all else fails? It does. The master node will report to vCenter the set of virtual machines that were not placed due to insufficient resources, as is the case today. If DRS is enabled, this information will be used in an attempt to have DRS make capacity available.

Component Protection

In vSphere 6.0 a new feature called VM Component Protection is introduced as part of vSphere HA. VM Component Protection (VMCP) in vSphere 6.0 allows you to protect virtual machines against the failure of your storage system. There are two types of failures VMCP will respond to: Permanent Device Loss (PDL) and All Paths Down (APD). Before we look at some of the details, we want to point out that enabling VMCP is extremely easy. It can be enabled by a single tick box as shown in the screenshot below.

Figure 22 - Virtual Machine Component Protection

As stated, there are two scenarios HA can respond to: PDL and APD. Let’s look at those two scenarios a bit closer. With vSphere 5.0 a feature was introduced as an advanced option that would allow vSphere HA to restart VMs impacted by a PDL condition.

A PDL condition is a condition that is communicated by the array controller to ESXi via a SCSI sense code. This condition indicates that a device (LUN) has become unavailable and is likely permanently unavailable. An example scenario in which this condition would be communicated by the array would be when a LUN is set offline. This condition is used during a failure scenario to ensure ESXi takes appropriate action when access to a LUN is revoked. It should be noted that when a full storage failure occurs it is impossible to generate the PDL condition as there is no communication possible between the array and the ESXi host. This state will be identified by the ESXi host as an APD condition.

Although the functionality itself worked as advertised, enabling and managing it was cumbersome and error prone. It was required to set the option “disk.terminateVMOnPDLDefault” manually. With vSphere 6.0 a simple option in the Web Client is introduced which allows you to specify what the response should be to a PDL sense code.

Figure 23 - Enabling Virtual Machine Component Protection

The two options provided are “Issue Events” and “Power off and restart VMs”. Note that “Power off and restart VMs” does exactly that: your VM process is killed and the VM is restarted on a host which still has access to the storage device.

Until now it was not possible for vSphere to respond to an APD scenario. APD is the situation where the storage device is inaccessible but for unknown reasons. In most cases where this occurs it is typically related to a storage network problem. With vSphere 5.1 changes were introduced to the way APD scenarios were handled by the hypervisor. This mechanism is leveraged by HA to allow for a response.

When an APD occurs a timer starts. After 140 seconds the APD is declared and the device is marked as APD timeout. When the 140 seconds have passed HA will start counting. The HA timeout is 3 minutes by default as shown in Figure 24. When the 3 minutes have passed HA will take the action defined. There are again two options: “Issue Events” and “Power off and restart VMs”.

You can also specify how aggressively HA needs to try to restart VMs that are impacted by an APD. Note that aggressive/conservative refers to the likelihood of HA being able to restart VMs. When set to “conservative” HA will only restart the VM that is impacted by the APD if it knows another host can restart it. In the case of “aggressive” HA will try to restart the VM even if it doesn’t know the state of the other hosts, which could lead to a situation where your VM is not restarted as there is no host that has access to the datastore the VM is located on.

It is also good to know that if the APD is lifted and access to the storage is restored during the total of approximately 5 minutes and 20 seconds it would take before the VM restart is initiated, HA will not do anything unless you explicitly configure it to do so. This is where the “Response for APD recovery after APD timeout” comes into play. If there is a desire to do so you can restart the VM even when the host has recovered from the APD scenario during the 3 minute (default value) grace period.
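The 5 minutes and 20 seconds mentioned above is simply the 140 second APD timeout plus the 3 minute HA delay. A short sketch of that arithmetic, with names of our own choosing:

```python
APD_TIMEOUT_SECONDS = 140  # time until the device is marked as APD timeout


def seconds_until_apd_restart(ha_delay_minutes=3):
    """Seconds from the start of an APD until HA takes the configured action."""
    return APD_TIMEOUT_SECONDS + ha_delay_minutes * 60


if __name__ == "__main__":
    total = seconds_until_apd_restart()
    minutes, seconds = divmod(total, 60)
    print(f"{total} s = {minutes} min {seconds} s")
```

Changing the HA delay in the cluster settings moves the total accordingly; the 140 seconds are taken here as a fixed value, as described in the text.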

Basic design principle: Without access to shared storage a virtual machine becomes useless. It is highly recommended to configure VMCP to act on a PDL and APD scenario. We recommend setting both to “Power off and restart VMs” but leaving the “Response for APD recovery after APD timeout” disabled so that VMs are not rebooted unnecessarily.

vSphere HA nuggets

Prior to vSphere 5.5, HA did nothing with VM to VM Affinity or Anti Affinity rules. Typically for people using “affinity” rules this was not an issue, but those using “anti-affinity” rules did see this as an issue. They created these rules to ensure specific virtual machines would never be running on the same host, but vSphere HA would simply ignore the rule when a failure had occurred and just place the VMs “randomly”. With vSphere 5.5 this has changed! vSphere HA is now “anti affinity” aware. In order to ensure anti-affinity rules are respected you can set an advanced setting or, as of vSphere 6.0, configure it in the vSphere Web Client.

das.respectVmVmAntiAffinityRules - Values: "false" (default) and "true"

Now note that this also means that when you configure anti-affinity rules and have this advanced setting configured to “true” and somehow there aren’t sufficient hosts available to respect these rules… then rules will be respected and it could result in HA not restarting a VM. Make sure to understand this potential impact when configuring this setting and configuring these rules.
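The impact of respecting anti-affinity rules can be illustrated with a simplified placement filter. This is not how HA is implemented; it is a sketch that only looks at anti-affinity, and every name in it (function, hosts, VMs) is hypothetical.

```python
def eligible_hosts(vm, hosts, running_vms, anti_affinity_rules, respect_rules):
    """Return the hosts on which vm may be restarted.

    running_vms:         {host: set of VM names currently running on it}
    anti_affinity_rules: list of sets of VM names that must be kept apart
    respect_rules:       models das.respectVmVmAntiAffinityRules
    """
    if not respect_rules:
        return list(hosts)
    # Collect every VM that must not share a host with vm.
    separated = set()
    for rule in anti_affinity_rules:
        if vm in rule:
            separated |= set(rule) - {vm}
    return [host for host in hosts if not (running_vms.get(host, set()) & separated)]


if __name__ == "__main__":
    hosts = ["esxi02", "esxi03"]
    running = {"esxi02": {"web02"}, "esxi03": {"web03"}}
    rules = [{"web01", "web02", "web03"}]
    print(eligible_hosts("web01", hosts, running, rules, respect_rules=True))
    print(eligible_hosts("web01", hosts, running, rules, respect_rules=False))
```

With the rules respected, the first call returns an empty list: every surviving host already runs a member of the rule, so in this model “web01” would not be restarted.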

With vSphere 6.0 support for respecting VM to Host affinity rules has been included. This is enabled through the use of an advanced setting called “das.respectVmHostSoftAffinityRules”. When the advanced setting “das.respectVmHostSoftAffinityRules” is configured vSphere HA will try to respect the rules when it can. If there are any hosts in the cluster which belong to the same VM-Host group then HA will restart the respective VM on that host. As this is a “should rule” HA has the ability to ignore the rule when needed. If there is a scenario where none of the hosts in the VM-Host should rule is available HA will restart the VM on any other host in the cluster.

das.respectVmHostSoftAffinityRules - Values: "false" (default) and "true"
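The “should rule” behavior comes down to a preference with a fallback. A minimal sketch of that idea, using hypothetical names and ignoring every other placement factor:

```python
def candidate_hosts(available_hosts, vm_host_group, respect_soft_rules=True):
    """Prefer hosts from the VM-Host group, fall back to any available host.

    respect_soft_rules models das.respectVmHostSoftAffinityRules.
    """
    preferred = [host for host in available_hosts if host in vm_host_group]
    if respect_soft_rules and preferred:
        return preferred
    # Soft rule: when no host of the group is available, any host will do.
    return list(available_hosts)


if __name__ == "__main__":
    group = {"esxi01", "esxi02"}
    print(candidate_hosts(["esxi02", "esxi03"], group))  # group member survives
    print(candidate_hosts(["esxi03", "esxi04"], group))  # no group member left
```

This is the key difference with the anti-affinity setting: here the VM is always restarted somewhere as long as any host is available.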

Restarting Virtual Machines

In the previous chapter, we have described most of the lower level fundamental concepts of HA. We have shown you that multiple mechanisms increase resiliency and reliability of HA. Reliability of HA in this case mostly refers to restarting (or resetting) virtual machines, as that remains HA’s primary task.

HA will respond when the state of a host has changed, or, better said, when the state of one or more virtual machines has changed. There are multiple scenarios in which HA will respond to a virtual machine failure, the most common of which are listed below:

- Failed host
- Isolated host
- Failed guest operating system

Depending on the type of failure, but also depending on the role of the host, the process will differ slightly. Changing the process results in slightly different recovery timelines. There are many different scenarios and there is no point in covering all of them, so we will try to describe the most common scenario and include timelines where possible.

Before we dive into the different failure scenarios, we want to explain how restart priority and retries work.

Restart Priority and Order

HA can take the configured priority of the virtual machine into account when restarting VMs. However, it is good to know that Agent VMs take precedence during the restart procedure as the “regular” virtual machines may rely on them. A good example of an agent virtual machine is a virtual storage appliance.

Prioritization is done by each host and not globally. Each host that has been requested to initiate restart attempts will attempt to restart all top priority virtual machines before attempting to start any other virtual machines. If the restart of a top priority virtual machine fails, it will be retried after a delay. In the meantime, however, HA will continue powering on the remaining virtual machines. Keep in mind that some virtual machines might be dependent on the agent virtual machines. You should document which virtual machines are dependent on which agent virtual machines and document the process to start up these services in the right order in case the automatic restart of an agent virtual machine fails.

Basic design principle: Virtual machines can be dependent on the availability of agent virtual machines or other virtual machines. Although HA will do its best to ensure all virtual machines are started in the correct order, this is not guaranteed. Document the proper recovery process.

Besides agent virtual machines, HA also prioritizes FT secondary machines. We have listed the full order in which virtual machines will be restarted below:

- Agent virtual machines
- FT secondary virtual machines
- Virtual machines configured with a restart priority of high
- Virtual machines configured with a medium restart priority
- Virtual machines configured with a low restart priority

It should be noted that HA will not place any virtual machines on a host if the required number of agent virtual machines are not running on the host at the time placement is done.
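The restart order listed above can be expressed as a simple sort. The category labels and the function name below are our own shorthand, not vSphere identifiers.

```python
# Restart order from the list above, highest precedence first.
RESTART_ORDER = ("agent", "ft_secondary", "high", "medium", "low")


def restart_sequence(vms):
    """Sort (name, category) pairs into the order HA attempts restarts.

    sorted() is stable, so VMs in the same category keep their input order.
    """
    return sorted(vms, key=lambda vm: RESTART_ORDER.index(vm[1]))


if __name__ == "__main__":
    vms = [("app01", "low"), ("db01", "high"), ("vsa01", "agent"), ("ft-sec01", "ft_secondary")]
    for name, category in restart_sequence(vms):
        print(name, category)
```

Keep in mind that this is the order in which restarts are attempted, not a guarantee about the order in which virtual machines come up.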

Now that we have briefly touched on it, we would also like to address “restart retries” and parallelization of restarts as that more or less dictates how long it could take before all virtual machines of a failed or isolated host are restarted.

Restart Retries

The number of retries is configurable as of vCenter 2.5 U4 with the advanced option “das.maxvmrestartcount”. The default value is 5. Note that the initial restart is included.

HA will try to start the virtual machine on one of your hosts in the affected cluster; if this is unsuccessful on that host, the restart count will be increased by 1. Before we go into the exact timeline, let it be clear that T0 is the point at which the master initiates the first restart attempt. This by itself could be 30 seconds after the virtual machine has failed. The elapsed time between the failure of the virtual machine and the restart, though, will depend on the scenario of the failure, which we will discuss in this chapter.

As said, the default number of restarts is 5. There are specific times associated with each of these attempts. The following bullet list will clarify this concept. The ‘m’ stands for “minutes” in this list.

- T0 – Initial Restart
- T2m – Restart retry 1
- T6m – Restart retry 2
- T14m – Restart retry 3
- T30m – Restart retry 4
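Looking at the offsets in this list, the waiting period between attempts doubles each time. The sketch below merely derives those gaps from the listed offsets; it does not claim anything about counts other than the default of 5, and the names are our own.

```python
# Offsets in minutes of the five default attempts, taken from the list above.
DEFAULT_RESTART_OFFSETS_MIN = (0, 2, 6, 14, 30)


def retry_gaps(offsets=DEFAULT_RESTART_OFFSETS_MIN):
    """Return the waiting time in minutes between consecutive attempts."""
    return [later - earlier for earlier, later in zip(offsets, offsets[1:])]


if __name__ == "__main__":
    print(retry_gaps())  # waits between attempts, in minutes
```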

Figure 24 - High Availability restart timeline

As clearly depicted in the diagram above, a successful power-on attempt could take up to ~30 minutes in the case where multiple power-on attempts are unsuccessful. This is, however, not exact science. For instance, there is a 2-minute waiting period between the initial restart and the first restart retry. HA will start the 2-minute wait as soon as it has detected that the initial attempt has failed. So, in reality, T2 could be T2 plus 8 seconds. Another important fact that we want to emphasize is that there is no coordination between masters, and so if multiple ones are involved in trying to restart the virtual machine, each will retain its own sequence. Multiple masters could attempt to restart a virtual machine. Although only one will succeed, it might change some of the timelines.

Let’s give an example to clarify the scenario in which a master fails during a restart sequence:

Cluster: 4 hosts (esxi01, esxi02, esxi03, esxi04)

Master: esxi01

The host “esxi02” is running a single virtual machine called “vm01” and it fails. The master, esxi01, will try to restart it but the attempt fails. It will try restarting “vm01” up to 5 times but, unfortunately, on the 4th try, the master also fails. An election occurs and “esxi03” becomes the new master. It will now initiate the restart of “vm01”, and if that restart fails it will retry it up to 4 times again, for a total of 5 including the initial restart.

Be aware, though, that a successful restart might never occur if the restart count is reached and all five restart attempts (the default value) were unsuccessful.

When it comes to restarts, one thing that is very important to realize is that HA will not issue more than 32 concurrent power-on tasks on a given host. To make that more clear, let’s use the example of a two host cluster: if a host fails which contained 33 virtual machines and all of these had the same restart priority, 32 power on attempts would be initiated. The 33rd power on attempt will only be initiated when one of those 32 attempts has completed, regardless of the success or failure of that attempt.

Now, here comes the gotcha. If there are 32 low-priority virtual machines to be powered on and a single high-priority virtual machine, the power on attempt for the low-priority virtual machines will not be issued until the power on attempt for the high-priority virtual machine has completed. Let it be absolutely clear that HA does not wait to restart the low-priority virtual machines until the high-priority virtual machines are started; it waits for the issued power on attempt to be reported as “completed”. In theory, this means that if the power on attempt fails, the low-priority virtual machines could be powered on before the high-priority virtual machine.

The restart priority however does guarantee that when a placement is done, the higher priority virtual machines get first right to any available resources.
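Both examples above can be reproduced with a deliberately simplified model of which power on attempts a host issues first. It only captures the two rules described in the text (at most 32 concurrent attempts, and lower priorities wait for higher-priority attempts to complete); it is not a description of the actual scheduler, and all names are hypothetical.

```python
MAX_CONCURRENT_POWER_ONS = 32  # per-host limit stated in the text
PRIORITY_RANK = {"high": 0, "medium": 1, "low": 2}


def first_wave(vms, limit=MAX_CONCURRENT_POWER_ONS):
    """Return the VM names whose power on attempts are issued immediately.

    vms is a list of (name, priority) pairs. Only the highest priority
    present is issued, capped at the concurrency limit; everything else
    waits until those attempts are reported as completed.
    """
    if not vms:
        return []
    top = min(PRIORITY_RANK[priority] for _, priority in vms)
    tier = [name for name, priority in vms if PRIORITY_RANK[priority] == top]
    return tier[:limit]


if __name__ == "__main__":
    same_priority = [(f"vm{i:02d}", "low") for i in range(1, 34)]
    print(len(first_wave(same_priority)))  # the 33rd VM has to wait

    mixed = [(f"vm{i:02d}", "low") for i in range(1, 33)] + [("vm-high", "high")]
    print(first_wave(mixed))  # the low-priority VMs wait for this attempt
```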

Basic design principle: Configuring restart priority of a virtual machine is not a guarantee that virtual machines will actually be restarted in this order. Ensure proper operational procedures are in place for restarting services or virtual machines in the appropriate order in the event of a failure.

Now that we know how virtual machine restart priority and restart retries are handled, it is time to look at the different scenarios.

- Failed host
  - Failure of a master
  - Failure of a slave
- Isolated host and response

Failed Host

When discussing a failed host scenario it is necessary to make a distinction between the failure of a master versus the failure of a slave. We want to emphasize this because the time it takes before a restart attempt is initiated differs between these two scenarios. Although the majority of you probably won’t notice the time difference, it is important to call out. Let’s start with the most common failure, that of a host failing, but note that failures generally occur infrequently. In most environments, hardware failures are very uncommon to begin with. Just in case it happens, it doesn’t hurt to understand the process and its associated timelines.

The Failure of a Slave

The failure of a slave host is a fairly complex scenario. Part of this complexity comes from the introduction of a new heartbeat mechanism. Actually, there are two different scenarios: one where heartbeat datastores are configured and one where heartbeat datastores are not configured. Keeping in mind that this is an actual failure of the host, the timeline is as follows:

- T0 – Slave failure.
- T3s – Master begins monitoring datastore heartbeats for 15 seconds.
- T10s – The host is declared unreachable and the master will ping the management network of the failed host. This is a continuous ping for 5 seconds.
- T15s – If no heartbeat datastores are configured, the host will be declared dead.
- T18s – If heartbeat datastores are configured, the host will be declared dead.

The master monitors the network heartbeats of a slave. When the slave fails, these heartbeats will no longer be received by the master. We have defined this as T0. After 3 seconds (T3s), the master will start monitoring for datastore heartbeats and it will do this for 15 seconds. On the 10th second (T10s), when no network or datastore heartbeats have been detected, the host will be declared as “unreachable”. The master will also start pinging the management network of the failed host at the 10th second and it will do so for 5 seconds. If no heartbeat datastores were configured, the host will be declared “dead” at the 15th second (T15s) and virtual machine restarts will be initiated by the master. If heartbeat datastores have been configured, the host will be declared dead at the 18th second (T18s) and restarts will be initiated. We realize that this can be confusing and hope the timeline depicted in the diagram below makes it easier to digest.
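The 15 and 18 second marks follow directly from the two monitoring windows: the 5 second ping that starts at T10s, and the 15 second datastore heartbeat window that starts at T3s. A short sketch of that derivation, using names of our own:

```python
def slave_failure_timeline(heartbeat_datastores_configured):
    """Return (seconds, event) tuples for the failure of a slave host."""
    datastore_window = (3, 15)  # (start, duration) of datastore heartbeat monitoring
    ping_window = (10, 5)       # (start, duration) of the management network ping
    if heartbeat_datastores_configured:
        dead_at = sum(datastore_window)  # wait for the datastore window to end
    else:
        dead_at = sum(ping_window)       # only the ping has to complete
    return [
        (0, "slave fails, network heartbeats stop"),
        (datastore_window[0], "master starts monitoring datastore heartbeats"),
        (ping_window[0], "host declared unreachable, master starts pinging"),
        (dead_at, "host declared dead, restarts initiated"),
    ]


if __name__ == "__main__":
    for configured in (False, True):
        print(configured, slave_failure_timeline(configured)[-1])
```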

Figure 25 - Restart timeline slave failure

The master filters the virtual machines it thinks failed before initiating restarts. The master uses the protected list for this; on-disk state could be obtained only by one master at a time since it required opening the protected list file in exclusive mode. If there is a network partition multiple masters could try to restart the same virtual machine as vCenter Server also provided the necessary details for a restart. As an example, it could happen that a master has locked a virtual machine’s home datastore and has access to the protected list while the other master is in contact with vCenter Server and as such is aware of the current desired protected state. In this scenario it could happen that the master which does not own the home datastore of the virtual machine will restart the virtual machine based on the information provided by vCenter Server.

This change in behavior was introduced to avoid the scenario where a restart of a virtual machine would fail due to insufficient resources in the partition which was responsible for the virtual machine. With this change, there is less chance of such a situation occurring as the master in the other partition would be using the information provided by vCenter Server to initiate the restart.

That leaves us with the question of what happens in the case of the failure of a master.

The Failure of a Master

In the case of a master failure, the process and the associated timeline are slightly different, the reason being that there needs to be a master before any restart can be initiated. This means that an election will need to take place amongst the slaves. The timeline is as follows:

- T0 – Master failure.
- T10s – Master election process initiated.
- T25s – New master elected and reads the protected list.
- T35s – New master initiates restarts for all virtual machines on the protected list which are not running.

Slaves receive network heartbeats from their master. If the master fails, let’s define this as T0 (T zero), the slaves detect this when the network heartbeats cease to be received. As every cluster needs a master, the slaves will initiate an election at T10s. The election process takes 15s to complete, which brings us to T25s. At T25s, the new master reads the protected list. This list contains all the virtual machines which are protected by HA. At T35s, the master initiates the restart of all virtual machines that are protected but not currently running. The timeline depicted in the diagram below hopefully clarifies the process.

Figure 26 - Restart timeline master failure

Besides the failure of a host, there is another reason for restarting virtual machines: an isolation event.

Isolation Response and Detection

Before we discuss the timeline and the process around the restart of virtual machines after an isolation event, we will discuss Isolation Response and Isolation Detection. One of the first decisions that will need to be made when configuring HA is the “Isolation Response”.

Isolation Response

The Isolation Response refers to the action that HA takes for its virtual machines when the host has lost its connection with the network and the remaining nodes in the cluster. This does not necessarily mean that the whole network is down; it could just be the management network ports of this specific host. Today there are three isolation responses: “Power off”, “Leave powered on” and “Shut down”. This isolation response answers the question, “what should a host do with the virtual machines it manages when it detects that it is isolated from the network?” Let’s discuss these three options more in-depth:

- Power off – When isolation occurs, all virtual machines are powered off. It is a hard stop, or to put it bluntly, the “virtual” power cable of the virtual machine will be pulled out!
- Shut down – When isolation occurs, all virtual machines running on the host will be shut down using a guest-initiated shutdown through VMware Tools. If this is not successful within 5 minutes, a “power off” will be executed. This time out value can be adjusted by setting the advanced option das.isolationShutdownTimeout. If VMware Tools is not installed, a “power off” will be initiated immediately.
- Leave powered on – When isolation occurs on the host, the state of the virtual machines remains unchanged.
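The three responses, including the fallback behavior of “Shut down”, can be summarized in a small decision function. This is a sketch of the description above and nothing more; the function name, the string labels and the assumption that the time out is expressed in seconds are our own.

```python
def isolation_actions(response, vmware_tools_installed=True, shutdown_timeout_s=300):
    """Return the steps taken for one virtual machine on an isolated host.

    shutdown_timeout_s models das.isolationShutdownTimeout; the default of
    300 reflects the 5 minutes mentioned in the text (seconds are assumed).
    """
    if response == "Leave powered on":
        return []
    if response == "Power off":
        return ["power off"]
    if response == "Shut down":
        if not vmware_tools_installed:
            # Without VMware Tools there is no guest shutdown to attempt.
            return ["power off"]
        return [
            "guest shutdown through VMware Tools",
            f"power off if not shut down within {shutdown_timeout_s} s",
        ]
    raise ValueError(f"unknown isolation response: {response!r}")


if __name__ == "__main__":
    for response in ("Power off", "Shut down", "Leave powered on"):
        print(response, "->", isolation_actions(response))
```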

This setting can be changed on the cluster settings under virtual machine options.

Figure 27 - Cluster default settings

The default setting for the isolation response has changed multiple times over the last couple of years and this has caused some confusion.

- Up to ESXi 3.5 U2 / vCenter 2.5 U2 the default isolation response was “Power off”.
- With ESXi 3.5 U3 / vCenter 2.5 U3 this was changed to “Leave powered on”.
- With vSphere 4.0 it was changed to “Shut down”.
- With vSphere 5.0 it has been changed to “Leave powered on”.

Keep in mind that these changes are only applicable to newly created clusters. When creating a new cluster, it may be required to change the default isolation response based on the configuration of existing clusters and/or your customer’s requirements, constraints and expectations. When upgrading an existing cluster, it might be wise to apply the latest default values. You might wonder why the default has changed once again. There was a lot of feedback from customers that “Leave powered on” was the desired default value.

Basic design principle: Before upgrading an environment to later versions, ensure you validate the best practices and default settings. Document them, including justification, to ensure all people involved understand your reasons.

The question remains, which setting should be used? The obvious answer applies here; it depends. We prefer “Leave powered on” because it eliminates the chances of having a false positive and its associated downtime. One of the problems that people have experienced in the past is that HA triggered its isolation response when the full management network went down, resulting in the power off (or shutdown) of every single virtual machine and none being restarted. This problem has been mitigated. HA will validate whether virtual machine restarts can be attempted, as there is no reason to incur any downtime unless absolutely necessary. It does this by validating that a master owns the datastore the virtual machine is stored on. Of course, the isolated host can only validate this if it has access to the datastores. In a converged network environment with iSCSI storage, for instance, it would be impossible to validate this during a full isolation as the validation would fail due to the inaccessible datastore from the perspective of the isolated host.

We feel that changing the isolation response is most useful in environments where a failure of the management network is likely correlated with a failure of the virtual machine network(s). If the failure of the management network won’t likely correspond with the failure of the virtual machine networks, isolation response would cause unnecessary downtime as the virtual machines can continue to run without management network connectivity to the host.

A second use for power off/shutdown is in scenarios where the virtual machine retains access to the virtual machine network but loses access to its storage. Leaving the virtual machine powered on could result in two virtual machines on the network with the same IP address.

It is still difficult to decide which isolation response should be used. The following table was created to provide some more guidelines.


| Likelihood that host will retain access to VM datastore | Likelihood VMs will retain access to VM network | Recommended Isolation Policy | Rationale |
| --- | --- | --- | --- |
| Likely | Likely | Leave Powered On | Virtual machine is running fine, no reason to power it off |
| Likely | Unlikely | Either Leave Powered On or Shutdown | Choose shutdown to allow HA to restart virtual machines on hosts that are not isolated and hence are likely to have access to storage |
| Unlikely | Likely | Power Off | Use Power Off to avoid having two instances of the same virtual machine on the virtual machine network |
| Unlikely | Unlikely | Leave Powered On or Power Off | Leave Powered On if the virtual machine can recover from the network/datastore outage if it is not restarted because of the isolation, and Power Off if it likely can’t |
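The guidance in the table above can also be read as a simple lookup keyed on the two likelihoods. The sketch below is only a restatement of the table in Python; the function name `recommended_policies` and the boolean encoding (True for “Likely”) are assumptions made for the example.

```python
# Keys: (host likely retains datastore access, VMs likely retain VM network access)
RECOMMENDATIONS = {
    (True, True): ("Leave Powered On",),
    (True, False): ("Leave Powered On", "Shutdown"),
    (False, True): ("Power Off",),
    (False, False): ("Leave Powered On", "Power Off"),
}


def recommended_policies(datastore_access_likely, vm_network_access_likely):
    """Return the isolation policies the table suggests for this environment."""
    return RECOMMENDATIONS[(datastore_access_likely, vm_network_access_likely)]
```

Where the table lists two options, the rationale column still has to be weighed by hand; the lookup only narrows the choice.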

The question that we haven’t answered yet is how HA knows which virtual machines have been powered off due to the triggered isolation response and why the isolation response is more reliable than with previous versions of HA. Previously, HA did not care and would always try to restart the virtual machines according to the last known state of the host. That is no longer the case. Before the isolation response is triggered, the isolated host will verify whether a master is responsible for the virtual machine.

As mentioned earlier, it does this by validating if a master owns the home datastore of the virtual machine. When isolation response is triggered, the isolated host removes the virtual machines which are powered off or shut down from the “poweron” file. The master will recognize that the virtual machines have disappeared and initiate a restart. On top of that, when the isolation response is triggered, it will create a per-virtual machine file under a “poweredoff” directory which indicates for the master that this virtual machine was powered down as a result of a triggered isolation response. This information will be read by the master node when it initiates the restart attempt in order to guarantee that only virtual machines that were powered off / shut down by HA will be restarted by HA.

This is, however, only one part of the increased reliability of HA. Reliability has also been improved with respect to “isolation detection,” which will be described in the following section.

Isolation Detection


We have explained what the options are to respond to an isolation event and what happens when the selected response is triggered. However, we have not extensively discussed how isolation is detected. The mechanism is fairly straightforward and works with heartbeats, as earlier explained. There are, however, two scenarios again, and the process and associated timelines differ for each of them:

- Isolation of a slave
- Isolation of a master

Before we explain the differences in process between both scenarios, we want to make sure it is clear that a change in state will result in the isolation response not being triggered in either scenario. Meaning that if a single ping is successful or the host observes election traffic and is elected a master or slave, the isolation response will not be triggered, which is exactly what you want as avoiding downtime is at least as important as recovering from downtime. When a host has declared itself isolated and observes election traffic it will declare itself no longer isolated.

Isolation of a Slave

HA triggers a master election process before it will declare a host is isolated. In the below timeline, “s” refers to seconds.

- T0 – Isolation of the host (slave)
- T10s – Slave enters “election state”
- T25s – Slave elects itself as master
- T25s – Slave pings “isolation addresses”
- T30s – Slave declares itself isolated
- T60s – Slave “triggers” isolation response

When the isolation response is triggered HA creates a “power-off” file for any virtual machine HA powers off whose home datastore is accessible. Next it powers off the virtual machine (or shuts it down) and updates the host’s poweron file. The power-off file is used to record that HA powered off the virtual machine and so HA should restart it. These power-off files are deleted when a virtual machine is powered back on or HA is disabled.

After the completion of this sequence, the master will learn the slave was isolated through the “poweron” file as mentioned earlier, and will restart virtual machines based on the information provided by the slave.


Figure 28 - Isolation of a slave timeline

Isolation of a Master

In the case of the isolation of a master, this timeline is a bit less complicated because there is no need to go through an election process. In this timeline, “s” refers to seconds.

- T0 – Isolation of the host (master)
- T0 – Master pings “isolation addresses”
- T5s – Master declares itself isolated
- T35s – Master “triggers” isolation response

Additional Checks

Before a host declares itself isolated, it will ping the default isolation address which is the gateway specified for the management network, and will continue to ping the address until it becomes unisolated. HA gives you the option to define one or multiple additional isolation addresses using an advanced setting. This advanced setting is called das.isolationaddress and could be used to reduce the chances of having a false positive. We recommend setting an additional isolation address. If a secondary management network is configured, this additional address should be part of the same network as the secondary management network. If required, you can configure up to 10 additional isolation addresses. A secondary management network will more than likely be on a different subnet and it is recommended to specify an additional isolation address which is part of that subnet.


Figure 29 - Isolation Address

Selecting an Additional Isolation Address

A question asked by many people is which address should be specified for this additional isolation verification. We generally recommend an isolation address close to the hosts to avoid too many network hops and an address that would correlate with the liveness of the virtual machine network. In many cases, the most logical choice is the physical switch to which the host is directly connected. Basically, use the gateway for whatever subnet your management network is on. Another usual suspect would be a router or any other reliable and pingable device on the same subnet. However, when you are using IP-based shared storage like NFS or iSCSI, the IP address of the storage device can also be a good choice.

Basic design principle: Select a reliable secondary isolation address. Try to minimize the number of “hops” between the host and this address.

Isolation Policy Delay

For those who want to increase the time it takes before HA executes the isolation response an advanced setting is available. This setting is called “das.config.fdm.isolationPolicyDelaySec” and allows changing the number of seconds to wait before the isolation policy is executed. The minimum value is 30. If set to a value less than 30, the delay will be 30 seconds. We do not recommend changing this advanced setting unless there is a specific requirement to do so. In almost all scenarios 30 seconds should suffice.
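To make the arithmetic behind the two timelines and the delay setting concrete, here is a small Python sketch. It assumes the delay is counted from the moment the host declares itself isolated, which matches the timelines earlier in this section (T30s to T60s for a slave, T5s to T35s for a master); the function and constant names are invented for the example.

```python
MIN_ISOLATION_POLICY_DELAY_S = 30

# Seconds after isolation at which the host declares itself isolated,
# taken from the slave and master timelines in this section.
DECLARED_ISOLATED_AT_S = {"slave": 30, "master": 5}


def effective_delay(configured_s=None):
    """Values below the minimum (or no value at all) result in 30 seconds."""
    if configured_s is None:
        return MIN_ISOLATION_POLICY_DELAY_S
    return max(configured_s, MIN_ISOLATION_POLICY_DELAY_S)


def response_triggered_at(role, configured_delay_s=None):
    """Seconds after isolation at which the isolation response is triggered."""
    return DECLARED_ISOLATED_AT_S[role] + effective_delay(configured_delay_s)
```

With the defaults this reproduces T60s for a slave and T35s for a master. Configuring 10 seconds changes nothing, while configuring 60 seconds would move the slave's response to T90s under the stated assumption.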

Restarting Virtual Machines

The most important procedure has not yet been explained: restarting virtual machines. We have dedicated a full section to this concept.


We have explained the difference in behavior from a timing perspective for restarting virtual machines in the case of both master node and slave node failures. For now, let’s assume that a slave node has failed. When the master node declares the slave node as Partitioned or Isolated, it determines which virtual machines were running on it using the information it previously read from the host’s “poweron” file. These files are asynchronously read approximately every 30s. If the host was not Partitioned or Isolated before the failure, the master uses cached data to determine the virtual machines that were last running on the host before the failure occurred.

Before it will initiate the restart attempts, though, the master will first validate that the virtual machine should be restarted. This validation uses the protection information vCenter Server provides to each master, or if the master is not in contact with vCenter Server, the information saved in the protectedlist files. If the master is not in contact with vCenter Server or has not locked the file, the virtual machine is filtered out. At this point, all virtual machines having a restart priority of “disabled” are also filtered out.

Now that HA knows which virtual machines it should restart, it is time to decide where the virtual machines are placed. HA will take multiple things into account:

- CPU and memory reservation, including the memory overhead of the virtual machine
- Unreserved capacity of the hosts in the cluster
- Restart priority of the virtual machine relative to the other virtual machines that need to be restarted
- Virtual-machine-to-host compatibility set
- The number of dvPorts required by a virtual machine and the number available on the candidate hosts
- The maximum number of vCPUs and virtual machines that can be run on a given host
- Restart latency
- Whether the active hosts are running the required number of agent virtual machines

Restart latency refers to the amount of time it takes to initiate virtual machine restarts. This means that virtual machine restarts will be distributed by the master across multiple hosts to avoid a boot storm, and thus a delay, on a single host.

If a placement is found, the master will send each target host the set of virtual machines it needs to restart. If this list exceeds 32 virtual machines, HA will limit the number of concurrent power on attempts to 32. If a virtual machine successfully powers on, the node on which the virtual machine was powered on will inform the master of the change in power state. The master will then remove the virtual machine from the restart list.
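The limit of 32 concurrent power on attempts can be pictured with a very simplified Python sketch. It splits a host's restart list into fixed batches, which is an assumption made purely for illustration: the text only states the limit, not how HA schedules the next attempt once one completes. The function name `power_on_batches` is hypothetical.

```python
MAX_CONCURRENT_POWER_ONS = 32


def power_on_batches(vms, limit=MAX_CONCURRENT_POWER_ONS):
    """Split the restart list for one host into groups of at most `limit` VMs."""
    return [vms[i:i + limit] for i in range(0, len(vms), limit)]
```

A host that receives 70 virtual machines to restart would, in this simplified model, work through groups of 32, 32 and 6.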

If a placement cannot be found, the master will place the virtual machine on a “pending placement list” and will retry placement of the virtual machine when one of the following conditions changes:


- A new virtual-machine-to-host compatibility list is provided by vCenter.
- A host reports that its unreserved capacity has increased.
- A host (re)joins the cluster (for instance, when a host is taken out of maintenance mode, a host is added to a cluster, etc.).
- A new failure is detected and virtual machines have to be failed over.
- A failure occurred when failing over a virtual machine.

But what about DRS? Wouldn’t DRS be able to help during the placement of virtual machines when all else fails? It does. The master node will report to vCenter the set of virtual machines that were not placed due to insufficient resources, as is the case today. If DRS is enabled, this information will be used in an attempt to have DRS make capacity available.

Component Protection

In vSphere 6.0 a new feature called VM Component Protection is introduced as part of vSphere HA. VM Component Protection (VMCP) in vSphere 6.0 allows you to protect virtual machines against the failure of your storage system. There are two types of failures VMCP will respond to: Permanent Device Loss (PDL) and All Paths Down (APD). Before we look at some of the details, we want to point out that enabling VMCP is extremely easy. It can be enabled by a single tick box as shown in the screenshot below.

Figure 30 - Virtual Machine Component Protection


As stated there are two scenarios HA can respond to, PDL and APD. Let’s look at those two scenarios a bit closer. With vSphere 5.0 a feature was introduced as an advanced option that would allow vSphere HA to restart VMs impacted by a PDL condition.

A PDL condition is a condition that is communicated by the array controller to ESXi via a SCSI sense code. This condition indicates that a device (LUN) has become unavailable and is likely permanently unavailable. An example scenario in which this condition would be communicated by the array would be when a LUN is set offline. This condition is used during a failure scenario to ensure ESXi takes appropriate action when access to a LUN is revoked. It should be noted that when a full storage failure occurs it is impossible to generate the PDL condition as there is no communication possible between the array and the ESXi host. This state will be identified by the ESXi host as an APD condition.

Although the functionality itself worked as advertised, enabling and managing it was cumbersome and error prone. It was required to set the option “disk.terminateVMOnPDLDefault” manually. With vSphere 6.0 a simple option in the Web Client is introduced which allows you to specify what the response should be to a PDL sense code.

Figure 31 - Enabling Virtual Machine Component Protection

The two options provided are “Issue Events” and “Power off and restart VMs”. Note that “Power off and restart VMs” does exactly that: your VM process is killed and the VM is restarted on a host which still has access to the storage device.

Until now it was not possible for vSphere to respond to an APD scenario. APD is the situation where the storage device is inaccessible but for unknown reasons. In most cases where this occurs it is typically related to a storage network problem. With vSphere 5.1 changes were introduced to the way APD scenarios were handled by the hypervisor. This mechanism is leveraged by HA to allow for a response.

When an APD occurs a timer starts. After 140 seconds the APD is declared and the device is marked as APD timeout. When the 140 seconds have passed HA will start counting. The HA timeout is 3 minutes by default as shown in Figure 24. When the 3 minutes have passed HA will take the action defined. There are again two options: “Issue Events” and “Power off and restart VMs”.

You can also specify how aggressively HA needs to try to restart VMs that are impacted by an APD. Note that aggressive/conservative refers to the likelihood of HA being able to restart VMs. When set to “conservative” HA will only restart the VM that is impacted by the APD if it knows another host can restart it. In the case of “aggressive” HA will try to restart the VM even if it doesn’t know the state of the other hosts, which could lead to a situation where your VM is not restarted as there is no host that has access to the datastore the VM is located on.

It is also good to know that if the APD is lifted and access to the storage is restored during the approximately 5 minutes and 20 seconds it would take before the VM restart is initiated, HA will not do anything unless you explicitly configure it to do so. This is where the “Response for APD recovery after APD timeout” comes into play. If there is a desire to do so you can restart the VM even when the host has recovered from the APD scenario, during the 3 minute (default value) grace period.
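The 5 minutes and 20 seconds mentioned above is simply the 140 second APD timeout plus the 3 minute HA delay. The following Python sketch works through that arithmetic and the recovery case. It assumes the APD response is set to “Power off and restart VMs”; the function names and the returned labels are invented for the example and do not correspond to any vSphere API.

```python
APD_TIMEOUT_S = 140   # time until the device is marked as APD timeout
HA_DELAY_S = 180      # default HA delay of 3 minutes


def seconds_until_restart(apd_timeout_s=APD_TIMEOUT_S, ha_delay_s=HA_DELAY_S):
    """Total time from the start of the APD until HA acts."""
    return apd_timeout_s + ha_delay_s


def vmcp_outcome(apd_cleared_at_s=None, act_on_recovery=False):
    """What happens to an impacted VM, given when (if ever) the APD clears.

    act_on_recovery models the "Response for APD recovery after APD timeout"
    option being configured to restart the VM.
    """
    deadline = seconds_until_restart()
    if apd_cleared_at_s is None or apd_cleared_at_s >= deadline:
        return "power off and restart"
    if apd_cleared_at_s < APD_TIMEOUT_S:
        return "no action"  # the APD timeout was never declared
    # Recovered during the HA grace period, after the APD timeout.
    return "restart VM" if act_on_recovery else "no action"
```

Here `divmod(seconds_until_restart(), 60)` gives 5 minutes and 20 seconds, and a recovery at 200 seconds leads to no action unless the recovery response has been configured.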

Basic design principle: Without access to shared storage a virtual machine becomes useless. It is highly recommended to configure VMCP to act on a PDL and APD scenario. We recommend setting both to “Power off and restart VMs” but leaving the “Response for APD recovery after APD timeout” disabled so that VMs are not rebooted unnecessarily.

vSphere HA nuggets

Prior to vSphere 5.5, HA did nothing with VM to VM Affinity or Anti Affinity rules. Typically for people using “affinity” rules this was not an issue, but those using “anti-affinity” rules did see this as an issue. They created these rules to ensure specific virtual machines would never be running on the same host, but vSphere HA would simply ignore the rule when a failure had occurred and just place the VMs “randomly”. With vSphere 5.5 this has changed! vSphere HA is now “anti affinity” aware. In order to ensure anti-affinity rules are respected you can set an advanced setting or configure it in the vSphere Web Client as of vSphere 6.0.

das.respectVmVmAntiAffinityRules - Values: "false" (default) and "true"

Now note that this also means that when you configure anti-affinity rules and have this advanced setting configured to “true” and somehow there aren’t sufficient hosts available to respect these rules… then the rules will be respected and it could result in HA not restarting a VM. Make sure to understand this potential impact when configuring this setting and configuring these rules.


With vSphere 6.0 support for respecting VM to Host affinity rules has been included. This is enabled through the use of an advanced setting called “das.respectVmHostSoftAffinityRules”. When the advanced setting “das.respectVmHostSoftAffinityRules” is configured vSphere HA will try to respect the rules when it can. If there are any hosts in the cluster which belong to the same VM-Host group then HA will restart the respective VM on that host. As this is a “should rule” HA has the ability to ignore the rule when needed. If there is a scenario where none of the hosts in the VM-Host should rule is available HA will restart the VM on any other host in the cluster.

das.respectVmHostSoftAffinityRules - Values: "false" (default) and "true"
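The difference between the two settings is that the anti-affinity rule is strictly honored (which can leave a VM without a restart target), while the VM-Host “should rule” is only a preference. The Python sketch below illustrates that difference under heavy simplification. It is not HA's placement algorithm, and `pick_restart_host` and its parameters are hypothetical names for this example only.

```python
def pick_restart_host(candidates, anti_affine_hosts=(), preferred_hosts=(),
                      respect_anti_affinity=False, respect_soft_affinity=False):
    """Pick a restart host, or return None when no host is allowed.

    anti_affine_hosts: hosts already running a VM this VM must be kept apart from.
    preferred_hosts:   hosts in the VM-Host "should" group of this VM.
    """
    # Anti-affinity is a hard filter when the setting is "true".
    allowed = [h for h in candidates
               if not (respect_anti_affinity and h in anti_affine_hosts)]
    if not allowed:
        return None  # rules are respected, so the VM is not restarted
    # The VM-Host should rule is a preference that can be ignored.
    if respect_soft_affinity:
        preferred = [h for h in allowed if h in preferred_hosts]
        if preferred:
            return preferred[0]
    return allowed[0]
```

With anti-affinity respected and every remaining host already running a conflicting VM the function returns None, whereas an unavailable preferred host simply results in another host being chosen.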



Virtual SAN and Virtual Volumes specifics

In the last couple of sections we have discussed the ins and outs of HA, all of it based on VMFS-based or NFS-based storage. With the introduction of Virtual SAN and Virtual Volumes also come changes to some of the discussed concepts.

HA and Virtual SAN

Virtual SAN is VMware’s approach to Software Defined Storage. We are not going to explain the ins and outs of Virtual SAN, but do want to provide a basic understanding for those who have never done anything with it. Virtual SAN leverages host local storage and creates a shared datastore out of it.

Figure 32 - Virtual SAN Cluster


Virtual SAN requires a minimum of 3 hosts and each of those 3 hosts will need to have 1 SSD for caching and 1 capacity device (can be SSD or HDD). Only the capacity devices will contribute to the available capacity of the datastore. If you have 1 TB worth of capacity devices per host then with three hosts the total size of your datastore will be 3 TB.

That said, with Virtual SAN 6.1 VMware introduced a "2-node" option. This 2-node option is actually 2 regular VSAN nodes with a third "witness" node.

The big differentiator between most storage systems and Virtual SAN is that availability of virtual machines is defined on a per virtual disk or per virtual machine basis. This is called “Failures To Tolerate” and can be configured to any value between 0 (zero) and 3. When configured to 0 the virtual machine will have only 1 copy of its virtual disks, which means that if the host where the virtual disks are stored fails, the virtual machine is lost. As such all virtual machines are deployed by default with Failures To Tolerate (FTT) set to 1. A virtual disk is what VSAN refers to as an object. An object, when FTT is configured as 1 or higher, has multiple components. In the diagram below we demonstrate the FTT=1 scenario, and the virtual disk in this case has 2 "data components" and a "witness component". The witness is used as a "quorum" mechanism.

Figure 33 - Virtual SAN Object model
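The capacity and FTT numbers above follow a simple pattern, sketched here in Python. The datastore size and the number of data copies come straight from the text (3 hosts with 1 TB each give 3 TB; FTT=0 means 1 copy, FTT=1 means 2 data components). The minimum host count of 2 x FTT + 1 is an assumption added for this example, based on mirrored copies needing a majority to stay available; the function names are hypothetical.

```python
def datastore_capacity_tb(hosts, capacity_per_host_tb):
    """Raw datastore size: only capacity devices contribute."""
    return hosts * capacity_per_host_tb


def data_copies(ftt):
    """Number of data components (copies) of a virtual disk for a given FTT."""
    if not 0 <= ftt <= 3:
        raise ValueError("FTT must be between 0 and 3")
    return ftt + 1


def min_hosts_for(ftt):
    """Assumed minimum number of hosts for mirroring with this FTT."""
    data_copies(ftt)  # reuse the range check
    return 2 * ftt + 1
```

For FTT=1 this yields 2 data copies spread over at least 3 hosts, which lines up with the 3 host minimum and the witness component described above.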


As the diagram above depicts, a virtual machine can be running on the first host while its storage components are on the remaining hosts in the cluster. As you can imagine from an HA point of view this changes things as access to the network is not only critical for HA to function correctly but also for Virtual SAN. When it comes to networking note that when Virtual SAN is configured in a cluster HA will use the same network for its communications (heartbeating etc). On top of that, it is good to know that VMware highly recommends 10GbE to be used for Virtual SAN.

Basic design principle: 10GbE is highly recommended for Virtual SAN. As vSphere HA also leverages the Virtual SAN network and availability of VMs is dependent on network connectivity, ensure that at a minimum two 10GbE ports are used and two physical switches for resiliency.

The reason that HA uses the same network as Virtual SAN is simple: it is to avoid network partition scenarios where HA communications are separated from Virtual SAN and the state of the cluster is unclear. Note that you will need to ensure that there is a pingable isolation address on the Virtual SAN network and this isolation address will need to be configured as such through the use of the advanced setting “das.isolationAddress0”. We also recommend disabling the use of the default isolation address through the advanced setting “das.useDefaultIsolationAddress” (set to false).
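As an illustration of how these two advanced settings fit together, the Python sketch below builds the key/value pairs for a cluster. The helper `isolation_options` is hypothetical, the address 192.0.2.1 is a placeholder from the documentation range, and numbering further addresses as das.isolationAddress1 and up is an assumption; the text only names das.isolationAddress0 and the limit of 10 additional addresses.

```python
MAX_ADDITIONAL_ADDRESSES = 10


def isolation_options(addresses, use_default_gateway=True):
    """Build HA advanced options for additional isolation addresses."""
    if len(addresses) > MAX_ADDITIONAL_ADDRESSES:
        raise ValueError("at most 10 additional isolation addresses are supported")
    options = {f"das.isolationAddress{i}": addr
               for i, addr in enumerate(addresses)}
    options["das.useDefaultIsolationAddress"] = (
        "true" if use_default_gateway else "false")
    return options


# Virtual SAN style configuration: one pingable address on the VSAN network,
# default (management gateway) isolation address disabled.
vsan_options = isolation_options(["192.0.2.1"], use_default_gateway=False)
```

The resulting dictionary contains `das.isolationAddress0` set to the placeholder address and `das.useDefaultIsolationAddress` set to "false", matching the recommendation above.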

When an isolation does occur the isolation response is triggered as explained in earlier chapters. For Virtual SAN the recommendation is simple: configure the isolation response to “Power Off, then failover”. This is the safest option. Virtual SAN can be compared to the “converged network with IP based storage” example we provided. It is very easy to reach a situation where a host is isolated and all virtual machines remain running, but are also restarted on another host because the connection to the Virtual SAN datastore is lost.

Basic design principle: Configure your Isolation Address and your Isolation Policy accordingly. We recommend selecting “power off” as the Isolation Policy and a reliable pingable device as the isolation address.

What about things like heartbeat datastores and the folder structure that exists on a VMFS datastore, has any of that changed with Virtual SAN? Yes it has. First of all, in a “Virtual SAN” only environment the concept of Heartbeat Datastores is not used at all. The reason for this is straightforward: as HA and Virtual SAN share the same network it is safe to assume that when the HA heartbeat is lost because of a network failure so is access to the Virtual SAN datastore. Only in an environment where there is also traditional storage will heartbeat datastores be configured, leveraging those traditional datastores as a heartbeat datastore. Note that we do not feel there is a reason to introduce traditional storage just to provide HA this functionality; HA and Virtual SAN work perfectly fine without heartbeat datastores.


Normally HA metadata is stored in the root of the datastore; for Virtual SAN this is different as the metadata is stored in the VM’s namespace object. The protected list is held in memory and updated automatically when VMs are powered on or off.

Now you may wonder, what happens when there is an isolation? How does HA know where to start the VM that is impacted? Let’s take a look at a partition scenario.

Figure 34 - VSAN Partition scenario

In this scenario a network problem has caused a cluster partition. Where a VM is restarted is determined by which partition owns the virtual machine files. Within a VSAN cluster this is fairly straightforward. There are two partitions, one of which is running the VM with its VMDK and the other partition has a VMDK replica and a witness. Guess what happens? Right, VSAN uses the witness to see which partition has quorum and based on that result, one of the two partitions will win. In this case, Partition 2 has more than 50% of the components of this object and as such is the winner. This means that the VM will be restarted on either “esxi-03” or “esxi-04” by vSphere HA. Note that the VM in Partition 1 will be powered off only if you have configured the isolation response to do so. We would like to stress that this is highly recommended! (Isolation response –> power off)
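The “more than 50% of the components” rule can be expressed in a few lines of Python. This is only a sketch of the majority check described above, with each component counted equally (an assumption for this example); `winning_partition` is a made-up name.

```python
def winning_partition(components_by_partition):
    """Return the partition holding more than 50% of an object's components.

    components_by_partition maps a partition name to the number of
    components of the object it can reach. Returns None if no partition
    has a majority.
    """
    total = sum(components_by_partition.values())
    for name, count in components_by_partition.items():
        if count * 2 > total:
            return name
    return None


# Scenario from the text: Partition 1 holds the VMDK, Partition 2 holds
# the VMDK replica and the witness.
winner = winning_partition({"Partition 1": 1, "Partition 2": 2})
```

In the scenario from the text `winner` is "Partition 2", since it holds 2 of the 3 components.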

HA and Virtual Volumes

Let us start with first describing what Virtual Volumes is and what value it brings for an administrator. Virtual Volumes was developed to make your life (vSphere admin) and that of the storage administrator easier. This is done by providing a framework that enables the vSphere administrator to assign policies to virtual machines or virtual disks. In these policies capabilities of the storage array can be defined. These capabilities can be things like snapshotting, deduplication, RAID level, thin/thick provisioning etc. What is offered to the vSphere administrator is up to the storage administrator, and of course up to what the storage system can offer to begin with. When a virtual machine is deployed and a policy is assigned then the storage system will enable certain functionality of the array based on what was specified in the policy. So there is no longer a need to assign capabilities to a LUN which holds many VMs; instead you get per-VM or even per-VMDK level control. So how does this work? Well, let’s take a look at an architectural diagram first.


Figure 35 - Virtual Volumes Architecture

The diagram shows a couple of components which are important in the VVol architecture. Let’s list them out:

- Protocol Endpoints aka PE
- Virtual Datastore and a Storage Container
- Vendor Provider / VASA
- Policies
- Virtual Volumes

Let’s take a look at all of these in the above order. Protocol Endpoints, what are they?

Protocol Endpoints are literally the access point to your storage system. All IO to virtual volumes is proxied through a Protocol Endpoint and you can have 1 or more of these per storage system, if your storage system supports having multiple of course. (Implementations of different vendors will vary.) PEs are compatible with different protocols (FC, FCoE, iSCSI, NFS) and if you ask me that whole protocol discussion will come to an end with Virtual Volumes. You could see a Protocol Endpoint as a “mount point” or a device, and yes they will count towards your maximum number of devices per host (256). (Virtual Volumes itself won’t count towards that!)

Next up is the Storage Container. This is the place where you store your virtual machines, or better said where your virtual volumes end up. The Storage Container is a storage system logical construct and is represented within vSphere as a “virtual datastore”. You need at least 1 per storage system, but you can have many when desired. To this Storage Container you can apply capabilities. So if you like your virtual volumes to be able to use array based snapshots then the storage administrator will need to assign that capability to the storage container. Note that a storage administrator can grow a storage container without even informing you. A storage container isn’t formatted with VMFS or anything like that, so you don’t need to increase the volume in order to use the space.

But how does vSphere know which container is capable of doing what? In order to discover a storage container and its capabilities we need to be able to talk to the storage system first. This is done through the vSphere APIs for Storage Awareness. You simply point vSphere to the Vendor Provider and the vendor provider will report to vSphere what’s available; this includes both the storage containers as well as the capabilities they possess. Note that a single Vendor Provider can be managing multiple storage systems, which in turn can have multiple storage containers with many capabilities. These vendor providers can also come in different flavours: for some storage systems it is part of their software but for others it will come as a virtual appliance that sits on top of vSphere.

Now that vSphere knows which systems there are and what containers are available with which capabilities, you can start creating policies. These policies can be a combination of capabilities and will ultimately be assigned to virtual machines or even virtual disks. You can imagine that in some cases you would like Quality of Service enabled to ensure performance for a VM while in other cases it isn’t as relevant but you need to have a snapshot every hour. All of this is enabled through these policies. No longer will you be maintaining that spreadsheet with all your LUNs and which data services were enabled and what not; you simply assign a policy. (Yes, a proper naming scheme will be helpful when defining policies.) When requirements change for a VM you don’t move the VM around; you change the policy and the storage system will do what is required in order to make the VM (and its disks) compliant again with the policy. Not the VM really, but the Virtual Volumes.

Okay, those are the basics. Now what about Virtual Volumes and vSphere HA? What changes when you are running Virtual Volumes, and what do you need to keep in mind when running Virtual Volumes when it comes to HA?

First of all, let me mention this: in some cases storage vendors have designed a solution where the "vendor provider" isn't designed in an HA fashion (VMware allows for Active/Active, Active/Standby or just "Active" as in a single instance). Make sure to validate what kind of implementation your storage vendor has, as the Vendor Provider needs to be available when powering on VMs. The following quote explains why:

When a Virtual Volume is created, it is not immediately accessible for IO. To Access Virtual Volumes, vSphere needs to issue a “Bind” operation to a VASA Provider (VP), which creates IO access point for a Virtual Volume on a Protocol Endpoint (PE) chosen by a VP. A single PE can be the IO access point for multiple Virtual Volumes. “Unbind” Operation will remove this IO access point for a given Virtual Volume.

That is the "Virtual Volumes" implementation aspect, but of course things have also changed from a vSphere HA point of view. No longer do we have VMFS or NFS datastores to store files on or use for heartbeating. What changes from that perspective? First of all a VM is carved up into different Virtual Volumes:

- VM Configuration
- Virtual Machine Disks
- Swap File
- Snapshot (if there are any)

Besides these different types of objects, when vSphere HA is enabled there also is a volume used by vSphere HA and this volume will contain all the metadata which is normally stored under "/<root of datastore>/.vSphere-HA/<cluster-specific-directory>/" on regular VMFS. For each Fault Domain a separate folder will be created in this VVol.

All VM related HA files which normally would be under the VM folder, like for instance the power-off file, are now stored in the VM Configuration VVol object. Conceptually speaking this is similar to regular VMFS; implementation-wise, however, it is completely different.

Another thing that changes with VVols is Heartbeat Datastores.

BEING WORKED ON - EARLY DRAFT


Adding Resiliency to HA (Network Redundancy)

In the previous chapter we extensively covered both Isolation Detection, which triggers the selected Isolation Response, and the impact of a false positive. The Isolation Response enables HA to restart virtual machines when “Power off” or “Shut down” has been selected and the host becomes isolated from the network. However, this also means that it is possible that, without proper redundancy, the Isolation Response may be unnecessarily triggered. This leads to downtime and should be prevented.

To increase resiliency for networking, VMware implemented the concept of NIC teaming in the hypervisor for both VMkernel and virtual machine networking. When discussing HA, this is especially important for the Management Network.

NIC teaming is the process of grouping together several physical NICs into one single logical NIC, which can be used for network fault tolerance and load balancing.

Using this mechanism, it is possible to add redundancy to the Management Network to decrease the chances of an isolation event. This is, of course, also possible for other “Portgroups” but that is not the topic of this chapter or book. Another option is configuring an additional Management Network by enabling the “management network” tick box on another VMkernel port. A little understood fact is that if there are multiple VMkernel networks on the same subnet, HA will use all of them for management traffic, even if only one is specified for management traffic!

Although there are many configurations possible and supported, we recommend a simple but highly resilient configuration. We have included the vMotion (VMkernel) network in our example as combining the Management Network and the vMotion network on a single vSwitch is the most commonly used configuration and an industry accepted best practice.

Requirements:

- 2 physical NICs
- VLAN trunking

Recommended:

- 2 physical switches
- If available, enable “link state tracking” to ensure link failures are reported

The vSwitch should be configured as follows:


vSwitch0: 2 Physical NICs (vmnic0 and vmnic1).
2 Portgroups (Management Network and vMotion VMkernel).
Management Network active on vmnic0 and standby on vmnic1.
vMotion VMkernel active on vmnic1 and standby on vmnic0.
Failback set to No.

Each portgroup has a VLAN ID assigned and runs dedicated on its own physical NIC; only in the case of a failure is it switched over to the standby NIC. We highly recommend setting failback to “No” (NIC Teaming tab) to avoid chances of an unwanted isolation event, which can occur when a physical switch routes no traffic during boot but the ports are reported as “up”.

Pros:
Only 2 NICs in total are needed for the Management Network and vMotion VMkernel, especially useful in blade server environments.
Easy to configure.

Cons:
Just a single active path for heartbeats.

The following diagram depicts this active/standby scenario:


Figure 36 - Active-Standby Management Network design

To increase resiliency, we also recommend implementing the following advanced settings and using NIC ports on different PCI busses, preferably NICs of a different make and model. When using a different make and model, even a driver failure could be mitigated.

Advanced Settings: das.isolationaddressX = <ip-address>

The isolation address setting is discussed in more detail in the section titled "Fundamental Concepts". In short, it is the IP address that the HA agent pings to identify if the host is completely isolated from the network or just not receiving any heartbeats. If multiple VMkernel networks on different subnets are used, it is recommended to set an isolation address per network to ensure that each of these will be able to validate isolation of the host.
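As an illustration of "one isolation address per network", the following minimal Python sketch builds the advanced setting keys for a list of addresses. The IP addresses are hypothetical, the helper name is ours, and the sketch assumes the X in das.isolationaddressX is a numeric suffix starting at 0; verify the exact numbering against your vSphere version before using it.

```python
def isolation_address_settings(addresses):
    """Map pingable IP addresses to das.isolationaddressX keys.

    Assumption: X is a numeric suffix that starts at 0.
    """
    return {f"das.isolationaddress{i}": ip for i, ip in enumerate(addresses)}


# Hypothetical addresses: one reliable, pingable address per VMkernel subnet.
settings = isolation_address_settings(["192.168.1.1", "10.0.10.1"])

for key, value in settings.items():
    print(f"{key} = {value}")
```

The resulting key/value pairs would then be entered as HA advanced options on the cluster.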

Basic design principle: Take advantage of some of the basic features vSphere has to offer, like NIC teaming. Combining different physical NICs will increase overall resiliency of your solution.


Corner Case Scenario: Split-Brain

A split brain scenario is a scenario where a single virtual machine is powered up multiple times, typically on two different hosts. This is possible in the scenario where the isolation response is set to “leave powered on” and network based storage, like NFS / iSCSI and even Virtual SAN, is used. This situation can occur during a full network isolation, which may result in the lock on the virtual machine’s VMDK being lost, enabling HA to actually power up the virtual machine. As the virtual machine was not powered off on its original host (isolation response set to “leave powered on”), it will exist in memory on the isolated host and in memory with a disk lock on the host that was requested to restart the virtual machine.

Keep in mind that this truly is a corner case scenario which is very unlikely to occur in most environments. In case it does happen, HA relies on the “lost lock detection” mechanism to mitigate this scenario. In short, ESXi detects that the lock on the VMDK has been lost and, when the datastore becomes accessible again and the lock cannot be reacquired, issues a question whether the virtual machine should be powered off; HA automatically answers the question with Yes. However, you will only see this question if you directly connect to the ESXi host during the failure. HA will generate an event for this auto-answered question though.

As stated above, the question will be auto-answered and the virtual machine will be powered off to recover from the split brain scenario. The question still remains: in the case of an isolation with iSCSI or NFS, should you power off virtual machines or leave them powered on?

As just explained, HA will automatically power off your original virtual machine when it detects a split-brain scenario. This process, however, is not instantaneous, and as such it is recommended to use the isolation response of “Power off” or “Shut down” rather than “Leave powered on”. We also recommend increasing heartbeat network resiliency to avoid getting into this situation. The options you have for enhancing Management Network resiliency are discussed earlier in this chapter.

Link State Tracking

This was already briefly mentioned in the list of recommendations, but this feature is something we would like to emphasize. We have noticed that people often forget about this even though many switches offer this capability, especially in blade server environments.

Link state tracking will mirror the state of an upstream link to a downstream link. Let’s clarify that with a diagram.


Figure 37 - Link State tracking mechanism

The diagram above depicts a scenario where an uplink of a “Core Switch” has failed. Without Link State Tracking, the connection from the “Edge Switch” to vmnic0 will be reported as up. With Link State Tracking enabled, the state of the link on the “Edge Switch” will reflect the state of the link of the “Core Switch” and as such be marked as “down”. You might wonder why this is important, but think about it for a second. Many features that vSphere offers rely on networking, and so do your virtual machines. In the case where the state is not reflected, some functionality might just fail; for instance, network heartbeating could fail if it needs to flow through the core switch. We call this a ‘black hole’ scenario: the host sends traffic down a path that it believes is up, but the traffic never reaches its destination due to the failed upstream link.
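The behavior described above can be modelled in a few lines of Python. This is only a conceptual sketch of the state mirroring, not switch configuration; the function names are ours and the edge switch port itself is assumed to be healthy.

```python
def downstream_link_state(upstream_up, link_state_tracking):
    """State the edge switch reports on the port facing the host NIC.

    Assumption: the downstream port itself is healthy, so without
    tracking it is always reported as "up".
    """
    if link_state_tracking:
        return "up" if upstream_up else "down"
    return "up"


def is_black_hole(upstream_up, link_state_tracking):
    """True when the host sees the link as up while the upstream path is dead."""
    reported = downstream_link_state(upstream_up, link_state_tracking)
    return reported == "up" and not upstream_up


# Upstream link failed, tracking disabled: the host keeps using a dead path.
print(is_black_hole(upstream_up=False, link_state_tracking=False))
# Upstream link failed, tracking enabled: the host sees "down" and can fail over.
print(is_black_hole(upstream_up=False, link_state_tracking=True))
```

With tracking enabled the host sees the link as down, which gives NIC teaming the signal it needs to switch to the standby NIC.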

Basic design principle: Know your network environment, talk to the network administrators and ensure advanced features like Link State Tracking are used when possible to increase resiliency.


Admission Control

Admission Control is more than likely the most misunderstood concept vSphere holds today, and because of this it is often disabled. However, Admission Control is a must when availability needs to be guaranteed, and isn’t that the reason for enabling HA in the first place?

What is HA Admission Control about?

Why does HA contain this concept called Admission Control? The “Availability Guide”, a.k.a. the HA bible, states the following:

vCenter Server uses admission control to ensure that sufficient resources are available in a cluster to provide failover protection and to ensure that virtual machine resource reservations are respected.

Please read that quote again, and especially the first two words. Indeed, it is vCenter that is responsible for Admission Control, contrary to what many believe. Although this might seem like a trivial fact, it is important to understand that this implies that Admission Control will not disallow HA initiated restarts. HA initiated restarts are done on a host level and not through vCenter.

As said, Admission Control guarantees that capacity is available for an HA initiated failover by reserving resources within a cluster. It calculates the capacity required for a failover based on available resources. In other words, if a host is placed into maintenance mode or disconnected, it is taken out of the equation. This also implies that if a host has failed or is not responding but has not been removed from the cluster, it is still included in the equation. “Available Resources” indicates that the virtualization overhead has already been subtracted from the total amount.

To give an example: VMkernel memory is subtracted from the total amount of memory to obtain the memory available for virtual machines.

There is one gotcha with Admission Control that we want to bring to your attention before drilling into the different policies. When Admission Control is enabled, HA will in no way violate availability constraints. This means that it will always ensure multiple hosts are up and running, and this applies to manual maintenance mode actions and, for instance, to VMware Distributed Power Management. So, if a host is stuck trying to enter Maintenance Mode, remember that it might be HA which is not allowing Maintenance Mode to proceed as it would violate the Admission Control Policy. In this situation, users can manually vMotion virtual machines off the host or temporarily disable admission control to allow the operation to proceed.


But what if you use something like Distributed Power Management (DPM), would that place all hosts in standby mode to reduce power consumption? No, DPM is smart enough to take hosts out of standby mode to ensure enough resources are available to provide for HA initiated failovers. If by any chance the resources are not available, HA will wait for these resources to be made available by DPM and then attempt the restart of the virtual machines. In other words, the retry count (5 retries by default) is not wasted in scenarios like these.

Admission Control Policy

The Admission Control Policy dictates the mechanism that HA uses to guarantee enough resources are available for an HA initiated failover. This section gives a general overview of the available Admission Control Policies. The impact of each policy is described in the following section, including our recommendation. HA has three mechanisms to guarantee enough capacity is available to respect virtual machine resource reservations.

Figure 38 - Admission control policy


Below we have listed all three options currently available as the Admission Control Policy. Each option has a different mechanism to ensure resources are available for a failover and each option has its caveats.

Admission Control Mechanisms

Each Admission Control Policy has its own Admission Control mechanism. Understanding each of these Admission Control mechanisms is important to appreciate the impact each one has on your cluster design. For instance, setting a reservation on a specific virtual machine can have an impact on the achieved consolidation ratio. This section will take you on a journey through the trenches of Admission Control Policies and their respective mechanisms and algorithms.

Host Failures Cluster Tolerates

The Admission Control Policy that has been around the longest is the “Host Failures Cluster Tolerates” policy. It is also historically the least understood Admission Control Policy due to its complex admission control mechanism.

This admission control policy can be configured in an N-1 fashion. This means that the number of host failures you can specify in a 32 host cluster is 31.

Within the vSphere Web Client it is possible to manually specify the slot size, as can be seen in the screenshot below. The vSphere Web Client also allows you to view which virtual machines span multiple slots. This can be very useful in scenarios where the slot size has been explicitly specified; we will explain why in just a second.


Figure 39 - Host Failures

The so-called “slots” mechanism is used when “Host failures cluster tolerates” has been selected as the Admission Control Policy. The details of this mechanism have changed several times in the past and it is one of the most restrictive policies; more than likely, it is also the least understood.

Slots dictate how many virtual machines can be powered on before vCenter starts yelling “Out Of Resources!” Normally, a slot represents one virtual machine. Admission Control does not limit HA in restarting virtual machines; it ensures enough unfragmented resources are available to power on all virtual machines in the cluster by preventing “over-commitment”. Technically speaking, “over-commitment” is not the correct terminology, as Admission Control ensures virtual machine reservations can be satisfied and that all virtual machines’ initial memory overhead requirements are met. Although we have already touched on this, it doesn’t hurt repeating it as it is one of those myths that keeps coming back: HA initiated failovers are not subject to the Admission Control Policy. Admission Control is done by vCenter. HA initiated restarts, in a normal scenario, are executed directly on the ESXi host without the use of vCenter. The corner case is where HA requests DRS (DRS is a vCenter task!) to defragment resources, but that is beside the point. Even if resources are low and vCenter would complain, it couldn’t stop the restart from happening.

Let’s dig into this concept we have just introduced, slots.

A slot is defined as a logical representation of the memory and CPU resources that satisfy the reservation requirements for any powered-on virtual machine in the cluster.

In other words, a slot is the worst case CPU and memory reservation scenario in a cluster. This directly leads to the first “gotcha.”


HA uses the highest CPU reservation of any given powered-on virtual machine and the highest memory reservation of any given powered-on virtual machine in the cluster. If no reservation of higher than 32 MHz is set, HA will use a default of 32 MHz for CPU. If no memory reservation is set, HA will use a default of 0 MB + memory overhead for memory. (See the VMware vSphere Resource Management Guide for more details on memory overhead per virtual machine configuration.) The following example will clarify what “worst-case” actually means.

Example: If virtual machine “VM1” has 2 GHz of CPU reserved and 1024 MB of memory reserved and virtual machine “VM2” has 1 GHz of CPU reserved and 2048 MB of memory reserved, the slot size for memory will be 2048 MB (+ its memory overhead) and the slot size for CPU will be 2 GHz. It is a combination of the highest reservation of both virtual machines that leads to the total slot size. Reservations defined at the Resource Pool level, however, will not affect HA slot size calculations.
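The worst-case rule from the example can be sketched in Python as follows. This is a simplified model, not HA's actual implementation: the function and field names are ours, memory overhead is a per-VM input (set to 0 here for readability), and the memory slot is taken as the largest "reservation + overhead" value.

```python
DEFAULT_CPU_SLOT_MHZ = 32  # default used when no higher CPU reservation is set


def slot_size(powered_on_vms):
    """Return (cpu_slot_mhz, mem_slot_mb) for a list of powered-on VMs.

    Each VM is a dict with hypothetical keys:
    cpu_res_mhz, mem_res_mb, mem_overhead_mb.
    """
    cpu_slot = max(
        [vm["cpu_res_mhz"] for vm in powered_on_vms] + [DEFAULT_CPU_SLOT_MHZ]
    )
    mem_slot = max(
        (vm["mem_res_mb"] + vm["mem_overhead_mb"] for vm in powered_on_vms),
        default=0,
    )
    return cpu_slot, mem_slot


vms = [
    {"name": "VM1", "cpu_res_mhz": 2000, "mem_res_mb": 1024, "mem_overhead_mb": 0},
    {"name": "VM2", "cpu_res_mhz": 1000, "mem_res_mb": 2048, "mem_overhead_mb": 0},
]
cpu_slot, mem_slot = slot_size(vms)
print(cpu_slot, mem_slot)
```

Note how the CPU value comes from VM1 and the memory value from VM2: the slot is a combination of the highest reservations, not the reservation of any single virtual machine.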

Basic design principle: Be really careful with reservations. If there’s no need to have them on a per virtual machine basis, don’t configure them, especially when using host failures cluster tolerates. If reservations are needed, resort to resource pool based reservations.

Now that we know the worst-case scenario is always taken into account when it comes to slot size calculations, we will describe what dictates the amount of available slots per cluster, as that ultimately dictates how many virtual machines can be powered on in your cluster.

First, we will need to know the slot size for memory and CPU. Next, we will divide the total available CPU resources of a host by the CPU slot size and the total available memory resources of a host by the memory slot size. This leaves us with a total number of slots for both memory and CPU for a host. The most restrictive number (worst-case scenario) is the number of slots for this host. In other words, when you have 25 CPU slots but only 5 memory slots, the amount of available slots for this host will be 5, as HA always takes the worst case scenario into account to “guarantee” all virtual machines can be powered on in case of a failure or isolation.
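The per-host calculation described above comes down to two integer divisions and a minimum. The sketch below uses hypothetical host capacities chosen so that they reproduce the 25 CPU slots versus 5 memory slots case; the function name is ours.

```python
def slots_per_host(host_cpu_mhz, host_mem_mb, cpu_slot_mhz, mem_slot_mb):
    """Number of HA slots a host provides: the most restrictive of CPU and memory."""
    cpu_slots = host_cpu_mhz // cpu_slot_mhz
    mem_slots = host_mem_mb // mem_slot_mb
    return min(cpu_slots, mem_slots)


# Hypothetical host: 50 GHz and 10 GB available after virtualization overhead,
# with a slot size of 2 GHz CPU and 2048 MB memory.
host_slots = slots_per_host(
    host_cpu_mhz=50_000, host_mem_mb=10_240, cpu_slot_mhz=2_000, mem_slot_mb=2_048
)
print(host_slots)
```

Here memory is the constraining resource, so adding CPU capacity to this host would not increase its slot count.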

A question we receive a lot is: how do I know what my slot size is? The details around slot sizes can be monitored on the HA section of the Cluster’s Monitor tab by checking the “Advanced Runtime Info” section when the “Host Failures” Admission Control Policy is configured.


Figure 40 - High Availability cluster monitor section

Advanced Runtime Info will show the specifics of the slot size and more useful details, such as the number of slots available, as depicted in Figure 41.


Figure 41 - High Availability advanced runtime info

As you can imagine, using reservations on a per virtual machine basis can lead to very conservative consolidation ratios. However, this is something that is configurable through the Web Client. If you have just one virtual machine with a really high reservation, you can set an explicit slot size by going to “Edit Cluster Services” and specifying the values under the Admission Control Policy section, as shown in Figure 39.

If one of these advanced settings is used, HA will ensure that the virtual machine that skewed the numbers can be restarted by “assigning” multiple slots to it. However, when you are low on resources, this could mean that you are not able to power on the virtual machine with this reservation because resources may be fragmented throughout the cluster instead of available on a single host. HA will notify DRS that a power-on attempt was unsuccessful and a request will be made to defragment the resources to accommodate the remaining virtual machines that need to be powered on. In order for this to be successful, DRS will need to be enabled and configured to fully automated. When not configured to fully automated, user action is required to execute DRS recommendations.


The following diagram depicts a scenario where a virtual machine spans multiple slots:

Figure 42 - Virtual machine spanning multiple HA slots

Notice that because the memory slot size has been manually set to 1024 MB, one of the virtual machines (grouped with dotted lines) spans multiple slots due to a 4 GB memory reservation. As you might have noticed, none of the hosts has enough resources available to satisfy the reservation of the virtual machine that needs to failover. Although in total there are enough resources available, they are fragmented and HA will not be able to power on this particular virtual machine directly, but will request DRS to defragment the resources to accommodate this virtual machine’s resource requirements.

Admission Control does not take fragmentation of slots into account when slot sizes are manually defined with advanced settings. It will take the number of slots this virtual machine will consume into account by subtracting them from the total number of available slots, but it will not verify the amount of available slots per host to ensure failover. As stated earlier, though, HA will request DRS to defragment the resources. This is by no means a guarantee of a successful power-on attempt.
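The difference between "enough slots in the cluster" and "enough slots on a single host" can be made concrete with a small Python sketch. The free slot counts per host are hypothetical and the function names are ours; the 4 GB reservation and 1024 MB slot size come from the scenario above.

```python
import math


def slots_needed(mem_reservation_mb, mem_slot_mb):
    """Slots a VM consumes when its reservation exceeds the configured slot size."""
    return math.ceil(mem_reservation_mb / mem_slot_mb)


def fits_on_single_host(free_slots_per_host, needed):
    """True if at least one host has enough free slots for the whole VM."""
    return any(free >= needed for free in free_slots_per_host)


needed = slots_needed(mem_reservation_mb=4096, mem_slot_mb=1024)

# Hypothetical free slots on the three surviving hosts.
free_slots = [2, 2, 3]

cluster_has_capacity = sum(free_slots) >= needed   # what the cluster-wide count sees
host_has_capacity = fits_on_single_host(free_slots, needed)  # what a power-on needs

print(needed, cluster_has_capacity, host_has_capacity)
```

When the first check passes and the second fails, the resources are fragmented, and a successful restart depends on DRS being able to move other virtual machines out of the way.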


Basic design principle: Avoid using advanced settings to decrease the slot size, as it could lead to more downtime and adds an extra layer of complexity. If there is a large discrepancy in size and reservations, we recommend using the percentage based admission control policy.

Within the vSphere Web Client there is functionality which enables you to identify virtual machines which span multiple slots, as shown in Figure 39. We highly recommend monitoring this section on a regular basis to get a better understanding of your environment and to identify those virtual machines that might be problematic to restart in case of a host failure.

Unbalanced Configurations and Impact on Slot Calculation

It is an industry best practice to create clusters with similar hardware configurations. However, many companies started out with a small VMware cluster when virtualization was first introduced. When the time has come to expand, chances are fairly large the same hardware configuration is no longer available. The question is: will you add the newly bought hosts to the same cluster or create a new cluster?

From a DRS perspective, large clusters are preferred as they increase the load balancing opportunities. However, there is a caveat for DRS as well, which is described in the DRS section of this book. For HA, there is a big caveat. When you think about it and understand the internal workings of HA, more specifically the slot algorithm, you probably already know what is coming up.

Let’s first define the term “unbalanced cluster.”

An unbalanced cluster would, for instance, be a cluster with 3 hosts of which one contains substantially more memory than the other hosts in the cluster.

Let’s try to clarify that with an example.

Example: What would happen to the total number of slots in a cluster of the following specifications?

Three host cluster
Two hosts have 16 GB of available memory
One host has 32 GB of available memory

The third host is a brand new host that has just been bought, and as prices of memory dropped immensely the decision was made to buy 32 GB instead of 16 GB.

The cluster contains a virtual machine that has 1 vCPU and 4 GB of memory. A 1024 MB memory reservation has been defined on this virtual machine. As explained earlier, a reservation will dictate the slot size, which in this case leads to a memory slot size of 1024 MB + memory overhead. For the sake of simplicity, we will calculate with 1024 MB. The following diagram depicts this scenario:

Figure 43 - High Availability memory slot size

When Admission Control is enabled and the number of host failures has been selected as the Admission Control Policy, the number of slots will be calculated per host and the cluster in total. This will result in:

Host       Number of slots
ESXi-01    16 slots
ESXi-02    16 slots
ESXi-03    32 slots

As Admission Control is enabled, a worst-case scenario is taken into account. When a single host failure has been specified, this means that the host with the largest number of slots will be taken out of the equation. In other words, for our cluster, this would result in:

ESXi-01 + ESXi-02 = 32 slots available


Although you have doubled the amount of memory in one of your hosts, you are still stuck with only 32 slots in total. As clearly demonstrated, there is absolutely no point in buying additional memory for a single host when your cluster is designed with Admission Control enabled and the number of host failures has been selected as the Admission Control Policy.

In our example, the memory slot size happened to be the most restrictive; however, the same principle applies when CPU slot size is most restrictive.

Basic design principle: When using admission control, balance your clusters and be conservative with reservations, as they lead to decreased consolidation ratios.

Now, what would happen in the scenario above when the number of allowed host failures is set to 2? In this case ESXi-03 is taken out of the equation and one of the remaining hosts in the cluster is also taken out, resulting in 16 slots. This makes sense, doesn’t it?
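The worst-case reasoning for the unbalanced cluster can be reproduced with a short Python sketch: sort the hosts by slot count and drop the largest ones, one per tolerated host failure. The function name is ours; the slot counts are those of the example above.

```python
def available_slots(slots_by_host, host_failures):
    """Slots left for running VMs after removing the largest hosts (worst case)."""
    remaining = sorted(slots_by_host.values())
    if host_failures > 0:
        remaining = remaining[:-host_failures]
    return sum(remaining)


cluster = {"ESXi-01": 16, "ESXi-02": 16, "ESXi-03": 32}

one_failure = available_slots(cluster, host_failures=1)
two_failures = available_slots(cluster, host_failures=2)
print(one_failure, two_failures)
```

Because the largest host is always the first to be discounted, the extra memory in ESXi-03 never contributes to the number of usable slots under this policy.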

Can you avoid large HA slot sizes due to reservations without resorting to advanced settings? That’s the question we get almost daily, and the answer is the “Percentage of Cluster Resources Reserved” admission control mechanism.

Percentage of Cluster Resources Reserved

The Percentage of Cluster Resources Reserved admission control policy is one of the most used admission control policies. The simple reason for this is that it is the least restrictive and most flexible. It is also very easy to configure, as shown in the screenshot below.

Figure 44 - Setting a different percentage for CPU/Memory

The main advantage of the percentage based Admission Control Policy is that it avoids the commonly experienced slot size issue where values are skewed due to a large reservation. But if it doesn’t use the slot algorithm, what does it use?

When you specify a percentage, and let’s assume for now that the percentage for CPU and memory will be configured equally, that percentage of the total amount of available resources will stay reserved for HA purposes. First of all, HA will add up all available resources to see how much it has available (virtualization overhead will be subtracted) in total. Then, HA will calculate how much resources are currently reserved by adding up all reservations for memory and for CPU for all powered on virtual machines.

For those virtual machines that do not have a reservation, a default of 32 MHz will be used for CPU and a default of 0 MB + memory overhead will be used for memory. (The amount of overhead per configuration type can be found in the “Understanding Memory Overhead” section of the Resource Management guide.)

In other words:

((total amount of available resources - total reserved virtual machine resources) / total amount of available resources) <= (percentage HA should reserve as spare capacity)

Total reserved virtual machine resources includes the default reservation of 32 MHz and the memory overhead of the virtual machine.

Let’s use a diagram to make it a bit clearer:

Figure 45 - Percentage of cluster resources reserved

Total cluster resources are 24 GHz (CPU) and 96 GB (MEM). This would lead to the following calculations:

((24 GHz - (2 GHz + 1 GHz + 32 MHz + 4 GHz)) / 24 GHz) = approximately 71% available

((96 GB - (1.1 GB + 114 MB + 626 MB + 3.2 GB)) / 96 GB) = approximately 95% available

As you can see, the amount of memory differs from the diagram. Even if a reservation has been set, the amount of memory overhead is added to the reservation. This example also demonstrates how keeping CPU and memory percentage equal could create an imbalance. Ideally, of course, the hosts are provisioned in such a way that there is no CPU/memory imbalance. Experience over the years has proven, unfortunately, that most environments run out of memory resources first, and this might need to be factored in when calculating the correct value for the percentage. However, this trend might be changing as memory is getting cheaper every day.
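The percentage calculation described above can be sketched in Python. This is our own simplified model, not vCenter's implementation: the function names are hypothetical, and we assume a power-on is refused when it would push the remaining fraction below the configured percentage. The CPU figures are the ones used in the example (values in MHz).

```python
def available_fraction(total, reservations):
    """Fraction of cluster resources not claimed by VM reservations."""
    return (total - sum(reservations)) / total


def power_on_allowed(total, reservations, new_reservation, ha_percent):
    """Assumed admission rule: the remaining fraction must stay at or above
    the percentage reserved for HA after the new VM is powered on."""
    remaining = available_fraction(total, reservations + [new_reservation])
    return remaining * 100 >= ha_percent


cpu_total_mhz = 24_000
cpu_reservations_mhz = [2_000, 1_000, 32, 4_000]  # 32 MHz is the default for a VM without a reservation

cpu_available = available_fraction(cpu_total_mhz, cpu_reservations_mhz)
print(f"CPU available: {cpu_available:.1%}")
print(power_on_allowed(cpu_total_mhz, cpu_reservations_mhz, 1_000, ha_percent=25))
```

The same two functions can be applied to memory by passing reservations that already include the per-VM memory overhead.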

In order to ensure virtual machines can always be restarted, Admission Control will constantly monitor if the policy has been violated or not. Please note that this Admission Control process is part of vCenter and not of the ESXi host! When one of the thresholds is reached, memory or CPU, Admission Control will disallow powering on any additional virtual machines as that could potentially impact availability. These thresholds can be monitored on the HA section of the Cluster’s summary tab.

Figure 46 - High Availability summary

If you have an unbalanced cluster (hosts with different sizes of CPU or memory resources), your percentage should be equal to or preferably larger than the percentage of resources provided by the largest host. This way you ensure that all virtual machines residing on this host can be restarted in case of a host failure.
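As a minimal sketch of this sizing rule, the share of the largest host can be computed as follows. The function name is ours, and the host sizes reuse the 16 GB / 16 GB / 32 GB cluster from the earlier slot example as an illustration.

```python
def min_reserved_percentage(host_capacities):
    """Share of cluster resources provided by the largest host, in percent.

    The configured percentage should be at least this value to cover
    the failure of the largest host.
    """
    return 100 * max(host_capacities) / sum(host_capacities)


# Memory per host in GB for an unbalanced three host cluster.
minimum = min_reserved_percentage([16, 16, 32])
print(f"Reserve at least {minimum:.0f}% of memory")
```

The calculation should be done separately for CPU and memory, since the largest host for one resource is not necessarily the largest for the other.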

As explained earlier, this Admission Control Policy does not use slots. As such, resources might be fragmented throughout the cluster. Although DRS is notified to rebalance the cluster, if needed, to accommodate these virtual machines’ resource requirements, a guarantee cannot be given. We recommend selecting the highest restart priority for this virtual machine (of course, depending on the SLA) to ensure it will be able to boot.

The following example and diagram (Figure 47) will make it more obvious: You have 3 hosts, each with roughly 80% memory usage, and you have configured HA to reserve 20% of resources for both CPU and memory. A host fails and all virtual machines will need to failover. One of those virtual machines has a 4 GB memory reservation. As you can imagine, HA will not be able to initiate a power-on attempt, as there are not enough memory resources available to guarantee the reserved capacity. Instead an event will get generated indicating "not enough resources for failover" for this virtual machine.

Figure 47 - Available resources

Basic design principle: Although HA will utilize DRS to try to accommodate the resource requirements of this virtual machine, a guarantee cannot be given. Do the math; verify that any single host has enough resources to power on your largest virtual machine. Also take restart priority into account for this virtual machine (or these virtual machines).

Failover Hosts

The third option one could choose is to select one or multiple designated Failover hosts. This is commonly referred to as a hot standby.


Figure 48 - Select failover hosts Admission Control Policy

It is “what you see is what you get”. When you designate hosts as failover hosts, they will not participate in DRS and you will not be able to run virtual machines on these hosts! These hosts are literally reserved for failover situations. HA will attempt to use these hosts first to failover the virtual machines. If, for whatever reason, this is unsuccessful, it will attempt a failover on any of the other hosts. For example, when three hosts fail, including the hosts designated as failover hosts, HA will still try to restart the impacted virtual machines on the host that is left. Although this host was not a designated failover host, HA will use it to limit downtime.
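The placement preference described above (designated failover hosts first, any other surviving host as a fallback) can be illustrated with a small Python sketch. It only models the ordering of candidate hosts, not HA's actual placement logic; the host names and the function name are hypothetical.

```python
def restart_candidates(hosts, failover_hosts, failed_hosts):
    """Order surviving hosts: designated failover hosts first, then the rest."""
    alive = [h for h in hosts if h not in failed_hosts]
    preferred = [h for h in alive if h in failover_hosts]
    fallback = [h for h in alive if h not in failover_hosts]
    return preferred + fallback


hosts = ["esxi-01", "esxi-02", "esxi-03", "esxi-04"]
failover_hosts = {"esxi-04"}

# A regular host fails: the failover host is tried first.
print(restart_candidates(hosts, failover_hosts, failed_hosts={"esxi-01"}))

# Three hosts fail, including the failover host: the remaining host is still used.
print(restart_candidates(hosts, failover_hosts, failed_hosts={"esxi-01", "esxi-03", "esxi-04"}))
```

In the second case no designated failover host is left, so the only surviving host becomes the restart target.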


Figure 49 - Select multiple failover hosts

Decision Making Time

As with any decision you make, there is an impact to your environment. This impact could be positive but also, for instance, unexpected. This especially goes for HA Admission Control. Selecting the right Admission Control Policy can lead to a quicker Return On Investment and a lower Total Cost of Ownership. In the previous section, we described all the algorithms and mechanisms that form Admission Control, and in this section we will focus more on the design considerations around selecting the appropriate Admission Control Policy for your or your customer’s environment.

The first decision that will need to be made is whether Admission Control will be enabled. We generally recommend enabling Admission Control as it is the only way of guaranteeing your virtual machines will be allowed to restart after a failure. It is important, though, that the policy is carefully selected and fits your or your customer’s requirements.

Basic design principle

Admission control guarantees enough capacity is available for virtual machine failover. As such we recommend enabling it.


Although we already have explained all the mechanisms that are being used by each of the policies in the previous section, we will give a high level overview and list all the pros and cons in this section. On top of that, we will expand on what we feel is the most flexible Admission Control Policy and how it should be configured and calculated.

Host Failures Cluster Tolerates

This option is, historically speaking, the most used for Admission Control. Most environments are designed with an N+1 redundancy and N+2 is also not uncommon. This Admission Control Policy uses “slots” to ensure enough capacity is reserved for failover, which is a fairly complex mechanism. Slots are based on VM-level reservations, and if reservations are not used a default slot size for CPU of 32 MHz is defined and for memory the largest memory overhead of any given virtual machine is used.

Pros:

Fully automated. (When a host is added to a cluster, HA re-calculates how many slots are available.)
Guarantees failover by calculating slot sizes.

Cons:

Can be very conservative and inflexible when reservations are used, as the largest reservation dictates slot sizes.
Unbalanced clusters lead to wastage of resources.
Complexity for administrator from calculation perspective.

Percentage of Cluster Resources Reserved

The percentage based Admission Control is based on a per-reservation calculation instead of the slots mechanism. The percentage based Admission Control Policy is less conservative than “Host Failures” and more flexible than “Failover Hosts”.

Pros:

Accurate, as it considers the actual reservation per virtual machine to calculate available failover resources.
Cluster dynamically adjusts when resources are added.

Cons:

Manual calculations needed when adding additional hosts in a cluster and the number of host failures needs to remain unchanged.
Unbalanced clusters can be a problem when the chosen percentage is too low and resources are fragmented, which means failover of a virtual machine can’t be guaranteed, as the reservation of this virtual machine might not be available as a block of resources on a single host.

Pleasenotethat,althoughafailovercannotbeguaranteed,therearefewscenarioswhereavirtualmachinewillnotbeabletorestartduetotheintegrationHAofferswithDRSandthefactthatmostclustershavesparecapacityavailabletoaccountforvirtualmachinedemandvariance.Althoughthisisacorner-casescenario,itneedstobeconsideredinenvironmentswhereabsoluteguaranteesmustbeprovided.

Specify Failover Hosts

With the “Specify Failover Hosts” Admission Control Policy, when one or multiple hosts fail, HA will attempt to restart all virtual machines on the designated failover hosts. The designated failover hosts are essentially “hot standby” hosts. In other words, DRS will not migrate virtual machines to these hosts when resources are scarce or the cluster is imbalanced.

Pros:

What you see is what you get.
No fragmented resources.

Cons:

What you see is what you get.
Dedicated failover hosts not utilized during normal operations.

Recommendations

We have been asked many times for our recommendation on Admission Control and it is difficult to answer as each policy has its pros and cons. However, we generally recommend a Percentage based Admission Control Policy. It is the most flexible policy as it uses the actual reservation per virtual machine instead of taking a “worst case” scenario approach like the number of host failures does. However, the number of host failures policy guarantees the failover level under all circumstances. Percentage based is less restrictive, but offers lower guarantees that in all scenarios HA will be able to restart all virtual machines. With the added level of integration between HA and DRS we believe a Percentage based Admission Control Policy will fit most environments.


Basic design principle: Do the math, and take customer requirements into account. We recommend using a “percentage” based admission control policy, as it is the most flexible.

Now that we have recommended which Admission Control Policy to use, the next step is to provide guidance around selecting the correct percentage. We cannot tell you what the ideal percentage is as that totally depends on the size of your cluster and, of course, on your resiliency model (N+1 vs. N+2). We can, however, provide guidelines around calculating how much of your resources should be set aside and how to prevent wasting resources.

Selecting the Right Percentage

It is a common strategy to select a single host as a percentage of resources reserved for failover. We generally recommend selecting a percentage which is the equivalent of a single or multiple hosts. Let’s explain why and what the impact is of not using the equivalent of a single or multiple hosts.

Let’s start with an example: a cluster consists of 8 ESXi hosts, each containing 70 GB of available RAM. This might sound like an awkward memory configuration but to simplify things we have already subtracted 2 GB as virtualization overhead. Although virtualization overhead is probably less than 2 GB, we have used this number to make calculations easier. This example zooms in on memory but this concept also applies to CPU, of course.

For this cluster we will set the percentage of resources to reserve for both Memory and CPU to 20%. For memory, this leaves a total of 448 GB of cluster memory capacity available to virtual machines:

(70 GB + 70 GB + 70 GB + 70 GB + 70 GB + 70 GB + 70 GB + 70 GB) * (1 – 20%) = 448 GB

A total of 112 GB of memory is reserved as failover capacity.
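The arithmetic above can be reproduced with a few lines of Python. This is only a sketch of the calculation in the text; the function name split_cluster_capacity is ours and is not part of any VMware tool.

```python
def split_cluster_capacity(host_capacities_gb, reserved_percent):
    """Return (usable_gb, reserved_gb) for a percentage based policy.

    host_capacities_gb: available memory per host, overhead already subtracted.
    reserved_percent:   the Admission Control percentage (whole number).
    """
    total_gb = sum(host_capacities_gb)
    reserved_gb = total_gb * reserved_percent / 100
    return total_gb - reserved_gb, reserved_gb


# The example cluster: 8 hosts with 70 GB each, 20% reserved.
usable_gb, reserved_gb = split_cluster_capacity([70] * 8, 20)
print(f"usable: {usable_gb} GB, reserved for failover: {reserved_gb} GB")
```

The same function works for CPU if the per-host values are given in GHz instead of GB.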

Once a percentage is specified, that percentage of resources will be unavailable for virtual machines, therefore it makes sense to set the percentage as close to the value that equals the resources a single (or multiple) host represents. We will demonstrate why this is important in subsequent examples.

In the example above, 20% of the resources in an 8-host cluster was reserved. This configuration reserves more resources than a single host contributes to the cluster. HA’s main objective is to provide automatic recovery for virtual machines after a physical server failure. For this reason, it is recommended to reserve resources equal to a single or multiple hosts. When using the per-host level granularity in an 8-host cluster (homogeneous configured hosts), the resource contribution per host to the cluster is 12.5%. However, the percentage used must be an integer (whole number). It is recommended to round up to the value guaranteeing that the full capacity of one host is protected. In this example (Figure 50), the conservative approach would lead to a percentage of 13%.

Figure 50 - Setting the correct value
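The round-up rule can be sketched as follows. The helper failover_percentage is a hypothetical name, and the sketch assumes homogeneous hosts, as the example does; with mixed host sizes the largest host would have to be considered instead.

```python
def failover_percentage(total_hosts, host_failures=1):
    """Smallest whole percentage that covers `host_failures` full hosts.

    Assumes every host contributes the same amount of resources.
    Integer arithmetic is used so the result is always rounded up.
    """
    if not 0 < host_failures < total_hosts:
        raise ValueError("host_failures must be between 1 and total_hosts - 1")
    return (100 * host_failures + total_hosts - 1) // total_hosts


# 8 hosts, N+1: each host contributes 12.5%, which rounds up to 13%.
print(failover_percentage(8, 1))
```

Rounding down to 12% would reserve slightly less than one full host, which is the situation described under "Aggressive Approach".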

Aggressive Approach

We have seen many environments where the percentage was set to a value that was less than the contribution of a single host to the cluster. Although this approach reduces the amount of resources reserved for accommodating host failures and results in higher consolidation ratios, it also offers a lower guarantee that HA will be able to restart all virtual machines after a failure. One might argue that this approach will more than likely work as most environments will not be fully utilized; however it also does eliminate the guarantee that after a failure all virtual machines will be recovered. Wasn’t that the reason for enabling HA in the first place?

Adding Hosts to Your Cluster

Although the percentage is dynamic and calculates capacity at a cluster-level, changes to your selected percentage might be required when expanding the cluster. The reason being that the amount of reserved resources for a fail-over might not correspond with the contribution per host and as a result lead to resource wastage. For example, adding 4 hosts to an 8-host cluster and continuing to use the previously configured admission control policy value of 13% will result in a failover capacity that is equivalent to roughly 1.5 hosts. Figure 51 depicts a scenario where an 8-host cluster is expanded to 12 hosts. Each host holds eight 2 GHz cores and 70 GB of memory. The cluster was originally configured with admission control set to 13%, which after the expansion equals 109.2 GB and 24.96 GHz. If the requirement is to allow a single host failure, a setting of 9% (75.6 GB and 17.28 GHz) would suffice, so 7.68 GHz and 33.6 GB is “wasted”, as clearly demonstrated in the diagram below.

Figure 51 - Avoid wasting resources
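The wasted capacity in this scenario can be recomputed with a short sketch. It compares the configured percentage with the smallest whole percentage that still covers the required number of host failures; the function name and parameters are illustrative only, and homogeneous hosts are assumed.

```python
def wasted_reservation(hosts, per_host_capacity, configured_percent, host_failures=1):
    """Capacity reserved beyond what `host_failures` hosts require.

    The required percentage is rounded up to a whole number, because the
    Admission Control percentage must be an integer.
    """
    total = hosts * per_host_capacity
    needed_percent = (100 * host_failures + hosts - 1) // hosts
    return total * (configured_percent - needed_percent) / 100


# 12 hosts after expansion, still configured with the old 13%.
cpu_ghz = wasted_reservation(12, 8 * 2, 13)   # eight 2 GHz cores per host
mem_gb = wasted_reservation(12, 70, 13)       # 70 GB per host
print(f"wasted: {cpu_ghz:.2f} GHz and {mem_gb:.1f} GB")
```

A result of zero means the configured percentage matches the resiliency requirement exactly; a negative result means less than the required capacity is reserved.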

How to Define Your Percentage?

As explained earlier it will fully depend on the N+X model that has been chosen. Based on this model, we recommend selecting a percentage that equals the amount of resources the X hosts represent. So, in the case of an 8-host cluster and N+2 resiliency, the percentage should be set as follows: 2 / 8 * 100 = 25%

Basic design principle: In order to avoid wasting resources we recommend carefully selecting your N+X resiliency architecture. Calculate the required percentage based on this architecture.


VM and Application Monitoring

VM and Application Monitoring is an often overlooked but really powerful feature of HA. The reason for this is most likely that it is disabled by default and relatively new compared to HA. We have tried to gather all the information we could around VM and Application Monitoring, but it is a pretty straightforward product that actually does what you expect it would do.

Figure 52 - VM and Application Monitoring

Why Do You Need VM/Application Monitoring?

VM and Application Monitoring acts on a different level from HA. VM/App Monitoring responds to a single virtual machine or application failure as opposed to HA which responds to a host failure. An example of a single virtual machine failure would, for instance, be the infamous “blue screen of death”. In the case of App Monitoring the type of failure that triggers a response is defined by the application developer or administrator.

How Does VM/App Monitoring Work?

VM Monitoring resets individual virtual machines when needed. VM/App monitoring uses a heartbeat similar to HA. If heartbeats, and, in this case, VMware Tools heartbeats, are not received for a specific (and configurable) amount of time, the virtual machine will be restarted. These heartbeats are monitored by the HA agent and are not sent over a network, but stay local to the host.

Figure 53 - VM Monitoring sensitivity

When enabling VM/App Monitoring, the level of sensitivity (Figure 53) can be configured. The default setting should fit most situations. Low sensitivity basically means that the number of allowed “missed” heartbeats is higher and the chances of running into a false positive are lower. However, if a failure occurs and the sensitivity level is set to Low, the experienced downtime will be higher. When quick action is required in the event of a failure, “high sensitivity” can be selected. As expected, this is the opposite of “low sensitivity”. Note that the advanced settings mentioned in the following table are deprecated and listed for educational purposes.

Sensitivity   Failure interval   Max failures   Maximum resets time window
Low           120 seconds        3              7 days
Medium        60 seconds         3              24 hours
High          30 seconds         3              1 hour

It is important to remember that VM Monitoring does not infinitely reboot virtual machines unless you specify a custom policy with this requirement. This is to avoid a problem from repeating. By default, when a virtual machine has been rebooted three times within an hour, no further attempts will be made until the specified time has elapsed. Advanced settings can be used to change this default behavior, or “custom” can be selected as shown in Figure 53.
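To make the "maximum resets within a time window" behavior concrete, here is a small model of it. It is a simplified sliding-window sketch under our own assumptions, not HA's actual implementation; the class name ResetLimiter and the SENSITIVITY dictionary are hypothetical, with values taken from the sensitivity table.

```python
# (failure interval in s, max failures, reset window in s), per the table.
SENSITIVITY = {
    "low": (120, 3, 7 * 24 * 3600),
    "medium": (60, 3, 24 * 3600),
    "high": (30, 3, 3600),
}


class ResetLimiter:
    """Allow at most `max_failures` resets within `window_s` seconds."""

    def __init__(self, max_failures, window_s):
        self.max_failures = max_failures
        self.window_s = window_s
        self.reset_times = []

    def may_reset(self, now_s):
        # Forget resets that have aged out of the window.
        self.reset_times = [t for t in self.reset_times if now_s - t < self.window_s]
        if len(self.reset_times) >= self.max_failures:
            return False  # limit reached, no further attempts for now
        self.reset_times.append(now_s)
        return True


_, max_failures, window_s = SENSITIVITY["high"]
limiter = ResetLimiter(max_failures, window_s)
for t in (0, 600, 1200, 1800, 3601):
    print(t, limiter.may_reset(t))
```

In this model the fourth reset inside the hour is refused, and resets become possible again once the oldest one falls outside the window.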

Although the heartbeat produced by VMware Tools is reliable, VMware added a further verification mechanism. To avoid false positives, VM Monitoring also monitors I/O activity of the virtual machine. When heartbeats are not received AND no disk or network activity has occurred over the last 120 seconds, per default, the virtual machine will be reset. Changing the advanced setting “das.iostatsInterval” can modify this 120-second interval.

It is recommended to align the das.iostatsInterval with the failure interval selected in the VM Monitoring section of vSphere HA within the Web Client or the vSphere Client.

Basic design principle: Align das.iostatsInterval with the failure interval.

Screenshots

One of the most useful features as part of VM Monitoring is the fact that it takes screenshots of the virtual machine’s console. The screenshots are taken right before VM Monitoring resets a virtual machine. It is a very useful feature when a virtual machine “freezes” every once in a while for no apparent reason. This screenshot can be used to debug the virtual machine operating system when needed, and is stored in the virtual machine’s working directory as logged in the Events view on the Monitor tab of the virtual machine.

Basic design principle: VM and Application monitoring can substantially increase availability. It is part of the HA stack and we strongly recommend using it!

VM Monitoring Implementation Details

VM/App Monitoring is implemented as part of the HA agent itself. The agent uses the “Performance Manager” to monitor disk and network I/O; VM/App Monitoring uses the “usage” counters for both disk and network and it requests these counters once enough heartbeats have been missed that the configured policy is triggered.

As stated before, VM/App Monitoring uses heartbeats just like host-level HA. The heartbeats are monitored by the HA agent, which is responsible for the restarts. Of course, this information is also being rolled up into vCenter, but that is done via the Management Network, not using the virtual machine network. This is crucial to know as this means that when a virtual machine network error occurs, the virtual machine heartbeat will still be received. When an error occurs, HA will trigger a restart of the virtual machine when all three conditions are met:

1. No VMware Tools heartbeat received
2. No network I/O over the last 120 seconds
3. No storage I/O over the last 120 seconds
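The three conditions can be expressed as a single predicate. This is a sketch of the decision logic as described in the text, not code from the HA agent; the function and parameter names are made up for illustration, and the 120-second default mirrors das.iostatsInterval.

```python
def should_reset_vm(tools_heartbeat_received,
                    seconds_since_network_io,
                    seconds_since_storage_io,
                    iostats_interval_s=120):
    """Return True only when all three reset conditions hold."""
    no_network_io = seconds_since_network_io >= iostats_interval_s
    no_storage_io = seconds_since_storage_io >= iostats_interval_s
    return (not tools_heartbeat_received) and no_network_io and no_storage_io


# Heartbeats stopped, but the VM wrote to disk 15 seconds ago: no reset.
print(should_reset_vm(False, 300, 15))
# Heartbeats stopped and no I/O at all for five minutes: reset.
print(should_reset_vm(False, 300, 300))
```

The I/O check is what prevents a reset when only the VMware Tools process has stopped while the guest itself is still doing work.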

Just like with host-level HA, the HA agent works independently of vCenter when it comes to virtual machine restarts.


Timing

The VM/App monitoring feature monitors the heartbeat(s) issued by a guest and resets the virtual machine if there is a heartbeat failure that satisfies the configured policy for the virtual machine. HA can monitor just the heartbeats issued by the VMware Tools process or can monitor these heartbeats plus those issued by an optional in-guest agent.

If the VM monitoring heartbeats stop at time T-0, the minimum time before HA will declare a heartbeat failure is in the range of 91 seconds to 119 seconds, whereas for heartbeats issued by an in-guest application agent, HA will declare a failure in the range of 61 seconds to 89 seconds. Once a heartbeat failure is declared for application heartbeats, HA will attempt to reset the virtual machine. However, for VMware Tools heartbeats, HA will first check whether any I/O has been issued by the virtual machine for the last 2 minutes (by default) and only if there has been no I/O will it issue a reset. Due to how HOSTD publishes the I/O statistics, this check could delay the reset by approximately 20 seconds for virtual machines that were issuing I/O within approximately 1 minute of T-0.

Timing details: the range depends on when the heartbeats stop relative to the HOSTD thread that monitors them. For the lower bound of the VMware Tools heartbeats, the heartbeats stop a second before the HOSTD thread runs, which means that at T+31 the FDM agent on the host will be notified of a tools yellow state, and then at T+61 of the red state, which HA reacts to. HA then monitors the heartbeat failure for a minimum of 30 seconds, leading to the minimum of T+91. The 30-second monitoring period done by HA can be increased using the das.failureInterval policy setting. For the upper bound, the FDM is not notified until T+59 (at T=0 the failure occurs, at T+29 HOSTD notices it and starts the heartbeat failure timer, at T+59 HOSTD reports a yellow state, and at T+89 it reports a red state).

For the heartbeats issued by an in-guest agent, no yellow state is sent, so there is no additional 30-second period.
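The timeline above can be turned into a small model. This is our own reconstruction from the timing details in the text, assuming that HOSTD notices the missing heartbeat between 1 and 29 seconds after it stops and that each state transition takes 30 seconds; none of the names below come from VMware.

```python
STATE_STEP_S = 30  # assumed time between HOSTD state transitions


def failure_declared_at(detect_delay_s, sends_yellow_state=True, failure_interval_s=30):
    """Seconds after T=0 at which HA declares a heartbeat failure.

    detect_delay_s:      when HOSTD first notices the missing heartbeat.
    sends_yellow_state:  True for VMware Tools heartbeats, False for
                         in-guest agent heartbeats (no yellow state).
    failure_interval_s:  HA's own monitoring period (das.failureInterval).
    """
    yellow_phase_s = STATE_STEP_S if sends_yellow_state else 0
    red_state_at = detect_delay_s + yellow_phase_s + STATE_STEP_S
    return red_state_at + failure_interval_s


def declaration_range(sends_yellow_state=True, failure_interval_s=30):
    """Lower and upper bound for detection delays of 1 s and 29 s."""
    return (failure_declared_at(1, sends_yellow_state, failure_interval_s),
            failure_declared_at(29, sends_yellow_state, failure_interval_s))


print("VMware Tools heartbeats:", declaration_range(True))
print("In-guest agent heartbeats:", declaration_range(False))
```

Raising failure_interval_s shifts both bounds by the same amount, which matches the statement that das.failureInterval extends HA's monitoring period.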

Application Monitoring

Application Monitoring is a part of VM Monitoring. Application Monitoring is a feature that partners and/or customers can leverage to increase resiliency, as shown in the screenshot below but from an application point of view rather than from a VM point of view. There is an SDK available to the general public and it is part of the guest SDK.


Figure 54 - VM and Application Monitoring

The Guest SDK is currently primarily used by application developers from partners like Symantec to develop solutions that increase resilience on a different level than VM Monitoring and HA. In the case of Symantec, a simplified version of Veritas Cluster Server (VCS) is used to enable application availability monitoring, including responding to issues. Note that this is not a multi-node clustering solution like VCS itself, but a single node solution.

Symantec ApplicationHA, as it is called, is triggered to get the application up and running again by restarting it. Symantec's ApplicationHA is aware of dependencies and knows in which order services should be started or stopped. If, however, this fails for a certain number (configurable option within ApplicationHA) of times, VMware HA will be requested to take action. This action will be a restart of the virtual machine.

Although Application Monitoring is relatively new and there are only a few partners currently exploring the capabilities, in our opinion, it does add a whole new level of resiliency. Your in-house development team could leverage functionality offered through the API, or you could use a solution developed by one of VMware’s partners. We have tested ApplicationHA by Symantec and personally feel it is the missing link. It enables you as System Admin to integrate your virtualization layer with your application layer. It ensures you as a System Admin that services which are protected are restarted in the correct order and it avoids the common pitfalls associated with restarts and maintenance. Note that VMware also introduced an "Application Monitoring" solution which was based on Hyperic technology; this product however has been deprecated and as such will not be discussed in this publication.

Application Awareness API

The Application Awareness API is open for everyone. We feel that this is not the place to do a full deepdive on how to use it, but we do want to discuss it briefly.

The Application Awareness API allows for anyone to talk to it, including scripts, which makes the possibilities endless. Currently there are 6 functions defined:

VMGuestAppMonitor_Enable()
Enables Monitoring

VMGuestAppMonitor_MarkActive()
Call every 30 seconds to mark application as active

VMGuestAppMonitor_Disable()
Disable Monitoring

VMGuestAppMonitor_IsEnabled()
Returns status of Monitoring

VMGuestAppMonitor_GetAppStatus()
Returns the current application status recorded for the application

VMGuestAppMonitor_Free()
Frees the result of the VMGuestAppMonitor_GetAppStatus() call

These functions can be used by your development team, however App Monitoring also offers a new executable. This allows you to use the functionality App Monitoring offers without the need to compile a full binary. This new command, vmware-appmonitor.exe, takes the following arguments, which are not coincidentally similar to the functions:

enable
disable
markActive
isEnabled
getAppStatus

When running the command vmware-appmonitor.exe, which can be found under "VMware-GuestAppMonitorSDK\bin\win32\", the following output is presented:

Usage: vmware-appmonitor.exe {enable | disable | markActive | isEnabled | getAppStatus}
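As an illustration of how a script could drive this executable, the sketch below wraps the five arguments from the usage line and sends a markActive heartbeat at the 30-second cadence mentioned for VMGuestAppMonitor_MarkActive(). The wrapper functions, the is_healthy callback, and the assumption that the executable is on the PATH are all ours; the tool's output and exit codes are not described in the text, so the sketch does not interpret them.

```python
import subprocess

APPMONITOR = "vmware-appmonitor.exe"  # assumed to be on PATH; location is site-specific
ACTIONS = ("enable", "disable", "markActive", "isEnabled", "getAppStatus")


def build_command(action, executable=APPMONITOR):
    """Build the argument list for one of the documented actions."""
    if action not in ACTIONS:
        raise ValueError(f"unsupported action: {action!r}")
    return [executable, action]


def run_action(action, runner=subprocess.run):
    """Run one action; `runner` can be replaced for dry runs or tests."""
    return runner(build_command(action), capture_output=True, text=True)


def heartbeat(is_healthy, beats, sleep, runner=subprocess.run, interval_s=30):
    """Enable monitoring, then send up to `beats` markActive calls.

    Stops sending as soon as the (hypothetical) health check fails, so
    that HA eventually sees missing application heartbeats.
    """
    run_action("enable", runner)
    sent = 0
    for _ in range(beats):
        if not is_healthy():
            break
        run_action("markActive", runner)
        sent += 1
        sleep(interval_s)
    return sent
```

Inside a guest this could be called as heartbeat(my_check, beats=10, sleep=time.sleep), where my_check is whatever application health test makes sense in your environment.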

As shown, there are multiple ways of leveraging Application Monitoring to enhance resiliency on an application level.


vSphere HA and...

Now that you know how HA works inside out, we want to explain the different integration points between HA, DRS and SDRS.

HA and Storage DRS

vSphere HA informs Storage DRS when a failure has occurred. This is to prevent the relocation of any HA protected virtual machine, meaning a virtual machine that was powered on, but which failed, and has not been restarted yet due to there being insufficient capacity available. Further, Storage DRS is not allowed to Storage vMotion a virtual machine that is owned by a master other than the one vCenter Server is talking to. This is because in such a situation, HA would not be able to reprotect the virtual machine until the master to which vCenter Server is talking is able to lock the datastore again.

Storage vMotion and HA

If a virtual machine fails while it is in the process of being Storage vMotioned and needs to be restarted by HA, the restart process is not started until vCenter informs the master that the Storage vMotion task has completed or has been rolled back. If the source host fails, however, HA will restart the virtual machine as part of the normal workflow. During a Storage vMotion, the HA agent on the host on which the Storage vMotion was initiated masks the failure state of the virtual machine. If, for whatever reason, vCenter is unavailable, the masking will time out after 15 minutes to ensure that the virtual machine will be restarted.

Also note that when a Storage vMotion completes, vCenter will report the virtual machine as unprotected until the master reports it protected again under the new path.

HA and DRS

HA integrates on multiple levels with DRS. It is a huge improvement and it is something that we wanted to stress as it has changed both the behavior and the reliability of HA.

HA and Resource Fragmentation


When a failover is initiated, HA will first check whether there are resources available on the destination hosts for the failover. If, for instance, a particular virtual machine has a very large reservation and the Admission Control Policy is based on a percentage, for example, it could happen that resources are fragmented across multiple hosts. (For more details on this scenario, see Chapter 7.) HA will ask DRS to defragment the resources to accommodate for this virtual machine’s resource requirements. Although HA will request a defragmentation of resources, a guarantee cannot be given. As such, even with this additional integration, you should still be cautious when it comes to resource fragmentation.

Flattened Shares

When shares have been set custom on a virtual machine an issue can arise when that VM needs to be restarted. When HA fails over a virtual machine, it will power-on the virtual machine in the Root Resource Pool. However, the virtual machine’s shares were those configured by a user for it, and not scaled for it being parented under the Root Resource Pool. This could cause the virtual machine to receive either too many or too few resources relative to its entitlement.

A scenario where and when this can occur would be the following:

VM1 has 1000 shares and Resource Pool A has 2000 shares. However Resource Pool A has 2 virtual machines and both virtual machines will have 50% of those “2000” shares. The following diagram depicts this scenario:


Figure 55 - Flatten shares starting point

When the host fails, both VM2 and VM3 will end up on the same level as VM1, the Root Resource Pool. However, as a custom shares value of 10,000 was specified on both VM2 and VM3, they will completely blow away VM1 in times of contention. This is depicted in the following diagram:


Figure 56 - Flatten shares host failure

This situation would persist until the next invocation of DRS would re-parent the virtual machines VM2 and VM3 to their original Resource Pool. To address this issue HA calculates a flattened share value before the virtual machine is failed over. This flattening process ensures that the virtual machine will get the resources it would have received if it had failed over to the correct Resource Pool. This scenario is depicted in the following diagram. Note that both VM2 and VM3 are placed under the Root Resource Pool with a shares value of 1000.


Figure 57 - Flatten shares after host failure before DRS invocation
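The idea behind flattening can be illustrated with a simple proportional calculation: each virtual machine receives the part of its parent pool's shares that corresponds to its own share value within that pool. This is a simplified model that reproduces the numbers in the example, not necessarily the exact algorithm HA uses; the function name is hypothetical.

```python
def flatten_shares(pool_shares, vm_shares):
    """Scale per-VM shares so they can be compared at the root level.

    pool_shares: shares of the resource pool at the root level.
    vm_shares:   mapping of VM name to its custom shares inside the pool.
    """
    total_vm_shares = sum(vm_shares.values())
    return {vm: round(pool_shares * shares / total_vm_shares)
            for vm, shares in vm_shares.items()}


# Resource Pool A has 2000 shares; VM2 and VM3 each have 10,000 custom shares.
flattened = flatten_shares(2000, {"VM2": 10000, "VM3": 10000})
print(flattened)
```

With flattened values, VM2 and VM3 compete with VM1 (1000 shares) on equal terms at the root level instead of overwhelming it with their unscaled 10,000 shares.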

Of course, when DRS is invoked, both VM2 and VM3 will be re-parented under Resource Pool A and will again receive the number of shares they had been originally assigned.

DPM and HA

If DPM is enabled and resources are scarce during an HA failover, HA will use DRS to try to adjust the cluster (for example, by bringing hosts out of standby mode or migrating virtual machines to defragment the cluster resources) so that HA can perform the failovers.

If HA strict Admission Control is enabled (default), DPM will maintain the necessary level of powered-on capacity to meet the configured HA failover capacity. HA places a constraint to prevent DPM from powering down too many ESXi hosts if it would violate the Admission Control Policy.

When HA admission control is disabled, HA will prevent DPM from powering off all but one host in the cluster. A minimum of two hosts are kept up regardless of the resource consumption. The reason this behavior has changed is that it is impossible to restart virtual machines when the only host left in the cluster has just failed.

In a failure scenario, if HA cannot restart some virtual machines, it asks DRS/DPM to try to defragment resources or bring hosts out of standby to allow HA another opportunity to restart the virtual machines. Another change is that DRS/DPM will power-on or keep on hosts needed to address cluster constraints, even if those hosts are lightly utilized. Once again, in order for this to be successful DRS will need to be enabled and configured to fully automated. When not configured to fully automated, user action is required to execute DRS recommendations and allow the restart of virtual machines to occur.


Use Case: Stretched Cluster

In this part we will be discussing a specific infrastructure architecture and how HA, DRS and Storage DRS can be leveraged and should be deployed to increase availability. Be it availability of your workload or the resources provided to your workload, we will guide you through some of the design considerations and decision points along the way. Of course, a full understanding of your environment will be required in order to make appropriate decisions regarding specific implementation details. Nevertheless, we hope that this section will provide a proper understanding of how certain features play together and how these can be used to meet the requirements of your environment and build the desired architecture.

Scenario

The scenario we have chosen is a stretched cluster, also referred to as a VMware vSphere Metro Storage Cluster solution. We have chosen this specific scenario as it allows us to explain a multitude of design and architectural considerations. Although this scenario has been tested and validated in our lab, every environment is unique and our recommendations are based on our experience and your mileage may vary.

A VMware vSphere Metro Storage Cluster (vMSC) configuration is a VMware vSphere certified solution that combines synchronous replication with storage array based clustering. These solutions are typically deployed in environments where the distance between datacenters is limited, often metropolitan or campus environments.

The primary benefit of a stretched cluster model is to enable fully active and workload-balanced datacenters to be used to their full potential. Many customers find this architecture attractive due to the capability of migrating virtual machines with vMotion and Storage vMotion between sites. This enables on-demand and non-intrusive cross-site mobility of workloads. The capability of a stretched cluster to provide this active balancing of resources should always be the primary design and implementation goal.

Stretched cluster solutions offer the benefit of:

Workload mobility
Cross-site automated load balancing
Enhanced downtime avoidance
Disaster avoidance

Technical requirements and constraints


Due to the technical constraints of an online migration of VMs, the following specific requirements, which are listed in the VMware Compatibility Guide, must be met prior to consideration of a stretched cluster implementation:

Storage connectivity using Fibre Channel, iSCSI, NFS, and FCoE is supported.
The maximum supported network latency between sites for the VMware ESXi management networks is 10 ms round-trip time (RTT).
vMotion, and Storage vMotion, supports a maximum of 150 ms latency as of vSphere 6.0, but this is not intended for stretched clustering usage.
The maximum supported latency for synchronous storage replication links is 10 ms RTT. Refer to documentation from the storage vendor because the maximum tolerated latency is lower in most cases. The most commonly supported maximum RTT is 5 ms.
The ESXi vSphere vMotion network has a redundant network link minimum of 250 Mbps.
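The numeric requirements above lend themselves to a simple pre-flight check. The sketch below only encodes the limits stated in the list; the InterSiteLink type, its field names, and the vendor_max_storage_rtt_ms parameter are assumptions made for illustration, and measured values would have to come from your own tooling.

```python
from dataclasses import dataclass

MAX_MGMT_RTT_MS = 10.0      # ESXi management networks
MAX_STORAGE_RTT_MS = 10.0   # synchronous storage replication links
MIN_VMOTION_MBPS = 250.0    # redundant vMotion network link


@dataclass
class InterSiteLink:
    mgmt_rtt_ms: float
    storage_rtt_ms: float
    vmotion_mbps: float
    vmotion_redundant: bool


def check_vmsc_link(link, vendor_max_storage_rtt_ms=MAX_STORAGE_RTT_MS):
    """Return a list of violated requirements (empty when all are met)."""
    problems = []
    if link.mgmt_rtt_ms > MAX_MGMT_RTT_MS:
        problems.append("management network RTT exceeds 10 ms")
    # Storage vendors often tolerate less than 10 ms; use the stricter limit.
    storage_limit = min(MAX_STORAGE_RTT_MS, vendor_max_storage_rtt_ms)
    if link.storage_rtt_ms > storage_limit:
        problems.append(f"storage replication RTT exceeds {storage_limit} ms")
    if link.vmotion_mbps < MIN_VMOTION_MBPS:
        problems.append("vMotion bandwidth is below 250 Mbps")
    if not link.vmotion_redundant:
        problems.append("vMotion network link is not redundant")
    return problems


link = InterSiteLink(mgmt_rtt_ms=4.0, storage_rtt_ms=8.0,
                     vmotion_mbps=1000.0, vmotion_redundant=True)
print(check_vmsc_link(link, vendor_max_storage_rtt_ms=5.0))
```

In the usage example the link meets the generic 10 ms limit but not the 5 ms limit that, according to the text, is the most commonly supported maximum.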

The storage requirements are slightly more complex. A vSphere Metro Storage Cluster requires what is in effect a single storage subsystem that spans both sites. In this design, a given datastore must be accessible (that is, be able to be read and be written to) simultaneously from both sites. Further, when problems occur, the ESXi hosts must be able to continue to access datastores from either array transparently and with no impact to ongoing storage operations.

This precludes traditional synchronous replication solutions because they create a primary-secondary relationship between the active (primary) LUN where data is being accessed and the secondary LUN that is receiving replication. To access the secondary LUN, replication is stopped, or reversed, and the LUN is made visible to hosts. This “promoted” secondary LUN has a completely different LUN ID and is essentially a newly available copy of a former primary LUN. This type of solution works for traditional disaster recovery-type configurations because it is expected that VMs must be started up on the secondary site. The vMSC configuration requires simultaneous, uninterrupted access to enable live migration of running VMs between sites.

The storage subsystem for a vMSC must be able to be read from and write to both locations simultaneously. All disk writes are committed synchronously at both locations to ensure that data is always consistent regardless of the location from which it is being read. This storage architecture requires significant bandwidth and very low latency between the sites in the cluster. Increased distances or latencies cause delays in writing to disk and a dramatic decline in performance. They also preclude successful vMotion migration between cluster nodes that reside in different locations.

Uniform versus Non-Uniform


vMSC solutions are classified into two distinct categories. These categories are based on a fundamental difference in how hosts access storage. It is important to understand the different types of stretched storage solutions because this influences design considerations. The following two main categories are as described on the VMware Hardware Compatibility List:

Uniform host access configuration – ESXi hosts from both sites are all connected to a storage node in the storage cluster across all sites. Paths presented to ESXi hosts are stretched across a distance.
Nonuniform host access configuration – ESXi hosts at each site are connected only to storage node(s) at the same site. Paths presented to ESXi hosts from storage nodes are limited to the local site.

The following in-depth descriptions of both categories clearly define them from architectural and implementation perspectives.

With uniform host access configuration, hosts in datacenter A and datacenter B have access to the storage systems in both datacenters. In effect, the storage area network is stretched between the sites, and all hosts can access all LUNs. NetApp MetroCluster software is an example of uniform storage. In this configuration, read/write access to a LUN takes place on one of the two arrays, and a synchronous mirror is maintained in a hidden, read-only state on the second array. For example, if a LUN containing a datastore is read/write on the array in datacenter A, all ESXi hosts access that datastore via the array in datacenter A. For ESXi hosts in datacenter A, this is local access. ESXi hosts in datacenter B that are running VMs hosted on this datastore send read/write traffic across the network between datacenters. In case of an outage or an operator-controlled shift of control of the LUN to datacenter B, all ESXi hosts continue to detect the identical LUN being presented, but it is now being accessed via the array in datacenter B.

The ideal situation is one in which VMs access a datastore that is controlled (read/write) by the array in the same datacenter. This minimizes traffic between datacenters to avoid the performance impact of reads traversing the interconnect.

The notion of “site affinity” for a VM is dictated by the read/write copy of the datastore. “Site affinity” is also sometimes referred to as “site bias” or “LUN locality.” This means that when a VM has site affinity with datacenter A, its read/write copy of the datastore is located in datacenter A. This is explained in more detail in the “vSphere DRS” subsection of this section.
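A quick way to reason about site affinity in a uniform configuration is to compare, per VM, the site of the host it runs on with the site that holds the read/write copy of its datastore. The sketch below does exactly that with plain dictionaries; all names and the inventory data are invented for illustration and would normally come from your own inventory export.

```python
def cross_site_vms(vm_site, vm_datastore, datastore_rw_site):
    """Return the VMs whose storage I/O traverses the inter-site link.

    vm_site:           VM name -> datacenter of the host it runs on.
    vm_datastore:      VM name -> datastore holding the VM.
    datastore_rw_site: datastore -> datacenter with the read/write copy.
    """
    return sorted(vm for vm, site in vm_site.items()
                  if datastore_rw_site[vm_datastore[vm]] != site)


# Hypothetical inventory: both VMs live on a datastore that is read/write
# in datacenter A, but vm2 currently runs on a host in datacenter B.
vm_site = {"vm1": "A", "vm2": "B"}
vm_datastore = {"vm1": "ds1", "vm2": "ds1"}
datastore_rw_site = {"ds1": "A"}
print(cross_site_vms(vm_site, vm_datastore, datastore_rw_site))
```

VMs reported by this check are candidates for being moved back to the site their datastore has affinity with.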


Figure 58 - Uniform Configuration

With nonuniform host access configuration, hosts in datacenter A have access only to the array within the local datacenter; the array, as well as its peer array in the opposite datacenter, is responsible for providing access to datastores in one datacenter to ESXi hosts in the opposite datacenter. EMC VPLEX is an example of a storage system that can be deployed as a nonuniform storage cluster, although it can also be configured in a uniform manner. VPLEX provides the concept of a “virtual LUN,” which enables ESXi hosts in each datacenter to read and write to the same datastore or LUN. VPLEX technology maintains the cache state on each array so ESXi hosts in either datacenter detect the LUN as local. EMC calls this solution “write anywhere.” Even when two VMs reside on the same datastore but are located in different datacenters, they write locally without any performance impact on either VM. A key point with this configuration is that each LUN or datastore has “site affinity,” also sometimes referred to as “site bias” or “LUN locality.” In other words, if anything happens to the link between the sites, the storage system on the preferred site for a given datastore will be the only one remaining with read/write access to it. This prevents any data corruption in case of a failure scenario.

Figure 59 - Nonuniform Configuration

Our examples use uniform storage because these configurations are currently the most commonly deployed. Many of the design considerations, however, also apply to nonuniform configurations. We point out exceptions when this is not the case.

Scenario Architecture

In this section we will describe the architecture deployed for this scenario. We will also discuss some of the basic configuration and behavior of the various vSphere features. For an in-depth explanation of each respective feature, refer to the HA and the DRS sections of this book. We will make specific recommendations based on VMware best practices and provide operational guidance where applicable. Our failure scenarios explain how these practices prevent or limit downtime.

Infrastructure

The described infrastructure consists of a single vSphere 6.0 cluster with four ESXi 6.0 hosts. These hosts are managed by a single vCenter Server 6.0 instance. The first site is called Frimley; the second site is called Bluefin. The network between Frimley data center and Bluefin data center is a stretched layer 2 network. There is a minimal distance between the sites, as is typical in campus cluster scenarios.

Each site has two ESXi hosts, and the vCenter Server instance is configured with vSphere DRS affinity to the hosts in Bluefin data center. In a stretched cluster environment, only a single vCenter Server instance is used. This is different from a traditional VMware Site Recovery Manager™ configuration in which a dual vCenter Server configuration is required. The configuration of VM-to-host affinity rules is discussed in more detail in the “vSphere DRS” subsection of this document.

Eight LUNs are depicted in the diagram below. Four of these are accessed through the virtual IP address active on the iSCSI storage system in the Frimley data center; four are accessed through the virtual IP address active on the iSCSI storage system in the Bluefin data center.

Figure 60 - Test Environment

Location  Hosts           Datastores  Local Isolation Address
Bluefin   172.16.103.184  Bluefin01   172.16.103.10
          172.16.103.185  Bluefin02   n/a
                          Bluefin03   n/a
                          Bluefin04   n/a
Frimley   172.16.103.182  Frimley01   172.16.103.11
          172.16.103.183  Frimley02   n/a
                          Frimley03   n/a
                          Frimley04   n/a

The vSphere cluster is connected to a stretched storage system in a fabric configuration with a uniform device access model. This means that every host in the cluster is connected to both storage heads. Each of the heads is connected to two switches, which are connected to two similar switches in the secondary location. For any given LUN, one of the two storage heads presents the LUN as read/write via iSCSI. The other storage head maintains the replicated, read-only copy that is effectively hidden from the ESXi hosts.

vSphere Configuration

Our focus in this section is on vSphere HA, vSphere DRS, and vSphere Storage DRS in relation to stretched cluster environments. Design and operational considerations regarding vSphere are commonly overlooked and underestimated. Much emphasis has traditionally been placed on the storage layer, but little attention has been applied to how workloads are provisioned and managed.

One of the key drivers for using a stretched cluster is workload balance and disaster avoidance. How do we ensure that our environment is properly balanced without impacting availability or severely increasing the operational expenditure? How do we build the requirements into our provisioning process and validate periodically that we still meet them? Ignoring the requirements makes the environment confusing to administer and less predictable during the various failure scenarios for which it should be of help.

Each of these three vSphere features has very specific configuration requirements and can enhance environment resiliency and workload availability. Architectural recommendations based on our findings during the testing of the various failure scenarios are given throughout this section.

vSphere HA

The environment has four hosts and a uniform stretched storage solution. A full site failure is one scenario that must be taken into account in a resilient architecture. VMware recommends enabling vSphere HA admission control. Workload availability is the primary driver for most stretched cluster environments, so providing sufficient capacity for a full site failure is recommended. The hosts are equally divided across both sites. To ensure that all workloads can be restarted by vSphere HA on just one site, configuring the admission control policy to 50 percent for both memory and CPU is recommended.

VMware recommends using a percentage-based policy because it offers the most flexibility and reduces operational overhead. Even when new hosts are introduced to the environment, there is no need to change the percentage and no risk of a skewed consolidation ratio due to possible use of VM-level reservations.
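The 50 percent figure follows directly from the host layout. The sketch below is a minimal illustration in Python, assuming identically sized hosts (the text only states that the hosts are equally divided across sites). The function name `failover_capacity_percent` is hypothetical and not part of any VMware API.

```python
def failover_capacity_percent(hosts_per_site):
    """Return the percentage of cluster resources to reserve so that the
    largest site can fail and its workloads still fit on the other hosts.

    Assumption: every host contributes the same CPU and memory capacity.
    """
    total = sum(hosts_per_site.values())
    if total == 0:
        raise ValueError("cluster has no hosts")
    largest_site = max(hosts_per_site.values())
    # Ceiling division, so the reservation never falls short of the site size.
    return -(-100 * largest_site // total)


# The test environment: two hosts in each of the two sites.
print(failover_capacity_percent({"Frimley": 2, "Bluefin": 2}))  # 50
```

With unevenly sized sites the same calculation reserves enough for the larger site, which is why the percentage would need revisiting only if the split between sites changes, not merely when hosts are added evenly.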

The screenshot below shows a vSphere HA cluster configured with admission control enabled and with the percentage-based policy set to 50 percent.

Figure 61 - vSphere HA Configuration

vSphere HA uses heartbeat mechanisms to validate the state of a host. There are two such mechanisms: network heartbeating and datastore heartbeating. Network heartbeating is the primary mechanism for vSphere HA to validate availability of the hosts. Datastore heartbeating is the secondary mechanism used by vSphere HA; it determines the exact state of the host after network heartbeating has failed.

If a host is not receiving any heartbeats, it uses a fail-safe mechanism to detect if it is merely isolated from its master node or completely isolated from the network. It does this by pinging the default gateway. In addition to this mechanism, one or more isolation addresses can be specified manually to enhance reliability of isolation validation. VMware recommends specifying a minimum of two additional isolation addresses, with each address site local.

In our scenario, one of these addresses physically resides in the Frimley data center; the other physically resides in the Bluefin data center. This enables vSphere HA validation for complete network isolation, even in case of a connection failure between sites. The next screenshot shows an example of how to configure multiple isolation addresses. The vSphere HA advanced setting used is das.isolationaddress. More details on how to configure this can be found in VMware Knowledge Base article 1002117.

The minimum number of heartbeat datastores is two and the maximum is five. For vSphere HA datastore heartbeating to function correctly in any type of failure scenario, VMware recommends increasing the number of heartbeat datastores from two to four in a stretched cluster environment. This provides full redundancy for both data center locations. Defining four specific datastores as preferred heartbeat datastores is also recommended, selecting two from one site and two from the other. This enables vSphere HA to heartbeat to a datastore even in the case of a connection failure between sites. Subsequently, it enables vSphere HA to determine the state of a host in any scenario.

Adding an advanced setting called das.heartbeatDsPerHost can increase the number of heartbeat datastores. This is shown in the screenshot below.

Figure 62 - vSphere HA Advanced Settings
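The recommendations for isolation addresses and heartbeat datastores can be captured as a small sanity check. The Python sketch below is illustrative only: the IP addresses come from the test environment table, while the numbered option keys (`das.isolationaddress0`, `das.isolationaddress1`) and the helper `check_ha_settings` are assumptions made for this example, not output from vCenter Server.

```python
# Advanced settings as they might be entered for the test environment.
# Assumption: multiple isolation addresses use a numeric suffix on the key.
ha_advanced_settings = {
    "das.isolationaddress0": "172.16.103.10",  # local to Bluefin
    "das.isolationaddress1": "172.16.103.11",  # local to Frimley
    "das.heartbeatDsPerHost": "4",
}

# Site-local isolation address expected for each site (from the table).
site_isolation_addresses = {
    "Bluefin": "172.16.103.10",
    "Frimley": "172.16.103.11",
}


def check_ha_settings(settings, site_addresses):
    """Return a list of deviations from the recommendations in the text."""
    problems = []
    configured = {
        value
        for key, value in settings.items()
        if key.lower().startswith("das.isolationaddress")
    }
    for site, address in site_addresses.items():
        if address not in configured:
            problems.append(f"no isolation address configured for {site}")
    # Two heartbeat datastores is the minimum, five the maximum.
    per_host = int(settings.get("das.heartbeatDsPerHost", 2))
    if not 2 <= per_host <= 5:
        problems.append("das.heartbeatDsPerHost must be between 2 and 5")
    elif per_host < 4:
        problems.append("fewer than four heartbeat datastores per host")
    return problems


print(check_ha_settings(ha_advanced_settings, site_isolation_addresses))  # []
```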

To designate specific datastores as heartbeat devices, VMware recommends using Select any of the cluster datastores taking into account my preferences. This enables vSphere HA to select any other datastore if the four designated datastores that have been manually selected become unavailable. VMware recommends selecting two datastores in each location to ensure that datastores are available at each site in the case of a site partition.

Figure 63 - Datastore Heartbeating

Permanent Device Loss and All Paths Down Scenarios

As of vSphere 6.0, enhancements have been introduced to enable an automated failover of VMs residing on a datastore that has either an all paths down (APD) or a permanent device loss (PDL) condition. PDL is applicable only to block storage devices.

A PDL condition, as is discussed in one of our failure scenarios, is a condition that is communicated by the array controller to the ESXi host via a SCSI sense code. This condition indicates that a device (LUN) has become unavailable and is likely permanently unavailable. An example scenario in which this condition is communicated by the array is when a LUN is set offline. This condition is used in nonuniform models during a failure scenario to ensure that the ESXi host takes appropriate action when access to a LUN is revoked. When a full storage failure occurs, it is impossible to generate the PDL condition because there is no communication possible between the array and the ESXi host. This state is identified by the ESXi host as an APD condition. Another example of an APD condition is where the storage network has failed completely. In this scenario, the ESXi host also does not detect what has happened with the storage and declares an APD.

To enable vSphere HA to respond to both an APD and a PDL condition, vSphere HA must be configured in a specific way. VMware recommends enabling VM Component Protection (VMCP). After the creation of the cluster, VMCP must be enabled, as is shown below.

Figure 64 - VM Component Protection

The configuration screen can be found as follows:

1. Log in to VMware vSphere Web Client.

2. Click Hosts and Clusters.

3. Click the cluster object.

4. Click the Manage tab.

5. Click vSphere HA and then Edit.

6. Select Protect against Storage Connectivity Loss.

7. Select individual functionality, as described in the following, by opening Failure conditions and VM response.

The configuration for PDL is basic. In the Failure conditions and VM response section, the response following detection of a PDL condition can be configured. VMware recommends setting this to Power off and restart VMs. When this condition is detected, a VM is restarted instantly on a healthy host within the vSphere HA cluster.

For an APD scenario, configuration must occur in the same section, as is shown in the screenshot below. Besides defining the response to an APD condition, it is also possible to alter the timing and to configure the behavior when the failure is restored before the APD timeout has passed.

Figure 65 - VMCP Detailed Configuration

When an APD condition is detected, a timer is started. After 140 seconds, the APD condition is officially declared and the device is marked as APD timeout. When 140 seconds have passed, vSphere HA starts counting. The default vSphere HA timeout is 3 minutes. When the 3 minutes have passed, vSphere HA restarts the impacted VMs, but VMCP can be configured to respond differently if preferred. VMware recommends configuring it to Power off and restart VMs (conservative).
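The two timers add up, so the VM response comes well after the initial loss of paths. The following Python sketch simply sums the default values quoted above; the constant and function names are made up for illustration.

```python
APD_TIMEOUT_SECONDS = 140     # after this, the device is marked as APD timeout
VMCP_DELAY_SECONDS = 3 * 60   # default vSphere HA delay that starts afterwards


def apd_timeline(apd_timeout=APD_TIMEOUT_SECONDS, vmcp_delay=VMCP_DELAY_SECONDS):
    """Return seconds, counted from APD detection, until each milestone."""
    return {
        "apd_timeout_declared": apd_timeout,
        "vm_response": apd_timeout + vmcp_delay,
    }


timeline = apd_timeline()
print(timeline["vm_response"])  # 320 seconds, that is 5 minutes 20 seconds
```

With the defaults, a VM affected by an APD is therefore acted upon roughly 5 minutes and 20 seconds after the condition is first detected.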

Conservative refers to the likelihood that vSphere HA will be able to restart VMs. When set to conservative, vSphere HA restarts the VM that is impacted by the APD only if it detects that a host in the cluster can access the datastore on which the VM resides. In the case of aggressive, vSphere HA attempts to restart the VM even if it doesn’t detect the state of the other hosts. This can lead to a situation in which a VM is not restarted because there is no host that has access to the datastore on which the VM is located.
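The difference between the two policies can be reduced to a single decision. The sketch below is a deliberately simplified Python model of the behavior described in this paragraph, not VMCP's actual implementation; `restart_attempted` and its parameters are hypothetical names.

```python
def restart_attempted(policy, accessible_host_detected):
    """Decide whether the APD-impacted VM is powered off for a restart.

    policy: "conservative" or "aggressive".
    accessible_host_detected: True if vSphere HA has detected another host
    in the cluster that can access the VM's datastore.
    """
    if policy == "conservative":
        # Act only when a restart is known to be possible.
        return accessible_host_detected
    if policy == "aggressive":
        # Act even when the state of the other hosts is not detected.
        return True
    raise ValueError(f"unknown policy: {policy}")


print(restart_attempted("conservative", False))  # False
print(restart_attempted("aggressive", False))    # True
```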

If the APD is lifted and access to the storage is restored before the timeout has passed, vSphere HA does not unnecessarily restart the VM unless explicitly configured to do so. If a response is desired even when the environment has recovered from the APD condition, Response for APD recovery after APD timeout can be configured to Reset VMs. VMware recommends leaving this setting disabled.

With the release of vSphere 5.5, an advanced setting called Disk.AutoremoveOnPDL was introduced. It is enabled by default. This functionality enables vSphere to remove devices that are marked as PDL and helps prevent reaching, for example, the 256-device limit for an ESXi host. However, if the PDL scenario is solved and the device returns, the ESXi host’s storage system must be rescanned before this device appears. VMware recommends disabling Disk.AutoremoveOnPDL in the host advanced settings by setting it to 0.

Figure 66 - Disk.AutoremoveOnPDL

vSphere DRS

vSphere DRS is used in many environments to distribute load within a cluster. It offers many other features that can be very helpful in stretched cluster environments. VMware recommends enabling vSphere DRS to facilitate load balancing across hosts in the cluster. The vSphere DRS load-balancing calculation is based on CPU and memory use. Care should be taken with regard to both storage and networking resources as well as to traffic flow. To avoid storage and network traffic overhead in a stretched cluster environment, VMware recommends implementing vSphere DRS affinity rules to enable a logical separation of VMs. This subsequently helps improve availability. For VMs that are responsible for infrastructure services, such as Microsoft Active Directory and DNS, it assists by ensuring separation of these services across sites.

vSphere DRS affinity rules also help prevent unnecessary downtime, and storage and network traffic flow overhead, by enforcing preferred site affinity. VMware recommends aligning vSphere VM-to-host affinity rules with the storage configuration, that is, setting VM-to-host affinity rules with a preference that a VM run on a host at the same site as the array that is configured as the primary read/write node for a given datastore. For example, in our test configuration, VMs stored on the Frimley01 datastore are set with VM-to-host affinity with a preference for hosts in the Frimley data center. This ensures that in the case of a network connection failure between sites, VMs do not lose connection with the storage system that is primary for their datastore. VM-to-host affinity rules aim to ensure that VMs stay local to the storage primary for that datastore. This coincidentally also results in all read I/Os staying local.

NOTE: Different storage vendors use different terminology to describe the relationship of a LUN to a particular array or controller. For the purposes of this document, we use the generic term “storage site affinity,” which refers to the preferred location for access to a given LUN.

VMware recommends implementing “should rules” because these can be violated by vSphere HA in the case of a full site failure. Availability of services should always prevail. In the case of “must rules,” vSphere HA does not violate the rule set, and this can potentially lead to service outages. In the scenario where a full data center fails, “must rules” do not allow vSphere HA to restart the VMs, because they do not have the required affinity to start on the hosts in the other data center. This necessitates the recommendation to implement “should rules.” vSphere DRS communicates these rules to vSphere HA, and these are stored in a “compatibility list” governing allowed start-up. If a single host fails, VM-to-host “should rules” are ignored by default. VMware recommends configuring vSphere HA rule settings to respect VM-to-host affinity rules where possible. With a full site failure, vSphere HA can restart the VMs on hosts that violate the rules. Availability takes precedence in this scenario.

Figure 67 - HA Affinity Rule Settings
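The practical difference between “should rules” and “must rules” shows up when no host in the preferred site survives. The Python sketch below models only that placement decision, assuming vSphere HA is configured to respect VM-to-host rules where possible; `restart_candidates` is a hypothetical helper, and the host addresses are taken from the test environment table.

```python
def restart_candidates(vm_site, rule_type, surviving_hosts):
    """Return the hosts on which a failed VM may be restarted.

    vm_site: site the VM has affinity with.
    rule_type: "should" or "must".
    surviving_hosts: mapping of host name to site for hosts still available.
    """
    preferred = [h for h, site in surviving_hosts.items() if site == vm_site]
    if preferred:
        return preferred              # the rule can be honoured
    if rule_type == "should":
        return list(surviving_hosts)  # the rule is violated, availability wins
    return []                         # a "must rule" blocks the restart


# Full failure of the Frimley site: only the Bluefin hosts survive.
bluefin_only = {"172.16.103.184": "Bluefin", "172.16.103.185": "Bluefin"}
print(restart_candidates("Frimley", "should", bluefin_only))
print(restart_candidates("Frimley", "must", bluefin_only))  # []
```

In the full site failure case the “should rule” yields both Bluefin hosts, while the “must rule” yields no candidates at all, which is the service outage the text warns about.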

Under certain circumstances, such as massive host saturation coupled with aggressive recommendation settings, vSphere DRS can also violate “should rules.” Although this is very rare, we recommend monitoring for violation of these rules because a violation might impact availability and workload performance.

VMware recommends manually defining “sites” by creating a group of hosts that belong to a site and then adding VMs to these sites based on the affinity of the datastore on which they are provisioned. In our scenario, only a limited number of VMs were provisioned. VMware recommends automating the process of defining site affinity by using tools such as VMware vCenter Orchestrator™ or VMware vSphere PowerCLI™. If automating the process is not an option, use of a generic naming convention is recommended to simplify the creation of these groups. VMware recommends that these groups be validated on a regular basis to ensure that all VMs belong to the group with the correct site affinity.
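A periodic validation of group membership can be as simple as comparing each VM's group with the site implied by its datastore name. The Python sketch below assumes the site-prefixed datastore names used in the test environment (Frimley01, Bluefin02, and so on); the VM names and both helper functions are hypothetical, and in practice the inventory would be exported with a tool such as PowerCLI.

```python
SITES = ("Frimley", "Bluefin")


def site_of(datastore_name, sites=SITES):
    """Derive the storage site affinity from a site-prefixed datastore name."""
    for site in sites:
        if datastore_name.startswith(site):
            return site
    return None


def misplaced_vms(vm_datastore, vm_group):
    """Return VMs whose VM group does not match their datastore's site."""
    return sorted(
        vm
        for vm, datastore in vm_datastore.items()
        if site_of(datastore) != vm_group.get(vm)
    )


# Hypothetical inventory export.
vm_datastore = {"vm-a": "Frimley01", "vm-b": "Bluefin02", "vm-c": "Frimley03"}
vm_group = {"vm-a": "Frimley", "vm-b": "Bluefin", "vm-c": "Bluefin"}
print(misplaced_vms(vm_datastore, vm_group))  # ['vm-c']
```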

The following screenshots depict the configuration used for our scenario. In the first screenshot, all VMs that should remain local to the Bluefin data center are added to the Bluefin VM group.

Figure 68 - VM Group

Next, a Bluefin host group is created that contains all hosts residing in this location.

Figure 69 - Host Group

Next, a new rule is created that is defined as a “should run on rule.” It links the host group and the VM group for the Bluefin location.

Figure 70 - Rule Definition

This should be done for both locations, which should result in two rules.

Figure 71 - VM/Host Rules

Correcting Affinity Rule Violation

vSphere DRS assigns a high priority to correcting affinity rule violations. During invocation, the primary goal of vSphere DRS is to correct any violations and generate recommendations to migrate VMs to the hosts listed in the host group. These migrations have a higher priority than load-balancing moves and are started before them.

vSphere DRS is invoked every 5 minutes by default, but it is also triggered if the cluster detects changes. For instance, when a host reconnects to the cluster, vSphere DRS is invoked and generates recommendations to correct the violation. Our testing has shown that vSphere DRS generates recommendations to correct affinity rule violations within 30 seconds after a host reconnects to the cluster. vSphere DRS is limited by the overall capacity of the vSphere vMotion network, so it might take multiple invocations before all affinity rule violations are corrected.

vSphere Storage DRS

vSphere Storage DRS enables aggregation of datastores to a single unit of consumption from an administrative perspective, and it balances VM disks when defined thresholds are exceeded. It ensures that sufficient disk resources are available to a workload. VMware recommends enabling vSphere Storage DRS with I/O Metric disabled. The use of I/O Metric or VMware vSphere Storage I/O Control is not supported in a vMSC configuration, as is described in VMware Knowledge Base article 2042596.

Figure 72 - Storage DRS Configuration

vSphere Storage DRS uses vSphere Storage vMotion to migrate VM disks between datastores within a datastore cluster. Because the underlying stretched storage systems use synchronous replication, a migration or series of migrations has an impact on replication traffic and might cause the VMs to become temporarily unavailable due to contention for network resources during the movement of disks. Migration to random datastores can also potentially lead to additional I/O latency in uniform host access configurations if VMs are not migrated along with their virtual disks. For example, if a VM residing on a host at site A has its disk migrated to a datastore at site B, it continues operating but with potentially degraded performance. The VM’s disk reads are now subject to the increased latency associated with reading from the virtual iSCSI IP at site B. Reads are subject to intersite latency rather than being satisfied by a local target.

To control if and when migrations occur, VMware recommends configuring vSphere Storage DRS in manual mode. This enables human validation per recommendation as well as recommendations to be applied during off-peak hours, while gaining the operational benefit and efficiency of the initial placement functionality.

VMware recommends creating datastore clusters based on the storage configuration with respect to storage site affinity. Datastores with a site affinity for site A should not be mixed in datastore clusters with datastores with a site affinity for site B. This enables operational consistency and eases the creation and ongoing management of vSphere DRS VM-to-host affinity rules. Ensure that all vSphere DRS VM-to-host affinity rules are updated accordingly when VMs are migrated via vSphere Storage vMotion between datastore clusters and when crossing defined storage site affinity boundaries. To simplify the provisioning process, VMware recommends aligning naming conventions for datastore clusters and VM-to-host affinity rules.

Figure 73 - Datastore Clusters

The naming convention used in our testing gives both datastores and datastore clusters a site-specific name to provide ease of alignment of vSphere DRS host affinity with VM deployment in the corresponding site.
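A site-specific naming convention also makes it easy to spot datastore clusters that mix site affinities. The Python sketch below is one possible check; the datastore cluster names (`Frimley-DSC`, `Bluefin-DSC`) and the function `mixed_affinity_clusters` are invented for this example, and only the datastore names follow the test environment.

```python
def mixed_affinity_clusters(datastore_clusters, sites=("Frimley", "Bluefin")):
    """Return datastore clusters whose members belong to more than one site.

    Assumption: a datastore's site affinity is encoded as a name prefix.
    """
    mixed = []
    for cluster, datastores in datastore_clusters.items():
        found = {site for site in sites for ds in datastores if ds.startswith(site)}
        if len(found) > 1:
            mixed.append(cluster)
    return sorted(mixed)


# Hypothetical layout in which Frimley03 was added to the wrong cluster.
clusters = {
    "Frimley-DSC": ["Frimley01", "Frimley02"],
    "Bluefin-DSC": ["Bluefin01", "Frimley03"],
}
print(mixed_affinity_clusters(clusters))  # ['Bluefin-DSC']
```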

Failure Scenarios

There are many failures that can be introduced in clustered systems. But in a properly architected environment, vSphere HA, vSphere DRS, and the storage subsystem do not detect many of these. We do not address the zero-impact failures, such as the failure of a single network cable, because they are explained in depth in the documentation provided by the storage vendor of the various solutions. We discuss the following “common” failure scenarios:

Single-host failure in Frimley data center
Single-host isolation in Frimley data center
Storage partition
Data center partition
Disk shelf failure in Frimley data center
Full storage failure in Frimley data center
Full compute failure in Frimley data center
Full compute failure in Frimley data center and full storage failure in Bluefin data center
Loss of complete Frimley data center

We also examine scenarios in which specific settings are incorrectly configured. These settings determine the availability and recoverability of VMs in a failure scenario. It is important to understand the impact of misconfigurations such as the following:

Incorrectly configured VM-to-host affinity rules
Incorrectly configured heartbeat datastores
Incorrectly configured isolation address
Incorrectly configured PDL handling
vCenter Server split-brain scenario

Single-Host Failure in Frimley Data Center

In this scenario, we describe the complete failure of a host in Frimley data center. This scenario is depicted below.

Figure 74 - Single-Host Failure Scenario

Result: vSphere HA successfully restarted all VMs in accordance with VM-to-host affinity rules.

Explanation: If a host fails, the cluster’s vSphere HA master node detects the failure because it no longer is receiving network heartbeats from the host. Then the master starts monitoring for datastore heartbeats. Because the host has failed completely, it cannot generate datastore heartbeats; these too are detected as missing by the vSphere HA master node. During this time, a third availability check, pinging the management addresses of the failed host, is conducted. If all of these checks return as unsuccessful, the master declares the missing host as dead and attempts to restart all the protected VMs that had been running on the host before the master lost contact with the host.
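The three checks the master performs can be summarised as a small decision function. The Python sketch below is an illustrative model of the logic described here and in the host isolation scenario, not the actual vSphere HA agent code; the function names and state labels are assumptions.

```python
def classify_host(network_heartbeat, datastore_heartbeat, ping_reply):
    """Classify a host from the master's point of view.

    Each argument is True when the corresponding check succeeds.
    """
    if network_heartbeat:
        return "running"
    if datastore_heartbeat:
        # The host is alive but cannot be reached over the network.
        return "isolated or partitioned"
    if ping_reply:
        # Not all checks failed, so the host is not declared dead.
        return "not declared dead"
    return "dead"


def master_restarts_vms(state):
    """The master restarts protected VMs only for a host declared dead."""
    return state == "dead"


state = classify_host(network_heartbeat=False, datastore_heartbeat=False, ping_reply=False)
print(state, master_restarts_vms(state))  # dead True
```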

The vSphere VM-to-host affinity rules defined on a cluster level are “should rules.” vSphere HA VM-to-host affinity rules should be respected so all VMs are restarted within the correct site.

However, if the host elements of the VM-to-host group are temporarily without resources, or if they are unavailable for restarts for any other reason, vSphere HA can disregard the rules and restart the remaining VMs on any of the remaining hosts in the cluster, regardless of location and rules. If this occurs, vSphere DRS attempts to correct any violated affinity rules at the first invocation and automatically migrates VMs in accordance with their affinity rules to bring VM placement in alignment. VMware recommends manually invoking vSphere DRS after the cause for the failure has been identified and resolved. This ensures that all VMs are placed on hosts in the correct location to avoid possible performance degradation due to misplacement.

Single-Host Isolation in Frimley Data Center

In this scenario, we describe the response to isolation of a single host in Frimley data center from the rest of the network.

Figure 75 - Single-Host Isolation Scenario

Result: VMs remain running because isolation response is configured to leave powered on.

Explanation: When a host is isolated, the vSphere HA master node detects the isolation because it no longer is receiving network heartbeats from the host. Then the master starts monitoring for datastore heartbeats. Because the host is isolated, it generates datastore heartbeats for the secondary vSphere HA detection mechanism. Detection of valid host heartbeats enables the vSphere HA master node to determine that the host is running but is isolated from the network. Depending on the isolation response configured, the impacted host can power off or shut down VMs or can leave them powered on. The isolation response is triggered 30 seconds after the host has detected that it is isolated.

VMware recommends aligning the isolation response to business requirements and physical constraints. From a best practices perspective, leave powered on is the recommended isolation response setting for the majority of environments. Isolated hosts are rare in a properly architected environment, given the built-in redundancy of most modern designs. In environments that use network-based storage protocols, such as iSCSI and NFS, and where networks are converged, the recommended isolation response is power off. In these environments, it is more likely that a network outage that causes a host to become isolated also affects the host’s ability to communicate to the datastores.

If an isolation response different from the recommended leave powered on is selected and a power off or shut down response is triggered, the vSphere HA master restarts VMs on the remaining nodes in the cluster. The vSphere VM-to-host affinity rules defined on a cluster level are “should rules.” However, because the vSphere HA rule settings specify that the vSphere HA VM-to-host affinity rules should be respected, all VMs are restarted within the correct site under “normal” circumstances.

Storage Partition

In this scenario, a failure has occurred on the storage network between data centers, as is depicted below.

Figure 76 - Storage Partition Scenario

Result: VMs remain running with no impact.

Explanation: Storage site affinity is defined for each LUN, and vSphere DRS rules align with this affinity. Therefore, because storage remains available within the site, no VM is impacted.

If for any reason the affinity rule for a VM has been violated and the VM is running on a host in Frimley data center while its disk resides on a datastore that has affinity with Bluefin data center, it cannot successfully issue I/O following an intersite storage partition. This is because the datastore is in an APD condition. In this scenario, the VM can be restarted because vSphere HA is configured to respond to APD conditions. The response occurs after the 3-minute grace period has passed. This 3-minute period starts after the APD timeout of 140 seconds has passed and the APD condition has been declared.

To avoid unnecessary downtime in an APD scenario, VMware recommends monitoring compliance of vSphere DRS rules. Although vSphere DRS is invoked every 5 minutes, this does not guarantee resolution of all affinity rule violations. Therefore, to prevent unnecessary downtime, rigid monitoring is recommended that enables quick identification of anomalies such as a VM’s compute residing in one site while its storage resides in the other site.

Data Center Partition

In this scenario, the Frimley data center is isolated from the Bluefin data center, as is depicted below.

Figure 77 - Data Center Partition Scenario

Result: VMs remain running with no impact.

Explanation: In this scenario, the two data centers are fully isolated from each other. This scenario is similar to both the storage partition and the host isolation scenario. VMs are not impacted by this failure because vSphere DRS rules were correctly implemented and no rules were violated.

vSphere HA follows this logical process to determine which VMs require restarting during a cluster partition:

1. The vSphere HA master node running in Frimley data center detects that all hosts in Bluefin data center are unreachable.

2. It first detects that no network heartbeats are being received.

3. It then determines whether any storage heartbeats are being generated. This check does not detect storage heartbeats because the storage connection between sites also has failed, and the heartbeat datastores are updated only “locally.”

4. Because the VMs with affinity to the remaining hosts are still running, no action is needed for them.

5. Next, vSphere HA determines whether a restart can be attempted. However, the read/write versions of the datastores located in Bluefin data center are not accessible by the hosts in Frimley data center. Therefore, no attempt is made to start the missing VMs.

Similarly, the ESXi hosts in Bluefin data center detect that there is no master available, and they initiate a master election process. After the master has been elected, it tries to determine which VMs had been running before the failure and it attempts to restart them. Because all VMs with affinity to Bluefin data center are still running there, there is no need for a restart. Only the VMs with affinity to Frimley data center are unavailable, and vSphere HA cannot restart them because the datastores on which they are stored have affinity with Frimley data center and are unavailable in Bluefin data center.

If VM-to-host affinity rules have been violated, that is, VMs have been running at a location where their storage is not defined as read/write by default, the behavior changes. The following sequence describes what would happen in that case:

1. The VM with affinity to Frimley data center but residing in Bluefin data center is unable to reach its datastore. This results in the VM being unable to write to or read from disk.

2. In Frimley data center, this VM is restarted by vSphere HA because the hosts in Frimley data center do not detect the instance running in Bluefin data center.

3. Because the datastore is available only to Frimley data center, one of the hosts in Frimley data center acquires a lock on the VMDK and is able to power on this VM.

4. This can result in a scenario in which the same VM is powered on and running in both data centers.

Figure 78 - Ghost VM
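The partition logic above can be replayed with a toy model to see why only a VM that violates its affinity rule ends up running twice. The Python sketch below is a simplification under stated assumptions: each master can reach only the hosts and read/write datastores of its own site, and the VM names and the function `restarts_after_partition` are hypothetical.

```python
def restarts_after_partition(vms, master_site):
    """Return the names of VMs that the master in master_site restarts.

    A VM is restarted when the master has lost contact with the host
    running it and the VM's datastore is read/write in the master's site.
    """
    restarted = []
    for vm in vms:
        lost_contact = vm["running_site"] != master_site
        datastore_reachable = vm["datastore_site"] == master_site
        if lost_contact and datastore_reachable:
            restarted.append(vm["name"])
    return restarted


vms = [
    {"name": "vm-a", "running_site": "Frimley", "datastore_site": "Frimley"},
    {"name": "vm-b", "running_site": "Bluefin", "datastore_site": "Bluefin"},
    # Rule violation: runs in Bluefin, storage has affinity with Frimley.
    {"name": "vm-c", "running_site": "Bluefin", "datastore_site": "Frimley"},
]

print(restarts_after_partition(vms, "Frimley"))  # ['vm-c']
print(restarts_after_partition(vms, "Bluefin"))  # []
```

Here `vm-c` is restarted in Frimley while its original instance is still powered on in Bluefin, which corresponds to the situation in the figure.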

If the APD response is configured to Power off and restart VMs (aggressive), as is recommended in the VM Component Protection section of this white paper, the VM is powered off after the APD timeout and the grace period have passed. This behavior is new in vSphere 6.0.

If the APD response is not correctly configured, two VMs will be running, for the following possible reasons:

The network heartbeat from the host that is running this VM is missing because there is no connection to that site.
The datastore heartbeat is missing because there is no connection to that site.
A ping to the management address of the host that is running the VM fails because there is no connection to that site.
The master located in Frimley data center detects that the VM had been powered on before the failure. Because it is unable to communicate with the VM’s host in Bluefin data center after the failure, it attempts to restart the VM because it cannot detect the actual state.

If the connection between sites is restored, a classic “VM split-brain scenario” will exist. For a short period of time, two copies of the VM will be active on the network, with both having the same MAC address. Only one copy, however, will have access to the VM files, and vSphere HA will detect this. As soon as this is detected, all processes belonging to the VM copy that has no access to the VM files will be killed, as is depicted below.

Figure 79 - Tasks and Events

In this example, the downtime equates to a VM having to be restarted. Proper maintenance of site affinity can prevent this. To avoid unnecessary downtime, VMware recommends close monitoring to ensure that vSphere DRS rules align with datastore site affinity.
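
The monitoring recommended above can be partly automated. The following is a minimal sketch, not a VMware tool: it compares the site where each VM runs with the site that holds the read/write copy of its datastore. The inventory dictionaries, the VM names, and the `find_affinity_violations` helper are hypothetical; in a real environment the data would have to be collected from vCenter.

```python
def find_affinity_violations(vm_host_site, vm_datastore_site):
    """Return names of VMs running at a site other than the one holding
    the read/write copy of their datastore (hypothetical inventory data)."""
    violations = []
    for vm, host_site in sorted(vm_host_site.items()):
        storage_site = vm_datastore_site.get(vm)
        # A VM without a known datastore site cannot be judged; skip it.
        if storage_site is not None and storage_site != host_site:
            violations.append(vm)
    return violations


# Hypothetical example inventory using the two sites from this chapter.
vm_host_site = {"vm-app01": "Frimley", "vm-db01": "Bluefin", "vm-web01": "Bluefin"}
vm_datastore_site = {"vm-app01": "Frimley", "vm-db01": "Frimley", "vm-web01": "Bluefin"}

print(find_affinity_violations(vm_host_site, vm_datastore_site))
```

Any VM reported by such a check is a candidate for the ghost VM situation described earlier and should be moved back to the site of its storage.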

Disk Shelf Failure in Frimley Data Center

In this scenario, one of the disk shelves in Frimley data center has failed. Both Frimley01 and Frimley02 on storage A are impacted.

Figure 80 - Disk Shelf Failure Scenario

Result: VMs remain running with no impact.

Explanation: In this scenario, only a disk shelf in Frimley data center has failed. The storage processor has detected the failure and has instantly switched from the primary disk shelf in Frimley data center to the mirror copy in Bluefin data center. There is no noticeable impact to any of the VMs except for a typical short spike in I/O response time. The storage solution fully detects and handles this scenario. There is no need for a rescan of the datastores or the HBAs because the switchover is seamless and the LUNs are identical from the ESXi perspective.

Full Storage Failure in Frimley Data Center

In this scenario, a full storage system failure has occurred in Frimley data center.

Figure 81 - Full Storage Failure Scenario

Result: VMs remain running with no impact.

Explanation: When the full storage system fails in Frimley data center, a takeover command must be initiated manually. As described previously, we used a NetApp MetroCluster configuration to describe this behavior. This takeover command is particular to NetApp environments; depending on the implemented storage system, the required procedure can differ. After the command has been initiated, the mirrored, read-only copy of each of the failed datastores is set to read/write and is instantly accessible. We have described this process on an extremely high level. For more details, refer to the storage vendor’s documentation.

From the VM perspective, this failover is seamless: The storage controllers handle this, and no action is required from either the vSphere or storage administrator. All I/O now passes across the intersite connection to the other data center because VMs remain running in Frimley data center while their datastores are accessible only in Bluefin data center.

vSphere HA does not detect this type of failure. Although the datastore heartbeat might be lost briefly, vSphere HA does not take action because the vSphere HA master agent checks for the datastore heartbeat only when the network heartbeat is not received for 3 seconds. Because the network heartbeat remains available throughout the storage failure, vSphere HA is not required to initiate any restarts.
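
The order of checks described above can be sketched as follows. This is a simplified model and not the FDM implementation: the function name, its parameters, and the returned labels are assumptions made for illustration, and the additional ping check mentioned earlier in this chapter is left out. Only the 3-second threshold comes from the text.

```python
NETWORK_HEARTBEAT_TIMEOUT_S = 3  # threshold mentioned in the text


def master_view_of_host(seconds_since_network_hb, datastore_hb_seen):
    """Simplified model of how the master interprets heartbeats.

    The datastore heartbeat is consulted only after the network
    heartbeat has been missing for the timeout period.
    """
    if seconds_since_network_hb < NETWORK_HEARTBEAT_TIMEOUT_S:
        # Network heartbeat is current: datastore heartbeat is not checked.
        return "healthy"
    if datastore_hb_seen:
        # Host is alive but unreachable over the management network.
        return "isolated-or-partitioned"
    return "candidate-for-restart"


print(master_view_of_host(seconds_since_network_hb=1, datastore_hb_seen=False))
```

In the full storage failure scenario the first branch applies: the network heartbeat stays current, so a briefly lost datastore heartbeat never leads to a restart.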

Permanent Device Loss

In the scenario shown in the diagram below, a permanent device loss (PDL) condition occurs because datastore Frimley01 has been taken offline for ESXi-01 and ESXi-02. PDL scenarios are uncommon in uniform configurations and are more likely to occur in a nonuniform vMSC configuration. However, a PDL scenario can, for instance, occur when the configuration of a storage group changes, as in the case of this described scenario.

Figure 82 - Permanent Device Loss

Result: VMs are restarted by vSphere HA on ESXi-03 and ESXi-04.

Explanation: When the PDL condition occurs, VMs running on datastore Frimley01 on hosts ESXi-01 and ESXi-02 are killed instantly. They then are restarted by vSphere HA on hosts within the cluster that have access to the datastore, ESXi-03 and ESXi-04 in this scenario. The PDL and killing of the VM world group can be witnessed by following the entries in the vmkernel.log file located in /var/log/ on the ESXi hosts. The following is an excerpt of the vmkernel.log file where a PDL is recognized and appropriate action is taken.

2012-03-14T13:39:25.085Z cpu7:4499)WARNING: VSCSI: 4055: handle 8198(vscsi4:0):opened by wid 4499 (vmm0:fri-iscsi-02) has Permanent Device Loss. Killing world group leader 4491
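
When many hosts are involved it can help to search the log for such entries with a script. The sketch below is an illustration only: the regular expression is derived from the single log line shown above, whose exact spacing is reconstructed, and the message layout may differ between ESXi builds. The helper name `parse_pdl_line` is hypothetical.

```python
import re

# Pattern for the PDL warning shown above; the exact message layout may
# differ between ESXi builds, so treat this as an illustrative assumption.
PDL_PATTERN = re.compile(
    r"^(?P<timestamp>\S+)\s+cpu\d+:\d+\)WARNING: VSCSI:.*"
    r"\((?P<vm>vmm\d+:[^)]+)\) has Permanent Device Loss\. "
    r"Killing world group leader (?P<leader>\d+)"
)


def parse_pdl_line(line):
    """Return timestamp, VM world name and world group leader for a PDL
    warning line, or None if the line does not match."""
    match = PDL_PATTERN.search(line)
    if match is None:
        return None
    return {
        "timestamp": match.group("timestamp"),
        "vm": match.group("vm"),
        "leader": int(match.group("leader")),
    }


sample = (
    "2012-03-14T13:39:25.085Z cpu7:4499)WARNING: VSCSI: 4055: handle 8198(vscsi4:0):"
    "opened by wid 4499 (vmm0:fri-iscsi-02) has Permanent Device Loss. "
    "Killing world group leader 4491"
)
print(parse_pdl_line(sample))
```

Lines that do not match simply return `None`, so the function can be applied to every line of a log file without further filtering.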

VMware recommends configuring Response for Datastore with Permanent Device Loss (PDL) to Power off and restart VMs. This setting ensures that appropriate action is taken when a PDL condition exists. The correct configuration is shown below.

Figure 83 - APD/PDL Configuration

Full Compute Failure in Frimley Data Center

In this scenario, a full compute failure has occurred in Frimley data center.

Figure 84 - Full Compute Failure Scenario

Result: All VMs are successfully restarted in Bluefin data center.

Explanation: The vSphere HA master was located in Frimley data center at the time of the full compute failure at that location. After the hosts in Bluefin data center detected that no network heartbeats had been received, an election process was started. Within approximately 20 seconds, a new vSphere HA master was elected from the remaining hosts. Then the new master determined which hosts had failed and which VMs had been impacted by this failure. Because all hosts at the other site had failed and all VMs residing on them had been impacted, vSphere HA initiated the restart of all of these VMs. vSphere HA can initiate 32 concurrent restarts on a single host, providing a low restart latency for most environments. The only sequencing of start order comes from the broad high, medium, and low categories for vSphere HA. This policy must be set on a per-VM basis. These policies were determined to have been adhered to; high-priority VMs started first, followed by medium-priorityand low-priority VMs.
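
The effect of restart priority and the per-host concurrency limit can be illustrated with a small sketch. It is a deliberate simplification and not how vSphere HA places VMs: it assumes a single target host, ignores resources, and uses hypothetical VM names. Only the three priority categories and the default of 32 concurrent restarts come from the text.

```python
PRIORITY_ORDER = {"high": 0, "medium": 1, "low": 2}
CONCURRENT_RESTARTS_PER_HOST = 32  # default mentioned in the text


def plan_restart_batches(vms, batch_size=CONCURRENT_RESTARTS_PER_HOST):
    """Order VMs by restart priority and cut the list into batches.

    `vms` is a list of (name, priority) tuples. Batching by a single
    per-host limit is a simplification of what HA really does.
    """
    ordered = sorted(vms, key=lambda vm: PRIORITY_ORDER[vm[1]])
    return [ordered[i:i + batch_size] for i in range(0, len(ordered), batch_size)]


vms = [("web01", "low"), ("db01", "high"), ("app01", "medium"), ("dc01", "high")]
for batch in plan_restart_batches(vms, batch_size=2):
    print([name for name, _ in batch])
```

The example uses a batch size of 2 only to keep the output short; with the default limit a host would take up to 32 VMs per batch.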

As part of the test, the hosts at the Frimley data center were again powered on. As soon as vSphere DRS detected that these hosts were available, a vSphere DRS run was invoked. Because the initial vSphere DRS run corrects only the vSphere DRS affinity rule violations, resource imbalance was not corrected until the next full invocation of vSphere DRS. vSphere DRS is invoked by default every 5 minutes or when VMs are powered off or on through the use of the vCenter Web Client.

Loss of Frimley Data Center

In this scenario, a full failure of Frimley data center is simulated.

Figure 85 - Full Data Center Failure Scenario

Result: All VMs were successfully restarted in Bluefin data center.

Explanation: In this scenario, the hosts in Bluefin data center lost contact with the vSphere HA master and elected a new vSphere HA master. Because the storage system had failed, a takeover command had to be initiated on the surviving site, again due to the NetApp-specific process. After the takeover command had been initiated, the new vSphere HA master accessed the per-datastore files that vSphere HA uses to record the set of protected VMs. The vSphere HA master then attempted to restart the VMs that were not running on the surviving hosts in Bluefin data center. In our scenario, all VMs were restarted within 2 minutes after failure and were fully accessible and functional again.

NOTE: By default, vSphere HA stops attempting to start a VM after 30 minutes. If the storage team does not issue a takeover command within that time frame, the vSphere administrator must manually start up VMs after the storage becomes available.

Stretched Cluster using VSAN

This question keeps on coming up over and over again lately: Stretched Cluster using Virtual SAN, can I do it? When Virtual SAN was first released the answer to this question was a clear no; Virtual SAN did not allow a "traditional" stretched deployment using 2 "data" sites and a third "witness" site. A regular Virtual SAN cluster stretched across 3 sites within campus distance, however, was possible. Virtual SAN 6.1, however, introduced support for the "traditional" stretched cluster deployment.

Figure 86 - Stretched Virtual SAN Configuration

Everything learned in this publication also applies to a stretched Virtual SAN cluster, meaning all HA and DRS best practices. At the time of writing there are a couple of differences, though, between a vSphere Metro Storage Cluster and a VSAN Stretched Cluster, and in this section we will call out these differences. Please note that there is an extensive Virtual SAN Stretched Clustering Guide available written by Cormac Hogan, and there is a full Virtual SAN book available written by Cormac Hogan and myself (Duncan Epping). If you want to know more details about Virtual SAN we would like to refer you to these two publications.

The first thing that needs to be looked at is the network. From a Virtual SAN perspective there are clear requirements:

5 ms RTT latency max between data sites
200 ms RTT latency max between data and witness site
Both L3 and L2 are supported between the data sites
    10 Gbps bandwidth is recommended; dependent on the number of VMs this could be lower or higher, more guidance will be provided soon around this!
    Multicast required, which means that if L3 is used, some form of multicast routing is needed.
L3 is expected between data and the witness sites
    100 Mbps bandwidth is recommended; dependent on the number of VMs this could be lower or higher, more guidance will be provided soon around this!
    No multicast required to the witness site.
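
The two latency limits lend themselves to a simple pre-deployment check. The sketch below is a minimal illustration that assumes the round-trip times have already been measured by some other means; the function name and the example values are hypothetical, and only the 5 ms and 200 ms limits come from the list above.

```python
MAX_RTT_DATA_SITES_MS = 5      # between the two data sites
MAX_RTT_WITNESS_MS = 200       # between a data site and the witness site


def check_stretched_vsan_latency(rtt_data_ms, rtt_witness_ms):
    """Return a list of human-readable problems; empty means both
    measured round-trip times are within the limits listed above."""
    problems = []
    if rtt_data_ms > MAX_RTT_DATA_SITES_MS:
        problems.append(
            f"data-site RTT {rtt_data_ms} ms exceeds {MAX_RTT_DATA_SITES_MS} ms"
        )
    if rtt_witness_ms > MAX_RTT_WITNESS_MS:
        problems.append(
            f"witness RTT {rtt_witness_ms} ms exceeds {MAX_RTT_WITNESS_MS} ms"
        )
    return problems


print(check_stretched_vsan_latency(rtt_data_ms=3.2, rtt_witness_ms=250))
```

Bandwidth and multicast requirements are not covered by this check and still need to be validated separately.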

When it comes to HA and DRS the configuration is pretty straightforward. There are a couple of things we want to point out, as they are configuration details which are easy to forget about. Some are discussed in-depth above, some are settings you actually do not use with VSAN. We will point this out in the list below:

Make sure to specify additional isolation addresses, one in each site (das.isolationAddress0–1).
Disable the default isolation address if it can’t be used to validate the state of the environment during a partition (if the gateway isn’t available in both sides).
Disable Datastore heartbeating; without traditional external storage there is no reason to have this.
Enable HA Admission Control and make sure it is set to 50% for CPU and Memory.
Keep VMs local by creating “VM/Host” should rules.
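
The list above can be captured as a small configuration sketch. This is not a vSphere API call: it only builds a plain dictionary, and all key names other than the `das.*` settings, as well as the function name and the example IP addresses, are hypothetical. The VM/Host should rules are not modeled.

```python
def stretched_vsan_ha_settings(isolation_address_site_a, isolation_address_site_b,
                               gateway_usable_in_both_sites):
    """Build the HA settings discussed above as a plain dictionary."""
    advanced = {
        "das.isolationAddress0": isolation_address_site_a,
        "das.isolationAddress1": isolation_address_site_b,
    }
    if not gateway_usable_in_both_sites:
        # Only disable the default gateway check when it cannot validate
        # the environment during a partition.
        advanced["das.usedefaultisolationaddress"] = "false"
    return {
        "advanced_settings": advanced,
        "datastore_heartbeating": False,   # hypothetical key name
        "admission_control": {"cpu_percent": 50, "memory_percent": 50},
    }


# Example with documentation-range addresses (hypothetical values).
settings = stretched_vsan_ha_settings("192.0.2.1", "192.0.2.129", False)
print(settings["advanced_settings"])
```

Keeping the settings in one structure like this makes it easier to review them against the list before applying them through the Web Client.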

That covers most of it, summarized relatively briefly compared to the excellent document Cormac developed with all the details you can wish for. Make sure to read that if you want to know every aspect and angle of a stretched Virtual SAN cluster configuration.

Advanced Settings

There are various types of KB articles and this KB article explains it, but let me summarize it and simplify it a bit to make it easier to digest.

There are various sorts of advanced settings, but for HA three in particular:

das.* -> Cluster level advanced setting.
fdm.* -> FDM host level advanced setting.
vpxd.* -> vCenter level advanced setting.
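
The prefix convention is simple enough to express in a few lines. The following sketch, with a hypothetical helper name, maps a setting name to the level at which it is configured according to the three prefixes listed above.

```python
SETTING_LEVELS = {
    "das.": "cluster",
    "fdm.": "FDM host",
    "vpxd.": "vCenter",
}


def setting_level(name):
    """Return where an HA advanced setting is configured, based on its prefix."""
    for prefix, level in SETTING_LEVELS.items():
        if name.startswith(prefix):
            return level
    return "unknown"


print(setting_level("das.isolationaddress0"))
```

Note that a name such as das.config.fdm.hostTimeout still starts with das. and is therefore a cluster level setting under this convention.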

How do you configure these? Configuring these is typically straightforward, and most of you hopefully know this already; if not, let us go over the steps to help configure your environment as desired.

Cluster Level

In the Web Client:

Click “Hosts and Clusters”
click your cluster object
click the “Manage” tab
click “Settings” and “vSphere HA”
hit the “Edit” button

FDM Host Level

Open up an SSH session to your host and edit “/etc/opt/vmware/fdm/fdm.cfg”

vCenter Level

In the Web Client:

Click “vCenter”
click “vCenter Servers”
select the appropriate vCenter Server and click the “Manage” tab
click “Settings” and “Advanced Settings”

In this section we will primarily focus on the ones most commonly used; a full detailed list can be found in KB 2033250. Please note that each bullet details the version which supports this advanced setting.

das.maskCleanShutdownEnabled - 5.0, 5.1, 5.5
Whether the clean shutdown flag will default to false for an inaccessible and poweredOff VM. Enabling this option will trigger VM failover if the VM's home datastore isn't accessible when it dies or is intentionally powered off.

das.ignoreInsufficientHbDatastore - 5.0, 5.1, 5.5, 6.0
Suppress the host config issue that the number of heartbeat datastores is less than das.heartbeatDsPerHost. Default value is “false”. Can be configured as “true” or “false”.

das.heartbeatDsPerHost - 5.0, 5.1, 5.5, 6.0
The number of required heartbeat datastores per host. The default value is 2; value should be between 2 and 5.

das.failuredetectiontime - 4.1 and prior
Number of milliseconds, timeout time, for isolation response action (with a default of 15000 milliseconds). Pre-vSphere 4.0 it was a general best practice to increase the value to 60000 when an active/standby Service Console setup was used. This is no longer needed. For a host with two Service Consoles or a secondary isolation address a failure detection time of 15000 is recommended.

das.isolationaddress[x] - 5.0, 5.1, 5.5, 6.0
IP address the ESX hosts use to check on isolation when no heartbeats are received, where [x] = 0-9. (See screenshot below for an example.) VMware HA will use the default gateway as an isolation address and the provided value as an additional checkpoint. I recommend adding an isolation address when a secondary service console is being used for redundancy purposes.

das.usedefaultisolationaddress - 5.0, 5.1, 5.5, 6.0
Value can be “true” or “false” and needs to be set to false in case the default gateway, which is the default isolation address, should not or cannot be used for this purpose. In other words, if the default gateway is a non-pingable address, set the “das.isolationaddress0” to a pingable address and disable the usage of the default gateway by setting this to “false”.

das.isolationShutdownTimeout - 5.0, 5.1, 5.5, 6.0
Time in seconds to wait for a VM to become powered off after initiating a guest shutdown, before forcing a power off.

das.allowNetwork[x] - 5.0, 5.1, 5.5
Enables the use of port group names to control the networks used for VMware HA, where [x] = 0-?. You can set the value to be "Service Console 2" or "Management Network" to use (only) the networks associated with those port group names in the networking configuration. In 5.5 this option is ignored when VSAN is enabled, by the way!

das.bypassNetCompatCheck - 4.1 and prior
Disable the “compatible network” check for HA that was introduced with ESX 3.5 Update 2. Disabling this check will enable HA to be configured in a cluster which contains hosts in different subnets, so-called incompatible networks. Default value is “false”; setting it to “true” disables the check.

das.ignoreRedundantNetWarning - 5.0, 5.1, 5.5
Remove the error icon/message from your vCenter when you don’t have a redundant Service Console connection. Default value is “false”; setting it to “true” will disable the warning. HA must be reconfigured after setting the option.

das.vmMemoryMinMB - 5.0, 5.1, 5.5
The minimum default slot size used for calculating failover capacity. Higher values will reserve more space for failovers. Do not confuse with “das.slotMemInMB”.

das.slotMemInMB - 5.0, 5.1, 5.5
Sets the slot size for memory to the specified value. This advanced setting can be used when a virtual machine with a large memory reservation skews the slot size, as this will typically result in an artificially conservative number of available slots.

das.vmCpuMinMHz - 5.0, 5.1, 5.5
The minimum default slot size used for calculating failover capacity. Higher values will reserve more space for failovers. Do not confuse with “das.slotCpuInMHz”.

das.slotCpuInMHz - 5.0, 5.1, 5.5
Sets the slot size for CPU to the specified value. This advanced setting can be used when a virtual machine with a large CPU reservation skews the slot size, as this will typically result in an artificially conservative number of available slots.
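
A rough, memory-only illustration of why a single large reservation skews the slot count follows. It is a simplified model under stated assumptions, not the admission control algorithm: the slot size is assumed to equal the largest memory reservation unless a value for das.slotMemInMB is given, in which case that value simply replaces it. Memory overhead, CPU slots, and das.vmMemoryMinMB are ignored, and the function name and all numbers are hypothetical.

```python
def memory_slots_per_host(host_memory_mb, vm_reservations_mb, slot_mem_in_mb=None):
    """Return (slot_size_mb, slots) for one host under the simplified model."""
    if slot_mem_in_mb is not None:
        slot_size = slot_mem_in_mb           # das.slotMemInMB is set
    else:
        slot_size = max(vm_reservations_mb)  # largest reservation wins
    return slot_size, host_memory_mb // slot_size


reservations = [512, 512, 1024, 16384]  # one VM with a large reservation
print(memory_slots_per_host(65536, reservations))
print(memory_slots_per_host(65536, reservations, slot_mem_in_mb=1024))
```

In this example the single 16384 MB reservation leaves the 65536 MB host with only 4 slots, while a slot size of 1024 MB yields 64.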

das.perHostConcurrentFailoversLimit - 5.0, 5.1, 5.5
By default, HA will issue up to 32 concurrent VM power-ons per host. This setting controls the maximum number of concurrent restarts on a single host. Setting a larger value will allow more VMs to be restarted concurrently but will also increase the average latency to recover as it adds more stress on the hosts and storage.

das.config.log.maxFileNum - 5.0, 5.1, 5.5
Desired number of log rotations.

das.config.log.maxFileSize - 5.0, 5.1, 5.5
Maximum file size in bytes of the log file.

das.config.log.directory - 5.0, 5.1, 5.5
Full directory path used to store log files.

das.maxFtVmsPerHost - 5.0, 5.1, 5.5
The maximum number of primary and secondary FT virtual machines that can be placed on a single host. The default value is 4.

das.includeFTcomplianceChecks - 5.0, 5.1, 5.5
Controls whether vSphere Fault Tolerance compliance checks should be run as part of the cluster compliance checks. Set this option to false to avoid cluster compliance failures when Fault Tolerance is not being used in a cluster.

das.iostatsinterval (VM Monitoring) - 5.0, 5.1, 5.5, 6.0
The I/O stats interval determines if any disk or network activity has occurred for the virtual machine. The default value is 120 seconds.

das.config.fdm.deadIcmpPingInterval - 5.0, 5.1, 5.5
Default value is 10. ICMP pings are used to determine whether a slave host is network accessible when the FDM on that host is not connected to the master. This parameter controls the interval (expressed in seconds) between pings.

das.config.fdm.icmpPingTimeout - 5.0, 5.1, 5.5
Default value is 5. Defines the time to wait in seconds for an ICMP ping reply before assuming the host being pinged is not network accessible.

das.config.fdm.hostTimeout - 5.0, 5.1, 5.5
Default is 10. Controls how long a master FDM waits in seconds for a slave FDM to respond to a heartbeat before declaring the slave host not connected and initiating the workflow to determine whether the host is dead, isolated, or partitioned.

das.config.fdm.stateLogInterval - 5.0, 5.1, 5.5
Default is 600. Frequency in seconds to log cluster state.

das.config.fdm.ft.cleanupTimeout - 5.0, 5.1, 5.5
Default is 900. When a vSphere Fault Tolerance VM is powered on by vCenter Server, vCenter Server informs the HA master agent that it is doing so. This option controls how many seconds the HA master agent waits for the power on of the secondary VM to succeed. If the power on takes longer than this time (most likely because vCenter Server has lost contact with the host or has failed), the master agent will attempt to power on the secondary VM.

das.config.fdm.storageVmotionCleanupTimeout - 5.0, 5.1, 5.
Default is 900. When a Storage vMotion is done in an HA enabled cluster using pre 5.0 hosts and the home datastore of the VM is being moved, HA may interpret the completion of the Storage vMotion as a failure, and may attempt to restart the source VM. To avoid this issue, the HA master agent waits the specified number of seconds for a Storage vMotion to complete. When the Storage vMotion completes or the timer expires, the master will assess whether a failure occurred.

das.config.fdm.policy.unknownStateMonitorPeriod - 5.0, 5.1, 5.5, 6.0
Defines the number of seconds the HA master agent waits after it detects that a VM has failed before it attempts to restart the VM.

das.config.fdm.event.maxMasterEvents - 5.0, 5.1, 5.5
Default is 1000. Defines the maximum number of events cached by the master.

das.config.fdm.event.maxSlaveEvents - 5.0, 5.1, 5.5
Default is 600. Defines the maximum number of events cached by a slave.

That is a long list of advanced settings indeed, and hopefully no one is planning to try them all out on a single cluster, or even on multiple clusters. Avoid using advanced settings as much as possible as it definitely leads to increased complexity, and often to more downtime rather than less.

Summarizing

Hopefully I have succeeded in giving you a better understanding of the internal workings of HA. I hope that this publication has handed you the tools needed to update your vSphere design and ultimately to increase the resiliency and up-time of your environment.

I have tried to simplify some of the concepts to make them easier to understand; still, I acknowledge that some concepts are difficult to grasp, and the amount of architectural changes that vSphere 5 and new functionality that vSphere 6 have brought can be confusing at times. I hope though that after reading this everyone is confident enough to make the required or recommended changes.

If there are any questions please do not hesitate to reach out to me via twitter or my blog, or leave a comment on the online version of this publication. I will do my best to answer your questions.

Changelog

1.0.1 - Minor edits
1.0.2 - Start with VSAN Stretched Cluster in Use case section
1.0.3 - Start with VVol section in VSAN and VVol specifics section
1.0.4 - Update to VVol section and replaced diagram (figure 15)
