VMworld 2013: Operating and Architecting a vSphere Metro Storage Cluster based infrastructure


Post on 18-Jul-2015


Page 1: VMworld 2013: Operating and Architecting a vSphere Metro Storage Cluster based infrastructure

Operating and Architecting a vSphere Metro Storage Cluster based infrastructure

Lee Dilworth, VMware

Duncan Epping, VMware

BCO4872

#BCO4872


Interact!

If you use Twitter, feel free to tweet about this session and use hashtag #BCO4872

Feel free to take pictures, shoot video, and share it on Twitter / Facebook

Blog about it

• We would love to read your thoughts, your opinion, design decisions!


Agenda for Today

Availability Basics

vSphere Metro Storage Cluster Basics

Architecting and Operating

Failure Scenarios

Wrapping up


Availability Basics


Disaster Avoidance

Avoidance NOT Recovery

• Two sites, One vSphere Cluster

• One vCenter manages BOTH sites

• One site effectively put into maintenance mode

• Hot VM Mobility solution

Intra-cluster vMotion


Disaster Recovery

Replication

Recovery NOT avoidance

• Two sites, typically two vSphere Clusters

• Each site is usually managed by its own vCenter

• vMSC solutions CAN support disaster recovery via HA restarts

• Cold VM Mobility Solutions (SRM or vMSC “Federated HA”)


vSphere High Availability – Setting the Baseline

vSphere HA minimizes unplanned downtime

Provides automatic VM recovery in minutes

Protects against various types of failures

• Host failure

• Host network isolation

• Permanent loss of datastore

• VM crashes (including VMX)

• Guest OS / Application crashes / hangs

Does not require complex configuration changes

Is Operating System and application-independent


vSphere 5.0+ Architecture

HA Agent

• Called the Fault Domain Manager (FDM)

• Provides all the HA on-host functionality

Operation

• vCenter Server manages the cluster

• Failover is not dependent on vCenter

Communicate over

• Management Network

• Datastores



Master and Slave Roles

Any host can be master, selected by election

• All others assume the role of slaves

The Master

• Monitors hosts and VMs

• Manages VM restarts after failures

• Reports cluster state to vCenter Server

The Slave

• Forwards critical state changes to the Master

• Restarts VMs when directed by the Master

• Elects new Master



Network Used for Communication

Network is default communication method

• Used for selecting a Master

• Used for heartbeating

• Used for reporting state to vCenter Server

Network Heartbeating

• Used by a Master to monitor the state of a Slave

• When Master receives no heartbeats it will ping the Slave

• When Slave receives no heartbeats from Master it will ping isolation address


Datastores Used for Communication

Datastores are used when management network is not available

• Used to determine state (isolated vs failed)

• Only when a failure has occurred!

• vCenter selects two for each host

Files used on datastores

• host-<id>-hb

• Heartbeat file!

• host-<id>-poweron

• Contains power state of VMs and used to communicate isolation

• First line, either a “0” or a “1” where “1” means isolated

• protectedlist

• Owned by the master, its view of the world
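A minimal sketch of how the network and datastore signals above could combine into a host state. This is not the FDM implementation. The function names, the `HostState` values and the "partitioned" outcome are assumptions added for illustration; only the first-line "0"/"1" convention of the poweron file comes from the slide.

```python
from enum import Enum


class HostState(Enum):
    REACHABLE = "reachable"
    ISOLATED = "isolated"
    PARTITIONED = "partitioned"  # assumption: heartbeating on disk, not flagged isolated
    FAILED = "failed"


def poweron_file_reports_isolation(poweron_text: str) -> bool:
    """First line of host-<id>-poweron: "1" means isolated, "0" means not."""
    lines = poweron_text.splitlines()
    first_line = lines[0].strip() if lines else ""
    return first_line == "1"


def classify_host(network_heartbeat: bool, ping_reply: bool,
                  datastore_heartbeat: bool, poweron_text: str) -> HostState:
    """Simplified master-side view of one slave host."""
    # Datastores are only consulted once the network signals are gone.
    if network_heartbeat or ping_reply:
        return HostState.REACHABLE
    # No network signal and no heartbeat file updates: treat the host as dead.
    if not datastore_heartbeat:
        return HostState.FAILED
    # Host is still alive on storage; the poweron file says whether it is isolated.
    if poweron_file_reports_isolation(poweron_text):
        return HostState.ISOLATED
    return HostState.PARTITIONED
```

The ordering mirrors the slide: the network is the default channel, and the heartbeat datastores only matter after a failure has been observed.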


vSphere Metro Storage Cluster

the Basics (well sort of)


What is a vSphere Metro Storage Cluster

Stretched cluster solution, not a feature!

Requires:

• storage system that “stretches” across sites

• stretched network across sites

Hardware Compatibility List (HCL) – Certified vMSC

• “iSCSI Metro Cluster Storage”

• “FC Metro Cluster Storage”

• “NFS Metro Cluster Storage”


vSphere Metro Storage Cluster – Growing Ecosystem


Typical vSphere vMSC Setup

[Diagram: vCenter, stretched network, vSphere HA cluster, network, vMSC certified storage]


Latency Support Requirements

ESXi management network: max supported latency is 10 milliseconds Round Trip Time (RTT)

• Note: 10ms supported with Enterprise+ licenses only (Metro vMotion), default is 5ms

Synchronous storage replication link: max supported latency is 5 milliseconds RTT

• Note: some storage vendors have different support requirements!
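The latency limits on this slide can be expressed as a small pre-flight check. This is a sketch only. The function name and message wording are made up for illustration, and the 5 ms storage limit should be replaced by whatever your storage vendor actually supports.

```python
from typing import List


def check_vmsc_latency(mgmt_rtt_ms: float, storage_rtt_ms: float,
                       enterprise_plus: bool) -> List[str]:
    """Return a list of latency problems; an empty list means within limits."""
    # 10 ms RTT is only supported with Enterprise+ (Metro vMotion), else 5 ms.
    mgmt_limit_ms = 10.0 if enterprise_plus else 5.0
    # Slide value; some storage vendors have different requirements.
    storage_limit_ms = 5.0

    problems = []
    if mgmt_rtt_ms > mgmt_limit_ms:
        problems.append(
            f"management RTT {mgmt_rtt_ms} ms exceeds {mgmt_limit_ms} ms")
    if storage_rtt_ms > storage_limit_ms:
        problems.append(
            f"storage replication RTT {storage_rtt_ms} ms exceeds {storage_limit_ms} ms")
    return problems


if __name__ == "__main__":
    # 8 ms on the management network is fine with Enterprise+, not without.
    print(check_vmsc_latency(8.0, 3.0, enterprise_plus=True))
    print(check_vmsc_latency(8.0, 3.0, enterprise_plus=False))
```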


When to Use Stretched vSphere Clusters?

Campus / nearby sites

• Sites within Synchronous distance

• Two buildings on a common campus

• Two datacenters within a city

Planned migration important

• Long-distance vMotion for planned maintenance, disaster avoidance, or load balancing

DR Features less critical

• No testing, orchestration, or automation

• VMware HA typically not sufficient for automation – requires scripting / manual process due to VM placement with primary / secondary arrays

• RTOs typically longer


Two Architectures: Uniform Host Access Configuration (1/2)

[Diagram: stretched cluster across Site A and Site B; Storage A LUN (R/W), Storage B LUN (R/O); fabrics linked over FC / IP]


Two Architectures: Non-Uniform Host Access Configuration (2/2)

[Diagram: stretched cluster across Site A and Site B; Storage A LUN (R/W), Storage B LUN (R/W), distributed; fabrics linked over FC / IP]


Defining Some Failure Terminology

All Paths Down (APD) – Aaahhhh where has that device gone?

• Incorrect storage removal i.e. yanked!

• Sudden storage failure

• No time for storage to tell us anything

Permanent Device Loss (PDL) – Aaahhhh the device has gone, OK I understand

• Much nicer than APD, graceful handling of state change

• Storage notifies of device state change via SCSI sense code

• Allows HA to fail over VMs

Split Brain – Hmmm the other half has disappeared, now what?

• Election of second HA master

• Check heartbeat datastore region

• Restart VMs (if needed)


Architecting and Operating

vSphere Metro Storage Cluster


Will Use Our Environment to Illustrate…

Two sites

Four hosts in total

Stretched network

Stretched storage

One vCenter Server

One vSphere HA Cluster

[Diagram: Site A and Site B joined by management network and fabrics; Storage A LUN (R/W), Storage B LUN (R/W), distributed over FC / IP]


HA & DRS – Site Awareness

[Diagram: "What they think….." (DRS, HA, network) versus "What you’ve actually got….." (DRS, HA, "? ?")]


Why Should I Care About Site Awareness?

Operational Simplicity

• Group dependent workloads

• Increase HA predictability

• Reduce impact of full cluster partition

• Orchestrate allocation of workloads to “sites”

• Even distribution & consumption of cluster resources

Alignment with Storage

• Locate VMs above read/write device

• Remove unnecessary east/west IO traffic

• Access anywhere devices, align with partition winner per device


DRS Design Considerations – Affinity Rules (1/2)

DRS Host Group Per Site

DRS VM Group Per Site

Align Dependent VM Workloads


DRS Design Considerations – Affinity Rules (2/2)

Use the “should” rules

• HA does not violate “must” rules, therefore avoid them for these configurations


Storage DRS Design Considerations

Cluster datastores based on “site affinity”

Avoid unnecessary site-to-site migrations

Set Storage DRS to “Manual”, take control, migration *could* impact availability

Align VMs with storage / site boundary

Group *similar* devices!


Network Design Considerations

Network teams usually don’t like the words “Stretch” and “Cluster”

Site-to-Site vMotion – handle carefully

Ingress point to the network? Load balanced / redundant?

Consider application users – site affinity affects data flow too!

Network options are changing (OTV, EoMPLS)

L3 Routing impacts (and options LISP?)

Co-locate Multi-VM applications

Consider east-west traffic



HA Design Considerations – Admission Control

What about Admission Control?

• We typically recommend setting it to 50%, to allow full site fail-over

• Admission control is not a resource management tool

• Only guarantees power-on
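Where the 50% figure comes from can be sketched with a little arithmetic. The assumption here is that "capacity" is any consistent unit (host count, GHz, GB) and that the goal is to survive the loss of either site; the function name is illustrative only.

```python
def site_failover_reserve_pct(site_a_capacity: float, site_b_capacity: float) -> float:
    """Percentage of cluster resources to reserve so either site can fail.

    If the larger site is lost, everything must fit on the smaller one,
    so the reservation equals the larger site's share of the cluster.
    """
    total = site_a_capacity + site_b_capacity
    if site_a_capacity < 0 or site_b_capacity < 0 or total <= 0:
        raise ValueError("capacities must be non-negative and not both zero")
    return 100.0 * max(site_a_capacity, site_b_capacity) / total


if __name__ == "__main__":
    # Two hosts per site, as in the example environment: 50% reserved.
    print(site_failover_reserve_pct(2, 2))
    # An asymmetric cluster needs a larger reservation.
    print(site_failover_reserve_pct(3, 1))
```

For symmetric sites this gives the 50% recommended on the slide. As the slide notes, admission control only guarantees power-on, so this is not a performance guarantee.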


HA Design Considerations – Isolation Response

Isolation response

• Configure it based on your infrastructure!

• We cannot make this decision for you, however…


HA Design Considerations – Isolation Addresses

Isolation addresses

• Specify two, one at each site, using the advanced setting “das.isolationaddress”

• Note that “default gateway” is an isolation address already!

[Diagram: isolation address 01 at one site, isolation address 02 at the other]
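A small sketch of building the per-site isolation address options. The slide names the setting as “das.isolationaddress”. The numbered suffix used here (das.isolationaddress0, das.isolationaddress1, ...) is how multiple addresses are normally told apart, but treat it as an assumption and check your vSphere version's documentation. The helper name and the IP addresses (documentation ranges) are made up.

```python
from typing import Dict, List


def isolation_address_options(site_addresses: List[str]) -> Dict[str, str]:
    """Map one pingable address per site to numbered HA advanced options."""
    if not site_addresses:
        raise ValueError("specify at least one isolation address")
    if len(site_addresses) > 10:
        # Assumption: the numbered suffix runs from 0 to 9.
        raise ValueError("at most 10 isolation addresses")
    return {
        f"das.isolationaddress{index}": address
        for index, address in enumerate(site_addresses)
    }


if __name__ == "__main__":
    # One address in Site A, one in Site B (placeholder IPs).
    for name, value in isolation_address_options(["192.0.2.1", "198.51.100.1"]).items():
        print(name, "=", value)
```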


HA Design Considerations – Heartbeat Datastores

Each site needs a heartbeat datastore defined to ensure each site can update heartbeat region for storage local to that site

With multiple storage systems consider increasing default from 2 to 4 => 2 per site


HA Design Consideration – Restart Order

You can use “restart priority” to determine restart order

This applies even when there is no contention

Only about the order in which restarts occur, not about when a VM is booted
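As a rough illustration of what "order only" means, the sketch below sorts a set of VMs by restart priority. The priority labels and VM names are hypothetical, and nothing here waits for a VM to finish booting before the next one is started, which matches the limitation noted on the slide.

```python
from typing import Dict, List

# Lower rank restarts first; the labels are illustrative.
PRIORITY_RANK = {"high": 0, "medium": 1, "low": 2}


def restart_queue(vm_priorities: Dict[str, str]) -> List[str]:
    """Return VM names in the order restarts would be issued."""
    return sorted(
        vm_priorities,
        key=lambda name: (PRIORITY_RANK[vm_priorities[name]], name),
    )


if __name__ == "__main__":
    print(restart_queue({"web": "low", "db": "high", "app": "medium"}))
```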


Operations - Maintaining the Configuration

Storage Device <-> DRS Affinity Group Mappings

Validate DRS Affinity regularly

Are there VM dependencies? Co-locate!

Remember HA doesn’t speak vApp (won’t respect restart order)

…automate if you can!

Some vendors offer tools
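One way to automate the regular validation is to compare where each VM runs with where its storage is preferred. The sketch below works on plain dictionaries. How you export those mappings from your environment is left open, and every name in it is hypothetical.

```python
from typing import Dict, Tuple


def find_misplaced_vms(vm_host: Dict[str, str],
                       host_site: Dict[str, str],
                       vm_datastore: Dict[str, str],
                       datastore_site: Dict[str, str]) -> Dict[str, Tuple[str, str]]:
    """Return {vm: (running_site, storage_site)} for VMs not aligned with storage."""
    misplaced = {}
    for vm, host in vm_host.items():
        running_site = host_site[host]
        storage_site = datastore_site[vm_datastore[vm]]
        if running_site != storage_site:
            misplaced[vm] = (running_site, storage_site)
    return misplaced


if __name__ == "__main__":
    report = find_misplaced_vms(
        vm_host={"vm1": "esx-a1", "vm2": "esx-b1"},
        host_site={"esx-a1": "Site A", "esx-b1": "Site B"},
        vm_datastore={"vm1": "ds-a", "vm2": "ds-a"},
        datastore_site={"ds-a": "Site A"},
    )
    # vm2 runs in Site B while its datastore is preferred in Site A.
    print(report)
```

Running a check like this on a schedule catches drift after HA restarts or manual migrations, before the next partition exposes it.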


Failure Scenarios


Face Your Fears!

Understand the possibilities

Test them

Test them again and keep going until they feel normal!

[Diagram labels: VM mobility, PARTITION]


Scenario - Single Host Failure (Non-Uniform)

[Diagram: Site A and Site B, management network and fabrics; Storage A LUN (R/W), Storage B LUN (R/W), distributed over FC / IP; one host marked failed (X)]

A normal HA event

No network or datastore heartbeats

Host will be declared dead

All VMs will be restarted

Could violate affinity rules


Scenario - Full Compute Failure in One Site (Non-Uniform)

[Diagram: Site A and Site B, management network and fabrics; Storage A LUN (R/W), Storage B LUN (R/W), distributed over FC / IP; two hosts marked failed (X X)]

Normal HA event

No datastore or network heartbeats

All virtual machines will be restarted

Note, max 32 concurrent restarts per host

“Sequencing” start up order!

Will violate affinity rules! (should rule)
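The 32-concurrent-restarts note gives a feel for how a full-site restart is paced. A back-of-the-envelope sketch, assuming restarts spread evenly over the surviving hosts (real placement will not be perfectly even); the function name is illustrative.

```python
import math

# Per-host limit quoted on the slide.
MAX_CONCURRENT_RESTARTS_PER_HOST = 32


def restart_waves(vms_to_restart: int, surviving_hosts: int) -> int:
    """Estimate how many rounds of concurrent restarts are needed."""
    if surviving_hosts <= 0:
        raise ValueError("need at least one surviving host")
    if vms_to_restart < 0:
        raise ValueError("VM count cannot be negative")
    per_wave = MAX_CONCURRENT_RESTARTS_PER_HOST * surviving_hosts
    return math.ceil(vms_to_restart / per_wave)


if __name__ == "__main__":
    # 200 VMs from the failed site, two surviving hosts: 64 restarts per wave.
    print(restart_waves(200, 2))
```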


Scenario - Storage Partition (Uniform)

[Diagram: stretched cluster across Site A and Site B, management network and fabrics; Storage A LUN (R/W), Storage B LUN (R/O); storage link over FC / IP marked failed (X)]

Virtual machines remained running with no impact!

Will virtual machines be restarted on the other site?

• No, network heartbeats are still exchanged!


Scenario - Storage Partition (Non-Uniform)

[Diagram: stretched cluster across Site A and Site B, management network and fabrics; Storage A LUN (R/W), Storage B LUN (R/W); storage link over FC / IP marked failed (X); one site labelled "preferred", the other receives PDL]

Virtual machines remained running with no impact!

Will virtual machines be restarted on the other site?

• Yes, PDL sense code issued.

• VM will be killed

• HA will detect and restart!


Permanent Device Loss (PDL) Requirements (1/2)

Ensure PDL enhancements are configured

• Cluster Advanced Option

• Set “das.maskCleanShutdownEnabled” to “true”, in advanced settings

• Set to “false” by default in 5.0, change it!

• Set to “true” by default in 5.1 and up


Permanent Device Loss (PDL) Requirements (2/2)

Ensure PDL enhancements are configured

• ESXi Host Level changes

• 5.1 and earlier: Set “disk.terminateVMOnPDLDefault” to “true” in “/etc/vmware/settings”

• 5.5 and up: Set advanced setting “VMkernel.Boot.terminateVMOnPDL”
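The version-dependent PDL settings from these two slides can be summarised as a lookup. This is a sketch for checklists, not a configuration tool: it only reports which setting to review and where. The function names are invented, and the setting names are written in the casing commonly used in VMware documentation, so verify them for your release.

```python
from typing import List, Tuple

Version = Tuple[int, int]  # (major, minor), for example (5, 1)


def pdl_settings_to_review(version: Version) -> List[Tuple[str, str]]:
    """Return (location, setting name) pairs relevant to PDL handling."""
    settings = [("cluster advanced option", "das.maskCleanShutdownEnabled")]
    if version >= (5, 5):
        settings.append(("host advanced setting", "VMkernel.Boot.terminateVMOnPDL"))
    else:
        settings.append(("/etc/vmware/settings", "disk.terminateVMOnPDLDefault"))
    return settings


def mask_clean_shutdown_default(version: Version) -> bool:
    """Default of das.maskCleanShutdownEnabled: false in 5.0, true in 5.1 and up."""
    return version >= (5, 1)


if __name__ == "__main__":
    for version in [(5, 0), (5, 1), (5, 5)]:
        print(version, pdl_settings_to_review(version),
              "default true" if mask_clean_shutdown_default(version) else "default false")
```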


Scenario - Datacenter Partition (Uniform) (1/3)

[Diagram: stretched cluster across Site A and Site B, management network and fabrics; Storage A LUN (R/W), Storage B LUN (R/O), linked over FC / IP; three failure markers (X)]

Virtual machines remained running with no impact!

Remember the affinity rules

Without affinity rules this would result in APD condition…


Scenario - Datacenter Partition (Uniform) (2/3)

[Diagram: stretched cluster across Site A and Site B, management network and fabrics; Storage A LUN (R/W), Storage B LUN (R/O), linked over FC / IP; three failure markers (X)]

Affinity rule was violated

Same VM restarted in Site A

Results in APD for Site B

Same VM

Same IP address

Same name

Yes, could result in weird behavior!


Scenario - Datacenter Partition (Uniform) (3/3)

• VM restarted in site with “storage site-affinity”

• Now you have two active instances of same VM!

• When partition is lifted, VM will be killed!


Scenario - Loss of full datacenter (Non-Uniform)

[Diagram: stretched cluster across Site A and Site B, management network and fabrics; Storage A LUN (R/W), Storage B LUN (R/W), distributed over FC / IP]

All virtual machines will be restarted

Note, in many cases requires manual intervention from a storage perspective!

HA will retry 5 times and has a compatibility list

Run DRS when site returns, to apply affinity rules and balance load!


Wrapping Up


Key Takeaways

Design a cluster that meets your needs, and don’t forget operations!

Understand that HA / DRS play a key part in your vMSC success

Testing is critical, don’t just test the easy stuff!

Document process changes, gain operational acceptance

Do not assume it is “Next > Next > Finish”

Ongoing maintenance/checks will be required

Automate as much as you can!


Questions?


Other VMware Activities Related to This Session

Group Discussions:

BCO1001-GD

Stretched Clusters for Availability with Lee Dilworth

BCO4872


THANK YOU
