
Page 1: vSphere Fault Tolerance - RainFocus · Target Market of vSphere Fault Tolerance No big deal, create a new one (“Cattle”) Bring them back up, but HA restart is enough (“Pattle”)

Yiting Jin, Product Management, VMware
Joe Bruneau, Systems Administrator, General Mills
Sebastian Neagu, Principal Engineer, United Airlines
Rick Stopf, Product Marketing Manager, Honeywell

SER3107PU

#VMworld #SER3107PU

Running on Zero Downtime, Zero Data Loss: Real-Life Cases with vSphere Fault Tolerance Users

VMworld 2017 Content: Not for publication or distribution

Page 2

Disclaimer

• This presentation may contain product features that are currently under development.

• This overview of new technology represents no commitment from VMware to deliver these features in any generally available product.

• Features are subject to change, and must not be included in contracts, purchase orders, or sales agreements of any kind.

• Technical feasibility and market demand will affect final delivery.

• Pricing and packaging for any new technologies or features discussed or presented have not been determined.

#SER107PU CONFIDENTIAL

Page 3

1,000 Host failures per year

Page 4

Target Market of vSphere Fault Tolerance

What happens when each type of workload starts going down?

• “Cattle” (e.g. test VMs): No big deal, create a new one.

• “Pattle” (e.g. standard production VMs): Bring them back up, but an HA restart is enough.

• “Pets” (e.g. apps monitoring acid / chemical pools, apps tracking inventory and revenue generation): Disastrously expensive if there is any data loss or downtime. SAVE AT ALL COST.

“For everything else, there’s HA”: > 0 RPO / RTO is okay

“Workloads where I can’t afford to lose any state or experience downtime”: 0 RPO, 0 RTO

Fault Tolerance… who cares?
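The three tiers above can be read as a simple decision rule. The sketch below is a hypothetical Python helper (`recommend_protection` and its parameters are illustrative, not a VMware API) that follows the slide's wording: a workload that cannot lose any state or cannot tolerate any downtime goes to Fault Tolerance, disposable workloads are simply recreated, and everything else relies on HA.

```python
def recommend_protection(max_rpo_s: float, max_rto_s: float, disposable: bool) -> str:
    """Hypothetical helper mapping a workload's tolerances to the slide's tiers.

    max_rpo_s: how much recent data (in seconds) the workload may lose.
    max_rto_s: how long (in seconds) the workload may be down.
    disposable: True for "cattle" that are simply recreated.
    """
    if disposable:
        # "Cattle": no big deal, create a new one
        return "none: recreate the VM"
    if max_rpo_s == 0 or max_rto_s == 0:
        # "Pets": cannot lose any state or experience downtime
        return "vSphere Fault Tolerance"
    # "Pattle": > 0 RPO / RTO is okay, so an HA restart is enough
    return "vSphere HA"


print(recommend_protection(0, 0, disposable=False))       # inventory tracking app
print(recommend_protection(300, 600, disposable=False))   # standard production VM
print(recommend_protection(3600, 3600, disposable=True))  # test VM
```

The thresholds here are deliberately the slide's own (zero versus non-zero); in practice you would substitute your organization's actual RPO/RTO targets.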

Page 5

What’s a vSphere Admin to do?

“Pets”: disastrously expensive if there is any data loss or downtime. SAVE AT ALL COST.
e.g. apps monitoring acid / chemical pools, apps tracking inventory and revenue generation

1. Spend in-house resources building application protection for each type of mission-critical workload you have

2. Pay extra $$$ for third-party solutions and support, spend time training teams on the technology, and add complexity to availability management

3. “… nah, they’ll never go down.”

4. Enable vSphere Fault Tolerance and pay nothing extra

Page 7

New in vSphere 6.5

• Performance improvements on maximum and average response time

– Reduced maximum latency from 100ms to 12ms, average of 1ms

• Multiple NIC aggregation for improved performance

– e.g. rather than dedicating a single 10 Gb NIC, aggregate multiple 10+ Gb NICs for the FT network

• Interoperability with Distributed Resource Scheduler (DRS)

– DRS takes FT requirements into account when determining optimal initial host placement

Page 8

Using Fault Tolerance with VSAN (vSphere 6.0u1 and later)

• Fault-tolerant VSAN datastore in cluster

• Restart VMs from other hosts in a VSAN cluster

• Preserve storage policies across FT failovers

• Secondary FT VM can be placed on the same VSAN datastore as the primary

• FT primary VM and secondary VM are independent from any replicated VMs for VSAN

• FT and VSAN for Remote and Branch Offices (ROBO)

Page 12

Introduction of Panelists

▪ Joe Bruneau, Systems Administrator, Enterprise Infrastructure, General Mills

▪ Sebastian Neagu, Principal Engineer, United Airlines

▪ Rick Stopf, Product Marketing Manager, Honeywell

Page 13

History with VMware Products and Solutions

• Global footprint

• Number of datacenters, vCenters, hosts globally

Page 14

What Does One Minute of Downtime Mean to You?

• Elaborate on some past experiences when hardware failure was costly

• Host failures vs. Storage failures

Page 15

Describe Your Offerings and Future FT Enablement

• What kinds of applications and workloads do you protect today with Fault Tolerance?

• What are you looking to protect in the future?

Page 16

Alternate Solutions for Protecting Workloads

• Can you talk about alternative solutions, and how your experience with them compared with FT?

• Ease of setup

• Zero downtime / zero data loss

• Ability to integrate with vSphere features such as vMotion, snapshots, backups

• Do you think differently about hardware failure?

• How was performance? Is the tradeoff between performance and zero-data-loss, zero-downtime protection worth it?

Page 17

How Easy was it to Set Up Fault Tolerance?

• Setup through vSphere client

• Networking requirements: FT logging bandwidth

• Storage: redundant VMDKs

• Capacity planning and memory reservation


[Diagram: primary and secondary VMs linked by the FT logging channel; Datastore A and Datastore B]

Page 18

Supported Scalability and Hardware Requirements


• 4 vCPU / 64 GB vRAM per FT VM

• 8 vCPU / 128 GB vRAM per host

• 4 total FT VMs per host

• 16 virtual disks per FT VM

• Maximum virtual disk size: 2 TB

• 10 Gb link for the FT logging network, plus multi-NIC aggregation (a dedicated 10 Gb link is not required, but is recommended)
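The per-VM and per-host limits above can be checked before enabling FT. A minimal Python sketch, assuming the vSphere 6.5 figures from this slide (confirm them against the configuration maximums for your release); the `FtVm` type and the function names are hypothetical. Following the deck's introduction slide ("4 FT VMs (total primary + secondary) per host"), every primary and every secondary placed on a host counts toward that host's totals.

```python
from dataclasses import dataclass, field
from typing import List

# Limits as listed on the slide for vSphere 6.5; verify for your release.
MAX_VCPU_PER_FT_VM = 4
MAX_VRAM_GB_PER_FT_VM = 64
MAX_FT_VCPU_PER_HOST = 8
MAX_FT_VRAM_GB_PER_HOST = 128
MAX_FT_VMS_PER_HOST = 4  # primaries and secondaries both count
MAX_DISKS_PER_FT_VM = 16
MAX_DISK_SIZE_TB = 2


@dataclass
class FtVm:
    """Hypothetical description of one FT primary or secondary VM."""
    name: str
    vcpus: int
    vram_gb: int
    disk_sizes_tb: List[float] = field(default_factory=list)


def vm_violations(vm: FtVm) -> List[str]:
    """Return the per-VM limits this VM exceeds (empty list if none)."""
    problems = []
    if vm.vcpus > MAX_VCPU_PER_FT_VM:
        problems.append(f"{vm.name}: {vm.vcpus} vCPU exceeds {MAX_VCPU_PER_FT_VM}")
    if vm.vram_gb > MAX_VRAM_GB_PER_FT_VM:
        problems.append(f"{vm.name}: {vm.vram_gb} GB vRAM exceeds {MAX_VRAM_GB_PER_FT_VM}")
    if len(vm.disk_sizes_tb) > MAX_DISKS_PER_FT_VM:
        problems.append(f"{vm.name}: {len(vm.disk_sizes_tb)} disks exceeds {MAX_DISKS_PER_FT_VM}")
    for i, size in enumerate(vm.disk_sizes_tb):
        if size > MAX_DISK_SIZE_TB:
            problems.append(f"{vm.name}: disk {i} is {size} TB, over {MAX_DISK_SIZE_TB} TB")
    return problems


def host_violations(vms_on_host: List[FtVm]) -> List[str]:
    """Check host totals; pass every FT primary *and* secondary on the host."""
    problems = []
    if len(vms_on_host) > MAX_FT_VMS_PER_HOST:
        problems.append(f"{len(vms_on_host)} FT VMs exceeds {MAX_FT_VMS_PER_HOST}")
    total_vcpu = sum(vm.vcpus for vm in vms_on_host)
    if total_vcpu > MAX_FT_VCPU_PER_HOST:
        problems.append(f"{total_vcpu} FT vCPU exceeds {MAX_FT_VCPU_PER_HOST}")
    total_vram = sum(vm.vram_gb for vm in vms_on_host)
    if total_vram > MAX_FT_VRAM_GB_PER_HOST:
        problems.append(f"{total_vram} GB FT vRAM exceeds {MAX_FT_VRAM_GB_PER_HOST}")
    return problems


db_primary = FtVm("db01", vcpus=4, vram_gb=64, disk_sizes_tb=[1.5])
app_secondary = FtVm("app01 (secondary)", vcpus=4, vram_gb=32)
print(vm_violations(db_primary))                     # []
print(host_violations([db_primary, app_secondary]))  # [] (exactly 8 vCPU)
```

Returning a list of messages rather than raising on the first problem lets a capacity-planning script report every violation for a host at once.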

Page 19

Technology Preview

• Increasing to 8 vCPU / 128 GB vRAM per FT-protected VM

– Same host scalability: 8 vCPU of FT VMs per host, 4 total FT VMs per host

• Storage Failure Protection for Fault Tolerance

– Integration with VM Component Protection (VMCP)

– Storage APD (All Paths Down) / PDL (Permanent Device Loss) failures will trigger an FT failover instead of a VM restart, with no data loss.

• End of support for Legacy Record & Replay (1-vCPU) Fault Tolerance

• Fault Tolerance with Site Recovery Manager

• Longer term: Stretched Cluster FT

– Collaboration with Distributed Resource Scheduler (DRS) team

Page 21

Summary


• Fault Tolerance provides Zero data loss, Zero downtime protection against host failures

• No extra licensing cost

• No need to change your applications

• Simple to manage with software

▪ FT integration with VSAN

• No extra shared storage setup needed

▪ Technology preview provides storage protection with Fault Tolerance and improved scalability (to 8 vCPU per FT VM)

Page 22

Q & A

Page 23

Related Sessions

Session | Day / Time | Session Type
ELW181107U – vSphere HTML Client SDK - Build a Plugin Workshop | Sunday, 1:30 pm – 3:00 pm | Hands-on Labs
SER3101PU – Acting as One: Plug in to vSphere | Monday, 2:30 pm – 3:30 pm | Panel Discussion
SER3100GU – Discuss Plug-In Experience with the vSphere Client | Tuesday, 11:30 am – 12:30 pm | Group Discussion
SER1411BU – vSphere Clients Roadmap: HTML5 Client, Host Client, and Web Client | Tuesday, 1:00 pm – 2:00 pm | Breakout
SER3084BU – Mind Your Foundation: Extending the Power of the vSphere Platform | Tuesday, 5:30 pm – 6:30 pm | Breakout
SER3107PU – Running on Zero Downtime, Zero Data Loss: Real-Life Cases with vSphere Fault Tolerance Users | Wednesday, 8:30 am – 9:30 am | Panel Discussion
SER1792GU – Discussion of vSphere Web Client (HTML5) and the Transition Experience | Wednesday, 11:30 am – 12:30 pm | Group Discussion
SER2790BU – Journey to a vSphere HTML Client Ecosystem: Deep Dive with Big Switch Networks | Wednesday, 3:30 pm – 4:30 pm | Breakout

Page 25

Follow us on Twitter: @VMwarevSphere, @YitingJin

Page 26

Appendix

Page 27

Improved Fault Tolerance workflow


Simplifying protection for your VMs

Right-click on VM to turn on Fault Tolerance

Page 28

Improved Fault Tolerance workflow


Simplifying protection for your VMs

Right-click on VM to turn on Fault Tolerance

Select datastore for VM configuration files

Page 29

Improved Fault Tolerance workflow


Simplifying protection for your VMs

Right-click on VM to turn on Fault Tolerance

Select datastore for VM configuration files

Select another host in the HA cluster to place the secondary VM

Page 31

Fault Tolerance: Introduction

▪ Continuous availability for all FT-protected workloads

▪ Protect mission-critical applications from vSphere host failure

▪ RPO = 0, RTO = 0, no loss of TCP connections

▪ Any OS, any application

▪ Supports workloads on vSphere Standard and above: 4 vCPU and 64 GB vRAM per VM; 8 vCPU per host, with 4 FT VMs (primaries + secondaries in total) per host

▪ Simple configuration: point and click to select a VM and enable FT protection

[Diagram: primary and secondary VMs linked by the FT logging channel]

Page 32

Redundant Storage

▪ The primary and secondary have separate VMX and VMDK files; changes are continuously mirrored to the secondary

▪ FT creates a second copy of the VMDKs

• The copies can be located on separate datastores for further fault-domain isolation

[Diagram: primary and secondary VMs linked by the FT logging channel; Datastore A and Datastore B]

Page 33

Failover

[Diagram: primary and secondary linked by the FT logging channel; after the failure, a new primary and a new secondary linked by the FT logging channel]

▪ Failure occurs: the secondary VM becomes the primary

▪ HA starts a new secondary VM on a new host

▪ HA initiates a new FT migration on the primary VM to set up FT protection again
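The failover sequence above can be modeled as a small state transition. This is a hypothetical Python sketch (`FtPair` and `handle_host_failure` are illustrative names, not part of any VMware API) that tracks which host runs the primary and which runs the secondary. It picks the first eligible host for the new secondary, whereas real HA placement also accounts for resources and other constraints. It also handles the loss of the secondary's host, which the slide does not show, under the assumption that the same re-protection step applies.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class FtPair:
    """Hypothetical model of where an FT primary and secondary are running."""
    primary_host: str
    secondary_host: Optional[str]  # None while the VM is unprotected


def handle_host_failure(pair: FtPair, failed_host: str, cluster_hosts: List[str]) -> FtPair:
    """Model the failover steps on the slide for a single FT-protected VM."""
    if failed_host == pair.primary_host:
        if pair.secondary_host is None:
            raise RuntimeError("no secondary available to take over")
        # Step 1: the secondary becomes the primary, with no state lost.
        survivor = pair.secondary_host
    elif failed_host == pair.secondary_host:
        # Assumed case: only the secondary was lost; the primary keeps running.
        survivor = pair.primary_host
    else:
        return pair  # this VM's pair was not affected

    # Steps 2-3: HA places a new secondary on another host and FT protection
    # is re-established. Picking the first eligible host is a simplification.
    candidates = [h for h in cluster_hosts if h not in (failed_host, survivor)]
    new_secondary = candidates[0] if candidates else None
    return FtPair(primary_host=survivor, secondary_host=new_secondary)


pair = FtPair("esx01", "esx02")
print(handle_host_failure(pair, "esx01", ["esx01", "esx02", "esx03"]))
```

If no third host is available, the sketch leaves the VM running on the survivor with `secondary_host=None`, i.e. unprotected until capacity returns.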

Page 34

Why Fault Tolerance Adds Network Latency

(Why 0-downtime and 0-data loss aren’t free)

To achieve zero downtime and zero data loss:

▪ Any data generated by the primary is not transmitted to the outside world until that data has been completely replicated to the secondary

▪ Outgoing network packets are batched; once the primary and secondary reach agreement, the packets are released en masse at every checkpoint

▪ This adds a varying degree of latency and jitter to every network packet

[Diagram: FT pair and the external network]
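The batching described above is a form of output commit. The sketch below is a simplified, hypothetical Python model (class and function names are illustrative, and the deck does not describe vSphere's actual checkpoint interval or internals): outgoing packets are queued until the secondary acknowledges a checkpoint, and the delay each packet incurs depends on where it falls relative to the next acknowledgement, which is where the jitter comes from. The timings in the example are made-up values.

```python
import bisect
from typing import List


class OutputCommitBuffer:
    """Hypothetical model of FT output commit: packets leave only after a checkpoint."""

    def __init__(self) -> None:
        self._pending: List[bytes] = []  # generated by the primary, not yet released
        self.released: List[bytes] = []  # what the outside world has actually seen

    def send(self, packet: bytes) -> None:
        # The guest believes it sent the packet, but it is only queued.
        self._pending.append(packet)

    def checkpoint_acknowledged(self) -> int:
        # The secondary now holds the state that produced these packets,
        # so the whole batch is released at once.
        count = len(self._pending)
        self.released.extend(self._pending)
        self._pending.clear()
        return count


def added_delay_ms(send_times_ms: List[float], ack_times_ms: List[float]) -> List[float]:
    """Delay each packet incurs waiting for the next checkpoint acknowledgement.

    ack_times_ms must be sorted, and the last ack must come after the last send.
    """
    delays = []
    for t in send_times_ms:
        i = bisect.bisect_right(ack_times_ms, t)  # first ack strictly after t
        delays.append(ack_times_ms[i] - t)
    return delays


buf = OutputCommitBuffer()
buf.send(b"reply-1")
buf.send(b"reply-2")
print(buf.released)                        # []
print(buf.checkpoint_acknowledged())       # 2
print(added_delay_ms([1, 4, 9], [5, 10]))  # [4, 1, 1]
```

Packets sent just after a checkpoint wait almost a full interval while packets sent just before one wait very little, so the added latency varies per packet rather than being a fixed cost.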

Page 35

Best Practices and Hardware Requirements

▪ Requires Intel Sandy Bridge / AMD Bulldozer or later

▪ Improved performance on newer processor generations

▪ A 10 Gb NIC is recommended for a separate FT logging network

Configuration requirements

▪ VMs to be protected by FT must be in an HA cluster

▪ Shared storage for the configuration file and tiebreaker (witness / arbiter) files, so that both the primary and secondary VMs can access them

▪ 2 separate VMDKs for redundancy: 1 for the primary VM, 1 for the secondary VM

▪ VMDKs can be on local storage, but placing them on shared storage lets multiple hosts restart secondary VMs
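The configuration requirements above lend themselves to a pre-flight check. A minimal, hypothetical Python sketch: the `FtConfig` fields are assumptions you would populate yourself (for example from your own inventory tooling), not a VMware API, and the CPU-generation requirement is left out. Placing both VMDK copies on one datastore is reported only as a warning, since the deck notes that the secondary can share a VSAN datastore with the primary.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class FtConfig:
    """Hypothetical summary of one VM's FT-related configuration."""
    in_ha_cluster: bool
    config_files_on_shared_storage: bool      # VM configuration file
    tiebreaker_files_on_shared_storage: bool  # witness / arbiter files
    primary_vmdk_datastore: str
    secondary_vmdk_datastore: str
    ft_logging_nic_gbps: float


def check_ft_config(cfg: FtConfig) -> Tuple[List[str], List[str]]:
    """Return (errors, warnings) based on the requirements listed on the slide."""
    errors: List[str] = []
    warnings: List[str] = []
    if not cfg.in_ha_cluster:
        errors.append("VM must be in an HA cluster")
    if not cfg.config_files_on_shared_storage:
        errors.append("configuration file must be on shared storage")
    if not cfg.tiebreaker_files_on_shared_storage:
        errors.append("tiebreaker files must be on shared storage")
    if cfg.primary_vmdk_datastore == cfg.secondary_vmdk_datastore:
        # Allowed, but both VMDK copies then share one storage fault domain.
        warnings.append("both VMDK copies are on the same datastore")
    if cfg.ft_logging_nic_gbps < 10:
        warnings.append("a 10 Gb NIC is recommended for FT logging")
    return errors, warnings


cfg = FtConfig(True, True, True, "DatastoreA", "DatastoreB", 10)
print(check_ft_config(cfg))  # ([], [])
```

Separating hard errors from warnings mirrors the slide's split between configuration requirements and best practices.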

Page 36

More info on FT in vSphere 6.0

▪ Best practices for deploying SMP-FT in vSphere 6:

http://www.vmware.com/techpapers/2015/performance-best-practices-for-vmware-vsphere-60-10480.html

▪ vSphere 6 FT Performance Paper:

http://blogs.vmware.com/performance/2016/01/vsphere6-fault-tolerance-perf.html