![Page 1: BCO2874 vSphere High Availability 5.0 and SMP Fault Tolerance – Technical Overview and Roadmap Name, Title, Company](https://reader035.vdocument.in/reader035/viewer/2022062314/56649da75503460f94a92c00/html5/thumbnails/1.jpg)
BCO2874
vSphere High Availability 5.0 and SMP Fault Tolerance – Technical Overview and Roadmap
Name, Title, Company
![Page 2: BCO2874 vSphere High Availability 5.0 and SMP Fault Tolerance – Technical Overview and Roadmap Name, Title, Company](https://reader035.vdocument.in/reader035/viewer/2022062314/56649da75503460f94a92c00/html5/thumbnails/2.jpg)
2
Disclaimer
This session may contain product features that are currently under development.
This session/overview of the new technology represents no commitment from VMware to deliver these features in any generally available product.
Features are subject to change, and must not be included in contracts, purchase orders, or sales agreements of any kind.
Technical feasibility and market demand will affect final delivery.
Pricing and packaging for any new technologies or features discussed or presented have not been determined.
![Page 3: BCO2874 vSphere High Availability 5.0 and SMP Fault Tolerance – Technical Overview and Roadmap Name, Title, Company](https://reader035.vdocument.in/reader035/viewer/2022062314/56649da75503460f94a92c00/html5/thumbnails/3.jpg)
3
vSphere HA and FT Today
Minimize downtime without the cost/complexity of traditional solutions
vSphere HA provides rapid recovery from outages
vSphere Fault Tolerance provides continuous availability
[Figure: chart of coverage (Hardware, Guest OS, Application) against downtime (none to minutes). Labels shown: Fault Tolerance, App Monitoring APIs, Guest Monitoring, Partner solutions, VM, Infrastructure HA.]
![Page 4: BCO2874 vSphere High Availability 5.0 and SMP Fault Tolerance – Technical Overview and Roadmap Name, Title, Company](https://reader035.vdocument.in/reader035/viewer/2022062314/56649da75503460f94a92c00/html5/thumbnails/4.jpg)
4
This Talk
1. Technical overview of vSphere HA 5.0
• Presented by Keith Farkas
2. Technical preview of vSphere Fault Tolerance SMP
• Presented by Jim Chow
[Figure: the coverage vs. downtime chart from the previous slide, with the labels "HA 5.0" and "Multiple vCPU FT" added.]
![Page 5: BCO2874 vSphere High Availability 5.0 and SMP Fault Tolerance – Technical Overview and Roadmap Name, Title, Company](https://reader035.vdocument.in/reader035/viewer/2022062314/56649da75503460f94a92c00/html5/thumbnails/5.jpg)
5
vSphere HA 5.0
Objectives
Learn about the enhancements in vSphere HA 5.0
Understand the new architecture
Identify questions for the breakout / expert sessions
![Page 6: BCO2874 vSphere High Availability 5.0 and SMP Fault Tolerance – Technical Overview and Roadmap Name, Title, Company](https://reader035.vdocument.in/reader035/viewer/2022062314/56649da75503460f94a92c00/html5/thumbnails/6.jpg)
6
vSphere HA 5.0
vSphere HA was completely rewritten in 5.0 to
• Simplify setting up HA clusters and managing them
• Enable more flexible and larger HA deployments
• Make HA more robust and easier to troubleshoot
• Support network partitions
5.0 architecture is fundamentally different
• This talk
• Describes the three key concepts
• Summarizes host failure responses
• To learn more, see other VMworld HA venues
![Page 7: BCO2874 vSphere High Availability 5.0 and SMP Fault Tolerance – Technical Overview and Roadmap Name, Title, Company](https://reader035.vdocument.in/reader035/viewer/2022062314/56649da75503460f94a92c00/html5/thumbnails/7.jpg)
7
5.0 Architecture
New vSphere HA agent
• Called the Fault Domain Manager (FDM)
• Provides all the HA on-host functionality
As in previous releases
• vCenter Server (VC) manages the cluster
• Failover operations are independent of VC
• FDMs communicate over management network
[Figure: vCenter Server (VC) and four FDM agents.]
![Page 8: BCO2874 vSphere High Availability 5.0 and SMP Fault Tolerance – Technical Overview and Roadmap Name, Title, Company](https://reader035.vdocument.in/reader035/viewer/2022062314/56649da75503460f94a92c00/html5/thumbnails/8.jpg)
8
Key Concepts – Part 1
• FDM roles and responsibilities
• Inter-FDM communication
![Page 9: BCO2874 vSphere High Availability 5.0 and SMP Fault Tolerance – Technical Overview and Roadmap Name, Title, Company](https://reader035.vdocument.in/reader035/viewer/2022062314/56649da75503460f94a92c00/html5/thumbnails/9.jpg)
9
FDM Master
One FDM is chosen to be the master
• Normally, one master per cluster
• All others assume the role of FDM slaves
Any FDM can be chosen as master
• No longer a primary / secondary role concept
• Selection done using an election
Master-specific responsibilities
• Monitors availability of hosts / VMs in cluster
• Manages VM restarts after VM/host failures
• Reports cluster state / failover actions to VC
• Manages persisted state
[Figure: vCenter Server (VC) with one master and three slaves.]
![Page 10: BCO2874 vSphere High Availability 5.0 and SMP Fault Tolerance – Technical Overview and Roadmap Name, Title, Company](https://reader035.vdocument.in/reader035/viewer/2022062314/56649da75503460f94a92c00/html5/thumbnails/10.jpg)
10
FDM Slave and Shared Responsibilities
Slave-specific responsibilities
• Forwards critical state changes to the master
• Restarts VMs when directed by the master
• If the master should fail, participates in master election
Each FDM (master or slave)
• Monitors the state of local VMs and the host
• Implements the VM/App Monitoring feature
[Figure: one master and three slaves.]
![Page 11: BCO2874 vSphere High Availability 5.0 and SMP Fault Tolerance – Technical Overview and Roadmap Name, Title, Company](https://reader035.vdocument.in/reader035/viewer/2022062314/56649da75503460f94a92c00/html5/thumbnails/11.jpg)
11
The Master Election
An election is held when:
• vSphere HA is enabled
• Master’s host becomes inactive
• HA is reconfigured on master’s host
• A management network partition occurs
If multiple masters can communicate, all but one will abdicate
Master-election algorithm
• Takes 15 to 25 s (depends on reason for election)
• Elects participating host with the greatest number of mounted datastores
[Figure: four hosts (ESX 1 to ESX 4), each running an FDM agent.]
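The selection rule on this slide can be illustrated with a minimal Python sketch. It only models the stated criterion (the participating host with the greatest number of mounted datastores wins). The `Host` class, the host names, the datastore counts, and the tie-break by name are all assumptions made for illustration; the slide does not say how ties are resolved, and this is not the FDM implementation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Host:
    """A host taking part in the election (hypothetical model)."""
    name: str
    mounted_datastores: int


def elect_master(participants: list[Host]) -> Host:
    """Return the participant with the most mounted datastores.

    Breaking ties by the highest name is an assumption of this sketch,
    not something stated in the talk.
    """
    if not participants:
        raise ValueError("an election needs at least one participating host")
    return max(participants, key=lambda h: (h.mounted_datastores, h.name))


# Hypothetical four-host cluster.
hosts = [
    Host("esx1", 4),
    Host("esx2", 6),
    Host("esx3", 6),
    Host("esx4", 2),
]
master = elect_master(hosts)
slaves = [h for h in hosts if h != master]
```

Every host other than the winner takes the slave role, which matches the "one master per cluster" statement earlier in the talk.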
![Page 12: BCO2874 vSphere High Availability 5.0 and SMP Fault Tolerance – Technical Overview and Roadmap Name, Title, Company](https://reader035.vdocument.in/reader035/viewer/2022062314/56649da75503460f94a92c00/html5/thumbnails/12.jpg)
12
Agent Communication
[Figure: one master and three slaves.]
FDMs communicate over the
• Management networks
• Datastores
Datastores used when network is unavailable
• Used when hosts are isolated or partitioned
Network communication
• All communication is point to point
• Election is conducted using UDP
• All master-slave communication is via SSL encrypted TCP
![Page 13: BCO2874 vSphere High Availability 5.0 and SMP Fault Tolerance – Technical Overview and Roadmap Name, Title, Company](https://reader035.vdocument.in/reader035/viewer/2022062314/56649da75503460f94a92c00/html5/thumbnails/13.jpg)
13
Questions Answered Using Datastore Communication
Master
• Is a slave partitioned or isolated?
• Are its VMs running?
Slave
• Is a master responsible for my VM?
![Page 14: BCO2874 vSphere High Availability 5.0 and SMP Fault Tolerance – Technical Overview and Roadmap Name, Title, Company](https://reader035.vdocument.in/reader035/viewer/2022062314/56649da75503460f94a92c00/html5/thumbnails/14.jpg)
14
Questions Answered Using Datastore Communication
Master
• Is a slave partitioned or isolated?
• Are its VMs running?
• Datastores used: datastores selected by VC, called the Heartbeat Datastores
Slave
• Is a master responsible for my VM?
• Datastores used: datastores containing VM config files
![Page 15: BCO2874 vSphere High Availability 5.0 and SMP Fault Tolerance – Technical Overview and Roadmap Name, Title, Company](https://reader035.vdocument.in/reader035/viewer/2022062314/56649da75503460f94a92c00/html5/thumbnails/15.jpg)
15
Heartbeat Datastores
VC chooses (by default) two datastores for each host
You can override the selection or provide preferences
• Use the cluster “edit settings” dialog for this purpose
![Page 16: BCO2874 vSphere High Availability 5.0 and SMP Fault Tolerance – Technical Overview and Roadmap Name, Title, Company](https://reader035.vdocument.in/reader035/viewer/2022062314/56649da75503460f94a92c00/html5/thumbnails/16.jpg)
16
Responses to Network or Host Failures
![Page 17: BCO2874 vSphere High Availability 5.0 and SMP Fault Tolerance – Technical Overview and Roadmap Name, Title, Company](https://reader035.vdocument.in/reader035/viewer/2022062314/56649da75503460f94a92c00/html5/thumbnails/17.jpg)
17
Host Is Declared Dead
Master declares a host dead when:
• Master can’t communicate with it over the network
• Host is not connected to master
• Host does not respond to ICMP pings
• Master observes no storage heartbeats
Results in:
• Master attempts to restart all VMs from host
• Restarts on network-reachable hosts and its own host
[Figure: four hosts (ESX 1 to ESX 4), each running an FDM agent.]
![Page 18: BCO2874 vSphere High Availability 5.0 and SMP Fault Tolerance – Technical Overview and Roadmap Name, Title, Company](https://reader035.vdocument.in/reader035/viewer/2022062314/56649da75503460f94a92c00/html5/thumbnails/18.jpg)
18
Host Is Network Partitioned
Master declares a host partitioned when:
• Master can’t communicate with it over the network
• Master can see its storage heartbeats
Results in:
• One master exists in each partition
• VC reports one master’s view of the cluster
• Only one master “owns” any one VM
• A VM running in the “other” partition will be
• monitored via the heartbeat datastores
• restarted if it fails (in master’s partition)
• When partition is resolved, all but one master abdicates
[Figure: four hosts (ESX 1 to ESX 4), each running an FDM agent.]
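The dead and partitioned conditions from the last two slides can be combined into one decision, sketched below in Python. The `Observation` fields and the `REACHABLE` catch-all state are names invented for this sketch. It assumes "can't communicate over the network" means the host is neither connected to the master nor answering pings, and it leaves out isolation, which the talk covers separately.

```python
from dataclasses import dataclass
from enum import Enum


class HostState(Enum):
    REACHABLE = "reachable"      # catch-all for cases the slides do not cover
    PARTITIONED = "partitioned"
    DEAD = "dead"


@dataclass
class Observation:
    """What a master currently observes about one host (hypothetical fields)."""
    connected_to_master: bool
    responds_to_ping: bool
    storage_heartbeats_seen: bool


def classify(obs: Observation) -> HostState:
    """Apply the dead / partitioned rules as listed on the slides."""
    network_reachable = obs.connected_to_master or obs.responds_to_ping
    if network_reachable:
        return HostState.REACHABLE
    # No network contact: storage heartbeats decide between the two states.
    if obs.storage_heartbeats_seen:
        return HostState.PARTITIONED
    return HostState.DEAD


dead = classify(Observation(False, False, False))
partitioned = classify(Observation(False, False, True))
```

The sketch shows why heartbeat datastores matter: without them, a master could not tell a partitioned host from a dead one.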
![Page 19: BCO2874 vSphere High Availability 5.0 and SMP Fault Tolerance – Technical Overview and Roadmap Name, Title, Company](https://reader035.vdocument.in/reader035/viewer/2022062314/56649da75503460f94a92c00/html5/thumbnails/19.jpg)
19
Host Is Network Isolated
A host is isolated when:
• It sees no vSphere HA network traffic
• It cannot ping the isolation addresses
Results in:
Host invokes (improved) Isolation response
• Checks first if a master “owns” a VM
• Applied if VM is owned or datastore is inaccessible
• Default is now Leave Powered On
Master
• Restarts those VMs powered off or that fail later
• Reports the host as isolated if both can access its heartbeat datastores, otherwise as dead
[Figure: four hosts (ESX 1 to ESX 4), each running an FDM agent, and the isolation addresses.]
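A small Python sketch of the host-side logic on this slide follows. The function and field names are hypothetical. It assumes "cannot ping the isolation addresses" means that none of the configured addresses answers, and it models only the two stated conditions for applying the isolation response to a VM.

```python
from dataclasses import dataclass


def is_isolated(sees_ha_traffic: bool, isolation_ping_results: list[bool]) -> bool:
    """A host treats itself as isolated only when both checks fail.

    Assumption: one successful ping to any isolation address is enough
    to rule out isolation.
    """
    return not sees_ha_traffic and not any(isolation_ping_results)


@dataclass
class VmView:
    """Per-VM facts the isolated host checks (hypothetical fields)."""
    name: str
    owned_by_a_master: bool
    datastore_accessible: bool


def vms_to_apply_response(vms: list[VmView]) -> list[str]:
    """Names of VMs that receive the configured isolation response.

    Per the slide, the response is applied if the VM is owned by a master
    or if its datastore is inaccessible.
    """
    return [
        vm.name
        for vm in vms
        if vm.owned_by_a_master or not vm.datastore_accessible
    ]


vms = [
    VmView("a", owned_by_a_master=True, datastore_accessible=True),
    VmView("b", owned_by_a_master=False, datastore_accessible=True),
    VmView("c", owned_by_a_master=False, datastore_accessible=False),
]
affected = vms_to_apply_response(vms)
```

With the new default response of Leave Powered On, the VMs in `affected` simply keep running unless the cluster is configured otherwise.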
![Page 20: BCO2874 vSphere High Availability 5.0 and SMP Fault Tolerance – Technical Overview and Roadmap Name, Title, Company](https://reader035.vdocument.in/reader035/viewer/2022062314/56649da75503460f94a92c00/html5/thumbnails/20.jpg)
20
Key Concepts – Part 2
HA Protection and failure-response guarantees
![Page 21: BCO2874 vSphere High Availability 5.0 and SMP Fault Tolerance – Technical Overview and Roadmap Name, Title, Company](https://reader035.vdocument.in/reader035/viewer/2022062314/56649da75503460f94a92c00/html5/thumbnails/21.jpg)
21
vSphere HA Response to Failures
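| Type of Failure | Response | Applicable to VMs |
| --- | --- | --- |
| Guest OS hangs, crashes | Reset VM | With tools installed |
| Application heartbeats stop | Reset VM | With tools installed |
| Host fails (e.g., reboots) | Attempt VM restart | That the responding master knows are HA Protected |
| Host isolation (VM powered off) | Attempt VM restart | That the responding master knows are HA Protected |
| VM fails (e.g., VM crashes) | Attempt VM restart | That the responding master knows are HA Protected |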
![Page 22: BCO2874 vSphere High Availability 5.0 and SMP Fault Tolerance – Technical Overview and Roadmap Name, Title, Company](https://reader035.vdocument.in/reader035/viewer/2022062314/56649da75503460f94a92c00/html5/thumbnails/22.jpg)
22
HA Protected Workflow
Timeline of events:
1. User issues power on for a VM
2. Host powers on the VM
3. VC learns that the VM powered on
4. VC tells master to protect the VM
5. Master receives directive from VC
6. Master writes fact to a file
7. Write is done
![Page 23: BCO2874 vSphere High Availability 5.0 and SMP Fault Tolerance – Technical Overview and Roadmap Name, Title, Company](https://reader035.vdocument.in/reader035/viewer/2022062314/56649da75503460f94a92c00/html5/thumbnails/23.jpg)
23
HA Restart Guarantee
Timeline of events:
1. User issues power on for a VM
2. Host powers on the VM
3. VC learns that the VM powered on
4. VC tells master to protect the VM
5. Master receives directive from VC
6. Master writes fact to a file
7. Write is done
Guarantees marked along the timeline:
• An attempt may be made if a failure occurs now
• An attempt will be made for failures now and in future
![Page 24: BCO2874 vSphere High Availability 5.0 and SMP Fault Tolerance – Technical Overview and Roadmap Name, Title, Company](https://reader035.vdocument.in/reader035/viewer/2022062314/56649da75503460f94a92c00/html5/thumbnails/24.jpg)
24
vSphere HA Protection Property
Is a new per-VM property
Reports on whether a restart attempt is guaranteed
Is shown on the VM summary panel and optionally in VM lists
![Page 25: BCO2874 vSphere High Availability 5.0 and SMP Fault Tolerance – Technical Overview and Roadmap Name, Title, Company](https://reader035.vdocument.in/reader035/viewer/2022062314/56649da75503460f94a92c00/html5/thumbnails/25.jpg)
25
Values of the HA Protection Property
Values reported by VC along the timeline: N/A, Unprotected, Protected
Timeline of events:
1. User issues power on for a VM
2. Host powers on the VM
3. VC learns that the VM powered on
4. VC tells master to protect the VM
5. Master receives directive from VC
6. Master writes fact to a file
7. Write is done. Master tells VC
8. VC learns VM has been protected
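The way the reported value follows the workflow can be sketched in Python as below. The event names are invented for this sketch. The mapping is an assumption read off the slide layout: N/A until VC learns that the VM powered on, Unprotected until VC learns that the VM has been protected, and Protected after that. The transcript does not state these boundaries explicitly.

```python
from enum import Enum


class Protection(Enum):
    NA = "N/A"
    UNPROTECTED = "Unprotected"
    PROTECTED = "Protected"


# Hypothetical event names for the workflow steps on the slide.
VC_LEARNS_POWERED_ON = "vc_learns_vm_powered_on"
VC_LEARNS_PROTECTED = "vc_learns_vm_protected"


def reported_value(events_seen: set[str]) -> Protection:
    """Value VC would report, given which workflow events have happened.

    The state boundaries are an assumption of this sketch.
    """
    if VC_LEARNS_PROTECTED in events_seen:
        return Protection.PROTECTED
    if VC_LEARNS_POWERED_ON in events_seen:
        return Protection.UNPROTECTED
    return Protection.NA


before_power_on = reported_value(set())
after_power_on = reported_value({VC_LEARNS_POWERED_ON})
after_protect = reported_value({VC_LEARNS_POWERED_ON, VC_LEARNS_PROTECTED})
```

The gap between the last two states is the window in which a restart attempt is possible but not yet guaranteed.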
![Page 26: BCO2874 vSphere High Availability 5.0 and SMP Fault Tolerance – Technical Overview and Roadmap Name, Title, Company](https://reader035.vdocument.in/reader035/viewer/2022062314/56649da75503460f94a92c00/html5/thumbnails/26.jpg)
26
Wrap Up
![Page 27: BCO2874 vSphere High Availability 5.0 and SMP Fault Tolerance – Technical Overview and Roadmap Name, Title, Company](https://reader035.vdocument.in/reader035/viewer/2022062314/56649da75503460f94a92c00/html5/thumbnails/27.jpg)
27
vSphere HA Summary
vSphere HA feature provides organizations the ability to run their critical business applications with confidence
5.0 Enhancements provide
• A solid, scalable foundation upon which to build to the cloud
• Simpler management and troubleshooting
• Additional and more robust responses to failures
[Figure labels: Resource Pool; three VMware ESXi hosts (one Failed Server, two Operating Servers).]
![Page 28: BCO2874 vSphere High Availability 5.0 and SMP Fault Tolerance – Technical Overview and Roadmap Name, Title, Company](https://reader035.vdocument.in/reader035/viewer/2022062314/56649da75503460f94a92c00/html5/thumbnails/28.jpg)
28
To Learn More About HA and HA 5.0
At VMworld
• See demo in VMware booth in solutions exchange
• Try it out in lab HOL04 – Reducing Unplanned Downtime
• Attend group discussions GD15 and GD35 – vSphere HA and FT
• Review panel session VSP1682 – vSphere Clustering Q&A
• Talk with knowledge expert (EXPERTS-09)
Offline
• Availability Guide
• Best Practices Guide
• Troubleshooting Guide
• Release notes
![Page 29: BCO2874 vSphere High Availability 5.0 and SMP Fault Tolerance – Technical Overview and Roadmap Name, Title, Company](https://reader035.vdocument.in/reader035/viewer/2022062314/56649da75503460f94a92c00/html5/thumbnails/29.jpg)
29
vSphere Fault Tolerance SMP Technical Preview
Objectives
Why Fault Tolerance?
What’s new: SMP
![Page 30: BCO2874 vSphere High Availability 5.0 and SMP Fault Tolerance – Technical Overview and Roadmap Name, Title, Company](https://reader035.vdocument.in/reader035/viewer/2022062314/56649da75503460f94a92c00/html5/thumbnails/30.jpg)
30
vSphere Availability Portfolio
[Figure: chart of coverage (Hardware, Guest OS, Application) against downtime (none to minutes). Labels shown: Fault Tolerance, App Monitoring APIs, Guest Monitoring, VM, Infrastructure HA.]
![Page 31: BCO2874 vSphere High Availability 5.0 and SMP Fault Tolerance – Technical Overview and Roadmap Name, Title, Company](https://reader035.vdocument.in/reader035/viewer/2022062314/56649da75503460f94a92c00/html5/thumbnails/31.jpg)
31
Why Fault Tolerance?
Continuous Availability
• Zero downtime
• Zero data loss
• No loss of TCP connections
• Completely transparent to guest software
• Simple UI: Turn On Fault Tolerance
• Delegate all management to the virtual infrastructure
OS
Apps
Users
![Page 32: BCO2874 vSphere High Availability 5.0 and SMP Fault Tolerance – Technical Overview and Roadmap Name, Title, Company](https://reader035.vdocument.in/reader035/viewer/2022062314/56649da75503460f94a92c00/html5/thumbnails/32.jpg)
32
Background
2009: vSphere Fault Tolerance in vSphere 4.0
2010: Updates to vSphere Fault Tolerance in vSphere 4.1
2011: Updates to vSphere Fault Tolerance in vSphere 5.0
Details: http://www.vmware.com/products/fault-tolerance/
Problem:
• FT only for uni-processor VMs
• Is FT for multi-processor VMs possible?
• An impressively hard problem
• Concerted effort to find an approach
Reached milestone
• We’d like to share it
![Page 33: BCO2874 vSphere High Availability 5.0 and SMP Fault Tolerance – Technical Overview and Roadmap Name, Title, Company](https://reader035.vdocument.in/reader035/viewer/2022062314/56649da75503460f94a92c00/html5/thumbnails/33.jpg)
33
A Starting Point: vSphere FT
[Figure labels: two Application / Operating System / Virtualization Layer stacks; FT LOGGING; Shared Disk; vLockstep.]
![Page 34: BCO2874 vSphere High Availability 5.0 and SMP Fault Tolerance – Technical Overview and Roadmap Name, Title, Company](https://reader035.vdocument.in/reader035/viewer/2022062314/56649da75503460f94a92c00/html5/thumbnails/34.jpg)
34
A Clean Slate
[Figure labels: two Application / Operating System / Virtualization Layer stacks; FT LOGGING; Shared Disk; vLockstep; 10 GigE; SMP protocol.]
![Page 35: BCO2874 vSphere High Availability 5.0 and SMP Fault Tolerance – Technical Overview and Roadmap Name, Title, Company](https://reader035.vdocument.in/reader035/viewer/2022062314/56649da75503460f94a92c00/html5/thumbnails/35.jpg)
35
A Clean Slate
[Figure labels: two Application / Operating System / Virtualization Layer stacks; FT LOGGING; 10 GigE; SMP protocol.]
Spare you the details
See it in action
![Page 36: BCO2874 vSphere High Availability 5.0 and SMP Fault Tolerance – Technical Overview and Roadmap Name, Title, Company](https://reader035.vdocument.in/reader035/viewer/2022062314/56649da75503460f94a92c00/html5/thumbnails/36.jpg)
36
Live Demo
Experimental setup, caveats
[Figure labels: two Application / Operating System / Virtualization Layer stacks; FT LOGGING; 10 GigE; SMP protocol; a Client with its own Operating System.]
![Page 37: BCO2874 vSphere High Availability 5.0 and SMP Fault Tolerance – Technical Overview and Roadmap Name, Title, Company](https://reader035.vdocument.in/reader035/viewer/2022062314/56649da75503460f94a92c00/html5/thumbnails/37.jpg)
37
Live Demo Summary
SMP FT in action
• Presented a good solution
• Client oblivious to FT operation
• SwingBench client
• SSH client
• Transparent failover
• Zero downtime, zero data loss
• Taste for performance / bandwidth
But that’s not all
![Page 38: BCO2874 vSphere High Availability 5.0 and SMP Fault Tolerance – Technical Overview and Roadmap Name, Title, Company](https://reader035.vdocument.in/reader035/viewer/2022062314/56649da75503460f94a92c00/html5/thumbnails/38.jpg)
38
Performance Numbers
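[Figure: bar chart of % throughput (FT / non-FT, higher is better), with axis ticks at 0, 40, and 80, for four workloads: Microsoft SQL Server 2-vCPU, Microsoft SQL Server 4-vCPU, Oracle Swingbench 2-vCPU, Oracle Swingbench 4-vCPU. Bar values are not recoverable from this transcript.]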
Similar configuration to vSphere 4 FT Performance Whitepaper
• Models real-world workloads: 60% CPU utilization
![Page 39: BCO2874 vSphere High Availability 5.0 and SMP Fault Tolerance – Technical Overview and Roadmap Name, Title, Company](https://reader035.vdocument.in/reader035/viewer/2022062314/56649da75503460f94a92c00/html5/thumbnails/39.jpg)
39
vSphere FT Summary
Why Fault Tolerance
• Continuous availability
Fault Tolerance for multi-processor VMs
• Good solution to impressively hard problem
• A new design
• Demonstrated similar experience to existing vSphere FT
• But more vCPUs
![Page 40: BCO2874 vSphere High Availability 5.0 and SMP Fault Tolerance – Technical Overview and Roadmap Name, Title, Company](https://reader035.vdocument.in/reader035/viewer/2022062314/56649da75503460f94a92c00/html5/thumbnails/40.jpg)
40
vSphere HA and FT
Future Directions
![Page 41: BCO2874 vSphere High Availability 5.0 and SMP Fault Tolerance – Technical Overview and Roadmap Name, Title, Company](https://reader035.vdocument.in/reader035/viewer/2022062314/56649da75503460f94a92c00/html5/thumbnails/41.jpg)
41
vSphere HA and FT – Technical Directions
Technical directions include
More comprehensive coverage of failures for more applications
[Figure: coverage vs. downtime chart. Coverage levels: Hardware/VM, Guest OS, Application, Multi-tier application. Labels shown: Infrastructure HA, VM/Guest Monitoring, App Monitoring APIs, Fault Tolerance, Multiple vCPUs, MetroHA, Protection against host component failures.]
![Page 42: BCO2874 vSphere High Availability 5.0 and SMP Fault Tolerance – Technical Overview and Roadmap Name, Title, Company](https://reader035.vdocument.in/reader035/viewer/2022062314/56649da75503460f94a92c00/html5/thumbnails/42.jpg)
42
vSphere HA and FT – Technical Directions
Technical directions include
More comprehensive coverage of failures for more applications
Broader set of enablers for improving availability of applications
[Figure: coverage vs. downtime chart. Coverage levels: Hardware/VM, Guest OS, Application, Multi-tier application. Labels shown: Infrastructure HA, VM/Guest Monitoring, App Monitoring APIs, Fault Tolerance, Multiple vCPUs, MetroHA, Protection against host component failures, Building blocks for creating available apps, API extensions.]
![Page 43: BCO2874 vSphere High Availability 5.0 and SMP Fault Tolerance – Technical Overview and Roadmap Name, Title, Company](https://reader035.vdocument.in/reader035/viewer/2022062314/56649da75503460f94a92c00/html5/thumbnails/43.jpg)
43
vSphere HA and FT – Technical Directions
Technical directions include
More comprehensive coverage of failures for more applications
Broader set of enablers for improving availability of applications
[Figure: coverage vs. downtime (none to minutes) chart. Coverage levels: Hardware/VM, Guest OS, Application, Multi-tier application. Labels shown: Infrastructure HA, VM/Guest Monitoring, App Monitoring APIs, Fault Tolerance, Multiple vCPUs, MetroHA, Protection against host component failures, Building blocks for creating available apps, API extensions, Partner solutions.]
![Page 44: BCO2874 vSphere High Availability 5.0 and SMP Fault Tolerance – Technical Overview and Roadmap Name, Title, Company](https://reader035.vdocument.in/reader035/viewer/2022062314/56649da75503460f94a92c00/html5/thumbnails/44.jpg)
44
vSphere HA and FT – Technical Directions
Technical directions include
More comprehensive coverage of failures for more applications
Broader set of enablers for improving availability of applications
Solidifying vSphere as the platform for running all mission-critical applications
[Figure: coverage vs. downtime (none to minutes) chart. Coverage levels: Hardware/VM, Guest OS, Application, Multi-tier application. Labels shown: Infrastructure HA, VM/Guest Monitoring, App Monitoring APIs, Fault Tolerance, Multiple vCPUs, MetroHA, Protection against host component failures, Building blocks for creating available apps, API extensions, Partner solutions.]
![Page 45: BCO2874 vSphere High Availability 5.0 and SMP Fault Tolerance – Technical Overview and Roadmap Name, Title, Company](https://reader035.vdocument.in/reader035/viewer/2022062314/56649da75503460f94a92c00/html5/thumbnails/45.jpg)
45
Thank you!
Questions?
![Page 46: BCO2874 vSphere High Availability 5.0 and SMP Fault Tolerance – Technical Overview and Roadmap Name, Title, Company](https://reader035.vdocument.in/reader035/viewer/2022062314/56649da75503460f94a92c00/html5/thumbnails/46.jpg)
![Page 47: BCO2874 vSphere High Availability 5.0 and SMP Fault Tolerance – Technical Overview and Roadmap Name, Title, Company](https://reader035.vdocument.in/reader035/viewer/2022062314/56649da75503460f94a92c00/html5/thumbnails/47.jpg)
![Page 48: BCO2874 vSphere High Availability 5.0 and SMP Fault Tolerance – Technical Overview and Roadmap Name, Title, Company](https://reader035.vdocument.in/reader035/viewer/2022062314/56649da75503460f94a92c00/html5/thumbnails/48.jpg)
48
Additional vSphere HA 5.0 Details
![Page 49: BCO2874 vSphere High Availability 5.0 and SMP Fault Tolerance – Technical Overview and Roadmap Name, Title, Company](https://reader035.vdocument.in/reader035/viewer/2022062314/56649da75503460f94a92c00/html5/thumbnails/49.jpg)
49
Troubleshooting
![Page 50: BCO2874 vSphere High Availability 5.0 and SMP Fault Tolerance – Technical Overview and Roadmap Name, Title, Company](https://reader035.vdocument.in/reader035/viewer/2022062314/56649da75503460f94a92c00/html5/thumbnails/50.jpg)
50
Troubleshooting vSphere HA 5.0
HA issues proactive warnings about possible future conditions
• VMs not protected after powering on
• Management network discontinuities
• Isolation addresses stop working
HA host states provide granularity into error conditions
All HA conditions reported via events; config issues/alarms for some
• Event descriptions describe problem and actions to take
• All event messages contain “vSphere HA”, so searching for HA issues is easier
• HA alarms are finer grained and auto-clearing (where appropriate)
The 5.0 Troubleshooting guide discusses the likely top issues, e.g.,
• Implications of each of the HA host states
• Topics on HB datastores, failovers, admission control
• Will be updated periodically
![Page 51: BCO2874 vSphere High Availability 5.0 and SMP Fault Tolerance – Technical Overview and Roadmap Name, Title, Company](https://reader035.vdocument.in/reader035/viewer/2022062314/56649da75503460f94a92c00/html5/thumbnails/51.jpg)
51
HA Agent Logging
HA 5.0 writes operational information to a single log file called fdm.log
• A configurable number of historical copies are kept to assist with debugging
File contains a record of, for example,
• Inventory updates relating to VMs, the host, and datastores received from the host management agent (hostd)
• Processing of configuration updates sent to a master by vCenter Server
• Significant actions taken by the HA agent, such as protecting a VM or restarting a VM
• Messages sent by a slave to a master and by a master to a slave
Default location
• ESXi 5.0: /var/log/fdm.log (historical copies in /var/run/log)
• Earlier ESX versions: /var/log/vmware/fdm (all files in the same directory)
Notes
• See vSphere HA best practices guide for recommended log capacities
• HA log files are designed to assist VMware support in diagnosing problems and the format may change at any time. Thus, for reporting, we recommend you rely on the vCenter Server HA-related events, alarms, config issues, and VM/host properties
![Page 52: BCO2874 vSphere High Availability 5.0 and SMP Fault Tolerance – Technical Overview and Roadmap Name, Title, Company](https://reader035.vdocument.in/reader035/viewer/2022062314/56649da75503460f94a92c00/html5/thumbnails/52.jpg)
52
Log File Format
Log file contains time stamped rows
Many rows report the HA agent (FDM) module that logged the info
E.g.,
2011-06-01T05:48:00.945Z [FFFE2B90 info 'Invt' opID=SWI-a111addb] [InventoryManagerImpl::ProcessClusterChange]
Cluster state changed to Startup
Noteworthy modules are
• Cluster – module responsible for cluster functions
• Invt – module responsible for caching key inventory details
• Policy – module responsible for deciding what to do on a failure
• Placement – module responsible for placing failed VMs
• Execution – module responsible for restarting VMs
• Monitor – module responsible for periodic health checks
• FDM – module responsible for communication with vCenter Server
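For ad hoc filtering by module, a row like the sample above could be split with a regular expression, as in this Python sketch. The pattern is inferred from that single sample row only, and the sketch assumes the message text follows the bracketed header on the same row. As the logging notes say, the format may change at any time, so this is a debugging aid under those assumptions, not a basis for reporting. The names `LINE_RE` and `parse_fdm_row` are hypothetical.

```python
import re
from typing import Optional

# Pattern inferred from the one sample row in this talk; the real format
# may differ and may change between releases.
LINE_RE = re.compile(
    r"^(?P<timestamp>\S+)\s+"
    r"\[(?P<thread>\S+)\s+(?P<level>\S+)\s+'(?P<module>[^']+)'"
    r"(?:\s+opID=(?P<op_id>[^\]\s]+))?\]\s*"
    r"(?P<message>.*)$"
)


def parse_fdm_row(row: str) -> Optional[dict]:
    """Split one log row into named fields, or return None if it does not match."""
    match = LINE_RE.match(row.strip())
    return match.groupdict() if match else None


# The sample row from the slide, joined onto one line (an assumption).
sample = (
    "2011-06-01T05:48:00.945Z [FFFE2B90 info 'Invt' opID=SWI-a111addb] "
    "[InventoryManagerImpl::ProcessClusterChange] Cluster state changed to Startup"
)
parsed = parse_fdm_row(sample)
```

Filtering on `parsed["module"]` would then pick out, for example, only the rows logged by the Policy or Placement modules.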
![Page 53: BCO2874 vSphere High Availability 5.0 and SMP Fault Tolerance – Technical Overview and Roadmap Name, Title, Company](https://reader035.vdocument.in/reader035/viewer/2022062314/56649da75503460f94a92c00/html5/thumbnails/53.jpg)
53
Additional Datastore Details for HA 5.0
• Heartbeating and heartbeat files
• Protected VM files
• File locations
![Page 54: BCO2874 vSphere High Availability 5.0 and SMP Fault Tolerance – Technical Overview and Roadmap Name, Title, Company](https://reader035.vdocument.in/reader035/viewer/2022062314/56649da75503460f94a92c00/html5/thumbnails/54.jpg)
54
Heartbeat Datastores (HB): Purpose and Mechanisms
Used by master for slaves not connected to it over network
Determine if a slave is alive
• Rely on heartbeats issued to slave’s HB datastores
• Each FDM opens a file on each of its HB datastores for heartbeating purposes
• Files contain no information. On VMFS datastores, file will have the minimum-allowed file size
• Files are named X-hb, where X is the (SDK API) moID of the host
• Master periodically reads heartbeats of all partitioned / isolated slaves
Determine the set of VMs running on a slave
• An FDM writes a list of powered on VMs into a file on each of its HB datastores
• Master periodically reads the files of all partitioned/isolated slaves
• Each poweron file contains at most 140 KB of info. On VMFS datastores, actual disk usage is determined by the file-sizes supported by the VMFS version
• They are named X-poweron, where X is the (SDK API) moID of the host
![Page 55: BCO2874 vSphere High Availability 5.0 and SMP Fault Tolerance – Technical Overview and Roadmap Name, Title, Company](https://reader035.vdocument.in/reader035/viewer/2022062314/56649da75503460f94a92c00/html5/thumbnails/55.jpg)
55
VM Protected Files
Protected-vm files are used
• When recovering from a master failure
• To determine whether a master is responsible for a given VM
• To divvy the VMs up between masters during a partition
One protectedlist file per datastore per cluster using the datastore
• It stores the local paths of the protected VMs
• A VM is listed only in the file on the datastore containing its config file
Each file is a fixed 2 MB in size
![Page 56: BCO2874 vSphere High Availability 5.0 and SMP Fault Tolerance – Technical Overview and Roadmap Name, Title, Company](https://reader035.vdocument.in/reader035/viewer/2022062314/56649da75503460f94a92c00/html5/thumbnails/56.jpg)
56
File Locations
FDMs create a directory (.vSphere-HA) in root of each relevant datastore
Within it, they create a subdirectory for each cluster using the datastore
Each subdirectory is given a unique name called the Fault Domain ID
<VC uuid>-<cluster entity ID>-<8 random hex characters>-<VC hostname>
• Entity ID is the number portion of the (SDK API) moID of the cluster
E.g., in /vmfs/volumes/clusterDS/.vSphere-HA/
FDM-C8496A0D-12D2-4933-AE02-601BCDDB9C61-9-d6bfc023-vc23/ Cluster 9
FDM-C8496A0D-12D2-4933-AE02-601BCDDB9C61-17-ad9fd307-vc23/ Cluster 17
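The two example directory names can be taken apart with a short Python sketch like the one below. The pattern follows the stated layout and the examples. Note that both examples carry an `FDM-` prefix that the layout line does not show, so treating it as mandatory is an assumption. The names `FAULT_DOMAIN_RE` and `parse_fault_domain_id` are hypothetical.

```python
import re
from typing import Optional

# <VC uuid>-<cluster entity ID>-<8 random hex characters>-<VC hostname>,
# with the "FDM-" prefix seen in the examples (assumed to be always present).
FAULT_DOMAIN_RE = re.compile(
    r"^FDM-"
    r"(?P<vc_uuid>[0-9A-Fa-f]{8}(?:-[0-9A-Fa-f]{4}){3}-[0-9A-Fa-f]{12})-"
    r"(?P<cluster_entity_id>\d+)-"
    r"(?P<random_hex>[0-9A-Fa-f]{8})-"
    r"(?P<vc_hostname>.+)$"
)


def parse_fault_domain_id(dirname: str) -> Optional[dict]:
    """Split a Fault Domain ID directory name into its parts, or return None."""
    match = FAULT_DOMAIN_RE.match(dirname.rstrip("/"))
    if match is None:
        return None
    parts = match.groupdict()
    parts["cluster_entity_id"] = int(parts["cluster_entity_id"])
    return parts


first = parse_fault_domain_id(
    "FDM-C8496A0D-12D2-4933-AE02-601BCDDB9C61-9-d6bfc023-vc23/"
)
second = parse_fault_domain_id(
    "FDM-C8496A0D-12D2-4933-AE02-601BCDDB9C61-17-ad9fd307-vc23/"
)
```

Both examples share the same VC uuid and hostname and differ only in the cluster entity ID and the random characters, which is what one would expect for two clusters managed by one vCenter Server.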
![Page 57: BCO2874 vSphere High Availability 5.0 and SMP Fault Tolerance – Technical Overview and Roadmap Name, Title, Company](https://reader035.vdocument.in/reader035/viewer/2022062314/56649da75503460f94a92c00/html5/thumbnails/57.jpg)
57
UI Changes
![Page 58: BCO2874 vSphere High Availability 5.0 and SMP Fault Tolerance – Technical Overview and Roadmap Name, Title, Company](https://reader035.vdocument.in/reader035/viewer/2022062314/56649da75503460f94a92c00/html5/thumbnails/58.jpg)
58
Summary of UI Changes
Cluster Summary Screen
• Advanced Runtime Info (improved)
• Cluster Status (new)
• Configuration Issues (improved)
Cluster and datacenter
• Hosts list view (improved)
Cluster Configuration
• Datastore Heartbeating (new)
• Admission Control (improved)
Host, cluster, datacenter
• VM list view (improved)
Host Summary Screen
• HA host state (improved)
VM Summary Screen
• HA Protection (improved)
![Page 59: BCO2874 vSphere High Availability 5.0 and SMP Fault Tolerance – Technical Overview and Roadmap Name, Title, Company](https://reader035.vdocument.in/reader035/viewer/2022062314/56649da75503460f94a92c00/html5/thumbnails/59.jpg)
59
Summary of UI Changes
Cluster Summary Screen
• Advanced Runtime Info (improved)
• Cluster Status (new)
![Page 60: BCO2874 vSphere High Availability 5.0 and SMP Fault Tolerance – Technical Overview and Roadmap Name, Title, Company](https://reader035.vdocument.in/reader035/viewer/2022062314/56649da75503460f94a92c00/html5/thumbnails/60.jpg)
60
Summary of UI Changes
Cluster Summary Screen
• Advanced Runtime Info (improved)
• Cluster Status (new)
• Configuration Issues (improved)
![Page 61: BCO2874 vSphere High Availability 5.0 and SMP Fault Tolerance – Technical Overview and Roadmap Name, Title, Company](https://reader035.vdocument.in/reader035/viewer/2022062314/56649da75503460f94a92c00/html5/thumbnails/61.jpg)
61
Summary of UI Changes
Cluster Summary Screen
• Advanced Runtime Info (improved)
• Cluster Status (new)
• Configuration Issues (improved)
Cluster and datacenter
• Hosts list view (improved)
![Page 62: BCO2874 vSphere High Availability 5.0 and SMP Fault Tolerance – Technical Overview and Roadmap Name, Title, Company](https://reader035.vdocument.in/reader035/viewer/2022062314/56649da75503460f94a92c00/html5/thumbnails/62.jpg)
62
Summary of UI Changes
Cluster and datacenter
• Hosts list view (improved)
Cluster Configuration
• Datastore Heartbeating (new)
![Page 63: BCO2874 vSphere High Availability 5.0 and SMP Fault Tolerance – Technical Overview and Roadmap Name, Title, Company](https://reader035.vdocument.in/reader035/viewer/2022062314/56649da75503460f94a92c00/html5/thumbnails/63.jpg)
63
Summary of UI Changes
![Page 64: BCO2874 vSphere High Availability 5.0 and SMP Fault Tolerance – Technical Overview and Roadmap Name, Title, Company](https://reader035.vdocument.in/reader035/viewer/2022062314/56649da75503460f94a92c00/html5/thumbnails/64.jpg)
64
Summary of UI Changes
Host, cluster, datacenter
• VM list view (improved) showing protected VMs
![Page 65: BCO2874 vSphere High Availability 5.0 and SMP Fault Tolerance – Technical Overview and Roadmap Name, Title, Company](https://reader035.vdocument.in/reader035/viewer/2022062314/56649da75503460f94a92c00/html5/thumbnails/65.jpg)
65
UI Changes
![Page 66: BCO2874 vSphere High Availability 5.0 and SMP Fault Tolerance – Technical Overview and Roadmap Name, Title, Company](https://reader035.vdocument.in/reader035/viewer/2022062314/56649da75503460f94a92c00/html5/thumbnails/66.jpg)
66
UI Changes