vSphere Performance Pitfalls
TRANSCRIPT
8/12/2019

Understanding Major Performance Pitfalls in vSphere
Vernak.com
7/29/2013
Technical White Paper by Dan McGee
This document is designed to help IT Professionals avoid
painful design mistakes or oversights in a vSphere
environment.
Table of Contents
1. Introduction
2. CPU
   2.1 Understanding CPU % Ready and the CPU Scheduler
   2.2 Processor Caching
   2.3 Failure to Complete P2V Migrations
3. Memory
   3.1 Overcommitted Memory
   3.2 Memory Limits on Gold Images
   3.3 Failure to Maintain Like VMs
4. Network
   4.1 Understanding Default NIC Configuration
   4.2 STP, STA, BPDU Guard & BPDU Filter
   4.3 Broadcasts, Flat VLANs and Interrupts
5. Storage
   5.1 Dangers of Thin Provisioning
   5.2 Array Misconfiguration
   5.3 Alignment Issues
6. Further Reading
1. Introduction
A colleague of mine recently asked me, "What are the things that just grind our virtualized servers to a halt? What are the big performance nightmares in vSphere?" They did not just want to know some troubleshooting methodology and how it changes going from physical to virtual servers. They wanted to know the juicy stuff. They wanted the dark, disturbing stories from organizations that have had it all go to hell in a handbasket: the types of companies that have suffered performance issues they can't explain, lower ROI, resume-generating events, and parted with enormous piles of cash on hardware that wasn't appropriate for them. This white paper is my answer to the heart of the inquiry, and it is my hope that these examples provide a basis for better decision making in a vSphere environment.
The examples to follow are by no means mutually exclusive, nor are they collectively exhaustive. Also, I will not be discussing certain features such as vMotion, HA, and DRS in any significant detail. This is not to discredit their incredible value to a healthy vSphere environment by any means. For more information on these and other technologies, please reference the Further Reading section at the end of this document.
Virtual environments are inherently dynamic and utilization levels can change drastically
and quickly. Virtual machines can be created or cloned in minutes by administrator actions or
sometimes by automation tools. HA restarts or DRS events can also add to the less predictable
virtualized world. Unlike physical environments, the underlying resource categories are shared
and this requires IT professionals to learn and adapt to new rules and guidelines in the data
center. The four primary resource categories are CPU, Memory, Network (I/O), and Storage
(Capacity & I/O). The following sections will briefly explain how they are shared and
demonstrate soul-crushing scenarios in each category and how they could be resolved or
avoided.
2. CPU
2.1 Understanding CPU % Ready and the CPU Scheduler
In Larry Louks' Critical VMware Mistakes You Should Avoid, the following foundational rules for server processor management are well laid out:
1. The ESXi host schedules VMs onto and off of processors as needed.
2. Whenever a VM is scheduled to a processor, all of its cores must be available for the VM to be scheduled, or the VM cannot be scheduled at all.
3. If a VM cannot be scheduled to a processor when it needs access to one, VM performance can suffer tremendously.
4. When VMs are ready for a processor but unable to be scheduled, this creates what VMware calls CPU % Ready values.
5. CPU % Ready manifests as a utilization issue but is actually a scheduling issue.
Due to the rules mentioned above, mixing VMs with greatly varied vCPU counts on the same ESXi host can create major scheduling problems. This issue is amplified if the host has a low density of CPU cores. Whenever possible, reduce VMs to a single vCPU. Often this is not recommended or possible, but never do the opposite: arbitrarily throwing vCPUs at a VM will not actually make it perform faster.
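To make CPU % Ready concrete, here is a small sketch (in Python, purely illustrative) of how the "CPU Ready" summation counter that vCenter reports in milliseconds can be converted into the %RDY figure esxtop shows. The 20-second realtime sampling interval and the ~5% per-vCPU warning threshold are common rules of thumb, not hard limits:

```python
def cpu_ready_percent(ready_ms: float, interval_s: float = 20.0,
                      num_vcpus: int = 1) -> float:
    """Convert a vCenter 'CPU Ready' summation value (milliseconds of ready
    time accumulated over one sampling interval, across all vCPUs) into a
    per-vCPU percentage comparable to esxtop's %RDY."""
    return ready_ms / (interval_s * 1000.0) / num_vcpus * 100.0

# A 4-vCPU VM reporting 4000 ms of ready time in a 20 s realtime sample
# works out to 5.0% per vCPU -- right at the common warning threshold.
per_vcpu_ready = cpu_ready_percent(4000, 20.0, 4)
```

This also illustrates why wide VMs hurt: the more vCPUs a VM has, the harder it is for the scheduler to find free cores, and the more ready time accumulates.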
2.2 Processor Caching
Processor caches contain information that allows the guest operating system and applications to perform better. To this end, vSphere attempts to schedule a VM on the same core over and over again. If the virtual machine is moved to a new core on a new socket where the cache isn't shared (which it usually isn't between sockets), the cache for the new core must be loaded over time with the information formerly cached on the previous core. The reason I discuss processor caching is that VMware administrators need to understand that there is a temporary switching cost associated with things such as a hot migration with vMotion.
2.3 Failure to Complete P2V Migrations
Physical-to-virtual migrations involve imaging an entire physical server, including OS, applications, and data, into a virtual disk. The often neglected second phase of a proper migration is stripping the new virtual machine of physical device drivers and related hardware-vendor-specific tools. Plain and simple, physical drivers need to be replaced with virtual ones (VMware Tools). Here is a list of steps for a proper P2V migration:
1. Plan capacity
2. Pre-migration cleanup
3. Full shutdown migration of critical apps and databases, ensuring stable data
4. Exclude unneeded partitions
5. Split array partitions into separate virtual disks
6. Post-migration cleanup
7. Adjust VM and guest OS to a single vCPU where possible
8. Remove unnecessary applications
9. Review virtual hardware assignments
3. Memory
3.1 Overcommitted Memory
The total physical RAM in an ESXi host is available to VMs, aside from a small amount reserved for the VMkernel. If I have an ESXi host with 64GB of physical RAM, I could build VMs on it and assign more than 64GB total to those VMs. When this is done, I am overcommitting memory. By default, a virtual machine is not handed all of its assigned RAM as physical memory at power-on; physical pages are allocated as the guest actually touches them, unless a reservation is set for the VM.
This memory overcommitment does not cause major issues if the VMs don't actually need all of the RAM assigned to them. However, if the ESXi host is overcommitted and VMs actually need RAM in excess of the RAM available, there are a few ESXi mechanisms that attempt to keep it all up and running. These mechanisms include the balloon driver and the VMkernel swap file.
The memory control driver (also known as the balloon driver) is installed as part of the VMware Tools package that should be installed on all of the VMs in the environment. The purpose of the balloon driver is to force VMs to give up some of their RAM and begin using their swap files. In other words, if there is more demand for RAM than the ESXi host can provide, the balloon driver inflates and VMs are forced to move some contents of their own RAM into the guest OS swap file within their own virtual disks. As is the case with a physical server, this causes performance degradation, since the VM must read parts of its page file back into memory for processing.
The VMkernel swap file is created when a VM is first powered on, in the VMFS volume that contains the VM. This file is equal in size to the amount of RAM assigned to the virtual machine. The file is only used in emergency situations and otherwise sits idle in a healthy vSphere environment. If there is a crisis with memory overcommitment, the ESXi host will first use the balloon driver to reclaim RAM. If the crisis continues, the ESXi host will grab parts of VM memory and move them out to VMkernel swap. In this scenario, VMs could be swapping both with a guest page file and again with a VMkernel swap file. This takes awful performance and drags it right down to apocalyptic.
Just because you can overcommit memory does not mean that you should. ESXi is very
resilient in keeping virtual machines up and running even in these scenarios but at an obvious cost of
performance.
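A quick way to reason about this risk is to compute the overcommit ratio for a host. The sketch below is illustrative only; the 2GB VMkernel overhead figure is an assumption for the example, not a VMware-published constant (actual overhead varies by host configuration):

```python
def memory_overcommit_ratio(host_ram_gb: float,
                            vm_ram_assignments_gb: list,
                            vmkernel_overhead_gb: float = 2.0) -> float:
    """Ratio of total RAM assigned to VMs versus RAM usable for VMs.
    A value above 1.0 means the host is overcommitted and is relying on
    ballooning / VMkernel swap if the VMs all demand their assignments."""
    usable = host_ram_gb - vmkernel_overhead_gb
    assigned = sum(vm_ram_assignments_gb)
    return assigned / usable

# Five 16GB VMs on a 64GB host: 80GB promised against ~62GB usable.
ratio = memory_overcommit_ratio(64, [16, 16, 16, 16, 16])
```

Anything meaningfully above 1.0 deserves scrutiny: it is only safe to the extent that the VMs' real working sets stay below the physical RAM available.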
3.2 Memory Limits on Gold Images
The base images that we use as templates for our various guest operating systems must be configured properly, lest they yield cloned virtual machines that share their flaws. A relatively common mistake found in a lot of vSphere implementations is a master image with memory limits set. Because the limit always wins, a VM assigned 8GB of RAM will not use 8GB if a limit was accidentally set at 2GB. The VM will only ever be allocated up to 2GB, but the VM guest thinks it has the full 8GB.
If the VM actually needs the RAM you have assigned to it beyond the limit, severe performance issues
occur. Multiply this by the number of times a misconfigured template has been cloned in the
environment, and you can see how the nightmare scenario might play out.
3.3 Failure to Maintain Like VMs
ESXi is able to scan the RAM pages of its virtual machines and identify duplicate pages. After this detection, the ESXi host will use a single RAM location and point multiple virtual machines to that single page of shared RAM. The shared page location is flagged read-only to prevent corruption. The guest operating systems are unaware of this underlying sharing; they simply have what they need available in RAM. When we scale this concept up to, let's say, one thousand Windows 2008 R2 servers, you can imagine how much RAM we could save by having OS builds that are similarly patched and maintained. This is not to say that you should only run one type of guest operating system in your environment, because that simply isn't practical in many, if not most, environments. However, if attention is paid to proper operating system patching, application patching, and reducing disparate VMs and sprawl, it is possible to see high percentages of shared RAM on your ESXi hosts.
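The deduplication idea can be sketched in a few lines. This is a toy model of transparent page sharing, not ESXi's implementation (which hashes 4KB pages and then bit-compares candidates before sharing them):

```python
from collections import Counter

def shared_page_savings(page_hashes: list) -> int:
    """Given a hash per 4KB memory page across all VMs on a host, estimate
    how many pages could be collapsed into single read-only shared copies:
    every duplicate beyond the first copy of a page is a page saved."""
    counts = Counter(page_hashes)
    total_pages = sum(counts.values())
    unique_pages = len(counts)
    return total_pages - unique_pages

# Three identical OS pages and one unique page: two pages reclaimed.
saved = shared_page_savings(["os_pg_1", "os_pg_1", "os_pg_1", "app_pg_9"])
```

The more alike the VMs are (same OS build, same patch level), the more hashes collide and the bigger the savings, which is exactly why maintaining like VMs pays off.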
4. Network
4.1 Understanding Default NIC Configuration
The virtual machines hosted on ESXi are commonly assigned virtual network interface cards.
These virtual NICs connect to virtual switches which provide a path from the virtual NICs to the physical
adaptor(s). The physical NICs on the ESXi hosts do not have IP addresses themselves, but rather act as a
passthrough for the virtual machine traffic to the physical network. Proportional shares are utilized in
order to prevent virtual machines from monopolizing a network path. Shares are enforced only in
situations where there is contention for network resources. Otherwise, virtual machines are simply
given what they need in terms of network resources.
When two or more physical NICs are assigned to a virtual switch, the default mechanism used by the ESXi host pins each virtual switch port to one uplink, distributing ports across the NICs in a round-robin fashion. This default methodology works fine for many environments but can leave individual network cards carrying uneven loads. In contrast, the virtual switch can be set up to perform link aggregation (IP-hash load balancing, with a matching configuration on the physical switch) instead. In this configuration, the VMs gain the combined throughput of the NICs while maintaining failover ability.
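As a rough mental model of the two policies (a simplification, not ESXi's actual algorithm; current vSphere documents the default as "route based on originating virtual port ID" and the aggregation mode as "route based on IP hash"):

```python
import ipaddress

def uplink_by_port_id(port_id: int, num_uplinks: int) -> int:
    # Default-style policy: each virtual switch port is pinned to a single
    # uplink, effectively port number modulo the active uplink count.
    # One busy VM stays on one NIC, so per-NIC load can be uneven.
    return port_id % num_uplinks

def uplink_by_ip_hash(src_ip: str, dst_ip: str, num_uplinks: int) -> int:
    # IP-hash-style policy: the uplink is chosen per source/destination
    # pair, so a single VM talking to many peers spreads across NICs.
    h = int(ipaddress.ip_address(src_ip)) ^ int(ipaddress.ip_address(dst_ip))
    return h % num_uplinks

chosen = uplink_by_ip_hash("10.0.0.1", "10.0.0.2", 2)
```

The key behavioral difference the model captures: port-ID pinning balances VMs, while IP hash balances conversations.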
4.2 STP, STA, BPDU Guard & BPDU Filter
Spanning Tree Protocol (STP) enabled switches in a redundant Local Area Network (LAN) need to exchange information with each other for STP to work properly. Bridge Protocol Data Units (BPDUs) are messages exchanged between switches in a redundant LAN. The BPDU frames contain information such as switch ID, originating port, MAC address, switch port cost, switch port priority, etc. The BPDUs are sent out as multicast messages, and when they are received, the switch uses the Spanning Tree Algorithm (STA) to detect Layer 2 switching loops in the network and determine which redundant ports need to be shut down. BPDUs and STA are designed to avoid Layer 2 switching loops as well as broadcast storms.
As VMware so eloquently puts it in KB 2047822, "The STP process of identifying the root bridge and finding if the switch ports are in a forwarding or blocking state takes somewhere around 30 to 50 seconds. During that time no data can be passed from those switch ports." If a server connected to the port cannot communicate for that long, the applications running on it will time out. To avoid this timeout issue on the servers, the best practice is to enable the Port Fast configuration on the switch ports where the servers' NICs are connected. The Port Fast configuration puts the physical switch port immediately into the STP forwarding state.
So now that we have the important STP best practice out of the way, let's discuss how Standard and Distributed vSwitches use STP and BPDUs: they don't! vSwitches do not support STP, nor do they exchange BPDU frames. By default, they just forward them and think nothing of it. That said, an STP boundary is often created by using BPDU Guard on the physical switch ports. In this BPDU Guard
example, any BPDU frames received on the physical switch port cause the port to become blocked. Since the vSwitch will simply forward BPDU frames through to the physical switch ports, this can cause an uplink traffic path failure if a compromised virtual machine generates BPDU frames. Then the vSphere host moves the traffic to another uplink, which then disables another switch port. This process chains until the entire cluster is compromised by the denial-of-service attack. This attack is especially cruel and damaging in a multi-tenant environment, where a nasty neighbor can attack the neighborhood.
From KB 2047822: "To prevent such Denial of Service attack scenarios, the BPDU filter feature is supported as part of the vSphere 5.1 release. After configuring this feature at the ESXi host level, BPDU frames from any virtual machine will be dropped by the vSwitch. This feature is available on both Standard and Distributed vSwitches." I believe it is critically important for VMware administrators and their networking colleagues to understand how to prevent this cluster-wide catastrophe. As an aside, there are those in the VMware community who believe that VMware should implement a true BPDU Guard on the vSwitches and not just a BPDU Filter, but that is a discussion that will not be picked up here. Please refer to the Further Reading section of this paper for appropriate links to related materials.
4.3 Broadcasts, Flat VLANs and Interrupts
Broadcasts tend to be very small packets and very bursty in nature. Although broadcast packets
are sent to every device on the same VLAN as the sending device, we tend to not worry too much
about broadcasts anymore because available bandwidth is usually so high that the impact of small
broadcast packets is usually negligible. Also, contemporary switches have many features to limit or
eliminate networking loops and broadcast storms such as STP.
However, there are two major reasons why we still have to care at some level: bad (read: flat) design, and interrupts. If your organization has one flat VLAN with all of your management traffic, network storage, and virtual machine traffic playing together, then they are not going to play nicely. When you do this (don't be this guy/girl), you create a scenario where broadcasts of everything affect everything else. Do not cut corners in your network design by using VLANs that far exceed industry-recommended sizes. If you do, you will suffer the wrath of broadcasts and their best friend: interrupts.
In the years before Plug-and-Play, devices had interrupts set manually by the use of jumpers or DIP switches. Plug-and-Play effectively makes the jumper/DIP switch decisions for us now. However, devices like NICs still need their interrupts, and every time a packet hits a network card in an Intel-architecture device, it triggers an interrupt. Devices use interrupt requests (IRQs) to get an operating system or application to stop and make sure the communication with them is correct and complete before resuming. So when we talk about interrupts, we are talking about literally stopping the processor. Usually, these interrupt stoppages are brief and infrequent enough to not be noticeable or well understood by many administrators. Sometimes, though, organizations for whatever reasons (or lack thereof) implement flat VLANs where an enormous number of broadcast packets are hitting Intel-architecture devices such as ESXi hosts. This then causes incredible slowdowns due to production workload processing getting interrupted constantly. Proper network segmentation from the start is the
most effective way to mitigate this risk. Remember, interrupts are like bee stings. One or two on
occasion is tolerable but you do not want to ever suffer the swarm.
5. Storage
5.1 Dangers of Thin Provisioning
Thin provisioned virtual disks are disks for which the assigned amount of disk space is not
allocated at creation. Only enough disk space is allocated to accommodate existing data with a small
buffer space added. The VMDK file for the VM will then grow incrementally as more data is saved.
The guest operating system sees the full amount that has been assigned to the virtual disk but the
underlying storage is conserved until needed. At first glance, this looks great to new VMware admins.
Reducing wasted disk space can be quite tempting but it comes with a price.
First and foremost, since the virtual disk is dynamically growing as needed in small increments, there is a performance hit as the LUN is actually assigning the storage that the guest OS already thinks it has. While we're on the subject of the guest OS, that brings me to my other point: you are lying to it. As stated before, the guest operating system sees the full provisioned amount, so you are effectively lying to the OS and expecting everything to work out fine. If you want to see a train wreck up close and personal, fill a LUN unexpectedly when there are active, expanding VMDK files. Hint: it is going to come to a screeching halt.
The error checking mechanisms in the guest operating system cannot detect that the VMFS
volume is full or almost full because the administrator wrote the server a bad check. The guest believes
there is enough room left to write but there is no space left physically to write to. This nightmare
scenario can definitely lead to data corruption as VMs crash with partially saved data. In a production
environment, it is highly recommended to keep thin provisioning to an absolute minimum or avoid it
completely. It is suitable for a test lab or for VMware Workstation, but do not rely on it for a production
environment.
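The over-promise is easy to quantify. This sketch (illustrative, with made-up numbers) compares what thin disks have physically consumed against what their guests have been promised, which is the gap that bites you when the datastore fills:

```python
def thin_provisioning_report(datastore_gb: float, disks: list) -> tuple:
    """disks: list of (provisioned_gb, used_gb) pairs for thin VMDKs.
    Returns (physically_used_gb, promised_gb, oversubscribed) where
    oversubscribed is True when the guests collectively believe they
    have more space than the datastore can ever physically deliver."""
    used = sum(u for _, u in disks)
    promised = sum(p for p, _ in disks)
    return used, promised, promised > datastore_gb

# Two 80GB thin disks on a 100GB datastore: only 50GB consumed so far,
# but 160GB promised -- the guests can crash the datastore at any time.
used, promised, oversub = thin_provisioning_report(100, [(80, 20), (80, 30)])
```

Monitoring that third value (and alerting well before `used` approaches the datastore size) is the minimum discipline required if thin provisioning is used at all.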
5.2 Array Misconfiguration
Virtualized environments take advantage of shared storage types such as NAS, iSCSI, and Fibre Channel, ranking roughly from low to high disk I/O. While Fibre Channel storage allows administrators to decide on array configurations, many NAS and iSCSI options do not. This complicates storage purchase decisions for an organization, and several do not successfully avoid the following scenario. In a Fibre Channel setup, the administrator can decide which disks to carve up into multiple arrays with different RAID configurations.
The impact of VM storage I/O is then isolated to the arrays upon which the VM is installed. The ability to break the storage down into separate arrays and RAID groups helps prevent a storage I/O bottleneck. However, in the NAS and iSCSI example with a single array, every disk affects every other disk in the storage device. If an iSCSI device, for instance, supports only a single array that the admin formats in RAID 5, any write to any disk is going to be striped across every single disk. The lesson to be learned here when purchasing storage is that it is necessary to properly understand the I/OPS and configuration capabilities of the device, not just the capacity and expandability. When organizations
do not heed this warning, they hit their I/OPS limits long before the storage fills, and they potentially spend heaps of money on temporary storage while reconfiguring the poorly provisioned array.
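The sizing arithmetic behind that warning is the classic RAID write-penalty formula. This sketch uses the commonly cited penalty factors (RAID 5 turns one frontend write into four backend operations, RAID 6 into six, and so on); real arrays with write caches and wide stripes will deviate from these idealized numbers:

```python
# Backend operations generated per frontend write, by RAID level.
RAID_WRITE_PENALTY = {0: 1, 1: 2, 5: 4, 6: 6, 10: 2}

def effective_frontend_iops(raw_backend_iops: float, read_fraction: float,
                            raid_level: int) -> float:
    """Frontend I/OPS a RAID group can deliver: reads cost one backend op,
    writes cost `penalty` backend ops, so a write-heavy workload consumes
    the array's raw I/OPS budget several times faster."""
    penalty = RAID_WRITE_PENALTY[raid_level]
    return raw_backend_iops / (read_fraction + (1 - read_fraction) * penalty)

# Eight 150-I/OPS disks (1200 raw) in RAID 5 with a 50/50 read/write mix
# deliver only 1200 / (0.5 + 0.5 * 4) = 480 frontend I/OPS.
usable = effective_frontend_iops(1200, 0.5, 5)
```

This is exactly how an array that looks generous on capacity hits its I/OPS ceiling "long before the storage fills."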
5.3 Alignment Issues
If we think of storage in a virtual environment as three layers, we have the guest operating system partition (such as NTFS), VMFS/VMDK, and the physical disk. Misalignment can occur between the virtual disk blocks (.vmdk) and the underlying physical disk blocks. It is also possible to find misalignment between the guest operating system partition and the underlying virtual disk blocks. Misalignment in the environment causes unnecessary I/OPS to be used to compensate for the misalignment. Performance degradation ranges from moderate to severe in these instances, depending on the type and scope of the alignment problem. Most newer guest operating systems install with the guest partition properly aligned, but it is still worth checking periodically within the environment. Several organizations, such as Dell, provide free or paid tools to scan vSphere and report this type of storage issue.
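The check itself is simple modular arithmetic. This sketch tests whether a partition's starting byte offset lands on a block boundary; the 63-sector legacy offset used by older Windows installers is the classic misaligned case, while modern installers start partitions at 1 MiB, which aligns with every common block size:

```python
def is_aligned(partition_offset_bytes: int, block_size_bytes: int = 4096) -> bool:
    """A partition is aligned when its starting offset is an exact multiple
    of the underlying block size; otherwise every guest block straddles two
    underlying blocks and each I/O can cost double."""
    return partition_offset_bytes % block_size_bytes == 0

legacy_offset = 63 * 512      # 32256 bytes: old Windows default, misaligned
modern_offset = 1024 * 1024   # 1 MiB: aligned for 4K and larger block sizes
```

A misaligned guest partition is why an otherwise healthy VM can burn noticeably more backend I/OPS than its workload should require.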
6. Further Reading
http://blog.ipspace.net/2012/09/dear-vmware-bpdu-filter-bpdu-guard.html
http://blog.ipspace.net/2011/11/virtual-switches-need-bpdu-guard.html
http://www.vmware.com/files/pdf/techpaper/Whats-New-VMware-vSphere-51-Network-Technical-Whitepaper.pdf
http://rickardnobel.se/esxi-5-1-bdpu-guard/
http://pubs.vmware.com/vsphere-51/topic/com.vmware.ICbase/PDF/vsphere-esxi-vcenter-server-51-resource-management-guide.pdf