esx performance problems 10 steps

Monitoring and Intelligently Reacting to ESX Performance

Greg Shields
Partner and Principal Technologist
Concentrated Technology
www.ConcentratedTech.com

This slide deck was used in one of our many conference presentations. We hope you enjoy it, and invite you to use it within your own organization however you like.

For more information on our company, including information on private classes and upcoming conference appearances, please visit our Web site, www.ConcentratedTech.com.

For links to newly-posted decks, follow us on Twitter: @concentrateddon or @concentratdgreg

Class Discussion

What kinds of performance things should one monitor on an ESX server?
– Why?

ESX Performance 101

Processor Use
– Processor use on any server > 80%
Consider this “overuse”.
– Reduce processing requirements on VMs.
– Migrate VMs elsewhere, rebalance.


Memory Use
– Memory use on any server > 80%
Consider this “overuse”.
– Reduce assigned vRAM to VMs, if possible.
– Migrate VMs elsewhere, rebalance.

Network throughput
– Network throughput > 80% and steady
Begin analyzing throughput consumption
– Consider re-routing heavy consumption to independent pNICs & independent vSwitches.
– Rebalance load, although this tends to just shift problems.


Context Switches
– Context switches significantly higher than baseline
– Analyze workload. Consider V2P.
– Rebalance.
– Upgrade hardware to Nehalem / Opteron

IOPS
– IOPS demand > IOPS supply
Consider this “overuse”.
– Analyze with esxtop or Disk | Usage in Performance tab
– Adding disks spreads spindle demand, reduces contention
– Consider more/smaller datastores
– Consider new storage hardware that can rebalance internally based on observed contention. $$$
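The "101" rules of thumb above boil down to a handful of comparisons. Here is a minimal Python sketch of them, assuming you have already pulled the host's numbers from the Performance tab or esxtop yourself. The function name and argument names are hypothetical, and context switches are left out because the deck compares them to a baseline rather than a fixed number.

```python
def esx_101_flags(cpu_pct, mem_pct, net_pct, iops_demand, iops_supply,
                  threshold=80.0):
    """Return the 'ESX Performance 101' rules that a host currently trips.

    All inputs are values you collected yourself; nothing here talks to
    vCenter or ESX. The 80% threshold is the rule of thumb from the slides.
    """
    flags = []
    if cpu_pct > threshold:
        flags.append("cpu")        # reduce VM processing needs or rebalance
    if mem_pct > threshold:
        flags.append("memory")     # reduce assigned vRAM or rebalance
    if net_pct > threshold:
        flags.append("network")    # start analyzing throughput consumption
    if iops_demand > iops_supply:
        flags.append("iops")       # storage is oversubscribed
    return flags


if __name__ == "__main__":
    print(esx_101_flags(85.0, 60.0, 81.0, iops_demand=1200, iops_supply=1000))
```

A host that comes back with an empty list is not necessarily healthy; it has only passed the quick checks.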

DEMO: ESX performance tab.
DEMO: Customizing perf stats intervals

Thank you! Class Dismissed!

“Uh, Gimme’ a Break, Greg. Is that All You’ve Got?”

The Structured Approach!
– Greg’s TEN STEP Plan to VM Happiness
– Computers are deterministic.
– Virtual computers are as well; however, they are much more complicated.
– Virtual computers have so many more dependencies than traditional computers. Makes the ad hoc process less intuitive.
– Your “gut feeling” with virtual environments is less effective.

Homework Reading: Performance Troubleshooting for VMware vSphere 4
Get it at VMware.com

Step 1: VMware Tools

If the VMware Tools aren’t working, this will cause numerous low-level issues.
– Always start by verifying their functionality

DEMO: Verifying VMware Tools status

Step 2: Verify Host CPU Saturation

CPU saturation on an ESX host creates contention, which slows down all VMs.
– Performance | Advanced
– CPU | Usage
– Is this number consistently above 75%?
– If yes, go to Step 3.
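"Consistently above 75%" is a judgment call when you are eyeballing a chart. One way to pin it down, sketched here in Python, is to ask what fraction of the sampled values sit above the line. The 90% cut-off for "consistently" is an assumption of this sketch, not a VMware figure, and the function name is hypothetical.

```python
def consistently_above(samples, threshold=75.0, min_fraction=0.9):
    """True when at least min_fraction of the samples exceed threshold.

    samples      -- CPU usage percentages exported from the performance chart
    threshold    -- the 75% line from Step 2
    min_fraction -- assumed meaning of "consistently" (90% of samples)
    """
    if not samples:
        return False
    above = sum(1 for value in samples if value > threshold)
    return above / len(samples) >= min_fraction


if __name__ == "__main__":
    host_cpu = [80, 82, 79, 90, 77, 76, 88, 91, 95, 60]
    print(consistently_above(host_cpu))
```

The same helper works for the guest-side check in Step 4, since that step uses the same 75% line.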

Step 3: Verify VM Ready Time

If host CPU usage is high, the next step is to see which VM is causing the problem.
– Select Host | Virtual Machines tab | Host CPU – MHz column.
– Locate high-use VM.
– Select VM | Performance tab | CPU | Ready (all vCPUs)
If Ready > 2000ms for any vCPU, then host CPU saturation exists.
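The 2000ms figure is easier to reason about as a percentage. The sketch below assumes the Ready values come from the real-time chart, which this sketch takes to use 20-second samples; under that assumption 2000ms of ready time works out to 10% of the interval. The function names and the dictionary layout are hypothetical.

```python
REALTIME_INTERVAL_MS = 20_000  # assumption: real-time chart, 20-second samples


def ready_percent(ready_ms, interval_ms=REALTIME_INTERVAL_MS):
    """Convert a CPU Ready summation (ms) into a percentage of the interval."""
    return ready_ms * 100.0 / interval_ms


def saturated_vcpus(ready_ms_by_vcpu, limit_ms=2000):
    """Return the vCPUs whose Ready time exceeds the Step 3 limit."""
    return [vcpu for vcpu, ms in ready_ms_by_vcpu.items() if ms > limit_ms]


if __name__ == "__main__":
    sample = {"vcpu0": 2500, "vcpu1": 150}
    for vcpu, ms in sample.items():
        print(vcpu, f"{ready_percent(ms):.1f}% ready")
    print("over limit:", saturated_vcpus(sample))
```

If you read Ready from a chart with a different sample interval, pass that interval in; the millisecond limit does not carry over unchanged.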

Step 3: Solutions

Rebalance VMs. Move VMs off this host.
Increase CPU shares available to host, if resource constrained.
– Resource Pools can do this.
Reduce the number of vCPUs assigned to VMs.
Add hosts.

Step 4: Verify Guest CPU Saturation

Remember that CPU saturation can happen on the host, but it can also happen in the VM.
– Shares/Limits/Other can restrict guest processing.
– “Everything looks good on the host, but the guest is running at 100%”
Check VM CPU for saturation
– Select VM | Performance tab | CPU | Usage
– Is this number consistently above 75%?

The VM is working too hard
– (Aren’t we all?)
– Not getting enough resources to accomplish its task. Assign more CPU shares.
– Installed workload not well-throttled. Throttle or reconfigure applications. Balance processing across time of day.
– Add vCPUs. Only do this if the application is multi-threaded.
– Remove pinning of processes to processors.

Step 4½: Verify VMs are Actually Using their vCPUs

An interesting reverse! Assigning multiple vCPUs to a VM that isn’t using them wastes resources.
– If that VM isn’t using the vCPU, remove it so another VM can use it instead.
– Select VM | Performance tab | CPU | Usage
– Look at all vCPU objects.
– Is usage for all vCPUs but one close to 0?
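The "all vCPUs but one close to 0" test is simple to automate once you have per-vCPU usage figures. In this Python sketch, "close to 0" is taken to mean under 5% usage; that cut-off and the function name are assumptions for illustration.

```python
def single_vcpu_candidate(vcpu_usage_pct, idle_below=5.0):
    """True when a multi-vCPU VM has at most one vCPU doing real work.

    vcpu_usage_pct -- usage percentage per vCPU object, in any order
    idle_below     -- assumed definition of "close to 0"
    """
    if len(vcpu_usage_pct) < 2:
        return False  # already a single-vCPU VM, nothing to reclaim
    busy = [usage for usage in vcpu_usage_pct if usage >= idle_below]
    return len(busy) <= 1


if __name__ == "__main__":
    print(single_vcpu_candidate([62.0, 1.2, 0.4, 0.9]))
    print(single_vcpu_candidate([40.0, 35.0]))
```

Run it against averages over a representative period rather than a single sample, or a briefly idle VM will look like a candidate.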

Step 4½: Solutions

Reduce assigned vCPUs to one.
– …and don’t do that again!

Step 5: Check for Host Memory Swapping

Memory swapping is almost always a condition you want to avoid.
– Swapping exerts an incredible tax on performance.
– A solution of last resort.
– Select Host | Performance tab | Memory | Swap In/Out Rate
– Is either of these above 0?

Limited solutions for memory swapping.
– Reduce memory overcommit. Drop the level of assigned memory in each VM as appropriate.

– Most of us over-assign memory to VMs anyway. So, at least at first, this can sometimes be effective.

– Reduce reservations. Too many reservations can impact optimization of memory sharing.

– Add RAM.

– Enable resource controls. Note that this might cause VM memory swapping.

DEMO: Verifying a VM’s balloon driver is functioning.

Step 5½: Check for VM Memory Swapping

The solutions for Step 5 can cause downstream effects in each VM.
– You decrease available RAM
– VM doesn’t have enough
– VM itself has to swap
This is a situation just as bad as host swapping.
– Select Host | Performance tab | Memory | Real-Time | Stacked Graph (per VM)
– Are any VMs reporting memory swapping > 0?
– If so, then that VM needs more RAM.
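Steps 5 and 5½ are both "is anything above 0?" checks, one at the host and one per VM. A minimal Python sketch that reports both at once follows; the function name, the VM names, and the shape of the input are hypothetical, and the numbers are whatever you read off the charts.

```python
def swap_findings(host_swap_in, host_swap_out, vm_swap_rates):
    """Summarize the Step 5 (host) and Step 5½ (per-VM) swap checks.

    host_swap_in / host_swap_out -- host Swap In/Out Rate values
    vm_swap_rates                -- mapping of VM name to its swap rate
    """
    host_swapping = host_swap_in > 0 or host_swap_out > 0
    starved_vms = sorted(vm for vm, rate in vm_swap_rates.items() if rate > 0)
    return {"host_swapping": host_swapping, "vms_needing_ram": starved_vms}


if __name__ == "__main__":
    print(swap_findings(0, 0, {"web01": 0, "db01": 12.5, "app02": 3}))
```

A clean host with swapping VMs is the pattern Step 5½ warns about: the restrictions applied to fix the host went too far.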

Step 5½: Solutions

That VM needs more RAM.
– You’ve gone too far with restricting its resources.

Step 6: Check for Overloaded Storage

Many paths for verifying storage utilization.
– IOPS is an emerging metric.
– Can also verify Command Aborts. Identifies the number of SCSI commands that were aborted.
– Select host | Performance tab | Disk | Command Aborts | Attached LUNs.
– Are any LUNs showing Command Aborts > 0?

This indicates that the storage layer cannot keep up with the demands of VMs.
– Increase storage performance. $$$
– Segregate storage. Modularity assists here.
– Spread VMFS LUNs across more spindles. Add disks. Reduces storage contention.
– Use tools like vscsiStats to quantify storage behaviors.
– Balance memory with storage. Sometimes throwing more RAM at a VM lessens its storage demand.
– Buy new storage. Buy more storage. $$$

Step 6: vscsiStats

http://communities.vmware.com/docs/DOC-10095
– IO size
– Seek distance
– Outstanding IOs
– Latency in ms

Step 7-1: Check for Inbound Networking Problems

An inbound network problem is a VM that cannot process receive packets.
– Packets are coming in over the wire, but the VM lacks the resources to process them.
– Thus, those packets must be dropped and retransmitted, reducing effective performance.
– This creates a cascading problem. More dropped packets == more retransmitted ones == more to do == more oversubscription. Yikes!
– Select host | Performance tab | Network | Receive Packets Dropped
– Is this value greater than 0?

Step 7-1: Solutions

An inability to process inbound packets usually relates to vProc overutilization.
– With vNICs, your processor is needed to process their workloads.
– Not enough processor == a less-capable vNIC
– Reduce VM CPU utilization
– Increase VM CPU reservation
– Add pCPUs. Add servers.
– Verify VMs are using the most-effective driver (VMXNET3 for most workloads).

Step 7-2: Check for Outbound Networking Problems

An outbound network problem is a VM that cannot effectively send packets.
– Outbound VM packets are buffered at the vSwitch.
– Heavy traffic at the vSwitch can overload its attached pNIC.
– When this happens, packets get dropped and must be retransmitted.
– Select host | Performance tab | Network | Transmit Packets Dropped
– Is this value greater than 0?
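The two Step 7 checks point at different culprits: dropped receives usually trace back to VM CPU, dropped transmits to the pNIC behind the vSwitch. A short Python sketch of that mapping, with a hypothetical function name and the counters supplied by you from the Network chart:

```python
def diagnose_drops(rx_dropped, tx_dropped):
    """Map the Step 7 dropped-packet counters to where to look first."""
    findings = []
    if rx_dropped > 0:
        # Step 7-1: the VM lacks CPU to process what arrives on its vNIC.
        findings.append("inbound: check VM CPU utilization")
    if tx_dropped > 0:
        # Step 7-2: the vSwitch is overloading its attached pNIC.
        findings.append("outbound: check pNIC capacity on the vSwitch")
    return findings


if __name__ == "__main__":
    for line in diagnose_drops(rx_dropped=14, tx_dropped=0):
        print(line)
```

These are starting points only; the slides say "usually" and "often" for a reason.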

Step 7-2: Solutions

An inability to process outbound packets often requires additional pNICs.
– Aggregate more pNICs to handle outbound load.
– Ensure you’re not using failover mode, but load balancing.
– Rebalance high network use VMs to other hosts.
– Rebalance high network use VMs to other vSwitches (which should be attached to different pNICs).
– Add networking.
– Reduce ambient network traffic. Isolate subnets.
– Ahhh, the old backups network problem. Or, the n00b who multicasts on the server net! We’ve all been that n00b at some point…

Step 8: Check for Slow Storage

“Slow” storage is represented by high storage latency.
– Essentially, the storage isn’t responding fast enough.
– Storage layer itself could be insufficient, or overloaded.
– Select host | Performance tab | Disk | Physical Device Read/Write Latency (all LUNs)
– Are any average latencies greater than 10ms, or any peaks above 20ms?*
– * These are VMware’s suggested starting values. Yours may be different based on storage architecture.
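The Step 8 test has two parts, an average and a peak, and it applies per LUN. A minimal Python sketch follows, assuming you have exported latency samples in milliseconds for each LUN. The function name, LUN names, and data layout are hypothetical; the 10ms and 20ms defaults are the starting values quoted above and should be tuned to your storage.

```python
from statistics import mean


def slow_luns(latency_ms_by_lun, avg_limit=10.0, peak_limit=20.0):
    """Return {lun: (average_ms, peak_ms)} for LUNs that break either limit."""
    slow = {}
    for lun, samples in latency_ms_by_lun.items():
        if not samples:
            continue  # no data for this LUN, nothing to judge
        avg, peak = mean(samples), max(samples)
        if avg > avg_limit or peak > peak_limit:
            slow[lun] = (avg, peak)
    return slow


if __name__ == "__main__":
    observed = {
        "lun0": [4, 6, 5],
        "lun1": [8, 9, 25],    # one peak over 20ms
        "lun2": [12, 11, 13],  # average over 10ms
    }
    for lun, (avg, peak) in slow_luns(observed).items():
        print(f"{lun}: avg {avg:.1f}ms, peak {peak}ms")
```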

This indicates that the storage layer cannot keep up with the demands of VMs.
– Increase storage performance. $$$
– Segregate storage. Modularity assists here.
– Spread VMFS LUNs across more spindles. Add disks. Reduces storage contention.
– Use tools like vscsiStats to quantify storage behaviors.
– Balance memory with storage. Sometimes throwing more RAM at a VM lessens its storage demand.
– Buy new storage. Buy more storage. $$$
– Notice that these are the same as for Step 6!

[Diagram labels: ESX Server, SAN Storage Device]

Step 9: Check for Low VM CPU Utilization

Wait a minute! Isn’t low VM CPU utilization a good thing? Isn’t this why virtualization works?
– Yes, and no.
– Low VM CPU utilization can mean a low-needs workload.
– It can also mean a workload in a wait state.
– Only check here if end user experience is suffering.
– Select VM | Performance tab | CPU | Usage (VM)
– Is this a lower than expected value?

Suffering end user experience but low CPU utilization usually indicates a wait state.
– Verify other counters: Network, storage.
– Storage response time?
– Network response time?
– Other servers or virtual servers that this workload relies upon to do its job?
– Another common source: Overly restrictive resource allocations.

Step 10: Check for Memory Reclamation

Remember that ESX’s balloon driver will reclaim memory that it doesn’t believe a VM needs.
– However, that driver has very limited visibility into what each VM is actually doing with its memory.
– It becomes a problem when memory that the VM needs is reclaimed. Kind of like a double page fault.
– Select host | Performance tab | Memory | Balloon
– If this value is greater than 0, then…

– … Graph (per VM) | Balloon.
– Is this value greater than 0 for the specific VMs which are experiencing problems?

Ballooning occurs when there’s not enough memory to go around.
– You’re oversubscribing your RAM.
– This can be a good thing, unless it takes memory from where it’s actually needed.
– Eliminate memory overcommitment on the host. Essentially, stop assigning more RAM to VMs than you have.
– Use reservations to ensure adequate memory for VMs.
– Be aware that this may just shift the problem elsewhere.
– Buy RAM. Buy servers. $$$
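"Stop assigning more RAM to VMs than you have" is a quick calculation. The Python sketch below adds up the vRAM assigned to each VM and divides by the host's physical RAM. A ratio above 1.0 means the host is overcommitted. The function name and the sample sizes are hypothetical, and the sketch ignores hypervisor overhead and page sharing.

```python
def memory_overcommit_ratio(assigned_vram_mb, host_ram_mb):
    """Ratio of total vRAM assigned to VMs over the host's physical RAM."""
    if host_ram_mb <= 0:
        raise ValueError("host_ram_mb must be positive")
    return sum(assigned_vram_mb) / host_ram_mb


if __name__ == "__main__":
    vms_mb = [8192, 8192, 8192]   # three VMs with 8 GB each (made-up sizes)
    ratio = memory_overcommit_ratio(vms_mb, host_ram_mb=16384)
    print(f"overcommit ratio: {ratio:.2f}")
```

Some overcommit is the point of virtualizing; the ratio only tells you how hard the balloon driver may have to work.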

Honestly…
– …go buy a product. Let someone else do the work!

This analysis takes time.
– Time that you probably don’t have.
– What you want is actionable information
– “Convert all this math into a ‘click here’ response.”

Another problem throughout these approaches relates to their “perspective”.
– Virtualization touches everything in the datacenter and introduces dependencies everywhere.
– vSphere’s perspective means that it can only see behaviors as it observes them.
– Metaphor: Einstein’s Theory of Relativity.

Third-party products tie into networking, storage, applications, user experience, etc.
– They can interrelate performance from multiple perspectives.

Who’s Who in Virtualization Performance and Capacity Management

Source: http://www.virtualizationpractice.com/blog/?p=6749

Final Thoughts

Virtualization adds ridiculous interdependencies to the IT datacenter that weren’t there before.
– No human alive can monitor all those metrics effectively and at all times.
– You need actionable information.
– Use these tips to get you started, solve the immediate problems.
– Consider investing in a set-it-and-forget-it solution.


Please fill out evaluations, or more servers will crash!

