TRANSCRIPT
Copyright © 2009 Siemens Medical Solutions USA, Inc. All rights reserved.
“So What Do Those VMware Counters Really Mean and How Do They Help Me Identify My Problem?”
John Paul
Enterprise Hosting Services
Research and Development
February 20, 2009
Acknowledgments and Presentation Goal
The material in this presentation was pulled from a variety of sources, much of which was graciously provided by VMware staff members. In particular I would like to thank Scott Drummonds for his “Interpreting ESXTop Statistics” document, Eric Heck (local VMware SE) for his input and review of the material, and the Performance Team at VMware that shared their insight and valuable information with us during our recent visit to Palo Alto. I also acknowledge and thank the VMware staff for their permission to use their material in this presentation.
This presentation is intended to review the basics for performance analysis for the VI3.5 virtual infrastructure with detailed information on the tools and counters used. It presents a series of examples of how the performance counters report different types of resource consumption with a focus on key counters to observe.
Agenda
Performance Analysis Basics
Types of Performance Counters
Top Performance Counters
Basic Performance Analysis Approach
Native VMware Tools – Where Are We Looking?
A Comparison of Esxtop and the Virtual Infrastructure Client
A Quick Introduction to Esxtop
A Quick Introduction to the Virtual Infrastructure Client
Core Four Deep Dive - Performance Counters in Action
Closing Thoughts
Questions and Answers
Types of Performance Counters (a.k.a. statistics)
Static – Counters that don’t change during runtime, for example MEMSZ (memsize), Adapter queue depth, VM Name. The static counters are informational and may not be essential during performance problem analysis.
Dynamic – Counters that are computed dynamically, for example CPU load average, memory over-commitment load average.
Calculated – Some counters are calculated from the delta between two successive snapshots. The refresh interval (-d) determines the time between snapshots. For example:
%CPU used = (CPU used time at snapshot 2 - CPU used time at snapshot 1) / time elapsed between snapshots
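The delta calculation above can be sketched as a small helper. This is an illustrative function, not VMware code; the snapshot values stand in for the cumulative CPU-used times esxtop samples.

```python
def pct_cpu_used(used_t1, used_t2, elapsed):
    """Percentage of CPU used between two esxtop snapshots.

    used_t1/used_t2: cumulative CPU-used time (seconds) at each snapshot;
    elapsed: wall-clock seconds between snapshots (the -d refresh interval).
    """
    if elapsed <= 0:
        raise ValueError("elapsed must be positive")
    return 100.0 * (used_t2 - used_t1) / elapsed

# 2.5 s of CPU time consumed during a 5-second refresh interval:
print(pct_cpu_used(100.0, 102.5, 5.0))  # 50.0
```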
Top Performance Counters to Use for Initial Problem Determination
Physical/Virtual Machine
CPU (queuing)
• Average physical CPU utilization*
• Peak physical CPU utilization*
• CPU Time*
• Processor Queue Length
Memory (swapping)
• Average Memory Usage
• Peak Memory Usage
• Page Faults
• Page Fault Delta*
Disk (latency)
• Split IO/Sec
• Disk Read Queue Length
• Disk Write Queue Length
• Average Disk Sector Transfer Time
Network (queuing/errors)
• Total Packets/second
• Bytes Received/second
• Bytes Sent/second
• Output Queue Length
ESX Host
CPU (queuing)
• PCPU%
• %SYS
• %RDY
• Average physical CPU utilization
• Peak physical CPU utilization
• Physical CPU load average
Memory (swapping)
• State (memory state)
• SWTGT (swap target)
• SWCUR (swap current)
• SWR/s (swap reads/sec)
• SWW/s (swap writes/sec)
• Consumed
• Active (working set)
• Swapused (instantaneous swap)
• Swapin (cumulative swap in)
• Swapout (cumulative swap out)
• VMmemctl (balloon memory)
* Remember that counters based upon the timer interrupt inside a VM can be inconsistent and should be used as general guides only.
Disk (latency, queuing)
• DiskReadLatency
• DiskWriteLatency
• CMDS/s (commands/sec)
• Bytes transferred/received per second
• Disk bus resets
• ABRTS/s (aborts/sec)
• SPLTCMD/s (I/O split commands/sec)
Network (queuing/errors)
• %DRPTX (packets dropped – TX)
• %DRPRX (packets dropped – RX)
• MbTX/s (Mb transferred/sec – TX)
• MbRX/s (Mb transferred/sec – RX)
A Review of the Basic Performance Analysis Approach
Identify the virtual context of the reported performance problem
• Where is the problem being seen? (“When I do this here, I get that”)
• How is the problem being quantified? (“My function is 25% slower”)
• Apply a reasonability check (“Has something changed from the status quo?”)
Monitor the performance from within that virtual context
• View the performance counters in the same context as the problem
• Look at the ESX cluster-level performance counters
• Look for atypical behavior (“Is the amount of resources consumed characteristic of this particular application or task for the server processing tier?”)
• Look for repeat offenders! This happens often.
Expand the performance monitoring to each virtual context as needed
• Are other workloads influencing the virtual context of this particular application and causing a shortage of a particular resource?
• Consider how a shortage is instantiated for each of the Core Four resources
A Comparison of Esxtop and the Virtual Infrastructure Client
VIC gives a graphical view of both real-time and trend consumption
VIC combines real-time reporting with short term (1 hour) trending
VIC can report on the virtual machine, ESX host, or ESX cluster
VIC makes it a bit awkward to get the needed counters into the same view
Esxtop allows more concurrent performance counters to be shown
Esxtop has a higher system overhead to run
Esxtop can sample down to a 2 second sampling period
Esxtop gives a detailed view of each of the Core Four
Recommendation – Use VIC to get a general view of the system performance but use Esxtop for detailed problem analysis
A Brief Introduction to Esxtop
Launched at the root level of the ESX host
Fairly expensive in overhead, especially if the sampling rate increases
Screens c: cpu (default)
m: memory
n: network
d: disk adapter
u: disk device (new in ESX 3.5)
v: disk VM (new in ESX 3.5)
Output can be redirected to a file (batch mode) and then imported into the W2K System Monitor
Horizontal and vertical screen resolution limits the number of fields and entities that can be viewed, so choose your fields wisely
Some of the rollups and counters may be confusing to the casual user
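Batch-mode output is a perfmon-style CSV, which makes post-processing easy. Below is a minimal sketch of pulling one counter column out of such a file; the host name, counter name, and two-row excerpt are hypothetical stand-ins, since real captures have hundreds of columns whose names vary by ESX version.

```python
import csv
import io

# Hypothetical two-sample excerpt of esxtop batch output.
sample = io.StringIO(
    '"(PDH-CSV 4.0)","\\\\esx01\\Physical Cpu(_Total)\\% Util Time"\n'
    '"02/20/2009 10:00:00","12.5"\n'
    '"02/20/2009 10:00:05","87.5"\n'
)

reader = csv.reader(sample)
header = next(reader)
# Locate the column for the counter of interest by its header name.
col = header.index('\\\\esx01\\Physical Cpu(_Total)\\% Util Time')
samples = [float(row[col]) for row in reader]
print(sum(samples) / len(samples))  # average utilization over the capture
```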
Using the Esxtop screen view
[Screenshot callouts: time, uptime, running worlds; some fields are hidden from the view]
• Worlds = VMkernel processes (like W2K threads)
• ID = world identifier
• GID = world group identifier
• NWLD = number of worlds
Using the Esxtop screen view - expanding groups
• In the rolled-up view, some stats are cumulative across all the worlds in the group
• The expanded view gives a breakdown per world
• A VM group consists of mks (mouse, keyboard, screen), vcpu, and vmx worlds; SMP VMs have additional vcpu and vmm worlds
• vmm0, vmm1 = virtual machine monitors for vCPU0 and vCPU1, respectively
press ‘e’ key
A Brief Introduction to the Virtual Infrastructure Client
Screens – CPU, Disk, Management Agent, Memory, Network, System
vCenter collects performance metrics from the hosts that it manages and aggregates the data using a consolidation algorithm. The algorithm is optimized to keep the database size constant over time.
vCenter does not display many counters on trend/history screens.
Esxtop defaults to a 5-second sampling rate, while vCenter defaults to a 20-second rate.
Default statistics collection periods, samples, and how long they are stored:
Interval              Interval Period   Number of Samples   Interval Length
Per hour (real-time)  20 seconds        180                 1 hour
Per day               5 minutes         288                 1 day
Per week              30 minutes        336                 1 week
Per month             2 hours           360                 1 month
Per year              1 day             365                 1 year
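A quick way to sanity-check the table above: for each interval, samples × interval period should roughly equal the retention window. A sketch holding the table as data:

```python
# The vCenter collection intervals from the table above, as data:
# (interval seconds, samples kept, retention label).
intervals = {
    "real-time": (20, 180, "1 hour"),
    "per day":   (5 * 60, 288, "1 day"),
    "per week":  (30 * 60, 336, "1 week"),
    "per month": (2 * 3600, 360, "1 month"),
    "per year":  (24 * 3600, 365, "1 year"),
}

# samples * interval should roughly equal the retention window.
for name, (sec, n, label) in intervals.items():
    print(f"{name}: {n} samples x {sec}s = {n * sec / 3600:.1f} hours ({label})")
```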
The Effect of Sampling Rates/Times on ESXTop and VIC Counters
[Screenshots: ESXTop, the VIC Performance tab, and the VIC Summary tab observe the same host (Total CPU = 21.272 GHz) at the same time, yet report three different Total CPU Usage values: 3.68%, 14.18%, and 10.27%. The differing sampling rates and averaging windows explain the spread.]
Virtual Infrastructure Client – CPU Screen
[Screenshot callouts: To Change Settings; To Change Screens]
Virtual Infrastructure Client – Disk Screen
Virtual Infrastructure Client – Change Settings Screen
A Comparison of Memory Counters in ESXTop and VIC on a 24GB Host
[Screenshots: on the same 24 GB host, ESXTop, the VIC Performance tab, and the VIC Summary tab report Total Memory Usage of 52.12% (12.51 GB), 52.12%, and 51.87%; here the counters agree closely.]
Reservation (Guarantees)
• Minimum service level guarantee (in MHz)
• Applies even when the system is overcommitted
• Needs to pass admission control at start-up
Shares (Share the Resources)
• CPU entitlement is directly proportional to the VM's shares and depends on the total number of shares issued
• Abstract number; only the ratio matters
Limit
• Absolute upper bound on CPU entitlement (in MHz)
• Applies even when the system is not overcommitted
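The proportional-share rule can be sketched numerically. This simplified model ignores reservations and limits, which take precedence in the real scheduler; the VM names and MHz figures are illustrative.

```python
def entitlements(shares_by_vm, available_mhz):
    """CPU entitlement per VM under contention, proportional to shares.

    Simplified sketch: ignores reservations and limits, which
    take precedence in the real ESX scheduler.
    """
    total = sum(shares_by_vm.values())
    return {vm: available_mhz * s / total for vm, s in shares_by_vm.items()}

# Three VMs competing for 4200 MHz; only the share ratio matters:
print(entitlements({"vm_a": 2000, "vm_b": 1000, "vm_c": 1000}, 4200))
# vm_a is entitled to 2100 MHz, vm_b and vm_c to 1050 MHz each
```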
Resource Control Revisit – CPU Example
[Diagram: a vertical MHz scale from 0 MHz to total MHz; the Reservation sets the floor, the Limit sets the ceiling, and Shares apportion the range in between.]
Types of Resources – The Core Four
Though the Core Four resources exist at both the ESX host and virtual machine levels, they are not the same in how they are instantiated and reported against.
CPU – processor cycles (vertical), multi-processing (horizontal)
Memory – allocation and sharing
Disk (a.k.a. storage) – throughput, size, latencies, queuing
Network - throughput, latencies, queuing
Though all resources are limited, ESX handles them differently: CPU is strictly scheduled; memory is more fluid, adjusted and reclaimed when based on shares; disk and network are fixed-bandwidth resources (except for queue depths).
CPU – Understanding PCPU versus VCPU
It is important to separate the physical CPU (PCPU) resources of the ESX host from the virtual CPU (VCPU) resources that ESX presents to the virtual machine.
PCPU – The ESX host’s processor resources are exposed only to ESX. The virtual machines are not aware of, and cannot report on, those physical resources.
VCPU – ESX effectively assembles one or more virtual CPUs for each virtual machine from the physical machine’s processors/cores, based upon the type of resource allocation (e.g., shares, guarantees, minimums).
Scheduling - The virtual machine is scheduled to run inside the VCPU(s), with the virtual machine’s reporting mechanism (such as W2K’s System Monitor) reporting on the virtual machine’s allocated VCPU(s) and remaining Core Four resources.
PCPU and VCPU Example – Two Virtual Machines
[Diagram: a physical server stack (Application / Operating System / Intel hardware with PCPU, PMemory, PNIC, PDisk) alongside two virtual machines whose stacks (Application / Operating System / virtual Intel hardware) see VCPU, VMemory, VNIC, and VDisk, all backed by the host's physical PCPU, PMemory, PNIC, and PDisk.]
Physical Host (Physical Resources): 2 sockets, four cores @ 2.1 GHz = 16.8 GHz CPU; 8 GB RAM
Two Virtual Machines (Allocated Physical Resources): 2 cores @ 2.1 GHz = 4.2 GHz CPU; 2 GB RAM
Remaining Physical Resources: 6 cores @ 2.1 GHz = 12.6 GHz CPU (minus virt. overhead); 6 GB RAM (minus virt. overhead)
Limits, maximums, and shares all affect real resources
Virtual Machine Logical Resources:
• Each virtual machine defined as a uniprocessor
• VCPU = 2.1 GHz since uniprocessor
• Memory allocation of 1 GB per virtual machine
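The capacity arithmetic from this example can be sketched directly; the figures match the slide, and the remainder is computed before virtualization overhead is subtracted.

```python
# Host from the example: 2 sockets x 4 cores @ 2.1 GHz, 8 GB RAM.
cores, clock_ghz = 8, 2.1
total_cpu_ghz = cores * clock_ghz   # 16.8 GHz
total_ram_gb = 8

# Two uniprocessor VMs, 1 vCPU @ 2.1 GHz and 1 GB RAM each.
vms = [{"vcpus": 1, "ram_gb": 1}, {"vcpus": 1, "ram_gb": 1}]
alloc_cpu_ghz = sum(vm["vcpus"] * clock_ghz for vm in vms)  # 4.2 GHz
alloc_ram_gb = sum(vm["ram_gb"] for vm in vms)              # 2 GB

# Remaining capacity, before virtualization overhead is subtracted:
print(round(total_cpu_ghz - alloc_cpu_ghz, 1), total_ram_gb - alloc_ram_gb)  # 12.6 6
```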
CPU – Key Question and Considerations
Allocation – The CPU allocation for a specific workload can be constrained due to the resource settings or number of CPUs, amount of shares, or limits. The key field at the virtual machine level is CPU queuing and at the ESX level it is Ready to Run (%RDY in Esxtop).
Capacity - The virtual machine’s CPU can be constrained due to a lack of sufficient capacity at the ESX host level as evidenced by the PCPU/LCPU utilization.
Contention – The specific workload may be constrained by the consumption of workloads operating outside of their typical patterns.
SMP CPU Skewing – The movement toward lazy scheduling of SMP CPUs can cause delays if one CPU gets too far “ahead” of the other. Look for a higher %CSTP (co-schedule pending).
Is there a lack of CPU resources for the VCPU(s) of the virtual machine or for the PCPU(s) of the ESX host?
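One way to screen for the answer is to normalize the group's %RDY by its vCPU count before comparing against a threshold, since esxtop sums ready time across a group's worlds. The function and the 10%-per-vCPU rule of thumb below are illustrative assumptions, not a VMware-defined metric; tune the threshold for your environment.

```python
def ready_flag(rdy_pct, nvcpu, warn_per_vcpu=10.0):
    """Flag high CPU ready time.

    rdy_pct: the group's %RDY, which esxtop sums across worlds;
    nvcpu: the VM's vCPU count, used to normalize the sum.
    The 10% per-vCPU threshold is an assumed rule of thumb.
    """
    per_vcpu = rdy_pct / max(nvcpu, 1)
    return "investigate" if per_vcpu >= warn_per_vcpu else "ok"

print(ready_flag(8.0, 1))   # ok - 8% ready on a 1-vCPU VM
print(ready_flag(45.0, 2))  # investigate - 22.5% ready per vCPU
```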
CPU Performance Counters 1
Field Usage
PCPU(%) – Percentage of CPU utilization per physical CPU, and the total average physical CPU utilization.
LCPU(%) – Percentage of CPU utilization per logical CPU. The percentages for the logical CPUs belonging to a package add up to 100 percent. This line appears only if hyper-threading is present and enabled.
CCPU(%) – Percentages of total CPU time as reported by the ESX Server service console:
• us – percentage user time
• sy – percentage system time
• id – percentage idle time
• wa – percentage wait time
• cs/sec – context switches per second recorded by the service console
ID – Resource pool ID or virtual machine ID of the running world’s resource pool or virtual machine, or world ID of the running world.
GID – Resource pool ID of the running world’s resource pool or virtual machine.
NAME – Name of the running world’s resource pool or virtual machine, or name of the running world.
CPU Performance Counters 2
Field Usage
NWLD – Number of members in the running world’s resource pool or virtual machine. If a group is expanded using the interactive command e (see interactive commands), then NWLD for each of the resulting worlds is 1 (some resource pools, like the console resource pool, have only one member).
%STATE TIMES – Set of CPU statistics made up of the following percentages. For a world, the percentages are a percentage of one physical CPU.
%USED – Percentage of physical CPU used by the resource pool, virtual machine, or world. This combines all of the cores, so the value can be greater than 100%.
%RUN – Percentage of total time scheduled. This time does not account for hyper-threading and system time. On a hyper-threading enabled server, %RUN can be twice as large as %USED.
%SYS – Percentage of time spent in the ESX Server VMkernel on behalf of the resource pool, virtual machine, or world to process interrupts and to perform other system activities. This time is part of the time used to calculate %USED, above.
%WAIT – Percentage of time the resource pool, virtual machine, or world spent in the blocked or busy-wait state. This percentage includes the percentage of time the resource pool, virtual machine, or world was idle.
CPU Performance Counters 3
Field Usage
%IDLE – Percentage of time the resource pool, virtual machine, or world was idle. Subtract this percentage from %WAIT, above, to see the percentage of time the resource pool, virtual machine, or world was waiting for some event.
%RDY – Percentage of time the resource pool, virtual machine, or world was ready to run.
%OVRLP – Percentage of system time spent, while a resource pool, virtual machine, or world was scheduled, on behalf of a different resource pool, virtual machine, or world. This time is not included in %SYS. For example, if virtual machine A is currently being scheduled and a network packet for virtual machine B is processed by the ESX Server VMkernel, the time spent appears as %OVRLP for virtual machine A and %SYS for virtual machine B.
%CSTP – Percentage of time a resource pool spends in a ready, co-deschedule state.
%MLMTD – Percentage of time the ESX Server VMkernel deliberately did not run the resource pool, virtual machine, or world because doing so would violate its limit setting. Even though the resource pool, virtual machine, or world is ready to run when it is prevented from running in this way, %MLMTD time is not included in %RDY time.
EVENT COUNTS/s – Set of CPU statistics made up of per-second event rates. These statistics are for VMware internal use only.
CPU Performance Counters 4
Field Usage
CPU ALLOC – Set of CPU statistics made up of the following CPU allocation configuration parameters.
AMIN – Resource pool, virtual machine, or world attribute Reservation.
AMAX – Resource pool, virtual machine, or world attribute Limit. A value of -1 means unlimited.
ASHRS – Resource pool, virtual machine, or world attribute Shares.
STATS – Parameters and statistics. These statistics are applicable only to worlds, not to virtual machines or resource pools.
AFFINITY BIT MASK – Bit mask showing the current scheduling affinity for the world.
HTSHARING – Current hyper-threading configuration.
CPU – The physical or logical processor on which the world was running when resxtop (or esxtop) obtained this information.
HTQ – Indicates whether the world is currently quarantined. N means no and Y means yes.
TIMER – Timer rate for this world.
Esxtop CPU screen (c)
PCPU = Physical CPU/core
CCPU = Console CPU (CPU 0)
Press ‘f’ key to choose fields
Idle State on Test Bed (CPU View) – ESXTop Virtual Machine View
Idle State on Test Bed – GID 32 Expanded
[Screenshot callouts: rolled-up GID vs. expanded GID showing five worlds; cumulative wait % and total idle %. Wait includes idle.]
High CPU within one virtual machine caused by affinity - Esxtop
[Screenshot: one physical CPU fully used; one virtual CPU fully used.]
Case Studies - CPU
High CPU within one virtual machine (affinity) - vCenter
View of the ESX Host
View of the VM
Add a Second Virtual Machine – MAX CPU (affinity)
[Screenshot: 2 physical CPUs fully used; 2 virtual CPUs fully used; Ready to Run acceptable.]
Add a Third Virtual Machine – MAX CPU (affinity)
[Screenshot: 3 physical CPUs fully used; 3 virtual CPUs fully used; Ready to Run acceptable.]
Add a Fourth Virtual Machine – MAX CPU (affinity)
[Screenshot: 4 physical CPUs fully used; 4 virtual CPUs fully used; Ready to Run acceptable.]
Four Virtual Machines – MAX CPU (no affinity)
[Screenshot: 4 physical CPUs fully used; 4 virtual CPUs fully used; Ready to Run acceptable.]
SMP Implementation WITHOUT CPU Constraints
[Screenshot: 4 physical CPUs fully used; 4 virtual CPUs fully used; Ready to Run acceptable; one 2-vCPU SMP VM.]
SMP Implementation WITHOUT CPU Constraints - VIC
SMP Implementation WITHOUT CPU Constraints - Esxtop
[Screenshot: 4 physical CPUs fully used; 4 virtual CPUs fully used; Ready to Run acceptable; one 2-vCPU SMP VM.]
Copyright © 2009 Siemens Medical Solutions USA, Inc. All rights reserved.Page 38
SMP Implementation with Mild CPU Constraints
[Screenshot: 4 physical CPUs fully used; 4 virtual CPUs heavily used; Ready to Run indicates severe problems; one 2-vCPU SMP VM (7 NWLD).]
SMP Implementation with Severe CPU Constraints
[Screenshot: 4 physical CPUs fully used; 4 virtual CPUs fully used; Ready to Run indicates severe problems; two 2-vCPU SMP VMs (7 NWLD).]
SMP Implementation with Severe CPU Constraints
[Screenshot: 4 physical CPUs fully used; 4 virtual CPUs fully used; Ready to Run indicates severe problems; two 2-vCPU SMP VMs (7 NWLD).]
SMP Implementation with Severe CPU Constraints
Memory – Separating machine memory and guest memory
It is important to note that some statistics refer to guest physical memory while others refer to machine memory. “Guest physical memory” is the virtual-hardware physical memory presented to the VM. “Machine memory” is the actual physical RAM in the ESX host.
In the figure below, two VMs are running on an ESX host, where each block represents 4 KB of memory and each color represents a different set of data on a block.
Inside each VM, the guest OS maps virtual memory to its guest physical memory. The ESX VMkernel maps guest physical memory to machine memory. Due to ESX page-sharing technology, guest physical pages with the same content can be mapped to the same machine page.
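The page-sharing bookkeeping can be illustrated with a toy model: pages with identical content are backed by one machine page. This sketch only shows the mapping idea; real ESX hashes candidate pages and verifies matches bit-for-bit before sharing, and the page contents below are made up.

```python
import hashlib

# Toy guest pages: the first two have identical content and can share a frame.
guest_pages = [b"zeros" * 819, b"zeros" * 819, b"app-data-1", b"app-data-2"]

machine_pages = {}   # content hash -> machine page id
mapping = []         # guest page index -> machine page id
for page in guest_pages:
    digest = hashlib.sha1(page).hexdigest()
    if digest not in machine_pages:
        machine_pages[digest] = len(machine_pages)
    mapping.append(machine_pages[digest])

print(mapping)             # [0, 0, 1, 2] - first two guest pages share a frame
print(len(machine_pages))  # 3 machine pages back 4 guest pages
```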
Case Studies - Memory
A Brief Look at Ballooning
The W2K balloon driver is located in VMtools
ESX sets a balloon target for each workload at start-up and as workloads are introduced/removed
The balloon driver expands memory consumption, requiring the Virtual Machine operating system to reclaim memory based on its algorithms
Ballooning routinely takes 10-20 minutes to reach the target
The returned memory is now available for ESX to use
Key ballooning fields:
• SZTGT: determined by reservation, limit, and memory shares
• SWCUR = 0: no swapping in the past
• SWTGT = 0: no swapping pressure
• SWR/s, SWW/s = 0: no swapping activity currently
ESX Memory Sharing - The “Water Bed Effect”
ESX handles memory shares on an ESX host and across an ESX cluster with a result similar to a single water bed, or room full of water beds, depending upon the action and the memory allocation type:
Initial ESX boot (i.e., “lying down on the water bed”) – ESX sets a target working size for each virtual machine, based upon the memory allocations or shares, and uses ballooning to pare back the initial allocations until those targets are reached (if possible).
Steady State (i.e., “minor position changes”) - The host gets into a steady state with small adjustments made to memory allocation targets. Memory “ripples” occur during steady state, with the amplitude dependent upon the workload characteristics and consumption by the virtual machines.
New Event (i.e., “second person on the bed”) – The host receives additional workload via a newly started virtual machine or VMotion moves a virtual machine to the host through a manual step, maintenance mode, or DRS. ESX pares back the target working size of that virtual machine while the other virtual machines lose CPU cycles that are directed to the new workload.
Large Event (i.e., “jumping across water beds”) – The cluster has a major event that causes a substantial movement of workloads to or between multiple hosts. Each of the hosts has to reach a steady state, or to have DRS determine that the workload is not a current candidate for the existing host, moving to another host that has reached a steady state with available capacity. Maintenance mode is another major event.
Memory – Key Question and Considerations
Is the memory allocation for each workload high enough to prevent swapping at the virtual machine level, yet low enough not to constrain other workloads or the ESX host?
HA/DRS/Maintenance Mode Regularity – How often do the workloads in the cluster get moved between hosts? Each movement impacts the receiving (negatively) and sending (positively) hosts, with maintenance mode causing a rolling wave of impact across the cluster, depending upon the timing.
Allocation Type – Each of the allocation types has its drawbacks, so tread carefully when choosing one. One size seldom fits all needs.
Capacity/Swapping – The virtual machine’s memory can be constrained by a lack of sufficient capacity at the ESX host level. Look for regular swapping at the ESX host level as an indicator of a memory capacity issue, but watch for memory leaks that artificially force a memory shortage.
Memory Performance Counters 1
Field Usage
PMEM (MB) – Displays the machine memory statistics for the server. All numbers are in megabytes.
• total – total amount of machine memory in the server
• cos – amount of machine memory allocated to the ESX Server service console (ESX Server 3 only)
• vmk – amount of machine memory being used by the ESX Server VMkernel
• other – amount of machine memory being used by everything other than the ESX Server service console (ESX Server 3 only) and the ESX Server VMkernel
• free – amount of machine memory that is free
• state – current machine memory availability state. Possible values are high, soft, hard, and low. High means the machine memory is not under any pressure; low means that it is.
VMKMEM (MB) – Displays the machine memory statistics for the ESX Server VMkernel. All numbers are in megabytes.
• managed – total amount of machine memory managed by the ESX Server VMkernel
• min free – minimum amount of machine memory that the ESX Server VMkernel aims to keep free
• rsvd – total amount of machine memory currently reserved by resource pools
• ursvd – total amount of machine memory currently unreserved
Memory Performance Counters 2
Field Usage
COSMEM (MB) – Displays the memory statistics as reported by the ESX Server service console (ESX Server 3 only). All numbers are in megabytes.
• free – amount of idle memory
• swap_t – total swap configured
• swap_f – amount of swap free
• r/s – rate at which memory is swapped in from disk
• w/s – rate at which memory is swapped to disk
NUMA (MB) – Displays the ESX Server NUMA statistics. This line appears only if the ESX Server host is running on a NUMA server. All numbers are in megabytes. For each NUMA node in the server, two statistics are displayed:
• the total amount of machine memory in the NUMA node that is managed by the ESX Server
• the amount of machine memory in the node that is currently free (in parentheses)
PSHARE (MB) – Displays the ESX Server page-sharing statistics. All numbers are in megabytes.
• shared – amount of physical memory that is being shared
• common – amount of machine memory that is common across worlds
• saving – amount of machine memory that is saved because of page sharing
Memory Performance Counters 3
Field Usage
SWAP (MB) – Displays the ESX Server swap usage statistics. All numbers are in megabytes.
• curr – current swap usage
• target – where the ESX Server system expects the swap usage to be
• r/s – rate at which memory is swapped in from disk by the ESX Server system
• w/s – rate at which memory is swapped to disk by the ESX Server system
MEMCTL (MB) – Displays the memory balloon statistics. All numbers are in megabytes.
• curr – total amount of physical memory reclaimed using the vmmemctl module
• target – total amount of physical memory the ESX Server host attempts to reclaim using the vmmemctl module
• max – maximum amount of physical memory the ESX Server host can reclaim using the vmmemctl module
AMIN – Memory reservation for this resource pool or virtual machine.
AMAX – Memory limit for this resource pool or virtual machine. A value of -1 means unlimited.
ASHRS – Memory shares for this resource pool or virtual machine.
AMLMT – Memory limit for this resource pool or virtual machine.
Memory Performance Counters 4
Field Usage
AUNITS – Type of units shown (KB).
N%L – Current percentage of memory allocated to the virtual machine or resource pool that is local.
MEMSZ (MB) – Amount of physical memory allocated to a resource pool or virtual machine.
SZTGT (MB) – Amount of machine memory the ESX Server VMkernel wants to allocate to a resource pool or virtual machine.
TCHD (MB) – Working-set estimate for the resource pool or virtual machine.
%ACTV – Percentage of guest physical memory that is being referenced by the guest. This is an instantaneous value.
%ACTVS – Percentage of guest physical memory that is being referenced by the guest. This is a slow moving average.
%ACTVF – Percentage of guest physical memory that is being referenced by the guest. This is a fast moving average.
%ACTVN – Percentage of guest physical memory that is being referenced by the guest. This is an estimation.
MCTL – Whether the memory balloon driver is installed. N means no, Y means yes.
MCTLSZ (MB) – Amount of physical memory reclaimed from the resource pool by way of ballooning.
MCTLTGT (MB) – Amount of physical memory the ESX Server system can reclaim from the resource pool or virtual machine by way of ballooning.
Memory Performance Counters 5
Field Usage
MCTLMAX (MB) – Maximum amount of physical memory the ESX Server system can reclaim from the resource pool or virtual machine by way of ballooning. This maximum depends on the guest operating system type.
SWCUR (MB) – Current swap usage by this resource pool or virtual machine.
SWTGT (MB) – Target where the ESX Server host expects the swap usage by the resource pool or virtual machine to be.
SWR/s (MB) – Rate at which the ESX Server host swaps in memory from disk for the resource pool or virtual machine.
SWW/s (MB) – Rate at which the ESX Server host swaps resource pool or virtual machine memory to disk.
CPTRD (MB) – Amount of data read from the checkpoint file.
CPTTGT (MB) – Size of the checkpoint file.
ZERO (MB) – Resource pool or virtual machine physical pages that are zeroed.
SHRD (MB) – Resource pool or virtual machine physical pages that are shared.
SHRDSVD (MB) – Machine pages that are saved because of resource pool or virtual machine shared pages.
OVHD (MB) – Current space overhead for the resource pool.
Memory Performance Counters 6
Field Usage
OVHDMAX (MB) Maximum space overhead that might be incurred by the resource pool or virtual machine.
OVHDUW Current space overhead for a user world.
RESP? Memory responsive?
NHN Current home node for the resource pool or virtual machine. This statistic is applicable only on NUMA systems. If the virtual machine has no home node, a dash (-) is displayed.
NRMEM (MB) Current amount of remote memory allocated to the virtual machine or resource pool. This statistic is applicable only on NUMA systems.
GST_NDx (MB) Guest memory allocated for a resource pool on NUMA node x. This statistic is applicable only on NUMA systems.
OVD_NDx (MB) VMM overhead memory allocated for a resource pool on NUMA node x. This statistic is applicable only on NUMA systems.
Esxtop memory screen (m)
The header shows total Physical Memory (PMEM), split into COSMEM (memory used by the Service Console), VMKMEM (memory managed by the VMkernel), and the PCI hole.
Possible memory states: high, soft, hard, and low.
Esxtop memory screen (m), continued
SZTGT = size target, determined by reservation, limit, and memory shares
SWTGT = swap target; SWTGT = 0 means no swapping pressure
SWCUR = currently swapped; SWCUR = 0 means no swapping in the past
MEMCTL = balloon driver
SWR/s, SWW/s = swap reads and writes per second; 0 means no current swapping activity
The screen also shows Service Console swapping activity and VMkernel swapping activity separately.
VIC Memory Screen – Summary Tab
Host Memory Usage: the amount of physical memory (in MB) allocated to a guest
Guest Memory Usage: the amount of physical memory (in MB) actively used by a guest
Requested Memory: the amount of physical memory (in MB) requested for a guest
ESXTop and VIC Summary Tab – Memory Screens
Callouts mapping the two views:
- Virtual machine memory actively used
- Virtual machine requested memory
- Virtual machine target working set size
- Virtual machines that have previously swapped
Idle State on Test Bed – Memory View
Memory View at Steady State of 3 Virtual Machines – Memory Shares
- One virtual machine was just powered on
- The other VMs are at a memory steady state
- No VM swapping or swap targets
- Most memory is not reserved
Ballooning and Swapping in Progress – Memory View
- Ballooning in effect
- Mild swapping
- Different size targets due to different amounts of up time
- Possible memory states: high, soft, hard, and low
Memory Reservations – Effect on New Loads
What Size Virtual Machine with Reserved Memory Can Be Started?
- Three VMs, each with 2GB of reserved memory
- 666MB of unreserved memory remains
- 6GB of "free" physical memory, due to memory sharing over 20 minutes
- A fourth virtual machine with more than 512MB of reserved memory cannot be started
- A fourth virtual machine with 512MB of reserved memory starts successfully
Memory Shares – Effect on New Loads
- Three VMs, each with a 2GB memory allocation
- 6GB of unreserved memory; 5.9GB of "free" physical memory
- A fourth virtual machine with a 2GB memory allocation starts successfully
Storage – Key Question and Considerations
Is the bandwidth and configuration of the storage subsystem sufficient to meet the desired latency for the target workloads? If the latency target is not being met, further analysis may be very time consuming.
Queuing – Queuing can happen at any point along the storage path, but is not necessarily a bad thing if latency still meets requirements.
Storage Path Configuration and Capacity – It is critical to know the configuration of the storage path and the capacity of each node along that path. The number of active vmkernel commands must be less than or equal to the maximum queue depth of each storage path component while processing the target storage workload.
Virtual Machines per LUN – The number of outstanding active vmkernel commands per virtual machine, times the number of virtual machines on a specific LUN, must be less than the queue depth of that adapter.
The Latency vs. Throughput Relationship –
Throughput (MB/s) = (Outstanding I/Os / Latency (ms)) * Block size (KB)
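The mixed units in this formula do cancel out: KB per millisecond is numerically equal to MB per second (decimal units). A minimal sketch, with illustrative values:

```python
def throughput_mb_per_s(outstanding_ios, latency_ms, block_kb):
    # Throughput (MB/s) = (outstanding I/Os / latency in ms) * block size in KB.
    # KB per millisecond equals MB per second in decimal units,
    # which is why the slide's mixed units work out.
    return outstanding_ios / latency_ms * block_kb

# e.g. 32 outstanding I/Os at 10 ms latency with 8 KB blocks -> 25.6 MB/s
print(throughput_mb_per_s(32, 10, 8))
```

Halving latency (or doubling the queue of outstanding I/Os) doubles throughput, which is the trade-off the later queue-depth slides revisit.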
Storage – More Questions
How fast can the individual disk drive process a request?
Based upon the block-size and type of I/O (sequential read, sequential write, random read, random write) what type of configuration (RAID, number of physical spindles, cache) is required to match the I/O characteristics and workload demands for average and peak throughput?
Can the network storage (SAN frame) handle the I/O rate down each path and aggregated across the internal bus, frame adapters, and front-end processors?
To answer these questions we need to better understand the underlying design, considerations, and basics of the storage subsystem.
Back-end Storage Design Considerations
Capacity – What is the storage capacity needed for this workload/cluster?
- Disk drive size (146GB, 300GB)
- Number of disk drives needed within a single logical unit (e.g., a LUN)
IOPS Rate – How many I/Os per second are expected at the needed latency?
- Number of physical spindles per LUN
- Impact of sharing physical disk drives between LUNs
- Configuration (e.g., cache) and speed of the disk drives
Availability – How many disk drives or storage components can fail at one time?
- Type of RAID chosen
- Amount of redundancy built into the storage solution
Cost – What is the delivered cost per byte at the required speed and availability?
- Many options are available for each design consideration
- The cumulative requirements for capacity, IOPS rate, and availability often dictate the final choice for each component and the overall solution
Storage from the Ground Up – Physical Disk Types
SCSI (Small Computer Systems Interface) – allows a queuing depth of 16
Fibre Channel (FC)
S-ATA (Serial Advanced Technology Attachment) – no command queuing, lower cost; especially good for sequential I/O (e.g., backup)
SAS (Serial Attached SCSI)
Solid State Disk (SSD)
Storage from the Ground Up – Configuration Options
Latency – The average time it takes for the requested sector to rotate under the read/write head after a completed seek. Average disk latency = 1/2 * rotation time: 5400 RPM (5.5ms), 7200 RPM (4.2ms), 10,000 RPM (3ms), 15,000 RPM (2ms).
Seek Time – The time it takes for the read/write head to find the physical location of the requested data on the disk. Average seek time: 8-10ms.
Average Access Time – The time it takes for a disk device to service a request. This includes seek time, latency, and command processing overhead time.
Host Transfer Rate – The speed at which the host can transfer data across the disk interface.
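The rotational latencies quoted above follow directly from the RPM figure: one rotation takes 60000/RPM milliseconds, and on average the requested sector is half a rotation away. A quick sketch (the slide's 5400 RPM figure of 5.5ms is rounded slightly differently from the exact 5.56ms):

```python
def avg_rotational_latency_ms(rpm):
    # One rotation takes 60000/rpm milliseconds; on average the
    # requested sector is half a rotation away from the head.
    return 60000.0 / rpm / 2.0

for rpm in (5400, 7200, 10000, 15000):
    print(rpm, "RPM ->", round(avg_rotational_latency_ms(rpm), 1), "ms")
```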
The Effect of Rotational Speed on Transfer Time
15,000 RPM disk:
Average seek 3.6ms (1/3 stroke)
Average rotational latency 2.0ms
Command and data transfer 0.2ms
Total random access time = 3.6 + 2.0 + 0.2 = 5.8ms
1 / 0.0058s ≈ 172 I/Os per second per disk
At 8KB per I/O, that is about 1.4 MB/s
10,000 RPM disk:
Average seek 4.9ms (1/3 stroke)
Average rotational latency 3.0ms
Command and data transfer 0.2ms
Total random access time = 4.9 + 3.0 + 0.2 = 8.1ms
1 / 0.0081s ≈ 123 I/Os per second per disk
At 8KB per I/O, that is about 984 KB/s
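The arithmetic above generalizes: random IOPS is simply the reciprocal of the total access time. A small sketch reproducing the slide's numbers:

```python
def random_iops(seek_ms, rotational_ms, xfer_ms):
    # Total random access time is the sum of seek, rotational latency,
    # and command/data transfer; its reciprocal (per second) is the
    # achievable random IOPS for a single disk.
    access_ms = seek_ms + rotational_ms + xfer_ms
    return 1000.0 / access_ms

iops_15k = random_iops(3.6, 2.0, 0.2)   # ~172 I/Os per second
iops_10k = random_iops(4.9, 3.0, 0.2)   # ~123 I/Os per second
print(int(iops_15k), int(iops_15k) * 8, "KB/s at 8 KB per I/O")
print(int(iops_10k), int(iops_10k) * 8, "KB/s at 8 KB per I/O")
```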
(Diagram: disk platter with actuator arm seeking across concentric data tracks)
An Example of the Benefits of Faster Drives
Drive Speed | Drive Capacity (GB) | # of Drives | RAID 5 Capacity (GB) | Performance (IOPS) | Short-Stroked
10K RPM     | 146                 | 17          | 2482                 | 700                | No
15K RPM     | 146                 | 11          | 1606                 | 700                | No
10K RPM     | 73                  | 17          | 1241                 | 700                | No
15K RPM     | 146                 | 11          | 1606                 | 700                | No
10K RPM     | 300                 | 16          | 2064                 | 763                | Yes
15K RPM     | 146                 | 14          | 2044                 | 772                | No
Interaction of Transfer Rate, Block Size, and IOPS - Read
(Chart: IOPS, block size, and transfer rate plotted together for read workloads)
As block size increases, IOPS decrease and throughput increases.
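The shape of that curve falls out of a simple model: each I/O pays a fixed positioning overhead plus a transfer time proportional to the block size. The overhead and media rate below are illustrative assumptions, not figures from the chart:

```python
def disk_model(block_kb, overhead_ms=5.0, media_mb_s=80.0):
    # Per-I/O service time = fixed overhead (seek + rotation + command)
    # plus the media transfer time for the block itself.
    # overhead_ms and media_mb_s are assumed example values.
    xfer_ms = block_kb / media_mb_s          # KB / (MB/s) = ms in decimal units
    service_ms = overhead_ms + xfer_ms
    iops = 1000.0 / service_ms
    throughput_mb_s = iops * block_kb / 1000.0
    return iops, throughput_mb_s

for kb in (4, 64, 256):
    iops, tput = disk_model(kb)
    print(kb, "KB:", round(iops), "IOPS,", round(tput, 1), "MB/s")
```

Larger blocks amortize the fixed overhead, so IOPS fall while delivered MB/s rise, exactly the crossover the chart shows.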
Storage from the Ground Up – Common RAID Types/Uses
(Diagrams: RAID 0 striping data blocks across two disks; RAID 1 mirroring data blocks across two disks; RAID 5 striping data blocks with distributed parity across four disks)
RAID 0 – Stripes data across two or more disks with no parity. Poor redundancy, good performance.
RAID 1 – Data is duplicated across disks (mirrored). Good redundancy, good read performance, reduced capacity for the price.
RAID 5 – Data blocks are striped across disks with parity data distributed across the physical disks. Good redundancy, performance penalty on writes, high capacity for the price.
RAID 6 – Data blocks are striped across disks with two copies of parity data distributed across the physical disks. Good redundancy, performance penalty on writes, high capacity for the price.
(Diagram: RAID 6 striping data blocks with two distributed parity copies across five disks)
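The capacity trade-offs among these RAID levels can be sketched with the usual back-of-the-envelope rules (mirroring halves capacity; RAID 5 and RAID 6 give up one and two disks' worth of parity, respectively):

```python
def usable_capacity_gb(raid_level, n_disks, disk_gb):
    # Usable capacity for the common RAID levels described above.
    if raid_level == 0:
        return n_disks * disk_gb            # striping, no parity
    if raid_level == 1:
        return n_disks * disk_gb // 2       # mirrored pairs
    if raid_level == 5:
        return (n_disks - 1) * disk_gb      # one disk's worth of parity
    if raid_level == 6:
        return (n_disks - 2) * disk_gb      # two disks' worth of parity
    raise ValueError("unsupported RAID level")

print(usable_capacity_gb(5, 4, 150))   # four 150GB disks in RAID 5 -> 450GB usable
print(usable_capacity_gb(6, 5, 150))   # five 150GB disks in RAID 6 -> 450GB usable
```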
Example: Creating Four 150GB LUNs Using RAID 1 + 0
Eight total physical disks, each physical disk has 150GB available space
Effective raw disk capacity of 600GB (4 x 150GB)
Each LUN spans four physical disk drives for performance needs
Each physical disk has four LUNs allocated
(Diagram: four mirrored disk pairs; each of the four 150GB LUNs is striped across all four mirrored pairs)
Example: Creating Four 150GB LUNs Using RAID 5
Four total physical disks, each physical disk has 150GB available space
Effective raw disk capacity of 600GB (4 x 150GB)
Each LUN spans three physical disk drives for performance needs
Each physical disk has four LUNs allocated
(Diagram: four 150GB LUNs striped, with distributed parity, across four physical disks)
The Impact of LUN Data Striping on Performance
Each workload independently makes requests of the physical disks
Higher disk utilization usually results in slower response time
The faster the physical disk response, the less negative impact on response time
High usage of a single disk results in hot spots, with the storage subsystem then moving data to different physical disks.
"Short stroking" disks (using only the outer part of the platters) may be needed for high I/O rates.
(Diagrams: four RAID 5 LUNs sharing the same four physical disks; a short-stroked RAID 5 layout dedicating the outer portion of each disk to LUN 1)
Network Storage Components That Can Affect Performance/Availability
Size and use of cache
Number of independent internal data paths and buses
Number of front-end interfaces and processors
Types of interfaces supported (e.g., Fibre Channel and iSCSI)
Number and type of physical disk drives available
MetaLUN Expansion
- MetaLUNs allow for the aggregation of LUNs
- The system typically re-stripes data when a MetaLUN is changed
- Some performance degradation occurs during re-striping
Storage Virtualization
- Aggregation of storage arrays behind a presented mount point/LUN
- Movement between disk drives and tiers is controlled by the storage management layer
- Changes of physical drives and configuration may be transient and severe
SAN Storage Infrastructure – Areas to Watch/Consider
Areas to watch along the path:
- HBA speed
- Storage adapter queue length, world queue length, LUN queue length
- Fiber bandwidth and inter-switch links (ISLs)
- FC switch / director
- FA (front-end adapter) CPU speed
- Disk response time and disk speeds
- RAID configuration
- Block size
- Number of spindles in the LUN/array
- Cache size/type
Storage Queuing – The Key Throttle Points
(Diagram: each VM on the ESX host has a world queue length (WQLEN); their I/Os funnel through the HBAs, each governed by an execution throttle, into the LUN queue length (LQLEN) on the storage area network)
Storage I/O – The Key Throttle Point Definitions
Storage Adapter Queue Length (AQLEN) – The number of outstanding vmkernel active commands that the adapter is configured to support. This is not settable; it is a parameter passed from the adapter to the kernel.
LUN Queue Length (LQLEN) – The maximum number of permitted outstanding vmkernel active commands to a LUN. (This would be the HBA queue depth setting for an HBA.) This is set in the storage adapter configuration via the command line.
World Queue Length (WQLEN) – The maximum number of permitted outstanding vmkernel active requests to a LUN from any single virtual machine (min: 1, max: 256, default: 32). VMware recommends not changing this! Configuration -> Advanced Settings -> Disk -> Disk.SchedNumReqOutstanding.
Execution Throttle (not a displayed counter) – The maximum number of permitted outstanding vmkernel active commands that can be executed on any one HBA port (min: 1, max: 256, default: ~16, depending on vendor). This is set in the HBA driver configuration.
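The earlier sizing rule (VMs per LUN times outstanding commands per VM must fit within the LUN queue depth) can be sketched as a simple check; the numbers below are illustrative, not recommendations:

```python
def lun_queue_ok(vms_on_lun, active_cmds_per_vm, lun_queue_depth):
    # VMs per LUN times outstanding active commands per VM must stay
    # within the LUN queue depth (LQLEN), per the sizing rule above.
    return vms_on_lun * active_cmds_per_vm <= lun_queue_depth

print(lun_queue_ok(4, 8, 32))   # True:  32 <= 32
print(lun_queue_ok(5, 8, 32))   # False: 40 > 32, expect queuing at the LUN
```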
Queue Length Rules of Thumb
For a lightly-loaded system, average queue length should be less than 1 per spindle with occasional spikes up to 10. If the workload is write-heavy, the average queue length above a mirrored controller should be less than 0.6 per spindle and less than 0.3 per spindle above a RAID-5 controller.
For a heavily-loaded system that isn’t saturated, average queue length should be less than 2.5 per spindle with infrequent spikes up to 20. If the workload is write-heavy, the average queue length above a mirrored controller should be less than 1.5 per spindle and less than 1 per spindle above a RAID-5 controller.
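These rules of thumb can be encoded as a small lookup, returning the suggested ceiling for average queue length per spindle:

```python
def max_avg_queue_per_spindle(heavily_loaded=False, write_heavy=False, raid5=False):
    # Encodes the rules of thumb above. raid5=False implies a mirrored
    # controller for the write-heavy cases.
    if not write_heavy:
        return 2.5 if heavily_loaded else 1.0
    if heavily_loaded:
        return 1.0 if raid5 else 1.5
    return 0.3 if raid5 else 0.6

print(max_avg_queue_per_spindle())                                   # lightly loaded
print(max_avg_queue_per_spindle(heavily_loaded=True, write_heavy=True, raid5=True))
```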
Storage Performance Counters 1
Field Usage
%USD Percentage of the queue depth used by ESX Server VMkernel active commands. This statistic is applicable only to worlds and devices.
ABRTS/s Number of commands aborted per second.
ACTV Number of commands in the ESX Server VMkernel that are currently active. This statistic is applicable only to worlds and devices.
AQLEN Storage adapter queue depth. Maximum number of ESX Server VMkernel active commands that the adapter driver is configured to support.
BLKSZ Block size in bytes.
CID Storage adapter channel/controller ID. This ID is visible only if the corresponding adapter, channel and target are expanded.
CMDS/s Number of commands issued per second.
DAVG/cmd Average device latency per command in milliseconds.
DAVG/rd Average device read latency per read operation in milliseconds.
DAVG/wr Average device write latency per write operation in milliseconds.
Device Storage device name. This name is visible only if the corresponding world is expanded to devices.
DQLEN Storage device queue depth. This is the maximum number of ESX Server VMkernel active commands that the device is configured to support.
Case Studies - Storage
Storage Performance Counters 2
Field Usage
GAVG/cmd Average guest operating system latency per command in milliseconds.
GAVG/rd Average guest operating system read latency per read operation in milliseconds.
GAVG/wr Average guest operating system write latency per write operation in milliseconds.
GID Resource pool ID of the running world’s resource pool.
ID Resource pool ID of the running world’s resource pool or the world ID of the running world.
KAVG/cmd Average ESX Server VMkernel latency per command in milliseconds.
KAVG/rd Average ESX Server VMkernel read latency per read operation in milliseconds.
KAVG/wr Average ESX Server VMkernel write latency per write operation in milliseconds.
LID Storage adapter channel target LUN ID. This ID is visible only if the corresponding adapter, channel and target are expanded.
LOAD Ratio of ESX Server VMkernel active commands plus ESX Server VMkernel queued commands to queue depth. This statistic is applicable only to worlds and devices.
LQLEN LUN queue depth. Maximum number of ESX Server VMkernel active commands that the LUN is allowed to have.
MBREAD/s Megabytes read per second.
MBWRTN/s Megabytes written per second.
NAME Name of the running world’s resource pool or name of the running world.
Storage Performance Counters 3
Field Usage
NCHNS Number of channels.
NDV Number of devices
NLUNS Number of LUNs
NPH Number of paths/partitions
NTGTS Number of targets
NUMBLKS Number of blocks of the device. Valid only if the corresponding world is expanded to devices.
NVMS Number of worlds
NWD Number of worlds.
PAECMD/s The number of PAE (Physical Address Extension) commands per second.
PAECP/s Number of PAE copies per second. This statistic is applicable only to paths.
PARTITION Partition ID. This ID is visible only if the corresponding device is expanded to partitions.
PATH Path name. This name is visible only if the corresponding device is expanded to paths.
QAVG/cmd Average queue latency per command in milliseconds.
QAVG/rd Average queue latency per read operation, in milliseconds
QAVG/wr Average queue latency per write operation, in milliseconds.
Storage Performance Counters 4
Field Usage
QUED Number of commands in the ESX Server VMkernel that are currently queued.
READS/s Number of read commands issued per second.
RESETS/s Number of commands reset per second.
SHARES Number of shares.
SPLTCMD/s Number of split commands per second.
SPLTCP/s Number of split copies per second.
WID Storage adapter channel target LUN world ID. This ID is visible only if the corresponding adapter, channel, target and LUN are expanded.
WQLEN World queue depth. Maximum number of ESX Server VMkernel active commands that the world is allowed to have. This is a per-LUN maximum for the world.
WQLEN World queue depth. This is the maximum number of ESX Server VMkernel active commands that the world is allowed to have. This is a per-device maximum for the world. It is valid only if the corresponding device is expanded to worlds.
WRITES/s Number of write commands issued per second.
Esxtop disk adapter screen (d)
Shows the host bus adapters (HBAs) - includes SCSI, iSCSI, RAID, and FC-HBA adapters.
Latency stats are reported from the device, the kernel, and the guest:
DAVG/cmd - Average latency (ms) from the Device (LUN)
KAVG/cmd - Average latency (ms) in the VMKernel
GAVG/cmd - Average latency (ms) in the Guest
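These three latency counters are related: GAVG, the latency the guest observes, is approximately DAVG plus KAVG. A sketch of how one might triage a slow LUN from these numbers; the thresholds here are illustrative assumptions, not VMware-mandated values:

```python
def guest_latency_ms(davg_ms, kavg_ms):
    # GAVG, the latency the guest observes, is approximately the
    # device latency (DAVG) plus the vmkernel latency (KAVG).
    return davg_ms + kavg_ms

def diagnose(davg_ms, kavg_ms, device_limit_ms=20.0, kernel_limit_ms=2.0):
    # Illustrative thresholds only: high DAVG points at the array/path,
    # high KAVG points at vmkernel queuing (e.g., a full LUN queue).
    problems = []
    if davg_ms > device_limit_ms:
        problems.append("high device/array latency (DAVG)")
    if kavg_ms > kernel_limit_ms:
        problems.append("high vmkernel queuing latency (KAVG)")
    return problems

print(guest_latency_ms(25.0, 0.5))   # 25.5 ms seen by the guest
print(diagnose(25.0, 0.5))
```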
Esxtop disk device screen (u)
LUNs in C:T:L format (Controller: Target: LUN)
Esxtop disk VM screen (v)
Shows the running VMs
Test Bed Idle State – Device Adapter View
Callouts: storage adapter maximum queue length; LUN maximum queue length; world maximum queue length; average device latency per command.
Moderate load on two virtual machines
Acceptable latency from the disk subsystem
Commands are queued BUT….
Heavier load on two virtual machines
Virtual machine latency is consistently above 20ms, so performance could start to be an issue.
Commands are queued and are exceeding maximum queue lengths BUT….
Heavy load on four virtual machines
Virtual machine latency is consistently above 60ms for some VMs, so performance will be an issue.
Commands are queued and are exceeding maximum queue lengths AND….
Artificial Constraints on Storage
Good throughput with low device latency.
Bad throughput, indicating a problem with the disk subsystem: device latency is high because the cache is disabled.
Network – Key Question and Considerations
Is the bandwidth and configuration of the network sufficient to meet the desired throughput without network errors?
Configuration – Is the network configuration known, and is the network correctly designed for the needed throughput?
Error Indicators – Dropped packets, or low throughput without errors, indicate a problem.
Network Counters – These are fairly high-level counters; most network analysis is done external to the system.
Case Studies - Network
Network Performance Counters 1
Field Usage
PORT Virtual network device port ID
UPLINK Y means the corresponding port is an uplink. N means it is not.
UP Y means the corresponding link is up. N means it is not.
SPEED Link speed in Megabits per second.
FDUPLX Y means the corresponding link is operating in full duplex. N means it is not.
USED Virtual network device port user.
DTYP Virtual network device type. H means hub and S means switch.
PKTTX/s Number of packets transmitted per second.
PKTRX/s Number of packets received per second.
MbTX/s MegaBits transmitted per second.
MbRX/s MegaBits received per second
%DRPTX Percentage of transmit packets dropped
%DRPRX Percentage of receive packets dropped
Esxtop network screen (n)
PKTTX/s - Packets transmitted per second
PKTRX/s - Packets received per second
MbTX/s - Transmit throughput in Mbits/sec
MbRX/s - Receive throughput in Mbits/sec
Port ID - Every entity is attached to a port on the virtual switch
DNAME - The virtual switch the port belongs to
The screen lists the physical NICs, the Service Console NIC, and the virtual NICs.
Test Bed Idle State – Network View
Test Bed Under Heavy Network Load
Heavy transmit and receive loads, yet no packets are being dropped despite the load, BUT….
Closing Thoughts
- Know the key counters to look at for each type of resource.
- Be careful about which resource allocation technique you use for CPU and RAM; one size may NOT fit all.
- Consider the impact of events such as maintenance on the performance of a cluster.
- Set up a simple test bed where you can create simple loads to become familiar with the various performance counters and tools.
- Compare your test bed analysis and performance counters with the development and production clusters.
- Know your storage subsystem components and configuration, given the large impact these have on overall performance.
- Take the time to learn how the various components of the virtual infrastructure work together.
Performance Resources
The performance community
http://communities.vmware.com/community/vmtn/general/performance
Performance analysis techniques whitepaper
http://communities.vmware.com/docs/DOC-3930
Performance web page for white papers
http://www.vmware.com/overview/performance
VROOM!—VMware performance blog
http://blogs.vmware.com/performance
John Paul – [email protected]