vmware performance troubleshooting

Post on 21-Dec-2014


Page 1: VMware Performance Troubleshooting

VMware Performance Troubleshooting

Presented by Chris Kranz

Page 2: VMware Performance Troubleshooting

Topics Covered
• Introduction
• Root Cause Analysis
• Performance Characteristics
• CPU
• Networking
• Memory
• Disk
• Virtual Machine optimisation
• ESXTop
• vm-support
• Service Console
• Resource Groups
• Design Guidelines
• Capacity Planner limitations and cautions
• Conclusion
• Reference Articles

Page 3: VMware Performance Troubleshooting

Introduction

Multiple layers of virtualisation are used to increase service levels, availability and manageability

However, multiple layers of virtualisation often mask performance and configuration issues, making them harder to troubleshoot and correct

The worst outcome is that performance issues after a virtualisation project create the perception that VMware reduces performance, which can damage future confidence in VMware

Page 4: VMware Performance Troubleshooting

Performance Basics

• Virtual Machine Resources
  – CPU
  – Memory
  – Disk
  – Networking

Page 5: VMware Performance Troubleshooting

Resource Maximums

Resource                 Host    Guest
Logical Processors       64      N/A
Virtual CPUs             N/A     8
Virtual CPUs per Core    20      N/A
Memory                   1TB     256GB

Source: http://www.vmware.com/pdf/vsphere4/r40/vsp_40_config_max.pdf

Page 6: VMware Performance Troubleshooting

Typical Host

vSphere 1U Host
CPUs      2 x Quad Core
Memory    32-64GB RAM

Typically 3 VMs per core, so 24 VMs per host
Each VM has 2GB of RAM = 48GB of RAM
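The sizing arithmetic on this slide can be checked with a few lines of Python. This is only a sketch: the function name and parameters are illustrative, and the 3 VMs per core and 2GB per VM figures are the slide's rule of thumb, not a fixed limit.

```python
def host_capacity(sockets, cores_per_socket, vms_per_core, ram_per_vm_gb):
    """Return (cores, VM count, guest RAM in GB) for a simple consolidation estimate."""
    cores = sockets * cores_per_socket
    vms = cores * vms_per_core
    ram_gb = vms * ram_per_vm_gb
    return cores, vms, ram_gb


# The "typical host" from the slide: 2 x quad core, 3 VMs per core, 2GB per VM.
cores, vms, ram_gb = host_capacity(sockets=2, cores_per_socket=4,
                                   vms_per_core=3, ram_per_vm_gb=2)
print(f"{cores} cores, {vms} VMs, {ram_gb}GB guest RAM")
```

The 48GB result sits inside the 32-64GB range quoted for the host, before any allowance for virtualisation overhead or Service Console memory.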

Page 7: VMware Performance Troubleshooting

Root Cause Analysis

http://www.vmware.com/resources/techresources/10066

Page 8: VMware Performance Troubleshooting

Root Cause ...

Page 9: VMware Performance Troubleshooting

Monitoring Performance

• Do not rely on guest tools alone, although they can:
  – Show high CPU and memory utilisation
  – Measure latency and throughput of disk and network interfaces
• Use the virtualisation layer to diagnose the cause, because:
  – The guest is unaware of the virtualisation workload
  – Guest OSs account for time differently
  – The guest has no visibility of available resources

Page 10: VMware Performance Troubleshooting

Performance Analysis Tools

• esxtop (service console only)
• resxtop (remote command line utilities)
• Performance graphs in vCenter

Page 11: VMware Performance Troubleshooting

esxtop

• esxtop can be run:
  – Interactively
  – In batch mode (e.g. esxtop -a -b > analysis.csv)
  – Load the batch output into Windows perfmon or MS Excel
• Two keys to remember
  – H : help
  – F : fields to display
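Besides perfmon or Excel, the batch CSV can be post-processed with a short script. The sketch below pulls out every column for one counter. It assumes perfmon-style column names of the form \\host\Group Cpu(id:name)\% Ready; check the header row of your own capture, as this layout is an assumption here. The function name and sample data are illustrative only.

```python
import csv
import io


def counter_series(csv_text, counter="% Ready"):
    """Return {column name: [samples]} for every column whose name ends in the counter."""
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader)
    suffix = "\\" + counter  # assumed perfmon-style path separator
    wanted = [i for i, name in enumerate(header) if name.endswith(suffix)]
    series = {header[i]: [] for i in wanted}
    for row in reader:
        for i in wanted:
            if i < len(row) and row[i] != "":
                series[header[i]].append(float(row[i]))
    return series


# Made-up two-sample capture, standing in for analysis.csv.
sample = "\n".join([
    r'"(PDH-CSV 4.0) (UTC)(0)","\\host1\Group Cpu(101:web01)\% Ready","\\host1\Group Cpu(101:web01)\% Used"',
    '"01/01/2010 10:00:00","4.5","35.0"',
    '"01/01/2010 10:00:05","12.0","80.0"',
])

for name, values in counter_series(sample).items():
    print(name, "max:", max(values))
```

For a real capture, read the file contents and pass them in place of the sample string.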

Page 12: VMware Performance Troubleshooting

esxtop basics

[Screenshot: esxtop display, with callouts]
• Host Resources
• Number of Worlds
• Name of Resource Pool, Virtual Machine or World

Page 13: VMware Performance Troubleshooting

Performance Characteristics

Resource      Symptoms
CPU           Slow Processing, High CPU Wait
Networking    Packet Loss, Slow Network
Memory        Slow Processing, Disk Swapping
Disk          Log Stalls, Disk Queue

Result: Slow Application Performance, Reduced User Experience, Data Loss and Corruption

Page 14: VMware Performance Troubleshooting

CPU - ESX Scheduler

Service Console
Virtual Machine
Limits / Shares / Reservations
Basic World States: Ready / Run / Wait
CPU States: Ready / Usage / Wait

Page 15: VMware Performance Troubleshooting

CPU - esxtop

• PCPU(%): CPU utilisation
• %USED: Utilisation
• %RDY: Ready Time
• %RUN: Run Time
• %WAIT: Wait and idling time

High %RDY + high %USED can imply CPU over-commitment
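The rule of thumb on this slide can be written as a simple check. This is a minimal sketch: the slide gives no numbers, so the 80% used and 10% ready thresholds below are assumptions to tune for your environment, and the function name is illustrative.

```python
def cpu_overcommit_suspected(pct_used, pct_rdy,
                             used_threshold=80.0, rdy_threshold=10.0):
    """Flag a VM whose %USED and %RDY are both high (thresholds are assumptions)."""
    return pct_used >= used_threshold and pct_rdy >= rdy_threshold


# Busy VM that is also queuing for CPU: worth investigating.
print(cpu_overcommit_suspected(pct_used=90.0, pct_rdy=15.0))
# Busy VM that is scheduled promptly: no sign of over-commitment.
print(cpu_overcommit_suspected(pct_used=90.0, pct_rdy=2.0))
```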

Page 16: VMware Performance Troubleshooting

CPU - VI Client

[Chart: Used Time and Ready Time]
Used Time > Ready Time: Possible CPU over-commitment
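The VI Client charts report ready time in milliseconds per sample, whereas esxtop reports a percentage, so a conversion helps when comparing the two. The sketch below assumes the 20-second real-time sample interval and divides by the vCPU count to give a per-vCPU figure; both are assumptions to adjust for your chart settings, and the function name is illustrative.

```python
def ready_percent(ready_ms, interval_s=20.0, vcpus=1):
    """Convert a ready-time sample in milliseconds to a percentage of the interval."""
    return ready_ms * 100.0 / (interval_s * 1000.0 * vcpus)


# 2000 ms of ready time in a 20 s sample is 10% for a single-vCPU VM.
print(ready_percent(2000))
# The same 2000 ms spread across 4 vCPUs averages 2.5% per vCPU.
print(ready_percent(2000, vcpus=4))
```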

Page 17: VMware Performance Troubleshooting

CPU - Further Investigation

%MLMTD shows this VM has been limited

Page 18: VMware Performance Troubleshooting

CPU - Further Investigation

High ready time caused by CPU resource limit
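A small triage helper can separate ready time caused by a CPU limit from ready time caused by contention. This is a sketch that assumes %MLMTD is counted inside %RDY, so the remainder approximates contention. The 10% threshold and the labels are hypothetical.

```python
def classify_ready_time(pct_rdy, pct_mlmtd, rdy_threshold=10.0):
    """Label the likely cause of high ready time (threshold and labels are assumptions)."""
    if pct_rdy < rdy_threshold:
        return "ok"
    # Assumption: %MLMTD is included in %RDY, so subtract it to estimate contention.
    contention_rdy = pct_rdy - pct_mlmtd
    if pct_mlmtd > contention_rdy:
        return "cpu limit"
    return "contention"


print(classify_ready_time(pct_rdy=40.0, pct_mlmtd=35.0))  # mostly limit-induced
print(classify_ready_time(pct_rdy=40.0, pct_mlmtd=0.0))   # other VMs competing
```

If the result is "cpu limit", review the VM's or resource pool's CPU limit before adding host capacity.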

Page 19: VMware Performance Troubleshooting

VMware Memory Management
• Transparent Page Sharing
• VMware Tools balloon driver, which forces the VM to swap to disk
• Virtual Machine Page File

Page 20: VMware Performance Troubleshooting

Memory - Ballooning vs. Swapping

With ballooning, the guest OS chooses which pages it swaps to disk

ESX swapping can swap any page to disk, without knowing which pages the guest is using

Page 21: VMware Performance Troubleshooting

Memory

• Ballooning can be disabled (value 0) or controlled on a per Virtual Machine basis using: sched.mem.maxmemctl
• The default is set to 65% and can be controlled at host level
• Ballooning is only an issue in resource contention scenarios (or for latency-sensitive VMs, e.g. Citrix)

Page 22: VMware Performance Troubleshooting

Memory - Host

VI Client shows memory usage of the host. This is calculated as “consumed + overhead memory + Service Console”.

Performance charts are a very good way of showing the Virtual Machine memory breakdown:

• Consumed Memory
• Ballooned Memory
• Shared Memory
• Swapped Memory

Page 23: VMware Performance Troubleshooting

Memory - Guest

Host Memory = Consumed + Overhead Memory
Guest Memory = Active Memory for Guest OS

Page 24: VMware Performance Troubleshooting

Memory – Guest Overhead

Page 25: VMware Performance Troubleshooting

Memory

Virtual Machine Memory Metrics – VI Client

Metric: Description
Memory Active (KB): Physical pages touched recently by a VM
Memory Usage (%): Active memory / configured memory
Memory Consumed (KB): Machine memory mapped to a virtual machine, including its portion of shared pages. Doesn’t include overhead memory
Memory Granted (KB): Physical pages allocated to a virtual machine. May be less than configured memory. Includes shared pages. Doesn’t include overhead memory
Memory Shared (KB): Physical pages shared with other virtual machines
Memory Balloon (KB): Physical memory ballooned from a virtual machine
Memory Swapped (KB): Physical memory in swap file (approx. “swap out – swap in”). Swap out and Swap in are cumulative
Overhead Memory (KB): Machine pages used for virtualisation

Page 26: VMware Performance Troubleshooting

Memory

Host Memory Metrics – VI Client

Metric: Description
Memory Active (KB): Physical pages touched recently by the host
Memory Usage (%): Active memory / configured memory
Memory Consumed (KB): Total host physical memory – free memory on host. Includes Overhead and Service Console memory
Memory Granted (KB): Sum of physical pages allocated to all virtual machines. Doesn’t include overhead memory
Memory Shared (KB): Physical pages shared by virtual machines on host
Shared Common (KB): Total machine pages used by shared pages
Memory Balloon (KB): Machine pages ballooned from virtual machines
Memory Swap Used (KB): Physical memory in swap file (approx. “swap out – swap in”). Swap out and Swap in are cumulative
Overhead Memory (KB): Machine pages used for virtualisation

Page 27: VMware Performance Troubleshooting

Memory - esxtop

PMEM: Total physical memory breakdown
VMKMEM: Memory managed by vmkernel
COSMEM: Service Console memory breakdown
PSHARE: Page sharing statistics
SWAP: Swap statistics
MEMCTL: Balloon driver data

Page 28: VMware Performance Troubleshooting

Memory

esxtop / VI Client metrics : Virtual Machines

VI Client          esxtop
Active Memory      TCHD
Memory Usage       %ACTV
Consumed Memory    N/A
Memory Granted     N/A (SZTGT and CMTTGT represent memory scheduler targets)
Memory Shared      SHRD (+SHRDSVD per VM). COW stats must be enabled in esxtop
Memory Balloon     MCTLSZ
Memory Swapped     SWCUR (SWR/s & SWW/s are rates)
Overhead Memory    OVHD & OVHDMAX

Page 29: VMware Performance Troubleshooting

Memory

esxtop / VI Client metrics : Host Usage

VI Client               esxtop
Memory Active           N/A (try /proc/vmware/sched/mem-verbose)
Memory Usage            N/A (try /proc/vmware/sched/mem-verbose)
Memory Consumed         PMEM total – PMEM free
Memory Granted          N/A (SZTGT and CMTTGT represent memory scheduler targets)
Memory Shared           PSHARE (shared)
Memory Shared Common    PSHARE (common)
Memory Balloon          MEMCTL
Memory Swap Used        SWAP (r/s and w/s are rates)
Overhead Memory         OVHD & OVHDMAX

Page 30: VMware Performance Troubleshooting

Memory - VI Client memory usage graph

Page 31: VMware Performance Troubleshooting

Memory - Troubleshooting Memory usage issues

Page 32: VMware Performance Troubleshooting

Networking

Network configuration is more likely to blame than resource contention

• Switch Assisted Teaming (IP Hash)
• VLAN Trunking
• Flow Control (full)
• Speed & Duplex (1000Mb / Full)
• Port Fast
• BPDU Disabled
• STP Disabled
• Link State Tracking
• Jumbo Frames

Page 33: VMware Performance Troubleshooting

Networking - esxtop

[Screenshot: esxtop network view, with callouts]
• Transmit and Receive in Mb/s
• Transmit and Receive in Packets

Page 34: VMware Performance Troubleshooting

Networking - esxtop

[Screenshot: esxtop network view, with callouts]
• Dropped Packets Received
• Dropped Packets Transmitted

Page 35: VMware Performance Troubleshooting

Disk

Varying Factors
• File system performance
• Disk subsystem configuration (SAN, NAS, iSCSI, local disk)
• Disk caching
• Disk formats (thick, sparse, thin)

ESX Storage Stack
• Different latencies for different disks
• Queuing within the kernel

K: Kernel
D: Device
G: Guest

Page 36: VMware Performance Troubleshooting

Disk - VI Client statistics

Quite Coarse Statistics
• Disk read / write rate (KB/s)
• Disk usage: sum of read BW and write BW (KB/s)
• Disk read / write requests (per 20s interval)
• Bus resets / Command aborts (per 20s interval)
• Per LUN or aggregated stats

Page 37: VMware Performance Troubleshooting

Disk - esxtop statistics

Aggregated stats similar to VI Client
• Disk read / write per sec (READS/s, WRITES/s)
• MB read / write per sec (MBREAD/s, MBWRTN/s)

Latency Statistics
• Kernel Average / command (KAVG/cmd)
• Device Average / command (DAVG/cmd)
• Guest Average / command (GAVG/cmd)

Queuing Information
• Adapter Queue Length (AQLEN)
• LUN Queue Length (LQLEN)
• VMkernel queued commands (QUED)
• Active Queue (ACTV)
• %Used (%USD = ACTV/LQLEN)
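The latency and queue counters can be combined into two quick health figures. This is a sketch: %USD = ACTV/LQLEN comes from the slide, while treating guest latency as roughly kernel plus device latency (GAVG ≈ KAVG + DAVG) is an assumption added here. The function name is illustrative.

```python
def disk_summary(kavg_ms, davg_ms, actv, lqlen):
    """Return (approximate GAVG in ms, %USD) from per-device esxtop counters."""
    gavg_ms = kavg_ms + davg_ms      # assumption: guest latency = kernel + device
    pct_usd = 100.0 * actv / lqlen   # share of the LUN queue in use, per the slide
    return gavg_ms, pct_usd


gavg_ms, pct_usd = disk_summary(kavg_ms=0.5, davg_ms=12.0, actv=16, lqlen=32)
print(f"GAVG ~{gavg_ms} ms, LUN queue {pct_usd}% used")
```

A high KAVG relative to DAVG points at queuing inside the host, and a high DAVG points at the array or fabric.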

Page 38: VMware Performance Troubleshooting

Disk - SAN Rough Estimates

Purely looking at a single ESX host, roughly:
Throughput (in MBps) = (Outstanding IOs * Block size in KB) / latency in msec

FC, rough maximums:
Effective Link Bandwidth = ~80-90% of Real Bandwidth
Effective (2Gbps) = 200 – 230 MBps
Effective (4Gbps) = 410 – 460 MBps
Effective (8Gbps) = 820 – 920 MBps

iSCSI / NFS / FCoE, rough maximums:
Effective Link Bandwidth = ~70-80% of Real Bandwidth
Effective (1GigE) = 90 – 100 MBps
Effective (10GigE) = 900 – 1000 MBps
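The throughput estimate works because 1 KB per millisecond is roughly 1 MB per second. Here is a minimal sketch of the formula; the function name and the input values are illustrative, not measurements.

```python
def throughput_mbps(outstanding_ios, block_kb, latency_ms):
    """Rough per-host throughput: (outstanding IOs * block size in KB) / latency in ms."""
    return outstanding_ios * block_kb / latency_ms


# 32 outstanding 32 KB IOs at 10 ms latency.
print(round(throughput_mbps(32, 32, 10), 1), "MBps")
# Doubling the latency halves the estimated throughput.
print(round(throughput_mbps(32, 32, 20), 1), "MBps")
```

Comparing the result with the effective link figures above shows whether the link or the latency is the likely ceiling.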

Page 39: VMware Performance Troubleshooting

Disk - Desired Latency Calculations

Desired Latency in msec <= (Outstanding IOs * Block size in KB) / Throughput per host

Example:
Number of Hosts: 16
Effective Link Bandwidth: 90 MBps
Throughput per host: 90 / 16 = 5.6 MBps
Desired Latency: (32 outstanding IOs * 32 KB) / 5.6 = 182.86 msec

Workload                   Cached Sequential Read    Cached Sequential Write
Desired Latency (msec)     182.86                    182.86
Observed Latency (msec)    ~350                      ~180
Throughput Drop?           Yes                       No
Throughput (MBps)          ~45                       ~90
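The worked example can be reproduced in code. This is a sketch with illustrative function names. The slide rounds throughput per host to 5.6 MBps before dividing, which gives 182.86 msec; the unrounded 5.625 MBps gives about 182.04 msec.

```python
def desired_latency_ms(outstanding_ios, block_kb, link_mbps, hosts):
    """Latency ceiling at which each host can still reach its share of the link."""
    throughput_per_host = link_mbps / hosts
    return outstanding_ios * block_kb / throughput_per_host


def throughput_drop_expected(observed_ms, desired_ms):
    """A throughput drop is expected once observed latency exceeds the ceiling."""
    return observed_ms > desired_ms


desired = desired_latency_ms(outstanding_ios=32, block_kb=32, link_mbps=90, hosts=16)
print(round(desired, 2), "msec")
print("read drop:", throughput_drop_expected(350, desired))
print("write drop:", throughput_drop_expected(180, desired))
```

The two checks agree with the table: the ~350 msec read workload falls short of the link rate, and the ~180 msec write workload does not.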

Page 40: VMware Performance Troubleshooting

Disk - VI Client

[Chart annotations]
• SAN cache disabled: poor throughput
• SAN cache enabled: high throughput

Page 41: VMware Performance Troubleshooting

Disk - esxtop

[Screenshot annotations]
• Before enabling cache: latency is quite high
• After enabling cache: latency is reduced

Page 42: VMware Performance Troubleshooting

Virtual Machine Optimisation

Deploy all machines from an optimised template!

• VMware Tools MUST be installed
• The disks MUST be block aligned to the storage (even when using NFS and SAN)
• Where possible, always separate data disks from OS disks
• Windows performance settings should be optimised for application performance
• Guest operating system timeouts should be set as defined by the SAN vendor
• Pagefile should be separated where appropriate (this can impact VMware SRM, however)
• Unused Windows services should be disabled (wireless config, print spooler, audio, etc.)
• Last access time updates should be disabled (unless required)
• Logging of the VM should be disabled (only enabled for troubleshooting)
• Remove any unused virtual hardware (floppy drives, USB, etc.)
• Disable screen savers and power saving features, including the logon screen saver
• Enable Remote Desktop, avoid using the VI Client for remote administration
• Install standard applications into the template (bginfo, antivirus, any host agents, etc.)
• Multiple vCPUs should be allocated sparingly

Page 43: VMware Performance Troubleshooting

Virtual Machine Optimisation

Block alignment is vital to good disk performance!
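Alignment can be checked from a partition's starting offset: the partition is aligned when its starting byte offset is a whole multiple of the storage block size. The sketch below assumes 512-byte sectors and a 4 KB storage block; confirm the block size with your storage vendor. The function name is illustrative.

```python
def is_block_aligned(start_sector, sector_bytes=512, block_bytes=4096):
    """True when the partition's starting byte offset is a multiple of the block size."""
    return (start_sector * sector_bytes) % block_bytes == 0


# Sector 63 is the legacy MBR default start and is misaligned for 4 KB blocks.
print(is_block_aligned(63))
# Sector 64 (32 KB) and sector 2048 (1 MB) both land on a block boundary.
print(is_block_aligned(64), is_block_aligned(2048))
```

A misaligned guest partition makes a single guest IO straddle two storage blocks, which is why the template should be aligned before it is cloned.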

Page 44: VMware Performance Troubleshooting

esxtop

Command Options when inside esxtop

Command    Action
space      Update the display
?          Show the help page
q          Quit
f / F      Add or remove columns from the display
o / O      Change the order in which the display is sorted
s          Change the update interval
#          Change the number of instances to display
W          Write configuration to file
e          Expand / roll up CPU stats
V          View only VM instances
L          Change the length of the NAME field
m          Display memory statistics
n          Display network statistics
i          Display interrupt statistics
d          Display disk adapter statistics
u          Display disk device statistics
v          Display disk VM statistics

Page 45: VMware Performance Troubleshooting

esxtop

Command Line Options from the console

Option    Action
-b        Batch mode
-l        Locks the objects available in the first snapshot
-s        Enables secure mode
-a        Show all statistics
-c        Sets the configuration file
-R        Enables replay mode (used with "vm-support -S")
-d        Sets the update interval
-n        Runs esxtop for n iterations

Page 46: VMware Performance Troubleshooting

esxtop

Expand the default window size for your session to get all statistics

Page 47: VMware Performance Troubleshooting

vm-support

Creates a compressed archive containing the following sections:
• boot
  – contains the grub configuration
• etc
  – contains the Console OS configuration files (cron, tcpwrappers, syslog, etc.)
• proc
  – contains much of the hardware configuration modules and variables
• tmp
  – contains a lot of the ESX specific configuration output
• var
  – contains log files and any core dumps
• vmfs
  – contains the structure of the VMFS datastores
• esx3-installation (where appropriate)
  – contains a copy of the previous esx3 configuration variables

Page 48: VMware Performance Troubleshooting

vm-support

Using vm-support to extract performance information:

vm-support -S -d <duration> -i <interval>

<duration> and <interval> are in seconds

The output can then be replayed in esxtop for review after it has been extracted:

esxtop -R <path_to_vm-support_output>
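When captures are scripted, it helps to build the two command lines from parameters rather than typing them by hand. The sketch below only builds and prints the commands using the options shown on this slide; it does not run them. The helper names, the 300-second duration, the 10-second interval and the replay path are hypothetical.

```python
import shlex


def snapshot_command(duration_s, interval_s):
    """Argument list for a vm-support performance snapshot (times in seconds)."""
    return ["vm-support", "-S", "-d", str(duration_s), "-i", str(interval_s)]


def replay_command(extracted_path):
    """Argument list for replaying an extracted snapshot in esxtop."""
    return ["esxtop", "-R", extracted_path]


print(shlex.join(snapshot_command(300, 10)))
print(shlex.join(replay_command("/tmp/perf-snapshot")))  # hypothetical path
```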

Page 49: VMware Performance Troubleshooting

Service Console Performance

• Multiple Service Console networks – for network resiliency
• Increased Service Console memory – up to 800MB
• Use host agents supplied by your vendors
• Apply recommended storage tweaks such as HBA Queue Depth and IO timeouts
• Minimal use of the VI Client console – use RDP or SSH instead
• Properly sized vCenter server – 64-bit OS where possible

Page 50: VMware Performance Troubleshooting

Resource Groups

Dynamically reallocate resource shares

Add a VM: shares allow you to over-commit resources and have them re-allocated gracefully

Remove a VM: the freed resources are shared across all remaining VMs
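The re-allocation can be illustrated with a proportional split. This is a simplified sketch: it assumes every VM is competing for the resource at once and ignores reservations and limits. The function name, the 6000 MHz capacity and the share values are illustrative.

```python
def allocate_by_shares(capacity_mhz, shares):
    """Split contended capacity between VMs in proportion to their shares."""
    total = sum(shares.values())
    return {vm: capacity_mhz * s / total for vm, s in shares.items()}


three_vms = {"vm1": 1000, "vm2": 1000, "vm3": 1000}
print(allocate_by_shares(6000, three_vms))

# Adding a VM shrinks every allocation gracefully instead of starving one VM.
four_vms = dict(three_vms, vm4=1000)
print(allocate_by_shares(6000, four_vms))

# Removing a VM hands its capacity to the remaining VMs.
two_vms = {"vm1": 1000, "vm2": 1000}
print(allocate_by_shares(6000, two_vms))
```

Shares only decide the split under contention; an idle VM's unused capacity is available to the others.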

Page 51: VMware Performance Troubleshooting

Design Guidelines
• Full Resilience / Multiple paths
• Standard configuration across all aspects (ESX, Storage, Networking, etc.)
• Standard naming conventions
• Learn from others' mistakes
• Follow vendors' best-practice guidelines
• Rule out the basics before requesting support

Page 52: VMware Performance Troubleshooting

Capacity Planner & P2V Cautions and Limitations

• Peak CPU usage can sometimes be misleading
• Back-end storage system performance
• P2V machines will require block-aligning to the storage
• P2V machines will still require guest OS optimisation

Page 53: VMware Performance Troubleshooting

Conclusion
• Performance issues can often be traced with simple root cause analysis using basic tools (VI Client / esxtop)
• Performance tools help diagnose issues and help rule out non-issues
• Performance tools are useful in different contexts, not always either/or
  – Real-time data and troubleshooting: esxtop
  – Historical data: VI Client
  – Coarse resource / cluster usage: VI Client
  – Detailed resource usage: esxtop
• Combine information from various tools to get a complete picture
• Always benchmark your systems first so you know the optimal performance you can achieve

Page 54: VMware Performance Troubleshooting

Reference Articles
• http://www.vmware.com/pdf/esx3_memory.pdf
• http://www.vmworld.com/docs/DOC-2370
• http://blogs.vmware.com/performance/
• http://communities.vmware.com/docs/DOC-5420
• http://kb.vmware.com/kb/1008205
• http://communities.vmware.com/community/vmtn/general/performance
• http://www.vmware.com/products/vmmark/
• http://www.vmware.com/pdf/vsphere4/r40/vsp_40_san_cfg.pdf
• http://www.vmware.com/pdf/vsphere4/r40/vsp_40_iscsi_san_cfg.pdf
• http://www.vmware.com/pdf/vsphere4/r40/vsp_40_resource_mgmt.pdf
• http://www.vmware.com/pdf/GuestOS_guide.pdf
• http://www.vmware.com/resources/techresources/10066
• http://www.vmware.com/resources/techresources/10059
• http://www.vmware.com/resources/techresources/10062