
OnCommand Balance v4.0 Self Evaluation Guide NetApp

July 2012

A SELF-GUIDED MANUAL ON GETTING STARTED USING ONCOMMAND BALANCE

The purpose of this guide is to support a self-guided, hands-on evaluation of NetApp OnCommand Balance. This guide is intended to show users how to get started with Balance to monitor and manage shared infrastructure performance to increase utilization and assure SLAs.


TABLE OF CONTENTS

1 GETTING STARTED

ABOUT ONCOMMAND BALANCE

PURPOSE AND USE OF THIS GUIDE

HELP AND SUPPORT DURING EVALUATION

HARDWARE AND SOFTWARE REQUIREMENTS

SUMMARY OF EVALUATION TASKS

2 ONCOMMAND BALANCE DASHBOARD REVIEW

3 USE CASE #1: EXAMINE STORAGE TROUBLESHOOTING AND OPTIMIZATION

4 USE CASE #2: EXAMINE VM WORKLOAD TO STORAGE VISIBILITY

5 USE CASE #3: ANALYZE VM WORKLOAD DISTRIBUTION OPTIMIZATION

6 USE CASE #4: IDENTIFY AND PRIORITIZE MISALIGNED LUNS AND VMDK PARTITIONS TO REDUCE PERFORMANCE RISK

7 USE CASE #5: REVIEW SCORECARDS AND REPORTS

SCORECARD REPORTS

SERVER REPORTS

PILOT REPORTS ON THE NETAPP COMMUNITY

8 NEXT STEPS

LIST OF TABLES

Table 1) OnCommand Balance Videos.

LIST OF FIGURES

Figure 1) OnCommand Balance Support Page.

Figure 2) Dashboard: “In Trouble” Panels for Storage, VMs, Virtual Hosts, and Physical Servers.

Figure 3) Dashboard: Key Components of the Storage In-Trouble Panel.

Figure 4) Dashboard: Key Components of the Server In-Trouble Panels.

Figure 5) Dashboard: Monitored Environment Summary.

Figure 6) Troubleshooting a Storage Problem.

Figure 7) Event Summary Description.

Figure 8) Disk Utilization is Storage Problem Indicator.

Figure 9) Understanding Disk Utilization.

Figure 10) OnCommand Balance Predictor.

Figure 11) Workload Breakdown Graph and Analytics.

Figure 12) Bullies and Victims.

Figure 13) Click the Contention Tab (top of diagram) to Reveal the Contention Graph (bottom).

Figure 14) ESX Host Perspective.

Figure 15) Workload Perspective.

Figure 16) Disk Group View.

Figure 17) Highlight Full Data Paths to Storage.

Figure 18) Understanding the Performance Index for a Host.

Figure 19) Selecting a Portion of the Performance Index Graph to Examine.

Figure 20) Examining a Portion of the PI Graph (PI~50).

Figure 21) View the Report on Misaligned LUN Partitions.

Figure 22) Misalignment is More Critical in a Virtualized Environment.

Figure 23) Misalignment Report.

Figure 24) Virtual Machine Scorecard.

Figure 25) Storage Scorecard.

Figure 26) Server Volume Capacity Utilization Forecast.

Figure 27) Pilot Reports.


1 GETTING STARTED

ABOUT ONCOMMAND BALANCE

OnCommand™ Balance performance and capacity management software provides analysis across IT virtualization layers and technology silos for both virtual and physical servers and storage. OnCommand Balance helps companies troubleshoot performance problems within minutes, optimize utilization, and improve performance in the dynamic data center.

Unlike traditional system and element management tools that look at only one silo (physical or virtual, servers or storage), OnCommand Balance agentless software dynamically models and analyzes the entire infrastructure. This helps you understand how application workloads, utilization levels, and resources interact, bringing infrastructure-wide intelligence to the data center. Designed to go beyond basic infrastructure performance monitoring tools, OnCommand Balance is an operational solution that enables organizations to advance virtualization, providing a stepping-stone to the private cloud.

Using an agentless design, this virtual appliance installs quickly and easily. Performance capacity analytics provide automated analysis of shared infrastructure, and key performance indicators such as infrastructure response time, performance index, and disk utilization show where to focus. With dozens of prebuilt reports and scorecards, you can easily report on the status of virtual machines (VMs), physical servers, and storage. Balance also supports heterogeneous hypervisors (VMware and Hyper-V) and storage (NetApp, EMC, HDS, HP, and IBM). VMware-ready and management certified, it can help your team complete initiatives such as virtualization projects, tiering strategies, consolidations and migrations, and performance optimization more easily and efficiently.

PURPOSE AND USE OF THIS GUIDE

The purpose of this document is to support a self-guided, hands-on evaluation of NetApp OnCommand Balance. This guide is intended to show you how to get started with OnCommand Balance to monitor and manage performance, increase utilization, and assure service level agreements (SLAs).

After summarizing hardware and software requirements and referencing companion materials for software installation and configuration, this guide first illustrates use of the software dashboard. It then walks through five common use case scenarios to aid product evaluation, with screen captures from a sample environment to help you follow along. Note that the example screenshots throughout this guide are taken from a demonstration database to illustrate the product's capabilities; the data in your environment will differ.

The use cases are designed to be evaluated in sequence, but each can also be used on its own. They reflect representative applications of OnCommand Balance but by no means cover all possible uses of the software.

HELP AND SUPPORT DURING EVALUATION

This guide does not cover software installation and configuration. Refer to the following resources:

Deploying OnCommand Balance: Step-by-step video guides for installing and configuring OnCommand Balance:

Installing OnCommand Balance (13 min): www.brainshark.com/netapp/vu?pi=zG0z8UQH6z2hlJz0

Configuring OnCommand Balance (17 min): www.brainshark.com/netapp/vu?pi=zIJzmdOuTz2hlJz0

Product Documentation: Download the Release Notes, User Guide, Configuration Guide, and Installation Guide (see Figure 1):

Now.netapp.com – search for “OnCommand Balance”. You must log in to access the documentation.


Figure 1) OnCommand Balance Support Page.

The following resources provide more information on OnCommand Balance:

Short introduction and “how to” videos explain key product capabilities (see Table 1)

Table 1) OnCommand Balance Videos.

Link to Video | Duration (min:sec)
Don’t Get Lost, Get OnCommand Balance | 2:50
Which Servers are Causing Resource Contention? | 4:06
Optimizing Your Virtual Infrastructure | 5:40
Are Workloads Resource Constrained? | 2:58
OCI OnCommand Insight Murder Mystery | 5:12

Data sheet

Customer Community

If you require further assistance deploying OnCommand Balance:

Submit a support request


HARDWARE AND SOFTWARE REQUIREMENTS

The Web-based GUI console virtual appliance requires VMware vSphere™ 4.0 or later, two CPUs at 2.33 GHz minimum, 4 GB of VM memory, and 200 GB of storage on four disks (RAID 5/10). Some arrays may require a proxy VM with one CPU at 2 GHz and 1 GB of VM memory.
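To sanity-check a planned deployment against these minimums before importing the appliance, a short script can compare a proposed VM against the figures above. This is a minimal sketch, not a NetApp tool: the `VmSpec` fields and the `shortfalls` helper are hypothetical names, and the thresholds are simply the numbers quoted in this section (the optional proxy VM is not checked).

```python
from __future__ import annotations

from dataclasses import dataclass

# Minimums quoted in the requirements paragraph; adjust if your release notes differ.
MIN_VSPHERE = (4, 0)
MIN_CPUS = 2
MIN_CPU_GHZ = 2.33
MIN_MEMORY_GB = 4
MIN_DISK_GB = 200
MIN_DISKS = 4


@dataclass
class VmSpec:
    """Hypothetical description of the VM planned for the Balance appliance."""
    vsphere_version: tuple[int, int]
    cpus: int
    cpu_ghz: float
    memory_gb: float
    disk_gb: float
    disks: int


def shortfalls(spec: VmSpec) -> list[str]:
    """Return one message per stated minimum that the spec does not meet."""
    checks = [
        (spec.vsphere_version >= MIN_VSPHERE, f"vSphere {spec.vsphere_version} is older than {MIN_VSPHERE}"),
        (spec.cpus >= MIN_CPUS, f"{spec.cpus} CPU(s), need {MIN_CPUS}"),
        (spec.cpu_ghz >= MIN_CPU_GHZ, f"{spec.cpu_ghz} GHz CPUs, need {MIN_CPU_GHZ} GHz"),
        (spec.memory_gb >= MIN_MEMORY_GB, f"{spec.memory_gb} GB memory, need {MIN_MEMORY_GB} GB"),
        (spec.disks >= MIN_DISKS, f"{spec.disks} disk(s), need {MIN_DISKS}"),
        (spec.disk_gb >= MIN_DISK_GB, f"{spec.disk_gb} GB disk, need {MIN_DISK_GB} GB"),
    ]
    # Keep only the messages for checks that failed.
    return [message for ok, message in checks if not ok]
```

For example, `shortfalls(VmSpec((5, 0), 1, 2.0, 4, 200, 4))` returns two messages, one for the CPU count and one for the clock speed.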

OnCommand Balance incorporates discovery, data collection, and analysis for resources including:

Servers: Microsoft® Windows®, HP-UX, IBM, Red Hat Enterprise Linux®, Oracle® Solaris, VMware, and Microsoft Hyper-V™

Storage (SAN and NAS): NetApp, Dell EqualLogic, EMC, Hitachi Data Systems, HP, IBM (may require an SMI-S proxy and/or native CLI)

Applications: RDBMS, OLTP, OLAP, file, e-mail, streaming media, etc., with drill-down for Oracle® 9i™, 10g™, and 11g™ on Microsoft® Windows®, and Microsoft SQL Server® 2005 and 2008.

SUMMARY OF EVALUATION TASKS

This section outlines suggested evaluation tasks that illustrate the value of OnCommand Balance. The following key capabilities are covered:

Review dashboard

Use Case #1: Examine storage troubleshooting and optimization

Use Case #2: Examine VM workload to storage visibility

Use Case #3: Analyze VM workload distribution optimization

Use Case #4: Identify and prioritize misaligned logical unit numbers (LUNs) and virtual machine disk format (VMDK) partitions to reduce performance risk

Use Case #5: Review scorecards and reports

OnCommand Balance software is designed for heterogeneous environments and supports products from many hypervisor, server, and storage vendors. As a result, this guide refers to both NetApp and non-NetApp products.

To get started, launch OnCommand Balance to follow along with the tasks.

2 ONCOMMAND BALANCE DASHBOARD REVIEW

Examining the dashboard provides an overview of the health of the infrastructure. This section shows how OnCommand Balance answers the following questions:

How can you quickly view infrastructure resources that are in trouble?

How do you know where to start troubleshooting a performance problem?

If several problems are identified, does OnCommand Balance list them by urgency or severity?

Can OnCommand Balance show you which servers or storage might be misconfigured?

Can you hide resources that do not concern you?

The dashboard emphasizes the most significant issues from the monitored infrastructure, including storage arrays, VMs, virtual hosts, and physical servers (see Figure 2). For example, the dashboard shows HP-UX and AIX physical servers, Hyper-V and VMware virtual hosts (as well as IBM LPARs), the VMs or guests (Hyper-V, VMware, or LPARs), and storage. (OnCommand Balance supports a large number of storage vendors.)


Figure 2) Dashboard: “In Trouble” Panels for Storage, VMs, Virtual Hosts, and Physical Servers.

The storage in-trouble panel displays alerts generated by several of OnCommand Balance’s deep storage analytics (see Figure 3). For each alert, the display includes when the event occurred; an admin-friendly summary of the specific disk group, aggregate, or controller event; and a 10-day display of the OnCommand Balance disk utilization analytic for disk groups and aggregates, or of CPU utilization for storage controllers. The disk utilization analytic shows how much headroom, or performance capacity, remains at shared storage.

Figure 3) Dashboard: Key Components of the Storage In-Trouble Panel.


The server in-trouble panels display alerts associated with OnCommand Balance’s server-based analytics (see Figure 4). They provide a consolidated view of the most significant issues in the infrastructure, allowing administrators to quickly locate hotspots and developing performance issues. Each panel lists servers by name (the top five by default), with 6-day and 72-hour displays of the OnCommand Balance performance index (PI). Each panel also displays relevant alerts associated with the server’s CPU, memory, and storage, many of which are unique OnCommand Balance analytics. Like the disk utilization analytic for storage, the PI shows how much headroom or performance capacity remains for the server. For example, if the server is a virtual host such as ESX or Hyper-V, the PI shows how many more VMs it could support. (The PI and other server-based analytics are examined further later in this guide.)

Figure 4) Dashboard: Key Components of the Server In-Trouble Panels.
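The guide does not say exactly how Balance orders servers within these panels. As an illustration only, the sketch below assumes a panel shows the servers with the highest latest PI first (those furthest past the optimal point of 100) and keeps the default of five. `top_in_trouble` is a hypothetical helper, and all server names except esx21 are invented.

```python
from __future__ import annotations

import heapq

PANEL_SIZE = 5  # the panels list the top five servers by default


def top_in_trouble(latest_pi: dict[str, float], n: int = PANEL_SIZE) -> list[tuple[str, float]]:
    """Return the n servers with the highest latest PI, highest first.

    Assumption: "most in trouble" means highest PI; the real panel may also
    weigh the CPU, memory, and storage alerts it displays.
    """
    return heapq.nlargest(n, latest_pi.items(), key=lambda item: item[1])
```

`heapq.nlargest` avoids sorting the whole inventory when only a handful of rows are displayed, which matters little for a demo but scales to large environments.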

In addition to identifying potential problems, the dashboard summarizes the monitored environment (see Figure 5). It shows how many servers and arrays are in the environment, how many guests are monitored or unmonitored, storage and OS types, capacity-based licensing information, and quick links to add more resources directly from the dashboard.

Figure 5) Dashboard: Monitored Environment Summary.


3 USE CASE #1: EXAMINE STORAGE TROUBLESHOOTING AND OPTIMIZATION

This use case illustrates how OnCommand Balance helps answer the following storage questions:

How can you view the “performance capacity” at storage?

How can you view all resources that are sharing storage – virtual and physical?

Can you be notified when a “bully” server is victimizing other servers and/or overwhelming shared storage?

Can you view workload contention on your shared storage?

How can you accurately and efficiently right-size storage capacity and deploy new workloads?

Returning to a portion of the storage in-trouble panel in Figure 2: at 11:00 am today, OnCommand Balance discovered an array disk group that exceeded its disk utilization threshold (see Figure 6). OnCommand Balance also indicates that this was the result of three bully workloads generating abnormally high throughput. Click the details link to jump to the detailed analysis of that point in time.

Figure 6) Troubleshooting a Storage Problem.

On the detailed analysis page, OnCommand Balance provides a high-level description of the problem (see Figure 7). Rather than simply presenting raw data, here and elsewhere on this page OnCommand Balance uses its analytics to draw conclusions about the environment and the impact on workloads.

Figure 7) Event Summary Description.


The red dot in Figure 8 marks the point in time when the disk utilization threshold was exceeded and OnCommand Balance sent the alert: 11:00 am on Sunday, when disk utilization surpassed 90 percent.

Figure 8) Disk Utilization is Storage Problem Indicator.

Disk utilization warns of increasing load on a disk group or aggregate and is expressed as a percentage of total possible disk activity, from 0 percent (no activity) to 100 percent (full load) (see Figure 9). It is based on throughput, response time, and queue depth (shown in parentheses at the left side of Figure 9), which OnCommand Balance monitors for each disk group and graphs in these unique displays. The disk utilization graph tracks the cumulative utilization of all workloads using the disk group; use it to monitor utilization at the disk group level and see how much performance capacity, or headroom, remains. Its configurable threshold, shown here at the default value of 90 percent, controls the OnCommand Balance predictor: when disk utilization reaches the threshold, the point is marked on the display with a red dot, and the predictor sends an email, SNMP trap, or Syslog message with all of the details.

Figure 9) Understanding Disk Utilization.
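The threshold behavior described above can be sketched in a few lines. This assumes you already have a series of disk utilization percentages (how Balance derives them from throughput, response time, and queue depth is not described in this guide), and it places one marker where each excursion first reaches the threshold; whether the product marks every sample above the threshold or only the first is an assumption. The function names and sample values are hypothetical.

```python
from __future__ import annotations

DEFAULT_THRESHOLD = 90.0  # percent; the default named in the text (configurable in the product)


def threshold_crossings(samples: list[tuple[str, float]], threshold: float = DEFAULT_THRESHOLD) -> list[str]:
    """Return the timestamps where utilization first reaches the threshold
    after being below it: one marker ("red dot") per excursion."""
    crossings = []
    above = False
    for timestamp, utilization in samples:
        if utilization >= threshold and not above:
            crossings.append(timestamp)
        above = utilization >= threshold
    return crossings


def headroom(utilization: float) -> float:
    """Remaining performance capacity, in percentage points of full load."""
    return max(0.0, 100.0 - utilization)
```

With a Sunday series that climbs from 71 to 95 percent and falls back, only the 11:00 sample is marked at the default threshold, matching the single alert described above.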

Figure 10 is an example of the predictor email, which provides the following:

A clear description of the problem

The business impact: the victim servers experiencing slower response times

The root cause: the bully servers causing the problem

A hyperlink to a full analysis and detailed drill-downs of the individual server workloads

The predictor helps eliminate guesswork and time-consuming troubleshooting of failures by identifying the root cause of developing problems before they result in a failure.
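If you forward predictor notifications into another tool, it can help to think of each alert as a small record holding the four items listed above. The sketch below is a hypothetical data model, not Balance's actual email, SNMP, or Syslog format; `PredictorAlert`, its field names, and the example values are invented for illustration, and delivery is left out.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class PredictorAlert:
    """Hypothetical record of one predictor notification (field names invented)."""
    resource: str                                       # disk group or aggregate over threshold
    description: str                                    # plain-language statement of the problem
    victims: list[str] = field(default_factory=list)    # business impact
    bullies: list[str] = field(default_factory=list)    # root cause
    analysis_url: str = ""                              # link back to the full analysis page

    def to_text(self) -> str:
        """Render the alert as plain text, one item per line."""
        lines = [
            f"Problem on {self.resource}: {self.description}",
            "Victims (slower response): " + (", ".join(self.victims) or "none identified"),
            "Bullies (root cause): " + (", ".join(self.bullies) or "none identified"),
        ]
        if self.analysis_url:
            lines.append(f"Full analysis: {self.analysis_url}")
        return "\n".join(lines)
```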


Figure 10) OnCommand Balance Predictor.

From the predictor email, click the “view full analysis” hyperlink (see Figure 10) to return to the UI and examine the full analysis. The bottom of the alert page displays the unique workload breakdown charts (see Figure 11), which show how each independent workload affects input/output operations per second (IOPS), response time, and disk utilization during the alert period. This display includes a detailed analysis of expected, average, and actual performance, as well as bully and victim rankings. Finally, the workload breakdown can be sorted by a variety of factors, including predictor ranking, most I/O, worst response time, and victim or bully ranking.

Figure 11) Workload Breakdown Graph and Analytics.
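Sorting the breakdown simply reorders the same per-workload rows by a different key. In this sketch the `WorkloadRow` fields and the rank convention (1 = strongest bully or victim) are assumptions, and the product's own "predictor ranking" order is omitted because the guide does not say how it is computed. Except for PITest1 and PITest5, the workload names are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class WorkloadRow:
    """One row of a (hypothetical) exported workload breakdown."""
    name: str
    iops: float
    response_ms: float
    bully_rank: int   # 1 = strongest bully (assumed convention)
    victim_rank: int  # 1 = strongest victim (assumed convention)


# Sort order name -> (key function, sort descending?)
SORT_ORDERS = {
    "most_io": (lambda row: row.iops, True),
    "worst_response_time": (lambda row: row.response_ms, True),
    "bully_ranking": (lambda row: row.bully_rank, False),
    "victim_ranking": (lambda row: row.victim_rank, False),
}


def sort_breakdown(rows: list[WorkloadRow], order: str) -> list[WorkloadRow]:
    """Return the rows reordered by one of the keys in SORT_ORDERS."""
    key, descending = SORT_ORDERS[order]
    return sorted(rows, key=key, reverse=descending)
```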


By monitoring this performance data over time, OnCommand Balance can determine whether specific workloads are bullies or victims. For example, Figure 12 shows that PITest5 is the top-ranked bully because it averaged 87 IOPS against an expected high of 33 IOPS, and at one point it hit 365 IOPS, roughly ten times the I/O it normally performs. By comparison, bullies 2 and 3 are only nominally above their expected levels. OnCommand Balance monitors throughput and response time and ranks these workloads by their deviation from normal levels.
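One simple way to express "deviation from normal" is the ratio of a workload's average throughput to the top of its expected range. The guide does not publish Balance's ranking formula, so this is an assumed scoring rule; only the PITest5 figures (87 IOPS average, 33 IOPS expected high) come from the text, and the other workloads and values are hypothetical.

```python
from __future__ import annotations


def bully_score(avg_iops: float, expected_high_iops: float) -> float:
    """Ratio of observed average throughput to the top of its normal range.
    Scores above 1.0 mean the workload is doing more I/O than usual."""
    return avg_iops / expected_high_iops


def rank_bullies(observed: dict[str, tuple[float, float]]) -> list[tuple[str, float]]:
    """observed maps workload -> (average IOPS, expected high IOPS).
    Returns workloads above their normal range, most deviant first."""
    scores = {name: bully_score(avg, high) for name, (avg, high) in observed.items()}
    above_normal = [(name, score) for name, score in scores.items() if score > 1.0]
    return sorted(above_normal, key=lambda item: item[1], reverse=True)
```

With those inputs PITest5 scores about 2.64, well ahead of workloads that are only nominally above their expected levels.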

To identify PITest1 as the highest-ranked victim, OnCommand Balance determined that PITest1’s normal throughput is 113-130 IOPS and that it was operating well within that range, but that its response time was 21 ms against an expected range of 7-13 ms. Using the disk utilization analytic and these workload breakdowns, OnCommand Balance determined that PITest1 was operating outside of the norm for 11 am on a Sunday or 2 am on a Friday. The OnCommand Balance predictor can notify administrators by email, or administrators can see the problem on the OnCommand Balance dashboard. This page details which resources are driving the I/O on this disk group or aggregate and which resources are being impacted.

Figure 12) Bullies and Victims.
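The victim test described above (normal throughput, abnormal response time) can be written as a predicate. The ranges for PITest1 (113-130 IOPS, 7-13 ms) and its 21 ms response time come from the text; PITest1's exact throughput is not given, so 120 IOPS is an assumed in-range value, and the severity score (response time divided by the top of its expected range) is a hypothetical choice rather than Balance's ranking method.

```python
from __future__ import annotations


def victim_score(iops: float, iops_range: tuple[float, float],
                 response_ms: float, response_range_ms: tuple[float, float]) -> float:
    """Return 0.0 if the workload does not look like a victim; otherwise how
    far its response time exceeds the top of its expected range (ratio > 1).

    A victim is doing a normal amount of work (throughput inside its usual
    band) but waiting longer than usual for it.
    """
    iops_low, iops_high = iops_range
    _, response_high = response_range_ms
    if iops_low <= iops <= iops_high and response_ms > response_high:
        return response_ms / response_high
    return 0.0
```

Requiring in-range throughput is what separates a victim from a bully: a workload that is slow because it is pushing far more I/O than usual scores 0.0 here.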

Another way to view disk utilization on a per-workload basis is with the contention graphs (see Figure 13). Clicking the contention tab provides a detailed view of all or selected workloads sharing the disk group. Select or clear the check boxes at the left to see how each workload affects overall disk utilization and response time, and hover over the graph to see each workload's percentage of disk utilization at each point in time. The contention graph provides another visualization of the workloads driving I/O on a disk group or aggregate.


Figure 13) Click the Contention Tab (top of diagram) to Reveal the Contention Graph (bottom).
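The contention view amounts to a stacked series: each workload's contribution at each time step, plus the total. The sketch assumes per-workload utilization contributions are already available (how Balance attributes utilization to individual workloads is not described here) and converts them to each workload's share of the total at every step. The function names, workload names other than PITest1 and PITest5, and values are hypothetical.

```python
from __future__ import annotations


def stacked_totals(per_workload: dict[str, list[float]]) -> list[float]:
    """Total disk utilization at each time step (the height of the stack)."""
    return [sum(step) for step in zip(*per_workload.values())]


def contention_shares(per_workload: dict[str, list[float]]) -> list[dict[str, float]]:
    """For each time step, each workload's percentage of the total utilization."""
    names = list(per_workload)
    shares = []
    for step in zip(*(per_workload[name] for name in names)):
        total = sum(step)
        shares.append({
            name: (100.0 * value / total if total else 0.0)
            for name, value in zip(names, step)
        })
    return shares
```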

4 USE CASE #2: EXAMINE VM WORKLOAD TO STORAGE VISIBILITY

This use case shows how OnCommand Balance provides answers to the following questions using the topology views and cross-domain features:

How do you view your entire infrastructure: VMs, servers and storage?

How do you view full data paths from VM to storage?

Can you view hotspots in order to quickly identify the root cause of performance problems?

Can you change the perspective of the view from any resource type?

Figure 14 is an example of the OnCommand Balance topology view from the perspective of an ESX host. The left side of the diagram shows the workloads, VMs, and volumes running on this host. It also shows the Virtual Machine File System (VMFS) partitions at the ESX level and how those ultimately map back to LUNs, disk groups, or aggregates in your storage environment. The view also overlays performance information with status icons to show clearly where contention exists inside the data center, without the need to drill further into the product or investigate on your own. Status is updated automatically and stays current when VMs move with vMotion or DRS.


Figure 14) ESX Host Perspective.

To see the topology view from the perspective of a workload, simply right-click the application icon and select re-orient topology. This reveals all of the resources that this application workload leverages (see Figure 15).

Figure 15) Workload Perspective.


The topology can be reoriented further from any resource type. Because Figure 15 identifies a problem with CLARiiON Disk Group 6, the topology can be reoriented from that disk group's perspective to view the resources that affect it (see Figure 16). Reoriented to the disk group, the topology view looks outward from the spindles to all resources connected to and sharing it – ESX hosts, Linux and Windows VMs, and even physical servers if connected. This helps administrators understand not only how these resources are configured but also the performance implications of sharing storage across the virtual and physical sides of the data center.

Figure 16) Disk Group View.
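Reorienting the topology on a disk group is, in graph terms, a breadth-first walk over "uses" relationships in reverse. The sketch below assumes the topology is available as (consumer, provider) pairs; `reorient` is a hypothetical helper, every resource name other than CLARiiON Disk Group 6 is invented, and Balance's internal topology model is not described in this guide.

```python
from __future__ import annotations

from collections import defaultdict, deque


def reorient(uses: list[tuple[str, str]], focus: str) -> set[str]:
    """Return every resource whose data path leads down to `focus`.

    `uses` holds (consumer, provider) edges such as ("vm-linux01", "vmfs-ds1")
    or ("lun-12", "CLARiiON Disk Group 6"). Walking the edges backward from a
    disk group finds everything that shares it, virtual or physical.
    """
    consumers_of: dict[str, list[str]] = defaultdict(list)
    for consumer, provider in uses:
        consumers_of[provider].append(consumer)

    found: set[str] = set()
    queue = deque([focus])
    while queue:  # breadth-first walk up the topology
        node = queue.popleft()
        for consumer in consumers_of[node]:
            if consumer not in found:
                found.add(consumer)
                queue.append(consumer)
    return found
```

A host such as esx21 is included only because it mounts a datastore on that disk group; a VM on the same host whose disks live elsewhere is not.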


Clicking any resource type highlights its full data paths to storage, whether that storage resides on individual hosts or the SAN or NAS (see Figure 17).

Figure 17) Highlight Full Data Paths to Storage.
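Highlighting a resource's full data paths is the forward direction of the same kind of graph: follow each "uses" edge until you reach a resource that depends on nothing else. This depth-first sketch treats such end points as the storage end of the path; the `data_paths` helper, the edge format, and the resource names are hypothetical.

```python
from __future__ import annotations

from collections import defaultdict


def data_paths(uses: list[tuple[str, str]], start: str) -> list[list[str]]:
    """List every path from `start` down to a resource with no further provider
    (treated here as the storage end of the path, e.g. a disk group or aggregate)."""
    providers: dict[str, list[str]] = defaultdict(list)
    for consumer, provider in uses:
        providers[consumer].append(provider)

    paths: list[list[str]] = []

    def walk(node: str, path: list[str]) -> None:
        next_hops = providers.get(node, [])
        if not next_hops:
            paths.append(path)
            return
        for nxt in next_hops:
            if nxt not in path:  # guard against accidental cycles in the input
                walk(nxt, path + [nxt])

    walk(start, [start])
    return paths
```

A VM with one VMDK on a SAN datastore and one NFS mount would yield two paths, one ending at each aggregate.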

5 USE CASE #3: ANALYZE VM WORKLOAD DISTRIBUTION OPTIMIZATION

This section shows how OnCommand Balance provides answers to the following questions:

Can you determine why a VM is performing poorly? Does it have too much or too little CPU, memory, or I/O?

Is your workload storage bound or CPU bound?

How would you determine whether a slow response time is normal or abnormal? What is your normal response time?

Can you view trends between two points in time? Why is your application slower than last week?

In Figure 18, the upper graph (highlighted in RED) presents an hourly average of the performance index (PI) for a host; in this case, esx21. The lower graph (highlighted in YELLOW) offers a detailed view of the data at any point on the trend line in the upper graph. The section to the lower right (highlighted in GREEN) presents the numerical data that was used to calculate the PI reflected in the lower point-in-time graph.

OnCommand Balance calculates a host’s PI value by analyzing the metrics of the elements that affect a host’s performance: CPU, memory, and storage. All three of these resources are critical components in the data processing path. After all, data has to pass from the CPU to memory, to storage, and back before an application can process the data. Using NetApp’s proprietary queuing network model, OnCommand Balance determines performance information for each of these resources to derive the host’s PI.

When another workload is placed on a host, that host’s performance degrades by some amount. OnCommand Balance calculates the point at which the performance cost of adding another workload exceeds the benefit of the work it performs, and assigns a value of 100 to that point of near-perfect efficiency. This “optimal point” is the basis of the PI.

Once the optimal point is known, resource optimization decisions are easier to make: you can determine whether this ESX host can take more VMs or is running too many. If a host’s PI value is below 100, the server is underutilized and can support additional workloads (or VMs). Conversely, if the PI value is above 100, the system is overworked and should have some workloads removed.
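The rule above can be turned into a rough "how many VMs" estimate, with a large caveat: the sketch assumes every VM adds an equal, additive share of the PI, which Balance's queuing-model PI does not claim to do. Treat `vm_headroom` (a hypothetical helper) as a first guess to check against the PI graph after any change; `classify` simply restates the rule in the text.

```python
from __future__ import annotations

import math

OPTIMAL_PI = 100.0  # the PI value assigned to the optimal point


def classify(pi: float) -> str:
    """Apply the rule from the text: below 100 is underutilized, above is overworked."""
    if pi < OPTIMAL_PI:
        return "underutilized"
    if pi > OPTIMAL_PI:
        return "overworked"
    return "optimal"


def vm_headroom(pi: float, vm_count: int) -> int:
    """Rough number of VMs to add (positive) or remove (negative) to reach PI 100.

    Assumes each VM contributes an equal, additive share of the PI. The real
    PI comes from a queuing model and is nonlinear, so this is only a first guess.
    """
    if vm_count <= 0 or pi <= 0:
        raise ValueError("need a host with at least one VM and a positive PI")
    per_vm = pi / vm_count
    # floor() rounds additions down and removals up, so the result never overshoots 100.
    return math.floor((OPTIMAL_PI - pi) / per_vm)
```

For example, a host at PI 50 running 10 VMs gives an estimate of 10 more VMs, while a host at PI 130 with 10 VMs suggests moving 3 off.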

On the PI page, the time-averaged PI data in the upper graph can be used to identify trends or patterns in server esx21’s performance levels. If customers report that an application experiences episodes of poor performance, you can use the PI graph to identify when those episodes occur and focus the investigation there.

Figure 18) Understanding the Performance Index for a Host.
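Hourly averaging of the kind shown in the upper graph, and flagging the hours worth a closer look, can be sketched as follows. The sample format is an assumption (Balance's own collection interval and export format are not described in this guide), the helper names are hypothetical, and "above 100" is used as the episode criterion only because 100 marks the optimal point.

```python
from __future__ import annotations

from collections import defaultdict
from datetime import datetime
from statistics import mean


def hourly_average(samples: list[tuple[datetime, float]]) -> dict[datetime, float]:
    """Average raw PI samples into hourly buckets, returned in time order."""
    buckets: dict[datetime, list[float]] = defaultdict(list)
    for timestamp, pi in samples:
        hour = timestamp.replace(minute=0, second=0, microsecond=0)
        buckets[hour].append(pi)
    return {hour: mean(values) for hour, values in sorted(buckets.items())}


def hours_above(hourly: dict[datetime, float], limit: float = 100.0) -> list[datetime]:
    """Hours whose average PI exceeded the limit: candidate episodes to investigate."""
    return [hour for hour, pi in hourly.items() if pi > limit]
```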

When you select a point on the upper graph (RED area), the data in the lower graph (YELLOW area) updates to reflect the specific PI information of the server at the selected point in time. The two points in the lower graph, GREEN and RED, represent the current and optimal operating levels of the server at that time, respectively. If the GREEN point (current) is to the left of the RED point (optimal), the server is underutilized. If the RED point (optimal) is to the left of the GREEN point (current), the server is over-utilized. At an optimal operating level, the two points should be coincident (on top of each other).

The information pane to the right of the point-in-time graph (GREEN area) presents a detailed snapshot of the data on which that graph is based. The pane includes numerical values for CPU utilization, throughput, and response time, as well as the PI value for that sample period. You can use this data to better understand the cause of a performance issue and as a starting point for further investigation.

To examine a smaller portion of the PI trend graph, find a point on the line in the upper graph where the PI value is approximately 50. Left-click that point (YELLOW) and drag the mouse to the right, to a point after the spike where the PI is well below its peak (see Figure 19).

Figure 19) Selecting a Portion of the Performance Index Graph to Examine.

As a result, the upper graph in Figure 20 now contains only the data from the selected section. Click on a point on the line on the upper graph in Figure 20. The lower graph in Figure 20 now contains the PI value of the server at that specific point in time.

In the lower graph in Figure 20, the x-axis represents the throughput the host is generating (transactions per second), and the y-axis represents the response time for those transactions. The shape of the curve shows that as the number of transactions increases, the time required to process each transaction also increases. In terms of virtualization, adding more VMs to this host pushes the number of transactions per second (x-axis) to the right, which increases the response time (y-axis). As throughput increases, response time (IO latency) increases, and as response time increases, server efficiency decreases.


Figure 20) Examining a Portion of the PI Graph (PI~50).

Farther to the right, the curve becomes steeper. The RED dot (the optimal PI value) is placed at the point on the curve where the rate of rise accelerates. At that point, the performance cost of adding workloads stops growing roughly linearly and starts growing exponentially. This is the point where the cost of adding workloads exceeds the benefit.
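To make “the point where the rate of rise accelerates” concrete, the sketch below scans sampled (throughput, response time) pairs and reports the first sample where the curve’s slope exceeds a multiple of its initial slope. This is a hypothetical heuristic chosen for illustration. The guide does not describe how OnCommand Balance actually locates the optimal point, and both the `factor` threshold and the single-queue curve used as input are assumptions.

```python
def knee_index(throughput, response_time, factor=2.0):
    """Return the index of the first sample where the slope of the
    response-time curve exceeds `factor` times its initial slope.

    Hypothetical heuristic for illustration; not OnCommand Balance's algorithm.
    Returns None if the curve never steepens that much.
    """
    if len(throughput) != len(response_time) or len(throughput) < 3:
        raise ValueError("need at least three matching samples")
    if any(b <= a for a, b in zip(throughput, throughput[1:])):
        raise ValueError("throughput samples must be strictly increasing")

    def slope(i):
        # Extra response time per extra unit of throughput between samples i and i+1.
        return (response_time[i + 1] - response_time[i]) / (throughput[i + 1] - throughput[i])

    base = slope(0)
    for i in range(1, len(throughput) - 1):
        if slope(i) > factor * base:
            return i
    return None


# Illustrative input only: a simple single-queue model R = S / (1 - X * S)
# with an assumed 5 ms service time S. This is not measured data.
service_s = 0.005
xs = list(range(0, 200, 20))                               # transactions per second
rs = [1000 * service_s / (1 - x * service_s) for x in xs]  # response time in ms
knee = knee_index(xs, rs)
print(xs[knee], round(rs[knee], 2))
```

Raising `factor` moves the reported knee farther up the curve. With this input and a factor of 2, the sketch selects the sample at 60 transactions per second.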

In Figure 20, the point on the top graph represents a PI of approximately 50. The lower graph shows the current operating point (GREEN) with a PI of 50 and the optimal point (RED) with a PI of 100. Additional processing up to the optimal point has a more linear effect on performance. After that point, this host’s performance will degrade exponentially. If the optimal point represents ideal utilization (consumption of 100% with optimal efficiency), and the current operating point is 50%, then only 50% of the available capacity of this host is being consumed. Hence, the workload on this server could be doubled without compromising performance. So if the host is running five VMs, five similar VMs could be added without losing efficiency.
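The arithmetic in this example (a PI of 50, five VMs, and room for five more) can be written as a short function. This is a rough sketch that makes the same assumption as the example: load grows linearly with the number of similar VMs up to the optimal point. The name `additional_vms` is hypothetical, and real workloads rarely scale this cleanly.

```python
import math


def additional_vms(current_vms: int, pi: float) -> int:
    """Estimate how many similar VMs fit before a host reaches PI 100.

    Assumes load scales linearly with the number of similar VMs up to the
    optimal point, so treat the result as an optimistic upper bound.
    """
    if current_vms < 0 or pi <= 0:
        raise ValueError("current_vms must be >= 0 and pi must be > 0")
    if pi >= 100:
        return 0  # already at or beyond the optimal point
    # Capacity left before PI 100, expressed in units of the current per-VM load.
    return math.floor(current_vms * (100.0 - pi) / pi)


print(additional_vms(5, 50))  # 5, matching the example in the text
```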

Note the information in the details pane:

• CPU utilization on this host is only 15%.

• Average IO throughput is 95.76 transactions per second. Because the optimal throughput is 191.14 transactions per second, the server is handling only about half of its optimal load and is not exerting itself.

• The optimal overall response time for this server is 17.14 milliseconds, while its actual response time for this sample period is 10.84 milliseconds.

• The server is operating at approximately 40% utilization.
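For this sample, the throughput figures in the details pane are consistent with reading the PI as current throughput relative to optimal throughput, scaled so that the optimal point equals 100. That reading is an inference from this one example and is not a documented formula; Balance derives the PI from performance data for each of the host’s resources. The hypothetical helper below only reproduces the arithmetic.

```python
def estimate_pi(current_tps: float, optimal_tps: float) -> float:
    """Scale current throughput against optimal throughput (optimal = 100).

    A rough approximation inferred from this guide's example, not the
    calculation OnCommand Balance performs.
    """
    if optimal_tps <= 0:
        raise ValueError("optimal throughput must be positive")
    return 100.0 * current_tps / optimal_tps


print(round(estimate_pi(95.76, 191.14), 1))  # 50.1, close to the PI of about 50 shown
```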

Now, in the upper graph, select a point on the line where the PI is approximately 100. Because the host’s PI is near 100, it is operating at its optimal point. Adding workloads would push it onto the steep part of the curve and cause a dramatic degradation in performance. If another VM is needed to run MSSQL, this ESX host would be the wrong place for it.

Now, in the upper graph, select a point on the line where the PI is substantially greater than 300. Clearly, the server is operating well beyond its envelope of maximum efficiency.


In conclusion, the PI is a powerful tool. It allows users to distribute VMs across the environment without having to guess where to place them. It provides real-time visibility into CPU, memory, and IO efficiency on each ESX host and across the entire environment, showing where resources are available and where they are over-provisioned. This information can be used to distribute VMs optimally across the environment. The PI also shows how much headroom is left in the infrastructure, which lets you manage the organization’s purchases of additional capacity with confidence.

6 USE CASE #4: IDENTIFY AND PRIORITIZE MISALIGNED LUNS AND VMDK PARTITIONS TO REDUCE PERFORMANCE RISK

If there is misalignment in your infrastructure, an alert message marked with a yellow triangle and an exclamation point appears toward the top of the page. If the alert is present, click the “view the report” link at the end of the alert message (see Figure 21).

Figure 21) View the Report on Misaligned LUN Partitions.

“Misalignment” is shorthand for “file system misalignment.” Misalignment degrades storage performance and imposes a storage tax that the administrator is probably unaware of paying. It results from a configuration mismatch between an operating system (OS) and a storage controller.

When the file systems of an OS and a storage controller are misaligned, each storage IO generated by the OS (a write to a block of data on disk, for example) results in multiple IOs for the storage controller and its disks (writes to multiple disk blocks). In this way, file system misalignment artificially raises the volume of storage IO, because a single IO event is “multiplied” into numerous IO events on the storage layer. This excessive, inefficient IO occupies storage resources and makes them unavailable for other purposes. By keeping storage unnecessarily busy, misalignment compromises overall storage performance and wastes resources.
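The multiplication effect can be illustrated by counting how many back-end blocks a single guest IO covers. This sketch makes several assumptions: a 4 KiB storage block size, and a guest file system that issues 4 KiB IOs aligned to its own blocks, which start at the partition start. The misaligned case uses a 63-sector (32,256-byte) partition start, a common default on older operating systems. The function name is hypothetical.

```python
def blocks_touched(start_byte: int, length: int, block_size: int = 4096) -> int:
    """Count the storage blocks covered by one IO of `length` bytes that
    starts at absolute byte offset `start_byte` on the LUN."""
    if start_byte < 0 or length <= 0 or block_size <= 0:
        raise ValueError("invalid offset, length, or block size")
    first = start_byte // block_size
    last = (start_byte + length - 1) // block_size
    return last - first + 1


# A 4 KiB guest write to the first file-system block of the partition.
aligned = blocks_touched(1_048_576, 4096)   # partition starts at 1 MiB
misaligned = blocks_touched(63 * 512, 4096)  # partition starts at sector 63
print(aligned, misaligned)  # 1 2
```

In the misaligned partition, every 4 KiB file-system block straddles a storage block boundary. Each guest IO in that case therefore touches two storage blocks instead of one.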

The problem is more critical in a virtualized environment (see Figure 22). Because each VMDK contains its own file system, each VM can suffer from misalignment. When a volume contains tens or hundreds of VMs, the problem multiplies, becoming much more concentrated and therefore much more acute.

Fortunately, NetApp has developed tools to help identify and remediate misalignment in an environment. OnCommand Balance understands server and storage infrastructures, including best-practice storage configurations from multiple vendors. It also monitors the activity of the hosts (both physical and virtual). As a result, it can identify hotspots where host activity and misalignment collide to create critical storage bottlenecks.

The OnCommand Balance misalignment report gathers the list of misaligned partitions onto a single page and ranks them by the amount of IO activity (average throughput) each is experiencing (see Figure 23). By organizing the information in this way, OnCommand Balance helps prioritize where to take action first to gain the greatest immediate benefit. Because misaligned traffic has a substantial impact on storage performance, correcting misalignment problems also yields a substantial benefit.
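The prioritization described above amounts to a filter followed by a sort. This is a minimal sketch that assumes each partition is described by its start offset and average throughput, and that a single 4 KiB alignment boundary applies. The `Partition` type, its field names, and the sample values are all hypothetical. Balance’s report compares each partition against the vendor-recommended alignment rather than a fixed constant.

```python
from dataclasses import dataclass


@dataclass
class Partition:
    vm: str
    offset_bytes: int           # partition start offset on the LUN
    avg_throughput_iops: float  # average IO activity observed


def rank_misaligned(partitions, alignment_bytes=4096):
    """Keep partitions whose offset is not a multiple of the alignment
    boundary and sort them by average throughput, busiest first."""
    misaligned = [p for p in partitions if p.offset_bytes % alignment_bytes != 0]
    return sorted(misaligned, key=lambda p: p.avg_throughput_iops, reverse=True)


# Hypothetical inventory, not data from Figure 23.
inventory = [
    Partition("vm-a", 32_256, 120.0),
    Partition("vm-b", 1_048_576, 900.0),
    Partition("vm-c", 32_256, 450.0),
]
for p in rank_misaligned(inventory):
    print(p.vm, p.avg_throughput_iops)
```

The busy but aligned `vm-b` is excluded, and `vm-c` ranks ahead of `vm-a` because it carries more IO.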


In this example (Figure 23), the report shows that the Windows 2003_1_MsSQL VM is generating the most misaligned storage traffic and is the most likely to cause performance problems.

Figure 22) Misalignment is More Critical in a Virtualized Environment.

Figure 23) Misalignment Report.

The “alignment offset” column in Figure 23 shows each partition’s current alignment setting alongside the “vendor recommended alignment” value. This information identifies the correct block alignment. The report also identifies two other partition misalignment problems that should be corrected. OnCommand Balance identifies the affected disk partitions and also ranks them by impact. This lets you build a prioritized list of VMs to correct, based on the biggest payoff for the downtime that the conversion process will cause.
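Comparing the two columns is a modulo check, and the nearest correctly aligned offset at or above the current one can be computed by rounding up. This is a minimal sketch that assumes “aligned” means the partition offset is an exact multiple of the vendor-recommended value, in bytes. The function name is hypothetical, and the target that a correction tool actually uses may differ.

```python
def aligned_offset(current_offset: int, recommended_alignment: int) -> int:
    """Return the smallest offset >= current_offset that is a multiple of
    the recommended alignment (both values in bytes)."""
    if current_offset < 0 or recommended_alignment <= 0:
        raise ValueError("offset must be >= 0 and alignment must be > 0")
    remainder = current_offset % recommended_alignment
    if remainder == 0:
        return current_offset  # already aligned
    return current_offset + (recommended_alignment - remainder)


legacy = 63 * 512                    # 32,256 bytes, a sector-63 partition start
target = aligned_offset(legacy, 4096)
print(target, target // 512)         # 32768 64
```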

Tools such as the NetApp Virtual Storage Console (VSC) and mbralign, among others, can be used to make the corrections. Discussion of these tools is beyond the scope of this evaluation guide. You can read more about alignment in the NetApp technical report “Best Practices for File System Alignment in Virtual Environments.”

7 USE CASE #5: REVIEW SCORECARDS AND REPORTS

OnCommand Balance includes more than 30 detailed reports, generally split among application workloads, servers, and storage. Reports can be produced in a variety of formats. You can run them ad hoc, or you can use the tool’s scheduler to receive them in your inbox on a daily, weekly, or monthly basis. This use case examines a few of the reports available in OnCommand Balance.

SCORECARD REPORTS

The virtual machine scorecard provides performance and configuration summaries for all VMs in the infrastructure (see Figure 24). For configuration, the report shows the number of CPUs and the amount of memory available on the hosts and allocated to each VM. For performance, it shows storage response times, IOPS, and usage index, as well as key CPU and memory statistics and analytics.

The display can be grouped by host or cluster if desired and can be sorted by a number of analytics. Use it to track high guest CPU utilization, VM run/wait, excessive memory utilization, and even storage response time for each VM. Using the VM scorecard, you can let utilization dictate resource allocation rather than application owner or vendor recommendations.

Figure 24) Virtual Machine Scorecard.
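Grouping by host and sorting by a chosen analytic, as the scorecard does, can be sketched over plain records. The row layout (`vm`, `host`, `cpu_pct`) and the sample values are hypothetical and do not reflect the scorecard’s export format.

```python
from collections import defaultdict


def group_and_sort(rows, group_key="host", sort_key="cpu_pct"):
    """Group scorecard rows by `group_key` (e.g. host or cluster) and sort
    each group by `sort_key`, highest value first."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[group_key]].append(row)
    return {
        name: sorted(members, key=lambda r: r[sort_key], reverse=True)
        for name, members in groups.items()
    }


# Hypothetical rows for illustration only.
rows = [
    {"vm": "vm-1", "host": "esx-a", "cpu_pct": 35.0},
    {"vm": "vm-2", "host": "esx-a", "cpu_pct": 80.0},
    {"vm": "vm-3", "host": "esx-b", "cpu_pct": 10.0},
]
for host, members in group_and_sort(rows).items():
    print(host, [r["vm"] for r in members])
```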


The storage scorecard provides configuration and performance summaries for all storage objects in the infrastructure. For configuration, it shows the RAID type of each disk group or aggregate, the number of spindles in that RAID group, the number of attached LUNs, capacity information, and so on. For performance, it lets you track disk group utilization, response time, and IOPS as both average and maximum values. You can use this report to find over-utilized and under-utilized disk groups and to balance workloads across your storage infrastructure.

Figure 25) Storage Scorecard.
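Finding over-utilized and under-utilized disk groups from scorecard data reduces to comparing each group’s utilization against two thresholds. The 70% and 20% cut-offs, the record layout, and the aggregate names below are hypothetical, and appropriate thresholds depend on the environment.

```python
def classify_disk_groups(groups, high_pct=70.0, low_pct=20.0):
    """Split disk groups into over- and under-utilized lists.

    Over-utilized groups are sorted busiest first; under-utilized groups
    are sorted least busy first, making them the first candidates to
    receive workloads moved off the busy ones.
    """
    over = sorted((g for g in groups if g["util_pct"] >= high_pct),
                  key=lambda g: g["util_pct"], reverse=True)
    under = sorted((g for g in groups if g["util_pct"] <= low_pct),
                   key=lambda g: g["util_pct"])
    return [g["name"] for g in over], [g["name"] for g in under]


# Hypothetical scorecard rows (average utilization per disk group).
groups = [
    {"name": "aggr0", "util_pct": 85.0},
    {"name": "aggr1", "util_pct": 12.0},
    {"name": "aggr2", "util_pct": 45.0},
    {"name": "aggr3", "util_pct": 5.0},
]
print(classify_disk_groups(groups))  # (['aggr0'], ['aggr3', 'aggr1'])
```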

SERVER REPORTS

The OnCommand Balance server volume capacity utilization forecast report examines storage capacity volume by volume across the infrastructure. It reports the current capacity, the current utilization, and the growth in megabytes per day monitored by OnCommand Balance. From this information, the report shows the number of weeks until each volume reaches 80%, 90%, and 100% capacity utilization. This lets you identify where capacity will become an issue in the coming weeks and better plan activities such as scheduling application downtime, expanding VMDK files, and procuring and allocating storage. You can also identify where your environment has over-allocated storage and rebalance that allocation to avoid unnecessary storage purchases.

Figure 26) Server Volume Capacity Utilization Forecast.
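The weeks-to-threshold figures follow from straight-line growth: the headroom remaining before a threshold, divided by weekly growth. This sketch assumes that growth stays constant at the observed MB/day rate. The function name and sample figures are hypothetical, and the guide does not document the report’s own projection method.

```python
import math


def weeks_to_threshold(capacity_mb: float, used_mb: float,
                       growth_mb_per_day: float, threshold_pct: float) -> float:
    """Weeks until used capacity reaches `threshold_pct` of total capacity,
    assuming linear growth at `growth_mb_per_day`."""
    if capacity_mb <= 0:
        raise ValueError("capacity must be positive")
    target_mb = capacity_mb * threshold_pct / 100.0
    if used_mb >= target_mb:
        return 0.0            # threshold already reached
    if growth_mb_per_day <= 0:
        return math.inf       # no growth, so the threshold is never reached
    return (target_mb - used_mb) / (growth_mb_per_day * 7)


# Hypothetical volume: 100,000 MB capacity, 60,000 MB used, growing 500 MB/day.
for pct in (80, 90, 100):
    print(pct, round(weeks_to_threshold(100_000, 60_000, 500, pct), 1))
```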

PILOT REPORTS ON THE NETAPP COMMUNITY

The pilot reports tab has been improved to provide an easy way for sales personnel to add new pilot reports to OnCommand Balance (see Figure 27). Pilot reports are managed directly by the sales team and do not involve NetApp Global Support. Key features include:

• Existing pilot reports that were available in version 3.6 have been migrated to version 4.0

• New reports can be added

• Signed reports are available in a private group on the NetApp community: https://communities.netapp.com/groups/oncommand-insight-balance-access-approval

You can request access to these reports and add them to your instance of OnCommand Balance.

Figure 27) Pilot Reports.

8 NEXT STEPS

Thank you for evaluating OnCommand Balance. For additional information or to purchase OnCommand Balance, contact your local NetApp representative or authorized NetApp reseller.

If you would like to contact NetApp directly, call +1-877-263-8277 or request to speak with a sales representative by completing the form on this page: www.netapp.com/us/forms/sales-contact.html.

Join the conversation:

Provide your feedback, ideas, or questions on the OnCommand Insight customer community: https://communities.netapp.com/community/products_and_solutions/storage_management_software/oncommand-insight

Learn more at: www.netapp.com/oncommandbalance


NetApp provides no representations or warranties regarding the accuracy, reliability or serviceability of any information or recommendations provided in this publication, or with respect to any results that may be obtained by the use of the information or observance of any recommendations provided herein. The information in this document is distributed AS IS, and the use of this information or the implementation of any recommendations or techniques herein is a customer’s responsibility and depends on the customer’s ability to evaluate and integrate them into the customer’s operational environment. This document and the information contained herein may be used solely in connection with the NetApp products discussed in this document.

© Copyright 2012 NetApp, Inc. All rights reserved. No portions of this document may be reproduced without prior written consent of NetApp, Inc. Specifications are subject to change without notice. NetApp, the NetApp logo, Go further, faster, and OnCommand are trademarks or registered trademarks of NetApp, Inc. in the United States and/or other countries. All other brands or products are trademarks or registered trademarks of their respective holders and should be treated as such.