Cluster Active IQ NetApp May 01, 2020 This PDF was generated from https://docs.netapp.com/us-en/active-iq-1/concept_ug_cluster_dashboard.html on May 01, 2020. Always check docs.netapp.com for the latest.

Table of Contents

Cluster

Cluster Dashboard

Performance

Upgrade Advisor

AutoSupport

Health

Storage Efficiency

Workload

Cluster Viewer

Cluster

Cluster Dashboard

The new cluster dashboard is the central place to look for information about ONTAP clusters. The dashboard also consolidates health, capacity, storage efficiency, and performance insights.

There are two main ways to reach the cluster dashboard:

1. By searching for a cluster name.

2. By searching for a node within the cluster. By default, you land on the dashboard of the cluster that the node belongs to. From there, you have shortcuts to reach the individual nodes.

The figure below shows the functionalities and information available from the cluster dashboard.

Cluster Dashboard has the following components:

At the top of the dashboard, the following critical information about the cluster is summarized:

• High Impact Risks

• Upgrade Recommendations

• AutoSupport On Demand Status

• End of Support details

The cluster dashboard also has more detailed information in the following widgets:

Configuration – This widget lists all the nodes in the cluster and provides hostname, serial number, system ID, ONTAP version, and model of the nodes within the cluster. From the “View Configuration Details” button on top of this widget, you can view additional details about the cluster through the “Cluster Viewer”, which includes a visualization of how the cluster is cabled.

Capacity Forecasting – This widget on the cluster dashboard provides a simple view of whether any nodes within the cluster may be running out of capacity. If there are nodes that are over 90% capacity, or may reach that threshold within 6 months, you can select those nodes and reach out to NetApp to request capacity addition.
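As a rough illustration of the 90% and 6-month rule described above, the following Python sketch flags nodes that are already over 90% capacity or that would reach that threshold within six months. The linear-growth projection, the function name, and the sample figures are assumptions made for illustration; the forecasting method that Active IQ actually uses is not described in this guide.

```python
def needs_capacity(used_pct, growth_pct_per_month, threshold=90.0, horizon_months=6):
    """Return True if a node is over the threshold now, or is projected to
    reach it within the horizon under a linear-growth assumption (hypothetical)."""
    if used_pct > threshold:
        return True
    projected = used_pct + growth_pct_per_month * horizon_months
    return projected >= threshold


# Hypothetical nodes: (current used %, growth in percentage points per month)
nodes = {"node-a": (92.0, 0.5), "node-b": (80.0, 2.0), "node-c": (60.0, 1.0)}
flagged = [name for name, (used, growth) in nodes.items() if needs_capacity(used, growth)]
print(flagged)
```

Nodes returned by a check like this would be the candidates to select when reaching out to NetApp for additional capacity.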

Performance – Available for Internal Users Only – This new widget at the cluster level identifies issues with performance AutoSupport or other performance characteristics at the cluster level. It looks at the following critical areas:

• Truncation issues with Performance AutoSupport

• Nodes within the cluster with over 90% CPU utilization

• Nodes within the cluster with over 50% Disk utilization

• Unbalanced systems

The information icon on the top of the widget provides additional details about these critical attributes, and provides guidance on how you may be able to mitigate these critical conditions.
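The two numeric thresholds in the list above (over 90% CPU utilization and over 50% disk utilization) can be sketched as a simple per-node check. This is a minimal Python sketch: the dictionary layout and key names are hypothetical, and the truncation and balance checks are left out because their criteria are not stated here.

```python
CPU_LIMIT_PCT = 90.0   # threshold taken from the widget description
DISK_LIMIT_PCT = 50.0  # threshold taken from the widget description


def performance_flags(node):
    """Return the threshold checks that a node fails.

    `node` is a hypothetical dict with 'cpu_pct' and 'disk_pct' keys.
    """
    flags = []
    if node["cpu_pct"] > CPU_LIMIT_PCT:
        flags.append("cpu")
    if node["disk_pct"] > DISK_LIMIT_PCT:
        flags.append("disk")
    return flags


sample = {"name": "node-a", "cpu_pct": 93.5, "disk_pct": 41.0}
print(sample["name"], performance_flags(sample))
```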

Health Summary – This widget shows a snapshot of risks, best practice gaps, hardware end of support, and alerts for all the nodes within the cluster. You can click any of the numbers within the widget to drill down into the details of each of these components.

Storage Efficiency – This widget shows the cluster-level efficiency ratio and lists the efficiency ratio of individual nodes. To view efficiency details of individual nodes, you can click the arrow on the top right of the widget.

Software Upgrade Recommendations – This widget does a gap analysis of the different components, including ONTAP, drive firmware, system firmware, and shelf firmware. You can download all the details into a worksheet. You can also click the different components to upgrade them. The ONTAP upgrade recommendation provides the latest version of ONTAP that the cluster can be upgraded to, taking the platform checks into account.

Cases – The cases widget enables you to view the recent case details of the cluster. You can also download the details of the cases from the top of this widget.

Performance

From the System Fitness Dashboard, you can click the Performance icon to view the performance history of your system. These charts provide up to 60 days of historical performance data, which is useful for performance trend and pattern analysis. The hourly averages used to prepare these charts are reported in a daily performance AutoSupport data summary.

System interruptions, such as reboots and service disablements, can cause gaps in the chart. These performance charts are intended for trending analysis, and they should not be used for detailed performance monitoring or diagnostics. You should use onsite products such as the OnCommand suite of products for such use cases.

There are several viewable performance charts, including Peak Performance (Headroom), CPU and Disk Utilization, IOPS, Latency, and Throughput. You can select one or more of these charts for viewing. Charts are downloadable in PDF, SVG, and PNG formats. You can also export all the counter information into a CSV file from the menu.

The peak performance zone is the area that is equal to or below the peak performance line. In simple terms, it specifies the limit of good operating behavior for the given storage resource. When a resource’s utilization rises above this line, client latencies increase rapidly.

Headroom is the difference between the peak performance line and the current utilization line. Monitor the performance graphs periodically to identify the nodes that may run out of headroom. If the current resource utilization is above the peak performance line for an extended time, a performance remediation plan might be appropriate. A performance remediation plan might include setting QoS workload limits, moving volumes or LUNs to another storage controller, or expanding the storage cluster.
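The headroom definition above can be sketched in a few lines of Python. This is only an illustration: the function names and sample values are hypothetical, and what counts as "an extended time" above the peak line is left to the reader, so the sketch simply reports the longest consecutive run of samples above it.

```python
def headroom(peak_pct, current_pct):
    """Headroom is the peak performance line minus current utilization."""
    return peak_pct - current_pct


def longest_run_above_peak(peak_pct, hourly_utilization):
    """Length of the longest consecutive run of samples above the peak line."""
    longest = run = 0
    for value in hourly_utilization:
        run = run + 1 if value > peak_pct else 0
        longest = max(longest, run)
    return longest


peak = 70.0
samples = [55.0, 72.0, 75.0, 71.0, 60.0, 74.0]  # hypothetical hourly averages
print(headroom(peak, samples[-1]))  # a negative value means utilization is above the peak line
print(longest_run_above_peak(peak, samples))
```

A long run above the peak line, rather than a single spike, is the pattern that would suggest considering a remediation plan.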

The confidence factor is used to determine the accuracy of the peak performance line that is used in CPU and aggregate headroom graphs. The confidence factor counter indicates how well the range of utilizations and latencies was observed for a resource in the system. The higher the confidence factor, the more accurate the peak performance line will be. Confidence factors range from 1 (low) to 3 (high).

The cluster performance dashboard provides cluster-aggregated performance charts, and you can also view node-level graphs.

Response Time by Protocol and Concurrency graphs are not available for cDOT systems.

Upgrade Advisor

Upgrade Advisor offers a quick, automated, and accurate way to generate an ONTAP upgrade plan. From the System or Customer Dashboard, click the icon to open the following screen. By default, if you click this from the system level, all nodes of the cluster or the HA pair (for 7-Mode systems) are auto-populated.

In the next step, the recommended version of ONTAP is suggested. In some cases, users may prefer to stay at a higher or a lower version of ONTAP based on the needs of their installed base and standards.

Upgrade Advisor now detects SnapMirror relationships between clusters and notifies users of compatibility issues when users are planning to perform ONTAP upgrades.

Upgrade Advisor only supports checks for SnapMirror relationships between two clusters. Cascaded relationships and relationships among 3 or more clusters are not supported.

Upgrade Advisor checks for both DP and XDP relationships and warns users when it detects compatibility issues. Generally, source and destination volumes must be running compatible ONTAP versions after the upgrade is completed.

See the Compatible ONTAP Versions for SnapMirror relationships topic in the ONTAP documentation center for more details about the basis of compatibility checks in Upgrade Advisor.

AutoSupport

With AutoSupport you can view full AutoSupport details, including weekly AutoSupport logs. The left panel contains a menu that lists all the subsections of an AutoSupport message. The most commonly used AutoSupport sections appear at the top, and the rest of the sections are listed in alphabetical order. This is a good place to selectively view individual AutoSupport sections without going through the entire AutoSupport message.

By default, wherever available, the sysconfig -a section of the latest weekly AutoSupport message is displayed.

You can also download the complete AutoSupport message in either HTML or text format for viewing or troubleshooting.

Newly added functionalities also enable the following:

• Filtering of AutoSupports by type of AutoSupport (Management, Performance, Weekly, Other)

• Searching by section name

• Simple tabular viewing of XML sections - you can change column positions, save column preferences, and download the XML section in an Excel file for further use and analysis.

Health

The Health tab lists system risks, which identify configuration or other kinds of issues that may impair system performance, availability, and resilience. Each risk entry contains information about the specific risk, the potential negative impact, and links to mitigation plans for that risk. Addressing these risks proactively can improve your NetApp storage availability.

Impact Level Definitions:

• High – High potential of a system outage or data corruption; address immediately. Examples include HA Takeover Impossible and Shutdown Pending.

• Medium – May cause system downtime such as a panic. Address as soon as possible.

Case Probability analyzes risk data and technical support case data from the last two to three years. Using machine learning, it determines the likelihood that a technical support case will be opened for the system within 90 days of the risk being detected. This establishes how strongly the first discovery of a risk correlates with a case being opened.

The risk’s impact level and the risk-to-case confidence value are used to compute a “Case Probability” score. This score is used to rank the risks present on a system and determine which risk should be mitigated first.
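To make the ranking idea concrete, here is a small Python sketch that combines an impact level with a risk-to-case confidence value and sorts risks by the result. The weights, the multiplication, the 0-to-1 confidence range, and the risk names are all hypothetical; this guide does not document how Active IQ actually computes the score.

```python
# Hypothetical weights; the real weighting is not documented in this guide.
IMPACT_WEIGHT = {"High": 1.0, "Medium": 0.5}


def case_probability_score(impact, confidence):
    """Combine an impact level and a risk-to-case confidence (assumed 0..1)."""
    return IMPACT_WEIGHT[impact] * confidence


risks = [
    {"name": "risk-1", "impact": "Medium", "confidence": 0.9},
    {"name": "risk-2", "impact": "High", "confidence": 0.6},
    {"name": "risk-3", "impact": "High", "confidence": 0.2},
]

# Highest score first: these are the risks to mitigate first.
ranked = sorted(
    risks,
    key=lambda r: case_probability_score(r["impact"], r["confidence"]),
    reverse=True,
)
print([r["name"] for r in ranked])
```

The point of the sketch is only that a high-impact risk with weak case correlation can rank below a medium-impact risk that frequently leads to a support case.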

Security Vulnerability

The Security Vulnerability tab identifies systems with security risks. This tab contains information about the specific risk, the potential negative impact, and a link to the CVE bulletin.

Impact Level Definitions for Security Risks

The Impact level for Security Risks is based on the Common Vulnerability Scoring System (CVSS) and noted in the Impact section of the CVE bulletin. The CVSS provides an open framework for communicating the characteristics and impacts of IT vulnerabilities. Its quantitative model ensures repeatable accurate measurement while enabling users to see the underlying vulnerability characteristics that were used to generate the scores. Thus, CVSS is well suited as a standard measurement system for industries, organizations, and governments that need accurate and consistent vulnerability impact scores. For more information, please visit https://nvd.nist.gov/vuln-metrics/cvss
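For readers unfamiliar with CVSS, the sketch below maps a base score to the qualitative severity ratings defined by CVSS v3.x (None, Low, Medium, High, Critical). It is an illustration in Python only: the function name is hypothetical, and this guide does not say which CVSS version Active IQ uses or how it maps scores to its own impact levels.

```python
def cvss_v3_severity(score):
    """Map a CVSS v3.x base score (0.0 to 10.0) to its qualitative rating."""
    if not 0.0 <= score <= 10.0:
        raise ValueError("CVSS scores range from 0.0 to 10.0")
    if score == 0.0:
        return "None"
    if score <= 3.9:
        return "Low"
    if score <= 6.9:
        return "Medium"
    if score <= 8.9:
        return "High"
    return "Critical"


print(cvss_v3_severity(7.5))
```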

TIP: If you would like to receive a system risk report on a regular basis, click Schedule a Risk Report.

Proactive Remediation

The Proactive Remediation tab lists notifications of systemic quality risks that are of high impact and may impair system performance, availability, and resiliency. Each risk entry contains information about the specific risk, the systems impacted, and a link to remediation recommendations. Risk notifications are moved to the Acknowledged System Health tab after they are acknowledged. Addressing these risks proactively at the onset can improve your storage availability. Systemic quality risks that are of high impact and have a high rate of occurrence, resulting in potential node or cluster outage or data corruption, must be addressed immediately.

Best Practices

Best practices are available from the Health Summary tab in the left navigation pane and the Fitness quadrant of the Fitness Dashboard. Gaps in best practices are highlighted, and corrective actions are listed for mitigation. Best practices are available at both the system and aggregate levels (customer, site, and group), helping you to standardize your storage environment and enhance its operational efficiency.

TIP: Review Best Practices to check whether you have implemented Storage Efficiency Best Practices according to NetApp recommendations.

Health Trending

It is extremely important to mitigate risks in a timely manner to prevent critical issues. The Health Trending feature provides up to a 3-month view of System Risks, Best Practices, and End of Support so that as you mitigate these conditions, you can track the progress with weekly reports. These reports show you a summary of trends and enable you to drill down and analyze individual risks. Trending is available at both the single-system and customer levels. You can download these reports in PDF format.

System Risk Acknowledgement

Use the System Risk Acknowledgement feature to gain the greatest flexibility in managing how risks detected across your systems are displayed on your dashboard. This feature enables you to customize your risk dashboard so that it displays only the risks you deem to be most critical to your environment.

Acknowledging a risk is a way of flagging it in your dashboard. Setting your preferences to “Hide Acknowledged Risks” removes the flagged risks from your active default Health Summary view. All acknowledged risks are still viewable from the “Acknowledged System Health” tab.

Best Practice: Complete the “justification” field when you acknowledge a risk to document the rationale behind the acknowledgement.

If you are a NetApp Internal user acknowledging on behalf of a customer with their approval, please add the customer’s name in the “Approved By” field for future reference and trackability.

Risk Advisor

By using Risk Advisor, users can see how many risks can be mitigated just by doing an ONTAP upgrade. Only systems that can be upgraded to ONTAP 9.x will be shown.

Community Wisdom

Based on other systems with the same risk that were upgraded, Community Wisdom gives the likelihood of the risk being mitigated by upgrading ONTAP, along with a level of confidence. This is presented in the last two columns, “Risk present after upgrade” and “% of Risk resolved after ONTAP upgrade”.
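One plausible reading of the “% of Risk resolved after ONTAP upgrade” column is the share of upgraded peer systems on which the risk no longer appears. The Python sketch below computes that share; the function name, its inputs, and the sample counts are assumptions for illustration, not the documented Active IQ calculation.

```python
def percent_resolved(total_upgraded, still_present):
    """Share of upgraded peer systems on which the risk no longer appears."""
    if total_upgraded == 0:
        return None  # no peer data, so no estimate can be given
    return 100.0 * (total_upgraded - still_present) / total_upgraded


# Hypothetical peer data: 200 systems upgraded, risk still present on 30.
print(percent_resolved(200, 30))
```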

Benefits

• Better system availability by lowering risk profile

• Reduces planning time for upgrades – you know which systems will benefit from an upgrade from a single report

• Additional benefit of newer features in ONTAP 9

• Your risk mitigation improves the confidence level of our recommendations

Storage Efficiency

Drawing on diagnostic records from more than 300K devices across NetApp’s user base, Active IQ is constantly learning, giving you insights to unleash the full potential of your data. Storage Efficiency Advisor uses Community Wisdom of AutoSupport data from all NetApp customers and compares the efficiency number of your system against the latest All-Flash models from NetApp where all the best practices are followed.

This feature, available for all Active IQ users, is enabled at a single system level for FAS systems above ONTAP 9.1 and AFF systems above ONTAP 8.3.2. For AFF systems, it also shows the best practice gaps and suggests ways of getting improved efficiency ratios. It also provides a low-touch option for customers who wish to upgrade to the latest AFF models.

Workload

Workload Tagging enables users to tag volumes within Storage Virtual Machines (SVMs) in ONTAP systems (cluster mode only) with workload details. One or more volumes can be tagged to a specific workload by selecting a workload from the pre-defined dropdown list.

Once volumes are tagged, NetApp will make recommendations and best practices available that will help users to improve performance, efficiency, and availability of NetApp systems.

Workload Tagging can be accessed by clicking the icon in the left navigation of the ONTAP cluster.

In the Cluster Dashboard, a summary of the total number of volumes that are not tagged is shown.

You can tag volumes with the Workload, Application, Protocol, and Container. Workload is an enterprise workload, and Application is defined as a user application or product.

There are three different types of workload tags:

• ONTAP Tag is the tag obtained from ONTAP AutoSupport when a workload template in System Manager is used for provisioning.

• Auto Generated Tag is the tag applied by auto-detection mechanisms using machine learning. Active IQ can intelligently identify the type of workload running on the volume. Unidentified volumes are tagged as Other.

• User Tag is the tag provided by the user manually. Only user tags can be modified or untagged.

The Workload Tagging UI is built with rich features, including advanced filters. The Workload Tag table can be filtered by SVM, Volume Name, Tagged Workloads, Application, Protocol, and Container. This helps you identify volumes and workloads and choose multiple volumes to tag at once. You can search for a volume by using a pattern that matches part of the volume name. You can also download the entire workload tag list.

Workload and Application Efficiency and Capacity

Once the volumes are tagged, Active IQ provides Total Capacity and Efficiency for each workload and application. It also provides volume-level efficiency and capacity. You can filter the workloads in the efficiency dashboard based on tag type.

All the efficiency ratios provided exclude Snapshots and clones.

Comparison with Peer Ratio powered by Community Wisdom

The calculated Workload Efficiency Ratio is compared with the Peer / Guaranteed Ratio defined for each workload. The Peer Ratio is calculated from the average efficiency ratio of the workloads identified using Active IQ community wisdom. A Peer Ratio is defined for each ONTAP version, and the comparison uses the one that corresponds to the ONTAP version running on the cluster.
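The comparison described above can be sketched as a lookup keyed by workload and ONTAP version. In this Python sketch the peer-ratio table, the workload names, the version string, and the capacity figures are all hypothetical, and the efficiency ratio is assumed to be logical used space divided by physical used space.

```python
# Hypothetical peer ratios keyed by (workload, ONTAP version).
PEER_RATIO = {("Database", "9.6"): 2.5, ("VDI", "9.6"): 4.0}


def efficiency_ratio(logical_used_tb, physical_used_tb):
    """Logical data stored per unit of physical space consumed (assumed definition)."""
    return logical_used_tb / physical_used_tb


def compare_with_peers(workload, ontap_version, logical_used_tb, physical_used_tb):
    """Return (workload ratio, peer ratio, whether the workload meets the peer ratio)."""
    ratio = efficiency_ratio(logical_used_tb, physical_used_tb)
    peer = PEER_RATIO[(workload, ontap_version)]
    return ratio, peer, ratio >= peer


print(compare_with_peers("Database", "9.6", 30.0, 10.0))
```

Keying the table by ONTAP version mirrors the statement that the peer ratio used is the one for the version running on the cluster.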

Additional features that use Workload Tagging are planned, such as showing best practices and performance trends, and tighter integration with other NetApp products.

Cluster Viewer

From the Cluster and node dashboards, and from AutoSupport, you will now see a link to view configuration details, called Cluster Viewer. Cluster Viewer enables you to see detailed physical and logical configuration details. The details are presented in several easy-to-view tables across multiple tabs that include a summary of the configuration, stack diagram, the network interfaces, a summary of SVMs & aggregates, volume and LUN information, and a few visualizations. Visualization is the graphical view of how the system is cabled, showing connectivity between controllers and shelves. The details available from Cluster Viewer are downloadable in DOC, XLS, and PDF formats. Note that the graphical view download is currently separate from the download of all the tables.

Type of visualizations

Sample Cable Visualization

You can view the cable visualization to see details of how the cluster is cabled. You can zoom in or out; there are also options to select parts of the visualization. Additionally, you can export the visualization in SVG, which can then be edited in Visio.


Copyright Information

Copyright © 2020 NetApp, Inc. All rights reserved. Printed in the U.S. No part of this document covered by copyright may be reproduced in any form or by any means-graphic, electronic, or mechanical, including photocopying, recording, taping, or storage in an electronic retrieval system-without prior written permission of the copyright owner.

Software derived from copyrighted NetApp material is subject to the following license and disclaimer:

THIS SOFTWARE IS PROVIDED BY NETAPP “AS IS” AND WITHOUT ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE, WHICH ARE HEREBY DISCLAIMED. IN NO EVENT SHALL NETAPP BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.

NetApp reserves the right to change any products described herein at any time, and without notice. NetApp assumes no responsibility or liability arising from the use of products described herein, except as expressly agreed to in writing by NetApp. The use or purchase of this product does not convey a license under any patent rights, trademark rights, or any other intellectual property rights of NetApp.

The product described in this manual may be protected by one or more U.S. patents, foreign patents, or pending applications.

RESTRICTED RIGHTS LEGEND: Use, duplication, or disclosure by the government is subject to restrictions as set forth in subparagraph (c)(1)(ii) of the Rights in Technical Data and Computer Software clause at DFARS 252.277-7103 (October 1988) and FAR 52-227-19 (June 1987).

Trademark Information

NETAPP, the NETAPP logo, and the marks listed at http://www.netapp.com/TM are trademarks of NetApp, Inc. Other company and product names may be trademarks of their respective owners.