White Paper

© 2018 Cisco and/or its affiliates. All rights reserved. This document is Cisco Public Information.

Cisco HyperFlex Hyperconverged Infrastructure Solution for SAP HANA

Learn best practices for running SAP HANA on the Cisco HyperFlex™ hyperconverged infrastructure (HCI) solution.


Contents

Introduction
  Purpose of this document
Certification infrastructure
  Cisco HyperFlex HX240c M5 All Flash Node
  Solution summary
  Cisco HyperFlex HX Data Platform controller
  Data operations and distribution
  Cisco HyperFlex Connect HTML 5 management webpage
Cisco HyperFlex HCI solution for SAP HANA
  Virtual machine CPU configuration
  SAP HANA virtual machines
  Storage controller virtual machines
  SAP HANA virtual machine configuration for certification
Recommendations
  Requirements
  Physical components
  Processor and memory configuration
  Network configuration
  Disk sizing
  SAP HANA performance tuning
Cisco HyperFlex HX Data Platform high availability
  Cisco HyperFlex HX Data Platform cluster tolerated failures
  Cluster state and number of failed nodes
  Cluster state and number of nodes with failed disks
  Cluster access policy
  Responses to storage cluster node failures
Cisco HyperFlex cluster health check and monitoring with Cisco HyperFlex Connect
  Dashboard
  Monitor pages
  System Information
  Generating a support bundle using Cisco HyperFlex Connect
For more information


Introduction

This section provides a high-level overview of the certified hyperconverged infrastructure (HCI) solution for SAP HANA using the Cisco HyperFlex platform in production environments.

SAP landscapes frequently are deployed on virtualization platforms, most often using virtualized application servers. In recent years, SAP has been encouraging its customers to migrate to SAP's own database platform of the future, SAP HANA. SAP HANA databases can be deployed on virtual servers or on physical machines. With the launch of the Cisco HyperFlex™ system, Cisco offers a low-cost, easy-to-deploy, high-performance hyperconverged virtual server platform that is an excellent solution for SAP landscapes. Alongside the Cisco HyperFlex solution, customers can deploy other standalone servers within the same Cisco Unified Computing System™ (Cisco UCS®) domain, including certified SAP HANA appliances based on Cisco UCS C-Series Rack Servers. This combination is an excellent choice for production SAP landscapes.

The certified and supported Cisco HyperFlex solution can also be used to deploy SAP application servers along with fully virtualized SAP HANA servers, as described in separate white papers published by Cisco.

Purpose of this document

This document provides an overview of the Cisco HyperFlex system architecture, SAP HANA, and best practices for running SAP HANA on the Cisco HyperFlex HX240c M5 All Flash Node.

This document does not describe the design, installation, or configuration of the Cisco HyperFlex system. Those details are covered in various Cisco® Validated Design documents, which are listed in the references section at the end of this document. This document also does not include detailed instructions about the installation of the SAP software or the OS-level tuning required by SAP, although references to this information can also be found at the end of this document.

Certification infrastructure

This section provides an overview of the infrastructure used to certify this solution.

Cisco HyperFlex HX240c M5 All Flash Node

The system used to certify the solution consists of four Cisco HyperFlex HX240c M5 All Flash Nodes integrated into a single system by a pair of Cisco UCS 6300 Series Fabric Interconnects (Figure 1). The HX240c M5 All Flash Node is excellent for high-capacity clusters and provides high performance.

Figure 1. Cisco HX240c M5 All Flash Nodes


Solution summary

The Cisco HyperFlex system provides a fully contained virtual server platform with computing and memory resources, integrated networking connectivity, a distributed high-performance log-based file system for virtual machine storage, and hypervisor software for running the virtualized servers, all within a single Cisco UCS management domain (Figure 2).

Figure 2. Cisco HyperFlex system overview

Cisco HyperFlex HX Data Platform controller

A Cisco HyperFlex HX Data Platform controller resides on each node and implements the distributed file system. The controller runs as software in user space within a virtual machine and intercepts and handles all I/O from the guest virtual machines. The storage controller virtual machine (SCVM) uses the VMDirectPath I/O feature to provide PCI pass-through control of the physical server's SAS disk controller. This approach gives the controller virtual machine full control of the physical disk resources, using the solid-state disk (SSD) drives as a read-write caching layer and as a capacity layer for distributed storage. The controller integrates the data platform into the VMware vSphere cluster through the use of three preinstalled VMware ESXi vSphere Installation Bundles (VIBs) on each node:

IO Visor: This VIB provides a network file system (NFS) mount point so that the ESXi hypervisor can access the virtual disks that are attached to individual virtual machines. From the hypervisor's perspective, it is simply attached to a network file system. The IO Visor intercepts guest virtual machine I/O traffic and intelligently redirects it to the Cisco HyperFlex SCVMs.

VMware API for Array Integration (VAAI): This storage offload API allows vSphere to request advanced file system operations, such as snapshots and cloning. The controller implements these operations through manipulation of the file system metadata rather than actual data copying, providing rapid response and thus rapid deployment of new environments.

stHypervisorSvc: This VIB adds enhancements and features needed for Cisco HyperFlex data protection and virtual machine replication.


Data operations and distribution

The Cisco HyperFlex HX Data Platform controllers handle all read and write operation requests from the guest virtual machines to the virtual machine disks (VMDKs) stored in the distributed data stores in the cluster. The data platform distributes the data across multiple nodes of the cluster, and also across multiple capacity disks of each node, according to the replication-level policy selected during cluster setup. This approach helps prevent storage hotspots on specific nodes and on specific disks of the nodes, and thereby also helps prevent network hotspots and congestion that might result from accessing more data on some nodes than on others.

Cisco HyperFlex Connect HTML 5 management webpage

An all-new HTML 5-based web user interface is available for use as the primary management tool for Cisco HyperFlex systems (Figure 3). Through this centralized point of control for the cluster, administrators can create volumes, monitor data platform health, and manage resource use. Administrators can also use this data to predict when the cluster will need to be scaled. To use the Cisco HyperFlex Connect user interface, connect using a web browser to the Cisco HyperFlex cluster IP address: http://<hx controller cluster ip>.

Figure 3. Cisco HyperFlex Connect GUI


Cisco HyperFlex HCI solution for SAP HANA

This section summarizes the Cisco HyperFlex HCI solution for SAP HANA.

Virtual machine CPU configuration

The selection of the physical CPUs in the Cisco HyperFlex nodes is related to the virtual CPU (vCPU) configuration of the SAP application server virtual machines. In particular, you need to consider the number of physical cores per CPU socket when sizing the virtual machines and configuring the number of vCPUs they use.

SAP HANA virtual machines

Take particular care to configure all the vCPUs for the SAP HANA virtual machines so that they are contained in a single non-uniform memory access (NUMA) node. A basic sizing guideline is to configure an SAP HANA virtual machine with as many vCPUs as there are cores in a single physical CPU socket. For example, the reference server described here, with Intel® Xeon® processor Gold series CPUs, has 18 cores per socket. In this case, configuring the SAP HANA virtual machines with 18 vCPUs is recommended.

Configuring a virtual machine with no more vCPUs than a single socket has cores helps ensure that all of the virtual machine's vCPUs and memory are scheduled within one NUMA node. A worked sizing example is sketched below.
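To make the sizing rule concrete, the short Python sketch below turns a host layout into a per-virtual-machine recommendation. The 18 cores per socket and the 72 GB SCVM memory reservation come from this document; the two-socket layout and the 768 GB of host RAM are illustrative assumptions, not figures from the certified configuration.

# Illustrative sizing helper: keep one SAP HANA VM inside a single NUMA node.
# Host values in the example below are assumptions, not certified figures.

def hana_vm_sizing(cores_per_socket: int, sockets: int,
                   host_ram_gb: int, scvm_reservation_gb: int = 72):
    """Return (vcpus, max_vm_ram_gb) for one SAP HANA VM on one NUMA node."""
    # Guideline from this document: no more vCPUs than one physical socket has cores.
    vcpus = cores_per_socket
    # RAM per NUMA node, assuming memory is populated evenly across sockets.
    ram_per_numa_node = host_ram_gb / sockets
    # Keep headroom for the 72 GB SCVM reservation; charging it all to one
    # NUMA node is a deliberately conservative simplification.
    max_vm_ram_gb = ram_per_numa_node - scvm_reservation_gb
    return vcpus, max_vm_ram_gb

if __name__ == "__main__":
    # Example host: 2 sockets x 18 cores (Intel Xeon Gold), 768 GB RAM (assumed).
    vcpus, ram = hana_vm_sizing(cores_per_socket=18, sockets=2, host_ram_gb=768)
    print(f"Suggested SAP HANA VM: {vcpus} vCPUs, up to {ram:.0f} GB of RAM")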

Storage controller virtual machines

The SCVMs running on each Cisco HyperFlex node are each configured with a mandatory CPU reservation of 10,800 MHz (10.8 GHz). During normal operation of a Cisco HyperFlex system, the processor demands of the SCVMs are typically less than 50 percent of this value. The ESXi hosts will not perform any throttling of CPU use unless the system nears 100 percent overall CPU use. Only in those circumstances would the host set aside the 10.8 GHz of potential CPU performance, because this performance is otherwise guaranteed to the SCVMs. In most circumstances, the CPU time of the hosts is not taxed to the limit, and therefore the reservation and CPU use of the SCVMs should have little overall impact on the performance of the virtual machines.

SAP HANA virtual machine configuration for certification

The Cisco HyperFlex HCI solution for SAP HANA can run four SAP HANA virtual machines per four-node cluster, with one SAP HANA virtual machine per node.

The storage for the SAP HANA virtual machines was provisioned from the Cisco HyperFlex cluster storage pool, in which each Cisco HyperFlex node has 19 drives.

The data, log, and shared file systems were used as block devices in the SAP HANA virtual machines.

The SAP HANA virtual machines were installed and tested with SUSE Linux Enterprise Server (SLES) for SAP Applications 12 SP3 and SAP HANA 2.00.031.00.

Recommendations

To achieve better performance for SAP HANA with the Cisco HyperFlex HCI solution, follow the recommendations described here.

Requirements

To configure SAP HANA running on a Cisco HyperFlex 3.0 all-flash cluster, you need the following:

A functional and healthy running Cisco HyperFlex cluster

Cisco HyperFlex Release 3.0(1d) or later

Cisco UCS Firmware Release 3.2(2g) or later

VMware ESXi 6.5 Update 2 or later


VMware vCenter Server Appliance

Appropriate software and licensing from SAP for SAP HANA

Physical components

Table 1 lists the physical components required for the Cisco HyperFlex system.

Table 1. Cisco HyperFlex system components

Component                       Hardware required
Cisco Nexus® Family switches    2 Cisco Nexus 9336C-FX2 Switches
Fabric interconnects            2 Cisco UCS 6332 Fabric Interconnects
Servers                         4 Cisco HyperFlex HX240c M5SX All Flash Node rack servers

Processor and memory configuration

Note the following guidelines for the processor and memory configuration:

At present, only four SAP HANA virtual machines are allowed to run on a four-node Cisco HyperFlex cluster, with one SAP HANA virtual machine per node.

Memory for the SAP HANA virtual machines should be allocated evenly, and it must not be shared with non-SAP HANA virtual machines.

All Cisco HyperFlex converged storage nodes run an SCVM that requires its own reserved virtual RAM. On the HX240c All Flash Node server, the Cisco HyperFlex SCVM has a memory reservation of 72 GB.

For the best performance, the amount of RAM configured for a single SAP HANA virtual machine should not exceed the amount of RAM in a single NUMA node.

Configure SAP HANA virtual machines with a number of vCPUs that fits within a single NUMA node. Review the virtual machine's vmware.log file to help ensure that the configuration is correct; a minimal log check is sketched after this list.
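As a rough illustration of the vmware.log review mentioned above, the sketch below simply filters a virtual machine's vmware.log for NUMA-related lines. The log path is a placeholder, and the exact wording of the log messages varies by ESXi release, so treat the matches as a starting point for review rather than a definitive check.

# Minimal helper: print NUMA-related lines from a virtual machine's vmware.log.
# The path below is a placeholder; point it at the VM's directory on the datastore.
from pathlib import Path

LOG_PATH = Path("/vmfs/volumes/<datastore>/<vm-name>/vmware.log")  # placeholder

def numa_lines(log_path: Path):
    """Yield log lines that mention NUMA, for reviewing the VM's NUMA placement."""
    with log_path.open(errors="replace") as log:
        for line in log:
            if "numa" in line.lower():
                yield line.rstrip()

if __name__ == "__main__":
    for line in numa_lines(LOG_PATH):
        print(line)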

Network configuration

Two Cisco Nexus 9000 Series Switches are used for the uplink of the Cisco HyperFlex HCI solution to the customer network.

To meet the SAP HANA network requirements, the SAP HANA virtual machines should have sufficient VMNICs created and allocated for this purpose. The requirements are described in SAP's network requirements documentation for SAP HANA.

Disk sizing

Consult the SAP Quick Sizer to help determine the memory, storage, and processor requirements for the SAP HANA database to be used in your specific deployment.

SAP HANA performance tuning

After SAP HANA is installed, tune the parameters as shown in Table 2.

Table 2. Tuning parameters

Parameter                     Data file system    Log file system
max_parallel_io_requests      256                 Default
async_read_submit             On                  On
async_write_submit_blocks     All                 All
async_write_submit_active     Auto                On


For SAP HANA 2.0 installations, use either hdbsql or the SQL function in SAP HANA Studio or the cockpit, with the following SQL commands:

ALTER SYSTEM ALTER CONFIGURATION ('global.ini', 'SYSTEM') SET ('fileio', 'max_parallel_io_requests[Data]') = '256' WITH RECONFIGURE;

ALTER SYSTEM ALTER CONFIGURATION ('global.ini', 'SYSTEM') SET ('fileio', 'async_write_submit_active[Data]') = 'auto' WITH RECONFIGURE;

For more information, refer to SAP Note 2399079, Elimination of hdbparam in HANA 2.
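For teams that prefer to script the change rather than type the statements into hdbsql or SAP HANA Studio, the following Python sketch issues the same two commands through SAP's hdbcli driver. Using hdbcli at all, as well as the host name, port, and credentials shown, are assumptions for illustration; this document itself only calls out hdbsql, SAP HANA Studio, and the cockpit.

# Sketch: apply the fileio tuning parameters from Table 2 over SQL,
# using SAP's hdbcli driver (assumed installed, for example via "pip install hdbcli").
from hdbcli import dbapi

# Placeholder connection details; replace with your system's SQL port and credentials.
conn = dbapi.connect(address="hana-host", port=30013, user="SYSTEM", password="***")

STATEMENTS = [
    "ALTER SYSTEM ALTER CONFIGURATION ('global.ini', 'SYSTEM') "
    "SET ('fileio', 'max_parallel_io_requests[Data]') = '256' WITH RECONFIGURE",
    "ALTER SYSTEM ALTER CONFIGURATION ('global.ini', 'SYSTEM') "
    "SET ('fileio', 'async_write_submit_active[Data]') = 'auto' WITH RECONFIGURE",
]

cursor = conn.cursor()
try:
    for statement in STATEMENTS:
        cursor.execute(statement)  # each statement reconfigures the running system
finally:
    cursor.close()
    conn.close()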

Cisco HyperFlex HX Data Platform high availability

The HX Data Platform high-availability feature helps ensure that the storage cluster maintains at least two copies of all your data during normal operation with three or more fully functional nodes.

If nodes or disks in the storage cluster fail, the cluster's ability to function is affected. If more than one node fails, or if one node and one or more disks on a different node fail, the situation is called a simultaneous failure.

The number of nodes in the storage cluster, in combination with the data replication factor and access policy settings, determines the state of the storage cluster that results from node failures.

Before using the HX Data Platform high-availability feature, enable VMware Distributed Resource Scheduler (DRS) and vMotion in the vSphere Web Client.

Cisco HyperFlex HX Data Platform cluster tolerated failures

If nodes or disks in the storage cluster fail, the cluster's ability to function is affected. If more than one node fails, or if one node and one or more disks on a different node fail, the situation is called a simultaneous failure.

How the number of node failures affects the storage cluster depends on the following:

Number of nodes in the cluster: The response by the storage cluster is different for clusters with three or four nodes than for clusters with five or more nodes.

Data replication factor: This factor is set during HX Data Platform installation and cannot be changed. The options are two or three redundant replicas of your data across the storage cluster.

Access policy: This policy can be changed from the default setting after the storage cluster is created. The options are Strict, for protecting against data loss, and Lenient, to support longer storage cluster availability.

Cluster state and number of failed nodes

Tables 3 and 4 show how the storage cluster function changes depending on the number of simultaneous node failures.

Table 3. Cluster with five or more nodes: cluster state depending on the number of failed nodes

Replication factor   Access policy   Number of failed nodes resulting in:
                                     Read-write   Read only   Shutdown
2                    Lenient         1            –           2
2                    Strict          –            1           2

Table 4. Cluster with three or four nodes: cluster state depending on the number of failed nodes


Replication factor   Access policy   Number of failed nodes resulting in:
                                     Read-write   Read only   Shutdown
2                    Lenient         1            –           2
2                    Strict          –            1           2

Cluster state and number of nodes with failed disks

Table 5 shows how the storage cluster function changes with the number of nodes that have one or more failed disks. Note that the node itself has not failed, but disks within the node have failed. For example, 2 indicates that two nodes each have at least one failed disk.

When the table refers to multiple disk failures, it is referring to the disks used for storage capacity. For example, if a cache SSD fails on one node and a capacity SSD fails on another node, the storage cluster remains highly available, even with an access policy setting of Strict.

Table 5 lists the worst-case scenario for the listed number of failed disks. It applies to any storage cluster of three or more nodes.

Note: Cisco HyperFlex storage clusters can sustain serial disk failures (separate disk failures over time). The only requirement is that sufficient storage capacity must be available to support self-healing. The worst-case scenarios listed in Table 5 apply only during the small window in which the Cisco HyperFlex cluster is completing the automatic self-healing and rebalancing process.

Table 5. Cluster with three or more nodes: cluster state depending on the number of nodes with failed disks

Replication factor   Access policy   Failed disks on this number of different nodes:
                                     Read-write   Read only   Shutdown
2                    Lenient         1            –           2
2                    Strict          –            1           2

Cluster access policy

The cluster access policy works with the data replication factor to set levels of data protection and data loss prevention. Two cluster access policy options are available; the default setting is Lenient. This setting is not configurable during installation, but it can be changed after installation and initial storage cluster configuration.

Strict: This option applies policies to protect against data loss. If nodes or disks in the storage cluster fail, the cluster's ability to function is affected. If more than one node fails, or if one node and one or more disks on a different node fail, the situation is called a simultaneous failure. The Strict setting helps protect the data in the event of simultaneous failures.

Lenient: This option applies policies to support longer storage cluster availability. This setting is the default.

Responses to storage cluster node failures

The storage cluster healing timeout is the length of time that the HX Data Platform plug-in waits before automatically healing the storage cluster. If a disk fails, the healing timeout is 1 minute. If a node fails, the healing timeout is 2 hours. The node failure timeout takes priority if a disk and a node fail at the same time, or if a disk fails after a node failure but before the healing process is finished.

When the cluster resiliency status is Warning, the HX Data Platform supports the storage cluster failures and responses listed in Table 6.

Optionally, click the associated Cluster Status or Resiliency Status icons in the HX Data Platform plug-in to display reason messages that explain what is contributing to the current state.


Table 6. Storage cluster failures and responses

3 nodes, 1 simultaneous failure, 1 node failed: The storage cluster does not automatically heal. Replace the failed node to restore storage cluster health.

3 nodes, 2 simultaneous failures, 2 or more disks on 2 nodes blacklisted or failed: If one SSD fails or is removed, the disk is blacklisted immediately, and the storage cluster automatically begins healing within 1 minute. If more than one SSD fails, the system may not automatically restore storage cluster health; if it is not restored, replace the faulty disks and restore the system by rebalancing the cluster.

4 nodes, 1 simultaneous failure, 1 node failed: If the node does not recover in 2 hours, the storage cluster starts healing by rebalancing data on the remaining nodes. To recover the failed node immediately and fully restore the storage cluster, check that the node is powered on and restart it if possible (you may need to replace the node), and then rebalance the cluster.

4 nodes, 2 simultaneous failures, 2 or more disks on 2 nodes failed: If 2 SSDs fail, the storage cluster does not automatically heal. If the disks do not recover in 1 minute, the storage cluster starts healing by rebalancing data on the remaining nodes.

5 or more nodes, 2 simultaneous failures, up to 2 nodes failed: If a node does not recover in 2 hours, the storage cluster starts healing by rebalancing data on the remaining nodes. To recover the failed node immediately and fully restore the storage cluster, check that the node is powered on and restart it if possible (you may need to replace the node), and then rebalance the cluster. If the storage cluster shuts down, see the "Two Nodes Fail Simultaneously Causes the Storage Cluster to Shutdown" troubleshooting section.

5 or more nodes, 2 simultaneous failures, 2 nodes with 2 or more disk failures each: The system automatically triggers a rebalance after 1 minute to restore storage cluster health.

5 or more nodes, 2 simultaneous failures, 1 node and 1 or more disks on a different node failed: If the disk does not recover in 1 minute, the storage cluster starts healing by rebalancing data on the remaining nodes. If the node does not recover in 2 hours, the storage cluster starts healing by rebalancing data on the remaining nodes. In other words, if a node in the storage cluster fails and a disk on a different node also fails, the storage cluster starts healing the failed disk (without touching the data on the failed node) within 1 minute, and if the failed node does not come back up after 2 hours, the storage cluster starts healing the failed node as well. To recover the failed node immediately and fully restore the storage cluster, check that the node is powered on and restart it if possible (you may need to replace the node), and then rebalance the cluster.

Cisco HyperFlex cluster health check and monitoring with Cisco HyperFlex Connect

The Cisco HyperFlex Connect user interface provides a view of the Cisco HyperFlex storage cluster status, components, and features such as encryption and replication.

The main monitoring pages provide information about the local Cisco HyperFlex storage cluster:

Dashboard: The dashboard shows the overall Cisco HyperFlex storage cluster status.

Alarms, Events, and Activity: See the Cisco HyperFlex Systems Troubleshooting Reference Guide for details.


Performance: See charts showing I/O operations per second (IOPS), throughput, latency, and replication network bandwidth.

System Information: Get a system overview, plus view status information and tasks for nodes and disks. See the Cisco HyperFlex Systems Troubleshooting Reference Guide for information about generating support bundles, Storage Cluster Maintenance Operations Overview for information about entering and exiting maintenance mode, and Setting a Beacon for information about setting a node or disk beacon.

Datastores: View status information and tasks related to data stores.

Virtual Machines: View status information and tasks related to protection of virtual machines.

Additional Cisco HyperFlex Connect pages provide management access:

Encryption: Use this page for data-at-rest disk and node encryption tasks. Refer to the section "Cisco HyperFlex Connect HTML 5 management webpage" for the screenshot.

Replication: Use this page for disaster-recovery virtual machine protection tasks. Refer to the section "Cisco HyperFlex Connect HTML 5 management webpage" for the screenshot.

Upgrade: This page provides access to HX Data Platform and Cisco UCS Manager firmware upgrade tasks.

Dashboard

The dashboard shows several elements (Figure 4):

Cluster operational status, overall cluster health, and the cluster's current node failure tolerance

Cluster storage capacity, used and free space, compression and deduplication savings, and overall cluster storage optimization statistics

Cluster size and individual node health

Cluster IOPS, storage throughput, and latency for the past 1 hour

Figure 4. Dashboard view


Monitor pages

Cisco HyperFlex Connect provides additional monitoring capabilities, including the following:

Alarms: Cluster alarms can be viewed, acknowledged, and reset.

Events: The cluster event log can be viewed, specific events can be filtered for, and the log can be exported.

Activity: Recent job activity can be viewed, and its status can be monitored.

System Information

The System Information page (Figure 5) provides a complete health snapshot of the cluster. This page displays Cisco HyperFlex storage cluster system-related information, including node and disk data. It also provides access to Cisco HyperFlex maintenance mode.

Figure 5. System Information page

Generating a support bundle using Cisco HyperFlex Connect

You can use the Cisco HyperFlex Connect user interface to generate a support bundle that collects the logs from every controller virtual machine and ESXi host in the local Cisco HyperFlex storage cluster. If you are using replication to protect your virtual machines and their data, then when you need to generate a support bundle, you also need to generate one from the remote Cisco HyperFlex storage cluster. The vCenter logs are not collected through Cisco HyperFlex Connect.

Follow this procedure:

1. Log in to Cisco HyperFlex Connect.

2. In the banner, click Edit Settings (the gear icon) > Support Bundle. Alternatively, click System Information > Support Bundle.

3. When the support bundle is displayed, click supportBundle.zip to download the file to your local computer. Downloading the support bundle can take 1 to 2 hours.

You can download an existing support bundle in the same way.


For more information

The following documents, available online, provide additional relevant information. If you do not have access to a document, please contact your Cisco representative.

Cisco HyperFlex reference documents:

Cisco HyperFlex HXAF240c M5 (All Flash) spec sheet

Cisco HyperFlex Systems Installation Guide for VMware ESXi, Release 3.0

Cisco HyperFlex Data Platform Administration Guide, Release 3.0

Cisco HyperFlex Systems Troubleshooting Reference Guide 3.0

Cisco HyperFlex technical support documentation

VMware reference documents:

Performance Best Practices for VMware vSphere 6.0

SAP Solutions on VMware Best Practices Guide

SAP HANA on VMware vSphere Wiki

Printed in USA C11-741524-00 10/18


Figure 3 Cisco HyperFlex Connect GUI

White Paper

copy 2018 Cisco andor its affiliates All rights reserved This document is Cisco Public Information Page 6 of 13

Cisco HyperFlex HCI solution for SAP HANA This section summarizes the Cisco HyperFlex HCI solution for SAP HANA

Virtual machine CPU configuration

The selection of the physical CPUs in the Cisco HyperFlex nodes is related to the virtual CPU (vCPU) configuration of the SAP application server virtual machines In particular you need to consider the number of physical cores per CPU socket when sizing the virtual machines and configuring the number of vCPUs they use

SAP HANA virtual machines

Take particular care to configure all the vCPUs for the SAP HANA virtual machines so that they are contained in a single non-uniform memory access (NUMA) node A basic sizing guideline is to configure an SAP HANA virtual machine with as many vCPUs as there are cores in a single physical CPU socket For example the reference server described here with Intelreg Xeonreg processor Gold series CPUs has 18 cores per socket In this case configuring the SAP HANA virtual machines with 18 vCPUs is recommended

Configuring a virtual machine with no more vCPUs than a single socket has cores helps ensure that all the virtual machine vCPUs and memory is scheduled within one NUMA node

Storage controller virtual machines

The SCVMs running on each Cisco HyperFlex node are each configured with a mandatory CPU reservation of 10800 MHz (108 GHz) During normal operation of a Cisco HyperFlex system the processor demands of the SCVMs are typically less than 50 percent of this value The ESXi hosts will not perform any throttling of CPU use unless the system nears 100 percent overall CPU use Only in those circumstances would the host set aside the 108 GHz of potential CPU performance because this performance is otherwise guaranteed to the SCVMs In most circumstances the CPU time of the hosts is not being taxed to the limit and therefore the reservation and CPU use of the SCVMs should have little overall impact on the performance of the virtual machines

SAP HANA virtual machine configuration for certification

The Cisco HyperFlex HCI solution for SAP HANA can run four SAP HANA virtual machines per four-node cluster and one SAP HANA virtual machine per node

The storage for the SAP HANA virtual machines was from the Cisco HyperFlex cluster storage pool where each Cisco HyperFlex node has 19 drives

The data log and shared file systems were used as block devices in the SAP HANA virtual machine

The SAP HANA virtual machines were installed and tested with SUSE Linux Enterprise Server (SLES) for SAP Applications 12 SP3 with SAP HANA 20003100

Recommendations To achieve better performance for SAP HANA with the Cisco HyperFlex HCI solution the recommendations described here are proposed

Requirements

To configure SAP HANA running on a Cisco HyperFlex 30 all-flash cluster you need

A functional and healthy running Cisco HyperFlex cluster

Cisco HyperFlex Release 301d or later

Cisco UCS Firmware Release 32(2g) or later

VMware ESXi 65 Update 2 or later

White Paper

copy 2018 Cisco andor its affiliates All rights reserved This document is Cisco Public Information Page 7 of 13

VMware vCenter Server Appliance

Appropriate software and licensing from SAP for SAP HANA

Physical components

Table 1 lists the physical components required for the Cisco HyperFlex system

Table 1 Cisco HyperFlex system components

Component Hardware required

Cisco Nexusreg Family switches 2 Cisco Nexus 9336C-FX2 Switches

Fabric interconnects 2 Cisco UCS 6332 Fabric Interconnects

Servers 4 Cisco HyperFlex HX240c M5SX All-Flash Node rack servers

Processor and memory configuration

Note the following guidelines for the processor and memory configuration

At present only four SAP HANA virtual machines are allowed to run on a four-node Cisco HyperFlex cluster with one SAP HANA virtual machine per node

Memory for the SAP HANA virtual machines should be allocated evenly and it must not be shared with nonndashSAP HANA virtual machines

All Cisco HyperFlex converged storage nodes run an SCVM that requires its own reserved virtual RAM On the HX240c All Flash Node server the Cisco HyperFlex SCVM has a memory reservation of 72 GB

For the best performance the amount of RAM configured for a single SAP HANA virtual machine should not exceed the amount of RAM in a single NUMA node

Configure SAP HANA virtual machines with a number of vCPUs that fit within a single NUMA node Review the virtual machinersquos vmwarelog file to help ensure that the configuration is correct

Network configuration

Two Cisco Nexus 9000 Series Switches are used for the uplink of the Cisco HyperFlex HCI solution to the customer network

According to the SAP HANA network requirements the SAP HANA virtual machines should have sufficient VMNICs created and allocated for this purpose The requirements can be found here

Disk sizing

Consult the SAP Quick Sizer to help determine the memory storage and processor requirements for the SAP HANA database to be used in your specific deployment

SAP HANA performance tuning

After SAP HANA is installed tune the parameters as shown in Table 2

Table 2 Tuning parameters

Parameter Data file system Log file system

max_parallel_io_requests 256 Default

async_read_submit On On

async_write_submit_blocks All All

async_write_submit_active Auto On

White Paper

copy 2018 Cisco andor its affiliates All rights reserved This document is Cisco Public Information Page 8 of 13

For SAP HANA 20 installations use either hdbsql or the SQL function in SAP HANA Studio or cockpit and the following SQL commands

ALTER SYSTEM ALTER CONFIGURATION (globalini SYSTEM) SET (fileio fileiomax_parallel_io_requests[Data]) = 256 WITH RECONFIGURE

ALTER SYSTEM ALTER CONFIGURATION (globalini SYSTEM) SET (fileio fileio async_write_submit_active [Data]) = Auto WITH RECONFIGURE

For more information refer to SAP Note 2399079 Elimination of hdbparam in HANA 2

Cisco HyperFlex HX Data Platform high availability The HX Data Platform high-availability feature helps ensure that the storage cluster maintains at least two copies of all your data during normal operation with three or more fully functional nodes

If nodes or disks in the storage cluster fail the clusters ability to function is affected If more than one node fails or one node and one or more disks on a different node fail the situation is called a simultaneous failure

The number of nodes in the storage cluster in combination with the data replication factor and access policy settings determines the state of the storage cluster that results from node failures

Before using the HX Data Platform high-availability feature enable VMware Distributed Resource Scheduler (DRS) and vMotion on the vSphere Web Client

Cisco HyperFlex HX Data Platform cluster tolerated failures

If nodes or disks in the storage cluster fail the clusters ability to function is affected If more than one node fails or one node and one or more disks on a different node fail the situation is called a simultaneous failure

How the number of node failures affects the storage cluster depends on the following

Number of nodes in the cluster The response by the storage cluster is different for clusters with three or four nodes than for clusters with five or more nodes

Data replication factor This factor is set during HX Data Platform installation and cannot be changed The options are either two or three redundant replicas of your data across the storage cluster

Access policy This policy can be changed from the default setting after the storage cluster is created The options are Strict for protecting against data loss and Lenient to support longer storage cluster availability

Cluster state and number of failed nodes

Tables 3 and 4 show how the storage cluster function changes depending on the number of simultaneous node failures

Table 3 Cluster with five or more nodes Cluster state depending on the number of failed nodes

Replication factor Access policy Number of failed nodes

Read-write Read only Shutdown

2 Lenient 1 ndash 2

2 Strict ndash 1 2

Table 4 Cluster with three or four nodes Cluster state depending on the number of failed nodes

White Paper

copy 2018 Cisco andor its affiliates All rights reserved This document is Cisco Public Information Page 9 of 13

Replication factor Access policy Number of failed nodes

Read-write Read only Shutdown

2 Lenient 1 ndash 2

2 Strict ndash 1 2

Cluster state and number of nodes with failed disks

Table 5 shows how the storage cluster function changes with the number of nodes that have one or more failed disks Note that the node itself has not failed but disks within the node have failed For example 2 indicates that there are two nodes that each have at least one failed disk

When the table refers to multiple disk failures it is referring to the disks used for storage capacity For example if a cache SSD fails on one node and a capacity SSD fails on another node the storage cluster remains highly available even with an access policy setting of Strict

Table 5 lists the worst case scenario for the listed number of failed disks It applies to any storage cluster of three or more nodes

Note Cisco HyperFlex storage clusters can sustain serial disk failures (separate disk failures over time) The only requirement is that sufficient storage capacity must be available to support self-healing The worst-case scenarios listed in Table 5 apply only during the small window in which the Cisco HyperFlex cluster is completing the automatic self-healing and rebalancing process

Table 5 Cluster with three or more nodes Cluster state depending on the number of nodes with failed disks

Replication factor Access policy Failed disks on number of different nodes

Read-write Read only Shutdown

2 Lenient 1 ndash 2

2 Strict ndash 1 2

Cluster access policy

The cluster access policy works with the data replication factor to set levels of data protection and data loss prevention Two cluster access policy options are available The default setting is Lenient This setting is not configurable during installation but it can be changed after installation and initial storage cluster configuration

Strict This option applies policies to protect against data loss If nodes or disks in the storage cluster fail the clusters ability to function is affected If more than one node fails or one node and one or more disks on a different node fail this situation is called a simultaneous failure The Strict setting helps protect the data in the event of simultaneous failures

Lenient This option applies policies to support longer storage cluster availability This setting is the default

Responses to storage cluster node failures

The storage cluster healing timeout is the length of time that the HX Data Platform plug-in waits before automatically healing the storage cluster If a disk fails the healing timeout is 1 minute If a node fails the healing timeout is 2 hours A node failure timeout takes priority if a disk and a node fail at same time or if a disk fails after a node failure but before the healing process is finished

When the cluster resiliency status is Warning the HX Data Platform supports the storage cluster failures and responses listed in Table 6

Optionally click the associated Cluster Status or Resiliency Status icons in the HX Data Platform plug-in to display reason messages that explain what is contributing to the current state

White Paper

copy 2018 Cisco andor its affiliates All rights reserved This document is Cisco Public Information Page 10 of 13

Table 6 Storage cluster failures and responses

Cluster size Number of simultaneous failures

Entity that failed Maintenance action to take

3 nodes 1 1 node The storage cluster does not automatically heal Replace the failed node to restore storage cluster health

3 nodes 2 2 or more disks on 2 nodes blacklisted or failed

If one SSD fails or is removed the disk is blacklisted immediately The storage cluster automatically begins healing within 1 minute

If more than one SSD fails the system may not automatically restore storage cluster health If the system is not restored replace the faulty disks and restore the system by rebalancing the cluster

4 nodes 1 1 node If the node does not recover in 2 hours the storage cluster starts healing by rebalancing data on the remaining nodes

To recover the failed node immediately and fully restore the storage cluster Check that the node is powered on and restart it if possible You

may need to replace the node Rebalance the cluster

4 nodes 2 2 or more disks on 2 nodes If 2 SSDs fail the storage cluster does not automatically heal

If the disk does not recover in 1 minute the storage cluster starts healing by rebalancing data on the remaining nodes

5 or more nodes 2 Up to 2 nodes If the node does not recover in 2 hours the storage cluster starts healing by rebalancing data on the remaining nodes

To recover the failed node immediately and fully restore the storage cluster Check that the node is powered on and restart it if possible You

may need to replace the node Rebalance the cluster

If the storage cluster shuts down see Troubleshooting Two Nodes Fail Simultaneously Causes the Storage Cluster to Shutdown section

5 or more nodes 2 2 nodes with 2 or more disk failures on each node

The system automatically triggers a rebalance after 1 minute to restore storage cluster health

5 or more nodes 2 1 node and 1 or more disks on a different node

If the disk does not recover in 1 minute the storage cluster starts healing by rebalancing data on the remaining nodes

If the node does not recover in 2 hours the storage cluster starts healing by rebalancing data on the remaining nodes

If a node in the storage cluster fails and a disk on a different node also fails the storage cluster starts healing the failed disk (without touching the data on the failed node) in 1 minute If the failed node does not come back up after 2 hours the storage cluster starts healing the failed node as well

To recover the failed node immediately and fully restore the storage cluster Check that the node is powered on and restart it if possible You

may need to replace the node Rebalance the cluster

Cisco HyperFlex cluster health check and monitoring with Cisco HyperFlex Connect The Cisco HyperFlex Connect user interface provides a view of the Cisco HyperFlex storage cluster status components and features such as encryption and replication

The main monitoring pages provide information about the local Cisco HyperFlex storage cluster

Dashboard The dashboard shows the overall Cisco HyperFlex storage cluster status

Alarms Events and Activity See the Cisco HyperFlex Systems Troubleshooting Reference Guide for details

White Paper

copy 2018 Cisco andor its affiliates All rights reserved This document is Cisco Public Information Page 11 of 13

Performance See charts showing IO operations per second (IOPS) throughput latency and replication network bandwidth

System Information Get a system overview plus view status information and tasks for nodes and disks See the Cisco HyperFlex Systems Troubleshooting Reference Guide for information about generating support bundles see Storage Cluster Maintenance Operations Overview for information about entering and exiting maintenance mode and see Setting a Beacon for information about setting a node or disk beacon

Datastores View status information and tasks related to data stores

Virtual Machines View status information and tasks related to protection of virtual machines

Additional Cisco HyperFlex Connect pages provide management access

Encryption Use this page for data at rest disk and node encryption tasks Refer the section Cisco HyperFlex Connect HTML 5 management webpage for the screenshot

Replication Use this page for disaster recovery virtual machine protection tasks Refer the section Cisco HyperFlex Connect HTML 5 management webpage for the screenshot

Upgrade The Upgrade page provides access to HX Data Platform and Cisco UCS Manager firmware upgrade tasks

Dashboard

The dashboard shows several elements (Figure 4)

Cluster operational status overall cluster health and the clusterrsquos current node failure tolerance

Cluster storage capacity used and free space compression and deduplication savings and overall cluster storage optimization statistics

Cluster size and individual node health

Cluster IOPS storage throughput and latency for the past 1 hour

Figure 4 Dashboard view

White Paper

copy 2018 Cisco andor its affiliates All rights reserved This document is Cisco Public Information Page 12 of 13

Monitor pages

Cisco HyperFlex Connect provides for additional monitoring capabilities including

Alarms Cluster alarms can be viewed acknowledged and reset

Events The cluster event log can be viewed specific events can be filtered for and the log can be exported

Activity Recent job activity can be viewed and the status can be monitored

System Information

The System Information page (Figure 5) provides a complete health snapshot of the cluster This page displays Cisco HyperFlex storage cluster system-related information including node and disk data It also provides access to Cisco HyperFlex maintenance mode

Figure 5 System Information page

Generating a support bundle using Cisco HyperFlex Connect

You can use the Cisco HyperFlex Connect user interface to generate a support bundle that collects the logs from every controller virtual machine and ESXi host in the local Cisco HyperFlex storage cluster If you are using replication to protect your virtual machines and their data when you need to generate a support bundle you also need to generate a support bundle from the remote Cisco HyperFlex storage cluster The vCenter logs are not collected through Cisco HyperFlex Connect

Follow this procedure

1 Log in to Cisco HyperFlex Connect

2 In the banner click Edit Settings (the gear icon) gt Support Bundle Alternatively click System Information gt Support Bundle

3 When the support bundle is displayed click supportBundlezip to download the file to your local computer Downloading the support bundle can take 1 to 2 hours

You download an existing support bundle in the same way

White Paper

copy 2018 Cisco andor its affiliates All rights reserved This document is Cisco Public Information Page 13 of 13

For more information The following documents available online provide additional relevant information If you do not have access to a document please contact your Cisco representative

Cisco HyperFlex reference documents

Cisco HyperFlex HXAF240c M5 (All Flash) spec sheet

Cisco HyperFlex Systems Installation Guide for VMware ESXi Release 30

Cisco HyperFlex Data Platform Administration Guide Release 30

Cisco HyperFlex Systems Troubleshooting Reference Guide 30

Cisco HyperFlex technical support documentation

VMware reference documents

Performance Best Practices for VMware vSphere 60

SAP Solutions on VMware Best Practices Guide

SAP HANA on VMware vSphere Wiki

Printed in USA C11-741524-00 1018

Page 4: Cisco HyperFlex Hyperconverged Infrastructure Solution for ... · virtualized SAP HANA servers, as described in separate white papers published by Cisco. Purpose of this document

White Paper

copy 2018 Cisco andor its affiliates All rights reserved This document is Cisco Public Information Page 4 of 13

Solution summary

The Cisco HyperFlex system provides a fully contained virtual server platform with computing and memory resources integrated networking connectivity a distributed high-performance log-based file system for virtual machine storage and hypervisor software for running the virtualized servers all within a single Cisco UCS management domain (Figure 2)

Figure 2 Cisco HyperFlex system overview

Cisco HyperFlex HX Data Platform controller

A Cisco HyperFlex HX Data Platform controller resides on each node and implements the distributed file system The controller runs as software in user space within a virtual machine and intercepts and handles all IO from the guest virtual machines The storage controller virtual machine (SCVM) uses the VMDirectPath IO feature to provide PCI pass-through control of the physical serverrsquos SAS disk controller This approach gives the controller virtual machine full control of the physical disk resources using the solid-state disk (SSD) drives as a read-write caching layer and as a capacity layer for distributed storage The controller integrates the data platform into the VMware vSphere cluster through the use of three preinstalled VMware ESXi vSphere Installation Bundles (VIBs) on each node

IO Visor This VIB provides a network file system (NFS) mount point so that the ESXi hypervisor can access the virtual disks that are attached to individual virtual machines From the hypervisorrsquos perspective it is simply attached to a network file system The IO Visor intercepts guest virtual machine IO traffic and intelligently redirects it to the Cisco HyperFlex SCVMs

VMware API for Array Integration (VAAI) This storage offload API allows vSphere to request advanced file system operations such as snapshots and cloning The controller implements these operations through manipulation of the file system metadata rather than actual data copying providing rapid response and thus rapid deployment of new environments

stHypervisorSvc This VIB adds enhancements and features needed for Cisco HyperFlex data protection and virtual machine replication

White Paper

copy 2018 Cisco andor its affiliates All rights reserved This document is Cisco Public Information Page 5 of 13

Data operations and distribution

The Cisco HyperFlex HX Data Platform controllers handle all read and write operation requests from the guest virtual machines to the virtual machine disks (VMDKs) stored in the distributed data stores in the cluster The data platform distributes the data across multiple nodes of the cluster and also across multiple capacity disks of each node according to the replication-level policy selected during the cluster setup This approach helps prevent storage hotspots on specific nodes and on specific disks of the nodes and thereby also helps prevent network hotspots and congestion that might result from accessing more data on some nodes than on others

Cisco HyperFlex Connect HTML 5 management webpage

An all-new HTML 5ndashbased web user interface is available for use as the primary management tool for Cisco HyperFlex systems (Figure 3) Through this centralized point of control for the cluster administrators can create volumes monitor data platform health and manage resource use Administrators can also use this data to predict when the cluster will need to be scaled To use the Cisco HyperFlex Connect user interface connect using a web browser to the Cisco HyperFlex cluster IP address httplthx controller cluster ipgt

Figure 3 Cisco HyperFlex Connect GUI

White Paper

copy 2018 Cisco andor its affiliates All rights reserved This document is Cisco Public Information Page 6 of 13

Cisco HyperFlex HCI solution for SAP HANA This section summarizes the Cisco HyperFlex HCI solution for SAP HANA

Virtual machine CPU configuration

The selection of the physical CPUs in the Cisco HyperFlex nodes is related to the virtual CPU (vCPU) configuration of the SAP application server virtual machines In particular you need to consider the number of physical cores per CPU socket when sizing the virtual machines and configuring the number of vCPUs they use

SAP HANA virtual machines

Take particular care to configure all the vCPUs for the SAP HANA virtual machines so that they are contained in a single non-uniform memory access (NUMA) node A basic sizing guideline is to configure an SAP HANA virtual machine with as many vCPUs as there are cores in a single physical CPU socket For example the reference server described here with Intelreg Xeonreg processor Gold series CPUs has 18 cores per socket In this case configuring the SAP HANA virtual machines with 18 vCPUs is recommended

Configuring a virtual machine with no more vCPUs than a single socket has cores helps ensure that all the virtual machine vCPUs and memory is scheduled within one NUMA node

Storage controller virtual machines

The SCVMs running on each Cisco HyperFlex node are each configured with a mandatory CPU reservation of 10800 MHz (108 GHz) During normal operation of a Cisco HyperFlex system the processor demands of the SCVMs are typically less than 50 percent of this value The ESXi hosts will not perform any throttling of CPU use unless the system nears 100 percent overall CPU use Only in those circumstances would the host set aside the 108 GHz of potential CPU performance because this performance is otherwise guaranteed to the SCVMs In most circumstances the CPU time of the hosts is not being taxed to the limit and therefore the reservation and CPU use of the SCVMs should have little overall impact on the performance of the virtual machines

SAP HANA virtual machine configuration for certification

The Cisco HyperFlex HCI solution for SAP HANA can run four SAP HANA virtual machines per four-node cluster and one SAP HANA virtual machine per node

The storage for the SAP HANA virtual machines was from the Cisco HyperFlex cluster storage pool where each Cisco HyperFlex node has 19 drives

The data log and shared file systems were used as block devices in the SAP HANA virtual machine

The SAP HANA virtual machines were installed and tested with SUSE Linux Enterprise Server (SLES) for SAP Applications 12 SP3 with SAP HANA 20003100

Recommendations To achieve better performance for SAP HANA with the Cisco HyperFlex HCI solution the recommendations described here are proposed

Requirements

To configure SAP HANA running on a Cisco HyperFlex 30 all-flash cluster you need

A functional and healthy running Cisco HyperFlex cluster

Cisco HyperFlex Release 301d or later

Cisco UCS Firmware Release 32(2g) or later

VMware ESXi 65 Update 2 or later

White Paper

copy 2018 Cisco andor its affiliates All rights reserved This document is Cisco Public Information Page 7 of 13

VMware vCenter Server Appliance

Appropriate software and licensing from SAP for SAP HANA

Physical components

Table 1 lists the physical components required for the Cisco HyperFlex system

Table 1 Cisco HyperFlex system components

Component Hardware required

Cisco Nexusreg Family switches 2 Cisco Nexus 9336C-FX2 Switches

Fabric interconnects 2 Cisco UCS 6332 Fabric Interconnects

Servers 4 Cisco HyperFlex HX240c M5SX All-Flash Node rack servers

Processor and memory configuration

Note the following guidelines for the processor and memory configuration

At present only four SAP HANA virtual machines are allowed to run on a four-node Cisco HyperFlex cluster with one SAP HANA virtual machine per node

Memory for the SAP HANA virtual machines should be allocated evenly and it must not be shared with nonndashSAP HANA virtual machines

All Cisco HyperFlex converged storage nodes run an SCVM that requires its own reserved virtual RAM On the HX240c All Flash Node server the Cisco HyperFlex SCVM has a memory reservation of 72 GB

For the best performance the amount of RAM configured for a single SAP HANA virtual machine should not exceed the amount of RAM in a single NUMA node

Configure SAP HANA virtual machines with a number of vCPUs that fit within a single NUMA node Review the virtual machinersquos vmwarelog file to help ensure that the configuration is correct

Network configuration

Two Cisco Nexus 9000 Series Switches are used for the uplink of the Cisco HyperFlex HCI solution to the customer network

According to the SAP HANA network requirements the SAP HANA virtual machines should have sufficient VMNICs created and allocated for this purpose The requirements can be found here

Disk sizing

Consult the SAP Quick Sizer to help determine the memory storage and processor requirements for the SAP HANA database to be used in your specific deployment

SAP HANA performance tuning

After SAP HANA is installed tune the parameters as shown in Table 2

Table 2 Tuning parameters

Parameter Data file system Log file system

max_parallel_io_requests 256 Default

async_read_submit On On

async_write_submit_blocks All All

async_write_submit_active Auto On

White Paper

copy 2018 Cisco andor its affiliates All rights reserved This document is Cisco Public Information Page 8 of 13

For SAP HANA 20 installations use either hdbsql or the SQL function in SAP HANA Studio or cockpit and the following SQL commands

ALTER SYSTEM ALTER CONFIGURATION (globalini SYSTEM) SET (fileio fileiomax_parallel_io_requests[Data]) = 256 WITH RECONFIGURE

ALTER SYSTEM ALTER CONFIGURATION (globalini SYSTEM) SET (fileio fileio async_write_submit_active [Data]) = Auto WITH RECONFIGURE

For more information refer to SAP Note 2399079 Elimination of hdbparam in HANA 2

Cisco HyperFlex HX Data Platform high availability The HX Data Platform high-availability feature helps ensure that the storage cluster maintains at least two copies of all your data during normal operation with three or more fully functional nodes

If nodes or disks in the storage cluster fail the clusters ability to function is affected If more than one node fails or one node and one or more disks on a different node fail the situation is called a simultaneous failure

The number of nodes in the storage cluster in combination with the data replication factor and access policy settings determines the state of the storage cluster that results from node failures

Before using the HX Data Platform high-availability feature enable VMware Distributed Resource Scheduler (DRS) and vMotion on the vSphere Web Client

Cisco HyperFlex HX Data Platform cluster tolerated failures

If nodes or disks in the storage cluster fail the clusters ability to function is affected If more than one node fails or one node and one or more disks on a different node fail the situation is called a simultaneous failure

How the number of node failures affects the storage cluster depends on the following

Number of nodes in the cluster The response by the storage cluster is different for clusters with three or four nodes than for clusters with five or more nodes

Data replication factor This factor is set during HX Data Platform installation and cannot be changed The options are either two or three redundant replicas of your data across the storage cluster

Access policy This policy can be changed from the default setting after the storage cluster is created The options are Strict for protecting against data loss and Lenient to support longer storage cluster availability

Cluster state and number of failed nodes

Tables 3 and 4 show how the storage cluster function changes depending on the number of simultaneous node failures

Table 3 Cluster with five or more nodes Cluster state depending on the number of failed nodes

Replication factor Access policy Number of failed nodes

Read-write Read only Shutdown

2 Lenient 1 ndash 2

2 Strict ndash 1 2

Table 4 Cluster with three or four nodes Cluster state depending on the number of failed nodes

White Paper

copy 2018 Cisco andor its affiliates All rights reserved This document is Cisco Public Information Page 9 of 13

Replication factor Access policy Number of failed nodes

Read-write Read only Shutdown

2 Lenient 1 ndash 2

2 Strict ndash 1 2

Cluster state and number of nodes with failed disks

Table 5 shows how the storage cluster function changes with the number of nodes that have one or more failed disks Note that the node itself has not failed but disks within the node have failed For example 2 indicates that there are two nodes that each have at least one failed disk

When the table refers to multiple disk failures it is referring to the disks used for storage capacity For example if a cache SSD fails on one node and a capacity SSD fails on another node the storage cluster remains highly available even with an access policy setting of Strict

Table 5 lists the worst case scenario for the listed number of failed disks It applies to any storage cluster of three or more nodes

Note Cisco HyperFlex storage clusters can sustain serial disk failures (separate disk failures over time) The only requirement is that sufficient storage capacity must be available to support self-healing The worst-case scenarios listed in Table 5 apply only during the small window in which the Cisco HyperFlex cluster is completing the automatic self-healing and rebalancing process

Table 5 Cluster with three or more nodes Cluster state depending on the number of nodes with failed disks

Replication factor Access policy Failed disks on number of different nodes

Read-write Read only Shutdown

2 Lenient 1 ndash 2

2 Strict ndash 1 2

Cluster access policy

The cluster access policy works with the data replication factor to set levels of data protection and data loss prevention Two cluster access policy options are available The default setting is Lenient This setting is not configurable during installation but it can be changed after installation and initial storage cluster configuration

Strict This option applies policies to protect against data loss If nodes or disks in the storage cluster fail the clusters ability to function is affected If more than one node fails or one node and one or more disks on a different node fail this situation is called a simultaneous failure The Strict setting helps protect the data in the event of simultaneous failures

Lenient This option applies policies to support longer storage cluster availability This setting is the default

Responses to storage cluster node failures

The storage cluster healing timeout is the length of time that the HX Data Platform plug-in waits before automatically healing the storage cluster If a disk fails the healing timeout is 1 minute If a node fails the healing timeout is 2 hours A node failure timeout takes priority if a disk and a node fail at same time or if a disk fails after a node failure but before the healing process is finished

When the cluster resiliency status is Warning the HX Data Platform supports the storage cluster failures and responses listed in Table 6

Optionally click the associated Cluster Status or Resiliency Status icons in the HX Data Platform plug-in to display reason messages that explain what is contributing to the current state

White Paper

copy 2018 Cisco andor its affiliates All rights reserved This document is Cisco Public Information Page 10 of 13

Table 6 Storage cluster failures and responses

Cluster size Number of simultaneous failures

Entity that failed Maintenance action to take

3 nodes 1 1 node The storage cluster does not automatically heal Replace the failed node to restore storage cluster health

3 nodes 2 2 or more disks on 2 nodes blacklisted or failed

If one SSD fails or is removed the disk is blacklisted immediately The storage cluster automatically begins healing within 1 minute

If more than one SSD fails the system may not automatically restore storage cluster health If the system is not restored replace the faulty disks and restore the system by rebalancing the cluster

4 nodes 1 1 node If the node does not recover in 2 hours the storage cluster starts healing by rebalancing data on the remaining nodes

To recover the failed node immediately and fully restore the storage cluster Check that the node is powered on and restart it if possible You

may need to replace the node Rebalance the cluster

4 nodes 2 2 or more disks on 2 nodes If 2 SSDs fail the storage cluster does not automatically heal

If the disk does not recover in 1 minute the storage cluster starts healing by rebalancing data on the remaining nodes

5 or more nodes 2 Up to 2 nodes If the node does not recover in 2 hours the storage cluster starts healing by rebalancing data on the remaining nodes

To recover the failed node immediately and fully restore the storage cluster Check that the node is powered on and restart it if possible You

may need to replace the node Rebalance the cluster

If the storage cluster shuts down see Troubleshooting Two Nodes Fail Simultaneously Causes the Storage Cluster to Shutdown section

5 or more nodes 2 2 nodes with 2 or more disk failures on each node

The system automatically triggers a rebalance after 1 minute to restore storage cluster health

5 or more nodes 2 1 node and 1 or more disks on a different node

If the disk does not recover in 1 minute the storage cluster starts healing by rebalancing data on the remaining nodes

If the node does not recover in 2 hours the storage cluster starts healing by rebalancing data on the remaining nodes

If a node in the storage cluster fails and a disk on a different node also fails the storage cluster starts healing the failed disk (without touching the data on the failed node) in 1 minute If the failed node does not come back up after 2 hours the storage cluster starts healing the failed node as well

To recover the failed node immediately and fully restore the storage cluster Check that the node is powered on and restart it if possible You

may need to replace the node Rebalance the cluster

Cisco HyperFlex cluster health check and monitoring with Cisco HyperFlex Connect The Cisco HyperFlex Connect user interface provides a view of the Cisco HyperFlex storage cluster status components and features such as encryption and replication

The main monitoring pages provide information about the local Cisco HyperFlex storage cluster

Dashboard The dashboard shows the overall Cisco HyperFlex storage cluster status

Alarms Events and Activity See the Cisco HyperFlex Systems Troubleshooting Reference Guide for details

White Paper

copy 2018 Cisco andor its affiliates All rights reserved This document is Cisco Public Information Page 11 of 13

Performance See charts showing IO operations per second (IOPS) throughput latency and replication network bandwidth

System Information Get a system overview plus view status information and tasks for nodes and disks See the Cisco HyperFlex Systems Troubleshooting Reference Guide for information about generating support bundles see Storage Cluster Maintenance Operations Overview for information about entering and exiting maintenance mode and see Setting a Beacon for information about setting a node or disk beacon

Datastores View status information and tasks related to data stores

Virtual Machines View status information and tasks related to protection of virtual machines

Additional Cisco HyperFlex Connect pages provide management access

Encryption Use this page for data at rest disk and node encryption tasks Refer the section Cisco HyperFlex Connect HTML 5 management webpage for the screenshot

Replication Use this page for disaster recovery virtual machine protection tasks Refer the section Cisco HyperFlex Connect HTML 5 management webpage for the screenshot

Upgrade The Upgrade page provides access to HX Data Platform and Cisco UCS Manager firmware upgrade tasks

Dashboard

The dashboard shows several elements (Figure 4)

Cluster operational status overall cluster health and the clusterrsquos current node failure tolerance

Cluster storage capacity used and free space compression and deduplication savings and overall cluster storage optimization statistics

Cluster size and individual node health

Cluster IOPS storage throughput and latency for the past 1 hour

Figure 4 Dashboard view

White Paper

copy 2018 Cisco andor its affiliates All rights reserved This document is Cisco Public Information Page 12 of 13

Monitor pages

Cisco HyperFlex Connect provides for additional monitoring capabilities including

Alarms Cluster alarms can be viewed acknowledged and reset

Events The cluster event log can be viewed specific events can be filtered for and the log can be exported

Activity Recent job activity can be viewed and the status can be monitored

System Information

The System Information page (Figure 5) provides a complete health snapshot of the cluster This page displays Cisco HyperFlex storage cluster system-related information including node and disk data It also provides access to Cisco HyperFlex maintenance mode

Figure 5 System Information page

Generating a support bundle using Cisco HyperFlex Connect

You can use the Cisco HyperFlex Connect user interface to generate a support bundle that collects the logs from every controller virtual machine and ESXi host in the local Cisco HyperFlex storage cluster If you are using replication to protect your virtual machines and their data when you need to generate a support bundle you also need to generate a support bundle from the remote Cisco HyperFlex storage cluster The vCenter logs are not collected through Cisco HyperFlex Connect

Follow this procedure

1 Log in to Cisco HyperFlex Connect

2 In the banner click Edit Settings (the gear icon) gt Support Bundle Alternatively click System Information gt Support Bundle

3 When the support bundle is displayed click supportBundlezip to download the file to your local computer Downloading the support bundle can take 1 to 2 hours

You download an existing support bundle in the same way

White Paper

copy 2018 Cisco andor its affiliates All rights reserved This document is Cisco Public Information Page 13 of 13

For more information The following documents available online provide additional relevant information If you do not have access to a document please contact your Cisco representative

Cisco HyperFlex reference documents

Cisco HyperFlex HXAF240c M5 (All Flash) spec sheet

Cisco HyperFlex Systems Installation Guide for VMware ESXi Release 30

Cisco HyperFlex Data Platform Administration Guide Release 30

Cisco HyperFlex Systems Troubleshooting Reference Guide 30

Cisco HyperFlex technical support documentation

VMware reference documents

Performance Best Practices for VMware vSphere 60

SAP Solutions on VMware Best Practices Guide

SAP HANA on VMware vSphere Wiki

Printed in USA C11-741524-00 1018

Page 5: Cisco HyperFlex Hyperconverged Infrastructure Solution for ... · virtualized SAP HANA servers, as described in separate white papers published by Cisco. Purpose of this document

White Paper

copy 2018 Cisco andor its affiliates All rights reserved This document is Cisco Public Information Page 5 of 13

Data operations and distribution

The Cisco HyperFlex HX Data Platform controllers handle all read and write operation requests from the guest virtual machines to the virtual machine disks (VMDKs) stored in the distributed data stores in the cluster The data platform distributes the data across multiple nodes of the cluster and also across multiple capacity disks of each node according to the replication-level policy selected during the cluster setup This approach helps prevent storage hotspots on specific nodes and on specific disks of the nodes and thereby also helps prevent network hotspots and congestion that might result from accessing more data on some nodes than on others

Cisco HyperFlex Connect HTML 5 management webpage

An all-new HTML 5ndashbased web user interface is available for use as the primary management tool for Cisco HyperFlex systems (Figure 3) Through this centralized point of control for the cluster administrators can create volumes monitor data platform health and manage resource use Administrators can also use this data to predict when the cluster will need to be scaled To use the Cisco HyperFlex Connect user interface connect using a web browser to the Cisco HyperFlex cluster IP address httplthx controller cluster ipgt

Figure 3 Cisco HyperFlex Connect GUI

White Paper

copy 2018 Cisco andor its affiliates All rights reserved This document is Cisco Public Information Page 6 of 13

Cisco HyperFlex HCI solution for SAP HANA This section summarizes the Cisco HyperFlex HCI solution for SAP HANA

Virtual machine CPU configuration

The selection of the physical CPUs in the Cisco HyperFlex nodes is related to the virtual CPU (vCPU) configuration of the SAP application server virtual machines In particular you need to consider the number of physical cores per CPU socket when sizing the virtual machines and configuring the number of vCPUs they use

SAP HANA virtual machines

Take particular care to configure all the vCPUs for the SAP HANA virtual machines so that they are contained in a single non-uniform memory access (NUMA) node A basic sizing guideline is to configure an SAP HANA virtual machine with as many vCPUs as there are cores in a single physical CPU socket For example the reference server described here with Intelreg Xeonreg processor Gold series CPUs has 18 cores per socket In this case configuring the SAP HANA virtual machines with 18 vCPUs is recommended

Configuring a virtual machine with no more vCPUs than a single socket has cores helps ensure that all the virtual machine vCPUs and memory is scheduled within one NUMA node

Storage controller virtual machines

The SCVMs running on each Cisco HyperFlex node are each configured with a mandatory CPU reservation of 10800 MHz (108 GHz) During normal operation of a Cisco HyperFlex system the processor demands of the SCVMs are typically less than 50 percent of this value The ESXi hosts will not perform any throttling of CPU use unless the system nears 100 percent overall CPU use Only in those circumstances would the host set aside the 108 GHz of potential CPU performance because this performance is otherwise guaranteed to the SCVMs In most circumstances the CPU time of the hosts is not being taxed to the limit and therefore the reservation and CPU use of the SCVMs should have little overall impact on the performance of the virtual machines

SAP HANA virtual machine configuration for certification

The Cisco HyperFlex HCI solution for SAP HANA can run four SAP HANA virtual machines per four-node cluster and one SAP HANA virtual machine per node

The storage for the SAP HANA virtual machines was from the Cisco HyperFlex cluster storage pool where each Cisco HyperFlex node has 19 drives

The data log and shared file systems were used as block devices in the SAP HANA virtual machine

The SAP HANA virtual machines were installed and tested with SUSE Linux Enterprise Server (SLES) for SAP Applications 12 SP3 with SAP HANA 20003100

Recommendations To achieve better performance for SAP HANA with the Cisco HyperFlex HCI solution the recommendations described here are proposed

Requirements

To configure SAP HANA running on a Cisco HyperFlex 30 all-flash cluster you need

A functional and healthy running Cisco HyperFlex cluster

Cisco HyperFlex Release 301d or later

Cisco UCS Firmware Release 32(2g) or later

VMware ESXi 65 Update 2 or later

White Paper

copy 2018 Cisco andor its affiliates All rights reserved This document is Cisco Public Information Page 7 of 13

VMware vCenter Server Appliance

Appropriate software and licensing from SAP for SAP HANA

Physical components

Table 1 lists the physical components required for the Cisco HyperFlex system

Table 1 Cisco HyperFlex system components

Component Hardware required

Cisco Nexusreg Family switches 2 Cisco Nexus 9336C-FX2 Switches

Fabric interconnects 2 Cisco UCS 6332 Fabric Interconnects

Servers 4 Cisco HyperFlex HX240c M5SX All-Flash Node rack servers

Processor and memory configuration

Note the following guidelines for the processor and memory configuration

At present only four SAP HANA virtual machines are allowed to run on a four-node Cisco HyperFlex cluster with one SAP HANA virtual machine per node

Memory for the SAP HANA virtual machines should be allocated evenly and it must not be shared with nonndashSAP HANA virtual machines

All Cisco HyperFlex converged storage nodes run an SCVM that requires its own reserved virtual RAM On the HX240c All Flash Node server the Cisco HyperFlex SCVM has a memory reservation of 72 GB

For the best performance the amount of RAM configured for a single SAP HANA virtual machine should not exceed the amount of RAM in a single NUMA node

Configure SAP HANA virtual machines with a number of vCPUs that fit within a single NUMA node Review the virtual machinersquos vmwarelog file to help ensure that the configuration is correct

Network configuration

Two Cisco Nexus 9000 Series Switches are used for the uplink of the Cisco HyperFlex HCI solution to the customer network

According to the SAP HANA network requirements the SAP HANA virtual machines should have sufficient VMNICs created and allocated for this purpose The requirements can be found here

Disk sizing

Consult the SAP Quick Sizer to help determine the memory storage and processor requirements for the SAP HANA database to be used in your specific deployment

SAP HANA performance tuning

After SAP HANA is installed tune the parameters as shown in Table 2

Table 2 Tuning parameters

Parameter Data file system Log file system

max_parallel_io_requests 256 Default

async_read_submit On On

async_write_submit_blocks All All

async_write_submit_active Auto On

White Paper

copy 2018 Cisco andor its affiliates All rights reserved This document is Cisco Public Information Page 8 of 13

For SAP HANA 20 installations use either hdbsql or the SQL function in SAP HANA Studio or cockpit and the following SQL commands

ALTER SYSTEM ALTER CONFIGURATION (globalini SYSTEM) SET (fileio fileiomax_parallel_io_requests[Data]) = 256 WITH RECONFIGURE

ALTER SYSTEM ALTER CONFIGURATION (globalini SYSTEM) SET (fileio fileio async_write_submit_active [Data]) = Auto WITH RECONFIGURE

For more information refer to SAP Note 2399079 Elimination of hdbparam in HANA 2

Cisco HyperFlex HX Data Platform high availability The HX Data Platform high-availability feature helps ensure that the storage cluster maintains at least two copies of all your data during normal operation with three or more fully functional nodes

If nodes or disks in the storage cluster fail the clusters ability to function is affected If more than one node fails or one node and one or more disks on a different node fail the situation is called a simultaneous failure

The number of nodes in the storage cluster in combination with the data replication factor and access policy settings determines the state of the storage cluster that results from node failures

Before using the HX Data Platform high-availability feature enable VMware Distributed Resource Scheduler (DRS) and vMotion on the vSphere Web Client

Cisco HyperFlex HX Data Platform cluster tolerated failures

If nodes or disks in the storage cluster fail the clusters ability to function is affected If more than one node fails or one node and one or more disks on a different node fail the situation is called a simultaneous failure

How the number of node failures affects the storage cluster depends on the following

Number of nodes in the cluster The response by the storage cluster is different for clusters with three or four nodes than for clusters with five or more nodes

Data replication factor This factor is set during HX Data Platform installation and cannot be changed The options are either two or three redundant replicas of your data across the storage cluster

Access policy This policy can be changed from the default setting after the storage cluster is created The options are Strict for protecting against data loss and Lenient to support longer storage cluster availability

Cluster state and number of failed nodes

Tables 3 and 4 show how the storage cluster function changes depending on the number of simultaneous node failures

Table 3 Cluster with five or more nodes Cluster state depending on the number of failed nodes

Replication factor Access policy Number of failed nodes

Read-write Read only Shutdown

2 Lenient 1 ndash 2

2 Strict ndash 1 2

Table 4 Cluster with three or four nodes Cluster state depending on the number of failed nodes

White Paper

copy 2018 Cisco andor its affiliates All rights reserved This document is Cisco Public Information Page 9 of 13

Replication factor Access policy Number of failed nodes

Read-write Read only Shutdown

2 Lenient 1 ndash 2

2 Strict ndash 1 2

Cluster state and number of nodes with failed disks

Table 5 shows how the storage cluster function changes with the number of nodes that have one or more failed disks Note that the node itself has not failed but disks within the node have failed For example 2 indicates that there are two nodes that each have at least one failed disk

When the table refers to multiple disk failures it is referring to the disks used for storage capacity For example if a cache SSD fails on one node and a capacity SSD fails on another node the storage cluster remains highly available even with an access policy setting of Strict

Table 5 lists the worst case scenario for the listed number of failed disks It applies to any storage cluster of three or more nodes

Note Cisco HyperFlex storage clusters can sustain serial disk failures (separate disk failures over time) The only requirement is that sufficient storage capacity must be available to support self-healing The worst-case scenarios listed in Table 5 apply only during the small window in which the Cisco HyperFlex cluster is completing the automatic self-healing and rebalancing process

Table 5. Cluster with three or more nodes: cluster state depending on the number of nodes with failed disks. The last three columns show the number of nodes with failed disks at which the cluster is read-write, read-only, or shut down; a dash means the cluster does not enter that state for this configuration.

Replication factor | Access policy | Read-write | Read only | Shutdown
2                  | Lenient       | 1          | –         | 2
2                  | Strict        | –          | 1         | 2
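
The note above ties self-healing to available capacity. Purely as a back-of-the-envelope check (a simplified model that ignores compression, deduplication, and operational headroom, and is not a Cisco sizing rule), the following sketch tests whether the surviving nodes could still hold the configured number of replicas after one node is lost.

# Simplified, assumption-laden capacity check: can N-1 nodes still hold RF
# copies of the logical data after one node fails? All figures are examples.
def can_self_heal(node_raw_tb: float, num_nodes: int,
                  logical_used_tb: float, replication_factor: int) -> bool:
    surviving_raw = node_raw_tb * (num_nodes - 1)     # raw capacity left after the failure
    required_raw = logical_used_tb * replication_factor
    return surviving_raw >= required_raw

# Example: 4 nodes with ~14 TB raw each, replication factor 3, 12 TB of logical data.
print(can_self_heal(node_raw_tb=14, num_nodes=4,
                    logical_used_tb=12, replication_factor=3))    # True (42 TB >= 36 TB)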

Cluster access policy

The cluster access policy works with the data replication factor to set levels of data protection and data loss prevention. Two cluster access policy options are available. The default setting is Lenient; it is not configurable during installation, but it can be changed after installation and initial storage cluster configuration.

Strict: This option applies policies that protect against data loss. If more than one node fails, or if one node and one or more disks on a different node fail, the situation is called a simultaneous failure; the Strict setting helps protect data in the event of such simultaneous failures.

Lenient: This option applies policies that support longer storage cluster availability. This setting is the default.

Responses to storage cluster node failures

The storage cluster healing timeout is the length of time that the HX Data Platform plug-in waits before automatically healing the storage cluster. If a disk fails, the healing timeout is 1 minute. If a node fails, the healing timeout is 2 hours. The node failure timeout takes priority if a disk and a node fail at the same time, or if a disk fails after a node failure but before the healing process is finished.
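
Purely as an illustration of that precedence rule (not Cisco code), the healing-timeout behavior described above can be written as a small function:

# Illustrative only: disk failure -> 1-minute healing timeout, node failure ->
# 2-hour healing timeout, and a node failure takes priority when both apply.
from datetime import timedelta

def healing_timeout(disk_failed: bool, node_failed: bool) -> timedelta:
    if node_failed:
        return timedelta(hours=2)    # node failure timeout wins, even if a disk also failed
    if disk_failed:
        return timedelta(minutes=1)
    raise ValueError("no failure present")

print(healing_timeout(disk_failed=True, node_failed=True))   # 2:00:00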

When the cluster resiliency status is Warning, the HX Data Platform supports the storage cluster failures and responses listed in Table 6.

Optionally, click the associated Cluster Status or Resiliency Status icons in the HX Data Platform plug-in to display reason messages that explain what is contributing to the current state.

Table 6. Storage cluster failures and responses

Cluster size: 3 nodes | Simultaneous failures: 1 | Entity that failed: 1 node
Maintenance action: The storage cluster does not automatically heal. Replace the failed node to restore storage cluster health.

Cluster size: 3 nodes | Simultaneous failures: 2 | Entity that failed: 2 or more disks on 2 nodes, blacklisted or failed
Maintenance action: If one SSD fails or is removed, the disk is blacklisted immediately, and the storage cluster automatically begins healing within 1 minute. If more than one SSD fails, the system may not automatically restore storage cluster health; if health is not restored, replace the faulty disks and restore the system by rebalancing the cluster.

Cluster size: 4 nodes | Simultaneous failures: 1 | Entity that failed: 1 node
Maintenance action: If the node does not recover in 2 hours, the storage cluster starts healing by rebalancing data on the remaining nodes. To recover the failed node immediately and fully restore the storage cluster: check that the node is powered on and restart it if possible (you may need to replace the node), and then rebalance the cluster.

Cluster size: 4 nodes | Simultaneous failures: 2 | Entity that failed: 2 or more disks on 2 nodes
Maintenance action: If 2 SSDs fail, the storage cluster does not automatically heal. If a disk does not recover in 1 minute, the storage cluster starts healing by rebalancing data on the remaining nodes.

Cluster size: 5 or more nodes | Simultaneous failures: 2 | Entity that failed: Up to 2 nodes
Maintenance action: If a node does not recover in 2 hours, the storage cluster starts healing by rebalancing data on the remaining nodes. To recover the failed node immediately and fully restore the storage cluster: check that the node is powered on and restart it if possible (you may need to replace the node), and then rebalance the cluster. If the storage cluster shuts down, see the section "Two Nodes Fail Simultaneously Causes the Storage Cluster to Shutdown" in the troubleshooting documentation.

Cluster size: 5 or more nodes | Simultaneous failures: 2 | Entity that failed: 2 nodes with 2 or more disk failures on each node
Maintenance action: The system automatically triggers a rebalance after 1 minute to restore storage cluster health.

Cluster size: 5 or more nodes | Simultaneous failures: 2 | Entity that failed: 1 node and 1 or more disks on a different node
Maintenance action: If the disk does not recover in 1 minute, the storage cluster starts healing the failed disk by rebalancing data on the remaining nodes, without touching the data on the failed node. If the failed node does not recover in 2 hours, the storage cluster also starts healing the failed node. To recover the failed node immediately and fully restore the storage cluster: check that the node is powered on and restart it if possible (you may need to replace the node), and then rebalance the cluster.

Cisco HyperFlex cluster health check and monitoring with Cisco HyperFlex Connect

The Cisco HyperFlex Connect user interface provides a view of the Cisco HyperFlex storage cluster status, components, and features such as encryption and replication.

The main monitoring pages provide information about the local Cisco HyperFlex storage cluster:

Dashboard: The dashboard shows the overall Cisco HyperFlex storage cluster status.

Alarms, Events, and Activity: See the Cisco HyperFlex Systems Troubleshooting Reference Guide for details.

Performance: See charts showing I/O operations per second (IOPS), throughput, latency, and replication network bandwidth.

System Information: Get a system overview, plus status information and tasks for nodes and disks. See the Cisco HyperFlex Systems Troubleshooting Reference Guide for information about generating support bundles, Storage Cluster Maintenance Operations Overview for information about entering and exiting maintenance mode, and Setting a Beacon for information about setting a node or disk beacon.

Datastores: View status information and tasks related to datastores.

Virtual Machines: View status information and tasks related to protection of virtual machines.

Additional Cisco HyperFlex Connect pages provide management access:

Encryption: Use this page for data-at-rest disk and node encryption tasks. Refer to the section "Cisco HyperFlex Connect HTML 5 management webpage" for a screenshot.

Replication: Use this page for disaster-recovery virtual machine protection tasks. Refer to the section "Cisco HyperFlex Connect HTML 5 management webpage" for a screenshot.

Upgrade: The Upgrade page provides access to HX Data Platform and Cisco UCS Manager firmware upgrade tasks.

Dashboard

The dashboard shows several elements (Figure 4):

Cluster operational status, overall cluster health, and the cluster's current node failure tolerance

Cluster storage capacity, used and free space, compression and deduplication savings, and overall cluster storage optimization statistics

Cluster size and individual node health

Cluster IOPS, storage throughput, and latency for the past 1 hour

Figure 4. Dashboard view

Monitor pages

Cisco HyperFlex Connect provides additional monitoring capabilities, including the following:

Alarms: Cluster alarms can be viewed, acknowledged, and reset.

Events: The cluster event log can be viewed, specific events can be filtered for, and the log can be exported.

Activity: Recent job activity can be viewed and its status monitored.

System Information

The System Information page (Figure 5) provides a complete health snapshot of the cluster. This page displays Cisco HyperFlex storage cluster system-related information, including node and disk data. It also provides access to Cisco HyperFlex maintenance mode.

Figure 5. System Information page

Generating a support bundle using Cisco HyperFlex Connect

You can use the Cisco HyperFlex Connect user interface to generate a support bundle that collects the logs from every controller virtual machine and ESXi host in the local Cisco HyperFlex storage cluster. If you are using replication to protect your virtual machines and their data, you also need to generate a support bundle from the remote Cisco HyperFlex storage cluster whenever a bundle is required. The vCenter logs are not collected through Cisco HyperFlex Connect.

Follow this procedure:

1. Log in to Cisco HyperFlex Connect.

2. In the banner, click Edit Settings (the gear icon) > Support Bundle. Alternatively, click System Information > Support Bundle.

3. When the support bundle is displayed, click supportBundle.zip to download the file to your local computer. Downloading the support bundle can take 1 to 2 hours.

You can download an existing support bundle in the same way.
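
The bundle is normally downloaded through the browser. If you prefer to script the transfer once the supportBundle.zip link is displayed, a generic streaming download such as the sketch below can be used; the URL is a placeholder copied from your own UI session, the authentication step depends on your environment, and this is not a documented Cisco HyperFlex Connect API.

# Generic download sketch only (not a documented HyperFlex Connect interface).
# BUNDLE_URL is a placeholder for the supportBundle.zip link shown in the UI.
import requests

BUNDLE_URL = "https://hx-connect.example.com/path-from-ui/supportBundle.zip"  # placeholder

session = requests.Session()
# Supply whatever authentication your environment requires, for example:
# session.headers.update({"Authorization": "Bearer <token>"})   # placeholder
with session.get(BUNDLE_URL, stream=True, verify=False) as resp:
    resp.raise_for_status()
    with open("supportBundle.zip", "wb") as f:
        for chunk in resp.iter_content(chunk_size=1 << 20):      # stream in 1 MB chunks
            f.write(chunk)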

For more information

The following documents, available online, provide additional relevant information. If you do not have access to a document, please contact your Cisco representative.

Cisco HyperFlex reference documents:

Cisco HyperFlex HXAF240c M5 (All Flash) spec sheet

Cisco HyperFlex Systems Installation Guide for VMware ESXi, Release 3.0

Cisco HyperFlex Data Platform Administration Guide, Release 3.0

Cisco HyperFlex Systems Troubleshooting Reference Guide 3.0

Cisco HyperFlex technical support documentation

VMware reference documents:

Performance Best Practices for VMware vSphere 6.0

SAP Solutions on VMware Best Practices Guide

SAP HANA on VMware vSphere Wiki

Printed in USA C11-741524-00 1018

Page 6: Cisco HyperFlex Hyperconverged Infrastructure Solution for ... · virtualized SAP HANA servers, as described in separate white papers published by Cisco. Purpose of this document

White Paper

copy 2018 Cisco andor its affiliates All rights reserved This document is Cisco Public Information Page 6 of 13

Cisco HyperFlex HCI solution for SAP HANA This section summarizes the Cisco HyperFlex HCI solution for SAP HANA

Virtual machine CPU configuration

The selection of the physical CPUs in the Cisco HyperFlex nodes is related to the virtual CPU (vCPU) configuration of the SAP application server virtual machines In particular you need to consider the number of physical cores per CPU socket when sizing the virtual machines and configuring the number of vCPUs they use

SAP HANA virtual machines

Take particular care to configure all the vCPUs for the SAP HANA virtual machines so that they are contained in a single non-uniform memory access (NUMA) node A basic sizing guideline is to configure an SAP HANA virtual machine with as many vCPUs as there are cores in a single physical CPU socket For example the reference server described here with Intelreg Xeonreg processor Gold series CPUs has 18 cores per socket In this case configuring the SAP HANA virtual machines with 18 vCPUs is recommended

Configuring a virtual machine with no more vCPUs than a single socket has cores helps ensure that all the virtual machine vCPUs and memory is scheduled within one NUMA node

Storage controller virtual machines

The SCVMs running on each Cisco HyperFlex node are each configured with a mandatory CPU reservation of 10800 MHz (108 GHz) During normal operation of a Cisco HyperFlex system the processor demands of the SCVMs are typically less than 50 percent of this value The ESXi hosts will not perform any throttling of CPU use unless the system nears 100 percent overall CPU use Only in those circumstances would the host set aside the 108 GHz of potential CPU performance because this performance is otherwise guaranteed to the SCVMs In most circumstances the CPU time of the hosts is not being taxed to the limit and therefore the reservation and CPU use of the SCVMs should have little overall impact on the performance of the virtual machines

SAP HANA virtual machine configuration for certification

The Cisco HyperFlex HCI solution for SAP HANA can run four SAP HANA virtual machines per four-node cluster and one SAP HANA virtual machine per node

The storage for the SAP HANA virtual machines was from the Cisco HyperFlex cluster storage pool where each Cisco HyperFlex node has 19 drives

The data log and shared file systems were used as block devices in the SAP HANA virtual machine

The SAP HANA virtual machines were installed and tested with SUSE Linux Enterprise Server (SLES) for SAP Applications 12 SP3 with SAP HANA 20003100

Recommendations To achieve better performance for SAP HANA with the Cisco HyperFlex HCI solution the recommendations described here are proposed

Requirements

To configure SAP HANA running on a Cisco HyperFlex 30 all-flash cluster you need

A functional and healthy running Cisco HyperFlex cluster

Cisco HyperFlex Release 301d or later

Cisco UCS Firmware Release 32(2g) or later

VMware ESXi 65 Update 2 or later

White Paper

copy 2018 Cisco andor its affiliates All rights reserved This document is Cisco Public Information Page 7 of 13

VMware vCenter Server Appliance

Appropriate software and licensing from SAP for SAP HANA

Physical components

Table 1 lists the physical components required for the Cisco HyperFlex system

Table 1 Cisco HyperFlex system components

Component Hardware required

Cisco Nexusreg Family switches 2 Cisco Nexus 9336C-FX2 Switches

Fabric interconnects 2 Cisco UCS 6332 Fabric Interconnects

Servers 4 Cisco HyperFlex HX240c M5SX All-Flash Node rack servers

Processor and memory configuration

Note the following guidelines for the processor and memory configuration

At present only four SAP HANA virtual machines are allowed to run on a four-node Cisco HyperFlex cluster with one SAP HANA virtual machine per node

Memory for the SAP HANA virtual machines should be allocated evenly and it must not be shared with nonndashSAP HANA virtual machines

All Cisco HyperFlex converged storage nodes run an SCVM that requires its own reserved virtual RAM On the HX240c All Flash Node server the Cisco HyperFlex SCVM has a memory reservation of 72 GB

For the best performance the amount of RAM configured for a single SAP HANA virtual machine should not exceed the amount of RAM in a single NUMA node

Configure SAP HANA virtual machines with a number of vCPUs that fit within a single NUMA node Review the virtual machinersquos vmwarelog file to help ensure that the configuration is correct

Network configuration

Two Cisco Nexus 9000 Series Switches are used for the uplink of the Cisco HyperFlex HCI solution to the customer network

According to the SAP HANA network requirements the SAP HANA virtual machines should have sufficient VMNICs created and allocated for this purpose The requirements can be found here

Disk sizing

Consult the SAP Quick Sizer to help determine the memory storage and processor requirements for the SAP HANA database to be used in your specific deployment

SAP HANA performance tuning

After SAP HANA is installed tune the parameters as shown in Table 2

Table 2 Tuning parameters

Parameter Data file system Log file system

max_parallel_io_requests 256 Default

async_read_submit On On

async_write_submit_blocks All All

async_write_submit_active Auto On

White Paper

copy 2018 Cisco andor its affiliates All rights reserved This document is Cisco Public Information Page 8 of 13

For SAP HANA 20 installations use either hdbsql or the SQL function in SAP HANA Studio or cockpit and the following SQL commands

ALTER SYSTEM ALTER CONFIGURATION (globalini SYSTEM) SET (fileio fileiomax_parallel_io_requests[Data]) = 256 WITH RECONFIGURE

ALTER SYSTEM ALTER CONFIGURATION (globalini SYSTEM) SET (fileio fileio async_write_submit_active [Data]) = Auto WITH RECONFIGURE

For more information refer to SAP Note 2399079 Elimination of hdbparam in HANA 2

Cisco HyperFlex HX Data Platform high availability The HX Data Platform high-availability feature helps ensure that the storage cluster maintains at least two copies of all your data during normal operation with three or more fully functional nodes

If nodes or disks in the storage cluster fail the clusters ability to function is affected If more than one node fails or one node and one or more disks on a different node fail the situation is called a simultaneous failure

The number of nodes in the storage cluster in combination with the data replication factor and access policy settings determines the state of the storage cluster that results from node failures

Before using the HX Data Platform high-availability feature enable VMware Distributed Resource Scheduler (DRS) and vMotion on the vSphere Web Client

Cisco HyperFlex HX Data Platform cluster tolerated failures

If nodes or disks in the storage cluster fail the clusters ability to function is affected If more than one node fails or one node and one or more disks on a different node fail the situation is called a simultaneous failure

How the number of node failures affects the storage cluster depends on the following

Number of nodes in the cluster The response by the storage cluster is different for clusters with three or four nodes than for clusters with five or more nodes

Data replication factor This factor is set during HX Data Platform installation and cannot be changed The options are either two or three redundant replicas of your data across the storage cluster

Access policy This policy can be changed from the default setting after the storage cluster is created The options are Strict for protecting against data loss and Lenient to support longer storage cluster availability

Cluster state and number of failed nodes

Tables 3 and 4 show how the storage cluster function changes depending on the number of simultaneous node failures

Table 3 Cluster with five or more nodes Cluster state depending on the number of failed nodes

Replication factor Access policy Number of failed nodes

Read-write Read only Shutdown

2 Lenient 1 ndash 2

2 Strict ndash 1 2

Table 4 Cluster with three or four nodes Cluster state depending on the number of failed nodes

White Paper

copy 2018 Cisco andor its affiliates All rights reserved This document is Cisco Public Information Page 9 of 13

Replication factor Access policy Number of failed nodes

Read-write Read only Shutdown

2 Lenient 1 ndash 2

2 Strict ndash 1 2

Cluster state and number of nodes with failed disks

Table 5 shows how the storage cluster function changes with the number of nodes that have one or more failed disks Note that the node itself has not failed but disks within the node have failed For example 2 indicates that there are two nodes that each have at least one failed disk

When the table refers to multiple disk failures it is referring to the disks used for storage capacity For example if a cache SSD fails on one node and a capacity SSD fails on another node the storage cluster remains highly available even with an access policy setting of Strict

Table 5 lists the worst case scenario for the listed number of failed disks It applies to any storage cluster of three or more nodes

Note Cisco HyperFlex storage clusters can sustain serial disk failures (separate disk failures over time) The only requirement is that sufficient storage capacity must be available to support self-healing The worst-case scenarios listed in Table 5 apply only during the small window in which the Cisco HyperFlex cluster is completing the automatic self-healing and rebalancing process

Table 5 Cluster with three or more nodes Cluster state depending on the number of nodes with failed disks

Replication factor Access policy Failed disks on number of different nodes

Read-write Read only Shutdown

2 Lenient 1 ndash 2

2 Strict ndash 1 2

Cluster access policy

The cluster access policy works with the data replication factor to set levels of data protection and data loss prevention Two cluster access policy options are available The default setting is Lenient This setting is not configurable during installation but it can be changed after installation and initial storage cluster configuration

Strict This option applies policies to protect against data loss If nodes or disks in the storage cluster fail the clusters ability to function is affected If more than one node fails or one node and one or more disks on a different node fail this situation is called a simultaneous failure The Strict setting helps protect the data in the event of simultaneous failures

Lenient This option applies policies to support longer storage cluster availability This setting is the default

Responses to storage cluster node failures

The storage cluster healing timeout is the length of time that the HX Data Platform plug-in waits before automatically healing the storage cluster If a disk fails the healing timeout is 1 minute If a node fails the healing timeout is 2 hours A node failure timeout takes priority if a disk and a node fail at same time or if a disk fails after a node failure but before the healing process is finished

When the cluster resiliency status is Warning the HX Data Platform supports the storage cluster failures and responses listed in Table 6

Optionally click the associated Cluster Status or Resiliency Status icons in the HX Data Platform plug-in to display reason messages that explain what is contributing to the current state

White Paper

copy 2018 Cisco andor its affiliates All rights reserved This document is Cisco Public Information Page 10 of 13

Table 6 Storage cluster failures and responses

Cluster size Number of simultaneous failures

Entity that failed Maintenance action to take

3 nodes 1 1 node The storage cluster does not automatically heal Replace the failed node to restore storage cluster health

3 nodes 2 2 or more disks on 2 nodes blacklisted or failed

If one SSD fails or is removed the disk is blacklisted immediately The storage cluster automatically begins healing within 1 minute

If more than one SSD fails the system may not automatically restore storage cluster health If the system is not restored replace the faulty disks and restore the system by rebalancing the cluster

4 nodes 1 1 node If the node does not recover in 2 hours the storage cluster starts healing by rebalancing data on the remaining nodes

To recover the failed node immediately and fully restore the storage cluster Check that the node is powered on and restart it if possible You

may need to replace the node Rebalance the cluster

4 nodes 2 2 or more disks on 2 nodes If 2 SSDs fail the storage cluster does not automatically heal

If the disk does not recover in 1 minute the storage cluster starts healing by rebalancing data on the remaining nodes

5 or more nodes 2 Up to 2 nodes If the node does not recover in 2 hours the storage cluster starts healing by rebalancing data on the remaining nodes

To recover the failed node immediately and fully restore the storage cluster Check that the node is powered on and restart it if possible You

may need to replace the node Rebalance the cluster

If the storage cluster shuts down see Troubleshooting Two Nodes Fail Simultaneously Causes the Storage Cluster to Shutdown section

5 or more nodes 2 2 nodes with 2 or more disk failures on each node

The system automatically triggers a rebalance after 1 minute to restore storage cluster health

5 or more nodes 2 1 node and 1 or more disks on a different node

If the disk does not recover in 1 minute the storage cluster starts healing by rebalancing data on the remaining nodes

If the node does not recover in 2 hours the storage cluster starts healing by rebalancing data on the remaining nodes

If a node in the storage cluster fails and a disk on a different node also fails the storage cluster starts healing the failed disk (without touching the data on the failed node) in 1 minute If the failed node does not come back up after 2 hours the storage cluster starts healing the failed node as well

To recover the failed node immediately and fully restore the storage cluster Check that the node is powered on and restart it if possible You

may need to replace the node Rebalance the cluster

Cisco HyperFlex cluster health check and monitoring with Cisco HyperFlex Connect The Cisco HyperFlex Connect user interface provides a view of the Cisco HyperFlex storage cluster status components and features such as encryption and replication

The main monitoring pages provide information about the local Cisco HyperFlex storage cluster

Dashboard The dashboard shows the overall Cisco HyperFlex storage cluster status

Alarms Events and Activity See the Cisco HyperFlex Systems Troubleshooting Reference Guide for details

White Paper

copy 2018 Cisco andor its affiliates All rights reserved This document is Cisco Public Information Page 11 of 13

Performance See charts showing IO operations per second (IOPS) throughput latency and replication network bandwidth

System Information Get a system overview plus view status information and tasks for nodes and disks See the Cisco HyperFlex Systems Troubleshooting Reference Guide for information about generating support bundles see Storage Cluster Maintenance Operations Overview for information about entering and exiting maintenance mode and see Setting a Beacon for information about setting a node or disk beacon

Datastores View status information and tasks related to data stores

Virtual Machines View status information and tasks related to protection of virtual machines

Additional Cisco HyperFlex Connect pages provide management access

Encryption Use this page for data at rest disk and node encryption tasks Refer the section Cisco HyperFlex Connect HTML 5 management webpage for the screenshot

Replication Use this page for disaster recovery virtual machine protection tasks Refer the section Cisco HyperFlex Connect HTML 5 management webpage for the screenshot

Upgrade The Upgrade page provides access to HX Data Platform and Cisco UCS Manager firmware upgrade tasks

Dashboard

The dashboard shows several elements (Figure 4)

Cluster operational status overall cluster health and the clusterrsquos current node failure tolerance

Cluster storage capacity used and free space compression and deduplication savings and overall cluster storage optimization statistics

Cluster size and individual node health

Cluster IOPS storage throughput and latency for the past 1 hour

Figure 4 Dashboard view

White Paper

copy 2018 Cisco andor its affiliates All rights reserved This document is Cisco Public Information Page 12 of 13

Monitor pages

Cisco HyperFlex Connect provides for additional monitoring capabilities including

Alarms Cluster alarms can be viewed acknowledged and reset

Events The cluster event log can be viewed specific events can be filtered for and the log can be exported

Activity Recent job activity can be viewed and the status can be monitored

System Information

The System Information page (Figure 5) provides a complete health snapshot of the cluster This page displays Cisco HyperFlex storage cluster system-related information including node and disk data It also provides access to Cisco HyperFlex maintenance mode

Figure 5 System Information page

Generating a support bundle using Cisco HyperFlex Connect

You can use the Cisco HyperFlex Connect user interface to generate a support bundle that collects the logs from every controller virtual machine and ESXi host in the local Cisco HyperFlex storage cluster If you are using replication to protect your virtual machines and their data when you need to generate a support bundle you also need to generate a support bundle from the remote Cisco HyperFlex storage cluster The vCenter logs are not collected through Cisco HyperFlex Connect

Follow this procedure

1 Log in to Cisco HyperFlex Connect

2 In the banner click Edit Settings (the gear icon) gt Support Bundle Alternatively click System Information gt Support Bundle

3 When the support bundle is displayed click supportBundlezip to download the file to your local computer Downloading the support bundle can take 1 to 2 hours

You download an existing support bundle in the same way

White Paper

copy 2018 Cisco andor its affiliates All rights reserved This document is Cisco Public Information Page 13 of 13

For more information The following documents available online provide additional relevant information If you do not have access to a document please contact your Cisco representative

Cisco HyperFlex reference documents

Cisco HyperFlex HXAF240c M5 (All Flash) spec sheet

Cisco HyperFlex Systems Installation Guide for VMware ESXi Release 30

Cisco HyperFlex Data Platform Administration Guide Release 30

Cisco HyperFlex Systems Troubleshooting Reference Guide 30

Cisco HyperFlex technical support documentation

VMware reference documents

Performance Best Practices for VMware vSphere 60

SAP Solutions on VMware Best Practices Guide

SAP HANA on VMware vSphere Wiki

Printed in USA C11-741524-00 1018

Page 7: Cisco HyperFlex Hyperconverged Infrastructure Solution for ... · virtualized SAP HANA servers, as described in separate white papers published by Cisco. Purpose of this document

White Paper

copy 2018 Cisco andor its affiliates All rights reserved This document is Cisco Public Information Page 7 of 13

VMware vCenter Server Appliance

Appropriate software and licensing from SAP for SAP HANA

Physical components

Table 1 lists the physical components required for the Cisco HyperFlex system

Table 1 Cisco HyperFlex system components

Component Hardware required

Cisco Nexusreg Family switches 2 Cisco Nexus 9336C-FX2 Switches

Fabric interconnects 2 Cisco UCS 6332 Fabric Interconnects

Servers 4 Cisco HyperFlex HX240c M5SX All-Flash Node rack servers

Processor and memory configuration

Note the following guidelines for the processor and memory configuration

At present only four SAP HANA virtual machines are allowed to run on a four-node Cisco HyperFlex cluster with one SAP HANA virtual machine per node

Memory for the SAP HANA virtual machines should be allocated evenly and it must not be shared with nonndashSAP HANA virtual machines

All Cisco HyperFlex converged storage nodes run an SCVM that requires its own reserved virtual RAM On the HX240c All Flash Node server the Cisco HyperFlex SCVM has a memory reservation of 72 GB

For the best performance the amount of RAM configured for a single SAP HANA virtual machine should not exceed the amount of RAM in a single NUMA node

Configure SAP HANA virtual machines with a number of vCPUs that fit within a single NUMA node Review the virtual machinersquos vmwarelog file to help ensure that the configuration is correct

Network configuration

Two Cisco Nexus 9000 Series Switches are used for the uplink of the Cisco HyperFlex HCI solution to the customer network

According to the SAP HANA network requirements the SAP HANA virtual machines should have sufficient VMNICs created and allocated for this purpose The requirements can be found here

Disk sizing

Consult the SAP Quick Sizer to help determine the memory storage and processor requirements for the SAP HANA database to be used in your specific deployment

SAP HANA performance tuning

After SAP HANA is installed tune the parameters as shown in Table 2

Table 2 Tuning parameters

Parameter Data file system Log file system

max_parallel_io_requests 256 Default

async_read_submit On On

async_write_submit_blocks All All

async_write_submit_active Auto On

White Paper

copy 2018 Cisco andor its affiliates All rights reserved This document is Cisco Public Information Page 8 of 13

For SAP HANA 20 installations use either hdbsql or the SQL function in SAP HANA Studio or cockpit and the following SQL commands

ALTER SYSTEM ALTER CONFIGURATION (globalini SYSTEM) SET (fileio fileiomax_parallel_io_requests[Data]) = 256 WITH RECONFIGURE

ALTER SYSTEM ALTER CONFIGURATION (globalini SYSTEM) SET (fileio fileio async_write_submit_active [Data]) = Auto WITH RECONFIGURE

For more information refer to SAP Note 2399079 Elimination of hdbparam in HANA 2

Cisco HyperFlex HX Data Platform high availability The HX Data Platform high-availability feature helps ensure that the storage cluster maintains at least two copies of all your data during normal operation with three or more fully functional nodes

If nodes or disks in the storage cluster fail the clusters ability to function is affected If more than one node fails or one node and one or more disks on a different node fail the situation is called a simultaneous failure

The number of nodes in the storage cluster in combination with the data replication factor and access policy settings determines the state of the storage cluster that results from node failures

Before using the HX Data Platform high-availability feature enable VMware Distributed Resource Scheduler (DRS) and vMotion on the vSphere Web Client

Cisco HyperFlex HX Data Platform cluster tolerated failures

If nodes or disks in the storage cluster fail the clusters ability to function is affected If more than one node fails or one node and one or more disks on a different node fail the situation is called a simultaneous failure

How the number of node failures affects the storage cluster depends on the following

Number of nodes in the cluster The response by the storage cluster is different for clusters with three or four nodes than for clusters with five or more nodes

Data replication factor This factor is set during HX Data Platform installation and cannot be changed The options are either two or three redundant replicas of your data across the storage cluster

Access policy This policy can be changed from the default setting after the storage cluster is created The options are Strict for protecting against data loss and Lenient to support longer storage cluster availability

Cluster state and number of failed nodes

Tables 3 and 4 show how the storage cluster function changes depending on the number of simultaneous node failures

Table 3 Cluster with five or more nodes Cluster state depending on the number of failed nodes

Replication factor Access policy Number of failed nodes

Read-write Read only Shutdown

2 Lenient 1 ndash 2

2 Strict ndash 1 2

Table 4 Cluster with three or four nodes Cluster state depending on the number of failed nodes

White Paper

copy 2018 Cisco andor its affiliates All rights reserved This document is Cisco Public Information Page 9 of 13

Replication factor Access policy Number of failed nodes

Read-write Read only Shutdown

2 Lenient 1 ndash 2

2 Strict ndash 1 2

Cluster state and number of nodes with failed disks

Table 5 shows how the storage cluster function changes with the number of nodes that have one or more failed disks Note that the node itself has not failed but disks within the node have failed For example 2 indicates that there are two nodes that each have at least one failed disk

When the table refers to multiple disk failures it is referring to the disks used for storage capacity For example if a cache SSD fails on one node and a capacity SSD fails on another node the storage cluster remains highly available even with an access policy setting of Strict

Table 5 lists the worst case scenario for the listed number of failed disks It applies to any storage cluster of three or more nodes

Note Cisco HyperFlex storage clusters can sustain serial disk failures (separate disk failures over time) The only requirement is that sufficient storage capacity must be available to support self-healing The worst-case scenarios listed in Table 5 apply only during the small window in which the Cisco HyperFlex cluster is completing the automatic self-healing and rebalancing process

Table 5 Cluster with three or more nodes Cluster state depending on the number of nodes with failed disks

Replication factor Access policy Failed disks on number of different nodes

Read-write Read only Shutdown

2 Lenient 1 ndash 2

2 Strict ndash 1 2

Cluster access policy

The cluster access policy works with the data replication factor to set levels of data protection and data loss prevention Two cluster access policy options are available The default setting is Lenient This setting is not configurable during installation but it can be changed after installation and initial storage cluster configuration

Strict This option applies policies to protect against data loss If nodes or disks in the storage cluster fail the clusters ability to function is affected If more than one node fails or one node and one or more disks on a different node fail this situation is called a simultaneous failure The Strict setting helps protect the data in the event of simultaneous failures

Lenient This option applies policies to support longer storage cluster availability This setting is the default

Responses to storage cluster node failures

The storage cluster healing timeout is the length of time that the HX Data Platform plug-in waits before automatically healing the storage cluster If a disk fails the healing timeout is 1 minute If a node fails the healing timeout is 2 hours A node failure timeout takes priority if a disk and a node fail at same time or if a disk fails after a node failure but before the healing process is finished

When the cluster resiliency status is Warning the HX Data Platform supports the storage cluster failures and responses listed in Table 6

Optionally click the associated Cluster Status or Resiliency Status icons in the HX Data Platform plug-in to display reason messages that explain what is contributing to the current state

White Paper

copy 2018 Cisco andor its affiliates All rights reserved This document is Cisco Public Information Page 10 of 13

Table 6 Storage cluster failures and responses

Cluster size Number of simultaneous failures

Entity that failed Maintenance action to take

3 nodes 1 1 node The storage cluster does not automatically heal Replace the failed node to restore storage cluster health

3 nodes 2 2 or more disks on 2 nodes blacklisted or failed

If one SSD fails or is removed the disk is blacklisted immediately The storage cluster automatically begins healing within 1 minute

If more than one SSD fails the system may not automatically restore storage cluster health If the system is not restored replace the faulty disks and restore the system by rebalancing the cluster

4 nodes 1 1 node If the node does not recover in 2 hours the storage cluster starts healing by rebalancing data on the remaining nodes

To recover the failed node immediately and fully restore the storage cluster Check that the node is powered on and restart it if possible You

may need to replace the node Rebalance the cluster

4 nodes 2 2 or more disks on 2 nodes If 2 SSDs fail the storage cluster does not automatically heal

If the disk does not recover in 1 minute the storage cluster starts healing by rebalancing data on the remaining nodes

5 or more nodes 2 Up to 2 nodes If the node does not recover in 2 hours the storage cluster starts healing by rebalancing data on the remaining nodes

To recover the failed node immediately and fully restore the storage cluster Check that the node is powered on and restart it if possible You

may need to replace the node Rebalance the cluster

If the storage cluster shuts down see Troubleshooting Two Nodes Fail Simultaneously Causes the Storage Cluster to Shutdown section

5 or more nodes 2 2 nodes with 2 or more disk failures on each node

The system automatically triggers a rebalance after 1 minute to restore storage cluster health

5 or more nodes 2 1 node and 1 or more disks on a different node

If the disk does not recover in 1 minute the storage cluster starts healing by rebalancing data on the remaining nodes

If the node does not recover in 2 hours the storage cluster starts healing by rebalancing data on the remaining nodes

If a node in the storage cluster fails and a disk on a different node also fails the storage cluster starts healing the failed disk (without touching the data on the failed node) in 1 minute If the failed node does not come back up after 2 hours the storage cluster starts healing the failed node as well

To recover the failed node immediately and fully restore the storage cluster Check that the node is powered on and restart it if possible You

may need to replace the node Rebalance the cluster

Cisco HyperFlex cluster health check and monitoring with Cisco HyperFlex Connect The Cisco HyperFlex Connect user interface provides a view of the Cisco HyperFlex storage cluster status components and features such as encryption and replication

The main monitoring pages provide information about the local Cisco HyperFlex storage cluster

Dashboard The dashboard shows the overall Cisco HyperFlex storage cluster status

Alarms Events and Activity See the Cisco HyperFlex Systems Troubleshooting Reference Guide for details

White Paper

copy 2018 Cisco andor its affiliates All rights reserved This document is Cisco Public Information Page 11 of 13

Performance See charts showing IO operations per second (IOPS) throughput latency and replication network bandwidth

System Information Get a system overview plus view status information and tasks for nodes and disks See the Cisco HyperFlex Systems Troubleshooting Reference Guide for information about generating support bundles see Storage Cluster Maintenance Operations Overview for information about entering and exiting maintenance mode and see Setting a Beacon for information about setting a node or disk beacon

Datastores View status information and tasks related to data stores

Virtual Machines View status information and tasks related to protection of virtual machines

Additional Cisco HyperFlex Connect pages provide management access

Encryption Use this page for data at rest disk and node encryption tasks Refer the section Cisco HyperFlex Connect HTML 5 management webpage for the screenshot

Replication Use this page for disaster recovery virtual machine protection tasks Refer the section Cisco HyperFlex Connect HTML 5 management webpage for the screenshot

Upgrade The Upgrade page provides access to HX Data Platform and Cisco UCS Manager firmware upgrade tasks

Dashboard

The dashboard shows several elements (Figure 4)

Cluster operational status overall cluster health and the clusterrsquos current node failure tolerance

Cluster storage capacity used and free space compression and deduplication savings and overall cluster storage optimization statistics

Cluster size and individual node health

Cluster IOPS storage throughput and latency for the past 1 hour

Figure 4 Dashboard view

White Paper

copy 2018 Cisco andor its affiliates All rights reserved This document is Cisco Public Information Page 12 of 13

Monitor pages

Cisco HyperFlex Connect provides for additional monitoring capabilities including

Alarms Cluster alarms can be viewed acknowledged and reset

Events The cluster event log can be viewed specific events can be filtered for and the log can be exported

Activity Recent job activity can be viewed and the status can be monitored

System Information

The System Information page (Figure 5) provides a complete health snapshot of the cluster This page displays Cisco HyperFlex storage cluster system-related information including node and disk data It also provides access to Cisco HyperFlex maintenance mode

Figure 5 System Information page

Generating a support bundle using Cisco HyperFlex Connect

You can use the Cisco HyperFlex Connect user interface to generate a support bundle that collects the logs from every controller virtual machine and ESXi host in the local Cisco HyperFlex storage cluster If you are using replication to protect your virtual machines and their data when you need to generate a support bundle you also need to generate a support bundle from the remote Cisco HyperFlex storage cluster The vCenter logs are not collected through Cisco HyperFlex Connect

Follow this procedure

1 Log in to Cisco HyperFlex Connect

2 In the banner click Edit Settings (the gear icon) gt Support Bundle Alternatively click System Information gt Support Bundle

3 When the support bundle is displayed click supportBundlezip to download the file to your local computer Downloading the support bundle can take 1 to 2 hours

You download an existing support bundle in the same way

White Paper

copy 2018 Cisco andor its affiliates All rights reserved This document is Cisco Public Information Page 13 of 13

For more information The following documents available online provide additional relevant information If you do not have access to a document please contact your Cisco representative

Cisco HyperFlex reference documents

Cisco HyperFlex HXAF240c M5 (All Flash) spec sheet

Cisco HyperFlex Systems Installation Guide for VMware ESXi Release 30

Cisco HyperFlex Data Platform Administration Guide Release 30

Cisco HyperFlex Systems Troubleshooting Reference Guide 30

Cisco HyperFlex technical support documentation

VMware reference documents

Performance Best Practices for VMware vSphere 60

SAP Solutions on VMware Best Practices Guide

SAP HANA on VMware vSphere Wiki

Printed in USA C11-741524-00 1018

Page 8: Cisco HyperFlex Hyperconverged Infrastructure Solution for ... · virtualized SAP HANA servers, as described in separate white papers published by Cisco. Purpose of this document

White Paper

copy 2018 Cisco andor its affiliates All rights reserved This document is Cisco Public Information Page 8 of 13

For SAP HANA 20 installations use either hdbsql or the SQL function in SAP HANA Studio or cockpit and the following SQL commands

ALTER SYSTEM ALTER CONFIGURATION (globalini SYSTEM) SET (fileio fileiomax_parallel_io_requests[Data]) = 256 WITH RECONFIGURE

ALTER SYSTEM ALTER CONFIGURATION (globalini SYSTEM) SET (fileio fileio async_write_submit_active [Data]) = Auto WITH RECONFIGURE

For more information refer to SAP Note 2399079 Elimination of hdbparam in HANA 2

Cisco HyperFlex HX Data Platform high availability The HX Data Platform high-availability feature helps ensure that the storage cluster maintains at least two copies of all your data during normal operation with three or more fully functional nodes

If nodes or disks in the storage cluster fail the clusters ability to function is affected If more than one node fails or one node and one or more disks on a different node fail the situation is called a simultaneous failure

The number of nodes in the storage cluster in combination with the data replication factor and access policy settings determines the state of the storage cluster that results from node failures

Before using the HX Data Platform high-availability feature enable VMware Distributed Resource Scheduler (DRS) and vMotion on the vSphere Web Client

Cisco HyperFlex HX Data Platform cluster tolerated failures

If nodes or disks in the storage cluster fail the clusters ability to function is affected If more than one node fails or one node and one or more disks on a different node fail the situation is called a simultaneous failure

How the number of node failures affects the storage cluster depends on the following

Number of nodes in the cluster The response by the storage cluster is different for clusters with three or four nodes than for clusters with five or more nodes

Data replication factor This factor is set during HX Data Platform installation and cannot be changed The options are either two or three redundant replicas of your data across the storage cluster

Access policy This policy can be changed from the default setting after the storage cluster is created The options are Strict for protecting against data loss and Lenient to support longer storage cluster availability

Cluster state and number of failed nodes

Tables 3 and 4 show how the storage cluster function changes depending on the number of simultaneous node failures

Table 3 Cluster with five or more nodes Cluster state depending on the number of failed nodes

Replication factor Access policy Number of failed nodes

Read-write Read only Shutdown

2 Lenient 1 ndash 2

2 Strict ndash 1 2

Table 4 Cluster with three or four nodes Cluster state depending on the number of failed nodes

White Paper

copy 2018 Cisco andor its affiliates All rights reserved This document is Cisco Public Information Page 9 of 13

Replication factor Access policy Number of failed nodes

Read-write Read only Shutdown

2 Lenient 1 ndash 2

2 Strict ndash 1 2

Cluster state and number of nodes with failed disks

Table 5 shows how the storage cluster function changes with the number of nodes that have one or more failed disks Note that the node itself has not failed but disks within the node have failed For example 2 indicates that there are two nodes that each have at least one failed disk

When the table refers to multiple disk failures it is referring to the disks used for storage capacity For example if a cache SSD fails on one node and a capacity SSD fails on another node the storage cluster remains highly available even with an access policy setting of Strict

Table 5 lists the worst case scenario for the listed number of failed disks It applies to any storage cluster of three or more nodes

Note Cisco HyperFlex storage clusters can sustain serial disk failures (separate disk failures over time) The only requirement is that sufficient storage capacity must be available to support self-healing The worst-case scenarios listed in Table 5 apply only during the small window in which the Cisco HyperFlex cluster is completing the automatic self-healing and rebalancing process

Table 5 Cluster with three or more nodes Cluster state depending on the number of nodes with failed disks

Replication factor Access policy Failed disks on number of different nodes

Read-write Read only Shutdown

2 Lenient 1 ndash 2

2 Strict ndash 1 2

Cluster access policy

The cluster access policy works with the data replication factor to set levels of data protection and data loss prevention Two cluster access policy options are available The default setting is Lenient This setting is not configurable during installation but it can be changed after installation and initial storage cluster configuration

Strict This option applies policies to protect against data loss If nodes or disks in the storage cluster fail the clusters ability to function is affected If more than one node fails or one node and one or more disks on a different node fail this situation is called a simultaneous failure The Strict setting helps protect the data in the event of simultaneous failures

Lenient This option applies policies to support longer storage cluster availability This setting is the default

Responses to storage cluster node failures

The storage cluster healing timeout is the length of time that the HX Data Platform plug-in waits before automatically healing the storage cluster If a disk fails the healing timeout is 1 minute If a node fails the healing timeout is 2 hours A node failure timeout takes priority if a disk and a node fail at same time or if a disk fails after a node failure but before the healing process is finished

When the cluster resiliency status is Warning the HX Data Platform supports the storage cluster failures and responses listed in Table 6

Optionally click the associated Cluster Status or Resiliency Status icons in the HX Data Platform plug-in to display reason messages that explain what is contributing to the current state

White Paper

copy 2018 Cisco andor its affiliates All rights reserved This document is Cisco Public Information Page 10 of 13

Table 6 Storage cluster failures and responses

Cluster size Number of simultaneous failures

Entity that failed Maintenance action to take

3 nodes 1 1 node The storage cluster does not automatically heal Replace the failed node to restore storage cluster health

3 nodes 2 2 or more disks on 2 nodes blacklisted or failed

If one SSD fails or is removed the disk is blacklisted immediately The storage cluster automatically begins healing within 1 minute

If more than one SSD fails the system may not automatically restore storage cluster health If the system is not restored replace the faulty disks and restore the system by rebalancing the cluster

4 nodes 1 1 node If the node does not recover in 2 hours the storage cluster starts healing by rebalancing data on the remaining nodes

To recover the failed node immediately and fully restore the storage cluster Check that the node is powered on and restart it if possible You

may need to replace the node Rebalance the cluster

4 nodes 2 2 or more disks on 2 nodes If 2 SSDs fail the storage cluster does not automatically heal

If the disk does not recover in 1 minute the storage cluster starts healing by rebalancing data on the remaining nodes

5 or more nodes 2 Up to 2 nodes If the node does not recover in 2 hours the storage cluster starts healing by rebalancing data on the remaining nodes

To recover the failed node immediately and fully restore the storage cluster Check that the node is powered on and restart it if possible You

may need to replace the node Rebalance the cluster

If the storage cluster shuts down see Troubleshooting Two Nodes Fail Simultaneously Causes the Storage Cluster to Shutdown section

5 or more nodes 2 2 nodes with 2 or more disk failures on each node

The system automatically triggers a rebalance after 1 minute to restore storage cluster health

5 or more nodes 2 1 node and 1 or more disks on a different node

If the disk does not recover in 1 minute the storage cluster starts healing by rebalancing data on the remaining nodes

If the node does not recover in 2 hours the storage cluster starts healing by rebalancing data on the remaining nodes

If a node in the storage cluster fails and a disk on a different node also fails the storage cluster starts healing the failed disk (without touching the data on the failed node) in 1 minute If the failed node does not come back up after 2 hours the storage cluster starts healing the failed node as well

To recover the failed node immediately and fully restore the storage cluster Check that the node is powered on and restart it if possible You

may need to replace the node Rebalance the cluster

Cisco HyperFlex cluster health check and monitoring with Cisco HyperFlex Connect The Cisco HyperFlex Connect user interface provides a view of the Cisco HyperFlex storage cluster status components and features such as encryption and replication

The main monitoring pages provide information about the local Cisco HyperFlex storage cluster

Dashboard The dashboard shows the overall Cisco HyperFlex storage cluster status

Alarms Events and Activity See the Cisco HyperFlex Systems Troubleshooting Reference Guide for details

White Paper

copy 2018 Cisco andor its affiliates All rights reserved This document is Cisco Public Information Page 11 of 13

Performance See charts showing IO operations per second (IOPS) throughput latency and replication network bandwidth

System Information Get a system overview plus view status information and tasks for nodes and disks See the Cisco HyperFlex Systems Troubleshooting Reference Guide for information about generating support bundles see Storage Cluster Maintenance Operations Overview for information about entering and exiting maintenance mode and see Setting a Beacon for information about setting a node or disk beacon

Datastores View status information and tasks related to data stores

Virtual Machines View status information and tasks related to protection of virtual machines

Additional Cisco HyperFlex Connect pages provide management access

Encryption Use this page for data at rest disk and node encryption tasks Refer the section Cisco HyperFlex Connect HTML 5 management webpage for the screenshot

Replication Use this page for disaster recovery virtual machine protection tasks Refer the section Cisco HyperFlex Connect HTML 5 management webpage for the screenshot

Upgrade The Upgrade page provides access to HX Data Platform and Cisco UCS Manager firmware upgrade tasks

Dashboard

The dashboard shows several elements (Figure 4)

Cluster operational status overall cluster health and the clusterrsquos current node failure tolerance

Cluster storage capacity used and free space compression and deduplication savings and overall cluster storage optimization statistics

Cluster size and individual node health

Cluster IOPS storage throughput and latency for the past 1 hour

Figure 4 Dashboard view

White Paper

copy 2018 Cisco andor its affiliates All rights reserved This document is Cisco Public Information Page 12 of 13

Monitor pages

Cisco HyperFlex Connect provides additional monitoring capabilities, including:

Alarms: Cluster alarms can be viewed, acknowledged, and reset.

Events: The cluster event log can be viewed, specific events can be filtered for, and the log can be exported.

Activity: Recent job activity can be viewed and its status monitored.

System Information

The System Information page (Figure 5) provides a complete health snapshot of the cluster. This page displays Cisco HyperFlex storage cluster system-related information, including node and disk data. It also provides access to Cisco HyperFlex maintenance mode.

Figure 5. System Information page

Generating a support bundle using Cisco HyperFlex Connect

You can use the Cisco HyperFlex Connect user interface to generate a support bundle that collects the logs from every controller virtual machine and ESXi host in the local Cisco HyperFlex storage cluster. If you are using replication to protect your virtual machines and their data, you also need to generate a support bundle from the remote Cisco HyperFlex storage cluster when one is required. The vCenter logs are not collected through Cisco HyperFlex Connect.

Follow this procedure:

1. Log in to Cisco HyperFlex Connect.

2. In the banner, click Edit Settings (the gear icon) > Support Bundle. Alternatively, click System Information > Support Bundle.

3. When the support bundle is displayed, click supportBundle.zip to download the file to your local computer. Downloading the support bundle can take 1 to 2 hours.

You can download an existing support bundle in the same way.
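Administrators who collect support bundles regularly may prefer to script the download step instead of using the browser. The Python sketch below is a minimal illustration using the requests library; the host name, token handling, and bundle path are placeholders invented for this example rather than a documented Cisco HyperFlex Connect API, so adapt them to the interfaces available in your environment.

import requests

HX_CONNECT = "https://hx-connect.example.com"        # placeholder cluster management address
BUNDLE_PATH = "/support-bundle/supportBundle.zip"    # hypothetical path, for illustration only

def download_support_bundle(session_token, dest="supportBundle.zip"):
    # session_token: an authenticated token for HX Connect (obtained out of band; assumed)
    headers = {"Authorization": "Bearer " + session_token}
    # Stream the response so the multi-gigabyte archive is never held in memory.
    with requests.get(HX_CONNECT + BUNDLE_PATH, headers=headers,
                      stream=True, timeout=600) as resp:
        resp.raise_for_status()
        with open(dest, "wb") as f:
            for chunk in resp.iter_content(chunk_size=1 << 20):
                f.write(chunk)
    return dest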


For more information

The following documents, available online, provide additional relevant information. If you do not have access to a document, please contact your Cisco representative.

Cisco HyperFlex reference documents:

Cisco HyperFlex HXAF240c M5 (All Flash) spec sheet

Cisco HyperFlex Systems Installation Guide for VMware ESXi, Release 3.0

Cisco HyperFlex Data Platform Administration Guide, Release 3.0

Cisco HyperFlex Systems Troubleshooting Reference Guide, 3.0

Cisco HyperFlex technical support documentation

VMware reference documents:

Performance Best Practices for VMware vSphere 6.0

SAP Solutions on VMware Best Practices Guide

SAP HANA on VMware vSphere Wiki

Printed in USA C11-741524-00 1018
