White Paper
© 2018 Cisco and/or its affiliates. All rights reserved. This document is Cisco Public Information.
Cisco HyperFlex Hyperconverged Infrastructure Solution for SAP HANA
Learn best practices for running SAP HANA on the Cisco HyperFlex™ hyperconverged infrastructure (HCI) solution.
Contents

Introduction
  Purpose of this document
Certification infrastructure
  Cisco HyperFlex HX240c M5 All Flash Node
Solution summary
  Cisco HyperFlex HX Data Platform controller
  Data operations and distribution
  Cisco HyperFlex Connect HTML 5 management webpage
Cisco HyperFlex HCI solution for SAP HANA
  Virtual machine CPU configuration
  SAP HANA virtual machines
  Storage controller virtual machines
  SAP HANA virtual machine configuration for certification
Recommendations
  Requirements
  Physical components
  Processor and memory configuration
  Network configuration
  Disk sizing
  SAP HANA performance tuning
Cisco HyperFlex HX Data Platform high availability
  Cisco HyperFlex HX Data Platform cluster tolerated failures
  Cluster state and number of failed nodes
  Cluster state and number of nodes with failed disks
  Cluster access policy
  Responses to storage cluster node failures
Cisco HyperFlex cluster health check and monitoring with Cisco HyperFlex Connect
  Dashboard
  Monitor pages
  System Information
  Generating a support bundle using Cisco HyperFlex Connect
For more information
Introduction

This section provides a high-level overview of the certified hyperconverged infrastructure (HCI) solution for SAP HANA using Cisco HyperFlex for production environments.
SAP landscapes are frequently deployed on virtualization platforms, most often using virtualized application servers. In recent years, SAP has been encouraging its customers to migrate to SAP's own database platform of the future, SAP HANA. SAP HANA databases can be deployed on virtual servers or on physical machines. With the launch of the Cisco HyperFlex™ system, Cisco offers a low-cost, easy-to-deploy, high-performance hyperconverged virtual server platform that is an excellent solution for SAP landscapes. Alongside the Cisco HyperFlex solution, customers can deploy other standalone servers within the same Cisco Unified Computing System™ (Cisco UCS®) domain, including certified SAP HANA appliances based on Cisco UCS C-Series Rack Servers. This combination is an excellent choice for production SAP landscapes.
The certified and supported Cisco HyperFlex solution can also be used to deploy SAP application servers along with fully virtualized SAP HANA servers, as described in separate white papers published by Cisco.
Purpose of this document
This document provides an overview of the Cisco HyperFlex system architecture, SAP HANA, and best practices for running SAP HANA on the Cisco HyperFlex HX240c M5 All Flash Node.

This document does not describe the design, installation, or configuration of the Cisco HyperFlex system. Those details are covered in various Cisco® Validated Design documents, which are listed in the references section at the end of this document. This document also does not include detailed instructions about the installation of the SAP software or the OS-level tuning required by SAP, although references to this information can also be found at the end of this document.
Certification infrastructure

This section provides an overview of the infrastructure used to certify this solution.
Cisco HyperFlex HX240c M5 All Flash Node
The system used to certify the solution consists of four Cisco HyperFlex HX240c M5 All Flash Nodes integrated into a single system by a pair of Cisco UCS 6300 Series Fabric Interconnects (Figure 1). The HX240c M5 All Flash Node is excellent for high-capacity clusters and provides high performance.
Figure 1 Cisco HX240c M5 All Flash Nodes
Solution summary
The Cisco HyperFlex system provides a fully contained virtual server platform with computing and memory resources, integrated networking connectivity, a distributed high-performance log-based file system for virtual machine storage, and hypervisor software for running the virtualized servers, all within a single Cisco UCS management domain (Figure 2).
Figure 2 Cisco HyperFlex system overview
Cisco HyperFlex HX Data Platform controller
A Cisco HyperFlex HX Data Platform controller resides on each node and implements the distributed file system. The controller runs as software in user space within a virtual machine and intercepts and handles all I/O from the guest virtual machines. The storage controller virtual machine (SCVM) uses the VMDirectPath I/O feature to provide PCI pass-through control of the physical server's SAS disk controller. This approach gives the controller virtual machine full control of the physical disk resources, using the solid-state disk (SSD) drives as a read-write caching layer and as a capacity layer for distributed storage. The controller integrates the data platform into the VMware vSphere cluster through the use of three preinstalled VMware ESXi vSphere Installation Bundles (VIBs) on each node:

IO Visor: This VIB provides a network file system (NFS) mount point so that the ESXi hypervisor can access the virtual disks that are attached to individual virtual machines. From the hypervisor's perspective, it is simply attached to a network file system. The IO Visor intercepts guest virtual machine I/O traffic and intelligently redirects it to the Cisco HyperFlex SCVMs.

VMware API for Array Integration (VAAI): This storage offload API allows vSphere to request advanced file system operations, such as snapshots and cloning. The controller implements these operations through manipulation of the file system metadata rather than actual data copying, providing rapid response and thus rapid deployment of new environments.

stHypervisorSvc: This VIB adds enhancements and features needed for Cisco HyperFlex data protection and virtual machine replication.
Data operations and distribution
The Cisco HyperFlex HX Data Platform controllers handle all read and write operation requests from the guest virtual machines to the virtual machine disks (VMDKs) stored in the distributed data stores in the cluster. The data platform distributes the data across multiple nodes of the cluster, and also across multiple capacity disks of each node, according to the replication-level policy selected during cluster setup. This approach helps prevent storage hotspots on specific nodes and on specific disks of the nodes, and thereby also helps prevent the network hotspots and congestion that might result from accessing more data on some nodes than on others.
Cisco HyperFlex Connect HTML 5 management webpage

An all-new HTML 5-based web user interface is available for use as the primary management tool for Cisco HyperFlex systems (Figure 3). Through this centralized point of control for the cluster, administrators can create volumes, monitor data platform health, and manage resource use. Administrators can also use this data to predict when the cluster will need to be scaled. To use the Cisco HyperFlex Connect user interface, connect using a web browser to the Cisco HyperFlex cluster IP address: http://<hx controller cluster ip>.
Figure 3 Cisco HyperFlex Connect GUI
Cisco HyperFlex HCI solution for SAP HANA

This section summarizes the Cisco HyperFlex HCI solution for SAP HANA.
Virtual machine CPU configuration
The selection of the physical CPUs in the Cisco HyperFlex nodes is related to the virtual CPU (vCPU) configuration of the SAP application server virtual machines. In particular, you need to consider the number of physical cores per CPU socket when sizing the virtual machines and configuring the number of vCPUs they use.
SAP HANA virtual machines
Take particular care to configure all the vCPUs for the SAP HANA virtual machines so that they are contained in a single non-uniform memory access (NUMA) node. A basic sizing guideline is to configure an SAP HANA virtual machine with as many vCPUs as there are cores in a single physical CPU socket. For example, the reference server described here, with Intel® Xeon® processor Gold series CPUs, has 18 cores per socket. In this case, configuring the SAP HANA virtual machines with 18 vCPUs is recommended.

Configuring a virtual machine with no more vCPUs than a single socket has cores helps ensure that all the virtual machine's vCPUs and memory are scheduled within one NUMA node.
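The sizing rule above, together with the memory guideline given later in this document (VM RAM should not exceed the RAM of one NUMA node), can be sketched as a simple check. This is an illustrative sketch, not a Cisco tool; the 768 GB per-NUMA-node figure in the usage example is a hypothetical value, not taken from this paper.

```python
def hana_vm_fits_numa_node(vcpus: int, ram_gb: float,
                           cores_per_socket: int,
                           ram_per_numa_node_gb: float) -> bool:
    """Sizing rules from this paper: an SAP HANA VM's vCPU count must
    not exceed the cores of one physical socket, and for best
    performance its RAM must not exceed the RAM of one NUMA node."""
    return vcpus <= cores_per_socket and ram_gb <= ram_per_numa_node_gb

# Reference host in this paper: Intel Xeon Gold CPUs, 18 cores per socket.
print(hana_vm_fits_numa_node(18, 512, 18, 768))  # 18-vCPU VM, one socket
print(hana_vm_fits_numa_node(24, 512, 18, 768))  # too many vCPUs for one socket
```

A VM that fails this check spans NUMA nodes, which defeats the locality the recommendation is trying to preserve.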
Storage controller virtual machines
The SCVMs running on each Cisco HyperFlex node are each configured with a mandatory CPU reservation of 10,800 MHz (10.8 GHz). During normal operation of a Cisco HyperFlex system, the processor demands of the SCVMs are typically less than 50 percent of this value. The ESXi hosts will not perform any throttling of CPU use unless the system nears 100 percent overall CPU use. Only in those circumstances would the host set aside the 10.8 GHz of potential CPU performance, because this performance is otherwise guaranteed to the SCVMs. In most circumstances, the CPU time of the hosts is not being taxed to the limit, and therefore the reservation and CPU use of the SCVMs should have little overall impact on the performance of the virtual machines.
SAP HANA virtual machine configuration for certification
The Cisco HyperFlex HCI solution for SAP HANA can run four SAP HANA virtual machines per four-node cluster: one SAP HANA virtual machine per node.

The storage for the SAP HANA virtual machines was provisioned from the Cisco HyperFlex cluster storage pool, where each Cisco HyperFlex node has 19 drives.

The data, log, and shared file systems were used as block devices in the SAP HANA virtual machines.

The SAP HANA virtual machines were installed and tested with SUSE Linux Enterprise Server (SLES) for SAP Applications 12 SP3 with SAP HANA 2.00.031.00.
Recommendations

To achieve better performance for SAP HANA with the Cisco HyperFlex HCI solution, follow the recommendations described here.
Requirements
To configure SAP HANA running on a Cisco HyperFlex 3.0 all-flash cluster, you need:

A functional and healthy running Cisco HyperFlex cluster
Cisco HyperFlex Release 3.0(1d) or later
Cisco UCS Firmware Release 3.2(2g) or later
VMware ESXi 6.5 Update 2 or later
VMware vCenter Server Appliance
Appropriate software and licensing from SAP for SAP HANA
Physical components
Table 1 lists the physical components required for the Cisco HyperFlex system.

Table 1. Cisco HyperFlex system components

Component | Hardware required
Cisco Nexus® family switches | 2 Cisco Nexus 9336C-FX2 Switches
Fabric interconnects | 2 Cisco UCS 6332 Fabric Interconnects
Servers | 4 Cisco HyperFlex HX240c M5SX All Flash Node rack servers
Processor and memory configuration
Note the following guidelines for the processor and memory configuration:

At present, only four SAP HANA virtual machines are allowed to run on a four-node Cisco HyperFlex cluster, with one SAP HANA virtual machine per node.

Memory for the SAP HANA virtual machines should be allocated evenly, and it must not be shared with non-SAP HANA virtual machines.

All Cisco HyperFlex converged storage nodes run an SCVM that requires its own reserved virtual RAM. On the HX240c All Flash Node server, the Cisco HyperFlex SCVM has a memory reservation of 72 GB.

For the best performance, the amount of RAM configured for a single SAP HANA virtual machine should not exceed the amount of RAM in a single NUMA node.

Configure SAP HANA virtual machines with a number of vCPUs that fits within a single NUMA node. Review the virtual machine's vmware.log file to help ensure that the configuration is correct.
Network configuration
Two Cisco Nexus 9000 Series Switches are used for the uplink of the Cisco HyperFlex HCI solution to the customer network.

According to the SAP HANA network requirements, the SAP HANA virtual machines should have sufficient VMNICs created and allocated for this purpose. The requirements can be found here.
Disk sizing
Consult the SAP Quick Sizer to help determine the memory, storage, and processor requirements for the SAP HANA database to be used in your specific deployment.
SAP HANA performance tuning
After SAP HANA is installed, tune the parameters as shown in Table 2.

Table 2. Tuning parameters

Parameter | Data file system | Log file system
max_parallel_io_requests | 256 | Default
async_read_submit | On | On
async_write_submit_blocks | All | All
async_write_submit_active | Auto | On
For SAP HANA 2.0 installations, use either hdbsql or the SQL function in SAP HANA Studio or cockpit and the following SQL commands:

ALTER SYSTEM ALTER CONFIGURATION ('global.ini', 'SYSTEM') SET ('fileio', 'max_parallel_io_requests[Data]') = '256' WITH RECONFIGURE;

ALTER SYSTEM ALTER CONFIGURATION ('global.ini', 'SYSTEM') SET ('fileio', 'async_write_submit_active[Data]') = 'auto' WITH RECONFIGURE;
For more information, refer to SAP Note 2399079: Elimination of hdbparam in HANA 2.
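The two statements shown above cover only part of Table 2. As a sketch, the full set of non-default settings from Table 2 could be rendered programmatically and then executed via hdbsql. The statement template follows the examples in this paper; the lowercase parameter values are an assumption about how SAP HANA expects them, so verify against your installation before use.

```python
# Tuning values from Table 2; None means "leave the default in place".
TUNING = {
    "max_parallel_io_requests":  {"Data": "256",  "Log": None},
    "async_read_submit":         {"Data": "on",   "Log": "on"},
    "async_write_submit_blocks": {"Data": "all",  "Log": "all"},
    "async_write_submit_active": {"Data": "auto", "Log": "on"},
}

def tuning_statements():
    """Yield one ALTER SYSTEM statement per non-default setting in
    Table 2, following the hdbsql syntax shown in this paper."""
    for param, targets in TUNING.items():
        for target, value in targets.items():
            if value is not None:
                yield ("ALTER SYSTEM ALTER CONFIGURATION "
                       "('global.ini', 'SYSTEM') "
                       f"SET ('fileio', '{param}[{target}]') = '{value}' "
                       "WITH RECONFIGURE;")

for stmt in tuning_statements():
    print(stmt)
```

Generating the statements this way keeps the tuning values in one place and avoids hand-editing seven nearly identical commands.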
Cisco HyperFlex HX Data Platform high availability

The HX Data Platform high-availability feature helps ensure that the storage cluster maintains at least two copies of all your data during normal operation with three or more fully functional nodes.

If nodes or disks in the storage cluster fail, the cluster's ability to function is affected. If more than one node fails, or one node and one or more disks on a different node fail, the situation is called a simultaneous failure.

The number of nodes in the storage cluster, in combination with the data replication factor and access policy settings, determines the state of the storage cluster that results from node failures.

Before using the HX Data Platform high-availability feature, enable VMware Distributed Resource Scheduler (DRS) and vMotion in the vSphere Web Client.
Cisco HyperFlex HX Data Platform cluster tolerated failures
If nodes or disks in the storage cluster fail, the cluster's ability to function is affected. If more than one node fails, or one node and one or more disks on a different node fail, the situation is called a simultaneous failure.
How the number of node failures affects the storage cluster depends on the following:

Number of nodes in the cluster: The response by the storage cluster is different for clusters with three or four nodes than for clusters with five or more nodes.

Data replication factor: This factor is set during HX Data Platform installation and cannot be changed. The options are either two or three redundant replicas of your data across the storage cluster.

Access policy: This policy can be changed from the default setting after the storage cluster is created. The options are Strict, for protecting against data loss, and Lenient, to support longer storage cluster availability.
Cluster state and number of failed nodes
Tables 3 and 4 show how the storage cluster function changes depending on the number of simultaneous node failures.

Table 3. Cluster with five or more nodes: cluster state depending on the number of failed nodes

Replication factor | Access policy | Read-write | Read only | Shutdown
2 | Lenient | 1 | – | 2
2 | Strict | – | 1 | 2
(Cells give the number of simultaneous node failures that leaves the cluster in each state.)
Table 4. Cluster with three or four nodes: cluster state depending on the number of failed nodes

Replication factor | Access policy | Read-write | Read only | Shutdown
2 | Lenient | 1 | – | 2
2 | Strict | – | 1 | 2
(Cells give the number of simultaneous node failures that leaves the cluster in each state.)
Cluster state and number of nodes with failed disks
Table 5 shows how the storage cluster function changes with the number of nodes that have one or more failed disks. Note that the node itself has not failed, but disks within the node have failed. For example, 2 indicates that there are two nodes that each have at least one failed disk.

When the table refers to multiple disk failures, it is referring to the disks used for storage capacity. For example, if a cache SSD fails on one node and a capacity SSD fails on another node, the storage cluster remains highly available, even with an access policy setting of Strict.

Table 5 lists the worst-case scenario for the listed number of failed disks. It applies to any storage cluster of three or more nodes.

Note: Cisco HyperFlex storage clusters can sustain serial disk failures (separate disk failures over time). The only requirement is that sufficient storage capacity must be available to support self-healing. The worst-case scenarios listed in Table 5 apply only during the small window in which the Cisco HyperFlex cluster is completing the automatic self-healing and rebalancing process.
Table 5. Cluster with three or more nodes: cluster state depending on the number of nodes with failed disks

Replication factor | Access policy | Read-write | Read only | Shutdown
2 | Lenient | 1 | – | 2
2 | Strict | – | 1 | 2
(Cells give the number of nodes with failed disks that leaves the cluster in each state.)
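The replication-factor-2 rows of Tables 3 through 5 can be summarized as a small lookup. This is an illustrative sketch only: it encodes just the RF2 rows shown above (the replication-factor-3 rows are not listed in this excerpt), and `cluster_state` is a hypothetical helper name, not a Cisco API.

```python
# Worst-case cluster state by (access policy, simultaneous failures),
# for replication factor 2, per Tables 3 through 5.
RF2_STATE = {
    ("lenient", 0): "read-write",
    ("lenient", 1): "read-write",
    ("lenient", 2): "shutdown",
    ("strict", 0): "read-write",
    ("strict", 1): "read-only",
    ("strict", 2): "shutdown",
}

def cluster_state(access_policy: str, failures: int) -> str:
    """Map an access policy and a count of simultaneous failures
    (failed nodes, or nodes with failed disks) to the cluster state."""
    return RF2_STATE[(access_policy.lower(), min(failures, 2))]

print(cluster_state("Strict", 1))   # a Strict RF2 cluster goes read-only
print(cluster_state("Lenient", 1))  # a Lenient RF2 cluster stays read-write
```

The lookup makes the trade-off between the two policies explicit: Lenient keeps the cluster writable after a single failure, while Strict drops to read-only to protect the data.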
Cluster access policy
The cluster access policy works with the data replication factor to set levels of data protection and data loss prevention. Two cluster access policy options are available. The default setting is Lenient. The policy is not configurable during installation, but it can be changed after installation and initial storage cluster configuration.

Strict: This option applies policies to protect against data loss. If nodes or disks in the storage cluster fail, the cluster's ability to function is affected. If more than one node fails, or one node and one or more disks on a different node fail, this situation is called a simultaneous failure. The Strict setting helps protect the data in the event of simultaneous failures.

Lenient: This option applies policies to support longer storage cluster availability. This setting is the default.
Responses to storage cluster node failures
The storage cluster healing timeout is the length of time that the HX Data Platform plug-in waits before automatically healing the storage cluster. If a disk fails, the healing timeout is 1 minute. If a node fails, the healing timeout is 2 hours. The node failure timeout takes priority if a disk and a node fail at the same time, or if a disk fails after a node failure but before the healing process is finished.
When the cluster resiliency status is Warning, the HX Data Platform supports the storage cluster failures and responses listed in Table 6.

Optionally, click the associated Cluster Status or Resiliency Status icons in the HX Data Platform plug-in to display reason messages that explain what is contributing to the current state.
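The healing-timeout rules above reduce to a short priority check. This is a simplified sketch of the stated rule (node timeout takes priority over disk timeout); `healing_wait_minutes` is a hypothetical helper for illustration, not part of the HX Data Platform API.

```python
def healing_wait_minutes(node_failed: bool, disk_failed: bool):
    """Time the HX Data Platform waits before auto-healing, per this
    paper: 1 minute after a disk failure, 2 hours after a node
    failure; the node timeout takes priority when both apply."""
    if node_failed:
        return 120   # 2 hours
    if disk_failed:
        return 1     # 1 minute
    return None      # nothing failed; no healing pending

print(healing_wait_minutes(node_failed=False, disk_failed=True))
print(healing_wait_minutes(node_failed=True, disk_failed=True))
```

Note that, per the responses in Table 6, a disk failing on a *different* node than a failed node still heals on the 1-minute timer; the priority rule applies to the node's own healing.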
Table 6. Storage cluster failures and responses

3 nodes, 1 simultaneous failure (1 node): The storage cluster does not automatically heal. Replace the failed node to restore storage cluster health.

3 nodes, 2 simultaneous failures (2 or more disks on 2 nodes, blacklisted or failed): If one SSD fails or is removed, the disk is blacklisted immediately, and the storage cluster automatically begins healing within 1 minute. If more than one SSD fails, the system may not automatically restore storage cluster health. If the system is not restored, replace the faulty disks and restore the system by rebalancing the cluster.

4 nodes, 1 simultaneous failure (1 node): If the node does not recover in 2 hours, the storage cluster starts healing by rebalancing data on the remaining nodes. To recover the failed node immediately and fully restore the storage cluster: check that the node is powered on and restart it if possible (you may need to replace the node), then rebalance the cluster.

4 nodes, 2 simultaneous failures (2 or more disks on 2 nodes): If 2 SSDs fail, the storage cluster does not automatically heal. If the disk does not recover in 1 minute, the storage cluster starts healing by rebalancing data on the remaining nodes.

5 or more nodes, 2 simultaneous failures (up to 2 nodes): If a node does not recover in 2 hours, the storage cluster starts healing by rebalancing data on the remaining nodes. To recover the failed node immediately and fully restore the storage cluster: check that the node is powered on and restart it if possible (you may need to replace the node), then rebalance the cluster. If the storage cluster shuts down, see the "Two Nodes Fail Simultaneously Causes the Storage Cluster to Shutdown" troubleshooting section.

5 or more nodes, 2 simultaneous failures (2 nodes with 2 or more disk failures on each node): The system automatically triggers a rebalance after 1 minute to restore storage cluster health.

5 or more nodes, 2 simultaneous failures (1 node and 1 or more disks on a different node): If the disk does not recover in 1 minute, the storage cluster starts healing by rebalancing data on the remaining nodes. If the node does not recover in 2 hours, the storage cluster starts healing by rebalancing data on the remaining nodes. That is, the storage cluster starts healing the failed disk (without touching the data on the failed node) in 1 minute; if the failed node does not come back up after 2 hours, the storage cluster starts healing the failed node as well. To recover the failed node immediately and fully restore the storage cluster: check that the node is powered on and restart it if possible (you may need to replace the node), then rebalance the cluster.
Cisco HyperFlex cluster health check and monitoring with Cisco HyperFlex Connect

The Cisco HyperFlex Connect user interface provides a view of the Cisco HyperFlex storage cluster status, components, and features such as encryption and replication.

The main monitoring pages provide information about the local Cisco HyperFlex storage cluster:

Dashboard: Shows the overall Cisco HyperFlex storage cluster status.

Alarms, Events, and Activity: See the Cisco HyperFlex Systems Troubleshooting Reference Guide for details.
White Paper
copy 2018 Cisco andor its affiliates All rights reserved This document is Cisco Public Information Page 11 of 13
Performance: See charts showing I/O operations per second (IOPS), throughput, latency, and replication network bandwidth.

System Information: Get a system overview, plus view status information and tasks for nodes and disks. See the Cisco HyperFlex Systems Troubleshooting Reference Guide for information about generating support bundles, Storage Cluster Maintenance Operations Overview for information about entering and exiting maintenance mode, and Setting a Beacon for information about setting a node or disk beacon.

Datastores: View status information and tasks related to data stores.

Virtual Machines: View status information and tasks related to protection of virtual machines.

Additional Cisco HyperFlex Connect pages provide management access:

Encryption: Use this page for data-at-rest disk and node encryption tasks. Refer to the section "Cisco HyperFlex Connect HTML 5 management webpage" for a screenshot.

Replication: Use this page for disaster-recovery virtual machine protection tasks. Refer to the section "Cisco HyperFlex Connect HTML 5 management webpage" for a screenshot.

Upgrade: This page provides access to HX Data Platform and Cisco UCS Manager firmware upgrade tasks.
Dashboard
The dashboard shows several elements (Figure 4):

Cluster operational status, overall cluster health, and the cluster's current node failure tolerance

Cluster storage capacity, used and free space, compression and deduplication savings, and overall cluster storage optimization statistics

Cluster size and individual node health

Cluster IOPS, storage throughput, and latency for the past hour
Figure 4 Dashboard view
Monitor pages
Cisco HyperFlex Connect provides additional monitoring capabilities, including:

Alarms: Cluster alarms can be viewed, acknowledged, and reset.

Events: The cluster event log can be viewed, specific events can be filtered for, and the log can be exported.

Activity: Recent job activity can be viewed and its status monitored.
System Information
The System Information page (Figure 5) provides a complete health snapshot of the cluster. This page displays Cisco HyperFlex storage cluster system-related information, including node and disk data. It also provides access to Cisco HyperFlex maintenance mode.
Figure 5 System Information page
Generating a support bundle using Cisco HyperFlex Connect
You can use the Cisco HyperFlex Connect user interface to generate a support bundle that collects the logs from every controller virtual machine and ESXi host in the local Cisco HyperFlex storage cluster. If you are using replication to protect your virtual machines and their data, you also need to generate a support bundle from the remote Cisco HyperFlex storage cluster. The vCenter logs are not collected through Cisco HyperFlex Connect.
Follow this procedure:

1. Log in to Cisco HyperFlex Connect.

2. In the banner, click Edit Settings (the gear icon) > Support Bundle. Alternatively, click System Information > Support Bundle.

3. When the support bundle is displayed, click supportBundle.zip to download the file to your local computer. Downloading the support bundle can take 1 to 2 hours.

You download an existing support bundle in the same way.
For more information

The following documents, available online, provide additional relevant information. If you do not have access to a document, please contact your Cisco representative.
Cisco HyperFlex reference documents
Cisco HyperFlex HXAF240c M5 (All Flash) spec sheet
Cisco HyperFlex Systems Installation Guide for VMware ESXi, Release 3.0
Cisco HyperFlex Data Platform Administration Guide, Release 3.0
Cisco HyperFlex Systems Troubleshooting Reference Guide 3.0
Cisco HyperFlex technical support documentation
VMware reference documents
Performance Best Practices for VMware vSphere 6.0
SAP Solutions on VMware Best Practices Guide
SAP HANA on VMware vSphere Wiki
Printed in USA C11-741524-00 10/18
White Paper
copy 2018 Cisco andor its affiliates All rights reserved This document is Cisco Public Information Page 2 of 13
Contents Introduction 3
Purpose of this document 3 Certification infrastructure 3
Cisco HyperFlex HX240c M5 All Flash Node 3 Solution summary 4 Cisco HyperFlex HX Data Platform controller 4 Data operations and distribution 5 Cisco HyperFlex Connect HTML 5 management webpage 5
Cisco HyperFlex HCI solution for SAP HANA 6 Virtual machine CPU configuration 6 SAP HANA virtual machines 6 Storage controller virtual machines 6 SAP HANA virtual machine configuration for certification 6
Recommendations 6 Requirements 6 Physical components 7 Processor and memory configuration 7 Network configuration 7 Disk sizing 7 SAP HANA performance tuning 7
Cisco HyperFlex HX Data Platform high availability 8 Cisco HyperFlex HX Data Platform cluster tolerated failures 8 Cluster state and number of failed nodes 8 Cluster state and number of nodes with failed disks 9 Cluster access policy 9 Responses to storage cluster node failures 9
Cisco HyperFlex cluster health check and monitoring with Cisco HyperFlex Connect 10 Dashboard 11 Monitor pages 12 System Information 12 Generating a support bundle using Cisco HyperFlex Connect 12
For more information 13
White Paper
copy 2018 Cisco andor its affiliates All rights reserved This document is Cisco Public Information Page 3 of 13
Introduction This section provides a high-level overview of the Certified hyperconverged infrastructure (HCI) for SAP HANA using the Cisco HyperFlex solution for production environment
SAP landscapes frequently are deployed on virtualization platforms most often using virtualized application servers In recent years SAP has been encouraging its customers to migrate to SAPrsquos own database platform of the future SAP HANA SAP HANA databases can be deployed on virtual servers or on physical machines With the launch of the Cisco HyperFlextrade system Cisco offers a low-cost easy-to-deploy high-performance hyperconverged virtual server platform that is an excellent solution for SAP landscapes Alongside the Cisco HyperFlex solution customers can deploy other standalone servers within the same Cisco Unified Computing Systemtrade (Cisco UCSreg) domain including certified SAP HANA appliances based on Cisco UCS C-Series Rack Servers This combination is an excellent choice for production SAP landscapes
The certified and supported Cisco HyperFlex solution can also be used to deploy SAP application servers along with fully virtualized SAP HANA servers as described in separate white papers published by Cisco
Purpose of this document
This document provides an overview of the Cisco HyperFlex system architecture SAP HANA and best practices for running SAP HANA on the Cisco HyperFlex HX240c M5 All Flash Node
This document does not describe the design installation or configuration of the Cisco HyperFlex system Those details are covered in various Ciscoreg Validated Design documents which are listed in the references section at the end of this document This document also does not include detailed instructions about the installation of the SAP software or OS-level tuning required by SAP although references to this information can also be found at the end of this document
Certification infrastructure This section provides an overview of the infrastructure used to certify this solution
Cisco HyperFlex HX240c M5 All Flash Node
The system used to certify the solution consists of four Cisco HyperFlex HX240c M5 All Flash Nodes integrated into a single system by a pair of Cisco UCS 6300 Series Fabric Interconnects (Figure 1) The HX240c M5 All Flash Node is excellent for high-capacity clusters and provides high performance
Figure 1 Cisco HX240c M5 All Flash Nodes
White Paper
copy 2018 Cisco andor its affiliates All rights reserved This document is Cisco Public Information Page 4 of 13
Solution summary
The Cisco HyperFlex system provides a fully contained virtual server platform with computing and memory resources integrated networking connectivity a distributed high-performance log-based file system for virtual machine storage and hypervisor software for running the virtualized servers all within a single Cisco UCS management domain (Figure 2)
Figure 2 Cisco HyperFlex system overview
Cisco HyperFlex HX Data Platform controller
A Cisco HyperFlex HX Data Platform controller resides on each node and implements the distributed file system The controller runs as software in user space within a virtual machine and intercepts and handles all IO from the guest virtual machines The storage controller virtual machine (SCVM) uses the VMDirectPath IO feature to provide PCI pass-through control of the physical serverrsquos SAS disk controller This approach gives the controller virtual machine full control of the physical disk resources using the solid-state disk (SSD) drives as a read-write caching layer and as a capacity layer for distributed storage The controller integrates the data platform into the VMware vSphere cluster through the use of three preinstalled VMware ESXi vSphere Installation Bundles (VIBs) on each node
IO Visor: This VIB provides a network file system (NFS) mount point so that the ESXi hypervisor can access the virtual disks that are attached to individual virtual machines. From the hypervisor's perspective, it is simply attached to a network file system. The IO Visor intercepts guest virtual machine I/O traffic and intelligently redirects it to the Cisco HyperFlex SCVMs.
VMware API for Array Integration (VAAI): This storage offload API allows vSphere to request advanced file system operations such as snapshots and cloning. The controller implements these operations through manipulation of the file system metadata rather than actual data copying, providing rapid response and thus rapid deployment of new environments.
stHypervisorSvc: This VIB adds enhancements and features needed for Cisco HyperFlex data protection and virtual machine replication.
Data operations and distribution
The Cisco HyperFlex HX Data Platform controllers handle all read and write requests from the guest virtual machines to the virtual machine disks (VMDKs) stored in the distributed data stores in the cluster. The data platform distributes the data across multiple nodes of the cluster, and also across multiple capacity disks of each node, according to the replication-level policy selected during cluster setup. This approach helps prevent storage hotspots on specific nodes and on specific disks of the nodes, and thereby also helps prevent the network hotspots and congestion that might result from accessing more data on some nodes than on others.
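The effect of this distribution can be illustrated with a small placement sketch. This is not the actual HX Data Platform algorithm; it is a hypothetical model showing how hashing a block's identity across nodes and disks spreads replicas and avoids hotspots (the node count, disk count, and replication factor below match the certified configuration but the hashing scheme is an assumption for illustration only):

```python
import hashlib

def place_block(block_id: str, num_nodes: int = 4,
                disks_per_node: int = 19, replication_factor: int = 2):
    """Illustrative only: spread each block's replicas across distinct
    nodes, and across capacity disks within each node, using a stable hash."""
    digest = int(hashlib.md5(block_id.encode()).hexdigest(), 16)
    placements = []
    for replica in range(replication_factor):
        node = (digest + replica) % num_nodes            # distinct node per replica
        disk = (digest // num_nodes + replica) % disks_per_node
        placements.append((node, disk))
    return placements

# In this sketch, the replicas of any block always land on different nodes.
for blk in ("vmdk-blk-001", "vmdk-blk-002"):
    assert len({node for node, _ in place_block(blk)}) == 2
```

The point of the model is only that a deterministic, uniform mapping from block identity to (node, disk) pairs keeps both capacity and I/O load balanced across the cluster.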
Cisco HyperFlex Connect HTML 5 management webpage
An all-new HTML 5–based web user interface is available for use as the primary management tool for Cisco HyperFlex systems (Figure 3). Through this centralized point of control for the cluster, administrators can create volumes, monitor data platform health, and manage resource use. Administrators can also use this data to predict when the cluster will need to be scaled. To use the Cisco HyperFlex Connect user interface, connect using a web browser to the Cisco HyperFlex cluster IP address: http://<hx controller cluster ip>.
Figure 3 Cisco HyperFlex Connect GUI
Cisco HyperFlex HCI solution for SAP HANA
This section summarizes the Cisco HyperFlex HCI solution for SAP HANA.
Virtual machine CPU configuration
The selection of the physical CPUs in the Cisco HyperFlex nodes is related to the virtual CPU (vCPU) configuration of the SAP application server virtual machines. In particular, you need to consider the number of physical cores per CPU socket when sizing the virtual machines and configuring the number of vCPUs they use.
SAP HANA virtual machines
Take particular care to configure all the vCPUs for the SAP HANA virtual machines so that they are contained in a single non-uniform memory access (NUMA) node. A basic sizing guideline is to configure an SAP HANA virtual machine with as many vCPUs as there are cores in a single physical CPU socket. For example, the reference server described here, with Intel® Xeon® Gold series CPUs, has 18 cores per socket; in this case, configuring the SAP HANA virtual machines with 18 vCPUs is recommended.
Configuring a virtual machine with no more vCPUs than a single socket has cores helps ensure that all the virtual machine's vCPUs and memory are scheduled within one NUMA node.
Storage controller virtual machines
The SCVMs running on each Cisco HyperFlex node are each configured with a mandatory CPU reservation of 10,800 MHz (10.8 GHz). During normal operation of a Cisco HyperFlex system, the processor demands of the SCVMs are typically less than 50 percent of this value. The ESXi hosts do not throttle CPU use unless the system nears 100 percent overall CPU use; only in those circumstances would a host set aside the full 10.8 GHz of potential CPU performance, because this performance is otherwise guaranteed to the SCVMs. In most circumstances the CPU time of the hosts is not taxed to the limit, so the reservation and CPU use of the SCVMs should have little overall impact on the performance of the virtual machines.
SAP HANA virtual machine configuration for certification
The Cisco HyperFlex HCI solution for SAP HANA can run four SAP HANA virtual machines per four-node cluster: one SAP HANA virtual machine per node.
The storage for the SAP HANA virtual machines was provisioned from the Cisco HyperFlex cluster storage pool, in which each Cisco HyperFlex node has 19 drives.
The data, log, and shared file systems were used as block devices in the SAP HANA virtual machines.
The SAP HANA virtual machines were installed and tested with SUSE Linux Enterprise Server (SLES) for SAP Applications 12 SP3 and SAP HANA 2.00.031.00.
Recommendations
To achieve the best performance for SAP HANA with the Cisco HyperFlex HCI solution, follow the recommendations described here.
Requirements
To configure SAP HANA running on a Cisco HyperFlex 3.0 all-flash cluster, you need:
A functional and healthy running Cisco HyperFlex cluster
Cisco HyperFlex Release 3.0(1d) or later
Cisco UCS Firmware Release 3.2(2g) or later
VMware ESXi 6.5 Update 2 or later
VMware vCenter Server Appliance
Appropriate software and licensing from SAP for SAP HANA
Physical components
Table 1 lists the physical components required for the Cisco HyperFlex system
Table 1 Cisco HyperFlex system components
Component Hardware required
Cisco Nexus® Family switches 2 Cisco Nexus 9336C-FX2 Switches
Fabric interconnects 2 Cisco UCS 6332 Fabric Interconnects
Servers 4 Cisco HyperFlex HX240c M5SX All-Flash Node rack servers
Processor and memory configuration
Note the following guidelines for the processor and memory configuration
At present, only four SAP HANA virtual machines are allowed to run on a four-node Cisco HyperFlex cluster, with one SAP HANA virtual machine per node.
Memory for the SAP HANA virtual machines should be allocated evenly, and it must not be shared with non–SAP HANA virtual machines.
All Cisco HyperFlex converged storage nodes run an SCVM that requires its own reserved virtual RAM. On the HX240c All Flash Node server, the Cisco HyperFlex SCVM has a memory reservation of 72 GB.
For the best performance, the amount of RAM configured for a single SAP HANA virtual machine should not exceed the amount of RAM in a single NUMA node.
Configure SAP HANA virtual machines with a number of vCPUs that fits within a single NUMA node. Review the virtual machine's vmware.log file to help ensure that the configuration is correct.
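The two sizing rules above can be captured in a small validation helper. This is a hypothetical sketch, not a Cisco tool: the 18 cores per socket comes from the reference server described earlier, while the per-NUMA-node RAM value is an assumed example that you should replace with your node's actual memory configuration:

```python
def fits_in_numa_node(vcpus: int, vm_ram_gb: int,
                      cores_per_socket: int = 18,
                      numa_node_ram_gb: int = 384) -> bool:
    """Check the two sizing rules from the text: vCPUs must not exceed
    the cores in one physical socket, and VM RAM must not exceed the RAM
    of one NUMA node. 384 GB per node is an assumed example value."""
    return vcpus <= cores_per_socket and vm_ram_gb <= numa_node_ram_gb

assert fits_in_numa_node(18, 256)      # matches the 18-core reference socket
assert not fits_in_numa_node(24, 256)  # would spill across NUMA nodes
```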
Network configuration
Two Cisco Nexus 9000 Series Switches are used for the uplink of the Cisco HyperFlex HCI solution to the customer network.
According to the SAP HANA network requirements, the SAP HANA virtual machines should have sufficient VMNICs created and allocated for this purpose. The requirements can be found here.
Disk sizing
Consult the SAP Quick Sizer to help determine the memory, storage, and processor requirements for the SAP HANA database to be used in your specific deployment.
SAP HANA performance tuning
After SAP HANA is installed, tune the parameters as shown in Table 2.
Table 2 Tuning parameters
Parameter Data file system Log file system
max_parallel_io_requests 256 Default
async_read_submit On On
async_write_submit_blocks All All
async_write_submit_active Auto On
For SAP HANA 2.0 installations, use either hdbsql or the SQL function in SAP HANA Studio or SAP HANA cockpit, and run the following SQL commands:
ALTER SYSTEM ALTER CONFIGURATION ('global.ini', 'SYSTEM') SET ('fileio', 'max_parallel_io_requests[Data]') = '256' WITH RECONFIGURE;
ALTER SYSTEM ALTER CONFIGURATION ('global.ini', 'SYSTEM') SET ('fileio', 'async_write_submit_active[Data]') = 'Auto' WITH RECONFIGURE;
For more information, refer to SAP Note 2399079, Elimination of hdbparam in HANA 2.
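The remaining settings from Table 2 follow the same statement form. As a sketch, the snippet below generates one ALTER SYSTEM statement per Table 2 value (the [Data]/[Log] suffix scoping and the statement syntax follow the examples shown above; actually applying the statements via hdbsql is left to the operator):

```python
# Table 2 values, keyed by volume scope. "Default" entries (no change) are omitted.
TUNING = {
    "Data": {"max_parallel_io_requests": "256",
             "async_read_submit": "On",
             "async_write_submit_blocks": "All",
             "async_write_submit_active": "Auto"},
    "Log":  {"async_read_submit": "On",
             "async_write_submit_blocks": "All",
             "async_write_submit_active": "On"},
}

def tuning_statements(tuning: dict) -> list:
    """Build one ALTER SYSTEM statement per parameter, following the
    statement form shown in the two examples above."""
    stmts = []
    for volume, params in tuning.items():
        for key, value in params.items():
            stmts.append(
                "ALTER SYSTEM ALTER CONFIGURATION ('global.ini', 'SYSTEM') "
                f"SET ('fileio', '{key}[{volume}]') = '{value}' WITH RECONFIGURE;"
            )
    return stmts

for stmt in tuning_statements(TUNING):
    print(stmt)
```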
Cisco HyperFlex HX Data Platform high availability
The HX Data Platform high-availability feature helps ensure that the storage cluster maintains at least two copies of all your data during normal operation with three or more fully functional nodes.
If nodes or disks in the storage cluster fail, the cluster's ability to function is affected. If more than one node fails, or if one node and one or more disks on a different node fail, the situation is called a simultaneous failure.
The number of nodes in the storage cluster, in combination with the data replication factor and access policy settings, determines the state of the storage cluster that results from node failures.
Before using the HX Data Platform high-availability feature, enable VMware Distributed Resource Scheduler (DRS) and vMotion in the vSphere Web Client.
Cisco HyperFlex HX Data Platform cluster tolerated failures
If nodes or disks in the storage cluster fail, the cluster's ability to function is affected. If more than one node fails, or if one node and one or more disks on a different node fail, the situation is called a simultaneous failure.
How the number of node failures affects the storage cluster depends on the following:
Number of nodes in the cluster: The response by the storage cluster is different for clusters with three or four nodes than for clusters with five or more nodes.
Data replication factor: This factor is set during HX Data Platform installation and cannot be changed. The options are two or three redundant replicas of your data across the storage cluster.
Access policy: This policy can be changed from the default setting after the storage cluster is created. The options are Strict, for protecting against data loss, and Lenient, to support longer storage cluster availability.
Cluster state and number of failed nodes
Tables 3 and 4 show how the storage cluster function changes depending on the number of simultaneous node failures.
Table 3 Cluster with five or more nodes Cluster state depending on the number of failed nodes
Replication factor | Access policy | Number of failed nodes (read-write / read only / shutdown)
2 | Lenient | 1 / – / 2
2 | Strict | – / 1 / 2
Table 4 Cluster with three or four nodes Cluster state depending on the number of failed nodes
Replication factor | Access policy | Number of failed nodes (read-write / read only / shutdown)
2 | Lenient | 1 / – / 2
2 | Strict | – / 1 / 2
Cluster state and number of nodes with failed disks
Table 5 shows how the storage cluster function changes with the number of nodes that have one or more failed disks. Note that the node itself has not failed, but disks within the node have failed. For example, 2 indicates that two nodes each have at least one failed disk.
When the table refers to multiple disk failures, it is referring to the disks used for storage capacity. For example, if a cache SSD fails on one node and a capacity SSD fails on another node, the storage cluster remains highly available, even with an access policy setting of Strict.
Table 5 lists the worst-case scenario for the listed number of failed disks. It applies to any storage cluster of three or more nodes.
Note: Cisco HyperFlex storage clusters can sustain serial disk failures (separate disk failures over time). The only requirement is that sufficient storage capacity must be available to support self-healing. The worst-case scenarios listed in Table 5 apply only during the small window in which the Cisco HyperFlex cluster is completing the automatic self-healing and rebalancing process.
Table 5 Cluster with three or more nodes Cluster state depending on the number of nodes with failed disks
Replication factor | Access policy | Failed disks on number of different nodes (read-write / read only / shutdown)
2 | Lenient | 1 / – / 2
2 | Strict | – / 1 / 2
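The replication-factor-2 rows shown in Tables 3 through 5 can be summarized in a small helper. This is a hypothetical sketch that encodes only the rows reproduced above (it does not cover replication factor 3, whose rows are not shown here):

```python
def cluster_state(failed_nodes: int, access_policy: str,
                  replication_factor: int = 2) -> str:
    """Encode the replication-factor-2 rows of Tables 3-5: with Lenient
    the cluster stays read-write through 1 failure, with Strict it drops
    to read only at 1 failure, and either policy shuts down at 2."""
    if replication_factor != 2:
        raise ValueError("only the RF 2 rows shown in the tables are encoded")
    if failed_nodes == 0:
        return "read-write"
    if failed_nodes >= 2:
        return "shutdown"
    return "read-write" if access_policy == "Lenient" else "read only"

assert cluster_state(1, "Lenient") == "read-write"
assert cluster_state(1, "Strict") == "read only"
```

The trade-off the tables describe is visible directly: Lenient keeps the cluster writable longer, while Strict sacrifices availability at the first failure to guard against data loss from a second, simultaneous one.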
Cluster access policy
The cluster access policy works with the data replication factor to set levels of data protection and data loss prevention. Two cluster access policy options are available; the default setting is Lenient. The policy is not configurable during installation, but it can be changed after installation and initial storage cluster configuration.
Strict: This option applies policies that protect against data loss. If nodes or disks in the storage cluster fail, the cluster's ability to function is affected. If more than one node fails, or if one node and one or more disks on a different node fail, the situation is called a simultaneous failure. The Strict setting helps protect the data in the event of simultaneous failures.
Lenient: This option applies policies that support longer storage cluster availability. This setting is the default.
Responses to storage cluster node failures
The storage cluster healing timeout is the length of time that the HX Data Platform plug-in waits before automatically healing the storage cluster. If a disk fails, the healing timeout is 1 minute. If a node fails, the healing timeout is 2 hours. A node failure timeout takes priority if a disk and a node fail at the same time, or if a disk fails after a node failure but before the healing process is finished.
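The timeout priority rule can be stated precisely as a small function. This is an illustrative sketch of the rule as described above, not an HX Data Platform API:

```python
def healing_timeout_minutes(disk_failed: bool, node_failed: bool):
    """Per the text: a failed disk heals after 1 minute, a failed node
    after 2 hours, and the node-failure timeout takes priority when both
    fail together (or a disk fails during node healing)."""
    if node_failed:
        return 120   # 2 hours; also applies when a disk fails alongside a node
    if disk_failed:
        return 1
    return None      # nothing to heal

assert healing_timeout_minutes(disk_failed=True, node_failed=True) == 120
```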
When the cluster resiliency status is Warning, the HX Data Platform supports the storage cluster failures and responses listed in Table 6.
Optionally, click the associated Cluster Status or Resiliency Status icons in the HX Data Platform plug-in to display reason messages that explain what is contributing to the current state.
Table 6 Storage cluster failures and responses
Cluster size | Simultaneous failures | Entity that failed | Maintenance action to take
3 nodes | 1 | 1 node | The storage cluster does not automatically heal. Replace the failed node to restore storage cluster health.
3 nodes | 2 | 2 or more disks on 2 nodes, blacklisted or failed | If one SSD fails or is removed, the disk is blacklisted immediately, and the storage cluster automatically begins healing within 1 minute. If more than one SSD fails, the system may not automatically restore storage cluster health; in that case, replace the faulty disks and restore the system by rebalancing the cluster.
4 nodes | 1 | 1 node | If the node does not recover in 2 hours, the storage cluster starts healing by rebalancing data on the remaining nodes. To recover the failed node immediately and fully restore the storage cluster, check that the node is powered on and restart it if possible (you may need to replace the node), and then rebalance the cluster.
4 nodes | 2 | 2 or more disks on 2 nodes | If 2 SSDs fail, the storage cluster does not automatically heal. If a disk does not recover in 1 minute, the storage cluster starts healing by rebalancing data on the remaining nodes.
5 or more nodes | 2 | Up to 2 nodes | If the nodes do not recover in 2 hours, the storage cluster starts healing by rebalancing data on the remaining nodes. To recover a failed node immediately and fully restore the storage cluster, check that the node is powered on and restart it if possible (you may need to replace the node), and then rebalance the cluster. If the storage cluster shuts down, see the Troubleshooting section "Two Nodes Fail Simultaneously Causes the Storage Cluster to Shutdown."
5 or more nodes | 2 | 2 nodes with 2 or more disk failures on each node | The system automatically triggers a rebalance after 1 minute to restore storage cluster health.
5 or more nodes | 2 | 1 node and 1 or more disks on a different node | If the disk does not recover in 1 minute, the storage cluster starts healing by rebalancing data on the remaining nodes. If the node does not recover in 2 hours, the storage cluster starts healing by rebalancing data on the remaining nodes. In other words, if a node fails and a disk on a different node also fails, the storage cluster starts healing the failed disk (without touching the data on the failed node) in 1 minute; if the failed node does not come back up after 2 hours, the storage cluster starts healing the failed node as well. To recover the failed node immediately and fully restore the storage cluster, check that the node is powered on and restart it if possible (you may need to replace the node), and then rebalance the cluster.
Cisco HyperFlex cluster health check and monitoring with Cisco HyperFlex Connect
The Cisco HyperFlex Connect user interface provides a view of the Cisco HyperFlex storage cluster status, components, and features such as encryption and replication.
The main monitoring pages provide information about the local Cisco HyperFlex storage cluster:
Dashboard: The dashboard shows the overall Cisco HyperFlex storage cluster status.
Alarms, Events, and Activity: See the Cisco HyperFlex Systems Troubleshooting Reference Guide for details.
Performance: See charts showing I/O operations per second (IOPS), throughput, latency, and replication network bandwidth.
System Information: Get a system overview plus view status information and tasks for nodes and disks. See the Cisco HyperFlex Systems Troubleshooting Reference Guide for information about generating support bundles, see Storage Cluster Maintenance Operations Overview for information about entering and exiting maintenance mode, and see Setting a Beacon for information about setting a node or disk beacon.
Datastores: View status information and tasks related to data stores.
Virtual Machines: View status information and tasks related to protection of virtual machines.
Additional Cisco HyperFlex Connect pages provide management access:
Encryption: Use this page for data-at-rest disk and node encryption tasks. Refer to the section "Cisco HyperFlex Connect HTML 5 management webpage" for the screenshot.
Replication: Use this page for disaster-recovery virtual machine protection tasks. Refer to the section "Cisco HyperFlex Connect HTML 5 management webpage" for the screenshot.
Upgrade: This page provides access to HX Data Platform and Cisco UCS Manager firmware upgrade tasks.
Dashboard
The dashboard shows several elements (Figure 4):
Cluster operational status, overall cluster health, and the cluster's current node failure tolerance
Cluster storage capacity, used and free space, compression and deduplication savings, and overall cluster storage optimization statistics
Cluster size and individual node health
Cluster IOPS, storage throughput, and latency for the past hour
Figure 4 Dashboard view
Monitor pages
Cisco HyperFlex Connect provides additional monitoring capabilities, including:
Alarms: Cluster alarms can be viewed, acknowledged, and reset.
Events: The cluster event log can be viewed, specific events can be filtered for, and the log can be exported.
Activity: Recent job activity can be viewed and its status monitored.
System Information
The System Information page (Figure 5) provides a complete health snapshot of the cluster. This page displays Cisco HyperFlex storage cluster system-related information, including node and disk data. It also provides access to Cisco HyperFlex maintenance mode.
Figure 5 System Information page
Generating a support bundle using Cisco HyperFlex Connect
You can use the Cisco HyperFlex Connect user interface to generate a support bundle that collects the logs from every controller virtual machine and ESXi host in the local Cisco HyperFlex storage cluster. If you are using replication to protect your virtual machines and their data, you also need to generate a support bundle from the remote Cisco HyperFlex storage cluster. The vCenter logs are not collected through Cisco HyperFlex Connect.
Follow this procedure
1. Log in to Cisco HyperFlex Connect.
2. In the banner, click Edit Settings (the gear icon) > Support Bundle. Alternatively, click System Information > Support Bundle.
3. When the support bundle is displayed, click supportBundle.zip to download the file to your local computer. Downloading the support bundle can take 1 to 2 hours.
You download an existing support bundle in the same way
For more information
The following documents, available online, provide additional relevant information. If you do not have access to a document, please contact your Cisco representative.
Cisco HyperFlex reference documents
Cisco HyperFlex HXAF240c M5 (All Flash) spec sheet
Cisco HyperFlex Systems Installation Guide for VMware ESXi, Release 3.0
Cisco HyperFlex Data Platform Administration Guide, Release 3.0
Cisco HyperFlex Systems Troubleshooting Reference Guide, Release 3.0
Cisco HyperFlex technical support documentation
VMware reference documents
Performance Best Practices for VMware vSphere 6.0
SAP Solutions on VMware Best Practices Guide
SAP HANA on VMware vSphere Wiki
Printed in USA C11-741524-00 1018
White Paper
copy 2018 Cisco andor its affiliates All rights reserved This document is Cisco Public Information Page 3 of 13
Introduction This section provides a high-level overview of the Certified hyperconverged infrastructure (HCI) for SAP HANA using the Cisco HyperFlex solution for production environment
SAP landscapes frequently are deployed on virtualization platforms most often using virtualized application servers In recent years SAP has been encouraging its customers to migrate to SAPrsquos own database platform of the future SAP HANA SAP HANA databases can be deployed on virtual servers or on physical machines With the launch of the Cisco HyperFlextrade system Cisco offers a low-cost easy-to-deploy high-performance hyperconverged virtual server platform that is an excellent solution for SAP landscapes Alongside the Cisco HyperFlex solution customers can deploy other standalone servers within the same Cisco Unified Computing Systemtrade (Cisco UCSreg) domain including certified SAP HANA appliances based on Cisco UCS C-Series Rack Servers This combination is an excellent choice for production SAP landscapes
The certified and supported Cisco HyperFlex solution can also be used to deploy SAP application servers along with fully virtualized SAP HANA servers as described in separate white papers published by Cisco
Purpose of this document
This document provides an overview of the Cisco HyperFlex system architecture SAP HANA and best practices for running SAP HANA on the Cisco HyperFlex HX240c M5 All Flash Node
This document does not describe the design installation or configuration of the Cisco HyperFlex system Those details are covered in various Ciscoreg Validated Design documents which are listed in the references section at the end of this document This document also does not include detailed instructions about the installation of the SAP software or OS-level tuning required by SAP although references to this information can also be found at the end of this document
Certification infrastructure This section provides an overview of the infrastructure used to certify this solution
Cisco HyperFlex HX240c M5 All Flash Node
The system used to certify the solution consists of four Cisco HyperFlex HX240c M5 All Flash Nodes integrated into a single system by a pair of Cisco UCS 6300 Series Fabric Interconnects (Figure 1) The HX240c M5 All Flash Node is excellent for high-capacity clusters and provides high performance
Figure 1 Cisco HX240c M5 All Flash Nodes
White Paper
copy 2018 Cisco andor its affiliates All rights reserved This document is Cisco Public Information Page 4 of 13
Solution summary
The Cisco HyperFlex system provides a fully contained virtual server platform with computing and memory resources integrated networking connectivity a distributed high-performance log-based file system for virtual machine storage and hypervisor software for running the virtualized servers all within a single Cisco UCS management domain (Figure 2)
Figure 2 Cisco HyperFlex system overview
Cisco HyperFlex HX Data Platform controller
A Cisco HyperFlex HX Data Platform controller resides on each node and implements the distributed file system The controller runs as software in user space within a virtual machine and intercepts and handles all IO from the guest virtual machines The storage controller virtual machine (SCVM) uses the VMDirectPath IO feature to provide PCI pass-through control of the physical serverrsquos SAS disk controller This approach gives the controller virtual machine full control of the physical disk resources using the solid-state disk (SSD) drives as a read-write caching layer and as a capacity layer for distributed storage The controller integrates the data platform into the VMware vSphere cluster through the use of three preinstalled VMware ESXi vSphere Installation Bundles (VIBs) on each node
IO Visor This VIB provides a network file system (NFS) mount point so that the ESXi hypervisor can access the virtual disks that are attached to individual virtual machines From the hypervisorrsquos perspective it is simply attached to a network file system The IO Visor intercepts guest virtual machine IO traffic and intelligently redirects it to the Cisco HyperFlex SCVMs
VMware API for Array Integration (VAAI) This storage offload API allows vSphere to request advanced file system operations such as snapshots and cloning The controller implements these operations through manipulation of the file system metadata rather than actual data copying providing rapid response and thus rapid deployment of new environments
stHypervisorSvc This VIB adds enhancements and features needed for Cisco HyperFlex data protection and virtual machine replication
White Paper
copy 2018 Cisco andor its affiliates All rights reserved This document is Cisco Public Information Page 5 of 13
Data operations and distribution
The Cisco HyperFlex HX Data Platform controllers handle all read and write operation requests from the guest virtual machines to the virtual machine disks (VMDKs) stored in the distributed data stores in the cluster The data platform distributes the data across multiple nodes of the cluster and also across multiple capacity disks of each node according to the replication-level policy selected during the cluster setup This approach helps prevent storage hotspots on specific nodes and on specific disks of the nodes and thereby also helps prevent network hotspots and congestion that might result from accessing more data on some nodes than on others
Cisco HyperFlex Connect HTML 5 management webpage
An all-new HTML 5ndashbased web user interface is available for use as the primary management tool for Cisco HyperFlex systems (Figure 3) Through this centralized point of control for the cluster administrators can create volumes monitor data platform health and manage resource use Administrators can also use this data to predict when the cluster will need to be scaled To use the Cisco HyperFlex Connect user interface connect using a web browser to the Cisco HyperFlex cluster IP address httplthx controller cluster ipgt
Figure 3 Cisco HyperFlex Connect GUI
White Paper
copy 2018 Cisco andor its affiliates All rights reserved This document is Cisco Public Information Page 6 of 13
Cisco HyperFlex HCI solution for SAP HANA This section summarizes the Cisco HyperFlex HCI solution for SAP HANA
Virtual machine CPU configuration
The selection of the physical CPUs in the Cisco HyperFlex nodes is related to the virtual CPU (vCPU) configuration of the SAP application server virtual machines In particular you need to consider the number of physical cores per CPU socket when sizing the virtual machines and configuring the number of vCPUs they use
SAP HANA virtual machines
Take particular care to configure all the vCPUs for the SAP HANA virtual machines so that they are contained in a single non-uniform memory access (NUMA) node A basic sizing guideline is to configure an SAP HANA virtual machine with as many vCPUs as there are cores in a single physical CPU socket For example the reference server described here with Intelreg Xeonreg processor Gold series CPUs has 18 cores per socket In this case configuring the SAP HANA virtual machines with 18 vCPUs is recommended
Configuring a virtual machine with no more vCPUs than a single socket has cores helps ensure that all the virtual machine vCPUs and memory is scheduled within one NUMA node
Storage controller virtual machines
The SCVMs running on each Cisco HyperFlex node are each configured with a mandatory CPU reservation of 10800 MHz (108 GHz) During normal operation of a Cisco HyperFlex system the processor demands of the SCVMs are typically less than 50 percent of this value The ESXi hosts will not perform any throttling of CPU use unless the system nears 100 percent overall CPU use Only in those circumstances would the host set aside the 108 GHz of potential CPU performance because this performance is otherwise guaranteed to the SCVMs In most circumstances the CPU time of the hosts is not being taxed to the limit and therefore the reservation and CPU use of the SCVMs should have little overall impact on the performance of the virtual machines
SAP HANA virtual machine configuration for certification
The Cisco HyperFlex HCI solution for SAP HANA can run four SAP HANA virtual machines per four-node cluster and one SAP HANA virtual machine per node
The storage for the SAP HANA virtual machines was from the Cisco HyperFlex cluster storage pool where each Cisco HyperFlex node has 19 drives
The data log and shared file systems were used as block devices in the SAP HANA virtual machine
The SAP HANA virtual machines were installed and tested with SUSE Linux Enterprise Server (SLES) for SAP Applications 12 SP3 with SAP HANA 20003100
Recommendations To achieve better performance for SAP HANA with the Cisco HyperFlex HCI solution the recommendations described here are proposed
Requirements
To configure SAP HANA running on a Cisco HyperFlex 30 all-flash cluster you need
A functional and healthy running Cisco HyperFlex cluster
Cisco HyperFlex Release 301d or later
Cisco UCS Firmware Release 32(2g) or later
VMware ESXi 65 Update 2 or later
White Paper
copy 2018 Cisco andor its affiliates All rights reserved This document is Cisco Public Information Page 7 of 13
VMware vCenter Server Appliance
Appropriate software and licensing from SAP for SAP HANA
Physical components
Table 1 lists the physical components required for the Cisco HyperFlex system
Table 1 Cisco HyperFlex system components
Component Hardware required
Cisco Nexusreg Family switches 2 Cisco Nexus 9336C-FX2 Switches
Fabric interconnects 2 Cisco UCS 6332 Fabric Interconnects
Servers 4 Cisco HyperFlex HX240c M5SX All-Flash Node rack servers
Processor and memory configuration
Note the following guidelines for the processor and memory configuration
At present only four SAP HANA virtual machines are allowed to run on a four-node Cisco HyperFlex cluster with one SAP HANA virtual machine per node
Memory for the SAP HANA virtual machines should be allocated evenly and it must not be shared with nonndashSAP HANA virtual machines
All Cisco HyperFlex converged storage nodes run an SCVM that requires its own reserved virtual RAM On the HX240c All Flash Node server the Cisco HyperFlex SCVM has a memory reservation of 72 GB
For the best performance the amount of RAM configured for a single SAP HANA virtual machine should not exceed the amount of RAM in a single NUMA node
Configure SAP HANA virtual machines with a number of vCPUs that fit within a single NUMA node Review the virtual machinersquos vmwarelog file to help ensure that the configuration is correct
Network configuration
Two Cisco Nexus 9000 Series Switches are used for the uplink of the Cisco HyperFlex HCI solution to the customer network
According to the SAP HANA network requirements the SAP HANA virtual machines should have sufficient VMNICs created and allocated for this purpose The requirements can be found here
Disk sizing
Consult the SAP Quick Sizer to help determine the memory storage and processor requirements for the SAP HANA database to be used in your specific deployment
SAP HANA performance tuning
After SAP HANA is installed, tune the parameters as shown in Table 2.
Table 2 Tuning parameters
Parameter | Data file system | Log file system
max_parallel_io_requests | 256 | Default
async_read_submit | On | On
async_write_submit_blocks | All | All
async_write_submit_active | Auto | On
For SAP HANA 2.0 installations, use either hdbsql or the SQL function in SAP HANA Studio or SAP HANA cockpit and the following SQL commands:
ALTER SYSTEM ALTER CONFIGURATION ('global.ini', 'SYSTEM') SET ('fileio', 'max_parallel_io_requests[Data]') = '256' WITH RECONFIGURE;
ALTER SYSTEM ALTER CONFIGURATION ('global.ini', 'SYSTEM') SET ('fileio', 'async_write_submit_active[Data]') = 'auto' WITH RECONFIGURE;
For more information, refer to SAP Note 2399079, Elimination of hdbparam in HANA 2.
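Table 2 can be applied mechanically. The following sketch builds the corresponding ALTER SYSTEM statements (in the global.ini style used by the SQL commands above) from the table contents, using the lowercase value forms that SAP HANA expects. It only generates the SQL strings; running them through hdbsql or a client driver is left to your environment. Entries marked Default in Table 2 are omitted because the parameter is simply left unset for that file system.

```python
# Table 2, expressed as {parameter: {file_system: value}}. "Default" entries
# are omitted: the parameter is left unset for that file system.
TUNING = {
    "max_parallel_io_requests": {"Data": "256"},
    "async_read_submit": {"Data": "on", "Log": "on"},
    "async_write_submit_blocks": {"Data": "all", "Log": "all"},
    "async_write_submit_active": {"Data": "auto", "Log": "on"},
}

def alter_statements(tuning: dict) -> list:
    """Build ALTER SYSTEM statements for the fileio section of global.ini."""
    stmts = []
    for param, per_fs in tuning.items():
        for fs, value in per_fs.items():
            stmts.append(
                "ALTER SYSTEM ALTER CONFIGURATION ('global.ini', 'SYSTEM') "
                f"SET ('fileio', '{param}[{fs}]') = '{value}' WITH RECONFIGURE;"
            )
    return stmts

for stmt in alter_statements(TUNING):
    print(stmt)
```

The generated statements match the two examples shown earlier and cover the remaining Table 2 parameters in the same form.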
Cisco HyperFlex HX Data Platform high availability
The HX Data Platform high-availability feature helps ensure that the storage cluster maintains at least two copies of all your data during normal operation with three or more fully functional nodes.
If nodes or disks in the storage cluster fail, the cluster's ability to function is affected. If more than one node fails, or if one node and one or more disks on a different node fail, the situation is called a simultaneous failure.
The number of nodes in the storage cluster, in combination with the data replication factor and access policy settings, determines the state of the storage cluster that results from node failures.
Before using the HX Data Platform high-availability feature, enable VMware Distributed Resource Scheduler (DRS) and vMotion in the vSphere Web Client.
Cisco HyperFlex HX Data Platform cluster tolerated failures
If nodes or disks in the storage cluster fail, the cluster's ability to function is affected. If more than one node fails, or if one node and one or more disks on a different node fail, the situation is called a simultaneous failure.
How the number of node failures affects the storage cluster depends on the following:
● Number of nodes in the cluster: The response by the storage cluster is different for clusters with three or four nodes than for clusters with five or more nodes.
● Data replication factor: This factor is set during HX Data Platform installation and cannot be changed. The options are either two or three redundant replicas of your data across the storage cluster.
● Access policy: This policy can be changed from the default setting after the storage cluster is created. The options are Strict, for protecting against data loss, and Lenient, to support longer storage cluster availability.
Cluster state and number of failed nodes
Tables 3 and 4 show how the storage cluster function changes depending on the number of simultaneous node failures.
Table 3 Cluster with five or more nodes: Cluster state depending on the number of failed nodes
Replication factor | Access policy | Number of failed nodes (Read-write / Read only / Shutdown)
2 | Lenient | 1 / – / 2
2 | Strict | – / 1 / 2
Table 4 Cluster with three or four nodes: Cluster state depending on the number of failed nodes
Replication factor | Access policy | Number of failed nodes (Read-write / Read only / Shutdown)
2 | Lenient | 1 / – / 2
2 | Strict | – / 1 / 2
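For replication factor 2, the rows shown in Tables 3 and 4 (identical in both tables) reduce to a small state function: with the Lenient policy the cluster stays read-write through one node failure, while Strict drops to read-only after one failure, and two simultaneous node failures shut the cluster down in either case. A sketch encoding only the rows shown:

```python
def cluster_state_rf2(access_policy: str, failed_nodes: int) -> str:
    """Cluster state for replication factor 2, per Tables 3 and 4.

    Lenient: read-write with 1 failed node, shutdown at 2.
    Strict:  read-only with 1 failed node, shutdown at 2.
    """
    if failed_nodes == 0:
        return "read-write"
    if failed_nodes == 1:
        return "read-write" if access_policy == "lenient" else "read-only"
    return "shutdown"

print(cluster_state_rf2("lenient", 1))  # read-write
print(cluster_state_rf2("strict", 1))   # read-only
print(cluster_state_rf2("strict", 2))   # shutdown
```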
Cluster state and number of nodes with failed disks
Table 5 shows how the storage cluster function changes with the number of nodes that have one or more failed disks. Note that the node itself has not failed, but disks within the node have failed. For example, 2 indicates that there are two nodes that each have at least one failed disk.
When the table refers to multiple disk failures, it is referring to the disks used for storage capacity. For example, if a cache SSD fails on one node and a capacity SSD fails on another node, the storage cluster remains highly available, even with an access policy setting of Strict.
Table 5 lists the worst-case scenario for the listed number of failed disks. It applies to any storage cluster of three or more nodes.
Note: Cisco HyperFlex storage clusters can sustain serial disk failures (separate disk failures over time). The only requirement is that sufficient storage capacity must be available to support self-healing. The worst-case scenarios listed in Table 5 apply only during the small window in which the Cisco HyperFlex cluster is completing the automatic self-healing and rebalancing process.
Table 5 Cluster with three or more nodes: Cluster state depending on the number of nodes with failed disks
Replication factor | Access policy | Failed disks on number of different nodes (Read-write / Read only / Shutdown)
2 | Lenient | 1 / – / 2
2 | Strict | – / 1 / 2
Cluster access policy
The cluster access policy works with the data replication factor to set levels of data protection and data loss prevention. Two cluster access policy options are available. The default setting is Lenient. This setting is not configurable during installation, but it can be changed after installation and initial storage cluster configuration.
● Strict: This option applies policies to protect against data loss. If nodes or disks in the storage cluster fail, the cluster's ability to function is affected. If more than one node fails, or if one node and one or more disks on a different node fail, this situation is called a simultaneous failure. The Strict setting helps protect the data in the event of simultaneous failures.
● Lenient: This option applies policies to support longer storage cluster availability. This setting is the default.
Responses to storage cluster node failures
The storage cluster healing timeout is the length of time that the HX Data Platform plug-in waits before automatically healing the storage cluster. If a disk fails, the healing timeout is 1 minute. If a node fails, the healing timeout is 2 hours. The node failure timeout takes priority if a disk and a node fail at the same time, or if a disk fails after a node failure but before the healing process is finished.
When the cluster resiliency status is Warning, the HX Data Platform supports the storage cluster failures and responses listed in Table 6.
Optionally, click the associated Cluster Status or Resiliency Status icons in the HX Data Platform plug-in to display reason messages that explain what is contributing to the current state.
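The timeout precedence described above (1 minute for a disk failure, 2 hours for a node failure, with the node timeout taking priority when both apply) can be sketched as follows:

```python
from datetime import timedelta

DISK_TIMEOUT = timedelta(minutes=1)
NODE_TIMEOUT = timedelta(hours=2)

def healing_timeout(disk_failed: bool, node_failed: bool) -> timedelta:
    """Time the HX Data Platform plug-in waits before self-healing.

    The node failure timeout takes priority when a disk and a node fail at
    the same time, or when a disk fails during an unfinished node heal.
    """
    if node_failed:
        return NODE_TIMEOUT
    if disk_failed:
        return DISK_TIMEOUT
    raise ValueError("no failure: nothing to heal")

print(healing_timeout(disk_failed=True, node_failed=False))  # 0:01:00
print(healing_timeout(disk_failed=True, node_failed=True))   # 2:00:00
```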
Table 6 Storage cluster failures and responses
Cluster size | Number of simultaneous failures | Entity that failed | Maintenance action to take
3 nodes | 1 | 1 node | The storage cluster does not automatically heal. Replace the failed node to restore storage cluster health.
3 nodes | 2 | 2 or more disks on 2 nodes, blacklisted or failed | If one SSD fails or is removed, the disk is blacklisted immediately, and the storage cluster automatically begins healing within 1 minute. If more than one SSD fails, the system may not automatically restore storage cluster health. If the system is not restored, replace the faulty disks and restore the system by rebalancing the cluster.
4 nodes | 1 | 1 node | If the node does not recover in 2 hours, the storage cluster starts healing by rebalancing data on the remaining nodes. To recover the failed node immediately and fully restore the storage cluster: check that the node is powered on and restart it if possible; you may need to replace the node; then rebalance the cluster.
4 nodes | 2 | 2 or more disks on 2 nodes | If 2 SSDs fail, the storage cluster does not automatically heal. If a disk does not recover in 1 minute, the storage cluster starts healing by rebalancing data on the remaining nodes.
5 or more nodes | 2 | Up to 2 nodes | If a node does not recover in 2 hours, the storage cluster starts healing by rebalancing data on the remaining nodes. To recover the failed node immediately and fully restore the storage cluster: check that the node is powered on and restart it if possible; you may need to replace the node; then rebalance the cluster. If the storage cluster shuts down, see the Troubleshooting section "Two Nodes Fail Simultaneously Causes the Storage Cluster to Shutdown."
5 or more nodes | 2 | 2 nodes with 2 or more disk failures on each node | The system automatically triggers a rebalance after 1 minute to restore storage cluster health.
5 or more nodes | 2 | 1 node and 1 or more disks on a different node | If a disk does not recover in 1 minute, the storage cluster starts healing by rebalancing data on the remaining nodes. If the node does not recover in 2 hours, the storage cluster starts healing by rebalancing data on the remaining nodes. If a node in the storage cluster fails and a disk on a different node also fails, the storage cluster starts healing the failed disk (without touching the data on the failed node) in 1 minute. If the failed node does not come back up after 2 hours, the storage cluster starts healing the failed node as well. To recover the failed node immediately and fully restore the storage cluster: check that the node is powered on and restart it if possible; you may need to replace the node; then rebalance the cluster.
Cisco HyperFlex cluster health check and monitoring with Cisco HyperFlex Connect
The Cisco HyperFlex Connect user interface provides a view of the Cisco HyperFlex storage cluster status, components, and features such as encryption and replication.
The main monitoring pages provide information about the local Cisco HyperFlex storage cluster:
● Dashboard: The dashboard shows the overall Cisco HyperFlex storage cluster status.
● Alarms, Events, and Activity: See the Cisco HyperFlex Systems Troubleshooting Reference Guide for details.
● Performance: See charts showing I/O operations per second (IOPS), throughput, latency, and replication network bandwidth.
● System Information: Get a system overview, plus view status information and tasks for nodes and disks. See the Cisco HyperFlex Systems Troubleshooting Reference Guide for information about generating support bundles, see Storage Cluster Maintenance Operations Overview for information about entering and exiting maintenance mode, and see Setting a Beacon for information about setting a node or disk beacon.
● Datastores: View status information and tasks related to data stores.
● Virtual Machines: View status information and tasks related to protection of virtual machines.
Additional Cisco HyperFlex Connect pages provide management access:
● Encryption: Use this page for data-at-rest disk and node encryption tasks. Refer to the section Cisco HyperFlex Connect HTML 5 management webpage for the screenshot.
● Replication: Use this page for disaster-recovery virtual machine protection tasks. Refer to the section Cisco HyperFlex Connect HTML 5 management webpage for the screenshot.
● Upgrade: This page provides access to HX Data Platform and Cisco UCS Manager firmware upgrade tasks.
Dashboard
The dashboard shows several elements (Figure 4):
● Cluster operational status, overall cluster health, and the cluster's current node failure tolerance
● Cluster storage capacity, used and free space, compression and deduplication savings, and overall cluster storage optimization statistics
● Cluster size and individual node health
● Cluster IOPS, storage throughput, and latency for the past hour
Figure 4 Dashboard view
Monitor pages
Cisco HyperFlex Connect provides additional monitoring capabilities, including:
● Alarms: Cluster alarms can be viewed, acknowledged, and reset.
● Events: The cluster event log can be viewed, specific events can be filtered for, and the log can be exported.
● Activity: Recent job activity can be viewed, and the status can be monitored.
System Information
The System Information page (Figure 5) provides a complete health snapshot of the cluster. This page displays Cisco HyperFlex storage cluster system-related information, including node and disk data. It also provides access to Cisco HyperFlex maintenance mode.
Figure 5 System Information page
Generating a support bundle using Cisco HyperFlex Connect
You can use the Cisco HyperFlex Connect user interface to generate a support bundle that collects the logs from every controller virtual machine and ESXi host in the local Cisco HyperFlex storage cluster. If you are using replication to protect your virtual machines and their data, when you need to generate a support bundle you also need to generate one from the remote Cisco HyperFlex storage cluster. The vCenter logs are not collected through Cisco HyperFlex Connect.
Follow this procedure:
1. Log in to Cisco HyperFlex Connect.
2. In the banner, click Edit Settings (the gear icon) > Support Bundle. Alternatively, click System Information > Support Bundle.
3. When the support bundle is displayed, click supportBundle.zip to download the file to your local computer. Downloading the support bundle can take 1 to 2 hours.
You download an existing support bundle in the same way.
For more information
The following documents, available online, provide additional relevant information. If you do not have access to a document, please contact your Cisco representative.
Cisco HyperFlex reference documents:
● Cisco HyperFlex HXAF240c M5 (All Flash) spec sheet
● Cisco HyperFlex Systems Installation Guide for VMware ESXi, Release 3.0
● Cisco HyperFlex Data Platform Administration Guide, Release 3.0
● Cisco HyperFlex Systems Troubleshooting Reference Guide 3.0
● Cisco HyperFlex technical support documentation
VMware reference documents:
● Performance Best Practices for VMware vSphere 6.0
● SAP Solutions on VMware Best Practices Guide
● SAP HANA on VMware vSphere Wiki
Printed in USA C11-741524-00 1018
Solution summary
The Cisco HyperFlex system provides a fully contained virtual server platform with computing and memory resources, integrated networking connectivity, a distributed high-performance log-based file system for virtual machine storage, and hypervisor software for running the virtualized servers, all within a single Cisco UCS management domain (Figure 2).
Figure 2 Cisco HyperFlex system overview
Cisco HyperFlex HX Data Platform controller
A Cisco HyperFlex HX Data Platform controller resides on each node and implements the distributed file system. The controller runs as software in user space within a virtual machine and intercepts and handles all I/O from the guest virtual machines. The storage controller virtual machine (SCVM) uses the VMDirectPath I/O feature to provide PCI pass-through control of the physical server's SAS disk controller. This approach gives the controller virtual machine full control of the physical disk resources, using the solid-state disk (SSD) drives as a read-write caching layer and as a capacity layer for distributed storage. The controller integrates the data platform into the VMware vSphere cluster through the use of three preinstalled VMware ESXi vSphere Installation Bundles (VIBs) on each node:
● IO Visor: This VIB provides a Network File System (NFS) mount point so that the ESXi hypervisor can access the virtual disks that are attached to individual virtual machines. From the hypervisor's perspective, it is simply attached to a network file system. The IO Visor intercepts guest virtual machine I/O traffic and intelligently redirects it to the Cisco HyperFlex SCVMs.
● VMware API for Array Integration (VAAI): This storage offload API allows vSphere to request advanced file system operations such as snapshots and cloning. The controller implements these operations through manipulation of the file system metadata rather than actual data copying, providing rapid response and thus rapid deployment of new environments.
● stHypervisorSvc: This VIB adds enhancements and features needed for Cisco HyperFlex data protection and virtual machine replication.
Data operations and distribution
The Cisco HyperFlex HX Data Platform controllers handle all read and write operation requests from the guest virtual machines to the virtual machine disks (VMDKs) stored in the distributed data stores in the cluster. The data platform distributes the data across multiple nodes of the cluster, and also across multiple capacity disks of each node, according to the replication-level policy selected during the cluster setup. This approach helps prevent storage hotspots on specific nodes and on specific disks of the nodes, and thereby also helps prevent network hotspots and congestion that might result from accessing more data on some nodes than on others.
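The distribution behavior described above can be illustrated with a toy placement function. This is not the HX Data Platform's actual placement algorithm; it is a deterministic-hash sketch showing how spreading each block's replicas across distinct nodes avoids hotspots, with hypothetical node names and replication factor.

```python
import hashlib

def place_block(block_id: str, nodes: list, replicas: int) -> list:
    """Toy placement: hash the block ID, then pick `replicas` distinct nodes.

    Spreading copies this way keeps any single node or disk from becoming a
    hotspot, mirroring in spirit the striping the HX Data Platform performs.
    """
    start = int(hashlib.md5(block_id.encode()).hexdigest(), 16) % len(nodes)
    return [nodes[(start + i) % len(nodes)] for i in range(replicas)]

nodes = ["node1", "node2", "node3", "node4"]
placements = [place_block(f"vmdk-block-{i}", nodes, replicas=3)
              for i in range(1000)]

# Every block gets 3 copies on 3 distinct nodes ...
assert all(len(set(p)) == 3 for p in placements)
# ... and the primary copies are spread across the nodes, not piled on one.
primaries = [p[0] for p in placements]
print({n: primaries.count(n) for n in nodes})
```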
Cisco HyperFlex Connect HTML 5 management webpage
An all-new HTML 5-based web user interface is available for use as the primary management tool for Cisco HyperFlex systems (Figure 3). Through this centralized point of control for the cluster, administrators can create volumes, monitor data platform health, and manage resource use. Administrators can also use this data to predict when the cluster will need to be scaled. To use the Cisco HyperFlex Connect user interface, connect to the Cisco HyperFlex cluster IP address with a web browser: http://<hx controller cluster ip>.
Figure 3 Cisco HyperFlex Connect GUI
Cisco HyperFlex HCI solution for SAP HANA
This section summarizes the Cisco HyperFlex HCI solution for SAP HANA.
Virtual machine CPU configuration
The selection of the physical CPUs in the Cisco HyperFlex nodes is related to the virtual CPU (vCPU) configuration of the SAP application server virtual machines. In particular, you need to consider the number of physical cores per CPU socket when sizing the virtual machines and configuring the number of vCPUs they use.
SAP HANA virtual machines
Take particular care to configure all the vCPUs for the SAP HANA virtual machines so that they are contained in a single non-uniform memory access (NUMA) node. A basic sizing guideline is to configure an SAP HANA virtual machine with as many vCPUs as there are cores in a single physical CPU socket. For example, the reference server described here, with Intel® Xeon® processor Gold series CPUs, has 18 cores per socket. In this case, configuring the SAP HANA virtual machines with 18 vCPUs is recommended.
Configuring a virtual machine with no more vCPUs than a single socket has cores helps ensure that all the virtual machine's vCPUs and memory are scheduled within one NUMA node.
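Concretely, for the 18-core example above, the relevant virtual machine settings would resemble the following .vmx excerpt. This is an illustrative fragment with a hypothetical memory size, not a certified configuration; in practice these values are set through the vSphere client rather than by editing the file by hand.

```
numvcpus = "18"               # one vCPU per physical core of a single socket
cpuid.coresPerSocket = "18"   # present the vCPUs as one socket to the guest
memSize = "262144"            # 256 GB; must fit within one NUMA node's RAM
```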
Storage controller virtual machines
The SCVMs running on each Cisco HyperFlex node are each configured with a mandatory CPU reservation of 10,800 MHz (10.8 GHz). During normal operation of a Cisco HyperFlex system, the processor demands of the SCVMs are typically less than 50 percent of this value. The ESXi hosts will not perform any throttling of CPU use unless the system nears 100 percent overall CPU use. Only in those circumstances would the host set aside the 10.8 GHz of potential CPU performance, because this performance is otherwise guaranteed to the SCVMs. In most circumstances, the CPU time of the hosts is not being taxed to the limit, and therefore the reservation and CPU use of the SCVMs should have little overall impact on the performance of the virtual machines.
SAP HANA virtual machine configuration for certification
The Cisco HyperFlex HCI solution for SAP HANA can run four SAP HANA virtual machines per four-node cluster, with one SAP HANA virtual machine per node.
The storage for the SAP HANA virtual machines was provisioned from the Cisco HyperFlex cluster storage pool, where each Cisco HyperFlex node has 19 drives.
The data, log, and shared file systems were used as block devices in the SAP HANA virtual machine.
The SAP HANA virtual machines were installed and tested with SUSE Linux Enterprise Server (SLES) for SAP Applications 12 SP3 with SAP HANA 2.00.031.00.
Recommendations
To achieve better performance for SAP HANA with the Cisco HyperFlex HCI solution, follow the recommendations described here.
Requirements
To configure SAP HANA running on a Cisco HyperFlex 3.0 all-flash cluster, you need:
● A functional and healthy running Cisco HyperFlex cluster
● Cisco HyperFlex Release 3.0(1d) or later
● Cisco UCS Firmware Release 3.2(2g) or later
VMware ESXi 65 Update 2 or later
White Paper
copy 2018 Cisco andor its affiliates All rights reserved This document is Cisco Public Information Page 7 of 13
VMware vCenter Server Appliance
Appropriate software and licensing from SAP for SAP HANA
Physical components
Table 1 lists the physical components required for the Cisco HyperFlex system
Table 1 Cisco HyperFlex system components
Component Hardware required
Cisco Nexusreg Family switches 2 Cisco Nexus 9336C-FX2 Switches
Fabric interconnects 2 Cisco UCS 6332 Fabric Interconnects
Servers 4 Cisco HyperFlex HX240c M5SX All-Flash Node rack servers
Processor and memory configuration
Note the following guidelines for the processor and memory configuration
At present only four SAP HANA virtual machines are allowed to run on a four-node Cisco HyperFlex cluster with one SAP HANA virtual machine per node
Memory for the SAP HANA virtual machines should be allocated evenly and it must not be shared with nonndashSAP HANA virtual machines
All Cisco HyperFlex converged storage nodes run an SCVM that requires its own reserved virtual RAM On the HX240c All Flash Node server the Cisco HyperFlex SCVM has a memory reservation of 72 GB
For the best performance the amount of RAM configured for a single SAP HANA virtual machine should not exceed the amount of RAM in a single NUMA node
Configure SAP HANA virtual machines with a number of vCPUs that fit within a single NUMA node Review the virtual machinersquos vmwarelog file to help ensure that the configuration is correct
Network configuration
Two Cisco Nexus 9000 Series Switches are used for the uplink of the Cisco HyperFlex HCI solution to the customer network
According to the SAP HANA network requirements the SAP HANA virtual machines should have sufficient VMNICs created and allocated for this purpose The requirements can be found here
Disk sizing
Consult the SAP Quick Sizer to help determine the memory storage and processor requirements for the SAP HANA database to be used in your specific deployment
SAP HANA performance tuning
After SAP HANA is installed tune the parameters as shown in Table 2
Table 2 Tuning parameters
Parameter Data file system Log file system
max_parallel_io_requests 256 Default
async_read_submit On On
async_write_submit_blocks All All
async_write_submit_active Auto On
White Paper
copy 2018 Cisco andor its affiliates All rights reserved This document is Cisco Public Information Page 8 of 13
For SAP HANA 20 installations use either hdbsql or the SQL function in SAP HANA Studio or cockpit and the following SQL commands
ALTER SYSTEM ALTER CONFIGURATION (globalini SYSTEM) SET (fileio fileiomax_parallel_io_requests[Data]) = 256 WITH RECONFIGURE
ALTER SYSTEM ALTER CONFIGURATION (globalini SYSTEM) SET (fileio fileio async_write_submit_active [Data]) = Auto WITH RECONFIGURE
For more information refer to SAP Note 2399079 Elimination of hdbparam in HANA 2
Cisco HyperFlex HX Data Platform high availability The HX Data Platform high-availability feature helps ensure that the storage cluster maintains at least two copies of all your data during normal operation with three or more fully functional nodes
If nodes or disks in the storage cluster fail the clusters ability to function is affected If more than one node fails or one node and one or more disks on a different node fail the situation is called a simultaneous failure
The number of nodes in the storage cluster in combination with the data replication factor and access policy settings determines the state of the storage cluster that results from node failures
Before using the HX Data Platform high-availability feature enable VMware Distributed Resource Scheduler (DRS) and vMotion on the vSphere Web Client
Cisco HyperFlex HX Data Platform cluster tolerated failures
If nodes or disks in the storage cluster fail the clusters ability to function is affected If more than one node fails or one node and one or more disks on a different node fail the situation is called a simultaneous failure
How the number of node failures affects the storage cluster depends on the following
Number of nodes in the cluster The response by the storage cluster is different for clusters with three or four nodes than for clusters with five or more nodes
Data replication factor This factor is set during HX Data Platform installation and cannot be changed The options are either two or three redundant replicas of your data across the storage cluster
Access policy This policy can be changed from the default setting after the storage cluster is created The options are Strict for protecting against data loss and Lenient to support longer storage cluster availability
Cluster state and number of failed nodes
Tables 3 and 4 show how the storage cluster function changes depending on the number of simultaneous node failures
Table 3 Cluster with five or more nodes Cluster state depending on the number of failed nodes
Replication factor Access policy Number of failed nodes
Read-write Read only Shutdown
2 Lenient 1 ndash 2
2 Strict ndash 1 2
Table 4 Cluster with three or four nodes Cluster state depending on the number of failed nodes
White Paper
copy 2018 Cisco andor its affiliates All rights reserved This document is Cisco Public Information Page 9 of 13
Replication factor Access policy Number of failed nodes
Read-write Read only Shutdown
2 Lenient 1 ndash 2
2 Strict ndash 1 2
Cluster state and number of nodes with failed disks
Table 5 shows how the storage cluster function changes with the number of nodes that have one or more failed disks Note that the node itself has not failed but disks within the node have failed For example 2 indicates that there are two nodes that each have at least one failed disk
When the table refers to multiple disk failures it is referring to the disks used for storage capacity For example if a cache SSD fails on one node and a capacity SSD fails on another node the storage cluster remains highly available even with an access policy setting of Strict
Table 5 lists the worst case scenario for the listed number of failed disks It applies to any storage cluster of three or more nodes
Note Cisco HyperFlex storage clusters can sustain serial disk failures (separate disk failures over time) The only requirement is that sufficient storage capacity must be available to support self-healing The worst-case scenarios listed in Table 5 apply only during the small window in which the Cisco HyperFlex cluster is completing the automatic self-healing and rebalancing process
Table 5 Cluster with three or more nodes Cluster state depending on the number of nodes with failed disks
Replication factor Access policy Failed disks on number of different nodes
Read-write Read only Shutdown
2 Lenient 1 ndash 2
2 Strict ndash 1 2
Cluster access policy
The cluster access policy works with the data replication factor to set levels of data protection and data loss prevention Two cluster access policy options are available The default setting is Lenient This setting is not configurable during installation but it can be changed after installation and initial storage cluster configuration
Strict This option applies policies to protect against data loss If nodes or disks in the storage cluster fail the clusters ability to function is affected If more than one node fails or one node and one or more disks on a different node fail this situation is called a simultaneous failure The Strict setting helps protect the data in the event of simultaneous failures
Lenient This option applies policies to support longer storage cluster availability This setting is the default
Responses to storage cluster node failures
The storage cluster healing timeout is the length of time that the HX Data Platform plug-in waits before automatically healing the storage cluster If a disk fails the healing timeout is 1 minute If a node fails the healing timeout is 2 hours A node failure timeout takes priority if a disk and a node fail at same time or if a disk fails after a node failure but before the healing process is finished
When the cluster resiliency status is Warning the HX Data Platform supports the storage cluster failures and responses listed in Table 6
Optionally click the associated Cluster Status or Resiliency Status icons in the HX Data Platform plug-in to display reason messages that explain what is contributing to the current state
White Paper
copy 2018 Cisco andor its affiliates All rights reserved This document is Cisco Public Information Page 10 of 13
Table 6 Storage cluster failures and responses
Cluster size Number of simultaneous failures
Entity that failed Maintenance action to take
3 nodes 1 1 node The storage cluster does not automatically heal Replace the failed node to restore storage cluster health
3 nodes 2 2 or more disks on 2 nodes blacklisted or failed
If one SSD fails or is removed the disk is blacklisted immediately The storage cluster automatically begins healing within 1 minute
If more than one SSD fails the system may not automatically restore storage cluster health If the system is not restored replace the faulty disks and restore the system by rebalancing the cluster
4 nodes 1 1 node If the node does not recover in 2 hours the storage cluster starts healing by rebalancing data on the remaining nodes
To recover the failed node immediately and fully restore the storage cluster Check that the node is powered on and restart it if possible You
may need to replace the node Rebalance the cluster
4 nodes 2 2 or more disks on 2 nodes If 2 SSDs fail the storage cluster does not automatically heal
If the disk does not recover in 1 minute the storage cluster starts healing by rebalancing data on the remaining nodes
5 or more nodes 2 Up to 2 nodes If the node does not recover in 2 hours the storage cluster starts healing by rebalancing data on the remaining nodes
To recover the failed node immediately and fully restore the storage cluster Check that the node is powered on and restart it if possible You
may need to replace the node Rebalance the cluster
If the storage cluster shuts down see Troubleshooting Two Nodes Fail Simultaneously Causes the Storage Cluster to Shutdown section
5 or more nodes 2 2 nodes with 2 or more disk failures on each node
The system automatically triggers a rebalance after 1 minute to restore storage cluster health
5 or more nodes 2 1 node and 1 or more disks on a different node
If the disk does not recover in 1 minute the storage cluster starts healing by rebalancing data on the remaining nodes
If the node does not recover in 2 hours the storage cluster starts healing by rebalancing data on the remaining nodes
If a node in the storage cluster fails and a disk on a different node also fails the storage cluster starts healing the failed disk (without touching the data on the failed node) in 1 minute If the failed node does not come back up after 2 hours the storage cluster starts healing the failed node as well
To recover the failed node immediately and fully restore the storage cluster Check that the node is powered on and restart it if possible You
may need to replace the node Rebalance the cluster
Cisco HyperFlex cluster health check and monitoring with Cisco HyperFlex Connect
The Cisco HyperFlex Connect user interface provides a view of the Cisco HyperFlex storage cluster status, components, and features such as encryption and replication.
The main monitoring pages provide information about the local Cisco HyperFlex storage cluster:
● Dashboard: Shows the overall Cisco HyperFlex storage cluster status.
● Alarms, Events, and Activity: See the Cisco HyperFlex Systems Troubleshooting Reference Guide for details.
● Performance: See charts showing I/O operations per second (IOPS), throughput, latency, and replication network bandwidth.
● System Information: Get a system overview plus view status information and tasks for nodes and disks. See the Cisco HyperFlex Systems Troubleshooting Reference Guide for information about generating support bundles, the Storage Cluster Maintenance Operations Overview for information about entering and exiting maintenance mode, and Setting a Beacon for information about setting a node or disk beacon.
● Datastores: View status information and tasks related to datastores.
● Virtual Machines: View status information and tasks related to protection of virtual machines.
Additional Cisco HyperFlex Connect pages provide management access:
● Encryption: Use this page for data-at-rest disk and node encryption tasks. Refer to the "Cisco HyperFlex Connect HTML 5 management webpage" section for a screenshot.
● Replication: Use this page for disaster-recovery virtual machine protection tasks. Refer to the "Cisco HyperFlex Connect HTML 5 management webpage" section for a screenshot.
● Upgrade: Use this page for HX Data Platform and Cisco UCS Manager firmware upgrade tasks.
Dashboard
The dashboard shows several elements (Figure 4):
● Cluster operational status, overall cluster health, and the cluster's current node failure tolerance
● Cluster storage capacity, used and free space, compression and deduplication savings, and overall cluster storage optimization statistics
● Cluster size and individual node health
● Cluster IOPS, storage throughput, and latency for the past hour
Figure 4. Dashboard view
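The dashboard reports compression and deduplication savings separately alongside an overall storage optimization figure. As a rough illustration of how such figures relate (my own arithmetic for independent, multiplicative space savings, not necessarily the exact formula the HX Data Platform uses):

```python
def combined_savings(compression_pct, dedup_pct):
    """Overall space savings when compression applies to data that has
    already been deduplicated: 1 - (1 - d) * (1 - c), as a percentage."""
    remaining = (1 - dedup_pct / 100) * (1 - compression_pct / 100)
    return round((1 - remaining) * 100, 1)

# 30% compression on top of 50% deduplication leaves 35% of the raw
# footprint, i.e. 65% overall savings.
print(combined_savings(30, 50))  # 65.0
```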
Monitor pages
Cisco HyperFlex Connect provides additional monitoring capabilities, including:
● Alarms: Cluster alarms can be viewed, acknowledged, and reset.
● Events: The cluster event log can be viewed, specific events can be filtered for, and the log can be exported.
● Activity: Recent job activity can be viewed and its status monitored.
System Information
The System Information page (Figure 5) provides a complete health snapshot of the cluster. It displays Cisco HyperFlex storage cluster system-related information, including node and disk data, and it provides access to Cisco HyperFlex maintenance mode.
Figure 5. System Information page
Generating a support bundle using Cisco HyperFlex Connect
You can use the Cisco HyperFlex Connect user interface to generate a support bundle that collects the logs from every controller virtual machine and ESXi host in the local Cisco HyperFlex storage cluster. If you are using replication to protect your virtual machines and their data, you also need to generate a support bundle from the remote Cisco HyperFlex storage cluster. vCenter logs are not collected through Cisco HyperFlex Connect.
Follow this procedure:
1. Log in to Cisco HyperFlex Connect.
2. In the banner, click Edit Settings (the gear icon) > Support Bundle. Alternatively, click System Information > Support Bundle.
3. When the support bundle is displayed, click supportBundle.zip to download the file to your local computer. Downloading the support bundle can take 1 to 2 hours.
You can download an existing support bundle in the same way.
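Because a bundle can take 1 to 2 hours to transfer, you may prefer to script the download of a bundle that Cisco HyperFlex Connect has already generated. The sketch below streams the file to disk rather than buffering it in memory; the bundle URL shown in the comment is a hypothetical placeholder for the link the UI presents, and a cluster using a self-signed certificate would additionally require an SSL context.

```python
import shutil
import urllib.request

def download_bundle(url, dest):
    """Stream a (potentially multi-gigabyte) support bundle to disk
    instead of reading the whole response into memory."""
    with urllib.request.urlopen(url) as resp, open(dest, "wb") as out:
        shutil.copyfileobj(resp, out)
    return dest

# Hypothetical example; copy the real link from the Support Bundle page:
# download_bundle("https://<hx-connect-ip>/.../supportBundle.zip", "supportBundle.zip")
```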
For more information
The following documents, available online, provide additional relevant information. If you do not have access to a document, please contact your Cisco representative.
Cisco HyperFlex reference documents:
● Cisco HyperFlex HXAF240c M5 (All Flash) spec sheet
● Cisco HyperFlex Systems Installation Guide for VMware ESXi, Release 3.0
● Cisco HyperFlex Data Platform Administration Guide, Release 3.0
● Cisco HyperFlex Systems Troubleshooting Reference Guide, 3.0
● Cisco HyperFlex technical support documentation
VMware reference documents:
● Performance Best Practices for VMware vSphere 6.0
● SAP Solutions on VMware Best Practices Guide
● SAP HANA on VMware vSphere Wiki
Printed in USA C11-741524-00 1018
White Paper
copy 2018 Cisco andor its affiliates All rights reserved This document is Cisco Public Information Page 6 of 13
Cisco HyperFlex HCI solution for SAP HANA This section summarizes the Cisco HyperFlex HCI solution for SAP HANA
Virtual machine CPU configuration
The selection of the physical CPUs in the Cisco HyperFlex nodes is related to the virtual CPU (vCPU) configuration of the SAP application server virtual machines In particular you need to consider the number of physical cores per CPU socket when sizing the virtual machines and configuring the number of vCPUs they use
SAP HANA virtual machines
Take particular care to configure all the vCPUs for the SAP HANA virtual machines so that they are contained in a single non-uniform memory access (NUMA) node A basic sizing guideline is to configure an SAP HANA virtual machine with as many vCPUs as there are cores in a single physical CPU socket For example the reference server described here with Intelreg Xeonreg processor Gold series CPUs has 18 cores per socket In this case configuring the SAP HANA virtual machines with 18 vCPUs is recommended
Configuring a virtual machine with no more vCPUs than a single socket has cores helps ensure that all the virtual machine vCPUs and memory is scheduled within one NUMA node
Storage controller virtual machines
The SCVMs running on each Cisco HyperFlex node are each configured with a mandatory CPU reservation of 10800 MHz (108 GHz) During normal operation of a Cisco HyperFlex system the processor demands of the SCVMs are typically less than 50 percent of this value The ESXi hosts will not perform any throttling of CPU use unless the system nears 100 percent overall CPU use Only in those circumstances would the host set aside the 108 GHz of potential CPU performance because this performance is otherwise guaranteed to the SCVMs In most circumstances the CPU time of the hosts is not being taxed to the limit and therefore the reservation and CPU use of the SCVMs should have little overall impact on the performance of the virtual machines
SAP HANA virtual machine configuration for certification
The Cisco HyperFlex HCI solution for SAP HANA can run four SAP HANA virtual machines per four-node cluster and one SAP HANA virtual machine per node
The storage for the SAP HANA virtual machines was from the Cisco HyperFlex cluster storage pool where each Cisco HyperFlex node has 19 drives
The data log and shared file systems were used as block devices in the SAP HANA virtual machine
The SAP HANA virtual machines were installed and tested with SUSE Linux Enterprise Server (SLES) for SAP Applications 12 SP3 with SAP HANA 20003100
Recommendations To achieve better performance for SAP HANA with the Cisco HyperFlex HCI solution the recommendations described here are proposed
Requirements
To configure SAP HANA running on a Cisco HyperFlex 30 all-flash cluster you need
A functional and healthy running Cisco HyperFlex cluster
Cisco HyperFlex Release 301d or later
Cisco UCS Firmware Release 32(2g) or later
VMware ESXi 65 Update 2 or later
White Paper
copy 2018 Cisco andor its affiliates All rights reserved This document is Cisco Public Information Page 7 of 13
VMware vCenter Server Appliance
Appropriate software and licensing from SAP for SAP HANA
Physical components
Table 1 lists the physical components required for the Cisco HyperFlex system
Table 1 Cisco HyperFlex system components
Component Hardware required
Cisco Nexusreg Family switches 2 Cisco Nexus 9336C-FX2 Switches
Fabric interconnects 2 Cisco UCS 6332 Fabric Interconnects
Servers 4 Cisco HyperFlex HX240c M5SX All-Flash Node rack servers
Processor and memory configuration
Note the following guidelines for the processor and memory configuration
At present only four SAP HANA virtual machines are allowed to run on a four-node Cisco HyperFlex cluster with one SAP HANA virtual machine per node
Memory for the SAP HANA virtual machines should be allocated evenly and it must not be shared with nonndashSAP HANA virtual machines
All Cisco HyperFlex converged storage nodes run an SCVM that requires its own reserved virtual RAM On the HX240c All Flash Node server the Cisco HyperFlex SCVM has a memory reservation of 72 GB
For the best performance the amount of RAM configured for a single SAP HANA virtual machine should not exceed the amount of RAM in a single NUMA node
Configure SAP HANA virtual machines with a number of vCPUs that fit within a single NUMA node Review the virtual machinersquos vmwarelog file to help ensure that the configuration is correct
Network configuration
Two Cisco Nexus 9000 Series Switches are used for the uplink of the Cisco HyperFlex HCI solution to the customer network
According to the SAP HANA network requirements the SAP HANA virtual machines should have sufficient VMNICs created and allocated for this purpose The requirements can be found here
Disk sizing
Consult the SAP Quick Sizer to help determine the memory storage and processor requirements for the SAP HANA database to be used in your specific deployment
SAP HANA performance tuning
After SAP HANA is installed tune the parameters as shown in Table 2
Table 2 Tuning parameters
Parameter Data file system Log file system
max_parallel_io_requests 256 Default
async_read_submit On On
async_write_submit_blocks All All
async_write_submit_active Auto On
White Paper
copy 2018 Cisco andor its affiliates All rights reserved This document is Cisco Public Information Page 8 of 13
For SAP HANA 20 installations use either hdbsql or the SQL function in SAP HANA Studio or cockpit and the following SQL commands
ALTER SYSTEM ALTER CONFIGURATION (globalini SYSTEM) SET (fileio fileiomax_parallel_io_requests[Data]) = 256 WITH RECONFIGURE
ALTER SYSTEM ALTER CONFIGURATION (globalini SYSTEM) SET (fileio fileio async_write_submit_active [Data]) = Auto WITH RECONFIGURE
For more information refer to SAP Note 2399079 Elimination of hdbparam in HANA 2
Cisco HyperFlex HX Data Platform high availability The HX Data Platform high-availability feature helps ensure that the storage cluster maintains at least two copies of all your data during normal operation with three or more fully functional nodes
If nodes or disks in the storage cluster fail the clusters ability to function is affected If more than one node fails or one node and one or more disks on a different node fail the situation is called a simultaneous failure
The number of nodes in the storage cluster in combination with the data replication factor and access policy settings determines the state of the storage cluster that results from node failures
Before using the HX Data Platform high-availability feature enable VMware Distributed Resource Scheduler (DRS) and vMotion on the vSphere Web Client
Cisco HyperFlex HX Data Platform cluster tolerated failures
If nodes or disks in the storage cluster fail the clusters ability to function is affected If more than one node fails or one node and one or more disks on a different node fail the situation is called a simultaneous failure
How the number of node failures affects the storage cluster depends on the following factors:

- Number of nodes in the cluster: The response by the storage cluster differs for clusters with three or four nodes and for clusters with five or more nodes.
- Data replication factor: This factor is set during HX Data Platform installation and cannot be changed. The options are two or three redundant replicas of your data across the storage cluster.
- Access policy: This policy can be changed from the default setting after the storage cluster is created. The options are Strict, for protecting against data loss, and Lenient, to support longer storage cluster availability.
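The way these factors combine in Tables 3 and 4 can be sketched as a small lookup. The helper below is hypothetical, not HX Data Platform code, and encodes only the replication-factor-2 rows that the tables list:

```python
# Hypothetical sketch of the replication-factor-2 rows of Tables 3 and 4:
# with RF2, one failed node leaves a Lenient cluster read-write but a
# Strict cluster read-only; two simultaneous node failures shut the
# cluster down under either policy.
def cluster_state_rf2(access_policy: str, failed_nodes: int) -> str:
    if failed_nodes >= 2:
        return "shutdown"
    if failed_nodes == 1:
        return "read-write" if access_policy == "Lenient" else "read-only"
    return "read-write"  # no failures: fully operational
```

For example, `cluster_state_rf2("Strict", 1)` returns "read-only", matching the Strict row of Table 3.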
Cluster state and number of failed nodes
Tables 3 and 4 show how the storage cluster function changes depending on the number of simultaneous node failures.

Table 3  Cluster with five or more nodes: cluster state depending on the number of failed nodes
(The Read-write, Read only, and Shutdown columns give the number of simultaneous node failures that leaves the cluster in that state.)

Replication factor | Access policy | Read-write | Read only | Shutdown
2 | Lenient | 1 | – | 2
2 | Strict | – | 1 | 2
Table 4  Cluster with three or four nodes: cluster state depending on the number of failed nodes
(The Read-write, Read only, and Shutdown columns give the number of simultaneous node failures that leaves the cluster in that state.)

Replication factor | Access policy | Read-write | Read only | Shutdown
2 | Lenient | 1 | – | 2
2 | Strict | – | 1 | 2
Cluster state and number of nodes with failed disks
Table 5 shows how the storage cluster function changes with the number of nodes that have one or more failed disks. Note that the node itself has not failed, but disks within the node have failed. For example, a value of 2 indicates that two nodes each have at least one failed disk.

When the table refers to multiple disk failures, it refers to the disks used for storage capacity. For example, if a cache SSD fails on one node and a capacity SSD fails on another node, the storage cluster remains highly available, even with an access policy setting of Strict.

Table 5 lists the worst-case scenario for the listed number of failed disks. It applies to any storage cluster of three or more nodes.

Note: Cisco HyperFlex storage clusters can sustain serial disk failures (separate disk failures over time). The only requirement is that sufficient storage capacity must be available to support self-healing. The worst-case scenarios listed in Table 5 apply only during the small window in which the Cisco HyperFlex cluster is completing the automatic self-healing and rebalancing process.
Table 5  Cluster with three or more nodes: cluster state depending on the number of nodes with failed disks
(The Read-write, Read only, and Shutdown columns give the number of nodes with failed disks that leaves the cluster in that state.)

Replication factor | Access policy | Read-write | Read only | Shutdown
2 | Lenient | 1 | – | 2
2 | Strict | – | 1 | 2
Cluster access policy
The cluster access policy works with the data replication factor to set levels of data protection and data loss prevention. Two cluster access policy options are available. The default setting is Lenient; it is not configurable during installation, but it can be changed after installation and initial storage cluster configuration.

- Strict: This option applies policies that protect against data loss. If more than one node fails, or if one node and one or more disks on a different node fail (a simultaneous failure), the Strict setting helps protect your data.
- Lenient: This option applies policies that support longer storage cluster availability. This setting is the default.
Responses to storage cluster node failures
The storage cluster healing timeout is the length of time that the HX Data Platform plug-in waits before automatically healing the storage cluster. If a disk fails, the healing timeout is 1 minute. If a node fails, the healing timeout is 2 hours. The node failure timeout takes priority if a disk and a node fail at the same time, or if a disk fails after a node failure but before the healing process is finished.

When the cluster resiliency status is Warning, the HX Data Platform supports the storage cluster failures and responses listed in Table 6.

Optionally, click the associated Cluster Status or Resiliency Status icons in the HX Data Platform plug-in to display reason messages that explain what is contributing to the current state.
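The timeout rules above can be condensed into a small decision sketch. The function is illustrative only; the entity names are assumptions, not HX Data Platform identifiers:

```python
# Illustrative healing-timeout rule from the text above: a failed disk
# is healed after 1 minute, a failed node after 2 hours, and the node
# failure timeout takes priority when both fail together.
DISK_TIMEOUT_S = 60        # 1 minute
NODE_TIMEOUT_S = 2 * 3600  # 2 hours

def healing_timeout(pending_failures):
    """pending_failures: iterable of 'disk' / 'node' failure markers.
    Returns the seconds the cluster waits before self-healing."""
    failures = set(pending_failures)
    if "node" in failures:
        return NODE_TIMEOUT_S  # node failure timeout takes priority
    if "disk" in failures:
        return DISK_TIMEOUT_S
    return None  # nothing to heal
```

Note that per Table 6, a failed disk on a surviving node can still be healed independently while a failed node waits out its own timeout.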
Table 6  Storage cluster failures and responses (cluster size, number of simultaneous failures, entity that failed, and maintenance action to take)

- 3 nodes; 1 failure; 1 node: The storage cluster does not automatically heal. Replace the failed node to restore storage cluster health.
- 3 nodes; 2 failures; 2 or more disks on 2 nodes, blacklisted or failed: If one SSD fails or is removed, the disk is blacklisted immediately, and the storage cluster automatically begins healing within 1 minute. If more than one SSD fails, the system may not automatically restore storage cluster health; in that case, replace the faulty disks and restore the system by rebalancing the cluster.
- 4 nodes; 1 failure; 1 node: If the node does not recover in 2 hours, the storage cluster starts healing by rebalancing data on the remaining nodes. To recover the failed node immediately and fully restore the storage cluster, check that the node is powered on and restart it if possible (you may need to replace the node), and then rebalance the cluster.
- 4 nodes; 2 failures; 2 or more disks on 2 nodes: If 2 SSDs fail, the storage cluster does not automatically heal. If a disk does not recover in 1 minute, the storage cluster starts healing by rebalancing data on the remaining nodes.
- 5 or more nodes; 2 failures; up to 2 nodes: If a node does not recover in 2 hours, the storage cluster starts healing by rebalancing data on the remaining nodes. To recover a failed node immediately and fully restore the storage cluster, check that the node is powered on and restart it if possible (you may need to replace the node), and then rebalance the cluster. If the storage cluster shuts down, see the "Two Nodes Fail Simultaneously Causes the Storage Cluster to Shutdown" troubleshooting section.
- 5 or more nodes; 2 failures; 2 nodes with 2 or more disk failures on each node: The system automatically triggers a rebalance after 1 minute to restore storage cluster health.
- 5 or more nodes; 2 failures; 1 node and 1 or more disks on a different node: The storage cluster starts healing the failed disk (without touching the data on the failed node) by rebalancing data on the remaining nodes within 1 minute. If the failed node does not come back up after 2 hours, the storage cluster starts healing the failed node as well. To recover the failed node immediately and fully restore the storage cluster, check that the node is powered on and restart it if possible (you may need to replace the node), and then rebalance the cluster.
Cisco HyperFlex cluster health check and monitoring with Cisco HyperFlex Connect

The Cisco HyperFlex Connect user interface provides a view of the Cisco HyperFlex storage cluster's status, components, and features such as encryption and replication.

The main monitoring pages provide information about the local Cisco HyperFlex storage cluster:

- Dashboard: Shows the overall Cisco HyperFlex storage cluster status.
- Alarms, Events, and Activity: See the Cisco HyperFlex Systems Troubleshooting Reference Guide for details.
- Performance: Shows charts of I/O operations per second (IOPS), throughput, latency, and replication network bandwidth.
- System Information: Provides a system overview, plus status information and tasks for nodes and disks. See the Cisco HyperFlex Systems Troubleshooting Reference Guide for information about generating support bundles, the Storage Cluster Maintenance Operations Overview for information about entering and exiting maintenance mode, and Setting a Beacon for information about setting a node or disk beacon.
- Datastores: Provides status information and tasks related to data stores.
- Virtual Machines: Provides status information and tasks related to protection of virtual machines.

Additional Cisco HyperFlex Connect pages provide management access:

- Encryption: Use this page for data-at-rest disk and node encryption tasks. Refer to the section "Cisco HyperFlex Connect HTML 5 management webpage" for a screenshot.
- Replication: Use this page for disaster-recovery virtual machine protection tasks. Refer to the section "Cisco HyperFlex Connect HTML 5 management webpage" for a screenshot.
- Upgrade: This page provides access to HX Data Platform and Cisco UCS Manager firmware upgrade tasks.
Dashboard
The dashboard shows several elements (Figure 4):

- Cluster operational status, overall cluster health, and the cluster's current node failure tolerance
- Cluster storage capacity, used and free space, compression and deduplication savings, and overall cluster storage optimization statistics
- Cluster size and individual node health
- Cluster IOPS, storage throughput, and latency for the past hour
Figure 4 Dashboard view
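The dashboard's compression, deduplication, and overall optimization figures are related: independent fractional savings compose multiplicatively. The formula below is a common way to combine such ratios and is an assumption for illustration, not a statement of how the HX Data Platform computes its statistic:

```python
# Assumed composition of fractional savings: 50% compression applied on
# top of 20% deduplication leaves 0.5 * 0.8 = 0.4 of the raw data,
# i.e. 60% overall savings. Illustrative only.
def overall_savings(compression: float, dedup: float) -> float:
    """compression, dedup: fractional savings in [0, 1)."""
    return 1.0 - (1.0 - compression) * (1.0 - dedup)
```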
Monitor pages
Cisco HyperFlex Connect provides additional monitoring capabilities, including:

- Alarms: Cluster alarms can be viewed, acknowledged, and reset.
- Events: The cluster event log can be viewed, filtered for specific events, and exported.
- Activity: Recent job activity can be viewed and its status monitored.
System Information
The System Information page (Figure 5) provides a complete health snapshot of the cluster. It displays Cisco HyperFlex storage cluster system-related information, including node and disk data, and provides access to Cisco HyperFlex maintenance mode.
Figure 5 System Information page
Generating a support bundle using Cisco HyperFlex Connect
You can use the Cisco HyperFlex Connect user interface to generate a support bundle that collects the logs from every controller virtual machine and ESXi host in the local Cisco HyperFlex storage cluster. If you are using replication to protect your virtual machines and their data, you also need to generate a support bundle from the remote Cisco HyperFlex storage cluster. The vCenter logs are not collected through Cisco HyperFlex Connect.
Follow this procedure:

1. Log in to Cisco HyperFlex Connect.
2. In the banner, click Edit Settings (the gear icon) > Support Bundle. Alternatively, click System Information > Support Bundle.
3. When the support bundle is displayed, click supportBundle.zip to download the file to your local computer. Downloading the support bundle can take 1 to 2 hours.

You download an existing support bundle in the same way.
For more information

The following documents, available online, provide additional relevant information. If you do not have access to a document, please contact your Cisco representative.

Cisco HyperFlex reference documents:

- Cisco HyperFlex HXAF240c M5 (All Flash) spec sheet
- Cisco HyperFlex Systems Installation Guide for VMware ESXi, Release 3.0
- Cisco HyperFlex Data Platform Administration Guide, Release 3.0
- Cisco HyperFlex Systems Troubleshooting Reference Guide, Release 3.0
- Cisco HyperFlex technical support documentation

VMware reference documents:

- Performance Best Practices for VMware vSphere 6.0
- SAP Solutions on VMware Best Practices Guide
- SAP HANA on VMware vSphere Wiki
Printed in USA C11-741524-00 1018
White Paper
copy 2018 Cisco andor its affiliates All rights reserved This document is Cisco Public Information Page 8 of 13
For SAP HANA 20 installations use either hdbsql or the SQL function in SAP HANA Studio or cockpit and the following SQL commands
ALTER SYSTEM ALTER CONFIGURATION (globalini SYSTEM) SET (fileio fileiomax_parallel_io_requests[Data]) = 256 WITH RECONFIGURE
ALTER SYSTEM ALTER CONFIGURATION (globalini SYSTEM) SET (fileio fileio async_write_submit_active [Data]) = Auto WITH RECONFIGURE
For more information refer to SAP Note 2399079 Elimination of hdbparam in HANA 2
Cisco HyperFlex HX Data Platform high availability The HX Data Platform high-availability feature helps ensure that the storage cluster maintains at least two copies of all your data during normal operation with three or more fully functional nodes
If nodes or disks in the storage cluster fail the clusters ability to function is affected If more than one node fails or one node and one or more disks on a different node fail the situation is called a simultaneous failure
The number of nodes in the storage cluster in combination with the data replication factor and access policy settings determines the state of the storage cluster that results from node failures
Before using the HX Data Platform high-availability feature enable VMware Distributed Resource Scheduler (DRS) and vMotion on the vSphere Web Client
Cisco HyperFlex HX Data Platform cluster tolerated failures
If nodes or disks in the storage cluster fail the clusters ability to function is affected If more than one node fails or one node and one or more disks on a different node fail the situation is called a simultaneous failure
How the number of node failures affects the storage cluster depends on the following
Number of nodes in the cluster The response by the storage cluster is different for clusters with three or four nodes than for clusters with five or more nodes
Data replication factor This factor is set during HX Data Platform installation and cannot be changed The options are either two or three redundant replicas of your data across the storage cluster
Access policy This policy can be changed from the default setting after the storage cluster is created The options are Strict for protecting against data loss and Lenient to support longer storage cluster availability
Cluster state and number of failed nodes
Tables 3 and 4 show how the storage cluster function changes depending on the number of simultaneous node failures
Table 3 Cluster with five or more nodes Cluster state depending on the number of failed nodes
Replication factor Access policy Number of failed nodes
Read-write Read only Shutdown
2 Lenient 1 ndash 2
2 Strict ndash 1 2
Table 4 Cluster with three or four nodes Cluster state depending on the number of failed nodes
White Paper
copy 2018 Cisco andor its affiliates All rights reserved This document is Cisco Public Information Page 9 of 13
Replication factor Access policy Number of failed nodes
Read-write Read only Shutdown
2 Lenient 1 ndash 2
2 Strict ndash 1 2
Cluster state and number of nodes with failed disks
Table 5 shows how the storage cluster function changes with the number of nodes that have one or more failed disks Note that the node itself has not failed but disks within the node have failed For example 2 indicates that there are two nodes that each have at least one failed disk
When the table refers to multiple disk failures it is referring to the disks used for storage capacity For example if a cache SSD fails on one node and a capacity SSD fails on another node the storage cluster remains highly available even with an access policy setting of Strict
Table 5 lists the worst case scenario for the listed number of failed disks It applies to any storage cluster of three or more nodes
Note Cisco HyperFlex storage clusters can sustain serial disk failures (separate disk failures over time) The only requirement is that sufficient storage capacity must be available to support self-healing The worst-case scenarios listed in Table 5 apply only during the small window in which the Cisco HyperFlex cluster is completing the automatic self-healing and rebalancing process
Table 5 Cluster with three or more nodes Cluster state depending on the number of nodes with failed disks
Replication factor Access policy Failed disks on number of different nodes
Read-write Read only Shutdown
2 Lenient 1 ndash 2
2 Strict ndash 1 2
Cluster access policy
The cluster access policy works with the data replication factor to set levels of data protection and data loss prevention Two cluster access policy options are available The default setting is Lenient This setting is not configurable during installation but it can be changed after installation and initial storage cluster configuration
Strict This option applies policies to protect against data loss If nodes or disks in the storage cluster fail the clusters ability to function is affected If more than one node fails or one node and one or more disks on a different node fail this situation is called a simultaneous failure The Strict setting helps protect the data in the event of simultaneous failures
Lenient This option applies policies to support longer storage cluster availability This setting is the default
Responses to storage cluster node failures
The storage cluster healing timeout is the length of time that the HX Data Platform plug-in waits before automatically healing the storage cluster If a disk fails the healing timeout is 1 minute If a node fails the healing timeout is 2 hours A node failure timeout takes priority if a disk and a node fail at same time or if a disk fails after a node failure but before the healing process is finished
When the cluster resiliency status is Warning the HX Data Platform supports the storage cluster failures and responses listed in Table 6
Optionally click the associated Cluster Status or Resiliency Status icons in the HX Data Platform plug-in to display reason messages that explain what is contributing to the current state
White Paper
copy 2018 Cisco andor its affiliates All rights reserved This document is Cisco Public Information Page 10 of 13
Table 6 Storage cluster failures and responses
Cluster size Number of simultaneous failures
Entity that failed Maintenance action to take
3 nodes 1 1 node The storage cluster does not automatically heal Replace the failed node to restore storage cluster health
3 nodes 2 2 or more disks on 2 nodes blacklisted or failed
If one SSD fails or is removed the disk is blacklisted immediately The storage cluster automatically begins healing within 1 minute
If more than one SSD fails the system may not automatically restore storage cluster health If the system is not restored replace the faulty disks and restore the system by rebalancing the cluster
4 nodes 1 1 node If the node does not recover in 2 hours the storage cluster starts healing by rebalancing data on the remaining nodes
To recover the failed node immediately and fully restore the storage cluster Check that the node is powered on and restart it if possible You
may need to replace the node Rebalance the cluster
4 nodes 2 2 or more disks on 2 nodes If 2 SSDs fail the storage cluster does not automatically heal
If the disk does not recover in 1 minute the storage cluster starts healing by rebalancing data on the remaining nodes
5 or more nodes 2 Up to 2 nodes If the node does not recover in 2 hours the storage cluster starts healing by rebalancing data on the remaining nodes
To recover the failed node immediately and fully restore the storage cluster Check that the node is powered on and restart it if possible You
may need to replace the node Rebalance the cluster
If the storage cluster shuts down see Troubleshooting Two Nodes Fail Simultaneously Causes the Storage Cluster to Shutdown section
5 or more nodes 2 2 nodes with 2 or more disk failures on each node
The system automatically triggers a rebalance after 1 minute to restore storage cluster health
5 or more nodes 2 1 node and 1 or more disks on a different node
If the disk does not recover in 1 minute the storage cluster starts healing by rebalancing data on the remaining nodes
If the node does not recover in 2 hours the storage cluster starts healing by rebalancing data on the remaining nodes
If a node in the storage cluster fails and a disk on a different node also fails the storage cluster starts healing the failed disk (without touching the data on the failed node) in 1 minute If the failed node does not come back up after 2 hours the storage cluster starts healing the failed node as well
To recover the failed node immediately and fully restore the storage cluster Check that the node is powered on and restart it if possible You
may need to replace the node Rebalance the cluster
Cisco HyperFlex cluster health check and monitoring with Cisco HyperFlex Connect The Cisco HyperFlex Connect user interface provides a view of the Cisco HyperFlex storage cluster status components and features such as encryption and replication
The main monitoring pages provide information about the local Cisco HyperFlex storage cluster
Dashboard The dashboard shows the overall Cisco HyperFlex storage cluster status
Alarms Events and Activity See the Cisco HyperFlex Systems Troubleshooting Reference Guide for details
White Paper
copy 2018 Cisco andor its affiliates All rights reserved This document is Cisco Public Information Page 11 of 13
Performance See charts showing IO operations per second (IOPS) throughput latency and replication network bandwidth
System Information Get a system overview plus view status information and tasks for nodes and disks See the Cisco HyperFlex Systems Troubleshooting Reference Guide for information about generating support bundles see Storage Cluster Maintenance Operations Overview for information about entering and exiting maintenance mode and see Setting a Beacon for information about setting a node or disk beacon
Datastores View status information and tasks related to data stores
Virtual Machines View status information and tasks related to protection of virtual machines
Additional Cisco HyperFlex Connect pages provide management access
Encryption Use this page for data at rest disk and node encryption tasks Refer the section Cisco HyperFlex Connect HTML 5 management webpage for the screenshot
Replication Use this page for disaster recovery virtual machine protection tasks Refer the section Cisco HyperFlex Connect HTML 5 management webpage for the screenshot
Upgrade The Upgrade page provides access to HX Data Platform and Cisco UCS Manager firmware upgrade tasks
Dashboard
The dashboard shows several elements (Figure 4)
Cluster operational status overall cluster health and the clusterrsquos current node failure tolerance
Cluster storage capacity used and free space compression and deduplication savings and overall cluster storage optimization statistics
Cluster size and individual node health
Cluster IOPS storage throughput and latency for the past 1 hour
Figure 4 Dashboard view
White Paper
copy 2018 Cisco andor its affiliates All rights reserved This document is Cisco Public Information Page 12 of 13
Monitor pages
Cisco HyperFlex Connect provides for additional monitoring capabilities including
Alarms Cluster alarms can be viewed acknowledged and reset
Events The cluster event log can be viewed specific events can be filtered for and the log can be exported
Activity Recent job activity can be viewed and the status can be monitored
System Information
The System Information page (Figure 5) provides a complete health snapshot of the cluster This page displays Cisco HyperFlex storage cluster system-related information including node and disk data It also provides access to Cisco HyperFlex maintenance mode
Figure 5 System Information page
Generating a support bundle using Cisco HyperFlex Connect
You can use the Cisco HyperFlex Connect user interface to generate a support bundle that collects the logs from every controller virtual machine and ESXi host in the local Cisco HyperFlex storage cluster If you are using replication to protect your virtual machines and their data when you need to generate a support bundle you also need to generate a support bundle from the remote Cisco HyperFlex storage cluster The vCenter logs are not collected through Cisco HyperFlex Connect
Follow this procedure
1 Log in to Cisco HyperFlex Connect
2 In the banner click Edit Settings (the gear icon) gt Support Bundle Alternatively click System Information gt Support Bundle
3 When the support bundle is displayed click supportBundlezip to download the file to your local computer Downloading the support bundle can take 1 to 2 hours
You download an existing support bundle in the same way
White Paper
copy 2018 Cisco andor its affiliates All rights reserved This document is Cisco Public Information Page 13 of 13
For more information The following documents available online provide additional relevant information If you do not have access to a document please contact your Cisco representative
Cisco HyperFlex reference documents
Cisco HyperFlex HXAF240c M5 (All Flash) spec sheet
Cisco HyperFlex Systems Installation Guide for VMware ESXi Release 30
Cisco HyperFlex Data Platform Administration Guide Release 30
Cisco HyperFlex Systems Troubleshooting Reference Guide 30
Cisco HyperFlex technical support documentation
VMware reference documents
Performance Best Practices for VMware vSphere 60
SAP Solutions on VMware Best Practices Guide
SAP HANA on VMware vSphere Wiki
Printed in USA C11-741524-00 1018
White Paper
copy 2018 Cisco andor its affiliates All rights reserved This document is Cisco Public Information Page 8 of 13
For SAP HANA 20 installations use either hdbsql or the SQL function in SAP HANA Studio or cockpit and the following SQL commands
ALTER SYSTEM ALTER CONFIGURATION (globalini SYSTEM) SET (fileio fileiomax_parallel_io_requests[Data]) = 256 WITH RECONFIGURE
ALTER SYSTEM ALTER CONFIGURATION (globalini SYSTEM) SET (fileio fileio async_write_submit_active [Data]) = Auto WITH RECONFIGURE
For more information refer to SAP Note 2399079 Elimination of hdbparam in HANA 2
Cisco HyperFlex HX Data Platform high availability The HX Data Platform high-availability feature helps ensure that the storage cluster maintains at least two copies of all your data during normal operation with three or more fully functional nodes
If nodes or disks in the storage cluster fail the clusters ability to function is affected If more than one node fails or one node and one or more disks on a different node fail the situation is called a simultaneous failure
The number of nodes in the storage cluster in combination with the data replication factor and access policy settings determines the state of the storage cluster that results from node failures
Before using the HX Data Platform high-availability feature enable VMware Distributed Resource Scheduler (DRS) and vMotion on the vSphere Web Client
Cisco HyperFlex HX Data Platform cluster tolerated failures
If nodes or disks in the storage cluster fail the clusters ability to function is affected If more than one node fails or one node and one or more disks on a different node fail the situation is called a simultaneous failure
How the number of node failures affects the storage cluster depends on the following
Number of nodes in the cluster The response by the storage cluster is different for clusters with three or four nodes than for clusters with five or more nodes
Data replication factor This factor is set during HX Data Platform installation and cannot be changed The options are either two or three redundant replicas of your data across the storage cluster
Access policy This policy can be changed from the default setting after the storage cluster is created The options are Strict for protecting against data loss and Lenient to support longer storage cluster availability
Cluster state and number of failed nodes
Tables 3 and 4 show how the storage cluster function changes depending on the number of simultaneous node failures
Table 3 Cluster with five or more nodes Cluster state depending on the number of failed nodes
Replication factor Access policy Number of failed nodes
Read-write Read only Shutdown
2 Lenient 1 ndash 2
2 Strict ndash 1 2
Table 4 Cluster with three or four nodes Cluster state depending on the number of failed nodes
White Paper
copy 2018 Cisco andor its affiliates All rights reserved This document is Cisco Public Information Page 9 of 13
Replication factor Access policy Number of failed nodes
Read-write Read only Shutdown
2 Lenient 1 ndash 2
2 Strict ndash 1 2
Cluster state and number of nodes with failed disks
Table 5 shows how the storage cluster function changes with the number of nodes that have one or more failed disks Note that the node itself has not failed but disks within the node have failed For example 2 indicates that there are two nodes that each have at least one failed disk
When the table refers to multiple disk failures it is referring to the disks used for storage capacity For example if a cache SSD fails on one node and a capacity SSD fails on another node the storage cluster remains highly available even with an access policy setting of Strict
Table 5 lists the worst case scenario for the listed number of failed disks It applies to any storage cluster of three or more nodes
Note Cisco HyperFlex storage clusters can sustain serial disk failures (separate disk failures over time) The only requirement is that sufficient storage capacity must be available to support self-healing The worst-case scenarios listed in Table 5 apply only during the small window in which the Cisco HyperFlex cluster is completing the automatic self-healing and rebalancing process
Table 5 Cluster with three or more nodes Cluster state depending on the number of nodes with failed disks
Replication factor Access policy Failed disks on number of different nodes
Read-write Read only Shutdown
2 Lenient 1 ndash 2
2 Strict ndash 1 2
Cluster access policy
The cluster access policy works with the data replication factor to set levels of data protection and data loss prevention. Two cluster access policy options are available; the default setting is Lenient. The setting is not configurable during installation, but it can be changed after installation and initial storage cluster configuration.
- Strict: This option applies policies that protect against data loss. If nodes or disks in the storage cluster fail, the cluster's ability to function is affected. If more than one node fails, or if one node and one or more disks on a different node fail, the situation is called a simultaneous failure. The Strict setting helps protect the data in the event of simultaneous failures.
- Lenient: This option applies policies that favor longer storage cluster availability. This setting is the default.
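The interaction between access policy and failure count can be summarized in a few lines. The sketch below is an illustrative model only (not a Cisco API or tool); it simply encodes the replication factor 2 rows of Tables 3 through 5, mapping an access policy and a failure count to the resulting cluster state.

```python
# Illustrative model, not a Cisco API: encodes the replication-factor-2
# rows of Tables 3-5.
def cluster_state(access_policy: str, failures: int) -> str:
    """Cluster state for replication factor 2.

    `failures` is the number of simultaneously failed nodes (Tables 3
    and 4) or the number of nodes with failed capacity disks (Table 5).
    """
    if failures <= 0:
        return "read-write"            # healthy cluster
    policy = access_policy.lower()
    if policy == "lenient":
        # Lenient favors availability: stay read-write through one failure.
        return "read-write" if failures == 1 else "shutdown"
    if policy == "strict":
        # Strict favors data protection: drop to read-only on one failure.
        return "read-only" if failures == 1 else "shutdown"
    raise ValueError("access policy must be 'Lenient' or 'Strict'")
```

For example, `cluster_state("Strict", 1)` returns `"read-only"`, matching the Strict row of Table 3, while the same single failure under Lenient leaves the cluster read-write.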
Responses to storage cluster node failures
The storage cluster healing timeout is the length of time that the HX Data Platform plug-in waits before automatically healing the storage cluster. If a disk fails, the healing timeout is 1 minute. If a node fails, the healing timeout is 2 hours. The node failure timeout takes priority if a disk and a node fail at the same time, or if a disk fails after a node failure but before the healing process is finished.
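The timeout rule above is small enough to express directly. The following sketch, with the timeout values taken from this section, assumes nothing beyond what the paragraph states; the function name is illustrative, not a Cisco interface.

```python
# Sketch of the healing-timeout rule: a failed disk waits 1 minute, a
# failed node 2 hours, and the node timeout takes priority when both fail.
DISK_TIMEOUT_MINUTES = 1        # healing timeout after a disk failure
NODE_TIMEOUT_MINUTES = 2 * 60   # healing timeout after a node failure

def healing_timeout_minutes(disk_failed: bool, node_failed: bool) -> int:
    """Minutes the HX Data Platform waits before self-healing starts."""
    if node_failed:             # node-failure timeout takes priority
        return NODE_TIMEOUT_MINUTES
    if disk_failed:
        return DISK_TIMEOUT_MINUTES
    return 0                    # nothing failed: no healing needed
```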
When the cluster resiliency status is Warning, the HX Data Platform supports the storage cluster failures and responses listed in Table 6.
Optionally, click the associated Cluster Status or Resiliency Status icon in the HX Data Platform plug-in to display messages that explain what is contributing to the current state.
Table 6. Storage cluster failures and responses

| Cluster size | Number of simultaneous failures | Entity that failed | Maintenance action to take |
|---|---|---|---|
| 3 nodes | 1 | 1 node | The storage cluster does not automatically heal. Replace the failed node to restore storage cluster health. |
| 3 nodes | 2 | 2 or more disks on 2 nodes (blacklisted or failed) | If one SSD fails or is removed, the disk is blacklisted immediately, and the storage cluster automatically begins healing within 1 minute. If more than one SSD fails, the system may not automatically restore storage cluster health; in that case, replace the faulty disks and restore the system by rebalancing the cluster. |
| 4 nodes | 1 | 1 node | If the node does not recover in 2 hours, the storage cluster starts healing by rebalancing data on the remaining nodes. To recover the failed node immediately and fully restore the storage cluster: check that the node is powered on and restart it if possible (you may need to replace the node), and then rebalance the cluster. |
| 4 nodes | 2 | 2 or more disks on 2 nodes | If 2 SSDs fail, the storage cluster does not automatically heal. If a disk does not recover in 1 minute, the storage cluster starts healing by rebalancing data on the remaining nodes. |
| 5 or more nodes | 2 | Up to 2 nodes | If a node does not recover in 2 hours, the storage cluster starts healing by rebalancing data on the remaining nodes. To recover a failed node immediately and fully restore the storage cluster: check that the node is powered on and restart it if possible (you may need to replace the node), and then rebalance the cluster. If the storage cluster shuts down, see the "Two Nodes Fail Simultaneously Causes the Storage Cluster to Shutdown" troubleshooting section. |
| 5 or more nodes | 2 | 2 nodes with 2 or more disk failures on each node | The system automatically triggers a rebalance after 1 minute to restore storage cluster health. |
| 5 or more nodes | 2 | 1 node and 1 or more disks on a different node | If a node fails and a disk on a different node also fails, the storage cluster starts healing the failed disk (without touching the data on the failed node) in 1 minute. If the failed node does not come back up after 2 hours, the storage cluster starts healing the failed node as well by rebalancing data on the remaining nodes. To recover the failed node immediately and fully restore the storage cluster: check that the node is powered on and restart it if possible (you may need to replace the node), and then rebalance the cluster. |
Cisco HyperFlex cluster health check and monitoring with Cisco HyperFlex Connect
The Cisco HyperFlex Connect user interface provides a view of the Cisco HyperFlex storage cluster status, components, and features such as encryption and replication.
The main monitoring pages provide information about the local Cisco HyperFlex storage cluster:
- Dashboard: Shows the overall Cisco HyperFlex storage cluster status.
- Alarms, Events, and Activity: See the Cisco HyperFlex Systems Troubleshooting Reference Guide for details.
- Performance: Shows charts of I/O operations per second (IOPS), throughput, latency, and replication network bandwidth.
- System Information: Provides a system overview plus status information and tasks for nodes and disks. See the Cisco HyperFlex Systems Troubleshooting Reference Guide for information about generating support bundles, the Storage Cluster Maintenance Operations Overview for information about entering and exiting maintenance mode, and Setting a Beacon for information about setting a node or disk beacon.
- Datastores: Shows status information and tasks related to data stores.
- Virtual Machines: Shows status information and tasks related to the protection of virtual machines.
Additional Cisco HyperFlex Connect pages provide management access:
- Encryption: Use this page for data-at-rest disk and node encryption tasks. Refer to the section "Cisco HyperFlex Connect HTML 5 management webpage" for a screenshot.
- Replication: Use this page for disaster-recovery virtual machine protection tasks. Refer to the section "Cisco HyperFlex Connect HTML 5 management webpage" for a screenshot.
- Upgrade: Provides access to HX Data Platform and Cisco UCS Manager firmware upgrade tasks.
Dashboard
The dashboard shows several elements (Figure 4):
- Cluster operational status, overall cluster health, and the cluster's current node failure tolerance
- Cluster storage capacity, used and free space, compression and deduplication savings, and overall cluster storage optimization statistics
- Cluster size and individual node health
- Cluster IOPS, storage throughput, and latency for the past hour
Figure 4. Dashboard view
Monitor pages
Cisco HyperFlex Connect provides additional monitoring capabilities, including:
- Alarms: Cluster alarms can be viewed, acknowledged, and reset.
- Events: The cluster event log can be viewed, filtered for specific events, and exported.
- Activity: Recent job activity can be viewed and its status monitored.
System Information
The System Information page (Figure 5) provides a complete health snapshot of the cluster. This page displays Cisco HyperFlex storage cluster system-related information, including node and disk data. It also provides access to Cisco HyperFlex maintenance mode.
Figure 5. System Information page
Generating a support bundle using Cisco HyperFlex Connect
You can use the Cisco HyperFlex Connect user interface to generate a support bundle that collects the logs from every controller virtual machine and ESXi host in the local Cisco HyperFlex storage cluster. If you are using replication to protect your virtual machines and their data, you also need to generate a support bundle from the remote Cisco HyperFlex storage cluster. The vCenter logs are not collected through Cisco HyperFlex Connect.
Follow this procedure:
1. Log in to Cisco HyperFlex Connect.
2. In the banner, click Edit Settings (the gear icon) > Support Bundle. Alternatively, click System Information > Support Bundle.
3. When the support bundle is displayed, click supportBundle.zip to download the file to your local computer. Downloading the support bundle can take 1 to 2 hours.
You download an existing support bundle in the same way.
For more information
The following documents, available online, provide additional relevant information. If you do not have access to a document, please contact your Cisco representative.
Cisco HyperFlex reference documents:
- Cisco HyperFlex HXAF240c M5 (All Flash) spec sheet
- Cisco HyperFlex Systems Installation Guide for VMware ESXi, Release 3.0
- Cisco HyperFlex Data Platform Administration Guide, Release 3.0
- Cisco HyperFlex Systems Troubleshooting Reference Guide 3.0
- Cisco HyperFlex technical support documentation
VMware reference documents:
- Performance Best Practices for VMware vSphere 6.0
- SAP Solutions on VMware Best Practices Guide
- SAP HANA on VMware vSphere Wiki
Printed in USA C11-741524-00 1018
White Paper
copy 2018 Cisco andor its affiliates All rights reserved This document is Cisco Public Information Page 9 of 13
Replication factor Access policy Number of failed nodes
Read-write Read only Shutdown
2 Lenient 1 ndash 2
2 Strict ndash 1 2
Cluster state and number of nodes with failed disks
Table 5 shows how the storage cluster function changes with the number of nodes that have one or more failed disks Note that the node itself has not failed but disks within the node have failed For example 2 indicates that there are two nodes that each have at least one failed disk
When the table refers to multiple disk failures it is referring to the disks used for storage capacity For example if a cache SSD fails on one node and a capacity SSD fails on another node the storage cluster remains highly available even with an access policy setting of Strict
Table 5 lists the worst case scenario for the listed number of failed disks It applies to any storage cluster of three or more nodes
Note Cisco HyperFlex storage clusters can sustain serial disk failures (separate disk failures over time) The only requirement is that sufficient storage capacity must be available to support self-healing The worst-case scenarios listed in Table 5 apply only during the small window in which the Cisco HyperFlex cluster is completing the automatic self-healing and rebalancing process
Table 5 Cluster with three or more nodes Cluster state depending on the number of nodes with failed disks
Replication factor Access policy Failed disks on number of different nodes
Read-write Read only Shutdown
2 Lenient 1 ndash 2
2 Strict ndash 1 2
Cluster access policy
The cluster access policy works with the data replication factor to set levels of data protection and data loss prevention Two cluster access policy options are available The default setting is Lenient This setting is not configurable during installation but it can be changed after installation and initial storage cluster configuration
Strict This option applies policies to protect against data loss If nodes or disks in the storage cluster fail the clusters ability to function is affected If more than one node fails or one node and one or more disks on a different node fail this situation is called a simultaneous failure The Strict setting helps protect the data in the event of simultaneous failures
Lenient This option applies policies to support longer storage cluster availability This setting is the default
Responses to storage cluster node failures
The storage cluster healing timeout is the length of time that the HX Data Platform plug-in waits before automatically healing the storage cluster If a disk fails the healing timeout is 1 minute If a node fails the healing timeout is 2 hours A node failure timeout takes priority if a disk and a node fail at same time or if a disk fails after a node failure but before the healing process is finished
When the cluster resiliency status is Warning the HX Data Platform supports the storage cluster failures and responses listed in Table 6
Optionally click the associated Cluster Status or Resiliency Status icons in the HX Data Platform plug-in to display reason messages that explain what is contributing to the current state
White Paper
copy 2018 Cisco andor its affiliates All rights reserved This document is Cisco Public Information Page 10 of 13
Table 6 Storage cluster failures and responses
Cluster size Number of simultaneous failures
Entity that failed Maintenance action to take
3 nodes 1 1 node The storage cluster does not automatically heal Replace the failed node to restore storage cluster health
3 nodes 2 2 or more disks on 2 nodes blacklisted or failed
If one SSD fails or is removed the disk is blacklisted immediately The storage cluster automatically begins healing within 1 minute
If more than one SSD fails the system may not automatically restore storage cluster health If the system is not restored replace the faulty disks and restore the system by rebalancing the cluster
4 nodes 1 1 node If the node does not recover in 2 hours the storage cluster starts healing by rebalancing data on the remaining nodes
To recover the failed node immediately and fully restore the storage cluster Check that the node is powered on and restart it if possible You
may need to replace the node Rebalance the cluster
4 nodes 2 2 or more disks on 2 nodes If 2 SSDs fail the storage cluster does not automatically heal
If the disk does not recover in 1 minute the storage cluster starts healing by rebalancing data on the remaining nodes
5 or more nodes 2 Up to 2 nodes If the node does not recover in 2 hours the storage cluster starts healing by rebalancing data on the remaining nodes
To recover the failed node immediately and fully restore the storage cluster Check that the node is powered on and restart it if possible You
may need to replace the node Rebalance the cluster
If the storage cluster shuts down see Troubleshooting Two Nodes Fail Simultaneously Causes the Storage Cluster to Shutdown section
5 or more nodes 2 2 nodes with 2 or more disk failures on each node
The system automatically triggers a rebalance after 1 minute to restore storage cluster health
5 or more nodes 2 1 node and 1 or more disks on a different node
If the disk does not recover in 1 minute the storage cluster starts healing by rebalancing data on the remaining nodes
If the node does not recover in 2 hours the storage cluster starts healing by rebalancing data on the remaining nodes
If a node in the storage cluster fails and a disk on a different node also fails the storage cluster starts healing the failed disk (without touching the data on the failed node) in 1 minute If the failed node does not come back up after 2 hours the storage cluster starts healing the failed node as well
To recover the failed node immediately and fully restore the storage cluster Check that the node is powered on and restart it if possible You
may need to replace the node Rebalance the cluster
Cisco HyperFlex cluster health check and monitoring with Cisco HyperFlex Connect The Cisco HyperFlex Connect user interface provides a view of the Cisco HyperFlex storage cluster status components and features such as encryption and replication
The main monitoring pages provide information about the local Cisco HyperFlex storage cluster
Dashboard The dashboard shows the overall Cisco HyperFlex storage cluster status
Alarms Events and Activity See the Cisco HyperFlex Systems Troubleshooting Reference Guide for details
White Paper
copy 2018 Cisco andor its affiliates All rights reserved This document is Cisco Public Information Page 11 of 13
Performance See charts showing IO operations per second (IOPS) throughput latency and replication network bandwidth
System Information Get a system overview plus view status information and tasks for nodes and disks See the Cisco HyperFlex Systems Troubleshooting Reference Guide for information about generating support bundles see Storage Cluster Maintenance Operations Overview for information about entering and exiting maintenance mode and see Setting a Beacon for information about setting a node or disk beacon
Datastores View status information and tasks related to data stores
Virtual Machines View status information and tasks related to protection of virtual machines
Additional Cisco HyperFlex Connect pages provide management access
Encryption Use this page for data at rest disk and node encryption tasks Refer the section Cisco HyperFlex Connect HTML 5 management webpage for the screenshot
Replication Use this page for disaster recovery virtual machine protection tasks Refer the section Cisco HyperFlex Connect HTML 5 management webpage for the screenshot
Upgrade The Upgrade page provides access to HX Data Platform and Cisco UCS Manager firmware upgrade tasks
Dashboard
The dashboard shows several elements (Figure 4)
Cluster operational status overall cluster health and the clusterrsquos current node failure tolerance
Cluster storage capacity used and free space compression and deduplication savings and overall cluster storage optimization statistics
Cluster size and individual node health
Cluster IOPS storage throughput and latency for the past 1 hour
Figure 4 Dashboard view
White Paper
copy 2018 Cisco andor its affiliates All rights reserved This document is Cisco Public Information Page 12 of 13
Monitor pages
Cisco HyperFlex Connect provides for additional monitoring capabilities including
Alarms Cluster alarms can be viewed acknowledged and reset
Events The cluster event log can be viewed specific events can be filtered for and the log can be exported
Activity Recent job activity can be viewed and the status can be monitored
System Information
The System Information page (Figure 5) provides a complete health snapshot of the cluster This page displays Cisco HyperFlex storage cluster system-related information including node and disk data It also provides access to Cisco HyperFlex maintenance mode
Figure 5 System Information page
Generating a support bundle using Cisco HyperFlex Connect
You can use the Cisco HyperFlex Connect user interface to generate a support bundle that collects the logs from every controller virtual machine and ESXi host in the local Cisco HyperFlex storage cluster If you are using replication to protect your virtual machines and their data when you need to generate a support bundle you also need to generate a support bundle from the remote Cisco HyperFlex storage cluster The vCenter logs are not collected through Cisco HyperFlex Connect
Follow this procedure
1 Log in to Cisco HyperFlex Connect
2 In the banner click Edit Settings (the gear icon) gt Support Bundle Alternatively click System Information gt Support Bundle
3 When the support bundle is displayed click supportBundlezip to download the file to your local computer Downloading the support bundle can take 1 to 2 hours
You download an existing support bundle in the same way
White Paper
copy 2018 Cisco andor its affiliates All rights reserved This document is Cisco Public Information Page 13 of 13
For more information The following documents available online provide additional relevant information If you do not have access to a document please contact your Cisco representative
Cisco HyperFlex reference documents
Cisco HyperFlex HXAF240c M5 (All Flash) spec sheet
Cisco HyperFlex Systems Installation Guide for VMware ESXi Release 30
Cisco HyperFlex Data Platform Administration Guide Release 30
Cisco HyperFlex Systems Troubleshooting Reference Guide 30
Cisco HyperFlex technical support documentation
VMware reference documents
Performance Best Practices for VMware vSphere 60
SAP Solutions on VMware Best Practices Guide
SAP HANA on VMware vSphere Wiki
Printed in USA C11-741524-00 1018
White Paper
copy 2018 Cisco andor its affiliates All rights reserved This document is Cisco Public Information Page 10 of 13
Table 6 Storage cluster failures and responses
Cluster size Number of simultaneous failures
Entity that failed Maintenance action to take
3 nodes 1 1 node The storage cluster does not automatically heal Replace the failed node to restore storage cluster health
3 nodes 2 2 or more disks on 2 nodes blacklisted or failed
If one SSD fails or is removed the disk is blacklisted immediately The storage cluster automatically begins healing within 1 minute
If more than one SSD fails the system may not automatically restore storage cluster health If the system is not restored replace the faulty disks and restore the system by rebalancing the cluster
4 nodes 1 1 node If the node does not recover in 2 hours the storage cluster starts healing by rebalancing data on the remaining nodes
To recover the failed node immediately and fully restore the storage cluster Check that the node is powered on and restart it if possible You
may need to replace the node Rebalance the cluster
4 nodes 2 2 or more disks on 2 nodes If 2 SSDs fail the storage cluster does not automatically heal
If the disk does not recover in 1 minute the storage cluster starts healing by rebalancing data on the remaining nodes
5 or more nodes 2 Up to 2 nodes If the node does not recover in 2 hours the storage cluster starts healing by rebalancing data on the remaining nodes
To recover the failed node immediately and fully restore the storage cluster Check that the node is powered on and restart it if possible You
may need to replace the node Rebalance the cluster
If the storage cluster shuts down see Troubleshooting Two Nodes Fail Simultaneously Causes the Storage Cluster to Shutdown section
5 or more nodes 2 2 nodes with 2 or more disk failures on each node
The system automatically triggers a rebalance after 1 minute to restore storage cluster health
5 or more nodes 2 1 node and 1 or more disks on a different node
If the disk does not recover in 1 minute the storage cluster starts healing by rebalancing data on the remaining nodes
If the node does not recover in 2 hours the storage cluster starts healing by rebalancing data on the remaining nodes
If a node in the storage cluster fails and a disk on a different node also fails the storage cluster starts healing the failed disk (without touching the data on the failed node) in 1 minute If the failed node does not come back up after 2 hours the storage cluster starts healing the failed node as well
To recover the failed node immediately and fully restore the storage cluster Check that the node is powered on and restart it if possible You
may need to replace the node Rebalance the cluster
Cisco HyperFlex cluster health check and monitoring with Cisco HyperFlex Connect The Cisco HyperFlex Connect user interface provides a view of the Cisco HyperFlex storage cluster status components and features such as encryption and replication
The main monitoring pages provide information about the local Cisco HyperFlex storage cluster
Dashboard The dashboard shows the overall Cisco HyperFlex storage cluster status
Alarms Events and Activity See the Cisco HyperFlex Systems Troubleshooting Reference Guide for details
White Paper
copy 2018 Cisco andor its affiliates All rights reserved This document is Cisco Public Information Page 11 of 13
Performance See charts showing IO operations per second (IOPS) throughput latency and replication network bandwidth
System Information Get a system overview plus view status information and tasks for nodes and disks See the Cisco HyperFlex Systems Troubleshooting Reference Guide for information about generating support bundles see Storage Cluster Maintenance Operations Overview for information about entering and exiting maintenance mode and see Setting a Beacon for information about setting a node or disk beacon
Datastores View status information and tasks related to data stores
Virtual Machines View status information and tasks related to protection of virtual machines
Additional Cisco HyperFlex Connect pages provide management access
Encryption Use this page for data at rest disk and node encryption tasks Refer the section Cisco HyperFlex Connect HTML 5 management webpage for the screenshot
Replication Use this page for disaster recovery virtual machine protection tasks Refer the section Cisco HyperFlex Connect HTML 5 management webpage for the screenshot
Upgrade The Upgrade page provides access to HX Data Platform and Cisco UCS Manager firmware upgrade tasks
Dashboard
The dashboard shows several elements (Figure 4)
Cluster operational status overall cluster health and the clusterrsquos current node failure tolerance
Cluster storage capacity used and free space compression and deduplication savings and overall cluster storage optimization statistics
Cluster size and individual node health
Cluster IOPS storage throughput and latency for the past 1 hour
Figure 4 Dashboard view
White Paper
copy 2018 Cisco andor its affiliates All rights reserved This document is Cisco Public Information Page 12 of 13
Monitor pages
Cisco HyperFlex Connect provides for additional monitoring capabilities including
Alarms Cluster alarms can be viewed acknowledged and reset
Events The cluster event log can be viewed specific events can be filtered for and the log can be exported
Activity Recent job activity can be viewed and the status can be monitored
System Information
The System Information page (Figure 5) provides a complete health snapshot of the cluster This page displays Cisco HyperFlex storage cluster system-related information including node and disk data It also provides access to Cisco HyperFlex maintenance mode
Figure 5 System Information page
Generating a support bundle using Cisco HyperFlex Connect
You can use the Cisco HyperFlex Connect user interface to generate a support bundle that collects the logs from every controller virtual machine and ESXi host in the local Cisco HyperFlex storage cluster If you are using replication to protect your virtual machines and their data when you need to generate a support bundle you also need to generate a support bundle from the remote Cisco HyperFlex storage cluster The vCenter logs are not collected through Cisco HyperFlex Connect
Follow this procedure
1 Log in to Cisco HyperFlex Connect
2 In the banner click Edit Settings (the gear icon) gt Support Bundle Alternatively click System Information gt Support Bundle
3 When the support bundle is displayed click supportBundlezip to download the file to your local computer Downloading the support bundle can take 1 to 2 hours
You download an existing support bundle in the same way
White Paper
copy 2018 Cisco andor its affiliates All rights reserved This document is Cisco Public Information Page 13 of 13
For more information The following documents available online provide additional relevant information If you do not have access to a document please contact your Cisco representative
Cisco HyperFlex reference documents
Cisco HyperFlex HXAF240c M5 (All Flash) spec sheet
Cisco HyperFlex Systems Installation Guide for VMware ESXi Release 30
Cisco HyperFlex Data Platform Administration Guide Release 30
Cisco HyperFlex Systems Troubleshooting Reference Guide 30
Cisco HyperFlex technical support documentation
VMware reference documents
Performance Best Practices for VMware vSphere 60
SAP Solutions on VMware Best Practices Guide
SAP HANA on VMware vSphere Wiki
Printed in USA C11-741524-00 1018
White Paper
copy 2018 Cisco andor its affiliates All rights reserved This document is Cisco Public Information Page 11 of 13
Performance See charts showing IO operations per second (IOPS) throughput latency and replication network bandwidth
System Information Get a system overview plus view status information and tasks for nodes and disks See the Cisco HyperFlex Systems Troubleshooting Reference Guide for information about generating support bundles see Storage Cluster Maintenance Operations Overview for information about entering and exiting maintenance mode and see Setting a Beacon for information about setting a node or disk beacon
Datastores View status information and tasks related to data stores
Virtual Machines View status information and tasks related to protection of virtual machines
Additional Cisco HyperFlex Connect pages provide management access
Encryption Use this page for data at rest disk and node encryption tasks Refer the section Cisco HyperFlex Connect HTML 5 management webpage for the screenshot
Replication Use this page for disaster recovery virtual machine protection tasks Refer the section Cisco HyperFlex Connect HTML 5 management webpage for the screenshot
Upgrade The Upgrade page provides access to HX Data Platform and Cisco UCS Manager firmware upgrade tasks
Dashboard
The dashboard shows several elements (Figure 4)
Cluster operational status overall cluster health and the clusterrsquos current node failure tolerance
Cluster storage capacity used and free space compression and deduplication savings and overall cluster storage optimization statistics
Cluster size and individual node health
Cluster IOPS storage throughput and latency for the past 1 hour
Figure 4 Dashboard view
White Paper
copy 2018 Cisco andor its affiliates All rights reserved This document is Cisco Public Information Page 12 of 13
Monitor pages
Cisco HyperFlex Connect provides for additional monitoring capabilities including
Alarms Cluster alarms can be viewed acknowledged and reset
Events The cluster event log can be viewed specific events can be filtered for and the log can be exported
Activity Recent job activity can be viewed and the status can be monitored
System Information
The System Information page (Figure 5) provides a complete health snapshot of the cluster This page displays Cisco HyperFlex storage cluster system-related information including node and disk data It also provides access to Cisco HyperFlex maintenance mode
Figure 5 System Information page
Generating a support bundle using Cisco HyperFlex Connect
You can use the Cisco HyperFlex Connect user interface to generate a support bundle that collects the logs from every controller virtual machine and ESXi host in the local Cisco HyperFlex storage cluster If you are using replication to protect your virtual machines and their data when you need to generate a support bundle you also need to generate a support bundle from the remote Cisco HyperFlex storage cluster The vCenter logs are not collected through Cisco HyperFlex Connect
Follow this procedure
1 Log in to Cisco HyperFlex Connect
2 In the banner click Edit Settings (the gear icon) gt Support Bundle Alternatively click System Information gt Support Bundle
3 When the support bundle is displayed click supportBundlezip to download the file to your local computer Downloading the support bundle can take 1 to 2 hours
You download an existing support bundle in the same way
White Paper
copy 2018 Cisco andor its affiliates All rights reserved This document is Cisco Public Information Page 13 of 13
For more information The following documents available online provide additional relevant information If you do not have access to a document please contact your Cisco representative
Cisco HyperFlex reference documents
Cisco HyperFlex HXAF240c M5 (All Flash) spec sheet
Cisco HyperFlex Systems Installation Guide for VMware ESXi Release 30
Cisco HyperFlex Data Platform Administration Guide Release 30
Cisco HyperFlex Systems Troubleshooting Reference Guide 30
Cisco HyperFlex technical support documentation
VMware reference documents
Performance Best Practices for VMware vSphere 60
SAP Solutions on VMware Best Practices Guide
SAP HANA on VMware vSphere Wiki
Printed in USA C11-741524-00 1018
White Paper
copy 2018 Cisco andor its affiliates All rights reserved This document is Cisco Public Information Page 12 of 13
Monitor pages
Cisco HyperFlex Connect provides additional monitoring capabilities, including:
Alarms: Cluster alarms can be viewed, acknowledged, and reset.
Events: The cluster event log can be viewed, specific events can be filtered for, and the log can be exported.
Activity: Recent job activity can be viewed, and its status can be monitored.
System Information
The System Information page (Figure 5) provides a complete health snapshot of the cluster. This page displays Cisco HyperFlex storage cluster system-related information, including node and disk data. It also provides access to Cisco HyperFlex maintenance mode.
Figure 5. System Information page
Generating a support bundle using Cisco HyperFlex Connect
You can use the Cisco HyperFlex Connect user interface to generate a support bundle that collects the logs from every controller virtual machine and ESXi host in the local Cisco HyperFlex storage cluster. If you are using replication to protect your virtual machines and their data, you also need to generate a support bundle from the remote Cisco HyperFlex storage cluster. The vCenter logs are not collected through Cisco HyperFlex Connect.
Follow this procedure:
1. Log in to Cisco HyperFlex Connect.
2. In the banner, click Edit Settings (the gear icon) > Support Bundle. Alternatively, click System Information > Support Bundle.
3. When the support bundle is displayed, click supportBundle.zip to download the file to your local computer. Downloading the support bundle can take 1 to 2 hours.
You download an existing support bundle in the same way.
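Because the downloaded bundle is a standard .zip archive, you can inspect its contents locally before attaching it to a support case. The sketch below lists an archive's entries with Python's standard zipfile module; the in-memory archive and the log file names used here are illustrative stand-ins for a real downloaded bundle, not actual HyperFlex file paths.

```python
import io
import zipfile

def list_bundle(data: bytes) -> list[str]:
    """Return the names of the files packed inside a support-bundle zip."""
    with zipfile.ZipFile(io.BytesIO(data)) as z:
        return z.namelist()

# Stand-in for a downloaded supportBundle.zip: a tiny zip built in memory.
# A real bundle contains controller VM and ESXi host logs; these file
# names are purely illustrative.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as z:
    z.writestr("esxi-host-1/vmkernel.log", "sample log line\n")
    z.writestr("ctrl-vm-1/storfs.log", "sample log line\n")

names = list_bundle(buf.getvalue())
print(names)
```

For a real bundle, replace the in-memory archive with the path to the downloaded file, for example `zipfile.ZipFile("supportBundle.zip").namelist()`.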
For more information
The following documents, available online, provide additional relevant information. If you do not have access to a document, please contact your Cisco representative.
Cisco HyperFlex reference documents:
Cisco HyperFlex HXAF240c M5 (All Flash) spec sheet
Cisco HyperFlex Systems Installation Guide for VMware ESXi, Release 3.0
Cisco HyperFlex Data Platform Administration Guide, Release 3.0
Cisco HyperFlex Systems Troubleshooting Reference Guide 3.0
Cisco HyperFlex technical support documentation
VMware reference documents:
Performance Best Practices for VMware vSphere 6.0
SAP Solutions on VMware Best Practices Guide
SAP HANA on VMware vSphere Wiki
Printed in USA C11-741524-00 10/18