
Huawei OceanStor Dorado V3 All-Flash Storage System

    Active-Active Data Center Solution Technical White Paper

    Issue 1.2

    Date 2017-12-30

    HUAWEI TECHNOLOGIES CO., LTD.


Copyright © Huawei Technologies Co., Ltd. 2017. All rights reserved.

No part of this document may be reproduced or transmitted in any form or by any means without prior written consent of Huawei Technologies Co., Ltd.

Trademarks and Permissions

Huawei and other Huawei trademarks are trademarks of Huawei Technologies Co., Ltd. All other trademarks and trade names mentioned in this document are the property of their respective holders.

Notice

The purchased products, services and features are stipulated by the contract made between Huawei and the customer. All or part of the products, services and features described in this document may not be within the purchase scope or the usage scope. Unless otherwise specified in the contract, all statements, information, and recommendations in this document are provided "AS IS" without warranties, guarantees or representations of any kind, either express or implied.

The information in this document is subject to change without notice. Every effort has been made in the preparation of this document to ensure accuracy of the contents, but all statements, information, and recommendations in this document do not constitute a warranty of any kind, express or implied.

    Huawei Technologies Co., Ltd.

    Address: Huawei Industrial Base Bantian, Longgang Shenzhen 518129 People's Republic of China

    Website: http://e.huawei.com



    Contents

1 Challenges and Trends of DR Construction

2 Huawei Active-Active DC Solution
2.1 Active-Active Data Center Architecture
2.2 Active-Active DC Deployment
2.3 Customer Benefits

3 Key Technologies
3.1 Active-Active Storage Systems
3.1.1 Active-Active Architecture
3.1.1.1 Parallel Access
3.1.1.2 Gateway-Free Design
3.1.1.3 I/O Access Path
3.1.1.4 Storage-Layer Network
3.1.2 Robust Reliability
3.1.2.1 Cross-Site Clustering
3.1.2.2 Cross-Site Real-Time Data Mirroring
3.1.2.3 Cross-Site Bad Block Repair
3.1.2.4 Split-Brain Prevention by Arbitration
3.1.2.5 High-Reliability Link Design
3.1.3 High Performance
3.1.3.1 FastWrite
3.1.3.2 Optimized Cross-Site Access
3.1.3.3 Optimistic Locking
3.1.4 Efficient Technologies
3.1.4.1 Compacted Data Copying
3.1.4.2 SmartDedupe&SmartCompression
3.1.5 Flexible Scalability
3.1.5.1 Online Expansion to Active-Active Arrays
3.1.5.2 Local Protection
3.2 Active-Active Design at the Computing Layer
3.3 Active-Active Design at the Application Layer
3.3.1 Active-Active B/S Applications
3.3.2 Active-Active C/S Applications
3.3.3 Active-Active Databases
3.3.3.1 Active-Active Oracle RAC
3.3.3.2 Active-Active DB2
3.3.3.3 Active-Active SQL Server
3.4 Active-Active Design at the Network Layer
3.4.1 Network Architecture
3.4.2 Network Across Data Centers
3.4.3 Network Architecture for Service Access
3.4.3.1 B/S Application Network Architecture
3.4.3.2 C/S Application Network Architecture
3.4.4 Layer-2 Interconnection
3.4.5 Load Balancing
3.4.5.1 Inter-Site Load Balancing
3.4.5.2 Intra-Site Load Balancing
3.5 Transmission-Layer Technologies

4 Visualized DR Management
4.1 Overall Deployment
4.1.1 Deployment Mode
4.1.2 System Running Environment
4.2 Application Support Matrix
4.3 Applicable DR Scenarios
4.3.1 Active-Active SAN Scenarios
4.3.1.1 Technical Characteristics
4.3.1.2 Physical Topology
4.3.1.3 Logical Topology

5 Fault Scenarios
5.1 GSLB Fault
5.2 SLB Fault
5.3 Web-Server Fault
5.4 Application Server Fault
5.5 Oracle RAC Fault
5.6 IBM DB2 Fault
5.7 Storage Array Fault
5.8 WAN Link Fault
5.9 Inter-Site Link Fault
5.10 Site Fault

6 Acronyms and Abbreviations


    1 Challenges and Trends of DR Construction

With the rapid development of information technology (IT), information systems are becoming increasingly important to critical services in a variety of industries. Service interruptions in information systems may lead to severe economic loss, damage to brand image, or loss of critical data, especially in industries such as communications, finance, medical care, e-commerce, logistics, and government. Business continuity is therefore critical to the construction of information systems.

Currently, business continuity is typically improved by building disaster recovery (DR) centers where copies of production data are kept. In a traditional DR solution, a DR center is deployed to mirror a production center. The DR center does not provide service access unless the production center encounters a disaster that causes a service breakdown which cannot be repaired in a short period of time. Such DR systems face the following challenges:

When the production center encounters power supply failures, fires, floods, or earthquakes, services must be switched to the DR system manually, and professional recovery measures and debugging are required. Such disasters may therefore cause long-term service interruption and business discontinuity.

The DR center does not provide services and remains idle most of the time, resulting in low resource utilization.

    To meet customer requirements on efficient resource usage, load balancing, and automatic switchover between two data centers (DCs), Huawei launches the end-to-end Active-Active DC Solution.


2 Huawei Active-Active DC Solution

The Active-Active DC Solution means that both DCs run concurrently, sharing service loads and improving the overall service capability and resource usage.

    Currently, DCs can work in either active-passive mode or active-active mode.

In active-passive mode, some services run in DC A with DC B as their hot backup, while other services run in DC B with DC A as their hot backup. This only approximates an active-active effect.

    In active-active mode, all I/O paths are allowed access to an active-active LUN, service loads are balanced, and seamless failover can be performed.

Huawei Active-Active DC Solution employs an active-active architecture. Based on the industry-leading HyperMetro function of OceanStor Dorado V3 storage systems, the solution works with web servers, database clusters, load balancing components, transmission devices, networks, and other components to deliver active-active capabilities for DCs deployed within 100 km of each other. The solution ensures automatic failover that is transparent to services in case of device failures or even single-DC failures. In addition, it delivers a zero recovery point objective (RPO) and a zero recovery time objective (RTO); the achievable RTO depends on the application system and the deployment mode.

    2.1 Active-Active Data Center Architecture

    2.2 Active-Active DC Deployment

    2.3 Customer Benefits

2.1 Active-Active Data Center Architecture

The end-to-end Active-Active DC Solution consists of the storage, computing, application, network, transmission, and security layers. Figure 2-1 shows its logical architecture.


    Figure 2-1 Logical architecture

The solution employs various designs and optimizations at each layer for better reliability, performance, and load balancing. Table 2-1 lists the detailed designs.

    Table 2-1 Detailed designs

    Layer Design

Storage layer The active-active architecture is gateway-free. The HyperMetro feature provides the active-active capabilities, reduces the solution's potential fault points, and avoids the I/O performance bottlenecks introduced by virtual storage gateways. The FastWrite function reduces the two round trips of a standard write I/O to one, improving write performance.

Network layer The Ethernet Virtual Network (EVN) technology of Huawei CloudEngine series DC switches is used. A streamlined Layer-2 network built with EVN allows Layer-2 network protocols to run over a Layer-3 network, ensuring cross-DC service interconnection and communication.

    Computing layer Virtualization platforms such as Huawei FusionSphere and VMware allow for cross-DC clustering, meeting enterprises' active-active requirements of multiple mission-critical services.

    Application layer Virtual clusters provide higher reliability for web services and applications, and achieve automatic service switchover based on load balancing. Databases are deployed on active-active LUNs across sites.

    Transmission layer

Huawei OptiX OSN series is used as the wavelength division multiplexing (WDM) device for active-active DCs. Three 1+1 protection schemes (link redundancy, module redundancy, and device redundancy) meet the reliability requirements of various levels. Optimization methods such as dispersion compensation minimize the transmission-layer delay.

2.2 Active-Active DC Deployment

Figure 2-2 shows the overall physical network of the solution.

    Figure 2-2 Physical network

    Table 2-2 describes the deployment of solution modules.

    Table 2-2 Deployment modes

    Layer Deployment Mode

    Storage layer Two OceanStor Dorado V3 storage arrays are deployed across DCs to form a storage cluster.

Network layer Huawei CloudEngine DC switches are used as core switches. The data centers adopt typical Layer-2 or Layer-3 physical networking. EVN is enabled to form Layer-2 channels, which are aggregated by the core switches to the WDM devices. An independent global server load balancer (GSLB) is deployed at each site for load balancing between sites. Two server load balancers (SLBs) are deployed at each site to form a high-availability (HA) cluster for load balancing between servers at the application layer.

    Application layer

    The web and application layers are deployed on virtual machines (VMs) or physical machines. Multiple servers in a DC or across DCs form a cluster. Databases are deployed on physical machines to form a cross-DC cluster.

    Computing layer

    Virtualization platforms such as Huawei FusionSphere and VMware are used to form a cross-DC virtual host cluster.

    Transmission layer

    Two WDM devices are deployed at each site. If device-level redundancy cannot be achieved, at least two transmission boards must be configured on each WDM device for board redundancy. Fibre Channel and IP signals are transferred by optical fibers. WDM devices are connected by two pairs of bare optical fibers.

    Arbitration Arbitration devices and software are deployed at a third-place site. The arbitration software is deployed on a physical server or a VM. The quorum server is connected to the two storage arrays in the active-active DCs over an IP network. Two quorum servers can be deployed in active/standby mode.

The GSLB adjusts traffic among servers in various areas of a WAN (including the Internet) to ensure that users are served by the nearest servers with high-quality services. The SLB can be regarded as an extension of the Hot Standby Router Protocol (HSRP) that enables load balancing among multiple servers.

2.3 Customer Benefits

By taking advantage of its wide product lines and tight multi-product coupling, Huawei is the only vendor in the industry that can provide end-to-end active-active solutions.

    Huawei Active-Active DC Solution has the following highlights:

Active-active architecture, zero data loss, and zero service interruption (RPO = 0, RTO = 0)

Both DCs process services, making the best use of DR resources

Flexible scalability and visualized DR management

    The solution delivers the following values:

Quick service rollout

End-to-end active-active DC design ensures quick service rollout.

    Simplified architecture and 24/7 service running


    Gateway-free design reduces possible failure points to ensure 24/7 service running reliability and provides active-active capabilities that allow concurrent reads and writes at both sites.

I/O optimization for high performance

Active-active I/O optimization ensures the shortest I/O processing path. In addition, optimizations in lock prefetch, storage protocols, and site access significantly enhance service performance.


3 Key Technologies

Huawei Active-Active DC Solution employs the following technologies:

Storage layer: HyperMetro for active-active capabilities at the storage layer

Computing layer: virtualization technologies such as FusionSphere and VMware to provide the VM HA feature for automatic recovery upon faults

Application layer: application clustering and database clustering technologies for active-active capabilities

Network layer: Layer-2 interconnection technologies such as DWDM and EVN for low-latency, highly reliable Layer-2 network interconnection; path optimization technologies such as network devices' active-active gateways and RHI; global load balancing and server load balancers for nearest active-active access or HA network switching

Transmission layer: device redundancy and board redundancy to establish reliable active-active transmission networks

3.1 Active-Active Storage Systems

3.2 Active-Active Design at the Computing Layer

3.3 Active-Active Design at the Application Layer

3.4 Active-Active Design at the Network Layer

3.5 Transmission-Layer Technologies

3.1 Active-Active Storage Systems

Huawei Active-Active DC Solution uses the HyperMetro feature of OceanStor Dorado V3 to achieve active-active deployments. HyperMetro federates two storage arrays into a cross-site cluster for real-time data mirroring, achieving robust reliability, high performance, and flexible scalability.

    This section describes the key technologies, working principles, and characteristics of HyperMetro.


    3.1.1 Active-Active Architecture

3.1.1.1 Parallel Access

HyperMetro delivers active-active service capabilities on two storage arrays. Data in the active-active LUNs on both arrays is synchronized in real time, and both arrays process read and write I/Os from application servers to provide the servers with non-differentiated parallel active-active access. When either storage array encounters a fault, services are seamlessly switched to the other end without interrupting service access.

    In comparison with the active-passive mode, Huawei's active-active solution fully utilizes computing resources, effectively reduces inter-array communication, and greatly shortens I/O paths, providing higher access performance and faster failover. Figure 3-1 shows the interaction processes of the active-passive and active-active solutions.

    Figure 3-1 Active-passive and active-active storage architectures

3.1.1.2 Gateway-Free Design

The HyperMetro architecture groups two storage arrays into a cross-site cluster system without any additional virtual gateway. The system supports a maximum of four storage controllers, that is, two dual-controller storage arrays can be used to establish a HyperMetro architecture.

    This solution has a simplified architecture and is well compatible with value-added storage features. It delivers the following values to customers:

Reduced number of gateway-related fault points and enhanced solution reliability

Quicker I/O response: latency caused by gateway forwarding is eliminated because I/Os are not forwarded by gateways

Compatibility with existing storage features: combined with other Hyper-series features, the solution can deliver various data protection and DR solutions

Simplified network and easier maintenance


3.1.1.3 I/O Access Path

At the application host side, HyperMetro uses UltraPath to aggregate the active-active LUNs from both storage arrays into one virtual LUN. It then provides I/O read and write capabilities to applications through UltraPath virtual disks (vdisks). When an application accesses a vdisk, UltraPath selects the optimal path based on the path selection policy and delivers the I/Os to a storage array. (For details, see section 3.1.3.2 "Optimized Cross-Site Access.")

When a LUN receives a read I/O request, the storage system reads its local cache directly and returns the data to the application. When a LUN receives a write I/O, concurrent writes from the parallel access paths are mutually excluded. After the write permission is obtained, the data is written to the caches of both the local and remote active-active LUNs. Only after the write completes at both ends is a write success returned to the application. (For details, see section 3.1.2.2 "Cross-Site Real-Time Data Mirroring.")

    Figure 3-2 Active-active I/O path

3.1.1.4 Storage-Layer Network

Active-active storage arrays can communicate with each other over Fibre Channel (recommended) or IP links. IP links are used between the storage arrays and the quorum server.

    Figure 3-3 shows the port usage on a Fibre Channel switch in active-active networking with dual-controller storage arrays.


    Figure 3-3 Active-active networking

3.1.2 Robust Reliability

HyperMetro adds new technologies to the inherent reliability design of OceanStor storage systems to maximize the reliability of the active-active solution.

    This section describes HyperMetro's reliability designs from the following aspects:

Cross-site clustering

Cross-site real-time data mirroring

Cross-site bad block repair

Split-brain prevention by arbitration

High-reliability link design

3.1.2.1 Cross-Site Clustering

A cross-site cluster is built on two independent storage arrays to provide non-differentiated parallel access services to application servers and to process the servers' I/O requests in an active-active storage architecture.

    An active-active cross-site cluster can be set up simply by configuring two storage arrays into a HyperMetro domain.

    The cross-site cluster system uses Fibre Channel or IP links between the storage arrays to establish a global node view and monitor the device status. Based on the view, the cluster offers capabilities such as distributed mutual exclusion and supports the active-active architecture. Figure 3-4 illustrates the cluster.


    Figure 3-4 Cross-site cluster

    Cluster nodes support concurrent access:

If a single controller is faulty, the other working controllers in the active-active cross-site cluster continue to carry host services. When the controller fault occurs, the local cluster starts a self-check. To minimize the impact on performance and reliability, the system preferentially selects controllers in the remote cluster to carry host services during the self-check. After the self-check has passed, the local cluster resumes providing host services.

    If all working controllers in the local cluster are faulty, the remote cluster continues to carry the host services.

    Figure 3-5 Active-active access and switchover

    Based on the cross-site cluster, HyperMetro creates HyperMetro pairs or consistency groups to provide services and manage task statuses.

    Member HyperMetro LUNs from both storage arrays form a virtual HyperMetro LUN. With the real-time mirroring technology, data on the member LUNs is kept consistent in real time.


    A consistency group consists of multiple HyperMetro pairs and is used to ensure data consistency in the storage system when a host writes data to multiple LUNs.

    When you split or synchronize a consistency group, all HyperMetro pairs in the group are split or synchronized at the same time. If a link fault occurs, all member pairs are interrupted at the same time. After the fault is rectified, data synchronization is implemented for all pairs to ensure the availability of the data on the secondary storage array.

3.1.2.2 Cross-Site Real-Time Data Mirroring

HyperMetro uses real-time mirroring to synchronize data between two storage arrays in real time. Data is written concurrently to the member LUNs of a HyperMetro pair in both DCs to ensure real-time data consistency. Figure 3-6 illustrates the write I/O process.

    Figure 3-6 Cross-site mirroring

    When DC A receives a write I/O, the mirroring process is as follows:

1. Applying for write permission and recording the request in a log: When the storage array in DC A receives a write request, it applies for write permission for the request. After write permission is obtained, the HyperMetro pair records the request in a log. The log records only the address information, not the data content, and is kept in memory with power failure protection to achieve better performance.

2. Dual-write: The system writes the request to the caches of both the local and remote LUNs.

3. Waiting for dual-write results: The system waits for the write results from the LUNs at both ends.

    4. Responding to the host: The HyperMetro pair returns a write I/O completion message.

    HyperMetro supports resumable data transmission. If a HyperMetro pair is disconnected (for example, due to a single-array failure), HyperMetro records the new write I/Os generated by the host in logs. After the fault is rectified, HyperMetro automatically recovers the pair and synchronizes only incremental data to the remote end. The whole process is transparent to the host and does not affect host services.
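The dual-write and differential-logging behavior described above can be summarized in a short sketch. The following Python code is purely illustrative: the class and method names, the data change log structure, and the return values are assumptions made for explanation, not Huawei's implementation or API.

```python
"""Minimal simulation of the HyperMetro dual-write flow (illustrative only)."""

class ArrayEnd:
    """One end of a HyperMetro pair (a storage array's cache for one LUN)."""
    def __init__(self, name, online=True):
        self.name, self.online, self.cache = name, online, {}

    def write_cache(self, lba, data):
        if not self.online:
            return False                # this end cannot acknowledge the write
        self.cache[lba] = data          # step 2: write into the cache
        return True                     # step 3: acknowledge the dual-write

class HyperMetroPair:
    def __init__(self, local, remote):
        self.local, self.remote = local, remote
        self.dcl = set()                # differential log: addresses only, no data

    def write(self, lba, data):
        # Step 1: obtain write permission, then log the address.
        self.dcl.add(lba)
        # Steps 2-3: dual-write and wait for the results from both ends.
        ok_local = self.local.write_cache(lba, data)
        ok_remote = self.remote.write_cache(lba, data)
        if ok_local and ok_remote:
            self.dcl.discard(lba)       # mirrored at both ends: clear the log entry
        # Step 4: respond to the host. Addresses still in the log are the
        # incremental data to resynchronize once the fault is rectified.
        return ok_local or ok_remote

pair = HyperMetroPair(ArrayEnd("DC-A"), ArrayEnd("DC-B", online=False))
pair.write(0x1000, b"payload")
print(sorted(pair.dcl))                 # [4096] -> pending resynchronization
```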

Table 3-1 describes the relationship between the HyperMetro pair running status and the host access status.


Table 3-1 Host access statuses

Suspended: The primary LUN is readable and writable; the secondary LUN is unreadable and unwritable. The user has suspended the HyperMetro mirroring relationship.

To be synchronized: The primary LUN is readable and writable; the secondary LUN is unreadable and unwritable. The HyperMetro mirroring relationship has been interrupted due to inter-array link faults or I/O errors.

Synchronizing: The primary LUN is readable and writable; the secondary LUN is unreadable and unwritable. A full or incremental data synchronization is in progress between both ends.

Normal: Both the primary and secondary LUNs are readable and writable. The HyperMetro LUNs are in a real-time mirroring relationship.

Forced start: The primary LUN is readable and writable; the secondary LUN is unreadable and unwritable. The user has forcibly switched the secondary LUN to the primary LUN.

    Table 3-2 describes the relationship between the HyperMetro pair running status and mirroring status.

Table 3-2 Active-active mirroring statuses

Suspended/To be synchronized/Forced start: The primary LUN performs no mirroring; changed data is recorded. The secondary LUN status is N/A.

Synchronizing: The primary LUN performs mirror writes; changed data is synchronized in the background. The secondary LUN status is N/A.

Normal: Both the primary and secondary LUNs perform mirror writes.

3.1.2.3 Cross-Site Bad Block Repair

Disks may develop bad blocks due to abnormalities such as power failure. If repairable bad blocks cannot be repaired by the local end, HyperMetro automatically obtains data from the remote end to repair them, further enhancing system reliability.


    Figure 3-7 Cross-site bad block repair

When DC A has bad blocks, the local read I/O process is as follows:

Step 1 Applying for the read permission: When DC A receives a read request, the local storage array confirms the local read permission of the HyperMetro pair.

Step 2 Reading the local LUN.

Step 3 Hitting a bad block: The request is redirected to the remote end, which returns the read response. If the bad block is repairable, the process continues with Step 4; otherwise, it ends after the redirected read.

Step 4 Returning the read data to the host to ensure rapid response.

Step 5 Writing the read data into the bad block to repair it.

Step 6 Returning the write repair result.

    ----End
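A condensed sketch of this read-repair flow is shown below. All names are hypothetical; in particular, the real system returns the data to the host (Step 4) before completing the repair, whereas this linear sketch repairs first for simplicity.

```python
"""Illustrative cross-site bad block repair (not Huawei's implementation)."""

class Lun:
    def __init__(self, blocks, bad=()):
        self.blocks = dict(blocks)       # lba -> data
        self.bad = set(bad)              # LBAs whose local media is damaged

    def read(self, lba):
        if lba in self.bad:
            raise IOError("bad block")   # step 3: local read hits a bad block
        return self.blocks[lba]

    def repair(self, lba, data):
        self.blocks[lba] = data          # step 5: write-back repairs the block
        self.bad.discard(lba)

def hypermetro_read(lba, local, remote, repairable=True):
    try:
        return local.read(lba)           # steps 1-2: normal local read
    except IOError:
        data = remote.read(lba)          # step 3: redirect to the remote end
        if repairable:
            local.repair(lba, data)      # steps 5-6: repair from the remote copy
        return data                      # step 4: the host still gets its data

local = Lun({1: b"A"}, bad={1})
remote = Lun({1: b"A"})
assert hypermetro_read(1, local, remote) == b"A"
assert local.read(1) == b"A"             # the bad block has been repaired
```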

3.1.2.4 Split-Brain Prevention by Arbitration

If the links between two HyperMetro storage arrays are disconnected, real-time mirroring becomes unavailable and only one array can continue providing services. To ensure data consistency, HyperMetro uses an arbitration mechanism to determine which storage array continues providing services.

    HyperMetro supports arbitration by pair or consistency group. If services provided by multiple pairs are mutually dependent, you can configure the pairs into a consistency group. After the arbitration, the group provides services only on one storage array. For example, an Oracle database's data and log files may be saved in different LUNs, the application system accessing the database may be saved in other LUNs, and these LUNs are mutually dependent. When configuring HyperMetro, you are advised to configure HyperMetro pairs for the data, log, and application LUNs respectively, and add them to a consistency group.

    HyperMetro provides two arbitration modes:

    Static priority mode

  • Huawei OceanStor Dorado V3 All-Flash Storage System Active-Active Data Center Solution Technical White Paper 3 Key Technologies

    Issue 1.2 (2017-12-30) Huawei Proprietary and Confidential Copyright © Huawei Technologies Co., Ltd..

    15

    Quorum server mode

    You must configure a HyperMetro domain before configuring HyperMetro pairs. A HyperMetro domain contains the two storage arrays that require an active-active relationship and a quorum server. HyperMetro pairs must be created within a HyperMetro domain and each domain can have only one arbitration mode.

Quorum server mode delivers higher reliability than static priority mode and ensures continuous service running under various single-point-of-failure conditions. Therefore, quorum server mode is recommended.

Static Priority Mode

Static priority mode is mainly used when no third-place quorum server is deployed. In this mode, you set one end of a HyperMetro pair or consistency group as the preferred site and the other end as the non-preferred site. This mode does not require a quorum server, as shown in Figure 3-8.

    In case of heartbeat interruption between the storage arrays, the preferred site wins the arbitration.

    If the link between the storage arrays or the non-preferred site encounters a fault, LUNs at the preferred site continue with services while those at the non-preferred site stop.

    If the preferred site encounters a fault, the non-preferred site does not take over services automatically. As a result, the services stop and you must forcibly start services at the non-preferred site.

    Figure 3-8 Static priority mode


    If you power off the storage array at the preferred site for maintenance, the storage array at the non-preferred site immediately takes over all active-active services without interruption.

This mode has a disadvantage: when the heartbeats between the storage arrays are lost, the system cannot determine whether the cause is a link disconnection or a storage array fault. Table 3-3 lists the arbitration policies in static priority mode.

Table 3-3 Arbitration policies in static priority mode

1. Fault: link fault between the storage arrays. Arbitration result: H1 continues to run while H2 stops.

2. Fault: fault at the non-preferred site. Arbitration result: H1 continues to run while H2 fails.

3. Fault: fault at the preferred site. Arbitration result: H1 fails. H2 stops and must be started manually.

(H1 is the preferred storage array and H2 the non-preferred one.)
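The decision logic of Table 3-3 is simple enough to state in a few lines. The sketch below is an illustrative summary of the table, with assumed function and label names; it is not arbitration code from the product.

```python
"""Static priority arbitration summarized (illustrative only)."""

def static_priority_arbitrate(preferred_alive):
    """Which ends keep serving after heartbeats between the arrays are lost."""
    if preferred_alive:
        # Rows 1-2: the preferred site always wins; the non-preferred site
        # cannot tell a link fault from a peer fault, so it stops itself.
        return {"preferred"}
    # Row 3: the preferred site is down; nothing serves automatically, and
    # services must be started manually at the non-preferred site.
    return set()

print(static_priority_arbitrate(True))    # {'preferred'} (rows 1 and 2)
print(static_priority_arbitrate(False))   # set() -> forced start needed (row 3)
```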

Quorum Server Mode

In this mode, an independent physical server or VM is used as the quorum server. You are advised to deploy the server at a third-place site to prevent the quorum server from being affected by a disaster in either DC. Figure 3-9 shows the quorum server deployment.

    Figure 3-9 Quorum server deployment

    In quorum server mode, when heartbeats between storage arrays are lost, the storage arrays at both ends apply for arbitration by the quorum server. The storage array that wins the arbitration continues providing services while the other storage array stops.


If you have an arbitration preference in this mode, you can also configure priorities for the sites. The preferred storage array has arbitration priority and wins the arbitration when only the heartbeats are lost.

    Figure 3-10 shows the arbitration process.

    Figure 3-10 Arbitration mechanism

    1. When the link between DCs is interrupted, the cross-site storage cluster is split into two small clusters.

    2. The two small clusters apply for arbitration. The preferred storage array has arbitration priority and wins the arbitration. Consequently, it continues providing services to hosts and storage access space to applications. The losing cluster stops providing services.

    3. After the link recovers, the small clusters detect the link and form a cross-site cluster again after handshakes with each other. Then, the HyperMetro relationship recovers and the cluster provides services in active-active mode.
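The arbitration race in step 2 can be sketched as follows. This is an illustrative model only: the quorum protocol, the ordering, and the names are assumptions, and the real mechanism involves more states than a single first-writer-wins flag.

```python
"""Illustrative quorum arbitration race (not the actual protocol)."""

class QuorumServer:
    def __init__(self):
        self.winner = None

    def apply(self, array_id):
        if self.winner is None:          # first successful applicant wins
            self.winner = array_id
        return self.winner == array_id

def on_heartbeat_loss(arrays, quorum, preferred="H1"):
    # The preferred array has arbitration priority, modeled here by letting
    # it apply to the quorum server first.
    order = sorted(arrays, key=lambda a: a != preferred)
    results = {}
    for a in order:
        won = quorum is not None and quorum.apply(a)
        results[a] = "serving" if won else "stopped"
    return results

print(on_heartbeat_loss(["H1", "H2"], QuorumServer()))
# {'H1': 'serving', 'H2': 'stopped'} -> matches row 4 of Table 3-4
print(on_heartbeat_loss(["H1", "H2"], None))
# both stopped -> matches row 8 (quorum server and inter-array link both lost)
```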

    Table 3-4 lists the arbitration results in various fault scenarios in quorum server mode.

Table 3-4 Arbitration results in quorum server mode

1. Fault: The quorum server fails. Arbitration result: Both H1 and H2 continue to run.

2. Fault: The link between a storage array and the quorum server fails. Arbitration result: Both H1 and H2 continue to run.

3. Fault: A storage array (H1) fails. Arbitration result: H1 stops while H2 continues to run.

4. Fault: The link between the storage arrays fails. Arbitration result: H2 stops while H1 continues to run.

5. Fault: A storage array (H1) and the quorum server fail.
- At the same time: H1 fails and H2 stops.
- H1 fails first and then the quorum server fails: H1 fails while H2 continues to run.
- The quorum server fails first and then H1 fails: H1 fails and H2 stops.

6. Fault: The link between the storage arrays and the link between H1 and the quorum server fail.
- At the same time: H1 stops while H2 continues to run.
- The link between the storage arrays fails first, and then the link between H1 and the quorum server fails: H2 stops while H1 continues to run.
- The link between H1 and the quorum server fails first, and then the link between the storage arrays fails: H1 stops while H2 continues to run.

7. Fault: H1 fails and the link between H2 and the quorum server fails.
- At the same time: H1 fails and H2 stops.
- H1 fails first, and then the link between H2 and the quorum server fails: H1 fails while H2 continues to run.
- The link between H2 and the quorum server fails first, and then H1 fails: H1 fails and H2 stops.

8. Fault: The quorum server fails and the link between the storage arrays fails.
- At the same time: Both H1 and H2 stop.
- The quorum server fails first, and then the link between the storage arrays fails: H2 stops while H1 continues to run.
- The link between the storage arrays fails first, and then the quorum server fails: H2 stops while H1 continues to run.

9. Fault: The quorum server fails, and the link between the quorum server and either storage array fails. Arbitration result: Both H1 and H2 continue to run.

H1 and H2 are the two storage arrays providing the HyperMetro LUNs. H1 is the preferred array and H2 is the non-preferred array.


Forced Start

In certain multi-fault scenarios, the arbitration mechanism may stop host access to the surviving member LUNs of HyperMetro pairs to protect data consistency. For example, if the preferred site in static priority mode fails, the host cannot access the surviving member LUNs. In this case, users or post-sales engineers can manually start services at the non-preferred site to recover them rapidly.

    After a site is forcibly started, it becomes the source end of HyperMetro data synchronization, and member LUNs at this end have the latest data. When the link is recovered, the system stops host access to the member LUNs at the other end and synchronizes data from the source end to the other end. Only incremental and changed data is synchronized.

Before a forced start, guard against the dual-active (split-brain) risk: confirm the LUN and service status in both DCs to ensure that the storage array at the other end has stopped running.

3.1.2.5 High-Reliability Link Design

HyperMetro supports Fibre Channel and IP networks between storage arrays, which can be selected based on the user's network conditions. Storage arrays can be connected directly or through Fibre Channel or IP switches. Fibre Channel networks are recommended for better performance.

In a cross-DC 4-controller active-active network, it is advisable to establish two inter-array mirroring links between each controller on the local array and its counterpart on the remote array, and to place the two links on separate switches for higher link reliability.

    Figure 3-11 A cross-DC 4-controller active-active network

    To ensure performance, HyperMetro has the following requirements on the inter-site active-active links:

  • Huawei OceanStor Dorado V3 All-Flash Storage System Active-Active Data Center Solution Technical White Paper 3 Key Technologies

    Issue 1.2 (2017-12-30) Huawei Proprietary and Confidential Copyright © Huawei Technologies Co., Ltd..

    20

Bit error rate (BER) ≤ 10^-12

Round trip time (RTT) ≤ 1 ms

Zero jitter and zero packet loss

Link bandwidth (a minimum of 2 Gbit/s) greater than the peak service bandwidth

In addition to these link quality requirements, HyperMetro supports adaptive transmission bandwidth for the mirroring links between active-active storage arrays. The system dynamically adjusts the usage of each link based on its quality to reduce the data retransmission rate.

    For example, two mirroring links (A and B) are used between the current controller and the controller on the storage array at the remote end. After detecting high transmission latency in link A caused by problems such as bit error, the system lowers the link's flow control bandwidth by 20% with an algorithm, relieving the link's bandwidth pressure to link B for more stable transmission latency. When the latency in link A drops, the system increases the link's flow control bandwidth by 20% to recover the throughput.
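The behavior in this example can be expressed as a small control loop. The sketch below uses the 20% step from the text; the latency threshold, the line rate, and all names are assumed values for illustration only.

```python
"""Illustrative adaptive flow control for two mirroring links."""

LINE_RATE_GBPS = 8.0       # assumed per-link line rate
THRESHOLD_MS = 2.0         # assumed "high latency" threshold

def adjust_link_bandwidth(links, latency_ms):
    """links: {name: allowed Gbit/s}; latency_ms: measured latency per link."""
    for name in links:
        if latency_ms[name] > THRESHOLD_MS:
            links[name] *= 0.8                     # lower a degraded link by 20%
        else:
            links[name] = min(links[name] * 1.2,   # recover by 20% when latency drops
                              LINE_RATE_GBPS)
    return links

links = {"A": 8.0, "B": 8.0}
# Link A develops bit errors and its transmission latency rises.
print(adjust_link_bandwidth(links, {"A": 5.0, "B": 1.0}))
# {'A': 6.4, 'B': 8.0} -> traffic pressure shifts toward the healthier link B
```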

3.1.3 High Performance

To ensure real-time data consistency between the two DCs, a write success response is returned to the host only after the data has been written to the storage at both ends. Because real-time dual-writes increase I/O latency, HyperMetro uses various I/O performance optimizations to mitigate the impact on write latency and improve overall active-active service performance.

3.1.3.1 FastWrite

HyperMetro uses FastWrite to optimize data transmission between storage arrays. With SCSI's First Burst Enabled function, the number of transmission interactions in a data write is halved.

In a common SCSI process, a write I/O undergoes multiple interactions between the two ends: write command delivery, write address allocation completion, data write, and write execution status return. FastWrite optimizes this interaction process by combining command delivery with the data write and eliminating the write address allocation completion step. This halves the interactions of a cross-site write I/O, as shown in Figure 3-12.

    Figure 3-12 Transfer protocol optimization
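The benefit can be quantified with simple latency arithmetic. Taking the link budget from section 3.1.2.5 (an RTT of roughly 1 ms at 100 km) and ignoring processing time, a standard write costs two round trips and a FastWrite write costs one:

$$t_{\text{standard}} \approx 2 \times \text{RTT} \approx 2\ \text{ms}, \qquad t_{\text{FastWrite}} \approx 1 \times \text{RTT} \approx 1\ \text{ms}$$

FastWrite therefore removes roughly one RTT, about 1 ms at 100 km, from every cross-site mirrored write.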

  • Huawei OceanStor Dorado V3 All-Flash Storage System Active-Active Data Center Solution Technical White Paper 3 Key Technologies

    Issue 1.2 (2017-12-30) Huawei Proprietary and Confidential Copyright © Huawei Technologies Co., Ltd..

    21

3.1.3.2 Optimized Cross-Site Access

In HyperMetro deployments, the distance between the two sites is the key determinant of I/O access performance. Working with OceanStor UltraPath, HyperMetro provides two I/O access policies based on the distance between sites:

Load balancing mode

Preferred array mode

Load Balancing Mode

This mode enables cross-array I/O load balancing: I/Os are delivered in segments to the two storage arrays. The segment size is configurable. For example, with a 128 MB segment size, I/Os whose start addresses fall in the range 0 to 128 MB are delivered to storage array A, those in the range 128 MB to 256 MB are delivered to storage array B, and so on.

    The load balancing mode is mainly used when HyperMetro storage arrays are deployed in the same data center. In these scenarios, both storage arrays deliver almost the same access performance to a host. To maximize resource usage of both storage arrays, I/Os of the host are delivered in segments to both arrays.
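The segment-to-array mapping reduces to integer arithmetic on the I/O start address, as the following sketch shows. The 128 MB segment size follows the example above; the function and array names are hypothetical.

```python
"""Illustrative segment-based load balancing across two arrays."""

SEGMENT_SIZE = 128 * 1024 * 1024        # 128 MB, configurable in practice

def select_array(start_address, arrays=("array_A", "array_B")):
    """Even-numbered segments go to array A, odd-numbered segments to array B."""
    segment_index = start_address // SEGMENT_SIZE
    return arrays[segment_index % len(arrays)]

print(select_array(0))                   # array_A (segment 0)
print(select_array(130 * 1024 * 1024))   # array_B (segment 1)
print(select_array(300 * 1024 * 1024))   # array_A (segment 2)
```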

    Figure 3-13 Load balancing access

Preferred Array Mode

In this mode, you assign a preferred storage array in OceanStor UltraPath. Host I/Os are load-balanced across the paths to the preferred storage array only; no cross-array I/O access is performed. I/Os are delivered to the non-preferred storage array only when the preferred storage array is faulty.

The preferred array mode is mainly used when the HyperMetro storage arrays are deployed in different data centers separated by distance. In such scenarios, cross-site access increases latency: if the link distance between the DCs is 100 km, the RTT is approximately 1 ms, so reducing cross-site interactions helps improve I/O performance.
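The 1 ms figure follows directly from the propagation speed of light in optical fiber, roughly 200,000 km/s (about 5 µs per km one way):

$$\text{RTT} \approx \frac{2 \times 100\ \text{km}}{2 \times 10^{5}\ \text{km/s}} = 1\ \text{ms}$$

Every round trip avoided between the sites therefore saves about 1 ms of latency.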

    In data read scenarios, a service host only needs to read data from the storage array of the local DC, which avoids cross-DC data reads and improves overall access performance.

  • Huawei OceanStor Dorado V3 All-Flash Storage System Active-Active Data Center Solution Technical White Paper 3 Key Technologies

    Issue 1.2 (2017-12-30) Huawei Proprietary and Confidential Copyright © Huawei Technologies Co., Ltd..

    22

    Figure 3-14 Data read in preferred array mode

In data write scenarios, a service host writes to the active-active storage array in its local DC, avoiding cross-DC forwarding of host data. HyperMetro ensures that each controller in the active-active cluster can receive write I/Os, so write requests from local hosts are processed by local controllers. This reduces cross-DC forwarding and improves the overall performance of the solution. Figure 3-15 shows the data write I/O process.

Figure 3-15 Writing data in preferred array mode

3.1.3.3 Optimistic Locking

Optimistic locking in HyperMetro achieves exclusive dual-write by applying locks locally within each site.

When a write I/O is received, the local site applies for an optimistic lock and then writes the data to both the local and remote LUNs. Upon receiving the write I/O, the remote site applies for an optimistic lock locally and then writes the data. Figure 3-16 illustrates the process.

    Figure 3-16 Optimistic locking

    Optimistic locking reduces interactions between sites and improves HyperMetro performance.
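The contrast with cross-site lock negotiation can be seen in a few lines: each site serializes writes with its own lock, and no lock messages cross the inter-site link. The sketch below is illustrative; the lock granularity and all names are assumptions.

```python
"""Illustrative optimistic locking: locks are local to each site."""

from threading import Lock

class Site:
    def __init__(self, name):
        self.name = name
        self.range_lock = Lock()    # optimistic lock, acquired locally only
        self.lun = {}

    def local_write(self, lba, data):
        with self.range_lock:       # no lock traffic crosses the inter-site link
            self.lun[lba] = data

def dual_write(lba, data, local, remote):
    local.local_write(lba, data)    # local site: lock locally, then write
    remote.local_write(lba, data)   # remote site: locks locally on receipt
    return "WRITE_SUCCESS"

dc_a, dc_b = Site("DC-A"), Site("DC-B")
dual_write(0x2000, b"payload", dc_a, dc_b)
assert dc_a.lun == dc_b.lun         # both ends hold identical data
```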

    3.1.4 Efficient Technologies

3.1.4.1 Compacted Data Copying

In the initial synchronization of active-active mirror data, or in incremental synchronization during data recovery, the differential data blocks generally contain a large number of zero data blocks. With the compacted data copying function, those zero data blocks are not copied one by one. For example, in a virtualization scenario, a large number of zero data blocks may be generated when a VM is created: an operating system disk holding dozens of GB may contain only 2 to 3 GB of non-zero data blocks. Figure 3-17 shows the working principle of compacted data copying.

    Figure 3-17 Compacted data copying


    The HyperMetro zero page identification technology is implemented as follows:

    All-zero data at the copy source end is detected using a chip. During data copying, all-zero data is marked, and only a small special page instead of the full data is transferred to the peer end.

    This technology effectively reduces the amount of data to be synchronized, saves bandwidth, and shortens the synchronization duration.
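A software analogue of zero page identification is shown below. The real detection is done in hardware; the 8 KB block size and one-byte marker are assumptions chosen only to make the saving visible.

```python
"""Illustrative zero page identification during synchronization."""

BLOCK = 8 * 1024                         # assumed block size

def pages_to_transfer(volume: bytes):
    """Yield (offset, payload); all-zero blocks send a tiny marker instead."""
    zero = bytes(BLOCK)
    for off in range(0, len(volume), BLOCK):
        block = volume[off:off + BLOCK]
        if block == zero:
            yield off, b"Z"              # special page: "this block is all zeros"
        else:
            yield off, block             # non-zero data is copied in full

volume = bytes(BLOCK) * 3 + b"\x01" * BLOCK   # three zero blocks, one data block
sent = sum(len(p) for _, p in pages_to_transfer(volume))
print(sent, "bytes sent instead of", len(volume))   # 8195 instead of 32768
```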

3.1.4.2 SmartDedupe&SmartCompression

On Huawei OceanStor Dorado V3, HyperMetro works with inline global deduplication and compression technologies to reduce the amount of data that systems must manage and to save storage space. As a result, device lifecycles are prolonged, device utilization improves, and both the initial cost of purchasing storage devices and the TCO are reduced. Figure 3-18 shows the deduplication and compression effects on HyperMetro member LUNs.

    Figure 3-18 Effects of deduplication and compression policies

Inline deduplication allows OceanStor Dorado V3 to remove duplicate data online before writing data to flash media. When user data is written to the HyperMetro storage arrays, it is saved in the cache, and the arrays remove duplicate data before writing it to disks. Deduplication is performed in real time rather than in back-end post-processing.

Inline compression compresses data online before writing it to flash media. Compression is performed after deduplication, ensuring that no duplicate data is compressed and improving compression efficiency. Compression is also performed in real time rather than in post-processing. The overall compression ratio is determined by the nature of the data sets. The compressed data blocks are stored in the arrays but require fewer SSDs. SmartCompression minimizes write amplification (WA) on SSDs and improves the durability of flash arrays. Dorado V3 adopts an enhanced LZ4 rapid compression algorithm, and the unit for storing compressed data is 1 KB, which significantly increases the data compression ratio.
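The ordering matters: deduplicating before compressing guarantees that duplicate data is never compressed twice. The sketch below illustrates that pipeline, using SHA-256 fingerprints and zlib as stand-ins for the array's internal fingerprinting and its enhanced LZ4 algorithm.

```python
"""Illustrative inline data reduction: deduplicate first, then compress."""

import hashlib
import zlib

fingerprints = {}        # fingerprint -> compressed physical block

def inline_reduce(block: bytes):
    fp = hashlib.sha256(block).hexdigest()
    if fp in fingerprints:
        return ("dedup_ref", fp)             # duplicate: store only a reference
    compressed = zlib.compress(block)        # unique data: compress after dedupe
    fingerprints[fp] = compressed
    return ("stored", fp, len(compressed))

block = b"x" * 4096
print(inline_reduce(block))    # ('stored', ..., small compressed size)
print(inline_reduce(block))    # ('dedup_ref', ...) -> never compressed twice
```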


    3.1.5 Flexible Scalability

    HyperMetro can work with other Hyper-series features to provide various data protection and DR solutions.

    3.1.5.1 Online Expansion to Active-Active Arrays

    HyperMetro is paired with Huawei UltraPath multipathing software, which supports LUN aggregation and shields physical differences at the storage layer.

    When users want to expand a single data center to active-active data centers, UltraPath can smoothly take over the new array and active-active member LUNs for online expansion. Figure 3-19 shows the online expansion process.

    Figure 3-19 Online expansion from one array to active-active arrays

    The expansion procedure is as follows:

    Step 1 Initialize array A.

    Step 2 Install the multipathing software.

    Step 3 Configure array A.

    Step 4 Configure a multipathing policy.

    After array A properly provides services:

    If you use Huawei UltraPath, perform Step 5 to Step 9 to implement online expansion from one array to active-active arrays.

    If you use native multipathing software, perform Step 5 to Step 8 to create paths between the host and array B and to create HyperMetro member LUNs. In Step 9, after configuring the multipathing policy, you must restart the host for the new paths and HyperMetro member LUNs to take effect.

    Step 5 Add and initialize array B.

    Step 6 Install quorum server software.

    Step 7 Configure quorum server software.


    Step 8 Configure SAN HyperMetro.

    Step 9 Configure a multipathing policy for HyperMetro.

    ----End
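
    For orientation only, the following Python sketch restates Steps 5 to 9 and the UltraPath/native difference. Every function here is a hypothetical print stub, not Huawei's actual API or CLI; the sketch only makes the ordering explicit.

        def step(msg):
            print("->", msg)  # stand-in for the real configuration action

        def expand_to_active_active(use_ultrapath: bool):
            step("Step 5: add and initialize array B")
            step("Step 6: install quorum server software")
            step("Step 7: configure quorum server software")
            step("Step 8: configure SAN HyperMetro between arrays A and B")
            step("Step 9: configure the multipathing policy for HyperMetro")
            if not use_ultrapath:
                # Native multipathing only: restart the host so the new paths
                # and HyperMetro member LUNs take effect, as noted above.
                step("restart the host")

        expand_to_active_active(use_ultrapath=True)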

    3.1.5.2 Local Protection

    HyperSnap Protection

    In the event of virus attacks or misoperations, data in the DCs may be damaged. HyperSnap generates snapshots of existing data in advance to protect local data.

    HyperSnap uses redirect-on-write (ROW) technology, which has little impact on the source LUN and requires only a small amount of storage space. If data on the original volumes is modified or deleted by mistake, snapshots can be used to roll the original volumes back and restore the data. In addition, a snapshot volume can be mapped to hosts for data testing and mining without affecting production services. When interworking with Huawei DR management software, the solution performs a full check on databases before activating snapshots; snapshots are activated only after data has been written to disks, ensuring that snapshot data is consistent with the databases and that the databases can be started promptly.

    If snapshot rollback is performed on the current active-active data, that data is overwritten and cannot be restored to its latest state. Therefore, if you may need to return to the latest state, manually generate a snapshot of the current data before rolling back.

    The combination of HyperMetro and HyperSnap can provide snapshot protection for member LUNs on the storage arrays at both ends.

    Figure 3-20 Active-active solution with snapshot protection
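
    A minimal sketch of the ROW behavior and the rollback caution above, assuming single-block granularity and an in-memory mapping table (both illustrative):

        class RowLun:
            def __init__(self):
                self.table = {}      # logical block address -> data (current view)
                self.snapshots = []  # each snapshot freezes the mapping, not the data

            def write(self, lba, data):
                self.table[lba] = data  # redirected write: old data stays in place

            def snapshot(self):
                self.snapshots.append(dict(self.table))  # metadata-only copy
                return len(self.snapshots) - 1

            def rollback(self, idx):
                # Per the caution above: snapshot the current state first,
                # because rollback overwrites it and it cannot be recovered.
                self.snapshot()
                self.table = dict(self.snapshots[idx])

    Because a snapshot copies only the mapping table, taking one has little impact on the source LUN and consumes little space until blocks diverge.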


    3.2 Active-Active Design at the Computing Layer

    When deployed on physical machines, cross-DC clusters provide fast service switchover and uninterrupted service access without data loss in various scenarios.

    Common virtual cluster technologies share the following outstanding characteristics:

    Restarting and recovering VMs through the HA function briefly interrupts services. If a host machine malfunctions, the VMs running on it automatically restart on other host machines; their services are interrupted for a short moment, and their memory data is lost.

    A VM cluster achieves far higher resource usage than a physical machine cluster: dozens of VMs can run on a single physical host machine.

    Based on the above, resolving the service interruption issue experienced during an HA restart is the top priority; otherwise, the zero interruption requirements of active-active services cannot be met. As previously described, SLBs are used to balance service loads in active-active DCs. To resolve the service interruption issue, deploy the same service on VMs in the two DCs so that when one host machine malfunctions, VMs in the other DC can take over the loads in real time.

    Virtual platforms, such as VMware vSphere and FusionSphere, have been commercially used for many years, and they have been proved to be stable and reliable. The recommended configuration of computing resource virtualization is as follows:

    Deploy a cross-DC virtual cluster. After computing resources are virtualized, deploy VMs on them.

    Configure the HA function to protect VMs and enable them to recover automatically from faults.

    Configure dynamic resource schedulers (DRSs) to distribute VMs over multiple host machines based on service requirements.

    Interconnect with the Layer-2 network to enable the VMs to freely migrate across DCs online, and provide better maintainability of virtualization platforms without service impact during routine maintenance.

    Map the shared storage space provided by the active-active storage platform to all host machines of the virtual cluster to enhance the flexibility of VMs.

    After the active-active reconstruction of the computing layer, VMs balance service loads better on the same computing resources, significantly improving resource usage and running efficiency. Service deployment becomes simpler and more flexible, and the VMs gain better reliability, online migration performance, and maintainability.

    VM deployment methods:

    For browser/server (B/S) applications: The Web layer and App layer are deployed on VMs. The VMs are not clustered. The SLB can detect server faults and distribute services to other functional servers.

    For client/server (C/S) applications: If the App layer is deployed on VMs, the VMs are deployed in a cross-DC cluster.


    3.3 Active-Active Design at the Application Layer

    3.3.1 Active-Active B/S Applications

    Working Principle

    An HTTP request sent by a browser passes through a web server (for example, Apache), which redirects it to an application server (for example, WebLogic).

    Generally, a web server corresponds to an application server cluster for load balancing.

    Figure 3-21 Deployment of a web application server cluster


    Network Topology

    Figure 3-22 Topology of B/S request forwarding

    Multiple web servers are deployed at each site without being clustered. All web servers at each site constitute a resource pool on the SLB (F5 LTM). In the active-active DC Solution, DC A and DC B resource pools are created.

    If multiple application servers are deployed at each site, group application servers of the same type into an active-active cluster.

    Application server clusters in the two DCs are connected to the cross-DC database cluster.

    HTTP Session Persistence Management

    An HTTP session refers to a series of requests sent by a user from a browser to a server. HTTP sessions enable applications running on web containers to track each user's operations.

    Session persistence management ensures that all follow-up requests from a user are distributed to the same application server. This enhances system performance because application servers do not need to re-create and maintain session information, and it prevents the loss of established sessions.

    Deployed across DCs, the application server cluster uses memory synchronization to implement session persistence management. Sessions are not lost even in cross-DC access.


    Load Balancing

    The process for load balancing of B/S applications is as follows:

    1. An SLB receives an HTTP request and allocates the request to a web server based on the load balancing algorithm.

    2. The web server receives the HTTP request and forwards it to an application server node selected from the resource pool.
       a. The web server checks the persistence of the HTTP session by checking the jsessionid parameter in the cookie or the URL of the request.
       b. If the parameter matches a session ID, a plug-in forwards the request to the corresponding application server.
       c. If the parameter does not match any session ID, the plug-in searches for another application server based on the preset rule.

    3. The web server forwards the request to an application server through TCP/IP. If the forwarding succeeds, the application server returns a TCP/IP ACK. If the application server does not respond before timeout, the web server returns error code 500.

    4. After the request is sent, the web server enters the waiting state. The waiting state ends when the application server returns the request processing result. If the request fails to be processed, the web server marks that application server as unavailable and forwards the request to another application server.
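
    As a minimal sketch of steps 1 to 4, the routing below forwards by session affinity when a jsessionid is present and otherwise (or when the sticky server has failed) picks another healthy application server. The server names, health model, and in-memory session table are illustrative assumptions.

        import random

        servers = {"app-a1": True, "app-a2": True, "app-b1": True}  # name -> healthy
        sessions = {}                                               # jsessionid -> server

        def forward(request: dict) -> int:
            sid = request.get("jsessionid")
            target = sessions.get(sid)
            if target is None or not servers.get(target, False):
                # No affinity, or the sticky server failed: choose a healthy one.
                healthy = [name for name, ok in servers.items() if ok]
                if not healthy:
                    return 500          # no application server can take the request
                target = random.choice(healthy)
                if sid:
                    sessions[sid] = target
            print("forwarding to", target)
            return 200

        forward({"jsessionid": "abc123"})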

    3.3.2 Active-Active C/S Applications

    C/S services support external IP address access. For example, you can log in to an application using an IP address, user name, and password on a client.

    If C/S applications do not support distributed deployment, primary routes can be advertised in only one DC. The applications therefore run in one DC at a time, but automatic failover to the secondary site is supported, and different applications can be distributed between the two DCs.


    Figure 3-23 Working principle of active-active C/S applications (IP address access without support for distributed deployment)

    If the C/S applications support distributed deployment and run in two DCs (different IP addresses are used for external access requests), manually configure the server IP address corresponding to the client to balance different customers' loads among the DCs.

    Figure 3-24 Working principle of active-active C/S applications (IP address access with support for distributed deployment)


    3.3.3 Active-Active Databases

    Active-active databases are realized by database clustering in two modes: active-standby and active-active clustering.

    Common active-standby cluster systems include: IBM PowerHA, HP ServiceGuard, Microsoft WSFC, and Veritas Cluster Server. When an active node malfunctions, a failover is performed, that is, a standby node in the cluster automatically restarts the application system. The administrator must deploy cluster software on all active and standby nodes in the cluster to control the mounting of file systems, startup of application system services, and configuration of public network IP addresses. Figure 3-25 shows the active-standby cluster architecture.

    Figure 3-25 Active-standby cluster

    Multiple nodes on an active-active cluster can provide the same service simultaneously. This feature enables the active-active cluster to implement seamless failover and enhance the overall application system performance. Currently, active-active clusters such as the Oracle Real Application Cluster (RAC) are commonly used. Figure 3-26 shows the active-active cluster architecture.


    Figure 3-26 Active-active cluster

    3.3.3.1 Active-Active Oracle RAC

    Oracle RAC enables all nodes to concurrently access data files, redo log files, control files, and parameter files on shared storage resources. In addition, if a node fails, the services running on it are automatically switched to a functioning node to ensure database availability.

    Active-active LUNs provided by the HyperMetro storage system are used as shared volumes to construct cross-DC Oracle Extended RACs.

    Oracle Extended RAC works with the Oracle listener to achieve cross-DC service access and load balancing. Combined with Oracle Transparent Application Failover (TAF), Oracle Extended RAC enables clients to continue running with new connections without service interruption when a server or DC encounters a fault.

    If heartbeat links on intermediate networks are down, Oracle RAC implements arbitration based on the following rules:

    The sub-cluster with the largest number of nodes wins.
    If the sub-clusters have the same number of nodes, the sub-cluster containing the lowest node number wins.

    It is recommended that Oracle RAC be deployed in 2+1 mode, that is, two servers in DC A and one server in DC B. This ensures that the instances in DC A survive first when a heartbeat link fault occurs. If the DCs have the same number of nodes, deploy the servers with the lower node numbers in DC A.
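
    The arbitration rules above reduce to a few lines; the sketch below (illustrative, not Oracle code) shows why the 2+1 deployment keeps DC A's instances alive after a heartbeat split:

        def surviving_subcluster(sub_a, sub_b):
            """sub_a and sub_b are the node numbers on each side of the split."""
            if len(sub_a) != len(sub_b):
                return sub_a if len(sub_a) > len(sub_b) else sub_b
            return sub_a if min(sub_a) < min(sub_b) else sub_b

        # 2+1 deployment: DC A holds nodes 1 and 2, DC B holds node 3.
        print(surviving_subcluster([1, 2], [3]))     # [1, 2] -> DC A survives
        # Equal node counts: the side with the lowest node number survives.
        print(surviving_subcluster([1, 2], [3, 4]))  # [1, 2]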

    Figure 3-27 shows the node deployment of Oracle Extended Distance Cluster.


    Figure 3-27 Oracle RAC network topology

    It is advised to create different services at the Oracle RAC layer to isolate services and prevent data interaction across DCs.

    The PREFERRED function of Oracle RAC TAF enables applications to preferentially access local instances. Instances in the remote DC are set to AVAILABLE, and access requests are switched to them only when all local instances are faulty.

    Table 3-5 lists the service setting methods.

    Table 3-5 Service setting modes

    Service Name   Service Configuration Policy
                   Instance 1   Instance 2   Instance 3
    Service 1      Preferred    Preferred    Available
    Service 2      Available    Available    Preferred
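
    A minimal sketch of how the settings in Table 3-5 steer connections (illustrative, not the Oracle listener): PREFERRED instances are used while any is up, and AVAILABLE instances take over only after all preferred ones fail.

        def pick_instance(policy: dict, up: set):
            preferred = [i for i, p in policy.items() if p == "Preferred" and i in up]
            if preferred:
                return preferred[0]
            available = [i for i, p in policy.items() if p == "Available" and i in up]
            return available[0] if available else None

        service1 = {"inst1": "Preferred", "inst2": "Preferred", "inst3": "Available"}
        print(pick_instance(service1, {"inst1", "inst2", "inst3"}))  # inst1 (local DC)
        print(pick_instance(service1, {"inst3"}))                    # inst3 (failover)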

    3.3.3.2 Active-Active DB2

    IBM DB2 can be clustered in two ways: database clustering (such as pureScale) and operating system clustering (such as IBM PowerHA).

    This solution uses IBM PowerHA cluster to deploy active-active DB2 databases across DCs.

    As an active-standby clustering mode, PowerHA is designed to eliminate single points of failure: when a server fails, another one automatically takes over its services. With its redundancy mechanism, PowerHA provides failover upon faults and supports horizontal expansion through concurrent/parallel access.


    Based on the Shared Everything principle, PowerHA requires the storage layer to provide shared volumes for reads and writes by multiple database nodes in the active-active architecture. Therefore, the data files and log files of DB2 databases must be saved on shared disks. DB2 supports the following storage modes:

    File system
    Raw device

    It is advised to use file systems to provide storage space for the database. File systems with direct I/O and concurrent I/O features deliver performance similar to raw devices and are easier to manage.

    It is recommended that one file system be created on each LUN. You are advised to create the following file systems:

    One log file system
    Multiple data file systems

    The storage system must support SCSI-3 Persistent Group Reservations (PGR) for non-concurrent VGs.

    Suggestions for PowerHA heartbeat configuration:

    Coexistence of IP and non-IP heartbeat networks for redundancy
    TCP/IP and disk heartbeat networks as the PowerHA heartbeat networks
    An enhanced concurrent VG created on storage for PowerHA's disk heartbeat communication

    PowerHA supports three takeover modes for resource groups:

    Cascading: A cascading resource group defines a list of nodes that can control the resource group and the priority of these nodes in taking over the group. When the cluster starts, the resource group is taken over by the node with the highest priority. When that node fails, the node with the second-highest priority takes over the resource group. After the failed node rejoins the cluster, it takes back control of the resource group.

    Rotating: A rotating resource group is associated with a group of nodes and passes from one defined node to another based on node priority. When the cluster starts, the resource group is taken over by the node with the highest priority. When that node fails, the node with the second-highest priority takes over the resource group. After the failed node rejoins the cluster, it does not take over services and serves as a standby node.

    Concurrent: A concurrent resource group is shared by multiple nodes. When the cluster starts, all nodes access the resource group concurrently and share all its resources. When a node fails, the resource group is taken offline only on that node. After the node rejoins the cluster, the resource group is put back online on it.

    The concurrent resource group is mainly used to realize the active-active cluster architecture for Oracle RAC. This solution does not use the concurrent mode.
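
    The cascading and rotating rules differ only in what happens when a failed node rejoins, which the following illustrative sketch (not PowerHA code) makes explicit:

        def owner(priority_list, up_nodes, mode, current=None):
            """priority_list: nodes in descending priority; up_nodes: live nodes."""
            candidates = [n for n in priority_list if n in up_nodes]
            if not candidates:
                return None
            if mode == "cascading" or current not in up_nodes:
                return candidates[0]  # highest-priority surviving node takes over
            return current            # rotating: a rejoined node stays standby

        nodes = ["node1", "node2", "node3"]
        print(owner(nodes, {"node2", "node3"}, "cascading"))                    # node2
        print(owner(nodes, {"node1", "node2", "node3"}, "cascading", "node2"))  # node1 takes back control
        print(owner(nodes, {"node1", "node2", "node3"}, "rotating", "node2"))   # node2 keeps the group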

    3.3.3.3 Active-Active SQL Server

    SQL Server databases adopt operating system clusters, for example, Windows Server Failover Cluster (WSFC).


    WSFC provides various functions to support high-availability and DR solutions for the server application programs it carries (such as Microsoft SQL Server). If a cluster node or service fails, the services running on it are automatically or manually switched to another available node in a failover.

    The nodes in a WSFC work together to provide the following functions:

    Distributed metadata and notifications: Each node in the cluster maintains the metadata of the WSFC services and the application programs they carry, including application program settings, WSFC configurations, and status. Changes in the metadata or status of a single node are automatically synchronized to the other nodes in the cluster.

    Resource management: Each node in the cluster can provide physical resources, such as direct attached storage (DAS), network interfaces, and access to shared disks. The carried application programs register themselves as cluster resources, and you can configure the dependencies between their startup and running statuses and other resources.

    Running status monitoring: The running status of the nodes and of the primary node is tracked by monitoring network communications and resources. The overall running status of the cluster is determined by the votes of the nodes in the cluster.

    Failover coordination: Each resource is carried by a primary node and can be automatically or manually switched to one or more secondary nodes. A running-status-based failover policy controls the automatic switching of resource ownership among nodes. When a failover occurs, the node and the applications running on it are notified so that they can react properly.

    Because WSFC adopts the Shared Everything principle, the data files and log files of SQL Server databases must be stored on shared disks. It is advised to use NTFS file systems for SQL Server.

    Reservation mode requirement: support for SCSI-3 persistent group reservations (PGR).

    WSFC arbitration mode:

    You are advised to deploy two arbitration nodes, which yields higher resource usage than deploying more than two. In a cluster with an even number of nodes, the Node and Disk Majority arbitration mode is adopted.

    3.4 Active-Active Design at the Network Layer

    3.4.1 Network Architecture

    The network architecture covers the deployment and planning of three sites: DC A, DC B, and a third-place quorum site.

    It is recommended that the distance between the two DCs be within 100 km, and bare optical fibers be available.

    The third-place quorum site is connected to both DC A and DC B without distance requirements.


    Figure 3-28 Overall network architecture

    Public networks should be logically isolated from private networks and quorum networks on the core switches. On WDM devices, different WDM channels must be used to carry these networks (especially on large-scale networks), and public networks must be physically isolated from private networks. Public networks can share WDM channels with other traffic (such as traffic among servers) depending on bandwidth utilization.

    Databases are used to provide service access to upper-layer application servers. Service access is not directly provided to WAN users.

    3.4.2 Network Across Data Centers

    To ensure reliability, data transmission links and heartbeat links are separated in this solution. The solution isolates end-to-end traffic using VLAN or VRF and assigns independent physical interconnection links, so that service traffic and cluster heartbeats are isolated from each other without mutual impact.

    Services that are transmitted across DCs include:

    Real-time data synchronization between intra-city DCs over Fibre Channel links
    Heartbeat communication of host application clusters and communication over the synchronization interconnection links between DCs using Layer-2 Ethernet

    If the fiber distance between the two DCs is ≤ 25 km and at least four pairs of bare optical fibers are available:

    It is advised to cascade the four core switches on a 10GE network with two pairs of bare fibers.


    It is advised to cascade four Fibre Channel switches in one-to-one mode with two pairs of bare optical fibers.

    Figure 3-29 Network architecture when the link distance is ≤ 25 km and at least four pairs of bare optical fibers are available

    If the distance between the two DCs exceeds 25 km or fewer than four pairs of bare optical fibers are available, it is advised to use optical transport network (OTN) WDM devices to build an intra-city network for the two DCs.

    Both Ethernet switches and Fibre Channel switches are connected to OTNs, and the OTNs of both DCs are directly cascaded with two pairs of bare optical fibers.

    Figure 3-30 Network architecture when the link distance is > 25 km or fewer than four pairs of bare fibers are available
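
    The thresholds above reduce to a simple decision rule, sketched here with the values taken directly from the text (the wording of the two options is illustrative):

        def interconnect(distance_km: float, fiber_pairs: int) -> str:
            if distance_km <= 25 and fiber_pairs >= 4:
                return ("direct cascade: core switches over 10GE on two fiber "
                        "pairs, Fibre Channel switches one-to-one on the other two")
            return ("OTN WDM: connect Ethernet and Fibre Channel switches to "
                    "OTNs cascaded over two pairs of bare optical fibers")

        print(interconnect(20, 4))  # direct cascade
        print(interconnect(40, 2))  # OTN WDM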


    3.4.3 Network Architecture for Service Access

    3.4.3.1 B/S Application Network Architecture

    Figure 3-31 B/S application network architecture

    B/S applications are mostly web applications providing external domain-name access services. A Web-App-DB three-layer structure is used: the Web and App layers are deployed on VMs, which form a cluster within each DC, while the DB layer is deployed on physical machines that form a cross-DC Oracle RAC. The Web/App layer provides active-active access, while the DB layer serves only the App/Web layer.

    Service access network design for the Web/App layer:

    In a Layer-3 physical network architecture, gateways of the Web/App layer are configured on the aggregation switches; in a Layer-2 physical network architecture, gateways of the Web/App layer are configured on the core switches. Gateways of the two sites are independent from each other and reside on different network segments with different routes advertised.

    Service access network design for the DB layer:

    In a Layer-3 physical network architecture, gateways of the DB layer are configured on the aggregation switches; in a Layer-2 physical network architecture, gateways of the DB layer are configured on the core switches. Layer-2 interconnection is required between the two sites which reside on the same network segment. Active-active gateways are deployed at each site, which advertise the host routes of databases to the DCs. These host routes are not advertised to the WAN.


    3.4.3.2 C/S Application Network Architecture

    Figure 3-32 C/S application network architecture

    C/S applications are mostly middleware applications providing external IP address access without Global Server Load Balancing (GSLB). C/S applications use an App-DB two-layer structure: the App layer is deployed on VMs, which form a cluster within a DC, and the DB layer is deployed on physical machines. The App layer runs in a single DC, and the DB layer provides services only to the App layer.

    Service access network design for the DB layer:

    In a Layer-3 physical network architecture, gateways of the DB layer are configured on the aggregation switches; in a Layer-2 physical network architecture, gateways of the DB layer are configured on the core switches. Layer-2 interconnection is required between the two sites which reside on the same network segment. Active-active gateways are deployed at each site, which advertise the host routes of databases to the DCs. The host routes are not advertised to the WAN.

    Service access network design for the App layer:

    In a Layer-3 physical network architecture, gateways of the App layer are configured on the aggregation switches; in a Layer-2 physical network architecture, gateways of the App layer are configured on core switches. Layer-2 interconnection is required between the two sites which reside on the same network segment. Gateways are designed in two ways:

    Centralized gateways: The gateway in the DC where the App host resides is configured as the primary Virtual Router Redundancy Protocol (VRRP) gateway and advertises the primary routes, while the gateway in the other DC is configured as the secondary gateway and advertises the secondary routes. If VMs are migrated across DCs, the gateways' primary/secondary relationship does not switch with the migration, and cross-DC access occurs.

    Active-active gateways: Active-active gateways deployed at each site dynamically detect the location of an App host in the DC. The local gateways advertise the host routes to the DC and the WAN to optimize access paths.

    Centralized Gateways

    C/S applications mostly run in a single DC, and the gateways advertise network segment routes within that DC. C/S applications can be migrated across DCs.

    Figure 3-33 C/S application path – before migration

    The centralized gateways do not switch with the VM migration. As a result, cross-DC access exists. However, external routes are stable with clear access paths, which simplifies troubleshooting and O&M.


    Figure 3-34 C/S application path – after migration

    Active-Active Gateways

    C/S applications can be migrated and scheduled freely across DCs. Host routes are advertised by the gateways of the DC where the VMs reside, based on VM location.

    Figure 3-35 C/S application path optimization – before migration

    Address Resolution Protocol (ARP) broadcast is performed after the VM migration. The active-active gateways