VXRAIL CONCEPTS AND ARCHITECTURE
1 © 2016 VCE COMPANY, LLC. ALL RIGHTS RESERVED.
VCE VXRAIL™ APPLIANCE
Hyper-Converged Infrastructure Appliance from EMC® and VMware®
Document H15104 Version 1.0
April, 2016
Copyright © 2016 EMC Corporation. All rights reserved. EMC believes the information in this publication is accurate as of its publication date. The information is subject to change without notice.
THE INFORMATION IN THIS PUBLICATION IS PROVIDED “AS IS.” EMC CORPORATION MAKES NO REPRESENTATIONS OR WARRANTIES OF ANY KIND WITH RESPECT TO THE INFORMATION IN THIS PUBLICATION, AND SPECIFICALLY DISCLAIMS IMPLIED WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.
Use, copying, and distribution of any EMC software described in this publication requires an applicable software license.
EMC², EMC, VCE, and the EMC logo are registered trademarks or trademarks of EMC Corporation in the United States and other countries. All other trademarks used herein are the property of their respective owners.
For the most up-to-date regulatory document for your product line, go to EMC Online Support (https://support.emc.com).
Table of Contents
Preface
AUDIENCE
RELATED RESOURCES AND DOCUMENTATION
CONTRIBUTORS
CONVENTIONS
Introduction
DEPLOYMENT TREND TOWARDS CONVERGED INFRASTRUCTURE
DESIGN TREND TOWARDS SDDCs
HYPER-CONVERGED INFRASTRUCTURE
VCE Converged Infrastructure Platforms Overview
BLOCK ARCHITECTURE
RACK ARCHITECTURE
APPLIANCE ARCHITECTURE
VCE VXRAIL™ APPLIANCE PRODUCT PROFILE
VxRail Hardware Architecture
VXRAIL APPLIANCE CLUSTER
VxRail Node
VxRail Node Storage Disk Drives
VXRAIL MODELS AND SPECIFICATIONS
Scaling
VxRail Software Architecture
APPLIANCE MANAGEMENT
VxRail Manager
VxRail Manager Extension
VMWARE VSPHERE
VMware vSphere vCenter Server
vCenter Server Services and Interfaces
PSC Deployment Options
VMware vSphere ESXi
ESXi Overview
Communication between vCenter Server and ESXi Hosts
Virtual Machines
Virtual Machine Hardware
Virtual Machine Communication
Virtual Networking
Standard Virtual Switch
Virtual Distributed Switch
Migration and vMotion
Enhanced vMotion Compatibility
Storage vMotion
vSphere Distributed Resource Scheduler
vSphere High Availability (HA)
vCenter Server Watchdog
vSphere Fault Tolerance (FT)
VIRTUAL SAN
Disk Groups
Hybrid and All-Flash Differences
Read Cache: Basic Function
Write Cache: Basic Function
Flash Endurance
Virtual SAN’s Impact on Flash Endurance
Client Cache
Objects and Components
Witness
Replicas
Storage Policy Based Management (SPBM)
Dynamic Policy Changes
Storage Policy Attributes
I/O Paths and Caching Algorithms
Read Caching
Write Caching
Distributed Caching Considerations
Virtual SAN High Availability and Fault Domains
Limitations of Two- and Three-Node Configurations
Fault Domain Overview
Virtual SAN Stretched Cluster
Site Locality
Networking
Stretched-Cluster Heartbeats and Site Bias
vSphere HA settings for Stretched Cluster
Snapshots
How Snapshots Work
Managing Snapshots
Deduplication and Compression
Advantages of Data-Reduction Technology
In-line Deduplication and Compression per Disk Group
Latency and Resource Consumption
Enabling Deduplication and Compression
Erasure Coding
Enabling Erasure Coding
Requirements
Overhead Issues (RAID-5 and RAID-6)
Integrated Solutions
STORAGE TIERING WITH CLOUDARRAY
INTEGRATED BACKUP AND RECOVERY WITH VSPHERE DATA PROTECTION (VDP)
INTEGRATED REPLICATION WITH RECOVERPOINT FOR VIRTUAL MACHINES
Use Case Examples
USE CASE: CREATE IT CERTAINTY FOR VIRTUAL DESKTOP INFRASTRUCTURE (VDI)
Meeting the Virtualization Challenge for Federal Agencies
USE CASE: SIMPLIFYING THE DISTRIBUTED ENTERPRISE ENVIRONMENT
Meeting the Distributed Enterprise Challenge for State and Local Agencies
Product Information
PRODUCT SUPPORT
EMC PROFESSIONAL SERVICES FOR VXRAIL APPLIANCES
VSPHERE ORDERING INFORMATION
WE’D LIKE TO HEAR FROM YOU!
Preface
This EMC TechBook provides a thorough conceptual and architectural review of the VCE VxRail™
Appliance. It reviews current trends in the industry that are driving adoption of converged
infrastructure and highlights the pivotal role of VxRail Appliances in today’s modern data center.
As part of an effort to improve and enhance the performance and capabilities of its product lines,
EMC periodically releases revisions of its hardware and software. Therefore, some functions
described in this document may not be supported by all versions of the software or hardware
currently in use. For the most up-to-date information on product features, refer to the product
release notes. If a product does not function as described in this document, please contact your
EMC representative.
AUDIENCE
This TechBook is intended for EMC field personnel, partners, and customers involved in designing, acquiring,
managing, or operating a VxRail Appliance solution. This TechBook may also be useful for Systems Administrators
and EMC Solutions Architects.
RELATED RESOURCES AND DOCUMENTATION
Refer to the following items for related, supplemental documentation, technical papers, and websites.
DRS Web Content at https://www.vmware.com/products/vsphere/features/distributed-switch#sthash.WC5hSHzt.dpuf
EMC CloudArray Product Description Guide: https://www.emc.com/collateral/guide/h13456-cloudarray-pdg.pdf
EMC CloudArray Administrator Guide: http://uk.emc.com/collateral/TechnicalDocument/docu60786.pdf
An overview of VMware VSAN Caching Algorithms at https://www.vmware.com/files/pdf/products/vsan/vmware-virtual-san-caching-whitepaper.pdf
vSphere Resource Management at http://www.vmware.com/support/pubs
Virtual SAN 6.2 Stretched Cluster Guide at http://www.vmware.com/files/pdf/products/vsan/VMware-Virtual-SAN-6.2-Stretched-Cluster-Guide.pdf
Virtual SANSparse—Tech Note for Virtual SAN 6.0 Snapshots at https://www.vmware.com/files/pdf/products/
SAN
vSphere Virtual Machine Administration Guide at https://www.vmware.com/support/pubs/vsphere-esxi-vcenter-server-6-pubs.html
Blogs, web pages, publications, and multimedia content from http://www.hyperconverged.org/
CONTRIBUTORS
Along with other EMC and VMware engineers, field personnel, and partners, the following individuals have been
contributors to this TechBook:
Flavio Fomin, Bill Leslie, Arron Lock, Joe Vukson, Sam Huang, Aleksey Lib, Violin Zhang, Colin Gallagher, Megan McMichael, Hanoch Eiron, Gail Riley, Jim Wentworth
CONVENTIONS
EMC uses the following type style conventions in this document.
Normal: Used in running (nonprocedural) text for
Names of interface elements, such as names of windows, dialog boxes, buttons, fields, and menus
Names of resources, attributes, pools, Boolean expressions, DQL statements, keywords, clauses, environment variables, functions, and utilities
URLs, pathnames, filenames, directory names, computer names, links, groups, file systems, and notifications
Bold: Used in running (nonprocedural) text for names of commands, daemons, options, programs, processes,
services, applications, utilities, kernels, notifications, system calls, and man pages.
Italic: Used in all text (including procedures) for
Full titles of publications referenced in text
Emphasis, for example, a new term
Policies and variables
Courier: Used for:
System output, such as an error message or script
URLs, complete paths, filenames, prompts, and syntax when shown outside of running text
Introduction
The IT infrastructure market is undergoing unprecedented transformation. The most significant transformation is
reflected by two major trends: a deployment trend toward converged infrastructure and a design trend toward
software-defined data centers (SDDCs). Both are responses to the IT realities of infrastructure clutter, complexity,
and high cost; they represent attempts to simplify IT and reduce the overall cost of infrastructure ownership.
Today’s infrastructure environments are typically comprised of multiple hardware and software products from
multiple vendors, with each product offering a different management interface and requiring different training. Each
product in this type of legacy stack is likely to be grossly overprovisioned, using its own resources (CPU, memory,
and storage) to address the intermittent peak workloads of resident applications. The value of a single shared
resource pool, offered by server virtualization, is still generally limited to the server layer. All other products are
islands of overprovisioned resources that are not shared. Therefore, low utilization of the overall stack results in the
ripple effects of high acquisition, space, and power costs. Too many resources can be wasted in legacy
environments.
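The utilization argument above can be illustrated with a little arithmetic. The following is a minimal sketch, not a sizing method: the demand figures are invented for illustration, and the function names are hypothetical. It compares sizing every product for its own peak against sizing one shared pool for the combined peak.

```python
def siloed_capacity(workloads):
    """Capacity needed when each product is sized for its own peak demand."""
    return sum(max(demand) for demand in workloads)


def pooled_capacity(workloads):
    """Capacity needed when one shared pool is sized for the combined peak."""
    return max(sum(interval) for interval in zip(*workloads))


def utilization(workloads, capacity):
    """Average fraction of provisioned capacity actually used."""
    intervals = len(workloads[0])
    consumed = sum(sum(demand) for demand in workloads)
    return consumed / (capacity * intervals)


# Hypothetical demand per time interval (arbitrary units) for three products
# whose peaks occur at different times.
workloads = [
    [2, 8, 2, 2],
    [2, 2, 8, 2],
    [8, 2, 2, 2],
]

silo = siloed_capacity(workloads)
pool = pooled_capacity(workloads)
print(f"siloed: capacity={silo}, utilization={utilization(workloads, silo):.1%}")
print(f"pooled: capacity={pool}, utilization={utilization(workloads, pool):.1%}")
```

With these made-up numbers, the siloed design provisions 24 units and the pooled design 12, so average utilization rises from about 44 percent to 87.5 percent. The benefit depends entirely on peaks not coinciding; workloads that peak together gain nothing from pooling.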
DEPLOYMENT TREND TOWARDS CONVERGED INFRASTRUCTURE (CI)
Industry-infrastructure deployment has shifted from a build to a buy approach. This shift is being driven by the
need for IT to focus limited economic resources on driving business innovation. While a build-your-own strategy can
achieve a productive IT infrastructure, these deployments can be difficult and lengthy to implement and vulnerable
to higher operating costs, and they’re susceptible to greater risk related to component integration, configuration,
qualification, compliance, and management. Converged infrastructure (CI) packages compute, storage, and
networking components into a single optimized IT solution. CI is a simple, fast, and effective alternative to build-
your-own and has been widely adopted.
CI typically brings together blade-servers, enterprise storage arrays, storage area networks, IP networking,
virtualization, and management software into a single product. CI means that multiple pre-engineered and pre-
integrated components operate under a single controlled converged architecture with a single point of management
and a single source for end-to-end support. CI provides a localized single resource pool that enables a higher
overall resource utilization than with a legacy island-based infrastructure. Overall acquisition cost is lower and
management is simplified. In the data center, CI typically has a smaller footprint with less cabling and can be
deployed much faster than traditional infrastructure.
DESIGN TREND TOWARDS SOFTWARE-DEFINED DATA CENTERS (SDDCs)
Traditional data centers are hardware-centric. Emerging data centers are software-centric. While the concept is still
evolving, a software-defined data center (SDDC) is a software-centric architectural approach based on virtualization
and automation. To logically define all infrastructure services, the SDDC applies the widely successful principles of
server virtualization—abstraction, isolation, and pooling—to the remaining network and storage infrastructure
services. SDDC management is automated through policy-based software which controls both on-premises and off-
premises resources. With SDDC, traditional enterprise applications can be supported in a more flexible and cost
effective manner. SDDC represents the epitome of the agile digital business model, where pooled resources adapt
and respond to shifting application requirements.
Figure 1: SDDC
Virtualized servers are probably the most well-known software-defined IT entity, where hypervisors running on a
cluster of hosts allocate hardware resources to virtual machines (VMs). In turn, VMs can function with a degree of
autonomy from the underlying physical hardware. Software-defined storage (SDS) and software-defined
networking (SDN) are based on a similar premise: Physical resources are aggregated and dynamically allocated
based on predefined policies with software abstracting control from the underlying hardware. The result is the
logical pooling of compute, storage, and networking resources. Physical servers function as a pool of CPU resources
hosting VMs, while network bandwidth is aggregated into logical resources, and pooled storage capacity is allocated
by specified service levels for performance and durability.
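As a rough illustration of policy-driven allocation from a pooled resource, the sketch below models a storage pool that consumes raw capacity according to a named service level. Everything here is hypothetical: the `Policy` and `StoragePool` classes, the `copies` attribute, and the capacity figures are assumptions made for the example and do not represent any product's API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Policy:
    """Hypothetical service level: how many full copies of the data to keep."""
    name: str
    copies: int


class StoragePool:
    """Aggregated raw capacity that is handed out according to policy."""

    def __init__(self, raw_capacity_gb):
        self.raw_capacity_gb = raw_capacity_gb
        self.used_gb = 0

    @property
    def free_gb(self):
        return self.raw_capacity_gb - self.used_gb

    def allocate(self, size_gb, policy):
        """Reserve raw capacity for a volume; return the raw GB consumed."""
        needed = size_gb * policy.copies
        if needed > self.free_gb:
            raise ValueError(f"pool cannot satisfy policy {policy.name!r}")
        self.used_gb += needed
        return needed


pool = StoragePool(raw_capacity_gb=1000)
gold = Policy(name="gold", copies=2)      # assumed: more durable, costs more raw space
bronze = Policy(name="bronze", copies=1)  # assumed: single copy

pool.allocate(100, gold)
pool.allocate(100, bronze)
print(pool.free_gb)
```

The point of the sketch is that the requester names a service level rather than a physical device; the software decides how much of the shared pool that service level consumes.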
Once the data center has abstracted resources, SDDC services make the data center remarkably adaptable and
responsive to business demands. In addition to virtualized infrastructure, the SDDC includes automation, policy-
based management, and hybrid cloud services. The policy-based model insulates users from the underlying
commodity technology, and policies balance and coordinate resource delivery. Resources are allocated where
needed, absorbing utilization spikes while maintaining consistent and predictable performance. Conceptually, SDDC
encompasses more than the IT infrastructure itself; it also represents an essential departure from traditional
methods of delivering and consuming IT resources. Infrastructure, platforms, and software have become services,
and SDDC is the fundamental mechanism that underpins the most sophisticated cloud services. The most effective
SDDC deployments are based on technology that provides simple implementation, administration, and
management. This requires an infrastructure solution with an extremely high level of efficiency and serviceability,
such as hyper-converged infrastructure.
HYPER-CONVERGED INFRASTRUCTURE
Hyper-converged infrastructure (HCI) is the next level of converged infrastructure. HCI is a new type of CI with a
software-centric architecture based on smaller, industry-standard building-block servers that can be scaled. HCI
has a software-defined architecture with everything virtualized. Compute, storage, and networking functions are
decoupled from the underlying infrastructure and run on a common set of physical resources that are based on
industry-standard components. Hyper-converged systems do not include separate enterprise storage arrays.
Instead, they adopt industry-standard server platforms with local direct-attached storage (DAS), which is
virtualized using software-defined storage technology. (See Figure 2 below.) By integrating these technologies,
HCI systems are managed as a single system through a common toolset.
The ideal HCI solution integrates these building-block servers with familiar, simple management software for
reliability and serviceability. This enables efficient and safe use of commercial off-the-shelf (COTS) hardware.
Simple management software allows a common operational model, which drives efficiency and enables workload
mobility. Other benefits of HCI include a lower total cost of operation as well as flexible scalability—nodes, which
provide both CPU and storage, can easily be added to meet business demands. Unlike CI, the technologies in HCI
are so integrated that they cannot be broken down into separate components for independent use. HCI offers a
seamless framework of integrated, virtualized, scalable nodes with built-in management.
Figure 2: CI and HCI
HCI carries forward the benefits of CI, including a single shared resource pool, and takes them even further. By
reinventing the underlying data architecture, HCI includes full data services. Complete integration and innovation at
the software layer allow for radically simple end-to-end data management. New infrastructure, which
could take up to a week to deploy in the build-your-own model, can be up and running in under 30 minutes because HCI
offers such high levels of task automation. Ideally, HCI is fully integrated, preconfigured, and tested. This provides
a simple, cost-effective, non-disruptive, scalable solution with centralized management functionality, rich data
services, and a single source of support.
HCI enables faster, better, and simpler management of consolidated workloads, virtual desktops, business-critical
applications, and remote office infrastructure.
HCI solutions have distinct features including scalability, simplicity, and data services.
Scalability. Hyper-converged infrastructures are designed to scale out by adding nodes, which provides a
predictable “pay-as-you-grow” approach. Adding nodes, rather than separately adding CPUs or storage capacity,
provides linear performance and an elastic infrastructure. Dynamic pooled resources are allocated according to
fluctuating workload requirements. This absorbs application workload spikes and maintains performance
consistency. Mid-sized IT departments or remote enterprise-edge locations, like branch offices, can implement an
inexpensive, entry-level HCI solution, starting small and then easily and non-disruptively scaling both capacity and
performance. HCI integration with public-cloud offerings can also seamlessly and securely expand capacity on
demand and without limits to provide a hybrid-cloud solution.
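The scale-out model described above can be sketched in a few lines. This is an illustration only: the `Node` type, the helper functions, and the per-node figures (16 cores, 10 TB) are hypothetical and are not VxRail specifications. Because every node contributes both compute and storage, cluster totals grow in proportion to node count when the nodes are identical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    """Hypothetical building-block server contributing CPU and storage."""
    cpu_cores: int
    storage_tb: float


def cluster_totals(nodes):
    """Return (total cores, total TB) across all nodes in the cluster."""
    return sum(n.cpu_cores for n in nodes), sum(n.storage_tb for n in nodes)


def scale_out(nodes, new_node, count=1):
    """Return a new cluster with `count` additional nodes; the input is unchanged."""
    return list(nodes) + [new_node] * count


node = Node(cpu_cores=16, storage_tb=10.0)  # assumed figures, for illustration
cluster = [node] * 4
print(cluster_totals(cluster))

cluster = scale_out(cluster, node, count=2)
print(cluster_totals(cluster))
```

Going from four nodes to six raises both totals by the same 50 percent, which is the sense in which compute and capacity scale together. Real clusters lose some of this to overheads such as replication and management, which the sketch ignores.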
Simplicity. Hyper-convergence changes the game in terms of management and serviceability. Seamless
integration among HCI elements unifies operations, using familiar consistent interfaces, and simplifies
management. In addition, HCI facilitates simple workload mobility within the entire SDDC. The HCI management
software stack includes applications for monitoring, logging, security and access control, compliance and upgrades,
in addition to configuration utilities for virtual machines, network, and data services. The building-block design
provides a superior implementation model in which all the components have been fully integrated, preconfigured,
and tested, making the system simple to set up, expand, and maintain.
Data Services. HCI provides the same level of mission-critical data services provided by traditional high-end
enterprise storage arrays. Enterprise IT applications are designed with the expectation that the IT infrastructure is
equipped for consistent performance, high availability, and disaster recovery. HCI meets these expectations with
rich data services such as deduplication, compression, replication, and backup and recovery. HCI brings
consumption-based infrastructure economics and flexibility to enterprise IT without compromising on performance,
reliability, or availability.
So when should CI be implemented and when is HCI a better option? The answer depends on the scale and scope
of the infrastructure and the workloads. If the purpose is to support a large number of dense workloads and a
multi-petabyte capacity, then CI is a better option. But for a smaller set of workloads (including the most
demanding loads, like databases and OLTP, but at a smaller scale), HCI is an excellent option. It also is the
appropriate choice for specific departments or remote offices. In short, HCI is ideal for applications that need agility
and need to scale quickly at the lowest cost per unit. HCI is easy to deploy with little expertise. HCI doesn’t replace
CI, but it allows IT to better tier infrastructure for varied application needs. Most IT operations can benefit from a
combination of CI and HCI that can flex to meet the evolving demands of their business.
In summary, IT organizations are rapidly evolving into cloud-centric business models where agility, scalability,
security, resource optimization, and SLAs are paramount. The SDDC architecture makes the hybrid cloud possible
by defining a platform common to both private and public clouds. Enterprises have three ways to establish an
SDDC: 1) build their own; 2) use a converged infrastructure; or 3) use a hyper-converged infrastructure. With
seamless integration of the technology stack, both CI and HCI create platforms that allow IT organizations to
efficiently and effectively transition to a modern SDDC. HCI is the easiest and
fastest way to stand up a fully virtualized SDDC environment so IT organizations can
focus on innovation and adding business value.
Figure 3: One Destination, Multiple Deployment Approaches
VCE Converged Infrastructure Platforms Overview
This section reviews the VCE CI and HCI platform architectures and product portfolios and then specifically focuses
on the VCE VxRail Appliance. Included is an introduction to the VxRail Appliance architecture and components with
a specific emphasis on the key integrated VMware software technologies that provide VxRail Appliance core services
and functionality.
VCE, the Converged Platforms Division of EMC, specializes in industry-leading Converged and Hyper-Converged
Infrastructure platforms which simply and quickly transition data centers to a modern SDDC, enabling business
transformation. Simplicity is the core driver behind the VCE portfolio of CI platforms. The VCE mission is to break
down the silos of static infrastructure in the data center and make available flexible, shared pools of resources.
With the VCE portfolio, IT leaders have the flexibility to shift resources from maintaining infrastructure to delivering
new, innovative business services while remaining cost-effective. The VCE portfolio can quickly and reliably
modernize the data center to meet the evolving and dynamic demands of today’s tech-savvy business workforce.
VCE pioneered converged infrastructure with the introduction of Vblock Systems, which bring together VMware
virtualization, Cisco networking and compute, and EMC storage. The VCE portfolio expanded quickly, offering
increased choice, flexibility, and targeted application-workload solutions as new workload platforms emerged in the
industry. Applications are now typically identified by industry-defined workload platforms: Platform 1.0, which refers
to mainframe-application workloads; Platform 2.0, which refers to client-server and virtualized x86 traditional-application
workloads; and Platform 3.0, which refers to Big Data applications with new workloads built for cloud,
social, and mobile.
Figure 4: Industry-Defined Workload Platforms
The full VCE portfolio features pre-integrated, preconfigured components, tested, validated and qualified with a
single source of support. The VCE portfolio is built on the widely adopted, industry leading VMware technology for
core functionality and management operations. The VCE portfolio features three distinct system-level architectures,
reflected in the graphic below: Blocks, Racks, and Appliances. Their corresponding design points
are proven, flexible, and simple, respectively. Each architecture has its own distinct role in an SDDC and hybrid-cloud solution
based on application workload and business requirements.
Figure 5: VCE Portfolio
BLOCK ARCHITECTURE
In the Block architecture, VCE offers two product families: Vblock® Systems and VxBlock™ Systems. These systems
bring together VMware virtualization, Cisco networking and compute, and varied EMC storage arrays. The Block
system architecture typically implements Cisco UCS server blades configured as ESXi hosts for compute layer
services. The VxBlock Systems family adds two fully integrated options for software-defined networking (SDN) and
network-layer abstraction: VMware NSX technology or Cisco’s Application Centric Infrastructure (ACI). Within the
Vblock Systems product family, specific models correspond to specific data-center purposes, but they all focus on
traditional, mission-critical enterprise workloads.
The Block architecture design center is “proven.” Vblock Systems and VxBlock Systems are proven and widely deployed. In fact, they have become an industry-standard CI system, with the terms “Vblock” and “converged infrastructure” often used interchangeably.
The Block system-level architecture has disaggregated compute, memory, network, and storage which allows
for variation at all layers. Vblock Systems and VxBlock Systems also have the traditional elements required to deliver legacy persistence and networking capabilities. This Block system-level architecture has step-function scaling.
The Block architecture workload and business requirements focus on rich infrastructure services to support Platform 2.0 applications. Vblock Systems and VxBlock Systems both support any open-system workload in the data center and have a broad set of traditional data services to meet enterprise business requirements.
RACK ARCHITECTURE
The VxRack™ Systems expands VCE’s industry leading CI portfolio to include hyper-converged infrastructure. The
VxRack Systems architecture scales linearly with hyper-converged node servers that consolidate compute and
storage layers. It incorporates a leaf-spine network architecture specifically designed to accommodate extensive,
scale-out workloads and over a thousand nodes. VCE refers to the VxRack Systems platform as hyper-convergence
at rack scale. It represents a full system deployment that includes integrated storage-attached servers and network
hardware. The VxRack Systems implements VMware EVO SDDC to facilitate ESXi server-based software-defined
storage and to deploy a virtualized NSX network layer over the physical network fabric for SDN. VxRack Systems
provides performance, reliability, and operational simplicity at large scale.
The Rack architecture design center is “flexible.” VCE VxRack Systems is an example of the flexible design center. It’s an adaptable platform in terms of its hardware and persona. (Persona flexibility refers to the ability of VxRack Systems to run multiple hypervisors, ESXi or KVM, as well as to support bare-metal deployments.)
Rack systems are engineered systems with network design as the key differentiator. At scale, leaf-and-spine and top-of-rack (ToR) cabling architectures are critical. Rack architecture incorporates the leaf-and-spine network and ToR cabling architectures that enable scaling to hundreds and thousands of nodes, deployed not in small clusters but as a massive, rack-scale, web-scale, and hyper-scale system. VxRack Systems incorporates the network fabric as a core part of the system design and management stack. The network is not just bundled but rather is an integral part of the system with single support and warranty plus management integration. Rack system-level architecture uses software-defined storage (SDS) and commodity-off-the-shelf (COTS) hardware. This rack system-level architecture has linear-function scaling.
Rack-architecture workload and business requirements focus on flexibility for different workload types (Platform 2.0, Platform 3.0, kernel-mode VMs, Linux containers), and the systems come in multiple personas. (VxRack Systems supports OpenStack and VMware hypervisors initially and will support others in the future.)
APPLIANCE ARCHITECTURE
The hyper-converged VxRail Appliance features a clustered node architecture that consolidates compute, storage,
and management into a single, resilient, network-ready HCI unit. The software-defined architectural structure
converges server and storage resources, allowing a scale-out, building-block approach, and each appliance carries
management as an integral component. From a hardware perspective, the VxRail Appliance node is a server
equipped with integrated direct-attached storage. No network components are included with the appliance; VxRail
Appliance leaves that up to the customer (although VCE can bundle switch hardware and NSX can function as an
integrated option for SDN). Typically, organizations with a small IT staff can benefit from the simplicity of the
appliance architecture to expedite application deployment and take advantage of the same data services available
from high-end systems.
The VxRail Appliance architecture design center is “simple.” VxRail Appliance is simple to acquire, deploy, operate, scale, and maintain.
The VxRail Appliance system-level architecture uses SDS and multi-node servers with integrated storage and can leverage whatever network infrastructure is available. Appliance architecture provides low-cost and low-capacity entry points with simple configurations that can easily scale.
Appliance-architecture workload and business requirements focus on simplicity and the ability to start small and grow easily. VDI and productivity applications are examples of the initial workloads deployed in appliances.
Figure 6: VCE Blocks, Racks, and Appliances
All three VCE converged infrastructure architecture models can be deployed in the same data center or, as shown in
Figure 7 below, can be part of a Federated Enterprise Hybrid Cloud (FEHC) that allows integration of the entire suite
of data center solutions (including those in remote, branch, and edge locations) and provisioning of the resources in
local or remote sites using a common service catalog.
Figure 7: VCE Converged Infrastructure in the Enterprise Data Center
VCE VXRAIL APPLIANCE PRODUCT PROFILE
VxRail Appliance was jointly developed by EMC and VMware and is the only fully integrated, preconfigured, and
tested HCI appliance powered by VMware Hyper-Converged Software. Managed through the ubiquitous VMware
vCenter Server interface, VxRail Appliance provides a familiar VMware experience that enables streamlined
deployment and the ability to extend existing IT tools and processes. The VxRail Appliance is fully loaded with
integrated, mission-critical data services from EMC and VMware including compression, deduplication, replication,
and backup. The VxRail Appliance delivers resiliency and centralized-management functionality enabling faster,
better, and simpler management of consolidated workloads, virtual desktops, business-critical applications, and
remote-office infrastructure. As the exclusive hyper-converged infrastructure appliance from VCE and VMware,
VxRail Appliance is the easiest and fastest way to stand up a fully virtualized SDDC environment.
VxRail Appliance provides an entry point to the SDDC and caters to small- and medium-sized environments, remote
and branch offices (ROBO), edge departments, and projects within larger organizations. Small-shop IT personnel
can benefit from the simplicity of the appliance model to expedite the application-deployment process while still
taking advantage of data services only typically available in high-end systems. VxRail Appliance allows businesses
to start small, with a single appliance, and scale non-disruptively. VxRail Appliance is highly configurable. Storage
can be configured for either all-flash or hybrid applications. In addition, appliances are available in nine different
models, each with a different configuration, scale points, and options for processors, storage, and cache capacity.
Finally, because the VxRail Appliance is jointly engineered, integrated, and tested, organizations can leverage a
single source of support and remote services from EMC.
Each VxRail Appliance holds four server nodes with direct-attached storage drives. VxRail Appliances are delivered
ready to deploy and ready to attach to a 10GbE customer-provided network. At the software layer,
VxRail Appliance uses VMware technology for server virtualization, network virtualization, and software-defined
storage. VxRail Appliance servers are configured as ESXi hosts, and VMs depend on the virtual switch for logical
networking. VMware Virtual SAN technology embeds storage pooling capabilities at the ESXi-kernel level, a highly
efficient design which dramatically reduces the complexities involved in infrastructure management. The policy-
based software in the management layer controls storage distribution based on application service settings.
The VxRail Appliance management platform is a strategic advantage for VxRail Appliance: a remedy for the operational
complexity inherent in HCI systems. VxRail Appliance bundles management software as a centralized stack,
and the VxRail™ Manager and VxRail™ Manager Extension each have a simple dashboard interface to automate and
accelerate deployment and to perform management tasks like upgrades. Since VxRail Appliance nodes function as
ESXi hosts, the appliance taps vCenter Server for VM-related management, automation, monitoring, and security.
Furthermore, VxRail Appliance supports the wider-ranging VMware ecosystem for high availability, cloud
management, and end-user computing services. vSphere is a well-established virtualization platform—a familiar
usable entity in most data centers. The VxRail Appliance product relies on a tailor-made management stack rather
than the Advanced Management Pod model used by Vblock Systems and VxBlock Systems. However, all three VCE
product platforms leverage vCenter Server and offer support for optional VMware and EMC services.
Software-defined functionality provided by VxRail Appliance introduces significant advancements in IT services. The
appliance is built around VMware Hyper-Converged Software (HCS), an operational software stack that includes
vSphere functionality for ESXi-based virtualization and VM networking as well as Virtual SAN for SDS. NSX for SDN
can also be easily integrated into the solution as an option. A VxRail Appliance implementation integrates smoothly
into VMware-centric data centers and, as a VCE product, it operates in concert with the Block and Rack level
deployments. This allows all data-center assets to be maintained using a single administrative platform, which
means monitoring, upgrading, and diagnostics activities are performed efficiently and reliably. Blocks, Racks, and
Appliances use the same migration technologies from VMware for moving VMs and data, thus providing advantages
in workload mobility. Finally, VxRail Appliance supports existing tools and optional services with seamless
integration. The VxRail Manager Extension provides additional EMC services, including RecoverPoint replication,
Data Domain for backup, EMC Secure Remote Services (ESRS), and cloud tiering services. VxRail Appliance also
has optional support for VCE Vision™ Intelligent Operations software, allowing IT shops to leverage integration with
VxRack Systems and Vblock Systems, enabling them to deliver a full enterprise solution for all workloads and to
replicate and protect from the enterprise edge to the data center.
VxRail Appliance Hardware Architecture
The VxRail Appliance family is a proven building block of the Software-Defined Data Center and delivers up to five times
the performance of other hyper-converged appliances. The appliance-based design allows IT centers to scale capacity and
performance non-disruptively, so they can start small and grow incrementally with minimal up-front planning.
VxRail Appliance configurations can start with as few as 200 virtual machines (VMs) and scale to thousands. The VxRail
Appliance architecture enables a predictable pay-as-you-grow approach that aligns to changing business goals and user
demand.
The VxRail Appliance is built using a distributed system architecture consisting of modular blocks (a 2U appliance with four
nodes) that scales linearly from one to 16 appliances, for a maximum of 64 nodes in a cluster. In addition, different options
are available for compute, memory, and storage configurations to match any use case. Choose from a range of next-gen
Intel processors, variable RAM, storage, and cache capacity for flexible CPU-to-RAM-to-storage ratios. Single-node scaling
and a low-cost entry point let customers procure just the right amount of storage and compute for today’s requirements
and tomorrow’s growth. Additionally, all-flash models deliver the industry’s most powerful HCI to maximize performance
and scale for applications that demand low latency. Figure 8 below shows the basic VxRail Appliance building block: A
four-node appliance with storage in front and compute in the back.
Figure 8: VxRail Appliance
VXRAIL APPLIANCE CLUSTER
Again, each VxRail Appliance consists of four nodes. Each node includes a server and six storage disk drives, either all-
flash SSDs or a hybrid mix of flash SSDs and HDDs. The nodes form a networked cluster that can be expanded by adding
more appliances (containing more nodes).
VxRail Appliance Node
The VxRail Appliance is assembled with proven server-node hardware that has been integrated, tested, and validated as a
complete solution by EMC. The current generation of VxRail Appliance nodes uses Haswell-based Intel Xeon E5-2600
processors. The Intel Xeon E5 processor family is a multi-threaded, multi-core CPU designed to handle diverse workloads
for cloud services, high-performance computing, and networking. The number of cores and memory capacity differ for
each VxRail Appliance model. Figure 9 below shows a physical view of a node server with its processors, memory and
supporting components.
Figure 9: VxRail Appliance Physical Node Server
Each node server includes the following technology:
1 – 2 Intel Xeon E5-2600 V3 processors with 6, 8, or 10 cores per processor
16 DDR4 DIMMs, providing memory capacity from 64GB to 512GB per node
A PCIe SAS Controller supporting 6Gb/s SAS speeds
A 64GB SATADOM sub-module
Dual-port network adapters
An integrated graphics BMC port, 2 USB ports, 1 Serial port, 1 VGA port
Figure 10 shows the single node from the back.
Figure 10: VxRail Appliance Node Server: Back View
VxRail Appliance Node Storage Disk Drives
Storage capacity for the VxRail Appliance is provided by disk drives that have been integrated, tested, and validated by
EMC. 2.5" form-factor Solid State Disks (SSD) and mechanical Hard Disk Drives (HDD) are managed in logical groups.
Each group has up to six disk drives and each node has one disk group. Disk groups are configured in two ways:
Hybrid configurations, which contain a single SSD flash-based disk for caching (the caching-tier) and multiple HDD
disks for capacity (the capacity-tier)
All-flash configurations, which contain all SSD flash-based disk drives
The flash drives used for caching and capacity have different endurance levels. Endurance level refers to the number of
times that an entire flash disk can be written every day for a five-year period before it has to be replaced. A higher-
endurance SSD is used for caching than for capacity. Currently, the caching tier uses 200GB, 400GB, and 800GB flash
disks, and the capacity tier uses either 3.84TB flash SSDs, 1.2TB HDDs, or 2TB HDDs. All VxRail Appliance disk
configurations use a carefully designed cache-to-capacity ratio to ensure consistent performance.
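The endurance definition above reduces to simple arithmetic. The following is a minimal sketch for illustration only: the term "drive writes per day" (DWPD), the rating of 10, and the particular disk-group layout are assumptions chosen for the example, not published VxRail drive specifications.

```python
# Illustrative sketch: lifetime write budget implied by an endurance rating.
# The DWPD value and disk-group layout below are hypothetical examples,
# not VxRail specifications.
from typing import Sequence

DAYS_PER_YEAR = 365
RATING_PERIOD_YEARS = 5  # the five-year period used in the endurance definition


def lifetime_writes_tb(capacity_gb: float, drive_writes_per_day: float) -> float:
    """Return the total data (in TB) a drive can absorb over the rating period."""
    total_gb = capacity_gb * drive_writes_per_day * DAYS_PER_YEAR * RATING_PERIOD_YEARS
    return total_gb / 1000  # decimal units: 1 TB = 1000 GB


def cache_to_capacity_ratio(cache_gb: float, capacity_drives_gb: Sequence[float]) -> float:
    """Return cache size as a fraction of the raw capacity tier in one disk group."""
    return cache_gb / sum(capacity_drives_gb)


# Hypothetical hybrid disk group: one 400GB caching SSD plus five 1.2TB HDDs.
cache_budget = lifetime_writes_tb(400, 10)        # assumed rating of 10 DWPD
ratio = cache_to_capacity_ratio(400, [1200] * 5)  # 400GB cache over 6000GB raw
print(f"cache write budget: {cache_budget:.0f} TB, cache ratio: {ratio:.1%}")
```

Because the caching tier absorbs far more writes than the capacity tier, a calculation of this kind shows why a higher-endurance SSD is placed in front of the capacity drives.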
VXRAIL APPLIANCE MODELS AND SPECIFICATIONS
Nine VxRail Appliance models are currently available, ranging from the Model 60 with nodes containing a single, 6-core
processor and 64GB of memory to the Model 280F with nodes that use dual, 14-core processors and up to 512GB of
memory. Figure 11 identifies the configuration range for both the hybrid and all-flash nodes.
Figure 11: Configuration ranges for all-flash and hybrid nodes.
(*Certain selections can limit other options that are available.)
Figure 12 shows the five VxRail Appliance models that have nodes containing all-flash storage, and Figure 13 shows the
four hybrid disk-configuration models.
Figure 12: All-Flash VxRail Appliance Models
Figure 13: Hybrid VxRail Appliance Models
Scaling
Current model configurations start with as few as four nodes housed in a single appliance and can grow in one-appliance
increments up to 16 appliances (64 nodes). New appliances can be added non-disruptively, and different model appliances
can be mixed within the larger appliance cluster environment.
Figure 14: VxRail Appliance Scaling
A few basic rules regarding scaling are worth considering for planning a cluster build out:
1. Balance: All nodes in an appliance chassis must be balanced (i.e., be the same).
a. Only the first appliance must include all four nodes.
b. Additional appliances can be partially populated with 1, 2, or 3 nodes, or they can be fully populated.
c. If a drive is added to one node in an appliance, all nodes in that appliance must also receive the drive upgrade.
2. Flexibility: Appliances in a cluster can be different models and can have different numbers of nodes.
a. Exceptions:
Hybrid models and flash models cannot be mixed in a cluster.
1GbE models (i.e., the VxRail 60) cannot be mixed with 10GbE-networking models (i.e., VxRail 120 and higher).
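The scaling rules above can be expressed as a small planning check. This is a sketch under stated assumptions, not a VxRail tool: the `Appliance` record and `validate_cluster` function are hypothetical names, and the rule that nodes within one chassis must be identical is not modeled because the sketch tracks appliances rather than individual nodes.

```python
# Illustrative sketch of the scaling rules listed above; not a VxRail tool.
# The Appliance record and validate_cluster function are hypothetical.
from dataclasses import dataclass
from typing import List

MAX_APPLIANCES = 16      # cluster limit stated in this document
NODES_PER_APPLIANCE = 4  # a fully populated 2U appliance


@dataclass(frozen=True)
class Appliance:
    model: str        # label only, e.g. "120" or "280F"
    nodes: int        # populated nodes in this chassis
    all_flash: bool   # True for all-flash, False for hybrid
    network_gbe: int  # 1 or 10


def validate_cluster(appliances: List[Appliance]) -> List[str]:
    """Return a list of rule violations; an empty list means the plan passes."""
    problems = []
    if not 1 <= len(appliances) <= MAX_APPLIANCES:
        problems.append(f"cluster must contain 1 to {MAX_APPLIANCES} appliances")
    for index, appliance in enumerate(appliances):
        if index == 0 and appliance.nodes != NODES_PER_APPLIANCE:
            problems.append("first appliance must have all four nodes")
        elif not 1 <= appliance.nodes <= NODES_PER_APPLIANCE:
            problems.append(f"appliance {index + 1} must have 1 to 4 nodes")
    if len({a.all_flash for a in appliances}) > 1:
        problems.append("hybrid and all-flash models cannot be mixed")
    if len({a.network_gbe for a in appliances}) > 1:
        problems.append("1GbE and 10GbE models cannot be mixed")
    return problems


# A full first appliance followed by a partially populated second one.
plan = [
    Appliance("120", nodes=4, all_flash=False, network_gbe=10),
    Appliance("120", nodes=2, all_flash=False, network_gbe=10),
]
print(validate_cluster(plan), sum(a.nodes for a in plan))
```

Returning a list of messages rather than raising on the first failure lets a planner see every violated rule in one pass.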
VxRail Appliance Software Architecture
These sections on software architecture provide a comprehensive examination of all the VxRail Appliance software
components and their relationships and co-dependencies. The VCE VxRail Appliance is architected with software for
appliance management and for virtualization and virtual-system management. The software stack comes preinstalled and
simply requires running a configuration wizard on-site to integrate the appliance into an existing network environment.
The picture below (Figure 15) shows the software layers and the previously discussed underlying hardware represented at
a high level.
The VxRail Appliance management, operations, and automation software includes
VxRail Manager
VxRail Manager Extension (including VMware vRealize Log Insight–formerly vCenter Log Insight)
Supplemental management options: VCE Vision Intelligent Operations software and additional VMware vRealize components
The VMware virtualization and virtual-infrastructure management software includes
vSphere vCenter Server
vSphere ESXi
VMware Virtual SAN (Software-Defined Storage)
Figure 15: VxRail Appliance Infrastructure Components
VxRail Appliance provides a unique and tightly integrated architecture for VMware environments. VxRail Appliance deeply
integrates VMware virtualization software. Specifically, VMware Virtual SAN is integrated at the kernel level and is
managed with VMware vSphere, which enables higher performance for the VxRail Appliance as well as automated scaling
and wizard-based upgrades.
The next sections review the VxRail Appliance management, operations, and automation software in depth.
APPLIANCE MANAGEMENT
VxRail™ Manager
In the introduction section of this TechBook, we discussed the complexity of the software-defined data center and the
challenges of managing and maintaining an SDDC environment. The VxRail Manager provides a user-friendly dashboard
interface (shown below in Figure 16) to automate VxRail Appliance configuration, VM provisioning, and management. The
dashboard Health Tab can be used to monitor the health of all individual appliances and individual nodes in the entire
cluster.
Once the appliance is configured and deployed, VxRail Manager can be accessed by pointing a browser at the VxRail
Manager IP address or the DNS host name.
Figure 16: VxRail Manager Dashboard: The Home view displays all the VMs, and the Health Tab indicates CPU, memory, storage, and usage.
VxRail™ Manager Extension
VxRail Manager Extension is used for adding new appliances to an existing cluster easily and non-disruptively, monitoring
the appliance resource utilization, expediting diagnostics, and troubleshooting software problems. It can, for instance,
guide systems administrators through the replacement of failed disk drives without disrupting the appliance’s availability.
The VxRail Manager Extension leverages the underlying VMware vRealize Log Insight product to capture events and
provide real-time holistic notifications about the state of virtual applications, virtual machines, and appliance hardware.
The VxRail Manager Extension adopts the simple, effective dashboard user interface (shown below in Figure 17) of the
VxRail Manager, providing a consistent look and feel for convenient access to EMC services.
Figure 17: VxRail Manager Extension displays overall system health, and its Support Tab displays support status information and resources.
The VxRail Manager Extension dashboard lets users directly reach things like EMC knowledge-base articles and user-
community forums for FAQ information and VxRail Appliance best practices. The VxRail Manager Extension also provides
service integration and simplifies the appliance lifecycle management by delivering patch software and update notifications
that can be automatically installed without interruption or downtime.
Another feature within the VxRail Manager Extension is EMC Secure Remote Services (ESRS), which enables appliances
deployed off-site to have the same level of support and service as the devices deployed in the main datacenter. ESRS also
can be used for online chat support and EMC field-service assistance. Figure 18 below summarizes its implementation
details.
Figure 18: VxRail Manager Extension ESRS details
Furthermore, the VxRail Manager Extension provides access to a digital market (Figure 19) for finding and downloading
qualified, value-add VxRail Appliance VM applications such as CloudArray, RecoverPoint for VMs, and vSphere Data
Protection (VDP).
Figure 19: VxRail Manager Extension Dashboard – Market Tab
In addition to service integration, the VxRail Manager Extension augments the VxRail Manager health monitoring via
integration with the VMware vRealize Log Insight to track alerts for hardware, software, and virtual machines. It delivers
real-time automated log management for the VxRail Appliance with log monitoring, intelligent grouping, and analytics to
provide better troubleshooting at scale across VxRail Appliance physical, virtual, and cloud environments.
VMWARE VSPHERE
The VMware vSphere software suite delivers an industry-leading virtualization platform to provide application virtualization
within a highly available, resilient, efficient on-demand infrastructure—making it the ideal software foundation for VxRail
Appliance. ESXi and vCenter are components of the vSphere software suite. ESXi is a hypervisor installed directly onto a
physical server node in VxRail Appliance, enabling it to be partitioned into multiple logical servers referred to as virtual
machines (VMs). VMs are installed on top of the ESXi server. VMware vCenter server is a centralized management
application that is used to manage the ESXi hosts and VMs.
The following sections will provide in-depth examination of the VMware vSphere software components that are
implemented in the VxRail Appliance software architecture.
VMware vSphere vCenter Server
VxRail Appliance uses vSphere vCenter Server from VMware as the central administrator for networked ESXi hosts. vCenter
Server provides the VxRail Appliance with trusted, functional, and familiar VM management. vCenter Server enables
pooling and manages resources from multiple ESXi servers. (See Figures 20 and 21 below.) A single vCenter Server can
manage up to 1,000 ESXi hosts and/or up to 10,000 virtual machines.
The vCenter Server architecture includes the following components:
vSphere Client, which provides direct connection to ESXi hosts.
vSphere Web Client, which provides direct connection to vCenter Server.
vCenter Server database, which functions as the back-end SQL database for storing the inventory items, security roles, resource pools, performance data, and other critical information for vCenter Server.
VMware vSphere Platform Services Controller (PSC), which is a new service in vSphere 6 that handles the infrastructure security functions such as vCenter Single Sign-On, licensing, certificate management, directory services, and server reservation. The PSC also includes a Lookup Service that keeps topology information about the vSphere infrastructure for secure component interconnectivity. Other services (such as the Inventory Service) register with the Lookup Service so they can be located by vCenter Server components (like the vSphere Web Client).
Figure 20: vCenter Server Architecture (1 of 2)
Figure 21: vCenter Server Architecture (2 of 2)
vCenter Server Services and Interfaces
vCenter provides a number of services and interfaces, including
Core VM and resource services such as an inventory service, task scheduling, statistics logging, alarm and event management, and VM provisioning and configuration
Distributed services such as vSphere vMotion, vSphere DRS, and vSphere HA
vCenter Server database interface
Figure 22: vCenter Server services
PSC Deployment Options
The Platform Services Controller (PSC) can be deployed either as embedded or external, as depicted in Figure 23.
Embedded PSC is implemented in stand-alone deployments where vCenter Server is the only SSO-integrated solution. The vCenter Server is bundled with an embedded PSC, and all the PSC services reside on the same host machine as vCenter Server.
External PSC is deployed in environments with multiple SSO-enabled solutions, and supports an Enhanced Linked Mode (ELM) that connects multiple vCenter Servers to the External PSC. VxRail Appliance administrators have a clear view of all the vCenter Server instances across all linked vCenter Server systems and can create and replicate roles, permissions, licenses, and other key data. vCenter supports High-Availability External PSC configurations, where multiple PSCs use a load balancer to provide resilient availability. (See Figure 24.) The vCenter Server systems can then join that PSC domain using the IP address of the load balancer. In the end, the ELM-created replicated services that exist on multiple instances of vCenter Server can be attached to two PSCs implemented in a highly available configuration, which is resilient to failures.
Figure 23: Embedded and External PSC deployments
Figure 24: External PSCs configured for High Availability
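The choice between the two PSC deployment options described above can be summarized as a small decision rule. The sketch below only restates the guidance in the text; the function name and parameters are hypothetical and do not correspond to any VMware tool or API.

```python
# Illustrative decision rule for the PSC deployment options described above.
# Function and parameter names are hypothetical, not part of any VMware API.

def choose_psc_deployment(vcenter_servers: int, other_sso_solutions: int = 0) -> str:
    """Return 'embedded' or 'external' following the guidance in the text."""
    if vcenter_servers < 1:
        raise ValueError("at least one vCenter Server is required")
    if other_sso_solutions < 0:
        raise ValueError("other_sso_solutions cannot be negative")
    # Embedded: stand-alone deployment where vCenter Server is the only
    # SSO-integrated solution.
    if vcenter_servers == 1 and other_sso_solutions == 0:
        return "embedded"
    # External: multiple SSO-enabled solutions, or several vCenter Servers
    # connected through Enhanced Linked Mode.
    return "external"


print(choose_psc_deployment(1))     # stand-alone vCenter Server
print(choose_psc_deployment(3, 1))  # linked vCenter Servers plus another solution
```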
VMware vSphere ESXi
vSphere is the core operational software in the VxRail Appliance. vSphere aggregates a comprehensive set of features that
efficiently pools and manages the resources available under the ESXi hosts. Keep in mind that this TechBook focuses on
vSphere technology specifically as it pertains to the VxRail Appliance. Features included in other vSphere implementations
may not apply to VxRail Appliance and features included in VxRail Appliance may not apply to other implementations.
ESXi Overview
VMware ESXi is an enterprise-class hypervisor that deploys and services virtual machines. Figure 25 illustrates its basic
architecture. ESXi partitions a physical server into multiple secure and portable VMs that can run side by side on the same
physical server. Each VM represents a complete system (with processors, memory, networking, storage, and BIOS), so any
operating system (guest OS) and software applications can be installed and run in the virtual machine without any
modification. The hypervisor provides physical-hardware resources dynamically to virtual machines (VMs) as needed to
support the operation of the VMs. The hypervisor enables virtual machines to operate with a degree of independence from
the underlying physical hardware. For example, a virtual machine can be moved from one physical host to another. Also,
the VM’s virtual disks can be moved from one type of storage to another without affecting the functioning of the virtual
machine. ESXi also isolates VMs from one another, so when a guest operating system running in one VM fails, other VMs
on the same physical host are unaffected and continue to run. Virtual machines share access to CPUs and the hypervisor
is responsible for CPU scheduling. In addition, ESXi assigns VMs a region of usable memory and provides shared access to
the physical network cards and disk controllers associated with the physical host. Different virtual machines can run
different operating systems and applications on the same physical computer.
Figure 25: Birds-Eye View: vSphere ESXi Architecture
Communication Between vCenter Server and ESXi Hosts
vCenter Server communicates with the ESXi host through a vCenter Server agent, also referred to as vpxa or the
vmware-vpxa service, which is started on the ESXi host when it is added to the vCenter Server inventory. (See Figure
26.) Specifically, the vCenter vpxd daemon communicates through the vpxa to the ESXi host daemon known as the
hostd process. The vpxa process acts as an intermediary between the vpxd process that runs on vCenter Server and the
hostd process that runs on the ESXi host, relaying the tasks to perform on the host. The hostd process runs directly on
the ESXi host and is responsible for managing most of the operations on the ESXi host including creating VMs, migrating
VMs, and powering on VMs.
Figure 26: Communication Between vCenter and ESXi Hosts
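The relay path between vpxd, vpxa, and hostd can be pictured with a toy model. The classes below are illustrative stand-ins named after the processes in the text; they are assumptions made for this sketch and are not VMware APIs or an account of the actual wire protocol.

```python
# Toy model of the relay path described above. These classes are
# illustrative stand-ins, not VMware APIs.

class Hostd:
    """Stands in for the hostd process that performs operations on the host."""

    def __init__(self, host_name: str) -> None:
        self.host_name = host_name
        self.powered_on = set()

    def execute(self, task: str, vm_name: str) -> str:
        if task == "power_on":
            self.powered_on.add(vm_name)
        return f"{self.host_name}: {task} {vm_name} done"


class Vpxa:
    """Stands in for the vCenter Server agent running on the ESXi host."""

    def __init__(self, hostd: Hostd) -> None:
        self.hostd = hostd

    def relay(self, task: str, vm_name: str) -> str:
        # The agent does not do the work itself; it passes the task to hostd.
        return self.hostd.execute(task, vm_name)


class Vpxd:
    """Stands in for the vpxd daemon on vCenter Server."""

    def __init__(self) -> None:
        self.agents = {}

    def add_host(self, host_name: str) -> None:
        # Adding a host to the inventory starts an agent for that host.
        self.agents[host_name] = Vpxa(Hostd(host_name))

    def submit(self, host_name: str, task: str, vm_name: str) -> str:
        return self.agents[host_name].relay(task, vm_name)


vcenter = Vpxd()
vcenter.add_host("esxi-01")  # hypothetical host name
print(vcenter.submit("esxi-01", "power_on", "vm-a"))
```

The point of the model is the indirection: vCenter Server never touches the host directly, so every task passes through the agent before hostd carries it out.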
Virtual Machines
A virtual machine consists of a core set of the following related files, or a set of objects. (See Figure 27.) Except for the
log files, the name of each file starts with the virtual machine’s name (VM_name). These files include
A configuration file (.vmx) and/or a virtual-machine template-configuration file (.vmtx)
One or more virtual disk files (.vmdk)
A file containing the virtual machine’s BIOS settings (.nvram)
A virtual machine’s current log file (.log) and a set of files used to archive old log entries (-#.log)
Swap files (.vswp), used to reclaim memory during periods of contention
A snapshot description file (.vmsd), which is empty if the virtual machine has no snapshots
Figure 27: Virtual Machine Files
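The file types listed above lend themselves to a simple lookup by extension. The following sketch labels the files in a VM's directory by role; the file names are made up for illustration, and the mapping covers only the extensions named in this section.

```python
# Illustrative sketch: label VM files by role using the extensions listed
# above. The file names below are made up.
from pathlib import PurePosixPath

ROLE_BY_SUFFIX = {
    ".vmx": "configuration",
    ".vmtx": "template configuration",
    ".vmdk": "virtual disk",
    ".nvram": "BIOS settings",
    ".log": "log",
    ".vswp": "swap",
    ".vmsd": "snapshot description",
}


def classify(file_name: str) -> str:
    """Return the role of a VM file, or 'unknown' for unlisted extensions."""
    suffix = PurePosixPath(file_name).suffix.lower()
    return ROLE_BY_SUFFIX.get(suffix, "unknown")


for name in ["web01.vmx", "web01.vmdk", "web01.nvram", "vmware-1.log", "web01.vswp"]:
    print(f"{name}: {classify(name)}")
```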
Virtual Machine Hardware
A virtual machine uses virtual hardware. Each guest operating system sees ordinary hardware devices and does not know
that these devices are virtual. (Hardware resources are shown below in Figure 28.) All virtual machines have uniform
hardware, except for a few variations that the system administrator can apply. Uniform hardware makes virtual machines
portable across VMware virtualization platforms. vSphere supports many of the latest CPU features, including virtual CPU
performance counters. It is possible to add virtual hard disks and NICs, and configure virtual hardware, such as CD/DVD
drives, floppy drives, SCSI devices, USB devices, and up to 16 PCI vSphere DirectPath I/O devices.
Figure 28: Hardware resources for VMs
Virtual Machine Communication
The Virtual Machine Communication Interface (VMCI) provides a high-speed communication channel between a virtual
machine and the hypervisor. VMCI devices cannot be added or removed. The SATA controller provides access to virtual
disks and DVD/CD-ROM devices. The SATA virtual controller appears to a virtual machine as an AHCI SATA controller.
Without VMCI, virtual machines would communicate with the host using the network layer, which adds overhead to the
communication. With VMCI, communication overhead is minimal, and tasks requiring that communication can be
optimized. An internal network can transmit an average of slightly over 2Gbps using VMXNET3. VMCI can go up to nearly
10Gbps with 128k-sized queue pairs.
VMCI provides socket APIs that are very similar to the APIs already used for TCP/UDP applications.
For more information about the virtual hardware, see the vSphere Virtual Machine Administration Guide at
https://www.vmware.com/support/pubs/vsphere-esxi-vcenter-server-6-pubs.html.
Virtual Networking
VMware vSphere provides a rich set of networking capabilities that integrate well with sophisticated enterprise networks.
These networking capabilities are provided by ESXi Server and managed by vCenter. Virtual networking provides the
ability to network virtual machines in the same way physical machines are networked. Virtual networks can be built
within a single ESX Server host or across multiple ESX Server hosts. Virtual switches allow virtual machines on the same
ESX Server host to communicate with each other using the same protocols that would be used over physical switches,
without the need for additional networking hardware. ESX Server virtual switches also support VLANs that are compatible
with standard VLAN implementations from other vendors. A virtual switch, like a physical Ethernet switch, forwards frames
at the data link layer. A virtual machine can be configured with one or more virtual Ethernet adapters, each of which has
its own IP address and MAC address. As a result, virtual machines have the same properties as physical machines from a
networking standpoint. In addition, virtual networks enable functionality not possible with physical networks today. The
key virtual networking components provided by vSphere are virtual Ethernet adapters, used by individual virtual machines,
and virtual switches, which connect virtual machines to each other and connect both virtual machines and the ESX Server
service console to external networks.
Figure 29: Virtual Switch Architecture
An ESXi host might contain multiple virtual switches. The virtual switch connects to the external network through
outbound Ethernet adapters called vmnics, and the virtual switch can bind multiple vmnics together (much like NIC
teaming on a traditional server), extending availability and bandwidth to the virtual machines it services.
Virtual switches are similar to their physical-switch counterparts. A general architecture is depicted in Figure 29. Like a
physical network device, each virtual switch is isolated for security and has its own forwarding table. An entry in one table
cannot point to a port on another virtual switch. The switch looks up only destinations that match the ports on the
virtual switch where the frame originated. This feature stops potential hackers from breaking virtual switch isolation.
Virtual switches also support VLAN segmentation at the port level, so each port can be configured either as an access port
to a single VLAN or as a trunk port to multiple VLANs.
VMware has developed two virtual switches—the standard switch and the distributed switch—for different applications. The
VxRail Appliance supports both switch types through vCenter Server.
Standard Virtual Switch
The standard virtual switch is responsible for connecting virtual machines to a virtual network. It works much like a
physical switch and controls how virtual machines communicate with one another. The standard switch has a host-level
virtual network configuration. In this case, each ESXi host uses the standard switch both to connect virtual machines to the
physical network and to connect the physical network to VMkernel services, including access to IP storage, such as NFS or
iSCSI.
Figure 30: Single Standard Switch
More than one network can coexist on the same virtual switch (Figure 30), or multiple networks can exist on separate
virtual switches (Figure 31).
Figure 31: Multiple Standard Switches
Virtual Distributed Switch
The VMware vSphere Distributed Switch (VDS) has similar components to those of a standard switch, but functions as a
single virtual switch across all associated hosts. This switch enables virtual machines to maintain consistent network
configuration as they migrate across multiple hosts. A distributed switch is configured in vCenter Server at the datacenter
level and makes the configuration consistent across all hosts. vCenter Server stores the state of distributed ports in the
vCenter Server database. Networking statistics and policies migrate with virtual machines when the virtual machines are
moved from host to host. As we discuss in upcoming sections, Virtual SAN relies on VDS for its storage-virtualization
capabilities, and the VxRail Appliance uses VDS for appliance traffic.
Figure 32 provides a VDS overview. Detailed information about VDS is available at:
https://www.vmware.com/products/vsphere/features/distributed-switch#sthash.WC5hSHzt.dpuf
Figure 32: Distributed Switch
Migration and vMotion
The advanced capabilities for migrating data without disruption are among the features that distinguish the VxRail
Appliance solution from other HCI options. In the vSphere virtual infrastructure, migration refers to moving a virtual
machine from one host, datastore, or vCenter Server system to another host, datastore, or vCenter Server system.
Different types of migrations exist, including:
Cold, which is migrating a powered-off VM to a new host or datastore
Suspended, which is migrating a suspended VM to a new host or datastore
Live, which uses vSphere vMotion to migrate a "live," powered-on VM to a new host and/or uses vSphere Storage vMotion to migrate the files of a live, powered-on VM to a new datastore
vMotion allows for live migration of virtual machines between compatible ESXi hosts with no disruption or downtime. The
process is summarized in Figure 33. With vMotion, while the entire state of the virtual machine is migrated, the data
storage remains in the same datastore. The state information includes the current memory content and all the information
that defines and identifies the virtual machine. The memory content consists of transaction data and whatever bits of the
operating system and applications are in memory. The definition and identification information stored in the state includes all
the data that maps to the virtual machine hardware elements, including BIOS, devices, CPU, and MAC addresses for the
Ethernet cards.
Figure 33: vMotion Migration
A vMotion migration consists of the following steps:
1. The VM memory state is copied over the vMotion network from the source host to the target host. Users continue to access the VM and, potentially, update pages in memory. A list of modified pages in memory is kept in a memory bitmap on the source host.
2. After most of the VM memory is copied from the source host to the target host, the VM is quiesced. No additional activity occurs on the VM. During the quiesce period, vMotion transfers the VM-device state and memory bitmap to the destination host.
3. Immediately after the VM is quiesced on the source host, the VM is initialized and starts running on the target host. A Gratuitous Address Resolution Protocol (GARP) request notifies the subnet that the MAC address for the VM is now on a new switch port.
4. Users access the VM on the target host instead of the source host. The memory pages used by the VM on the source host are marked as free.
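The four steps above can be sketched as a small simulation. This is an illustrative model only, not VMware's implementation: the function name `simulate_vmotion`, the dictionary-based memory model, and the way dirty pages are brought over after the bitmap is transferred are all assumptions made for the example.

```python
def simulate_vmotion(source_memory, writes_during_copy):
    """Toy model of a vMotion memory migration.

    source_memory: dict mapping page number -> page content (hypothetical model).
    writes_during_copy: list of (page, content) writes made by users while
    the initial copy is in flight.
    Returns (target_memory, sorted list of pages recorded in the bitmap).
    """
    # Step 1: copy the memory state while the VM keeps running.
    target_memory = dict(source_memory)
    dirty_bitmap = set()
    for page, content in writes_during_copy:
        source_memory[page] = content   # the VM updates a page on the source
        dirty_bitmap.add(page)          # the source records the modified page

    # Step 2: quiesce the VM, then bring the target up to date for every
    # page listed in the bitmap (a simplification of the real transfer).
    for page in dirty_bitmap:
        target_memory[page] = source_memory[page]

    # Steps 3 and 4: the VM now runs on the target; source pages are freed.
    dirty_pages = sorted(dirty_bitmap)
    source_memory.clear()
    return target_memory, dirty_pages


if __name__ == "__main__":
    memory = {0: "a", 1: "b", 2: "c"}
    target, dirty = simulate_vmotion(memory, [(1, "B"), (2, "C"), (1, "BB")])
    print(target, dirty)
```

The point of the bitmap is that only the pages touched during the copy need attention at quiesce time, which keeps the quiesce period short.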
Enhanced vMotion Compatibility
Enhanced vMotion Compatibility (EVC) is a cluster feature that prevents vMotion migrations from failing because of
incompatible CPUs. EVC ensures that all hosts in a cluster present the same CPU feature set to virtual machines, even if
the actual CPUs on the hosts differ. It prevents migration failures due to CPU incompatibility.
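As a rough illustration of "the same CPU feature set," the sketch below computes the features common to every host in a cluster. This is a simplification: EVC in practice works from predefined baseline modes rather than a computed intersection, and the function name `common_feature_set` and the feature labels are hypothetical.

```python
def common_feature_set(host_features):
    """Return the CPU features that every host in the cluster supports.

    host_features: dict mapping host name -> iterable of feature labels.
    An empty cluster yields an empty set.
    """
    feature_sets = [set(features) for features in host_features.values()]
    if not feature_sets:
        return set()
    return set.intersection(*feature_sets)


if __name__ == "__main__":
    cluster = {
        "node-1": ["sse4.2", "avx", "avx2"],
        "node-2": ["sse4.2", "avx"],
    }
    # A VM that sees only the common features can run on either node.
    print(sorted(common_feature_set(cluster)))
```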
Storage vMotion
Storage vMotion uses an I/O-mirroring architecture to copy disk blocks between source and destination. The image below
(Figure 34) helps to describe the process:
1. Initiate storage migration.
2. Use the VMkernel data mover or vSphere Storage APIs for Array Integration (VAAI) to copy data.
3. Start a new VM process.
4. Mirror I/O calls to file blocks that have already been copied to virtual disk on the target datastore.
5. Switch to the target-VM process to begin accessing the virtual-disk copy.
Figure 34: Storage vMotion
The storage-migration process copies the disk just once, and the mirror driver synchronizes the source and target blocks
with no need for recursive passes. In other words, if the source block changes after it migrates, the mirror driver writes to
both disks simultaneously, which maintains transactional integrity. The mirroring architecture of Storage vMotion produces
more predictable results, shorter migration times, and fewer I/O operations than more conventional storage-migration
options. It’s fast enough to be unnoticeable to the end user. It also guarantees migration success even when using a slow
destination disk.
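The single-pass copy with I/O mirroring can be sketched as follows. This is a toy model under stated assumptions, not the VMkernel mirror driver: disks are Python lists, the function name `storage_vmotion` is hypothetical, and guest writes are scheduled by the copy position at which they arrive.

```python
def storage_vmotion(source, writes):
    """Copy `source` to a new disk in one pass while mirroring guest writes.

    source: list of block contents (modified in place by guest writes).
    writes: dict mapping copy position -> list of (block_index, content)
            writes that arrive just before the block at that position is copied.
    Returns the target disk as a list.
    """
    target = [None] * len(source)
    for position in range(len(source)):
        for block_index, content in writes.get(position, []):
            source[block_index] = content
            if block_index < position:
                # Block already copied: mirror the write to the target too.
                target[block_index] = content
            # Blocks not yet copied are picked up later by the single pass.
        target[position] = source[position]
    return target


if __name__ == "__main__":
    disk = ["a", "b", "c", "d"]
    copy = storage_vmotion(disk, {2: [(0, "A"), (3, "D")]})
    print(copy == disk)
```

Because writes to already-copied blocks land on both disks, the pass never has to be repeated, which is the property the text describes.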
vSphere 6.0 supports the following Storage vMotion migrations:
Between clusters
Between datastores
Between networks
Between vCenter Server instances for vCenter Servers configured in Enhanced Linked Mode with hosts that are time-synchronized
Over long distances (up to 150ms round trip time)
vSphere Distributed Resource Scheduler
VMware Distributed Resource Scheduler (DRS) is a key feature included with vSphere Enterprise Plus and vSphere with
Operations Management Enterprise Plus. DRS balances computing capacity across a collection of VxRail Appliance server
resources that have been aggregated into logical pools. It continuously balances and optimizes compute resource allocation
among the VMs. When a VM experiences an increased workload, DRS evaluates the VM priority against user-defined
resource-allocation rules and policies. If justified, DRS allocates additional resources. It can also be configured to
dedicate consistent resources to the VMs of particular business-unit applications to meet SLAs and business requirements.
DRS allocates resources to the VM either by migrating the VM to another server with more available resources or by
freeing resources for the VM on the same server by migrating other VMs off the server. In the VxRail Appliance, all
ESXi hosts are part of a vMotion network. The live migration of VMs to different node servers is completely transparent to
end users through vMotion (see Figures 35 and 36 below). DRS adds tremendous value to the VxRail Appliance by
automating VM placement, ensuring consistent and predictable application-workload performance.
Figure 35: DRS Movement of VMs Across Node Servers
Figure 36: VM migration across the vMotion Network
DRS offers a considerable advantage to VxRail Appliance users during maintenance situations, because it automates the
tasks normally involved in manually moving live machines during upgrades or repairs. DRS facilitates maintenance
automation, providing transparent, continuous operations by dynamically migrating all VMs to other physical servers. That
way, servers can be attended to for maintenance, or new node servers can be added to a resource pool, all while DRS
automatically redistributes the VMs among the available servers as the physical resources change. In other words, DRS
dynamically balances VMs as soon as additional resources become available when a new server is added or when an
existing server has finished its maintenance cycle. DRS allocates only CPU and memory resources for the VMs and uses
Virtual SAN for shared storage.
Figure 37: Configuring DRS Settings
Some conditions and business operations warrant a more aggressive DRS migration strategy than others. Adjustable
Virtual SAN cluster parameters establish the thresholds that trigger DRS migrations. For example, a Level-2 threshold applies
only those migration recommendations that make a significant impact on the cluster’s load balance, whereas a Level-5
threshold applies all recommendations that improve the cluster’s load balance even slightly.
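One way to picture the threshold levels is as a filter over prioritized recommendations. The sketch below assumes each recommendation carries a priority from 1 (largest impact) to 5 (smallest impact) and that a threshold of level N applies priorities 1 through N; the function name and the sample recommendations are hypothetical.

```python
def recommendations_to_apply(recommendations, threshold_level):
    """Return the names of recommendations a given threshold would apply.

    recommendations: list of (name, priority) pairs, priority 1..5 where
                     1 is assumed to mean the largest load-balance impact.
    threshold_level: DRS migration threshold, 1 (conservative) to 5 (aggressive).
    """
    if not 1 <= threshold_level <= 5:
        raise ValueError("threshold_level must be between 1 and 5")
    return [name for name, priority in recommendations
            if priority <= threshold_level]


if __name__ == "__main__":
    pending = [("move vm-a", 1), ("move vm-b", 2), ("move vm-c", 4), ("move vm-d", 5)]
    print(recommendations_to_apply(pending, 2))  # only the high-impact moves
    print(recommendations_to_apply(pending, 5))  # every recommendation
```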
DRS applies only to VxRail Appliance virtual machines. (Virtual SAN uses a single datastore and handles placement and
balancing internally. Virtual SAN does not currently support Storage DRS or Storage I/O Control.)
vSphere High Availability (HA)
vSphere provides several solutions to ensure a high level of availability in both planned and unplanned downtime
scenarios. vSphere depends on the following technologies to make sure that virtual machines running in the environment
remain available (as in Figure 38):
Virtual machine migration
Multiple I/O adapter paths
Virtual machine load balancing
Fault tolerance
Disaster recovery
Together with Virtual SAN, vSphere HA produces a resilient, highly available solution for VxRail Appliance virtual machine
workloads. vSphere HA protects virtual machines by restarting them in the event of a host failure. It leverages the ESXi
cluster configuration to ensure rapid recovery from outages, providing cost-effective high availability for applications
running in virtual machines. When a host joins a cluster, its resources become part of the cluster resources. The cluster
manages the resources of all hosts within it. In a vSphere environment, ESXi clusters are responsible for vSphere HA,
DRS, and the Virtual SAN technology that provides VxRail Appliance software-defined storage capabilities.
Figure 38: vSphere HA
vSphere HA provides several points of protection for applications:
It circumvents any server failure by restarting the virtual machines on other hosts within the cluster.
It continuously monitors virtual machines and resets any VM on which a failure is detected.
It protects against datastore accessibility failures and provides automated recovery for affected virtual machines. With Virtual Machine Component Protection (VMCP), the affected VMs are restarted on other hosts that still have access to the datastores.
It protects virtual machines against network isolation by restarting them if their host becomes isolated on the management or VMware Virtual SAN network. This protection is provided even if the network has become partitioned.
Once vSphere HA is configured, all workloads are protected. No actions are required to protect new virtual machines and
no special software needs to exist within the application or virtual machine.
Included in the failover capabilities in vSphere HA is a service called the Fault Domain Manager (FDM) that runs on the
member hosts. After the FDM agents have started, the cluster hosts become part of a fault domain, and a host can exist in
only one fault domain at a time. Hosts cannot participate in a fault domain if they are in maintenance mode, standby
mode, or disconnected from vCenter Server.
Figure 39: Fault Domain Management
FDM uses a master-slave operational model (Figure 39). An automatically designated master host manages the fault
domain, and the remaining hosts are slaves. FDM agents on slave hosts communicate with the FDM service on the master
host using a secure TCP connection. In the VxRail Appliance environment, vSphere HA is enabled only after the Virtual SAN
cluster has been configured. Once vSphere HA has started, vCenter Server contacts the master host agent and sends it a
list of cluster-member hosts along with the cluster configuration. That information is saved to local storage on the master
host and then pushed out to the slave hosts in the cluster. If additional hosts are added to the cluster during normal
operation, the master agent sends an update to all hosts in the cluster.
The master host provides an interface to vCenter Server for querying and reporting on the state of the fault domain and
virtual machine availability. vCenter Server governs the vSphere HA agent, identifying the virtual machines to protect and
maintaining a VM-to-host compatibility list. The agent learns of state changes through hostd, and vCenter Server learns of
them through vpxa. The master host monitors the health of the slaves and takes responsibility for virtual machines that
had been running on a failed slave host. Meanwhile, the slave host monitors the health of its local virtual machines and
sends state changes to the master host. A slave host also monitors the health of the master host.
vSphere HA is configured, managed, and monitored through vCenter Server. Cluster configuration data is maintained by
the vCenter Server vpxd process. If vpxd reports any cluster configuration changes to the master agent, the master
advertises a new copy of the cluster configuration information and then each slave fetches the updated copy and writes
the new information to local storage. Each datastore includes a list of protected virtual machines. The list is updated after
vCenter Server notices any user-initiated power-on (protected) or power-off (unprotected) operation.
vCenter Server Watchdog
One method of providing vCenter Server availability is to use the Watchdog feature in a vSphere HA cluster. Watchdog
monitors and protects vCenter Server services. If any services fail, Watchdog attempts to restart them. If it cannot restart
the service because of a host failure, vSphere HA restarts the virtual machine (VM) running the service on a new host.
Watchdog can provide better availability by using vCenter Server processes (PID Watchdog) or the vCenter Server API
(API Watchdog).
vSphere Fault Tolerance (FT)
vSphere Fault Tolerance provides a higher level of availability, allowing users to protect any virtual machine from a host
failure with no loss of data, transactions, or connections. Fault Tolerance works through redundancy. It duplicates the
virtual machine workload and transactions onto an identical virtual machine on a different host so it can be used for
transparent failover. In other words, it implements a primary and secondary VM, as in Figure 40 below. The key is
ensuring that the states of the primary and secondary virtual machines remain identical at all points in the instruction
execution.
Figure 40: Fault Tolerance
vSphere Fault Tolerance creates two complete virtual machines. Each virtual machine has its own .vmx configuration file
and .vmdk files. The protected virtual machine is the primary, and the secondary VM runs on another host. It can take
over at any point without interruption, providing fault-tolerant protection.
The primary and secondary virtual machines continuously monitor the status of one another to securely maintain fault
tolerance. If the primary VM fails, the secondary is activated immediately as a replacement. At that point, a new
secondary virtual machine is started and redundant fault tolerance is reestablished automatically. Furthermore, if the host
running the secondary VM fails, the secondary is also immediately replaced. In either case, users experience no interruption in
service and no loss of data.
vSphere Fault Tolerance needs to be compatible with DRS. Using both solutions requires that the Enhanced vMotion
Compatibility mode be enabled. Then DRS can make initial placement recommendations for fault-tolerant virtual machines
knowing that fault-tolerant primary and secondary VMs cannot run on the same host.
vSphere Fault Tolerance can accommodate symmetric multiprocessor (SMP) virtual machines with up to four vCPUs.
VIRTUAL SAN
VxRail Appliance leverages VMware’s Virtual SAN software, which is fully integrated with vSphere to access full-
featured, efficient, and cost-effective software-defined storage. Virtual SAN aggregates locally attached disks of
vSphere cluster hosts to create a pool of distributed shared storage. (See Figure 41 below.) IT centers can easily
scale up the Virtual SAN storage solution by adding new or larger disks to the ESXi hosts (nodes) and just as easily
scale it out by adding new ESXi hosts to the cluster. This provides the flexibility to start with a very small
environment and scale it over time, adding new hosts and more disks. VM-level policies can be set and modified on
the fly to control storage provisioning and day-to-day management of storage service-level agreements
(SLAs). vSphere and Virtual SAN are integrated into VxRail Appliance to deliver enterprise-class features for VMs
such as vMotion, HA, and DRS and to provide storage scale and performance.
Virtual SAN is a software-based distributed storage solution that is built into the ESXi hypervisor. It’s preconfigured
and managed through vCenter to provide storage capacity across all VxRail Appliance nodes. The appliance-
initialization process collects locally attached storage disks from each ESXi node in the cluster to create a
distributed, shared-storage datastore. The amount of storage in the Virtual SAN datastore is an aggregate of all of
the capacity drives in the cluster. Cache drives are not used in calculating the size of the datastore. For example, if
a cluster has eight hosts, and each host contributes three 12GB SAS drives, the Virtual SAN datastore will be
approximately 288GB. All VMs created in VxRail Appliance are automatically added to the Virtual SAN datastore. A
typical VxRail Appliance configuration would have four ESXi node servers for each appliance, and the disk group for
each node contains at least one flash SSD and three-to-five HDDs.
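The sizing rule in the paragraph above (capacity drives are summed, cache drives are excluded) reduces to a simple product. The sketch below reproduces the eight-host example; the function name is hypothetical, and the result is raw capacity before any overhead, which is why the text says "approximately."

```python
def vsan_datastore_capacity_gb(hosts, capacity_drives_per_host, drive_size_gb):
    """Raw Virtual SAN datastore size: the sum of all capacity drives.

    Cache drives are deliberately not an input, because they do not
    contribute to the size of the datastore.
    """
    return hosts * capacity_drives_per_host * drive_size_gb


if __name__ == "__main__":
    # Eight hosts, each contributing three 12GB capacity drives.
    print(vsan_datastore_capacity_gb(8, 3, 12))
```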
Figure 41: Virtual SAN Datastore
Virtual SAN enables rapid storage provisioning within vCenter as part of the VM-creation and -deployment operations.
Virtual SAN is policy driven and designed to simplify storage provisioning and management. It automatically and
dynamically matches requirements with underlying storage resources based on VM-level storage policies. With Virtual
SAN, VxRail Appliance provides two different node-storage configuration options: a hybrid configuration that leverages both
flash SSDs and mechanical HDDs, and an all-flash SSD configuration. The hybrid configuration uses flash SSDs at
the cache tier and mechanical HDDs for capacity and persistent data storage. This delivers enterprise performance and a
resilient storage platform. The all-flash configuration uses flash SSDs for both the caching tier and capacity tier.
Disk Groups
Storage disks in VxRail Appliance hosts are organized into disk groups, and they contribute to the storage available
from the Virtual SAN cluster. Think of disk groups as the main unit of storage on an ESXi host. (See Figure
42 below.) In a VxRail Appliance, a disk group contains a maximum of one flash-cache device and up to five
capacity devices: either mechanical disks or flash devices used as capacity in an all-flash configuration. Each server
node (ESXi host) has its own disk group.
Figure 42: VxRail Disk Groups
In hybrid configurations, a disk group combines a single flash-based device for caching with multiple mechanical-disk
devices for capacity. For these deployments, the flash device is assigned during configuration to provide the
cache for a given set of capacity devices. This gives a degree of control over performance because the cache-to-
capacity ratio is based on disk-group configuration. Wider cache-to-capacity ratios generally require flash devices of
larger capacity. Currently, the VxRail Appliance is offered with 200GB, 400GB, or 800GB cache-tier flash devices for
hybrid configurations.
The screenshot below (Figure 43) identifies the disk group on a host that contains four disks. The first is a flash SSD,
and its role is defined as Cache. The other three disks are HDDs defined as Capacity. The role of the disks, either
cache or capacity, is automatically set in the VxRail Appliance.
Figure 43: Disk Group Configuration
Hybrid and All-Flash Differences
The cache is used differently in hybrid and all-flash configurations. In hybrid disk-group configurations (which use
mechanical HDDs for the capacity tier and flash SSD devices for the caching tier), the caching algorithm attempts to
maximize both read and write performance. The flash SSD device serves two purposes: a read cache and a write
buffer. Seventy percent of the available cache is allocated for storing frequently read disk blocks, minimizing
accesses to the slower mechanical disks. The remaining 30 percent of available cache is allocated to writes. Multiple
writes are coalesced and written sequentially if possible, again maximizing mechanical HDD performance.
In all-flash configurations, one designated flash SSD device is used for the cache tier, while additional flash SSD
devices are used for the capacity tier. In all-flash disk-group configurations, there are two types of flash SSDs: a
very fast and durable flash device that functions as write cache, and more cost-effective SSD devices that function
as capacity. Here, the cache-tier SSD is 100 percent allocated for writes. None of the flash cache is used for reads;
read performance from capacity-tier flash SSDs is more than sufficient for high performance. Many more writes can
be held by the cache SSD in an all-flash configuration, and writes are only written to capacity when needed, which
extends the life of the capacity-tier SSD.
While both configurations dramatically improve the performance of VMs running on Virtual SAN, all-flash
configurations provide the most predictable and uniform performance regardless of workload.
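The cache split described in this section can be expressed as a short calculation: 70 percent read cache and 30 percent write buffer in hybrid disk groups, and 100 percent write cache in all-flash disk groups. The function name and the returned dictionary keys are hypothetical, chosen only for the example.

```python
def cache_allocation_gb(cache_size_gb, all_flash):
    """Split a cache-tier device between read cache and write buffer.

    Hybrid: 70 percent read cache, 30 percent write buffer.
    All-flash: the whole device is allocated to writes.
    Integer arithmetic avoids floating-point rounding surprises.
    """
    if all_flash:
        return {"read_gb": 0, "write_gb": cache_size_gb}
    read_gb = cache_size_gb * 70 // 100
    return {"read_gb": read_gb, "write_gb": cache_size_gb - read_gb}


if __name__ == "__main__":
    print(cache_allocation_gb(400, all_flash=False))
    print(cache_allocation_gb(400, all_flash=True))
```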
Read Cache: Basic Function
The read cache, which only exists in hybrid configurations, keeps a collection of recently read disk blocks. This
reduces the I/O read latency in the event of a cache hit, i.e. the disk block can be fetched from cache rather than
mechanical disk. For a given VM data block, Virtual SAN always reads from the same replica/mirror. However, when
there are multiple replicas (to tolerate failures), Virtual SAN divides up the caching of the data blocks evenly
between the replica copies.
If the data block being read from the first replica is not in cache, the directory service is referenced to discover
whether or not the data block exists in the cache of another mirror (on another host) in the cluster. If the data
block is found there, the data is retrieved. If the data block isn’t in cache on the other host, then there is a read-
cache miss. In that case, the data is retrieved directly from the mechanical HDD.
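The lookup order described above (cache of the first replica, then the cache of another mirror, then mechanical disk) can be sketched as a three-step fallback. The dictionaries stand in for the caches and the disk, and the directory service is reduced to a plain membership test; all names here are hypothetical.

```python
def read_block(block_id, first_replica_cache, other_mirror_cache, hdd):
    """Return (data, source) for a block, following the hybrid read path.

    first_replica_cache: read cache of the replica normally used for this block.
    other_mirror_cache:  read cache of another mirror on another host.
    hdd:                 contents of the mechanical capacity disk.
    """
    if block_id in first_replica_cache:
        return first_replica_cache[block_id], "first replica cache"
    if block_id in other_mirror_cache:
        return other_mirror_cache[block_id], "other mirror cache"
    # Read-cache miss: fetch directly from the mechanical HDD.
    return hdd[block_id], "hdd"


if __name__ == "__main__":
    hdd = {1: "x", 2: "y", 3: "z"}
    print(read_block(1, {1: "x"}, {}, hdd))
    print(read_block(2, {}, {2: "y"}, hdd))
    print(read_block(3, {}, {}, hdd))
```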
Write Cache: Basic Function
The write cache, found in both hybrid and all-flash configurations, behaves as a non-volatile write buffer. This
greatly improves performance in both hybrid and all-flash configurations and also extends the life of flash capacity
devices in all-flash configurations. When writes are written to cache, Virtual SAN ensures that a copy of the data is
written elsewhere in the cluster. All VMs deployed with Virtual SAN are set with a default availability policy that
ensures at least one additional copy of the VM data is available. This includes making sure that writes end up in
multiple write caches in the cluster.
Once an application running inside the guest OS initiates a write, it is duplicated to the write cache on the hosts
that include replicas of the storage objects. This means that in the event of a host failure, a copy of the data is in
cache and no data loss occurs. The VM simply uses the replicated copy of the cache data.
Flash Endurance
Flash endurance is related to the number of write/erase cycles that the cache-tier flash SSD can tolerate before it
begins having issues with reliability. For Virtual SAN 6.0 and VxRail Appliance configurations, the endurance
specification has been changed to use Terabytes Written (TBW); previously the specification was full Drive Writes
Per Day (DWPD). By quoting the specification in TBW, VMware allows vendors the flexibility to use larger capacity
drives with lower full DWPD specifications. For example, from an endurance perspective, a 200GB drive with a
specification of 10 full DWPD is equivalent to a 400GB drive with a specification of 5 full DWPD. If VMware kept a
specification of 10 DWPD for Virtual SAN flash devices, the 400 GB drive with 5 DWPD would be excluded from the
Virtual SAN certification. By changing the specification to 2TBW per day, both the 200GB and 400GB drives
meet the certification requirement. 2TBW per day is the equivalent of 5DWPD for the 400GB drive and is the
equivalent of 10 DWPD for the 200GB drive. For all-flash Virtual SAN deployments running high workloads, the
flash-cache device specification is 4TBW per day—the equivalent of 7300 TB Writes over five years.
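The endurance equivalences in this paragraph follow from two multiplications, checked below. The sketch assumes decimal units (1TB = 1000GB) and 365-day years; the function names are hypothetical.

```python
def tb_written_per_day(capacity_gb, dwpd):
    """Convert a full Drive Writes Per Day rating into terabytes written per day."""
    return capacity_gb * dwpd / 1000


def total_tb_written(tb_per_day, years):
    """Total terabytes written over a number of 365-day years."""
    return tb_per_day * 365 * years


if __name__ == "__main__":
    print(tb_written_per_day(200, 10))  # 200GB drive at 10 DWPD
    print(tb_written_per_day(400, 5))   # 400GB drive at 5 DWPD
    print(total_tb_written(4, 5))       # 4TBW per day over five years
```

Both drives come out at the same daily write volume, which is why a TBW-based specification admits either one.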
Virtual SAN’s Impact on Flash Endurance
There are two commonly used approaches to improve NAND Flash endurance: Improve wear leveling and minimize
write activity. Unfortunately, a distributed storage implementation that focuses on localizing data on the same node
where the VMs reside prevents the distribution of the writes across all the drives in the cluster. This localization
inevitably increases drive usage, leading to early drive replacement.
In contrast, Virtual SAN distributes the objects and components of a VM across all the disk groups in the VxRail
Appliance cluster. This distribution significantly improves wear leveling and reduces write activity by deferring
writes. Virtual SAN also reduces writes by employing data-reduction techniques such as deduplication and
compression.
Client Cache
The client cache is used on both hybrid and all-flash configurations. It leverages DRAM server memory (the client
cache) on the node local to the VM to accelerate read performance. The amount of memory allocated is 0.4 percent of
host memory, up to 1GB per host. Virtual SAN first tries to fulfill the read request from the local client cache, so the VM
can avoid crossing the network to complete the read, and the request is fulfilled faster. If the data is unavailable in the
client cache, the cache-tier SSD is queried to fulfill the read request. The client cache benefits read-cache-friendly workloads.
Objects and Components
VxRail Appliance virtual machines are made up of a set of objects. For example, a VMDK is an object, a snapshot is
an object, VM swap space is an object, and the VM home namespace (where the .vmx file, log files, etc. are
stored) is also an object. (See Figure 44 below.)
Virtual-machine objects are split into multiple components based on performance and availability requirements
defined in the VM storage profile. For example, if the VM is deployed with a policy to tolerate one failure, the objects
have two replica components. Distributed storage uses a disk-striping process to distribute data blocks across
multiple devices. The stripe itself refers to a slice of divided data; the striped device is the individual drive that
holds the stripe. If the policy contains a stripe width, the object is striped across multiple devices in the capacity
layer, and each stripe is an object component.
Figure 44: Virtual SAN Objects and Components
Each Virtual SAN host supports a maximum of 9,000 components. The largest component size is 255GB. For objects
greater than 255GB, Virtual SAN automatically divides them into multiple components. For example, a 62TB VMDK
is divided into roughly 250 components of up to 255GB each per replica, or about 500 across two mirrored replicas.
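As a rough illustration of the size-based split, the following Python sketch counts the components needed for one replica of an object. It assumes 1TB = 1,024GB and ignores striping and witnesses; mirrored replicas multiply the total. The function name is hypothetical.

```python
import math

MAX_COMPONENT_GB = 255  # largest component size stated above

def components_per_replica(object_size_gb: float) -> int:
    """Number of components needed to hold one replica of an object."""
    return max(1, math.ceil(object_size_gb / MAX_COMPONENT_GB))

# One replica of a 62TB VMDK (62 * 1024 GB).
per_replica = components_per_replica(62 * 1024)
# Two mirrored replicas (FTT=1 with RAID-1) double the count.
mirrored_total = 2 * per_replica
```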
Witness
In Virtual SAN, witnesses are generally an integral component of every storage object, as long as the object is
configured to tolerate at least one failure. They are components that contain no data, only metadata. Their purpose
is to serve as tiebreakers when availability decisions are made to meet the failures-to-tolerate policy setting, and
they are used when determining whether a quorum of components exists in the cluster.
In Virtual SAN 6.0, storage components can be distributed in such a way that they can guarantee availability
without relying on a witness. In this case, each component has one or more votes. Quorum is
calculated based on the rule that requires "more than 50 percent of votes." (Still, many objects have a witness in
6.0.)
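The quorum rule can be expressed in a few lines of Python. This is only a sketch of the "more than 50 percent of votes" test; the function name and the vote counts in the example are illustrative assumptions.

```python
def has_quorum(votes_available: int, votes_total: int) -> bool:
    """True when strictly more than 50 percent of the votes are reachable."""
    return 2 * votes_available > votes_total

# Two replicas plus one witness, one vote each: losing one host keeps quorum,
# losing two does not. An even split (2 of 4) is not a quorum either.
one_host_lost = has_quorum(2, 3)
two_hosts_lost = has_quorum(1, 3)
even_split = has_quorum(2, 4)
```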
Replicas
Replicas make up the virtual machine’s storage objects. Replicas are instantiated when an availability policy
(NumberOfFailuresToTolerate) is specified for the virtual machine. The availability policy dictates how many replicas
are created and lets virtual machines continue running with a full complement of data even when host, network, or
disk failures occur in the cluster.
Storage Policy-Based Management (SPBM)
Virtual SAN policies define virtual-machine storage requirements, such as performance and availability. These
policies determine how storage objects are provisioned and allocated within the datastore to guarantee the required
level of service.
Virtual SAN implements Storage Policy-Based Management, and each virtual machine deployed in a Virtual SAN
datastore has at least one assigned policy. When the VM is created and assigned a storage policy, the policy
requirements are pushed to the Virtual SAN layer. (See Figure 45 below.)
Figure 45
Policy assignments can be generated manually or automatically, based on rules. For instance, all virtual machines
that include PROD-SQL in their name or resource group might be set to RAID-1 with a 5-percent read-cache
reservation, and TEST-WEB would be automatically set to RAID-0.
Dynamic Policy Changes
Administrators can dynamically change a VM storage policy. When changing attributes such as
NumberOfFailuresToTolerate (FTT), Virtual SAN attempts to find a new placement for a replica with the new
configuration. In some cases, existing parts of the current configuration can be reused, and the configuration just
needs to be updated or extended. For example, if an object currently uses NumberOfFailuresToTolerate=1, and the
user asks for NumberOfFailuresToTolerate=2, Virtual SAN can simply add another mirror (and witness).
In other cases, such as changing the stripe width from one to two, Virtual SAN cannot reuse existing replicas, and it
creates a brand new replica (or replicas) without impacting the existing objects.
Storage Policy Attributes
The screenshot in Figure 46 displays the current policy attributes available with Virtual SAN:
Figure 46: Virtual SAN Policy Attributes
Number of Disk Stripes per Object
This policy attribute establishes the minimum number of capacity devices used for striping each virtual-machine
replica. A value higher than 1 might result in better performance, but it also results in higher resource
consumption. The default and minimum value is 1, and the maximum value is 12. The stripe size is 1MB.
Virtual SAN may decide that an object needs to be striped across multiple disks without any stripe-width policy
requirement. The reason for this can vary, but typically it occurs when a VMDK is too large to fit on a single physical
drive. However, when a particular stripe width is required, then it should not exceed the number of disks available
to the cluster.
Flash Cache Reservation
Flash Cache Reservation refers to flash capacity reserved as read cache for the virtual-machine object, and it
applies to hybrid configurations only. By default, Virtual SAN dynamically allocates read cache to storage objects
based on demand. As a result, no need typically exists to change the default 0 value for this parameter.
However, in very specific cases, when a small increase in the read cache for a single VM can provide a significant
change in performance, it is an option. It should be used with caution to avoid wasting resources or taking
resources from other VMs.
The default value is 0 percent. Maximum value is 100 percent.
Number of Failures to Tolerate
This FTT option generally defines the number of host and device failures that a virtual machine object can tolerate.
For n failures tolerated, n+1 copies of the VM object are created and 2n+1 hosts with storage are required.
The default value is 1. Maximum value is 3.
Virtual SAN supports two specific configurations when erasure codes are enabled. The first, RAID-5, applies when
the number of failures to tolerate is set to 1, and the second, RAID-6, applies when the number of failures to
tolerate is set to 2. Note that a Virtual SAN cluster size needs to be at least four hosts for RAID-5 and at least six
hosts for RAID-6. Of course, it may be (much) larger than that.
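The arithmetic behind these rules can be sketched as follows. The helper names are hypothetical, and accepting an FTT value of 0 (no protection) is an assumption not stated above.

```python
def mirroring_requirements(ftt: int) -> dict:
    """Copies and hosts needed to tolerate `ftt` failures with mirroring."""
    if not 0 <= ftt <= 3:  # maximum of 3 per the text; 0 is an assumption
        raise ValueError("FTT must be between 0 and 3")
    return {"copies": ftt + 1, "hosts": 2 * ftt + 1}

# Erasure coding applies only to FTT=1 (RAID-5) and FTT=2 (RAID-6),
# with the minimum cluster sizes given in the text.
ERASURE_CODING = {1: ("RAID-5", 4), 2: ("RAID-6", 6)}

def erasure_coding_supported(ftt: int, cluster_hosts: int) -> bool:
    """True when the cluster is large enough for RAID-5/6 at this FTT."""
    if ftt not in ERASURE_CODING:
        return False
    return cluster_hosts >= ERASURE_CODING[ftt][1]
```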
Fault Tolerance Method
Fault Tolerance Method specifies whether the data-replication method optimizes for performance or capacity. The
RAID-1 mirroring option for performance uses more disk space to place the object components but consumes fewer
CPU and network resources. RAID-5/6 erasure coding is the capacity option. It uses less disk space, but consumes
more CPU and network resources. (An upcoming section on erasure coding provides additional information.)
IOPS Limit for Object (QoS)
This attribute defines the IOPS limit for an object, such as a VMDK. IOPS is calculated as the number of disk I/O
operations, using a weighted size. If the system uses the default base size of 32KB, a 64KB I/O would be
represented as two I/O operations. This Quality of Service option can be used to keep workloads from impacting each other
(the noisy-neighbor issue) or establish limits for differentiated services.
A few notes regarding IOPS:
When calculating IOPS, read and write are considered equivalent, but keep in mind that cache-hit ratio and sequentiality are not considered.
When an object exceeds its disk IOPS limit, I/O operations are throttled.
If the IOPS limit for object is set to 0, IOPS limits are not enforced.
Virtual SAN allows the object to double the IOPS-limit rate during the first second of operation or after a period of inactivity.
Figure 47: IOPS limits impact Quality of Service.
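The weighted accounting can be sketched in Python as shown below. It assumes that any I/O no larger than the 32KB base size counts as one operation and that larger I/Os are rounded up to whole multiples; the one-second burst allowance is left out. Function names are hypothetical.

```python
import math

BASE_IO_SIZE_KB = 32  # default base size stated in the text

def weighted_io_count(io_size_kb: float) -> int:
    """Number of operations a single I/O is charged as.

    Assumption: I/Os at or below the base size count as one operation.
    """
    return max(1, math.ceil(io_size_kb / BASE_IO_SIZE_KB))

def is_throttled(weighted_iops: int, iops_limit: int) -> bool:
    """A limit of 0 means the limit is not enforced."""
    if iops_limit == 0:
        return False
    return weighted_iops > iops_limit

# Reads and writes are treated the same, so only the size matters here.
charged = weighted_io_count(64)  # a 64KB I/O is charged as two operations
```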
Checksum
Virtual SAN uses end-to-end checksum to ensure the integrity of data by confirming that each copy of a file is
exactly the same as the source file. The system checks the validity of the data during read/write operations, and if
an error is detected, Virtual SAN repairs the data or reports the error. If a checksum mismatch is detected, Virtual
SAN automatically repairs it by overwriting the incorrect data with correct data. Checksum calculation and
error-correction are background operations.
The default setting for disabling checksum on all objects in the cluster is No, which means that checksum is enabled.
Force Provisioning
If this option is set to Yes, the object is provisioned even if the NumberOfFailuresToTolerate,
NumberOfDiskStripesPerObject, and FlashReadCacheReservation policies specified in the storage policy cannot be
satisfied by the datastore.
This parameter is used in bootstrapping scenarios and during an outage when standard provisioning is no longer
possible.
The default No is acceptable for most production environments. Virtual SAN fails to provision a virtual machine
when the policy requirements are not met, but it successfully creates the user-defined storage policy.
Object Space Reservation
Object space reservation defines the percentage of the logical size of the VMDK object that is reserved. It reflects
the reserved, thick-provisioned space required for deploying virtual machines.
The default value is 0 percent. Maximum value is 100 percent.
The value should be set either to 0 percent or 100 percent when using RAID-5/6.
I/O Paths and Caching Algorithms1
This section elaborates on some of the Virtual SAN concepts that have been introduced so far with additional,
general information about Virtual SAN’s caching algorithms. The next paragraphs briefly describe how Virtual SAN
leverages flash, memory, and rotating disks. They also illustrate the I/O Paths between the guest OS and the
persistent storage areas.
Read Caching
Read caching in Virtual SAN exists to separate performance from capacity and deliver low latency and capacity
density at a competitive cost. Part of the SSD is used as the read cache (RC) of the corresponding disk group. The
purpose is to serve the highest possible ratio of read operations from data staged in the RC and to minimize the
portion of read operations served by the HDDs. It leverages the higher IOPS capabilities and lower latencies of the
SSDs to provide a cost-performance solution for the VxRail Appliance.
The RC is organized in terms of cache lines. They represent the unit of data management in RC, and the current
size is 1MB. Data is fetched into the RC and evicted at cache-line granularity. In addition to the SSD read cache,
Virtual SAN also maintains a small in-memory (RAM) read cache that holds the most-recently accessed cache lines
from the RC. The in-memory cache is dynamically sized based on the available memory in the system.
Virtual SAN maintains in-memory metadata that tracks the state of the RC (both SSD and in memory), including
the logical addresses of cache lines, valid and invalid regions in each cache line, aging information, etc. These data
structures are compressed so that they use memory space efficiently without imposing a substantial CPU
overhead on regular operations. No need exists to swap RC metadata in or out of persistent storage. (This is one
area where VMware holds important IP.)
Read-cache contents are not tracked across power-cycle operations of the host. If power is lost and recovered, then
the RC is re-populated (warmed) from scratch. So, essentially RC is used as a fast-storage tier, and its persistence
is not required across power cycles. The rationale behind this approach is to avoid any overheads on the common
data path that would be required if the RC metadata was persisted every time RC was modified—such as cache-line
fetching and eviction, or when write operations invalidate a sub-region of a cache line.
Anatomy of a Hybrid Read
Read operations follow a defined procedure. To illustrate, the VMDK in the example below has two replicas on esxi-01
and esxi-03.
1 Much of the content in this specific section has been extracted from an existing technical whitepaper: An overview of VMware VSAN Caching Algorithms.
1. Guest OS issues a read on virtual disk
2. Owner chooses replica to read from
   o Load balance across replicas
   o Not necessarily local replica (if one)
   o A block always reads from same replica
3. At chosen replica (esxi-03): read data from flash write buffer, if present
4. At chosen replica (esxi-03): read data from flash read cache, if present
5. Otherwise, read from HDD and place data in flash read cache
   o Allocate a 1MB buffer for the missing cache line and replace the "coldest" data (eviction of coldest data to make room for new read)
   o Each missing line is read from the HDD as multiples of 64KB chunks, starting with the chunks that contain the referenced data
6. Return data to owner
7. Complete read and return data to VM
8. Once the 1MB cache line is added to the in-line read cache, its population continues asynchronously. This occurs to exploit both the spatial and temporal locality of reference, increasing the chances that the next reads will find their data in the read cache.
Figure 48: Hybrid Read
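The lookup order in the steps above (write buffer, then read cache, then HDD) can be sketched with a small Python model. This is a simplification under stated assumptions: it caches individual blocks rather than 1MB cache lines, uses least-recently-used order as a stand-in for "coldest" data, and omits replica selection and asynchronous population. The class name is hypothetical.

```python
from collections import OrderedDict

class HybridDiskGroup:
    """Toy model of one disk group on the chosen replica host."""

    def __init__(self, hdd: dict, read_cache_slots: int):
        self.hdd = hdd                   # capacity tier: address -> data
        self.write_buffer = {}           # flash write buffer: address -> data
        self.read_cache = OrderedDict()  # flash read cache, coldest entry first
        self.read_cache_slots = read_cache_slots

    def read(self, addr):
        """Return (data, tier that served the read)."""
        if addr in self.write_buffer:            # step 3
            return self.write_buffer[addr], "write buffer"
        if addr in self.read_cache:              # step 4
            self.read_cache.move_to_end(addr)    # mark as recently used
            return self.read_cache[addr], "read cache"
        data = self.hdd[addr]                    # step 5: cache miss
        if len(self.read_cache) >= self.read_cache_slots:
            self.read_cache.popitem(last=False)  # evict the coldest entry
        self.read_cache[addr] = data
        return data, "hdd"

group = HybridDiskGroup({0: "a", 1: "b", 2: "c", 3: "d"}, read_cache_slots=2)
first = group.read(0)    # miss: served from HDD, then cached
second = group.read(0)   # hit: served from the read cache
```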
Anatomy of an All-Flash Read
1. Guest OS issues a read on virtual disk
2. Owner chooses replica to read from
   o Load balance across replicas
   o Not necessarily local replica (if one)
3. At chosen replica (esxi-03): read data from flash write buffer, if present
4. Otherwise, read from capacity flash device
5. Return data to owner
6. Complete read and return data to VM
Figure 49: All-Flash Read
The major difference from a hybrid read is that read-cache misses cause no serious performance degradation. Reads from flash
capacity devices should be almost as quick as reads from the cache SSD. Another significant difference is that no
need exists to move the block from the capacity layer to the cache layer, as in hybrid configurations.
Write Caching
Why write-back caching? In hybrid configurations, this is done entirely for performance. The aggregate-storage
workloads in virtualized infrastructures are almost always random, thanks to the statistical multiplexing of the
many VMs and applications that share the infrastructure.
HDDs can perform only a small number of random I/O operations, and at high latency, compared to SSDs. So, sending the
random-write part of the workload directly to spinning disks can cause performance degradation. On the other hand,
magnetic disks exhibit decent performance for sequential workloads. Modern HDDs may exhibit sequential-like
behavior and performance even when the workload is not perfectly sequential. "Proximal I/O" suffices.
In hybrid disk groups, Virtual SAN uses the write-buffer (WB) section of the SSD (by default, 30 percent of device
capacity) as a write-back buffer that stages all the write operations. The key objective is to de-stage written data
(not individual write operations) in a way that creates a benign, near-sequential (proximal) write workload for the HDDs
that form the capacity tier.
In all-flash disk groups, Virtual SAN utilizes the tier-1 SSD entirely as a write-back buffer (100 percent of the
device capacity—up to a maximum of 600GB). The purpose of the WB is quite different in this case. It absorbs the
highest rate of write operations in a high-endurance device and allows only a trickle of data to be written to the
capacity flash tier. This approach allows low-endurance, larger-capacity SSDs at the capacity tier.
Nevertheless, capacity-tier SSDs are capable of serving very large numbers of read IOPS. Thus, no read caching
occurs in the tier-1 SSD, except when the most-recent data referenced by a read operation still resides in the WB.
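The two write-buffer sizing rules can be compared with a short Python sketch. It uses only the figures given above (30 percent of the cache device in hybrid disk groups; the whole device, up to 600GB, in all-flash disk groups). The function name is hypothetical.

```python
ALL_FLASH_WB_MAX_GB = 600.0   # all-flash ceiling stated in the text
HYBRID_WB_FRACTION = 0.30     # default hybrid write-buffer share

def write_buffer_gb(cache_device_gb: float, all_flash: bool) -> float:
    """Illustrative write-buffer capacity for one disk group's cache SSD."""
    if all_flash:
        return min(cache_device_gb, ALL_FLASH_WB_MAX_GB)
    return cache_device_gb * HYBRID_WB_FRACTION

hybrid_wb = write_buffer_gb(400, all_flash=False)
all_flash_wb = write_buffer_gb(800, all_flash=True)
```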
In either case (hybrid or all-flash), every write operation is handled through a transactional process:
A record for the operation is persisted in the transaction log in the SSD.
The data (payload) of the operation is persisted in the WB.
In-memory tables are updated to reflect the new data and its logical address space (for tracking) as well as its physical location in the capacity tier.
The write operation completes upstream after the transaction has committed successfully.
Commonly (under typical steady-state workload), the log records of multiple write operations are coalesced before
they are persisted in the log. This reduces the amount of metadata-write operations for the SSD. By definition, the
log is a circular buffer, written and freed in a sequential fashion. Thus write amplification can be avoided (good for
device endurance). The WB region allocates blocks in a round-robin fashion, keeping wear leveling in mind. Even
when a write operation overwrites existing WB data, Virtual SAN never rewrites an existing SSD page in place.
Instead, it allocates a new block and updates metadata to reflect that the old blocks are invalid. Virtual SAN fills an
entire SSD page before it moves to the next one. Eventually, entire pages are freed when all their data is invalid.
(It is very rare to re-buffer data to allow SSD pages to be freed.) Also, because the device firmware does not have
visibility into invalidated data, it sees no "holes" in pages. In effect, internal write leveling (by moving data around
to fill holes in pages) is all but eliminated. This extends the overall endurance of a device. In general, the Virtual
SAN design has gone to great lengths to impose a benign workload in terms of endurance. As a result, the life
expectancy of SSDs implemented in Virtual SAN may exceed the manufacturers’ specifications, which are
developed with more generic workloads in mind.
Anatomy of a Write I/O – Hybrid and All-Flash
1. VM running on host esxi-01
2. esxi-01 is owner of virtual disk object
   o Number Of Failures To Tolerate = 1
3. Object has two (2) replicas on esxi-01 and esxi-03
4. Guest OS issues write op to virtual disk
5. Owner clones write operation
   o In parallel: sends write op to esxi-01 (locally) and esxi-03
6. esxi-01, esxi-03 persist operation to flash (log)
7. esxi-01, esxi-03 ACK write operation to owner
8. Owner waits for ACK from both writes and completes I/O
9. Later, backend hosts commit batch of writes
Figure 50: Hybrid and Flash Write I/O
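The write sequence above can be modeled with a short Python sketch. It is a simplification under stated assumptions: the owner contacts replicas one after another rather than in parallel, a Python list stands in for the flash log and write buffer, and a dictionary stands in for the capacity tier. Class and function names are hypothetical.

```python
class ReplicaHost:
    """Toy model of a host that holds one replica of the object."""

    def __init__(self, name: str):
        self.name = name
        self.flash_log = []   # stands in for the transaction log and WB
        self.capacity = {}    # stands in for the capacity tier

    def persist(self, addr, data) -> bool:
        """Step 6: persist the operation to flash, then ACK (step 7)."""
        self.flash_log.append((addr, data))
        return True

    def destage(self) -> None:
        """Step 9: later, commit the batch of writes to the capacity tier."""
        for addr, data in self.flash_log:
            self.capacity[addr] = data
        self.flash_log.clear()

def owner_write(replicas, addr, data) -> bool:
    """Steps 5-8: clone the write to every replica and wait for all ACKs."""
    acks = [replica.persist(addr, data) for replica in replicas]
    return all(acks)

hosts = [ReplicaHost("esxi-01"), ReplicaHost("esxi-03")]
completed = owner_write(hosts, addr=42, data="payload")
```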
Distributed Caching Considerations
Virtual SAN’s caching algorithms and data-locality techniques reflect a number of objectives and observations
pertaining to distributed caching:
Virtual SAN exploits temporal and spatial locality for caching.
Virtual SAN implements a distributed, persistent cache on flash across the cluster. Caching is done in front of the disks where the data replicas live, not on the client side. A distributed-caching mechanism results in better overall flash-cache utilization.
Another benefit of distributed caching is during VM migrations, which can happen in some data centers over ten times a day. With DRS and vMotion, VMs can move around from host-to-host in a cluster. Without a distributed cache, the migrations would have to move around a lot of data and rewarm caches every time a VM migrates. As the graph below (Figure 51) illustrates, Virtual SAN prevents any performance degradation after a VM migration.
Figure 51: Virtual SAN prevents performance degradation after VM migration.
The network introduces a small latency when accessing data on another host. Typical latencies in 10GbE networks range from 5 to 50 microseconds. Typical latencies of a flash drive, accessed through a SCSI layer, are near 1ms for small (4K) I/O blocks. So, for the majority of the I/O executed in the system, the network adds roughly 0.5 to 5 percent to the latency.
Few workloads are actually cache-friendly, meaning that few take advantage of the way small increases in cache size can significantly increase the rate of I/O. Those that are cache-friendly can benefit from local cache.
Virtual SAN works with View Accelerator (a deduplicated, in-memory read cache), which is notably effective for VDI use cases. Remember also that Virtual SAN 6.2 features a client cache that leverages DRAM local to the virtual machine to accelerate read performance. The amount of memory allocated is 0.4 percent of host memory, up to 1GB per host.
Virtual SAN High Availability and Fault Domains
Virtual SAN policy attributes establish parameters to protect against host failures, but they may not be the most
effective or efficient way to build tolerance for events like rack failures. This section surveys the availability
solutions for Virtual SAN clusters on the VxRail Appliance. It starts out by looking at the availability implications on
small VxRail Appliance deployments with fewer than four nodes.
Limitations of Two- and Three-Node Configurations
Currently, VxRail Appliance clusters require a minimum of four nodes. If "start small" is the ideal for scalability, why not begin
even smaller than the four-node cluster? Virtual SAN supports a three-node cluster, but IT shops that deploy it
need to understand the trade-off between the cost of the hardware and software components and the degree of
availability that the configuration provides. Two- and three-node configurations can behave differently from
configurations with at least four nodes. In particular, the system can come up short in the event of a failure. Such
small clusters have slim resources, certainly not enough to rebuild components on another host and automatically
restore fault tolerance. Also, two-node and three-node configurations affect VM uptime during certain
host-maintenance operations that require data migration to another host.
Recall that Virtual SAN replication requires two copies of data and a witness, each of which resides on a different host.
In configurations with fewer than four nodes, that’s a problem. At best they can tolerate only one failure. If a node
fails, Virtual SAN can neither rebuild components nor provision new VMs that tolerate failures until the failed node
is replaced. When the applications require maximum availability, both for planned and unplanned outage scenarios,
a configuration with at least four nodes is recommended.
That said, VCE is planning a two-node VxRail Appliance for the near future. The two-node deployment is targeted at
ROBO locations where a small witness VM can reside in the central data center (1+1+1) or in the cloud. Each of the
nodes is a failure domain. The witness VM requires two vCPUs, 8GB of memory, 15GB of capacity, and 10GB for
caching.
On larger enterprise deployments, a three- or four-node Virtual SAN cluster could be deployed in the central data
center to host all the witnesses (as in Figure 52 below). All sites could be managed centrally by a single instance of
vCenter. (vSphere limitations apply: 1,000 hosts per vCenter, etc.)
Figure 52: ROBO implementation: Witness VMs at a central location.
Fault Domain Overview
Virtual SAN and VxRail Appliances implement fault domains as a solution for tolerating rack and site failures. Fault
domains instruct Virtual SAN to spread redundancy components across the servers in separate racks. They protect
the environment from a rack-level failure such as loss of power or connectivity. Consider, for example, a cluster
with four VxRail Appliances, each one placed in a different rack. The nodes of each appliance can be in a different
fault domain.
In terms of implementation, any host that is not part of another fault domain is considered its own single-host fault
domain. Virtual SAN requires at least two fault domains, and each has at least one host. Fault-domain definitions
recognize the physical hardware constructs that represent the domain itself. Once the domain is enabled, Virtual
SAN applies the active virtual-machine storage policy to the entire domain, instead of just to the individual
hosts. The number of fault domains required in a cluster is calculated based on the FTT attribute:
NumberOfFaultDomains = 2 * NumberOfFailuresToTolerate + 1
Figure 53: Managing Fault Domains
Fault Domains and Rack-Level Failures
The fault-domain mechanism is smart enough to perceive when the configuration is vulnerable. Consider a cluster
that contains four server racks, each with two hosts. If the NumberOfFailuresToTolerate is set to 1, and fault
domains are not enabled, Virtual SAN might store both replicas of an object with hosts in the same rack, and if
that’s the case, applications are exposed to a potential rack-level failure. With fault domains enabled however,
Virtual SAN ensures that each protection component (replicas and witnesses) is placed in a separate fault domain.
It makes sure that the hosts holding these components can’t fail together. The chart below (Figure 54) illustrates four server racks, each with
two ESXi hosts.
Four defined Fault Domains:
FD1 = esxi-01, esxi-02
FD2 = esxi-03, esxi-04
FD3 = esxi-05, esxi-06
FD4 = esxi-07, esxi-08
Figure 54: Fault Domains for a Four-Server VxRail Appliance Rack
This configuration guarantees that the replicas of an object are stored in hosts of different rack enclosures,
ensuring availability and data protection in case of a rack-level failure.
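The following Python sketch ties the fault-domain formula to the four-rack example. It picks one host from each of 2 * FTT + 1 distinct fault domains for an object's replicas and witness. The selection is deliberately naive (the first domains in name order, the first host in each) and is an assumption for illustration, not Virtual SAN's actual placement logic.

```python
FAULT_DOMAINS = {
    "FD1": ["esxi-01", "esxi-02"],
    "FD2": ["esxi-03", "esxi-04"],
    "FD3": ["esxi-05", "esxi-06"],
    "FD4": ["esxi-07", "esxi-08"],
}

def required_fault_domains(ftt: int) -> int:
    """NumberOfFaultDomains = 2 * NumberOfFailuresToTolerate + 1."""
    return 2 * ftt + 1

def place_components(fault_domains: dict, ftt: int) -> dict:
    """Return one host per fault domain for the replicas and witness."""
    needed = required_fault_domains(ftt)
    if len(fault_domains) < needed:
        raise ValueError(f"need {needed} fault domains, have {len(fault_domains)}")
    chosen = sorted(fault_domains)[:needed]
    return {fd: fault_domains[fd][0] for fd in chosen}

# FTT=1: two replicas and a witness land in three different racks.
placement = place_components(FAULT_DOMAINS, ftt=1)
```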
Virtual SAN Stretched Cluster
We touched on the advantages of the Virtual SAN’s native integration with vSphere, and the concept of a stretched
cluster is exactly the kind of thing we were talking about. This is a case where deploying VxRail Appliance
technology extends the availability of the larger enterprise data center. The stretched cluster is a specific
configuration implemented in environments where the requirement for data-center level disaster/downtime
avoidance is absolute. We’ve already reviewed the way fault domains enable "rack awareness" for rack failures. This
section discusses how fault domains leverage "data-center awareness," providing virtual-machine availability despite
specific data-center failure scenarios.
In a VxRail Appliance environment, a stretched cluster with a witness host refers to a deployment where a Virtual SAN
cluster consists of two active/active sites with an identical number of ESXi hosts distributed evenly between them.
The sites are connected via a high bandwidth/low latency link.
Figure 55: Stretched VxRail Appliance Cluster
In the graphic above (Figure 55), each site is configured as a Virtual SAN fault domain. The nomenclature used to
describe the stretched-cluster configuration is X+Y+Z, where X is the number of ESXi hosts at Site A, Y is the
number of ESXi hosts at Site B, and Z is the number of witness hosts at site C.
A virtual machine deployed on a stretched cluster has one copy of its data on Site A and another on Site B, as well as
witness components placed on the host at Site C.
It’s a singular configuration, achieved only through a combination of fault domains, hosts and VM groups, and
affinity rules. In the event of a complete site failure, the other site still has a full copy of virtual-machine data and
at least half of the resource components available. That means all the VMs remain active and available on the
Virtual SAN datastore.
The minimum supported configuration is 1+1+1 (3 nodes). The maximum configuration is 15+15+1 (31 nodes).
Stretched clusters are supported by both hybrid- and all-flash VxRail Appliance configurations.
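The X+Y+Z nomenclature and the stated limits can be checked with a small Python sketch. It assumes that both data sites hold the same number of hosts (as described above), that each data site has between 1 and 15 hosts, and that there is exactly one witness host. The function name is hypothetical.

```python
MAX_HOSTS_PER_DATA_SITE = 15  # from the 15+15+1 maximum

def parse_stretched_config(text: str) -> int:
    """Validate an X+Y+Z string and return the total node count."""
    parts = text.split("+")
    if len(parts) != 3:
        raise ValueError("expected the form X+Y+Z")
    site_a, site_b, witness = (int(part) for part in parts)
    if site_a != site_b:
        raise ValueError("data sites must have an identical number of hosts")
    if not 1 <= site_a <= MAX_HOSTS_PER_DATA_SITE:
        raise ValueError("each data site needs between 1 and 15 hosts")
    if witness != 1:
        raise ValueError("exactly one witness host is expected")
    return site_a + site_b + witness

smallest = parse_stretched_config("1+1+1")
largest = parse_stretched_config("15+15+1")
```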
NOTE: This section contains only a brief design and considerations discussion. More information can be found in
VMware’s Virtual SAN 6.2 Stretched Cluster Guide: http://www.vmware.com/files/pdf/products/vsan/VMware-Virtual-SAN-6.2-Stretched-Cluster-Guide.pdf
Site Locality
In a conventional storage-cluster configuration, reads are distributed across replicas. In a stretched-cluster
configuration, the Virtual SAN Distributed Object Manager (DOM) also takes into account the object’s fault domain,
and only reads from replicas in the same domain. That way, it avoids any lag time associated with using the
inter-site network to perform reads.
Networking
Both Layer-2 (same subnet) and Layer-3 (routed) configurations are used for stretched-cluster deployments. A
Layer-2 connection should exist between data sites, and Layer-3 connection between the witness and the data
sites.
The bandwidth between data sites depends on workloads, but VCE recommends a minimum of 10Gbps for VxRail
Appliances, and that should accommodate the stretched cluster. (In two-node ROBO configurations, dedicated
1Gbps may suffice, but it still depends on workload activity.) Witness hosts support a latency of up to
500ms RTT and require a bandwidth of 2Mbps for every 1,000 Virtual SAN objects. Also bear in mind that the latency
between data sites should be no greater than 5ms RTT; the estimated distance for a 5ms RTT is 500km, or about
310 miles.
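These link figures lend themselves to a quick sizing check. The Python sketch below assumes that witness bandwidth scales linearly at 2Mbps per 1,000 objects and applies the two RTT limits quoted above; the function names are hypothetical.

```python
DATA_SITE_MAX_RTT_MS = 5     # limit between data sites
WITNESS_MAX_RTT_MS = 500     # limit between data sites and the witness

def witness_bandwidth_mbps(object_count: int) -> float:
    """Witness-link bandwidth, assuming linear scaling at 2Mbps per 1,000 objects."""
    return 2.0 * object_count / 1000

def link_within_limits(data_site_rtt_ms: float, witness_rtt_ms: float) -> bool:
    """True when both round-trip times are within the stated limits."""
    return (data_site_rtt_ms <= DATA_SITE_MAX_RTT_MS
            and witness_rtt_ms <= WITNESS_MAX_RTT_MS)

needed_mbps = witness_bandwidth_mbps(25_000)  # a cluster with 25,000 objects
```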
Stretched-Cluster Heartbeats and Site Bias
Stretched cluster configurations effectively have three fault domains. The first functions as the preferred data site,
the second is the secondary data site, and the third is simply the witness host site.
The Virtual SAN master node is placed on the preferred site and the Virtual SAN backup node is placed on the
secondary site. As long as nodes (ESXi hosts) are available in the preferred site, then a master is always selected
from one of the nodes on this site—similarly for the secondary site, as long as nodes are available on the secondary
site.
The master node and the backup node send heartbeats every second. If heartbeat communication is lost for five
consecutive heartbeats (five seconds), the witness is deemed to have failed. If the witness has suffered a
permanent failure, a new witness host can be configured and added to the cluster. Preferred sites gain ownership in
case of a partition.
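The five-missed-heartbeats rule can be sketched as a simple counter. The Python example below takes one boolean per second (True when a heartbeat was received) and reports a failure once five consecutive heartbeats are missed. The function name and input format are assumptions for illustration.

```python
MISSED_HEARTBEAT_THRESHOLD = 5  # five consecutive heartbeats, one per second

def witness_failed(heartbeats, threshold: int = MISSED_HEARTBEAT_THRESHOLD) -> bool:
    """Return True once `threshold` consecutive heartbeats have been missed."""
    missed = 0
    for received in heartbeats:
        missed = 0 if received else missed + 1
        if missed >= threshold:
            return True
    return False

# Five misses in a row trip the detector; an intervening heartbeat resets it.
outage = witness_failed([True, False, False, False, False, False])
flapping = witness_failed([False] * 4 + [True] + [False] * 4)
```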
After a complete failure, both the master and the backup end up at the sole remaining live site. Once the failed site
returns, it continues with its designated role as preferred or secondary, and the master and backup migrate to
their respective locations.
vSphere HA settings for Stretched Cluster
Host monitoring is enabled by default in all VxRail Appliance deployments, including of course stretched-cluster configurations. This feature also uses network heartbeat to determine the status of hosts participating in the cluster. It indicates a possible need for remediation, such as restarting virtual machines on other cluster nodes.
Configuring admission control ensures that vSphere HA has sufficient available resources to restart virtual machines after a failure. This may be even more significant in a stretched cluster than it is in a single-site cluster, because it makes the entire, multi-site infrastructure resilient. Workload availability is perhaps the primary motivation behind most stretched-cluster implementations.
The deployment needs sufficient capacity to accommodate a full-site failure. Since the stretched cluster equally divides the number of ESXi hosts between sites, VCE recommends configuring the admission-control policy to 50 percent for both CPU and memory to ensure that all workloads can be restarted by vSphere HA.
Snapshots
Snapshots have been around for a while as a means of capturing the state of a system at a particular point in time
(PIT), so that it can be rolled back to that state if need be after a crash. In the case of the VxRail Appliance
solution, administrators can create, roll back, or delete VM snapshots using the Snapshot Manager in the vSphere
Web client. Each VM supports a chain of up to 32 snapshots.
A virtual machine snapshot generally includes the settings (.nvram and .vmx) and power state, state of all the
VM’s associated disks, and optionally, the memory state. Specifically, each snapshot includes:
Delta disk:
o The state of the virtual disk at the time the snapshot is taken is preserved. When this occurs, the guest OS is unable to write to its .vmdk file. Instead, changes are captured in an alternate file named VM_name-delta.vmdk.
Memory-state file:
o VM_name-Snapshot#.vmsn, where # is the next number in the sequence, starting with 1. This file holds the memory state at the time the snapshot was taken. If memory is captured, the size of this file is the size of the virtual machine’s maximum memory. If memory is not captured, the file is much smaller.
Disk-descriptor file:
o VM_name-00000#.vmdk, a small text file that contains information about the snapshot.
Snapshot-delta file:
o VM_name-00000#-delta.vmdk, which contains the changes to the virtual disk’s data at the time the snapshot was taken.
VM_name.vmsd:
o This snapshot list file is created when the virtual machine itself is deployed. It maintains VM snapshot information that goes into a snapshot list in the vSphere Web Client. This information includes the name of the snapshot .vmsn file and the name of the virtual-disk file.
The snapshot state uses a .vmsn extension and stores the requisite VM information at the time of the snapshot.
Each new VM snapshot generates a new .vmsn file. The size of this file varies, based on the options selected during
creation. For example, including the memory state of the virtual machine increases the size of the .vmsn file.
It typically contains the name of the VMDK, the display name and description, and an identifier for each snapshot.
Other files might also exist. For example, a snapshot of a powered-on virtual machine has an associated
snapshot_name_number.vmem file that contains the main memory of the guest OS, saved as part of the
snapshot.
A quiesce option is available to maintain consistent point-in-time copies for powered-on VMs. VMware Tools may
use its own sync driver or use Microsoft’s Volume Shadow Copy Service (VSS) to quiesce not only the guest OS
file system, but also any Microsoft applications that understand VSS directives.
How Snapshots Work
Virtual SAN snapshots use an efficient, on-disk Virtual SANSparse format. When a base-disk snapshot is taken, it
creates a child delta disk. The parent functions as a static, PIT copy. Meanwhile the child delta starts a snapshot
chain, recording the virtual-machine write history. The delta disk snapshot object is made up of a set of grains,
where each grain is a block of sectors containing virtual-disk data. The deltas keep only changed grains, which
makes them space efficient.
In the diagram below (Figure 56), the base disk object is called Disk.vmdk and sits at the bottom of the chain. The
chain includes three snapshot objects (Disk-001.vmdk, Disk-002.vmdk and Disk-003.vmdk) that have been
taken at various intervals. Various guest-OS writes have also occurred at various intervals, leading to changes in
snapshot deltas.
Base object writes to grains 1, 2, 3, and 5
Delta object Disk-001 writes to grains 1 and 4
Delta object Disk-002 writes to grains 2 and 4
Delta object Disk-003 writes to grains 1 and 6
Figure 56: Snapshot Chain
A virtual-machine read would return the following:
Grain 1 – retrieved from Delta object Disk-003
Grain 2 – retrieved from Delta object Disk-002
Grain 3 – retrieved from Base object
Grain 4 – retrieved from Delta object Disk-002
Grain 5 – retrieved from Base object – 0 returned as it was never written
Grain 6 – retrieved from Delta object Disk-003
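The read resolution above can be sketched in a few lines of Python. This is a toy model under stated assumptions, not Virtual SAN code: the chain is a plain list (base object first, newest delta last), `resolve_read` is a hypothetical helper name, and the grain contents are placeholder strings. The writes follow the list given for the example, and a grain that no object has written reads back as zero.

```python
# Toy snapshot chain: base object first, newest delta last.
# Each entry maps grain number -> placeholder contents.
chain = [
    ("Disk.vmdk",     {1: "base-1", 2: "base-2", 3: "base-3", 5: "base-5"}),
    ("Disk-001.vmdk", {1: "d1-1", 4: "d1-4"}),
    ("Disk-002.vmdk", {2: "d2-2", 4: "d2-4"}),
    ("Disk-003.vmdk", {1: "d3-1", 6: "d3-6"}),
]


def resolve_read(grain):
    """Return (object name, data) for a grain, searching from the top of the chain down."""
    for name, grains in reversed(chain):
        if grain in grains:
            return name, grains[grain]
    # No object in the chain ever wrote this grain: the base returns zero.
    return chain[0][0], 0


for grain in range(1, 8):
    print(grain, resolve_read(grain))
```

The newest object that holds a grain always wins, which is why grain 4 comes from Disk-002 even though Disk-001 also wrote it.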
The diagram below (Figure 57) reuses the example above to illustrate the Virtual SANSparse driver and its in-
memory cache.
Figure 57: Virtual SANSparse Driver
When a guest OS sends a write request, the Virtual SANSparse driver writes the data to the top-most object in the
snapshot chain and updates its in-memory read cache. On subsequent reads, the Virtual SANSparse driver references the
in-memory read cache to determine which delta (or deltas) to read. The read requests are sent in parallel to all
deltas that have the necessary data.
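The write and read paths just described can be modeled roughly as follows. This is a simplified sketch, not the actual driver: the class and method names are hypothetical, the cache is a plain dictionary from grain number to the newest object holding it, and the "parallel" reads are represented only as one grouped request per object.

```python
class SparseChain:
    """Toy model of a snapshot chain with an in-memory lookup cache."""

    def __init__(self, base_name):
        self.objects = {base_name: {}}  # object name -> {grain: data}
        self.top = base_name            # writes always land on the top-most object
        self.cache = {}                 # grain -> name of the newest object holding it

    def take_snapshot(self, delta_name):
        """Freeze the current top object and start a new child delta."""
        self.objects[delta_name] = {}
        self.top = delta_name

    def write(self, grain, data):
        self.objects[self.top][grain] = data
        self.cache[grain] = self.top

    def plan_reads(self, grains):
        """Group the requested grains by the object that must serve them."""
        plan = {}
        for grain in grains:
            owner = self.cache.get(grain)
            if owner is not None:
                plan.setdefault(owner, []).append(grain)
        return plan

    def read(self, grains):
        result = {grain: 0 for grain in grains}  # unwritten grains read as zero
        for owner, owned in self.plan_reads(grains).items():
            for grain in owned:
                result[grain] = self.objects[owner][grain]
        return result


chain = SparseChain("Disk.vmdk")
for g in (1, 2, 3, 5):
    chain.write(g, f"base-{g}")
chain.take_snapshot("Disk-001.vmdk")
for g in (1, 4):
    chain.write(g, f"d1-{g}")
chain.take_snapshot("Disk-002.vmdk")
for g in (2, 4):
    chain.write(g, f"d2-{g}")
chain.take_snapshot("Disk-003.vmdk")
for g in (1, 6):
    chain.write(g, f"d3-{g}")
```

Because the cache is updated on every write, a read never has to walk the chain; it only consults the cache and issues one request per object involved.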
Managing Snapshots
Administrators use the Snapshot Manager to review active virtual-machine snapshots and perform limited
management operations, including
Delete, which commits snapshot data to its parent snapshot
Delete All, which removes all the snapshots, including the parent
Revert To, which rolls back to a referenced snapshot so that it becomes the current snapshot
Note that deleting a snapshot consolidates the changes between snapshots and previous disk states. It also writes
to the parent disk all data from the delta disk and the deleted snapshot. When the parent is deleted, all changes
merge with the base VMDK.
Administrators also should remember to monitor read cache, because snapshots—used extensively—can consume
RC at a higher-than-optimal rate.
NOTE: For full details regarding VxRail Appliance snapshot technology, refer to Virtual SANSparse – Tech Note for
Virtual SAN 6.0 Snapshots at https://www.vmware.com/files/pdf/products/ SAN
Deduplication and Compression
Many IT sites want their storage solution to include data-reduction technology. For some, it’s more of a
requirement than for others. Naturally, environments with highly redundant data—full-clone virtual desktops for
instance, or homogenous-server operating systems—benefit the most from deduplication. Likewise, compression
makes more of an impact on resources that compress well: text, bitmap, and program files. For these
environments, deduplication and compression can dramatically reduce the amount of physical storage consumed,
resulting in a lower total cost of ownership.
It may sound obvious, but considering that deduplication and compression algorithms consume CPU and memory,
it’s important to verify that the stored data in question is actually compressible. Sometimes data has already been
compressed—for example, certain graphics formats and video files, or encrypted files. These may ultimately yield
little or no reduction at all in storage consumption from compression.
Advantages of Data-Reduction Technology
Several years ago, when NAND flash started to appear in storage arrays, a gulf separated HDDs from flash drives in
terms of cost/GB. Flash cost fifteen times more than magnetic devices. The introduction of deduplication and
compression techniques in the data path helped create the market segment of all-flash arrays (AFAs), which were
effective in reducing the cost of flash for tier-1 applications, despite the high cost of a global-lookup table for
fingerprints.
More recently, the cost of NAND flash has dropped 50 percent, and an all-flash configuration is suddenly very
attractive for more than just tier-1 workloads. It also has the opportunity to better balance the data-reduction
target and the consumption of CPU against memory and network resources on an appliance like VxRail Appliance.
This is precisely where data reduction benefits VxRail Appliance customers. The appliance includes in-line
deduplication and compression at a disk-group level.
Remember our conversation about flash endurance and drive writes per day (DWPD)? Currently, the price-per-GB
of a 1DWPD flash drive is about 4.5 times more expensive than that of a 10K RPM HDD. If cost alone is the issue, a
data reduction of 4.5 times makes the price of an all-flash appliance comparable with the cost of a hybrid
configuration.
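The break-even reasoning is simple division. A minimal sketch, using a normalized HDD price of 1.0 per GB (an assumption for illustration, not a quoted price) and a hypothetical helper name:

```python
def effective_price_per_gb(raw_price_per_gb: float, reduction_ratio: float) -> float:
    """Price per GB of logical data once data reduction is taken into account."""
    return raw_price_per_gb / reduction_ratio


hdd_price = 1.0                 # normalized price per GB of a 10K RPM HDD
flash_price = 4.5 * hdd_price   # 1 DWPD flash, per the ratio quoted in the text

break_even = effective_price_per_gb(flash_price, 4.5)  # matches the HDD price
at_two_to_one = effective_price_per_gb(flash_price, 2.0)
```

At a 4.5:1 reduction the effective flash price equals the HDD price; at 2:1 flash still costs 2.25 times as much per logical GB.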
But cost is not the only factor worth considering here. As HDD capacity increases, so does the gap in performance
between HDDs and flash disks. In other words, capacity grows much faster than performance. In terms of
IOPS/GB, a 3.8TB flash drive performs 50 times better than a 1.2TB 10K RPM HDD, and it has a latency advantage of at
least 10 to 1.
Because of its data-reduction technology, the all-flash VxRail Appliance configuration in particular has found the
sweet spot in terms of price-performance, even if the compression ratio is lower than 4:1. An all-flash appliance
provides a significantly higher throughput and a much more predictable performance behavior at an attractive cost.
In-Line Deduplication and Compression per Disk Group
In Virtual SAN, deduplication occurs when data is de-staged from the cache tier. It uses a fixed block-length
deduplication (4KB blocks), which increases the chances of finding duplicated blocks. Virtual SAN performs the
deduplication algorithm within each disk group and reduces redundant copies into one copy (as in Figure 58 below).
Redundant blocks across multiple disk groups, though, are not deduplicated.
This is a smart technique. By deduplicating only when de-staging, the implementation minimizes the CPU overhead
of creating hash keys for new writes directed to the same cache locality. By limiting the deduplication domain to a
disk group, Virtual SAN further diminishes network overhead and CPU utilization. It avoids the requirement of a
global lookup table, which would add a sizable resource overhead. This way, resources can track to a smaller and
more meaningful block size.
Compression occurs after deduplication, but before the data is de-staged from the cache to the capacity tier. Virtual
SAN only stores compressed data if it can reduce a unique 4KB block to 2KB or less. Otherwise, the block is written
uncompressed, avoiding misalignment and resource waste.
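The de-stage sequence described above (fixed 4KB blocks, deduplication scoped to one disk group, then compression kept only when the result fits in 2KB) can be sketched as follows. This is an illustration under stated assumptions: the `DiskGroup` class is hypothetical, and SHA-1 and zlib are stand-ins for whatever fingerprint and compression algorithms the product actually uses.

```python
import hashlib
import zlib

BLOCK_SIZE = 4096        # fixed deduplication block length
COMPRESS_TARGET = 2048   # keep the compressed form only if it fits in 2KB


class DiskGroup:
    """Toy capacity tier for one disk group; dedup never crosses groups."""

    def __init__(self):
        self.stored = {}     # fingerprint -> (payload, is_compressed)
        self.block_map = []  # logical block order, as a list of fingerprints

    def destage(self, data: bytes) -> None:
        for offset in range(0, len(data), BLOCK_SIZE):
            block = data[offset:offset + BLOCK_SIZE].ljust(BLOCK_SIZE, b"\0")
            fingerprint = hashlib.sha1(block).hexdigest()
            if fingerprint not in self.stored:       # unique block: store it once
                packed = zlib.compress(block)
                if len(packed) <= COMPRESS_TARGET:
                    self.stored[fingerprint] = (packed, True)
                else:                                # poor ratio: write uncompressed
                    self.stored[fingerprint] = (block, False)
            self.block_map.append(fingerprint)

    def read_block(self, index: int) -> bytes:
        payload, is_compressed = self.stored[self.block_map[index]]
        return zlib.decompress(payload) if is_compressed else payload


group = DiskGroup()
zeros = bytes(BLOCK_SIZE)  # highly compressible block
# 4096 bytes of hash output, used here as incompressible sample data
noisy = b"".join(hashlib.sha256(bytes([i])).digest() for i in range(128))
group.destage(zeros + zeros + noisy)
```

Three logical blocks go in, but only two unique blocks are stored: the duplicate zero block is collapsed, the zero block is kept compressed, and the noisy block is written uncompressed because it cannot reach the 2KB target.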
Figure 58: Deduplication
Latency and Resource Consumption
Performance overhead should be expected during a read miss, or during decompression when data moves from the
capacity to the performance tier. However, don’t overlook the fact that any overhead is mitigated by low-latency
flash-disk response times—nearly 1ms for small-block I/O. Meanwhile, write latency should not be affected.
The metadata created in the data-reduction process is kept in the capacity tier and can consume between 3 and 5
percent of the flash-disk space.
Enabling Deduplication and Compression
In VxRail Appliances—including stretched-cluster implementations—data-reduction operations use cluster-wide
settings. Deduplication and compression are disabled by default, so they need to be enabled. (See Figure 59.) They
become activated at the same time, which executes an online rolling reformat on all the disks in the Virtual SAN
cluster. If deduplication and compression become disabled at some point, turning them back on triggers another
rolling-reformat execution.
Figure 59: Deduplication and Compression Enabled
Erasure Coding
When it comes to fault tolerance and data protection, purely conventional data-replication services are not the most
workable solution for a distributed storage system, because replication consumes so much storage space. Erasure
coding provides a practical alternative for all-flash VxRail Appliance configurations. It breaks up data into
fragments, and distributes redundant chunks of data across the system.
Erasure codes introduce redundancy by using data blocks and striping. We briefly discussed striping earlier, and we
won’t go too far into explaining it here, because it could lead to an unnecessary investigation of RAID technology.
But basically, data blocks are grouped in sets of n, and for each set of n data blocks, a set of p parity blocks exists.
Together, these sets of (n + p) blocks make up a stripe. The crux is that any n of the (n + p) blocks in the stripe are
enough to recover the entire data on the stripe.
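The recovery property is easiest to see with a single parity block (p = 1), where the parity is the XOR of the data blocks. The sketch below uses that simplest case for a 3+1 stripe; it illustrates the principle only and is not presented as the appliance's actual encoding. The function names are hypothetical, and all blocks are assumed to have the same length.

```python
from functools import reduce


def xor_blocks(blocks):
    """Byte-wise XOR of equally sized blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def make_stripe(data_blocks):
    """Return the n data blocks followed by one XOR parity block."""
    return list(data_blocks) + [xor_blocks(data_blocks)]


def recover(stripe, lost_index):
    """Rebuild the block at lost_index from the n surviving blocks."""
    survivors = [block for i, block in enumerate(stripe) if i != lost_index]
    return xor_blocks(survivors)


data = [b"AAAA", b"BBBB", b"CCCC"]
stripe = make_stripe(data)           # 3 data blocks + 1 parity block
rebuilt = recover(stripe, lost_index=1)
```

Any one of the four blocks, data or parity, can be rebuilt from the other three. Tolerating two simultaneous losses requires a second, independent parity calculation, which plain XOR does not provide.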
In VxRail Appliance clusters, the data and parity blocks that belong to a single stripe are placed in different ESXi
hosts in a cluster, providing a layer of fault tolerance for each stripe. Stripes don’t follow a one-to-one distribution
model. It’s not a situation where the set of n data blocks sits on one host, and the parity set sits on another.
Rather, the algorithm distributes individual blocks from the parity set among the ESXi hosts.
The diagrams below (Figures 60 and 61) illustrate the implementation. A 3+1 stripe uses 3 data blocks and 1 parity block.
It requires a minimum of four hosts or four fault domains to ensure availability in case one of the hosts or disks
fails. This is recognized as a RAID-5 network implementation.
Figure 60: RAID-5 Network
A RAID-6 implementation with a 4+2 configuration (4 data blocks and 2 parity blocks) requires at least six hosts.
Figure 61: RAID-6 Network
Look at the comparison of usable capacity in the graph below (Figure 62). The erasure-code protection method
increases the usable capacity up to 50 percent compared to mirroring.
Figure 62: Erasure coding increases usable capacity up to 50 percent.
An all-flash VxRail Appliance node using 3.84TB drives has up to 19.2TB of raw capacity (5 x 3.84).
When using mirroring as the protection method and an FTT policy of 1, the usable capacity is 9.6TB.
When using Erasure Coding as the protection method and FTT=1, the usable capacity is 14.4TB.
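These figures follow from the ratio of data blocks to total blocks written. A short check, with a hypothetical helper name; mirroring with FTT=1 is treated as one data copy plus one redundant copy, and erasure coding with FTT=1 as a 3+1 stripe:

```python
def usable_capacity_tb(raw_tb: float, data_blocks: int, redundancy_blocks: int) -> float:
    """Usable capacity when every set of data blocks carries extra redundancy blocks."""
    return raw_tb * data_blocks / (data_blocks + redundancy_blocks)


raw = 5 * 3.84                              # five 3.84TB drives, about 19.2TB
mirroring = usable_capacity_tb(raw, 1, 1)   # RAID-1, FTT=1: about 9.6TB
raid5 = usable_capacity_tb(raw, 3, 1)       # RAID-5 (3+1), FTT=1: about 14.4TB
gain = raid5 / mirroring - 1                # about 0.5, the 50 percent gain
```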
Enabling Erasure Coding
As mentioned in the section on Storage Policy Based Management, a rule called Fault Tolerance Method lets
administrators choose between RAID-1 (Mirroring) and RAID-5/6 (Erasure Coding). The FTT policy (in Figure 63)
determines the number of parity blocks written by the erasure code.
Figure 63: FTT policy determines the number of parity blocks written by the erasure code
VxRail Appliance implements erasure coding at a very granular level, and it can be applied to VMDKs, making for a
nuanced approach. Configurations for VMs with write-intensive workloads—a database log, for instance—can
include a mirroring policy, while the data component can include an erasure-coding policy.
Requirements
Erasure coding requires a minimum number of fault domains to ensure availability. (Remember that if no fault
domains have been defined, an individual host becomes a fault domain.)
Overhead Issues (RAID-5 and RAID-6)
Erasure coding saves space, yes, but the cost is performance. Computing parity blocks consumes CPU cycles and
adds overhead to the network and disks, as does distributing data slices across multiple hosts. This extra activity
can affect latency and overall IOPS throughput.
The rebuild operation also adds overhead. In general, rebuild operations multiply the number of reads and network
transfers used for replication. A formula is available here, too. If n refers to the number of data blocks in a stripe, then
the rebuild operation costs n times that of ordinary replication. For a 3+1 stripe, that means three disk reads and
three network transfers for every one of conventional data-replication. The rebuild operation can also be invoked to
serve read requests for currently available data.
This additional I/O is the primary reason why only all-flash VxRail Appliance configurations use erasure coding. The
rationale here is that the flash disks compensate for the extra I/O.
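The n-times rule for rebuilds can be written out as a small calculation. The helper names are hypothetical, and the model counts only reads and transfers; it ignores the CPU cost of recomputing parity.

```python
def erasure_rebuild_cost(data_blocks_per_stripe: int, blocks_to_rebuild: int) -> dict:
    """Each lost block needs n surviving blocks to be read and sent over the network."""
    operations = data_blocks_per_stripe * blocks_to_rebuild
    return {"disk_reads": operations, "network_transfers": operations}


def mirror_rebuild_cost(blocks_to_rebuild: int) -> dict:
    """With replication, each lost block is copied once from the surviving replica."""
    return {"disk_reads": blocks_to_rebuild, "network_transfers": blocks_to_rebuild}


raid5_cost = erasure_rebuild_cost(data_blocks_per_stripe=3, blocks_to_rebuild=1000)
mirror_cost = mirror_rebuild_cost(blocks_to_rebuild=1000)
```

For a 3+1 stripe, rebuilding 1,000 blocks takes 3,000 reads and 3,000 transfers, against 1,000 of each for mirroring.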
Integrated Solutions
STORAGE TIERING WITH CLOUDARRAY
EMC CloudArray, EMC’s cloud storage gateway, is integrated into VxRail Appliances, and it seamlessly extends the appliance
to public and private clouds to securely expand managed storage capacity. Cloud-storage gateways make it possible to
take advantage of storage services from both public and private cloud storage providers while maintaining predictable
performance behavior. EMC CloudArray is accessed through VxRail Manager Extension and provides an additional 10TB of
on-demand cloud storage per appliance. EMC CloudArray currently provides connections (APIs) to over 20 different public
and private clouds, including EMC ViPR, VMware vCloud Air, Rackspace, Amazon Web Services, Google Cloud, EMC Atmos,
and OpenStack. VxRail Appliance CloudArray can provide an elegant, seamless solution for cost-efficient cold (inactive)
data storage or an easily accessible online archive with predictable performance behavior.
VxRail Appliance deploys CloudArray as a virtual appliance, a preconfigured, ready-to-run VM packaged with an operating
system and a software application. Self-contained virtual appliances make it simpler to acquire, deploy, and manage
applications. The CloudArray virtual appliance is essentially a VM already installed with and running the EMC CloudArray
software application. The communication between the VxRail VMs and the CloudArray VM takes place through the VM IP
network. An iSCSI initiator is configured on the VM’s guest OS to connect it to the CloudArray VM, and the IP address of
the CloudArray VM is defined as the iSCSI target. Figure 64 below illustrates the implementation.
Figure 64: CloudArray Communication
When using VxRail Appliance and CloudArray for cloud tiering, virtual disks (vdisks) are first created in the VSAN
Datastore for the CloudArray virtual appliance to use as cache sources. CloudArray identifies these vdisk devices as cache
sources and places them in pools, and the cache sources then allocate the capacity into different-sized spaces, or cache
areas.
For the VxRail Appliance, CloudArray creates volumes using specific volume-provisioning definitions associated with the
cache area. These definitions determine whether the volume accesses capacity from a cloud service or remains local
(cloudless). Typically, local provisioning requires large cache areas that can store 100 percent of the volume capacity
locally. Large cache areas accommodate frequently accessed volumes. Less-active volumes are generally provisioned
using small cache areas and leverage a cloud provider for capacity.
Observe in the illustration (Figure 65) below that Vol1 requires 10GB of capacity from Cache1, which can provide up to
600GB of capacity. On the other hand, Vol2 requires 100GB of capacity from Cache2, which is allocated from a cache area
that provides only 25GB of capacity. Regardless of the cache area size, the cache always maintains the most recently
accessed data, and the less frequently accessed data can be tiered to a cloud.
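The behavior of a cache area that is smaller than its volume can be approximated with a least-recently-used cache. The sketch below is an assumption-laden illustration: `CacheArea` is a hypothetical class, the cloud tier is a plain dictionary, and the source does not state which eviction policy CloudArray actually applies.

```python
from collections import OrderedDict


class CacheArea:
    """Toy cache area: recent blocks stay local, older blocks are tiered to the cloud."""

    def __init__(self, capacity_blocks: int):
        self.capacity = capacity_blocks
        self.local = OrderedDict()  # block id -> data, least recently used first
        self.cloud = {}             # stand-in for the cloud provider

    def write(self, block_id, data):
        self.local[block_id] = data
        self.local.move_to_end(block_id)
        self._evict()

    def read(self, block_id):
        if block_id not in self.local:
            # Cache miss: bring the block back from the cloud tier.
            self.local[block_id] = self.cloud[block_id]
        self.local.move_to_end(block_id)
        self._evict()
        return self.local[block_id]

    def _evict(self):
        # Push the least recently used blocks to the cloud until the cache fits.
        while len(self.local) > self.capacity:
            old_id, old_data = self.local.popitem(last=False)
            self.cloud[old_id] = old_data


cache = CacheArea(capacity_blocks=2)
for block_id, data in (("a", "A"), ("b", "B"), ("c", "C")):
    cache.write(block_id, data)
value = cache.read("a")  # miss: "a" returns from the cloud and "b" is tiered out
```

A volume such as Vol2, whose capacity exceeds its cache area, keeps only its hot blocks local in this model, while everything remains readable through the cloud tier.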
Figure 65: CloudArray Cache Sources
CloudArray can also create and schedule in-cloud snapshots, which are extremely space efficient and can be controlled via
age-based retention controls. A granular bandwidth scheduler helps optimize WAN utilization by scheduling and limiting
the bandwidth used by CloudArray. Local caching naturally reduces bandwidth consumption and data latency, and
only changed data blocks are sent to the cloud after the initial data is delivered.
CloudArray also provides multi-layered AES 256-bit encryption. Both data and metadata are encrypted separately, with
two different sets of keys. Furthermore, the keys themselves are password protected.
Figure 66: CloudArray Local and Cloud storage
In conclusion, CloudArray offers VxRail Appliance environments a valuable set of extended services to make cloud tiering
simple, secure, reliable, and efficient. For more information about CloudArray, refer to EMC CloudArray Product
Description and Administrator Guides:
https://www.emc.com/collateral/guide/h13456-cloudarray-pdg.pdf
http://uk.emc.com/collateral/TechnicalDocument/docu60786.pdf
INTEGRATED BACKUP AND RECOVERY WITH VSPHERE DATA PROTECTION (VDP)
VxRail Appliances interoperate with vSphere Data Protection (VDP) for extended backup and recovery services. VDP is
deployed as a Linux-based virtual appliance and includes up to 8TB of backup virtual disks per ESXi host. VDP protects
every application or VM on the VxRail Appliance. It features the familiar vCenter Server interface and is powered by EMC
Avamar with built-in enterprise deduplication to reduce network bandwidth and shrink backup windows. VDP leverages
vCenter management for one-step recovery with verification, enabling 30 percent-faster backups compared to disk
backup. VDP provides agentless backup and recovery for VMs running on VSAN Datastores. VDP’s deduplication uses a
variable-length segment algorithm that reduces consumption in backup storage. Backup data can also be moved off-site
using replication.
VDP backs up VMs without running any services within the VM itself. APIs allow VDP to connect to the ESXi host running
the VM and to take a snapshot via a process similar to VSAN’s standard snapshot technology. The VDP snapshot is a
static, read-only, point-in-time reference that non-disruptively captures virtual-disk data and VM-configuration
information. The snapshot information is then copied to backup media, and VDP tracks changes to disk sectors
using changed-block tracking (CBT).
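The idea behind changed-block tracking is that only blocks written since the previous backup need to be copied. A minimal sketch of that idea follows; `TrackedDisk` and its methods are hypothetical names and do not represent the VDP or vSphere APIs.

```python
class TrackedDisk:
    """Toy virtual disk that remembers which blocks changed since the last backup."""

    def __init__(self, blocks):
        self.blocks = list(blocks)
        # Nothing has been backed up yet, so every block counts as changed.
        self.changed = set(range(len(self.blocks)))

    def write(self, index, data):
        self.blocks[index] = data
        self.changed.add(index)

    def backup(self, target: dict) -> list:
        """Copy only the changed blocks to the target and reset the tracking set."""
        copied = sorted(self.changed)
        for index in copied:
            target[index] = self.blocks[index]
        self.changed.clear()
        return copied


disk = TrackedDisk(["a", "b", "c"])
backup_media = {}
first = disk.backup(backup_media)    # full copy: all three blocks
disk.write(1, "B")
second = disk.backup(backup_media)   # incremental: only block 1
```

The first backup copies every block; later backups transfer only what has changed, which is what keeps backup windows and bandwidth use small.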
In addition, VDP has the ability to reduce bandwidth consumption by using SCSI HotAdd for backup data transmission.
VDP attaches a vdisk to the backup storage device the same way the vdisk would attach to a VM. As long as the ESXi host
of the VM being backed up has access to the backup storage device, VDP does not use the network. (See the diagram in
Figure 67 below.) If the ESXi host cannot access the backup storage device, VDP sends the encrypted snapshot data
across the network using an incremental transmission to maintain low bandwidth.
Figure 67: vSphere Data Protection
INTEGRATED REPLICATION WITH RECOVERPOINT FOR VIRTUAL MACHINES
EMC’s RecoverPoint for VMs (RPVM) provides simple and efficient local and remote VM-level replication for VxRail
Appliance deployments. It supports synchronous or asynchronous replication over any distance and includes built-in
capabilities for workflow and disaster-recovery automation (as illustrated in Figure 68 below). RPVM is integrated with
vCenter to provide continuous data protection, built-in orchestration and automation, and recovery for VMs to any point in
time. It also features deduplication and compression and uses algorithms to reduce bandwidth consumption. Each VxRail
Appliance includes RecoverPoint for VM licenses to replicate 15 VMs.
RecoverPoint for VMs has three architectural components which are fully integrated and deployed in a VMware ESXi server
environment: the vCenter plug-in, a RecoverPoint write-splitter embedded in vSphere ESXi, and a virtual appliance. VxRail
Appliance implements RPVM as a virtual appliance. A RecoverPoint write-splitter embeds directly into the ESXi kernel on
all servers with protected workloads, allowing replication and recovery at the virtual-disk (VMDK and RDM) granularity
level. Replication provisioning occurs through vCenter, using a simple user interface to select the destination for the
replication, define the consistency group of multiple VMs representing inter-dependent applications, set the data-
protection policies, and auto-provision VMDKs and VMs on the replicas. The automated workflows for disaster recovery
include recovery from logical corruption to any point, failover and failback of specific consistency groups, and
non-disruptive DR tests. RPVM’s compression, deduplication, and advanced bandwidth-reduction algorithms
decrease WAN bandwidth consumption by up to 90 percent, saving associated communication costs. RPVM scales along
with the VxRail Appliance and can support the maximum 16-appliance configuration and thousands of VMs.
Figure 68: RPVM implements a journal model that tracks changes to the virtual machine as rolling data that can be
unrolled to a specific point in time.
VxRail Appliance Use-Case Examples
VCE VxRail Hyper-Converged Infrastructure Appliances have been deployed successfully to fit many use cases. This
section describes two such use cases—one for a virtual desktop infrastructure (VDI) platform and one for a remote
office/branch office IT infrastructure platform. Each use case is then highlighted in a specific customer solution
implementation. These customers benefit from the simplicity and business value of the VxRail Appliance.
USE CASE: CREATE IT CERTAINTY FOR VIRTUAL DESKTOP
INFRASTRUCTURE (VDI)
VCE VxRail Appliances are the easiest, fastest, most affordable way to implement a high performance VDI infrastructure.
Rapidly deploy an appliance that integrates market-leading compute, storage, virtualization, and management software
from EMC and VMware to set up a VDI infrastructure in minutes. Flexible configuration options and modular scalability
ensure that optimum performance and capacity are always available whether you are deploying hundreds or thousands of
virtual desktops. The VxRail Appliance’s highly redundant architecture, integrated EMC data protection software, and
non-disruptive upgrades create certainty that virtual desktops will always be available to end users and that the user
experience will always exceed expectations. VCE VxRail Appliances are a family of hyper-converged infrastructure (HCI)
appliances that include a full suite of industry-leading data services, including replication, backup, and recovery for data
protection. Built on the foundation of VMware Hyper-Converged Software and managed through the familiar VMware
vCenter interface, VxRail Appliances provide customers with a familiar experience that also allows them to take advantage
of the hallmark benefits of VCE—increased agility, simplified operations, and lower risk.
VxRail Appliance Advantages for VDI
Quick and easy automated deployment with power-on to VM creation in minutes and easy ongoing VM management
Scalability from 80 to 600 virtual desktops per appliance, and a maximum 9,600 desktops in a fully-populated VxRail Appliance cluster
One-click, non-disruptive patches and upgrades
Application uptime ensured through highly available VMware VSAN
Automated operational and disaster-recovery orchestration for VMs, including local and remote replication and continuous data protection with granular recovery to any point in time
VxRail Appliances enable customers to reduce VDI footprints, saving power and infrastructure costs while minimizing
administrative burdens and lowering operational costs. The modular, just-in-time purchase approach enables predictable
evolution with a repeatable, simple, and agile means to scale on demand. VxRail Appliances can host virtual desktops from
VMware, Citrix, and other VDI vendors. Businesses can be confident that VxRail Appliances will meet performance and
capacity demands associated with desktop growth and application and user demands through continuous hardware and
software evolution. VxRail Appliances seamlessly integrate new enterprise-class x86 and storage technologies and
non-disruptively update to the latest VMware software to ensure that the VDI deployment can continuously modernize to meet
business demands.
Meeting the Virtualization Challenge for Federal Agencies
Business Challenge
The IT challenges facing today’s federal government organizations are much like those of their corporate counterparts:
Deadlines and budgets shrink while expectations grow. They need to provide security and the freedom and flexibility to
support a mobile workforce. However, federal agencies also face the added pressures of public oversight. IT purchases
may be subject to strict procurement guidelines, require more than the typical due diligence in planning and may take
more time for purchase approvals. In one study of federal IT professionals, more than half (54 percent) said they do not
believe that their agency is able to acquire new IT resources in a timely manner. This is a challenge, especially in light of
the fact that many federal agencies are vulnerable to the problems and inefficiencies of aging IT systems and
infrastructures. The same survey noted that 77 percent felt that their agencies needed a more flexible IT infrastructure.
Business Solution
For increasing numbers of federal agencies, the answer to the challenge is IT resource virtualization. A virtual desktop
infrastructure (VDI) puts resources precisely where they are needed and in the strength they are needed at a moment’s
notice. Virtualized IT infrastructures are in place in most large organizations today. But until the recent advent of HCI
technology, they have been beyond the reach of smaller federal agencies or departments within large federal
organizations. With VCE VxRail Appliances, federal organizations can take advantage of a “just-in-time” approach to
deployment and expansion. An organization can start with a single appliance and then build out an IT infrastructure over
time. This can help expedite the procurement process by keeping incremental purchase amounts for technology below
discretionary federal agency spending limits. It can also reduce the need for overprovisioning and facilitate the creation of
a master configuration that can be replicated in other departments within the organization.
VxRail Appliances make federal agencies more confident in their IT infrastructure because they provide a pre-configured,
pre-tested solution jointly developed by EMC/VCE and VMware, vendors trusted by organizations around the world, and
they are backed by a single point of 24/7 global support for all appliance hardware and software. With VxRail Appliances,
businesses can be confident that the virtual infrastructure will work today and will lead them along the path to more
innovative technologies, from cloud computing to the software-defined data center (SDDC).
USE CASE: SIMPLIFYING THE DISTRIBUTED ENTERPRISE ENVIRONMENT
Distributed Enterprises usually have a central IT staff that creates the overall business network architecture for the
enterprise, and also have many remote offices that are essential to running the business, normally with limited on-site
technical staff. However, the infrastructure and data at these distributed locations is mission critical. Typical important
operations found in distributed enterprises are warehousing and distribution, manufacturing of the company's core
products and mobile or remote life-saving operations like health clinics. Support responsibilities for these remote
operations usually fall to the central IT staff. According to Enterprise Strategy Group (ESG) in their 2015 research report
Remote Office/Branch Office Technology Trends, 72 percent of organizations intend to increase spending on remote office
IT infrastructure. In addition to the above challenges, footprint is an issue because remote locations, unlike data centers,
do not have the dedicated space, power, or cooling capabilities necessary for multiple servers running multiple
applications. This means remote organizations are much more sensitive to server sprawl. While some organizations may
look to the cloud to reduce server sprawl and centralize operations, in many cases that is not feasible. This is because
offices either are in remote locations with limited Internet service or have minimal WAN bandwidth and redundancy
available. So issues such as latency and availability become limiting factors. VCE VxRail Appliances are ideal for
consolidating multiple applications in a remote location onto a single high-performance and highly-available platform that
is easy to deploy and manage.
VCE VxRail Appliances that integrate compute, storage, virtualization, and management software from EMC and VMware
are the optimal endpoints for the distributed enterprise. As an integral solution in the VCE converged infrastructure
portfolio, VxRail Appliances can be monitored with VCE Vision™ Intelligent Operations, enabling IT to have visibility across
the distributed solution from the same single-pane-of-glass console used to manage the data center infrastructure. The
VxRail Appliance enables customers to consolidate multiple remote office applications onto a single appliance. VMware
Virtual SAN software integrated with flash or hybrid storage ensures the highest possible performance since Virtual SAN is
embedded in the hypervisor and eliminates many data path bottlenecks. Simple deployment enables customers to be up
and running in 15 minutes. The local team only needs to plug in the appliance and power it up. All other configuration can
be done remotely. In addition, the VxRail Appliance is the only HCI appliance on the market offering Quality of Service
(QoS) functionality that eliminates “noisy neighbors.” This functionality makes it certain that multiple applications can be
hosted on the same appliance or in the same cluster without performance impact.
VxRail Appliance Advantages for Distributed Enterprises
Tailor compute and capacity deployment for each remote location
Simple, standard set-up reduces IT skills needed at remote locations
Part of a complete portfolio of converged infrastructure core-to-edge solutions
Back up locally at the remote office or over the WAN to central data centers with RecoverPoint for VMs
Management and visibility across the distributed enterprise with VMware tools and VCE Vision™ software
Meeting the Distributed Enterprise Challenge for State and Local Agencies
Business Challenge
For state and local agencies, the promises of new technology come with unique new challenges as well. New applications,
including those for remote and mobile computing, can boost user productivity, but they present performance
and data-storage demands for aging IT infrastructures that their original planners could not have foreseen. State and local
systems are barely able to keep up with the refresh cycles of their various hardware and software components, much less
adjust to today’s demands for tightened service-level agreements, shorter project deadlines, and shrinking
budgets. Pressured to keep costs low, agencies have difficulty justifying specialized IT technicians or even new real
estate to support an upgrade in IT infrastructure.
Business Solution
For a fast-growing segment of state and local agencies, a hyper-converged appliance is an effective solution to the
problems of high expectations and small budgets. Leveraging VxRail Appliances reduces cost by eliminating conflicting
system-refresh cycles and redundant software and the need for specialized IT technicians. VxRail Appliances provide the
ability to put compute resources where they are most needed at any given time, saving the cost of over-provisioning IT
systems and building out new office space for larger servers, storage, or networking gear. With the emergence of VxRail
Appliances, the benefits of a Software-Defined Data Center (SDDC) are within the reach of state and local agencies.
With conventional IT systems, deployment can take months to plan, procure, install, configure, provision, and test. It
can also require the services of technicians skilled in servers, storage, networking, and applications. The more time it takes for
deployment, the higher the cost and the more likely the project will be stopped in its tracks or diminished in scope by
budget-conscious regulators. VxRail Appliances avoid these pitfalls because they are totally self-contained and thoroughly
tested by EMC/VCE before they are shipped. Wizard-based automation helps non-technical staff set up pools of virtual
machines for users. Once this setup is complete, it takes just 15 minutes from power-on to creation of a new virtual
machine. Expansion is a simple matter of plugging in a new node or adding another appliance. New nodes are
hot-swappable, so the appliance does not have to be powered down and no new software is required to grow your
infrastructure. In addition, because VxRail Appliances have a system architecture that is predictable and repeatable, new versions
of a master configuration can be installed into other offices without new testing or troubleshooting.
PRODUCT INFORMATION
For documentation, release notes, software updates, or for information about EMC products, licensing, and service,
go to the EMC Online Support site (registration required) at: https://support.EMC.com.
PRODUCT SUPPORT
Single-source, 24x7 global support is provided for VxRail Appliance hardware and software via phone, chat, or
instant message. Support also includes access to online support tools and documentation, rapid on-site parts
delivery and replacement, access to new software versions, assistance with operating environment updates, and
remote monitoring, diagnostics and repair with EMC Secure Remote Services (ESRS).
EMC PROFESSIONAL SERVICES FOR VXRAIL APPLIANCES
EMC offers installation and implementation services to ensure smooth and rapid integration of VxRail Appliances into
customer networks. The standard service, optimal for a single appliance, provides an expert on site to perform a
pre-installation checklist with the data-center team, confirm the network and Top of Rack (TOR) switch settings,
conduct site validation, rack and cable, configure, and initialize the appliance. Finally, an on-site EMC service
technician will configure EMC Secure Remote Services (ESRS) and conduct a brief functional overview on essential
VxRail Appliance administrative tasks. A custom version of this installation and implementation service is available
for larger-scale VxRail Appliance deployments, including those with multiple appliances or clustered environments.
Also offered is VxRail Appliance extended service, which is delivered remotely and provides an expert service
technician to rapidly implement VxRail Appliance pre-loaded data services (RecoverPoint for Virtual Machines,
vSphere Data Protection, and CloudArray).
vSPHERE ORDERING INFORMATION
Beginning May 9, 2016, the VxRail Appliance is moving to a vSphere license-independent model to allow customers
to use any existing eligible vSphere licenses. This VxRail Appliance vSphere license-independent model (also called
“bring your own” or BYO vSphere License model or VMware Loyalty Program model or VLP model) allows customers
to leverage a wide variety of vSphere licenses they may have already purchased. Therefore, the VxRail Appliance
bundled vSphere Standard Edition licenses option will no longer be an orderable option.
For the VxRail Appliance BYO vSphere license model, several vSphere license editions are supported including
Enterprise+, Standard, and ROBO editions. Also supported are vSphere licenses from Horizon bundles or add-ons
when the appliance is dedicated to VDI. Using vSphere license editions other than Enterprise+ editions requires
VxRail 3.5, which will be available in June.
If vSphere BYO licenses need to be purchased, they should be ordered through the customer’s preferred VMware
channel partner or from VMware directly. vSphere licenses will be orderable from EMC in July. BYO licenses acquired
through VMware ELA, VMware partners, or EMC will receive single-call support from EMC. See the VMware Loyalty
Program (VLP) FAQ on the enablement center
(https://www.emc.com/collateral/faq/vmware-vsphere-loyalty-program-vce-vxrailappliances.pdf) for additional details.
WE’D LIKE TO HEAR FROM YOU!
Feedback will help us continue to improve the accuracy, organization, and overall quality of EMC user publications.
Please send feedback regarding this TechBook to: [email protected].
ABOUT VCE
VCE, an EMC Federation Company, is the world market leader in converged infrastructure and converged solutions. VCE
accelerates the adoption of converged infrastructure and cloud-based computing models that reduce IT costs while improving
time to market. VCE delivers the industry's only fully integrated and virtualized cloud infrastructure systems, allowing customers
to focus on business innovation instead of integrating, validating, and managing IT infrastructure. VCE solutions are available
through an extensive partner network, and cover horizontal applications, vertical industry offerings, and application development
environments, allowing customers to focus on business innovation instead of integrating, validating, and managing IT
infrastructure.
For more information, go to vce.com.
Copyright © 2010-2016 VCE Company, LLC. All rights reserved. VCE, VCE Vision, VCE Vscale, Vblock, VxBlock, VxRack, VxRail, and the VCE logo are registered trademarks or trademarks of VCE Company LLC. All other trademarks used herein are the property of their respective owners.