

Achieving a Million I/O Operations per Second from a Single VMware vSphere 5.0 Host

    Performance Study

TECHNICAL WHITE PAPER


    Table of Contents

Introduction
Executive Summary
Software and Hardware
Test Workload
Multi-VM Tests
    Experimental Setup
        Server
        Storage Area Network
        Virtual Platform
        Virtual Machines
        Iometer Workload
        Test Bed
    Results
        Scaling I/O Operations per Second with Multiple VMs
        Scaling I/O Throughput as I/O Request Size Increases
        CPU Cost of an I/O Operation with LSI Logic SAS and PVSCSI Virtual Controllers
Performance of a Single VM
    Experimental Setup
        Server
        Storage Area Network
        Virtual Platform
        Virtual Machines
        Iometer Workload
        Test Bed
    Results
        Scaling I/O Operations per Second with Virtual SCSI Controllers
Conclusion
Appendix A
    Selecting the Right PCIe Slots for the HBAs
    Building a Scalable Test Infrastructure
    Assigning Virtual Disks to a VM
    Estimation of CPU Cost of an I/O Operation
References
About the Author
    Acknowledgements


Introduction

One of the essential requirements for a platform supporting enterprise datacenters is the capability to support the extreme I/O demands of applications running in those datacenters. A previous study [1] has shown that vSphere can easily handle demands for high I/O operations per second. Experiments discussed in this paper strengthen this assertion further by demonstrating that a vSphere 5.0 virtual platform can easily satisfy an extremely high level of I/O demand that originates from the hosted applications, as long as the hardware infrastructure meets the need.

Executive Summary

The results obtained from the experiments show that:

    A single vSphere 5.0 host is capable of supporting a million+ I/O operations per second.

    300,000 I/O operations per second can be achieved from a single virtual machine (VM).

    I/O throughput (bandwidth consumption) scales almost linearly as the request size of an I/O operation increases.

    I/O operations on vSphere 5.0 systems with Paravirtual SCSI (PVSCSI) controllers use fewer CPU cycles than those with LSI Logic SAS virtual SCSI controllers.

Software and Hardware

Because of its capabilities and rich feature set, VMware vSphere has become one of the industry's leading choices as a platform for building private and public clouds. Significant improvements have been made to vSphere's storage stack in successive releases. These improvements enable vSphere to easily satisfy the ever-increasing demand for I/O throughput by almost all enterprise applications.

The Symmetrix VMAX storage system is a high-end, highly scalable storage system from EMC [2]. It allows the scaling of storage system resources through common building blocks called Symmetrix VMAX engines. VMAX engines can be scaled from one VMAX engine with one storage bay to eight VMAX engines with a maximum of ten storage bays. Each VMAX engine contains four quad-core processors, up to 128GB of memory, and up to 16 front-end ports for host connectivity.

The Emulex LightPulse LPe12002 is an 8Gbit/s Fibre Channel PCI Express dual-channel host bus adapter [3]. The LPe12002 delivers some of the industry's highest performance, CPU efficiency, and reliability, making it a convenient choice for enabling mission-critical and I/O-intensive applications in cloud environments.

Test Workload

Iometer was used to generate the I/O load in all the experiments discussed in this paper [4]. Iometer was configured to generate 16 or 32 Outstanding I/Os (OIOs) with 100% random and 100% read requests. The size of the I/O requests varied among 512 bytes, 1KB, 2KB, 4KB, and 8KB depending on the experiment. The I/O sizes used in the experiments represent the I/O characteristics of a wide gamut of transaction-oriented applications such as databases and enterprise mail servers.
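The workload matrix described above can be summarized as follows. This is only an illustrative Python sketch of the parameter combinations, not an Iometer configuration file.

```python
# Illustrative sketch (not an Iometer configuration file) of the workload
# parameter combinations described above: 100% random, 100% read,
# 16 or 32 outstanding I/Os, and request sizes from 512 bytes to 8KB.
from itertools import product

REQUEST_SIZES_BYTES = [512, 1024, 2048, 4096, 8192]
OUTSTANDING_IOS = [16, 32]   # chosen per test case

workload_matrix = [
    {"request_size": size, "oios": oios, "read_pct": 100, "random_pct": 100}
    for size, oios in product(REQUEST_SIZES_BYTES, OUTSTANDING_IOS)
]

for workload in workload_matrix:
    print(workload)
```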


    Multi-VM Tests

    Experimental Setup

    Server

4 Intel Xeon E7-4870 processors, 2.40GHz, 10 cores each

    256GB memory

    6 dual-port Emulex LPe12002 HBAs (8Gbps)

    Storage Area Network

    VMAX with 8 engines

    4 quad-core processors and 128GB of memory per engine

    64 front-end 8Gbps Fibre Channel (FC) ports

    64 back-end 4Gbps FC ports

960 × 15K RPM, 450GB FC drives

    1 FC switch

    Virtual Platform

    vSphere 5.0

    Virtual Machines

    Windows Server 2008 R2 EE x64

    4 vCPUs

    8GB memory

    3 virtual SCSI controllers

    Iometer Workload

    1 worker for a pair of virtual disks (five workers per VM)

    100% random

    100% read

    An access region of 6.4GB in each virtual disk (a total of 384GB on 60 virtual disks)

    16 or 32 OIOs depending on the test case

    Request size of 512 bytes, 1KB, 2KB, 4KB, or 8KB depending on the test case


    Test Bed

    A single physical server running vSphere 5.0 was used for the experiments. Six identical VMs were created on this

    host. Six dual-port 8Gbps FC HBAs were installed in the vSphere host. All 12 FC ports of these HBAs were

connected to an FC switch. The VMAX was connected to the same FC switch via 60 8Gbps FC links, with each link connected to a separate front-end FC port on the array. Refer to Building a Scalable Test Infrastructure in the

    appendix for more details.

    A total of 480 RAID-1 groups were created on top of 960 FC drives in the VMAX array. A single LUN was created

    in each RAID group. A 250GB metaLUN was created on a set of 8 LUNs, resulting in a total of 60 metaLUNs. All

    60 metaLUNs were exposed to the vSphere host. Separate VMFS datastores were created on each of the 60

    metaLUNs. To ensure high concurrency and effective use of all the available I/O processing capacity, each VMFS

    datastore was configured to use a fixed path via a dedicated FC port on the array.
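The arithmetic behind this layout can be checked with a short sketch; every value below is taken from the description above (960 drives mirrored into RAID-1 pairs, one LUN per RAID group, 8 LUNs per metaLUN, and 60 metaLUNs shared across the six VMs).

```python
# Sanity check of the VMAX storage layout described above.
fc_drives = 960
raid1_groups = fc_drives // 2            # each RAID-1 group is a mirrored pair -> 480
luns = raid1_groups                      # one LUN per RAID group -> 480
metaluns = luns // 8                     # 8 LUNs per 250GB metaLUN -> 60
datastores = metaluns                    # one VMFS datastore per metaLUN -> 60

vms = 6
data_disks_per_vm = datastores // vms    # -> 10 data virtual disks per VM

iometer_region_gb = 6.4
total_working_set_gb = iometer_region_gb * datastores   # -> 384GB, cacheable in 1TB of VMAX memory

print(raid1_groups, metaluns, data_disks_per_vm, total_working_set_gb)   # 480 60 10 384.0
```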

A single thick* virtual disk was created in each VMFS datastore. The virtual disks were assigned to the VMs as follows:

    Each VM was assigned a total of 10 virtual disks† for Iometer testing.

    All 10 virtual disks in a VM were distributed across three virtual SCSI controllers. Refer to Assigning Virtual Disks to a VM for more details on the distribution of virtual disks.

    The virtual SCSI controller type was varied between LSI Logic SAS and PVSCSI for the comparison tests.

    A total of 384GB (6.4GB in each virtual disk) of disk space was used by Iometer to generate I/O load in all VMs. The VMAX, because of its 1TB aggregate memory, was able to cache most of the disk blocks belonging to this 384GB disk space.

* Also known as eagerzeroed thick.
† These virtual disks were different from the virtual disk that contained the operating system and Iometer application-related files.


    Figure 1. Configuration for Multi-VM Tests


Results

For each experiment, different metrics such as I/O operations per second (IOPS), megabytes per second (MBps), and the latency of a single I/O operation measured in milliseconds (ms) were collected to analyze the I/O performance under different test scenarios.

    Scaling I/O Operations per Second with Multiple VMs

    This test focused on illustrating the scalability of I/O operations per second on a single host as the number of VMs

    generating the I/O load increased. Iometer in each VM was configured to generate a similar I/O load using 32

    OIOs of 8KB size. The number of VMs generating the I/O load was increased from one to six. In each case, the

total I/O operations per second and average latency for each I/O operation were measured in each VM. This test

    was done with only PVSCSI controllers.

    Figure 2. Scaling I/O Operations per Second with Multiple VMs

Figure 2 shows the aggregate number of I/O operations per second achieved as the number of VMs was increased. Aggregate I/O operations per second scaled from 200,000 to slightly above 1 million as the number of VMs was increased from one to six. The latency of I/O operations remained under 2 milliseconds throughout the test, increasing by only 10% as the I/O load increased on the host.

Scaling I/O Throughput as I/O Request Size Increases

    This test focused on demonstrating the scalability of I/O throughput achieved on the host as the size of the I/O

    operations increased from 512 bytes to 8KB. Iometer in each VM was configured to generate a similar I/O load

    using 16 outstanding I/Os of varying size. The number of VMs generating the I/O load was fixed at six. The total

I/O operations per second, I/O throughput (megabytes per second), and average latency for each I/O operation were measured in each VM. This test was run with LSI Logic SAS and PVSCSI virtual controllers in order to

    compare their impact on overall I/O performance.

[Chart for Figure 2: aggregate IOPS (0 to 1,200,000) and I/O latency in ms (0.0 to 2.4) plotted against the number of virtual machines (1 to 6); series: IOPs, Latency.]


    Figure 3. I/O Operations per Second with Different Request Sizes

Figure 3 shows the aggregate I/O performance of all six VMs when generating a similar I/O load, but each time with a different request size. For each request size, experiments were conducted with both LSI Logic SAS and PVSCSI controllers. As seen in Figure 3, the aggregate I/O operations achieved from the six VMs remained well over 1 million for all I/O sizes except 8KB. The average latency of a single I/O operation remained well under a millisecond with each virtual SCSI controller type, except at the 8KB request size. The increase in I/O latency as measured by Iometer was due to a corresponding increase in the I/O latency at the storage.

Please note that the slightly lower number of aggregate IOPS observed with the 8KB request size and 6 VMs in this test case, compared to that with the same request size and the same number of VMs in the test case described in the section titled Scaling I/O Operations per Second with Multiple VMs, was primarily due to the lower number of OIOs used for this test.

Note that the I/O latency in this test was lower than that in the section titled Scaling I/O Operations per Second with Multiple VMs, as the number of OIOs was half of that used for the latter test.

[Chart for Figure 3: aggregate IOPS (0 to 1,400,000) and I/O latency in ms (0.0 to 1.4) for request sizes of 512 bytes, 1KB, 2KB, 4KB, and 8KB; series: LSI - IOPs, pVSCSI - IOPs, LSI - Latency, pVSCSI - Latency.]


    Figure 4. Scaling I/O Throughput with the Request Size

The corresponding aggregate I/O throughput observed on the host is shown in Figure 4. As seen in the figure, throughput scales almost linearly as the I/O request size is doubled in each iteration. vSphere utilized the available I/O bandwidth to scale almost linearly from 592MB per second to almost 8GB per second as the I/O request size increased from 512 bytes to 8KB. The linear scaling clearly indicates that the vSphere software stack didn't present any bottlenecks to the I/O workload originating from the VMs. The slight drop observed at the 8KB request size was due to a corresponding increase in the I/O latency observed in the storage array.
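The near-linear scaling follows from the identity throughput = IOPS × request size. A quick check against the endpoints reported above, using approximate IOPS values rather than exact measurements from the tests, looks like this:

```python
# Throughput is IOPS multiplied by the request size.
# The IOPS values below are approximations picked to illustrate the
# reported endpoints (about 592MB/s at 512 bytes, close to 8GB/s at 8KB);
# they are not the exact measured numbers.
def throughput_mbps(iops, request_size_bytes):
    return iops * request_size_bytes / (1024 * 1024)

print(round(throughput_mbps(1_210_000, 512)))    # ~591 MB/s at 512-byte requests
print(round(throughput_mbps(1_030_000, 8192)))   # ~8047 MB/s (~7.9GB/s) at 8KB requests
```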

Figures 3 and 4 also compare the aggregate I/O performance of the host when using two different virtual SCSI controllers: LSI Logic SAS and PVSCSI. The PVSCSI adapter provided 7% to 10% better throughput than LSI Logic SAS in all of the cases.

    CPU Cost of an I/O Operation with LSI Logic SAS and PVSCSI Virtual Controllers

Another important metric that is useful for comparing the performance of the LSI Logic SAS and PVSCSI adapters is the CPU cost of an I/O operation. A detailed explanation of the estimation methodology is given in Estimation of CPU Cost of an I/O Operation. Figure 5 compares the CPU cost of an I/O operation with an LSI Logic SAS virtual SCSI adapter to that with a PVSCSI adapter for an I/O request size of 8KB.

[Chart for Figure 4: aggregate I/O throughput in MBps (0 to 8,000) plotted against I/O size in KB (0 to 8); series: LSI - MBps, pVSCSI - MBps.]


    Figure 5. CPU Cost of an I/O Operation with LSI Logic SAS and PVSCSI Adapters

As seen in Figure 5, the PVSCSI adapter provides 8% better throughput at a 10% lower CPU cost. These results clearly show that PVSCSI adapters are capable of providing better throughput at a lower CPU cost than LSI Logic SAS adapters under extreme load conditions.

[Chart for Figure 5: IOPS (0 to 1,200,000) and normalized CPU cycles per I/O (0.0 to 1.2) for the LSI and PVSCSI virtual SCSI controllers; series: IOPs, Cycles / IO.]


Performance of a Single VM

The next study focuses on the performance of a single VM.

    Experimental Setup

    Server

4 Intel Xeon E7-4870 processors, 2.40GHz, 10 cores each

    256GB memory

    4 dual-port Emulex LPe12002 HBAs (8Gbps)

    Storage Area Network

    VMAX with 5 engines

    4 quad-core processors and 128GB of memory per engine

    40 front-end 8Gbps FC ports

    40 back-end 4Gbps FC ports

960 × 15K RPM, 450GB FC drives

    1 FC switch

    Virtual Platform

    vSphere 5.0

    Virtual Machines

    Windows Server 2008 R2 EE x64

    16 vCPUs

    16GB memory

    1 to 4 virtual SCSI controllers

    Iometer Workload

    Two workers for every 10 virtual disks

    100% random

    100% read

    An access region of 6.4GB in each virtual disk

    16 OIOs

Request size of 8KB

    Test Bed

A single VM running on the vSphere 5.0 host was used for this test. The number of vCPUs was increased from 4 to 16. The amount of memory assigned to the VM was increased from 8GB to 16GB. The storage layout created for the multi-VM tests was reused for this study. Iometer was configured to use a total of 40 virtual disks. All the virtual disks were assigned to the single test VM in increments of 10. Each time, the set of 10 virtual disks was attached to a new virtual SCSI controller (vSphere 5.0 supports a total of four virtual SCSI controllers per VM). Iometer was configured to generate the same I/O load on every virtual


    disk. This ensured a constant load on each virtual SCSI controller.

    Each set of 10 virtual disks was accessed through a dual-port HBA in the host but separate FC ports on the array.

    A total of four dual-port HBAs in the host and 40 front-end FC ports in the array were used to access all 40

    virtual disks.

    Figure 6. Test Configuration for Single-VM test


Results

As in the multi-VM test case, metrics such as I/O operations per second and the latency of a single I/O operation measured in milliseconds were used to study the I/O performance.

    Scaling I/O Operations per Second with Virtual SCSI Controllers

    This test case focused on studying the scalability of I/O operations per second when the number of virtual SCSI

    controllers in a VM was increased.

    Figure 7. Scaling I/O Operations per Second with 8KB Request Size in a Single VM

Figure 7 shows the aggregate I/O operations per second and the average latency of an I/O operation achieved from a single VM as the number of virtual SCSI controllers was increased. As shown in the figure, aggregate I/O operations per second increased linearly, while the latency of I/O operations remained nearly flat after the initial increase from one to two virtual SCSI controllers.

[Chart for Figure 7: IOPS (0 to 420,000) and I/O latency in ms (0.0 to 2.8) plotted against the number of virtual SCSI controllers (1 to 4); series: pVSCSI - IOPs, pVSCSI - Latency.]


Conclusion

vSphere offers a virtualization layer that, in conjunction with some of the industry's biggest storage platforms, can be used to create private and public clouds capable of supporting any level of I/O demand originating from the applications running in those clouds. Results of the experiments conducted at EMC labs illustrate that:

    vSphere can easily support a million+ I/O operations per second from a single host out of the box.

    A single VM running on vSphere 5.0 is capable of supporting 300,000 I/O operations per second at an 8KB

    request size.

    vSphere can scale linearly in terms of I/O throughput (bandwidth consumption) as the request size of an I/O

    operation increases.

vSphere's Paravirtual SCSI controller offers a lower CPU cost per I/O operation compared to that of the LSI Logic SAS virtual SCSI controller.

    The results presented in this paper are by no means the upper limit for the I/O operations achievable through any

    of the components used for the tests. The intent is to show that a vSphere virtual infrastructure, such as the one

    used for this study, can easily handle even the most extreme I/O demands that exist in datacenters today.

    Customers can virtualize their most I/O-intensive applications with confidence on a platform powered by

    vSphere 5 and EMC VMAX.


Appendix A

Selecting the Right PCIe Slots for the HBAs

Achieving a million+ I/O operations per second with an 8KB request size through the least number of HBAs required the HBAs to collectively provide 8GB per second of throughput. Although five HBAs, each dual-ported and supporting 8Gbps per port, could theoretically meet the 8GB per second throughput requirement, at that throughput each HBA would be operating near its saturation regime. To ensure sufficient concurrency and still have enough capacity to push a million+ IOPS, six HBAs were used for the exercise. This required each HBA to support at least 1.3GB per second. To support that kind of throughput, the HBAs had to be placed in PCIe slots capable of supporting the required throughput from each HBA. The HBAs were placed in different slots as shown in Table 1. The maximum theoretical throughput of each slot is also shown in the table.

    Figure 8. PCIe Slot Configuration of the Server used for the Tests

SLOT NUMBER    PCIe VERSION    NUMBER OF LANES    MAXIMUM THROUGHPUT**
1              Gen 2           4                  2GBps
2, 3, 4, 6     Gen 2           8                  4GBps
7              Gen 2           16                 8GBps

Table 1. Details of the PCIe Slots

**Each PCIe Gen 2 lane provides a theoretical maximum of 500MB per second.
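The throughput column in Table 1 is simply the lane count multiplied by the per-lane limit noted above. The sketch below also checks that an x8 Gen 2 slot covers the roughly 1.3GB per second required from each of the six dual-port HBAs.

```python
# PCIe Gen 2 offers a theoretical ~500MB/s per lane (per direction).
GEN2_MBPS_PER_LANE = 500

def slot_throughput_mbps(lanes):
    return lanes * GEN2_MBPS_PER_LANE

print(slot_throughput_mbps(4))    # 2000 MB/s  -> slot 1
print(slot_throughput_mbps(8))    # 4000 MB/s  -> slots 2, 3, 4, 6
print(slot_throughput_mbps(16))   # 8000 MB/s  -> slot 7

# Six HBAs sharing an 8GB/s aggregate target need ~1.3GB/s each,
# so any x8 or x16 Gen 2 slot leaves comfortable headroom.
required_per_hba_mbps = 8 * 1024 / 6    # ~1365 MB/s
print(slot_throughput_mbps(8) >= required_per_hba_mbps)   # True
```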


Building a Scalable Test Infrastructure

To build a scalable virtual infrastructure that can support a million+ IOPS, a few tests were conducted initially. A

    single VM with 2 vCPUs and 8GB of memory was chosen as the basic building block. This VM was assigned 4

    thick virtual disks, each created on a separate VMFS datastore. Each datastore was created on a separate

    metaLUN in the VMAX array. Each datastore was accessed through a single 8Gbps FC port in the host, but with

    dedicated FC ports on the array. Iometer was configured to issue 100% random, 100% read requests to 6.4GB

    worth of storage space on each disk.

    With this Iometer profile, the VM produced 120K IOPS and 100K IOPS with 512 bytes and 8KB request sizes,

respectively. If the vCPU and storage configuration of the VM were doubled, it was theoretically possible

    to achieve 240K and 200K IOPS with 512 bytes and 8KB request sizes respectively through a dual-port 8Gbps FC

    HBA. Thus, a single VM with the following configuration was chosen as the building block for the tests: 4 vCPUs

    and 8GB of memory, issuing I/O requests to 10 virtual disks (all on separate VMFS datastores) through two ports

of a dual-port adapter. The number of VMs was increased to six, with each VM having an identical configuration. The final test setup used is shown in Figure 1.
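A back-of-the-envelope projection of how that building block scales, using only the measurements quoted above, is sketched below; the six-VM figures are theoretical ceilings, not results.

```python
# Projection from the building-block measurements quoted above.
# Measured: a 2-vCPU, 4-disk VM delivered 120K IOPS (512B) and 100K IOPS (8KB).
building_block_iops = {"512B": 120_000, "8KB": 100_000}

# Doubling the vCPU and storage configuration of the block:
per_vm_iops = {size: 2 * iops for size, iops in building_block_iops.items()}   # 240K / 200K

# Six identical VMs (theoretical ceiling, ignoring any host or array limits):
projected_host_iops = {size: 6 * iops for size, iops in per_vm_iops.items()}
print(projected_host_iops)   # {'512B': 1440000, '8KB': 1200000} -> above the 1M target
```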

Assigning Virtual Disks to a VM

To achieve enough parallelism while pushing a high number of I/O operations per second, virtual disks in each VM

    were spread across multiple virtual SCSI controllers as follows:

NUMBER OF VDISKS    VIRTUAL SCSI CONTROLLER ID
4                   0
3                   1
3                   2

Table 2. Virtual Disk Assignment for Multi-VM Tests

NUMBER OF VDISKS    VIRTUAL SCSI CONTROLLER ID
10                  0
10                  1
10                  2
10                  3

Table 3. Virtual Disk Assignment for Single-VM Test
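A minimal sketch of the distribution policy implied by Tables 2 and 3 follows: virtual disks are spread round-robin across the available virtual SCSI controllers. The helper name is hypothetical and not part of any VMware tooling.

```python
# Hypothetical helper that spreads virtual disks round-robin across virtual
# SCSI controllers, reproducing the splits in Tables 2 and 3.
from collections import Counter

def assign_disks(num_disks: int, num_controllers: int) -> Counter:
    """Map each disk index to a controller ID and count disks per controller."""
    return Counter(disk % num_controllers for disk in range(num_disks))

print(assign_disks(10, 3))   # Counter({0: 4, 1: 3, 2: 3})            -> Table 2 (multi-VM tests)
print(assign_disks(40, 4))   # Counter({0: 10, 1: 10, 2: 10, 3: 10})  -> Table 3 (single-VM test)
```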


Estimation of CPU Cost of an I/O Operation

The %Processor time of the vSphere host was used to estimate the CPU cost of an I/O operation. When all six VMs were actively issuing 8KB I/O requests to their storage, the total %Processor time of the vSphere host was recorded using esxtop [5]. The CPU cost of an I/O operation was estimated using the following equation:

CPU cycles per I/O = (Average %Processor time of the host × Rated CPU clock frequency × Number of logical threads) / (100 × Total I/O operations per second)

    The rated clock frequency of the processors in the server was 2.4GHz. By default, hyperthreading was enabled on

    the vSphere host. Hence the number of logical threads was 80.
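As a worked example of the equation above: with the rated 2.4GHz clock, 80 logical threads, and an aggregate of roughly one million 8KB I/O operations per second, the calculation proceeds as follows. The average %Processor time used here is an assumed placeholder, since the paper does not report the measured value.

```python
# Worked example of the CPU-cost equation above.
rated_clock_hz = 2.4e9          # rated CPU clock frequency (2.4GHz)
logical_threads = 80            # 40 cores with hyperthreading enabled
avg_processor_pct = 50.0        # ASSUMED %Processor time, for illustration only
total_iops = 1_000_000          # roughly the aggregate IOPS at 8KB

cycles_per_io = (avg_processor_pct * rated_clock_hz * logical_threads) / (100 * total_iops)
print(f"{cycles_per_io:,.0f} CPU cycles per I/O")   # 96,000 with these assumed inputs
```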

References

1. VMware vSphere 4 Performance with Extreme I/O Workloads. VMware, Inc., 2009. http://www.vmware.com/pdf/vsp_4_extreme_io.pdf

2. EMC Symmetrix VMAX Storage System. EMC Corporation, 2011. http://www.emc.com/collateral/hardware/specification-sheet/h6176-symmetrix-vmax-storage-system.pdf

3. LightPulse LPe12002. Emulex Corporation, 2008. http://www.emulex.com/products/host-bus-adapters/emulex-branded/lightpulse-lpe12002/overview.html

4. Iometer. http://www.iometer.org/

5. Interpreting esxtop Statistics. VMware, Inc., 2010. http://communities.vmware.com/docs/DOC-9279

About the Author

Chethan Kumar is a senior member of Performance Engineering at VMware, where his work focuses on performance-related topics concerning databases and storage. He has presented his findings in white papers, blog articles, and technical papers at academic conferences and VMworld.

    Acknowledgements

The author would like to thank VMware's partners Intel, Emulex, and EMC for providing the necessary gear to build the infrastructure that was used for the experiments discussed in this paper. He would also like to thank the Symmetrix Performance Engineering team of Thomas Rogers, John Aurin, John Adams, and Dan Aharoni for installing and configuring the hardware setup and for running all the experiments. Finally, the author would like to thank Chad Sakac, VP of VMware Alliance, EMC, for supporting this effort and ensuring the availability of the hardware components to complete the exercise in time.


VMware, Inc. 3401 Hillview Avenue Palo Alto CA 94304 USA Tel 877-486-9273 Fax 650-427-5001 www.vmware.com

Copyright 2011 VMware, Inc. All rights reserved. This product is protected by U.S. and international copyright and intellectual property laws. VMware products are covered by one or more patents listed at
