Thorbjoern Donbaek, Rohan Pasalkar
STO2115BE
#VMWorld #STO2115BE
vSphere Storage Best Practices
Disclaimer
• This presentation may contain product features that are currently under development.
• This overview of new technology represents no commitment from VMware to deliver these features in any generally available product.
• Features are subject to change, and must not be included in contracts, purchase orders, or sales agreements of any kind.
• Technical feasibility and market demand will affect final delivery.
• Pricing and packaging for any new technologies or features discussed or presented have not been determined.
Agenda
• ESXi Storage Architecture Overview
• vSphere Storage Configuration and Best Practices
– PSA Best Practices
– iSCSI Configuration Best Practices
– VMFS Best Practices
– NFS Best Practices
• Questions
– Please hold your questions until the end
ESXi Storage Architecture Overview
ESXi Storage Architecture Overview
• The flow of an I/O
– The VM initiates the I/O through the VMM layer
– The FSS layer can fork into VMFS, NFS, etc.
• RDM goes directly to the PSA device layer
• VMFS pushes the command down to PSA
– Queuing and I/O scheduling happen in the PSA device layer
– NMP and third-party multipathing plugins sit in the PSA layer
• Sub-plugins of NMP are SATP and PSP (see the example below)
– This talk focuses on VMFS, NFS, PSA, and iSCSI
[Diagram: ESXi I/O stack – Application → Guest OS → VMM (vSCSI / vNVMe) → FSS + FDS → VMFS, … (RDM bypasses the file system) → PSA device layer → PSA plugins (NMP with SATP and PSP sub-plugins) → Path + Adapter → Native Driver. The VMFS, PSA, and plugin layers are what we focus on here.]
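A minimal sketch of how to see which SATP and PSP the NMP has claimed for each device from the ESXi shell (output is abbreviated and the device name is illustrative):
To list devices claimed by NMP along with their SATP and PSP:
# esxcli storage nmp device list
Each device entry reports fields such as "Storage Array Type: VMW_SATP_ALUA" and "Path Selection Policy: VMW_PSP_RR".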
vSphere Storage Configuration and Best Practices
PSA Best Practices
Planned LUN/Disk Removal Best Practice
• APD (All Paths Down) is the loss of all paths to a target
– Best practice: avoid single points of failure in the topology to avoid APD
• PDL (Permanent Device Loss) is the removal of a LUN at a target
– In vSphere 5.5 and onwards, a device in PDL is automatically removed when not in use, but please use planned removal
– In vSphere 5.5 and onwards, you can terminate a VM on PDL
• https://docs.vmware.com/en/VMware-vSphere/5.5/com.vmware.vsphere.storage.doc/GUID-AA39FBEF-AF50-4D67-9362-32FA4A6C9105.html
• Best practice: whether device removal would report APD or PDL, the same planned removal process should be used
– First unmount all VMFS volumes on the LUN(s), then detach the LUN(s) before finally removing the LUN(s)/target – see KB 2004605 for the exact process and the sketch below
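A hedged sketch of the planned-removal flow from the ESXi shell, assuming an illustrative datastore label and NAA device ID (the authoritative procedure is in KB 2004605):
First unmount the VMFS datastore on the device:
# esxcli storage filesystem unmount -l <datastore_label>
Then detach the device so the host stops probing its paths:
# esxcli storage core device set --state=off -d <naa.xxxxxxxx>
After the LUN has been unpresented at the array, the detached entry can be cleaned up:
# esxcli storage core device detached remove -d <naa.xxxxxxxx>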
SCSI Scheduler: Pre vSphere 6.0 vs vSphere 6.0 Onwards
• SCSI scheduler before vSphere 6.0
– Single queue per <VM, SCSI-LUN>
– Shares, IOPS limit, and reservation settings are cumulatively set for the VM
– Fairness across VMs but not across VMDKs
• SCSI scheduler from vSphere 6.0 onwards
– A queue for every VMDK
– Shares, IOPS limit, and reservation are applied per VMDK
– Fairness across VMs and across VMDKs
– Settings must be made per virtual disk/VMDK
– This mechanism is enabled by default, but can be disabled using the following documentation:
https://docs.vmware.com/en/VMware-vSphere/6.0/com.vmware.vsphere.storage.doc/GUID-99C31BE3-D39D-4DA1-89A3-EC9DEA5CC24E.html
SATP and PSP Best Practices
• The SATP should only be changed per the storage vendor's recommendations/guides
– Reason: this is what has been qualified by the storage vendor
• Always use a PSP that has been qualified by the storage vendor
– If PSP_RR is one of the qualified options, use PSP_RR
• Rotates through all active paths
• Provides the best performance with a good load spread
• Now also works with WSFC
– PSP_RR best practice
• Only change settings like iops, bytes, and useANO per array vendor or VMware recommendations (see the sketch below)
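A hedged example of setting PSP_RR and its IOPS-based path switching for one device from the ESXi shell; the device ID and the iops value of 1 are illustrative placeholders, not a blanket recommendation:
Set the round-robin PSP for a specific device:
# esxcli storage nmp device set -d <naa.xxxxxxxx> -P VMW_PSP_RR
Optionally switch paths every I/O instead of every 1000 I/Os (only if your array vendor recommends it):
# esxcli storage nmp psp roundrobin deviceconfig set -d <naa.xxxxxxxx> -t iops -I 1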
LUN Count Best Practice
• Spread your load across many LUNs
– This helps spread the load across paths too
– Use at least 2x–3x as many LUNs as paths
– Using more LUNs increases the number of commands outstanding at the array (up to the adapter limit)
• This increases throughput as long as the array can benefit from more parallelism and you are not bottlenecked elsewhere (path/adapter bandwidth, CPU, …)
– For heavy-I/O VMs, consider having only one or a few VMs per LUN
• Also consider more LUNs per VM and, in extreme cases, more virtual HBAs per VM
Adaptive Queuing by Throttling LUN Queue Depth
• The VMkernel can throttle the LUN queue depth (QD) when congestion is detected
– Enabled by the QFullSampleSize and QFullThreshold values
• QD is reduced when BUSY or QUEUE FULL status is seen for more than QFullThreshold out of QFullSampleSize I/Os
• The VMkernel gradually restores the QD as congestion subsides
– In vSphere 5.1 patch 1 and onwards, QFullSampleSize and QFullThreshold can be set both globally and per device (see the sketch below)
• The per-device setting takes precedence if both are set
– This is better, as different arrays can benefit from different thresholds
• Further details are documented in KB 1008113
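A minimal sketch of both ways to set these values from the ESXi shell, following KB 1008113; the numbers shown are only placeholders, not tuning advice:
Globally, via advanced settings:
# esxcli system settings advanced set -o /Disk/QFullSampleSize -i 32
# esxcli system settings advanced set -o /Disk/QFullThreshold -i 4
Per device (takes precedence over the global values):
# esxcli storage core device set -d <naa.xxxxxxxx> --queue-full-sample-size 32 --queue-full-threshold 4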
WSFC (Clustering) Best Practices
• WSFC best practices are generally documented in the vSphere guide "Setup for Failover Clustering and Microsoft Cluster Service":
https://docs.vmware.com/en/VMware-vSphere/6.5/com.vmware.vsphere.mscs.doc/GUID-1A2476C0-CA66-4B80-B6F9-8421B6983808.html
• New: ensure that the LUN number seen by both hosts is the same
– If the LUN number is the same, then after adding the RDM disk on one host, the user can add the same RDM disk on the other host by selecting "Existing Hard Disk"
– If the LUN numbers are different, the "Existing Hard Disk" option can't be used; instead the "Add RDM Disk" option must be used. This is being documented in a KB article.
[Diagram: a shared SCSI disk added as an RDM on two vSphere hosts – presented as L10 to both hosts in one case, and as L10 to one host but L11 to the other in the second case.]
iSCSI Configuration Best Practices
iSCSI Configuration Best Practices
• ESXi supports 3 types of iSCSI adapters: software, dependent hardware, and independent hardware
• 1Gb NIC minimum; prefer 10Gb NICs for multiple iSCSI LUNs, for better throughput
• A dedicated LAN for iSCSI traffic is recommended for better security and performance
• Jumbo frames (MTU 9000) enabled end-to-end are recommended for better throughput (see the sketch below)
• Enable digests as a robust mechanism to detect errors, and IPsec/IPv6/CHAP for enhanced security
• iBFT iSCSI boot is supported for both IPv4 and IPv6 under BIOS and UEFI boot modes
• ESXi allows simultaneous use of all 3 types of adapters
• Mixing software and hardware adapters to access the same target is not allowed
• IPv4 and IPv6 are supported on all 3 adapter types
• Mixing IPv4 and IPv6 is not recommended
• Software iSCSI provides near line-rate throughput (10Gb)
• Hardware iSCSI provides low ESXi CPU utilization
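A hedged sketch of enabling the software iSCSI adapter and end-to-end jumbo frames from the ESXi shell; the vSwitch and vmkernel port names are illustrative, and MTU 9000 must also be configured on the physical switches and the array ports:
Enable the software iSCSI adapter:
# esxcli iscsi software set --enabled=true
Set MTU 9000 on the vSwitch and on the iSCSI vmkernel port:
# esxcli network vswitch standard set -v vSwitch1 -m 9000
# esxcli network ip interface set -i vmk1 -m 9000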
NIC Teaming vs Port Binding
• Avoid NIC teaming for iSCSI
– To utilize NIC teaming, two or more network adapters must be uplinked to a virtual switch. The main advantages of NIC teaming are increased network capacity and passive failover in the event one of the adapters in the team goes down
– Best practice for iSCSI is to avoid NIC teaming for iSCSI traffic and instead use port binding
– Port binding fails over I/O to alternate paths based on SCSI sense codes, not just network failures
– Port binding also gives administrators the opportunity to load-balance I/O over multiple paths to the storage device
– NIC teaming provides fault tolerance only at the NIC/port level
– Please note that port binding is not supported together with NIC teaming
iSCSI Port Binding
• Port binding is used in iSCSI when multiple VMkernel ports for iSCSI reside in the same broadcast domain and IP subnet, allowing multiple paths to the target to be created (see the sketch below)
• Routing is also supported with port binding from vSphere 6.5 onwards
• With port binding, ESXi tries to connect to each target portal from all bound ports, so make sure that all portals are reachable from all bound ports; otherwise you may experience discovery/login/rescan delays
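A minimal sketch of binding two vmkernel ports to the software iSCSI adapter; the adapter and vmk names are illustrative:
Bind the vmkernel ports to the iSCSI adapter:
# esxcli iscsi networkportal add -A vmhba65 -n vmk1
# esxcli iscsi networkportal add -A vmhba65 -n vmk2
Verify the bindings:
# esxcli iscsi networkportal list -A vmhba65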
When Not to Use Port Binding
• Unless you use routing, port binding should not be used when
– The array's target iSCSI ports are in a different broadcast domain and IP subnet
– The VMkernel ports used for iSCSI connectivity exist in a different broadcast domain, IP subnet and/or vSwitch
– A per-vmk gateway (routing) is required for this to work (available in vSphere 6.5) – see the next slide
iSCSI Routing Using Separate Gateway per vmkernel Port and Static Routes
• From vSphere 6.5 onwards, a separate gateway can be configured per vmkernel port
To see gateway information per vmkernel port:
# esxcli network ip interface ipv4 address list
Name  IPv4 Address    IPv4 Netmask   IPv4 Broadcast  Address Type  Gateway         DHCP DNS
----  --------------  -------------  --------------  ------------  --------------  --------
vmk0  10.115.155.122  255.255.252.0  10.115.155.255  DHCP          10.115.155.253  true
vmk1  10.115.179.209  255.255.252.0  10.115.179.255  DHCP          10.115.179.253  true
vmk2  10.115.179.146  255.255.252.0  10.115.179.255  DHCP          10.115.179.253  true
With the above, you can use port binding to reach targets in different subnets.
• You can also configure static routes when initiators and targets are in different subnets
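Hedged examples of configuring a per-vmkernel-port gateway (vSphere 6.5 onwards) and a static route; the addresses are illustrative:
Set a dedicated gateway on an iSCSI vmkernel port:
# esxcli network ip interface ipv4 set -i vmk1 -t static -I 10.115.179.209 -N 255.255.252.0 -g 10.115.179.253
Or add a static route for a remote target subnet:
# esxcli network ip route ipv4 add -n 192.168.2.0/24 -g 10.115.179.253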
VMFS Best Practices
VMFS Best Practices and Considerations
• VMFS Datastore Management
• Storage Space Reclamation (UNMAP) in VMFS-6
• Thin vs Thick Storage Provisioning
• Storage Hardware Acceleration
• VMFS Snapshot Best Practices and Considerations
• VMFS Heartbeat and ATS
• VMFS Troubleshooting
VMFS Datastore Management (VMFS5 vs VMFS6)
Features and functionality – VMFS3/5 vs VMFS6:
• 4KB sector readiness: No → Yes
• Automatic space reclamation: No → Yes
• Space reclamation from guest OS: Limited → Yes
• Manual space reclamation through esxcli: Yes → Yes
• 512e storage devices: Yes (not supported on local 512e devices) → Yes
• Default snapshots: VMFSsparse for virtual disks < 2TB, SEsparse for virtual disks >= 2TB → SEsparse
• Access for ESXi host version 6.0 and earlier: Yes → No
Note: VMFS3 is deprecated starting with ESXi 6.0; no new VMFS3 datastore creation is allowed.
We strongly recommend upgrading your datastores to VMFS5/6.
VMFS Datastore Management (Considerations)
• Migration to VMFS6
– There is no in-place upgrade from VMFS3/VMFS5 to VMFS6.
– Use Storage vMotion to migrate VMs from a VMFS3/VMFS5 datastore to a VMFS6 datastore.
– The PowerCLI cmdlet Update-VmfsDatastore makes it easier to move setups with large VMs to VMFS6.
Recommendation: use VMFS6 if compatibility with previous vSphere versions is not a requirement.
• Datastore extents
– Best practice: use homogeneous storage devices (512n or 512e) as extents for a spanned VMFS datastore.
• Storage DRS
– VMFS5 and VMFS6 datastores can coexist in the same datastore cluster.
– Best practice: use homogeneous storage devices (512n or 512e) for all datastores in the cluster.
Automatic Space Reclamation for VMFS6
• Automatic unmap features in VMFS6
– Support added for SCSI version 6 and SPC-4, extending UNMAP support to Linux guest OSes as well.
– Space reclamation priority defaults to Low (25MB per second).
– Space reclamation is automated.
• Prerequisites for Linux/Windows guest OS unmap support
– The virtual disk must be thin-provisioned.
– Virtual machine hardware must be version 11 or later (version 13 for Linux guest OSes).
• Recommendations (see the sketch below)
– Automatic unmap is not supported on storage arrays with an UNMAP granularity greater than 1MB. Use the CLI on these storage arrays to reclaim the free space.
– Not all guest-OS-generated unmaps translate to exact space reclamation (e.g. non-aligned unmaps). We recommend running Optimize Drives (Windows) or fstrim (Linux).
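A hedged sketch of inspecting the automatic reclamation settings on a VMFS6 datastore and of reclaiming space manually where automatic unmap is not supported; the datastore label is illustrative:
Check the automatic reclamation configuration on a VMFS6 datastore:
# esxcli storage vmfs reclaim config get -l <datastore_label>
Manually reclaim free space (e.g. on arrays with UNMAP granularity > 1MB):
# esxcli storage vmfs unmap -l <datastore_label>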
UNMAP & Thin and Thick Provisioning (VMFS6)
• Thin LUN, thin VMDK: guest OS UNMAP supported; VMFS performs automatic UNMAP; asynchronous UNMAP processing with minimal performance impact.
• Thin LUN, thick VMDK: no guest OS UNMAP; VMFS performs automatic UNMAP; asynchronous UNMAP processing with minimal performance impact.
• Thick LUN, thin VMDK: guest OS UNMAP supported; no UNMAP from VMFS; just metadata updates, no performance impact.
• Thick LUN, thick VMDK: no guest OS UNMAP; no UNMAP from VMFS; no impact.
Thin vs Thick Storage Provisioning
• Eager zeroed thick virtual disk: storage space allocated upfront; no in-guest UNMAP support; highest first-write I/O performance; higher virtual disk creation time.
• Lazy zeroed thick virtual disk: storage space allocated upfront; no in-guest UNMAP support; first-write I/O performance less than EZT; medium creation time.
• Thin virtual disk: storage space allocated on demand as the virtual disk is written; in-guest UNMAP supported; first-write I/O performance less than LZT; instant creation.
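A hedged example of creating each disk type with vmkfstools from the ESXi shell; the size and paths are illustrative:
Create an eager zeroed thick disk (space allocated and zeroed upfront):
# vmkfstools -c 10G -d eagerzeroedthick /vmfs/volumes/<datastore>/<vm>/disk_ezt.vmdk
Create a lazy zeroed thick disk (the "zeroedthick" format):
# vmkfstools -c 10G -d zeroedthick /vmfs/volumes/<datastore>/<vm>/disk_lzt.vmdk
Create a thin disk (space allocated on demand):
# vmkfstools -c 10G -d thin /vmfs/volumes/<datastore>/<vm>/disk_thin.vmdk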
Storage Hardware Acceleration
• Best practice: use hardware acceleration if it is supported by the backend storage device.
• For XCOPY, the default transfer size is 4MB; different sizes are supported for different vendors.
– Recommendation: consult your storage vendor for the recommended setting for your usage (see the sketch below).
• In a storage DR configuration, ensure the remote storage array also has ATS enabled.
• Reference: "Frequently Asked Questions for vStorage APIs for Array Integration", KB 1021976.
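A hedged sketch of checking VAAI primitive support per device and adjusting the XCOPY transfer size; the advanced option value is in KB, and the device ID and value shown are illustrative rather than a recommendation:
Check which VAAI primitives (ATS, Clone/XCOPY, Zero, Delete) a device supports:
# esxcli storage core device vaai status get -d <naa.xxxxxxxx>
Adjust the XCOPY transfer size (default 4096 KB = 4MB) only per vendor guidance:
# esxcli system settings advanced set -o /DataMover/MaxHWTransferSize -i 4096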
VMFS Snapshot Best Practices and Considerations
• Best practices for using snapshots
– VMs running on snapshots incur a performance penalty compared with VMs without snapshots; reduce the amount of time you run on a snapshot.
– Limit the number of snapshot levels active at a given point in time for optimal performance.
– To help reduce consolidation time, run VMs on snapshots for shorter durations.
– When consolidation fails, some delta disks may be left behind; periodically check for and remove them.
• Considerations
– SEsparse is the default snapshot format for VMFS6.
– If you migrate a VM with a VMFSsparse snapshot to VMFS6, the snapshot format changes to SEsparse.
VMFS Heartbeat and ATS
• VMFS heartbeat
– VMFS relies on on-disk locks to arbitrate access to shared storage. It uses an on-disk heartbeat to indicate liveness of the hosts using the filesystem.
– Starting with ESXi 5.5.x, the VMFS heartbeat uses ATS.
• Not using ATS exposes the delayed-write issue.
• ESXi 6.5 (and ESXi 6.0 U3) hardens the ATS heartbeat algorithm to handle erroneous miscompares from the underlying storage.
• Best practice: use ATS for the heartbeat (see the sketch below).
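A hedged example of checking and setting the datastore locking mode from the ESXi shell; switching to ATS-only assumes all connected hosts and the array support ATS, and the label is illustrative:
Show the current locking mode (ATS-only vs ATS+SCSI) for VMFS datastores:
# esxcli storage vmfs lockmode list
Switch a datastore to ATS-only locking:
# esxcli storage vmfs lockmode set --ats -l <datastore_label>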
VMFS Troubleshooting
• VOMA – vSphere On-Disk Metadata Analyzer
• Check VMFS and LVM metadata errors:
– voma -m vmfs -f check -d /vmfs/devices/disks/<device>
• Fix VMFS and LVM metadata errors:
– voma -m vmfs -f fix -d /vmfs/devices/disks/<device>
• Collect a metadata dump:
– voma -f dump -d /vmfs/devices/disks/<device> -D <path for dump file>
• Recommendation: power off VMs or migrate them to a different datastore before using VOMA.
• Refer to KB 2036767 for more information.
NFS Best Practices
NFS Configuration Best Practices (NFSv3 & NFSv4.1)
• Network considerations
– Minimum NIC speed 1Gb (10Gb recommended)
– Use a separate vmkernel port group for NFS traffic
– Avoid multiple hops between the ESXi host and the NFS server
– Configure the same MTU on all devices in the path (MTU 9000 recommended)
• Recommendation: configure with no single point of failure (see the sketch below)
[Diagram: VMware ESXi host connected through a switch to NFS storage]
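A minimal sketch of mounting an NFSv3 export as a datastore from the ESXi shell; the server address, export path, and volume name are illustrative:
Mount an NFSv3 export:
# esxcli storage nfs add -H 192.168.1.20 -s /vol/datastore1 -v nfs_ds1
List mounted NFS datastores:
# esxcli storage nfs list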
NFS Multi-pathing Best Practices (NFSv4.1 Only)
• Configure shares for multipathing (session trunking) – see the sketch below
[Diagram: two configurations – an ESXi host with one vmkernel port (192.168.1.x) reaching an NFSv4.1 array on two interfaces in the same subnet (192.168.1.y, 192.168.1.z), and an ESXi host with two vmkernel ports on separate vSwitches (192.168.1.x, 192.168.2.x) reaching array interfaces in different subnets (192.168.1.y, 192.168.2.y)]
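A hedged example of mounting an NFSv4.1 datastore with session trunking over two server addresses; the addresses, export path, and volume name are illustrative, and the Kerberos flag shown is an assumption to verify against your build's esxcli help:
Mount an NFSv4.1 datastore with two server IPs for session trunking:
# esxcli storage nfs41 add -H 192.168.1.20,192.168.2.20 -s /export/ds1 -v nfs41_ds1
Optionally request Kerberos security (flag value assumed; check esxcli storage nfs41 add --help):
# esxcli storage nfs41 add -H 192.168.1.20,192.168.2.20 -s /export/ds1 -v nfs41_ds1 -a SEC_KRB5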
NFS Security Best Practices (NFSv3 & NFSv4.1)
NFSv3:
– The ESXi host mounts NFS shares with root access
– To address this concern, use a dedicated LAN or VLAN
NFSv4.1:
– NFSv4.1 supports Kerberos, for
• User authentication (krb5)
• Authentication plus integrity of RPC headers (krb5i)
Questions?