Leveraging File-Based Shared Storage for Enterprise Applications
Mathew George, Principal SDE
Session 3-053
Agenda
- Why use networked file storage?
- What's new in Windows Server 2012? SMB 3.0 highlights
- How can apps make use of this?
- Programming considerations
Why use networked file storage? Why not?
Historical reasons for not using file shares for I/O-intensive server applications:
- Poor performance (due to network fabric and protocol limitations)
- Unreliable connections
- Unreliable storage on the file server
- Lack of integrity guarantees
- Limited tools for diagnostics and management
Why use networked file storage? The world has changed
- More reliable network fabrics, multiple integrated NICs
- Ethernet speed is competitive with Fibre Channel
- Robust storage behind the file server
- Low cost: no need for specialized storage networking infrastructure or expertise
- Easier management and troubleshooting: file shares instead of LUNs and SAN zoning; familiar access control and authentication mechanisms
- Dynamic server/service relocation
What is new in Windows Server 2012? SMB 3.0
- Continuously available file server
- Scale-out file server
- Bandwidth aggregation (SMB multichannel)
- Support for new network fabrics (SMB Direct): RDMA on iWARP, RoCE, and InfiniBand
- SMB encryption
- Storage Spaces and ReFS
- Easier manageability and diagnosability: application-consistent backups, PowerShell-based configuration, ETW events, performance counters
What is a continuously available file server?
Insulates applications from network and server failures. I/O issued by an application is resilient to:
- Transient network failures
- Failure of one or more network paths
- Planned or unplanned server/storage failures
Most existing Windows applications that work on local storage will work without modification against a continuously available file share.
Continuously available SMB file server
- Failover transparent to the application: the SMB client and server handle failover gracefully; zero downtime, with only a small I/O delay during failover; most* file and directory operations continue to work
- Supports planned and unplanned failovers: hardware or software maintenance, hardware or software failures, load rebalancing
- Requires: an SMB server in a failover cluster; both SMB server and client must implement SMB 3.0; shares enabled for Continuous Availability (CA)
[Diagram: a server application against a two-node file server cluster exposing \\fs\share. (1) Normal operation; (2) failover of the share — connections and handles lost, temporary stall of I/O; (3) connections and handles auto-recovered, and the application continues with no errors.]
Scale-out SMB file server
- Active-active configuration: unified view of the shares and the file system across all nodes; clients can connect to any node and can be moved from one node to another transparently
- Targeted at server application storage (virtualization and databases): increase available bandwidth by adding cluster nodes; SMB 3.0 clients get scale-out and transparent failover
- Limitations: not suitable for a general-purpose file server; SMB 2.x clients can connect but get no transparent failover; SMB 1.x clients cannot connect
[Diagram: an application cluster on the datacenter network accessing a single logical file server (\\FS\Share) with a single file system namespace, backed by a cluster file system spanning the file server cluster.]
Win32 feature compatibility

Feature/API set              | Standalone SMB  | Continuously available SMB | Scale-out SMB
Basic Win32 I/O APIs         | Yes             | Yes                        | Yes
Hard links                   | Yes (NTFS only) | No                         | No
Alternate data streams       | Yes (NTFS only) | No                         | No
Symbolic links               | Yes             | Yes                        | Yes
Write-through I/O            | Yes             | Yes                        | Yes
Unbuffered I/O               | Yes             | Yes                        | Yes
Transactional I/O            | No              | No                         | No
Encryption (EFS)/compression | Yes (NTFS only) | Yes (NTFS only)            | No
Volume-level (DASD) APIs     | No              | No                         | No
Opportunistic locks          | Yes             | Yes                        | Limited
SMB multichannel
- Uses multiple connections to move data between SMB client and server
- Tolerates failure of one or more NICs
- Enhanced throughput: more bandwidth with multiple NICs; interrupt load spread across CPU cores
- Complements NIC teaming
- Zero configuration
- Easy monitoring and troubleshooting
Sample configurations
[Diagram 1: multiple 10GbE/InfiniBand RSS-capable NICs — SMB client and server each with two NICs, connected through two switches. Vertical blue lines are logical channels, not cables.]
[Diagram 2: multiple 1GbE NICs in an LBFO team — SMB client and server each with two teamed 1GbE NICs behind a 1GbE switch.]
SMB multichannel: a CPU comparison
- SMB session without multichannel: only one TCP/IP connection, only one NIC used, only one CPU core engaged; can't use the full 20 Gbps
- SMB session with multichannel: multiple TCP/IP connections; Receive Side Scaling (RSS) helps distribute load across CPU cores; the full 20 Gbps is available
[Diagrams: per-core CPU utilization (cores 1-4) for both cases, each showing an SMB client and server with two 10GbE/InfiniBand NICs connected through two switches.]
SMB Direct (SMB over RDMA)
- New class of network hardware: speeds up to 56 Gbps; full hardware offload, low CPU overhead, low latency
- Remote DMA moves large chunks of memory between servers without any CPU intervention
- Requires an RDMA-capable interface; supports iWARP, RoCE, and InfiniBand
[Diagram: file client and file server. On the client, the application sits above the SMB client across the user/kernel boundary; on the server, the SMB server sits above NTFS, SCSI, and the disk. The two sides are joined by R-NICs over a network with RDMA support.]
How can I use SMB Direct?
- Applications continue to use existing Win32/.NET file I/O APIs
- The SMB client decides at run time whether to use SMB Direct
- NDKPI provides a much thinner layer than TCP/IP
- Remote direct memory access is performed by the network interfaces
[Diagram: SMB client and file server with an unchanged API surface above the client application. Data flows either through TCP/IP and a conventional NIC, or through SMB Direct, NDKPI, and an RDMA NIC over Ethernet and/or InfiniBand, with the RDMA path moving memory directly between client and server.]
How do apps access shared file storage?
- A virtualized application instance running in a VM whose storage is on a remote file share
- An application running directly against a remotely mounted VHD
- Building apps on SQL Server 2012/SharePoint and hosting the databases on SMB 3.0 file shares
- Directly accessing remote storage (via UNC paths or mapped drives) instead of a directly attached volume
Virtualized applications: Hyper-V over SMB
- Hyper-V in Windows Server 2012 fully supports hosting virtual disks on an SMB 3.0 file share
- Continuously available shares for fault tolerance
- Bandwidth aggregation using SMB multichannel
- Live storage migration over TCP/IP networks
- Live migration of running VMs in a Hyper-V cluster
- No changes are required in the application; it continues to run against a local drive within the VM
[Diagram: Hyper-V server and file server. In the child partition, the application runs over NTFS, SCSI/IDE, and the storage VSC; the parent partition runs the storage VSP, VHD stack, and SMB client, joined over the VM bus. The file server runs the SMB server over NTFS and SCSI. The machines are connected by NICs over a network, optionally with RDMA.]
Configuring Hyper-V over SMB
Grant full permissions on the folder and the share to the administrator and to the computer accounts of the Hyper-V hosts:

REM Folder permissions.
MD F:\VMS
ICACLS F:\VMS /Inheritance:R
ICACLS F:\VMS /Grant Dom\HAdmin:(CI)(OI)F
ICACLS F:\VMS /Grant Dom\HV1$:(CI)(OI)F
ICACLS F:\VMS /Grant Dom\HV2$:(CI)(OI)F

REM Share permissions.
New-SmbShare -Name VMS -Path F:\VMS -FullAccess Dom\HAdmin, Dom\HV1$, Dom\HV2$
Then simply point the VHD at the UNC path:

New-VHD -VHDFormat VHDX -Path \\FS\VMS\VM1.VHDX -VHDType Dynamic -SizeBytes 127GB
New-VM -Name VM1 -Path \\FS\VMS -VHDPath \\FS\VMS\VM1.VHDX -Memory 1GB
Hyper-V over SMB: standalone setup
Highlights:
- Simplicity (file shares, permissions)
- Flexibility (migration, shared storage)
- Low cost
- Storage is fault tolerant (mirroring, parity)
Limitations:
- The file server is not continuously available
- VMs are not continuously available
[Diagram: two Hyper-V hosts running VMs whose VHD/VHDX files live on shares exported by a single file server backed by Storage Spaces.]
Hyper-V over SMB: clustered setup
Both the file server and the Hyper-V hosts can be clustered.
Highlights:
- Hyper-V VMs are highly available
- The file server is continuously available
- Storage is fault tolerant
[Diagram: a failover cluster of Hyper-V hosts running VMs whose VHD/VHDX files live on shares exported by a failover cluster of file servers, backed by clustered Storage Spaces on shared SAS JBODs.]
SQL Server support for SMB file shares
SQL Server 2008 R2:
- Formalized support for storing user databases on SMB file shares
- Removed the trace-flag requirement
- SMB 2.1 and 3.0 officially supported
- Integrated SMB scenarios into the automated test infrastructure and labs
SQL Server 2012:
- Added support for SQL Server clusters using SMB file shares: adds flexibility to cluster configurations and removes the drive-letter restriction for cluster groups
- Added support for system databases on SMB file shares: the root of the installation can now be on the share
SQL Server 2012 and Windows Server 2012: SMB transparent failover
- Failover is transparent to the server application: zero downtime, with only a small I/O delay
- The OS guarantees timeliness and consistency of data
- Covers planned maintenance and unplanned failures: hardware/software maintenance, hardware/software failures, load rebalancing
- Requires Windows failover clusters; both the server running the application and the file server must be Windows Server 2012
[Diagram: SQL Server against a two-node file server cluster exposing \\fs\salesdb and \\fs\saleslog. (1) Normal operation; (2) failover of the shares — connections and handles lost, temporary stall of I/O; (3) connections and handles auto-recovered, and the application continues with no errors.]
SQL Server 2012 over SMB configuration
Grant full permissions on the folder and the share to the SQL DBA and the SQL Server service account:

REM Set up folder permissions.
MD F:\Data
ICACLS F:\Data /Inheritance:R
ICACLS F:\Data /Grant Dom\SQLDBA:(CI)(OI)F
ICACLS F:\Data /Grant Dom\SQLService:(CI)(OI)F

REM Create the share.
New-SmbShare -Name SQLData -Path F:\Data -FullAccess Dom\SQLDBA, Dom\SQLService

Then simply point to the UNC path of the share when creating the database.
Direct access to shared file storage
- Any Win32/WinRT/.NET application can access SMB remote file storage via UNC paths (\\server\share)
- Set up explicit credentials (optional) via Credential Manager (cmdkey or control keymgr.dll), net use, or the equivalent PowerShell command New-SmbMapping
- Applications automatically get the benefits of SMB multichannel and SMB Direct if the appropriate hardware is available
- Shares must be explicitly provisioned for continuous availability, scale-out access, and encryption
Continuously available file handles
- Guaranteed I/O fault tolerance on any file handle opened on a continuously available share
- Data consistency is guaranteed in the event of server/network failures
- The OS (SMB client/server) re-establishes connections, restores any lost state, and retries I/O beneath the application
- If the server/network is unreachable after a configurable timeout (60 seconds by default), the I/O fails and the file handle is lost
- Consistency guarantees for metadata-changing operations: create, delete, rename, file extension/truncation
- Best-effort guarantees for most directory operations
Continuously available file handles: what is the cost?
- The file server operates in write-through mode, resulting in added disk I/O
- Requires disk subsystems that correctly honor write-through
- Additional disk I/O to track file handle state and metadata changes
What kind of apps will work well?
- Apps that use long-lived file handles
- Apps that do a lot of I/O-intensive processing
- Apps that are NOT metadata intensive
Continuously available file handles: should applications explicitly care?
Most do not, but if needed you can explicitly query for the persistent-handle flag:

FILE_REMOTE_PROTOCOL_INFO protocolInfo;
...
if (!GetFileInformationByHandleEx(hFile,
                                  FileRemoteProtocolInfo,
                                  &protocolInfo,
                                  sizeof(protocolInfo))) {
    return TRUE; // Local file systems do not support the query.
}
if (protocolInfo.Protocol == WNNC_NET_SMB &&
    (protocolInfo.Flags & REMOTE_PROTOCOL_INFO_FLAG_PERSISTENT_HANDLE) != 0) {
    return TRUE; // The file handle is continuously available.
}
return FALSE;
Writing clustered client applications
Have a clustered app that stores data on an SMB share? You need a way to tell the SMB server to abandon your file handles when your application instance is moved between servers.
How? Register an AppInstance ID for your process; all file handles opened by the process are tagged with that ID:

RegisterAppInstance(
    __in HANDLE ProcessHandle,
    __in GUID*  AppInstanceId,
    __in BOOL   ChildrenInheritAppInstance);

Drivers can attach an ECP with the AppInstance ID to a create request.
Achieving high I/O throughput
Large I/O:
- Interested in network throughput (bytes/sec)
- Fewer passes through the filesystem stack, hence lower CPU cost
- Often sequential in nature, hence fewer disk seeks (file copy, database logs)
Small I/O:
- Interested in I/Os per second (IOPS)
- CPU intensive due to the larger number of passes through the stack
- Often tends to be random I/O
I/O throughput: to cache or not to cache
Caching helps with small, bursty I/O, but limits sustained throughput: it effectively makes ReadFile() and WriteFile() calls synchronous, and zero-copy cannot be achieved.
What are the available options?
- Open the file unbuffered (FILE_NO_INTERMEDIATE_BUFFERING) to bypass filesystem caching on both the client and the server
- Open the file cached, then disable client-side caching by issuing IOCTL_LMR_DISABLE_LOCAL_BUFFERING on the file handle via the DeviceIoControl() API; this allows fully pipelined asynchronous I/O on the client, the server may be able to do zero-copy, and the larger memory resources typically available on the server can help absorb some disk I/O
Pipeline enough I/O to fill the network pipe, sustain deep disk queues
How many IOPS can you push? Is having a fast enough network and disk sufficient?
No. It is mostly about managing CPU utilization.
- NUMA awareness (non-uniform memory access): on multiprocessor systems, the cost of accessing memory and the network varies based on which physical CPU your code is running on
- Application I/O needs to be managed per CPU: dedicated threads for each CPU to do I/O, and dedicated buffers for each CPU
- The OS and the SMB redirector manage the rest by distributing the I/O across all available network interfaces
Half a million IOPS over SMB Direct!
[Diagram: an SMB 3.0 file client running SQLIO, connected to an SMB 3.0 file server over three RDMA NIC pairs. The server runs Storage Spaces over a RAID controller and six SAS HBAs, each attached to a JBOD of eight SSDs.]
Workload                     | BW (MB/sec) | IOPS (IOs/sec) | %CPU (privileged) | Latency
512KB IOs, 100% read, 2t, 8o | 16,778      | 32,002         | ~11%              | ~2 ms
8KB IOs, 100% read, 16t, 2o  | 4,027       | 491,665        | ~65%              | <1 ms
- Results on Windows Server 2012 RTM using an EchoStreams server with two Intel E5-2620 CPUs at 2.00 GHz
- Both client and server use three Mellanox ConnectX-3 network interfaces on PCIe Gen3 x8 slots
- Data goes to 6 LSI SAS adapters and 48 Intel SSDs, attached directly to the EchoStreams server
- Data in the table is a 60-second average; Performance Monitor data is an instantaneous snapshot
Takeaways
- File-based network storage is now faster and more reliable
- All these technologies are available in Windows Server 2012
- Hyper-V and SQL Server have been validated against SMB 3.0 file shares
- Applications don't need any significant changes!
Go build!
Resources
- Follow us on Twitter @WindowsAzure
- Get started: www.windowsazure.com/build
Please submit session evals on the Build Windows 8 App or at http://aka.ms/BuildSessions
© 2012 Microsoft Corporation. All rights reserved. Microsoft, Windows, Windows Vista and other product names are or may be registered trademarks and/or trademarks in the U.S. and/or other countries.The information herein is for informational purposes only and represents the current view of Microsoft Corporation as of the date of this presentation. Because Microsoft must respond to changing market conditions, it should not be interpreted to be a commitment on the part of Microsoft, and Microsoft cannot guarantee the accuracy of any information provided after the date of this presentation. MICROSOFT MAKES NO WARRANTIES, EXPRESS, IMPLIED OR STATUTORY, AS TO THE INFORMATION IN THIS PRESENTATION.