storage area network - university of minnesota · storage area network universal storage...
• Any high-performance network whose primary purpose is to enable storage devices to communicate with computer systems and with each other. *
• A high-speed network, an extension of the storage bus, that allows direct connections to be established between storage devices and processors (servers). **
• A network that provides access to consolidated, block level data storage. ***
SAN
* www.snia.org
** Khattar, Ravi Kumar, et al. Introduction to Storage Area Network, SAN. IBM Corporation, International Technical Support Organization, 1999.
*** https://en.wikipedia.org/wiki/Storage_area_network
Why SAN?
• Industry recognition: three-tier architecture
  • Presentation: Desktop (PC, NC)
  • Processing: Application Servers
  • Data Storage: Storage Devices
Why SAN?
Client/Server Computing
Clients
Client Access LAN
…
Application Servers
Storage Devices
DAS
A B
Information Island
Limited data transmission distance
  • SCSI: 1.5m~25m
Poor scalability
  • Adding disks for each server
Hard to share data (information islands)
  • Extra resources spent copying and transmitting data
  • Work with out-of-date data
Why SAN?
Storage Area Network: Universal Storage Connectivity
Good scalability
Scale performance and capacity
Relatively long-distance data transmission
IP: Internet-based, long distance
FC: 15m~10km
IB : 15m~10km
No-copy data sharing
Shared storage pool
Clients
Client Access LAN
…
Application Servers
Storage Area Network
Storage
Storage Model 1: Direct Attached Storage
• All storage stranded behind server
• Proprietary access (vendor specific)
• Storage sharing creates CPU overhead
• Network burdened with disk I/O traffic
• Limited scalability and low performance
Server
Storage Model 2: Fibre Channel SAN
• Replaces parallel SCSI transport
• SAN is DAS from servers’ perspective
• Optimized for movement of data from server to disk or tape
• Facilitates storage clustering & LAN-free backup
• Typically does not use LAN protocols, relies on serial SCSI (SCSI-3)
[Figure labels: SAN (x3), Server (x3), Internet, Intranet]
Storage Model 2: FC SAN Limitations
• Creates a 3rd network (LAN, WAN, SAN)
• Pre-Gigabit Ethernet bandwidth assumptions
• Management nightmare
• Limited interoperability
• Minimal storage security
• Creates “SAN Islands”
[Figure labels: SAN (x3), Server (x3), Internet, Intranet]
Storage Model 3: IP-SAN
• Best features of Fibre Channel & IP networks
• Multiple server operating systems supported
• Maintain IT infrastructure, security & interoperability
• Ease of configuration and management
• Servers used optimally
• Support IP Quality of Service, Error detection & Prioritization
[Figure labels: Storage, Data, Video and Voice over IP; hosts: Linux, Win 2000, Sun, NT 4.0; SN 5420; IP Network; Fibre Channel]
Active Disk with OSD Capability As An Example of Intelligent Storage Devices
• IP Network Attached
• More Processing Power and Memory
Storage Area Network
• Server Architecture Based on SAN & NAS
• Network Protocol (FC-AL and SSA)
• Spatial Reuse
• Multiple Links and Switch Based Multiple FC-AL
[Figure labels: Host, SAN, FC-AL, Internet Connection]
Previous Research on SAN
• Efficient Protocol Design for FC-AL and SSA
• Emphasis on performance for future disks
• Built detailed simulation models for both FC-AL and SSA
• Supported by Seagate and IBM Storage Systems Division
• Scalable Streaming Video Servers based on SAN
• Co-founded a streaming video server company, Streaming21
• Many publications on streaming video servers and streaming video delivery over Internet
Serial Storage Interfaces
• Fibre Channel
  • FC-AL
  • FC Switch
• Serial Storage Architecture (SSA)
  • Buffer Insertion Ring
  • Link-by-link flow control
  • Fairness Algorithm
  • Independent links: Spatial Reuse
  • Fault tolerance against link failure
FC-AL Features
• Bandwidth: 100 MB/s
• Connectivity: 126 devices
• Connection Distance: 30m device to device (with copper) and 10km (with fiber optics)
• Fault-Tolerance: CRC protected frames, dual port, hot plug connector
• Distributed switch logic
FC-AL Fairness Algorithm
• Based on an access window tracked by a history variable, ACCESS
• The default value of ACCESS is true
• When an L_Port wins arbitration, it sets ACCESS to false
• Before opening a circuit, the winner sends ARB(F0) to detect whether other L_Ports are also arbitrating
• If it receives ARBx, other L_Ports are arbitrating
• When relinquishing the loop, the winner sends:
  • ARB(F0) if other L_Ports are arbitrating, or
  • IDLE to trigger all L_Ports to reset ACCESS to true
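The access window can be illustrated with a small simulation. This is only a sketch of the rule described on this slide, not the real loop protocol: the `LPort` class, the `pending` counter, and the assumption that the lowest address wins arbitration are all hypothetical simplifications.

```python
from dataclasses import dataclass


@dataclass
class LPort:
    address: int          # hypothetical loop address; lower value wins arbitration (assumption)
    pending: int = 0      # number of transfers this port still wants to make
    access: bool = True   # history variable ACCESS, true by default


def run_loop(ports):
    """Return the order in which ports win arbitration under the access window."""
    winners = []
    while any(p.pending for p in ports):
        # Only ports that still have ACCESS may arbitrate in the current window.
        contenders = [p for p in ports if p.pending and p.access]
        if not contenders:
            # Defensive: no eligible port, so reopen the window.
            for p in ports:
                p.access = True
            continue
        winner = min(contenders, key=lambda p: p.address)
        winner.access = False      # the winner gives up further access in this window
        winner.pending -= 1
        winners.append(winner.address)
        # On relinquishing the loop: are other eligible ports still arbitrating?
        others = [p for p in ports if p.pending and p.access]
        if not others:
            # IDLE case: every L_Port resets ACCESS to true, opening a new window.
            for p in ports:
                p.access = True
        # Otherwise (ARB(F0) case) the window stays open for the remaining ports.
    return winners


ports = [LPort(address=1, pending=3), LPort(address=2, pending=1), LPort(address=3, pending=2)]
order = run_loop(ports)
print(order)
```

Even though port 1 always has the highest priority in this model, it cannot win twice within one window, so ports 2 and 3 are served before port 1 gets its second turn.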
SSA Features
• 2-in and 2-out links per node (with 20 MB/s per link)
• Fairness Access
• Fault tolerance: a multiple-host configuration offers fault tolerance against host, link and adapter failures.
• Number of attachments: 126 for SSA
• Compact connectors: serial vs. parallel for SCSI
• Transmission distance between devices: 25 m with copper cables (2.5 km with fiber optics)
Spatial Reuse
• What is spatial reuse?
  • Concurrent non-overlapping transfers can utilize full link bandwidth
• Why is it important?
  • Throughput can scale up with more links and non-overlapping transfers
  • Achieved throughput could be as low as the bandwidth of a single link
  • Device/data sharing may reduce spatial reuse potential
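A rough way to see both the best and the worst case is to count how many transfers can run at once on a ring. The sketch below assumes a unidirectional ring in which a transfer occupies every link between its source and destination, and it admits transfers greedily in the given order; the function names and the 8-node example are illustrative assumptions, and the 20 MB/s figure is the SSA per-link bandwidth quoted earlier.

```python
def links_used(src, dst, n_nodes):
    """Links (i -> i+1 mod n) crossed going one way around the ring from src to dst."""
    links = set()
    node = src
    while node != dst:
        links.add(node)
        node = (node + 1) % n_nodes
    return links


def aggregate_throughput(transfers, n_nodes, link_mb_s):
    """Greedily admit transfers that share no link with those already admitted."""
    busy = set()
    admitted = 0
    for src, dst in transfers:
        needed = links_used(src, dst, n_nodes)
        if busy.isdisjoint(needed):
            busy |= needed
            admitted += 1
    return admitted * link_mb_s


# Four non-overlapping transfers on an 8-node ring: each gets a full link.
best = aggregate_throughput([(0, 2), (2, 4), (4, 6), (6, 0)], n_nodes=8, link_mb_s=20)
# Three overlapping transfers: only one can proceed at a time.
worst = aggregate_throughput([(0, 4), (1, 5), (2, 6)], n_nodes=8, link_mb_s=20)
print(best, worst)
```

With non-overlapping transfers the aggregate is four times the link bandwidth; with overlapping transfers it falls back to a single link's bandwidth.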
SSA SAT Fairness Algorithm
• Based on token passing and quota
• Forwarding frames have higher priority than originating frames
• Holding a token allows a node to switch the priority between the originating and forwarding traffic.
• Hold quota (a_quota): number of frames that can be originated when holding the SAT token.
• Idle quota (b_quota): number of frames that can be originated since a node passed the SAT token last time and the channel is idle. In general, b_quota =4*a_quota.
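The two quotas can be read as a simple admission check for originating a frame. The sketch below follows only the wording of this slide; the function name, its parameters, and the way the two counters are kept are assumptions, not the SSA specification.

```python
def may_originate(holding_sat, sent_while_holding, sent_since_pass, channel_idle, a_quota):
    """Decide whether a node may originate one more frame (slide-level reading)."""
    b_quota = 4 * a_quota  # "in general, b_quota = 4 * a_quota"
    if holding_sat:
        # Hold quota: frames that may be originated while holding the SAT token.
        return sent_while_holding < a_quota
    # Idle quota: frames originated since the token was last passed,
    # allowed only while the channel is idle.
    return channel_idle and sent_since_pass < b_quota


print(may_originate(True, 0, 0, False, a_quota=2))    # holding the token, quota left
print(may_originate(False, 0, 7, True, a_quota=2))    # idle channel, under b_quota = 8
print(may_originate(False, 0, 8, True, a_quota=2))    # idle quota used up
```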
Fairness vs. Channel Utilization
• How to define fairness?
• How to improve channel utilization?
• Starvation possible?
• Fairness+throughput
FC-TORN
• B_RDY (credit) is used to control the number of frames that can be sent to a destination (disk).
• SAT token based on one quota for each source (host or disk) to control the maximum number of frames sent by a source.
• B_RDY and B_RDY’ are used to produce fairness from sources to a destination.
SAN Components
Interconnects
Adapter
Server Host
Interconnects (The heart of a SAN)
• Cable: connects the components with each other
• Adapters: connect to devices and control the protocol
• Switches (Fabric): interconnect devices, increase bandwidth, reduce congestion, provide aggregate throughput, and provide simple NameServer services
• or Hub (Arbitrated Loop): shares bandwidth
[Figure labels: Storage Array, Tapes, Hard Disks]
Fibre Channel SAN or FC SAN
IP Network SAN or IP SAN
InfiniBand SAN or IB SAN
SAN Components
Note: The SNIA definition doesn't say that a SAN uses Fibre Channel or Ethernet or any other specific interconnect technology. A growing number of network technologies have architectural and physical properties that make them suitable for use in SANs. Source: http://www.snia.org/education/storage_networking_primer/san/what_san#sthash.9cPWdUBs.dpuf
Fibre Channel
SAN Components
Fibre Channel development started in 1988, with ANSI standard approval in 1994, as a way to simplify the HIPPI (High Performance Parallel Interface) system.
FC is a high-speed network technology (commonly running at 2-, 4-, 8- and 16-gigabit per second rates) primarily used to connect computer data storage. (32-Gigabit, 128-Gigabit speeds in 2016)
FC is designed to combine the strengths of an I/O channel with those of networking.
Networking focuses on handling changes in configuration and load, and on addressing data to the proper destination.
An I/O channel focuses on performance, which means moving data with the least latency by using a rigorous and simple protocol.
FC maintains the speed and low overhead of a channel while adding the flexibility (through connectivity) and the longer distances that are characteristic of a network.
SAN Components
Eventually the market chose FC over SSA (Serial Storage Architecture).
The competition of High-end Storage Technology
                     | FC                    | SSA
Throughput           | 531.25 Mb/s           | 640 Mb/s
Device amount        | Unlimited             | Up to 192 hot-swappable hard disks per system; up to 32 separate RAID arrays per adaptor
Distance             | 10km                  | 10km (with arrays 25 meters apart)
Upper layer protocol | ATM, IP, FICON, SCSI  | SCSI-3
Fibre Channel Topologies:
SAN Components
FC-P2P:Point to point
The easiest configuration
The easiest to administer
High-speed interconnect between two nodes
Possible Usage
• Between Central Processing Units
• From a workstation to a specialized graphics processor or simulation accelerator
• From a file server to a disk array
……
Fibre Channel Topologies:
SAN Components
FC-AL: Arbitrated Loop
1. First arbitrate to win control of the loop.
2. Establish a point-to-point (virtual) connection.
3. The two nodes consume all of the loop’s bandwidth until the data transfer operation is complete.
Advantages
• Lower-cost alternative
• Support of up to 126 devices is possible on a single loop.
• ……
However, by 2007, FC-AL had become rare in server-to-storage communication
Fibre Channel Topologies:
SAN Components
FC-SW: Switched Fabric
Increased bandwidth
Increased number of devices
scalable performance
maximum of 16 million devices
FC-SW topology is what we deploy in a SAN.
High cost : Switch is the most costly hardware device.
Fibre Channel Switches
FC Host Bus Adapter
Server Host
Fibre or copper cable
Fibre cable
Fibre Channel SAN
SAN Components
FC Host Bus Adapter
• A unique World Wide Name (WWN)
Cable
• Copper: 15m, 100 MB/s
• Fibre: 10km, 2000MB/s
Fibre Channel Switches
• Directors: no single point of failure (high availability)
• Switches: smaller, fixed-configuration, less redundant devices
Fibre Channel Layers
SAN Components
FC-0 Physical layer : describes the physical interface
• an analog interface to transmitter
• a digital interface to the FC-1 layer
• the requirements for infrastructures
Transport media
Receiver hardware
……
Example of options of FC-0 Plants
Fibre Channel Layers
SAN Components
FC-1 Encode/Decode Layer: describes the means of encoding/decoding user data
8b/10b encode/decode scheme
8b/10b encoding was proposed by Albert X. Widmer and Peter A. Franaszek of IBM Corporation in 1983.
• Minimizes errors by equalizing the number of 1’s and 0’s transmitted and not allowing more than 5 consecutive bits of the same value in a row.
• Allows “Special Characters” (such as K28.5) to be distinguished and simplifies byte and word alignment.
• The evening out of 1’s and 0’s allows for the design of relatively inexpensive transmitter/receiver circuitry.
SAN Components
FC-1 Encode/Decode Layer: Encode Process
FC-2 byte notation:  0xBC (hexadecimal)
FC-2 bit notation:   1 0 1 1 1 1 0 0  K     (bits 7 6 5 4 3 2 1 0, control variable)
FC-1 un-encoded:     1 0 1 1 1 1 0 0  K     (bits H G F E D C B A, Z)
FC-1 reordered:      K  1 1 1 0 0  1 0 1    (Z, E D C B A, H G F), named Zxx.y = K28.5
FC-1 encoded:        0 0 1 1 1 1  1 0 1 0   (A B C D E i, F G H j)
                     5B/6B (negative running disparity), 3B/4B (+ previous running disparity)
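The naming step and the properties of the encoded character can be checked with a few lines of code. This sketch does not implement the 8b/10b lookup tables; it only derives the Zxx.y name from a byte and inspects the 10-bit code shown on this slide. The helper names are illustrative.

```python
def code_name(byte, control=False):
    """Name a byte in Zxx.y notation: xx from bits EDCBA, y from bits HGF."""
    x = byte & 0x1F          # low five bits (E D C B A)
    y = (byte >> 5) & 0x07   # high three bits (H G F)
    return f"{'K' if control else 'D'}{x}.{y}"


def disparity(bits):
    """Number of ones minus number of zeros in a transmission character."""
    return bits.count("1") - bits.count("0")


def longest_run(bits):
    """Length of the longest run of identical consecutive bits."""
    best = run = 0
    prev = None
    for b in bits:
        run = run + 1 if b == prev else 1
        best = max(best, run)
        prev = b
    return best


encoded = "0011111010"            # the encoded character from the slide
print(code_name(0xBC, control=True))
print(disparity(encoded), longest_run(encoded))
```

The byte 0xBC with the control variable set is named K28.5, and its encoded form carries two more ones than zeros, which is why the encoder tracks running disparity from one character to the next.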
Fibre Channel Layers
SAN Components
FC-2: Framing Protocol/Flow Control
data using frames
flow control
classes of service
SAN Components
Frames are the basic packages used to encapsulate and transport data.
Two types of Frames:
• Data Frame
• Link Control Frame
A group of related Frames transmitted in one direction constitutes a Sequence.
Exchanges are groups of related Sequences.
[Figure: Frame format, with callouts]
• SoF (Start of Frame): the “comma” and 3 bytes indicating the type of connection service
• Optional headers: Expiration Security Header, Network Header, Association Header, Device Header
• Payload: User Data (not used in Link Control Frames)
• CRC: verifies the data integrity of the Frame Header (FH) and Payload
• EoF (End of Frame): designates the end of the Frame content and the validity of the Frame’s content
SAN Components
FC-2 controls the flow of Frames between ports so that receiver buffers are not overrun.
A buffer credit count is maintained by the Sequence Initiator (transmitter) and is used to throttle the transmission of Frames.
There are two basic types of flow control:
• End-to-End control, in N_port to N_port communications: the receiver responds to every valid Frame it receives with an ACK Frame.
• Buffer-to-Buffer control, between an N_port and a Fabric or between two N_ports in a Point-to-Point topology: each side is responsible for maintaining its own BB_Credit_Count.
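Buffer-to-buffer control can be pictured as a counter on the transmitting side. The following is a minimal sketch, assuming that each sent Frame consumes one credit and that each R_RDY from the peer returns one; the `BBCreditLink` class and its method names are hypothetical.

```python
class BBCreditLink:
    """Transmit side of one link under buffer-to-buffer credit flow control."""

    def __init__(self, bb_credit):
        self.bb_credit = bb_credit  # receive buffers advertised by the peer (assumption)
        self.outstanding = 0        # frames sent whose buffers are not yet released

    def can_send(self):
        return self.outstanding < self.bb_credit

    def send_frame(self):
        """Send one frame if credit is available; return whether it was sent."""
        if not self.can_send():
            return False            # throttled: the receiver has no free buffer
        self.outstanding += 1
        return True

    def receive_r_rdy(self):
        """The peer freed a buffer, so one credit comes back."""
        if self.outstanding > 0:
            self.outstanding -= 1


link = BBCreditLink(bb_credit=2)
results = [link.send_frame(), link.send_frame(), link.send_frame()]
link.receive_r_rdy()
results.append(link.send_frame())
print(results)
```

The third send is refused because both credits are in use; once an R_RDY arrives, the transmitter may send again, so the receiver's buffers are never overrun.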
SAN Components
FC-2 provides up to 5 Classes of Service (CoS). The different CoS represent different levels of delivery guarantee, bandwidth and connectivity.
Class 1
• Dedicated connection
• Remains active until closed
• R_RDY on Connect Request only
• Suited to sustained, high-throughput transactions
SAN Components
Class 2
• Control on a Frame-by-Frame basis
• Allows interleaving of Sequences from multiple N_ports over a single connection
• An ACK for every Frame, plus R_RDY
SAN Components
Class 3
• Provides a connectionless service with no acknowledgment
• No ACK; R_RDY only, for link maintenance
Fibre Channel Layers
SAN Components
FC-3: Common Services
The FC-3 level is not currently fully defined. The term “common services”
means a service that would utilize multiple N_ports working together on
a single node.
Fibre Channel Layers
SAN Components
FC-4: Upper Level Protocol Support
The FC-4 level supports the mapping of Upper Level Protocols (ULP) onto
Fibre Channel data structures.
SCSI (Small Computer Systems Interface)
IPI-3 (Intelligent Peripheral Interface-3)
HiPPI (High Performance Parallel Interface)
IP (Internet Protocol) - IEEE 802.2 (TCP/IP) data
ATM/AAL5 (ATM adaptation layer for computer data)
SBCCS (Single Byte Command Code Set)
FC serves as a transport for ULPs by mapping the ULP messages (known as Information Units) into FC Sequences and/or Exchanges.
SAN Components
IP over FC
Two kinds of Information Units:
• IP datagram: moves between nodes on networks using the IP protocol stack.
• ARP datagram: used during network configuration to map IP addresses to Media Access Control addresses (used for routing). A dedicated ARP server must be set up at a “well known” address.
SAN Components
IP over FC
[Figure: an IP packet split into Frames]
• The first Frame: Frame Header, Network Header (an Optional Header), Payload
• Additional Frames: Frame Header, Payload
• …
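The split shown in the figure can be sketched as follows. The 2112-byte data field limit and the 16-byte placeholder Network Header are assumptions that are not stated on the slides, and the dictionary layout is purely illustrative. Only the first Frame carries the Network Header.

```python
MAX_DATA_FIELD = 2112  # bytes available per frame data field (assumption, not from the slides)


def split_into_frames(packet, network_header, max_data=MAX_DATA_FIELD):
    """Split one IP packet into frames; only the first frame carries the network header."""
    first_room = max_data - len(network_header)
    if first_room <= 0:
        raise ValueError("network header does not fit in one frame")
    frames = [{"index": 0, "network_header": network_header, "payload": packet[:first_room]}]
    offset = first_room
    while offset < len(packet):
        frames.append({
            "index": len(frames),
            "network_header": b"",                       # additional frames: no optional header
            "payload": packet[offset:offset + max_data],
        })
        offset += max_data
    return frames


packet = bytes(5000)                     # a dummy 5000-byte IP packet
frames = split_into_frames(packet, network_header=bytes(16))
print([len(f["payload"]) for f in frames])
```

Concatenating the payloads in index order gives back the original packet, which is what the receiving side does when it reassembles the Sequence.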
SAN Components
SCSI over Fibre Channel (predominant in FC SANs)
Generally, FCP stands for Fibre Channel Protocol for SCSI.*
The transport is accomplished by wrapping SCSI command, response, status and data blocks.
*Norman, David. "Fibre Channel Technology for Storage Area Networks."
SCSI Command
SAN Components
SCSI over Fibre Channel
[Figure: Read example between Initiator FCP_Port and Target FCP_Port; labels: Receive, Handle] *
*Norman, David. "Fibre Channel Technology for Storage Area Networks."
Network Switches
iSCSI Host Bus Adapter
Server Host
Ethernet
IP SAN
SAN Components
iSCSI HBAs: iSCSI Node Names
Cable: Ethernet
Network Switches
An IP SAN is a Storage Area Network that
uses the iSCSI protocol to transfer block-level
data over a network, generally Ethernet.
InfiniBand Network Architecture
IB SAN
InfiniBand is a network communications protocol that offers a switch-based fabric of point-to-point bi-directional serial links between processor nodes, as well as between processor nodes and input/output nodes, such as disks or storage.
• Higher throughput – 56Gb/s per server and storage connection, and soon 100Gb/s, compared to up to 40Gb Ethernet and Fibre Channel
• Lower latency – RDMA zero-copy networking reduces OS overhead so data can move through the network quickly
• Enhanced scalability – InfiniBand can accommodate theoretically unlimited-sized flat networks based on the same switch components simply by adding additional switches
• Higher CPU efficiency – Data movement is offloaded from the CPU
InfiniBand Architecture
EndNodes: Servers and Devices
Link: copper and optical fibre*
• A 1X fibre link has two optical fibres, one for each direction
Switches: IBA Switches
• Switches establish a private, protected channel directly between the nodes.
Adapters: Host Channel Adapter
• Adapters perform data and message movement without CPU involvement, using RDMA and Send/Receive offloads.
• The adapters are connected on one end to the CPU over a PCI Express interface and on the other to the InfiniBand subnet through InfiniBand network ports.
Subnet Manager: routing definition and subnet discovery
InfiniBand Architecture
IB Communication Stack
A Consumer is a process with a virtual address space.
A Consumer can have more than one QP.
A QP (Queue Pair) is a Virtual Interface.
A QP includes a Send Q and a Receive Q.
QPs are the endpoints of a Channel.
A Channel Adapter has up to 2^24 QPs.
QPs are independent of each other.
IB Message Transfer Semantics
• Send/Receive: simply send and receive.
• RDMA Read/Write: directly read and write to virtual memory.
InfiniBand Architecture
IB Message Transfer Semantics: Send/Receive
Steps:
1. The Initiator puts the message in the SND (send queue).
2. The message is sent to the Target.
3. The Target receives the message.
4. The Target puts the message in the RCV (receive queue).
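The four steps can be mirrored with a toy Queue Pair model. This is only an illustration of the sequence on this slide: the `QueuePair` class is hypothetical, it is not a verbs API, and it leaves out details such as work requests and completions.

```python
from collections import deque


class QueuePair:
    """A toy QP: one send queue and one receive queue."""

    def __init__(self):
        self.send_q = deque()
        self.recv_q = deque()


def send_receive(initiator, target, message):
    initiator.send_q.append(message)          # step 1: message placed in the SND
    in_flight = initiator.send_q.popleft()    # step 2: message sent to the Target
    target.recv_q.append(in_flight)           # steps 3-4: Target receives it into the RCV
    return target.recv_q[-1]


initiator_qp, target_qp = QueuePair(), QueuePair()
delivered = send_receive(initiator_qp, target_qp, b"hello")
print(delivered, len(initiator_qp.send_q), len(target_qp.recv_q))
```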
InfiniBand Architecture
IB Message Transfer Semantics: RDMA
Steps:
1. The application on the Initiator registers a buffer and puts the send request in the SND.
2. The Target receives the request and reads the data from the Initiator's buffer directly.
3. The Target returns a status to the Initiator.
Complete IBA Packet Format
Field                     | Size         | Note
Local Routing Header      | 8 Bytes      | Intra-subnet
Global Routing Header     | 40 Bytes     | Inter-subnet
Base Transport Header     | 12 Bytes     | Tells endnodes what to do with packets
Extended Transport Header | 28 Bytes     |
Immediate Data            | 4 Bytes      |
Message Payload           | 0-4096 Bytes | Message
Invariant CRC             | 4 Bytes      |
Variant CRC               | 2 Bytes      |
InfiniBand Architecture is said to be message-oriented.
A message can be of any size up to 2^31 bytes.
The InfiniBand hardware automatically segments the outbound message into a number of packets.
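The amount of segmentation follows from simple arithmetic on the packet format sizes given above. The sketch assumes that every packet carries the maximum 4096-byte payload except possibly the last, that a zero-length message still takes one packet, and that the worst-case packet contains every listed field; the names are illustrative.

```python
import math

# Field sizes in bytes, taken from the packet format above.
HEADER_BYTES = {
    "Local Routing Header": 8,
    "Global Routing Header": 40,
    "Base Transport Header": 12,
    "Extended Transport Header": 28,
    "Immediate Data": 4,
}
TRAILER_BYTES = {"Invariant CRC": 4, "Variant CRC": 2}
MAX_PAYLOAD = 4096


def packets_needed(message_bytes, payload_per_packet=MAX_PAYLOAD):
    """Number of packets a message is segmented into."""
    if message_bytes == 0:
        return 1  # assumption: an empty message still uses one packet
    return math.ceil(message_bytes / payload_per_packet)


def max_packet_bytes():
    """Largest packet if every listed field is present (worst case)."""
    return sum(HEADER_BYTES.values()) + MAX_PAYLOAD + sum(TRAILER_BYTES.values())


print(packets_needed(10_000))    # a 10,000-byte message
print(packets_needed(2 ** 31))   # the largest message size
print(max_packet_bytes())
```

A maximum-size message of 2^31 bytes therefore becomes 2^19 packets, all produced by the hardware without involving the application.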
InfiniBand Architecture
IB Verbs
InfiniBand architecture does not define APIs; it only provides the basis for specifying the APIs.
A verb is a method by which an application requests an action from InfiniBand’s message transport service.
Other organizations, such as the OpenFabrics Alliance, provide a complete set of APIs and software that implements the verbs to work seamlessly with the InfiniBand hardware.
InfiniBand Architecture
IB Upper Layer Protocols
InfiniBand Architecture
Linux InfiniBand software architecture
The upper level protocols
IPoIB : IP over IB
SRP : SCSI RDMA Protocol
SDP : Sockets Direct Protocol
iSER : iSCSI Extensions for RDMA
SRP Protocol
InfiniBand Architecture
Linux InfiniBand SRP Protocol architecture
SCSI RDMA Protocol (SRP) was defined by the ANSI T10 committee to provide block storage capabilities for the InfiniBand architecture.
SRP is a protocol that tunnels SCSI request packets over InfiniBand hardware.
SAN Components (Summary)
          | FC SAN                                                | IP SAN                                         | IB SAN
Bandwidth | 100Mb (Copper), 20Gb (Fibre), 32Gb and 128Gb (coming) | 100Mb or 1Gb (Ethernet), 10Gb (10Gb Ethernet)  | 120Gb (12X)
Latency   | Dedicated to block I/O                                | Direct connection                              | Dedicated to block I/O
Distance  | 15m (Copper), 20km (Fibre)                            | Internet-based long-distance                   | 125m (12X), 10km (1X)
Cost      | High                                                  | Cheap                                          | Medium
SAN, NAS or DAS?
SAN
More efficient block-level data access
NAS
Convenient data sharing in a homogeneous file system
DAS
Easy to implement and low cost
Acknowledgement
Professor David Du gave me a great deal of basic knowledge on storage systems and provided this interesting topic.
During the preparation for the presentation, Dr. Fenggang Wu helped me review the slides and gave me significant references.
Reference
• www.snia.org
• Khattar, Ravi Kumar, et al. Introduction to Storage Area Network, SAN. IBM Corporation, International Technical Support Organization, 1999.
• https://en.wikipedia.org/wiki/Storage_area_network
• https://en.wikipedia.org/wiki/Fibre_Channel#Fibre_Channel_topologies
• http://www.networkworld.com/article/2174282/lan-wan/fibre-channel-will-come-with-32-gigabit--128-gigabit-speeds-in-2016.html
• https://www.pctechguide.com/interfaces/hard-disks-what-is-serial-storage-architecture
• https://en.wikipedia.org/wiki/Fibre_Channel_point-to-point
• Shanley, Tom, and Joe Winkles. InfiniBand Network Architecture. Addison-Wesley Professional, 2003.
• IP SAN Fundamentals: An Introduction to IP SANs and iSCSI
• Norman, David. "Fibre Channel Technology for Storage Area Networks."
• Grun, Paul. "Introduction to infiniband for end users." White paper, InfiniBand Trade Association (2010).