nvme over fabric solution · 2016-06-03 · nvme over roce, infiniband, opa...
TRANSCRIPT
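Traditional data storage
FC/iSCSI SAN
Advantages:
• Storage is independent of the servers
• Convenient service provisioning
• Capacity can be reallocated
Disadvantages:
• Legacy fabric (FC/iSCSI)
• Bandwidth constraints
• High latency
Local SSD per server
Advantages:
• Low latency
• High bandwidth
Disadvantages:
• Volume size is limited by the capacity of each SSD
• Fixed capacity per server
• SSD maintenance is difficult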
The need for fast drives - NVMe
SSDs are becoming more common.
The bandwidth of the SATA/SAS interface is not enough for today's SSDs.
Use of NVMe, the fastest type of current SSD, is increasing.
NVMe: a logical device interface specification for accessing non-volatile storage media over the PCI Express bus
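To make the bandwidth gap concrete, the short calculation below compares usable SATA III bandwidth with PCIe Gen3. The line rates and encodings (6 Gb/s with 8b/10b for SATA III, 8 GT/s per lane with 128b/130b for PCIe Gen3) are the standard published figures, not values from the slides. The function name is illustrative, and protocol overhead above the encoding layer is ignored.

```python
def usable_mb_per_s(line_rate_gbps, payload_bits, coded_bits, lanes=1):
    """Usable bandwidth in MB/s after line-encoding overhead (decimal units)."""
    bits_per_s = line_rate_gbps * 1e9 * payload_bits / coded_bits * lanes
    return bits_per_s / 8 / 1e6

sata3 = usable_mb_per_s(6, 8, 10)                 # SATA III, 8b/10b encoding
pcie3_x4 = usable_mb_per_s(8, 128, 130, lanes=4)  # PCIe Gen3 x4, 128b/130b
pcie3_x8 = usable_mb_per_s(8, 128, 130, lanes=8)  # PCIe Gen3 x8, 128b/130b

print(f"SATA III:     {sata3:.0f} MB/s")
print(f"PCIe Gen3 x4: {pcie3_x4:.0f} MB/s")
print(f"PCIe Gen3 x8: {pcie3_x8:.0f} MB/s")
```

Under these assumptions a PCIe Gen3 x8 link offers roughly 13 times the usable bandwidth of a single SATA III port.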
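NVMe over fabrics
Combines the advantages of SAN and local SSD per server
NVMe over RoCE, InfiniBand, OPA
• Storage is independent of the servers
• Convenient service provisioning
• Capacity can be reallocated
• Low latency
• High bandwidth
• Non-proprietary architecture
• Standard Ethernet switch fabric
• Volumes are recognized as local NVMe devices
[Diagram: four volumes (Vol) served over 10, 40, 56, or 100 Gb/s low-latency links]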
NVMe over fabric – NVMe SSD
MX6300 PCIe NVMe SSD
Specifications
• 1Tb & 2Tb eMLC NAND Flash
• PCIe Gen3 x8
• Capacity: 2.7TB, 5.4TB, 12TB (coming soon)
• Up to 900K/600K random read/write IOPS
• Latency as low as 90 µs read / 15 µs write
• 7 drive writes per day (5 years)
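As a rough illustration of what the endurance rating implies, the sketch below converts 7 drive writes per day over 5 years into total data written for the two shipping capacities. It assumes decimal units (1 PB = 1000 TB) and 365-day years; the function name is illustrative.

```python
def lifetime_writes_pb(capacity_tb, dwpd=7, years=5):
    """Total data written over the rated period, in PB (1 PB = 1000 TB)."""
    return capacity_tb * dwpd * 365 * years / 1000

for capacity_tb in (2.7, 5.4):
    total = lifetime_writes_pb(capacity_tb)
    print(f"{capacity_tb} TB drive: ~{total:.1f} PB written over 5 years")
```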
NVMe over fabric - highlights
Samstor SX5200 flash storage array
Highlights
• Flash array based on NVMe SSDs
• Storage reads/writes over 40/56/100Gbps RDMA over Ethernet, InfiniBand, or OPA
• Remote latency nearly on par with a local PCIe SSD inside the server
• Storage software that uses standard interfaces and is optimized for low-latency networks
• Random read 3000K IOPS, random write 2250K IOPS
• Linux and Windows support
NVMe over fabric – specifications & features
Samstor SX5200 flash storage array
Specifications & features
• Capacity: configurable (10.8TB to 96TB)
• Read 110 µs, write 30 µs latency
• RAID 0
• Thin provisioning, Dynamic volumes
• Snapshot
• Storage HA
• Inline replication
• Tiering
• OpenStack Cinder support
• High density P2P
• RAID 1, 10, 5, 6
• NFS 4.1, pNFS
• Deduplication
NVMe over fabric – flash array software
Samstor SX5200 flash storage array software
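• Achieves low latency and high bandwidth by reducing CPU overhead and optimizing the path from the network to flash storage.
• Uses NVMe, RDMA, and multi-core technology to deliver excellent block storage bandwidth and latency.
• The storage software target integrates with OpenStack Cinder* to provide an intuitive management framework for managing and monitoring the array's volumes.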
Storage Software Stack
NVMe over fabric – configuration
Samstor SX5200 flash storage array configuration
Block Storage Device
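• Volumes configured on the Samstor SX5200 are presented to the servers as local block storage devices
• Applications can use these block devices in the same way as other local devices such as NVMe or SATA SSDs (mounting a file system, and so on)
[Diagram: Servers connected to the Samstor Storage Array over Ethernet RDMA, InfiniBand, or OPA]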
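Because the volumes appear as ordinary block devices, applications can reach them with standard positional I/O calls. The sketch below reads and writes single 4 KiB blocks with `os.pread` and `os.pwrite` (Unix only). The helper names are hypothetical, and a temporary file stands in for the device node so the example runs without hardware or root access; on a real server the path would be whatever device node the volume is mapped to.

```python
import os
import tempfile

BLOCK_SIZE = 4096  # 4 KiB, assumed to match the "4K" I/O size in the slides

def read_block(path, block_index, block_size=BLOCK_SIZE):
    """Read one block at the given index from a block device or file."""
    fd = os.open(path, os.O_RDONLY)
    try:
        return os.pread(fd, block_size, block_index * block_size)
    finally:
        os.close(fd)

def write_block(path, block_index, data, block_size=BLOCK_SIZE):
    """Write one block at the given index; returns the number of bytes written."""
    fd = os.open(path, os.O_WRONLY)
    try:
        return os.pwrite(fd, data, block_index * block_size)
    finally:
        os.close(fd)

# Temporary file of four zeroed blocks standing in for a mapped volume.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(bytes(BLOCK_SIZE * 4))
    demo_path = tmp.name

write_block(demo_path, 2, b"x" * BLOCK_SIZE)
block = read_block(demo_path, 2)
untouched = read_block(demo_path, 1)
print(len(block), block[:4], untouched[:4])
os.remove(demo_path)
```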
NVMe over fabric – performance
Reads
Notes: identical queue depth on each client
~2.3M IOPS (4K random read) at under 200 µs latency
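The headline figure can be related to concurrency with Little's law (requests in flight = arrival rate × time in system). The sketch below treats 200 µs as an upper bound on latency and assumes "4K" means 4096 bytes; the variable names are illustrative.

```python
iops = 2.3e6        # ~2.3M 4K random reads per second (from the slide)
latency_s = 200e-6  # 200 µs upper bound on latency (from the slide)
block_bytes = 4096  # assumption: "4K" means 4 KiB

in_flight = iops * latency_s                # Little's law: L = lambda * W
throughput_gb_s = iops * block_bytes / 1e9  # decimal GB/s

print(f"I/Os in flight across all clients: at most ~{in_flight:.0f}")
print(f"Aggregate throughput: ~{throughput_gb_s:.2f} GB/s")
```

Under these assumptions, a few hundred outstanding requests spread over the clients are enough to sustain the quoted rate.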
NVMe over fabric – performance
Samstor flash storage array
Coming soon: 2 x dual 56Gb or 2 x single 100Gb
SX5200 Series: 2 x dual 40/56Gb
SX5200 Series: 2 x dual 40/56Gb or 2 x single 100Gb
Other product A
Other product B
Other product C
NVMe over fabric – data replication
[Diagram: applications, SX-Array_0 and SX-Array_1, with volumes Vol_A and Vol_B]
• The application server is the client (initiator) for Vol_A, and SX-Array_0 is the target for Vol_A
• SX-Array_0 is the client (initiator) for Vol_B, and SX-Array_1 is the target for Vol_B
1. The client writes data to Vol_A, a "local" block device that physically resides on SX-Array_0
2. SX-Array_0 performs a "background copy" to Vol_B, which appears local to it but physically resides on SX-Array_1
• The background copy is either duplicate data for redundancy or a time-based snapshot for backup
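The two-step flow above can be sketched with in-memory stand-ins. This is not the product's implementation: the classes, the queue, and the explicit `background_copy` call are hypothetical and only illustrate an array acting as target for one volume while acting as initiator toward a peer.

```python
from collections import deque

class Array:
    """In-memory stand-in for a storage array holding named volumes."""
    def __init__(self, name):
        self.name = name
        self.volumes = {}  # volume name -> {block index: data}

    def write(self, volume, block, data):
        self.volumes.setdefault(volume, {})[block] = data

class ReplicatingArray(Array):
    """Target for its own volume and initiator toward a replica on a peer."""
    def __init__(self, name, peer, source_volume, replica_volume):
        super().__init__(name)
        self.peer = peer
        self.source_volume = source_volume
        self.replica_volume = replica_volume
        self.pending = deque()  # writes not yet copied to the peer

    def write(self, volume, block, data):
        super().write(volume, block, data)      # step 1: client write lands here
        if volume == self.source_volume:
            self.pending.append((block, data))  # remember it for the background copy

    def background_copy(self):
        """Step 2: drain queued writes to the replica volume on the peer."""
        copied = 0
        while self.pending:
            block, data = self.pending.popleft()
            self.peer.write(self.replica_volume, block, data)
            copied += 1
        return copied

array_1 = Array("SX-Array_1")
array_0 = ReplicatingArray("SX-Array_0", array_1, "Vol_A", "Vol_B")

array_0.write("Vol_A", 0, b"hello")
array_0.write("Vol_A", 1, b"world")
copied = array_0.background_copy()
print(copied, array_1.volumes["Vol_B"] == array_0.volumes["Vol_A"])
```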
NVMe over fabric – use cases
Business Analytics
• Database (Oracle, MySQL, SAP)
• Real-time data analysis farms
High Performance Computing
• High speed burst buffer
• High speed network capture
• Intermediate cache for real time data analysis
Design & Automation
• Fast and complex design simulations
• Fast file load/write logging
• Database for EDA applications
Media & Entertainment
• High resolution video capture and processing
• High resolution image processing
NVMe over fabric – burst buffer
System design – burst buffer
[Diagram components: Lustre FS; Samstor Storage Platform (for use as burst buffer); Controller (server); IB switch; eight compute nodes; IB links; dual EDR link; management network]
NVMe over fabric – burst buffer
Initialization
Burst Read Buffer Volume
• Mapped as /dev/rdbuf, shared with all compute nodes
Burst Write Buffer Volumes
• Mapped as /dev/wrbuf (separate volume per compute node)
• Mapped as /dev/wrbuf[0-7]
NVMe over fabric – burst buffer
Seeding read buffer
NVMe over fabric – burst buffer
Compute node processing
• Use block I/O to read data from the burst read buffer volume
• Use block I/O to write data to the burst write buffer volumes
NVMe over fabric – burst buffer
Saving results
• Use block I/O to write results back
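The burst buffer stages (seeding the read buffer, compute node processing, saving results) can be walked through with ordinary files standing in for the devices. Everything here is an assumption made for illustration: the file names, the idea that the controller performs the seeding and the final copy, the even split of input across eight nodes, and the byte-reversal used as a placeholder computation.

```python
import os
import tempfile

NODES = 8  # eight compute nodes, matching /dev/wrbuf[0-7]

workdir = tempfile.mkdtemp()
lustre_input = os.path.join(workdir, "lustre_input")    # stand-in for a Lustre file
lustre_output = os.path.join(workdir, "lustre_output")  # stand-in for the result file
rdbuf = os.path.join(workdir, "rdbuf")                  # stand-in for /dev/rdbuf
wrbufs = [os.path.join(workdir, f"wrbuf{i}") for i in range(NODES)]

with open(lustre_input, "wb") as f:
    f.write(bytes(range(NODES)) * 512)  # 4096 bytes of sample input

# Seeding read buffer: copy the input from Lustre into the shared read buffer.
with open(lustre_input, "rb") as src, open(rdbuf, "wb") as dst:
    dst.write(src.read())

# Compute node processing: each node reads its slice with block I/O
# and writes its output to its own write buffer.
chunk = os.path.getsize(rdbuf) // NODES
for node, wrbuf in enumerate(wrbufs):
    with open(rdbuf, "rb") as src:
        src.seek(node * chunk)
        data = src.read(chunk)
    with open(wrbuf, "wb") as dst:
        dst.write(data[::-1])  # placeholder "computation"

# Saving results: gather the write buffers and write them back to Lustre.
with open(lustre_output, "wb") as out:
    for wrbuf in wrbufs:
        with open(wrbuf, "rb") as src:
            out.write(src.read())

print(os.path.getsize(lustre_input), os.path.getsize(lustre_output))
```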
NVMe over fabric – network filter and data capture
• Switch and filter node on a 100Gbps line, with a dual 100Gbps TAP
• Dual 100Gbps data write paths (write)
• Scalable data capture buffer
• Multiple clients with read-only access to data (read)
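For sizing a capture buffer, a simple calculation shows how much traffic a given capacity holds at a sustained line rate. The sketch assumes decimal units, no filtering or compression, and uses the smallest and largest array capacities listed in the specifications; the function name is illustrative.

```python
def retention_seconds(buffer_tb, line_rate_gbps):
    """Seconds of traffic a buffer holds at a sustained line rate (decimal units)."""
    bytes_per_s = line_rate_gbps * 1e9 / 8
    return buffer_tb * 1e12 / bytes_per_s

for capacity_tb in (10.8, 96):
    minutes = retention_seconds(capacity_tb, 100) / 60
    print(f"{capacity_tb} TB at 100 Gb/s: ~{minutes:.1f} minutes of capture")
```

In practice the filter node would reduce the written volume, so these figures are a lower bound on retention under the stated assumptions.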
NVMe over fabric – VMware configuration
[Diagram: hypervisor hosts, each running VM0 to VM7, attached through pass-through NICs to an Ethernet switch]
SX5200 flash storage array – specifications
Specifications
Flash Storage Array: SX5200
Rack Height: 2U/3U
Capacity (configurable)   10.8 TB (4x 2.7TB)   21.6 TB (4x 5.4TB)   48 TB (4x 12TB)   96 TB (8x 12TB)
Bandwidth, read           12.0 GB/s            12.0 GB/s            12.0 GB/s         20.0 GB/s
Bandwidth, write          9.0 GB/s             9.0 GB/s             9.0 GB/s          18 GB/s
Throughput (4K), read     3.0M IOPS            3.0M IOPS            3.0M IOPS         3.0M IOPS
Throughput (4K), write    2.25M IOPS           2.25M IOPS           2.25M IOPS        2.25M IOPS
Latency: read 110 µs, write 30 µs
I/O Connectivity: dual or single port 40/56/100Gb Ethernet, InfiniBand, OPA
Fabric Protocol: RDMA over Converged Ethernet (RoCE), InfiniBand, iWARP
Client OS: RHEL, SLES, CentOS, Ubuntu, Windows, VMware ESXi 5.5/6.0 (pass-through)
Management: CLI, GUI, RESTful API, OpenStack Cinder*
Environmental: inlet temperature 10 ~ 35°C; humidity 5 ~ 95% (non-condensing)
Power: 1100W
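The 4K throughput and bandwidth rows can be cross-checked against each other. The sketch below multiplies IOPS by block size for both readings of "4K" (4000 and 4096 bytes), since the slides do not say which is meant; the function name is illustrative.

```python
def bandwidth_gb_s(iops, block_bytes):
    """Bandwidth in decimal GB/s implied by an IOPS figure and a block size."""
    return iops * block_bytes / 1e9

for label, iops in (("read", 3.0e6), ("write", 2.25e6)):
    print(f"{label}: {bandwidth_gb_s(iops, 4000):.2f} GB/s at 4000 B, "
          f"{bandwidth_gb_s(iops, 4096):.2f} GB/s at 4096 B")
```

With 4000-byte blocks the products come to 12.0 GB/s read and 9.0 GB/s write, matching the bandwidth listed for the three smaller configurations.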