nvme over fabric solution · 2016-06-03 · nvme over roce, infiniband, opa...

NVMe over Fabric Solution Samstor SX5200

Storage advances and NVMe over fabric

Traditional data storage

FC/iSCSI SAN

Advantages:
• Storage is independent of the servers
• Convenient service provisioning
• Capacity can be reallocated

Disadvantages:
• Legacy fabric (FC/iSCSI)
• Limited bandwidth
• High latency

Local SSD per server

Advantages:
• Low latency
• High bandwidth

Disadvantages:
• Volume size is limited to the capacity of each SSD
• Fixed capacity per server
• SSD maintenance is difficult

The need for faster drives: NVMe

SSDs are becoming more common.

The bandwidth of the SATA/SAS interface is not enough for today's SSDs.

Use of NVMe, the fastest type of SSD currently available, is increasing.

NVMe: a logical device interface specification for accessing non-volatile storage media over the PCI Express bus

NVMe over fabrics

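Combines the advantages of a SAN and of local SSD per server

NVMe over RoCE, Infiniband, OPA

• Storage is independent of the servers
• Convenient service provisioning
• Capacity can be reallocated

• Low latency
• High bandwidth

• Non-proprietary architecture

• Standard Ethernet switch fabric

• Volumes are recognized as local NVMe devices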

[Diagram: four volumes (Vol) on a 10, 40, 56, 100Gb/s low-latency fabric]

About Samstor SX5200

NVMe over fabric – NVMe SSD

MX6300 PCIe NVMe SSD

Specifications

• 1Tb & 2Tb eMLC NAND Flash

• PCIe Gen3 x8

• Capacity: 2.7TB, 5.4TB, 12TB (coming soon)

• Maximum random read/write: 900K/600K IOPS

• Minimum latency: read 90 µs / write 15 µs

• 7 drive writes per day (5 years)
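
The endurance rating can be turned into a total-data-written figure with simple arithmetic. The sketch below assumes that one "drive write" means writing the full rated capacity once and that a year has 365 days; the function name is illustrative and does not come from the slides.

```python
def total_writes_tb(capacity_tb: float, dwpd: float, years: float) -> float:
    """Total data written over the rated period, in TB.

    Assumes one drive write equals the full rated capacity and a 365-day year.
    """
    return capacity_tb * dwpd * 365 * years


# Capacities listed on the slide that are already available.
for capacity_tb in (2.7, 5.4):
    tbw = total_writes_tb(capacity_tb, dwpd=7, years=5)
    print(f"{capacity_tb} TB drive: about {tbw / 1000:.1f} PB written over 5 years")
```

Under these assumptions the 2.7TB model is rated for roughly 34.5 PB of writes over five years.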

NVMe over fabric - highlights

Samstor SX5200 flash storage array

Highlights

• Flash array built on NVMe SSDs

• Storage reads and writes over 40/56/100Gbps RDMA over Ethernet, Infiniband, or OPA

• Remote latency nearly on par with a local PCIe SSD inside the server

• Uses standard interfaces; storage software optimized for low-latency networks

• Random read 3000K, random write 2250K IOPS

• Linux and Windows support

NVMe over fabric – specifications & features

Samstor SX5200 flash storage array

Specifications & features

• Capacity : Configurable (10.8TB ~ 96TB..)

• Read 110 µs, write 30 µs latency

• RAID 0

• Thin provisioning, Dynamic volumes

• Snapshot

• Storage HA

• Inline replication

• Tiering

• Openstack Cinder support

• High density P2P

• RAID 1, 10, 5, 6

• NFS 4.1, pNFS

• Deduplication

NVMe over fabric – flash array software

Samstor SX5200 flash storage array software

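• Reduces CPU overhead and optimizes the path from the network to flash storage, delivering low latency and high bandwidth.

• Uses NVMe, RDMA, and multicore technologies to deliver excellent block storage bandwidth and latency.

• The storage software target integrates with OpenStack Cinder* to provide an intuitive management framework for managing and monitoring the array's volumes.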

Storage Software Stack

NVMe over fabric – configuration

Samstor SX5200 flash storage array configuration

Block Storage Device

• Volumes configured on the Samstor SX5200 are presented to the server as local block storage devices

• Applications can use these block devices in the same way as other local devices such as NVMe or SATA SSDs (for example, mounting a file system)

[Diagram: servers connected to the Samstor storage array over Ethernet RDMA, Infiniband, or OPA]

Samstor SX5200 performance

NVMe over fabric – performance

Reads

Note: identical queue depth on each client

~2.3M IOPS (4K random read) under 200 µs latency

NVMe over fabric – performance

Writes

~2.3M IOPS (4K random write) under 110 µs latency
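
The queue-depth note can be related to these figures through Little's Law (I/Os in flight = throughput × latency). The sketch below treats the quoted latency bound as if it were the average latency, which is an assumption; if the real average is lower, the number of I/Os in flight is lower too. The function name is illustrative.

```python
def outstanding_ios(iops: float, latency_us: float) -> float:
    """Little's Law: average I/Os in flight = arrival rate x time in system."""
    return iops * latency_us / 1_000_000


reads_in_flight = outstanding_ios(2_300_000, 200)   # 4K random read figure
writes_in_flight = outstanding_ios(2_300_000, 110)  # 4K random write figure
print(f"reads: ~{reads_in_flight:.0f} in flight, writes: ~{writes_in_flight:.0f} in flight")
```

With identical queue depth on each of N clients, every client would need a depth of about 460/N to sustain the read figure under this assumption.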

NVMe over fabric – performance

Samstor flash storage array

Coming soon: 2 x dual 56Gb or 2 x single 100Gb

SX5200 Series: 2 x dual 40/56Gb

SX5200 Series: 2 x dual 40/56Gb or 2 x single 100Gb

Other product A

Other product B

Other product C

Samstor SX5200 data protection

NVMe over fabric – data replication

[Diagram: applications; SX-Array_0 with Vol_A and Vol_B; SX-Array_1 with Vol_B]

• The application server is the client (initiator) for Vol_A; SX-Array_0 is the target for Vol_A

• SX-Array_0 is the client (initiator) for Vol_B; SX-Array_1 is the target for Vol_B

1. The client writes data to Vol_A, a "local" block device that physically resides on SX-Array_0

2. SX-Array_0 performs a "background copy" to Vol_B, which appears local but physically resides on SX-Array_1

• The background copy is either duplicate data for redundancy or a time-based snapshot for backup
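
The two-step flow above can be illustrated with a toy in-memory model. This is only a sketch of the initiator/target roles: the `Array` class, its methods, and the queue-based copy are hypothetical and say nothing about how the SX5200 software is actually implemented.

```python
from collections import deque


class Array:
    """Toy stand-in for a storage array: holds volumes and a queue of pending copies."""

    def __init__(self, name):
        self.name = name
        self.volumes = {}        # volume name -> {block number: data}
        self.replica_of = {}     # local volume -> (remote array, remote volume)
        self.pending = deque()   # writes not yet copied to the remote volume

    def create_volume(self, vol):
        self.volumes[vol] = {}

    def write(self, vol, block, data):
        """Target role: store the block, then queue it if the volume is replicated."""
        self.volumes[vol][block] = data
        if vol in self.replica_of:
            self.pending.append((vol, block, data))

    def background_copy(self):
        """Initiator role: drain queued writes to the remote target volume."""
        while self.pending:
            vol, block, data = self.pending.popleft()
            remote_array, remote_vol = self.replica_of[vol]
            remote_array.write(remote_vol, block, data)


array0, array1 = Array("SX-Array_0"), Array("SX-Array_1")
array0.create_volume("Vol_A")
array1.create_volume("Vol_B")
array0.replica_of["Vol_A"] = (array1, "Vol_B")

array0.write("Vol_A", 0, b"payload")   # step 1: client writes to Vol_A
array0.background_copy()               # step 2: SX-Array_0 copies to Vol_B
```

The client only ever talks to SX-Array_0; the copy to SX-Array_1 happens separately from the client's write path, which is what "background" refers to in this model.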

NVMe over fabric use cases

NVMe over fabric – use cases

Business Analytics
• Database (Oracle, MySQL, SAP)
• Real-time data analysis farms

High Performance Computing
• High speed burst buffer
• High speed network capture
• Intermediate cache for real time data analysis

Design & Automation
• Fast and complex design simulations
• Fast file load/write logging
• Database for EDA applications

Media & Entertainment
• High resolution video capture and processing
• High resolution image processing

NVMe over fabric – burst buffer

System design – burst buffer

[Diagram components: Lustre FS; Samstor Storage Platform (for use as burst buffer); Controller (server); IB switch; eight compute nodes; IB link, dual EDR link, and IB links; management network]

NVMe over fabric – burst buffer

Initialization

Burst Read Buffer Volume
• Mapped as /dev/rdbuf
• Mapped as /dev/rdbuf, shared with all compute nodes

Burst Write Buffer Volumes
• Mapped as /dev/wrbuf[0-7]
• Mapped as /dev/wrbuf (separate volume per compute node)

NVMe over fabric – burst buffer

Seeding read buffer

NVMe over fabric – burst buffer

Compute node processing

Use block I/O to read data

Use block I/O to write data
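
As a rough sketch of what block I/O from a compute node might look like, the example below reads and writes fixed-size blocks at block-aligned offsets using POSIX `pread`/`pwrite`. The 4096-byte block size and the helper names are assumptions, and a temporary file stands in for the burst buffer devices so the code runs without the hardware.

```python
import os
import tempfile

BLOCK_SIZE = 4096  # assumed I/O size; not specified for the burst buffer in the slides


def read_block(path: str, index: int, size: int = BLOCK_SIZE) -> bytes:
    """Read one block at a block-aligned offset."""
    fd = os.open(path, os.O_RDONLY)
    try:
        return os.pread(fd, size, index * size)
    finally:
        os.close(fd)


def write_block(path: str, index: int, data: bytes, size: int = BLOCK_SIZE) -> int:
    """Write one block at a block-aligned offset and flush it to the device."""
    fd = os.open(path, os.O_WRONLY)
    try:
        written = os.pwrite(fd, data, index * size)
        os.fsync(fd)
        return written
    finally:
        os.close(fd)


# Stand-in for a burst buffer device: a four-block file filled with zeros.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(bytes(BLOCK_SIZE * 4))
    device = tmp.name

write_block(device, 2, b"\x01" * BLOCK_SIZE)
result = read_block(device, 2)
os.remove(device)
```

On a real compute node the `device` path would be the mapped read or write buffer volume instead of the temporary file.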

NVMe over fabric – burst buffer

Saving results

Use block I/O to write results back

NVMe over fabric – network filter and data capture

Switch

Filter Node

100Gbps Line

Dual 100Gbps TAP

Dual 100Gbps data write paths

Scalable data capture buffer

Multiple clients with read-only access to data

Write / Read

NVMe over fabric – VMware configuration

[Diagram: hypervisor hosts running VM0 through VM7, with pass-through NICs and an Ethernet switch]

Samstor SX5200 specifications

SX5200 flash storage array – specifications

Specifications

Flash Storage Array SX5200

Rack height: 2U/3U

                          4x 2.7TB     4x 5.4TB     4x 12TB      8x 12TB
Capacity (configurable)   10.8 TB      21.6 TB      48 TB        96 TB
Bandwidth, read           12.0 GB/s    12.0 GB/s    12.0 GB/s    20.0 GB/s
Bandwidth, write          9.0 GB/s     9.0 GB/s     9.0 GB/s     18 GB/s
Throughput (4K), read     3.0M IOPS    3.0M IOPS    3.0M IOPS    3.0M IOPS
Throughput (4K), write    2.25M IOPS   2.25M IOPS   2.25M IOPS   2.25M IOPS

Latency: read 110 µs, write 30 µs

I/O connectivity: dual or single port 40/56/100Gb Ethernet, Infiniband, OPA

Fabric protocol: RDMA over Converged Ethernet (RoCE), Infiniband, iWARP

Client OS: RHEL, SLES, CentOS, Ubuntu, Windows, VMware ESXi 5.5/6.0 (pass-through)

Management: CLI, GUI, RESTful API, OpenStack Cinder*

Environmental: inlet temperature 10 ~ 35°C; humidity 5 ~ 95% (non-condensing)

Power: 1100W
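
The bandwidth, throughput, and capacity figures in the specifications can be cross-checked with a little arithmetic. The sketch below assumes that "4K" means 4096-byte I/Os, that GB/s is decimal (10^9 bytes per second), and that the listed capacities are raw drive totals; the function names are illustrative.

```python
def bandwidth_gb_s(iops: float, io_bytes: int = 4096) -> float:
    """Bandwidth implied by an IOPS figure at a fixed I/O size, in decimal GB/s."""
    return iops * io_bytes / 1e9


def raw_capacity_tb(drive_count: int, drive_tb: float) -> float:
    """Raw capacity of a configuration, ignoring any RAID or formatting overhead."""
    return drive_count * drive_tb


print(f"read:  {bandwidth_gb_s(3.0e6):.1f} GB/s")
print(f"write: {bandwidth_gb_s(2.25e6):.1f} GB/s")
for count, size in ((4, 2.7), (4, 5.4), (4, 12), (8, 12)):
    print(f"{count} x {size} TB = {raw_capacity_tb(count, size):.1f} TB")
```

With 4096-byte I/Os, 3.0M IOPS corresponds to about 12.3 GB/s and 2.25M IOPS to about 9.2 GB/s, close to the 12.0 GB/s and 9.0 GB/s listed for the four-drive configurations.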

Thank you