TRANSCRIPT
G2M Research Multi-Vendor Webinar: Computational Storage – Use Cases
Tuesday, June 22, 2019
SPONSORED BY:
Webinar Agenda
9:00-9:05   Ground Rules and Webinar Topic Introduction (G2M Research)
9:06-9:29   Sponsoring Vendor presentations on topic (6 minutes each)
9:30-9:31   Audience Survey 1 (2 minutes)
9:32-9:41   Key Question 1 (2-minute question; 2 minutes response per vendor)
9:42-9:43   Audience Survey 2 (2 minutes)
9:44-9:53   Key Question 2 (2-minute question; 3 minutes response per vendor)
9:54-9:55   Audience Survey 3 (2 minutes)
9:56-10:05  Key Question 3 (2-minute question; 3 minutes response per vendor)
10:06-10:18 Audience Q&A (13 minutes)
10:19-10:20 Wrap-Up
G2M Research Introduction and Ground Rules
Mike Heumann, Managing Partner, G2M Research
Panelists
• JB Baker, Sr. Director, Product Management, ScaleFlux (www.scaleflux.com)
• Pankaj Mehra, VP, Product Planning Team, Samsung (www.samsung.com)
• Stephen Bates, Chief Technology Officer, Eideticom (www.eideticom.com)
• Scott Shadley, VP of Marketing, NGD Systems (www.ngdsystems.com)
Host/Emcee: Mike Heumann, Managing Partner, G2M Research (www.g2minc.com)
What is Computational Storage?
Computational Storage is a means to accelerate application execution by colocating processing with data storage
It is an extension of the data locality concept (put processing where the data resides, vs moving data to the processor)
The exponential growth in the size of data sets has been a motivational factor (“petabyte-scale real-time analytics”)
The goals of computational storage are:
– Reduce the time to perform petabyte-scale computations
– Reduce the energy and cooling for petabyte-scale analytics
– Reduce the cost and footprint for petabyte-scale analytics
– Allow processor power to scale linearly with the size of the problem’s data set
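To make the data-locality idea concrete, here is a minimal, purely illustrative Python sketch: the host-side path reads every record across the I/O bus before filtering, while the in-storage path returns only the matching records. The HypotheticalCSD class is invented for this example and is not any vendor's or SNIA-defined interface.

```python
# Purely illustrative sketch; HypotheticalCSD is invented for this example
# and is not any vendor's or SNIA-defined API.
records = [{"id": i, "temp": i % 100} for i in range(1_000_000)]

def host_side_filter(storage):
    # Classical path: every record crosses the I/O bus to the host,
    # which then discards most of them.
    return [r for r in storage if r["temp"] > 95]

class HypotheticalCSD:
    """Stands in for a drive that can run a filter next to the data."""
    def __init__(self, data):
        self._data = data

    def scan_filter(self, predicate):
        # In-storage path: only matching records ever leave the device.
        return [r for r in self._data if predicate(r)]

csd = HypotheticalCSD(records)
assert host_side_filter(records) == csd.scan_filter(lambda r: r["temp"] > 95)
```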
Is Computational Storage a New Concept?
The idea of localizing storage with processors really started with massively parallel processors
– Goodyear MPP (1983)
Content-Addressable Storage (CAS) also contained some of the attributes of computational storage (search capabilities built into storage)
– EMC Centera (2001)
– IBM DR550
Computational Storage can theoretically be performed at the storage chip (flash ASIC), storage device (SSD), system, memory, or potentially other levels
Potential Challenges to Computational Storage Market Acceptance
Application Modification: Whenever applications have to be modified to accommodate a technology, adoption times are significantly lengthened.
Go-To-Market Channel Adoption: Are computational storage devices available from standard IT sources (SIs, resellers, OEMs)?
“Crossing The Chasm”: Many IT buyers won’t adopt a new technology until it has been adopted by “mainstream” companies.
Standardization: Do standards for the technology exist to help avoid vendor lock-in?
SNIA Computational Storage TWG: Interoperability Between Computational Storage Devices
SNIA Computational Storage TWG founded in October 2018; 40+ participating companies, 128+ individual members
SNIA TWG: New Product Categories
• Computational Storage Device (CSx)
• Computational Storage Drive (CSD)
• Computational Storage Processor (CSP)
• Computational Storage Array (CSA)
[Architecture diagram: Host Agents 1…N connect over a fabric (PCIe, Ethernet, etc.) to a Computational Storage Processor, a Computational Storage Drive, a traditional storage device, and a Computational Storage Array. The CSD and CSA combine storage with Computational Storage (CS) resources and Computational Storage Processor(s), accessed via the CSP and/or directly to storage, with optional proxied and optional direct storage access; hosts use CS drivers over the standard I/O and management paths, and the CSA adds array control.]
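As a rough way to read the taxonomy above, the following Python sketch (illustrative only; the classes and methods are invented, not SNIA-defined interfaces) models a CSP as compute services without persistent storage, a CSD as persistent storage plus those services, and a CSA as a collection of such devices behind array control.

```python
# Illustrative model of the CSx taxonomy; all classes and methods are invented
# for this sketch and are not SNIA-defined interfaces.
import zlib
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional

@dataclass
class ComputationalStorageProcessor:  # CSP: services, no persistent storage
    services: Dict[str, Callable[[bytes], bytes]] = field(default_factory=dict)

    def run(self, service: str, data: bytes) -> bytes:
        return self.services[service](data)

@dataclass
class ComputationalStorageDrive:  # CSD: persistent storage plus a CSP
    csp: ComputationalStorageProcessor
    blocks: Dict[int, bytes] = field(default_factory=dict)

    def write(self, lba: int, data: bytes) -> None:
        self.blocks[lba] = data

    def read(self, lba: int, service: Optional[str] = None) -> bytes:
        data = self.blocks[lba]
        # Optional in-drive processing before data crosses the fabric.
        return self.csp.run(service, data) if service else data

@dataclass
class ComputationalStorageArray:  # CSA: devices behind array control
    drives: List[ComputationalStorageDrive]

# Example: a "compress" service executed where the data lives.
csd = ComputationalStorageDrive(ComputationalStorageProcessor({"compress": zlib.compress}))
csd.write(0, b"payload " * 64)
print(len(csd.read(0)), len(csd.read(0, service="compress")))
```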
Samsung – Pankaj Mehra, Vice President, Product Planning Team (www.samsung.com)
Smarts: Computational Storage
[Diagram: today's architectures vs. storage and data acceleration. Today, heavyweight compute sits in the CPU/FPGA/GPU, forcing large data transfers from SSDs holding TB/PB/EB datasets, with CPU performance and I/O bandwidth bottlenecks (U.2, CPU PCIe lanes). With an accelerator beside the SSD controller and NAND, data movement is minimized, processing runs concurrently with transfer bandwidth, CPU load is reduced, and processing happens near the data.]
Introducing SmartSSD
PM983F AIC PoC Results
• SmartSSD PM983F announced at Samsung Tech Day (10.17.2018); built from a Xilinx FPGA, Samsung controller, and Samsung V-NAND
• SmartSSD PCIe add-in card shown successfully integrated with Bigstream; several data-intensive workloads easily ported
• For I/O-bound workloads, SmartSSD showed 3x to 4x better performance with scalability:
– Financial BI (VWAP*), throughput (MOPS): 3.3x, PM983F vs. PM983
– Database (MariaDB), TPC-H score (geo. mean): 3.5x, PM983F vs. PM983
– Airline data analysis (Spark), query execution time: improvements of 1.8x, 1.9x, and 4x across 1, 2, and 4 PM983F configurations vs. PM983
* VWAP: Volume Weighted Average Price
Scaling Out, Not Sprawling Out
• Dense Spark nodes
• I/O-bound tasks:
– Scan-filter (ad hoc condition)
– Aggregate (custom average)
– Map/Tag (newly trained detector)
• Two choices:
– Funnel data out of SSDs into the CPU/GPU/FPGA, only to discard most of it, reduce it down to a scalar, or, worse still, write it all back only slightly modified
– Process near the data locally, concurrently, without funneling
• Talk "Arrow" to your SSD (see the sketch below)
[Diagram: data at rest flows through parse, compress/decompress, encrypt/decrypt, index, stats, and scan-filter stages inside the drive.]
• Pack an entire DB storage engine into each drive
• Save expensive server CPUs for monetizable DWUs, DTUs
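As a rough illustration of the "talk Arrow" idea, here is a sketch of scan-filter pushdown plus a VWAP-style aggregate using the pyarrow package; the file and column names are invented stand-ins, and a local Parquet file plays the role of an Arrow-aware drive. An actual computational drive would serve the same kind of projection-plus-filter request so that only the reduced result crosses the host interface.

```python
# Rough illustration only (assumes `pip install pyarrow`; a local Parquet file
# stands in for an Arrow-aware drive). The point is the shape of the request:
# a column projection plus a filter, so only the reduced result is materialized.
import pyarrow as pa
import pyarrow.compute as pc
import pyarrow.dataset as ds
import pyarrow.parquet as pq

# Small sample table standing in for data at rest on the device.
table = pa.table({"symbol": ["A", "B", "A", "C"],
                  "price": [10.0, 20.0, 30.0, 40.0],
                  "volume": [100, 200, 300, 400]})
pq.write_table(table, "trades.parquet")

# Scan-filter with projection: the kind of work a computational drive could do
# before any data crosses the host interface.
dataset = ds.dataset("trades.parquet")
filtered = dataset.to_table(columns=["price", "volume"],
                            filter=ds.field("symbol") == "A")

# Aggregate (a VWAP-style custom average) over the already-reduced result.
vwap = (pc.sum(pc.multiply(filtered["price"], filtered["volume"])).as_py()
        / pc.sum(filtered["volume"]).as_py())
print(f"VWAP for symbol A: {vwap:.2f}")
```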
Unlimited Concurrency
• Dense storage nodes
• Density not frontend-limited:
– Not just compression and encryption
– Offload entire threads
• Scaling to high device counts:
– Processing capacity: enough processing and memory in each device to handle its own work of compaction and dedup
– Data bandwidth: each device has an internal data path to processing; net data processing bandwidth scales linearly
• I/O for virtual machines
[Diagram: data at rest flows through block, dedupe, compress/decompress, and encrypt/decrypt stages inside the drive.]
• Pack a whole processing stack into each drive, then scale the number of devices to keep up with the amount of data
SmartSSD Technology Roadmap
• Roadmap to smaller form factor (U.2) and greater integration with the SSD controller:
– External FPGA: moves data out of the SSD via CPU & memory; limited scalability (funnels 3 SSDs into 1 FPGA)
– SmartSSD
– Next-gen SmartSSD: U.2 form factor scales processing to 24 or 48 devices, with greater integration and more bandwidth
SmartSSD Acceleration Use Cases
SmartSSD Platform
ScaleFlux – JB Baker, Sr. Director, Product Management (www.scaleflux.com)
ScaleFlux: Computational Storage Leader
• Low-latency flash storage + compute engines on industry-standard infrastructure
• "Most Innovative Startup," FMS 2018
• HQ in San Jose, CA (US incorporated); offices in Beijing and Shanghai, China; founded Oct 2014
• Well funded by venture & corporate investors; rich patent portfolio: 42 filed, 8 issued
• 1st production computational storage
• Hyperscale, webscale & enterprise customers; multiple Tier-1 system OEMs qualified; 1000s of production drives shipped worldwide
Turnkey Application Deployments
• 200% latency consistency – customer-specific workload, CSD vs. NVMe
• 200% queries per second – Sysbench OLTP write-only, CSD vs. NVMe
• 161% jobs completed – Teragen/Terasort 3.0, CSD+HDD vs. HDD only (*applies to HDFS & …)
• Up to 500% GZIP write throughput – FIO, CSD GZIP vs. CPU GZIP
• 260% GZIP write throughput – YCSB load benchmark, CSD vs. NVMe
Hadoop: Compression and Erasure Coding (Teragen + Terasort)
• All benchmark configurations use HDD as main storage
• 24 Mappers/Reducers per Datanode × 9 = 216 total (better performance on CSD reported with lower Mapper/Reducer counts possible)
• Datanode config: dual E5-2640v3, 128GB DRAM, 12×6TB SAS HDD
…
Baseline configuration vs. adding one ScaleFlux CSS 1000 to each server (Hadoop 3.1 w/ EC (6+3)). Run times in seconds; lower is better.
• Teragen: baseline (HDD storage, CPU GZIP, CPU/ISA-L EC) 405 s → with CSS (HDD storage, CSS GZIP, CSS EC) 155 s; 62% lower run time, 261% job throughput; compressed output 1TB → 281GB and 1TB → 296GB
• Terasort: baseline (HDD storage, Snappy HDD temp, CPU GZIP, CPU/ISA-L EC) 2787 s → with CSS (HDD storage, Snappy CSS flash temp, CSS GZIP, CSS EC) 1842 s; 34% lower run time, 151% job throughput
• Teragen + Terasort (compute AND flash): baseline 3192 s → with CSS 1997 s; 37% lower run time, 160% job throughput
The CSS 1000 alleviates CPU & storage I/O bottlenecks to improve run time & job throughput.
Application Note available for more detailed analysis
CSD HBase / YCSB Benchmark: Impact of csszlib and CSS flash (bucket cache)
Test setup:
• 1x 3.2TB CSS 1000 AIC per node (HDFS bucket cache, compressed); Datanodes 1-3
• Dual E5-2640 v3 @ 2.6GHz, 128GB DRAM, 12x 7200 RPM 6TB SAS HDDs; all data is stored on HDD for all configs
• Linux CentOS 7.4; HBase 1.2.5 (ScaleFlux GitHub); Hadoop 2.7.3 (ScaleFlux GitHub)
• Heap size 32GB; on-heap cache 1.6GB; memstore 1.6GB; short-circuit reads enabled
• 10GbE switch; YCSB client and name node; YCSB 0.12.0 + CSS update; YCSB data compressibility modified
Reported chart results: 53% ↓, ~2.6X ↑, 18X ↑
Adding the CSS 1000:
• Enables GZIP storage savings at near "no compression" performance
• Dramatically improves run time
Open ZFS: GZIP Compression Offload – up to 5x throughput vs. CPU GZIP and 23% storage capacity savings vs. LZ4
Test setup: ZFS storage server with dual E5-2640 v3 @ 2.6GHz and 128GB DRAM; CSS 1000 (SW 2.3.1, FPGA 6136); ZFS version 0.7.12-1; FIO benchmark, 100% random write; data transfer size = ZFS record size.
• ZFS throughput by record size & compression type (higher is better): up to 5X write throughput for CSS zlib vs. CPU GZIP
• Size on disk for Canterbury corpus files* (lower is better), comparing no compression, LZ4, CSS zlib, and GZIP: 23% storage savings vs. LZ4
Using CSS 1000 compute engines:
• Accelerates GZIP throughput vs. the CPU
• Reduces storage consumption vs. LZ4
Application Note available for more detailed analysis
*http://corpus.canterbury.ac.nz
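For context on how a size-on-disk comparison like the Canterbury-corpus chart is typically measured, here is a host-side sketch (not the CSS 1000 offload path) that compares gzip and LZ4 compressed sizes over a directory of sample files. It assumes the third-party `lz4` package is installed and that sample files sit in a local `corpus/` directory; both are assumptions for this example.

```python
# Host-side sketch only (not the CSS 1000 offload path): how a size-on-disk
# comparison like the Canterbury-corpus chart is typically measured.
# Assumes `pip install lz4` and sample files under ./corpus/.
import gzip
import os
import lz4.frame

def compressed_sizes(path):
    raw = open(path, "rb").read()
    return {
        "raw": len(raw),
        "lz4": len(lz4.frame.compress(raw)),
        "gzip": len(gzip.compress(raw, compresslevel=6)),
    }

totals = {"raw": 0, "lz4": 0, "gzip": 0}
for name in os.listdir("corpus"):
    for key, value in compressed_sizes(os.path.join("corpus", name)).items():
        totals[key] += value

# Storage savings of gzip-class compression relative to LZ4
# (the slide reports ~23% for CSS zlib vs. LZ4).
savings = 1 - totals["gzip"] / totals["lz4"]
print(f"gzip vs. LZ4 size on disk: {savings:.1%} smaller")
```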
SNIA Computational Storage Terminology
• Computational Storage Processor (CSP): A component that provides Computational Storage Services to a storage system without providing persistent storage
NoLoad® Computational Storage Processor (CSP)
Eideticom’s NoLoad® CSP is purpose built for acceleration of storage and compute intensive workloads
NoLoad® CSP = NVMe Computational Accelerators
• Storage Accelerators: Compression, Encryption, Erasure Coding, Deduplication, RAID
• Compute Accelerators: Data Analytics, AI and ML
NoLoad® CSP = Consumable Accelerators
• NVMe Standards-based Interface
• Leverages existing NVMe eco-system
• In-box drivers for all major OS
• It Just Works!
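Because the device enumerates as a standard NVMe endpoint, the stock OS driver is all a host needs in order to see it. As a minimal, Linux-only sketch (reading the sysfs attributes the in-box nvme driver already exposes; this is a generic illustration, not an Eideticom tool):

```python
# Minimal sketch: a device that presents as a standard NVMe endpoint shows up
# through ordinary OS enumeration with in-box drivers; no vendor driver needed.
# Linux-only: reads the sysfs entries created by the stock nvme driver.
from pathlib import Path

def list_nvme_controllers():
    controllers = []
    for ctrl in sorted(Path("/sys/class/nvme").glob("nvme*")):
        model = (ctrl / "model").read_text().strip() if (ctrl / "model").exists() else "?"
        serial = (ctrl / "serial").read_text().strip() if (ctrl / "serial").exists() else "?"
        controllers.append((ctrl.name, model, serial))
    return controllers

for name, model, serial in list_nvme_controllers():
    print(f"{name}: model={model} serial={serial}")
```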
[Diagram: NoLoad® IP runs in an FPGA or ASIC and is delivered on a U.2 FPGA card, COTS PCIe FPGA cards, or customer platforms.]
NoLoad® CSP – Platforms
NoLoad® CSP U.2
• Standard U.2 SSD form-factor: Utilizing SFF-8639 connector.
NoLoad® CSP Alveo
• Standard GPU form-factor: x16 PCIe
• Deployed on Xilinx Alveo U200, U250, or U280
PCIe Gen4
• 16GB/s of data ingestion/egestion.
Eideticom NoLoad IP:
• NVM Express end-point
• Storage and Compute Accelerators
• NVMe SGL support
• CMB and P2P support
Available now, in both the storage form factor (U.2) and the classic form factor (PCIe add-in card)
NoLoad® CSP – Software Stack
• NoLoad® CSP & Hardware Evaluation Kits
• No changes to OS; use in-box NVMe drivers; use NVMe-MI for management
• Both kernel and user-space frameworks supported
[Stack diagram: hardware and OS layers beneath user-space components – libnoload, SPDK, applications, and management tools (nvme-cli, nvme-of, etc.).]
NVMe support to be standardized!
Eideticom End Solutions – RocksDB Acceleration
Details:
• Eideticom's NoLoad CSP
• Xilinx Alveo U280 (HBM)
• Dell R7425 PowerEdge server
• RocksDB
• Linux operating system (4.20 kernel, Ubuntu 18.04 LTS)
• 2 NoLoad instances with compression offload
Bottom line:
• 6x more transactions per second
• 2.5x more efficient
• 4x reduced NAND costs
• Improved QoS
[Stack diagram: RocksDB running over libnoload and management tools (nvme-cli, nvme-of, etc.).]
Eideticom End Solutions – Compression Offload in ZFS
Details:
• Eideticom's NoLoad CSP in U.2
• ZFS updated to integrate directly with Eideticom's NoLoad CSP
• Benchmarking on a Dell R7425 AMD EPYC server with Dell NVMe SSDs and a scalable number of NoLoad U.2s
• Supports burst-buffer architectures by providing a fast storage layer
• Test set with 30% compressible data
Bottom line:
• 3+ GB/s compression per NoLoad U.2
• 26X improvement in CPU loading
• ZFS performance scales linearly with number of NoLoads
Eideticom HQ: 3553 31st NW, Calgary, AB, Canada T2L 2K7
Eideticom (Bay Area): 168 South Park, San Francisco, CA 94107, USA
www.eideticom.com
Contact: [email protected]
Market Innovator with In-Situ Processing – 1st deployed NVMe computational storage
Off-the-shelf NVMe storage platform – industry-leading NVMe capacity available
ASIC-based solution: larger scale, lower power, better TCO
NGD Systems Computational Storage
Building a P-CSS CSD for Scale and Ease
Needed key attributes for ease of customer use:
• Use standard protocols (NVMe)
• Minimize data movement (faster response, lower cost per result)
• Improve capacity and power to maximize customer TCO
[Diagram: host platform connects via the standard NVMe protocol to the core solution stack, moving compute to data for AI workloads.]
Programmable Computational Storage Services on the 1st M.2 NVMe Computational Storage Drive
How to Deploy the NGD Solution – A Look at the HW Platform
Single-chip Solution reduces latency and improves results
A Look at In-Situ Processing
Form Factor  | Available in 2019 | Capacity (TB) | Max Power (W)
M.2 22110    | NOW               | up to 8       | 8
EDSFF E1.S   | Q3                | up to 16      | 12
EDSFF E1.L   | Q3                | up to 32      | 12
U.2 15mm     | NOW               | up to 32      | 12
AiC FHTQL    |                   | up to 64      | 15
Solutions and Products
• >10x faster image classification
• 40% faster in 1/3 the space; lower TCO
• Native, real-time applications
• 500x faster image matching
• OS running on drive
Amplifying TCO for Hadoop
• Baseline datanode config: single E5-2620v4, 32GB DRAM, 12×8TB SAS HDD; 18U total, density in 18U = 864TB; 9 cores for data processing
• NGD datanode config: single E5-2620v4, 32GB DRAM, 36×8TB NVMe; 3U total, density in 3U = 864TB; 432 cores for data processing; 8 host cores with NGD M.2 computational SSDs
• @ scale: saves power, saves space, saves time!
Scalability and Energy Savings with CNN
Improvements: queries/sec by 6x; energy consumption (kJ) by 3X
Proven with Microsoft Research
Bringing Intelligence to Storage
Thank You – Scott Shadley, VP Marketing
@SMShadley, @[email protected]
www.NGDSystems.com
Panel Discussion
Audience Survey Question #1
How familiar is your organization with computational storage? (check one; xx responses):
• Very familiar/interested; we identified use cases for computational storage within our organization: 37%
• Familiar/interested; we have studied computational storage for possible use in our organization: 37%
• Familiar/rejected; we have studied it, but we don't think that there are any use cases for it in our organization: 0%
• I am familiar with computational storage, but I don't know to what extent my organization has looked at it: 24%
• Unfamiliar with the concept of computational storage: 3%
Panel Question #1
What do you see as the leading use case today for computational storage?
– NGD Systems
– Samsung
– ScaleFlux
– Eideticom
Audience Survey Question #2
What do you see as the advantages of computational storage over classical server-based approaches? (check all that apply; xx responses):
• Reduction/elimination of data movement: 79%
• Reduction of power consumption and cooling: 33%
• Reduction of physical footprint: 30%
• Reduction in equipment cost: 36%
• Not convinced there are any advantages: 3%
Panel Question #2
What are the typical savings that can be realized by using computational storage for the right use cases?
– Samsung
– ScaleFlux
– Eideticom
– NGD Systems
Audience Survey Question #3
What do you see as the greatest challenges to implementing computational storage in your organization? (select all that apply; xx responses):
• Finding relevant use cases for computational storage: 25%
• Getting applications to support computational storage: 54%
• Needing to integrate solutions ourselves (DIY): 33%
• Being able to acquire the solution from standard channels such as OEMs, integrators, or resellers: 38%
• Getting adequate support to achieve the expected performance advantages of computational storage: 25%
• Having the expertise within our organization to use it: 13%
Panel Question #3
What do you think is necessary to overcome the ecosystem challenges that computational storage faces?
– ScaleFlux
– Eideticom
– NGD Systems
– Samsung
Audience Q&A
Thank You For Attending