TRANSCRIPT
G2M Research Multi-Vendor Webinar: Computational Storage – Use Cases
Tuesday, June 22, 2019
SPONSORED BY:
Webinar Agenda
9:00-9:05   Ground Rules and Webinar Topic Introduction (G2M Research)
9:06-9:29   Sponsoring Vendor presentations on topic (6 minutes each)
9:30-9:31   Audience Survey 1 (2 minutes)
9:32-9:41   Key Question 1 (2-minute question; 2 minutes response per vendor)
9:42-9:43   Audience Survey 2 (2 minutes)
9:44-9:53   Key Question 2 (2-minute question; 3 minutes response per vendor)
9:54-9:55   Audience Survey 3 (2 minutes)
9:56-10:05  Key Question 3 (2-minute question; 3 minutes response per vendor)
10:06-10:18 Audience Q&A (13 minutes)
10:19-10:20 Wrap-Up
G2M Research Introduction and Ground Rules
Mike Heumann, Managing Partner, G2M Research
Panelists
• JB Baker, Sr. Director, Product Management, ScaleFlux (www.scaleflux.com)
• Pankaj Mehra, VP, Product Planning Team, Samsung (www.samsung.com)
• Stephen Bates, Chief Technology Officer, Eideticom (www.eideticom.com)
• Scott Shadley, VP of Marketing, NGD Systems (www.ngdsystems.com)
Host/Emcee: Mike Heumann, Managing Partner, G2M Research (www.g2minc.com)
What is Computational Storage?
Computational Storage is a means to accelerate application execution by colocating processing with data storage
It is an extension of the data locality concept (put processing where the data resides, vs moving data to the processor)
The exponential growth in the size of data sets has been a motivational factor (“petabyte-scale real-time analytics”)
The goals of computational storage are:
– Reduce the time to perform petabyte-scale computations
– Reduce the energy and cooling for petabyte-scale analytics
– Reduce the cost and footprint for petabyte-scale analytics
– Allow processor power to scale linearly with the size of the problem’s data set
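To make the data-locality idea concrete, here is a minimal, purely illustrative Python sketch: the host-side path reads every record across the I/O bus before filtering, while the in-storage path returns only the matching records. The HypotheticalCSD class is invented for this example and is not any vendor's or SNIA-defined interface.

```python
# Purely illustrative sketch; HypotheticalCSD is invented for this example
# and is not any vendor's or SNIA-defined API.
records = [{"id": i, "temp": i % 100} for i in range(1_000_000)]

def host_side_filter(storage):
    # Classical path: every record crosses the I/O bus to the host,
    # which then discards most of them.
    return [r for r in storage if r["temp"] > 95]

class HypotheticalCSD:
    """Stands in for a drive that can run a filter next to the data."""
    def __init__(self, data):
        self._data = data

    def scan_filter(self, predicate):
        # In-storage path: only matching records ever leave the device.
        return [r for r in self._data if predicate(r)]

csd = HypotheticalCSD(records)
assert host_side_filter(records) == csd.scan_filter(lambda r: r["temp"] > 95)
```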
Is Computational Storage a New Concept?
The idea of localizing storage with processors really started with massively parallel processors
– Goodyear MPP (1983)
Content-Addressable Storage (CAS) also contained some of the attributes of computational storage (search capabilities built into storage)
– EMC Centera (2001)
– IBM DR550
Computational Storage can theoretically be performed at the storage chip (flash ASIC), storage device (SSD), system, memory, or potentially other levels
Potential Challenges to Computational Storage Market Acceptance
Application Modification: Whenever applications have to be modified to accommodate a technology, adoption times are significantly lengthened.
Go-To-Market Channel Adoption: Are computational storage devices available from standard IT sources (SIs, resellers, OEMs)?
“Crossing The Chasm”: Many IT buyers won’t adopt a new technology until it has been adopted by “mainstream” companies.
Standardization: Do standards for the technology exist to help avoid vendor lock-in?
SNIA Computational Storage TWG: Interoperability Between Computational Storage Devices
SNIA Computational Storage TWG founded in October 2018; 40+ participating companies, 128+ individual members
SNIA TWG: New Product Categories
• Computational Storage Device (CSx)
• Computational Storage Drive (CSD)
• Computational Storage Processor (CSP)
• Computational Storage Array (CSA)
[Architecture diagram: Host Agents 1…N connect over a fabric (PCIe, Ethernet, etc.) to a Computational Storage Processor, a Computational Storage Drive, a traditional storage device, and a Computational Storage Array. The CSD and CSA combine storage with Computational Storage (CS) resources and Computational Storage Processor(s), accessed via the CSP and/or directly to storage, with optional proxied and optional direct storage access; hosts use CS drivers over the standard I/O and management paths, and the CSA adds array control.]
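As a rough way to read the taxonomy above, the following Python sketch (illustrative only; the classes and methods are invented, not SNIA-defined interfaces) models a CSP as compute services without persistent storage, a CSD as persistent storage plus those services, and a CSA as a collection of such devices behind array control.

```python
# Illustrative model of the CSx taxonomy; all classes and methods are invented
# for this sketch and are not SNIA-defined interfaces.
import zlib
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional

@dataclass
class ComputationalStorageProcessor:  # CSP: services, no persistent storage
    services: Dict[str, Callable[[bytes], bytes]] = field(default_factory=dict)

    def run(self, service: str, data: bytes) -> bytes:
        return self.services[service](data)

@dataclass
class ComputationalStorageDrive:  # CSD: persistent storage plus a CSP
    csp: ComputationalStorageProcessor
    blocks: Dict[int, bytes] = field(default_factory=dict)

    def write(self, lba: int, data: bytes) -> None:
        self.blocks[lba] = data

    def read(self, lba: int, service: Optional[str] = None) -> bytes:
        data = self.blocks[lba]
        # Optional in-drive processing before data crosses the fabric.
        return self.csp.run(service, data) if service else data

@dataclass
class ComputationalStorageArray:  # CSA: devices behind array control
    drives: List[ComputationalStorageDrive]

# Example: a "compress" service executed where the data lives.
csd = ComputationalStorageDrive(ComputationalStorageProcessor({"compress": zlib.compress}))
csd.write(0, b"payload " * 64)
print(len(csd.read(0)), len(csd.read(0, service="compress")))
```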
Samsung – Pankaj Mehra, Vice President, Product Planning Team (www.samsung.com)
Smarts: Computational Storage
[Diagram: today's architectures vs. storage and data acceleration. Today, heavyweight compute sits in the CPU/FPGA/GPU, forcing large data transfers from SSDs holding TB/PB/EB datasets, with CPU performance and I/O bandwidth bottlenecks (U.2, CPU PCIe lanes). With an accelerator beside the SSD controller and NAND, data movement is minimized, processing runs concurrently with transfer bandwidth, CPU load is reduced, and processing happens near the data.]
Introducing SmartSSD
PM983F AIC PoC Results
• SmartSSD PM983F announced at Samsung Tech Day (10.17.2018); built from a Xilinx FPGA, Samsung controller, and Samsung V-NAND
• SmartSSD PCIe add-in card shown successfully integrated with Bigstream; several data-intensive workloads easily ported
• For I/O-bound workloads, SmartSSD showed 3x to 4x better performance with scalability:
– Financial BI (VWAP*), throughput (MOPS): 3.3x, PM983F vs. PM983
– Database (MariaDB), TPC-H score (geo. mean): 3.5x, PM983F vs. PM983
– Airline data analysis (Spark), query execution time: improvements of 1.8x, 1.9x, and 4x across 1, 2, and 4 PM983F configurations vs. PM983
* VWAP: Volume Weighted Average Price
Scaling Out, Not Sprawling Out
• Dense Spark nodes
• I/O-bound tasks:
– Scan-filter (ad hoc condition)
– Aggregate (custom average)
– Map/Tag (newly trained detector)
• Two choices:
– Funnel data out of SSDs into the CPU/GPU/FPGA, only to discard most of it, reduce it down to a scalar, or, worse still, write it all back only slightly modified
– Process near the data locally, concurrently, without funneling
• Talk "Arrow" to your SSD (see the sketch below)
[Diagram: data at rest flows through parse, compress/decompress, encrypt/decrypt, index, stats, and scan-filter stages inside the drive.]
• Pack an entire DB storage engine into each drive
• Save expensive server CPUs for monetizable DWUs, DTUs
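As a rough illustration of the "talk Arrow" idea, here is a sketch of scan-filter pushdown plus a VWAP-style aggregate using the pyarrow package; the file and column names are invented stand-ins, and a local Parquet file plays the role of an Arrow-aware drive. An actual computational drive would serve the same kind of projection-plus-filter request so that only the reduced result crosses the host interface.

```python
# Rough illustration only (assumes `pip install pyarrow`; a local Parquet file
# stands in for an Arrow-aware drive). The point is the shape of the request:
# a column projection plus a filter, so only the reduced result is materialized.
import pyarrow as pa
import pyarrow.compute as pc
import pyarrow.dataset as ds
import pyarrow.parquet as pq

# Small sample table standing in for data at rest on the device.
table = pa.table({"symbol": ["A", "B", "A", "C"],
                  "price": [10.0, 20.0, 30.0, 40.0],
                  "volume": [100, 200, 300, 400]})
pq.write_table(table, "trades.parquet")

# Scan-filter with projection: the kind of work a computational drive could do
# before any data crosses the host interface.
dataset = ds.dataset("trades.parquet")
filtered = dataset.to_table(columns=["price", "volume"],
                            filter=ds.field("symbol") == "A")

# Aggregate (a VWAP-style custom average) over the already-reduced result.
vwap = (pc.sum(pc.multiply(filtered["price"], filtered["volume"])).as_py()
        / pc.sum(filtered["volume"]).as_py())
print(f"VWAP for symbol A: {vwap:.2f}")
```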
Unlimited Concurrency
• Dense storage nodes
• Density not frontend-limited:
– Not just compression and encryption
– Offload entire threads
• Scaling to high device counts:
– Processing capacity: enough processing and memory in each device to handle its own work of compaction and dedup
– Data bandwidth: each device has an internal data path to processing; net data processing bandwidth scales linearly
• I/O for virtual machines
[Diagram: data at rest flows through block, dedupe, compress/decompress, and encrypt/decrypt stages inside the drive.]
• Pack a whole processing stack into each drive, then scale the number of devices to keep up with the amount of data
SmartSSD Technology Roadmap
• Roadmap to smaller form factor (U.2) and greater integration with the SSD controller:
– External FPGA: moves data out of the SSD via CPU & memory; limited scalability (funnels 3 SSDs into 1 FPGA)
– SmartSSD
– Next-gen SmartSSD: U.2 form factor scales processing to 24 or 48 devices, with greater integration and more bandwidth
SmartSSD Acceleration Use Cases
SmartSSD Platform
ScaleFlux – JB Baker, Sr. Director, Product Management (www.scaleflux.com)
ScaleFlux: Computational Storage Leader
• Low-latency flash storage + compute engines on industry-standard infrastructure
• "Most Innovative Startup," FMS 2018
• HQ in San Jose, CA (US incorporated); offices in Beijing and Shanghai, China; founded Oct 2014
• Well funded by venture & corporate investors; rich patent portfolio: 42 filed, 8 issued
• 1st production computational storage
• Hyperscale, webscale & enterprise customers; multiple Tier-1 system OEMs qualified; 1000s of production drives shipped worldwide
Turnkey Application Deployments
• 200% latency consistency – customer-specific workload, CSD vs. NVMe
• 200% queries per second – Sysbench OLTP write-only, CSD vs. NVMe
• 161% jobs completed – Teragen/Terasort 3.0, CSD+HDD vs. HDD only (*applies to HDFS & …)
• Up to 500% GZIP write throughput – FIO, CSD GZIP vs. CPU GZIP
• 260% GZIP write throughput – YCSB load benchmark, CSD vs. NVMe
Hadoop: Compression and Erasure Coding (Teragen + Terasort)
• All benchmark configurations use HDD as main storage
• 24 Mappers/Reducers per Datanode × 9 = 216 total (better performance on CSD reported with lower Mapper/Reducer counts possible)
• Datanode config: dual E5-2640v3, 128GB DRAM, 12×6TB SAS HDD
…
Baseline configuration vs. adding one ScaleFlux CSS 1000 to each server (Hadoop 3.1 w/ EC (6+3)). Run times in seconds; lower is better.
• Teragen: baseline (HDD storage, CPU GZIP, CPU/ISA-L EC) 405 s → with CSS (HDD storage, CSS GZIP, CSS EC) 155 s; 62% lower run time, 261% job throughput; compressed output 1TB → 281GB and 1TB → 296GB
• Terasort: baseline (HDD storage, Snappy HDD temp, CPU GZIP, CPU/ISA-L EC) 2787 s → with CSS (HDD storage, Snappy CSS flash temp, CSS GZIP, CSS EC) 1842 s; 34% lower run time, 151% job throughput
• Teragen + Terasort (compute AND flash): baseline 3192 s → with CSS 1997 s; 37% lower run time, 160% job throughput
The CSS 1000 alleviates CPU & storage I/O bottlenecks to improve run time & job throughput.
Application Note available for more detailed analysis
CSD HBase / YCSB Benchmark: Impact of csszlib and CSS flash (bucket cache)
Test setup:
• 1x 3.2TB CSS 1000 AIC per node (HDFS bucket cache, compressed); Datanodes 1-3
• Dual E5-2640 v3 @ 2.6GHz, 128GB DRAM, 12x 7200 RPM 6TB SAS HDDs; all data is stored on HDD for all configs
• Linux CentOS 7.4; HBase 1.2.5 (ScaleFlux GitHub); Hadoop 2.7.3 (ScaleFlux GitHub)
• Heap size 32GB; on-heap cache 1.6GB; memstore 1.6GB; short-circuit reads enabled
• 10GbE switch; YCSB client and name node; YCSB 0.12.0 + CSS update; YCSB data compressibility modified
Reported chart results: 53% ↓, ~2.6X ↑, 18X ↑
Adding the CSS 1000:
• Enables GZIP storage savings at near "no compression" performance
• Dramatically improves run time
Open ZFS: GZIP Compression Offload – up to 5x throughput vs. CPU GZIP and 23% storage capacity savings vs. LZ4
Test setup: ZFS storage server with dual E5-2640 v3 @ 2.6GHz and 128GB DRAM; CSS 1000 (SW 2.3.1, FPGA 6136); ZFS version 0.7.12-1; FIO benchmark, 100% random write; data transfer size = ZFS record size.
• ZFS throughput by record size & compression type (higher is better): up to 5X write throughput for CSS zlib vs. CPU GZIP
• Size on disk for Canterbury corpus files* (lower is better), comparing no compression, LZ4, CSS zlib, and GZIP: 23% storage savings vs. LZ4
Using CSS 1000 compute engines:
• Accelerates GZIP throughput vs. the CPU
• Reduces storage consumption vs. LZ4
Application Note available for more detailed analysis
*http://corpus.canterbury.ac.nz
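For context on how a size-on-disk comparison like the Canterbury-corpus chart is typically measured, here is a host-side sketch (not the CSS 1000 offload path) that compares gzip and LZ4 compressed sizes over a directory of sample files. It assumes the third-party `lz4` package is installed and that sample files sit in a local `corpus/` directory; both are assumptions for this example.

```python
# Host-side sketch only (not the CSS 1000 offload path): how a size-on-disk
# comparison like the Canterbury-corpus chart is typically measured.
# Assumes `pip install lz4` and sample files under ./corpus/.
import gzip
import os
import lz4.frame

def compressed_sizes(path):
    raw = open(path, "rb").read()
    return {
        "raw": len(raw),
        "lz4": len(lz4.frame.compress(raw)),
        "gzip": len(gzip.compress(raw, compresslevel=6)),
    }

totals = {"raw": 0, "lz4": 0, "gzip": 0}
for name in os.listdir("corpus"):
    for key, value in compressed_sizes(os.path.join("corpus", name)).items():
        totals[key] += value

# Storage savings of gzip-class compression relative to LZ4
# (the slide reports ~23% for CSS zlib vs. LZ4).
savings = 1 - totals["gzip"] / totals["lz4"]
print(f"gzip vs. LZ4 size on disk: {savings:.1%} smaller")
```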
SNIA Computational Storage Terminology
• Computational Storage Processor (CSP): A component that provides Computational Storage Services to a storage system without providing persistent storage
NoLoad® Computational Storage Processor (CSP)
Eideticom’s NoLoad® CSP is purpose built for acceleration of storage and compute intensive workloads
NoLoad® CSP = NVMe Computational Accelerators
• Storage Accelerators: Compression, Encryption, Erasure Coding, Deduplication, RAID
• Compute Accelerators: Data Analytics, AI and ML
NoLoad® CSP = Consumable Accelerators
• NVMe Standards-based Interface
• Leverages existing NVMe eco-system
• In-box drivers for all major OS
• It Just Works!
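Because the device enumerates as a standard NVMe endpoint, the stock OS driver is all a host needs in order to see it. As a minimal, Linux-only sketch (reading the sysfs attributes the in-box nvme driver already exposes; this is a generic illustration, not an Eideticom tool):

```python
# Minimal sketch: a device that presents as a standard NVMe endpoint shows up
# through ordinary OS enumeration with in-box drivers; no vendor driver needed.
# Linux-only: reads the sysfs entries created by the stock nvme driver.
from pathlib import Path

def list_nvme_controllers():
    controllers = []
    for ctrl in sorted(Path("/sys/class/nvme").glob("nvme*")):
        model = (ctrl / "model").read_text().strip() if (ctrl / "model").exists() else "?"
        serial = (ctrl / "serial").read_text().strip() if (ctrl / "serial").exists() else "?"
        controllers.append((ctrl.name, model, serial))
    return controllers

for name, model, serial in list_nvme_controllers():
    print(f"{name}: model={model} serial={serial}")
```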
[Diagram: NoLoad® IP runs in an FPGA or ASIC and is delivered on a U.2 FPGA card, COTS PCIe FPGA cards, or customer platforms.]
NoLoad® CSP – Platforms
NoLoad® CSP U.2
• Standard U.2 SSD form-factor: Utilizing SFF-8639 connector.
NoLoad® CSP Alveo
• Standard GPU form-factor: x16 PCIe
• Deployed on Xilinx Alveo U200, U250, or U280
PCIe Gen4
• 16GB/s of data ingestion/egestion.
Eideticom NoLoad IP:
• NVM Express end-point
• Storage and Compute Accelerators
• NVMe SGL support
• CMB and P2P support
Available now, in both the storage form factor (U.2) and the classic form factor (PCIe add-in card)
NoLoad® CSP – Software Stack
• NoLoad® CSP & Hardware Evaluation Kits
• No changes to OS; use in-box NVMe drivers; use NVMe-MI for management
• Both kernel and user-space frameworks supported
[Stack diagram: hardware and OS layers beneath user-space components – libnoload, SPDK, applications, and management tools (nvme-cli, nvme-of, etc.).]
NVMe support to be standardized!
Eideticom End Solutions – RocksDB Acceleration
Details:
• Eideticom's NoLoad CSP
• Xilinx Alveo U280 (HBM)
• Dell R7425 PowerEdge server
• RocksDB
• Linux operating system (4.20 kernel, Ubuntu 18.04 LTS)
• 2 NoLoad instances with compression offload
Bottom line:
• 6x more transactions per second
• 2.5x more efficient
• 4x reduced NAND costs
• Improved QoS
[Stack diagram: RocksDB running over libnoload and management tools (nvme-cli, nvme-of, etc.).]
Eideticom End Solutions – Compression Offload in ZFS
Details:
• Eideticom's NoLoad CSP in U.2
• ZFS updated to integrate directly with Eideticom's NoLoad CSP
• Benchmarking on a Dell R7425 AMD EPYC server with Dell NVMe SSDs and a scalable number of NoLoad U.2s
• Supports burst-buffer architectures by providing a fast storage layer
• Test set with 30% compressible data
Bottom line:
• 3+ GB/s compression per NoLoad U.2
• 26X improvement in CPU loading
• ZFS performance scales linearly with number of NoLoads
Eideticom HQ: 3553 31st NW, Calgary, AB, Canada T2L 2K7
Eideticom (Bay Area): 168 South Park, San Francisco, CA 94107, USA
www.eideticom.com
Contact: [email protected]
Market Innovator with In-Situ Processing – 1st deployed NVMe computational storage
Off-the-shelf NVMe storage platform – industry-leading NVMe capacity available
ASIC-based solution: larger scale, lower power, better TCO
NGD Systems Computational Storage
Building a P-CSS CSD for Scale and Ease
Needed key attributes for ease of customer use:
• Use standard protocols (NVMe)
• Minimize data movement (faster response, lower cost per result)
• Improve capacity and power to maximize customer TCO
[Diagram: host platform connects via the standard NVMe protocol to the core solution stack, moving compute to data for AI workloads.]
Programmable Computational Storage Services on the 1st M.2 NVMe Computational Storage Drive
How to Deploy the NGD Solution – A Look at the HW Platform
Single-chip Solution reduces latency and improves results
A Look at In-Situ Processing
Form Factor  | Available in 2019 | Capacity (TB) | Max Power (W)
M.2 22110    | NOW               | up to 8       | 8
EDSFF E1.S   | Q3                | up to 16      | 12
EDSFF E1.L   | Q3                | up to 32      | 12
U.2 15mm     | NOW               | up to 32      | 12
AiC FHTQL    |                   | up to 64      | 15
Solutions and Products
• >10x faster image classification
• 40% faster in 1/3 the space; lower TCO
• Native, real-time applications
• 500x faster image matching
• OS running on drive
Amplifying TCO for Hadoop
• Baseline datanode config: single E5-2620v4, 32GB DRAM, 12×8TB SAS HDD; 18U total, density in 18U = 864TB; 9 cores for data processing
• NGD datanode config: single E5-2620v4, 32GB DRAM, 36×8TB NVMe; 3U total, density in 3U = 864TB; 432 cores for data processing; 8 host cores with NGD M.2 computational SSDs
• @ scale: saves power, saves space, saves time!
Scalability and Energy Savings with CNN
Improvements: queries/sec by 6x; energy consumption (kJ) by 3X
Proven with Microsoft Research
Bringing Intelligence to Storage
Thank You – Scott Shadley, VP Marketing
@SMShadley, @[email protected]
www.NGDSystems.com
Panel Discussion
Audience Survey Question #1
How familiar is your organization with computational storage? (check one; xx responses):
• Very familiar/interested; we identified use cases for computational storage within our organization: 37%
• Familiar/interested; we have studied computational storage for possible use in our organization: 37%
• Familiar/rejected; we have studied it, but we don't think that there are any use cases for it in our organization: 0%
• I am familiar with computational storage, but I don't know to what extent my organization has looked at it: 24%
• Unfamiliar with the concept of computational storage: 3%
Panel Question #1
What do you see as the leading use case today for computational storage?
– NGD Systems
– Samsung
– ScaleFlux
– Eideticom
Audience Survey Question #2
What do you see as the advantages of computational storage over classical server-based approaches? (check all that apply; xx responses):
• Reduction/elimination of data movement: 79%
• Reduction of power consumption and cooling: 33%
• Reduction of physical footprint: 30%
• Reduction in equipment cost: 36%
• Not convinced there are any advantages: 3%
Panel Question #2
What are the typical savings that can be realized by using computational storage for the right use cases?
– Samsung
– ScaleFlux
– Eideticom
– NGD Systems
Audience Survey Question #3
What do you see as the greatest challenges to implementing computational storage in your organization? (select all that apply; xx responses):
• Finding relevant use cases for computational storage: 25%
• Getting applications to support computational storage: 54%
• Needing to integrate solutions ourselves (DIY): 33%
• Being able to acquire the solution from standard channels such as OEMs, integrators, or resellers: 38%
• Getting adequate support to achieve the expected performance advantages of computational storage: 25%
• Having the expertise within our organization to use it: 13%
Panel Question #3
What do you think is necessary to overcome the ecosystem challenges that computational storage faces?
– ScaleFlux
– Eideticom
– NGD Systems
– Samsung
Audience Q&A
Thank You For Attending