enabling data analytics in a ssd - flash memory summit data analytics in a ssd a converged platform...
TRANSCRIPT
Enabling Data Analytics in a SSD
A Converged Platform Vladimir Alves, PhD CTO – NxGn Data
Flash Memory Summit 2015 Santa Clara, CA
1
The Evolution of Storage
Flash Memory Summit 2015 Santa Clara, CA
2
Since 1956 storage devices have been performing the same basic functions: read and write and sometimes fail…
Computational Storage
Flash Memory Summit 2015 Santa Clara, CA
4
A Converged Pla.orm that takes compute to data
High performance storage
Computational capabilities
Virtualization
A Paradigm Shift in Storage
Flash Memory Summit 2015 Santa Clara, CA
5
Flash Media
HOST I/F & FLASH MANAGEMENT
OS
HW ACCELERATION
VIRTUALIZATION (CONTAINER) SW AGENTS VIRTUAL TUNNEL
(TCP/IP)
+
In-‐Situ processing
A Paradigm Shift in Storage
Flash Memory Summit 2015 Santa Clara, CA
6
+
In-‐Situ processing
§ ~10x accelera&on in data analy&cs § Virtually eliminate network & interface traffic § 90% reduc&on in energy consump&on § 75% cost reduc&on
In-‐Situ Processing
A Paradigm Shift in Storage
Flash Memory Summit 2015 Santa Clara, CA
7
+
In-‐Situ processing
Distributed Grep on Amazon S3/ CloudFiles Ø Low-‐cost search/scan feature in cloud-‐storage
In-‐situ Cloud–enabled MapReduce Ø Low-‐cost MapReduce embedded into cloud-‐storage
Swi]/Ceph Storage Node Ø Lightweight, low-‐cost Storage Node for scalability
Making Hadoop Fly in Clouds Ø Smarter cloud storage for performance op&miza&on
And more…
System Use Case: MapReduce
Flash Memory Summit 2015 Santa Clara, CA
9
Leveraging In-‐Storage Computa&on
System Use Case: Distributed Grep
Flash Memory Summit 2015 Santa Clara, CA
10
Common Unix command for search: grep -Eh <regex> <inDir>/* | sort | uniq -c | sort -nr - counts lines in all files in <inDir> that match <regex> and displays the counts in descending order
C B B C
C A
3 C 1 A
Result File 2 File 1
System Use Case: Distributed Grep
Flash Memory Summit 2015 Santa Clara, CA
11
Map function in this case: - input is (file offset, line) - output is either:
1. an empty list [] (the line does not match) 2. a key-value pair [(line, 1)] (if it matches)
Reduce function in this case: - input is (line, [1, 1, ...]) - output is (line, n) where n is the number of 1s in the list.
Future Applications
Flash Memory Summit 2015 Santa Clara, CA
12
Genome Sequencing
High Performance Scientific Computing
IoT
Facial Recognition