leveraging capi flash for apache spark · • ibm corporation 2016 • ibm, the ibm logo and...
TRANSCRIPT
© 2016 OpenPOWER Foundation
Leveraging CAPI Flash for Apache Spark
Jan S. Rellermeyer Research Staff Member
IBM Research – Austin
Thomas S. Hubregtsen Research Staff Member
IBM Research – Austin
Generation 1:
• Workload: Batch/Unstructured
• Resiliency (Hadoop): through data replication
• Key parameter: Disk bandwidth
Big Data Systems are:
Scalable
Resilient
Easy to use
Big Data Systems: Evolution
© 2016 OpenPOWER Foundation 2
Generation 2
• Workload:
Interactive/Iterative
• Resiliency (Spark):
through in-memory
re-computation
• Key parameter:
Memory capacity
Big Data Systems are:
Scalable
Resilient
Easy to use
Generation 1:
• Workload: Batch/Unstructured
• Resiliency (Hadoop): through data replication
• Key parameter: Disk bandwidth
Big Data Systems: Evolution
© 2016 OpenPOWER Foundation 3
Source: http://www.businessinsider.com/
ibm-spark-good-news-for-databricks-2015-6?IR=T,
June 2015
Source:
http://spark.apache.org/
Big Data Systems: Spark
© 2016 OpenPOWER Foundation 4
spill
0
50000
100000
150000
200000
250000
300000
350000
400000
1GB 2GB 4GB 8GB 16GB 32GB 64GB 128GB
Ru
nti
me (
ms)
Total Heap Memory
x Degrees of Separation on Spark
2x
Experiment: Artificially reducing available memory
© 2016 OpenPOWER Foundation 5
The amount of data is growing exponentially
© 2016 OpenPOWER Foundation 6
Source: Calista Redmon; OpenPOWER summit 2016
Flash: Fast storage or slow memory? Price ($/TB raw) $$$$$
Capacity ≤ 1 TB per system
Access Latency ~100ns
BW ~12GB/s
Price ($/TB raw) $$$
Capacity Up to 56 TB per system
Access Latency ~100us (~1M IOPS)
BW ~8GB/s
Price ($/TB raw) $
Capacity ≥ 6 TB per drive (capacity optimized)
Access Latency ~5ms (~75-100 IOPS) (7200 RPM)
BW ~0.1GB/s
© 2016 OpenPOWER Foundation 7
Source: Brad Brech
DRAM Flash HDD Unit
Type DDR 1600 SATA SATA
Bandwidth 1 0.1 0.01 Gb/s/$
IOPS 1,000,000 1,000 1 IOPS/$
Capacity 100 1,000 10,000 MB/$
Latency 100 50,000 5,000,000 Ns
Characteristics of different storage media (in orders of magnitude)
© 2016 OpenPOWER Foundation 8
Source: Peter Hofstee
Source:
Objective Analysis,
August 2007
Source:
http://blogs-images.forbes.com/jimhandy/files/2014/04/2014-04-30-
DRAM-NAND-Price-per-GB-Trends-2010-2014-paint.jpg, 2014
Trend: Price DRAM vs NAND Flash
© 2016 OpenPOWER Foundation 9
Trend: Bandwidth DRAM vs SSD
© 2016 OpenPOWER Foundation 10
Source: Sandisk IT blog
• Algorithm: Six degrees of separation from Kevin Bacon
• Input set: 10,000 movies, 1-101 actors per movie
• Hardware: IBM Power System S882L - two 12-core 3.02 GHz Power8 processor cards - 512 GB DRAM - Single Hard Disk Drive - Flash storage: IBM FlashSystem 840
• Software: - Ubuntu 14.04 Little Endian - Apache Spark 1.3
Experiment: Spilling files to Flash
© 2016 OpenPOWER Foundation 11
*Not using the unified memory model
*
Data management in Apache Spark
© 2016 OpenPOWER Foundation 12
1
10
100
1000
10000
1 2 3 4 5 6 7 8 9 1011121314151617181920212223242526272829303132333435363738394041424344454647484950
Exe
cuti
on
tim
e in
mili
seco
nd
s, u
sin
g a
loga
rith
mic
sca
le
Available memory in gigabytes
Spark on HDD
Spark on ramdisk
Baseline
1.01x speedup
Proxy experiment: Spilling files to DRAM
© 2016 OpenPOWER Foundation 13
CAPI: The advantages
• Accelerators can work with the same memory addresses that the processors use
• Removes OS and device driver overhead
Source: https://openpowerfoundation.org/blogs/capi-drives-business-performance/ © 2016 OpenPOWER Foundation 14
• Read/write commands issued via APIs from applications to eliminate 97% of code path length • Saves 20-30 cores per 1M IOPS
CAPI: Reduction in code path
© 2016 OpenPOWER Foundation 15
Source: Brad Brech/Damir Jamsek
Identical hardware with 2 different paths to data
FlashSystem
Conventional
I/O (FC) CAPI
IBM POWER S822L >5x better IOPS
per HW thread
>2x lower latency
CAPI: IOPS and Latency difference
© 2016 OpenPOWER Foundation 16 Source: Brad Brech/Damir Jamsek
• Algorithm: Six degrees of separation from Kevin Bacon
• Input set: 10,000 movies, 1-101 actors per movie
• Hardware: IBM Power System S882L - two 12-core 3.02 GHz Power8 processor cards - 512 GB DRAM - Single Hard Disk Drive - Flash storage: IBM FlashSystem 840 connected using CAPI
• Software: - Ubuntu 14.04 Little Endian - Apache Spark 1.3
Experiment: Spilling key-value pairs to Flash using CAPI
© 2016 OpenPOWER Foundation 17
0
50000
100000
150000
200000
250000
300000
350000
400000
1GB 2GB 4GB 8GB 16GB 32GB 64GB 128GB
Ru
nti
me (
ms)
Total Heap Memory
x Degrees of Separation on Spark
Disk
CAPI+Flash
4 x memory reduction through CAPI + Flash
Result: Reduced the memory footprint
© 2016 OpenPOWER Foundation 18
In collaboration with Jan Rellermeyer
Number of Terasort instances in parallel
Result: Increased number of parallel instances
© 2016 OpenPOWER Foundation 19
In collaboration with Jan Rellermeyer
Direct Storage • Application constrained by single-system memory
capacity. Typical growth is through additional compute nodes.
• CAPI Flash APIs offer highly-efficient flash access, increased total capacity at better $ / throughput.
Data Cache • Application uses in-memory caches for data storage,
and typically-constrained by ratios of memory to underlying storage.
• CAPI Flash APIs offer access to much larger ephemeral or persistent data in Flash, freeing up RAM.
Fast Storage • Application is constrained by IO overhead and
throughput of existing storage infrastructure.
• CAPI Flash APIs offer extremely high IO per CPU thread with low latency.
CAPI Flash acceleration: Use cases and advantages
© 2016 OpenPOWER Foundation 20
Source: Brad Brech
Do you get what you pay for?
• McKinsey (2008): data center CPU utilization is roughly 6%
• Gartner (2012): industry-wide utilization rate is 12%
Chances are good that you started buying new servers primarily because you needed more memory banks
© 2016 OpenPOWER Foundation 21
Source: news.com/2014/07/01/ibm-set-to-open-new-softlayer-data-center-in-london
Get more Spark out of your data center Contact me at [email protected] for more information
23
• IBM Corporation 2016
• IBM, the IBM logo and ibm.com are registered trademarks, and other company, product, or service names may be trademarks or service marks of International Business Machines Corporation in the United States, other countries, or both. A current list of IBM trademarks is available on the web at “Copyright and trademark information”at www.ibm.com/legal/copytrade.shtml
• Other company, product, and service names may be trademarks or service marks of others.
• References in this publication to IBM products or services do not imply that IBM intends to make them available in all countries in which IBM operates.
• IBM and IBM Credit LLC do not, nor intend to, offer or provide accounting, tax or legal advice to clients. Clients should consult with their own financial, tax and legal advisors. Any tax or accounting treatment decisions made by or on behalf of the client are the sole responsibility of the customer.
• IBM Global Financing offerings are provided through IBM Credit LLC in the United States, IBM Canada Ltd. in Canada, and other IBM subsidiaries and divisions worldwide to qualified commercial and government clients. Rates and availability are based on a client’s credit rating, financing terms, offering type, equipment type and options, and may vary by country. Some offerings are not available in certain countries. Other restrictions may apply. Rates and offerings are subject to change, extension or withdrawal without notice.
• Tux penguin image used with permission: Larry Ewing <[email protected]>