Tachyon: An Open Source Memory-Centric Distributed Storage System
TRANSCRIPT
![Page 1: Tachyon: An Open Source Memory-Centric Distributed Storage System](https://reader031.vdocument.in/reader031/viewer/2022021813/5875bbb51a28ab33128b4763/html5/thumbnails/1.jpg)
Haoyuan Li, Tachyon [email protected]
September 30, 2015 @ Strata and Hadoop World NYC 2015
An Open Source Memory-Centric Distributed Storage System
![Page 2: Tachyon: An Open Source Memory-Centric Distributed Storage System](https://reader031.vdocument.in/reader031/viewer/2022021813/5875bbb51a28ab33128b4763/html5/thumbnails/2.jpg)
Outline
• Open Source
• Introduction to Tachyon
• New Features
• Getting Involved
![Page 3: Tachyon: An Open Source Memory-Centric Distributed Storage System](https://reader031.vdocument.in/reader031/viewer/2022021813/5875bbb51a28ab33128b4763/html5/thumbnails/3.jpg)
![Page 4: Tachyon: An Open Source Memory-Centric Distributed Storage System](https://reader031.vdocument.in/reader031/viewer/2022021813/5875bbb51a28ab33128b4763/html5/thumbnails/4.jpg)
History
• Started at UC Berkeley AMPLab
  – From summer 2012
  – Same lab produced Apache Spark and Apache Mesos
• Open sourced
  – April 2013
  – Apache License 2.0
  – Latest Release: Version 0.7.1 (August 2015)
• Deployed at > 100 companies
![Page 5: Tachyon: An Open Source Memory-Centric Distributed Storage System](https://reader031.vdocument.in/reader031/viewer/2022021813/5875bbb51a28ab33128b4763/html5/thumbnails/5.jpg)
Contributors Growth
[Figure: number of contributors at each release]
• v0.1 (Dec ’12): 1
• v0.2 (Apr ’13): 3
• v0.3 (Oct ’13): 15
• v0.4 (Feb ’14): 30
• v0.5 (Jul ’14): 46
• v0.6 (Mar ’15): 70
• v0.7 (Jul ’15): 111
![Page 6: Tachyon: An Open Source Memory-Centric Distributed Storage System](https://reader031.vdocument.in/reader031/viewer/2022021813/5875bbb51a28ab33128b4763/html5/thumbnails/6.jpg)
Contributors Growth
> 150 Contributors (3x increase since the last Strata NYC)
> 50 Organizations
![Page 7: Tachyon: An Open Source Memory-Centric Distributed Storage System](https://reader031.vdocument.in/reader031/viewer/2022021813/5875bbb51a28ab33128b4763/html5/thumbnails/7.jpg)
Contributors Growth
One of the Fastest Growing Big Data Open Source Projects
![Page 8: Tachyon: An Open Source Memory-Centric Distributed Storage System](https://reader031.vdocument.in/reader031/viewer/2022021813/5875bbb51a28ab33128b4763/html5/thumbnails/8.jpg)
Thanks to Contributors and Users!
![Page 9: Tachyon: An Open Source Memory-Centric Distributed Storage System](https://reader031.vdocument.in/reader031/viewer/2022021813/5875bbb51a28ab33128b4763/html5/thumbnails/9.jpg)
One Tachyon Production Deployment Example
• Baidu (Dominant Search Engine in China, ~ 50 Billion USD Market Cap)
• Framework: SparkSQL
• Under Storage: Baidu’s File System
• Storage Media: MEM + HDD
• 100+ nodes deployment
• 1PB+ managed space
• 30x Performance Improvement
![Page 10: Tachyon: An Open Source Memory-Centric Distributed Storage System](https://reader031.vdocument.in/reader031/viewer/2022021813/5875bbb51a28ab33128b4763/html5/thumbnails/10.jpg)
Outline
• Open Source
• Introduction to Tachyon
• New Features
• Getting Involved
![Page 11: Tachyon: An Open Source Memory-Centric Distributed Storage System](https://reader031.vdocument.in/reader031/viewer/2022021813/5875bbb51a28ab33128b4763/html5/thumbnails/11.jpg)
Tachyon is an Open Source Memory-Centric Distributed Storage System
![Page 12: Tachyon: An Open Source Memory-Centric Distributed Storage System](https://reader031.vdocument.in/reader031/viewer/2022021813/5875bbb51a28ab33128b4763/html5/thumbnails/12.jpg)
Why Tachyon?
![Page 13: Tachyon: An Open Source Memory-Centric Distributed Storage System](https://reader031.vdocument.in/reader031/viewer/2022021813/5875bbb51a28ab33128b4763/html5/thumbnails/13.jpg)
Performance Trend: Memory is Fast
• RAM throughput increasing exponentially
• Disk throughput increasing slowly
Memory-locality key to interactive response times
![Page 14: Tachyon: An Open Source Memory-Centric Distributed Storage System](https://reader031.vdocument.in/reader031/viewer/2022021813/5875bbb51a28ab33128b4763/html5/thumbnails/14.jpg)
Price Trend: Memory is Cheaper
source: jcmit.com
![Page 15: Tachyon: An Open Source Memory-Centric Distributed Storage System](https://reader031.vdocument.in/reader031/viewer/2022021813/5875bbb51a28ab33128b4763/html5/thumbnails/15.jpg)
Realized by many…
![Page 16: Tachyon: An Open Source Memory-Centric Distributed Storage System](https://reader031.vdocument.in/reader031/viewer/2022021813/5875bbb51a28ab33128b4763/html5/thumbnails/16.jpg)
Is the Problem Solved?
![Page 17: Tachyon: An Open Source Memory-Centric Distributed Storage System](https://reader031.vdocument.in/reader031/viewer/2022021813/5875bbb51a28ab33128b4763/html5/thumbnails/17.jpg)
Missing a Solution for the Storage Layer
![Page 18: Tachyon: An Open Source Memory-Centric Distributed Storage System](https://reader031.vdocument.in/reader031/viewer/2022021813/5875bbb51a28ab33128b4763/html5/thumbnails/18.jpg)
A Use Case Example with Spark
• Fast, in-memory data processing framework
  – Keep one in-memory copy inside JVM
  – Track lineage of operations used to derive data
  – Upon failure, use lineage to recompute data
[Figure: Lineage Tracking, a graph of map, filter, join, and reduce operations]
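The lineage idea on this slide can be illustrated with a minimal Python sketch. This is not Spark's or Tachyon's API: `Dataset`, `LineageCache`, and `lose` are hypothetical names invented for illustration, and the "failure" is simulated by dropping cached results. The sketch assumes the raw input stays available and that every transform is deterministic.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional


@dataclass
class Dataset:
    """A dataset that remembers how it was derived (its lineage)."""
    name: str
    parents: List["Dataset"] = field(default_factory=list)
    transform: Optional[Callable[..., list]] = None
    source: Optional[list] = None  # only set for durable input data


class LineageCache:
    """Keeps one in-memory copy per dataset and recomputes lost ones."""

    def __init__(self) -> None:
        self._memory: Dict[str, list] = {}
        self.computations = 0  # how many times a transform was run

    def get(self, ds: Dataset) -> list:
        if ds.name in self._memory:
            return self._memory[ds.name]
        if ds.transform is None:
            data = list(ds.source)  # reload durable input
        else:
            # Walk the lineage: materialize parents first, then re-run the op.
            inputs = [self.get(parent) for parent in ds.parents]
            data = ds.transform(*inputs)
            self.computations += 1
        self._memory[ds.name] = data
        return data

    def lose(self, name: str) -> None:
        """Simulate a failure that wipes one cached dataset."""
        self._memory.pop(name, None)


raw = Dataset("raw", source=[1, 2, 3, 4, 5, 6])
doubled = Dataset("doubled", [raw], lambda xs: [x * 2 for x in xs])
big = Dataset("big", [doubled], lambda xs: [x for x in xs if x > 6])

cache = LineageCache()
first = cache.get(big)

# Lose both derived datasets, then ask again: lineage rebuilds them.
cache.lose("big")
cache.lose("doubled")
recovered = cache.get(big)
```

After the simulated failure, `recovered` equals `first` because both derived datasets are rebuilt from `raw` by replaying the recorded operations, at the cost of running each transform a second time.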
![Page 19: Tachyon: An Open Source Memory-Centric Distributed Storage System](https://reader031.vdocument.in/reader031/viewer/2022021813/5875bbb51a28ab33128b4763/html5/thumbnails/19.jpg)
Issue 1
Data sharing is the bottleneck in the analytics pipeline: slow writes to disk
[Figure: Spark Job1 and Spark Job2, each with its own Spark mem block manager caching blocks 1 and 3, share data through HDFS / Amazon S3 (blocks 1 to 4)]
storage engine & execution engine same process (slow writes)
![Page 20: Tachyon: An Open Source Memory-Centric Distributed Storage System](https://reader031.vdocument.in/reader031/viewer/2022021813/5875bbb51a28ab33128b4763/html5/thumbnails/20.jpg)
Issue 1
Data sharing is the bottleneck in the analytics pipeline: slow writes to disk
[Figure: a Spark job (Spark mem block manager caching blocks 1 and 3) and a Hadoop MR job on YARN share data through HDFS / Amazon S3 (blocks 1 to 4)]
storage engine & execution engine same process (slow writes)
![Page 21: Tachyon: An Open Source Memory-Centric Distributed Storage System](https://reader031.vdocument.in/reader031/viewer/2022021813/5875bbb51a28ab33128b4763/html5/thumbnails/21.jpg)
Issue 1 resolved with Tachyon
Memory-speed data sharing among jobs in different frameworks
[Figure: a Spark job (Spark mem) and a Hadoop MR job on YARN share blocks through Tachyon in-memory (blocks 1, 3, 4), with HDFS / Amazon S3 on disk holding blocks 1 to 4]
execution engine & storage engine same process (fast writes)
![Page 22: Tachyon: An Open Source Memory-Centric Distributed Storage System](https://reader031.vdocument.in/reader031/viewer/2022021813/5875bbb51a28ab33128b4763/html5/thumbnails/22.jpg)
Issue 2
Cache loss when process crashes
[Figure: a Spark task whose Spark memory block manager caches blocks 1 and 3, above HDFS / Amazon S3 holding blocks 1 to 4]
execution engine & storage engine same process
![Page 23: Tachyon: An Open Source Memory-Centric Distributed Storage System](https://reader031.vdocument.in/reader031/viewer/2022021813/5875bbb51a28ab33128b4763/html5/thumbnails/23.jpg)
Issue 2
Cache loss when process crashes
[Figure: the Spark task is marked "crash"; Spark memory block manager with blocks 1 and 3, above HDFS / Amazon S3 holding blocks 1 to 4]
execution engine & storage engine same process
![Page 24: Tachyon: An Open Source Memory-Centric Distributed Storage System](https://reader031.vdocument.in/reader031/viewer/2022021813/5875bbb51a28ab33128b4763/html5/thumbnails/24.jpg)
Issue 2
Cache loss when process crashes
[Figure: the process is marked "crash"; only HDFS / Amazon S3 with blocks 1 to 4 is shown]
execution engine & storage engine same process
![Page 25: Tachyon: An Open Source Memory-Centric Distributed Storage System](https://reader031.vdocument.in/reader031/viewer/2022021813/5875bbb51a28ab33128b4763/html5/thumbnails/25.jpg)
Issue 2 resolved with Tachyon
Keep in-memory data safe, even when a job crashes.
[Figure: a Spark task with its Spark memory block manager; Tachyon in-memory holds blocks 1, 3, 4, above HDFS / Amazon S3 holding blocks 1 to 4]
execution engine & storage engine same process
![Page 26: Tachyon: An Open Source Memory-Centric Distributed Storage System](https://reader031.vdocument.in/reader031/viewer/2022021813/5875bbb51a28ab33128b4763/html5/thumbnails/26.jpg)
Issue 2 resolved with Tachyon
Keep in-memory data safe, even when a job crashes.
[Figure: the job is marked "crash"; Tachyon in-memory still holds blocks 1, 3, 4, above HDFS / Amazon S3 on disk holding blocks 1 to 4]
execution engine & storage engine same process
![Page 27: Tachyon: An Open Source Memory-Centric Distributed Storage System](https://reader031.vdocument.in/reader031/viewer/2022021813/5875bbb51a28ab33128b4763/html5/thumbnails/27.jpg)
Issue 3
In-memory Data Duplication & Java Garbage Collection
[Figure: Spark Job1 and Spark Job2 each cache their own copies of blocks 1 and 3 in a Spark mem block manager, above HDFS / Amazon S3 holding blocks 1 to 4]
execution engine & storage engine same process (duplication & GC)
![Page 28: Tachyon: An Open Source Memory-Centric Distributed Storage System](https://reader031.vdocument.in/reader031/viewer/2022021813/5875bbb51a28ab33128b4763/html5/thumbnails/28.jpg)
Issue 3 resolved with Tachyon
No in-memory data duplication, much less GC
[Figure: Spark Job1 and Spark Job2 (Spark mem) read shared blocks from Tachyon in-memory (blocks 1, 3, 4), with HDFS / Amazon S3 on disk holding blocks 1 to 4]
execution engine & storage engine same process (no duplication & GC)
![Page 29: Tachyon: An Open Source Memory-Centric Distributed Storage System](https://reader031.vdocument.in/reader031/viewer/2022021813/5875bbb51a28ab33128b4763/html5/thumbnails/29.jpg)
Previously Mentioned
• A memory-centric storage architecture
• Push lineage down to storage layer
![Page 30: Tachyon: An Open Source Memory-Centric Distributed Storage System](https://reader031.vdocument.in/reader031/viewer/2022021813/5875bbb51a28ab33128b4763/html5/thumbnails/30.jpg)
Tachyon Memory-Centric Architecture
![Page 31: Tachyon: An Open Source Memory-Centric Distributed Storage System](https://reader031.vdocument.in/reader031/viewer/2022021813/5875bbb51a28ab33128b4763/html5/thumbnails/31.jpg)
Tachyon Memory-Centric Architecture
![Page 32: Tachyon: An Open Source Memory-Centric Distributed Storage System](https://reader031.vdocument.in/reader031/viewer/2022021813/5875bbb51a28ab33128b4763/html5/thumbnails/32.jpg)
Lineage in Tachyon
![Page 33: Tachyon: An Open Source Memory-Centric Distributed Storage System](https://reader031.vdocument.in/reader031/viewer/2022021813/5875bbb51a28ab33128b4763/html5/thumbnails/33.jpg)
Outline
• Open Source
• Introduction to Tachyon
• New Features
• Getting Involved
![Page 34: Tachyon: An Open Source Memory-Centric Distributed Storage System](https://reader031.vdocument.in/reader031/viewer/2022021813/5875bbb51a28ab33128b4763/html5/thumbnails/34.jpg)
1) Ecosystem: Enable new workloads on any storage; work with the framework of your choice.
![Page 35: Tachyon: An Open Source Memory-Centric Distributed Storage System](https://reader031.vdocument.in/reader031/viewer/2022021813/5875bbb51a28ab33128b4763/html5/thumbnails/35.jpg)
2) Tachyon is running in production environments, both in the cloud and on premises.
![Page 36: Tachyon: An Open Source Memory-Centric Distributed Storage System](https://reader031.vdocument.in/reader031/viewer/2022021813/5875bbb51a28ab33128b4763/html5/thumbnails/36.jpg)
Use Case: Baidu
• Framework: SparkSQL
• Under Storage: Baidu’s File System
• Storage Media: MEM + HDD
• 100+ nodes deployment
• 1PB+ managed space
• 30x Performance Improvement
![Page 37: Tachyon: An Open Source Memory-Centric Distributed Storage System](https://reader031.vdocument.in/reader031/viewer/2022021813/5875bbb51a28ab33128b4763/html5/thumbnails/37.jpg)
Use Case: a SaaS Company
• Framework: Impala
• Under Storage: S3
• Storage Media: MEM + SSD
• 15x Performance Improvement
![Page 38: Tachyon: An Open Source Memory-Centric Distributed Storage System](https://reader031.vdocument.in/reader031/viewer/2022021813/5875bbb51a28ab33128b4763/html5/thumbnails/38.jpg)
Use Case: an Oil Company
• Framework: Spark
• Under Storage: GlusterFS
• Storage Media: MEM only
• Analyzing data in traditional storage
![Page 39: Tachyon: An Open Source Memory-Centric Distributed Storage System](https://reader031.vdocument.in/reader031/viewer/2022021813/5875bbb51a28ab33128b4763/html5/thumbnails/39.jpg)
Use Case: a SaaS Company
• Framework: Spark
• Under Storage: S3
• Storage Media: SSD only
• Elastic Tachyon deployment
![Page 40: Tachyon: An Open Source Memory-Centric Distributed Storage System](https://reader031.vdocument.in/reader031/viewer/2022021813/5875bbb51a28ab33128b4763/html5/thumbnails/40.jpg)
What if data size exceeds memory capacity?
![Page 41: Tachyon: An Open Source Memory-Centric Distributed Storage System](https://reader031.vdocument.in/reader031/viewer/2022021813/5875bbb51a28ab33128b4763/html5/thumbnails/41.jpg)
3) Tiered Storage: Tachyon Manages More Than DRAM
[Figure: storage tiers MEM, SSD, HDD; upper tiers are faster, lower tiers have higher capacity]
![Page 42: Tachyon: An Open Source Memory-Centric Distributed Storage System](https://reader031.vdocument.in/reader031/viewer/2022021813/5875bbb51a28ab33128b4763/html5/thumbnails/42.jpg)
Configurable Storage Tiers
• MEM only
• MEM + HDD
• SSD only
![Page 43: Tachyon: An Open Source Memory-Centric Distributed Storage System](https://reader031.vdocument.in/reader031/viewer/2022021813/5875bbb51a28ab33128b4763/html5/thumbnails/43.jpg)
4) Pluggable Data Management Policy
Evict stale data to lower tier
Promote hot data to upper tier
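A minimal Python sketch of how eviction and promotion across tiers could work. It is an illustration under stated assumptions, not Tachyon's implementation or API: `Tier`, `TieredStore`, and `location` are hypothetical names, capacities are counted in blocks rather than bytes, and least-recently-used (LRU) is assumed as the eviction policy, where a pluggable design would let other policies be swapped in.

```python
from collections import OrderedDict


class Tier:
    """One storage tier holding a bounded number of blocks."""

    def __init__(self, name, capacity):
        self.name = name
        self.capacity = capacity      # assumed unit: number of blocks
        self.blocks = OrderedDict()   # block_id -> data, least recent first


class TieredStore:
    """Tiers are ordered fastest first, e.g. MEM, SSD, HDD."""

    def __init__(self, tiers):
        self.tiers = tiers

    def _insert(self, block_id, data, level):
        tier = self.tiers[level]
        tier.blocks[block_id] = data
        tier.blocks.move_to_end(block_id)  # mark as most recently used
        while len(tier.blocks) > tier.capacity:
            # Evict the stalest block to the next lower tier.
            victim_id, victim = tier.blocks.popitem(last=False)
            if level + 1 < len(self.tiers):
                self._insert(victim_id, victim, level + 1)
            # Otherwise the block leaves the bottom tier entirely.

    def put(self, block_id, data):
        for tier in self.tiers:            # drop any older copy first
            tier.blocks.pop(block_id, None)
        self._insert(block_id, data, 0)

    def get(self, block_id):
        for tier in self.tiers:
            if block_id in tier.blocks:
                data = tier.blocks.pop(block_id)
                self._insert(block_id, data, 0)  # promote hot data to the top
                return data
        return None

    def location(self, block_id):
        for tier in self.tiers:
            if block_id in tier.blocks:
                return tier.name
        return None


store = TieredStore([Tier("MEM", 2), Tier("SSD", 2), Tier("HDD", 4)])
for block_id in ("b1", "b2", "b3"):
    store.put(block_id, block_id.upper())

before = store.location("b1")   # b1 was evicted when b3 arrived
store.get("b1")                 # reading b1 promotes it again
after = store.location("b1")
```

In this run `b1` is pushed down to the SSD tier when the third block arrives, and reading it moves it back to MEM while the now-stalest block, `b2`, is pushed down in its place.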
![Page 44: Tachyon: An Open Source Memory-Centric Distributed Storage System](https://reader031.vdocument.in/reader031/viewer/2022021813/5875bbb51a28ab33128b4763/html5/thumbnails/44.jpg)
Pin Data in Memory
![Page 45: Tachyon: An Open Source Memory-Centric Distributed Storage System](https://reader031.vdocument.in/reader031/viewer/2022021813/5875bbb51a28ab33128b4763/html5/thumbnails/45.jpg)
5) Transparent Naming
![Page 46: Tachyon: An Open Source Memory-Centric Distributed Storage System](https://reader031.vdocument.in/reader031/viewer/2022021813/5875bbb51a28ab33128b4763/html5/thumbnails/46.jpg)
6) Unified Namespace
![Page 47: Tachyon: An Open Source Memory-Centric Distributed Storage System](https://reader031.vdocument.in/reader031/viewer/2022021813/5875bbb51a28ab33128b4763/html5/thumbnails/47.jpg)
More Features
• 7) Remote Write Support
• 8) Easy deployment with Mesos and YARN
• 9) Initial Security Support
• 10) One Command Cluster Deployment
• 11) Metrics Reporting for Clients, Workers, and Master
![Page 48: Tachyon: An Open Source Memory-Centric Distributed Storage System](https://reader031.vdocument.in/reader031/viewer/2022021813/5875bbb51a28ab33128b4763/html5/thumbnails/48.jpg)
12) Support for More Under Storage Systems
![Page 49: Tachyon: An Open Source Memory-Centric Distributed Storage System](https://reader031.vdocument.in/reader031/viewer/2022021813/5875bbb51a28ab33128b4763/html5/thumbnails/49.jpg)
Reported Tachyon Usage
![Page 50: Tachyon: An Open Source Memory-Centric Distributed Storage System](https://reader031.vdocument.in/reader031/viewer/2022021813/5875bbb51a28ab33128b4763/html5/thumbnails/50.jpg)
Outline
• Open Source
• Introduction to Tachyon
• New Features
• Getting Involved
![Page 51: Tachyon: An Open Source Memory-Centric Distributed Storage System](https://reader031.vdocument.in/reader031/viewer/2022021813/5875bbb51a28ab33128b4763/html5/thumbnails/51.jpg)
Memory-Centric Distributed Storage
Welcome to try, contact, and collaborate!
JIRA New Contributor Tasks
![Page 52: Tachyon: An Open Source Memory-Centric Distributed Storage System](https://reader031.vdocument.in/reader031/viewer/2022021813/5875bbb51a28ab33128b4763/html5/thumbnails/52.jpg)
• Team consists of Tachyon creators, top contributors
• Series A ($7.5 million) from Andreessen Horowitz
• Committed to Tachyon Open Source
![Page 53: Tachyon: An Open Source Memory-Centric Distributed Storage System](https://reader031.vdocument.in/reader031/viewer/2022021813/5875bbb51a28ab33128b4763/html5/thumbnails/53.jpg)
![Page 54: Tachyon: An Open Source Memory-Centric Distributed Storage System](https://reader031.vdocument.in/reader031/viewer/2022021813/5875bbb51a28ab33128b4763/html5/thumbnails/54.jpg)
Strata NYC 2015
• Visit us at booth #P18.
• Check out other Tachyon related talks.
  – First-ever scalable, distributed deep learning architecture using Spark and Tachyon
    • Christopher Nguyen (Adatao, Inc.), Vu Pham (Adatao, Inc.)
    • 2:05pm–2:45pm Thursday, 10/01/2015
  – Faster time to insight using Spark, Tachyon, and Zeppelin
    • Nirmal Ranganathan (Rackspace Hosting)
    • 2:05pm–2:45pm Thursday, 10/01/2015
![Page 55: Tachyon: An Open Source Memory-Centric Distributed Storage System](https://reader031.vdocument.in/reader031/viewer/2022021813/5875bbb51a28ab33128b4763/html5/thumbnails/55.jpg)
• Try Tachyon: http://tachyon-project.org
• Develop Tachyon: https://github.com/amplab/tachyon
• Meet Friends: http://www.meetup.com/Tachyon
• Get News: http://goo.gl/mwB2sX
• Tachyon Nexus: http://www.tachyonnexus.com
• Contact us: [email protected]