cloud ppt.pptx
TRANSCRIPT
-
Resilient Distributed Datasets: A Fault-Tolerant Abstraction for In-Memory Cluster Computing
Matei Zaharia, Mosharaf Chowdhury, Tathagata Das, Ankur Dave, Justin Ma, Murphy McCauley, Michael Franklin, Scott Shenker, Ion Stoica
-
Motivation
MapReduce greatly simplified big data analysis on large, unreliable clusters.
But as soon as it got popular, users wanted more:
More complex, multi-stage applications (e.g. iterative machine learning & graph processing)
More interactive ad-hoc queries
Response: specialized frameworks for some of these apps (e.g. Pregel for graph processing)
-
Motivation
Complex apps and interactive queries both need one thing that MapReduce lacks:
Efficient primitives for data sharing
In MapReduce, the only way to share data across jobs is stable storage, which is slow!
-
Examples
[Figure: two data-sharing patterns over HDFS. (1) Iterative job: Input -> HDFS read -> iter. 1 -> HDFS write -> HDFS read -> iter. 2 -> ... (2) Ad-hoc queries: query 1, query 2, and query 3 each do their own HDFS read of the Input to produce result 1, 2, and 3.]
Slow due to replication and disk I/O, but necessary for fault tolerance.
-
Goal: In-Memory Data Sharing
[Figure: the same two patterns with data shared in memory: Input -> iter. 1 -> iter. 2 -> ... with no HDFS round-trips; one-time processing loads the Input, after which query 1, 2, and 3 read it from RAM.]
10-100x faster than network/disk, but how to get fault tolerance?
-
Challenge
How to design a distributed memory abstraction that is both fault-tolerant and efficient?
-
Challenge
Existing storage abstractions have interfaces based on fine-grained updates to mutable state
RAMCloud, databases, distributed mem, Piccolo
Requires replicating data or logs across nodes for fault tolerance
Costly for data-intensive apps: 10-100x slower than memory write
-
Solution: Resilient Distributed Datasets (RDDs)
Restricted form of distributed shared memory
Immutable, partitioned collections of records
Can only be built through coarse-grained deterministic transformations (map, filter, join, ...)
Efficient fault recovery using lineage
Log one operation to apply to many elements
Recompute lost partitions on failure
No cost if nothing fails
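To make the restricted model concrete, here is a minimal Scala sketch (assuming a SparkContext named sc, e.g. from the Spark shell; the data and names are illustrative):
val nums    = sc.parallelize(1 to 1000)   // base RDD
val evens   = nums.filter(_ % 2 == 0)     // coarse-grained op applied to every element
val squares = evens.map(n => n * n)       // each transformation yields a new immutable RDD
// Nothing has executed yet: Spark only logged the lineage
// parallelize -> filter -> map. A lost partition of squares is
// rebuilt by replaying these operations on the needed input split.
println(squares.count())                  // action: triggers the computation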
-
RDD Recovery
[Figure: the in-memory data-sharing diagrams again (one-time processing of the Input feeding query 1, 2, and 3; Input -> iter. 1 -> iter. 2 -> ...), with lost partitions rebuilt by re-running the transformations that produced them.]
-
Generality of RDDs
Despite their restrictions, RDDs can express surprisingly many parallel algorithms
These naturally apply the same operation to many items
Unify many current programming models
Data flow models: MapReduce, Dryad, SQL, ...
Specialized models for iterative apps: BSP (Pregel), iterative MapReduce (HaLoop), bulk incremental, ...
Support new apps that these models don't
-
Tradeoff Space
[Figure: a 2-D tradeoff space. X-axis: write throughput (low -> high, with network bandwidth and memory bandwidth as reference points); Y-axis: granularity of updates (fine -> coarse). Fine-grained: K-V stores, databases, RAMCloud (best for transactional workloads). Coarse-grained: HDFS (network bandwidth) and RDDs (memory bandwidth), best for batch workloads.]
-
Outline
Spark programming interface
Implementation
Demo
How people are using Spark
-
Spark Programming Interface
DryadLINQ-like API in the Scala language
Usable interactively from Scala interpreter
Provides:
Resilient distributed datasets (RDDs)
Operations on RDDs: transformations (build new RDDs), actions (compute and output results)
Control of each RDD's partitioning (layout across nodes) and persistence (storage in RAM, on disk, etc.)
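A short sketch of those two controls (assuming a SparkContext named sc and the modern org.apache.spark package names rather than the 2012 spark.* ones; the data is illustrative):
import org.apache.spark.HashPartitioner
import org.apache.spark.storage.StorageLevel

val pairs = sc.parallelize(Seq(("a", 1), ("b", 2), ("a", 3)))
val laidOut = pairs.partitionBy(new HashPartitioner(8))   // partitioning: layout across nodes
laidOut.persist(StorageLevel.MEMORY_AND_DISK)             // persistence: RAM, spilling to disk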
-
Example: Log Mining
Load error messages from a log into memory, then interactively search for various patterns
lines = spark.textFile("hdfs://...")
errors = lines.filter(_.startsWith("ERROR"))
messages = errors.map(_.split("\t")(2))
messages.persist()
messages.filter(_.contains("foo")).count
messages.filter(_.contains("bar")).count
[Figure: the Master (driver) ships tasks to three Workers, each holding one input block (Block 1-3); the workers build cached partitions Msgs. 1-3 and send results back. Annotations: lines is the base RDD; errors and messages are transformed RDDs; count is an action.]
Result: full-text search of Wikipedia in <1 sec (vs 20 sec for on-disk data)
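The same example as a self-contained program, for reference (a sketch: the HDFS path stays a placeholder as on the slide, and the app name and master setting are illustrative):
import org.apache.spark.{SparkConf, SparkContext}

object LogMining {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("LogMining").setMaster("local[*]"))
    val lines    = sc.textFile("hdfs://...")             // placeholder path
    val errors   = lines.filter(_.startsWith("ERROR"))   // transformation
    val messages = errors.map(_.split("\t")(2))          // transformation
    messages.persist()                                   // cache in cluster memory
    // Actions: each count reuses the cached messages RDD.
    println(messages.filter(_.contains("foo")).count())
    println(messages.filter(_.contains("bar")).count())
    sc.stop()
  }
}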
-
Fault Recovery
RDDs track the graph of transformations that built them (their lineage) to rebuild lost data
E.g.: messages = textFile(...).filter(_.contains("error")).map(_.split("\t")(2))
[Figure: the lineage chain HadoopRDD (path = hdfs://...) -> FilteredRDD (func = _.contains(...)) -> MappedRDD (func = _.split(...)).]
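This lineage can be inspected directly; a small sketch (toDebugString is an existing RDD method; the path is a placeholder):
val messages = sc.textFile("hdfs://...")
  .filter(_.contains("error"))
  .map(_.split("\t")(2))
// Prints the chain Spark would replay to rebuild a lost partition
// (HadoopRDD -> FilteredRDD -> MappedRDD in the 2012 naming).
println(messages.toDebugString)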
-
Fault Recovery Results
[Figure: iteration time (s) over 10 iterations: 119, 57, 56, 58, 58, 81, 57, 59, 57, 59. A failure happens in iteration 6; only that iteration slows (81 s) while lost partitions are recomputed, then times return to normal.]
-
Example: PageRank
1. Start each page with a rank of 1
2. On each iteration, update each page's rank to $\sum_{i \in \text{neighbors}} \text{rank}_i / |\text{neighbors}_i|$
links = // RDD of (url, neighbors) pairs
ranks = // RDD of (url, rank) pairs
for (i <- 1 to ITERATIONS) {
  ranks = links.join(ranks).flatMap {
    case (url, (links, rank)) =>
      links.map(dest => (dest, rank / links.size))
  }.reduceByKey(_ + _)
}
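One way to realize step 1 above (not shown on the slide; a sketch): derive the initial ranks from links so both RDDs share the same keys:
ranks = links.mapValues(_ => 1.0)   // every url starts with rank 1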
-
Optimizing Placement
links and ranks are repeatedly joined
Can co-partition them (e.g. hash both on URL) to avoid shuffles
Can also use app knowledge, e.g., hash on DNS name
links = links.partitionBy(new URLPartitioner())
[Figure: the iterative dataflow: Links (url, neighbors) is joined with Ranks0 (url, rank) to produce Contribs0, which a reduce turns into Ranks1; another join and reduce produce Ranks2, and so on.]
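URLPartitioner is only named, not defined, on the slide; a plausible sketch of it (using the modern org.apache.spark package name) that applies the app knowledge above by hashing on the DNS name, so all URLs from one host land in the same partition:
import org.apache.spark.Partitioner

class URLPartitioner(val numPartitions: Int) extends Partitioner {
  def getPartition(key: Any): Int = {
    val host = new java.net.URI(key.toString).getHost     // group URLs by host
    val h = if (host == null) 0 else host.hashCode
    (h % numPartitions + numPartitions) % numPartitions   // non-negative partition id
  }
}

// Co-partition links once and cache it, so the repeated joins avoid shuffles.
links = links.partitionBy(new URLPartitioner(64)).persist()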
-
PageRank Performance
[Figure: time per iteration (s): Hadoop 171, Basic Spark 72, Spark + Controlled Partitioning 23.]
-
Implementation
Runs on Mesos [NSDI 11] to share clusters w/ Hadoop
Can read from any Hadoop input source (HDFS, S3, ...)
[Figure: Spark, Hadoop, and MPI running side by side on Mesos across the cluster's nodes.]
No changes to Scala language or compiler
Reflection + bytecode analysis to correctly ship code
www.spark-project.org
-
Demo
-
Open Source Community
15 contributors, 5+ companies using Spark, 3+ applications projects at Berkeley
User applications:
Data mining 40x faster than Hadoop (Conviva)
Exploratory log analysis (Foursquare)
Traffic prediction via EM (Mobile Millennium)
Twitter spam classification (Monarch)
DNA sequence analysis (SNAP)
. . .
-
Related Work
RAMCloud, Piccolo, GraphLab, parallel DBs: fine-grained writes requiring replication for resilience
Pregel, iterative MapReduce: specialized models; can't run arbitrary / ad-hoc queries
DryadLINQ, FlumeJava: language-integrated distributed dataset API, but cannot share datasets efficiently across queries
Nectar [OSDI 10]: automatic expression caching, but over distributed FS
PacMan [NSDI 12]: memory cache for HDFS, but writes still go to network/disk
-
Behavior with Insufficient RAM
[Figure: iteration time (s) vs percent of working set in memory: 0% -> 68.8, 25% -> 58.1, 50% -> 40.7, 75% -> 29.7, 100% -> 11.5.]
-
Scalability
[Figure: iteration time (s) vs number of machines (25 / 50 / 100).
Logistic Regression: Hadoop 184 / 111 / 76; HadoopBinMem 116 / 80 / 62; Spark 15 / 6 / 3.
K-Means: Hadoop 274 / 157 / 106; HadoopBinMem 197 / 121 / 87; Spark 143 / 61 / 33.]
-
Breaking Down the Speedup
[Figure: iteration time (s) with text vs binary input: In-mem HDFS 15.4 / 8.4; In-mem local file 13.1 / 6.9; Spark RDD 2.9 / 2.9.]
-
Spark Operations
Transformations (define a new RDD): map, filter, sample, groupByKey, reduceByKey, sortByKey, flatMap, union, join, cogroup, cross, mapValues
Actions (return a result to driver program): collect, reduce, count, save, lookupKey
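A small sketch exercising a few of these operations (assuming a SparkContext named sc; the data is illustrative):
val words = sc.parallelize(Seq("spark", "rdd", "spark", "mesos"))
// Transformations: lazy, each defines a new RDD.
val counts = words.map(w => (w, 1)).reduceByKey(_ + _)
val long   = words.filter(_.length > 3).map(_.toUpperCase)
// Actions: eager, return a result to the driver program.
println(counts.collect().toSeq)   // e.g. (spark,2), (rdd,1), (mesos,1), in some order
println(long.count())             // 3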
-
Task Scheduler
Dryad-like DAGs
Pipelines functions within a stage
Locality & data reuse aware
Partitioning-aware to avoid shuffles
[Figure: an example DAG cut into stages: Stage 1 runs groupBy (A -> B); Stage 2 runs map (C -> D) and union (D, E -> F); Stage 3 joins B and F into G. Shaded dots mark cached data partitions, which the scheduler does not recompute.]
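A sketch of what the stage grouping means in code (assuming a SparkContext named sc; names illustrative): narrow transformations are pipelined within one stage, while a shuffle such as groupByKey starts a new stage:
val pipelined = sc.parallelize(1 to 100)
  .map(_ * 2)          // pipelined...
  .filter(_ % 3 == 0)  // ...into the same stage, element by element
val grouped = pipelined
  .map(n => (n % 10, n))
  .groupByKey()        // shuffle dependency: stage boundary here
println(grouped.toDebugString)   // prints the lineage with the shuffle split out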