cloud ppt.pptx
TRANSCRIPT
-
Resilient Distributed Datasets: A Fault-Tolerant Abstraction for In-Memory Cluster Computing
Matei Zaharia, Mosharaf Chowdhury, Tathagata Das, Ankur Dave, Justin Ma, Murphy McCauley, Michael Franklin, Scott Shenker, Ion Stoica
-
Motivation
MapReduce greatly simplified big data analysis on large, unreliable clusters.
But as soon as it got popular, users wanted more:
More complex, multi-stage applications (e.g. iterative machine learning & graph processing)
More interactive ad-hoc queries
Response: specialized frameworks for some of these apps (e.g. Pregel for graph processing)
-
Motivation
Complex apps and interactive queries both need one thing that MapReduce lacks:
Efficient primitives for data sharing
In MapReduce, the only way to share data across jobs is stable storage, which is slow!
-
Examples
[Figure: two data-sharing patterns over HDFS. (1) Iterative job: Input -> HDFS read -> iter. 1 -> HDFS write -> HDFS read -> iter. 2 -> ... (2) Ad-hoc queries: query 1, query 2, and query 3 each do their own HDFS read of the Input to produce result 1, 2, and 3.]
Slow due to replication and disk I/O, but necessary for fault tolerance.
-
Goal: In-Memory Data Sharing
[Figure: the same two patterns with data shared in memory: Input -> iter. 1 -> iter. 2 -> ... with no HDFS round-trips; one-time processing loads the Input, after which query 1, 2, and 3 read it from RAM.]
10-100x faster than network/disk, but how to get fault tolerance?
-
Challenge
How to design a distributed memory abstraction that is both fault-tolerant and efficient?
-
Challenge
Existing storage abstractions have interfaces based on fine-grained updates to mutable state
RAMCloud, databases, distributed mem, Piccolo
Requires replicating data or logs across nodes for fault tolerance
Costly for data-intensive apps: 10-100x slower than memory write
-
Solution: Resilient Distributed Datasets (RDDs)
Restricted form of distributed shared memory
Immutable, partitioned collections of records
Can only be built through coarse-grained deterministic transformations (map, filter, join, ...)
Efficient fault recovery using lineage
Log one operation to apply to many elements
Recompute lost partitions on failure
No cost if nothing fails
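To make the restricted model concrete, here is a minimal Scala sketch (assuming a SparkContext named sc, e.g. from the Spark shell; the data and names are illustrative):
val nums    = sc.parallelize(1 to 1000)   // base RDD
val evens   = nums.filter(_ % 2 == 0)     // coarse-grained op applied to every element
val squares = evens.map(n => n * n)       // each transformation yields a new immutable RDD
// Nothing has executed yet: Spark only logged the lineage
// parallelize -> filter -> map. A lost partition of squares is
// rebuilt by replaying these operations on the needed input split.
println(squares.count())                  // action: triggers the computation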
-
RDD Recovery
[Figure: the in-memory data-sharing diagrams again (one-time processing of the Input feeding query 1, 2, and 3; Input -> iter. 1 -> iter. 2 -> ...), with lost partitions rebuilt by re-running the transformations that produced them.]
-
Generality of RDDs
Despite their restrictions, RDDs can express surprisingly many parallel algorithms
These naturally apply the same operation to many items
Unify many current programming models
Data flow models: MapReduce, Dryad, SQL, ...
Specialized models for iterative apps: BSP (Pregel), iterative MapReduce (HaLoop), bulk incremental, ...
Support new apps that these models don't
-
Tradeoff Space
[Figure: a 2-D tradeoff space. X-axis: write throughput (low -> high, with network bandwidth and memory bandwidth as reference points); Y-axis: granularity of updates (fine -> coarse). Fine-grained: K-V stores, databases, RAMCloud (best for transactional workloads). Coarse-grained: HDFS (network bandwidth) and RDDs (memory bandwidth), best for batch workloads.]
-
Outline
Spark programming interface
Implementation
Demo
How people are using Spark
-
Spark Programming Interface
DryadLINQ-like API in the Scala language
Usable interactively from Scala interpreter
Provides:
Resilient distributed datasets (RDDs)
Operations on RDDs: transformations (build new RDDs), actions (compute and output results)
Control of each RDD's partitioning (layout across nodes) and persistence (storage in RAM, on disk, etc.)
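A short sketch of those two controls (assuming a SparkContext named sc and the modern org.apache.spark package names rather than the 2012 spark.* ones; the data is illustrative):
import org.apache.spark.HashPartitioner
import org.apache.spark.storage.StorageLevel

val pairs = sc.parallelize(Seq(("a", 1), ("b", 2), ("a", 3)))
val laidOut = pairs.partitionBy(new HashPartitioner(8))   // partitioning: layout across nodes
laidOut.persist(StorageLevel.MEMORY_AND_DISK)             // persistence: RAM, spilling to disk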
-
Example: Log Mining
Load error messages from a log into memory, then interactively search for various patterns
lines = spark.textFile("hdfs://...")
errors = lines.filter(_.startsWith("ERROR"))
messages = errors.map(_.split("\t")(2))
messages.persist()
messages.filter(_.contains("foo")).count
messages.filter(_.contains("bar")).count
[Figure: the Master (driver) ships tasks to three Workers, each holding one input block (Block 1-3); the workers build cached partitions Msgs. 1-3 and send results back. Annotations: lines is the base RDD; errors and messages are transformed RDDs; count is an action.]
Result: full-text search of Wikipedia in <1 sec (vs 20 sec for on-disk data)
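The same example as a self-contained program, for reference (a sketch: the HDFS path stays a placeholder as on the slide, and the app name and master setting are illustrative):
import org.apache.spark.{SparkConf, SparkContext}

object LogMining {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("LogMining").setMaster("local[*]"))
    val lines    = sc.textFile("hdfs://...")             // placeholder path
    val errors   = lines.filter(_.startsWith("ERROR"))   // transformation
    val messages = errors.map(_.split("\t")(2))          // transformation
    messages.persist()                                   // cache in cluster memory
    // Actions: each count reuses the cached messages RDD.
    println(messages.filter(_.contains("foo")).count())
    println(messages.filter(_.contains("bar")).count())
    sc.stop()
  }
}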
-
Fault Recovery
RDDs track the graph of transformations that built them (their lineage) to rebuild lost data
E.g.: messages = textFile(...).filter(_.contains("error")).map(_.split("\t")(2))
[Figure: the lineage chain HadoopRDD (path = hdfs://...) -> FilteredRDD (func = _.contains(...)) -> MappedRDD (func = _.split(...)).]
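This lineage can be inspected directly; a small sketch (toDebugString is an existing RDD method; the path is a placeholder):
val messages = sc.textFile("hdfs://...")
  .filter(_.contains("error"))
  .map(_.split("\t")(2))
// Prints the chain Spark would replay to rebuild a lost partition
// (HadoopRDD -> FilteredRDD -> MappedRDD in the 2012 naming).
println(messages.toDebugString)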
-
Fault Recovery Results
[Figure: iteration time (s) over 10 iterations: 119, 57, 56, 58, 58, 81, 57, 59, 57, 59. A failure happens in iteration 6; only that iteration slows (81 s) while lost partitions are recomputed, then times return to normal.]
-
Example: PageRank
1. Start each page with a rank of 1
2. On each iteration, update each page's rank to $\sum_{i \in \text{neighbors}} \text{rank}_i / |\text{neighbors}_i|$
links = // RDD of (url, neighbors) pairs
ranks = // RDD of (url, rank) pairs
for (i <- 1 to ITERATIONS) {
  ranks = links.join(ranks).flatMap {
    case (url, (links, rank)) =>
      links.map(dest => (dest, rank / links.size))
  }.reduceByKey(_ + _)
}
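One way to realize step 1 above (not shown on the slide; a sketch): derive the initial ranks from links so both RDDs share the same keys:
ranks = links.mapValues(_ => 1.0)   // every url starts with rank 1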
-
Optimizing Placement
links and ranks are repeatedly joined
Can co-partition them (e.g. hash both on URL) to avoid shuffles
Can also use app knowledge, e.g., hash on DNS name
links = links.partitionBy(new URLPartitioner())
[Figure: the iterative dataflow: Links (url, neighbors) is joined with Ranks0 (url, rank) to produce Contribs0, which a reduce turns into Ranks1; another join and reduce produce Ranks2, and so on.]
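URLPartitioner is only named, not defined, on the slide; a plausible sketch of it (using the modern org.apache.spark package name) that applies the app knowledge above by hashing on the DNS name, so all URLs from one host land in the same partition:
import org.apache.spark.Partitioner

class URLPartitioner(val numPartitions: Int) extends Partitioner {
  def getPartition(key: Any): Int = {
    val host = new java.net.URI(key.toString).getHost     // group URLs by host
    val h = if (host == null) 0 else host.hashCode
    (h % numPartitions + numPartitions) % numPartitions   // non-negative partition id
  }
}

// Co-partition links once and cache it, so the repeated joins avoid shuffles.
links = links.partitionBy(new URLPartitioner(64)).persist()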
-
PageRank Performance
[Figure: time per iteration (s): Hadoop 171, Basic Spark 72, Spark + Controlled Partitioning 23.]
-
Implementation
Runs on Mesos [NSDI 11] to share clusters w/ Hadoop
Can read from any Hadoop input source (HDFS, S3, ...)
[Figure: Spark, Hadoop, and MPI running side by side on Mesos across the cluster's nodes.]
No changes to Scala language or compiler
Reflection + bytecode analysis to correctly ship code
www.spark-project.org
-
Demo
-
Open Source Community
15 contributors, 5+ companies using Spark, 3+ applications projects at Berkeley
User applications:
Data mining 40x faster than Hadoop (Conviva)
Exploratory log analysis (Foursquare)
Traffic prediction via EM (Mobile Millennium)
Twitter spam classification (Monarch)
DNA sequence analysis (SNAP)
. . .
-
Related Work
RAMCloud, Piccolo, GraphLab, parallel DBs: fine-grained writes requiring replication for resilience
Pregel, iterative MapReduce: specialized models; can't run arbitrary / ad-hoc queries
DryadLINQ, FlumeJava: language-integrated distributed dataset API, but cannot share datasets efficiently across queries
Nectar [OSDI 10]: automatic expression caching, but over distributed FS
PacMan [NSDI 12]: memory cache for HDFS, but writes still go to network/disk
-
Behavior with Insufficient RAM
[Figure: iteration time (s) vs percent of working set in memory: 0% -> 68.8, 25% -> 58.1, 50% -> 40.7, 75% -> 29.7, 100% -> 11.5.]
-
Scalability
[Figure: iteration time (s) vs number of machines (25 / 50 / 100).
Logistic Regression: Hadoop 184 / 111 / 76; HadoopBinMem 116 / 80 / 62; Spark 15 / 6 / 3.
K-Means: Hadoop 274 / 157 / 106; HadoopBinMem 197 / 121 / 87; Spark 143 / 61 / 33.]
-
Breaking Down the Speedup
[Figure: iteration time (s) with text vs binary input: In-mem HDFS 15.4 / 8.4; In-mem local file 13.1 / 6.9; Spark RDD 2.9 / 2.9.]
-
Spark Operations
Transformations (define a new RDD): map, filter, sample, groupByKey, reduceByKey, sortByKey, flatMap, union, join, cogroup, cross, mapValues
Actions (return a result to driver program): collect, reduce, count, save, lookupKey
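A small sketch exercising a few of these operations (assuming a SparkContext named sc; the data is illustrative):
val words = sc.parallelize(Seq("spark", "rdd", "spark", "mesos"))
// Transformations: lazy, each defines a new RDD.
val counts = words.map(w => (w, 1)).reduceByKey(_ + _)
val long   = words.filter(_.length > 3).map(_.toUpperCase)
// Actions: eager, return a result to the driver program.
println(counts.collect().toSeq)   // e.g. (spark,2), (rdd,1), (mesos,1), in some order
println(long.count())             // 3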
-
Task Scheduler
Dryad-like DAGs
Pipelines functions within a stage
Locality & data reuse aware
Partitioning-aware to avoid shuffles
[Figure: an example DAG cut into stages: Stage 1 runs groupBy (A -> B); Stage 2 runs map (C -> D) and union (D, E -> F); Stage 3 joins B and F into G. Shaded dots mark cached data partitions, which the scheduler does not recompute.]
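A sketch of what the stage grouping means in code (assuming a SparkContext named sc; names illustrative): narrow transformations are pipelined within one stage, while a shuffle such as groupByKey starts a new stage:
val pipelined = sc.parallelize(1 to 100)
  .map(_ * 2)          // pipelined...
  .filter(_ % 3 == 0)  // ...into the same stage, element by element
val grouped = pipelined
  .map(n => (n % 10, n))
  .groupByKey()        // shuffle dependency: stage boundary here
println(grouped.toDebugString)   // prints the lineage with the shuffle split out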