sparkstreaming, - uc berkeley amp campampcamp.berkeley.edu/.../2012/...spark-streaming.pdf · •...
SparkStreaming
Large-scale near-real-time stream processing
Tathagata Das (TD), UC Berkeley
Motivation
• Many important applications must process large data streams at second-scale latencies
  – Site statistics, intrusion detection, spam filtering, ...
• Scaling these apps to 100s of nodes requires …
  – Fault tolerance: for both crashes and stragglers
  – Efficiency: for being cost-effective
• Also would like to have …
  – Simple programming model
  – Integration with batch + ad hoc queries
Current streaming frameworks don’t meet both goals together
Traditional Streaming Systems
• Record-at-a-time processing model
  – Each node has mutable state
  – For each record, update state & send new records
[Figure: nodes 1, 2, and 3, each with mutable state; input records are pushed into the nodes]
Traditional Streaming Systems
Fault tolerance via replication or upstream backup
• Replication: fast recovery, but 2x hardware cost
  [Figure: nodes 1, 2, 3 and replicas 1’, 2’, 3’ both receive the input, with synchronization between them]
• Upstream backup: only need 1 standby, but slow to recover
  [Figure: nodes 1, 2, 3 receive the input, with a single standby node]
Traditional Streaming Systems
Fault tolerance via replication or upstream backup
[Figure: the replication and upstream backup diagrams from the previous slide]
Neither handles stragglers
Observation
• Batch processing models, like MapReduce, do provide fault tolerance efficiently
  – Divide job into deterministic tasks
  – Rerun failed/slow tasks in parallel on other nodes
Idea
• Idea: run a streaming computation as a series of very small, deterministic batch jobs
  – E.g. process stream of tweets in 1 sec batches
  – Same recovery schemes at smaller timescale
• Try to make batch size as small as possible
  – Lower batch size → lower end-to-end latency
• State between batches kept in memory
  – Deterministic stateful ops → fault tolerance
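The idea can be illustrated with a small, Spark-free sketch in plain Scala. It is not the SparkStreaming implementation: the names `MicroBatch`, `Record`, `discretize`, `step`, and `run` are hypothetical, and a running word count stands in for an arbitrary stateful operation. Records are grouped into intervals by arrival time, and each interval is handled by a deterministic function from (old state, batch) to new state.

```scala
// Hypothetical sketch: a stream processed as a series of small batch jobs.
object MicroBatch {
  // A record tagged with its arrival time in milliseconds.
  final case class Record(arrivalMs: Long, word: String)

  // Group records into fixed-length intervals keyed by interval index.
  // Intervals that received no records are simply absent in this sketch.
  def discretize(records: Seq[Record], batchMs: Long): Seq[(Long, Seq[String])] =
    records
      .groupBy(r => r.arrivalMs / batchMs)
      .toSeq
      .sortBy(_._1)
      .map { case (interval, rs) => (interval, rs.map(_.word)) }

  // One deterministic batch job: old state + one batch => new state.
  def step(state: Map[String, Int], batch: Seq[String]): Map[String, Int] =
    batch.foldLeft(state) { (acc, w) => acc.updated(w, acc.getOrElse(w, 0) + 1) }

  // Run the whole stream, returning the state after each interval.
  def run(records: Seq[Record], batchMs: Long): Seq[(Long, Map[String, Int])] =
    discretize(records, batchMs)
      .scanLeft((-1L, Map.empty[String, Int])) {
        case ((_, state), (interval, batch)) => (interval, step(state, batch))
      }
      .tail // drop the artificial initial element
}
```

Because `step` is deterministic and the state after each interval is an immutable value, any interval's state can be rebuilt by re-running `step` on the same inputs, which is the property the recovery scheme relies on.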
Discretized Stream Processing
[Figure: for each interval (time = 0 - 1, time = 1 - 2, …), batch operations take the input stream's immutable dataset (stored reliably) and produce the state stream's immutable dataset (output or state), stored in memory as a Spark RDD]
Fault Recovery
• All datasets are modeled as RDDs with a dependency graph → fault-tolerant without full replication
• Fault/straggler recovery is done in parallel on other nodes → fast recovery
[Figure: a map operation from the input dataset to the output dataset]
Fast recovery without the cost of full replication
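A minimal sketch of recovery from a dependency graph, in plain Scala. This is not Spark's RDD implementation: `LineageSketch`, `Dataset`, `fromInput`, `lose`, and the `computations` counter are all hypothetical names used for illustration. Each derived dataset keeps only a deterministic recipe for building a partition from its parent, so a lost partition is rebuilt on demand instead of being restored from a replica.

```scala
// Hypothetical sketch of lineage-based recovery; not Spark's RDD code.
object LineageSketch {
  // A partitioned dataset that remembers how to (re)build each partition.
  final class Dataset[A](val numPartitions: Int, compute: Int => Vector[A]) {
    private val cache = scala.collection.mutable.Map.empty[Int, Vector[A]]
    var computations = 0 // how many partition builds have run so far

    // Return a partition, building it from lineage if it is not in memory.
    def partition(i: Int): Vector[A] =
      cache.getOrElseUpdate(i, { computations += 1; compute(i) })

    // Simulate losing one in-memory partition to a node failure.
    def lose(i: Int): Unit = { cache.remove(i); () }

    // Deterministic transformation: the child stores only its dependency
    // on the parent's partition i, not a copy of the data.
    def map[B](f: A => B): Dataset[B] =
      new Dataset[B](numPartitions, i => partition(i).map(f))

    def collect(): Vector[A] =
      (0 until numPartitions).toVector.flatMap(i => partition(i))
  }

  // A reliably stored input whose partitions can always be re-read.
  def fromInput[A](parts: Vector[Vector[A]]): Dataset[A] =
    new Dataset[A](parts.length, i => parts(i))
}
```

After `lose(1)`, the next `collect()` rebuilds only partition 1. In this single-process sketch the rebuild is sequential; on a cluster, independent lost partitions could be rebuilt on different nodes at the same time, which is where the fast recovery comes from.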
How Fast Can It Go?
• Prototype can process 4 GB/s (40M records/s) of data on 100 nodes at sub-second latency
[Figure: Max throughput with a given latency bound (1 or 2s). Two panels, Grep and TopKWords; x-axis: Number of nodes (0 to 100); y-axis: Cluster Throughput (GB/s) (0 to 5); series: 1 sec, 2 sec]
Comparison with Storm
• Storm limited to 10,000 records/s/node
• Also tried Apache S4: 7000 records/s/node
• Commercial systems report O(100K)
[Figure: two panels comparing Spark and Storm. Left: Grep Throughput (MB/s/node) vs. Record Size (bytes) at 100, 1000, 10000. Right: TopK Throughput (MB/s/node) vs. Record Size (bytes) at 100, 1000, 10000]
How Fast Can It Recover?
• Recovers from faults/stragglers within 1 sec
[Figure: Interval Processing Time (s) vs. Time (s), with the point where the failure happens marked. Sliding WordCount on 10 nodes with 30s checkpoint interval]
Programming Interface
• A Discretized Stream or DStream is a sequence of RDDs
  – Represents a stream of data
  – API very similar to RDDs
• DStreams can be created…
  – Either from live streaming data
  – Or by transforming other DStreams
DStream Operators
• Transformations
  – Build new streams from existing streams
  – Existing RDD ops (map, etc.) + new “stateful” ops
• Output operators
  – Send data to outside world (save results to external storage, print to screen, etc.)
Example 1
Count the words received every second
words = createNetworkStream("http://...")
counts = words.count()
(words and counts are DStreams; count is a transformation)
[Figure: for each interval (time = 0 - 1, 1 - 2, 2 - 3), count turns the RDD in words into the RDD in counts]
Example 2
Count frequency of words received every second
words = createNetworkStream("http://...")
ones = words.map(w => (w, 1))
freqs = ones.reduceByKey(_ + _)
(w => (w, 1) is a Scala function literal)
[Figure: for each interval (time = 0 - 1, 1 - 2, 2 - 3): words → map → ones → reduce → freqs]
Example 2: Full code
// Create the context and set the batch size
val ssc = new SparkStreamContext("local", "test")
ssc.setBatchDuration(Seconds(1))

// Process a network stream
val words = ssc.createNetworkStream("http://...")
val ones = words.map(w => (w, 1))
val freqs = ones.reduceByKey(_ + _)
freqs.print()

// Start the stream computation
ssc.run
Example 3
Count frequency of words received in last minute
ones = words.map(w => (w, 1))
freqs = ones.reduceByKey(_ + _)
freqs_60s = freqs.window(Seconds(60), Seconds(1))
                 .reduceByKey(_ + _)
(window is the sliding window operator: Seconds(60) is the window length, Seconds(1) is the window movement)
[Figure: for each interval (time = 0 - 1, 1 - 2, 2 - 3): words → map → ones → reduce → freqs → window reduce → freqs_60s]
Simpler window-based reduce
freqs = ones.reduceByKey(_ + _)
freqs_60s = freqs.window(Seconds(60), Seconds(1))
                 .reduceByKey(_ + _)

freqs = ones.reduceByKeyAndWindow(_ + _, Seconds(60), Seconds(1))
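“Incremental” window operators
[Figure, left: Aggregation function. Columns words, freqs, freqs_60s over intervals t-1 to t+4; a freqs_60s value is produced by combining (+) the freqs values in the window]
[Figure, right: Invertible aggregation function. Same columns and intervals; a freqs_60s value is produced from the previous freqs_60s value by adding (+) the newest freqs value and subtracting (–) the oldest one]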
Smarter window-based reduce
freqs = ones.reduceByKey(_ + _)
freqs_60s = freqs.window(Seconds(60), Seconds(1))
                 .reduceByKey(_ + _)

freqs = ones.reduceByKeyAndWindow(_ + _, Seconds(60), Seconds(1))

freqs = ones.reduceByKeyAndWindow(
          _ + _, _ - _, Seconds(60), Seconds(1))
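The difference between the plain and the invertible window reduce can be sketched in plain Scala for a single key. This is an illustration only, not how SparkStreaming implements `reduceByKeyAndWindow`: `WindowReduce`, `naive`, and `incremental` are hypothetical names, and `batches(t)` is assumed to hold that key's count for interval `t`.

```scala
// Hypothetical single-key sketch of window-based reduce.
object WindowReduce {
  // Plain version: re-reduce every batch in the window for each interval.
  def naive(batches: Vector[Int], window: Int)(add: (Int, Int) => Int): Vector[Int] =
    batches.indices.toVector.map { t =>
      batches.slice(math.max(0, t - window + 1), t + 1).reduce(add)
    }

  // Invertible version: previous window value, plus the batch entering
  // the window, minus the batch leaving it.
  def incremental(batches: Vector[Int], window: Int)(
      add: (Int, Int) => Int,
      sub: (Int, Int) => Int): Vector[Int] =
    batches.indices.foldLeft(Vector.empty[Int]) { (out, t) =>
      val result =
        if (t == 0) batches(0)
        else {
          val withNew = add(out(t - 1), batches(t))
          if (t >= window) sub(withNew, batches(t - window)) else withNew
        }
      out :+ result
    }
}
```

With a window of `w` batches, `naive` does about `w` combine steps per interval, while `incremental` does at most two regardless of `w`. The price is that the aggregation must have an inverse, as `_ - _` is for `_ + _`; a function such as max has none.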
Output Operators
• save: write results to any Hadoop-compatible storage system (e.g. HDFS, HBase)
  freqs.save("hdfs://...")
• foreachRDD: run a Spark function on each RDD
  freqs.foreachRDD(freqsRDD => {
    // any Spark/Scala processing, maybe save to database
  })
Live + Batch + Interactive
• Combining DStreams with historical datasets
  freqs.join(oldFreqs).map(...)
• Interactive queries on stream state from the Spark interpreter
  freqs.slice("21:00", "21:05").topK(10)
One stack to rule them all
• The promise of a unified data analytics stack
  – Write algorithms only once
  – Cuts complexity of maintaining separate stacks for live and batch processing
  – Query live stream state instead of waiting for import
• Feedback very exciting
• Some recent experiences …
Implementation on Spark
• Optimizations on current Spark
  – Optimized scheduling for < 100ms tasks
  – New block store with fast NIO communication
  – Pipelining of jobs from different time intervals
• Changes already in dev branch of Spark on http://www.github.com/mesos/spark
• An alpha will be released with Spark 0.6 soon
More Details
• You can find more about SparkStreaming in our paper: http://tinyurl.com/dstreams
Thank you!
Fault Recovery
Failures: Interval Processing Time (s), Before Failure vs. At Time of Failure
  – WordCount, 30s checkpoints: 1.47 vs. 2.31
  – WordMax, no checkpoints: 1.66 vs. 2.64
  – WordMax, 10s checkpoints: 1.75 vs. 2.03
Stragglers: Interval Processing Time (s), No straggler vs. Straggler, with speculation
  – WordCount: 0.79 vs. 1.09
  – Grep: 0.66 vs. 1.08
Interactive Ad-Hoc Queries
Related Work
• Bulk incremental processing (CBP, Comet)
  – Periodic (~5 min) batch jobs on Hadoop/Dryad
  – On-disk, replicated FS for storage instead of RDDs
• Hadoop Online
  – Does not recover stateful ops or allow multi-stage jobs
• Streaming databases
  – Record-at-a-time processing, generally replication for FT
• Approximate query processing, load shedding
  – Do not support the loss of arbitrary nodes
  – Different math because drop rate is known exactly
• Parallel recovery (MapReduce, GFS, RAMCloud, etc)
Timing Considerations
• D-Streams group input into intervals based on when records arrive at the system
• For apps that need to group by an “external” time and tolerate network delays, support:
  – Slack time: delay starting a batch for a short fixed time to give records a chance to arrive
  – Application-level correction: e.g. give a result for time t at time t+1, then use later records to update incrementally at time t+5
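Slack time can be sketched in plain Scala as follows. This is an assumption-laden illustration, not the system's mechanism: `SlackTime`, `Event`, and `assign` are hypothetical names, and the batch for an interval is assumed to start at the interval's end plus the slack. Events are grouped by their external timestamp, and any event that arrives after its batch has started is returned separately.

```scala
// Hypothetical sketch of grouping by external time with a fixed slack.
object SlackTime {
  // eventMs is the external timestamp; arrivalMs is when the system saw it.
  final case class Event(eventMs: Long, arrivalMs: Long, value: String)

  // Returns (batches keyed by interval index, events that arrived too late).
  def assign(events: Seq[Event], batchMs: Long, slackMs: Long)
      : (Map[Long, Seq[String]], Seq[Event]) = {
    val (onTime, late) = events.partition { e =>
      val interval = e.eventMs / batchMs
      // The batch for this interval starts at interval end + slack.
      e.arrivalMs <= (interval + 1) * batchMs + slackMs
    }
    val batches = onTime
      .groupBy(_.eventMs / batchMs)
      .map { case (k, es) => k -> es.map(_.value) }
    (batches, late)
  }
}
```

The events in `late` are the ones an application-level correction would have to fold into an updated result at a later time.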