Stream Processing with Spark and Storm: Couchbase Connect 2015
TRANSCRIPT
©2015 Couchbase Inc. 2
Stream Processing with Storm and Spark
Agenda
MapReduce
Directed Acyclic Graphs
Storm
Spark Streaming
Enter Couchbase Server
Customer Example
MapReduce
Input: Log Message, Log Message, Log Message
Map: Warn, Info, Info
Shuffle: Warn; Error; Info, Info
Reduce: Info - 2, Warn - 1
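The map, shuffle, and reduce steps on this slide can be sketched in plain Python. This is a minimal single-process illustration, not Hadoop code; the function names and the "<LEVEL> <text>" log format are assumptions made for the example.

```python
from collections import defaultdict

def map_phase(messages):
    """Map: emit a (level, 1) pair for every log message."""
    for message in messages:
        level = message.split(" ", 1)[0]  # assumed format: "<LEVEL> <text>"
        yield level, 1

def shuffle_phase(pairs):
    """Shuffle: group the emitted values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    """Reduce: collapse each group into a single count."""
    return {key: sum(values) for key, values in groups.items()}

logs = ["Warn disk nearly full", "Info user logged in", "Info cache refreshed"]
counts = reduce_phase(shuffle_phase(map_phase(logs)))
print(counts)  # {'Warn': 1, 'Info': 2}
```

In a real cluster the shuffle moves data between machines; here it is only a dictionary, which is enough to show the shape of the computation.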
Directed Acyclic Graph
Log Message -> Extract Level -> Count Info / Count Warn / Count Error -> Get Counts
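The graph on this slide could be expressed as named steps with explicit dependencies and run in topological order. The sketch below uses Python's standard `graphlib` module (Python 3.9+); the node names, the `NODES` table, and the sample log lines are assumptions for illustration only.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

LOGS = ["Info started", "Warn slow response", "Info finished"]

def make_counter(level):
    """Build a node that counts one level in the output of extract_level."""
    return lambda results: sum(1 for found in results["extract_level"] if found == level)

# Each node maps to (names of predecessor nodes, function of earlier results).
NODES = {
    "extract_level": (set(), lambda results: [m.split(" ", 1)[0] for m in LOGS]),
    "count_info": ({"extract_level"}, make_counter("Info")),
    "count_warn": ({"extract_level"}, make_counter("Warn")),
    "count_error": ({"extract_level"}, make_counter("Error")),
    "get_counts": (
        {"count_info", "count_warn", "count_error"},
        lambda results: {
            "Info": results["count_info"],
            "Warn": results["count_warn"],
            "Error": results["count_error"],
        },
    ),
}

def run_dag(nodes):
    """Run every node after all of its predecessors have produced a result."""
    sorter = TopologicalSorter({name: deps for name, (deps, _) in nodes.items()})
    results = {}
    for name in sorter.static_order():
        results[name] = nodes[name][1](results)
    return results

counts = run_dag(NODES)["get_counts"]
print(counts)  # {'Info': 2, 'Warn': 1, 'Error': 0}
```

Unlike the fixed map, shuffle, reduce sequence, the three counting nodes have no dependency on each other, so a scheduler would be free to run them in parallel.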
Batch versus Stream
[Diagram: groups of Log Messages and an Analyze step]
Batch versus Stream
[Diagram: an Analyze step and individual Log Messages]
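One way to picture the contrast: a batch job analyzes a complete set of messages once, while a stream job updates its result as each message arrives. The sketch below is an assumption-laden illustration in plain Python (hypothetical function names, made-up log lines), not code from Storm or Spark.

```python
from collections import Counter

def analyze_batch(messages):
    """Batch: all messages are available up front; analyze them in one pass."""
    return Counter(m.split(" ", 1)[0] for m in messages)

def analyze_stream(messages):
    """Stream: update a running result per message and yield a snapshot each time."""
    running = Counter()
    for m in messages:
        running[m.split(" ", 1)[0]] += 1
        yield dict(running)

logs = ["Info started", "Warn slow response", "Info finished"]

batch_result = analyze_batch(logs)
stream_results = list(analyze_stream(iter(logs)))

print(dict(batch_result))   # one answer, after all input has been read
print(stream_results)       # an answer after every message
```

Both approaches end at the same totals; the streaming version simply has a usable answer before the input ends, which matters when the input never ends.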
Storm
Apache Software Foundation
Open Sourced by Twitter
Analyze Tweets, “Trends”
Distributed
Real-Time
Continuous
Terminology
Tuple – An immutable set of key/value pairs
  name=shane, company=couchbase
Stream – An unbounded sequence of tuples
  person, person, person, person, person…
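The two definitions above can be mimicked in plain Python: a read-only mapping stands in for a tuple, and a generator that never finishes stands in for a stream. This is only an analogy under those assumptions; `make_tuple` and `person_stream` are hypothetical names, not Storm APIs.

```python
import itertools
from types import MappingProxyType

def make_tuple(**fields):
    """Return a read-only view over a private copy of the key/value pairs."""
    return MappingProxyType(dict(fields))

def person_stream():
    """An unbounded sequence: this generator never terminates on its own."""
    for i in itertools.count():
        yield make_tuple(name=f"person-{i}", company="couchbase")

shane = make_tuple(name="shane", company="couchbase")
try:
    shane["company"] = "other"  # writes are rejected by the read-only view
except TypeError as exc:
    print("immutable:", exc)

# A consumer can only ever look at a finite prefix of an unbounded stream.
first_three = list(itertools.islice(person_stream(), 3))
print([dict(t) for t in first_three])
```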
Terminology
Spout – A source of data for the stream
  Pulls data from somewhere (e.g. message queue)
  Pushes tuples into a stream
Bolt – Processes a stream of tuples
  Consume Multiple Streams, Produce Multiple Streams
  Do something (e.g. filter, aggregate, persist)
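As a rough sketch of these two roles, the classes below model a spout that pulls from an in-memory queue and two bolts that transform and filter the resulting tuples. The class and method names are hypothetical and the code runs in a single process; Storm's real spout and bolt interfaces are Java APIs and look different.

```python
from collections import deque

class QueueSpout:
    """Pulls raw messages from a queue and pushes them into a stream as tuples."""
    def __init__(self, queue):
        self.queue = queue

    def tuples(self):
        while self.queue:
            yield {"message": self.queue.popleft()}

class ExtractLevelBolt:
    """Consumes message tuples and produces tuples with 'level' and 'text' fields."""
    def process(self, stream):
        for tup in stream:
            level, _, text = tup["message"].partition(" ")
            yield {"level": level, "text": text}

class FilterBolt:
    """Passes through only the tuples whose level matches."""
    def __init__(self, level):
        self.level = level

    def process(self, stream):
        for tup in stream:
            if tup["level"] == self.level:
                yield tup

queue = deque(["Info started", "Warn slow response", "Error disk failure", "Warn retrying"])
spout = QueueSpout(queue)
warnings = list(FilterBolt("Warn").process(ExtractLevelBolt().process(spout.tuples())))
print(warnings)
```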
Topology
A set of spouts and bolts
[Diagram: Src, Spout, Bolt, Bolt, Bolt, Dest]
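To make the idea of a topology concrete, the sketch below wires named bolts to the component they subscribe to and pushes each tuple through the resulting graph. The `Topology` class, its methods, and the bolt functions are assumptions invented for this example; Storm builds topologies through its own Java builder API.

```python
class Topology:
    """A toy topology: bolts subscribe to the output of a named upstream component."""
    def __init__(self):
        self.bolts = {}        # bolt name -> function(tuple) -> iterable of tuples
        self.subscribers = {}  # upstream name -> names of bolts that consume it

    def add_bolt(self, name, func, source):
        self.bolts[name] = func
        self.subscribers.setdefault(source, []).append(name)

    def _emit(self, source, tup):
        # Deliver the tuple to every subscriber, then forward whatever they emit.
        for name in self.subscribers.get(source, []):
            for out in self.bolts[name](tup):
                self._emit(name, out)

    def run(self, spout_name, spout):
        for tup in spout:
            self._emit(spout_name, tup)

dest = []  # stands in for the destination system

def parse(tup):
    level, _, text = tup["message"].partition(" ")
    return [{"level": level, "text": text}]

def only_warn(tup):
    return [tup] if tup["level"] == "Warn" else []

def store(tup):
    dest.append(tup["text"])
    return []  # a terminal bolt emits nothing

topology = Topology()
topology.add_bolt("parse", parse, source="src")
topology.add_bolt("only_warn", only_warn, source="parse")
topology.add_bolt("store", store, source="only_warn")
topology.run("src", ({"message": m} for m in ["Info started", "Warn slow response", "Warn retrying"]))
print(dest)  # ['slow response', 'retrying']
```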
Grouping
A bolt / spout is executed as parallel tasks
[Diagram: a Bolt with three Tasks]
Which task processes the tuple?
Grouping
Shuffle – Send tuples to random tasks
Field – Send tuples to tasks based on the value of a field
  All tuples with the same field value are sent to the same task
All – Send tuples to all tasks
Global – Send tuples to the same task
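The four groupings above amount to four rules for choosing target task indexes. The functions below are a hedged sketch of those rules, not Storm's implementation: the names are hypothetical, the CRC32 hash is an arbitrary stable choice, and task 0 is picked as the single global target purely for illustration.

```python
import random
import zlib

def shuffle_grouping(tup, num_tasks, rng=random):
    """Shuffle: pick one task at random."""
    return [rng.randrange(num_tasks)]

def field_grouping(tup, num_tasks, field):
    """Field: hash the field value so equal values always reach the same task."""
    key = str(tup[field]).encode("utf-8")
    return [zlib.crc32(key) % num_tasks]

def all_grouping(tup, num_tasks):
    """All: replicate the tuple to every task."""
    return list(range(num_tasks))

def global_grouping(tup, num_tasks):
    """Global: every tuple goes to one and the same task (index 0 in this sketch)."""
    return [0]

warn_a = {"level": "Warn", "text": "slow response"}
warn_b = {"level": "Warn", "text": "retrying"}

print(field_grouping(warn_a, 4, "level") == field_grouping(warn_b, 4, "level"))  # True
print(all_grouping(warn_a, 3))     # [0, 1, 2]
print(global_grouping(warn_a, 3))  # [0]
```

Field grouping is the one that makes per-key aggregation safe: a counter for "Warn" can live in a single task because no other task will ever see a "Warn" tuple.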
Streams & Events
[Diagram: Log Message, Tuple, Topology, Output]
Spark Streaming
Apache Software Foundation
Open Sourced by UC Berkeley AMPLab
Spark: Spark Core, Spark Streaming, Spark SQL
Distributed
Real-Time
Continuous
Resilient Distributed Datasets (RDD)
“an immutable, partitioned collection of elements that can be operated on in parallel”
Read-Only, Partitioned
Create RDDs by Transforming Them
Lineage – Rebuild RDD from Previous RDDs
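The properties listed above (read-only, partitioned, created by transformation, rebuilt from lineage) can be illustrated with a small stand-in class. `MiniRDD` is a hypothetical teaching aid written for this sketch and is not Spark's RDD API; in particular it recomputes everything on each call and runs in one process.

```python
class MiniRDD:
    """A toy read-only, partitioned collection that remembers how it was derived."""
    def __init__(self, partitions=None, parent=None, transform=None):
        # Source data is frozen into tuples so it cannot be modified afterwards.
        self._partitions = None if partitions is None else tuple(tuple(p) for p in partitions)
        self._parent = parent        # lineage: the RDD this one was derived from
        self._transform = transform  # lineage: how to derive one partition

    def map(self, func):
        return MiniRDD(parent=self, transform=lambda part: [func(x) for x in part])

    def filter(self, predicate):
        return MiniRDD(parent=self, transform=lambda part: [x for x in part if predicate(x)])

    def compute(self):
        """Rebuild the partitions by replaying the lineage from the source data."""
        if self._parent is None:
            return [list(p) for p in self._partitions]
        return [self._transform(p) for p in self._parent.compute()]

    def collect(self):
        return [x for part in self.compute() for x in part]

base = MiniRDD(partitions=[["Info a", "Warn b"], ["Warn c", "Error d"]])
levels = base.map(lambda m: m.split(" ", 1)[0])
warns = levels.filter(lambda level: level == "Warn")

print(warns.collect())  # ['Warn', 'Warn']
print(base.collect())   # the source is unchanged by the transformations
```

Because each derived object stores only its parent and a function, a lost partition can be recomputed from the source instead of being restored from a copy.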
Terminology
Input DStream – Source of Input Data
  Receiver – Pulls data from somewhere (e.g. message queue)
  Creates a stream of really small RDDs
Discretized Stream (DStream) – Stream of RDDs
  Transform RDDs
  Do something (e.g. filter, aggregate, persist)
  Streams Create Streams
Streams & RDDs
Micro-Batching
[Diagram: an Input Stream of Log items grouped into RDDs, forming a DStream]
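Micro-batching can be sketched as cutting a timestamped input stream into consecutive fixed-length intervals, with each interval's items forming one small batch. The function below is an illustration under assumptions: events arrive as (timestamp, item) pairs sorted by time, time starts at 0, and plain lists stand in for RDDs.

```python
def micro_batches(events, batch_interval):
    """Group (timestamp, item) events into consecutive intervals of equal length."""
    batch = []
    batch_end = batch_interval
    for timestamp, item in events:
        # Close every interval that ended before this event arrived.
        while timestamp >= batch_end:
            yield batch
            batch = []
            batch_end += batch_interval
        batch.append(item)
    if batch:
        yield batch

events = [(0.1, "Log a"), (0.4, "Log b"), (1.2, "Log c"), (3.5, "Log d")]
batches = list(micro_batches(events, batch_interval=1.0))
print(batches)  # [['Log a', 'Log b'], ['Log c'], [], ['Log d']]
```

Note the empty batch for the interval in which nothing arrived: the batches are defined by the clock, not by the data.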
Transformations
map, flatMap
filter
repartition, union
join
cogroup, transform
window
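Of the transformations listed, window is perhaps the least self-explanatory: it combines the most recent batches into one larger view that slides forward over time. The sketch below counts window length and slide in batches rather than in durations, which is a simplification chosen for this example; the function name and sample data are hypothetical.

```python
from collections import deque

def window(batches, window_length, slide=1):
    """Every `slide` batches, yield the items of the last `window_length` batches."""
    recent = deque(maxlen=window_length)  # older batches fall out automatically
    for index, batch in enumerate(batches, start=1):
        recent.append(batch)
        if index % slide == 0:
            yield [item for b in recent for item in b]

batches = [[1, 2], [3], [4, 5], [6]]

print(list(window(batches, window_length=2)))
# [[1, 2], [1, 2, 3], [3, 4, 5], [4, 5, 6]]

print(list(window(batches, window_length=2, slide=2)))
# [[1, 2, 3], [4, 5, 6]]
```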
Streams Create Streams
RDD – T1, RDD – T2, RDD – T3
Filter
RDDX – T1, RDDX – T2, RDDX – T3
Count
RDDY – T1, RDDY – T2, RDDY – T3
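The slide's point is that applying an operation to a stream of batches yields another stream of batches, one output batch per input batch. A minimal sketch, assuming lists stand in for the RDDs at T1, T2, T3 and using hypothetical function names:

```python
def dstream_filter(dstream, predicate):
    """Derive a new stream: each output batch keeps only the matching items."""
    for batch in dstream:
        yield [item for item in batch if predicate(item)]

def dstream_count(dstream):
    """Derive a new stream: each output batch holds the size of the input batch."""
    for batch in dstream:
        yield [len(batch)]

# One batch per time step T1, T2, T3.
source = [
    ["Info a", "Warn b"],
    ["Warn c", "Warn d", "Info e"],
    ["Info f"],
]

stream_x = list(dstream_filter(source, lambda m: m.startswith("Warn")))
stream_y = list(dstream_count(stream_x))

print(stream_x)  # [['Warn b'], ['Warn c', 'Warn d'], []]
print(stream_y)  # [[1], [2], [0]]
```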
Enter Couchbase Server
Pipeline
Source (?) -> Stream Processor -> Dest (?)
Enter Couchbase Server
Pipeline
Source (Kafka) -> Stream Processor -> Dest (Couchbase Server)
Enter Couchbase Server
Pipeline
Source (Couchbase Server) -> Stream Processor -> Dest (Couchbase Server)
LiveEngage
Chat
Personalized Messages
Personalized Offers
• Identify visitor behaviors and patterns
• Predict likelihood to buy
• Identify intent
• Provide targeted, personalized content
• Provide satisfaction and conversion metrics
• Engage visitors when necessary
  • Showing hesitation or signs of abandonment
22+ M Interactions
2+ B Sessions
13+ TB Data
Per Month
[Diagram: Customer and Agent; Clickstream / Chat; Visitor Feed; stages: Ingest, Process, Access]
[Diagram: PROCESS and ACCESS stages; labels: STORE, ANALYZE, REPORT, MONITOR, CHAT, BATCH, REAL TIME, DASHBOARD]