Stream Processing with Spark and Storm: Couchbase Connect 2015

STREAM PROCESSING WITH STORM AND SPARK STREAMING Shane Johnson, Couchbase


Post on 26-Jul-2015



©2015 Couchbase Inc. 2

Stream Processing with Storm and Spark

Agenda

• MapReduce
• Directed Acyclic Graphs
• Storm
• Spark Streaming
• Enter Couchbase Server
• Customer Example


MapReduce

[Diagram: three log messages pass through Map (Warn, Info, Info), Shuffle (groups labeled Warn, Error, Info) and Reduce, which yields the counts Info - 2 and Warn - 1.]
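The three phases on this slide can be sketched in plain Python. This is a minimal single-process illustration, not Hadoop code; the `LEVEL: text` log format and the function names are assumptions made for the example.

```python
from collections import defaultdict


def map_phase(messages):
    """Map: emit a (level, 1) pair for each log message."""
    for message in messages:
        level = message.split(":", 1)[0].strip().lower()
        yield level, 1


def shuffle_phase(pairs):
    """Shuffle: group the emitted values by key (the log level)."""
    groups = defaultdict(list)
    for level, one in pairs:
        groups[level].append(one)
    return groups


def reduce_phase(groups):
    """Reduce: sum the values collected for each key."""
    return {level: sum(ones) for level, ones in groups.items()}


# Hypothetical log lines; the "LEVEL: text" format is an assumption.
logs = ["WARN: disk at 91%", "INFO: user login", "INFO: cache warmed"]
counts = reduce_phase(shuffle_phase(map_phase(logs)))
print(counts)  # {'warn': 1, 'info': 2}
```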


Directed Acyclic Graph

[Diagram: Log Message → Extract Level → Count Info / Count Warn / Count Error → Get Counts]
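The same log-level count can be written as a small directed acyclic graph of steps. The sketch below uses the standard-library `graphlib` module (Python 3.9+) to run each step after the steps it depends on; the step names mirror the boxes on the slide, and the `LEVEL: text` log format is an assumption.

```python
from graphlib import TopologicalSorter

# Each step maps to the set of steps it depends on.
dag = {
    "extract_level": set(),
    "count_info": {"extract_level"},
    "count_warn": {"extract_level"},
    "count_error": {"extract_level"},
    "get_counts": {"count_info", "count_warn", "count_error"},
}


def run(messages):
    """Execute the steps in dependency order and return the final counts."""
    results = {}
    for step in TopologicalSorter(dag).static_order():
        if step == "extract_level":
            results[step] = [m.split(":", 1)[0].strip().lower() for m in messages]
        elif step.startswith("count_"):
            level = step[len("count_"):]
            results[step] = results["extract_level"].count(level)
        else:  # get_counts: merge the outputs of the three counting steps
            results[step] = {s[len("count_"):]: results[s] for s in sorted(dag[step])}
    return results["get_counts"]


# Hypothetical log lines for illustration.
logs = ["WARN: disk at 91%", "INFO: user login", "INFO: cache warmed"]
print(run(logs))  # {'error': 0, 'info': 2, 'warn': 1}
```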


Batch versus Stream

[Diagram: three groups of three log messages, with an Analyze step.]


Batch versus Stream

[Diagram: an Analyze step with five individual log messages.]
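One way to read the two slides: a batch job analyzes a complete collection once, while a stream job updates its result as each message arrives. The sketch below illustrates that difference under assumed names and an assumed `LEVEL: text` log format; it is not tied to any particular framework.

```python
from collections import Counter


def level_of(message):
    """Extract the lower-cased log level from a 'LEVEL: text' line."""
    return message.split(":", 1)[0].strip().lower()


def analyze_batch(messages):
    """Batch: wait for the complete collection, then analyze it once."""
    return Counter(level_of(m) for m in messages)


def analyze_stream(messages):
    """Stream: update the result per message and emit a snapshot each time."""
    running = Counter()
    for m in messages:
        running[level_of(m)] += 1
        yield dict(running)


# Hypothetical log lines for illustration.
logs = ["INFO: user login", "WARN: disk at 91%", "INFO: cache warmed"]
batch_result = analyze_batch(logs)
stream_snapshots = list(analyze_stream(logs))
print(stream_snapshots[0])   # {'info': 1}
print(stream_snapshots[-1])  # {'info': 2, 'warn': 1}
```

The final stream snapshot matches the batch result; the stream version simply makes intermediate answers available earlier.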


Storm

• Apache Software Foundation
• Open Sourced by Twitter
• Analyze Tweets: “Trends”
• Distributed
• Real-Time
• Continuous


Terminology

Tuple – An immutable set of key/value pairs
• name=shane, company=couchbase

Stream – An unbounded sequence of tuples
• person, person, person, person, person…


Terminology

Spout – A source of data for the stream
• Pulls data from somewhere (e.g. message queue)
• Pushes tuples into a stream

Bolt – Processes a stream of tuples
• Consume Multiple Streams, Produce Multiple Streams
• Do something (e.g. filter, aggregate, persist)
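The spout and bolt roles can be imitated with Python generators. This is a conceptual sketch only, not the Storm API: `queue_spout`, `filter_bolt` and `count_bolt` are hypothetical names, the in-memory `deque` stands in for a message queue, and the source is finite here only so the example terminates.

```python
from collections import deque
from types import MappingProxyType


def queue_spout(queue):
    """Spout: pull raw items from a source and push tuples into a stream."""
    while queue:
        raw = queue.popleft()
        level, _, text = raw.partition(": ")
        # A read-only mapping stands in for an immutable set of key/value pairs.
        yield MappingProxyType({"level": level, "text": text})


def filter_bolt(stream, level):
    """Bolt: consume a stream and emit only the tuples with the given level."""
    for tup in stream:
        if tup["level"] == level:
            yield tup


def count_bolt(stream):
    """Bolt: aggregate the incoming tuples into a single count."""
    return sum(1 for _ in stream)


# Hypothetical messages waiting in the queue.
source = deque(["INFO: user login", "WARN: disk at 91%", "INFO: cache warmed"])
info_count = count_bolt(filter_bolt(queue_spout(source), "INFO"))
print(info_count)  # 2
```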


Topology

A set of spouts and bolts

[Diagram: Src, a Spout, three Bolts and Dest connected as a topology.]


Grouping

A bolt / spout is executed as parallel tasks

[Diagram: a Bolt running as three Tasks.]

Which task processes the tuple?


Grouping

• Shuffle – Send tuples to random tasks
• Field – Send tuples to tasks based on the value of a field
• All tuples with the same field value are sent to the same task
• All – Send tuples to all tasks
• Global – Send tuples to the same task
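Each grouping above is a rule for choosing which task (or tasks) receives a tuple. The functions below sketch those rules in plain Python; they are not Storm's implementation, and `NUM_TASKS`, the function names and the choice of task 0 for the global grouping are assumptions for illustration.

```python
import random
import zlib

NUM_TASKS = 3  # assumed parallelism of the receiving bolt


def shuffle_grouping(tup, rng=random):
    """Shuffle: pick a random task."""
    return [rng.randrange(NUM_TASKS)]


def field_grouping(tup, field):
    """Field: hash the field value so equal values always reach the same task."""
    key = str(tup[field]).encode("utf-8")
    return [zlib.crc32(key) % NUM_TASKS]


def all_grouping(tup):
    """All: replicate the tuple to every task."""
    return list(range(NUM_TASKS))


def global_grouping(tup):
    """Global: every tuple goes to one task (task 0 is an arbitrary choice here)."""
    return [0]


first = {"level": "info", "text": "user login"}
second = {"level": "info", "text": "cache warmed"}
# Same field value, so both tuples are routed to the same task.
print(field_grouping(first, "level") == field_grouping(second, "level"))  # True
```

Field grouping is what makes per-key aggregation (for example, a count per log level) safe to run in parallel: every tuple for a given key lands on the same task.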


Streams & Events

[Diagram: a Topology with the labels LogMessage, Tuple and Output; each LogMessage is paired with an Output.]


Spark Streaming

• Apache Software Foundation
• Open Sourced by UC Berkeley AMPLab
• Spark: Spark Core, Spark Streaming, Spark SQL
• Distributed
• Real-Time
• Continuous


Resilient Distributed Datasets (RDD)

“an immutable, partitioned collection of elements that can be operated on in parallel”

• Read-Only, Partitioned
• Create RDDs by Transforming Them
• Lineage – Rebuild RDD from Previous RDDs
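The lineage idea can be shown with a toy class: every derived dataset remembers its parent and the function that produced it, so a lost partition can be recomputed rather than restored from a copy. `SketchRDD` is a hypothetical stand-in, not Spark's `RDD`; unlike Spark it evaluates eagerly and runs in one process.

```python
class SketchRDD:
    """Toy RDD: read-only partitions plus the recipe (lineage) to rebuild them."""

    def __init__(self, partitions, parent=None, fn=None):
        self._partitions = tuple(tuple(p) for p in partitions)  # read-only
        self._parent = parent  # the dataset this one was derived from
        self._fn = fn          # per-partition function that derived it

    def map(self, f):
        fn = lambda part: [f(x) for x in part]
        return SketchRDD([fn(p) for p in self._partitions], parent=self, fn=fn)

    def filter(self, pred):
        fn = lambda part: [x for x in part if pred(x)]
        return SketchRDD([fn(p) for p in self._partitions], parent=self, fn=fn)

    def rebuild_partition(self, index):
        """Recompute one partition by replaying the lineage from the root."""
        if self._parent is None:
            return self._partitions[index]
        return tuple(self._fn(self._parent.rebuild_partition(index)))

    def collect(self):
        return [x for part in self._partitions for x in part]


base = SketchRDD([[1, 2], [3, 4]])
even_squares = base.map(lambda x: x * x).filter(lambda x: x % 2 == 0)
print(even_squares.collect())             # [4, 16]
print(even_squares.rebuild_partition(1))  # (16,)
```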


Terminology

Input DStream – Source of Input Data
• Receiver – Pulls data from somewhere (e.g. message queue)
• Creates a stream of really small RDDs

Discretized Stream (DStream) – Stream of RDDs
• Transform RDDs
• Do something (e.g. filter, aggregate, persist)
• Streams Create Streams


Streams & RDDs

Micro-Batching

[Diagram: Log messages in the Input Stream are grouped into RDDs, which together form the DStream.]
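Micro-batching can be sketched as cutting a timestamped input stream into fixed-length intervals, each interval becoming one small batch. The function below is an illustration in plain Python, not Spark Streaming code; it assumes events arrive ordered by timestamp, uses integer milliseconds, and lets a plain list stand in for an RDD.

```python
def micro_batches(events, batch_interval):
    """Group (timestamp, message) pairs into consecutive fixed-length batches.

    An interval with no events yields an empty batch.
    """
    batch, batch_end = [], None
    for timestamp, message in events:
        if batch_end is None:
            # Align the first batch to an interval boundary.
            batch_end = timestamp - (timestamp % batch_interval) + batch_interval
        while timestamp >= batch_end:
            yield batch
            batch, batch_end = [], batch_end + batch_interval
        batch.append(message)
    if batch_end is not None:
        yield batch  # flush the final, partially filled batch


# Hypothetical events: (timestamp in ms, message).
events = [(200, "a"), (700, "b"), (1100, "c"), (3400, "d")]
batches = list(micro_batches(events, 1000))
print(batches)  # [['a', 'b'], ['c'], [], ['d']]
```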


Transformations

• map, flatMap
• filter
• repartition, union
• join
• cogroup, transform
• window
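Of these, window is the least obvious: it combines the most recent batches into one result per interval. The sketch below shows the idea on plain lists; it is not the Spark API, it slides by exactly one batch, and the `window` function and its `length` parameter (a number of batches) are assumptions for illustration.

```python
from collections import deque


def window(batches, length):
    """For each incoming batch, yield the concatenation of the last `length` batches."""
    recent = deque(maxlen=length)  # older batches fall out automatically
    for batch in batches:
        recent.append(batch)
        yield [x for b in recent for x in b]


windows = list(window([[1], [2, 3], [4]], length=2))
print(windows)  # [[1], [1, 2, 3], [2, 3, 4]]
```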


Streams Create Streams

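[Diagram: three streams, each with one RDD per time step, connected by Filter and Count steps.]

• RDD – T1, RDD – T2, RDD – T3
• RDDX – T1, RDDX – T2, RDDX – T3
• RDDY – T1, RDDY – T2, RDDY – T3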
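A sketch of how one stream of batches yields another: each transformation is applied batch by batch, so the output has one batch per time step, just like the input. The lists below stand in for the RDDs at T1, T2 and T3; `filter_stream` and `count_stream` are hypothetical names, not Spark functions.

```python
def filter_stream(batches, predicate):
    """Derive a new stream whose batches keep only the matching elements."""
    for batch in batches:
        yield [x for x in batch if predicate(x)]


def count_stream(batches):
    """Derive a new stream holding one count per input batch."""
    for batch in batches:
        yield len(batch)


# Hypothetical input batches for time steps T1, T2 and T3.
input_stream = [
    ["INFO a", "WARN b"],
    ["INFO c"],
    ["WARN d", "WARN e", "INFO f"],
]
warn_stream = filter_stream(input_stream, lambda m: m.startswith("WARN"))
warn_counts = list(count_stream(warn_stream))
print(warn_counts)  # [1, 0, 2]
```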


Enter Couchbase Server

Pipeline

[Diagram: Source (?) → Stream Processor → Dest (?)]


Enter Couchbase Server

Pipeline

[Diagram: Source (Kafka) → Stream Processor → Dest (Couchbase Server)]


Enter Couchbase Server

Pipeline

[Diagram: Source (Couchbase Server) → Stream Processor → Dest (Couchbase Server)]

Stream Processing at LivePerson


LiveEngage

Chat

Personalized Messages

Personalized Offers


• Identify visitor behaviors and patterns
• Predict likelihood to buy
• Identify intent
• Provide targeted, personalized content
• Provide satisfaction and conversion metrics
• Engage visitors when necessary
• Showing hesitation or signs of abandonment


Real Time Web Analytics


Per Month

• 22+ M Interactions
• 2+ B Sessions
• 13+ TB Data


[Diagram: Customer and Agent, linked by Clickstream / Chat and a Visitor Feed, across the stages Ingest, Process and Access.]


[Diagram: Kafka, Hadoop, Storm, Couchbase Server]


[Diagram labels: Process, Access; Store, Analyze, Report; Monitor; Chat; Batch; Real Time; Dashboard]


Chicago Style

@shane_dev

Thank you.