[spark meetup] spark streaming overview

Post on 14-Jul-2015

SPARK STREAMING OVERVIEW

• Spark SQL

• Spark Streaming

• MLlib (machine learning)

• GraphX (graph)

• Kafka integrates information producers and consumers seamlessly: producers are never blocked, and they do not need to know who the final consumers are.

• Each consumer keeps control of its own read offset.

• On-demand topic creation.
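
The per-consumer offset idea can be sketched in plain Python. This is a toy model and not the Kafka client API: `TopicLog` and `Consumer` are hypothetical names introduced only for illustration, and a single in-memory list stands in for one topic partition.

```python
class TopicLog:
    """Toy append-only log standing in for one topic partition (hypothetical)."""

    def __init__(self):
        self._records = []

    def append(self, record):
        # Producers only append; they never learn who reads the record.
        self._records.append(record)
        return len(self._records) - 1  # offset assigned to the new record

    def read(self, offset, max_records=10):
        # Reading does not remove anything from the log.
        return self._records[offset:offset + max_records]


class Consumer:
    """Toy consumer that tracks its own read position (hypothetical)."""

    def __init__(self, log):
        self._log = log
        self.offset = 0

    def poll(self, max_records=10):
        batch = self._log.read(self.offset, max_records)
        self.offset += len(batch)  # each consumer advances only its own offset
        return batch

    def seek(self, offset):
        # A consumer may rewind and re-read without affecting anyone else.
        self.offset = offset
```

Because the offset lives in the consumer rather than in the log, a slow consumer and a fast one can read the same records at different paces, and either can rewind independently.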

• ETL and ELT, wide catalog of sources and sinks

• Flexible design of topologies and agent deployment strategies.

• Data transformation, thanks to interceptors.

• readClob, readCSV, readLine, readMultiLine, readAvro, readJson

• addCurrentTime, addLocalHost, geoIP, findReplace, split

• generateUUID, decompress, if, extractJsonPaths, detectMimeType

• xquery, extractURIComponents, xslt, grok (regular expressions)

exec

spooling

logger

CASSANDRA

Kafka

STRATIO DEEP

• Shark (SQL)

• Spark Streaming

• MLlib (machine learning)

• GraphX (graph)

RDD, what is that?

Spark Streaming: Overall view

Discretized Stream or DStream.
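
A DStream is a continuous sequence of RDDs, each holding the data received during one batch interval. The sketch below models that discretization step in plain Python; it is not the Spark API. The function name `discretize` is hypothetical, plain lists stand in for RDDs, and timestamps are assumed to be non-negative.

```python
from collections import defaultdict


def discretize(events, batch_interval):
    """Group (timestamp, value) events into consecutive micro-batches.

    Returns one list per batch interval, in time order. Intervals that
    received no data yield an empty list, mirroring an empty batch.
    Assumes timestamps are non-negative numbers.
    """
    batches = defaultdict(list)
    for timestamp, value in events:
        # Every event falls into exactly one interval of fixed length.
        batches[int(timestamp // batch_interval)].append(value)
    if not batches:
        return []
    last = max(batches)
    return [batches.get(i, []) for i in range(last + 1)]
```

With a one-second interval, events at 0.2 s and 0.9 s land in the first batch and an event at 1.1 s lands in the second, so every downstream operation sees small, bounded collections instead of an unbounded stream.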

Overall view

Input DStreams and Receivers.

• Basic (distributed with Spark Streaming).

• Advanced (available as dependency).

Basic sources

• File Stream.

• Sockets.

• Actors (Akka).

• Queue RDDs (Testing).

Advanced sources

Do It Yourself

• Code onStart()

• Code onStop()

• Code receive()

• Custom Receiver ready!
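
The three steps above can be sketched as a plain-Python model of the receiver lifecycle. This is not Spark's `Receiver` class: `ToyReceiver`, its `stored` queue, and the `is_running` helper are hypothetical names used only to show how `onStart()`, `onStop()`, and `receive()` relate to each other.

```python
import queue
import threading


class ToyReceiver:
    """Plain-Python model of a custom receiver lifecycle (hypothetical)."""

    def __init__(self, source):
        self._source = source            # any iterable of records
        self._stopped = threading.Event()
        self._thread = None
        self.stored = queue.Queue()      # stands in for handing data to the engine

    def on_start(self):
        # Must return quickly, so the receiving loop runs in its own thread.
        self._thread = threading.Thread(target=self.receive, daemon=True)
        self._thread.start()

    def on_stop(self):
        # Signal the loop to finish and wait briefly for the thread to exit.
        self._stopped.set()
        if self._thread is not None:
            self._thread.join(timeout=1.0)

    def receive(self):
        # Pull records until the source is exhausted or a stop is requested.
        for record in self._source:
            if self._stopped.is_set():
                break
            self.stored.put(record)

    def is_running(self):
        return self._thread is not None and self._thread.is_alive()
```

The design point the model tries to capture is that starting must not block: the start hook only launches the thread, the loop does the actual receiving, and the stop hook is responsible for shutting that thread down.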

• map(func), flatMap(func), filter(func), count()

• repartition(numPartitions)

• union(otherStream)

• reduce(func), countByValue(), reduceByKey(func, [numTasks])

• join(otherStream, [numTasks]), cogroup(otherStream, [numTasks])

• transform(func)

• updateStateByKey(func)

• window(windowLength, slideInterval)

• countByWindow(windowLength, slideInterval)

• reduceByWindow(func, windowLength, slideInterval)

• reduceByKeyAndWindow(func, windowLength, slideInterval, [numTasks])

• countByValueAndWindow(windowLength, slideInterval, [numTasks])

• print()

• foreachRDD(func)

• saveAsObjectFiles(prefix, [suffix])

• saveAsTextFiles(prefix, [suffix])

• saveAsHadoopFiles(prefix, [suffix])
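
To make the window parameters concrete, here is a plain-Python model of what `reduceByKeyAndWindow(func, windowLength, slideInterval)` computes. It is a sketch of the semantics only, not Spark code: the function below is hypothetical, batches are plain lists of `(key, value)` pairs, and both window parameters are expressed as a number of batches rather than as durations.

```python
def reduce_by_key_and_window(batches, func, window_length, slide_interval):
    """Model of a windowed reduce-by-key over a list of micro-batches.

    One result is emitted every `slide_interval` batches, and each result
    covers the last `window_length` batches (fewer at the very start).
    """
    results = []
    for end in range(slide_interval, len(batches) + 1, slide_interval):
        start = max(0, end - window_length)
        reduced = {}
        for batch in batches[start:end]:
            for key, value in batch:
                # Combine values of the same key with the user function.
                reduced[key] = func(reduced[key], value) if key in reduced else value
        results.append(reduced)
    return results
```

For example, with a window of two batches sliding by one, each result combines the current batch with the previous one, so a value contributes to exactly two consecutive results before it falls out of the window.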

Checkpointing is needed for:

• Stateful transformations (updateStateByKey, reduceByKeyAndWindow).

• Fault tolerance, to recover when the driver crashes.

HDFS (or another fault-tolerant file system) is mandatory if you are going to use operations that require checkpointing.
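
Why stateful transformations depend on checkpointing can be shown with a small plain-Python model. It is a sketch under stated assumptions and not Spark code: `update_state_by_key`, `save_checkpoint`, `load_checkpoint`, and `running_count` are hypothetical helpers, and a local JSON file stands in for the checkpoint directory.

```python
import json
import os
import tempfile


def update_state_by_key(state, batch, update_func):
    """Model of updateStateByKey: fold one batch into the per-key state.

    update_func(new_values, previous_state) is called for every key that
    has state or new data; returning None drops the key.
    """
    grouped = {}
    for key, value in batch:
        grouped.setdefault(key, []).append(value)
    new_state = {}
    for key in set(state) | set(grouped):
        result = update_func(grouped.get(key, []), state.get(key))
        if result is not None:
            new_state[key] = result
    return new_state


def save_checkpoint(path, state, next_batch):
    # Write to a temporary file first, then rename, so a crash in the
    # middle of a write never leaves a half-written checkpoint behind.
    directory = os.path.dirname(path) or "."
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "w") as handle:
        json.dump({"state": state, "next_batch": next_batch}, handle)
    os.replace(tmp_path, path)


def load_checkpoint(path):
    with open(path) as handle:
        data = json.load(handle)
    return data["state"], data["next_batch"]


def running_count(new_values, previous):
    # Example update function: a running total per key.
    return (previous or 0) + sum(new_values)
```

The state after batch N depends on every batch before it, so without a saved copy a restarted driver would have to replay the whole stream. With a checkpoint it reloads the last saved state and resumes from the next batch.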

Configuration parameters

• spark.streaming.receiver.maxRate

• spark.streaming.concurrentJobs

• spark.streaming.receiver.writeAheadLog.enable

• spark.streaming.unpersist

Each node has mutable state, and for each record it has to update that state and send new records.
