storm distributed cache workshop

Storm Distributed Cache Workshop: How to efficiently distribute mutable BLOBs into Apache Storm

Upload: roger-rafanell-mas

Post on 22-Jan-2018

Category: Software



TRANSCRIPT

Page 1: Storm distributed cache workshop

Storm Distributed Cache Workshop: How to efficiently distribute mutable BLOBs into Apache Storm

Page 2: Storm distributed cache workshop

Problem (Apache Storm < v1.x)

Topology Resources:

● Dictionaries, ML Models, Geolocation Data, etc...

Typically packaged in topology JAR:

● Immutable: any change requires repackaging and redeploying the topology

● Fine for small files

● Large files negatively impact topology startup time

Page 3: Storm distributed cache workshop

Solution (Apache Storm v1.x)

Storm Distributed Cache:

● Allows sharing of files (BLOBs) among topologies

● Files can change over the lifetime of the topology

● Files can be updated from command line or programmatically

● Allows for files from several KB to several GB in size

● Allows for compression (e.g. Zip, Tar, Gzip)

Page 4: Storm distributed cache workshop

Storm Distributed Cache

Two Implementations:

● LocalFSBlobStore:

○ Stores data on the Nimbus local file system

○ Supports a replication factor (not needed for the HDFS-backed implementation)

● HdfsBlobStore:

○ Stores data on HDFS

Page 5: Storm distributed cache workshop

Nimbus in High Availability

Page 6: Storm distributed cache workshop

Nimbus in High Availability

HA Nimbus:

● Increases the overall availability of Nimbus

● Nimbus hosts can join/leave at any time

● Leverages Distributed Cache API

● JAR, config and serialized topology are uploaded to the Distributed Cache

● Replication guarantees availability of all files

Page 7: Storm distributed cache workshop

Storm Distributed Cache (Create)

Page 8: Storm distributed cache workshop

Storm Distributed Cache (Submit)
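
The slide content for this step is not in the transcript. As a hedged sketch: a topology declares the blobs it needs through the `topology.blobstore.map` setting, which maps each blob key to options such as `localname` (the file name the worker sees) and `uncompress`. The snippet below only builds that map with plain Java collections; the blob key `wordsToTrack` and the class and method names are assumptions for illustration, not taken from the example project.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class BlobstoreMapSketch {

    // Builds the value for "topology.blobstore.map": blob key -> per-blob options.
    static Map<String, Map<String, Object>> blobstoreMap(String blobKey,
                                                         String localName,
                                                         boolean uncompress) {
        Map<String, Object> options = new LinkedHashMap<>();
        options.put("localname", localName);   // file name visible to the worker
        options.put("uncompress", uncompress); // whether the blob is an archive to unpack

        Map<String, Map<String, Object>> map = new LinkedHashMap<>();
        map.put(blobKey, options);
        return map;
    }

    public static void main(String[] args) {
        // Hypothetical key and file name; in a real topology this map would be
        // placed into the topology configuration before submission.
        Map<String, Object> topologyConf = new LinkedHashMap<>();
        topologyConf.put("topology.blobstore.map",
                blobstoreMap("wordsToTrack", "wordsToTrack.list", false));
        System.out.println(topologyConf);
    }
}
```

The same map can also be passed as JSON on the command line when submitting the topology; building it in code keeps the blob key and local name in one place.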

Page 9: Storm distributed cache workshop

Storm Distributed Cache (Update)

Cached files can be updated while topologies are running. In current versions, it is the user's responsibility to check whether a new file is available.
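
One way a bolt could perform that check is to compare the local file's modification time against the last value it saw, for example each time a tick signal arrives. This is a minimal sketch under that assumption; the class name `BlobChangeDetector` is hypothetical and the code uses only the Java standard library, not the Storm API.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.attribute.FileTime;

public class BlobChangeDetector {
    private final Path blobPath;
    private FileTime lastSeen;

    public BlobChangeDetector(Path blobPath) throws IOException {
        this.blobPath = blobPath;
        // getLastModifiedTime follows symbolic links by default, so it reports
        // the time of the file the local name currently points to.
        this.lastSeen = Files.getLastModifiedTime(blobPath);
    }

    /** Returns true exactly once for each observed change of the modification time. */
    public boolean hasChanged() throws IOException {
        FileTime current = Files.getLastModifiedTime(blobPath);
        if (current.equals(lastSeen)) {
            return false;
        }
        lastSeen = current;
        return true;
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempFile("wordsToTrack", ".list");
        BlobChangeDetector detector = new BlobChangeDetector(tmp);
        // Simulate an update by moving the modification time forward.
        Files.setLastModifiedTime(tmp, FileTime.fromMillis(System.currentTimeMillis() + 60_000));
        System.out.println("changed: " + detector.hasChanged());
        Files.delete(tmp);
    }
}
```

Polling on a periodic signal keeps the check off the per-tuple path; when `hasChanged()` returns true, the bolt would reload the file contents.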

Page 10: Storm distributed cache workshop

Storm Distributed Cache (Reading BLOBs)
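
The slide content for this step is not in the transcript. From the topology's point of view, a distributed blob is read like any local file under its configured local name. The sketch below loads a word list into a set, assuming one word per line and the file name `wordsToTrack.list` from the workshop diagram; the class name `TrackedWordsLoader` and the normalisation steps (trim, lower-case) are assumptions.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Set;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class TrackedWordsLoader {

    // Reads the blob's local file and returns its non-empty lines, normalised.
    static Set<String> load(Path blobPath) throws IOException {
        try (Stream<String> lines = Files.lines(blobPath, StandardCharsets.UTF_8)) {
            return lines.map(String::trim)
                        .filter(line -> !line.isEmpty())
                        .map(String::toLowerCase)
                        .collect(Collectors.toSet());
        }
    }

    public static void main(String[] args) throws IOException {
        // The path is resolved relative to the current working directory
        // unless an explicit path is given as the first argument.
        Path path = Paths.get(args.length > 0 ? args[0] : "wordsToTrack.list");
        System.out.println(load(path));
    }
}
```

Returning a fresh immutable-by-convention set on every load makes it easy to swap the whole list atomically when the blob is updated.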

Page 11: Storm distributed cache workshop

Hands-On

Page 12: Storm distributed cache workshop

Infrastructure

[Diagram: Twitter producer, Apache Kafka, Aggregate (WordCount) + DistCache topology]

Page 13: Storm distributed cache workshop

Storm DistCache Topology

[Diagram: Apache Kafka, Kafka Spout, Sentence Splitter, Counter, Aggregate (WordCount), Tick Stream (Signal), Storm Distributed Cache + wordsToTrack.list]
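
The splitting and counting stages in this diagram can be illustrated without any Storm dependency. The following is a sketch of the idea only, assuming that the topology counts just the words listed in `wordsToTrack.list`; the class name `TrackedWordCounter` and the tokenisation rule are hypothetical and may differ from the example project.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

public class TrackedWordCounter {

    // Splits a sentence into lower-cased tokens and counts only the tracked ones.
    static Map<String, Integer> count(String sentence, Set<String> tracked) {
        Map<String, Integer> counts = new HashMap<>();
        // Split on any run of characters that are neither letters nor digits.
        for (String token : sentence.toLowerCase().split("[^\\p{L}\\p{N}]+")) {
            if (tracked.contains(token)) {
                counts.merge(token, 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        Set<String> tracked = new HashSet<>(Arrays.asList("storm", "kafka"));
        System.out.println(count("Storm reads from Kafka; #storm keeps counting", tracked));
    }
}
```

Because the tracked set is passed in as a parameter, replacing it after a blob update changes which words are counted without restarting the topology.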

Page 14: Storm distributed cache workshop

Example

Check out the project:

● https://github.com/rrafanell/storm-distcache-example

● Follow the steps described in the README

Requirements:

● Java: Oracle JDK 1.8 or equivalent

● Maven

● Docker

Page 15: Storm distributed cache workshop

Code Inspection

Page 16: Storm distributed cache workshop

Example (Starting the Infrastructure)

Storm UI: http://localhost:8080

Page 17: Storm distributed cache workshop

Example (Configuring The Twitter-producer)

Page 18: Storm distributed cache workshop

Example (Running The Twitter-producer)

Page 19: Storm distributed cache workshop

Example (Uploading BLOBs)

Page 20: Storm distributed cache workshop

Example (Checking BLOBs)

Page 21: Storm distributed cache workshop

Example (Running the Topology)

Page 22: Storm distributed cache workshop

Example (Running the Topology)

Page 23: Storm distributed cache workshop

Example (Updating the BLOBs & reloading on-the-fly)

Page 24: Storm distributed cache workshop

Example (Shutting down the Infrastructure)

Page 25: Storm distributed cache workshop

Storm Distributed Cache Workshop

THANK YOU!

Page 26: Storm distributed cache workshop

Local FS Blob Store