Storm Distributed Cache Workshop
How to efficiently distribute mutable BLOBs into Apache Storm
Problem (Apache Storm < v1.x)
Topology Resources:
● Dictionaries, ML Models, Geolocation Data, etc...
Typically packaged in topology JAR:
● Immutable: any change requires re-packaging & re-deployment
● Fine for small files
● Large files negatively impact topology startup time
Solution (Apache Storm v1.x)
Storm Distributed Cache:
● Allows sharing of files (BLOBs) among topologies
● Files can change over the lifetime of the topology
● Files can be updated from command line or programmatically
● Allows for files from several KB to several GB in size
● Allows for compression (e.g. Zip, Tar, Gzip)
Storm Distributed Cache
Two Implementations:
● LocalFSBlobStore:
○ Stores data on Nimbus local file system
○ Supports Replication Factor (not needed for HDFS-backed implementation)
● HdfsBlobStore:
○ Stores data on HDFS file system
Nimbus in High Availability
HA Nimbus:
● Increases overall availability of Nimbus
● Nimbus hosts can join/leave at any time
● Leverages Distributed Cache API
● JAR, Config and Serialized Topology are uploaded to the Distributed Cache
● Replication guarantees availability of all files
Storm Distributed Cache (Create)
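The code shown on this slide is not part of the transcript. As a minimal sketch of the command-line route, the class below assembles a `storm blobstore create` invocation with the `--file`, `--acl` and `--replication-factor` options. The class name, the key `wordsToTrack` and the ACL value are illustrative assumptions, not taken from the workshop project, and the sketch only builds the argument list; it does not run the command.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper (not part of Storm): builds the argument list for
// `storm blobstore create`. Assumes the `storm` CLI is on the PATH.
class BlobCreateCommand {

    static List<String> build(String key, String file, String acl, int replicationFactor) {
        List<String> cmd = new ArrayList<>();
        cmd.add("storm");
        cmd.add("blobstore");
        cmd.add("create");
        cmd.add("--file");
        cmd.add(file);
        cmd.add("--acl");
        cmd.add(acl);
        // The replication factor only matters for LocalFSBlobStore;
        // pass 0 to leave the option out (e.g. for an HDFS-backed store).
        if (replicationFactor > 0) {
            cmd.add("--replication-factor");
            cmd.add(String.valueOf(replicationFactor));
        }
        cmd.add(key); // the blob key goes last
        return cmd;
    }

    public static void main(String[] args) {
        // Key, file name and ACL are example values.
        List<String> cmd = build("wordsToTrack", "wordsToTrack.list", "o::rwa", 1);
        System.out.println(String.join(" ", cmd));
    }
}
```

The resulting list can be handed to `new ProcessBuilder(cmd)` if the command should actually be executed from Java.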
Storm Distributed Cache (Submit)
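The slide content is missing here as well. At submit time a topology declares which blobs it needs through the `topology.blobstore.map` setting, which maps each blob key to a local file name and an uncompress flag. The sketch below builds that setting as a plain `java.util.Map`, assuming the result is then merged into the topology's configuration; the class name and the example key and file name are assumptions for illustration.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper (not part of Storm): builds the configuration entry
// that tells Storm which blob a topology wants and under which local name.
class BlobstoreMapConfig {

    static Map<String, Object> forBlob(String key, String localname, boolean uncompress) {
        // Per-blob settings: the local file name and whether archives are unpacked.
        Map<String, Object> spec = new HashMap<>();
        spec.put("localname", localname);
        spec.put("uncompress", uncompress);

        // Blob key -> per-blob settings.
        Map<String, Object> blobs = new HashMap<>();
        blobs.put(key, spec);

        // Top-level topology setting.
        Map<String, Object> conf = new HashMap<>();
        conf.put("topology.blobstore.map", blobs);
        return conf;
    }

    public static void main(String[] args) {
        // Example values only.
        System.out.println(forBlob("wordsToTrack", "wordsToTrack.list", false));
    }
}
```

Because Storm's `Config` class is itself a map of string keys to objects, the returned map can be added with `putAll` before the topology is submitted.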
Storm Distributed Cache (Update)
Cached files can be updated while topologies are running. In current versions, it is the
user’s responsibility to check whether a new version of a file is available.
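One way to carry out that check is to compare the local file's modification time against the last value seen and reload only when it differs. The sketch below does this with the standard library only. It assumes the blob is visible to the worker as a local file and that it contains one word per line; the class and method names are hypothetical. In a topology, `refreshIfChanged()` could be called periodically, for example whenever a tick tuple arrives.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.attribute.FileTime;
import java.util.Collections;
import java.util.LinkedHashSet;
import java.util.Set;

// Hypothetical helper: keeps an in-memory copy of a word list and reloads it
// when the file's modification time changes.
class ReloadableWordList {
    private final Path file;
    private FileTime lastSeen; // null until the first successful load
    private Set<String> words = Collections.emptySet();

    ReloadableWordList(Path file) {
        this.file = file;
    }

    /** Reloads the list if the file changed; returns true when a reload happened. */
    boolean refreshIfChanged() {
        try {
            // Follows symbolic links, so a re-pointed link is seen as a change.
            FileTime current = Files.getLastModifiedTime(file);
            if (current.equals(lastSeen)) {
                return false;
            }
            Set<String> fresh = new LinkedHashSet<>();
            for (String line : Files.readAllLines(file, StandardCharsets.UTF_8)) {
                String word = line.trim();
                if (!word.isEmpty()) {
                    fresh.add(word); // assumed format: one word per line
                }
            }
            words = Collections.unmodifiableSet(fresh);
            lastSeen = current;
            return true;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    Set<String> words() {
        return words;
    }
}
```

Comparing modification times is cheap but coarse; if updates can land within the file system's timestamp resolution, comparing a content hash instead would be more robust.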
Storm Distributed Cache (Reading BLOBs)
Hands-On
Infrastructure
[Diagram: Twitter producer, Apache Kafka, Aggregate (WordCount) + DistCache Topology]
Storm DistCache Topology
[Diagram: Apache Kafka, Kafka Spout, Sentence Splitter, Counter, Aggregate (WordCount); Storm Distributed Cache + wordsToTrack.list; Tick Stream (Signal)]
Example
Check out the project:
● https://github.com/rrafanell/storm-distcache-example
● Follow the steps described in the README
Requirements:
● Java: Oracle JDK 1.8 or similar
● Maven
● Docker
Code Inspection
Example (Starting the Infrastructure)
Storm UI: http://localhost:8080
Example (Configuring The Twitter-producer)
Example (Running The Twitter-producer)
Example (Uploading BLOBs)
Example (Checking BLOBs)
Example (Running the Topology)
Example (Updating the BLOBs & reloading on-the-fly)
Example (Shutting down the Infrastructure)
Storm Distributed Cache Workshop
THANK YOU!
Local FS Blob Store