Streams processing with Storm
TRANSCRIPT
Data streams processing with Storm
Mariusz Gil
Data expires fast. Very fast.
Realtime processing?
Storm is a free and open source distributed realtime computation system. Storm makes it easy to reliably process unbounded streams of data, doing for realtime processing what Hadoop did for batch processing.
Storm is fast: a benchmark clocked it at over a million tuples processed per second per node. It is scalable, fault-tolerant, guarantees your data will be processed, and is easy to set up and operate.
Concept architecture
Stream: an unbounded sequence of tuples, e.g. (val1, val2), (val3, val4), (val5, val6).
Spouts: sources of streams.
Reliable and unreliable spouts: replay or forget about a tuple.
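The replay contract behind reliable spouts can be sketched in plain Java. The `PendingTuples` class below is a hypothetical illustration of the semantics only (in Storm the output collector and acker tasks track this for you, this is not the Storm API): a reliable spout remembers each emitted tuple under a message id until it is acked, and puts it back for replay when it fails.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;

// Sketch of reliable-spout semantics (NOT the Storm API):
// emitted tuples stay pending under a message id until acked;
// a failed id re-queues its tuple for replay.
public class PendingTuples {
    private final Map<Long, String> pending = new HashMap<>();
    private final Deque<String> queue = new ArrayDeque<>();
    private long nextId = 0;

    public void offer(String tuple) { queue.add(tuple); }

    // emit: take the next tuple and remember it under a fresh message id
    public Long emit() {
        String tuple = queue.poll();
        if (tuple == null) return null;
        long id = nextId++;
        pending.put(id, tuple);
        return id;
    }

    // ack: the tuple was fully processed, forget it
    public void ack(long id) { pending.remove(id); }

    // fail: put the tuple back at the front so it is replayed later
    public void fail(long id) {
        String tuple = pending.remove(id);
        if (tuple != null) queue.addFirst(tuple);
    }

    public int pendingCount() { return pending.size(); }
}
```

An unreliable spout is simply the degenerate case: it never stores anything in `pending`, so a failed tuple is lost.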
Spouts: sources of streams. Ready-made spout integrations include Storm-Kafka, Storm-Kestrel, Storm-AMQP-Spout, Storm-JMS, Storm-PubSub* and Storm-Beanstalkd-Spout.
Bolts: process input streams and produce new streams.
Topologies: networks of spouts and bolts.

TextSpout → SplitSentenceBolt → WordCountBolt
[sentence] → [word] → [word, count]
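The data flow of this topology (sentences in, words out, running counts out) can be traced in plain Java, independent of Storm; the code below is only an illustration of what each component contributes, not a topology:

```java
import java.util.HashMap;
import java.util.Map;

// Plain-Java trace of the slide's pipeline:
// TextSpout emits sentences, SplitSentenceBolt splits them into words,
// WordCountBolt keeps a running count per word.
public class WordCountFlow {
    public static Map<String, Integer> run(String[] sentences) {
        Map<String, Integer> counts = new HashMap<>();
        for (String sentence : sentences) {           // spout: one tuple per sentence
            for (String word : sentence.split(" ")) { // split bolt: one tuple per word
                counts.merge(word, 1, Integer::sum);  // count bolt: update [word, count]
            }
        }
        return counts;
    }
}
```

In the real topology each arrow is a stream, and each stage can run with its own parallelism.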
Topologies: networks of spouts and bolts.

A topology can form a larger graph: several TextSpouts emitting [sentence] streams, SplitSentenceBolts emitting [word], a WordCountBolt emitting [word, count], and additional bolts such as xyzBolt.
Servers architecture
Nimbus: the process responsible for distributing processing across the cluster.
Supervisors: worker processes responsible for executing a subset of a topology.
ZooKeepers: the coordination layer between Nimbus and the Supervisors.
Fail fast: cluster state is stored locally or in ZooKeeper, so Nimbus and the Supervisors can be killed and restarted without losing state.
Sample code
Spouts

```java
public class RandomSentenceSpout extends BaseRichSpout {
  SpoutOutputCollector _collector;
  Random _rand;

  @Override
  public void open(Map conf, TopologyContext context, SpoutOutputCollector collector) {
    _collector = collector;
    _rand = new Random();
  }

  @Override
  public void nextTuple() {
    Utils.sleep(100);
    String[] sentences = new String[] {
        "the cow jumped over the moon",
        "an apple a day keeps the doctor away",
        "four score and seven years ago",
        "snow white and the seven dwarfs",
        "i am at two with nature" };
    String sentence = sentences[_rand.nextInt(sentences.length)];
    _collector.emit(new Values(sentence));
  }

  @Override
  public void ack(Object id) { }

  @Override
  public void fail(Object id) { }

  @Override
  public void declareOutputFields(OutputFieldsDeclarer declarer) {
    declarer.declare(new Fields("word"));
  }
}
```
Bolts

```java
public static class WordCount extends BaseBasicBolt {
  Map<String, Integer> counts = new HashMap<String, Integer>();

  @Override
  public void execute(Tuple tuple, BasicOutputCollector collector) {
    String word = tuple.getString(0);
    Integer count = counts.get(word);
    if (count == null) count = 0;
    count++;
    counts.put(word, count);
    collector.emit(new Values(word, count));
  }

  @Override
  public void declareOutputFields(OutputFieldsDeclarer declarer) {
    declarer.declare(new Fields("word", "count"));
  }
}
```
Bolts

```java
public static class ExclamationBolt implements IRichBolt {
  OutputCollector _collector;

  public void prepare(Map conf, TopologyContext context, OutputCollector collector) {
    _collector = collector;
  }

  public void execute(Tuple tuple) {
    _collector.emit(tuple, new Values(tuple.getString(0) + "!!!"));
    _collector.ack(tuple);
  }

  public void cleanup() { }

  public void declareOutputFields(OutputFieldsDeclarer declarer) {
    declarer.declare(new Fields("word"));
  }

  public Map getComponentConfiguration() {
    return null;
  }
}
```
Topology

```java
public class WordCountTopology {
  public static void main(String[] args) throws Exception {
    TopologyBuilder builder = new TopologyBuilder();

    builder.setSpout("spout", new RandomSentenceSpout(), 5);
    builder.setBolt("split", new SplitSentence(), 8)
           .shuffleGrouping("spout");
    builder.setBolt("count", new WordCount(), 12)
           .fieldsGrouping("split", new Fields("word"));

    Config conf = new Config();
    conf.setDebug(true);

    if (args != null && args.length > 0) {
      conf.setNumWorkers(3);
      StormSubmitter.submitTopology(args[0], conf, builder.createTopology());
    } else {
      conf.setMaxTaskParallelism(3);
      LocalCluster cluster = new LocalCluster();
      cluster.submitTopology("word-count", conf, builder.createTopology());
      Thread.sleep(10000);
      cluster.shutdown();
    }
  }
}
```
Bolts

```java
public static class SplitSentence extends ShellBolt implements IRichBolt {
  public SplitSentence() {
    super("python", "splitsentence.py");
  }

  public void declareOutputFields(OutputFieldsDeclarer declarer) {
    declarer.declare(new Fields("word"));
  }
}
```

```python
import storm

class SplitSentenceBolt(storm.BasicBolt):
    def process(self, tup):
        words = tup.values[0].split(" ")
        for word in words:
            storm.emit([word])

SplitSentenceBolt().run()
```
github.com/nathanmarz/storm-starter
Stream groupings
Topology

```java
public class WordCountTopology {
  public static void main(String[] args) throws Exception {
    TopologyBuilder builder = new TopologyBuilder();

    builder.setSpout("spout", new RandomSentenceSpout(), 5);
    builder.setBolt("split", new SplitSentence(), 8)
           .shuffleGrouping("spout");
    builder.setBolt("count", new WordCount(), 12)
           .fieldsGrouping("split", new Fields("word"));

    Config conf = new Config();
    conf.setDebug(true);

    if (args != null && args.length > 0) {
      conf.setNumWorkers(3);
      StormSubmitter.submitTopology(args[0], conf, builder.createTopology());
    } else {
      conf.setMaxTaskParallelism(3);
      LocalCluster cluster = new LocalCluster();
      cluster.submitTopology("word-count", conf, builder.createTopology());
      Thread.sleep(10000);
      cluster.shutdown();
    }
  }
}
```
Groupings: shuffle, fields, all, global, none, direct, local-or-shuffle.
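What distinguishes the groupings is how each tuple picks a downstream task. A fields grouping hashes the grouping fields, so equal values always reach the same task (which is why a per-task word count stays consistent); shuffle spreads tuples randomly; global sends everything to a single task. A plain-Java sketch of that routing decision (an illustration of the semantics, not Storm's internal implementation):

```java
import java.util.Random;

// Sketch of grouping semantics: choose a target task index for a tuple.
public class Groupings {
    // fields grouping: hash the grouping field, so the same value
    // is always routed to the same task
    public static int fieldsGrouping(String fieldValue, int numTasks) {
        return Math.floorMod(fieldValue.hashCode(), numTasks);
    }

    // shuffle grouping: pick a random task, balancing load evenly
    public static int shuffleGrouping(Random rand, int numTasks) {
        return rand.nextInt(numTasks);
    }

    // global grouping: every tuple goes to one task (the lowest-id one)
    public static int globalGrouping() {
        return 0;
    }
}
```

The remaining kinds refine these: `all` replicates a tuple to every task, `direct` lets the producer name the target task, and `local-or-shuffle` prefers tasks in the same worker process before shuffling.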
Distributed RPC
Distributed RPC

The client sends arguments and receives results; inside the topology they travel as [request-id, arguments] and [request-id, results] tuples.
Distributed RPC

```java
public static class ExclaimBolt extends BaseBasicBolt {
  public void execute(Tuple tuple, BasicOutputCollector collector) {
    String input = tuple.getString(1);
    collector.emit(new Values(tuple.getValue(0), input + "!"));
  }

  public void declareOutputFields(OutputFieldsDeclarer declarer) {
    declarer.declare(new Fields("id", "result"));
  }
}

public static void main(String[] args) throws Exception {
  LinearDRPCTopologyBuilder builder = new LinearDRPCTopologyBuilder("exclamation");
  builder.addBolt(new ExclaimBolt(), 3);

  Config conf = new Config();  // not shown on the slide, but required below
  LocalDRPC drpc = new LocalDRPC();
  LocalCluster cluster = new LocalCluster();

  cluster.submitTopology("drpc-demo", conf, builder.createLocalTopology(drpc));

  System.out.println("Results for 'hello': " + drpc.execute("exclamation", "hello"));

  cluster.shutdown();
  drpc.shutdown();
}
```
Realtime analytics, personalization, search, revenue optimization, monitoring.
Content search, realtime analytics, generating feeds; integrated with Elasticsearch, HBase, Hadoop and HDFS.
Realtime scoring, moments generation; integrated with Kafka queues and HDFS storage.
Storm-YARN enables Storm applications to utilize the computational resources of a Hadoop cluster along with accessing Hadoop storage resources such as HBase and HDFS.
Thanks!
mail: [email protected]
twitter: @mariuszgil