storm - nodebb · distributed realtime computation system originated at backtype/twitter, open...

Storm Hui Li 08/15/2016

Post on 15-Jul-2020

Page 1: Storm - NodeBB · Distributed realtime computation system Originated at BackType/Twitter, open sourced in late 2011 ... sudo tar -zxvf apache-storm-1.0.2.tar.gz -C /opt/ 4. Fill in

Storm

Hui Li

08/15/2016

► Storm introduction and features

► Storm core concepts

► Storm system architecture

► Using Storm

► Storm application development

Storm outline

► Distributed realtime computation system

► Originated at BackType/Twitter, open sourced in late 2011

► Implemented in Clojure, some Java

► Apache top-level project, ~141 contributors

Storm introduction

► Reliable, guaranteed data processing
  ► At-most-once, at-least-once, and exactly-once (Trident)

► Scalable
  ► Thousands of workers per cluster

► Fault-tolerant
  ► Failure is expected and embraced

► Fast
  ► Clocked at 1M+ messages per second per node

Storm features

► Realtime analytics

► Online machine learning

► Continuous computation

► Distributed RPC

► ETL

Storm Use Cases

► Twitter: personalization, search, revenue optimization, …
  ► 200 nodes, 30 topologies, 50B msg/day, avg latency <50ms, Jun 2013

► Yahoo: user events, content feeds, and application logs
  ► 320 nodes (YARN), 130k msg/s, June 2013

► Spotify: recommendation, ads, monitoring, …
  ► v0.8.0, 22 nodes, 15+ topologies, 200k msg/s, Mar 2014

► Alibaba, Cisco, Flickr, PARC, WeatherChannel, …
  ► Netflix is looking at Storm and Samza, too.

Storm Adoptions

► Core Unit of Data

► Immutable Set of Key/Value Pairs

Tuple
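To make the "immutable set of key/value pairs" idea concrete, here is a minimal plain-Java sketch. `SimpleTuple` is a hypothetical stand-in written for illustration, not Storm's own `Tuple` class; it only shows named fields paired with values that cannot be changed after construction.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical stand-in for a tuple: ordered field names paired with values.
// This is NOT Storm's Tuple class; it only illustrates the concept.
final class SimpleTuple {
    private final List<String> fields;
    private final List<Object> values;

    SimpleTuple(List<String> fields, List<Object> values) {
        if (fields.size() != values.size()) {
            throw new IllegalArgumentException("fields and values must have the same length");
        }
        // Defensive copies wrapped as unmodifiable lists keep the tuple immutable.
        this.fields = Collections.unmodifiableList(new ArrayList<>(fields));
        this.values = Collections.unmodifiableList(new ArrayList<>(values));
    }

    // Look a value up by its field name (the "key").
    Object getValueByField(String field) {
        int i = fields.indexOf(field);
        if (i < 0) {
            throw new IllegalArgumentException("unknown field: " + field);
        }
        return values.get(i);
    }

    // Look a value up by its position.
    Object getValue(int index) {
        return values.get(index);
    }

    List<String> getFields() {
        return fields;
    }

    public static void main(String[] args) {
        SimpleTuple t = new SimpleTuple(
                Arrays.asList("word", "count"),
                Arrays.<Object>asList("storm", 3));
        System.out.println(t.getValueByField("word") + " -> " + t.getValueByField("count"));
    }
}
```

Keeping the field names separate from the values mirrors how a component declares its output fields once and then emits many value lists against that declaration.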

► Unbounded Sequence of Tuples

Streams

► Source of Streams

► Wraps a streaming data source and emits Tuples

► E.g., read from Kafka or Redis

Spouts

► Processes input streams and produces new streams

► Core functions of a streaming computation

Bolts

► Functions

► Filters

► Aggregation

► Joins

► Talk to databases

Bolts

► Network (DAG) of spouts and bolts

► Data Flow Representation

Topology

► Spouts and bolts execute as many tasks across the cluster

Tasks

► Determine how Storm routes Tuples between tasks in a topology

Stream Grouping

► Shuffle grouping
  ► Randomized round-robin

► Local or shuffle grouping
  ► Randomized round-robin
  ► With a preference for intra-worker tasks

Stream Grouping
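"Randomized round-robin" can be pictured as walking a shuffled list of target tasks and reshuffling whenever the end is reached. The sketch below is plain Java written for illustration; the class name `ShuffleGroupingSketch` and the seed parameter are assumptions, and it is not presented as Storm's actual implementation.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Random;

// Sketch of "randomized round-robin": walk a shuffled list of target tasks,
// reshuffling each time the end of the list is reached.
final class ShuffleGroupingSketch {
    private final List<Integer> tasks;
    private final Random random;
    private int next = 0;

    ShuffleGroupingSketch(List<Integer> targetTasks, long seed) {
        this.tasks = new ArrayList<>(targetTasks);
        this.random = new Random(seed);
        Collections.shuffle(this.tasks, this.random);
    }

    // Returns the task that should receive the next tuple.
    int chooseTask() {
        if (next >= tasks.size()) {
            Collections.shuffle(tasks, random);
            next = 0;
        }
        return tasks.get(next++);
    }

    public static void main(String[] args) {
        ShuffleGroupingSketch grouping = new ShuffleGroupingSketch(Arrays.asList(0, 1, 2, 3), 42L);
        int[] counts = new int[4];
        for (int i = 0; i < 400; i++) {
            counts[grouping.chooseTask()]++;
        }
        // Every pass over the list visits each task once, so the load is even.
        System.out.println(Arrays.toString(counts)); // prints [100, 100, 100, 100]
    }
}
```

The order inside each pass is random, but every task receives exactly one tuple per pass, which is what keeps the distribution even.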

► Fields grouping
  ► Mod hashing on a subset of tuple fields
  ► Ensures all tuples with the same field values are always routed to the same task

► Partial Key grouping
  ► Like the Fields grouping, but tuples are load balanced between two downstream bolts
  ► Provides better utilization of resources for skewed incoming data

Stream Grouping
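The two key-based groupings above can be sketched in a few lines of plain Java. This is an illustration under stated assumptions, not Storm's code: `KeyGroupingSketch`, the use of `List.hashCode()`, and the second hash function `31 * h + 17` are all choices made for this example.

```java
import java.util.Arrays;
import java.util.List;

final class KeyGroupingSketch {
    // Fields grouping: hash the grouping values, then take the result modulo
    // the number of target tasks. Equal values always map to the same task.
    static int fieldsTask(List<Object> groupingValues, int numTasks) {
        return Math.floorMod(groupingValues.hashCode(), numTasks);
    }

    // Partial key grouping: each key has two candidate tasks; the tuple goes to
    // whichever candidate this sender has loaded less so far.
    // load[i] counts the tuples sent to task i and is updated in place.
    static int partialKeyTask(List<Object> groupingValues, long[] load) {
        int n = load.length;
        int h = groupingValues.hashCode();
        int first = Math.floorMod(h, n);
        int second = Math.floorMod(31 * h + 17, n); // assumed second hash function
        int chosen = load[first] <= load[second] ? first : second;
        load[chosen]++;
        return chosen;
    }

    public static void main(String[] args) {
        List<Object> key = Arrays.<Object>asList("the");
        System.out.println("fields grouping task: " + fieldsTask(key, 6));

        long[] load = new long[6];
        for (int i = 0; i < 100; i++) {
            partialKeyTask(key, load); // a hot key is spread over at most two tasks
        }
        System.out.println("partial key load: " + Arrays.toString(load));
    }
}
```

Because a key can now live on two tasks, downstream aggregation under partial key grouping needs a later step that merges the two partial results per key.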

► All grouping
  ► Send to all tasks

► Global grouping
  ► Pick the task with the lowest id

► None grouping
  ► Currently equivalent to shuffle grouping

Stream Grouping

► Direct grouping
  ► The producer of the tuple decides which task of the consumer will receive this tuple.
  ► Direct groupings can only be declared on streams that have been declared as direct streams.

Stream Grouping

Storm architecture

[Figure: a topology is submitted to Nimbus; Nimbus coordinates through a three-node ZooKeeper cluster with three Supervisors, each of which runs Workers.]

► Storm Deployment

► Command Line Client

► REST API

► Storm UI

Using Storm

► 1. Set up a ZooKeeper cluster
  ► For demo: storm dev-zookeeper

► 2. Install dependencies on Nimbus and worker machines
  ► Java 7
  ► Python 2.6.6
  ► Optional: configure the PATH and JAVA_HOME environment variables

Storm Deployment

► 3. Download and extract a Storm release to Nimbus and worker machines
  ► sudo tar -zxvf apache-storm-1.0.2.tar.gz -C /opt/

► 4. Fill in mandatory configurations into storm.yaml
  ► storm.zookeeper.servers
  ► nimbus.seeds
  ► supervisor.slots.ports
  ► storm.local.dir

Storm Deployment
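As a rough illustration of step 4, a storm.yaml with the four mandatory settings might look like the fragment below. The hostnames and the local directory are placeholders invented for this example; replace them with the values for your own cluster.

```yaml
# conf/storm.yaml -- hostnames and paths below are placeholders
storm.zookeeper.servers:
  - "zk1.example.com"
  - "zk2.example.com"
  - "zk3.example.com"

# Candidate Nimbus hosts that workers and clients contact
nimbus.seeds: ["nimbus1.example.com"]

# One worker slot per port on each supervisor machine
supervisor.slots.ports:
  - 6700
  - 6701
  - 6702
  - 6703

# Local directory where Nimbus and the supervisors keep their state
storm.local.dir: "/var/storm"
```

Each entry under `supervisor.slots.ports` corresponds to one worker process the machine can host, so four ports means at most four workers on that node.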

► 5. Launch daemons under supervision using the "storm" script and a supervisor of your choice
  ► storm nimbus
  ► storm supervisor
  ► Optional
    ► storm ui
    ► storm drpc
    ► storm logviewer

Storm Deployment

► storm jar topology-jar-path class ...

► storm list

► storm deactivate topology-name

► storm activate topology-name

► storm rebalance topology-name [-w wait-time-secs] [-n new-num-workers] [-e component=parallelism]*

► storm get-errors topology-name

► storm kill topology-name [-w wait-time-secs]

Command Line Client -- Topology Related

► storm nimbus

► storm supervisor

► storm ui

► storm drpc

► storm logviewer

► storm pacemaker

Command Line Client -- Daemon Related

► storm classpath

► storm localconfvalue conf-name
  ► ~/.storm/storm.yaml + defaults.yaml

► storm remoteconfvalue conf-name
  ► $STORM-PATH/conf/storm.yaml + defaults.yaml

Command Line Client -- Config Related

► storm monitor topology-name [-i interval-secs] [-m component-id] [-s stream-id] [-w [emitted | transferred]]

► storm set_log_level -l [logger name]=[log level][:optional timeout] -r [logger name] topology-name

► storm shell resourcesdir command args

► storm blobstore cmd
  ► storm blobstore create mytopo:data.tgz -f data.tgz -a u:alice:rwa,u:bob:rw,o::r

► storm sql sql-file topology-name

Command Line Client -- Advanced

► storm help

► storm version

► storm dev-zookeeper

► storm kill_workers
  ► Run on a supervisor node

Command Line Client -- Misc

► Functions
  ► Retrieving metrics data
  ► Retrieving configuration information
  ► Management operations

► Supports JSONP

REST API

► Request URL Format
  ► http://<ui-host>:<ui-port>/api/v1/...
  ► Default Port: 8080

► Response Format: JSON

REST API

► /api/v1/cluster/configuration (GET)

► /api/v1/cluster/summary (GET)

► /api/v1/nimbus/summary (GET)

► /api/v1/supervisor/summary (GET)

► /api/v1/topology/summary (GET)

► /api/v1/topology/:id (GET)

REST API - GET
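A GET endpoint such as /api/v1/cluster/summary can be called from plain Java with the standard library alone. The sketch below assumes a Storm UI reachable at localhost on the default port 8080; the class name `StormRestSketch` and its helper methods are invented for this example.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.net.HttpURLConnection;
import java.net.URI;
import java.nio.charset.StandardCharsets;

final class StormRestSketch {
    // Builds http://<ui-host>:<ui-port>/api/v1/<path>.
    static String apiUrl(String uiHost, int uiPort, String path) {
        return "http://" + uiHost + ":" + uiPort + "/api/v1/" + path;
    }

    // Issues a GET request and returns the response body (JSON) as a string.
    static String get(String url) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) URI.create(url).toURL().openConnection();
        conn.setRequestMethod("GET");
        conn.setConnectTimeout(5000);
        conn.setReadTimeout(5000);
        try (BufferedReader in = new BufferedReader(
                new InputStreamReader(conn.getInputStream(), StandardCharsets.UTF_8))) {
            StringBuilder body = new StringBuilder();
            String line;
            while ((line = in.readLine()) != null) {
                body.append(line).append('\n');
            }
            return body.toString();
        } finally {
            conn.disconnect();
        }
    }

    public static void main(String[] args) {
        // Assumption: the Storm UI runs on localhost with the default port.
        String url = apiUrl("localhost", 8080, "cluster/summary");
        try {
            System.out.println(get(url));
        } catch (IOException e) {
            System.out.println("Storm UI not reachable at " + url + ": " + e.getMessage());
        }
    }
}
```

The other GET endpoints work the same way; only the path passed to `apiUrl` changes, for example "topology/summary".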

► /api/v1/topology/:id/activate (POST)

► /api/v1/topology/:id/deactivate (POST)

► /api/v1/topology/:id/rebalance/:wait-time (POST)

► /api/v1/topology/:id/kill/:wait-time (POST)

REST API - POST

Storm UI

► API

► WordCount Example

► Parallelism

► Reliability API

► DRPC

► Trident

► WordCount (Trident version) Example

Storm application development

public interface ISpout extends Serializable {
    // Lifecycle API
    void open(Map conf, TopologyContext context, SpoutOutputCollector collector);
    void close();
    void activate();
    void deactivate();

    // Core API
    void nextTuple();

    // Reliability API
    void ack(Object msgId);
    void fail(Object msgId);
}

API -- Spout

• Common sub-interface: IRichSpout
• Common implementations: BaseRichSpout, DRPCSpout, RandomSentenceSpout, KafkaSpout

public interface IBolt extends Serializable {
    // Lifecycle API
    void prepare(Map stormConf, TopologyContext context, OutputCollector collector);
    void cleanup();

    // Core API
    void execute(Tuple input);
}

API -- Bolt

• Common sub-interface: IRichBolt
• Common implementations: BaseRichBolt, ShellBolt, RedisStoreBolt, KafkaBolt, HdfsBolt

public interface IOutputCollector extends IErrorReporter {
    // Core API
    List<Integer> emit(String streamId, Collection<Tuple> anchors, List<Object> tuple);
    void emitDirect(int taskId, String streamId, Collection<Tuple> anchors, List<Object> tuple);

    // Reliability API
    void ack(Tuple input);
    void fail(Tuple input);
    void resetTimeout(Tuple input);
}

API -- Bolt Output

TopologyBuilder builder = new TopologyBuilder();

builder.setSpout("spout", new RandomSentenceSpout(), 2);

builder.setBolt("split", new SplitSentence(), 2).shuffleGrouping("spout").setNumTasks(4);

builder.setBolt("count", new WordCount(), 6).fieldsGrouping("split", new Fields("word"));

API -- Topology

[Figure: spout → split → count]

Common configuration options

• Config.TOPOLOGY_WORKERS

• Config.TOPOLOGY_ACKER_EXECUTORS

• Config.TOPOLOGY_MAX_SPOUT_PENDING

• Config.TOPOLOGY_MESSAGE_TIMEOUT_SECS

• Config.TOPOLOGY_SERIALIZATIONS

API -- Topology Configuration

Config conf = new Config();

conf.setNumWorkers(20);

conf.setMaxSpoutPending(5000);

► Local Mode

LocalCluster cluster = new LocalCluster();
cluster.submitTopology("word-count", conf, builder.createTopology());
...
cluster.shutdown();

► Remote Mode

StormSubmitter.submitTopologyWithProgressBar("word-count", conf, builder.createTopology());

API -- Topology Submission

WordCount Example

[Figure: the spout emits the sentence "snow white and the seven dwarfs"; the split bolt (shuffle grouping) emits the words snow, white, and, the, seven, dwarfs; the count bolt (fields grouping) keeps running counts, e.g. seven: 11, snow: 11, and: 23, dwarfs: 11, the: 19, white: 11.]

TopologyBuilder builder = new TopologyBuilder();

builder.setSpout("spout", new RandomSentenceSpout(), 2);

builder.setBolt("split", new SplitSentence(), 2).shuffleGrouping("spout").setNumTasks(4);

builder.setBolt("count", new WordCount(), 6).fieldsGrouping("split", new Fields("word"));

import java.util.Map;
import java.util.Random;

import org.apache.storm.spout.SpoutOutputCollector;
import org.apache.storm.task.TopologyContext;
import org.apache.storm.topology.OutputFieldsDeclarer;
import org.apache.storm.topology.base.BaseRichSpout;
import org.apache.storm.tuple.Fields;
import org.apache.storm.tuple.Values;
import org.apache.storm.utils.Utils;

public class RandomSentenceSpout extends BaseRichSpout {
    SpoutOutputCollector _collector;
    Random _rand;

    // Called once when the spout task starts; keep the collector for later emits.
    public void open(Map conf, TopologyContext context, SpoutOutputCollector collector) {
        _collector = collector;
        _rand = new Random();
    }

    // Called repeatedly by Storm; emits one random sentence roughly every 100 ms.
    public void nextTuple() {
        Utils.sleep(100);
        String[] sentences = new String[]{
            "the cow jumped over the moon",
            "an apple a day keeps the doctor away",
            "four score and seven years ago",
            "snow white and the seven dwarfs",
            "i am at two with nature"
        };
        String sentence = sentences[_rand.nextInt(sentences.length)];
        _collector.emit(new Values(sentence));
    }

    public void declareOutputFields(OutputFieldsDeclarer declarer) {
        declarer.declare(new Fields("sentence"));
    }
}

WordCount Example -- Spout

public static class SplitSentence extends BaseBasicBolt {
    // Split each incoming sentence on spaces and emit one tuple per word.
    public void execute(Tuple tuple, BasicOutputCollector collector) {
        String sentence = tuple.getString(0);
        for (String word : sentence.split(" ")) {
            collector.emit(new Values(word));
        }
    }

    public void declareOutputFields(OutputFieldsDeclarer declarer) {
        declarer.declare(new Fields("word"));
    }
}

WordCount Example -- Split Bolt

public static class WordCount extends BaseBasicBolt {
    // Running count per word, local to this task.
    Map<String, Integer> counts = new HashMap<String, Integer>();

    // Increment the count for the incoming word and emit the updated total.
    public void execute(Tuple tuple, BasicOutputCollector collector) {
        String word = tuple.getString(0);
        Integer count = counts.get(word);
        if (count == null) count = 0;
        count++;
        counts.put(word, count);
        collector.emit(new Values(word, count));
        System.out.println(String.format("== %s, %d ==", word, count));
    }

    public void declareOutputFields(OutputFieldsDeclarer declarer) {
        declarer.declare(new Fields("word", "count"));
    }
}

WordCount Example -- Count Bolt

builder.setSpout("spout", new RandomSentenceSpout(), 2);

builder.setBolt("split", new SplitSentence(), 2).shuffleGrouping("spout").setNumTasks(4);

builder.setBolt("count", new WordCount(), 6).fieldsGrouping("split", new Fields("word"));

conf.setNumWorkers(2);

Parallelism

How do the parallelism hint, task number, and worker number relate?

► Worker processes

► Executors (threads)

► Tasks

Parallelism
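To see how worker processes, executors, and tasks relate, the sketch below tallies them for the WordCount topology configured with `conf.setNumWorkers(2)`. It is plain Java arithmetic written for illustration; `ParallelismSketch` is a hypothetical name, and acker executors, which Storm adds on its own, are deliberately left out.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class ParallelismSketch {
    // Per component: {executors (the parallelism hint), tasks}.
    static Map<String, int[]> wordCountLayout() {
        Map<String, int[]> layout = new LinkedHashMap<>();
        layout.put("spout", new int[]{2, 2}); // tasks default to the executor count
        layout.put("split", new int[]{2, 4}); // setNumTasks(4): two tasks per executor
        layout.put("count", new int[]{6, 6});
        return layout;
    }

    // Sums column 0 (executors) or column 1 (tasks) over all components.
    static int total(Map<String, int[]> layout, int column) {
        int sum = 0;
        for (int[] entry : layout.values()) {
            sum += entry[column];
        }
        return sum;
    }

    public static void main(String[] args) {
        Map<String, int[]> layout = wordCountLayout();
        int workers = 2; // conf.setNumWorkers(2)
        int executors = total(layout, 0);
        int tasks = total(layout, 1);
        // prints: executors=10, tasks=12, executors per worker=5
        System.out.println("executors=" + executors + ", tasks=" + tasks
                + ", executors per worker=" + (executors / workers));
    }
}
```

The parallelism hint sets the number of executors (threads), `setNumTasks` sets how many task instances those threads run, and the worker count decides how many JVM processes the executors are spread across.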

Parallelism

► Rebalance
  ► storm rebalance mytopology -n 5 -e blue-spout=3 -e yellow-bolt=10

Reliability API -- "fully processed"

► To create a new link in the tree of tuples

Reliability API -- "anchoring"

List<Tuple> anchors = new ArrayList<Tuple>();

anchors.add(A);

_collector.emit(anchors, new Values(B));

Reliability API -- Acknowledgment

[Figure: bolts send ack or fail to the ACK bolts, which report ACK or Fail back to the spout.]

public interface ISpout extends Serializable {
    // ... other methods omitted
    void ack(Object msgId);
    void fail(Object msgId);
}

public interface IOutputCollector extends IErrorReporter {
    // ... other methods omitted
    void ack(Tuple input);
    void fail(Tuple input);
}

► BaseBasicBolt: anchors emitted tuples to the input and acks the input automatically when execute() returns

► Use a single 64-bit integer

► XOR magic
  Long a, b = Random.nextLong();
  a != 0
  a ^ a ^ b != 0
  a ^ a ^ b ^ b == 0

► Question
  ► What will happen if a tuple isn't acked because the task died?

Reliability API -- Track Tuple Tree
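The XOR trick can be tried out in a few lines of plain Java. The sketch below keeps one 64-bit value per tuple tree and XORs each tuple id into it twice, once when the tuple is created and once when it is acked; the value returns to zero only when every created tuple has been acked. `AckValSketch` is a hypothetical class for illustration, not Storm's acker.

```java
import java.util.Random;

final class AckValSketch {
    // One 64-bit value tracks the whole tuple tree, however large it grows.
    private long ackVal = 0L;

    // Called once when a tuple is created (anchored) and once when it is acked.
    void xor(long tupleId) {
        ackVal ^= tupleId;
    }

    // The tree is fully processed when every id has been XORed in exactly twice.
    boolean fullyProcessed() {
        return ackVal == 0L;
    }

    public static void main(String[] args) {
        Random random = new Random();
        long a = random.nextLong();
        long b = random.nextLong();

        AckValSketch tree = new AckValSketch();
        tree.xor(a); // the spout emits tuple a
        tree.xor(b); // a bolt emits b anchored to a ...
        tree.xor(a); // ... and acks a
        // ackVal now equals b, so the tree is still pending as long as b != 0.
        System.out.println("after acking a: " + tree.fullyProcessed());
        tree.xor(b); // the downstream bolt acks b
        System.out.println("after acking b: " + tree.fullyProcessed());
    }
}
```

Because XOR is commutative and associative, the creations and acks can arrive in any order and the value still ends at zero once they are all in.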

DRPC

► DRPC Server

storm drpc

► DRPC Client

DRPCClient client = new DRPCClient(conf, host, 3772);

String result = client.execute("wc", word);

► DRPC Topology

LinearDRPCTopologyBuilder

DRPC

► Provides consistent, exactly-once semantics

► Micro-Batch Oriented

► Fluent, Stream-Oriented API
  ► Functions
  ► Filters
  ► Groupings
  ► Aggregations
  ► Merges and Joins

► Stateful, incremental processing on top of any persistence store

Trident

Trident

[Figure: a stream divided into micro-batches, Batch #1 and Batch #2]

TridentTopology topology = new TridentTopology();
TridentState wordCounts =
    topology.newStream("spout1", spout)
        .parallelismHint(16)
        .each(new Fields("sentence"), new Split(), new Fields("word"))
        .groupBy(new Fields("word"))
        .persistentAggregate(new MemoryMapState.Factory(), new Count(), new Fields("count"))
        .parallelismHint(16);

topology.newDRPCStream("words", drpc)
    .each(new Fields("args"), new Split(), new Fields("word"))
    .groupBy(new Fields("word"))
    .stateQuery(wordCounts, new Fields("word"), new MapGet(), new Fields("count"))
    .each(new Fields("count"), new FilterNull())
    .aggregate(new Fields("count"), new Sum(), new Fields("sum"));

WordCount(Trident version) Example

► Storm basic concepts, system architecture, basic usage, and an introduction to application development

► Advanced
  ► State Management & Stateful Bolts
  ► Native Streaming Window API
  ► Distributed Cache API
  ► Scheduler & Resource Aware Scheduler
  ► Worker Execution Model
  ► ...

Summary

Follow us

QingCloud-IaaS

青云QingCloud

www.qingcloud.com

Thank you

[email protected]