Storm
• Storm is a distributed real-time computation platform
• Provides abstractions for implementing event-based computations on a cluster of physical nodes
• Performs parallel computations on data streams
• Manages high-throughput data streams
• It can be used to design complex event-driven applications on intense streams of data
Introduction
• Began as a project of BackType, a marketing intelligence company bought by Twitter in 2011
• Twitter open-sourced the project, and it became an Apache project in 2014
• Storm = the Hadoop for real-time processing: "Storm makes it easy to reliably process unbounded streams of data, doing for realtime processing what Hadoop did for batch processing."
• Has been designed for massive scalability, supports fault tolerance with a "fail fast, auto restart" approach to processes, and provides the guarantee that every tuple of the stream will be processed.
• Its default is "at least once" processing semantics, but it also offers the ability to implement "exactly once" (transactional) processing semantics.
Design Goals
• Guaranteed data processing – no data is lost
• Imperative description of a streaming workflow (through stream manipulation classes)
• Horizontal scalability
• Fault tolerance
• Programmable in different languages
Main Concepts: Spouts and Bolts
• Any Storm processing is defined as a Directed Acyclic Graph (DAG) of spouts and bolts, which is called a topology.
• In the topology, spouts and bolts produce and consume streams of tuples.
• Tuple: a generic object without any schema, but which can have named fields
• Spout: a tuple input module;
  – can be "unreliable" (fire-and-forget) or "reliable" (replay failed tuples)
• Bolt: a tuple processing or output module;
  – consumes streams and potentially produces new streams
• Stream: a potentially infinite sequence of Tuple objects that Storm serializes and passes to the next bolts in the topology.
• Complex stream transformations often require multiple steps (a chain of multiple bolts)
• Storm topologies run on clusters, and the Storm scheduler distributes work to nodes around the cluster based on the topology configuration.
Application represented as a topology
Source: Heinze, Aniello, Querzoni, Jerzak, Cloud-based Data Stream Processing, DEBS 2014
• Unlike MapReduce jobs, topologies run forever or until manually terminated.
• Spouts:
  – bring data into the system and hand the data off to bolts (which may in turn hand data to subsequent bolts)
• Bolts:
  – do the processing on the stream
  – may write data out to a database or file system
  – send a message to another external system, or
  – make the results of the computation available to the users
Typical Bolts
• Functions – tuple transformations
• Filters
• Aggregations
• Joins
• Storage/retrieval from persistent stores
Application represented as a topology
• The Storm developer may set "parallelism hints" at elements of the topology.
Source: Heinze, Aniello, Querzoni, Jerzak, Cloud-based Data Stream Processing, DEBS 2014
Storm strengths
• A rich array of available spouts specialized for receiving data from all types of sources (e.g. from the Twitter streaming API to Apache Kafka to JMS brokers, etc.)
• It is straightforward to integrate with HDFS file systems, meaning Storm can easily interoperate with Hadoop, if needed.
• Storm has support for multi-language programming, and spouts and bolts can be written in almost any language.
• Storm is a very scalable, fast, fault-tolerant open source system for distributed computation, with a special focus on calculating rolling metrics in real time over streams of data.
Data Partitioning Schemes
• When a tuple is emitted, to which task does it go?
• Storm offers some flexibility to define the data partitioning/shuffling method
• Stream groupings define the data flow in the topology
• This is set for every spout and bolt through the corresponding …Grouping method when defining the topology
[Figure: topology view vs. task view of a running topology]
Types of Stream Grouping
• Shuffle grouping – random distribution of tuples to the tasks of the next downstream bolt
• Fields grouping – uses one or more named fields of the tuples to determine the destination task (by mod hashing)
• All grouping – sends all tuples to all tasks
• Global grouping – all tuples go to the bolt task with the lowest id
• Direct grouping – explicit definition of the target bolt task
• Custom grouping – define a custom grouping method by implementing the CustomStreamGrouping interface
• LocalOrShuffle grouping – if the target bolt has more than one task in the same worker process, tuples will be shuffled to just those in-process tasks; otherwise, it is the same as a normal shuffle grouping
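As a rough sketch of how a grouping is selected (the component names and bolt classes here are hypothetical, but the …Grouping methods belong to Storm's InputDeclarer API):

    // Groupings are chosen on the InputDeclarer returned by setBolt():
    builder.setBolt("count", new CountBolt(), 4)
           .fieldsGrouping("split", new Fields("word"));  // same word -> same task
    builder.setBolt("report", new ReportBolt())
           .globalGrouping("count");                      // all tuples -> one task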
A Practical Example: Word Count
• Word count: the "Hello World" of stream processing
• Input: stream of text (e.g. from documents)
• Output: number of appearances of each word
Topology description
• Using the TopologyBuilder class and its methods setSpout() and setBolt(), the spouts and bolts are declared and instantiated.
• setBolt() returns an InputDeclarer object that is used to define the inputs to the bolt. With this, a bolt explicitly subscribes to a specific stream of another component (spout or bolt), and chooses the data shuffling/partitioning option
• The parallelization hint for spouts and bolts is optional
• A cluster class (through its submitTopology method) is then used to map the topology to a cluster
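A minimal sketch of the word-count wiring, assuming hypothetical LineSpout/SplitBolt/CountBolt classes and the org.apache.storm package names of recent Storm releases:

    import org.apache.storm.Config;
    import org.apache.storm.LocalCluster;
    import org.apache.storm.topology.TopologyBuilder;
    import org.apache.storm.tuple.Fields;

    public class WordCountTopology {
        public static void main(String[] args) throws Exception {
            TopologyBuilder builder = new TopologyBuilder();
            builder.setSpout("lines", new LineSpout(), 2);       // 2 = parallelism hint
            builder.setBolt("split", new SplitBolt(), 4)
                   .shuffleGrouping("lines");                    // subscribe to the spout
            builder.setBolt("count", new CountBolt(), 4)
                   .fieldsGrouping("split", new Fields("word")); // hash on "word"

            // local mode: simulate the cluster inside a single process
            LocalCluster cluster = new LocalCluster();
            cluster.submitTopology("word-count", new Config(), builder.createTopology());
        }
    }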
IRichSpout
IRichSpout is the interface that any spout must implement.
• open method: allows the spout to configure any connections to the outside world (e.g. connections to queue servers) and to receive the SpoutOutputCollector
• nextTuple method: emits (sends) the next tuple downstream in the topology; it is called repeatedly by the Storm infrastructure
• declareOutputFields method: defines the fields of the tuples of the output streams
• Methods ack and fail are called when Storm detects that a tuple emitted from the spout either successfully completed the topology or failed to be completed.
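A minimal spout sketch illustrating these methods (extending BaseRichSpout, which provides empty defaults for ack/fail; the hard-coded input data is purely illustrative):

    import java.util.Map;
    import org.apache.storm.spout.SpoutOutputCollector;
    import org.apache.storm.task.TopologyContext;
    import org.apache.storm.topology.OutputFieldsDeclarer;
    import org.apache.storm.topology.base.BaseRichSpout;
    import org.apache.storm.tuple.Fields;
    import org.apache.storm.tuple.Values;

    public class LineSpout extends BaseRichSpout {
        private SpoutOutputCollector collector;
        private String[] lines = { "the quick brown fox", "jumps over the lazy dog" };
        private int index = 0;

        // open: set up outside-world connections and keep the collector
        public void open(Map conf, TopologyContext context, SpoutOutputCollector collector) {
            this.collector = collector;
        }

        // nextTuple: called repeatedly by Storm; emit one tuple per call
        public void nextTuple() {
            String line = lines[index];
            index = (index + 1) % lines.length;
            collector.emit(new Values(line));
        }

        // declareOutputFields: name the fields of the output tuples
        public void declareOutputFields(OutputFieldsDeclarer declarer) {
            declarer.declare(new Fields("line"));
        }
    }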
BaseRichBolt
Extend the abstract class BaseRichBolt or implement the IRichBolt interface.
• prepare method: passes to the bolt information about the topology. The OutputCollector object manages the interaction between the bolt and the topology (e.g. transmitting and acknowledging tuples)
• execute method: does the processing of incoming tuples
• The collector.emit() method is used to send the transformed/new tuple to the next bolt.
• Through collector.ack() and collector.fail() the bolt can notify Storm whether the processing of the tuple was successful or whether it failed, and for which reason (collector.reportError())
• declareOutputFields method: is used to declare the fields of the output tuples or to define new named output streams.
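A minimal bolt sketch illustrating prepare/execute/declareOutputFields (the word-splitting logic is illustrative):

    import java.util.Map;
    import org.apache.storm.task.OutputCollector;
    import org.apache.storm.task.TopologyContext;
    import org.apache.storm.topology.OutputFieldsDeclarer;
    import org.apache.storm.topology.base.BaseRichBolt;
    import org.apache.storm.tuple.Fields;
    import org.apache.storm.tuple.Tuple;
    import org.apache.storm.tuple.Values;

    public class SplitBolt extends BaseRichBolt {
        private OutputCollector collector;

        // prepare: receives topology information and the collector
        public void prepare(Map conf, TopologyContext context, OutputCollector collector) {
            this.collector = collector;
        }

        // execute: process one incoming tuple
        public void execute(Tuple input) {
            for (String word : input.getStringByField("line").split(" ")) {
                // anchor the new tuple to the input so failures are tracked
                collector.emit(input, new Values(word));
            }
            collector.ack(input);   // report successful processing
        }

        public void declareOutputFields(OutputFieldsDeclarer declarer) {
            declarer.declare(new Fields("word"));
        }
    }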
BaseRichBolt
• Bolts can emit more than one stream. To make use of this, declare multiple named streams using the declareStream method of the OutputFieldsDeclarer interface
• And then specify the named output stream as the first argument of the collector's emit method
    public void declareOutputFields(OutputFieldsDeclarer d) {
        // default (unnamed) stream with three named fields
        d.declare(new Fields("first", "second", "third"));
        // two additional named streams: "car" and "cdr"
        d.declareStream("car", new Fields("first"));
        d.declareStream("cdr", new Fields("second", "third"));
    }

    public void execute(Tuple input) {
        // access the tuple fields by name
        List<Object> objs = input.select(new Fields("first", "second", "third"));
        collector.emit(objs);                                        // default stream
        collector.emit("car", new Values(objs.get(0)));              // stream "car"
        collector.emit("cdr", new Values(objs.get(1), objs.get(2))); // stream "cdr"
        collector.ack(input);
    }
Topology Execution
• A topology processes tuples forever (until you kill it). It consists of many worker processes spread across many machines (managed by supervisors)
• A machine in a cluster may run one or more worker processes. Each worker process is either idle or used by a single topology, and may run one or more tasks of the same component.
• Storm's default scheduler applies a simple round-robin strategy to assign tasks to worker processes
Architecture of a Storm Cluster
• Nimbus:
  – distributes code around the cluster
  – assigns tasks to machines/supervisors (i.e. allocates the execution of components – spouts and bolts – to the worker processes)
  – monitors for failures
  – is fail-fast and stateless
• Zookeeper:
  – keeps the information of which supervisor machines are executing (for discovery and coordination purposes) and whether the Nimbus machine is up
• Supervisor:
  – listens for work assigned to its machine
  – starts and stops worker processes based on Nimbus commands
  – is fail-fast and stateless
Storm considers a tuple coming off a spout "fully processed" when the tuple tree has been exhausted and every message in the tree has been processed. A tuple is considered failed when its tree of messages fails to be fully processed within a specified timeout. This timeout can be configured (the default is 30 seconds).
[Figure: the tuple tree generated by the processing of a sentence, rooted at the tuple emitted by the spout]
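The timeout mentioned above can be set per topology; a sketch using the Config setter:

    Config conf = new Config();
    conf.setMessageTimeoutSecs(10);  // fail tuple trees not completed in 10 s (default 30)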
Anchoring
• A tuple tree is defined by specifying the input tuple as the first argument of emit.
• If the new tuple fails to be processed downstream, the root tuple can be identified.
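A sketch of the difference inside a bolt's execute method (input is the incoming tuple):

    // Anchored: the new tuple joins the tree rooted at input's spout tuple,
    // so a downstream failure triggers a replay from the spout
    collector.emit(input, new Values(word));
    // Unanchored: not part of any tree; downstream failures are not replayed
    collector.emit(new Values(word));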
At-least-once processing guarantee
• With anchoring, Storm can guarantee at-least-once semantics (in the presence of failures reported by bolts) without using intermediate queues.
• Instead of retrying from the point where a failure was reported, retries happen from the root of the tuple tree – spouts simply re-emit the root tuple.
• Intermediate stages of bolt processing that had been completed successfully will be re-done.
• This is a waste of processing, but it has the advantage that there is no need to synchronize the processing of the tuples by the parallel tasks.
• And if the operation of the bolts is idempotent (no side effects), the re-processing actually yields an exactly-once processing guarantee.
Transactional exactly-once processing guarantee
But bolts may not do idempotent processing, and processing may require exactly-once semantics:
• e.g. if a bolt holds some state that is updated as tuples are processed (e.g. a counter) and which is sensitive to repeated processing, or if state must be restored from a failed bolt.
• Exactly-once semantics requires that data sources be fault-tolerant and able to re-emit tuples (aka tuple replay)
Transactional exactly-once processing guarantee
Storm handles this by using the following processing protocol:
• Tuples are grouped into micro-batches, and each batch is associated with a transaction ID. A transaction ID is a monotonically increasing numerical value (e.g. the first batch has ID 1, the second ID 2, etc.).
• If the topology fails to process a batch, the batch is re-emitted with the same transaction ID.
• Before sending the batch through the pipeline, Storm announces to the nodes (bolts) that a new transaction is being attempted. If it is successful, all nodes can commit their state.
• Storm guarantees that commit phases are globally ordered across all transactions, i.e. transaction n+1 can never be committed before transaction n.
Each processing node executes the following logic for state updates (see the sketch after this list):
• The latest transaction ID is persisted along with the state.
• If the framework requests to commit the current transaction with an ID that differs from the persisted ID value, the state can be updated, e.g. a counter can be incremented (assuming a strong ordering of transactions, such an update will happen exactly once for each batch).
• If the current transaction ID equals the persisted value, the node skips the commit because this is a batch replay. The node must have processed the batch earlier and updated the state accordingly, but the transaction failed due to an error somewhere else in the pipeline.
• The strict order of commits is important to achieve exactly-once processing semantics.
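A sketch of that per-node commit logic, assuming a hypothetical persistent KeyValueStore with an atomic multi-key write (Storm's transactional/Trident APIs package this pattern for real):

    void commitBatch(long txId, long countDelta, KeyValueStore store) {
        long lastTxId = store.getLong("lastTxId");
        if (txId != lastTxId) {
            // first commit of this batch: update state and ID atomically
            store.putAtomic("count", store.getLong("count") + countDelta,
                            "lastTxId", txId);
        }
        // else: this is a batch replay -- the state already reflects the batch
    }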
Storm's Transaction Processing
[Figure: a transactional topology]
Note: transactional processing can cause serious performance degradation, even if large batches are used.
Spouts Re-emitting Tuples
• When emitting a tuple, the spout provides a "message id" that will be used to identify the tuple later (see the sketch after this list).
• The tuple gets sent to consuming bolts, and Storm takes care of tracking the tree of messages that is created.
• If a failure (or timeout) is detected, Storm calls the fail method only on the specific spout task that emitted the failed tuple, passing its "message id". Other parallel spout tasks will not be affected.
• The need to re-emit root tuples in case of failure requires a persistent queue – the message is not de-queued but placed in a pending state, waiting for the acknowledgement that the message processing has been completed by the topology.
• Therefore, spouts are often connected to Kafka clusters.
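A sketch of a reliable spout, where msgId is a hypothetical identifier (e.g. a queue offset):

    // in nextTuple(): pass a message id as the last argument of emit
    collector.emit(new Values(line), msgId);

    // later, Storm calls back on the same spout task:
    public void ack(Object msgId)  { /* remove the message from the pending queue */ }
    public void fail(Object msgId) { /* re-emit the message identified by msgId */ }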
Storm Operation Modes
Local mode: simulates the execution of a Storm cluster in a single process (useful for debugging)
Distributed mode: execution on a cluster of machines. Submitting a topology to the master also submits the code necessary to run the topology.
• Nimbus will take care of distributing your code and allocating workers to run your topology. If workers go down, it will reassign them somewhere else.
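In distributed mode the topology is submitted through StormSubmitter instead of LocalCluster; a sketch (builder as in the word-count example above):

    import org.apache.storm.Config;
    import org.apache.storm.StormSubmitter;

    Config conf = new Config();
    conf.setNumWorkers(4);   // worker processes requested for this topology
    StormSubmitter.submitTopology("word-count", conf, builder.createTopology());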