Deep Dive Into Apache Apex Application
Chaitanya Chebolu
Application Development Model
▪ A Stream is a sequence of data tuples
▪ A typical Operator takes one or more input streams, performs computations & emits one or more output streams
• Each Operator is YOUR custom business logic in Java, or a built-in operator from our open source library
• An Operator has many instances that run in parallel, and each instance is single-threaded
▪ A Directed Acyclic Graph (DAG) is made up of operators and streams
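The stream and operator model above can be sketched in plain Java. This is only an illustration and does not use the Apex API: `MiniOperator`, `StreamSketch`, and the filter and enrich rules are hypothetical names and logic chosen for the example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Consumer;

/** Hypothetical stand-in for an operator: consumes one tuple, may emit zero or more. */
interface MiniOperator<I, O> {
    void process(I tuple, Consumer<O> out);
}

class StreamSketch {
    /** Filter stage (assumed rule): drops blank tuples. */
    static final MiniOperator<String, String> FILTER =
        (tuple, out) -> { if (!tuple.trim().isEmpty()) out.accept(tuple); };

    /** Enrich stage (assumed rule): appends the tuple length. */
    static final MiniOperator<String, String> ENRICH =
        (tuple, out) -> out.accept(tuple + ":" + tuple.length());

    /** Pushes each tuple through filter, then enrich, and collects the output stream. */
    static List<String> run(List<String> input) {
        List<String> output = new ArrayList<>();
        for (String tuple : input) {
            FILTER.process(tuple, filtered -> ENRICH.process(filtered, output::add));
        }
        return output;
    }

    public static void main(String[] args) {
        System.out.println(run(Arrays.asList("a", " ", "bcd")));
    }
}
```

Each stage sees one tuple at a time and decides what to emit downstream, which is the shape the bullets describe; partitioning and parallel instances are left out of this sketch.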
[Figure: Directed Acyclic Graph (DAG). Tuples flow between Operators over a Filtered Stream and an Enriched Stream, ending in an Output Stream.]
Typical application example
DAG Types
• Logical Plan
● Logical representation of computation
● Defines operators, streams and dataflow
• Physical Plan
● Deployable plan on cluster
● Contains partition information of operators
● Has ready-to-deploy serialized operator instances
[Figure: Logical DAG with operators O1, O2, O3, O4, O5. Physical DAG with partitions O1P1, O1P2, O1P3 and O2P1, O2P2, O2P3, followed by a unifier (U), then O3, O4, O5.]
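To make the step from logical to physical plan concrete, here is a small plain-Java sketch that expands logical operator names into the physical instance names shown above (O1P1 through O2P3, U, O3, O4, O5). It is not how the platform computes its plan: `PhysicalPlanSketch` is a hypothetical class, the DAG is assumed to be a simple linear chain, and the rule for where the unifier goes is a simplification made up for this example.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class PhysicalPlanSketch {
    /**
     * Expands a linear chain of logical operators into physical instance names.
     * Assumed rule (illustration only): a unifier "U" is placed where a
     * partitioned operator feeds an unpartitioned one.
     */
    static List<String> expand(Map<String, Integer> partitionCounts) {
        List<String> physical = new ArrayList<>();
        int upstreamPartitions = 1;
        for (Map.Entry<String, Integer> entry : partitionCounts.entrySet()) {
            String name = entry.getKey();
            int count = entry.getValue();
            if (upstreamPartitions > 1 && count == 1) {
                physical.add("U"); // merges the partitioned streams into one
            }
            if (count == 1) {
                physical.add(name);
            } else {
                for (int p = 1; p <= count; p++) {
                    physical.add(name + "P" + p);
                }
            }
            upstreamPartitions = count;
        }
        return physical;
    }

    /** Partition counts assumed from the labels: O1 and O2 have three partitions each. */
    static Map<String, Integer> example() {
        Map<String, Integer> counts = new LinkedHashMap<>();
        counts.put("O1", 3);
        counts.put("O2", 3);
        counts.put("O3", 1);
        counts.put("O4", 1);
        counts.put("O5", 1);
        return counts;
    }

    public static void main(String[] args) {
        System.out.println(expand(example()));
    }
}
```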
Operator Lifecycle
➔ All operators in DAG go through this life-cycle
➔ Managed by Apex Platform
➔ Governed by control tuples
Operator Lifecycle (contd...)
➔ setup
◆ Start of operator lifecycle
◆ Do any initialization here
➔ beginWindow
◆ Marks the start of a window
➔ endWindow
◆ Marks the end of a window
➔ teardown
◆ Do any finalization here
◆ End of operator lifecycle
Operator Lifecycle (contd...)
➔ emitTuples
◆ Called for Input Adapters
◆ Called in an infinite while loop by the platform
➔ process
◆ Called for Generic Operators and Output Adapters
◆ Associated with a port
◆ Called for every incoming tuple
Operator Lifecycle (contd...)
➔ OutputPort::emit
◆ Special method not part of operator lifecycle
◆ To be called by operator code
◆ Emits the tuples to next operator
◆ Bound by Window
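The order of the lifecycle callbacks can be illustrated with a plain-Java sketch in which a small driver plays the platform's role. Nothing here is the Apex API: `CountingOperatorSketch` and its `drive` method are hypothetical, and the `emitted` list merely stands in for an output port. The sketch reads "Bound by Window" as meaning that emits happen between the start and the end of a window, which is an interpretation rather than something the slides spell out.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical stand-in for a generic operator that counts tuples per window. */
class CountingOperatorSketch {
    final List<String> calls = new ArrayList<>();    // records the callback order
    final List<Integer> emitted = new ArrayList<>(); // stands in for an output port
    private int count;

    void setup() { calls.add("setup"); }
    void beginWindow(long windowId) { calls.add("beginWindow"); count = 0; }
    void process(String tuple) { calls.add("process"); count++; }
    void endWindow() { calls.add("endWindow"); emitted.add(count); } // emit once per window
    void teardown() { calls.add("teardown"); }

    /** Plays the platform's role: drives the callbacks for each window of tuples. */
    static CountingOperatorSketch drive(List<List<String>> windows) {
        CountingOperatorSketch op = new CountingOperatorSketch();
        op.setup();
        long windowId = 0;
        for (List<String> window : windows) {
            op.beginWindow(windowId++);
            for (String tuple : window) {
                op.process(tuple);
            }
            op.endWindow();
        }
        op.teardown();
        return op;
    }

    public static void main(String[] args) {
        CountingOperatorSketch op = drive(Arrays.asList(
            Arrays.asList("a", "b"), Arrays.asList("c")));
        System.out.println(op.calls);
        System.out.println(op.emitted);
    }
}
```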
Defining DAG
[Figure: LOGS → Reader → Parser → Counter → Output → HDFS. Reader is the Input Operator (Adapter), Parser and Counter are Generic Operators, and Output is the Output Operator (Adapter).]
APIs : Application
• MyApplication implements StreamingApplication
ᵒ Provide implementation for populateDAG
ᵒ Stitch the DAG
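What "stitching the DAG" amounts to can be sketched without the Apex classes: register operators by name, connect them with named streams, and check that the result is acyclic. `MiniDag` below is a hypothetical stand-in, not the platform's DAG type. The operator names come from the Reader, Parser, Counter, Output example, and the stream names are invented for illustration.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical stand-in for a DAG: named operators connected by named streams. */
class MiniDag {
    private final Map<String, List<String>> downstream = new LinkedHashMap<>();
    private final Map<String, Integer> inDegree = new LinkedHashMap<>();

    MiniDag addOperator(String name) {
        downstream.put(name, new ArrayList<>());
        inDegree.put(name, 0);
        return this;
    }

    /** Connects two operators that were added earlier; the stream name is only a label here. */
    MiniDag addStream(String streamName, String from, String to) {
        downstream.get(from).add(to);
        inDegree.merge(to, 1, Integer::sum);
        return this;
    }

    /** Returns operators in dataflow order; throws if the streams form a cycle. */
    List<String> dataflowOrder() {
        Map<String, Integer> remaining = new LinkedHashMap<>(inDegree);
        Deque<String> ready = new ArrayDeque<>();
        for (Map.Entry<String, Integer> e : remaining.entrySet()) {
            if (e.getValue() == 0) ready.add(e.getKey());
        }
        List<String> order = new ArrayList<>();
        while (!ready.isEmpty()) {
            String op = ready.remove();
            order.add(op);
            for (String next : downstream.get(op)) {
                if (remaining.merge(next, -1, Integer::sum) == 0) ready.add(next);
            }
        }
        if (order.size() != downstream.size()) {
            throw new IllegalStateException("streams form a cycle");
        }
        return order;
    }

    /** Stitches the log pipeline: Reader -> Parser -> Counter -> Output. */
    static MiniDag logPipeline() {
        return new MiniDag()
            .addOperator("Reader").addOperator("Parser")
            .addOperator("Counter").addOperator("Output")
            .addStream("lines", "Reader", "Parser")
            .addStream("records", "Parser", "Counter")
            .addStream("counts", "Counter", "Output");
    }

    public static void main(String[] args) {
        System.out.println(logPipeline().dataflowOrder());
    }
}
```

In a real application the same stitching would happen inside populateDAG, using the platform's own DAG object rather than this stand-in.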
APIs : InputOperator
• SampleInputOperator implements InputOperator
ᵒ Define output ports
ᵒ Define emitTuples method
ᵒ Define beginWindow, endWindow, setup, teardown
APIs : GenericOperator, OutputOperator
• SampleOperator extends BaseOperator
ᵒ Define input ports, output ports
ᵒ Define process methods
ᵒ Optional : Define beginWindow, endWindow, setup, teardown
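The relationship between input ports, process methods and output ports can be shown with a self-contained sketch. The classes below are hypothetical stand-ins written for this example, not the Apex port or operator classes, and the upper-casing logic is arbitrary.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Consumer;

/** Hypothetical stand-in for an output port: emit() hands the tuple to whatever is connected. */
class OutPortSketch<T> {
    private Consumer<T> sink = tuple -> { };
    void connect(Consumer<T> downstream) { sink = downstream; }
    void emit(T tuple) { sink.accept(tuple); }
}

/** Hypothetical stand-in for an input port: process() runs once per incoming tuple. */
abstract class InPortSketch<T> implements Consumer<T> {
    abstract void process(T tuple);
    @Override public void accept(T tuple) { process(tuple); }
}

/** A generic operator with one input port and one output port. */
class UpperCaseOperatorSketch {
    final OutPortSketch<String> output = new OutPortSketch<>();
    final InPortSketch<String> input = new InPortSketch<String>() {
        @Override void process(String tuple) {
            output.emit(tuple.toUpperCase()); // business logic, then emit downstream
        }
    };

    /** Feeds tuples into the input port and collects what the output port emits. */
    static List<String> demo(List<String> tuples) {
        UpperCaseOperatorSketch op = new UpperCaseOperatorSketch();
        List<String> received = new ArrayList<>();
        op.output.connect(received::add);
        for (String tuple : tuples) {
            op.input.process(tuple);
        }
        return received;
    }

    public static void main(String[] args) {
        System.out.println(demo(Arrays.asList("a", "b")));
    }
}
```

The process method belongs to the port rather than to the operator itself, which matches the earlier note that process is associated with a port.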
Application Specification (Java)
DAG API (compositional)
Writing an Operator
Operator Library
• RDBMS: Vertica, MySQL, Oracle, JDBC
• NoSQL: Cassandra, HBase, Aerospike, Accumulo, Couchbase/CouchDB, Redis, MongoDB, Geode
• Messaging: Kafka, Solace, Flume, ActiveMQ, Kinesis, NiFi
• File Systems: HDFS/Hive, NFS, S3
• Parsers: XML, JSON, CSV, Avro, Parquet
• Transformations: Filters, Rules, Expression, Dedup, Enrich
• Analytics: Dimensional Aggregations (with state management for historical data + query)
• Protocols: HTTP, FTP, WebSocket, MQTT, SMTP
• Other: Elasticsearch, Script (JavaScript, Python, R), Solr, Twitter
Building Apache Apex
• Java : 1.7.x
• mvn : 3.0 +
• git : 1.7 +
• Apache Hadoop : How to : Single node cluster
Apache Apex Core
ᵒ git clone git@github.com:apache/apex-core.git
ᵒ cd apex-core/
ᵒ git checkout master
ᵒ mvn clean install -DskipTests
Apache Apex Malhar
ᵒ git clone git@github.com:apache/apex-malhar.git
ᵒ cd apex-malhar/
ᵒ git checkout master
ᵒ mvn clean install -DskipTests
DataTorrent RTS community edition
Monitoring Console
Logical View
Physical View
Real-Time Dashboards
Q&A
Resources
• http://apex.apache.org/
• Learn more: http://apex.apache.org/docs.html
• Subscribe - http://apex.apache.org/community.html
• Download - http://apex.apache.org/downloads.html
• Follow @ApacheApex - https://twitter.com/apacheapex
• Meetups - http://www.meetup.com/pro/apacheapex/
• More examples: https://github.com/DataTorrent/examples
• Slideshare: http://www.slideshare.net/ApacheApex/presentations
• https://www.youtube.com/results?search_query=apache+apex
• Free Enterprise License for Startups - https://www.datatorrent.com/product/startup-accelerator/