Deep Dive into Apache Apex App Development

Deep Dive Into Apache Apex Application Chaitanya Chebolu

Upload: apache-apex

Post on 16-Apr-2017


Page 1: Deep Dive into Apache Apex App Development

Deep Dive Into Apache Apex Application

Chaitanya Chebolu

Page 2: Deep Dive into Apache Apex App Development

Application Development Model

▪ A Stream is a sequence of data tuples.
▪ A typical Operator takes one or more input streams, performs computations, and emits one or more output streams.
  • Each Operator is your custom business logic in Java, or a built-in operator from our open source library.
  • An Operator can have many instances that run in parallel; each instance is single-threaded.
▪ A Directed Acyclic Graph (DAG) is made up of operators and streams.

[Figure: Directed Acyclic Graph (DAG). Tuples flow through a series of operators connected by streams, including a filtered stream, an enriched stream, and an output stream.]
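The model on this slide (tuples flowing through operators connected by streams) can be sketched in plain Java. This is only an illustration of the idea, not the Apex API: the `Operator` interface, `FilterOperator`, `EnrichOperator`, and `run` are hypothetical names, and the sketch runs in a single thread with no windows or partitions.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Consumer;

// Plain-Java illustration of streams and operators; none of these types come from Apex.
class StreamModelSketch {

  // Hypothetical operator shape: receives one tuple, may emit zero or more tuples downstream.
  interface Operator<I, O> {
    void process(I tuple, Consumer<O> output);
  }

  // Drops blank tuples, producing the "filtered stream".
  static class FilterOperator implements Operator<String, String> {
    @Override
    public void process(String tuple, Consumer<String> output) {
      if (!tuple.trim().isEmpty()) {
        output.accept(tuple);
      }
    }
  }

  // Appends the tuple length, producing the "enriched stream".
  static class EnrichOperator implements Operator<String, String> {
    @Override
    public void process(String tuple, Consumer<String> output) {
      output.accept(tuple + ":" + tuple.length());
    }
  }

  // Wires filter -> enrich and collects whatever reaches the output stream.
  static List<String> run(List<String> input) {
    List<String> result = new ArrayList<>();
    FilterOperator filter = new FilterOperator();
    EnrichOperator enrich = new EnrichOperator();
    for (String tuple : input) {
      filter.process(tuple, filtered -> enrich.process(filtered, result::add));
    }
    return result;
  }

  public static void main(String[] args) {
    System.out.println(run(Arrays.asList("apex", " ", "dag")));
  }
}
```

Each operator only knows about the tuple it receives and the stream it emits to, which is what lets the platform place operators in separate processes and run several instances in parallel.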

Page 3: Deep Dive into Apache Apex App Development


Typical application example

Page 4: Deep Dive into Apache Apex App Development

DAG Types

• Logical Plan
  ● Logical representation of computation
  ● Defines operators, streams and dataflow
• Physical Plan
  ● Deployable plan on cluster
  ● Contains partition information of operators
  ● Has ready-to-deploy serialized operator instances

[Figure: Logical DAG with operators O1, O2, O3, O4, O5.]

[Figure: Physical DAG in which O1 and O2 each run as three partitions (O1P1 to O1P3, O2P1 to O2P3), together with a node U and operators O3, O4, O5.]
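To make the logical/physical distinction concrete, here is a small plain-Java sketch that expands a logical plan into physical instance names. It rests on loud assumptions: the plan is treated as a simple linear chain, `toPhysical` is a hypothetical helper, and the rule for inserting `U` (read here as a unifier that merges partitioned output) is a simplification for illustration, not how the Apex planner actually decides.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Illustrative only: expands operator names by partition count. Not the Apex planner.
class PlanSketch {

  // logicalPlan maps operator name -> partition count, in dataflow order (assumed linear).
  static List<String> toPhysical(Map<String, Integer> logicalPlan) {
    List<String> physical = new ArrayList<>();
    int previousPartitions = 1;
    for (Map.Entry<String, Integer> operator : logicalPlan.entrySet()) {
      int partitions = operator.getValue();
      // Simplifying assumption: add a merge node "U" when many partitions feed one instance.
      if (previousPartitions > 1 && partitions == 1) {
        physical.add("U");
      }
      if (partitions == 1) {
        physical.add(operator.getKey());
      } else {
        for (int p = 1; p <= partitions; p++) {
          physical.add(operator.getKey() + "P" + p);
        }
      }
      previousPartitions = partitions;
    }
    return physical;
  }

  public static void main(String[] args) {
    Map<String, Integer> plan = new LinkedHashMap<>();
    plan.put("O1", 3);
    plan.put("O2", 3);
    plan.put("O3", 1);
    plan.put("O4", 1);
    plan.put("O5", 1);
    System.out.println(toPhysical(plan));
  }
}
```

The logical plan stays at five operators regardless of partitioning; only the physical plan grows as partition counts change.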

Page 5: Deep Dive into Apache Apex App Development

Operator Lifecycle

➔ All operators in the DAG go through this life-cycle
➔ Managed by the Apex platform
➔ Governed by control tuples

Page 6: Deep Dive into Apache Apex App Development

Operator Lifecycle (contd...)

➔ setup
  ◆ Start of operator lifecycle
  ◆ Do any initialization here
➔ beginWindow
  ◆ Marks the start of a window
➔ endWindow
  ◆ Marks the end of a window
➔ teardown
  ◆ Do any finalization here
  ◆ End of operator lifecycle
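The order of these callbacks is easier to see when they are recorded. The sketch below is plain Java with local stand-ins: the `Operator` interface here only mirrors the method names from the slides (the real Apex `setup` also receives a context argument), and `drive` is a hypothetical loop standing in for the platform. Window ids 0 and 1 are made up for the example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Plain-Java illustration of the callback order; these are stand-ins, not Apex types.
class LifecycleSketch {

  // Local stand-in for the callbacks named on the slides.
  interface Operator {
    void setup();
    void beginWindow(long windowId);
    void process(String tuple);
    void endWindow();
    void teardown();
  }

  // Records every callback so the order can be inspected afterwards.
  static class RecordingOperator implements Operator {
    final List<String> calls = new ArrayList<>();
    @Override public void setup() { calls.add("setup"); }
    @Override public void beginWindow(long windowId) { calls.add("beginWindow(" + windowId + ")"); }
    @Override public void process(String tuple) { calls.add("process(" + tuple + ")"); }
    @Override public void endWindow() { calls.add("endWindow"); }
    @Override public void teardown() { calls.add("teardown"); }
  }

  // Hypothetical driver playing the platform's role: one setup, N windows, one teardown.
  static void drive(Operator operator, List<List<String>> windows) {
    operator.setup();
    long windowId = 0;
    for (List<String> window : windows) {
      operator.beginWindow(windowId++);
      for (String tuple : window) {
        operator.process(tuple);
      }
      operator.endWindow();
    }
    operator.teardown();
  }

  public static void main(String[] args) {
    RecordingOperator operator = new RecordingOperator();
    drive(operator, Arrays.asList(Arrays.asList("a", "b"), Arrays.asList("c")));
    operator.calls.forEach(System.out::println);
  }
}
```

`setup` and `teardown` bracket the whole run, while `beginWindow` and `endWindow` bracket each batch of tuples in between.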

Page 7: Deep Dive into Apache Apex App Development

Operator Lifecycle (contd...)

➔ emitTuples
  ◆ Called for Input Adapters
  ◆ Called in an infinite while loop by the platform
➔ process
  ◆ Called for Generic Operators and Output Adapters
  ◆ Associated with a port
  ◆ Called for every incoming tuple
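The difference between the two callbacks is who triggers them: `emitTuples` is polled by the platform because an input adapter has no upstream, while `process` fires once per tuple arriving on a port. A rough plain-Java sketch, with `NumberSource`, `Doubler`, and `run` as hypothetical names and a bounded `for` loop standing in for the platform's infinite loop:

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Queue;
import java.util.function.Consumer;

// Illustrative stand-ins only; not the Apex InputOperator or port classes.
class EmitVsProcessSketch {

  // Stand-in input adapter: it has no input port, so the driver must poll emitTuples().
  static class NumberSource {
    private final Queue<Integer> pending;
    Consumer<Integer> output = tuple -> { };  // stand-in for an output port

    NumberSource(List<Integer> data) {
      pending = new ArrayDeque<>(data);
    }

    // Emits at most one tuple per call; emits nothing once the source is drained.
    void emitTuples() {
      Integer next = pending.poll();
      if (next != null) {
        output.accept(next);
      }
    }
  }

  // Stand-in generic operator: process() belongs to its input port and runs once per tuple.
  static class Doubler {
    final List<Integer> results = new ArrayList<>();

    void process(Integer tuple) {
      results.add(tuple * 2);
    }
  }

  static List<Integer> run(List<Integer> data, int iterations) {
    NumberSource source = new NumberSource(data);
    Doubler doubler = new Doubler();
    source.output = doubler::process;  // connect the source's output to the doubler's input
    for (int i = 0; i < iterations; i++) {
      source.emitTuples();  // bounded stand-in for the platform's infinite loop
    }
    return doubler.results;
  }

  public static void main(String[] args) {
    System.out.println(run(Arrays.asList(1, 2, 3), 5));
  }
}
```

Note that `emitTuples` is called five times but `process` runs only three times, once for each tuple that was actually emitted.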

Page 8: Deep Dive into Apache Apex App Development

Operator Lifecycle (contd...)

➔ OutputPort::emit
  ◆ Special method, not part of the operator lifecycle
  ◆ To be called by operator code
  ◆ Emits tuples to the next operator
  ◆ Bound by window
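One way to read "bound by window" is that operator code should only call `emit` between `beginWindow` and `endWindow`. The plain-Java sketch below assumes that reading. `OutputPort` and `WindowCounter` are local stand-ins, and the exception thrown outside a window is purely an illustrative guard, not a statement about how Apex itself reacts.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative stand-ins only; the guard in emit() is an assumption made for this example.
class WindowBoundEmitSketch {

  // Stand-in output port that refuses tuples while no window is open.
  static class OutputPort<T> {
    final List<T> emitted = new ArrayList<>();
    boolean windowOpen;

    void emit(T tuple) {
      if (!windowOpen) {
        throw new IllegalStateException("emit called outside a window");
      }
      emitted.add(tuple);
    }
  }

  // Counts tuples per window and emits the count just before the window closes.
  static class WindowCounter {
    final OutputPort<Integer> output = new OutputPort<>();
    private int count;

    void beginWindow(long windowId) {
      output.windowOpen = true;
      count = 0;
    }

    void process(String tuple) {
      count++;
    }

    void endWindow() {
      output.emit(count);  // still inside the window at this point
      output.windowOpen = false;
    }
  }

  public static void main(String[] args) {
    WindowCounter counter = new WindowCounter();
    counter.beginWindow(0);
    counter.process("a");
    counter.process("b");
    counter.endWindow();
    System.out.println(counter.output.emitted);
  }
}
```

Emitting an aggregate from `endWindow`, as the counter does here, is a common pattern because it is the last point at which the tuple still belongs to the current window.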

Page 9: Deep Dive into Apache Apex App Development

Defining DAG

[Figure: LOGS → Reader → Parser → Counter → Output → HDFS. The diagram labels the Input Operator (Adapter), the Generic Operators, and the Output Operator (Adapter).]

Page 11: Deep Dive into Apache Apex App Development

APIs : InputOperator

• SampleInputOperator implements InputOperator
  ᵒ Define output ports
  ᵒ Define emitTuples method
  ᵒ Define beginWindow, endWindow, setup, teardown

Page 12: Deep Dive into Apache Apex App Development

APIs : GenericOperator, OutputOperator

• SampleOperator extends BaseOperator
  ᵒ Define input ports, output ports
  ᵒ Define process methods
  ᵒ Optional : Define beginWindow, endWindow, setup, teardown
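The lifecycle methods are optional here because the base class already supplies empty implementations, so a subclass only has to define its ports and `process` logic. The sketch below shows that shape with local stand-ins so it compiles without any dependency: `BaseOperator`, `InputPort`, and `OutputPort` are simplified imitations written for this example (the real Apex classes have different signatures, for instance `setup` takes a context), and the upper-casing logic is made up.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Simplified local imitations of the Apex shapes; not the real classes.
class BaseOperatorSketch {

  // Stand-in base class: no-op lifecycle methods, so subclasses override only what they need.
  static class BaseOperator {
    public void setup() { }
    public void beginWindow(long windowId) { }
    public void endWindow() { }
    public void teardown() { }
  }

  // Stand-in input port: the operator supplies process() for tuples arriving on this port.
  abstract static class InputPort<T> {
    abstract void process(T tuple);
  }

  // Stand-in output port that collects emitted tuples for inspection.
  static class OutputPort<T> {
    final List<T> emitted = new ArrayList<>();

    void emit(T tuple) {
      emitted.add(tuple);
    }
  }

  // Sample generic operator: one input port, one output port, no lifecycle overrides.
  static class SampleOperator extends BaseOperator {
    final OutputPort<String> output = new OutputPort<>();

    final InputPort<String> input = new InputPort<String>() {
      @Override
      void process(String tuple) {
        output.emit(tuple.toUpperCase(Locale.ROOT));
      }
    };
  }

  public static void main(String[] args) {
    SampleOperator operator = new SampleOperator();
    operator.setup();
    operator.beginWindow(0);
    operator.input.process("apex");
    operator.endWindow();
    operator.teardown();
    System.out.println(operator.output.emitted);
  }
}
```

An output adapter would follow the same pattern without the output port, writing each tuple to an external system from `process` instead of emitting it.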

Page 13: Deep Dive into Apache Apex App Development

Application Specification (Java)


DAG API (compositional)
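In the compositional style, an application is assembled by adding named operators and then connecting them with named streams. The following is a dependency-free sketch of that idea, not the Apex `DAG` class: the `Dag` type, its string-based `addOperator` and `addStream` methods, and `topologicalOrder` are hypothetical (in Apex the corresponding calls take operator instances and port objects). The operator names are borrowed from the Reader, Parser, Counter, Output example earlier in the deck.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical mini DAG builder for illustration; not the Apex DAG API.
class DagSketch {

  static class Dag {
    private final Map<String, List<String>> downstream = new LinkedHashMap<>();
    private final Set<String> streamNames = new HashSet<>();

    Dag addOperator(String name) {
      if (downstream.putIfAbsent(name, new ArrayList<>()) != null) {
        throw new IllegalArgumentException("duplicate operator: " + name);
      }
      return this;
    }

    Dag addStream(String streamName, String from, String to) {
      if (!downstream.containsKey(from) || !downstream.containsKey(to)) {
        throw new IllegalArgumentException("unknown operator in stream: " + streamName);
      }
      if (!streamNames.add(streamName)) {
        throw new IllegalArgumentException("duplicate stream: " + streamName);
      }
      downstream.get(from).add(to);
      return this;
    }

    // Kahn's algorithm: returns operators in dataflow order, or fails if there is a cycle.
    List<String> topologicalOrder() {
      Map<String, Integer> inDegree = new LinkedHashMap<>();
      for (String operator : downstream.keySet()) {
        inDegree.put(operator, 0);
      }
      for (List<String> targets : downstream.values()) {
        for (String target : targets) {
          inDegree.merge(target, 1, Integer::sum);
        }
      }
      Deque<String> ready = new ArrayDeque<>();
      for (Map.Entry<String, Integer> entry : inDegree.entrySet()) {
        if (entry.getValue() == 0) {
          ready.add(entry.getKey());
        }
      }
      List<String> order = new ArrayList<>();
      while (!ready.isEmpty()) {
        String operator = ready.poll();
        order.add(operator);
        for (String target : downstream.get(operator)) {
          if (inDegree.merge(target, -1, Integer::sum) == 0) {
            ready.add(target);
          }
        }
      }
      if (order.size() != downstream.size()) {
        throw new IllegalStateException("graph has a cycle");
      }
      return order;
    }
  }

  public static void main(String[] args) {
    Dag dag = new Dag()
        .addOperator("Reader").addOperator("Parser")
        .addOperator("Counter").addOperator("Output")
        .addStream("lines", "Reader", "Parser")
        .addStream("records", "Parser", "Counter")
        .addStream("counts", "Counter", "Output");
    System.out.println(dag.topologicalOrder());
  }
}
```

The cycle check is included because the "acyclic" in DAG is a structural requirement: a stream that loops back makes a dataflow order impossible, and the sketch reports that as an error.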

Page 14: Deep Dive into Apache Apex App Development

Writing an Operator


Page 15: Deep Dive into Apache Apex App Development


Writing an Operator

Page 16: Deep Dive into Apache Apex App Development

Operator Library

RDBMS
• Vertica
• MySQL
• Oracle
• JDBC

NoSQL
• Cassandra, HBase
• Aerospike, Accumulo
• Couchbase / CouchDB
• Redis, MongoDB
• Geode

Messaging
• Kafka
• Solace
• Flume, ActiveMQ
• Kinesis, NiFi

File Systems
• HDFS / Hive
• NFS
• S3

Parsers
• XML
• JSON
• CSV
• Avro
• Parquet

Transformations
• Filters
• Rules
• Expression
• Dedup
• Enrich

Analytics
• Dimensional Aggregations (with state management for historical data + query)

Protocols
• HTTP
• FTP
• WebSocket
• MQTT
• SMTP

Other
• Elasticsearch
• Script (JavaScript, Python, R)
• Solr
• Twitter

Page 17: Deep Dive into Apache Apex App Development

Building Apache Apex

• Java : 1.7.x
• mvn : 3.0 +
• git : 1.7 +
• Apache Hadoop : How to : Single node cluster

Apache Apex Core
  ᵒ git clone [email protected]:apache/apex-core.git
  ᵒ cd apex-core/
  ᵒ git checkout master
  ᵒ mvn clean install -DskipTests

Apache Apex Malhar
  ᵒ git clone [email protected]:apache/apex-malhar.git
  ᵒ cd apex-malhar/
  ᵒ git checkout master
  ᵒ mvn clean install -DskipTests

DataTorrent RTS community edition

Page 18: Deep Dive into Apache Apex App Development

Monitoring Console

Logical View

Physical View

Page 19: Deep Dive into Apache Apex App Development

Real-Time Dashboards


Page 20: Deep Dive into Apache Apex App Development

Q&A


Page 21: Deep Dive into Apache Apex App Development

Resources

• http://apex.apache.org/
• Learn more: http://apex.apache.org/docs.html
• Subscribe - http://apex.apache.org/community.html
• Download - http://apex.apache.org/downloads.html
• Follow @ApacheApex - https://twitter.com/apacheapex
• Meetups - http://www.meetup.com/pro/apacheapex/
• More examples: https://github.com/DataTorrent/examples
• Slideshare: http://www.slideshare.net/ApacheApex/presentations
• https://www.youtube.com/results?search_query=apache+apex
• Free Enterprise License for Startups - https://www.datatorrent.com/product/startup-accelerator/