Writing an Apache Apex Application

Post on 08-Jan-2017

Page 1: Writing an Apache Apex Application

© 2015 DataTorrent

Akshay Gore, Bhupesh Chawda
DataTorrent

Apex Hands-on Lab - Into the code!
Getting started with your first Apex Application!

Page 2: Writing an Apache Apex Application

Operators
• Input Adapter vs. Generic Operators?
• What are streams?
• What are ports?

Page 3: Writing an Apache Apex Application

Apex Operator Lifecycle
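
The lifecycle diagram from this slide is not part of the transcript. As a rough sketch of the order in which an operator's callbacks are invoked, the plain-Java model below uses a hypothetical `LifecycleOperator` interface. It is a stand-in, not the real Apex `Operator` interface, where `setup` receives a context object and tuples arrive through an input port's process method.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for the operator callbacks; NOT the real Apex interface.
interface LifecycleOperator {
    void setup();
    void beginWindow(long windowId);
    void process(String tuple);
    void endWindow();
    void teardown();
}

// Records every callback so the order of calls can be inspected.
final class TracingOperator implements LifecycleOperator {
    final List<String> trace = new ArrayList<>();

    public void setup() { trace.add("setup"); }
    public void beginWindow(long windowId) { trace.add("beginWindow(" + windowId + ")"); }
    public void process(String tuple) { trace.add("process(" + tuple + ")"); }
    public void endWindow() { trace.add("endWindow"); }
    public void teardown() { trace.add("teardown"); }
}

final class LifecycleDemo {
    // Drives the operator the way an engine would: setup once, then one
    // beginWindow / process... / endWindow cycle per window, then teardown once.
    static List<String> run(List<List<String>> windows) {
        TracingOperator op = new TracingOperator();
        op.setup();
        long windowId = 0;
        for (List<String> tuples : windows) {
            op.beginWindow(windowId++);
            for (String tuple : tuples) {
                op.process(tuple);
            }
            op.endWindow();
        }
        op.teardown();
        return op.trace;
    }

    public static void main(String[] args) {
        List<List<String>> windows = new ArrayList<>();
        windows.add(Arrays.asList("a", "b"));
        windows.add(Arrays.asList("c"));
        for (String call : run(windows)) {
            System.out.println(call);
        }
    }
}
```

The point to take away is that `setup` and `teardown` bracket the operator's whole life, while `beginWindow` and `endWindow` bracket each streaming window.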

Page 4: Writing an Apache Apex Application

Apex Streaming Application

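import org.apache.hadoop.conf.Configuration;
import com.datatorrent.api.DAG;
import com.datatorrent.api.StreamingApplication;

public class Application implements StreamingApplication {
  @Override
  public void populateDAG(DAG dag, Configuration conf) {
    // Add operators to the DAG - dag.addOperator(args)
    // Add streams between operators - dag.addStream(args)
    // Additional config + hints to YARN - optional
  }
}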

Page 5: Writing an Apache Apex Application

Apex Application - FilterWords

• Problem statement - Filter words in a file
ᵒ Read a file located on HDFS
ᵒ Split each line into words, check that each word is not one of the forbidden words, and write it to HDFS

[Diagram: HDFS -> (Lines) -> Apex Application DAG -> (Filtered Words) -> HDFS]
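
The core of the problem statement above (split a line into words, drop the forbidden ones) can be sketched in plain Java without any Apex classes. The class name `FilterWordsSketch`, the whitespace splitting, the case-insensitive match, and the sample forbidden words are all assumptions made for illustration; the lab's own code may differ.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

final class FilterWordsSketch {
    // Splits a line on whitespace and keeps only the words that are not forbidden.
    // Assumption: forbidden words are stored in lower case and matched case-insensitively.
    static List<String> filterLine(String line, Set<String> forbidden) {
        List<String> kept = new ArrayList<>();
        for (String word : line.trim().split("\\s+")) {
            if (!word.isEmpty() && !forbidden.contains(word.toLowerCase())) {
                kept.add(word);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        // Hypothetical forbidden words, chosen only for this example.
        Set<String> forbidden = new HashSet<>(Arrays.asList("the", "fox"));
        System.out.println(filterLine("The quick brown fox", forbidden));
    }
}
```

In the application itself, reading from HDFS and writing back to HDFS are handled by separate operators, so only this filtering step would live in the processing operator.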

Page 6: Writing an Apache Apex Application

FilterWords Application DAG

[Diagram: HDFS -> Reader -> (Lines) -> Tokenize -> (Words) -> Processor -> (Filtered Words) -> Writer -> HDFS]

• Input Operator (Adapter): Reader
• Generic Operators: Tokenize, Processor
• Output Operator (Adapter): Writer

Page 7: Writing an Apache Apex Application

Prerequisites
• Java 1.7 or above
• Maven 3.0 or above
• Apache Apex projects:
ᵒ Apache Apex Core: core platform, engine
ᵒ Apache Apex Malhar: operators library
• Hadoop cluster in running state
• Your favourite IDE - Eclipse / vi

Page 8: Writing an Apache Apex Application

Demo time!
• Apex application structure
• Application code walkthrough
• How to execute the application
• Assignment

Page 9: Writing an Apache Apex Application

Assignment - WordCount

• Problem statement - Count occurrences of words in a file
ᵒ Read a file located on HDFS
ᵒ Emit the counts at the end of every window and write them to HDFS

[Diagram: HDFS -> (Lines) -> Apex Application DAG -> (<Word, Count>) -> HDFS]

Page 10: Writing an Apache Apex Application

Assignment - Word Count Application DAG

[Diagram: HDFS -> Reader -> (Lines) -> Tokenize -> (Words) -> Counter -> (<Word, Count>) -> Output -> HDFS]

Page 11: Writing an Apache Apex Application

Assignment - What you need to do

[Diagram]
Existing: Reader -> (String: Line) -> Tokenizer -> (String: Words) -> Processor -> (String: Words’) -> Writer
Assignment: Counter -> (Map: {Word: Count}) -> Writer

Page 12: Writing an Apache Apex Application

Assignment - Hints
• Create a copy of Processor.java. Name it Counter.java
• Modify Counter.java as follows:
ᵒ Define a data structure which can hold counts for words
ᵒ Process method of input port must count the occurrences
ᵒ Clear the counts in beginWindow() call
ᵒ Emit the counts in endWindow() call

Page 13: Writing an Apache Apex Application

Solution - Changes to Counter.java
• Need to define a data structure which can hold counts for words

private HashMap<String, Integer> counts = new HashMap<>();

• Process method of input port must count the occurrences

if (counts.containsKey(refinedWord)) {
  counts.put(refinedWord, counts.get(refinedWord) + 1);
} else {
  counts.put(refinedWord, 1);
}

• Clear the counts in beginWindow call

counts.clear();

• Emit the counts in endWindow call

output.emit(counts.toString());

• Run Application Test
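
To see how the pieces of the solution fit together, here is a minimal plain-Java sketch of the per-window counting logic. The class `WindowedCounterSketch` and its `emitted` list (a stand-in for the output port) are hypothetical and do not use the Apex API. The sketch also copies the counts into a sorted map before emitting, an assumption made only so that the printed order is predictable.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

final class WindowedCounterSketch {
    // Holds the counts for the current window only.
    private final Map<String, Integer> counts = new HashMap<>();
    // Stand-in for the output port: collects what would be emitted.
    final List<String> emitted = new ArrayList<>();

    // Start every window with empty counts.
    void beginWindow(long windowId) {
        counts.clear();
    }

    // Count one occurrence of the word.
    void process(String refinedWord) {
        counts.merge(refinedWord, 1, Integer::sum);
    }

    // Emit the counts of this window as a string, sorted by word.
    void endWindow() {
        emitted.add(new TreeMap<>(counts).toString());
    }

    public static void main(String[] args) {
        WindowedCounterSketch counter = new WindowedCounterSketch();
        counter.beginWindow(0);
        counter.process("a");
        counter.process("b");
        counter.process("a");
        counter.endWindow();
        counter.beginWindow(1);
        counter.process("b");
        counter.endWindow();
        System.out.println(counter.emitted); // [{a=2, b=1}, {b=1}]
    }
}
```

`Map.merge` does the same job as the `containsKey` / `put` pair shown in the solution, in one call.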

Page 14: Writing an Apache Apex Application

Assignment - Are we done yet?
• Change the DAG
ᵒ Replace the Processor operator with the newly created operator - Counter

Page 15: Writing an Apache Apex Application

Assignment - Slight change
• We are emitting a Map, but it is still sent as a String.
ᵒ Change the type of the output port of Counter to Map
ᵒ Change the type of the input port of Writer to Map
ᵒ Make appropriate changes to Writer so that it reads a Map and writes it in a format where each line belongs to a single word.
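
The Writer change above comes down to turning a Map into one line per word. A minimal plain-Java sketch of that conversion follows. The helper `CountLineFormatterSketch.toLines`, the `word: count` layout, and the sorting by word are assumptions for illustration, since the slides do not prescribe an exact output format.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

final class CountLineFormatterSketch {
    // Turns a map of counts into one "word: count" line per word, sorted by word.
    static List<String> toLines(Map<String, Integer> counts) {
        List<String> lines = new ArrayList<>();
        for (Map.Entry<String, Integer> entry : new TreeMap<>(counts).entrySet()) {
            lines.add(entry.getKey() + ": " + entry.getValue());
        }
        return lines;
    }

    public static void main(String[] args) {
        Map<String, Integer> counts = new HashMap<>();
        counts.put("b", 1);
        counts.put("a", 2);
        for (String line : toLines(counts)) {
            System.out.println(line);
        }
    }
}
```

In the lab's Writer, each of these lines would be written to the output file in place of the single string received before.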

Page 16: Writing an Apache Apex Application

Assignment - Final change
• Change the code so that each count is the overall count, not just the count for each window.
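
One possible approach, sketched here in plain Java under the assumption that the only change needed is to stop resetting the counts at the start of each window. The class `CumulativeCounterSketch` and its `emitted` list (a stand-in for the output port) are hypothetical and do not use the Apex API.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

final class CumulativeCounterSketch {
    // Counts accumulate across all windows because they are never cleared.
    private final Map<String, Integer> counts = new HashMap<>();
    // Stand-in for the output port: collects what would be emitted.
    final List<Map<String, Integer>> emitted = new ArrayList<>();

    // Intentionally does not clear the counts.
    void beginWindow(long windowId) {
    }

    // Count one occurrence of the word.
    void process(String word) {
        counts.merge(word, 1, Integer::sum);
    }

    // Emit a sorted copy so that later windows cannot change what was already emitted.
    void endWindow() {
        emitted.add(new TreeMap<>(counts));
    }

    public static void main(String[] args) {
        CumulativeCounterSketch counter = new CumulativeCounterSketch();
        counter.beginWindow(0);
        counter.process("a");
        counter.process("b");
        counter.endWindow();
        counter.beginWindow(1);
        counter.process("a");
        counter.endWindow();
        System.out.println(counter.emitted); // [{a=1, b=1}, {a=2, b=1}]
    }
}
```

Emitting a copy rather than the live map matters here: the same map keeps changing in later windows, so handing it out directly would let downstream code see counts it was never sent.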

Page 17: Writing an Apache Apex Application

Summary - Recap
• Writing Apache Apex operators
• Chaining the operators into an Apache Apex application
• Executing the application on the Apache Apex platform

Page 18: Writing an Apache Apex Application

Where to go from here?
Apache Apex Documentation - http://apex.incubator.apache.org/docs.html
Apache Apex Core Git - https://github.com/apache/incubator-apex-core
Apache Apex Malhar Git - https://github.com/apache/incubator-apex-malhar

Join Users Mailing List - [email protected]
Join Dev Mailing List - [email protected]

Send queries to Users Mailing List - [email protected]
Send queries to Dev Mailing List - [email protected]

Page 19: Writing an Apache Apex Application

Thank You