java high level stream api

Stream API For Apex June 2016

Upload: apache-apex

Post on 08-Jan-2017


TRANSCRIPT

Page 1: Java High Level Stream API

Stream API For Apex

June 2016

Page 2: Java High Level Stream API

Apex Overview

Page 3: Java High Level Stream API

Apex Overview

• YARN is the resource manager

• HDFS is used for storing any persistent state

Page 4: Java High Level Stream API

Current Development Model

Directed Acyclic Graph (DAG)

[Figure: tuples flow through a chain of Operators connected by streams (Filtered Stream, Enriched Stream, Output Stream)]

● A stream is a sequence of data tuples
● A typical operator takes one or more input streams, performs computations & emits one or more output streams
● Each operator is your custom business logic in Java, or a built-in operator from our open source library
● An operator has many instances that run in parallel, and each instance is single-threaded
● A Directed Acyclic Graph (DAG) is made up of operators and streams
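The operator and stream model in the bullets above can be illustrated with a toy sketch in plain Java. This is not the Apex operator API: `ToyOperator`, `LengthFilter`, `Enricher`, and `run` are hypothetical names invented for illustration, and the sketch runs in a single thread with no partitioning or checkpointing.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Consumer;

// Toy model only (assumption): NOT the Apex operator API.
final class OperatorSketch {

  /** Hypothetical operator with one input and one output stream. */
  abstract static class ToyOperator<I, O> {
    // Where emitted tuples go; a no-op until the output is connected.
    private Consumer<O> downstream = tuple -> { };

    /** Connects this operator's output stream to the next operator's input. */
    <R> ToyOperator<O, R> connect(ToyOperator<O, R> next) {
      this.downstream = next::process;
      return next;
    }

    /** Connects this operator's output stream to a terminal consumer. */
    void connectSink(Consumer<O> sink) {
      this.downstream = sink;
    }

    protected void emit(O tuple) {
      downstream.accept(tuple);
    }

    /** Called once per incoming tuple. */
    abstract void process(I tuple);
  }

  /** Emits only tuples longer than three characters (the "filtered stream"). */
  static final class LengthFilter extends ToyOperator<String, String> {
    @Override
    void process(String tuple) {
      if (tuple.length() > 3) {
        emit(tuple);
      }
    }
  }

  /** Appends the tuple length to each tuple (the "enriched stream"). */
  static final class Enricher extends ToyOperator<String, String> {
    @Override
    void process(String tuple) {
      emit(tuple + ":" + tuple.length());
    }
  }

  /** Wires filter -> enricher -> sink and pushes every input tuple through. */
  static List<String> run(List<String> input) {
    LengthFilter filter = new LengthFilter();
    Enricher enricher = new Enricher();
    List<String> output = new ArrayList<>();
    filter.connect(enricher).connectSink(output::add);
    for (String tuple : input) {
      filter.process(tuple);
    }
    return output;
  }

  public static void main(String[] args) {
    System.out.println(run(Arrays.asList("stream", "of", "tuples")));
  }
}
```

In a real application the wiring done by `connect` corresponds to adding streams between operator ports on the DAG, and the platform, not the caller, drives tuples through the operators.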

Page 5: Java High Level Stream API

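Current Application Example

import org.apache.hadoop.conf.Configuration;

import com.datatorrent.api.DAG;
import com.datatorrent.api.StreamingApplication;
import com.datatorrent.api.annotation.ApplicationAnnotation;
import com.datatorrent.lib.algo.UniqueCounter;
import com.datatorrent.lib.io.ConsoleOutputOperator;

// WordCountInputOperator is assumed to be defined in the same package as this class.
@ApplicationAnnotation(name="WordCountDemo")
public class Application implements StreamingApplication
{
  @Override
  public void populateDAG(DAG dag, Configuration conf)
  {
    // Add the operators (the vertices of the DAG).
    WordCountInputOperator input = dag.addOperator("wordinput", new WordCountInputOperator());
    UniqueCounter<String> wordCount = dag.addOperator("count", new UniqueCounter<String>());
    ConsoleOutputOperator consoleOperator = dag.addOperator("console", new ConsoleOutputOperator());

    // Connect output ports to input ports with named streams (the edges of the DAG).
    dag.addStream("wordinput-count", input.outputPort, wordCount.data);
    dag.addStream("count-console", wordCount.count, consoleOperator.input);
  }
}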

Page 6: Java High Level Stream API

Why we need a high-level API

o Easier for beginners to start with
o Fluent API
o Smaller learning curve
o Transform methods in one place vs. spread across the operator library
o The operator API provides flexibility, while the high-level API provides ease of use

Page 7: Java High Level Stream API

Stream API

<<interface>> ApexStream<T>
  map(..)
  filter(..)
  …
  addOperator(...)
  with(prop, val)
  …
  window(Opt...)

<<interface>> WindowedStream<T>
  group(..)
  groupByKey(...)
  reduce(..)
  fold(..)
  join(..)
  count(..)
  …
  window(Opt...)

Page 8: Java High Level Stream API

Stream API (Application Example)

@ApplicationAnnotation(name = "WordCountStreamingApiDemo")
public class ApplicationWithStreamAPI implements StreamingApplication
{
  @Override
  public void populateDAG(DAG dag, Configuration configuration)
  {
    String localFolder = "./src/test/resources/data";
    ApexStream<String> stream = StreamFactory
        .fromFolder(localFolder)
        .flatMap(new Split())
        // Global window that fires early every 1000 ms and accumulates fired panes.
        .window(new WindowOption.GlobalWindow(),
            new TriggerOption().withEarlyFiringsAtEvery(Duration.millis(1000)).accumulatingFiredPanes())
        .countByKey(new ConvertToKeyVal()).print();
    // Translate the chained transforms into the Apex DAG.
    stream.populateDag(dag);
  }
}
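As a rough illustration of what the pipeline above computes, here is the same word count written against plain `java.util.stream`. It uses none of the Apex API and works on a bounded in-memory list with no windowing or triggers. `WordCountSketch` and `countWords` are hypothetical names, and splitting on whitespace is an assumption about what `Split` does.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

// Plain-JDK analogue (assumption): not the Apex stream API, bounded input only.
final class WordCountSketch {

  /** Splits each line into words and counts the occurrences of every word. */
  static Map<String, Long> countWords(List<String> lines) {
    return lines.stream()
        // Counterpart of flatMap(new Split()): one line becomes many words.
        .flatMap(line -> Arrays.stream(line.split("\\s+")))
        // Leading whitespace produces an empty first token; drop it.
        .filter(word -> !word.isEmpty())
        // Counterpart of countByKey(...): group by the word and count.
        .collect(Collectors.groupingBy(word -> word, TreeMap::new, Collectors.counting()));
  }

  public static void main(String[] args) {
    System.out.println(countWords(Arrays.asList("to be or", "not to be")));
  }
}
```

The main difference from the Apex version is that the input there may be unbounded, which is why that pipeline needs a window and a trigger to decide when counts are emitted.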

Page 9: Java High Level Stream API

How it works

o ApexStream<T> represents a bounded or unbounded data set of type T

o ApexStream<T> also holds a graph data structure of all operators and the connections between them, from the input to the current point

o Each transform method attaches one or more operators to the current graph data structure and returns a new ApexStream object

o The graph data structure is not translated to an Apex DAG until the populateDag or run method is called
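The deferred translation described above can be sketched with a toy model in plain Java. This is not the actual ApexStream implementation: `ToyStream`, `ToyDag`, `fromInput`, and `transform` are hypothetical names, and the "graph" is reduced to a linear chain of operator names.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Toy model only (assumption): NOT how ApexStream is implemented.
final class LazyStreamSketch {

  /** Stand-in for the real DAG: records operator names and stream edges. */
  static final class ToyDag {
    final List<String> operators = new ArrayList<>();
    final List<String> streams = new ArrayList<>();
  }

  /** Immutable handle holding the operator chain from the input to the current point. */
  static final class ToyStream {
    private final List<String> chain;

    private ToyStream(List<String> chain) {
      this.chain = Collections.unmodifiableList(chain);
    }

    static ToyStream fromInput(String inputName) {
      List<String> chain = new ArrayList<>();
      chain.add(inputName);
      return new ToyStream(chain);
    }

    /** Appends one operator to a copy of the chain and returns a NEW stream object. */
    ToyStream transform(String operatorName) {
      List<String> next = new ArrayList<>(chain);
      next.add(operatorName);
      return new ToyStream(next);
    }

    /** Only at this point is the recorded chain translated into the (toy) DAG. */
    void populateDag(ToyDag dag) {
      for (int i = 0; i < chain.size(); i++) {
        dag.operators.add(chain.get(i));
        if (i > 0) {
          dag.streams.add(chain.get(i - 1) + "->" + chain.get(i));
        }
      }
    }
  }

  public static void main(String[] args) {
    ToyDag dag = new ToyDag();
    ToyStream stream = ToyStream.fromInput("input")
        .transform("flatMap")
        .transform("countByKey")
        .transform("print");
    // The DAG is still empty here: chaining only recorded the operators.
    System.out.println(dag.operators.isEmpty());
    stream.populateDag(dag);
    System.out.println(dag.operators);
    System.out.println(dag.streams);
  }
}
```

Because every transform returns a new object, an intermediate stream can be kept and extended in more than one direction without affecting the streams already derived from it.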

Page 10: Java High Level Stream API

How it works (Cont'd)

Page 11: Java High Level Stream API

Current Status

○ Method chaining for readability
○ Stateless transforms (map, flatMap, filter)
○ Some inputs and outputs are available (file, console, Kafka)
○ Some interoperability (addOperator, getDag, set property/attributes, etc.)
○ Local mode and distributed mode
○ Anonymous function class support
○ Extensible

Page 12: Java High Level Stream API

Current Status (Cont'd)

○ WindowedStream is in a pull request, along with the operators that support it
○ A few window transforms (count, reduce, etc.)
○ 3 window types (fixed window, sliding window, session window)
○ 3 trigger types (early trigger, late trigger, at watermark)
○ 3 accumulation modes (accumulate, discard, accumulation_retraction)
○ In-memory window state (checkpointed)
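To make the window types and accumulation modes listed above more concrete, here is a small plain-Java sketch of the usual arithmetic behind them. It is an illustration under stated assumptions, not the Apex windowing code: `WindowSketch` and its methods are hypothetical names, windows are identified by their start timestamp in milliseconds, and session windows, late triggers, and the accumulation_retraction mode are left out.

```java
import java.util.ArrayList;
import java.util.List;

// Illustration only (assumption): not the Apex windowed operator implementation.
final class WindowSketch {

  /** Fixed windows: each timestamp belongs to exactly one window of the given size. */
  static long fixedWindowStart(long timestamp, long size) {
    return timestamp - Math.floorMod(timestamp, size);
  }

  /** Sliding windows: a timestamp belongs to every window [start, start + size) covering it. */
  static List<Long> slidingWindowStarts(long timestamp, long size, long slide) {
    List<Long> starts = new ArrayList<>();
    // Latest window start that is not after the timestamp.
    long latest = timestamp - Math.floorMod(timestamp, slide);
    for (long start = latest; start > timestamp - size; start -= slide) {
      starts.add(start);
    }
    return starts;
  }

  /**
   * Counts emitted at each trigger firing for one window.
   * Accumulating: each pane reports everything seen so far.
   * Discarding: each pane reports only what arrived since the previous firing.
   */
  static List<Long> firedCounts(int[] tuplesPerFiring, boolean accumulating) {
    List<Long> fired = new ArrayList<>();
    long state = 0;
    for (int arrived : tuplesPerFiring) {
      state += arrived;
      fired.add(state);
      if (!accumulating) {
        state = 0; // discard the pane after it fires
      }
    }
    return fired;
  }

  public static void main(String[] args) {
    System.out.println(fixedWindowStart(1250, 1000));
    System.out.println(slidingWindowStarts(1250, 1000, 500));
    System.out.println(firedCounts(new int[] {3, 2, 4}, true));
    System.out.println(firedCounts(new int[] {3, 2, 4}, false));
  }
}
```

With early firings, the choice of accumulation mode decides whether a downstream consumer should overwrite its previous value (accumulating) or add the new pane to it (discarding).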

Page 13: Java High Level Stream API

Roadmap

○ Persistent window state for windowed operators (large state)
○ Fully follow the Beam model (window, trigger, watermark)
○ Rich selection of windowed transforms (group, combine, join)
○ Support custom window assignor
○ Support custom trigger
○ More inputs/outputs (HBase, Cassandra, JDBC, etc.)
○ Better schema support
○ More language support (Java 8, Scala, etc.)
○ What the community asks for

Page 14: Java High Level Stream API

Resources

○ Apache Apex website - http://apex.apache.org/
○ Subscribe - http://apex.apache.org/community.html
○ Download - http://apex.apache.org/downloads.html
○ Twitter - @ApacheApex; Follow - https://twitter.com/apacheapex
○ Facebook - https://www.facebook.com/ApacheApex/
○ Meetup - http://www.meetup.com/topics/apache-apex
○ SlideShare - http://www.slideshare.net/ApacheApex/presentations
○ More Examples - https://github.com/DataTorrent/examples
○ Pull requests - https://github.com/apache/apex-malhar/pull/319 and https://github.com/apache/apex-malhar/pull/327

Page 15: Java High Level Stream API

Demo & Code Example

○ Word Count
○ AutoComplete

Page 16: Java High Level Stream API

Thank You!

June 2016

Comments/[email protected]