Flink case study: Amadeus

Streaming & Parallel Decision Tree in Flink, anwar.rizal @anrizal

Upload: flink-forward

Post on 08-Jan-2017

Category: Technology


TRANSCRIPT

Streaming & Parallel Decision Tree in Flink

anwar.rizal @anrizal

Motivation

Architecture

Decision Trees

Implementation

Conclusion

Need a classifier system on streaming data

The data used for learning arrive as a stream

So do the data to be classified

[Figure: fare calendar, one price per day]

$90    $90    $120       $90        $90    $150   $200
$90    $75    $90        $90        $90    $90    $90
$120   $90    Sold out   Sold out   $75    $90    $90
$120   $90    $90        $90        $100   $90    $120

Legend: (predicted) to increase in zero to two days; (predicted) to increase this week; (predicted) to increase next week

[Figure: route monitoring example]

Routes shown: FRA - NYC, FRA - LON, FRA - MEX

Alerts shown:

Need attention: revenue decrease

Need attention: passenger decrease

Need attention: revenue decrease, cost increase

MUST

The classifier is kept fresh
    No need for separate batch learning/evaluation
    The feedback is taken into account in real time, regularly

The classifier can be introspected
    Transparent model structure (e.g. know the tree, information gain for each split point)
    Known expected performance (accuracy, precision, recall, AUC)

Seamless support for the machine learning workflow
    Data preprocessing: up/down sampling, imputations, …
    Feature selection
    Model evaluation, cross validation
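The introspection requirement mentions knowing the information gain of each split point. As a minimal sketch of what that number is, the following plain-Python example computes the entropy-based information gain of a threshold split on one numeric feature. It is not the Flink implementation from the talk; the function names (`entropy`, `information_gain`, `best_split`) and the sample fares and labels are hypothetical, chosen only for illustration.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    if total == 0:
        return 0.0
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(labels).values())


def information_gain(values, labels, threshold):
    """Entropy reduction obtained by splitting a numeric feature at `threshold`."""
    left = [y for x, y in zip(values, labels) if x <= threshold]
    right = [y for x, y in zip(values, labels) if x > threshold]
    total = len(labels)
    weighted = ((len(left) / total) * entropy(left)
                + (len(right) / total) * entropy(right))
    return entropy(labels) - weighted


def best_split(values, labels):
    """Try midpoints between consecutive distinct values.

    Returns (threshold, gain). Raises ValueError if the feature has
    fewer than two distinct values, since no split is possible then.
    """
    distinct = sorted(set(values))
    candidates = [(a + b) / 2 for a, b in zip(distinct, distinct[1:])]
    return max(((t, information_gain(values, labels, t)) for t in candidates),
               key=lambda pair: pair[1])


# Hypothetical sample: fares and whether the price later went up.
fares = [90, 75, 120, 150]
went_up = ["stay", "stay", "up", "up"]
threshold, gain = best_split(fares, went_up)
```

Keeping the gain next to each split point is what makes the tree inspectable: a reader of the model can see both where a node splits and how much that split reduced label uncertainty.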

NICE TO HAVE

The classifier is immediately available
    The classifier can already predict during learning
    When the learning phase is terminated, it starts another cycle of learning

The classifier has a meta-learning capability
    The classifier has several models with different parameters
    It is possible to learn about the learning capability of the models

[Diagram: Learning -> Learning & Classifying -> End of learning -> New cycle of learning]

Cycle of Learning, Classifying during Learning, End of Learning, Classifying, New Learning
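The cycle above can be read as a small state machine. The sketch below is one possible reading, not the talk's implementation: the slides do not say what triggers each transition, so the point-count thresholds `warmup` and `cycle_length`, the `Phase` names, and the `LearningCycle` class are all assumptions made for illustration.

```python
from enum import Enum


class Phase(Enum):
    LEARNING = "learning"
    LEARNING_AND_CLASSIFYING = "learning and classifying"
    END_OF_LEARNING = "end of learning"


class LearningCycle:
    """Tracks the phase of a learner that restarts learning after each cycle.

    Assumption: phases change after a fixed number of labeled points.
    """

    def __init__(self, warmup, cycle_length):
        if not 0 < warmup < cycle_length:
            raise ValueError("need 0 < warmup < cycle_length")
        self.warmup = warmup              # points needed before predicting
        self.cycle_length = cycle_length  # points after which learning ends
        self.seen = 0
        self.completed_cycles = 0
        self.phase = Phase.LEARNING

    def on_labeled_point(self):
        """Advance the cycle by one labeled point and return the new phase."""
        if self.phase is Phase.END_OF_LEARNING:
            # A labeled point arriving after the end opens a new cycle.
            self.seen = 0
            self.phase = Phase.LEARNING
        self.seen += 1
        if self.seen >= self.cycle_length:
            self.phase = Phase.END_OF_LEARNING
            self.completed_cycles += 1
        elif self.seen >= self.warmup:
            self.phase = Phase.LEARNING_AND_CLASSIFYING
        return self.phase

    def can_predict(self):
        """True once warm-up has passed or a previous cycle left a model."""
        return self.phase is not Phase.LEARNING or self.completed_cycles > 0


cycle = LearningCycle(warmup=2, cycle_length=4)
phases = [cycle.on_labeled_point() for _ in range(5)]
```

In this reading, a model finished in an earlier cycle keeps serving predictions while the next cycle warms up, which matches the requirement that the classifier be immediately available.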

[Architecture diagram]

Labeled points feed the Stream Learner, which produces the Classifier

The Classifying Application uses the Classifier to turn unlabeled points into predicted points
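The data flow in this architecture can be sketched in a few lines. The example below uses plain Python rather than Flink, and a trivial majority-class model stands in for the decision tree; `StreamLearner.learn`, `StreamLearner.classifier`, `run`, and the event tuples are hypothetical names introduced only to show how labeled and unlabeled points might share one pipeline.

```python
from collections import Counter


class StreamLearner:
    """Consumes labeled points and exposes the current classifier.

    Assumption: a majority-class model stands in for the decision tree.
    """

    def __init__(self):
        self.label_counts = Counter()

    def learn(self, features, label):
        """Update the model with one labeled point."""
        self.label_counts[label] += 1

    def classifier(self):
        """Return a snapshot classifier, or None before any learning."""
        if not self.label_counts:
            return None
        majority = self.label_counts.most_common(1)[0][0]
        return lambda features: majority


def run(events):
    """Process an interleaved stream of labeled and unlabeled points.

    Each event is ("labeled", features, label) or ("unlabeled", features).
    Returns the predicted points as (features, predicted_label) pairs.
    """
    learner = StreamLearner()
    predicted = []
    for event in events:
        if event[0] == "labeled":
            _, features, label = event
            learner.learn(features, label)
        else:
            _, features = event
            model = learner.classifier()
            if model is not None:  # nothing learned yet: skip the point
                predicted.append((features, model(features)))
    return predicted


# Hypothetical interleaved stream of fare observations.
events = [
    ("unlabeled", {"price": 90}),
    ("labeled", {"price": 120}, "increase"),
    ("unlabeled", {"price": 100}),
    ("labeled", {"price": 90}, "stable"),
    ("labeled", {"price": 85}, "stable"),
    ("unlabeled", {"price": 75}),
]
predictions = run(events)
```

The point of the sketch is the separation of roles: the learner only ever sees labeled points, and the classifying side only ever reads the latest classifier, so prediction can continue while learning is still in progress.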

Decision Tree Algorithms Outline

Implementation using Flink Streaming

Further Discussions

Thanks!

Credit to: Yiqing Yan (Eurecom) & Tianshu Yang (Telecom Bretagne), Amadeus Interns