Flink case study: Amadeus
TRANSCRIPT
Streaming & Parallel Decision Tree in Flink
anwar.rizal @anrizal
Motivation
Architecture
Decision Trees
Implementation
Conclusion
Need a classifier system on streaming data
The data used for learning arrive as a stream
So do the data to be classified
[Figure: fare calendar with daily prices between $75 and $200 and some dates sold out, annotated with predictions: price (predicted) to increase in zero to two days, this week, or next week]
[Figure: route monitoring for FRA - NYC, FRA - LON and FRA - MEX, with alerts "Need attention: revenue decrease", "Need attention: passenger decrease" and "Need attention: revenue decrease, cost increase"]
MUST
The classifier is kept fresh
  No need for separate batch learning/evaluation
  The feedback is taken into account in real time, regularly
The classifier can be introspected
  Transparent model structure (e.g. know the tree, information gain for each split point)
  Known expected performance (accuracy, precision, recall, AUC)
Seamless support for the machine learning workflow
  Data preprocessing: up/down sampling, imputations, …
  Feature selection
  Model evaluation, cross validation
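To make the introspection requirement concrete, here is a minimal sketch of how the information gain of a single split point can be computed. It is an illustration only, not the implementation from the talk; the feature name `days_to_departure`, the labels, and the sample points are hypothetical.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    if total == 0:
        return 0.0
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(labels).values())

def information_gain(points, feature, threshold):
    """Entropy reduction obtained by splitting on `feature <= threshold`.

    Each point is a (features_dict, label) pair.
    """
    labels = [label for _, label in points]
    left = [label for f, label in points if f[feature] <= threshold]
    right = [label for f, label in points if f[feature] > threshold]
    weighted = (len(left) * entropy(left)
                + len(right) * entropy(right)) / len(labels)
    return entropy(labels) - weighted

# Hypothetical labeled points: does the fare increase or stay stable?
points = [
    ({"days_to_departure": 2}, "increase"),
    ({"days_to_departure": 3}, "increase"),
    ({"days_to_departure": 20}, "stable"),
    ({"days_to_departure": 30}, "stable"),
]

# A threshold that separates the classes perfectly removes all entropy.
gain_perfect = information_gain(points, "days_to_departure", 10)
# A threshold that puts every point on one side removes none.
gain_useless = information_gain(points, "days_to_departure", 100)
```

Exposing this value for each split point is one way a tree can report why it chose a given split.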
NICE TO HAVE
The classifier is immediately available
  The classifier can already predict during learning
  When a learning phase ends, it starts another cycle of learning
The classifier has a meta-learning capability
  The classifier has several models with different parameters
  It is possible to learn about the learning capability of the models
[Figure: timeline of phases: Learning, Learning & Classifying, End of learning, New cycle of learning]
Cycle of Learning, Classifying during Learning, End of Learning, Classifying, New Learning
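The cycle can be sketched in a few lines. This is one possible reading, under stated assumptions: a cycle ends after a fixed number of labeled points (`cycle_size`, a hypothetical parameter), predictions come from the last completed model when one exists and from the in-progress learner otherwise, and a majority-class learner that ignores features stands in for the decision tree. All class and method names are hypothetical.

```python
from collections import Counter

class MajorityLearner:
    """Stand-in for the decision tree: predicts the most frequent label seen."""

    def __init__(self):
        self.counts = Counter()

    def learn(self, label):
        self.counts[label] += 1

    def predict(self):
        # None until at least one labeled point has been seen.
        return self.counts.most_common(1)[0][0] if self.counts else None

class CyclingClassifier:
    """Learns in cycles and can classify at any time, even while learning."""

    def __init__(self, cycle_size):
        self.cycle_size = cycle_size      # assumed end-of-learning criterion
        self.current = MajorityLearner()  # model being learned
        self.completed = None             # model from the last finished cycle
        self.seen = 0
        self.cycles_completed = 0

    def on_labeled(self, label):
        self.current.learn(label)
        self.seen += 1
        if self.seen == self.cycle_size:
            # End of learning: publish the model and start a new cycle.
            self.completed = self.current
            self.current = MajorityLearner()
            self.seen = 0
            self.cycles_completed += 1

    def on_unlabeled(self):
        model = self.completed if self.completed is not None else self.current
        return model.predict()

clf = CyclingClassifier(cycle_size=3)
clf.on_labeled("increase")
during = clf.on_unlabeled()   # classifying during the first learning cycle
clf.on_labeled("stable")
clf.on_labeled("stable")      # third labeled point: end of learning
after = clf.on_unlabeled()    # classifying with the completed model
clf.on_labeled("increase")    # a new cycle of learning has started
```

Serving the completed model while the next one is being learned keeps predictions stable during a cycle; serving the in-progress model instead would favor freshness.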
[Figure: architecture. Labeled points feed the Stream Learner, which produces the Classifier; the Classifying Application applies the Classifier to unlabeled points and emits predicted points]
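The data flow between these components can be illustrated with a small generator. This is a sketch in plain Python, not Flink code: the event tuples are a hypothetical encoding of the two input streams merged into one, and a nearest-class-mean rule on a single numeric feature stands in for the decision tree.

```python
def run_pipeline(events):
    """Consume interleaved ("labeled", x, y) and ("unlabeled", x) events.

    Labeled points update the model (learner role); unlabeled points are
    classified with the model as it stands (classifying application role).
    Yields one (x, predicted_label) pair per unlabeled point.
    """
    stats = {}  # label -> [sum of x, count], the state shared by both roles
    for event in events:
        if event[0] == "labeled":
            _, x, y = event
            entry = stats.setdefault(y, [0.0, 0])
            entry[0] += x
            entry[1] += 1
        else:
            _, x = event
            if not stats:
                yield (x, None)  # nothing learned yet
                continue
            # Predict the label whose running mean is closest to x.
            label = min(stats, key=lambda y: abs(x - stats[y][0] / stats[y][1]))
            yield (x, label)

# Hypothetical fares labeled "low" or "high", interleaved with queries.
events = [
    ("labeled", 90.0, "low"),
    ("labeled", 200.0, "high"),
    ("unlabeled", 100.0),
    ("labeled", 120.0, "low"),
    ("unlabeled", 180.0),
]
predictions = list(run_pipeline(events))
```

In a streaming engine the two inputs would typically be separate streams joined on shared state; a single interleaved list is used here only to keep the example self-contained.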
Decision Tree Algorithms Outline
Implementation using Flink Streaming
Further Discussions