OpenML.org: Networked Science and IoT Data Streams by Jan van Rijn, University of Freiburg

OpenML.org: Networked Science and IoT Data Streams Jan N. van Rijn University of Freiburg November 24, 2016

Upload: euroiota

Post on 16-Apr-2017


Category:

Data & Analytics



TRANSCRIPT

Page 1: OpenML.org: Networked Science and IoT Data Streams by Jan van Rijn, University of Freiburg

OpenML.org: Networked Science and IoT Data Streams

Jan N. van Rijn

University of Freiburg

November 24, 2016

Page 2: OpenML.org: Networked Science and IoT Data Streams by Jan van Rijn, University of Freiburg

Motivation

Galileo Galilei (1564–1642)

Created the best telescopes

Discovered the rings of Saturn

Sent anagrams of his discoveries, instead of publishing the results

Jan N. van Rijn OpenML.org: Networked Science and IoT Data Streams November 24, 2016 2


Page 4: OpenML.org: Networked Science and IoT Data Streams by Jan van Rijn, University of Freiburg
Page 5: OpenML.org: Networked Science and IoT Data Streams by Jan van Rijn, University of Freiburg

OpenML.org


Page 6: OpenML.org: Networked Science and IoT Data Streams by Jan van Rijn, University of Freiburg

Datasets

Data (ARFF) uploaded or referenced, versioned

Analysed, characterized, organized online

Indexed based on name, meta-features, tags, etc.

Support for other data formats (on request)


Page 7: OpenML.org: Networked Science and IoT Data Streams by Jan van Rijn, University of Freiburg

Tasks

Data alone does not define an experiment

Tasks contain: data, target attribute, goals, procedures

Readable by tools, automates experimentation

Real time ‘leaderboard’ and overview


Page 9: OpenML.org: Networked Science and IoT Data Streams by Jan van Rijn, University of Freiburg

Flows (algorithms)

Run locally, auto-registered by tools

Integrations + APIs (REST, Java, R, Python, . . . )

    from sklearn import tree
    from openml import tasks, runs

    # Download task 59: dataset, target attribute and evaluation procedure
    task = tasks.get_task(59)
    clf = tree.DecisionTreeClassifier()
    # Train and evaluate the classifier locally on the task's predefined splits
    run = runs.run_model_on_task(clf, task)
    # Upload predictions and flow to OpenML (requires a configured API key)
    run.publish()


Page 11: OpenML.org: Networked Science and IoT Data Streams by Jan van Rijn, University of Freiburg

Runs

Flow uploads predictions

Predictions are evaluated on OpenML

Reproducible, linked to data, flows and researcher

Contains:

predictions

parameter settings

model information

evaluation measures


Page 12: OpenML.org: Networked Science and IoT Data Streams by Jan van Rijn, University of Freiburg

Analysis

Answer basic questions about performance of algorithms to study . . .

the effect / behaviour of parameters on a given algorithm

the effect of feature selection on a given algorithm

how algorithms behave with respect to each other

which algorithms perform well on a wide range of datasets


Page 13: OpenML.org: Networked Science and IoT Data Streams by Jan van Rijn, University of Freiburg

Effect of parameter

[Figure: predictive accuracy (%) for RBFKernel(1), J48(2), IBk(1), Logistic(1), RandomForest(1) and REPTree(1); optimal value plotted against number of features (4 to 16384).]



Page 17: OpenML.org: Networked Science and IoT Data Streams by Jan van Rijn, University of Freiburg

Effect of Feature Selection

[Figure: four panels, k-NN (k = 1), Naive Bayes, Decision Tree (C4.5) and SVM (RBF Kernel). Each panel plots number of instances (256 to 65536) against number of features (1 to 16384); legend: Better, Equal, Worse.]


Page 18: OpenML.org: Networked Science and IoT Data Streams by Jan van Rijn, University of Freiburg

Performance of Algorithms

[Figure: two panels, accuracy (0 to 1) and area under the ROC curve (0.4 to 1), per classifier. The x-axis lists the WEKA classifiers compared, among them JRip, LMT, HoeffdingTree, RandomTree, RandomForest, NaiveBayes, SMO (PolyKernel), SMO (RBFKernel), MultilayerPerceptron, LogitBoost(DecisionStump), DecisionTable, Logistic, HyperPipes, IBk, FURIA, BayesNet, AdaBoostM1(NaiveBayes), OLM, SimpleCart, ConjunctiveRule, AdaBoostM1(DecisionStump), LADTree, OneR, Bagging(REPTree), J48 and AdaBoostM1(J48).]


Page 20: OpenML.org: Networked Science and IoT Data Streams by Jan van Rijn, University of Freiburg

Performance of Algorithms

105 datasets, 30 classifiers

Friedman - Nemenyi test (α = 0.05)

[Figure: critical difference (CD) diagram over average ranks 1 to 30. Classifier labels, in the order extracted from the slide: Logistic Model Tree, Random Forest, Bagging(REP Tree), AdaBoost(J48), FURIA, SMO(Poly Kernel), Simple Cart, LogitBoost(Decision Stump), Multilayer Perceptron (20), J48, Logistic, JRip, Multilayer Perceptron (10), REP Tree, k-NN (k=10), LAD Tree, Multilayer Perc. (10, 10), k-NN (k=1), Decision Table, Hoeffding Tree, SMO(RBF Kernel), Bayesian Network, AdaBoost(NaiveBayes), NaiveBayes, AdaBoost(DecisionStump), Random Tree, OneR, Conjunctive Rule, Hyper Pipes, OLM.]
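The Friedman-Nemenyi comparison on this slide rests on two quantities: the average rank of each classifier across datasets, and the critical difference (CD) beyond which two average ranks count as significantly different. The sketch below is a minimal illustration of both, not the code behind the slide. The function names and the toy scores are made up for the example, and `q_alpha` has to be looked up in a table of Nemenyi critical values for the chosen significance level and number of classifiers (2.343 is the commonly tabulated value for three classifiers at α = 0.05).

```python
import math


def average_ranks(scores):
    """Average rank per classifier; scores[d][k] is the score of classifier k
    on dataset d (higher is better, rank 1 is best, ties share the mean rank)."""
    n_classifiers = len(scores[0])
    totals = [0.0] * n_classifiers
    for row in scores:
        order = sorted(range(n_classifiers), key=lambda k: -row[k])
        i = 0
        while i < n_classifiers:
            j = i
            # Extend j over all classifiers tied with the one at position i
            while j + 1 < n_classifiers and row[order[j + 1]] == row[order[i]]:
                j += 1
            shared_rank = (i + j) / 2 + 1
            for k in order[i:j + 1]:
                totals[k] += shared_rank
            i = j + 1
    return [total / len(scores) for total in totals]


def critical_difference(q_alpha, n_classifiers, n_datasets):
    """Nemenyi critical difference: q_alpha * sqrt(k (k + 1) / (6 N))."""
    return q_alpha * math.sqrt(n_classifiers * (n_classifiers + 1) / (6.0 * n_datasets))


# Toy example: 3 classifiers on 3 datasets (illustrative numbers only)
toy_scores = [
    [0.90, 0.80, 0.70],
    [0.85, 0.85, 0.60],
    [0.70, 0.90, 0.80],
]
ranks = average_ranks(toy_scores)
cd = critical_difference(2.343, n_classifiers=3, n_datasets=3)
```

Two classifiers would be reported as significantly different when their average ranks differ by more than `cd`. With only three toy datasets the CD is large, so nothing separates here; with 105 datasets, as on the slide, the same formula gives a much tighter bound.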


Page 21: OpenML.org: Networked Science and IoT Data Streams by Jan van Rijn, University of Freiburg

Data Streams

Online learning

Many IoT applications in this paradigm

Example: Predict the electricity price for the next day

Feedback whether the prediction was correct

Model can become obsolete (concept drift)

[Figure: accuracy (0.5 to 0.95) per interval (0 to 40) for Hoeffding Tree, Naive Bayes, SPegasos and k-NN.]
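The online setting described on this slide is usually evaluated with a test-then-train loop: each observation is first used to test the current model and only then to update it, and accuracy is reported per interval. The sketch below illustrates that loop under simple assumptions. The `MajorityClass` learner and the toy stream are stand-ins chosen for brevity, not the MOA classifiers from the plot, and the abrupt label switch in the toy stream mimics concept drift.

```python
from collections import Counter


class MajorityClass:
    """Toy online learner: always predicts the most frequent label seen so far."""

    def __init__(self):
        self.counts = Counter()

    def predict(self, x):
        if not self.counts:
            return None  # nothing learned yet
        return self.counts.most_common(1)[0][0]

    def learn(self, x, y):
        self.counts[y] += 1


def prequential_accuracy(stream, learner, interval):
    """Test-then-train evaluation; returns one accuracy value per full interval."""
    accuracies = []
    correct = 0
    seen = 0
    for x, y in stream:
        correct += learner.predict(x) == y  # test first ...
        learner.learn(x, y)                 # ... then train on the same example
        seen += 1
        if seen == interval:
            accuracies.append(correct / interval)
            correct = 0
            seen = 0
    return accuracies


# Toy stream: the label switches halfway, so the learned model becomes obsolete
toy_stream = [(None, "up")] * 4 + [(None, "down")] * 4
per_interval = prequential_accuracy(toy_stream, MajorityClass(), interval=4)
# per_interval == [0.75, 0.0]
```

The drop from 0.75 to 0.0 in the second interval is the effect the slide calls concept drift: the model keeps predicting the old majority label after the underlying concept has changed.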


Page 22: OpenML.org: Networked Science and IoT Data Streams by Jan van Rijn, University of Freiburg

Performance of Data Streams Algorithms

[Figure: predictive accuracy (0.00 to 1.00) per data stream classifier. The x-axis labels run from NoChange and MajorityClass, through the SPegasos, SGD, AWE and k-NN variants, to HoeffdingTree, ASHoeffdingTree, HoeffdingOptionTree and HoeffdingAdaptiveTree.]


Page 23: OpenML.org: Networked Science and IoT Data Streams by Jan van Rijn, University of Freiburg

Performance of Data Streams Algorithms

[Figure: critical difference (CD) diagram over average ranks 1 to 25. Classifier labels, in the order extracted from the slide: HoeffdingOptionTree, HoeffdingAdaptiveTree, HoeffdingTree, ASHoeffdingTree, AWE(J48), AWE(JRip), AWE(SMO(PolyKernel)), AWE(Logistic), kNNwithPAW k = 10, AWE(REPTree), kNN k = 10, kNN k = 1, NaiveBayes, RandomHoeffdingTree, Perceptron, RuleClassifier, AWE(DecisionStump), AWE(OneR), SPegasos logloss, DecisionStump, SPegasos hingeloss, SGD hingeloss, SGD logloss, MajorityClass, NoChange.]


Page 24: OpenML.org: Networked Science and IoT Data Streams by Jan van Rijn, University of Freiburg

Goal

Can we build a classifier that does better?

How can we use the experimental results in OpenML for this?

Probably! By combining them in a smart way (ensembles)

Approach: work on intervals of 1,000 observations

Task: try to predict for the next interval which classifier to use
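One simple way to picture the task is as a selection problem over intervals: given, for every base classifier, a record of which past predictions were correct, choose the classifier to trust for the next interval. The sketch below uses the most naive rule, picking whichever classifier was most accurate on the preceding interval. It is an illustration only, not the method evaluated later in the talk; the function name, the classifier names and the toy records are hypothetical, and the interval is 4 rather than 1,000 to keep the example readable.

```python
def select_per_interval(correct_by_classifier, interval):
    """For each interval after the first, pick the classifier that was most
    accurate on the preceding interval.

    correct_by_classifier maps a classifier name to a list of 0/1 flags,
    one per observation, all lists having the same length.
    Ties go to the classifier listed first.
    """
    names = list(correct_by_classifier)
    n_observations = len(correct_by_classifier[names[0]])
    choices = []
    for start in range(interval, n_observations, interval):
        previous = slice(start - interval, start)
        best = max(names, key=lambda name: sum(correct_by_classifier[name][previous]))
        choices.append(best)
    return choices


# Toy records of correct (1) and wrong (0) predictions for two classifiers
toy_records = {
    "tree": [1, 1, 1, 0,  0, 0, 1, 0,  1, 1, 1, 1],
    "knn":  [0, 1, 0, 0,  1, 1, 1, 1,  0, 0, 0, 0],
}
chosen = select_per_interval(toy_records, interval=4)
# chosen == ["tree", "knn"]
```

In the toy data the choice for the third interval ("knn") turns out to be wrong, which shows why a purely backward-looking rule can lag behind changes in the stream.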


Page 26: OpenML.org: Networked Science and IoT Data Streams by Jan van Rijn, University of Freiburg

The OpenML approach

Many data streams (and tasks) from various sources

Real world: electricity, forest covertype, airlines

Synthetic: Bayesian Network Generator, Moving Hyperplanes, LED

Meta-features per data stream

Direct access to all MOA classifiers

Experimental results

Models

Predictions

Measured Performance


Page 27: OpenML.org: Networked Science and IoT Data Streams by Jan van Rijn, University of Freiburg

Meta-Features

Category: Meta-features

Simple: # Instances, # Attributes, # Classes, Dimensionality, Default Accuracy, # Observations with Missing Values, # Missing Values, % Observations With Missing Values, % Missing Values, # Numeric Attributes, # Nominal Attributes, # Binary Attributes, % Numeric Attributes, % Nominal Attributes, % Binary Attributes, Majority Class Size, % Majority Class, Minority Class Size, % Minority Class

Statistical: Mean of Means of Numeric Attributes, Mean Standard Deviation of Numeric Attributes, Mean Kurtosis of Numeric Attributes, Mean Skewness of Numeric Attributes

Information Theoretic: Class Entropy, Mean Attribute Entropy, Mean Mutual Information, Equivalent Number Of Attributes, Noise to Signal Ratio

Landmarkers: Accuracy, Kappa and Area under the ROC Curve of the following classifiers: Decision Stump, J48 (confidence factor: 0.01), k-NN, NaiveBayes, REP Tree (maximum depth: 3)

Drift detection: Changes by Adwin (Hoeffding Tree), Warnings by Adwin (Hoeffding Tree), Changes by DDM (Hoeffding Tree), Warnings by DDM (Hoeffding Tree), Changes by Adwin (Naive Bayes), Warnings by Adwin (Naive Bayes), Changes by DDM (Naive Bayes), Warnings by DDM (Naive Bayes)

Stream Landmarkers: Accuracy Naive Bayes on previous window, Accuracy k-NN on previous window, . . .
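To make the first rows of the table concrete, the sketch below computes a handful of the simple meta-features and the class entropy for a small in-memory dataset. It is an illustration under stated assumptions rather than OpenML's own implementation: missing values are represented as `None`, dimensionality is taken as attributes divided by instances, and default accuracy as the share of the majority class. The function name and the toy data are invented for the example.

```python
import math
from collections import Counter


def simple_meta_features(rows, labels):
    """Compute a few simple and information-theoretic meta-features.

    rows is a list of equal-length attribute lists (None marks a missing
    value); labels holds one class label per row.
    """
    n_instances = len(rows)
    n_attributes = len(rows[0])
    class_counts = Counter(labels)
    majority_size = max(class_counts.values())
    # Shannon entropy of the class distribution, in bits
    class_entropy = -sum(
        (count / n_instances) * math.log2(count / n_instances)
        for count in class_counts.values()
    )
    return {
        "instances": n_instances,
        "attributes": n_attributes,
        "classes": len(class_counts),
        "dimensionality": n_attributes / n_instances,
        "default_accuracy": majority_size / n_instances,
        "missing_values": sum(value is None for row in rows for value in row),
        "observations_with_missing": sum(any(v is None for v in row) for row in rows),
        "majority_class_size": majority_size,
        "minority_class_size": min(class_counts.values()),
        "class_entropy": class_entropy,
    }


# Toy dataset: 4 observations, 2 attributes, one missing value
toy_rows = [[1.0, None], [2.0, 3.0], [0.5, 1.0], [4.0, 2.0]]
toy_labels = ["a", "a", "a", "b"]
features = simple_meta_features(toy_rows, toy_labels)
```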


Page 30: OpenML.org: Networked Science and IoT Data Streams by Jan van Rijn, University of Freiburg

Stream Landmarkers

. . . c . . .

w

l1 ✓ ✓ ✗ ✓ ✗ ✓ ✓ ✓ ✓ ✗ 0.7

l2 ✓ ✓ ✗ ✓ ✓ ✓ ✓ ✗ ✗ ✓ 0.7

l3 ✓ ✓ ✗ ✓ ✓ ✓ ✓ ✓ ✗ ✓ 0.8
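The last column of the table is the fraction of correct predictions each landmarker made inside the window w. The sketch below recomputes those values with a fixed-size sliding window; it is a minimal illustration, and the class name `WindowLandmarker` is made up for the example. Each row of the table is encoded as a string in which `1` stands for ✓ and `0` for ✗.

```python
from collections import deque


class WindowLandmarker:
    """Tracks the accuracy of one classifier over the last `window_size` predictions."""

    def __init__(self, window_size):
        # A bounded deque silently drops the oldest outcome once it is full
        self.outcomes = deque(maxlen=window_size)

    def update(self, correct):
        self.outcomes.append(bool(correct))

    def accuracy(self):
        if not self.outcomes:
            return 0.0
        return sum(self.outcomes) / len(self.outcomes)


# Rows l1, l2, l3 from the slide, with 1 for a correct and 0 for a wrong prediction
table = {
    "l1": "1101011110",
    "l2": "1101111001",
    "l3": "1101111101",
}

window_accuracy = {}
for name, outcomes in table.items():
    landmarker = WindowLandmarker(window_size=10)
    for flag in outcomes:
        landmarker.update(flag == "1")
    window_accuracy[name] = landmarker.accuracy()
```

With a window of 10 this reproduces the 0.7, 0.7 and 0.8 shown on the slide; a smaller window reacts faster to change but gives a noisier estimate.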


Page 31: OpenML.org: Networked Science and IoT Data Streams by Jan van Rijn, University of Freiburg

Stream Landmarkers

\[
P(l', c, \alpha, L) =
\begin{cases}
1 & \text{iff } c = 0 \\
P(l', c - 1, \alpha, L) \cdot \alpha + \bigl(1 - L(l'(PS_c), l(PS_c))\bigr) \cdot (1 - \alpha) & \text{otherwise}
\end{cases}
\tag{1}
\]
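Equation (1) is a recursive, exponentially weighted performance estimate: it starts at 1, and each new observation pulls it towards 1 minus the loss on that observation, with α controlling how quickly old observations fade. The sketch below unrolls the recursion as a loop. It assumes zero-one loss for L, which the slide does not fix, and the function name is made up for the example.

```python
def fading_factor_estimate(correct_flags, alpha):
    """Iterative form of Equation (1) under zero-one loss.

    correct_flags holds one boolean per observation (True if the classifier
    predicted the true label); alpha in [0, 1] is the fading factor.
    """
    estimate = 1.0  # base case: P = 1 at c = 0
    for is_correct in correct_flags:
        loss = 0.0 if is_correct else 1.0
        estimate = estimate * alpha + (1.0 - loss) * (1.0 - alpha)
    return estimate


# One mistake followed by one correct prediction, with alpha = 0.5:
# after the mistake: 1.0 * 0.5 + 0.0 * 0.5 = 0.5
# after the correct prediction: 0.5 * 0.5 + 1.0 * 0.5 = 0.75
example = fading_factor_estimate([False, True], alpha=0.5)
```

Unlike a fixed window, this estimate needs only one number of state per classifier, at the cost of never fully forgetting old observations.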


Page 33: OpenML.org: Networked Science and IoT Data Streams by Jan van Rijn, University of Freiburg

Classifier Output Difference

25 online classifiers (data streams)

[Figure: the 25 classifiers arranged by classifier output difference, on a scale from 0.0 to 0.6. Labels, in the order extracted from the slide: NoChange, SGD HINGELOSS, SGD LOGLOSS, SPegasos HINGELOSS, SPegasos LOGLOSS, MajorityClass, Perceptron, AWE(OneRule), DecisionStump, AWE(DecisionStump), RuleClassifier, 1-NN, k-NN with PAW, k-NN, Random Hoeffding Tree, Hoeffding Adaptive Tree, Hoeffding Option Tree, AS Hoeffding Tree, Hoeffding Tree, AWE(JRip), AWE(REPTree), AWE(J48), NaiveBayes, AWE(SMO), AWE(Logistic).]
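Classifier output difference is commonly defined as the fraction of observations on which two classifiers give different predictions, regardless of which of them is right. The sketch below implements that definition for two prediction lists; it is an illustration of the measure, not the code used to produce the figure, and the function name and toy predictions are made up.

```python
def classifier_output_difference(predictions_a, predictions_b):
    """Fraction of observations on which two classifiers disagree."""
    if len(predictions_a) != len(predictions_b):
        raise ValueError("prediction lists must have the same length")
    if not predictions_a:
        raise ValueError("prediction lists must not be empty")
    disagreements = sum(a != b for a, b in zip(predictions_a, predictions_b))
    return disagreements / len(predictions_a)


# Toy predictions on four observations: the classifiers disagree on the last two
cod = classifier_output_difference([0, 1, 1, 0], [0, 1, 0, 1])
```

A low value means two classifiers make largely the same predictions, so adding both to an ensemble contributes little diversity; a high value suggests they complement each other.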


Page 34: OpenML.org: Networked Science and IoT Data Streams by Jan van Rijn, University of Freiburg

Results

[Figure: predictive accuracy (0.25 to 1.00) for Majority Vote Ensemble, AWE(J48), Best Single Classifier, Online Bagging, Meta-learning Ensemble, BLAST (Window), BLAST (FF) and Leveraging Bagging, together with a critical difference (CD) diagram over average ranks 1 to 8. CD diagram labels, in the order extracted from the slide: Leveraging Bagging, BLAST (FF), Online Bagging, BLAST (Window), Meta-learning Ensemble, Best Single Classifier, AWE(J48), Majority Vote Ensemble.]


Page 36: OpenML.org: Networked Science and IoT Data Streams by Jan van Rijn, University of Freiburg

Results

[Figure: run CPU time (axis ticks 1 to 10000) for Best Single Classifier, AWE(J48), Majority Vote Ensemble, BLAST (Window), BLAST (FF), Online Bagging and Leveraging Bagging, together with a critical difference (CD) diagram over average ranks 1 to 7. CD diagram labels, in the order extracted from the slide: Best Single Classifier, AWE(J48), Majority Vote Ensemble, BLAST (Window), BLAST (FF), Online Bagging, Leveraging Bagging.]


Page 38: OpenML.org: Networked Science and IoT Data Streams by Jan van Rijn, University of Freiburg

Conclusions

Two techniques

Online Performance Estimation

Ensemble of heterogeneous classifiers

Individual performances are average

Combination (BLAST) boosts performance considerably

Parameters to optimize:

Ensemble composition

Window size

Voting policy


Page 39: OpenML.org: Networked Science and IoT Data Streams by Jan van Rijn, University of Freiburg

Thank you for your attention
