openml.org: networked science and iot data streams by jan van rijn, university of freiburg

OpenML.org: Networked Science and IoT Data Streams

Jan N. van Rijn

University of Freiburg

November 24, 2016

Motivation

Galileo Galilei (1564–1642)

Built the best telescopes of his time

Observed Saturn's rings (without recognizing them as rings)

Sent anagrams of his discoveries instead of publishing the results

Jan N. van Rijn, OpenML.org: Networked Science and IoT Data Streams, November 24, 2016

OpenML.org


Datasets

Data (ARFF) uploaded or referenced, versioned

Analysed, characterized, organized online

Indexed based on name, meta-features, tags, etc.

Support for other data formats (on request)
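The Datasets slide names ARFF as the upload format. To make that format concrete, here is a minimal sketch of reading a dense ARFF file in plain Python. It is not OpenML's parser: it ignores sparse rows, string/date attributes and quoted names, and the helper names `parse_arff` and `parse_value` are hypothetical.

```python
import csv
import io

NUMERIC_TYPES = ("numeric", "real", "integer")


def parse_value(cell, kind):
    """Convert one raw ARFF cell; '?' marks a missing value."""
    cell = cell.strip()
    if cell == "?":
        return None
    return float(cell) if kind in NUMERIC_TYPES else cell


def parse_arff(text):
    """Return (relation_name, attributes, rows) parsed from dense ARFF text."""
    relation, attributes, rows = None, [], []
    in_data = False
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("%"):
            continue  # skip blank lines and comments
        lower = line.lower()
        if in_data:
            cells = next(csv.reader(io.StringIO(line)))
            rows.append([parse_value(c, kind) for c, (_, kind) in zip(cells, attributes)])
        elif lower.startswith("@relation"):
            relation = line.split(None, 1)[1]
        elif lower.startswith("@attribute"):
            _, name, kind = line.split(None, 2)
            if kind.startswith("{"):
                # nominal attribute: keep the list of allowed values
                values = [v.strip() for v in kind.strip("{}").split(",")]
                attributes.append((name, values))
            else:
                attributes.append((name, kind.lower()))
        elif lower.startswith("@data"):
            in_data = True
    return relation, attributes, rows


example = """% toy weather data
@relation weather
@attribute temperature numeric
@attribute play {yes,no}
@data
21.5,yes
?,no
"""
relation, attributes, rows = parse_arff(example)
# relation == "weather"
# attributes == [("temperature", "numeric"), ("play", ["yes", "no"])]
# rows == [[21.5, "yes"], [None, "no"]]
```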


Tasks

Data alone does not define an experiment

Tasks contain: data, target attribute, goals, procedures

Readable by tools, automates experimentation

Real-time ‘leaderboard’ and overview
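A task adds what the data alone leaves open. The sketch below models that bundle as a small dataclass: the hypothetical `Task` class stores the target, the goal measure and a k-fold estimation procedure, and derives deterministic stratified folds from it. OpenML itself ships fixed, precomputed splits with each task; the round-robin split here is only an illustrative stand-in.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Task:
    """Hypothetical in-memory task: data reference, target, goal and procedure."""
    dataset_name: str
    target: str
    labels: list                          # target value of every row
    n_folds: int = 10                     # estimation procedure: k-fold CV
    measure: str = "predictive_accuracy"  # goal: the measure to optimize

    def fold_assignment(self):
        """Deterministic stratified folds: each class is dealt round-robin."""
        seen = defaultdict(int)
        folds = []
        for label in self.labels:
            folds.append(seen[label] % self.n_folds)
            seen[label] += 1
        return folds

    def splits(self):
        """Yield (train_rows, test_rows) index lists, one pair per fold."""
        folds = self.fold_assignment()
        for k in range(self.n_folds):
            test = [row for row, fold in enumerate(folds) if fold == k]
            train = [row for row, fold in enumerate(folds) if fold != k]
            yield train, test


task = Task("toy", "class", labels=["a"] * 6 + ["b"] * 4, n_folds=2)
splits = list(task.splits())
# two folds, each testing on 3 rows of class 'a' and 2 rows of class 'b'
```

Because the splits are part of the task rather than chosen by each tool, every flow run on the same task is evaluated on the same folds, which is what makes a shared leaderboard meaningful.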


Flows (algorithms)

Run locally, auto-registered by tools

Integrations + APIs (REST, Java, R, Python, . . . )


Flows (algorithms)



```python
from sklearn import tree
from openml import tasks, runs

# Publishing requires an API key, e.g. openml.config.apikey = "..."
task = tasks.get_task(59)                 # download task 59: data, target, splits
clf = tree.DecisionTreeClassifier()
run = runs.run_model_on_task(clf, task)   # train and predict locally on every fold
run.publish()                             # upload the predictions to OpenML
print(run.run_id)                         # id assigned by the server
```






Runs

Flow uploads predictions

Predictions are evaluated on OpenML

Reproducible, linked to data, flows and researcher

Contains:

predictions

parameter settings

model information

evaluation measures
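Because the flow only uploads predictions, the evaluation measures can be computed on the server side. A minimal sketch of that step for predictive accuracy, assuming predictions arrive as hypothetical `(fold, predicted, true)` tuples and that per-fold accuracies are averaged (one plausible aggregation, not necessarily OpenML's exact procedure):

```python
from statistics import mean


def evaluate_run(predictions):
    """predictions: iterable of (fold, predicted_label, true_label) tuples.
    Returns the accuracy per fold and its mean over folds."""
    correct, total = {}, {}
    for fold, predicted, truth in predictions:
        total[fold] = total.get(fold, 0) + 1
        correct[fold] = correct.get(fold, 0) + (predicted == truth)
    per_fold = {fold: correct[fold] / total[fold] for fold in sorted(total)}
    return per_fold, mean(per_fold.values())


uploaded = [(0, "yes", "yes"), (0, "no", "yes"), (1, "no", "no"), (1, "yes", "yes")]
per_fold, overall = evaluate_run(uploaded)
# per_fold == {0: 0.5, 1: 1.0}; overall == 0.75
```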


Analysis

Answer basic questions about the performance of algorithms, to study:

the effect / behaviour of parameters on a given algorithm

the effect of feature selection on a given algorithm

how algorithms behave with respect to each other

which algorithms perform well on a wide range of datasets
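Once many runs are stored, a question such as the effect of a parameter reduces to grouping and aggregating run results. A sketch over hypothetical run records (the field names `flow`, `dataset`, `value` and `accuracy` are assumptions, not OpenML's schema):

```python
from collections import defaultdict


def parameter_effect(run_records):
    """Group runs by (flow, dataset); report the best parameter value and the
    spread in accuracy across all tried values."""
    grouped = defaultdict(list)
    for record in run_records:
        key = (record["flow"], record["dataset"])
        grouped[key].append((record["value"], record["accuracy"]))
    summary = {}
    for key, results in grouped.items():
        best_value, best_accuracy = max(results, key=lambda pair: pair[1])
        worst_accuracy = min(accuracy for _, accuracy in results)
        summary[key] = {
            "best_value": best_value,
            "best_accuracy": best_accuracy,
            "spread": best_accuracy - worst_accuracy,
        }
    return summary


run_records = [  # hypothetical results for one parameter of one flow on one dataset
    {"flow": "SVM", "dataset": "d1", "value": 0.01, "accuracy": 0.5},
    {"flow": "SVM", "dataset": "d1", "value": 0.1, "accuracy": 0.75},
    {"flow": "SVM", "dataset": "d1", "value": 1.0, "accuracy": 0.625},
]
summary = parameter_effect(run_records)
# summary[("SVM", "d1")] == {"best_value": 0.1, "best_accuracy": 0.75, "spread": 0.25}
```

A large spread marks a parameter worth tuning on that dataset; a small one suggests the default is already adequate.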


Effect of a Parameter

[Figure: Predictive Accuracy (%) (93 to 99) and Optimal value (21 to 28) against Number Of Features (4 to 16384) for RBFKernel(1), J48(2), IBk(1), Logistic(1), RandomForest(1) and REPTree(1)]


Effect of Feature Selection

[Figure: one panel per classifier, k-NN (k = 1), Naive Bayes, Decision Tree (C4.5) and SVM (RBF Kernel); each dataset is plotted by Number Of Instances (256 to 65536) against Number Of Features (1 to 16384) and marked Better / Equal / Worse according to the effect of feature selection]
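Each point in these panels summarizes one dataset by whether feature selection helped. A sketch of that labelling, assuming accuracies with and without feature selection are already available; the 0.005 tolerance that separates "Equal" from "Better" or "Worse" is an arbitrary assumption, since the slides do not state one:

```python
def compare_feature_selection(results, tolerance=0.005):
    """results: {dataset: (accuracy_without_fs, accuracy_with_fs)}.
    Labels each dataset by the effect of feature selection; the tolerance
    that counts as 'Equal' is an arbitrary assumption."""
    labels = {}
    for dataset, (without_fs, with_fs) in results.items():
        difference = with_fs - without_fs
        if abs(difference) <= tolerance:
            labels[dataset] = "Equal"
        elif difference > 0:
            labels[dataset] = "Better"
        else:
            labels[dataset] = "Worse"
    return labels


outcome = compare_feature_selection({"a": (0.80, 0.85), "b": (0.90, 0.901), "c": (0.70, 0.60)})
# outcome == {"a": "Better", "b": "Equal", "c": "Worse"}
```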


Performance of Algorithms

[Figure: Accuracy (0 to 1) and Area under the ROC curve (0.4 to 1) per classifier across datasets for JRip, LMT, HoeffdingTree, RandomTree, RandomForest, NaiveBayes, SMO (PolyKernel), MultilayerPerceptron, LogitBoost(DecisionStump), DecisionTable, SMO (RBFKernel), Logistic, HyperPipes, FURIA, BayesNet, AdaBoostM1(NaiveBayes), OLM, SimpleCart, ConjunctiveRule, AdaBoostM1(DecisionStump), LADTree, OneR, Bagging(REPTree), J48, AdaBoostM1(J48) and two IBk variants]


Performance of Algorithms

105 datasets, 30 classifiers

Friedman test with Nemenyi post-hoc test (α = 0.05)

[Figure: critical-difference (CD) diagram over average ranks 1 to 30 showing Logistic Model Tree, Random Forest, Bagging(REP Tree), AdaBoost(J48), FURIA, SMO(Poly Kernel), Simple Cart, LogitBoost(Decision Stump), Multilayer Perceptron (20), J48, Logistic, JRip, Multilayer Perceptron (10), REP Tree, k-NN (k=10), LAD Tree, Multilayer Perc. (10, 10), k-NN (k=1), Decision Table, Hoeffding Tree, SMO(RBF Kernel), Bayesian Network, AdaBoost(NaiveBayes), NaiveBayes, AdaBoost(DecisionStump), Random Tree, OneR, Conjunctive Rule, Hyper Pipes and OLM]
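The critical-difference diagram rests on two computations: the average rank of each classifier over all datasets, and the Nemenyi critical difference CD = q_α · sqrt(k(k+1)/(6N)) for k classifiers on N datasets. A sketch of both; the Friedman test that should precede the post-hoc test is omitted, and `q_alpha` must be looked up in a studentized-range table (2.343 is the usual value for k = 3 at α = 0.05). The scores in the example are hypothetical.

```python
import math


def average_ranks(scores):
    """scores: {dataset: {algorithm: score}}, higher is better.
    Returns {algorithm: average rank}; rank 1 is best, ties share the mean rank."""
    totals = {}
    for per_algorithm in scores.values():
        ordered = sorted(per_algorithm.items(), key=lambda item: -item[1])
        i = 0
        while i < len(ordered):
            j = i
            while j + 1 < len(ordered) and ordered[j + 1][1] == ordered[i][1]:
                j += 1
            tied_rank = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
            for algorithm, _ in ordered[i:j + 1]:
                totals[algorithm] = totals.get(algorithm, 0.0) + tied_rank
            i = j + 1
    return {algorithm: total / len(scores) for algorithm, total in totals.items()}


def nemenyi_cd(k, n, q_alpha):
    """Nemenyi critical difference for k algorithms compared on n datasets."""
    return q_alpha * math.sqrt(k * (k + 1) / (6 * n))


scores = {"d1": {"A": 0.9, "B": 0.8, "C": 0.8}, "d2": {"A": 0.7, "B": 0.9, "C": 0.6}}
ranks = average_ranks(scores)        # {"A": 1.5, "B": 1.75, "C": 2.75}
cd = nemenyi_cd(k=3, n=2, q_alpha=2.343)
```

Two classifiers differ significantly when their average ranks differ by more than CD. With 30 classifiers on 105 datasets, the appropriate `q_alpha` has to be taken from a table rather than from this toy example.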


Data Streams

Online learning

Many IoT applications fit this paradigm

Example: predict the electricity price for the next day

Feedback on whether the prediction was correct

Model can become obsolete (concept drift)

[Figure: accuracy per interval (intervals 0 to 40, accuracy 0.5 to 0.95) for Hoeffding Tree, Naive Bayes, SPegasos and k-NN]
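Online learning is usually evaluated test-then-train (prequentially): each observation is first used to test the current model, and only then to update it. A sketch with two of the baselines named in this deck, NoChange and MajorityClass, reporting accuracy per interval. The interval of 4 in the example stands in for the 1,000-observation intervals used in the experiments, and the `predict`/`learn` interface is hypothetical.

```python
from collections import Counter


class NoChange:
    """Predicts the label of the most recent observation."""
    def __init__(self):
        self.last = None

    def predict(self, x):
        return self.last

    def learn(self, x, y):
        self.last = y


class MajorityClass:
    """Predicts the most frequent label seen so far."""
    def __init__(self):
        self.counts = Counter()

    def predict(self, x):
        return self.counts.most_common(1)[0][0] if self.counts else None

    def learn(self, x, y):
        self.counts[y] += 1


def prequential_accuracy(stream, model, interval=1000):
    """Test-then-train: predict each observation before learning from it and
    report accuracy per full interval (a trailing partial interval is dropped)."""
    accuracies, correct, seen = [], 0, 0
    for x, y in stream:
        correct += model.predict(x) == y
        model.learn(x, y)
        seen += 1
        if seen == interval:
            accuracies.append(correct / seen)
            correct, seen = 0, 0
    return accuracies


labels = ["up", "up", "down", "down", "down", "up", "up", "up"]
stream = [(None, y) for y in labels]  # no features needed for these baselines
no_change = prequential_accuracy(stream, NoChange(), interval=4)       # [0.5, 0.75]
majority = prequential_accuracy(stream, MajorityClass(), interval=4)   # [0.25, 0.5]
```

NoChange is a strong baseline on streams like electricity, where consecutive labels are highly correlated; that is why it appears among the compared classifiers.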


Performance of Data Streams Algorithms

[Figure: Predictive Accuracy (0.00 to 1.00) across data streams for 25 classifiers: NoChange, MajorityClass, SPegasos logloss, SPegasos hingeloss, SGD logloss, SGD hingeloss, DecisionStump, Perceptron, AWE(OneR), AWE(DecisionStump), RuleClassifier, RandomHoeffdingTree, NaiveBayes, kNN k = 1, AWE(REPTree), kNN k = 10, AWE(SMO(PolyKernel)), AWE(Logistic), kNNwithPAW k = 10, AWE(J48), AWE(JRip), HoeffdingTree, ASHoeffdingTree, HoeffdingOptionTree and HoeffdingAdaptiveTree]


Performance of Data Streams Algorithms

[Figure: critical-difference (CD) diagram over average ranks 1 to 25 showing HoeffdingOptionTree, HoeffdingAdaptiveTree, HoeffdingTree, ASHoeffdingTree, AWE(J48), AWE(JRip), AWE(SMO(PolyKernel)), AWE(Logistic), kNNwithPAW k = 10, AWE(REPTree), kNN k = 10, kNN k = 1, NaiveBayes, RandomHoeffdingTree, Perceptron, RuleClassifier, AWE(DecisionStump), AWE(OneR), SPegasos logloss, DecisionStump, SPegasos hingeloss, SGD hingeloss, SGD logloss, MajorityClass and NoChange]


Goal

Can we build a classifier that does better?

How can we use the experimental results in OpenML for this?

Probably! By combining them in a smart way (ensembles)

Approach: work on intervals of 1,000 observations

Task: try to predict for the next interval which classifier to use
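One way to cast "predict for the next interval which classifier to use" as a learning problem is to pair the meta-features of each interval with the classifier that turned out best on the following interval. A sketch under that assumption, together with the simplest baseline (reuse the classifier that was best on the previous interval); all names and numbers are hypothetical.

```python
def build_meta_dataset(interval_features, interval_accuracy):
    """Pair the meta-features of interval i with the classifier that turned
    out best on interval i + 1 (the label a meta-learner would predict)."""
    examples = []
    for i in range(len(interval_features) - 1):
        upcoming = interval_accuracy[i + 1]
        examples.append((interval_features[i], max(upcoming, key=upcoming.get)))
    return examples


def previous_best_accuracy(interval_accuracy):
    """Baseline policy: use the classifier that was best on the previous
    interval; returns the mean accuracy that choice achieves."""
    achieved = []
    for previous, current in zip(interval_accuracy, interval_accuracy[1:]):
        achieved.append(current[max(previous, key=previous.get)])
    return sum(achieved) / len(achieved)


# hypothetical per-interval accuracies of two stream classifiers
accuracy = [
    {"HoeffdingTree": 0.80, "NaiveBayes": 0.70},
    {"HoeffdingTree": 0.60, "NaiveBayes": 0.90},
    {"HoeffdingTree": 0.85, "NaiveBayes": 0.75},
]
features = [{"interval": 0}, {"interval": 1}, {"interval": 2}]  # placeholder meta-features
meta = build_meta_dataset(features, accuracy)   # labels: NaiveBayes, HoeffdingTree
baseline = previous_best_accuracy(accuracy)     # mean of 0.60 and 0.75
```

Any classifier trained on `meta` is useful only if it beats `baseline`, since the previous-best rule needs no meta-features at all.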


The OpenML approach

Many data streams (and tasks) from various sources

Real world: electricity, forest covertype, airlines

Synthetic: Bayesian Network Generator, Moving Hyperplanes, LED

Meta-features per data stream

Direct access to all MOA classifiers

Experimental results:

Models

Predictions

Measured performance


Meta-Features

Category: Meta-features

Simple: # Instances, # Attributes, # Classes, Dimensionality, Default Accuracy, # Observations with Missing Values, # Missing Values, % Observations with Missing Values, % Missing Values, # Numeric Attributes, # Nominal Attributes, # Binary Attributes, % Numeric Attributes, % Nominal Attributes, % Binary Attributes, Majority Class Size, % Majority Class, Minority Class Size, % Minority Class

Statistical: Mean of Means of Numeric Attributes, Mean Standard Deviation of Numeric Attributes, Mean Kurtosis of Numeric Attributes, Mean Skewness of Numeric Attributes

Information theoretic: Class Entropy, Mean Attribute Entropy, Mean Mutual Information, Equivalent Number of Attributes, Noise to Signal Ratio

Landmarkers: Accuracy, Kappa and Area under the ROC Curve of the following classifiers: Decision Stump, J48 (confidence factor: 0.01), k-NN, Naive Bayes, REP Tree (maximum depth: 3)

Drift detection: Changes by Adwin (Hoeffding Tree), Warnings by Adwin (Hoeffding Tree), Changes by DDM (Hoeffding Tree), Warnings by DDM (Hoeffding Tree), Changes by Adwin (Naive Bayes), Warnings by Adwin (Naive Bayes), Changes by DDM (Naive Bayes), Warnings by DDM (Naive Bayes)

Stream landmarkers: Accuracy Naive Bayes on previous window, Accuracy k-NN on previous window, . . .
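Several of the "Simple" and "Information theoretic" meta-features are cheap to compute per window. A sketch for a handful of them, assuming missing values are stored as `None` and that Dimensionality means attributes divided by instances; the key names and the toy window are hypothetical.

```python
import math
from collections import Counter


def window_meta_features(rows, labels):
    """Compute a few simple and information-theoretic meta-features for one
    window; rows hold attribute values (None = missing), labels the classes."""
    n, d = len(rows), len(rows[0])
    counts = Counter(labels)
    missing = sum(value is None for row in rows for value in row)
    return {
        "instances": n,
        "attributes": d,
        "classes": len(counts),
        "dimensionality": d / n,
        "default_accuracy": max(counts.values()) / n,  # majority-class rate
        "pct_missing_values": 100 * missing / (n * d),
        "class_entropy": -sum(c / n * math.log2(c / n) for c in counts.values()),
    }


window = [[1.0, "a"], [2.0, None], [3.0, "b"], [4.0, "a"]]
labels = ["x", "x", "y", "x"]
features = window_meta_features(window, labels)
# default_accuracy 0.75, pct_missing_values 12.5, class_entropy ≈ 0.811 bits
```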


Stream Landmarkers

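Landmarker | Predictions on recent observations (✓ correct, ✗ incorrect; c indexes the stream position) | Accuracy over window w
l1 | ✓ ✓ ✗ ✓ ✗ ✓ ✓ ✓ ✓ ✗ | 0.7
l2 | ✓ ✓ ✗ ✓ ✓ ✓ ✓ ✗ ✗ ✓ | 0.7
l3 | ✓ ✓ ✗ ✓ ✓ ✓ ✓ ✓ ✗ ✓ | 0.8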
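The windowed landmark estimate is simply the fraction of ✓ marks in each row. A sketch that tracks it incrementally with a bounded deque and then picks the landmarker with the highest window accuracy; the `WindowAccuracy` class is hypothetical. Fed the three rows l1 to l3, it reproduces the accuracies 0.7, 0.7 and 0.8.

```python
from collections import deque


class WindowAccuracy:
    """Accuracy of one classifier over its last `w` predictions."""
    def __init__(self, w):
        self.window = deque(maxlen=w)  # the oldest outcome drops out automatically

    def update(self, correct):
        self.window.append(bool(correct))

    def value(self):
        return sum(self.window) / len(self.window) if self.window else 0.0


rows = {
    "l1": "✓✓✗✓✗✓✓✓✓✗",
    "l2": "✓✓✗✓✓✓✓✗✗✓",
    "l3": "✓✓✗✓✓✓✓✓✗✓",
}
estimates = {}
for name, marks in rows.items():
    tracker = WindowAccuracy(w=10)
    for mark in marks:
        tracker.update(mark == "✓")
    estimates[name] = tracker.value()
best = max(estimates, key=estimates.get)
# estimates == {"l1": 0.7, "l2": 0.7, "l3": 0.8}; best == "l3"
```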


Stream Landmarkers

$$
P(l', c, \alpha, L) =
\begin{cases}
1 & \text{if } c = 0 \\
P(l', c - 1, \alpha, L) \cdot \alpha + \bigl(1 - L(l'(PS_c), l(PS_c))\bigr) \cdot (1 - \alpha) & \text{otherwise}
\end{cases}
\tag{1}
$$
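Reading l′ as a stream classifier, PS_c as the c-th observation, l(PS_c) as its true label, L as a loss function and α as a fading factor, Equation (1) is an exponentially fading accuracy estimate. A sketch with 0/1 loss, written as a loop rather than a recursion so long streams do not hit Python's recursion limit, plus a selection step that keeps the classifier with the highest estimate; the default `alpha=0.999` and the function names are assumptions.

```python
def fading_performance(outcomes, alpha):
    """Equation (1) with 0/1 loss: P starts at 1 (c = 0) and each observation
    updates P <- P * alpha + (1 - loss) * (1 - alpha)."""
    p = 1.0
    for correct in outcomes:
        loss = 0.0 if correct else 1.0
        p = p * alpha + (1.0 - loss) * (1.0 - alpha)
    return p


def select_by_fading_factor(outcomes_per_classifier, alpha=0.999):
    """Return the classifier with the highest estimate, plus all estimates."""
    scores = {name: fading_performance(outcomes, alpha)
              for name, outcomes in outcomes_per_classifier.items()}
    return max(scores, key=scores.get), scores


best, scores = select_by_fading_factor({"a": [True, False], "b": [False, True]}, alpha=0.5)
# scores == {"a": 0.5, "b": 0.75}: the more recent success weighs more, so best == "b"
```

Compared with a fixed window, the fading factor needs only one number per classifier and never forgets abruptly; α close to 1 gives a long memory, smaller α reacts faster to concept drift.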


Classifier Output Difference

25 online classifiers (data streams)

[Figure: Classifier Output Difference (scale 0.0 to 0.6) among NoChange, SGD hingeloss, SGD logloss, SPegasos hingeloss, SPegasos logloss, MajorityClass, Perceptron, AWE(OneRule), DecisionStump, AWE(DecisionStump), RuleClassifier, 1-NN, k-NN with PAW, k-NN, RandomHoeffdingTree, HoeffdingAdaptiveTree, HoeffdingOptionTree, ASHoeffdingTree, HoeffdingTree, AWE(JRip), AWE(REPTree), AWE(J48), NaiveBayes, AWE(SMO) and AWE(Logistic)]
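Classifier Output Difference is commonly defined as the fraction of observations on which two classifiers predict different labels, regardless of which one is right. A sketch of the pairwise computation over a shared stream; the prediction lists are hypothetical.

```python
from itertools import combinations


def output_difference(pred_a, pred_b):
    """Fraction of observations on which two classifiers disagree."""
    if len(pred_a) != len(pred_b):
        raise ValueError("both classifiers must predict the same observations")
    return sum(a != b for a, b in zip(pred_a, pred_b)) / len(pred_a)


def pairwise_cod(predictions):
    """predictions: {classifier: list of predicted labels on the same stream}."""
    return {(a, b): output_difference(predictions[a], predictions[b])
            for a, b in combinations(sorted(predictions), 2)}


cod = pairwise_cod({
    "NaiveBayes": [0, 1, 1, 0],
    "HoeffdingTree": [0, 1, 0, 0],
    "NoChange": [1, 1, 0, 0],
})
# {("HoeffdingTree", "NaiveBayes"): 0.25, ("HoeffdingTree", "NoChange"): 0.25,
#  ("NaiveBayes", "NoChange"): 0.5}
```

Pairs with a high COD make mistakes on different observations, which is the situation in which a heterogeneous ensemble has the most room to help.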


Results

[Figure: Predictive Accuracy (0.25 to 1.00) per data stream for Majority Vote Ensemble, AWE(J48), Best Single Classifier, Online Bagging, Meta-learning Ensemble, BLAST (Window), BLAST (FF) and Leveraging Bagging]

[Figure: critical-difference (CD) diagram over average ranks 1 to 8 showing Leveraging Bagging, BLAST (FF), Online Bagging, BLAST (Window), Meta-learning Ensemble, Best Single Classifier, AWE(J48) and Majority Vote Ensemble]


Results

[Figure: Run CPU Time (log scale, 1 to 10000) per data stream for Best Single Classifier, AWE(J48), Majority Vote Ensemble, BLAST (Window), BLAST (FF), Online Bagging and Leveraging Bagging]

[Figure: critical-difference (CD) diagram over average ranks 1 to 7 for the same seven methods]


Conclusions

Two techniques:

Online performance estimation

Ensemble of heterogeneous classifiers

Individual classifiers achieve only average performance

Their combination (BLAST) boosts performance considerably

Parameters to optimize:

Ensemble composition

Window size

Voting policy


Thank you for your attention

