Applicability and Parameterization of Machine Learning Approaches for Time Series Modeling
Bachelor thesis – Onur Ekici
© Prof. Dr.-Ing. Wolfgang Lehner
Monday, 12.05.2014


Page 1

© Prof. Dr.-Ing. Wolfgang Lehner |

Bachelor thesis – Onur Ekici

Applicability and Parameterization of Machine Learning Approaches for Time Series Modeling

Monday, 12.05.2014

Page 2

What is Machine Learning?

[Diagram: past data, i.e. features with known response values (e.g. red apple, green apple), is used to learn a model; the model then predicts the unknown response value for new features in the future.]

Machine learning is about predicting the future based on the past.

Hal Daumé III

Page 3

Introduction

• Decision-making and investment planning
• Statistical models vs. machine learning
• Approaches used here: random forests and the linear model

Page 4

Outline

TIME SERIES MODELING

MACHINE LEARNING

BUILD MODEL

ENSEMBLE MODELS

Bias Correction
Ensemble Estimations

CONCLUSION

Page 5

Outline

TIME SERIES MODELING

MACHINE LEARNING

BUILD MODEL

ENSEMBLE MODELS

Bias Correction
Ensemble Estimations

CONCLUSION

Page 6

Time Series Modeling

[Diagram: sales histories of Item 1, Item 2, and Item 3 over the 1st, 2nd, and 3rd year.]

• Generally determined by using statistical models, e.g. exponential smoothing or an autoregressive model
• These models require a long and consistent history

Page 7

Cross Sectional Forecasting

Some series are sparse and too short for the statistical models.

[Diagram: Item 1–3 over the 1st–3rd year; large parts of the histories are not available. Cross-sectional forecasting pools the data across items instead of modeling each series on its own.]

Page 8

Outline

TIME SERIES MODELING

MACHINE LEARNING

BUILD MODEL

ENSEMBLE MODELS

Bias Correction
Ensemble Estimations

CONCLUSION

Page 9

Classification and Regression Tree (CART)

CART splits the data recursively into two groups to fit a model.

How is the data split? The split that maximally decreases the impurity of a node is the best split. In regression problems, the impurity of a node is the mean squared error:

MSE = (1/n) · Σᵢ (yᵢ − y′ᵢ)²

where y′ᵢ is the prediction of the node.

Train data:

Brand  Size  Price
A      10    6
A      20    8
A      30    19
B      40    27

[Model regression tree: root split Size > 25; if no, split on Size > 15 with leaf predictions 6 and 8; if yes, split on Brand A with leaf predictions 19 and 27.]

Page 10

A simple Example

The "price" of each instance should be predicted from the features "size" and "brand".

Before splitting: the regression tree has a single node, which contains all instances. The aim of CART is now to minimize the MSE.

Brand  Size  Price
A      10    6
A      20    8
A      30    19
B      40    27

Before the first split, the prediction is the mean price: (6 + 8 + 19 + 27) / 4 = 60 / 4 = 15.

MSE = ((6 − 15)² + (8 − 15)² + (19 − 15)² + (27 − 15)²) / 4 = (9² + 7² + 4² + 12²) / 4 = 290 / 4 = 72.5

Page 11

Brand  Size  Price
A      10    6
A      20    8
A      30    19
B      40    27

A simple Example

The "price" of each instance should be predicted from the features "size" and "brand".

Finding the best split: splitting on brand. One possible split: A vs. B.

Brand  Size  Price (Y)  Predicted price (Y′)  Squared error
A      10    6          11                    25
A      20    8          11                    9
A      30    19         11                    64
B      40    27         27                    0

MSE = 98 / 4 = 24.5

[Tree: split on Brand A; leaf predictions 11 (brand A) and 27 (brand B).]

Page 12

Brand  Size  Price
A      10    6
A      20    8
A      30    19
B      40    27

A simple Example

The "price" of each instance should be predicted from the features "size" and "brand".

Finding the best split: splitting on size. Three possible splits: ≥ 15 or not, ≥ 25 or not, ≥ 35 or not. The best of these is size ≥ 25:

Brand  Size  Price (Y)  Predicted price (Y′)  Squared error
A      10    6          7                     1
A      20    8          7                     1
A      30    19         23                    16
B      40    27         23                    16

MSE = 34 / 4 = 8.5

[Tree: split on Size ≥ 25; leaf predictions 7 (size < 25) and 23 (size ≥ 25). With an MSE of 8.5 versus 24.5, this is the better split.]

Page 13

A simple Example

[Final model regression tree: root split Size ≥ 25; the left branch splits on Size ≥ 15 with leaf predictions 6 and 8; the right branch splits on Brand A with leaf predictions 19 and 27.]

Train data:

Brand  Size  Price
A      10    6
A      20    8
A      30    19
B      40    27

The same steps are repeated again and again on each new group.

Page 14

Random Forest

CART is simple: computationally fast and easily interpretable.

BUT:
• It overfits the training data (when should it stop fitting?)
• Small changes in the data lead to big changes in the decision tree

Leo Breiman proposed the random forest as a solution, combining:
• Random subspace method
• Bootstrap aggregating

Page 15

Random Forest

CART builds just one tree; a random forest builds many different trees with the CART algorithm.

Random subspace method: the best split is searched not over all features but over a randomly selected subset of the features.

Page 16

Bootstrap Aggregating

Bootstrap aggregating (bagging) is a machine learning ensemble method for combining models.

[Diagram: the training data (Observation 1, 2, 3) is resampled randomly with replacement into bootstrap samples, e.g. {1, 3, 3}, {1, 1, 2}, {2, 2, 3}; a tree is built with CART on each sample; the tree predictions are averaged.]
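A minimal bagging sketch in R, reusing the toy price data from the CART example; rpart serves as the CART implementation, and the function name and control settings are illustrative choices:

```r
library(rpart)  # rpart implements the CART algorithm

train <- data.frame(brand = c("A", "A", "A", "B"),
                    size  = c(10, 20, 30, 40),
                    price = c(6, 8, 19, 27))

# Fit B trees, each on a bootstrap sample, and average their predictions
bagging_predict <- function(train, newdata, B = 100) {
  preds <- replicate(B, {
    boot <- train[sample(nrow(train), replace = TRUE), ]  # resample with replacement
    fit  <- rpart(price ~ brand + size, data = boot,
                  control = rpart.control(minsplit = 2))  # allow splits on tiny data
    predict(fit, newdata)
  })
  rowMeans(preds)  # average over the B trees (newdata has several rows)
}

bagging_predict(train, train)  # bagged predictions for the training rows
```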

Page 17

Out Of Bag Error

[Diagram: Observation 3 does not occur in the bootstrap sample of the 3rd tree ({1, 1, 2}), so it is out-of-bag (OOB) for that tree. Testing the 3rd tree on its OOB data gives the OOB error for the 3rd tree; averaging over all trees gives the OOB error of the forest.]
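In R's randomForest package this bookkeeping is built in. The following sketch uses the Boston housing data as a stand-in, since the thesis' market data is not public:

```r
library(randomForest)

# predict() on the fitted forest, without newdata, returns for each
# observation the average prediction of only those trees whose
# bootstrap sample did NOT contain it (the OOB prediction).
rf <- randomForest(medv ~ ., data = MASS::Boston, ntree = 500)
oob_pred <- predict(rf)                    # OOB predictions
mean((MASS::Boston$medv - oob_pred)^2)     # OOB MSE, cf. tail(rf$mse, 1)
```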

Page 18

Linear Model

A linear model describes the relationship between a number of independent variables (x1, x2, ...) and a dependent variable y.

Example: from 10 TV advertisements, the model predicts 60 cars sold.

Example taken from: http://www.coursehero.com/sitemap/schools/3404-Texas-AM-University-Corpus-Christi/courses/235762-ORMS3310/
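A minimal sketch in R in the spirit of this example; the data values are invented for illustration:

```r
# Hypothetical observations: TV ads vs. cars sold
ads  <- c(1, 3, 5, 7, 9)
sold <- c(14, 24, 34, 43, 56)

fit <- lm(sold ~ ads)               # fit y = b0 + b1 * x by least squares
predict(fit, data.frame(ads = 10))  # about 60 cars sold for 10 ads
```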

Page 19

Outline

TIME SERIES MODELING

MACHINE LEARNING

BUILD MODEL

ENSEMBLE MODELS

Bias Correction
Ensemble Estimations

CONCLUSION

Page 20

Representing Task and Data

Three years (36 months) of market data are available for this case study:
• Building the predictive model (first 13 months)
• Evaluation of the model (remaining 22 months)

The following features are available for each item:
• Sales units in the previous month
• Stock units in the previous month
• Purchase units in the previous month
• Properties 1 ... 6 of the item

The aim is to predict:
1. Sales number per month
2. Sales number for each brand per month
3. Sales number for each item per month

Page 21

Selecting Features for Random Forest

"Identifying relevant predictor variables, rather than only predicting the response by means of some black-box model, is of interest in many applications." (Carolin Strobl, 2008)

Two possible methods for random forests in regression problems:

1. Selection Frequency

2. Permutation Importance

Page 22

Selection Frequency

How often was each feature used for splitting in the individual trees?

The relevant features:
• Property 6
• Property 5
• Purchase units
• Stock new units
• Sales units
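In the R randomForest package, the selection frequency is available via varUsed(); a sketch on stand-in data (Boston housing, since the thesis data is not public):

```r
library(randomForest)

rf <- randomForest(medv ~ ., data = MASS::Boston, ntree = 1000)

# varUsed() counts how often each feature was chosen for a split
# across all trees of the forest: the selection frequency
freq <- varUsed(rf, count = TRUE)
names(freq) <- setdiff(names(MASS::Boston), "medv")
sort(freq, decreasing = TRUE)
```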

Page 23

Permutation Importance

The values of a feature are artificially noised (permuted) and the change in the OOB error is measured.

For each tree and each feature:
1. Calculate the OOB error of the tree
2. Permute the feature in the OOB data
3. Calculate the OOB error of the tree again
4. Calculate the difference between the first and second OOB error
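R's randomForest carries out essentially this procedure on each tree's OOB data when the forest is grown with importance = TRUE; a sketch on stand-in data:

```r
library(randomForest)

# importance = TRUE enables the permutation importance computation
rf <- randomForest(medv ~ ., data = MASS::Boston,
                   ntree = 1000, importance = TRUE)

importance(rf, type = 1)  # permutation importance (%IncMSE) per feature
varImpPlot(rf, type = 1)  # features ranked by permutation importance
```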

Page 24

Permutation Importance

The relevant features:
• Purchase units
• Stock new units
• Sales units

Page 25

Selecting Features for Random Forest

Now there are three possible scenarios:

1. Random forest with all features (black-box model)
2. Random forest with the relevant features from selection frequency
3. Random forest with the relevant features from permutation importance

Question 1: Which feature selection method should be used for random forests, and is feature selection necessary at all?

Page 26

Selection of The Relevant Features

[Three bar charts (y-axis: SAPE): total sales number (≈8.8–10.0), sales number for each brand (≈34–38), and sales number for each item (≈65–68), comparing Blackbox, Selection Frequency, and Permutation Importance.]

• The permutation importance is the more reliable method
• Choosing relevant features improves the accuracy of random forests

Page 27

Outline

TIME SERIES MODELING

MACHINE LEARNING

BUILD MODEL

ENSEMBLE MODELS

Bias Correction
Ensemble Estimations

CONCLUSION

Page 28

Random Forest and Bias

The bias is a systematic error in the model estimation.

Model: y = 5x + 10

TV Ads  Sold Cars
10      65
20      125
30      165

[The model systematically underestimates the observed values; the slide marks the systematic error with +5.]

Two methods are introduced to correct the bias:

1. Ensemble of random forest estimation and bias correction with a linear model
2. Ensemble of random forest estimation and bias correction with a random forest

Page 29

Bias Correction with linear model

Proposed by Zhang and Lu (2012)

Linear relationship between the real value and the estimated value:

new estimation = b₀ + b₁ · Y_randomForest

Steps:
1. The random forest trains
2. The random forest predicts on the OOB data
3. The linear model trains on the OOB data
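A sketch of this scheme in R on stand-in data; new_X is a placeholder for future feature rows:

```r
library(randomForest)

y  <- MASS::Boston$medv
rf <- randomForest(medv ~ ., data = MASS::Boston, ntree = 1000)

# OOB predictions stand in for predictions on unseen data
oob <- predict(rf)

# Fit the correction line: new estimation = b0 + b1 * Y_randomForest
bias_lm <- lm(y ~ oob)
coef(bias_lm)

# For new data (new_X is a placeholder):
# predict(bias_lm, data.frame(oob = predict(rf, new_X)))
```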

Page 30

Bias Correction with Random Forest

Proposed by Ruo Xu (2013)

Relationship between the features and the bias: a second random forest is used to predict the bias of the first random forest.

Steps:
1. The first random forest trains
2. The first random forest predicts on the OOB data
3. The bias of the first random forest is calculated on the OOB data
4. The second random forest trains on the OOB data (features → bias)
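A sketch of this scheme in R, again on stand-in data; new_X is a placeholder:

```r
library(randomForest)

X <- MASS::Boston[, setdiff(names(MASS::Boston), "medv")]
y <- MASS::Boston$medv

# 1st forest predicts the response; its OOB error defines the bias
rf1  <- randomForest(X, y, ntree = 1000)
bias <- predict(rf1) - y          # OOB estimation minus real value

# 2nd forest learns the relationship between the features and the bias
rf2 <- randomForest(X, bias, ntree = 1000)

# Corrected prediction for new data (new_X is a placeholder):
# predict(rf1, new_X) - predict(rf2, new_X)
```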

Page 31

Bias Correction Methods

1. Random forest

2. Ensemble of random forest estimation and bias correction with linear model

3. Ensemble of random forest estimation and bias correction with random forest

Question 2: How effective are the bias correction methods for time series?

Page 32

Bias Correction Methods

[Three bar charts (y-axis: SAPE): total sales number (≈8.5–10), sales number for each brand (≈35–36.5), and sales number for each item (≈64–70), comparing the plain RF, RF with linear-model correction, and RF with a second RF.]

• The bias correction methods have not yielded any significant improvement

Page 33

Ensemble of Estimations

Perlich showed that logistic regression and decision trees act as complements to each other.

• Here, the estimations of the two models are combined with equal weights

Question 3: Is it possible to improve accuracy with an ensemble of the linear model and the random forest?

Steps: the random forest trains and predicts; the linear model trains and predicts; the two predictions are averaged.
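A sketch of the equal-weighted ensemble in R on stand-in data; the split into training and evaluation rows is an arbitrary choice:

```r
library(randomForest)

train <- MASS::Boston[1:400, ]
test  <- MASS::Boston[401:506, ]

rf_fit <- randomForest(medv ~ ., data = train, ntree = 1000)
lm_fit <- lm(medv ~ ., data = train)

# Equal-weighted combination of the two estimations
ens <- (predict(rf_fit, test) + predict(lm_fit, test)) / 2

# Compare mean absolute errors of the three candidates
c(rf  = mean(abs(predict(rf_fit, test) - test$medv)),
  lm  = mean(abs(predict(lm_fit, test) - test$medv)),
  ens = mean(abs(ens - test$medv)))
```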

Page 34

Ensemble of Models

[Three bar charts (y-axis: SAPE): total sales number (≈8.6–9.4), sales number for each brand (≈33–36), and sales number for each item (≈62–68), comparing the random forest, the linear model, and the ensemble.]

• The ensemble has a better accuracy than the linear model and the random forest

Page 35

Outline

TIME SERIES MODELING

MACHINE LEARNING

BUILD MODEL

ENSEMBLE MODELS

Bias Correction
Ensemble Estimations

CONCLUSION

Page 36

Conclusion

Eight different models based on two machine learning approaches: random forest and linear model.

1. It is better to select relevant features than to use a black-box model
2. Permutation importance is the more reliable method to select features
3. The ensemble of RF and LM has better accuracy than either RF or LM alone
4. The bias correction methods have not resulted in any improvement

Future work:
• Combine different models: SVM, neural networks, etc.
• Combine with different ratios

Page 37

© Prof. Dr.-Ing. Wolfgang Lehner |

Bachelor thesis – Onur Ekici

Applicability and Parameterization of Machine Learning Approaches for Time Series Modeling

Monday, 12.05.2014

Page 38

Parameters of Random Forest

There are three parameters for the random forest.

Number of trees (ntree):
• Concerns the optimization of performance rather than the optimization of accuracy
• It should not be set too small, so that the forest can stabilize; it should be at least several hundred
• If it is too large, the computation takes more time, but the result does not change
• Default value: 500; in this work: 1000

Page 39

Parameters of Random Forest

The test was performed on the initial data. The forest stabilizes after 200 trees in most cases.
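One way to check this stabilization with R's randomForest, on stand-in data:

```r
library(randomForest)

rf <- randomForest(medv ~ ., data = MASS::Boston, ntree = 1000)

# rf$mse holds the OOB error measured after 1, 2, ..., ntree trees;
# the curve typically flattens out after a few hundred trees
plot(rf$mse, type = "l", xlab = "number of trees", ylab = "OOB MSE")
```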

Page 40

Parameters of Random Forest

Node size (nodesize):
• When should the CART algorithm stop? If the number of instances in a node is less than or equal to the node size, the algorithm stops splitting and the node becomes a terminal node
• An important parameter in CART, but it has no great effect in a random forest
• Default value: 5; in this work: 5

Number of features sampled (mtry):
• A key parameter for optimizing the accuracy of a random forest
• Breiman suggested one third of the number of features for regression problems
• The tuneRF function in the R implementation of random forest tunes mtry
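A usage sketch on stand-in data; the stepFactor and improve values are arbitrary choices:

```r
library(randomForest)

X <- MASS::Boston[, setdiff(names(MASS::Boston), "medv")]
y <- MASS::Boston$medv

# tuneRF starts at the default mtry (one third of the features for
# regression) and moves it by stepFactor as long as the OOB error
# keeps improving by at least `improve`
tuneRF(X, y, ntreeTry = 500, stepFactor = 1.5, improve = 0.01)
```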