Time Series Prediction as a Problem of Missing Values: Application to ESTSP2007 and NN3 Competition Benchmarks

Antti Sorjamaa and Amaury Lendasse, Time Series Prediction and ChemoInformatics Group, Adaptive Informatics Research Centre, Helsinki University of Technology


Page 1: Time Series Prediction as a Problem of Missing Values

Application to ESTSP2007 and NN3 Competition Benchmarks

Antti Sorjamaa and Amaury Lendasse

Time Series Prediction and ChemoInformatics Group
Adaptive Informatics Research Centre
Helsinki University of Technology

Page 2: Outline

Antti Sorjamaa - TSPCi - AIRC - HUT 2/22

Time Series Prediction vs. Missing Values

Global methodology
– Self-Organizing Maps (SOM)
– Empirical Orthogonal Functions (EOF)

Results

Page 3: Missing Values

[Figure: an example data matrix in which some entries are unknown, marked "?".]

[Figure: the end of a time series, ... 47 48 49 50 ? ? ? ?, where "?" marks the unknown future values. Sliding a window of six values along the series gives the regressor matrix below; the future values appear in it as missing entries.]

42 43 44 45 46 47
43 44 45 46 47 48
44 45 46 47 48 49
45 46 47 48 49 50
46 47 48 49 50 ?
47 48 49 50 ? ?
48 49 50 ? ? ?
49 50 ? ? ? ?

Page 4: Time Series Prediction vs. Missing Values

Methods exist for finding missing values in temporally related databases
A time series is such a database
The unknown future can be considered as a set of missing values
The same methods can be applied
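The reformulation can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `regressor_matrix_with_future` and its parameters `width` and `horizon` are hypothetical, NaN stands in for "?", and the numbers mirror the example on page 3 (window of six values, four unknown future values).

```python
import numpy as np

def regressor_matrix_with_future(series, width, horizon):
    """Stack sliding windows of `width` values.

    The `horizon` unknown future values are appended as NaN, so they
    show up as missing entries in the last rows of the matrix.
    """
    extended = np.concatenate([np.asarray(series, dtype=float),
                               np.full(horizon, np.nan)])
    n_rows = len(extended) - width + 1
    return np.array([extended[i:i + width] for i in range(n_rows)])

# Example series 42, 43, ..., 50 followed by four unknown values.
series = np.arange(42, 51)
X = regressor_matrix_with_future(series, width=6, horizon=4)
print(X)
```

Any method that fills missing entries of a matrix can then be run on `X`; the filled NaN positions are the predictions.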

Page 5: Global Methodology

Based on two methods
– SOM
  Nonlinear projection / interpolation
  Topology preservation on a low-dimensional grid
– EOF
  Linear projection
  Projection to high-dimensional output space
  Needs initialization

Page 6: SOM

[Figure: SOM illustration. One panel has a horizontal axis running from 200 to 1000 and a vertical axis from 3.5 to 5; the other uses axes x1 and x2, with points labelled 1 to 6.]

Page 7: SOM Interpolation

SOM learning is done with known data
Missing values are left out
Approach proposed by Cottrell and Letrémy (in Applied Stochastic Models and Data Analysis 2005)

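$$m_{\mathrm{BMU}(x_{t+1})} = \arg\min_{m_i,\; i \in I} \left\| x_{t+1}^{NM} - m_i^{NM} \right\|$$

$$x_{t+1} = \left[ x_{t+1}^{NM} \;\; x_{t+1}^{M} \right]^{T}$$

$$x_{t+1}^{M} = m_{\mathrm{BMU}(x_{t+1})}^{M}$$

Here NM marks the known (not missing) components of a vector and M the missing ones, m_i is the prototype of SOM unit i, and I is the set of unit indices.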
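A minimal sketch of the filling step, assuming the SOM has already been trained: the best matching unit (BMU) is searched using only the known components of a sample, and the missing components are copied from that unit's prototype. The function name `fill_from_bmu` and the prototype values are hypothetical, chosen only for illustration; NaN marks a missing value.

```python
import numpy as np

def fill_from_bmu(x, prototypes):
    """Fill the NaN entries of `x` from its best matching unit (BMU).

    The BMU is searched using only the known components of `x`.
    """
    x = np.asarray(x, dtype=float)
    prototypes = np.asarray(prototypes, dtype=float)
    known = ~np.isnan(x)
    # Distance to every prototype, measured on the known components only.
    dists = np.linalg.norm(prototypes[:, known] - x[known], axis=1)
    bmu = int(np.argmin(dists))
    filled = x.copy()
    # Missing components are taken from the BMU prototype.
    filled[~known] = prototypes[bmu, ~known]
    return filled, bmu

# Hypothetical, already-trained SOM prototypes (one row per unit).
prototypes = np.array([[1.0, 2.0, 3.0],
                       [4.0, 5.0, 6.0],
                       [7.0, 8.0, 9.0]])
filled, bmu = fill_from_bmu([4.2, np.nan, 5.9], prototypes)
print(bmu, filled)
```

Because every sample mapped to the same unit receives the same filled values, the result is discrete, which is the limitation the summary mentions for SOM.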

Page 8: EOF Projection

Based on Singular Value Decomposition (SVD)

$$X = U D V^{*} = \sum_{k=1}^{K} \rho_k u_k v_k^{T}$$

Only q singular values and vectors are used
– q is smaller than K (the rank of X)
– Larger singular values contain more signal than smaller ones

$$\hat{X} = \sum_{k=1}^{q} \rho_k u_k v_k^{T}$$

Here \rho_k are the singular values, and u_k and v_k the corresponding singular vectors.
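The truncated reconstruction can be sketched with NumPy's SVD. The function name `eof_reconstruct` and the random test matrix are assumptions made for illustration; the matrix must be complete (no missing values) for this step.

```python
import numpy as np

def eof_reconstruct(X, q):
    """Rebuild X from its q largest singular values and vectors."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    # Scale the first q left singular vectors by their singular values,
    # then project back with the first q right singular vectors.
    return (U[:, :q] * s[:q]) @ Vt[:q, :]

rng = np.random.default_rng(0)
X = rng.normal(size=(6, 4))
X_hat = eof_reconstruct(X, q=2)
print(np.linalg.norm(X - X_hat))
```

With q equal to the rank of X the reconstruction returns X itself; smaller q discards the components tied to the smallest singular values.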

Page 9: EOF Projection (2)

SVD cannot deal with missing values
– Initialization is crucial!

Decomposition with SVD and reconstruction
– The q largest singular values and vectors are used in the reconstruction
– Original data is not modified; only the missing values are updated
– q is selected using validation

Page 10: EOF Projection (3)

[Figure: an example data matrix with missing entries ("?"), shown at successive stages of the EOF iteration. At initialization every missing entry is set to 5; the filled entries are then updated on each round, while the known entries stay unchanged.]

1. Initialization
2. Round 1
3. Round 2
4. Round 3
...
n. Done!
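The rounds above can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: the names `eof_fill` and `select_q` are hypothetical, a fixed number of rounds replaces a convergence criterion, the default initialization is the mean of the known values (the `init` argument could instead take a SOM-filled matrix), and q is chosen by hiding a random subset of known entries, which may differ from the learning/validation split used in the slides.

```python
import numpy as np

def eof_fill(X, q, n_rounds=50, init=None):
    """Iteratively fill the NaN entries of X with a rank-q SVD reconstruction."""
    X = np.asarray(X, dtype=float)
    missing = np.isnan(X)
    filled = X.copy()
    # Initialization: a caller-supplied matrix, or the mean of the known values.
    filled[missing] = np.nanmean(X) if init is None else np.asarray(init)[missing]
    for _ in range(n_rounds):
        U, s, Vt = np.linalg.svd(filled, full_matrices=False)
        X_hat = (U[:, :q] * s[:q]) @ Vt[:q, :]
        # Only the missing entries are updated; known data is never modified.
        filled[missing] = X_hat[missing]
    return filled

def select_q(X, candidates, holdout_fraction=0.1, seed=0):
    """Pick q by hiding some known entries and measuring the fill error (MSE)."""
    X = np.asarray(X, dtype=float)
    rng = np.random.default_rng(seed)
    known = np.argwhere(~np.isnan(X))
    n_hold = max(1, int(holdout_fraction * len(known)))
    held = known[rng.choice(len(known), size=n_hold, replace=False)]
    rows, cols = held[:, 0], held[:, 1]
    X_val = X.copy()
    X_val[rows, cols] = np.nan  # hide the validation entries
    errors = {}
    for q in candidates:
        refilled = eof_fill(X_val, q)
        errors[q] = float(np.mean((refilled[rows, cols] - X[rows, cols]) ** 2))
    return min(errors, key=errors.get), errors

X = np.array([[1.0, 2.0, 3.0],
              [2.0, np.nan, 6.0],
              [3.0, 6.0, np.nan],
              [4.0, 8.0, 12.0]])
filled = eof_fill(X, q=1)
best_q, errors = select_q(X, candidates=[1, 2])
print(filled)
print(best_q, errors)
```

The quality of the result depends on the starting values, which is why the slides stress initialization.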

Page 11: Global Methodology (2)

[Flow chart: Missing Data → SOM → EOF → Data with filled values. Parameters shown: SOM grid size, number of EOF, EOF iteration.]

Page 12: ESTSP2007 Competition Data

[Figure: the competition data plotted against time (time axis 100 to 800, values between 20 and 28), divided into a Learning set and a Validation set.]

Page 13: Results, Regressor size 11

[Figure: validation MSE (vertical axis, 0 to 6) against SOM size / number of EOF (horizontal axis, 2 to 20) for EOF, SOM and SOM+EOF.]

Page 14: Results (2)

[Figure: validation MSE (vertical axis, 0 to 0.6) against SOM size / number of EOF (horizontal axis, 2 to 20) for EOF, SOM and SOM+EOF.]

Page 15: Prediction

[Figure: prediction of the competition data, plotted against time (time axis 750 to 900, vertical axis 18 to 30).]

Page 16: NN3 Competition

Prediction of 111 time series
A single, automatic methodology for predicting all the series
Prediction of 18 values into the future for each series
All series are rather short, which makes the prediction tricky
The mean SMAPE over all series is evaluated in the competition
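For reference, the evaluation measure can be sketched as below. The slides do not give the formula, so this uses one common definition of SMAPE (absolute error divided by the mean of the absolute actual and forecast values, in percent); the variant used by the competition organisers may differ. The function names `smape` and `mean_smape` are hypothetical.

```python
import numpy as np

def smape(actual, forecast):
    """Symmetric mean absolute percentage error of one series, in percent."""
    actual = np.asarray(actual, dtype=float)
    forecast = np.asarray(forecast, dtype=float)
    denom = (np.abs(actual) + np.abs(forecast)) / 2.0
    return float(np.mean(np.abs(actual - forecast) / denom) * 100.0)

def mean_smape(actuals, forecasts):
    """Average the per-series SMAPE over all series."""
    return float(np.mean([smape(a, f) for a, f in zip(actuals, forecasts)]))

print(smape([100.0, 100.0], [100.0, 100.0]))
print(mean_smape([[100.0], [100.0]], [[100.0], [50.0]]))
```

Note that this definition is undefined when an actual value and its forecast are both zero.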

Page 17: NN3: Long Series

[Figure: two examples of long NN3 series (horizontal axis up to 140). Validation MSE = 0.1559 for the first panel and 0.0076 for the second.]

Page 18: NN3: Short Series

[Figure: an example of a short NN3 series (horizontal axis 10 to 60). Validation MSE = 0.3493.]

Page 19: NN3: Validation Errors

[Figure: three histograms of the number of series (vertical axis) against validation MSE (horizontal axis, 0 to 1.4).]

Page 20: Summary

Time series prediction can be viewed as a problem of missing values

The SOM+EOF methodology works well, better than either method alone
– SOM projection is discrete
– EOF needs a sufficiently good initialization

The methods complement each other

Page 21: Further Work

Improvements to the methodology
– The selection of singular values and vectors
– Convergence criterion: how to guarantee quick convergence?

Applying the methodology to data sets from other fields
– Climatology, finance, process data

Page 22: Questions?

[email protected]@hut.fi

[email protected]@cis.hut.fi

http://www.cis.hut.fi/projects/tsp

Time Series Prediction as a Problem of Missing Values

Application to ESTSP2007 and NN3 Competition Benchmarks