tokyo research laboratory © copyright ibm corporation 2005sdm 05 | 2005/04/21 | ibm research, tokyo...

13
SDM 05 | 2005/04/21 | Tokyo Research Laboratory © Copyright IBM Corporation 2005 IBM Research, Tokyo Research Lab Tsuyoshi Ide Knowledge Discovery from Heterogeneous Dynamic Systems using Change-Point Correlations

Upload: vernon-bridges

Post on 18-Jan-2016

216 views

Category:

Documents


1 download

TRANSCRIPT

Page 1: Tokyo Research Laboratory © Copyright IBM Corporation 2005SDM 05 | 2005/04/21 | IBM Research, Tokyo Research Lab Tsuyoshi Ide Knowledge Discovery from

SDM 05 | 2005/04/21 |

Tokyo Research Laboratory

© Copyright IBM Corporation 2005

IBM Research, Tokyo Research Lab

Tsuyoshi Ide

Knowledge Discovery from Heterogeneous Dynamic Systems using Change-Point Correlations

Page 2: Tokyo Research Laboratory © Copyright IBM Corporation 2005SDM 05 | 2005/04/21 | IBM Research, Tokyo Research Lab Tsuyoshi Ide Knowledge Discovery from

SDM 05 | 2005/04/21 |

Tokyo Research Laboratory

© Copyright IBM Corporation 2005

Example:

Data streams in the real world are full of intractable featuresHow can we discover knowledge on system’s mechanism?

sudden & unpredictable

changes

strong correlation

•time series data taken from various sensors in a car

heterogeneity

Direct comparison between time-series leads to meaningless results due to the heterogeneity

Page 3: Tokyo Research Laboratory © Copyright IBM Corporation 2005SDM 05 | 2005/04/21 | IBM Research, Tokyo Research Lab Tsuyoshi Ide Knowledge Discovery from

SDM 05 | 2005/04/21 |

Tokyo Research Laboratory

© Copyright IBM Corporation 2005

Tackling the heterogeneity using a nonlinear transformation ─ “Correlate by features, not by values”

apparently different

chan

ge-

po

int sco

reo

rigin

al reco

rdin

gs

synchronized features

potential relationship

Page 4: Tokyo Research Laboratory © Copyright IBM Corporation 2005SDM 05 | 2005/04/21 | IBM Research, Tokyo Research Lab Tsuyoshi Ide Knowledge Discovery from

SDM 05 | 2005/04/21 |

Tokyo Research Laboratory

© Copyright IBM Corporation 2005

12 wtx 1tx

reference interval test interval

Test matrix(Test vectors)

Singular spectrum transformation (SST) ─ transforms an original time series into a time-series of the change-point scores using SVD

1. For each of reference and test intervals, construct a matrix using subsequences as column vectors

2. Perform SVD on those matrices to find representative patterns

3. Compute dissimilarity between the patterns

compute the dissimilarity

12 ,,..., ttntH sss SVD ,.., 21 uu

representative patterns

u 1

u 2

test v

ector

Page 5: Tokyo Research Laboratory © Copyright IBM Corporation 2005SDM 05 | 2005/04/21 | IBM Research, Tokyo Research Lab Tsuyoshi Ide Knowledge Discovery from

SDM 05 | 2005/04/21 |

Tokyo Research Laboratory

© Copyright IBM Corporation 2005

Surprisingly, SST removes the heterogeneity with a single parameter set. No ad hoc parameter tuning is needed.

SST

raw data change-point score

• comparable to each other• existing clustering techniques are applicable

w =15(15 data points)

• recordings taken from a car• quite heterogeneous

Page 6: Tokyo Research Laboratory © Copyright IBM Corporation 2005SDM 05 | 2005/04/21 | IBM Research, Tokyo Research Lab Tsuyoshi Ide Knowledge Discovery from

SDM 05 | 2005/04/21 |

Tokyo Research Laboratory

© Copyright IBM Corporation 2005

SST is robust over a very wide range of parametersDependence on the pattern length, w

w =20, l =3

original data

change-point score

change-point scorefor 5 < w < 38, l =3

change-point scorefor 5 < w < 38, l =3

robust over a very wide range of w

Page 7: Tokyo Research Laboratory © Copyright IBM Corporation 2005SDM 05 | 2005/04/21 | IBM Research, Tokyo Research Lab Tsuyoshi Ide Knowledge Discovery from

SDM 05 | 2005/04/21 |

Tokyo Research Laboratory

© Copyright IBM Corporation 2005

Application to an automobile data set. SST effectively detects change-points irrespective of the heterogeneous behavior

Data powertrain control module sampling rate = 0.1 sec

Variables x1: fuel flow rate x2: engaged gear x3: vehicle speed x4: engine RPM x5: manifold absolute pressure

CP scoreraw data( w =25; 25 data points)

Page 8: Tokyo Research Laboratory © Copyright IBM Corporation 2005SDM 05 | 2005/04/21 | IBM Research, Tokyo Research Lab Tsuyoshi Ide Knowledge Discovery from

SDM 05 | 2005/04/21 |

Tokyo Research Laboratory

© Copyright IBM Corporation 2005

Application to an automobile data set. SST is useful for discovering hidden structures among variables

x2 and x4 behave quite differently in the original data

However, after the SST, they forms the nearest pair in the MDS plot.

This result can be naturally understood by the mechanism of the car

engaged gear

engine RPM

Multidimensional scalingmethod for retrieving the coordinates from a distance matrix

Page 9: Tokyo Research Laboratory © Copyright IBM Corporation 2005SDM 05 | 2005/04/21 | IBM Research, Tokyo Research Lab Tsuyoshi Ide Knowledge Discovery from

SDM 05 | 2005/04/21 |

Tokyo Research Laboratory

© Copyright IBM Corporation 2005

Backup

Page 10: Tokyo Research Laboratory © Copyright IBM Corporation 2005SDM 05 | 2005/04/21 | IBM Research, Tokyo Research Lab Tsuyoshi Ide Knowledge Discovery from

SDM 05 | 2005/04/21 |

Tokyo Research Laboratory

© Copyright IBM Corporation 2005

Looking for a change-point detection method applicable to heterogeneous systems without any ad hoc parameter tuning

Existing methods CUSUM (cumulated sum)

• Suitable only for such a data that distributes around a constant. Autoregressive models

• Cannot be used for streams with sudden & unpredictable changes. Wavelet transforms

• Essentially the same as differentiation. Applicable to a very limited class of variables. Otherwise, fine parameter tuning for individual variable is needed.

Gaussian mixture models• Cannot be used for streams with sudden & unpredictable changes.

A new “nonparametric” approach

Note: The original inspiration for this approach was based on an aspect of [Moskvina-Zhigljavsky 2003]

Singular Spectrum Transformation

Page 11: Tokyo Research Laboratory © Copyright IBM Corporation 2005SDM 05 | 2005/04/21 | IBM Research, Tokyo Research Lab Tsuyoshi Ide Knowledge Discovery from

SDM 05 | 2005/04/21 |

Tokyo Research Laboratory

© Copyright IBM Corporation 2005

SST needs only two parameters in a standard setting: the pattern length and the number of patterns

t

test interval at t

reference interval at t

w vectors

w vectors

w

w

w

pattern lengthw

# of patterns

l

Page 12: Tokyo Research Laboratory © Copyright IBM Corporation 2005SDM 05 | 2005/04/21 | IBM Research, Tokyo Research Lab Tsuyoshi Ide Knowledge Discovery from

SDM 05 | 2005/04/21 |

Tokyo Research Laboratory

© Copyright IBM Corporation 2005

Computing the distances between variables and visualize them using multidimensional scaling

original time series change-point score

normalization

distance between

xi & xj

1)( 2 txdt i 1)( txdt i

standardize standardize

SST

smoothing

L1 distanceL2 distance

smoothing

multidimensional scaling

(a) raw (b) SST

2)( ji xxdt || ji xxdt

Normalization policies and distance metrics

* SST series are nonnegative by definition, so they are naturally interpreted as the probability densities of changes

Multidimensional scaling retrieves the coordinates from a distance matrix eigenvalue analysis shows d =2 is sufficient

Page 13: Tokyo Research Laboratory © Copyright IBM Corporation 2005SDM 05 | 2005/04/21 | IBM Research, Tokyo Research Lab Tsuyoshi Ide Knowledge Discovery from

SDM 05 | 2005/04/21 |

Tokyo Research Laboratory

© Copyright IBM Corporation 2005

Summary and future work

Summary We proposed a new framework of data mining for heterogeneous dynamic systems The SST is a robust and effective way to remove the heterogeneity An experiment demonstrated the utility of SST in discovering hidden structures

Future work Speed up and sophisticate SST Develop methodologies for causality analysis

v

x y

z w

v

x y

z w

zcar A

car B