Post on 21-Dec-2015

LOCATING PATTERNS IN DISCRETE TIME-SERIES

Kevin B. Pratt

Committee:

Eugene Fink

Dmitry Goldgof

Rafael Perez

General problem

Retrieval of time-series similar to a given pattern.

Example: Stock charts

[Figure: a database of time-series, a pattern, and the retrieval results, labeled .92, .87, .86, and .84]

Example: Electrocardiogram

[Figure: a database of time-series, a pattern, and the retrieval results, labeled .91, .87, .98, and 1.0]

Outline

• Previous work

• Important points

• Indexing and retrieval

• Empirical results

• Conclusions

[A brace on the slide labels part of the outline above as "Contributions".]

Criteria for retrieval methods

Gunopulos [2000]:

• Work for erratic time-series

• Accept any pattern

• Find inexact matches

• Work when some points are missing

• Work on streaming data

Previous work

• Feature choice

• Similarity metrics

• Indexing and retrieval

Previous work: Feature choice

• Discrete Fourier transforms

• Alphabets

• Statistical features

• Subsets of points

Previous work: Similarity metrics

• Euclidean distance

• Bounding rectangles

• Envelope count

• Aggregate similarity

Previous work: Indexing and retrieval

Advanced techniques:

• B-trees

• R-trees

• KD-trees

• VP-trees

• Grids

Applied techniques:

• Linear search with compression

Important points

Choose “important” maxima and minima, and discard the other points.

Example: [Figure: an original series and the compressed series obtained from it]

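Definition of important points

Important minimum: a point $a_m$ is an important minimum if there are indices $i \le m \le j$ such that

• $a_m$ is the minimum among $a_i, \ldots, a_j$

• $a_i / a_m \ge R$ and $a_j / a_m \ge R$

• R is a knob that determines the compression rate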

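Definition of important points

Important maximum: a point $a_m$ is an important maximum if there are indices $i \le m \le j$ such that

• $a_m$ is the maximum among $a_i, \ldots, a_j$

• $a_m / a_i \ge R$ and $a_m / a_j \ge R$

• R is a knob that determines the compression rate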
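
The two definitions can be checked directly, point by point. The following is a minimal sketch rather than the procedure used in the presentation: the function names are hypothetical, all values are assumed to be positive, and R is assumed to be greater than 1. For each point it walks left and right for as long as the point remains the extremum, looking for a value that differs from it by a factor of at least R.

```python
def _has_far_point(a, m, step, is_min, R):
    """Walk from a[m] in one direction while a[m] stays the extremum.

    Returns True if some point on that side differs from a[m] by a
    factor of at least R before a more extreme value is encountered.
    """
    k = m + step
    while 0 <= k < len(a):
        if is_min:
            if a[k] < a[m]:          # a[m] is no longer the minimum
                return False
            if a[k] / a[m] >= R:     # found a_i (or a_j) with a_i / a_m >= R
                return True
        else:
            if a[k] > a[m]:          # a[m] is no longer the maximum
                return False
            if a[m] / a[k] >= R:     # found a_i (or a_j) with a_m / a_i >= R
                return True
        k += step
    return False


def is_important_minimum(a, m, R):
    """Both sides of a[m] must contain a point at least R times larger."""
    return (_has_far_point(a, m, -1, True, R)
            and _has_far_point(a, m, +1, True, R))


def is_important_maximum(a, m, R):
    """Both sides of a[m] must contain a point at least R times smaller."""
    return (_has_far_point(a, m, -1, False, R)
            and _has_far_point(a, m, +1, False, R))


series = [10, 12, 20, 18, 9, 11, 30, 28, 14]
kept = [m for m in range(len(series))
        if is_important_minimum(series, m, 2.0)
        or is_important_maximum(series, m, 2.0)]
print(kept)  # indices of the important points for R = 2
```

This check takes quadratic time in the worst case, so it is useful mainly for testing a faster compression procedure against the definition.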

Compression example

[Figure: an original series and its compressed series]

Compression algorithm

• Linear time

• Constant memory

• Accepts streaming data

For a series with n values, compression time is 0.0133 n milliseconds (300 MHz PC, Visual Basic 6.0).
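
A one-pass procedure with these three properties might look as follows. This is a sketch under stated assumptions, not the Visual Basic implementation timed above: the name `important_points` is hypothetical, values are assumed positive, R is assumed greater than 1, and only extrema confirmed on both sides are reported, so the leading and trailing extrema of a series are left out.

```python
def important_points(series, R):
    """Yield (index, value) for alternating important maxima and minima.

    Reads the input once and keeps only a few scalars, so it runs in
    linear time and constant memory and accepts any iterable, including
    a stream.
    """
    it = iter(enumerate(series))
    first = next(it, None)
    if first is None:
        return
    i_min, v_min = first
    i_max, v_max = first
    direction = 0  # 0: trend unknown, +1: tracking a maximum, -1: tracking a minimum
    for i, v in it:
        if direction == 0:
            # Track both running extrema until the series moves by a factor R.
            if v < v_min:
                i_min, v_min = i, v
            if v > v_max:
                i_max, v_max = i, v
            if v / v_min >= R:
                direction = +1
                i_max, v_max = i, v
            elif v_max / v >= R:
                direction = -1
                i_min, v_min = i, v
        elif direction > 0:
            if v > v_max:
                i_max, v_max = i, v
            elif v_max / v >= R:
                # The series fell by a factor R: the tracked maximum is important.
                yield i_max, v_max
                direction = -1
                i_min, v_min = i, v
        else:
            if v < v_min:
                i_min, v_min = i, v
            elif v / v_min >= R:
                # The series rose by a factor R: the tracked minimum is important.
                yield i_min, v_min
                direction = +1
                i_max, v_max = i, v


compressed = list(important_points([10, 12, 20, 18, 9, 11, 30, 28, 14], 2.0))
print(compressed)
```

Because the function is a generator, each important point is emitted as soon as the later part of the series confirms it, which is what allows it to run on streaming data.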

Retrieval

Retrieval of time-series similar to a given pattern.

Intuition:

• Find a prominent feature in the pattern

• Find candidate segments with a similar feature

• Compare similarity of candidates to the pattern

Algorithm

• Identify the prominent leg in the pattern

• Retrieve similar legs from the database

• Identify corresponding candidate segments

• For each candidate segment, compute its similarity to the pattern

• Output the candidates whose similarity is above the threshold

Important details

• Use compressed pattern and compressed sequences in the retrieval process

• The prominent feature is the leg having the greatest ratio of right end to left end

• All legs in the database are indexed by their prominence, using a binary search tree
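
The indexing step can be sketched as below. Several choices here are assumptions made for illustration and do not come from the slides: a sorted list searched with `bisect` stands in for the binary search tree, a leg counts as similar when its prominence lies within a factor C of the pattern's prominent leg, and all function names are hypothetical. Each compressed series is given as the list of its important-point values, and a leg joins two consecutive important points.

```python
from bisect import bisect_left, bisect_right


def prominent_leg(points):
    """Return (position, ratio) of the leg with the greatest
    ratio of right end to left end."""
    ratios = [right / left for left, right in zip(points, points[1:])]
    k = max(range(len(ratios)), key=ratios.__getitem__)
    return k, ratios[k]


def build_index(database):
    """Index every leg of every compressed series by its prominence.

    database maps a series id to its list of important-point values.
    The result is a list of (prominence, series_id, leg_position)
    sorted by prominence.
    """
    entries = []
    for sid, points in database.items():
        for k, (left, right) in enumerate(zip(points, points[1:])):
            entries.append((right / left, sid, k))
    entries.sort()
    return entries


def similar_legs(index, prominence, C):
    """Return indexed legs whose prominence is within a factor C
    (an assumed similarity range) of the given prominence."""
    keys = [entry[0] for entry in index]
    lo = bisect_left(keys, prominence / C)
    hi = bisect_right(keys, prominence * C)
    return index[lo:hi]


database = {"A": [10, 20, 9, 30], "B": [5, 6, 3, 12]}
index = build_index(database)
position, ratio = prominent_leg([4, 10, 5, 16])
narrow = [(sid, k) for _, sid, k in similar_legs(index, ratio, 1.1)]
wide = [(sid, k) for _, sid, k in similar_legs(index, ratio, 1.5)]
print(position, narrow, wide)
```

In this sketch a larger C returns more legs, and therefore more candidate segments to compare against the pattern. The similarity computation for each candidate is not shown, since the slides do not define the metric.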

Alternative versions

• Different prominence definitions

• Different similarity metrics

The end-point ratio prominence usually gives the best empirical results.

Extended legs

Similar sequence

Indexing on extended legs

• Advantage: More accurate retrieval

• Disadvantage: Larger index, more memory

If a compressed sequence has n legs:

• Worst case: $n^2/2$ extended legs

• Average case: on the order of $n \lg n$ extended legs

Data sets

• Stock charts: 60,000 points

• Air and sea temperatures: 445,000 points

• Wind speeds: 79,000 points

• Electroencephalograms: 17,000 points

• Electrocardiograms: 2,000 points

Patterns

Compressed patterns with 4 to 27 legs

Examples:

Retrieval time

Retrieval time: 0.07 m k milliseconds, where

• m is the number of legs in a pattern

• k is the number of candidates

Retrieval accuracy: Stock charts

[Figure: one accuracy plot per setting]

• C = 3: 20% candidates

• C = 2: 10% candidates

• C = 1.5: 5% candidates

• C = 1.1: 1% candidates

Retrieval accuracy: Wind speeds

[Figure: one accuracy plot per setting]

• C = 1.5: 20% candidates

• C = 1.2: 10% candidates

• C = 1.1: 5% candidates

Retrieval candidate quality

Found matches among ten best:

Data set                                 Candidates:  5%   10%   20%
Stock charts (5,400 legs)                             4    4     7
Air and sea temperatures (5,500 legs)                 4    5     6
Wind speeds (10,500 legs)                             3    7     9

Criteria for retrieval methods

Gunopulos [2000]:

• Work for erratic time-series

• Accept any pattern

• Find inexact matches

• Work when some points are missing

• Work on streaming data

[Marks placed beside the criteria on the slide were not preserved; two "~" marks remain.]

Main results

Compression

• Fast compression procedure

• Preserves similarity

Retrieval

• Works with compressed data

• Controlled trade-off between speed and accuracy