Post on 21-Dec-2015
1
LOCATING PATTERNS IN DISCRETE TIME-SERIES
Kevin B. Pratt
Committee:
Eugene Fink
Dmitry Goldgof
Rafael Perez
2
General problem
Retrieval of time-series similar to a given pattern.
3
Example: Stock charts
Database of time-series
4
Example: Stock charts
Database of time-series
Pattern
5
Example: Stock charts
Database of time-series
Pattern
Retrieval results
6
Example: Stock charts
Database of time-series
Pattern
Retrieval results (similarity scores): 0.92, 0.87, 0.86, 0.84
7
Example: Electrocardiogram
Database of time-series
8
Example: Electrocardiogram
Database of time-series
Pattern
9
Example: Electrocardiogram
Database of time-series
Pattern
Retrieval results (similarity scores): 0.91, 0.87, 0.98, 1.0
10
Outline
• Previous work
• Important points
• Indexing and retrieval
• Empirical results
• Conclusions
11
Outline
• Previous work
• Important points
• Indexing and retrieval
• Empirical results
• Conclusions
Contributions
12
Criteria for retrieval methods
Gunopulos [2000]:
• Work for erratic time-series
• Accept any pattern
• Find inexact matches
• Work when some points are missing
• Work on streaming data
13
Outline
• Previous work
• Important points
• Indexing and retrieval
• Empirical results
• Conclusions
14
Previous work
• Feature choice
• Similarity metrics
• Indexing and retrieval
15
Previous work: Feature choice
• Discrete Fourier transforms
• Alphabets
• Statistical features
• Subsets of points
16
Previous work: Similarity metrics
• Euclidean distance
• Bounding rectangles
• Envelope count
• Aggregate similarity
17
Previous work: Indexing and retrieval
Advanced techniques:
• B-trees
• R-trees
• KD-trees
• VP-trees
• Grids
Applied techniques:
• Linear search with compression
18
Outline
• Previous work
• Important points
• Indexing and retrieval
• Empirical results
• Conclusions
19
Important points
Choose “important” maxima and minima, and discard the other points.
20
Important points
Choose “important” maxima and minima, and discard the other points.
Example:
Original series
22
Important points
Choose “important” maxima and minima, and discard the other points.
Example:
Original series
Compressed series
23
Definition of important points
Important minimum
24
Definition of important points
Important minimum
• $a_m$ is the minimum among $a_i, \ldots, a_j$
25
Definition of important points
Important minimum
• $a_m$ is the minimum among $a_i, \ldots, a_j$
• $a_i / a_m \ge R$ and $a_j / a_m \ge R$
26
Definition of important points
Important minimum
• $a_m$ is the minimum among $a_i, \ldots, a_j$
• $a_i / a_m \ge R$ and $a_j / a_m \ge R$
• $R$ is a knob that determines the compression rate
27
Definition of important points
Important maximum
• $a_m$ is the maximum among $a_i, \ldots, a_j$
• $a_m / a_i \ge R$ and $a_m / a_j \ge R$
• $R$ is a knob that determines the compression rate
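The two definitions above can be checked directly. The sketch below is a brute-force test written from the slide wording, not the author's code. The function names are illustrative, and it assumes all values are positive (so the ratios are meaningful) and that R is greater than 1.

```python
from typing import Sequence


def _confirmed(a: Sequence[float], m: int, R: float, step: int, minimum: bool) -> bool:
    """Scan from index m in direction `step` (-1 or +1) for a confirming point.

    For a minimum, a confirming point is at least R times a[m] and is reached
    before any value smaller than a[m]. For a maximum the roles are mirrored.
    """
    i = m + step
    while 0 <= i < len(a):
        if minimum:
            if a[i] < a[m]:        # a[m] is no longer the minimum of the span
                return False
            if a[i] >= R * a[m]:   # a_i / a_m >= R
                return True
        else:
            if a[i] > a[m]:        # a[m] is no longer the maximum of the span
                return False
            if a[m] >= R * a[i]:   # a_m / a_i >= R
                return True
        i += step
    return False


def is_important_minimum(a: Sequence[float], m: int, R: float) -> bool:
    """True if a[m] satisfies the important-minimum definition for ratio R."""
    return _confirmed(a, m, R, -1, True) and _confirmed(a, m, R, +1, True)


def is_important_maximum(a: Sequence[float], m: int, R: float) -> bool:
    """True if a[m] satisfies the important-maximum definition for ratio R."""
    return _confirmed(a, m, R, -1, False) and _confirmed(a, m, R, +1, False)
```

Raising R makes both tests harder to pass, so fewer points survive; this is the sense in which R acts as a compression knob.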
28
Compression example
Original series
29
Compression example
Original series
Compressed series
32
Compression algorithm
• Linear time
• Constant memory
• Accepts streaming data
For a series with n values, compression time is 0.0133 n milliseconds (300 MHz PC, Visual Basic 6.0).
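One way to obtain the three properties listed above is a single pass that alternates between looking for a minimum and looking for a maximum. The Python generator below is a minimal sketch under assumptions, not the procedure that was timed on the slide: `compress` and its variables are hypothetical names, values are assumed positive, R is assumed greater than 1, the first kept point is only checked against the values that follow it, and a trailing extremum that has not yet been confirmed is not emitted.

```python
from typing import Iterable, Iterator, Optional, Tuple


def compress(stream: Iterable[float], R: float) -> Iterator[Tuple[int, float]]:
    """Yield (index, value) for each kept extremum, reading the stream once."""
    lo_i = hi_i = -1            # positions of the running minimum and maximum
    lo = hi = 0.0               # their values
    seeking: Optional[str] = None   # None until the first point is kept

    for i, x in enumerate(stream):
        if lo_i < 0:            # very first value starts both candidates
            lo_i = hi_i = i
            lo = hi = x
            continue

        if seeking is None:
            # No point kept yet: track both candidates until one is confirmed.
            if x < lo:
                lo_i, lo = i, x
            if x > hi:
                hi_i, hi = i, x
            if x >= R * lo:                 # rose far enough above the minimum
                yield lo_i, lo
                seeking, hi_i, hi = "max", i, x
            elif hi >= R * x:               # fell far enough below the maximum
                yield hi_i, hi
                seeking, lo_i, lo = "min", i, x
        elif seeking == "max":
            if x > hi:
                hi_i, hi = i, x             # better maximum candidate
            elif hi >= R * x:
                yield hi_i, hi              # maximum confirmed by this drop
                seeking, lo_i, lo = "min", i, x
        else:
            if x < lo:
                lo_i, lo = i, x             # better minimum candidate
            elif x >= R * lo:
                yield lo_i, lo              # minimum confirmed by this rise
                seeking, hi_i, hi = "max", i, x
```

The state is a handful of scalars regardless of input length, and each value is examined once, which is consistent with the linear-time, constant-memory, streaming claims on the slide.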
33
Outline
• Previous work
• Important points
• Indexing and retrieval
• Empirical results
• Conclusions
34
Retrieval
Retrieval of time-series similar to a given pattern.
Intuition:
• Find a prominent feature in the pattern
• Find candidate segments with a similar feature
• Compare similarity of candidates to the pattern
35
Example: Stock charts
Database of time-series
37
Example: Stock charts
Database of time-series
Pattern
40
Example: Stock charts
Database of time-series
Pattern
Retrieval results (similarity scores): 0.92, 0.87, 0.86, 0.84
41
Algorithm
• Identify the prominent leg in the pattern
• Retrieve similar legs from the database
• Identify corresponding candidate segments
• For each candidate segment, compute its similarity to the pattern
• Output the candidates whose similarity is above the threshold
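The five steps above might be wired together as follows. This is a sketch only: the slides do not define the similarity metric or how close a leg must be to count as similar, so `similarity` (mean agreement of leg ratios) and the `tolerance` factor are stand-in assumptions, and a linear scan takes the place of an index. Inputs are compressed series given as lists of important-point values, all assumed positive.

```python
from typing import List, Tuple


def leg_ratios(points: List[float]) -> List[float]:
    """Ratio of right end to left end for each leg between consecutive points."""
    return [right / left for left, right in zip(points, points[1:])]


def similarity(p: List[float], c: List[float]) -> float:
    """Assumed metric: mean agreement of leg ratios, 1.0 for identical shapes."""
    return sum(min(x, y) / max(x, y) for x, y in zip(p, c)) / len(p)


def retrieve(pattern: List[float], series: List[float],
             tolerance: float, threshold: float) -> List[Tuple[int, float]]:
    """Return (start index, similarity) of matching segments, best first."""
    p = leg_ratios(pattern)
    s = leg_ratios(series)

    # Step 1: the prominent leg is the one with the greatest end-point ratio.
    k = max(range(len(p)), key=p.__getitem__)

    results = []
    for d, ratio in enumerate(s):
        # Step 2: keep only database legs close to the prominent leg.
        if not (p[k] / tolerance <= ratio <= p[k] * tolerance):
            continue
        # Step 3: align the candidate segment so leg d sits at position k.
        start = d - k
        if start < 0 or start + len(p) > len(s):
            continue
        # Step 4: score the whole candidate segment against the pattern.
        score = similarity(p, s[start:start + len(p)])
        # Step 5: output only candidates above the threshold.
        if score > threshold:
            results.append((start, score))
    return sorted(results, key=lambda r: -r[1])
```

For example, `retrieve([10, 20, 15], [5, 4, 8, 6, 9, 3], 1.25, 0.9)` finds the segment starting at index 1, whose legs (4 to 8, then 8 to 6) have the same ratios as the pattern.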
42
Important details
• Use compressed pattern and compressed sequences in the retrieval process
• The prominent feature is the leg having the greatest ratio of right end to left end
• All legs in the database are indexed by their prominence, using a binary search tree
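A sketch of indexing legs by prominence, taking the end-point ratio as the prominence measure. Python's standard library has no balanced search tree, so a sorted list searched with `bisect` stands in for the binary search tree named on the slide; it supports the same range lookup but not cheap insertion. `LegIndex`, `similar_legs`, and `tolerance` are hypothetical names.

```python
import bisect
from typing import List


class LegIndex:
    """Legs of one compressed series, ordered by end-point ratio."""

    def __init__(self, points: List[float]) -> None:
        # One (ratio, position) pair per leg between consecutive important points.
        legs = sorted(
            (right / left, i)
            for i, (left, right) in enumerate(zip(points, points[1:]))
        )
        self._ratios = [ratio for ratio, _ in legs]
        self._positions = [pos for _, pos in legs]

    def similar_legs(self, ratio: float, tolerance: float) -> List[int]:
        """Positions of legs whose ratio is within a factor `tolerance` of `ratio`."""
        lo = bisect.bisect_left(self._ratios, ratio / tolerance)
        hi = bisect.bisect_right(self._ratios, ratio * tolerance)
        return sorted(self._positions[lo:hi])
```

Widening `tolerance` returns more candidate legs, which costs more comparison time but is less likely to miss a match.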
43
Alternative versions
• Different prominence definitions
• Different similarity metrics
The end-point ratio prominence usually gives the best empirical results.
44
Extended legs
Similar sequence
45
Indexing on extended legs
• Advantage: More accurate retrieval
• Disadvantage: Larger index, more memory
If a compressed sequence has n legs:
• Worst case: $n^2/2$ extended legs
• Average case: on the order of $n \lg n$ extended legs
46
Outline
• Previous work
• Important points
• Indexing and retrieval
• Empirical results
• Conclusions
47
Data sets
• Stock charts
• Air and sea temperatures
• Wind speeds
• Electroencephalograms
• Electrocardiograms
48
Data sets
• Stock charts: 60,000 points
• Air and sea temperatures: 445,000 points
• Wind speeds: 79,000 points
• Electroencephalograms: 17,000 points
• Electrocardiograms: 2,000 points
49
Patterns
Compressed patterns with 4 to 27 legs
Examples:
50
Retrieval time
Retrieval time: 0.07 m k milliseconds,
where m is the number of legs in the pattern and k is the number of candidates
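As a quick arithmetic check of the formula above, the helper below (a hypothetical name) evaluates it. The constant 0.07 is the one reported on the slide for the author's test setup, so timings elsewhere would differ.

```python
def retrieval_time_ms(pattern_legs: int, candidates: int) -> float:
    """Estimated retrieval time in milliseconds: 0.07 * m * k."""
    return 0.07 * pattern_legs * candidates


# Hypothetical inputs: a 27-leg pattern compared against 270 candidates.
estimate = retrieval_time_ms(27, 270)
print(f"{estimate:.1f} ms")
```

Because the cost is proportional to k, halving the number of candidates halves the estimated time.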
51
Retrieval accuracy: Stock charts
Panels (plots not reproduced):
• C = 3: 20% candidates
• C = 2: 10% candidates
• C = 1.5: 5% candidates
• C = 1.1: 1% candidates
52
Retrieval accuracy: Wind speeds
Panels (plots not reproduced):
• C = 1.5: 20% candidates
• C = 1.2: 10% candidates
• C = 1.1: 5% candidates
53
Retrieval candidate quality
Found matches among ten best:
                                         Candidates
Data set                                 5%    10%    20%
Stock charts (5,400 legs)                 4      4      7
Air and sea temperatures (5,500 legs)     4      5      6
Wind speeds (10,500 legs)                 3      7      9
54
Outline
• Previous work
• Important points
• Indexing and retrieval
• Empirical results
• Conclusions
55
Criteria for retrieval methods
Gunopulos [2000]:
• Work for erratic time-series
• Accept any pattern
• Find inexact matches
• Work when some points are missing
• Work on streaming data
57
Criteria for retrieval methods
Gunopulos [2000]:
• Work for erratic time-series
• Accept any pattern
• Find inexact matches
• Work when some points are missing
• Work on streaming data
~
60
Criteria for retrieval methods
Gunopulos [2000]:
• Work for erratic time-series
• Accept any pattern
• Find inexact matches
• Work when some points are missing
• Work on streaming data
~~
61
Main results
Compression
• Fast compression procedure
• Preserves similarity
Retrieval
• Works with compressed data
• Controlled trade-off between speed and accuracy