Similar search with trillions of time series
![Page 1: Similar search with trillions of time series](https://reader035.vdocument.in/reader035/viewer/2022062308/559040261a28abbd6a8b47cf/html5/thumbnails/1.jpg)
Searching and Mining Trillions of Time Series Subsequences under Dynamic Time Warping
Thanawin Rakthanmanon, Bilson Campana, Abdullah Mueen,
Gustavo Batista, Brandon Westover, Qiang Zhu, Jesin Zakaria, Eamonn Keogh
Hoan Nguyen – Trung Minh Nguyen
![Page 2: Similar search with trillions of time series](https://reader035.vdocument.in/reader035/viewer/2022062308/559040261a28abbd6a8b47cf/html5/thumbnails/2.jpg)
Abstract
Optimizations to search and mine large databases very fast
![Page 3: Similar search with trillions of time series](https://reader035.vdocument.in/reader035/viewer/2022062308/559040261a28abbd6a8b47cf/html5/thumbnails/3.jpg)
Outline
Problem
Related work
Definitions
Method
Results
Conclusion
![Page 4: Similar search with trillions of time series](https://reader035.vdocument.in/reader035/viewer/2022062308/559040261a28abbd6a8b47cf/html5/thumbnails/4.jpg)
Problem
Similarity search is an important part of most time series data mining algorithms.
Dynamic Time Warping (DTW) is the best measure to use, but it is slow.
![Page 5: Similar search with trillions of time series](https://reader035.vdocument.in/reader035/viewer/2022062308/559040261a28abbd6a8b47cf/html5/thumbnails/5.jpg)
Definitions: Time series
A time series T is an ordered list:
$T = t_1, t_2, \ldots, t_m$
![Page 6: Similar search with trillions of time series](https://reader035.vdocument.in/reader035/viewer/2022062308/559040261a28abbd6a8b47cf/html5/thumbnails/6.jpg)
Definitions: Subsequence
A subsequence $T_{i,k}$ of a time series T is a time series of length k starting at position i:
$T_{i,k} = t_i, t_{i+1}, \ldots, t_{i+k-1}$, where $1 \le i \le m-k+1$
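As a small illustration of the definition, the sketch below extracts a subsequence using the slides' 1-based position i and enumerates all m-k+1 subsequences of length k. The function names `subsequence` and `all_subsequences` are hypothetical, chosen only for this example.

```python
def subsequence(t, i, k):
    """Return the subsequence of length k starting at 1-based position i."""
    if not 1 <= i <= len(t) - k + 1:
        raise ValueError("subsequence out of range")
    # Python lists are 0-based, so position i maps to index i - 1.
    return t[i - 1 : i - 1 + k]


def all_subsequences(t, k):
    """Return every subsequence of length k (there are m - k + 1 of them)."""
    return [subsequence(t, i, k) for i in range(1, len(t) - k + 2)]


t = [10, 20, 30, 40, 50]
print(subsequence(t, 2, 3))         # the three values starting at position 2
print(len(all_subsequences(t, 3)))  # m - k + 1 subsequences
```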
![Page 7: Similar search with trillions of time series](https://reader035.vdocument.in/reader035/viewer/2022062308/559040261a28abbd6a8b47cf/html5/thumbnails/7.jpg)
Definitions: Dynamic Time Warping
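The slide body for this definition did not survive extraction, so the following is only a minimal sketch of the textbook dynamic-programming formulation of DTW, not a reproduction of the slide. It assumes two equal-length sequences, squared point costs, and a Sakoe-Chiba band of half-width `r`; the name `dtw_squared` and the parameter `r` are assumptions made for this example.

```python
import math


def dtw_squared(q, c, r):
    """Squared DTW distance between equal-length q and c.

    Warping is restricted to cells with |i - j| <= r (Sakoe-Chiba band).
    """
    n = len(q)
    if len(c) != n:
        raise ValueError("this sketch assumes equal-length sequences")
    prev = [math.inf] * (n + 1)
    prev[0] = 0.0  # empty prefix aligned with empty prefix
    for i in range(1, n + 1):
        cur = [math.inf] * (n + 1)
        for j in range(max(1, i - r), min(n, i + r) + 1):
            cost = (q[i - 1] - c[j - 1]) ** 2
            # Extend the cheapest of the three allowed predecessor cells.
            cur[j] = cost + min(prev[j], prev[j - 1], cur[j - 1])
        prev = cur
    return prev[n]
```

With `r = 0` only the diagonal is allowed and the result equals the squared Euclidean distance; a larger `r` lets nearby points align, which is what makes DTW more flexible and also more expensive.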
![Page 8: Similar search with trillions of time series](https://reader035.vdocument.in/reader035/viewer/2022062308/559040261a28abbd6a8b47cf/html5/thumbnails/8.jpg)
Related work: Known optimizations
Squared distance
The square root (√) is omitted, which does not change the ranking of nearest neighbors.
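A quick way to see why the square root can be skipped: it is a monotonic function, so ranking candidates by squared distance gives the same order as ranking by the true distance. The toy data and helper names below are hypothetical.

```python
import math


def sq_ed(q, c):
    """Squared Euclidean distance (no square root)."""
    return sum((a - b) ** 2 for a, b in zip(q, c))


def ed(q, c):
    """Euclidean distance, with the square root."""
    return math.sqrt(sq_ed(q, c))


query = [0.0, 0.0, 0.0]
candidates = [[3.0, 0.0, 0.0], [1.0, 1.0, 0.0], [0.0, 2.0, 2.0]]

# Rank candidate indices from nearest to farthest under each measure.
by_squared = sorted(range(len(candidates)), key=lambda i: sq_ed(query, candidates[i]))
by_root = sorted(range(len(candidates)), key=lambda i: ed(query, candidates[i]))
print(by_squared == by_root)
```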
![Page 9: Similar search with trillions of time series](https://reader035.vdocument.in/reader035/viewer/2022062308/559040261a28abbd6a8b47cf/html5/thumbnails/9.jpg)
Related work: Known optimizations
Lower bounding
LB_KimFL, LB_Keogh
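The slide only names the two bounds, so the sketch below follows their commonly described forms and should be read as an illustration under those assumptions. LB_KimFL uses only the first and last points, which DTW must align with each other. LB_Keogh builds an upper and lower envelope around the query within a warping window `r` and sums how far the candidate falls outside it. All function names are hypothetical, and distances are squared.

```python
def lb_kim_fl(q, c):
    """O(1) bound: squared differences of the first and last points."""
    return (q[0] - c[0]) ** 2 + (q[-1] - c[-1]) ** 2


def envelope(q, r):
    """Lower and upper envelope of q within a window of half-width r."""
    n = len(q)
    upper = [max(q[max(0, i - r) : i + r + 1]) for i in range(n)]
    lower = [min(q[max(0, i - r) : i + r + 1]) for i in range(n)]
    return lower, upper


def lb_keogh(lower, upper, c):
    """O(n) bound: squared distance from c to the envelope of the query."""
    total = 0.0
    for lo, up, x in zip(lower, upper, c):
        if x > up:
            total += (x - up) ** 2
        elif x < lo:
            total += (lo - x) ** 2
    return total


q = [0, 1, 2, 1, 0]
c = [3, 1, 0, 1, -1]
lower, upper = envelope(q, 1)
print(lb_kim_fl(q, c), lb_keogh(lower, upper, c))
```

If either bound already exceeds the best distance found so far, the candidate can be discarded without running DTW at all.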
![Page 10: Similar search with trillions of time series](https://reader035.vdocument.in/reader035/viewer/2022062308/559040261a28abbd6a8b47cf/html5/thumbnails/10.jpg)
Related work: Known optimizations
Early abandon
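A minimal sketch of early abandoning for squared Euclidean distance: the running sum can only grow, so once it exceeds the best distance found so far the remaining points need not be examined. The name `early_abandon_sq_ed` and the choice to return infinity for an abandoned candidate are assumptions of this example.

```python
import math


def early_abandon_sq_ed(q, c, best_so_far):
    """Squared ED between q and c, or math.inf if it exceeds best_so_far."""
    total = 0.0
    for a, b in zip(q, c):
        total += (a - b) ** 2
        if total > best_so_far:
            return math.inf  # abandoned: c cannot beat the current best
    return total


print(early_abandon_sq_ed([0, 0, 0], [1, 1, 1], best_so_far=10.0))
print(early_abandon_sq_ed([0, 0, 0], [1, 1, 1], best_so_far=1.5))
```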
![Page 11: Similar search with trillions of time series](https://reader035.vdocument.in/reader035/viewer/2022062308/559040261a28abbd6a8b47cf/html5/thumbnails/11.jpg)
Method: Early abandon Z-Normalization
Normal approach
[Figure: the query Q and the subsequences T1, T2, T3, … of the long time series T are each Z-normalized up front, giving the normalized query Q' and the normalized subsequences T1', T2', T3', …]
![Page 12: Similar search with trillions of time series](https://reader035.vdocument.in/reader035/viewer/2022062308/559040261a28abbd6a8b47cf/html5/thumbnails/12.jpg)
Method: Early abandon Z-Normalization
Novel approach
Early abandon with Z-normalization
1. The query is Z-normalized.
2. The Z-normalization of each subsequence is calculated on the fly, together with the distance calculation.
3. If distance > best_so_far, then early abandon both calculations.
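The three steps above can be sketched as follows, under some stated assumptions. The mean and standard deviation of each window come from running sums of x and x², each point is normalized only at the moment it enters the distance sum, and the loop stops as soon as the partial distance reaches the best so far. The names `znorm` and `nearest_subsequence` are hypothetical, squared Euclidean distance is used for brevity, and a flat window is given a standard deviation of 1 to avoid division by zero.

```python
import math


def znorm(x):
    """Z-normalize a sequence (zero mean, unit standard deviation)."""
    n = len(x)
    mean = sum(x) / n
    var = sum(v * v for v in x) / n - mean * mean
    std = math.sqrt(max(var, 0.0)) or 1.0  # assumption: flat input keeps scale 1
    return [(v - mean) / std for v in x]


def nearest_subsequence(t, q):
    """Return (0-based start, squared distance) of the best match of q in t."""
    m = len(q)
    qn = znorm(q)                      # step 1: normalize the query once
    best, best_start = math.inf, -1
    s = s2 = 0.0                       # running sums of x and x*x over the window
    for j, x in enumerate(t):
        s += x
        s2 += x * x
        if j >= m:                     # drop the point that left the window
            old = t[j - m]
            s -= old
            s2 -= old * old
        if j < m - 1:
            continue                   # window not full yet
        start = j - m + 1
        mean = s / m
        std = math.sqrt(max(s2 / m - mean * mean, 0.0)) or 1.0
        total = 0.0
        abandoned = False
        for k in range(m):
            # step 2: normalize each point only when it is needed
            d = (t[start + k] - mean) / std - qn[k]
            total += d * d
            if total >= best:          # step 3: abandon distance and normalization
                abandoned = True
                break
        if not abandoned:
            best, best_start = total, start
    return best_start, best


print(nearest_subsequence([5, 1, 4, 2, 4, 6, 3, 0, 7], [1, 2, 3]))
```

Running sums accumulate floating-point error on very long series, so a production version would need to recompute them periodically; this sketch does not.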
![Page 13: Similar search with trillions of time series](https://reader035.vdocument.in/reader035/viewer/2022062308/559040261a28abbd6a8b47cf/html5/thumbnails/13.jpg)
Method: Re-ordering Early Abandoning
The order in which points are compared is derived from the query.
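The slide does not spell out the ordering. The sketch below assumes it sorts the query positions by decreasing absolute value of the Z-normalized query, which is how the paper's heuristic is usually described: those positions tend to contribute most to the distance, so the sum reaches best_so_far sooner. The function names and toy values are hypothetical.

```python
import math


def abandon_order(qn):
    """Indices of the Z-normalized query, largest absolute value first."""
    return sorted(range(len(qn)), key=lambda i: abs(qn[i]), reverse=True)


def reordered_sq_ed(qn, cn, order, best_so_far):
    """Squared ED accumulated in the given order.

    Returns (distance or math.inf if abandoned, number of points examined).
    """
    total = 0.0
    for steps, i in enumerate(order, start=1):
        total += (qn[i] - cn[i]) ** 2
        if total > best_so_far:
            return math.inf, steps
    return total, len(order)


qn = [0.1, -2.0, 0.3, 1.5]     # toy values standing in for a normalized query
cn = [0.0, 0.0, 0.0, 0.0]
order = abandon_order(qn)
print(reordered_sq_ed(qn, cn, order, best_so_far=3.0))           # query-based order
print(reordered_sq_ed(qn, cn, list(range(4)), best_so_far=3.0))  # left-to-right order
```

The order depends only on the query, so it is computed once and reused for every candidate.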
![Page 14: Similar search with trillions of time series](https://reader035.vdocument.in/reader035/viewer/2022062308/559040261a28abbd6a8b47cf/html5/thumbnails/14.jpg)
Method: Cascading Lower Bounds
Lower bounds are applied in a cascade, cheapest first, to prune candidates before the full DTW calculation.
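A minimal sketch of such a cascade, assuming just two stages (a first/last-point bound, then an envelope bound) followed by banded DTW. The real suite may use more stages; the function names, the `stats` counters, and the toy candidates are all hypothetical.

```python
import math


def lb_kim_fl(q, c):
    """O(1) bound from the first and last points."""
    return (q[0] - c[0]) ** 2 + (q[-1] - c[-1]) ** 2


def lb_keogh(q, c, r):
    """O(n) bound: squared distance from c to the envelope of q."""
    total = 0.0
    for i, x in enumerate(c):
        window = q[max(0, i - r) : i + r + 1]
        up, lo = max(window), min(window)
        if x > up:
            total += (x - up) ** 2
        elif x < lo:
            total += (lo - x) ** 2
    return total


def dtw_sq(q, c, r):
    """Squared DTW with a Sakoe-Chiba band of half-width r."""
    n = len(q)
    prev = [math.inf] * (n + 1)
    prev[0] = 0.0
    for i in range(1, n + 1):
        cur = [math.inf] * (n + 1)
        for j in range(max(1, i - r), min(n, i + r) + 1):
            cur[j] = (q[i - 1] - c[j - 1]) ** 2 + min(prev[j], prev[j - 1], cur[j - 1])
        prev = cur
    return prev[n]


def cascade_nn(query, candidates, r):
    """Nearest neighbor under DTW, trying cheap bounds before DTW itself."""
    best, best_i = math.inf, -1
    stats = {"kim_fl": 0, "keogh": 0, "dtw": 0}
    for i, c in enumerate(candidates):
        if lb_kim_fl(query, c) >= best:      # cheapest test first
            stats["kim_fl"] += 1
            continue
        if lb_keogh(query, c, r) >= best:    # tighter but costlier test
            stats["keogh"] += 1
            continue
        stats["dtw"] += 1                    # only survivors pay for DTW
        d = dtw_sq(query, c, r)
        if d < best:
            best, best_i = d, i
    return best_i, best, stats


query = [0, 1, 2, 1, 0]
candidates = [[0, 1, 1, 2, 0], [3, 1, 2, 1, 0], [0, 1, 5, 1, 0], [0, 1, 2, 1, 0]]
print(cascade_nn(query, candidates, r=1))
```

In this toy run, two of the four candidates are rejected by a bound and never reach the DTW computation.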
![Page 15: Similar search with trillions of time series](https://reader035.vdocument.in/reader035/viewer/2022062308/559040261a28abbd6a8b47cf/html5/thumbnails/15.jpg)
Results
Comparison between:
Naïve
- Z-normalization from start
- full ED (or DTW) calculation
State-of-the-art (SOTA)
- Z-normalization from start
- early abandoning
- LB_Keogh bounding for DTW
UCR Suite
![Page 16: Similar search with trillions of time series](https://reader035.vdocument.in/reader035/viewer/2022062308/559040261a28abbd6a8b47cf/html5/thumbnails/16.jpg)
Results: Baseline Tests on Random Walk
[Figure: bar chart of search time in minutes (y-axis 0 to 30000) for UCR-ED, SOTA-ED, UCR-DTW and SOTA-DTW on random walk data of size Million, Billion and Trillion; |Q| = 128]
![Page 17: Similar search with trillions of time series](https://reader035.vdocument.in/reader035/viewer/2022062308/559040261a28abbd6a8b47cf/html5/thumbnails/17.jpg)
Results: Baseline Tests on Random Walk
[Figure: bar chart of search time in seconds (y-axis 0 to 2500) for UCR-ED, SOTA-ED, UCR-DTW and SOTA-DTW on random walk data of size Million and Billion; |Q| = 128]
![Page 18: Similar search with trillions of time series](https://reader035.vdocument.in/reader035/viewer/2022062308/559040261a28abbd6a8b47cf/html5/thumbnails/18.jpg)
Results: Baseline Tests on Random Walk
$|T| = 2 \times 10^6$
![Page 19: Similar search with trillions of time series](https://reader035.vdocument.in/reader035/viewer/2022062308/559040261a28abbd6a8b47cf/html5/thumbnails/19.jpg)
Results: EEG
[Figure: bar chart of search time in hours (y-axis 0 to 600) on EEG data: UCR-ED 3.4, SOTA-ED 494.3]
![Page 20: Similar search with trillions of time series](https://reader035.vdocument.in/reader035/viewer/2022062308/559040261a28abbd6a8b47cf/html5/thumbnails/20.jpg)
Conclusion
- The approach is simple yet very effective.
- These optimizations can be applied to most distance measures, but may not work for some, such as Hamming distance.