![Page 1: Accelerating Dynamic Time Warping Clustering with a Novel Admissible Pruning Strategy Nurjahan BegumLiudmila Ulanova Jun Wang 1 Eamonn Keogh University](https://reader035.vdocument.in/reader035/viewer/2022062222/5697bff11a28abf838cbb648/html5/thumbnails/1.jpg)
Accelerating Dynamic Time Warping Clustering with a Novel Admissible Pruning Strategy
Nurjahan Begum, Liudmila Ulanova, Jun Wang¹, Eamonn Keogh
University of California, Riverside; ¹University of Texas at Dallas
Outline
• Introduction, Related Work & Background
• Density Peaks (DP) Clustering Algorithm
• Pruning Using DTW Bounds
• Going Anytime: Distance-Computation Ordering Heuristic
• Experimental Evaluation
Problem Description
• This work addresses the problem of robustly clustering large time series datasets with invariance to irrelevant data. The goals are:
• Accuracy
• Invariance to irrelevant data
• Scalability (efficiency, interruptibility)
• Robustness to parameter settings
Accuracy: Using DTW
• For most time series data mining algorithms, the quality of the output depends almost exclusively on the distance measure used.
• A consensus has emerged that DTW is the best distance measure in most domains, almost always outperforming the Euclidean Distance (ED).
• Do DTW and ED converge as datasets grow larger? Not for clustering!
Invariance to Irrelevant Data: Using DP
• It has been suggested that successful time series clustering requires the ability to ignore some data objects, because:
  – anomalous objects are themselves unclusterable;
  – they interfere with the clustering of the clusterable data.
• DP, in contrast to clustering algorithms such as K-means, can ignore anomalous objects.
Efficiency: Pruning Using Both Bounds
• Both DTW and DP are slow: they are CPU-bound, not I/O-bound.
• In some problems (notably similarity search), lower-bound pruning is the main source of speedup, and its effectiveness tends to improve on larger datasets.
• This is not effective for clustering, which needs the distances between all pairs of objects, or at least all distances within a certain range.
• Moreover, because DTW is not a metric (it does not satisfy the triangle inequality), it is hard to build an index to speed it up.
• This work exploits both lower and upper bounds of DTW within the DP framework.
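To make the two bounds concrete, here is a minimal Python sketch. The slides do not name the bounds, so it assumes the common choice of LB_Keogh as the lower bound and the Euclidean distance as the upper bound, for DTW constrained to a Sakoe-Chiba band of half-width `window` on equal-length series. ED is the cost of the diagonal warping path, which lies inside every band, so DTW can only be smaller or equal. The function names are illustrative, not TADPole's code.

```python
import math


def dtw(a, b, window):
    """Band-constrained DTW (Sakoe-Chiba half-width `window`) for equal-length series."""
    n = len(a)
    cost = [[math.inf] * (n + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        # Only cells within `window` of the diagonal are filled; the rest stay infinite.
        for j in range(max(1, i - window), min(n, i + window) + 1):
            step = (a[i - 1] - b[j - 1]) ** 2
            cost[i][j] = step + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return math.sqrt(cost[n][n])


def euclidean(a, b):
    """Upper bound: ED is the cost of the diagonal path, which DTW may only improve on."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def lb_keogh(query, candidate, window):
    """Lower bound: distance from `query` to the envelope of `candidate` under the same band."""
    n = len(query)
    total = 0.0
    for i, q in enumerate(query):
        segment = candidate[max(0, i - window):min(n, i + window + 1)]
        upper, lower = max(segment), min(segment)
        if q > upper:
            total += (q - upper) ** 2
        elif q < lower:
            total += (q - lower) ** 2
    return math.sqrt(total)


q = [0.0, 1.0, 2.0, 1.0, 0.0, -1.0]
c = [0.0, 0.0, 1.0, 2.0, 1.0, 0.0]
lb, exact, ub = lb_keogh(q, c, 1), dtw(q, c, 1), euclidean(q, c)
print(lb, exact, ub)  # lb <= exact <= ub always holds
```

For these two series LB_Keogh and DTW both come out at 1.0 while ED is √5 ≈ 2.236, so here the lower bound is tight and the upper bound is loose.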
Interruptibility: Going Anytime
• What if pruning is still not enough? The user may need to interrupt the computation.
  – This work therefore also adapts the proposed method into an anytime algorithm.
• Anytime algorithms can return a valid solution to a problem even if interrupted before they finish. Desired properties:
  – small setup time;
  – a best-so-far answer available at any time;
  – monotonicity (quality never decreases with more time) and diminishing returns.
Robustness to Parameter Settings: Using DP
• Many clustering algorithms require the user to set many parameters.
• DP requires only two parameters, which are relatively intuitive and not particularly sensitive to the user's choice.
Internal Logic & Required Parameters of DP
• The DP algorithm assumes that cluster centers are surrounded by neighbors of lower local density and lie relatively far from any point of higher local density. For a point i:
  – the local density ρi is the number of points closer to i than a cutoff distance dc;
  – the distance from points of higher density, δi, is the minimum distance from i to any point of higher density.
• The DP algorithm requires two pre-set parameters:
  – the cutoff distance dc;
  – the number of clusters k (which can be determined with a knee-point heuristic).
See Rodriguez, A., & Laio, A. Clustering by Fast Search and Find of Density Peaks. Science, 344(6191), 1492–1496, 2014, for more details.
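A brute-force Python sketch of the two definitions above, computed from a precomputed distance matrix with no pruning. Assumptions: the hypothetical function name `density_peaks`, a strict "closer than dc" test, "higher density" meaning strictly larger ρ, and the Rodriguez & Laio convention that a point with no denser point gets its maximum distance as δ. Toy 1-D values stand in for time series.

```python
def density_peaks(dist, dc):
    """Brute-force DP quantities from an n x n symmetric distance matrix `dist`.

    Returns (rho, delta, nn_higher):
      rho[i]       number of points j != i with dist[i][j] < dc   (local density)
      delta[i]     distance from i to its nearest point of higher density
      nn_higher[i] index of that point, or None if no point is denser
    """
    n = len(dist)
    rho = [sum(1 for j in range(n) if j != i and dist[i][j] < dc) for i in range(n)]
    delta = [0.0] * n
    nn_higher = [None] * n
    for i in range(n):
        higher = [j for j in range(n) if rho[j] > rho[i]]
        if higher:
            nn_higher[i] = min(higher, key=lambda j: dist[i][j])
            delta[i] = dist[i][nn_higher[i]]
        else:
            # Rodriguez & Laio convention: the densest point gets its maximum distance.
            delta[i] = max(dist[i])
    return rho, delta, nn_higher


# Toy 1-D "series" (single values) forming two groups.
points = [0.0, 0.5, 1.0, 10.0, 10.4, 10.8]
dist = [[abs(a - b) for b in points] for a in points]
rho, delta, nn_higher = density_peaks(dist, dc=0.6)
print(rho)        # [1, 2, 1, 1, 2, 1]
print(nn_higher)  # [1, None, 1, 4, None, 4]
```

Every pair is compared here, which is the all-pairs cost that the pruning in TADPole aims to avoid.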
Four Phases of DP
• Local Density Calculation
• Distance to Higher Density Points Computation
• Cluster Center Selection
• Cluster Assignment
Phase 1: Local Density Calculation
Phase 2: Distance to Higher Density Points Computation
Phase 3: Cluster Center Selection
• The cluster centers are selected using a simple heuristic: points with higher values of (ρi×δi) are more likely to be centers.
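The ρi×δi heuristic amounts to ranking points by the product and keeping the top k. A minimal sketch with hypothetical values, assuming k has already been chosen; `select_centers` is an illustrative name.

```python
def select_centers(rho, delta, k):
    """Return the indices of the k points with the largest gamma_i = rho_i * delta_i."""
    gamma = [r * d for r, d in zip(rho, delta)]
    return sorted(range(len(gamma)), key=lambda i: gamma[i], reverse=True)[:k]


# Hypothetical values: points 0 and 3 are both dense and far from denser points.
rho = [5, 4, 1, 3]
delta = [9.0, 0.5, 2.0, 6.0]
print(select_centers(rho, delta, k=2))  # [0, 3]
```

Point 1 is dense but close to a denser point, and point 2 is isolated but sparse, so neither scores highly.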
Phase 4: Cluster Assignment
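In the original DP algorithm of Rodriguez & Laio, each remaining point takes the cluster label of its nearest neighbor of higher density. A sketch assuming that rule; `nn_higher` (the index of each point's nearest denser point, as produced in Phase 2) and the function name are hypothetical.

```python
def assign_clusters(rho, nn_higher, centers):
    """Label each point with the cluster of its nearest higher-density neighbor.

    nn_higher[i] is the index of i's nearest denser point (None if no point is denser).
    Visiting points in decreasing density guarantees that neighbor is labelled first.
    """
    labels = [-1] * len(rho)
    for cluster_id, idx in enumerate(centers):
        labels[idx] = cluster_id
    for i in sorted(range(len(rho)), key=lambda i: rho[i], reverse=True):
        if labels[i] == -1:
            if nn_higher[i] is None:
                raise ValueError(f"point {i} has no denser neighbor; it should be a center")
            labels[i] = labels[nn_higher[i]]
    return labels


rho = [5, 4, 1, 3]
nn_higher = [None, 0, 3, 0]
print(assign_clusters(rho, nn_higher, centers=[0, 3]))  # [0, 0, 1, 1]
```

This single pass is cheap once δi and its nearest denser point are known; the distance computations all happen in Phases 1 and 2.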
Why DP?
• Ability to ignore outliers.
• Ability to handle clusters of arbitrary shape.
• Few user-set parameters, with low sensitivity to their values.
• Amenability to distance-computation pruning and to conversion into an anytime algorithm.
Pruning Using DTW Bounds
• The proposed algorithm, TADPole (Time-series Anytime DP), requires distance computations in the following two phases:
• Phase 1: local density computation
• Phase 2: distance to higher density points computation (NN distance computation)
Pruning in the Local Density Computation Phase
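One admissible way to use the bounds when counting neighbors, sketched in Python under the assumption that lb(i, j) ≤ DTW(i, j) ≤ ub(i, j): if the upper bound is already below dc the pair is certainly within the cutoff, if the lower bound is at or above dc it certainly is not, and only pairs whose bounds straddle dc need an exact DTW call. Small hand-made matrices stand in for real bound and DTW values; the names are hypothetical.

```python
def pruned_local_density(n, lb, ub, exact, dc):
    """Local densities rho_i = |{j : d(i, j) < dc}| using bounds to skip exact DTW calls.

    lb, ub, exact: callables (i, j) -> float with lb(i, j) <= exact(i, j) <= ub(i, j).
    Returns (rho, exact_calls).
    """
    rho = [0] * n
    exact_calls = 0
    for i in range(n):
        for j in range(i + 1, n):
            if ub(i, j) < dc:        # d <= ub < dc: certainly a neighbor, no DTW needed
                close = True
            elif lb(i, j) >= dc:     # d >= lb >= dc: certainly not a neighbor
                close = False
            else:                    # bounds straddle dc: fall back to the exact distance
                exact_calls += 1
                close = exact(i, j) < dc
            if close:
                rho[i] += 1
                rho[j] += 1
    return rho, exact_calls


# Toy bound and distance matrices (LB <= D <= UB elementwise) for three objects.
LB = [[0, 0.5, 3.5], [0.5, 0, 1.0], [3.5, 1.0, 0]]
D = [[0, 1.0, 4.0], [1.0, 0, 2.0], [4.0, 2.0, 0]]
UB = [[0, 1.5, 5.0], [1.5, 0, 3.0], [5.0, 3.0, 0]]
rho, calls = pruned_local_density(
    3, lambda i, j: LB[i][j], lambda i, j: UB[i][j], lambda i, j: D[i][j], dc=2.5
)
print(rho, calls)  # [1, 2, 1] 1
```

Only one of the three pairs needs an exact distance; the densities are the same as a brute-force count would give.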
Pruning in the NN Distance Computation Phase
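For δi, the smallest upper bound to any denser object already caps the answer, and any candidate whose lower bound exceeds the best value found so far can be skipped without changing the result. A sketch of that idea, assuming callables lb, ub and exact with lb ≤ exact ≤ ub; candidates are visited in increasing lower-bound order so the loop can stop at the first one that fails the test. This is an illustration with hypothetical names, not the exact TADPole procedure.

```python
import math


def pruned_nn_distance(i, rho, lb, ub, exact):
    """delta_i (distance to the nearest higher-density point) with bound-based pruning.

    lb, ub, exact: callables (i, j) -> float with lb(i, j) <= exact(i, j) <= ub(i, j).
    Returns (delta_i, nearest_index, exact_calls); (None, None, 0) if no point is denser.
    """
    higher = [j for j in range(len(rho)) if rho[j] > rho[i]]
    if not higher:
        return None, None, 0
    # The smallest upper bound already caps delta_i before any DTW is computed.
    threshold = min(ub(i, j) for j in higher)
    best_d, best_j, exact_calls = math.inf, None, 0
    # Visit candidates by increasing lower bound so we can stop early.
    for j in sorted(higher, key=lambda j: lb(i, j)):
        if lb(i, j) > min(threshold, best_d):
            break  # this and all later candidates cannot be nearer
        exact_calls += 1
        d = exact(i, j)
        if d < best_d:
            best_d, best_j = d, j
    return best_d, best_j, exact_calls


# Toy values for object 0 against three denser objects (LB <= D <= UB).
rho = [1, 3, 2, 2]
LB0 = [0, 1.5, 4.0, 5.5]
D0 = [0, 2.0, 5.0, 6.0]
UB0 = [0, 2.5, 7.0, 8.0]
result = pruned_nn_distance(
    0, rho, lambda i, j: LB0[j], lambda i, j: UB0[j], lambda i, j: D0[j]
)
print(result)  # (2.0, 1, 1)
```

Two of the three exact distance computations are skipped, yet the returned δ0 equals the brute-force minimum.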
Multidimensional Time Series Clustering
Per-dimension distances are calculated independently, then summed.
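The "independent calculation, then summation" idea can be sketched as below: each dimension is warped separately and the per-dimension distances are added. Because each per-dimension DTW is at most the per-dimension ED, the summed ED is still an upper bound on the summed DTW (and summed lower bounds likewise stay lower bounds), so the pruning carries over. Whether distances are summed before or after the square root is not stated on the slide; this sketch assumes after, and all names are hypothetical.

```python
import math


def dtw_1d(a, b, window):
    """Band-constrained DTW for one dimension (equal-length series)."""
    n = len(a)
    cost = [[math.inf] * (n + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(max(1, i - window), min(n, i + window) + 1):
            step = (a[i - 1] - b[j - 1]) ** 2
            cost[i][j] = step + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return math.sqrt(cost[n][n])


def ed_1d(a, b):
    """Euclidean distance for one dimension (upper bound on dtw_1d)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def dtw_multi(A, B, window):
    """Each dimension is warped independently; the per-dimension distances are summed."""
    return sum(dtw_1d(a, b, window) for a, b in zip(A, B))


def ed_multi(A, B):
    """Summing per-dimension upper bounds gives an upper bound on dtw_multi."""
    return sum(ed_1d(a, b) for a, b in zip(A, B))


# Two 2-dimensional series: A[d] and B[d] are dimension d.
A = [[0.0, 1.0, 2.0, 1.0, 0.0, -1.0], [1.0, 1.0, 0.0, 0.0, 1.0, 1.0]]
B = [[0.0, 0.0, 1.0, 2.0, 1.0, 0.0], [1.0, 0.0, 0.0, 1.0, 1.0, 1.0]]
print(dtw_multi(A, B, window=1), ed_multi(A, B))
```

With `window=0` only the diagonal path is allowed, so `dtw_multi` collapses to `ed_multi`, which is a quick sanity check of the bound.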
Pruning Effectiveness: Baselines
• Brute force: the full all-pairs distance matrix is computed.
• Oracle (post hoc): only the strictly necessary distance computations are performed:
  – local density phase: only the distance computations that contribute to an object's actual density;
  – NN distance phase: only the actual NN distances.
Pruning Effectiveness: Illustration
• Dataset: StarLightCurves
Going Anytime: Which Phase Is Amenable?
• TADPole requires distance computations in the following two phases:
• Phase 1: local density computation
  – Not amenable to anytime ordering; it constitutes the setup time.
• Phase 2: NN distance computation
  – Amenable to anytime ordering!
Going Anytime: Contestants
• Oracle: at each step, this ordering “cheats” by choosing the object that maximizes the current Rand Index.
• Top-to-bottom, left-to-right order? Too vulnerable to “luck”!
• Random ordering: less vulnerable to luck.
• The proposed heuristic: order objects by ρ × ub.
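Why ρ × ub? Centers are chosen by γi = ρi × δi, and ub ≥ δi, so ρi × ub is an optimistic estimate of γi that is available right after the setup phase; refining the most promising potential centers first is intended to improve the clustering soonest. A sketch of that ordering, assuming ub means an upper bound on each object's NN distance δi (the slide does not define it) and simulating the user interruption with a hypothetical `budget`:

```python
def anytime_nn_refinement(rho, ub_delta, exact_delta, budget):
    """Refine delta estimates in decreasing rho * ub order until the budget runs out.

    ub_delta[i]: an upper bound on delta_i, available after the setup phase.
    exact_delta(i): the exact (expensive) NN distance for object i.
    budget: number of objects refined before a simulated user interruption.
    Returns (best-so-far delta estimates, processing order).
    """
    order = sorted(range(len(rho)), key=lambda i: rho[i] * ub_delta[i], reverse=True)
    delta = list(ub_delta)  # valid (if loose) answer before any refinement
    for step, i in enumerate(order):
        if step == budget:
            break  # interrupted: every estimate is still an upper bound on the truth
        delta[i] = exact_delta(i)
    return delta, order


rho = [5, 1, 3, 2]
ub_delta = [4.0, 6.0, 1.0, 2.0]
true_delta = [3.0, 5.0, 0.5, 2.0]
delta, order = anytime_nn_refinement(rho, ub_delta, lambda i: true_delta[i], budget=2)
print(order)  # [0, 1, 3, 2]
print(delta)  # [3.0, 5.0, 1.0, 2.0]
```

Each refinement can only lower an estimate toward its true value, so the best-so-far answer improves monotonically as the budget grows.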
Going Anytime: Effectiveness Illustration
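The Oracle contestant above is defined in terms of the Rand Index: the fraction of object pairs on which two clusterings agree (both in the same cluster, or both in different clusters). A minimal sketch, with a hypothetical function name:

```python
from itertools import combinations


def rand_index(labels_true, labels_pred):
    """Fraction of object pairs on which two labelings agree (same/same or different/different)."""
    pairs = list(combinations(range(len(labels_true)), 2))
    agree = sum(
        (labels_true[i] == labels_true[j]) == (labels_pred[i] == labels_pred[j])
        for i, j in pairs
    )
    return agree / len(pairs)


print(rand_index([0, 0, 1, 1], [1, 1, 0, 0]))  # 1.0 (label names do not matter)
print(rand_index([0, 0, 1, 1], [0, 1, 1, 1]))  # 0.5
```

Because it compares pair relationships rather than label values, the measure is unaffected by how clusters happen to be numbered.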
Clustering Quality & Efficiency Evaluation
TADPole is at least an order of magnitude faster than the rival methods.
Parameter Sensitivity Evaluation
Performed on the Symbols dataset with k = 6.
Conclusions & Comments
• Pruning using both bounds
• Anytime algorithm
• More borrowing than originating!