tutorial 2 finding repeated structure in time series ...mueen/tutorial/sdm2015tutorial2.pdfwe have a...
TRANSCRIPT
![Page 1: Tutorial 2 Finding Repeated Structure in Time Series ...mueen/Tutorial/SDM2015Tutorial2.pdfWe have a discrete (binary) ^text _ that notes if the hand is near a cleaning instrument,](https://reader033.vdocument.in/reader033/viewer/2022053011/5f0e65c87e708231d43f0ce3/html5/thumbnails/1.jpg)
Abdullah MueenUniversity of New Mexico, USA
Eamonn KeoghUniversity of California Riverside, USA
Finding Repeated Structure in Time Series: Algorithms and Applications
Tutorial 2
Funding by NSFIIS-1161997 II
Slides available athttp://www.cs.unm.edu/~mueen/Tutorial/SDM2015Tutorial2.pdf
(we will start 5 min late to allow folks to find the room)
1
![Page 2: Tutorial 2 Finding Repeated Structure in Time Series ...mueen/Tutorial/SDM2015Tutorial2.pdfWe have a discrete (binary) ^text _ that notes if the hand is near a cleaning instrument,](https://reader033.vdocument.in/reader033/viewer/2022053011/5f0e65c87e708231d43f0ce3/html5/thumbnails/2.jpg)
Tutorial Structure
• I will start with applications and talk about algorithms after that.
• There will be four Q&A segments. Please hold your question till the next segment.
• There is a feedback form available. Negative/positive, anonymous/known feedbacks are welcome.
• There will be a break at 5:00PM for 10 minutes.
2
![Page 3: Tutorial 2 Finding Repeated Structure in Time Series ...mueen/Tutorial/SDM2015Tutorial2.pdfWe have a discrete (binary) ^text _ that notes if the hand is near a cleaning instrument,](https://reader033.vdocument.in/reader033/viewer/2022053011/5f0e65c87e708231d43f0ce3/html5/thumbnails/3.jpg)
What are Time Series?
A time series is a collection of observations made sequentially in time.
25.175025.225025.250025.250025.275025.325025.350025.350025.400025.400025.325025.225025.200025.1750
..
..24.625024.675024.675024.625024.625024.625024.675024.7500
0 50 100 150 200 250 300 350 400 450 50023
24
25
26
27
28
29
3
![Page 4: Tutorial 2 Finding Repeated Structure in Time Series ...mueen/Tutorial/SDM2015Tutorial2.pdfWe have a discrete (binary) ^text _ that notes if the hand is near a cleaning instrument,](https://reader033.vdocument.in/reader033/viewer/2022053011/5f0e65c87e708231d43f0ce3/html5/thumbnails/4.jpg)
Repeated Pattern (Motif)
Find the subsequences having very high similarity to each other.
0 2000 4000 6000 8000 100000
102030
0 100 200 300 400 500 600 700 800
Finding Motifs in Time Series, Jessica Lin, Eamonn Keogh, Stefano Lonardi, Pranav Patel, KDD 2002
时间序列中的重复模式
4
![Page 5: Tutorial 2 Finding Repeated Structure in Time Series ...mueen/Tutorial/SDM2015Tutorial2.pdfWe have a discrete (binary) ^text _ that notes if the hand is near a cleaning instrument,](https://reader033.vdocument.in/reader033/viewer/2022053011/5f0e65c87e708231d43f0ce3/html5/thumbnails/5.jpg)
General Outline
• Applications (50 minutes)
– As Subroutines in Data Mining
– In Other Scientific Research
• Algorithms (100 minutes)
– Uni-dimensional
– Multi-dimensional
– Open Problems
5
![Page 6: Tutorial 2 Finding Repeated Structure in Time Series ...mueen/Tutorial/SDM2015Tutorial2.pdfWe have a discrete (binary) ^text _ that notes if the hand is near a cleaning instrument,](https://reader033.vdocument.in/reader033/viewer/2022053011/5f0e65c87e708231d43f0ce3/html5/thumbnails/6.jpg)
Applications Outline
• Applications– As Subroutines in Data Mining
• Never Ending Learning• Time Series Clustering• Rule Discovery• Dictionary Building
– In Other Scientific Research• Data center chiller management• Worm locomotion analysis• Physiological Prediction• Activity recognition
– Motifs in Other Data-types• Audio• Shapes• Motion
6
![Page 7: Tutorial 2 Finding Repeated Structure in Time Series ...mueen/Tutorial/SDM2015Tutorial2.pdfWe have a discrete (binary) ^text _ that notes if the hand is near a cleaning instrument,](https://reader033.vdocument.in/reader033/viewer/2022053011/5f0e65c87e708231d43f0ce3/html5/thumbnails/7.jpg)
…Y dijo Dios: Sea la luz; y fue la luz. Y vio Dios que la luz era buena . Y llamó Dios a …
…And God said, “Let there be light”; and there was light. And God saw the light, that it was good. And God …
If you have parallel texts, then over time you can learn a dictionary with high accuracy.
Motifs allow us to learn, forever, without an explicit teacher…
7
![Page 8: Tutorial 2 Finding Repeated Structure in Time Series ...mueen/Tutorial/SDM2015Tutorial2.pdfWe have a discrete (binary) ^text _ that notes if the hand is near a cleaning instrument,](https://reader033.vdocument.in/reader033/viewer/2022053011/5f0e65c87e708231d43f0ce3/html5/thumbnails/8.jpg)
…Y dijo Dios: Sea la luz; y fue la luz. Y vio Dios que la luz era buena . Y llamó Dios a …
…And God said, “Let there be light”; and there was light. And God saw the light, that it was good. And God …
Suppose however that the unknown “language” is not discrete, but real-valued time series? In this case, repeated pattern discovery can help*…
*Yuan Hao, Yanping Chen et al (2013). Towards Never-Ending Learning from Time Series Streams. SIGKDD 2013
Note the mapping is non-linear, the learning algorithms in this domain are non-trivial.
If you have parallel texts, then over time you can learn a dictionary with high accuracy.
Motifs allow us to learn, forever, without an explicit teacher…
8
![Page 9: Tutorial 2 Finding Repeated Structure in Time Series ...mueen/Tutorial/SDM2015Tutorial2.pdfWe have a discrete (binary) ^text _ that notes if the hand is near a cleaning instrument,](https://reader033.vdocument.in/reader033/viewer/2022053011/5f0e65c87e708231d43f0ce3/html5/thumbnails/9.jpg)
Motifs allow us to learn, forever, without an explicit teacher…
This dataset contains standard IADL housekeeping activities (vacuuming, ironing, dusting, brooming,, watering plants etc). We have a discrete (binary) “text” that notes if the hand is near a cleaning instrument, and a real-valued accelerometer “text”
P (X-axis acceleration of wrist)
0 1000 2000 3000
offon
Glove
B (RFID tag)
offon
Iron(omitted)
5 seconds
Binary “text”
Real-valued “text”
9
![Page 10: Tutorial 2 Finding Repeated Structure in Time Series ...mueen/Tutorial/SDM2015Tutorial2.pdfWe have a discrete (binary) ^text _ that notes if the hand is near a cleaning instrument,](https://reader033.vdocument.in/reader033/viewer/2022053011/5f0e65c87e708231d43f0ce3/html5/thumbnails/10.jpg)
We can run motif discovery on the time series stream. If we find motifs, we can see if they correlate with the discrete streams…
P (X-axis acceleration of right wrist)
0 1000 2000 3000
offon
Glove
B (RFID tag)
C1 observed C1 observed C1 observed
Glove 1/1
Iron 0/1
(omitted)
offon
Iron
(omitted)
Glove 2/2
Iron 0/2
(omitted)
Glove 3/3
Iron 0/3
(omitted)
5 seconds
In this snippet, the motifs seem to correlate with the presence of a glove…
10
![Page 11: Tutorial 2 Finding Repeated Structure in Time Series ...mueen/Tutorial/SDM2015Tutorial2.pdfWe have a discrete (binary) ^text _ that notes if the hand is near a cleaning instrument,](https://reader033.vdocument.in/reader033/viewer/2022053011/5f0e65c87e708231d43f0ce3/html5/thumbnails/11.jpg)
How well does this work?
Over a hour of activity, we learn to recognize a behavior in the time series that indicates the user is putting on a glove.
0 200 4000 200 400
Learned concept C1
True positives
False positives
Discovered pattern
0 minutes
0
0.5
1
iron P(C1=fan) = 0.07
P(C1=iron) = 0.23
P(C1=glove) = 0.70
fan
P(X-axis acceleration of right wrist)Events shown in last slide take place here
55 minutes
glove
Note: There are false positives, but we considered only a single axis for simplicity. 11
![Page 12: Tutorial 2 Finding Repeated Structure in Time Series ...mueen/Tutorial/SDM2015Tutorial2.pdfWe have a discrete (binary) ^text _ that notes if the hand is near a cleaning instrument,](https://reader033.vdocument.in/reader033/viewer/2022053011/5f0e65c87e708231d43f0ce3/html5/thumbnails/12.jpg)
200 400 600 800 1000 12000
And how would we evaluate our answer?
Motifs allow us to cluster subsequences of a time series…
12Thanawin Rakthanmanon, Eamonn Keogh, Stefano Lonardi, and Scott Evans (2011). Time Series Epenthesis: Clustering Time Series Streams
Requires Ignoring Some Data. ICDM 2011
![Page 13: Tutorial 2 Finding Repeated Structure in Time Series ...mueen/Tutorial/SDM2015Tutorial2.pdfWe have a discrete (binary) ^text _ that notes if the hand is near a cleaning instrument,](https://reader033.vdocument.in/reader033/viewer/2022053011/5f0e65c87e708231d43f0ce3/html5/thumbnails/13.jpg)
== Poem ==In a sort of Runic rhyme,To the throbbing of the bells--Of the bells, bells, bells,To the sobbing of the bells;Keeping time, time, time,As he knells, knells, knells,In a happy Runic rhyme,To the rolling of the bells,--Of the bells, bells, bells--To the tolling of the bells,Of the bells, bells, bells, bells,Bells, bells, bells,--To the moaning and the groan-ing of the bells.
Data: last 30 seconds from 4-min poem, The Bells by Edgar Allan PoeEdgar Allan Poe
200 400 600 800 1000 12000
And how would we evaluate our answer?
Motifs allow us to cluster subsequences of a time series…
13
![Page 14: Tutorial 2 Finding Repeated Structure in Time Series ...mueen/Tutorial/SDM2015Tutorial2.pdfWe have a discrete (binary) ^text _ that notes if the hand is near a cleaning instrument,](https://reader033.vdocument.in/reader033/viewer/2022053011/5f0e65c87e708231d43f0ce3/html5/thumbnails/14.jpg)
== Poem ==In a sort of Runic rhyme,To the throbbing of the bells--Of the bells, bells, bells,To the sobbing of the bells;Keeping time, time, time,As he knells, knells, knells,In a happy Runic rhyme,To the rolling of the bells,--Of the bells, bells, bells--To the tolling of the bells,Of the bells, bells, bells, bells,Bells, bells, bells,--To the moaning and the groan-ing of the bells.
== Text in each clusters ==bells, bells, bells,Bells, bells, bells,
Of the bells, bells, bells,Of the bells, bells, bells--
To the throbbing of the bells--To the sobbing of the bells;To the tolling of the bells,
To the rolling of the bells,--To the moaning and the groan-
time, time, time,knells, knells, knells,
sort of Runic rhyme,groaning of the bells.
And how would we evaluate our answer?
200 400 600 800 1000 12000
Motifs allow us to cluster subsequences of a time series…
14
![Page 15: Tutorial 2 Finding Repeated Structure in Time Series ...mueen/Tutorial/SDM2015Tutorial2.pdfWe have a discrete (binary) ^text _ that notes if the hand is near a cleaning instrument,](https://reader033.vdocument.in/reader033/viewer/2022053011/5f0e65c87e708231d43f0ce3/html5/thumbnails/15.jpg)
Key observations that make this possible:
• Time Series Motifs!
• We are willing to allow some data to be unexplained by the clustering
• We score the possible clustering's with MDL, this is parameter-free!
• Allowing the clusters to be of different lengths/sizes
Thanawin Rakthanmanon, Eamonn Keogh, Stefano Lonardi, and Scott Evans (2011). Time Series Epenthesis: Clustering Time Series Streams
Requires Ignoring Some Data. ICDM 2011
200 400 600 800 1000 12000
Motifs allow us to cluster subsequences of a time series…
15
![Page 16: Tutorial 2 Finding Repeated Structure in Time Series ...mueen/Tutorial/SDM2015Tutorial2.pdfWe have a discrete (binary) ^text _ that notes if the hand is near a cleaning instrument,](https://reader033.vdocument.in/reader033/viewer/2022053011/5f0e65c87e708231d43f0ce3/html5/thumbnails/16.jpg)
Motifs are useful, but can we predict the future?
0 1000 2000 3000 4000 5000 6000 7000 8000-30
-25
-20
-15
-10
What happens next?
Prediction vs. Forecasting (informal definitions)
Forecasting is “always on”, it constantly predicts a value say, two minutes out (we are not doing this)
Prediction only make a prediction occasionally, when it is sure what will happen next
(unpublished work, email Dr. Keogh for preprint)
16
![Page 17: Tutorial 2 Finding Repeated Structure in Time Series ...mueen/Tutorial/SDM2015Tutorial2.pdfWe have a discrete (binary) ^text _ that notes if the hand is near a cleaning instrument,](https://reader033.vdocument.in/reader033/viewer/2022053011/5f0e65c87e708231d43f0ce3/html5/thumbnails/17.jpg)
Why Predict the (short-term) Future?If a robot can predict that is it about to fall, it may be able to..
• Prevent the fall• Mitigate the damage of the fall
More importantly, if the robot can predict a human’s actions
• The robot could catch the human!• This would allow more natural human/robot interaction.• Real time is not fast enough for interaction! We need to be a half second before real time.•
• Other examples:• Predict a car crash, tighten seatbelts, apply brakes• Predict the next spoken word after ‘data’ is ‘mining’, then begin prefetching WebPages..• etc
17
![Page 18: Tutorial 2 Finding Repeated Structure in Time Series ...mueen/Tutorial/SDM2015Tutorial2.pdfWe have a discrete (binary) ^text _ that notes if the hand is near a cleaning instrument,](https://reader033.vdocument.in/reader033/viewer/2022053011/5f0e65c87e708231d43f0ce3/html5/thumbnails/18.jpg)
Previous attempts at this have largely failed…
However, we can do this, and time series motifs are the key tool
The rule discovery technique will use:• Time Series Motifs• MDL (minimum description length)• Admissible speed-up techniques (not discussed here)
18
![Page 19: Tutorial 2 Finding Repeated Structure in Time Series ...mueen/Tutorial/SDM2015Tutorial2.pdfWe have a discrete (binary) ^text _ that notes if the hand is near a cleaning instrument,](https://reader033.vdocument.in/reader033/viewer/2022053011/5f0e65c87e708231d43f0ce3/html5/thumbnails/19.jpg)
Let us start by finding motifs
0 1000 2000 3000 4000 5000 6000 7000 8000 9000-30
-25
-20
-15
-10
0 20 40 60 80 100
First occurrence
Second occurrence
19
![Page 20: Tutorial 2 Finding Repeated Structure in Time Series ...mueen/Tutorial/SDM2015Tutorial2.pdfWe have a discrete (binary) ^text _ that notes if the hand is near a cleaning instrument,](https://reader033.vdocument.in/reader033/viewer/2022053011/5f0e65c87e708231d43f0ce3/html5/thumbnails/20.jpg)
We can convert the motifs to a rule
0 1000 2000 3000 4000 5000 6000 7000 8000 9000-30
-25
-20
-15
-10
0 20 40 60 80 100
0 20 40 60 0 20 40
t1 = 7.58
maxlag = 0
We can use the motif to make a rule…
IF we see thisshape, (antecedent)
THEN we see thatshape, (consequent)
within maxlag time
The Euclidean distance between thisshape and the observed window must be within a threshold t1 = 7.58
20
![Page 21: Tutorial 2 Finding Repeated Structure in Time Series ...mueen/Tutorial/SDM2015Tutorial2.pdfWe have a discrete (binary) ^text _ that notes if the hand is near a cleaning instrument,](https://reader033.vdocument.in/reader033/viewer/2022053011/5f0e65c87e708231d43f0ce3/html5/thumbnails/21.jpg)
We can monitor streaming data with our rule..
8000 9000
0 20 40 60 0 20 40
t1 = 7.58
maxlag = 0
21
![Page 22: Tutorial 2 Finding Repeated Structure in Time Series ...mueen/Tutorial/SDM2015Tutorial2.pdfWe have a discrete (binary) ^text _ that notes if the hand is near a cleaning instrument,](https://reader033.vdocument.in/reader033/viewer/2022053011/5f0e65c87e708231d43f0ce3/html5/thumbnails/22.jpg)
The rule gets invoked…
0 20 40 60 0 20 40
t1 = 7.58
maxlag = 0
5685 5735 5785 5835 5885
rule fires here predicting this shape’s occurrence
22
![Page 23: Tutorial 2 Finding Repeated Structure in Time Series ...mueen/Tutorial/SDM2015Tutorial2.pdfWe have a discrete (binary) ^text _ that notes if the hand is near a cleaning instrument,](https://reader033.vdocument.in/reader033/viewer/2022053011/5f0e65c87e708231d43f0ce3/html5/thumbnails/23.jpg)
It seems to work!
0 20 40 60 0 20 40
t1 = 7.58
maxlag = 0
5685 5735 5785 5835 5885
rule fires here predicting this shape’s occurrence
23
![Page 24: Tutorial 2 Finding Repeated Structure in Time Series ...mueen/Tutorial/SDM2015Tutorial2.pdfWe have a discrete (binary) ^text _ that notes if the hand is near a cleaning instrument,](https://reader033.vdocument.in/reader033/viewer/2022053011/5f0e65c87e708231d43f0ce3/html5/thumbnails/24.jpg)
What is the ground truth?
0 1000 2000 3000 4000 5000 6000 7000 8000 9000-30
-25
-20
-15
-10
0 20 40 60 80 100
at door chambermy
The first verse of The Raven by Poe in MFCC space
Once upon a midnight dreary, while I pondered weak and weary.. ..rapping at my chamber door……
The phrase “at my chamber door” does appear 6 more times, and we do fire our rule correctly each time, and have no false positives.
What are we invariant to? • Who is speaking? Somewhat, we can handle other males, but females are tricky.• Rate of speech? To a large extent, yes.• Foreign accents? Sore throat? etc
0 20 40 60 0 20 40
t1 = 7.58
maxlag = 0
24
![Page 25: Tutorial 2 Finding Repeated Structure in Time Series ...mueen/Tutorial/SDM2015Tutorial2.pdfWe have a discrete (binary) ^text _ that notes if the hand is near a cleaning instrument,](https://reader033.vdocument.in/reader033/viewer/2022053011/5f0e65c87e708231d43f0ce3/html5/thumbnails/25.jpg)
Why we need the Maxlagparameter
Here the maxlag depends on the number of floors we have in our building.
We can hand-edit this rule to generalize for short buildings to tall buildings
Can physicians edit medical rules to generalize from male to female…
t1 = 0.4
maxlag = 4 sec 0 40 80 120-11
-10
-9 waiting for elevator
stepping inside
elevator moves up
elevator stops
walking away
y-
acce
lera
tio
n
t1 = 0.4
maxlag = 4 sec 0 40 80 120-11
-10
-9 waiting for elevator
stepping inside
elevator moves up
elevator stops
walking away
y-
acce
lera
tio
n
25
![Page 26: Tutorial 2 Finding Repeated Structure in Time Series ...mueen/Tutorial/SDM2015Tutorial2.pdfWe have a discrete (binary) ^text _ that notes if the hand is near a cleaning instrument,](https://reader033.vdocument.in/reader033/viewer/2022053011/5f0e65c87e708231d43f0ce3/html5/thumbnails/26.jpg)
maxlag = 20 minutes
0 120 180015500 16000 16500 17000
IF we see a Clothes Washer used
THEN we will see Clothes Dryer used within 20 minutes27
This works, really!
![Page 27: Tutorial 2 Finding Repeated Structure in Time Series ...mueen/Tutorial/SDM2015Tutorial2.pdfWe have a discrete (binary) ^text _ that notes if the hand is near a cleaning instrument,](https://reader033.vdocument.in/reader033/viewer/2022053011/5f0e65c87e708231d43f0ce3/html5/thumbnails/27.jpg)
Insect Behavior AnalysisBeet Leafhopper (Circulifer tenellus)
plant membraneStylet
voltage source
input resistor
V
0 50 100 150 2000
1
2
to insectconductive glue
voltage reading
to soil near plant
0 1 2 3 4 5 6 7 8x 104
50010001500200025003000350040004500
Additional examples
of the motif
0 50 100 150 200 250 300 350 400-3
-2
-1
0
1
2
3
4
5
6
Instance at 20,925
Instance at 25,473
The electrical penetration graph or EPG is a system used by biologists to study the interaction of insects with plants.
15 minutes of EPG recorded on Beet Leafhopper
As a bead of sticky secretion, which is by-product of sap feeding, is ejected, it temporarily forms a highly conductive bridge between the insect and the plant.
Exact Discovery of Time Series Motifs. A Mueen, et al. SDM, 2009.29
![Page 28: Tutorial 2 Finding Repeated Structure in Time Series ...mueen/Tutorial/SDM2015Tutorial2.pdfWe have a discrete (binary) ^text _ that notes if the hand is near a cleaning instrument,](https://reader033.vdocument.in/reader033/viewer/2022053011/5f0e65c87e708231d43f0ce3/html5/thumbnails/28.jpg)
900 1000 1100 1200 900 1000 1100 1200-5
0
5
Length :412600 800 1000 600 800 1000
-5
0
5
Length :565
7000 7200 7400 7600 7000 7200 7400 7600-5
0
5
Length :7995200 5400 5600 5800 5200 5400 5600 5800
-5
0
5
Length :799
3900 4000 4100 4200 3900 4000 4100-5
0
5
Length :3344800 4900 5000 5100 4800 4900 5000 5100
-5
0
5
Length :384
N
S
N
Insect Behavior Analysis
More motifs reveal different feeding patterns of Beet Leafhopper. 30
![Page 29: Tutorial 2 Finding Repeated Structure in Time Series ...mueen/Tutorial/SDM2015Tutorial2.pdfWe have a discrete (binary) ^text _ that notes if the hand is near a cleaning instrument,](https://reader033.vdocument.in/reader033/viewer/2022053011/5f0e65c87e708231d43f0ce3/html5/thumbnails/29.jpg)
Applications Outline
• Applications– As Subroutines in Data Mining
• Never Ending Learning• Time Series Clustering• Rule Discovery• Dictionary Building
– In Other Scientific Research• Data center chiller management• Worm locomotion analysis• Physiological Prediction• Activity recognition
– Motifs in Other Data-types• Audio• Shapes• Motion
31
![Page 30: Tutorial 2 Finding Repeated Structure in Time Series ...mueen/Tutorial/SDM2015Tutorial2.pdfWe have a discrete (binary) ^text _ that notes if the hand is near a cleaning instrument,](https://reader033.vdocument.in/reader033/viewer/2022053011/5f0e65c87e708231d43f0ce3/html5/thumbnails/30.jpg)
Sustainable Operation and Management of Data Center Chillers using Temporal Data Mining
HP Labs with Virginia Tech
“Our primary goal is to link the time series temperature data gathered from chiller units to high level sustainability characterizations… thus using time series motifs as a crucial intermediate representation to aid in data reduction.”
“switching from motif 8 to motif 5 gives us a nearly $40,000 in annual savings!” Patnaik et al. SIGKDD09
32
![Page 31: Tutorial 2 Finding Repeated Structure in Time Series ...mueen/Tutorial/SDM2015Tutorial2.pdfWe have a discrete (binary) ^text _ that notes if the hand is near a cleaning instrument,](https://reader033.vdocument.in/reader033/viewer/2022053011/5f0e65c87e708231d43f0ce3/html5/thumbnails/31.jpg)
A dictionary of behavioral motifs reveals clusters of genes affecting C. elegans locomotion
Brown et al. PNAS 2013; 110(2): 791–796.
Laboratory of Molecular Biology, Cambridge, United Kingdom
Goal: Detect genotype by using the locomotion only.Convert postures to four dimensional time series.
33
![Page 32: Tutorial 2 Finding Repeated Structure in Time Series ...mueen/Tutorial/SDM2015Tutorial2.pdfWe have a discrete (binary) ^text _ that notes if the hand is near a cleaning instrument,](https://reader033.vdocument.in/reader033/viewer/2022053011/5f0e65c87e708231d43f0ce3/html5/thumbnails/32.jpg)
Brown A E X et al. PNAS 2013;110:791-796
A dictionary of behavioral motifs reveals clusters of genes affecting C. eleganslocomotion
34
![Page 33: Tutorial 2 Finding Repeated Structure in Time Series ...mueen/Tutorial/SDM2015Tutorial2.pdfWe have a discrete (binary) ^text _ that notes if the hand is near a cleaning instrument,](https://reader033.vdocument.in/reader033/viewer/2022053011/5f0e65c87e708231d43f0ce3/html5/thumbnails/33.jpg)
Variability in motif structure is lower in juvenile Directed than in Undirected and similar to that in adult song.
35Social performance reveals unexpected vocal competency in young songbirdsSatoshi Kojima and Allison J. Doupe, PNAS 2011, 108 (4) 1687-1692.
Amplitude Envelop
![Page 34: Tutorial 2 Finding Repeated Structure in Time Series ...mueen/Tutorial/SDM2015Tutorial2.pdfWe have a discrete (binary) ^text _ that notes if the hand is near a cleaning instrument,](https://reader033.vdocument.in/reader033/viewer/2022053011/5f0e65c87e708231d43f0ce3/html5/thumbnails/34.jpg)
Motif discovery in physiological datasets: A methodology for inferring predictive elements
University of Michigan and MIT
We evaluated our solution on a population of patients who experienced sudden cardiac death and attempted to discover electrocardiographic activity that may be associated with the endpoint of death. To assess the predictive patterns discovered, we compared likelihood scores for time series motifsin the sudden death population…
Zeeshan Syed et al. TKDD 201036
![Page 35: Tutorial 2 Finding Repeated Structure in Time Series ...mueen/Tutorial/SDM2015Tutorial2.pdfWe have a discrete (binary) ^text _ that notes if the hand is near a cleaning instrument,](https://reader033.vdocument.in/reader033/viewer/2022053011/5f0e65c87e708231d43f0ce3/html5/thumbnails/35.jpg)
Constrained Motif Discovery in Time SeriesToyoaki Nishida, Kyoto University
“we use time series motifs to find gesture patterns with applications to robot-human interactions” Okada, Izukura and Nishida 2011
37
![Page 36: Tutorial 2 Finding Repeated Structure in Time Series ...mueen/Tutorial/SDM2015Tutorial2.pdfWe have a discrete (binary) ^text _ that notes if the hand is near a cleaning instrument,](https://reader033.vdocument.in/reader033/viewer/2022053011/5f0e65c87e708231d43f0ce3/html5/thumbnails/36.jpg)
Discovering Characteristic Actions from On-Body Sensor Data
David Minnen, Thad Starner, Irfan Essa, and Charles Isbell, Georgia Tech
Our algorithm successfully discovers motifs that correspond to the real exercises with a recall rate of 96.3% and overall accuracy of 86.7% over six exercises and 864 occurrences.
Minnen et al. Symp. Wearable Computers 200638
![Page 37: Tutorial 2 Finding Repeated Structure in Time Series ...mueen/Tutorial/SDM2015Tutorial2.pdfWe have a discrete (binary) ^text _ that notes if the hand is near a cleaning instrument,](https://reader033.vdocument.in/reader033/viewer/2022053011/5f0e65c87e708231d43f0ce3/html5/thumbnails/37.jpg)
Motifs can Spot Dance Moves…
x 104
AB
C
EF
A AA BCC
DDDDDDEE
FF
0 0.5 1 1.5 2 2.5 3
0/20/22/4
1/41/21/2
0/30/40/2
0/40/40/4
Hip
Hand
Arm
Legxyz
xyz
xyz
xyz
Data: H. Pohl et al. SMC 2010Algorithm: Mueen, ICDM 2013
Step Action
A Side steps with no arm movement
B Rock steps sideways without arm movement
C Rock steps sideways with arm movement
D Side steps with arm movement
E Side steps with arms up in the air
F Standing still with head bopping
Motifs are from the same dance steps or the same transitions 86% of the time.
Choreography
Time
39
![Page 38: Tutorial 2 Finding Repeated Structure in Time Series ...mueen/Tutorial/SDM2015Tutorial2.pdfWe have a discrete (binary) ^text _ that notes if the hand is near a cleaning instrument,](https://reader033.vdocument.in/reader033/viewer/2022053011/5f0e65c87e708231d43f0ce3/html5/thumbnails/38.jpg)
Applications Outline
• Applications– As Subroutines in Data Mining
• Never Ending Learning• Time Series Clustering• Rule Discovery• Dictionary Building
– In Other Scientific Research• Data center chiller management• Worm locomotion analysis• Physiological Prediction• Activity recognition
– Motifs in Other Data-types• Audio• Shapes• Motion
42
![Page 39: Tutorial 2 Finding Repeated Structure in Time Series ...mueen/Tutorial/SDM2015Tutorial2.pdfWe have a discrete (binary) ^text _ that notes if the hand is near a cleaning instrument,](https://reader033.vdocument.in/reader033/viewer/2022053011/5f0e65c87e708231d43f0ce3/html5/thumbnails/39.jpg)
Motifs in AudioMice calls are inaudible and have significant noise
Manual inspection over temporal signal is impossible
Features like MFCC are not good for animal song
Just use the raw spectrogram to find repeated calls
43Yuan Hao et al. Parameter-Free Audio Motif Discovery in Large Data Archives. ICDM 2013: 261-270
![Page 40: Tutorial 2 Finding Repeated Structure in Time Series ...mueen/Tutorial/SDM2015Tutorial2.pdfWe have a discrete (binary) ^text _ that notes if the hand is near a cleaning instrument,](https://reader033.vdocument.in/reader033/viewer/2022053011/5f0e65c87e708231d43f0ce3/html5/thumbnails/40.jpg)
Projectile shapes
Petroglyphs
Lampasas River Cornertang
Castroville Cornertang
0 50 100 150 200 250 300 350
Algorithm detects a rare cornertang segment – an object that has long intrigued anthropologists.
Algorithm detects similar petroglyphs drawn across continents and centuries
Motifs in Shapes
44
![Page 41: Tutorial 2 Finding Repeated Structure in Time Series ...mueen/Tutorial/SDM2015Tutorial2.pdfWe have a discrete (binary) ^text _ that notes if the hand is near a cleaning instrument,](https://reader033.vdocument.in/reader033/viewer/2022053011/5f0e65c87e708231d43f0ce3/html5/thumbnails/41.jpg)
Motifs in Motion
Two motion can be stitched together by transiting from one motif to the other, a very useful technique for motion synthesis.
45Yankov et al. Detecting time series motifs under uniform scaling. KDD 2007
Mueen et al. A disk-aware algorithm for time series motif discovery. Data
Min. Knowl. Discov. 2011.
![Page 42: Tutorial 2 Finding Repeated Structure in Time Series ...mueen/Tutorial/SDM2015Tutorial2.pdfWe have a discrete (binary) ^text _ that notes if the hand is near a cleaning instrument,](https://reader033.vdocument.in/reader033/viewer/2022053011/5f0e65c87e708231d43f0ce3/html5/thumbnails/42.jpg)
Time Series Motifs have 1,000 of Uses
• ..for discovering motifs in the music data is called the Mueen-Keogh (MK) algorithm.. Cabredo et al. 2011
• we apply the MK motif algorithm to time series retrieved from seismicsignals… Cassisi et al 2012
• we take motif developed by Keogh in order to support a medical expert in discovering interesting knowledge. Kitaguchi.
• for the problem of estimation of Micro-drilled Hole Wall of PWBs we take the Motif method developed by Keogh... Toshiki et al. (fabrication)
• the most efficient motif provided a power savings of 41 This translates to an annual reduction of 287 tons of CO2. Watson InterPACK09.
• We use Keogh's Motifs for unsupervised discovery of abnormal humanbehavior in multi-dimensional time series data... Vahdatpour SDM 2010.
• variability of behavior, using motifs, provides more consistent groupings of households across different clustering algorithms… Ian Dent 2014
46
![Page 43: Tutorial 2 Finding Repeated Structure in Time Series ...mueen/Tutorial/SDM2015Tutorial2.pdfWe have a discrete (binary) ^text _ that notes if the hand is near a cleaning instrument,](https://reader033.vdocument.in/reader033/viewer/2022053011/5f0e65c87e708231d43f0ce3/html5/thumbnails/43.jpg)
Questions and Comments
47
![Page 44: Tutorial 2 Finding Repeated Structure in Time Series ...mueen/Tutorial/SDM2015Tutorial2.pdfWe have a discrete (binary) ^text _ that notes if the hand is near a cleaning instrument,](https://reader033.vdocument.in/reader033/viewer/2022053011/5f0e65c87e708231d43f0ce3/html5/thumbnails/44.jpg)
Algorithms Outline
• Algorithms– Definition, Distance Measures and Invariances– Exact Algorithms
• Fixed Length• Enumeration of All length• K-motif Discovery• Online Maintenance
– Approximate Algorithms• Random Projection Algorithm
– Multi-dimensional Motif Discovery– Open Problems
48
![Page 45: Tutorial 2 Finding Repeated Structure in Time Series ...mueen/Tutorial/SDM2015Tutorial2.pdfWe have a discrete (binary) ^text _ that notes if the hand is near a cleaning instrument,](https://reader033.vdocument.in/reader033/viewer/2022053011/5f0e65c87e708231d43f0ce3/html5/thumbnails/45.jpg)
Definition of Time Series Motifs
1. Length of the motif
2. Support of the motif
3. Similarity of the Pattern
4. Relative Position of the Pattern
20 40 60 80 100 120 140 160 180 200-2
-1
0
1
2
49
![Page 46: Tutorial 2 Finding Repeated Structure in Time Series ...mueen/Tutorial/SDM2015Tutorial2.pdfWe have a discrete (binary) ^text _ that notes if the hand is near a cleaning instrument,](https://reader033.vdocument.in/reader033/viewer/2022053011/5f0e65c87e708231d43f0ce3/html5/thumbnails/46.jpg)
Distance Measures
• The choices are
– Euclidean Distance
– Correlation
– Dynamic Time Warping
– Longest Common Subsequences
– Uniformly Scaled Euclidean Distance
– Sliding Nearest Neighbor Distance
50
![Page 47: Tutorial 2 Finding Repeated Structure in Time Series ...mueen/Tutorial/SDM2015Tutorial2.pdfWe have a discrete (binary) ^text _ that notes if the hand is near a cleaning instrument,](https://reader033.vdocument.in/reader033/viewer/2022053011/5f0e65c87e708231d43f0ce3/html5/thumbnails/47.jpg)
Euclidean Distance Metric
y
x
d(x,y)
Given two time series
x = x1…xn
and
y = y1…yn
their z-Normalized Euclidean distance is
defined as:
n
i
ii
y
yi
i
x
xii
)yx(yxd
yy
xx
1
2ˆˆ),(
ˆ,ˆ
Early abandoning reduces number of operations when minimizing
51
0 10 20 30 40 50 60 70 80 90-2
0
2sum of squared error exceeded best2
![Page 48: Tutorial 2 Finding Repeated Structure in Time Series ...mueen/Tutorial/SDM2015Tutorial2.pdfWe have a discrete (binary) ^text _ that notes if the hand is near a cleaning instrument,](https://reader033.vdocument.in/reader033/viewer/2022053011/5f0e65c87e708231d43f0ce3/html5/thumbnails/48.jpg)
Pearson’s Correlation Coefficient
• Given two time series x and y of length m.
• Sufficient Statistics:
• Correlation Coefficient:
𝑐𝑜𝑟𝑟 𝒙, 𝒚 = 𝑖=1𝑚 𝑥𝑖𝑦𝑖 −𝑚𝜇𝑥𝜇𝑦
𝑚𝜎𝑥𝜎𝑦
Where 𝜇𝑥 = 𝑖=1𝑚 𝑥𝑖𝑚
and 𝜎𝑥2 = 𝑖=1𝑚 𝑥𝑖
2
𝑚− 𝜇𝑥
2
• Early abandoning is possible when maximizing
• Correlation is not a metric, therefore, use of triangular inequality needs special attention
𝑖=1𝑚 𝑥𝑖𝑦𝑖 𝑖=1
𝑚 𝑥𝑖 𝑖=1𝑚 𝑦𝑖 𝑖=1
𝑚 𝑥𝑖2 𝑖=1
𝑚 𝑦𝑖2
52
![Page 49: Tutorial 2 Finding Repeated Structure in Time Series ...mueen/Tutorial/SDM2015Tutorial2.pdfWe have a discrete (binary) ^text _ that notes if the hand is near a cleaning instrument,](https://reader033.vdocument.in/reader033/viewer/2022053011/5f0e65c87e708231d43f0ce3/html5/thumbnails/49.jpg)
Relationship with Euclidean Distance
𝑑 𝒙, 𝒚 = 2𝑚(1 − 𝑐𝑜𝑟𝑟(𝒙, 𝒚))
𝑥𝑖 =𝑥𝑖−𝜇𝑥
𝜎𝑥and 𝑦𝑖 =
𝑦𝑖−𝜇𝑦
𝜎𝑦
𝑑2 𝒙, 𝒚 =
𝑖=1
𝑚
𝑥𝑖 − 𝑦𝑖2
Minimizing z-normalized Euclidean distance and Maximizing Pearson’s correlation coefficient are identical in effect for motif discovery. 53
![Page 50: Tutorial 2 Finding Repeated Structure in Time Series ...mueen/Tutorial/SDM2015Tutorial2.pdfWe have a discrete (binary) ^text _ that notes if the hand is near a cleaning instrument,](https://reader033.vdocument.in/reader033/viewer/2022053011/5f0e65c87e708231d43f0ce3/html5/thumbnails/50.jpg)
Euclidean Vs Dynamic Time Warping
Euclidean DistanceSequences are aligned “one to one”.
“Warped” Time AxisNonlinear alignments are possible.
54
![Page 51: Tutorial 2 Finding Repeated Structure in Time Series ...mueen/Tutorial/SDM2015Tutorial2.pdfWe have a discrete (binary) ^text _ that notes if the hand is near a cleaning instrument,](https://reader033.vdocument.in/reader033/viewer/2022053011/5f0e65c87e708231d43f0ce3/html5/thumbnails/51.jpg)
C
QC
Q
How is DTW Calculated?
),(),( nmDCQDTW
Warping path w• Quadratic time complexity• DTW is not a metric
)1,1(),,1(),1,(min)(),( 2 jiDjiDjiDcqjiD ji
55
![Page 52: Tutorial 2 Finding Repeated Structure in Time Series ...mueen/Tutorial/SDM2015Tutorial2.pdfWe have a discrete (binary) ^text _ that notes if the hand is near a cleaning instrument,](https://reader033.vdocument.in/reader033/viewer/2022053011/5f0e65c87e708231d43f0ce3/html5/thumbnails/52.jpg)
A four-slide digression, to make sure you understand what invariances are, and why they are important
56
![Page 53: Tutorial 2 Finding Repeated Structure in Time Series ...mueen/Tutorial/SDM2015Tutorial2.pdfWe have a discrete (binary) ^text _ that notes if the hand is near a cleaning instrument,](https://reader033.vdocument.in/reader033/viewer/2022053011/5f0e65c87e708231d43f0ce3/html5/thumbnails/53.jpg)
Suppose we are walking in a cemetery in Japan.
We see an interesting grave marker, and we want to learn more about it.
We can take a photo of it and search a database….
57
![Page 54: Tutorial 2 Finding Repeated Structure in Time Series ...mueen/Tutorial/SDM2015Tutorial2.pdfWe have a discrete (binary) ^text _ that notes if the hand is near a cleaning instrument,](https://reader033.vdocument.in/reader033/viewer/2022053011/5f0e65c87e708231d43f0ce3/html5/thumbnails/54.jpg)
Campana and Keogh (2010). A Compression Based Distance Measure for Texture. SDM 2010.
58
![Page 55: Tutorial 2 Finding Repeated Structure in Time Series ...mueen/Tutorial/SDM2015Tutorial2.pdfWe have a discrete (binary) ^text _ that notes if the hand is near a cleaning instrument,](https://reader033.vdocument.in/reader033/viewer/2022053011/5f0e65c87e708231d43f0ce3/html5/thumbnails/55.jpg)
In order to do this, we must have a distance measure with the right invariances
Color invariance
Occlusion invariance
Size invariance
Rotation invariance59
![Page 56: Tutorial 2 Finding Repeated Structure in Time Series ...mueen/Tutorial/SDM2015Tutorial2.pdfWe have a discrete (binary) ^text _ that notes if the hand is near a cleaning instrument,](https://reader033.vdocument.in/reader033/viewer/2022053011/5f0e65c87e708231d43f0ce3/html5/thumbnails/56.jpg)
Time Series Data has Unique Invariances
• These invariances are domain/problem dependent
• They include
– Complexity invariance
– Warping invariance
– Uniform scaling invariance
– Occlusion invariance
– Rotation/phase invariance
– Offset invariance
– Amplitude invariance
• Sometimes you achieve the invariance in the distance measure, sometimes by preprocessing the data.
• In this work, we will just assume offset/amplitude invariance. See [a] for a visual tour of time series invariances.
[a] Batista, et al (2011) A Complexity-Invariant Distance Measure for Time Series. SDM 2011
Z-normalization of each subsequence removes these
60
![Page 57: Tutorial 2 Finding Repeated Structure in Time Series ...mueen/Tutorial/SDM2015Tutorial2.pdfWe have a discrete (binary) ^text _ that notes if the hand is near a cleaning instrument,](https://reader033.vdocument.in/reader033/viewer/2022053011/5f0e65c87e708231d43f0ce3/html5/thumbnails/57.jpg)
-2
-1
0
1
2
Z-Normalization ensures scale and offset invariances
x
xi
xx
ˆ
61
-5
-3
-1
1
3
0 100-5
-3
-1
1
3 Query
Distance to the query
0 500 1000 1500
0
40
80
120
Threshold = 30
Wandering Baseline
Without Normalization only 75% of the beats are missed
![Page 58: Tutorial 2 Finding Repeated Structure in Time Series ...mueen/Tutorial/SDM2015Tutorial2.pdfWe have a discrete (binary) ^text _ that notes if the hand is near a cleaning instrument,](https://reader033.vdocument.in/reader033/viewer/2022053011/5f0e65c87e708231d43f0ce3/html5/thumbnails/58.jpg)
Algorithms Outline
• Algorithms– Definition, Distance Measures and Invariances– Exact Algorithms
• Fixed Length• Enumeration of All length• K-motif Discovery• Online Maintenance
– Approximate Algorithms• Random Projection Algorithm
– Multi-dimensional Motif Discovery– Open Problems
62
![Page 59: Tutorial 2 Finding Repeated Structure in Time Series ...mueen/Tutorial/SDM2015Tutorial2.pdfWe have a discrete (binary) ^text _ that notes if the hand is near a cleaning instrument,](https://reader033.vdocument.in/reader033/viewer/2022053011/5f0e65c87e708231d43f0ce3/html5/thumbnails/59.jpg)
Simplest Definition of Time Series Motifs
1. Length of the motif = Given
2. Support of the motif = 2
3. Similarity of the Pattern = Euclidean distance
4. Relative Position of the Pattern = non-overlapping
20 40 60 80 100 120 140 160 180 200-2
-1
0
1
2
Given a length, the most similar/least distant pair of non-overlapping subsequences
63
![Page 60: Tutorial 2 Finding Repeated Structure in Time Series ...mueen/Tutorial/SDM2015Tutorial2.pdfWe have a discrete (binary) ^text _ that notes if the hand is near a cleaning instrument,](https://reader033.vdocument.in/reader033/viewer/2022053011/5f0e65c87e708231d43f0ce3/html5/thumbnails/60.jpg)
Problem Formulation
The most similar pair of non-overlapping
subsequences
100 200 300 400 500 600 700 800 900 1000
-8000
-7500
-7000
12345678...
873
time:1000
The closest pair of points in high dimensional
space
Optimal algorithm in two dimension : Θ(n log n)For large dimensionality d, optimum algorithm is
effectively Θ(n2d) 64
![Page 61: Tutorial 2 Finding Repeated Structure in Time Series ...mueen/Tutorial/SDM2015Tutorial2.pdfWe have a discrete (binary) ^text _ that notes if the hand is near a cleaning instrument,](https://reader033.vdocument.in/reader033/viewer/2022053011/5f0e65c87e708231d43f0ce3/html5/thumbnails/61.jpg)
Lower Bound
If P, Q and R are three points in a d-space
d(P,Q)+d(Q,R) ≥ d(P,R)
d(P,Q) ≥ |d(Q,R) - d(P,R)|
A third point R provides a very inexpensive lower bound on the true distance
If the lower bound is larger than the existing best, skip d(P, Q)
d(P,Q) ≥ |d(Q,R) - d(P,R)| ≥ BestPairDistance
P Q
R
65
![Page 62: Tutorial 2 Finding Repeated Structure in Time Series ...mueen/Tutorial/SDM2015Tutorial2.pdfWe have a discrete (binary) ^text _ that notes if the hand is near a cleaning instrument,](https://reader033.vdocument.in/reader033/viewer/2022053011/5f0e65c87e708231d43f0ce3/html5/thumbnails/62.jpg)
Circular Projection
r
Pick a reference point r
Circularly Project all points on a line passing through the reference point
Equivalent to computing
distance from r and then sorting the points according
to distance
1
5
3
716
10
12
20
11
6
24
21
18
2
22
17
15
23
13
148
49
19 r
66
![Page 63: Tutorial 2 Finding Repeated Structure in Time Series ...mueen/Tutorial/SDM2015Tutorial2.pdfWe have a discrete (binary) ^text _ that notes if the hand is near a cleaning instrument,](https://reader033.vdocument.in/reader033/viewer/2022053011/5f0e65c87e708231d43f0ce3/html5/thumbnails/63.jpg)
The Order Line
r
P Q
r
|d(Q, r) - d(P, r)|
d(Q, r)
d(P, r)
k = 1k = 2k = 3
k=1:n-1• Compare every pair
having k-1 points in between
• Do k scans of the order line, starting with the 1st to kth point
BestPairDistance
0
1
5
3
716
10
12
20
11
6
24
21
18
2
22
17
15
23
13
148
49
19 r
67
![Page 64: Tutorial 2 Finding Repeated Structure in Time Series ...mueen/Tutorial/SDM2015Tutorial2.pdfWe have a discrete (binary) ^text _ that notes if the hand is near a cleaning instrument,](https://reader033.vdocument.in/reader033/viewer/2022053011/5f0e65c87e708231d43f0ce3/html5/thumbnails/64.jpg)
Correctness
• If we search for all offset=1,2,…,n-1 then all possible pairs are considered.
– n(n-1)/2 pairs
• if for any offset=k, none of the k scans needs an actual distance computation then for the rest of the offsets=k+1,…,n-1 no
distance computation will be needed.
r
68
![Page 65: Tutorial 2 Finding Repeated Structure in Time Series ...mueen/Tutorial/SDM2015Tutorial2.pdfWe have a discrete (binary) ^text _ that notes if the hand is near a cleaning instrument,](https://reader033.vdocument.in/reader033/viewer/2022053011/5f0e65c87e708231d43f0ce3/html5/thumbnails/65.jpg)
Performance
• Orders of Magnitude faster• Exact in execution• No sacrifice of the quality of the results
-0.5
0
0.5
1
1.5
2
2.5
3
3.5
4
4.5
10 30 50 70 90 110
X103
X103
Brute Force Brute Force with Optimized Distance Computation Our Algorithm
69
(MK)
![Page 66: Tutorial 2 Finding Repeated Structure in Time Series ...mueen/Tutorial/SDM2015Tutorial2.pdfWe have a discrete (binary) ^text _ that notes if the hand is near a cleaning instrument,](https://reader033.vdocument.in/reader033/viewer/2022053011/5f0e65c87e708231d43f0ce3/html5/thumbnails/66.jpg)
• Use multiple reference points for tighter lower bounds.
Multiple References
70
r1 r2
yx
105 106 107 108 109 1010105
106
107
108
109
1010
Pruning by Triangular Inequality
Pru
nin
g b
y P
tole
mai
c In
equ
alit
y Ptolemaic bound wins
Linear bound wins
𝑥𝑦 ≥|𝑥𝑟1. 𝑦𝑟2 + 𝑥𝑟2. 𝑦𝑟1|
𝑟1𝑟2
Ptolemaic bound
![Page 67: Tutorial 2 Finding Repeated Structure in Time Series ...mueen/Tutorial/SDM2015Tutorial2.pdfWe have a discrete (binary) ^text _ that notes if the hand is near a cleaning instrument,](https://reader033.vdocument.in/reader033/viewer/2022053011/5f0e65c87e708231d43f0ce3/html5/thumbnails/67.jpg)
Pruning by Multiple References
71
P
Q
R1
R3
R4
R2max( | d(P,Ri) - d(Q,Ri) | )
0 10 20 30 40 50 60
5
6
7
8
Lo
gar
ithm
of
tim
e
(sec
on
ds)
Number of Reference Time
Series Used
Time to find the motif in a
database of 64,000
random walk time series
of length 1,024
5 10 15 20 25 30 35
500
1000
1500
2000
2500
3000
0 5 10 15 20 25 30 350
200
400
600
800
1000
1200
1400
0 5 10 15 20 25 30 350
200
400
600
800
1000
1200
1400
1600
1800
![Page 68: Tutorial 2 Finding Repeated Structure in Time Series ...mueen/Tutorial/SDM2015Tutorial2.pdfWe have a discrete (binary) ^text _ that notes if the hand is near a cleaning instrument,](https://reader033.vdocument.in/reader033/viewer/2022053011/5f0e65c87e708231d43f0ce3/html5/thumbnails/68.jpg)
Algorithms Outline
• Algorithms– Definition, Distance Measures and Invariances– Exact Algorithms
• Fixed Length• Enumeration of All length• K-motif Discovery• Online Maintenance
– Approximate Algorithms• Random Projection Algorithm
– Multi-dimensional Motif Discovery– Open Problems
72
![Page 69: Tutorial 2 Finding Repeated Structure in Time Series ...mueen/Tutorial/SDM2015Tutorial2.pdfWe have a discrete (binary) ^text _ that notes if the hand is near a cleaning instrument,](https://reader033.vdocument.in/reader033/viewer/2022053011/5f0e65c87e708231d43f0ce3/html5/thumbnails/69.jpg)
Questions and Comments
73
![Page 70: Tutorial 2 Finding Repeated Structure in Time Series ...mueen/Tutorial/SDM2015Tutorial2.pdfWe have a discrete (binary) ^text _ that notes if the hand is near a cleaning instrument,](https://reader033.vdocument.in/reader033/viewer/2022053011/5f0e65c87e708231d43f0ce3/html5/thumbnails/70.jpg)
Simplest Definition of Time Series Motifs
1. Length of the motif = Given All
2. Support of the motif = 2
3. Similarity of the Pattern = Euclidean distance
4. Relative Position of the Pattern = non-overlapping
20 40 60 80 100 120 140 160 180 200-2
-1
0
1
2
The most similar/least distant pairs of non-overlapping subsequences at all lengths.
74
![Page 71: Tutorial 2 Finding Repeated Structure in Time Series ...mueen/Tutorial/SDM2015Tutorial2.pdfWe have a discrete (binary) ^text _ that notes if the hand is near a cleaning instrument,](https://reader033.vdocument.in/reader033/viewer/2022053011/5f0e65c87e708231d43f0ce3/html5/thumbnails/71.jpg)
Goals: Enumerating Motifs
1. Remove the length parameter
2. Search for motifs in a range of lengths and report
– ALL of the motifs of all of the lengths
3. Retain Scalability
75
![Page 72: Tutorial 2 Finding Repeated Structure in Time Series ...mueen/Tutorial/SDM2015Tutorial2.pdfWe have a discrete (binary) ^text _ that notes if the hand is near a cleaning instrument,](https://reader033.vdocument.in/reader033/viewer/2022053011/5f0e65c87e708231d43f0ce3/html5/thumbnails/72.jpg)
Bound on Extension
1. Two time series 𝒙 and 𝒚 of length 𝑚
2. Their normalized Euclidean distance 𝑑 𝒙, 𝒚
3. Find 𝑑𝐿𝐵 𝒙+1, 𝒚+1 if we increase the length of 𝒙and 𝒚 by appending the next two numbers.
Values Changed
1 2 3 4 5
345678910
Without Normalization
1 2 3 4 5-4-3-2-101234
With Normalization
76
![Page 73: Tutorial 2 Finding Repeated Structure in Time Series ...mueen/Tutorial/SDM2015Tutorial2.pdfWe have a discrete (binary) ^text _ that notes if the hand is near a cleaning instrument,](https://reader033.vdocument.in/reader033/viewer/2022053011/5f0e65c87e708231d43f0ce3/html5/thumbnails/73.jpg)
Intuition
1 2 3 4 5-2
-1.5
-1
-0.5
0
0.5
1
1.5
2
1 2 3 4 5-2
-1.5
-1
-0.5
0
0.5
1
1.5
2
1 2 3 4 5-2
-1.5
-1
-0.5
0
0.5
1
1.5
2
Length 4 Length 5 Length 5
Append 10 and re-normalize Append 20 and re-normalizeNormalized
Area shrinks
Area between blue and red is the distance between the signals
Area shrinks further
If infinity is appended to both the signals, they will have zero area/distance. 77
![Page 74: Tutorial 2 Finding Repeated Structure in Time Series ...mueen/Tutorial/SDM2015Tutorial2.pdfWe have a discrete (binary) ^text _ that notes if the hand is near a cleaning instrument,](https://reader033.vdocument.in/reader033/viewer/2022053011/5f0e65c87e708231d43f0ce3/html5/thumbnails/74.jpg)
Bounding Euclidean Distance
𝑑𝐿𝐵2 𝒙+1, 𝒚+1 =
1
𝜎𝑚2 𝑑𝑚
2 𝒙, 𝒚 < 𝑑𝑚2 ( 𝒙, 𝒚)
Variances of 𝒙+1and 𝒚+1, 𝜎𝑚2 =
𝑚
𝑚+1+
𝑚
𝑚+1 2 𝑧2
𝑧 = maximum normalized value in the databaseA safe approximation 𝑧 = max(𝑎𝑏𝑠 𝒙 , 𝑎𝑏𝑠 𝒚 )
78
![Page 75: Tutorial 2 Finding Repeated Structure in Time Series ...mueen/Tutorial/SDM2015Tutorial2.pdfWe have a discrete (binary) ^text _ that notes if the hand is near a cleaning instrument,](https://reader033.vdocument.in/reader033/viewer/2022053011/5f0e65c87e708231d43f0ce3/html5/thumbnails/75.jpg)
Experimental Validation of the Bounds
2 2.1 2.2 2.3 2.4 2.5 2.6 2.7 2.8 2.9 3x 105
20
20.5
21
21.5
22
22.5
23
23.5
24
24.5
25
0.5 1 1.5 2 2.5 3 3.5 4 4.5 5
x 105
0
5
10
15
20
25
30
35
Pairs in ascending order of distances
Norm
aliz
ed D
ista
nce
Blue: Distances of random signals of length 255Gray: Distances of the signals when they are extended by one random sample Red: The upper and lower bounds before observing the extensions 79
![Page 76: Tutorial 2 Finding Repeated Structure in Time Series ...mueen/Tutorial/SDM2015Tutorial2.pdfWe have a discrete (binary) ^text _ that notes if the hand is near a cleaning instrument,](https://reader033.vdocument.in/reader033/viewer/2022053011/5f0e65c87e708231d43f0ce3/html5/thumbnails/76.jpg)
Intuition
3.94 =4.57
1.16
0 1000 2000 3000 4000 5000 6000 7000 8000 9000 10000-8.5
-8
-7.5
-7 x 103
145, 5410, 1.26
8345, 4211, 2.63
1655, 9461, 2.96
6531, 2501, 3.17
851, 1440, 3.73
2512, 3110, 3.98
1685, 9260, 4.57
Length 𝑚=7
𝑂 𝑛 pairs in order of
the distances for
length 𝑚
145, 5410, 1.79
8345, 4211, 1.63
1655, 9461, 3.61
6531, 2501, 2.71
851, 1440, 3.83
2512, 3110, 4.18
1685, 9260, 4.27
Length 𝑚=8
8345, 4211, 1.63
145, 5410, 1.79
6531, 2501, 2.71
1655, 9461, 3.61
851, 1440, 3.83
2512, 3110, 4.18
1685, 9260, 4.27
𝜎72=1.16
Location 1Location 2
Distance
80
----, -----, -----
----, -----, -----
n = 10000
![Page 77: Tutorial 2 Finding Repeated Structure in Time Series ...mueen/Tutorial/SDM2015Tutorial2.pdfWe have a discrete (binary) ^text _ that notes if the hand is near a cleaning instrument,](https://reader033.vdocument.in/reader033/viewer/2022053011/5f0e65c87e708231d43f0ce3/html5/thumbnails/77.jpg)
2.77 =3.68
1.32
Intuition
8345, 4211, 1.63
145, 5410, 1.79
6531, 2501, 2.71
1655, 9461, 3.61
851, 1440, 3.83
Length 𝑚=8
8345, 4211, 1.23
145, 5410, 1.98
6531, 2501, 1.71
1655, 9461, 3.68
851, 1440, 3.61
Length 𝑚=9
8345, 4211, 1.23
6531, 2501, 1.71
145, 5410, 1.98
851, 1440, 3.61
1655, 9461, 3.68
𝜎82=1.32
• Once in every 10 lengths, the exact ordered list is required to be populated.
• This yields a 10x speed-up from running fixed-length motif discovery for all lengths.
81
![Page 78: Tutorial 2 Finding Repeated Structure in Time Series ...mueen/Tutorial/SDM2015Tutorial2.pdfWe have a discrete (binary) ^text _ that notes if the hand is near a cleaning instrument,](https://reader033.vdocument.in/reader033/viewer/2022053011/5f0e65c87e708231d43f0ce3/html5/thumbnails/78.jpg)
Sanity Check
0 1000 2000 3000 4000 5000 6000-2
0
2
4
1380 1400 1420 1440 1460-5
0
5
Length :871320 1340 1360 1380 1400 1420
-5
0
5
Length :105600 700 800
-5
0
5
Length :2992200 2300 2400
-5
0
5
Length :299
(1) (2) (3) (4)
White Noise
http://www.cs.unm.edu/~mueen/Projects/MOEN/index.html
• Three Patterns planted in a random signal with different scaling.
• The algorithm finds them appropriately.82
![Page 79: Tutorial 2 Finding Repeated Structure in Time Series ...mueen/Tutorial/SDM2015Tutorial2.pdfWe have a discrete (binary) ^text _ that notes if the hand is near a cleaning instrument,](https://reader033.vdocument.in/reader033/viewer/2022053011/5f0e65c87e708231d43f0ce3/html5/thumbnails/79.jpg)
Experimental Results: Scalability
0 2 4 6 8 10 12 14 16x 104
0
1
2
3
4
5
6
7x 105
Data Length (n)
Exec
uti
on
Tim
e in
Sec
on
ds
Smart Brute ForceEEGEOGRandom WalkIterative MK
1 2 3 4 5 6 7 8 9 10
0
2
4
6
8
10
12
14
16
18x 104
Smart Brute ForceEEGRandom WalkEOG
Exec
uti
on
Tim
e in
Sec
on
ds
Range of Lengths (maxLen-minLen+1)
x 102
83
MOENMOEN
![Page 80: Tutorial 2 Finding Repeated Structure in Time Series ...mueen/Tutorial/SDM2015Tutorial2.pdfWe have a discrete (binary) ^text _ that notes if the hand is near a cleaning instrument,](https://reader033.vdocument.in/reader033/viewer/2022053011/5f0e65c87e708231d43f0ce3/html5/thumbnails/80.jpg)
Algorithms Outline
• Algorithms– Definition, Distance Measures and Invariances– Exact Algorithms
• Fixed Length• Enumeration of All length• K-motif Discovery• Online Maintenance
– Approximate Algorithms• Random Projection Algorithm
– Multi-dimensional Motif Discovery– Open Problems
84
![Page 81: Tutorial 2 Finding Repeated Structure in Time Series ...mueen/Tutorial/SDM2015Tutorial2.pdfWe have a discrete (binary) ^text _ that notes if the hand is near a cleaning instrument,](https://reader033.vdocument.in/reader033/viewer/2022053011/5f0e65c87e708231d43f0ce3/html5/thumbnails/81.jpg)
Simplest Definition of Time Series Motifs
1. Length of the motif = Given All
2. Support of the motif = 2 k and τ
3. Similarity of the Pattern = Euclidean distance
4. Relative Position of the Pattern = non-overlapping
20 40 60 80 100 120 140 160 180 200-2
-1
0
1
2
The non-overlapping subsequences at all lengths having k or more τ-matches.
85
![Page 82: Tutorial 2 Finding Repeated Structure in Time Series ...mueen/Tutorial/SDM2015Tutorial2.pdfWe have a discrete (binary) ^text _ that notes if the hand is near a cleaning instrument,](https://reader033.vdocument.in/reader033/viewer/2022053011/5f0e65c87e708231d43f0ce3/html5/thumbnails/82.jpg)
Optimal algorithm is hard
• Search for locations of the τ-balls that contain k subsequences
• NP-Hard
• Instead we search for a motif representative that has k subsequences within τ
86
![Page 83: Tutorial 2 Finding Repeated Structure in Time Series ...mueen/Tutorial/SDM2015Tutorial2.pdfWe have a discrete (binary) ^text _ that notes if the hand is near a cleaning instrument,](https://reader033.vdocument.in/reader033/viewer/2022053011/5f0e65c87e708231d43f0ce3/html5/thumbnails/83.jpg)
How do we find the motif representative?
• Simply take one of the two occurrences as the representative
• Take the average of the two
• Find all occurrences within a threshold of pair and train a HMM to capture the concept (Minnen’07)
87
![Page 84: Tutorial 2 Finding Repeated Structure in Time Series ...mueen/Tutorial/SDM2015Tutorial2.pdfWe have a discrete (binary) ^text _ that notes if the hand is near a cleaning instrument,](https://reader033.vdocument.in/reader033/viewer/2022053011/5f0e65c87e708231d43f0ce3/html5/thumbnails/84.jpg)
Using each one in the pair as the representative (k=4, τ=0.9)
88
1
2
3
4
0 50 100 150-4
-2
0
2
0 5000 10000
1
2
4
5
3
0 50 100 150-4
-2
0
2
4
0 5000 10000
EEG Data
EPG Data
It takes only k similarity searches to find other occurrences. The overall complexity remains the same.
![Page 85: Tutorial 2 Finding Repeated Structure in Time Series ...mueen/Tutorial/SDM2015Tutorial2.pdfWe have a discrete (binary) ^text _ that notes if the hand is near a cleaning instrument,](https://reader033.vdocument.in/reader033/viewer/2022053011/5f0e65c87e708231d43f0ce3/html5/thumbnails/85.jpg)
Finding top-K motif
89
1000 2000 3000 4000 5000 6000
1000
2000
3000
4000
A A B A x A x A B A x A A x B x A A B A x B A x B A C AxC A A x C
Length :155
Length :255
Length :157
Length :251
A
C
B
• Run MK for K times• Replace occurrences by random noise
between iterations
![Page 86: Tutorial 2 Finding Repeated Structure in Time Series ...mueen/Tutorial/SDM2015Tutorial2.pdfWe have a discrete (binary) ^text _ that notes if the hand is near a cleaning instrument,](https://reader033.vdocument.in/reader033/viewer/2022053011/5f0e65c87e708231d43f0ce3/html5/thumbnails/86.jpg)
Algorithms Outline
• Algorithms– Definition, Distance Measures and Invariances– Exact Algorithms
• Fixed Length• Enumeration of All length• K-motif Discovery• Online Maintenance
– Approximate Algorithms• Random Projection Algorithm
– Multi-dimensional Motif Discovery– Open Problems
90
![Page 87: Tutorial 2 Finding Repeated Structure in Time Series ...mueen/Tutorial/SDM2015Tutorial2.pdfWe have a discrete (binary) ^text _ that notes if the hand is near a cleaning instrument,](https://reader033.vdocument.in/reader033/viewer/2022053011/5f0e65c87e708231d43f0ce3/html5/thumbnails/87.jpg)
• Streaming time series
• Sliding window of the recent history– What minute long trace repeated in the last hour?
Online Time Series Motifs
91
![Page 88: Tutorial 2 Finding Repeated Structure in Time Series ...mueen/Tutorial/SDM2015Tutorial2.pdfWe have a discrete (binary) ^text _ that notes if the hand is near a cleaning instrument,](https://reader033.vdocument.in/reader033/viewer/2022053011/5f0e65c87e708231d43f0ce3/html5/thumbnails/88.jpg)
Discovery
100 200 300 400 500 600 700 800 900 1000
-8000
-7500
-7000
12345678...
873
The most similar pair of non-overlapping
subsequences
Maintenance
12345678...
873874
time:100112345678...
873874875
time:100212345678...
873874875876
time:100312345678...
873874875876877
time:100412345678...
873874875876877878
time:1005
Update motif pair afterevery time tick
time
Problem Formulation
92
![Page 89: Tutorial 2 Finding Repeated Structure in Time Series ...mueen/Tutorial/SDM2015Tutorial2.pdfWe have a discrete (binary) ^text _ that notes if the hand is near a cleaning instrument,](https://reader033.vdocument.in/reader033/viewer/2022053011/5f0e65c87e708231d43f0ce3/html5/thumbnails/89.jpg)
• A subsequence is a high dimensional point
– The dynamic closest pair of points problem
• Closest pair may change upon every update
• Naïve approach: Do quadratic comparisons.
4
5
2
7
3
6
89
4
5
2
7
3
6
8
1
4
5
2
7
3
6
8
1
Deletion Insertion
Challenges
93
![Page 90: Tutorial 2 Finding Repeated Structure in Time Series ...mueen/Tutorial/SDM2015Tutorial2.pdfWe have a discrete (binary) ^text _ that notes if the hand is near a cleaning instrument,](https://reader033.vdocument.in/reader033/viewer/2022053011/5f0e65c87e708231d43f0ce3/html5/thumbnails/90.jpg)
• Goal: Algorithm with Linear update time
• Previous method for dynamic closest pair (Eppstein,00)
– A matrix of all-pair distances is maintained
• O(w2) space required
– Quad-tree is used to update the matrix
• Maintain a set of neighbors and reverse neighbors for all points
• We do it in O( ) spaceww
Related Work
94
![Page 91: Tutorial 2 Finding Repeated Structure in Time Series ...mueen/Tutorial/SDM2015Tutorial2.pdfWe have a discrete (binary) ^text _ that notes if the hand is near a cleaning instrument,](https://reader033.vdocument.in/reader033/viewer/2022053011/5f0e65c87e708231d43f0ce3/html5/thumbnails/91.jpg)
• Smallest nearest neighbor → Closest pair
• Upon insertion
– Find the nearest neighbor; Needs O(w) comparisons.
• Upon deletion
– Find the next NN of all the reverse NN
4
5
2
7
3
6
8
1
4
5
2
7
3
6
8
1
Insertion
9
4
5
2
7
3
6
8
1
Deletion
9
Maintaining Motif
95
![Page 92: Tutorial 2 Finding Repeated Structure in Time Series ...mueen/Tutorial/SDM2015Tutorial2.pdfWe have a discrete (binary) ^text _ that notes if the hand is near a cleaning instrument,](https://reader033.vdocument.in/reader033/viewer/2022053011/5f0e65c87e708231d43f0ce3/html5/thumbnails/92.jpg)
1 2 3 4 5 6 7 8FIFO Sliding window
N-listsNeighbors in order
of distances
8 7 21 8 1 24
5 1 17 3 3 67
7 6 75 2 4 55
1 4 83 7 5 72
4 8 42 5 2 13
6 1 38 1 8 38
3 5 66 4 6 46
R-listsPoints that should be updated
Total number of nodes in R-lists is w
5 1 3 24
8 67
4
5
2
7
3
6
8
1
4
7
(ptr, distance)
Data Structure
96
![Page 93: Tutorial 2 Finding Repeated Structure in Time Series ...mueen/Tutorial/SDM2015Tutorial2.pdfWe have a discrete (binary) ^text _ that notes if the hand is near a cleaning instrument,](https://reader033.vdocument.in/reader033/viewer/2022053011/5f0e65c87e708231d43f0ce3/html5/thumbnails/93.jpg)
While inserting Updating NN of old points is not necessary A point can be removed from the neighbor list if it
violates the temporal order
Averagelength O( )w
Observations
97
![Page 94: Tutorial 2 Finding Repeated Structure in Time Series ...mueen/Tutorial/SDM2015Tutorial2.pdfWe have a discrete (binary) ^text _ that notes if the hand is near a cleaning instrument,](https://reader033.vdocument.in/reader033/viewer/2022053011/5f0e65c87e708231d43f0ce3/html5/thumbnails/94.jpg)
4
5
2
7
3
6
8
1
1 1 21 3 1 2
52
83
2 43 5 3 6
4 7
5
6
4
7
6
1 2 3 4 5 6 7 8
2 3 32 3 2 6
3 7 9
5
54 4 6 8
5 7
6
68
4
2 3 4 5 6 7 8 91
4
5
2
7
3
6
8
1
9
4
5
2
7
3
6
89
10
3 34 3 6 6
57 9
54 4 7 8
5
6
4
6 8
5
7
8
9
3 4 5 6 7 8 9 1021
10
Example
98
![Page 95: Tutorial 2 Finding Repeated Structure in Time Series ...mueen/Tutorial/SDM2015Tutorial2.pdfWe have a discrete (binary) ^text _ that notes if the hand is near a cleaning instrument,](https://reader033.vdocument.in/reader033/viewer/2022053011/5f0e65c87e708231d43f0ce3/html5/thumbnails/95.jpg)
• Up to from general dynamic closest pairper point with increasing window size
0
0.02
0.04
0.06
0.08
0.1
0.12
0.14
0.16
0.18
Insect
EEG
EOG
RW
Av
erag
e U
pd
ate
Tim
e (S
ec)
0 0.5 1 1.5 2 2.5 3 3.5 4x 104
Window Size (w)0 200 400 600 800 1000
0
0.1
0.2
0.3
0.4
0.5
0.6
0.7
Insect
EEG
EOG
RW
Av
erag
e U
pd
ate
Tim
e (S
ec)
Motif Length (m)
0 0.5 1 1.5 2 2.5 3 3.5 4x 104
102030405060708090100110
Insect
EEG
EOG
RW
Av
erag
e L
eng
th o
f N
-lis
t
Window Size (w)0 200 400 600 800 1000
0
50
100
150
200
250
300
350
Insect
EEG
EOG
RW
Motif Length (m)
Av
erag
e L
eng
th o
f N
-lis
t
Results
99
![Page 96: Tutorial 2 Finding Repeated Structure in Time Series ...mueen/Tutorial/SDM2015Tutorial2.pdfWe have a discrete (binary) ^text _ that notes if the hand is near a cleaning instrument,](https://reader033.vdocument.in/reader033/viewer/2022053011/5f0e65c87e708231d43f0ce3/html5/thumbnails/96.jpg)
Algorithms Outline
• Algorithms– Definition, Distance Measures and Invariances– Exact Algorithms
• Fixed Length• Enumeration of All length• K-motif Discovery• Online Maintenance
– Approximate Algorithms• Random Projection Algorithm
– Multi-dimensional Motif Discovery– Open Problems
100
![Page 97: Tutorial 2 Finding Repeated Structure in Time Series ...mueen/Tutorial/SDM2015Tutorial2.pdfWe have a discrete (binary) ^text _ that notes if the hand is near a cleaning instrument,](https://reader033.vdocument.in/reader033/viewer/2022053011/5f0e65c87e708231d43f0ce3/html5/thumbnails/97.jpg)
Questions and Comments
101
![Page 98: Tutorial 2 Finding Repeated Structure in Time Series ...mueen/Tutorial/SDM2015Tutorial2.pdfWe have a discrete (binary) ^text _ that notes if the hand is near a cleaning instrument,](https://reader033.vdocument.in/reader033/viewer/2022053011/5f0e65c87e708231d43f0ce3/html5/thumbnails/98.jpg)
Abdullah MueenUniversity of New Mexico, USA
Eamonn KeoghUniversity of California Riverside, USA
Finding Repeated Structure in Time Series: Algorithms and Applications
Break; We meet back in this room at 5:15PM
Slides available at: http://www.cs.unm.edu/~mueen/Tutorial/SDM2015Tutorial2.pdf102
![Page 99: Tutorial 2 Finding Repeated Structure in Time Series ...mueen/Tutorial/SDM2015Tutorial2.pdfWe have a discrete (binary) ^text _ that notes if the hand is near a cleaning instrument,](https://reader033.vdocument.in/reader033/viewer/2022053011/5f0e65c87e708231d43f0ce3/html5/thumbnails/99.jpg)
General Outline
• Applications (50 minutes)
– As Subroutines in Data Mining
– In Other Scientific Research
• Algorithms (100 minutes)
– Uni-dimensional
– Multi-dimensional
103
![Page 100: Tutorial 2 Finding Repeated Structure in Time Series ...mueen/Tutorial/SDM2015Tutorial2.pdfWe have a discrete (binary) ^text _ that notes if the hand is near a cleaning instrument,](https://reader033.vdocument.in/reader033/viewer/2022053011/5f0e65c87e708231d43f0ce3/html5/thumbnails/100.jpg)
Algorithms Outline
• Algorithms– Definition, Distance Measures and Invariances– Exact Algorithms
• Fixed Length• Enumeration of All length• K-motif Discovery• Online Maintenance
– Approximate Algorithms• Random Projection Algorithm
– Multi-dimensional Motif Discovery– Open Problems
104
![Page 101: Tutorial 2 Finding Repeated Structure in Time Series ...mueen/Tutorial/SDM2015Tutorial2.pdfWe have a discrete (binary) ^text _ that notes if the hand is near a cleaning instrument,](https://reader033.vdocument.in/reader033/viewer/2022053011/5f0e65c87e708231d43f0ce3/html5/thumbnails/101.jpg)
How do we find approximate motif in a time series?
The obvious brute force search algorithm is just too slow…
Our algorithm is based on a hot idea from bioinformatics, random projection* and the fact that SAX allows us to lower bound discrete representations of time series.
* J Buhler and M Tompa. Finding motifs using random projections. In RECOMB'01. 2001.
Symbolic Aggregate ApproXimation
106
![Page 102: Tutorial 2 Finding Repeated Structure in Time Series ...mueen/Tutorial/SDM2015Tutorial2.pdfWe have a discrete (binary) ^text _ that notes if the hand is near a cleaning instrument,](https://reader033.vdocument.in/reader033/viewer/2022053011/5f0e65c87e708231d43f0ce3/html5/thumbnails/102.jpg)
How do we obtain SAX?
First convert the time series to PAA representation, then convert the PAA to symbols
It take linear time
0 20 40 60 80 100 120
C
C
0
--
0 20 40 60 80 100 120
bbb
a
cc
c
a
baabccbc 107
![Page 103: Tutorial 2 Finding Repeated Structure in Time Series ...mueen/Tutorial/SDM2015Tutorial2.pdfWe have a discrete (binary) ^text _ that notes if the hand is near a cleaning instrument,](https://reader033.vdocument.in/reader033/viewer/2022053011/5f0e65c87e708231d43f0ce3/html5/thumbnails/103.jpg)
Time series subsequences tend to have a highly Gaussian distribution
-10 0 10
0.001
0.003
0.01 0.02
0.05
0.10
0.25
0.50
0.75
0.90
0.95
0.98 0.99
0.997
0.999
Pro
babi
lity
A normal probability plot of the (cumulative) distribution of
values from subsequences of length 128.
0
--
0 20 40 60 80 100 120
bbb
a
cc
c
a
Why a Gaussian?
108
![Page 104: Tutorial 2 Finding Repeated Structure in Time Series ...mueen/Tutorial/SDM2015Tutorial2.pdfWe have a discrete (binary) ^text _ that notes if the hand is near a cleaning instrument,](https://reader033.vdocument.in/reader033/viewer/2022053011/5f0e65c87e708231d43f0ce3/html5/thumbnails/104.jpg)
Visual Comparison
A raw time series of length 128 is transformed into the word “ffffffeeeddcbaabceedcbaaaaacddee.”
– We can use more symbols to represent the time series since each symbol requires fewer bits than real-numbers (float, double)
-3
-2 -1 0 1 2 3
DFT
PLA
Haar
APCA
a b c d e f
109
![Page 105: Tutorial 2 Finding Repeated Structure in Time Series ...mueen/Tutorial/SDM2015Tutorial2.pdfWe have a discrete (binary) ^text _ that notes if the hand is near a cleaning instrument,](https://reader033.vdocument.in/reader033/viewer/2022053011/5f0e65c87e708231d43f0ce3/html5/thumbnails/105.jpg)
ab:
:
a:
b
cc:
:
c:
c
ba:
:
c:
c
ab:
:
a:
c
1
2
:
:
58
:
985
0 500 1000
a c b a
T (m= 1000 )
a = 3 { a,b,c}n = 16w = 4
S^
C 1
C 1^
Assume that we have a time series T
of length 1,000, and a motif of length
16, which occurs twice, at time T1
and time T58.
A simple worked example of approximate motif discovery algorithm
The next 3 slides
110
![Page 106: Tutorial 2 Finding Repeated Structure in Time Series ...mueen/Tutorial/SDM2015Tutorial2.pdfWe have a discrete (binary) ^text _ that notes if the hand is near a cleaning instrument,](https://reader033.vdocument.in/reader033/viewer/2022053011/5f0e65c87e708231d43f0ce3/html5/thumbnails/106.jpg)
A mask {1,2} was randomly chosen, so the values in columns {1,2} were used to project matrix into buckets.
Collisions are recorded by incrementing the appropriate location in the collision matrix
A simple worked example of approximate motif discovery algorithm
111
![Page 107: Tutorial 2 Finding Repeated Structure in Time Series ...mueen/Tutorial/SDM2015Tutorial2.pdfWe have a discrete (binary) ^text _ that notes if the hand is near a cleaning instrument,](https://reader033.vdocument.in/reader033/viewer/2022053011/5f0e65c87e708231d43f0ce3/html5/thumbnails/107.jpg)
A mask {2,4} was randomly chosen, so the values in columns {2,4} were used to project matrix into buckets.
Once again, collisions are recorded by incrementing the appropriate location in the collision matrix
A simple worked example of approximate motif discovery algorithm
112
![Page 108: Tutorial 2 Finding Repeated Structure in Time Series ...mueen/Tutorial/SDM2015Tutorial2.pdfWe have a discrete (binary) ^text _ that notes if the hand is near a cleaning instrument,](https://reader033.vdocument.in/reader033/viewer/2022053011/5f0e65c87e708231d43f0ce3/html5/thumbnails/108.jpg)
0
1 1 E( , , , , ) 1-
2
t i w id
i
k wi ak a w d t
iw a a
We can calculate the expected values in the matrix, assuming there are NO patterns…
two randomly-generated words of size w over an alphabet of size a, the probability that they match with up to d errors
t is the length of the projected string. We conclude that if we have k random strings of size w, an entry of the similarity matrix will be hit on average
113
![Page 109: Tutorial 2 Finding Repeated Structure in Time Series ...mueen/Tutorial/SDM2015Tutorial2.pdfWe have a discrete (binary) ^text _ that notes if the hand is near a cleaning instrument,](https://reader033.vdocument.in/reader033/viewer/2022053011/5f0e65c87e708231d43f0ce3/html5/thumbnails/109.jpg)
Motif significance involves several independent dimensions
Similarity
Support
Length/size
Assessing significance requires estimating a function S:Rd→R over these dimensions so we can rank the motifs 114
![Page 110: Tutorial 2 Finding Repeated Structure in Time Series ...mueen/Tutorial/SDM2015Tutorial2.pdfWe have a discrete (binary) ^text _ that notes if the hand is near a cleaning instrument,](https://reader033.vdocument.in/reader033/viewer/2022053011/5f0e65c87e708231d43f0ce3/html5/thumbnails/110.jpg)
More invariances mean more independent dimensions
Similarity
Support
Length/size
Complexity
Dimensionality
115
![Page 111: Tutorial 2 Finding Repeated Structure in Time Series ...mueen/Tutorial/SDM2015Tutorial2.pdfWe have a discrete (binary) ^text _ that notes if the hand is near a cleaning instrument,](https://reader033.vdocument.in/reader033/viewer/2022053011/5f0e65c87e708231d43f0ce3/html5/thumbnails/111.jpg)
Multi-dimensional Motif
• Synchronous
– Treat it as an even higher dimensional problem
– Simple extensions of uni-dimensional algorithms work
– To find sub-dimensional motifs, all possible sub-spaces have to be considered
116
![Page 112: Tutorial 2 Finding Repeated Structure in Time Series ...mueen/Tutorial/SDM2015Tutorial2.pdfWe have a discrete (binary) ^text _ that notes if the hand is near a cleaning instrument,](https://reader033.vdocument.in/reader033/viewer/2022053011/5f0e65c87e708231d43f0ce3/html5/thumbnails/112.jpg)
Multi-dimensional Motif
• Non-Synchronous
– Lags among motifs are common
– Subsets of dimensions can possibly construct a motif
David Minnen, Charles Isbell, Irfan Essa, and Thad Starner. Detecting Subdimensional Motifs: An Efficient Algorithm for Generalized Multivariate Pattern Discovery. ICDM '07
Alireza Vahdatpour, Navid Amini, Majid Sarrafzadeh: Toward Unsupervised Activity Discovery Using Multi-Dimensional Motif Detection in Time Series. IJCAI 2009: 1261-1266
117
![Page 113: Tutorial 2 Finding Repeated Structure in Time Series ...mueen/Tutorial/SDM2015Tutorial2.pdfWe have a discrete (binary) ^text _ that notes if the hand is near a cleaning instrument,](https://reader033.vdocument.in/reader033/viewer/2022053011/5f0e65c87e708231d43f0ce3/html5/thumbnails/113.jpg)
Coincidence table
coincident(ri, rj) is the number of overlapping occurrences of motif iand motif j
118
![Page 114: Tutorial 2 Finding Repeated Structure in Time Series ...mueen/Tutorial/SDM2015Tutorial2.pdfWe have a discrete (binary) ^text _ that notes if the hand is near a cleaning instrument,](https://reader033.vdocument.in/reader033/viewer/2022053011/5f0e65c87e708231d43f0ce3/html5/thumbnails/114.jpg)
Single-dimensional motifs to graph• Produce a co-occurrence graph
• Nodes are single dimensional motifs
• An edge between x and y denotes, x and y always co-occur within a time lag
• Cluster the graph using min-cut algorithms to find multi-dimensional motifs
x 104
0 0.5 1 1.5 2 2.5 3
0/20/22/4
1/41/21/2
0/30/40/2
0/40/40/4
Hip
Hand
Arm
Legxyz
xyz
xyz
xyz
119
![Page 115: Tutorial 2 Finding Repeated Structure in Time Series ...mueen/Tutorial/SDM2015Tutorial2.pdfWe have a discrete (binary) ^text _ that notes if the hand is near a cleaning instrument,](https://reader033.vdocument.in/reader033/viewer/2022053011/5f0e65c87e708231d43f0ce3/html5/thumbnails/115.jpg)
Open Problems
• New Invariances:– P1: Find repeated patterns under warping
distance.
– P2: Finding motifs under Complexity invariance.• Uniform scaling (Yankov 06)
• Significance:– P3: Assessing significance of motifs without
discretization.• Parameter-free
• Data adaptive
120
![Page 116: Tutorial 2 Finding Repeated Structure in Time Series ...mueen/Tutorial/SDM2015Tutorial2.pdfWe have a discrete (binary) ^text _ that notes if the hand is near a cleaning instrument,](https://reader033.vdocument.in/reader033/viewer/2022053011/5f0e65c87e708231d43f0ce3/html5/thumbnails/116.jpg)
Open Problems
• Algorithmic:– P4: Optimal k-motif for a given threshold– P5: Exact multi-dimensional motif discovery
• Application:– P6: Finding hidden state machine from motifs
• States == clustering• Rules between patterns only• State machine is for rules among clusters• Systems:
– A suite with all the techniques added
• Parallel motif discovery using GPU
121
![Page 117: Tutorial 2 Finding Repeated Structure in Time Series ...mueen/Tutorial/SDM2015Tutorial2.pdfWe have a discrete (binary) ^text _ that notes if the hand is near a cleaning instrument,](https://reader033.vdocument.in/reader033/viewer/2022053011/5f0e65c87e708231d43f0ce3/html5/thumbnails/117.jpg)
References1. Abdullah Mueen, Eamonn J. Keogh: Online discovery and maintenance of time series motifs. KDD 2010: 1089-10982. Abdullah Mueen, Eamonn J. Keogh, Qiang Zhu, Sydney Cash, M. Brandon Westover: Exact Discovery of Time Series
Motifs. SDM 2009: 473-4843. Abdullah Mueen, Eamonn J. Keogh, Nima Bigdely Shamlo: Finding Time Series Motifs in Disk-Resident Data. ICDM 2009:
367-3764. Abdullah Mueen: Enumeration of Time Series Motifs of All Lengths. ICDM 2013: 547-5565. Yuan Hao, Mohammad Shokoohi-Yekta, George Papageorgiou, Eamonn J. Keogh: Parameter-Free Audio Motif Discovery
in Large Data Archives. ICDM 2013: 261-2706. Dragomir Yankov, Eamonn J. Keogh, Jose Medina, Bill Yuan-chi Chiu, Victor B. Zordan: Detecting time series motifs under
uniform scaling. KDD 2007: 844-8537. Xiaopeng Xi, Eamonn J. Keogh, Li Wei, Agenor Mafra-Neto: Finding Motifs in a Database of Shapes. SDM 2007: 249-2608. Bill Yuan-chi Chiu, Eamonn J. Keogh, Stefano Lonardi: Probabilistic discovery of time series motifs. KDD 2003: 493-4989. Pranav Patel, Eamonn J. Keogh, Jessica Lin, Stefano Lonardi: Mining Motifs in Massive Time Series Databases. ICDM
2002: 370-37710. Alireza Vahdatpour, Navid Amini, Majid Sarrafzadeh: Toward Unsupervised Activity Discovery Using Multi-Dimensional
Motif Detection in Time Series. IJCAI 2009: 1261-126611. Debprakash Patnaik, Manish Marwah, Ratnesh Sharma, and Naren Ramakrishnan. 2009. Sustainable operation and
management of data center chillers using temporal data mining. In Proceedings of the 15th ACM SIGKDD international conference on Knowledge discovery and data mining (KDD '09)
12. Yuan Li, Jessica Lin, Tim Oates: Visualizing Variable-Length Time Series Motifs. SDM 2012: 895-90613. Tim Oates, Arnold P. Boedihardjo, Jessica Lin, Crystal Chen, Susan Frankenstein, and Sunil Gandhi. 2013. Motif discovery
in spatial trajectories using grammar inference. In Proceedings of the 22nd ACM international conference on Conference on information & knowledge management(CIKM '13). ACM, New York, NY, USA, 1465-1468.
14. Sorrachai Yingchareonthawornchai, Haemwaan Sivaraks, Thanawin Rakthanmanon, Chotirat Ann Ratanamahatana: Efficient Proper Length Time Series Motif Discovery. ICDM 2013: 1265-1270
15. David Minnen, Charles Lee Isbell Jr., Irfan A. Essa, Thad Starner: Discovering Multivariate Motifs using Subsequence Density Estimation and Greedy Mixture Learning. AAAI 2007: 615-620
123
![Page 118: Tutorial 2 Finding Repeated Structure in Time Series ...mueen/Tutorial/SDM2015Tutorial2.pdfWe have a discrete (binary) ^text _ that notes if the hand is near a cleaning instrument,](https://reader033.vdocument.in/reader033/viewer/2022053011/5f0e65c87e708231d43f0ce3/html5/thumbnails/118.jpg)
References
16. Philippe Beaudoin, Stelian Coros, Michiel van de Panne, and Pierre Poulin. 2008. Motion-motif graphs. In Proceedings of the 2008 ACM SIGGRAPH/Eurographics Symposium on Computer Animation (SCA '08)
17. David Minnen, Charles L. Isbell, Irfan A. Essa, Thad Starner: Detecting Subdimensional Motifs: An Efficient Algorithm for Generalized Multivariate Pattern Discovery. ICDM 2007: 601-606
18. Yuan Hao, Yanping Chen, Jesin Zakaria, Bing Hu, Thanawin Rakthanmanon, Eamonn J. Keogh: Towards never-ending learning from time series streams. KDD 2013: 874-882
19. Gustavo E. A. P. A. Batista, Xiaoyue Wang, Eamonn J. Keogh: A Complexity-Invariant Distance Measure for Time Series. SDM 2011: 699-710
20. Thanawin Rakthanmanon, Eamonn J. Keogh, Stefano Lonardi, Scott Evans: Time Series Epenthesis: Clustering Time Series Streams Requires Ignoring Some Data. ICDM 2011: 547-556
21. Yasser F. O. Mohammad, Toyoaki Nishida:Constrained Motif Discovery in Time Series. New Generation Comput. 27(4): 319-346 (2009)
22. Nuno Castro, Paulo J. Azevedo:Time Series Motifs Statistical Significance. SDM 2011: 687-69823. Bill Yuan-chi Chiu, Eamonn J. Keogh, Stefano Lonardi: Probabilistic discovery of time series motifs. KDD 2003: 493-49824. Zeeshan Syed, Collin Stultz, Manolis Kellis, Piotr Indyk, John V. Guttag: Motif discovery in physiological datasets: A
methodology for inferring predictive elements. TKDD 4(1) (2010)25. Yasser F. O. Mohammad, Toyoaki Nishida: Exact Discovery of Length-Range Motifs. ACIIDS (2) 2014: 23-32
124
![Page 119: Tutorial 2 Finding Repeated Structure in Time Series ...mueen/Tutorial/SDM2015Tutorial2.pdfWe have a discrete (binary) ^text _ that notes if the hand is near a cleaning instrument,](https://reader033.vdocument.in/reader033/viewer/2022053011/5f0e65c87e708231d43f0ce3/html5/thumbnails/119.jpg)
THANK YOUQuestions and Comments
125