discovery of meaningful rules in time series sigkdd2015
TRANSCRIPT
![Page 1: Discovery of Meaningful Rules in Time Series SIGKDD2015](https://reader033.vdocument.in/reader033/viewer/2022061520/5697c01c1a28abf838ccff6b/html5/thumbnails/1.jpg)
Discovery of Meaningful Rules in Time Series
SIGKDD2015
![Page 2: Discovery of Meaningful Rules in Time Series SIGKDD2015](https://reader033.vdocument.in/reader033/viewer/2022061520/5697c01c1a28abf838ccff6b/html5/thumbnails/2.jpg)
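What is a rule?
• 静夜思 ("Quiet Night Thought")
• 唐 李白 (Li Bai, Tang dynasty)
• 床前明月光, (Before my bed, bright moonlight,)
• 疑是地上霜。 (I take it for frost on the ground.)
• 举头望明月, (I raise my head to gaze at the bright moon,)
• 低头思故乡。 (I lower my head and think of home.)

• 秋浦歌 ("Song of Qiupu")
• 唐 李白 (Li Bai, Tang dynasty)
• 炉火照天地, (The furnace fire lights up heaven and earth,)
• 红星乱紫烟。 (Red sparks scatter through the purple smoke.)
• 赧郎明月夜, (Flushed workers in the bright moonlit night,)
• 歌曲动寒川。 (Their songs stir the cold river.)

明 (bright) -> 月 (moon): in both poems, every occurrence of the character 明 is immediately followed by 月.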
![Page 3: Discovery of Meaningful Rules in Time Series SIGKDD2015](https://reader033.vdocument.in/reader033/viewer/2022061520/5697c01c1a28abf838ccff6b/html5/thumbnails/3.jpg)
The Raven
• A poem by Edgar Allan Poe
“Once upon a midnight dreary, while I pondered weak and …”
chamber (room) → door
chamber: antecedent
door: consequent
![Page 4: Discovery of Meaningful Rules in Time Series SIGKDD2015](https://reader033.vdocument.in/reader033/viewer/2022061520/5697c01c1a28abf838ccff6b/html5/thumbnails/4.jpg)
Rule in Time Series
• A major difference between text and time series is that the latter does not have a natural segmentation
• onceuponamidnightdrearywhileIponderedweak....
• qncexauponwamidmightmtdreerydwgileuIpponderediweek...
• dist(“chamber”, substring) ≤ t → door
• For example, with t = 2: chanbet → door
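The thresholded match above can be sketched in a few lines. The slides do not say which `dist` is meant for text, so this sketch assumes Hamming distance (number of differing characters); the function names `hamming` and `find_matches` are illustrative, not from the paper.

```python
from typing import List


def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length strings differ."""
    if len(a) != len(b):
        raise ValueError("strings must have equal length")
    return sum(x != y for x, y in zip(a, b))


def find_matches(text: str, pattern: str, t: int) -> List[int]:
    """Start indices of every substring within Hamming distance t of pattern.

    Assumption: dist is Hamming distance; the slides leave it unspecified.
    """
    m = len(pattern)
    return [i for i in range(len(text) - m + 1)
            if hamming(text[i:i + m], pattern) <= t]
```

With t = 2, `find_matches("xxchanbetdoor", "chamber", 2)` reports the noisy occurrence "chanbet", because it differs from "chamber" in exactly two characters.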
![Page 5: Discovery of Meaningful Rules in Time Series SIGKDD2015](https://reader033.vdocument.in/reader033/viewer/2022061520/5697c01c1a28abf838ccff6b/html5/thumbnails/5.jpg)
About lag
The consequent may not immediately follow the antecedent.
chamberdoor, chamberzdoor, chamberxydoor
So we need to define a parameter, maxlag: the maximum number of characters allowed between the antecedent and the consequent.
Example: if maxlag = 2, all of the above predictions are valid.
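A minimal sketch of the maxlag check on text follows. For simplicity it assumes exact matching of both the antecedent and the consequent; the function name `rule_holds` is hypothetical.

```python
def rule_holds(text: str, antecedent: str, consequent: str, maxlag: int) -> bool:
    """True if the consequent starts at most maxlag characters after
    the end of some exact occurrence of the antecedent."""
    start = text.find(antecedent)
    while start != -1:
        end = start + len(antecedent)
        # Try every allowed gap size between antecedent and consequent.
        for lag in range(maxlag + 1):
            if text.startswith(consequent, end + lag):
                return True
        start = text.find(antecedent, start + 1)
    return False
```

With maxlag = 2, the strings chamberdoor, chamberzdoor, and chamberxydoor all satisfy the rule, while a gap of three characters does not.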
![Page 6: Discovery of Meaningful Rules in Time Series SIGKDD2015](https://reader033.vdocument.in/reader033/viewer/2022061520/5697c01c1a28abf838ccff6b/html5/thumbnails/6.jpg)
The formal definition of a rule in time series
“If we see a substring of length ρ that is within distance t of the word chamber, then we fire the rule and expect to see a similar substring to the word door, within a learned distance, in the next maxlag time steps.”
![Page 7: Discovery of Meaningful Rules in Time Series SIGKDD2015](https://reader033.vdocument.in/reader033/viewer/2022061520/5697c01c1a28abf838ccff6b/html5/thumbnails/7.jpg)
A rule looks like this
![Page 8: Discovery of Meaningful Rules in Time Series SIGKDD2015](https://reader033.vdocument.in/reader033/viewer/2022061520/5697c01c1a28abf838ccff6b/html5/thumbnails/8.jpg)
Time Series Motif
The method is based on time series motifs, which have been studied extensively in the literature.
![Page 9: Discovery of Meaningful Rules in Time Series SIGKDD2015](https://reader033.vdocument.in/reader033/viewer/2022061520/5697c01c1a28abf838ccff6b/html5/thumbnails/9.jpg)
Definitions
![Page 10: Discovery of Meaningful Rules in Time Series SIGKDD2015](https://reader033.vdocument.in/reader033/viewer/2022061520/5697c01c1a28abf838ccff6b/html5/thumbnails/10.jpg)
Definitions
![Page 11: Discovery of Meaningful Rules in Time Series SIGKDD2015](https://reader033.vdocument.in/reader033/viewer/2022061520/5697c01c1a28abf838ccff6b/html5/thumbnails/11.jpg)
DATA DISCRETIZATION
Find the minimum and maximum values of the series, then set bin boundaries uniformly spaced between min and max. The resulting bin width is (max - min) / cardinality.
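The uniform binning described here can be sketched as follows. The function name `discretize` is illustrative, and two details are assumptions the slide does not state: the maximum value is folded into the top bin, and a constant series maps to bin 0.

```python
from typing import List, Sequence


def discretize(series: Sequence[float], cardinality: int) -> List[int]:
    """Map each value to an integer bin in [0, cardinality - 1]
    using equal-width bins between the series minimum and maximum."""
    lo, hi = min(series), max(series)
    if hi == lo:
        # Assumption: a constant series carries no shape, so use bin 0.
        return [0] * len(series)
    width = (hi - lo) / cardinality
    # Assumption: the maximum itself is clamped into the top bin.
    return [min(int((x - lo) / width), cardinality - 1) for x in series]
```

For example, with values from 0 to 4 and a cardinality of 4, the bin width is 1 and the maximum value 4 lands in bin 3.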
![Page 12: Discovery of Meaningful Rules in Time Series SIGKDD2015](https://reader033.vdocument.in/reader033/viewer/2022061520/5697c01c1a28abf838ccff6b/html5/thumbnails/12.jpg)
MDL
• Using MDL as the scoring function is novel to this paper
• Why MDL? Why not Euclidean distance (ED)?
• The Euclidean distance does not allow us to compare the quality of consequents with different lengths.
• The Euclidean distance between two subsequences of length ρ can actually decrease when we expand to length ρ + 1 due to the (re)normalization of the data. So not only is the effect of length not linear, it is not even monotonic.
![Page 13: Discovery of Meaningful Rules in Time Series SIGKDD2015](https://reader033.vdocument.in/reader033/viewer/2022061520/5697c01c1a28abf838ccff6b/html5/thumbnails/13.jpg)
What is MDL
• MDL, or Minimum Description Length, is used to score a rule by how many bits it saves.
A hypothesis (green/bold) can be used to score subsequences by subtracting it from them (producing the small integers shown at the top) and encoding the difference vector with Huffman coding.
After encoding, the score is the number of bits needed to store each sequence:
Here the left sequence requires 57 bits, whereas the right sequence requires 84.
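The idea of scoring by encoded size can be sketched as below. This is a simplified illustration, not the paper's exact scoring function: it counts only the Huffman-coded bits of the difference vector and ignores the cost of storing the code table and the hypothesis. The names `huffman_bits` and `bits_given_hypothesis`, and the example data in the note, are made up for illustration.

```python
import heapq
from collections import Counter
from typing import Sequence


def huffman_bits(symbols: Sequence[int]) -> int:
    """Total bits needed to Huffman-encode the symbols (code table excluded)."""
    counts = Counter(symbols)
    if len(counts) == 1:
        # Assumption: a single distinct symbol still costs 1 bit per symbol.
        return len(symbols)
    heap = list(counts.values())
    heapq.heapify(heap)
    total = 0
    while len(heap) > 1:
        a = heapq.heappop(heap)
        b = heapq.heappop(heap)
        # Each merge adds one bit to every symbol under the merged node.
        total += a + b
        heapq.heappush(heap, a + b)
    return total


def bits_given_hypothesis(subsequence: Sequence[int],
                          hypothesis: Sequence[int]) -> int:
    """Bits to encode a discretized subsequence as its difference from a hypothesis."""
    diff = [s - h for s, h in zip(subsequence, hypothesis)]
    return huffman_bits(diff)
```

A subsequence that closely follows the hypothesis yields a difference vector dominated by zeros, which compresses into few bits; a dissimilar subsequence yields many distinct differences and costs more. For instance, against the hypothesis `[1, 2, 3, 4, 5, 6, 7, 8]`, the near copy `[1, 2, 3, 5, 5, 6, 7, 8]` needs fewer bits than `[3, 0, 5, 1, 9, 2, 4, 8]`.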
![Page 14: Discovery of Meaningful Rules in Time Series SIGKDD2015](https://reader033.vdocument.in/reader033/viewer/2022061520/5697c01c1a28abf838ccff6b/html5/thumbnails/14.jpg)
RULE DISCOVERY ALGORITHM
1. A scoring function
2. A search algorithm which repeatedly invokes this scoring function while searching for high quality rules
![Page 15: Discovery of Meaningful Rules in Time Series SIGKDD2015](https://reader033.vdocument.in/reader033/viewer/2022061520/5697c01c1a28abf838ccff6b/html5/thumbnails/15.jpg)
Rule Scoring
• For clarity, we begin by considering the case where maxlag is 0
![Page 16: Discovery of Meaningful Rules in Time Series SIGKDD2015](https://reader033.vdocument.in/reader033/viewer/2022061520/5697c01c1a28abf838ccff6b/html5/thumbnails/16.jpg)
![Page 17: Discovery of Meaningful Rules in Time Series SIGKDD2015](https://reader033.vdocument.in/reader033/viewer/2022061520/5697c01c1a28abf838ccff6b/html5/thumbnails/17.jpg)
![Page 18: Discovery of Meaningful Rules in Time Series SIGKDD2015](https://reader033.vdocument.in/reader033/viewer/2022061520/5697c01c1a28abf838ccff6b/html5/thumbnails/18.jpg)
![Page 19: Discovery of Meaningful Rules in Time Series SIGKDD2015](https://reader033.vdocument.in/reader033/viewer/2022061520/5697c01c1a28abf838ccff6b/html5/thumbnails/19.jpg)
![Page 20: Discovery of Meaningful Rules in Time Series SIGKDD2015](https://reader033.vdocument.in/reader033/viewer/2022061520/5697c01c1a28abf838ccff6b/html5/thumbnails/20.jpg)
![Page 21: Discovery of Meaningful Rules in Time Series SIGKDD2015](https://reader033.vdocument.in/reader033/viewer/2022061520/5697c01c1a28abf838ccff6b/html5/thumbnails/21.jpg)
Motif-Based Rule Searching
• Efficient algorithms for discovering the top K motifs in a time series are well known. This paper uses the MK algorithm.
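To make the notion of a motif concrete, here is a brute-force sketch that finds the single closest pair of non-overlapping subsequences under z-normalized Euclidean distance. It is not the MK algorithm: MK returns the same kind of answer far more efficiently, whereas this quadratic search is only for illustration. The names `znorm` and `top_motif` are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def znorm(x: Sequence[float]) -> List[float]:
    """Rescale a subsequence to zero mean and unit standard deviation."""
    n = len(x)
    mean = sum(x) / n
    std = math.sqrt(sum((v - mean) ** 2 for v in x) / n)
    if std == 0:
        return [0.0] * n  # constant subsequence: no shape to compare
    return [(v - mean) / std for v in x]


def top_motif(series: Sequence[float], m: int) -> Tuple[float, int, int]:
    """Return (distance, i, j) for the closest pair of length-m subsequences.

    Brute force, O(n^2 * m); a stand-in for MK, not an implementation of it.
    """
    subs = [znorm(series[i:i + m]) for i in range(len(series) - m + 1)]
    best = (math.inf, -1, -1)
    for i in range(len(subs)):
        # Start at i + m to skip overlapping (trivial) matches.
        for j in range(i + m, len(subs)):
            d = math.dist(subs[i], subs[j])
            if d < best[0]:
                best = (d, i, j)
    return best
```

In the rule search, motif pairs like the one returned here serve as candidate rules, which are then evaluated by the scoring function.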
![Page 22: Discovery of Meaningful Rules in Time Series SIGKDD2015](https://reader033.vdocument.in/reader033/viewer/2022061520/5697c01c1a28abf838ccff6b/html5/thumbnails/22.jpg)
EXPERIMENT-Zebra finch
![Page 23: Discovery of Meaningful Rules in Time Series SIGKDD2015](https://reader033.vdocument.in/reader033/viewer/2022061520/5697c01c1a28abf838ccff6b/html5/thumbnails/23.jpg)
EXPERIMENT-Energy Disaggregation
Clothes Washer Clothes Dryer
![Page 24: Discovery of Meaningful Rules in Time Series SIGKDD2015](https://reader033.vdocument.in/reader033/viewer/2022061520/5697c01c1a28abf838ccff6b/html5/thumbnails/24.jpg)
Conclusion
1. Applied MDL to score time series rules
2. The rule representation is expressive enough to allow rules with antecedents, consequents, lags, and firing thresholds of different lengths and values
![Page 25: Discovery of Meaningful Rules in Time Series SIGKDD2015](https://reader033.vdocument.in/reader033/viewer/2022061520/5697c01c1a28abf838ccff6b/html5/thumbnails/25.jpg)
Future work
1. On some datasets, Dynamic Time Warping, in single or multi-dimensional cases, may be more robust than the Euclidean distance, but scaling it to massive datasets remains an issue.
2. It may be possible to generalize the rule representation to allow more expressive logical connectives.
3. There are currently no standard benchmarks for time series rule discovery.