Automatic Learning of Predictive CEP Rules: Bridging the Gap between Data Mining and Complex Event Processing

Raef Mousheimish, Yehia Taher and Karine Zeitouni

DAVID Laboratory, University of Versailles, France

33rd Conference on Data Management: Principles, Technologies and Applications (Gestion de Données: Principes, Technologies et Applications), Nancy, 14-17 November 2017

The 11th ACM International Conference on Distributed and Event-Based Systems



Context

• CEP helps to react instantly to situations as they occur

• Employed in different domains:

  • Environmental monitoring

  • Fraud detection

  • Financial applications

  • Anomaly detection

• CEP engines are driven entirely by CEP rules

  • Rules are the only inference mechanism in the CEP world

[Figure: Events enter the CEP Engine, which is driven by CEP Rules and outputs composite events, results, alerts, …]

Examples of Rules

• Simple CEP Rule:

SELECT * FROM WE.win:time(2 minutes) HAVING avg(temperature)>10

• Complex CEP Rule:

CREATE WINDOW tempWin.win:keepall() (place String, avgTemp double)

INSERT INTO tempWin SELECT place, avg(temperature) AS avgTemp FROM WE.win:time(2 minutes) HAVING avg(temperature) > 10

SELECT * FROM PATTERN [EVERY a=tempWin(avgTemp > 15) -> b=tempWin(avgTemp > a.avgTemp)]
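To make the simple rule's semantics concrete, the following Python sketch approximates what it computes: a sliding two-minute window over the WE stream that fires whenever the windowed average temperature exceeds 10. It is not Esper; the class name `AvgTemperatureRule`, timestamps in seconds, and events carrying a single temperature value are hypothetical, and expiry only approximates Esper's time-window semantics.

```python
from collections import deque


class AvgTemperatureRule:
    """Approximates: SELECT * FROM WE.win:time(2 minutes) HAVING avg(temperature) > 10.

    Events are (timestamp_in_seconds, temperature) pairs arriving in time order.
    """

    def __init__(self, window_seconds=120, threshold=10.0):
        self.window_seconds = window_seconds
        self.threshold = threshold
        self.events = deque()  # events currently inside the time window
        self.total = 0.0       # running sum of their temperatures

    def on_event(self, ts, temperature):
        """Add one event and return True if the rule fires for it."""
        self.events.append((ts, temperature))
        self.total += temperature
        # Expire events that are two minutes old or older (the new event always stays).
        while self.events[0][0] <= ts - self.window_seconds:
            _, old = self.events.popleft()
            self.total -= old
        return self.total / len(self.events) > self.threshold
```

For example, feeding the events (0, 8), (60, 14), (130, 9), (200, 9) yields False, True, True, False: the first event expires at t = 130 and the second at t = 200.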


Motivation

• Event-based systems are an active area of research:

  • Scalability

  • Latency

  • Distribution

• Almost all scenarios and problems tackled by researchers are reactive in nature:

  • Anomaly, traffic jam, and fraud detection

• What if an anomaly needs to be predicted?

  • E.g., in a manufacturing process

• How to use CEP in such cases is far from obvious


Predictive CEP

• Predictive CEP has repeatedly been mentioned as a future direction and proposed at the conceptual level [Fulop, 2010] [Engel, 2012]

• However, it remains a vision:

  • No real attempt has been made to produce an easy-to-use predictive CEP system

• The main cause: CEP rules need to be specified manually

  • Users get no support for defining these rules

• It is hard for experts to manually write rules that predict situations

  • This is why fields such as data mining and predictive analytics exist in the first place

• There exists a gap between predictive analytics and CEP that needs to be bridged


Our Objectives

• Move beyond the de facto approach to defining CEP rules:

  • From manual to automatic

• Go from merely reactive rules to predictive rules

• Allow CEP technology to be easily employed in predictive applications

• Create a generic solution that can be used in different domains

• Bridge the gap between data mining and complex event processing


Outline

• Context & Motivation

• High Level Goals

• Objectives

• Contributions:

  • Univariate Shapelet Extraction

  • Sequence Extraction

  • Automatic Learning of Predictive CEP Rules

• Evaluations

• Conclusion

Early Classification on Time Series

• A predictive analytics field that fits exactly our goal of creating a generic, predictive approach

• Time series correspond to timestamped events

• Early classification corresponds to predictive rules

• Shapelet-based classification:

  • Introduced in 2009 [Ye, SIGKDD 2009]

  • Shapelets are temporal patterns associated with classes

  • By definition, they work on univariate time series

  • Suitable for anomaly prediction


Definitions

• A shapelet sh = (s, δ, cs, score), where:

  • s is a subsequence of a time series

  • δ is the distance threshold

  • cs is the class of the shapelet

  • score is a utility score
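A minimal Python sketch of this definition, assuming raw Euclidean distance over sliding windows (the default measure listed later in the deck) and no z-normalization. The names `Shapelet`, `min_distance` and `matches` are hypothetical; a shapelet is taken to match a series when its best-matching window lies within δ.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Shapelet:
    s: tuple       # the subsequence
    delta: float   # distance threshold δ
    cs: str        # class associated with the shapelet
    score: float   # utility score


def min_distance(s, series):
    """Smallest Euclidean distance between s and any same-length window of series."""
    n = len(s)
    windows = (series[i:i + n] for i in range(len(series) - n + 1))
    return min((math.dist(s, w) for w in windows), default=math.inf)


def matches(sh, series):
    """A shapelet matches a series if some window lies within its threshold δ."""
    return min_distance(sh.s, series) <= sh.delta
```

For instance, a shapelet with s = (0, 1, 2) and δ = 0.5 matches [5, 5, 0, 1, 2.1, 5] (best window at distance 0.1) but not [9, 9, 9, 9].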


[Figure: Shapelets; panels: "Real artwork", "Transport Data"]

The best shapelets are:

1. As short as possible

2. Frequent

3. Discriminative

Definitions

• A multivariate (d-dimensional) time series (MTS): $^{d}T = \{T_1, T_2, \ldots, T_d\}$

• A data set of d-dimensional time series: $D_d = \{(^{d}T, c)\}$

• A TAS (Time-Annotated Sequence) is a sequence of shapelets with time annotations between them: $sh_1 \xrightarrow{2} sh_2 \xrightarrow{3} sh_3$

• A TAS is associated with a class: $sh_1 \xrightarrow{2} sh_2 \xrightarrow{3} sh_3 \Rightarrow c$

  • If the TAS appears, then c will probably occur

[Figure: Multivariate Setting]
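The sketch below checks whether a TAS occurs, assuming the start times of each shapelet's matches are already known. Identifying each shapelet by the dimension it belongs to, and reading each time annotation as a maximum gap between consecutive match starts, are assumptions (the latter mirrors the `start - a.start ≤ 2` condition in the complex-rule example later in the deck); `TAS` and `tas_occurs` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TAS:
    dims: tuple      # hypothetical: dimension of each shapelet, in sequence order
    max_gaps: tuple  # time annotation between consecutive shapelets (len(dims) - 1 values)
    cls: str         # class predicted when the TAS appears


def tas_occurs(tas, match_starts):
    """match_starts maps a dimension to the start times of its shapelet's matches.
    Returns True if matches can be chained in order, each within its time annotation."""
    def extend(i, prev):
        if i == len(tas.dims):
            return True
        # Try every later match of the next shapelet that respects the gap (backtracking).
        return any(prev <= t <= prev + tas.max_gaps[i - 1] and extend(i + 1, t)
                   for t in match_starts.get(tas.dims[i], []))

    return any(extend(1, t0) for t0 in match_starts.get(tas.dims[0], []))
```

For example, with `TAS((3, 1, 2), (2, 3), "c")`, matches starting at 10 (dim 3), 11 (dim 1) and 13 (dim 2) satisfy both annotations, whereas a dim-1 match at 15 would break the first one.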


Problem Statement

In general: How to go from classified multivariate time series into ready-to-deploy predictive CEP rules?

1. How to go from classified multivariate time series into time-annotated sequences?

2. How to transform time-annotated sequences into ready-to-deploy predictive CEP rules?

Contribution 1

  Input: Classified multivariate time series

  Output: Time-annotated sequences

Contribution 2

  Input: Time-annotated sequences

  Output: Predictive CEP rules

Our Proposal

• Two contributions:

  1. USE & SEE algorithms to extract predictive temporal patterns with time and sequence constraints (TAS)

  2. A compiler (autoCEP) that transforms these patterns on-the-fly into predictive CEP rules

[Figure: Pipeline. At learning time, the classified MTS data set Dd goes through USE (Univariate Shapelet Extractor), which outputs shapelets, then SEE (SEquence Extractor), which outputs TAS records; autoCEP transforms the TAS into predictive CEP rules. In real time, the CEP engine applies these rules to multivariate time series.]

Step 1: Univariate Shapelet Extractor (USE)

[Figure: The MTS (Multivariate Time Series) is split into univariate time series, and shapelet learning is applied to each of them to obtain univariate shapelets. Pipeline: Dd (classified MTS) → USE → shapelets → SEE → TAS → autoCEP → CEP engine → predictive CEP rules; label: "Parameter-Free".]

Shapelet Learning

1. Get all subsequences whose lengths lie between the minimum and maximum lengths

2. For each subsequence:

   • Calculate similarities

   • Calculate the distance threshold

   • Calculate the utility score

3. End loop

Three types of distance measures:

• Euclidean distance (default)

• Dynamic time warping

• MASS (frequency domain) [Yeh, ICDM 2016]
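The loop above can be sketched in Python for one dimension as follows. How the distance threshold and utility score are computed is not specified, so this sketch makes a hypothetical choice: δ is the distance that maximizes the F1 score of the candidate's class, and that F1 value serves as the utility score. Only Euclidean distance is shown, and all function names are illustrative.

```python
import math


def subsequences(series, min_len, max_len):
    """All subsequences whose length lies between min_len and max_len (inclusive)."""
    for length in range(min_len, max_len + 1):
        for i in range(len(series) - length + 1):
            yield tuple(series[i:i + length])


def min_dist(s, series):
    """Euclidean distance between s and its best-matching window in series."""
    n = len(s)
    return min((math.dist(s, series[i:i + n]) for i in range(len(series) - n + 1)),
               default=math.inf)


def best_threshold(dists, cs):
    """Hypothetical rule: the delta maximizing F1 for class cs; returns (delta, f1)."""
    total_pos = sum(1 for _, label in dists if label == cs)
    best = (0.0, 0.0)
    for delta in sorted({d for d, _ in dists}):
        covered = [label for d, label in dists if d <= delta]
        tp = sum(1 for label in covered if label == cs)
        if tp == 0:
            continue
        precision, recall = tp / len(covered), tp / total_pos
        f1 = 2 * precision * recall / (precision + recall)
        if f1 > best[1]:
            best = (delta, f1)
    return best


def learn_shapelets(dataset, min_len, max_len):
    """dataset: list of (univariate series, class). Returns (s, delta, cs, score) tuples."""
    shapelets = []
    for series, cs in dataset:
        for s in subsequences(series, min_len, max_len):
            dists = [(min_dist(s, other), label) for other, label in dataset]  # similarities
            delta, score = best_threshold(dists, cs)  # distance threshold and utility score
            shapelets.append((s, delta, cs, score))
    return shapelets
```

Every candidate is kept here; the pruning steps that follow would then reduce this list.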

Pruning

• Top-k pruning:

  • Standard top-k pruning

  • Based on the utility score

• Cover pruning

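Cover pruning procedure:

1. Sort the shapelets

2. While the data set is not fully covered and there are still shapelets to test:

   • Take the next shapelet and mark the instances that it covers

   • If it marks at least one not-yet-covered instance (Marked > 0), accept it; otherwise, reject it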
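A Python sketch of cover pruning, assuming shapelets are sorted by utility score in descending order and that the set of training instances each shapelet covers has already been computed; `cover_pruning` and its arguments are hypothetical.

```python
def cover_pruning(shapelets, covers, n_instances):
    """shapelets: list of (shapelet_id, utility_score).
    covers: shapelet_id -> set of training-instance indices the shapelet matches.
    Returns the ids of the accepted shapelets, in the order they were accepted."""
    accepted = []
    covered = set()
    for sid, _ in sorted(shapelets, key=lambda item: item[1], reverse=True):
        if len(covered) == n_instances:  # the whole data set is covered
            break
        newly_marked = covers[sid] - covered
        if newly_marked:  # "Marked > 0": the shapelet covers something new
            covered |= newly_marked
            accepted.append(sid)
    return accepted
```

A shapelet whose instances are already covered by better-scoring shapelets is rejected, which keeps the rule set small without losing coverage.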


Step 2: SEquence Extractor (SEE)

[Figure: the data set Dd is encoded with the learned shapelets; sequence learning and time constraint learning (e.g., Δt1 between consecutive shapelets) produce the time-annotated sequences. Pipeline: Dd (classified MTS) → USE → SEE → autoCEP → CEP engine; label: "Parameter-Free".]

• The shapelets that constitute a sequence are all associated with the same class
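How SEE learns the time annotations is not detailed. One simple, hypothetical rule is sketched below: for a sequence observed in several training instances of the same class, each Δt is the largest gap seen between the corresponding consecutive shapelets. The function name and input format are assumptions.

```python
def learn_time_constraints(occurrences):
    """occurrences: one tuple of shapelet start times per training instance in which
    the sequence sh1, sh2, ..., shk was observed (in order).
    Returns (Δt1, ..., Δt_{k-1}): the largest gap observed between each pair of
    consecutive shapelets."""
    if not occurrences:
        raise ValueError("need at least one occurrence")
    k = len(occurrences[0])
    return tuple(max(occ[i + 1] - occ[i] for occ in occurrences) for i in range(k - 1))
```

With occurrences (10, 11, 13), (0, 2, 5) and (4, 5, 6), the learned annotations are (2, 3).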

Step 3: Automatic Generation of CEP Rules (autoCEP)

• Each TAS is transformed into a set of simple rules and one complex rule

• A simple rule is the translation of an individual shapelet into the CEP language

• A complex rule is the translation of the whole pattern into the CEP language

  • It captures all constraints: the sequence and the time windows between shapelets (i.e., between simple rules)

[Figure: the individual shapelets of a TAS (with time annotations such as Δt2) are transformed into simple rules; the whole TAS is transformed into a complex rule. Pipeline: Dd (classified MTS) → USE → SEE → autoCEP → CEP engine.]

Simple CEP Rule Generation

• Each shapelet in a TAS is transformed into a CEP rule

• A simple rule matches whenever received events are similar to the shapelet that it represents

• Expressed in CEP terms, for sh = (s, δ, cs, score):
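The exact form of the simple rule is not given here. One plausible formalization, assuming the rule compares the shapelet with the last $|s|$ values $x_{t-|s|+1}, \ldots, x_t$ received on the shapelet's dimension, is that the rule for $sh = (s, \delta, cs, score)$ fires at time $t$, predicting class $cs$, when

$$\operatorname{dist}\big(s,\ (x_{t-|s|+1}, \ldots, x_t)\big) \le \delta$$

where $\operatorname{dist}$ is the chosen distance measure (Euclidean by default).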


Complex CEP Rules Generation

• A named window NW is created

• Three simple rules are created

  • They emit their matches to NW

• Complex rule:

  • For every a = NW(dim=3), followed by b = NW(dim=1, start - a.start ≤ 2), followed by c = NW(dim=2, start - b.start ≤ 3)

• Simple rules and complex rules are chained through CEP named windows

[Figure: inside the CEP engine, CEP Rule 1 listens to the input stream (sensors) and outputs to the named window; CEP Rule 2 listens to the named window and outputs the results. The example TAS spans Dim 1, Dim 2 and Dim 3 with time annotations 2 and 3.]
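As an illustration of how a compiler could emit the complex rule, the hypothetical Python helper below turns the dimensions and time annotations of a TAS into an Esper-style EPL pattern that follows the slide's pseudocode, using EPL's `->` (followed-by) operator. The named window `NW` and its `dim` and `start` properties come from the example above; the helper is a sketch, not autoCEP's code.

```python
def compile_complex_rule(dims, max_gaps, window="NW"):
    """Build an EPL pattern for a TAS whose shapelets live on the given dimensions.

    dims: dimension of each shapelet, in sequence order.
    max_gaps: time annotation between consecutive shapelets (len(dims) - 1 values).
    Assumes the named window exposes `dim` and `start` properties."""
    if len(max_gaps) != len(dims) - 1:
        raise ValueError("need one time annotation per pair of consecutive shapelets")
    tags = [chr(ord("a") + i) for i in range(len(dims))]  # a, b, c, ... (up to 26)
    steps = [f"EVERY {tags[0]}={window}(dim={dims[0]})"]
    for i in range(1, len(dims)):
        steps.append(f"{tags[i]}={window}(dim={dims[i]}, "
                     f"start - {tags[i - 1]}.start <= {max_gaps[i - 1]})")
    # '->' is EPL's followed-by operator, so the steps must occur in this order.
    return f"SELECT * FROM PATTERN [{' -> '.join(steps)}]"
```

Calling `compile_complex_rule((3, 1, 2), (2, 3))` produces `SELECT * FROM PATTERN [EVERY a=NW(dim=3) -> b=NW(dim=1, start - a.start <= 2) -> c=NW(dim=2, start - b.start <= 3)]`, matching the pattern above.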

Complete Picture of CEP Rules Generation

• Named windows:

• Simple CEP rules:

• Complex CEP rules:

Experiments: Multivariate Time Series

• Different classification variants:

  1. Closest classification: classify according to the closest pattern so far

  2. First classification: classify according to the first matched pattern

  3. Abnormality detection: ignore normal instances

  4. Majority voting: check every instance against every rule
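Three of these variants can be sketched in Python as follows, assuming each rule match on an instance is reported as a (time, distance, class) triple; abnormality detection is left out because it depends on which class counts as normal. `classify` and the triple format are hypothetical, and "closest" is evaluated over the matches available when the function is called.

```python
from collections import Counter


def classify(matches, variant):
    """matches: list of (time, distance, cls) produced by predictive rules on one instance.
    Returns the predicted class, or None if no rule fired."""
    if not matches:
        return None
    if variant == "first":     # earliest match wins
        return min(matches, key=lambda m: m[0])[2]
    if variant == "closest":   # smallest distance to its pattern wins
        return min(matches, key=lambda m: m[1])[2]
    if variant == "majority":  # most frequent class among all matches wins
        return Counter(m[2] for m in matches).most_common(1)[0][0]
    raise ValueError(f"unknown variant: {variant}")
```

On the matches (5, 0.4, "A"), (2, 0.9, "B"), (7, 0.1, "A"), first classification returns "B" while closest classification and majority voting both return "A".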

https://archive.ics.uci.edu/

Objectives: quality, performance, interpretability

Metrics:

• Average f-score (higher is better)

• Earliness (lower is better)

• Accuracy (higher is better)

• Applicability (higher is better)

• Learning time (lower is better)
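The metrics could be computed as in the Python sketch below. The definitions are assumptions rather than the authors' exact ones: earliness is the average fraction of the series observed before a decision, applicability is the fraction of instances that received a prediction, accuracy is measured on those instances, and the average f-score is the unweighted mean of per-class F1 scores, counting unclassified instances as misses.

```python
def evaluate(results):
    """results: list of (true_class, predicted_class or None, decision_time, series_length)."""
    classified = [r for r in results if r[1] is not None]
    if not classified:
        raise ValueError("no instance was classified")
    applicability = len(classified) / len(results)
    earliness = sum(t / n for _, _, t, n in classified) / len(classified)
    accuracy = sum(y == p for y, p, _, _ in classified) / len(classified)
    f_scores = []
    for c in sorted({y for y, _, _, _ in results}):
        tp = sum(y == c and p == c for y, p, _, _ in results)
        fp = sum(y != c and p == c for y, p, _, _ in results)
        fn = sum(y == c and p != c for y, p, _, _ in results)  # includes unclassified
        f_scores.append(2 * tp / (2 * tp + fp + fn) if tp else 0.0)
    return {"avg_f_score": sum(f_scores) / len(f_scores), "earliness": earliness,
            "accuracy": accuracy, "applicability": applicability}
```

Under these definitions, a method that always waits for the full series (such as Full 1NN) has an earliness of 100%.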


Closest Classification: Comparison

Compared approaches: autoCEP, REACT [REACT, 2015], MSD [MSD, 2013] and Full 1NN (App. = applicability, Acc. = accuracy)

Wafer

Approach    Avg. f-score   Earliness   App.     Acc.
autoCEP     90.5%          28.7%       100%     92.6%
REACT       91.9%          32.8%       100%     -
Full 1NN    87.2%          100%        100%     89.9%

ECG

Approach    Avg. f-score   Earliness   App.     Acc.
autoCEP     81%            21.2%       100%     82.7%
REACT       76.7%          10.5%       100%     -
Full 1NN    87.7%          100%        100%     88.7%
MSD         58.8%          12.8%       100%     -

Robots

Approach    Avg. f-score   Earliness   App.     Acc.
autoCEP     76.6%          50%         100%     80.8%
REACT       72.7%          40.7%       94.7%    -
Full 1NN    71.9%          100%        100%     79.3%
MSD         39.6%          27.4%       96.3%    -


All Classifications

[Figure: accuracy and earliness (%) on the Wafer and ECG data sets for each classification method: first classification, closest classification, abnormality detection, majority voting]

Sensitivity of Parameters

[Figure: sensitivity of parameters on the ECG data set; panels: Lengths (min, max), Distance Measure, Pruning, minAcc (SEE)]

Learning Time

Distances are computed either by brute force or with MASS; in the results, O marks an optimized, multithreaded version.

Time complexity:

• USE (brute force): $O(d \cdot n \cdot m^2 \log n)$

• USE (MASS): $O(d \cdot n \log n)$

• SEE: $O(n \cdot m \cdot d!)$

• autoCEP: negligible cost

[Figure: learning times from empirical experiments with synthetic data]

Interpretability of Rules

26

ECG Data Example Wafer Data Example

Page 27: Automatic Learning of Predictive CEP RulesAutomatic Learning of Predictive CEP Rules Bridging the Gap between Data Mining and Complex Event Processing 33ème conférence sur la Gestion

Conclusion

Learning of advanced temporal patterns from multivariate time series with USE & SEE

• Adapts shapelets to the multivariate setting

• Goes a step further than current state-of-the-art approaches

  • By including sequencing and time constraints

Automatic learning of predictive CEP rules with autoCEP

• Learns data-driven rules

• The first approach to learn predictive CEP rules

• Brings CEP technology to predictive contexts without added complexity

More optimization techniques will be integrated

References

[Margara, 2014] Margara et al. "Learning from the past: automated rule generation for complex event processing." DEBS, 2014.

[Margara, 2013] Margara et al. "Towards automated rule learning for complex event processing." Tech Report, 2013.

[Mutschler, 2012] Mutschler et al. "Learning event detection rules with noise hidden Markov models." AHS, 2012.

[Sen, 2010] Sen et al. "An approach for iterative event pattern recommendation." DEBS, 2010.

[Turchin, 2009] Turchin et al. "Tuning complex event processing rules using the prediction-correction paradigm." DEBS, 2009.

[Ye, 2009] Ye, Lexiang, and Eamonn Keogh. "Time series shapelets: a new primitive for data mining." Proceedings of the 15th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM, 2009.

[Fulop, 2010] Fülöp, Lajos Jenő, et al. "Survey on complex event processing and predictive analytics." Proceedings of the Fifth Balkan Conference in Informatics, 2010.

[Engel, 2012] Engel, Yagil, Opher Etzion, and Zohar Feldman. "A basic model for proactive event-driven computing." Proceedings of the 6th ACM International Conference on Distributed Event-Based Systems. ACM, 2012.

[REACT, 2015] Lin, H.-H. Chen, V. S. Tseng, and J. Pei. "Reliable early classification on multivariate time series with numerical and categorical attributes." Advances in Knowledge Discovery and Data Mining, pages 199-211. Springer, 2015.

[MSD, 2013] Ghalwash, V. Radosavljevic, and Z. Obradovic. "Extraction of interpretable multivariate patterns for early diagnostics." IEEE 13th International Conference on Data Mining (ICDM), pages 201-210. IEEE, 2013.