Discovering multi-label temporal patterns in sequence databases, Yen-Liang Chen, Shin-Yi Wu, Yu-Cheng Wang, IS (Information Sciences) 2011


Discovering multi-label temporal patterns in sequence databases

Yen-Liang Chen, Shin-Yi Wu, Yu-Cheng Wang

IS (Information Sciences) 2011


OUTLINE

• 1. Introduction
• 2. Related works
• 3. Problem definition
• 4. The algorithm
• 5. Performance evaluation and real case experiments
• 6. Conclusions and future work


1. Introduction

• Multi-label event


1. Introduction

• Multi-label temporal pattern representation

• MLTPM (Multi-label temporal pattern mining) discovers multi-label temporal patterns from multi-label sequence data.


2. Related works

• Allen-based representation: “Maintaining knowledge about temporal intervals”

• Kam and Fu’s method

• TPrefixSpan

• HTPM


3. Problem definition

• Let 1, 2, …, u be all the event types in temporal database D.

• Let Li = {li1, li2, …, lit} be the set of all labels for event type i.

• A multi-label item has three related attributes:
1. event type
2. occurrence number of the event type
3. label index


3. Problem definition

• We define the following notations for a multi-label item it: it.eType (event type), it.oNum (occurrence number), and it.lNum (label index).

• A multi-label sequence is a sequence of multi-label items.

• The total number of items in a multi-label sequence is the length of the sequence.


3. Problem definition

EXAMPLE

The first occurrence of event type a with three statuses: (a11, a13, a12), length = 3

The second occurrence of event type a with two statuses: (a22, a23), length = 2

The first occurrence of event type b with two statuses: (b12, b14), length = 2

The second occurrence of event type b with two statuses: (b22, b23), length = 2

The first occurrence of event type c with one status: (c12), length = 1


3. Problem definition

EXAMPLE

(a11, a13, a12) is the first occurrence of event type a in the sequence.

(a22, a23) is the second occurrence.

a11.oNum = 1

a11.lNum = 1

a11.eType = a

a23.oNum = 2

a23.lNum = 3

a23.eType = a
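
As a minimal sketch, the three attributes can be modelled with a small record type. The class name `Item`, the field names, and the `parse` helper are illustrative assumptions, not part of the slides; `parse` also assumes the flattened notation used here, where the first digit is the occurrence number and the second is the label index, each a single digit.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Item:
    """A multi-label item (hypothetical helper type, not from the slides)."""
    e_type: str  # event type, e.g. "a"
    o_num: int   # occurrence number of the event type
    l_num: int   # label index

    @classmethod
    def parse(cls, text: str) -> "Item":
        # "a23" -> event type "a", occurrence 2, label 3.
        # Assumes a one-letter type and single-digit numbers.
        return cls(text[0], int(text[1]), int(text[2]))


first = Item.parse("a11")
second = Item.parse("a23")
print(first.e_type, first.o_num, first.l_num)
print(second.e_type, second.o_num, second.l_num)
```

A frozen dataclass keeps items hashable, which is convenient if they are later used as dictionary keys or set members.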


3. Problem definition

• Let time(u) be the occurrence time of item u. Then, the order relation Rel(u, v) of two items u and v can be defined as “<” if time(u) < time(v), and as “=” if time(u) = time(v).

• EX: Rel(a11, b12) = “<”, because time(a11) = 4 < time(b12) = 6
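
The definition of Rel can be sketched directly on occurrence times. The function name `rel` and the choice to raise an error when u occurs after v are assumptions; the slides only define the two cases “<” and “=”.

```python
def rel(time_u: int, time_v: int) -> str:
    """Order relation of two items, given their occurrence times."""
    if time_u < time_v:
        return "<"
    if time_u == time_v:
        return "="
    # The slides define Rel only for u occurring no later than v.
    raise ValueError("u occurs after v; swap the arguments")


# Slide example: time(a11) = 4 and time(b12) = 6.
print(rel(4, 6))
```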


3. Problem definition

• A multi-label temporal sequence or pattern is a sequence of multi-label items interleaved with temporal relationships.


3. Problem definition

• In a multi-label sequence or a multi-label temporal pattern, item u must be placed before item v based on the following conditions:


3. Problem definition

EXAMPLE

a11 < a12, b11 < b12

a12 = a22

a13 = b11

a22 = a13


3. Problem definition

• Function Small(⊕r, ⊕r+1, …, ⊕q), where ⊕i ∈ {<, =}, will output “<” if any ⊕i, r ≤ i ≤ q, is “<”. Otherwise, the output of Small is “=”.

• EX: mltp = (a11 < b12 < a12 < a13 = b13 = c11),

then Rel(a12, c11) = Small(<, =, =) = “<”,

and Rel(a13, c11) = Small(=, =) = “=”.
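
A short sketch of Small, and of Rel computed between two positions of a pattern, may make the example easier to follow. The names `small` and `rel_in_pattern` and the 0-based indexing are assumptions made for illustration.

```python
def small(*ops: str) -> str:
    """Return "<" if any relation is "<", otherwise "="."""
    return "<" if "<" in ops else "="


def rel_in_pattern(ops: list, i: int, j: int) -> str:
    """Relation between item i and item j (0-based, i < j).

    ops[k] is the relation written between item k and item k + 1.
    """
    return small(*ops[i:j])


# mltp = (a11 < b12 < a12 < a13 = b13 = c11)
items = ["a11", "b12", "a12", "a13", "b13", "c11"]
ops = ["<", "<", "<", "=", "="]

print(rel_in_pattern(ops, items.index("a12"), items.index("c11")))
print(rel_in_pattern(ops, items.index("a13"), items.index("c11")))
```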


3. Problem definition

EXAMPLE

mltp = (a11 < a12 < a13 < b13 < b14)

mlts = (a11 < a12 < b12 < a13 < b13 < c11 < a22 < b23 < b24)

We show that mltp ⊆ mlts because we can find s1, s2, s4, s8, and s9 in mlts.


3. Problem definition

(Cont.)

• (1) Type equivalence and (2) label equivalence:
p1.eType = s1.eType = a, p1.lNum = s1.lNum = 1
p2.eType = s2.eType = a, p2.lNum = s2.lNum = 2
p3.eType = s4.eType = a, p3.lNum = s4.lNum = 3
p4.eType = s8.eType = b, p4.lNum = s8.lNum = 3
p5.eType = s9.eType = b, p5.lNum = s9.lNum = 4

Item positions in mlts: 1 2 3 4 5 6 7 8 9

mltp = (a11 < a12 < a13 < b13 < b14)

mlts = (a11 < a12 < b12 < a13 < b13 < c11 < a22 < b23 < b24)


3. Problem definition

(Cont.)

• (3) Occurrence number agreement:
p1, p2, and p3 have the same event type and occurrence number, and so do s1, s2, and s4.
p4 and p5 have the same event type and occurrence number, and so do s8 and s9.

Item positions in mlts: 1 2 3 4 5 6 7 8 9

mltp = (a11 < a12 < a13 < b13 < b14)

mlts = (a11 < a12 < b12 < a13 < b13 < c11 < a22 < b23 < b24)


3. Problem definition

(Cont.)

• (4) Same label ordering:
¤1 = Small(⊕1) = Small(<) = “<”
¤2 = Small(⊕2, ⊕3) = Small(<, <) = “<”
¤3 = Small(⊕4, ⊕5, ⊕6, ⊕7) = Small(<, <, <, <) = “<”
¤4 = Small(⊕8) = Small(<) = “<”

Relation positions in mlts: 1 2 3 4 5 6 7 8

mltp = (a11 < a12 < a13 < b13 < b14)

mlts = (a11 < a12 < b12 < a13 < b13 < c11 < a22 < b23 < b24)
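
The four conditions can be sketched as a brute-force containment test. This is only an illustration under stated assumptions, not the authors' procedure: the helper names (`parse`, `matches`, `contains`) are hypothetical, items are parsed from the flattened notation with single-digit numbers, and condition (3) is read as "two pattern items share an event occurrence exactly when their matched sequence items do".

```python
from itertools import combinations


def small(ops):
    """Return "<" if any relation is "<", otherwise "="."""
    return "<" if "<" in ops else "="


def parse(text):
    """Split "a11 < a12 = b11" into (eType, oNum, lNum) items and relations."""
    tokens = text.split()
    items = [(t[0], int(t[1]), int(t[2])) for t in tokens[0::2]]
    return items, tokens[1::2]


def matches(pattern, sequence, idx):
    """Check the four conditions for one candidate mapping idx (0-based)."""
    p_items, p_ops = pattern
    s_items, s_ops = sequence
    # (1) type equivalence and (2) label equivalence
    for p, i in zip(p_items, idx):
        if p[0] != s_items[i][0] or p[2] != s_items[i][2]:
            return False
    # (3) occurrence number agreement (assumed to hold in both directions)
    for a in range(len(idx)):
        for b in range(a + 1, len(idx)):
            same_p = p_items[a][:2] == p_items[b][:2]
            same_s = s_items[idx[a]][:2] == s_items[idx[b]][:2]
            if same_p != same_s:
                return False
    # (4) same ordering: each pattern relation equals Small over the
    # sequence relations lying between the two matched items
    for k in range(len(idx) - 1):
        if p_ops[k] != small(s_ops[idx[k]:idx[k + 1]]):
            return False
    return True


def contains(pattern, sequence):
    """True if some order-preserving mapping satisfies all four conditions."""
    n, m = len(sequence[0]), len(pattern[0])
    return any(matches(pattern, sequence, idx)
               for idx in combinations(range(n), m))


mltp = parse("a11 < a12 < a13 < b13 < b14")
mlts = parse("a11 < a12 < b12 < a13 < b13 < c11 < a22 < b23 < b24")
print(matches(mltp, mlts, (0, 1, 3, 7, 8)))  # s1, s2, s4, s8, s9
print(contains(mltp, mlts))
```

Enumerating every index combination is exponential in the worst case; it is used here only because it mirrors the definition directly.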


4. The algorithm

• There are two kinds of multi-label temporal patterns.

• Intra-event pattern: it consists of only one event occurrence. intra-Lk is the set of frequent intra-event patterns with length k, where k is the number of items.

• Inter-event pattern: it consists of more than one event occurrence. inter-Lk is the set of frequent inter-event patterns with length k, where k is the number of event occurrences.


4. The algorithm

• MLTPM (Multi-label temporal pattern mining)

• Phase 1: intra-event pattern mining, discovering patterns with only one event occurrence.

• Phase 2: inter-event pattern mining, discovering patterns with more than one event occurrence.

• EX: A multi-label temporal pattern a11 < a12 < a13 < a22 < a24 is treated as an inter-event pattern because event type a occurs twice.
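
The distinction between the two kinds of pattern comes down to counting distinct event occurrences, that is, distinct (event type, occurrence number) pairs. The sketch below assumes the flattened item notation with single-digit numbers; the function names are illustrative.

```python
def event_occurrences(items):
    """Distinct (event type, occurrence number) pairs in a pattern."""
    return {(it[0], int(it[1])) for it in items}


def pattern_kind(items):
    """Classify a pattern as intra-event or inter-event."""
    return "intra-event" if len(event_occurrences(items)) == 1 else "inter-event"


# Slide example: event type a occurs twice, so this is an inter-event
# pattern; its inter-event length k is the number of event occurrences.
example = ["a11", "a12", "a13", "a22", "a24"]
print(pattern_kind(example), len(event_occurrences(example)))
print(pattern_kind(["a11", "a12"]))
```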


4. The algorithm

• Phase 1


4. The algorithm

EXAMPLE

Joining (a11) and (a12), we obtain the pattern (a11 < a12).

But occurrence records <(1,4)> and <(2,13)> cannot be joined in this phase.
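
A rough sketch of this phase 1 join, assuming each occurrence record is an (occurrence number, time) pair as the examples suggest. The function name `join_intra` is hypothetical, and attaching the record (2, 13) to label 2 is purely illustrative, since the slides do not say which item it belongs to.

```python
def join_intra(recs_u, recs_v):
    """Join occurrence records of two single-item patterns in one sequence.

    Records pair up only when they share an occurrence number, because
    an intra-event pattern describes a single event occurrence.
    """
    joined = []
    for o_u, t_u in recs_u:
        for o_v, t_v in recs_v:
            if o_u != o_v:
                continue  # different occurrences: left for phase 2
            if t_u > t_v:
                continue  # u must not occur after v
            op = "<" if t_u < t_v else "="
            joined.append(((o_u, t_u), op, (o_v, t_v)))
    return joined


a11_records = [(1, 4)]
a12_records = [(1, 11), (2, 13)]  # (2, 13) is an illustrative assumption
print(join_intra(a11_records, a12_records))
```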


4. The algorithm

EXAMPLE

Generate intra-Lk from intra-L(k-1)


4. The algorithm

• Phase 2


4. The algorithm

• After phase 1, we combine all intra-event patterns to obtain inter-L1.

• When generating inter-L2, GenInterLk joins all pairs of inter-event patterns (including self-joins) in inter-L1.

• The occurrence records for two patterns in inter-L1 are joinable if either:
(1) the patterns have different event types, or
(2) the patterns have the same event type but different occurrence numbers.
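
The joinability rule for inter-L1 can be sketched as a small predicate. The representation is an assumption: each occurrence record is a list of (occurrence number, time) pairs for one event occurrence, so its first pair carries the occurrence number. All times other than time(b12) = 6 are made up for illustration.

```python
def joinable_l1(type_p, rec_p, type_q, rec_q):
    """Can two inter-L1 occurrence records be joined?

    rec_p and rec_q are lists of (occurrence number, time) pairs that
    each describe a single event occurrence.
    """
    if type_p != type_q:
        return True                   # (1) different event types
    return rec_p[0][0] != rec_q[0][0]  # (2) same type, different occurrence


# (a11 < a12) and (b12 < b13): different event types.
print(joinable_l1("a", [(1, 4), (1, 11)], "b", [(1, 6), (1, 9)]))
# (b12) and (b12 < b13): same type, occurrence 1 versus occurrence 2.
print(joinable_l1("b", [(1, 6)], "b", [(2, 15), (2, 18)]))
# Same type and same occurrence: not joinable in phase 2.
print(joinable_l1("b", [(1, 6)], "b", [(1, 6), (1, 9)]))
```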


4. The algorithm

EXAMPLE

The two inter-L1 patterns (a11 < a12) and (b12 < b13) have different event types, so they are joinable.


4. The algorithm

EXAMPLE

Although the two inter-L1 patterns (b12) and (b12 < b13) have the same event type, their occurrence records have different occurrence numbers, so they are joinable.


4. The algorithm

• When generating inter-Lk (k > 2), GenInterLk only joins pairs of inter-event patterns in inter-L(k-1) that have the same first (k-2) events.

• These shared events must have the same occurrence numbers and the same occurrence times.

• Two occurrence records for patterns in inter-L(k-1) are joinable if either:
(1) they have different last event types, or
(2) they have the same last event type but different occurrence numbers.


EXAMPLE

The two inter-L2 patterns are joinable because
(1) They have the same first event, a11 < a12, and the same occurrence record, <(1, 4), (1, 11)>.
(2) Although they have the same last event type b, they have different occurrence numbers.
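
The join test for k > 2 can be sketched in the same spirit. The data layout is an assumption: an occurrence record is a list of events, and each event is a pair of its type and a tuple of (occurrence number, time) pairs. The prefix <(1, 4), (1, 11)> comes from the example; the times of the b events are invented for illustration.

```python
def joinable_lk(rec_p, rec_q):
    """Can two occurrence records of inter-L(k-1) patterns be joined?

    Each record is a list of events; an event is
    (event type, ((occurrence number, time), ...)).
    """
    if len(rec_p) != len(rec_q) or len(rec_p) < 2:
        return False
    # The first k-2 events must agree in type, occurrence number and time.
    if rec_p[:-1] != rec_q[:-1]:
        return False
    (type_p, items_p), (type_q, items_q) = rec_p[-1], rec_q[-1]
    if type_p != type_q:
        return True                       # (1) different last event types
    return items_p[0][0] != items_q[0][0]  # (2) different occurrence numbers


prefix = ("a", ((1, 4), (1, 11)))
p = [prefix, ("b", ((1, 6),))]
q = [prefix, ("b", ((2, 20),))]  # time 20 is a made-up value
print(joinable_lk(p, q))
print(joinable_lk(p, p))
```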


5. Performance evaluation and real case experiments



6. Conclusions and future work

• MLTPM