1
Discovering multi-label temporal patterns in sequence databases
Yen-Liang Chen , Shin-Yi Wu, Yu-Cheng Wang
IS (Information Sciences) 2011
2
OUTLINE
• 1. Introduction
• 2. Related works
• 3. Problem definition
• 4. The algorithm
• 5. Performance evaluation and real case experiments
• 6. Conclusions and future work
3
1. Introduction
• Multi-label event
4
1. Introduction
• Multi-label temporal pattern representation
• MLTPM (multi-label temporal pattern mining) discovers multi-label temporal patterns from multi-label sequence data.
5
2. Related works
• Allen-based representation
“Maintaining knowledge about temporal intervals”
• Kam and Fu’s method
• TPrefixSpan
• HTPM
6
3. Problem definition
• Let 1, 2, …, u be all the event types in the temporal database D.
• Let Li = {li1, li2, …, lit} be the set of all labels for event type i.
• A multi-label item has three related attributes:
1. event type
2. occurrence number of the event type
3. label index
7
3. Problem definition
• We define the following notations for a multi-label item it: it.eType (event type), it.oNum (occurrence number), and it.lNum (label index).
• A multi-label sequence is a sequence of multi-label items.
• The total number of items in a multi-label sequence is the length of the sequence.
8
3. Problem definition
EXAMPLE
The first occurrence of event type a with three statuses: (a11, a13, a12), length = 3
The second occurrence of event type a with two statuses: (a22, a23), length = 2
The first occurrence of event type b with two statuses: (b12, b14), length = 2
The second occurrence of event type b with two statuses: (b22, b23), length = 2
The first occurrence of event type c with one status: (c12), length = 1
9
3. Problem definition
EXAMPLE
(a11, a13, a12) is the first occurrence of event type a in the sequence.
(a22, a23) is the second occurrence.
a11 .oNum = 1
a11 .lNum = 1
a11 .eType = a
a23 .oNum = 2
a23 .lNum = 3
a23 .eType = a
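The three attributes can be sketched as a small record type. This is only an illustration: the names `Item` and `parse_item` are hypothetical, and the parser assumes the flat notation used on these slides (a lowercase event type, then a one-digit occurrence number, then a one-digit label index).

```python
import re
from typing import NamedTuple


class Item(NamedTuple):
    """A multi-label item (hypothetical helper type, not from the slides)."""
    e_type: str  # event type, e.g. "a"
    o_num: int   # occurrence number of the event type
    l_num: int   # label index


def parse_item(text: str) -> Item:
    """Parse flat notation such as "a23": type a, occurrence 2, label 3.

    Assumes one digit each for the occurrence number and the label index.
    """
    match = re.fullmatch(r"([a-z]+)(\d)(\d)", text)
    if match is None:
        raise ValueError(f"not a multi-label item: {text!r}")
    return Item(match.group(1), int(match.group(2)), int(match.group(3)))


first = parse_item("a11")
second = parse_item("a23")
print(first.e_type, first.o_num, first.l_num)
print(second.e_type, second.o_num, second.l_num)
```

With this reading, `parse_item("a23")` gives oNum = 2 and lNum = 3, matching the values listed above.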
10
3. Problem definition
• Let time(u) be the occurrence time of item u. Then the order relation Rel(u, v) of two items u and v is “<” if time(u) < time(v), and “=” if time(u) = time(v).
• EX: Rel(a11, b12) = “<”, because time(a11) = 4 < time(b12) = 6.
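A minimal sketch of Rel, assuming occurrence times are stored in a plain dictionary keyed by item name (the dictionary and the function name `rel` are illustrative only). The definition covers only the case where u is not later than v, so the sketch raises an error otherwise.

```python
def rel(time_u: int, time_v: int) -> str:
    """Order relation of two items, given their occurrence times."""
    if time_u < time_v:
        return "<"
    if time_u == time_v:
        return "="
    # The definition above gives no value when u occurs after v.
    raise ValueError("u occurs after v; swap the arguments")


# Occurrence times taken from the example: time(a11) = 4, time(b12) = 6.
times = {"a11": 4, "b12": 6}
print(rel(times["a11"], times["b12"]))
```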
11
3. Problem definition
• A multi-label temporal sequence or pattern is a sequence of multi-label items interweaved with temporal relationships.
12
3. Problem definition
• In a multi-label sequence or a multi-label temporal pattern, item u must be placed before item v based on the following conditions:
13
3. Problem definition
EXAMPLE
a11 < a12, b11 < b12
a12 = a22
a13 = b11
a22 = a13
14
3. Problem definition
• Function Small(⊕r, ⊕r+1, …, ⊕q), where ⊕i ∈ {<, =}, outputs “<” if any ⊕i, r ≤ i ≤ q, is “<”. Otherwise, the output of Small is “=”.
• EX: mltp = (a11 < b12 < a12 < a13 = b13 = c11), then Rel(a12, c11) = Small(<, =, =) = “<”, and Rel(a13, c11) = Small(=, =) = “=”.
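The Small function and its use for Rel inside a pattern can be sketched as follows. The list representation (item names plus the relations between neighbouring items) and the helper `rel_in_pattern` are assumptions made for illustration.

```python
def small(*ops: str) -> str:
    """Return "<" if any of the given relations is "<", otherwise "="."""
    return "<" if "<" in ops else "="


def rel_in_pattern(ops: list, i: int, j: int) -> str:
    """Rel between the items at 0-based positions i < j of a pattern.

    ops[k] is the relation between item k and item k + 1.
    """
    if not 0 <= i < j <= len(ops):
        raise ValueError("need 0 <= i < j <= number of relations")
    return small(*ops[i:j])


# mltp = (a11 < b12 < a12 < a13 = b13 = c11)
items = ["a11", "b12", "a12", "a13", "b13", "c11"]
ops = ["<", "<", "<", "=", "="]

print(rel_in_pattern(ops, items.index("a12"), items.index("c11")))
print(rel_in_pattern(ops, items.index("a13"), items.index("c11")))
```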
15
3. Problem definition
EXAMPLE
mltp = (a11 < a12 < a13 < b13 < b14)
mlts = (a11 < a12 < b12 < a13 < b13 < c11 < a22 < b23 < b24)
We show that mltp ⊆ mlts because we can find s1, s2, s4, s8, and s9 in mlts.
16
3. Problem definition
(Cont.)
• (1) Type equivalence and (2) label equivalence:
p1.eType = s1.eType = a, p1.lNum = s1.lNum = 1
p2.eType = s2.eType = a, p2.lNum = s2.lNum = 2
p3.eType = s4.eType = a, p3.lNum = s4.lNum = 3
p4.eType = s8.eType = b, p4.lNum = s8.lNum = 3
p5.eType = s9.eType = b, p5.lNum = s9.lNum = 4
mltp = (a11 < a12 < a13 < b13 < b14)
mlts = (a11 < a12 < b12 < a13 < b13 < c11 < a22 < b23 < b24), with items numbered 1 to 9
17
3. Problem definition
(Cont.)
• (3) Occurrence number agreement:
p1, p2, p3 have the same event type and occurrence number, and so do s1, s2, s4.
p4, p5 have the same event type and occurrence number, and so do s8, s9.
mltp = (a11 < a12 < a13 < b13 < b14)
mlts = (a11 < a12 < b12 < a13 < b13 < c11 < a22 < b23 < b24), with items numbered 1 to 9
18
3. Problem definition
(Cont.)
• (4) Same label ordering:
¤1 = Small(⊕1) = Small(<) = “<”
¤2 = Small(⊕2, ⊕3) = Small(<, <) = “<”
¤3 = Small(⊕4, ⊕5, ⊕6, ⊕7) = Small(<, <, <, <) = “<”
¤4 = Small(⊕8) = Small(<) = “<”
mltp = (a11 < a12 < a13 < b13 < b14)
mlts = (a11 < a12 < b12 < a13 < b13 < c11 < a22 < b23 < b24), with relations numbered ⊕1 to ⊕8
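The four containment conditions can be checked mechanically for a given choice of positions in mlts. The sketch below only verifies a candidate mapping and does not search for one. All names are hypothetical, and condition (3) is read as an equivalence (items share an event occurrence in the pattern exactly when their images do), which is an assumption beyond what the example states.

```python
from typing import NamedTuple


class Item(NamedTuple):
    e_type: str  # event type
    o_num: int   # occurrence number
    l_num: int   # label index


def items_of(names):
    """Build items from flat names such as "a12" (single digits assumed)."""
    return [Item(n[0], int(n[1]), int(n[2])) for n in names]


def small(ops):
    return "<" if "<" in ops else "="


def is_embedding(p, p_ops, s, s_ops, pos):
    """Check conditions (1) to (4) for 0-based positions pos in s."""
    if len(pos) != len(p) or any(a >= b for a, b in zip(pos, pos[1:])):
        return False
    img = [s[k] for k in pos]
    # (1) type equivalence and (2) label equivalence
    for p_item, s_item in zip(p, img):
        if p_item.e_type != s_item.e_type or p_item.l_num != s_item.l_num:
            return False
    # (3) occurrence number agreement, read as an equivalence (assumption)
    for i in range(len(p)):
        for j in range(i + 1, len(p)):
            same_p = p[i][:2] == p[j][:2]
            same_s = img[i][:2] == img[j][:2]
            if same_p != same_s:
                return False
    # (4) same label ordering: each pattern relation equals Small over
    # the sequence relations between the two matched positions
    for i in range(len(p) - 1):
        if p_ops[i] != small(s_ops[pos[i]:pos[i + 1]]):
            return False
    return True


mltp = items_of(["a11", "a12", "a13", "b13", "b14"])
mlts = items_of(["a11", "a12", "b12", "a13", "b13", "c11", "a22", "b23", "b24"])
mltp_ops = ["<"] * 4
mlts_ops = ["<"] * 8

# s1, s2, s4, s8, s9 as 0-based positions
print(is_embedding(mltp, mltp_ops, mlts, mlts_ops, [0, 1, 3, 7, 8]))
```

Under this reading, the positions s1, s2, s4, s5, s9 would be rejected, because s5 and s9 belong to different occurrences of b while p4 and p5 belong to the same one.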
19
4. The algorithm
• There are two kinds of multi-label temporal patterns.
• Intra-event pattern: it consists of only one event occurrence. intra-Lk is the set of frequent intra-event patterns with length k, where k is the number of items.
• Inter-event pattern: it consists of more than one event occurrence. inter-Lk is the set of frequent inter-event patterns with length k, where k is the number of event occurrences.
20
4. The algorithm
• MLTPM (multi-label temporal pattern mining)
• Phase 1: intra-event pattern mining, discovering patterns with only one event occurrence.
• Phase 2: inter-event pattern mining, discovering patterns with more than one event occurrence.
• EX: A multi-label temporal pattern a11 < a12 < a13 < a22 < a24 is treated as an inter-event pattern because event type a occurs twice.
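The intra/inter distinction can be sketched by counting distinct event occurrences, that is, distinct (event type, occurrence number) pairs. The function names are hypothetical and the flat item notation with single digits is assumed.

```python
def event_occurrences(names):
    """Distinct (event type, occurrence number) pairs among the items."""
    return {(name[0], int(name[1])) for name in names}


def pattern_kind(names):
    """Classify a pattern by the number of event occurrences it contains."""
    if not names:
        raise ValueError("empty pattern")
    return "intra-event" if len(event_occurrences(names)) == 1 else "inter-event"


# The example pattern contains occurrences 1 and 2 of event type a.
print(pattern_kind(["a11", "a12", "a13", "a22", "a24"]))
print(pattern_kind(["a11", "a12", "a13"]))
```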
21
4. The algorithm
• Phase 1
22
4. The algorithm
EXAMPLE
Joining (a11) and (a12), we obtain the pattern (a11 < a12).
But occurrence records <(1,4)> and <(2,13)> cannot be joined in this phase.
23
4. The algorithm
EXAMPLE
Generate intra-Lk from intra-L(k-1)
24
4. The algorithm
• Phase 2
25
4. The algorithm
• After phase 1, we combine all intra-event patterns to obtain inter-L1.
• When generating inter-L2, GenInterLk joins all pairs of inter-event patterns (including self-join) in inter-L1.
• The occurrence records for two patterns in inter-L1 are joinable
(1) if the patterns have different event types, or
(2) if the patterns have the same event type but different occurrence numbers.
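The joinability rule for inter-L1 can be sketched as a predicate on two event occurrences. The type `EventRef` and the function name are hypothetical, and this is not the GenInterLk procedure itself, only the test it is described as applying.

```python
from typing import NamedTuple


class EventRef(NamedTuple):
    """Which event occurrence a record refers to (hypothetical type)."""
    e_type: str  # event type
    o_num: int   # occurrence number


def joinable_inter_l1(x: EventRef, y: EventRef) -> bool:
    """Rule (1): different event types; rule (2): same type, different occurrence."""
    if x.e_type != y.e_type:
        return True
    return x.o_num != y.o_num


print(joinable_inter_l1(EventRef("a", 1), EventRef("b", 1)))  # rule (1)
print(joinable_inter_l1(EventRef("b", 1), EventRef("b", 2)))  # rule (2)
print(joinable_inter_l1(EventRef("b", 1), EventRef("b", 1)))  # same occurrence
```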
26
4. The algorithm
EXAMPLE
The two inter-L1 patterns (a11 < a12) and (b12 < b13) have different event types, so they are joinable.
27
4. The algorithm
EXAMPLE
Although the two inter-L1 patterns (b12) and (b12 < b13) have the same event type, their occurrence records have different occurrence numbers, so they are joinable.
28
4. The algorithm
• When generating inter-Lk (k > 2), GenInterLk only joins pairs of inter-event patterns in inter-L(k-1) that have the same first (k-2) events.
• In their occurrence records, these shared events must have the same occurrence number and the same occurrence time.
• Two occurrence records for patterns in inter-L(k-1) are joinable
(1) if they have different last event types, or
(2) if they have the same last event type but different occurrence numbers.
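A minimal sketch of the joinability test for k > 2, under an assumed record layout: each occurrence record is a tuple of events, and each event carries its type, its occurrence number, and the occurrence times of its items. The layout, the names, and the times given to the two b events are illustrative assumptions.

```python
from typing import NamedTuple, Tuple


class EventOcc(NamedTuple):
    """One event occurrence inside an occurrence record (hypothetical layout)."""
    e_type: str              # event type
    o_num: int               # occurrence number
    times: Tuple[int, ...]   # occurrence times of the event's items


def joinable_records(x, y) -> bool:
    """x and y are occurrence records of two patterns in inter-L(k-1)."""
    if len(x) != len(y) or len(x) < 2:
        return False
    # The first (k-2) events must agree in type, occurrence number and time.
    if x[:-1] != y[:-1]:
        return False
    last_x, last_y = x[-1], y[-1]
    # Rule (1): different last event types; rule (2): different occurrences.
    return last_x.e_type != last_y.e_type or last_x.o_num != last_y.o_num


# Shared first event a11 < a12 with times 4 and 11; the b times are made up.
shared = EventOcc("a", 1, (4, 11))
rec_1 = (shared, EventOcc("b", 1, (6,)))
rec_2 = (shared, EventOcc("b", 2, (15,)))
print(joinable_records(rec_1, rec_2))
```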
29
EXAMPLE
The two inter-L2 patterns are joinable because
(1) they have the same first event, a11 < a12, and the same occurrence record, <(1,4), (1,11)>;
(2) although they have the same last event type b, they have different occurrence numbers.
30
5. Performance evaluation and real case experiments
31
5. Performance evaluation and real case experiments
32
6. Conclusions and future work
• MLTPM