ACCTG 6910 Building Enterprise & Business Intelligence Systems (e.bis)

Sequential Pattern Mining

Olivia R. Liu Sheng, Ph.D., Emma Eccles Jones Presidential Chair of Business

Upload: yoko

Post on 06-Jan-2016

Sequential Patterns

Given: A Transaction Database { cid, tid, date, item }

Find: inter-transaction patterns among customers

Example: customers typically rent “Star Wars”, then “Empire Strikes Back”, and then “Return of the Jedi”.

Sequential Patterns

cid  tid  date        item
1    1    01/01/2000  30
1    2    01/02/2000  90
2    3    01/01/2000  40,70
2    4    01/02/2000  30
2    5    01/03/2000  40,60,70
3    6    01/01/2000  30,50,70
4    7    01/01/2000  30
4    8    01/02/2000  40,70
4    9    01/03/2000  90
5    10   01/01/2000  90

Sequential Patterns

Itemset: a non-empty set of items, e.g., {30}, {40,70}.

Sequence: an ordered list of itemsets, e.g., <{30} {40,70}>, <{40,70} {30}>.

The size of a sequence is the number of itemsets in that sequence.

Sequential Patterns

cid  tid  date        item
1    1    01/01/2000  30
1    2    01/02/2000  90
2    3    01/01/2000  40,70
2    4    01/02/2000  30
2    5    01/03/2000  40,60,70
3    6    01/01/2000  30,50,70
4    7    01/01/2000  30
4    8    01/02/2000  40,70
4    9    01/03/2000  90
5    10   01/01/2000  90

Each transaction of a customer can be viewed as an itemset.

A customer’s sequence contains that customer’s itemsets, ordered by transaction date.
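The grouping described above can be sketched in Python. This is only an illustration and not part of the original slides: the names `transactions` and `build_customer_sequences` are hypothetical, and the dates are assumed to be written month/day/year.

```python
from collections import defaultdict
from datetime import date

# Rows of the example transaction database: (cid, tid, date, items).
# Assumption: the slide dates are month/day/year.
transactions = [
    (1, 1, date(2000, 1, 1), {30}),
    (1, 2, date(2000, 1, 2), {90}),
    (2, 3, date(2000, 1, 1), {40, 70}),
    (2, 4, date(2000, 1, 2), {30}),
    (2, 5, date(2000, 1, 3), {40, 60, 70}),
    (3, 6, date(2000, 1, 1), {30, 50, 70}),
    (4, 7, date(2000, 1, 1), {30}),
    (4, 8, date(2000, 1, 2), {40, 70}),
    (4, 9, date(2000, 1, 3), {90}),
    (5, 10, date(2000, 1, 1), {90}),
]


def build_customer_sequences(rows):
    """Group transactions by customer and order each group by date."""
    by_customer = defaultdict(list)
    for cid, tid, day, items in rows:
        by_customer[cid].append((day, tid, frozenset(items)))
    return {
        cid: [items for _, _, items in sorted(entries, key=lambda e: (e[0], e[1]))]
        for cid, entries in by_customer.items()
    }


customer_sequences = build_customer_sequences(transactions)
for cid, seq in sorted(customer_sequences.items()):
    # Print each customer's sequence as a list of sorted itemsets.
    print(cid, [sorted(itemset) for itemset in seq])
```

Each transaction becomes one itemset, and the sort key (date, then tid) fixes the order of the itemsets within a customer's sequence.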

Sequential Patterns

cid  customer sequence
1    <{30} {90}>
2    <{40,70} {30} {40,60,70}>
3    <{30,50,70}>
4    <{30} {40,70} {90}>
5    <{90}>

Sequential Patterns

Sequence <a1 a2 … an> is contained in sequence <b1 b2 … bm> if there exist indexes i1 < i2 < … < in such that

a1 ⊆ b_i1, a2 ⊆ b_i2, …, and an ⊆ b_in.

E.g., <{3} {4,5} {8}> is contained in <{3,8} {4,5,6} {8}>

Is <{3} {4,5} {8}> contained in <{7} {3,8} {9} {4,5,6} {8}> ?

Is <{3} {4,5} {8}> contained in <{7} {9} {4,5,6} {3,8} {8}> ?

Is <{3} {4,5} {8}> contained in <{7} {9} {3,8} {4,5,6}> ?
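The containment test can be sketched as a short Python function. The name `is_contained` is hypothetical and the greedy left-to-right matching is one possible implementation, not something stated on the slides. Each itemset of the first sequence is matched to the earliest remaining itemset of the second that is a superset of it.

```python
def is_contained(a, b):
    """Return True if sequence a is contained in sequence b.

    Both arguments are lists of sets. Matching each itemset of a to the
    earliest usable itemset of b is enough to decide containment.
    """
    j = 0
    for itemset in a:
        # Skip itemsets of b that are not supersets of the current itemset.
        while j < len(b) and not set(itemset) <= set(b[j]):
            j += 1
        if j == len(b):
            return False
        j += 1  # the next itemset of a must match strictly later in b
    return True


# The worked example from the slide.
print(is_contained([{3}, {4, 5}, {8}], [{3, 8}, {4, 5, 6}, {8}]))  # True

# Illustration added here (not on the slide): two itemsets cannot both be
# matched to the same itemset, because the indexes must strictly increase.
print(is_contained([{3}, {5}], [{3, 5}]))  # False
```

The same function can be used to check the three questions above.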

Sequential Patterns

cid  customer sequence
1    <{30} {90}>
2    <{40,70} {30} {40,60,70}>
3    <{30,50,70}>
4    <{30} {40,70} {90}>
5    <{90}>

A customer supports sequence s if s is contained in the sequence for this customer.

E.g., customers 1 and 4 support sequence <{30} {90}>

Sequential Patterns

cid  customer sequence
1    <{30} {90}>
2    <{40,70} {30} {40,60,70}>
3    <{30,50,70}>
4    <{30} {40,70} {90}>
5    <{90}>

The support for a sequence s is defined as the fraction of total customers who support s.

E.g., customers 1 and 4 support sequence <{30} {90}>

Supp(<{30} {90}>) = 2/5 = 40%
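The support calculation can be reproduced with a few lines of Python. The helper names `is_contained`, `supporters`, and `support` are hypothetical, chosen only for this sketch. The data is the customer sequence table above.

```python
# Customer sequences from the table: cid -> ordered list of itemsets.
customer_sequences = {
    1: [{30}, {90}],
    2: [{40, 70}, {30}, {40, 60, 70}],
    3: [{30, 50, 70}],
    4: [{30}, {40, 70}, {90}],
    5: [{90}],
}


def is_contained(a, b):
    """True if sequence a is contained in sequence b."""
    remaining = iter(b)
    # any() consumes `remaining` up to the first match, so each itemset
    # of a can only match an itemset of b that comes strictly later.
    return all(any(x <= y for y in remaining) for x in a)


def supporters(seq, db):
    """Customers whose sequence contains seq."""
    return [cid for cid, cseq in sorted(db.items()) if is_contained(seq, cseq)]


def support(seq, db):
    """Fraction of all customers who support seq."""
    return len(supporters(seq, db)) / len(db)


print(supporters([{30}, {90}], customer_sequences))  # [1, 4]
print(support([{30}, {90}], customer_sequences))     # 0.4
```

Each customer counts at most once, no matter how many times the pattern occurs in that customer's sequence.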

Sequential Patterns

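cid  customer sequence
1    <{30} {90}>
2    <{40,70} {30} {40,60,70}>
3    <{30,50,70}>
4    <{30} {40,70} {90}>
5    <{90}>

Supp(<{40,70}>) = 2/5 = 40% (sequence support: customers 2 and 4, out of 5 customers)

Supp({40,70}) = 3/10 = 30% (itemset support: transactions 3, 5, and 8, out of 10 transactions)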
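The difference between the two figures comes down to what is counted: transactions for an itemset, customers for a sequence. A minimal sketch, with hypothetical variable names and the rows taken from the example transaction database:

```python
# (cid, items) for each of the 10 example transactions, in tid order.
rows = [
    (1, {30}), (1, {90}),
    (2, {40, 70}), (2, {30}), (2, {40, 60, 70}),
    (3, {30, 50, 70}),
    (4, {30}), (4, {40, 70}), (4, {90}),
    (5, {90}),
]

itemset = {40, 70}
matching_rows = [(cid, items) for cid, items in rows if itemset <= items]

# Itemset support: fraction of transactions that contain the itemset.
itemset_support = len(matching_rows) / len(rows)

# Sequence support of <{40,70}>: fraction of customers with at least one
# such transaction. Customer 2 has two of them but is counted once.
all_customers = {cid for cid, _ in rows}
supporting_customers = {cid for cid, _ in matching_rows}
sequence_support = len(supporting_customers) / len(all_customers)

print(itemset_support)   # 0.3
print(sequence_support)  # 0.4
```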

Sequential Patterns Mining

Given: A Transaction Database { cid, tid, date, item }

Find: All sequences whose support is at least the user-specified minimum support (such sequences are called large)

Apriori property: if a sequence is large, then all sequences contained in that sequence are also large.

Sequential Patterns Mining

Identify all Large 1-Sequences

Repeat until there are no more Candidate k-Sequences:

Identify all Candidate k-Sequences using Large (k-1)-Sequences

Join: Two large (k-1)-sequences, L1 and L2, are joinable if they satisfy the following conditions:

L1(1)=L2(1) and L1(2)=L2(2) and … and L1(k-2)=L2(k-2) and L1(k-1)≠L2(k-1)

The resulting candidate k-sequence is L1 followed by L2(k-1).

Prune: prune candidate k-sequences generated in the Join step that have sub-sequences that are not large.

Determine Large k-Sequences from Candidate k-Sequences
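The level-wise procedure above can be put together in a compact Python sketch. It is an illustration under stated assumptions rather than the course's reference implementation. All function names are hypothetical. Large 1-sequences are taken to be every itemset occurring inside some transaction that meets the minimum support. The prune step only checks the (k-1)-sub-sequences obtained by dropping one whole itemset.

```python
from itertools import combinations, permutations

# Customer sequences of the running example (cid -> list of itemsets).
DB = {
    1: [{30}, {90}],
    2: [{40, 70}, {30}, {40, 60, 70}],
    3: [{30, 50, 70}],
    4: [{30}, {40, 70}, {90}],
    5: [{90}],
}


def is_contained(a, b):
    """True if sequence a is contained in sequence b."""
    remaining = iter(b)
    return all(any(x <= y for y in remaining) for x in a)


def support(seq, db):
    """Fraction of customers whose sequence contains seq."""
    return sum(is_contained(seq, cseq) for cseq in db.values()) / len(db)


def large_1_sequences(db, min_sup):
    """Itemsets occurring inside some transaction, kept if they are large."""
    itemsets = set()
    for cseq in db.values():
        for transaction in cseq:
            items = sorted(transaction)
            for size in range(1, len(items) + 1):
                itemsets.update(frozenset(c) for c in combinations(items, size))
    return {(s,) for s in itemsets if support((s,), db) >= min_sup}


def join(large_prev):
    """Join pairs that agree on the first k-2 itemsets and differ in the last."""
    return {
        s1 + (s2[-1],)
        for s1, s2 in permutations(large_prev, 2)
        if s1[:-1] == s2[:-1] and s1[-1] != s2[-1]
    }


def prune(candidates, large_prev):
    """Drop candidates that have a (k-1)-sub-sequence that is not large."""
    return {
        c for c in candidates
        if all(c[:i] + c[i + 1:] in large_prev for i in range(len(c)))
    }


def mine(db, min_sup):
    """Return {k: set of large k-sequences} for each k that has any."""
    result = {}
    large = large_1_sequences(db, min_sup)
    k = 1
    while large:
        result[k] = large
        candidates = prune(join(large), large)
        large = {c for c in candidates if support(c, db) >= min_sup}
        k += 1
    return result


patterns = mine(DB, min_sup=0.4)
for k, seqs in patterns.items():
    print(k, len(seqs))  # number of large k-sequences at each level
```

Sequences are stored as tuples of frozensets so that they can be placed in sets and compared by prefix. With the example data and a 40% minimum support, the loop stops after level 2 because no candidate 3-sequence survives pruning.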

Sequential Patterns Mining

cid  customer sequence
1    <{30} {90}>
2    <{40,70} {30} {40,60,70}>
3    <{30,50,70}>
4    <{30} {40,70} {90}>
5    <{90}>

Minimum Support: 40%

Sequential Patterns Mining

Large 1-Sequence:

<{30}> support=4/5=80%

<{40}> support=2/5=40%

<{70}> support=3/5=60%

<{90}> support=3/5=60%

<{40,70}> support=2/5=40%

cid  customer sequence
1    <{30} {90}>
2    <{40,70} {30} {40,60,70}>
3    <{30,50,70}>
4    <{30} {40,70} {90}>
5    <{90}>

Minimum Support: 40%
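The large 1-sequences listed above can be recomputed by counting, for every itemset that appears inside some transaction, how many customers have it. The sketch below is one way to do this. The helper name `itemsets_of` is hypothetical, and enumerating all subsets is only practical for small transactions like these.

```python
from collections import Counter
from itertools import combinations

customer_sequences = {
    1: [{30}, {90}],
    2: [{40, 70}, {30}, {40, 60, 70}],
    3: [{30, 50, 70}],
    4: [{30}, {40, 70}, {90}],
    5: [{90}],
}
MIN_SUPPORT = 0.4


def itemsets_of(cseq):
    """All non-empty subsets of any transaction of one customer.

    Returned as a set of sorted tuples, so each itemset is counted at
    most once per customer.
    """
    found = set()
    for transaction in cseq:
        items = sorted(transaction)
        for size in range(1, len(items) + 1):
            found.update(combinations(items, size))
    return found


# Number of customers supporting each itemset.
counts = Counter()
for cseq in customer_sequences.values():
    counts.update(itemsets_of(cseq))

n_customers = len(customer_sequences)
large_1 = {
    items: count / n_customers
    for items, count in counts.items()
    if count / n_customers >= MIN_SUPPORT
}

for items, sup in sorted(large_1.items()):
    print(items, sup)
```

Itemsets such as {50}, {60}, or {30,70} are supported by only one customer (20%) and are therefore left out.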

Sequential Patterns Mining

Large 1-Sequence:

<{30}> support=4/5=80%

<{40}> support=2/5=40%

<{70}> support=3/5=60%

<{90}> support=3/5=60%

<{40,70}> support=2/5=40%

Candidate 2-Sequence:

<{30} {40}> <{30} {70}> <{30} {90}> <{30} {40,70}>

<{40} {30}> <{40} {70}> <{40} {90}> <{40} {40,70}>

<{70} {30}> <{70} {40}> <{70} {90}> <{70} {40,70}>

<{90} {30}> <{90} {40}> <{90} {70}> <{90} {40,70}>

<{40,70} {30}> <{40,70} {40}> <{40,70} {70}> <{40,70} {90}>
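For k = 2 the join has no shared prefix to compare, so every ordered pair of distinct large 1-sequences yields a candidate. A short sketch (the helper `fmt` is a hypothetical name, used only to print in the slide notation) reproduces the 20 candidates listed above:

```python
from itertools import permutations

# Itemsets of the large 1-sequences, in the order used on the slide.
large_1 = [
    frozenset({30}),
    frozenset({40}),
    frozenset({70}),
    frozenset({90}),
    frozenset({40, 70}),
]


def fmt(seq):
    """Render a sequence in the slide notation, e.g. <{30} {40,70}>."""
    return "<" + " ".join(
        "{" + ",".join(str(i) for i in sorted(s)) + "}" for s in seq
    ) + ">"


# Ordered pairs of two different large 1-sequences.
candidates_2 = list(permutations(large_1, 2))

print(len(candidates_2))  # 20
for start in range(0, len(candidates_2), 4):
    # Four candidates per line, grouped by their first itemset.
    print(" ".join(fmt(c) for c in candidates_2[start:start + 4]))
```

Order matters, so <{30} {40}> and <{40} {30}> are different candidates. This is one place where the procedure differs from candidate generation for association rules.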

Sequential Patterns Mining

Large 2-Sequence:

<{30} {40}> support=2/5=40%

<{30} {70}> support=2/5=40%

<{30} {90}> support=2/5=40%

<{30} {40,70}> support=2/5=40%

Candidate 2-Sequence:

<{30} {40}> <{30} {70}> <{30} {90}> <{30} {40,70}>

<{40} {30}> <{40} {70}> <{40} {90}> <{40} {40,70}>

<{70} {30}> <{70} {40}> <{70} {90}> <{70} {40,70}>

<{90} {30}> <{90} {40}> <{90} {70}> <{90} {40,70}>

<{40,70} {30}> <{40,70} {40}> <{40,70} {70}> <{40,70} {90}>

Sequential Patterns Mining

Large 2-Sequence:

<{30} {40}> support=2/5=40%

<{30} {70}> support=2/5=40%

<{30} {90}> support=2/5=40%

<{30} {40,70}> support=2/5=40%

Candidate 3-Sequence (after join):

<{30} {40} {70}> <{30} {40} {40,70}>

<{30} {70} {40}> <{30} {70} {40,70}>

<{30} {40,70} {40}> <{30} {40,70} {70}>

<{30} {40} {90}> <{30} {90} {40}>

<{30} {70} {90}> <{30} {90} {70}>

<{30} {90} {40,70}> <{30} {40,70} {90}>

Prune:

All sub-sequences of a candidate k-sequence should be large. Every joined candidate contains a 2-sequence that is not large (e.g., <{30} {40} {70}> contains <{40} {70}>), so all of them are pruned.

Candidate 3-Sequence (after prune):

No candidate 3-sequence. Stop.
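The join and prune steps for k = 3 can be checked with a small sketch. The function names are hypothetical, and the prune test is assumed to look only at the 2-sequences obtained by dropping one itemset from a candidate.

```python
from itertools import permutations


def seq(*itemsets):
    """Build a hashable sequence (tuple of frozensets)."""
    return tuple(frozenset(s) for s in itemsets)


large_2 = {
    seq({30}, {40}),
    seq({30}, {70}),
    seq({30}, {90}),
    seq({30}, {40, 70}),
}


def join(large_prev):
    """Join pairs that agree on all but their last itemset."""
    return {
        s1 + (s2[-1],)
        for s1, s2 in permutations(large_prev, 2)
        if s1[:-1] == s2[:-1] and s1[-1] != s2[-1]
    }


def has_only_large_subsequences(candidate, large_prev):
    """True if dropping any single itemset leaves a large sequence."""
    return all(
        candidate[:i] + candidate[i + 1:] in large_prev
        for i in range(len(candidate))
    )


joined = join(large_2)
survivors = {c for c in joined if has_only_large_subsequences(c, large_2)}

print(len(joined), len(survivors))  # 12 0
```

Dropping the first itemset {30} from any joined candidate leaves a 2-sequence that does not start with {30}, and no such 2-sequence is large, so nothing survives.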

Summary

• What is a sequential pattern?

• What is support for a sequential pattern?

• How to mine sequential patterns?

• What are the similarities and dissimilarities between association rules and sequential patterns mining?