frequent itemset mining methods

Upload: profnilesh-magar

Post on 16-Apr-2017

Category:

Education


TRANSCRIPT

Page 1: Frequent itemset mining methods

Frequent Item-set Mining Methods

Prepared By: Mr. Nilesh Magar

Page 2: Frequent itemset mining methods

Data Mining:

Data mining is the efficient discovery of valuable, non-obvious information from a large collection of data.

Page 3: Frequent itemset mining methods

Most important concepts in Data-mining

Item-set & frequent item-set:

Market Basket model

Frequent Item-set:

Page 4: Frequent itemset mining methods

Example of Market Basket Model:

B1 = {m, c, b}    B2 = {m, p, j}       B3 = {m, b}       B4 = {c, j}
B5 = {m, p, b}    B6 = {m, c, b, j}    B7 = {c, b, j}    B8 = {b, c}

Suppose min support = 3.

Frequent item-sets: {m:5}, {c:5}, {b:6}, {j:4}, {m, b:4}, {c, b:4}, {j, c:3}.
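The support counts listed above can be checked by brute force. The following is a minimal Python sketch, not part of the original slides; the names `baskets` and `frequent_itemsets` are chosen here for illustration, and exhaustive enumeration is only practical because the example has five distinct items.

```python
from itertools import combinations

# Baskets B1..B8 from the market basket example.
baskets = [
    {"m", "c", "b"}, {"m", "p", "j"}, {"m", "b"}, {"c", "j"},
    {"m", "p", "b"}, {"m", "c", "b", "j"}, {"c", "b", "j"}, {"b", "c"},
]

def frequent_itemsets(baskets, min_support):
    """Return {itemset: support count} for every itemset meeting min_support."""
    items = sorted(set().union(*baskets))
    result = {}
    for size in range(1, len(items) + 1):
        found = False
        for combo in combinations(items, size):
            count = sum(1 for b in baskets if set(combo) <= b)
            if count >= min_support:
                result[frozenset(combo)] = count
                found = True
        if not found:
            # No frequent itemset of this size, so no larger one can be frequent.
            break
    return result

freq = frequent_itemsets(baskets, min_support=3)
for itemset, count in sorted(freq.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), count)
```

Item p occurs in only two baskets, so it does not appear in the result.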

Page 5: Frequent itemset mining methods

Association Rule:

Example domain: a medical diagnosis dataset relating symptoms and illness.

A rule is defined as an implication of the form X → Y, where X, Y ⊆ I (the set of items). In other words, {i1, i2, …, ik} → j means: if a basket contains all of i1, …, ik, then it is likely to contain j.

The probability of finding Y in a basket that contains X is called the confidence of the rule; a rule is accepted when its confidence is high enough.

Conf(X → Y) = SUPP(X ∪ Y) / SUPP(X)

Example, using the baskets of the market basket example: {m, b} → c has confidence = 2/4 = 50%.

Thus association mining is a 2-step process:

1) Find all frequent item-sets.

2) Generate strong association rules from the frequent item-sets.
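The confidence formula can be tried out on the market basket example. This is a minimal Python sketch under the assumption that supports are plain counts over the eight baskets; the helper names `support` and `confidence` are illustrative and do not come from the slides.

```python
# Baskets B1..B8 from the market basket example.
baskets = [
    {"m", "c", "b"}, {"m", "p", "j"}, {"m", "b"}, {"c", "j"},
    {"m", "p", "b"}, {"m", "c", "b", "j"}, {"c", "b", "j"}, {"b", "c"},
]

def support(itemset, baskets):
    """Number of baskets that contain every item of itemset."""
    return sum(1 for b in baskets if set(itemset) <= b)

def confidence(x, y, baskets):
    """Conf(X -> Y) = SUPP(X u Y) / SUPP(X)."""
    return support(set(x) | set(y), baskets) / support(x, baskets)

conf = confidence({"m", "b"}, {"c"}, baskets)
print(f"{conf:.0%}")  # 50%
```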

Page 6: Frequent itemset mining methods

The Apriori algorithm

Mines frequent item-sets for Boolean association rules.

Uses prior knowledge of frequent item-set properties.

Iterative approach known as level-wise search: k-item-sets are used to explore (k+1)-item-sets.

One full scan of the database is required to find each Lk. L1 is the set of items with minimum support, L2 is the set of frequent 2-item-sets, and so on.

Page 7: Frequent itemset mining methods

Two steps:

Join: To find Lk, a set of candidate k-itemsets (Ck) is generated by joining Lk-1 with itself.

Prune: To reduce the size of Ck, the Apriori property is used: if any (k-1)-subset of a candidate k-itemset is not in Lk-1, then the candidate cannot be frequent either, so it can be removed from Ck (subset testing).
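The join and prune steps can be sketched in Python as follows. This is an illustrative sketch rather than the slides' own code: the function name `apriori_gen` and the representation of itemsets as sorted tuples are assumptions made here, and the sample L2 is the one from the nine-transaction example in this deck.

```python
from itertools import combinations

def apriori_gen(prev_frequent, k):
    """Build candidate k-itemsets Ck from the frequent (k-1)-itemsets Lk-1."""
    prev = {tuple(sorted(s)) for s in prev_frequent}
    candidates = set()
    for a in prev:
        for b in prev:
            # Join: merge two (k-1)-itemsets that agree on their first k-2 items.
            if a[:-1] == b[:-1] and a[-1] < b[-1]:
                candidate = a + (b[-1],)
                # Prune: every (k-1)-subset must itself be in Lk-1.
                if all(sub in prev for sub in combinations(candidate, k - 1)):
                    candidates.add(candidate)
    return candidates

L2 = [("I1", "I2"), ("I1", "I3"), ("I1", "I5"),
      ("I2", "I3"), ("I2", "I4"), ("I2", "I5")]
C3 = apriori_gen(L2, 3)
print(sorted(C3))
```

For this L2 the join produces six 3-itemsets and the prune keeps {I1, I2, I3} and {I1, I2, I5}.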

Page 8: Frequent itemset mining methods

Join & prune Step

Page 9: Frequent itemset mining methods

Example:

TID     List of item_IDs
T100    I1, I2, I5
T200    I2, I4
T300    I2, I3
T400    I1, I2, I4
T500    I1, I3
T600    I2, I3
T700    I1, I3
T800    I1, I2, I3, I5
T900    I1, I2, I3

Page 10: Frequent itemset mining methods

Page 11: Frequent itemset mining methods

Scan D for the count of each candidate.
C1: I1 – 6, I2 – 7, I3 – 6, I4 – 2, I5 – 2

Compare each candidate support count with the minimum support count (min_sup = 2).
L1: I1 – 6, I2 – 7, I3 – 6, I4 – 2, I5 – 2

Generate C2 candidates from L1 and scan D for the count of each candidate.
C2: {I1, I2} – 4, {I1, I3} – 4, {I1, I4} – 1, …

Compare each candidate support count with the minimum support count.
L2: {I1, I2} – 4, {I1, I3} – 4, {I1, I5} – 2, {I2, I3} – 4, {I2, I4} – 2, {I2, I5} – 2

Generate C3 candidates from L2 using the join and prune steps.
Join: C3 = L2 x L2 = {{I1, I2, I3}, {I1, I2, I5}, {I1, I3, I5}, {I2, I3, I4}, {I2, I3, I5}, {I2, I4, I5}}
Prune: C3 = {{I1, I2, I3}, {I1, I2, I5}}

Scan D for the count of each candidate.
C3: {I1, I2, I3} – 2, {I1, I2, I5} – 2

Compare each candidate support count with the minimum support count.
L3: {I1, I2, I3} – 2, {I1, I2, I5} – 2

Generate C4 candidates from L3.
C4 = L3 x L3 = {{I1, I2, I3, I5}}
This itemset is pruned because its subset {I2, I3, I5} is not frequent, so C4 = ∅.
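The whole level-wise walk can be reproduced with a short Python sketch. It is a simplified illustration, not the slides' implementation: the function name `apriori`, the tuple representation of itemsets, and the choice to return one dictionary per level are assumptions made here.

```python
from itertools import combinations

# Transaction database D from the example.
D = {
    "T100": {"I1", "I2", "I5"}, "T200": {"I2", "I4"}, "T300": {"I2", "I3"},
    "T400": {"I1", "I2", "I4"}, "T500": {"I1", "I3"}, "T600": {"I2", "I3"},
    "T700": {"I1", "I3"}, "T800": {"I1", "I2", "I3", "I5"},
    "T900": {"I1", "I2", "I3"},
}

def apriori(transactions, min_sup):
    """Return [L1, L2, ...]; each Lk maps a sorted itemset tuple to its count."""
    transactions = [set(t) for t in transactions]
    candidates = sorted((i,) for i in set().union(*transactions))
    levels, k = [], 1
    while candidates:
        # One scan of the database counts every candidate of size k.
        counts = {c: sum(1 for t in transactions if set(c) <= t)
                  for c in candidates}
        frequent = {c: n for c, n in counts.items() if n >= min_sup}
        if not frequent:
            break
        levels.append(frequent)
        k += 1
        # Join the frequent itemsets with themselves, then prune every
        # candidate that has an infrequent (k-1)-subset.
        joined = {a + (b[-1],) for a in frequent for b in frequent
                  if a[:-1] == b[:-1] and a[-1] < b[-1]}
        candidates = sorted(c for c in joined
                            if all(s in frequent for s in combinations(c, k - 1)))
    return levels

levels = apriori(D.values(), min_sup=2)
for size, level in enumerate(levels, start=1):
    print(f"L{size}:", level)
```

The loop stops after L3 because the only 4-item candidate, {I1, I2, I3, I5}, is removed by the prune step.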

Page 12: Frequent itemset mining methods

Generating association rules from frequent item-sets (from Slide 5):

Find the frequent item-sets from the transactions in a database D.

Generate strong association rules:

Confidence(A => B) = P(B | A) = support_count(A ∪ B) / support_count(A)

support_count(A ∪ B): number of transactions containing the itemset A ∪ B

support_count(A): number of transactions containing the itemset A

Page 13: Frequent itemset mining methods

Example: let l = {I1, I2, I5}. The nonempty proper subsets are {I1, I2}, {I1, I5}, {I2, I5}, {I1}, {I2}, {I5}. Generating association rules:

I1 and I2 => I5    conf = 2/4 = 50%
I1 and I5 => I2    conf = 2/2 = 100%
I2 and I5 => I1    conf = 2/2 = 100%
I1 => I2 and I5    conf = 2/6 = 33%
I2 => I1 and I5    conf = 2/7 = 29%
I5 => I1 and I2    conf = 2/2 = 100%

If min_conf is 70%, then only the second, third and last rules above are output.
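The rule generation for l = {I1, I2, I5} can be sketched in Python. The support counts are copied from the Apriori example; the function name `rules_from` and the dictionary layout are illustrative assumptions, not part of the slides.

```python
from itertools import combinations

# Support counts taken from the worked Apriori example.
support_count = {
    frozenset({"I1"}): 6, frozenset({"I2"}): 7, frozenset({"I5"}): 2,
    frozenset({"I1", "I2"}): 4, frozenset({"I1", "I5"}): 2,
    frozenset({"I2", "I5"}): 2, frozenset({"I1", "I2", "I5"}): 2,
}

def rules_from(itemset, support_count, min_conf):
    """Yield (antecedent, consequent, confidence) for rules meeting min_conf."""
    itemset = frozenset(itemset)
    for size in range(1, len(itemset)):
        for antecedent in combinations(sorted(itemset), size):
            a = frozenset(antecedent)
            # Confidence(A => B) = support_count(A u B) / support_count(A)
            conf = support_count[itemset] / support_count[a]
            if conf >= min_conf:
                yield sorted(a), sorted(itemset - a), conf

strong = list(rules_from({"I1", "I2", "I5"}, support_count, min_conf=0.70))
for a, b, conf in strong:
    print(f"{' and '.join(a)} => {' and '.join(b)}  conf={conf:.0%}")
```

With min_conf = 70% the sketch keeps three rules, each with 100% confidence.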

Page 14: Frequent itemset mining methods

Advantages & Disadvantages:

Adv:

1) Uses the large item-set property

2) Easily parallelized

3) Easy to implement

Dis-Adv:

1) Assumes the transaction database is memory resident

2) Requires up to ‘m’ database scans

Page 15: Frequent itemset mining methods

Mining Frequent Itemsets without Candidate Generation

The candidate generate-and-test method:

It may need to generate a huge number of candidate sets.

It may need to repeatedly scan the database and check a large set of candidates by pattern matching.

Frequent-pattern growth method (FP-growth), based on the frequent-pattern tree (FP-tree)

Page 16: Frequent itemset mining methods

Example:

TID     List of item_IDs
T100    I1, I2, I5
T200    I2, I4
T300    I2, I3
T400    I1, I2, I4
T500    I1, I3
T600    I2, I3
T700    I1, I3
T800    I1, I2, I3, I5
T900    I1, I2, I3

Page 17: Frequent itemset mining methods

Step-1: Item Count

Item    Count
I1      6
I2      7
I3      6
I4      2
I5      2

Step-2: Arrange the items of each transaction in descending order of count

TID     List of items (Before)    List of items (After)
T100    I1, I2, I5                I2, I1, I5
T200    I2, I4                    I2, I4
T300    I2, I3                    I2, I3
T400    I1, I2, I4                I2, I1, I4
T500    I1, I3                    I1, I3
T600    I2, I3                    I2, I3
T700    I1, I3                    I1, I3
T800    I1, I2, I3, I5            I2, I1, I3, I5
T900    I1, I2, I3                I2, I1, I3
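Steps 1 and 2 can be sketched in Python as below. Two points are assumptions made for this sketch rather than statements from the slides: min_sup = 2 is carried over from the Apriori example, and ties in count (I1 and I3, I4 and I5) are broken by item name, which happens to reproduce the ordering shown in the table.

```python
from collections import Counter

# Transaction database from the example.
D = {
    "T100": ["I1", "I2", "I5"], "T200": ["I2", "I4"], "T300": ["I2", "I3"],
    "T400": ["I1", "I2", "I4"], "T500": ["I1", "I3"], "T600": ["I2", "I3"],
    "T700": ["I1", "I3"], "T800": ["I1", "I2", "I3", "I5"],
    "T900": ["I1", "I2", "I3"],
}

# Step 1: count how many transactions contain each item.
counts = Counter(item for items in D.values() for item in items)

def order_items(items, counts, min_sup):
    """Drop infrequent items, then sort by descending count (ties: item name)."""
    frequent = [i for i in items if counts[i] >= min_sup]
    return sorted(frequent, key=lambda i: (-counts[i], i))

# Step 2: reorder every transaction.
ordered = {tid: order_items(items, counts, min_sup=2) for tid, items in D.items()}
for tid, items in ordered.items():
    print(tid, items)
```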

Page 18: Frequent itemset mining methods

FP-TREE

Page 19: Frequent itemset mining methods

Item    Conditional Pattern Base           Conditional FP-tree       Frequent Patterns Generated
I5      {{I2, I1:1}, {I2, I1, I3:1}}       (I2:2, I1:2)              {I2, I5:2}, {I1, I5:2}, {I2, I1, I5:2}
I4      {{I2, I1:1}, {I2:1}}               (I2:2)                    {I2, I4:2}
I3      {{I2, I1:2}, {I2:2}, {I1:2}}       (I2:4, I1:2), (I1:2)      {I2, I3:4}, {I1, I3:4}, {I2, I1, I3:2}
I1      {{I2:4}}                           (I2:4)                    {I2, I1:4}
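The conditional pattern bases in the table can be recomputed without building the tree, because each prefix path in the FP-tree corresponds to the run of items that precedes the target item in the ordered transactions. The Python sketch below relies on that shortcut; it is an illustration with a hypothetical helper name, not the FP-growth procedure itself.

```python
from collections import Counter

# Transactions with items already in descending count order (Step-2).
ordered = [
    ["I2", "I1", "I5"], ["I2", "I4"], ["I2", "I3"], ["I2", "I1", "I4"],
    ["I1", "I3"], ["I2", "I3"], ["I1", "I3"], ["I2", "I1", "I3", "I5"],
    ["I2", "I1", "I3"],
]

def conditional_pattern_base(item, ordered_transactions):
    """Count each non-empty prefix that precedes `item` in a transaction."""
    base = Counter()
    for t in ordered_transactions:
        if item in t:
            prefix = tuple(t[:t.index(item)])
            if prefix:
                base[prefix] += 1
    return base

for item in ["I5", "I4", "I3", "I1"]:
    print(item, dict(conditional_pattern_base(item, ordered)))
```

I4 occurs in T200 and T400 only, so its two prefix paths each carry a count of 1.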

Page 20: Frequent itemset mining methods

Mining frequent itemsets using vertical data format

Transforming the horizontal data format of the transaction database D into a vertical data format:

Itemset    TID_set
I1         {T100, T400, T500, T700, T800, T900}
I2         {T100, T200, T300, T400, T600, T800, T900}
I3         {T300, T500, T600, T700, T800, T900}
I4         {T200, T400}
I5         {T100, T800}
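The transformation can be sketched in Python as follows. The slide shows only the transformed table; the second helper, which obtains the support count of a larger itemset by intersecting TID sets, is the standard way the vertical format is used and is added here as an illustration. Both function names are assumptions.

```python
# Horizontal format: TID -> items.
horizontal = {
    "T100": ["I1", "I2", "I5"], "T200": ["I2", "I4"], "T300": ["I2", "I3"],
    "T400": ["I1", "I2", "I4"], "T500": ["I1", "I3"], "T600": ["I2", "I3"],
    "T700": ["I1", "I3"], "T800": ["I1", "I2", "I3", "I5"],
    "T900": ["I1", "I2", "I3"],
}

def to_vertical(transactions):
    """Map each item to the set of TIDs of the transactions that contain it."""
    vertical = {}
    for tid, items in transactions.items():
        for item in items:
            vertical.setdefault(item, set()).add(tid)
    return vertical

def support_count(itemset, vertical):
    """Support count of an itemset = size of the intersection of its TID sets."""
    return len(set.intersection(*(vertical[i] for i in itemset)))

vertical = to_vertical(horizontal)
print(sorted(vertical["I5"]))
print(support_count(["I1", "I2"], vertical))
```

Once the vertical table exists, counting a candidate needs no further database scan, only set intersections.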

Page 21: Frequent itemset mining methods

Example For Practice

Page 22: Frequent itemset mining methods

Minimum support threshold is 3

Page 23: Frequent itemset mining methods

Page 24: Frequent itemset mining methods

TID    List of items (After)
T1     f, c, a, m, p
T2     f, c, a, b, m
T3     f, b
T4     c, b, p
T5     f, c, a, m, p

Page 25: Frequent itemset mining methods

FP-Growth Example

FP-tree (children indented under their parent):

{}
    f:4
        c:3
            a:3
                m:2
                    p:2
                b:1
                    m:1
        b:1
    c:1
        b:1
            p:1

Header Table

Item    Frequency
f       4
c       4
a       3
b       3
m       3
p       3

Page 26: Frequent itemset mining methods

FP-Growth Example

Item    Conditional pattern-base        Conditional FP-tree
p       {(fcam:2), (cb:1)}              {(c:3)}|p
m       {(fca:2), (fcab:1)}             {(f:3, c:3, a:3)}|m
b       {(fca:1), (f:1), (c:1)}         Empty
a       {(fc:3)}                        {(f:3, c:3)}|a
c       {(f:3)}                         {(f:3)}|c
f       Empty                           Empty
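The table can be reproduced with a small Python sketch. It assumes the five transactions are ordered as in the header table (f, c, a, b, m, p) and uses the minimum support threshold of 3 stated for this practice example. The second helper reports only which items survive in each conditional FP-tree, together with their counts, not the tree's shape. Both function names are illustrative.

```python
from collections import Counter

# Ordered transactions T1..T5 (items in header-table order f, c, a, b, m, p).
ordered = [list("fcamp"), list("fcabm"), list("fb"), list("cbp"), list("fcamp")]
MIN_SUP = 3

def conditional_pattern_base(item, ordered_transactions):
    """Count each non-empty prefix that precedes `item` in a transaction."""
    base = Counter()
    for t in ordered_transactions:
        if item in t:
            prefix = tuple(t[:t.index(item)])
            if prefix:
                base[prefix] += 1
    return base

def conditional_frequent_items(base, min_sup):
    """Items of a conditional pattern base whose summed count meets min_sup."""
    totals = Counter()
    for prefix, count in base.items():
        for i in prefix:
            totals[i] += count
    return {i: n for i, n in totals.items() if n >= min_sup}

for item in "pmbacf":
    base = conditional_pattern_base(item, ordered)
    print(item, dict(base), conditional_frequent_items(base, MIN_SUP))
```

For b, no item reaches a summed count of 3 within its pattern base, which is why its conditional FP-tree is empty.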

Page 27: Frequent itemset mining methods

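FP-Tree Algorithm:

Input: DB, min_support

Output: FP-Tree

1. Scan DB and count the support of every item; keep the frequent items.

2. Create a null root.

3. For each transaction T: sort T’s frequent items in descending order of support and set the root as the current node.

For each sorted item I:

If the current node already has a child labelled I, increment that child’s count; otherwise insert I into the tree as a new child of the current node with count 1.

Connect each new tree node to the header list, then make the child the current node.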
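A minimal Python sketch of the FP-tree construction follows. The class `Node`, the function `build_fp_tree` and its optional `order` argument are assumptions made for illustration. The argument lets the caller fix the item order, here the header-table order f, c, a, b, m, p of the practice example, since the slides do not say how ties in support are broken.

```python
from collections import Counter

class Node:
    """One FP-tree node: an item, its count, its parent and its children."""
    def __init__(self, item, parent=None):
        self.item, self.parent = item, parent
        self.count = 0
        self.children = {}

def build_fp_tree(db, min_support, order=None):
    """Return (root, header); header maps each item to the list of its nodes."""
    # 1. Scan DB and count the support of every item.
    counts = Counter(item for t in db for item in set(t))
    if order is None:
        # Descending support; ties broken by item name (an assumption).
        order = sorted(counts, key=lambda i: (-counts[i], i))
    rank = {item: r for r, item in enumerate(order) if counts[item] >= min_support}
    # 2. Create the null root.
    root, header = Node(None), {}
    # 3. Insert each transaction, starting again from the root every time.
    for t in db:
        current = root
        for item in sorted((i for i in set(t) if i in rank), key=rank.get):
            child = current.children.get(item)
            if child is None:
                child = Node(item, current)
                current.children[item] = child
                header.setdefault(item, []).append(child)  # header list link
            child.count += 1
            current = child
    return root, header

db = ["fcamp", "fcabm", "fb", "cbp", "fcamp"]
root, header = build_fp_tree(db, min_support=3, order="fcabmp")
print({item: [n.count for n in nodes] for item, nodes in header.items()})
```

Keeping the header as a plain list of nodes per item stands in for the linked node chain of the original data structure; it is enough to walk from each occurrence of an item up to the root.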

Page 28: Frequent itemset mining methods

FP- Growth Algorithm:

Page 29: Frequent itemset mining methods

Adv. & DisAdv. of FP-Growth:

Adv:

1) Only 2 passes over the data-set

2) No candidate generation

3) Much faster than Apriori

DisAdv:

1) FP-Tree may not fit in memory.

2) FP-Tree is expensive to build.

Page 30: Frequent itemset mining methods

Subjects

1) U.M.L.

2) P.P.L.

3) D.M.D.W.

4) O.S.

5) Programming Languages

6) RDBMS

Mr. Nilesh Magar

Lecturer at MIT, Kothrud, Pune.

9975155310

Page 31: Frequent itemset mining methods

Thank You
