1.9.association mining 1

Upload: krishver2

Post on 12-Aug-2015

Category:

Education


TRANSCRIPT

Page 1: 1.9.association mining 1

Association Rule Mining 1

Association Rule Mining

Page 2: 1.9.association mining 1

Association Rules

Finds interesting associations and correlation relationships among large sets of data

Supports business decision making; example: market basket analysis

Identifies items likely to be purchased together, which informs advertising strategy, catalog design, and store layout

Page 3: 1.9.association mining 1

Association Rules

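Forming association rules

Universe – the set of all items; each transaction can be represented as a Boolean vector over this universe

Example: Computer ⇒ Accounting_Software [support = 5%, confidence = 60%]

A rule is considered interesting if it satisfies both a minimum support threshold and a minimum confidence threshold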

Page 4: 1.9.association mining 1

Basic Concepts

I = {i1, i2, …, im} – set of items

D – set of database transactions

T – a transaction; it contains a set of items such that T ⊆ I

Association rule – A ⇒ B, where A ⊂ I, B ⊂ I and A ∩ B = ∅

Support – percentage of transactions in D containing both A and B, written P(A ∪ B), i.e. the probability that a transaction contains every item of A and of B

Confidence – percentage of transactions in D containing A that also contain B, written P(B|A)

Confidence(A ⇒ B) = Support(A ∪ B) / Support(A)
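The two definitions above can be checked on a toy data set. The following is a minimal Python sketch; the five transactions and the helper names `support` and `confidence` are hypothetical and chosen only to illustrate the formulas.

```python
# Hypothetical toy transactions (not taken from the slides).
transactions = [
    {"computer", "accounting_software"},
    {"computer", "printer"},
    {"computer", "accounting_software", "printer"},
    {"printer"},
    {"computer"},
]

def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    wanted = set(itemset)
    return sum(wanted <= t for t in transactions) / len(transactions)

def confidence(a, b, transactions):
    """Confidence(A => B) = Support(A ∪ B) / Support(A)."""
    return support(set(a) | set(b), transactions) / support(a, transactions)

rule_support = support({"computer", "accounting_software"}, transactions)
rule_confidence = confidence({"computer"}, {"accounting_software"}, transactions)
print(rule_support)     # 2 of 5 transactions -> 0.4
print(rule_confidence)  # 2 of the 4 "computer" transactions -> 0.5
```

Note that `confidence` divides by the support of A, so it assumes A occurs in at least one transaction.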

Page 5: 1.9.association mining 1

Basic Concepts

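Itemset – a set of items

k-itemset – an itemset containing k items

Occurrence frequency of an itemset – the number of transactions that contain the itemset; also called frequency, support_count (absolute support), or count

An itemset satisfies minimum support when count >= min_sup * number of transactions in D; this product is the minimum support count

Frequent itemset – an itemset that satisfies minimum support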

Page 6: 1.9.association mining 1

Association Rule Mining Process

Step 1 – Find all frequent itemsets

Step 2 – Generate strong association rules from the frequent itemsets

Strong rules satisfy both minimum support and minimum confidence
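The second step can be sketched in Python as follows. The transactions are the nine from the example database later in these slides; the function name `strong_rules` and the 70% confidence threshold are assumptions made for illustration, since the slides do not fix a minimum confidence.

```python
from itertools import combinations

# The nine transactions of the example database TDB from these slides.
transactions = [
    {"I1", "I2", "I5"}, {"I2", "I4"}, {"I2", "I3"},
    {"I1", "I2", "I4"}, {"I1", "I3"}, {"I2", "I3"},
    {"I1", "I3"}, {"I1", "I2", "I3", "I5"}, {"I1", "I2", "I3"},
]

def count(itemset, transactions):
    """Support count: number of transactions containing the itemset."""
    return sum(itemset <= t for t in transactions)

def strong_rules(itemset, transactions, min_conf):
    """Split a frequent itemset into A => B and keep rules with enough confidence.

    Assumes `itemset` is frequent, so every antecedent has a non-zero count.
    """
    itemset = frozenset(itemset)
    total = count(itemset, transactions)
    rules = []
    for size in range(1, len(itemset)):
        for antecedent in combinations(sorted(itemset), size):
            a = frozenset(antecedent)
            conf = total / count(a, transactions)
            if conf >= min_conf:
                rules.append((a, itemset - a, conf))
    return rules

# min_conf = 0.7 is an assumed threshold, not a value from the slides.
rules = strong_rules({"I1", "I2", "I5"}, transactions, min_conf=0.7)
for a, b, conf in rules:
    print(sorted(a), "=>", sorted(b), f"confidence = {conf:.0%}")
```

For the frequent itemset {I1, I2, I5}, three of the six possible rules reach this threshold: I5 ⇒ I1, I2; I1, I5 ⇒ I2; and I2, I5 ⇒ I1, each with 100% confidence.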

Page 7: 1.9.association mining 1

Itemsets

Complete itemsets

Closed frequent itemset – X is closed in a data set S if there exists no proper super-itemset Y such that Y has the same support count as X in S; X is a closed frequent itemset if it is both closed and frequent

Maximal frequent itemset – X is frequent and there exists no super-itemset Y such that X ⊂ Y and Y is frequent in S

Example: T = { {a1, a2, …, a100}, {a1, a2, …, a50} }, min_sup = 1 (as a support count)

Closed frequent itemsets: both { {a1, a2, …, a100}: 1, {a1, a2, …, a50}: 2 }

Maximal frequent itemset: {a1, a2, …, a100}
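The example can be reproduced by brute force on a scaled-down version. The sketch below is an illustration only: it assumes 4 and 2 items instead of 100 and 50, because enumerating every subset of 100 items is infeasible, and the variable names are hypothetical.

```python
from itertools import combinations

# Scaled-down analogue of the slide example (assumption: 4 and 2 items
# stand in for 100 and 50 so that all subsets can be enumerated).
transactions = [frozenset({"a1", "a2", "a3", "a4"}), frozenset({"a1", "a2"})]
min_count = 1

items = sorted(set().union(*transactions))

# Support count of every frequent itemset, found by exhaustive enumeration.
support = {}
for k in range(1, len(items) + 1):
    for combo in combinations(items, k):
        itemset = frozenset(combo)
        n = sum(itemset <= t for t in transactions)
        if n >= min_count:
            support[itemset] = n

# Closed: no proper superset has the same support count.
closed = {x: n for x, n in support.items()
          if not any(x < y and support[y] == n for y in support)}

# Maximal: no proper superset is frequent at all.
maximal = [x for x in support if not any(x < y for y in support)]

print({tuple(sorted(x)): n for x, n in closed.items()})
print([tuple(sorted(x)) for x in maximal])
```

As in the slide example, all 15 non-empty itemsets are frequent, but only two are closed ({a1, a2} with count 2 and {a1, a2, a3, a4} with count 1) and only the largest one is maximal.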

Page 8: 1.9.association mining 1

Types of Association Rules

Types of values – Boolean, quantitative association rules

Dimensions of data – single-dimensional, multidimensional

Level of abstraction – multilevel association rules

Kinds of rules – association rules, correlation rules, strong gradient relationships

Completeness of patterns – complete, closed, maximal, top-k, constrained, approximate, …

Page 9: 1.9.association mining 1

Mining Single-Dimensional Boolean Association Rules

Apriori algorithm – finding frequent itemsets using candidate generation

Uses prior knowledge of frequent itemset properties

Level-wise search: frequent k-itemsets are used to explore (k+1)-itemsets

The frequent 1-itemsets form L1; L1 is used to find L2, L2 to find L3, and so on

Page 10: 1.9.association mining 1

Apriori Property

Reduces the search space

All non-empty subsets of a frequent itemset must also be frequent

If P(I) < min_sup, then P(I ∪ A) < min_sup for any item A

Anti-monotone property – if a set cannot pass a test, all of its supersets will fail the test as well
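The property can be verified exhaustively on a small data set. The sketch below uses the nine transactions of the example database later in these slides; the helper `count` and the `violations` list are hypothetical names used only for this check.

```python
from itertools import combinations

# The nine transactions of the example database TDB from these slides.
transactions = [
    {"I1", "I2", "I5"}, {"I2", "I4"}, {"I2", "I3"},
    {"I1", "I2", "I4"}, {"I1", "I3"}, {"I2", "I3"},
    {"I1", "I3"}, {"I1", "I2", "I3", "I5"}, {"I1", "I2", "I3"},
]
items = sorted(set().union(*transactions))

def count(itemset):
    """Support count: number of transactions containing every item."""
    wanted = set(itemset)
    return sum(wanted <= t for t in transactions)

# Adding an item to an itemset should never increase its support count.
violations = [
    (subset, extra)
    for k in range(1, len(items))
    for subset in combinations(items, k)
    for extra in items
    if extra not in subset and count(subset + (extra,)) > count(subset)
]
print(violations)  # [] : no itemset gains support when it grows

# {I3, I5} has count 1, below a minimum count of 2, so no superset can reach 2.
print(count(("I3", "I5")), count(("I1", "I3", "I5")), count(("I2", "I3", "I5")))  # 1 1 1
```

This is why Apriori can discard every candidate that contains an infrequent subset without counting it.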

Page 11: 1.9.association mining 1

Apriori property application

Join step – to find Lk, a set of candidate k-itemsets Ck is generated by joining Lk-1 with itself

li[j] – the jth item in itemset li (items within an itemset are kept in sorted order)

Members of Lk-1 are joinable if their first (k-2) items are in common

Members l1 and l2 of Lk-1 are joinable if (l1[1] = l2[1]) ∧ (l1[2] = l2[2]) ∧ … ∧ (l1[k-2] = l2[k-2]) ∧ (l1[k-1] < l2[k-1])

The resulting itemset is l1[1], l1[2], …, l1[k-1], l2[k-1]
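A minimal Python sketch of the join step follows, assuming itemsets are stored as sorted tuples. The function name `join` is hypothetical; the input L2 is the set of frequent 2-itemsets obtained from the nine-transaction example database in these slides at a minimum support count of 2.

```python
def join(prev_frequent):
    """Join L(k-1) with itself to build candidate k-itemsets.

    Two sorted (k-1)-tuples are joinable when their first k-2 items agree
    and the last item of the first is smaller than the last item of the second.
    """
    prev = sorted(prev_frequent)
    candidates = []
    for i, l1 in enumerate(prev):
        for l2 in prev[i + 1:]:
            if l1[:-1] == l2[:-1] and l1[-1] < l2[-1]:
                candidates.append(l1 + (l2[-1],))
    return candidates

# Frequent 2-itemsets of the slides' example database (minimum count 2).
L2 = [("I1", "I2"), ("I1", "I3"), ("I1", "I5"),
      ("I2", "I3"), ("I2", "I4"), ("I2", "I5")]

C3_joined = join(L2)
print(C3_joined)
# [('I1', 'I2', 'I3'), ('I1', 'I2', 'I5'), ('I1', 'I3', 'I5'),
#  ('I2', 'I3', 'I4'), ('I2', 'I3', 'I5'), ('I2', 'I4', 'I5')]
```

The condition l1[k-1] < l2[k-1] ensures that each candidate is generated only once.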

Page 12: 1.9.association mining 1

Apriori property application

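Prune step – Ck is a superset of Lk: its members may or may not be frequent, but every frequent k-itemset is in Ck

Scan the database to determine the count of each candidate in Ck; candidates whose count reaches the minimum support count form Lk

To reduce the size of Ck before counting: if any (k-1)-subset of a candidate is not in Lk-1, the candidate cannot be frequent and can be removed from Ck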
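The subset check can be sketched in Python as below, again assuming sorted tuples. The function `prune` is a hypothetical helper; L2 and the joined candidates are those that arise from the nine-transaction example database in these slides at a minimum support count of 2.

```python
from itertools import combinations

def has_infrequent_subset(candidate, prev_frequent):
    """True if some (k-1)-subset of the candidate is missing from L(k-1)."""
    return any(subset not in prev_frequent
               for subset in combinations(candidate, len(candidate) - 1))

def prune(candidates, prev_frequent):
    """Drop candidates that cannot be frequent by the Apriori property."""
    prev = set(prev_frequent)
    return [c for c in candidates if not has_infrequent_subset(c, prev)]

# Frequent 2-itemsets and joined 3-item candidates from the slides' example.
L2 = [("I1", "I2"), ("I1", "I3"), ("I1", "I5"),
      ("I2", "I3"), ("I2", "I4"), ("I2", "I5")]
joined = [("I1", "I2", "I3"), ("I1", "I2", "I5"), ("I1", "I3", "I5"),
          ("I2", "I3", "I4"), ("I2", "I3", "I5"), ("I2", "I4", "I5")]

C3 = prune(joined, L2)
print(C3)  # [('I1', 'I2', 'I3'), ('I1', 'I2', 'I5')]
```

Four of the six candidates are removed without a database scan, for example (I1, I3, I5) because (I3, I5) is not in L2.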

Page 13: 1.9.association mining 1

The Apriori Algorithm

Pseudo-code:

Ck: candidate itemsets of size k
Lk: frequent itemsets of size k

L1 = {frequent items};
for (k = 2; Lk-1 ≠ Ø; k++) do begin
    Ck = candidates generated from Lk-1;
    for each transaction t in database do
        increment the count of all candidates in Ck that are contained in t
    Lk = candidates in Ck with min_support
end
return ∪k Lk;

Page 14: 1.9.association mining 1

The Apriori Algorithm—An Example

Database TDB

TID     Items
T100    I1, I2, I5
T200    I2, I4
T300    I2, I3
T400    I1, I2, I4
T500    I1, I3
T600    I2, I3
T700    I1, I3
T800    I1, I2, I3, I5
T900    I1, I2, I3

Minimum support count = 2, i.e. min_sup = 2 / 9 ≈ 22%
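The first database scan on this example can be sketched in Python as follows. The variable names `tdb`, `c1` and `l1` are illustrative; the data and the minimum support count of 2 are those of the slide.

```python
from collections import Counter

# Example database TDB from the slide.
tdb = {
    "T100": ["I1", "I2", "I5"],
    "T200": ["I2", "I4"],
    "T300": ["I2", "I3"],
    "T400": ["I1", "I2", "I4"],
    "T500": ["I1", "I3"],
    "T600": ["I2", "I3"],
    "T700": ["I1", "I3"],
    "T800": ["I1", "I2", "I3", "I5"],
    "T900": ["I1", "I2", "I3"],
}
min_count = 2  # minimum support count from the slide

# C1: support count of every single item (one scan of the database).
c1 = Counter(item for items in tdb.values() for item in items)

# L1: the items whose count reaches the minimum support count.
l1 = {item: n for item, n in sorted(c1.items()) if n >= min_count}
print(l1)  # {'I1': 6, 'I2': 7, 'I3': 6, 'I4': 2, 'I5': 2}
```

Every item is frequent here, so all five 1-itemsets go on to the join step for C2.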

Page 15: 1.9.association mining 1

Apriori Algorithm

Input: database of transactions D; min_sup
Output: L, the frequent itemsets in D

L1 = find_frequent_1-itemsets(D);
for (k = 2; Lk-1 ≠ Ø; k++) {
    Ck = apriori_gen(Lk-1, min_sup);
    for each transaction t ∈ D {
        Ct = subset(Ck, t);    // candidates contained in t
        for each candidate c ∈ Ct
            c.count++;
    }
    Lk = {c ∈ Ck | c.count >= min_sup}
}
return L = ∪k Lk;

Page 16: 1.9.association mining 1

Apriori Algorithm

procedure apriori_gen(Lk-1, min_sup)
    for each itemset l1 ∈ Lk-1
        for each itemset l2 ∈ Lk-1
            if (l1[1] = l2[1]) ∧ (l1[2] = l2[2]) ∧ … ∧ (l1[k-2] = l2[k-2]) ∧ (l1[k-1] < l2[k-1]) {
                c = l1 join l2;    // join step
                if has_infrequent_subset(c, Lk-1) then
                    delete c;      // prune step
                else add c to Ck;
            }
    return Ck;

procedure has_infrequent_subset(c, Lk-1)
    for each (k-1)-subset s of c
        if s is not an element of Lk-1 then
            return TRUE;
    return FALSE;
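The pseudocode above translates fairly directly into Python. The following is a sketch under stated assumptions rather than a reference implementation: itemsets are sorted tuples, min_sup is given as an absolute count (`min_count`), and `apriori_gen` omits the min_sup parameter because the pseudocode never uses it inside that procedure. It is run on the nine-transaction example database from these slides.

```python
from itertools import combinations

def has_infrequent_subset(c, prev_frequent):
    """True if some (k-1)-subset of candidate c is not in L(k-1)."""
    return any(s not in prev_frequent for s in combinations(c, len(c) - 1))

def apriori_gen(prev_frequent):
    """Join L(k-1) with itself, then prune by the Apriori property."""
    prev = sorted(prev_frequent)
    candidates = []
    for i, l1 in enumerate(prev):
        for l2 in prev[i + 1:]:
            if l1[:-1] == l2[:-1] and l1[-1] < l2[-1]:      # join step
                c = l1 + (l2[-1],)
                if not has_infrequent_subset(c, prev_frequent):  # prune step
                    candidates.append(c)
    return candidates

def apriori(transactions, min_count):
    """Return {itemset: support count} for all frequent itemsets."""
    transactions = [frozenset(t) for t in transactions]
    counts = {}
    for t in transactions:                      # find frequent 1-itemsets
        for item in t:
            counts[(item,)] = counts.get((item,), 0) + 1
    lk = {c: n for c, n in counts.items() if n >= min_count}
    frequent = {}
    while lk:                                   # loop until L(k-1) is empty
        frequent.update(lk)
        ck = apriori_gen(set(lk))
        counts = {c: 0 for c in ck}
        for t in transactions:                  # one database scan per level
            for c in ck:
                if t.issuperset(c):
                    counts[c] += 1
        lk = {c: n for c, n in counts.items() if n >= min_count}
    return frequent

tdb = [
    {"I1", "I2", "I5"}, {"I2", "I4"}, {"I2", "I3"},
    {"I1", "I2", "I4"}, {"I1", "I3"}, {"I2", "I3"},
    {"I1", "I3"}, {"I1", "I2", "I3", "I5"}, {"I1", "I2", "I3"},
]
result = apriori(tdb, min_count=2)
for itemset in sorted(result, key=lambda s: (len(s), s)):
    print(itemset, result[itemset])
```

On this database the sketch finds 13 frequent itemsets: five 1-itemsets, six 2-itemsets, and the two 3-itemsets (I1, I2, I3) and (I1, I2, I5), each with count 2. The inner loop tests every candidate against every transaction, which is simpler than the subset function in the pseudocode but slower on large candidate sets.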