association rule(ar)
TRANSCRIPT
-
8/14/2019 Association Rule(AR)
1/27
Introduction to Association Rules
Albert Orriols i Puig
aorriols@salle.url.edu
Artificial Intelligence Machine Learning
Enginyeria i Arquitectura La Salle
Universitat Ramon Llull
-
Recap of Lecture 5-12
LET'S START WITH DATA
-
Recap of Lecture 5-12
Data Set → Classification Model. How?
We have seen four different types of approaches to classification:
Decision trees (C4.5)
-
Bayesian classifiers (Naïve Bayes)
Neural networks (Perceptron, Adaline, Madaline, ...)
-
Today's Agenda
Introduction to Association Rules
A Taxonomy of Association Rules
Measures of Interest
Apriori
-
Introduction to AR
Ideas come from market basket analysis (MBA)
Let's go shopping!
Customer1: Milk, eggs, sugar, bread
Customer2: Milk, eggs, cereal, bread
Customer3: Eggs, sugar
What do my customers buy? Which products are bought together?
Aim: Find associations and correlations between the different items that customers place in their shopping basket
-
Introduction to AR
Formalizing the problem a little bit
Transaction Database T: a set of transactions T = {t1, t2, ..., tn}
Each transaction contains a set of items (itemset)
An itemset is a collection of items I = {i1, i2, ..., im}
Find frequent/interesting patterns, associations, correlations, or causal structures among sets of items in databases or other information repositories.
X → Y
-
Example of AR
TID   Items
T1    bread, jelly, peanut-butter
T2    bread, peanut-butter
T3    bread, milk, peanut-butter
T4    beer, bread
T5    beer, milk
Examples:
bread → peanut-butter
beer → bread
Frequent itemsets: Items that frequently appear together
I = {bread, peanut-butter}
I = {beer, bread}
-
What's an Interesting Rule?
TID   Items
T1    bread, jelly, peanut-butter
T2    bread, peanut-butter
T3    bread, milk, peanut-butter
T4    beer, bread
T5    beer, milk
Support count (σ)
Frequency of occurrence of an itemset
σ({bread, peanut-butter}) = 3
σ({beer, bread}) = 1
Support (s)
Fraction of transactions that contain an itemset
s({bread, peanut-butter}) = 3/5
s({beer, bread}) = 1/5
Frequent itemset
An itemset whose support is greater than or equal to a minimum support threshold (minsup)
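The definitions on this slide can be checked with a few lines of Python. This is only a minimal sketch: the function names `support_count`, `support`, and `is_frequent` are illustrative choices, not part of the lecture, and the transactions are the five baskets T1..T5 from the slide.

```python
# Transactions T1..T5 from the running example.
transactions = [
    {"bread", "jelly", "peanut-butter"},
    {"bread", "peanut-butter"},
    {"bread", "milk", "peanut-butter"},
    {"beer", "bread"},
    {"beer", "milk"},
]

def support_count(itemset, transactions):
    """Number of transactions that contain every item of `itemset`."""
    itemset = set(itemset)
    return sum(1 for t in transactions if itemset <= t)

def support(itemset, transactions):
    """Fraction of transactions that contain `itemset`."""
    return support_count(itemset, transactions) / len(transactions)

def is_frequent(itemset, transactions, minsup):
    """An itemset is frequent when its support reaches the minsup threshold."""
    return support(itemset, transactions) >= minsup

print(support_count({"bread", "peanut-butter"}, transactions))  # 3
print(support({"beer", "bread"}, transactions))                 # 0.2
```

With a (hypothetical) minsup of 0.6, {bread, peanut-butter} is frequent and {beer, bread} is not.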
-
What's an Interesting Rule?
An association rule is an implication of two itemsets: X → Y
TID   Items
T1    bread, jelly, peanut-butter
T2    bread, peanut-butter
T3    bread, milk, peanut-butter
T4    beer, bread
T5    beer, milk
Many measures of interest.
The two most used are:
Support (s)
The occurring frequency of the rule, i.e., the fraction of transactions that contain both X and Y
s = σ(X ∪ Y) / (# of trans.)
Confidence (c)
The strength of the association, i.e., a measure of how often items in Y appear in transactions that contain X
c = σ(X ∪ Y) / σ(X)
-
Interestingness of Rules
TID   Items
T1    bread, jelly, peanut-butter
T2    bread, peanut-butter
T3    bread, milk, peanut-butter
T4    beer, bread
T5    beer, milk
Rule                     s      c
bread → peanut-butter    0.60   0.75
peanut-butter → bread    0.60   1.00
beer → bread             0.20   0.50
peanut-butter → jelly    0.20   0.33
jelly → peanut-butter    0.20   1.00
jelly → milk             0.00   0.00
Many other interesting measures
approaches
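As a sanity check, the support and confidence values of the rules on this slide can be recomputed from the five transactions. The sketch below assumes the usual definitions, s = (transactions containing X and Y) / (all transactions) and c = (transactions containing X and Y) / (transactions containing X); the helper names `sigma` and `rule_metrics` are made up for the example.

```python
transactions = [
    {"bread", "jelly", "peanut-butter"},   # T1
    {"bread", "peanut-butter"},            # T2
    {"bread", "milk", "peanut-butter"},    # T3
    {"beer", "bread"},                     # T4
    {"beer", "milk"},                      # T5
]

def sigma(itemset):
    """Support count: transactions containing every item of `itemset`."""
    return sum(1 for t in transactions if set(itemset) <= t)

def rule_metrics(x, y):
    """Return (support, confidence) of the rule x -> y."""
    both = sigma(set(x) | set(y))
    s = both / len(transactions)
    c = both / sigma(x) if sigma(x) else 0.0
    return s, c

rules = [
    ({"bread"}, {"peanut-butter"}),
    ({"peanut-butter"}, {"bread"}),
    ({"beer"}, {"bread"}),
    ({"peanut-butter"}, {"jelly"}),
    ({"jelly"}, {"peanut-butter"}),
    ({"jelly"}, {"milk"}),
]
for x, y in rules:
    s, c = rule_metrics(x, y)
    print(f"{sorted(x)} -> {sorted(y)}: s={s:.2f} c={c:.2f}")
```

Note that bread → peanut-butter and peanut-butter → bread share the same support but differ in confidence, because the denominator of c depends on the antecedent only.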
-
Types of AR
Binary association rules:
bread → peanut-butter
Quantitative association rules:
weight in [70kg - 90kg] → height in [170cm - 190cm]
Fuzzy association rules:
weight in TALL → height in TALL
Let's start from the beginning
Binary association rules: Apriori
-
Apriori
This is the most influential AR miner
It consists of two steps
1. Generate all frequent itemsets whose support ≥ minsup
2. Use frequent itemsets to generate association rules
-
Apriori
null
A B C D E
AB AC AD AE BC BD BE CD CE DE
ABC ABD ABE ACD ACE ADE BCD BCE BDE CDE
ABCD ABCE ABDE ACDE BCDE
ABCDE
Do I have to generate them all?
-
Apriori
Let's avoid expanding all the graph
Key idea:
Downward closure property: Any subsets of a frequent itemset are also frequent itemsets
Therefore, the algorithm iteratively does:
Create itemsets
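Read in the other direction, the downward closure property says that an itemset with an infrequent subset cannot be frequent, so it never needs to be counted. The sketch below illustrates that check; the function name `has_infrequent_subset` and the set of frequent 2-itemsets are illustrative values, not taken from this slide.

```python
from itertools import combinations

def has_infrequent_subset(candidate, frequent_smaller):
    """True if some (k-1)-subset of `candidate` is missing from the
    set of frequent (k-1)-itemsets, so the candidate can be discarded."""
    k = len(candidate)
    return any(frozenset(s) not in frequent_smaller
               for s in combinations(candidate, k - 1))

# Hypothetical frequent 2-itemsets: AC, BC, BE, CE.
frequent_2 = {frozenset("AC"), frozenset("BC"), frozenset("BE"), frozenset("CE")}

print(has_infrequent_subset(frozenset("BCE"), frequent_2))  # False: BC, BE, CE are all frequent
print(has_infrequent_subset(frozenset("ABC"), frequent_2))  # True: AB is not frequent
```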
-
Example Itemset Generation
[Figure: itemset lattice over {A, B, C, D, E}, from null down to ABCDE, with one node marked "Infrequent"]
Do I have to generate them all?
-
Recovering the Example
TID   Items
T1    bread, jelly, peanut-butter
T2    bread, peanut-butter
T3    bread, milk, peanut-butter
T4    beer, bread
T5    beer, milk
Minimum support = 3
1-itemsets
Item              count
bread             4
peanut-b          3
jelly             1
milk              2
beer              2
2-itemsets
Item              count
bread, peanut-b   3
-
Apriori Algorithm
k = 1
Generate frequent itemsets of length 1
Repeat until no frequent itemsets are found
    k := k + 1
    Generate itemsets of size k from the k-1 frequent itemsets
-
Apriori Algorithm
Algorithm Apriori(T)
    C1 ← init-pass(T);
    F1 ← {f | f ∈ C1, f.count/n ≥ minsup};   // n: no. of transactions in T
    for (k = 2; Fk-1 ≠ ∅; k++) do
        Ck ← candidate-gen(Fk-1);
        for each transaction t ∈ T do
            for each candidate c ∈ Ck do
                if c is contained in t then
                    c.count++;
            end
        end
        Fk ← {c ∈ Ck | c.count/n ≥ minsup}
    end
    return F ← ∪k Fk;
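The level-wise loop above can be sketched in Python as follows. This is a minimal, unoptimized illustration and not the lecture's implementation: `apriori` is a made-up function name, minsup is taken as a fraction of transactions, and candidates are generated by uniting pairs of frequent (k-1)-itemsets rather than by a prefix-based join.

```python
from itertools import combinations

def apriori(transactions, minsup):
    """Return {itemset: support} for every itemset with support >= minsup."""
    transactions = [frozenset(t) for t in transactions]
    n = len(transactions)

    # init-pass: count single items (C1) and keep the frequent ones (F1).
    counts = {}
    for t in transactions:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    current = {c: cnt / n for c, cnt in counts.items() if cnt / n >= minsup}
    frequent = dict(current)

    k = 2
    while current:
        # Candidate generation: unite pairs of frequent (k-1)-itemsets and
        # keep the k-itemsets whose (k-1)-subsets are all frequent.
        prev = set(current)
        candidates = {a | b for a in prev for b in prev if len(a | b) == k}
        candidates = {c for c in candidates
                      if all(frozenset(s) in prev for s in combinations(c, k - 1))}

        # One scan of the database to count every candidate.
        counts = {c: 0 for c in candidates}
        for t in transactions:
            for c in candidates:
                if c <= t:
                    counts[c] += 1

        current = {c: cnt / n for c, cnt in counts.items() if cnt / n >= minsup}
        frequent.update(current)
        k += 1
    return frequent

transactions = [
    {"bread", "jelly", "peanut-butter"},
    {"bread", "peanut-butter"},
    {"bread", "milk", "peanut-butter"},
    {"beer", "bread"},
    {"beer", "milk"},
]
result = apriori(transactions, minsup=0.6)  # 0.6 = 3 of 5 transactions
for itemset, s in result.items():
    print(sorted(itemset), s)
```

On the bread and peanut-butter example with a minimum support of 3 transactions out of 5, only {bread}, {peanut-butter}, and {bread, peanut-butter} survive.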
-
Apriori Algorithm
Function candidate-gen(Fk-1)
    Ck ← ∅;
    forall f1, f2 ∈ Fk-1
        with f1 = {i1, ..., ik-2, ik-1}
        and f2 = {i1, ..., ik-2, i'k-1}
        and ik-1 < i'k-1 do
            c ← {i1, ..., ik-1, i'k-1};   // join f1 and f2
            Ck ← Ck ∪ {c};
            for each (k-1)-subset s of c do
                if (s ∉ Fk-1) then
                    delete c from Ck;   // prune
            end
    end
    return Ck;
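The join and prune steps of candidate-gen can be sketched as below. The sketch assumes that each frequent (k-1)-itemset is stored as a sorted tuple, which is what makes the "same first k-2 items" comparison possible; the name `candidate_gen` mirrors the pseudocode, but the code checks the prune condition before adding a candidate instead of adding and then deleting it.

```python
from itertools import combinations

def candidate_gen(frequent_prev):
    """Join + prune. `frequent_prev` is a set of sorted tuples of length k-1."""
    prev = sorted(frequent_prev)
    candidates = set()
    for i, f1 in enumerate(prev):
        for f2 in prev[i + 1:]:
            # Join: same first k-2 items, last item of f1 < last item of f2.
            if f1[:-1] == f2[:-1] and f1[-1] < f2[-1]:
                c = f1 + (f2[-1],)
                # Prune: every (k-1)-subset of c must itself be frequent.
                if all(s in frequent_prev for s in combinations(c, len(c) - 1)):
                    candidates.add(c)
    return candidates

# Illustrative input: frequent 2-itemsets AC, BC, BE, CE.
f2 = {("A", "C"), ("B", "C"), ("B", "E"), ("C", "E")}
print(candidate_gen(f2))  # {('B', 'C', 'E')}
```

Only BC and BE share a prefix, so BCE is the single joined candidate, and it survives pruning because CE is also frequent.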
-
Example of Apriori Run
Database
Tid   Items
10    A, C, D
20    B, C, E
30    A, B, C, E
40    B, E
C1 (1st scan)
Itemset     sup
{A}         2
{B}         3
{C}         3
{D}         1
{E}         3
L1
Itemset     sup
{A}         2
{B}         3
{C}         3
{E}         3
C2
Itemset
{A, B}
{A, C}
{A, E}
{B, C}
{B, E}
{C, E}
C2 (2nd scan)
Itemset     sup
{A, B}      1
{A, C}      2
{A, E}      1
{B, C}      2
{B, E}      3
{C, E}      2
L2
Itemset     sup
{A, C}      2
{B, C}      2
{B, E}      3
{C, E}      2
C3
Itemset
{B, C, E}
L3 (3rd scan)
Itemset     sup
{B, C, E}   2
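The frequent itemsets of this run can be cross-checked by brute force, that is, by counting every possible itemset rather than running Apriori itself. Two assumptions are made in this sketch: transaction 10 is {A, C, D}, which is what the C1 counts imply, and the minimum support count is 2, which is inferred from which itemsets are dropped between C and L. The function name `frequent_by_level` is illustrative.

```python
from itertools import combinations

# Database of the example run (Tid 10 assumed to be A, C, D).
db = {
    10: {"A", "C", "D"},
    20: {"B", "C", "E"},
    30: {"A", "B", "C", "E"},
    40: {"B", "E"},
}

def frequent_by_level(db, min_count):
    """Brute force: count every k-itemset and keep those with count >= min_count."""
    items = sorted(set().union(*db.values()))
    levels = []
    for k in range(1, len(items) + 1):
        level = {}
        for combo in combinations(items, k):
            count = sum(1 for t in db.values() if set(combo) <= t)
            if count >= min_count:
                level[combo] = count
        if not level:
            break
        levels.append(level)
    return levels

for k, level in enumerate(frequent_by_level(db, min_count=2), start=1):
    print(f"L{k}:", level)
```

The brute-force version counts all 31 non-empty itemsets, whereas the run on the slide only counts 5 + 6 + 1 = 12 candidates.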
-
Apriori
Remember that Apriori consists of two steps
1. Generate all frequent itemsets whose support ≥ minsup
2. Use frequent itemsets to generate association rules
We accomplished step 1, so we have all frequent itemsets.
So, let's pay attention to the second step
-
Rule Generation in Apriori
Given a frequent itemset L
Find all non-empty proper subsets F in L, such that the association rule F → {L-F} satisfies the minimum confidence
Create the rule F → {L-F}
If L = {A,B,C}, the candidate rules are: AB → C, AC → B, BC → A, A → BC, B → AC, C → AB
In general, there are 2^k - 2 candidate solutions, where k is the length of the itemset L
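The exhaustive rule generation described on this slide can be sketched as follows. The sketch enumerates all 2^k - 2 splits of a frequent itemset and keeps those that reach the minimum confidence. The data is the four-transaction database from the Apriori run example (with Tid 10 assumed to be A, C, D), and `sigma` and `generate_rules` are illustrative names.

```python
from itertools import combinations

DB = [{"A", "C", "D"}, {"B", "C", "E"}, {"A", "B", "C", "E"}, {"B", "E"}]

def sigma(itemset):
    """Support count of `itemset` in DB."""
    return sum(1 for t in DB if set(itemset) <= t)

def generate_rules(itemset, minconf):
    """Return (F, L-F, confidence) for every non-empty proper subset F of L
    whose rule F -> L-F reaches `minconf`."""
    L = frozenset(itemset)
    rules = []
    for r in range(1, len(L)):
        for f in combinations(sorted(L), r):
            F = frozenset(f)
            conf = sigma(L) / sigma(F)
            if conf >= minconf:
                rules.append((F, L - F, conf))
    return rules

for F, rest, conf in generate_rules("BCE", minconf=0.0):
    print(sorted(F), "->", sorted(rest), round(conf, 2))
```

For L = {B, C, E} this evaluates 2^3 - 2 = 6 candidate rules; with a minimum confidence of 1.0 only BC → E and CE → B remain.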
-
Can you Be More Efficient?
Can we apply the same trick used with support?
Confidence does not have the anti-monotone property
That is, is c(AB → D) > c(A → D)?
Don't know!
But confidence of rules generated from the same itemset does have the anti-monotone property
L = {A,B,C,D}
c(ABC → D) ≥ c(AB → CD) ≥ c(A → BCD)
We can apply this property to prune the rule generation
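The reason the ordering holds for rules from the same itemset can be written out in one step. Assuming σ denotes the support count and confidence is c(X → Y) = σ(X ∪ Y) / σ(X), all three rules share the numerator σ(ABCD), and only the denominator changes:

```latex
\begin{align*}
c(ABC \rightarrow D) = \frac{\sigma(ABCD)}{\sigma(ABC)}, \quad
c(AB \rightarrow CD) &= \frac{\sigma(ABCD)}{\sigma(AB)}, \quad
c(A \rightarrow BCD) = \frac{\sigma(ABCD)}{\sigma(A)} \\
\sigma(ABC) \le \sigma(AB) \le \sigma(A)
\;&\Longrightarrow\;
c(ABC \rightarrow D) \ge c(AB \rightarrow CD) \ge c(A \rightarrow BCD)
\end{align*}
```

The inequality on the left is the downward closure of support: a smaller antecedent is contained in at least as many transactions.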
-
Example of Efficient Rule Generation
[Figure: lattice of rules generated from the frequent itemset ABCD, with one rule marked "Low confidence"]
ABCD → {}
ABC → D, ABD → C, ACD → B, BCD → A
AB → CD, AC → BD, BC → AD, BD → AC, AD → BC, CD → AB
A → BCD, B → ACD, C → ABD, D → ABC
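One way to exploit the lattice is to grow the consequent level by level and to extend only those consequents whose rule already passed the confidence threshold. The sketch below is one possible reading of that idea, not the lecture's code: `pruned_rules` and `sigma` are illustrative names, and the data is the database from the Apriori run example (with Tid 10 assumed to be A, C, D).

```python
from itertools import combinations

DB = [{"A", "C", "D"}, {"B", "C", "E"}, {"A", "B", "C", "E"}, {"B", "E"}]

def sigma(itemset):
    """Support count of `itemset` in DB."""
    return sum(1 for t in DB if set(itemset) <= t)

def pruned_rules(itemset, minconf):
    """Generate rules X -> Y from one frequent itemset L, growing the
    consequent Y level by level. Returns (rules, number of rules evaluated)."""
    L = frozenset(itemset)
    rules, evaluated = [], 0
    consequents = {frozenset([i]) for i in L}
    size = 1
    while consequents and size < len(L):
        survivors = set()
        for Y in consequents:
            evaluated += 1
            conf = sigma(L) / sigma(L - Y)
            if conf >= minconf:
                rules.append((L - Y, Y, conf))
                survivors.add(Y)
        size += 1
        # A larger consequent is tried only if all its sub-consequents survived.
        consequents = {
            a | b for a in survivors for b in survivors
            if len(a | b) == size
            and all(frozenset(s) in survivors for s in combinations(a | b, size - 1))
        }
    return rules, evaluated

rules, evaluated = pruned_rules("BCE", minconf=1.0)
for X, Y, conf in rules:
    print(sorted(X), "->", sorted(Y), conf)
print("rules evaluated:", evaluated)
```

For L = {B, C, E} and a minimum confidence of 1.0, the rule BE → C fails, so no rule with C in a larger consequent is evaluated: 4 confidence computations instead of 6.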
-
Challenges in AR Mining
Challenges
Apriori scans the database multiple times
Most often, there is a high number of candidates
Support counting for candidates can be time expensive
Reduce the number of scans of the database
Shrink the number of candidates
Count the support of candidates more efficiently
-
Next Class
Advanced topics in association rule mining