1 apriori algorithm rakesh agrawal ramakrishnan srikant
TRANSCRIPT
![Page 1: 1 Apriori Algorithm Rakesh Agrawal Ramakrishnan Srikant](https://reader036.vdocument.in/reader036/viewer/2022082709/56649d215503460f949f6bdd/html5/thumbnails/1.jpg)
11
Apriori AlgorithmApriori Algorithm
Rakesh AgrawalRakesh Agrawal
Ramakrishnan SrikantRamakrishnan Srikant
![Page 2: 1 Apriori Algorithm Rakesh Agrawal Ramakrishnan Srikant](https://reader036.vdocument.in/reader036/viewer/2022082709/56649d215503460f949f6bdd/html5/thumbnails/2.jpg)
22
Association RulesAssociation Rules(Agrawal, Imielinski & Swami: SIGMOD ’93)(Agrawal, Imielinski & Swami: SIGMOD ’93)
I = { i1, i2 …, im }: a set of literals, called items.
Transaction T: a set of items such that T I. Database D: a set of transactions.
A transaction T contains X, a set of some items in I, if X T.
An association rule is an implication of the form X Y, where X, Y I.– Support: % of transactions in D that contain X Y.– Confidence: Among transactions that contain X, what % also
contain Y.
Find all rules that have support and confidence greater than user-specified minimum support and minimum confidence.
![Page 3: 1 Apriori Algorithm Rakesh Agrawal Ramakrishnan Srikant](https://reader036.vdocument.in/reader036/viewer/2022082709/56649d215503460f949f6bdd/html5/thumbnails/3.jpg)
33
1. Find all sets of items that have minimum support (frequent itemsets).
2. Use the frequent itemsets to generate the desired rules. confidence ( X Y ) = support ( X Y ) / support ( X )
Computing Association Rules: Problem Decomposition
What itemsets should you count?
How do you count them efficiently?
![Page 4: 1 Apriori Algorithm Rakesh Agrawal Ramakrishnan Srikant](https://reader036.vdocument.in/reader036/viewer/2022082709/56649d215503460f949f6bdd/html5/thumbnails/4.jpg)
44
What itemsets do you count?What itemsets do you count?
Search space is exponential.– With n items, nCk potential candidates of size k.
Anti-monotonicity: Any superset of an infrequent itemset is also infrequent (SIGMOD ’93).– If an itemset is infrequent, don’t count any of its extensions.
Flip the property: All subsets of a frequent itemset are frequent.
Need not count any candidate that has an infrequent subset (VLDB ’94)– Simultaneously observed by Mannila et al., KDD ’94
Broadly applicable to extensions and restrictions.
![Page 5: 1 Apriori Algorithm Rakesh Agrawal Ramakrishnan Srikant](https://reader036.vdocument.in/reader036/viewer/2022082709/56649d215503460f949f6bdd/html5/thumbnails/5.jpg)
55
Apriori Algorithm: Apriori Algorithm: Breadth First SearchBreadth First Search
{ }
a b c d
![Page 6: 1 Apriori Algorithm Rakesh Agrawal Ramakrishnan Srikant](https://reader036.vdocument.in/reader036/viewer/2022082709/56649d215503460f949f6bdd/html5/thumbnails/6.jpg)
66
Apriori Algorithm: Apriori Algorithm: Breadth First SearchBreadth First Search
{ }
a b c d
![Page 7: 1 Apriori Algorithm Rakesh Agrawal Ramakrishnan Srikant](https://reader036.vdocument.in/reader036/viewer/2022082709/56649d215503460f949f6bdd/html5/thumbnails/7.jpg)
77
Apriori Algorithm: Apriori Algorithm: Breadth First SearchBreadth First Search
{ }
a b c
a b a d
d
b d
![Page 8: 1 Apriori Algorithm Rakesh Agrawal Ramakrishnan Srikant](https://reader036.vdocument.in/reader036/viewer/2022082709/56649d215503460f949f6bdd/html5/thumbnails/8.jpg)
88
Apriori Algorithm: Apriori Algorithm: Breadth First SearchBreadth First Search
{ }
a b c
a b a d
d
b d
![Page 9: 1 Apriori Algorithm Rakesh Agrawal Ramakrishnan Srikant](https://reader036.vdocument.in/reader036/viewer/2022082709/56649d215503460f949f6bdd/html5/thumbnails/9.jpg)
99
Apriori Algorithm: Apriori Algorithm: Breadth First SearchBreadth First Search
{ }
a b c
a b a d
a b d
d
b d
![Page 10: 1 Apriori Algorithm Rakesh Agrawal Ramakrishnan Srikant](https://reader036.vdocument.in/reader036/viewer/2022082709/56649d215503460f949f6bdd/html5/thumbnails/10.jpg)
1010
Apriori Algorithm: Apriori Algorithm: Breadth First SearchBreadth First Search
{ }
a b c
a b a d
a b d
d
b d
![Page 11: 1 Apriori Algorithm Rakesh Agrawal Ramakrishnan Srikant](https://reader036.vdocument.in/reader036/viewer/2022082709/56649d215503460f949f6bdd/html5/thumbnails/11.jpg)
1111
APRIORI Candidate GenerationAPRIORI Candidate Generation(VLDB 94)(VLDB 94)
Lk : Frequent itemsets of size k, Ck :
Candidate itemsets of size k
Given Lk, generate Ck+1 in two steps:
1. Join Step : Join Lk with Lk, with the
join condition that the first k-1 items should be the same and l1[k] < l2[k].
L3
{ a b c }
{ a b d }
{ a c d }
{ a c e }
{ b c d }
C4
{ a b c d }
{ a c d e }
![Page 12: 1 Apriori Algorithm Rakesh Agrawal Ramakrishnan Srikant](https://reader036.vdocument.in/reader036/viewer/2022082709/56649d215503460f949f6bdd/html5/thumbnails/12.jpg)
1212
APRIORI Candidate GenerationAPRIORI Candidate Generation(VLDB 94)(VLDB 94)
Lk : Frequent itemsets of size k, Ck :
Candidate itemsets of size k
Given Lk, generate Ck+1 in two steps:
1. Join Step : Join Lk with Lk, with the
join condition that the first k-1 items should be the same and l1[k] < l2[k].
2. Prune Step : Delete all candidates which have a non-frequent subset.
C4
{ a b c d }
{ a c d e }
L3
{ a b c }
{ a b d }
{ a c d }
{ a c e }
{ b c d }
![Page 13: 1 Apriori Algorithm Rakesh Agrawal Ramakrishnan Srikant](https://reader036.vdocument.in/reader036/viewer/2022082709/56649d215503460f949f6bdd/html5/thumbnails/13.jpg)
1313
How do you count?How do you count?
Given a set of candidates Ck, for each transaction T:
– Find all members of Ck which are contained in T.
Hash-tree data structure [VLDB ’94]– C2 : { }– T : {c, e, f}
a b c d e f g
{a, b} {e, f} {e, g}
![Page 14: 1 Apriori Algorithm Rakesh Agrawal Ramakrishnan Srikant](https://reader036.vdocument.in/reader036/viewer/2022082709/56649d215503460f949f6bdd/html5/thumbnails/14.jpg)
1414
How do you count?How do you count?
Given a set of candidates Ck, for each transaction T:
– Find all members of Ck which are contained in T.
Hash-tree data structure [VLDB ’94]– C2 : { }– T : {c, e, f}
a b c d e f g
{a, b} {e, f} {e, g}
![Page 15: 1 Apriori Algorithm Rakesh Agrawal Ramakrishnan Srikant](https://reader036.vdocument.in/reader036/viewer/2022082709/56649d215503460f949f6bdd/html5/thumbnails/15.jpg)
1515
How do you count?How do you count?
Given a set of candidates Ck, for each transaction T:
– Find all members of Ck which are contained in T.
Hash-tree data structure [VLDB ’94]– C2 : { }– T : {c, e, f}
a b c d e f g
{a, b} {e, f} {e, g}
Recursively construct hash tables if number of itemsets is above a
threshold.
f g
![Page 16: 1 Apriori Algorithm Rakesh Agrawal Ramakrishnan Srikant](https://reader036.vdocument.in/reader036/viewer/2022082709/56649d215503460f949f6bdd/html5/thumbnails/16.jpg)
1616
ImpactImpact
Concepts in Apriori also applied to many generalizations, e.g., Concepts in Apriori also applied to many generalizations, e.g., taxonomies, quantitative Associations, sequential Patterns, taxonomies, quantitative Associations, sequential Patterns, graphs, …graphs, …
Over 3000 citations in Google Scholar.Over 3000 citations in Google Scholar.
![Page 17: 1 Apriori Algorithm Rakesh Agrawal Ramakrishnan Srikant](https://reader036.vdocument.in/reader036/viewer/2022082709/56649d215503460f949f6bdd/html5/thumbnails/17.jpg)
1717
Subsequent Algorithmic InnovationsSubsequent Algorithmic Innovations
Reducing the cost of checking whether a candidate itemset is Reducing the cost of checking whether a candidate itemset is contained in a transaction:contained in a transaction:– TID intersection.TID intersection.– Database projection, FP GrowthDatabase projection, FP Growth
Reducing the number of passes over the data:Reducing the number of passes over the data:– Sampling & Dynamic CountingSampling & Dynamic Counting
Reducing the number of candidates counted:Reducing the number of candidates counted:– For maximal patterns & constraints.For maximal patterns & constraints.
Many other innovative ideas …Many other innovative ideas …