https://kemlg.upc.edu Association Rules Miquel Sànchez-Marrè Intelligent Data Science and Artificial Intelligence Research Centre (IDEAI-UPC) Knowledge Engineering and Machine Learning Group (KEMLG-UPC) Computer Science Dept. Universitat Politècnica de Catalunya · BarcelonaTech [email protected] http://www.cs.upc.edu/~miquel Course 2019/2020


https://kemlg.upc.edu

Association Rules

Miquel Sànchez-Marrè

Intelligent Data Science and Artificial Intelligence Research Centre (IDEAI-UPC)
Knowledge Engineering and Machine Learning Group (KEMLG-UPC)

Computer Science Dept.
Universitat Politècnica de Catalunya · BarcelonaTech

miquel@cs.upc.edu
http://www.cs.upc.edu/~miquel

Course 2019/2020


Associative Models

Association Rules

© Miquel Sànchez i Marrè, KEMLG 2020

Association Rules (1)

Goal: to obtain a set of association rules that express the correlations among attributes in a database of item transactions.

Applicability criteria: the database should contain enough transactions for the correlations to appear a sufficient number of times.

Most common methods:
  • Apriori [Agrawal & Srikant 1994]
  • Eclat (Equivalence CLAss Transformation) [Zaki 2000]
  • FP-growth (Frequent Pattern Growth) [Han et al. 2004]

Input: the original data matrix (unsupervised, but it could be supervised).
Output: a set of association rules satisfying a minimum support and a minimum confidence.
Parameters: support, confidence, number of rules.

Association Rules (2)

Given a database consisting of a set of transactions $D = \{t_1, t_2, \dots, t_m\}$, and given $I = \{i_1, \dots, i_n\}$, a set of n attributes called items, each transaction in D has a unique transaction ID and contains a subset of the items in I:

t1: i2 i3 i4 i6 i9
t2: i1 i2 i4 i7 i8 i9
t3: i2 i4 i5 i6
t4: i1 i3 i4 i8 i9 i10
…
tm: i3 i4 i6 i9

The task is to find common patterns of co-occurrence of the same items across the database.


Association Rules (3)

For instance, the following common patterns can be obtained:

{i2, i4}   {i4, i9}   {i2, i4, i9}   {i3, i4, i9}   {i3, i4, i6, i9}

Several association rules can be generated from a common pattern.

An association rule is defined as an implication of the form $X \Rightarrow Y$, where $X, Y \subseteq I$ and $X \cap Y = \emptyset$.


Association Rules (4)

Every rule is composed of two different sets of items, also known as itemsets, X and Y. X is called the antecedent or left-hand side (LHS) of the rule, and Y is called the consequent or right-hand side (RHS) of the rule. For instance:

i2 ⇒ i4,  i4 ⇒ i2
i4 ⇒ i9,  i9 ⇒ i4
i2 ⇒ i4 and i9,  i4 ⇒ i2 and i9,  i9 ⇒ i2 and i4
i2 and i4 ⇒ i9,  i2 and i9 ⇒ i4,  i4 and i9 ⇒ i2
i3 and i4 ⇒ i9,  i3 and i9 ⇒ i4,  i4 and i9 ⇒ i3
i3 and i4 and i6 ⇒ i9,  i3 and i4 and i9 ⇒ i6,  i3 and i6 and i9 ⇒ i4,  i4 and i6 and i9 ⇒ i3
i3 and i4 ⇒ i6 and i9,  i3 and i6 ⇒ i4 and i9
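The rules listed above can be enumerated mechanically. The following is a minimal Python sketch, assuming itemsets are represented as frozensets of item names; `rules_from_itemset` and `show` are hypothetical helper names, not part of any library. Every split of an itemset into a non-empty antecedent X and a non-empty consequent Y is produced, so a k-item set yields 2^k − 2 rules (6 for {i2, i4, i9}, 14 for {i3, i4, i6, i9}).

```python
from itertools import combinations
from typing import FrozenSet, List, Tuple

Rule = Tuple[FrozenSet[str], FrozenSet[str]]  # (antecedent X, consequent Y)


def rules_from_itemset(itemset: FrozenSet[str]) -> List[Rule]:
    """Enumerate every rule X => Y with X ∪ Y = itemset, X ∩ Y = ∅,
    and both X and Y non-empty."""
    items = sorted(itemset)  # sorting only makes the output order deterministic
    rules: List[Rule] = []
    # Antecedent sizes 1 .. k-1, so the consequent is never empty.
    for size in range(1, len(items)):
        for lhs in combinations(items, size):
            x = frozenset(lhs)
            rules.append((x, itemset - x))
    return rules


def show(rule: Rule) -> str:
    """Render a rule in the 'a and b => c' style used on the slide."""
    x, y = rule
    return " and ".join(sorted(x)) + " => " + " and ".join(sorted(y))


if __name__ == "__main__":
    for r in rules_from_itemset(frozenset({"i2", "i4", "i9"})):
        print(show(r))
```

The set of rules does not depend on the sorting; it only fixes the order in which they are printed.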


Association Rules (5)

An example dataset:

Outlook    Temperature  Humidity  Windy  Play
sunny      hot          high      false  no
sunny      hot          high      true   no
overcast   hot          high      false  yes
rainy      mild         high      false  yes
rainy      cool         normal    false  yes
rainy      cool         normal    true   no
overcast   cool         normal    true   yes
sunny      mild         high      false  no
sunny      cool         normal    false  yes
rainy      mild         normal    false  yes
sunny      mild         normal    true   yes
overcast   mild         high      true   yes
overcast   hot          normal    false  yes
rainy      mild         high      true   no


Association Rules (6)

Item: an attribute–value pair.
Itemset: a combination of items that has a minimum specified support (minsup).

Support (coverage) of an itemset: the support of X with respect to T is the number of transactions in the database that contain the itemset X:

$$\mathrm{supp}(X) = |\{t \in T : X \subseteq t\}| \quad \text{(absolute definition)}$$

$$\mathrm{supp}(X) = \frac{|\{t \in T : X \subseteq t\}|}{|T|} \quad \text{(relative definition)}$$

Support of a rule (with supp taken as the absolute count):

$$\mathrm{supp}(X \Rightarrow Y) = \frac{\mathrm{supp}(X \cup Y)}{|T|}$$

Confidence of a rule: the confidence of a rule $X \Rightarrow Y$ with respect to a set of transactions T is the proportion of the transactions containing X that also contain Y:

$$\mathrm{conf}(X \Rightarrow Y) = \frac{\mathrm{supp}(X \cup Y)}{\mathrm{supp}(X)}$$
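These definitions translate almost directly into code. Below is a minimal Python sketch, assuming transactions are frozensets of item names and using only the five transactions actually listed in Association Rules (2) as a toy database (the elided ones are left out). The function names `supp`, `supp_rel`, `rule_supp` and `conf` are hypothetical.

```python
from typing import FrozenSet, List

Itemset = FrozenSet[str]

# Toy database: the five transactions listed in Association Rules (2)
# (t1..t4 and the last one shown); the elided transactions are left out.
DB: List[Itemset] = [
    frozenset({"i2", "i3", "i4", "i6", "i9"}),
    frozenset({"i1", "i2", "i4", "i7", "i8", "i9"}),
    frozenset({"i2", "i4", "i5", "i6"}),
    frozenset({"i1", "i3", "i4", "i8", "i9", "i10"}),
    frozenset({"i3", "i4", "i6", "i9"}),
]


def supp(x: Itemset, db: List[Itemset]) -> int:
    """Absolute support: number of transactions t in db with x ⊆ t."""
    return sum(1 for t in db if x <= t)


def supp_rel(x: Itemset, db: List[Itemset]) -> float:
    """Relative support: absolute support divided by |db|."""
    return supp(x, db) / len(db)


def rule_supp(x: Itemset, y: Itemset, db: List[Itemset]) -> float:
    """supp(X => Y) = supp(X ∪ Y) / |T|."""
    return supp(x | y, db) / len(db)


def conf(x: Itemset, y: Itemset, db: List[Itemset]) -> float:
    """conf(X => Y) = supp(X ∪ Y) / supp(X); raises ZeroDivisionError if supp(X) = 0."""
    return supp(x | y, db) / supp(x, db)


i2, i4, i9 = frozenset({"i2"}), frozenset({"i4"}), frozenset({"i9"})
print(supp(i4 | i9, DB), supp_rel(i4 | i9, DB))  # 4 0.8
print(conf(i2, i4, DB), conf(i4, i9, DB))        # 1.0 0.8
```

Python's `<=` on sets is the subset test, which is exactly the condition X ⊆ t in the definition.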


Association Rules (7)

Example of an item: Temperature = cool

Examples of itemsets: {Temperature = cool}, {Temperature = cool, Humidity = normal}

Examples of rules:

if Temperature = cool then Humidity = normal
  supp(Temperature = cool) = 4
  supp(Humidity = normal, Temperature = cool) = 4
  conf(if Temperature = cool then Humidity = normal) = 4/4 = 100%

if Humidity = normal then Temperature = cool
  supp(Humidity = normal) = 7
  conf(if Humidity = normal then Temperature = cool) = 4/7 = 57.14%

The rules of interest are those with a minimum support and a high confidence.
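The numbers in this example can be reproduced from the dataset in Association Rules (5). This is a short Python sketch, assuming each row is encoded as a transaction of `attribute=value` strings (an encoding chosen here only for illustration); `to_transaction`, `supp` and `conf` are hypothetical helpers.

```python
from typing import FrozenSet, List, Tuple

Itemset = FrozenSet[str]

ATTRIBUTES = ("outlook", "temperature", "humidity", "windy", "play")

# The 14 rows of the example dataset in Association Rules (5).
ROWS: List[Tuple[str, ...]] = [
    ("sunny", "hot", "high", "false", "no"),
    ("sunny", "hot", "high", "true", "no"),
    ("overcast", "hot", "high", "false", "yes"),
    ("rainy", "mild", "high", "false", "yes"),
    ("rainy", "cool", "normal", "false", "yes"),
    ("rainy", "cool", "normal", "true", "no"),
    ("overcast", "cool", "normal", "true", "yes"),
    ("sunny", "mild", "high", "false", "no"),
    ("sunny", "cool", "normal", "false", "yes"),
    ("rainy", "mild", "normal", "false", "yes"),
    ("sunny", "mild", "normal", "true", "yes"),
    ("overcast", "mild", "high", "true", "yes"),
    ("overcast", "hot", "normal", "false", "yes"),
    ("rainy", "mild", "high", "true", "no"),
]


def to_transaction(row: Tuple[str, ...]) -> Itemset:
    """Encode one row as a transaction of 'attribute=value' items."""
    return frozenset(f"{a}={v}" for a, v in zip(ATTRIBUTES, row))


T: List[Itemset] = [to_transaction(r) for r in ROWS]


def supp(x: Itemset, db: List[Itemset]) -> int:
    """Absolute support of itemset x."""
    return sum(1 for t in db if x <= t)


def conf(x: Itemset, y: Itemset, db: List[Itemset]) -> float:
    """Confidence of the rule x => y."""
    return supp(x | y, db) / supp(x, db)


cool = frozenset({"temperature=cool"})
normal = frozenset({"humidity=normal"})
print(supp(cool, T), supp(cool | normal, T), conf(cool, normal, T))  # 4 4 1.0
print(supp(normal, T), round(conf(normal, cool, T), 4))              # 7 0.5714
```

Both rules share the same joint support (4); only the antecedent support in the denominator changes, which is why confidence is not symmetric.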



Apriori Algorithm

Apriori algorithm (1)

1. Define minsup and minconf and, optionally, the number of rules desired.
2. Compute the itemsets with supp(itemset) ≥ minsup for itemset sizes n = 1, …, N−1, where N is the number of available attributes.
  • Use a hash table to store the itemsets.
  • Use lexicographical ordering for generating and storing the itemsets in the hash table.
  • Apply the filter property: frequent itemsets of length L must be formed from frequent itemsets of length L−1.
3. For each itemset generated in the previous step, generate the candidate rules from it, checking that they have the specified minimum accuracy (conf(rule) ≥ minconf).
  • Generate the rules starting with one item in the consequent, then progress to two items, etc.
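Step 3 can be sketched in Python as follows (`gen_rules` and `supp` are hypothetical names). Assumptions: the five transactions listed in Association Rules (2) serve as a toy database, and the consequent is grown level by level as a common refinement of this step. Moving an item from X to Y can only lower the confidence, because supp(X) can only grow when X shrinks, so an (m+1)-item consequent is tried only if all of its m-item subsets already produced rules above minconf.

```python
from itertools import combinations
from typing import FrozenSet, List, Set, Tuple

Itemset = FrozenSet[str]
Rule = Tuple[Itemset, Itemset, float]  # (antecedent X, consequent Y, confidence)

# Toy database: the five transactions listed in Association Rules (2).
DB: List[Itemset] = [
    frozenset({"i2", "i3", "i4", "i6", "i9"}),
    frozenset({"i1", "i2", "i4", "i7", "i8", "i9"}),
    frozenset({"i2", "i4", "i5", "i6"}),
    frozenset({"i1", "i3", "i4", "i8", "i9", "i10"}),
    frozenset({"i3", "i4", "i6", "i9"}),
]


def supp(x: Itemset, db: List[Itemset]) -> int:
    """Absolute support: number of transactions containing x."""
    return sum(1 for t in db if x <= t)


def gen_rules(itemset: Itemset, db: List[Itemset], minconf: float) -> List[Rule]:
    """Rules X => Y from one frequent itemset: 1-item consequents first, then 2, ...

    Assumes supp(itemset) > 0, so no antecedent has zero support."""
    s_all = supp(itemset, db)
    rules: List[Rule] = []
    consequents: Set[Itemset] = {frozenset({i}) for i in itemset}
    m = 1
    # Stop before the antecedent would become empty.
    while consequents and m < len(itemset):
        survivors: Set[Itemset] = set()
        for y in consequents:
            x = itemset - y
            c = s_all / supp(x, db)
            if c >= minconf:
                rules.append((x, y, c))
                survivors.add(y)
        # Next level: (m+1)-item consequents whose m-item subsets all survived.
        consequents = {
            a | b
            for a, b in combinations(survivors, 2)
            if len(a | b) == m + 1 and all((a | b) - {i} in survivors for i in a | b)
        }
        m += 1
    return rules


for x, y, c in gen_rules(frozenset({"i3", "i4", "i9"}), DB, minconf=0.8):
    print(sorted(x), "=>", sorted(y), c)
```

With minconf = 0.8 the rule i4 and i9 ⇒ i3 (confidence 3/4) fails, so no larger consequent containing i3 is ever tried.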


Apriori (T, minsup) algorithm (2)


L_1 = {large 1-itemsets}
for (k = 2; L_{k-1} ≠ ∅; k++) do
    C_k = Candidate-generation(L_{k-1})    // new candidates generated by extending the large (k-1)-itemsets
    forall transactions t ∈ T do
        C_t = {c ∈ C_k | c ⊆ t}            // candidates contained in t
        forall candidates c ∈ C_t do
            c.count++
        end
    end
    L_k = {c ∈ C_k | c.count ≥ minsup}
end
return ∪_k L_k
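The main loop can be turned into runnable code. This is a minimal Python sketch under the following assumptions: itemsets are sorted tuples (so the common-prefix test of candidate generation becomes a slice comparison), support is the absolute count, and the five transactions listed in Association Rules (2) serve as a toy database. `apriori` and `candidate_generation` are hypothetical names.

```python
from collections import Counter
from itertools import combinations
from typing import Dict, FrozenSet, List, Set, Tuple

Itemset = Tuple[str, ...]  # items kept in lexicographic order

# Toy database: the five transactions listed in Association Rules (2).
DB: List[FrozenSet[str]] = [
    frozenset({"i2", "i3", "i4", "i6", "i9"}),
    frozenset({"i1", "i2", "i4", "i7", "i8", "i9"}),
    frozenset({"i2", "i4", "i5", "i6"}),
    frozenset({"i1", "i3", "i4", "i8", "i9", "i10"}),
    frozenset({"i3", "i4", "i6", "i9"}),
]


def candidate_generation(prev: Set[Itemset], k: int) -> Set[Itemset]:
    """Join step plus prune (filter) step on the large (k-1)-itemsets."""
    ck: Set[Itemset] = set()
    for a in prev:
        for b in prev:
            # Join: same first k-2 items, last items in lexicographic order.
            if a[:-1] == b[:-1] and a[-1] < b[-1]:
                c = a + (b[-1],)
                # Prune: every (k-1)-subset of c must be large.
                if all(s in prev for s in combinations(c, k - 1)):
                    ck.add(c)
    return ck


def apriori(transactions: List[FrozenSet[str]], minsup: int) -> Dict[Itemset, int]:
    """Return every itemset with absolute support >= minsup, mapped to its count."""
    counts = Counter((i,) for t in transactions for i in t)
    large = {c: n for c, n in counts.items() if n >= minsup}  # L_1
    result = dict(large)
    k = 2
    while large:  # for (k = 2; L_{k-1} != empty; k++)
        ck = candidate_generation(set(large), k)
        counts = Counter()
        for t in transactions:
            for c in ck:
                if set(c) <= t:  # c is contained in t
                    counts[c] += 1
        large = {c: n for c, n in counts.items() if n >= minsup}  # L_k
        result.update(large)
        k += 1
    return result


for itemset, n in sorted(apriori(DB, minsup=3).items()):
    print(itemset, n)
```

On this toy database with minsup = 3, the candidate (i4, i6, i9) never reaches the counting phase: the prune step discards it because (i6, i9) is not a large 2-itemset.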

Candidate Generation

Two steps: the join step and the prune (filter) step.

C_k = ∅
forall a, b ∈ L_{k-1} such that a = {I_1, …, I_{k-2}, I_{k-1}} and
                               b = {I_1, …, I_{k-2}, I'_{k-1}} and I_{k-1} < I'_{k-1} do
    // join step: join two large (k-1)-itemsets that share a common prefix and differ in one item;
    // the lexicographic order I_{k-1} < I'_{k-1} ensures that no itemset is generated twice
    c ← {I_1, …, I_{k-2}, I_{k-1}, I'_{k-1}}    // c is the join of a and b
    C_k ← C_k ∪ {c}
endfor
foreach c ∈ C_k such that ∃ s ⊆ c, |s| = k-1 and s ∉ L_{k-1} do
    C_k ← C_k − {c}    // prune step: apply the filter property
endforeach
return C_k
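The two steps can also be run separately to see what each one contributes. In this Python sketch (hypothetical names; itemsets as sorted tuples), `L2` is a small hypothetical set of large 2-itemsets chosen to exercise both steps: the join step produces two 3-item candidates, and the prune step removes (i4, i6, i9) because its subset (i6, i9) is not in `L2`.

```python
from itertools import combinations
from typing import Set, Tuple

Itemset = Tuple[str, ...]  # items kept in lexicographic order


def join_step(prev: Set[Itemset]) -> Set[Itemset]:
    """Join a = (I1..Ik-2, Ik-1) with b = (I1..Ik-2, I'k-1) when Ik-1 < I'k-1."""
    return {a + (b[-1],) for a in prev for b in prev
            if a[:-1] == b[:-1] and a[-1] < b[-1]}


def prune_step(cands: Set[Itemset], prev: Set[Itemset]) -> Set[Itemset]:
    """Drop every candidate that has a (k-1)-subset outside L_{k-1}."""
    return {c for c in cands
            if all(s in prev for s in combinations(c, len(c) - 1))}


def candidate_generation(prev: Set[Itemset]) -> Set[Itemset]:
    """C_k = prune(join(L_{k-1}))."""
    return prune_step(join_step(prev), prev)


# Hypothetical set of large 2-itemsets, chosen to show both steps.
L2 = {("i2", "i4"), ("i3", "i4"), ("i3", "i9"), ("i4", "i6"), ("i4", "i9")}
print(sorted(join_step(L2)))             # [('i3', 'i4', 'i9'), ('i4', 'i6', 'i9')]
print(sorted(candidate_generation(L2)))  # [('i3', 'i4', 'i9')]
```

Because the items in each tuple are sorted, `combinations` yields subsets that are also sorted, so the membership test against `L2` compares like with like.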


Apriori algorithm (3)

Example: minsup = 2


One-item sets (12)      sup
outlook = sunny         5
outlook = overcast      4
outlook = rainy         5
temperature = cool      4
temperature = mild      6
temperature = hot       4
humidity = normal       7
humidity = high         7
windy = true            6
windy = false           8
play = yes              9
play = no               5

Two-item sets (47)                       sup
outlook = sunny, temperature = mild      2
outlook = sunny, temperature = hot       2
outlook = sunny, humidity = normal       2
outlook = sunny, humidity = high         3
outlook = sunny, windy = true            2
outlook = sunny, windy = false           3
outlook = sunny, play = yes              2
outlook = sunny, play = no               3
outlook = overcast, temperature = hot    2
outlook = overcast, humidity = normal    2
outlook = overcast, humidity = high      2
outlook = overcast, windy = true         2
…

Three-item sets (39)                                    sup
outlook = sunny, temperature = hot, humidity = high     2
outlook = sunny, temperature = hot, play = no           2
outlook = sunny, humidity = normal, play = yes          2
outlook = sunny, humidity = high, windy = false         2
outlook = sunny, humidity = high, play = no             3
outlook = sunny, windy = false, play = no               2
outlook = overcast, temperature = hot, windy = false    2
outlook = overcast, temperature = hot, play = yes       2
outlook = overcast, humidity = normal, play = yes       2
outlook = overcast, humidity = high, play = yes         2
outlook = overcast, windy = true, play = yes            2
outlook = overcast, windy = false, play = yes           2
…

Four-item sets (6)                                                  sup
outlook = sunny, temperature = hot, humidity = high, play = no      2
outlook = sunny, humidity = high, windy = false, play = no          2
outlook = overcast, temperature = hot, windy = false, play = yes    2
outlook = rainy, temperature = mild, windy = false, play = yes      2
outlook = rainy, humidity = normal, windy = false, play = yes       2
temperature = cool, humidity = normal, windy = false, play = yes    2
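The counts in the list headers (12, 47, 39 and 6) can be cross-checked without running Apriori. The Python sketch below is a brute-force count, not an implementation of Apriori. Because every row holds exactly one value per attribute, an itemset that mentions the same attribute twice has support 0, so it is enough to count value combinations over every subset of k attributes. `count_frequent_itemsets` is a hypothetical name.

```python
from collections import Counter
from itertools import combinations
from typing import Dict, List, Tuple

# Rows of the example dataset: (outlook, temperature, humidity, windy, play).
ROWS: List[Tuple[str, ...]] = [
    ("sunny", "hot", "high", "false", "no"),
    ("sunny", "hot", "high", "true", "no"),
    ("overcast", "hot", "high", "false", "yes"),
    ("rainy", "mild", "high", "false", "yes"),
    ("rainy", "cool", "normal", "false", "yes"),
    ("rainy", "cool", "normal", "true", "no"),
    ("overcast", "cool", "normal", "true", "yes"),
    ("sunny", "mild", "high", "false", "no"),
    ("sunny", "cool", "normal", "false", "yes"),
    ("rainy", "mild", "normal", "false", "yes"),
    ("sunny", "mild", "normal", "true", "yes"),
    ("overcast", "mild", "high", "true", "yes"),
    ("overcast", "hot", "normal", "false", "yes"),
    ("rainy", "mild", "high", "true", "no"),
]


def count_frequent_itemsets(rows: List[Tuple[str, ...]], minsup: int) -> Dict[int, int]:
    """Brute force: number of k-item sets with support >= minsup, for each k."""
    n_attr = len(rows[0])
    result: Dict[int, int] = {}
    for k in range(1, n_attr + 1):
        total = 0
        # One value per attribute, so only itemsets over k distinct attributes can occur.
        for cols in combinations(range(n_attr), k):
            counts = Counter(tuple(row[i] for i in cols) for row in rows)
            total += sum(1 for n in counts.values() if n >= minsup)
        result[k] = total
    return result


print(count_frequent_itemsets(ROWS, minsup=2))  # {1: 12, 2: 47, 3: 39, 4: 6, 5: 0}
```

The 0 for five-item sets follows from the fact that no two rows of the dataset are identical.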

Association Rules (8)

The rule-generation example below uses the same weather dataset shown in Association Rules (5).

Rule Generation

Let us take the 3-item set L3 = {humidity = normal, windy = false, play = yes}, with supp(L3) = 4.

Rules generated                                                   Confidence
if humidity = normal and windy = false then play = yes            4/4 = 100%
if humidity = normal and play = yes then windy = false            4/6 = 66.67%
if windy = false and play = yes then humidity = normal            4/6 = 66.67%
if humidity = normal then windy = false and play = yes            4/7 = 57.14%
if windy = false then humidity = normal and play = yes            4/8 = 50%
if play = yes then humidity = normal and windy = false            4/9 = 44.44%
if ∅ then humidity = normal and windy = false and play = yes      4/14 = 28.57%

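The number of rules that can be generated from an N-item set (counting the rule with an empty antecedent) is

$$\mathrm{Rules}(N\text{-item set}) = \sum_{i=1}^{N} \binom{N}{i} = 2^N - 1$$

For N = 3 this gives the 7 rules above.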
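The seven rules and their confidences can be reproduced with a small Python sketch (`supp` and `all_rules` are hypothetical helper names, and the `attribute=value` item encoding is an assumption made for illustration). This enumeration lets the antecedent be empty, as the formula does; since the empty set is contained in every transaction, supp(∅) = |T| = 14.

```python
from itertools import combinations
from math import comb
from typing import FrozenSet, List, Tuple

Itemset = FrozenSet[str]
ATTRIBUTES = ("outlook", "temperature", "humidity", "windy", "play")
ROWS = [
    ("sunny", "hot", "high", "false", "no"),
    ("sunny", "hot", "high", "true", "no"),
    ("overcast", "hot", "high", "false", "yes"),
    ("rainy", "mild", "high", "false", "yes"),
    ("rainy", "cool", "normal", "false", "yes"),
    ("rainy", "cool", "normal", "true", "no"),
    ("overcast", "cool", "normal", "true", "yes"),
    ("sunny", "mild", "high", "false", "no"),
    ("sunny", "cool", "normal", "false", "yes"),
    ("rainy", "mild", "normal", "false", "yes"),
    ("sunny", "mild", "normal", "true", "yes"),
    ("overcast", "mild", "high", "true", "yes"),
    ("overcast", "hot", "normal", "false", "yes"),
    ("rainy", "mild", "high", "true", "no"),
]
T: List[Itemset] = [frozenset(f"{a}={v}" for a, v in zip(ATTRIBUTES, row)) for row in ROWS]


def supp(x: Itemset, db: List[Itemset]) -> int:
    # frozenset() <= t holds for every t, so supp(∅) = |db|.
    return sum(1 for t in db if x <= t)


def all_rules(itemset: Itemset, db: List[Itemset]) -> List[Tuple[Itemset, Itemset, float]]:
    """Every rule X => Y with X ∪ Y = itemset and Y non-empty (X may be empty)."""
    s = supp(itemset, db)
    items = sorted(itemset)
    rules = []
    for size in range(1, len(items) + 1):  # consequent sizes i = 1..N
        for rhs in combinations(items, size):
            y = frozenset(rhs)
            x = itemset - y
            rules.append((x, y, s / supp(x, db)))
    return rules


L3 = frozenset({"humidity=normal", "windy=false", "play=yes"})
rules = all_rules(L3, T)
print(len(rules), sum(comb(3, i) for i in range(1, 4)), 2 ** 3 - 1)  # 7 7 7
for x, y, c in sorted(rules, key=lambda r: -r[2]):
    print(sorted(x), "=>", sorted(y), f"{c:.2%}")
```

The outer loop over consequent sizes mirrors the sum over i in the formula, one binomial term per size.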

In the example database, with 100% confidence and minsup ≥ 2, there are 58 rules.


  • Association Rules
  • Associative Models
  • Association Rules (1)
  • Association Rules (2)
  • Association Rules (3)
  • Association Rules (4)
  • Association Rules (5)
  • Association Rules (6)
  • Association Rules (7)
  • Apriori Algorithm
  • Apriori algorithm (1)
  • Apriori (T, minsup) algorithm (2)
  • Candidate Generation
  • Apriori algorithm (3)
  • Association Rules (8)
  • Rule Generation

httpskemlgupcedu

Associative Models

Association Rules

copy Miquel Sagravenchez i Marregrave KEMLG 20203

Association Rules (1)

Goal to obtain a set of association rules which express the correlation among attributes from a database of item transactions

Applicability criteria database should have enough number of transactions in order that the correlation appear a sufficient number of times

Most common Methods Apriori [Agrawal amp Srikant 1994] Eclat (Equivalence CLAss Transformation) [Zaki 2000] FP-growth (Frequent Pattern Growth) [Han et al 2004]

Input original data matrix (unsupervised but could be supervised) Output set of association rules satisfying a minimum support and

a minimum confidence Parameters support confidence number of rules

Association Rules (2) Given a database consisting of a set of transactions

D = t1 t2 hellip tn and given I=i1 in be a set of n attributes called items

Each transaction in D has a unique transaction ID and contains a subset of the items in It1 i2 i3 i4 i6 i9t2 i1 i2 i4 i7 i8 i9t3 i2 i4 i5 i6t4 i1 i3 i4 i8 i9 i10 tn i3 i4 i6 i9

The issue is to obtain common patterns of co-occurrence of the same items along the database

4copy Miquel Sagravenchez i Marregrave KEMLG 2020

Association Rules (3) For instance the following common patterns can be obtained

i2 i4i4 i9i2 i4 i9i3 i4 i9i3 i4 i6 i9

From a common pattern several association rules can be generated

An association rule is defined as an implication of the formX rArr YWhere X Y sube I and X cap Y = empty

5copy Miquel Sagravenchez i Marregrave KEMLG 2020

Association Rules (4) Every rule is composed by two different sets of items also

known as itemsets X and Y X is called the antecedent or left-hand-side (LHS) of the rule

and Y is called the consequent or right-hand-side (RHS) of the ruleFor instancei2 rArr i4 i4 rArr i2i4 rArr i9 i9 rArr i4i2 rArr i4 and i9 i4 rArr i2 and i9 i9 rArr i2 and i4i2 and i4 rArr i9 i2 and i9 rArr i4 i4 and i9 rArr i2 i3 and i4 rArr i9 i3 and i9 rArr i4 i4 and i9 rArr i3 i3 and i4 and i6 rArr i9 i3 and i4 and i9 rArr i6 i3 and i6 and i9 rArr i4i3 and i4 and i6 rArr i9 i3 and i4 rArr i6 and i9 i3 and i6 rArr i4 and i9

6copy Miquel Sagravenchez i Marregrave KEMLG 2020

Association Rules (5) An example dataset

7

Outlook Temperature Humidity Windy Play

sunny hot high false no

sunny hot high true noovercast hot high false yesrainy mild high false yesrainy cool normal false yesrainy cool normal true noovercast cool normal true yessunny mild high false nosunny cool normal false yesrainy mild normal false yessunny mild normal true yesovercast mild high true yesovercast hot normal false yesrainy mild high true no

copy Miquel Sagravenchez i Marregrave KEMLG 2020

Association Rules (6)

Item an attribute ndash value pair Itemset combination of items that have a minimum specified support

(minsup) Support Coverage of an itemset

The support value of X with respect to T is defined as the number of transactions in the database which contains the itemset X supp(X) = |t isinT X sube T| (absolute definition) supp(X) = |t isinT X sube T| |T| (relative definition)

Support of a rule supp(X rArr Y) = supp(X cup Y ) |T|

Confidence of a ruleThe confidence value of a rule X rArr Y with respect to a set of transactions T is the proportion of the transactions containing X which also contains Y

conf(X rArr Y) = supp(X cup Y) supp(X)

8copy Miquel Sagravenchez i Marregrave KEMLG 2020

Association Rules (7)

Example of an Item Temperature = cool

Example of Itemsets Temperature = cool Temperature = cool Humidity = normal

Example of rulesif Temperature = cool then humidity = normal

supp(Temperature = cool) = 4supp(Humidity = normal Temperature = cool)= 4conf(if Temperature = cool then Humidity = normal) = 44 = 100

if Humidity = normal then Temperature = coolsupp(Humidity = normal) = 7 conf(if Humidity = normal then Temperature = cool) = 47 = 5714

The rules we are interested are those ones with a minimum supportand with high confidence

9copy Miquel Sagravenchez i Marregrave KEMLG 2020

httpskemlgupcedu

Apriori Algorithm

Apriori algorithm (1)

1 Define the minsup and the minconf and eventually the rules desired2 Compute the itemsets with supp(itemset) ge minsup for n=1 hellip N-1 being N= the number of available attributes

Use a hash-table to store the itemsets Use lexicographical ordering for generating and storing the

itemsets in the hash-table Apply the filter property frequent itemsets of length L

must be formed from frequent itemsets of length L-13 For each itemset generated in the previous step generate the candidate rules from it checking that they have the specified minimum accuracy (conf(rule) ge minconf)

Generate the rules starting first with one itemset in the consequent and progress with two itemsets etc

11copy Miquel Sagravenchez i Marregrave KEMLG 2020

Apriori (Tminsup) algorithm (2)

12copy Miquel Sagravenchez i Marregrave KEMLG 2020

L1 = large 1-itemsetsfor (k = 2 Lk-1 ne empty k++) do

Ck = Candidate-generation (Lk-1) New apriori candidates generated by extending Lk-1 candidates

forall transactions t isin T doCt = c isin Ck | Ck sube t Candidates contained in tforall candidates c isin Ct do

ccount++end

endLk = c | c isin Ck and ccount ge minsup

end

return cupk Lk

Candidate Generation

Two steps join step and prune filter stepCk = emptyforall a b isin Lk-1 such that a=I1 hellip Ik-2 Ik-1 and

b=I1 hellip Ik-2 Irsquok-1 and Ik-1 lt Irsquok-1 do join k-1 large itemsets with a common prefix and one item different

in lexicographic order to not repeat itemsets

c larr I1 hellip Ik-2 Ik-1 Irsquok-1 c is the join of a and b

Ck larr Ck cup c endforforeach c such that exists | s sube c |s|=k-1 and s notin Lk-1 do

Ck larr Ck ndash c apply filter property step

endforeachreturn Ck

copy Miquel Sagravenchez i Marregrave KEMLG-IDEAI 202013

Apriorialgorithm

(3)

Example minsup = 2

14copy Miquel Sagravenchez i Marregrave KEMLG 2020

One-item sets (12) sup Two-item sets

(47) sup Three-item sets (39) sup Four-item sets

(6) sup

outlook = sunny 5 outlook = sunnytemperature = mild 2

outlook = sunnytemperature = hothumidity = high

2outlook = sunny

temperature = hothumidity = high

play = no2

outlook = overcast 4 outlook = sunnytemperature = hot 2

outlook = sunnytemperature = hot

play = no2

outlook = sunnyhumidity = highwindy = false

play = no2

outlook = rainy 5 outlook = sunnyhumidity = normal 2

outlook = sunnyhumidity = normal

play = yes2

outlook = overcasttemperature = hot

windy = falseplay = yes

2

temperature = cool 4 outlook = sunnyhumidity = high 3

outlook = sunnyhumidity = highwindy = false

2outlook = rainy

temperature = mildwindy = false

play = yes2

temperature = mild 6 outlook = sunnywindy = true 2

outlook = sunnyhumidity = high

play = no3

outlook = rainyhumidity = normal

windy = falseplay = yes

2

temperature = hot 4 outlook = sunnywindy = false 3

outlook = sunnywindy = false

play = no2

temperature = coolhumidity = normal

windy = falseplay = yes

2

humidity = normal 7 outlook = sunnyplay = yes 2

outlook = overcasttemperature = hot

windy = false2

humidity = high 7 outlook = sunnyplay = no 3

outlook = overcasttemperature = hot

play = yes2

windy = true 6 outlook = overcasttemperature = hot 2

outlook = overcasttemperature = hot

play = yes2

windy = false 8 outlook = overcasthumidity = normal 2

outlook = overcasthumidity = high

play = yes2

play = yes 9 outlook = overcasthumidity = high 2

outlook = overcastwindy = trueplay = yes

2

play = no 5 outlook = overcastwindy = true 2

outlook = overcastwindy = false

play = yes2

hellip hellip

Association Rules (8) An example dataset

15copy Miquel Sagravenchez i Marregrave KEMLG 2020

Outlook Temperature Humidity Windy Play

sunny hot high false no

sunny hot high true noovercast hot high false yesrainy mild high false yesrainy cool normal false yesrainy cool normal true noovercast cool normal true yessunny mild high false nosunny cool normal false yesrainy mild normal false yessunny mild normal true yesovercast mild high true yesovercast hot normal false yesrainy mild high true no

Rule Generation Let us take the 3-item set

L3 = humidity = normal windy = false play = yesSupp (L3) = 4

Rules generated Confidenceif humidity = normal and windy = false then play = yes 44 = 100if humidity = normal and play = yes then windy = false 46 = 666if windy = false and play = yes then humidity = normal 46 = 666if humidity = normal then windy = false and play = yes 47 = 5714if windy = false then humidity = normal and play = yes 48 = 50if play = yes then humidity = normal and windy = false 49 = 4444if empty then humidity = normal and windy = false and play = yes 414 = 2857

Rules(N-item set) = sum119894119894=1119873119873 119862119862119873119873 119894119894

In the example database with 100 confidence and minsup ge 2 there are 58 rules

16copy Miquel Sagravenchez i Marregrave KEMLG 2020

  • Association Rules
  • Associative Models
  • Association Rules (1)
  • Association Rules (2)
  • Association Rules (3)
  • Association Rules (4)
  • Association Rules (5)
  • Association Rules (6)
  • Association Rules (7)
  • Apriori Algorithm
  • Apriori algorithm (1)
  • Apriori (Tminsup) algorithm (2)
  • Candidate Generation
  • Apriori algorithm (3)
  • Association Rules (8)
  • Rule Generation

copy Miquel Sagravenchez i Marregrave KEMLG 20203

Association Rules (1)

Goal to obtain a set of association rules which express the correlation among attributes from a database of item transactions

Applicability criteria database should have enough number of transactions in order that the correlation appear a sufficient number of times

Most common Methods Apriori [Agrawal amp Srikant 1994] Eclat (Equivalence CLAss Transformation) [Zaki 2000] FP-growth (Frequent Pattern Growth) [Han et al 2004]

Input original data matrix (unsupervised but could be supervised) Output set of association rules satisfying a minimum support and

a minimum confidence Parameters support confidence number of rules

Association Rules (2) Given a database consisting of a set of transactions

D = t1 t2 hellip tn and given I=i1 in be a set of n attributes called items

Each transaction in D has a unique transaction ID and contains a subset of the items in It1 i2 i3 i4 i6 i9t2 i1 i2 i4 i7 i8 i9t3 i2 i4 i5 i6t4 i1 i3 i4 i8 i9 i10 tn i3 i4 i6 i9

The issue is to obtain common patterns of co-occurrence of the same items along the database

4copy Miquel Sagravenchez i Marregrave KEMLG 2020

Association Rules (3) For instance the following common patterns can be obtained

i2 i4i4 i9i2 i4 i9i3 i4 i9i3 i4 i6 i9

From a common pattern several association rules can be generated

An association rule is defined as an implication of the formX rArr YWhere X Y sube I and X cap Y = empty

5copy Miquel Sagravenchez i Marregrave KEMLG 2020

Association Rules (4) Every rule is composed by two different sets of items also

known as itemsets X and Y X is called the antecedent or left-hand-side (LHS) of the rule

and Y is called the consequent or right-hand-side (RHS) of the ruleFor instancei2 rArr i4 i4 rArr i2i4 rArr i9 i9 rArr i4i2 rArr i4 and i9 i4 rArr i2 and i9 i9 rArr i2 and i4i2 and i4 rArr i9 i2 and i9 rArr i4 i4 and i9 rArr i2 i3 and i4 rArr i9 i3 and i9 rArr i4 i4 and i9 rArr i3 i3 and i4 and i6 rArr i9 i3 and i4 and i9 rArr i6 i3 and i6 and i9 rArr i4i3 and i4 and i6 rArr i9 i3 and i4 rArr i6 and i9 i3 and i6 rArr i4 and i9

6copy Miquel Sagravenchez i Marregrave KEMLG 2020

Association Rules (5) An example dataset

7

Outlook Temperature Humidity Windy Play

sunny hot high false no

sunny hot high true noovercast hot high false yesrainy mild high false yesrainy cool normal false yesrainy cool normal true noovercast cool normal true yessunny mild high false nosunny cool normal false yesrainy mild normal false yessunny mild normal true yesovercast mild high true yesovercast hot normal false yesrainy mild high true no

copy Miquel Sagravenchez i Marregrave KEMLG 2020

Association Rules (6)

Item an attribute ndash value pair Itemset combination of items that have a minimum specified support

(minsup) Support Coverage of an itemset

The support value of X with respect to T is defined as the number of transactions in the database which contains the itemset X supp(X) = |t isinT X sube T| (absolute definition) supp(X) = |t isinT X sube T| |T| (relative definition)

Support of a rule supp(X rArr Y) = supp(X cup Y ) |T|

Confidence of a ruleThe confidence value of a rule X rArr Y with respect to a set of transactions T is the proportion of the transactions containing X which also contains Y

conf(X rArr Y) = supp(X cup Y) supp(X)

8copy Miquel Sagravenchez i Marregrave KEMLG 2020

Association Rules (7)

Example of an Item Temperature = cool

Example of Itemsets Temperature = cool Temperature = cool Humidity = normal

Example of rulesif Temperature = cool then humidity = normal

supp(Temperature = cool) = 4supp(Humidity = normal Temperature = cool)= 4conf(if Temperature = cool then Humidity = normal) = 44 = 100

if Humidity = normal then Temperature = coolsupp(Humidity = normal) = 7 conf(if Humidity = normal then Temperature = cool) = 47 = 5714

The rules we are interested are those ones with a minimum supportand with high confidence

9copy Miquel Sagravenchez i Marregrave KEMLG 2020

httpskemlgupcedu

Apriori Algorithm

Apriori algorithm (1)

1 Define the minsup and the minconf and eventually the rules desired2 Compute the itemsets with supp(itemset) ge minsup for n=1 hellip N-1 being N= the number of available attributes

Use a hash-table to store the itemsets Use lexicographical ordering for generating and storing the

itemsets in the hash-table Apply the filter property frequent itemsets of length L

must be formed from frequent itemsets of length L-13 For each itemset generated in the previous step generate the candidate rules from it checking that they have the specified minimum accuracy (conf(rule) ge minconf)

Generate the rules starting first with one itemset in the consequent and progress with two itemsets etc

11copy Miquel Sagravenchez i Marregrave KEMLG 2020

Apriori (Tminsup) algorithm (2)

12copy Miquel Sagravenchez i Marregrave KEMLG 2020

L1 = large 1-itemsetsfor (k = 2 Lk-1 ne empty k++) do

Ck = Candidate-generation (Lk-1) New apriori candidates generated by extending Lk-1 candidates

forall transactions t isin T doCt = c isin Ck | Ck sube t Candidates contained in tforall candidates c isin Ct do

ccount++end

endLk = c | c isin Ck and ccount ge minsup

end

return cupk Lk

Candidate Generation

Two steps join step and prune filter stepCk = emptyforall a b isin Lk-1 such that a=I1 hellip Ik-2 Ik-1 and

b=I1 hellip Ik-2 Irsquok-1 and Ik-1 lt Irsquok-1 do join k-1 large itemsets with a common prefix and one item different

in lexicographic order to not repeat itemsets

c larr I1 hellip Ik-2 Ik-1 Irsquok-1 c is the join of a and b

Ck larr Ck cup c endforforeach c such that exists | s sube c |s|=k-1 and s notin Lk-1 do

Ck larr Ck ndash c apply filter property step

endforeachreturn Ck

copy Miquel Sagravenchez i Marregrave KEMLG-IDEAI 202013

Apriorialgorithm

(3)

Example minsup = 2

14copy Miquel Sagravenchez i Marregrave KEMLG 2020

One-item sets (12) sup Two-item sets

(47) sup Three-item sets (39) sup Four-item sets

(6) sup

outlook = sunny 5 outlook = sunnytemperature = mild 2

outlook = sunnytemperature = hothumidity = high

2outlook = sunny

temperature = hothumidity = high

play = no2

outlook = overcast 4 outlook = sunnytemperature = hot 2

outlook = sunnytemperature = hot

play = no2

outlook = sunnyhumidity = highwindy = false

play = no2

outlook = rainy 5 outlook = sunnyhumidity = normal 2

outlook = sunnyhumidity = normal

play = yes2

outlook = overcasttemperature = hot

windy = falseplay = yes

2

temperature = cool 4 outlook = sunnyhumidity = high 3

outlook = sunnyhumidity = highwindy = false

2outlook = rainy

temperature = mildwindy = false

play = yes2

temperature = mild 6 outlook = sunnywindy = true 2

outlook = sunnyhumidity = high

play = no3

outlook = rainyhumidity = normal

windy = falseplay = yes

2

temperature = hot 4 outlook = sunnywindy = false 3

outlook = sunnywindy = false

play = no2

temperature = coolhumidity = normal

windy = falseplay = yes

2

humidity = normal 7 outlook = sunnyplay = yes 2

outlook = overcasttemperature = hot

windy = false2

humidity = high 7 outlook = sunnyplay = no 3

outlook = overcasttemperature = hot

play = yes2

windy = true 6 outlook = overcasttemperature = hot 2

outlook = overcasttemperature = hot

play = yes2

windy = false 8 outlook = overcasthumidity = normal 2

outlook = overcasthumidity = high

play = yes2

play = yes 9 outlook = overcasthumidity = high 2

outlook = overcastwindy = trueplay = yes

2

play = no 5 outlook = overcastwindy = true 2

outlook = overcastwindy = false

play = yes2

hellip hellip

Association Rules (8) An example dataset

15copy Miquel Sagravenchez i Marregrave KEMLG 2020

Outlook Temperature Humidity Windy Play

sunny hot high false no

sunny hot high true noovercast hot high false yesrainy mild high false yesrainy cool normal false yesrainy cool normal true noovercast cool normal true yessunny mild high false nosunny cool normal false yesrainy mild normal false yessunny mild normal true yesovercast mild high true yesovercast hot normal false yesrainy mild high true no

Rule Generation Let us take the 3-item set

L3 = humidity = normal windy = false play = yesSupp (L3) = 4

Rules generated Confidenceif humidity = normal and windy = false then play = yes 44 = 100if humidity = normal and play = yes then windy = false 46 = 666if windy = false and play = yes then humidity = normal 46 = 666if humidity = normal then windy = false and play = yes 47 = 5714if windy = false then humidity = normal and play = yes 48 = 50if play = yes then humidity = normal and windy = false 49 = 4444if empty then humidity = normal and windy = false and play = yes 414 = 2857

Rules(N-item set) = sum119894119894=1119873119873 119862119862119873119873 119894119894

In the example database with 100 confidence and minsup ge 2 there are 58 rules

16copy Miquel Sagravenchez i Marregrave KEMLG 2020

  • Association Rules
  • Associative Models
  • Association Rules (1)
  • Association Rules (2)
  • Association Rules (3)
  • Association Rules (4)
  • Association Rules (5)
  • Association Rules (6)
  • Association Rules (7)
  • Apriori Algorithm
  • Apriori algorithm (1)
  • Apriori (Tminsup) algorithm (2)
  • Candidate Generation
  • Apriori algorithm (3)
  • Association Rules (8)
  • Rule Generation

Association Rules (2) Given a database consisting of a set of transactions

D = t1 t2 hellip tn and given I=i1 in be a set of n attributes called items

Each transaction in D has a unique transaction ID and contains a subset of the items in It1 i2 i3 i4 i6 i9t2 i1 i2 i4 i7 i8 i9t3 i2 i4 i5 i6t4 i1 i3 i4 i8 i9 i10 tn i3 i4 i6 i9

The issue is to obtain common patterns of co-occurrence of the same items along the database

4copy Miquel Sagravenchez i Marregrave KEMLG 2020

Association Rules (3) For instance the following common patterns can be obtained

i2 i4i4 i9i2 i4 i9i3 i4 i9i3 i4 i6 i9

From a common pattern several association rules can be generated

An association rule is defined as an implication of the formX rArr YWhere X Y sube I and X cap Y = empty

5copy Miquel Sagravenchez i Marregrave KEMLG 2020

Association Rules (4) Every rule is composed by two different sets of items also

known as itemsets X and Y X is called the antecedent or left-hand-side (LHS) of the rule

and Y is called the consequent or right-hand-side (RHS) of the ruleFor instancei2 rArr i4 i4 rArr i2i4 rArr i9 i9 rArr i4i2 rArr i4 and i9 i4 rArr i2 and i9 i9 rArr i2 and i4i2 and i4 rArr i9 i2 and i9 rArr i4 i4 and i9 rArr i2 i3 and i4 rArr i9 i3 and i9 rArr i4 i4 and i9 rArr i3 i3 and i4 and i6 rArr i9 i3 and i4 and i9 rArr i6 i3 and i6 and i9 rArr i4i3 and i4 and i6 rArr i9 i3 and i4 rArr i6 and i9 i3 and i6 rArr i4 and i9

6copy Miquel Sagravenchez i Marregrave KEMLG 2020

Association Rules (5) An example dataset

7

Outlook Temperature Humidity Windy Play

sunny hot high false no

sunny hot high true noovercast hot high false yesrainy mild high false yesrainy cool normal false yesrainy cool normal true noovercast cool normal true yessunny mild high false nosunny cool normal false yesrainy mild normal false yessunny mild normal true yesovercast mild high true yesovercast hot normal false yesrainy mild high true no

copy Miquel Sagravenchez i Marregrave KEMLG 2020

Association Rules (6)

Item an attribute ndash value pair Itemset combination of items that have a minimum specified support

(minsup) Support Coverage of an itemset

The support value of X with respect to T is defined as the number of transactions in the database which contains the itemset X supp(X) = |t isinT X sube T| (absolute definition) supp(X) = |t isinT X sube T| |T| (relative definition)

Support of a rule supp(X rArr Y) = supp(X cup Y ) |T|

Confidence of a ruleThe confidence value of a rule X rArr Y with respect to a set of transactions T is the proportion of the transactions containing X which also contains Y

conf(X rArr Y) = supp(X cup Y) supp(X)

8copy Miquel Sagravenchez i Marregrave KEMLG 2020

Association Rules (7)

Example of an item: Temperature = cool
Examples of itemsets: {Temperature = cool}, {Temperature = cool, Humidity = normal}
Examples of rules:

if Temperature = cool then Humidity = normal
  supp(Temperature = cool) = 4
  supp(Humidity = normal, Temperature = cool) = 4
  conf(if Temperature = cool then Humidity = normal) = 4/4 = 100%

if Humidity = normal then Temperature = cool
  supp(Humidity = normal) = 7
  conf(if Humidity = normal then Temperature = cool) = 4/7 = 57.14%

The rules of interest are those with at least a minimum support and a high confidence.
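A hedged Python sketch of this selection criterion, restricted for brevity to rules with one item on each side (a simplification, not the general method). Each row is encoded as a set of `attribute=value` strings (an assumption), and `single_item_rules` is a hypothetical helper.

```python
from itertools import permutations

ATTRS = ("outlook", "temperature", "humidity", "windy", "play")
ROWS = """\
sunny hot high false no
sunny hot high true no
overcast hot high false yes
rainy mild high false yes
rainy cool normal false yes
rainy cool normal true no
overcast cool normal true yes
sunny mild high false no
sunny cool normal false yes
rainy mild normal false yes
sunny mild normal true yes
overcast mild high true yes
overcast hot normal false yes
rainy mild high true no"""
T = [frozenset(f"{a}={v}" for a, v in zip(ATTRS, row.split()))
     for row in ROWS.splitlines()]


def supp(itemset):
    """Absolute support of an itemset in T."""
    return sum(1 for t in T if itemset <= t)


def single_item_rules(minsup, minconf):
    """Keep rules {a} => {b} with supp({a, b}) >= minsup and conf >= minconf.
    Two values of the same attribute never co-occur (support 0), so any
    minsup >= 1 discards those pairs."""
    items = sorted(set().union(*T))
    kept = []
    for a, b in permutations(items, 2):
        s_ab = supp(frozenset({a, b}))
        if s_ab >= minsup and s_ab / supp(frozenset({a})) >= minconf:
            kept.append((a, b, s_ab))
    return kept


for a, b, s in single_item_rules(minsup=2, minconf=1.0):
    print(f"if {a} then {b}  (support {s})")
```

With minsup = 2 and minconf = 1.0 it keeps exactly two rules on the example dataset, outlook=overcast ⇒ play=yes and temperature=cool ⇒ humidity=normal; the converse humidity=normal ⇒ temperature=cool (confidence 4/7) is filtered out.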


Apriori Algorithm

Apriori algorithm (1)

1. Define minsup and minconf and, optionally, the number of rules desired.
2. Compute the itemsets with supp(itemset) ≥ minsup for n = 1, …, N-1, where N is the number of available attributes:
   • use a hash table to store the itemsets;
   • use lexicographic ordering to generate the itemsets and store them in the hash table;
   • apply the filter property: frequent itemsets of length L must be formed from frequent itemsets of length L-1.
3. For each itemset generated in the previous step, generate the candidate rules and check that they reach the specified minimum accuracy (conf(rule) ≥ minconf):
   • generate the rules starting with one item in the consequent, then two items, and so on.
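Step 3 can be sketched in Python as follows. This illustrates the idea (try one-item consequents first, then grow only consequents that already gave confident rules, since for a fixed itemset confidence can only decrease as the consequent grows); it is not the course's reference code. The function name, the data encoding and the threshold minconf = 0.6 in the usage line are arbitrary assumptions.

```python
from itertools import combinations

ATTRS = ("outlook", "temperature", "humidity", "windy", "play")
ROWS = """\
sunny hot high false no
sunny hot high true no
overcast hot high false yes
rainy mild high false yes
rainy cool normal false yes
rainy cool normal true no
overcast cool normal true yes
sunny mild high false no
sunny cool normal false yes
rainy mild normal false yes
sunny mild normal true yes
overcast mild high true yes
overcast hot normal false yes
rainy mild high true no"""
T = [frozenset(f"{a}={v}" for a, v in zip(ATTRS, row.split()))
     for row in ROWS.splitlines()]


def supp(itemset):
    """Absolute support of an itemset in T."""
    return sum(1 for t in T if itemset <= t)


def rules_with_growing_consequents(itemset, minconf):
    """Rules X => Y with X | Y == itemset and X non-empty, trying one-item
    consequents first, then two-item ones, and so on. A consequent with
    m + 1 items is tried only if all its m-item subsets gave confident rules."""
    itemset = frozenset(itemset)
    s_all = supp(itemset)
    rules = []
    consequents = [frozenset({i}) for i in sorted(itemset)]
    m = 1
    while consequents and m < len(itemset):   # keep the antecedent non-empty
        confident = []
        for y in consequents:
            x = itemset - y
            c = s_all / supp(x)
            if c >= minconf:
                rules.append((x, y, c))
                confident.append(y)
        ok = set(confident)
        merged = {a | b for a in confident for b in confident if len(a | b) == m + 1}
        consequents = [cand for cand in merged
                       if all(frozenset(s) in ok for s in combinations(cand, m))]
        m += 1
    return rules


L3 = {"humidity=normal", "windy=false", "play=yes"}
for x, y, c in rules_with_growing_consequents(L3, minconf=0.6):
    print(sorted(x), "=>", sorted(y), round(c, 3))
```

For the itemset {humidity = normal, windy = false, play = yes}, minconf = 0.6 keeps the three rules with a one-item consequent (confidences 1.0, 0.667 and 0.667); every two-item consequent falls below the threshold.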


Apriori(T, minsup) algorithm (2)

L_1 = {large 1-itemsets}
for (k = 2; L_{k-1} ≠ ∅; k++) do
    C_k = Candidate-generation(L_{k-1})   // new candidates generated by extending the itemsets in L_{k-1}
    forall transactions t ∈ T do
        C_t = {c ∈ C_k | c ⊆ t}           // candidates contained in t
        forall candidates c ∈ C_t do
            c.count++
        end
    end
    L_k = {c | c ∈ C_k and c.count ≥ minsup}
end
return ∪_k L_k
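A compact Python sketch of the loop above, with the join and prune steps of candidate generation folded into a helper. Itemsets are stored as sorted tuples in dictionaries, standing in for the lexicographically ordered hash table; the names and the data encoding are assumptions. On the example dataset with minsup = 2 it gives the level sizes of the table in Apriori algorithm (3): 12, 47, 39 and 6 itemsets, and no 5-itemsets.

```python
from itertools import combinations

ATTRS = ("outlook", "temperature", "humidity", "windy", "play")
ROWS = """\
sunny hot high false no
sunny hot high true no
overcast hot high false yes
rainy mild high false yes
rainy cool normal false yes
rainy cool normal true no
overcast cool normal true yes
sunny mild high false no
sunny cool normal false yes
rainy mild normal false yes
sunny mild normal true yes
overcast mild high true yes
overcast hot normal false yes
rainy mild high true no"""
T = [frozenset(f"{a}={v}" for a, v in zip(ATTRS, row.split()))
     for row in ROWS.splitlines()]


def candidate_generation(prev_level, k):
    """Join + prune over a set of sorted (k-1)-tuples."""
    prev = sorted(prev_level)
    cands = set()
    for i, a in enumerate(prev):
        for b in prev[i + 1:]:
            if a[:-1] == b[:-1]:              # common (k-2)-prefix
                c = a + (b[-1],)              # still sorted: a[-1] < b[-1]
                # filter property: every (k-1)-subset must be frequent
                if all(s in prev_level for s in combinations(c, k - 1)):
                    cands.add(c)
    return cands


def apriori(transactions, minsup):
    """Return {k: {itemset_tuple: support}} for all frequent itemsets."""
    counts = {}
    for t in transactions:
        for item in t:
            counts[(item,)] = counts.get((item,), 0) + 1
    level = {c for c, n in counts.items() if n >= minsup}
    result = {1: {c: counts[c] for c in level}}
    k = 2
    while level:
        cands = candidate_generation(level, k)
        count = dict.fromkeys(cands, 0)
        for t in transactions:                # one pass over T per level
            for c in cands:
                if set(c) <= t:
                    count[c] += 1
        level = {c for c, n in count.items() if n >= minsup}
        if level:
            result[k] = {c: count[c] for c in level}
        k += 1
    return result


res = apriori(T, minsup=2)
for k in sorted(res):
    print(k, len(res[k]))   # 1 12, then 2 47, 3 39, 4 6 (one level per line)
```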

Candidate Generation

Two steps: a join step and a prune (filter) step.

C_k = ∅
forall a, b ∈ L_{k-1} such that a = {I_1, …, I_{k-2}, I_{k-1}} and b = {I_1, …, I_{k-2}, I′_{k-1}} and I_{k-1} < I′_{k-1} do
    // join two large (k-1)-itemsets that share a common prefix and differ in one item;
    // requiring I_{k-1} < I′_{k-1} (lexicographic order) avoids generating the same itemset twice
    c ← {I_1, …, I_{k-2}, I_{k-1}, I′_{k-1}}   // c is the join of a and b
    C_k ← C_k ∪ {c}
endfor
foreach c ∈ C_k such that ∃ s ⊆ c with |s| = k-1 and s ∉ L_{k-1} do
    C_k ← C_k − {c}   // prune step: apply the filter property
endforeach
return C_k
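The join and prune steps can be seen on a small hypothetical L2 built from the items of the earlier examples (the itemsets are made up for illustration). In this Python sketch itemsets are sorted tuples, so for a pair taken in sorted order with a common prefix the condition I_{k-1} < I′_{k-1} holds automatically.

```python
from itertools import combinations


def candidate_generation(L_prev):
    """Return (joined, C_k) for a set L_prev of frequent (k-1)-itemsets,
    each stored as a sorted tuple of items."""
    k = len(next(iter(L_prev))) + 1
    prev = sorted(L_prev)
    joined = set()
    for i, a in enumerate(prev):
        for b in prev[i + 1:]:
            # join step: a and b share their first k-2 items
            # (then a[-1] < b[-1], since prev is sorted)
            if a[:-1] == b[:-1]:
                joined.add(a + (b[-1],))
    # prune step (filter property): every (k-1)-subset must be in L_prev
    C_k = {c for c in joined if all(s in L_prev for s in combinations(c, k - 1))}
    return joined, C_k


L2 = {("i2", "i4"), ("i2", "i9"), ("i4", "i6"), ("i4", "i9")}  # hypothetical
joined, C3 = candidate_generation(L2)
print(sorted(joined))  # [('i2', 'i4', 'i9'), ('i4', 'i6', 'i9')]
print(sorted(C3))      # [('i2', 'i4', 'i9')]
```

Joining gives {i2, i4, i9} and {i4, i6, i9}; the prune step then removes {i4, i6, i9} because its subset {i6, i9} is not in L2.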


Apriori algorithm (3)

Example with minsup = 2:

One-item sets (12)    sup
outlook = sunny       5
outlook = overcast    4
outlook = rainy       5
temperature = cool    4
temperature = mild    6
temperature = hot     4
humidity = normal     7
humidity = high       7
windy = true          6
windy = false         8
play = yes            9
play = no             5

Two-item sets (47)                     sup
outlook = sunny, temperature = mild    2
outlook = sunny, temperature = hot     2
outlook = sunny, humidity = normal     2
outlook = sunny, humidity = high       3
outlook = sunny, windy = true          2
outlook = sunny, windy = false         3
outlook = sunny, play = yes            2
outlook = sunny, play = no             3
outlook = overcast, temperature = hot  2
outlook = overcast, humidity = normal  2
outlook = overcast, humidity = high    2
outlook = overcast, windy = true       2
…

Three-item sets (39)                                  sup
outlook = sunny, temperature = hot, humidity = high   2
outlook = sunny, temperature = hot, play = no         2
outlook = sunny, humidity = normal, play = yes        2
outlook = sunny, humidity = high, windy = false       2
outlook = sunny, humidity = high, play = no           3
outlook = sunny, windy = false, play = no             2
outlook = overcast, temperature = hot, windy = false  2
outlook = overcast, temperature = hot, play = yes     2
outlook = overcast, humidity = normal, play = yes     2
outlook = overcast, humidity = high, play = yes       2
outlook = overcast, windy = true, play = yes          2
outlook = overcast, windy = false, play = yes         2
…

Four-item sets (6)                                                sup
outlook = sunny, temperature = hot, humidity = high, play = no    2
outlook = sunny, humidity = high, windy = false, play = no        2
outlook = overcast, temperature = hot, windy = false, play = yes  2
outlook = rainy, temperature = mild, windy = false, play = yes    2
outlook = rainy, humidity = normal, windy = false, play = yes     2
temperature = cool, humidity = normal, windy = false, play = yes  2

Association Rules (8)

The example dataset of Association Rules (5) is used again to illustrate rule generation.


Rule Generation

Let us take the 3-itemset

L3 = {humidity = normal, windy = false, play = yes}, supp(L3) = 4

Rules generated                                                 Confidence
if humidity = normal and windy = false then play = yes          4/4 = 100%
if humidity = normal and play = yes then windy = false          4/6 = 66.67%
if windy = false and play = yes then humidity = normal          4/6 = 66.67%
if humidity = normal then windy = false and play = yes          4/7 = 57.14%
if windy = false then humidity = normal and play = yes          4/8 = 50%
if play = yes then humidity = normal and windy = false          4/9 = 44.44%
if ∅ then humidity = normal and windy = false and play = yes    4/14 = 28.57%

The number of rules that can be generated from an N-itemset (consequents of size 1 to N, the empty antecedent included) is

$$\text{Rules}(N\text{-itemset}) = \sum_{i=1}^{N} \binom{N}{i} = 2^N - 1$$

so the 3-itemset L3 yields 2^3 - 1 = 7 rules, as listed above.

In the example database there are 58 rules with 100% confidence and a support of at least 2 (minsup = 2).
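The rule table for L3 and the rule count can be checked with a short Python sketch. `Fraction` keeps the confidences exact, and the `attribute=value` encoding of the rows is an assumption. Consequent sizes run from 1 to N, so the rule with an empty antecedent is included, as in the table.

```python
from fractions import Fraction
from itertools import combinations
from math import comb

ATTRS = ("outlook", "temperature", "humidity", "windy", "play")
ROWS = """\
sunny hot high false no
sunny hot high true no
overcast hot high false yes
rainy mild high false yes
rainy cool normal false yes
rainy cool normal true no
overcast cool normal true yes
sunny mild high false no
sunny cool normal false yes
rainy mild normal false yes
sunny mild normal true yes
overcast mild high true yes
overcast hot normal false yes
rainy mild high true no"""
T = [frozenset(f"{a}={v}" for a, v in zip(ATTRS, row.split()))
     for row in ROWS.splitlines()]


def supp(itemset):
    """Absolute support; the empty itemset is contained in every transaction."""
    return sum(1 for t in T if itemset <= t)


L3 = frozenset({"humidity=normal", "windy=false", "play=yes"})

rules = []
for size in range(1, len(L3) + 1):            # consequent sizes 1..N
    for rhs in combinations(sorted(L3), size):
        lhs = L3 - set(rhs)                    # antecedent, possibly empty
        rules.append((sorted(lhs), list(rhs), Fraction(supp(L3), supp(lhs))))

for lhs, rhs, c in rules:
    print(f"if {' and '.join(lhs) or '(empty)'} then {' and '.join(rhs)}: "
          f"{c} = {float(c):.2%}")

# Rules(N-itemset) = sum_{i=1}^{N} C(N, i) = 2^N - 1
N = len(L3)
print(len(rules), sum(comb(N, i) for i in range(1, N + 1)), 2**N - 1)  # 7 7 7
```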






Association Rules (6)

Item an attribute ndash value pair Itemset combination of items that have a minimum specified support

(minsup) Support Coverage of an itemset

The support value of X with respect to T is defined as the number of transactions in the database which contains the itemset X supp(X) = |t isinT X sube T| (absolute definition) supp(X) = |t isinT X sube T| |T| (relative definition)

Support of a rule supp(X rArr Y) = supp(X cup Y ) |T|

Confidence of a ruleThe confidence value of a rule X rArr Y with respect to a set of transactions T is the proportion of the transactions containing X which also contains Y

conf(X rArr Y) = supp(X cup Y) supp(X)

8copy Miquel Sagravenchez i Marregrave KEMLG 2020

Association Rules (7)

Example of an Item Temperature = cool

Example of Itemsets Temperature = cool Temperature = cool Humidity = normal

Example of rulesif Temperature = cool then humidity = normal

supp(Temperature = cool) = 4supp(Humidity = normal Temperature = cool)= 4conf(if Temperature = cool then Humidity = normal) = 44 = 100

if Humidity = normal then Temperature = coolsupp(Humidity = normal) = 7 conf(if Humidity = normal then Temperature = cool) = 47 = 5714

The rules we are interested are those ones with a minimum supportand with high confidence

9copy Miquel Sagravenchez i Marregrave KEMLG 2020

httpskemlgupcedu

Apriori Algorithm

Apriori algorithm (1)

1 Define the minsup and the minconf and eventually the rules desired2 Compute the itemsets with supp(itemset) ge minsup for n=1 hellip N-1 being N= the number of available attributes

Use a hash-table to store the itemsets Use lexicographical ordering for generating and storing the

itemsets in the hash-table Apply the filter property frequent itemsets of length L

must be formed from frequent itemsets of length L-13 For each itemset generated in the previous step generate the candidate rules from it checking that they have the specified minimum accuracy (conf(rule) ge minconf)

Generate the rules starting first with one itemset in the consequent and progress with two itemsets etc

11copy Miquel Sagravenchez i Marregrave KEMLG 2020

Apriori (Tminsup) algorithm (2)

12copy Miquel Sagravenchez i Marregrave KEMLG 2020

L1 = large 1-itemsetsfor (k = 2 Lk-1 ne empty k++) do

Ck = Candidate-generation (Lk-1) New apriori candidates generated by extending Lk-1 candidates

forall transactions t isin T doCt = c isin Ck | Ck sube t Candidates contained in tforall candidates c isin Ct do

ccount++end

endLk = c | c isin Ck and ccount ge minsup

end

return cupk Lk

Candidate Generation

Two steps join step and prune filter stepCk = emptyforall a b isin Lk-1 such that a=I1 hellip Ik-2 Ik-1 and

b=I1 hellip Ik-2 Irsquok-1 and Ik-1 lt Irsquok-1 do join k-1 large itemsets with a common prefix and one item different

in lexicographic order to not repeat itemsets

c larr I1 hellip Ik-2 Ik-1 Irsquok-1 c is the join of a and b

Ck larr Ck cup c endforforeach c such that exists | s sube c |s|=k-1 and s notin Lk-1 do

Ck larr Ck ndash c apply filter property step

endforeachreturn Ck

copy Miquel Sagravenchez i Marregrave KEMLG-IDEAI 202013

Apriorialgorithm

(3)

Example minsup = 2

14copy Miquel Sagravenchez i Marregrave KEMLG 2020

| One-item sets (12) | sup | Two-item sets (47) | sup | Three-item sets (39) | sup | Four-item sets (6) | sup |
|---|---|---|---|---|---|---|---|
| outlook = sunny | 5 | outlook = sunny, temperature = mild | 2 | outlook = sunny, temperature = hot, humidity = high | 2 | outlook = sunny, temperature = hot, humidity = high, play = no | 2 |
| outlook = overcast | 4 | outlook = sunny, temperature = hot | 2 | outlook = sunny, temperature = hot, play = no | 2 | outlook = sunny, humidity = high, windy = false, play = no | 2 |
| outlook = rainy | 5 | outlook = sunny, humidity = normal | 2 | outlook = sunny, humidity = normal, play = yes | 2 | outlook = overcast, temperature = hot, windy = false, play = yes | 2 |
| temperature = cool | 4 | outlook = sunny, humidity = high | 3 | outlook = sunny, humidity = high, windy = false | 2 | outlook = rainy, temperature = mild, windy = false, play = yes | 2 |
| temperature = mild | 6 | outlook = sunny, windy = true | 2 | outlook = sunny, humidity = high, play = no | 3 | outlook = rainy, humidity = normal, windy = false, play = yes | 2 |
| temperature = hot | 4 | outlook = sunny, windy = false | 3 | outlook = sunny, windy = false, play = no | 2 | temperature = cool, humidity = normal, windy = false, play = yes | 2 |
| humidity = normal | 7 | outlook = sunny, play = yes | 2 | outlook = overcast, temperature = hot, windy = false | 2 | | |
| humidity = high | 7 | outlook = sunny, play = no | 3 | outlook = overcast, temperature = hot, play = yes | 2 | | |
| windy = true | 6 | outlook = overcast, temperature = hot | 2 | outlook = overcast, humidity = normal, play = yes | 2 | | |
| windy = false | 8 | outlook = overcast, humidity = normal | 2 | outlook = overcast, humidity = high, play = yes | 2 | | |
| play = yes | 9 | outlook = overcast, humidity = high | 2 | outlook = overcast, windy = true, play = yes | 2 | | |
| play = no | 5 | outlook = overcast, windy = true | 2 | outlook = overcast, windy = false, play = yes | 2 | | |
| | | … | | … | | | |
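To cross-check the column totals of this table (12, 47, 39 and 6 itemsets), the sketch below counts every itemset of the 14-row example dataset by brute force rather than with Apriori. It assumes each row is encoded as items of the form "attribute = value" and that minsup = 2 is an absolute count; the names `ATTRS`, `ROWS` and `supports` are illustrative.

```python
from collections import Counter
from itertools import combinations

ATTRS = ("outlook", "temperature", "humidity", "windy", "play")
ROWS = [
    ("sunny", "hot", "high", "false", "no"),
    ("sunny", "hot", "high", "true", "no"),
    ("overcast", "hot", "high", "false", "yes"),
    ("rainy", "mild", "high", "false", "yes"),
    ("rainy", "cool", "normal", "false", "yes"),
    ("rainy", "cool", "normal", "true", "no"),
    ("overcast", "cool", "normal", "true", "yes"),
    ("sunny", "mild", "high", "false", "no"),
    ("sunny", "cool", "normal", "false", "yes"),
    ("rainy", "mild", "normal", "false", "yes"),
    ("sunny", "mild", "normal", "true", "yes"),
    ("overcast", "mild", "high", "true", "yes"),
    ("overcast", "hot", "normal", "false", "yes"),
    ("rainy", "mild", "high", "true", "no"),
]


def supports(rows, attrs=ATTRS):
    """Count every non-empty itemset of 'attribute = value' items (brute force)."""
    support = Counter()
    for row in rows:
        # items are always listed in attribute order, so tuples are canonical keys
        items = [f"{a} = {v}" for a, v in zip(attrs, row)]
        for n in range(1, len(items) + 1):
            support.update(combinations(items, n))
    return support


support = supports(ROWS)
minsup = 2
by_length = Counter(len(s) for s, c in support.items() if c >= minsup)
print(sorted(by_length.items()))  # [(1, 12), (2, 47), (3, 39), (4, 6)]
```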

Association Rules (8)

An example dataset


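| Outlook | Temperature | Humidity | Windy | Play |
|---|---|---|---|---|
| sunny | hot | high | false | no |
| sunny | hot | high | true | no |
| overcast | hot | high | false | yes |
| rainy | mild | high | false | yes |
| rainy | cool | normal | false | yes |
| rainy | cool | normal | true | no |
| overcast | cool | normal | true | yes |
| sunny | mild | high | false | no |
| sunny | cool | normal | false | yes |
| rainy | mild | normal | false | yes |
| sunny | mild | normal | true | yes |
| overcast | mild | high | true | yes |
| overcast | hot | normal | false | yes |
| rainy | mild | high | true | no |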

Rule Generation

Let us take the 3-item set L3 = {humidity = normal, windy = false, play = yes}, with Supp(L3) = 4. The confidence of each rule is Supp(L3) divided by the support of its antecedent (the empty antecedent is satisfied by all 14 transactions).

| Rule generated | Confidence |
|---|---|
| if humidity = normal and windy = false then play = yes | 4/4 = 100% |
| if humidity = normal and play = yes then windy = false | 4/6 = 66.67% |
| if windy = false and play = yes then humidity = normal | 4/6 = 66.67% |
| if humidity = normal then windy = false and play = yes | 4/7 = 57.14% |
| if windy = false then humidity = normal and play = yes | 4/8 = 50% |
| if play = yes then humidity = normal and windy = false | 4/9 = 44.44% |
| if ∅ then humidity = normal and windy = false and play = yes | 4/14 = 28.57% |

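The number of rules that can be generated from an N-item set is

$$\text{Rules}(N\text{-item set}) = \sum_{i=1}^{N} \binom{N}{i} = 2^{N} - 1,$$

where i is the number of items in the consequent. For N = 3 this gives 3 + 3 + 1 = 7 rules, as in the table above.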

In the example database, there are 58 rules with 100% confidence and minsup ≥ 2.
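The rule generation above can be sketched in Python, assuming the same "attribute = value" encoding of the example dataset. `rules_from_itemset` (a hypothetical helper) enumerates consequents of size 1, then 2, and so on, as in step 3 of the algorithm, and computes each confidence as Supp(L3) / supp(antecedent) with exact fractions; the optional `minconf` argument keeps only sufficiently confident rules.

```python
from fractions import Fraction
from itertools import combinations

ATTRS = ("outlook", "temperature", "humidity", "windy", "play")
ROWS = [
    ("sunny", "hot", "high", "false", "no"),
    ("sunny", "hot", "high", "true", "no"),
    ("overcast", "hot", "high", "false", "yes"),
    ("rainy", "mild", "high", "false", "yes"),
    ("rainy", "cool", "normal", "false", "yes"),
    ("rainy", "cool", "normal", "true", "no"),
    ("overcast", "cool", "normal", "true", "yes"),
    ("sunny", "mild", "high", "false", "no"),
    ("sunny", "cool", "normal", "false", "yes"),
    ("rainy", "mild", "normal", "false", "yes"),
    ("sunny", "mild", "normal", "true", "yes"),
    ("overcast", "mild", "high", "true", "yes"),
    ("overcast", "hot", "normal", "false", "yes"),
    ("rainy", "mild", "high", "true", "no"),
]
TRANSACTIONS = [frozenset(f"{a} = {v}" for a, v in zip(ATTRS, row)) for row in ROWS]


def supp(itemset):
    """Absolute support: transactions containing every item (14 for the empty set)."""
    return sum(1 for t in TRANSACTIONS if itemset <= t)


def rules_from_itemset(itemset, minconf=0):
    """Yield (antecedent, consequent, confidence), consequent size 1, then 2, ..."""
    whole = supp(itemset)
    for i in range(1, len(itemset) + 1):  # i = number of items in the consequent
        for cons in combinations(sorted(itemset), i):
            consequent = frozenset(cons)
            antecedent = itemset - consequent
            conf = Fraction(whole, supp(antecedent))
            if conf >= minconf:
                yield antecedent, consequent, conf


L3 = frozenset({"humidity = normal", "windy = false", "play = yes"})
rules = list(rules_from_itemset(L3))
for antecedent, consequent, conf in rules:
    lhs = " and ".join(sorted(antecedent)) or "∅"
    print(f"if {lhs} then {' and '.join(sorted(consequent))}: {float(conf):.2%}")
```

For instance, `rules_from_itemset(L3, minconf=Fraction(2, 3))` keeps only the three rules whose confidence is at least 2/3.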




httpskemlgupcedu

Apriori Algorithm

Apriori algorithm (1)

1 Define the minsup and the minconf and eventually the rules desired2 Compute the itemsets with supp(itemset) ge minsup for n=1 hellip N-1 being N= the number of available attributes

Use a hash-table to store the itemsets Use lexicographical ordering for generating and storing the

itemsets in the hash-table Apply the filter property frequent itemsets of length L

must be formed from frequent itemsets of length L-13 For each itemset generated in the previous step generate the candidate rules from it checking that they have the specified minimum accuracy (conf(rule) ge minconf)

Generate the rules starting first with one itemset in the consequent and progress with two itemsets etc

11copy Miquel Sagravenchez i Marregrave KEMLG 2020

Apriori (Tminsup) algorithm (2)

12copy Miquel Sagravenchez i Marregrave KEMLG 2020

L1 = large 1-itemsetsfor (k = 2 Lk-1 ne empty k++) do

Ck = Candidate-generation (Lk-1) New apriori candidates generated by extending Lk-1 candidates

forall transactions t isin T doCt = c isin Ck | Ck sube t Candidates contained in tforall candidates c isin Ct do

ccount++end

endLk = c | c isin Ck and ccount ge minsup

end

return cupk Lk

Candidate Generation

Two steps join step and prune filter stepCk = emptyforall a b isin Lk-1 such that a=I1 hellip Ik-2 Ik-1 and

b=I1 hellip Ik-2 Irsquok-1 and Ik-1 lt Irsquok-1 do join k-1 large itemsets with a common prefix and one item different

in lexicographic order to not repeat itemsets

c larr I1 hellip Ik-2 Ik-1 Irsquok-1 c is the join of a and b

Ck larr Ck cup c endforforeach c such that exists | s sube c |s|=k-1 and s notin Lk-1 do

Ck larr Ck ndash c apply filter property step

endforeachreturn Ck

copy Miquel Sagravenchez i Marregrave KEMLG-IDEAI 202013

Apriorialgorithm

(3)

Example minsup = 2

14copy Miquel Sagravenchez i Marregrave KEMLG 2020

One-item sets (12) sup Two-item sets

(47) sup Three-item sets (39) sup Four-item sets

(6) sup

outlook = sunny 5 outlook = sunnytemperature = mild 2

outlook = sunnytemperature = hothumidity = high

2outlook = sunny

temperature = hothumidity = high

play = no2

outlook = overcast 4 outlook = sunnytemperature = hot 2

outlook = sunnytemperature = hot

play = no2

outlook = sunnyhumidity = highwindy = false

play = no2

outlook = rainy 5 outlook = sunnyhumidity = normal 2

outlook = sunnyhumidity = normal

play = yes2

outlook = overcasttemperature = hot

windy = falseplay = yes

2

temperature = cool 4 outlook = sunnyhumidity = high 3

outlook = sunnyhumidity = highwindy = false

2outlook = rainy

temperature = mildwindy = false

play = yes2

temperature = mild 6 outlook = sunnywindy = true 2

outlook = sunnyhumidity = high

play = no3

outlook = rainyhumidity = normal

windy = falseplay = yes

2

temperature = hot 4 outlook = sunnywindy = false 3

outlook = sunnywindy = false

play = no2

temperature = coolhumidity = normal

windy = falseplay = yes

2

humidity = normal 7 outlook = sunnyplay = yes 2

outlook = overcasttemperature = hot

windy = false2

humidity = high 7 outlook = sunnyplay = no 3

outlook = overcasttemperature = hot

play = yes2

windy = true 6 outlook = overcasttemperature = hot 2

outlook = overcasttemperature = hot

play = yes2

windy = false 8 outlook = overcasthumidity = normal 2

outlook = overcasthumidity = high

play = yes2

play = yes 9 outlook = overcasthumidity = high 2

outlook = overcastwindy = trueplay = yes

2

play = no 5 outlook = overcastwindy = true 2

outlook = overcastwindy = false

play = yes2

hellip hellip

Association Rules (8) An example dataset

15copy Miquel Sagravenchez i Marregrave KEMLG 2020

Outlook Temperature Humidity Windy Play

sunny hot high false no

sunny hot high true noovercast hot high false yesrainy mild high false yesrainy cool normal false yesrainy cool normal true noovercast cool normal true yessunny mild high false nosunny cool normal false yesrainy mild normal false yessunny mild normal true yesovercast mild high true yesovercast hot normal false yesrainy mild high true no

Rule Generation Let us take the 3-item set

L3 = humidity = normal windy = false play = yesSupp (L3) = 4

Rules generated Confidenceif humidity = normal and windy = false then play = yes 44 = 100if humidity = normal and play = yes then windy = false 46 = 666if windy = false and play = yes then humidity = normal 46 = 666if humidity = normal then windy = false and play = yes 47 = 5714if windy = false then humidity = normal and play = yes 48 = 50if play = yes then humidity = normal and windy = false 49 = 4444if empty then humidity = normal and windy = false and play = yes 414 = 2857

Rules(N-item set) = sum119894119894=1119873119873 119862119862119873119873 119894119894

In the example database with 100 confidence and minsup ge 2 there are 58 rules

16copy Miquel Sagravenchez i Marregrave KEMLG 2020

  • Association Rules
  • Associative Models
  • Association Rules (1)
  • Association Rules (2)
  • Association Rules (3)
  • Association Rules (4)
  • Association Rules (5)
  • Association Rules (6)
  • Association Rules (7)
  • Apriori Algorithm
  • Apriori algorithm (1)
  • Apriori (Tminsup) algorithm (2)
  • Candidate Generation
  • Apriori algorithm (3)
  • Association Rules (8)
  • Rule Generation

httpskemlgupcedu

Apriori Algorithm

Apriori algorithm (1)

1 Define the minsup and the minconf and eventually the rules desired2 Compute the itemsets with supp(itemset) ge minsup for n=1 hellip N-1 being N= the number of available attributes

Use a hash-table to store the itemsets Use lexicographical ordering for generating and storing the

itemsets in the hash-table Apply the filter property frequent itemsets of length L

must be formed from frequent itemsets of length L-13 For each itemset generated in the previous step generate the candidate rules from it checking that they have the specified minimum accuracy (conf(rule) ge minconf)

Generate the rules starting first with one itemset in the consequent and progress with two itemsets etc

11copy Miquel Sagravenchez i Marregrave KEMLG 2020

Apriori (Tminsup) algorithm (2)

12copy Miquel Sagravenchez i Marregrave KEMLG 2020

L1 = large 1-itemsetsfor (k = 2 Lk-1 ne empty k++) do

Ck = Candidate-generation (Lk-1) New apriori candidates generated by extending Lk-1 candidates

forall transactions t isin T doCt = c isin Ck | Ck sube t Candidates contained in tforall candidates c isin Ct do

ccount++end

endLk = c | c isin Ck and ccount ge minsup

end

return cupk Lk

Candidate Generation

Two steps join step and prune filter stepCk = emptyforall a b isin Lk-1 such that a=I1 hellip Ik-2 Ik-1 and

b=I1 hellip Ik-2 Irsquok-1 and Ik-1 lt Irsquok-1 do join k-1 large itemsets with a common prefix and one item different

in lexicographic order to not repeat itemsets

c larr I1 hellip Ik-2 Ik-1 Irsquok-1 c is the join of a and b

Ck larr Ck cup c endforforeach c such that exists | s sube c |s|=k-1 and s notin Lk-1 do

Ck larr Ck ndash c apply filter property step

endforeachreturn Ck

copy Miquel Sagravenchez i Marregrave KEMLG-IDEAI 202013

Apriorialgorithm

(3)

Example minsup = 2

14copy Miquel Sagravenchez i Marregrave KEMLG 2020

One-item sets (12) sup Two-item sets

(47) sup Three-item sets (39) sup Four-item sets

(6) sup

outlook = sunny 5 outlook = sunnytemperature = mild 2

outlook = sunnytemperature = hothumidity = high

2outlook = sunny

temperature = hothumidity = high

play = no2

outlook = overcast 4 outlook = sunnytemperature = hot 2

outlook = sunnytemperature = hot

play = no2

outlook = sunnyhumidity = highwindy = false

play = no2

outlook = rainy 5 outlook = sunnyhumidity = normal 2

outlook = sunnyhumidity = normal

play = yes2

outlook = overcasttemperature = hot

windy = falseplay = yes

2

temperature = cool 4 outlook = sunnyhumidity = high 3

outlook = sunnyhumidity = highwindy = false

2outlook = rainy

temperature = mildwindy = false

play = yes2

temperature = mild 6 outlook = sunnywindy = true 2

outlook = sunnyhumidity = high

play = no3

outlook = rainyhumidity = normal

windy = falseplay = yes

2

temperature = hot 4 outlook = sunnywindy = false 3

outlook = sunnywindy = false

play = no2

temperature = coolhumidity = normal

windy = falseplay = yes

2

humidity = normal 7 outlook = sunnyplay = yes 2

outlook = overcasttemperature = hot

windy = false2

humidity = high 7 outlook = sunnyplay = no 3

outlook = overcasttemperature = hot

play = yes2

windy = true 6 outlook = overcasttemperature = hot 2

outlook = overcasttemperature = hot

play = yes2

windy = false 8 outlook = overcasthumidity = normal 2

outlook = overcasthumidity = high

play = yes2

play = yes 9 outlook = overcasthumidity = high 2

outlook = overcastwindy = trueplay = yes

2

play = no 5 outlook = overcastwindy = true 2

outlook = overcastwindy = false

play = yes2

hellip hellip

Association Rules (8) An example dataset

15copy Miquel Sagravenchez i Marregrave KEMLG 2020

Outlook Temperature Humidity Windy Play

sunny hot high false no

sunny hot high true noovercast hot high false yesrainy mild high false yesrainy cool normal false yesrainy cool normal true noovercast cool normal true yessunny mild high false nosunny cool normal false yesrainy mild normal false yessunny mild normal true yesovercast mild high true yesovercast hot normal false yesrainy mild high true no

Rule Generation Let us take the 3-item set

L3 = humidity = normal windy = false play = yesSupp (L3) = 4

Rules generated Confidenceif humidity = normal and windy = false then play = yes 44 = 100if humidity = normal and play = yes then windy = false 46 = 666if windy = false and play = yes then humidity = normal 46 = 666if humidity = normal then windy = false and play = yes 47 = 5714if windy = false then humidity = normal and play = yes 48 = 50if play = yes then humidity = normal and windy = false 49 = 4444if empty then humidity = normal and windy = false and play = yes 414 = 2857

Rules(N-item set) = sum119894119894=1119873119873 119862119862119873119873 119894119894

In the example database with 100 confidence and minsup ge 2 there are 58 rules

16copy Miquel Sagravenchez i Marregrave KEMLG 2020

  • Association Rules
  • Associative Models
  • Association Rules (1)
  • Association Rules (2)
  • Association Rules (3)
  • Association Rules (4)
  • Association Rules (5)
  • Association Rules (6)
  • Association Rules (7)
  • Apriori Algorithm
  • Apriori algorithm (1)
  • Apriori (Tminsup) algorithm (2)
  • Candidate Generation
  • Apriori algorithm (3)
  • Association Rules (8)
  • Rule Generation

Apriori algorithm (1)

1 Define the minsup and the minconf and eventually the rules desired2 Compute the itemsets with supp(itemset) ge minsup for n=1 hellip N-1 being N= the number of available attributes

Use a hash-table to store the itemsets Use lexicographical ordering for generating and storing the

itemsets in the hash-table Apply the filter property frequent itemsets of length L

must be formed from frequent itemsets of length L-13 For each itemset generated in the previous step generate the candidate rules from it checking that they have the specified minimum accuracy (conf(rule) ge minconf)

Generate the rules starting first with one itemset in the consequent and progress with two itemsets etc

11copy Miquel Sagravenchez i Marregrave KEMLG 2020

Apriori (Tminsup) algorithm (2)

12copy Miquel Sagravenchez i Marregrave KEMLG 2020

L1 = large 1-itemsetsfor (k = 2 Lk-1 ne empty k++) do

Ck = Candidate-generation (Lk-1) New apriori candidates generated by extending Lk-1 candidates

forall transactions t isin T doCt = c isin Ck | Ck sube t Candidates contained in tforall candidates c isin Ct do

ccount++end

endLk = c | c isin Ck and ccount ge minsup

end

return cupk Lk

Candidate Generation

Two steps join step and prune filter stepCk = emptyforall a b isin Lk-1 such that a=I1 hellip Ik-2 Ik-1 and

b=I1 hellip Ik-2 Irsquok-1 and Ik-1 lt Irsquok-1 do join k-1 large itemsets with a common prefix and one item different

in lexicographic order to not repeat itemsets

c larr I1 hellip Ik-2 Ik-1 Irsquok-1 c is the join of a and b

Ck larr Ck cup c endforforeach c such that exists | s sube c |s|=k-1 and s notin Lk-1 do

Ck larr Ck ndash c apply filter property step

endforeachreturn Ck

copy Miquel Sagravenchez i Marregrave KEMLG-IDEAI 202013

Apriorialgorithm

(3)

Example minsup = 2

14copy Miquel Sagravenchez i Marregrave KEMLG 2020

One-item sets (12) sup Two-item sets

(47) sup Three-item sets (39) sup Four-item sets

(6) sup

outlook = sunny 5 outlook = sunnytemperature = mild 2

outlook = sunnytemperature = hothumidity = high

2outlook = sunny

temperature = hothumidity = high

play = no2

outlook = overcast 4 outlook = sunnytemperature = hot 2

outlook = sunnytemperature = hot

play = no2

outlook = sunnyhumidity = highwindy = false

play = no2

outlook = rainy 5 outlook = sunnyhumidity = normal 2

outlook = sunnyhumidity = normal

play = yes2

outlook = overcasttemperature = hot

windy = falseplay = yes

2

temperature = cool 4 outlook = sunnyhumidity = high 3

outlook = sunnyhumidity = highwindy = false

2outlook = rainy

temperature = mildwindy = false

play = yes2

temperature = mild 6 outlook = sunnywindy = true 2

outlook = sunnyhumidity = high

play = no3

outlook = rainyhumidity = normal

windy = falseplay = yes

2

temperature = hot 4 outlook = sunnywindy = false 3

outlook = sunnywindy = false

play = no2

temperature = coolhumidity = normal

windy = falseplay = yes

2

humidity = normal 7 outlook = sunnyplay = yes 2

outlook = overcasttemperature = hot

windy = false2

humidity = high 7 outlook = sunnyplay = no 3

outlook = overcasttemperature = hot

play = yes2

windy = true 6 outlook = overcasttemperature = hot 2

outlook = overcasttemperature = hot

play = yes2

windy = false 8 outlook = overcasthumidity = normal 2

outlook = overcasthumidity = high

play = yes2

play = yes 9 outlook = overcasthumidity = high 2

outlook = overcastwindy = trueplay = yes

2

play = no 5 outlook = overcastwindy = true 2

outlook = overcastwindy = false

play = yes2

hellip hellip

Association Rules (8) An example dataset

15copy Miquel Sagravenchez i Marregrave KEMLG 2020

Outlook Temperature Humidity Windy Play

sunny hot high false no

sunny hot high true noovercast hot high false yesrainy mild high false yesrainy cool normal false yesrainy cool normal true noovercast cool normal true yessunny mild high false nosunny cool normal false yesrainy mild normal false yessunny mild normal true yesovercast mild high true yesovercast hot normal false yesrainy mild high true no

Rule Generation Let us take the 3-item set

L3 = humidity = normal windy = false play = yesSupp (L3) = 4

Rules generated Confidenceif humidity = normal and windy = false then play = yes 44 = 100if humidity = normal and play = yes then windy = false 46 = 666if windy = false and play = yes then humidity = normal 46 = 666if humidity = normal then windy = false and play = yes 47 = 5714if windy = false then humidity = normal and play = yes 48 = 50if play = yes then humidity = normal and windy = false 49 = 4444if empty then humidity = normal and windy = false and play = yes 414 = 2857

Rules(N-item set) = sum119894119894=1119873119873 119862119862119873119873 119894119894

In the example database with 100 confidence and minsup ge 2 there are 58 rules

16copy Miquel Sagravenchez i Marregrave KEMLG 2020

  • Association Rules
  • Associative Models
  • Association Rules (1)
  • Association Rules (2)
  • Association Rules (3)
  • Association Rules (4)
  • Association Rules (5)
  • Association Rules (6)
  • Association Rules (7)
  • Apriori Algorithm
  • Apriori algorithm (1)
  • Apriori (Tminsup) algorithm (2)
  • Candidate Generation
  • Apriori algorithm (3)
  • Association Rules (8)
  • Rule Generation

Apriori (Tminsup) algorithm (2)

12copy Miquel Sagravenchez i Marregrave KEMLG 2020

L1 = large 1-itemsetsfor (k = 2 Lk-1 ne empty k++) do

Ck = Candidate-generation (Lk-1) New apriori candidates generated by extending Lk-1 candidates

forall transactions t isin T doCt = c isin Ck | Ck sube t Candidates contained in tforall candidates c isin Ct do

ccount++end

endLk = c | c isin Ck and ccount ge minsup

end

return cupk Lk

Candidate Generation

Two steps join step and prune filter stepCk = emptyforall a b isin Lk-1 such that a=I1 hellip Ik-2 Ik-1 and

b=I1 hellip Ik-2 Irsquok-1 and Ik-1 lt Irsquok-1 do join k-1 large itemsets with a common prefix and one item different

in lexicographic order to not repeat itemsets

c larr I1 hellip Ik-2 Ik-1 Irsquok-1 c is the join of a and b

Ck larr Ck cup c endforforeach c such that exists | s sube c |s|=k-1 and s notin Lk-1 do

Ck larr Ck ndash c apply filter property step

endforeachreturn Ck

copy Miquel Sagravenchez i Marregrave KEMLG-IDEAI 202013

Apriorialgorithm

(3)

Example minsup = 2

14copy Miquel Sagravenchez i Marregrave KEMLG 2020

One-item sets (12) sup Two-item sets

(47) sup Three-item sets (39) sup Four-item sets

(6) sup

outlook = sunny 5 outlook = sunnytemperature = mild 2

outlook = sunnytemperature = hothumidity = high

2outlook = sunny

temperature = hothumidity = high

play = no2

outlook = overcast 4 outlook = sunnytemperature = hot 2

outlook = sunnytemperature = hot

play = no2

outlook = sunnyhumidity = highwindy = false

play = no2

outlook = rainy 5 outlook = sunnyhumidity = normal 2

outlook = sunnyhumidity = normal

play = yes2

outlook = overcasttemperature = hot

windy = falseplay = yes

2

temperature = cool 4 outlook = sunnyhumidity = high 3

outlook = sunnyhumidity = highwindy = false

2outlook = rainy

temperature = mildwindy = false

play = yes2

temperature = mild 6 outlook = sunnywindy = true 2

outlook = sunnyhumidity = high

play = no3

outlook = rainyhumidity = normal

windy = falseplay = yes

2

temperature = hot 4 outlook = sunnywindy = false 3

outlook = sunnywindy = false

play = no2

temperature = coolhumidity = normal

windy = falseplay = yes

2

humidity = normal 7 outlook = sunnyplay = yes 2

outlook = overcasttemperature = hot

windy = false2

humidity = high 7 outlook = sunnyplay = no 3

outlook = overcasttemperature = hot

play = yes2

windy = true 6 outlook = overcasttemperature = hot 2

outlook = overcasttemperature = hot

play = yes2

windy = false 8 outlook = overcasthumidity = normal 2

outlook = overcasthumidity = high

play = yes2

play = yes 9 outlook = overcasthumidity = high 2

outlook = overcastwindy = trueplay = yes

2

play = no 5 outlook = overcastwindy = true 2

outlook = overcastwindy = false

play = yes2

hellip hellip

Association Rules (8) An example dataset

15copy Miquel Sagravenchez i Marregrave KEMLG 2020

Outlook Temperature Humidity Windy Play

sunny hot high false no

sunny hot high true noovercast hot high false yesrainy mild high false yesrainy cool normal false yesrainy cool normal true noovercast cool normal true yessunny mild high false nosunny cool normal false yesrainy mild normal false yessunny mild normal true yesovercast mild high true yesovercast hot normal false yesrainy mild high true no

Rule Generation Let us take the 3-item set

L3 = humidity = normal windy = false play = yesSupp (L3) = 4

Rules generated Confidenceif humidity = normal and windy = false then play = yes 44 = 100if humidity = normal and play = yes then windy = false 46 = 666if windy = false and play = yes then humidity = normal 46 = 666if humidity = normal then windy = false and play = yes 47 = 5714if windy = false then humidity = normal and play = yes 48 = 50if play = yes then humidity = normal and windy = false 49 = 4444if empty then humidity = normal and windy = false and play = yes 414 = 2857

Rules(N-item set) = sum119894119894=1119873119873 119862119862119873119873 119894119894

In the example database with 100 confidence and minsup ge 2 there are 58 rules

16copy Miquel Sagravenchez i Marregrave KEMLG 2020

  • Association Rules
  • Associative Models
  • Association Rules (1)
  • Association Rules (2)
  • Association Rules (3)
  • Association Rules (4)
  • Association Rules (5)
  • Association Rules (6)
  • Association Rules (7)
  • Apriori Algorithm
  • Apriori algorithm (1)
  • Apriori (Tminsup) algorithm (2)
  • Candidate Generation
  • Apriori algorithm (3)
  • Association Rules (8)
  • Rule Generation

Candidate Generation

Two steps join step and prune filter stepCk = emptyforall a b isin Lk-1 such that a=I1 hellip Ik-2 Ik-1 and

b=I1 hellip Ik-2 Irsquok-1 and Ik-1 lt Irsquok-1 do join k-1 large itemsets with a common prefix and one item different

in lexicographic order to not repeat itemsets

c larr I1 hellip Ik-2 Ik-1 Irsquok-1 c is the join of a and b

Ck larr Ck cup c endforforeach c such that exists | s sube c |s|=k-1 and s notin Lk-1 do

Ck larr Ck ndash c apply filter property step

endforeachreturn Ck

copy Miquel Sagravenchez i Marregrave KEMLG-IDEAI 202013

Apriorialgorithm

(3)

Example minsup = 2

14copy Miquel Sagravenchez i Marregrave KEMLG 2020

One-item sets (12) sup Two-item sets

(47) sup Three-item sets (39) sup Four-item sets

(6) sup

outlook = sunny 5 outlook = sunnytemperature = mild 2

outlook = sunnytemperature = hothumidity = high

2outlook = sunny

temperature = hothumidity = high

play = no2

outlook = overcast 4 outlook = sunnytemperature = hot 2

outlook = sunnytemperature = hot

play = no2

outlook = sunnyhumidity = highwindy = false

play = no2

outlook = rainy 5 outlook = sunnyhumidity = normal 2

outlook = sunnyhumidity = normal

play = yes2

outlook = overcasttemperature = hot

windy = falseplay = yes

2

temperature = cool 4 outlook = sunnyhumidity = high 3

outlook = sunnyhumidity = highwindy = false

2outlook = rainy

temperature = mildwindy = false

play = yes2

temperature = mild 6 outlook = sunnywindy = true 2

outlook = sunnyhumidity = high

play = no3

outlook = rainyhumidity = normal

windy = falseplay = yes

2

temperature = hot 4 outlook = sunnywindy = false 3

outlook = sunnywindy = false

play = no2

temperature = coolhumidity = normal

windy = falseplay = yes

2

humidity = normal 7 outlook = sunnyplay = yes 2

outlook = overcasttemperature = hot

windy = false2

humidity = high 7 outlook = sunnyplay = no 3

outlook = overcasttemperature = hot

play = yes2

windy = true 6 outlook = overcasttemperature = hot 2

outlook = overcasttemperature = hot

play = yes2

windy = false 8 outlook = overcasthumidity = normal 2

outlook = overcasthumidity = high

play = yes2

play = yes 9 outlook = overcasthumidity = high 2

outlook = overcastwindy = trueplay = yes

2

play = no 5 outlook = overcastwindy = true 2

outlook = overcastwindy = false

play = yes2

hellip hellip

Association Rules (8) An example dataset

15copy Miquel Sagravenchez i Marregrave KEMLG 2020

Outlook Temperature Humidity Windy Play

sunny hot high false no

sunny hot high true noovercast hot high false yesrainy mild high false yesrainy cool normal false yesrainy cool normal true noovercast cool normal true yessunny mild high false nosunny cool normal false yesrainy mild normal false yessunny mild normal true yesovercast mild high true yesovercast hot normal false yesrainy mild high true no

Rule Generation Let us take the 3-item set

L3 = humidity = normal windy = false play = yesSupp (L3) = 4

Rules generated Confidenceif humidity = normal and windy = false then play = yes 44 = 100if humidity = normal and play = yes then windy = false 46 = 666if windy = false and play = yes then humidity = normal 46 = 666if humidity = normal then windy = false and play = yes 47 = 5714if windy = false then humidity = normal and play = yes 48 = 50if play = yes then humidity = normal and windy = false 49 = 4444if empty then humidity = normal and windy = false and play = yes 414 = 2857

Rules(N-item set) = sum119894119894=1119873119873 119862119862119873119873 119894119894

In the example database with 100 confidence and minsup ge 2 there are 58 rules

16copy Miquel Sagravenchez i Marregrave KEMLG 2020

  • Association Rules
  • Associative Models
  • Association Rules (1)
  • Association Rules (2)
  • Association Rules (3)
  • Association Rules (4)
  • Association Rules (5)
  • Association Rules (6)
  • Association Rules (7)
  • Apriori Algorithm
  • Apriori algorithm (1)
  • Apriori (Tminsup) algorithm (2)
  • Candidate Generation
  • Apriori algorithm (3)
  • Association Rules (8)
  • Rule Generation

Apriorialgorithm

(3)

Example minsup = 2

14copy Miquel Sagravenchez i Marregrave KEMLG 2020

One-item sets (12) sup Two-item sets

(47) sup Three-item sets (39) sup Four-item sets

(6) sup

outlook = sunny 5 outlook = sunnytemperature = mild 2

outlook = sunnytemperature = hothumidity = high

2outlook = sunny

temperature = hothumidity = high

play = no2

outlook = overcast 4 outlook = sunnytemperature = hot 2

outlook = sunnytemperature = hot

play = no2

outlook = sunnyhumidity = highwindy = false

play = no2

outlook = rainy 5 outlook = sunnyhumidity = normal 2

outlook = sunnyhumidity = normal

play = yes2

outlook = overcasttemperature = hot

windy = falseplay = yes

2

temperature = cool 4 outlook = sunnyhumidity = high 3

outlook = sunnyhumidity = highwindy = false

2outlook = rainy

temperature = mildwindy = false

play = yes2

temperature = mild 6 outlook = sunnywindy = true 2

outlook = sunnyhumidity = high

play = no3

outlook = rainyhumidity = normal

windy = falseplay = yes

2

temperature = hot 4 outlook = sunnywindy = false 3

outlook = sunnywindy = false

play = no2

temperature = coolhumidity = normal

windy = falseplay = yes

2

humidity = normal 7 outlook = sunnyplay = yes 2

outlook = overcasttemperature = hot

windy = false2

humidity = high 7 outlook = sunnyplay = no 3

outlook = overcasttemperature = hot

play = yes2

windy = true 6 outlook = overcasttemperature = hot 2

outlook = overcasttemperature = hot

play = yes2

windy = false 8 outlook = overcasthumidity = normal 2

outlook = overcasthumidity = high

play = yes2

play = yes 9 outlook = overcasthumidity = high 2

outlook = overcastwindy = trueplay = yes

2

play = no 5 outlook = overcastwindy = true 2

outlook = overcastwindy = false

play = yes2

hellip hellip

Association Rules (8) An example dataset

15copy Miquel Sagravenchez i Marregrave KEMLG 2020

Outlook Temperature Humidity Windy Play

sunny hot high false no

sunny hot high true noovercast hot high false yesrainy mild high false yesrainy cool normal false yesrainy cool normal true noovercast cool normal true yessunny mild high false nosunny cool normal false yesrainy mild normal false yessunny mild normal true yesovercast mild high true yesovercast hot normal false yesrainy mild high true no

Rule Generation Let us take the 3-item set

L3 = humidity = normal windy = false play = yesSupp (L3) = 4

Rules generated Confidenceif humidity = normal and windy = false then play = yes 44 = 100if humidity = normal and play = yes then windy = false 46 = 666if windy = false and play = yes then humidity = normal 46 = 666if humidity = normal then windy = false and play = yes 47 = 5714if windy = false then humidity = normal and play = yes 48 = 50if play = yes then humidity = normal and windy = false 49 = 4444if empty then humidity = normal and windy = false and play = yes 414 = 2857

Rules(N-item set) = sum119894119894=1119873119873 119862119862119873119873 119894119894

In the example database with 100 confidence and minsup ge 2 there are 58 rules

16copy Miquel Sagravenchez i Marregrave KEMLG 2020

  • Association Rules
  • Associative Models
  • Association Rules (1)
  • Association Rules (2)
  • Association Rules (3)
  • Association Rules (4)
  • Association Rules (5)
  • Association Rules (6)
  • Association Rules (7)
  • Apriori Algorithm
  • Apriori algorithm (1)
  • Apriori (Tminsup) algorithm (2)
  • Candidate Generation
  • Apriori algorithm (3)
  • Association Rules (8)
  • Rule Generation

Association Rules (8) An example dataset

15copy Miquel Sagravenchez i Marregrave KEMLG 2020

Outlook Temperature Humidity Windy Play

sunny hot high false no

sunny hot high true noovercast hot high false yesrainy mild high false yesrainy cool normal false yesrainy cool normal true noovercast cool normal true yessunny mild high false nosunny cool normal false yesrainy mild normal false yessunny mild normal true yesovercast mild high true yesovercast hot normal false yesrainy mild high true no
