Association Rule (AR)

Upload: dangngan118

Post on 04-Jun-2018


  • 8/14/2019 Association Rule(AR)


    Introduction to Association Rules

    Albert Orriols i Puig

    aorriols@salle.url.edu

    Artificial Intelligence Machine Learning

    Enginyeria i Arquitectura La Salle

    Universitat Ramon Llull


    Recap of Lecture 5-12

    Let's start with data


    Data set → classification model: how?

    We have seen four different types of approaches to classification:

    Decision trees (C4.5)

    -

    Bayesian classifiers (Naïve Bayes)

    Neural networks (Perceptron, Adaline, Madaline, ...)


    Today's Agenda

    Introduction to Association Rules

    A Taxonomy of Association Rules

    Measures of Interest

    Apriori


    Introduction to AR

    Ideas come from market basket analysis (MBA)

    Let's go shopping!

    Customer1: milk, eggs, sugar, bread
    Customer2: milk, eggs, cereal, bread
    Customer3: eggs, sugar

    What do my customers buy? Which products are bought together?

    Aim: Find associations and correlations between the different items that customers place in their shopping baskets


    Introduction to AR

    Formalizing the problem a little bit

    Transaction database T: a set of transactions T = {t1, t2, ..., tn}

    Each transaction contains a set of items (itemset)

    An itemset is a collection of items I = {i1, i2, ..., im}

    Find frequent/interesting patterns, associations, or correlations in databases or other information repositories.

    Association rule: X ⇒ Y


    Example of AR

    TID   Items
    T1    bread, jelly, peanut-butter
    T2    bread, peanut-butter
    T3    bread, milk, peanut-butter
    T4    beer, bread
    T5    beer, milk

    Examples:
    bread ⇒ peanut-butter
    beer ⇒ bread

    Frequent itemsets: items that frequently appear together
    I = {bread, peanut-butter}
    I = {beer, bread}


    What's an Interesting Rule?

    Support count (σ)
    Frequency of occurrence of an itemset
    σ({bread, peanut-butter}) = 3
    σ({beer, bread}) = 1

    Support (s)
    Fraction of transactions that contain an itemset
    s({bread, peanut-butter}) = 3/5
    s({beer, bread}) = 1/5

    Frequent itemset
    An itemset whose support is greater than or equal to a minimum support threshold (minsup)
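The three definitions on this slide can be checked against the five example transactions with a minimal Python sketch. The function names (`support_count`, `support`, `is_frequent`) and the dictionary layout are illustrative choices, not something the slides prescribe.

```python
# Toy transaction database from the slides (TID -> set of items).
transactions = {
    "T1": {"bread", "jelly", "peanut-butter"},
    "T2": {"bread", "peanut-butter"},
    "T3": {"bread", "milk", "peanut-butter"},
    "T4": {"beer", "bread"},
    "T5": {"beer", "milk"},
}


def support_count(itemset, db):
    """sigma(itemset): number of transactions containing every item."""
    items = set(itemset)
    return sum(1 for basket in db.values() if items <= basket)


def support(itemset, db):
    """s(itemset): fraction of transactions containing the itemset."""
    return support_count(itemset, db) / len(db)


def is_frequent(itemset, db, minsup):
    """An itemset is frequent when its support is at least minsup."""
    return support(itemset, db) >= minsup


print(support_count({"bread", "peanut-butter"}, transactions))  # 3
print(support({"beer", "bread"}, transactions))                 # 0.2
```

With a hypothetical threshold of minsup = 0.6, `{bread, peanut-butter}` is frequent and `{beer, bread}` is not.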


    What's an Interesting Rule?

    An association rule is an implication between two itemsets: X ⇒ Y

    Many measures of interest. The two most used are:

    Support (s)
    The occurring frequency of the rule, i.e., the fraction of transactions that contain both X and Y

    $$s = \frac{\sigma(X \cup Y)}{\#\text{ of trans.}}$$

    Confidence (c)
    The strength of the association, i.e., a measure of how often items in Y appear in transactions that contain X

    $$c = \frac{\sigma(X \cup Y)}{\sigma(X)}$$


    Interestingness of Rules

    Rule                        s      c
    bread ⇒ peanut-butter       0.60   0.75
    peanut-butter ⇒ bread       0.60   1.00
    beer ⇒ bread                0.20   0.50
    peanut-butter ⇒ jelly       0.20   0.33
    jelly ⇒ peanut-butter       0.20   1.00
    jelly ⇒ milk                0.00   0.00

    Many other interestingness measures and approaches exist
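The support and confidence values for these rules can be recomputed from the five transactions. The sketch below is one straightforward way to do it; the helper names `sigma`, `rule_support` and `rule_confidence` are assumptions made for illustration.

```python
transactions = [
    {"bread", "jelly", "peanut-butter"},  # T1
    {"bread", "peanut-butter"},           # T2
    {"bread", "milk", "peanut-butter"},   # T3
    {"beer", "bread"},                    # T4
    {"beer", "milk"},                     # T5
]


def sigma(itemset, db):
    """Support count: transactions that contain every item in itemset."""
    return sum(1 for t in db if itemset <= t)


def rule_support(x, y, db):
    """s(X => Y) = sigma(X u Y) / number of transactions."""
    return sigma(x | y, db) / len(db)


def rule_confidence(x, y, db):
    """c(X => Y) = sigma(X u Y) / sigma(X); assumes sigma(X) > 0."""
    return sigma(x | y, db) / sigma(x, db)


rules = [
    ({"bread"}, {"peanut-butter"}),
    ({"peanut-butter"}, {"bread"}),
    ({"beer"}, {"bread"}),
    ({"peanut-butter"}, {"jelly"}),
    ({"jelly"}, {"peanut-butter"}),
    ({"jelly"}, {"milk"}),
]
for x, y in rules:
    s = rule_support(x, y, transactions)
    c = rule_confidence(x, y, transactions)
    print(f"{sorted(x)} => {sorted(y)}  s={s:.2f}  c={c:.2f}")
```

Note that a rule and its reverse share the same support but usually differ in confidence, as bread ⇒ peanut-butter (0.75) and peanut-butter ⇒ bread (1.00) show.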


    Types of AR

    Binary association rules:
    bread ⇒ peanut-butter

    Quantitative association rules:
    weight in [70kg - 90kg] ⇒ height in [170cm - 190cm]

    Fuzzy association rules:
    weight in HEAVY ⇒ height in TALL

    Let's start from the beginning

    Binary association rules: Apriori


    Apriori

    This is the most influential AR miner

    It consists of two steps

    1. Generate all frequent itemsets whose support ≥ minsup
    2. Use frequent itemsets to generate association rules


    Apriori

    null
    A  B  C  D  E
    AB  AC  AD  AE  BC  BD  BE  CD  CE  DE
    ABC  ABD  ABE  ACD  ACE  ADE  BCD  BCE  BDE  CDE
    ABCD  ABCE  ABDE  ACDE  BCDE
    ABCDE

    Do I have to generate them all?
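To get a feel for how quickly the lattice grows, the following sketch enumerates every itemset over the five items with `itertools.combinations`. The variable names are illustrative only.

```python
from itertools import combinations

items = ["A", "B", "C", "D", "E"]

# Level k of the lattice holds every itemset with exactly k items.
lattice = {
    k: ["".join(c) for c in combinations(items, k)]
    for k in range(len(items) + 1)
}

for k, level in lattice.items():
    print(k, level if k else ["null"])

total = sum(len(level) for level in lattice.values())
print(total)  # 2**5 = 32 itemsets, including the empty one
```

With d items there are 2^d itemsets, so exhaustive generation is only feasible for very small d, which is what motivates the question on the slide.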


    Apriori

    Let's avoid expanding all the graph

    Key idea:

    Downward closure property: any subset of a frequent itemset is also a frequent itemset

    Equivalently, if an itemset is infrequent, none of its supersets can be frequent

    Therefore, the algorithm iteratively creates itemsets, extending only those that are frequent


    Example Itemset Generation

    [Figure: the itemset lattice over {A, B, C, D, E}, from null down to ABCDE, with one itemset marked "Infrequent"; by downward closure, the supersets of that itemset need not be generated.]

    Do I have to generate them all?


    Recovering the Example

    TID   Items
    T1    bread, jelly, peanut-butter
    T2    bread, peanut-butter
    T3    bread, milk, peanut-butter
    T4    beer, bread
    T5    beer, milk

    Minimum support count = 3

    1-itemsets
    Item             Count
    bread            4
    peanut-butter    3
    jelly            1
    milk             2
    beer             2

    2-itemsets
    Item                     Count
    bread, peanut-butter     3


    Apriori Algorithm

    k = 1

    Generate frequent itemsets of length 1

    Repeat until no frequent itemsets are found
        k := k + 1
        Generate itemsets of size k from the frequent itemsets of size k-1


    Apriori Algorithm

    Algorithm Apriori(T)
        C_1 ← init-pass(T);
        F_1 ← {f | f ∈ C_1, f.count/n ≥ minsup};    // n: no. of transactions in T
        for (k = 2; F_{k-1} ≠ ∅; k++) do
            C_k ← candidate-gen(F_{k-1});
            for each transaction t ∈ T do
                for each candidate c ∈ C_k do
                    if c is contained in t then
                        c.count++;
                end
            end
            F_k ← {c ∈ C_k | c.count/n ≥ minsup}
        end
        return F ← ∪_k F_k;


    Apriori Algorithm

    Function candidate-gen(F_{k-1})
        C_k ← ∅;
        forall f1, f2 ∈ F_{k-1}
            with f1 = {i_1, ..., i_{k-2}, i_{k-1}}
            and f2 = {i_1, ..., i_{k-2}, i'_{k-1}}
            and i_{k-1} < i'_{k-1} do
            c ← {i_1, ..., i_{k-1}, i'_{k-1}};    // join f1 and f2
            C_k ← C_k ∪ {c};
            for each (k-1)-subset s of c do
                if (s ∉ F_{k-1}) then
                    delete c from C_k;    // prune
            end
        end
        return C_k;


    Example of Apriori Run

    Database
    Tid   Items
    10    A, C, D
    20    B, C, E
    30    A, B, C, E
    40    B, E

    1st scan: C1
    Itemset   sup
    {A}       2
    {B}       3
    {C}       3
    {D}       1
    {E}       3

    L1
    Itemset   sup
    {A}       2
    {B}       3
    {C}       3
    {E}       3

    2nd scan: C2
    Itemset   sup
    {A, B}    1
    {A, C}    2
    {A, E}    1
    {B, C}    2
    {B, E}    3
    {C, E}    2

    L2
    Itemset   sup
    {A, C}    2
    {B, C}    2
    {B, E}    3
    {C, E}    2

    C3
    Itemset
    {B, C, E}

    3rd scan: L3
    Itemset     sup
    {B, C, E}   2
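The Apriori and candidate-gen pseudocode, together with the run above, can be reproduced with a compact Python sketch. This is an illustrative translation rather than a reference implementation: itemsets are stored as sorted tuples, `minsup` is a fraction as in the pseudocode, and the run is assumed to use minsup = 0.5 (2 of 4 transactions) with the first transaction taken as {A, C, D}, which is what the listed support counts imply.

```python
from itertools import combinations


def candidate_gen(prev_frequent, k):
    """Join (k-1)-itemsets that share their first k-2 items, then prune."""
    prev = sorted(tuple(sorted(f)) for f in prev_frequent)
    prev_set = set(prev)
    candidates = set()
    for i, f1 in enumerate(prev):
        for f2 in prev[i + 1:]:
            if f1[:-1] == f2[:-1]:  # same prefix; f1[-1] < f2[-1] by sorting
                c = f1 + (f2[-1],)
                # Prune: every (k-1)-subset of c must itself be frequent.
                if all(s in prev_set for s in combinations(c, k - 1)):
                    candidates.add(c)
    return candidates


def apriori(transactions, minsup):
    """Return {itemset (sorted tuple): count} for all frequent itemsets."""
    n = len(transactions)
    baskets = [frozenset(t) for t in transactions]
    counts = {}
    for t in baskets:  # init-pass: count single items
        for item in t:
            counts[(item,)] = counts.get((item,), 0) + 1
    frequent = {c: v for c, v in counts.items() if v / n >= minsup}
    result = dict(frequent)
    k = 2
    while frequent:
        candidates = candidate_gen(frequent.keys(), k)
        counts = {c: 0 for c in candidates}
        for t in baskets:  # one scan of the database per level
            for c in candidates:
                if t.issuperset(c):
                    counts[c] += 1
        frequent = {c: v for c, v in counts.items() if v / n >= minsup}
        result.update(frequent)
        k += 1
    return result


db = [{"A", "C", "D"}, {"B", "C", "E"}, {"A", "B", "C", "E"}, {"B", "E"}]
for itemset, count in sorted(apriori(db, minsup=0.5).items(),
                             key=lambda kv: (len(kv[0]), kv[0])):
    print(itemset, count)
```

The output lists the same nine frequent itemsets as L1, L2 and L3 in the run, ending with `('B', 'C', 'E') 2`.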


    Apriori

    Remember that Apriori consists of two steps

    1. Generate all frequent itemsets whose support ≥ minsup
    2. Use frequent itemsets to generate association rules

    We accomplished step 1, so we have all frequent itemsets

    So, let's pay attention to the second step


    Rule Generation in Apriori

    Given a frequent itemset L

    Find all non-empty proper subsets F of L such that the association rule F ⇒ {L-F} satisfies the minimum confidence

    Create the rule F ⇒ {L-F}

    If L = {A,B,C}, the candidate rules are: AB ⇒ C, AC ⇒ B, BC ⇒ A, A ⇒ BC, B ⇒ AC, C ⇒ AB

    In general, there are $2^k - 2$ candidate rules, where k is the length of the itemset L
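A brute-force version of this step can be sketched as follows. The function name `generate_rules`, the `support_count` dictionary layout and the threshold `minconf=0.7` are assumptions for illustration; the counts are those of the frequent itemset {B, C, E} and its subsets from the Apriori run.

```python
from itertools import combinations


def generate_rules(itemset, support_count, minconf):
    """Yield (antecedent, consequent, confidence) for rules F => L - F.

    support_count maps frozenset -> count and must contain L and all
    of its non-empty subsets (Apriori's first step provides these).
    """
    full = frozenset(itemset)
    for size in range(1, len(full)):  # non-empty proper subsets only
        for antecedent in combinations(sorted(full), size):
            f = frozenset(antecedent)
            confidence = support_count[full] / support_count[f]
            if confidence >= minconf:
                yield f, full - f, confidence


# Support counts for L = {B, C, E} taken from the Apriori run example.
counts = {
    frozenset("B"): 3, frozenset("C"): 3, frozenset("E"): 3,
    frozenset("BC"): 2, frozenset("BE"): 3, frozenset("CE"): 2,
    frozenset("BCE"): 2,
}
for f, rest, conf in generate_rules("BCE", counts, minconf=0.7):
    print(sorted(f), "=>", sorted(rest), round(conf, 2))
```

Setting `minconf=0` returns all 2^3 - 2 = 6 candidate rules, which matches the count given on the slide.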


    Can you Be More Efficient?

    Can we apply the same trick used with support?

    Confidence does not have the anti-monotone property in general

    That is, is c(AB ⇒ D) > c(A ⇒ D)? Don't know!

    But the confidence of rules generated from the same itemset does have the anti-monotone property

    L = {A,B,C,D}

    $c(ABC \Rightarrow D) \ge c(AB \Rightarrow CD) \ge c(A \Rightarrow BCD)$

    We can apply this property to prune the rule generation
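The chain of inequalities follows directly from the definition of confidence. Every rule built from L = {A,B,C,D} has the same numerator σ(L), and removing items from the antecedent can only keep or raise the antecedent's support count. Writing σ(ABCD) as shorthand for σ({A,B,C,D}):

$$
c(ABC \Rightarrow D) = \frac{\sigma(ABCD)}{\sigma(ABC)} \;\ge\; \frac{\sigma(ABCD)}{\sigma(AB)} = c(AB \Rightarrow CD) \;\ge\; \frac{\sigma(ABCD)}{\sigma(A)} = c(A \Rightarrow BCD), \qquad \text{because } \sigma(ABC) \le \sigma(AB) \le \sigma(A).
$$

No such argument exists for rules from different itemsets, because their numerators differ.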


    Example of Efficient Rule Generation

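    ABCD ⇒ {}

    ABC ⇒ D    ABD ⇒ C    ACD ⇒ B    BCD ⇒ A

    AB ⇒ CD    AC ⇒ BD    BC ⇒ AD    BD ⇒ AC    AD ⇒ BC    CD ⇒ AB

    A ⇒ BCD    B ⇒ ACD    C ⇒ ABD    D ⇒ ABC

    [Figure: lattice of the rules generated from ABCD; a rule marked "Low confidence" prunes every rule below it whose antecedent is a subset of its antecedent.]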
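The pruning idea in this lattice can be sketched as a level-wise search over antecedents. This is an illustrative sketch, not the procedure from the slides: `pruned_rules`, the `sigma` callable and `minconf=0.7` are assumed names and values, and the database is the one from the Apriori run with its first transaction taken as {A, C, D}. The itemset passed in is assumed to be frequent, so no support count is zero.

```python
def pruned_rules(itemset, sigma, minconf):
    """Generate confident rules F => L - F from one frequent itemset L.

    Antecedents shrink level by level. A smaller antecedent is tried only
    if every antecedent one item larger passed minconf, because a failing
    antecedent makes all of its subsets fail as well.
    """
    full = frozenset(itemset)
    rules = []
    # Start with the antecedents of size |L| - 1.
    level = {full - {i} for i in full if len(full) > 1}
    while level:
        survivors = set()
        for f in level:
            conf = sigma(full) / sigma(f)
            if conf >= minconf:
                rules.append((f, full - f, conf))
                survivors.add(f)
        # Drop one more item from each surviving antecedent.
        candidates = {f - {i} for f in survivors for i in f if len(f) > 1}
        # Keep a candidate only if all of its one-item-larger parents survived.
        level = {c for c in candidates
                 if all((c | {i}) in survivors for i in full - c)}
    return rules


db = [{"A", "C", "D"}, {"B", "C", "E"}, {"A", "B", "C", "E"}, {"B", "E"}]


def sigma(items):
    """Support count of items in the toy database."""
    return sum(1 for t in db if items <= t)


for f, rest, conf in pruned_rules("BCE", sigma, minconf=0.7):
    print(sorted(f), "=>", sorted(rest), round(conf, 2))
```

In this run BE ⇒ C falls below the threshold, so B and E are never evaluated as antecedents; only C is, and it fails too.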


    Challenges in AR Mining

    Challenges

    Apriori scans the database multiple times

    Most often, there is a high number of candidates

    Support counting for candidates can be time expensive

    Possible improvements

    Reduce the number of scans of the database

    Shrink the number of candidates

    Count the support of candidates more efficiently


    Next Class

    Advanced topics in association rule mining
