
Post on 21-Jul-2020


Page 1: Association) Analysis - birch.iaa.ncsu.edubirch.iaa.ncsu.edu/~slrace/DataMining2019/Slides/Association.pdf · Association)Analysis! Unsupervised: No target outcome for training. Searching

Association Analysis

a.k.a. Market-Basket Analysis

Affinity Analysis


The Famous Example

➢ June 1992 study by NCR (now Teradata) for Osco Drug searched for product associations
➢ Beer and Diapers likely to be purchased together
➢ (and many others, including Fruit Juice and Cough Syrup)


Association Analysis

➢ Unsupervised: no target outcome for training. Searching for patterns in the data.
➢ Association analysis gives us sets of products that are likely to be purchased together.
➢ Used in retail for coupon marketing, targeted upselling, and product placement.
➢ Flexible analytics tool that can be used in many situations!


Association Rules

➢ Data comes in the form of transactions:

Transaction ID   Items
10001            Bread, Juice
10002            Diapers, Beer, Eggs, Formula
10003            Milk, Diapers, Bread, Formula
10004            Juice, Diapers, Milk, Eggs
10005            Soda, Milk, Eggs, Bread
10006            Formula, Juice, Diapers, Bread

➢ Question: are there any relationships between items that might be hiding in these transactions?
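As a rough illustration of how such transaction data might be held in code, the sketch below stores the six transactions from the slide as a mapping from transaction ID to a set of items, then counts how often each item and each pair of items appears. The variable names (`transactions`, `item_counts`, `pair_counts`) are illustrative choices, not part of the original slides.

```python
from collections import Counter
from itertools import combinations

# The six transactions from the slide: transaction ID -> set of items.
transactions = {
    10001: {"Bread", "Juice"},
    10002: {"Diapers", "Beer", "Eggs", "Formula"},
    10003: {"Milk", "Diapers", "Bread", "Formula"},
    10004: {"Juice", "Diapers", "Milk", "Eggs"},
    10005: {"Soda", "Milk", "Eggs", "Bread"},
    10006: {"Formula", "Juice", "Diapers", "Bread"},
}

# Number of transactions that contain each single item.
item_counts = Counter(
    item for items in transactions.values() for item in items
)

# Number of transactions that contain each unordered pair of items.
# Sorting makes each pair appear in one canonical (alphabetical) order.
pair_counts = Counter(
    pair
    for items in transactions.values()
    for pair in combinations(sorted(items), 2)
)

print(item_counts["Diapers"])               # transactions containing Diapers
print(pair_counts[("Diapers", "Formula")])  # transactions containing both
```

Counting pairs this way is only practical for small item catalogs, but it already hints at which items tend to appear together.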


➢ For example: Diapers → Formula?


An Association Rule

Diapers → Formula
(Antecedent: Diapers; Consequent: Formula)

Interpretation: Someone who buys diapers is also likely to (simultaneously) buy formula.


Quantifying Association Rules

➢ The strength of an association rule A → B is quantified using three statistics:
    ➢ Support: $P(A \cap B) = P(A \text{ and } B)$
        ➢ Measures how often we find instances of this rule in the training data.
    ➢ Confidence: $P(B \mid A) = \dfrac{P(A \cap B)}{P(A)}$
        ➢ Measures what percent of transactions containing A also contain B.
    ➢ Lift: $\dfrac{P(B \mid A)}{P(B)} = \dfrac{P(A \cap B)}{P(A)\,P(B)}$
        ➢ Measures how much more likely we are to buy B given that we also buy A than we are to buy B at random.
        ➢ Want lift values greater than 1!
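To see why a lift of 1 is the natural threshold, the definition can be unpacked as follows. This is a short worked note that assumes $P(A) > 0$ and $P(B) > 0$; it is not part of the original slides.

```latex
\begin{align*}
\operatorname{lift}(A \to B) &= \frac{P(B \mid A)}{P(B)} = \frac{P(A \cap B)}{P(A)\,P(B)} \\
\operatorname{lift}(A \to B) = 1 &\iff P(A \cap B) = P(A)\,P(B) \quad \text{($A$ and $B$ are independent)} \\
\operatorname{lift}(A \to B) > 1 &\iff P(B \mid A) > P(B) \quad \text{(seeing $A$ makes $B$ more likely)} \\
\operatorname{lift}(A \to B) < 1 &\iff P(B \mid A) < P(B) \quad \text{(seeing $A$ makes $B$ less likely)}
\end{align*}
```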


Quantifying Association Rules

Diapers → Formula?

Support: How often does this “rule” present itself? P(Diapers and Formula)

In 50% of the transactions (3 of 6: transactions 10002, 10003, and 10006)


Quantifying Association Rules

Diapers → Formula?

Confidence: Given that one buys Diapers, what is the chance they also buy Formula? P(Formula | Diapers)

75% of transactions containing Diapers also contained Formula (3 of 4)


Quantifying Association Rules

Diapers → Formula?

Lift: How much more likely are we to see Formula with Diapers than to see Formula overall? P(F|D)/P(F)

We are 0.75/0.5 = 1.5 times more likely to see a Formula purchase with Diapers than we are to see one at random.
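The three numbers for Diapers → Formula can be checked with a few lines of Python. This is a minimal sketch using the six transactions from the slides. The function names `support`, `confidence`, and `lift` are illustrative, and `Fraction` is used only so the arithmetic stays exact.

```python
from fractions import Fraction

# The six transactions from the slides.
transactions = [
    {"Bread", "Juice"},
    {"Diapers", "Beer", "Eggs", "Formula"},
    {"Milk", "Diapers", "Bread", "Formula"},
    {"Juice", "Diapers", "Milk", "Eggs"},
    {"Soda", "Milk", "Eggs", "Bread"},
    {"Formula", "Juice", "Diapers", "Bread"},
]

def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    hits = sum(1 for t in transactions if itemset <= t)
    return Fraction(hits, len(transactions))

def confidence(antecedent, consequent, transactions):
    """P(consequent | antecedent) = P(A and B) / P(A).

    Assumes the antecedent occurs in at least one transaction.
    """
    both = set(antecedent) | set(consequent)
    return support(both, transactions) / support(antecedent, transactions)

def lift(antecedent, consequent, transactions):
    """P(B | A) / P(B). Assumes the consequent occurs at least once."""
    return (confidence(antecedent, consequent, transactions)
            / support(consequent, transactions))

A, B = {"Diapers"}, {"Formula"}
print("support   :", float(support(A | B, transactions)))
print("confidence:", float(confidence(A, B, transactions)))
print("lift      :", float(lift(A, B, transactions)))
```

The printed values should match the 50%, 75%, and 1.5 worked out on the slides.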


Some Post-Hoc Takeaways

Product A → Product B

➢ Product B as consequent: Determine what can be done to boost its sales.
    ➢ product placement
    ➢ upselling
    ➢ coupons for related products
➢ Product A as antecedent: Determine what other products would be affected by changes to Product A.
    ➢ discontinuations
    ➢ price changes


Direction of Association Rules

A → B vs. B → A

Same Support
Same Lift
Different Confidence

We DO NOT say “those who buy A will THEN buy B.”
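The symmetry can be seen on the toy transactions from the earlier slides. The sketch below computes all three statistics for Diapers → Formula and for Formula → Diapers. The helper names `prob` and `rule_stats` are hypothetical, chosen only for this illustration.

```python
from fractions import Fraction

# The six transactions from the earlier slides.
transactions = [
    {"Bread", "Juice"},
    {"Diapers", "Beer", "Eggs", "Formula"},
    {"Milk", "Diapers", "Bread", "Formula"},
    {"Juice", "Diapers", "Milk", "Eggs"},
    {"Soda", "Milk", "Eggs", "Bread"},
    {"Formula", "Juice", "Diapers", "Bread"},
]

def prob(items):
    """Fraction of transactions containing every item in `items`."""
    hits = sum(1 for t in transactions if items <= t)
    return Fraction(hits, len(transactions))

def rule_stats(antecedent, consequent):
    """Support, confidence, and lift for the rule antecedent -> consequent."""
    p_a = prob(antecedent)
    p_b = prob(consequent)
    p_ab = prob(antecedent | consequent)
    return {
        "support": p_ab,                 # symmetric in A and B
        "confidence": p_ab / p_a,        # depends on which side is A
        "lift": p_ab / (p_a * p_b),      # symmetric in A and B
    }

forward = rule_stats({"Diapers"}, {"Formula"})
backward = rule_stats({"Formula"}, {"Diapers"})
print("Diapers -> Formula:", forward)
print("Formula -> Diapers:", backward)
```

Support and lift only involve P(A and B) and the product P(A)P(B), so swapping A and B leaves them unchanged. Confidence divides by P(A) alone, which is why it differs between the two directions.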


Finding Association Rules

➢ Most algorithms have two parts:
    ➢ Itemset generation – find all sets of items that satisfy some minimum support.
    ➢ Rule generation – determine which rules formed from the itemsets generated in step 1 satisfy some minimum confidence.
    ➢ For more details of each part, see the text by Tan, Steinbach, and Kumar.
➢ To perform this analysis in SAS Enterprise Miner (SAS EM), you must input a transaction dataset.
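The two-part structure can be sketched in plain Python on the toy transactions from the earlier slides. This is a brute-force enumeration written for illustration only. It is not the algorithm used by SAS EM or any particular library, and the function names and thresholds (`frequent_itemsets`, `generate_rules`, 0.5, 0.7) are assumptions made for the example.

```python
from fractions import Fraction
from itertools import combinations

# The six transactions from the earlier slides.
transactions = [
    {"Bread", "Juice"},
    {"Diapers", "Beer", "Eggs", "Formula"},
    {"Milk", "Diapers", "Bread", "Formula"},
    {"Juice", "Diapers", "Milk", "Eggs"},
    {"Soda", "Milk", "Eggs", "Bread"},
    {"Formula", "Juice", "Diapers", "Bread"},
]

def support(itemset):
    """Fraction of transactions containing every item in `itemset`."""
    hits = sum(1 for t in transactions if itemset <= t)
    return Fraction(hits, len(transactions))

def frequent_itemsets(min_support):
    """Part 1: enumerate itemsets whose support meets `min_support`."""
    items = sorted(set().union(*transactions))
    frequent = {}
    for size in range(1, len(items) + 1):
        found_any = False
        for combo in combinations(items, size):
            itemset = frozenset(combo)
            s = support(itemset)
            if s >= min_support:
                frequent[itemset] = s
                found_any = True
        if not found_any:
            # Adding items can never raise support, so if no itemset of
            # this size is frequent, no larger itemset can be either.
            break
    return frequent

def generate_rules(frequent, min_confidence):
    """Part 2: split each frequent itemset into antecedent -> consequent."""
    rules = []
    for itemset, s in frequent.items():
        for k in range(1, len(itemset)):
            for combo in combinations(sorted(itemset), k):
                antecedent = frozenset(combo)
                consequent = itemset - antecedent
                # Every subset of a frequent itemset is itself frequent,
                # so its support is already stored in `frequent`.
                conf = s / frequent[antecedent]
                if conf >= min_confidence:
                    rules.append((antecedent, consequent, s, conf))
    return rules

frequent = frequent_itemsets(min_support=0.5)
rules = generate_rules(frequent, min_confidence=0.7)
for antecedent, consequent, s, conf in rules:
    print(sorted(antecedent), "->", sorted(consequent),
          "support", float(s), "confidence", float(conf))
```

Checking every combination of items is only feasible for tiny catalogs like this one. Practical algorithms avoid most of that work by pruning candidates early, using the same observation noted in the comment: an itemset cannot be frequent unless all of its subsets are.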
