MAJOR FUNCTION: Association

Upload: alexys-fowles

Post on 14-Dec-2015


What Is Association Mining?

• Association rule mining:
  – Finding frequent patterns, associations, correlations, or causal structures among sets of items or objects in transaction databases, relational databases, and other information repositories.
• Applications:
  – Basket data analysis, cross-marketing, catalog design, loss-leader analysis, clustering, classification, etc.
• Examples:
  – Rule form: "Body → Head [support, confidence]".
  – buys(x, "diapers") → buys(x, "beers") [0.5%, 60%]
• The association task in data mining is to find attributes that occur together in the same transaction.

Rule Measures: Support and Confidence

• Find all the rules X & Y → Z with minimum confidence and support
  – support, s: probability that a transaction contains {X ∪ Y ∪ Z}
  – confidence, c: conditional probability that a transaction having {X ∪ Y} also contains Z

Transaction ID  Items Bought
2000  A, B, C
1000  A, C
4000  A, D
5000  B, E, F

With minimum support 50% and minimum confidence 50%, we have:

A → C (50%, 66.6%)
C → A (50%, 100%)

[Figure: overlap between customers who buy diapers and customers who buy beer; the intersection is customers who buy both]
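The two measures can be checked directly against the four-transaction table above. The following is a minimal sketch, not a reference implementation; the function names `support` and `confidence` and the dictionary layout are illustrative choices, not part of the original slides.

```python
# Transactions from the table above, keyed by transaction ID.
transactions = {
    2000: {"A", "B", "C"},
    1000: {"A", "C"},
    4000: {"A", "D"},
    5000: {"B", "E", "F"},
}


def support(itemset, db):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    hits = sum(1 for items in db.values() if itemset <= items)
    return hits / len(db)


def confidence(body, head, db):
    """Conditional probability that a transaction containing `body` also contains `head`."""
    return support(set(body) | set(head), db) / support(body, db)


# Evaluate the two rules quoted on the slide: A -> C and C -> A.
for body, head in [({"A"}, {"C"}), ({"C"}, {"A"})]:
    s = support(body | head, transactions)
    c = confidence(body, head, transactions)
    print(f"{sorted(body)} -> {sorted(head)}: support={s:.1%}, confidence={c:.1%}")
```

Both rules share the same support because support depends only on the combined itemset {A, C}; the confidences differ because the denominators (transactions containing A versus transactions containing C) differ.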

Association Rule Mining

• Given a set of transactions, find rules that will predict the occurrence of an item based on the occurrences of other items in the transaction

Market-Basket transactions

TID Items

1 Bread, Milk

2 Bread, Diaper, Beer, Eggs

3 Milk, Diaper, Beer, Coke

4 Bread, Milk, Diaper, Beer

5 Bread, Milk, Diaper, Coke

Example of Association Rules

{Diaper} → {Beer}
{Milk, Bread} → {Eggs, Coke}
{Beer, Bread} → {Milk}

Definition: Frequent Itemset

• Itemset
  – A collection of one or more items
  – Example: {Milk, Bread, Diaper}
  – k-itemset: an itemset that contains k items
• Support count (σ)
  – Frequency of occurrence of an itemset
  – E.g. σ({Milk, Bread, Diaper}) = 2
• Support
  – Fraction of transactions that contain an itemset
  – E.g. s({Milk, Bread, Diaper}) = 2/5
• Frequent Itemset
  – An itemset whose support is greater than or equal to a minsup threshold
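The definitions above map onto a few lines of code. This is a small sketch over the five market-basket transactions; the helper names and the `minsup=0.4` value are assumptions chosen for illustration, since the slides do not fix a threshold here.

```python
# The five market-basket transactions (TID 1 to 5).
baskets = [
    {"Bread", "Milk"},
    {"Bread", "Diaper", "Beer", "Eggs"},
    {"Milk", "Diaper", "Beer", "Coke"},
    {"Bread", "Milk", "Diaper", "Beer"},
    {"Bread", "Milk", "Diaper", "Coke"},
]


def support_count(itemset, db):
    """sigma(itemset): number of transactions that contain the whole itemset."""
    itemset = set(itemset)
    return sum(1 for basket in db if itemset <= basket)


def support(itemset, db):
    """s(itemset): fraction of transactions that contain the whole itemset."""
    return support_count(itemset, db) / len(db)


def is_frequent(itemset, db, minsup):
    """An itemset is frequent when its support reaches the minsup threshold."""
    return support(itemset, db) >= minsup


example = {"Milk", "Bread", "Diaper"}
print(support_count(example, baskets))            # sigma = 2
print(support(example, baskets))                  # s = 2/5
print(is_frequent(example, baskets, minsup=0.4))  # minsup=0.4 is an assumed threshold
```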

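Definition: Association Rule

Example: {Milk, Diaper} → {Beer}

$$ s = \frac{\sigma(\{\text{Milk}, \text{Diaper}, \text{Beer}\})}{|T|} = \frac{2}{5} = 0.4 $$

$$ c = \frac{\sigma(\{\text{Milk}, \text{Diaper}, \text{Beer}\})}{\sigma(\{\text{Milk}, \text{Diaper}\})} = \frac{2}{3} \approx 0.67 $$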

Example of Rules:

{Milk, Beer} → {Diaper} (s=0.4, c=1.0)
{Diaper, Beer} → {Milk} (s=0.4, c=0.67)
{Beer} → {Milk, Diaper} (s=0.4, c=0.67)
{Diaper} → {Milk, Beer} (s=0.4, c=0.5)
{Milk} → {Diaper, Beer} (s=0.4, c=0.5)
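The rule measures listed above can be reproduced by splitting the itemset {Milk, Diaper, Beer} into every possible body and head. The sketch below is one way to do that; `rules_from_itemset` is a hypothetical helper name, not something taken from the slides.

```python
from itertools import combinations

# The five market-basket transactions (TID 1 to 5).
baskets = [
    {"Bread", "Milk"},
    {"Bread", "Diaper", "Beer", "Eggs"},
    {"Milk", "Diaper", "Beer", "Coke"},
    {"Bread", "Milk", "Diaper", "Beer"},
    {"Bread", "Milk", "Diaper", "Coke"},
]


def support_count(itemset, db):
    """Number of transactions that contain every item in `itemset`."""
    return sum(1 for basket in db if itemset <= basket)


def rules_from_itemset(itemset, db):
    """Return (body, head, s, c) for every non-empty split of `itemset`."""
    itemset = frozenset(itemset)
    joint = support_count(itemset, db)
    rules = []
    for size in range(1, len(itemset)):
        for combo in combinations(sorted(itemset), size):
            body = frozenset(combo)
            body_count = support_count(body, db)
            if body_count == 0:
                continue  # confidence is undefined when the body never occurs
            rules.append((body, itemset - body, joint / len(db), joint / body_count))
    return rules


for body, head, s, c in rules_from_itemset({"Milk", "Diaper", "Beer"}, baskets):
    print(f"{sorted(body)} -> {sorted(head)} (s={s:.1f}, c={c:.2f})")
```

Every rule derived from the same itemset has identical support, so only the confidence separates them. The sketch prints six rules: the five listed above plus {Milk, Diaper} → {Beer} from the definition.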

The Apriori Algorithm — Example

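Database D:

TID Items
100 1 3 4
200 2 3 5
300 1 2 3 5
400 2 5

Scan D to count the candidate 1-itemsets, C1:

itemset sup.
{1} 2
{2} 3
{3} 3
{4} 1
{5} 3

Keep the itemsets with support count of at least 2 (the threshold implied by the example) to obtain L1:

itemset sup.
{1} 2
{2} 3
{3} 3
{5} 3

Generate the candidate 2-itemsets, C2, from L1:

itemset
{1 2}
{1 3}
{1 5}
{2 3}
{2 5}
{3 5}

Scan D to count C2:

itemset sup
{1 2} 1
{1 3} 2
{1 5} 1
{2 3} 2
{2 5} 3
{3 5} 2

L2:

itemset sup
{1 3} 2
{2 3} 2
{2 5} 3
{3 5} 2

Generate the candidate 3-itemsets, C3, from L2:

itemset
{1 3 5}
{2 3 5}

{1 3 5} cannot be frequent because its subset {1 5} is not in L2. Scan D to obtain L3:

itemset sup
{2 3 5} 2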
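The level-wise walk through database D can be sketched in code. This is a simplified illustration of the Apriori idea, not an optimized implementation. `MIN_COUNT = 2` is an assumption inferred from the tables (itemsets with count 1 are dropped), and the candidate generator applies the subset-pruning step before counting, so {1 3 5} is discarded without a scan.

```python
from itertools import combinations

# Database D from the example, keyed by TID.
database = {
    100: {1, 3, 4},
    200: {2, 3, 5},
    300: {1, 2, 3, 5},
    400: {2, 5},
}
MIN_COUNT = 2  # assumed minimum support count (50% of 4 transactions)


def count_support(candidates, db):
    """Scan the database once and count each candidate itemset."""
    return {c: sum(1 for items in db.values() if c <= items) for c in candidates}


def generate_candidates(frequent, k):
    """Join frequent (k-1)-itemsets into k-itemsets, then prune any candidate
    that has a (k-1)-subset which is not frequent."""
    joined = {a | b for a in frequent for b in frequent if len(a | b) == k}
    return {
        c for c in joined
        if all(frozenset(sub) in frequent for sub in combinations(c, k - 1))
    }


def apriori(db, min_count):
    """Return [L1, L2, ...], each a dict mapping itemset -> support count."""
    singles = {frozenset([item]) for items in db.values() for item in items}
    level = {s: n for s, n in count_support(singles, db).items() if n >= min_count}
    levels, k = [], 2
    while level:
        levels.append(level)
        candidates = generate_candidates(set(level), k)
        counts = count_support(candidates, db)
        level = {s: n for s, n in counts.items() if n >= min_count}
        k += 1
    return levels


for k, level in enumerate(apriori(database, MIN_COUNT), start=1):
    print(f"L{k}:", {tuple(sorted(s)): n for s, n in level.items()})
```

The loop stops as soon as a level comes back empty, which is what makes the approach cheaper than counting every possible itemset.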

Association with Business Intelligence in SQL Server

Association Algorithm: MBA (Market Basket Analysis)

Steps of the MBA algorithm:

1. Set the threshold that defines a frequent itemset, together with the desired minimum support and minimum confidence values.
2. Find all frequent itemsets, that is, itemsets whose frequency is at least the threshold set in step 1.
3. From all frequent itemsets, generate the association rules that satisfy the minimum support and confidence.
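The three steps above can be sketched end to end on the market-basket transactions used earlier. This is an illustrative sketch only: it enumerates itemsets by brute force rather than with Apriori or the SQL Server association algorithm, and the thresholds `MIN_SUPPORT = 0.4` and `MIN_CONFIDENCE = 0.6` are assumed values, not ones given in the slides.

```python
from itertools import combinations

baskets = [
    {"Bread", "Milk"},
    {"Bread", "Diaper", "Beer", "Eggs"},
    {"Milk", "Diaper", "Beer", "Coke"},
    {"Bread", "Milk", "Diaper", "Beer"},
    {"Bread", "Milk", "Diaper", "Coke"},
]

# Step 1: choose the thresholds (assumed values for illustration).
MIN_SUPPORT = 0.4
MIN_CONFIDENCE = 0.6


def support(itemset, db):
    """Fraction of transactions that contain every item in `itemset`."""
    return sum(1 for basket in db if itemset <= basket) / len(db)


def frequent_itemsets(db, min_support):
    """Step 2: map every frequent itemset to its support (brute-force enumeration)."""
    items = sorted(set().union(*db))
    result = {}
    for size in range(1, len(items) + 1):
        found = False
        for combo in combinations(items, size):
            s = support(frozenset(combo), db)
            if s >= min_support:
                result[frozenset(combo)] = s
                found = True
        if not found:
            break  # no frequent itemset of this size, so no larger one exists
    return result


def association_rules(frequent, min_confidence):
    """Step 3: split each frequent itemset into body -> head and keep confident rules."""
    rules = []
    for itemset, s in frequent.items():
        for size in range(1, len(itemset)):
            for combo in combinations(sorted(itemset), size):
                body = frozenset(combo)
                # Every subset of a frequent itemset is frequent, so the lookup succeeds.
                c = s / frequent[body]
                if c >= min_confidence:
                    rules.append((body, itemset - body, s, c))
    return rules


frequent = frequent_itemsets(baskets, MIN_SUPPORT)
rules = association_rules(frequent, MIN_CONFIDENCE)
for body, head, s, c in rules:
    print(f"{sorted(body)} -> {sorted(head)} (s={s:.1f}, c={c:.2f})")
```

Raising `MIN_CONFIDENCE` shrinks the rule list without changing the frequent itemsets, which is why the two thresholds are set separately in step 1.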

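Support(A → B) = P(A ∪ B)

Confidence(A → B) = P(B|A)

$$ \text{support}(A \rightarrow B) = \frac{\text{number of tuples containing both } A \text{ and } B}{\text{total number of tuples}} $$

$$ \text{confidence}(A \rightarrow B) = \frac{\text{number of tuples containing both } A \text{ and } B}{\text{number of tuples containing } A} $$

$$ \text{confidence}(A \rightarrow B) = P(B|A) = \frac{\text{support\_count}(A \cup B)}{\text{support\_count}(A)} $$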