利用關聯規則建構查詢關鍵字之網站推薦ir.lib.nknu.edu.tw/ir/retrieve/18234/利用關聯規則建構查詢關鍵字之網站... ·...

2010, 28, 45-60

99 3 17 99 6 30

1 2

(association rules)

1 2


Using Association Rules to Construct Website Recommendations of Query Keywords

Chui-cheng Chen* Tsung-yi Chen**

Abstract

Search engines are among the most popular services in electronic commerce business models. This paper mines browsing data, in which each browsing record contains the query keywords entered in a search engine and the websites browsed for that query. Association rules are used to find adaptive website recommendations for query keywords from two aspects. The first is a fast algorithm that mines association rules between query keywords and browsed websites. The second takes specific query keywords as the mining target and modifies the first algorithm to mine only the association rules whose antecedents are those query keywords. The search engine can then provide adaptive website recommendations for the query keywords, ranked by the confidence of the association rules. A mining system for adaptive website recommendations of query keywords is designed and constructed according to both algorithms, and the performance of both algorithms is evaluated.

Keywords: electronic commerce, association rules, query keywords, search engine, website recommendations

* Associate Professor, Department of Information Management, Southern Taiwan University.

** Assistant Professor, Department of Electronic Commerce Management, Nanhua University.


(Internet)(World Wide Web)(search engine) Yahoo(http://www.yahoo.com) Google (http://www.google.com)

(data mining)

(Han & Kamber, 2006)

(association rules)

1.

2.

(sequence) (clustering) (classification) (forecasting) (Han & Kamber, 2006; Chen, Han & Yu, 1996)


(2007)(2002)

(2002) (2002) (2002) (2002) (2007) Hosseini & Abolhassani (2007) Huang, Lee & Lin (2001)

Li, Chen & Yang (2002) Fonseca, Golgher, de Moura & Ziviani (2003) Li, Y. & Li, G. Y. (2007) (Ontology) Nettleton, Calderon-Benavides & Baeza-Yates (2006)

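Association rules were proposed by Agrawal, Imielinski & Swami (1993). Let $X$ and $Y$ be itemsets (sets of items) with $X \cap Y = \emptyset$. An association rule is written $X \rightarrow Y$, where $X$ is the antecedent and $Y$ is the consequent. The support of $X \rightarrow Y$ is the fraction of transactions that contain $X \cup Y$, and the confidence of $X \rightarrow Y$ is the fraction of the transactions containing $X$ that also contain $Y$. Given a minimum support and a minimum confidence, the mining task is to find every rule $X \rightarrow Y$ that meets both thresholds.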

k k1 k-(frequent itemsets) k k- frequentk ABC ABC Agrawal & Srikant (1994), Han, Pei, Yin & Mao (2004), Holt & Chung (2000), Li, He & Lei (2005), Tsay & Chang-Chien (2004)

Apriori (Agrawal & Srikant, 1994) Apriori

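1. Take frequent_{k-1}, k > 1.

2. Join every two itemsets in the frequent_{k-1} of step (1) that share k-2 items to generate itemset_k.

3. For each itemset_k from step (2), delete it if any of its itemset_{k-1} subsets is not in the frequent_{k-1} of step (1).

4. Scan the database to count each itemset_k left by step (3) and obtain frequent_k.

5. Generate the association rules from frequent_k.

6. Return to step (1) to find frequent_{k+1}, until step (4) yields no frequent itemset_k.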
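The level-wise procedure above can be sketched in Python as follows. This is only a minimal illustration of the standard Apriori join, prune and count loop, not the authors' C# implementation. The function name `apriori`, the `frozenset` representation and the sample data (the five transactions of Table 1, with keywords and websites merged into a single item set) are assumptions made for the example.

```python
from itertools import combinations


def apriori(transactions, min_count):
    """Return {itemset: support count} for every frequent itemset (a sketch)."""
    transactions = [frozenset(t) for t in transactions]

    # Level 1: count single items and keep the frequent ones.
    counts = {}
    for t in transactions:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    frequent = {s: c for s, c in counts.items() if c >= min_count}
    result = dict(frequent)

    k = 2
    while frequent:
        candidates = set()
        # Join: two frequent (k-1)-itemsets that share k-2 items.
        for a, b in combinations(list(frequent), 2):
            union = a | b
            if len(union) != k:
                continue
            # Prune: every (k-1)-subset must itself be frequent.
            if all(frozenset(s) in frequent for s in combinations(union, k - 1)):
                candidates.add(union)
        # Count: scan the database once per level.
        counts = {c: sum(1 for t in transactions if c <= t) for c in candidates}
        frequent = {s: c for s, c in counts.items() if c >= min_count}
        result.update(frequent)
        k += 1
    return result


# Assumed sample data: Table 1 with keywords and websites merged per transaction.
TABLE1 = ["VWXYZABCG", "SQRZACEF", "QSTXHIJ", "RUWXYZCDFHI", "QTVZABCFGJ"]
frequent_itemsets = apriori(TABLE1, min_count=3)
print(frequent_itemsets[frozenset("ACZ")])  # 3
```

Because plain Apriori treats keywords and websites alike, it also counts itemsets such as AC that hold websites only, which cannot produce a rule from a keyword to a website.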

frequentkTsay Chang-Chien(2004) Apriori itemsetkk1 k CDAR

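(1) $I = \{i_1, i_2, \ldots, i_a\}$ is the set of $a$ query keywords.

(2) $J = \{j_1, j_2, \ldots, j_b\}$ is the set of $b$ websites.

(3) $T = \{T_1, T_2, \ldots, T_k, \ldots, T_m\}$ is the set of $m$ browsing records, where $T_k$ is the $k$-th record, $1 \le k \le m$.

(4) Each record is $T_k = [X, Y]$, where $X \subseteq I$, $X \neq \emptyset$ is the set of query keywords and $Y \subseteq J$, $Y \neq \emptyset$ is the set of websites browsed. When the query is $X_1$ or $X_2$ or ... or $X_d$, combining $d$ keyword sets, and the websites browsed are $Y$, the query is stored as the $d$ records $[X_1, Y], [X_2, Y], \ldots, [X_d, Y]$.

When a query mixes "and" with "or", such as "A and B or C" with keywords A, B and C, it is treated as the two keyword sets "A and B" and "C".

When a query uses only "and", such as "A and B", its keyword set is AB.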
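One possible reading of the "and"/"or" handling described above is sketched below: each "or" branch of a query becomes its own record, and the keywords joined by "and" inside a branch form one keyword set. The function name `split_query`, the lowercase operator spelling and the rule that "and" binds tighter than "or" are assumptions for illustration and are not taken from the paper.

```python
def split_query(query, websites):
    """Split a query such as 'A and B or C' into one record per 'or' branch.

    Assumption: operators are written as lowercase ' and ' / ' or ', and
    'and' binds tighter than 'or'. Each record is (keyword set, website set).
    """
    sites = frozenset(websites)
    records = []
    for branch in query.split(" or "):
        keywords = frozenset(k.strip() for k in branch.split(" and ") if k.strip())
        if keywords:
            records.append((keywords, sites))
    return records


# Hypothetical website names w1 and w2, used only for the example.
records = split_query("A and B or C", ["w1", "w2"])
for keywords, sites in records:
    print(sorted(keywords), sorted(sites))
```

With this reading, the query "A and B or C" yields two records, one for the keyword set AB and one for C, both paired with the same browsed websites.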


Apriori

Apriori AKW(mining association rules between keywords and websites)

AKW 1. D1 frequent1 2. (1) frequent1 frequent1 itemset2 D1 itemset2 frequent2 D1 itemset2 3 D2(3)

3. frequentk-1k>2 Dk-1 4. (3) k-2 frequentk-1 itemsetk 5. (4) itemsetk itemsetk-1 (3)() itemsetk

6. Dk-1(5) itemsetk frequentk Dk-1 itemsetk k+1 Dk

7. frequentk XYXY=frequentkXIYJ

8. (3) frequentk+1 XY

X Y XY X Y n Yi1in Y1Y2Yn

Apriori itemsetkk>1 frequentk itemsetk k+1


itemsetk+1 frequentk+1 AKW Apriori
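The idea behind AKW, as far as the steps above indicate, is to count only itemsets that contain both keywords and websites and to shrink the database after each pass. The sketch below is a hedged illustration of that idea and not the authors' algorithm. The names `akw_sketch`, `mixed_subsets` and `next_candidates` are hypothetical. The pruning test checks only subsets that still mix a keyword with a website, which is an assumption. The transaction-reduction rule keeps a transaction after level k only if it contains at least k frequent k-itemsets, a conservative bound chosen here because it is safe when keyword-only and website-only subsets are never counted.

```python
from itertools import combinations


def mixed_subsets(x, y):
    """Subsets one item smaller that still hold a keyword and a website."""
    subs = [(x - {i}, y) for i in x] if len(x) > 1 else []
    subs += [(x, y - {j}) for j in y] if len(y) > 1 else []
    return subs


def next_candidates(frequent, size):
    """Join frequent itemsets into candidates of the given size, then prune."""
    out = set()
    for (x1, y1), (x2, y2) in combinations(frequent, 2):
        x, y = x1 | x2, y1 | y2
        if len(x) + len(y) == size and all(s in frequent for s in mixed_subsets(x, y)):
            out.add((x, y))
    return out


def akw_sketch(records, min_count):
    """Return {(keyword set, website set): count} for frequent mixed itemsets."""
    db = [(frozenset(k), frozenset(w)) for k, w in records]
    kw, ws = {}, {}
    for x, y in db:  # first scan: count single keywords and single websites
        for i in x:
            kw[i] = kw.get(i, 0) + 1
        for j in y:
            ws[j] = ws.get(j, 0) + 1
    # Level 2: one frequent keyword combined with one frequent website.
    candidates = {(frozenset([i]), frozenset([j]))
                  for i, n in kw.items() if n >= min_count
                  for j, m in ws.items() if m >= min_count}
    result, k = {}, 2
    while candidates:
        hits = [[c for c in candidates if c[0] <= x and c[1] <= y] for x, y in db]
        counts = {c: 0 for c in candidates}
        for row in hits:
            for c in row:
                counts[c] += 1
        frequent = {c for c, n in counts.items() if n >= min_count}
        result.update((c, counts[c]) for c in frequent)
        # Assumed reduction rule: drop transactions with fewer than k frequent k-itemsets.
        db = [t for t, row in zip(db, hits) if sum(c in frequent for c in row) >= k]
        k += 1
        candidates = next_candidates(frequent, k)
    return result


# Table 1 as (keywords, websites) pairs.
TABLE1 = [("VWXYZ", "ABCG"), ("SQRZ", "ACEF"), ("QSTX", "HIJ"),
          ("RUWXYZ", "CDFHI"), ("QTVZ", "ABCFGJ")]
mined = akw_sketch(TABLE1, min_count=3)
print(mined[(frozenset("Z"), frozenset("AC"))])  # 3
```

Compared with plain Apriori on the same data, the sketch never counts itemsets such as AC or QZ, which is where the saving in candidate itemsets comes from.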

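Table 1 contains 5 transactions {T1, T2, T3, T4, T5}, with query keywords I = {Q, R, S, T, U, V, W, X, Y, Z} and websites J = {A, B, C, D, E, F, G, H, I, J}. The minimum support is 60% (a count of 3) and the minimum confidence is 60%.

Table 1

TID  Query keywords     Browsed websites
T1   V, W, X, Y, Z      A, B, C, G
T2   S, Q, R, Z         A, C, E, F
T3   Q, S, T, X         H, I, J
T4   R, U, W, X, Y, Z   C, D, F, H, I
T5   Q, T, V, Z         A, B, C, F, G, J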

D1, itemset_1 and frequent_1 (minimum support count 3):

itemset_1  Count    itemset_1  Count
A          3        Q          3
B          2        R          2
C          4        S          2
D          1        T          2
E          1        U          1
F          3        V          2
G          2        W          2
H          2        X          3
I          2        Y          2
J          2        Z          4

frequent_1: A 3, C 4, F 3, Q 3, X 3, Z 4


D2, itemset_2 and frequent_2 (minimum support count 3):

D2: T1, T2, T4, T5

itemset_2  Count
AQ         2
AX         1
AZ         3
CQ         2
CX         2
CZ         4
FQ         2
FX         1
FZ         3

frequent_2: AZ 3, CZ 4, FZ 3

D3, itemset_3 and frequent_3 (minimum support count 3):

D3: T1, T2, T4, T5

itemset_3  Count
ACZ        3
AFZ        2
CFZ        3

frequent_3: ACZ 3, CFZ 3

The candidate itemset_4 ACFZ is deleted in step (5). Taking ACZ as an example, the association rule $Z \rightarrow AC$ has a confidence of 3/4 = 75%, so websites A and C are recommended for query keyword Z.
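The confidence figure in the example above (3/4 = 75% for the rule from keyword Z to websites A and C) and the ranking of recommendations by confidence can be reproduced with a short Python sketch. The function names `confidence` and `recommend` and the list of consequents passed in are assumptions for illustration; the data are the five transactions of Table 1.

```python
def confidence(records, keywords, websites):
    """Confidence of keywords -> websites: count(X and Y) / count(X)."""
    x, y = frozenset(keywords), frozenset(websites)
    n_x = sum(1 for kws, _ in records if x <= frozenset(kws))
    n_xy = sum(1 for kws, wss in records
               if x <= frozenset(kws) and y <= frozenset(wss))
    return n_xy / n_x if n_x else 0.0


def recommend(records, keywords, consequents, min_conf):
    """Rank candidate website sets for a query by rule confidence."""
    scored = [(y, confidence(records, keywords, y)) for y in consequents]
    kept = [(y, c) for y, c in scored if c >= min_conf]
    # sorted() is stable, so ties keep the order in which they were given.
    return sorted(kept, key=lambda pair: -pair[1])


# Table 1 as (keywords, websites) pairs.
TABLE1 = [("VWXYZ", "ABCG"), ("SQRZ", "ACEF"), ("QSTX", "HIJ"),
          ("RUWXYZ", "CDFHI"), ("QTVZ", "ABCFGJ")]
print(confidence(TABLE1, "Z", "AC"))  # 0.75
ranking = recommend(TABLE1, "Z", ["A", "C", "F", "AC", "CF"], min_conf=0.6)
print(ranking[0])  # ('C', 1.0)
```

For keyword Z, website C is browsed in all four transactions that contain Z, so it ranks first with a confidence of 100%.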

andorX and Y or ZX, Y, Z X and YZ

andandX and YXY


XX

XYXIYJXY

X Y Y X ii1 AKW AAK (mining association rules those antecedents are the query keywords) X

1. D1 frequent1 X D2

2. (1) X frequent1 itemseti+1 D2itemseti+1 frequenti+1 D2 itemseti+1 i+2 D3

3. frequenti+k-1k>1 Dk+1 4. (3) i+k-2 frequenti+k-1 itemseti+k 5. (4) itemseti+k itemseti+k-1

(3)( X) itemseti+k 6. Dk+1(5) itemseti+k frequentk Dk+1 itemseti+k i+k+1 Dk+2

7. frequentk XYXY=frequentkYJ

8. (3) frequenti+k+1 XY

X Y XY X Y n Yj1jn Y1Y2Yn

X i X frequent1 X itemseti+kk>1 frequenti+k itemseti+k i+k+1 AAK
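When the query keywords X are fixed in advance, the search can be limited to the transactions that contain X, and only the website part of each rule has to grow. The following Python sketch illustrates that restriction. It is a simplified reading of AAK and not the authors' algorithm: it filters the database by X first and then runs a level-wise search over website sets, and it omits the further transaction reduction between levels. The function name `aak_sketch` and the returned `(count, confidence)` pairs are assumptions for illustration.

```python
from itertools import combinations


def aak_sketch(records, target, min_count):
    """Return {website set: (count, confidence)} for rules target -> websites."""
    target = frozenset(target)
    # Keep only the transactions whose keywords contain the target; after
    # this filter only the website part of each transaction matters.
    db = [frozenset(w) for k, w in records if target <= frozenset(k)]

    counts = {}
    for sites in db:
        for w in sites:
            key = frozenset([w])
            counts[key] = counts.get(key, 0) + 1
    frequent = {y: c for y, c in counts.items() if c >= min_count}
    result = dict(frequent)

    size = 2
    while frequent:
        candidates = set()
        for a, b in combinations(list(frequent), 2):
            y = a | b
            # Join two frequent website sets, then prune by their subsets.
            if len(y) == size and all(y - {j} in frequent for j in y):
                candidates.add(y)
        counts = {y: sum(1 for sites in db if y <= sites) for y in candidates}
        frequent = {y: c for y, c in counts.items() if c >= min_count}
        result.update(frequent)
        size += 1

    support_x = len(db)  # number of transactions that contain the target
    return {y: (c, c / support_x) for y, c in result.items()}


# Table 1 as (keywords, websites) pairs, mined for the target keyword Z.
TABLE1 = [("VWXYZ", "ABCG"), ("SQRZ", "ACEF"), ("QSTX", "HIJ"),
          ("RUWXYZ", "CDFHI"), ("QTVZ", "ABCFGJ")]
rules_for_z = aak_sketch(TABLE1, "Z", min_count=3)
print(rules_for_z[frozenset("AC")])  # (3, 0.75)
```

Because every candidate already contains the target keywords, no itemsets involving other keywords are generated or counted, which is the source of the saving over mining all keyword and website combinations.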


X

Take Table 1 as an example, with query keyword Z as the mining target, a minimum support of 60% (a count of 3) and a minimum confidence of 60%. The association rules whose antecedent is Z are mined as follows.

D1, itemset_1 and frequent_1 (minimum support count 3):

itemset_1  Count
A          3
B          2
C          4
D          1
E          1
F          3
G          2
H          2
I          2
J          2

frequent_1: A 3, C 4, F 3

D2 (the transactions that contain Z): T1, T2, T4, T5

itemset_2  Count
AZ         3
CZ         4
FZ         3

D3: T1, T2, T4, T5

frequent_2: AZ 3, CZ 4, FZ 3

D4, itemset_3 and frequent_3 (minimum support count 3):

D4: T1, T2, T4, T5

itemset_3  Count
ACZ        3
AFZ        2
CFZ        3

frequent_3: ACZ 3, CFZ 3

The candidate itemset_4 ACFZ is deleted in step (5). Taking ACZ as an example, the association rule $Z \rightarrow AC$ has a confidence of 3/4 = 75%, so websites A and C are recommended for the given query keyword Z.

1

(1) (2)

(2)

(1)

1

C# IBM Data Mining (http://www.almaden.ibm.com/) 50000 D1 D2 D1 D2 2n ntran np tl pl D1 D2 50000 D3

Table 2

Dataset  n     ntran  np     tl  pl
D1       1000  50000  10000  10  4
D2       1000  50000  10000  10  4
D3       combined from D1 and D2, 50000 transactions

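The items of D1 are used as the query keywords k1, k2, k3, ..., k1000 and the items of D2 as the websites w1, w2, w3, ..., w1000; the combined data are stored in Access 2003 as D3.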

2

( 1.5%)( 70%)

2

3 ( k1 and k5 )

3

2 3

3

3

CPU                   Pentium IV 2.4 GHz
Main memory           256 Mbytes
Operating system      Windows XP Professional
Programming language  C#
Database              Access 2003

70% Apriori


CDAR AKW 4 4 AKW Apriori CDAR

Figure 4. Apriori, CDAR and AKW (number of transactions: 50000; horizontal axis from 0.01 to 0.05, vertical axis from 0 to 120)

70% AKW AAK () 10 5 5 AAK AKW

Figure 5. AKW and AAK (number of transactions: 50000; horizontal axis from 0.01 to 0.05, two vertical axes from 0 to 90 and from 0 to 18)


4 Apriori CDAR Apriori CDAR

4 Apriori CDAR AKW AAK

itemsetk

k>1

k itemsetkk>1

itemsetk

k>1 itemsetk

k+1

itemseti+k i

k>1

itemseti+k i+k+1

1. 2. Huang, Lee Lin(2001)(feedback)

3. 4.


(2002)

(2007)

(2002)

(2002)

(2002)

(2002)

(2007)

Agrawal, R., & Srikant, R. (1994). Fast algorithms for mining association rules in large databases. Proceedings of the 20th International Conference on Very Large Data Bases, 487-499.

Agrawal, R., Imielinski, T., & Swami, A. (1993). Mining association rules between sets of items in large databases. Proceedings of the ACM SIGMOD Conference on Management of Data, 207-216.

Chen, M. S., Han, J., & Yu, P. S. (1996). Data mining: An overview from a database perspective. IEEE Transactions on Knowledge and Data Engineering, 8(6), 866-883.

Fonseca, B. M., Golgher, P. B., de Moura, E. S., & Ziviani, N. (2003). Using association rules to discover search engines related queries. Proceedings of the First Latin American Web Congress, 66-71.

Han, J., & Kamber, M. (2006). Data mining: Concepts and techniques (2nd ed.). Morgan Kaufmann.

Han, J., Pei, J., Yin, Y., & Mao, R. (2004). Mining frequent patterns without candidate generation: A frequent-pattern tree approach. Data Mining and Knowledge Discovery, 8(1), 53-87.

Holt, J. D., & Chung, S. M. (2000). Mining association rules using inverted hashing and pruning. Information Processing Letters, 83, 211-220.

Hosseini, M., & Abolhassani, H. (2007). Mining search engine query log for evaluating content and structure of a web site. Proceedings of the IEEE/WIC/ACM International Conference on Web Intelligence, 235-241.

Huang, Y. P., Lee, Y. C., & Lin, K. (2001). An intelligent approach to mining the related websites. Proceedings of the International Conference on IFSA World Congress and 20th NAFIPS, 1, 435-440.

Li, Y., & Li, G. Y. (2007). Research and realization of personalized search engine based on ontology. Proceedings of the International Conference on Network and Parallel Computing Workshops, 1016-1020.

Li, Y., Chen, X. Z., & Yang, B. R. (2002). Research on web mining-based intelligent search engine. Proceedings of the First International Conference on Machine Learning and Cybernetics, 1, 386-390.

Li, Z. C., He, P. L., & Lei, M. (2005). A high efficient AprioriTid algorithm for mining association rule. Proceedings of the Fourth International Conference on Machine Learning and Cybernetics, 1812-1815.

Nettleton, D. F., Calderon-Benavides, L., & Baeza-Yates, R. (2006). Analysis of web search engine clicked documents. Proceedings of the Fourth Latin American Web Congress, 209-219.

Tsay, Y. J., & Chang-Chien, Y. W. (2004). An efficient cluster and decomposition algorithm for mining association rules. Information Sciences, 160, 161-171.
