Using Association Rules to Construct Website Recommendations of Query Keywords
2010, 28, 45-60
99 3 17 99 6 30
1 2
(association rules)
1 2
Using Association Rules to Construct Website Recommendations of Query Keywords
Chui-cheng Chen* Tsung-yi Chen**
Abstract
Search engines are among the most popular services in electronic commerce business models. This paper mines browsing data, in which each record contains the query keywords entered in a search engine and the websites browsed as a result. Association rules are used to find suitable website recommendations for query keywords in two ways. First, a fast algorithm is proposed to mine association rules between query keywords and browsed websites. Second, a set of query keywords is taken as the mining target, and the first algorithm is modified to mine only the association rules whose antecedents are those query keywords. The search engine can then offer website recommendations for the query keywords, ranked by the confidence of the association rules. A mining system for website recommendations of query keywords is designed and built on both algorithms, and the performance of both algorithms is evaluated.
Keywords: electronic commerce, association rules, query keywords, search engine, website recommendations
* Associate Professor, Department of Information Management, Southern Taiwan University.
** Assistant Professor, Department of Electronic Commerce Management, Nanhua University.
(Internet)(World Wide Web)(search engine) Yahoo(http://www.yahoo.com) Google (http://www.google.com)
(data mining)
(Han & Kamber, 2006)
(association rules)
1.
2.
(sequence)(clustering)(classification)(forecasting)(Han & Kamber, 2006Chen, Han & Yu, 1996)
(2007)(2002)
(2002)(2002)(2002)(2002)(2007)Hosseini Abolhassani(2007)Huang, Lee Lin(2001)
Li, Chen and Yang (2002); Fonseca, Golgher, de Moura and Ziviani (2003); Li, Y. and Li, G. Y. (2007) (ontology); Nettleton, Calderon-Benavides and Baeza-Yates (2006)
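Association rules were introduced by Agrawal, Imielinski and Swami (1993). An itemset is a set of items. An association rule has the form $X \Rightarrow Y$, where $X$ and $Y$ are itemsets and $X \cap Y = \emptyset$; $X$ is the antecedent and $Y$ is the consequent. The support of $X \Rightarrow Y$ is the fraction of records that contain $X \cup Y$, and its confidence is the fraction of the records containing $X$ that also contain $Y$. A rule $X \Rightarrow Y$ is kept when its support and confidence reach the user-specified minimum support and minimum confidence.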
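As a rough illustration of these two measures, the Python sketch below counts support and confidence over five small records. The record contents are those of the paper's worked example; the names `RECORDS`, `support_count`, `support` and `confidence` are illustrative assumptions and do not come from the paper.

```python
from typing import FrozenSet, Iterable, List

# The five sample records of the paper's worked example. Each record is one
# set holding both the query keywords (Q..Z) and the browsed websites (A..J).
RECORDS: List[FrozenSet[str]] = [
    frozenset("VWXYZ") | frozenset("ABCG"),
    frozenset("SQRZ") | frozenset("ACEF"),
    frozenset("QSTX") | frozenset("HIJ"),
    frozenset("RUWXYZ") | frozenset("CDFHI"),
    frozenset("QTVZ") | frozenset("ABCFGJ"),
]


def support_count(itemset: Iterable[str], records: List[FrozenSet[str]]) -> int:
    """Number of records that contain every item of the itemset."""
    items = frozenset(itemset)
    return sum(1 for record in records if items <= record)


def support(itemset: Iterable[str], records: List[FrozenSet[str]]) -> float:
    """Fraction of records that contain the itemset."""
    return support_count(itemset, records) / len(records)


def confidence(antecedent: Iterable[str], consequent: Iterable[str],
               records: List[FrozenSet[str]]) -> float:
    """Fraction of the records holding the antecedent that also hold the consequent."""
    x = frozenset(antecedent)
    base = support_count(x, records)
    if base == 0:
        return 0.0
    return support_count(x | frozenset(consequent), records) / base
```

Here `confidence("Z", "AC", RECORDS)` divides the 3 records that contain A, C and Z by the 4 records that contain Z.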
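An itemset that contains k items (k ≥ 1) is called a k-itemset, and a k-itemset whose support reaches the minimum support is a frequent k-itemset, written frequent_k. An itemset is written by concatenating its items, so ABC denotes the 3-itemset {A, B, C}. Algorithms for mining frequent itemsets include those of Agrawal and Srikant (1994), Han, Pei, Yin and Mao (2004), Holt and Chung (2000), Li, He and Lei (2005), and Tsay and Chang-Chien (2004).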
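The Apriori algorithm (Agrawal & Srikant, 1994) proceeds as follows.
1. Read frequent_{k-1} (k > 1).
2. From step (1), join the members of frequent_{k-1} whose first k-2 items are identical to generate the candidates itemset_k.
3. For each itemset_k from step (2), delete it if any of its subsets itemset_{k-1} is absent from the frequent_{k-1} of step (1).
4. Scan the database to count the itemset_k remaining after step (3) and find frequent_k.
5. Generate association rules from frequent_k.
6. Repeat from step (1) to find frequent_{k+1}, until step (4) yields no frequent_k.
Tsay and Chang-Chien (2004) improved on Apriori with the CDAR algorithm.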
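The join, prune and count steps listed above can be sketched in Python as follows. This is a minimal sketch of textbook Apriori rather than the authors' implementation; the function names `apriori_gen` and `apriori` are assumptions, and itemsets are held as sorted tuples so that "the first k-2 items" is well defined.

```python
from itertools import combinations
from typing import Dict, FrozenSet, List, Set, Tuple

Itemset = Tuple[str, ...]


def apriori_gen(frequent_prev: Set[Itemset], k: int) -> Set[Itemset]:
    """Join (k-1)-itemsets that share their first k-2 items, then prune."""
    prev = sorted(frequent_prev)
    candidates: Set[Itemset] = set()
    for a, b in combinations(prev, 2):
        if a[:k - 2] != b[:k - 2]:
            continue
        cand = tuple(sorted(set(a) | set(b)))
        # Prune: every (k-1)-subset of a candidate must itself be frequent.
        if all(sub in frequent_prev for sub in combinations(cand, k - 1)):
            candidates.add(cand)
    return candidates


def apriori(records: List[FrozenSet[str]], min_count: int) -> Dict[int, Set[Itemset]]:
    """Return {k: frequent k-itemsets}, scanning all records on every pass."""
    def count(itemset: Itemset) -> int:
        return sum(1 for record in records if set(itemset) <= record)

    items = {item for record in records for item in record}
    frequent = {(item,) for item in items if count((item,)) >= min_count}
    result: Dict[int, Set[Itemset]] = {}
    k = 1
    while frequent:
        result[k] = frequent
        k += 1
        frequent = {c for c in apriori_gen(frequent, k) if count(c) >= min_count}
    return result
```

Note that every pass scans the full record list; the record set never shrinks in this plain form.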
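(1) $I = \{i_1, i_2, \ldots, i_a\}$ is the set of $a$ query keywords.
(2) $J = \{j_1, j_2, \ldots, j_b\}$ is the set of $b$ websites.
(3) $T = \{T_1, T_2, \ldots, T_k, \ldots, T_m\}$ is the set of $m$ browsing records, where $T_k$ is the $k$-th record, $1 \le k \le m$.
(4) Each record is $T_k = [X, Y]$, where $X \subset I$, $X \ne \emptyset$, $Y \subset J$ and $Y \ne \emptyset$; $X$ is the set of query keywords and $Y$ is the set of websites browsed. When the query is $X_1$ or $X_2$ or ... or $X_d$, with $d$ parts, and the websites browsed are $Y$, the record is split into the $d$ records $[X_1, Y], [X_2, Y], \ldots, [X_d, Y]$.
A query that mixes "and" and "or", such as "A and B or C" with query keywords A, B and C, is split into "A and B" and "C". A query joined only by "and", such as "A and B", gives the single keyword set AB.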
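The splitting rule for "or" queries can be illustrated with a short Python sketch. The function name `split_query` and the assumption that the operators appear as the lowercase words "and" and "or" separated by spaces are illustrative choices, not details given in the paper.

```python
from typing import FrozenSet, Iterable, List, Tuple

Record = Tuple[FrozenSet[str], FrozenSet[str]]  # (query keywords, browsed websites)


def split_query(query: str, websites: Iterable[str]) -> List[Record]:
    """Turn one browsed query into one record per "or" part.

    Assumes operators are written as " and " / " or " (hypothetical input format).
    """
    browsed = frozenset(websites)
    records: List[Record] = []
    for part in query.split(" or "):
        keywords = frozenset(k.strip() for k in part.split(" and ") if k.strip())
        if keywords:
            records.append((keywords, browsed))
    return records
```

For example, `split_query("A and B or C", ["w1", "w2"])` yields two records, one with keywords {A, B} and one with keyword {C}, both carrying the same websites.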
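This paper adapts Apriori into the AKW algorithm (mining association rules between keywords and websites). The steps of AKW are as follows.
1. Scan database D1 to find frequent_1.
2. From step (1), pair each query keyword in frequent_1 with each website in frequent_1 to generate itemset_2. Scan D1 to count itemset_2 and find frequent_2; at the same time, keep only the records of D1 that contain an itemset_2 and hold at least 3 items, forming D2 for step (3).
3. Read frequent_{k-1} (k > 2) and D_{k-1}.
4. From step (3), join the members of frequent_{k-1} that have k-2 items in common to generate itemset_k.
5. For each itemset_k from step (4), delete it if any of its subsets itemset_{k-1} (those containing both query keywords and websites) is absent from the frequent_{k-1} of step (3).
6. Scan D_{k-1} to count the itemset_k remaining after step (5) and find frequent_k; at the same time, keep only the records of D_{k-1} that contain an itemset_k and hold at least k+1 items, forming D_k.
7. From frequent_k, generate the association rules $X \Rightarrow Y$, where $X \cup Y$ = frequent_k, $X \subset I$ and $Y \subset J$.
8. Repeat from step (3) to find frequent_{k+1}, until no further rule $X \Rightarrow Y$ can be found.
When the rules for query keywords $X$ have $n$ different consequents $Y_i$ ($1 \le i \le n$), the websites recommended for $X$ are $Y_1 \cup Y_2 \cup \ldots \cup Y_n$.
Unlike Apriori, when AKW scans the database to count itemset_k (k > 1) and find frequent_k, it also removes the records that contain no itemset_k or hold fewer than k+1 items. The next scan, which counts itemset_{k+1} to find frequent_{k+1}, therefore reads a smaller database than Apriori would.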
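The AKW steps can be approximated by the Python sketch below. It is one possible reading of the steps, not the authors' C# implementation: the name `akw_frequent`, the record layout, and the exact record-filtering rule in `shrink` (keep a record if it holds some candidate of size k and has at least k+1 items) are assumptions.

```python
from itertools import combinations
from typing import Dict, FrozenSet, List, Tuple

Record = Tuple[FrozenSet[str], FrozenSet[str]]  # (query keywords, browsed websites)
Counts = Dict[FrozenSet[str], int]


def akw_frequent(records: List[Record], min_count: int) -> Dict[int, Counts]:
    """Return {k: {itemset: count}} for mixed keyword/website itemsets, k >= 2."""
    keywords = frozenset().union(*(kw for kw, _ in records))
    db = [kw | ws for kw, ws in records]  # D1: one item set per record

    def count_all(candidates, transactions) -> Counts:
        return {c: sum(1 for t in transactions if c <= t) for c in candidates}

    def shrink(transactions, candidates, k):
        # Assumed reading of steps 2 and 6: drop records that hold no
        # candidate of size k or are too short to hold one of size k + 1.
        return [t for t in transactions
                if len(t) >= k + 1 and any(c <= t for c in candidates)]

    def is_mixed(itemset: FrozenSet[str]) -> bool:
        return bool(itemset & keywords) and bool(itemset - keywords)

    # Step 1: frequent single items.
    singles = count_all({frozenset([i]) for t in db for i in t}, db)
    freq1 = {next(iter(s)) for s, n in singles.items() if n >= min_count}
    # Step 2: pair every frequent keyword with every frequent website.
    candidates = {frozenset([q, w])
                  for q in freq1 & keywords for w in freq1 - keywords}
    result: Dict[int, Counts] = {}
    k = 2
    while candidates:
        counts = count_all(candidates, db)            # scan the current D
        frequent = {c: n for c, n in counts.items() if n >= min_count}
        db = shrink(db, candidates, k)                # build the next, smaller D
        if not frequent:
            break
        result[k] = frequent
        k += 1
        # Step 4: join itemsets that have k-2 items in common.
        joined = {a | b for a, b in combinations(frequent, 2) if len(a | b) == k}
        # Step 5: prune on the mixed (k-1)-subsets only.
        candidates = {c for c in joined
                      if all(frozenset(s) in frequent
                             for s in combinations(c, k - 1)
                             if is_mixed(frozenset(s)))}
    return result
```

Run on the five sample records of the paper's example with a minimum count of 3, this sketch ends with the frequent 3-itemsets ACZ and CFZ and rejects ACFZ at the pruning step.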
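Example. Table 1 lists five browsing records {T1, T2, T3, T4, T5}, with I = {Q, R, S, T, U, V, W, X, Y, Z} and J = {A, B, C, D, E, F, G, H, I, J}. The minimum support is 60% (a count of 3) and the minimum confidence is 60%.

Table 1
Record  Query keywords     Browsed websites
T1      V, W, X, Y, Z      A, B, C, G
T2      S, Q, R, Z         A, C, E, F
T3      Q, S, T, X         H, I, J
T4      R, U, W, X, Y, Z   C, D, F, H, I
T5      Q, T, V, Z         A, B, C, F, G, J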
Scanning D1 for itemset1 (minimum count 3) gives frequent1:
itemset1 (websites): A 3, B 2, C 4, D 1, E 1, F 3, G 2, H 2, I 2, J 2
itemset1 (query keywords): Q 3, R 2, S 2, T 2, U 1, V 2, W 2, X 3, Y 2, Z 4
frequent1: A 3, C 4, F 3, Q 3, X 3, Z 4
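Scanning D1 for itemset2 (minimum count 3):
itemset2: AQ 2, AX 1, AZ 3, CQ 2, CX 2, CZ 4, FQ 2, FX 1, FZ 3
frequent2: AZ 3, CZ 4, FZ 3

D2 (the records of D1 that contain an itemset2):
T1      V, W, X, Y, Z      A, B, C, G
T2      S, Q, R, Z         A, C, E, F
T4      R, U, W, X, Y, Z   C, D, F, H, I
T5      Q, T, V, Z         A, B, C, F, G, J

Scanning D2 for itemset3 (minimum count 3):
itemset3: ACZ 3, AFZ 2, CFZ 3
frequent3: ACZ 3, CFZ 3

D3 (the records of D2 that contain an itemset3):
T1      V, W, X, Y, Z      A, B, C, G
T2      S, Q, R, Z         A, C, E, F
T4      R, U, W, X, Y, Z   C, D, F, H, I
T5      Q, T, V, Z         A, B, C, F, G, J

The only itemset4, ACFZ, is deleted in step (5), so mining stops. Taking ACZ as an example, the rule $Z \Rightarrow AC$ has confidence 3/4 = 75%, so websites A and C are recommended for query keyword Z.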
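The step from frequent itemsets to ranked recommendations, as in the 3/4 = 75% rule above, might look like the following Python sketch. The function name `rank_rules`, the input layout and the tie-breaking order are assumptions made for illustration; the paper states only that recommendations are ranked by rule confidence.

```python
from typing import Dict, FrozenSet, Iterable, List, Tuple

Record = Tuple[FrozenSet[str], FrozenSet[str]]  # (query keywords, browsed websites)
Rule = Tuple[Tuple[str, ...], float]            # (recommended websites, confidence)


def rank_rules(query: Iterable[str], frequent: Dict[FrozenSet[str], int],
               records: List[Record], min_conf: float) -> List[Rule]:
    """List the consequents of the rules X => Y for query X, best confidence first."""
    x = frozenset(query)
    all_keywords = frozenset().union(*(kw for kw, _ in records))
    base = sum(1 for kw, _ in records if x <= kw)  # records whose keywords include X
    rules: List[Rule] = []
    if base == 0:
        return rules
    for itemset, count in frequent.items():
        if itemset & all_keywords != x:
            continue  # the keyword part of the itemset must be exactly X
        y = itemset - x
        conf = count / base
        if y and conf >= min_conf:
            rules.append((tuple(sorted(y)), conf))
    # Highest confidence first; ties broken alphabetically (an arbitrary choice).
    rules.sort(key=lambda rule: (-rule[1], rule[0]))
    return rules
```

The union of the returned consequents gives the full set of websites to recommend for the query.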
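If a query contains both "and" and "or", for example "X and Y or Z" with query keywords X, Y and Z, it is treated as the two keyword sets "X and Y" and "Z". If it contains only "and", for example "X and Y", it is treated as the single keyword set XY.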
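When the target query keywords $X$ are given, only the rules $X \Rightarrow Y$ with $X \subset I$ and $Y \subset J$ whose antecedent is $X$ need to be mined. Let $X$ contain $i$ query keywords ($i \ge 1$). AKW is modified into the AAK algorithm (mining association rules whose antecedents are the query keywords) for the target $X$. The steps of AAK are as follows.
1. Scan D1 to find frequent_1, and keep only the records that contain X, forming D2.
2. From step (1), combine X with each member of frequent_1 to generate itemset_{i+1}. Scan D2 to count itemset_{i+1} and find frequent_{i+1}; at the same time, keep only the records of D2 that contain an itemset_{i+1} and hold at least i+2 items, forming D3.
3. Read frequent_{i+k-1} (k > 1) and D_{k+1}.
4. From step (3), join the members of frequent_{i+k-1} that have i+k-2 items in common to generate itemset_{i+k}.
5. For each itemset_{i+k} from step (4), delete it if any of its subsets itemset_{i+k-1} (those containing X) is absent from the frequent_{i+k-1} of step (3).
6. Scan D_{k+1} to count the itemset_{i+k} remaining after step (5) and find frequent_{i+k}; at the same time, keep only the records of D_{k+1} that contain an itemset_{i+k} and hold at least i+k+1 items, forming D_{k+2}.
7. From frequent_{i+k}, generate the association rules $X \Rightarrow Y$, where $X \cup Y$ = frequent_{i+k} and $Y \subset J$.
8. Repeat from step (3) to find frequent_{i+k+1}, until no further rule $X \Rightarrow Y$ can be found.
When the rules for $X$ have $n$ different consequents $Y_j$ ($1 \le j \le n$), the websites recommended for $X$ are $Y_1 \cup Y_2 \cup \ldots \cup Y_n$.
Because X has i query keywords, AAK keeps only the records that contain X while it finds frequent_1. When it counts itemset_{i+k} (k > 1) to find frequent_{i+k}, it also removes the records that contain no itemset_{i+k} or hold fewer than i+k+1 items, so each pass of AAK reads a smaller database.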
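A Python sketch of the AAK steps follows. As with any reconstruction from step descriptions, it is an assumption-laden reading rather than the authors' code: the name `aak_frequent`, the record layout and the record-filtering rule are illustrative.

```python
from itertools import combinations
from typing import Dict, FrozenSet, Iterable, List, Tuple

Record = Tuple[FrozenSet[str], FrozenSet[str]]  # (query keywords, browsed websites)
Counts = Dict[FrozenSet[str], int]


def aak_frequent(records: List[Record], target: Iterable[str],
                 min_count: int) -> Dict[int, Counts]:
    """Return {size: {itemset: count}} for frequent itemsets that contain the target X."""
    x = frozenset(target)
    # Step 1: frequent websites over D1; D2 keeps only the records containing X.
    site_counts: Dict[str, int] = {}
    for _, websites in records:
        for w in websites:
            site_counts[w] = site_counts.get(w, 0) + 1
    freq_sites = {w for w, n in site_counts.items() if n >= min_count}
    db = [kw | ws for kw, ws in records if x <= kw]
    # Step 2: extend X with one frequent website at a time.
    candidates = {x | {w} for w in freq_sites}
    result: Dict[int, Counts] = {}
    size = len(x) + 1
    while candidates:
        counts = {c: sum(1 for t in db if c <= t) for c in candidates}
        frequent = {c: n for c, n in counts.items() if n >= min_count}
        # Assumed filter: keep records holding a candidate and long enough
        # to hold an itemset one item larger.
        db = [t for t in db
              if len(t) >= size + 1 and any(c <= t for c in candidates)]
        if not frequent:
            break
        result[size] = frequent
        size += 1
        # Step 4: join itemsets that differ in exactly one website.
        joined = {a | b for a, b in combinations(frequent, 2) if len(a | b) == size}
        # Step 5: every subset that still contains X must be frequent.
        candidates = {c for c in joined
                      if all(c - {w} in frequent for w in c - x)}
    return result
```

Because every candidate already contains X, only website-removing subsets need to be checked in the pruning step, which keeps the candidate sets small.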
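Example. Using the records of Table 1 with target query keyword Z, a minimum support of 60% (a count of 3) and a minimum confidence of 60%, the website recommendations for Z are mined as follows.

Scanning D1 for itemset1 (websites, minimum count 3):
itemset1: A 3, B 2, C 4, D 1, E 1, F 3, G 2, H 2, I 2, J 2
frequent1: A 3, C 4, F 3

D2 (the records of D1 that contain Z):
T1      V, W, X, Y, Z      A, B, C, G
T2      S, Q, R, Z         A, C, E, F
T4      R, U, W, X, Y, Z   C, D, F, H, I
T5      Q, T, V, Z         A, B, C, F, G, J

Scanning D2 for itemset2 (minimum count 3):
itemset2: AZ 3, CZ 4, FZ 3
frequent2: AZ 3, CZ 4, FZ 3

D3 (the records of D2 that contain an itemset2):
T1      V, W, X, Y, Z      A, B, C, G
T2      S, Q, R, Z         A, C, E, F
T4      R, U, W, X, Y, Z   C, D, F, H, I
T5      Q, T, V, Z         A, B, C, F, G, J

Scanning D3 for itemset3 (minimum count 3):
itemset3: ACZ 3, AFZ 2, CFZ 3
frequent3: ACZ 3, CFZ 3

D4 (the records of D3 that contain an itemset3):
T1      V, W, X, Y, Z      A, B, C, G
T2      S, Q, R, Z         A, C, E, F
T4      R, U, W, X, Y, Z   C, D, F, H, I
T5      Q, T, V, Z         A, B, C, F, G, J

The only itemset4, ACFZ, is deleted in step (5), so mining stops. Taking ACZ as an example, the rule $Z \Rightarrow AC$ has confidence 3/4 = 75%, so websites A and C are recommended for the target query keyword Z.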
[Figure 1: diagram with components labeled (1) and (2)]
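The system is implemented in C#. The test data are generated with the IBM Data Mining data generator (http://www.almaden.ibm.com/), which is used to produce two datasets of 50000 records each, D1 and D2. Table 2 gives the generation parameters n, ntran, np, tl and pl for D1 and D2. D1 and D2 are then combined into D3, which has 50000 records.

Table 2
Dataset  n     ntran  np     tl  pl
D1       1000  50000  10000  10  4
D2       1000  50000  10000  10  4
D3       D1 combined with D2; 50000 records

The items of D1 stand for the query keywords k1, k2, k3, ..., k1000, and the items of D2 stand for the websites w1, w2, w3, ..., w1000. D3 is stored in an Access 2003 database.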
[Figure 2: minimum support 1.5%, minimum confidence 70%]
[Figure 3: query "k1 and k5"]
Table 3
CPU                   Pentium IV 2.4 GHz
Main memory           256 MB
Operating system      Windows XP Professional
Programming language  C#
Database              Access 2003
70% Apriori
CDAR AKW 4 4 AKW Apriori CDAR
[Figure 4: Apriori, CDAR and AKW compared on 50000 records; horizontal axis 0.01 to 0.05, vertical axis 0 to 120; series: AKW, CDAR, Apriori]
70% AKW AAK () 10 5 5 AAK AKW
[Figure 5: AKW and AAK compared on 50000 records; horizontal axis 0.01 to 0.05, vertical axes 0 to 90 and 0 to 18; series: AKW, AAK]
4 Apriori CDAR Apriori CDAR
4 Apriori CDAR AKW AAK
itemsetk
k>1
k itemsetkk>1
itemsetk
k>1 itemsetk
k+1
itemseti+k i
k>1
itemseti+k i+k+1
1. 2. Huang, Lee Lin(2001)(feedback)
3. 4.
(2002)
(2007)
(2002)
(2002)
(2002)
(2002)
(2007)
Agrawal, R., & Srikant, R. (1994). Fast algorithms for mining association rules in large database. Proceedings of the 20th International Conference on Very Large Data Bases, 487-499.
Agrawal, R., Imielinski, T., & Swami, A. (1993). Mining association rules between sets of items in large databases. Proceedings of the ACM SIGMOD Conference on Management of Data, 207-216.
Chen, M. S., Han, J., & Yu, P. S. (1996). Data mining: An overview from a database perspective. IEEE Transactions on Knowledge and Data Engineering, 8(6), 866-883.
Fonseca, B. M., Golgher, P. B., de Moura, E. S. & Ziviani, N. (2003). Using association rules to discover search engines related queries. Proceedings of the First Latin American Web Congress, 66-71.
Han, J., & Kamber, M. (2006). Data mining: Concepts and techniques (2nd ed.). Morgan Kaufmann.
Han, J., Pei, J., Yin, Y., & Mao, R. (2004). Mining frequent patterns without candidate generation: A frequent-pattern tree approach. Data Mining and Knowledge Discovery, 8(1), 53-87.
Holt, J. D., & Chung, S. M. (2000). Mining association rules using inverted hashing and pruning. Information Processing Letters, 83, 211-220.
Hosseini, M., & Abolhassani, H. (2007). Mining search engine query log for evaluating content and structure of a web site. Proceedings of the IEEE/WIC/ACM International Conference on Web Intelligence, 235-241.
Huang, Y. P., Lee, Y. C., & Lin, K. (2001). An intelligent approach to mining the related websites. Proceedings of the International Conference on IFSA World Congress and 20th NAFIPS, 1, 435-440.
Li, Y., & Li, G. Y. (2007). Research and realization of personalized search engine based on ontology. Proceedings of the International Conference on Network and Parallel Computing Workshops, 1016-1020.
Li, Y., Chen, X. Z., & Yang, B. R. (2002). Research on web mining-based intelligent search engine. Proceedings of the First International Conference on Machine Learning and Cybernetics, 1, 386-390.
Li, Z. C., He, P. L., & Lei, M. (2005). A high efficient aprioriTid algorithm for mining association rule. Proceedings of the Fourth International Conference on Machine Learning and Cybernetics, 1812-1815.
Nettleton, D. F., Calderon-Benavides, L., & Baeza-Yates, R. (2006). Analysis of web search engine clicked documents. Proceedings of the Fourth Latin American Web Congress, 209-219.
Tsay, Y. J., & Chang-Chien, Y. W. (2004). An efficient cluster and decomposition algorithm for mining association rules. Information Sciences, 160, 161-171.