Using Association Rules to Construct Website Recommendations of Query Keywords

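Kaohsiung Normal University Journal (高雄師大學報) 2010, 28, 45-60

Submitted: March 17, 2010 (ROC year 99); accepted for publication: June 30, 2010 (ROC year 99)

Chui-cheng Chen 1, Tsung-yi Chen 2

Website search engines are among the most popular services in electronic commerce business models. This study uses the query keywords and clicked websites recorded by a search engine as the source data for mining, and uses association rules to find adaptive website recommendations for query keywords in two ways. First, it proposes a fast algorithm for mining association rules between query keywords and clicked websites. Second, taking a given query keyword as the mining target, it modifies that algorithm to mine the association rules whose antecedent itemset is the query keyword. The click tendencies revealed by these rules allow a search engine, when a keyword is entered, to provide adaptive website recommendations and to rank them by the confidence of the rules. Based on the two proposed algorithms, an adaptive website recommendation system for query keywords is designed and built, and the execution performance of the algorithms is evaluated experimentally.

Keywords: electronic commerce, association rules, query keywords, search engine, website recommendations

1 Associate Professor, Department of Information Management, Southern Taiwan University.
2 Assistant Professor, Department of Electronic Commerce Management, Nanhua University.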



  • 46

    Using Association Rules to Construct Website Recommendations of Query Keywords

    Chui-cheng Chen* Tsung-yi Chen**

    Abstract

    The website search engine is one of the most popular services in electronic commerce business models. This paper uses browsing data as the source data for mining, where each browsing record contains the query keywords entered into a search engine and the websites browsed. Association rules are used to find adaptive website recommendations for query keywords in two ways. The first is a fast algorithm for mining association rules between query keywords and browsed websites. The second takes given query keywords as the mining target and modifies the first algorithm to mine the association rules whose antecedents are those query keywords. Adaptive website recommendations, ranked by the confidence of the association rules, can then be provided for query keywords in a search engine. A mining system for adaptive website recommendations of query keywords is designed and built on both algorithms, and the performance of both algorithms is evaluated.

    Keywords: electronic commerce, association rules, query keywords, search engine, website recommendations

    * Associate Professor, Department of Information Management, Southern Taiwan University.
    ** Assistant Professor, Department of Electronic Commerce Management, Nanhua University.

  • 47

    (Internet)(World Wide Web)(search engine) Yahoo(http://www.yahoo.com) Google (http://www.google.com)

    (data mining)

    (Han & Kamber, 2006)

    (association rules)

    1.

    2.

    (sequence) (clustering) (classification) (forecasting) (Han & Kamber, 2006; Chen, Han & Yu, 1996)

  • 48

    (2007)(2002)

    (2002) (2002) (2002) (2002) (2007) Hosseini & Abolhassani (2007); Huang, Lee & Lin (2001)

    Li, Chen & Yang (2002); Fonseca, Golgher, de Moura & Ziviani (2003); Li, Y. & Li, G. Y. (2007) (ontology); Nettleton, Calderon-Benavides & Baeza-Yates (2006)

    Agrawal, Imielinski & Swami (1993)

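    An association rule relates itemsets (sets of items) and is written $X \Rightarrow Y$, where $X$ and $Y$ are itemsets and $X \cap Y = \emptyset$. $X$ is the antecedent and $Y$ is the consequent of the rule. The support of $X \Rightarrow Y$ is the fraction of transactions that contain $X \cup Y$, and its confidence is $\mathrm{support}(X \cup Y) / \mathrm{support}(X)$. A rule $X \Rightarrow Y$ holds when it meets both the minimum support and the minimum confidence.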
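    The two measures can be illustrated with a minimal Python sketch. The function names `support` and `confidence` and the four toy transactions (keywords `k1`, `k2` and websites `w1`, `w2`) are hypothetical and are not taken from the paper.

```python
from typing import FrozenSet, Iterable, List

Transaction = FrozenSet[str]


def support(itemset: Iterable[str], transactions: List[Transaction]) -> float:
    """Fraction of transactions that contain every item of `itemset`."""
    items = frozenset(itemset)
    return sum(1 for t in transactions if items <= t) / len(transactions)


def confidence(antecedent: Iterable[str], consequent: Iterable[str],
               transactions: List[Transaction]) -> float:
    """support(X union Y) / support(X) for the rule X => Y."""
    x = frozenset(antecedent)
    y = frozenset(consequent)
    support_x = support(x, transactions)
    if support_x == 0:
        raise ValueError("the antecedent never occurs in the transactions")
    return support(x | y, transactions) / support_x


# Hypothetical toy data: each transaction mixes query keywords and websites.
transactions = [
    frozenset({"k1", "w1", "w2"}),
    frozenset({"k1", "w1"}),
    frozenset({"k1", "w2"}),
    frozenset({"k2", "w1"}),
]

rule_support = support({"k1", "w1"}, transactions)          # 2 of 4 transactions
rule_confidence = confidence({"k1"}, {"w1"}, transactions)  # 2 of the 3 with k1
```

    With a minimum support of 50% and a minimum confidence of 60%, the rule k1 => w1 in this toy data would be accepted.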

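    An itemset that contains $k$ items ($k \ge 1$) is a $k$-itemset. The $k$-itemsets that meet the minimum support are the frequent itemsets, denoted frequentk. For brevity, an itemset such as {A, B, C} is written ABC. Methods for mining frequent itemsets include Agrawal & Srikant (1994), Han, Pei, Yin & Mao (2004), Holt & Chung (2000), Li, He & Lei (2005), and Tsay & Chang-Chien (2004).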

    Apriori (Agrawal & Srikant, 1994) Apriori

    1. frequentk-1k>1

  • 49

    2. (1) k-2 frequentk-1 itemsetk 3. (2) itemsetk itemsetk-1 (1) itemsetk

    4. (3) itemsetk frequentk

    5. frequentk 6. (1) frequentk+1 (4) itemsetk

    frequentkTsay Chang-Chien(2004) Apriori itemsetkk1 k CDAR
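    For reference, the level-wise Apriori procedure of Agrawal & Srikant (1994) can be sketched in Python as follows. This is a minimal textbook version, not the paper's implementation: the function name `apriori`, the use of a support count instead of a percentage, and the five toy transactions are assumptions made for illustration.

```python
from itertools import combinations
from typing import Dict, FrozenSet, List


def apriori(transactions: List[FrozenSet[str]],
            min_count: int) -> Dict[FrozenSet[str], int]:
    """Return every frequent itemset together with its support count."""
    # Level 1: count single items and keep the frequent ones.
    counts: Dict[FrozenSet[str], int] = {}
    for t in transactions:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    frequent = {s: c for s, c in counts.items() if c >= min_count}
    result = dict(frequent)

    k = 2
    while frequent:
        # Join: two frequent (k-1)-itemsets sharing k-2 items form a k-itemset.
        candidates = set()
        for a, b in combinations(list(frequent), 2):
            union = a | b
            if len(union) != k:
                continue
            # Prune: every (k-1)-subset of a candidate must itself be frequent.
            if all(frozenset(sub) in frequent
                   for sub in combinations(union, k - 1)):
                candidates.add(union)
        # Count: one scan of the transactions for the surviving candidates.
        counts = {c: 0 for c in candidates}
        for t in transactions:
            for c in candidates:
                if c <= t:
                    counts[c] += 1
        frequent = {s: c for s, c in counts.items() if c >= min_count}
        result.update(frequent)
        k += 1
    return result


# Hypothetical toy data.
toy = [frozenset("abc"), frozenset("ab"), frozenset("ac"),
       frozenset("bc"), frozenset("abc")]
frequent_itemsets = apriori(toy, min_count=3)
```

    In this toy data every single item and every pair is frequent, while abc occurs in only two transactions and is dropped at level 3.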

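    (1) $I = \{i_1, i_2, \ldots, i_a\}$ is the set of $a$ query keywords.

    (2) $J = \{j_1, j_2, \ldots, j_b\}$ is the set of $b$ websites.

    (3) $T = \{T_1, T_2, \ldots, T_k, \ldots, T_m\}$ is the set of $m$ transactions, where $T_k$ is the $k$-th transaction, $1 \le k \le m$.

    (4) Each transaction is $T_k = [X, Y]$, where $X \subseteq I$, $X \ne \emptyset$ is the set of query keywords and $Y \subseteq J$, $Y \ne \emptyset$ is the set of clicked websites. If $X$ has the form $X_1$ or $X_2$ or ... or $X_d$, the transaction is split into the $d$ transactions $[X_1, Y], [X_2, Y], \ldots, [X_d, Y]$.

    When a query combines "and" and "or", for example "A and B or C" over the keywords A, B, and C, it is split into "A and B" and "C".

    When a query uses only "and", for example "A and B", it is treated as the keyword set AB.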
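    The handling of "or" and "and" in a query can be sketched in Python. The function `split_query`, the string format of the query, and the website names `w1` and `w2` are hypothetical; the sketch assumes that "and" binds more tightly than "or", which is how the surviving example "A and B or C" reads.

```python
from typing import FrozenSet, List, Tuple

# One transaction is a pair [X, Y]: query keywords X and clicked websites Y.
Transaction = Tuple[FrozenSet[str], FrozenSet[str]]


def split_query(query: str, clicked: List[str]) -> List[Transaction]:
    """Split a query on ' or ' into one transaction per alternative.

    Keywords joined by ' and ' stay together in one keyword set, and every
    resulting transaction keeps the same set of clicked websites.
    """
    websites = frozenset(clicked)
    transactions: List[Transaction] = []
    for alternative in query.split(" or "):
        keywords = frozenset(
            k.strip() for k in alternative.split(" and ") if k.strip()
        )
        if keywords:
            transactions.append((keywords, websites))
    return transactions


# "A and B or C" yields two transactions: [AB, Y] and [C, Y].
parts = split_query("A and B or C", ["w1", "w2"])
```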

  • 50

    Apriori

    Apriori AKW(mining association rules between keywords and websites)

    AKW 1. D1 frequent1 2. (1) frequent1 frequent1 itemset2 D1 itemset2 frequent2 D1 itemset2 3 D2(3)

    3. frequentk-1k>2 Dk-1 4. (3) k-2 frequentk-1 itemsetk 5. (4) itemsetk itemsetk-1 (3)() itemsetk

    6. Dk-1(5) itemsetk frequentk Dk-1 itemsetk k+1 Dk

    7. frequentk XYXY=frequentkXIYJ

    8. (3) frequentk+1 XY

    X Y XY X Y n Yi1in Y1Y2Yn

    Apriori itemsetkk>1 frequentk itemsetk k+1

  • 51

    itemsetk+1 frequentk+1 AKW Apriori

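    Table 1 contains 5 transactions {T1, T2, T3, T4, T5}, with I = {Q, R, S, T, U, V, W, X, Y, Z} and J = {A, B, C, D, E, F, G, H, I, J}. The minimum support is 60% (a support count of 3) and the minimum confidence is 60%.

    Table 1

    Transaction   Query keywords      Clicked websites
    T1            V, W, X, Y, Z       A, B, C, G
    T2            S, Q, R, Z          A, C, E, F
    T3            Q, S, T, X          H, I, J
    T4            R, U, W, X, Y, Z    C, D, F, H, I
    T5            Q, T, V, Z          A, B, C, F, G, J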

    itemset1 and frequent1 (support counts from scanning D1; minimum support count 3)

    Website   Count      Keyword   Count
    A         3          Q         3
    B         2          R         2
    C         4          S         2
    D         1          T         2
    E         1          U         1
    F         3          V         2
    G         2          W         2
    H         2          X         3
    I         2          Y         2
    J         2          Z         4

    frequent1: A 3, C 4, F 3, Q 3, X 3, Z 4

  • 52

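    itemset2 and frequent2 (support counts from scanning D1; minimum support count 3)

    itemset2   Count
    AQ         2
    AX         1
    AZ         3
    CQ         2
    CX         2
    CZ         4
    FQ         2
    FX         1
    FZ         3

    frequent2: AZ 3, CZ 4, FZ 3

    D2

    T1   V, W, X, Y, Z       A, B, C, G
    T2   S, Q, R, Z          A, C, E, F
    T4   R, U, W, X, Y, Z    C, D, F, H, I
    T5   Q, T, V, Z          A, B, C, F, G, J

    itemset3 and frequent3 (support counts from scanning D2; minimum support count 3)

    itemset3   Count
    ACZ        3
    AFZ        2
    CFZ        3

    frequent3: ACZ 3, CFZ 3

    D3

    T1   V, W, X, Y, Z       A, B, C, G
    T2   S, Q, R, Z          A, C, E, F
    T4   R, U, W, X, Y, Z    C, D, F, H, I
    T5   Q, T, V, Z          A, B, C, F, G, J

    The only itemset4 is ACFZ, which has a support count of 2 and is not frequent. From ACZ in frequent3, the association rule $Z \Rightarrow AC$ has a confidence of 3/4 = 75%, so websites A and C can be recommended when keyword Z is queried.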

    andorX and Y or ZX, Y, Z X and YZ

    andandX and YXY
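    The worked example on Table 1 can be checked with a brute-force Python sketch. It is not the AKW algorithm: it simply enumerates every combination of frequent keywords and frequent websites, which is only practical for a table this small. The names `TABLE_1`, `nonempty_subsets`, and `mine_rules` are illustrative.

```python
from itertools import combinations

# Table 1: (query keywords, clicked websites) for T1..T5.
TABLE_1 = [
    ({"V", "W", "X", "Y", "Z"}, {"A", "B", "C", "G"}),
    ({"S", "Q", "R", "Z"}, {"A", "C", "E", "F"}),
    ({"Q", "S", "T", "X"}, {"H", "I", "J"}),
    ({"R", "U", "W", "X", "Y", "Z"}, {"C", "D", "F", "H", "I"}),
    ({"Q", "T", "V", "Z"}, {"A", "B", "C", "F", "G", "J"}),
]


def nonempty_subsets(items):
    """Yield every non-empty subset of `items`, smallest subsets first."""
    ordered = sorted(items)
    for size in range(1, len(ordered) + 1):
        for combo in combinations(ordered, size):
            yield frozenset(combo)


def mine_rules(table, min_count, min_conf):
    """Return (keywords, websites, support count, confidence) per rule."""
    def count(keywords, websites=frozenset()):
        return sum(1 for k, w in table if keywords <= k and websites <= w)

    all_keywords = set().union(*(k for k, _ in table))
    all_websites = set().union(*(w for _, w in table))
    frequent_k = {i for i in all_keywords
                  if count(frozenset([i])) >= min_count}
    frequent_w = {i for i in all_websites
                  if count(frozenset(), frozenset([i])) >= min_count}

    rules = []
    for x in nonempty_subsets(frequent_k):
        x_count = count(x)
        if x_count < min_count:
            continue  # the antecedent itself is not frequent
        for y in nonempty_subsets(frequent_w):
            xy_count = count(x, y)
            if xy_count >= min_count and xy_count / x_count >= min_conf:
                rules.append(("".join(sorted(x)), "".join(sorted(y)),
                              xy_count, xy_count / x_count))
    return rules


rules = mine_rules(TABLE_1, min_count=3, min_conf=0.6)
```

    Each tuple in `rules` reads as keywords => websites; the rule Z => AC appears with a support count of 3 and a confidence of 0.75, matching the 3/4 = 75% of the example.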

  • 53

    XX

    XYXIYJXY

    X Y Y X ii1 AKW AAK (mining association rules those antecedents are the query keywords) X

    1. D1 frequent1 X D2

    2. (1) X frequent1 itemseti+1 D2itemseti+1 frequenti+1 D2 itemseti+1 i+2 D3

    3. frequenti+k-1k>1 Dk+1 4. (3) i+k-2 frequenti+k-1 itemseti+k 5. (4) itemseti+k itemseti+k-1

    (3)( X) itemseti+k 6. Dk+1(5) itemseti+k frequentk Dk+1 itemseti+k i+k+1 Dk+2

    7. frequentk XYXY=frequentkYJ

    8. (3) frequenti+k+1 XY

    X Y XY X Y n Yj1jn Y1Y2Yn

    X i X frequent1 X itemseti+kk>1 frequenti+k itemseti+k i+k+1 AAK

  • 54

    X

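    Using Table 1 with the query keyword Z as the mining target, a minimum support of 60% (a support count of 3), and a minimum confidence of 60%, the association rules whose antecedent is Z are mined as follows.

    itemset1 and frequent1 (website support counts from scanning D1; minimum support count 3)

    Website   Count
    A         3
    B         2
    C         4
    D         1
    E         1
    F         3
    G         2
    H         2
    I         2
    J         2

    frequent1: A 3, C 4, F 3

    D2 (the transactions that contain Z)

    T1   V, W, X, Y, Z       A, B, C, G
    T2   S, Q, R, Z          A, C, E, F
    T4   R, U, W, X, Y, Z    C, D, F, H, I
    T5   Q, T, V, Z          A, B, C, F, G, J

    itemset2 and frequent2 (minimum support count 3)

    itemset2   Count
    AZ         3
    CZ         4
    FZ         3

    frequent2: AZ 3, CZ 4, FZ 3

    D3

    T1   V, W, X, Y, Z       A, B, C, G
    T2   S, Q, R, Z          A, C, E, F
    T4   R, U, W, X, Y, Z    C, D, F, H, I
    T5   Q, T, V, Z          A, B, C, F, G, J

    itemset3 and frequent3 (minimum support count 3)

    itemset3   Count
    ACZ        3
    AFZ        2
    CFZ        3

    frequent3: ACZ 3, CFZ 3

    D4

    T1   V, W, X, Y, Z       A, B, C, G
    T2   S, Q, R, Z          A, C, E, F
    T4   R, U, W, X, Y, Z    C, D, F, H, I
    T5   Q, T, V, Z          A, B, C, F, G, J

    The only itemset4 is ACFZ, which has a support count of 2 and is not frequent.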

  • 55

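    From ACZ in frequent3, the association rule $Z \Rightarrow AC$ has a confidence of 3/4 = 75%, so websites A and C can be recommended for the query keyword Z.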
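    Mining only the rules whose antecedent is a given query keyword, and ranking the recommendations by confidence, can be sketched in Python as follows. This is a simplified illustration on Table 1, not the AAK algorithm itself; the names `TABLE_1` and `recommend` and the tie-breaking order are assumptions.

```python
from itertools import combinations

# Table 1: (query keywords, clicked websites) for T1..T5.
TABLE_1 = [
    ({"V", "W", "X", "Y", "Z"}, {"A", "B", "C", "G"}),
    ({"S", "Q", "R", "Z"}, {"A", "C", "E", "F"}),
    ({"Q", "S", "T", "X"}, {"H", "I", "J"}),
    ({"R", "U", "W", "X", "Y", "Z"}, {"C", "D", "F", "H", "I"}),
    ({"Q", "T", "V", "Z"}, {"A", "B", "C", "F", "G", "J"}),
]


def recommend(table, target, min_count, min_conf):
    """Rank website sets for the query keywords in `target` by confidence."""
    target = frozenset(target)
    # Keep only the website lists of transactions containing the target.
    matching = [websites for keywords, websites in table if target <= keywords]
    if not matching:
        return []
    # Websites that are frequent together with the target keywords.
    frequent_sites = sorted(
        site for site in set().union(*matching)
        if sum(1 for w in matching if site in w) >= min_count
    )
    ranked = []
    for size in range(1, len(frequent_sites) + 1):
        for combo in combinations(frequent_sites, size):
            hits = sum(1 for w in matching if set(combo) <= w)
            conf = hits / len(matching)  # support(target + sites) / support(target)
            if hits >= min_count and conf >= min_conf:
                ranked.append(("".join(combo), conf))
    # Highest confidence first; ties are broken alphabetically (an assumption).
    ranked.sort(key=lambda rule: (-rule[1], rule[0]))
    return ranked


ranking = recommend(TABLE_1, {"Z"}, min_count=3, min_conf=0.6)
```

    Restricting the scan to the four transactions that contain Z is what keeps the search small; the first entry of `ranking` is website C, which every transaction containing Z also clicked.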

    1

    (1) (2)

    (2)

    (1)

    1

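    The system is implemented in C#. The synthetic data generator of the IBM data mining project (http://www.almaden.ibm.com/) is used to produce two datasets of 50000 transactions each, D1 and D2. Table 2 lists the parameters of D1 and D2: n is the number of items, ntran the number of transactions, np the number of patterns, tl the average transaction length, and pl the average pattern length. D1 and D2 are combined into D3, which contains 50000 transactions.

    Table 2

    Dataset   n      ntran   np      tl   pl
    D1        1000   50000   10000   10   4
    D2        1000   50000   10000   10   4
    D3        D1 and D2 combined, 50000 transactions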

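  • 56

    The items of D1 are treated as the query keywords k1, k2, k3, ..., k1000 and the items of D2 as the websites w1, w2, w3, ..., w1000. The combined dataset D3 is stored in an Access 2003 database.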
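    How two item datasets might be combined into keyword-website transactions can be sketched in Python. Everything here is an assumption made for illustration: `random_transactions` is a uniform random stand-in and not the IBM generator, the pairing of the i-th transaction of D1 with the i-th transaction of D2 is a guess at how D3 was formed, and only 50 transactions are produced to keep the example fast.

```python
import random
from typing import FrozenSet, List, Tuple


def random_transactions(prefix: str, n_items: int, n_trans: int,
                        length: int, seed: int) -> List[FrozenSet[str]]:
    """Stand-in generator: `n_trans` transactions of `length` distinct items.

    Items are named prefix1..prefixN, for example k1..k1000 or w1..w1000.
    """
    rng = random.Random(seed)
    names = [f"{prefix}{i}" for i in range(1, n_items + 1)]
    return [frozenset(rng.sample(names, length)) for _ in range(n_trans)]


def pair_datasets(d1: List[FrozenSet[str]], d2: List[FrozenSet[str]]
                  ) -> List[Tuple[FrozenSet[str], FrozenSet[str]]]:
    """Pair the i-th keyword set with the i-th website set (assumed scheme)."""
    if len(d1) != len(d2):
        raise ValueError("both datasets need the same number of transactions")
    return list(zip(d1, d2))


d1 = random_transactions("k", n_items=1000, n_trans=50, length=10, seed=1)
d2 = random_transactions("w", n_items=1000, n_trans=50, length=10, seed=2)
d3 = pair_datasets(d1, d2)
```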

    2

    ( 1.5%)( 70%)

    2

    3 ( k1 and k5 )

    3

    2 3

    3

    Table 3

    CPU                    Pentium IV 2.4 GHz
    Main memory            256 MB
    Operating system       Windows XP Professional
    Programming language   C#
    Database               Access 2003

    70% Apriori

  • 57

    CDAR AKW 4 4 AKW Apriori CDAR

    [Figure 4: comparison of Apriori, CDAR, and AKW for minimum support from 0.01 to 0.05; number of transactions: 50000]

    70% AKW AAK () 10 5 5 AAK AKW

    [Figure 5: comparison of AKW and AAK for minimum support from 0.01 to 0.05; number of transactions: 50000; two vertical scales, 0 to 90 and 0 to 18]

  • 58

    4 Apriori CDAR Apriori CDAR

    4 Apriori CDAR AKW AAK

    itemsetk

    k>1

    k itemsetkk>1

    itemsetk

    k>1 itemsetk

    k+1

    itemseti+k i

    k>1

    itemseti+k i+k+1

    1. 2. Huang, Lee Lin(2001)(feedback)

    3. 4.

  • 59

    (2002)

    (2007)

    (2002)

    (2002)

    (2002)

    (2002)

    (2007)

    Agrawal, R., & Srikant, R. (1994). Fast algorithms for mining association rules in large databases. Proceedings of the 20th International Conference on Very Large Data Bases, 487-499.

    Agrawal, R., Imielinski, T., & Swami, A. (1993). Mining association rules between sets of items in large databases. Proceedings of the ACM SIGMOD Conference on Management of Data, 207-216.

    Chen, M. S., Han, J., & Yu, P. S. (1996). Data mining: An overview from a database perspective. IEEE Transactions on Knowledge and Data Engineering, 8(6), 866-883.

    Fonseca, B. M., Golgher, P. B., de Moura, E. S., & Ziviani, N. (2003). Using association rules to discover search engines related queries. Proceedings of the First Latin American Web Congress, 66-71.

    Han, J., & Kamber, M. (2006). Data mining: Concepts and techniques (2nd ed.). Morgan Kaufmann.

    Han, J., Pei, J., Yin, Y., & Mao, R. (2004). Mining frequent patterns without candidate generation: A frequent-pattern tree approach. Data Mining and Knowledge Discovery, 8(1), 53-87.

    Holt, J. D., & Chung, S. M. (2000). Mining association rules using inverted hashing and pruning. Information Processing Letters, 83, 211-220.

    Hosseini, M., & Abolhassani, H. (2007). Mining search engine query log for evaluating content and structure of a web site. Proceedings of the IEEE/WIC/ACM International Conference on Web Intelligence, 235-241.

    Huang, Y. P., Lee, Y. C., & Lin, K. (2001). An intelligent approach to mining the related websites. Proceedings of the International Conference on IFSA World Congress and 20th NAFIPS, 1, 435-440.

  • 60

    Li, Y., & Li, G. Y. (2007). Research and realization of personalized search engine based on ontology. Proceedings of the International Conference on Network and Parallel Computing Workshops, 1016-1020.

    Li, Y., Chen, X. Z., & Yang, B. R. (2002). Research on web mining-based intelligent search engine. Proceedings of the First International Conference on Machine Learning and Cybernetics, 1, 386-390.

    Li, Z. C., He, P. L., & Lei, M. (2005). A high efficient AprioriTid algorithm for mining association rule. Proceedings of the Fourth International Conference on Machine Learning and Cybernetics, 1812-1815.

    Nettleton, D. F., Calderon-Benavides, L., & Baeza-Yates, R. (2006). Analysis of web search engine clicked documents. Proceedings of the Fourth Latin American Web Congress, 209-219.

    Tsay, Y. J., & Chang-Chien, Y. W. (2004). An efficient cluster and decomposition algorithm for mining association rules. Information Sciences, 160, 161-171.