  • 2010, 28, 45-60

    99/3/17; 99/6/30 (ROC calendar dates: March 17, 2010 and June 30, 2010)


  • 46

    Using Association Rules to Construct Website Recommendations of Query Keywords

    Chui-cheng Chen* Tsung-yi Chen**

    Abstract

    The website search engine is one of the most popular services in electronic commerce business models. This paper uses browsing data as the source for mining; each browsing record contains the query keywords entered into a search engine and the websites browsed. Association rules are used to find adaptive website recommendations for query keywords from two perspectives. The first is a fast algorithm for mining association rules between query keywords and browsed websites. The second takes given query keywords as the mining target and modifies the first algorithm to mine only association rules whose antecedents are those query keywords. Adaptive website recommendations, ranked by the confidence of the association rules, can then be provided for query keywords in a search engine. A mining system for adaptive website recommendations of query keywords is designed and built on both algorithms, and the performance of both algorithms is evaluated.

    Keywords: electronic commerce, association rules, query keywords, search engine, website recommendations

    * Associate Professor, Department of Information Management, Southern Taiwan University. ** Assistant Professor, Department of Electronic Commerce Management, Nanhua University.

  • 47

    Internet, World Wide Web, search engines such as Yahoo (http://www.yahoo.com) and Google (http://www.google.com)

    (data mining)

    (Han & Kamber, 2006)

    (association rules)

    1.

    2.

    Sequence, clustering, classification and forecasting (Han & Kamber, 2006; Chen, Han & Yu, 1996)

  • 48

    (2007)(2002)

    (2002); (2002); (2002); (2002); (2007); Hosseini and Abolhassani (2007); Huang, Lee and Lin (2001)

    Li, Chen and Yang (2002); Fonseca, Golgher, de Moura and Ziviani (2003); Li, Y. and Li, G. Y. (2007) (ontology); Nettleton, Calderon-Benavides and Baeza-Yates (2006)

    Association rules were introduced by Agrawal, Imielinski and Swami (1993).

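    Items are grouped into itemsets. An association rule is an implication $X \Rightarrow Y$, where $X$ and $Y$ are itemsets with $X \cap Y = \emptyset$; $X$ is the antecedent and $Y$ the consequent. The support of $X \Rightarrow Y$ is the fraction of transactions that contain $X \cup Y$, $\mathrm{support}(X \Rightarrow Y) = P(X \cup Y)$, and its confidence is $\mathrm{confidence}(X \Rightarrow Y) = P(Y \mid X) = \mathrm{support}(X \cup Y) / \mathrm{support}(X)$. A rule $X \Rightarrow Y$ holds when its support reaches the minimum support and its confidence reaches the minimum confidence.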
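The support and confidence definitions can be made concrete in a few lines of Python. This is a minimal sketch and not code from the paper. The function names (`support_count`, `support`, `confidence`, `is_strong`) and the toy transactions are illustrative assumptions.

```python
from typing import FrozenSet, Iterable, List

Transaction = FrozenSet[str]


def support_count(itemset: Iterable[str], transactions: List[Transaction]) -> int:
    """Number of transactions that contain every item of `itemset`."""
    items = frozenset(itemset)
    return sum(1 for t in transactions if items <= t)


def support(itemset: Iterable[str], transactions: List[Transaction]) -> float:
    """support as a fraction of all transactions."""
    return support_count(itemset, transactions) / len(transactions)


def confidence(x: Iterable[str], y: Iterable[str], transactions: List[Transaction]) -> float:
    """confidence(X => Y) = support(X ∪ Y) / support(X); 0.0 if X never occurs."""
    x_set = frozenset(x)
    x_count = support_count(x_set, transactions)
    if x_count == 0:
        return 0.0
    return support_count(x_set | frozenset(y), transactions) / x_count


def is_strong(x, y, transactions, min_sup: float, min_conf: float) -> bool:
    """A rule holds when both its support and its confidence reach the thresholds."""
    xy = frozenset(x) | frozenset(y)
    return support(xy, transactions) >= min_sup and confidence(x, y, transactions) >= min_conf


if __name__ == "__main__":
    # Hypothetical toy transactions, not data from the paper.
    db = [frozenset(t) for t in (["a", "b"], ["a", "b", "c"], ["b"], ["a", "c"])]
    print(support(["a", "b"], db))                # 0.5
    print(confidence(["a"], ["b"], db))           # 2 of the 3 records with "a" also hold "b"
    print(is_strong(["a"], ["b"], db, 0.5, 0.6))  # True
```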

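    An itemset of $k$ items is a $k$-itemset ($k \ge 1$). A $k$-itemset whose support reaches the minimum support is a frequent $k$-itemset, and the set of frequent $k$-itemsets is denoted $\mathit{frequent}_k$. For example, the rule $A \wedge B \Rightarrow C$ can be derived from the frequent itemset $\{A, B, C\}$. Association rule mining algorithms include those of Agrawal and Srikant (1994); Han, Pei, Yin and Mao (2004); Holt and Chung (2000); Li, He and Lei (2005); and Tsay and Chang-Chien (2004).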

    The Apriori algorithm (Agrawal & Srikant, 1994) proceeds as follows.

    1. Take $\mathit{frequent}_{k-1}$, the frequent $(k-1)$-itemsets ($k > 1$).

  • 49

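    2. From step (1), join pairs of itemsets in $\mathit{frequent}_{k-1}$ that share their first $k-2$ items to generate the candidate set $\mathit{itemset}_k$.

    3. From step (2), delete from $\mathit{itemset}_k$ every candidate that has a $(k-1)$-subset not in $\mathit{frequent}_{k-1}$ from step (1).

    4. From step (3), scan the database to count each candidate in $\mathit{itemset}_k$ and obtain $\mathit{frequent}_k$.

    5. If $\mathit{frequent}_k$ is empty, stop.

    6. Otherwise, return to step (1) to find $\mathit{frequent}_{k+1}$. Each pass repeats the full database scan of step (4) for its $\mathit{itemset}_k$.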
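The Apriori steps can be sketched in Python as follows. This is a minimal illustration of classic Apriori under our own assumptions, not the original authors' code. Transactions are `frozenset`s, the threshold is an absolute support count `min_count`, and `apriori` is a hypothetical function name.

```python
from itertools import combinations
from typing import Dict, FrozenSet, List

Itemset = FrozenSet[str]


def apriori(transactions: List[FrozenSet[str]], min_count: int) -> Dict[Itemset, int]:
    """Return every frequent itemset with its support count, level by level."""
    # frequent_1: one scan counts the single items.
    counts: Dict[Itemset, int] = {}
    for t in transactions:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    frequent = {s: c for s, c in counts.items() if c >= min_count}
    result = dict(frequent)
    k = 2
    while frequent:
        prev = sorted(tuple(sorted(s)) for s in frequent)
        candidates = set()
        for a, b in combinations(prev, 2):
            # Join (step 2): two frequent (k-1)-itemsets sharing their first k-2 items.
            if a[:-1] != b[:-1]:
                continue
            cand = frozenset(a + (b[-1],))
            # Prune (step 3): every (k-1)-subset must be in frequent_{k-1}.
            if all(frozenset(sub) in frequent for sub in combinations(sorted(cand), k - 1)):
                candidates.add(cand)
        # Count (step 4): one full database scan per level.
        counts = {c: 0 for c in candidates}
        for t in transactions:
            for c in candidates:
                if c <= t:
                    counts[c] += 1
        frequent = {s: c for s, c in counts.items() if c >= min_count}
        result.update(frequent)  # the loop stops once frequent_k is empty (step 5)
        k += 1
    return result


if __name__ == "__main__":
    # Hypothetical toy data: {a, b, c} occurs in only 2 records, so it is not frequent.
    db = [frozenset(t) for t in ("abc", "ab", "ac", "bc", "abc")]
    for itemset, count in sorted(apriori(db, 3).items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
        print(sorted(itemset), count)
```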

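    Apriori scans the database once for every $\mathit{itemset}_k$ ($k \ge 1$) to obtain $\mathit{frequent}_k$. Tsay and Chang-Chien (2004) proposed the CDAR algorithm to reduce this cost.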

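    (1) $I = \{i_1, i_2, \ldots, i_a\}$ is the set of $a$ query keywords.

    (2) $J = \{j_1, j_2, \ldots, j_b\}$ is the set of $b$ websites.

    (3) $T = \{T_1, T_2, \ldots, T_k, \ldots, T_m\}$ is the set of $m$ browsing records, where $T_k$ is the $k$-th record, $1 \le k \le m$.

    (4) Each record is written $T_k = [X, Y]$, where $X \subseteq I$ is its set of query keywords and $Y \subseteq J$ its set of browsed websites. If the query is "$X_1$ or $X_2$ or ... or $X_d$" ($d$ alternatives), the record is split into the $d$ records $[X_1, Y], [X_2, Y], \ldots, [X_d, Y]$.

    A query containing both "and" and "or", such as "A and B or C", is split at "or" into the records for "A and B" and for "C".

    A query containing only "and", such as "A and B", becomes the single keyword set $\{A, B\}$.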
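The record-splitting rules can be sketched as a small parser. The paper does not state a query syntax, so this sketch assumes plain strings with lowercase `and`/`or` separated by single spaces, no parentheses, and `and` binding tighter than `or`. `split_query_record` is a hypothetical helper.

```python
from typing import FrozenSet, List, Tuple

Record = Tuple[FrozenSet[str], FrozenSet[str]]  # [X, Y]: keyword set, website set


def split_query_record(query: str, websites: List[str]) -> List[Record]:
    """Turn one browsing record into one [X_d, Y] record per 'or' alternative."""
    y = frozenset(websites)
    records: List[Record] = []
    for alternative in query.split(" or "):
        # Keywords joined by 'and' form a single keyword set.
        x = frozenset(k.strip() for k in alternative.split(" and ") if k.strip())
        if x:
            records.append((x, y))
    return records


if __name__ == "__main__":
    for x, y in split_query_record("A and B or C", ["w1", "w2"]):
        print(sorted(x), sorted(y))
```

For "A and B or C" this produces one record for the keyword set {A, B} and one for {C}, both with the same browsed websites.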

  • 50

    Apriori

    Based on Apriori, we propose AKW (mining association rules between keywords and websites).

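    AKW proceeds as follows.

    1. Scan $D_1$ (the original browsing records) to find $\mathit{frequent}_1$.

    2. Join the websites in $\mathit{frequent}_1$ from step (1) with the query keywords in $\mathit{frequent}_1$ to generate $\mathit{itemset}_2$. Scan $D_1$ to count $\mathit{itemset}_2$ and obtain $\mathit{frequent}_2$. During this scan, the records of $D_1$ that contain at least one candidate in $\mathit{itemset}_2$ are written to $D_2$ for use in step (3).

    3. Take $\mathit{frequent}_{k-1}$ ($k > 2$) and $D_{k-1}$.

    4. From step (3), join pairs of itemsets in $\mathit{frequent}_{k-1}$ that share $k-2$ items to generate $\mathit{itemset}_k$.

    5. From step (4), delete every candidate in $\mathit{itemset}_k$ that has a $(k-1)$-subset containing both keywords and websites that is not in $\mathit{frequent}_{k-1}$ from step (3).

    6. Scan $D_{k-1}$ to count the candidates of $\mathit{itemset}_k$ that survive step (5) and obtain $\mathit{frequent}_k$. The records of $D_{k-1}$ that contain at least one of these candidates are written to $D_k$ for pass $k+1$.

    7. For each itemset in $\mathit{frequent}_k$, generate the rule $X \Rightarrow Y$ with $X \cup Y$ equal to that itemset, $X \subseteq I$ and $Y \subseteq J$. Keep the rules that reach the minimum confidence.

    8. Return to step (3) to find $\mathit{frequent}_{k+1}$ and its rules $X \Rightarrow Y$.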

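    In a rule $X \Rightarrow Y$, $X$ is a set of query keywords and $Y$ a set of websites. If $Y$ contains $n$ websites $Y_i$ ($1 \le i \le n$), the websites $Y_1, Y_2, \ldots, Y_n$ are recommended for the query keywords $X$.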
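A compact Python sketch of AKW follows. It is our own reading of the eight steps, not the authors' implementation, and rests on three assumptions:

* Keyword and website names never collide, so each record can be a single merged `frozenset`.
* Any two frequent itemsets whose union has exactly $k$ items are joined.
* Pruning checks only the $(k-1)$-subsets that still contain both a keyword and a website.

The names `akw` and `_count_and_reduce` are hypothetical.

```python
from itertools import combinations
from typing import Dict, FrozenSet, List, Tuple

Record = FrozenSet[str]  # keywords and websites of one browsing record, merged
Rule = Tuple[FrozenSet[str], FrozenSet[str], float]  # (X, Y, confidence)


def _count_and_reduce(cands, records):
    """Count every candidate and keep only the records that contain at least one."""
    counts = {c: 0 for c in cands}
    kept = []
    for t in records:
        hit = False
        for c in cands:
            if c <= t:
                counts[c] += 1
                hit = True
        if hit:
            kept.append(t)
    return counts, kept


def akw(d1: List[Record], keywords: FrozenSet[str], min_count: int, min_conf: float) -> List[Rule]:
    """Mine keyword => website rules level by level, shrinking the database each pass."""

    def mixed(s: FrozenSet[str]) -> bool:
        # True when s holds at least one keyword and at least one website.
        return bool(s & keywords) and bool(s - keywords)

    # Step 1: frequent_1 from one scan of D1.
    singles: Dict[str, int] = {}
    for t in d1:
        for item in t:
            singles[item] = singles.get(item, 0) + 1
    f1 = {i for i, c in singles.items() if c >= min_count}

    # Step 2: itemset_2 pairs a frequent website with a frequent keyword; D2 keeps matching records.
    cands = {frozenset((w, q)) for w in f1 - keywords for q in f1 & keywords}
    counts, d_k = _count_and_reduce(cands, d1)
    frequent = {s: c for s, c in counts.items() if c >= min_count}
    all_frequent = dict(frequent)

    k = 3
    while frequent:
        # Steps 4-5: join two itemsets whose union has k items, prune on mixed (k-1)-subsets.
        cands = set()
        for a, b in combinations(frequent, 2):
            u = a | b
            if len(u) == k and all(
                frozenset(s) in frequent for s in combinations(u, k - 1) if mixed(frozenset(s))
            ):
                cands.add(u)
        if not cands:
            break
        # Step 6: count in D_{k-1}; records still holding a candidate become D_k.
        counts, d_k = _count_and_reduce(cands, d_k)
        frequent = {s: c for s, c in counts.items() if c >= min_count}
        all_frequent.update(frequent)
        k += 1

    # Step 7: one rule X => Y per frequent itemset, X = its keywords, Y = its websites.
    rules: List[Rule] = []
    for itemset, c in all_frequent.items():
        x, y = itemset & keywords, itemset - keywords
        x_count = sum(1 for t in d1 if x <= t)
        if x_count and c / x_count >= min_conf:
            rules.append((x, y, c / x_count))
    return rules


if __name__ == "__main__":
    # Table 1, one letter per item: keywords Q..Z, websites A..J.
    db = [frozenset(r) for r in ("VWXYZABCG", "SQRZACEF", "QSTXHIJ", "RUWXYZCDFHI", "QTVZABCFGJ")]
    for x, y, conf in akw(db, frozenset("QRSTUVWXYZ"), min_count=3, min_conf=0.6):
        print(sorted(x), "=>", sorted(y), f"{conf:.0%}")
```

On the Table 1 records (support count 3, confidence 60%), every rule returned has antecedent Z. One of them is $Z \Rightarrow AC$ at 75%.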

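    Apriori scans the entire database to count $\mathit{itemset}_k$ ($k > 1$) and obtain $\mathit{frequent}_k$. AKW instead keeps only the records that contain candidates in $\mathit{itemset}_k$ for pass $k+1$. As a result, $\mathit{itemset}_{k+1}$ is counted, and $\mathit{frequent}_{k+1}$ obtained, over a smaller database, which reduces AKW's scanning cost relative to Apriori.

  • 51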

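    Table 1 lists five browsing records, $T = \{T_1, T_2, T_3, T_4, T_5\}$, over the query keywords $I = \{Q, R, S, T, U, V, W, X, Y, Z\}$ and the websites $J = \{A, B, C, D, E, F, G, H, I, J\}$. The minimum support is 60% (a support count of 3) and the minimum confidence is 60%.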

    Table 1

    Record   Query keywords       Websites
    T1       V, W, X, Y, Z        A, B, C, G
    T2       S, Q, R, Z           A, C, E, F
    T3       Q, S, T, X           H, I, J
    T4       R, U, W, X, Y, Z     C, D, F, H, I
    T5       Q, T, V, Z           A, B, C, F, G, J

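    Scanning $D_1$ (the records of Table 1) gives the support counts of $\mathit{itemset}_1$. With a minimum support count of 3, this yields $\mathit{frequent}_1$:

    $\mathit{itemset}_1$ (websites): A 3, B 2, C 4, D 1, E 1, F 3, G 2, H 2, I 2, J 2
    $\mathit{itemset}_1$ (keywords): Q 3, R 2, S 2, T 2, U 1, V 2, W 2, X 3, Y 2, Z 4
    $\mathit{frequent}_1$: A 3, C 4, F 3, Q 3, X 3, Z 4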

  • 52

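    Joining the frequent websites {A, C, F} with the frequent keywords {Q, X, Z} gives $\mathit{itemset}_2$, which is counted in $D_1$:

    $\mathit{itemset}_2$: AQ 2, AX 1, AZ 3, CQ 2, CX 2, CZ 4, FQ 2, FX 1, FZ 3
    $\mathit{frequent}_2$: AZ 3, CZ 4, FZ 3

    $D_2$ consists of the records of $D_1$ that contain at least one candidate in $\mathit{itemset}_2$. T3 is dropped:

    T1       V, W, X, Y, Z        A, B, C, G
    T2       S, Q, R, Z           A, C, E, F
    T4       R, U, W, X, Y, Z     C, D, F, H, I
    T5       Q, T, V, Z           A, B, C, F, G, J

    $\mathit{itemset}_3$ (counted in $D_2$): ACZ 3, AFZ 2, CFZ 3
    $\mathit{frequent}_3$: ACZ 3, CFZ 3
    $D_3$: T1, T2, T4, T5 (every record of $D_2$ contains a candidate in $\mathit{itemset}_3$)

    $\mathit{itemset}_4$: ACFZ is deleted in step (5) because its subset AFZ is not in $\mathit{frequent}_3$.

    From ACZ, the rule $Z \Rightarrow AC$ has confidence $3/4 = 75\%$ (the support count of ACZ over that of Z). This reaches the 60% minimum confidence, so websites A and C are recommended for the query keyword Z.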
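Recounting the worked example by hand is error-prone, so a short script can check the tables. The sketch below is our own and not part of the paper. It encodes Table 1 with one letter per item and recomputes four things:

* the $\mathit{itemset}_2$ counts in $D_1$;
* the reduced database $D_2$;
* the $\mathit{itemset}_3$ counts in $D_2$;
* the confidence of $Z \Rightarrow AC$.

The helper names (`items`, `count`, `reduce_db`) are hypothetical.

```python
# Table 1 records: (query keywords, websites), one letter per item.
TABLE_1 = {
    "T1": ("VWXYZ", "ABCG"),
    "T2": ("SQRZ", "ACEF"),
    "T3": ("QSTX", "HIJ"),
    "T4": ("RUWXYZ", "CDFHI"),
    "T5": ("QTVZ", "ABCFGJ"),
}


def items(record):
    """All items of one record: its keywords plus its websites."""
    keywords, websites = record
    return set(keywords) | set(websites)


def count(itemset, records):
    """Support count: records containing every letter of `itemset`."""
    return sum(1 for r in records.values() if set(itemset) <= items(r))


def reduce_db(candidates, records):
    """Keep only the records that contain at least one candidate."""
    return {rid: r for rid, r in records.items()
            if any(set(c) <= items(r) for c in candidates)}


ITEMSET_2 = ["AQ", "AX", "AZ", "CQ", "CX", "CZ", "FQ", "FX", "FZ"]
counts_2 = {c: count(c, TABLE_1) for c in ITEMSET_2}
D2 = reduce_db(ITEMSET_2, TABLE_1)
counts_3 = {c: count(c, D2) for c in ["ACZ", "AFZ", "CFZ"]}
conf_z_ac = count("ACZ", TABLE_1) / count("Z", TABLE_1)

if __name__ == "__main__":
    print("itemset_2 in D1:", counts_2)
    print("D2:", sorted(D2))  # T3 contains no candidate, so it is dropped
    print("itemset_3 in D2:", counts_3)
    print("confidence(Z => AC):", conf_z_ac)  # 0.75
```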

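    When a query mixes "and" and "or", for example "X and Y or Z" with keywords X, Y and Z, the recommendations for "X and Y" and those for "Z" are both returned.

    When a query uses only "and", for example "X and Y", the recommendations come from the rules whose antecedent is $\{X, Y\}$.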

  • 53

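    The second algorithm takes a given set of query keywords $X$ as the mining target. It finds only the rules $X \Rightarrow Y$ with $X \subseteq I$ and $Y \subseteq J$.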

    Let $X$ contain $i$ query keywords ($i \ge 1$). Modifying AKW, we propose AAK (mining association rules whose antecedents are the query keywords) for a given $X$. Its steps are:

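    1. Scan $D_1$ to find $\mathit{frequent}_1$ (the frequent websites), and copy the records that contain $X$ into $D_2$.

    2. Join $X$ with each website in $\mathit{frequent}_1$ from step (1) to generate $\mathit{itemset}_{i+1}$. Scan $D_2$ to count $\mathit{itemset}_{i+1}$ and obtain $\mathit{frequent}_{i+1}$. The records of $D_2$ that contain at least one candidate in $\mathit{itemset}_{i+1}$ are written to $D_3$ for pass $i+2$.

    3. Take $\mathit{frequent}_{i+k-1}$ ($k > 1$) and $D_{k+1}$.

    4. From step (3), join pairs of itemsets in $\mathit{frequent}_{i+k-1}$ that share $i+k-2$ items to generate $\mathit{itemset}_{i+k}$.

    5. From step (4), delete every candidate in $\mathit{itemset}_{i+k}$ that has an $(i+k-1)$-subset containing $X$ that is not in $\mathit{frequent}_{i+k-1}$ from step (3).

    6. Scan $D_{k+1}$ to count the candidates of $\mathit{itemset}_{i+k}$ that survive step (5) and obtain $\mathit{frequent}_{i+k}$. The records of $D_{k+1}$ that contain at least one of these candidates are written to $D_{k+2}$ for pass $i+k+1$.

    7. For each itemset in $\mathit{frequent}_{i+k}$, generate the rule $X \Rightarrow Y$ with $X \cup Y$ equal to that itemset and $Y \subseteq J$. Keep the rules that reach the minimum confidence.

    8. Return to step (3) to find $\mathit{frequent}_{i+k+1}$ and its rules $X \Rightarrow Y$.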

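    In a rule $X \Rightarrow Y$, $X$ is the given set of query keywords and $Y$ a set of websites. If $Y$ contains $n$ websites $Y_j$ ($1 \le j \le n$), the websites $Y_1, Y_2, \ldots, Y_n$ are recommended for the query keywords $X$.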

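    Here $X$ contains $i$ keywords, and only itemsets that contain $X$ are counted, starting from $X$ joined with the websites in $\mathit{frequent}_1$. For $\mathit{itemset}_{i+k}$ ($k > 1$), $\mathit{frequent}_{i+k}$ is found in a database that holds only records containing $X$. Only the records that contain candidates in $\mathit{itemset}_{i+k}$ are kept for pass $i+k+1$, which keeps AAK's scans small.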
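AAK can be sketched the same way. This sketch is not the authors' code and makes the following assumptions:

* `target` is the keyword set $X$ and `websites` is $J$.
* Support thresholds are absolute counts over $D_1$, as in the worked example.
* Two frequent itemsets are joined whenever their union has exactly one more item.

`aak` is a hypothetical name.

```python
from itertools import combinations
from typing import Dict, FrozenSet, List, Tuple

Rule = Tuple[FrozenSet[str], FrozenSet[str], float]  # (X, Y, confidence)


def aak(d1: List[FrozenSet[str]], target: FrozenSet[str], websites: FrozenSet[str],
        min_count: int, min_conf: float) -> List[Rule]:
    """Mine only rules target => Y, where Y is a set of websites."""
    # Step 1: one scan of D1 counts websites and copies the records containing X into D2.
    site_counts: Dict[str, int] = {}
    db = []
    for t in d1:
        for w in t & websites:
            site_counts[w] = site_counts.get(w, 0) + 1
        if target <= t:
            db.append(t)
    x_count = len(db)  # support count of X itself
    if x_count == 0:
        return []
    f1 = {w for w, c in site_counts.items() if c >= min_count}

    # Step 2: itemset_{i+1} is X plus one frequent website.
    cands = {target | {w} for w in f1}
    all_frequent: Dict[FrozenSet[str], int] = {}
    size = len(target) + 1
    while cands:
        # Steps 2 and 6: count in the current database, keep only records with a candidate.
        counts = {c: 0 for c in cands}
        kept = []
        for t in db:
            hit = False
            for c in cands:
                if c <= t:
                    counts[c] += 1
                    hit = True
            if hit:
                kept.append(t)
        db = kept
        frequent = {c: n for c, n in counts.items() if n >= min_count}
        all_frequent.update(frequent)
        # Steps 4-5: join on shared items, prune if a subset containing X is infrequent.
        size += 1
        cands = set()
        for a, b in combinations(frequent, 2):
            u = a | b
            if len(u) == size and all((u - {w}) in frequent for w in u - target):
                cands.add(u)

    # Step 7: rules X => Y with confidence count(X ∪ Y) / count(X).
    return [(target, s - target, n / x_count)
            for s, n in all_frequent.items() if n / x_count >= min_conf]


if __name__ == "__main__":
    # Table 1, one letter per item: keywords Q..Z, websites A..J.
    db = [frozenset(r) for r in ("VWXYZABCG", "SQRZACEF", "QSTXHIJ", "RUWXYZCDFHI", "QTVZABCFGJ")]
    for x, y, conf in aak(db, frozenset("Z"), frozenset("ABCDEFGHIJ"), 3, 0.6):
        print(sorted(x), "=>", sorted(y), f"{conf:.0%}")
```

With the Table 1 records, target Z, a support count of 3 and 60% confidence, the returned rules include $Z \Rightarrow AC$ at 75%.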

  • 54

    X

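    Using the records of Table 1 with the target keyword $X = \{Z\}$, a minimum support of 60% (a support count of 3) and a minimum confidence of 60%: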

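    Scanning $D_1$ gives the website support counts ($\mathit{itemset}_1$). With a minimum support count of 3, this yields $\mathit{frequent}_1$:

    $\mathit{itemset}_1$: A 3, B 2, C 4, D 1, E 1, F 3, G 2, H 2, I 2, J 2
    $\mathit{frequent}_1$: A 3, C 4, F 3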

    $D_2$ consists of the records of $D_1$ that contain Z:

    T1       V, W, X, Y, Z        A, B, C, G
    T2       S, Q, R, Z           A, C, E, F
    T4       R, U, W, X, Y, Z     C, D, F, H, I
    T5       Q, T, V, Z           A, B, C, F, G, J

    $\mathit{itemset}_2$ (counted in $D_2$): AZ 3, CZ 4, FZ 3
    $\mathit{frequent}_2$: AZ 3, CZ 4, FZ 3
    $D_3$: T1, T2, T4, T5

    $\mathit{itemset}_3$ (counted in $D_3$): ACZ 3, AFZ 2, CFZ 3
    $\mathit{frequent}_3$: ACZ 3, CFZ 3
    $D_4$: T1, T2, T4, T5

    $\mathit{itemset}_4$: ACFZ is deleted in step (5) because its subset AFZ, which contains Z, is not in $\mathit{frequent}_3$.

  • 55

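    From ACZ, the rule $Z \Rightarrow AC$ has confidence $3/4 = 75\%$, so websites A and C are recommended for the query keyword Z.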
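The recommendations are ranked by the confidence of the association rules. One minimal way to do that is sketched below. It assumes rules are stored as `(antecedent, websites, confidence)` tuples and that a query matches a rule only when its keyword set equals the antecedent exactly. `rank_recommendations` is a hypothetical name, and the rules in the example are illustrative.

```python
from typing import FrozenSet, List, Tuple

Rule = Tuple[FrozenSet[str], FrozenSet[str], float]  # (X, Y, confidence)


def rank_recommendations(query: FrozenSet[str], rules: List[Rule]) -> List[Tuple[str, float]]:
    """Websites recommended for `query`, highest-confidence rule first."""
    matches = [(sorted(y), conf) for x, y, conf in rules if x == query]
    # Sort by descending confidence; website names break ties for a stable order.
    matches.sort(key=lambda m: (-m[1], m[0]))
    return [(", ".join(sites), conf) for sites, conf in matches]


if __name__ == "__main__":
    # Hypothetical rule set for illustration.
    rules = [
        (frozenset({"Z"}), frozenset({"A", "C"}), 0.75),
        (frozenset({"Z"}), frozenset({"C"}), 1.0),
        (frozenset({"Q"}), frozenset({"A"}), 0.9),
    ]
    for sites, conf in rank_recommendations(frozenset({"Z"}), rules):
        print(f"{sites}: {conf:.0%}")
```

For the query Z, this prints "C: 100%" followed by "A, C: 75%". Exact matching is the simplest choice; a system could instead combine the results of the "or" alternatives of a query.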

    1

    (1) (2)

    (2)

    (1)

    1

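    The mining system was implemented in C#. Test data were generated with IBM Data Mining (http://www.almaden.ibm.com/): two datasets, D1 and D2, with the parameters n, ntran, np, tl and pl listed in Table 2. D1 and D2 were then combined into D3 with 50000 records.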

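    Table 2

    Dataset   n      ntran   np      tl   pl
    D1        1000   50000   10000   10   4
    D2        1000   50000   10000   10   4
    D3        D1 and D2 combined, 50000 records

    n: number of items; ntran: number of transactions; np: number of patterns; tl: average transaction length; pl: average pattern length.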

    The items of D1 are used as the query keywords k1, k2, k3, …, k1000, and the items of D2 as the websites w1, w2, w3, …, w1000. D3 is stored in Access 2003.

  • 56
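The text does not say exactly how D1 and D2 are combined. This sketch assumes that the t-th record of D3 pairs the t-th record of D1 (items renamed k1 to k1000) with the t-th record of D2 (items renamed w1 to w1000), and that item ids are 1-based. `build_d3` is a hypothetical helper, and storage in Access 2003 is not shown.

```python
from typing import FrozenSet, List, Sequence, Tuple

BrowsingRecord = Tuple[FrozenSet[str], FrozenSet[str]]  # (query keywords, websites)


def build_d3(d1: Sequence[Sequence[int]], d2: Sequence[Sequence[int]]) -> List[BrowsingRecord]:
    """Pair D1 and D2 record by record (an assumption), renaming items to k_i and w_j."""
    if len(d1) != len(d2):
        raise ValueError("D1 and D2 must have the same number of records")
    return [
        (frozenset(f"k{i}" for i in keywords), frozenset(f"w{j}" for j in sites))
        for keywords, sites in zip(d1, d2)
    ]


if __name__ == "__main__":
    # Two hypothetical 2-record transaction lists with 1-based item ids.
    d3 = build_d3([[1, 5], [2]], [[7], [3, 1000]])
    for keywords, sites in d3:
        print(sorted(keywords), sorted(sites))
```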

    2

    (minimum support 1.5%) (minimum confidence 70%)

    2

    3 ( k1 and k5 )

    3

    2 3

    3

    Table 3

    CPU                    Pentium IV 2.4 GHz
    Main memory            256 MB
    Operating system       Windows XP Professional
    Programming language   C#
    Database               Access 2003

    70% Apriori

  • 57

    CDAR AKW 4 4 AKW Apriori CDAR

    Plot (50000 records): AKW, CDAR and Apriori against minimum support from 0.01 to 0.05; y-axis from 0 to 120.

    Figure 4. Apriori, CDAR and AKW

    70% AKW AAK () 10 5 5 AAK AKW

    Plot (50000 records): AKW and AAK against minimum support from 0.01 to 0.05; left y-axis from 0 to 90, right y-axis from 0 to 18.

    Figure 5. AKW and AAK

  • 58

    4 Apriori CDAR Apriori CDAR

    Table 4. Apriori, CDAR, AKW and AAK

    itemsetk

    k>1

    k itemsetkk>1

    itemsetk

    k>1 itemsetk

    k+1

    itemseti+k i

    k>1

    itemseti+k i+k+1

    1. 2. Huang, Lee and Lin (2001) (feedback)

    3. 4.

  • 59

    (2002)

    (2007)

    (2002)

    (2002)

    (2002)

    (2002)

    (2007)

    Agrawal, R., & Srikant, R. (1994). Fast algorithms for mining association rules in large databases. Proceedings of the 20th International Conference on Very Large Data Bases, 487-499.

    Agrawal, R., Imielinski, T., & Swami, A. (1993). Mining association rules between sets of items in large databases. Proceedings of the ACM SIGMOD Conference on Management of Data, 207-216.

    Chen, M. S., Han, J., & Yu, P. S. (1996). Data mining: An overview from a database perspective. IEEE Transactions on Knowledge and Data Engineering, 8(6), 866-883.

    Fonseca, B. M., Golgher, P. B., de Moura, E. S., & Ziviani, N. (2003). Using association rules to discover search engines related queries. Proceedings of the First Latin American Web Congress, 66-71.

    Han, J., & Kamber, M. (2006). Data mining: Concepts and techniques (2nd ed.). Morgan Kaufmann.

    Han, J., Pei, J., Yin, Y., & Mao, R. (2004). Mining frequent patterns without candidate generation: A frequent-pattern tree approach. Data Mining and Knowledge Discovery, 8(1), 53-87.

    Holt, J. D., & Chung, S. M. (2000). Mining association rules using inverted hashing and pruning. Information Processing Letters, 83, 211-220.

    Hosseini, M., & Abolhassani, H. (2007). Mining search engine query log for evaluating content and structure of a web site. Proceedings of the IEEE/WIC/ACM International Conference on Web Intelligence, 235-241.

    Huang, Y. P., Lee, Y. C., & Lin, K. (2001). An intelligent approach to mining the related websites. Proceedings of the International Conference on IFSA World Congress and 20th NAFIPS, 1, 435-440.

  • 60

    Li, Y., & Li, G. Y. (2007). Research and realization of personalized search engine based on ontology. Proceedings of the International Conference on Network and Parallel Computing Workshops, 1016-1020.

    Li, Y., Chen, X. Z., & Yang, B. R. (2002). Research on web mining-based intelligent search engine. Proceedings of the First International Conference on Machine Learning and Cybernetics, 1, 386-390.

    Li, Z. C., He, P. L., & Lei, M. (2005). A high efficient AprioriTid algorithm for mining association rule. Proceedings of the Fourth International Conference on Machine Learning and Cybernetics, 1812-1815.

    Nettleton, D. F., Calderon-Benavides, L., & Baeza-Yates, R. (2006). Analysis of web search engine clicked documents. Proceedings of the Fourth Latin American Web Congress, 209-219.

    Tsay, Y. J., & Chang-Chien, Y. W. (2004). An efficient cluster and decomposition algorithm for mining association rules. Information Sciences, 160, 161-171.

