
Detection of Preference Shift Timing

using Time-Series Clustering

Fuyuko Ito, Tomoyuki Hiroyasu, Mitsunori Miki and Hisatake Yokouchi

Abstract: Recommendation methods help online users to purchase products more easily by presenting products that are likely to match their preferences. In these methods, user profiles are constructed according to past activities on the site. When a user accesses an e-commerce site, the user's preferences may change during the course of web shopping. We call this a preference shift in this paper. However, conventional recommendation methods suppose that user profiles are static, and therefore these methods cannot follow the preference shift. Here, a novel product recommendation method is proposed that responds to the preference shift. The aim of this recommendation method is to have users remain at the site longer than before. This paper discusses a method for detecting the preference shift timing using time-series clustering. In the proposed method, the products preferred by a user are clustered and the preference shift timing is detected as a change in the clustering results.

I. INTRODUCTION

There is increasing demand for e-commerce sites because they allow larger numbers of products to be presented than physical stores, providing vendors with increased sales opportunities and consumers with greater choice. Recommendation methods used in these services extract a user's profile based on the user's activity and present information that suits the obtained profile. For example, Amazon.com¹ has attempted to increase sales opportunities by presenting products that are likely to be purchased by the user based on their purchase history.

However, the user's preference may change during shopping on the web, a situation that we refer to here as a preference shift. Conventional recommendation methods cannot follow the preference shift because these methods assume that user profiles are static. Moreover, increasing the time a user spends on the e-commerce site can increase sales opportunities. To lead users to remain at a site longer than before, it is necessary to update the user's preferences constantly and to be able to induce a preference shift by presenting certain products, because a user's purchase can be changed by visual priming on the e-commerce site [9], [11], [12].

In this paper, a method to detect the timing of the

preference shift using time-series clustering is discussed. In

the proposed method, the products preferred by a user are

Fuyuko Ito is with the Graduate School of Engineering, Doshisha University, Kyoto, Japan, and is a research fellow of the Japan Society for the Promotion of Science. e-mail: [email protected]

Tomoyuki Hiroyasu and Hisatake Yokouchi are with the Department of Life and Medical Sciences, Doshisha University.

Mitsunori Miki is with the Department of Science and Engineering, Doshisha University.

¹ http://amazon.com/

clustered and the timing of the preference shift is determined

from changes in the features of obtained clusters. The outline

of this paper is as follows. The next section describes the

user preference model in the proposed method. In section

III, the preference shift is defined and we discuss ways of

detecting the timing of the preference shift. In section IV, we

discuss an experiment performed to investigate which cluster

features can be used to determine the preference shift timing

with some artificial data. Finally, we present our conclusions

in section V.

II. USER'S PREFERENCE MODEL FOR PRODUCT RECOMMENDATION

A. Preference Model in Conventional Recommendation

Methods

In general, recommendation methods are classified into the following three types [1]: collaborative filtering [7], [19], content-based filtering, and hybrid approaches. In content-based filtering, it is necessary to build a model of a user's profile based on the preference information acquired from his/her purchase history. First, a target product is represented as a vector consisting of a number of features. There are several approaches to modeling a user's preference in the feature space, namely, detecting preferred regions in the feature space, and representing suitability according to the user's preference as a fitness function. In the former approach, preferred regions are detected based on the user's preferred products, and the suitability of a given product according to this preference is predicted by the similarity between that product and another preferred product. In the latter approach, the selection of products to present is optimized by maximization of fitness. The input of the fitness function is a product and the output is the fitness of the product for the preference. However, the fitness function as the preference model is not known a priori. Therefore, some methods optimize product presentation by predicting the fitness function interactively based on the user's preferences [2], [10], [20]. In this study, the preference on the e-commerce site was defined as the tendency toward the user's ideal product. Hence, the preferred regions in the product feature space were identified interactively based on the products preferred by the user.

B. Detection of User's Preference in Feature Space using Clustering

In this study, the user's preference is defined as a set of preferred products, which are defined as those clicked by the user as a metric of user interest. Meanwhile, a product is described as a feature vector in the feature space. For

FUZZ-IEEE 2009, Korea, August 20-24, 2009

978-1-4244-3597-5/09/$25.00 ©2009 IEEE



Fig. 2. Detecting regions corresponding to the user's preference by clustering.

example, when clothes are the target products, a product is represented as a combination of the values of various features, such as color, material, sleeve length, etc. However, the user's preference may have multiple tendencies (see Fig. 1). For this reason, a set of preferred products is clustered in the feature space so that multiple tendencies of the user's preferences can be obtained. The capability of this method has already been confirmed in a subjective experiment in which application of clustering to the preferred products was shown to be able to acquire the multiple preferences of the user [5], [6]. In that experiment, the multiple preferred regions were specified in the feature space from the clustering result and the products included in the specified regions were presented (see Fig. 2). The results verified that the multiple preferences can be obtained appropriately based on clustering of the preferred products and that the products presented from the specified regions suited the user's preference.

III. USER'S PREFERENCE SHIFT ON PRODUCT RECOMMENDATION

A. Time-Series Clustering

The preference shift on product recommendation is defined as the change in a user's tendency regarding the ideal product during web shopping. For example, a user may be looking for a dress to wear to a party on an e-commerce site. She may initially begin looking for a black dress, but notice vividly colored dresses while shopping. She may then begin to search for dresses that are pink, orange, green, blue, etc. If a dress is represented as a vector consisting of two elements, color and price, all dresses clicked by the user can be mapped to the feature space as shown in Fig. 3.

Fig. 1. Each region corresponds to each preference.

Fig. 3. An example of the preference shift.

As mentioned above, a preference shift appears as a change in the clustering result when the user clicks on a product. Therefore, the clustering result of clicked dresses varies as the search advances (see Fig. 3). However, the following items must be considered when clustering is applied to the time-series data.

• How to select the data for clustering

• How to suppress drastic changes in the clustering result

The phrase "time-series clustering" in this paper means applying clustering to the data per unit of time as shown in Fig. 3, and differs from the concept of clustering time-series data such as waves [17]. One of the simplest methods of time-series clustering is to apply clustering to all stored preferred data each time the user clicks a product. However, if data have been stored for a long time, the clustering result may not change even when small amounts of new data with characteristics different from most of the stored data are added. Therefore, it is necessary to select data for clustering. Here, the sliding window technique is used: a fixed number of the newest data are selected as the sample data of the window.

Moreover, when clustering is applied to the stored data



independently each time the user clicks a product, the cluster structure obtained before the click may change dramatically. Hence, in the proposed method, constraints from past clustering results are added to the clustering of the current data to avoid drastic changes in the clustering result.

B. Detection of Preference Shift Timing

In the proposed method, each time the data of a preferred product are stored and all stored data are clustered, the clustering result is compared with the previous one to find the preference shift timing. However, it is not known which feature of the cluster should be compared to detect the preference shift timing. For this reason, this paper discusses which features of the cluster can be used to judge when the clustering result has changed.

IV. DISCUSSION OF CLUSTER FEATURES FOR

DETECTION OF THE PREFERENCE SHIFT TIMING

A. Experimental Overview

In this experiment, clustering was applied to the incremen-

tal time-series data of the preferred products and the features

of the cluster were examined to determine the preference shift

timing. The experimental data, the clustering algorithm, the

method for identification of the relevance between two states

of the same cluster, and the features of the cluster were as

described below.

1) Experimental Data: The feature space of the data is a two-dimensional space and a datum $x$ is described as $x = (x_0, x_1)$, where $x_0$ ($0 \le x_0 \le 16$) and $x_1$ ($0 \le x_1 \le 16$) are real numbers. Test data sets, each including 24 data ($1 \le t \le 24$), are generated by an agent implementing the following preferences. Test data (1) to (3) each represent a possible model of the preference shift on an e-commerce site; test data (4) contains no preference shift.

Test data (1): Preference shift of a single preference

As shown in Fig. 4(a), the preferred region was set

as region (1) and region (2) in the first and second

halves of the search, respectively. First, twelve data

were generated randomly and uniformly in region (1)

and then an additional twelve data were generated in

region (2) in the same way. Therefore, the preference

shift timing of these test data was t = 13.

Test data (2): Preference shift of one of two preferences

In the first two thirds of the search, the preferred regions were set as regions (1) and (2) (see Fig. 4(b)). In the remaining third, regions (2) and (3) were set as the preferred regions. Thus, region (2) was preferred by the agent for the whole search. First, the agent generated eight and four data in regions (1) and (2), respectively. The order of generation in these regions was randomized. Then, four and eight data were generated in regions (2) and (3), respectively. The preference shift timing of these test data was t = 14 because the first datum generated in region (3) appeared at t = 14.

Test data (3): Simultaneous preference shifts of two

preferences toward a new preference

Fig. 4. The agent generates data in the regions defined as the user's preference.

The preferred regions were set as region (1) and region

(2) in the first two thirds of the search, and eight data

were generated randomly in each region (see Fig. 4(c)).

In the remaining third, the preferred region was set

as region (3) and the last eight data were generated

randomly in region (3). Therefore, the preference shift

timing of this test data was t = 17.

Test data (4): Without preference shift

The agent randomly generated 24 data in region (1)

throughout the whole search. In this test data, the

preference did not change (see Fig. 4(d)).
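The data-generating agent for test data (1) can be sketched in Python as follows. This is only an illustration under stated assumptions: the rectangular region bounds are hypothetical, because the actual regions are defined only graphically in Fig. 4, and the names `sample_region` and `make_test_data_1` and the fixed seed are not from the paper.

```python
import random

# Hypothetical axis-aligned regions (x0_min, x0_max, x1_min, x1_max) inside
# the 16 x 16 feature space; the paper defines the real regions only in Fig. 4.
REGION_1 = (2.0, 6.0, 2.0, 6.0)
REGION_2 = (10.0, 14.0, 10.0, 14.0)


def sample_region(region, rng):
    """Draw one datum uniformly at random from a rectangular region."""
    x0_min, x0_max, x1_min, x1_max = region
    return (rng.uniform(x0_min, x0_max), rng.uniform(x1_min, x1_max))


def make_test_data_1(rng):
    """Test data (1): twelve data in region (1), then twelve in region (2).

    Index 0 of the returned list corresponds to t = 1, so the first datum
    from region (2) sits at index 12, i.e. t = 13.
    """
    first_half = [sample_region(REGION_1, rng) for _ in range(12)]
    second_half = [sample_region(REGION_2, rng) for _ in range(12)]
    return first_half + second_half


data = make_test_data_1(random.Random(0))
print(len(data))
```

The other test data sets could be produced the same way by changing which regions are sampled in each part of the search.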

2) Clustering Algorithm: The algorithm to detect communities in a network proposed by Newman [14] was employed and extended to handle weighted networks in this experiment. This method is a hierarchical clustering algorithm that seeks a division of the nodes of a network with a high density of within-cluster edges and a lower density of between-cluster edges by maximizing the quality function modularity $Q$. Therefore, this method can automatically determine the number of clusters. Here, a $k \times k$ symmetric matrix $e$ is defined, whose element $e_{ij}$ is the fraction of all edges in the network that link nodes in cluster $A_i$ to nodes in cluster $A_j$. Then, $a_i = \sum_j e_{ij}$ is calculated. Therefore, $e_{ii}$ indicates the fraction of edges that link nodes in cluster $A_i$ to nodes in the same cluster, and $a_i$ describes the fraction of all edge ends attached to nodes in cluster $A_i$. $Q$ is designed to emphasize the connection within a cluster and diminish the connection between clusters, as shown in the following equation.



Fig. 5. $d_c(t)$ is the distance between the centroids of two clusters. $d_S(t)$ is the difference between the spaces of two clusters.

$$Q = \sum_{i} \left( e_{ii} - a_i^2 \right) \tag{1}$$
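To make Equation (1) concrete, the following Python sketch evaluates $Q$ for a given division of a weighted network. It assumes, following Newman's formulation, that the entries of $e$ are normalized by the total edge weight; the function name `modularity` and the list-of-lists matrix representation are illustrative choices, not the authors' implementation. The sketch only scores one division and does not perform the hierarchical search for the best one.

```python
def modularity(weights, labels):
    """Compute Q = sum_i (e_ii - a_i^2) for a weighted, undirected network.

    weights: symmetric matrix (list of lists); weights[u][v] is the edge
             weight between nodes u and v (0 where there is no edge).
    labels:  labels[u] is the cluster index of node u.
    """
    n = len(weights)
    total = sum(weights[u][v] for u in range(n) for v in range(n))
    if total == 0:
        return 0.0
    k = max(labels) + 1
    # e[i][j]: share of the total edge weight joining cluster i to cluster j.
    e = [[0.0] * k for _ in range(k)]
    for u in range(n):
        for v in range(n):
            e[labels[u]][labels[v]] += weights[u][v] / total
    # a[i]: share of edge ends attached to nodes in cluster i.
    a = [sum(row) for row in e]
    return sum(e[i][i] - a[i] ** 2 for i in range(k))


# Toy network: two disconnected pairs of nodes, (0, 1) and (2, 3).
w = [
    [0, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 1],
    [0, 0, 1, 0],
]
print(modularity(w, [0, 0, 1, 1]))  # one cluster per pair: 0.5
print(modularity(w, [0, 0, 0, 0]))  # everything in one cluster: 0.0
```

Separating the two pairs scores higher than merging them, which is the behavior the maximization of $Q$ relies on.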

In the proposed method, the clustering method mentioned above is applied to a weighted network in which the weight of an edge is the degree of relevance between two products, whereas in Newman's original method the relevance between two nodes is described only by the existence of an edge. The degree of relevance between two data, $x_i$ and $x_j$, is defined as the inverse of the distance between them in the feature space, as shown in the following equation.

$$\mathrm{Similarity}(x_i, x_j) = \frac{1}{\mathrm{Distance}(x_i, x_j)} \tag{2}$$
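Equation (2) can be turned into the edge-weight matrix of the network as sketched below. The paper does not name the distance measure, so Euclidean distance is an assumption here, as is the handling of coincident points; `similarity` and `similarity_matrix` are hypothetical helper names.

```python
import math


def similarity(xi, xj):
    """Equation (2): inverse of the distance between two data.

    Assumption: Euclidean distance in the feature space.
    """
    distance = math.dist(xi, xj)
    if distance == 0.0:
        # Assumption: the paper does not say how coincident points are
        # handled; here they are given infinite similarity.
        return math.inf
    return 1.0 / distance


def similarity_matrix(data):
    """Build the symmetric edge-weight matrix of the weighted network."""
    n = len(data)
    w = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            w[i][j] = w[j][i] = similarity(data[i], data[j])
    return w


points = [(0.0, 0.0), (3.0, 4.0), (6.0, 8.0)]
weights = similarity_matrix(points)
```

The diagonal is left at zero so that a node has no edge to itself.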

Meanwhile, the latest $n$ data are utilized as the samples of a window for clustering. In this experiment, $n$, the number of sample data for a window, was set to 9. The constraint from past clustering results was not added in this experiment.
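The window selection described above can be sketched with a bounded queue. The class name `SlidingWindow` and the placeholder data are illustrative assumptions; only the window size of 9 and the 24 time steps come from the experiment.

```python
from collections import deque


class SlidingWindow:
    """Keep only the latest n clicked products as the clustering sample."""

    def __init__(self, n=9):
        self.n = n
        # A deque with maxlen drops the oldest entry automatically.
        self._items = deque(maxlen=n)

    def add(self, datum):
        self._items.append(datum)

    def is_full(self):
        return len(self._items) == self.n

    def sample(self):
        return list(self._items)


window = SlidingWindow(n=9)
first_clustering_step = None
for t in range(1, 25):          # t = 1, ..., 24 as in the experiment
    window.add(("datum", t))    # placeholder for a clicked product
    if window.is_full() and first_clustering_step is None:
        first_clustering_step = t

print(first_clustering_step)
```

With this setup the window first fills at the ninth click, and at the end of the search it holds the data from t = 16 to t = 24.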

3) Identification of the Relevance between Two States of the Same Cluster: To verify the time-series variation of a certain cluster, it is necessary to identify which cluster $A_j(t_0 + \Delta t)$ at $t = t_0 + \Delta t$ is most relevant to the cluster $A_i(t_0)$ at $t = t_0$. In this study, the similarity between two clusters was computed by the auto-correlation function [16] as shown below. Here, $|A_i(t_0) \cap A_j(t_0 + \Delta t)|$ is the number of data in common between $A_i(t_0)$ and $A_j(t_0 + \Delta t)$, and $|A_i(t_0) \cup A_j(t_0 + \Delta t)|$ is the number of data in the union of $A_i(t_0)$ and $A_j(t_0 + \Delta t)$. $C_{A_{ij}}(t_0 + \Delta t)$ is computed for all pairs of clusters at $t = t_0$ and $t = t_0 + \Delta t$, and pairs are identified as the same cluster in decreasing order of similarity. Here, $\Delta t$ is set as $\Delta t = 1$.

$$C_{A_{ij}}(t_0 + \Delta t) \equiv \frac{|A_i(t_0) \cap A_j(t_0 + \Delta t)|}{|A_i(t_0) \cup A_j(t_0 + \Delta t)|} \tag{3}$$
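The matching procedure built on Equation (3) can be sketched as a greedy pairing over all cluster pairs. The one-to-one pairing and the decision to leave zero-overlap pairs unmatched are assumptions, since the paper states only that pairs are identified in decreasing order of similarity; `autocorrelation` and `match_clusters` are hypothetical names, and clusters are represented as sets of data identifiers.

```python
def autocorrelation(cluster_a, cluster_b):
    """Equation (3): size of the intersection over size of the union."""
    union = cluster_a | cluster_b
    if not union:
        return 0.0
    return len(cluster_a & cluster_b) / len(union)


def match_clusters(previous, current):
    """Pair clusters at t0 with clusters at t0 + 1, most similar first.

    previous, current: lists of sets of data identifiers.
    Returns a dict mapping an index in `previous` to an index in `current`.
    Assumption: each cluster joins at most one pair, and pairs with no
    data in common stay unmatched.
    """
    scored = sorted(
        ((autocorrelation(a, b), i, j)
         for i, a in enumerate(previous)
         for j, b in enumerate(current)),
        key=lambda item: item[0],
        reverse=True,
    )
    matches, used_current = {}, set()
    for score, i, j in scored:
        if score > 0 and i not in matches and j not in used_current:
            matches[i] = j
            used_current.add(j)
    return matches


# Toy example: data are identified by the time step at which they were clicked.
previous = [{1, 2, 3, 4}, {5, 6, 7, 8, 9}]
current = [{6, 7, 8, 9, 10}, {2, 3, 4}]
print(match_clusters(previous, current))
```

In the toy example the first previous cluster is paired with the second current cluster (similarity 3/4) and the second previous cluster with the first current cluster (similarity 4/6).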

Fig. 6. Transitions of the sum of $d_c(t)$, the sum of $d_S(t)$, and $C(t)$ of test data (1).

4) Features of Clusters: The following features of the cluster $A(t)$ are discussed to find the preference shift timing in this experiment. $d_c(t)$, $d_S(t)$ and $C(t)$ are features of the cluster $A(t)$ in transition from $t - 1$ to $t$. The concepts of $d_c(t)$ and $d_S(t)$ are shown in Fig. 5.

• $d_c(t)$: Distance between the centroids of $A(t-1)$ and $A(t)$

• $d_S(t)$: Difference between the spaces occupied by the data of $A(t-1)$ and $A(t)$

• $C(t)$: Similarity of data between $A(t-1)$ and $A(t)$
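The two geometric features listed above can be sketched as follows. The paper does not give a formula for the space occupied by a cluster, so this sketch assumes the area of the axis-aligned bounding box of its data and takes the absolute difference; Euclidean distance between centroids is likewise an assumption, and all function names are hypothetical.

```python
import math


def centroid(points):
    """Mean position of a non-empty list of 2-D points."""
    n = len(points)
    return (sum(p[0] for p in points) / n, sum(p[1] for p in points) / n)


def bounding_box_area(points):
    """Area of the axis-aligned box enclosing the points (assumed 'space')."""
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    return (max(xs) - min(xs)) * (max(ys) - min(ys))


def d_c(previous, current):
    """Distance between the centroids of A(t-1) and A(t)."""
    return math.dist(centroid(previous), centroid(current))


def d_S(previous, current):
    """Absolute difference of the bounding-box areas of A(t-1) and A(t)."""
    return abs(bounding_box_area(current) - bounding_box_area(previous))


a_prev = [(0.0, 0.0), (2.0, 0.0), (2.0, 2.0), (0.0, 2.0)]
a_now = [(3.0, 3.0), (5.0, 3.0), (5.0, 7.0), (3.0, 7.0)]
print(d_c(a_prev, a_now), d_S(a_prev, a_now))
```

A bounding box ignores how the data are spread inside it, which is one reason a measure based on the distribution or covariance of the data may be preferable.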

B. Experimental Results and Discussion

First, the time-series variation of each feature of the clusters of test data (1), representing a preference shift of a single preference, is discussed. Transitions of the sum of $d_c(t)$, the sum of $d_S(t)$, and $C(t)$ are shown in Fig. 6. The horizontal axes in Fig. 6 describe the time $t$. Clustering is applied from $t = 9$ because the number of sample data for the window is nine.

Figure 6(a) shows that $d_c(t)$ and $d_S(t)$ increased rapidly at $t = 13$, which is the preference shift timing set for test data (1). Therefore, the variations of $d_c(t)$ and $d_S(t)$ may indicate the change in the clustering result. On the other hand, $d_c(t)$ and $d_S(t)$ also increased at $t = 20$ because the data that suit the preference in early steps disappeared



Fig. 7. Transitions of the sum of $d_c(t)$ and the sum of $d_S(t)$ of test data (2).

from the window. For this reason, the number of sample data for a window should be discussed further in future studies. Moreover, it is difficult to determine the preference shift appropriately with $C(t)$ because the data included in each cluster change rapidly (see Fig. 6(b)).

Second, we discuss the variation in the sum of $d_c(t)$ and the sum of $d_S(t)$ of test data (2), as shown in Fig. 7. In this test data, $d_c(t)$ and $d_S(t)$ increased simultaneously at $t = 14$, when the preference shifted. However, these increments were small in comparison with the increments at $t = 17$. Two data in region (2) were very close to each other in a cluster at $t = 13$ (see Fig. 4), and a datum in region (3) was added to this cluster at $t = 14$ due to the preference shift. However, the centroid of the cluster moved only slightly and the space covered by the data of the cluster was approximately the same as before because these three data were close to each other. This result indicates that the distribution or the covariance of the data in each cluster must be considered in future studies.

Next, the time-series variation of each feature of the clusters of test data (3) is shown in Fig. 8. The test data represent simultaneous preference shifts of two preferences toward a single preference. $d_c(t)$ and $d_S(t)$ increased rapidly at $t = 17$, the preference shift timing set for this test data. In the same way as for test data (1), it is possible to detect the preference shift timing based on the time-series variation of $d_c(t)$ and $d_S(t)$. However, the number of sample data for a window and the constraint of the past clustering result must be considered because the last datum that suits the initial preference is merged into other clusters, and $d_c(t)$ and $d_S(t)$ also increased at $t = 20$ and $t = 24$.

Finally, the time-series variation of each feature of the clusters of test data (4) is shown in Fig. 9. This test data showed a consistent preference with no preference shift, and it must be confirmed whether $d_c(t)$ and $d_S(t)$ increased or not. Figure 9 shows that these values were consistent in comparison with Figs. 6, 7, and 8.

Overall, it is possible to detect the preference shift timing according to the rapid increases in the distance between the centroids $d_c(t)$ and the difference between the spaces $d_S(t)$. Moreover, these features would not change when a

Fig. 8. Transitions of the sum of $d_c(t)$ and the sum of $d_S(t)$ of test data (3).

Fig. 9. Transitions of the sum of $d_c(t)$ and the sum of $d_S(t)$ of test data (4).

preference shift does not occur. Nevertheless, the distribution

or the covariance of data within a cluster must be discussed

further in future studies.
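One way to turn the observation about rapid increases into an automatic check is sketched below. The paper does not specify a decision rule, so the step-to-step thresholds are purely hypothetical, as is the function name `detect_shift_candidates`; the numbers in the example are invented for illustration and are not the experimental results.

```python
def detect_shift_candidates(d_c_series, d_s_series, threshold_c, threshold_s):
    """Return the time steps at which both features jump simultaneously.

    d_c_series, d_s_series: dicts mapping time step t to the feature value.
    threshold_c, threshold_s: hypothetical minimum increases over t - 1.
    """
    candidates = []
    for t in sorted(d_c_series):
        # Skip steps for which either series lacks a value at t or t - 1.
        if (t - 1 not in d_c_series or t - 1 not in d_s_series
                or t not in d_s_series):
            continue
        rise_c = d_c_series[t] - d_c_series[t - 1]
        rise_s = d_s_series[t] - d_s_series[t - 1]
        if rise_c >= threshold_c and rise_s >= threshold_s:
            candidates.append(t)
    return candidates


# Invented values for illustration only (not taken from Figs. 6 to 9).
toy_d_c = {10: 0.2, 11: 0.3, 12: 0.25, 13: 4.0, 14: 0.5}
toy_d_s = {10: 1.0, 11: 1.2, 12: 1.1, 13: 30.0, 14: 2.0}
print(detect_shift_candidates(toy_d_c, toy_d_s, threshold_c=2.0, threshold_s=10.0))
```

A fixed threshold of this kind would also flag the later increases caused by old data leaving the window, so it would need to be combined with the window-size and constraint considerations discussed above.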

V. CONCLUSIONS

The purpose of this study was to increase sales opportunities by detecting the preference shift on e-commerce sites and its triggers. In this paper, a method that applies time-series clustering to preferred products was proposed to detect the preference shift timing. The features of the cluster were also discussed using four sets of artificial test data to determine when the clustering result had changed. As an experimental result, the preference shift timing could be detected according to the time-series variations of the distance between the centroids and the difference between the spaces of two states of the same cluster. In future studies, the number of sample data for a window and the application of constraints from past clustering results should be discussed. Eventually, the capability of the proposed method to detect the preference shift timing of actual users should also be assessed in subjective experiments.

REFERENCES

[1] Adomavicius, G., Tuzhilin, A.: Toward the next generation of recommender systems: A survey of the state-of-the-art and possible extensions. IEEE Transactions on Knowledge and Data Engineering, Vol.17, No.6, pp.734-749 (2005)

[2] Aoki, K., Takagi, H.: Interactive GA-based design support system for lighting design in 3-D computer graphics. The Transactions of the Institute of Electronics, Information and Communication Engineers, Vol.81, No.7, pp.1601-1608 (1998)

[3] Fukui, K., Saito, K., Kimura, M., Numao, M.: Visualizing Dynamics of the Hot Topics Using Sequence-Based Self-Organizing Maps. Lecture Notes in Artificial Intelligence, Vol.3684, pp.745-751 (2005)

[4] Fukui, K., Saito, K., Kimura, M., Numao, M.: Compilation to Visualize the Dynamic Clusters by the Adapted Self-Organizing Network. Transactions of the Japanese Society for Artificial Intelligence, Vol.23, No.5, pp.319-329 (2008)

[5] Ito, F., Hiroyasu, T., Miki, M., Yokouchi, H.: Discussion of Offspring Generation Method for Interactive Genetic Algorithms with Consideration of Multimodal Preference. Simulated Evolution and Learning, Lecture Notes in Computer Science, Springer, Vol.5361, pp.349-359 (2008)

[6] Ito, F., Hiroyasu, T., Miki, M., Yokouchi, H.: Offspring Generation Method for interactive Genetic Algorithm considering Multimodal Preference. Transactions of the Japanese Society for Artificial Intelligence, Vol.24, No.1, pp.127-135 (2009)

[7] Konstan, J.A., Miller, B.N., Maltz, D., Herlocker, J.L., Gordon, L.R., Riedl, J.: GroupLens: applying collaborative filtering to Usenet news. Communications of the ACM, Vol.40, No.3, pp.77-87 (1997)

[8] Kleinberg, J.: Bursty and hierarchical structure in streams. In: Proceedings of the 8th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp.91-101 (2002)

[9] Koufaris, M., Kambil, A., LaBarbera, P.A.: Consumer Behavior in Web-Based Commerce: An Empirical Study. International Journal of Electronic Commerce, Vol.6, No.2, pp.115-138 (2002)

[10] Llora, X., Sastry, K., Goldberg, D.E., Gupta, A., Lakshmi, L.: Combating user fatigue in iGAs: partial ordering, support vector machines, and synthetic fitness. In: Proceedings of the Genetic and Evolutionary Computation Conference, pp.1363-1370 (2005)

[11] Mandel, N., Johnson, E.J.: When Web Pages Influence Choice: Effects of Visual Primes on Experts and Novices. Journal of Consumer Research, Vol.29, No.2, pp.235-245 (2002)

[12] Mandel, N., Nowlis, S.M.: The Effect of Making a Prediction about the Outcome of a Consumption Experience on the Enjoyment of That Experience. Journal of Consumer Research, Vol.35, No.1, pp.9-20 (2008)

[13] Newman, M.E.J., Girvan, M.: Finding and evaluating community structure in networks. Physical Review E, Vol.69, Issue 2, 026113 (2004)

[14] Newman, M.E.J.: Fast algorithm for detecting community structure in networks. Physical Review E, Vol.69, Issue 6, 066133 (2004)

[15] Palla, G., Derenyi, I., Farkas, I., Vicsek, T.: Uncovering the overlapping community structure of complex networks in nature and society. Nature, Vol.435, No.7043, pp.814-818 (2005)

[16] Palla, G., Barabasi, A.L., Vicsek, T.: Quantifying social group evolution. Nature, Vol.446, No.7136, pp.664-667 (2007)

[17] Ratanamahatana, C.A., Keogh, E.: Making Time-series Classification More Accurate Using Learned Constraints. In: Proceedings of SIAM International Conference on Data Mining (SDM '04), Lake Buena Vista, Florida, pp.11-22 (2004)

[18] Sakaki, T., Matsuo, Y., Ishizuka, M.: Topic Extraction from Scientific Paper Database. In: Proceedings of the Annual Conference on the Japanese Society for Artificial Intelligence, Vol.20, 1A1-1 (2006)

[19] Sarwar, B., Karypis, G., Konstan, J., Riedl, J.: Item-based collaborative filtering recommendation algorithms. In: Proceedings of the 10th International Conference on World Wide Web, pp.285-295 (2001)

[20] Takagi, H.: Interactive evolutionary computation: Fusion of the capabilities of EC optimization and human evaluation. Proceedings of the IEEE, Vol.89, No.9, pp.1275-1296 (2001)

